

Commun. Math. Phys. 269, 1–15 (2007)
Digital Object Identifier (DOI) 10.1007/s00220-006-0133-y

Communications in Mathematical Physics

Power-Law Inflation in Spacetimes Without Symmetry

J. Mark Heinzle, Alan D. Rendall

Max Planck Institute for Gravitational Physics, Albert Einstein Institute, Am Mühlenberg 1, D-14476 Golm, Germany. E-mail: [email protected], [email protected]

Received: 6 July 2005 / Accepted: 27 April 2006
Published online: 18 October 2006 – © Springer-Verlag 2006

Abstract: We consider models of accelerated cosmological expansion described by the Einstein equations coupled to a nonlinear scalar field with a suitable exponential potential. We show that homogeneous and isotropic solutions are stable under small nonlinear perturbations without any symmetry assumptions. Our proof is based on results on the nonlinear stability of de Sitter spacetime and Kaluza-Klein reduction techniques.

1. Introduction

At present the subject of accelerated cosmological expansion is of great astrophysical interest. Many candidates have been suggested for the cause of the acceleration, known under the name of dark energy. The greater part of the literature on this concerns homogeneous (and even isotropic) models or their linearized perturbations. This is often enough to make contact with observations. Nevertheless, since many of the phenomena of interest are linked to inhomogeneities, it is desirable to develop an understanding of the full nonlinear dynamics for initial data which are as general as possible. One approach, which is pursued in this paper, is to try to prove general mathematical theorems. A recent review of this approach is [12].

A well-known feature of models for accelerated cosmological expansion is that they exhibit attractor solutions which are homogeneous and isotropic. The simplest model is a positive cosmological constant, in which case the attractor is the de Sitter solution. In that case a theorem on stability of this solution under small nonlinear perturbations has been proved [5]. It concerns the vacuum equations with positive cosmological constant. No symmetry assumption is required. Generalizations where matter such as a perfect fluid or kinetic theory is added are not available. Under the restriction of plane symmetry an analogous result with collisionless matter was proved in [14]. There are also no generalizations of the result of [5] in the literature to other models of accelerated expansion such as nonlinear scalar fields. This paper proves a generalization of this type for a nonlinear scalar field with exponential potential. Because it was originally applied
in models of the early universe this is associated with the name power-law inflation. The method of proof restricts the exponents allowed to a discrete set.

The result of [5] makes essential use of the conformal properties of the Einstein equations in four dimensions. Anderson [1] has extended this analysis to any even spacetime dimension. Here we combine the results of [1] with a Kaluza-Klein reduction which relates power-law inflation in four dimensions with a suitable exponent in the potential to vacuum spacetimes with cosmological constant in higher dimensions. It is proved that certain homogeneous and isotropic solutions are nonlinearly stable.

Consider a spacetime $(M, \tilde g, \varphi)$ that satisfies the Einstein equations

$$\tilde R_{\mu\nu} - \tfrac12 \tilde R\,\tilde g_{\mu\nu} = \kappa^2 T_{\mu\nu} \tag{1}$$

with nonlinear scalar field matter

$$T_{\mu\nu} = \nabla_\mu\varphi\,\nabla_\nu\varphi - \Big[\tfrac12 \nabla^\delta\varphi\,\nabla_\delta\varphi + V(\varphi)\Big]\tilde g_{\mu\nu}, \tag{2}$$

where $V(\varphi)$ is the potential of the scalar field. Using that the spacetime is four-dimensional, Eqs. (1) and (2) can be condensed into

$$\tilde R_{\mu\nu} = \kappa^2\big[\nabla_\mu\varphi\,\nabla_\nu\varphi + V(\varphi)\,\tilde g_{\mu\nu}\big]. \tag{3a}$$

The Bianchi identity implies the equation of motion for the scalar field

$$\Box\varphi = V'(\varphi). \tag{3b}$$

Power-law inflation refers to the case

$$V(\varphi) = V_0 \exp[-\kappa\lambda\varphi] \tag{4}$$

with constants $V_0$ and $\lambda$, where $\lambda < \sqrt2$. Models with a potential of this type are the subject of the following. The most elementary power-law inflation models are the homogeneous and isotropic models [7], of which the simplest is

$$\tilde g = -d\tilde t^{\,2} + \Big(\frac{dH}{2}\Big)^{2+4/d}\,\tilde t^{\,2+4/d}\,\delta_{ij}\,d\tilde x^i d\tilde x^j, \tag{5}$$

where $d$ is related to the exponent $\lambda$ in the potential via $\lambda^2 = 2d(2+d)^{-1}$. The main theorem of this paper establishes nonlinear stability of this model.

Theorem 1.1. Consider the exponential potential

$$V(\varphi) = V_0 \exp\Big[-\kappa\sqrt2\,\sqrt{\frac{d}{d+2}}\;\varphi\Big] \quad\text{with } d\in\mathbb{N},\ d \text{ even}. \tag{6}$$

On $T^3$ consider smooth Cauchy initial data for the Einstein scalar field equations (3) with potential (6). Let the initial data be close in the $C^\infty$-topology to the homogeneous and isotropic initial data of the model (5). Then the initial data evolves into a power-law inflation model $(M, \tilde g, \varphi)$ which is geodesically complete to the future, and
there exists a coordinate system $(\tilde t, \tilde x^i)$ that is global to the future, such that $\tilde g$ takes the form $\tilde g = -\alpha^2 d\tilde t^{\,2} + \tilde g_{ij}\,d\tilde x^i d\tilde x^j$, where $\tilde g_{ij}$ and $\varphi$ admit the asymptotic expansions

$$\tilde g_{ij} = \tilde t^{\,2+4/d}\sum_{m\ge0}\tilde g^{(m)}_{ij}\,\tilde t^{-2m/d}, \qquad \varphi = \kappa^{-1}\sqrt2\,\sqrt{\frac{d+2}{d}}\,\log\tilde t + \sum_{m\ge0}\varphi_{(m)}\,\tilde t^{-2m/d}, \tag{7}$$

and $\alpha = \sum_{m\ge0}\alpha_{(m)}\,\tilde t^{-2m/d}$. Thus, the homogeneous and isotropic power-law inflation model (5) is nonlinearly stable.

A slightly different formulation of the theorem uses the concept of asymptotic simplicity. We call the spacetime $(M, \tilde g, \varphi)$ asymptotically simple (in the future) when $\tilde g$ is conformal to a metric $\acute g = \Omega^{2+d}\,\tilde g$, and $\acute\varphi = \varphi + d\kappa^{-1}\lambda^{-1}\log\Omega$, where $\acute g$, $\acute\varphi$, and the positive function $\Omega$ can be smoothly extended through the hypersurface $\Omega = 0$, which we denote as $\mathcal{I}^+$, the conformal boundary of $M$. The formula for the conformal transformation of the curvature scalar implies $\acute g^{\mu\nu}\nabla_\mu\Omega\nabla_\nu\Omega = -2\kappa^{-2}V_0(2+d)^{-1}(3+d)^{-1}\exp(-\kappa\lambda\acute\varphi)$ on $\mathcal{I}^+$, hence $\mathcal{I}^+$ is spacelike. Note in this context that our definition of asymptotic simplicity differs from the standard definition, which applies for the vacuum case and for the case when matter can be neglected in an appropriate sense in the neighborhood of the conformal boundary. Theorem 1.1 states that initial data close to homogeneous and isotropic data evolves into an asymptotically simple solution. We also have

Theorem 1.2. The initial value problem for the Einstein equations with scalar field matter and potential (6), where initial data is prescribed at conformal infinity $\mathcal{I}^+$, is well-posed. The resulting spacetime $(M, \tilde g, \varphi)$ is asymptotically simple with conformal boundary $\mathcal{I}^+$ and the metric and the scalar field admit the asymptotic expansions (7), where the coefficients are uniquely determined by the initial data.

The proof of the theorems is based on the fact that power-law inflation models $(M, \tilde g, \varphi)$ with potential (6) are in one-to-one correspondence with a certain type of $(4+d)$-dimensional vacuum solutions $(\hat M, \hat g)$ of the Einstein equations with positive cosmological constant. This is proved in Sect. 2. In Sect. 3 we give a brief overview of the asymptotic behavior of solutions $(\hat M, \hat g)$; in particular we recapitulate existence and nonlinear stability of asymptotically simple solutions. On the basis of these results, the theorems are proved in Sect. 4.

Theorem 1.1 formulates the asymptotic behavior in a certain coordinate system that is not Gaussian in general. In Appendix A we briefly discuss asymptotic expansions in Gaussian coordinates; in particular we show that this choice of coordinates introduces logarithmic terms into the expansions. In Appendix B we investigate asymptotic expansions for general exponential potentials on the level of formal power series. We demonstrate that logarithmic terms that appear in the series can often be removed by a suitable choice of (non-Gaussian) coordinates; this complements previous studies of formal expansions of this type [9].

2. Reduction

In this section we employ the Kaluza-Klein reduction method to find a one-to-one relationship between solutions of the Einstein equations with positive cosmological constant and certain power-law inflation models. The formulation of Kaluza-Klein theory used and the notation is primarily based on [3].
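Consider a principal fiber bundle $G \to \hat M \xrightarrow{\ \pi\ } M$, where the base space $M$ is a 4-dimensional differentiable manifold, and $G$ a $d$-dimensional Lie group (which we will eventually assume to be abelian). On $\hat M$, let $\hat g$ be a Lorentzian metric such that $\hat g$ is invariant under the right action of $G$ and vertical vectors are not null w.r.t. $\hat g$. The metric $\hat g$ induces a Lorentzian metric $g$ on $M$, a metric $\xi$ on each fiber, where $\xi$ is invariant under the action of $G$, and a connection on $\hat M$ (in the form of a horizontal bundle). In the so-called polarized case the horizontal distribution is assumed to be involutive, and in an adapted local trivialization over a chart neighborhood of $M$ with coordinates $\{x^\mu \mid \mu = 0\ldots3\}$ the metric $\hat g$ can be written as

$$\hat g = \hat g_{AB}\,e^A e^B = g_{\mu\nu}\,dx^\mu dx^\nu + \xi_{mn}\,\theta^m\theta^n, \tag{8}$$

where the $\theta^m$ constitute a basis of right-invariant 1-forms on $G$, and $g_{\mu\nu}$, $\xi_{mn}$ depend only on the coordinates $\{x^\delta\}$. Greek indices run from 0 to 3; latin indices $m$, $n$, etc. assume values $1, 2, \ldots, d$. Capital letters $A$, $B$, etc. run over the combined range: the components $T_{AB}$ of a tensor thus comprise $T_{\mu\nu}$, $T_{\mu n}$, $T_{m\nu}$, $T_{mn}$. We set

$$e^{\phi} := \sqrt{\det\xi} \quad\text{so that}\quad \xi_{mn} = e^{2\phi/d}\,\zeta_{mn}, \tag{9}$$

where $\det\zeta = 1$. Like $\xi_{mn}$, in the given trivialization, the field $\phi$ can be regarded as a scalar field on $M$. To compute the Ricci tensor $\hat R_{AB}$ of the metric $\hat g$ we recall the general Kaluza-Klein formulas from [3] and use $\nabla_\mu\phi = \tfrac12\,\xi^{mn}\nabla_\mu\xi_{mn}$, where $\xi^{mn}$ is the inverse of $\xi_{mn}$. We obtain

$$\hat R_{\mu\nu} = R_{\mu\nu} - \nabla_\mu\nabla_\nu\phi + \tfrac14\,\nabla_\mu\xi_{mn}\,\nabla_\nu\xi^{mn}, \tag{10a}$$

$$\hat R_{mn} = R_{mn} + \tfrac12\,\xi^{pq}\,\nabla_\alpha\xi_{mp}\,\nabla^\alpha\xi_{nq} - \tfrac12\big[\nabla^\alpha\nabla_\alpha + \nabla^\alpha\phi\,\nabla_\alpha\big]\xi_{mn}, \tag{10b}$$

where $R_{\mu\nu}$ is the Ricci tensor of $g_{\mu\nu}$ and $R_{mn}$ the Ricci curvature of the fibers. Both $R_{mn}$ and the components $\hat R_{\mu n}$ of the Ricci tensor $\hat R_{AB}$ vanish if the Lie group $G$ is abelian.

On $M$ we introduce the conformally rescaled metric

$$\tilde g_{\mu\nu} := e^{\phi}\,g_{\mu\nu} \tag{11}$$

and denote the Ricci curvature of $\tilde g_{\mu\nu}$ by $\tilde R_{\mu\nu}$. Employing $\nabla^\alpha\nabla_\alpha + \nabla^\alpha\phi\,\nabla_\alpha = e^{\phi}\,\tilde\nabla^\alpha\tilde\nabla_\alpha$, where $\tilde\nabla^\alpha = \tilde g^{\alpha\beta}\tilde\nabla_\beta$, Eq. (10) becomes

$$\hat R_{\mu\nu} = \tilde R_{\mu\nu} + \frac12\,\tilde\nabla_\delta\tilde\nabla^\delta\phi\;\tilde g_{\mu\nu} - \Big(\frac12 + \frac1d\Big)\tilde\nabla_\mu\phi\,\tilde\nabla_\nu\phi + \frac14\,\tilde\nabla_\mu\zeta_{mn}\,\tilde\nabla_\nu\zeta^{mn}, \tag{12a}$$

$$\hat R_{mn} = R_{mn} - \frac{e^{\phi}}{2}\Big[\frac2d\,(\tilde\nabla_\alpha\tilde\nabla^\alpha\phi)\,\xi_{mn} + e^{2\phi/d}\big(\tilde\nabla^\alpha\tilde\nabla_\alpha\zeta_{mn} - \zeta^{pq}\,\tilde\nabla_\alpha\zeta_{mp}\,\tilde\nabla^\alpha\zeta_{nq}\big)\Big]. \tag{12b}$$

Contracting (12b) with $\xi^{mn}$ leads to

$$\xi^{mn}\hat R_{mn} = \xi^{mn}R_{mn} - e^{\phi}\,\tilde\nabla^\alpha\tilde\nabla_\alpha\phi, \tag{13}$$

where again $\xi^{mn}R_{mn} = 0$ in the case of an abelian Lie group $G$. Equation (12) simplifies when $\zeta_{mn}$ is independent of $x^\delta$, i.e., when $\nabla_\mu\zeta_{mn} = 0$.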
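The contraction that takes (12b) to (13) can be checked directly. The following is a sketch under the assumptions $\xi_{mn} = e^{2\phi/d}\zeta_{mn}$ and $\det\zeta = 1$ from (9); the two auxiliary identities for $\zeta_{mn}$ are not stated in the text and are supplied here only for illustration.

```latex
% Step 1: det(zeta) = 1 gives zeta^{mn} \nabla zeta_{mn} = \nabla log det(zeta) = 0.
% Step 2: differentiate once more, using
%         \nabla zeta^{mn} = - zeta^{mp} zeta^{nq} \nabla zeta_{pq}.
\begin{align*}
0 &= \tilde\nabla^\alpha\big(\zeta^{mn}\tilde\nabla_\alpha\zeta_{mn}\big)
   = \zeta^{mn}\tilde\nabla^\alpha\tilde\nabla_\alpha\zeta_{mn}
   - \zeta^{mp}\zeta^{nq}\,\tilde\nabla^\alpha\zeta_{pq}\,\tilde\nabla_\alpha\zeta_{mn}.
\end{align*}
% Step 3: contract (12b) with xi^{mn} = e^{-2 phi/d} zeta^{mn}, using xi^{mn} xi_{mn} = d.
\begin{align*}
\xi^{mn}\hat R_{mn}
 &= \xi^{mn}R_{mn} - \frac{e^{\phi}}{2}\Big[\,2\,\tilde\nabla_\alpha\tilde\nabla^\alpha\phi
   + \zeta^{mn}\tilde\nabla^\alpha\tilde\nabla_\alpha\zeta_{mn}
   - \zeta^{mn}\zeta^{pq}\,\tilde\nabla_\alpha\zeta_{mp}\,\tilde\nabla^\alpha\zeta_{nq}\Big] \\
 &= \xi^{mn}R_{mn} - e^{\phi}\,\tilde\nabla^\alpha\tilde\nabla_\alpha\phi .
\end{align*}
% The two zeta terms cancel by Step 2: after relabeling indices and using the
% symmetry of zeta, both equal tr(zeta^{-1} dzeta zeta^{-1} dzeta).
```

In particular the $\zeta$-dependent terms drop out of the trace even when $\zeta_{mn}$ is not constant, which is why (13) contains only $\phi$.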


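Assume that $(\hat M, \hat g)$ satisfies the Einstein vacuum equations with cosmological constant $\Lambda$, i.e.,

$$\hat R_{AB} = \frac{2\Lambda}{d+2}\,\hat g_{AB}. \tag{14}$$

Suppose further that the Lie group $G$ is abelian and that $\nabla_\mu\zeta_{mn} = 0$. From (13) we obtain

$$\tilde\nabla^\alpha\tilde\nabla_\alpha\phi = -\frac{2d\,\Lambda}{d+2}\,e^{-\phi}. \tag{15}$$

Equation (12a) then leads to

$$\tilde R_{\mu\nu} = \Lambda\,e^{-\phi}\,\tilde g_{\mu\nu} + \Big(\frac12 + \frac1d\Big)\tilde\nabla_\mu\phi\,\tilde\nabla_\nu\phi. \tag{16}$$

Define

$$\varphi = \kappa^{-1}\sqrt{\frac12 + \frac1d}\;\phi, \tag{17}$$

then (15) and (16) become

$$\tilde\nabla^\alpha\tilde\nabla_\alpha\varphi = V'(\varphi) \quad\text{and}\quad \tilde R_{\mu\nu} = \kappa^2 V(\varphi)\,\tilde g_{\mu\nu} + \kappa^2\,\tilde\nabla_\mu\varphi\,\tilde\nabla_\nu\varphi, \tag{18}$$

where

$$V(\varphi) := \kappa^{-2}\Lambda\,\exp\Big[-\kappa\sqrt2\,\sqrt{\frac{d}{d+2}}\;\varphi\Big]. \tag{19}$$

By comparing (18) with (3a) and (3b) we conclude that $(M, \tilde g, \varphi)$ is a solution of the Einstein equations with nonlinear scalar field, where the potential is an exponential function, i.e., $(M, \tilde g, \varphi)$ is a power-law inflation model. The exponent in the potential (19) is

$$\lambda = \lambda_d := \sqrt2\,\sqrt{\frac{d}{d+2}} < \sqrt2, \tag{20}$$

cf. (4).

Conversely, given a solution $(M, \tilde g, \varphi)$ representing power-law inflation with exponent $\lambda = \lambda_d = (2d)^{1/2}(d+2)^{-1/2}$ for some $d \in \mathbb{N}$, we are able to construct a $(4+d)$-dimensional solution $(\hat M, \hat g)$ of the Einstein vacuum equations with positive cosmological constant. We take for $G$ an abelian Lie group, which ensures $\hat R_{\mu n} = 0$, and we set $\partial_\mu\zeta_{mn} = 0$; e.g., we use $\hat M = M \times G$, where $G = \mathbb{R}^d$ is endowed with $\zeta_{mn} = \delta_{mn}$. The relations (18) together with the form of the potential imply $\hat R_{AB} = 2\Lambda/(d+2)\,\hat g_{AB}$ with $\Lambda = \kappa^2 V_0$.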
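The step from (15), (16) to (18) is a short substitution. It is spelled out below as a sketch that uses only the definitions (17), (19) and (20); the intermediate line showing that the exponent reduces to $\phi$ is not written in the text and is added here for illustration.

```latex
\begin{align*}
% With (17) and \lambda_d = \sqrt{2d/(d+2)}, the exponent in (19) is just \phi:
\kappa\lambda_d\,\varphi
 &= \sqrt{\frac{2d}{d+2}}\,\sqrt{\frac{d+2}{2d}}\;\phi = \phi,
 \qquad\text{so}\qquad V(\varphi) = \kappa^{-2}\Lambda\,e^{-\phi}. \\
% Scalar field equation: multiply (15) by \kappa^{-1}\sqrt{(d+2)/(2d)}.
\tilde\nabla^\alpha\tilde\nabla_\alpha\varphi
 &= -\kappa^{-1}\sqrt{\frac{d+2}{2d}}\;\frac{2d}{d+2}\,\Lambda\,e^{-\phi}
  = -\kappa^{-1}\lambda_d\,\Lambda\,e^{-\phi}
  = -\kappa\lambda_d\,V(\varphi) = V'(\varphi). \\
% Ricci equation: rewrite both terms of (16).
\Lambda\,e^{-\phi}\,\tilde g_{\mu\nu} &= \kappa^2 V(\varphi)\,\tilde g_{\mu\nu},
 \qquad
\Big(\frac12+\frac1d\Big)\tilde\nabla_\mu\phi\,\tilde\nabla_\nu\phi
  = \kappa^2\,\tilde\nabla_\mu\varphi\,\tilde\nabla_\nu\varphi .
\end{align*}
```

Reading the same lines backwards gives the converse direction, with $V_0 = \kappa^{-2}\Lambda$.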


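We conclude by giving a schematic overview of the one-to-one correspondence of solutions, which has been established:

$$
\begin{array}{ccc}
(\hat M = M\times\mathbb{R}^d,\ \hat g) & & (M, \tilde g, \varphi) \\[4pt]
\hat R_{AB} - \tfrac12\hat R\,\hat g_{AB} + \Lambda\,\hat g_{AB} = 0 & \longleftrightarrow & \tilde R_{\mu\nu} - \tfrac12\tilde R\,\tilde g_{\mu\nu} = \kappa^2 T_{\mu\nu}[V(\varphi)], \quad \tilde\Box\varphi = V'(\varphi) \\[4pt]
\hat g = g_{\mu\nu}\,dx^\mu dx^\nu + e^{2\phi/d}\,\delta_{mn}\,dy^m dy^n & & V(\varphi) = \kappa^{-2}\Lambda\exp\Big[-\kappa\sqrt2\sqrt{\tfrac{d}{d+2}}\,\varphi\Big]
\end{array}
$$

$$\tilde g_{\mu\nu} = e^{\phi}\,g_{\mu\nu}, \qquad \varphi = \kappa^{-1}\sqrt{\frac{d+2}{2d}}\;\phi. \tag{21}$$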

For the de Sitter solution in $(4+d)$ dimensions we can write

$$\hat g = -(dx^0)^2 + e^{2Hx^0}\big[(dx^1)^2 + (dx^2)^2 + (dx^3)^2 + (dy^1)^2 + \cdots + (dy^d)^2\big], \tag{22}$$

where $H^{-2} = (d+2)(d+3)/(2\Lambda)$. From (21) we infer that $\phi = dHx^0$, and $\tilde g$ becomes

$$\tilde g_{\mu\nu}\,dx^\mu dx^\nu = e^{dHx^0}\Big[-(dx^0)^2 + e^{2Hx^0}\big((dx^1)^2 + (dx^2)^2 + (dx^3)^2\big)\Big]. \tag{23}$$

By introducing new coordinates $\tilde x^\mu$ through $d\tilde x^0 = \exp(dHx^0/2)\,dx^0$ and $\tilde x^i = x^i$ we obtain

$$\tilde g_{\mu\nu}\,d\tilde x^\mu d\tilde x^\nu = -(d\tilde x^0)^2 + \Big(\frac{dH}{2}\Big)^{2+4/d}(\tilde x^0)^{2+4/d}\,\delta_{ij}\,d\tilde x^i d\tilde x^j, \tag{24}$$

i.e., a flat Robertson-Walker model for power-law inflation as in (5).

3. Asymptotic Series

Consider the Einstein vacuum equations with cosmological constant $\Lambda$ in $n+1$ dimensions, $n \ge 3$, $n$ odd. The $n+1$ decomposition of the equations consists of the constraint equations and the evolution equations

$$\partial_t\hat g_{ab} = -2\,\hat g_{ac}\,\hat k^c{}_b, \qquad \partial_t\hat k^a{}_b = \hat R^a{}_b + (\mathrm{tr}\,\hat k)\,\hat k^a{}_b - \frac{2\Lambda}{n-1}\,\delta^a_b, \tag{25}$$

where we have used a vanishing shift vector and a lapse function set equal to one. In [11] it was proved that the equations admit power series of the following type as formal solutions:

$$\hat g_{ab} = e^{2Ht}\Big(\hat g^{(0)}_{ab} + e^{-2Ht}\,\hat g^{(2)}_{ab} + e^{-3Ht}\,\hat g^{(3)}_{ab} + \cdots\Big), \tag{26a}$$

$$\hat k^a{}_b = -H\,\delta^a_b + \sum_{m\ge2}\hat k^a{}_{b(m)}\,e^{-mHt}, \tag{26b}$$
where $H = \sqrt{2\Lambda/[n(n-1)]}$. The coefficients $\hat k^a{}_{b(m)} = \hat\sigma^a{}_{b(m)} + n^{-1}\,\mathrm{tr}\,\hat k_{(m)}\,\delta^a_b$ are obtained recursively through the relations

$$[n-m]\,H\,\hat\sigma^a{}_{b(m)} = \sum_{p=2}^{m-2}\hat\sigma^a{}_{b(p)}\,\mathrm{tr}\,\hat k_{(m-p)} + \mathrm{tf}\hat R^a{}_{b(m)}, \tag{27a}$$

$$[2n-m]\,H\,\mathrm{tr}\,\hat k_{(m)} = \sum_{p=2}^{m-2}\mathrm{tr}\,\hat k_{(p)}\,\mathrm{tr}\,\hat k_{(m-p)} + \hat R_{(m)}, \tag{27b}$$

for $m \ge 2$, $m \ne n$, $m \ne 2n$, which follows from (25), and through

$$2(n-1)\,H\,\mathrm{tr}\,\hat k_{(m)} = \hat R_{(m)} + \sum_{p=2}^{m-2}\Big[-\hat k^a{}_{b(p)}\,\hat k^b{}_{a(m-p)} + \mathrm{tr}\,\hat k_{(p)}\,\mathrm{tr}\,\hat k_{(m-p)}\Big], \tag{27c}$$

for $m = 2n$, which follows from the Hamiltonian constraint. Here $\hat k^a{}_{b(m)}$ vanishes for all odd $m < n$. The evolution equation $\partial_t\hat g_{ab} = -2\hat g_{ac}\hat k^c{}_b$ implies that the coefficients $\hat g^{(m)}_{ab}$ are determined by the coefficients $\hat k^a{}_{b(l)}$, $l = 0\ldots m$, recursively; in particular $\hat g^{(m)}_{ab} = 0$ for all odd $m < n$. The remaining unspecified coefficients $\hat g^{(0)}_{ab}$ and $\hat g^{(n)}_{ab}$ encode the free data,

$$\hat g^{(0)}_{ab} = \hat A_{ab}, \qquad \hat g^{(n)}_{ab} = \hat B_{ab}, \tag{28}$$

where $\hat A_{ab}$ is a Riemannian metric, $\hat B_{ab}$ a symmetric tensor that satisfies $\hat A^{ab}\hat B_{ab} = 0$ and $\hat\nabla^a\hat B_{ab} = 0$, where $\hat A^{ab}$ is the inverse of $\hat A_{ab}$ and $\hat\nabla_a$ refers to $\hat A_{ab}$.

Consider now a spacetime $(\hat M, \hat g)$ that is asymptotically simple and de Sitter (in the future), see, e.g., [5]. By definition, $\hat g$ is then conformal to a metric $\check g = \Omega^2\hat g$, where $\check g$ and the positive function $\Omega$ can be smoothly extended through the hypersurface $\Omega = 0$, which is often denoted as the conformal boundary $\hat{\mathcal{I}}^+$ of $\hat M$. Since $\check g^{\mu\nu}\nabla_\mu\Omega\nabla_\nu\Omega = -2\Lambda\,n^{-1}(n-1)^{-1} = -H^2$ on $\hat{\mathcal{I}}^+$, which follows from the conformal transformation of the curvature scalar, the metric $\check g$ takes the form

$$\check g = -H^{-2}\alpha^2\,d\Omega^2 + \check g_{ab}\,d\check z^a d\check z^b, \tag{29}$$

when $\Omega$ is used as the time coordinate and the $\check z^a$ are spatial coordinates that are constant along the curves orthogonal to slices $\Omega = \mathrm{const}$. The function $\alpha$ depends on $\Omega$ and $\check z^a$; $\alpha = 1$ on $\hat{\mathcal{I}}^+$. Letting $\Omega = \exp(-H\hat t)$, the physical metric becomes $\hat g = -\alpha^2 d\hat t^{\,2} + \hat g_{ab}\,d\check z^a d\check z^b$ with $\hat g_{ab} = \exp(2H\hat t)\,\check g_{ab}$. In [11, Sect. 4], in dimension $n = 3$, it is shown that this relation together with an analogous set of fall-off conditions for $\alpha$ and $\hat k^a{}_b$ implies that Gauss coordinates can be introduced in which the metric $\hat g_{ab}$ and the extrinsic curvature $\hat k^a{}_b$ exhibit an asymptotic expansion of the form (26). It is straightforward to apply the proof in [11] to all odd dimensions.

The initial value problem for the Einstein equations with positive cosmological constant, where initial data is prescribed at conformal infinity $\hat{\mathcal{I}}^+$, is well-posed; this has been shown in [5] in the case $n = 3$. In particular, given an arbitrary Riemannian metric $\hat A_{ab}$ on a (compact) manifold $\hat{\mathcal{I}}^+$ and a symmetric tensor $\hat B_{ab}$ that is trace-free and divergence-free, there exists a unique future asymptotically simple solution of Einstein's equations whose conformal boundary is $\hat{\mathcal{I}}^+$. Global non-linear stability
of asymptotic simplicity has been proved in [6]. Hence, the evolution of initial data sufficiently close to de Sitter data yields a spacetime that is globally close to de Sitter and in particular asymptotically simple in the past and in the future. For our purposes it is most relevant that these statements have been generalized recently to arbitrary odd (spatial) dimensions $n$; in particular, even-dimensional de Sitter spacetime is (globally) non-linearly stable, see [1]. We conclude that any initial data close to de Sitter evolves into an asymptotically simple solution having an asymptotic expansion of the form given in (26) together with (28), where $\hat A_{ab}$ and $\hat B_{ab}$ correspond to the conformal initial data set.

4. Reduction of Asymptotic Series

Consider the $(4+d)$-dimensional manifold $\hat M = M \times \mathbb{R}^d$; let $d$ be even. On $\hat M$ consider solutions of the Einstein vacuum equations with cosmological constant $\Lambda$ of the type (21),

$$\hat g = -dt^2 + \underbrace{g_{ij}\,dx^i dx^j + e^{2\phi/d}\,\delta_{mn}\,dy^m dy^n}_{\hat g_{ab}\,dz^a dz^b}. \tag{30}$$

If $(\hat M, \hat g)$ is asymptotically simple, as is guaranteed for solutions sufficiently close to de Sitter, $\hat g_{ab}$ exhibits the asymptotics (26). It follows that

$$g_{ij} = e^{2Ht}\Big(g^{(0)}_{ij} + \sum_{m\ge2} g^{(m)}_{ij}\,e^{-mHt}\Big), \qquad \phi = dHt + \phi_{(0)} + \sum_{m\ge2}\phi_{(m)}\,e^{-mHt}. \tag{31}$$

From the $3+d$ split of $\hat k^a{}_b$,

$$\hat k^a{}_b\,\frac{\partial}{\partial z^a}\otimes dz^b = k^i{}_j\,\frac{\partial}{\partial x^i}\otimes dx^j + \kappa\,\delta^m_n\,\frac{\partial}{\partial y^m}\otimes dy^n, \tag{32}$$

we obtain in an analogous manner

$$k^i{}_j = -H\,\delta^i_j + \sum_{m\ge2} k^i{}_{j(m)}\,e^{-mHt}, \qquad \kappa = -H + \sum_{m\ge2}\kappa_{(m)}\,e^{-mHt}. \tag{33}$$

The evolution equation $\partial_t\hat g_{ab} = -2\hat g_{ac}\hat k^c{}_b$, cf. (25), reduces to

$$\partial_t g_{ik} = -2\,g_{ij}\,k^j{}_k, \qquad \partial_t\phi = -d\,\kappa, \tag{34}$$

hence $g^{(m)}_{ij}$ ($m \ge 2$) is determined recursively from $g^{(0)}_{ij}$ and $k^i{}_{j(l)}$, $l = 2\ldots m$, and $\kappa_{(m)} = mHd^{-1}\phi_{(m)}$. Reduction of the recursive algebraic system (27) yields

$$[3+d-m]\,H\,\sigma^i{}_{j(m)} = \mathrm{tf}P^i{}_{j(m)} + \sum_{p=2}^{m-2}\sigma^i{}_{j(p)}\big[\mathrm{tr}k_{(m-p)} + d\kappa_{(m-p)}\big], \tag{35a}$$

$$[(6+d)-m]\,H\,\mathrm{tr}k_{(m)} + 3H\,d\kappa_{(m)} = P_{(m)} + \sum_{p=2}^{m-2}\big[\mathrm{tr}k_{(p)} + d\kappa_{(p)}\big]\,\mathrm{tr}k_{(m-p)}, \tag{35b}$$

$$dH\,\mathrm{tr}k_{(m)} + [(3+2d)-m]\,H\,d\kappa_{(m)} = \rho_{(m)} + \sum_{p=2}^{m-2}\big[\mathrm{tr}k_{(p)} + d\kappa_{(p)}\big]\,d\kappa_{(m-p)}, \tag{35c}$$
where

$$P^i{}_j = R^i{}_j - \frac1d\,\nabla^i\phi\,\nabla_j\phi - \nabla^i\nabla_j\phi = \sum_{m\ge2} P^i{}_{j(m)}\,e^{-mHt} \tag{36}$$

and $\rho = -\nabla^i\nabla_i\phi - \nabla^i\phi\,\nabla_i\phi$ with an analogous expansion. (For the reduction it is useful to employ the warped product structure of the metric, see, e.g., [4].) The $m$th coefficient $P^i{}_{j(m)}$ is determined by the coefficients $g^{(l)}_{ij}$ and $\phi_{(l)}$, with $l = 0\ldots(m-2)$; $\rho_{(m)}$ by $\phi_{(l)}$, $l = 0\ldots(m-2)$. The determinant of the coefficient matrix of the l.h.s. of (35b), (35c) is $(m-[d+3])(m-2[d+3])$. The system (35) thus determines $\sigma^i{}_{j(m)}$, $\mathrm{tr}k_{(m)}$, $\kappa_{(m)}$ recursively except for $m = d+3$, $m = 2(d+3)$. In the case $m = 2(d+3)$ the system (35b), (35c) is complemented by the equation

$$
\begin{aligned}
2H(2+d)\big[\mathrm{tr}k_{(m)} + d\kappa_{(m)}\big] ={}& -\sum_{p=2}^{m-2}\sigma^i{}_{j(p)}\,\sigma^j{}_{i(m-p)} + \frac23\sum_{p=2}^{m-2}\mathrm{tr}k_{(p)}\,\mathrm{tr}k_{(m-p)} \\
& + \frac{d-1}{d}\sum_{p=2}^{m-2}d\kappa_{(p)}\,d\kappa_{(m-p)} + 2\sum_{p=2}^{m-2}d\kappa_{(p)}\,\mathrm{tr}k_{(m-p)} + P_{(m)} + \rho_{(m)},
\end{aligned}
\tag{35d}
$$

which is the reduced version of the constraint equation (27c).

The 0th and the $(3+d)$th coefficients are undetermined by the algebraic system; they represent the free data. When we decompose $\hat A_{ab}$, $\hat B_{ab}$, cf. (28), according to

$$\hat A_{ab}\,dz^a dz^b = A_{ij}\,dx^i dx^j + e^{2A}\,\delta_{mn}\,dy^m dy^n, \tag{37a}$$

$$\hat B_{ab}\,dz^a dz^b = B_{ij}\,dx^i dx^j + B\,\delta_{mn}\,dy^m dy^n, \tag{37b}$$

we obtain

$$g^{(0)}_{ij} = A_{ij}, \qquad \phi_{(0)} = dA, \qquad g^{(3+d)}_{ij} = B_{ij}, \qquad (e^{2\phi/d})_{(3+d)} = B. \tag{38}$$

Hereby the data must satisfy the following conditions:

$$A^{ij}B_{ij} + d\,B\,e^{-2A} = 0, \qquad \nabla^i B_{ij} - d\,B\,e^{-2A}\,\nabla_j A + d\,B_{ij}\,A^{ik}\,\nabla_k A = 0, \tag{39}$$

where $A^{ij}$ is the inverse of $A_{ij}$ and $\nabla_i$ refers to $A_{ij}$.

We now make use of the relation (21) to prove Theorems 1.1 and 1.2: An asymptotically simple solution $(\hat M, \hat g)$ of the type (30) uniquely corresponds to an asymptotically simple solution $(M, \tilde g, \varphi)$ representing power-law inflation,

$$\tilde g = e^{\phi}\big(-dt^2 + g_{ij}\,dx^i dx^j\big), \qquad \varphi = \kappa^{-1}\sqrt{\frac{d+2}{2d}}\;\phi. \tag{40}$$

Thus, the asymptotic behavior of $\tilde g_{\mu\nu}$ and $\varphi$ is uniquely determined by studying the (reduction of the) asymptotic expansions of $\hat g$. From the above analysis we obtain that the asymptotic behavior of $\tilde g_{\mu\nu}$ and $\varphi$ is given by the asymptotic series (31) of $g_{ij}$ and $\phi$. The coefficients in these series are determined via (34) through the coefficients $k^i{}_{j(l)}$ and $\kappa_{(l)}$, which are in turn determined recursively by (35). The remaining free data is specified as in (38).
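The determinant quoted for the coupled system (35b), (35c) can be verified by a direct expansion. The sketch below takes $(\mathrm{tr}k_{(m)},\ d\kappa_{(m)})$ as the unknowns and factors out the common $H$; this arrangement of the coefficient matrix is an assumption about how the determinant in the text is normalized.

```latex
% Coefficient matrix of the l.h.s. of (35b), (35c) in units of H,
% acting on the unknowns (tr k_(m), d kappa_(m)).
\begin{align*}
\det\begin{pmatrix} (6+d)-m & 3 \\ d & (3+2d)-m \end{pmatrix}
 &= \big[(6+d)-m\big]\big[(3+2d)-m\big] - 3d \\
 &= m^2 - 3(d+3)\,m + 2(d+3)^2 \\
 &= \big(m-[d+3]\big)\big(m-2[d+3]\big).
\end{align*}
```

The two roots $m = d+3$ and $m = 2(d+3)$ are exactly the orders at which the recursion fails to determine the coefficients, matching the free data at order $3+d$ and the use of the constraint (35d) at order $2(d+3)$.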

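Introducing a new time coordinate $\tilde t$ through $d\tilde t = \exp(dHt/2)\,dt$ and setting $\tilde x^i = x^i$, the metric (40) becomes

$$\tilde g = -\underbrace{e^{(\phi-dHt)}}_{\alpha^2}\,d\tilde t^{\,2} + \underbrace{e^{(\phi-dHt)}\Big(\frac{dH}{2}\Big)^{2+4/d}\tilde t^{\,2+4/d}\,h_{ij}}_{\tilde g_{ij}}\,d\tilde x^i d\tilde x^j, \tag{41}$$

where

$$\alpha^2 = \exp\Big[dA + \sum_{m\ge2}\phi_{(m)}\Big(\frac{dH}{2}\Big)^{-2m/d}\tilde t^{-2m/d}\Big], \tag{42a}$$

$$h_{ij} = A_{ij} + \sum_{m\ge2} g^{(m)}_{ij}\Big(\frac{dH}{2}\Big)^{-2m/d}\tilde t^{-2m/d}. \tag{42b}$$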
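The passage from (40) to (41) and (42) rests on integrating the definition of $\tilde t$. A minimal worked version follows; it assumes that the integration constant is chosen so that $\tilde t \to 0$ as $t \to -\infty$, a normalization that the text does not state explicitly but that reproduces the pure powers of $\tilde t$ in (41). The symbol $h_{ij}$ is used as in (42b).

```latex
% Assumption: integration constant fixed so that \tilde t -> 0 as t -> -infinity.
\begin{align*}
d\tilde t = e^{dHt/2}\,dt
 \;&\Longrightarrow\; \tilde t = \frac{2}{dH}\,e^{dHt/2},
 \qquad e^{Ht} = \Big(\frac{dH}{2}\,\tilde t\Big)^{2/d}, \\
e^{-mHt} &= \Big(\frac{dH}{2}\Big)^{-2m/d}\,\tilde t^{-2m/d}, \\
e^{\phi}\,dt^2 &= e^{\phi-dHt}\,d\tilde t^{\,2}, \\
e^{\phi}\,g_{ij} &= e^{\phi-dHt}\,e^{(d+2)Ht}\,h_{ij}
 = e^{\phi-dHt}\Big(\frac{dH}{2}\Big)^{2+4/d}\tilde t^{\,2+4/d}\,h_{ij},
 \qquad h_{ij} := e^{-2Ht}g_{ij}.
\end{align*}
```

Inserting the series (31) for $\phi - dHt$ and $e^{-2Ht}g_{ij}$ and replacing each $e^{-mHt}$ by the second line gives (42a) and (42b).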

To show Theorem 1.2 consider on the three-dimensional manifold $\mathcal{I}^+$ a Riemannian metric $A_{ij}$, a symmetric tensor $B_{ij}$, and fields $A$, $B$, that satisfy the condition (39). Defining $\hat A_{ab}$, $\hat B_{ab}$ according to (37) results in Cauchy data at conformal infinity $\hat{\mathcal{I}}^+ = \mathcal{I}^+ \times \mathbb{R}^d$ for the Einstein vacuum equations with positive cosmological constant in $(4+d)$ dimensions. The well-posedness of the corresponding Cauchy problem has been established in [1]. Reduction of this result yields Theorem 1.2.

It is straightforward to show from (35) that (41) together with (42) coincides with the homogeneous and isotropic solution (5) when $A = 1$, $A_{ij} = \delta_{ij}$, and $(e^{2\phi/d})_{(3+d)} = B = 0$, $g^{(3+d)}_{ij} = B_{ij} = 0$. Since the solution (5) uniquely corresponds to the $(4+d)$-dimensional de Sitter solution, nonlinear stability of the latter reduces to nonlinear stability of the former, which shows Theorem 1.1. Equivalently, Theorem 1.2 can be applied directly to obtain the result. Hence, the evolution of initial data sufficiently close to data characterizing the power-law inflation Robertson-Walker model yields a spacetime that is globally close to that model, and the spacetime possesses a metric $\tilde g$ and a scalar field $\varphi$ of the form (40), which exhibit the asymptotic expansion (31) in a future end of the spacetime.

5. Conclusions

In this paper it has been shown that in certain models of accelerated cosmological expansion homogeneous and isotropic solutions are stable under small nonlinear perturbations without any symmetry assumptions. These results concern the Einstein equations coupled to a nonlinear scalar field with a suitable exponential potential. They show that some known results for spacetimes with positive cosmological constant generalize to a situation where the acceleration of a cosmological model is due to the effect of a nonlinear scalar field.

For cosmological applications it would be desirable to incorporate a description of ordinary matter (galaxies and dark matter) into the models. It is expected that the source of the cosmological acceleration (dark energy, the cosmological constant or the scalar field) will dominate the dynamics at late times, but this should be proved rather than assumed. In this paper we were not able to include ordinary matter but note that this has not yet even been done for a perfect fluid or collisionless matter in the case of a
cosmological constant. It seems that in order to do this, methods will be needed which are more direct than those using conformal invariance properties.

Another direction in which the results should be extended is to nonlinear scalar fields with more general potentials. The case of an exponential potential with a general value of the exponent is discussed on the level of formal power series in Appendix B of this paper. The observation that a judicious choice of time coordinate can simplify the asymptotic expansions may be useful for later work using other methods. A similar discussion for a potential with a strictly positive lower bound is given in [2]. For wider classes of potentials the only mathematical theorems concern spatially homogeneous spacetimes of Bianchi types I-VIII, including normal matter [8, 10, 13].

Questions related to cosmic acceleration and dark energy play a key role in modern cosmology. They deserve the attention of researchers in mathematical relativity and we hope that this paper will contribute to the development of this area of mathematical physics.

A. Asymptotics in Gaussian Coordinates

The coordinates in which the asymptotic expansions have been given above are not Gaussian; in this section we investigate the asymptotic expansions in Gaussian coordinates.

We begin by showing that the spacetime admits Gauss coordinates that are global to the future. The metric and extrinsic curvature functions satisfy the following estimates:

$$|\tilde g_{ij}| \le C\,\tilde t^{\,2+4/d}, \qquad |\tilde g^{ij}| \le C\,\tilde t^{-2-4/d}, \qquad |\tilde\Gamma^i{}_{jk}| \le C, \tag{A.1a}$$

$$|\alpha - e^{dA/2}| \le C\,\tilde t^{-4/d}, \qquad |\partial_{\tilde t}\alpha| \le C\,\tilde t^{-1-4/d}, \qquad |\partial_i\alpha| \le C, \tag{A.1b}$$

$$|\tilde\sigma^i{}_j| \le C\,\tilde t^{-1-4/d}, \qquad \big|\mathrm{tr}\,\tilde k + (3+6/d)\,e^{-dA/2}\,\tilde t^{-1}\big| \le C\,\tilde t^{-1-4/d}. \tag{A.1c}$$
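The leading terms in (A.1b) and (A.1c) can be read off from (41) and (42). The sketch below assumes the sign convention $\partial_{\tilde t}\tilde g_{ij} = -2\alpha\,\tilde k_{ij}$ for the extrinsic curvature with lapse $\alpha$ (the lapse-one version of this convention appears in (A.15a)); the remainder terms are indicated only by their order, assuming bounded coefficients in the series (42).

```latex
% Assumed convention: \partial_{\tilde t}\tilde g_{ij} = -2\alpha\,\tilde k_{ij}.
\begin{align*}
\alpha^2 &= \exp\!\big[dA + O(\tilde t^{-4/d})\big]
 \;\Longrightarrow\; \alpha - e^{dA/2} = O(\tilde t^{-4/d}), \\
\mathrm{tr}\,\tilde k &= -\frac{1}{2\alpha}\,\tilde g^{ij}\partial_{\tilde t}\tilde g_{ij}
 = -\frac{1}{2\alpha}\,\partial_{\tilde t}\log\det\tilde g_{ij} \\
 &= -\frac{1}{2\alpha}\Big[\,3\Big(2+\frac4d\Big)\tilde t^{-1} + O(\tilde t^{-1-4/d})\Big]
 = -\Big(3+\frac6d\Big)e^{-dA/2}\,\tilde t^{-1} + O(\tilde t^{-1-4/d}).
\end{align*}
```

The factor $3(2+4/d)$ comes from the explicit power $\tilde t^{\,2+4/d}$ in each of the three diagonal directions of $\tilde g_{ij}$; all remaining $\tilde t$-dependence enters through series in $\tilde t^{-2m/d}$ with $m \ge 2$, whose derivatives are of order $\tilde t^{-1-4/d}$.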

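Lemma A.1. Consider a metric $\tilde g_{\mu\nu}\,d\tilde x^\mu d\tilde x^\nu = -\alpha^2 d\tilde t^{\,2} + \tilde g_{ij}\,d\tilde x^i d\tilde x^j$, cf. (41), which is given on a time interval $[T, \infty)$, and assume that there exists $C \in \mathbb{R}$ such that the estimates (A.1) hold. Consider a hypersurface $\tilde t = \tilde t_0$ and Gaussian coordinates based on that hypersurface. If $\tilde t_0$ is sufficiently large, then the Gaussian coordinates extend globally to the future.

Proof. Consider an affinely parametrized geodesic $\gamma(\tau)$ that is orthogonal to the hypersurface $\tilde t = \tilde t_0$. By a slight abuse of notation we write $(\tilde t, \tilde x)(\tau)$ for $\gamma(\tau)$;

$$\tilde t(\tau_0) = \tilde t_0, \qquad \tilde x^i(\tau_0) = \tilde x^i_0, \qquad \frac{d\tilde t}{d\tau}(\tau_0) = \alpha^{-1}(\tau_0), \qquad \frac{d\tilde x^i}{d\tau}(\tau_0) = 0. \tag{A.2}$$

It is a solution of the geodesic equations

$$\frac{d^2\tilde t}{d\tau^2} + \alpha^{-1}\partial_{\tilde t}\alpha\Big(\frac{d\tilde t}{d\tau}\Big)^2 + 2\alpha^{-1}\partial_i\alpha\,\frac{d\tilde x^i}{d\tau}\frac{d\tilde t}{d\tau} + \alpha^{-1}\tilde k_{ij}\,\frac{d\tilde x^i}{d\tau}\frac{d\tilde x^j}{d\tau} = 0, \tag{A.3a}$$

$$\frac{d^2\tilde x^i}{d\tau^2} + \alpha\,\partial^i\alpha\Big(\frac{d\tilde t}{d\tau}\Big)^2 - \frac23\,\alpha\,\mathrm{tr}\,\tilde k\,\frac{d\tilde x^i}{d\tau}\frac{d\tilde t}{d\tau} - 2\alpha\,\tilde\sigma^i{}_j\,\frac{d\tilde x^j}{d\tau}\frac{d\tilde t}{d\tau} + \tilde\Gamma^i{}_{jk}\,\frac{d\tilde x^j}{d\tau}\frac{d\tilde x^k}{d\tau} = 0. \tag{A.3b}$$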


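Let $\epsilon > 0$ be sufficiently small in comparison to $\alpha^{-1}(\tau_0)$ and $E > 0$, let $\zeta \in (1+2/d,\ 1+4/d)$, and consider the maximal interval $[\tau_0, \bar\tau)$ such that

$$\Big|\frac{d\tilde t}{d\tau} - \alpha^{-1}\Big| \le \epsilon, \qquad \tilde t^{\,\zeta}\,\Big|\frac{d\tilde x^i}{d\tau}\Big| \le E \tag{A.4}$$

on $[\tau_0, \bar\tau)$. Integrating (A.4) we infer that there exist constants $C_+ > C_- > 0$, such that

$$C_-(\tau - \tau_0) < \tilde t(\tau) - \tilde t_0 < C_+(\tau - \tau_0), \qquad |\tilde x^i(\tau) - \tilde x^i_0| < E\,C_-^{-1}(\zeta-1)^{-1}\,\tilde t_0^{\,1-\zeta} \tag{A.5}$$

on $[\tau_0, \bar\tau)$. (In fact, $C_-$ can be improved (iteratively) by redefining $C_-$ as $C_- = \min_{\tilde x\in D} e^{-dA/2} - \mathrm{const}\ \tilde t_0^{-4/d} - \epsilon$, where $D$ is the domain in $\tilde x^i$ specified by (A.5), and analogously for $C_+$: $C_+ = \max_{\tilde x\in D} e^{-dA/2} + \mathrm{const}\ \tilde t_0^{-4/d} + \epsilon$.)

Making use of (A.1) the geodesic equation (A.3a) yields

$$\frac{d^2\tilde t}{d\tau^2} = \iota(\tau) \quad\text{where}\quad |\iota(\tau)| \le \mathrm{const}\ \tilde t^{\,1-2\zeta+4/d}, \tag{A.6}$$

and by integration

$$\Big|\frac{d\tilde t}{d\tau}(\tau) - \alpha^{-1}(\tau_0)\Big| \le \mathrm{const}\ \tilde t_0^{\,2(1+2/d-\zeta)}, \quad\text{hence}\quad \Big|\frac{d\tilde t}{d\tau} - \alpha^{-1}\Big| \le \mathrm{const}\ \tilde t_0^{\,2(1+2/d-\zeta)}, \tag{A.7}$$

where the constants depend on $\epsilon$, $E$, but are independent of $\tilde t_0$. The geodesic equation (A.3b) can be treated by noting that

$$\alpha\,\mathrm{tr}\,\tilde k = (\alpha - e^{dA/2})\,\mathrm{tr}\,\tilde k + e^{dA/2}\Big[\mathrm{tr}\,\tilde k + \Big(3+\frac6d\Big)e^{-dA/2}\,\tilde t^{-1}\Big] - \Big(3+\frac6d\Big)\tilde t^{-1}. \tag{A.8}$$

By (A.1) we obtain

$$\frac{d^2\tilde x^i}{d\tau^2} + \Big(2+\frac4d\Big)\tilde t^{-1}\,\frac{d\tilde t}{d\tau}\frac{d\tilde x^i}{d\tau} = \varsigma(\tau) \quad\text{where}\quad |\varsigma(\tau)| \le \mathrm{const}\ \tilde t^{-2-4/d}; \tag{A.9}$$

integration leads to

$$\Big|\frac{d\tilde x^i}{d\tau}\Big| \le \mathrm{const}\,\big[\tilde t^{-1-4/d} - \tilde t_0\,\tilde t^{-2-4/d}\big] \quad\text{and}\quad \tilde t^{\,\zeta}\,\Big|\frac{d\tilde x^i}{d\tau}\Big| \le \mathrm{const}\ \tilde t_0^{\,-1-4/d+\zeta}. \tag{A.10}$$

If $\tilde t_0$ is sufficiently large, then $\mathrm{const}\ \tilde t_0^{\,2(1+2/d-\zeta)} < \epsilon$ and $\mathrm{const}\ \tilde t_0^{\,-1-4/d+\zeta} < E$, hence (A.7) and (A.10) improve (A.4) on $[\tau_0, \bar\tau)$. Since $\bar\tau$ was chosen maximal, $\bar\tau$ must be infinite, and the above estimates hold globally. In particular, from (A.10),

$$\Big|\frac{d\tilde x^i}{d\tau}\Big| \le \mathrm{const}\ \tilde t^{-1-4/d} \quad\text{and}\quad \Big|\frac{d\tilde t}{d\tau} - \alpha^{-1}\Big| \le \mathrm{const}\ \tilde t^{-4/d}, \tag{A.11}$$

where the second inequality results from the fact that the geodesic is affinely parametrized. Global existence of the timelike geodesics orthogonal to the hypersurface $\tilde t = \tilde t_0$ has thus been established; the asymptotic properties are given by (A.11).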


To show that the family of geodesics gives rise to a global Gaussian coordinate system we investigate geodesic deviation. Consider the geodesic $\gamma(\tau)$ and let $d^\mu_j$ be the deviation vector between $\gamma$ and an infinitesimally neighboring geodesic in the direction $\partial_j$, i.e.,

$$d^0_j(\tau_0) = 0, \qquad d^i_j(\tau_0) = \delta^i_j, \qquad \frac{d\,d^0_j}{d\tau}(\tau_0) = -(\alpha^{-2}\partial_j\alpha)(\tau_0), \qquad \frac{d\,d^i_j}{d\tau}(\tau_0) = 0. \tag{A.12}$$

In analogy to (A.4) we can assume that $|d\,d^0_j/d\tau + \alpha^{-2}\partial_j\alpha| \le \epsilon$ and $\tilde t^{\,\zeta}\,|d\,d^i_j/d\tau| \le E$ on a maximal interval $[\tau_0, \bar\tau)$. Along the lines of the above argument, by using derivatives of the geodesic equations (A.3) w.r.t. spatial variables we can improve these inequalities to obtain $\bar\tau = \infty$, and we get that

$$\Big|\frac{d\,d^i_j}{d\tau}\Big| \le \mathrm{const}\ \tilde t^{-1-4/d} \quad\text{so that}\quad |d^i_j - \delta^i_j| \le \mathrm{const}\ \tilde t_0^{-4/d} \tag{A.13}$$

on $[\tau_0, \infty)$. For large $\tilde t$ the component $d^0_j$ will increase linearly in $\tilde t$. In line with the existing gauge freedom we may redefine $d^\mu_j$,

$$\tilde d^\mu_j = \begin{pmatrix} d^0_j \\ d^i_j \end{pmatrix} + \lambda(\tilde t)\begin{pmatrix} d\tilde t/d\tau \\ d\tilde x^i/d\tau \end{pmatrix} = \begin{pmatrix} 0 \\ \delta^i_j + \mathrm{const}\,[\tilde t_0^{-4/d} + \tilde t^{-4/d}] \end{pmatrix} \tag{A.14}$$

for a suitable choice of $\lambda(\tilde t)$. We infer that the deviation vector behaves in a nice manner, at least for $\tilde t_0$ sufficiently large, so that the family of geodesics originating from $\tilde t = \tilde t_0$ generates a Gaussian coordinate system that is global in the future.

Let $\{\tau, \tilde x^i\}$ denote the Gaussian coordinate system constructed above. In these coordinates the field equations take the form

$$\partial_\tau\tilde g_{ij} = -2\tilde k_{ij}, \qquad \partial_\tau\tilde k^i{}_j = \tilde R^i{}_j + (\mathrm{tr}\,\tilde k)\,\tilde k^i{}_j - 8\pi S^i{}_j + 4\pi\,\delta^i_j\,(\mathrm{tr}S - \rho), \tag{A.15a}$$

$$-\partial_\tau^2\varphi + (\mathrm{tr}\,\tilde k)\,\partial_\tau\varphi + \tilde\Delta\varphi = V'(\varphi), \tag{A.15b}$$

where $S_{ij}$ and $\rho$ stem from the scalar field energy-momentum tensor (2), i.e., $\rho = T_{00}$, $S_{ij} = T_{ij}$. It can be shown by example that series of the type

$$\tilde g_{ij} = \tau^{2+4/d}\sum_{m\ge0} g^{(m)}_{ij}\,\tau^{-2m/d}, \qquad \varphi = [2\kappa^{-2}(d+2)/d]^{1/2}\log\tau + \sum_{m\ge0}\varphi_{(m)}\,\tau^{-2m/d} \tag{A.16}$$

do not provide solutions of (A.15a) in general. The asymptotic expansions of $\tilde g$, $\varphi$, etc. necessarily include logarithmic terms, i.e.,

$$\tilde g_{ij} = \tau^{2+4/d}\sum_{m\ge0}\sum_{l=0}^{L_m} g^{(m,l)}_{ij}\,(\log\tau)^l\,\tau^{-2m/d}, \quad\text{etc.}, \tag{A.17}$$

where $L_m \in \mathbb{N}\ \forall m$. It turns out that $L_m = 0$ for all $m < d/2$ in the case $d = 4k$, $k \in \mathbb{N}$, and $L_m = 0$ for all $m < 3+d$ in the case $d = 2(2k+1)$, $k \in \mathbb{N}$.


It is interesting to contrast (A.17) and (7): in Gaussian coordinates the asymptotic expansions contain logarithmic terms in general, however, by the use of a suitable time coordinate these logarithmic terms can be removed. In Appendix B we investigate whether this statement can be generalized, on the level of formal power series, for arbitrary exponents d. B. Formal Asymptotic Expansions with General Exponents For a spacetime (M, g, ˜ ϕ) consider the Einstein scalar field equations with potential

√ d V (ϕ) = V0 exp −κ 2 ϕ with d ∈ R. (B.1) d +2 In [9] it was shown that the equations admit power series as formal solutions. The analysis was performed in Gaussian coordinates {τ, x˜ i }, so that g˜ = −dτ 2 + g˜i j d x˜ i d x˜ j , and it was found that (m) g˜ i j = τ 2+4/d gi j τ −2m/d , ϕ = [2κ −2 (d + 2)/d]1/2 log τ + ϕ(m) τ −2m/d , m∈M

m∈M

(B.2) where M is the set {0} ∪ {(d/2)n 1 + 2n 2 + (3 + d)n 3 | n i ∈ N}. However, the series contain logarithmic terms, cf. (A.17), if there exists n 1 , n 2 ∈ N such that (d/2)n 1 +2n 2 = (3+d), which is the case when d = n or d = 6/n, n ∈ N. The problem simplifies considerably when we make the ansatz φ 2 i j −1 d + 2 g˜ = e −dt + gi j d x d x , ϕ = κ φ, (B.3) 2d which is inspired by (21). Let k ij denote the second fundamental form of the hypersurfaces t = const in the spacetime (M, −dt 2 + g), and σ ij its trace-free part. Then the Einstein scalar field (evolution) equations become ∂t gi j = −2gil k lj and 1 1 k 1 k l tf l l l l ∇ φ∇ j φ − ∇ φ∇k φ − ∇ ∇ j φ − ∇ ∇k φ , ∂t σ j = R j + (trk)σ j − d 3 3 (B.4a) 2 1 − ∂t φ trk − ∇ k φ∇k φ − ∇ k ∇k φ, (B.4b) ∂t trk = R + (trk)2 − κ 2 V0 d +2 d k 2 2 2d V0 , φ + ∇ φ∇k φ − (∂t φ) = −κ (B.4c) d +2 where R and ∇ refer to g. It can be proved that this system of equations, complemented by the constraints, admits power series of the following type as formal solutions:

\[ g_{ij} = e^{2Ht}\Big( g^{(0)}_{ij} + \sum_{m\in M} g^{(m)}_{ij}\, e^{-mHt} \Big), \qquad \varphi = dHt + \varphi_{(0)} + \sum_{m\in M} \varphi_{(m)}\, e^{-mHt}, \tag{B.5} \]
cf. (31), where $H^2 = 2V_0\kappa^2 (d+2)^{-1}(d+3)^{-1}$ and $M = \{2m_1 + (3+d)m_2 \mid m_i \in \mathbb{N}\}$. The recursive algebraic system specifying the coefficients is essentially identical with (35),


the free data is represented by the 0th and the (3 + d)th coefficients. In this context the exponent d ∈ R is still not completely arbitrary, though: for d = 2k + 1, k ∈ N, the expansions (B.5) must be supplemented by logarithmic terms. These results suggest that there exist two types of logarithmic terms in formal expansions: (i) artificial logarithms which are due to an unsuitable choice of coordinates and can be removed by using different coordinates, and (ii) genuine logarithms. In our particular case we have seen that, of the logarithmic terms in [9] which appear for d = n and d = 6/n, n ∈ N, only those in the case d = 2k + 1, k ∈ N can be genuine.

References

1. Anderson, M.T.: Existence and stability of even dimensional asymptotically de Sitter spaces. Ann. Henri Poincaré 6, 801–820 (2005)
2. Bieli, R.: Algebraic expansions for curvature coupled scalar field models. Class. Quant. Grav. 22, 4363–4376 (2005)
3. Choquet-Bruhat, Y., DeWitt-Morette, C.: Analysis, Manifolds and Physics. Part 2. Amsterdam: North-Holland, 1989
4. Dobarro, F., Ünal, B.: Curvature of multiply warped products. J. Geom. Phys. 55(1), 75–106 (2005)
5. Friedrich, H.: Existence and structure of past asymptotically simple solutions of Einstein’s field equations with positive cosmological constant. J. Geom. Phys. 3, 101–117 (1986)
6. Friedrich, H.: On the Existence of n-Geodesically Complete or Future Complete Solutions of Einstein’s Field Equations with Smooth Asymptotic Structure. Commun. Math. Phys. 107, 587–609 (1986)
7. Halliwell, J.J.: Scalar fields in cosmology with an exponential potential. Phys. Lett. B 185, 341–344 (1987)
8. Lee, H.: The Einstein-Vlasov system with a scalar field. Ann. Henri Poincaré 6, 697–723 (2005)
9. Müller, V., Schmidt, H.J., Starobinsky, A.A.: Power-law inflation as an attractor solution for inhomogeneous cosmological models. Class. Quant. Grav. 7, 1163–1168 (1990)
10. Rendall, A.D.: Accelerated cosmological expansion due to a scalar field whose potential has a positive lower bound. Class. Quant. Grav. 21, 2445–2454 (2004)
11. Rendall, A.D.: Asymptotics of Solutions of the Einstein Equations with Positive Cosmological Constant. Ann. Henri Poincaré 5, 1041–1064 (2004)
12. Rendall, A.D.: Mathematical properties of cosmological models with accelerated expansion. In: Lect. Notes in Phys. 692, Berlin-Heidelberg-New York: Springer, 2006, pp. 141–155
13. Rendall, A.D.: Intermediate inflation and the slow-roll approximation. Class. Quant. Grav. 22, 1655–1666 (2005)
14. Tchapnda, S.B., Rendall, A.D.: Global existence and asymptotic behaviour in the future for the Einstein-Vlasov system with positive cosmological constant. Class. Quant. Grav. 20, 3037–3049 (2003)

Communicated by G.W. Gibbons

Commun. Math. Phys. 269, 17–37 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0108-z

Communications in

Mathematical Physics

Large-Time Behavior of Solutions for the Boltzmann Equation with Hard Potentials

Ming-Yi Lee1, Tai-Ping Liu2, Shih-Hsien Yu3

1 Institute of Mathematics, Academia Sinica, Taipei, Taiwan. E-mail: [email protected]
2 Institute of Mathematics, Academia Sinica, Taipei, Taiwan and Mathematics Department, Stanford University, Stanford, CA 94305, USA. E-mail: [email protected]
3 Mathematics Department, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, SAR. E-mail: [email protected]

Received: 10 December 2005 / Accepted: 11 April 2006
Published online: 12 September 2006 – © Springer-Verlag 2006

Abstract: We study the quantitative behavior of the solutions of the one-dimensional Boltzmann equation for hard potential models with Grad’s angular cutoff. Our results generalize those of [5] for hard sphere models. The main difference between hard sphere and hard potential models is in the exponent of the collision frequency $\nu(\xi) \approx (1+|\xi|)^{\gamma}$. This gives rise to new wave phenomena, particularly the sub-exponential behavior of waves. Unlike the hard sphere models, the spectrum of the Fourier operator $-i\xi^1\eta + L$ is non-analytic in $\eta$ for hard potential models. Thus the complex analytic methods for inverting the Fourier transform are not applicable and we need to use the real analytic method in the estimates of the fluidlike waves. We devise a new weighted energy function to account for the sub-exponential behavior of waves.

1. Introduction

Consider the one-dimensional Boltzmann equation
\[ \partial_t f(x,t,\xi) + \xi^1 \partial_x f(x,t,\xi) = Q(f,f), \]
\[ Q(f,g)(\xi) \equiv \frac{1}{2} \int_{\mathbb{R}^3} \int_0^{2\pi} \int_0^{\pi/2} \{ f(\xi_*')g(\xi') + f(\xi')g(\xi_*') - f(\xi_*)g(\xi) - f(\xi)g(\xi_*) \}\, B(\theta,V)\, d\theta\, d\varepsilon\, d\xi_*, \]
in which the microscopic velocity $\xi = (\xi^1, \xi^2, \xi^3)$, $V = \xi_* - \xi$, $\xi' = \xi + \alpha(\alpha\cdot V)$, $\xi_*' = \xi_* - \alpha(\alpha\cdot V)$, $\alpha = (\cos\theta, \sin\theta\cos\varepsilon, \sin\theta\sin\varepsilon)$,


and $B(\theta,V)$ is (the collision cross section)$\cdot|V|$. For the hard sphere model, $B(\theta,V) = |V|\cos\theta\sin\theta$. We are interested in the hard potential models with an angular cutoff in the sense of Grad [1]:
\[ B(\theta,V) \le c\,|V|^{\gamma}\,|\sin\theta\cos\theta|, \tag{1.1} \]
for some constants $\gamma$, $0 < \gamma < 1$, and $c > 0$. In this paper, we consider the linearized Boltzmann equation
\[ \partial_t g + \xi^1\partial_x g - Lg = 0, \qquad g(x,0,\xi) \equiv g_{in}(x,\xi), \tag{1.2} \]
with bounded and compactly supported initial data:
\[ g_{in}(x,\xi) \equiv 0 \ \text{for } |x| \ge 1, \qquad |||g_{in}||| \equiv \sup_{x\in\mathbb{R},\,\xi\in\mathbb{R}^3} |g_{in}| \le 1. \tag{1.3} \]
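The collision kinematics introduced with $Q(f,g)$ can be checked numerically. The following is a minimal sketch, assuming the post-collision velocities are $\xi' = \xi + \alpha(\alpha\cdot V)$ and $\xi_*' = \xi_* - \alpha(\alpha\cdot V)$ with $V = \xi_* - \xi$; the function names and sample values are illustrative and not taken from the paper. It verifies that momentum and kinetic energy are conserved for a unit vector $\alpha$.

```python
import math

def collide(xi, xi_star, theta, eps):
    """Post-collision velocities for the parametrization alpha(theta, eps)."""
    alpha = (math.cos(theta),
             math.sin(theta) * math.cos(eps),
             math.sin(theta) * math.sin(eps))              # unit vector
    V = tuple(b - a for a, b in zip(xi, xi_star))          # V = xi_* - xi
    aV = sum(a * v for a, v in zip(alpha, V))              # alpha . V
    xi_p = tuple(x + a * aV for x, a in zip(xi, alpha))    # xi'   = xi   + alpha (alpha . V)
    xi_star_p = tuple(x - a * aV for x, a in zip(xi_star, alpha))  # xi_*' = xi_* - alpha (alpha . V)
    return xi_p, xi_star_p

def momentum(u, w):
    return tuple(a + b for a, b in zip(u, w))

def energy(u, w):
    return sum(a * a for a in u) + sum(b * b for b in w)

# Arbitrary sample velocities and angles (illustrative values only).
xi, xi_star = (0.3, -1.2, 0.5), (1.1, 0.4, -0.7)
xi_p, xi_star_p = collide(xi, xi_star, theta=0.8, eps=2.1)

momentum_defect = max(abs(a - b) for a, b in
                      zip(momentum(xi, xi_star), momentum(xi_p, xi_star_p)))
energy_defect = abs(energy(xi, xi_star) - energy(xi_p, xi_star_p))
print(momentum_defect, energy_defect)
```

Both defects vanish up to rounding, which reflects the five collision invariants that later span the null space of the linearized operator.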

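Here $L$ is the collision operator linearized around the global Maxwellian state $M$,
\[ Lf \equiv 2M^{-1/2} Q(M, M^{1/2} f), \qquad M(\xi) = (2\pi)^{-3/2} e^{-|\xi|^2/2}. \]
$L$ can be represented as $L = -\nu I + K$ with
\[ \nu(\xi) = 2\pi \int_{\mathbb{R}^3} \int_0^{\pi/2} B(\theta, \eta-\xi)\, M(\eta)\, d\theta\, d\eta \]
and
\[ \begin{cases} K f(\xi) = \int_{\mathbb{R}^3} k(\xi,\eta) f(\eta)\, d\eta, \quad k = -k_1 + k_2, \\ k_1(\xi,\eta) = 2\pi M(\xi)^{1/2} M(\eta)^{1/2} \int_0^{\pi/2} B(\theta,V)\, d\theta, \\ k_2(\xi,\eta) = 2(2\pi)^{-3/2} v^{-2} \exp\{-\tfrac{1}{8} v^2 - \tfrac{1}{2}\zeta_1^2\} \int \exp\{-\tfrac{1}{2}|w+\zeta_2|^2\}\, q(v,w)\, dw, \end{cases} \tag{1.4} \]
in which $v = \eta - \xi = \alpha(\alpha\cdot V)$, $w = V - \alpha(\alpha\cdot V)$, $\zeta_1 = v\,(v\cdot\tfrac{1}{2}(\xi+\eta))$, $\zeta_2 = \tfrac{1}{2}(\xi+\eta) - \zeta_1$, $q(v,w) = 2|\sin\theta|^{-1}\{B(\theta,V) + B(\tfrac{\pi}{2}-\theta, V)\}$. From the expression of $\nu(\xi)$,
\[ \nu_0 (1+|\xi|)^{\gamma} \le \nu(\xi) \le \nu_1 (1+|\xi|)^{\gamma}, \tag{1.5} \]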

for some positive constants $\nu_0$ and $\nu_1$, and $\gamma$ is the same as in (1.1). The exponent $\gamma$ equals 1 for the hard sphere models and is between 0 and 1 for the hard potential models with an angular cutoff. The linearized collision operator $L$ defines an unbounded symmetric operator on the standard Hilbert space [3],
\[ L^2_\xi \equiv \big( L^2(\mathbb{R}^3), \|\cdot\|_{L^2_\xi} \big), \qquad \|h\|^2_{L^2_\xi} \equiv \int_{\mathbb{R}^3} h(\xi)^2\, d\xi, \quad \text{and} \quad \langle f, g\rangle \equiv \int_{\mathbb{R}^3} f(\xi) g(\xi)\, d\xi. \]


The null space of $L$ is a five-dimensional vector space with the orthogonal basis $\chi_i$, $i = 0, 1, \dots, 4$:
\[ \begin{cases} \chi_0 \equiv M^{1/2}, \\ \chi_i \equiv \xi^i M^{1/2}, \quad i = 1, 2, 3, \\ \chi_4 \equiv \frac{1}{\sqrt{6}} (|\xi|^2 - 3) M^{1/2}, \end{cases} \qquad \ker(L) \equiv \mathrm{span}\{\chi_0, \chi_1, \chi_2, \chi_3, \chi_4\}. \]
For the one-dimensional problem considered here, by a shift of the variables $\xi^2$ and $\xi^3$, we can restrict the function space to
\[ L^2_\xi \equiv \{ g \in L^\infty(\mathbb{R}^3) : \langle \xi^2 M^{1/2}, g\rangle = 0,\ \langle \xi^3 M^{1/2}, g\rangle = 0,\ \|g\|_{L^2_\xi} < \infty \}. \]
We decompose the Hilbert space $L^2_\xi = \ker(L) \oplus \ker(L)^\perp$: for any $g \in L^2_\xi$,
\[ \begin{cases} g \equiv P_0 g + P_1 g \ (\equiv g_0 + g_1), \\ P_0 g \equiv \langle\chi_0, g\rangle\chi_0 + \langle\chi_1, g\rangle\chi_1 + \langle\chi_4, g\rangle\chi_4, \\ P_1 g \equiv g - P_0 g. \end{cases} \]
The characteristic information of the Euler equation is connected to the operator $P_0\xi^1$ on $P_0 L^2_\xi$ [6]: $P_0 \xi^1 E_i = \lambda_i E_i$ for $i = 1, 2, 3$,
\[ \begin{cases} \{\lambda_1 = -\sqrt{5/3},\ \lambda_2 = 0,\ \lambda_3 = \sqrt{5/3}\}, \\ E_1 \equiv (\sqrt{3/2}\,\chi_0 - \sqrt{5/2}\,\chi_1 + \chi_4), \\ E_2 \equiv (-\sqrt{2/3}\,\chi_0 + \chi_4), \\ E_3 \equiv (\sqrt{3/2}\,\chi_0 + \sqrt{5/2}\,\chi_1 + \chi_4), \\ \langle E_i, E_j\rangle = \delta^i_j \ \text{(Kronecker's delta function)}. \end{cases} \tag{1.6} \]
To obtain the quantitative and pointwise behavior of solutions of the linearized Boltzmann equation, we first study the Green's function $G(x,t,\xi;\xi_0)$:
\[ \begin{cases} \partial_t G + \xi^1\partial_x G = LG, \\ G(x,0,\xi;\xi_0) = \delta(x)\,\delta^3(\xi - \xi_0). \end{cases} \tag{1.7} \]
The Green's function yields the representation formula for the general solutions of the initial value problem of the linearized Boltzmann equation. It is also a basic tool for studying the full Boltzmann equation (1.2) through Duhamel's principle. In the present paper, however, we concentrate only on the more essential part of the linearized equation. The Green's function consists of particle-like and fluid-like waves. It has the following expression by Fourier transform:
\[ G(x,t) \equiv \frac{1}{2\pi} \int_{\mathbb{R}} e^{i\eta x + (-i\xi^1\eta + L)t}\, d\eta. \]
This expression is useful for studying the fluid-like waves, which correspond to long waves with $|\eta|$ small. Since the eigenvalues of the operator $-i\xi^1\eta + L$ are not analytic in $\eta$ at the origin, we cannot use the complex analytic techniques of [5] to estimate the long waves. Instead we use the method of Fourier analysis and obtain an estimate that, though not exact, gives an essentially exponential decay inside the fluid region $|x| \le 2\lambda_3 t$. This is partly motivated by [4].
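The characteristic structure in (1.6) can be checked with a few lines of linear algebra. The sketch below assumes the matrix of $P_0\xi^1P_0$ in the basis $(\chi_0, \chi_1, \chi_4)$ is assembled from the standard Gaussian moments of the Maxwellian $M$; the variable names are illustrative. It confirms the eigenvalues $\mp\sqrt{5/3}$, $0$ and the eigenvector coefficients as printed in (1.6), and it reports their Gram matrix, which is diagonal with entries $5$, $5/3$, $5$, so an orthonormal set is obtained after dividing each vector by its norm.

```python
import numpy as np

# Moments of the standard Maxwellian (each velocity component is N(0, 1)).
m2, m4, m22 = 1.0, 3.0, 1.0          # E[x1^2], E[x1^4], E[x1^2 x2^2]

# Entries of P0 xi^1 P0 in the basis (chi_0, chi_1, chi_4).
a01 = m2                                           # <chi_0, xi^1 chi_1>
a14 = (m4 + 2.0 * m22 - 3.0 * m2) / np.sqrt(6.0)   # <chi_1, xi^1 chi_4> = sqrt(2/3)
A = np.array([[0.0, a01, 0.0],
              [a01, 0.0, a14],
              [0.0, a14, 0.0]])

lam = np.sqrt(5.0 / 3.0)
lams = {1: -lam, 2: 0.0, 3: lam}
# Coefficient vectors of E_1, E_2, E_3 as printed in (1.6).
E = {
    1: np.array([np.sqrt(1.5), -np.sqrt(2.5), 1.0]),
    2: np.array([-np.sqrt(2.0 / 3.0), 0.0, 1.0]),
    3: np.array([np.sqrt(1.5), np.sqrt(2.5), 1.0]),
}

# Eigen-residuals |A E_i - lambda_i E_i| and Gram matrix <E_i, E_j>.
residuals = {i: np.linalg.norm(A @ E[i] - lams[i] * E[i]) for i in E}
gram = np.array([[E[i] @ E[j] for j in (1, 2, 3)] for i in (1, 2, 3)])
spectrum = np.linalg.eigvalsh(A)     # ascending order

print(residuals)
print(gram)
```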


Outside the fluid region, $|x| \ge 2\lambda_3 t$, we obtain the estimates of particlelike waves using the Picard iterations and the mixture lemma for the hard potential models, following the framework of [5, 7]. However, there is another main difference with [5] in that the tail behavior is exponential with exponent $\nu_0 t + \nu_1 |x|^{\gamma} t^{1-\gamma}$, which is not linear in the space variable $x$ and the time variable $t$. This is a consequence of the fact that the collision frequency $\nu(\xi) \approx (1+|\xi|)^{\gamma}$ is sublinear, $0 < \gamma < 1$. A new weighted energy function motivated by [2] is used to yield estimates of the form $e^{-C(t+|x|)^{2/(3-\gamma)}}$, expressing the wave behavior of the almost exponentially decaying fluidlike waves and the sub-exponentially decaying particlelike waves. We now state the main theorem:

Theorem 1. For any given positive integer $N$, there exist positive constants $C_j$, $j = 0, 1, \dots, 4$, such that the solution of (1.2) satisfies

(I) for $|x| \le 2\lambda_3 t$,
\[ \|\mathbb{G}^t g_{in}(x)\|_{L^2_\xi} \le O(1)\,|||g_{in}|||\, \Big( C_0 \sum_{i=1}^{3} (1+t)^{-1/2} \Big(1 + \frac{|x-\lambda_i t|^2}{1+t}\Big)^{-N} + e^{-(|x|+t)/C_1} \Big), \]

(II) for $|x| > 2\lambda_3 t$,
\[ \|\mathbb{G}^t g_{in}(x)\|_{L^2_\xi} \le C_2\, |||g_{in}|||\, \Big( e^{-(\nu_0 t + \nu_1 |x|^{\gamma} t^{1-\gamma})/C_3} + e^{-C_4 (t+|x|)^{2/(3-\gamma)}} \Big). \]

In Sect. 2, we review the basic results for the linearized collision operator and the spectrum analysis which plays an essential role in our study of long-wave approximation. In Sect. 3 we show that the spectrum of the operator $-i\xi^1\eta + L$ is smooth at the origin and get the estimate of long waves by real analysis. The construction of particlelike waves in Sect. 4 through the Picard iterations of the operator in (4.2) aims at identifying the waves with singular behavior. The Mixture Lemma is such that the iteration yields increasingly more regular waves. Combining the analysis in Sect. 3 and the smoothness of the remainder waves in Sect. 4, the construction of $\mathbb{G}^t g_{in}$ within the wave region $|x| \le 2\lambda_3 t$ is obtained. The estimates outside the fluid region are obtained by the weighted energy method in Sect. 5.

2. Preliminaries

Consider the Hilbert space $L^2_\xi$ given by the inner product:
\[ L^2_\xi \equiv \{ g \in L^\infty(\mathbb{R}^3) : \langle \xi^2 M^{1/2}, g\rangle = 0,\ \langle \xi^3 M^{1/2}, g\rangle = 0,\ \|g\|_{L^2_\xi} < \infty \}, \]

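\[ \langle h_1, h_2\rangle \equiv \int_{\mathbb{R}^3} h_1(\xi) h_2(\xi)\, d\xi. \]
We introduce the following two sets of measures.

Measures locally in $(x,t)$:
\[ \begin{cases} \|g\|_{L^2_\xi} \equiv \big( \int_{\mathbb{R}^3} g(\xi)^2\, d\xi \big)^{1/2}, \\ \|g\|_{L^\infty_{\xi,\beta}} \equiv \sup_{\xi\in\mathbb{R}^3} |g(\xi)| (1+|\xi|)^{\beta}. \end{cases} \]
Measures locally in $t$:
\[ \begin{cases} \|h\|_{L^\infty_x(L^2_\xi)} \equiv \sup_{x\in\mathbb{R}} \|h(x,\cdot)\|_{L^2_\xi}, \\ \|h\|_{L^2_x(L^2_\xi)} \equiv \big( \int_{\mathbb{R}}\int_{\mathbb{R}^3} h(x,\xi)^2\, d\xi\, dx \big)^{1/2}, \\ \|h\|_{H^i_x(L^2_\xi)} \equiv \big( \sum_{j=0}^{i} \int_{\mathbb{R}}\int_{\mathbb{R}^3} |\partial_x^j h(x,\xi)|^2\, d\xi\, dx \big)^{1/2}, \\ \|h\|_{L^\infty_x(L^\infty_{\xi,\beta})} \equiv \sup_{x\in\mathbb{R}} \|h(x,\cdot)\|_{L^\infty_{\xi,\beta}}. \end{cases} \]
The following three basic properties of the linear collision operator $L$ can be found in [1], except for Lemma 2, which is a direct consequence of (1.4).

Lemma 1. The linearized collision operator $L$ is self-adjoint and nonpositive definite. Moreover, there exists $\kappa > 0$ such that, for any $j \in \ker(L)^\perp$, i.e., $\langle\chi_i, j\rangle = 0$ for $i = 0, \dots, 4$,
\[ \langle j, Lj\rangle \le -\kappa \|j\|^2_{L^2_\xi}. \]

Lemma 2. Set
\[ K_\xi g(\xi) \equiv \int_{\mathbb{R}^3} \partial_\xi k(\xi,\xi_*)\, g(\xi_*)\, d\xi_*, \qquad K_{\xi_*} g(\xi) \equiv \int_{\mathbb{R}^3} \partial_{\xi_*} k(\xi,\xi_*)\, g(\xi_*)\, d\xi_*. \]
Then the operator $K$ is compact and the operators $K_\xi$ and $K_{\xi_*}$ are bounded operators in $L^2_\xi$.

The first estimate in the following lemma is useful to prove the properties of the iteration of operators and the second estimate shows that the operator $K$ is pointwise bounded on the space $L^2_\xi$.

Lemma 3. For any $\beta > 0$ there exist positive constants $C(\beta)$ and $C$ such that
\[ \|Kj\|_{L^\infty_{\xi,\beta+1}} \le C(\beta) \|j\|_{L^\infty_{\xi,\beta}}, \qquad \|Kj\|_{L^\infty_{\xi,0}} \le C \|j\|_{L^2_\xi}. \]

Take the Fourier transformation of (1.2) in the $x$-variable; here the $\xi$-variable dependence is not spelled out:
\[ \begin{cases} \hat g(\eta,t) \equiv \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-i\eta x} g(x,t)\, dx, \\ \partial_t \hat g + i\xi^1\eta\, \hat g - L\hat g = 0, \quad \hat g(\eta,0) = \hat g_{in}(\eta) < \infty. \end{cases} \]
From this we can formally write the solution $g(x,t,\xi)$ as follows:
\[ \hat g(\eta,t) = e^{(-i\xi^1\eta+L)t}\, \hat g_{in}(\eta), \]
\[ g(x,t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{i\eta x + (-i\xi^1\eta+L)t}\, \hat g_{in}(\eta)\, d\eta = \int_{\mathbb{R}} G(x-y,t)\, g_{in}(y)\, dy, \]
\[ G(x,t) \equiv \frac{1}{2\pi} \int_{\mathbb{R}} e^{i\eta x + (-i\xi^1\eta+L)t}\, d\eta. \]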


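To analyze $G(x,t)$, we need to consider the spectrum of the operator $-i\xi^1\eta + L$. For long waves, $|\eta|$ small, the dominating part consists of the eigenvalues:
\[ \sigma(\eta) \equiv \{ \lambda(\eta) \in \mathbb{C} : \text{there exists nontrivial } e(\eta) \in L^2_\xi \text{ such that } (-i\xi^1\eta + L)e(\eta) = \lambda(\eta)e(\eta) \}. \]

Lemma 4 (Spectrum gap). There exist two positive numbers $\kappa_0$ and $\kappa_1$ such that for $|\eta| > \kappa_0$, $\sigma(\eta) \subset \{ z \in \mathbb{C} : \mathrm{Re}(z) \le -\kappa_1 \}$.

The function $G(x,t)$ can be decomposed into two parts, $G(x,t) = G_L(x,t) + G_S(x,t)$:
\[ \begin{cases} G_L(x,t) \equiv \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{ix\eta + (-i\xi^1\eta+L)t}\, d\eta, \\ G_S(x,t) \equiv \frac{1}{2\pi} \int_{|\eta|\ge\kappa_0/2} e^{ix\eta + (-i\xi^1\eta+L)t}\, d\eta, \\ \mathbb{G}^t_L h(x) \equiv \int_{\mathbb{R}} G_L(y,t)\, h(x-y)\, dy, \\ \mathbb{G}^t_S h(x) \equiv \int_{\mathbb{R}} G_S(y,t)\, h(x-y)\, dy. \end{cases} \tag{3.3} \]
Here, the notation $e^{ix\eta + (-i\xi^1\eta+L)t}$ denotes a well-defined $L^2_\xi$-operator-valued function in the variables $(x,\eta,t) \in \mathbb{R}\times\mathbb{R}\times\mathbb{R}_+$. The operator $\mathbb{G}_L$ corresponds to the long waves and $\mathbb{G}_S$ to the short waves. The long waves are clearly smooth [5].

Lemma 5. For any $i \ge 0$,
\[ \|\mathbb{G}^t_L g_{in}\|_{H^i_x(L^2_\xi)} = O(1) \|g_{in}\|_{L^2_x(L^2_\xi)}, \]
where $O(1)$ depends on $i$ only.

For any given $j$ and $k$ in $L^2_\xi$, the operator $j \otimes \langle k|$ on $L^2_\xi$ is defined pointwise in $L^2_\xi$ as follows: $j \otimes \langle k|\, g \equiv \langle k, g\rangle\, j$ for any $g \in L^2_\xi$. Denote by $\Pi_\eta$ an eigen-projection operator of $\mathbb{G}_L$ from $L^2_\xi$ onto the vector space spanned by $\{e_1(\eta), e_2(\eta), e_3(\eta)\}$ for $|\eta| < \kappa_0/2$ (see Lemma 8):
\[ \begin{cases} \Pi_\eta h \equiv \sum_{i=1}^{3} e_i(\eta) \otimes \langle e_i(\eta)|\, h, \\ \Pi_\eta^\perp \equiv 1 - \Pi_\eta. \end{cases} \tag{3.4} \]
With this, the operator $G_L(x,t)$ is decomposed into
\[ G_L(x,t) = \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{ix\eta + (-i\xi^1\eta+L)t} (\Pi_\eta + \Pi_\eta^\perp)\, d\eta \equiv G_{L;0}(x,t) + G_{L;\perp}(x,t), \]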


and one has a rather explicit expression for $G_{L;0}(x,t)$ and $G_{L;\perp}(x,t)$:
\[ \begin{cases} G_{L;0}(x,t) = \sum_{i=1}^{3} \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{ix\eta + \sigma_i(\eta)t}\, e_i(\eta) \otimes \langle e_i(\eta)|\, d\eta, \\ G_{L;\perp}(x,t) = \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{ix\eta + (-i\xi^1\eta+L)t}\, \Pi_\eta^\perp\, d\eta, \\ \mathbb{G}^t_{L;0} h(x) = \int_{\mathbb{R}} G_{L;0}(x-y,t)\, h(y)\, dy, \\ \mathbb{G}^t_{L;\perp} h(x) = \int_{\mathbb{R}} G_{L;\perp}(x-y,t)\, h(y)\, dy. \end{cases} \tag{3.5} \]
From the spectrum gap of the operator $(-i\xi^1\eta + L)$ when $|\eta| \ll 1$ and the smoothness of long waves, we have the following estimate on the sup norm $L^\infty_x$ [5].

Lemma 6. There exist $\kappa_2 > 0$ and a positive constant $C$ such that, for $|\eta| < \kappa_2/2$,
\[ \|\mathbb{G}^t_{L;\perp} h\|_{L^\infty_x(L^2_\xi)} \le C\sqrt{\kappa_0}\, e^{-\kappa_2 t}\, \|h\|_{L^2_x(L^2_\xi)}. \]
Here, the constant $\kappa_2$ depends on the parameter $\kappa_0$ in the smallness condition $|\eta| \le \kappa_0/2$. On the other hand, $\mathbb{G}_S$ is bounded but not a regular operator in $x$; for the estimate on the sup norm we need to use the Sobolev inequality.

Lemma 7. The operator $\mathbb{G}^t_S$ satisfies
\[ \|\mathbb{G}^t_S h\|_{L^2_x(L^2_\xi)} = O(1)\, e^{-\kappa_2 t}\, \|h\|_{L^2_x(L^2_\xi)}, \]
\[ \|\mathbb{G}^t_S h\|_{L^\infty_x(L^2_\xi)} = O(1)\, e^{-\kappa_2 t}\, \big( \|h\|_{L^2_x(L^2_\xi)}\, \|h_x\|_{L^2_x(L^2_\xi)} \big)^{1/2}. \]

Notice that the second estimate requires regularity of $h$ in $x$, since the Sobolev inequality is applied here.

3. Fluidlike Waves

The following information on the leading terms of the eigenvalues for $-i\xi^1\eta + L$ is the same as that in [5] for hard sphere models.

Lemma 8. For the same number $\kappa_0$ as in Lemma 4, we have, for $|\eta| \le \kappa_0$, $\sigma(\eta) = \{\sigma_1(\eta), \sigma_2(\eta), \sigma_3(\eta)\}$, where
\[ \begin{cases} \sigma_1(\eta) = -i\lambda_1\eta - A_1|\eta|^2 + O(1)|\eta|^3, \\ \sigma_2(\eta) = -i\lambda_2\eta - A_2|\eta|^2 + O(1)|\eta|^3, \\ \sigma_3(\eta) = -i\lambda_3\eta - A_3|\eta|^2 + O(1)|\eta|^3, \\ A_j = -\langle P_1\xi^1 E_j, L^{-1} P_1\xi^1 E_j\rangle > 0, \\ A_1 = A_3, \end{cases} \tag{3.6} \]
and $\lambda_i$ are the eigenvalues of the operator $P_0\xi^1 P_0$ given in (1.6). Furthermore, there exist normalized eigenvectors $e_j(\eta) \in L^2_\xi$ of the operator $-i\xi^1\eta + L$,
\[ \begin{cases} (-i\xi^1\eta + L)\, e_j(\eta) = \sigma_j(\eta)\, e_j(\eta), \\ \langle e_j(\eta), e_k(\eta)\rangle = \delta^j_k, \\ e_i(\eta) = E_i + \eta\, e_i'(0) + O(1)|\eta|^2, \end{cases} \tag{3.7} \]
and
\[ e_i'(0) = \sum_{j=1}^{3} \varepsilon_i^j E_j + e_i^\perp, \qquad e_i^\perp \equiv P_1 e_i'(0), \]
\[ \begin{cases} \varepsilon_k^j = -i\, \dfrac{\langle E_j, P_0\xi^1 L^{-1} P_1\xi^1 E_k\rangle}{\lambda_j - \lambda_k}, \quad j \ne k, \\ \varepsilon_k^k = 0, \\ e_k^\perp = i L^{-1} P_1\xi^1 E_k. \end{cases} \]

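In this lemma, the $A_j$ are the dissipation parameters corresponding to the Chapman-Enskog expansion relating the Boltzmann equation to the Navier-Stokes equation.

Lemma 9. Every element of $\sigma(\eta)$ is smooth for $|\eta| \ll 1$, $\eta \in \mathbb{R}$.

Proof. Though the elements $\sigma_k(\eta)$ of $\sigma(\eta)$ satisfy the same form of asymptotics in (3.6) around $\eta = 0$, the form does not give the regularity property of the spectrum in $\eta$. We apply the macro-micro decomposition to the spectrum equation $(-i\xi^1\eta + L)e(\eta) = \sigma_k(\eta)e(\eta)$ to obtain
\[ \begin{cases} P_0\xi^1(e_0 + e_1) = \rho_k(\eta)\, e_0, \\ -i\eta P_1\xi^1(e_0 + e_1) + L e_1 = -i\eta\rho_k(\eta)\, e_1, \\ \rho_k \equiv -\sigma_k(\eta)/(i\eta), \quad e_0 \equiv P_0 e, \quad e_1 \equiv P_1 e. \end{cases} \tag{3.8} \]
The operator $(L - iP_1\xi^1\eta)|_{\mathrm{Range}\,P_1}$ is invertible when $\eta \in \mathbb{R}$ due to the fact that for any $g \in \mathrm{Range}\,P_1$, $\eta \in \mathbb{R}$,
\[ \mathrm{Re}\,\langle (L - iP_1\xi^1\eta)g, \bar g\rangle = \mathrm{Re}\,\langle Lg, \bar g\rangle - \eta\, \mathrm{Re}\,\langle i\xi^1 g, \bar g\rangle = \langle Lg, \bar g\rangle \le -\alpha_0 \langle g, \bar g\rangle \]
for some $\alpha_0 > 0$. Thus the operator $(L - i\eta P_1\xi^1 + i\eta\rho_k)$ is invertible when $\eta \in \mathbb{R} \cap \{|\eta|, |\rho_k| \ll 1\}$. Equation (3.8) can be rewritten as follows:
\[ \big( P_0\xi^1 - \rho_k + i\eta P_0\xi^1 [L - i\eta P_1\xi^1 + i\eta\rho_k]^{-1} P_1\xi^1 \big) e_0 = 0. \tag{3.9} \]
Since $\dim(\mathrm{Range}(P_0)) = 3$, the operator $(P_0\xi^1 - \rho_k + i\eta P_0\xi^1 [L - i\eta P_1\xi^1 + i\eta\rho_k]^{-1} P_1\xi^1) P_0$ can be represented by a $3\times3$ matrix in the basis of $E_1, E_2, E_3$. The matrix is
\[ \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} - \rho_k \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + i\eta\, \big(A_{ij}(\eta,\rho_k)\big), \]
where $A_{ij}(\eta,\rho_k) = \langle E_i, P_0\xi^1 [L - i\eta P_1\xi^1 + i\eta\rho_k]^{-1} P_1\xi^1 E_j\rangle$. Due to the fact that $[L - i\eta P_1\xi^1 + i\eta\rho_k]^{-1}$ exists for $\eta \in \mathbb{R} \cap \{|\eta|, |\rho_k| \ll 1\}$, $A_{ij}(\eta,\rho_k)$ is smooth in $\eta \in \mathbb{R} \cap \{|\eta|, |\rho_k| \ll 1\}$. Equation (3.9) is equivalent to
\[ \det\left[ \begin{pmatrix} \lambda_1 - \rho_k & 0 & 0 \\ 0 & \lambda_2 - \rho_k & 0 \\ 0 & 0 & \lambda_3 - \rho_k \end{pmatrix} + i\eta\, \big(A_{ij}(\eta,\rho_k)\big) \right] = 0. \]
The scalar equation gives rise to the existence of the smooth implicit function $\rho_k(\eta)$ satisfying $\rho_k(0) = \lambda_k$ and (3.8).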
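The expansion (3.6) of Lemma 8 can be illustrated on a finite-dimensional toy model. The sketch below is not the Boltzmann operator: it assumes a $5\times5$ symmetric, nonpositive matrix `L` with a three-dimensional kernel and a symmetric matrix `X` standing in for multiplication by $\xi^1$; all numerical values are hypothetical. It compares the three eigenvalues of $-i\eta X + L$ near zero with the prediction $-i\lambda_j\eta - A_j\eta^2$, where $A_j = -\langle P_1 X E_j, L^{-1} P_1 X E_j\rangle$. The toy values do not enforce the symmetry $A_1 = A_3$ of the lemma.

```python
import numpy as np

lam = np.array([-1.0, 0.0, 1.0])     # toy characteristic speeds (assumed values)
mu = np.array([1.0, 2.0])            # decay rates of the toy microscopic modes
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])           # macro-micro coupling (assumed values)

# L: symmetric, nonpositive, kernel spanned by the first three unit vectors.
L = np.diag(np.concatenate([np.zeros(3), -mu]))
# X: symmetric stand-in for xi^1; its kernel block is already diagonal.
X = np.block([[np.diag(lam), B],
              [B.T, np.zeros((2, 2))]])

# A_j = -<P1 X E_j, L^{-1} P1 X E_j> = sum_m B[j, m]^2 / mu[m].
A = (B ** 2 / mu).sum(axis=1)

def fluid_eigenvalues(eta):
    """Three eigenvalues of -i*eta*X + L closest to zero, ordered to match lam."""
    vals = np.linalg.eigvals(-1j * eta * X + L)
    near_zero = vals[np.argsort(np.abs(vals))[:3]]
    return near_zero[np.argsort(-near_zero.imag)]   # Im(sigma_j) ~ -lam_j * eta

eta = 1e-2
sigma = fluid_eigenvalues(eta)
predicted = -1j * lam * eta - A * eta ** 2
error = np.max(np.abs(sigma - predicted))           # expected to be O(eta^3)
print(sigma, predicted, error)
```

In this toy setting the discrepancy is of third order in $\eta$, while dropping the dissipative term $A_j\eta^2$ leaves a second-order discrepancy.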


Remark 1. For the hard sphere model, the series
\[ L^{-1} \sum_{k=0}^{\infty} \big( i\eta (P_1\xi^1 + \rho_k) L^{-1} \big)^k \]
converges and is bounded for $|\eta| \ll 1$ because $P_1\xi^1 L^{-1}$ is bounded. Thus, $[L - i\eta P_1\xi^1 + i\eta\rho_k]^{-1}$ exists where $\eta \in \mathbb{C}$, $|\eta|, |\rho_k| \ll 1$.

As mentioned before, the long wave component $G_L(x,t)$ is responsible for the fluidlike behavior in the Boltzmann solution. Lemma 6 shows that the operator $\mathbb{G}_{L;\perp}$ decays exponentially in $t$. Through the decomposition $G_L = G_{L;0} + G_{L;\perp}$, we see that $G_{L;0}$ is the main component with fluidlike behavior. Because of the non-analyticity of eigenvalues, Lemma 9, we use the real analytic methods to study the fluidlike waves. For this, we apply the method of [4] with extra boundary values at $|\eta| = \kappa_0/2$. Note that the estimate below for $G_{L;0}$ consists essentially of heat kernels $(1+t)^{-1/2} e^{-(x-\lambda_i t)^2/(C(t+1))}$ plus exponentially decaying terms.

Theorem 2. For any positive integer $N$, there exist positive constants $C_N$ and $c$ such that, for any fixed $x \in \mathbb{R}$,

(i)
\[ \|G_{L;0}(x,t)\|_{L^2_\xi} = O(1)\, C_N \Big( \sum_{i=1}^{3} (1+t)^{-1/2} \Big(1 + \frac{|x-\lambda_i t|^2}{1+t}\Big)^{-N} + e^{-ct} \Big); \]

(ii)
\[ \Big\| G_{L;0}(x,t) - \sum_{i=1}^{3} \frac{1}{\sqrt{4A_i\pi(1+t)}}\, e^{-\frac{(x-\lambda_i t)^2}{4A_i(t+1)}}\, E_i \otimes \langle E_i| \Big\|_{L^2_\xi} = O(1)\, C_N \Big( \sum_{i=1}^{3} (1+t)^{-1} \Big(1 + \frac{|x-\lambda_i t|^2}{1+t}\Big)^{-N} + e^{-ct} \Big). \]

Proof. For (i), by (3.5) and (3.6), we have that G L;0 (x, t) =

3

1 2π j=1

=

3

j=1

1 2π

ei xη+σ j (η)t ei (η) ⊗ ei (η)| dη

|η|<κ0 /2

ei(x−λ j t)η−A j t|η|

2 +O(1)t|η|3

(e j (0) ⊗ e j (0)| + ε j (η))dη,

|η|<κ0 /2

where the operator ε j (η) = O(1)η when κ0 is small enough. When t ≤ 1, the ei (η) ∈ L 2ξ shows that G L;0 (x, t) L 2 ≤ C. ξ

√ √ So we just consider the case t > 1. Let x¯ = (x − λ j t)/ t and η¯ = η t; we use the change variables to get


2

3

ei(x−λ j t)η−A j t|η| +O(1)t|η| (ei (0) ⊗ ei (0)| + ε j (η))dη |η|<κ0 /2 √ √ 1 2 ¯ ¯ 3/ t j η¯ +O(1)|η| =√ ei x¯ η−A (ei (0) ⊗ ei (0)| + ε j (η/ ¯ t))dη. √ t |η|<κ ¯ 0 t/2 √ √ For |x| ¯ ≤ t, since |η| ¯ < κ0 t/2 and the eigenvector belongs to L 2ξ , we have √ √ i x¯ η−A ¯ η¯ 2 +O(1)|η| ¯ 3/ t j e (ei (0) ⊗ ei (0)| + ε j (η/ ¯ t)) dη √ ≤ C. |η|<κ ¯ 0 t/2

L 2ξ

√ For x¯ > t and N is given, √ √ 2 ¯ ¯ 3/ t j η¯ +O(1)|η| x¯ N ei x¯ η−A (ei (0) ⊗ ei (0)| + ε j (η/ ¯ t)) dη √ |η|<κ ¯ t/2 0 1 ∂ √ √ N i x¯ η¯ −A j η¯ 2 +O(1)|η| ¯ 3/ t e e (ei (0) ⊗ ei (0)| + ε j (η/ ¯ t)) dη = √ i ∂ η¯ |η|<κ ¯ 0 t/2 =

N

(−1)k+1

k=1

1 ∂ N −k i x¯ η¯ e i ∂ η¯

√ √

1 ∂ k−1 −A η¯ 2 +O(1)|η| ¯ 3/ t e j (ei (0) ⊗ ei (0)| + ε j (η/ ¯ t)) √ i ∂ η¯ |η|=κ ¯ 0 t/2 √ √

i x¯ η¯ 1 ∂ N −A j η¯ 2 +O(1)|η| ¯ 3/ t e +(−1) N (ei (0) ⊗ ei (0)|+ε j (η/ ¯ t)) dη. √ e i ∂ η¯ |η|<κ ¯ 0 t/2 √ We choose√κ0 small such that O(1)|η| ¯ 3 / t = O(1)|η|3 t < A j η¯ 2 /2 = A j η2 t/2. Hence, for |x| ¯ > t, √ √ i x¯ η−A ¯ η¯ 2 +O(1)|η| ¯ 3/ t j e (ei (0) ⊗ ei (0)| + ε j (η/ ¯ t)) dη √ ×

|η|<κ ¯ 0 t/2

L 2ξ

√ 1 −k 1 ∂ k−1 −A j η¯ 2 +O(1)|η| ¯ 3/ t e ≤ C√ |x| ¯ i ∂ η¯ t k=1 √ ×(ei (0) ⊗ ei (0)| + ε j (η/ ¯ t)) √ 2 |η|=κ ¯ 0 t/2 L ξ N 1 ∂ 1 +C √ |x| ¯ −N √ i ∂ η¯ t |η|<κ ¯ 0 t/2 1/2 √ √ 2 2 ¯ 3/ t × e−A j η¯ +O(1)|η| (ei (0) ⊗ ei (0)| + ε j (η/ ¯ t)) dη dξ N

≤ CN

|x − λ j t| √ t

−N

Thus, we obtain G L;0 (x, t) L 2 ≤ O(1)C N ξ

+ e−ct .

3

i=1

|x − λi t|2 −N (1 + t)−1/2 1 + + e−ct . 1+t


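For (ii),
\[ G_{L;0}(x,t) - \sum_{j=1}^{3} \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{i(x-\lambda_j t)\eta - A_j t|\eta|^2}\, E_j \otimes \langle E_j|\, d\eta \]
\[ = \sum_{j=1}^{3} \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{i(x-\lambda_j t)\eta - A_j t|\eta|^2}\, O(1)\, t|\eta|^3\, E_j \otimes \langle E_j|\, d\eta + \sum_{j=1}^{3} \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{i(x-\lambda_j t)\eta - A_j t|\eta|^2 + O(1)t|\eta|^3}\, \varepsilon_j(\eta)\, d\eta. \]
We use the change of variables $\bar x = (x - \lambda_j t)/\sqrt{t}$ and $\bar\eta = \eta\sqrt{t}$; then the first integral is
\[ \frac{1}{t}\, \frac{1}{2\pi} \sum_{j=1}^{3} \int_{|\bar\eta|<\kappa_0\sqrt{t}/2} e^{i\bar x\bar\eta - A_j|\bar\eta|^2}\, O(1)\, |\bar\eta|^3\, E_j \otimes \langle E_j|\, d\bar\eta, \]
and the second integral is
\[ \frac{1}{t}\, \frac{1}{2\pi} \sum_{j=1}^{3} \int_{|\bar\eta|<\kappa_0\sqrt{t}/2} e^{i\bar x\bar\eta - A_j|\bar\eta|^2 + O(1)t|\bar\eta/\sqrt{t}|^3}\, O(1)\, \bar\eta\, d\bar\eta. \]
We know that
\[ \frac{1}{2\pi} \int_{\mathbb{R}} e^{i(x-\lambda_j t)\eta - A_j t|\eta|^2}\, d\eta = \frac{1}{\sqrt{4A_j\pi t}}\, e^{-\frac{(x-\lambda_j t)^2}{4A_j t}} \]
and
\[ \Big| \int_{|\eta|\ge\kappa_0/2} e^{i(x-\lambda_j t)\eta - A_j t|\eta|^2}\, d\eta \Big| \le C e^{-\kappa_0^2 A_j t/4}. \]
The case (ii) is then similar to case (i) by the following inequality:
\[ \Big\| G_{L;0}(x,t) - \sum_{i=1}^{3} \frac{1}{\sqrt{4A_i\pi(1+t)}}\, e^{-\frac{(x-\lambda_i t)^2}{4A_i(t+1)}}\, E_i \otimes \langle E_i| \Big\|_{L^2_\xi} \]
\[ \le \Big\| G_{L;0}(x,t) - \sum_{j=1}^{3} \frac{1}{2\pi} \int_{|\eta|<\kappa_0/2} e^{i(x-\lambda_j t)\eta - A_j t|\eta|^2}\, E_j \otimes \langle E_j|\, d\eta \Big\|_{L^2_\xi} \]
\[ + \Big\| \sum_{j=1}^{3} \Big( \frac{1}{\sqrt{4A_j\pi t}}\, e^{-\frac{(x-\lambda_j t)^2}{4A_j t}} - \frac{1}{\sqrt{4A_j\pi(1+t)}}\, e^{-\frac{(x-\lambda_j t)^2}{4A_j(t+1)}} \Big) E_j \otimes \langle E_j| \Big\|_{L^2_\xi} \]
\[ + \Big\| \sum_{j=1}^{3} \frac{1}{2\pi} \int_{|\eta|\ge\kappa_0/2} e^{i(x-\lambda_j t)\eta - A_j t|\eta|^2}\, E_j \otimes \langle E_j|\, d\eta \Big\|_{L^2_\xi}. \]
This completes the proof.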
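The two scalar facts used in part (ii), the Gaussian Fourier integral and the exponentially small contribution of $|\eta| \ge \kappa_0/2$, can be checked numerically. The sketch below is only an illustration: the values of $A$, $t$, $\kappa_0$ and $y = x - \lambda_j t$ are hypothetical, and the integral is approximated with a plain trapezoidal rule.

```python
import numpy as np

def truncated_fourier_integral(y, A, t, kappa0, n=200001):
    """(1/2pi) * integral over |eta| < kappa0/2 of exp(i*y*eta - A*t*eta^2)."""
    eta = np.linspace(-kappa0 / 2.0, kappa0 / 2.0, n)
    integrand = np.exp(1j * y * eta - A * t * eta ** 2)
    d_eta = eta[1] - eta[0]
    # Composite trapezoidal rule on a uniform grid.
    total = d_eta * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return total / (2.0 * np.pi)

def heat_kernel(y, A, t):
    """Closed form of the same integral taken over the whole real line."""
    return np.exp(-y ** 2 / (4.0 * A * t)) / np.sqrt(4.0 * A * np.pi * t)

# Hypothetical sample values: y plays the role of x - lambda_j * t.
A, t, kappa0, y = 0.7, 50.0, 2.0, 3.0
truncated = truncated_fourier_integral(y, A, t, kappa0)
full = heat_kernel(y, A, t)
gap = abs(truncated - full)                       # contribution of |eta| >= kappa0/2
tail_scale = np.exp(-kappa0 ** 2 * A * t / 4.0)   # size predicted by the tail bound
print(truncated, full, gap, tail_scale)
```

For these values the truncated integral and the heat kernel agree to within quadrature and rounding error, since the predicted tail size $e^{-\kappa_0^2 A t/4}$ is far below double precision.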


4. Particlelike Waves

We decompose $K$ of (1.4) into $K_0 \equiv K_{0,D}$ and $K_1 \equiv K_{1,D}$:
\[ \begin{cases} K_i z(\xi) \equiv \int_{\mathbb{R}^3} K_i(\xi,\xi_*)\, z(\xi_*)\, d\xi_*, \quad i = 0, 1, \\ K_0(\xi,\xi_*) = ch_0\Big( \dfrac{|\xi-\xi_*|}{D\nu_0} \Big)\, k(\xi,\xi_*), \\ K_1(\xi,\xi_*) = k(\xi,\xi_*) - K_0(\xi,\xi_*), \\ ch_0(r) \equiv 1 \ \text{for } r \in [-1,1], \quad \mathrm{supp}(ch_0) \subset [-2,2], \quad ch_0 \in C_c^\infty(\mathbb{R}), \quad ch_0 \ge 0. \end{cases} \]
Here the cutoff parameter $D$ will be chosen to be small. So the operator $K_1$ has high order regularity with respect to the microscopic variable $\xi$. We rewrite the linearized Boltzmann equation (1.2) as follows:
\[ \partial_t g + \xi^1\partial_x g + \nu(\xi) g = (K_0 + K_1) g. \]

Definition 1. Denote by $\mathbb{S}^t$ and $\mathbb{O}^t_D$ the solution operators of the following equations:
\[ \begin{cases} \partial_t h + \xi^1\partial_x h + \nu(\xi) h = 0, \\ h(x,0) = h_0(x) \in L^\infty_{\xi,\beta}, \quad x \in \mathbb{R}, \end{cases} \qquad h(x,t) \equiv \mathbb{S}^t h_0(x), \tag{4.1} \]
\[ \begin{cases} \partial_t j + \xi^1\partial_x j + \nu(\xi) j = K_0 j, \\ j(x,0) = j_0(x) \in L^\infty_{\xi,\beta}, \quad x \in \mathbb{R}, \end{cases} \qquad j(x,t) \equiv \mathbb{O}^t_D j_0(x), \tag{4.2} \]
where $\beta > 5/2$ and $0 < D \ll 1$.

Equation (4.1) is a damped transport equation, and we call (4.2) an “essential kinetic equation”. These equations are hyperbolic and carry particlelike waves.

Lemma 10. (1) The operator $K_1$ is a smoothing operator in $\xi$. For any $h \in L^2_\xi$, $i \ge 0$,
\[ \|K_1 h\|_{H^i_\xi} \le C \|h\|_{L^2_\xi}. \]
(2) For any $\beta \ge 0$, there exist positive constants $C$ and $C_\beta$ such that the operator $\mathbb{S}^t$ satisfies
\[ \|\mathbb{S}^t\|_{L^\infty_x(L^\infty_{\xi,\beta})} \le C_\beta e^{-\nu_0 t}, \qquad \|\mathbb{S}^t\|_{L^2_x(L^2_\xi)} \le C e^{-\nu_0 t}. \]
(3) For any $\beta \ge 0$, there exist positive constants $C$ and $C_\beta$ such that the operator $\mathbb{S}^t$ satisfies
\[ \|\mathbb{S}^t\|_{L^\infty_x(L^\infty_{\xi,\beta})} \le C_\beta e^{-\nu_0 t}, \qquad \|\mathbb{S}^t\|_{L^2_x(L^2_\xi)} \le C e^{-\nu_0 t}. \]
(4) The operator $\mathbb{O}^t_D$ is also a bounded operator on $L^\infty_x(L^\infty_{\xi,\beta})$ for any $\beta \ge 0$. In fact, there exist positive constants $C_0$, $C_1$ such that, for any $D \in (0, C_0)$,
\[ \|\mathbb{O}^t_D j_0\|_{L^\infty_x(L^\infty_{\xi,\beta})} \le C_1 e^{-\nu_0 t/2}\, \|j_0\|_{L^\infty_x(L^\infty_{\xi,\beta})}. \]
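The damped transport equation (4.1) can be solved along characteristics, which gives $\mathbb{S}^t h_0(x,\xi) = e^{-\nu(\xi)t}\, h_0(x - \xi^1 t, \xi)$. The sketch below illustrates this under stated assumptions: the collision frequency is replaced by the model $(1+|\xi|)^{\gamma}$ with $\gamma = 0.5$ (so that $\nu_0 = 1$ is a lower bound), and the initial datum `h0` is hypothetical. It checks the equation by central differences and the pointwise decay $|\mathbb{S}^t h_0| \le e^{-\nu_0 t}\sup|h_0|$.

```python
import numpy as np

GAMMA = 0.5  # assumed exponent, 0 < gamma < 1

def nu(xi):
    """Model collision frequency (1 + |xi|)^gamma, a stand-in consistent with (1.5)."""
    return (1.0 + np.linalg.norm(xi)) ** GAMMA

def S(t, h0):
    """Solution operator of (4.1): damp at rate nu(xi), shift in x by xi^1 * t."""
    def h(x, xi):
        return np.exp(-nu(xi) * t) * h0(x - xi[0] * t, xi)
    return h

def h0(x, xi):
    """Hypothetical smooth initial datum bounded by 1."""
    return np.exp(-x ** 2) / (1.0 + np.dot(xi, xi))

def residual(x, t, xi, step=1e-4):
    """Central-difference residual of  d_t h + xi^1 d_x h + nu(xi) h."""
    dt = (S(t + step, h0)(x, xi) - S(t - step, h0)(x, xi)) / (2.0 * step)
    dx = (S(t, h0)(x + step, xi) - S(t, h0)(x - step, xi)) / (2.0 * step)
    return dt + xi[0] * dx + nu(xi) * S(t, h0)(x, xi)

xi = np.array([1.5, -0.3, 0.8])
res = residual(x=0.4, t=0.7, xi=xi)
value_at_t2 = S(2.0, h0)(0.1, xi)
print(res, value_at_t2)
```

Because faster particles are damped at the larger rate $\nu(\xi)$, the weight $e^{-\nu(\xi)t}$ improves on $e^{-\nu_0 t}$ for large $|\xi|$, which is the mechanism behind the sub-exponential tails discussed in the text.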

The following lemma yields the significant hyperbolic property of the operators St and OtD . Note that the exponent of the estimate is due to the property ν(ξ ) ≈ (| + |ξ |)γ for the hard potential model. Lemma 11. For any given β ≥ 0 and t ≥ 1, there exists sufficiently small positive D such that " t −2ν t/3 0 max h 0 (y) L ∞ S h 0 (x) L ∞ = O(1)e ξ,β ξ,β |y−x|1

max

|x−y|=|ξ 1 |t

= O(1)e−ν0 t/2 OtD j0 (x) L ∞ ξ,β + sup

e−ν1 |y−x|

γ |ξ 1 |γ −1 /3

h 0 (y) L ∞ , (4.3) ξ,β

" max j0 (y) L ∞ ξ,β

|y−x|
max

1 |ξ 1 |>1 |x−y|=|ξ |t

e

#

−ν1 |y−x|γ |ξ 1 |γ −1 /4

j0 (y) L ∞ , (4.4) ξ,β

where ν0 and ν1 are given in (1.5). Proof. We use the representation (4.1) for St . For |ξ 1 | ≤ 1, |St h 0 (x, ξ )| ≤ e−ν(ξ )t (1 + |ξ |)−β h 0 (x − ξ 1 t, ·) L ∞ ξ,β ≤ e−2ν0 t/3 (1 + |ξ |)−β max h 0 (y) L ∞ . ξ,β |x−y|
For |ξ 1 | > 1, we use the basic property of the hard potential model ν(ξ ) ≈ (1 + |ξ |)γ : |St h 0 (x, ξ )| ≤ e−2ν0 t/3−ν(ξ )t/3 (1 + |ξ |)−β h 0 (x − ξ 1 t, ·) L ∞ ξ,β ≤ e−2ν0 t/3−|ξ

1 |γ t/3

(1 + |ξ |)−β

max

|x−y|=|ξ 1 |t

h 0 (y) L ∞ . ξ,β

Estimate (4.3) follows from the above inequality. From Duhamel’s principle, t t t j (x, t) = O D j0 (x) = S j0 (x) + St−τ (K 0 j)(x, τ ) dτ. 0

We make the following a priori hypothesis: " −ν1 t/2 ∞ max j0 (y) L ∞ j (x, t) L ξ,β ≤ C(β)e ξ,β |y−x|
+ sup

max

1 |ξ 1 |>1 |x−y|=|ξ |t

e

−ν1 |y−x|γ |ξ 1 |γ −1 /4

# j0 (y) L ∞ , ξ,β

(4.5)


then we have the inequality, (4.3), t St−τ (K 0 j)(x, τ ) dτ 0

L∞ ξ,β

t

≤

St−τ (K 0 j)(x, τ ) L ∞ dτ ξ,β " t max K 0 j (y) L ∞ ≤ O(1)e−ν0 (t−τ )/2 ξ,β 0

|y−x|
0

+ sup

max

1 |ξ 1 |>1 |x−y|=|ξ |(t−τ )

e−ν1

|y−x|γ |ξ |γ −1 /4

#

K 0 j (y) L ∞ ξ,β

dτ.

From Lemma 3 and the expression of K 0 , ≤ Cβ D h L ∞ , K 0 h L ∞ β β+1 we get t St−τ (K 0 j)(x, τ ) dτ 0

L∞ ξ,β

≤ C(β) · O(1) · Cβ · D + sup

max

1 |ξ 1 |>1 |x−y|=|ξ |τ

e

e−ν0 (t−τ )/2

0 −ν1 |y−z|γ |ξ |γ −1 /4

max

+ sup

t

1 |ξ 1 |>1 |x−y|=|ξ |(t−τ )

e−ν1 |y−x|

max

|y−x|
e−ν0 τ/2

" max j0 (z) L ∞ ξ,β

|y−z|<τ

j0 (z) L ∞ ] ξ,β

γ |ξ |γ −1 /4

e−ν0 τ/2

$

×

#

max j0 (z) L ∞ + sup ξ,β

|y−z|<τ

≤ C1 (β)e + sup

−ν1 t/2

max

1 |ξ 1 |>1 |x−y|=|ξ |τ

"

max j0 (z) L ∞ ξ,β

|z−x|
max

1 |ξ 1 |>1 |x−y|=|ξ |(t−τ )

e

−ν1 |z−x|γ |ξ |γ −1 /4

e

−ν1 |y−z|γ |ξ |γ −1 /4

j0 (z) L ∞ ξ,β

dτ

# j0 (z) L ∞ , ξ,β

where $C_1(\beta) < C(\beta)$ with $D$ small enough. Thus we obtain a stronger estimate from the hypothesis (4.5). We therefore can establish (4.5) by iterations. This completes the proof of (4.4).

4.1. Wave Mixture Operator. The regularity in $x$ can be deduced from the regularity in $\xi$ by the mixture operators.

Definition 2. For any $g_0 \in L^2_x(L^2_\xi)$, the $k$-th degree mixture operator $\mathbb{M}^t_k$ is given as follows:
\[ \mathbb{M}^t_k g_0 \equiv \int_0^t \int_0^{s_1} \cdots \int_0^{s_{2k-1}} \mathbb{S}^{t-s_1} K\, \mathbb{S}^{s_1-s_2} K\, \mathbb{S}^{s_2-s_3} K \cdots \mathbb{S}^{s_{2k-1}-s_{2k}} K\, \mathbb{S}^{s_{2k}} g_0\, ds_{2k} \cdots ds_1. \]


Through the $k$-th degree mixture operator, the regularity in $x$ can be deduced from the regularity in $\xi$. We now state the Mixture Lemma for hard potentials.

Lemma 12 (Mixture Lemma). For each given $k \ge 0$, there exists a positive constant $C_k$ such that
\[ \|\partial_x^k \mathbb{M}^t_k g_0\|_{L^2_x(L^2_\xi)} \le C_k e^{-\nu_0 t} \big( \|g_0\|_{L^2_x(L^2_\xi)} + \|\partial_{\xi^1}^k g_0\|_{L^2_x(L^2_\xi)} \big). \]

Proof. Following the proof of the mixture lemma in [5], we need only to check that the operator l1 (ξ, ξ1 ) ≡

∂ν(ξ1 ) · k(ξ, ξ1 ) ∂ξ11

and (k(ξ, ξ1 )k(ξ1 , ξ2 ))V 1 are bounded on L 2ξ . The first boundedness is to show 2

2 l1 (ξ, ξ1 )k(ξ1 , ξ2 )gˆ 0 (η, ξ2 ) dξ2 dξ1 dξ ≤ C. Since k(ξ, ξ1 ) is smooth outside the region |ξ1 − ξ | ≤ D, we only discuss the following case. It is easy to check that l1 ∈ L 1ξ1 , and so for the Schwarz inequality and Lemma 2, we have 2 (ξ, ξ )k(ξ , ξ ) g ˆ (η, ξ ) dξ dξ l 1 1 1 2 0 2 2 1 dξ |ξ1 −ξ |≤D 2 |l1 (ξ, ξ1 )| k(ξ1 , ξ2 )gˆ 0 (η, ξ2 ) dξ2 dξ1 dξ ≤ O(1)D |ξ −ξ |≤D 1 2 |l1 (ξ, ξ1 )| dξ k(ξ1 , ξ2 )gˆ 0 (η, ξ2 ) dξ2 dξ1 ≤ O(1)D |ξ1 −ξ |≤D

≤ CO(1) D gˆ 0 L 2 . 2

2

ξ

Similarly, we also have the second boundedness. Hence the proof is complete.

4.2. Wave Decompositions. With the above preparation, we now construct a series of increasingly regular hyperbolic waves as follows: The i th degree wave carrier Wi (x, t) is defined by % t " t St−s + Wi (x, t) ≡ I0 (x, t) + St−s1 K Ss1 −s ds1 K 1 I0 (·, s) ds(x) +

0 i t

j=1

0

s 0 Mt−s j K 1 I (·, s) ds +

≡ I0 (x, t) + J−1 (x, t) + J0 (x, t) +

t 0

2i

j=1

0

t1

St−t1 K Mtj1 −s K 1 I0 (·, s) ds dt1 (x)

Ji (x, t),


where $I_0(x,t) \equiv \mathbb{O}^t_D g_{in}(x)$. For the solution $g$ of (1.2), the $i$-th remainder waves are defined by
\[ R_i(x,t) \equiv g(x,t) - W_i(x,t). \tag{4.6} \]
The functions $I_0$, $J_{-1}$, $J_i$, $i = 0, 1, \dots$, are the solutions, expressed by Duhamel's principle, of
\[ \partial_t I_0 + \xi^1\partial_x I_0 + \nu(\xi) I_0 = K_0 I_0, \]
\[ \partial_t J_{-1} + \xi^1\partial_x J_{-1} + \nu(\xi) J_{-1} = K_1 I_0, \]
\[ \partial_t J_i + \xi^1\partial_x J_i + \nu(\xi) J_i = K J_{i-1}, \]
so the remainder $R_i$ satisfies the following lemma [5].

Lemma 13. The equation for $R_i$ is
\[ \begin{cases} \partial_t R_i + \xi^1\partial_x R_i - L R_i = K J_{2i} \equiv \int_0^t \int_0^{t_1} K\, \mathbb{S}^{t-t_1} K\, \mathbb{M}_i^{t_1-s} K_1 I_0(\cdot,s)\, ds\, dt_1, \\ R_i(x,0) \equiv 0, \end{cases} \tag{4.7} \]
and
\[ \|R_i(\cdot,t)\|_{H^i_x(L^2_\xi)} = O(1)\, |||g_{in}|||. \]

Lemma 14. Suppose that the initial data is of compact support, (1.3). For each $i \ge 1$, there exists $C_i$ such that
\[ \|J_{2i}(x,t)\|_{L^\infty_{\xi,0}} \le C_i\, e^{-(\nu_0 t + \nu_1 |x|^{\gamma} t^{1-\gamma})/8}\, |||g_{in}|||, \tag{4.8} \]
\[ \|J_{2i}\|_{H^i_x(L^2_\xi)} \le C_i\, e^{-\nu_0 t/8}\, |||g_{in}|||, \tag{4.9} \]
\[ \|I_0(x,t)\|_{L^2_\xi},\ \|J_j(x,t)\|_{L^2_\xi} \le C_j\, e^{-(\nu_0 t + \nu_1 |x|^{\gamma} t^{1-\gamma})/8}\, |||g_{in}|||, \quad \text{for } j = -1, 0, 1. \tag{4.10} \]

Proof. Estimate (4.8) is obtained by induction in $i$ as follows. Since $g_{in}$ has compact support in the $x$-variable and $(a+b)^r \le a^r + b^r$ for $a, b > 0$ and $0 < r \le 1$, by Lemma 11 there exists $C_0 > 0$ such that
\[ \|I_0(x,t)\|_{L^\infty_{\xi,0}} \le C_0\, e^{-(\nu_0 t + \nu_1 |x|^{\gamma} t^{1-\gamma})/4}\, |||g_{in}|||. \]
Suppose that (4.8) holds for $i \le k$. We have
\[ J_{2(k+1)}(x,t) = \int_0^t \int_0^{t_1} \mathbb{S}^{t-t_1} K\, \mathbb{S}^{t_1-s} K J_{2k}(\cdot,s)\, ds\, dt_1. \]
By Lemma 3 and Lemma 11, (4.8) holds for $i = k+1$. To prove (4.9), we have
\[ J_{2i}(x,t) = \int_0^t \int_0^{t_1} \mathbb{S}^{t-t_1} K\, \mathbb{M}_i^{t_1-s} K_1 I_0(\cdot,s)(x)\, ds\, dt_1, \tag{4.11} \]
and Lemma 10 (1) implies that
\[ \|\partial_\xi^i K_1 I_0(\cdot,s)\|_{L^2_x(L^2_\xi)} \le C_i\, e^{-\nu_0 s/8}\, |||g_{in}|||. \]
Thus (4.9) follows from Lemma 12 (Mixture Lemma). The proof of (4.10) is similar.

Lemma 15. There exists a positive number $\nu_2$ such that
\[ \|\mathbb{G}^t_S g_{in}\|_{L^\infty_x(L^2_\xi)} = O(1)\, e^{-\nu_2 t/4}\, |||g_{in}||| \quad \text{for } t \ge 1. \]


Proof. Combining the spectral decomposition and the wave decomposition, we have Wi (x, t) − GtS gin (x) = GtL gin (x) − Ri (x, t). From Lemmas 14 and 7, both terms on the left-hand side decay exponentially in t with the norm L 2x (L 2ξ ): Wi (·, t) − GtS gin L 2 (L 2 ) = O(1)e−ν2 t/4 gin L 2 (L 2 ) , x

ξ

x

ξ

where ν2 = min{κ2 , ν0 }. The terms on the right-hand side have smooth property with the norm Hxi (L 2ξ ) by Lemmas 5 and 13: GtL gin (x) − Ri (·, t) H i (L 2 ) = O(1)|||gin |||. ξ

x

Thus, by Sobolev’s embedding theorem, we obtain Wi (·, t) − Gt gin L ∞ (L 2 ) ≤ Ce−ν2 t/4 |||gin |||. x

ξ

(4.12)

Since the expression of Wi and the estimate through Lemma 14, we have Wi (·, t) L ∞ (L 2 ) ≤ Ce−ν0 t/4 |||gin |||. x

ξ

(4.13)

We complete the proof with (4.12) and (4.13). With Theorem 2 and Lemma 15 we have the structure of Gt gin for |x| ≤ 2λ3 t. Proposition 1. For |x| ≤ 2λ3 t and any given positive integral number N , there exist positive constants C0 and C1 such that the solution of (1.2) satisfies G gin (x) L 2 = O(1)|||gin ||| t

3

ξ

C0 (1 + t)

−1/2

i=1

|x − λi t|2 −N −(|x|+t)/C1 . 1+ +e 1+t

5. Weighted Energy Estimate

For the region |x| > 2λ3 t, we apply a weighted energy estimate on R_i, (4.6).

Lemma 16. Let f0 ≡ P0 f and f1 ≡ P1 f. Then, for any constant α > 0,

$$\alpha\langle f, f\rangle \le (\alpha + 2\alpha^{3/2})\langle f_0, f_0\rangle + (\alpha + \tfrac12\alpha^{1/2})\langle f_1, f_1\rangle.$$

Proof. The proof follows from the identity

$$\alpha\langle f, f\rangle = \alpha\langle f_0, f_0\rangle + 2\alpha\langle f_0, f_1\rangle + \alpha\langle f_1, f_1\rangle$$

and the inequality $(\sqrt{2\varepsilon}\,a)(b/\sqrt{2\varepsilon}) \le \varepsilon a^2 + b^2/(4\varepsilon)$.

Proposition 2. There exist positive constants C and C1 such that the remainder term R_i, (4.6), (4.7), satisfies

$$\|R_i(x,t)\|_{L^2_\xi} \le C e^{-C_1 (t+|x|)^{2/(3-\gamma)}}$$

for |x| > 2λ3 t.
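Returning briefly to Lemma 16, the constants in its statement come from one particular choice of ε in the elementary inequality quoted in its proof. The sketch below assumes that the cross term obeys a Cauchy-Schwarz bound $|\langle f_0,f_1\rangle| \le ab$ with $a^2 = \langle f_0,f_0\rangle$ and $b^2 = \langle f_1,f_1\rangle$; the shorthand a, b and the choice $\varepsilon = \alpha^{1/2}$ are introduced here for illustration and are not spelled out in the text.

```latex
% Sketch: bounding the cross term in Lemma 16 with \varepsilon = \alpha^{1/2}.
\begin{align*}
2\alpha\langle f_0, f_1\rangle
  &\le 2\alpha\,ab
   = 2\alpha\,(\sqrt{2\varepsilon}\,a)\,(b/\sqrt{2\varepsilon}) \\
  &\le 2\alpha\Bigl(\varepsilon a^{2} + \frac{b^{2}}{4\varepsilon}\Bigr) \\
  &= 2\alpha^{3/2} a^{2} + \tfrac12 \alpha^{1/2} b^{2}
   \qquad (\varepsilon = \alpha^{1/2}).
\end{align*}
```

Adding the diagonal terms $\alpha a^2 + \alpha b^2$ from the identity then gives the coefficients $\alpha + 2\alpha^{3/2}$ and $\alpha + \tfrac12\alpha^{1/2}$ of the lemma.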


Proof. We use the weighted energy method. The weight function generalized that in [2]: for some large positive constant and small positive constant δ, set the exponent of the weight functions to be δ(x − 3λ t/2) 2 3−γ 3 σ+ (x, ξ, t) = 5 δ(x − 3λ3 t/2) + 1−η (1 + |ξ |)3−γ δ(x − 3λ t/2) δ(x − 3λ t/2) + 3 3 2 η , + 3|ξ | + (1 + |ξ |)1−γ (1 + |ξ |)3−γ −δ(x + 3λ t/2) 2 3−γ 3 1−η σ− (x, ξ, t) = 5 − δ(x + 3λ3 t/2) + (1 + |ξ |)3−γ −δ(x + 3λ t/2) −δ(x + 3λ t/2) + 3 3 2 η , + 3|ξ | + (1 + |ξ |)1−γ (1 + |ξ |)3−γ where η : R → R is a smooth non-increasing function, η(s) = 1 for s ≤ 1, η(s) = 0 for s ≥ 2; and 0 ≤ η ≤ 1. We use the weighted energy functions eσ+ (x,ξ,t) for x > 2λ3 t and eσ− (x,ξ,t) for x < −2λ3 t. We only consider the case x > 2λ3 t; the other case is similar. The following two estimates for the weighted function eσ+ (x,ξ,t) are also generalizations of those in [2]: There exists a positive constant c such that |ξ 1 ||∂x σ+ (x, ξ, t)| ≤ c(1 + |ξ |)γ . For any g ∈

L 2ξ

(5.1)

and x − 3λ3 t/2 > 0, there is a constant > 0 such that for 0 ≤ α ≤ ,

− 1−γ 3−γ

g0 , g0 + cν2 g1 , νg1 . (5.2) − g, eασ+ Le−ασ+ g ≥ −cα 2 δ(x − 3λ3 t/2) + Denote by At ≡ {x : x − 3λ3 t/2 ≤ 0} and Bt ≡ {x : x − 3λ3 t/2 > 0}. Let f (x, ξ, t) ≡ Ri (x, ξ, t) exp(ασ+ (x, ξ, t)), α > 0, so that Eq. (4.7) implies f t + ξ 1 f x − α(σ+t + ξ 1 σ+x ) f − eασ+ Le−ασ+ f = eασ+ K J2i , and so 1 d 2 dt

∞ −∞

&

f, f d x =

' α f, (σ+t + ξ 1 σ+x ) f + f, eασ+ Le−ασ+ f d x

At ∪Bt

+

f, eασ+ K J2i d x ≡ I At + I Bt .

At ∪Bt

For the last integral, we use γ −1

1−γ

f, eασ+ K J2i ≤ 4−1 3−γ α f, f + 3−γ α −1 eασ+ K J2i , eασ+ K J2i , and choose α1 small enough to conclude that, for α < α1 , eασ+ K J2i L 2 ≤ c J2i L 2 ≤ Ci exp{(−ν0 t − |x|γ t 1−γ )/c} gin L ∞ . ξ ξ

ξ


There exist α2 and c > 0 such that for 0 < α < α2 , I At ≤ −

1−γ

At

{cν2 f 1 , ν f 1 + Ci 3−γ α −1 ex p{(−ν0 t − |x|γ t 1−γ )/c} gin L ∞ } d x. (5.3) ξ

By Lemma 16 and (5.2), we get I Bt ≤ α

f, σ+t f d x + (α + 2α 3/2 )

f 0 , |ξ 1 σ+x | f 0 d x Bt Bt 1/2 + (α + α /2)

f 1 , |ξ 1 σ+x | f 1 d x Bt

− 1−γ 3−γ δ(x − 3λ3 t/2) +

f0 , f0 d x + c1 α 2 Bt γ −1 − c2 ν2

ν(ξ ) f 1 , f 1 d x + 4−1 α 3−γ

f, f d x Bt Bt 1−γ + c3 3−γ α −1 exp{−(ν0 t + |x|γ t 1−γ )/c4 } d x gin L ∞ . ξ Bt

From (5.1), with α3 small enough, we have α < α3 ,

(α + α 1/2 /2)

f 1 , |ξ 1 σ+x | f 1 d x ≤ c2 ν2 Bt

ν(ξ ) f 1 , f 1 d x.

(5.4)

Bt

By the expression of σ+ , the estimate can be divided into two cases. We just show the case (1 + |ξ |)3−γ ≥ δ(x − 3λ3 t/2) > 0; the other case is similar. For this case, we have σ+ (x, ξ, t) = (δ(x − 3λ3 t/2) + )/(1 + |ξ |)1−γ + 3|ξ |2 , and so f, σ+t f = − f, f (1 + |ξ |)−1+γ . Since the inequality eασ+ ≥ C(1 + |ξ |)1−γ holds for |ξ | large enough, we get

f, f (1 + |ξ |)−1+γ d x = Bt

Ri eασ+ , Ri eασ+ (1 + |ξ |)−1+γ d x ≥C

Ri eασ+ /2 , Ri eασ+ /2 d x Bt

Ri eασ+ , Ri eασ+ d x =c Bt =c

f, f d x. Bt

Bt


From (5.4) and the above inequality, we obtain − 1−γ I Bt ≤ −3cλ3 α/2

f, f d x + (α + 2α 3/2 ) 3−γ f 0 , |ξ 1 | f 0 d x Bt 1−γ γ −1 2 − 3−γ + cα

f 0 , f 0 d x + 4−1 α 3−γ

f0 , f0 d x Bt Bt 1−γ + c1 3−γ α −1 exp{−(ν0 t + |x|γ t 1−γ )/c4 } d x gin L ∞ ξ Bt 1−γ ≤ −c

f, f d x + c1 3−γ α −1 exp{−(ν0 t + |x|γ t 1−γ )/c4 } d x gin L ∞ . ξ Bt

Bt

Hence, from (5.3) and the above inequality, we get for α < min{, α1 , α2 , α3 }, t t

f, f d x + cα

f, f d x dt ≤ cα,γ , |||gin |||2 .

(5.5)

The estimates for higher derivatives are similar: t t

∂x f, ∂x f d x + cα

∂x f, ∂x f d x dt ≤ cα,γ , |||gin |||2 .

(5.6)

R

0

R

0

0

0

R

R

From (5.5) and (5.6), sup

(x,t)∈R×R+

eασ+ Ri L 2 ≤ cα,γ , |||gin |||. ξ

(5.7)

Finally, we note that, for x > 2λ3 t, x−

x 4x 3λ3 t x λ3 t 3λ3 t > + − > + , 2 5 5 2 10 10

(5.8)

and the proposition for x > 2λ3 t follows from (5.7) and (5.8). With Lemma 14 and Propositions 1 and 2, our main theorem, Theorem 1, is proved.

Acknowledgements. The research of the first author is supported by the NSC Grant 94-2115-M-001-006 and 094-2917-1-001-001. The research of the second author is supported in part by the Institute of Mathematics, Academia Sinica, Taipei, and NSC Grant 94-2115-M-001-006. The research of the third author is supported by the Competitive Earmarked Research Grant of Hong Kong Cityu 103005. The offprints are supported by NSC Grant 95-2115-M-001-001.

References

1. Grad, H.: Asymptotic theory of the Boltzmann equation, II. In: Rarefied gas dynamics, Int. Symp. on Rarefied Gas Dynamics, Third Symp. (1962), London-New York: Academic Press, 1963, pp. 25–59
2. Chen, C.-C., Liu, T.-P., Yang, T.: Existence of boundary layer solutions to the Boltzmann equation. Anal. Appl. (Singap.) 2, 337–363 (2004)
3. Hilbert, D.: Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen. Chap. 22, Leipzig: Teubner, 1912
4. Liu, T.-P., Wang, W.: The pointwise estimates of diffusion wave for the Navier–Stokes systems in odd multi-dimensions. Commun. Math. Phys. 196, 145–173 (1998)
5. Liu, T.-P., Yu, S.-H.: The Green's function and large-time behavior of solutions for the one-dimensional Boltzmann equation. Comm. Pure Appl. Math. 57, 1543–1608 (2004)
6. Liu, T.-P., Yu, S.-H.: Boltzmann equation: micro-macro decompositions and positivity of shock profiles. Commun. Math. Phys. 246(1), 133–179 (2004)
7. Liu, T.-P., Yu, S.-H.: Green's function of Boltzmann equation, 3-D waves. Bulletin of Institute of Mathematics, Academia Sinica New Series 1(1), 1–78 (2006)

Communicated by H.-T. Yau

Commun. Math. Phys. 269, 39–86 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0114-1

Communications in

Mathematical Physics

Mirror Symmetry in Two Steps: A–I–B

Edward Frenkel1, Andrei Losev2

1 Department of Mathematics, University of California, Berkeley, CA 94720, USA.

E-mail: [email protected]

2 Institute of Theoretical and Experimental Physics, B. Cheremushkinskaya 25, Moscow 117259, Russia.

E-mail: [email protected] Received: 26 December 2005 / Accepted: 7 April 2006 Published online: 13 October 2006 – © Springer-Verlag 2006

Abstract: We suggest an interpretation of mirror symmetry for toric varieties via an equivalence of two conformal field theories. The first theory is the twisted sigma model of a toric variety in the infinite volume limit (the A–model). The second theory is an intermediate model, which we call the I–model. The equivalence between the A–model and the I–model is achieved by realizing the former as a deformation of a linear sigma model with a complex torus as the target and then applying to it a version of the T –duality. On the other hand, the I–model is closely related to the twisted Landau-Ginzburg model (the B–model) that is mirror dual to the A–model. Thus, the mirror symmetry is realized in two steps, via the I–model. In particular, we obtain a natural interpretation of the superpotential of the Landau-Ginzburg model as the sum of terms corresponding to the components of a divisor in the toric variety. We also relate the cohomology of the supercharges of the I–model to the chiral de Rham complex and the quantum cohomology of the underlying toric variety.

Introduction

Two-dimensional supersymmetric sigma models have attracted a lot of attention in recent years. These models are rich enough to display many important and non-trivial physical phenomena, understanding which may help us gain insights into more difficult models, such as the four-dimensional gauge theories. One of the most interesting phenomena is mirror symmetry which is a duality between a type A twisted sigma model and a type B twisted topological theory, such as a Landau-Ginzburg model (see, e.g., [18]). The advent of mirror symmetry has led to spectacular conjectures and results in mathematics, bringing together such diverse topics as enumerative algebraic geometry, Gromov-Witten

Partially supported by the DARPA grant HR0011-04-1-0031 and the NSF grant DMS-0303529.

Partially supported by the Federal Program 40.052.1.1.1112, by the Grants INTAS 03-51-6346, NSh-1999/2003.2 and RFFI-04-01-00637.


invariants, Floer cohomology, soliton equations and singularity theory (see the book [19] and references therein). In this paper we suggest an interpretation of mirror symmetry for toric varieties. We show that there is a certain conformal field theory (the “I–model”) that is intermediate between the type A twisted sigma model and the type B twisted Landau-Ginzburg model. On the one hand, this model is equivalent to the sigma model of a toric variety in the infinite volume limit, considered as a conformal field theory, and on the other hand its BPS sector is closely related to the BPS sector of the corresponding Landau-Ginzburg model. Let us describe this correspondence in more detail. Sigma model in the infinite volume. Consider the type A twisted N = (2, 2) supersymmetric sigma model with a target Kähler manifold M. This model is believed to define a superconformal quantum field theory if the Kähler metric is Ricci flat, i.e., if M is a Calabi-Yau manifold. However, we will argue in this paper that a suitable infinite volume limit of the twisted sigma model defines a conformal field theory for more general target manifolds. This infinite volume limit is defined at the classical level by passing to a suitable “first order formalism” Lagrangian, which has previously been considered in the literature in [29, 3, 6] and more recently in [2, 20]. Rescaling the Kähler metric by a parameter t, we find that the first order Lagrangian has a well-defined limit even as t → ∞. In this limit we obtain a conformally invariant Lagrangian, which describes what is natural to call the infinite volume limit of the twisted sigma model (see Sect. 1.1 for details). Quantization of a first order Lagrangian could be non-trivial and even problematic in some cases. However, in the twisted N = (2, 2) supersymmetric theory that we are considering it is expected that all potential anomalies cancel out and the theory remains conformally invariant at the quantum level as well. 
Moreover, the corresponding path integral over all maps Φ : Σ → M, where Σ is a Riemann surface (the worldsheet), has a nice geometric interpretation as the delta-form supported on the subspace of holomorphic maps Φ : Σ → M. When we deform the Lagrangian back to the finite volume, i.e., to finite values of t, we obtain what looks like a “smoothening” of this delta-form, or, more precisely, the Mathai-Quillen representative of the Euler class of an appropriate vector bundle over the space of maps, see [6]. Hence it is natural to think that in the infinite volume limit the path integral localizes on the holomorphic maps, i.e., it can be represented as a sum of integrals over the finite-dimensional moduli spaces of holomorphic maps of different degrees (see Sect. 1.2). This is what one expects in the type A twisted sigma model in the infinite volume limit as explained by E. Witten in [30, 31]. We wish to view the model in the infinite volume first and foremost as a topological conformal field theory. In particular, it should come with a Hilbert space combining chiral and anti-chiral states, and a state-field correspondence. Correlation functions should be defined for any Riemann surface Σ with marked points x_1, ..., x_n (and possibly germs of local coordinates at those points), and a collection of local operators inserted at those points. These correlation functions may be viewed as differential forms on the moduli space $\mathcal{M}_{g,n}$ of pointed curves (Σ, (x_i)) (see Sect. 1.3). Part of this structure is captured by the Gromov-Witten invariants, which appear as integrals of the differential forms corresponding to particular observables over a compactification of $\mathcal{M}_{g,n}$ (see Sect. 1.3 for more details). Another ingredient of this conformal field theory is a sheaf of chiral algebras over M, called the chiral de Rham complex, introduced in [26]. It is defined by gluing free chiral algebras on the overlaps of open subsets of M isomorphic to C^n. From the point


of view of the twisted sigma model, this chiral algebra corresponds to the cohomology of the right moving supercharge in the perturbative regime, i.e., without counting the instanton contributions, as explained in [34, 20]. In order to understand the correlation functions of the sigma model and in particular to include the instanton corrections, it is necessary to go beyond the chiral algebra and consider the full conformal field theory. This is one of the goals of the present paper.

Non-linear sigma models as deformations of free field theories. There is one case when the sigma model can certainly be defined as a conformal field theory, and this is the case of the target manifolds with a flat metric, such as a flat space C^n or a torus (for a detailed treatment of the latter, see [21]). We will consider in Sect. 2 the intermediate case of the sigma model in the infinite volume with the target manifold a complex torus (C×)^n, which we call the toric sigma model. This is a free conformal field theory, but we will show that it exhibits some non-trivial effects, such as the appearance of holomorphic analogues of vortex operators, which we call holomortex operators. We will then define in Sects. 3 and 4 the conformal field theory governing a non-linear sigma model of a toric variety in the infinite volume as a deformation, in the sense of A. Zamolodchikov [35], of the toric sigma model, by some explicitly written exactly marginal operators. By its very definition, this deformed conformal field theory will include the instanton effects corresponding to holomorphic maps of non-zero degree. To illustrate our main idea, it is instructive to look at the case of the sigma model with the target P^1 in the infinite volume limit, obtained by quantization of the corresponding first order Lagrangian. We wish to obtain it as a deformation of the toric sigma model with the target C× which we realize as the quotient C/2πiZ. This is a free conformal field theory with the basic chiral fields X(z), p(z), ψ(z), π(z), and their anti-chiral partners with the action

$$\frac{i}{2\pi}\int_\Sigma d^2z \left( p\,\partial_{\bar z} X + \bar p\,\partial_z \bar X + \pi\,\partial_{\bar z}\psi + \bar\pi\,\partial_z\bar\psi \right). \qquad (0.1)$$

The field X(z) corresponds to a linear coordinate on C/2πiZ, and so is defined modulo 2πiZ. As discussed above, the correlation functions of this model are given by integrals over the space of holomorphic maps Σ → C×. For compact Σ, all such maps are necessarily constant. Therefore the correlation functions reduce to integrals over the zero mode (i.e., over the image of the constant map Φ : Σ → C×), as expected in a free field theory. How can we interpret holomorphic maps Σ → P^1 within the framework of this free field theory? Such maps may be viewed as holomorphic maps Φ : Σ\{w_i^±} → C/2πiZ with logarithmic singularities at some points w_1^±, ..., w_N^±, where this map behaves as ± log(z − w_i^±). These singular points correspond to zeroes and poles of exp Φ, and generically they will be distinct. Our proposal is that we can create these singularities of Φ by inserting in the correlation function of the linear sigma model certain vertex operators Ψ_±(w_i^±). The defining property of the operators Ψ_±(w) (up to a scalar) is that their operator product expansion (OPE) with X(z) should read

$$X(z)\,\Psi_\pm(w) = \pm\log(z-w)\,\Psi_\pm(w). \qquad (0.2)$$


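Given such operators, we can write a given function (in the case of Σ of genus zero)

$$\Phi(z) = c + \sum_{i=1}^{n}\log(z-w_i^+) - \sum_{i=1}^{n}\log(z-w_i^-)$$

as the correlator

$$\Phi(z) = \Bigl\langle X(z)\,\prod_{i=1}^{n}\Psi_+(w_i^+)\,\prod_{i=1}^{n}\Psi_-(w_i^-)\;\delta^2(X(\infty)-c)\,\psi(\infty)\bar\psi(\infty)\Bigr\rangle$$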

(the term involving the delta-function and the fermions will give, upon the integration over the zero modes of X and ψ, the normalization condition Φ(∞) = c). Thus, we can create all instantons of the P^1 sigma model, that is holomorphic maps Σ → P^1, as correlation functions in the toric sigma model of the above form (the case of Σ of genus greater than zero will be discussed in Sect. 3.1). The property (0.2) is satisfied by the following fields:

$$\Psi_\pm(w,\bar w) = \exp\left(\mp i\int_{w_0}^{w}\bigl(p(z)\,dz + \bar p(\bar z)\,d\bar z\bigr)\right),$$

which are examples of the holomortex operators mentioned above. Including these operators in the correlation functions and allowing the points w_i^± to vary over Σ is equivalent to deforming the action (0.1) with the term

$$q^{1/2}\int_\Sigma\bigl(\Psi_+^{(2)} + \Psi_-^{(2)}\bigr),$$

where $\Psi_\pm^{(2)}$ are the cohomological descendants

$$\Psi_\pm^{(2)} = \Psi_\pm(w,\bar w)\,\pi(w)\,\bar\pi(\bar w)\,dw\,d\bar w.$$

The resulting deformed theory appears to be equivalent to the sigma model with the target P^1 in the infinite volume limit (in the sense explained in Sect. 3). By construction, the part of a correlation function of this deformed theory that corresponds to degree n maps Σ → P^1 will appear with the overall factor q^n. More generally, suppose that we are given a smooth compact Kähler manifold M with an open dense submanifold M_0 with a linear structure. The complement C = M\M_0 is a compactification divisor, which is a union of irreducible components C_1, ..., C_N. The linear sigma model corresponding to M_0 is a free superconformal field theory, and we wish to describe the non-linear model with the target M in terms of this theory. Let us observe that a generic holomorphic map Φ : Σ → M will take values in C at a finite set of points x_1, ..., x_n, and generically we will have Φ(x_j) ∈ C_{k_j} and Φ(x_j) ∉ C_l, l ≠ k_j. To account for such maps we need to insert some vertex operators Ψ_{k_j} corresponding to the compactification divisors C_{k_j} at the points x_j, j = 1, ..., n. It is then natural to expect that the non-linear sigma model with the target M in the infinite volume limit can be described as the deformation of the free field theory corresponding to the target M_0 by means of the operators $\Psi_k^{(2)}$, k = 1, ..., N, where $\Psi_k^{(2)}$ is the (1, 1)–form counterpart of Ψ_k obtained via the cohomological descent. To solve the theory we therefore need to identify explicitly the suitable vertex operators Ψ_k, k = 1, ..., n, corresponding to the compactification divisors. In general, they may be highly non-local and given by very complicated formulas.
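For the P^1 example above, it may help to see explicitly why a function with the prescribed logarithmic singularities is the same thing as a degree n holomorphic map. The following is a sketch only; it uses Φ for the map and w_i^± for the insertion points (symbols that are assumed here, since they are partly lost in this copy of the text), and it takes all 2n points to be distinct.

```latex
% Sketch: exponentiating the genus-zero instanton profile.
\begin{align*}
\Phi(z) &= c + \sum_{i=1}^{n}\log(z-w_i^+) - \sum_{i=1}^{n}\log(z-w_i^-), \\
e^{\Phi(z)} &= e^{c}\,\prod_{i=1}^{n}\frac{z-w_i^+}{z-w_i^-},
\qquad e^{\Phi(z)} \to e^{c} \ \text{ as } z\to\infty .
\end{align*}
```

The right-hand side is a rational function with n simple zeroes at the w_i^+ and n simple poles at the w_i^-, that is, a degree n map to P^1. Each insertion carries a factor q^{1/2} from the deformation term, so n insertions of each sign give the overall weight q^n quoted above, and the limit at infinity reproduces the normalization Φ(∞) = c.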


While finding these operators may seem like a daunting task in general, it turns out that in the case when the target is a toric variety, they can be written down quite explicitly. Such a variety M comes with a natural open dense subset M0 isomorphic to (C× )n and the compactification divisors are naturally parameterized by the one-dimensional cones in the fan defining M. We construct explicitly the vertex operators corresponding to these compactification divisors in Sect. 4. These operators may be viewed as holomorphic counterparts of the vortex operators familiar from the free bosonic theory compactified on a torus. We will argue that the deformation of the action by these operators changes the topology of the target manifold and deforms a free field theory to a non-linear sigma model with the target M. As in the case of P1 , we expect that the sigma model with the target M, which is a smooth compact Fano toric variety, is equivalent to a deformation of the free field theory with the target (C× )n by the holomortex operators corresponding to the irreducible components of the compactification divisor. As a consistency check, we compute in Sect. 5.4 the cohomology of the right moving supercharge in our deformed theory, making a connection to the results of L. Borisov [4] and F. Malikov–V. Schechtman [25]. In particular, we show that in the case of M = Pn this cohomology is equal to the quantum cohomology of Pn . On the other hand, in a certain limit we obtain the cohomology of the chiral de Rham complex of M. This is consistent with the assertion of [34, 20] that the chiral de Rham complex should appear as the cohomology of the right moving supercharge of the type A twisted sigma model in the perturbative regime.

I–model and mirror symmetry. Next, we consider the question as to what is the meaning of mirror symmetry from the point of view of our description of the sigma model of a toric variety as a deformation of a free field theory. The first step in answering this question is to perform a kind of T–duality transform of the free field theory with the target (C×)^n. In the case of P^1, before the deformation, we have the free field theory with the target C×. The dual of this theory turns out to be the ordinary sigma model with the target being the cylinder R × S^1 equipped with the metric of Minkowski signature. Let R and U be the coordinates on R and S^1 = R/2π, respectively. Under the T–duality the local fields p and X become more complicated, but the complicated fields, like the holomortex operators, become simple. In fact, we have the following transformation:

$$p\,dz + \bar p\,d\bar z = dU,$$

and so the holomortex operators Ψ_± turn out to be simply the exponential fields e^{∓iU}. The field R coincides with the field ½(X + \bar X) of the original theory. Therefore e^R coincides with the field |e^X|, the absolute value of the holomorphic coordinate on P^1 compactifying the target C×. The action of the deformed dual theory reads

$$\frac{i}{2\pi}\int d^2z\left(\partial_z U\,\partial_{\bar z}R + \partial_{\bar z}U\,\partial_z R + \pi\,\partial_{\bar z}\psi + \bar\pi\,\partial_z\bar\psi\right) + q^{1/2}\int\bigl(e^{iU}+e^{-iU}\bigr)\,\pi\bar\pi\,d^2z. \qquad (0.3)$$

Thus, the correlation functions of the observables of this theory that depend only on the field R realize the corresponding correlation functions of the twisted sigma model, namely, those that depend only on |e^X|. But while the correlation functions of the twisted


sigma model appear as sums over the instanton contributions, the dual description gives us their non-perturbative realization! Let us compare the action (0.3) to the action of the Landau-Ginzburg model with the target C and the Landau-Ginzburg superpotential W = q^{1/2}(e^{iY} + e^{−iY}), where Y is a chiral superfield:

$$\frac{1}{2\pi}\int d^2z\left(\partial_z\varphi\,\partial_{\bar z}\bar\varphi + \partial_{\bar z}\varphi\,\partial_z\bar\varphi + i\chi_+\partial_{\bar z}\bar\chi_+ + i\chi_-\partial_z\bar\chi_-\right) + q^{1/2}\int\bigl(e^{i\varphi}+e^{-i\varphi}\bigr)\,\chi_+\chi_-\,d^2z. \qquad (0.4)$$

We observe that the two actions look similar: if we “analytically continue” the theory with the action (0.3), allowing the fields U and R to become complex-valued fields ϕ and ϕ, which are complex conjugate to each other, and rename the fermions as follows: π → χ − , π → χ+ , ψ → χ − , ψ → χ + , then the action (0.3) becomes the action (0.4). This means that the correlation functions in the two theories should be related by a kind of analytic continuation. However, we wish to stress the models with the actions (0.3) and (0.4) are different. For example, in the model (0.3) the field U is real periodic, and R is real non-periodic, while in the model (0.4) the fields ϕ and ϕ are complex (conjugate to each other) and both periodic. It is instructive to compare the supersymmetry charges in the above models. For simplicity we consider the case when q = 0. In the original A–model with the action (0.1) the supercharge is (ψ pdz + ψ pdz). This is a de Rham type supercharge, because under its action X → ψ, X → ψ. In the T –dual theory with the action (0.3) (with q = 0) the supercharge becomes (ψ∂z U dz + ψ∂ z U dz). Under the “analytic continuation” that we discussed above, it becomes the supercharge of the type B twisted Landau-Ginzburg model with the action (0.4) (with q = 0): (χ − ∂z ϕdz + χ + ∂ z ϕdz). This is now a Dolbeault type supercharge, because under its action ϕ → 0, ϕ → χ − +χ + . Thus, the T –duality indeed transforms a de Rham type supercharge of the A-model to a Dolbeault type supercharge of the B–model, as expected in mirror symmetry. Note that the interpretation of the fermionic fields is very different in the two theories, and this underscores the highly non-local nature of the mirror symmetry. Traditionally the Landau-Ginzburg model to the action of the is defined by adding supersymmetric linear sigma model the term d 2 zd 2 θ W (Y )+ d 2 zd 2 θ W (Y ). Usually, one chooses W (Y ) to be complex conjugate of W (Y ). 
But in a type B twisted LandauGinzburg model there is an essential difference between the first and the second terms:


while the integrand in the first one is a (1, 1)–form, the integrand in the second is a (0, 0)–form, and hence to integrate it one needs to pick a metric on the worldsheet. This breaks conformal invariance. That is why in the action (0.4) we have set W = 0, for otherwise the theory would not be conformally invariant. The Landau-Ginzburg model with the (twisted) superpotential W , where W is as above, and its complex conjugate W has been considered by K. Hori and C. Vafa [18] (see also [13, 5, 7, 16]). They showed that its correlation functions in the BPS sector are related to those of the twisted sigma model of P1 , which is the sense in which the two theories are mirror dual to each other. Note that W is Q–exact, and the possibility of setting W to 0 was mentioned in [22] and [18], Sect. 6. The point of our construction is that in addition to the twisted sigma model and the Landau-Ginzburg model, which are usually considered in the study of mirror symmetry, there is an intermediate model, or the “I–model”, described by the action (0.3). This is a conformal field theory that has two properties: on the one hand it should be equivalent to the type A twisted sigma model with the target P1 in the infinite volume, which is also a conformal field theory. In other words, all correlation functions in the two models are equivalent, not just in the BPS sector. On the other hand, the BPS sector of the I–model is closely related to the BPS sector of the type B twisted Landau-Ginzburg model considered in [18] (see the discussion in Sect. 3.2 for more details). This conclusion leads to a curious observation that the correlation functions of the field e R in the I–model (which corresponds to eϕ in the Landau-Ginzburg model (0.4)) encode the correlation functions of the field |e X | of the sigma model with the target P1 . 
Thus, one can actually see the P1 instantons, and not just the correlation functions of the BPS states, in the framework of the I–model (or the Landau-Ginzburg model)! We define a similar I–model for an arbitrary toric variety. Then the corresponding N −iU deformation term in the Lagrangian is equal to the sum k=1 e k π(k) π (k) over the components of the compactification divisor of our toric variety. The fields Uk satisfy constraints reflecting the structure of the fan defining the toric variety M. For example, in the case when M = Pn we have N = n + 1, and the fields Uk satisfy the familiar constraint

n+1 −iU k = q. Thus, we immediately recognize that, after the analytic continuation, k=1 e we obtain a term that looks like the Landau-Ginzburg superpotential corresponding to Pn considered in [18]. We note that these superpotentials and the corresponding oscillating integrals representing correlation functions of the Landau-Ginzburg model had previously appeared in the mathematical work of A. Givental [16] on mirror symmetry. We stress that in our approach the superpotential is generated because of our description of the sigma model with the target M (in the infinite volume limit) as a deformation of a free field theory, to which we apply the T –duality transform. Therefore the superpotential has a transparent geometric meaning. Namely, the summands appearing in the superpotential naturally correspond to the irreducible components of the compactification divisor in M. The mirror symmetry can now be viewed as a corollary of the equivalence of the I–model and the A–model (sigma model with the target M in the infinite volume limit), as conformal field theories. We hope that the I–model will help us understand more fully the phenomenon of mirror symmetry.1 (1) In the case of Pn , the action of the I–model is very similar to the action of the An−1 affine Toda field theory, considered as a deformation of a free field theory. However, since the I–model is conformally invariant, its structure is actually more reminiscent of that of the conformal An−1 Toda field theory. We can use the methods familiar from 1 It is instructive to compare our derivation of mirror symmetry to A. Polyakov’s model of confinement in three dimensions [27].


the Toda theory to determine the structure of the chiral sector of the I–model. We recall that in the case of an An−1 Toda field theory the chiral algebra of integrals of motion is the Wn –algebra [8, 12]. It appears as the subalgebra of those operators of the free field theory which commute with the screening operators, which are the residues of the operators deforming the action. Likewise, the W–algebra in the I–model associated to a toric variety M consists of the operators that commute with the operators −iU e k π(k) dz, k = 1, . . . , n + 1 (which can therefore be viewed as supersymmetric analogues of the screening operators), and it is possible to determine it explicitly. In doing so, we make a connection to the results of [4] (see also [9, 17, 25]) and show that this W–algebra is isomorphic to the algebra of global sections of the chiral de Rham complex on M. In a follow-up paper we will generalize our results to hypersurfaces in toric varieties, and, more generally, to complete intersections in toric varieties. This way we hope to obtain a realization of mirror symmetry for such varieties as an equivalence of conformal field theories in the sense explained above. In a future work we plan to consider an analogue of this construction for the (0, 2) supersymmetric sigma models. We believe that in the case when M is a flag manifold of a simple Lie group, this theory, when coupled to gauge theory, is closely related to the geometric Langlands correspondence. We also plan to apply similar methods to the study of four-dimensional supersymmetric Yang-Mills theories. The paper is organized as follows. In Sect. 1 we discuss the sigma model in the infinite volume limit, at both classical and quantum levels. We explain how the first order Lagrangian (with a B–field term) arises in the infinite volume limit and the interpretation of the corresponding path integrals as integrals of differential forms on the moduli spaces of holomorphic maps. 
We then outline our idea of constructing non-linear sigma models as deformations of linear ones. We illustrate this idea on the example of the deformation of the target manifold from C to P1 . In Sect. 2 we introduce the toric sigma model, which is the linear sigma model with the target C× in the infinite volume. We define the holomortex operators and the T –duality transform. We show that the T –dual model of the toric sigma model is the ordinary sigma model with the target being the cylinder equipped with a metric of Minkowski signature. In Sect. 3 we consider a deformation of the toric sigma model to the sigma model with the target P1 . We then define the T–dual theory, which is our I–model. We give a sample computation of the correlation functions in the I–model and obtain explicit formulas for the supercharges. We generalize these results to the case of an arbitrary compact smooth toric variety in Sect. 4. Finally, we discuss the operator formalism of these theories in Sect. 5, as well as their W–algebras and the cohomologies of the supersymmetry charges. 1. Supersymmetric Sigma Model in the Infinite Volume Limit 1.1. Lagrangian description. We start by describing the A twisted N = (2, 2) supersymmetric sigma model in the formalism of the first order, following [29, 3, 6]. Let be a complex Riemann surface (worldsheet). We denote by z and z the local holomorphic and anti-holomorphic coordinates on , and by d 2 z = idz ∧ dz the corresponding integration measure on . Let M be a complex Kähler manifold (target) with a fixed Kähler metric gab . We will denote by X a , a = 1, . . . , N = dim M, local holomorphic coordinates on M, and by X a = X a their complex conjugates. Given a map : → M, we consider the pull-backs of X a and X a as functions on , denoted by the same symbols. We also

Mirror Symmetry in Two Steps: A–I–B


have fermionic fields $\psi^a$ and $\psi^{\bar a}$, $a = 1, \ldots, N$, which are sections of $\Phi^*(T^{1,0}M)$ and $\Phi^*(T^{0,1}M)$, respectively. The Levi-Civita connection on $TM$ corresponding to the metric $g_{a\bar b}$ induces a connection on $\Phi^*(TM)$. The corresponding covariant derivatives have the form
\[
D_{\bar z}\psi^a = \partial_{\bar z}\psi^a + \partial_{\bar z}X^b \cdot \Gamma^a_{bc}\,\psi^c, \qquad
D_z\psi^{\bar a} = \partial_z\psi^{\bar a} + \partial_z X^{\bar b} \cdot \Gamma^{\bar a}_{\bar b\bar c}\,\psi^{\bar c},
\]
where $\Gamma^a_{bc} = g^{a\bar b}\,\partial_b g_{c\bar b}$.

Next, we introduce auxiliary fields $p_a$ which will play the role of the “Lagrange multipliers” corresponding to the equations $\partial_{\bar z}X^a = 0$, and their complex conjugates $p_{\bar a}$. Their fermionic super-partners will be denoted by $\pi_a$ and $\pi_{\bar a}$. These are sections of $\Phi^*(\Omega^{1,0}M) \otimes \Omega^{1,0}\Sigma$ and $\Phi^*(\Omega^{0,1}M) \otimes \Omega^{0,1}\Sigma$, respectively. We write down the action for these fields following [29] (formula (2.14)):
\[
I_t = \frac{1}{2\pi}\int_\Sigma d^2z \left( i p_a \partial_{\bar z}X^a + i p_{\bar a}\partial_z X^{\bar a} + i\pi_a D_{\bar z}\psi^a + i\pi_{\bar a} D_z\psi^{\bar a} - t^{-1} R^{a\bar b}{}_{c\bar d}\,\pi_a\pi_{\bar b}\psi^c\psi^{\bar d} + t^{-1} g^{a\bar b} p_a p_{\bar b} \right), \tag{1.1}
\]
where $t$ is a parameter (the “radius”). The equations of motion for $p_a$, $p_{\bar a}$ are as follows:
\[
p_a = -it\, g_{a\bar b}\,\partial_z X^{\bar b}, \qquad p_{\bar a} = -it\, g_{\bar a b}\,\partial_{\bar z}X^b. \tag{1.2}
\]

Remark 1.1. Formulas (1.2) seem to indicate that the complex conjugate of $p_a$ is equal to $-p_{\bar a}$, which is misleading. In fact, the substitution (1.2) is formal and only makes sense under the path integral. It corresponds to completing the action to a square and integrating out the variables $p_a$ and $p_{\bar a}$.

Substituting these formulas back into (1.1), we obtain the action
\[
I_t = \frac{1}{2\pi}\int_\Sigma d^2z \left( t\, g_{a\bar b}\,\partial_{\bar z}X^a \partial_z X^{\bar b} + i\pi_a D_{\bar z}\psi^a + i\pi_{\bar a} D_z\psi^{\bar a} - t^{-1} R^{a\bar b}{}_{c\bar d}\,\pi_a\pi_{\bar b}\psi^c\psi^{\bar d} \right).
\]
This is the action of the A–twisted N = (2, 2) supersymmetric sigma model with the target $M$ and the B–field $-\frac{t}{2\pi}\omega$, where $\omega = \frac{i}{2} g_{a\bar b}\, dX^a \wedge dX^{\bar b}$ is the Kähler form on $M$, introduced in [29, 31]. The corresponding metric on $M$ is $t g_{a\bar b}$. Thus, the action (1.1) describes this model. In the infinite volume limit $t \to \infty$ the action (1.1) becomes
\[
I_\infty = \frac{i}{2\pi}\int_\Sigma d^2z \left( p_a \partial_{\bar z}X^a + p_{\bar a}\partial_z X^{\bar a} + \pi_a D_{\bar z}\psi^a + \pi_{\bar a} D_z\psi^{\bar a} \right). \tag{1.3}
\]
This action is conformally invariant, and it has two supersymmetries: one is mapping
\[
X^a \to \psi^a, \qquad \psi^a \to 0, \qquad \pi_a \to -p_a - \Gamma^b_{ac}\,\pi_b\psi^c, \qquad p_a \to \Gamma^b_{ac}\, p_b\psi^c,
\]
and the other does the same to their complex conjugates.
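As a check of the passage from (1.1) to the second-order action, the following sketch carries out the substitution (1.2) on the bosonic terms only. The shorthand $L_p$ for the $p$-dependent part of the integrand is introduced here for illustration and is not notation from the paper; as Remark 1.1 stresses, the step is formal and is meant under the path integral.

```latex
% Bosonic terms of (1.1) that contain p (L_p is shorthand introduced for this check):
%   L_p = i p_a \partial_{\bar z} X^a + i p_{\bar a} \partial_z X^{\bar a}
%         + t^{-1} g^{a\bar b} p_a p_{\bar b}.
\begin{align*}
\frac{\partial L_p}{\partial p_{\bar b}} = 0
  &\;\Longrightarrow\; i\,\partial_z X^{\bar b} + t^{-1} g^{a\bar b} p_a = 0
   \;\Longrightarrow\; p_a = -it\, g_{a\bar b}\,\partial_z X^{\bar b}, \\
\frac{\partial L_p}{\partial p_a} = 0
  &\;\Longrightarrow\; i\,\partial_{\bar z} X^{a} + t^{-1} g^{a\bar b} p_{\bar b} = 0
   \;\Longrightarrow\; p_{\bar b} = -it\, g_{a\bar b}\,\partial_{\bar z} X^{a}, \\
L_p\Big|_{(1.2)}
  &= t\, g_{a\bar b}\,\partial_{\bar z}X^a \partial_z X^{\bar b}
   + t\, g_{a\bar b}\,\partial_{\bar z}X^a \partial_z X^{\bar b}
   - t\, g_{a\bar b}\,\partial_{\bar z}X^a \partial_z X^{\bar b} \\
  &= t\, g_{a\bar b}\,\partial_{\bar z}X^a \partial_z X^{\bar b}.
\end{align*}
```

The two linear terms each contribute $t\, g_{a\bar b}\,\partial_{\bar z}X^a\partial_z X^{\bar b}$ and the quadratic term removes one copy, which reproduces the bosonic term of the second-order action.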


E. Frenkel, A. Losev

1.2. The path integral. The action (1.3) describes a conformal field theory governing the infinite volume limit of the A–twisted sigma model. We wish to understand the corresponding quantum field theory. The first observation is that the path integral $\int [Dp][D\pi]\, e^{-I_\infty}$, considered as a differential form on the space of maps $\Sigma \to M$, may be viewed as the integral representation of the delta-function differential form supported on the space of holomorphic maps $\Sigma \to M$. To see this, consider a finite-dimensional model situation: a complex vector space $\mathbb{C}^M$ and functions $f^a$, $a = 1, \ldots, N$, defining a codimension $N$ complex subvariety $C \subset \mathbb{C}^M$. Then the delta-like differential form supported on this subvariety has the following integral representation:
\[
\delta_C = \int \prod_a dp_a\, dp_{\bar a}\, d\pi_a\, d\pi_{\bar a}\, \exp\left( -i p_a f^a - i p_{\bar a}\overline{f^a} - i\pi_a\, df^a - i\pi_{\bar a}\, d\overline{f^a} \right).
\]
This delta-form may be viewed as the limit, when $t \to \infty$, of the regularized integral
\[
\delta_{C,t} = \int \prod_a dp_a\, dp_{\bar a}\, d\pi_a\, d\pi_{\bar a}\, \exp\left( -i p_a f^a - i p_{\bar a}\overline{f^a} - i\pi_a\, df^a - i\pi_{\bar a}\, d\overline{f^a} - t^{-1} p_a p_{\bar a} \right).
\]
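To see concretely why the regularized integral tends to a delta-form, one can work out the bosonic part for a single variable. The sketch below assumes one complex $p = p_1 + i p_2$ and one function $f = f_1 + i f_2$, with the measure taken to be $dp_1\,dp_2$; a different normalization of the measure changes only the overall constant.

```latex
% One complex variable, bosonic part only. Assumed measure: dp_1 dp_2.
% Uses \int dp\, e^{-a p^2 - 2 i p f} = \sqrt{\pi/a}\; e^{-f^2/a} with a = 1/t.
\begin{align*}
p f + \bar p \bar f &= 2\,(p_1 f_1 - p_2 f_2), \qquad p\bar p = p_1^2 + p_2^2, \\
\int dp_1\, dp_2\; e^{-i(p f + \bar p \bar f) - t^{-1} p \bar p}
  &= \left(\sqrt{\pi t}\; e^{-t f_1^2}\right)\left(\sqrt{\pi t}\; e^{-t f_2^2}\right)
   = \pi t\; e^{-t |f|^2}, \\
\int df_1\, df_2\; \pi t\; e^{-t|f|^2} &= \pi^2 \quad \text{for every } t > 0 .
\end{align*}
```

The result is a Gaussian of width $t^{-1/2}$ around $f = 0$ whose total integral does not depend on $t$, so as $t \to \infty$ it concentrates on the locus $f = 0$, which is the behavior claimed for $\delta_{C,t}$.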

Comparing these formulas to (1.1) and (1.3), we see that the path integral
\[
\int [Dp][D\pi]\, e^{-I_\infty} \tag{1.4}
\]
looks like a delta-like form supported on the solutions of the equation $\partial_{\bar z}X^a = 0$, i.e., on the holomorphic maps, while $\int [Dp][D\pi]\, e^{-I_t}$ may be viewed as its regularized version. Alternatively, and more precisely, one may say that the integral $\int [Dp][D\pi]\, e^{-I_t}$ looks like the Mathai-Quillen representative of the Euler class of an appropriate vector bundle over the space of maps $\Sigma \to M$ (see [6], Section 13.6).

Motivated by this analogy, it is natural to expect that in the infinite volume limit the correlation functions in our theory will correspond to sums of integrals of differential forms over different connected components of the moduli space of holomorphic maps $\Sigma \to M$, as explained in [30]. Particular examples of these functions give rise to the Gromov-Witten invariants of $M$ [31].

The connected components of the moduli space of holomorphic maps $\Sigma \to M$ are labeled by $H_2(M)$. Choosing a basis in $H_2(M)$, we can label them by $k$–tuples of integers $(n_1, \ldots, n_k)$. It is customary to weight the contribution to the path integral corresponding to the component of the space of holomorphic maps $\Sigma \to M$ of degree $(n_1, \ldots, n_k)$ with the coefficient $q_1^{n_1} \cdots q_k^{n_k}$ (we choose this basis in such a way that non-zero contributions come from $n_i \ge 0$). This can be achieved by adding to the action $I_\infty$ the topological term $\sum_i \frac{u_i}{2\pi}\int_\Sigma \Phi^*(\Omega^i)$. Here $\{\Omega^i\}$ is the basis of the Kähler cone of $M$ that is dual to the above basis of $H_2(M)$ and the $u_i$’s are the coupling constants such that $q_i = e^{-u_i}$. The corresponding path integral is then the sum over $n_1, \ldots, n_k \ge 0$ of terms corresponding to the holomorphic maps $\Sigma \to M$ of degrees $(n_1, \ldots, n_k)$ with coefficients $q_1^{n_1} \cdots q_k^{n_k}$. This path integral may be obtained as the $t \to \infty$ limit of a sigma model path integral as follows. We simply add to the action the B–field $-\frac{t}{2\pi}\omega + \frac{1}{2\pi}\Omega$, where


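$\omega = \frac{i}{2} g_{a\bar b}\, dX^a \wedge dX^{\bar b}$ is the Kähler form on $M$ and $\Omega = \sum_i u_i \Omega^i$. Then the bosonic part of the action will read
\[
\begin{aligned}
&\frac{1}{2\pi}\int_\Sigma d^2z \left[ \frac{t}{2}\left( g_{a\bar b}\,\partial_{\bar z}X^a\partial_z X^{\bar b} + g_{a\bar b}\,\partial_z X^a\partial_{\bar z}X^{\bar b} \right) + \frac{t}{2}\left( g_{a\bar b}\,\partial_{\bar z}X^a\partial_z X^{\bar b} - g_{a\bar b}\,\partial_z X^a\partial_{\bar z}X^{\bar b} \right) \right] + \sum_i \frac{u_i}{2\pi}\int_\Sigma \Phi^*(\Omega^i) \\
&\qquad = \frac{1}{2\pi}\int_\Sigma d^2z\; t\, g_{a\bar b}\,\partial_{\bar z}X^a\partial_z X^{\bar b} + \sum_i \frac{u_i}{2\pi}\int_\Sigma \Phi^*(\Omega^i).
\end{aligned}
\]
In terms of the first order variables this becomes
\[
\frac{1}{2\pi}\int_\Sigma d^2z \left( i p_a\partial_{\bar z}X^a + i p_{\bar a}\partial_z X^{\bar a} + t^{-1} g^{a\bar b} p_a p_{\bar b} \right) + \sum_i \frac{u_i}{2\pi}\int_\Sigma \Phi^*(\Omega^i).
\]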

Therefore in the limit $t \to \infty$ the path integral will indeed give us the desired sum over $n_1, \ldots, n_k \ge 0$ weighted with coefficients $q_1^{n_1} \cdots q_k^{n_k}$, where $q_i = e^{-u_i}$.

A proper definition of the path integral (1.4) for worldsheets of genus greater than zero requires a prescription for the integration of the zero modes of the fields $p_a$ and $p_{\bar a}$.² The most evident possibility is to add a term of the form $\epsilon\, G^{a\bar b} p_a p_{\bar b}$ to the action and consider the limit $\epsilon \to 0$. However, if we choose $G^{a\bar b}$ to be the inverse of a Kähler form on $M$, this will bring us back to the finite volume and spoil conformal invariance if $M$ is not Calabi-Yau. But we can take $G^{a\bar b}$ to be any tensor in $T^{1,1}M$ of the following form. Suppose that we have a flat Kähler metric on an open dense subset $M_0$ of $M$, such that its inverse is a section of $T^{1,1}M_0$ that extends to a section on the entire $M$. We can then take this extension as our $G^{a\bar b}$. Then we can regularize the integrals over the zero modes of the $p_a$’s and $p_{\bar a}$’s without violating conformal invariance of the theory. Such tensors can be easily constructed for Fano toric varieties, and we will see examples of that below. We also remark that for general Fano manifolds the zero modes disappear altogether when the genus of $\Sigma$ is fixed and the degree of the map $\Sigma \to M$ is sufficiently high.

² We thank N. Nekrasov for a discussion of this point.

Remark 1.2. The action (1.3) is conformally invariant, and we expect that the corresponding quantum field theory is also conformally invariant, for any Kähler manifold $M$. However, in the case of non-Ricci-flat Kähler manifolds a non-zero β–function develops and the theory becomes non-conformal for finite values of $t$, even though the deformation to finite volume is achieved by adding the operator $V = \sum_{a,\bar b} g^{a\bar b} p_a p_{\bar b}$ of dimension (1,1). In general, consider a basis $V_a$ in the space of operators of dimension (1,1). The $|z - w|^{-2}$ term in their operator product expansion reads as follows:
\[
V_a(z) V_b(w) \sim \frac{C^c_{ab}\, V_c(w)}{|z - w|^2}.
\]
Then the theory with interaction $\sum_a t^a V_a$ has the beta-function equal to $t^a t^b C^c_{ab} V_c$. In our case, the OPE of the above operator $V$ with itself contains $|z - w|^{-2}$ with the coefficient proportional to $R^{a\bar b} p_a p_{\bar b}$, where $R_{a\bar b}$ is the Ricci curvature of $M$ [23]. Therefore, if $M$


is not Calabi-Yau, the sigma model in the finite volume is not conformally invariant. However, in the infinite volume limit the beta-function vanishes and the theory becomes conformally invariant, even for manifolds that are not Calabi-Yau.

1.3. Correlation functions. Correlation functions in our model are defined for any Riemann surface $\Sigma$ with marked points $x_1, \ldots, x_n$, and a collection of local operators inserted at those points. In a general conformal field theory with central charge $c = 0$ correlation functions are functions on the moduli space $\mathcal{M}_{g,n}$ of pointed curves $(\Sigma, (x_i))$.³ But our theory carries a supersymmetry charge $Q$ such that the stress tensor $T(z)$ is $Q$–exact: $T(z) = [Q, G(z)]_+$, and similarly for the anti-chiral fields, and so it has the structure of topological conformal field theory. In a topological conformal field theory we can construct not only functions, but also differential forms on the moduli space $\mathcal{M}_{g,n}$, by inserting integrals of the fields $G(z)$ and $\bar G(\bar z)$ (see [33, 36]).

Let us recall this construction. Suppose for simplicity that $n > 0$, and let $O_1, \ldots, O_n$ be some local operators inserted at the points $x_1, \ldots, x_n$. We will explain how to construct holomorphic differential forms. The construction is easily generalized to arbitrary forms. We note that the holomorphic tangent space to the moduli space $\mathcal{M}_{g,n}$ at $(\Sigma, (x_i))$ is isomorphic to the double quotient
\[
\Gamma(\Sigma \setminus \{x_1, \ldots, x_n\}, T^{1,0}\Sigma) \,\backslash\, \bigoplus_{i=1}^n \mathbb{C}((t_i))\,\partial_{t_i} \,/\, \bigoplus_{i=1}^n \mathbb{C}[[t_i]]\,\partial_{t_i},
\]
where $T^{1,0}\Sigma$ is the holomorphic tangent bundle of $\Sigma$ (see, e.g., [14], Sect. 17.3, and references therein). Now any holomorphic vector field on the punctured disc near $x_i$, $\xi_i = f_i(t_i)\,\partial_{t_i} \in \mathbb{C}((t_i))\,\partial_{t_i}$, defines a tangent vector in $T^{1,0}_{(\Sigma,(x_i))}\mathcal{M}_{g,n}$. To define a differential $(k, 0)$–form on $\mathcal{M}_{g,n}$ corresponding to $O_1, \ldots, O_n$ we need to describe its values on $k$–tuples of holomorphic tangent vectors of the above form. Let us suppose that we have tangent vectors corresponding to the vector fields $\xi_j^{(1)}, \ldots, \xi_j^{(\alpha_j)}$ at the point $x_j$. Then, by definition, the value of this $(k, 0)$–form on these tangent vectors is just the correlation function
\[
\left\langle \prod_{j=1}^n \oint \xi_j^{(1)} G(z_j^{(1)}) \cdots \oint \xi_j^{(\alpha_j)} G(z_j^{(\alpha_j)})\, O_j \right\rangle.
\]
In other words, we “dress” the local operator inserted at $x_j$ by contour integrals of $G(z)$ coupled to the vector fields $\xi_j^{(1)}, \ldots, \xi_j^{(\alpha_j)}$. To obtain more general differential forms, we should use the anti-chiral field $\bar G(\bar z)$ as well. If the observables $O_j$ have definite fermionic charges, then among all of these differential forms there is at most one that is non-zero. Its degree is determined by the corresponding fermionic charge conservation law.

What do these differential forms look like? Typical observables of the theory are differential forms on $M$, and $Q$ acts on them as the de Rham differential. Let $\mathcal{M}_{g,n}(M, \beta)$ be the moduli space of $(\Sigma, (x_i), \Phi)$, where $\Sigma$ and $(x_i)$ are as above and $\Phi$ is a holomorphic map $\Sigma \to M$ of degree $\beta$. Then we have a forgetful map $\mathcal{M}_{g,n}(M, \beta) \to \mathcal{M}_{g,n}$.

³ We may also need to choose non-zero tangent vectors, or even germs of local coordinates, at the marked points, but in the discussion below we will omit them.


Suppose we want to compute the correlation functions of the local operators corresponding to differential forms $\omega_i$, $i = 1, \ldots, n$, on $M$, not necessarily closed. Then we should take the cup product of the pull-backs of the $\omega_i$’s to $\mathcal{M}_{g,n}(M, \beta)$ under the evaluation maps, and take the push-forward of the resulting differential form to $\mathcal{M}_{g,n}$. If the $\omega_i$’s are smooth and have compact support, then one can show that the result is a differential form (not necessarily of top degree) on $\mathcal{M}_{g,n}$. This is an example of a correlation function in our conformal field theory. But this is not the most general example. Other correlation functions correspond to other local observables, such as the vector fields on $M$ realized as Lie derivatives acting on differential forms.

Part of this structure is captured by the Gromov-Witten invariants. Since these moduli spaces $\mathcal{M}_{g,n}(M, \beta)$ are non-compact, we find that if we wish the correlation functions of $Q$–closed observables (such as closed differential forms on $M$) to depend only on their cohomology classes, we need to compactify these moduli spaces. The factorization property of the correlation functions will then also require that we introduce certain additional components into the compactified moduli spaces. The Kontsevich moduli spaces $\overline{\mathcal{M}}_{g,n}(M, \beta)$ of stable maps provide one with compactifications which satisfy all desirable properties and are equipped with the evaluation maps to the target manifold $M$ which one can use to pull back differential forms on $M$.⁴ One also has a forgetful map from $\overline{\mathcal{M}}_{g,n}(M, \beta)$ to the Deligne-Mumford compactification $\overline{\mathcal{M}}_{g,n}$ of $\mathcal{M}_{g,n}$. Taking the cup product of the pull-backs of such forms $\omega_i$ to $\overline{\mathcal{M}}_{g,n}(M, \beta)$, and then the push-forward to $\overline{\mathcal{M}}_{g,n}$, we obtain differential forms on $\overline{\mathcal{M}}_{g,n}$ whose cohomology classes now depend only on the cohomology classes of the $\omega_i$’s. Pairing them with some natural cohomology classes on $\overline{\mathcal{M}}_{g,n}$, we obtain the Gromov-Witten invariants. But since they come from very special observables of our theory, they correspond to a particular sector of the full conformal field theory associated to the twisted sigma model in the infinite volume.

⁴ Note that it may happen that $\mathcal{M}_{g,n}(M, \beta)$ is empty, but $\overline{\mathcal{M}}_{g,n}(M, \beta)$ is non-empty; see the discussion at the end of Sect. 3.1.

A natural question is how one can see the compactification $\overline{\mathcal{M}}_{g,n}(M, \beta)$ of $\mathcal{M}_{g,n}(M, \beta)$ in the framework of the conformal field theory with the action (1.3). A possible answer is that the integrals over the additional strata may naturally appear when one performs a regularization of the integral over the zero modes of the $p_a$’s and $p_{\bar a}$’s along the lines described above.

Another part of this structure has been studied in mathematical literature starting with [26]. It is encoded by a sheaf of chiral algebras over $M$, called the chiral de Rham complex, which is defined by gluing the free chiral algebras on the overlaps of the open subsets. From the point of view of the sigma model, this chiral algebra corresponds to the cohomology of the right moving supercharge of the twisted sigma model in the perturbative regime (i.e., without counting instanton contributions), as explained in [34, 20]. However, the knowledge of this cohomology is not sufficient for determining the correlation functions of the sigma model. In order to determine them one needs to generalize the construction of this chiral algebra to the full conformal field theory and to include the instanton corrections. This is done in this paper in the case when the target manifold is a toric variety.

The idea is to realize the quantum field theory governed by the action (1.3) in the case when the target manifold $M$ is a toric variety as a deformation of a free field theory. A toric variety $\mathbb{P}_S$ has a particularly nice open cover $\{A_{\sigma(i)}\}_{i=1,\ldots,N}$ with each open subset $A_{\sigma(i)}$ isomorphic to $\mathbb{C}^d$ and their intersection $T_S$ to $(\mathbb{C}^\times)^d$ (see Sect. 4.1). The


complement of $T_S$ in $\mathbb{P}_S$ is a divisor with components $C_i$ equal to the complements of $A_{\sigma(i)}$ in $\mathbb{P}_S$. Our idea is that the sigma model corresponding to a target manifold $M$ is equivalent to a deformation of the sigma model with the target manifold $M \setminus C$, where $C$ is a divisor, by means of a marginal vertex operator determined by $C$. Now, starting with the sigma model with the target $T_S$, which is a free field theory, we may build the sigma models with the target manifolds obtained by gradually “gluing” back the divisors $C_i$. Each time we “glue” back a divisor $C_i$, we deform the theory by a vertex operator corresponding to $C_i$. Thus, the end result, which is the sigma model with the target $\mathbb{P}_S$, is identified with the deformation of the free field theory associated to $T_S$ by means of the vertex operators corresponding to all $C_i$, $i = 1, \ldots, N$. In this paper we identify these vertex operators and construct these deformations explicitly. Moreover, we use this description of the sigma model of $\mathbb{P}_S$ to give a new interpretation of mirror symmetry.

We expect that one can give a similar description to the sigma models corresponding to more general target manifolds. A general complex manifold can be covered by open subsets that are analytically isomorphic to domains in $\mathbb{C}^n$. The supersymmetric sigma model corresponding to each of these open subsets is described by a free field theory which may be viewed as a system of decoupled bosonic and fermionic ghosts. So one may hope to define the quantum theory for a general Kähler target manifold $M$ by appropriately “gluing” together the free field theories corresponding to these open subsets. The mathematical works on the chiral de Rham complex indicate that this is a non-trivial task which requires methods that up to now have not been widely used by physicists in this context, such as Čech cohomology. However, for toric varieties our task is considerably simplified by the existence of a particularly nice cover.
We will use this cover in order to realize the sigma model as a deformation of a free field theory. To illustrate these ideas, we will now consider the case when the target manifold $M$ is $\mathbb{P}^1$.

1.4. Warm-up example: From $\mathbb{C}$ to $\mathbb{P}^1$. As a warm-up example, we will consider the case of the target manifold $M = \mathbb{P}^1$. The corresponding non-linear sigma model will be defined as a deformation of the linear model with the target $\mathbb{C}$. In the next section we will define the same non-linear model as a deformation of the linear model with the target $\mathbb{C}^\times$, which we will find to be technically more convenient. However, it is instructive to start by looking first at the deformation from $\mathbb{C}$ to $\mathbb{P}^1$.

The theory with the target $\mathbb{C}$ is a free conformal field theory with the chiral fields $X(z)$, $p(z)$, $\psi(z)$, $\pi(z)$ and their anti-chiral partners with the action (0.1). The chiral fields obey the standard OPEs
\[
p(z) X(w) = -\frac{i}{z - w} + \text{reg.}, \qquad \psi(z)\pi(w) = -\frac{i}{z - w} + \text{reg.}
\]

This is nothing but the free theory of bosonic and fermionic ghosts (also known as a βγ–system and a bc–system), and its quantization is relatively straightforward. We wish to interpret holomorphic maps $\Sigma \to \mathbb{P}^1$ within the framework of this free field theory. Namely, we view such maps as meromorphic maps $\Sigma \to \mathbb{C}$. Let $w_1, \ldots, w_n$ be the points of $\Sigma$ where this map has a pole. Generically, all these poles will be of order one. As explained in the introduction, our proposal is that we can include such maps by inserting in the correlation functions of the linear sigma model certain vertex operators at the points $w_1, \ldots, w_n$. In the case at hand, we propose the following candidate for this operator:


\[
D(z, \bar z) = \delta^2(p)(z, \bar z)\,\pi(z)\bar\pi(\bar z).
\]
What is the meaning of the operator $\delta^2(p)(z, \bar z)$ from the Lagrangian point of view? Recall that the field $p(z)$ is a Lagrange multiplier responsible for the equation of holomorphy $\partial_{\bar z}X = 0$. Therefore the insertion of the field $\delta^2(p)(z, \bar z)$ in the path integral is the instruction to relax this equation at the point $z \in \Sigma$ in the minimal possible way. This just means that our map $X: \Sigma \to \mathbb{P}^1$ should cease to be holomorphic at the point $z \in \Sigma$, i.e., it should develop a pole. We need to multiply $\delta^2(p)(z, \bar z)$ by its odd counterpart, namely $\delta^2(\pi)(z, \bar z)$, which is nothing but the operator $\pi(z)\bar\pi(\bar z)$. This gives us the above operator $D(z, \bar z)$. Inserting the operators $D(w_i, \bar w_i)$, $i = 1, \ldots, n$, corresponds to considering meromorphic maps $\Sigma \to \mathbb{C}$ with poles precisely at the points $w_1, \ldots, w_n$, or equivalently, considering the holomorphic maps $\Sigma \to \mathbb{P}^1$ which pass through the point $\infty \in \mathbb{P}^1$ precisely at the points $w_1, \ldots, w_n \in \Sigma$.

The operator $D(z, \bar z)$ also has a transparent meaning from the point of view of the operator formalism. While operators of the form $\delta^2(X)(z, \bar z)$ are quite common, the operators $\delta^2(p)(z, \bar z)$ may appear at first glance as somewhat more exotic. But the mystery disappears if one considers the corresponding state in the Hilbert space of the linear sigma model corresponding to a small circle around a point $z \in \Sigma$. To simplify notation, set $z = 0$. Then this space contains the direct sum of the tensor products
\[
F_N \otimes \bar F_N, \qquad N \in \mathbb{Z},
\]
of the Fock representations $F_N$ of the Heisenberg algebra generated by the Fourier modes of the chiral fields
\[
X(z) = \sum_{n \in \mathbb{Z}} X_n z^{-n}, \qquad p(z) = \sum_{n \in \mathbb{Z}} p_n z^{-n-1},
\]
and their anti-holomorphic analogues $\bar F_N$. The vacuum vector $|0\rangle \otimes \overline{|0\rangle}$ is in $F_0 \otimes \bar F_0$. The vector $|0\rangle \in F_0$ is annihilated by $\oint X(z) f(z)\,dz$ for all holomorphic one-forms $f(z)\,dz$ on a small disc around 0, where the integral is taken over a small circle around 0 (i.e., it is annihilated by $X_n$, $n > 0$) and by $\oint p(z) g(z)\,dz$, for all holomorphic functions $g(z)$ on the small disc around 0 (i.e., it is annihilated by $p_n$, $n \ge 0$). The vector $\overline{|0\rangle}$ satisfies similar equations.

Now, the vector corresponding to the operator $\delta^2(p)(0, 0)$ is nothing but the tensor product of the highest weight vectors from other Fock spaces, namely, $|1\rangle \otimes \overline{|1\rangle} \in F_1 \otimes \bar F_1$. The vector $|1\rangle$ satisfies
\[
\oint X(z) f(z)\,dz \cdot |1\rangle = 0, \quad f(z) \in z\,\mathbb{C}[[z]], \qquad
\oint p(z) g(z)\,dz \cdot |1\rangle = 0, \quad g(z) \in z^{-1}\mathbb{C}[[z]].
\]
In other words, $|1\rangle$ is annihilated by $X_n$, $n > 1$, and by $p_n$, $n \ge -1$. So $\delta^2(p)(z, \bar z) = \delta(p)(z)\,\delta(\bar p)(\bar z)$, where $\delta(p)(z)$ is nothing but the chiral field corresponding to the highest weight vector $|1\rangle$ of the Fock representation $F_1$ of the Heisenberg algebra, and $\delta(\bar p)(\bar z)$ is its anti-chiral analogue corresponding to the anti-chiral state $\overline{|1\rangle}$.
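The annihilation conditions above can be read off directly from the mode expansions by taking residues. The sketch below restores the factor $1/2\pi i$ explicitly and tests the conditions on monomials $z^k$; it is a bookkeeping check, not an additional statement of the paper.

```latex
% Mode expansions from the text: X(z) = \sum_n X_n z^{-n},  p(z) = \sum_n p_n z^{-n-1}.
% Contour: a small circle around 0.
\begin{align*}
\frac{1}{2\pi i}\oint X(z)\, z^{k}\, dz &= X_{k+1}, &
\frac{1}{2\pi i}\oint p(z)\, z^{k}\, dz &= p_{k}, \\[4pt]
|0\rangle:\quad f \in \mathbb{C}[[z]]\ (k \ge 0) &\;\Rightarrow\; X_n |0\rangle = 0,\ n > 0, &
g \in \mathbb{C}[[z]]\ (k \ge 0) &\;\Rightarrow\; p_n |0\rangle = 0,\ n \ge 0, \\
|1\rangle:\quad f \in z\,\mathbb{C}[[z]]\ (k \ge 1) &\;\Rightarrow\; X_n |1\rangle = 0,\ n > 1, &
g \in z^{-1}\mathbb{C}[[z]]\ (k \ge -1) &\;\Rightarrow\; p_n |1\rangle = 0,\ n \ge -1 .
\end{align*}
```

Passing from $|0\rangle$ to $|1\rangle$ thus trades one annihilation condition on $X$ (the mode $X_1$ now acts non-trivially) for one extra condition on $p$ (the mode $p_{-1}$ now annihilates the vector), which is the operator-formalism counterpart of allowing $X$ a first-order pole.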


Likewise, $\pi_{-1}\bar\pi_{-1}|0\rangle$ is a highest weight vector over the Clifford algebra generated by the Fourier coefficients of the fields $\psi(z)$, $\pi(z)$, $\bar\psi(\bar z)$, $\bar\pi(\bar z)$. It is annihilated by $\psi_n$, $\bar\psi_n$, $n > 1$, and $\pi_n$, $\bar\pi_n$, $n \ge -1$. Incidentally, from this point of view $\delta^2(X)(z, \bar z)$ is nothing but the operator corresponding to the state $|-1\rangle \otimes \overline{|-1\rangle}$. So the familiar operator $O_0(z, \bar z) = \delta^2(X)(z, \bar z)\,\psi(z)\bar\psi(\bar z)$ is an analogue of our operator $D(z, \bar z)$, which may in fact be used to represent the observable in the Gromov-Witten theory corresponding to the degree two cohomology class of $\mathbb{P}^1$.

The conformal dimension of the field $\delta^2(p)$ is $(-1, -1)$. This is in fact a special case of a general fact: if $\Phi(z, \bar z)$ is a bosonic field of conformal dimension $(\Delta, \bar\Delta)$ and charge $\nu$, then $\delta^2(\Phi)(z, \bar z)$ should have conformal dimension $(-\Delta, -\bar\Delta)$ and charge $-\nu$. Note also that $D(z, \bar z)$ has conformal dimension $(0, 0)$.

Let us compute the correlation function of these observables for $\Sigma$ of genus zero. From the Gromov-Witten theory we know that the correlation function is non-zero if the number of insertions is odd, $2n + 1$, and then the answer should be equal to $q^n$, because it corresponds to holomorphic maps of degree $n$. Let us explain how to reproduce exactly this answer within the framework of the linear sigma model. Observe that a map of degree $n$ has to pass through $\infty$ exactly $n$ times (with multiplicities, in general, but generically the multiplicities will all be equal to one). This means that we have to insert the operator $D(z, \bar z)$ at $n$ distinct points. But this operator has charge 1 (with respect to the current $:\!X(z)p(z)\!:$) and ghost number 1, while the operator $O(z, \bar z)$ has charge $-1$ and ghost number $-1$. The anomalous conservation law in genus zero demands that the total charge and the ghost number be both equal to $-1$. Therefore in order to compensate for the $n$ insertions of the operator $D(z, \bar z)$ we have to insert the operator $O(z, \bar z)$ at $n + 1$ additional points.
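The counting in the preceding paragraph can be written out as a short piece of arithmetic. In the sketch below, $m$ denotes the number of insertions of $O$ (a label introduced here for the check), and the dimension $(1, 1)$ of $\pi(z)\bar\pi(\bar z)$ is taken from the fact that $\pi$ has conformal dimension 1.

```latex
% n insertions of D (charge +1, ghost number +1),
% m insertions of O (charge -1, ghost number -1); m is a label used only in this check.
\begin{align*}
n\cdot(+1) + m\cdot(-1) = -1 \;&\Longrightarrow\; m = n + 1, \\
\text{total number of insertions} &= n + m = 2n + 1, \\
\dim D = \dim \delta^2(p) + \dim \pi\bar\pi &= (-1,-1) + (1,1) = (0,0).
\end{align*}
```

The same equation governs both the charge and the ghost number, so a single choice $m = n + 1$ satisfies both anomalous conservation laws, and the total of $2n + 1$ insertions matches the odd number of insertions expected from the Gromov-Witten theory in degree $n$.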
After that we reproduce the answer of the Gromov-Witten theory because the correlation function of these operators is equal to 1, which we should multiply by $q^n$ to account for the degree of the map. In other words, in order to account for the degrees of the holomorphic maps we should really be inserting the operator $q D(z, \bar z)$ rather than $D(z, \bar z)$.

In the Gromov-Witten theory one also considers the fields obtained by cohomological descent from the basic fields described above (see [31]). The cohomological descendants of an operator $O$ satisfy the equations
\[
dO = [Q_{\mathrm{tot}}, O^{(1)}], \qquad dO^{(1)} = [Q_{\mathrm{tot}}, O^{(2)}],
\]
where $Q_{\mathrm{tot}} = Q + \bar Q$ is the supersymmetry charge. To calculate them, we observe that we have two (twisted) N = 2 superconformal algebras with the chiral one generated by the fields
\[
G(z) = i\,\partial_z X(z)\pi(z), \qquad Q(z) = -i\,p(z)\psi(z), \qquad
T(z) = -i\,{:}\partial_z X(z) p(z){:} - i\,{:}\pi(z)\partial_z\psi(z){:}, \qquad J(z) = i\,{:}\psi(z)\pi(z){:},
\]
and similarly for the anti-chiral one. The chiral supersymmetry charge is the operator $Q = \oint Q(z)\,dz$, and $G(z)$ satisfies
\[
\oint Q(w)\,dw \cdot G(z) = T(z)
\]


(here and below, in similar formulas, the contour of integration goes around $z$, and we suppress the factor $1/2\pi i$). In particular, we have
\[
[Q, G_{-1}]_+ = L_{-1},
\]
where $G_{-1} = \oint G(z)\,dz$. We have similar formulas for $\bar Q$. This allows us to find $O^{(1)}$ and $O^{(2)}$ from the formulas
\[
O^{(1)} = G_{-1} \cdot O\,dz + \bar G_{-1} \cdot O\,d\bar z, \qquad O^{(2)} = G_{-1}\bar G_{-1} \cdot O\,dz\,d\bar z,
\]
provided that $Q_{\mathrm{tot}} \cdot O = 0$. In particular, since $Q_{\mathrm{tot}} \cdot D(z, \bar z) = 0$, we find that
\[
D^{(2)}(z, \bar z) = \left( \oint X(w)\,dw \oint \bar X(\bar w)\,d\bar w \cdot \delta^2(p)(z, \bar z) \right) \pi(z)\partial_z\pi(z)\,\bar\pi(\bar z)\partial_{\bar z}\bar\pi(\bar z)\,dz\,d\bar z.
\]
The bosonic part of this field corresponds to the state $X_1|1\rangle \otimes \bar X_1\overline{|1\rangle} \in F_1 \otimes \bar F_1$. Note that the field $D^{(2)}(z, \bar z)$ has conformal dimension $(1, 1)$.

In the setting of the linear sigma model the maps $\Sigma \to \mathbb{P}^1$ of degree $n$ are the same as meromorphic maps with poles at $n$ points (counted with multiplicity). As we argued above, those should be counted via the insertion of the vertex operator $q D(z, \bar z)$. Since we will be integrating over all such maps, and hence over all possible positions of the poles, the degree $n$ contribution to the correlation function $\langle O_1 \cdots O_m \rangle_{\mathbb{P}^1}$ of local observables in the non-linear sigma model with the target $\mathbb{P}^1$ (such as $O_0(z, \bar z)$ introduced above) should be equal to the correlation function of these operators in the linear model with the additional insertion of the integral of the $(1, 1)$–forms $q D^{(2)}(z, \bar z)$, obtained by cohomological descent from the operators $D(z, \bar z)$ introduced above. Thus, this correlation function should be given by
\[
\sum_{n=0}^\infty \frac{q^n}{n!} \left\langle O_1 \cdots O_m \int D^{(2)}(w_1, \bar w_1) \cdots \int D^{(2)}(w_n, \bar w_n) \right\rangle_{\mathbb{C}}
\]

(the $1/n!$ factor is due to the fact that the points $w_1, \ldots, w_n$ are unordered). But this is the same as the correlation function in the linear sigma model deformed by the marginal operator $q D^{(2)}(z, \bar z)$. This suggests that the non-linear sigma model with the target $\mathbb{P}^1$ in the infinite volume limit is equivalent to the linear sigma model deformed by the marginal operator $D^{(2)}(z, \bar z)$, i.e., the theory defined by the action
\[
\frac{1}{2\pi}\int d^2z \left( i p\,\partial_{\bar z}X + i\pi\,\partial_{\bar z}\psi + i\bar p\,\partial_z\bar X + i\bar\pi\,\partial_z\bar\psi \right) + q\int D^{(2)}.
\]
While nice and intuitive, the representation of the deforming operator in terms of the delta-function $\delta^2(p)(z, \bar z)$ is rather inconvenient for practical calculations. One way around this is to invoke the Friedan-Martinec-Shenker bosonization [15] of the $p, X$ system:
\[
X(z) = e^{u(z) + v(z)}, \qquad p = -\partial v(z)\, e^{-u(z) - v(z)},
\]
where $u(z)$ and $v(z)$ are the scalar fields having the OPEs
\[
u(z) u(w) \sim -\log(z - w), \qquad v(z) v(w) \sim \log(z - w).
\]


We have similar formulas for the anti-chiral fields $\bar p$, $\bar X$. Then we have the following bosonic representation:
\[
\delta(p)(z) = e^{u(z)}, \qquad \delta(\bar p)(\bar z) = e^{\bar u(\bar z)}.
\]
It is easy to see that these fields have the right OPE with the fields $X(z)$, $p(z)$ and their complex conjugates. Since the conformal dimension of $e^{\alpha u(z)}$ is $-\alpha(\alpha + 1)/2$, we obtain that the conformal dimension of $\delta(p)(z)$ is indeed $-1$. Thus, we obtain the following realization of the fields introduced above:
\[
D(z, \bar z) = e^{u(z) + \bar u(\bar z)}\,\pi(z)\bar\pi(\bar z), \qquad
D^{(2)}(z, \bar z) = e^{2(u + \bar u) + (v + \bar v)}\,\pi\,\partial_z\pi\,\bar\pi\,\partial_{\bar z}\bar\pi\,dz\,d\bar z.
\]
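The conformal dimensions quoted above can be cross-checked with a few lines of arithmetic. The text gives only the dimension of $e^{\alpha u}$; the formula used below for $e^{\beta v}$ is an assumption made for this sketch, chosen so that $X = e^{u+v}$ has dimension 0 and $p = -\partial v\, e^{-u-v}$ has dimension 1. The check also assumes that $u$ and $v$ have no mutual OPE, so that dimensions of mixed exponentials add.

```latex
% Given in the text:      \dim e^{\alpha u} = -\alpha(\alpha+1)/2.
% Assumed for this check: \dim e^{\beta v}  = +\beta(\beta+1)/2, and no u-v contraction.
\begin{align*}
\dim \delta(p) = \dim e^{u} &= -\tfrac{1\cdot 2}{2} = -1, \\
\dim X = \dim e^{u+v} &= -1 + 1 = 0, \\
\dim p = 1 + \dim e^{-u-v} &= 1 + 0 + 0 = 1, \\
\dim e^{2u+v} + \dim\left(\pi\,\partial_z\pi\right) &= (-3 + 1) + (1 + 2) = 1 .
\end{align*}
```

Under these assumptions the chiral half of $D^{(2)}$ has dimension 1, in agreement with the statement that $D^{(2)}(z, \bar z)$ has conformal dimension $(1, 1)$ and is therefore marginal.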

However, the FMS bosonization identifies the $X, p$ system with a subalgebra of the chiral algebra of the two scalar bosons $u, v$. To get an isomorphism, we need to invert $X$, i.e., pass from $\mathbb{C}$ to $\mathbb{C}^\times$ (see [11]). This already indicates that it is more convenient to formulate the theory on $\mathbb{C}^\times$ rather than on $\mathbb{C}$. This leads us to the toric sigma model introduced in the next section.

2. The Model with the Target $\mathbb{C}^\times$

2.1. Toric sigma model. We would like to express the correlation functions of the sigma model with the target $\mathbb{P}^1$ in the limit of infinite volume in terms of the operator formalism of the sigma model with the target $\mathbb{C}^\times = \mathbb{P}^1 \setminus \{0, \infty\}$, also at the infinite volume. To define the sigma model with the target $\mathbb{C}^\times$ we will use the logarithmic coordinate $X = R + i\phi$, where $\phi$ is periodic with the period $2\pi$. In other words, we identify $\mathbb{C}^\times$ with $\mathbb{R} \times S^1$, where $R$ is a coordinate on $\mathbb{R}$ and $\phi$ is a coordinate on $S^1$. We introduce the metric
\[
t\,(dR^2 + d\phi^2) = t\,dX\,d\bar X, \tag{2.1}
\]
so the circle has radius $\sqrt{t}$. The action of the sigma model in the first order formalism, introduced in Sect. 1.1, is
\[
I_t = \frac{1}{2\pi}\int d^2z \left( i p\,\partial_{\bar z}X + i\bar p\,\partial_z\bar X + i\pi\,\partial_{\bar z}\psi + i\bar\pi\,\partial_z\bar\psi + t^{-1} p\bar p \right). \tag{2.2}
\]
To eliminate $p$ and $\bar p$ in the path integral by completing the action to a square and integrating them out (see Remark 1.1), we substitute the following expressions in the Lagrangian:
\[
p = -it\,\partial_z\bar X, \qquad \bar p = -it\,\partial_{\bar z}X.
\]
Then we obtain the usual action of the sigma model with the target $\mathbb{R} \times S^1$ with the metric (2.1):
\[
\frac{1}{2\pi}\int d^2z \left( t\,\partial_{\bar z}X\,\partial_z\bar X + i\pi\,\partial_{\bar z}\psi + i\bar\pi\,\partial_z\bar\psi \right). \tag{2.3}
\]
In the limit $t \to \infty$ the last term in $I_t$ drops out and we obtain the action
\[
\frac{i}{2\pi}\int d^2z \left( p\,\partial_{\bar z}X + \pi\,\partial_{\bar z}\psi + \bar p\,\partial_z\bar X + \bar\pi\,\partial_z\bar\psi \right). \tag{2.4}
\]
We call this model a toric sigma model with the target $\mathbb{C}^\times$.


Equations of motion imply that the fields $X(z)$, $p(z)$, $\psi(z)$, $\pi(z)$ are holomorphic ($X(z)$ and $\psi(z)$ have conformal dimension 0 and $p(z)$, $\pi(z)$ have conformal dimension 1), while their complex conjugates $\bar X(\bar z)$, $\bar p(\bar z)$, $\bar\psi(\bar z)$, $\bar\pi(\bar z)$ are anti-holomorphic. They obey the standard OPEs
\[
X(z) p(w) = -\frac{i}{z - w} + {:}X(z) p(w){:}, \qquad \psi(z)\pi(w) = -\frac{i}{z - w} + {:}\psi(z)\pi(w){:}, \tag{2.5}
\]
and similarly for the anti-chiral fields. So this is a free field theory which is the toric version of the well-known system of bosonic and fermionic ghost fields. It possesses an N = 2 superconformal symmetry. The generating fields of the left moving N = 2 (twisted) superconformal algebra are given by the following formulas:
\[
\begin{aligned}
Q(z) &= -i\,p(z)\psi(z) - \partial_z\psi(z), & G(z) &= i\,\partial_z X(z)\pi(z), \\
T(z) &= -i\,{:}\partial_z X(z) p(z){:} - i\,{:}\pi(z)\partial_z\psi(z){:}, & J(z) &= i\,{:}\psi(z)\pi(z){:} + \partial_z X(z).
\end{aligned} \tag{2.6}
\]
There are also anti-chiral fields $\bar Q(\bar z)$, $\bar G(\bar z)$, $\bar T(\bar z)$, and $\bar J(\bar z)$, given by similar formulas, which generate the right moving copy of the N = 2 superconformal algebra.

The Hilbert space of the theory is built from bosonic Fock representations of the Heisenberg algebra generated by the Fourier coefficients of the fields $\partial_z X(z)$, $p(z)$, $\partial_{\bar z}\bar X(\bar z)$, $\bar p(\bar z)$ and fermionic Fock representations of the Clifford algebra generated by the Fourier coefficients of $\psi(z)$, $\pi(z)$, $\bar\psi(\bar z)$, $\bar\pi(\bar z)$. The precise structure of the bosonic Hilbert space and the state-field correspondence will be described in Sect. 5.1. Here we focus on the most salient features of the theory.

2.2. Holomorphic vortices. In the canonical quantization of the toric sigma model we consider the theory defined on the cylinder $\Sigma$, with the holomorphic coordinate $z = e^{t + is}$, $t \in \mathbb{R}$, $s \in \mathbb{R}/2\pi\mathbb{Z}$. Because our target space is also a cylinder, we find that we can allow non-trivial winding, i.e., we can allow $X(e^{2\pi i}z)$ to differ from $X(z)$ by an integral multiple of $2\pi i$. This, together with the condition of holomorphy, means that $X(z)$ and $\bar X(\bar z)$ may be written as follows:
\[
X(z) = \omega\log z + \sum_{n \in \mathbb{Z}} X_n z^{-n}, \qquad \bar X(\bar z) = \omega\log\bar z + \sum_{n \in \mathbb{Z}} \bar X_n \bar z^{-n},
\]
where $\omega$ is the winding operator which is allowed to take integer values. This indicates that the Hilbert space may contain states that have non-zero value of the operator $\omega$, and hence non-zero winding.

A convenient way to understand this is by interpreting the toric sigma model as the $\mathbb{Z}$–orbifold of the corresponding model with the target $\mathbb{C}$. The latter is the free field theory that we discussed in Sect. 1.4. It is described by the action (2.4), where $X(z)$ and $\bar X(\bar z)$ are single-valued. The group $\mathbb{Z}$ is a symmetry group of the action, shifting $X$ by integer multiples of $2\pi i$. We expect that our toric sigma model with the target $\mathbb{C}/2\pi i\mathbb{Z}$ may be obtained from the corresponding theory with the target $\mathbb{C}$ by taking its $\mathbb{Z}$–orbifold. The corresponding twist fields should then be exactly the fields with non-zero winding number $\omega$. This is analogous to the fact that the vortex operators of the sigma model with the target $\mathbb{C}/2\pi i\mathbb{Z}$ at the finite radius may be interpreted as the twist fields arising in the $\mathbb{Z}$–orbifolding of the usual linear sigma model with the target $\mathbb{C}$. Because of this analogy, we call the twist fields arising in the toric sigma model holomortex operators. However,

the vortex operators and the twist fields that we have at the infinite radius have different nature. To explain this point, it is convenient to work in the logarithmic coordinates s, t on the worldsheet cylinder $\Sigma_0$. In the finite volume theory the coordinates R, φ on the target cylinder $M_0$ are completely independent, and the winding occurs in the φ variable, independently of R. In other words, there are harmonic maps $X : \Sigma_0 \to M_0$ which are constant along R, but wind around φ, such as R = 0, φ = ms, where m ∈ Z. The vortex operator with the winding number m belongs to the sector of the theory corresponding to maps of this type. But in the infinite volume limit the map $X : \Sigma \to M$ has to be holomorphic. Therefore R and φ are no longer independent. Now we have maps of the form R = mt, φ = ms, where m ∈ Z, so R as well as φ depend on (s, t). That is why there is no straightforward way to define the holomorphic winding operators as a naive limit of the vortex operators in the infinite volume limit. What are then the explicit formulas for the holomortex operators? Denoting the operator with the winding number m by $\Psi_m(z, \bar z)$, we find that we need to have the following OPEs:

$$X(z)\Psi_m(w,\bar w) = m \log(z-w)\,\Psi_m(w,\bar w) + \ldots, \qquad \bar X(\bar z)\Psi_m(w,\bar w) = m \log(\bar z-\bar w)\,\Psi_m(w,\bar w) + \ldots. \tag{2.7}$$

Using the OPEs (2.5), we find that the field

$$\Psi_m(z,\bar z) = e^{-im\int P}(z,\bar z) = \exp\left(-im \int_{z_0}^{z} \big(p(w)\,dw + \bar p(\bar w)\,d\bar w\big)\right) \tag{2.8}$$
(or any of its scalar multiples) has precisely the OPEs (2.7) with $X(z)$ and $\bar X(\bar z)$. Formula (2.8) a priori depends on the point $z_0$ and the integration contour. We will give a more precise definition of these operators acting on the Hilbert space of the theory in Sect. 5.1. Here we would like to comment that for the purposes of this paper we only need to consider the correlation functions of the operators $e^{\pm i\int P}$. We will postulate that a correlation function of such operators will be non-zero if and only if an equal number of these operators with the + and − signs are involved. (In Sect. 2.3 we will see that this condition naturally comes from integrating over the zero mode of the dual variable U.) Then we simply define the correlation function by pairing the + and − operators in an arbitrary way and integrating over the contours going from the location of the − operator to the location of the + operator in each pair. The result is independent of the choice of the pairing as long as all other operators in the correlation function have well-defined OPEs with the operators $e^{\pm i\int P}$, as discussed below. Note also that while the individual operator $e^{\pm i\int P}$ is a priori defined only up to a scalar multiple, once we normalize one of them, the other is also automatically normalized. Therefore the product of an equal number of the + and − holomortex operators does not depend on the choice of normalization. This gives us a well-defined prescription for the computation of the correlation functions that we need, and it is easy to generalize it to the correlation functions involving the fields $\Psi_m$ with $m \neq \pm 1$. The presence of the holomortex operators $e^{im\int P}$, m ∈ Z, given by formula (2.8), in our theory places restrictions on what other fields are allowed. Namely, those fields must have well-defined OPEs with the fields $e^{im\int P}$, m ∈ Z. (This ensures the contour independence of the correlation functions discussed in the previous paragraph.)
This is analogous to the case of the sigma model with the target C/2πiZ at the finite radius,

considered as a Z–orbifold. In the linear sigma model we have the fields $e^{ir\varphi}$ with arbitrary r ∈ R, but after orbifolding r is quantized and can take only integer values. This condition ensures that these fields have well-defined OPEs with the vortex operators, which are the orbifold twist fields at the finite radius. Let us analyze what conditions are imposed by the presence of the twist fields $e^{im\int P}$, m ∈ Z, in our theory. The fields $p(z)$, $\bar p(\bar z)$ have well-defined OPEs with them, and so do the derivatives $\partial_z X(z)$, $\partial_{\bar z} \bar X(\bar z)$. Next, we look at the exponential fields $\exp(\alpha X(z) + \beta \bar X(\bar z))$. They have the following OPEs with $e^{-im\int P}(w, \bar w)$:

$$\exp\big(\alpha X(z) + \beta \bar X(\bar z)\big)\, e^{-im\int P}(w,\bar w) = (z-w)^{m\alpha}\,(\bar z-\bar w)^{m\beta}\, {:}\exp\big(\alpha X(z) + \beta \bar X(\bar z)\big)\, e^{-im\int P}(w,\bar w){:}\,.$$
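As a numerical illustration of when a prefactor of the form $(z-w)^{m\alpha}(\bar z-\bar w)^{m\beta}$ is single-valued, the following minimal sketch transports $z$ once around $w$ and compares the values before and after. It assumes real $\alpha$, $\beta$; the function names `prefactor` and `is_single_valued` are hypothetical and introduced only for this illustration.

```python
import cmath
import math


def prefactor(z, w, m, alpha, beta, turns=0):
    """Value of (z-w)^(m*alpha) * conj(z-w)^(m*beta) after `turns` loops around w.

    Writing z - w = r*exp(i*theta), the product equals
    r^(m*(alpha+beta)) * exp(i*theta*m*(alpha-beta)); one loop shifts theta by 2*pi.
    """
    d = z - w
    r = abs(d)
    theta = cmath.phase(d) + 2 * math.pi * turns
    return r ** (m * (alpha + beta)) * cmath.exp(1j * theta * m * (alpha - beta))


def is_single_valued(alpha, beta, m=1, tol=1e-9):
    """True if the prefactor returns to itself after one loop of z around w."""
    z, w = 0.3 + 0.4j, 0.0  # arbitrary sample points (assumption of this sketch)
    before = prefactor(z, w, m, alpha, beta, turns=0)
    after = prefactor(z, w, m, alpha, beta, turns=1)
    return abs(after - before) < tol
```

For winding number m = 1 the check succeeds exactly when α − β is an integer; for instance α = 1.5, β = 0.5 passes, while α = 0.5, β = 0.25 picks up a phase and fails.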

The condition for the right hand side to be single-valued is that α − β ∈ Z. This condition ensures that the correlation functions of the allowed operators and the operators $e^{-im\int P}(w, \bar w)$ do not depend on the choice of the contours of integration. The operator content of the theory is described in more detail in Sect. 5.1.

2.3. T–duality. Now we will show that the toric sigma model introduced in the previous section is equivalent to the ordinary sigma model with the target space being the torus $\mathbb{R} \times S^1$ equipped with the Minkowski metric such that the circle is isotropic. In this realization the holomortex operators $e^{im\int P}$ have a particularly simple form. In this section we discuss the path integral realization of the duality. The operator realization will be considered in Sect. 5.1. Let us introduce the one-form $P = p(z)\,dz + \bar p(\bar z)\,d\bar z$ on $\Sigma$. We choose the real structure in which the complex conjugate of $p$ is $\bar p$, so that the one-form P is real. Then we rewrite the bosonic part of the action (2.4) as follows:

$$I_{\mathrm{bos}} = \frac{i}{2\pi} \int_\Sigma \big(-P \wedge d\varphi + P \wedge {*}dR\big). \tag{2.9}$$

Here ∗ denotes the Hodge star operator on $\Sigma$, which in coordinates looks as follows: $*dz = -i\,dz$, $*d\bar z = i\,d\bar z$. Recall that our convention for the integration measure on $\Sigma$ is $d^2 z = i\,dz \wedge d\bar z$. Let us integrate out the field φ in the path integral. Then we obtain the constraint dP = 0, or in components $\partial_z \bar p = \partial_{\bar z} p$. A general solution of this equation is

$$P = dU = dU_0 + \sum_{j\in I} a_j \omega_j, \tag{2.10}$$

where $U_0$ is a real single-valued field and the $\omega_j$'s are closed real one-forms representing a basis in the first cohomology group of $\Sigma$. We choose them in such a way that they are harmonic and their integrals over cycles in $\Sigma$ are integers and $J^{kl} = \int_\Sigma \omega_k \wedge \omega_l$ is an integral skew-symmetric matrix with determinant one. We claim that the coefficients $a_j$ are constrained to be of the form $a_j = 2\pi m_j$, $m_j \in \mathbb{Z}$, and so U is a 2π–periodic field. We follow the presentation of the book [19], Sect. 11.2. The field φ takes values in R/2πZ and therefore it is allowed to have non-trivial winding. This means that dφ may be expressed by the formula

$$d\varphi = d\varphi_0 + 2\pi \sum_{i\in I} n_i \omega_i, \qquad n_i \in \mathbb{Z},$$

where $\varphi_0$ is a real single-valued function. Then we have

$$\frac{1}{2\pi} \int_\Sigma P \wedge d\varphi = \sum_{i,j\in I} a_i J^{ij} n_j.$$
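The way a sum over integer windings forces a coefficient onto the lattice 2πZ can be illustrated numerically. The sketch below is a simplification to a single coefficient $a$ paired with a single winding number $n$ (an assumption made for illustration, not the full pairing through $J^{ij}$); the function name `truncated_winding_sum` is hypothetical. The truncated sum of phases $e^{ian}$ is a Dirichlet kernel: it grows linearly with the cutoff when $a \in 2\pi\mathbb{Z}$ and stays bounded otherwise, which is the finite-cutoff shadow of the Poisson summation formula.

```python
import cmath
import math


def truncated_winding_sum(a, cutoff):
    """Real part of sum_{n=-cutoff}^{cutoff} exp(i*a*n), a Dirichlet kernel in a."""
    return sum(cmath.exp(1j * a * n) for n in range(-cutoff, cutoff + 1)).real
```

At $a = 2\pi m$ every term equals 1, so the sum is $2\cdot\text{cutoff} + 1$; away from these points it is bounded in absolute value by $1/|\sin(a/2)|$ independently of the cutoff.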

Taking the summation over the $n_j$'s in the path integral, we find from the Poisson summation formula that $a_j = 2\pi m_j$, $m_j \in \mathbb{Z}$. Hence U is a function $\Sigma \to \mathbb{R}/2\pi\mathbb{Z}$. Thus, we have the following transformation formulas:

$$p(z) = \partial_z U(z,\bar z), \qquad \bar p(\bar z) = \partial_{\bar z} U(z,\bar z), \qquad \frac{1}{2}\big(X(z) + \bar X(\bar z)\big) = R(z,\bar z).$$

These formulas are closely related to the Friedan-Martinec-Shenker bosonization discussed in Sect. 1.4. The holomortex operators $e^{im\int P}$ have a particularly simple realization in the dual variables:

$$e^{im\int_{z_0}^{z} P} = e^{imU(z)}\, e^{-imU(z_0)},$$

and this is the reason why the dual theory will be convenient for our purposes. Let us introduce the improved holomortex operators $e^{imU(z)}$. The integration over the zero mode of the field $U(z)$ will guarantee that the correlation function of the operators $e^{\pm iU(z)}$ will be non-zero if and only if equal numbers of the operators $e^{iU(z)}$ and $e^{-iU(z)}$ are involved. This is precisely the condition that we imposed by hand in Sect. 2.2.⁵ On the other hand, if this condition is satisfied, then the correlation functions of the improved holomortex operators are the same as the correlation functions of the original ones. Hence from now on we will use the improved holomortex operators in our computations. The dual theory is formulated in terms of the fields U and R with the action

$$I_{\mathrm{bos}} = \frac{i}{2\pi} \int_\Sigma dU \wedge {*}dR = \frac{i}{2\pi} \int_\Sigma d^2 z\, \big(\partial_z U\, \partial_{\bar z} R + \partial_{\bar z} U\, \partial_z R\big). \tag{2.11}$$

This is the action of the sigma model with the target the cylinder $\mathbb{R} \times (\mathbb{R}/2\pi\mathbb{Z})$ with coordinates (R, U), but with the Minkowski metric $i\,dR\,dU$. Note that the compact direction U is isotropic, and so the notion of the “radius” of this cylinder does not make sense. Since the one-forms $\omega_j$ in formula (2.10) are chosen to be harmonic, we can replace in the action the multivalued function U by the single-valued function $U_0$. However, we then have to remember to integrate in the path integral not only over $U_0$ but also sum up over all possible values of $a_j = 2\pi m_j$, $m_j \in \mathbb{Z}$. Because our metric is Minkowski and the dual variable to U, namely R, is non-periodic, this leads to some non-trivial consequences as discussed below in Sect. 3.2. The fermionic part of the action of the theory remains the same, so the total action of the dual theory is

$$I = \frac{i}{2\pi} \int_\Sigma d^2 z\, \big(\partial_z U\, \partial_{\bar z} R + \partial_{\bar z} U\, \partial_z R + \pi\, \partial_{\bar z} \psi + \bar\pi\, \partial_z \bar\psi\big). \tag{2.12}$$

Note that when we deform the action (2.4) to a finite radius r, we add to it the term $\frac{1}{r^2}\, p \bar p$, which in the dual variables looks like $\frac{1}{r^2}\, \partial_z U\, \partial_{\bar z} U$. Therefore we see that the metric on the torus is changing in such a way that the circle in the U direction acquires radius $r^{-1}$, as we should expect under T–duality at the finite radius.

5 We thank V. Lysov for a discussion of this point.

3. Changing the Target from C× to P1

The correlation functions of the toric sigma model correspond to path integrals over all maps $\Sigma \to \mathbb{C}^\times$. Then, since the path integral over p and π and their complex conjugates is interpreted as the delta-form supported on the holomorphic maps, as we argued above, any correlation function of the fields involving $X(z)$ and $\psi(z)$ (and their complex conjugates) may be written in terms of the holomorphic maps $\Sigma \to \mathbb{C}^\times$, which are necessarily constant for compact $\Sigma$. Therefore the correlation functions reduce to integrals over the zero mode (i.e., over the image of the constant map $\Phi : \Sigma \to \mathbb{C}^\times$). Is it possible to interpret holomorphic maps $\Sigma \to \mathbb{P}^1$ within the framework of the toric sigma model?

3.1. Deformation of the toric sigma model. As we explained in the Introduction, holomorphic maps $\Sigma \to \mathbb{P}^1$ may be viewed as holomorphic maps $\Phi : \Sigma \backslash \{w_i^\pm\} \to \mathbb{C}/2\pi i\mathbb{Z}$ with logarithmic singularities at some points $w_1^\pm, \ldots, w_N^\pm$, where this map behaves as $\pm \log(z - w_i^\pm)$. These singular points correspond to zeroes and poles of $\exp \Phi$, and generically they will be distinct. Our proposal is that we can include these maps by inserting in the correlation function of the linear sigma model certain vertex operators $\Psi_\pm(w_i^\pm)$. The defining property of the operators $\Psi_\pm(w)$ is that their operator product expansion (OPE) with $X(z)$ should read

$$X(z)\Psi_\pm(w) = \pm \log(z - w)\,\Psi_\pm(w).$$

We have already found such operators in Sect. 2.2. These are the holomortex operators

$$\Psi_\pm(w) = e^{\mp i \int_{w_0}^{w} P}.$$

Note that using these operators we can obtain a given function (for $\Sigma$ of genus zero)

$$\Phi(z) = c + \sum_{i=1}^{n} \log(z - w_i^+) - \sum_{i=1}^{n} \log(z - w_i^-)$$

as the correlator

$$\Phi(z) = \left\langle X(z) \prod_{i=1}^{n} \Psi_+(w_i^+) \prod_{i=1}^{n} \Psi_-(w_i^-) \left[ \delta^{2}\big(X(z_0) - c_0\big)\, \psi(z_0)\, \bar\psi(\bar z_0) \right] \right\rangle. \tag{3.1}$$
The factor in brackets is needed so as to normalize the function $\Phi(z)$ by the condition that $\Phi(z_0) = c_0$. This condition naturally appears upon integrating over the zero modes of X and ψ. Therefore we can create any meromorphic function on $\Sigma$ of genus 0 by taking the correlation function of the form (3.1). How to generalize this to $\Sigma$ of genus greater than zero? In this case, for a meromorphic function to exist, the points $w_i^\pm$ where it has zeroes and poles must satisfy a constraint: the divisor $\sum_i (w_i^+) - \sum_i (w_i^-)$ has to be in the kernel of the Abel-Jacobi map. Therefore for our theory to be consistent, the correlation functions must somehow take this condition into account. This appears puzzling at first, but the apparent paradox is resolved if we recall formula (2.10). The one-form P is defined up to an addition of a linear combination of closed one-forms $\omega_j$, and periodicity of the field φ implies that the coefficients $a_j$ in

front of these one-forms must be integer multiples $m_j$ of 2π. In the path integral we need to sum up over the $m_j$'s, and this leads to non-trivial consequences. Let $O_i$, i = 1, . . . , N, be some local operators in the toric sigma model and suppose we wish to compute the correlation function of these as well as the holomortex operators $e^{-i\int^{w_j^+} P}$, $j = 1, \ldots, n_+$, and $e^{i\int^{w_j^-} P}$, $j = 1, \ldots, n_-$. First of all, recall from Sect. 2.2 that the number of insertions of $e^{-i\int P}$ has to be equal to the number of insertions of $e^{i\int P}$; otherwise, the correlation function is automatically zero. Thus, $n_+ = n_- = n$. The correlation function should be a differential form on the moduli space $\mathcal{M}_{g,N+2n}$. To simplify our analysis, let us fix the complex structure on $\Sigma$ and the positions of the operators $O_i$, i = 1, . . . , N, leaving the positions $w_j^\pm$ of the holomortex operators free, but distinct. Consider the resulting differential form ω on the configuration space of 2n distinct points on $\Sigma$. Its degree is determined by the fermionic charge conservation. Assume for simplicity that the operators $O_i$ do not contain fermions. As in genus zero, we need to insert an operator of the form $\delta^{2}(X(z_0) - c_0)\,\psi(z_0)\,\bar\psi(\bar z_0)$ to take care of the zero mode of X and ψ. This means that in addition we need to insert g operators π and $\bar\pi$, so that we should get a (g, g)–form on the configuration space. According to a general prescription of [33, 36] (see also Sect. 1.3 above), this differential form is constructed as follows. It is completely determined by its values on g tangent vectors of the form $\partial/\partial w_j^\pm$ and g tangent vectors of the form $\partial/\partial \bar w_j^\pm$. Then at the corresponding point we have to insert the operators $G_{-1}\Psi_\pm$, $\bar G_{-1}\Psi_\pm$ or $G_{-1}\bar G_{-1}\Psi_\pm$. For example, let j run from 1 to g. Then at the points $w_j^+$ we have to insert the operator

$$e^{-i\int^{w_j^+} P}\, \pi(w_j^+)\, \bar\pi(\bar w_j^+).$$

The corresponding value of our differential form ω is given by

$$\left\langle \prod_{i=1}^{N} O_i \prod_{j=1}^{g} e^{-i\int^{w_j^+} P}\, \pi(w_j^+)\, \bar\pi(\bar w_j^+) \prod_{j=g+1}^{n} e^{-i\int^{w_j^+} P} \prod_{j=1}^{n} e^{i\int^{w_j^-} P} \right\rangle = \left\langle \prod_{i=1}^{N} O_i \prod_{j=1}^{g} \pi(w_j^+)\, \bar\pi(\bar w_j^+) \prod_{j=1}^{n} e^{i\int_{w_j^+}^{w_j^-} P} \right\rangle.$$

Substituting formula (2.10), we find that the correlation function will contain the factor

$$\exp\left( 2\pi i \sum_{k} m_k \sum_{j=1}^{n} \int_{w_j^+}^{w_j^-} \omega_k \right),$$

and this is the only term that depends on the $m_k$'s. In the path integral we will have to take the sum over all values of the $m_k$'s. The result of this summation is a delta-function, which means that the correlation function is identically equal to zero unless

$$\sum_{j=1}^{n} \int_{w_j^+}^{w_j^-} \omega_k = 0$$

for all k. This precisely means that the divisor $\sum_j (w_j^-) - \sum_j (w_j^+)$ has to be in the kernel of the Abel-Jacobi map. Now it is clear that the differential form ω on the configuration space of 2n points that we obtain in our theory is the delta-form supported on the kernel of the Abel-Jacobi

map (which has codimension g). We can “smoothen” this delta-form by deforming the action of our model with the term $\epsilon \int p \bar p\, d^2 z$ (see below). Now suppose that $O_i$, i = 1, . . . , N, are operators from the sigma model with the target $\mathbb{P}^1$, and we wish to compute the correlation function in the sigma model with the target $\mathbb{P}^1$,

$$\langle O_1 \ldots O_N \rangle_{\mathbb{P}^1} = \sum_{n\geq 0} \langle O_1 \ldots O_N \rangle_{\mathbb{P}^1,n}\; q^n, \tag{3.2}$$
where $\langle O_1 \ldots O_N \rangle_{\mathbb{P}^1,n}$ is the term corresponding to the holomorphic maps of degree n. As we explained above, more general correlation functions in our sigma model are obtained by inserting contour integrals of the fields $G(z)$ and $\bar G(\bar z)$ coupled to vector fields on $\mathcal{M}_{g,n}$. These correlation functions are interpreted as differential forms on $\mathcal{M}_{g,n}$. As we discussed above, in the setting of the linear sigma model with the target $\mathbb{C}^\times$ the maps of degree n are maps with logarithmic singularities at 2n points $w_j^\pm$, j = 1, . . . , n (counted with multiplicity). For fixed positions of these points such a map is counted by inserting in the correlation function the holomortex operators $\Psi_\pm(w_j^\pm, \bar w_j^\pm)$. Including all possible positions of the points $w_j^\pm$ means applying to each field $\Psi_\pm(w_j^\pm, \bar w_j^\pm)$ the operator $G_{-1}\bar G_{-1}$, where $G_{-1}$ and $\bar G_{-1}$ are the contour integrals of the fields $G(z)$ and $\bar G(\bar z)$, coupled to the translation vector fields $\partial/\partial w_j^\pm$ and $\partial/\partial \bar w_j^\pm$, respectively. In other words, we must replace each field $\Psi_\pm(w, \bar w)$ by the corresponding (1, 1)–form $\Psi_\pm^{(2)}(w, \bar w)$ given by the formula

$$\Psi_\pm^{(2)}(w,\bar w) = G_{-1}\bar G_{-1} \cdot \Psi_\pm(w,\bar w)\, dw\, d\bar w = e^{\mp i\int^{w} P}\, \pi(w)\, \bar\pi(\bar w)\, dw\, d\bar w,$$
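and integrate these (1, 1)–forms over $\Sigma$. Note that since the operators $\Psi_\pm$ are Q–closed, $\Psi_\pm^{(2)}$ is the operator obtained by cohomological descent (see Sect. 1.4). Thus, we find that

$$\langle O_1(z_1,\bar z_1) \ldots O_N(z_N,\bar z_N) \rangle_{\mathbb{P}^1,n} = \frac{1}{(n!)^2} \left\langle O_1(z_1,\bar z_1) \ldots O_N(z_N,\bar z_N) \prod_{j=1}^{n} \int_\Sigma \Psi_+^{(2)}(w_j^+,\bar w_j^+) \prod_{j=1}^{n} \int_\Sigma \Psi_-^{(2)}(w_j^-,\bar w_j^-) \right\rangle_{\mathbb{C}^\times}$$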
(the coefficient $1/(n!)^2$ is due to the fact that the collections of points $\{w_j^+\}$ and $\{w_j^-\}$ are unordered). The integrand is not well-defined on the diagonals $w_i^+ = w_j^-$ near the points $z_k$, a typical singularity being $|z_k - w_i^+|^2 / |z_k - w_j^-|^2$. However, we believe that the above integrals do converge as long as we choose smooth observables $O_i$ (see an example in Sect. 3.4). A proper way of treating this integral may be to extend it to a compactification of $\mathcal{M}_{g,N+2n}$. Note that the first N points $z_1, \ldots, z_N$ correspond to the positions of the operators, while the additional 2n points $w_1^\pm, \ldots, w_n^\pm$ correspond to parameters of the space of maps $\Sigma \to \mathbb{P}^1$. Therefore it is natural to expect that the resulting compactification is related to the Kontsevich moduli space of stable maps.

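It follows that we can write the correlation function $\langle O_1 \ldots O_N \rangle_{\mathbb{P}^1}$ as

$$\langle O_1 \ldots O_N \rangle_{\mathbb{P}^1} = \left\langle O_1 \ldots O_N \exp\left( q^{1/2} \int_\Sigma \big(\Psi_+(w)\,\pi(w)\,\bar\pi(\bar w) + \Psi_-(w)\,\pi(w)\,\bar\pi(\bar w)\big)\, dw\, d\bar w \right) \right\rangle_{\mathbb{C}^\times}.$$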
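Therefore we have interpreted the correlation functions of the $\mathbb{P}^1$ sigma model in the infinite volume as the correlation functions of the deformation of the toric sigma model, with the deformed action

$$\frac{i}{2\pi} \int_\Sigma d^2 z\, \big(p\, \partial_{\bar z} X + \pi\, \partial_{\bar z} \psi + \bar p\, \partial_z \bar X + \bar\pi\, \partial_z \bar\psi\big) + q^{1/2} \int_\Sigma \big(\Psi_+(w)\,\pi(w)\,\bar\pi(\bar w) + \Psi_-(w)\,\pi(w)\,\bar\pi(\bar w)\big)\, d^2 w. \tag{3.3}$$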

It is in this sense that we can say that the model with the deformed action (3.3) is equivalent to the type A twisted sigma model with the target $\mathbb{P}^1$ in the infinite volume. This works fine when $\Sigma$ has genus zero. But for $\Sigma$ of genus greater than zero, as we discussed in Sect. 1.2, we need to take care of the zero modes of $p$ and $\bar p$. As we saw above, the existence of these zero modes leads to correlation functions being delta-like differential forms on the moduli spaces of pointed curves. We can regularize these forms by adding the term $\epsilon \int p \bar p\, d^2 z$ to the action. Note that we are not adding the term corresponding to the inverse of the Fubini-Study form on the target $\mathbb{P}^1$, which would have violated conformal invariance of the action, but rather the inverse of the flat metric on $\mathbb{C}^\times$. While this flat metric has poles at $0, \infty \in \mathbb{P}^1$, its inverse has zeroes, and so it is regular on $\mathbb{P}^1$. This term preserves conformal invariance of our theory. There is a similar regularization procedure in the case of more general Fano toric varieties.

This regularization procedure becomes particularly important for maps of low degrees, where without regularization it may be impossible to evaluate the correlation functions. To illustrate this point, consider the simplest example. Suppose that $\Sigma$ is the torus and we wish to compute a contribution to some correlation function corresponding to maps of degree one to $\mathbb{P}^1$. While there are certainly no degree one maps from a smooth curve of genus one to $\mathbb{P}^1$, there are stable maps corresponding to curves with nodal singularities having a genus zero component (this is often referred to as “bubbling”). Such maps constitute the entire moduli space of stable maps in this case (unlike the case of maps of high degree, where nodal curves contribute points at the boundary of the locus corresponding to smooth curves). It is well-known that the two-point function of the local observables $O_1$, $O_2$ corresponding to two-forms $\omega_1$, $\omega_2$ on $\mathbb{P}^1$ such that $\int_{\mathbb{P}^1} \omega_i = 1$ is equal to 2q in this case. If we were to follow the above recipe for the computation of the two-point function in our deformed model literally, we would have to compute a correlation function of the form

$$q \left\langle O_1(z_1,\bar z_1)\, O_2(z_2,\bar z_2) \int \Psi_+^{(2)}(w^+,\bar w^+)\, dw^+ d\bar w^+ \int \Psi_-^{(2)}(w^-,\bar w^-)\, dw^- d\bar w^- \right\rangle_{\mathbb{C}^\times}.$$

But as we explained above, the integral will be over those points $w^+$ and $w^-$ which satisfy the Abel-Jacobi condition, which in this case reads $w^+ = w^-$. Since $\Psi_+^{(2)}(w^+, \bar w^+)\, \Psi_-^{(2)}(w^-, \bar w^-) \to 0$ as $w^+ \to w^-$, it seems that we obtain 0.

However, if we deform the action by the term $\epsilon \int p \bar p\, d^2 z$, the Abel-Jacobi condition is relaxed, and we obtain a non-trivial integral. We will show elsewhere that this integral reproduces the right answer 2q when $\epsilon \to 0$. We hope that this is the mechanism by which we can “reach” the components of the moduli spaces of stable maps which cannot be found in the closure of the locus corresponding to smooth curves.

3.2. Dual description of the deformed theory.

A i B sideli na trube. A upalo, B propalo. Kto ostalsya na trube?⁶

We have come to the key point of our construction. Let us apply the T–duality of Sect. 2.3 to the deformed theory defined by the action (3.3). In the dual variables R, U the holomortex operators $\Psi_\pm$ become purely local operators $e^{\pm iU}$ and so the action (3.3) becomes

$$\frac{i}{2\pi} \int_\Sigma d^2 z\, \big(\partial_z U\, \partial_{\bar z} R + \partial_{\bar z} U\, \partial_z R + \pi\, \partial_{\bar z} \psi + \bar\pi\, \partial_z \bar\psi\big) + q^{1/2} \int_\Sigma \big(e^{iU} + e^{-iU}\big)\, \pi \bar\pi\, d^2 z. \tag{3.4}$$

As we explained in the Introduction, this action is very similar to the action (0.4) of the B twisted Landau-Ginzburg model with the superpotential $W(Y) = q^{1/2}(e^{iY} + e^{-iY})$. Unlike the Lagrangian in (3.3), the Lagrangian in (3.4) is local. The equivalence of the two theories implies that the q–series expansion of the instanton contributions in the deformed model described by (3.3), such as the one given by formula (3.2), now has non-perturbative meaning in the dual theory defined by (3.4). In this theory $q^{1/2}$ appears as the coupling constant, and if it is small, then expanding the correlation functions in $q^{1/2}$ we reproduce the q–expansion of the correlation functions of the sigma model. However, we can study the theory with the action (3.4) for arbitrary values of $q^{1/2}$.

Note that in the path integral definition of the correlation functions of this model we must integrate over the single-valued function $U_0$ as well as over the integers $m_j = a_j/2\pi$ appearing in formula (2.10). This leads to some non-trivial consequences. In particular, when $\Sigma$ has genus greater than zero, the correlation functions involving the factor $\prod_{j=1}^{n} e^{-iU(w_j^-)} \prod_{j=1}^{n} e^{iU(w_j^+)}$ are non-zero only if the divisor $\sum_j (w_j^+) - \sum_j (w_j^-)$ is in the kernel of the Abel-Jacobi map. This follows in the same way as for the toric sigma model (see Sect. 3.1).

Thus, the action (3.4) defines an intermediate model, which we call the I–model, between the A–model, namely, the twisted sigma model with the target $\mathbb{P}^1$ in the infinite volume, and the B–model, namely, the twisted Landau-Ginzburg model with the action (0.4). By the T–duality of Sect. 2.3, the q–perturbative I–model is equivalent to the A–model as a conformal field theory. On the other hand, the correlation functions in the BPS sector of the I–model are related to the correlation functions in the BPS sector of the B–model, which is the Landau-Ginzburg model with the superpotential W, considered in [18], up to contact terms (in the sense discussed in [24]). Thus, we conclude that the correlation functions in the BPS sector of the A–model are related to the correlation functions in the BPS sector of the B–model Landau-Ginzburg model, up to contact terms. This is usually considered as the statement of mirror symmetry.

6 A and B were sitting on a pipe. A fell, B disappeared. Who remained on the pipe? (Russian folklore riddle) The answer is “and”, which is “i” in Russian; hence the name “I–model”.

Mathematically, this is expressed as the equality of certain generating functions of Gromov-Witten invariants of M (these correspond to correlation functions in the sigma model deformed by the gravitational descendants) and certain oscillating integrals (these correspond to the correlation functions in a Landau-Ginzburg model). In general, this equivalence involves an intricate transformation on the space of coupling constants that is referred to as the mirror map (see [16], the recent book [19] and references therein for details). The reason for this transformation is that the two theories differ by contact terms, and this difference has to be absorbed in a transformation of the coupling constants (see [24]).

To summarize, our construction for $M = \mathbb{P}^1$ (and for the more general case of a Fano toric variety M treated in the next section) realizes this correspondence of BPS correlation functions in two steps. First, we have an equivalence of two conformal field theories, the twisted sigma model of M (A–model) and the intermediate model defined by the action (3.4) (I–model). This means that all correlation functions that one can write in the A–model and the I–model are equal to each other. Second, we have a correspondence between the I–model and the B–model, which is more subtle: it applies only to the BPS sector, and in the BPS sector the two models are equivalent only up to contact terms, which is the reason for non-triviality of the mirror map. We do not address here the issue of computing these contact terms and explicitly deriving the mirror map from our proposed equivalence. But in principle this can be done. We hope to return to this issue in a future paper.

3.3. The supersymmetry charges. Recall that in the toric sigma model the left and right moving supersymmetry charges are given by the formulas

$$Q = -i \oint \psi(z)\, p(z)\, dz, \qquad \bar Q = -i \oint \bar\psi(\bar z)\, \bar p(\bar z)\, d\bar z.$$

The total supersymmetry charge $Q + \bar Q$ corresponds to the de Rham differential, which is typical for an A–model.
After the deformation to the theory with the action (3.3) the supercharges change their form. This is due to the fact that the field $Q(z) = \psi(z) p(z)$ is no longer holomorphic and the field $\bar Q(\bar z)$ is no longer anti-holomorphic in the deformed theory. In fact, for any chiral field $A(z)$ in a conformal field theory, after deforming the action with the term $\int \Phi(z, \bar z)\, dz\, d\bar z$, we have the following formula (see [35]):

$$(\partial_{\bar z} A)(z,\bar z) = \oint \Phi(w,\bar z)\, dw \cdot A(z), \tag{3.5}$$

where the integral is over a small contour enclosing z. There is a similar formula for an anti-chiral field. Suppose that we have a superconformal field theory such that $\Phi = \Psi^{(2)} = G_{-1}\bar G_{-1}\Psi$, where $\Psi$ is even, Q–closed and a highest weight vector of the Virasoro algebra, i.e., $L_n \Psi = \bar L_n \Psi = 0$, n ≥ 0. Let $\Psi^{(1)}$ be the one-form obtained by cohomological descent (see Sect. 1.4):

$$\Psi^{(1)} = \Psi^{(1)}_z\, dz + \Psi^{(1)}_{\bar z}\, d\bar z = G_{-1}\Psi\, dz + \bar G_{-1}\Psi\, d\bar z.$$

Then if $A(z) = Q(z)$ we find that

$$\oint \Psi^{(2)}\, dw \cdot Q(z) = -\partial_z \Psi^{(1)}_{\bar z}.$$

Hence the new left moving supercharge is $\oint \big(Q\, dz - \Psi^{(1)}_{\bar z}\, d\bar z\big)$. Likewise, we have

$$\oint \Psi^{(2)}\, d\bar w \cdot \bar Q(\bar z) = -\partial_{\bar z} \Psi^{(1)}_z,$$

and so the new right moving supercharge is $\oint \big(\bar Q\, d\bar z - \Psi^{(1)}_z\, dz\big)$. Thus, the total supercharge of the deformed theory is $Q + \bar Q - \oint \Psi^{(1)}$. In our case we have a deformation by

$$q^{1/2} \int \big(\Psi_+^{(2)} + \Psi_-^{(2)}\big)\, dz\, d\bar z,$$

where $\Psi_\pm^{(2)} = e^{\mp i\int P}\, \pi \bar\pi$. Therefore we find that

$$\Psi^{(1)}_{\pm,\bar z} = \pm i\, e^{\mp i\int P}\, \bar\pi.$$

Thus, the new supercharge is

$$Q(q) = -i \oint \left( \psi p\, dz - q^{1/2} \big(e^{i\int P} - e^{-i\int P}\big)\, \bar\pi\, d\bar z \right).$$

Similarly, we obtain that after the deformation the supercharge $\bar Q$ becomes

$$\bar Q(q) = -i \oint \left( \bar\psi \bar p\, d\bar z + q^{1/2} \big(e^{i\int P} - e^{-i\int P}\big)\, \pi\, dz \right).$$

In the I–model, these supercharges look as follows:

$$Q(q) = -i \oint \left( \psi\, \partial_z U\, dz - q^{1/2} \big(e^{iU} - e^{-iU}\big)\, \bar\pi\, d\bar z \right),$$

$$\bar Q(q) = -i \oint \left( \bar\psi\, \partial_{\bar z} U\, d\bar z + q^{1/2} \big(e^{iU} - e^{-iU}\big)\, \pi\, dz \right).$$

Let us compute the cohomology of the right moving supercharge $\bar Q(q)$ on the Hilbert space of our theory. This Hilbert space is defined in Sect. 5.2. We will show in Sect. 5.4 that the cohomology of the resulting complex coincides with the cohomology of a complex considered by L. Borisov [4] and F. Malikov and V. Schechtman in [25]. Its cohomology was shown in [25] to be equal to the quantum cohomology of $\mathbb{P}^1$. The corresponding cohomology classes may be represented by 1 and $e^{iU} + e^{-iU}$.

On the other hand, according to [34, 20], the cohomology of the operator $\bar Q(q)$ in the perturbative regime (without instanton corrections) should coincide with the cohomology of the chiral de Rham complex of $\mathbb{P}^1$. To obtain this result, we need to consider a certain degeneration of the above complex, which corresponds to the perturbative regime of the theory. For that we introduce two parameters $t_1$, $t_2$ such that $t_1 t_2 = q$, and write $t_1 e^{iU} - t_2 e^{-iU}$ instead of $q^{1/2}(e^{iU} - e^{-iU})$. In the perturbative regime we have $t_1, t_2 \neq 0$, but their product, which is q, becomes equal to 0. In other words, we should work over $\mathbb{C}[t_1, t_2]/(t_1 t_2)$. This corresponds to allowing only degree zero maps $\Sigma \to \mathbb{P}^1$. Such maps can pass through 0 or ∞, but not through both of them.

We will show in Sect. 5.4 that the cohomology of the degenerate complex coincides with the cohomology of a complex introduced in [4] (see also [25]). Borisov showed in [4] that its cohomology is precisely the cohomology of the chiral de Rham complex of P1 . Therefore we find an agreement with the prediction of [34, 20]. Our computation explains the meaning of the somewhat mysterious computation of [4, 25] from the point of view of the sigma model, with and without instanton corrections.

3.4. A sample computation of correlation functions. Here we show how to reproduce the simplest one-instanton calculation of the A–model (the sigma model with the target $\mathbb{P}^1$) in the framework of the I–model defined by the action (3.4). Let $\omega_i$, i = 1, 2, 3, be three two-forms on $\mathbb{P}^1$ representing the second cohomology class. We will assume that they are invariant under the U(1)–action on $\mathbb{P}^1$ with the fixed points 0 and ∞. We identify $\mathbb{P}^1 \backslash \{0, \infty\}$ with $\mathbb{C}/2\pi i\mathbb{Z}$ via the exponential map and use the coordinates R and φ on $\mathbb{C}/2\pi i\mathbb{Z} = \mathbb{R} \times i(\mathbb{R}/2\pi\mathbb{Z})$ as before. With respect to these coordinates, these forms may be written as $\omega_i = f_i(R)\, dX\, d\bar X$, where $X = R + i\varphi$. The local operators corresponding to the two-forms $\omega_i$ in the A–model are $\omega_i = f_i(R)\, \psi \bar\psi$. Consider the case when the worldsheet $\Sigma$ has genus zero. The simplest non-trivial correlation function in the A–model is

$$\langle \omega_1 \omega_2 \omega_3 \rangle_{\mathbb{P}^1} = q \prod_{i=1}^{3} \int_{\mathbb{P}^1} \omega_i.$$
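Let us show how to reproduce this answer in the I–model. In the I–model the operators $\omega_i$ are given by the same formula as above (since R makes sense in the dual theory), hence their correlation function expanded in powers of q is the correlation function of the free field theory defined by the action (2.12) given by the formula

$$\left\langle \omega_1 \omega_2 \omega_3 \exp\left( q^{1/2} \int \big(e^{iU} + e^{-iU}\big)\, \pi \bar\pi \right) \right\rangle = \sum_{n=0}^{\infty} \frac{q^n}{(n!)^2} \left\langle \omega_1 \omega_2 \omega_3 \left( \int e^{iU} \pi \bar\pi \right)^{n} \left( \int e^{-iU} \pi \bar\pi \right)^{n} \right\rangle. \tag{3.6}$$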
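The coefficient $q^n/(n!)^2$ in (3.6) is a purely combinatorial statement about the exponential series, and can be checked with exact arithmetic. The sketch below assumes that the two integrated insertions, abbreviated here as commuting symbols A and B, may be treated as commuting variables (they are bosonic, being bilinear in fermions); the function name `balanced_coefficient` is hypothetical. The term $t^{2n} A^n B^n$ arises only from $(t(A+B))^{2n}/(2n)!$, with multiplicity $\binom{2n}{n}$, and $t = q^{1/2}$ gives $t^{2n} = q^n$.

```python
from fractions import Fraction
from math import comb, factorial


def balanced_coefficient(n):
    """Coefficient of t^(2n) A^n B^n in exp(t*(A + B)) for commuting A and B.

    Only the order-2n term (t*(A+B))^(2n) / (2n)! contributes, and the binomial
    theorem gives A^n B^n the multiplicity C(2n, n).
    """
    return Fraction(comb(2 * n, n), factorial(2 * n))
```

For every n this equals $1/(n!)^2$, since $\binom{2n}{n}/(2n)! = 1/(n!\,n!)$.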

We have already explained above that, due to the charge conservation, for the correlation function to be non-zero the number of insertions of $e^{iU}$ has to be equal to the number of insertions of $e^{-iU}$. This explains why in the above formula we consider only the contributions corresponding to equal numbers of insertions. Next, we count the ghost number. The chiral ghost number of each of the operators $\omega_i$ is one, due to the presence of the fermion ψ. Hence the contribution of the operators $\omega_i$ to the chiral ghost number is 3, and likewise for the anti-chiral ghost number. The conservation law in genus zero is that the total chiral ghost number and the total anti-chiral ghost number should each be equal to 1. Hence to get a non-zero correlation function we must insert two chiral fermions π and two anti-chiral fermions $\bar\pi$. This means that the only non-zero term in the sum (3.6) is the term with n = 1, and the coefficient in front of it is precisely q.
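The ghost number counting above is simple arithmetic, and a small sketch makes the selection rule explicit. It assumes, as in the text, that each observable carries chiral ghost number +1, that the order-n term of (3.6) contains 2n fermions π each carrying −1, and that the required total in genus zero is 1; the function name `contributing_orders` and its parameters are hypothetical.

```python
def contributing_orders(num_observables, required_charge=1, max_order=10):
    """Orders n in the q-expansion allowed by chiral ghost number conservation.

    Each observable contributes +1 (one psi); the order-n term contains n
    insertions of exp(iU)*pi*pibar and n of exp(-iU)*pi*pibar, i.e. 2n pi's,
    each contributing -1.
    """
    return [
        n
        for n in range(max_order + 1)
        if num_observables - 2 * n == required_charge
    ]
```

With three observables only n = 1 survives, in agreement with the text; an even number of observables leaves no contributing order under these assumptions.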

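Thus, it remains to show that in the free field theory with the action (2.12) we have

$$\left\langle \prod_{i=1}^{3} \omega_i(z_i,\bar z_i) \int e^{iU} \pi \bar\pi\, dw^- d\bar w^- \int e^{-iU} \pi \bar\pi\, dw^+ d\bar w^+ \right\rangle = \prod_{i=1}^{3} \int_{\mathbb{P}^1} \omega_i. \tag{3.7}$$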
In the correlation function appearing in the left-hand side of this formula we have fixed the points $z_1, z_2, z_3$ and we are integrating over the points $w^-$ and $w^+$ the (1, 1)–forms $G_{-1}\bar G_{-1} \cdot e^{\pm iU}$. By using the Ward identities in the standard way (see [33, 36]), we can “swap” the operators $G_{-1}\bar G_{-1}$ and the integrals from the variables $w^-$ and $w^+$ to any two of the three variables $z_1, z_2, z_3$, say, $z_1$ and $z_2$, fix the position of the remaining point $z_3$, say $z_3 = \infty$, and fix the positions of $w^-$, $w^+$. We find that $G_{-1}\bar G_{-1} \cdot \omega_i = f_i(R)\, \partial_z R\, \partial_{\bar z} R\, dz\, d\bar z$. The fermionic part of the correlation function becomes equal to 1, and the bosonic part is given by the integral

$$\int d^2 X\, d^2 z_1\, d^2 z_2\; f_1(R(z_1,\bar z_1))\, f_2(R(z_2,\bar z_2))\, f_3(R(z_3)) \times \partial_{z_1} R(z_1)\, \partial_{\bar z_1} R(z_1)\, \partial_{z_2} R(z_2)\, \partial_{\bar z_2} R(z_2)\; e^{iU(w^-)}\, e^{-iU(w^+)} \tag{3.8}$$

(the integral over $d^2 X$ is the integral over the zero mode). But we have the following OPE:

$$R(z,\bar z)\, e^{\mp iU(w^\pm)} \sim \pm \log|z - w^\pm|\; e^{\mp iU(w^\pm)}.$$

Hence

$$\partial_z R(z)\, \partial_{\bar z} R(z)\, e^{\mp iU(w^\pm)} \sim |z - w^\pm|^{\pm 2}\; e^{\mp iU(w^\pm)}.$$
Therefore the term $\partial_zR(z)\,\partial_{\bar z}R(z)$ in the correlation function (3.8) may be replaced by $|(z-w^+)/(z-w^-)|$, which is the Jacobian of the map $z\mapsto\log c(z-w^+)/(z-w^-)$. Thus, the integrals over $z_1$ and $z_2$ correspond to the integrals of $\omega_1$ and $\omega_2$ over $\mathbb{P}^1$, while the integral over the zero mode corresponds to the integral of $\omega_3$. We find that the integral (3.8) is equal to the right-hand side of (3.7), as desired. Note that in this computation we have in effect “localized” on the holomorphic maps $\Sigma\to\mathbb{P}^1$ corresponding to the meromorphic functions $c(z-w^+)/(z-w^-)$, where $c$ is a scalar.

4. General Toric Varieties

4.1. Recollections on toric varieties. Let us recall the combinatorial data involved in the definition of smooth compact toric varieties, following [1] (see also [28]). Let $\Lambda$ be a lattice of rank $d$ and $\check\Lambda$ be the dual lattice. We set $\Lambda_{\mathbb{R}}=\Lambda\otimes_{\mathbb{Z}}\mathbb{R}$, $\Lambda_{\mathbb{C}}=\Lambda\otimes_{\mathbb{Z}}\mathbb{C}$. For $k\ge1$ a convex subset $\sigma\subset\Lambda_{\mathbb{R}}$ is called a regular $k$–dimensional cone if it is generated by a subset of a basis of $\Lambda$, i.e.,
\[
\sigma=\mathbb{R}_{\ge0}\langle v_i\rangle_{i=1}^{k}=\left\{\sum_{i=1}^{k}a_iv_i\;\middle|\;a_i\in\mathbb{R}_{\ge0}\right\},
\]


E. Frenkel, A. Losev

where $\{v_1,\dots,v_k\}$ is a subset of $\Lambda$ that can be extended to a basis. The 0–dimensional regular cone is by definition the origin $0\in\Lambda_{\mathbb{R}}$. A subcone $\sigma'$ of $\sigma$ generated by a subset of $\{v_i\}_{i=1}^{k}$ is called a face of $\sigma$. In this case we use the notation $\sigma'<\sigma$.

A finite collection $S=\{\sigma_i\}_{i=1}^{m}$ is called a complete regular fan if the following conditions are satisfied:

1. if $\sigma\in S$ and $\sigma'<\sigma$, then $\sigma'\in S$;
2. if $\sigma,\sigma'\in S$, then $\sigma\cap\sigma'<\sigma$ and $\sigma\cap\sigma'<\sigma'$;
3. $\Lambda_{\mathbb{R}}=\bigcup_{i=1}^{m}\sigma_i$.

For example, let $\Lambda$ be the $d$–dimensional lattice generated by $v_1,\dots,v_d$. Set $v_{d+1}=-\sum_{i=1}^{d}v_i$. For any proper subset $I\subsetneq\{1,\dots,d+1\}$, let $\sigma_I=\mathbb{R}_{\ge0}\langle v_j\rangle_{j\in I}$. Then $S(d)=\{\sigma_I\}_{I\subsetneq\{1,\dots,d+1\}}$ is a complete regular fan.

One associates a toric variety to a fan $S$ as follows. To each cone $\sigma\in S$ we assign the dual cone in $\check\Lambda$,
\[
\check\sigma=\{\check\lambda\in\check\Lambda\;|\;\langle\check\lambda,v\rangle\ge0,\ \forall v\in\sigma\},
\]
and the affine variety $A_\sigma=\operatorname{Spec}\mathbb{C}[\check\sigma]$. It is clear that if $\sigma'<\sigma$, then we have a natural inclusion $A_{\sigma'}\hookrightarrow A_\sigma$. This allows us to glue the varieties $A_\sigma$, $\sigma\in S$, into a projective variety $\mathbb{P}_S$, which is the toric variety associated to $S$. For example, the variety associated to the fan $S(d)$ is the projective space $\mathbb{P}^d$. In particular, we have an open dense subvariety of $\mathbb{P}_S$,
\[
T_S=A_{\{0\}}=\operatorname{Spec}\mathbb{C}[\check\Lambda]\simeq\operatorname{Spec}\mathbb{C}[x_i^{\pm1}]_{i=1}^{d}=(\mathbb{C}^\times)^d.
\]

Here $x_i$, $i=1,\dots,d$, are coordinates on $T_S$ corresponding to a basis $\{\check e_1,\dots,\check e_d\}$ of $\check\Lambda$ that is dual to a basis $\{e_1,\dots,e_d\}$ of $\Lambda$ that we fix once and for all. Note that any element $\check\lambda=\sum_{i=1}^{d}a_i\check e_i$ gives rise to a monomial function $\prod_{i=1}^{d}x_i^{a_i}$ on $T_S$, which we denote by $f_{\check\lambda}$. In a basis independent way we can say that $T_S$ is the algebraic torus whose lattices of characters $T_S\to\mathbb{C}^\times$ and cocharacters $\mathbb{C}^\times\to T_S$ are canonically identified with $\check\Lambda$ and $\Lambda$, respectively.

Let $\sigma(1),\dots,\sigma(N)$ be the set of all one-dimensional cones in $S$. Each such cone $\sigma(i)$ has a canonical generator $v(i)\in\Lambda$ that can be completed to a basis of $\Lambda$. The varieties $A_{\sigma(i)}$, $i=1,\dots,N$, provide a covering of the toric variety $\mathbb{P}_S$ by open dense subsets. By definition, the ring of functions on $A_{\sigma(i)}$ is the span of all monomials $f_{\check\lambda}$, where $\langle\check\lambda,v(i)\rangle\ge0$. The complement of $T_S$ in $A_{\sigma(i)}$ is the divisor $C_i$ in the latter whose ideal is the span of the monomials $f_{\check\lambda}$, where $\langle\check\lambda,v(i)\rangle>0$. It is clear that the closures $\bar C_i$ of these divisors are the irreducible components of the complement of $T_S$ in $\mathbb{P}_S$.

For instance, in the case of $\mathbb{P}^d$, the one-dimensional cones are $\sigma(i)=\mathbb{R}_{\ge0}v_i$, $i=1,\dots,d+1$, and so $v(i)=v_i$. Therefore the varieties $A_{\sigma(i)}$ are the subvarieties of $\mathbb{P}^d$ where all but the $i$th homogeneous components are non-zero. The divisor $C_i$ consists of points in which the $i$th homogeneous component is equal to 0.

4.2. The toric sigma model. Let us fix a smooth compact toric variety $\mathbb{P}_S$ corresponding to a fan $S$. We will assume that $\mathbb{P}_S$ is a Fano variety. In fact, our construction can be applied to more general toric varieties; however, in the case of toric varieties that are not Fano the connection between the deformed model that we define below and the A–model of $\mathbb{P}_S$ is more subtle. We have indicated some of the underlying reasons for this in Sect. 3.1.

The first step of our construction is to define the toric sigma model with the target
\[
T_S\simeq(\mathbb{C}^\times)^d=\operatorname{Spec}\mathbb{C}[x_i^{\pm1}]_{i=1}^{d}.
\]
This model is just the tensor product of $d$ independent copies of the toric sigma model of $\mathbb{C}^\times$ described in Sect. 2.1. We will use the logarithmic coordinates $X^i$, $i=1,\dots,d$, on $(\mathbb{C}^\times)^d\simeq\Lambda_{\mathbb{C}}/2\pi i\Lambda$, such that $x_i=e^{X^i}$. Thus, we have the fields $X^i,p_i,\psi^i,\pi_i$ and their complex conjugates $\bar X^i,\bar p_i,\bar\psi^i,\bar\pi_i$.

For any element $\check\lambda=\sum_{i=1}^{d}a_i\check e_i\in\check\Lambda$ we have fields $X^{\check\lambda}=\sum_{i=1}^{d}a_iX^i$ and $\bar X^{\check\lambda}=\sum_{i=1}^{d}a_i\bar X^i$, whereas for any element $\lambda=\sum_{i=1}^{d}b_ie_i\in\Lambda$ we have fields $p_\lambda=\sum_{i=1}^{d}b_ip_i$ and $\bar p_\lambda=\sum_{i=1}^{d}b_i\bar p_i$. We define the fermions $\psi^{\check\lambda},\bar\psi^{\check\lambda}$, $\check\lambda\in\check\Lambda$, and $\pi_\lambda,\bar\pi_\lambda$, $\lambda\in\Lambda$, in the same way. The action of the toric sigma model is given by the formula

\[
\frac{i}{2\pi}\int d^2z\left(p_i\partial_{\bar z}X^i+\pi_i\partial_{\bar z}\psi^i+\bar p_i\partial_z\bar X^i+\bar\pi_i\partial_z\bar\psi^i\right). \tag{4.1}
\]
The theory has $N=(2,2)$ superconformal symmetry. The corresponding generators are the sums of the generators in the $\mathbb{C}^\times$ toric sigma model given by formula (2.6).

As in the one-dimensional case, explained in Sect. 2.2, we find that the fields $X^i$ may have non-trivial winding. The winding numbers take values in the lattice $\Lambda$. For each $\lambda\in\Lambda$ we introduce the corresponding holomortex operators
\[
\Psi_\lambda(z,\bar z)=e^{-i\int P_\lambda}=\exp\left(-i\int_{z_0}^{z}\bigl(p_\lambda(w)\,dw+\bar p_\lambda(\bar w)\,d\bar w\bigr)\right).
\]
They have the following OPE with the fields $X^{\check\mu}$ and $\bar X^{\check\mu}$:
\[
X^{\check\mu}(z)\,\Psi_\lambda(w,\bar w)=\langle\check\mu,\lambda\rangle\log(z-w)\,\Psi_\lambda(w,\bar w),
\]
\[
\bar X^{\check\mu}(\bar z)\,\Psi_\lambda(w,\bar w)=\langle\check\mu,\lambda\rangle\log(\bar z-\bar w)\,\Psi_\lambda(w,\bar w).
\]
The prescription for the computation of correlation functions of these operators is the same as in the one-dimensional case (see Sect. 2.2).

Next, we define the T–dual theory of the $T_S$–toric sigma model. This is an ordinary sigma model with the target being the partially dualized torus
\[
\check T_S=\Lambda_{\mathbb{R}}\times i(\check\Lambda_{\mathbb{R}}/2\pi\check\Lambda),
\]
equipped with the Minkowski metric, which is the product of $d$ copies of the Minkowski metric introduced in Sect. 2.3. Note that this metric is canonically defined precisely because the lattices $\Lambda$ and $\check\Lambda$ are dual to each other.

In the dual theory the bosonic fields are $U_i$ and $R^i$, $i=1,\dots,d$, and the fermionic fields are the same as in the toric sigma model. The action is as in (2.12):
\[
I=\frac{i}{2\pi}\int d^2z\left(\partial_zU_j\partial_{\bar z}R^j+\partial_{\bar z}U_j\partial_zR^j+\pi_j\partial_{\bar z}\psi^j+\bar\pi_j\partial_z\bar\psi^j\right). \tag{4.2}
\]


The transformation formulas for the bosonic fields of the two models are
\[
p_i(z)=\partial_zU_i(z,\bar z),\qquad\bar p_i(\bar z)=\partial_{\bar z}U_i(z,\bar z),\qquad\tfrac12\bigl(X^i(z)+\bar X^i(\bar z)\bigr)=R^i(z,\bar z).
\]
The holomortex operators $\Psi_\lambda=e^{-i\int P_\lambda}$ have a simple realization in the dual variables:
\[
\Psi_\lambda(z,\bar z)=e^{-iU_\lambda(z,\bar z)},
\]
where we set $U_\lambda=\sum_{i=1}^{d}b_iU_i$ for $\lambda=\sum_{i=1}^{d}b_ie_i$.

4.3. Changing the target from $T_S$ to $\mathbb{P}_S$. We wish to describe the non-linear sigma model with the target toric variety $\mathbb{P}_S$ as a deformation of the toric sigma model with the target torus $T_S$. We follow the same idea as in the case of $\mathbb{P}^1$ explained in Sect. 3. Recall from Sect. 4.1 that the complement of $T_S$ in $\mathbb{P}_S$ is a divisor, whose irreducible components $\bar C_j$, $j=1,\dots,N$, are naturally parameterized by the one-dimensional cones $\sigma(j)$ in $S$ generated by $v(j)\in\Lambda$. A generic holomorphic map $\Phi:\Sigma\to\mathbb{P}_S$ takes values in $T_S\subset\mathbb{P}_S$ for all but finitely many points, and at the special points it takes values in the open part $C_j$ of the divisor $\bar C_j$, introduced in Sect. 4.1, for some $j=1,\dots,N$. Let us denote the points of $\Sigma$ where $\Phi$ takes values in $C_j$ by $w_k^{(j)}$, $k=1,\dots,m_j$. We propose to include such maps by inserting in the correlation functions the holomortex operators $\Psi_{v(j)}\bigl(w_k^{(j)},\bar w_k^{(j)}\bigr)$ introduced in the previous section. Recall that
\[
\Psi_{v(j)}(w,\bar w)=e^{-i\int_{w_0}^{w}P_{v(j)}}.
\]
Clearly, these operators are $Q$–closed. Hence we find the following formula for the two-form cohomological descendant field of $\Psi_{v(j)}(w,\bar w)$:
\[
\Psi^{(2)}_{v(j)}(w,\bar w)\,dw\,d\bar w=\Psi_{v(j)}(w,\bar w)\,\pi_{v(j)}(w)\,\bar\pi_{v(j)}(\bar w)\,dw\,d\bar w.
\]
Now observe that the lattice of all relations between the generators $v(j)$, $j=1,\dots,N$, of one-dimensional cones in $S$ is generated by $N-d$ linearly independent relations
\[
\sum_{j=1}^{N}a_{ij}v(j)=0,\qquad i=1,\dots,N-d,
\]
where we choose the $a_{ij}$’s to be integers that are relatively prime. Let us introduce parameters $t_j$, $j=1,\dots,N$, and set
\[
q_i=\prod_{j=1}^{N}t_j^{a_{ij}},\qquad i=1,\dots,N-d. \tag{4.3}
\]
As in the case of $\mathbb{P}^1$, the type A twisted sigma model with the target $\mathbb{P}_S$ in the infinite volume is then described by the deformation of the toric sigma model by
\[
\sum_{j=1}^{N}t_j\int\Psi_{v(j)}\,\pi_{v(j)}\,\bar\pi_{v(j)}\,dw\,d\bar w.
\]


Note that the $t_j$’s can be redefined by changing the normalization of the operators $\Psi_{v(j)}$, but this will not affect the parameters $q_i$ given by formula (4.3). Therefore the $q_i$’s are the true parameters of the theory, and they correspond precisely to the Kähler classes on $\mathbb{P}_S$, as explained in [1]. For example, if $\mathbb{P}_S=\mathbb{P}^d$, then we have
\[
\Psi_{v(j)}\,\pi_{v(j)}\,\bar\pi_{v(j)}=e^{-i\int P_j}\,\pi_j\bar\pi_j,\qquad j=1,\dots,d,
\]
\[
\Psi_{v(d+1)}\,\pi_{v(d+1)}\,\bar\pi_{v(d+1)}=e^{i\sum_{j=1}^{d}\int P_j}\left(\sum_{j=1}^{d}\pi_j\right)\left(\sum_{j=1}^{d}\bar\pi_j\right),
\]
and there is only one parameter $q=\prod_{j=1}^{d+1}t_j$.
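As a quick illustration of formula (4.3), the sketch below works out the parameters $q_i$ for two fans and checks their invariance under a rescaling of the $t_j$. The $\mathbb{P}^d$ case only restates the example above. The fan of $\mathbb{P}^1\times\mathbb{P}^1$ and the constants $c_k$ are assumptions introduced for illustration, as is the hypothesis that the change of normalization is multiplicative in the lattice element $v(j)=\sum_k b_{kj}e_k$.

```latex
% P^d: N = d+1, one relation v(1) + ... + v(d+1) = 0, so a_{1j} = 1 for all j
\[
q=\prod_{j=1}^{d+1}t_j .
\]
% P^1 x P^1 (illustrative): d = 2, N = 4,
% v(1) = e_1, v(2) = -e_1, v(3) = e_2, v(4) = -e_2
\[
v(1)+v(2)=0,\quad v(3)+v(4)=0
\quad\Longrightarrow\quad q_1=t_1t_2,\quad q_2=t_3t_4 .
\]
% Rescaling t_j -> t_j \prod_k c_k^{b_{kj}} (assumed multiplicative in v(j)):
\[
q_i\;\longmapsto\;q_i\prod_{j=1}^{N}\prod_{k=1}^{d}c_k^{\,a_{ij}b_{kj}}
 = q_i\prod_{k=1}^{d}c_k^{\,\sum_{j}a_{ij}b_{kj}} = q_i ,
\]
% because \sum_j a_{ij} v(j) = 0 is equivalent to \sum_j a_{ij} b_{kj} = 0 for every k.
```

Under this assumption only the $N-d$ combinations $q_i$ survive, matching the count of independent relations among the $v(j)$.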

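4.4. The I–model. Finally, we apply the T–duality to the action of the deformed toric sigma model. The operators $\Psi_{v(j)}$ are now written as $e^{-iU_{v(j)}}$, and so the action takes the form
\[
\frac{i}{2\pi}\int d^2z\left(\partial_zU_j\partial_{\bar z}R^j+\partial_{\bar z}U_j\partial_zR^j+\pi_j\partial_{\bar z}\psi^j+\bar\pi_j\partial_z\bar\psi^j\right)+\int\widetilde W\,d^2z, \tag{4.4}
\]
where
\[
\widetilde W=\sum_{j=1}^{N}t_j\,e^{-iU_{v(j)}}\,\pi_{v(j)}\,\bar\pi_{v(j)}.
\]
For example, if $\mathbb{P}_S=\mathbb{P}^d$, then we have
\[
\widetilde W=\sum_{j=1}^{d}t_j\,e^{-iU_j}\,\pi_j\bar\pi_j+t_{d+1}\,e^{i\sum_{j=1}^{d}U_j}\left(\sum_{j=1}^{d}\pi_j\right)\left(\sum_{j=1}^{d}\bar\pi_j\right).
\]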

The action (4.4) defines the I–model for a general toric variety $\mathbb{P}_S$. As in the case of $\mathbb{P}^1$, the action (4.4) should be compared to the action of the Landau-Ginzburg model with the superpotential
\[
W=\sum_{j=1}^{N}t_j\,e^{-iY_{v(j)}}. \tag{4.5}
\]
Here $Y_{v(j)}$ is a chiral superfield which is a linear combination of $d$ independent chiral superfields $Y_k$, $k=1,\dots,d$, defined by the formula $Y_{v(j)}=\sum_{k=1}^{d}b_{kj}Y_k$, where $v(j)=\sum_{k=1}^{d}b_{kj}e_k$. We recognize in formula (4.5) the superpotential of the type B twisted Landau-Ginzburg model that is mirror dual to the type A sigma model with the target $\mathbb{P}_S$ considered in [18].

As we explained in the case of $\mathbb{P}^1$, this suggests that mirror symmetry can be realized in two steps. The first step is the equivalence of the twisted sigma model of $\mathbb{P}_S$ (A–model), described as a deformation of a free field theory, and the intermediate model defined by the action (4.4) (I–model), as conformal field theories. The second step is a correspondence between the I–model and the B–model, which is more subtle: it applies only to the BPS sector, and in the BPS sector the two models are equivalent only up to contact terms. We hope that further study of the I–model and its connections with the A–model on the one hand and the B–model on the other hand will help us understand more fully the nature of mirror symmetry.

4.5. Supercharges. Let us compute the supersymmetry charges of the I–model. Following the same computation as in Sect. 3.3, we find the following formulas for the left and right moving supercharges:
\[
Q=-i\left(\int\psi^k\partial_zU_k\,dz+\int\Bigl(\sum_{j=1}^{N}t_j\,e^{-iU_{v(j)}}\,\bar\pi_{v(j)}\Bigr)d\bar z\right),
\]
\[
\bar Q=-i\left(\int\bar\psi^k\partial_{\bar z}U_k\,d\bar z-\int\Bigl(\sum_{j=1}^{N}t_j\,e^{-iU_{v(j)}}\,\pi_{v(j)}\Bigr)dz\right).
\]
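To see how these general formulas reduce to the $\mathbb{P}^1$ case, here is a short specialization. It assumes the fan of $\mathbb{P}^1$ with $v(1)=e_1$, $v(2)=-e_1$, so that $U_{v(1)}=U$, $U_{v(2)}=-U$, $\pi_{v(1)}=\pi$, $\pi_{v(2)}=-\pi$. It also assumes the symmetric normalization $t_1=t_2=q^{1/2}$, which is one possible choice since only $q=t_1t_2$ is a true parameter.

```latex
% P^1: d = 1, N = 2, q = t_1 t_2, symmetric choice t_1 = t_2 = q^{1/2}
\begin{align*}
\widetilde W &= t_1\,e^{-iU}\pi\bar\pi + t_2\,e^{iU}(-\pi)(-\bar\pi)
             = q^{1/2}\bigl(e^{iU}+e^{-iU}\bigr)\pi\bar\pi ,\\
\bar Q &= -i\left(\int\bar\psi\,\partial_{\bar z}U\,d\bar z
          -\int\bigl(t_1\,e^{-iU}\pi - t_2\,e^{iU}\pi\bigr)dz\right)\\
       &= -i\left(\int\bar\psi\,\partial_{\bar z}U\,d\bar z
          + q^{1/2}\int\bigl(e^{iU}-e^{-iU}\bigr)\pi\,dz\right).
\end{align*}
```

Up to the overall factor $-i$, the second expression has the form of the right moving supercharge used for $\mathbb{P}^1$ in Sect. 5.4.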

It is interesting to compute the cohomologies of the right moving supercharge $\bar Q(q)$. As in the case of $\mathbb{P}^1$, we will find in Sect. 5.4 that these cohomologies coincide with the cohomologies of a complex constructed in [4, 25]. It was shown in [25] that in the case of $\mathbb{P}^n$ this cohomology coincides with the quantum cohomology of $\mathbb{P}^n$. We expect the same to be true for more general toric Fano varieties. (In fact, it follows from the results of [25] that the cohomology of the total supercharge $Q+\bar Q$ is isomorphic to the quantum cohomology of $\mathbb{P}_S$.) On the other hand, the cohomology of a degeneration of this complex was computed in [4], and it gives the cohomology of the chiral de Rham complex of $\mathbb{P}_S$. This agrees with the prediction of [34, 20].

5. Operator Formalism

In this section we discuss the operator content of the toric sigma models introduced in the previous sections, the T–duality transform and the deformed models. For simplicity we will mostly treat the case of the target $\mathbb{C}^\times$, as the general case is very similar. The algebraic object that we define (we call it the “Hilbert space” of the theory) obeys the axioms of a vertex algebra mixing chiral and anti-chiral sectors, similar to the ones defined by A. Kapustin and D. Orlov in [21], who considered the case of sigma models of the tori in the finite volume. In particular, we define a state-field correspondence assigning to every state of the Hilbert space an operator depending on $z,\bar z$ acting on the Hilbert space. Thus, the toric sigma models and their T–dual models studied in this paper provide us with new examples of vertex algebras in which chiral and anti-chiral sectors are non-trivially mixed. While there is a vast mathematical literature on the subject of chiral algebras, examples of mixed vertex algebras have not been widely discussed in the mathematical literature so far.
Actually, it is expected that the vertex algebras that occur in the study of mirror symmetry are for the most part of this sort, with the chiral and anti-chiral sectors entangled in a non-trivial way. Therefore we believe that algebraic study of such vertex algebras is important. At the end of this section we will compute the chiral algebra of the I–model and the cohomology of the right moving supercharge, making a connection with the results of [4, 25].


5.1. Hilbert space and state-field correspondence in the toric sigma model. We collect all the ingredients found in Sect. 2 and define the Hilbert space and the state-field correspondence of the toric sigma model. Let us write
\[
X(z)=\omega\log z+\sum_{n\in\mathbb{Z}}X_nz^{-n},\qquad\bar X(\bar z)=\omega\log\bar z+\sum_{n\in\mathbb{Z}}\bar X_n\bar z^{-n},
\]
\[
p(z)=\sum_{n\in\mathbb{Z}}p_nz^{-n-1},\qquad\bar p(\bar z)=\sum_{n\in\mathbb{Z}}\bar p_n\bar z^{-n-1},
\]
and let $T_m$ be the operator satisfying $[\omega,T_m]=mT_m$ and commuting with the $X_n$’s and $p_n$’s. Note that we also have the following commutation relations:
\[
[X_n,p_m]=-i\delta_{n,-m},\qquad[p_n,p_m]=[X_n,X_m]=0,
\]
and $\omega$ commutes with all $p_n$’s and $X_n$’s. We also have similar formulas for the components of the anti-chiral fields.

Consider the Heisenberg algebras generated by $X_n,p_n$, $n\in\mathbb{Z}$, and $\bar X_n,\bar p_n$, $n\in\mathbb{Z}$, respectively. For $\gamma\in\mathbb{C}$, let $F_\gamma$ (resp., $\bar F_\gamma$) be the Fock representation of the Heisenberg algebra generated by a vector annihilated by $X_n$, $n>0$, $p_m$, $m\ge0$ (resp., $\bar X_n$, $n>0$, $\bar p_m$, $m\ge0$) and on which $ip_0$ (resp., $i\bar p_0$) acts by multiplication by $\gamma$. Since the imaginary part of $X(z)$ is periodic, the eigenvalues of $i(p_0-\bar p_0)$ are quantized to be integers. This is exactly the condition that we obtained in Sect. 2.2. The operator $\omega$ is also quantized and has to take integer eigenvalues, called the winding numbers. The Fock representation $F_\alpha$ (resp., $\bar F_\alpha$) on which $\omega$ acts by multiplication by $m\in\mathbb{Z}$ will be denoted $F_{\alpha,m}$ (resp., $\bar F_{\alpha,m}$). We denote by $|\alpha,m\rangle$ (resp., $\overline{|\alpha,m\rangle}$) its generating vector.

The big bosonic Hilbert space of the theory is the direct product of the tensor products of the left and right moving Fock representations
\[
F_{(r+\alpha)/2,m}\otimes\bar F_{(-r+\alpha)/2,m}, \tag{5.1}
\]
where $r,m\in\mathbb{Z}$ and $\alpha$ runs over a subset of $\mathbb{C}$. There are different choices for this subset, which is determined by what type of functions of the zero mode $R_0$ of the field $R(z,\bar z)=(X(z)+\bar X(\bar z))/2$ we wish to allow.

One possibility is to restrict ourselves to the subset of $\alpha\in i\mathbb{R}\subset\mathbb{C}$. This is compatible with the structure of the Hilbert space in the sigma model at the finite radius $\sqrt t$ and corresponds to restricting ourselves to the $L^2$ functions of the zero mode. This choice is natural from the point of view of the latter model because it can itself be obtained as the sigma model with the target torus of radii $\sqrt t$ and $r$ in the limit when $r\to\infty$.


But this is not the only way to treat the toric sigma model. Indeed, we will consider it as a degeneration of the sigma model with the target $\mathbb{P}^1$ (in the infinite volume limit). Therefore another natural choice for the class of functions of the zero mode is the space of polynomial functions in $e^{\pm R_0}$. In fact, it is natural to consider all rational functions on $\mathbb{P}^1$ which are regular on $\mathbb{C}^\times$, that is polynomials in $e^{\pm X_0}$, as well as their complex conjugates, polynomials in $e^{\pm\bar X_0}$. Choosing this space is equivalent to demanding that $\alpha$ be in the set $\mathbb{Z}\subset\mathbb{C}$.

In the subsequent sections we will define a deformation of the toric sigma model which is equivalent to the A–model of $\mathbb{P}^1$ (that is the type A twisted sigma model with the target $\mathbb{P}^1$ in the infinite volume). The operators in this theory will be obtained by restriction to $\mathbb{C}^\times$ from operators defined on the entire $\mathbb{P}^1$. While there are no regular functions on $\mathbb{P}^1$ other than constant functions, there will be composite operators depending on $X(z)$ as well as $p(z)$ that are well-defined, such as the normally ordered products $:e^{\pm X(z)}p(z):$.

Thus, we see that there are different choices for the subset of $\alpha$’s which we may include in our Hilbert space. However, from the purely algebraic point of view, the state-field correspondence that we will now describe works equally well for any of these choices. Therefore in the rest of this subsection we will consider the direct product of the Fock spaces (5.1) with arbitrary complex values of $\alpha$.

The state-field correspondence that we describe now gives us the structure of a vertex algebra combining holomorphic and anti-holomorphic sectors, in the sense of [21]. Note that just like in the case of sigma models on the tori that was considered in [21], we cannot separate the holomorphic and anti-holomorphic sectors of this vertex algebra. The key point is the assignment of a field to the state $|0,m\rangle\otimes\overline{|0,m\rangle}$. We assign to it the field $e^{-im\int P}$ given by the formula
\[
e^{-im\int P}(z,\bar z)\overset{\mathrm{def}}{=}\exp\left(-im\int^{z}\bigl(p(w)\,dw+\bar p(\bar w)\,d\bar w\bigr)\right)
=T_m\left(\frac{z}{\bar z}\right)^{-im(p_0-\bar p_0)/2}|z|^{-im(p_0+\bar p_0)}\exp\left(im\sum_{n\ne0}\frac{p_nz^{-n}+\bar p_n\bar z^{-n}}{n}\right), \tag{5.2}
\]
where $T_m$ is the translation operator that shifts the winding number by $m$ and commutes with all other operators. Note that the operator $i(p_0-\bar p_0)$ has only integer eigenvalues on the Hilbert space of the theory, so this formula is well-defined. This field has the following OPE with $X(z)$:
\[
X(z)\,e^{-im\int P}(w,\bar w)=m\log(z-w)\,e^{-im\int P}(w,\bar w),
\]
and similarly with $\bar X(\bar z)$. Next, we define the field corresponding to the state $|(r+\alpha)/2,m\rangle\otimes\overline{|(-r+\alpha)/2,m\rangle}$ as the normally ordered product
\[
:e^{(r+\alpha)X(z)/2+(-r+\alpha)\bar X(\bar z)/2}\,e^{-im\int P}(z,\bar z):\,,
\]
where
\[
e^{(r+\alpha)X(z)/2+(-r+\alpha)\bar X(\bar z)/2}\overset{\mathrm{def}}{=}S_{(r+\alpha)/2,(-r+\alpha)/2}\left(\frac{z}{\bar z}\right)^{r\omega/2}|z|^{\omega\alpha}\exp\left(\frac12(r+\alpha)\sum_{n\ne0}X_nz^{-n}+\frac12(-r+\alpha)\sum_{n\ne0}\bar X_n\bar z^{-n}\right).
\]


Here $S_{(r+\alpha)/2,(-r+\alpha)/2}$ is the translation operator
\[
F_{(r'+\alpha')/2,m}\otimes\bar F_{(-r'+\alpha')/2,m}\to F_{(r+r'+\alpha+\alpha')/2,m}\otimes\bar F_{(-r-r'+\alpha+\alpha')/2,m}.
\]
Note that since $\omega$ has only integer eigenvalues, this formula is well-defined. Finally, the fields corresponding to other states in the Fock representation $F_{r+\alpha,m}\otimes\bar F_{-r+\alpha,m}$ are constructed as the normally ordered products of the fields defined above and the fields $\partial_zX(z)$, $p(z)$, $\partial_{\bar z}\bar X(\bar z)$, $\bar p(\bar z)$, under the usual assignment:
\[
X_n\mapsto\frac{1}{(-n)!}\partial_z^{-n}X(z),\quad n\le0;\qquad p_n\mapsto\frac{1}{(-n-1)!}\partial_z^{-n-1}p(z),\quad n<0.
\]
This completes the description of the bosonic part of the Hilbert space of the theory. Now we describe the fermionic part. Let us write
\[
\psi(z)=\sum_{n\in\mathbb{Z}}\psi_nz^{-n},\qquad\pi(z)=\sum_{n\in\mathbb{Z}}\pi_nz^{-n-1},
\]
and similarly for the anti-chiral fields. The operator product expansions give us the following anti-commutation relations:
\[
[\pi_n,\psi_m]_+=-i\delta_{n,-m},\qquad[\pi_n,\pi_m]_+=[\psi_n,\psi_m]_+=0,
\]
and similarly for the components of the anti-chiral fields. Consider the Clifford algebra generated by $\psi_n,\pi_n$, $n\in\mathbb{Z}$ (resp., $\bar\psi_n,\bar\pi_n$, $n\in\mathbb{Z}$) and let $F_{\mathrm{ferm}}$ (resp., $\bar F_{\mathrm{ferm}}$) be the fermionic Fock space representation of this algebra generated by a vector annihilated by $\psi_n$, $n>0$, $\pi_m$, $m\ge0$ (resp., $\bar\psi_n$, $n>0$, $\bar\pi_m$, $m\ge0$). The fermionic Hilbert space is $F_{\mathrm{ferm}}\otimes\bar F_{\mathrm{ferm}}$. The total Hilbert space of the theory is the tensor product of the bosonic and fermionic spaces.

5.2. T–duality, operator formalism. Next, we discuss the duality transformation from the operator point of view. The operator content of the toric sigma model is described in the previous section. Let us now describe the operator content of the T–dual free bosonic field theory given by the action (2.11). The equations of motion imply that the fields $U$ and $R$ are harmonic, so we can write
\[
R(z,\bar z)=R_0+\log|z|\,p_R+\sum_{n\ne0}\left(\frac{R_n}{n}z^{-n}+\frac{\bar R_n}{n}\bar z^{-n}\right),
\]
\[
U(z,\bar z)=U_0+\log|z|\,p_U-\frac{i}{2}\,\omega\log\frac{z}{\bar z}+\sum_{n\ne0}\frac{U_n}{n}z^{-n}+\sum_{n\ne0}\frac{\bar U_n}{n}\bar z^{-n}.
\]
The OPEs of these fields are of the form
\[
R(z,\bar z)\,U(w,\bar w)=i\log|z-w|+:R(z,\bar z)\,U(w,\bar w):\,.
\]
The Fourier coefficients of these fields satisfy the following commutation relations:
\[
[R_n,U_m]=\frac{i}{2}n\delta_{n,-m},\qquad[R_n,R_m]=[U_n,U_m]=0,
\]
\[
[\bar R_n,\bar U_m]=\frac{i}{2}n\delta_{n,-m},\qquad[\bar R_n,\bar R_m]=[\bar U_n,\bar U_m]=0,
\]


and $[R_0,p_U]=-i$, $[U_0,p_R]=i$. All other commutators are equal to zero. Because $U$ is periodic, the momentum operator $p_R$ is quantized and takes only integer values, whereas $p_U$ can take arbitrary values. Also, the winding operator $\omega$ is quantized and takes only integer values.

The Hilbert space of the theory is built from Fock representations of the Heisenberg algebra generated by the coefficients in the expansions of $R$ and $U$. For $\beta\in\mathbb{C}$ and $r,m\in\mathbb{Z}$, let $F_{r,\beta,m}$ be the Fock representation generated by a vector $|r,\beta,m\rangle$ annihilated by $R_n,U_n,\bar R_n,\bar U_n$, $n>0$, and on which $p_R$ and $p_U$ act by multiplication by $m$ and by $\beta$, respectively, and $\omega$ acts by multiplication by $r$. Introduce the following translation operators, which map generating vectors to generating vectors and commute with the operators $R_n,U_n,\bar R_n,\bar U_n$, $n\ne0$:
\[
e^{\beta R_0}:F_{r',\beta',m'}\to F_{r',\beta'+\beta,m'},\qquad
e^{imU_0}:F_{r',\beta',m'}\to F_{r',\beta',m'+m},\qquad
e^{r\widetilde R_0}:F_{r',\beta',m'}\to F_{r'+r,\beta',m'}.
\]
The state-field correspondence is defined as follows. The field corresponding to the vector $|r,\beta,m\rangle$ is given by the normally ordered product
\[
:e^{\beta R(z,\bar z)}\,e^{imU(z,\bar z)}\,e^{r\widetilde R(z,\bar z)}:\,, \tag{5.3}
\]
where
\[
e^{\beta R(z,\bar z)}=e^{\beta R_0}\,|z|^{\beta p_R}:\exp\left(\beta\sum_{n\ne0}\frac{R_n}{n}z^{-n}+\beta\sum_{n\ne0}\frac{\bar R_n}{n}\bar z^{-n}\right):\,,
\]
\[
e^{imU(z,\bar z)}=e^{imU_0}\,|z|^{imp_U}\left(\frac{z}{\bar z}\right)^{m\omega/2}:\exp\left(im\sum_{n\ne0}\frac{U_n}{n}z^{-n}+im\sum_{n\ne0}\frac{\bar U_n}{n}\bar z^{-n}\right):\,,
\]
\[
e^{r\widetilde R(z,\bar z)}=e^{r\widetilde R_0}\left(\frac{z}{\bar z}\right)^{rp_R/2}:\exp\left(r\sum_{n\ne0}\frac{R_n}{n}z^{-n}-r\sum_{n\ne0}\frac{\bar R_n}{n}\bar z^{-n}\right):\,.
\]
The other fields are obtained in the standard way as normally ordered products of the field (5.3) and the derivatives of $R$ and $U$, under the rule
\[
R_n\mapsto\frac{1}{(-n-1)!}\partial_z^{-n}R(z,\bar z),\qquad\bar R_n\mapsto\frac{1}{(-n-1)!}\partial_{\bar z}^{-n}R(z,\bar z),
\]
\[
U_n\mapsto\frac{1}{(-n-1)!}\partial_z^{-n}U(z,\bar z),\qquad\bar U_n\mapsto\frac{1}{(-n-1)!}\partial_{\bar z}^{-n}U(z,\bar z),
\]
for $n<0$.
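The zero-mode prefactors in the exponential fields follow directly from the mode expansions. As a sketch, take the expansion of $U(z,\bar z)$ with zero modes $U_0$, $p_U$, $\omega$ and multiply by $im$, keeping only the terms without oscillators:

```latex
\begin{align*}
im\,U(z,\bar z)\Big|_{\text{zero modes}}
  &= im\,U_0 + im\,p_U\log|z| - im\cdot\frac{i}{2}\,\omega\log\frac{z}{\bar z}\\
  &= im\,U_0 + im\,p_U\log|z| + \frac{m\omega}{2}\log\frac{z}{\bar z},\\[2pt]
\exp\Bigl(im\,U\big|_{\text{zero modes}}\Bigr)
  &\;\longrightarrow\; e^{imU_0}\,|z|^{\,im\,p_U}\left(\frac{z}{\bar z}\right)^{m\omega/2}.
\end{align*}
```

Since $z/\bar z=e^{2i\arg z}$, the last factor equals $e^{i m\omega\arg z}$, which is single-valued exactly when $m\omega$ is an integer. This is consistent with the integrality of $\omega$ and of the label $m$.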


The isomorphism between the Hilbert spaces of the two theories is given by the following transformation of the generating fields:
\[
\tfrac12\bigl(X(z)+\bar X(\bar z)\bigr)\mapsto R(z,\bar z), \tag{5.4}
\]
\[
p(z)\mapsto\partial_zU(z,\bar z),\qquad\bar p(\bar z)\mapsto\partial_{\bar z}U(z,\bar z), \tag{5.5}
\]
\[
\tfrac12\bigl(X(z)-\bar X(\bar z)\bigr)\mapsto\widetilde R(z,\bar z). \tag{5.6}
\]
The field $\widetilde R(z,\bar z)$ is non-local with respect to $R(z,\bar z)$, namely, $\widetilde R(z,\bar z)=R_-(z)-R_+(\bar z)$, where $R_\pm$ are the holomorphic and anti-holomorphic parts of $R(z,\bar z)=R_-(z)+R_+(\bar z)$ defined by the formulas
\[
R_-(z)=\frac12\bigl(R_0^-+p_R\log z\bigr)+\sum_{n\ne0}\frac{R_n}{n}z^{-n}, \tag{5.7}
\]
\[
R_+(\bar z)=\frac12\bigl(R_0^++p_R\log\bar z\bigr)+\sum_{n\ne0}\frac{\bar R_n}{n}\bar z^{-n},
\]
where $R_0^\pm=R_0\mp\widetilde R_0$. More precisely, at the level of the operators appearing as the coefficients in the expansions of these fields we have the following transformation:
\[
X_n\mapsto\frac{2}{n}R_n,\qquad\bar X_n\mapsto\frac{2}{n}\bar R_n,\qquad n\ne0,
\]
\[
p_n\mapsto-U_n,\qquad\bar p_n\mapsto-\bar U_n,\qquad n\ne0,
\]
\[
\tfrac12(X_0+\bar X_0)\mapsto R_0,\qquad\tfrac12(X_0-\bar X_0)\mapsto\widetilde R_0,
\]
\[
(p_0+\bar p_0)\mapsto p_U,\qquad i(p_0-\bar p_0)\mapsto\omega,\qquad\omega\mapsto p_R.
\]
Thus we see that this transformation exchanges the momentum and the winding, as expected in T–duality. The isomorphism between the two Hilbert spaces is given by sending $F_{r+\alpha,m}\otimes\bar F_{-r+\alpha,m}$ to $F_{r,\alpha,m}$. The fermionic Hilbert spaces are the same in the two theories. Hence we obtain an isomorphism of the full Hilbert spaces of the two T–dual theories.

5.3. Chiral algebra of the I–model. In Sect. 3.2 we defined the deformed model with the action (3.3), which should be equivalent to the A–model of $\mathbb{P}^1$. The corresponding T–dual model is the I–model, which is a deformation of the theory discussed in the previous section. The action of this theory is given by formula (3.4), and for a more general toric variety $\mathbb{P}_S$ it is given by formula (4.4). In this section we will determine the chiral algebra of integrals of motion of this theory in the sense of [35, 12]. In this context, the I–model is an analogue of the conformal $A_n$ Toda field theory, in which the chiral algebra is the $\mathcal{W}_n$–algebra (see [8, 12]). We will show that the chiral algebra in the I–model corresponding to a toric variety $\mathbb{P}_S$ is isomorphic to the space of global sections of the chiral de Rham complex of $\mathbb{P}_S$. In order to do this, we will identify the complex whose zeroth cohomology is this chiral algebra with the complex introduced by Borisov in [4]. The cohomology of this complex is isomorphic to the cohomology of the chiral de Rham complex of $\mathbb{P}_S$, as shown in [4]. Thus, the I–model provides a natural link between Borisov’s complex, and hence the chiral de Rham complex of $\mathbb{P}_S$, and the sigma model of $\mathbb{P}_S$.

Consider first the case of $\mathbb{P}^1$. The action of the I–model given by formula (3.4) is obtained by deforming the action (2.12) of the free conformal field theory using the operators $q^{1/2}e^{iU}\pi\bar\pi$ and $q^{1/2}e^{-iU}\pi\bar\pi$. According to formula (3.5), for any chiral field $A(z)$ of the free field theory, we have in the I–model,
\[
(\partial_{\bar z}A)(z,\bar z)=q^{1/2}\int e^{iU(w,\bar z)}\pi(w)\bar\pi(\bar z)\,dw\cdot A(z)+q^{1/2}\int e^{-iU(w,\bar z)}\pi(w)\bar\pi(\bar z)\,dw\cdot A(z).
\]
Therefore the chiral algebra of the I–model is equal to the intersection of the kernels of the operators $\int e^{\pm iU(w,\bar z)}\pi(w)\bar\pi(\bar z)\,dw$ on the chiral algebra $V$ of the free theory. The chiral algebra of the free conformal field theory is given by the direct sum
\[
V=\bigoplus_{r\in\mathbb{Z}}F^{\mathrm{ch}}_{r,r,0}\otimes F_{\mathrm{ferm}}. \tag{5.8}
\]
Here $F^{\mathrm{ch}}_{r,r,0}$ is the chiral sector of the Fock representation $F_{r,r,0}$ introduced in Sect. 5.1. The corresponding chiral fields are normally ordered products of $\partial_zU(z)$, $\partial_zR(z)$ and their derivatives, as well as the fields $e^{2rR_-(z)}$, where $R_-$ is given by formula (5.7). The chiral fermionic fields corresponding to vectors in the chiral fermionic Fock representation $F_{\mathrm{ferm}}$ introduced in Sect. 5.1 are normally ordered products of $\psi(z)$, $\pi(z)$ and their derivatives.

We need to find the intersection of the kernels of the operators $\int e^{\pm iU(w,\bar z)}\pi(w)\bar\pi(\bar z)\,dw$ on $V$. Let us write $U(w,\bar w)=U_-(w)+U_+(\bar w)$, where
\[
U_-(w)=\frac12\bigl(U_0+p_U^-\log w\bigr)+\sum_{n\ne0}\frac{U_n}{n}w^{-n},
\]
\[
U_+(\bar w)=\frac12\bigl(U_0+p_U^+\log\bar w\bigr)+\sum_{n\ne0}\frac{\bar U_n}{n}\bar w^{-n},
\]
and $p_U^\pm=p_U\pm i\omega$. Then it is clear that the kernel of the operator
\[
\int e^{\pm iU(w,\bar z)}\pi(w)\bar\pi(\bar z)\,dw=e^{\pm iU_+(\bar z)}\bar\pi(\bar z)\int e^{\pm iU_-(w)}\pi(w)\,dw
\]
on $V$ is equal to the kernel of the operator $S_\pm=\int e^{\pm iU_-(w)}\pi(w)\,dw$. This allows us to express the chiral algebra of the I–model purely in terms of modules over a free chiral superalgebra, as we now explain.
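The splitting of $U$ into chiral and anti-chiral parts can be checked on the zero modes. The sketch below assumes the convention $p_U^{\pm}=p_U\pm i\omega$ and the expansion of $U(w,\bar w)$ from Sect. 5.2. The oscillator sums match term by term and are omitted.

```latex
\begin{align*}
U_-(w)+U_+(\bar w)\Big|_{\text{zero modes}}
 &= U_0+\tfrac12\,(p_U-i\omega)\log w+\tfrac12\,(p_U+i\omega)\log\bar w\\
 &= U_0+\tfrac12\,p_U\,(\log w+\log\bar w)-\tfrac{i}{2}\,\omega\,(\log w-\log\bar w)\\
 &= U_0+p_U\log|w|-\frac{i}{2}\,\omega\log\frac{w}{\bar w}.
\end{align*}
```

This reproduces the zero-mode part of $U(w,\bar w)$, so the two halves depend on $w$ and on $\bar w$ separately, which is what allows the factor $e^{\pm iU_+(\bar z)}\bar\pi(\bar z)$ to be pulled out of the $w$-integral.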


Consider the Heisenberg-Clifford superalgebra with the generators $A_n,B_n,\Phi_n,\Psi_n$, $n\in\mathbb{Z}$, and relations
\[
[B_n,A_m]=n\delta_{n,-m},\qquad[\Phi_n,\Psi_m]_+=\delta_{n,-m},
\]
with all other super-commutators being zero. Let $F_{a,b}$ be the Fock representation of this algebra generated by a vector $|a,b\rangle$ which is annihilated by all generators with $n>0$ and such that
\[
A_0|a,b\rangle=a|a,b\rangle,\qquad B_0|a,b\rangle=b|a,b\rangle,\qquad\Psi_0|a,b\rangle=0.
\]
The direct sum $\bigoplus_{a,b\in\mathbb{Z}}F_{a,b}$ is a chiral algebra. In particular, the field corresponding to the vector $|a,0\rangle$ is given by the standard formulas
\[
e^{a\int A(z)dz}=\exp\left(ap_A+aA_0\log z-a\sum_{n\ne0}\frac{A_n}{n}z^{-n}\right).
\]
Let us identify the above chiral algebra with our chiral algebra by the formula
\[
A_n\mapsto-iU_n,\qquad B_n\mapsto2R_n,\qquad n\ne0,
\]
\[
A_0\mapsto\frac{i}{2}p_U^-,\qquad B_0\mapsto-R_0^-,
\]
\[
\Phi_n\mapsto\psi_n,\qquad\Psi_n\mapsto i\pi_n,\qquad n\in\mathbb{Z}.
\]
Then our chiral algebra $V$ given by formula (5.8) becomes $F_{0,\bullet}=\bigoplus_{b\in\mathbb{Z}}F_{0,b}$, and the above operators $S_\pm$ become the operators
\[
S_\pm=-i\int e^{\pm\int A(z)dz}\,\Psi(z)\,dz:\;F_{0,\bullet}\to F_{\pm1,\bullet}. \tag{5.9}
\]
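One can check that this identification is compatible with the commutation relations of Sects. 5.1 and 5.2. In the sketch below the two fermionic generators are written $\Phi_n$ and $\Psi_n$, a labeling assumed here, with $\Phi_n\mapsto\psi_n$ and $\Psi_n\mapsto i\pi_n$. The inputs are $[R_n,U_m]=\tfrac{i}{2}n\delta_{n,-m}$ and $[\pi_n,\psi_m]_+=-i\delta_{n,-m}$.

```latex
\begin{align*}
[B_n,A_m] &\;\longmapsto\; [\,2R_n,\,-iU_m\,] = -2i\cdot\frac{i}{2}\,n\,\delta_{n,-m} = n\,\delta_{n,-m},\\
[\Psi_n,\Phi_m]_+ &\;\longmapsto\; [\,i\pi_n,\,\psi_m\,]_+ = i\cdot(-i)\,\delta_{n,-m} = \delta_{n,-m},\\
\pm\!\int\! A(z)\,dz &\;\longmapsto\; \pm\Bigl(\tfrac{i}{2}\,p_U^-\log z+\sum_{n\ne0}\frac{iU_n}{n}z^{-n}\Bigr)
   = \pm i\,U_-(z)\ \ \text{(up to the constant mode)},\\
-i\,\Psi(z) &\;\longmapsto\; -i\cdot i\,\pi(z) = \pi(z).
\end{align*}
```

The last two lines show how $S_\pm=\int e^{\pm iU_-(w)}\pi(w)\,dw$ takes the form (5.9) in the new variables.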

Thus, we obtain that the chiral algebra of the I–model is the intersection of the kernels of the operators $S_+$ and $S_-$ on $F_{0,\bullet}$.

Now let us compare this with the results of [4] (see also [17]). In that paper a complex $C^\bullet$ is constructed such that $C^0=F_{0,\bullet}$, and $C^n=F_{n,\bullet}\oplus F_{-n,\bullet}$, where $F_{n,\bullet}=\bigoplus_{b\in\mathbb{Z}}F_{n,b}$. The differential is $d=S_++S_-$, where $S_\pm:F_{n,\bullet}\to F_{n\pm1,\bullet}$ is given by formula (5.9) if $\pm n\ge0$, and is equal to 0 otherwise. It is proved in [4] that the $n$th cohomology of this complex is isomorphic to the $n$th cohomology of the chiral de Rham complex of $\mathbb{P}^1$. The latter vanishes for $n\ne0,1$, and the 0th and 1st cohomology may be described as modules over the affine Kac-Moody algebra $\widehat{\mathfrak{sl}}_2$ of level 0 [17] (see Remark 5.1 below). Now we see that this complex, after a change of variables, naturally appears in the context of the I–model, and hence the A–model of $\mathbb{P}^1$, as anticipated in [4]. In particular, we find that the 0th cohomology of the chiral de Rham complex of $\mathbb{P}^1$ is isomorphic to the chiral algebra of the I–model, and hence to the chiral algebra of the A–model associated to $\mathbb{P}^1$ (in the infinite volume limit).

The operators $S_\pm$ are analogues of the screening operators familiar from the theory of $\mathcal{W}$–algebras (see [8, 12]). It is clear from the above formula that they are residues of fermionic fields. Screening operators of this type have been considered by B. Feigin [9].

The generalization of the above computation to the case of the I–model associated to a toric variety $\mathbb{P}_S$ is straightforward. Using a change of variables similar to the one explained above, we relate the chiral algebra of the I–model associated to $\mathbb{P}_S$ to the 0th cohomology of a complex constructed in [4]. According to [4], the cohomologies of this complex are isomorphic to the cohomologies of the chiral de Rham complex of $\mathbb{P}_S$. In particular, we find that the chiral algebra of the I–model associated to $\mathbb{P}_S$ is isomorphic to the 0th cohomology of the chiral de Rham complex of $\mathbb{P}_S$.

Remark 5.1. The toric sigma model carries a chiral $\widehat{\mathfrak{sl}}_2$ symmetry with level 0, with the generating currents given by the formulas
\[
J^\pm(z)=\;:(p(z)\pm\psi(z)\pi(z))\,e^{\pm X(z)}:\,,\qquad J^0(z)=-ip(z).
\]

These formulas can be obtained by a change of variables from the formulas found in [10], which constitute a special case of the Wakimoto free field realization. There is also an anti-chiral copy of $\widehat{\mathfrak{sl}}_2$, with the anti-chiral currents given by similar formulas. The above chiral fields commute with the screening operators and therefore survive in the deformed theory, and hence we obtain that the A–model of $\mathbb{P}^1$ carries an $\widehat{\mathfrak{sl}}_2$ symmetry. It corresponds to the natural action of the Lie algebra $\mathfrak{sl}_2$ on $\mathbb{P}^1$. One can check that these currents are $\bar Q(q)$–exact, where $\bar Q(q)$ is the right supercharge discussed in the next section.

5.4. Cohomology of the right moving supercharge. Now we wish to compute the cohomology of the right moving supercharge of the I–model and its degeneration. Consider the case of $\mathbb{P}^1$. Recall from Sect. 3.3 that the right moving supercharge of the I–model is given by the formula
\[
\bar Q(q)=\int\bar\psi\,\partial_{\bar z}U\,d\bar z+q^{1/2}\int\bigl(e^{iU}-e^{-iU}\bigr)\pi\,dz,
\]
and in the T–dual variables by
\[
\bar Q(q)=\int\bar\psi\,\bar p\,d\bar z+q^{1/2}\int\bigl(e^{i\int P}-e^{-i\int P}\bigr)\pi\,dz
\]
(we omit the factor of $-i$, which is inessential for the computation of cohomology). We wish to compute the cohomology of this operator on the Hilbert space $H$ of our theory that was described in Sect. 5.1. As the space of functions of the zero mode of the bosonic fields $X,\bar X$ we will take the space of all smooth functions on $\mathbb{C}^\times$.

We will compute the cohomology of $\bar Q(q)$ by utilizing a spectral sequence corresponding to a $\mathbb{Z}$–bigrading on $H$ (we note that our computation is similar in spirit to the computation in [32]). The only non-zero degrees are assigned to the fermionic generators:
\[
\deg\bar\psi_n=-\deg\bar\pi_n=(1,0),\qquad\deg\pi_n=-\deg\psi_n=(0,1).
\]
Then the first summand of the differential $\bar Q(q)$ has degree (1, 0), while the second summand has degree (0, 1). Using the additional $\mathbb{Z}$–gradings by the eigenvalues of the $L_0$ and $\bar L_0$ operators, it is easy to see that the corresponding spectral sequence converges. The zeroth differential is
\[
\int\bar\psi\,\bar p\,d\bar z=\sum_{n\in\mathbb{Z}}\bar\psi_n\bar p_{-n}.
\]


Clearly, it affects only the part of the complex which is generated by $\overline{p}_n, \overline{X}_n, \overline{\psi}_n, \overline{\pi}_n$. All non-zero modes of these operators cancel out in the cohomology, and the cohomology reduces to the cohomology of the operator $\overline{\psi}_0\,\overline{p}_0$ on the zero mode part of the complex. This operator is the Dolbeault $\overline{\partial}$ operator, and its cohomology is the space of holomorphic functions on $\mathbb{C}^\times$ in degree zero, and the other cohomology vanishes. In the computation that follows we will replace this space by the space of Laurent polynomial functions on $\mathbb{C}^\times$. This will not affect the cohomologies. Thus, we obtain that the first term of the spectral sequence is given (in the variables of the I–model) by the direct sum

$$\bigoplus_{m,r\in\mathbb{Z}} F^{\mathrm{ch}}_{r,r,m} \otimes F_{\mathrm{ferm}},$$

where $F^{\mathrm{ch}}_{r,r,m}$ is the chiral sector of the Fock representation $F_{r,r,m}$ introduced in the previous section. The cohomological gradation corresponds to the fermionic charge operator. The differential is given by the formula

$$d = q^{1/2}\int \big(e^{iU} - e^{-iU}\big)\,\pi\,dz.$$

To relate this complex to the complex considered in [4, 25], we make the change of variables from the previous section. Then the complex becomes

$$C_q^\bullet = \bigoplus_{r,m\in\mathbb{Z}} F_{m,r}$$

with the differential

$$d = q^{1/2} S_+ - q^{1/2} S_-,$$

where $S_\pm$ are the screening operators from the previous section. Thus, as a vector space, this complex coincides with the complex $C^\bullet$ from the previous section, but the differential is different. Indeed, the differential on $C^\bullet$ was given by the formula $S_+ - S_-$ (up to inessential factors), but by definition $S_+$ acted non-trivially on $F_{m,r}$ with m ≥ 0, and by 0 on $F_{m,r}$ with m < 0, whereas $S_-$ acted non-trivially on $F_{m,r}$ with m ≤ 0, and by 0 on $F_{m,r}$ with m > 0. In contrast, now the differential is defined in such a way that both $S_+$ and $S_-$ act non-trivially on $F_{m,r}$ with an arbitrary integer m. In particular, the cohomological gradation on $C^\bullet$ introduced in the previous section is now well-defined only mod 2. This new complex is therefore a deformation of the complex $C^\bullet$, which was previously considered in [4, 25]. It was shown in [25] that its cohomology is isomorphic to the quantum cohomology of $\mathbb{P}^1$ (so it is commutative as a chiral algebra). The cohomology is therefore two-dimensional, and as representatives of two independent cohomology classes we can take the identity operator and the operator $q^{1/2}(e^{iU} + e^{-iU})$, familiar from the Landau-Ginzburg theory.

But what about the complex $C^\bullet$ considered in the previous section? Following [4, 25], we can interpret it as a certain limit of the complex $C_q^\bullet$ when q → 0. To this end, let us redefine the term $F_{m,r}$ of the complex by multiplying it with $q^{|m|/2}$. Then the differential $q^{1/2}S_+$, when acting from $F_{m,r}$ to $F_{m+1,r}$, m ≥ 0, will become $S_+$, but when acting from $F_{m,r}$ to $F_{m+1,r}$, m < 0, it will become $qS_+$, and so will vanish when q = 0. Likewise, $q^{1/2}S_-$, when acting from $F_{m,r}$ to $F_{m-1,r}$, m ≤ 0, will become $S_-$, but when acting from $F_{m,r}$ to $F_{m-1,r}$, m > 0, it will become $qS_-$, and so will vanish when q = 0. Thus, when q = 0 the complex $C_q^\bullet$ will degenerate into the complex $C^\bullet$ considered in the previous section. Hence its cohomology will become isomorphic to the cohomology of the chiral de Rham complex of $\mathbb{P}^1$ (and so will become much bigger).
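The bookkeeping behind the rescaling by $q^{|m|/2}$ can be checked mechanically. The following is a minimal sketch; the helper name `q_exponent` is hypothetical, and the only assumption is that rescaling $F_{m,r}$ by $q^{|m|/2}$ multiplies an operator $F_{m,r} \to F_{m',r}$ by $q^{(|m| - |m'|)/2}$, which is the convention that reproduces the cases listed in the text.

```python
from fractions import Fraction

def q_exponent(m, step):
    """Power of q in front of S_+ (step=+1) or S_- (step=-1) acting on F_{m,r}
    after each F_{m,r} has been rescaled by q^{|m|/2}.

    The original differential carries q^{1/2}; the rescaling contributes
    q^{(|m| - |m + step|)/2}.
    """
    return Fraction(1, 2) + Fraction(abs(m), 2) - Fraction(abs(m + step), 2)

# Tabulate the exponents for a few values of m.
for m in range(-3, 4):
    print(m, q_exponent(m, +1), q_exponent(m, -1))
```

The exponent is 0 exactly on the range where the screening operator acted non-trivially in the degenerate complex and 1 elsewhere, so setting q = 0 kills precisely the extra components of the differential.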


The degenerate complex makes perfect sense as the complex computing the cohomology of the right moving supercharge in the perturbative regime, i.e., without the instanton corrections. Indeed, in the perturbative regime we consider maps to $\mathbb{P}^1$ which either pass through 0 or through ∞, but not through both points. We achieve this effect by rescaling the terms of the complex as described above. According to [34, 20], we should expect that the cohomology of the right moving supercharge of the A–model of $\mathbb{P}^1$ (which is equivalent to the I–model) is isomorphic to the cohomology of the chiral de Rham complex of $\mathbb{P}^1$. The above computation confirms this assertion. In addition, we have also obtained the cohomology of the right moving supercharge with the instanton corrections and found it to be isomorphic to the quantum cohomology of $\mathbb{P}^1$, using the results of [4, 25].

To summarize, we have a family of complexes $C_q^\bullet$ depending on a complex parameter q. When q ≠ 0 the cohomology is two-dimensional and is isomorphic to the quantum cohomology of $\mathbb{P}^1$, and when q = 0 the cohomology is isomorphic to the cohomology of the chiral de Rham complex of $\mathbb{P}^1$. Note that we also have a residual action of the left moving supercharge on this cohomology. For q ≠ 0 it simply acts by zero, and so the cohomology of the total supercharge is the quantum cohomology of $\mathbb{P}^1$ as expected in the A–model of $\mathbb{P}^1$. If q = 0, then it is known (see [25]) that the cohomology will be the ordinary (not quantum) cohomology of $\mathbb{P}^1$.

This pattern holds for other Fano toric varieties. Indeed, we can show in the same way as above that for such a variety P S the cohomology of the right moving supercharge of the I–model introduced in Sect. 4.5 is computed by a complex isomorphic to the one introduced in [4, 25]. The differential obtained from the supercharges of Sect. 3.3 coincides with the differential of [4, 25]. In fact, we have a family of complexes parameterized by the Kähler cone of P S. According to [25], in the case when P S = $\mathbb{P}^n$ its cohomology is isomorphic to the quantum cohomology of $\mathbb{P}^n$. We expect the same to be true for general Fano toric varieties. This is confirmed by the computation in [25] (which uses the results of [1]) which shows that the cohomology of the total supercharge is isomorphic to the quantum cohomology of P S. Moreover, the cohomology classes are represented by the elements of the gradient ring of the superpotential W of the I–model. This is what we expect to be true in the I–model of P S, which should be equivalent to the A–model of P S. In the limit when the parameters of our complex tend to zero, our complex degenerates. The cohomology of the degenerate complex was shown in [4] to be isomorphic to the cohomology of the chiral de Rham complex of P S. This is again in agreement with the assertion of [34, 20].

Acknowledgements. E.F. is grateful to E. Witten for illuminating discussions of sigma models, and in particular for explaining his ideas on the connection between sigma models and the chiral de Rham complex prior to their publication in [34]. E.F. also thanks B. Feigin, A.J. Tolland, and especially A. Givental for useful discussions. A.L. thanks B. Feigin, M. Olshanetsky and A. Rosly for discussions. We are grateful to N. Nekrasov and E. Witten for their comments on a draft of this paper. Our collaboration on this project started during a meeting on the geometric Langlands correspondence in Chicago in October of 2004 that was sponsored by DARPA through its Program “Fundamental Advances in Theoretical Mathematics”. We gratefully acknowledge support from DARPA.

References

1. Batyrev, V.: Quantum cohomology rings of toric varieties. Astérisque 218, 9–34 (1993)
2. Baulieu, L., Losev, A., Nekrasov, N.: Target space symmetries in topological theories I. J. HEP 02, 021 (2002)
3. Baulieu, L., Singer, I.: The topological sigma model. Commun. Math. Phys. 125, 227–237 (1989)
4. Borisov, L.: Vertex algebras and mirror symmetry. Commun. Math. Phys. 215, 517–557 (2001)
5. Cecotti, S., Vafa, C.: On classification of N = 2 supersymmetric theories. Commun. Math. Phys. 158, 569 (1993)
6. Cordes, S., Moore, G., Ramgoolam, S.: Lectures on 2D Yang-Mills theory, equivariant cohomology and topological field theory. In: Géométries fluctuantes en mécanique statistique et en théorie des champs (Les Houches, 1994), Amsterdam: North-Holland, 1996, pp. 505–682
7. Eguchi, T., Hori, K., Yang, S.-K.: Topological sigma models and large N matrix integrals. Int. J. Mod. Phys. A10, 4203 (1995)
8. Fateev, V., Lukyanov, S.: The models of two-dimensional conformal quantum field theory with Zn symmetry. Int. J. Mod. Phys. A3, 507–520 (1988)
9. Feigin, B.: Super quantum groups and the algebra of screenings for $\widehat{\mathfrak{sl}}_2$ algebra. RIMS Preprint
10. Feigin, B., Frenkel, E.: Representations of affine Kac–Moody algebras, bosonization and resolutions. Lett. Math. Phys. 19, 307–317 (1990)
11. Feigin, B., Frenkel, E.: Semi-infinite Weil complex and the Virasoro algebra. Commun. Math. Phys. 137, 617–639 (1991)
12. Feigin, B., Frenkel, E.: Integrals of motion and quantum groups. In: Proceedings of the C.I.M.E. School Integrable Systems and Quantum Groups, Italy, June 1993, Lect. Notes in Math. 1620, Berlin-Heidelberg-New York: Springer, 1995
13. Fendley, P., Intriligator, K.: Scattering and thermodynamics in integrable N = 2 theories. Nucl. Phys. 380, 265–292 (1992)
14. Frenkel, E., Ben-Zvi, D.: Vertex Algebras and Algebraic Curves. Mathematical Surveys and Monographs 88, Second Edition, Providence, RI: AMS, 2004
15. Friedan, D., Martinec, E., Shenker, S.: Conformal invariance, supersymmetry and string theory. Nucl. Phys. B271, 93–165 (1986)
16. Givental, A.: Homological geometry and mirror symmetry. In: Proceedings of ICM, Zürich 1994, Basel: Birkhäuser, 1995, pp. 472–480; A mirror theorem for toric complete intersections. In: Topological field theory, primitive forms and related topics (Kyoto, 1996), eds. M. Kashiwara, e.a., Progr. Math. 160, Boston: Birkhäuser, 1998, pp. 141–175
17. Gorbounov, V., Malikov, F., Schechtman, V.: Twisted chiral de Rham algebras on $\mathbb{P}^1$. MPI Preprint, 2001
18. Hori, K., Vafa, C.: Mirror symmetry. http://arxiv.org/list/hep-th/0002222, 2000
19. Hori, K., Katz, S., Klemm, A., Pandharipande, R., Thomas, R., Vafa, C., Vakil, R., Zaslow, E.: Mirror symmetry. Clay Mathematics Monographs, Vol. 1, Providence, RI: AMS, 2004
20. Kapustin, A.: Chiral de Rham complex and the half-twisted sigma-model. http://arxiv.org/list/hep-th/0504074, 2005
21. Kapustin, A., Orlov, D.: Vertex algebras, mirror symmetry, and D-branes: the case of complex tori. Commun. Math. Phys. 233, 79–136 (2003)
22. Losev, A.: Hodge strings and elements of K. Saito’s theory of primitive form. In: Topological field theory, primitive forms and related topics (Kyoto, 1996), eds. M. Kashiwara, e.a., pp. 305–335, Progr. Math. 160, Basel: Birkhäuser, 1998, pp. 305–355
23. Losev, A., Marshakov, A., Zeitlin, A.: On first order formalism in string theory. Phys. Lett. B633, 375–381 (2006)
24. Losev, A., Nekrasov, N., Shatashvili, S.: The freckled instantons. In: The many faces of the superworld, River Edge, NJ: World Sci. Publishing, 2000, pp. 453–475
25. Malikov, F., Schechtman, V.: Deformations of chiral algebras and quantum cohomology of toric varieties. Commun. Math. Phys. 234, 77–100 (2003)
26. Malikov, F., Schechtman, V., Vaintrob, A.: Chiral de Rham complex. Commun. Math. Phys. 204, 439–473 (1999)
27. Polyakov, A.: Quark confinement and topology of gauge groups. Nucl. Phys. 120, 429 (1977)
28. Voisin, C.: Mirror symmetry. SFM/AMS Texts and Monographs, Vol. 1, Providence, RI: AMS, 1999
29. Witten, E.: Topological sigma models. Commun. Math. Phys. 118, 411–449 (1988)
30. Witten, E.: Two-dimensional gravity and intersection theory on moduli space. In: Surveys in Diff. Geom., Vol. 1, Bethlehem, PA: Lehigh Univ., 1991, pp. 243–310
31. Witten, E.: Mirror manifolds and topological field theory. In: Essays on Mirror manifolds, Ed. S.-T. Yau, Cambridge MA: International Press, 1992, pp. 120–158
32. Witten, E.: On the Landau-Ginzburg description of N = 2 minimal models. Int. J. Mod. Phys. A9, 4783–4800 (1994)
33. Witten, E.: Chern-Simons gauge theory as a string theory. In: The Floer memorial volume, Progr. Math. 133, Basel-Boston: Birkhäuser, 1995, pp. 637–678
34. Witten, E.: Two-Dimensional Models With (0,2) Supersymmetry: Perturbative Aspects. http://arxiv.org/list/hep-th/0504078, 2005
35. Zamolodchikov, A.: Integrable field theory from conformal field theory. In: Integrable systems in quantum field theory and statistical mechanics. Adv. Stud. Pure Math. 19, New York-London-San Diego: Academic Press, 1989, pp. 641–674
36. Zwiebach, B.: Closed String Field Theory: Quantum Action and the BV Master Equation. Nucl. Phys. B390, 33–152 (1993)

Communicated by L. Takhtajan

Commun. Math. Phys. 269, 87–105 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0050-0


Cantor and Band Spectra for Periodic Quantum Graphs with Magnetic Fields

Jochen Brüning1, Vladimir Geyler2, Konstantin Pankrashkin1

1 Humboldt-Universität zu Berlin, Institut für Mathematik, Rudower Chaussee 25, Berlin, 12489 Germany.

E-mail: [email protected]

2 Mordovian State University, Mathematical Faculty, Saransk, 430000 Russia

Received: 29 December 2005 / Accepted: 20 March 2006 Published online: 15 June 2006 – © Springer-Verlag 2006

Abstract: We provide an exhaustive spectral analysis of the two-dimensional periodic square graph lattice with a magnetic field. We show that the spectrum consists of the Dirichlet eigenvalues of the edges and of the preimage of the spectrum of a certain discrete operator under the discriminant (Lyapunov function) of a suitable Kronig-Penney Hamiltonian. In particular, between any two Dirichlet eigenvalues the spectrum is a Cantor set for an irrational flux, and is absolutely continuous and has a band structure for a rational flux. The Dirichlet eigenvalues can be isolated or embedded, subject to the choice of parameters. Conditions for both possibilities are given. We show that generically there are infinitely many gaps in the spectrum, and the Bethe-Sommerfeld conjecture fails in this case.

The work was supported by the Deutsche Forschungsgemeinschaft, the Sonderforschungsbereich “Raum, Zeit, Materie” (SFB 647), and the International Bureau of BMBF at the German Aerospace Center (IB DLR, cooperation Germany–New Zealand NZL 05/001).

Introduction

The Hamiltonian H of a charged particle in a two-dimensional system subjected to a periodic electric potential and a uniform magnetic field B has a highly non-trivial spectral and topological structure depending on the ratio of the area σ of the elementary cell of the lattice in question and the squared magnetic length $\ell_M^2 = \hbar c/|eB|$ (here e is the electron charge, c is the light velocity, $\hbar$ is the Planck constant). More precisely, denote by θ the number of the magnetic flux quanta through the elementary cell: $\theta = \sigma/(2\pi \ell_M^2)$. If θ is a rational number, then the spectrum of H has a band structure (i.e. the spectrum is the union of a locally finite family of segments) and for θ ≠ 0 each vector bundle of the magnetic Bloch functions corresponding to a completely filled Landau level is non-trivial [33] (for a non-zero integer θ the Chern number of this bundle is exactly the value of the quantized Hall conductance in units $e^2/(2\pi\hbar)$ [6, 35, 40]). The most likely conjecture is that at irrational values of θ, the spectrum of H has a Cantor structure. This conjecture was recently proved in the case of the tight-binding model for the magnetic Bloch electron [4, 5, 36]; in this case H is reduced to the Harper operator in the discrete Hilbert space $l^2(\mathbb{Z})$. Due to the Langbein duality [28], the same is true for the Hamiltonian of the nearly-free-electron approximation. As a result of these Cantor properties, the diagram describing the dependence of the spectrum on the flux θ, the so-called Hofstadter butterfly, has a remarkable fractal structure, see e.g. [19, 21, 25]. Little is known for the magnetic Schrödinger operator H in the space $L^2(\mathbb{R}^2)$,

$$H = \frac{1}{2m}\Big(p - \frac{e}{c}A\Big)^2 + V(x, y), \tag{1}$$

where A is the vector potential of B and V is a potential which is periodic with respect to the considered lattice. It has been proved in this case that the spectrum of H has a Cantor part near its bottom for a restricted class of potentials V [20]. In this connection, the quantum network models (also called the quantum graph models) have attracted considerable interest recently. These models combine some essential features of both discrete and continuous models mentioned above. On the one hand, the Hamiltonian of a magnetic network model has infinitely many magnetic bands of different shape. On the other, the time-independent Schrödinger equation for this Hamiltonian can be reduced to a discrete equation. S. Alexander was the first to perform this reduction [3] in the framework of the percolation approach to the effect of disorder on superconductivity proposed by P. G. de Gennes [11]. A very short and elegant derivation of the Schrödinger equation for a periodic quantum graph with a uniform magnetic field and a constant potential on each edge of the graph is given in [23]. On the mathematical level of rigor the relation between solutions of the Schrödinger equation for H on quantum graphs and those for a Jacobi matrix J(H) on the corresponding combinatorial graphs was established by P. Exner [14]. Nevertheless, the main theorem from [14] is applicable only to finding eigenvalues of H distinct from the Dirichlet eigenvalues on the edges of the graph. As to the points of the continuous spectrum, the main result of [14] allows an exhaustive analysis only in the case when the direct and inverse Schnol-type theorems are known for both H and J(H).
It is worth noting that quantum networks are not only a mathematical tool to get simplified models of various quantum systems, but in many cases experimental devices really have a shape of planar graphs such that the width of the sides is much smaller than the parameters of the dimension of length which characterizes the quantity in question, e.g., much smaller than the magnetic length, the Fermi wave length, the scattering length, etc. [1,31,32]. In these cases the quantum graph models are the most adequate ones for simulating spectral, scattering, and transport properties of these devices. Here we propose an alternative approach to the spectral analysis of quantum graph Hamiltonians based on boundary triples, Dirichlet-to-Neumann maps, and the Krein technique of self-adjoint extensions. Such a machinery works effectively in many other problems connected with explicitly solvable models [2, 34]. In the case of square network lattices with a periodic magnetic field (including a uniform one), an arbitrary L 2 -potential on edges and δ-like boundary conditions at the vertices (including the Kirchhoff boundary conditions), we perform an exhaustive spectral analysis of the network Hamiltonian H . It is proved that the spectrum always contains Dirichlet eigenvalues of the edges as infinitely degenerate eigenvalues of H . The rest part of the spectrum is absolutely continuous and has a band structure, if θ is a rational number, and is the union of countably many Cantor sets placed between Dirichlet eigenvalues, otherwise. Moreover, this part is the preimage of the spectrum of the corresponding lattice Hamiltonian


J(H) with respect to a many-sheeted analytic function which is just the discriminant of a generalized Kronig–Penney operator (i.e. the Sturm–Liouville operator with a Kronig–Penney potential). The eigenvalues are isolated if the magnetic flux is non-integer, while for integer magnetic fluxes it depends on the electric potential and on the coupling at the nodes.

1. Magnetic Schrödinger Operator on the Periodic Metric Graph

Consider a planar square graph lattice whose nodes are the points $K_{m,n} := (ml, nl)$, $(m, n) \in \mathbb{Z}^2$, where l > 0 is the length of each edge. Two nodes $K_{m,n}$ and $K_{p,q}$ are connected by an edge iff |m − p| + |n − q| = 1. We denote the edge between $K_{m,n}$ and $K_{m+1,n}$ by $E_{m,n,r}$ (right), and between $K_{m,n}$ and $K_{m,n+1}$ by $E_{m,n,u}$ (up). Each edge $E_{m,n,r/u}$ will be considered as the segment [0, l] so that 0 is identified in both cases with $K_{m,n}$, and l is identified with $K_{m+1,n}$ for $E_{m,n,r}$ and $K_{m,n+1}$ for $E_{m,n,u}$, respectively. The state space of the lattice is

$$\mathcal{H} = \bigoplus_{(m,n)\in\mathbb{Z}^2} \big(\mathcal{H}_{m,n,r} \oplus \mathcal{H}_{m,n,u}\big), \qquad \mathcal{H}_{m,n,r/u} = L^2[0, l].$$

The elements of $\mathcal{H}$ will be denoted as $f = (f_{m,n,r}, f_{m,n,u})$, $f_{m,n,r/u} \in \mathcal{H}_{m,n,r/u}$. On each edge consider the same electric potential $V \in L^2[0, l]$. We assume that the lattice is subjected to an external magnetic field orthogonal to the plane, $B(x) = \big(0, 0, b(x)\big)$, $b \in C(\mathbb{R}^2)$, such that the quantity

$$\xi = \frac{1}{2\pi l^2}\int_{F_{m,n}} b(x)\,dx,$$

where $F_{m,n}$ is the square spanned by $E_{m,n,r}$ and $E_{m,n,u}$, is independent of $m, n \in \mathbb{Z}$. This includes the periodic magnetic field, i.e. the case $b(x_1 + l, x_2) = b(x_1, x_2 + l) = b(x_1, x_2)$. The corresponding magnetic vector potential in the symmetric gauge can be written as $A(x_1, x_2, x_3) = (-\pi\xi x_2, \pi\xi x_1, 0) + \big(A_1(x_1, x_2), A_2(x_1, x_2), 0\big)$ with

$$\int_{F_{m,n}} \Big(\frac{\partial A_2}{\partial x_1}(x_1, x_2) - \frac{\partial A_1}{\partial x_2}(x_1, x_2)\Big)\,dx_1\,dx_2 \equiv \int_{ml}^{(m+1)l} \Big(A_1(t, nl) - A_1\big(t, (n+1)l\big)\Big)\,dt + \int_{nl}^{(n+1)l} \Big(A_2\big((m+1)l, t\big) - A_2(ml, t)\Big)\,dt = 0 \quad \text{for all } m, n \in \mathbb{Z}. \tag{2}$$

The presence of the magnetic field leads to non-trivial magnetic potentials on the edges, which are the projections of A(x) on the corresponding directions. The magnetic potentials $A_{m,n,r/u}$ on $E_{m,n,r/u}$ are:

$$A_{m,n,r}(t) = \big\langle A\big((ml, nl, 0) + (1, 0, 0)t\big), (1, 0, 0)\big\rangle \equiv -\pi\xi nl + A_1(ml + t, nl),$$

$$A_{m,n,u}(t) = \big\langle A\big((ml, nl, 0) + (0, 1, 0)t\big), (0, 1, 0)\big\rangle \equiv \pi\xi ml + A_2(ml, nl + t).$$

On each of the edges $E_{m,n,r/u}$ we consider the operator

$$L_{m,n,r/u} = \Big(-i\frac{d}{dt} - A_{m,n,r/u}\Big)^2 + V,$$


with the domain $H^2[0, l]$. The direct sum of these operators over all edges is not self-adjoint, and in order to obtain a self-adjoint operator on the whole lattice it is necessary to introduce boundary conditions at each node. The most general boundary conditions involve a number of parameters and can be found, for example, in [24]. We restrict ourselves to the so-called magnetic δ-like interaction at $K_{m,n}$,

$$\begin{aligned}
&f_{m,n,r}(0) = \frac{1}{\beta} f_{m,n,u}(0) = f_{m-1,n,r}(l) = \frac{1}{\beta} f_{m,n-1,u}(l) =: f_{m,n},\\
&\Big(\frac{d}{dt} - iA_{m,n,r}\Big) f_{m,n,r}(0) + \beta\Big(\frac{d}{dt} - iA_{m,n,u}\Big) f_{m,n,u}(0)\\
&\quad - \Big(\frac{d}{dt} - iA_{m-1,n,r}\Big) f_{m-1,n,r}(l) - \beta\Big(\frac{d}{dt} - iA_{m,n-1,u}\Big) f_{m,n-1,u}(l) = \alpha f_{m,n}, \qquad m, n \in \mathbb{Z},
\end{aligned} \tag{3}$$

where $\alpha \in \mathbb{R}$, $\beta \in \mathbb{R} \setminus \{0\}$. These quantities have the following physical meaning. The parameter α ≠ 0 is the coupling constant of a δ-like potential at each node. Introducing the parameter β can be treated as considering a more general form of the Hamiltonian H:

$$H = \sum_{jk} \frac{1}{2m_{jk}}\Big(p_j - \frac{e}{c}A_j\Big)\Big(p_k - \frac{e}{c}A_k\Big) + V(x, y),$$

where $m_{jk}$ is the effective mass tensor and β is the corresponding anisotropy coefficient (the ratio of the eigenvalues of the symmetric matrix $(m_{jk})$). In particular, if β = 1, one obtains H in the form (1); if in addition α = 0, we get the magnetic Kirchhoff coupling. This class of boundary conditions covers the main couplings used in the physics literature. We denote the self-adjoint operator obtained in this way by L.

2. Gauge Transformations

To study the spectral properties of L it is useful to apply the gauge transformation $(f_{m,n,r}, f_{m,n,u}) = (U_{m,n,r}\varphi_{m,n,r}, U_{m,n,u}\varphi_{m,n,u})$ given by

$$f_{m,n,r/u}(t) = \exp\Big(i\int_0^t A_{m,n,r/u}(s)\,ds\Big)\varphi_{m,n,r/u}(t) =: U_{m,n,r/u}\,\varphi_{m,n,r/u}(t).$$

There holds $U^{-1}_{m,n,r/u} L_{m,n,r/u} U_{m,n,r/u} = -\dfrac{d^2}{dt^2} + V$, and the boundary conditions (3) for $\varphi = (\varphi_{m,n,r}, \varphi_{m,n,u}) \in U^{-1}(\operatorname{dom} L)$ take the form

$$\varphi'_{m,n} = \alpha\,\varphi_{m,n}, \tag{4a}$$

where

$$\varphi_{m,n} := \varphi_{m,n,r}(0) = \frac{1}{\beta}\varphi_{m,n,u}(0) = \exp\Big(-i\pi n\theta + i\int_{(m-1)l}^{ml} A_1(t, nl)\,dt\Big)\varphi_{m-1,n,r}(l) = \frac{1}{\beta}\exp\Big(i\pi m\theta + i\int_{(n-1)l}^{nl} A_2(ml, t)\,dt\Big)\varphi_{m,n-1,u}(l) \tag{4b}$$

and

$$\varphi'_{m,n} := \varphi'_{m,n,r}(0) + \beta\varphi'_{m,n,u}(0) - \exp\Big(-i\pi n\theta + i\int_{(m-1)l}^{ml} A_1(t, nl)\,dt\Big)\varphi'_{m-1,n,r}(l) - \beta\exp\Big(i\pi m\theta + i\int_{(n-1)l}^{nl} A_2(ml, t)\,dt\Big)\varphi'_{m,n-1,u}(l), \tag{4c}$$

and $\theta := \xi l^2$ is the number of flux quanta through the elementary cells $F_{m,n}$. Therefore, the operator $\widetilde{L} = U^{-1}LU$ acts on each edge as $\varphi_{m,n,r/u} \mapsto -\varphi''_{m,n,r/u} + V\varphi_{m,n,r/u}$ on functions φ satisfying (4), and its spectrum coincides with the spectrum of L. To simplify subsequent calculations we apply another gauge transformation,

$$\varphi_{m,n,r/u} = \exp\Big(i\int_0^{ml} A_1(t, nl)\,dt + i\int_0^{nl} A_2(0, t)\,dt\Big)\phi_{m,n,r/u}.$$
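The conjugation identity for $U_{m,n,r/u}$ stated above follows from a short computation. The derivation below is a sketch for a single edge, writing $A$ for $A_{m,n,r/u}$ and $a(t) := \int_0^t A(s)\,ds$; this shorthand is introduced here for illustration and is not part of the original notation.

```latex
\begin{aligned}
\Big(-i\frac{d}{dt} - A\Big)\big(e^{ia}\varphi\big)
  &= -i\big(iA\,e^{ia}\varphi + e^{ia}\varphi'\big) - A\,e^{ia}\varphi
   = -i\,e^{ia}\varphi', \\
\Big(-i\frac{d}{dt} - A\Big)^{2}\big(e^{ia}\varphi\big)
  &= \Big(-i\frac{d}{dt} - A\Big)\big(-i\,e^{ia}\varphi'\big)
   = -\,e^{ia}\varphi''.
\end{aligned}
```

Multiplying by $e^{-ia}$ and adding $V$ gives $U^{-1}\big((-i\,d/dt - A)^2 + V\big)U = -d^2/dt^2 + V$. The first line also shows that $(d/dt - iA)\big(e^{ia}\varphi\big) = e^{ia}\varphi'$, so evaluating at $t = l$ produces the phase $e^{ia(l)}$, which is the origin of the exponential factors in (4b) and (4c).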

Substituting this expression into (4) and using (2) we arrive at an operator acting on each edge as $\phi_{m,n,r/u} \mapsto -\phi''_{m,n,r/u} + V\phi_{m,n,r/u}$ on functions $\phi = (\phi_{m,n,r}, \phi_{m,n,u})$, $\phi_{m,n,r/u} \in H^2[0, l]$, satisfying

$$\phi_{m,n,r}(0) = \frac{1}{\beta}\phi_{m,n,u}(0) = e^{-i\pi n\theta}\phi_{m-1,n,r}(l) = \frac{1}{\beta}e^{i\pi m\theta}\phi_{m,n-1,u}(l) =: \phi_{m,n}, \tag{5a}$$

$$\phi'_{m,n} = \alpha\,\phi_{m,n}, \qquad m, n \in \mathbb{Z}, \tag{5b}$$

where

$$\phi'_{m,n} := \phi'_{m,n,r}(0) + \beta\phi'_{m,n,u}(0) - e^{-i\pi n\theta}\phi'_{m-1,n,r}(l) - \beta e^{i\pi m\theta}\phi'_{m,n-1,u}(l). \tag{5c}$$

This operator, which we denote by Λ, is unitarily equivalent to the initial magnetic Hamiltonian L. In what follows we work mostly with this new operator. At this point we emphasize some important circumstances. First, we see that the initial magnetic field need not be periodic to produce a periodic operator on the lattice. Second, for the usual magnetic Schrödinger operators in $L^2(\mathbb{R}^2)$ the spectral analysis for non-zero but periodic magnetic vector potentials (i.e. with the zero flux per cell) essentially differs from that for the Schrödinger operators without magnetic field; even the proof of the absolute continuity of the spectrum is non-trivial [7, 39]. In our case, the operator on the graph with a periodic magnetic vector potential appears to be unitarily equivalent to the operator without magnetic field. Third, for the usual magnetic Schrödinger operators the bottom of the spectrum grows infinitely as the flux becomes infinitely large. In our situation, the spectrum is 1-periodic with respect to the magnetic flux θ, as replacing θ by θ + 1 in (5) obviously can be compensated by a unitary transformation. Such periodicity leads to the so-called Aharonov–Bohm oscillations in the corresponding physical quantities.

Remark 1. It is worth noting that Λ is invariant with respect to the so-called magnetic translation group $G_M$ [41]. In our case this group is generated by the magnetic shift operators $\tau_r$ and $\tau_u$,

$$\tau_r\phi_{m,n,r/u}(t) = e^{i\pi n\theta}\phi_{m-1,n,r/u}(t), \qquad \tau_u\phi_{m,n,r/u}(t) = e^{-i\pi m\theta}\phi_{m,n-1,r/u}(t).$$
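The magnetic shift operators just defined can be experimented with directly. The sketch below is an illustration only: it ignores the edge variable t and the edge type, since both shifts act identically on them, and models a lattice function as a Python callable on pairs (m, n). The names `tau_r`, `tau_u` and `commutation_defect` are hypothetical. A direct computation from the definitions suggests the relation $\tau_r\tau_u = e^{2\pi i\theta}\tau_u\tau_r$, which the code checks numerically.

```python
import cmath

def tau_r(phi, theta):
    """Magnetic shift to the right: (tau_r phi)(m, n) = exp(i*pi*n*theta) * phi(m-1, n)."""
    return lambda m, n: cmath.exp(1j * cmath.pi * n * theta) * phi(m - 1, n)

def tau_u(phi, theta):
    """Magnetic shift upwards: (tau_u phi)(m, n) = exp(-i*pi*m*theta) * phi(m, n-1)."""
    return lambda m, n: cmath.exp(-1j * cmath.pi * m * theta) * phi(m, n - 1)

def commutation_defect(phi, theta, m, n):
    """Return |tau_r tau_u phi - exp(2*pi*i*theta) tau_u tau_r phi| at the node (m, n)."""
    lhs = tau_r(tau_u(phi, theta), theta)(m, n)
    rhs = cmath.exp(2j * cmath.pi * theta) * tau_u(tau_r(phi, theta), theta)(m, n)
    return abs(lhs - rhs)

# An arbitrary test function on the lattice (chosen only for illustration).
phi = lambda m, n: complex(m + 0.5, n * n - 1.0)

for theta in (0.3, 1.0, 2 ** 0.5):
    print(theta, commutation_defect(phi, theta, 2, -3))
```

For integer θ the phase $e^{2\pi i\theta}$ equals 1 and the two shifts commute; otherwise they commute only up to this phase, which is one way to see why the arithmetic nature of θ matters.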


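The properties of this group depend drastically on the arithmetic properties of θ [9]. In particular, if θ is irrational, then $G_M$ has only infinite-dimensional irreducible representations which are trivial on the center of $G_M$. Therefore, for any irrational θ each point of spec Λ is infinitely degenerate.

Proposition 2. The operator Λ is semibounded below.

Proof. Let φ ∈ dom Λ. Using the integration by parts and changing suitably the summation order one obtains

$$\begin{aligned}
\langle\phi, \Lambda\phi\rangle
&= \sum_{m,n}\Big(\big\langle\phi_{m,n,r}, -\phi''_{m,n,r} + V\phi_{m,n,r}\big\rangle + \big\langle\phi_{m,n,u}, -\phi''_{m,n,u} + V\phi_{m,n,u}\big\rangle\Big)\\
&= \sum_{m,n}\Big(\overline{\phi_{m,n,r}(0)}\,\phi'_{m,n,r}(0) - \overline{\phi_{m,n,r}(l)}\,\phi'_{m,n,r}(l) + \overline{\phi_{m,n,u}(0)}\,\phi'_{m,n,u}(0) - \overline{\phi_{m,n,u}(l)}\,\phi'_{m,n,u}(l)\\
&\qquad\quad + \int_0^l \big(|\phi'_{m,n,r}|^2 + V|\phi_{m,n,r}|^2\big)\,dx + \int_0^l \big(|\phi'_{m,n,u}|^2 + V|\phi_{m,n,u}|^2\big)\,dx\Big)\\
&= \sum_{m,n}\Big(\int_0^l \big(|\phi'_{m,n,r}|^2 + V|\phi_{m,n,r}|^2\big)\,dx + \int_0^l \big(|\phi'_{m,n,u}|^2 + V|\phi_{m,n,u}|^2\big)\,dx\\
&\qquad\quad + \overline{\phi_{m,n}}\big(\phi'_{m,n,r}(0) + \beta\phi'_{m,n,u}(0) - e^{-i\pi n\theta}\phi'_{m-1,n,r}(l) - \beta e^{i\pi m\theta}\phi'_{m,n-1,u}(l)\big)\Big)\\
&= \sum_{m,n}\Big(\int_0^l \big(|\phi'_{m,n,r}|^2 + V|\phi_{m,n,r}|^2\big)\,dx + \int_0^l \big(|\phi'_{m,n,u}|^2 + V|\phi_{m,n,u}|^2\big)\,dx + \overline{\phi_{m,n}}\,\phi'_{m,n}\Big)\\
&= \sum_{m,n}\Big(\int_0^l \big(|\phi'_{m,n,r}|^2 + V|\phi_{m,n,r}|^2\big)\,dx + \int_0^l \big(|\phi'_{m,n,u}|^2 + V|\phi_{m,n,u}|^2\big)\,dx + \alpha|\phi_{m,n}|^2\Big).
\end{aligned}$$

Now choose c ∈ (0, 1) and C ∈ ℝ with

$$|\alpha|\,|h(0)|^2 \le \int_0^l \big(c|h'|^2 + (V + C)|h|^2\big)\,dx \qquad \text{for all } h \in H^1[0, l]$$

(the existence of such constants follows from the Sobolev inequality), then

$$|\alpha|\,|\phi_{m,n}|^2 \equiv |\alpha|\,|\phi_{m,n,r}(0)|^2 \le \int_0^l \big(c|\phi'_{m,n,r}|^2 + (V + C)|\phi_{m,n,r}|^2\big)\,dx,$$

and

$$\big\langle\phi, (\Lambda + C)\phi\big\rangle \ge \sum_{m,n}\int_0^l \big((1 - c)|\phi'_{m,n,r}|^2 + |\phi'_{m,n,u}|^2 + (V + C)|\phi_{m,n,u}|^2\big)\,dx \ge 0.$$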


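3. Boundary Triples

Here we describe briefly the technique of abstract self-adjoint boundary value problems with the help of boundary triples. For more detailed discussion we refer to [12]. Let S be a closed linear operator in a Hilbert space $\mathcal{H}$ with the domain dom S. Assume that there exists an auxiliary Hilbert space $\mathcal{G}$ and two linear maps $\Gamma, \Gamma' : \operatorname{dom} S \to \mathcal{G}$ such that

• for any f, g ∈ dom S there holds $\langle f, Sg\rangle - \langle Sf, g\rangle = \langle\Gamma f, \Gamma' g\rangle - \langle\Gamma' f, \Gamma g\rangle$,
• the map $(\Gamma, \Gamma') : \operatorname{dom} S \to \mathcal{G} \oplus \mathcal{G}$ is surjective,
• the set $\ker\Gamma \cap \ker\Gamma'$ is dense in $\mathcal{H}$.

The triple $(\mathcal{G}, \Gamma, \Gamma')$ with the above properties is called a boundary triple for S.

Example 3. Let us describe one important example of a boundary triple. Let $V \in L^2[0, l]$ be a real-valued function. In $\mathcal{H} = L^2[0, l]$ consider the operator

$$S = -\frac{d^2}{dt^2} + V, \qquad \operatorname{dom} S = H^2[0, l], \tag{6}$$

then one can set

$$\mathcal{G} = \mathbb{C}^2, \qquad \Gamma f = \begin{pmatrix} f(0)\\ f(l)\end{pmatrix}, \qquad \Gamma' f = \begin{pmatrix} f'(0)\\ -f'(l)\end{pmatrix}. \tag{7}$$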
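The first defining property of a boundary triple can be checked numerically for the triple (7). The following sketch uses only the Python standard library; the test functions f and g, the real potential V and the edge length are arbitrary choices made here for illustration, and the helper names are hypothetical. It compares $\langle f, Sg\rangle - \langle Sf, g\rangle$, computed by Simpson quadrature, with $\langle\Gamma f, \Gamma' g\rangle - \langle\Gamma' f, \Gamma g\rangle$.

```python
import cmath

EDGE = 1.0  # edge length l (arbitrary choice for this sketch)

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * func(a + j * h)
    return total * h / 3

# Smooth test functions with hand-computed derivatives (illustrative only).
f = lambda t: (1 + 2j) * t**2 + t**3
df = lambda t: 2 * (1 + 2j) * t + 3 * t**2
d2f = lambda t: 2 * (1 + 2j) + 6 * t
g = lambda t: cmath.cos(t) + 1j * t
dg = lambda t: -cmath.sin(t) + 1j
d2g = lambda t: -cmath.cos(t)
V = lambda t: 1.0 + t  # a real-valued potential

def apply_S(u, d2u):
    """S u = -u'' + V u."""
    return lambda t: -d2u(t) + V(t) * u(t)

def inner_L2(u, v):
    """L^2[0, l] inner product, conjugate-linear in the first argument."""
    return simpson(lambda t: u(t).conjugate() * v(t), 0.0, EDGE)

def inner_C2(a, b):
    """Inner product in C^2, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

gamma = lambda u: (u(0.0), u(EDGE))             # Gamma u  = (u(0), u(l))
gamma_prime = lambda du: (du(0.0), -du(EDGE))   # Gamma' u = (u'(0), -u'(l))

lhs = inner_L2(f, apply_S(g, d2g)) - inner_L2(apply_S(f, d2f), g)
rhs = inner_C2(gamma(f), gamma_prime(dg)) - inner_C2(gamma_prime(df), gamma(g))
print(abs(lhs - rhs))
```

The terms containing V cancel because V is real, so the identity reduces to integration by parts on [0, l].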

If an operator S has a boundary triple, then it has self-adjoint restrictions provided $S^*$ is a symmetric operator (see Theorem 3.1.6 in [18]). For example, if T is a self-adjoint operator in $\mathcal{G}$, then the restriction of S to elements f satisfying the abstract boundary conditions $\Gamma' f = T\,\Gamma f$ is a self-adjoint operator in $\mathcal{H}$, which we denote by $H_T$. Another example is the operator $H^0$ corresponding to the boundary conditions $\Gamma f = 0$. One can relate the resolvents of $H^0$ and $H_T$ as well as their spectral properties by the Krein resolvent formula, which is our most important tool in this paper. Let $z \notin \operatorname{spec} H^0$. For $g \in \mathcal{G}$ denote by γ(z)g the unique solution to the abstract boundary value problem (S − z)f = 0 with $\Gamma f = g$ (the solution exists due to the above conditions for Γ and Γ′). Clearly, γ(z) is a linear map from $\mathcal{G}$ to $\mathcal{H}$. Denote also by Q(z) the operator on $\mathcal{G}$ given by $Q(z)g = \Gamma'\gamma(z)g$; this map is called the Krein function. The operator-valued functions γ and Q are analytic outside $\operatorname{spec} H^0$. Moreover, Q(z) is self-adjoint for real z.

Proposition 4.
(A) (Proposition 2 in [12]) For $z \notin \operatorname{spec} H^0 \cup \operatorname{spec} H_T$ the operator Q(z) − T acting on $\mathcal{G}$ has a bounded inverse defined everywhere and the Krein resolvent formula holds, $(H^0 - z)^{-1} - (H_T - z)^{-1} = \gamma(z)\big(Q(z) - T\big)^{-1}\gamma^*(\bar z)$.
(B) The set $\operatorname{spec} H_T \setminus \operatorname{spec} H^0$ consists exactly of real numbers z such that $0 \in \operatorname{spec}\big(Q(z) - T\big)$.
(C) (Theorem 1 in [16]) Let $z \in \operatorname{spec} H_T \setminus \operatorname{spec} H^0$, then z is an eigenvalue of $H_T$ if and only if 0 is an eigenvalue of Q(z) − T, and in this case γ(z) is an isomorphism of the corresponding eigensubspaces.

This statement is especially useful if the spectrum of $H^0$ is a discrete set and the spectrum of $H_T$ is expected to have a positive measure, because one can describe most of the spectrum of $H_T$ in terms of Q(z) − T.
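Proposition 4(B) can be tried out on the simplest case. The sketch below makes several assumptions that are not in the text: it takes the triple of Example 3 with V = 0, for which the Krein function has a closed form in terms of sin and cos, and the scalar choice T = a·I, which corresponds to the Robin conditions f′(0) = a f(0) and −f′(l) = a f(l). All function names are hypothetical. It locates a zero of det(Q(z) − T) between 0 and the first Dirichlet eigenvalue by bisection and then checks that the corresponding solution really satisfies both boundary conditions.

```python
import math

def krein_matrix(z, l=1.0):
    """Krein function Q(z) for S = -d^2/dt^2 on [0, l] (V = 0), valid for z > 0
    away from the Dirichlet eigenvalues (pi*k/l)^2."""
    k = math.sqrt(z)
    s, c = math.sin(k * l), math.cos(k * l)
    return [[-k * c / s, k / s],
            [k / s, -k * c / s]]

def det_q_minus_t(z, a, l=1.0):
    """det(Q(z) - a*I) for the scalar boundary operator T = a*I."""
    q = krein_matrix(z, l)
    return (q[0][0] - a) * (q[1][1] - a) - q[0][1] * q[1][0]

def bisect(func, lo, hi, steps=200):
    """Plain bisection, assuming func changes sign on [lo, hi]."""
    f_lo = func(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, l = 1.0, 1.0
# Search strictly between 0 and the first Dirichlet eigenvalue (pi/l)^2.
z_star = bisect(lambda z: det_q_minus_t(z, a, l), 0.01, (math.pi / l) ** 2 - 0.01)

# Solution of -f'' = z f normalised by f(0) = 1, f'(0) = a.
k = math.sqrt(z_star)
f = lambda x: math.cos(k * x) + (a / k) * math.sin(k * x)
df = lambda x: -k * math.sin(k * x) + a * math.cos(k * x)

residual_0 = df(0.0) - a * f(0.0)   # condition at t = 0
residual_l = -df(l) - a * f(l)      # condition at t = l
print(z_star, residual_0, residual_l)
```

Both residuals vanish up to rounding, so the zero of det(Q(z) − T) is an eigenvalue of $H_T$ that is not a Dirichlet eigenvalue, as (B) and (C) predict.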


Example 5. Consider the example given by (6) and (7). The corresponding Krein function s(z) can be obtained as follows. The restriction D of S given by $\Gamma f = 0$ is

$$Df = -f'' + Vf, \qquad \operatorname{dom} D = \{f \in H^2[0, l] : f(0) = f(l) = 0\}. \tag{8}$$

In what follows we denote the eigenvalues of D by $\mu_k$, k = 0, 1, 2, ..., $\mu_0 < \mu_1 < \mu_2 < \dots$. Let two functions $u_1, u_2 \in H^2[0, l]$ satisfy

$$u_1, u_2 \in \ker(S - z), \qquad u_1(0; z) = 0, \quad u_2(0; z) = 1, \qquad u_1'(0; z) = 1, \quad u_2'(0; z) = 0. \tag{9}$$

Clearly, for their Wronskian one has $w(z) = u_1'(x; z)u_2(x; z) - u_1(x; z)u_2'(x; z) \equiv 1$. Both $u_1, u_2$ as well as their derivatives with respect to x are entire functions of z. Let $z \notin \operatorname{spec} D$, then any function f with $-f'' + Vf = zf$ can be written as

$$f(x; z) = \frac{f(l) - f(0)u_2(l; z)}{u_1(l; z)}\,u_1(x; z) + f(0)u_2(x; z),$$

and the calculation of $f'(0)$ and $-f'(l)$ gives

$$s(z) = \frac{1}{u_1(l; z)}\begin{pmatrix} -u_2(l; z) & 1\\ w(z) & -u_1'(l; z)\end{pmatrix} \equiv \frac{1}{u_1(l; z)}\begin{pmatrix} -u_2(l; z) & 1\\ 1 & -u_1'(l; z)\end{pmatrix}. \tag{10}$$

It can be directly seen that s(z) is real and self-adjoint for real z. Clearly, the matrix s has simple poles at $\mu_k$, which are at the same time simple zeros of $u_1(l; z)$. More precisely, by the well-known arguments, see e.g. Eq. (I.4.13) in [29], there holds

$$\frac{\partial u_1(l; z)}{\partial z}\bigg|_{z=\mu_k} = u_2(l; \mu_k)\int_0^l u_1^2(s; \mu_k)\,ds,$$

and $u_2(l; \mu_k) \ne 0$ due to $u_1'(l; \mu_k)u_2(l; \mu_k) \equiv w(\mu_k) = 1$.
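The Wronskian normalisation and the derivative formula at a Dirichlet eigenvalue can be checked numerically in the free case. The sketch below assumes V = 0, so that $u_1(x; z) = \sin(\sqrt{z}\,x)/\sqrt{z}$ and $u_2(x; z) = \cos(\sqrt{z}\,x)$ for z > 0 and $\mu_k = (\pi(k+1)/l)^2$; the function names and the finite-difference step are choices made for this illustration.

```python
import math

def u1(x, z):
    """Solution of -u'' = z u with u(0) = 0, u'(0) = 1 (V = 0, z > 0)."""
    k = math.sqrt(z)
    return math.sin(k * x) / k

def u2(x, z):
    """Solution of -u'' = z u with u(0) = 1, u'(0) = 0 (V = 0, z > 0)."""
    return math.cos(math.sqrt(z) * x)

def du1(x, z):
    return math.cos(math.sqrt(z) * x)

def du2(x, z):
    k = math.sqrt(z)
    return -k * math.sin(k * x)

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * func(a + j * h)
    return total * h / 3

l = 1.0
mu0 = (math.pi / l) ** 2  # lowest Dirichlet eigenvalue for V = 0

# Wronskian u1' u2 - u1 u2' at an arbitrary point and spectral parameter.
wronskian = du1(0.37, 5.0) * u2(0.37, 5.0) - u1(0.37, 5.0) * du2(0.37, 5.0)

# Left-hand side: central difference of z -> u1(l; z) at z = mu0.
h = 1e-5
lhs = (u1(l, mu0 + h) - u1(l, mu0 - h)) / (2 * h)

# Right-hand side: u2(l; mu0) times the integral of u1^2 over [0, l].
rhs = u2(l, mu0) * simpson(lambda s: u1(s, mu0) ** 2, 0.0, l)

print(wronskian, lhs, rhs)
```

Since the derivative of $u_1(l; z)$ at $\mu_0$ is non-zero, the zero of $u_1(l; z)$ there is simple, which is what makes the pole of s(z) simple.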

4. Reduction to a Discrete Problem on the Lattice

To describe the spectrum of Λ we use the Krein resolvent formula. Denote by Π the operator acting on each edge as $\phi_{m,n,r/u} \mapsto -\phi''_{m,n,r/u} + V\phi_{m,n,r/u}$ on functions satisfying only the condition (5a). Clearly, for such functions the expression $\phi'_{m,n}$ given by (5c) makes sense. This operator is not symmetric, as it is a proper extension of the self-adjoint operator Λ.


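Proposition 6. The operator Π is closed and the triple $\big(l^2(\mathbb{Z}^2), \Gamma, \Gamma'\big)$,

$$\Gamma : \operatorname{dom}\Pi \ni \phi = (\phi_{m,n,r}, \phi_{m,n,u}) \mapsto (\phi_{m,n}) \in l^2(\mathbb{Z}^2), \qquad \Gamma' : \operatorname{dom}\Pi \ni \phi = (\phi_{m,n,r}, \phi_{m,n,u}) \mapsto (\phi'_{m,n}) \in l^2(\mathbb{Z}^2),$$

is a boundary triple for Π.

Proof. Denote by $\widetilde{\Pi}$ the direct sum of operators $-\dfrac{d^2}{dt^2} + V$ with the domain $H^2[0, l]$ over all edges $E_{m,n,r/u}$. Clearly, $\widetilde{\Pi}$ is a closed operator, and the functionals

$$\begin{aligned}
g_{m,n,1}(\phi) &= \phi_{m,n,r}(0) - \frac{1}{\beta}\phi_{m,n,u}(0),\\
g_{m,n,2}(\phi) &= \phi_{m,n,r}(0) - e^{-i\pi n\theta}\phi_{m-1,n,r}(l),\\
g_{m,n,3}(\phi) &= \phi_{m,n,r}(0) - \frac{1}{\beta}e^{i\pi m\theta}\phi_{m,n-1,u}(l)
\end{aligned}$$

are continuous with respect to the graph norm of $\widetilde{\Pi}$. Therefore, the restriction of $\widetilde{\Pi}$ to functions on which all these functionals vanish is a closed operator, and this is exactly Π. For any φ ∈ dom Π the inclusions $\Gamma\phi, \Gamma'\phi \in l^2(\mathbb{Z}^2)$ follow from the Sobolev inequality, and both Γ, Γ′ are continuous with respect to the graph norm of Π. Let φ, ψ ∈ dom Π, then the integration by parts gives

$$\begin{aligned}
\langle\phi, \Pi\psi\rangle - \langle\Pi\phi, \psi\rangle
&\equiv \sum_{m,n\in\mathbb{Z},\ i=r,u}\Big(\big\langle\phi_{m,n,i}, -\psi''_{m,n,i} + V\psi_{m,n,i}\big\rangle - \big\langle -\phi''_{m,n,i} + V\phi_{m,n,i}, \psi_{m,n,i}\big\rangle\Big)\\
&\equiv \sum_{m,n\in\mathbb{Z},\ i=r,u}\Big(\big\langle\phi_{m,n,i}, -\psi''_{m,n,i}\big\rangle - \big\langle -\phi''_{m,n,i}, \psi_{m,n,i}\big\rangle\Big)\\
&= \sum_{m,n\in\mathbb{Z}}\Big(\overline{\phi_{m,n,r}(0)}\,\psi'_{m,n,r}(0) + \frac{1}{\beta}\overline{\phi_{m,n,u}(0)}\,\beta\psi'_{m,n,u}(0) - \overline{\phi_{m-1,n,r}(l)}\,\psi'_{m-1,n,r}(l) - \frac{1}{\beta}\overline{\phi_{m,n-1,u}(l)}\,\beta\psi'_{m,n-1,u}(l)\\
&\qquad\quad - \overline{\phi'_{m,n,r}(0)}\,\psi_{m,n,r}(0) - \frac{1}{\beta}\overline{\phi'_{m,n,u}(0)}\,\beta\psi_{m,n,u}(0) + \overline{\phi'_{m-1,n,r}(l)}\,\psi_{m-1,n,r}(l) + \frac{1}{\beta}\overline{\phi'_{m,n-1,u}(l)}\,\beta\psi_{m,n-1,u}(l)\Big)\\
&= \sum_{m,n\in\mathbb{Z}}\Big(\overline{\phi_{m,n}}\,\psi'_{m,n,r}(0) + \beta\overline{\phi_{m,n}}\,\psi'_{m,n,u}(0) - \overline{e^{i\pi n\theta}\phi_{m,n}}\,\psi'_{m-1,n,r}(l) - \beta\,\overline{e^{-i\pi m\theta}\phi_{m,n}}\,\psi'_{m,n-1,u}(l)\\
&\qquad\quad - \overline{\phi'_{m,n,r}(0)}\,\psi_{m,n} - \beta\overline{\phi'_{m,n,u}(0)}\,\psi_{m,n} + \overline{\phi'_{m-1,n,r}(l)}\,e^{i\pi n\theta}\psi_{m,n} + \beta\overline{\phi'_{m,n-1,u}(l)}\,e^{-i\pi m\theta}\psi_{m,n}\Big)\\
&= \sum_{m,n\in\mathbb{Z}}\Big(\overline{\phi_{m,n}}\big(\psi'_{m,n,r}(0) + \beta\psi'_{m,n,u}(0) - e^{-i\pi n\theta}\psi'_{m-1,n,r}(l) - \beta e^{i\pi m\theta}\psi'_{m,n-1,u}(l)\big)\\
&\qquad\quad - \overline{\big(\phi'_{m,n,r}(0) + \beta\phi'_{m,n,u}(0) - e^{-i\pi n\theta}\phi'_{m-1,n,r}(l) - \beta e^{i\pi m\theta}\phi'_{m,n-1,u}(l)\big)}\,\psi_{m,n}\Big)\\
&= \sum_{m,n\in\mathbb{Z}}\Big(\overline{\phi_{m,n}}\,\psi'_{m,n} - \overline{\phi'_{m,n}}\,\psi_{m,n}\Big) \equiv \langle\Gamma\phi, \Gamma'\psi\rangle - \langle\Gamma'\phi, \Gamma\psi\rangle.
\end{aligned}$$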

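Now we verify the surjectivity condition. Choose functions $f_0, f_1 \in H^2[0, l]$ with $f_0(0) = f_1'(0) = 1$, $f_0'(0) = f_1(0) = 0$, $f_k^{(j)}(l) = 0$, $j, k = 0, 1$. Let $g, g' \in l^2(\mathbb{Z}^2)$. For any $p, q \in \mathbb{Z}$ denote

\[
\begin{aligned}
h_{p,q,1}(x) &= g_{p,q}\, f_0(x) + \frac{g'_{p,q}}{4}\, f_1(x), \\
h_{p,q,2}(x) &= \beta g_{p,q}\, f_0(x) + \frac{g'_{p,q}}{4\beta}\, f_1(x), \\
h_{p,q,3}(x) &= e^{i\pi q\theta} \Big( g_{p,q}\, f_0(l - x) + \frac{g'_{p,q}}{4}\, f_1(l - x) \Big), \\
h_{p,q,4}(x) &= e^{-i\pi p\theta} \Big( \beta g_{p,q}\, f_0(l - x) + \frac{g'_{p,q}}{4\beta}\, f_1(l - x) \Big),
\end{aligned}
\]

then $h_{p,q,j} \in H^2[0, l]$, $j = 1, \dots, 4$, and these functions satisfy

\[
\begin{aligned}
& h_{p,q,1}(0) = \frac{1}{\beta}\, h_{p,q,2}(0) = e^{-i\pi q\theta}\, h_{p,q,3}(l) = \frac{1}{\beta}\, e^{i\pi p\theta}\, h_{p,q,4}(l) = g_{p,q}, \\
& h'_{p,q,1}(0) = \beta\, h'_{p,q,2}(0) = -e^{-i\pi q\theta}\, h'_{p,q,3}(l) = -\beta\, e^{i\pi p\theta}\, h'_{p,q,4}(l) = \frac{g'_{p,q}}{4}, \\
& h_{p,q,1}(l) = h_{p,q,2}(l) = h_{p,q,3}(0) = h_{p,q,4}(0) = h'_{p,q,1}(l) = h'_{p,q,2}(l) = h'_{p,q,3}(0) = h'_{p,q,4}(0) = 0.
\end{aligned}
\]

The phase factors are those of the boundary functionals $g_{m,n,2}$, $g_{m,n,3}$ and of the expression for $\varphi'_{m,n}$ used in the integration by parts above. Define $\varphi^{(p,q)} = \big(\varphi^{(p,q)}_{m,n,r}, \varphi^{(p,q)}_{m,n,u}\big) \in \mathcal{H}$ with

\[
\varphi^{(p,q)}_{p,q,r} = h_{p,q,1}, \quad \varphi^{(p,q)}_{p,q,u} = h_{p,q,2}, \quad \varphi^{(p,q)}_{p-1,q,r} = h_{p,q,3}, \quad \varphi^{(p,q)}_{p,q-1,u} = h_{p,q,4},
\]

and $\varphi^{(p,q)}_{m,n,i} = 0$ for all other $m, n \in \mathbb{Z}$ and $i = r, u$.

Clearly, by construction $\varphi^{(p,q)} \in \operatorname{dom} \Pi$ and there holds $\big(\Gamma\varphi^{(p,q)}\big)_{m,n} \equiv \varphi^{(p,q)}_{m,n} = g_{p,q}\,\delta_{mp}\delta_{nq}$ and $\big(\Gamma'\varphi^{(p,q)}\big)_{m,n} \equiv \varphi'^{(p,q)}_{m,n} = g'_{p,q}\,\delta_{mp}\delta_{nq}$, $m, n \in \mathbb{Z}$. It is easy to see that the series $\varphi = \sum_{m,n} \varphi^{(m,n)}$ converges in the graph norm of $\Pi$, hence $\varphi \in \operatorname{dom} \Pi$. Since $H^2[0, l]$ is continuously embedded in $C^1[0, l]$, we have $\Gamma\varphi = \sum_{m,n} \Gamma\varphi^{(m,n)} = g$ and $\Gamma'\varphi = \sum_{m,n} \Gamma'\varphi^{(m,n)} = g'$. The surjectivity condition is proved. It remains to note that the set $\ker \Gamma \cap \ker \Gamma'$ contains the direct sum of $C_0^\infty(0, l)$ over all edges and is obviously dense in $\mathcal{H}$.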


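The operator $\Lambda$ is the restriction of $\Pi$ to the set of functions $\varphi$ satisfying $\Gamma'\varphi = \alpha\Gamma\varphi$. Consider another self-adjoint extension $\Lambda^0$ given by $\Gamma\varphi = 0$. Clearly, $\Lambda^0$ is exactly the direct sum of the operators D from (8) over all the edges $E_{m,n,r/u}$. In particular, $\operatorname{spec} \Lambda^0 = \operatorname{spec} D$. Let $z \notin \operatorname{spec} D$, $g \in l^2(\mathbb{Z}^2)$ and let $\psi_g$ be the solution of $(\Pi - z)\psi_g = 0$ satisfying the boundary condition $\Gamma\psi_g = g$. Consider the corresponding Krein function

\[
Q(z) : l^2(\mathbb{Z}^2) \ni g \mapsto \Gamma'\psi_g \in l^2(\mathbb{Z}^2).
\]

Application of Proposition 4 gives the following implicit description of the spectrum of $\Lambda$.

Proposition 7. A number $z \in \mathbb{R} \setminus \operatorname{spec} \Lambda^0 \equiv \mathbb{R} \setminus \operatorname{spec} D$ lies in $\operatorname{spec} \Lambda$ iff $0 \in \operatorname{spec}\big(Q(z) - \alpha\big)$. Such z is an eigenvalue of $\Lambda$ iff 0 is an eigenvalue of $Q(z) - \alpha$.

Therefore, outside the discrete set $\operatorname{spec} \Lambda^0 \equiv \operatorname{spec} D$ we can reduce the spectral problem for $\Lambda$ to the spectral problem for $Q(z) - \alpha$. Let us calculate $Q(z)$ more explicitly; actually this is our key construction.

Proposition 8. For $z \notin \operatorname{spec} D$ there holds

\[
Q(z) = (1 + \beta^2)\big( s_{11}(z) + s_{22}(z) \big) + s_{12}(z)\, M(\theta, \beta), \tag{11}
\]

where $M(\theta, \beta)$ is the discrete magnetic Laplacian in $l^2(\mathbb{Z}^2)$,

\[
\big( M(\theta, \beta) g \big)_{m,n} = e^{i\pi n\theta} g_{m+1,n} + e^{-i\pi n\theta} g_{m-1,n} + \beta^2 \big( e^{-i\pi m\theta} g_{m,n+1} + e^{i\pi m\theta} g_{m,n-1} \big), \qquad g = (g_{m,n}) \in l^2(\mathbb{Z}^2).
\]

Proof. Note that for $\varphi = (\varphi_{m,n,r}, \varphi_{m,n,u})$ in the notation of Proposition 6 there holds

\[
\Gamma'\varphi = B \left( \begin{pmatrix} \varphi'_{m,n,r}(0) \\ -\varphi'_{m,n,r}(l) \\ \varphi'_{m,n,u}(0) \\ -\varphi'_{m,n,u}(l) \end{pmatrix} \right)_{(m,n)\in\mathbb{Z}^2},
\qquad
\left( \begin{pmatrix} \varphi_{m,n,r}(0) \\ \varphi_{m,n,r}(l) \\ \varphi_{m,n,u}(0) \\ \varphi_{m,n,u}(l) \end{pmatrix} \right)_{(m,n)\in\mathbb{Z}^2} = C\,\Gamma\varphi,
\]

with operators $B : l^2(\mathbb{Z}^2) \otimes \mathbb{C}^4 \to l^2(\mathbb{Z}^2)$ and $C : l^2(\mathbb{Z}^2) \to l^2(\mathbb{Z}^2) \otimes \mathbb{C}^4$ given by

\[
B : \left( \begin{pmatrix} h^{(1)}_{m,n} \\ h^{(2)}_{m,n} \\ h^{(3)}_{m,n} \\ h^{(4)}_{m,n} \end{pmatrix} \right) \mapsto \Big( h^{(1)}_{m,n} + e^{-i\pi n\theta} h^{(2)}_{m-1,n} + \beta h^{(3)}_{m,n} + \beta e^{i\pi m\theta} h^{(4)}_{m,n-1} \Big),
\]

and

\[
C : (g_{m,n}) \mapsto \left( \begin{pmatrix} g_{m,n} \\ e^{i\pi n\theta} g_{m+1,n} \\ \beta g_{m,n} \\ \beta e^{-i\pi m\theta} g_{m,n+1} \end{pmatrix} \right),
\]

cf. (5).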


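Let $g \in l^2(\mathbb{Z}^2)$. For $z \notin \operatorname{spec} S$, finding the solution $\varphi$ with $(\Pi - z)\varphi = 0$ and $\Gamma\varphi = g$ reduces to a series of boundary value problems for the components of $\varphi$,

\[
\Big( -\frac{d^2}{dt^2} + V(t) - z \Big) \varphi_{m,n,r/u}(t) = 0,
\qquad
\left( \begin{pmatrix} \varphi_{m,n,r}(0) \\ \varphi_{m,n,r}(l) \\ \varphi_{m,n,u}(0) \\ \varphi_{m,n,u}(l) \end{pmatrix} \right) = C g,
\]

and

\[
\begin{pmatrix} \varphi'_{m,n,r}(0) \\ -\varphi'_{m,n,r}(l) \\ \varphi'_{m,n,u}(0) \\ -\varphi'_{m,n,u}(l) \end{pmatrix}
=
\begin{pmatrix}
s_{11}(z) & s_{12}(z) & 0 & 0 \\
s_{21}(z) & s_{22}(z) & 0 & 0 \\
0 & 0 & s_{11}(z) & s_{12}(z) \\
0 & 0 & s_{21}(z) & s_{22}(z)
\end{pmatrix}
\begin{pmatrix} \varphi_{m,n,r}(0) \\ \varphi_{m,n,r}(l) \\ \varphi_{m,n,u}(0) \\ \varphi_{m,n,u}(l) \end{pmatrix}.
\]

For $\psi := Q(z) g \equiv \Gamma'\varphi$ one has

\[
\psi = B \left( \begin{pmatrix} \varphi'_{m,n,r}(0) \\ -\varphi'_{m,n,r}(l) \\ \varphi'_{m,n,u}(0) \\ -\varphi'_{m,n,u}(l) \end{pmatrix} \right).
\]

Therefore, $Q(z) = B K(z) C$, where $K(z)$ is a linear operator on $l^2(\mathbb{Z}^2) \otimes \mathbb{C}^4$ with the matrix

\[
K(z) = \operatorname{diag} \left( \begin{pmatrix}
s_{11}(z) & s_{12}(z) & 0 & 0 \\
s_{21}(z) & s_{22}(z) & 0 & 0 \\
0 & 0 & s_{11}(z) & s_{12}(z) \\
0 & 0 & s_{21}(z) & s_{22}(z)
\end{pmatrix} \right).
\]

In other words, for any $g \in l^2(\mathbb{Z}^2)$ one has

\[
C g = \left( \begin{pmatrix} g_{m,n} \\ e^{i\pi n\theta} g_{m+1,n} \\ \beta g_{m,n} \\ \beta e^{-i\pi m\theta} g_{m,n+1} \end{pmatrix} \right),
\qquad
K(z) C g = \left( \begin{pmatrix}
s_{11}(z) g_{m,n} + e^{i\pi n\theta} s_{12}(z) g_{m+1,n} \\
s_{21}(z) g_{m,n} + e^{i\pi n\theta} s_{22}(z) g_{m+1,n} \\
\beta \big( s_{11}(z) g_{m,n} + e^{-i\pi m\theta} s_{12}(z) g_{m,n+1} \big) \\
\beta \big( s_{21}(z) g_{m,n} + e^{-i\pi m\theta} s_{22}(z) g_{m,n+1} \big)
\end{pmatrix} \right),
\]

and, finally,

\[
\begin{aligned}
\big( Q(z) g \big)_{m,n} = \big( B K(z) C g \big)_{m,n}
= {} & (1 + \beta^2)\big( s_{11}(z) + s_{22}(z) \big) g_{m,n} + e^{i\pi n\theta} s_{12}(z) g_{m+1,n} + e^{-i\pi n\theta} s_{21}(z) g_{m-1,n} \\
& + \beta^2 e^{-i\pi m\theta} s_{12}(z) g_{m,n+1} + \beta^2 e^{i\pi m\theta} s_{21}(z) g_{m,n-1}.
\end{aligned} \tag{12}
\]

As can be seen from (10), there holds $s_{12}(z) = s_{21}(z)$, and (12) becomes exactly (11).

Corollary 9. A number $z \in \mathbb{R} \setminus \operatorname{spec} D$ lies in the spectrum of L iff

\[
0 \in \operatorname{spec} \Big( (1 + \beta^2)\big( s_{11}(z) + s_{22}(z) \big) - \alpha + s_{12}(z)\, M(\theta, \beta) \Big).
\]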
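The discrete magnetic Laplacian entering (11) can be explored numerically on a finite piece of the lattice. The sketch below is an illustration only: it assumes a truncation of $M(\theta, \beta)$ to the box $0 \le m, n < N$ with all couplings leaving the box dropped, which is not a construction used in the paper. The function name and the parameter values are hypothetical.

```python
import numpy as np

def magnetic_laplacian_box(N, theta, beta):
    """Truncation of M(theta, beta) to the box 0 <= m, n < N.

    Hopping terms that leave the box are simply dropped (an assumption of
    this sketch), so the result is an (N*N) x (N*N) Hermitian matrix.
    """
    def index(m, n):
        return m * N + n

    M = np.zeros((N * N, N * N), dtype=complex)
    for m in range(N):
        for n in range(N):
            row = index(m, n)
            if m + 1 < N:   # e^{i pi n theta} g_{m+1,n}
                M[row, index(m + 1, n)] += np.exp(1j * np.pi * n * theta)
            if m - 1 >= 0:  # e^{-i pi n theta} g_{m-1,n}
                M[row, index(m - 1, n)] += np.exp(-1j * np.pi * n * theta)
            if n + 1 < N:   # beta^2 e^{-i pi m theta} g_{m,n+1}
                M[row, index(m, n + 1)] += beta**2 * np.exp(-1j * np.pi * m * theta)
            if n - 1 >= 0:  # beta^2 e^{i pi m theta} g_{m,n-1}
                M[row, index(m, n - 1)] += beta**2 * np.exp(1j * np.pi * m * theta)
    return M

theta, beta, N = (np.sqrt(5) - 1) / 2, 0.8, 12   # illustrative parameters
M_box = magnetic_laplacian_box(N, theta, beta)
eigenvalues = np.linalg.eigvalsh(M_box)
norm_bound = 2 * (1 + beta**2)
```

The truncated matrix is Hermitian and all its eigenvalues lie in $[-2(1+\beta^2), 2(1+\beta^2)]$, as the row-sum estimate suggests. A finite box can only hint at the fine structure of the spectrum of the full operator on $l^2(\mathbb{Z}^2)$.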


5. Spectral Analysis

To describe the spectrum of $\Lambda$ we need some additional information on the Krein matrix $s(z)$ from (10).

Proposition 10. The matrix $s(z)$ has the following properties:

(A) $s_{12}(z) \ne 0$ for all $z \notin \operatorname{spec} D$.

(B) For any $\alpha \in \mathbb{R}$ the function

\[
\eta(z) = \frac{\alpha}{s_{12}(z)} - (1 + \beta^2)\, \frac{s_{11}(z) + s_{22}(z)}{s_{12}(z)} \tag{13}
\]

can be extended to an entire function.

(C) The function $\dfrac{1}{1 + \beta^2}\,\eta(z)$ is the discriminant of the (generalized) Sturm-Liouville operator

\[
P = -\frac{d^2}{dt^2} + W(t) + W_{KP}(t), \tag{14}
\]

where $W$ is the periodic extension of $V$, $W(t + nl) = V(t)$, $t \in [0, l)$, $n \in \mathbb{Z}$, and $W_{KP}$ is the Kronig-Penney potential, $W_{KP}(t) = \dfrac{\alpha}{1 + \beta^2} \sum_{k\in\mathbb{Z}} \delta(t - kl)$; such operators P are also called Kronig-Penney Hamiltonians.

(D) There holds $\eta(\mu_k) \le -2(1 + \beta^2)$ for even k and $\eta(\mu_k) \ge 2(1 + \beta^2)$ for odd k. (Recall that $\mu_k$ are the eigenvalues of D.)

(E) For all real z with $|\eta(z)| < 2(1 + \beta^2)$ there holds $\eta'(z) \ne 0$. The function $\eta$ has no local minima with $\eta = 2(1 + \beta^2)$ and no local maxima with $\eta = -2(1 + \beta^2)$.

Proof. Recall that $s(z)$ is given by (10) with $u_1$, $u_2$ from (9). There holds $s_{12}(z) = \dfrac{1}{u_1(l; z)}$, and $s_{12}(z) \ne 0$ for all $z \notin \operatorname{spec} D$ since $z \mapsto u_1(l; z)$ is an entire function. This proves (A). Substituting (10) for $z \notin \operatorname{spec} D$ in (13) one arrives at

\[
\eta(z) = (1 + \beta^2)\big( u_1'(l; z) + u_2(l; z) \big) + \alpha\, u_1(l; z),
\]

and $\eta$ obviously has an analytic extension to all points of $\operatorname{spec} D$. This proves (B).

To understand the meaning of $\eta$ look at the operator (14). This operator acts as $f \mapsto -f'' + W f$ on functions $f \in H^2(\mathbb{R} \setminus l\mathbb{Z})$ satisfying

\[
\begin{pmatrix} f'(kl+) \\ f(kl+) \end{pmatrix}
=
\begin{pmatrix} 1 & \dfrac{\alpha}{1 + \beta^2} \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} f'(kl-) \\ f(kl-) \end{pmatrix}, \qquad k \in \mathbb{Z}. \tag{15}
\]

Let $y_1$, $y_2$ be two solutions of $(P - z) y = 0$ with $y_1(0+; z) = y_2'(0+; z) = 0$ and $y_1'(0+; z) = y_2(0+; z) = 1$. Consider the matrix

\[
M(z) = \begin{pmatrix} y_1'(l+; z) & y_2'(l+; z) \\ y_1(l+; z) & y_2(l+; z) \end{pmatrix}.
\]

It is well known that the spectrum of P consists exactly of the real z satisfying $\operatorname{tr} M(z) \equiv y_1'(l+; z) + y_2(l+; z) \in [-2, 2]$, see e.g. [15, 22]. The function $\operatorname{tr} M(z)$ is called the discriminant or the Lyapunov function of P and plays an important role in the study of second order differential operators; if $\alpha = 0$, the study of this function is a classical topic of the theory of ordinary differential equations, see e.g. [10, 29]. On the other hand, note that on the interval $(0, l)$ the solutions $y_1$ and $y_2$ coincide with $u_1$ and $u_2$ from (9), respectively. In particular, $y_{1,2}(l-; z) = u_{1,2}(l; z)$ and $y'_{1,2}(l-; z) = u'_{1,2}(l; z)$. Therefore, taking into account the boundary conditions (15) we can write $M(z)$ in the form

\[
M(z) = \begin{pmatrix}
u_1'(l; z) + \dfrac{\alpha}{1 + \beta^2}\, u_1(l; z) & u_2'(l; z) + \dfrac{\alpha}{1 + \beta^2}\, u_2(l; z) \\
u_1(l; z) & u_2(l; z)
\end{pmatrix},
\]

and $\operatorname{tr} M(z) = \dfrac{1}{1 + \beta^2}\,\eta(z)$, which proves (C).

The items (D) and (E) describe typical properties of the discriminants of one-dimensional periodic operators. To prove (D) note that for any k one has $u_1(l; \mu_k) = 0$ and $\eta(\mu_k) = (1 + \beta^2)\big( u_1'(l; \mu_k) + u_2(l; \mu_k) \big)$, i.e. $\dfrac{1}{1 + \beta^2}\,\eta(\mu_k)$ coincides with the value of the discriminant of the classical periodic Sturm–Liouville problem ($\alpha = 0$), for which the requested inequalities are well known, see e.g. Lemma VIII.3.1 in [10]. The first part of (E) is known for much more general potentials, see e.g. Lemma 5.2 in [22]. As for the second part, local maxima with $\eta = -2(1 + \beta^2)$ and local minima with $\eta = 2(1 + \beta^2)$ would be isolated eigenvalues of P, which is impossible, because the spectrum of P is absolutely continuous [30].
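The relation between the definition (13) and the explicit formula for $\eta$ can be illustrated numerically. The following is a minimal sketch, assuming the free case V = 0 and z > 0, where $u_1(x; z) = \sin(\sqrt{z}\,x)/\sqrt{z}$, $u_2(x; z) = \cos(\sqrt{z}\,x)$ and the Dirichlet eigenvalues are $\mu_k = (\pi(k+1)/l)^2$, $k = 0, 1, \dots$. All function names and parameter values are illustrative.

```python
import numpy as np

def eta_explicit(z, l, alpha, beta):
    """eta(z) = (1 + beta^2)(u1'(l) + u2(l)) + alpha*u1(l) for V = 0, z > 0."""
    k = np.sqrt(z)
    u1, du1, u2 = np.sin(k * l) / k, np.cos(k * l), np.cos(k * l)
    return (1 + beta**2) * (du1 + u2) + alpha * u1

def eta_from_krein(z, l, alpha, beta):
    """eta(z) from definition (13), with s(z) taken from (10) for V = 0."""
    k = np.sqrt(z)
    u1, du1, u2 = np.sin(k * l) / k, np.cos(k * l), np.cos(k * l)
    s11, s12, s22 = -u2 / u1, 1.0 / u1, -du1 / u1
    return alpha / s12 - (1 + beta**2) * (s11 + s22) / s12

def dirichlet_eigenvalue(k, l):
    """mu_k for V = 0 on [0, l], numbered from k = 0."""
    return (np.pi * (k + 1) / l) ** 2

l, alpha, beta = 1.0, 0.7, 1.5
z_generic = 5.3                       # not a Dirichlet eigenvalue
values_at_mu = [eta_explicit(dirichlet_eigenvalue(k, l), l, alpha, beta)
                for k in range(6)]
```

In this free case $\eta(\mu_k) = 2(1+\beta^2)(-1)^{k+1}$, so the inequalities in item (D) hold with equality. This is consistent with the closed gaps for V = 0 and $\alpha = 0$ mentioned in Corollary 15.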

Therefore, up to the discrete set $\operatorname{spec} D$ the spectrum of $\Lambda$ is the preimage of $\operatorname{spec} M(\theta, \beta)$ under the entire function $\eta$. The operator $M(\theta, \beta)$ is very sensitive to the arithmetic properties of $\theta$ and is closely related to the Harper operator, cf. [38]. The nature of the spectrum of $M(\theta, \beta)$ in its dependence on $\theta$ is described in the following proposition, which summarizes Theorem 2.7 in [38] (Item A), Theorem 4.2 in [38], Theorem 1.6 in [5] and the main theorem in [4] (Item B), and Theorem 2.1 in [8] (Item C).

Proposition 11.

(A) The operator $M(\theta, \beta)$ has no eigenvalues for all $\theta$ and $\beta$.

(B) If $\theta$ is irrational, the spectrum of $M(\theta, \beta)$ is a Cantor set. If, in addition, $\beta = 1$, the spectrum has zero Lebesgue measure.

(C) For non-integer $\theta$ there holds $\|M(\theta, \beta)\| < 2(1 + \beta^2)$.

The previous discussion gives a description of the spectrum of $\Lambda$ in $\mathbb{R} \setminus \operatorname{spec} D$. Let us include $\operatorname{spec} D$ into consideration.

Proposition 12. There holds $\operatorname{spec} D \subset \operatorname{spec} \Lambda$. Moreover, each $\mu_k \in \operatorname{spec} D$ is an infinitely degenerate eigenvalue of $\Lambda$.

Proof. Consider an eigenvalue $\mu_k$ of D and the corresponding eigenfunction $f$ with $f'(0) = 1$, and let $\sigma := f'(l)$. Let $\theta$ be rational. Take $M \in \mathbb{Z}$ such that $\theta M \in 2\mathbb{Z}$. Let $p, q \in \mathbb{Z}$. Denote by $\varphi$ the function from $\mathcal{H}$ whose only non-zero components are

\[
\varphi_{pM+j,\,qM,\,r} = \beta\sigma^{j} f, \quad
\varphi_{(p+1)M,\,qM+j,\,u} = \sigma^{M+j} f, \quad
\varphi_{pM,\,qM+j,\,u} = -\sigma^{j} f, \quad
\varphi_{pM+j,\,(q+1)M,\,r} = -\beta\sigma^{M+j} f, \qquad j = 0, \dots, M - 1.
\]


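Clearly, $\varphi \in \operatorname{dom} \Lambda$ and $-\varphi''_{m,n,r/u} + (V - \mu_k)\varphi_{m,n,r/u} = 0$ for all $m, n \in \mathbb{Z}$. Therefore, $\varphi$ is an eigenfunction of $\Lambda$ with the eigenvalue $\mu_k$. As p and q are arbitrary, one can construct infinitely many eigenfunctions with non-intersecting supports. Therefore, each $\mu_k$ is infinitely degenerate in $\operatorname{spec} \Lambda$.

Now let $\theta$ be irrational. We use arguments similar to the Schnol-type theorems [27]. For each $n \in \mathbb{Z}$ put $\varphi_{-n,n,r} = \beta e^{i\pi n\theta} f$ and $\varphi_{-n,n,u} = -e^{i\pi n\theta} f$. The "chain" constructed from these components does not belong to $\mathcal{H}$, but satisfies the boundary conditions (5). Moreover, $-\varphi''_{n,-n,r/u} + (V - \mu_k)\varphi_{n,-n,r/u} = 0$ for all n, i.e. this chain is a "generalized eigenfunction" of $\Lambda$. Take $\phi \in C^\infty[0, l]$ with $\phi(0) = \phi'(0) = 0$ and $\phi(l) = \phi'(l) = 1$. For any $N \in \mathbb{N}$ construct $\psi^{(N)} \in \mathcal{H}$ such that

\[
\begin{aligned}
& \psi^{(N)}_{-n,n,r/u} = \varphi_{-n,n,r/u} \quad \text{if } |n| < N, \\
& \psi^{(N)}_{-N,N,r} = \phi\,\varphi_{-N,N,r}, \qquad \psi^{(N)}_{N,-N,u} = \phi\,\varphi_{N,-N,u}, \\
& \psi^{(N)}_{m,n,i} = 0 \quad \text{for all other } m, n \in \mathbb{Z} \text{ and } i = r, u.
\end{aligned}
\]

Clearly, $\psi^{(N)} \in \operatorname{dom} \Lambda$ for any N and $\|\psi^{(N)}\| \ge \sqrt{2N}\,\|f\|$. Moreover, the only two non-zero components of $g^{(N)} = (\Lambda - \mu_k)\psi^{(N)}$ are

\[
g^{(N)}_{-N,N,r} = \beta e^{i\pi N\theta} \big( -\phi'' f - 2\phi' f' - \phi f'' + (V - \mu_k)\phi f \big) = -\beta e^{i\pi N\theta} \big( \phi'' f + 2\phi' f' \big),
\]

and

\[
g^{(N)}_{N,-N,u} = -e^{-i\pi N\theta} \big( -\phi'' f - 2\phi' f' - \phi f'' + (V - \mu_k)\phi f \big) = e^{-i\pi N\theta} \big( \phi'' f + 2\phi' f' \big).
\]

Therefore, $\|g^{(N)}\| \equiv \|(\Lambda - \mu_k)\psi^{(N)}\| = \sqrt{1 + \beta^2}\,\|\phi'' f + 2\phi' f'\| \equiv C$ and

\[
\lim_{N\to\infty} \frac{\|(\Lambda - \mu_k)\psi^{(N)}\|}{\|\psi^{(N)}\|} \le \lim_{N\to\infty} \frac{C}{\sqrt{2N}\,\|f\|} = 0,
\]

which means that $\mu_k \in \operatorname{spec} \Lambda$.

Let us show that $\mu_k$ is an eigenvalue of $\Lambda$. By Proposition 11(C) one has $\|M(\theta, \beta)\| < 2(1 + \beta^2)$. Recall that the spectrum of $\Lambda$ outside $\operatorname{spec} D$ is the preimage of $\operatorname{spec} M(\theta, \beta)$ under the function $\eta$ and, due to Proposition 10(D), does not contain $\mu_k$. As $\mu_k$ is an isolated point of the spectrum, it is an eigenvalue of $\Lambda$, which is infinitely degenerate according to the arguments given in Remark 1.

Now we state the main result of the paper.

Theorem 13. The spectrum of $\Lambda$ is the union of two sets,

\[
\operatorname{spec} \Lambda = \Sigma_0 \cup \Sigma, \qquad \Sigma_0 = \operatorname{spec} D, \qquad \Sigma = \eta^{-1}\big( \operatorname{spec} M(\theta, \beta) \big),
\]

and has the following properties:

(A) The discrete spectrum is empty and the point spectrum coincides with $\Sigma_0$.

(B) The set $\Sigma$ is non-empty; moreover, the intersection $[\mu_k, \mu_{k+1}] \cap \Sigma$ is non-empty for any k.

(C) For rational $\theta$ the singular continuous spectrum of $\Lambda$ is empty and the absolutely continuous spectrum coincides with $\Sigma$ and has a band structure.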


(D) For irrational $\theta$, the spectrum of $\Lambda$ is infinitely degenerate. The part $\Sigma$ is a closed nowhere dense set without isolated points, and $\Sigma \cap (\mu_k, \mu_{k+1})$ is a Cantor set for any $k = 0, 1, 2, \dots$. If additionally $\beta = 1$, then the spectrum of $\Lambda$ has no absolutely continuous part and the singular continuous spectrum coincides with $\Sigma$.

Proof. Proposition 12 shows that $\Sigma_0 \subset \operatorname{spec} \Lambda$. The spectrum of $\Lambda$ outside $\operatorname{spec} D$ is described by Corollary 9 and, in virtue of Proposition 10(B), coincides with $\Sigma$.

(A) Propositions 8, 10(B), and 11(A) show the absence of eigenvalues of $Q(z) - \alpha$. By virtue of Proposition 7 the operator $\Lambda$ has no point spectrum in $\mathbb{R} \setminus \operatorname{spec} D$ for any $\theta$. Therefore, due to Proposition 12 the point spectrum coincides with $\operatorname{spec} D$, and all eigenvalues have infinite multiplicity.

(B) The trivial estimate $\|M(\theta, \beta)\| \le 2(1 + \beta^2)$ implies the inclusion $\operatorname{spec} M(\theta, \beta) \subset [-2(1 + \beta^2), 2(1 + \beta^2)]$. The assertion follows now from Proposition 10(D).

(C) Let $\theta$ be rational. Take $N \in \mathbb{Z}$ with $N\theta \in 2\mathbb{Z}$. The operator $\Lambda$ appears to be invariant under the shifts $(\varphi_{m,n,r}, \varphi_{m,n,u}) \mapsto (\varphi_{m+kN,n+lN,r}, \varphi_{m+kN,n+lN,u})$, $k, l \in \mathbb{Z}$, i.e. it is $\mathbb{Z}^2$-periodic and, therefore, the absence of singular spectrum for $\Lambda$ follows from the standard arguments of the Bloch theory, see e.g. Theorem 11 in [27]. Therefore by (A) $\Sigma$ coincides with the absolutely continuous spectrum. The spectrum of $M(\theta, \beta)$ consists of finitely many bands, and so does $\eta^{-1}(\operatorname{spec} M(\theta, \beta))$ between any two Dirichlet eigenvalues.

(D) Now let $\theta$ be irrational. The infinite degeneracy of $\operatorname{spec} \Lambda$ follows from the arguments of Remark 1. In view of the continuity of $\eta$, the set of $z \in \mathbb{R}$ for which $|\eta(z)| < 2(1 + \beta^2)$ is a union of open disjoint intervals $I_n$. Moreover, due to Proposition 10(D) there is exactly one such interval between any two subsequent eigenvalues of D. Put $J_n = \overline{I_n}$. Note that $\cup J_n$ contains all points z with $|\eta(z)| \le 2(1 + \beta^2)$. Due to Proposition 10(E), the restriction of $\eta$ to $J_n$ is a homeomorphism of $J_n$ onto the segment $[-2(1 + \beta^2), 2(1 + \beta^2)]$. Therefore, the preimage $K_n := (\eta|_{J_n})^{-1}\big( \operatorname{spec} M(\theta, \beta) \big) \subset J_n$ is a Cantor set, as is true of $\operatorname{spec} M(\theta, \beta)$. Moreover, the intersection of any two of the sets $K_m$ is empty, which follows from Proposition 11(C) and Proposition 10(D). Therefore, the set $\cup K_n$, which coincides with $\Sigma$, is also closed, nowhere dense, and without isolated points. If $\beta = 1$, then the spectrum of $M(\theta, \beta)$ has zero Lebesgue measure by Proposition 11(B). Since the maps $(\eta|_{I_n})^{-1}$ are real-analytic, the sets $K_n$ and hence $\Sigma = \cup K_n$ are also of zero Lebesgue measure. Such a set cannot support absolutely continuous spectrum and does not intersect the point spectrum due to (A); therefore, $\Sigma$ is the singular continuous spectrum.

In view of the unitary equivalence between the operators $\Lambda$ and L, Theorem 13 provides a complete spectral analysis of the magnetic Schrödinger operator on the periodic graph. At the same time, we believe that the operator $\Lambda$ may be considered as a model of quasiperiodic interaction on quantum graphs and may be useful also outside the problems related to magnetic fields. We formulate several corollaries in order to answer the following natural questions arising in the case of rational magnetic flux $\theta$:

• Are the eigenvalues of $\Lambda$ (and of L) isolated or embedded in the continuous spectrum?
• Is the number of gaps in the spectrum finite or infinite?

Note that the rank of the lattice defining the magnetic translation group is equal to 2, therefore, one can expect the validity of the Bethe–Sommerfeld conjecture for θ = 0.


We emphasize that these questions are rather non-trivial even for lattices without any potentials; for example, rectangular lattices with δ-boundary conditions at the nodes can have very different properties depending on the coupling constants and the ratio between the edge lengths [13]. We will see that the introduction of scalar potentials on edges provides a mechanism of gap creation similar to the so-called decoration [37]. The case of a non-trivial magnetic field can be treated by a simple norm estimate.

Corollary 14. If $\theta$ is non-integer, then the spectrum of $\Lambda$ has infinitely many gaps, and all $\mu_k$ lie inside the gaps.

Proof. In this case the set $\Sigma$ does not contain $\mu_k$ due to Propositions 10(D) and 11(C).

Let us consider now the case without magnetic field in greater detail.

Corollary 15. Let $\theta$ be integer.

(A) The part $\Sigma$ of the spectrum of $\Lambda$ coincides with the spectrum of the Kronig-Penney Hamiltonian P from (14). In particular, if there are infinitely many gaps in the spectrum of P, then $\Lambda$ has the same property.

(B) If V is a convex smooth function whose derivative does not vanish, then all gaps are open for any $\alpha$, the spectrum of P and hence also of $\Lambda$ has infinitely many gaps, and all $\mu_k$ are isolated in $\operatorname{spec} \Lambda$.

(C) Let the gap of P near $\mu_k$ be closed for $\alpha = 0$. Then $\mu_k$ is an embedded eigenvalue of $\Lambda$ for all $\alpha$. In particular, $\mu_k$ lies on a band edge for $\alpha \ne 0$. For $V = 0$ and $\alpha = 0$ all gaps are closed and all $\mu_k$ are embedded into the continuous spectrum.

Proof. (A) In this case one has $\operatorname{spec} M(\theta, \beta) = [-2(1 + \beta^2), 2(1 + \beta^2)]$ and the set $\Sigma \equiv \eta^{-1}\big( [-2(1 + \beta^2), 2(1 + \beta^2)] \big)$ coincides with the spectrum of P by Proposition 10(C).

(B) Denote $\nu(z) = u_1'(l; z) + u_2(l; z)$, where $u_1$ and $u_2$ are the special solutions from Example 5. Clearly, $\nu$ is the discriminant of the periodic Sturm-Liouville operator $Q := -\dfrac{d^2}{dt^2} + W$ with W from (14). If $\alpha = 0$, then $P = Q$ and $\eta(z) = (1 + \beta^2)\nu(z)$ for all z. Let V be smooth convex with $V' \ne 0$ and $\alpha = 0$, then it is proved in [17] (see Lemma 3 and Theorem 2 therein) that all gaps of P are open and that the $\mu_k$ do not belong to $\operatorname{spec} P = \Sigma$, which means that $|\nu(\mu_k)| > 2$. One has $\eta(\mu_k) = (1 + \beta^2)\nu(\mu_k) + \alpha u_1(l; \mu_k) = (1 + \beta^2)\nu(\mu_k)$, and the gap remains open for all $\alpha \ne 0$ as $|\eta(\mu_k)| > 2(1 + \beta^2)$.

(C) The case with $V = 0$ and $\alpha = 0$ is obvious. If the gap near $\mu_k$ is closed, then $\nu(\mu_k) = \pm 2$, $\eta(\mu_k) = \pm 2(1 + \beta^2)$, and $\mu_k \in \Sigma$. Moreover, $\nu'(\mu_k) = 0$. As $\partial_z u_1(l; \mu_k) \ne 0$ (see Example 5), one has $\eta'(\mu_k) \ne 0$ for $\alpha \ne 0$, which means that $\eta \mp 2(1 + \beta^2)$ changes sign at $\mu_k$. This means that there is a gap near $\mu_k$.

6. Concluding Remarks

We hope that the approach presented here can be extended to the analysis of more general periodic magnetic systems, for example, for more complicated combinatorial structures, for nodes and edges with geometric defects or measure potentials, or with the spin-orbital coupling taken into account. Another open question is whether one can deal with more general boundary conditions. Although there are some particular examples for which


the above construction works, we are not able to present a suitable general picture at the moment. We hope to clarify the situation, which actually goes beyond the quantum graph context, in subsequent works.

The first version of the paper was significantly improved by the suggestions of the referee, who pointed out that the results hold not only for periodic magnetic fields but also for those with a constant flux per cell. The authors are indebted to him/her very much for the attention and the careful reading.

References

1. Abilio, C.C., Butaud, P., Fournier, Th., Pannetier, B., Vidal, J., Tedesco, S., Dalzotto, B.: Magnetic field induced localization in a two-dimensional superconducting wire network. Phys. Rev. Lett. 83, 5102–5105 (1999)
2. Albeverio, S., Gesztesy, F., Høegh-Krohn, R., Holden, H.: Solvable models in quantum mechanics. 2nd ed. Providence, RI: AMS Chelsea Publ., 2005
3. Alexander, S.: Superconductivity of networks. A percolation approach to the effects of disorder. Phys. Rev. B 27, 1541–1557 (1983)
4. Avila, A., Jitomirskaya, S.: The ten martini problem. Ann. Math. (to appear), http://arXiv.org:math.DS/0503363, 2005
5. Avila, A., Krikorian, R.: Reducibility or non-uniform hyperbolicity for quasiperiodic Schrödinger cocycles. To appear in Ann. of Math., http://arXiv.org:math.DS/0306382, 2003
6. Avron, J.E., Seiler, R., Simon, B.: Homotopy and quantization in condensed matter physics. Phys. Rev. Lett. 51, 51–54 (1983)
7. Birman, M.Sh., Suslina, T.A.: A periodic magnetic Hamiltonian with a variable metric. The problem of absolute continuity. St. Petersburg Math. J. 11, 203–232 (2000)
8. Boca, F.P., Zaharescu, A.: Norm estimates of almost Mathieu operators. J. Funct. Anal. 220, 76–96 (2005)
9. Boon, M.H.: Representations of the invariance group for a Bloch electron in a magnetic field. J. Math. Phys. 13, 1268–1285 (1972)
10. Coddington, E.A., Levinson, N.: Theory of ordinary differential equations. New York etc.: McGraw-Hill, 1995
11. de Gennes, P.-G.: Diamagnétisme de grains supraconducteurs près d'un seuil de percolation. C.R. Acad. Sci. Paris, Sér. II 292, 9–12 (1981)
12. Derkach, V.A., Malamud, M.M.: Generalized resolvents and the boundary value problems for Hermitian operators with gaps. J. Funct. Anal. 95, 1–95 (1991)
13. Exner, P.: Lattice Kronig-Penney models. Phys. Rev. Lett. 74, 3503–3506 (1995)
14. Exner, P.: A duality between Schrödinger operators on graphs and certain Jacobi matrices. Ann. Inst. H. Poincaré 66, 359–371 (1997)
15. Gehtman, M.M., Stankevich, I.V.: The generalized Kronig-Penney problem. Funct. Anal. Appl. 11, 51–52 (1977)
16. Geyler, V.A., Margulis, V.A.: Anderson localization in the nondiscrete Maryland model. Theor. Math. Phys. 70, 133–140 (1987)
17. Geyler, V.A., Senatorov, M.M.: The structure of the spectrum of the Schrödinger operator with a magnetic field in a strip and infinite-gap potentials. Sb. Math. 88(5), 657–669 (1997)
18. Gorbachuk, V.I., Gorbachuk, M.A.: Boundary value problems for operator differential equations. Dordrecht, etc.: Kluwer Acad. Publ., 1991
19. Helffer, B., Kerdelhué, P., Sjöstrand, J.: Le papillon de Hofstadter revisité. Mém. Soc. Math. Fr., Nouv. Sér. 43, 1–87 (1990)
20. Helffer, B., Sjöstrand, J.: Semi-classical analysis for Harper's equation III: Cantor structure of the spectrum. Mém. Soc. Math. Fr., Nouv. Sér. 39, 1–124 (1989)
21. Hofstadter, D.R.: Energy levels and wave functions of Bloch electrons in rational and irrational magnetic fields. Phys. Rev. B 14, 2239–2249 (1976)
22. Hryniv, R.O., Mykytyuk, Ya.V.: 1-D Schrödinger operators with periodic singular potentials. Methods Funct. Anal. Topology 7(4), 31–42 (2001)
23. Kohmoto, M.: Quantum-wire networks and the quantum Hall effect. J. Phys. Soc. Japan 62, 4001–4008 (1993)
24. Kostrykin, V., Schrader, R.: Quantum wires with magnetic fluxes. Commun. Math. Phys. 237, 161–179 (2003)
25. Kreft, Ch., Seiler, R.: Models of the Hofstadter type. J. Math. Phys. 37, 5207–5243 (1996)
26. Kuchment, P.: Quantum graphs I. Some basic structures. Waves Random Media 14, S107–S128 (2004)
27. Kuchment, P.: Quantum graphs II. Some spectral properties of quantum and combinatorial graphs. J. Phys. A: Math. Gen. 38, 4887–4900 (2005)
28. Langbein, D.: The tight-binding and the nearly-free-electron approach to lattice electron in external magnetic field. Phys. Rev. (2) 180, 633–648 (1969)
29. Levitan, B.M., Sargsyan, I.S.: Sturm-Liouville and Dirac operators. Dordrecht, etc.: Kluwer, 1990
30. Mikhailets, V.A., Sobolev, A.V.: Common eigenvalue problem and periodic Schrödinger operators. J. Funct. Anal. 165, 150–172 (1999)
31. Naud, C., Faini, G., Mailly, D.: Aharonov–Bohm cages in 2D normal metal networks. Phys. Rev. Lett. 86, 5104–5107 (2001)
32. Naud, C., Faini, G., Mailly, D., Vidal, J., Douçot, B., Montambaux, G., Wieck, A., Reuter, D.: Aharonov–Bohm cages in the GaAlAs/GaAs system. Physica E 12, 190–196 (2002)
33. Novikov, S.P.: Two-dimensional Schrödinger operators in periodic fields. J. Soviet Math. 28(1), 1–20 (1985)
34. Pavlov, B.S.: The theory of extensions and explicitly solvable models. Russ. Math. Surv. 42, 127–168 (1987)
35. Prange, R.E., Girvin, S.M. (eds.): The Quantum Hall Effect. New York: Springer-Verlag, 1990
36. Puig, J.: Cantor spectrum for the almost Mathieu operator. Commun. Math. Phys. 244, 297–309 (2004)
37. Schenker, J.H., Aizenman, M.: The creation of spectral gaps by graph decoration. Lett. Math. Phys. 53, 253–262 (2000)
38. Shubin, M.A.: Discrete magnetic Laplacian. Commun. Math. Phys. 164, 259–275 (1994)
39. Sobolev, A.V.: Absolute continuity of the periodic magnetic Schrödinger operator. Invent. Math. 137, 85–112 (1999)
40. Thouless, D.J., Kohmoto, M., Nightingale, M.P., den Nijs, M.: Quantized Hall conductance in a two-dimensional periodic potential. Phys. Rev. Lett. 49, 405–408 (1982)
41. Zak, J.: Magnetic translation group. Phys. Rev. 134, A1602–A1606 (1964)

Communicated by B. Simon

Commun. Math. Phys. 269, 107–136 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0118-x

Communications in

Mathematical Physics

Quantum State Merging and Negative Information

Michał Horodecki1, Jonathan Oppenheim2, Andreas Winter3

1 Institute of Theoretical Physics and Astrophysics, University of Gdańsk, 80-952 Gdańsk, Poland
2 Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, U.K. E-mail: [email protected]
3 Department of Mathematics, University of Bristol, Bristol BS8 1TW, U.K.

Received: 4 January 2006 / Accepted: 14 April 2006 Published online: 21 October 2006 – © Springer-Verlag 2006

Abstract: We consider a quantum state shared between many distant locations, and define a quantum information processing primitive, state merging, that optimally merges the state into one location. As announced in [Horodecki, Oppenheim, Winter, Nature 436, 673 (2005)], the optimal entanglement cost of this task is the conditional entropy if classical communication is free. Since this quantity can be negative, and the state merging rate measures partial quantum information, we find that quantum information can be negative. The classical communication rate also has a minimum rate: a certain quantum mutual information. State merging enabled one to solve a number of open problems: distributed quantum data compression, quantum coding with side information at the decoder and sender, multi-party entanglement of assistance, and the capacity of the quantum multiple access channel. It also provides an operational proof of strong subadditivity. Here, we give precise definitions and prove these results rigorously.

I. Introduction The field of quantum information theory is still in its infancy, with many of the key building blocks of the theory not yet in place or not well understood. This is perhaps not surprising, since the important elements of classical information theory have only been in place since the 70’s. The notion of classical information was first introduced by Shannon [1] who defined it operationally, as the minimum number of bits needed to communicate the message produced by a statistical source. This gave meaning to the entropy H (X ) of the source producing a random variable X . The amount of information that two random variables X and Y have in common was given a meaning through the mutual information I (X : Y ) = H (X ) + H (Y ) − H (X Y ). Operationally it is the rate of communication possible through a noisy channel taking X to Y . The fundamental Shannon theorems treated two basic questions: how many bits does one need to transmit a message from a source? How many bits can one send via a noisy channel?


Table 1. Key concepts in classical information theory

Concept | Quantity | Operational meaning
Information | H(X) | The rate at which a source can convey messages (Shannon compression)
Mutual information | I(X : Y) | For an input X which produces Y after being sent down a channel, I(X : Y) is the rate at which information can be sent reliably (channel coding)
Partial information | H(X|Y) | The rate at which messages X can be sent to a party who has prior information Y (Slepian-Wolf theorem)

Another basic brick in classical information theory, which is a generalization of the noiseless coding problem, is the notion of partial information. The question is now, how many bits does the sender (Alice) need to send to transmit a message from the source, provided the receiver (Bob) already has some prior information about the source. The amount of bits we call the partial information. Slepian and Wolf showed that partial information is equal to the entropy of the source reduced by the mutual information [2]. This quantity is equal to what is called conditional entropy H (X |Y ) = H (X Y ) − H (Y ). It is actually an entropy, and was originally defined as the average entropy of conditional probability distributions: H (X |Y ) = − pY (y) p X |Y (x|y) log p X |Y (x|y), (1) xy

with $p_{X|Y}(x|y)$ the probability of the source producing symbol x conditioned on the fact that Bob has y, and $p_Y(y)$ the probability that y is produced at Bob's site. This discovery of Slepian and Wolf clarified the picture of correlated sources: mutual information is the knowledge common to both Alice and Bob. The entropy of Alice's source is its full information content. The difference between the two is the information that Bob needs to complete his prior knowledge about Alice's source (Fig. 1). It thus provided an information theoretic basis for the conditional entropy. It should be noted that it is a highly non-trivial operation, since Alice is able to communicate to Bob the full information about her string $X_1 \ldots X_n$, even though she is unaware of what string $Y_1 \ldots Y_n$ Bob has. The quantities and operational meaning of the entropy, mutual information, and conditional entropy thus form the basic building blocks of classical information theory.

We are interested in finding the corresponding basic elements in quantum information theory. The first step was done by Schumacher [3], who showed that the von Neumann entropy plays an analogous role to Shannon entropy: it has the operational interpretation of the number of qubits needed to transmit quantum states emitted by a statistical source. The next step was to find an analogue of the noisy coding theorem. Here it turned out that the analogy was not very strict: the quantum analogue of mutual information cannot be obtained by replacing Shannon entropies with von Neumann ones. It was found that the capacity of the quantum channel is determined by a different quantity, the coherent information [4, 5]. The coherent information, defined for a bipartite state $\rho_{AB}$, is

$$I(A\rangle B) = S(B) - S(AB), \tag{2}$$

and the channel capacity is obtained [6–8] by maximising it over input states $\rho_A$. Here, S(B) and S(AB) are the von Neumann entropies of the states $\rho_B = \mathrm{Tr}_A\, \rho_{AB}$ and $\rho_{AB}$, and we adopt the notation of dropping the explicit dependence on ρ when such dependence is obvious.
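The two expressions for the classical conditional entropy, the averaged form of Eq. (1) and the difference H(XY) − H(Y), can be compared numerically. The following is a minimal sketch under stated assumptions: the joint distribution `joint` is a hypothetical example (a fair bit X observed by Bob through a channel that flips it with probability 0.1), and the function names are illustrative only.

```python
import math
from collections import defaultdict


def shannon(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal_y(joint):
    """Marginal distribution p_Y(y) of a joint distribution {(x, y): p}."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    return p_y


def conditional_entropy_direct(joint):
    """H(X|Y) as in Eq. (1): the average over y of the entropy of p(x|y)."""
    p_y = marginal_y(joint)
    total = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            # p_Y(y) * p(x|y) = p(x, y), so each summand is -p(x, y) log p(x|y).
            total -= p * math.log2(p / p_y[y])
    return total


def conditional_entropy_chain(joint):
    """H(X|Y) = H(XY) - H(Y)."""
    return shannon(joint.values()) - shannon(marginal_y(joint).values())


# Hypothetical source: X is a fair bit, Bob's Y equals X with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
h_direct = conditional_entropy_direct(joint)
h_chain = conditional_entropy_chain(joint)
```

For this example both functions return the binary entropy of 0.1, which is strictly less than the one bit Alice would need if Bob had no prior information.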

Quantum State Merging and Negative Information


Fig. 1. A graphical representation of the building blocks of classical information theory. The total information of the source producing pairs of random variables X, Y is H(XY), while the information contained in just the variable X (Y) is H(X) (H(Y)). The information common to both variables is the mutual information I(X : Y), while the partial informations are H(X|Y) and H(Y|X). In the quantum case, the quantum mutual information I(A : B) can be greater than the total information S(AB), which can be also greater than the local informations S(A) and S(B). To compensate, the partial informations S(A|B) and S(B|A) can be negative

Table 2. Key concepts in quantum information theory with additions due to merging highlighted in bold

Concept: quantum information
Quantity: S(A)
Operational meaning: The rate at which a source can convey quantum states (Schumacher compression)

Concept: coherent information
Quantity: $I(A\rangle B)$
Operational meaning: For an input which produces $\rho_{AB}$ after being sent down a channel, $I(A\rangle B)$ is the rate at which quantum information can be sent reliably down the channel (quantum channel coding). Merging allows us to interpret the negative values of this quantity

Concept: partial quantum information
Quantity: S(A|B)
Operational meaning: The rate at which quantum states with density matrix $\rho_A$ can be sent to a party who has prior quantum information $\rho_B$

With the coherent information, there was a persistent mystery: for any particular input $\rho_A$, the quantity S(B) − S(AB) could be negative, and it was not known how to interpret such a quantity, as it indicated a sense in which the channel capacity could be negative for such input distributions. Thus it is often the case that for a particular channel, no inputs will give positive values, and one should set the coherent information to zero, by inputting the null distribution (any pure state).

Turning next to a quantum analogue of prior and partial information, there had previously not been any such notion; a quantum scenario like that of Slepian-Wolf appeared intractable [9]. Another serious obstacle in the quantum world is that there are no conditional probabilities, hence conditional entropy cannot be defined. Conditional probabilities only exist after one performs a measurement, which of course destroys the state. One may try to overcome this difficulty by naively replacing Shannon entropies with von Neumann ones in the formula for conditional entropy, so that the quantum conditional entropy would be the difference between the total entropy and the entropy of the subsystem,

$$S(A|B) = S(AB) - S(B). \tag{3}$$

Such an approach has been strongly advocated [10]; however, while this "H goes to S" rule works for defining information, it does not work for channel capacity, as mentioned above. It is thus not clear that it is the correct thing to do. However, there is a more serious obstacle here: the conditional entropy defined by taking H to S can be negative [10–12].
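The sign behaviour of Eq. (3) can be seen on three elementary two-qubit states. The sketch below is illustrative only: it takes the eigenvalue lists of $\rho_{AB}$ and $\rho_B$ as given (they are written down by hand for each example rather than computed from a density matrix), and the function names are not from the paper.

```python
import math


def entropy_bits(spectrum):
    """von Neumann entropy in bits, given the eigenvalues of a density matrix."""
    return -sum(p * math.log2(p) for p in spectrum if p > 0)


def conditional_entropy(spec_ab, spec_b):
    """S(A|B) = S(AB) - S(B), from the spectra of rho_AB and rho_B."""
    return entropy_bits(spec_ab) - entropy_bits(spec_b)


# Maximally entangled two-qubit state: rho_AB is pure, rho_B is maximally mixed.
s_bell = conditional_entropy([1.0], [0.5, 0.5])

# Classically correlated state (|00><00| + |11><11|)/2: both spectra are {1/2, 1/2}.
s_classical = conditional_entropy([0.5, 0.5], [0.5, 0.5])

# Product of a maximally mixed qubit A with a pure qubit B.
s_product = conditional_entropy([0.5, 0.5], [1.0])
```

The three values are −1, 0 and +1 bit respectively: only the entangled state has a total entropy smaller than the entropy of Bob's subsystem.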


M. Horodecki, J. Oppenheim, A. Winter

In [12] this problem was connected with quantum entanglement. Likewise for maximally entangled states, it was connected with the ability to perform teleportation [10]. It had already been noted by Schrödinger, that an entangled state may possess a weird feature: if a system is in such a state we may know more about the whole system than about subsystems. In [12], Schrödinger’s intuition was quantified by von Neumann entropies, and it was found that the entropy of a subsystem can be greater than the entropy of the total system only when the state is entangled. It was however also found that there are entangled states that do not exhibit this weird property. Thus there was a question: what does it mean, that for some states we have such behaviour, and not for other states? It doesn’t help that −S(A|B) is nothing but the coherent information, that determines channel capacity [6–8]! How can the duality between channel coding and Slepian-Wolf compression be conserved in any quantum analog? In our recent paper [13], we approached the problem of quantifying partial and prior information from a purely operational point of view. Inspired by the classical Slepian-Wolf theorem, we consider the scenario in which an unknown quantum state is distributed over two systems. We determined how much quantum communication is needed to transfer the full state to one system. This communication measures the partial information one system needs conditioned on its prior information. We found that the partial information is given by the conditional entropy, just as in the classical case. However, in the classical case, partial information must always be positive, while in the quantum world we find this physical quantity can be negative. 
If the partial information is positive, its sender needs to communicate this number of quantum bits to the receiver to achieve state transfer; if it is negative, the state can be transferred, and in addition, the sender and receiver gain the corresponding potential for future quantum communication. This potential communication is in the form of pure entangled states which can be used to teleport quantum states. Thus viewing entanglement as a potential for quantum communication, we see that when the conditional entropy is positive, entanglement needs to be consumed, while when it is negative, entanglement is gained. One can view it in another way – the entropy S(B) quantifies how much Bob knows (in the sense of possessing the state), while the entropy S(AB) quantifies how much there is to know. Since quantum distributions can have S(AB) ≤ S(B), there is a sense in which Bob knows too much. If Alice were to send her full state to him, at a cost of S(A), then he ends up having entropy S(AB) – in the quantum world, after you receive negative information, you know less. The primitive which (optimally) transfers partial information we call quantum state merging, as Alice’s state is effectively merged with Bob’s state, arriving at his site. With this primitive in hand, one can gain a systematic understanding of quantum network theory, including several important applications such as distributed compression, multiple access channels and assisted entanglement distillation (localizable entanglement), and compression with quantum side information. The purpose of the current paper is to provide full proofs for the result of [13]. In Sect. II we formally define the notion of quantum state merging, and state the main result. In Sect. III we exhibit a general condition to ensure state merging and derive a one-shot protocol based on random measurements. In Sect. 
IV we prove the main theorem, show that our protocol has the optimal classical communication rate, and provide a heuristic explanation of why the conditional entropy comes into play. Once the primitive of state merging has been put on a firm footing, we are able to use it to solve a number of previously intractable problems. A broad outline of these


applications was given in [13], and here we provide more details. In Sect. V we look at the problem of distributed compression, where several parties at different sites individually compress a source, which is then decoded by a single party. It is found that the parties can compress at the ideal rate of the total entropy, even though they are distributed. In Sect. VI, we look at noiseless coding with side information, i.e. we consider the problem where one party (Alice) wishes to compress her state to send to a decoder, and a second party (Bob) who holds part of the total state can aid her by sending part of his state. The decoder only wishes to decode the state of Alice, while Bob’s state is only used to help in the decoding. As a corollary, we find that if there is a single encoder Alice who has access to side information, then this can help her in sending information to a decoder, a situation impossible in the classical case. Next, in Sect. VII, we treat entanglement of assistance [14] in the case of many helping parties (a concept similar to localizable entanglement [15]). A pure state is shared by many parties, and the goal is to distill the maximum amount of entanglement between two of the parties. The other parties can aid in this distillation through local operations and classical communication. We find that state merging gives the optimal rate of distillation. We then consider the quantum multiple access channel, in Sect. VIII. Two parties, Alice and Bob wish to send quantum states to a decoder through a channel which acts on both their states. We find optimal rates using state merging, and derive the full rate region. We are also able to provide an interpretation to the longstanding puzzle of negative coherent information in the formula for the capacity of the quantum channel. Namely, if one party’s rate is negative, then this is the amount of entanglement he or she must invest in order to help the other party achieve the maximum rate. 
Before concluding in Sect. X, we provide a quick and intuitive proof of strong subadditivity using state merging in Sect. IX.

II. State Merging: Concept, Definitions and Main Result

Consider a source emitting a sequence of unknown bipartite pure states $|\psi_1\rangle_{AB} |\psi_2\rangle_{AB} |\psi_3\rangle_{AB} \ldots$ from a distribution, with average density matrix $\rho_{AB}$. As with Schumacher compression, we assume the density matrix of the source is known to the two parties Alice and Bob, but they don't know the ensemble which realises it. That is, for any given state they possess, the state is unknown, although the statistics of the source are known. We are interested in information theoretic quantities, and in particular, we are interested in quantifying quantum information. We thus allow free classical communication between the two parties, and consider many copies n of the state $\rho_{AB}$. We now ask how much quantum communication is needed for Alice to transfer the unknown sequence of states $|\psi_1\rangle_{AB} |\psi_2\rangle_{AB} |\psi_3\rangle_{AB} \ldots$ to Bob's site. This we call quantum state merging. Notice that because classical communication is free, we can replace quantum communication by entanglement due to teleportation [16]; this will be a more convenient way of accounting for the quantum resources. Faithful state merging means that the fidelity of the sequence of states is kept for any realisation of the density matrix.

There is an equivalent, yet more elegant way to conceive of this problem. We imagine that the state $\rho_{AB}$ is part of a larger pure state $\psi_{ABR} = |\psi\rangle\langle\psi|_{ABR}$, with a state vector $|\psi\rangle_{ABR}$ which also lives on a reference (or environment) system R. Faithful state transfer means that the transferred state has high fidelity with the original state $|\psi\rangle^{\otimes n}_{ABR}$. More formally, we define:


Definition 1 (State merging). Consider a pure state $|\Psi\rangle_{\tilde A\tilde B\tilde R}$ shared between two parties $\tilde A$, $\tilde B$ and a reference $\tilde R$. Let Alice and Bob have further registers $A_0$, $A_1$ and $B_0$, $B_1$, respectively. We call a joint operation $\mathcal{M} : \tilde A A_0 \otimes \tilde B B_0 \to A_1 \otimes B_1 B' \tilde B$ state merging of $\Psi$ with error $\epsilon$, if it is LOCC and, with $\rho_{A_1 B_1 B' \tilde B \tilde R} = (\mathcal{M} \otimes \mathrm{id}_{\tilde R})\bigl(\Psi_{\tilde A\tilde B\tilde R} \otimes (\Phi_K)_{A_0 B_0}\bigr)$,

$$F\bigl(\rho_{A_1 B_1 B' \tilde B \tilde R},\ (\Phi_L)_{A_1 B_1} \otimes \Psi_{B' \tilde B \tilde R}\bigr) \geq 1 - \epsilon, \tag{4}$$

with maximally entangled states $\Phi_K$, $\Phi_L$ on $A_0 B_0$, $A_1 B_1$ of Schmidt rank K, L, respectively. Here, $B'$ is a local ancilla of Bob's of the same size as $\tilde A$. The number log K − log L is called the entanglement cost of the protocol. In the case of many copies of the same state, $\Psi = \psi^{\otimes n}$, we call $\frac{1}{n}(\log K - \log L)$ the entanglement rate of the protocol. A real number R is called an achievable rate if there exist, for n → ∞, merging protocols of rate approaching R and error approaching 0. The smallest achievable rate is the merging cost of ψ.

The main purpose of this paper is to prove in detail the result announced in [13], namely, that the merging cost is equal to the conditional entropy of the state $\rho_{AB}$ shared by Alice and Bob, S(A|B) = S(AB) − S(B).

Theorem 2 (Quantum State Merging). For a state $\rho_{AB}$ shared by Alice and Bob, the entanglement cost of merging is equal to the quantum conditional entropy S(A|B) = S(AB) − S(B), in the following sense. When S(A|B) is positive, then merging is possible if and only if R > S(A|B) ebits per input copy are provided. When S(A|B) is negative, then merging is possible by local operations and classical communication, and moreover R < −S(A|B) maximally entangled states are obtained per input copy.

Our strategy of proof will be the following. We first show that if the quantity is negative, then merging can be done by LOCC (indeed, with only one-way communication from Alice to Bob), and the entanglement rate that can be obtained is equal to minus the conditional entropy. Using this we will show that in the case of positive conditional entropy it is enough to spend S(A|B) ebits of entanglement. Finally, we will show that the rates given by the conditional entropy are optimal. We will also show that the classical communication cost is equal to the quantum mutual information between Alice and the reference system R,

$$I(A : R) = S(A) + S(R) - S(AR), \tag{5}$$

and prove its optimality.

III. One-Shot State Merging

In this section, we first formulate a general sufficient condition on a measurement of Alice that ensures that Bob can complete state merging by local operations; then we show how random measurements succeed with high probability in realising this condition.

A. Condition for merging with one-way LOCC. Here we will provide a condition that is sufficient to obtain state merging with only LOCC. We formulate it in the one-shot setting of Definition 1. It is based on Alice performing a measurement which takes the original


state $\Psi_{\tilde A\tilde B\tilde R}$ to another pure state, with the essential features that: (1) the reference system $\tilde R$ is unchanged, and (2) Alice's and the Reference's states are in product form. Since all purifications are equal up to local unitaries, this implies that Bob can perform a local unitary which transforms his state into $\rho_{\tilde A\tilde B}$.

More formally, we consider a protocol whose basic constituent is Alice's incomplete measurement given by Kraus operators $P_j$ mapping $\tilde A$ to $A_1$ (in our actual solution, it will be a von Neumann measurement followed by a unitary). Given the outcome was j, the state $\Psi_{\tilde A\tilde B\tilde R}$ collapses to a state which we will denote by $\Psi^j_{A_1\tilde B\tilde R}$,

$$|\Psi^j\rangle_{A_1\tilde B\tilde R} = \frac{1}{\sqrt{p_j}} (P_j \otimes I_{\tilde B\tilde R}) |\Psi\rangle_{\tilde A\tilde B\tilde R}, \tag{6}$$

where $p_j$ is the probability of obtaining outcome j,

$$p_j = \langle\Psi| P_j^\dagger P_j \otimes I_{\tilde B\tilde R} |\Psi\rangle. \tag{7}$$

Suppose for the moment that $|\Psi^j\rangle_{A_1\tilde B\tilde R}$ has the property

$$\rho^j_{A_1\tilde R} = \tau_{A_1} \otimes \rho_{\tilde R}, \tag{8}$$

where $\rho^j_{A_1\tilde R}$ is the reduced density matrix of $\Psi^j_{A_1\tilde B\tilde R}$, $\tau_{A_1}$ is the maximally mixed state of dimension L on Alice's system $A_1$, and $\rho_{\tilde R}$ is the reduced density matrix of the original state $\Psi_{\tilde A\tilde B\tilde R}$. Then (see [17]) there exists an isometry $U_j : \tilde B \to B_1 B' \tilde B$ on Bob's side, such that

$$(I_{A_1\tilde R} \otimes U_j) |\Psi^j\rangle_{A_1\tilde B\tilde R} = |\Phi_L\rangle_{A_1 B_1} \otimes |\Psi\rangle_{B'\tilde B\tilde R}, \tag{9}$$

where $|\Psi\rangle_{B'\tilde B\tilde R}$ is the original state $|\Psi\rangle_{\tilde A\tilde B\tilde R}$ with the system $B'$ substituted for $\tilde A$. This is because $\Psi$ is the purification of $\rho_{\tilde R}$ and $\Phi_L$ that of $\tau_{A_1}$, so both $\Psi^j_{A_1\tilde B\tilde R}$ and $(\Phi_L)_{A_1 B_1} \otimes \Psi_{B'\tilde B\tilde R}$ are purifications of $\tau_{A_1} \otimes \rho_{\tilde R}$. Hence, by Uhlmann's theorem, they are related by a partial isometry on Bob's system. Since we require fidelity approaching 1 only in the asymptotic limit, we obtain the following merging condition:

Proposition 3 (Merging condition). Consider Alice's measurement with outcomes j, which occur with probability $p_j$. Denote the state after the measurement result j was obtained by $|\Psi^j\rangle_{A_1\tilde B\tilde R}$, and its reduced density matrix by $\rho^j_{A_1\tilde R}$. The following condition implies the existence of a merging protocol with entanglement cost −log L and error $2\sqrt{\epsilon}$: that the so-called quantum error $Q_e$ satisfies

$$Q_e := \sum_j p_j \bigl\| \rho^j_{A_1\tilde R} - \tau_{A_1} \otimes \rho_{\tilde R} \bigr\|_1 \leq \epsilon, \tag{10}$$

where $\tau_{A_1}$ is the maximally mixed state of dimension L on $A_1$.


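Proof. The proof is based on the above considerations concerning the ideal situation. Using the relation Eq. (A4) between the trace distance and the fidelity, we get

$$\sum_j p_j \sqrt{F\bigl(\rho^j_{A_1\tilde R},\ \tau_{A_1} \otimes \rho_{\tilde R}\bigr)} \geq 1 - \frac{\epsilon}{2}. \tag{11}$$

Then, by Uhlmann's theorem [18, 19] there exist isometries $U_j$ of Bob such that

$$F\bigl(\rho^j_{A_1\tilde R},\ \tau_{A_1} \otimes \rho_{\tilde R}\bigr) = F\bigl((I_{A_1\tilde R} \otimes U_j)|\Psi^j\rangle_{A_1\tilde B\tilde R},\ |\Phi_L\rangle_{A_1 B_1} \otimes |\Psi\rangle_{B'\tilde B\tilde R}\bigr), \tag{12}$$

hence

$$\sum_j p_j F\bigl((I_{A_1\tilde R} \otimes U_j)|\Psi^j\rangle_{A_1\tilde B\tilde R},\ |\Phi_L\rangle_{A_1 B_1} \otimes |\Psi\rangle_{B'\tilde B\tilde R}\bigr) \geq \Bigl( \sum_j p_j \sqrt{F\bigl((I_{A_1\tilde R} \otimes U_j)|\Psi^j\rangle_{A_1\tilde B\tilde R},\ |\Phi_L\rangle_{A_1 B_1} \otimes |\Psi\rangle_{B'\tilde B\tilde R}\bigr)} \Bigr)^2 \geq \Bigl(1 - \frac{\epsilon}{2}\Bigr)^2 \geq 1 - \epsilon. \tag{13}$$

So, with the output state of the protocol,

$$\rho_{A_1 B_1 B'\tilde B\tilde R} = \sum_j p_j\, (I_{A_1\tilde R} \otimes U_j) |\Psi^j\rangle\langle\Psi^j|_{A_1\tilde B\tilde R} (I_{A_1\tilde R} \otimes U_j)^\dagger, \tag{14}$$

thanks to the linearity of the fidelity when one argument is pure we obtain

$$F\bigl(\rho_{A_1 B_1 B'\tilde B\tilde R},\ (\Phi_L)_{A_1 B_1} \otimes \Psi_{B'\tilde B\tilde R}\bigr) \geq 1 - \epsilon. \tag{15}$$

And using the relation (A4) between fidelity and trace distance once more, we arrive at

$$\bigl\| \rho_{A_1 B_1 B'\tilde B\tilde R} - (\Phi_L)_{A_1 B_1} \otimes \Psi_{B'\tilde B\tilde R} \bigr\|_1 \leq 2\sqrt{\epsilon}, \tag{16}$$

which concludes the proof.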

Note that for any protocol which achieves merging, the condition (10) must necessarily be met at some stage of the protocol. This is because in order for the final state to be close to the original state, ρ R must necessarily be virtually unchanged, and in order for the state to be at Bob’s site, Alice’s state must necessarily be in a product state with the reference system R.
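The error bookkeeping of Proposition 3 can be illustrated on commuting states, where fidelity and trace norm reduce to classical formulas. This is a sketch under an explicit assumption: Appendix A is not reproduced in this section, so the code takes the relation (A4) to be the standard two-sided bound $1 - \sqrt{F} \leq \frac{1}{2}\|\rho - \sigma\|_1 \leq \sqrt{1 - F}$ with F the squared fidelity. The probability vectors `p` and `q` are hypothetical examples.

```python
import math


def fidelity(p, q):
    """Squared fidelity F = (sum_i sqrt(p_i q_i))^2 of two commuting states."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2


def trace_norm_distance(p, q):
    """||p - q||_1 for commuting states (no factor 1/2)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def fidelity_trace_relation_holds(p, q, tol=1e-12):
    """Check 1 - sqrt(F) <= (1/2)||p - q||_1 <= sqrt(1 - F)."""
    f = fidelity(p, q)
    half_t = 0.5 * trace_norm_distance(p, q)
    lower_ok = 1 - math.sqrt(f) <= half_t + tol
    upper_ok = half_t <= math.sqrt(max(0.0, 1 - f)) + tol
    return lower_ok and upper_ok


def merging_error_from_quantum_error(eps):
    """Trace-norm error 2*sqrt(eps) of Eq. (16), given Q_e <= eps."""
    return 2 * math.sqrt(eps)


p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
relation_ok = fidelity_trace_relation_holds(p, q)

# The last step of Eq. (13): (1 - eps/2)^2 >= 1 - eps for every eps in [0, 1].
step_13_ok = all((1 - e / 2) ** 2 >= 1 - e for e in (0.0, 0.01, 0.1, 0.5, 1.0))
```

Note the square-root loss: a quantum error of $10^{-4}$ only guarantees a final trace-norm error of 0.02.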

B. One-shot merging by random measurement. Here we will prove an abstract, one-shot version of the main theorem, showing that a random orthogonal measurement of rank-L projectors (and a little remainder) achieves merging.


Proposition 4 (One-shot merging). Let $\Psi_{\tilde A\tilde B\tilde R}$ be a pure state, with local dimensions $d_{\tilde A}$, $d_{\tilde B}$, $d_{\tilde R}$, and $\mathrm{Tr}\,\rho_{\tilde B}^2 \leq \frac{1}{D}$. Then there exists a POVM consisting of $N = \lfloor d_{\tilde A}/L \rfloor$ projectors of rank L and one of rank $L' = d_{\tilde A} - NL < L$ such that

$$Q_e \leq 2\sqrt{\frac{L\, d_{\tilde R}}{D}} + 2\frac{L}{d_{\tilde A}}, \tag{17}$$

and there is a merging protocol with error at most $2\sqrt{2\sqrt{L\, d_{\tilde R}/D} + 2L/d_{\tilde A}}$.

In fact, by choosing the measurement at random according to the Haar measure on $\tilde A$, the expectation of the left-hand side of Eq. (17) is upper bounded by the right-hand side.

Remark 5. Let us explain here briefly how we will use the lemma in the proof of Theorem 2 in the case of negative S(A|B). Namely, we will apply this lemma with the following parameters: $d_{\tilde R} \approx 2^{nS(R)} = 2^{nS(AB)}$, $d_{\tilde A} \approx 2^{nS(A)}$, $D \approx 2^{nS(B)}$, where n is the number of copies of the initial state $\psi_{ABR}$ shared by Alice, Bob and the reference system. Moreover L will be related to the rate r of singlets obtained between Alice and Bob in the process of merging: $L \approx 2^{nr}$. Then the expression for the quantum error will be

$$Q_e \approx 2^{\frac{1}{2} n (S(AB) - S(B) + r)} + 2^{n(r - S(A)) + 1}. \tag{18}$$

Thus if only r < S(B) − S(AB) = −S(A|B), then the quantum error will decay exponentially with n.

The crucial technical result in the proof of Proposition 4 will be the following statement about random (Haar distributed) rank-L projectors:

Lemma 6. Let $P : \tilde A \to A_1$ be a random partial isometry of rank L, i.e. $P^\dagger P$ is a projection onto an L-dimensional subspace of $\tilde A$. For example, one might put $P = P_0 U$ with some fixed rank-L projector $P_0$ onto a subspace $A_1$ of $\tilde A$, and a Haar distributed unitary U on $\tilde A$. For the subnormalized density matrix

$$\omega_{A_1\tilde R} = (P \otimes I_{\tilde R}) \rho_{\tilde A\tilde R} (P \otimes I_{\tilde R})^\dagger,$$

observe that its average over unitaries U is

$$\langle \omega_{A_1\tilde R} \rangle = \frac{L}{d_{\tilde A}} \tau_{A_1} \otimes \rho_{\tilde R}.$$

And we have:

$$\Bigl\langle \Bigl\| \omega_{A_1\tilde R} - \frac{L}{d_{\tilde A}} \tau_{A_1} \otimes \rho_{\tilde R} \Bigr\|_2^2 \Bigr\rangle \leq \frac{L^2}{d_{\tilde A}^2} \frac{1}{D}, \tag{19}$$

$$\Bigl\langle \Bigl\| \omega_{A_1\tilde R} - \frac{L}{d_{\tilde A}} \tau_{A_1} \otimes \rho_{\tilde R} \Bigr\|_1 \Bigr\rangle \leq \frac{L}{d_{\tilde A}} \sqrt{\frac{L\, d_{\tilde R}}{D}}. \tag{20}$$
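To get a feeling for the one-shot bound of Proposition 4, the right-hand side of Eq. (17) and the resulting merging error can be evaluated for concrete dimensions. The sketch below is illustrative: the function names and the example dimensions are hypothetical, chosen only so that the reference is much smaller than D.

```python
import math


def one_shot_measurement(d_a, L):
    """Number N of rank-L projectors and rank L' of the remainder (Proposition 4)."""
    n_proj = d_a // L
    return n_proj, d_a - n_proj * L


def quantum_error_bound(L, d_r, D, d_a):
    """Right-hand side of Eq. (17): 2*sqrt(L*d_R/D) + 2*L/d_A."""
    return 2 * math.sqrt(L * d_r / D) + 2 * L / d_a


def merging_error_bound(L, d_r, D, d_a):
    """Error bound of Proposition 4: 2*sqrt(eps), with eps the bound on Q_e."""
    return 2 * math.sqrt(quantum_error_bound(L, d_r, D, d_a))


# Hypothetical dimensions with d_R much smaller than D.
d_a, d_r, D, L = 4100, 16, 2**14, 8
n_proj, remainder_rank = one_shot_measurement(d_a, L)
q_bound = quantum_error_bound(L, d_r, D, d_a)
m_bound = merging_error_bound(L, d_r, D, d_a)
```

With these numbers the measurement has 512 projectors of rank 8 and one of rank 4. Doubling L increases the first term of the bound by a factor of $\sqrt{2}$, which is the trade-off between the amount of entanglement gained (log L) and the error.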


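Proof. In Appendix A we recall the basic properties of the trace norm $\|\cdot\|_1$ and the Hilbert-Schmidt norm $\|\cdot\|_2$. From there (Lemma 13) we take that $\|X\|_1 \leq \sqrt{d}\,\|X\|_2$ for an operator on a d-dimensional space. This, and the concavity of the square root function, show that Eq. (19) implies Eq. (20). To prove Eq. (19), we use the fact that it has the form of a variance, so

$$\Bigl\langle \Bigl\| \omega_{A_1\tilde R} - \frac{L}{d_{\tilde A}} \tau_{A_1} \otimes \rho_{\tilde R} \Bigr\|_2^2 \Bigr\rangle = \bigl\langle \| \omega_{A_1\tilde R} - \langle\omega_{A_1\tilde R}\rangle \|_2^2 \bigr\rangle = \bigl\langle \mathrm{Tr}\, \omega_{A_1\tilde R}^2 \bigr\rangle - \mathrm{Tr}\, \langle\omega_{A_1\tilde R}\rangle^2 = \bigl\langle \mathrm{Tr}\, \omega_{A_1\tilde R}^2 \bigr\rangle - \frac{L^2}{d_{\tilde A}^2} \frac{1}{L} \mathrm{Tr}\, \rho_{\tilde R}^2. \tag{21}$$

To evaluate the average of $\mathrm{Tr}\, \omega_{A_1\tilde R}^2$, we use the well-known equation

$$\mathrm{Tr}\, \omega_{A_1\tilde R}^2 = \mathrm{Tr}\bigl[ (\omega_{A_1\tilde R} \otimes \omega_{A_1'\tilde R'}) (F_{A_1 A_1'} \otimes F_{\tilde R\tilde R'}) \bigr], \tag{22}$$

where we have introduced copies of all systems involved, and with the swap (or flip) operator F exchanging the two systems. (Note that $F_{\tilde A\tilde R, \tilde A'\tilde R'} = F_{\tilde A\tilde A'} \otimes F_{\tilde R\tilde R'}$.) With this, and w.l.o.g. assuming that $A_1$ is a subspace of $\tilde A$,

$$\begin{aligned} \bigl\langle \mathrm{Tr}\, \omega_{A_1\tilde R}^2 \bigr\rangle &= \bigl\langle \mathrm{Tr}\bigl[ (\omega_{A_1\tilde R} \otimes \omega_{A_1'\tilde R'}) (F_{A_1 A_1'} \otimes F_{\tilde R\tilde R'}) \bigr] \bigr\rangle \\ &= \bigl\langle \mathrm{Tr}\bigl[ (UU_{\tilde A\tilde A'} \otimes I_{\tilde R\tilde R'}) (\rho_{\tilde A\tilde R} \otimes \rho_{\tilde A'\tilde R'}) (UU_{\tilde A\tilde A'} \otimes I_{\tilde R\tilde R'})^\dagger (F_{A_1 A_1'} \otimes F_{\tilde R\tilde R'}) \bigr] \bigr\rangle \\ &= \mathrm{Tr}\bigl[ (\rho_{\tilde A\tilde R} \otimes \rho_{\tilde A'\tilde R'}) \bigl\langle (UU_{\tilde A\tilde A'} \otimes I_{\tilde R\tilde R'})^\dagger (F_{A_1 A_1'} \otimes F_{\tilde R\tilde R'}) (UU_{\tilde A\tilde A'} \otimes I_{\tilde R\tilde R'}) \bigr\rangle \bigr] \\ &= \mathrm{Tr}\bigl[ (\rho_{\tilde A\tilde R} \otimes \rho_{\tilde A'\tilde R'}) \bigl( \bigl\langle (UU_{\tilde A\tilde A'})^\dagger F_{A_1 A_1'} (UU_{\tilde A\tilde A'}) \bigr\rangle \otimes F_{\tilde R\tilde R'} \bigr) \bigr], \end{aligned} \tag{23}$$

where we have used the shorthand $UU_{\tilde A\tilde A'} := U_{\tilde A} \otimes U_{\tilde A'}$, and the projection P from the state $\omega_{A_1\tilde R}$ has been absorbed into $F_{A_1 A_1'} = (P \otimes P) F_{\tilde A\tilde A'} (P \otimes P)$. In Appendix B we demonstrate, using elementary arguments from the representation theory of U ⊗ U, how one can calculate that

$$\bigl\langle (UU_{\tilde A\tilde A'})^\dagger F_{A_1 A_1'} (UU_{\tilde A\tilde A'}) \bigr\rangle = \frac{L}{d_{\tilde A}} \frac{d_{\tilde A} - L}{d_{\tilde A}^2 - 1} I_{\tilde A\tilde A'} + \frac{L}{d_{\tilde A}} \frac{L d_{\tilde A} - 1}{d_{\tilde A}^2 - 1} F_{\tilde A\tilde A'}. \tag{24}$$

Inserting this into Eq. (23) gives

$$\bigl\langle \mathrm{Tr}\, \omega_{A_1\tilde R}^2 \bigr\rangle = \frac{L}{d_{\tilde A}} \frac{d_{\tilde A} - L}{d_{\tilde A}^2 - 1} \mathrm{Tr}\, \rho_{\tilde R}^2 + \frac{L}{d_{\tilde A}} \frac{L d_{\tilde A} - 1}{d_{\tilde A}^2 - 1} \mathrm{Tr}\, \rho_{\tilde A\tilde R}^2 \leq \frac{L}{d_{\tilde A}^2} \mathrm{Tr}\, \rho_{\tilde R}^2 + \frac{L^2}{d_{\tilde A}^2} \frac{1}{D}, \tag{25}$$

and looking at Eq. (21) we are done.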
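The coefficients in Eq. (24) can be cross-checked without the representation theory of Appendix B. Averaging over U ⊗ U preserves the trace and the trace against the swap operator, so the right-hand side must have trace L and trace $L^2$ against F. The sketch below, with illustrative function names, verifies both conditions and the two coefficient bounds used in Eq. (25) in exact rational arithmetic.

```python
from fractions import Fraction


def twirl_coefficients(d, L):
    """Coefficients (alpha, beta) of I and F in Eq. (24) for dimension d and rank L."""
    alpha = Fraction(L, d) * Fraction(d - L, d * d - 1)
    beta = Fraction(L, d) * Fraction(L * d - 1, d * d - 1)
    return alpha, beta


def trace_checks(d, L):
    """Compare the traces of alpha*I + beta*F with those of the rank-L swap operator."""
    alpha, beta = twirl_coefficients(d, L)
    # On the doubled d-dimensional space: Tr I = d^2, Tr F = d, Tr F^2 = d^2.
    trace_with_identity = alpha * d * d + beta * d   # must equal Tr F_{A1 A1'} = L
    trace_with_swap = alpha * d + beta * d * d       # must equal Tr (P x P) = L^2
    return trace_with_identity == L and trace_with_swap == L * L


def bounds_used_in_eq_25(d, L):
    """alpha <= L/d^2 and beta <= L^2/d^2, as used in the inequality of Eq. (25)."""
    alpha, beta = twirl_coefficients(d, L)
    return alpha <= Fraction(L, d * d) and beta <= Fraction(L * L, d * d)


all_ok = all(
    trace_checks(d, L) and bounds_used_in_eq_25(d, L)
    for d in range(2, 12)
    for L in range(1, d + 1)
)
```

For L = d the coefficients reduce to (0, 1), as they must, since the full swap operator commutes with U ⊗ U.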


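Proof of Proposition 4. Fix a random measurement according to the description of the proposition. One way of doing this is picking N fixed orthogonal subspaces of dimension L, and one of dimension $L' = d_{\tilde A} - NL < L$. The projectors onto these subspaces, each followed by a fixed unitary mapping it to $A_1$, we denote by $Q_j$, j = 0, . . . , N. Here $Q_0$ projects onto a subspace of dimension $L'$. Then put $P_j := Q_j U$ with a Haar distributed random unitary U on $\tilde A$.

Then, by Lemma 6, with $\omega^j_{A_1\tilde R} = (P_j \otimes I_{\tilde R}) \rho_{\tilde A\tilde R} (P_j \otimes I_{\tilde R})^\dagger$,

$$\Bigl\langle \sum_{j=1}^N \Bigl\| \omega^j_{A_1\tilde R} - \frac{L}{d_{\tilde A}} \tau_{A_1} \otimes \rho_{\tilde R} \Bigr\|_1 \Bigr\rangle \leq N \frac{L}{d_{\tilde A}} \sqrt{\frac{L\, d_{\tilde R}}{D}} \leq \sqrt{\frac{L\, d_{\tilde R}}{D}}. \tag{26}$$

This is almost what we want, except that we haven't taken into account the normalisation: with $p_j = \mathrm{Tr}\, \omega^j_{A_1\tilde R}$ and $\rho^j_{A_1\tilde R} = \frac{1}{p_j} \omega^j_{A_1\tilde R}$, we need to argue that on average, the $p_j$ are close to $\frac{L}{d_{\tilde A}}$. Indeed, Eq. (26) implies

$$\Bigl\langle \sum_{j=1}^N \Bigl| p_j - \frac{L}{d_{\tilde A}} \Bigr| \Bigr\rangle \leq \sqrt{\frac{L\, d_{\tilde R}}{D}}, \tag{27}$$

hence we obtain

$$\Bigl\langle \sum_{j=1}^N p_j \bigl\| \rho^j_{A_1\tilde R} - \tau_{A_1} \otimes \rho_{\tilde R} \bigr\|_1 \Bigr\rangle \leq 2\sqrt{\frac{L\, d_{\tilde R}}{D}}. \tag{28}$$

Finally, it is clear that $\langle \mathrm{Tr}(\rho_{\tilde A\tilde R} P_0) \rangle = \frac{L'}{d_{\tilde A}} < \frac{L}{d_{\tilde A}}$, and since the trace distance of two states is at most 2, we get the result as advertised, because the quantum error is composed of the probability of hitting $P_0$ and the sum of the error terms of the $P_j$, weighted by their probabilities. Now we can apply Proposition 3.

So, if $d_{\tilde R} \ll D$ there is a merging LOCC protocol with small error and entanglement cost up to $\log d_{\tilde R} - \log D$ (i.e., the negative of this is the amount of entanglement produced). If $d_{\tilde R} \geq D$, consider the state $\Psi_{\tilde A\tilde B\tilde R} \otimes (\Phi_K)_{A_0 B_0}$ with a maximally entangled state of Schmidt rank $K \gg d_{\tilde R}/D$. Now merging is possible (with L = 1); the entanglement cost is log K, and it can be made as small as $\log d_{\tilde R} - \log D$.

IV. Proof of the Main Theorem

A. Achievability of merging

Proof of Theorem 2. We will first prove the direct part saying that the rates are achievable. Consider n copies of the state $|\psi\rangle_{ABR}$, and assume first that S(A|B) < 0. We would like to use our one-shot version, Proposition 4, but cannot do so directly, since the dimension $d_R^n$ and the number $(\mathrm{Tr}\, \rho_R^2)^n$ are not information theoretically meaningful. Instead, we consider the vector $|\Omega\rangle_{\tilde A\tilde B\tilde R}$ and state $|\Psi\rangle_{\tilde A\tilde B\tilde R}$, with

$$|\Omega\rangle_{\tilde A\tilde B\tilde R} := (\Pi_{\tilde A} \otimes \Pi_{\tilde B} \otimes \Pi_{\tilde R}) |\psi\rangle^{\otimes n}_{ABR}, \qquad |\Psi\rangle_{\tilde A\tilde B\tilde R} := \frac{1}{\sqrt{\langle\Omega|\Omega\rangle}} |\Omega\rangle_{\tilde A\tilde B\tilde R}, \tag{29}$$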


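where $\tilde A$, $\tilde B$ and $\tilde R$ are the typical subspaces of $A^n$, $B^n$ and $R^n$, respectively, and $\Pi_{\tilde A}$ etc. are the projection operators onto these typical subspaces. In Appendix C we explain what is necessary to know about typicality; in particular we have

$$\langle\Omega|\Omega\rangle = \langle\psi|^{\otimes n} (\Pi_{\tilde A} \otimes \Pi_{\tilde B} \otimes \Pi_{\tilde R}) |\psi\rangle^{\otimes n} \geq 1 - \epsilon, \tag{30}$$

for any $\epsilon > 0$ and large enough n. Indeed, we can choose $\epsilon = 3\exp(-c\delta^2 n)$ with some constant c, where δ > 0 is a typicality parameter; namely from Eq. (C5) in Appendix C we have $\mathrm{Tr}\, \rho_A^{\otimes n} \Pi_{\tilde A}$, $\mathrm{Tr}\, \rho_B^{\otimes n} \Pi_{\tilde B}$, $\mathrm{Tr}\, \rho_R^{\otimes n} \Pi_{\tilde R} \geq 1 - \exp(-c\delta^2 n)$. We obtain the bound (30) from observing

$$I_{A^n} \otimes I_{B^n} \otimes I_{R^n} - \Pi_{\tilde A} \otimes \Pi_{\tilde B} \otimes \Pi_{\tilde R} \leq (I_{A^n} - \Pi_{\tilde A}) \otimes I_{B^n R^n} + I_{A^n} \otimes (I_{B^n} - \Pi_{\tilde B}) \otimes I_{R^n} + I_{A^n B^n} \otimes (I_{R^n} - \Pi_{\tilde R}). \tag{31}$$

Furthermore, with $\Omega = |\Omega\rangle\langle\Omega|$ and $\Psi = |\Psi\rangle\langle\Psi|$, we have (using Eqs. (C10), (C9) and (C6) in Appendix C)

$$\mathrm{rank}\, \Psi_{\tilde A} \geq (1 - \epsilon) 2^{n[S(A) - \delta]}, \qquad \mathrm{rank}\, \Psi_{\tilde R} \leq 2^{n[S(R) + \delta]}, \qquad \Omega_{\tilde B} \leq \Pi_{\tilde B}\, \rho_B^{\otimes n}\, \Pi_{\tilde B} \leq 2^{-n[S(B) - \delta]} \Pi_{\tilde B}. \tag{32}$$

Hence we get, for the normalized $\Psi_{\tilde A\tilde B\tilde R}$,

$$d_{\tilde A} \geq (1 - \epsilon) 2^{n[S(A) - \delta]}, \qquad d_{\tilde R} \leq 2^{n[S(R) + \delta]}, \qquad D \geq (1 - \epsilon)^2 2^{n[S(B) - \delta]}. \tag{33}$$

By the gentle measurement Lemma 15 (see Appendix A), we obtain from Eq. (30),

$$\bigl\| \psi^{\otimes n}_{ABR} - \Omega_{\tilde A\tilde B\tilde R} \bigr\|_1 \leq 2\sqrt{\epsilon}, \quad \text{hence} \quad \bigl\| \psi^{\otimes n}_{ABR} - \Psi_{\tilde A\tilde B\tilde R} \bigr\|_1 \leq 4\sqrt{\epsilon}. \tag{34}$$

Now Alice and Bob follow a merging protocol as if they had $\Psi_{\tilde A\tilde B\tilde R}$, and with $L = 2^{n[S(B) - S(R) - 3\delta]}$. If the state were actually $\Psi_{\tilde A\tilde B\tilde R}$, the quantum error would be

$$Q_e \leq 2\sqrt{\frac{L\, d_{\tilde R}}{D}} + 2\frac{L}{d_{\tilde A}} \leq \frac{2}{1 - \epsilon} 2^{-n\delta/2} + 2^{1 - 2n\delta}. \tag{35}$$

(Observe that S(B) − S(R) ≤ S(A) by subadditivity.) So, by Proposition 3 we would get a merging protocol with error $O(2^{-n\delta/4})$. By Eq. (34), running the same protocol on $\psi^{\otimes n}_{ABR}$, we obtain an error of $O(2^{-n\delta/4}) + O(2^{-cn\delta^2/2})$, which vanishes exponentially as n → ∞. Since δ > 0 was arbitrary, the direct part follows.

It remains to consider the case when S(A|B) is non-negative. Here, Alice and Bob share additionally $n(S(A|B) + \epsilon)$ maximally entangled states. Each ebit contributes conditional entropy −1, so that the final state has negative conditional entropy $-n\epsilon$. Then however merging can be done by LOCC, as we have proven above.

Remark 7. Note that despite the generality of the definition of merging, our protocol is much more special. The definition allows one to start and end with certain amounts of ebits, but the amount charged is only the difference, so that it would be conceivable that to achieve the conditional entropy some catalytic use of entanglement is necessary. However, our protocol either needs no initial entanglement and outputs some (if S(A|B) < 0) or produces no entanglement but needs some initially (if S(A|B) ≥ 0).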
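The exponents in the achievability argument can be tabulated directly from the dimension estimates (33) and the choice of L. The following sketch works with base-2 logarithms to avoid overflow; the entropy values, n, δ and ε passed in the example are hypothetical, and the function names are illustrative.

```python
import math


def conditional_entropy(s_ab, s_b):
    """S(A|B) = S(AB) - S(B)."""
    return s_ab - s_b


def log2_quantum_error_terms(n, s_a, s_b, s_ab, delta, eps):
    """log2 of the two terms bounding Q_e in Eq. (35), using the estimates (33)."""
    s_r = s_ab  # the overall state on ABR is pure, so S(R) = S(AB)
    log_l = n * (s_b - s_r - 3 * delta)
    log_d_r = n * (s_r + delta)
    log_d = 2 * math.log2(1 - eps) + n * (s_b - delta)
    log_d_a = math.log2(1 - eps) + n * (s_a - delta)
    first = 1 + 0.5 * (log_l + log_d_r - log_d)   # log2 of 2*sqrt(L d_R / D)
    second = 1 + log_l - log_d_a                  # log2 of 2*L/d_A
    return first, second


def extra_ebits_needed(n, s_ab, s_b, eps):
    """Ebits added when S(A|B) >= 0 so that the padded state has S(A|B) = -n*eps."""
    return n * (conditional_entropy(s_ab, s_b) + eps)


# Hypothetical entropies with S(A|B) = -0.5 < 0.
first, second = log2_quantum_error_terms(1000, 1.0, 1.0, 0.5, 0.01, 0.01)

# Hypothetical entropies with S(A|B) = +0.5 >= 0.
padding = extra_ebits_needed(1000, 1.5, 1.0, 0.01)
```

The first exponent equals $1 - n\delta/2 - \log_2(1-\epsilon)$ independently of the entropies, which is why the same δ controls both the rate loss and the error decay.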


B. Merging is optimal. Let us now turn to the converse part. The essence of the proof is that under local operations and classical communication, and transmission of n qubits, entanglement can increase at most by n [20]. We will consider preservation of Bob's entanglement with Alice and the Reference. The initial entanglement $E_{in}$ includes the entanglement of the shared state plus any initial resource of pure entanglement log K. Initially, it is nS(B) + log K as the initial state was just $\psi^{\otimes n}_{ABR}$. The final entanglement $E_{out}$ includes the entanglement of the final state plus the final resource, log L bits of pure state entanglement, and is

$$E_{out} \approx nS(AB) + \log L. \tag{36}$$

Since Alice and Bob used only LOCC operations, we have

$$E_{out} \leq E_{in} \tag{37}$$

as entanglement could only decrease, giving $R = \frac{1}{n}(\log K - \log L) \geq S(AB) - S(B)$.

In more detail, assume $L \leq 2^{O(n)}$ for technical reasons. The LOCC protocol (which is also LOCC between Bob and Alice+Reference) can be thought of as generating an ensemble $\{\varphi^k_{A_1 B_1 B'^n B^n R^n}, q_k\}$ of pure states. Monotonicity of the entropy of entanglement under LOCC [21] means

$$nS(B) + \log K \geq \sum_k q_k S\bigl(\varphi^k_{B_1 B'^n B^n}\bigr). \tag{38}$$

The condition (4) for successful merging translates into

$$\sum_k q_k F\bigl(\varphi^k_{A_1 B_1 B'^n B^n R^n},\ (\Phi_L)_{A_1 B_1} \otimes \psi^{\otimes n}_{B'BR}\bigr) \geq 1 - \epsilon, \tag{39}$$

thanks to the linearity of the fidelity when one argument is pure. Using Eq. (A4) in Appendix A this yields

$$\sum_k q_k \bigl\| \varphi^k_{A_1 B_1 B'^n B^n R^n} - (\Phi_L)_{A_1 B_1} \otimes \psi^{\otimes n}_{B'BR} \bigr\|_1 \leq 2\sqrt{\epsilon}, \tag{40}$$

hence by monotonicity of the trace norm under partial tracing,

$$\sum_k q_k \bigl\| \varphi^k_{B_1 B'^n B^n} - \tau_{B_1} \otimes \rho^{\otimes n}_{B'B} \bigr\|_1 \leq 2\sqrt{\epsilon}. \tag{41}$$

By Fannes' inequality (stated as Lemma 16 in Appendix A), this finally gives

$$\sum_k q_k \bigl| S(\varphi^k_{B_1 B'^n B^n}) - \log L - nS(AB) \bigr| \leq \bigl( \log L + n \log d_A + n \log d_B \bigr)\, \eta(2\sqrt{\epsilon}) \leq O(n)\, \eta(2\sqrt{\epsilon}), \tag{42}$$

using the concavity of the η-function. With Eq. (38), we thus get

$$\frac{1}{n} (\log K - \log L) \geq S(A|B) - O(1)\, \eta(2\sqrt{\epsilon}), \tag{43}$$

which results in the converse when n → ∞ and $\epsilon \to 0$.


C. Classical communication cost of merging. In our protocol for quantum state merging, the amount of classical communication that Alice needs to send Bob is given by the number of possible measurement outcomes: at most $\frac{d_{\tilde A} d_{\tilde R}}{D} + 1$, which in the i.i.d. case $\psi^{\otimes n}_{ABR}$ means a rate of S(A) + S(R) − S(B) = I(A : R). Note that this is true regardless of whether S(A|B) ≥ 0 or S(A|B) < 0. We now show that this amount of communication is needed, and thus our protocol is communication optimal.

Theorem 8. For a state $|\psi\rangle_{ABR}$ shared by Alice, Bob and the Reference, the classical communication cost of merging is equal to the quantum mutual information between Alice and the reference system R, I(A : R) = S(A) + S(R) − S(AR).

Proof. We will first need to take a short digression. Consider a protocol which achieves merging with an entanglement rate $R_q$ and classical communication at rate $R_c$. Now let us imagine that the parties do not have access to a classical channel, so must send all their classical communication via the quantum channel, encoded into qubits. This gives a fully quantum version of merging [22] similar to the "mother protocol" (see [23] for an alternative, direct proof). If $R_q = S(A|B)$ and $R_c = I(A : R)$, we have, in the "sloppy" notation of [24],

$$\frac{1}{2} I(A : R)\, [q \to q] \geq \frac{1}{2} I(A : B)\, [qq] + \langle \mathrm{id}_{A \to B} : \rho_{AB} \rangle, \tag{44}$$

where the equation means that the protocol uses a noiseless qubit channel [q → q] at a rate of $\frac{1}{2} I(A : R)$, and that it produces $\frac{1}{2} I(A : B)$ bits of shared entanglement [qq] in addition to achieving state merging from Alice to Bob. The latter is represented by $\langle \mathrm{id}_{A \to B} : \rho_{AB} \rangle$, i.e. an identity channel from Alice to Bob working on the source $\rho_{AB}$. We briefly sketch how state merging gives the protocol of Eq. (44). Our merging protocol is expressed in the resource inequality formalism as

$$S(A|B)\, [qq] + I(A : R)\, [c \to c] \geq \langle \mathrm{id}_{A \to B} : \rho_{AB} \rangle, \tag{45}$$

where [c → c] stands for the communication resource of 1 classical bit. Recall that for any state merging protocol, the classical communication must be completely decoupled from the sent state for $|\psi\rangle_{ABR}$ to remain pure, and thus it can be recycled as $R_c$ bits of entanglement; the entanglement can further be used to send quantum states. This is what the authors of [24, 25] call Rule I, where each bit of classical communication (denoted as [c → c]) can be made coherent: we denote a coherent classical [26] bit by [q → qq]. At the left hand side of an inequality like (45), Rule I says that it can be replaced by half a bit of a quantum channel on the left and half a bit of shared entanglement on the right hand side (denoted $\frac{1}{2}[q \to q] - \frac{1}{2}[qq]$). One sees this by sending the classical communication used in teleportation as coherent qubits which are then recycled into entanglement. Thus,

$$[q \to qq] = \frac{1}{2} [q \to q] - \frac{1}{2} [qq]. \tag{46}$$
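The rate bookkeeping behind this substitution can be checked with exact arithmetic. Replacing each [c → c] in Eq. (45) by the right-hand side of Eq. (46) and moving the ebit terms to the right leaves $\frac{1}{2} I(A:R)$ qubits on the left and $\frac{1}{2} I(A:R) - S(A|B)$ ebits on the right; for a pure state on ABR the latter equals $\frac{1}{2} I(A:B)$. The sketch below, with illustrative function names and hypothetical entropy values, verifies this identity.

```python
from fractions import Fraction


def mother_from_merging(s_a, s_b, s_ab):
    """Apply Rule I, Eq. (46), to the merging inequality (45) for a pure state on ABR.

    Returns (qubits sent, ebits generated): the coefficient of [q -> q] on the
    left and of [qq] on the right of the resulting inequality.
    """
    s_r, s_ar = s_ab, s_b                  # purity: S(R) = S(AB), S(AR) = S(B)
    i_ar = s_a + s_r - s_ar                # I(A:R), the classical rate in Eq. (45)
    cond = s_ab - s_b                      # S(A|B), the ebit rate in Eq. (45)
    qubits = Fraction(1, 2) * i_ar         # each cbit costs half a qubit ...
    ebits = Fraction(1, 2) * i_ar - cond   # ... and returns half an ebit; S(A|B) ebits are spent
    return qubits, ebits


def half_mutual_information_ab(s_a, s_b, s_ab):
    """(1/2) I(A:B) = (1/2) (S(A) + S(B) - S(AB))."""
    return Fraction(1, 2) * (s_a + s_b - s_ab)


# Hypothetical entropy values (in bits) for a pure tripartite state.
s_a, s_b, s_ab = Fraction(3, 4), Fraction(1), Fraction(1, 2)
qubits, ebits = mother_from_merging(s_a, s_b, s_ab)
```

For these values the protocol sends 1/8 qubit per copy and generates 5/8 ebit per copy, which matches $\frac{1}{2} I(A:B)$.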

Applying Rule I of Eq. (46) to Eq. (45), and rearranging the terms gives the mother protocol in the formulation of Eq. (44). We now show that the mother is an optimal protocol to achieve state merging in the case when one doesn’t have access to a classical channel (see also [23]). We use the fact


that a necessary condition for any state merging protocol is that Alice must completely decouple herself from the state $|\psi\rangle_{ABR}$. This is because the state needs to be shared by R and B by definition of state merging. Whatever Alice does, including measurements and processing, we may consider coherently, as an operation which takes $\rho_A$ and some ancillas, and produces a part which gets sent down the quantum channel, and a part $\rho_{A'}$ she retains. This results in a state $|\psi\rangle_{B'BR}$ which has high fidelity with $|\psi\rangle_{ABR}$, plus some entanglement between Alice and Bob. Now, using standard quantum cryptographic reasoning originating in [27], if $|\psi\rangle_{B'BR}$ is (almost) pure, then the system $A'$ must be virtually in a product state with $B'BR$. In particular, the mutual information between the state $\rho_{A'}$ and the reference system R must be close to zero. Each qubit sent can reduce Alice's mutual information with the reference system by at most 2; thus, at a minimum, Alice must send $\frac{1}{2} I(A:R)$ qubits down the quantum channel. This gives the optimality of Alice's use of the quantum channel in protocol (44). That at most $\frac{1}{2} I(A:B)$ bits of entanglement are obtainable from the shared state, when sending $\frac{1}{2} I(A:R)$ qubits, can be easily seen as follows. Observe that the $\frac{1}{2} I(A:R)\,[q \to q]$ on the left-hand side of Eq. (44) can be replaced by $\frac{1}{2} I(A:R)\,[qq] + I(A:R)\,[c \to c]$ due to teleportation. If the entanglement rate on the right were larger than $\frac{1}{2} I(A:B)$, we could perform state merging with entanglement rate strictly smaller than $\frac{1}{2} I(A:R) - \frac{1}{2} I(A:B) = S(A|B)$, contradicting the converse of Theorem 2. Now, to prove optimality of the classical communication in Eq. (45), consider a hypothetical state merging protocol

$R_q\,[qq] + R_c\,[c \to c] \ge \langle \mathrm{id}_{A\to B} : \rho_{AB} \rangle$, (47)

which we may transform using Rule I [24, 25] into

$\left(R_q - \frac{1}{2} R_c\right)[qq] + \frac{1}{2} R_c\,[q \to q] \ge \langle \mathrm{id}_{A\to B} : \rho_{AB} \rangle$. (48)
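The entropy identity used in the optimality argument above can be spelled out in a few lines. The following is only a sketch; its one assumption, taken from the setting of the text, is that $\psi_{ABR}$ is pure, so that $S(AR) = S(B)$ and $S(R) = S(AB)$.

```latex
% Assumption: \psi_{ABR} is pure, hence S(AR) = S(B) and S(R) = S(AB).
\begin{align*}
I(A:R) &= S(A) + S(R) - S(AR) = S(A) + S(AB) - S(B), \\
I(A:B) &= S(A) + S(B) - S(AB), \\
\tfrac{1}{2} I(A:R) - \tfrac{1}{2} I(A:B) &= S(AB) - S(B) = S(A|B), \\
\tfrac{1}{2} I(A:R) + \tfrac{1}{2} I(A:B) &= S(A).
\end{align*}
```

The last line is a consistency check: the qubits sent and the entanglement gained in the mother protocol add up to the entropy of Alice's share.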

Comparing this with the mother protocol (44), we have that $R_c \ge I(A:R)$ by virtue of the optimality of (44); $R_q \ge S(A|B)$ comes out again, as it should. Thus in addition to giving an operational interpretation for the quantum conditional entropy, merging gives an operational interpretation for the quantum mutual information. Secondly, the measurement of Alice makes her state completely product with R, thus reinforcing the interpretation of quantum mutual information as the minimum entropy production of any local decorrelating process [28, 29]. This same quantity is also equal to the amount of irreversibility of a cyclic process: Bob initially has a state, then gives Alice her share (communicating S(A) qubits), which is finally merged back to him (communicating S(A|B) qubits). The total quantum communication of this cycle is I(A:R) quantum bits. Having concluded our proofs regarding state merging, we now turn to its applications.

V. Distributed Compression

In usual Schumacher compression, a single party Alice receives a state from a source, and must compress the states so that they can be faithfully decoded by another party. For a source emitting states with density matrix $\rho_A$, this can be done at a rate given by the entropy S(A) of the source [3]. One can imagine the situation where the states are


M. Horodecki, J. Oppenheim, A. Winter

distributed over many parties, and have to be compressed individually. Each party then sends their compressed share to a decoder who must be able to decode the full state. In the classical case, this problem was solved by Slepian and Wolf [2], who found that the total rate for distributed compression could equal the compression rate when the parties are not distributed. In the quantum case, previous results [9, 30] were interpreted as indications that one cannot compress at the same rate in the distributed vs. non-distributed case. However, using state merging, we will show that formally the same achievable rate region as in the Slepian-Wolf theorem is obtained. In detail, we assume that the source emits states with average density matrix $\rho_{A_1 A_2 \ldots A_m}$, and distributes it over m parties. The parties wish to compress their shares as much as possible so that the full state can be reconstructed by a single decoder. We allow classical side information for free (we will only need classical communication from each encoder to the joint decoder), and only ask about the rate $R_i$ of entanglement between the i-th encoder and the decoder. A tuple $(R_1, \ldots, R_m)$ is achievable if there exists an (m + 1)-party LOCC procedure taking in the source $\rho_{A_1 \ldots A_m}^{\otimes n}$, purified to a state $\psi_{R A_1 \ldots A_m}^{\otimes n}$, and $n(R_i + \epsilon)$ ebits between $A_i$ and the decoder B, such that the final state is $\sigma_{R^n B_1 \ldots B_m}$ with

$F\left(\sigma, \psi^{\otimes n}\right) \ge 1 - \epsilon$, (49)

and $\epsilon \to 0$ as $n \to \infty$. As always, the reference is passive, and plays no role in the protocol. Note that the rates $R_i$ can be negative here, just as in state merging, meaning that $n(-R_i + \epsilon)$ ebits are returned by the protocol.

Let us first describe the quantum solution for two parties and depict the rate region in Fig. 2. If one party compresses at a rate S(B), then the other party can over-compress at a rate S(A|B), by merging her state with the state which will end up with the decoder. The only difference between this scenario and the state merging one is that Bob first compresses his state, and sends it to the decoder, who then decompresses it; Alice then merges her state with Bob's state which is now at the decoder. This gives us one possible way for the two parties to jointly compress the states. Time-sharing gives the full rate region, since the bounds evidently cannot be improved.

Analogously, for m parties $A_i$, and all subsets $T \subseteq \{A_1, A_2, \ldots, A_m\}$ holding a combined state with entropy S(T), the rate sums $R_T = \sum_{A_i \in T} R_{A_i}$ clearly have to obey

$R_T \ge S(T|\bar{T})$ for all sets T, (50)

with $\bar{T} = \{A_1, A_2, \ldots, A_m\} \setminus T$ the complement of the set T. This just follows from the converse to Theorem 2: even if the decoder somehow has all the shares $\bar{T}$, a total rate of at least $S(T|\bar{T})$ is necessary to convey the remaining shares T. That this bound can be achieved simply follows from the fact that with $\bar{T}$ at the decoder, each party can in turn merge their state with what will be at the decoder. So, for example, with four parties, an achievable rate point is obtained when party $A_1$ sends her state at rate $S(A_1)$ just by regular Schumacher compression, party $A_2$ merges her state with the first party's state at the decoder with rate $S(A_2|A_1)$, party $A_3$ merges at a rate $S(A_3|A_1 A_2)$, and party $A_4$ at rate $S(A_4|A_1 A_2 A_3)$, with the total rate being the Schumacher rate $S(A_1 A_2 A_3 A_4)$, etc. These rate tuples are however just the corners of the region defined by Eq. (50); hence time sharing between various combinations of ordering the encoders gives the full rate region.
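The bound of Eq. (50) and the corner points obtained by letting the parties merge one after another can be checked mechanically once the subset entropies are known. The sketch below is illustrative only: the function names and the `entropy` dictionary (mapping each subset of parties to its von Neumann entropy in bits, with the empty set mapped to 0) are assumptions made for this example, not notation from the paper.

```python
from itertools import combinations


def conditional_entropy(entropy, subset, parties):
    """S(T | complement of T) = S(all parties) - S(complement of T)."""
    everyone = frozenset(parties)
    return entropy[everyone] - entropy[everyone - subset]


def in_rate_region(rates, entropy, tol=1e-9):
    """Check the condition of Eq. (50) for every non-empty subset T."""
    parties = list(rates)
    for size in range(1, len(parties) + 1):
        for members in combinations(parties, size):
            subset = frozenset(members)
            rate_sum = sum(rates[p] for p in subset)
            if rate_sum < conditional_entropy(entropy, subset, parties) - tol:
                return False
    return True


def corner_point(order, entropy):
    """Rates when parties send in the given order: the first one compresses
    at its Schumacher rate, each later one merges with what is already at
    the decoder."""
    rates, at_decoder = {}, frozenset()
    for party in order:
        rates[party] = entropy[at_decoder | {party}] - entropy[at_decoder]
        at_decoder = at_decoder | {party}
    return rates


# Hypothetical source: A and B share one pure maximally entangled qubit pair,
# so S(A) = S(B) = 1 and S(AB) = 0.
bell_pair = {
    frozenset(): 0.0,
    frozenset("A"): 1.0,
    frozenset("B"): 1.0,
    frozenset("AB"): 0.0,
}
corner = corner_point(["B", "A"], bell_pair)
print(corner, in_rate_region(corner, bell_pair))
```

For this toy source, the second party's rate is negative (one ebit is returned), while the rate sum equals S(AB) = 0.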


[Figure 2: four panels plotting $R_A$ against $R_B$, with axis marks S(A), S(B), S(A|B), S(B|A), the line $R_A + R_B = S(AB)$, and a region labelled "solely quantum regime".]

Fig. 2. The rate region for distributed compression by two parties with individual rates $R_A$ and $R_B$. The total rate $R_{AB}$ is bounded by S(AB). The top left diagram shows the rate region of a source with positive conditional entropies; the top right and bottom left diagrams show the purely quantum case of sources where S(B|A) < 0 or S(A|B) < 0. It is even possible that both S(B|A) and S(A|B) are negative, as shown in the bottom right diagram, but observe that the rate-sum S(AB) has to be positive.

VI. Quantum Source Coding with Side Information at the Decoder

Related to distributed compression is the case where only Alice's state needs to arrive at the decoder, while Bob can send part of his state to the decoder (subject to a rate constraint) in order to help Alice lower her rate. The classical case of this problem was introduced by Wyner [31]. For the quantum case, we demand that the full state $\psi_{ABR}$ be preserved in the protocol, but do not place any restriction on what part of Bob's state may be at the decoder and what part can remain with him, while Alice's has to go to the decoder. To arrive at a formal definition, we would like to speak of two rates $R_A$ and $R_B$ here, of entanglement between Alice and the decoder C and of Bob and the decoder C. Starting with n copies of the source, $\Psi_{A^n B^n R^n} = \psi_{ABR}^{\otimes n}$, we may consider LOCC protocols between A, B and C that take in this state and maximally entangled states of Schmidt rank $K_A$ ($K_B$) between A and C (B and C). The protocol is supposed to produce a high-fidelity approximation of $\Psi_{C^n C' B' R^n}$ tensored with maximally entangled states of Schmidt rank $L_A$ ($L_B$) between A and C (B and C), where $\Psi_{C^n C' B' R^n}$ is obtained from $\psi_{ABR}^{\otimes n}$ by substituting $C^n$ for $A^n$ and applying an isometry (e.g. a unitary operation taking one system to two systems) $B^n \to C' B'$. If in the limit of arbitrary block length the fidelity tends to 1 and $\frac{1}{n}(\log K_A - \log L_A) \to R_A$, $\frac{1}{n}(\log K_B - \log L_B) \to R_B$, we call the rate pair $(R_A, R_B)$ achievable, and the side information problem is to characterise the achievable pairs as concisely as possible. Using state merging we can see that for any isometry $T : B \to U \otimes V$, the rates

$R_A = S(A|U)$ and $R_B = E_P(AU : R) - S(A|U)$ (51)

are achievable, where $\psi_{AUVR} = (\mathrm{id}_A \otimes T \otimes \mathrm{id}_R)\psi_{ABR}$, and

$E_P(AU : R) = \min_{\Lambda} S\big((\mathrm{id}_{AU} \otimes \Lambda)\rho_{AUV}\big)$ (52)


is the so-called entanglement of purification [32] of the state $\rho_{AUR}$ with respect to the split AU-R. The minimum is taken over all channels $\Lambda$ acting on V. The entanglement of purification is in some sense a measure of total correlations, as it can be interpreted as the amount of entanglement needed to create a state, if the only allowed operation is tracing out. The achievability of rates can be seen as follows: the channel $\Lambda$ can be represented, with the help of an environment $B'$, as another isometry $V \to W B'$, so that $\psi_{AUVR}$ is mapped to $\psi_{AUWB'R}$. Now, with many copies, let Bob send the system U to the decoder, at rate S(U), and Alice merge her state to the decoder, at rate $R_A = S(A|U)$. Finally, with the decoder now having AU, let Bob merge W to the decoder, which has rate S(W|AU), so that the total of Bob's rate is $R_B = S(U) + S(W|AU) = S(AUW) - S(A|U)$. The minimisation over W leads to the formula for the entanglement of purification. Here, the isometry T acts on many copies of B, and up to this "regularisation limit", the rate pairs (51) are optimal for those protocols where one-way classical communication from Alice to C and from Bob to C is allowed. To see why this is so, consider that at the end of the protocol, Bob will have sent part of his state to the decoder. This part, U, is obtained by some local isometry of Bob's: $B^n \to UV$. Likewise, Alice will have sent all her $A^n$ to the decoder. The total amount of entanglement used, $n(R_A + R_B)$, cannot be less than the total entropy of what ends up at the receiver, which has entropy $S(A^n U)$, and this is lower bounded by $E_P(A^n U : R^n)$. By the converse of Theorem 2, Alice's entanglement cost, $n R_A$, cannot be less than $S(A^n|U)$. Thus we have proved that the set of achievable pairs for one-way protocols is given by

$\bigcup_{n=1}^{\infty} \frac{1}{n} \left\{ \big( S(A^n|U),\; E_P(A^n U : R^n) - S(A^n|U) \big) \ \text{s.t.}\ T : B^n \to UV \ \text{isometry} \right\}$. (53)

(Note that since the formula doesn't mention V, we may actually look at channels $B^n \to U$.) Because T acts on many copies of B, it is unclear whether a single-letter formula for the achievable rate region can be obtained, potentially by finding a better (that is, lower) expression for Bob's rate. Indeed, in the classical case, this is what happens [31]. For classical random variables X and Y with Alice and Bob, respectively, the single-letterized rate for Bob is given by imagining a channel $Y \to W$. Bob needs to send only I(W : X) bits of W rather than H(W). While the quantum protocol above is clearly optimal, it may be that the entanglement of purification is non-additive, and thus S(U) may be much lower than $n S(U_1)$, where $\rho_{U_1}$ is the state obtained by applying a channel to single copies of $\rho_B$.
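Bob's total rate in the achievability argument above can be verified by expanding the conditional entropies. This is a sketch that uses only the definition $S(X|Y) = S(XY) - S(Y)$ and the systems A, U, W named in the text.

```latex
\begin{align*}
R_B &= S(U) + S(W|AU) \\
    &= S(U) + S(AUW) - S(AU) \\
    &= S(AUW) - \bigl(S(AU) - S(U)\bigr) \\
    &= S(AUW) - S(A|U), \\
R_A + R_B &= S(A|U) + S(AUW) - S(A|U) = S(AUW).
\end{align*}
```

Minimising $S(AUW)$ over the channel that produces W from V then yields $E_P(AU : R)$ as the rate sum, in agreement with Eqs. (51) and (52).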

Source coding with side information at the encoder. In the classical case, if a party aims to send her variable to the decoder, having herself access to some side information is of no additional value. If Alice wants to send classical variable X to Bob, she cannot lower her rate by sending or even knowing additional information. In the quantum world, this is not the case, as can be seen from the side information problem in the case of one party. We consider Alice, who has state $\rho_{A_1}$ and is required to send it to Bob. This she can do using state merging at rate $S(A_1|B)$. However, if she also has access to state $\rho_{A_2}$, which may be entangled or correlated with $\rho_{A_1}$, then she may be able to do better. This better rate is obtained by sending part of $\rho_{A_2}$ as well; so in some cases, less is more!


Applying an isometry $T : A_2 \to A_2' A_2''$, and actually merging $A_1 A_2'$, she can achieve a rate $S(A_1 A_2'|B)$. Hence one would naturally minimize over channels T:

$R \ge \min_T S(A_1 A_2'|B)$. (54)
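A simple illustration of why sending more can cost less is sketched below. It is a hypothetical example, not taken from the paper: assume $A_1 A_2$ is a pure maximally entangled qubit pair that is in a product state with B (and R), and take the isometry to be trivial, so that all of $A_2$ is merged along with $A_1$.

```latex
% Hypothetical example: \rho_{A_1 A_2 B} = |\Phi\rangle\langle\Phi|_{A_1 A_2} \otimes \rho_B,
% with |\Phi\rangle a maximally entangled state of two qubits.
\begin{align*}
S(A_1|B)     &= S(A_1 B) - S(B) = \bigl(1 + S(B)\bigr) - S(B) = 1, \\
S(A_1 A_2|B) &= S(A_1 A_2 B) - S(B) = \bigl(0 + S(B)\bigr) - S(B) = 0.
\end{align*}
```

Merging $A_1$ alone costs one ebit per copy, while merging $A_1$ together with $A_2$ costs nothing, because the joint state of $A_1 A_2$ has lower entropy than $A_1$ alone.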

As argued in the side information problem, the right-hand side is equal to $E_P(A_1 B : R) - S(B)$. Essentially, due to the non-monotonicity of the von Neumann entropy, it can be beneficial to lower the entropy of what you are sending, by merging additional quantum states which are entangled with what you needed to send.

VII. Multipartite Entanglement of Assistance

In this section we consider the multipartite entanglement of assistance [14]. Sometimes it is called localizable entanglement [15], although we operate in the regime of many copies and collective measurements. Consider a pure m-partite state $\psi_{A_1, A_2, \ldots, A_m}$. The entanglement of assistance is defined for two fixed nodes $A_i$ and $A_j$, as the maximal pure entanglement that can be obtained between those nodes by LOCC operations performed by all the parties. Here is a more precise definition:

Definition 9. For an m-partite pure state, consider a measurement performed by LOCC that leads to pure states between chosen nodes $A_i$ and $A_j$ for any outcome k of the measurement. Let the probability of the outcome k be $p_k$, and the entropy of the node i (equal to the entropy of the node j) be denoted by $S_k(A_i)$. The entanglement of assistance between the nodes $A_i$ and $A_j$ is defined as

$E_A(\psi, A_i : A_j) = \sup \sum_k p_k S_k(A_i)$, (55)

where the supremum is taken over the above measurements. The asymptotic entanglement of assistance is given by regularization of the above quantity:

$E_A^{\infty}(\psi, A_i : A_j) = \lim_{n \to \infty} \frac{1}{n} E_A(\psi^{\otimes n}, A_i : A_j)$. (56)

Asymptotic entanglement of assistance was determined for pure states of up to four parties in [33]. Namely, it was proven that for m ≤ 4 the maximal amount of entanglement that can be distilled between Alice and Bob, with the help of the other m − 2 parties $C_1, \ldots, C_{m-2}$, is given by the minimum entanglement across any bipartite cut of the system which separates Alice from Bob:

$E_A^{\infty}(\psi, A : B) = \min_T \{S(AT), S(B\bar{T})\} =: E_{\text{min-cut}}(\psi, A : B)$, (57)

where the minimum is taken over all possible partitions of the other parties into a group T and its complement $\bar{T} = \{C_1, \ldots, C_{m-2}\} \setminus T$. In [13] we generalized this result to an arbitrary number of parties, by use of the primitive of state merging. The result is clearly optimal: one cannot increase entanglement by LOCC. The entropy of any splitting T which divides A from B is a measure of the entanglement of the total pure state between AT and $B\bar{T}$, and it cannot increase during the protocol; in fact, it cannot be increased by any protocol allowing arbitrary joint operations of the two groups AT and $B\bar{T}$ and classical communication. Thus all entropies under such


splitting serve as an upper bound for the amount of entanglement which can be distilled between A and B. The protocol for achieving this optimal rate is as follows: each party in turn merges their state with the remaining parties on its side of the minimal cut, preserving the minimum cut entanglement. The merging protocol we consider will be slightly different from the merging protocol considered previously in two respects. As before, the party who wishes to merge his state with other parties performs a random measurement on their typical subspace. However, since the receiver will consist of many parties who are separated from one another, the final decoding step (i.e. the unitary which the receiver performs conditional on the measurement outcome of the sender) will not be performed until the very end. The second difference is that the senders will perform complete measurements, and will not attempt to distill additional entanglement between themselves and the receiving parties. This will not affect the merging condition, but it does mean that the maximally entangled states which would be created between the merging parties and the receiver will be destroyed. This greatly simplifies the analysis, despite some entanglement being lost. We only consider entanglement of assistance, i.e. a protocol which attempts to distill entanglement between A and B. More complicated protocols can be constructed which also result in entanglement between other parties. Before moving to the protocol, we will need to prove an aspect of state merging already implicit in Theorem 2, which will serve as a cornerstone of (among other things) proving a formula for asymptotic entanglement of assistance: for a tripartite pure state $\psi_{ABR}^{\otimes n}$, if S(R) < S(B), a random rank-1 measurement on the typical subspace $\tilde{A} \subset A^n$ produces states $\psi^j_{B^n R^n}$ such that most of their reduced states $\rho^j_{R^n}$ are close to the state $\rho_R^{\otimes n}$, the reduced state of the initial state $\psi_{ABR}^{\otimes n}$.
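The min-cut quantity of Eq. (57) is easy to evaluate by brute force in a toy setting. The sketch below assumes, purely for illustration, that the global pure state is a collection of maximally entangled pairs shared between parties; for such a state the entropy of any group of parties equals the number of ebits on the pairs crossing the cut. The function names and the example network are hypothetical.

```python
from itertools import combinations


def cut_entropy(side, edges):
    """Entropy (in ebits) of the group `side`, assuming the global state is a
    product of maximally entangled pairs: sum the ebits on edges that cross
    the cut between `side` and everyone else."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))


def min_cut_entanglement(edges, alice, bob, helpers):
    """Minimise S(A T) over all subsets T of the helpers, as in Eq. (57).
    By purity, S(A T) equals the entropy of Bob plus the remaining helpers."""
    best = None
    for size in range(len(helpers) + 1):
        for group in combinations(helpers, size):
            s = cut_entropy({alice, *group}, edges)
            if best is None or s < best:
                best = s
    return best


# Hypothetical network: ebits shared along each link.
edges = {
    ("A", "C1"): 2,
    ("C1", "B"): 1,
    ("A", "C2"): 1,
    ("C2", "B"): 3,
}
print(min_cut_entanglement(edges, "A", "B", ["C1", "C2"]))
```

In this example the minimum is attained for T = {C1}: each of the two paths from A to B is limited by its weakest link, giving 1 + 1 = 2 ebits.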
Proposition 10 (Random measurement gives covering). Let $\psi_{ABR}$ be a tripartite pure state with S(R) < S(B), of which we consider n copies, and consider the state $\Psi_{\tilde{A}\tilde{B}\tilde{R}}$ of the proof of Theorem 2 (Sect. IV) belonging to the typical subspaces $\tilde{A}$, $\tilde{B}$, $\tilde{R}$. Denote by $\rho_{\tilde{R}}$ the state of system $\tilde{R}$. Let $\{|e_j\rangle\}$ be a basis on $\tilde{A}$ chosen at random according to the Haar measure, and $\rho^j_{\tilde{R}}$ be the state obtained on system $\tilde{R}$ upon obtaining outcome j; let $p_j$ be the probability of this event. Then for any $\epsilon > 0$ and all large enough n, we have

$\Big\langle \sum_j p_j \big\| \rho^j_{\tilde{R}} - \rho_{\tilde{R}} \big\|_1 \Big\rangle \le \epsilon$, (58)

where the average is taken over the choice of basis.

Proof. This is just the special case of L = 1 in Proposition 4.

With this tool in hand we can analyze the protocol outlined above. Clearly, if m = 2, there is only one cut, and its entropy is S(A), the entropy of entanglement, and we are done. So, from now on m ≥ 3. Assume for the moment that all S(AT) are distinct (we'll come back to this point at the end), and consider helper $C_{m-2}$. For each set T, clearly $S(AT) = S(B\bar{T})$, by the purity of the overall state. Hence, for the min-cut we can restrict to looking at the entropies S(AT) and S(BT), with $C_{m-2} \notin T$. For each such set $T \subset \{1, \ldots, m-3\}$, consider the relative complement $\bar{T} := \{1, \ldots, m-3\} \setminus T$. This defines a tripartite system composed of $C_{m-2}$, AT and $B\bar{T}$. Let $C_{m-2}$ perform a random measurement


on his typical subspace $\tilde{C}_{m-2}$, as in Proposition 10. We get (if only n is large enough), with arbitrarily high probability, states $\psi^j_{ABC_1 \ldots C_{m-3}}$ which by Eq. (58) satisfy, for all T:

$S(A^n T^n)_{\psi^j} = S(B^n \bar{T}^n)_{\psi^j} = n\big(\min\{S(AT), S(B\bar{T})\} \pm \delta\big)$, (59)

with arbitrarily small δ. In other words, for each such $\psi^j$,

$E_{\text{min-cut}}(\psi^j, A^n : B^n) = n\big(E_{\text{min-cut}}(\psi, A : B) \pm \delta\big)$, (60)

and that means that the min-cut entanglement is almost preserved (up to an arbitrarily small variation in the rate), and hence that the reduced state entropies can be assumed to be all distinct (by choosing δ small enough). Now we recursively apply the same to Cm−3 , . . . , C1 . Finally, for the assumption that all reduced state entropies are pairwise distinct: this can be enforced if the parties first “borrow” an arbitrarily small rate of entanglement to distribute singlets between chosen pairs. Then our distinctness assumption becomes true. In the limit, only a sublinear amount of entanglement is needed to do this, but on the other hand [34] shows that the asymptotic entanglement landscape of multiple parties does not change if one allows this sublinear amount – this is due to them being able to always, perhaps inefficiently, extract some entanglement across any given cut unless across that cut they happen to be in a product state. Remark 11. Note that a crucial part of the argument of why the minimum cut entropy doesn’t change is the use of random codes. This is because C1 ’s procedure is universal – it does not depend on the cut. He makes a measurement which only depends on the typical subspace of his state. The measurement thus serves to merge his state with whichever grouping of subsystems has the larger entropy compared with the remaining systems. Not all quantum codes have this feature – for example Devetak codes [7] depend both on the state of the sender, and that of the receiver. The same applies to [33], which is why there even the argument for m = 4 has to be quite subtle. It may seem odd that after performing a random measurement, ones state goes to any set of parties which has more entropy than the remaining parties. Since there are many possible groupings of the parties, for some groupings a certain party would help receive the state, but for other groupings, that party’s state would be left unchanged by the random measurement. 
Of course, there is no contradiction, as in the end, at the decoding step, one has to decide on the grouping, and with fidelity approaching 1 only for many copies of the state. Conjecture 12. It is awkward that in the recursive procedure described above for m parties we have to first consider a measurement on a long block of states, and then for the second measurement blocks of these blocks, etc. It seems likely that the simplest random measurement strategy will indeed also work: all m − 2 helpers C1 , . . . , Cm−2 measure in a random basis of their respective typical subspaces and broadcast the result to Alice and Bob. They should then end up, with high probability, with a state of the min-cut entanglement.


VIII. Capacity Region for the Multiple Access Channel

We consider a channel with two senders Alice and Bob, and one receiver Charlie; this is the multiple access channel. For the classical multiple access channel, any rates satisfying the following inequalities are achievable for encoding independent messages from Alice and from Bob at their respective terminals to Charlie, who decodes them jointly:

$R_A \le I(A : C|B)$, $R_B \le I(B : C|A)$, $R_A + R_B \le I(AB : C)$. (61)

The quantum multiple access channel, where Alice and Bob want to send quantum information, was considered in [35], and we refer to that paper for the definitions of codes and rate region. In [13], we found that one could use state merging to find a larger achievable region, including negative rates. Namely, for the quantum multiple access channel there is the following region of achievable rates:

$R_A \le I(A\rangle C|B) := I(A\rangle BC)$, $R_B \le I(B\rangle C|A) := I(B\rangle AC)$, $R_A + R_B \le I(AB\rangle C)$. (62)
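Membership in the region of Eq. (62) reduces to three inequalities once the coherent informations are known. The sketch below is illustrative: the function name and the numerical values are hypothetical and are not computed from any particular channel. They are chosen only to show a rate pair in which one rate is negative.

```python
def in_mac_region(r_a, r_b, i_a_bc, i_b_ac, i_ab_c, tol=1e-9):
    """Check the three inequalities of Eq. (62).

    i_a_bc stands for I(A>BC), i_b_ac for I(B>AC), i_ab_c for I(AB>C).
    """
    return (
        r_a <= i_a_bc + tol
        and r_b <= i_b_ac + tol
        and r_a + r_b <= i_ab_c + tol
    )


# Hypothetical values (not derived from a real channel): I(B>C) is negative.
i_a_bc = 1.0   # I(A>BC)
i_b_c = -0.5   # I(B>C)
i_b_ac = 0.8   # I(B>AC)
# The sum bound is I(AB>C) = I(B>C) + I(A>BC), by splitting off Bob's part first.
i_ab_c = i_b_c + i_a_bc

# Corner point in which Bob invests 0.5 qubits so that Alice can send at rate 1.
print(in_mac_region(i_a_bc, i_b_c, i_a_bc, i_b_ac, i_ab_c))
# Alice cannot reach rate 1 if Bob invests nothing.
print(in_mac_region(1.0, 0.0, i_a_bc, i_b_ac, i_ab_c))
```

The first pair saturates both the bound on $R_A$ and the sum-rate bound, which is what one expects of a corner of the region.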

The state on which the quantities are evaluated is constructed as follows. Consider two pure states $\psi_{AA'}$ and $\psi_{BB'}$. Let $\rho_{ABC}$ be the state resulting from the halves $A'$ and $B'$ being sent down the channel:

$\rho_{ABC} = (I_{AB} \otimes \Lambda_{A'B' \to C})\big(|\psi\rangle\langle\psi|_{AA'} \otimes |\psi\rangle\langle\psi|_{BB'}\big)$. (63)

In the classical theory, only positive rates make sense. In the quantum case, the rates can be meaningful even if one of them is negative. For example, when $R_A$ is negative and $R_B$ is positive, this means that when Alice invests $|R_A|$ qubits, then Bob can send $R_B$ qubits, as we shall see. A. Remarks on coherent information. In [5] the coherent information was introduced and defined in terms of an input state $\rho_A$ and a channel producing output $\rho_B$ as

$I(A\rangle B) = S(B) - S(AB)$, (64)

that is, as the conditional entropy with a minus sign; this was puzzling because it can be negative. Since it gives the channel capacity of a quantum channel (by maximizing it over input distributions $\rho_A$), it was unclear how to interpret negative uses of a channel. We will see that the negative part will acquire operational meaning, in full accordance with the positive part. We have also defined the conditional coherent information as

$I(A\rangle B|C) = S(B|C) - S(AB|C)$. (65)

We have the useful identity [consistent with Eq. (62)]

$I(A_1\rangle B|A_2) = I(A_1\rangle B A_2)$. (66)

That is, conditioning the coherent information is very simple: just erase the bar. Then we have a chain rule of the same form as the one for mutual information,

$I(A_1 A_2\rangle B) = I(A_2\rangle B) + I(A_1\rangle B|A_2)$. (67)
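Both the "erase the bar" identity and the chain rule follow directly from the definitions in Eqs. (64) and (65). The short derivation below is a sketch using only $S(X|Y) = S(XY) - S(Y)$, with the coherent information written as $I(A\rangle B)$.

```latex
% Notation: I(A\rangle B) := S(B) - S(AB), as in Eq. (64).
\begin{align*}
I(A_1\rangle B|A_2)
  &= S(B|A_2) - S(A_1 B|A_2) \\
  &= \bigl[S(A_2 B) - S(A_2)\bigr] - \bigl[S(A_1 A_2 B) - S(A_2)\bigr] \\
  &= S(A_2 B) - S(A_1 A_2 B) = I(A_1\rangle B A_2), \\[4pt]
I(A_2\rangle B) + I(A_1\rangle B|A_2)
  &= \bigl[S(B) - S(A_2 B)\bigr] + \bigl[S(A_2 B) - S(A_1 A_2 B)\bigr] \\
  &= S(B) - S(A_1 A_2 B) = I(A_1 A_2\rangle B).
\end{align*}
```

The term $S(A_2)$ cancels in the first identity, which is why conditioning on $A_2$ amounts to moving $A_2$ to the receiver's side.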


What seems surprising is that conditioning can only increase coherent information! However, this can be explained as follows. In classical information theory there must be situations where conditioning decreases information, due to the lack of monogamy. Indeed, we can have a situation where

$I(X_1 : Y) + I(X_2 : Y) > I(X_1 X_2 : Y)$. (68)

(E.g., the three variables could be fully correlated.) Therefore, to save the chain rule, conditioning must decrease mutual information. However, in the quantum case we always have

$I(A_1\rangle B) + I(A_2\rangle B) \le I(A_1 A_2\rangle B)$, (69)

due to strong subadditivity. Now conditioning very often increases coherent information, because we have equality in the chain rule identity (67). B. Direct coding theorem: achievability of rates. To check that the rates satisfying the above conditions are achievable, it is enough to consider one corner, for example

$R_A = I(A\rangle BC)$, $R_B = I(B\rangle C)$, (70)

which is an upper corner of the rate region, see Fig. 3. When both $I(A\rangle BC)$ and $I(B\rangle C)$ are negative, they are trivially achievable: Alice and Bob do nothing. So in this case negativity of rates does not appear meaningful, as zero is achievable too, and one always optimises rates over input states. When $I(A\rangle BC)$ is negative and $I(B\rangle C)$ is positive, again, those rates can be achieved by Alice doing

[Figure 3: four panels plotting $R_A$ against $R_B$, with axis marks $I(A\rangle BC)$, $I(A\rangle C)$, $I(B\rangle C)$, $I(B\rangle AC)$ and the line $R_A + R_B = I(AB\rangle C)$.]

Fig. 3. The rate region for the multiple-access channel for two parties with individual rates $R_A$ and $R_B$. The total rate $R_{AB}$ is bounded by $I(AB\rangle C)$. The top left diagram shows the rate region when both rates are positive; the top right and bottom left diagrams show the case where $I(B\rangle C) < 0$ or $I(A\rangle C) < 0$. I.e. here, Bob (Alice) can invest entanglement so that the other party can send at a rate $I(A\rangle BC) \ge R_A \ge I(A\rangle C)$ ($I(B\rangle AC) \ge R_B \ge I(B\rangle C)$). In the bottom right diagram, both parties may have the option of achieving the higher rate by having the other party invest entanglement.


nothing, and Bob using the standard quantum coding theorem. So again the negative rate is not interesting. There are therefore two situations which we have to consider:

$I(A\rangle BC) \ge 0$ and $I(B\rangle C) \ge 0$, (71)

or

$I(A\rangle BC) \ge 0$ and $I(B\rangle C) < 0$. (72)

It is enough to consider the first one in detail, as the second one is its simple consequence. Let us first describe how to achieve those rates when Bob and Alice communicate quantum messages to C and classical side-communication is permitted. Alice and Bob prepare (n copies of) states $\psi_{AA'}$ and $\psi_{BB'}$, respectively, and send halves of them down the channel (inputs $A'^n$ and $B'^n$). Then Bob performs the merging protocol, i.e. he makes the measurement on his typical subspace in blocks of size $2^{nR_B}$. As previously, we label blocks (codes) by j. On average, he obtains a state close to a $2^{nR_B}$-dimensional maximally entangled state shared with Charlie (who holds the system C), and Bob's part of the state $\psi_{ABCR}$ is merged with Charlie ($\psi_{ABCR}$ is a purification of $\rho_{ABC}$). Then, Alice shares with Charlie the state $\rho_{ABC}$, where both parts B and C are now with Charlie. A random measurement by Alice in blocks of size $2^{nR_A}$ will create a state close to the maximally entangled state of this dimension between Alice and Charlie, after Alice communicates her results to Charlie. In this way she also merges her part to Charlie; however, this is not important in the present context. Let us now show how Alice and Bob can share with Charlie the maximally entangled state of suitable dimensions without classical communication. Namely, both Alice and Bob can perform their measurements before sending halves of their states $\psi_{AA'}$ and $\psi_{BB'}$ down the channel. They can then send the states $\psi^{j_A}_{AA'}$, $\psi^{j_B}_{BB'}$ that they have obtained (here $j_A$ and $j_B$ denote the outcomes of measurement). This still requires communication, as they have to tell Charlie what outcomes they obtained. However, instead of measuring, they can directly prepare $\psi^{j_A}_{AA'}$, $\psi^{j_B}_{BB'}$ with fixed $j_A$ and $j_B$ known to Charlie. This will have the same effect as before, once they choose labels that guarantee that the merging conditions are satisfied.
Note that the states that Alice and Bob are now sending are close to maximally entangled states (this is guaranteed by the merging condition). The maximally entangled states to which they are close define the subspaces which go through the channel and allow correction of errors. The subspaces are codes that, when used by Alice and Bob, allow them to obtain the above rates. Since our criterion was fidelity with the maximally entangled state, we have obtained here the coding theorem with small average error. In our case it was relatively easy to go from one-way to zero classical communication, because the states that Alice and Bob obtain in our one-way protocol are close to maximally entangled states. For more complicated situations see [36]. Finally, consider the case where $I(B\rangle C)$ is negative, Eq. (72). The reasoning is very similar: in the scenario with classical side-communication, Bob sends $-I(B\rangle C) + \epsilon$ halves of maximally entangled states through a noiseless channel (keeping the other halves), and performs merging, so that after that Alice can achieve her rate as above. However, again Alice and Bob, instead of performing measurements, can send the state that would emerge under some outcome of the measurement. The difference is that Bob will send the state not only down the noisy channel, but also down the supplementary noiseless channel, and will share a rate $\epsilon$ of maximally entangled states (thus his overall rate is negative). This is the more interesting rate point: for Alice to achieve the rate $I(A\rangle BC)$, she requires Charlie to have C and B. Bob assists in providing this information (which can be understood as additional error correcting information from inside the channel)


but that comes at a price, which is exactly $-I(B\rangle C)$. We thus have an interpretation of negative channel capacities. C. Converse coding theorem. Here we briefly argue that (up to regularization) the rate region described by our conditions is optimal. The reasoning is quite standard (see e.g. [4, 37]), therefore we will provide only a sketch of the proof. Suppose that some rates $R_A$ and $R_B$ are achievable. Consider first the case where they are both positive. This means that Alice and Bob can send halves of singlets down the channels in such a way that after decoding by Charlie, they share with Charlie those singlets with fidelity tending asymptotically to one. Alice shares a singlet of dimension $2^{nR_A}$ with Charlie, and Bob one of dimension $2^{nR_B}$. If they had exact singlets, the coherent informations would be equal to $I(A\rangle BC) = I(A\rangle C) = nR_A$, $I(B\rangle C) = nR_B$ and $I(AB\rangle C) = n(R_A + R_B)$. Because they share inexact singlets, we apply asymptotic continuity of coherent information [37] (which plays here the role of Fano's inequality), thanks to which the coherent informations of the real state, per use of channel, approach the ideal values in the asymptotic limit. This means that there exist states such that, if Alice and Bob send halves of them down the channel, then after Charlie's decoding, the coherent informations approach the values from the coding theorem. There are still two issues. First, the states may be mixed: Alice and Bob prepared singlets, but the encoding procedure may turn them into mixed states. However, coherent information is convex, so that Alice and Bob will not do worse by sending some pure states. Second, we considered the joint ABC state after Charlie's decoding, while in the coding theorem, we have state merging just from sending by Alice and Bob.
However, due to the data processing inequality [4] (saying that operating on V one cannot increase $I(U\rangle V)$), the coherent information of the state before Charlie's decoding can be only greater. Let us now consider the case when one of the rates (suppose $R_B$) is negative. This means that Bob uses the noiseless qubit channel an additional $|R_B|$ times (per use of the noisy channel), and Alice achieves her rate. It suffices to show that, if the rate pair $(R_A, R_B)$ where $R_B$ is negative is achievable, then Alice and Bob can create a joint state of the ABC system such that $I(A\rangle BC) = R_A$ and $I(B\rangle C) = R_B$ per use of channel. To this end, consider a new channel which consists of the old one supplemented by $-R_B + \epsilon$ uses of the noiseless channel from Bob to Charlie. For the new channel, the rates $(R_A, \epsilon)$ are achievable. They are positive, so that, as explained above, there exist states of Alice and Bob that, sent down the channel, produce a joint state having $I(A\rangle C) = R_A$ and $I(B\rangle C) = \epsilon$. Suppose now that Bob does not send the part of the system that was intended to go through the noiseless channel, but keeps it. In this situation they only use the original channel. We will now see that they achieve the needed coherent informations in this way. Of course $I(A\rangle BC) = R_A$, as this quantity does not depend on whether a given system is with Bob or with Charlie. Let us now estimate the quantity $I(B\rangle C)$. By sending $-R_B + \epsilon$ qubits, Bob could increase it up to $\epsilon$. However, by sending one qubit, one can increase coherent information by no more than one. Thus, the coherent information $I(B\rangle C)$ cannot be smaller than $R_B$. This ends the proof of the converse theorem.

IX. Strong Subadditivity

Using state merging, we can get a very quick and operationally intuitive proof of strong subadditivity [38], which can be written as


M. Horodecki, J. Oppenheim, A. Winter

$$S(A|BC) \le S(A|B). \tag{73}$$

Strong subadditivity is simply the observation that if Bob has access to an additional register C, then Alice surely doesn’t need to send more partial information for him to get the full state $\rho_{AB}$. After all, Bob could always ignore the ancilla on C, but if he uses it, Alice may need to send him less. Mathematically, we can use this argument because in the proof that $S(A|B)$ is the optimal merging rate we have used only typical subspaces and elementary probability for the direct part, and ordinary subadditivity in the converse part.

X. Conclusion

It is very interesting to compare the proof of the classical Slepian-Wolf theorem with the proof of its quantum version, state merging. The Slepian-Wolf protocol is as follows: the typical sequences of Alice are divided into blocks of size $\approx 2^{nI(A:B)}$. Note that this is the size of a good code. Now, when a particular sequence occurs, Alice lets Bob know which code the sequence is in, and this is enough for him to determine her sequence. Thus the Slepian-Wolf theorem follows solely from the fact that a random code is a good code, which was shown by Shannon. Interestingly, our protocol is based on the same property, especially for states for which coherent information is positive. (This could be regarded as a situation analogous to the classical case, as the classical mutual information is always positive.) Namely, to prove quantum state merging it is enough to know that a random quantum code is a good quantum code. And in the quantum state merging protocol Alice performs an analogous task: she measures in which quantum code her state is, and tells Bob the result. What is now extremely surprising is that those similarities turn out to be quite superficial. Namely, in the Slepian-Wolf protocol, the number of bits needed to tell Bob the information “which code” is just the cost of transmission of Alice’s data to Bob.
In the quantum case, the information “which code”, since represented by classical bits, is not counted at all, as we count only the quantum information. Thus in this case (positive coherent information) merging does not cost at all, unlike in the classical case. What is more remarkable still is that despite this difference, the cost of sending partial quantum information is the conditional entropy, and thus formally similar to the classical case. This despite the fact that the classical case does not emerge as a limit from the quantum case. In other words, if one takes quantum state merging, and applies it to classical states (i.e. states which are fully decohered, and contain only classical correlations), then the goal is rather different, as one is attempting to retain entanglement between this classical state and the reference system, and one is further allowing free classical communication. We have two ways of interpreting the classical mutual information: (i) either as the quantity responsible for capacity or (ii) as the quantity that reports the part of information that is common both to Alice and Bob. Indeed, the latter meaning is implied by the fact that the cost of communication needed to transfer full information to Bob is H (X ) (full information content of Alice’s state) reduced by the amount of mutual information. Thus the latter represents that part of Alice’s information that Bob also knows, and it need not be transferred to him. It turns out that in the quantum case those two notions are no longer represented by the same quantity (see however [39]). Namely, the communication cost is equal to Alice’s information reduced by quantum mutual information. Thus quantum mutual information serves as common information. The capacity is on the other hand represented by the coherent information. The first quantity is sometimes greater than the whole of Alice’s

Quantum State Merging and Negative Information


information, and precisely in those instances, the second quantity has the chance to be positive. It is indeed the beauty of the quantum information world, that both the quantities, into which the classical quantity has split, do their job in an analogous way as it was in the classical case. Indeed, the analogue of common information counts by how much the transmission cost is reduced – exactly as in the classical case, while the analogue of capacity is responsible for protocol, with the same basic elements as in the classical case. The additional brick in the quantum protocol is teleportation, which is perhaps the thread that binds the two notions together. However, as we have noted, the analogy in the protocol is quite superficial. Even though Alice performs the operations that can be called by use of the same name (checking “which code”, and telling it to Bob) the meaning of those operations is completely different. It is extremely mysterious how the quantum and classical cases can have so much in common, and at the same time can be so different. Acknowledgements. MH acknowledges EC grants RESQ, QUPRODIS, EC IP SCALA and (solicited) grant of Polish Ministry of Science and Education contract no. PBZ-Min-008/P03/03, JO acknowledges the support of the Royal Society, the Cambridge-MIT Institute, and EU grant PROSECCO and QAP. AW thanks the EC for support through the RESQ project, the U.K. EPSRC for support through the “QIP IRC”, and the University of Bristol for a Research Fellowship.

Appendix A: Miscellaneous Facts about Norms and Fidelity

The following lemma relates the trace norm to the Hilbert-Schmidt norm. Recall that these norms are defined, for an operator $X$, as

$$\|X\|_1 := \mathrm{Tr}\sqrt{X^\dagger X} \quad \text{(trace norm)},$$
$$\|X\|_2 := \sqrt{\mathrm{Tr}\,X^\dagger X} \quad \text{(Hilbert-Schmidt norm)}.$$

Lemma 13. For any operator $X$,

$$\|X\|_1^2 \le d\,\|X\|_2^2, \tag{A1}$$

where $d$ is the dimension of the support of the operator $X$ (the subspace on which $X$ has nonzero eigenvalues).

Proof. It is implied by convexity of the function $x^2$, where one takes probabilities $1/d$.

The fidelity of two states is given by

$$F(\rho,\sigma) = \left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2. \tag{A2}$$

Notice that if one of the states is pure, say $\sigma = |\phi\rangle\langle\phi|$, then

$$F(\rho,|\phi\rangle\langle\phi|) = \langle\phi|\rho|\phi\rangle = \mathrm{Tr}\,(\rho|\phi\rangle\langle\phi|). \tag{A3}$$

Lemma 14. The fidelity is related to the trace norm as follows [40]:

$$1 - \sqrt{F(\rho,\sigma)} \le \frac{1}{2}\|\rho-\sigma\|_1 \le \sqrt{1 - F(\rho,\sigma)}. \tag{A4}$$


Lemma 15 (Gentle measurement). Let $\rho$ be a (subnormalized) state, i.e. $\rho \ge 0$ and $\mathrm{Tr}\,\rho \le 1$, and let $0 \le X \le I$. Then, if $\mathrm{Tr}\,\rho X \ge 1 - \epsilon$,

$$\left\|\sqrt{X}\rho\sqrt{X} - \rho\right\|_1 \le 2\sqrt{\epsilon}. \tag{A5}$$

Proof. See [41], Lemma 9; the better constant above is from [42].

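Lemma 16 (Fannes [43]). For states $\rho$ and $\sigma$ on a $d$-dimensional space, such that $\|\rho - \sigma\|_1 \le \epsilon$,

$$|S(\rho) - S(\sigma)| \le \eta(\epsilon) + \epsilon\log d, \quad\text{with}\quad \eta(x) := \begin{cases} -x\log x & \text{if } x \le \frac{1}{e},\\[2pt] \frac{\log e}{e} & \text{if } x \ge \frac{1}{e}. \end{cases} \tag{A6}$$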

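Appendix B: The Twirling Average of Equation (24)

We use the fact that the operator

$$T(X) := \int dU\,(U_A\otimes U_{A'})^\dagger X\,(U_A\otimes U_{A'}) \tag{B1}$$

is $U\otimes U$-invariant [44]. However, the representation of $U\otimes U$ decomposes into the two irreducible components, the symmetric and the antisymmetric subspace. By Schur’s lemma, the only invariant operators are then linear combinations of the projections onto these subspaces:

$$\Pi^{\rm sym}_{AA'} = \frac{1}{2}\left(I_{AA'} + F_{AA'}\right), \qquad \Pi^{\rm anti}_{AA'} = \frac{1}{2}\left(I_{AA'} - F_{AA'}\right). \tag{B2}$$

Hence, the twirling map $T$ can be written

$$T(X) = \frac{1}{\mathrm{Tr}\,\Pi^{\rm sym}_{AA'}}\,\Pi^{\rm sym}_{AA'}\,\mathrm{Tr}\left(X\Pi^{\rm sym}_{AA'}\right) + \frac{1}{\mathrm{Tr}\,\Pi^{\rm anti}_{AA'}}\,\Pi^{\rm anti}_{AA'}\,\mathrm{Tr}\left(X\Pi^{\rm anti}_{AA'}\right). \tag{B3}$$

This is enough to evaluate our average:

$$\begin{aligned}
\int dU\,(U_A\otimes U_{A'})^\dagger F_{A_1A_1'}(U_A\otimes U_{A'})
&= \frac{2}{d_A(d_A+1)}\,\Pi^{\rm sym}_{AA'}\,\mathrm{Tr}\left(F_{A_1A_1'}\Pi^{\rm sym}_{AA'}\right) + \frac{2}{d_A(d_A-1)}\,\Pi^{\rm anti}_{AA'}\,\mathrm{Tr}\left(F_{A_1A_1'}\Pi^{\rm anti}_{AA'}\right)\\
&= \frac{2}{d_A(d_A+1)}\,\frac{L+L^2}{2}\,\Pi^{\rm sym}_{AA'} + \frac{2}{d_A(d_A-1)}\,\frac{L-L^2}{2}\,\Pi^{\rm anti}_{AA'}\\
&= \frac{L(L+1)}{d_A(d_A+1)}\,\frac{I_{AA'}+F_{AA'}}{2} - \frac{L(L-1)}{d_A(d_A-1)}\,\frac{I_{AA'}-F_{AA'}}{2}\\
&= \frac{L}{d_A}\,\frac{d_A-L}{d_A^2-1}\,I_{AA'} + \frac{L}{d_A}\,\frac{Ld_A-1}{d_A^2-1}\,F_{AA'}.
\end{aligned} \tag{B4}$$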
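The algebra leading from (B3) to the last line of (B4) can be checked with exact rational arithmetic. The following is only a sketch under stated assumptions: the function names are hypothetical, `L` stands for the dimension of $A_1$ and `d` for $d_A \ge 2$, and the inputs $\mathrm{Tr}(F_{A_1A_1'}\Pi^{\rm sym}) = (L+L^2)/2$ and $\mathrm{Tr}(F_{A_1A_1'}\Pi^{\rm anti}) = (L-L^2)/2$ are taken from (B4) rather than recomputed from matrices.

```python
from fractions import Fraction


def twirl_coefficients(L, d):
    """Coefficients (c_I, c_F) of I and F obtained from the projector form (B3).

    Assumes d >= 2 so that the antisymmetric subspace is non-trivial.
    """
    tr_sym = Fraction(d * (d + 1), 2)    # Tr Pi_sym
    tr_anti = Fraction(d * (d - 1), 2)   # Tr Pi_anti
    w_sym = Fraction(L + L * L, 2) / tr_sym     # weight multiplying Pi_sym
    w_anti = Fraction(L - L * L, 2) / tr_anti   # weight multiplying Pi_anti
    # w_sym * (I + F)/2 + w_anti * (I - F)/2, collected by operator
    return (w_sym + w_anti) / 2, (w_sym - w_anti) / 2


def closed_form(L, d):
    """Coefficients of I and F as stated in the last line of (B4)."""
    denom = d * (d * d - 1)
    return Fraction(L * (d - L), denom), Fraction(L * (L * d - 1), denom)
```

As a consistency check, the twirled operator must have the same trace ($L$) and the same overlap with the full swap ($L^2$) as $F_{A_1A_1'}$ itself, since twirling preserves both quantities.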


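Appendix C: Typicality

We shall need the concept and a few properties of typical subspaces [3]. Consider $n$ copies of a density matrix $\rho$, $\rho^{\otimes n}$. Writing $\rho$ in its eigenbasis, $\rho = \sum_i p_i|i\rangle\langle i|$, we note first of all that $S(\rho) = H(p_i)$. Now,

$$\rho^{\otimes n} = \sum_{i^n} p_{i^n}\,|i^n\rangle\langle i^n|, \tag{C1}$$

with

$$i^n = i_1\ldots i_n, \qquad p_{i^n} = p_{i_1}\cdots p_{i_n}, \qquad |i^n\rangle = |i_1\rangle\cdots|i_n\rangle. \tag{C2}$$

For $\delta > 0$, the set of typical sequences is defined as (see [45])

$$T^n_\delta := \left\{ i^n : \left|-\log p_{i^n} - nS(\rho)\right| \le n\delta \right\}, \tag{C3}$$

and the typical projector [3] is

$$\Pi^n_\delta := \sum_{i^n\in T^n_\delta} |i^n\rangle\langle i^n|. \tag{C4}$$

The typical projector inherits its properties from the set of typical sequences. We quote the following from [3], and from [30] for the exponential bounds (see also [45]): abbreviating $\Pi = \Pi^n_\delta$,

$$\mathrm{Tr}\,(\rho^{\otimes n}\Pi) \ge 1 - \exp(-c\delta^2 n) \quad\text{with a constant } c, \tag{C5}$$
$$\Pi\rho^{\otimes n}\Pi \le \rho^{\otimes n}, \tag{C6}$$
$$\Pi\rho^{\otimes n}\Pi \le 2^{-n[S(\rho)-\delta]}\,\Pi, \tag{C7}$$
$$\Pi\rho^{\otimes n}\Pi \ge 2^{-n[S(\rho)+\delta]}\,\Pi, \tag{C8}$$
$$\mathrm{rank}\,\Pi = \mathrm{Tr}\,\Pi \le 2^{n[S(\rho)+\delta]}, \tag{C9}$$
$$\mathrm{rank}\,\Pi = \mathrm{Tr}\,\Pi \ge \left(1 - e^{-c\delta^2 n}\right)2^{n[S(\rho)-\delta]}. \tag{C10}$$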
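For a qubit state that is diagonal in its eigenbasis, the typical set (C3) can be enumerated directly, which gives a concrete feel for the bounds above. The sketch below is illustrative only: the function name and the sample parameters are assumptions, logarithms are taken to base 2, and the constant $c$ of (C5) and (C10) is not computed. The returned size equals $\mathrm{rank}\,\Pi$ and the returned mass equals $\mathrm{Tr}\,(\rho^{\otimes n}\Pi)$.

```python
import math


def typical_set_stats(p1, n, delta):
    """Entropy, size and probability mass of the typical set (C3) for a binary spectrum.

    p1 is the eigenvalue attached to |1>; sequences are grouped by their number of 1s,
    because all sequences with the same count have the same probability.
    """
    p0 = 1.0 - p1
    entropy = -(p0 * math.log2(p0) + p1 * math.log2(p1))
    size, mass = 0, 0.0
    for k in range(n + 1):
        log_prob = (n - k) * math.log2(p0) + k * math.log2(p1)
        if abs(-log_prob - n * entropy) <= n * delta:
            count = math.comb(n, k)
            size += count
            mass += count * 2.0 ** log_prob
    return entropy, size, mass


entropy, size, mass = typical_set_stats(p1=0.25, n=12, delta=0.2)
```

With these sample values only the sequences containing two, three or four 1s are typical. The size then satisfies the upper bound (C9), and the relation $\mathrm{Tr}\,(\rho^{\otimes n}\Pi) \le 2^{-n[S(\rho)-\delta]}\,\mathrm{rank}\,\Pi$, which follows from (C7), gives the matching lower bound on the rank.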

References

1. Shannon, C.E.: Bell Syst. Tech. J. 27, 379 (1948)
2. Slepian, D., Wolf, J.: IEEE Trans. Inf. Theory 19, 461 (1971)
3. Schumacher, B.W.: Phys. Rev. A 51, 2738 (1995)
4. Barnum, H., Nielsen, M.A., Schumacher, B.: Phys. Rev. A 57, 4153 (1998)
5. Schumacher, B., Nielsen, M.A.: Phys. Rev. A 54, 2629 (1996)
6. Shor, P.W.: Talk at MSRI Workshop on Quantum Computation. Available online under http://www.msri.org/publications/ln/msri/2002/quantumcrypto/shor/1/, 2002
7. Devetak, I.: IEEE Trans. Inf. Theory 51, 44 (2005)
8. Lloyd, S.: Phys. Rev. A 55, 1613 (1997)
9. Ahn, C., Doherty, A., Hayden, P., Winter, A.: http://arxiv.org/list/quant-ph/0403042, 2004
10. Cerf, N., Adami, C.: Phys. Rev. Lett. 79, 5194 (1997)
11. Wehrl, A.: Rev. Mod. Phys. 50, 221 (1978)
12. Horodecki, R., Horodecki, P.: Phys. Lett. A 194, 147 (1994)
13. Horodecki, M., Oppenheim, J., Winter, A.: Nature 436, 673 (2005)
14. DiVincenzo, D.P., Fuchs, C.A., Mabuchi, H., Smolin, J.A., Thapliyal, A.V., Uhlmann, A.: In: Proc. 1st NASA International Conference on Quantum Computing and Quantum Communication, Williams, C.P. (ed.), LNCS 1509, pp. 247–257. Berlin-Heidelberg-New York: Springer Verlag, 1998
15. Verstraete, F., Popp, M., Cirac, J.I.: Phys. Rev. Lett. 92, 027901 (2004)
16. Bennett, C.H., Brassard, G., Crépeau, C., Jozsa, R., Peres, A., Wootters, W.K.: Phys. Rev. Lett. 70, 1895 (1993)
17. Schumacher, B., Westmoreland, M.D.: Quantum Inf. Process. 1, 5 (2002)
18. Uhlmann, A.: Rep. Math. Phys. 9, 273 (1976)
19. Jozsa, R.: J. Mod. Optics 41, 2315 (1994)
20. Lo, H.-K., Popescu, S.: Phys. Rev. Lett. 83, 1459 (1999)
21. Bennett, C.H., Bernstein, H.J., Popescu, S., Schumacher, B.: Phys. Rev. A 53, 2046 (1996)
22. Devetak, I.: Personal communication
23. Abeyesinghe, A., Devetak, I., Hayden, P., Winter, A.: In preparation, 2005
24. Devetak, I., Harrow, A.W., Winter, A.: http://arxiv.org/list/quant-ph/0512015, 2005
25. Devetak, I., Harrow, A.W., Winter, A.: Phys. Rev. Lett. 93, 230504 (2004)
26. Harrow, A.W.: Phys. Rev. Lett. 92, 097902 (2004)
27. Ekert, A.: Phys. Rev. Lett. 67, 661 (1991)
28. Groisman, B., Popescu, S., Winter, A.: Phys. Rev. A 72, 032317 (2005)
29. Horodecki, M., Horodecki, P., Horodecki, R., Oppenheim, J., Sen (De), A., Sen, U., Synak, B.: http://arxiv.org/list/quant-ph/0410090, 2004
30. Winter, A.: Ph.D. dissertation, Universität Bielefeld, http://arxiv.org/list/quant-ph/9907077, 1999
31. Wyner, A.D.: IEEE Trans. Inf. Theory 21, 294 (1975)
32. Terhal, B.M., Horodecki, M., DiVincenzo, D.P., Leung, D.W.: J. Math. Phys. 43, 4286 (2002)
33. Smolin, J.A., Verstraete, F., Winter, A.: Phys. Rev. A 72, 052317 (2005)
34. Smolin, J.A., Thapliyal, A.V.: Phys. Rev. A 68, 062324 (2003)
35. Yard, J., Devetak, I., Hayden, P.: http://arxiv.org/list/quant-ph/0501045, 2005
36. Demianowicz, M., Horodecki, P.: http://arxiv.org/list/quant-ph/0603106, 2006
37. Horodecki, M., Horodecki, P., Horodecki, R.: Phys. Rev. Lett. 85, 433 (2000)
38. Lieb, E.H., Ruskai, M.B.: J. Math. Phys. 14, 1938 (1973)
39. Bennett, C.H., Shor, P.W., Smolin, J.A., Thapliyal, A.V.: IEEE Trans. Inf. Theory 48, 2637 (2002)
40. Fuchs, C.A., van de Graaf, J.: IEEE Trans. Inf. Theory 45, 1216 (1999)
41. Winter, A.: IEEE Trans. Inf. Theory 45, 2481 (1999)
42. Ogawa, T., Nagaoka, H.: In: Proc. SITA 2001 (2001), p. 599, http://arxiv.org/list/quant-ph/0208139, 2002
43. Fannes, M.: Commun. Math. Phys. 31, 291 (1973)
44. Werner, R.: Phys. Rev. A 40, 4277 (1989)
45. Cover, T.M., Thomas, J.A.: Elements of Information Theory. New York: Wiley Interscience (1991)

Communicated by M.B. Ruskai

Commun. Math. Phys. 269, 137–152 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0123-0


Energy Splitting, Substantial Inequality, and Minimization for the Faddeev and Skyrme Models

Fanghua Lin¹, Yisong Yang²

¹ Courant Institute of Mathematical Sciences, New York University, New York, New York 10021, USA. E-mail: [email protected]
² Department of Mathematics, Polytechnic University, Brooklyn, New York 11201, USA. E-mail: [email protected]

Received: 8 January 2006 / Accepted: 10 April 2006
Published online: 10 October 2006 – © Springer-Verlag 2006

Abstract: In this paper, we prove that the Faddeev energy $E_1$ at the unit Hopf charge is attainable. The proof is based on an important inequality, called the substantial inequality in our previous paper, which describes how the Faddeev energy splits into its sublevels in terms of energy and topology when compactness fails. With the help of an optimal Sobolev estimate of the Faddeev energy lower bound and an upper bound of $E_1$, we show that $E_1$ is attainable. For the two-dimensional Skyrme model, we prove that the substantial inequality is also valid, which allows us to greatly improve the range of the coupling parameters for the existence of unit-charge solitons previously guaranteed in a smaller range of the coupling parameters by the validity of the concentration-compactness method.

1. Introduction

Global energy minimizers are important in field theory as they provide leading-order contributions to the transition amplitudes calculated through functional integrals or partition functionals for the quantization of fundamental particle systems [9]. Some prototype examples include kinks, vortices, monopoles, and instantons, which are static solitons characterized by various topological invariants. Except for the one-dimensional (1D) kink case which is completely integrable, in all the other cases, global energy minimizers can only be obtained in the so-called BPS limits. The main difficulty we encounter in problems of this kind is a lack of compactness because the energy functionals are all defined over the full Euclidean spaces. For the well-known Skyrme model and the Faddeev model, the situation is even less transparent because these models do not have a BPS-limit structure. Therefore, one is forced to study the direct minimization problem for these models.
From an analytic point of view, the first temptation would be to try to see whether the concentration-compactness method [14] works because this method is developed to tackle similar minimization problems defined over full spaces which says that a minimizing sequence converges (hence compactness holds) if after suitable


translations it concentrates in a local region (that is, if concentration takes place). For our problems, however, it is not directly possible to establish such a concentration-compactness picture. In fact, we are forced to study the situation when concentration-compactness fails and an energy splitting or dichotomy takes place. It is interesting that the topological structure of these problems now becomes important, which allows us to deduce concentration-compactness indirectly from an inequality we call “the substantial inequality”, which originates essentially from assuming dichotomy or energy splitting. We have seen in [12] that this substantial inequality method enabled us to establish a series of existence theorems for the Faddeev model [6–8] and the 3D Skyrme model [18–21, 27], which were previously unavailable. In this paper, we will use this method to establish the much anticipated existence theorem that the Faddeev energy $E_1$ at the unit Hopf charge is attainable. Besides, we will use the same method to establish some new existence results for the 2D Skyrme model which considerably improve the existence result previously obtained in [13] using the concentration-compactness method.

The rest of this paper is organized as follows. In the next section, we recall the existence problem of the Faddeev model and prove that the Faddeev energy $E_1$ at the unit Hopf charge is attainable by using the substantial inequality method. This method relies on some suitable energy estimates which are consequences of a specific topological energy lower bound and an upper estimate for $E_1$, which will be elaborated in detail in Sect. 3 and Sect. 4. In Sect. 5, we study the 2D Skyrme model and we prove that the substantial inequality is valid. In particular, we show that the minimization problem of the 2D Skyrme model has a solution within a suitable (but unknown) topological class. In Sect.
6, we use the substantial inequality method, as we do for the Faddeev model, to show the existence of a least-positive-energy minimizer for the 2D Skyrme model. We also show that an energy minimizer for the 2D Skyrme model exists at the unit topological degree when the product of the coupling constants lies in an explicit interval, which greatly improves the interval we obtained in [13] by using the concentration-compactness method directly. We also remark that the values of the coupling constants in the Faddeev model and Skyrme model are not important for the understanding of their minimization problems.

2. Minimization for the Faddeev Model

Let $n = (n_1, n_2, n_3) : \mathbb{R}^3 \to S^2$ be a map (from the Euclidean 3-space to the unit 2-sphere) and $F_{jk}(n) = n\cdot(\partial_j n\wedge\partial_k n)$ ($j, k = 1, 2, 3$) the induced (Faddeev) magnetic field. We follow [25] to use the renormalized Faddeev energy

$$E(n) = \int_{\mathbb{R}^3}\Big\{\sum_{1\le k\le 3}|\partial_k n|^2 + \frac{1}{2}\sum_{1\le k<\ell\le 3}F_{k\ell}^2(n)\Big\}\,dx = \int_{\mathbb{R}^3}\Big\{|\nabla n|^2 + \frac{1}{2}|F|^2\Big\}\,dx. \tag{2.1}$$

Here $F = F(n) = \left(\frac{1}{2}\epsilon^{jk\ell}F_{k\ell}(n)\right) = (F_{23}(n), -F_{13}(n), F_{12}(n))$. The finite-energy condition implies that $n$ approaches a constant vector $n_\infty$ at infinity of $\mathbb{R}^3$. Hence we may compactify $\mathbb{R}^3$ into $S^3$ and view the fields as maps from $S^3$ to $S^2$. As a consequence, we see that each finite-energy field configuration $n$ is associated with an integer, $Q(n)$, in $\pi_3(S^2) = \mathbb{Z}$. In fact, such an integer $Q(n)$ is known as the Hopf invariant which has


the following integral characterization due to Whitehead [26]: since the vector field $F$ is divergence free, we can express $F$ in terms of a vector potential $A$, $F = \nabla\wedge A$. Then the Hopf invariant or charge $Q(n)$ of the map $n$ may be evaluated by the integral

$$Q(n) = \frac{1}{16\pi^2}\int_{\mathbb{R}^3} A\cdot F\,dx, \tag{2.2}$$

which is also a Chern–Simons index (see [10] for an interesting discussion from the point of view of a physicist) and may be interpreted as a linking number. We are interested in the following topologically constrained minimization problem:

$$E_m = \inf\{E(n) \mid E(n) < \infty,\ Q(n) = m\}. \tag{2.3}$$

For $m \neq 0$, the solutions of (2.3) give rise to static solitons known as the Faddeev knots [8, 2–4]. In [12], we proved the existence of an infinite subset $S$ of the set of all integers $\mathbb{Z}$ such that (2.3) is solvable for any $m \in S$. Moreover, we showed also that $m_0 \in S$, where $m_0 \neq 0$ is such that $E_{m_0} = \inf\{E_m \mid m \in \mathbb{Z}\setminus\{0\}\}$. We were unable to further describe the set $S$. In this paper, we shall establish the much anticipated result, $1 \in S$, for the above Faddeev problem [7]. That is, we shall prove

Theorem 2.1. The Faddeev minimization problem (2.3) has a solution for $m = \pm 1$.

The proof of this theorem follows from an inequality we derived in [12], which we called the “substantial inequality”, and some suitable refined energy estimates.

Substantial Inequality [12]. For any $m \in \mathbb{Z}\setminus\{0\}$, there is a decomposition

$$m = m_1 + \cdots + m_\ell, \qquad m_s \in S\setminus\{0\}, \quad s = 1, \ldots, \ell, \tag{2.4}$$

so that the following sub-additivity relation

$$E_m \ge E_{m_1} + \cdots + E_{m_\ell} \tag{2.5}$$

holds.

Note that the two ingredients (2.4) and (2.5) play different roles: the former expresses a “charge-conservation” law, and the latter says that the total mass of a multiple-particle system is at least equal to the sum of the masses of the particles that the system is made of plus a possible amount of binding energy. More precisely, such an energy splitting process may be compared with the familiar nuclear fission process. Indeed, when a nucleus undergoes fission spontaneously, it splits into several smaller fragments (or substances). The sum of the masses of these fragments is less than the original mass of the nucleus and the “missing” mass has been converted into energy according to Einstein’s equation. With this interpretation, the substantial inequality may also be called the “mass inequality” or the “fission inequality.” We thank Michael Kiessling and Zhengchao Han for their valuable comments on this interpretation.

Lower and Upper Energy Estimates. The following energy lower bound holds:

$$E(n) \ge 3^{3/8}\,8\sqrt{2}\,\pi^2\,|Q(n)|^{3/4}. \tag{2.6}$$

Besides, the energy $E_1$ satisfies the upper estimate

$$E_1 \le 32\sqrt{2}\,\pi^2. \tag{2.7}$$


Remarks. The lower bound (2.6) was first derived by Vakulenko and Kapitanski [24] in the form $E(n) \ge C|Q(n)|^{3/4}$ with $C > 0$ an unspecified universal constant. Since (2.6) is an important inequality for the Faddeev model, we shall go over the details of its derivation in the next section. In doing so, we pay close attention at every step to keeping the various constants optimal. Other related discussions can be found in [11, 17]. The upper bound (2.7) was obtained by Ward [25]. In Sect. 4, we shall follow the steps sketched in [25] to arrive at (2.7).

Proof of Theorem 2.1. Suppose that $E_1$ is not attainable. Then in the minimization process for $E_1$ concentration does not occur and, by the substantial inequality (2.4)–(2.5) of [12], there holds the nontrivial energy splitting

$$E_1 \ge E_{m_1} + \cdots + E_{m_\ell}, \tag{2.8}$$
$$1 = m_1 + \cdots + m_\ell, \qquad m_s \in \mathbb{Z}\setminus\{0\}, \quad s = 1, \ldots, \ell, \tag{2.9}$$

with $\ell \ge 2$. Since each $E_{m_s} > 0$ in view of (2.6), we see from (2.8) and the fact $E_1 = E_{-1}$ that $m_s \neq \pm 1$ for $s = 1, \ldots, \ell$. In view of (2.9), one of the integers $m_1, \ldots, m_\ell$ must be an odd number. Assume that $m_1$ is odd. Then $|m_1| \ge 3$. Of course, $|m_2| \ge 2$. Therefore (2.7) and (2.6) lead us to

$$32\sqrt{2}\,\pi^2 \ge E_1 \ge E_{m_1} + E_{m_2} \ge 3^{3/8}\,8\sqrt{2}\,\pi^2\left(3^{3/4} + 2^{3/4}\right), \tag{2.10}$$

which is a contradiction and the proof of the theorem follows.

3. Vakulenko–Kapitanski Inequality

First recall the sharp Sobolev inequality [1, 23] for a scalar function $f \in W^{1,p}(\mathbb{R}^n)$: if $1 < p < n$ and $1/q = 1/p - 1/n$, then

$$C_0\,\|f\|_q \le \left(\int_{\mathbb{R}^n}|\nabla f|^p\,dx\right)^{1/p}, \tag{3.1}$$

where the best constant $C_0$ is defined by

$$C_0 = n^{1/p}\left(\frac{n-p}{p-1}\right)^{1-1/p}\left\{\frac{\omega_n\,\Gamma\!\left(\frac{n}{p}\right)\Gamma\!\left(n+1-\frac{n}{p}\right)}{\Gamma(n)}\right\}^{1/n}, \tag{3.2}$$

with $\omega_n$ the $n$-dimensional volume enclosed by the unit sphere $S^{n-1}$ in $\mathbb{R}^n$, and $\|f\|_q$ denotes the standard $L^q(\mathbb{R}^n)$-norm. Since we need $n = 3$ and $p = 2$, we must have $q = 6$ and (3.1) and (3.2) give us the sharp Sobolev inequality in 3D (see also [16]):

$$\|f\|_6 \le \left(\frac{4}{3\sqrt{3}\,\pi^2}\right)^{1/3}\left(\int_{\mathbb{R}^3}|\nabla f|^2\,dx\right)^{1/2}. \tag{3.3}$$
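The specialization of (3.2) to $n = 3$, $p = 2$ is easy to confirm numerically. The sketch below is not part of the original argument; the function names are hypothetical, and $\omega_n$ is taken to be the volume of the unit ball, $\pi^{n/2}/\Gamma(n/2+1)$, as described after (3.2).

```python
import math


def sobolev_best_constant(n, p):
    """Best constant C_0 of (3.2) for 1 < p < n."""
    omega_n = math.pi ** (n / 2) / math.gamma(n / 2 + 1)  # volume of the unit ball in R^n
    gamma_ratio = math.gamma(n / p) * math.gamma(n + 1 - n / p) / math.gamma(n)
    return n ** (1 / p) * ((n - p) / (p - 1)) ** (1 - 1 / p) * (omega_n * gamma_ratio) ** (1 / n)


def sobolev_3d_constant():
    """Constant on the right-hand side of (3.3)."""
    return (4 / (3 * math.sqrt(3) * math.pi ** 2)) ** (1 / 3)


# 1/C_0 for n = 3, p = 2 should coincide with the constant in (3.3)
difference = abs(1.0 / sobolev_best_constant(3, 2) - sobolev_3d_constant())
```

In closed form, $\omega_3\Gamma(3/2)\Gamma(5/2)/\Gamma(3) = \pi^2/4$, so $C_0 = \sqrt{3}\,(\pi^2/4)^{1/3}$, whose reciprocal is the constant in (3.3).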

We now consider the vector fields $A$ and $F$ defined in (2.1) and (2.2). Following [24], we have

$$\int_{\mathbb{R}^3} A\cdot F\,dx \le \|A\|_6\,\|F\|_{6/5} \le \|A\|_6\,\|F\|_1^{2/3}\,\|F\|_2^{1/3}. \tag{3.4}$$


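Note that we always use $\|A\|_p$ to denote the $L^p(\mathbb{R}^3)$-norm of the magnitude (scalar) function $|A|$ of the vector field $A$. Using (3.3), we have

$$\|A\|_6 = \big\||A|\big\|_6 \le \left(\frac{4}{3\sqrt{3}\,\pi^2}\right)^{1/3}\left(\int_{\mathbb{R}^3}\big|\nabla|A|\big|^2\,dx\right)^{1/2} \le \left(\frac{4}{3\sqrt{3}\,\pi^2}\right)^{1/3}\left(\int_{\mathbb{R}^3}|\nabla A|^2\,dx\right)^{1/2}, \tag{3.5}$$

where $|\nabla A|^2 = \sum_{j=1}^{3}|\nabla A_j|^2$. On the other hand, neglecting boundary terms at infinity when integrating, we have the identity $\int|\nabla A|^2 = \int(\nabla\cdot A)^2 + \int|\nabla\wedge A|^2$ (in [24], there is an additional erroneous factor 1/2 on the right-hand side of this relation). Hence restricting to divergence-free vector fields $A$ as in [24] and using the relation $F = \nabla\wedge A$, we see that (3.5) becomes

$$\|A\|_6 \le \left(\frac{4}{3\sqrt{3}\,\pi^2}\right)^{1/3}\|F\|_2. \tag{3.6}$$

Inserting (3.6) into (3.4), we obtain

$$\int_{\mathbb{R}^3} A\cdot F\,dx \le \left(\frac{4}{3\sqrt{3}\,\pi^2}\right)^{1/3}\|F\|_1^{2/3}\,\|F\|_2^{4/3}. \tag{3.7}$$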

We now estimate $\|F\|_1$ and $\|F\|_2$ in terms of the Faddeev energy $E(n)$ given in (2.1). For convenience, we make the decomposition $E = E_D + E_S$, where

$$E_D(n) = \int_{\mathbb{R}^3}|\nabla n|^2\,dx \quad\text{and}\quad E_S(n) = \frac{1}{2}\int_{\mathbb{R}^3}|F|^2\,dx \tag{3.8}$$

stand for the Dirichlet-type energy and Skyrme-type energy, respectively. Specializing the argument of Ward [25] based on a paper of Manton [15] using symmetric polynomials, we have $|F| \le |\nabla n|^2/2$. In fact, let $\lambda_1, \lambda_2, \lambda_3$ be the eigenvalues of the symmetric matrix $(\nabla n_j\cdot\nabla n_k)$. Then there holds the identity

$$\lambda_1\lambda_2 + \lambda_2\lambda_3 + \lambda_1\lambda_3 = \sum_{1\le j<k\le 3}\det\begin{pmatrix}\nabla n_j\cdot\nabla n_j & \nabla n_j\cdot\nabla n_k\\ \nabla n_k\cdot\nabla n_j & \nabla n_k\cdot\nabla n_k\end{pmatrix} = \sum_{1\le j<k\le 3}|\partial_j n\wedge\partial_k n|^2 = |F|^2. \tag{3.9}$$

It can be directly checked that $n$ lies in the nullspace of the matrix $(\nabla n_j\cdot\nabla n_k)$. Therefore this matrix has a zero eigenvalue. Assume $\lambda_3 = 0$. We get from (3.9) that $|F| = \sqrt{\lambda_1\lambda_2} \le \frac{1}{2}(\lambda_1 + \lambda_2) = \frac{1}{2}\,\mathrm{tr}\,(\nabla n_j\cdot\nabla n_k) = \frac{1}{2}|\nabla n|^2$, as stated.
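The pointwise identity (3.9) and the resulting bound $|F| \le \frac{1}{2}|\nabla n|^2$ can be illustrated at a single sample point. The sketch below is an assumption-laden illustration, not part of the proof: the helper names and the sample vectors are hypothetical, and the derivatives $\partial_a n$ are simply chosen orthogonal to $n$, as they must be for a unit-vector field.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pointwise_quantities(n, dn):
    """Return (e2, F2, grad2) at one point.

    n  : unit vector n(x)
    dn : list of the three derivative vectors d_1 n, d_2 n, d_3 n (each orthogonal to n)
    e2 : sum of the principal 2x2 minors of the matrix (grad n_j . grad n_k)
    F2 : |F|^2 = sum_{j<k} F_jk^2 with F_jk = n . (d_j n x d_k n)
    grad2 : |grad n|^2
    """
    m = [[sum(dn[a][j] * dn[a][k] for a in range(3)) for k in range(3)] for j in range(3)]
    pairs = [(0, 1), (0, 2), (1, 2)]
    e2 = sum(m[j][j] * m[k][k] - m[j][k] * m[k][j] for j, k in pairs)
    f2 = sum(dot(n, cross(dn[j], dn[k])) ** 2 for j, k in pairs)
    grad2 = sum(dot(v, v) for v in dn)
    return e2, f2, grad2


# hypothetical sample: n at the north pole, derivatives in the tangent plane
sample_n = (0.0, 0.0, 1.0)
sample_dn = [(1.0, 2.0, 0.0), (0.5, -1.0, 0.0), (3.0, 0.25, 0.0)]
e2, f2, grad2 = pointwise_quantities(sample_n, sample_dn)
```

For this sample the two sides of (3.9) agree, and $\sqrt{F2}$ stays below `grad2 / 2`, in line with the arithmetic-geometric mean step used above.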


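Hence $\|F\|_1 \le \frac{1}{2}\int_{\mathbb{R}^3}|\nabla n|^2\,dx = \frac{1}{2}E_D(n)$. Besides, it is obvious that $\|F\|_2^2 = 2E_S(n)$. As a consequence, we can update (3.7) into the form

$$\int_{\mathbb{R}^3} A\cdot F\,dx \le \left(\frac{4}{3\sqrt{3}\,\pi^2}\right)^{1/3}\left(\frac{1}{2}E_D(n)\right)^{2/3}\left(2E_S(n)\right)^{2/3} \le \left(\frac{4}{3\sqrt{3}\,\pi^2}\right)^{1/3}\left(\frac{1}{2}E(n)\right)^{4/3} = \left(12\sqrt{3}\,\pi^2\right)^{-1/3}\left(E(n)\right)^{4/3}, \tag{3.10}$$

which establishes (2.6).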
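Two pieces of arithmetic are worth making explicit here: how (3.10) together with (2.2) yields the constant in (2.6), and why (2.10) is a contradiction. The following sketch only evaluates the stated constants; the function and variable names are hypothetical.

```python
import math


def lower_bound_constant_from_3_10():
    """Constant C in E >= C |Q|^(3/4), obtained by combining (2.2) with (3.10).

    From 16 pi^2 |Q| <= (12 sqrt(3) pi^2)^(-1/3) E^(4/3) one gets
    E >= (16 pi^2)^(3/4) (12 sqrt(3) pi^2)^(1/4) |Q|^(3/4).
    """
    return (16 * math.pi ** 2) ** 0.75 * (12 * math.sqrt(3) * math.pi ** 2) ** 0.25


def stated_lower_bound_constant():
    """Constant as written in (2.6)."""
    return 3 ** (3 / 8) * 8 * math.sqrt(2) * math.pi ** 2


def splitting_lower_bound(m1, m2):
    """Right-hand side of (2.10) for charges m1 and m2."""
    return stated_lower_bound_constant() * (abs(m1) ** 0.75 + abs(m2) ** 0.75)


UPPER_BOUND_E1 = 32 * math.sqrt(2) * math.pi ** 2  # right-hand side of (2.7)
```

Since $(12\sqrt{3})^{1/4} = \sqrt{2}\cdot 3^{3/8}$, the two constants coincide. With $|m_1| = 3$ and $|m_2| = 2$ the right-hand side of (2.10) is roughly 668, well above the upper bound $32\sqrt{2}\,\pi^2 \approx 447$, which is the contradiction used in the proof of Theorem 2.1.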

4. An Upper Estimate for $E_1$

In this section, we follow the steps sketched in Ward [25] to derive (2.7). Note that an intermediate result (see (4.18) below) we obtain is different from that stated in [25] due to our choice of the stereographic projection for the 3-sphere. However, this result does not affect the final estimate (2.7).

Energy of the Hopf Map from $S^3_R$ into $S^2$. Consider the spheres in $\mathbb{R}^4$ and $\mathbb{R}^3$ given in terms of their respective coordinate variables by

$$S^3_R = \left\{(x_1, x_2, x_3, x_4) \in \mathbb{R}^4 \mid x_1^2 + x_2^2 + x_3^2 + x_4^2 = R^2\right\}, \qquad S^2 = \left\{(y_1, y_2, y_3) \in \mathbb{R}^3 \mid y_1^2 + y_2^2 + y_3^2 = 1\right\}.$$

The Hopf map $\Phi : S^3_R \to S^2$, $\Phi(x_1, x_2, x_3, x_4) = (y_1, y_2, y_3)$, may be defined by

$$y_1 = \frac{2}{R^2}(x_1x_3 + x_2x_4), \qquad y_2 = \frac{2}{R^2}(x_2x_3 - x_1x_4), \qquad y_3 = \frac{1}{R^2}\left(x_3^2 + x_4^2 - x_1^2 - x_2^2\right). \tag{4.1}$$

This map has the Hopf index one. Using the “Hopf” coordinates $(\theta, s, t)$ for which

$$x_1 = R\sin\frac{\theta}{2}\sin s, \quad x_2 = R\sin\frac{\theta}{2}\cos s, \quad x_3 = R\cos\frac{\theta}{2}\sin t, \quad x_4 = R\cos\frac{\theta}{2}\cos t, \qquad 0 \le \theta \le \pi,\ -\pi \le s \le \pi,\ -\pi \le t \le \pi, \tag{4.2}$$

the Hopf map can be represented in view of (4.1) and (4.2) simply as

$$\Phi(\theta, s, t) = (\sin\theta\cos(s - t),\ -\sin\theta\sin(s - t),\ \cos\theta). \tag{4.3}$$

So $|\Phi_\theta|^2 = 1$, $|\Phi_s|^2 = \sin^2\theta$, and $|\Phi_t|^2 = \sin^2\theta$. Besides, with the notation $x = (x_1, x_2, x_3, x_4)$ and the coordinate representation (4.2), we can calculate the induced metric components for $S^3_R$ directly: $g_{\theta\theta} = R^2/4$, $g_{ss} = R^2\sin^2(\theta/2)$, $g_{tt} = R^2\cos^2(\theta/2)$, and $g_{\theta s} = g_{\theta t} = g_{st} = 0$. Consequently, $g^{\theta\theta} = 4/R^2$, $g^{ss} = 1/(R^2\sin^2(\theta/2))$,


$g^{tt} = 1/(R^2\cos^2(\theta/2))$, $g^{\theta s} = g^{\theta t} = g^{st} = 0$, and the Dirichlet energy density of the Hopf map over $S^3_R$, $\mathcal{E}_D(\Phi; S^3_R)$, takes the constant value

$$\mathcal{E}_D(\Phi; S^3_R) = g^{jk}\,\partial_j\Phi\cdot\partial_k\Phi \ \ (j, k = \theta, s, t) = \frac{8}{R^2}, \tag{4.4}$$

as stated in [25]. Similarly, we can evaluate the Skyrme energy density, $\mathcal{E}_S(\Phi; S^3_R)$. We easily see in view of (4.3) that the respective components of the Faddeev magnetic field $F_{jk}(\Phi) = \Phi\cdot(\partial_j\Phi\wedge\partial_k\Phi)$ are $F_{\theta s}(\Phi) = -\sin\theta$, $F_{\theta t}(\Phi) = \sin\theta$, and $F_{st}(\Phi) = 0$. Therefore

$$\mathcal{E}_S(\Phi; S^3_R) = \frac{1}{4}\,g^{j\ell}g^{km}F_{jk}(\Phi)F_{\ell m}(\Phi) = \frac{8}{R^4}, \tag{4.5}$$

also as stated in [25]. Integrating (4.4) and (4.5) over $S^3_R$ and using the fact that the total volume of $S^3_R$ is $2\pi^2R^3$, we arrive at the following Ward’s number [25] for the intrinsic Faddeev energy of the Hopf map $\Phi : S^3_R \to S^2$:

$$E(\Phi; S^3_R) \equiv \int_{S^3_R}\left\{\mathcal{E}_D(\Phi; S^3_R) + \mathcal{E}_S(\Phi; S^3_R)\right\}dV_{S^3_R} = 16\pi^2\left(R + \frac{1}{R}\right), \tag{4.6}$$

where we use $dV_{S^3_R}$ to denote the canonical volume element of $S^3_R$.
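The constant values in (4.4) and (4.5), and the total in (4.6), follow from the diagonal metric and the field components listed above. A minimal numerical sketch, with hypothetical function names and an arbitrary sample point, is:

```python
import math


def hopf_energy_densities(theta, R):
    """Dirichlet and Skyrme energy densities of the Hopf map in Hopf coordinates.

    Uses the inverse metric components and the derivative norms quoted in the text;
    for a diagonal metric, (1/4) g^{jl} g^{km} F_jk F_lm = (1/2) sum_{j<k} g^{jj} g^{kk} F_jk^2.
    """
    g_inv = {
        "theta": 4.0 / R ** 2,
        "s": 1.0 / (R * math.sin(theta / 2)) ** 2,
        "t": 1.0 / (R * math.cos(theta / 2)) ** 2,
    }
    deriv_norm_sq = {"theta": 1.0, "s": math.sin(theta) ** 2, "t": math.sin(theta) ** 2}
    dirichlet = sum(g_inv[c] * deriv_norm_sq[c] for c in g_inv)
    field = {("theta", "s"): -math.sin(theta), ("theta", "t"): math.sin(theta), ("s", "t"): 0.0}
    skyrme = 0.5 * sum(g_inv[j] * g_inv[k] * f * f for (j, k), f in field.items())
    return dirichlet, skyrme


def total_hopf_energy(R):
    """Integral of the two constant densities over S^3_R, whose volume is 2 pi^2 R^3."""
    return 2 * math.pi ** 2 * R ** 3 * (8 / R ** 2 + 8 / R ** 4)
```

Evaluating `hopf_energy_densities` at any $0 < \theta < \pi$ returns $8/R^2$ and $8/R^4$ up to rounding, independently of $\theta$, and `total_hopf_energy(R)` reproduces $16\pi^2(R + 1/R)$.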

Stereographic Coordinates. We need the stereographic projection from $S^3_R$ to $\mathbb{R}^3$ so that the inverse of this projection can be viewed as a specific coordinate chart for $S^3_R$:

$$x_1 = \frac{2R^2}{r^2 + R^2}\,\xi, \quad x_2 = \frac{2R^2}{r^2 + R^2}\,\zeta, \quad x_3 = \frac{2R^2}{r^2 + R^2}\,\eta, \quad x_4 = \frac{r^2 - R^2}{r^2 + R^2}\,R, \qquad (\xi, \zeta, \eta) \in \mathbb{R}^3,\ r^2 = \xi^2 + \zeta^2 + \eta^2. \tag{4.7}$$

In terms of this coordinate system, we see that the respective components of the canonical metric tensor of $S^3_R$ become $g_{\xi\xi} = g_{\zeta\zeta} = g_{\eta\eta} = 4R^4/(r^2 + R^2)^2$ and $g_{\xi\zeta} = g_{\xi\eta} = g_{\zeta\eta} = 0$. Consequently, $dV_{S^3_R} = (8R^6/(r^2 + R^2)^3)\,d\xi\,d\zeta\,d\eta$, $g^{\xi\xi} = g^{\zeta\zeta} = g^{\eta\eta} = (r^2 + R^2)^2/4R^4$, and $g^{\xi\zeta} = g^{\xi\eta} = g^{\zeta\eta} = 0$. Now let $n : \mathbb{R}^3 \to S^2$ be a map of finite Faddeev energy, which may be viewed as a map from $S^3_R$ into $S^2$ represented through the above stereographic coordinates. We have

$$E_D(n; S^3_R) \equiv \int_{S^3_R} g^{jk}\,\partial_j n\cdot\partial_k n\,dV_{S^3_R}\ \ (j, k = \xi, \zeta, \eta) = \int_{\mathbb{R}^3}\frac{2R^2}{r^2 + R^2}\,|\nabla n|^2\,d\xi\,d\zeta\,d\eta \to 2\int_{\mathbb{R}^3}|\nabla n|^2\,d\xi\,d\zeta\,d\eta \quad\text{as } R \to \infty, \tag{4.8}$$


which is twice the standard Dirichlet energy over $\mathbb{R}^3$. Similarly, for the Skyrme energy part, we have

$$E_S(n; S^3_R) \equiv \int_{S^3_R}\frac{1}{4}\,g^{j\ell}g^{km}F_{jk}(n)F_{\ell m}(n)\,dV_{S^3_R}\ \ (j, k = \xi, \zeta, \eta) = \frac{1}{2}\int_{\mathbb{R}^3}\frac{r^2 + R^2}{R^2}\cdot\frac{1}{4}\,\delta^{j\ell}\delta^{km}F_{jk}(n)F_{\ell m}(n)\,d\xi\,d\zeta\,d\eta \to \frac{1}{4}\int_{\mathbb{R}^3}|F(n)|^2\,d\xi\,d\zeta\,d\eta \quad\text{as } R \to \infty, \tag{4.9}$$

which is half of the standard Skyrme energy over $\mathbb{R}^3$. Hence we arrive at the weighted limit

$$E(n) = \lim_{R\to\infty}\left\{\frac{1}{2}E_D(n; S^3_R) + 2E_S(n; S^3_R)\right\}. \tag{4.10}$$

Note that the above weighted limit is a result of our choice of the stereographic projection (4.7) which maps $S^3_R$ onto the extended plane through its equator. However, when we use the stereographic projection which maps $S^3_R$ onto the extended plane tangential to its north pole, we shall not need to place weights and we get the same intermediate result as that stated in Ward [25], instead of (4.18) below.

Upper Bound by Rescaling/Dilation. From (4.6), we see that we cannot take the $R \to \infty$ limit directly for the Hopf map. On the other hand, however, the limit (4.10) suggests that for a suitably chosen map, the limit $R \to \infty$ taken over $S^3_R$ may allow us to recover the Faddeev energy over the Euclidean space $\mathbb{R}^3$. In the following, we use a rescaling/dilation argument of Ward [25] to get a suitable Hopf map over $S^3_R$ which allows us to take the $R \to \infty$ limit. In this way, we arrive at the upper bound (2.7) stated for $E_1$ in [25]. We again use $(\xi, \zeta, \eta)$ to denote the stereographic coordinates defined in (4.7) and $\Phi$ the Hopf map from $S^3_R$ to $S^2$ defined in (4.1). We introduce the deformed (dilated) map $\Phi_\lambda : S^3_R \to S^2$ given by

$$\Phi_\lambda(\xi, \zeta, \eta) = \Phi(\lambda\xi, \lambda\zeta, \lambda\eta) = (y_1, y_2, y_3) \in S^2, \tag{4.11}$$

where, in view of (4.1) and (4.7), the image coordinates $y_1, y_2, y_3$ are given by

$$\begin{aligned}
y_1 &= \frac{4\lambda R}{(\lambda^2r^2 + R^2)^2}\left(2\lambda R\,\xi\eta + [\lambda^2r^2 - R^2]\,\zeta\right),\\
y_2 &= \frac{4\lambda R}{(\lambda^2r^2 + R^2)^2}\left(2\lambda R\,\zeta\eta - [\lambda^2r^2 - R^2]\,\xi\right),\\
y_3 &= 1 - \frac{8\lambda^2R^2}{(\lambda^2r^2 + R^2)^2}\left(r^2 - \eta^2\right).
\end{aligned} \tag{4.12}$$


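Let $R = \lambda a$. Then the above representation simplifies to

$$\begin{aligned}
y_1 &= \frac{4a}{(r^2 + a^2)^2}\left(2a\,\xi\eta + [r^2 - a^2]\,\zeta\right),\\
y_2 &= \frac{4a}{(r^2 + a^2)^2}\left(2a\,\zeta\eta - [r^2 - a^2]\,\xi\right),\\
y_3 &= 1 - \frac{8a^2}{(r^2 + a^2)^2}\left(r^2 - \eta^2\right),
\end{aligned} \tag{4.13}$$

which is the Hopf map from $S^3_a$ to $S^2$. This property allows us to evaluate the energy densities on $S^3_R$ easily. Indeed, we have in view of the conformality of the stereographic coordinates and (4.13) the relations

$$\mathcal{E}_D(\Phi_\lambda; S^3_R) = \frac{(r^2 + R^2)^2}{4R^4}\cdot\frac{4a^4}{(r^2 + a^2)^2}\cdot\mathcal{E}_D(\Phi; S^3_a) = \left(\frac{r^2 + R^2}{r^2 + a^2}\right)^2\cdot\frac{a^4}{R^4}\cdot\frac{8}{a^2}, \tag{4.14}$$

$$\mathcal{E}_S(\Phi_\lambda; S^3_R) = \left(\frac{(r^2 + R^2)^2}{4R^4}\right)^2\cdot\left(\frac{4a^4}{(r^2 + a^2)^2}\right)^2\cdot\mathcal{E}_S(\Phi; S^3_a) = \left(\frac{r^2 + R^2}{r^2 + a^2}\right)^4\cdot\frac{a^8}{R^8}\cdot\frac{8}{a^4}. \tag{4.15}$$

Hence, integrating (4.14) and (4.15) against the volume element $dV_{S^3_R} = (8R^6/(r^2 + R^2)^3)\,d\xi\,d\zeta\,d\eta$ over the $(\xi, \zeta, \eta)$-space $\mathbb{R}^3$, we obtain

$$E_D(\Phi_\lambda; S^3_R) = \frac{64\pi^2\lambda R}{(1 + \lambda)^2}, \qquad E_S(\Phi_\lambda; S^3_R) = \frac{8\pi^2(1 + \lambda^2)}{\lambda R}. \tag{4.16}$$

It can be checked that, as a function of $\lambda > 0$, the global minimum of

$$\tilde{E}(\Phi_\lambda; S^3_R) = \frac{1}{2}E_D(\Phi_\lambda; S^3_R) + 2E_S(\Phi_\lambda; S^3_R) \tag{4.17}$$

is

$$\min\left\{\tilde{E}(\Phi_\lambda; S^3_R) \mid \lambda > 0\right\} = 32\sqrt{2}\,\pi^2 - \frac{32\pi^2}{R}, \tag{4.18}$$

which is achieved at

$$\lambda = \lambda_R \equiv \frac{1}{\sqrt{2}}R - 1 + \sqrt{\frac{R^2}{2} - \sqrt{2}\,R}. \tag{4.19}$$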
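The minimization behind (4.18) and (4.19) can be checked numerically from the closed forms in (4.16). The sketch below is illustrative; the function names are hypothetical, and it assumes $R \ge 2\sqrt{2}$ so that the square root in (4.19) is real.

```python
import math


def weighted_energy(lam, R):
    """E-tilde of (4.17), built from the closed forms (4.16)."""
    e_d = 64 * math.pi ** 2 * lam * R / (1 + lam) ** 2
    e_s = 8 * math.pi ** 2 * (1 + lam ** 2) / (lam * R)
    return 0.5 * e_d + 2 * e_s


def lambda_R(R):
    """Minimizing dilation parameter (4.19); requires R >= 2*sqrt(2)."""
    return R / math.sqrt(2) - 1 + math.sqrt(R * R / 2 - math.sqrt(2) * R)


def claimed_minimum(R):
    """Right-hand side of (4.18)."""
    return 32 * math.sqrt(2) * math.pi ** 2 - 32 * math.pi ** 2 / R
```

One way to see the result by hand is to write $u = (1+\lambda)^2/\lambda \ge 4$, so that $\tilde{E} = 32\pi^2\left[R/u + u/(2R) - 1/R\right]$; this is minimized at $u = \sqrt{2}R$, which gives (4.18), and solving $(1+\lambda)^2 = \sqrt{2}R\lambda$ gives (4.19). As $R \to \infty$ the minimum tends to $32\sqrt{2}\,\pi^2$, the bound in (2.7).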

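With this choice of the dilation parameter $\lambda = \lambda_R$, the Hopf map (4.12) under the $(\xi, \zeta, \eta)$-coordinates can be rewritten as

$$\begin{aligned}
y_1^R &= \frac{4(R/\lambda_R)}{(r^2 + (R/\lambda_R)^2)^2}\left(2(R/\lambda_R)\,\xi\eta + [r^2 - (R/\lambda_R)^2]\,\zeta\right),\\
y_2^R &= \frac{4(R/\lambda_R)}{(r^2 + (R/\lambda_R)^2)^2}\left(2(R/\lambda_R)\,\zeta\eta - [r^2 - (R/\lambda_R)^2]\,\xi\right),\\
y_3^R &= 1 - \frac{8(R/\lambda_R)^2}{(r^2 + (R/\lambda_R)^2)^2}\left(r^2 - \eta^2\right).
\end{aligned} \tag{4.20}$$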


Using (4.19) in (4.20), we see that, as $R \to \infty$, the map $y^R = (y_1^R, y_2^R, y_3^R) : \mathbb{R}^3 \to S^2$ converges rapidly to $\Phi \circ P^{-1} \equiv N : \mathbb{R}^3 \to S^2$, where $\Phi$ is the Hopf map from $S^3_{1/\sqrt{2}}$ to $S^2$ defined in (4.1) and $P : S^3_{1/\sqrt{2}} \to \mathbb{R}^3$ is the stereographic projection defined in (4.7), both with $R = 1/\sqrt{2}$. Hence, setting $\lambda = \lambda_R$ in (4.17) and letting $R \to \infty$, we obtain by virtue of (4.18) that

$$\begin{aligned}
E(N) &= \int_{\mathbb{R}^3}\Bigl\{|\nabla N|^2 + \frac{1}{4}\,\delta^{j\ell}\delta^{km}F_{jk}(N)F_{\ell m}(N)\Bigr\}\,d\xi\,d\zeta\,d\eta\\
&= \lim_{R\to\infty}\int_{\mathbb{R}^3}\Bigl\{\frac{R^2}{r^2 + R^2}\,|\nabla y^R|^2 + \frac{r^2 + R^2}{R^2}\cdot\frac{1}{4}\,\delta^{j\ell}\delta^{km}F_{jk}(y^R)F_{\ell m}(y^R)\Bigr\}\,d\xi\,d\zeta\,d\eta\\
&= \lim_{R\to\infty}\Bigl\{\frac{1}{2}E_D(\Phi_{\lambda_R}; S^3_R) + 2E_S(\Phi_{\lambda_R}; S^3_R)\Bigr\} = 32\sqrt{2}\,\pi^2.
\end{aligned} \tag{4.21}$$

Of course, $Q(N) = 1$. Therefore the upper bound (2.7) follows.

5. Two-Dimensional Skyrme Model

With the notation in [13], the two-dimensional Skyrme energy functional governing a configuration map $u : \mathbb{R}^2 \to S^2$ is given by

$$E(u) = \int_{\mathbb{R}^2}\Bigl\{\frac{1}{2}|\nabla u|^2 + \frac{\lambda}{4}|\partial_1 u \wedge \partial_2 u|^2 + \frac{\mu}{16}|n - u|^4\Bigr\}\,dx, \tag{5.1}$$

where $n = (0, 0, 1)$ is the north pole of $S^2$ in $\mathbb{R}^3$, and $\lambda, \mu$ are positive coupling constants. The finite-energy condition implies that $u$ tends to $n$ as $|x| \to \infty$. Therefore $u$ may be viewed as a map from $S^2$ to itself which defines a homotopy class in $\pi_2(S^2) = \mathbb{Z}$, whose integer representative is the Brouwer degree of $u$ with the integral representation

$$\deg(u) = \frac{1}{4\pi}\int_{\mathbb{R}^2} u\cdot(\partial_1 u \wedge \partial_2 u)\,dx. \tag{5.2}$$

As before, we are interested in the minimization problem

$$E_k = \inf\{E(u) \mid E(u) < \infty,\ \deg(u) = k\}, \tag{5.3}$$

where $k \in \mathbb{Z}$. Of course, $E_k = E_{|k|}$ for all $k \in \mathbb{Z}$. The main existence result of [13] states that if the coupling constants $\lambda$ and $\mu$ satisfy

$$\lambda\mu \le 48, \tag{5.4}$$

then the minimization problem (5.3) has a solution for $k = \pm 1$. A direct consequence of the form of the energy (5.1) and the topological integral (5.2) is the following standard topological energy lower bound:

$$\frac{1}{2}\int_{\mathbb{R}^2}|\nabla u|^2\,dx \ge 4\pi\,|\deg(u)|. \tag{5.5}$$


Hence, it follows from (5.5) that if $\deg(u) \neq 0$, then

$$E(u) > 4\pi\,|\deg(u)|. \tag{5.6}$$

Besides, using the stereographic projection of $S^2$ as a trial field configuration, it can be shown [13] that the following upper estimate for $E_1$ holds:

$$E_1 \le 4\pi\Bigl(1 + \frac{1}{2}\sqrt{\frac{\lambda\mu}{3}}\Bigr). \tag{5.7}$$

Minimization and Concentration-Compactness. Let $\{u_n\}$ be a minimizing sequence of the problem (5.3). Then, passing to a subsequence if necessary, we may assume that $u_n \to$ some $u$ weakly in a well-understood sense and (cf. [5, 22])

$$E(u) \le \liminf_{n\to\infty} E(u_n) = E_k. \tag{5.8}$$

Hence, in order to show that (5.3) is solved by the map $u$, it remains to show that $u$ carries the same topology, $\deg(u) = k$, which is the main difficulty one encounters in this type of problem. For the minimizing sequence $\{u_n\}$, set

$$f_n(x) = \Bigl(\frac{1}{2}|\nabla u_n|^2 + \frac{\lambda}{4}|\partial_1 u_n \wedge \partial_2 u_n|^2 + \frac{\mu}{16}|n - u_n|^4\Bigr)(x), \quad n = 1, 2, \ldots. \tag{5.9}$$

Then $f_n \in L^1(\mathbb{R}^2)$, $\|f_n\|_{L^1} \ge 4\pi|k|$, and we can assume $\|f_n\|_{L^1} \le E_k + 1$ (say), $n = 1, 2, \ldots$. Use $D(y, R)$ to denote the disk in $\mathbb{R}^2$ centered at $y$ and of radius $R > 0$: $D(y, R) = \{x \in \mathbb{R}^2 \mid |x - y| < R\}$ (we also use the simplified notation $D_R = \{x \in \mathbb{R}^2 \mid |x| < R\}$). Then, according to Lions [14], one of the following three situations must occur (principle of concentration-compactness):

(a) Compactness: There is a sequence $\{y_n\}$ in $\mathbb{R}^2$ such that, for any $\varepsilon > 0$, there exists an $R > 0$ such that

$$\sup_n \int_{\mathbb{R}^2\setminus D(y_n, R)} f_n(x)\,dx \le \varepsilon. \tag{5.10}$$

(b) Vanishing: For any $R > 0$,

$$\lim_{n\to\infty}\Bigl(\sup_{y\in\mathbb{R}^2}\int_{D(y, R)} f_n(x)\,dx\Bigr) = 0. \tag{5.11}$$

(c) Dichotomy: There is a sequence $\{y_n\} \subset \mathbb{R}^2$ and a number $t \in (0, 1)$ such that for any $\varepsilon > 0$, there is an $R > 0$ and a sequence of positive numbers $\{R_n\}$ satisfying $\lim_{n\to\infty} R_n = \infty$ so that

$$\Bigl|\int_{D(y_n, R)} f_n(x)\,dx - t\,\|f_n\|_{L^1}\Bigr| \le \varepsilon, \qquad \Bigl|\int_{\mathbb{R}^2\setminus D(y_n, R_n)} f_n(x)\,dx - (1 - t)\,\|f_n\|_{L^1}\Bigr| \le \varepsilon. \tag{5.12}$$


It is not hard to show that if (a) is the case, then (5.3) has a solution. Besides, it can also be shown that (b) does not happen for $k \neq 0$. See [13]. Therefore, we are left with the remaining case (c) to consider.

The Substantial Inequality Implied by a Technical Lemma. Let $\Omega$ be a subdomain in $\mathbb{R}^2$ and define

$$E(u; \Omega) = \int_{\Omega}\Bigl\{\frac{1}{2}|\nabla u|^2 + \frac{\lambda}{4}|\partial_1 u \wedge \partial_2 u|^2 + \frac{\mu}{16}|n - u|^4\Bigr\}\,dx. \tag{5.13}$$

In [13], we proved the following technical lemma for the functional $E(u; \Omega)$:

Lemma 5.1. For any $\varepsilon \in (0, 1)$ and $R \ge 1$, let $u : D_{2R}\setminus D_R \to S^2$ satisfy

$$E(u; D_{2R}\setminus D_R) < \varepsilon. \tag{5.14}$$

Then there is a map $\tilde{u} : D_{2R}\setminus D_R \to S^2$ such that (i) $\tilde{u}|_{\partial D_R} = u$, (ii) $\tilde{u}|_{\partial D_{2R}} = n$, (iii) $E(\tilde{u}; D_{2R}\setminus D_R) < C\varepsilon^{1/2}$, where $C > 0$ is an absolute constant independent of $R$, $\varepsilon$, and $u$. Likewise, one can also obtain a modified map $\tilde{u} : D_{2R}\setminus D_R \to S^2$ such that $\tilde{u}|_{\partial D_R} = n$, $\tilde{u}|_{\partial D_{2R}} = u$, and (iii) holds as well.

The above lemma seems to be naturally true for higher dimensional problems such as the Faddeev problem and the (classical 3D) Skyrme problem. However, we are only able to prove it in our 2D situation here. It will be seen that this lemma is crucial in our proof of the substantial inequality for the 2D Skyrme model as stated in the next theorem.

Theorem 5.2. Let $k$ be a nonzero integer and $\{u_n\}$ a minimizing sequence of the problem (5.3). Then either (a) holds (hence a subsequence of $\{u_n\}$ converges weakly to a solution of (5.3)) or there are nonzero integers $k_1$ and $k_2$ such that $k = k_1 + k_2$ and

$$E_k \ge E_{k_1} + E_{k_2}. \tag{5.15}$$

As a consequence, if $S$ denotes the subset of $\mathbb{Z}\setminus\{0\}$ for which every member $k \in S$ makes (5.3) solvable, then $S \neq \emptyset$. In particular, for any $k \in \mathbb{Z}\setminus\{0\}$, there are integers $k_1, \ldots, k_\ell \in S$ such that $k = k_1 + \cdots + k_\ell$ and

$$E_k \ge E_{k_1} + \cdots + E_{k_\ell}. \tag{5.16}$$

Proof. Suppose (c) (dichotomy) occurs. Then, after making translations, we may assume that there is a number $t \in (0, 1)$ such that for any $\varepsilon > 0$ there is an $R > 0$ and a sequence of positive numbers $\{R_n\}$ satisfying $\lim_{n\to\infty} R_n = \infty$ so that

$$\Bigl|\int_{D_R} f_n(x)\,dx - t\,E(u_n)\Bigr| < \varepsilon, \tag{5.17}$$

$$\Bigl|\int_{\mathbb{R}^2\setminus D_{R_n}} f_n(x)\,dx - (1 - t)\,E(u_n)\Bigr| < \varepsilon. \tag{5.18}$$

Without loss of generality, we may assume that $R_n > 2R$ for all $n$.


From (5.17), (5.18), and the decomposition

$$E(u_n) = \int_{D_R} f_n(x)\,dx + \int_{\mathbb{R}^2\setminus D_{R_n}} f_n(x)\,dx + E(u_n; D_{R_n}\setminus D_R), \tag{5.19}$$

we have

$$E(u_n; D_{2R}\setminus D_R) \le E(u_n; D_{R_n}\setminus D_R) < 2\varepsilon. \tag{5.20}$$

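Using (5.20) and Lemma 5.1, we can find maps $v_n$ and $w_n$ from $\mathbb{R}^2$ into $S^2$ such that

$$\begin{aligned}
&v_n = u_n \text{ in } D_R, \quad v_n = n \text{ in } \mathbb{R}^2\setminus D_{2R}, \quad E(v_n; D_{2R}\setminus D_R) < C\varepsilon^{1/2},\\
&w_n = u_n \text{ in } \mathbb{R}^2\setminus D_{R_n}, \quad w_n = n \text{ in } D_{R_n/2}, \quad E(w_n; D_{R_n}\setminus D_{R_n/2}) < C\varepsilon^{1/2},
\end{aligned}$$

where $C > 0$ is a constant independent of $R$, $u_n$, and $\varepsilon$. Therefore, with the notation $F(u) = u\cdot(\partial_1 u \wedge \partial_2 u)$, we have

$$\begin{aligned}
4\pi\,\bigl|\deg(u_n) - (\deg(v_n) + \deg(w_n))\bigr| &\le \int_{D_{R_n}\setminus D_R}|F(u_n)|\,dx + \int_{D_{2R}\setminus D_R}|F(v_n)|\,dx + \int_{D_{R_n}\setminus D_{R_n/2}}|F(w_n)|\,dx\\
&\le E(u_n; D_{R_n}\setminus D_R) + E(v_n; D_{2R}\setminus D_R) + E(w_n; D_{R_n}\setminus D_{R_n/2})\\
&\le 2\varepsilon + 2C\varepsilon^{1/2}.
\end{aligned}$$

Since $\varepsilon$ can be made arbitrarily small and $\deg(u_n)$, $\deg(v_n)$, and $\deg(w_n)$ are integers, we may assume

$$k = \deg(u_n) = \deg(v_n) + \deg(w_n), \quad \forall n. \tag{5.21}$$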

On the other hand, since

$$4\pi\,|\deg(v_n)| \le E(v_n) = E(u_n; D_R) + E(v_n; D_{2R}\setminus D_R) \le E(u_n) + C\varepsilon^{1/2} \le (E_k + 1) + C\varepsilon^{1/2},$$

we see that $\{\deg(v_n)\}$ is bounded. We claim that $\deg(v_n) \neq 0$ for $n$ sufficiently large. Indeed, if $\deg(v_n) = 0$ for infinitely many $n$, then by going to a subsequence when necessary, we may assume that $\deg(v_n) = 0$ for all $n$. Thus in view of (5.21) we see that $\deg(w_n) = k$ for all $n$ and

$$E(w_n) \le E(u_n; \mathbb{R}^2\setminus D_{R_n}) + C\varepsilon^{1/2} = \int_{\mathbb{R}^2\setminus D_{R_n}} f_n(x)\,dx + C\varepsilon^{1/2}. \tag{5.22}$$

Using (5.18) and (5.22), we arrive at

$$E_k \le \limsup_{n\to\infty} E(w_n) \le (1 - t)\lim_{n\to\infty} E(u_n) + \varepsilon + C\varepsilon^{1/2} \le (1 - t)E_k + \varepsilon + C\varepsilon^{1/2}. \tag{5.23}$$

Since $0 < t < 1$ and $\varepsilon$ can be made arbitrarily small, (5.23) implies $E_k = 0$, which contradicts the topological lower bound $E_k \ge 4\pi|k|$ as stated in (5.6).


Similarly, we see that the sequence $\{\deg(w_n)\}$ is also bounded and $\deg(w_n) \neq 0$ for $n$ sufficiently large. Hence, by going to subsequences if necessary, we may assume that there are integers $k_1 \neq 0$ and $k_2 \neq 0$ such that

$$\deg(v_n) = k_1 \quad\text{and}\quad \deg(w_n) = k_2, \quad \forall n. \tag{5.24}$$

Now we have

$$\begin{aligned}
E(v_n) + E(w_n) &= E(u_n; D_R) + E(u_n; \mathbb{R}^2\setminus D_{R_n}) + E(v_n; D_{2R}\setminus D_R) + E(w_n; D_{R_n}\setminus D_{R_n/2})\\
&\le E(u_n) + 2C\varepsilon^{1/2}.
\end{aligned} \tag{5.25}$$

Therefore, it follows from (5.25) directly that

$$E_{k_1} + E_{k_2} \le \lim_{n\to\infty} E(u_n) + 2C\varepsilon^{1/2} = E_k + 2C\varepsilon^{1/2}. \tag{5.26}$$

Combining (5.21), (5.24), and (5.26), we see that (5.15) is established. If (a) (compactness) does not occur at $k = k_1$ or $k = k_2$ for the minimization problem (5.3), we can continue our splitting in the above fashion. This splitting procedure has to stop after finitely many steps because $E_k$ is a finite number and, by (5.6), each summand produced by a splitting is at least $4\pi$. In other words, we will have to stop at an inequality of the type

$$E_k \ge E_{k_1} + \cdots + E_{k_\ell}$$

with $k = k_1 + \cdots + k_\ell$ ($k_s \neq 0$, $s = 1, \ldots, \ell$), and no further splitting of the energies $E_{k_1}, \ldots, E_{k_\ell}$ will be possible. Therefore, (a) (compactness) must occur for the minimization problem (5.3) for $k = k_1, \ldots, k = k_\ell$. In other words, (5.3) is solvable for $k = k_1, \ldots, k = k_\ell$ and (5.16) is established as well.

6. Least-Positive-Energy and Unit-Charge Solitons

Using the inequality $E_k \ge 4\pi|k|$ (cf. (5.6)), we see that $\{E_k\}_{k\in\mathbb{Z}\setminus\{0\}} \subset [4\pi, \infty)$ and that there is an integer $k_0 \ge 1$ such that

$$E_{k_0} = \min\{E_k \mid k \in \mathbb{Z}\setminus\{0\}\}. \tag{6.1}$$

That is, $E_{k_0}$ is the least possible positive energy of the 2D Skyrme model (5.1). For this energy value, we have

Theorem 6.1. The least positive energy $E_{k_0}$ of the 2D Skyrme model is attainable. In other words, for $k = k_0$, the minimization problem (5.3) has a solution.

Proof. We use Theorem 5.2. If (a) (compactness) does not occur when taking $k = k_0$ in the minimization problem (5.3), then in view of Theorem 5.2 we can find two nonzero integers $k_1$ and $k_2$ such that $E_{k_0} \ge E_{k_1} + E_{k_2}$, which is false because $E_{k_1} \ge E_{k_0}$, $E_{k_2} \ge E_{k_0}$, and $E_{k_0} > 0$.

Next, we use Theorem 5.2 to study the attainability of $E_1$ for the 2D Skyrme model following the substantial inequality method used in Sect. 2 for the Faddeev model. We can state


Theorem 6.2. For the 2D static Skyrme model (5.1), the energy E 1 is attainable provided that the coupling constants λ and μ satisfy the bound λμ ≤ 192.

(6.2)

In other words, the minimization problem (5.3) is solvable for $k = \pm 1$ under the condition (6.2).

Proof. First recall that we established [13] a stronger version of the topological lower bound (5.6) which states that there is a positive constant $C(\lambda, \mu, k)$ (i.e., the constant only depends on the coupling parameters $\lambda$ and $\mu$ and the nonzero integer $k$) such that $E(u) \ge 4\pi\,|\deg(u)| + C(\lambda, \mu, \deg(u))$ ($\deg(u) \neq 0$). In particular, we have

$$E_k > 4\pi|k|, \quad k \neq 0. \tag{6.3}$$
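The arithmetic that links the upper estimate (5.7), the strict lower bound (6.3), and the threshold in (6.2) can be sketched as follows. This is an illustration only: it assumes, as the argument for Theorem 6.2 does, that a splitting $1 = k_1 + k_2$ cannot use $k_i \in \{0, \pm 1\}$, and the function names are hypothetical.

```python
import math

def upper_bound_E1(lam_mu):
    # Right-hand side of (5.7) as a function of the product lambda * mu.
    return 4.0 * math.pi * (1.0 + 0.5 * math.sqrt(lam_mu / 3.0))

def min_split_charge(k=1, search=50):
    # Smallest |k1| + |k2| over splittings k = k1 + k2 with k1, k2 not in {0, 1, -1}.
    best = None
    for k1 in range(-search, search + 1):
        k2 = k - k1
        if k1 in (0, 1, -1) or k2 in (0, 1, -1):
            continue
        total = abs(k1) + abs(k2)
        if best is None or total < best:
            best = total
    return best

def critical_product(total_charge):
    # Largest lambda * mu for which upper_bound_E1 <= 4 * pi * total_charge,
    # i.e. sqrt(lambda * mu / 3) <= 2 * (total_charge - 1).
    return 3 * (2 * (total_charge - 1)) ** 2

if __name__ == "__main__":
    m = min_split_charge(1)
    print("minimal |k1| + |k2|:", m)
    print("critical lambda*mu :", critical_product(m))
    print("bound at threshold :", upper_bound_E1(critical_product(m)), "vs", 4 * math.pi * m)
```

With $|k_1| + |k_2| \ge 5$, a splitting would force $E_1 > 20\pi$ by (6.3), while (5.7) keeps $E_1 \le 20\pi$ as long as $\lambda\mu \le 192$; the two cannot hold together.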

Now assume (6.2). If for k = 1 the compactness (the alternative (a)) for a minimizing sequence of (5.3) does not occur, then by Theorem 5.2 there are nonzero integers k1 and k2 so that 1 = k1 + k2 and E 1 ≥ E k1 + E k2 .

(6.4)

Since $E_{k_1} > 0$ and $E_{k_2} > 0$, we see from (6.4) that $k_1 \neq \pm 1$ and $k_2 \neq \pm 1$. However, one of $k_1$ and $k_2$ must be odd. Assume $k_1$ is odd. Then $|k_1| \ge 3$. Since $k_2$ must be even, $|k_2| \ge 2$. Using these facts, (5.7), (6.4), and (6.3), we get

$$4\pi\Bigl(1 + \frac{1}{2}\sqrt{\frac{\lambda\mu}{3}}\Bigr) \ge E_1 > 4\pi(3 + 2), \tag{6.5}$$

which contradicts the condition (6.2).

Note that (6.2) enlarges the range of the product of the coupling parameters $\lambda$ and $\mu$ stated in (5.4) (obtained earlier in [13]) by three times. Thus we see again that the method of substantial inequality is rather powerful. It may be interesting to know whether the minimization problem for the Faddeev model or the Skyrme model in 3D may be modified by introducing coupling parameters in the energy functional as in the 2D Skyrme model. To answer this question, we use the notation (3.8), modify the Faddeev energy as

$$E_{\lambda\mu}(n) = \lambda E_D(n) + \mu E_S(n), \tag{6.6}$$

where λ, μ > 0 are constants, and consider the minimization problem (E λμ )m = inf{E λμ (n) | E λμ (n) < ∞, Q(n) = m}.

(6.7)

Then, using the conformal properties of $E_D(n)$ and $E_S(n)$, we can establish the factorization relation $(E_{\lambda\mu})_m = \sqrt{\lambda\mu}\,E_m$, where $E_m$ denotes the energy infimum stated in (2.3). In other words, the effect of the coupling constants can always be factored away for the minimization problem. Note that the same relation is also valid for the classical 3D Skyrme model.

Acknowledgements. Fanghua Lin was supported in part by NSF grant DMS–0201443. Yisong Yang was supported in part by NSF grant DMS–0406446.


References

1. Aubin, T.: Problèmes isopérimétriques et espaces de Sobolev. J. Diff. Geom. 11, 573–598 (1976)
2. Battye, R.A., Sutcliffe, P.M.: Knots as stable solutions in a three-dimensional classical field theory. Phys. Rev. Lett. 81, 4798–4801 (1998)
3. Battye, R.A., Sutcliffe, P.M.: Solitons, links and knots. Proc. Roy. Soc. A 455, 4305–4331 (1999)
4. Cho, Y.M.: Monopoles and knots in Skyrme theory. Phys. Rev. Lett. 87, 252001 (2001)
5. Evans, L.C.: Weak Convergence Methods for Nonlinear Partial Differential Equations. Regional Conference Series in Math. No. 74, Providence, RI: A. M. S., 1990
6. Faddeev, L.: Einstein and several contemporary tendencies in the theory of elementary particles. In: Relativity, Quanta, and Cosmology, Vol. 1, edited by M. Pantaleo, F. de Finis, New York: Johnson Reprint Co., 1979, pp. 247–266
7. Faddeev, L.: Knotted solitons. In: Proc. Internat. Congress Mathematicians, Vol. I, Beijing: Higher Ed. Press, 2002, pp. 235–244
8. Faddeev, L., Niemi, A.J.: Stable knot-like structures in classical field theory. Nature 387, 58–61 (1997)
9. Jackiw, R.: Quantum meaning of classical field theory. Rev. Mod. Phys. 49, 681–706 (1977)
10. Jackiw, R.: Chern-Simons integral as a surface term. http://arxiv.org/list/math-ph/0408051, 2004
11. Kundu, A., Rybakov, P.: Closed-vortex-type solitons with Hopf index. J. Phys. A Math. Gen. 15, 269–275 (1982)
12. Lin, F., Yang, Y.: Existence of energy minimizers as stable knotted solitons in the Faddeev model. Commun. Math. Phys. 249, 273–303 (2004)
13. Lin, F., Yang, Y.: Existence of two-dimensional Skyrmions via the concentration-compactness method. Comm. Pure Appl. Math. LVII, 1332–1351 (2004)
14. Lions, P.L.: The concentration-compactness principle in the calculus of variations. Part I. Ann. Inst. H. Poincaré – Anal. non linéaire 1, 109–145 (1984); Part II, ibid 1, 223–283 (1984)
15. Manton, N.S.: Geometry of skyrmions. Commun. Math. Phys. 111, 469–478 (1987)
16. Rosen, G.: Minimum value for c in the Sobolev inequality $\|\phi^3\| \le c\,\|\nabla\phi\|^3$. SIAM J. Appl. Math. 21, 30–32 (1971)
17. Shabanov, S.V.: On a low energy bound in a class of chiral field theories with solitons. J. Math. Phys. 43, 4127–4134 (2002)
18. Skyrme, T.H.R.: A nonlinear field theory. Proc. Roy. Soc. A 260, 127–138 (1961)
19. Skyrme, T.H.R.: Particle states of a quantized meson field. Proc. Roy. Soc. A 262, 237–245 (1961)
20. Skyrme, T.H.R.: A unified field theory of mesons and baryons. Nucl. Phys. 31, 556–569 (1962)
21. Skyrme, T.H.R.: The origins of Skyrmions. Internat. J. Mod. Phys. A 3, 2745–2751 (1988)
22. Tartar, L.: Compensated compactness and applications to partial differential equations. In: Nonlinear Analysis and Mechanics: Heriot-Watt Symposium, Vol. IV, edited by R.J. Knops, London: Pitman, 1979, pp. 136–212
23. Talenti, G.: Best constant in Sobolev inequality. Ann. Mat. Pura Appl. 110, 352–372 (1976)
24. Vakulenko, A.F., Kapitanski, L.V.: Stability of solitons in $S^2$ nonlinear σ-model. Sov. Phys. Dokl. 24, 433–434 (1979)
25. Ward, R.S.: Hopf solitons on $S^3$ and $\mathbb{R}^3$. Nonlinearity 12, 241–246 (1999)
26. Whitehead, J.H.C.: An expression of Hopf's invariant as an integral. Proc. Nat. Acad. Sci. 33, 117–123 (1947)
27. Zahed, I., Brown, G.E.: The Skyrme model. Phys. Rep. 142, 1–102 (1986)

Communicated by H.-T. Yau

Commun. Math. Phys. 269, 153–174 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0124-z

Communications in

Mathematical Physics

The Dynamics of Relativistic Strings Moving in the Minkowski Space $\mathbb{R}^{1+n}$

De-Xing Kong1, Qiang Zhang2, Qing Zhou3

1 Department of Mathematics, Shanghai Jiao Tong University, Shanghai 200240, China. E-mail: [email protected]
2 Department of Mathematics, City University of Hong Kong, Hong Kong, China
3 Department of Mathematics, Shanghai Jiao Tong University, Shanghai 200240, China

Received: 10 January 2006 / Accepted: 27 April 2006 Published online: 7 October 2006 – © Springer-Verlag 2006

Abstract: In this paper we investigate the dynamics of relativistic (in particular, closed) strings moving in the Minkowski space R1+n (n ≥ 2). We first derive a system with n nonlinear wave equations of Born-Infeld type which governs the motion of the string. This system can also be used to describe the extremal surfaces in R1+n . We then show that this system enjoys some interesting geometric properties. Based on this, we give a sufficient and necessary condition for the global existence of extremal surfaces without space-like point in R1+n with given initial data. This result corresponds to the global propagation of nonlinear waves for the system describing the motion of the string in R1+n . We also present an explicit exact representation of the general solution for such a system. Moreover, a great deal of numerical analyses are investigated, and the numerical results show that, in phase space, various topological singularities develop in finite time in the motion of the string. Finally, some important discussions related to the theory of extremal surfaces of mixed type in R1+n are given. 1. Introduction Recently the Born-Infeld theory has received much attention mainly due to the fact that the Born-Infeld type Lagrangians naturally appear in string theory and relativity theory. This triggers the revival of interests in the original Born-Infeld electromagnetism (cf. Born and Infeld [4]) and the exploration of Born-Infeld gauge theory (cf. Gibbons [11]). From the mathematical point of view, this theory is a nonlinear generalization of the Maxwell theory. Gibbons [11] gave a systematic study of the Born-Infeld theory and obtained exact solutions in numerous situations. Recently Brenier [5] even carried out a study of the theory in the connection to hydrodynamics. On the other hand, this theory is also related to the theory for the extremal surfaces, which are C 2 surfaces with vanishing mean curvature. 
In mathematics, the extremal surfaces in the Minkowski space include the following four types: space-like, time-like, light-like or mixed types. For the case of the


space-like minimal (or maximal) surfaces, we refer to the classical papers by Calabi [6] and by Cheng and Yau [7]. The case of time-like surfaces has been investigated by several authors (e.g. [1] and [22]). Barbashov, Nesterenko and Chervyakov [1] studied the nonlinear differential equations describing in differential geometry the minimal surfaces in the Minkowski space and provided examples with exact solutions. Milnor [22] generated examples that display considerable variety in the shape of entire time-like minimal surfaces in the 3-dimensional Minkowski space $\mathbb{R}^{1+2}$ and showed that such surfaces need not be planar. Gu investigated the extremal surfaces of mixed type in the n-dimensional Minkowski space (cf. [13]) and constructed many complete extremal surfaces of mixed type in the 3-dimensional Minkowski space (cf. [14]). Recently, Kong et al. re-studied the equation for time-like extremal surfaces in the Minkowski space $\mathbb{R}^{1+n}$, which corresponds to the motion of an open string in $\mathbb{R}^{1+n}$ (see [17, 18]). In addition, for the multidimensional versions Hoppe et al. derived the equation for a classical relativistic open membrane moving in the Minkowski space $\mathbb{R}^{1+3}$, which is a nonlinear wave equation corresponding to the extremal hypersurface equation in $\mathbb{R}^{1+3}$, and gave some special classical solutions (cf. [3, 15]). Recently this equation has been studied successfully by Lindblad [21] and by Chae and Huh [8] in a more general framework. They proved the global existence of smooth solutions for sufficiently small initial data with compact support, using the null forms in Christodoulou and Klainerman's style (cf. [9] and [16]). However, all results mentioned above have been obtained under the assumption that the initial curves (or surfaces) are open. For the case in which the initial curves (or surfaces) are closed, only a few results are known up to now.

In what follows we state our general framework. Let $\Sigma$ be an m-dimensional manifold with coordinates $(\varphi^1, \ldots, \varphi^m)$ and $(M, g)$ be a d-dimensional (possibly curved) Lorentz manifold, and let the action for the world-volume $x = (x^0, x^1, \ldots, x^{d-1}) : \mathbb{R}\times\Sigma \to M$ of the surface moving in $M$ be given by the $(1 + m)$-dimensional volume swept out in space-time

$$S[x] = \int \sqrt{G}\,d\varphi^0\,d\varphi, \tag{1.1}$$

where $\varphi = (\varphi^1, \ldots, \varphi^m)$, and $G$ is defined by

$$G = |\det(G_{\alpha\beta})|, \tag{1.2}$$

in which $G_{\alpha\beta}$ is the induced world-volume metric

$$G_{\alpha\beta} = x^{\mu}_{,\alpha}\,x^{\nu}_{,\beta}\,g_{\mu\nu}(x). \tag{1.3}$$

Here and in the sequel, we use the Einstein summation convention. The index notation will always be the following: $\alpha, \beta, \ldots = 0, 1, \ldots, m$ and $\mu, \nu, \ldots = 0, 1, \ldots, d - 1$, and as usual, a comma in front of an index denotes the partial derivative with respect to the corresponding variable. Here we assume that $(G_{\alpha\beta})$ is non-degenerate. The resulting field equations (i.e., the Euler-Lagrange equations for the action (1.1)) read

$$\frac{1}{\sqrt{G}}\Bigl(\sqrt{G}\,G^{\alpha\beta}x^{\mu}_{,\alpha}\Bigr)_{,\beta} + G^{\alpha\beta}\,x^{\nu}_{,\alpha}\,x^{\rho}_{,\beta}\,\Gamma^{\mu}_{\nu\rho}(x) = 0, \tag{1.4}$$

where $(G^{\alpha\beta})$ is the inverse matrix of $(G_{\alpha\beta})$, and $\Gamma^{\mu}_{\nu\rho}$ denote the Christoffel symbols of the metric $g$ (which vanish if $M$ is the flat Minkowski space and $x^{\mu}$ are the standard coordinates). Equations (1.4) are invariant under both arbitrary reparametrizations $\mathbb{R}\times\Sigma \to \mathbb{R}\times\Sigma$ of the world-volume and isometries $M \to M$.

Dynamics of Relativistic Strings

155

Our ultimate goal is to study Eqs. (1.4). However, as the first step, in the present paper we consider relativistic (in particular, closed) strings moving in the Minkowski space R1+n (n ≥ 2). We first reduce Eqs. (1.4) to a system with n nonlinear wave equations of Born-Infeld type. This system can also be used to describe the extremal surfaces in R1+n . We then show that this system enjoys some interesting geometric properties. Based on this, we give a sufficient and necessary condition for the global existence of extremal surfaces without space-like point in R1+n with given initial data. This result corresponds to the global propagation of nonlinear waves for the system which governs the motion of the string in R1+n . We present a method to construct the explicit exact representation of the general solution for such a system. Moreover, a great deal of numerical results are investigated, and these numerical results show that, in phase space, various topological singularities develop in finite time in the motion of the string. Finally, some important discussions related to the theory of extremal surfaces of mixed type in R1+n are given. Here we would like to mention an important result related to this topic. In their classical monograph [12], Green, Schwarz and Witten pointed out that, by making a convenient choice of gauge, Eq. (1.4) for strings (exactly, Eq. (2.12) below) can be linearized (see Sect. 2.1 in [12]). However, it is not easy to solve such a convenient gauge. It is this point that is one of motivations of the present research. The paper is organized as follows. In Sect. 2, we reduce Eqs. (1.4) to a system with n nonlinear wave equations of Born-Infeld type. Section 3 is devoted to investigating some interesting properties enjoyed by the system. Based on Sect. 3, in Sect. 4 we study the global existence of the classical solutions for this system. In Sect. 5, we present an explicit exact representation of the general solution for such a system. 
Section 6 is devoted to the numerical results which show the formation of the topological singularities in the motion of a closed string. Finally, some important discussions and remarks related to the theory of extremal surfaces of mixed type are given in Sect. 7.

2. The Equations for Relativistic Strings Moving in Minkowski Space $\mathbb{R}^{1+n}$

Let $X = (t, x_1, \ldots, x_n)$ be a position vector of a point in the $(1 + n)$-dimensional Minkowski space $\mathbb{R}^{1+n}$. The scalar product of two vectors $X$ and $\tilde{X} = (\tilde{t}, \tilde{x}_1, \ldots, \tilde{x}_n)$ is

$$X\cdot\tilde{X} = \sum_{i=1}^{n} x_i\tilde{x}_i - t\tilde{t}, \tag{2.1}$$

in particular,

$$X^2 = \sum_{i=1}^{n} x_i^2 - t^2. \tag{2.2}$$

The Lorentz metric of $\mathbb{R}^{1+n}$ reads

$$ds^2 = \sum_{i=1}^{n} dx_i^2 - dt^2. \tag{2.3}$$

In order to be able to describe the motion of a closed string, we consider the local equation of a surface $S$ in $\mathbb{R}^{1+n}$ taking the following parameter form in a suitable coordinate system:

$$x_i = x_i(t, \theta) \quad (i = 1, \ldots, n). \tag{2.4}$$


Then, in the surface coordinates $t$ and $\theta$, the Lorentz metric (2.3) becomes

$$ds^2 = (dt, d\theta)\,M\,(dt, d\theta)^T, \tag{2.5}$$

where

$$M = \begin{pmatrix} |x_t|^2 - 1 & \langle x_t, x_\theta\rangle\\ \langle x_t, x_\theta\rangle & |x_\theta|^2 \end{pmatrix}, \tag{2.6}$$

in which $x = (x_1, \ldots, x_n)$ and

$$\langle x_t, x_\theta\rangle = \sum_{i=1}^{n} x_{i,t}\,x_{i,\theta}, \quad |x_t|^2 = \langle x_t, x_t\rangle \quad\text{and}\quad |x_\theta|^2 = \langle x_\theta, x_\theta\rangle. \tag{2.7}$$

We assume that the surface $S$ is $C^2$ and time-like, i.e.,

$$\det M < 0, \tag{2.8}$$

equivalently,

$$\langle x_t, x_\theta\rangle^2 - (|x_t|^2 - 1)|x_\theta|^2 > 0. \tag{2.9}$$

Thus the area element of $S$ is

$$dA = \sqrt{\langle x_t, x_\theta\rangle^2 - (|x_t|^2 - 1)|x_\theta|^2}\;dt\,d\theta. \tag{2.10}$$

The surface $S$ is called extremal if $x = x(t, \theta)$ is a critical point of the area functional

$$I = \iint \sqrt{\langle x_t, x_\theta\rangle^2 - (|x_t|^2 - 1)|x_\theta|^2}\;dt\,d\theta. \tag{2.11}$$

The corresponding Euler-Lagrange equation reads

$$\left(\frac{|x_\theta|^2 x_t - \langle x_t, x_\theta\rangle x_\theta}{\sqrt{\langle x_t, x_\theta\rangle^2 - (|x_t|^2 - 1)|x_\theta|^2}}\right)_{t} - \left(\frac{\langle x_t, x_\theta\rangle x_t - (|x_t|^2 - 1)x_\theta}{\sqrt{\langle x_t, x_\theta\rangle^2 - (|x_t|^2 - 1)|x_\theta|^2}}\right)_{\theta} = 0. \tag{2.12}$$

By computation, it follows from (2.12) that

$$|x_\theta|^2 x_{tt} - 2\langle x_t, x_\theta\rangle x_{t\theta} + (|x_t|^2 - 1)x_{\theta\theta} = 0. \tag{2.13}$$

Remark 2.1. For the case under consideration, Eqs. (1.4) coincide with Eqs. (2.12). In particular, taking $\theta = x_1$ and $n = 2$, we observe that Eq. (2.13) is just the classical Born-Infeld equation. Therefore, in this sense, (2.13) is called the generalized Born-Infeld equation.

Equations (2.13) contain $n$ nonlinear wave equations of second order. These equations show that the surface is extremal if and only if its mean curvature vector vanishes. Although in the process of deriving (2.13) we assume that the surface is time-like, these equations themselves do not need this assumption. In fact, let

$$\omega = \langle x_t, x_\theta\rangle^2 - (|x_t|^2 - 1)|x_\theta|^2. \tag{2.14}$$


If $\omega > 0$ at a point $P \in S$, then, as before, the surface is said to be time-like at $P$; if $\omega > 0$ at every point $P \in S$, then the surface is entire time-like. If $\omega < 0$ at a point $P \in S$, then the surface is called space-like at $P$; if $\omega < 0$ at every point $P \in S$, then the surface is entire space-like. If $\omega = 0$ at a point $P \in S$, then the surface is said to be light-like at $P$; if $\omega \equiv 0$ at every point $P \in S$, then the surface is entire light-like (essentially, a light cone). The metric is Lorentzian (resp. Riemannian) if the surface is time-like (resp. space-like). Moreover, according to the terminology of PDEs, Eqs. (2.13) are hyperbolic (resp. elliptic) if the surface is time-like (resp. space-like). A connected surface is of mixed type if it contains both a time-like part and a space-like part simultaneously. In this case, Eqs. (2.13) are also of mixed type. It is well known that "time-like" in mathematics corresponds to causality in physics. In the present paper we mainly focus on the study of the dynamics of relativistic strings moving in the Minkowski space $\mathbb{R}^{1+n}$, and the problem under consideration possesses the causality, so throughout this paper we assume that the surface is non-space-like.

3. Properties Enjoyed by the Generalized Born-Infeld Equation

This section is devoted to the study of some interesting properties enjoyed by the generalized Born-Infeld Eq. (2.13). In particular, in this section we assume that the surface is time-like, i.e., (2.9) always holds. Let

$$u = x_t, \quad v = x_\theta, \tag{3.1}$$

where $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$. Then (2.13) can be equivalently rewritten as

$$\begin{cases} u_t - \dfrac{2\langle u, v\rangle}{|v|^2}\,u_\theta + \dfrac{|u|^2 - 1}{|v|^2}\,v_\theta = 0,\\[2mm] v_t - u_\theta = 0, \end{cases} \tag{3.2}$$

for classical solutions, since (2.9) implies $|v|^2 > 0$. Setting

$$U = (u, v)^T, \tag{3.3}$$

we can rewrite (3.2) as

$$U_t + A(U)\,U_\theta = 0, \tag{3.4}$$

where

$$A(U) = \begin{bmatrix} -\dfrac{2\langle u, v\rangle}{|v|^2}\,I_{n\times n} & \dfrac{|u|^2 - 1}{|v|^2}\,I_{n\times n}\\[2mm] -I_{n\times n} & 0 \end{bmatrix}. \tag{3.5}$$

By a direct calculation, the eigenvalues of $A(U)$ are

$$\lambda_1 \equiv \cdots \equiv \lambda_n = \lambda_-, \quad \lambda_{n+1} \equiv \cdots \equiv \lambda_{2n} = \lambda_+, \tag{3.6}$$

where

$$\lambda_\pm = \frac{1}{|v|^2}\Bigl(-\langle u, v\rangle \pm \sqrt{\langle u, v\rangle^2 - (|u|^2 - 1)|v|^2}\Bigr). \tag{3.7}$$
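The eigenvalue formula (3.7) can be checked at a sample state without forming the full matrix. The sketch below assumes the block structure of $A(U)$ as read from (3.5) and tests trial vectors of the form $(-\lambda e_i, e_i)^T$; the helper names and the sample values of $u$ and $v$ are illustrative only.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def char_speeds(u, v):
    # lambda_- and lambda_+ from (3.7); requires the time-like condition (2.9).
    uv, uu, vv = dot(u, v), dot(u, u), dot(v, v)
    disc = uv * uv - (uu - 1.0) * vv
    if vv <= 0.0 or disc <= 0.0:
        raise ValueError("state is not time-like")
    root = math.sqrt(disc)
    return (-uv - root) / vv, (-uv + root) / vv

def apply_A(u, v, w):
    # Multiply the block matrix A(U) of (3.5) by a 2n-vector w = (a, b).
    n = len(u)
    a, b = w[:n], w[n:]
    uv, uu, vv = dot(u, v), dot(u, u), dot(v, v)
    top = [(-2.0 * uv / vv) * a[i] + ((uu - 1.0) / vv) * b[i] for i in range(n)]
    bottom = [-a[i] for i in range(n)]
    return top + bottom

def eigen_residual(u, v, lam, i):
    # Max-norm of A(U) w - lam * w for the trial vector w = (-lam * e_i, e_i).
    n = len(u)
    e = [1.0 if j == i else 0.0 for j in range(n)]
    w = [-lam * x for x in e] + e
    Aw = apply_A(u, v, w)
    return max(abs(Aw[j] - lam * w[j]) for j in range(2 * n))

if __name__ == "__main__":
    u = [0.3, -0.2, 0.1]   # sample value of x_t (assumption)
    v = [1.0, 0.5, -0.4]   # sample value of x_theta (assumption)
    lam_minus, lam_plus = char_speeds(u, v)
    for i in range(len(u)):
        print(i, eigen_residual(u, v, lam_minus, i), eigen_residual(u, v, lam_plus, i))
```

The residuals vanish because $\lambda_\pm$ are the two roots of $|v|^2\lambda^2 + 2\langle u, v\rangle\lambda + (|u|^2 - 1) = 0$, which is what the first block row of $A(U)w = \lambda w$ reduces to for these trial vectors.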


The right eigenvector corresponding to $\lambda_i$ ($i = 1, \ldots, 2n$) can be chosen as

$$r_i = (-\lambda_- e_i, e_i)^T \ (i = 1, \ldots, n), \quad r_i = (-\lambda_+ e_{i-n}, e_{i-n})^T \ (i = n + 1, \ldots, 2n), \tag{3.8}$$

where

$$e_i = (0, \ldots, 0, \overset{(i)}{1}, 0, \ldots, 0) \quad (i = 1, \ldots, n); \tag{3.9}$$

while the left eigenvector corresponding to $\lambda_i$ ($i = 1, \ldots, 2n$) can be taken as

$$l_i = (e_i, \lambda_+ e_i) \ (i = 1, \ldots, n), \quad l_i = (e_{i-n}, \lambda_- e_{i-n}) \ (i = n + 1, \ldots, 2n). \tag{3.10}$$

Summarizing the above argument gives

Property 3.1. Under the assumption (2.9), Eqs. (3.2) consist of a non-strictly hyperbolic system with two n-constant multiple eigenvalues (see (3.6)), and the right (resp. left) eigenvectors can be chosen as (3.8) (resp. (3.10)).

Property 3.2. Under the assumption (2.9), the characteristic fields $\lambda_\pm$ are linearly degenerate in the sense of Lax, that is, the system (3.2) is linearly degenerate (cf. [20]).

Proof. We calculate the invariants $\nabla\lambda_-\cdot r_i$ ($i = 1, \ldots, n$) and $\nabla\lambda_+\cdot r_i$ ($i = n + 1, \ldots, 2n$). For every $i \in \{1, \ldots, n\}$, by a direct calculation, we have

$$\nabla\lambda_-\cdot r_i = \Bigl(\frac{\partial\lambda_-}{\partial u}, \frac{\partial\lambda_-}{\partial v}\Bigr)\cdot(-\lambda_- e_i, e_i)^T = \Bigl(\frac{\partial\lambda_-}{\partial v} - \lambda_-\frac{\partial\lambda_-}{\partial u}\Bigr)\cdot e_i^T \equiv 0. \tag{3.11}$$

Similarly,

$$\nabla\lambda_+\cdot r_i \equiv 0, \quad \forall\, i \in \{n + 1, \ldots, 2n\}. \tag{3.12}$$

Thus, the proof of Property 3.2 is completed.

Remark 3.1. In fact, when $n \ge 2$, the linear degeneracy of $\lambda_\pm$ may follow from Boillat [2] and Freistühler [10] directly.

By Property 3.2, we may introduce the vector fields (eigenvector spaces) corresponding to $\lambda_\pm$,

$$V_- = \operatorname{span}\{r_1, \ldots, r_n\} \quad\text{and}\quad V_+ = \operatorname{span}\{r_{n+1}, \ldots, r_{2n}\}. \tag{3.13}$$

On $V_-$, we define the Lie bracket

$$[r_i, r_j] = \nabla r_i\cdot r_j - \nabla r_j\cdot r_i, \quad \forall\, i, j \in \{1, \ldots, n\}, \tag{3.14}$$

and similarly, on $V_+$,

$$[r_i, r_j] = \nabla r_i\cdot r_j - \nabla r_j\cdot r_i, \quad \forall\, i, j \in \{n + 1, \ldots, 2n\}. \tag{3.15}$$

It is easy to show the following property.

Property 3.3. Under the assumption (2.9), $V_+$ and $V_-$ are two commutative Lie algebras.

On the other hand, we can easily prove the following property.


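Property 3.4. Under the assumption (2.9), the system (3.2) is rich in the sense of Serre [23].

In fact, introduce

$$R_i = u_i + \lambda_- v_i, \quad R_{i+n} = u_i + \lambda_+ v_i \quad (i = 1, \ldots, n). \tag{3.16}$$

It is easy to verify that $R_i$ (resp. $R_{i+n}$) are Riemann invariants corresponding to $\lambda_+$ (resp. $\lambda_-$), and then they satisfy

$$\frac{\partial R_i}{\partial t} + \lambda_+\frac{\partial R_i}{\partial\theta} = 0, \quad \frac{\partial R_{i+n}}{\partial t} + \lambda_-\frac{\partial R_{i+n}}{\partial\theta} = 0 \quad (i = 1, \ldots, n). \tag{3.17}$$

Moreover, $\lambda_-$ (resp. $\lambda_+$) is also a Riemann invariant corresponding to $\lambda_+$ (resp. $\lambda_-$), and $\lambda_\pm$ satisfy

$$\frac{\partial\lambda_-}{\partial t} + \lambda_+\frac{\partial\lambda_-}{\partial\theta} = 0, \quad \frac{\partial\lambda_+}{\partial t} + \lambda_-\frac{\partial\lambda_+}{\partial\theta} = 0. \tag{3.18}$$

The systems (3.17) and (3.18) play an important role in our argument.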
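The linear degeneracy stated in Property 3.2 can be illustrated numerically. The following is a rough sketch, assuming the formula (3.7) for $\lambda_\pm$ and approximating the partial derivatives by central differences; it evaluates $\partial\lambda/\partial v_i - \lambda\,\partial\lambda/\partial u_i$, the quantity appearing in (3.11), at a sample time-like state. All names and sample values are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def speed(u, v, sign):
    # lambda_+ (sign = +1) or lambda_- (sign = -1) from (3.7).
    uv, uu, vv = dot(u, v), dot(u, u), dot(v, v)
    disc = uv * uv - (uu - 1.0) * vv
    return (-uv + sign * math.sqrt(disc)) / vv

def partial(f, u, v, block, i, h=1e-6):
    # Central-difference derivative of f(u, v) in u_i (block "u") or v_i (block "v").
    def shifted(delta):
        uu, vv = list(u), list(v)
        (uu if block == "u" else vv)[i] += delta
        return f(uu, vv)
    return (shifted(h) - shifted(-h)) / (2.0 * h)

def degeneracy_defect(u, v, sign, i):
    # grad(lambda) . r for r = (-lambda e_i, e_i): d(lambda)/dv_i - lambda * d(lambda)/du_i.
    f = lambda a, b: speed(a, b, sign)
    lam = f(u, v)
    return partial(f, u, v, "v", i) - lam * partial(f, u, v, "u", i)

if __name__ == "__main__":
    u = [0.3, -0.2, 0.1]   # sample value of x_t (assumption)
    v = [1.0, 0.5, -0.4]   # sample value of x_theta (assumption)
    for sign in (-1, 1):
        print(sign, [degeneracy_defect(u, v, sign, i) for i in range(len(u))])
```

The printed defects are zero up to finite-difference error. Analytically, differentiating $|v|^2\lambda^2 + 2\langle u, v\rangle\lambda + |u|^2 - 1 = 0$ gives $\partial\lambda/\partial v_i = \lambda\,\partial\lambda/\partial u_i$ for both roots, which is the identity behind (3.11) and (3.12).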

4. Global Propagation of Nonlinear Waves

In this section we investigate the dynamics of relativistic strings moving in the Minkowski space $\mathbb{R}^{1+n}$, and describe the global propagation of nonlinear waves. Nonlinear waves considered here are the $C^2$ solutions of the Cauchy problem for Eqs. (2.13) with the following initial data

$$t = 0 : \quad x = p(\theta), \quad x_t = q(\theta), \tag{4.1}$$

where $p$ is a given $C^2$ vector-valued function and $q$ is a given $C^1$ vector-valued function. In physics, this system governs the motion of a free relativistic string in the Minkowski space $\mathbb{R}^{1+n}$ with the initial position $p(\theta)$ and initial velocity $q(\theta)$. In particular, when $p(\theta)$ and $q(\theta)$ are periodic, this means that the string is closed. Introduce

$$\Lambda_\pm(\theta) = \frac{1}{|p'(\theta)|^2}\Bigl(-\langle q(\theta), p'(\theta)\rangle \pm \sqrt{\langle q(\theta), p'(\theta)\rangle^2 - (|q(\theta)|^2 - 1)|p'(\theta)|^2}\Bigr) \tag{4.2}$$

and

$$L(\theta) = \langle q(\theta), p'(\theta)\rangle^2 - (|q(\theta)|^2 - 1)|p'(\theta)|^2. \tag{4.3}$$

In physics, $\Lambda_\pm(\theta)$ stand for the characteristic propagation speeds of the point $\theta$ at the initial time, and $L(\theta)$ denotes the Lagrangian energy density.

Theorem 4.1. Suppose that there exist two positive constants $\Lambda_*$ and $\Lambda^*$ such that

$$\Lambda_* \le \Lambda_\pm(\theta) \le \Lambda^* \quad\text{and}\quad L(\theta) > 0, \quad \forall\,\theta \in \mathbb{R}. \tag{4.4}$$

Then the Cauchy problem (2.13), (4.1) admits a unique global $C^2$ solution $x = x(t, \theta)$ on $\mathbb{R}^+\times\mathbb{R}$ satisfying

$$L(t, \theta) = \langle x_t(t, \theta), x_\theta(t, \theta)\rangle^2 - (|x_t(t, \theta)|^2 - 1)|x_\theta(t, \theta)|^2 \ge 0, \quad \forall\,(t, \theta) \in \mathbb{R}^+\times\mathbb{R}, \tag{4.5}$$


if and only if, for every fixed θ2 ∈ R, − (θ1 ) < + (θ2 ), ∀ θ1 < θ2 .

(4.6)
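The quantities entering Theorem 4.1 are straightforward to evaluate for concrete data. The following is a minimal sketch, not taken from the paper: it evaluates the initial characteristic speeds of (4.2) and the density of (4.3) pointwise, and tests a grid version of condition (4.6). The function names, the grid, and the sample data (a unit circle with tangential velocity of size `a`) are illustrative assumptions; a grid test can only suggest, not prove, that (4.6) holds.

```python
import math


def initial_speeds(p_prime, q):
    """Return (Lambda_minus, Lambda_plus, L) from (4.2)-(4.3) at one point.

    p_prime and q hold the components of p'(theta) and q(theta).
    """
    qp = sum(a_ * b_ for a_, b_ in zip(q, p_prime))  # <q, p'>
    pp = sum(a_ * a_ for a_ in p_prime)              # |p'|^2
    qq = sum(a_ * a_ for a_ in q)                    # |q|^2
    L = qp * qp - (qq - 1.0) * pp                    # (4.3)
    if pp == 0.0 or L < 0.0:
        raise ValueError("formula (4.2) needs p' != 0 and L >= 0")
    root = math.sqrt(L)
    return (-qp - root) / pp, (-qp + root) / pp, L


def condition_4_6_on_grid(thetas, p_prime_fn, q_fn):
    """Grid version of (4.6): Lambda_-(theta_1) < Lambda_+(theta_2)
    for all grid points theta_1 < theta_2 (thetas must be increasing)."""
    running_max = -math.inf  # largest Lambda_- seen at earlier grid points
    for theta in thetas:
        lam_minus, lam_plus, _ = initial_speeds(p_prime_fn(theta), q_fn(theta))
        if running_max >= lam_plus:
            return False
        running_max = max(running_max, lam_minus)
    return True


# Hypothetical sample data: unit circle, tangential velocity of size a.
a = 0.5
circle_p_prime = lambda th: (-math.sin(th), math.cos(th))
rotating_q = lambda th: (-a * math.sin(th), a * math.cos(th))

grid = [0.01 * k for k in range(629)]
print(initial_speeds(circle_p_prime(0.3), rotating_q(0.3)))
print(condition_4_6_on_grid(grid, circle_p_prime, rotating_q))
```

For this sample data the speeds are constant in $\theta$, so the grid test is trivially satisfied; data with oscillating speeds is where the running-maximum comparison does real work.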

Remark 4.1. In particular, if the initial velocity $q(\theta)$ is less than the light speed, that is, $|q(\theta)| < 1$, $\forall\, \theta \in \mathbb{R}$, and the initial data $p(\theta)$ satisfies $p'(\theta) \ne 0$, $\forall\, \theta \in \mathbb{R}$, then the condition (4.4) is satisfied and $\Lambda_-(\theta) < 0 < \Lambda_+(\theta)$, $\forall\, \theta \in \mathbb{R}$, which, as a consequence, implies that the condition (4.6) is satisfied, and then the conclusion of Theorem 4.1 is true. Moreover, it holds that $\lambda_-(t, \theta) < 0 < \lambda_+(t, \theta)$, where $\lambda_\pm(t, \theta)$ is the $C^1$ solution of the Cauchy problem for the system (3.18) with the initial data $\lambda_\pm(0, \theta) = \Lambda_\pm(\theta)$.

Remark 4.2. The assumption (4.4) implies that the surface is time-like when $t = 0$. The inequality (4.6) is a necessary and sufficient condition guaranteeing the global existence of the extremal surface in $\mathbb{R}^{1+n}$, on which there is no space-like point. That is, under the condition (4.6) there is a unique global extremal surface without space-like points. If (4.6) is not satisfied, then in geometry the surface is no longer time-like and changes its type, for example, from the time-like type to the space-like type. See Sect. 7 for the details.

Remark 4.3. On the other hand, even if the surface is time-like at the initial time, the assumption (4.6) can only guarantee that the surface does not change to be space-like, however it cannot ensure that the surface is always time-like, in other words, the surface may change its type from the time-like type to the light-like type. In particular, even if the initial velocity is less than the light speed, under the assumption (4.6) the speed of the string may achieve the light speed in finite time. For example, we consider Eqs. (2.13) in the Minkowski space $\mathbb{R}^{1+2}$ with the initial data

$$t = 0: \quad x_1 = \cos\theta, \quad x_2 = \sin\theta, \quad x_{1,t} = 0, \quad x_{2,t} = 0.$$

The unique smooth solution reads $x_1 = \cos\theta \cos t$, $x_2 = \sin\theta \cos t$. It is easy to see that $x_\theta = 0$ and $|x_t|^2 = 1$ when $t = k\pi + \pi/2$ $(k = 0, 1, 2, \dots)$, which implies that, when $t$ takes the values $k\pi + \pi/2$ $(k = 0, 1, 2, \dots)$, the speed of the string is the light speed, the surface is light-like at the points $(k\pi + \pi/2, 0, 0)$ in the $(t, x_1, x_2)$-space, and these points are singular points of the surface. But, in this case, the solution $(x_1, x_2) = (\cos\theta \cos t, \sin\theta \cos t)$ is still smooth.
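The example in Remark 4.3 can be checked by direct evaluation. The sketch below is illustrative only (the function names are not from the paper): it evaluates the density appearing in (4.5) on the exact derivatives of $x = (\cos\theta\cos t, \sin\theta\cos t)$. A short hand computation gives $L(t, \theta) = \cos^4 t$ for this solution, which is non-negative and vanishes exactly at $t = k\pi + \pi/2$.

```python
import math


def lagrangian_density(x_t, x_theta):
    """L = <x_t, x_theta>^2 - (|x_t|^2 - 1) |x_theta|^2, as in (4.5)."""
    dot = sum(a * b for a, b in zip(x_t, x_theta))
    speed_sq = sum(a * a for a in x_t)
    stretch_sq = sum(a * a for a in x_theta)
    return dot * dot - (speed_sq - 1.0) * stretch_sq


def collapsing_circle_derivatives(t, theta):
    """Exact derivatives of x = (cos(theta) cos(t), sin(theta) cos(t))."""
    x_t = (-math.cos(theta) * math.sin(t), -math.sin(theta) * math.sin(t))
    x_theta = (-math.sin(theta) * math.cos(t), math.cos(theta) * math.cos(t))
    return x_t, x_theta


for t in (0.0, 0.7, math.pi / 2, 2.0):
    x_t, x_theta = collapsing_circle_derivatives(t, 1.1)
    # The two printed values should agree: L(t, theta) versus cos(t)^4.
    print(t, lagrangian_density(x_t, x_theta), math.cos(t) ** 4)
```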


Corollary 4.1. Suppose that $p(\theta)$ and $q(\theta)$ are periodic and (4.4) is satisfied. Then the Cauchy problem (2.13), (4.1) admits a unique global $C^2$ solution $x = x(t, \theta)$ with (4.5) on $\mathbb{R} \times \mathbb{R}$ if and only if

$$\max_{\theta \in [0, P]} \Lambda_-(\theta) < \min_{\theta \in [0, P]} \Lambda_+(\theta), \tag{4.7}$$

where $P$ is the period of the functions $p(\theta)$ and $q(\theta)$.

Remark 4.4. Corollary 4.1 describes the dynamics of closed relativistic strings moving in the Minkowski space $\mathbb{R}^{1+n}$ and gives a necessary and sufficient condition on the global propagation of nonlinear waves.

Proof of Theorem 4.1. The proof is constructive. We first consider the Cauchy problem for the system (3.18) with the initial data

$$t = 0: \quad \lambda_\pm = \Lambda_\pm(\theta). \tag{4.8}$$

By Kong and Tsuji [19], the Cauchy problem (3.18), (4.8) has a unique global $C^1$ solution $\lambda_\pm = \lambda_\pm(t, \theta)$ on $\mathbb{R}^+ \times \mathbb{R}$ if and only if (4.6) is satisfied; moreover, on the existence domain of the solution, it always holds that

$$\lambda_+(t, \theta) > \lambda_-(t, \theta), \quad \forall\, (t, \theta) \in \mathbb{R}^+ \times \mathbb{R}. \tag{4.9}$$

We then consider the Cauchy problem for the system (3.17) with the following initial data:

$$t = 0: \quad \begin{cases} R_i = q_i(\theta) + \Lambda_-(\theta)\, p_i'(\theta) \equiv R_i^0(\theta), \\ R_{i+n} = q_i(\theta) + \Lambda_+(\theta)\, p_i'(\theta) \equiv R_{i+n}^0(\theta) \end{cases} \quad (i = 1, \dots, n). \tag{4.10}$$

Under the assumption (4.6), the functions $\lambda_\pm = \lambda_\pm(t, \theta)$ can be solved for. In this case the system (3.17) becomes linear. Therefore, by the standard method of characteristics the Cauchy problem (3.17), (4.10) has a unique global $C^1$ solution $R_i = R_i(t, \theta)$ $(i = 1, \dots, 2n)$ on $\mathbb{R}^+ \times \mathbb{R}$. Once $R_i = R_i(t, \theta)$ $(i = 1, \dots, 2n)$ have been obtained, by (3.16) and (4.9), we have

$$u_i = \frac{\lambda_+(t, \theta) R_i(t, \theta) - \lambda_-(t, \theta) R_{i+n}(t, \theta)}{\lambda_+(t, \theta) - \lambda_-(t, \theta)}, \qquad v_i = \frac{R_{i+n}(t, \theta) - R_i(t, \theta)}{\lambda_+(t, \theta) - \lambda_-(t, \theta)}. \tag{4.11}$$

Clearly, $u_i$ and $v_i$ are $C^1$ with respect to $t, \theta$. Noting (3.1) gives

$$x(t, \theta) = p(\theta) + \int_0^t u(s, \theta)\, ds. \tag{4.12}$$

It is easy to verify that (4.12) is the desired $C^2$ solution of the Cauchy problem (2.13), (4.1). Under the assumption (4.6), we prove the inequality (4.5). If $|x_\theta(t, \theta)| = 0$, then it is obvious that the inequality (4.5) is true at these kinds of points. If $|x_\theta(t, \theta)| \ne 0$, by (4.9) and (3.7) we see that (4.5) is also valid. This proves (4.5). The uniqueness comes from the standard theory on quasilinear hyperbolic equations of second order. Thus, the proof of Theorem 4.1 is completed.
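The step from the Riemann invariants back to $(u_i, v_i)$ in (4.11) is just the inversion of the linear map (3.16), which is possible because of the strict inequality (4.9). A minimal sketch of both directions, with illustrative function names that are not part of the paper:

```python
def to_riemann(u, v, lam_minus, lam_plus):
    """(3.16): R_i = u_i + lam_- v_i and R_{i+n} = u_i + lam_+ v_i."""
    R_lo = [ui + lam_minus * vi for ui, vi in zip(u, v)]
    R_hi = [ui + lam_plus * vi for ui, vi in zip(u, v)]
    return R_lo, R_hi


def from_riemann(R_lo, R_hi, lam_minus, lam_plus):
    """(4.11): recover (u, v); requires lam_+ != lam_-, cf. (4.9)."""
    gap = lam_plus - lam_minus
    if gap == 0.0:
        raise ValueError("the inversion needs lam_+ != lam_-")
    u = [(lam_plus * a - lam_minus * b) / gap for a, b in zip(R_lo, R_hi)]
    v = [(b - a) / gap for a, b in zip(R_lo, R_hi)]
    return u, v


# Round trip on arbitrary sample values.
u0, v0 = [0.2, -0.4], [1.0, 0.5]
R_lo, R_hi = to_riemann(u0, v0, -1.3, 0.7)
print(from_riemann(R_lo, R_hi, -1.3, 0.7))
```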


In what follows, we study the global existence of the classical $C^2$ solutions, $x = x(t, \theta)$, satisfying (2.13) on a time-space region of the form $[0, \infty) \times [0, L]$ and obeying either the Dirichlet boundary condition

$$x(t, \theta)\big|_{\theta = 0,\, L} = 0 \quad \text{for } t > 0 \tag{4.13}$$

or the Neumann boundary condition

$$\frac{\partial x}{\partial \theta}(t, \theta)\Big|_{\theta = 0,\, L} = 0 \quad \text{for } t > 0 \tag{4.14}$$

on the side walls together with the initial conditions

$$x(0, \theta) = p(\theta), \quad x_t(0, \theta) = q(\theta), \tag{4.15}$$

where $(p, q) \in C^2([0, L]) \times C^1([0, L])$. In order for $x$ to belong to $C^2([0, \delta] \times [0, L])$ for arbitrarily small $\delta > 0$, the initial data $p$ and $q$ must necessarily satisfy the compatibility conditions

$$p(\theta)\big|_{\theta = 0,\, L} = p''(\theta)\big|_{\theta = 0,\, L} = 0, \quad q(\theta)\big|_{\theta = 0,\, L} = 0, \quad \text{for the Dirichlet conditions,} \tag{4.16}$$

$$\frac{\partial p}{\partial \theta}(\theta)\Big|_{\theta = 0,\, L} = 0, \quad \frac{\partial q}{\partial \theta}(\theta)\Big|_{\theta = 0,\, L} = 0, \quad \text{for the Neumann conditions.} \tag{4.17}$$

We have the following consequences of Corollary 4.1.

Theorem 4.2. (I). Suppose that the compatibility conditions in (4.16) are satisfied and suppose further that

$$\min\Big\{ \inf_{\theta \in [0, L]} \Lambda_+(\pm p(\theta), q(\theta)) \Big\} > \max\Big\{ \sup_{\theta \in [0, L]} \Lambda_-(\pm p(\theta), q(\theta)) \Big\}, \tag{4.18}$$

where $\Lambda_\pm$ are defined by

$$\Lambda_\pm(p(\theta), q(\theta)) = \frac{1}{|p'(\theta)|^2}\left\{ -\langle q(\theta), p'(\theta)\rangle \pm \sqrt{\langle q(\theta), p'(\theta)\rangle^2 - (|q(\theta)|^2 - 1)\,|p'(\theta)|^2} \right\}. \tag{4.19}$$

Then the Dirichlet problem (2.13), (4.13), (4.15) admits a unique global $C^2$ solution $x = x(t, \theta)$ on $[0, \infty) \times [0, L]$; moreover,

$$L(t, \theta) \ge 0, \quad \forall\, (t, \theta) \in [0, \infty) \times [0, L]. \tag{4.20}$$

(II). Suppose that the compatibility conditions in (4.17) are satisfied and suppose further that (4.18) holds. Then the Neumann problem (2.13), (4.14)–(4.15) admits a unique global $C^2$ solution $x = x(t, \theta)$ on $[0, \infty) \times [0, L]$; moreover, the estimate (4.20) still holds.


Remark 4.5. As a special case, if $\Lambda_+(p(\theta), q(\theta)) > 0 > \Lambda_-(p(\vartheta), q(\vartheta))$, $\forall\, \theta, \vartheta \in [0, L]$, then the hypothesis (4.18) is satisfied, and the conclusions of Theorem 4.2 hold.

Proof of Theorem 4.2. For the Dirichlet problem, we extend any $C^2$ solution $x = x(t, \theta)$ to the interval $[-L, L]$ by

$$x(t, \theta) = -x(t, -\theta) \quad \text{for } \theta \in [-L, 0], \tag{4.21}$$

and then extend $x(t, \theta)$ to be $2L$-periodic. One easily checks that if the given initial data has the form in (4.15), the extended initial data is $2L$-periodic and given by

$$(\tilde p(\theta), \tilde q(\theta)) = (x(0, \theta), x_t(0, \theta)) = \begin{cases} (-p(-\theta), -q(-\theta)) & \text{for } \theta \in [-L, 0], \\ (p(\theta), q(\theta)) & \text{for } \theta \in [0, L]. \end{cases} \tag{4.22}$$

When the compatibility conditions in (4.16) are satisfied, this extended $x$ is a $C^2$ solution of Eqs. (2.13) with the initial data $(\tilde p, \tilde q)$. Therefore, by Corollary 4.1, we have

Lemma 4.3. Under the assumptions of Theorem 4.2(I), the Cauchy problem for Eqs. (2.13) with the following $2L$-periodic initial data:

$$x(0, \theta) = \tilde p(\theta), \quad x_t(0, \theta) = \tilde q(\theta) \tag{4.23}$$

admits a unique global $C^2$ solution $x = \tilde x(t, \theta)$ on $\mathbb{R} \times \mathbb{R}$; moreover,

$$\tilde L(t, \theta) = \langle \tilde x_t(t, \theta), \tilde x_\theta(t, \theta)\rangle^2 - (|\tilde x_t(t, \theta)|^2 - 1)\,|\tilde x_\theta(t, \theta)|^2 \ge 0, \quad \forall\, (t, \theta) \in \mathbb{R} \times \mathbb{R}. \tag{4.24}$$

Let $x(t, \theta)$ be the restriction of $\tilde x(t, \theta)$ to the region $[0, \infty) \times [0, L]$. It is obvious that $x(t, \theta)$ is the unique global $C^2$ solution of the Dirichlet problem (2.13), (4.13), (4.15). On the other hand, (4.20) comes from (4.24) directly. This proves Theorem 4.2(I).

Similarly, for the Neumann problem, we extend $x$ by

$$x(t, \theta) = x(t, -\theta) \quad \text{for } \theta \in [-L, 0]. \tag{4.25}$$

Then, $x(t, \theta)$ can be extended to be a $2L$-periodic $C^2$ solution of Eqs. (2.13) with $2L$-periodic initial data given by

$$(\tilde p(\theta), \tilde q(\theta)) = (x(0, \theta), x_t(0, \theta)) = \begin{cases} (p(-\theta), q(-\theta)) & \text{for } \theta \in [-L, 0], \\ (p(\theta), q(\theta)) & \text{for } \theta \in [0, L]. \end{cases} \tag{4.26}$$

When the compatibility conditions in (4.17) are satisfied, by a similar argument as mentioned above, we can prove Theorem 4.2(II). Thus, the proof of Theorem 4.2 is completed.
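The reflection arguments in the proof above amount to an odd extension (Dirichlet case) or an even extension (Neumann case) of the data, followed by $2L$-periodic continuation. A small sketch for one scalar component of the data; the helper name `extend_periodic` and the sample functions are assumptions made for illustration only:

```python
import math


def extend_periodic(f, L, parity):
    """Extend a scalar function f from [0, L] to the real line.

    parity = -1 gives the odd reflection used for the Dirichlet problem,
    parity = +1 the even reflection used for the Neumann problem; the
    result is then continued 2L-periodically.
    """
    def extended(theta):
        # Reduce theta to the fundamental interval [-L, L).
        theta = (theta + L) % (2.0 * L) - L
        if theta >= 0.0:
            return f(theta)
        return parity * f(-theta)
    return extended


L = 2.0
p_component = lambda th: math.sin(math.pi * th / L)   # vanishes at 0 and L
q_component = lambda th: math.cos(math.pi * th / L)   # zero slope at 0 and L

p_odd = extend_periodic(p_component, L, -1)
q_even = extend_periodic(q_component, L, +1)
print(p_odd(-0.5), q_even(3.3))
```

For these sample functions the extensions coincide with the global sine and cosine, which makes the construction easy to sanity-check.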


5. Explicit Exact Representation of General Solutions

This section presents an explicit exact representation, involving two independent arbitrary functions, of the general solution of Eqs. (2.13). Throughout this section, we assume that (4.4) and (4.6) are satisfied. As before, the key point is to solve for $\lambda_\pm(t, \theta)$. We now consider the Cauchy problem (3.18), (4.8). Similar to Serre [23], we define

$$\rho(\theta) = \int_0^\theta \frac{2\, d\xi}{\Lambda_+(\xi) - \Lambda_-(\xi)} \tag{5.1}$$

and let $\theta = \Theta(\sigma)$ be the inverse function of $\sigma = \rho(\theta)$. Similarly, define

$$\Phi(t, \sigma) = \frac{1}{2} \int_0^{\sigma + t} \Lambda_+(\Theta(\xi))\, d\xi - \frac{1}{2} \int_0^{\sigma - t} \Lambda_-(\Theta(\xi))\, d\xi \tag{5.2}$$

and let $\sigma = \Sigma(t, \theta)$ be the inverse function of $\theta = \Phi(t, \sigma)$. The solution of the Cauchy problem (3.18), (4.8) can be explicitly given by

$$\lambda_\pm(t, \theta) = \Lambda_\pm(\Theta(\Sigma(t, \theta) \pm t)). \tag{5.3}$$

We next solve the Cauchy problem for the following linear system:

$$\frac{\partial R_i}{\partial t} + \lambda_+(t, \theta) \frac{\partial R_i}{\partial \theta} = 0, \qquad \frac{\partial R_{i+n}}{\partial t} + \lambda_-(t, \theta) \frac{\partial R_{i+n}}{\partial \theta} = 0 \quad (i = 1, \dots, n) \tag{5.4}$$

with the initial data (4.10), where $\lambda_\pm(t, \theta)$ are given by (5.3), and $p_i$ and $q_i$ are the $i$-th components of $p$ and $q$ respectively. To do so, we consider the following initial value problems for ODEs:

$$\frac{d\zeta_\pm}{d\tau} = \lambda_\pm(\tau, \zeta_\pm(\tau)), \qquad \zeta_\pm(t) = \theta. \tag{5.5}$$

Let $\zeta_\pm = \zeta_\pm(\tau; t, \theta)$ be the solutions of the above initial value problems and define

$$\alpha_\pm(t, \theta) = \zeta_\pm(0; t, \theta). \tag{5.6}$$

Noting (5.3) and using (3.18) yields

$$\alpha_\pm(t, \theta) = \Theta(\Sigma(t, \theta) \mp t). \tag{5.7}$$

By the method of characteristics, the solution of the Cauchy problem (5.4), (4.10) is given by

$$R_i(t, \theta) = R_i^0(\alpha_+(t, \theta)), \qquad R_{i+n}(t, \theta) = R_{i+n}^0(\alpha_-(t, \theta)) \quad (i = 1, \dots, n), \tag{5.8}$$

that is,

$$R_i(t, \theta) = R_i^0(\Theta(\Sigma(t, \theta) - t)), \qquad R_{i+n}(t, \theta) = R_{i+n}^0(\Theta(\Sigma(t, \theta) + t)) \quad (i = 1, \dots, n). \tag{5.9}$$

On the other hand, noting (3.16), we obtain

$$u_i = \frac{\lambda_+(t, \theta) R_i(t, \theta) - \lambda_-(t, \theta) R_{i+n}(t, \theta)}{\lambda_+(t, \theta) - \lambda_-(t, \theta)}, \qquad v_i = \frac{R_{i+n}(t, \theta) - R_i(t, \theta)}{\lambda_+(t, \theta) - \lambda_-(t, \theta)}, \tag{5.10}$$

where $\lambda_\pm(t, \theta)$ and $R_j(t, \theta)$ $(j = 1, \dots, 2n)$ are given by (5.3) and (5.9), respectively. Therefore, the solution of the Cauchy problem (2.13), (4.1) is given by

$$x(t, \theta) = p(\theta) + \int_0^t u(s, \theta)\, ds, \tag{5.11}$$

where $u = (u_1, \dots, u_n)$, in which $u_i$ is defined by the first equality in (5.10). By the assumption (4.6), $x(t, \theta)$ is well defined for all $(t, \theta) \in \mathbb{R}^+ \times \mathbb{R}$. Equation (5.11) gives an explicit exact representation, involving two independent arbitrary vector-valued functions $p(\theta)$ and $q(\theta)$, of the general solution of Eqs. (2.13), where $p(\theta)$ and $q(\theta)$ should be chosen such that the condition (4.6) is satisfied. These solutions describe the global extremal surfaces without space-like points.

Fig. 1. The corresponding extremal surface in $\mathbb{R}^{1+2}$

Fig. 2. The corresponding extremal surface (cone) in $\mathbb{R}^{1+2}$: the left is for +, the right for −

Fig. 3. The extremal surface in the Minkowski space $\mathbb{R}^{1+2}$

6. Numerical Illustrations and Topological Singularity

This section is devoted to the numerical results which show the development of topological singularities in the motion of a closed string. According to the method presented in the last section, we first give some explicit exact solutions, which will play an important role in our numerical analysis.

Example 6.1. Consider Eqs. (2.13) with the initial data

$$t = 0: \quad x_1 = \cos\theta, \quad x_2 = \sin\theta, \quad x_{1,t} = -a \sin\theta, \quad x_{2,t} = a \cos\theta,$$

where $a$ is a constant. By Sect. 5, it is easy to see that the unique smooth solution is $x_1 = \cos(\theta + at) \cos t$, $x_2 = \sin(\theta + at) \cos t$. This implies that $x_1^2 + x_2^2 = \cos^2 t$. In the present situation, the corresponding extremal surface is shown in Fig. 1.
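Example 6.1 can be reproduced numerically from the characteristic representation of Sect. 5. For this data, (4.2) gives the constant speeds $-a \pm 1$, so the characteristics of (3.18) are straight lines and the foot at $t = 0$ of the characteristic with speed $\lambda_\pm$ through $(t, \theta)$ is simply $\theta - \lambda_\pm t$. The sketch below relies on that simplification (it is not a general solver), picks an arbitrary value of $a$, and integrates (5.11) with a midpoint rule; all function names are illustrative assumptions.

```python
import math

a = 0.3                                   # arbitrary illustrative constant
lam_minus, lam_plus = -a - 1.0, -a + 1.0  # (4.2) evaluated on this data


def p_prime(th):
    return (-math.sin(th), math.cos(th))


def q(th):
    return (-a * math.sin(th), a * math.cos(th))


def R0_lo(th):
    """R_i^0 = q_i + Lambda_- p_i', first line of (4.10)."""
    return [qi + lam_minus * pi for qi, pi in zip(q(th), p_prime(th))]


def R0_hi(th):
    """R_{i+n}^0 = q_i + Lambda_+ p_i', second line of (4.10)."""
    return [qi + lam_plus * pi for qi, pi in zip(q(th), p_prime(th))]


def velocity(t, th):
    """u(t, theta) from (5.8) and (5.10), using straight characteristics."""
    R_lo = R0_lo(th - lam_plus * t)    # R_i is transported with speed lam_+
    R_hi = R0_hi(th - lam_minus * t)   # R_{i+n} is transported with speed lam_-
    gap = lam_plus - lam_minus
    return [(lam_plus * lo - lam_minus * hi) / gap for lo, hi in zip(R_lo, R_hi)]


def position(t, th, steps=2000):
    """x(t, theta) = p(theta) + int_0^t u(s, theta) ds, midpoint rule for (5.11)."""
    h = t / steps
    x = [math.cos(th), math.sin(th)]
    for k in range(steps):
        u = velocity((k + 0.5) * h, th)
        x = [xi + h * ui for xi, ui in zip(x, u)]
    return x


def closed_form(t, th):
    return [math.cos(th + a * t) * math.cos(t), math.sin(th + a * t) * math.cos(t)]


print(position(1.2, 0.4), closed_form(1.2, 0.4))
```

The two printed points should agree up to the quadrature error of the midpoint rule.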

Fig. 4. The projection of the extremal surface in the (x, y)-plane

Example 6.2. Consider Eqs. (2.13) with the initial data

$$t = 0: \quad x_1 = \cos\theta, \quad x_2 = \sin\theta, \quad x_{1,t} = \pm\cos\theta, \quad x_{2,t} = \pm\sin\theta.$$

The unique smooth solution reads $x_1 = \cos\theta \pm t \cos\theta$, $x_2 = \sin\theta \pm t \sin\theta$. Clearly, $x_1^2 + x_2^2 = (1 \pm t)^2$. The corresponding extremal surface is shown in Fig. 2.

In what follows, some numerical examples are investigated.

Numerical Example 1. Consider the motion of a unit circle with the initial velocity

$$t = 0: \quad x_t = (0, -0.99 \sin\theta).$$

Figure 3 is the corresponding extremal surface in the Minkowski space $\mathbb{R}^{1+2}$; Fig. 4 is the projection of the extremal surface in the (x, y)-plane; Fig. 5 shows the dynamics of the closed string moving in the Minkowski space $\mathbb{R}^{1+2}$ with the above initial velocity. Here we particularly point out that, in order to understand how the topological singularities appear, in Fig. 5 we use different scales (see Fig. 5 for the details) so that we can see clearly how the topological structure changes even if the closed string becomes small.


Fig. 5. The dynamics of the closed string moving in the Minkowski space R1+2

Numerical Example 2. Consider the motion of a unit circle with an initial velocity

$$t = 0: \quad x_t = (0, -0.99 \sin(8\theta)).$$

Similar to Figs. 3–5, we have the following Figs. 6–8 for this initial data.

Remark 6.1. The above examples show that, in phase space, there are some topological singularities, even if the solution is sufficiently smooth. For example, there are some cross points in the string.

Remark 6.2. In order to investigate the development of topological singularities in the motion of a closed string, we have constructed many numerical examples and made some movies which can clearly show how the topological structure changes for a closed string moving in the Minkowski space $\mathbb{R}^{1+2}$.


Fig. 6. The extremal surface in the Minkowski space $\mathbb{R}^{1+2}$

Fig. 7. The projection of the extremal surface in the (x, y)-plane

Fig. 8. The dynamics of the closed string moving in the Minkowski space $\mathbb{R}^{1+2}$

7. Some Remarks and Discussions

In this section we give some important remarks and discussions related to the theory of extremal surfaces of mixed type.

Remark 7.1. Consider the Cauchy problem (2.13), (4.1). We assume that the initial data (4.1) is smooth and satisfies the light-like condition, that is,

$$\langle q(\theta), p'(\theta)\rangle^2 - (|q(\theta)|^2 - 1)\,|p'(\theta)|^2 = 0, \quad \forall\, \theta \in \mathbb{R}. \tag{7.1}$$

If the Cauchy problem (2.13), (4.1) has a smooth solution $x = x(t, \theta)$ with

$$|x_\theta(t, \theta)| > 0 \tag{7.2}$$

on its existence domain, then the corresponding surface is always light-like on that domain. In fact, since (7.1) holds, it follows from (4.2) that

$$\Lambda_-(\theta) = \Lambda_+(\theta), \quad \forall\, \theta \in \mathbb{R}. \tag{7.3}$$

By a standard argument, we can prove that the Cauchy problem (3.18), (4.8) has a unique smooth solution $\lambda_\pm = \lambda_\pm(t, \theta)$ satisfying

$$\lambda_-(t, \theta) = \lambda_+(t, \theta) \tag{7.4}$$

on the existence domain. By (7.2), it follows from (3.7) that

$$\langle x_t(t, \theta), x_\theta(t, \theta)\rangle^2 - (|x_t(t, \theta)|^2 - 1)\,|x_\theta(t, \theta)|^2 = 0. \tag{7.5}$$

Equation (7.5) implies that the surface is light-like on the existence domain. Summarizing the above argument, we observe that, if the surface is light-like at the initial time, then in general it will always be light-like on its existence domain.

Remark 7.2. Consider the Cauchy problem (2.13), (4.1). If the initial data (4.1) is smooth and satisfies (7.1), and moreover the initial speed is the light speed, i.e., $|q| = 1$, then the smooth solution to the Cauchy problem (2.13), (4.1) always exists globally on $t \ge 0$. In fact, it is easy to check that $x = p(\theta) + q(\theta)\, t$ is the desired solution. This shows that the light cone passing through the curve $x = p(\theta)$ in the plane $t = 0$ is the desired extremal surface.

As pointed out in Remark 4.2, under the assumption (4.4), (4.6) is a necessary and sufficient condition guaranteeing the global existence of the extremal surface in $\mathbb{R}^{1+n}$, on which there is no space-like point. That is, if (4.6) is not satisfied, then in geometry the surface is no longer time-like and changes its type, for example, from the time-like type to the space-like type. In this case, we have to consider the extremal surface of mixed type. It is very important and interesting to study how to construct an extremal surface of mixed type. This problem is worth studying in the future.

Acknowledgements. The authors thank Prof. S.-T. Yau for valuable suggestions. Kong was supported in part by the NNSF of China (Grant No. 10371073) and the NCET of China (Grant No. NCET-05-0390); Zhang was supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, Project CityU 1221/02P; Zhou was supported in part by the NNSF and MST of China.

References

1. Barbashov, B.M., Nesterenko, V.V., Chervyakov, A.M.: General solutions of nonlinear equations in the geometric theory of the relativistic string. Commun. Math. Phys. 84, 471–481 (1982)
2. Boillat, G.: Chocs caractéristiques. C. R. Acad. Sci., Paris, Sér. A 274, 1018–1021 (1972)
3. Bordemann, M., Hoppe, J.: The dynamics of relativistic membranes II: Nonlinear waves and covariantly reduced membrane equations. Phys. Lett. B 325, 359–365 (1994)
4. Born, M., Infeld, L.: Foundation of the new field theory. Proc. Roy. Soc. London A144, 425–451 (1934)
5. Brenier, Y.: Some Geometric PDEs Related to Hydrodynamics and Electrodynamics. Proceedings of ICM 2002 3, 761–772 (2002)
6. Calabi, E.: Examples of Bernstein problems for some nonlinear equations. In: Global Analysis (Proc. Sympos. Pure Math., Vol. XIV, Berkeley, Calif., 1968), Providence, RI: Amer. Math. Soc., 1970, pp. 223–230
7. Cheng, S.Y., Yau, S.T.: Maximal space-like hypersurfaces in the Lorentz-Minkowski spaces. Ann. of Math. 104, 407–419 (1976)
8. Chae, D., Huh, H.: Global existence for small initial data in the Born-Infeld equations. J. Math. Phys. 44, 6132–6139 (2003)
9. Christodoulou, D.: Global solutions of nonlinear hyperbolic equations for small initial data. Comm. Pure Appl. Math. 39, 267–282 (1986)
10. Freistühler, H.: Linear degeneracy and shock waves. Math. Zeit. 207, 583–596 (1991)
11. Gibbons, G.W.: Born-Infeld particles and Dirichlet p-branes. Nucl. Phys. B 514, 603–639 (1998)
12. Green, M.B., Schwarz, J.H., Witten, E.: Superstring theory: 1. Introduction. Second edition, Cambridge Monographs on Mathematical Physics, Cambridge: Cambridge University Press, 1988
13. Gu, C.H.: Extremal surfaces of mixed type in Minkowski space $\mathbb{R}^{n+1}$. In: Variational methods (Paris, 1988), Progr. Nonlinear Differential Equations Appl. 4, Boston, MA: Birkhäuser Boston, 1990, pp. 283–296
14. Gu, C.H.: Complete extremal surfaces of mixed type in 3-dimensional Minkowski space. Chinese Ann. Math. 15B, 385–400 (1994)
15. Hoppe, J.: Some classical solutions of relativistic membrane equations in 4-space-time dimensions. Phys. Lett. B 329, 10–14 (1994)
16. Klainerman, S.: The null condition and global existence to nonlinear wave equations. In: Lectures in Appl. Math. 23, Providence, RI: Amer. Math. Soc., 1986, pp. 293–326
17. Kong, D.X.: A nonlinear geometric equation related to electrodynamics. Europhys. Lett. 66, 617–623 (2004)
18. Kong, D.X., Sun, Q.Y., Zhou, Y.: The equation for time-like extremal surfaces in Minkowski space $\mathbb{R}^{1+n}$. J. Math. Phys. 47, 013503 (2006)
19. Kong, D.X., Tsuji, M.: Global solutions for 2 × 2 hyperbolic systems with linearly degenerate characteristics. Funkcialaj Ekvacioj 42, 129–155 (1999)
20. Lax, P.D.: Hyperbolic systems of conservation laws II. Commun. Pure Appl. Math. 10, 537–556 (1957)
21. Lindblad, H.: A remark on global existence for small initial data of the minimal surface equation in Minkowskian space time. Proc. Amer. Math. Soc. 132, 1095–1102 (2004)
22. Milnor, T.: Entire timelike minimal surfaces in $E^{3,1}$. Michigan Math. J. 37, 163–177 (1990)
23. Serre, D.: Systems of Conservation Laws 2: Geometric Structures, Oscillations, and Initial-Boundary Value Problems, Cambridge: Cambridge University Press, 2000

Communicated by G.W. Gibbons

Commun. Math. Phys. 269, 175–193 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0079-0

Communications in

Mathematical Physics

Borel Summability and Lindstedt Series

O. Costin¹, G. Gallavotti², G. Gentile³, A. Giuliani⁴

1 Department of Mathematics, The Ohio State University, Columbus, Ohio 43210, USA
2 Dipartimento di Fisica, Università di Roma “La Sapienza”, Roma I-00185, Italy
3 Dipartimento di Matematica, Università di Roma Tre, Roma I-00146, Italy
4 Department of Physics, Princeton University, Princeton, New Jersey 08544, USA. E-mail: [email protected]

Received: 15 January 2006 / Accepted: 23 March 2006 Published online: 26 August 2006 – © Springer-Verlag 2006

Abstract: Resonant motions of integrable systems subject to perturbations may continue to exist and to cover surfaces with parametric equations admitting a formal power expansion in the strength of the perturbation. Such series may be, sometimes, summed via suitable sum rules defining $C^\infty$ functions of the perturbation strength: here we find sufficient conditions for the Borel summability of their sums in the case of two-dimensional rotation vectors with Diophantine exponent $\tau = 1$ (e.g. with ratio of the two independent frequencies equal to the golden mean).

1. Introduction

In the paradigmatic setting of KAM theory, one considers unperturbed motions $\varphi \to \varphi + \omega_0(I)\, t$ on the torus $\mathbb{T}^d$, $d \ge 2$, driven by a Hamiltonian $H = H_0(I)$, where $I \in \mathbb{R}^d$ are the actions conjugated to $\varphi$ and $\omega_0(I) = \partial_I H_0(I)$. The standard analytic KAM theorem considers the perturbed Hamiltonian $H_\varepsilon = H_0(I) + \varepsilon f(\varphi, I)$, with $f$ analytic, and a frequency vector $\omega_0$ which is Diophantine with constants $C_0$ and $\tau$ (i.e. $|\omega_0 \cdot \nu| \ge C_0 |\nu|^{-\tau}$ $\forall\, \nu \in \mathbb{Z}^d$, $\nu \ne 0$) and which is among the frequencies of the unperturbed system: $\omega_0 = \omega_0(I_0)$ for some $I_0$. Suppose $H_0(I) = I^2/2$ and $f(\varphi, I) = f(\varphi)$ for simplicity, that is, assume that $H_0$ is quadratic and the perturbation depends only on the angle variables. Then for $\varepsilon$ small enough the unperturbed motion $\varphi \to \varphi + \omega_0 t$ can be analytically continued into a motion of the perturbed system, in the sense that there is an $\varepsilon$-analytic function $h_\varepsilon : \mathbb{T}^d \to \mathbb{T}^d$, reducing to the identity as $\varepsilon \to 0$, such that $\varphi(t) = \psi + \omega_0 t + h_\varepsilon(\psi + \omega_0 t)$ solves the Hamilton equations for $H_\varepsilon$, i.e. $\ddot\varphi + \varepsilon\, \partial_\varphi f(\varphi) = 0$, for any choice of the initial data $\psi$. The function $h_\varepsilon$ (called the conjugation) can be constructed as a power series in $\varepsilon$ (Lindstedt series) and for $\varepsilon$ small convergence can be proved, exploiting cancellations and summation methods typical of quantum field theory, [GBG]. We can call this the maximal KAM theorem, as it deals with invariant tori of maximal dimension.
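The Diophantine condition with exponent $\tau = 1$ mentioned above can be probed numerically in the two-dimensional case. The following sketch is only an illustration: the function name, the cutoff, and the choice of the Euclidean norm for $|\nu|$ are assumptions. It scans a finite box of integer vectors and reports the smallest value of $|\omega \cdot \nu|\, |\nu|$ for a frequency vector whose components have ratio equal to the golden mean. A finite scan cannot prove the bound, but the minimum staying well away from zero as the cutoff grows is consistent with a constant $C_0 > 0$.

```python
import math


def diophantine_constant(omega, cutoff):
    """Smallest |omega . nu| * |nu| over nonzero integer vectors nu
    with |nu_1|, |nu_2| <= cutoff (Euclidean norm used for |nu|)."""
    best = math.inf
    for n1 in range(-cutoff, cutoff + 1):
        for n2 in range(-cutoff, cutoff + 1):
            if n1 == 0 and n2 == 0:
                continue
            value = abs(omega[0] * n1 + omega[1] * n2) * math.hypot(n1, n2)
            best = min(best, value)
    return best


golden = (1.0 + math.sqrt(5.0)) / 2.0
for cutoff in (10, 50, 200):
    print(cutoff, diophantine_constant((1.0, golden), cutoff))
```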
The same methods allow us to study existence of perturbed resonant quasi-periodic motions in quasi-integrable Hamiltonian systems. By resonant here we mean that ω0


satisfies $1 \le s \le d - 1$ rational relations, i.e. it can be reduced to a vector $(\omega, 0)$, $\omega \in \mathbb{R}^{d-s}$, with rationally-independent components, via a canonical transformation acting as a linear integer coefficients map of the angles. In this representation we shall write $\varphi = (\alpha, \beta)$ denoting by $\alpha$ the “fast variables” rotating with angular velocity $\omega$ and by $\beta$ the fixed unperturbed angles (“slow variables”). The study of resonant quasi-periodic motions is mathematically a natural extension of the maximal KAM case and physically it arises in several stability problems. We mention here celestial mechanics, where the phenomenon of resonance locking between rotation and orbital periods of satellites is a simple example (as in this case the resonant torus is one-dimensional, i.e. it describes a periodic motion). In general resonant motions arise in the presence of small friction: the most unstable motions are the maximally quasi-periodic ones (on KAM tori). In the presence of friction the maximally quasi-periodic motions “collapse” into resonant motions with one frequency rationally related to the others, then on a longer time scale one more frequency gets locked to the others and the motion takes place on an invariant torus with dimension lower than the maximal by 2, and so on. Periodic motions are the least dissipative, and eventually the motion becomes maximally resonant, i.e. periodic. This is a scenario among others possible, and its study in particular cases seems to require a good understanding of the properties of the resonant motions of any dimension.
The first mathematical result is that also in the resonant case, if $\omega$ is a $(d - s)$-dimensional Diophantine vector, a conjugation $h_\varepsilon$ can be constructed by summations of the Lindstedt series; however the conjugation $h_\varepsilon$ that one is able to construct is not analytic, but, at best, only $C^\infty$ in $\varepsilon$: in general it is defined only on a large measure Cantor set $E$ of $\varepsilon$’s, $E \subset [-\varepsilon_0, \varepsilon_0]$ for some positive $\varepsilon_0$, so that $C^\infty$ has to be meant in the sense of Whitney. It is commonly believed that a conjugation which is analytic in a domain including the origin does not exist in the resonant case. Moreover, contrary to what happens in the maximal case, not all resonant unperturbed motions with a given rotation vector $\omega$ appear to survive under perturbation, but only a discrete number of them. This is not due to technical limitations of the method, and it has the physical meaning that only points $\beta_0$ which are equilibria for the “effective potential” $(2\pi)^{s-d} \int d\alpha\, f(\alpha, \beta)$ can remain on average at rest in the presence of the perturbation. The following natural (informal) question then arises: “where do the unperturbed motions corresponding to initial data $(\alpha, \beta)$, $\beta \ne \beta_0$, disappear when we switch on the perturbation?” An intriguing scenario is that the tori that seem to disappear in the construction of $h_\varepsilon$ actually “condense” into a continuum of highly degenerate tori near the ones corresponding to the equilibria $\beta_0$. It is therefore interesting to study “uniqueness”, regularity and possible degeneracies of the perturbed tori constructed by the Lindstedt series methods (or by alternative methods, such as Newton’s classical iteration scheme). In the present paper we investigate hyperbolic (see below) perturbed tori with two-dimensional rotation vectors for a class of analytic quasi-integrable Hamiltonians.
Informally, our main result is that the hyperbolic tori, which survive the switching on of the perturbation and are described by a function $h_\varepsilon$ that we construct explicitly, are independent of the procedure used to construct them iteratively (which is not obvious, due to lack of analyticity). Moreover, if $\eta = \sqrt{\varepsilon}$, the conjugation is Borel summable in $\eta$ for $\varepsilon > 0$. The conjugation constructed here is the unique one possible for our problem within the class of Borel summable functions. Of course this does not exclude existence of other less regular quasi-periodic motions, not even existence of other quasi-periodic solutions which admit the same formal power expansion as $h_\varepsilon$.


In order to make our results more precise, we first introduce the model and summarize the results about existence and properties of lower-dimensional perturbed tori as we need in the following. Then we briefly recall the definition and some key properties of Borel summable functions, and finally we state more technically our main result.

1.1. The model. Consider a Hamiltonian system

$$H = \frac{1}{2} A^2 + \frac{1}{2} B^2 + \eta^2 f(\alpha, \beta), \tag{1.1}$$

with $I = (A, B) \in \mathbb{R}^r \times \mathbb{R}^s$ the action coordinates and $\varphi = (\alpha, \beta) \in \mathbb{T}^r \times \mathbb{T}^s$ the conjugated angle coordinates. Let $A_0 = \omega$, $B = 0$ be an unperturbed resonance with $|\omega \cdot \nu| \ge C_0 |\nu|^{-\tau}$ for $\nu \in \mathbb{Z}^r$, $\nu \ne 0$ and for some $C_0, \tau > 0$, and let $((A_0, 0), (\psi + \omega t, \beta_0))$ be the corresponding unperturbed resonant motions with initial angles $\alpha = \psi$, $\beta = \beta_0$. The following result holds.

Proposition 1 ([GG1, GG2]). Let $\beta_0$ be a non-degenerate maximum of the function $f_0(\beta) \equiv (2\pi)^{-r} \int d\alpha\, f(\alpha, \beta)$, i.e. $\partial_\beta f_0(\beta_0) = 0$ and $\partial_\beta^2 f_0(\beta_0) < 0$, and let $\omega \in \mathbb{R}^r$ be a Diophantine vector of constants $C_0, \tau$, i.e. $|\omega \cdot \nu| \ge C_0 |\nu|^{-\tau}$ for all $\nu \in \mathbb{Z}^r$, $\nu \ne 0$. Then for $\varepsilon > 0$, setting $\eta = \sqrt{\varepsilon}$, there exists a function $\psi \to h(\psi, \eta) = (a(\psi), b(\psi))$, vanishing as $\eta \to 0$ and with the following properties.

(i) The functions $t \to \varphi(t) = (\alpha(t), \beta(t)) = (\psi + \omega t, \beta_0) + h(\psi + \omega t, \eta)$ satisfy the equation of motion $\ddot\varphi = -\varepsilon\, \partial_\varphi f(\varphi)$ for any choice of $\psi \in \mathbb{T}^r$.

(ii) The function $h$ is defined for $(\psi, \eta) \in \mathbb{T}^r \times [0, \eta_0]$ and can be analytically continued to a holomorphic function in the domain $D = \{\eta \in \mathbb{C} : \operatorname{Re} \eta^{-1} > \eta_0^{-1}\}$.

(iii) (The analytic continuation of) $h$ is $C^\infty$ in $\eta$ at the origin along any path contained in $D$ and its Taylor coefficients at the origin satisfy, for suitable positive constants $C$ and $D$, the bounds $\left|\frac{1}{k!} \partial_\eta^k h(\psi, 0)\right| < D C^k\, k!^\tau$, where $\tau$ is the Diophantine exponent of $\omega$.

Remarks. (1) The domain $D$ is an open disk in $\mathbb{C}$, centered at $\eta_0/2$ and of radius $\eta_0/2$ (hence tangent to the imaginary axis at the origin).

(2) A function $h$ was constructed in [GG1] by a perturbative expansion in power series in $\varepsilon = \eta^2$ (Lindstedt series) and by exploiting multiscale decomposition, cancellations and summations in order to control convergence of the series. Eventually $h$ is expressed as a new series which is not a power series in $\varepsilon = \eta^2$ and which is holomorphic in $D$. The procedure leading to the convergent resummed series from the initial formal power series in $\varepsilon = \eta^2$ relies on a number of arbitrary choices (which will be made explicit in the next section) and a priori it is not clear that the result is actually independent of such choices. Another (in principle) function $h$ was constructed in [GG2], with a method which applies in more general cases (see item (4) below): we shall see that the two functions in fact coincide.

(3) The function $h(\psi, \eta)$ can be regarded as a function of $\varepsilon = \eta^2$, as in [GG1] and [GG2]. As such the bound in item (iii) would be modified into $\left|\frac{1}{k!} \partial_\varepsilon^k h(\psi, 0)\right| < D C^k\, (2k)!^\tau$, and the analyticity domain in item (ii) would be as described in [GG1], Fig. 1. We also note here that the exponent $2\tau + 1$ in [GG1], Eq. (5.29), was not correct (without consequences as the value of the exponent was just quoted and not exploited), as the right one is $3\tau$: of course for $\tau = 1$ (that is the case we consider in this paper) the two values coincide.


(4) In [GG2] a similar statement was proved for $\beta_0$ a non-degenerate equilibrium point of the function $f_0(\beta)$, i.e. $\beta_0$ not necessarily a maximum. If $\beta_0$ is a maximum the corresponding torus is called hyperbolic, if $\beta_0$ is a minimum it is called elliptic. In the elliptic case for $\varepsilon$ real the domain of definition of $h$ on the real line is $\mathbb{T}^r\times E$, where $E\subset[0,\varepsilon_0]$ is a set with open dense complement in $[0,\varepsilon_0]$ but with 0 as a density point in the sense of Lebesgue. The reason for stating Proposition 1 as above is that in the present paper we shall restrict our analysis to the hyperbolic case.

(5) If $\tau=1$, i.e. if $r=2$ and $\omega$ is quadratically irrational, then the bound on the coefficients of the power expansion of $h$ in $\eta$ at the origin is $|h^{(k)}(\psi)| < D C^k k!$. Hence in this case it is natural to ask for Borel summability of $h$.

(6) Even if we consider only hyperbolic tori, in the following we shall use the method introduced in [GG2], because it is more general and it is the one that one should look at if one tried to extend the analysis to the case of elliptic tori. The main difference between the forthcoming analysis and that of [GG2] is the use of a sharp multiscale decomposition, instead of a smooth one, as it allows further simplifications in the case of hyperbolic tori. We shall come back to this later.

1.2. Borel transforms. Let $F(\eta)$ be a function of $\eta$ which is analytic in a disk centered at $(\frac{1}{2\rho_0},0)$ and radius $\frac{1}{2\rho_0}$ (i.e. centered on the positive real axis and tangent, at the origin, to the imaginary axis), and which vanishes as $\eta\to0$ as $\eta^q$ for some $q>1$. Then one can consider the inverse Laplace transform of the function $z\to F(z^{-1})$, defined for $p$ real and positive and $\rho>\rho_0$ by

\[
\mathcal{L}^{-1}F(p) = \int_{\rho-i\infty}^{\rho+i\infty} e^{zp}\, F\!\left(\frac{1}{z}\right) \frac{dz}{2\pi i}. \tag{1.2}
\]

If $F$ admits a Taylor series at the origin in the form $F(\eta)\sim\sum_{k=2}^{\infty}F_k\eta^k$ then the Taylor series for $\mathcal{L}^{-1}F$ at the origin is

\[
\mathcal{L}^{-1}F(p) \sim \sum_{k=2}^{\infty} F_k\, \frac{p^{k-1}}{(k-1)!}. \tag{1.3}
\]

If the series in (1.3) is convergent then the sum of the series coincides with $\mathcal{L}^{-1}F(p)$ for $p>0$ real. Of course the series expansion (1.3) provides an expression suitable for studying analytic continuation of $\mathcal{L}^{-1}F(p)$ outside the real positive axis. Note that (1.3) makes sense even in the case when the series starts from $k=1$. Then given any formal power series $F(\eta)\sim\sum_{k=1}^{\infty}F_k\eta^k$ we shall define $F_B(p)=\sum_{k=1}^{\infty}F_k\frac{p^{k-1}}{(k-1)!}$ as the Borel transform of $F(\eta)$, whenever the sum defining it is convergent. It is remarkable that in some cases the map $F\to F_B$ is invertible. If this is the case one says that the Taylor series of $F$ is Borel summable and we also say that the function $F$ is Borel summable: this is made precise as follows.

Definition 1. Let a function $\eta\to F(\eta)$ be such that
(i) it is analytic in a disk centered at $(\frac{1}{2\rho_0},0)$ and radius $\frac{1}{2\rho_0}$, and admits an asymptotic Taylor series at the origin where it vanishes,
(ii) its Taylor series at the origin admits a Borel transform $F_B(p)$ which is analytic for $p$ in a neighborhood of the positive real axis, and on the positive real axis grows at most exponentially as $p\to+\infty$,

Borel Summability and Lindstedt Series


(iii) it can be expressed, for $\eta>0$ small enough, as

\[
F(\eta) = \int_0^{+\infty} e^{-p/\eta}\, F_B(p)\, dp. \tag{1.4}
\]

Then we call $F$ Borel summable (at the origin).

Remarks. (1) If $F$ is Borel summable, then one says that $F$ is equal to the Borel sum of its own Taylor series.
(2) For instance one checks that a function $F$ holomorphic at the origin and vanishing at the origin is Borel summable; its Borel transform is entire.
(3) The function $F(\eta)=\sum_{k=1}^{\infty}2^{-k}\frac{\eta}{1+k\eta}$ is not analytic at the origin but it is Borel summable.
(4) If $F,G$ are Borel summable then also $FG$ is Borel summable and $(FG)_B(p)=\int_0^p F_B(p')\,G_B(p-p')\,dp' \equiv (F_B*G_B)(p)$ on the common analyticity domain of $F_B$ and $G_B$, with the integral which can be computed along any path from 0 to $p$ in the common analyticity domain. Of course by definition $|F_B*G_B|\le\big|\,|F_B|*|G_B|\,\big|$, where in the r.h.s. the convolution is along any path from 0 to $p$ in the common analyticity domain of $F_B$ and $G_B$ (note however that now the convolution $|F_B|*|G_B|$ depends on the path).
(5) The Borel transform of $\eta^k$ is $p^{k-1}/(k-1)!$. The Borel transform of $\eta/(1-\alpha\eta)$ is $e^{\alpha p}$. If $F_B(p)=e^{\alpha p}p^{k_1}/k_1!$ and $G_B(p)=e^{\alpha p}p^{k_2}/k_2!$, then $F_B*G_B(p)=e^{\alpha p}p^{k_1+k_2+1}/(k_1+k_2+1)!$.
(6) More generally if $F_i$, $i=1,2$, have Borel transforms $F_{iB}$ bounded in a sector around the real axis and centered at the origin by $|F_{iB}(p)|\le Ce^{\alpha_i|p|+\beta_i|\mathrm{Im}\,p|}|p|^{k_i}/k_i!$, with $\alpha_i$ and $\beta_i$ real, then $|F_{1B}*F_{2B}(p)|\le C^2e^{\alpha|p|+\beta|\mathrm{Im}\,p|}|p|^{k_1+k_2+1}/(k_1+k_2+1)!$, where $\alpha=\max\alpha_i$ and $\beta=\max\beta_i$. We shall make use of this bound repeatedly below.

1.3. Main results. We are now ready to state more precisely our main results.

Proposition 2. Let us consider Hamiltonian (1.1) with $r=2$ and $f(\alpha,\beta)$ an analytic function of its arguments. Let $\beta_0\in\mathbb{T}^s$ satisfy the assumptions of Proposition 1 and $\omega\in\mathbb{R}^2$ be a Diophantine vector with constants $C_0>0$ and $\tau=1$, i.e. $|\omega\cdot\nu|\ge C_0|\nu|^{-1}$ for all $\nu\in\mathbb{Z}^2$, $\nu\neq0$. Then there exists a unique Borel summable function $h(\psi,\eta)$ satisfying properties (i)–(iii) of Proposition 1.

Note that, once existence of a Borel summable function $h(\psi,\eta)$ satisfying properties (i)–(iii) of Proposition 1 is obtained, the uniqueness in the class of Borel summable functions is obvious, by the very definition of Borel summability. In fact all such functions have a Borel transform $h_B(\psi,p)$ that is $p$-analytic in an open domain enclosing $\mathbb{R}_+$ (hence also in a neighborhood of the origin), where they coincide (because they all have the same expansion at the origin); hence they all coincide everywhere.
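The Borel transform and the Laplace integral (1.4) can be illustrated numerically on the example of Remark (5). The following is a minimal sketch, not part of the paper: the function names `borel_transform` and `borel_sum`, the truncation order, the cutoff `p_max` and the use of Simpson's rule are all assumptions made for illustration.

```python
import math

def borel_transform(coeffs):
    """Return F_B(p) = sum_{k>=1} F_k p^(k-1)/(k-1)! for coeffs = [F_1, F_2, ...]."""
    def F_B(p):
        total, term = 0.0, 1.0  # term holds p^(k-1)/(k-1)!
        for k, F_k in enumerate(coeffs, start=1):
            total += F_k * term
            term *= p / k
        return total
    return F_B

def borel_sum(F_B, eta, p_max=40.0, n=20000):
    """Approximate int_0^inf exp(-p/eta) F_B(p) dp by Simpson's rule on [0, p_max].

    p_max and n (even) are illustrative choices; the tail beyond p_max is dropped.
    """
    h = p_max / n
    f = lambda p: math.exp(-p / eta) * F_B(p)
    s = f(0.0) + f(p_max)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3.0

# Example of Remark (5): eta/(1 - alpha*eta) = sum_{k>=1} alpha^(k-1) eta^k,
# whose Borel transform is exp(alpha*p). The series is truncated at 80 terms.
alpha, eta = 0.5, 0.2
coeffs = [alpha ** (k - 1) for k in range(1, 81)]
F_B = borel_transform(coeffs)
approx = borel_sum(F_B, eta)
exact = eta / (1.0 - alpha * eta)
```

Here `approx` and `exact` should agree to high accuracy as long as $\eta<1/\alpha$, which is the condition for the integrand $e^{-p/\eta}e^{\alpha p}$ to decay.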
The proof of Proposition 2 will proceed by showing Borel summability of (one of) the function(s) h(ψ, η) constructed in [GG2]. In particular a corollary of the proof is that h(ψ, η) constructed in [GG2] is independent of the arbitrary choices mentioned in Remark (2) after Proposition 1. Our proof of Borel summability of h does not use Nevanlinna’s theorem, [Ne, So]. In fact we failed in checking that h satisfies the hypothesis of the theorem. Our strategy


goes as follows. We introduce a sequence of approximants $h^{(N)}$ to $h$, that is naturally induced by the multiscale construction of [GG2]. We explicitly check that $h^{(1)}$ is Borel summable and that its Borel transform is entire. Then we show inductively that the analyticity domain of $h_B^{(N)}$ is a neighborhood $B$ of $\mathbb{R}_+$ (not shrinking to 0 as $N\to\infty$) and that $h_B^{(N)}$ grows very fast at infinity in $B$ (in general faster than exponential). However the results of Proposition 1 imply that the growth of $h_B^{(N)}$ on the positive real line is uniformly bounded by an exponential. Then Borel summability of $h$ follows by performing the limit $N\to\infty$ and using uniform bounds that we shall derive on the approximants and on their Borel transforms.

In the next section we will recall the structure and the properties of the resummed series obtained in [GG2], defining the function $h(\psi,\eta)$ of Proposition 1. In Sect. 3 we define the sequence $h^{(N)}$ of approximants and we show that the inverse Laplace transform of $h^{(N)}$ is uniformly bounded by an exponential on the positive real line. In Sect. 4 we prove Borel summability of $h(\psi,\eta)$. In Appendix A1 we discuss how to improve the bounds in Sect. 4 in the case in which the perturbation $f(\alpha,\beta)$ in (1.1) is a trigonometric polynomial in $\alpha$. Finally, in Appendix A2 we show that the same result applies to the function $h$ constructed in [GG1]: this allows us to identify the functions constructed with the two methods of [GG1] and [GG2], since they are both Borel summable and admit the same formal expansion at the origin. Note that the conjugation functions constructed in [GG1] and [GG2] admit the same formal expansion at the origin simply because they were obtained by two different summation schemes of the same formal Taylor series.

2. Lindstedt Series

Denote by $a(\psi)$, $b(\psi)$ the $\alpha$, $\beta$ components of $h$, respectively.
In [GG2] an algorithm is described to construct order by order in $\eta^2$ the solution to the homologic equation

\[
\begin{aligned}
(\omega\cdot\partial_\psi)^2 a(\psi) &= \eta^2\,\partial_\alpha f(\psi+a(\psi),\beta_0+b(\psi)),\\
(\omega\cdot\partial_\psi)^2 b(\psi) &= \eta^2\,\partial_\beta f(\psi+a(\psi),\beta_0+b(\psi)).
\end{aligned} \tag{2.1}
\]

The resulting series, called the “Lindstedt series”, is widely believed to be divergent. A summation procedure has been found which collects its terms into families until a convergent series is obtained. The resummed series (no longer a power series) can be described in terms of suitably decorated tree graphs, i.e. $h$ can be expressed as a sum of values of tree graphs:

\[
h_{\gamma,\nu}(\eta) = \sum_{\theta\in\Theta_{\nu,\gamma}} \mathrm{Val}(\theta), \tag{2.2}
\]

where $h_\nu$ is the $\nu$-th coefficient in the Fourier series for $h$, and $\gamma=1,\dots,2+s$ labels the component of the vector $h_\nu$ (recall that $2+s$ is the number of degrees of freedom of our Hamiltonian, 2 being the number of “fast variables” $\alpha$ and $s$ being the number of “slow variables” $\beta$). $\Theta_{\nu,\gamma}$ is the set of decorated trees contributing to $h_{\gamma,\nu}(\eta)$ and, given $\theta\in\Theta_{\nu,\gamma}$, $\mathrm{Val}(\theta)$ is its value, both still to be defined. We now describe the rules to construct the tree graphs and to compute their value. We shall need the explicit structure in the proof of Borel summability in the next sections, and this is why we are reviewing it here. Given the rules below one can formally check that the sum (2.2) is a solution to the Hamilton equations, see [GG2]. A few differences (in fact simplifications) arise here with respect to [GG2], and we provide some


details with the aim of making the discussion self-consistent. Essentially, the changes consist of: (i) shifting the order of factors in products appearing in the definition of the values $\mathrm{Val}(\theta)$ to an order that makes it easier to organize the recursive evaluation of several Borel transforms; see remarks following (2.9), and (ii) using a sharp multiscale decomposition; see item (f) below.

Consider a tree graph (or simply tree) $\theta$ with $k$ nodes $v_1,\dots,v_k$ and one root $r$, which is not considered a node; the tree lines are oriented towards the root (see Fig. 1). The line entering the root is called the root line. We denote by $V(\theta)$ and $\Lambda(\theta)$ the set of nodes and the set of lines in $\theta$, respectively.

(a) On each node $v$ a label $\nu_v\in\mathbb{Z}^2$, called the mode label, is appended.
(b) To each line $\ell$ a pair of labels $\eta_\ell=(\gamma_\ell,\gamma'_\ell)$ is attached. $\gamma_\ell$ and $\gamma'_\ell$ are called the left or right component labels, respectively: $\gamma_\ell\in(1,\dots,2+s)$ is associated with the left endpoint of $\ell$ and $\gamma'_\ell\in(1,\dots,2+s)$ with the right endpoint (in the orientation toward the root, see Fig. 1). The label $\gamma_\ell$ associated with the root line will be denoted by $\gamma(\theta)$.
(c) Each node $v$ will have $p_v\ge0$ entering lines $\ell_1,\dots,\ell_{p_v}$. Hence with the node $v$ we can associate the left component labels $\gamma_{v1},\dots,\gamma_{vp_v}$ of the entering lines, and an extra label $\gamma_{v0}=\gamma'_\ell$ attached to the right endpoint of the line $\ell$ exiting from $v$. Thus a tensor $\partial_{\gamma_{v0}\gamma_{v1}\cdots\gamma_{vp_v}}f_{\nu_v}(\beta_0)$ can be associated with each node $v$, with $\partial_\gamma$ denoting the derivative with respect to $\beta_\gamma$ if $\gamma>2$ and multiplication by $i\nu_\gamma$ if $\gamma\le2$.
(d) A momentum $\nu_\ell$ is associated with each line $\ell=v'v$ oriented from $v$ to $v'$: this is a vector in $\mathbb{Z}^2$ defined as $\nu_\ell=\sum_{w\le v}\nu_w$. The root momentum, that is the momentum through the root line, will be denoted by $\nu(\theta)$.
(e) A number label $k_\ell\in\{1,\dots,|\Lambda(\theta)|\}$ is associated with each line $\ell$, with $\cup_{\ell\in\Lambda(\theta)}\{k_\ell\}=\{1,\dots,|\Lambda(\theta)|\}$. The number label is used for combinatorial purposes: two trees differing only because of the number labels are still considered distinct.
(f) Each line $\ell$ also carries a scale label $n_\ell=-1,0,1,2,\dots$: this is a number which determines the size of the small divisor $\omega\cdot\nu_\ell$, in terms of an exponentially decreasing sequence $\{\gamma_p\}_{p=0}^{\infty}$ of positive numbers that we shall introduce in a moment. If $\nu_\ell=0$ then $n_\ell=-1$. If $|\omega\cdot\nu_\ell|\ge\gamma_0$ then $n_\ell=0$, and we say that the line $\ell$ (or else $\omega\cdot\nu_\ell$) is on scale 0. If $\gamma_p\le|\omega\cdot\nu_\ell|<\gamma_{p-1}$ for some $p$ then $n_\ell=p$, and

Fig. 1. A tree $\theta$ with 12 nodes; one has $p_{v_0}=2$, $p_{v_1}=2$, $p_{v_2}=3$, $p_{v_3}=2$, $p_{v_4}=2$. The length of the lines should be the same but it is drawn of arbitrary size. The separated line illustrates the way to think of the label $\eta_\ell=(\gamma_\ell,\gamma'_\ell)$


we say that the line $\ell$ (or else $\omega\cdot\nu_\ell$) is on scale $p$. The sequence $\{\gamma_p\}_{p=0}^{\infty}$ is such that $\gamma_p\in C_0[2^{-p-2},2^{-p-1})$ for all $p\ge0$ and, furthermore, $\omega\cdot\nu$ not only stays bounded below by $C_0|\nu|^{-1}$ (because of the Diophantine condition) but it stays also “far” from the values $\gamma_p$ for $\nu$ not too large, i.e. for $|\nu|$ at most of order $2^p$; cf. [GG] for a proof of the existence of the sequence (without further assumptions on $\omega$). Precisely,

\[
\begin{aligned}
&(1)\quad |\omega\cdot\nu|\ge C_0|\nu|^{-1}, && 0\neq\nu\in\mathbb{Z}^2,\\
&(2)\quad \min_{0\le p\le n}\big|\,|\omega\cdot\nu|-\gamma_p\,\big| > C_0 2^{-n} && \text{if } n\ge0,\ 0<|\nu|\le 2^{(n-3)}.
\end{aligned} \tag{2.3}
\]

Note that the definition of scale of a line depends on the arbitrary choice of the sequence $\{\gamma_p\}$: we could as well have used a sequence scaling as $\gamma_p\sim\gamma^{-p}$, with $\gamma$ any number $>1$ instead of $\gamma=2$; or we could have used a smooth cutoff function (as in [GG2]) replacing the sharp cutoff function $\mathbb{1}(\gamma_p\le|\omega\cdot\nu|<\gamma_{p-1})$ implied in the definition above.
(g) The scale labels allow us to define hierarchically ordered clusters. A cluster $T$ of scale $n$ is a maximal connected set of lines on scales $n'$, with $n'\le n$, containing at least one line on scale $n$. The lines which are connected to a line of $T$ but do not belong to $T$ are called the external lines of $T$: according to their orientations, one of them will be called the exiting line of $T$, while all the others will be the entering lines of $T$. All the external lines are on scales $n'$ with $n'>n$. The set of lines of $T$, called the internal lines of $T$, will be denoted by $\Lambda(T)$ and the set of nodes of $T$ will be denoted by $V(T)$.
(h) Not all arrangements of the labels are permitted. The “allowed trees” will have no nodes with 0 momentum and with only one entering line and the exiting line also carrying 0 momentum. We also discard trees which contain clusters with only one entering line and one exiting line with equal momentum and with no line with 0 momentum on the path joining the entering and exiting lines (“self-energy” clusters or “resonances”).

Remark. One can verify that chains of self-energy clusters can actually appear in the initial formal Lindstedt series. One of the main points in [GG1] and [GG2] is to show that if one modifies the series by discarding all chains of self-energy clusters, then the resulting series is convergent (a form of Bryuno’s lemma that appears in KAM theory). In both [GG1] and [GG2] it is shown that, in order to deal with chains of self-energy diagrams one can iteratively resum them into the propagators (i.e. the factors associated with the tree lines in the value of a tree, see below for a definition), that will then turn out to be different from those appearing in the naive formal Lindstedt series (which are simply $(\omega\cdot\nu)^{-2}$). Such resummation is the analogue of Dyson’s equations in quantum field theory and the iteratively modified propagator has been, therefore, called the dressed propagator.

Here there is further freedom in the choice of the self-energy clusters. The idea is that the self-energy clusters must include the “diverging contributions” affecting the initial formal power series. But if we change the definition of self-energy clusters by adding to the class a new class of non-diverging clusters, the construction can be shown to go through as well. We find the specific choice above convenient but this is of course not necessary. This is the second arbitrary choice we make in the iterative construction of the resummed series. It can fuel doubts about the uniqueness of the result which can only be dismissed by further arguments (like the Borel summability that we are proving).

The set of all allowed trees with labels $\gamma(\theta)=\gamma$ and $\nu(\theta)=\nu$ is denoted by $\Theta_{\nu,\gamma}$ (this is the set appearing in (2.2)). The labels described above are used to define the


value $\mathrm{Val}(\theta)$ of a (decorated) tree $\theta\in\Theta_{\nu,\gamma}$: this is a number obtained by multiplying the following factors:
(1) a factor $F_v=\partial_{\gamma_{v0}\gamma_{v1}\cdots\gamma_{vp_v}}f_{\nu_v}(\beta_0)$, called the node factor, per each node $v$;
(2) a factor $g_\ell^{[n_\ell]}=g^{[n_\ell]}_{\gamma_\ell,\gamma'_\ell}(\omega\cdot\nu_\ell;\eta)$, called the propagator, per each line $\ell$ of scale $n_\ell$, momentum $\nu_\ell$ and component labels $\gamma_\ell,\gamma'_\ell$, see items (I)–(V) below for a definition.

The value is then defined as

\[
\mathrm{Val}(\theta) = \frac{1}{|\Lambda(\theta)|!}\Bigg(\prod_{v\in V(\theta)}F_v\Bigg)\Bigg(\prod_{\ell\in\Lambda(\theta)}g_\ell^{[n_\ell]}\Bigg), \tag{2.4}
\]

where it should be noted that all labels $\gamma$ (of the tensors $F_v$ and of the matrices $g_\ell^{[n_\ell]}$) appear repeated twice because they appear in the propagators as well as in the tensors associated with the nodes, with the exception of the label $\gamma_\ell$ associated with the left endpoint of the line ending in the root (as the root is not a node and therefore there is no tensor associated with it). Adopting the convention of summation over repeated component labels, $\mathrm{Val}(\theta)$ depends on the root label $\gamma(\theta)=\gamma=1,\dots,2+s$ so that it defines a vector in $\mathbb{C}^{2+s}$.

The recursive definition of the propagators is such that the series in (2.2) is convergent and gives the $\nu$-th Fourier component of the function $h(\psi,\eta)$ in Sect. 1. The definition of propagators we adopt here is the same introduced in [GG2]: the definition in [GG1] is slightly different (see Appendix A2), but it has the drawback that it is specific for hyperbolic resonances, while the definition in [GG2] can be (expected to be) extended also to the theory of elliptic resonances and, therefore, might turn out to be useful in view of possible extensions of the main results of this work to elliptic resonances.

(I) For $n=-1$ the propagator of the line $\ell$ is defined as the block matrix

\[
g^{[-1]} \overset{\mathrm{def}}{=} \begin{pmatrix} 0 & 0\\ 0 & \big(-\partial_\beta^2 f_0(\beta_0)\big)^{-1}\end{pmatrix}. \tag{2.5}
\]

(II) For $n=0$, if the line $\ell$ carries a momentum $\nu$ and if $x\overset{\mathrm{def}}{=}\omega\cdot\nu$, the propagator is the matrix

\[
g^{[0]} = g^{[0]}(x;\eta) \overset{\mathrm{def}}{=} \frac{\eta^2}{x^2+\eta^2M_0}, \tag{2.6}
\]

with $M_0\overset{\mathrm{def}}{=}\begin{pmatrix}0&0\\0&-\partial_\beta^2f_0(\beta_0)\end{pmatrix}$. By the assumptions of Proposition 2 one has $M_0\ge0$.

(III) For $n>0$ the propagator is the matrix

\[
g^{[n]} = g^{[n]}(x;\eta) \overset{\mathrm{def}}{=} \frac{\eta^2}{x^2+M^{[\le n]}(x;\eta)}, \tag{2.7}
\]

with $M^{[\le n]}(x;\eta)=M^{[0]}(x;\eta)+M^{[1]}(x;\eta)+\cdots+M^{[n]}(x;\eta)$, where $M^{[0]}(x;\eta)=\eta^2M_0$, whereas $M^{[j]}(x;\eta)$, $j\ge1$, are matrices, called self-energy matrices, whose expansion in $\eta$ starts at order $\eta^4$ and are defined as described in the next two items.


(IV) Let $T$ be a self-energy cluster on scale $n$ (see item (h) above) and let us define the matrix¹ $V_T(\omega\cdot\nu;\eta)$ as

\[
V_T(\omega\cdot\nu;\eta) = -\frac{\eta^2}{|\Lambda(T)|!}\Bigg(\prod_{v\in V(T)}F_v\Bigg)\Bigg(\prod_{\ell\in\Lambda(T)}g_\ell^{[n_\ell]}\Bigg), \tag{2.8}
\]

where, necessarily, $n_\ell\le n$ for all $\ell\in\Lambda(T)$. The matrix (2.8) will be called the self-energy value of $T$. The set of the self-energy clusters with value proportional to $\eta^{2k}$, hence with $k-1$ internal lines with $n_\ell\ge0$, and with maximum scale label $n$ will be denoted $S^R_{k,n}$.

(V) The self-energy matrices $M^{[n]}(x;\eta)$, $n\ge1$, are defined recursively for $|x|\le\gamma_{n-1}$ (i.e. for $x$ on scale $\ge n$) as

\[
M^{[n]}(x;\eta) = \sum_{k=2}^{\infty}\ \sum_{T\in S^R_{k,n-1}} V_T(x;\eta), \tag{2.9}
\]

where the self-energy values are evaluated by means of the propagators on scales $p$, with $p=-1,0,1,\dots,n-1$.

Remarks. (1) With respect to [GG2] the second argument of the propagators (and of the self-energy values and matrices) has been denoted $\eta$ instead of $\varepsilon$; we recall that the variable $\eta^2$ appearing here is the same as the variable $\varepsilon$ appearing in [GG1] and [GG2]. We make this choice because it is natural to study Borel summability in $\eta$ and not in $\varepsilon=\eta^2$.
(2) The association of the factors $\eta^2$ with the lines themselves rather than with the nodes (as in [GG1] and [GG2]) will be more convenient when considering the Borel transforms of the involved quantities.
(3) The multiscale decomposition used in [GG2] may look quite different from the one we are using here, but this is not really so. First, even though the decomposition in [GG2] was based on the propagator divisors $\Delta^{[n]}(x;\varepsilon)=\min_j|x^2-\lambda_j^{[n]}(\varepsilon)|$, where the self-energies $\lambda_j^{[n]}(\varepsilon)$ were defined recursively in terms of the self-energy matrices, in the case of hyperbolic tori one has identically $\Delta^{[n]}(x;\varepsilon)=x^2$: indeed all self-energies which are non-zero are strictly negative. Then the only real difference is that here we are using a sharp decomposition instead of a smooth one, but the latter is not a relevant difference. In fact the choice of the sequence $\{\gamma_p\}_{p=0}^{\infty}$ implies that the lines appearing in the groups of graphs that will be collected together to exhibit the necessary cancellations have currents $\nu$ such that $\nu\cdot\omega$ stays relatively far from the extremes of the intervals $[\gamma_{p+1},\gamma_p]$ that define the scale labels, and this allows us to use a sharp multiscale decomposition instead of the smooth one used in [GG2]. In other words this change with respect to [GG2] is done only to avoid introducing partitions of unity by smooth functions and the related discussions.

Therefore the expression (2.2) makes sense and in fact the function $h$ mentioned in Proposition 1 is exactly the Fourier sum of the r.h.s. of (2.2). In particular in [GG2] it was proved that the Fourier sum of the r.h.s. of (2.2) satisfies the properties (i)–(iii) in Proposition 1.

¹ This is a matrix because the self-energy cluster inherits the labels $\gamma$, $\gamma'$ attached to the left of the entering line and to the right of the exiting line.
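The sharp scale assignment of item (f) in this section can be sketched in a few lines. This is an illustration only: the function name `scale_label`, the frequency vector and the threshold sequence below are hypothetical, and the sample thresholds are not claimed to satisfy the separation property (2.3).

```python
import math

def scale_label(omega, nu, gammas):
    """Scale label n of a line with momentum nu, following item (f).

    gammas = [gamma_0, gamma_1, ...] is a decreasing sequence of positive thresholds.
    Returns -1 if nu = 0, 0 if |omega.nu| >= gamma_0, and p if
    gamma_p <= |omega.nu| < gamma_{p-1}.
    """
    if nu[0] == 0 and nu[1] == 0:
        return -1
    x = abs(omega[0] * nu[0] + omega[1] * nu[1])
    # gammas is decreasing, so the first threshold not exceeding x fixes the scale
    for p, gamma_p in enumerate(gammas):
        if x >= gamma_p:
            return p
    raise ValueError("small divisor below the last threshold; extend gammas")

# Hypothetical data: a quadratic irrational frequency ratio and dyadic thresholds.
omega = (1.0, (math.sqrt(5.0) - 1.0) / 2.0)
gammas = [2.0 ** (-p - 1) for p in range(40)]
labels = {nu: scale_label(omega, nu, gammas)
          for nu in [(0, 0), (1, 0), (-1, 2), (-3, 5)]}
```

Because the cutoff is sharp, each nonzero momentum receives exactly one scale, which is the simplification with respect to the smooth decomposition of [GG2] mentioned in Remark (3).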


3. Integral Representation of the Resummed Lindstedt Series

Given the definitions of Sect. 2 consider the function $h^{(N)}$ defined in the same way as $h$ but restricting the sum in (2.2) to the trees containing only lines of scale $n\le N$. The functions $h^{(N)}$ have the “same” convergence and analyticity properties of the functions $h$ and the same bounds on the Taylor coefficients at the origin. Moreover $h^{(N)}\to h$ as $N\to\infty$: this is a consequence of the intermediate steps in the proof of the above proposition in [GG2], as the strategy of the proof is to define $h^{(N)}$ making sure that the convergence and analyticity properties are uniform in $N$. In fact $h^{(N)}$ is even analytic in $\eta$ near the origin for $|\eta|\le\eta_N$ (but $\eta_N\to0$ as $N\to\infty$).

Therefore the functions $h^{(N)}$ are trivially Borel summable, and have an entire Borel transform, but the growth at $p\to+\infty$ of their Borel transforms is $N$-dependent while, to show Borel summability of $h$, uniform estimates are needed. This section is devoted to a first attempt at such bounds which uses minimally the information on the resummed series that can be gathered from [GG2], i.e. the convergence properties just mentioned. The Borel transform of the functions $h^{(N)}(\psi,\eta)$ is an entire function that can be written for $p$ real and positive as

\[
(h^{(N)})_B(\psi,p) = \mathcal{L}^{-1}h^{(N)}(\psi,p) = \int_{\eta_N^{-1}-i\infty}^{\eta_N^{-1}+i\infty} e^{pz}\, h^{(N)}\!\left(\psi,\frac{1}{z}\right)\frac{dz}{2\pi i}, \qquad p\in\mathbb{R}_+, \tag{3.1}
\]

where $\eta_N$ is the convergence radius of $h^{(N)}(\psi,\eta)$. The key remark is that we also know that by property (ii) in Proposition 1 (actually by the same property for $h^{(N)}$ that follows from the construction in [GG2]) the function $h^{(N)}(\psi,\frac{1}{z})$ is analytic for $|z|>2\eta_0^{-1}$, so that the integral in (3.1) can be shifted to a contour on the vertical line with abscissa $\bar\rho>2\eta_0^{-1}$, i.e. with $N$-independent abscissa. Therefore

\[
(h^{(N)})_B(\psi,p) = \int_{\bar\rho-i\infty}^{\bar\rho+i\infty} e^{pz}\, h^{(N)}\!\left(\psi,\frac{1}{z}\right)\frac{dz}{2\pi i}, \qquad p\in\mathbb{R}_+, \tag{3.2}
\]

and, for all $N$, the function $h^{(N)}(\psi,\frac{1}{z})$ is uniformly bounded by $O(|z|^{-2})$ on the integration contour, because, by property (iii) in Proposition 1, $h$ is twice differentiable at the origin along any path contained in $D$ (in particular along the circular path $\mathrm{Re}\,\eta^{-1}=\bar\rho$). Hence the latter boundedness property of $h^{(N)}(\psi,\frac{1}{z})$ and (3.2) imply the bound, for $p>0$ and for a suitable constant $C$,

\[
|(h^{(N)})_B(\psi,p)| \le C\,e^{\bar\rho p}, \qquad \forall p\in\mathbb{R}_+, \tag{3.3}
\]

and for all $\bar\rho>2\eta_0^{-1}$. The existence of the limit $\lim_{N\to\infty}h^{(N)}(\psi,\frac{1}{z})=h(\psi,\frac{1}{z})$ for $|z|$ small (by [GG2]) implies existence of the limit $F(\psi,p)$ as $N\to\infty$ of $(h^{(N)})_B(\psi,p)$ for $p\in\mathbb{R}_+$ and $F(\psi,p)$ satisfies the bound (3.3) on $\mathbb{R}_+$. Hence the functions $h(\psi,\eta)$ can be expressed, for $0\le\eta<\eta_0$, as

\[
h(\psi,\eta) = \lim_{N\to\infty}\int_0^{\infty} e^{-p/\eta}\,(h^{(N)})_B(\psi,p)\,dp = \int_0^{\infty} e^{-p/\eta}\,F(\psi,p)\,dp, \tag{3.4}
\]

which provides us with an integral representation of the resummed series and shows that the resummation (2.2) generates a Borel sum of the formal Lindstedt series provided


$F(\psi,p)$ can be shown to be analytic in a neighborhood of the positive axis, as required by the very definition of Borel summability, see property (ii) in Definition 1 of Sect. 1.2. Note that, because of property (iii) with $\tau=1$ in the statement of Proposition 1, the functions $(h^{(N)})_B(\psi,p)$ are analytic in an $N$-uniform neighborhood of the origin, and so is $F(\psi,p)$. We are then left with showing that $F(\psi,p)$ can be analytically extended to a neighborhood of the positive axis. In order to prove this we will show that the approximants $(h^{(N)})_B(\psi,p)$ are analytic in an $N$-uniform strip around the positive axis and that there they admit $N$-uniform bounds: by Vitali’s convergence theorem we will then conclude that the limit $F(\psi,p)$ of the approximants is analytic in the same strip and it satisfies the same bounds in its analyticity domain.

4. Borel Summability

By the remark at the end of Sect. 3 Borel summability of $h$ will be established once the natural candidate for its Borel transform, namely the function $F$ in (3.4), is shown to be analytic in a region containing the positive real axis. Here we will prove that the analyticity domain of $F(\psi,p)$ (which is not trivial, since by construction it contains a neighborhood of the origin) can be extended to a strip of finite width around the real axis. The proof of this claim will be based on an inductive assumption on $(g^{[n]})_B(x;p)$ formulated by introducing the matrix $\tilde g^{[n]}(x;\eta)\overset{\mathrm{def}}{=}\frac{\eta^2}{x^2+M^{[\le n]}(x;\eta)}$ for all $|x|<\gamma_{n-1}$.

Note that if $\chi_n(x)=\mathbb{1}(\gamma_n\le|x|<\gamma_{n-1})$ is the indicator function of the scale of $x$, the propagator $g^{[n]}(x;\eta)$ is given by $g^{[n]}(x;\eta)=\chi_n(x)\,\tilde g^{[n]}(x;\eta)$. Furthermore the matrices $\tilde g^{[n]}(x;\eta)$ satisfy the recursive equations

\[
\big(\tilde g^{[n]}(x;\eta)\big)^{-1} = \big(\tilde g^{[n-1]}(x;\eta)\big)^{-1} + \eta^{-2}M^{[n]}(x;\eta), \qquad \forall|x|<\gamma_{n-1}. \tag{4.1}
\]

We suppose, inductively, that for $\kappa_0=\sqrt{\|M_0\|}$ and $|\mathrm{Im}\,p|<\sigma$ for a suitable $\sigma$ one has

\[
\big\|(\tilde g^{[n]})_B(x;p)\big\| \le K_0\,|p|\,x^{-2}\,e^{(c_n+c'_n|x|^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p||x|^{-1}}, \qquad \forall|x|<\gamma_{n-1},\ n\ge0, \tag{4.2}
\]

where the matrix norm is $\|M\|=\max_j\sum_i|M_{ij}|$ and the growth of the coefficients $c_n$, $c'_n$ will be specified below. Note that

\[
(\tilde g^{[0]})_B(x;p) = \frac{1}{x^2}\,\frac{\sin\!\big(p\sqrt{M_0x^{-2}}\big)}{\sqrt{M_0x^{-2}}}, \tag{4.3}
\]

so at the first step (4.2) is valid with $K_0=\sqrt{s}$ (where $s$ is the dimension of the non-trivial block in $M_0$) and $c_0=c'_0=0$. The constant $K_0$ comes from our choice of the matrix norm $\|M\|=\max_j\sum_i|M_{ij}|$: with this choice for any $d\times d$ matrix we have $d^{-1/2}\|M\|_2\le\|M\|\le d^{1/2}\|M\|_2$, where $\|\cdot\|_2$ is the spectral norm, so that in particular $\|M^{-1}\sin M\|,\ \|\cos M\|\le d^{1/2}$. Assuming the inductive assumption (4.2) to be valid for $n\le N-1$, we remark that this implies a bound on $(\tilde g^{[N]})_B(x;p)$ via the expansion

\[
(\tilde g^{[N]})_B(x;p) = \Bigg(\tilde g^{[N-1]}(x;\eta)\sum_{m=0}^{\infty}\Big(-\eta^{-2}M^{[N]}(x;\eta)\,\tilde g^{[N-1]}(x;\eta)\Big)^m\Bigg)_B. \tag{4.4}
\]
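The closed form (4.3) can be checked numerically in the scalar case, where $M_0$ is replaced by a positive number. The sketch below is an illustration under that assumption; the names `g0_borel_series` and `g0_borel_closed`, the scalar `m0` and the sample values are not from the paper.

```python
import math

def g0_borel_series(x, m0, p, terms=60):
    """Borel transform of eta^2/(x^2 + eta^2*m0), summed term by term.

    eta^2/(x^2 + eta^2 m0) = sum_{m>=0} (-m0)^m eta^(2m+2) / x^(2m+2),
    and each power eta^k is mapped to p^(k-1)/(k-1)!.
    """
    total = 0.0
    for m in range(terms):
        total += ((-m0) ** m * p ** (2 * m + 1)
                  / (math.factorial(2 * m + 1) * x ** (2 * m + 2)))
    return total

def g0_borel_closed(x, m0, p):
    """Closed form (4.3) in the scalar case m0 > 0."""
    w = math.sqrt(m0) / abs(x)  # plays the role of sqrt(M0 x^-2)
    return math.sin(p * w) / (w * x * x)

# Hypothetical sample values for a real p.
x, m0, p = 0.3, 2.0, 1.5
series_value = g0_borel_series(x, m0, p)
closed_value = g0_borel_closed(x, m0, p)
```

For real $p$ the closed form also makes the first step of (4.2) visible in the scalar case, since $|\sin t|\le|t|$ gives $|(\tilde g^{[0]})_B(x;p)|\le|p|x^{-2}$.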


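Taking the Borel transform and performing all convolutions along a straight line from 0 to $p$ we get

\[
\big\|(\tilde g^{[N]})_B(x;p)\big\| \le \big\|(\tilde g^{[N-1]})_B(x;p)\big\| * \sum_{k=0}^{\infty}\Big(\big\|(\eta^{-2}M^{[N]})_B(x;p)\big\| * \big\|(\tilde g^{[N-1]})_B(x;p)\big\|\Big)^{*k} \tag{4.5}
\]

with $f^{*m}\overset{\mathrm{def}}{=}f*\cdots*f$ ($m$ times). Now $(\tilde g^{[N-1]})_B(x;p)$ is bounded using the inductive assumption (4.2), while $(\eta^{-2}M^{[N]})_B(x;p)$ is estimated via (2.9), that is

\[
\big\|(\eta^{-2}M^{[N]})_B(x;p)\big\| \le \sum_{k=2}^{\infty}\ \sum_{T\in S^R_{k,N-1}} \frac{1}{|\Lambda(T)|!}
\Bigg(\prod^{*}_{\substack{\ell\in\Lambda(T)\\ n_\ell\ge0}} K_0\,|p|\,x_\ell^{-2}\, e^{(c_{n_\ell}+c'_{n_\ell}\gamma_{n_\ell}^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p|\gamma_{n_\ell}^{-1}}\Bigg)
\Bigg(\prod_{\substack{\ell\in\Lambda(T)\\ n_\ell=-1}}\big\|g^{[-1]}\big\|\Bigg)
\Bigg(\prod_{v\in V(T)}\|F_v\|\Bigg), \tag{4.6}
\]

where $x_\ell=\omega\cdot\nu_\ell$, the $\prod^{*}$ is a convolution product and

\[
\|F_v\| = \max_{\gamma_{v0}}\ \sum_{\gamma_{v1},\gamma_{v2},\dots,\gamma_{vp_v}} \big|(F_v)_{\gamma_{v0},\gamma_{v1}\gamma_{v2}\dots\gamma_{vp_v}}\big|. \tag{4.7}
\]

Hence using the bound

\[
\prod^{*}_{\substack{\ell\in\Lambda(T)\\ n_\ell\ge0}} |p|\,e^{(c_{n_\ell}+c'_{n_\ell}\gamma_{n_\ell}^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p|\gamma_{n_\ell}^{-1}}
\le \frac{|p|^{2k-3}}{(2k-3)!}\, e^{(c_{N-1}+c'_{N-1}\gamma_{N-1}^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p|\gamma_{N-1}^{-1}}, \tag{4.8}
\]

and using the estimates in [GG2] to control the sum over the self-energy clusters, the bound becomes

\[
\big\|(\eta^{-2}M^{[N]})_B(x;p)\big\| \le \sum_{k=2}^{\infty}\Gamma^{2k}\,\frac{|p|^{2k-3}}{(2k-3)!}\, e^{(c_{N-1}+c'_{N-1}\gamma_{N-1}^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p|\gamma_{N-1}^{-1}}\, e^{-2\kappa2^{N}}
\le D_0\,|p|\,e^{d_N|p|}\,e^{-\kappa2^{N}}, \tag{4.9}
\]

where $\Gamma$ and $\kappa$ are suitable constants derived in Appendix A3 of [GG2], $D_0=\Gamma^4$, $d_N=\Gamma+c_{N-1}+c'_{N-1}\gamma_{N-1}^{-1/2}$ and, using that $\gamma_{N-1}^{-1}\le4C_0^{-1}2^{N}$, we chose $|\mathrm{Im}\,p|$ so small that $4\kappa_0C_0^{-1}|\mathrm{Im}\,p|\le\kappa$.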
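The last inequality in (4.9) rests on an elementary series bound that may be worth spelling out. As a sketch, write $\Gamma$ for the constant taken from [GG2] (the symbol is an assumption of this note) and introduce the shorthands $a=c_{N-1}+c'_{N-1}\gamma_{N-1}^{-1/2}$ and $b=\kappa_0|\mathrm{Im}\,p|\gamma_{N-1}^{-1}$, which are not in the original:

```latex
% Shorthands (assumptions of this note, not notation of the paper):
%   a = c_{N-1} + c'_{N-1}\gamma_{N-1}^{-1/2},   b = \kappa_0 |\mathrm{Im}\,p| \gamma_{N-1}^{-1}
\begin{align*}
\sum_{k\ge 2}\Gamma^{2k}\frac{|p|^{2k-3}}{(2k-3)!}
  &= \Gamma^{3}\sum_{j\ge 0}\frac{(\Gamma|p|)^{2j+1}}{(2j+1)!}
   = \Gamma^{3}\sinh(\Gamma|p|)
   \le \Gamma^{4}\,|p|\,e^{\Gamma|p|},\\
b &\le \kappa_0\,|\mathrm{Im}\,p|\,4C_0^{-1}2^{N} \le \kappa\,2^{N}
   \quad\Longrightarrow\quad e^{b}\,e^{-2\kappa 2^{N}} \le e^{-\kappa 2^{N}},\\
\sum_{k\ge 2}\Gamma^{2k}\frac{|p|^{2k-3}}{(2k-3)!}\,e^{a|p|+b}\,e^{-2\kappa 2^{N}}
  &\le \Gamma^{4}\,|p|\,e^{(\Gamma+a)|p|}\,e^{-\kappa 2^{N}}
   = D_0\,|p|\,e^{d_N|p|}\,e^{-\kappa 2^{N}}.
\end{align*}
```

The first line uses $\sinh t\le t\,e^{t}$ for $t\ge0$, and the last line identifies $D_0=\Gamma^4$ and $d_N=\Gamma+a$, in agreement with the constants quoted after (4.9).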


Remark. The step leading from (4.6) and (4.8) to (4.9) is non-trivial and the possibility of bounding the small divisors $x_\ell^2$ and the sum over self-energy clusters, after defining the propagators as above, is the main technical aspect of the work in [GG2]. We take the existence of $\Gamma$ from [GG2]. We do not repeat here the analysis performed in Sects. 5 and 6 (and the corresponding Appendices) of [GG2]: the constant $\Gamma^2$ has been called $\varepsilon^{-1}$ in Theorem 1 of [GG2].

Using (4.2) and (4.6) in (4.5) we can get a bound on $(\tilde g^{[N]})_B(x;p)$:

\[
\big\|(\tilde g^{[N]})_B(x;p)\big\| \le K_0\,|p|\,x^{-2}\,e^{(c_{N-1}+c'_{N-1}|x|^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p||x|^{-1}}
* \sum_{k=0}^{\infty}\Big(D_0\,|p|\,e^{d_N|p|}\,e^{-\kappa2^{N}} * K_0\,|p|\,x^{-2}\,e^{(c_{N-1}+c'_{N-1}|x|^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p||x|^{-1}}\Big)^{*k}, \tag{4.10}
\]

and, since $|p|*(|p|*|p|)^{*k}=\frac{|p|^{4k+1}}{(4k+1)!}$ for $k\ge0$, the $k$-th term in the sum is bounded by

\[
K_0\,\frac{|p|}{x^2}\,\frac{(K_0D_0e^{-\kappa2^{N}})^k}{x^{2k}}\,\frac{|p|^{4k}}{(4k+1)!}\,e^{(d_N+c'_{N-1}|x|^{-1/2})|p|+\kappa_0|\mathrm{Im}\,p||x|^{-1}}. \tag{4.11}
\]
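The convolution identity $|p|*(|p|*|p|)^{*k}=|p|^{4k+1}/(4k+1)!$ used for (4.11) can be verified exactly for small $k$ on polynomials in a real variable $p\ge0$. The following is a minimal sketch; the helper names `convolve`, `conv_power` and `lhs` and the coefficient-list representation are assumptions of the illustration.

```python
from fractions import Fraction
from math import factorial

def convolve(f, g):
    """(f*g)(p) = int_0^p f(q) g(p-q) dq for polynomials given as coefficient lists.

    Uses p^a * p^b = a! b! / (a+b+1)! * p^(a+b+1) on each pair of monomials.
    """
    out = [Fraction(0)] * (len(f) + len(g))
    for a, fa in enumerate(f):
        for b, gb in enumerate(g):
            out[a + b + 1] += fa * gb * Fraction(factorial(a) * factorial(b),
                                                 factorial(a + b + 1))
    return out

def conv_power(f, k):
    """k-fold convolution power f^{*k} for k >= 1."""
    result = f
    for _ in range(k - 1):
        result = convolve(result, f)
    return result

p = [Fraction(0), Fraction(1)]  # the polynomial p
pp = convolve(p, p)             # p*p = p^3/3!

def lhs(k):
    """p * (p*p)^{*k}, with the k = 0 term read as p itself."""
    return p if k == 0 else convolve(p, conv_power(pp, k))
```

Exact rational arithmetic is used so that the comparison with $1/(4k+1)!$ involves no rounding.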

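Summing (4.11) over $k\ge0$ and comparing the result with the inductive assumption (4.2), we see that we can take $c_N=d_N=\Gamma+c_{N-1}+c'_{N-1}\gamma_{N-1}^{-1/2}$ and $c'_N=c'_{N-1}+K_0D_0e^{-\kappa2^{N}}$. Solving the iterative equations for $c_N$, $c'_N$, we see that, for $x$ on scale $n$, $(\tilde g^{[n]})_B(x;p)$ can be bounded as

\[
\big\|(\tilde g^{[n]})_B(x;p)\big\| \le K_0\,|p|\,x^{-2}\,e^{c2^{n/2}|p|+\kappa_1|\mathrm{Im}\,p|2^{n}}, \qquad \gamma_n\le|x|<\gamma_{n-1},\ n\ge0, \tag{4.12}
\]

for some constants $c$, $\kappa_1$. Plugging this bound into the expansion for $h_B^{(N)}(\psi,p)$, denoting by $N(\theta)$ the maximal scale in $\theta$ and choosing $|\mathrm{Im}\,p|$ small enough, we finally get

\[
\begin{aligned}
|(h^{(N)})_B(\psi,p)| &\le \sum_{k=1}^{\infty}\sum_{\nu}\sum_{n_0\ge0}\ \sum_{\substack{\theta\in\Theta_{\nu,\gamma}\\ N(\theta)=n_0}} \frac{1}{|\Lambda(\theta)|!}
\Bigg(\prod^{*}_{\substack{\ell\in\Lambda(\theta)\\ n_\ell\ge0}}\frac{K_0}{x_\ell^2}\,|p|\,e^{c2^{n_\ell/2}|p|+\kappa_1|\mathrm{Im}\,p|2^{n_\ell}}\Bigg)
\Bigg(\prod_{\substack{\ell\in\Lambda(\theta)\\ n_\ell=-1}}\big\|g^{[-1]}\big\|\Bigg)
\Bigg(\prod_{v\in V(\theta)}\|F_v\|\Bigg)\\
&\le \sum_{k=1}^{\infty}\sum_{n_0\ge0}\Gamma^{2k}\,\frac{|p|^{2k-1}}{(2k-1)!}\,e^{c2^{n_0/2}|p|+\kappa_1|\mathrm{Im}\,p|2^{n_0}}\,e^{-2\kappa2^{n_0}}
\le \Gamma'\,|p|\,e^{c'|p|^2},
\end{aligned} \tag{4.13}
\]

for some new constants $\Gamma'$, $c'$ and where we chose $|\mathrm{Im}\,p|<\sigma$, $\sigma=\kappa/\kappa_1$.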


The conclusion of the previous discussion is that the functions $(h^{(N)})_B(\psi,p)$, which we already knew to be analytic in a neighborhood of the origin, can be analytically continued to a strip of width $2\sigma$ around the real positive axis, where they admit the $N$-uniform bounds (4.13). By Vitali’s convergence theorem, the limit $F(\psi,p)$ of $(h^{(N)})_B(\psi,p)$ as $N\to\infty$ is an analytic function in the same domain satisfying the same bound

\[
|F(\psi,p)| \le \Gamma'\,|p|\,e^{c'|p|^2}, \qquad \text{for } |\mathrm{Im}\,p|\le\sigma. \tag{4.14}
\]

As remarked after (3.3), a consequence of the results of [GG2] is that $F(\psi,p)$ satisfies the bounds (3.3) $\forall p\in\mathbb{R}_+$. Then, by the very definition of Borel summability, using the representation (3.4) and the analyticity of $F(\psi,p)$ proved in this section, we find that $h(\psi,\eta)$ is Borel summable (in $\eta$).

5. Concluding Remarks

(1) It is interesting to stress that the recursive bounds on the Borel transform of the propagators in (4.5) have been derived without making use of the cancellations that played such an essential role in the theory in [GG2] (and [GG1]) and by “undoing” at each step the resummations which led to the construction of $h$ and to the proof of Proposition 1, see (4.4) above. However the properties of $h$ and the result of Proposition 1 have played a key role in the derivation of (3.3) and (3.4). Without the uniform bounds (in $N$) on the convergence radii in $p$ of the series expressing $(h^{(N)})_B$, which depend on the cancellations and on the resummations, the bounds in Sect. 4 or, in the trigonometric case, of Appendix A1, would remain the same but they would be useless for our purposes of establishing (3.4) and the Borel summability.
(2) It has been remarked above that the resummation procedure is based on several arbitrary a priori choices which therefore may lead to the existence of several solutions $h$ with the same asymptotic series at $\varepsilon=0$. All the choices in [GG2], as well as that used in [GG1] (see Appendix A2), lead however to a Borel summable series: this proves that all solutions coincide and the results of the resummations are independent of the particular choices provided the Diophantine constant $\tau$ is $\tau=1$ (hence $r=2$ and $\omega$ is a Diophantine vector with $\tau=1$, e.g. $\omega_1/\omega_2$ is a quadratic irrational).
(3) If $\tau>1$, in particular if $r>2$, the problem of Borel summability and of independence of the result from the summation method remains open.
(4) The existence or non-existence of solutions $h$ which are $C^\infty$ at the origin and give solutions to the equations of motion but which are not Borel summable is also an open problem. Note that even in the case of non-resonant motions, i.e. in the case of maximal tori, the problem of the uniqueness at fixed $\omega$ is not trivial, and also the recent results in [BT] do not exclude the possibility of other quasi-periodic motions besides those constructed through the KAM algorithm.
(5) The case $\varepsilon<0$ with $\beta_0$ a minimum point for $f_0(\beta)$, i.e. of elliptic motions, is quite different. We have used on purpose the resummation technique of [GG2] which works for hyperbolic as well as for elliptic resonances, but the present results still only apply to the hyperbolic case and it is not clear whether the above techniques can be extended to prove Borel summability of the parametric equations for $h$ in elliptic cases.
(6) In the case $f$ is a trigonometric polynomial the bounds of the previous section can be improved, the result being that $F(\psi,p)$ is an entire function, satisfying


|F(ψ, p)| ≤ 2 | p|e| p|e given in Appendix A1.

c | p|

in the whole complex plane. A proof of this claim is

Appendix A1. Polynomial Perturbations In this appendix we want to prove that, in the case f is a trigonometric polynomial, the bounds in Sect. 4 can be improved to show that F(ψ, p) is an entire function of p, with explicit bounds on its grow at infinity. Assume, inductively, that ∀x = ω · ν the functions (M[≤k] ) B (x; p) are entire functions of p and [≤k] M ≤ | p|edk | p| , for all x on scale k , with k ≥ k ≥ 0, (x; p) B (A1.1) for all p complex. Note that (M[0] ) B ( p) = pM0 (so that the inductive assumption in (A1.1) is valid at the first step k = 0 with d0 = 0 if ≥ ||M0 ||) and (g [0] ) B (x; p) is given √ by (4.3): in particular (g [0] ) B (x; p) ≤ |xp|2 e| p|c0 , ∀|x| ≥ γ0 with c0 = ||M0 ||/γ0 . Supposing the inductive assumption to be valid for k = 1, . . . , N − 1 we remark that this implies a bound on (g [k] ) B (x; p) for k = 1, . . . , N − 1 via the expansion (g [k] ) B (x; p) =

∞ η2 −M[≤k] (x; ·) m . x2 x2 B

(A1.2)

m=0

Taking the Borel transform and performing all convolutions along a straight line from 0 to p, for x on scale 1 ≤ k ≤ N − 1, we get (g [k] ) B (x; p) ≤ x −2 ≤ | p|x

∞

| p| ∗

(| p|edk | p| )∗m x 2m

m=0 −2 dk | p|+ 1/2 γk−1 | p|

e

≡

≤ | p|x −2

∞

| p| edk | p| (2m)! m

m=0 | p|x −2 eck | p| .

2m

1 x 2m

(A1.3)

Then (M[≤N ] − M[0] ) B (x; p) is estimated via (2.9), that is (M[≤N ] − M[0] ) B (x; p) ≤

∞

k=2 T ∈∪ N −1 S R j=0 k, j

×

⎛

⎞⎛

⎞

1 ⎜ ∗ 1 ⎟⎜ ⎟ | p| ∗ ⎝ | p|ecn | p| ⎠ ⎝ g[−1] ⎠ 2 | (T )|! x ∈ (T ) ∈ (T ) n ≥0

Fv .

n =−1

(A1.4)

v∈V (T )

∗ Bounding | p| ∗ | p|ecn | p| by ec N −1 | p| | p|2k−1 /(2k − 1)! and using the estimates in [GG2] to control the sum over the self-energy clusters, the bound becomes (M[≤N ] ) B (x; p) ≤ | p| +

∞ k=2

2k

| p|2k−1 c N −1 | p| e ≤ | p|ed N | p| , (2k − 1)!

where is a suitable constant derived in [GG1], see Remark following (4.9).

(A1.5)


Thus the inductive assumption holds for all N , the constants c N , d N can be taken c2 N for some c, and for all x on scale N one has (g [N ] ) B (x; p) ≤ | p|x −2 e2

N c | p|

.

(A1.6)

This leads to a bound on (h(N ) ) B (ψ, p), via (2.4) and (2.2): |(h(N ) ) B (ψ, p)| ≤

∞ k=1 ν ν,γ

⎛

⎛

⎞⎛ ⎞ 1 ⎜ ∗ 1 ⎟⎜ ⎟ | p|ecn | p| ⎠ ⎝ ||g[−1] ||⎠ ⎝ | (θ )|! ∈ (θ) x2 ∈ (T ) ⎞

×⎝

n ≥0

n =−1

||Fv ||⎠

v∈V (T )

≤

∞

∞

2k

k=1

≤ | p|e 2

| p|2k−1 c2nmax | p| 2k | p|2k−1 c k | p| e e ≤ (2k − 1)! (2k − 1)! k=1

| p|ec | p|

,

(A1.7)

because the maximum scale n max of the lines of a graph with k lines can be at most log2 (bk) by our assumption that f is a trigonometric polynomial: in fact the maximum momentum on a line can be |ν| ≤ b0 k (for f a trigonometric polynomial), so that the smallest x can be bC00k if C0 and τ = 1 are the Diophantine constants, hence the scale of x can be at most log2 (b0 k/4) and c2n max ≤ c k for a suitable c . Therefore the functions (h(N ) ) B (ψ; p) are entire and bounded by |(h(N ) ) B (ψ; p)| ≤ 2 | p|e| p|e

c | p|

(A1.8)

independently of N . By the same argument at the end of Sect. 4 (i.e. by an application of Vitali’s convergence theorem) we infer that the limit F(ψ; p) of (h(N ) ) B (ψ; p) as N → ∞ is entire and satisfies the same bound (A1.8). By the definition of Borel summability and the results of Sect. 3, h(ψ, η) is Borel summable (in η). Appendix A2. Comparison with the Method of [GG1] In this appendix we briefly discuss how the function h constructed in [GG1] can be identified with the Borel summable function of Proposition 1. By the uniqueness in the class of Borel summable functions, it is enough to prove that also the function h of [GG1] is Borel summable. We begin by reviewing the differences of the construction envisaged in [GG1] with respect to that of [GG2]. Trees, labels and clusters are defined in the same way as in items (a) to (h) of Sect. 2. What changes is the definition of the propagators, which is iterative. By writing x = ω · ν , ν = 0, we set g[0] = 1/x 2 , and, for k ≥ 1, g[k] =

x2

η2 , + M [k] (x; η)

(A2.1)


with M [k] (x; η) defined as M [k] (x; η) =

VT (x; η),

(A2.2)

T

where the sum is restricted to the self-energy clusters T with scale n T ≥ n + 3, where n is such that γn−1 ≤ |x| < γn , and the self-energy value is given by ⎞⎛ ⎞ ⎛ [k−1] η2 ⎝ ⎠. VT (ω · ν; η) = − Fv ⎠ ⎝ g (A2.3) | (T )|! v∈V (T )

∈ (T )

Note that k labels the iterative step and, in principle, it has no relation with the scale n of x. However the matrices M [k] (x; η) are obtained from resummations of self-energy clusters with height up to k = n (for definitions and details we refer to [GG1], where the self-energy clusters are called self-energy graphs). Hence they stop flowing at k = n if x is on scale n: this means that M [k] (x; η) = M [n] (x; η) as soon as k ≥ n. Then h(N ) will be expressed in terms of trees as in (2.2), if Val(θ ) is defined as in (2.4), with g[n ] replaced with g[N ] , and h(N ) is obtained as the limit of h(N ) as N → ∞. Let us consider for simplicity the case of polynomial perturbations. Then we can proceed as in Sect. 4, and prove by induction the bound [k] M (x; p) ≤ | p|edk | p| , for all x, (A2.4) B for all p complex. Supposing inductively the bound (A2.4) we obtain that the Borel transform of g[k] can be bounded as (g [k] ) B (x; p) ≤

$\dfrac{|p|}{x^2}\, e^{c_k |p|}$. (A2.5)

This is trivial for k = 0, as (g [0] ) B (x; p) = p/x 2 , while it follows from (A2.4) for n ≥ 1 by the inductive hypothesis (and it can be proved as the analogous bound in Sect. 4). Therefore we can write M [N ] (x; η) according to (A2.2), and its Borel transform (M [N ] ) B (x; p) can be computed and bounded as done in (A1.4) and (A1.6), so that at the end the bound (A2.4) is obtained for k = N . In particular the bounds (A2.5) on the propagators imply that h(N ) admits the same bound (A1.8) found in Sect. 4. Therefore we can take the limit for N → ∞, and Borel summability for h follows. Acknowledgements. AG was partially supported by U.S. National Science Foundation grant PHY 01 39984, which is gratefully acknowledged. OC was partially supported by U.S. National Science Foundation grants DMS-0406193 and DMS-0600369.

References

[BT] Broer, H., Takens, F.: Unicity of KAM tori. Preprint, Groningen, 2005, available at http://www.math.rug.nl/~broer/pdf/btun.pdf
[GBG] Gallavotti, G., Bonetto, F., Gentile, G.: Aspects of ergodic, qualitative and statistical theory of motion. Texts and Monographs in Physics, Berlin: Springer, 2004
[GG] Gallavotti, G., Gentile, G.: Majorant series convergence for twistless KAM tori. Ergodic Th. Dyn. Syst. 15, 857–869 (1995)
[GG1] Gallavotti, G., Gentile, G.: Hyperbolic low-dimensional invariant tori and summations of divergent series. Commun. Math. Phys. 227(3), 421–460 (2002)
[GG2] Gallavotti, G., Gentile, G.: Degenerate elliptic resonances. Commun. Math. Phys. 257, 319–362 (2005)
[JLZ] Jorba, À., de la Llave, R., Zou, M.: Lindstedt series for lower-dimensional tori. In: Hamiltonian systems with three or more degrees of freedom (S’Agaró, 1995), NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., 533, Dordrecht: Kluwer Acad. Publ., 1999, pp. 151–167
[Ne] Nevanlinna, F.: Zur Theorie der Asymptotischen Potenzreihen. Annales Academiae Scientiarum Fennicae. Series A I. Mathematica 12, 1–18 (1916)
[So] Sokal, A.D.: An improvement of Watson’s theorem on Borel summability. J. Math. Phys. 21(2), 261–263 (1980)

Communicated by A. Kupiainen

Commun. Math. Phys. 269, 195–221 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0077-2

Communications in

Mathematical Physics

WKB Analysis for Nonlinear Schrödinger Equations with Potential Rémi Carles MAB, Université de Bordeaux 1, 351 cours de la Libération, 33405 Talence cedex, France. E-mail: [email protected] Received: 24 January 2006 / Accepted: 31 March 2006 Published online: 19 July 2006 – © Springer-Verlag 2006

Abstract: We justify the WKB analysis for the semiclassical nonlinear Schrödinger equation with a subquadratic potential. This concerns subcritical, critical, and supercritical cases as far as the geometrical optics method is concerned. In the supercritical case, this extends a previous result by E. Grenier; we also have to restrict to nonlinearities which are defocusing and cubic at the origin, but besides subquadratic potentials, we consider initial phases which may be unbounded. For this, we construct solutions for some compressible Euler equations with unbounded source term and unbounded initial velocity.

1. Introduction

Consider the initial value problem, for $x \in \mathbb{R}^n$ and $\kappa \ge 0$:
\[ i\varepsilon\partial_t u^\varepsilon + \frac{\varepsilon^2}{2}\Delta u^\varepsilon = V(t,x)\,u^\varepsilon + \varepsilon^\kappa f\left(|u^\varepsilon|^2\right)u^\varepsilon, \tag{1.1} \]
\[ u^\varepsilon_{|t=0} = a_0^\varepsilon(x)\,e^{i\phi_0(x)/\varepsilon}. \tag{1.2} \]

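The aim of WKB methods is to describe $u^\varepsilon$ in the limit $\varepsilon \to 0$, when $\phi_0$ does not depend on $\varepsilon$, and $a_0^\varepsilon$ has an asymptotic expansion of the form:
\[ a_0^\varepsilon(x) \sim a_0(x) + \varepsilon a_1(x) + \varepsilon^2 a_2(x) + \dots. \tag{1.3} \]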

The parameter $\kappa \ge 0$ describes the strength of a coupling constant, which makes nonlinear effects more or less important in the limit $\varepsilon \to 0$; the larger the $\kappa$, the weaker the nonlinear interactions. In this paper, we describe the asymptotic behavior of $u^\varepsilon$ at leading order, when the potential $V$ and the initial phase $\phi_0$ are smooth, and subquadratic in the space variable. Such equations as (1.1) appear in physics: see e.g. [34] for a general overview. For instance, they are used to model Bose-Einstein condensation when $V$ is a harmonic potential (isotropic or anisotropic) and the nonlinearity is cubic or quintic (see e.g. [15, 25, 30]). In most of this paper, the initial data we consider are in Sobolev spaces $H^s$. We refer to [5] for numerics on the semi-classical limit of (1.1). We shall not recall the results concerning the Cauchy problem for (1.1)-(1.2), and refer to [10] for an overview on the semilinear Schrödinger equation.

In the one-dimensional case $n = 1$, the cubic nonlinear Schrödinger equation is completely integrable, in the absence of an external potential, or when $V$ is a quadratic polynomial ([1, p. 375]). Several tough papers analyze the semi-classical limit in the case $V \equiv 0$, for $\kappa = 0$: see e.g. [23, 24, 36, 37]. The author confesses that the link between these results and WKB methods is not clear to him. We shall not use this approach, but rather work in the spirit of [22].

An interesting feature of (1.1) is that one does not expect the creation of harmonics, provided that only one phase is present initially, like in (1.2). The WKB methods consist in seeking an approximate solution to (1.1) of the form:
\[ u^\varepsilon(t,x) \sim \left(a_0(t,x) + \varepsilon a_1(t,x) + \varepsilon^2 a_2(t,x) + \dots\right) e^{i\Phi(t,x)/\varepsilon}. \tag{1.4} \]

One must not expect this approach to be valid when caustics are formed: near a caustic, all the terms $\Phi$, $a_0$, $a_1$, … become singular. Past the caustic, several phases are necessary in general to describe the asymptotic behavior of the solution (see e.g. [17] for a general theory in the linear case). In this paper, we restrict our attention to times preceding this break-up. For such an expansion to be available with profiles $a_j$ independent of $\varepsilon$, it is reasonable to assume that $\kappa$ is an integer. However, we do not assume that $\kappa$ is an integer: we study the asymptotic behavior of $u^\varepsilon$ at leading order (strong limits in $L^2 \cap L^\infty$ for instance), including cases where other powers of $\varepsilon$ would come into play. We distinguish two families of assumptions: “geometrical” assumptions, on the potential $V$ and the initial phase $\phi_0$, and “analytical” assumptions, on $f$ and the initial amplitude $a_0^\varepsilon$. In all the cases, we shall not try to seek the optimal regularity; we focus our interest on the limit $\varepsilon \to 0$.

Assumption 1 (Geometrical). We assume that the potential and the initial phase are smooth and subquadratic:
– $V \in C^\infty(\mathbb{R}_t \times \mathbb{R}^n_x)$, and $\partial_x^\alpha V \in L^\infty_{\rm loc}(\mathbb{R}_t; L^\infty(\mathbb{R}^n_x))$ as soon as $|\alpha| \ge 2$.
– $\phi_0 \in C^\infty(\mathbb{R}^n)$, and $\partial_x^\alpha \phi_0 \in L^\infty(\mathbb{R}^n)$ as soon as $|\alpha| \ge 2$.

The assumption of $V$ being subquadratic is classical in other contexts; for instance, locally in time, the dispersion for $e^{-i\frac{t}{\varepsilon}(-\varepsilon^2\Delta + V)}$ is the same as without potential (see [19, 20]),
\[ \left\| e^{-i\frac{t}{\varepsilon}(-\varepsilon^2\Delta + V)} \right\|_{L^1 \to L^\infty} \le \frac{C}{|\varepsilon t|^{n/2}}, \quad \forall |t| \le \delta, \]
hence yielding the same local Strichartz estimates as in the free case. Global in time Strichartz estimates must not be expected in general, as shown by the example of the harmonic oscillator, which has eigenvalues. For positive superquadratic potentials, the smoothing effects and Strichartz estimates are different (see [38, 39]). This is related to the properties of the Hamilton flow, which also imply:


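Lemma 1. Under Assumption 1, there exist $T > 0$ and a unique solution $\phi_{\rm eik} \in C^\infty([0,T] \times \mathbb{R}^n)$ to:
\[ \partial_t \phi_{\rm eik} + \frac{1}{2}|\nabla_x \phi_{\rm eik}|^2 + V(t,x) = 0; \quad \phi_{{\rm eik}|t=0} = \phi_0. \tag{1.5} \]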

This solution is subquadratic: $\partial_x^\alpha \phi_{\rm eik} \in L^\infty([0,T] \times \mathbb{R}^n)$ as soon as $|\alpha| \ge 2$. This result is proved in Sect. 2, where other remarks on Assumption 1 are made.

Assumption 2 (Analytical). We assume that the nonlinearity is smooth, and that the initial amplitude converges in Sobolev spaces:
– $f \in C^\infty(\mathbb{R}; \mathbb{R})$.
– There exists $a_0 \in H^\infty := \cap_{s \ge 0} H^s(\mathbb{R}^n)$, such that $a_0^\varepsilon$ converges to $a_0$ in $H^s$ for all $s$, as $\varepsilon \to 0$.

Note that in general, we consider complex-valued initial data $a_0^\varepsilon$. In the rest of this paper, what we call amplitude is complex-valued in general. The distinction between phase and amplitude lies essentially in the fact that the phase is associated with rapid oscillations of wavelength $\varepsilon$. In Sect. 6, we deal with oscillations of wavelength small compared to one, but large compared to $\varepsilon$. We choose to consider them as terms of the phase.

Remark 1. Some of the results we shall prove remain valid when $f$ is complex-valued. In that case, the conservation of mass associated to the Schrödinger equation, $\|u^\varepsilon(t)\|_{L^2} = \|a_0^\varepsilon\|_{L^2}$, no longer holds. On the other hand, when $0 \le \kappa < 1$, this assumption is necessary in our approach, and we even assume $f' > 0$.

1.1. Subcritical and critical cases: $\kappa \ge 1$. When the initial data is of the form (1.3), the usual approach consists in plugging a formal expansion of the form (1.4) into (1.1). Ordering the terms in powers of $\varepsilon$, and canceling the cascade of equations thus obtained yields $\Phi$, $a_0$, $a_1$, …. Assume in this section that $\kappa \ge 1$, and apply the above procedure. To cancel the term of order $O(\varepsilon^0)$, we find that $\Phi$ must solve (1.5): $\Phi = \phi_{\rm eik}$. Canceling the term of order $O(\varepsilon^1)$, we get:
\[ \partial_t a_0 + \nabla\phi_{\rm eik} \cdot \nabla a_0 + \frac{1}{2} a_0 \Delta\phi_{\rm eik} = \begin{cases} 0 & \text{if } \kappa > 1, \\ -i f\left(|a_0|^2\right) a_0 & \text{if } \kappa = 1. \end{cases} \]
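As a sketch of where the eikonal and transport equations come from, one can insert $u^\varepsilon = a^\varepsilon e^{i\Phi/\varepsilon}$ into (1.1) and collect powers of $\varepsilon$. The computation below is standard; writing the phase as $\Phi$ and the Laplacian as $\Delta$ is a notational assumption made here for readability.

```latex
\begin{align*}
i\varepsilon\,\partial_t u^\varepsilon
  &= \left( i\varepsilon\,\partial_t a^\varepsilon - a^\varepsilon\,\partial_t\Phi \right) e^{i\Phi/\varepsilon}, \\
\frac{\varepsilon^2}{2}\Delta u^\varepsilon
  &= \left( \frac{\varepsilon^2}{2}\Delta a^\varepsilon
     + i\varepsilon\,\nabla\Phi\cdot\nabla a^\varepsilon
     + \frac{i\varepsilon}{2}\,a^\varepsilon\,\Delta\Phi
     - \frac{1}{2}|\nabla\Phi|^2 a^\varepsilon \right) e^{i\Phi/\varepsilon}.
\end{align*}
% With a^\varepsilon = a_0 + \varepsilon a_1 + \dots and \kappa \ge 1:
\begin{align*}
O(\varepsilon^0):\quad
  & -\left( \partial_t\Phi + \tfrac{1}{2}|\nabla\Phi|^2 + V \right) a_0 = 0, \\
O(\varepsilon^1):\quad
  & i\left( \partial_t a_0 + \nabla\Phi\cdot\nabla a_0 + \tfrac{1}{2}a_0\Delta\Phi \right)
    - \left( \partial_t\Phi + \tfrac{1}{2}|\nabla\Phi|^2 + V \right) a_1
    = \begin{cases} 0, & \kappa > 1, \\ f\left(|a_0|^2\right)a_0, & \kappa = 1. \end{cases}
\end{align*}
```

The first line gives the eikonal equation (1.5) wherever $a_0 \neq 0$; the bracket multiplying $a_1$ in the second line then vanishes, and dividing by $i$ gives the transport equation, with the nonlinear term entering only when $\kappa = 1$.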

We see that the value $\kappa = 1$ is critical as far as nonlinear effects are concerned: if $\kappa > 1$, no nonlinear effect is expected at leading order, since formally, $u^\varepsilon \sim a_0 e^{i\phi_{\rm eik}/\varepsilon}$, and $\phi_{\rm eik}$ and $a_0$ do not depend on the nonlinearity $f$. If $\kappa = 1$, then $a_0$ solves a nonlinear equation involving $f$. We will see in Sect. 3 that $a_0$ solves a transport equation that turns out to be an ordinary differential equation along the rays of geometrical optics, as is usual in the hyperbolic case (see e.g. [31]). More typical of the Schrödinger equation is the fact that this ordinary differential equation can be solved explicitly: the nonlinear effect is measured by a nonlinear phase shift (see the example of [27] for a similar result in the hyperbolic setting). We prove the following result in Sect. 3:


Proposition 1. Let Assumptions 1 and 2 be satisfied. Let $\kappa \ge 1$. Then for all $\varepsilon \in ]0,1]$, (1.1)-(1.2) has a unique solution $u^\varepsilon \in C^\infty([0,T] \times \mathbb{R}^n) \cap C([0,T]; H^s)$ for all $s > n/2$ ($T$ is given by Lemma 1). Moreover, there exist $a, G \in C^\infty([0,T] \times \mathbb{R}^n)$, independent of $\varepsilon \in ]0,1]$, where $a \in C([0,T]; L^2 \cap L^\infty)$, and $G$ is real-valued with $G \in C([0,T]; L^\infty)$, such that:
\[ \left\| u^\varepsilon - a\, e^{i\varepsilon^{\kappa-1} G} e^{i\phi_{\rm eik}/\varepsilon} \right\|_{L^\infty([0,T]; L^2 \cap L^\infty)} \to 0 \quad \text{as } \varepsilon \to 0. \]
The profile $a$ solves the initial value problem:
\[ \partial_t a + \nabla\phi_{\rm eik} \cdot \nabla a + \frac{1}{2} a \Delta\phi_{\rm eik} = 0; \quad a_{|t=0} = a_0, \tag{1.6} \]
and $G$ depends nonlinearly on $a$ through $f$. In particular, if $\kappa > 1$, then
\[ \left\| u^\varepsilon - a\, e^{i\phi_{\rm eik}/\varepsilon} \right\|_{L^\infty([0,T]; L^2 \cap L^\infty)} \to 0 \quad \text{as } \varepsilon \to 0, \]
and no nonlinear effect is present in the leading order behavior of $u^\varepsilon$. If $\kappa = 1$, nonlinear effects are present at leading order, measured by $G$.

The dependence of $G$ upon $a$ and $f$ is made more explicit in Sect. 3, in terms of the Hamilton flow determining $\phi_{\rm eik}$ (see (3.4)). Note that in the above result, we do not assume that $\kappa$ is an integer.

1.2. Supercritical case: $\kappa = 0$. It follows from the above analysis that the case $0 \le \kappa < 1$ is supercritical. We restrict our attention to the case $\kappa = 0$. We present an analysis of the range $0 < \kappa < 1$ in Sect. 6. Plugging an asymptotic expansion of the form (1.4) into (1.1) yields a shifted cascade of equations:
\[ O(\varepsilon^0): \quad \partial_t \Phi + \frac{1}{2}|\nabla\Phi|^2 + V + f\left(|a_0|^2\right) = 0, \]
\[ O(\varepsilon^1): \quad \partial_t a_0 + \nabla\Phi \cdot \nabla a_0 + \frac{1}{2} a_0 \Delta\Phi = -2i f'\left(|a_0|^2\right) \mathrm{Re}\left(\overline{a_0}\, a_1\right) a_0. \]
Two comments are in order. First, we see that there is a strong coupling between the phase and the main amplitude: $a_0$ is present in the equation for $\Phi$. Second, the above system is not closed: $\Phi$ is determined as a function of $a_0$, and $a_0$ is determined as a function of $a_1$. Even if we pursued the cascade of equations, this phenomenon would remain: no matter how many terms are computed, the system is never closed (see [21]). This is a typical feature of supercritical cases in nonlinear geometrical optics (see [12, 13]). In the case when $V \equiv 0$ and $\phi_0 \in H^s$, this problem was resolved by E. Grenier [22], by modifying the usual WKB methods; this approach is recalled in Sect. 4. Note that even though $a_1$ is not determined by the above system, the pair $(\rho, v) := (|a_0|^2, \nabla\Phi)$ solves a compressible Euler equation:
\[ \begin{cases} \partial_t v + v \cdot \nabla v + \nabla V + \nabla f(\rho) = 0; & v_{|t=0} = \nabla\phi_0, \\ \partial_t \rho + \nabla \cdot (\rho v) = 0; & \rho_{|t=0} = |a_0|^2. \end{cases} \tag{1.7} \]
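The passage from the cascade to the Euler system can be sketched as follows. The derivation assumes that the right-hand side of the $O(\varepsilon^1)$ equation is a purely imaginary multiple of $a_0$, namely $-2i f'(|a_0|^2)\,\mathrm{Re}(\overline{a_0}a_1)\,a_0$, which is what a direct expansion of $f(|a_0 + \varepsilon a_1|^2)$ gives; the symbol $\lambda$ below is introduced only for this illustration.

```latex
% Velocity equation: take the gradient of the O(\varepsilon^0) equation, with v = \nabla\Phi.
\begin{align*}
\nabla\left( \partial_t\Phi + \tfrac{1}{2}|\nabla\Phi|^2 + V + f(|a_0|^2) \right) = 0
\;\Longrightarrow\;
\partial_t v + v\cdot\nabla v + \nabla V + \nabla f(\rho) = 0,
\end{align*}
% using \nabla\left(\tfrac{1}{2}|v|^2\right) = (v\cdot\nabla)v for a gradient field v.
% Density equation: multiply the O(\varepsilon^1) equation by \overline{a_0}, take twice the real part.
\begin{align*}
\partial_t |a_0|^2
  &= 2\,\mathrm{Re}\left( \overline{a_0}\,\partial_t a_0 \right) \\
  &= -2\,\mathrm{Re}\left( \overline{a_0}\,\nabla\Phi\cdot\nabla a_0 \right)
     - |a_0|^2\,\Delta\Phi
     + 2\,\mathrm{Re}\left( i\lambda\,|a_0|^2 \right),
  \qquad \lambda := -2 f'(|a_0|^2)\,\mathrm{Re}(\overline{a_0}a_1) \in \mathbb{R}, \\
  &= -\nabla\Phi\cdot\nabla|a_0|^2 - |a_0|^2\,\Delta\Phi
   = -\nabla\cdot\left( |a_0|^2\,\nabla\Phi \right).
\end{align*}
```

The term carrying $a_1$ drops out because it is real after multiplication by $-i\,\overline{a_0}$, which is why $(\rho, v)$ satisfies a closed system even though $(\Phi, a_0)$ does not.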

Using techniques introduced in the study of quasilinear hyperbolic equations, E. Grenier justified a WKB expansion for nonlinearities which are defocusing, and cubic at the origin ($f' > 0$). We shall not change this assumption, but show how to treat the case of a subquadratic potential with a subquadratic initial phase. Note that even the construction of a solution to (1.7) under Assumption 1 is not standard: the source term $\nabla V$ may be unbounded, as well as the initial velocity $\nabla\phi_0$.

Assumption 3. In addition to Assumption 2, we assume:
– $f' > 0$.
– There exist $a_0, a_1 \in H^\infty$, with $x a_0, x a_1 \in H^\infty$, such that:
\[ \left\| a_0^\varepsilon - a_0 - \varepsilon a_1 \right\|_{H^s} + \left\| x a_0^\varepsilon - x a_0 - \varepsilon x a_1 \right\|_{H^s} = o(\varepsilon), \quad \forall s \ge 0. \]

We can then describe the asymptotic behavior of the solution to (1.1)-(1.2) for small times:

Theorem 1. Let Assumptions 1, 2 and 3 be satisfied. Let $\kappa = 0$. There exists $T_* > 0$ independent of $\varepsilon \in ]0,1]$ and a unique solution $u^\varepsilon \in C^\infty([0,T_*] \times \mathbb{R}^n) \cap C([0,T_*]; H^s)$ for all $s > n/2$ to (1.1)-(1.2). Moreover, there exist $a, \phi \in C([0,T_*]; H^s)$ for every $s \ge 0$, such that:
\[ \limsup_{\varepsilon \to 0} \left\| u^\varepsilon - a\, e^{i(\phi + \phi_{\rm eik})/\varepsilon} \right\|_{L^2 \cap L^\infty} = O(t) \quad \text{as } t \to 0. \tag{1.8} \]
Here, $a$ and $\phi$ are nonlinear functions of $\phi_{\rm eik}$ and $a_0$, given by (5.9). Finally, there exists $\phi^{(1)} \in C([0,T_*]; H^s)$ for every $s \ge 0$, real-valued, such that:
\[ \limsup_{\varepsilon \to 0}\, \sup_{0 \le t \le T_*} \left\| u^\varepsilon - a\, e^{i\phi^{(1)}} e^{i(\phi + \phi_{\rm eik})/\varepsilon} \right\|_{L^2 \cap L^\infty} = 0. \tag{1.9} \]
The phase shift $\phi^{(1)}$ is a nonlinear function of $\phi_{\rm eik}$, $a_0$ and $a_1$.

This result can be understood as follows. At leading order, the amplitude of $u^\varepsilon$ is given by $a e^{i\phi^{(1)}}$, which can be approximated for small times by $a$, because $\phi^{(1)}_{|t=0} = 0$. The rapid oscillations are described by the phase $\phi + \phi_{\rm eik}$. The function $\phi$ is constructed as a perturbation of $\phi_{\rm eik}$, but must not be considered as negligible: its $H^s$-norms are not small in general, see (5.9) (at time $t = 0$ for instance). As a consequence of our analysis, the pair
\[ (\rho, v) = \left( |a|^2, \nabla(\phi + \phi_{\rm eik}) \right) = \left( \left| a e^{i\phi^{(1)}} \right|^2, \nabla(\phi + \phi_{\rm eik}) \right) \]
solves the system (1.7).

Remark 2. With this result, we could deduce instability phenomena for (1.1)-(1.2) in the same fashion as in [8]. Note that because of Assumption 1, it seems that the approaches of [6, 7, 14] cannot be adapted to the present case: the Laplacian can never be neglected, and apparently, the WKB approach is really needed.

The rest of this paper is organized as follows. In Sect. 2, we prove Lemma 1. In Sect. 3, we prove Proposition 1, and explain how $G$ is obtained. We recall the approach of [22] in Sect. 4, and show how to adapt it to prove Theorem 1 in Sect. 5. We present an analysis for the case $0 < \kappa < 1$ in Sect. 6.


2. Global in Space Hamilton-Jacobi Theory

In this section, we prove Lemma 1. Consider the classical Hamiltonian:
\[ H(t,x,\tau,\xi) = \tau + \frac{1}{2}|\xi|^2 + V(t,x), \quad (t,x,\tau,\xi) \in \mathbb{R}_+ \times \mathbb{R}^n \times \mathbb{R} \times \mathbb{R}^n. \]
It is smooth by Assumption 1. Therefore, it is classical (see e.g. [16]) that in the neighborhood of each point $x \in \mathbb{R}^n$, one can construct a smooth solution to the eikonal equation (1.5), on some time interval $[-t(x), t(x)]$, for some $t(x) > 0$ depending on $x$. This stems from the fact that the classical Hamiltonian $H$ is smooth, and from the local inversion theorem. In Lemma 1, we can find some $T > 0$ uniform in $x \in \mathbb{R}^n$: this is so because the potential and the initial phase are subquadratic.

Recall that if there exist some constants $a, b > 0$ such that a potential $V$ satisfies $V(x) \ge -a|x|^2 - b$, then $-\Delta + V$ is essentially self-adjoint on $C_0^\infty(\mathbb{R}^n)$ (see [32, p. 199]). If $-V$ has superquadratic growth, then it is not possible to define $e^{-it(-\Delta + V)}$ (see [18, Chap. 13, Sect. 6, Cor. 22] for the case $V(x) = -x^4$ in space dimension one). This is due to the fact that classical trajectories can reach an infinite speed. We will see below that if the initial phase $\phi_0$ is superquadratic, then focusing at the origin may occur “instantly” (see Example 1).

To construct the solution of (1.5), introduce the flow associated to $H$: let $x(t,y)$ and $\xi(t,y)$ solve
\[ \begin{cases} \partial_t x(t,y) = \xi(t,y); & x(0,y) = y, \\ \partial_t \xi(t,y) = -\nabla_x V(t, x(t,y)); & \xi(0,y) = \nabla\phi_0(y). \end{cases} \tag{2.1} \]
Recall the result (valid under weaker conditions than Assumption 1):

Theorem 2 ([16], Th. A.3.2). Suppose that Assumption 1 is satisfied. Let $t \in [0,T]$ and $\theta_0$ an open set of $\mathbb{R}^n$. Denote
\[ \theta_t := \{ x(t,y) \in \mathbb{R}^n,\ y \in \theta_0 \}; \quad \theta := \{ (t,x) \in [0,T] \times \mathbb{R}^n,\ x \in \theta_t \}. \]
Suppose that for $t \in [0,T]$, the mapping $\theta_0 \ni y \mapsto x(t,y) \in \theta_t$ is bijective, and denote by $y(t,x)$ its inverse. Assume also that $\nabla_x y \in L^\infty_{\rm loc}(\theta)$. Then there exists a unique function $\theta \ni (t,x) \mapsto \phi_{\rm eik}(t,x) \in \mathbb{R}$ that solves (1.5), and satisfies $\nabla_x^2 \phi_{\rm eik} \in L^\infty_{\rm loc}(\theta)$. Moreover,
\[ \nabla_x \phi_{\rm eik}(t,x) = \xi(t, y(t,x)). \tag{2.2} \]
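The relation (2.2) can be checked numerically on a case where everything is explicit. The sketch below assumes a one-dimensional harmonic potential $V(x) = \omega^2 x^2/2$ with $\phi_0 \equiv 0$, for which $\phi_{\rm eik}(t,x) = -\frac{\omega}{2}x^2\tan(\omega t)$; the function names and the RK4 integrator are illustrative choices, not part of the paper.

```python
import math

def hamilton_flow(y, t, omega, steps=2000):
    """Integrate the flow (2.1) by RK4 for V(x) = omega^2 x^2 / 2, phi_0 = 0 (1D).

    Returns (x(t, y), xi(t, y)). The potential and initial phase are
    assumptions made for this illustration only.
    """
    def rhs(x, xi):
        # dx/dt = xi, dxi/dt = -V'(x) = -omega^2 x
        return xi, -omega ** 2 * x

    x, xi = y, 0.0  # xi(0, y) = phi_0'(y) = 0
    h = t / steps
    for _ in range(steps):
        k1x, k1v = rhs(x, xi)
        k2x, k2v = rhs(x + 0.5 * h * k1x, xi + 0.5 * h * k1v)
        k3x, k3v = rhs(x + 0.5 * h * k2x, xi + 0.5 * h * k2v)
        k4x, k4v = rhs(x + h * k3x, xi + h * k3v)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        xi += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
    return x, xi

def grad_phi_eik(t, x, omega):
    """Closed-form x-derivative of phi_eik = -(omega/2) x^2 tan(omega t)."""
    return -omega * x * math.tan(omega * t)

omega, t, y = 1.3, 0.4, 0.7
x, xi = hamilton_flow(y, t, omega)
# Difference between the two sides of (2.2) at the point x = x(t, y).
print(abs(xi - grad_phi_eik(t, x, omega)))
```

Here the inverse map is $y(t,x) = x/\cos(\omega t)$, so evaluating $\xi$ along the ray issued from $y$ and comparing with $\nabla_x\phi_{\rm eik}$ at $x(t,y)$ is exactly the content of (2.2), valid as long as $\omega t < \pi/2$.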

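Proposition 2 ([33], Th. 1.22 and [16], Prop. A.7.1). Suppose that the function $\mathbb{R}^n \ni y \mapsto x(y) \in \mathbb{R}^n$ satisfies:
\[ |\det \nabla_y x| \ge C_0 > 0 \quad \text{and} \quad \left| \partial_y^\alpha x \right| \le C, \quad |\alpha| = 1, 2. \]
Then $x$ is bijective.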


Proof of Lemma 1. Lemma 1 follows from the above two results. From Assumption 1, we know that we can solve (2.1) locally in time in the neighborhood of any $y \in \mathbb{R}^n$. Differentiate (2.1) with respect to $y$:
\[ \begin{cases} \partial_t \partial_y x(t,y) = \partial_y \xi(t,y); & \partial_y x(0,y) = \mathrm{Id}, \\ \partial_t \partial_y \xi(t,y) = -\nabla_x^2 V(t, x(t,y))\, \partial_y x(t,y); & \partial_y \xi(0,y) = \nabla^2\phi_0(y). \end{cases} \tag{2.3} \]
Integrating (2.3) in time, we infer from Assumption 1 that for any $T > 0$, there exists $C_T$ such that for $(t,y) \in [0,T] \times \mathbb{R}^n$:
\[ |\partial_y x(t,y)| + |\partial_y \xi(t,y)| \le C_T + C_T \int_0^t \left( |\partial_y x(s,y)| + |\partial_y \xi(s,y)| \right) ds. \]
The Gronwall lemma yields:
\[ \| \partial_y x(t) \|_{L^\infty_y} + \| \partial_y \xi(t) \|_{L^\infty_y} \le C(T). \tag{2.4} \]
Similarly,
\[ \| \partial_y^\alpha x(t) \|_{L^\infty_y} + \| \partial_y^\alpha \xi(t) \|_{L^\infty_y} \le C(\alpha, T), \quad \forall \alpha \in \mathbb{N}^n,\ |\alpha| \ge 1. \tag{2.5} \]
Integrating the first line of (2.3) in time, we have:
\[ \det \nabla_y x(t,y) = \det\left( \mathrm{Id} + \int_0^t \nabla_y \xi(s,y)\, ds \right). \]
We infer from (2.4) that for $t \in [0,T]$, provided that $T > 0$ is sufficiently small, we can find $C_0 > 0$ such that:
\[ \left| \det \nabla_y x(t,y) \right| \ge C_0, \quad \forall (t,y) \in [0,T] \times \mathbb{R}^n. \tag{2.6} \]
Applying Proposition 2, we deduce that we can invert $y \mapsto x(t,y)$ for $t \in [0,T]$. To apply Theorem 2 with $\theta_0 = \theta = \theta_t = \mathbb{R}^n$, we must check that $\nabla_x y \in L^\infty_{\rm loc}(\mathbb{R}^n)$. Differentiate the relation $x(t, y(t,x)) = x$ with respect to $x$:
\[ \nabla_x y(t,x)\, \nabla_y x(t, y(t,x)) = \mathrm{Id}. \]
Therefore, $\nabla_x y(t,x) = \nabla_y x(t, y(t,x))^{-1}$ as matrices, and
\[ \nabla_x y(t,x) = \frac{1}{\det \nabla_y x(t, y(t,x))}\, \mathrm{adj}\left( \nabla_y x(t, y(t,x)) \right), \tag{2.7} \]
where $\mathrm{adj}(\nabla_y x)$ denotes the adjugate of $\nabla_y x$. We infer from (2.4) and (2.6) that $\nabla_x y \in L^\infty(\mathbb{R}^n)$ for $t \in [0,T]$. Therefore, Theorem 2 yields a smooth solution $\phi_{\rm eik}$ to (1.5), local in time and global in space: $\phi_{\rm eik} \in C^\infty([0,T] \times \mathbb{R}^n)$. The fact that $\phi_{\rm eik}$ is subquadratic as stated in Lemma 1 then stems from (2.2), (2.5), (2.6) and (2.7).
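The two ingredients (2.6) and (2.7) can be illustrated on a small explicit case. The sketch assumes $n = 2$, $V \equiv 0$ and $\phi_0(y) = y_1 y_2$ (a subquadratic phase chosen here purely for illustration), so that rays are straight lines and $\det \nabla_y x(t,y) = 1 - t^2$; all function names are hypothetical.

```python
def flow(t, y):
    """x(t, y) = y + t * grad(phi_0)(y) for V = 0 and phi_0(y) = y1 * y2."""
    y1, y2 = y
    return (y1 + t * y2, y2 + t * y1)

def jacobian(t, y, h=1e-6):
    """Central-difference approximation of the 2x2 matrix grad_y x(t, y)."""
    rows = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(2):
        yp, ym = list(y), list(y)
        yp[k] += h
        ym[k] -= h
        xp, xm = flow(t, yp), flow(t, ym)
        for i in range(2):
            rows[i][k] = (xp[i] - xm[i]) / (2.0 * h)  # d x_i / d y_k
    return rows

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse_via_adjugate(m):
    """Inverse of a 2x2 matrix as adj(m) / det(m), mirroring formula (2.7)."""
    d = det2(m)
    adj = [[m[1][1], -m[0][1]], [-m[1][0], m[0][0]]]
    return [[adj[i][j] / d for j in range(2)] for i in range(2)]

T = 0.5  # any T < 1 keeps the determinant bounded below by 1 - T^2
for t in (0.0, 0.25, 0.5):
    J = jacobian(t, (0.3, -1.2))
    print(t, det2(J), inverse_via_adjugate(J))
```

In this example the lower bound $C_0 = 1 - T^2$ degenerates as $T \to 1$, which is the time at which rays issued from the diagonal $y_1 = -y_2$ cross; this is the kind of smallness condition on $T$ used to obtain (2.6).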


We now give two examples showing that Assumption 1 is essentially sharp to solve (1.5) globally in space, at least when no assumption is made on the sign of $V$ nor on the geometry of $\nabla\phi_0$. We already recalled that if $-V$ has a superquadratic growth, then $-\Delta + V$ is not essentially self-adjoint on $C_0^\infty(\mathbb{R}^n)$, so we shall rather study the dependence of $\phi_{\rm eik}$ on the initial phase $\phi_0$.

Example 1. Assume that $V \equiv 0$ and
\[ \phi_0(x) = -\frac{1}{(2 + 2\delta)T} \left( |x|^2 + 1 \right)^{1+\delta}, \quad T > 0,\ \delta \ne -1. \]
Then Assumption 1 is satisfied if and only if $\delta \le 0$. When $\delta = 0$, then (1.5) is solved explicitly:
\[ \phi_{\rm eik}(t,x) = \frac{|x|^2}{2(t - T)} - \frac{1}{2T}. \]
This shows that we can solve (1.5) globally in space, but only locally in time: as $t \to T$, a caustic reduced to a single point (the origin) is formed. Note that with $T < 0$, (1.5) can be solved globally in time for positive times. When $\delta > 0$, then integrating (2.1) yields:
\[ x(t,y) = y + \int_0^t \xi(s,y)\, ds = y + \int_0^t \xi(0,y)\, ds = y - \frac{t}{T}\left( |y|^2 + 1 \right)^\delta y = y \left( 1 - \frac{t}{T}\left( |y|^2 + 1 \right)^\delta \right). \]
For $R > 0$, we see that the rays starting from the sphere $\{ |y| = R \}$ meet at the origin at time
\[ T_c(R) = \frac{T}{(R^2 + 1)^\delta}. \]
Since $R$ is arbitrary, this shows that several rays can meet arbitrarily fast, thus showing that Theorem 2 cannot be applied uniformly in space.

Example 2. When $V(t,x) = \frac{1}{2} \sum_{j=1}^n \omega_j^2 x_j^2$ is a harmonic potential ($\omega_j \ne 0$), and $\phi_0 \equiv 0$, we have:
\[ \phi_{\rm eik}(t,x) = -\sum_{j=1}^n \frac{\omega_j}{2}\, x_j^2 \tan(\omega_j t). \]
This also shows that we can solve (1.5) globally in space, but locally in time only. Note that if we replace formally $\omega_j$ by $i\omega_j$, then $V$ is turned into $-V$, and the trigonometric functions become hyperbolic functions: we can then solve (1.5) globally in space and time.
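Both examples lend themselves to a quick numerical sanity check. The sketch below evaluates the rays of Example 1 at the claimed focusing time, and computes a finite-difference residual of the eikonal equation (1.5) for the one-dimensional version of Example 2; function names and parameter values are illustrative assumptions.

```python
import math

def ray_example1(t, y, T, delta):
    """Ray of Example 1 (V = 0): x(t, y) = y * (1 - (t/T) * (|y|^2 + 1)^delta)."""
    r2 = sum(c * c for c in y)
    factor = 1.0 - (t / T) * (r2 + 1.0) ** delta
    return tuple(c * factor for c in y)

def focusing_time(R, T, delta):
    """Time at which rays issued from the sphere |y| = R reach the origin."""
    return T / (R * R + 1.0) ** delta

def eikonal_residual_example2(t, x, omega, h=1e-5):
    """Central-difference residual of (1.5) in 1D for Example 2.

    Uses phi(t, x) = -(omega/2) x^2 tan(omega t) and V(x) = omega^2 x^2 / 2.
    """
    def phi(s, z):
        return -0.5 * omega * z * z * math.tan(omega * s)

    dphi_dt = (phi(t + h, x) - phi(t - h, x)) / (2.0 * h)
    dphi_dx = (phi(t, x + h) - phi(t, x - h)) / (2.0 * h)
    return dphi_dt + 0.5 * dphi_dx ** 2 + 0.5 * omega ** 2 * x * x

T, delta = 1.0, 0.5
for R in (1.0, 2.0, 10.0):
    tc = focusing_time(R, T, delta)
    print(R, tc, ray_example1(tc, (R, 0.0), T, delta))
print(eikonal_residual_example2(0.3, 0.8, 1.1))
```

The focusing time decreases to zero as $R$ grows when $\delta > 0$, which is the non-uniformity in space described in Example 1; for $\delta = 0$ it equals $T$ for every $R$, recovering the single-point caustic.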


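Instead of invoking Theorem 2 and Proposition 2, one might try to differentiate (1.5) in order to prove that $\phi_{\rm eik}$ is subquadratic, in the same fashion as in [4, 3]. For $1 \le j, k \le n$, differentiate (1.5) with respect to $x_j$ and $x_k$:
\[ \partial_t \partial^2_{jk}\phi_{\rm eik} + \nabla_x \phi_{\rm eik} \cdot \nabla_x \partial^2_{jk}\phi_{\rm eik} + \sum_{l=1}^n \partial^2_{jl}\phi_{\rm eik}\, \partial^2_{lk}\phi_{\rm eik} + \partial^2_{jk} V(t,x) = 0; \quad \partial^2_{jk}\phi_{{\rm eik}|t=0} = \partial^2_{jk}\phi_0. \]
We see that we obtain a system of the form
\[ D_t y = Q(y) + R; \quad y_{|t=0} = y(0), \]
where $y$ stands for the family $(\partial^2_{jk}\phi_{\rm eik})_{1 \le j,k \le n}$, $Q$ is quadratic, and $R$ and $y(0)$ are bounded. The operator $D_t$ is a well-defined transport operator provided that the characteristics given by
\[ \partial_t x(t,y) = \nabla_x \phi_{\rm eik}(t, x(t,y)); \quad x(0,y) = y, \]
define a global diffeomorphism. Proving this amounts to using Proposition 2, for the rather general initial data we consider. So it seems that this approach does not shorten the proof of Lemma 1.

3. Subcritical and Critical Cases

To establish Proposition 1, define $a^\varepsilon(t,x) := u^\varepsilon(t,x)\, e^{-i\phi_{\rm eik}(t,x)/\varepsilon}$. Then $u^\varepsilon$ solves (1.1)–(1.2) if and only if $a^\varepsilon$ solves:
\[ \partial_t a^\varepsilon + \nabla\phi_{\rm eik} \cdot \nabla a^\varepsilon + \frac{1}{2} a^\varepsilon \Delta\phi_{\rm eik} = i\frac{\varepsilon}{2}\Delta a^\varepsilon - i\varepsilon^{\kappa-1} f\left(|a^\varepsilon|^2\right) a^\varepsilon, \quad a^\varepsilon_{|t=0} = a_0^\varepsilon. \tag{3.1} \]
Note that even if the initial data $a_0^\varepsilon$ are real-valued, the amplitude $a^\varepsilon(t,x)$ is complex-valued, due to the right-hand side of (3.1).

Proposition 3. Let Assumptions 1 and 2 be satisfied. Let $\kappa \ge 1$. For all $\varepsilon \in ]0,1]$, (3.1) has a unique solution $a^\varepsilon \in C^\infty([0,T] \times \mathbb{R}^n) \cap C([0,T]; H^s)$ for all $s > n/2$. Moreover, $a^\varepsilon$ is bounded in $L^\infty([0,T]; H^s)$ uniformly in $\varepsilon \in ]0,1]$, for all $s \ge 0$.

Proof. There are at least two procedures to construct a solution to (3.1): an iterative scheme, or Galerkin methods. For the iterative scheme, we solve:
\[ \partial_t a^\varepsilon_{j+1} + \nabla\phi_{\rm eik} \cdot \nabla a^\varepsilon_{j+1} + \frac{1}{2} a^\varepsilon_{j+1} \Delta\phi_{\rm eik} = i\frac{\varepsilon}{2}\Delta a^\varepsilon_{j+1} - i\varepsilon^{\kappa-1} f\left(|a^\varepsilon_j|^2\right) a^\varepsilon_j; \quad j \ge 0, \qquad a^\varepsilon_{j+1|t=0} = a_0^\varepsilon. \]
This is actually a linear Schrödinger equation: setting $u^\varepsilon_j := a^\varepsilon_j e^{i\phi_{\rm eik}/\varepsilon}$, we see that the above equation is equivalent to:
\[ i\varepsilon\partial_t u^\varepsilon_{j+1} + \frac{\varepsilon^2}{2}\Delta u^\varepsilon_{j+1} = V(t,x)\, u^\varepsilon_{j+1} + \varepsilon^\kappa f\left(|u^\varepsilon_j|^2\right) u^\varepsilon_j; \quad u^\varepsilon_{j+1|t=0} = a_0^\varepsilon e^{i\phi_0/\varepsilon}. \]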


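Using Galerkin methods, we can mimic the mollification procedure presented for instance in [2, 28]; roughly speaking, we solve the ordinary differential equation in $H^s$ by considering
\[ \partial_t a^\varepsilon_h + J_h\left( \nabla\phi_{\rm eik} \cdot \nabla J_h a^\varepsilon_h \right) + \frac{1}{2} a^\varepsilon_h \Delta\phi_{\rm eik} = i\frac{\varepsilon}{2} J_h^2 \Delta a^\varepsilon_h - i\varepsilon^{\kappa-1} f\left(|a^\varepsilon_h|^2\right) a^\varepsilon_h, \quad a^\varepsilon_{h|t=0} = a_0^\varepsilon, \]
where $J_h = j(hD)$ is a Fourier multiplier, with $j \in C_0^\infty(\mathbb{R}^n; \mathbb{R})$ equal to one in a neighborhood of the origin. For the iterative scheme as well as for the Galerkin methods, the problem boils down to obtaining energy estimates for (3.1) in $H^s$, for all $s \ge 0$. Let $s > n/2$, and $\alpha \in \mathbb{N}^n$, with $s = |\alpha|$. Applying $\partial_x^\alpha$ to (3.1), we find:
\[ \partial_t \partial_x^\alpha a^\varepsilon + \nabla\phi_{\rm eik} \cdot \nabla \partial_x^\alpha a^\varepsilon = i\frac{\varepsilon}{2}\Delta \partial_x^\alpha a^\varepsilon - i\varepsilon^{\kappa-1} \partial_x^\alpha\left( f\left(|a^\varepsilon|^2\right) a^\varepsilon \right) + R_\alpha^\varepsilon, \tag{3.2} \]
where
\[ R_\alpha^\varepsilon = \left[ \nabla\phi_{\rm eik} \cdot \nabla, \partial_x^\alpha \right] a^\varepsilon - \frac{1}{2} \partial_x^\alpha\left( a^\varepsilon \Delta\phi_{\rm eik} \right). \]
Take the inner product of (3.2) with $\partial_x^\alpha a^\varepsilon$, and consider the real part: the first term of the right-hand side of (3.2) vanishes, and we have:
\[ \frac{1}{2}\frac{d}{dt} \left\| \partial_x^\alpha a^\varepsilon \right\|_{L^2}^2 + \mathrm{Re} \int_{\mathbb{R}^n} \overline{\partial_x^\alpha a^\varepsilon}\, \nabla\phi_{\rm eik} \cdot \nabla \partial_x^\alpha a^\varepsilon \le \varepsilon^{\kappa-1} \left\| f\left(|a^\varepsilon|^2\right) a^\varepsilon \right\|_{H^s} \left\| a^\varepsilon \right\|_{H^s} + \left\| R_\alpha^\varepsilon \right\|_{L^2} \left\| a^\varepsilon \right\|_{H^s}. \]
Notice that we have
\[ \left| \mathrm{Re} \int_{\mathbb{R}^n} \overline{\partial_x^\alpha a^\varepsilon}\, \nabla\phi_{\rm eik} \cdot \nabla \partial_x^\alpha a^\varepsilon \right| = \frac{1}{2} \left| \int_{\mathbb{R}^n} \nabla\phi_{\rm eik} \cdot \nabla \left| \partial_x^\alpha a^\varepsilon \right|^2 \right| = \frac{1}{2} \left| \int_{\mathbb{R}^n} \left| \partial_x^\alpha a^\varepsilon \right|^2 \Delta\phi_{\rm eik} \right| \le C \left\| a^\varepsilon \right\|_{H^s}^2, \]
since $\Delta\phi_{\rm eik} \in L^\infty([0,T] \times \mathbb{R}^n)$ from Lemma 1. Moser’s inequality yields:
\[ \left\| f\left(|a^\varepsilon|^2\right) a^\varepsilon \right\|_{H^s} \le C\left( \| a^\varepsilon \|_{L^\infty} \right) \left\| a^\varepsilon \right\|_{H^s}. \]
Summing over $\alpha$ such that $|\alpha| = s$, we infer:
\[ \frac{d}{dt} \left\| a^\varepsilon \right\|_{H^s} \le C\left( \| a^\varepsilon \|_{L^\infty} \right) \left\| a^\varepsilon \right\|_{H^s} + \sum_{|\alpha| = s} \left\| R_\alpha^\varepsilon \right\|_{L^2}. \]
Note that the above locally bounded map $C(\cdot)$ is independent of $\varepsilon$ if and only if $\kappa \ge 1$. To apply the Gronwall lemma, we need to estimate the last term: we use the fact that the derivatives of order at least two of $\phi_{\rm eik}$ are bounded, from Lemma 1, to have:
\[ \left\| R_\alpha^\varepsilon \right\|_{L^2} \le C \left\| a^\varepsilon \right\|_{H^s}. \]
We can then conclude by a continuity argument and the Gronwall lemma:
\[ \left\| a^\varepsilon \right\|_{L^\infty([0,T]; H^s)} \le C\left( s, \| a_0^\varepsilon \|_{H^s} \right). \]
This yields boundedness in the “high” norm. Contraction in the “small” norm (that is, contraction in $L^2$) follows easily, completing the proof of Proposition 3.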


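Corollary 1. Let Assumptions 1 and 2 be satisfied. Let $\kappa \ge 1$. Then
\[ \left\| a^\varepsilon - \tilde a^\varepsilon \right\|_{L^\infty([0,T]; H^s)} \to 0 \quad \text{as } \varepsilon \to 0, \quad \forall s \ge 0, \]
where $\tilde a^\varepsilon$ solves:
\[ \partial_t \tilde a^\varepsilon + \nabla\phi_{\rm eik} \cdot \nabla \tilde a^\varepsilon + \frac{1}{2} \tilde a^\varepsilon \Delta\phi_{\rm eik} = -i\varepsilon^{\kappa-1} f\left(|\tilde a^\varepsilon|^2\right) \tilde a^\varepsilon; \quad \tilde a^\varepsilon_{|t=0} = a_0. \tag{3.3} \]

Proof. The proof of Proposition 3 shows that $\tilde a^\varepsilon \in C^\infty([0,T] \times \mathbb{R}^n)$, and that $\tilde a^\varepsilon$ is bounded in $L^\infty([0,T]; H^s)$ uniformly in $\varepsilon \in ]0,1]$, for all $s \ge 0$. Let $w^\varepsilon = a^\varepsilon - \tilde a^\varepsilon$: it solves
\[ \partial_t w^\varepsilon + \nabla\phi_{\rm eik} \cdot \nabla w^\varepsilon + \frac{1}{2} w^\varepsilon \Delta\phi_{\rm eik} = i\frac{\varepsilon}{2}\Delta a^\varepsilon - i\varepsilon^{\kappa-1}\left( F(a^\varepsilon) - F(\tilde a^\varepsilon) \right), \quad w^\varepsilon_{|t=0} = a_0^\varepsilon - a_0, \]
where we have denoted $F(z) = f(|z|^2) z$. Proceeding as in the proof of Proposition 3, we have the following energy estimate:
\[ \frac{d}{dt} \| w^\varepsilon \|_{H^s} \le C\left( \| w^\varepsilon \|_{H^s} + \varepsilon \| \Delta a^\varepsilon \|_{H^s} + \| F(a^\varepsilon) - F(\tilde a^\varepsilon) \|_{H^s} \right). \]
Since $H^s$ is an algebra and $F$ is $C^1$, the Fundamental Theorem of Calculus yields:
\[ \| F(a^\varepsilon) - F(\tilde a^\varepsilon) \|_{H^s} \le C\left( \| a^\varepsilon \|_{H^s}, \| \tilde a^\varepsilon \|_{H^s} \right) \| w^\varepsilon \|_{H^s}. \]
Now since $a^\varepsilon$ and $\tilde a^\varepsilon$ are bounded in $L^\infty([0,T]; H^{s+2})$ uniformly in $\varepsilon \in ]0,1]$, we have an estimate of the form:
\[ \frac{d}{dt} \| w^\varepsilon \|_{H^s} \le C(s) \| w^\varepsilon \|_{H^s} + C(s)\,\varepsilon. \]
We conclude by the Gronwall lemma, since $\| a_0^\varepsilon - a_0 \|_{H^s} \to 0$ from Assumption 2.

Remark 3. If we assume moreover that like in (1.3), $a_0^\varepsilon = a_0 + O(\varepsilon)$ in $H^s$, $\forall s \ge 0$, then the above estimate can be improved to:
\[ \left\| a^\varepsilon - \tilde a^\varepsilon \right\|_{L^\infty([0,T]; H^s)} = O(\varepsilon), \quad \forall s \ge 0. \]

We have reduced the study of the asymptotic behavior of $u^\varepsilon$ to the understanding of $\tilde a^\varepsilon$. To complete the proof of Proposition 1, we resume the framework of Sect. 2. With $x(t,y)$ given by the Hamilton flow (2.1), introduce the Jacobi determinant $J_t(y) = \det \nabla_y x(t,y)$. Denote
\[ A^\varepsilon(t,y) := \tilde a^\varepsilon(t, x(t,y)) \sqrt{J_t(y)}. \]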


We see that so long as y → x(t, y) defines a global diffeomorphism (which is guaranteed for t ∈ [0, T ] by construction), (3.3) is equivalent to: 2 ∂t Aε = −iεκ−1 f Jt (y)−1 Aε Aε ; Aε (0, y) = a0 (y). This ordinary differential equation along the rays of geometrical optics can be solved explicitly: since f is real-valued, we see that ∂t |Aε |2 = 0, hence

t ε κ−1 −1 2 f Js (y) |a0 (y)| ds . A (t, y) = a0 (y) exp −iε 0

Back to the function $\tilde a^\varepsilon$, Proposition 1 follows, with:
\[ a(t,x) = \frac{1}{\sqrt{J_t(y(t,x))}}\,a_0\big(y(t,x)\big), \qquad G(t,x) = -\int_0^t f\big(J_s(y(t,x))^{-1}|a_0(y(t,x))|^2\big)\,ds. \tag{3.4} \]

The presence of $G$ shows that the amplitude $A^\varepsilon$ is complex-valued, as noted in the beginning of this section.

One may wonder if this approach could be extended to some values $\kappa < 1$. Seek a solution of (3.1). To have a simple ansatz as in Proposition 1, we would like to remove the Laplacian in the limit $\varepsilon \to 0$ in (3.1), and obtain the analogue of Corollary 1. Following the same lines as above, we find:
\[ \tilde a^\varepsilon(t,x) = a(t,x)\,e^{i\varepsilon^{\kappa-1}G(t,x)}, \]
which is exactly the first formula in Proposition 1. Now recall that in Proposition 3, we prove that $a^\varepsilon$ is bounded in $H^s$, uniformly for $\varepsilon \in ]0,1]$; this property is used to approximate $a^\varepsilon$ by $\tilde a^\varepsilon$. But when $\kappa < 1$, $\tilde a^\varepsilon$ is no longer uniformly bounded in $H^s$, because what was a phase modulation for $\kappa \ge 1$ is now a rapid oscillation.

This remark in a particular case reveals a much more general phenomenon. When studying geometric optics in a supercritical régime (when $\kappa < 1$ in the present context), distinguishing phase and amplitude becomes a much more delicate issue. Suppose for instance that we seek $u^\varepsilon = a^\varepsilon e^{i\phi^\varepsilon/\varepsilon}$, where $a^\varepsilon$ and $\phi^\varepsilon$ have asymptotic expansions as $\varepsilon \to 0$. All the terms for $\phi^\varepsilon$ which are not $o(\varepsilon)$ are relevant, since $\phi^\varepsilon$ is divided by $\varepsilon$. To determine these terms, it is not sufficient to determine the leading order amplitude $\lim a^\varepsilon$ in general: because of supercritical interactions, initial perturbations of the amplitude may develop non-negligible phase terms. To illustrate this discussion in the above case, let $\kappa < 1$ and let $\tilde a^\varepsilon$ solve (3.3) with initial data
\[ \tilde a^\varepsilon_{|t=0} = a_0 + \varepsilon^\gamma a_1, \quad \gamma > 0, \]

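where $a_0$ and $a_1$ are smooth and independent of $\varepsilon$. Integrating (3.3), we find
\[ \tilde a^\varepsilon(t,x) = \frac{1}{\sqrt{J_t(y(t,x))}}\Big(a_0\big(y(t,x)\big) + \varepsilon^\gamma a_1\big(y(t,x)\big)\Big)e^{i\varepsilon^{\kappa-1}G^\varepsilon(t,x)}, \]
where
\[ G^\varepsilon(t,x) = -\int_0^t f\Big(J_s(y(t,x))^{-1}\big|a_0(y(t,x)) + \varepsilon^\gamma a_1(y(t,x))\big|^2\Big)\,ds. \]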


We have the identity $G^\varepsilon = G + \varepsilon^\gamma G_1 + \varepsilon^{2\gamma}G_2$, for the same $G$ as before, and $G_1$, $G_2$ independent of $\varepsilon$, but depending on $a_1$. We infer:
\[ \tilde a^\varepsilon(t,x) \mathop{\sim}_{\varepsilon\to 0} \frac{1}{\sqrt{J_t(y(t,x))}}\,a_0\big(y(t,x)\big)\,e^{i\varepsilon^{\kappa-1}\left(G + \varepsilon^\gamma G_1 + \varepsilon^{2\gamma}G_2\right)(t,x)}. \]
In particular, if $\kappa + \gamma \le 1$, we see that $G_1$ has to be incorporated to describe the leading order behavior of $\tilde a^\varepsilon$. Since the three requirements $\kappa < 1$, $\gamma > 0$ and $\kappa + \gamma \le 1$ can be met, we see that the small initial perturbation $\varepsilon^\gamma a_1$ may produce relevant phase perturbations. This example explains why in Assumption 3, we require the asymptotic behavior of the initial amplitude up to order $o(\varepsilon)$ and not only $o(1)$ (take $\kappa = 0$ and $\gamma = 1$). We refer to [12] for a general explanation of this phenomenon, and to the next three sections as far as Schrödinger equations are concerned.
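The role of the condition $\kappa + \gamma \le 1$ can be made concrete in the simplest setting. The sketch below is an illustration under assumptions that are not in the paper: $f(y) = y$ and $J_s \equiv 1$, so that $G^\varepsilon = -t|a_0 + \varepsilon^\gamma a_1|^2$ at a fixed point, with $G = -t|a_0|^2$, $G_1 = -2t\,\mathrm{Re}(\bar a_0 a_1)$ and $G_2 = -t|a_1|^2$. The helper names are hypothetical. It evaluates the extra phase $\varepsilon^{\kappa-1}(G^\varepsilon - G)$ created by the perturbation.

```python
import numpy as np

def phase_terms(t, a0, a1):
    """G, G_1, G_2 for f(y) = y and J_s = 1 (illustrative assumptions)."""
    g = -t * abs(a0) ** 2
    g1 = -2.0 * t * (np.conj(a0) * a1).real
    g2 = -t * abs(a1) ** 2
    return g, g1, g2

def g_eps(eps, gamma, t, a0, a1):
    """G^eps = -t |a0 + eps^gamma a1|^2 under the same assumptions."""
    return -t * abs(a0 + eps ** gamma * a1) ** 2

def extra_phase(eps, kappa, gamma, t, a0, a1):
    """Phase eps^(kappa-1) * (G^eps - G) generated by the perturbation eps^gamma a1."""
    _, g1, g2 = phase_terms(t, a0, a1)
    return eps ** (kappa - 1) * (eps ** gamma * g1 + eps ** (2 * gamma) * g2)

t, a0, a1, eps = 1.0, 1.0, 1.0, 1e-6
critical = extra_phase(eps, kappa=0.5, gamma=0.5, t=t, a0=a0, a1=a1)     # kappa + gamma = 1
subcritical = extra_phase(eps, kappa=0.5, gamma=1.0, t=t, a0=a0, a1=a1)  # kappa + gamma > 1
```

With $\kappa + \gamma = 1$ the extra phase tends to $G_1$, which is of order one, while for $\kappa + \gamma > 1$ it tends to zero with $\varepsilon$.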

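4. A Hyperbolic Point of View

In this section, we study (1.1) in the case $\kappa = 0$, with no potential:
\[ i\varepsilon\partial_t u^\varepsilon + \frac{\varepsilon^2}{2}\Delta u^\varepsilon = f\big(|u^\varepsilon|^2\big)u^\varepsilon ; \quad u^\varepsilon_{|t=0} = a_0^\varepsilon(x)e^{i\varphi_0(x)/\varepsilon}. \tag{4.1} \]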

We recall the method introduced by E. Grenier [22], which is valid for smooth nonlinearities which are defocusing and cubic at the origin. Throughout this section, we assume the following:

Assumption 4 (Study of (4.1)). We have $f' > 0$. In addition, $\varphi_0 \in H^\infty$, and there exist $a_0, a_1 \in H^\infty$ such that $a_0^\varepsilon = a_0 + \varepsilon a_1 + o(\varepsilon)$ in $H^s$, $\forall s \ge 0$.

Note that this assumption is closely akin to Assumption 3: nevertheless, we do not make any assumption on the momenta of $a_0$ and $a_1$, and the initial phase is bounded. We will see in Sect. 5 how to weaken this assumption.

4.1. Grenier's approach. The principle is somehow to perform the usual WKB analysis "the other way round". First, write the exact solution as
\[ u^\varepsilon(t,x) = a^\varepsilon(t,x)e^{i\Phi^\varepsilon(t,x)/\varepsilon}, \tag{4.2} \]
where $\Phi^\varepsilon$ is real-valued. Then show that the "amplitude" $a^\varepsilon$ and the "phase" $\Phi^\varepsilon$ have asymptotic expansions as $\varepsilon \to 0$:
\[ a^\varepsilon \sim a + \varepsilon a^{(1)} + \varepsilon^2 a^{(2)} + \cdots ; \qquad \Phi^\varepsilon \sim \phi + \varepsilon\phi^{(1)} + \varepsilon^2\phi^{(2)} + \cdots. \]
The phase $\Phi^\varepsilon$ is real-valued. Note that as in the case studied in Sect. 3, the amplitude $a^\varepsilon$ is complex-valued. The analysis below will show more precisely that $a$ is real-valued if $a_0$ is, but that $a^{(1)}$ is complex-valued in general.


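Introducing two unknown functions to solve one equation yields a degree of freedom. The historical approach [26, Chap. III] consisted in writing
\[ \begin{aligned} &\partial_t\Phi^\varepsilon + \frac{1}{2}|\nabla\Phi^\varepsilon|^2 + f\big(|a^\varepsilon|^2\big) = \varepsilon^2\frac{\Delta a^\varepsilon}{2a^\varepsilon} ; & \Phi^\varepsilon_{|t=0} &= \varphi_0, \\ &\partial_t a^\varepsilon + \nabla\Phi^\varepsilon\cdot\nabla a^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\Phi^\varepsilon = 0 ; & a^\varepsilon_{|t=0} &= a_0^\varepsilon. \end{aligned} \]
Of course, this choice is not adapted when the amplitude $a^\varepsilon$ vanishes (see [21]), so it must be left out when $a_0^\varepsilon \in L^2(\mathbb{R}^n)$ in general. The approach introduced by E. Grenier consists in imposing:
\[ \begin{aligned} &\partial_t\Phi^\varepsilon + \frac{1}{2}|\nabla\Phi^\varepsilon|^2 + f\big(|a^\varepsilon|^2\big) = 0 ; & \Phi^\varepsilon_{|t=0} &= \varphi_0, \\ &\partial_t a^\varepsilon + \nabla\Phi^\varepsilon\cdot\nabla a^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\Phi^\varepsilon = i\frac{\varepsilon}{2}\Delta a^\varepsilon ; & a^\varepsilon_{|t=0} &= a_0^\varepsilon. \end{aligned} \tag{4.3} \]
Before recalling the results of [22], observe that if $a^\varepsilon$ and $\Phi^\varepsilon$ are bounded in some sufficiently small Sobolev spaces uniformly in $\varepsilon$, passing to the limit formally in (4.3) yields:
\[ \begin{aligned} &\partial_t\phi + \frac{1}{2}|\nabla\phi|^2 + f\big(|a|^2\big) = 0 ; & \phi_{|t=0} &= \varphi_0, \\ &\partial_t a + \nabla\phi\cdot\nabla a + \frac{1}{2}a\Delta\phi = 0 ; & a_{|t=0} &= a_0. \end{aligned} \tag{4.4} \]
We see that when the nonlinearity is exactly cubic ($f(y) \equiv y$), $(\rho, v) := \big(|a|^2, \nabla\phi\big)$ solves the compressible, isentropic Euler equation
\[ \begin{aligned} &\partial_t v + v\cdot\nabla v + \nabla\rho = 0 ; & v_{|t=0} &= \nabla\varphi_0, \\ &\partial_t\rho + \nabla\cdot(\rho v) = 0 ; & \rho_{|t=0} &= |a_0|^2. \end{aligned} \tag{4.5} \]
From this point of view, the formulation (4.4) is closely akin to the change of unknown function $\rho \mapsto \sqrt{\rho}$ introduced in [29] (see also [11]) to study (4.5) when the initial density is compactly supported, a situation more or less similar to the present one. Note however that here, $a$ is complex-valued in general.

Introducing the "velocity" $v^\varepsilon = \nabla\Phi^\varepsilon$, (4.3) yields
\[ \begin{aligned} &\partial_t v^\varepsilon + v^\varepsilon\cdot\nabla v^\varepsilon + 2f'\big(|a^\varepsilon|^2\big)\,\mathrm{Re}\big(\overline{a^\varepsilon}\nabla a^\varepsilon\big) = 0 ; & v^\varepsilon_{|t=0} &= \nabla\varphi_0, \\ &\partial_t a^\varepsilon + v^\varepsilon\cdot\nabla a^\varepsilon + \frac{1}{2}a^\varepsilon\nabla\cdot v^\varepsilon = i\frac{\varepsilon}{2}\Delta a^\varepsilon ; & a^\varepsilon_{|t=0} &= a_0^\varepsilon. \end{aligned} \tag{4.6} \]
Separate real and imaginary parts of $a^\varepsilon$, $a^\varepsilon = a_1^\varepsilon + ia_2^\varepsilon$. Then we have
\[ \partial_t\mathbf{u}^\varepsilon + \sum_{j=1}^n A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon = \frac{\varepsilon}{2}L\mathbf{u}^\varepsilon, \tag{4.7} \]
with
\[ \mathbf{u}^\varepsilon = \begin{pmatrix} a_1^\varepsilon \\ a_2^\varepsilon \\ v_1^\varepsilon \\ \vdots \\ v_n^\varepsilon \end{pmatrix}, \qquad L = \begin{pmatrix} 0 & -\Delta & 0 & \dots & 0 \\ \Delta & 0 & 0 & \dots & 0 \\ 0 & 0 & & 0_{n\times n} & \end{pmatrix}, \]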

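and
\[ A(\mathbf{u},\xi) = \sum_{j=1}^n A_j(\mathbf{u})\xi_j = \begin{pmatrix} v\cdot\xi & 0 & \frac{a_1}{2}\,{}^t\xi \\ 0 & v\cdot\xi & \frac{a_2}{2}\,{}^t\xi \\ 2f'a_1\xi & 2f'a_2\xi & v\cdot\xi\,I_n \end{pmatrix}, \]
where $f'$ stands for $f'(|a_1|^2 + |a_2|^2)$. The matrix $A(\mathbf{u},\xi)$ can be symmetrized by
\[ S = \begin{pmatrix} I_2 & 0 \\ 0 & \frac{1}{4f'}I_n \end{pmatrix}, \tag{4.8} \]
which is symmetric and positive since $f' > 0$. For an integer $s > 2 + n/2$, we bound $(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon)$, where $\alpha$ is a multi-index of length $\le s$, and $(\cdot,\cdot)$ is the usual $L^2$ scalar product. We have
\[ \frac{d}{dt}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) = \big(\partial_t S\,\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) + 2\big(S\partial_t\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) \]
since $S$ is symmetric. For the first term, we consider the lower $n\times n$ block:
\[ \big(\partial_t S\,\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) \le \left\|\frac{1}{f'}\partial_t\Big(f'\big(|a_1^\varepsilon|^2 + |a_2^\varepsilon|^2\big)\Big)\right\|_{L^\infty}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big). \]
So long as $\|\mathbf{u}^\varepsilon\|_{L^\infty} \le 2\|a_0^\varepsilon\|_{L^\infty}$, we have:
\[ f'\big(|a_1^\varepsilon|^2 + |a_2^\varepsilon|^2\big) \ge \inf\Big\{f'(y) ;\ 0 \le y \le 4\sup_{0<\varepsilon\le 1}\|a_0^\varepsilon\|_{L^\infty}^2\Big\} = \delta_n > 0, \]
where $\delta_n$ is now fixed, since $f'$ is continuous with $f' > 0$. We infer,
\[ \left\|\frac{1}{f'}\partial_t\Big(f'\big(|a_1^\varepsilon|^2 + |a_2^\varepsilon|^2\big)\Big)\right\|_{L^\infty} \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s}\big), \]
where we used Sobolev embeddings and (4.7). For the second term we use
\[ \big(S\partial_t\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) = \frac{\varepsilon}{2}\big(SL(\partial_x^\alpha\mathbf{u}^\varepsilon), \partial_x^\alpha\mathbf{u}^\varepsilon\big) - \Big(S\partial_x^\alpha\Big(\sum_{j=1}^n A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon\Big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big). \]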
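The symmetrization by (4.8) can be verified numerically at a single point. The sketch below is an illustration, not part of the paper: it assembles the symbol $A(\mathbf{u},\xi)$ and the matrix $S$ for random values of $(a_1, a_2, v, \xi)$, with the positive scalar that multiplies the lower-left block passed as the parameter `coef` (a hypothetical name), and checks that $SA$ is symmetric while $A$ itself is not.

```python
import numpy as np

def symbol_A(a1, a2, v, xi, coef):
    """Symbol A(u, xi) = sum_j A_j(u) xi_j; coef is the positive scalar in the lower-left block."""
    n = len(v)
    A = np.zeros((n + 2, n + 2))
    vx = float(np.dot(v, xi))
    A[0, 0] = A[1, 1] = vx
    A[0, 2:] = 0.5 * a1 * xi        # (a_1 / 2) * transpose(xi)
    A[1, 2:] = 0.5 * a2 * xi        # (a_2 / 2) * transpose(xi)
    A[2:, 0] = 2.0 * coef * a1 * xi
    A[2:, 1] = 2.0 * coef * a2 * xi
    A[2:, 2:] = vx * np.eye(n)
    return A

def symmetrizer(n, coef):
    """S = diag(I_2, I_n / (4 coef)), the shape of (4.8)."""
    return np.diag(np.concatenate([np.ones(2), np.full(n, 1.0 / (4.0 * coef))]))

rng = np.random.default_rng(0)
n = 3
a1, a2 = rng.normal(size=2)
v, xi = rng.normal(size=n), rng.normal(size=n)
coef = 1.7  # any positive value works; chosen arbitrarily

A = symbol_A(a1, a2, v, xi, coef)
S = symmetrizer(n, coef)
SA = S @ A
```

The check relies only on the positivity of `coef`, which is also what makes $S$ positive definite.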

We notice that $SL$ is a skew-symmetric second order operator, so the first term is zero. For the second term, use the symmetry of $SA_j(\mathbf{u}^\varepsilon)$ and usual estimates on commutators (see e.g. [28, 35]) to get finally:
\[ \frac{d}{dt}\sum_{|\alpha|\le s}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s}\big)\sum_{|\alpha|\le s}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big), \]
for $s > 2 + n/2$. The Gronwall lemma along with a continuity argument yields:

Proposition 4 ([22], Th. 1.1). Let Assumption 4 be satisfied. Let $s > 2 + n/2$. There exist $T_s > 0$ independent of $\varepsilon \in ]0,1]$ and $u^\varepsilon = a^\varepsilon e^{i\Phi^\varepsilon/\varepsilon}$ solution to (4.1) on $[0,T_s]$. Moreover, $a^\varepsilon$ and $\Phi^\varepsilon$ are bounded in $L^\infty([0,T_s];H^s)$, uniformly in $\varepsilon \in ]0,1]$.

The solution to (4.3) formally converges to the solution of (4.4). Under Assumption 4, (4.4) has a unique solution $(a,\phi) \in L^\infty([0,T_*];H^m)^2$ for all $m > 0$ for some $T_* > 0$ independent of $m$ (see e.g. [2, 28]). We infer:


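Proposition 5. Let $s \in \mathbb{N}$. Then $T_s \ge T_*$, and there exists $C_s$ independent of $\varepsilon$ such that for every $0 \le t \le T_*$,
\[ \|a^\varepsilon(t) - a(t)\|_{H^s} \le C_s\varepsilon ; \qquad \|\Phi^\varepsilon(t) - \phi(t)\|_{H^s} \le C_s\varepsilon t. \tag{4.9} \]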

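Proof. We keep the same notations as above, (4.7). Denote by $\mathbf{v}$ the analog of $\mathbf{u}^\varepsilon$ corresponding to $(a,\phi)$. We have
\[ \partial_t(\mathbf{u}^\varepsilon - \mathbf{v}) + \sum_{j=1}^n A_j(\mathbf{u}^\varepsilon)\partial_j(\mathbf{u}^\varepsilon - \mathbf{v}) + \sum_{j=1}^n\big(A_j(\mathbf{u}^\varepsilon) - A_j(\mathbf{v})\big)\partial_j\mathbf{v} = \frac{\varepsilon}{2}L\mathbf{u}^\varepsilon. \]
Keeping the symmetrizer $S$ corresponding to $\mathbf{u}^\varepsilon$, we can do similar computations to the previous ones. Note that we know that $\mathbf{u}^\varepsilon$ and $\mathbf{v}$ are bounded in $L^\infty([0,\min(T_s,T_*)];H^s)$. Denote $\mathbf{w}^\varepsilon = \mathbf{u}^\varepsilon - \mathbf{v}$. Writing $L\mathbf{u}^\varepsilon = L\mathbf{w}^\varepsilon + L\mathbf{v}$, the term $L\mathbf{w}^\varepsilon$ disappears from the energy estimate, and we get, for $s > 2 + n/2$:
\[ \frac{d}{dt}\sum_{|\alpha|\le s}\big(S\partial_x^\alpha\mathbf{w}^\varepsilon, \partial_x^\alpha\mathbf{w}^\varepsilon\big) \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s}, \|\mathbf{v}\|_{H^{s+2}}\big)\sum_{|\alpha|\le s}\big(S\partial_x^\alpha\mathbf{w}^\varepsilon, \partial_x^\alpha\mathbf{w}^\varepsilon\big) + \varepsilon\|\mathbf{v}\|_{H^{s+2}}\|\mathbf{w}^\varepsilon(t)\|_{H^s}. \]
Gronwall's lemma and a continuity argument show that $\mathbf{w}^\varepsilon$ (hence $\mathbf{u}^\varepsilon$) is defined on $[0,T_*]$. By Assumption 4, $\|\mathbf{w}^\varepsilon_{|t=0}\|_{H^s} = O(\varepsilon)$, and we get:
\[ \|\mathbf{w}^\varepsilon\|_{L^\infty([0,T_*];H^s)} = O(\varepsilon). \]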

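The estimate for the phase (and not only its gradient) then follows from the above estimate and the integration in time of (4.3)–(4.4).

Proposition 5 yields an approximation of $u^\varepsilon$ for small times only:
\[ \begin{aligned} \big\|u^\varepsilon(t) - a(t)e^{i\phi(t)/\varepsilon}\big\|_{L^2} &= \big\|a^\varepsilon(t)e^{i\Phi^\varepsilon(t)/\varepsilon} - a(t)e^{i\phi(t)/\varepsilon}\big\|_{L^2} \\ &= O\Big(\|a^\varepsilon(t) - a(t)\|_{L^2} + \big\|e^{i\Phi^\varepsilon(t)/\varepsilon} - e^{i\phi(t)/\varepsilon}\big\|_{L^\infty}\|a(t)\|_{L^2}\Big) \\ &= O(\varepsilon) + O(t). \end{aligned} \]
For times of order $O(1)$, the corrector $a_1$ must be taken into account:

Proposition 6. Let Assumption 4 be satisfied. Define $(a^{(1)}, \phi^{(1)})$ by
\[ \begin{aligned} &\partial_t\phi^{(1)} + \nabla\phi\cdot\nabla\phi^{(1)} + 2\,\mathrm{Re}\big(\overline{a}\,a^{(1)}\big)f'\big(|a|^2\big) = 0, \\ &\partial_t a^{(1)} + \nabla\phi\cdot\nabla a^{(1)} + \nabla\phi^{(1)}\cdot\nabla a + \frac{1}{2}a^{(1)}\Delta\phi + \frac{1}{2}a\Delta\phi^{(1)} = \frac{i}{2}\Delta a, \\ &\phi^{(1)}_{|t=0} = 0 ; \quad a^{(1)}_{|t=0} = a_1. \end{aligned} \]
Then $a^{(1)}, \phi^{(1)} \in L^\infty([0,T_*];H^s)$ for every $s \ge 0$, and
\[ \|a^\varepsilon - a - \varepsilon a^{(1)}\|_{L^\infty([0,T_*];H^s)} + \|\Phi^\varepsilon - \phi - \varepsilon\phi^{(1)}\|_{L^\infty([0,T_*];H^s)} \le C_s\varepsilon^2, \quad \forall s \ge 0. \]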


The proof is a straightforward consequence of the above analysis, and is given in [22]. Despite the notations, it does not seem appropriate to consider $\phi^{(1)}$ as being part of the phase. Indeed, we infer from Proposition 6 that
\[ \big\|u^\varepsilon - ae^{i\phi^{(1)}}e^{i\phi/\varepsilon}\big\|_{L^\infty([0,T_*];L^2\cap L^\infty)} = O(\varepsilon). \]
Relating this information to the WKB methods presented in the introduction, we would have:
\[ a_0 = ae^{i\phi^{(1)}}. \]
Since $\phi^{(1)}$ depends on $a_1$ while $a$ does not, we retrieve the fact that in super-critical régimes, the leading order amplitude in WKB methods depends on the initial first corrector $a_1$.

Remark 4. The term $e^{i\phi^{(1)}}$ does not appear in the Wigner measure of $ae^{i\phi^{(1)}}e^{i\phi/\varepsilon}$. Thus, from the point of view of Wigner measures, the asymptotic behavior of the exact solution is described by the Euler-type system (4.4).

Remark 5 (Introducing an isotropic harmonic potential). The above method makes it possible to consider the semi-classical limit of (1.1) when $V(t,x) = \frac{1}{2}|x|^2$ is an isotropic harmonic potential, and Assumption 4 is satisfied. Let
\[ U^\varepsilon(t,x) = \frac{1}{(1+t^2)^{n/4}}\,e^{i\frac{t}{1+t^2}\frac{|x|^2}{2\varepsilon}}\,u^\varepsilon\Big(\arctan t, \frac{x}{\sqrt{1+t^2}}\Big). \]
Then $U^\varepsilon$ solves:
\[ \begin{cases} i\varepsilon\partial_t U^\varepsilon + \dfrac{\varepsilon^2}{2}\Delta U^\varepsilon = \dfrac{1}{1+t^2}\,f\Big(\big(1+t^2\big)^{n/2}|U^\varepsilon|^2\Big)U^\varepsilon, \\ U^\varepsilon(0,x) = a_0^\varepsilon(x)e^{i\varphi_0(x)/\varepsilon}. \end{cases} \]
We can then proceed as above. The only difference is the presence of time in the nonlinearity, which changes the analysis very little.

Remark 6 (Momenta). If in Assumption 4, we replace $H^s$ with
\[ \Sigma^s = H^s \cap \mathcal{F}(H^s) = \Big\{w \in L^2 ;\ (1-\Delta)^{k/2}\langle x\rangle^{s-k}w \in L^2,\ 0 \le k \le s\Big\}, \]
then the above analysis can be repeated in $\Sigma^s$. The main difference is due to the commutations of the powers of $x$ with the differential operators; it is easy to check that they introduce semilinear terms, which can be treated as source terms when applying the Gronwall lemma.


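4.2. Remarks about some conserved quantities. Consider the case of the cubic, defocusing Schrödinger equation: $f(y) \equiv y$. Recall three important evolution laws for (1.1):
\[ \begin{aligned} &\text{Mass:} && \frac{d}{dt}\|u^\varepsilon(t)\|_{L^2} = 0, \\ &\text{Energy:} && \frac{d}{dt}\Big(\|\varepsilon\nabla_x u^\varepsilon\|_{L^2}^2 + \|u^\varepsilon\|_{L^4}^4\Big) = 0, \\ &\text{Momentum:} && \frac{d}{dt}\,\mathrm{Im}\int\overline{u^\varepsilon}(t,x)\,\varepsilon\nabla_x u^\varepsilon(t,x)\,dx = 0, \\ &\text{Pseudo-conformal law:} && \frac{d}{dt}\Big(\|J^\varepsilon(t)u^\varepsilon\|_{L^2}^2 + t^2\|u^\varepsilon\|_{L^4}^4\Big) = t(2-n)\|u^\varepsilon\|_{L^4}^4, \end{aligned} \]
where $J^\varepsilon(t) = x + i\varepsilon t\nabla_x$. These evolutions are deduced from the usual ones ($\varepsilon = 1$, see e.g. [10, 34]) via the scaling $\psi(t,x) = u(\varepsilon t, \varepsilon x)$. Using (4.2) and passing to the limit formally in the above formulae yields:
\[ \begin{aligned} &\frac{d}{dt}\|a(t)\|_{L^2} = 0, \\ &\frac{d}{dt}\int\Big(|a(t,x)|^2|\nabla\phi(t,x)|^2 + |a(t,x)|^4\Big)dx = 0, \\ &\frac{d}{dt}\int|a(t,x)|^2\nabla\phi(t,x)\,dx = 0, \\ &\frac{d}{dt}\int\Big(\big|(x - t\nabla\phi(t,x))a(t,x)\big|^2 + t^2|a(t,x)|^4\Big)dx = (2-n)t\int|a(t,x)|^4\,dx. \end{aligned} \]
Note that we also have the conservation ([9]):
\[ \frac{d}{dt}\,\mathrm{Re}\int\overline{u^\varepsilon}(t,x)\,J^\varepsilon(t)u^\varepsilon(t,x)\,dx = 0, \]
which yields:
\[ \frac{d}{dt}\int\big(x - t\nabla\phi(t,x)\big)|a(t,x)|^2\,dx = 0. \]
All these expressions involve only $(|a|^2, \nabla\phi)$, that is, the solution of (4.5). We thus retrieve formally some evolution laws for the Euler equation.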

5. Introducing Subquadratic Potential and Initial Phase

In this section, we prove Theorem 1. First, we point out that the uniqueness for $u^\varepsilon$ in $C([0,T_*];H^s)$ is straightforward for $s > n/2$. We thus have to prove that there exists such a solution, and that it is smooth. As suggested by the statements of Theorem 1, the idea consists in resuming Grenier's method, and in writing the phase $\Phi^\varepsilon$ as $\Phi^\varepsilon = \phi_{\rm eik} + \phi^\varepsilon$.


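We take $\phi^\varepsilon$ as a new unknown function. Recall that the system (4.3) reads, with the present notations:
\[ \begin{aligned} &\partial_t\Phi^\varepsilon + \frac{1}{2}|\nabla\Phi^\varepsilon|^2 + V + f\big(|a^\varepsilon|^2\big) = 0 ; & \Phi^\varepsilon_{|t=0} &= \phi_0, \\ &\partial_t a^\varepsilon + \nabla\Phi^\varepsilon\cdot\nabla a^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\Phi^\varepsilon = i\frac{\varepsilon}{2}\Delta a^\varepsilon ; & a^\varepsilon_{|t=0} &= a_0^\varepsilon. \end{aligned} \]
This system becomes, in terms of $\phi^\varepsilon$, and given (1.5):
\[ \begin{aligned} &\partial_t\phi^\varepsilon + \frac{1}{2}|\nabla\phi^\varepsilon|^2 + \nabla\phi_{\rm eik}\cdot\nabla\phi^\varepsilon + f\big(|a^\varepsilon|^2\big) = 0, \\ &\partial_t a^\varepsilon + \nabla\phi^\varepsilon\cdot\nabla a^\varepsilon + \nabla\phi_{\rm eik}\cdot\nabla a^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\phi^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\phi_{\rm eik} = i\frac{\varepsilon}{2}\Delta a^\varepsilon, \\ &\phi^\varepsilon_{|t=0} = 0 ; \quad a^\varepsilon_{|t=0} = a_0^\varepsilon. \end{aligned} \tag{5.1} \]
We note that $\phi^\varepsilon$ is real-valued, while $a^\varepsilon$ is complex-valued. Like in Sect. 4.1, we work with $v^\varepsilon = \nabla\phi^\varepsilon$ instead of $\phi^\varepsilon$, to begin with. The new terms are the factors where $\phi_{\rm eik}$ is present. The point is to check that they are semilinear perturbations, which can be treated as source terms in view of the Gronwall lemma. Again, separate real and imaginary parts of $a^\varepsilon$, $a^\varepsilon = a_1^\varepsilon + ia_2^\varepsilon$, and introduce:
\[ \mathbf{u}^\varepsilon = \begin{pmatrix} a_1^\varepsilon \\ a_2^\varepsilon \\ v_1^\varepsilon \\ \vdots \\ v_n^\varepsilon \end{pmatrix}, \qquad L = \begin{pmatrix} 0 & -\Delta & 0 & \dots & 0 \\ \Delta & 0 & 0 & \dots & 0 \\ 0 & 0 & & 0_{n\times n} & \end{pmatrix}, \]
\[ \text{and} \quad A(\mathbf{u},\xi) = \sum_{j=1}^n A_j(\mathbf{u})\xi_j = \begin{pmatrix} v\cdot\xi & 0 & \frac{a_1}{2}\,{}^t\xi \\ 0 & v\cdot\xi & \frac{a_2}{2}\,{}^t\xi \\ 2f'a_1\xi & 2f'a_2\xi & v\cdot\xi\,I_n \end{pmatrix}, \]
where $f'$ stands for $f'(|a_1|^2 + |a_2|^2)$. Instead of (4.7), we now have a system of the form
\[ \partial_t\mathbf{u}^\varepsilon + \sum_{j=1}^n A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon + \sum_{j=1}^n B_j(\nabla\phi_{\rm eik})\partial_j\mathbf{u}^\varepsilon + M\big(\nabla^2\phi_{\rm eik}\big)\mathbf{u}^\varepsilon = \frac{\varepsilon}{2}L\mathbf{u}^\varepsilon, \tag{5.2} \]
where the matrices $B_j$ depend linearly on their argument, and the matrix $M$ is smooth, locally bounded. The quasilinear part of (5.2) is the same as in Sect. 4.1, and involves the matrices $A_j$. In particular, we keep the same symmetrizer $S$ given by (4.8). The matrices $B_j$ have a semilinear contribution, as we see below. The term corresponding to the matrix $M$ can obviously be considered as a source term, since $\phi_{\rm eik}$ is subquadratic.

Let $s$ be an integer, $s > 2 + n/2$, and let $\alpha$ be a multi-index of length $\le s$. We have:
\[ \frac{d}{dt}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) = \big(\partial_t S\,\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) + 2\big(S\partial_t\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big), \]
since $S$ is symmetric. For the first term, we consider the lower $n\times n$ block:
\[ \big(\partial_t S\,\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) \le \left\|\frac{1}{f'}\partial_t\Big(f'\big(|a_1^\varepsilon|^2 + |a_2^\varepsilon|^2\big)\Big)\right\|_{L^\infty}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big). \tag{5.3} \]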


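We consider times not larger than $T$ given by Lemma 1, so that the function $\phi_{\rm eik}$ remains smooth and subquadratic. So long as $\|\mathbf{u}^\varepsilon\|_{L^\infty} \le 2\|a_0^\varepsilon\|_{L^\infty}$, we have:
\[ f'\big(|a_1^\varepsilon|^2 + |a_2^\varepsilon|^2\big) \ge \inf\Big\{f'(y) ;\ 0 \le y \le 4\sup_{0<\varepsilon\le 1}\|a_0^\varepsilon\|_{L^\infty}^2\Big\} = \delta_n > 0. \]
We infer,
\[ \left\|\frac{1}{f'}\partial_t\Big(f'\big(|a_1^\varepsilon|^2 + |a_2^\varepsilon|^2\big)\Big)\right\|_{L^\infty} \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s} + \|x\mathbf{u}^\varepsilon\|_{H^{s-1}}\big), \tag{5.4} \]
for some locally bounded map $C(\cdot)$. We used Sobolev embeddings, (5.2) and Lemma 1: the terms $B_j$ are sublinear in $x$, hence the norm $\|x\mathbf{u}^\varepsilon\|_{H^{s-1}}$ which we did not consider in Sect. 4.1. We emphasize that this estimate explains why we assume $s > 2 + n/2$, and not only $s > 1 + n/2$: we control $\partial_t\mathbf{u}^\varepsilon$ in $L^\infty$ using (5.2), so we need to estimate $L\mathbf{u}^\varepsilon$ in $L^\infty$. For all the other terms, $s > 1 + n/2$ would suffice. This also explains why we wrote $\|x\mathbf{u}^\varepsilon\|_{H^{s-1}}$ and not $\|x\mathbf{u}^\varepsilon\|_{H^s}$. For the second term we use
\[ \begin{aligned} \big(S\partial_t\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) = {} & \frac{\varepsilon}{2}\big(SL(\partial_x^\alpha\mathbf{u}^\varepsilon), \partial_x^\alpha\mathbf{u}^\varepsilon\big) - \Big(S\partial_x^\alpha\Big(\sum_{j=1}^n A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon\Big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big) \\ & - \Big(S\partial_x^\alpha\Big(\sum_{j=1}^n B_j(\nabla\phi_{\rm eik})\partial_j\mathbf{u}^\varepsilon\Big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big) - \Big(S\partial_x^\alpha\big(M(\nabla^2\phi_{\rm eik})\mathbf{u}^\varepsilon\big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big). \end{aligned} \]
The first two terms of the right hand side are handled in the same way as in Sect. 4.1: the first one is zero, and the second can be estimated by:
\[ \Big(S\partial_x^\alpha\Big(\sum_{j=1}^n A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon\Big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big) \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s}\big)\sum_{|\alpha|\le s}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big), \tag{5.5} \]
where we keep the convention that $C(\cdot)$ is a locally bounded map. Let us briefly explain this quasilinear estimate. First, write
\[ \big(S\partial_x^\alpha\big(A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon\big), \partial_x^\alpha\mathbf{u}^\varepsilon\big) = \big(SA_j(\mathbf{u}^\varepsilon)\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) + \Big(S\big(\partial_x^\alpha(A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon) - A_j(\mathbf{u}^\varepsilon)\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon\big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big). \]
By symmetry of $SA_j(\mathbf{u}^\varepsilon)$,
\[ \big(SA_j(\mathbf{u}^\varepsilon)\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) = -\big(\partial_j(SA_j(\mathbf{u}^\varepsilon))\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) - \big(SA_j(\mathbf{u}^\varepsilon)\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big). \]
We infer:
\[ \big|\big(SA_j(\mathbf{u}^\varepsilon)\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big)\big| \le \big\|\partial_j\big(SA_j(\mathbf{u}^\varepsilon)\big)\big\|_{L^\infty}\|\partial_x^\alpha\mathbf{u}^\varepsilon\|_{L^2}^2 \le C\big(\|\mathbf{u}^\varepsilon\|_{L^\infty}\big)\|\nabla_x\mathbf{u}^\varepsilon\|_{L^\infty}\|\partial_x^\alpha\mathbf{u}^\varepsilon\|_{L^2}^2. \]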


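The usual estimates on commutators (see e.g. [28, 35]) lead to
\[ \Big|\Big(S\big(\partial_x^\alpha(A_j(\mathbf{u}^\varepsilon)\partial_j\mathbf{u}^\varepsilon) - A_j(\mathbf{u}^\varepsilon)\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon\big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big)\Big| \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s}\big)\|\mathbf{u}^\varepsilon\|_{H^s}^2, \]
and (5.5) follows, since we consider times where $S^{-1}$ is bounded.

For the third term of $\big(S\partial_t\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big)$, write:
\[ \big(S\partial_x^\alpha\big(B_j(\nabla\phi_{\rm eik})\partial_j\mathbf{u}^\varepsilon\big), \partial_x^\alpha\mathbf{u}^\varepsilon\big) = \int SB_j(\nabla\phi_{\rm eik})\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon\cdot\partial_x^\alpha\mathbf{u}^\varepsilon\,dx + \int S\big[\partial_x^\alpha, B_j(\nabla\phi_{\rm eik})\partial_j\big]\mathbf{u}^\varepsilon\cdot\partial_x^\alpha\mathbf{u}^\varepsilon\,dx. \]
For the first term of the right-hand side, an integration by parts yields:
\[ \begin{aligned} \Big|\int SB_j(\nabla\phi_{\rm eik})\partial_j\partial_x^\alpha\mathbf{u}^\varepsilon\cdot\partial_x^\alpha\mathbf{u}^\varepsilon\,dx\Big| &\le \big\|\partial_j\big(SB_j(\nabla\phi_{\rm eik})\big)\big\|_{L^\infty}\|\mathbf{u}^\varepsilon\|_{H^s}^2 \\ &\le C\big(\|a^\varepsilon\|_{L^\infty}\big)\big\|\langle x\rangle a^\varepsilon\nabla a^\varepsilon\big\|_{L^\infty}\|\mathbf{u}^\varepsilon\|_{H^s}^2 \\ &\le C\big(\|\mathbf{u}^\varepsilon\|_{L^\infty}\big)\Big(\|\mathbf{u}^\varepsilon\|_{L^\infty} + \|x\mathbf{u}^\varepsilon\|_{L^\infty}^2 + \|\nabla\mathbf{u}^\varepsilon\|_{L^\infty}\Big)\|\mathbf{u}^\varepsilon\|_{H^s}^2, \end{aligned} \tag{5.6} \]
where we have used Lemma 1. Again from Lemma 1, the commutator $\big[\partial_x^\alpha, B_j(\nabla\phi_{\rm eik})\partial_j\big]$ is a differential operator of degree $\le s$, with bounded coefficients. We infer:
\[ \Big|\Big(S\partial_x^\alpha\Big(\sum_{j=1}^n B_j(\nabla\phi_{\rm eik})\partial_j\mathbf{u}^\varepsilon\Big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big)\Big| \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s} + \|x\mathbf{u}^\varepsilon\|_{H^{s-1}}\big)\|\mathbf{u}^\varepsilon\|_{H^s}^2. \]
We have obviously
\[ \Big|\Big(S\partial_x^\alpha\big(M(\nabla^2\phi_{\rm eik})\mathbf{u}^\varepsilon\big), \partial_x^\alpha\mathbf{u}^\varepsilon\Big)\Big| \le C\big(\|\mathbf{u}^\varepsilon\|_{L^\infty}\big)\|\mathbf{u}^\varepsilon\|_{H^s}^2. \]
This yields:
\[ \frac{d}{dt}\big(S\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s} + \|x\mathbf{u}^\varepsilon\|_{H^{s-1}}\big)\|\mathbf{u}^\varepsilon\|_{H^s}^2, \tag{5.7} \]
where the map $C(\cdot)$ is locally bounded. We now have to bound $x\mathbf{u}^\varepsilon$ in $H^{s-1}$ to close our family of estimates: we consider
\[ \frac{d}{dt}\big(S\partial_x^\beta(x_k\mathbf{u}^\varepsilon), \partial_x^\beta(x_k\mathbf{u}^\varepsilon)\big) ; \quad 1 \le k \le n, \quad |\beta| \le s-1. \]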


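We can proceed as above, replacing $\mathbf{u}^\varepsilon$ with $x_k\mathbf{u}^\varepsilon$: $x_k\mathbf{u}^\varepsilon$ solves almost the same equation as $\mathbf{u}^\varepsilon$, and we must control some commutators:
\[ \begin{aligned} &\partial_t(x_k\mathbf{u}^\varepsilon) + \sum_{j=1}^n A_j(\mathbf{u}^\varepsilon)\partial_j(x_k\mathbf{u}^\varepsilon) + \sum_{j=1}^n B_j(\nabla\phi_{\rm eik})\partial_j(x_k\mathbf{u}^\varepsilon) + M\big(\nabla^2\phi_{\rm eik}\big)x_k\mathbf{u}^\varepsilon \\ &\qquad = \frac{\varepsilon}{2}L(x_k\mathbf{u}^\varepsilon) + A_k(\mathbf{u}^\varepsilon)\mathbf{u}^\varepsilon + B_k(\nabla\phi_{\rm eik})\mathbf{u}^\varepsilon + \frac{\varepsilon}{2}[x_k, L]\mathbf{u}^\varepsilon. \end{aligned} \]
The term $A_k(\mathbf{u}^\varepsilon)\mathbf{u}^\varepsilon$ is harmless. The term $B_k(\nabla\phi_{\rm eik})\mathbf{u}^\varepsilon$ is controlled by $\langle x\rangle\mathbf{u}^\varepsilon$, since $\phi_{\rm eik}$ is subquadratic: this is a (semi)linear perturbation. Finally,
\[ [x_k, L] = \begin{pmatrix} 0 & 2\partial_k & 0 & \dots & 0 \\ -2\partial_k & 0 & 0 & \dots & 0 \\ 0 & 0 & & 0_{n\times n} & \end{pmatrix}. \]
Now we only have to notice that estimating $x_k\mathbf{u}^\varepsilon$ does not involve extra regularity or extra momenta. In the above computations, the first time we needed to consider momenta was for (5.4): we need exactly the same estimate now, since it is due to the symmetrizer, which remains the same. The same remark is valid for (5.6). For $\beta$ a multi-index of length $\le s-1$, we find:
\[ \frac{d}{dt}\big(S\partial_x^\beta(x_k\mathbf{u}^\varepsilon), \partial_x^\beta(x_k\mathbf{u}^\varepsilon)\big) \le C\big(\|\mathbf{u}^\varepsilon\|_{H^s} + \|x\mathbf{u}^\varepsilon\|_{H^{s-1}}\big)\Big(\|\mathbf{u}^\varepsilon\|_{H^s}^2 + \|x\mathbf{u}^\varepsilon\|_{H^{s-1}}^2\Big). \tag{5.8} \]
The term $\|\mathbf{u}^\varepsilon\|_{H^s}^2$ is due to the commutator $[x_k, L]$:
\[ \big|\big(S\partial_x^\beta[x_k, L]\mathbf{u}^\varepsilon, \partial_x^\beta x_k\mathbf{u}^\varepsilon\big)\big| = \big|\big(\partial_x^\beta[x_k, L]\mathbf{u}^\varepsilon, \partial_x^\beta x_k\mathbf{u}^\varepsilon\big)\big| \le \|\mathbf{u}^\varepsilon\|_{H^s}\|x\mathbf{u}^\varepsilon\|_{H^{s-1}}. \]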

Summing over the inequalities (5.7) and (5.8) yields a closed set of estimates, from which we infer the analogue of Proposition 4; note that the time $T_s$ is not larger than $T$ by construction, and may be strictly smaller than $T$, due to possible shocks for (5.2). We also mention the fact that the above analysis gives $v^\varepsilon = \nabla\phi^\varepsilon \in C([0,T_s];H^s)$, with $x\nabla\phi^\varepsilon \in C([0,T_s];H^{s-1})$: back to (5.1), this shows that $\phi^\varepsilon \in C([0,T_*];H^{s-1})$, but that we cannot claim that $x\phi^\varepsilon \in C([0,T_*];H^{s-1})$.

Passing to the limit $\varepsilon \to 0$ in (5.1), it is natural to introduce the system:
\[ \begin{aligned} &\partial_t\phi + \frac{1}{2}|\nabla\phi|^2 + \nabla\phi_{\rm eik}\cdot\nabla\phi + f\big(|a|^2\big) = 0, \\ &\partial_t a + \nabla\phi\cdot\nabla a + \nabla\phi_{\rm eik}\cdot\nabla a + \frac{1}{2}a\Delta\phi + \frac{1}{2}a\Delta\phi_{\rm eik} = 0, \\ &\phi_{|t=0} = 0 ; \quad a_{|t=0} = a_0. \end{aligned} \tag{5.9} \]

The above analysis shows that this system has a unique solution in H s with the first momentum in H s−1 , locally in time for t ∈ [0, T∗ ], for some T∗ ∈]0, T ]; T∗ is independent of s, from the continuation principle, explained for instance in [28, Sect. 2.2]; essentially, tame estimates given by Moser’s calculus show that the only obstruction to global existence is the unboundedness of the C 1 norm of (φ, a). We easily obtain the analogue of Proposition 5: mimicking the above computations, we can estimate the error


$(a^\varepsilon - a, \phi^\varepsilon - \phi)$ in $H^s$, by first estimating $(a^\varepsilon - a, \nabla\phi^\varepsilon - \nabla\phi)$ and its first momentum in $H^s$. We deduce that $T_s \ge T_*$, and (1.8) follows. The end of Theorem 1 can then be proved as in [22]: from the above analysis, the functions
\[ \frac{a^\varepsilon - a}{\varepsilon}, \qquad \frac{\nabla\phi^\varepsilon - \nabla\phi}{\varepsilon}, \]
and their first momentum, are bounded in $H^s$ for every $s \ge 0$. A subsequence converges to the linearization of (5.2), yielding a pair $(a^{(1)}, \phi^{(1)})$. By uniqueness for the limit system, the whole sequence is convergent, and the analogue of Proposition 6 follows. This completes the proof of Theorem 1.

6. Extension to the Case 0 < κ < 1

When $0 < \kappa < 1$, we propose an analysis which can be considered as a generalization of the study led in Sect. 5. Throughout this paragraph, we suppose that Assumptions 1–3 are satisfied. Again, we write the exact solution as
\[ u^\varepsilon = a^\varepsilon e^{i\Phi^\varepsilon/\varepsilon}, \quad \text{with } \Phi^\varepsilon = \phi_{\rm eik} + \phi^\varepsilon. \]

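The unknown function is the pair $(a^\varepsilon, \phi^\varepsilon)$. We have two unknown functions to solve a single equation, (1.1). We can choose how to balance the terms: we resume the approach followed when $\kappa = 0$. Note that this approach would also be efficient for the case $\kappa \ge 1$, with the serious drawback that we still assume $f' > 0$, an assumption proven to be unnecessary when $\kappa \ge 1$ (see Sect. 3). We impose:
\[ \begin{aligned} &\partial_t\phi^\varepsilon + \frac{1}{2}|\nabla\phi^\varepsilon|^2 + \nabla\phi_{\rm eik}\cdot\nabla\phi^\varepsilon + \varepsilon^\kappa f\big(|a^\varepsilon|^2\big) = 0, \\ &\partial_t a^\varepsilon + \nabla\phi^\varepsilon\cdot\nabla a^\varepsilon + \nabla\phi_{\rm eik}\cdot\nabla a^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\phi^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\phi_{\rm eik} = i\frac{\varepsilon}{2}\Delta a^\varepsilon, \\ &\phi^\varepsilon_{|t=0} = 0 ; \quad a^\varepsilon_{|t=0} = a_0^\varepsilon. \end{aligned} \tag{6.1} \]
This is the same system as (5.1), with only $f$ replaced by $\varepsilon^\kappa f$. Mimicking the analysis of Sect. 5, we work with the unknown $\mathbf{u}^\varepsilon$ given by the same definition: it solves the system (5.2), where only the matrices $A_j$ have changed, and now depend on $\varepsilon$. The symmetrizer is the same as before, with $f'$ replaced by $\varepsilon^\kappa f'$: the matrix $S = S^\varepsilon$ is not bounded as $\varepsilon \to 0$, but its inverse is. We see that (5.3) and (5.4) still hold, independent of $\kappa$. We claim that inequalities similar to (5.7) and (5.8) hold:
\[ \begin{aligned} \frac{d}{dt}\big(S^\varepsilon\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big) &\le C\big(\|\mathbf{u}^\varepsilon\|_{H^s} + \|x\mathbf{u}^\varepsilon\|_{H^{s-1}}\big)\sum_{|\gamma|\le s}\big(S^\varepsilon\partial_x^\gamma\mathbf{u}^\varepsilon, \partial_x^\gamma\mathbf{u}^\varepsilon\big), \\ \frac{d}{dt}\big(S^\varepsilon\partial_x^\beta(x_k\mathbf{u}^\varepsilon), \partial_x^\beta(x_k\mathbf{u}^\varepsilon)\big) &\le C\big(\|\mathbf{u}^\varepsilon\|_{H^s} + \|x\mathbf{u}^\varepsilon\|_{H^{s-1}}\big)\Big(\sum_{|\gamma|\le s}\big(S^\varepsilon\partial_x^\gamma\mathbf{u}^\varepsilon, \partial_x^\gamma\mathbf{u}^\varepsilon\big) + \sum_{\substack{|\gamma|\le s-1 \\ 1\le j\le n}}\big(S^\varepsilon\partial_x^\gamma(x_j\mathbf{u}^\varepsilon), \partial_x^\gamma(x_j\mathbf{u}^\varepsilon)\big)\Big), \end{aligned} \]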

where the map C(·) is locally bounded, independent of ε ∈]0, 1]. The fact that such estimates remain valid, with this dependence upon ε, stems essentially from the following reasons:


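– The matrices $S^\varepsilon$ and $B_j$ are diagonal.
– The matrix $M$ is block diagonal: the blocks correspond to the presence/absence of $\varepsilon$ in $S^\varepsilon$.
– The matrices $S^\varepsilon A_j^\varepsilon$ are independent of $\varepsilon \in ]0,1]$.
– The inverse of $S^\varepsilon$ is uniformly bounded on compacts, as $\varepsilon \to 0$.

A continuity argument and the Gronwall lemma then imply the analogue of Proposition 4: $\mathbf{u}^\varepsilon$ exists locally in time, with $H^s$-norm uniformly bounded as $\varepsilon \to 0$. Note that since $\phi^\varepsilon_{|t=0} = 0$, we have:
\[ \big(S^\varepsilon\partial_x^\alpha\mathbf{u}^\varepsilon, \partial_x^\alpha\mathbf{u}^\varepsilon\big)_{|t=0} = O(1), \]
and we infer more precisely:
\[ \|a^\varepsilon\|_{L^\infty([0,T_*];H^s)} = O(1) ; \qquad \|\nabla\phi^\varepsilon\|_{L^\infty([0,T_*];H^s)} = O\big(\varepsilon^\kappa\big). \]
It seems natural to change unknown functions, and work with $\tilde\phi^\varepsilon = \varepsilon^{-\kappa}\phi^\varepsilon$ instead of $\phi^\varepsilon$. With this, we somehow correct the shift in the cascade of equations caused by the factor $\varepsilon^\kappa$ in front of the nonlinearity. Then (6.1) becomes:
\[ \begin{aligned} &\partial_t\tilde\phi^\varepsilon + \frac{\varepsilon^\kappa}{2}\big|\nabla\tilde\phi^\varepsilon\big|^2 + \nabla\phi_{\rm eik}\cdot\nabla\tilde\phi^\varepsilon + f\big(|a^\varepsilon|^2\big) = 0, \\ &\partial_t a^\varepsilon + \varepsilon^\kappa\nabla\tilde\phi^\varepsilon\cdot\nabla a^\varepsilon + \nabla\phi_{\rm eik}\cdot\nabla a^\varepsilon + \frac{\varepsilon^\kappa}{2}a^\varepsilon\Delta\tilde\phi^\varepsilon + \frac{1}{2}a^\varepsilon\Delta\phi_{\rm eik} = i\frac{\varepsilon}{2}\Delta a^\varepsilon, \\ &\tilde\phi^\varepsilon_{|t=0} = 0 ; \quad a^\varepsilon_{|t=0} = a_0^\varepsilon. \end{aligned} \tag{6.2} \]
The pair $(\tilde\phi^\varepsilon, a^\varepsilon)$ is bounded in $C([0,T];H^s)$. Therefore, a subsequence is convergent, and the limit is given by:
\[ \begin{aligned} &\partial_t\tilde\phi + \nabla\phi_{\rm eik}\cdot\nabla\tilde\phi + f\big(|a|^2\big) = 0 ; & \tilde\phi_{|t=0} &= 0, \\ &\partial_t a + \nabla\phi_{\rm eik}\cdot\nabla a + \frac{1}{2}a\Delta\phi_{\rm eik} = 0 ; & a_{|t=0} &= a_0. \end{aligned} \]
We see that $a$ solves (1.6); $\tilde\phi$ is given by an ordinary differential equation along the rays associated to $\phi_{\rm eik}$, with a source term showing nonlinear effect: $f(|a|^2)$. By uniqueness, the whole sequence is convergent. Roughly speaking, we see that if
\[ \mathbf{w}^\varepsilon = {}^t\big(\nabla(\tilde\phi^\varepsilon - \tilde\phi),\ a^\varepsilon - a\big), \]
then the Gronwall lemma yields:
\[ \big(S^\varepsilon\partial_x^\alpha\mathbf{w}^\varepsilon, \partial_x^\alpha\mathbf{w}^\varepsilon\big) \le C\big(\varepsilon + \varepsilon^\kappa\big) \le 2C\varepsilon^\kappa. \]
We infer:

Proposition 7. Let $s > 2 + n/2$. Then (6.1) has a unique solution $(a^\varepsilon, \phi^\varepsilon) \in C([0,T];H^s)^2$, such that $x_k a^\varepsilon, x_k\partial_j\phi^\varepsilon \in C([0,T];H^s)$, for every $1 \le j,k \le n$ ($T$ is given by Lemma 1). Moreover, there exists $C_s$ independent of $\varepsilon$ such that for every $0 \le t \le T$,
\[ \|a^\varepsilon(t) - a(t)\|_{H^s} \le C_s\varepsilon^\kappa ; \qquad \|\phi^\varepsilon(t) - \varepsilon^\kappa\tilde\phi(t)\|_{H^s} \le C_s\varepsilon^{2\kappa}t, \tag{6.3} \]
where $a$ is given by (1.6).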


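Three cases must be distinguished:
– If $1/2 < \kappa < 1$, then we can infer the analogue of (1.9).
– If $\kappa = 1/2$, then we can infer the analogue of (1.8) (but not yet of (1.9)).
– If $0 < \kappa < 1/2$, then we must pursue the analysis, and compute a corrector of order $\varepsilon^{2\kappa}$.

We shall not go further into detailed computations, but instead, discuss the whole analysis in a rather loose fashion. However, we note that all the ingredients have been given for a complete justification. Let $N = [1/\kappa]$, where $[r]$ is the largest integer not larger than $r > 0$. We construct $a^{(1)}, \dots, a^{(N)}$ and $\tilde\phi^{(1)}, \dots, \tilde\phi^{(N)}$ such that:
\[ \big\|a^\varepsilon - a - \varepsilon^\kappa a^{(1)} - \cdots - \varepsilon^{N\kappa}a^{(N)}\big\|_{L^\infty([0,T];H^s)} + \big\|\tilde\phi^\varepsilon - \tilde\phi - \varepsilon^\kappa\tilde\phi^{(1)} - \cdots - \varepsilon^{N\kappa}\tilde\phi^{(N)}\big\|_{L^\infty([0,T];H^s)} = o\big(\varepsilon^{N\kappa}\big). \]
But since $N + 1 > 1/\kappa$, we have:
\[ \big\|\phi^\varepsilon - \varepsilon^\kappa\tilde\phi - \varepsilon^{2\kappa}\tilde\phi^{(1)} - \cdots - \varepsilon^{N\kappa}\tilde\phi^{(N-1)}\big\|_{L^\infty([0,T];H^s)} = O\big(\varepsilon^{(N+1)\kappa}\big) = o(\varepsilon). \]
The analogue of (1.9) follows:
\[ \big\|u^\varepsilon - ae^{i\phi_{\rm eik}/\varepsilon + i\phi^\varepsilon_{\rm app}}\big\|_{L^\infty([0,T];L^2\cap L^\infty)} = o(1), \]
where
\[ \phi^\varepsilon_{\rm app} = \frac{\tilde\phi}{\varepsilon^{1-\kappa}} + \frac{\tilde\phi^{(1)}}{\varepsilon^{1-2\kappa}} + \ldots + \frac{\tilde\phi^{(N-1)}}{\varepsilon^{1-N\kappa}}. \]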
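The bookkeeping behind the approximate phase can be tabulated for a given exponent. The following is a small sketch with a hypothetical helper name, `phase_correctors`; it computes $N = [1/\kappa]$ and the powers $1 - j\kappa$, $j = 1, \dots, N$, of $1/\varepsilon$ that appear in the approximate phase. Exact rational arithmetic is used so that the integer part is not spoiled by floating point rounding; for that reason $\kappa$ should be passed as a `Fraction`, an integer or a string such as `"2/5"` rather than as a float.

```python
from fractions import Fraction
from math import floor

def phase_correctors(kappa):
    """Return N = [1/kappa] and the powers of 1/eps in the approximate phase, for 0 < kappa <= 1."""
    kappa = Fraction(kappa)
    if not 0 < kappa <= 1:
        raise ValueError("expected 0 < kappa <= 1")
    n_terms = floor(1 / kappa)
    # The j-th term of the approximate phase is divided by eps^(1 - j*kappa), j = 1..N.
    exponents = [1 - j * kappa for j in range(1, n_terms + 1)]
    return n_terms, exponents

# Example: kappa = 1/3 gives three terms, the last one being an O(1) phase shift.
n_terms, exponents = phase_correctors(Fraction(1, 3))
```

All returned exponents are nonnegative, and an exponent equal to zero corresponds to a phase shift of order one, as in the case $\kappa = 1$.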

Remark 7. In the case $\kappa = 1$, $N = 1$, and the above analysis shows that one phase shift factor appears: we retrieve the function $G$ of Proposition 1 (under the unnecessary assumption $f' > 0$). If $\kappa > 1$, then $N = 0$, and we see that $ae^{i\phi_{\rm eik}/\varepsilon}$ is a good approximation for $u^\varepsilon$.

Acknowledgements. Support by European network HYKE, funded by the EC as contract HPRN-CT-200200282, by Centro de Matemática e Aplicações Fundamentais (Lisbon), funded by FCT as contract POCTIISFL-1-209, and by the ANR project SCASEN, is acknowledged.

References
1. Ablowitz, M.J., Clarkson, P.A.: Solitons, nonlinear evolution equations and inverse scattering. London Mathematical Society Lecture Note Series, Vol. 149, Cambridge: Cambridge University Press, 1991
2. Alinhac, S., Gérard, P.: Opérateurs pseudo-différentiels et théorème de Nash-Moser. Savoirs Actuels, Paris: InterEditions, 1991
3. Bahouri, H., Chemin, J.-Y.: Équations d'ondes quasilinéaires et effet dispersif. Internat. Math. Res. Notices, no. 21, 1141–1178 (1999)
4. Bahouri, H., Chemin, J.-Y.: Équations d'ondes quasilinéaires et estimations de Strichartz. Amer. J. Math. 121(6), 1337–1377 (1999)
5. Bao, W., Jin, S., Markowich, P.A.: Numerical study of time-splitting spectral discretizations of nonlinear Schrödinger equations in the semiclassical regimes. SIAM J. Sci. Comput. 25(1), 27–64 (2003)
6. Burq, N., Gérard, P., Tzvetkov, N.: Multilinear eigenfunction estimates and global existence for the three dimensional nonlinear Schrödinger equations. Ann. Sci. École Norm. Sup. (4) 38(2), 255–301 (2005)


7. Burq, N., Zworski, M.: Instability for the semiclassical non-linear Schrödinger equation. Commun. Math. Phys. 260(1), 45–58 (2005)
8. Carles, R.: Geometric optics and instability for semi-classical Schrödinger equations. Arch. Rat. Mech. Anal. (2006), to appear
9. Carles, R., Nakamura, Y.: Nonlinear Schrödinger equations with Stark potential. Hokkaido Math. J. 33(3), 719–729 (2004)
10. Cazenave, T.: Semilinear Schrödinger equations. Courant Lecture Notes in Mathematics, Vol. 10, New York: New York University Courant Institute of Mathematical Sciences, 2003
11. Chemin, J.-Y.: Dynamique des gaz à masse totale finie. Asymptotic Anal. 3(3), 215–220 (1990)
12. Cheverry, C.: Cascade of phases in turbulent flows. Bull. Soc. Math. France (2006), to appear. Preprint version: http://arxiv.org/math.AP/0402408
13. Cheverry, C., Guès, O.: Counter-examples to the concentration-cancellation property. Preprint, 2005
14. Christ, M., Colliander, J., Tao, T.: Ill-posedness for nonlinear Schrödinger and wave equations. Ann. Inst. H. Poincaré Anal. Non Linéaire (2005), see also http://arxiv.org/math.AP/0311048, 2003
15. Dalfovo, F., Giorgini, S., Pitaevskii, L.P., Stringari, S.: Theory of Bose-Einstein condensation in trapped gases. Rev. Mod. Phys. 71(3), 463–512 (1999)
16. Dereziński, J., Gérard, C.: Scattering theory of quantum and classical N-particle systems. Texts and Monographs in Physics, Berlin Heidelberg: Springer Verlag, 1997
17. Duistermaat, J.J.: Oscillatory integrals, Lagrange immersions and unfolding of singularities. Comm. Pure Appl. Math. 27, 207–281 (1974)
18. Dunford, N., Schwartz, J.T.: Linear operators. Part II: Spectral theory. Self adjoint operators in Hilbert space. With the assistance of William G. Bade and Robert G. Bartle, New York-London: Interscience Publishers John Wiley & Sons, 1963
19. Fujiwara, D.: A construction of the fundamental solution for the Schrödinger equation. J. Analyse Math. 35, 41–96 (1979)
20. Fujiwara, D.: Remarks on the convergence of the Feynman path integrals. Duke Math. J. 47(3), 559–600 (1980)
21. Gérard, P.: Remarques sur l'analyse semi-classique de l'équation de Schrödinger non linéaire. Séminaire sur les Équations aux Dérivées Partielles, 1992–1993, Palaiseau, École Polytech., 1993, pp. Exp. No. XIII, 13
22. Grenier, E.: Semiclassical limit of the nonlinear Schrödinger equation in small time. Proc. Amer. Math. Soc. 126(2), 523–530 (1998)
23. Jin, S., Levermore, C.D., McLaughlin, D.W.: The semiclassical limit of the defocusing NLS hierarchy. Comm. Pure Appl. Math. 52(5), 613–654 (1999)
24. Kamvissis, S., McLaughlin, K.D.T.-R., Miller, P.D.: Semiclassical soliton ensembles for the focusing nonlinear Schrödinger equation. Annals of Mathematics Studies, Vol. 154, Princeton, NJ: Princeton University Press, 2003
25. Kolomeisky, E.B., Newman, T.J., Straley, J.P., Qi, X.: Low-dimensional Bose liquids: Beyond the Gross-Pitaevskii approximation. Phys. Rev. Lett. 85(6), 1146–1149 (2000)
26. Landau, L., Lifschitz, E.: Physique théorique ("Landau-Lifchitz"). Tome III: Mécanique quantique. Théorie non relativiste. Moscow: Éditions Mir, 1967, Deuxième édition, Traduit du russe par Édouard Gloukhian
27. Lannes, D., Rauch, J.: Validity of nonlinear geometric optics with times growing logarithmically. Proc. Amer. Math. Soc. 129(4), 1087–1096 (2001)
28. Majda, A.: Compressible fluid flow and systems of conservation laws in several space variables. Applied Mathematical Sciences, Vol. 53, New York: Springer-Verlag, 1984
29. Makino, T., Ukai, S., Kawashima, S.: Sur la solution à support compact de l'équation d'Euler compressible. Japan J. Appl. Math. 3(2), 249–257 (1986)
30. Pitaevskii, L., Stringari, S.: Bose-Einstein condensation. International Series of Monographs on Physics, Vol. 116, Oxford: The Clarendon Press Oxford University Press, 2003
31. Rauch, J., Keel, M.: Lectures on geometric optics. In: Hyperbolic equations and frequency interactions (Park City, UT, 1995), pp. 383–466. Providence, RI: Amer. Math. Soc., 1999
32. Reed, M., Simon, B.: Methods of modern mathematical physics. II. Fourier analysis, self-adjointness. New York: Academic Press [Harcourt Brace Jovanovich Publishers], 1975
33. Schwartz, J.T.: Nonlinear functional analysis. New York: Gordon and Breach Science Publishers, 1969, see Notes by H. Fattorini, R. Nirenberg, and H. Porta, with an additional chapter by Hermann Karcher, Notes on Mathematics and its Applications
34. Sulem, C., Sulem, P.-L.: The nonlinear Schrödinger equation, self-focusing and wave collapse. New York: Springer-Verlag, 1999
35. Taylor, M.: Partial differential equations. III. Applied Mathematical Sciences, Vol. 117, New York: Springer-Verlag, 1997
36. Tian, F.-R., Ye, J.: On the Whitham equations for the semiclassical limit of the defocusing nonlinear Schrödinger equation. Comm. Pure Appl. Math. 52(6), 655–692 (1999)


37. Tovbis, A., Venakides, S., Zhou, X.: On semiclassical (zero dispersion limit) solutions of the focusing nonlinear Schrödinger equation. Comm. Pure Appl. Math. 57(7), 877–985 (2004) 38. Yajima, K., Zhang, G.: Smoothing property for Schrödinger equations with potential superquadratic at infinity. Commun. Math. Phys. 221(3), 573–590 (2001) 39. Yajima, K., Zhang, G.: Local smoothing property and Strichartz inequality for Schrödinger equations with potentials superquadratic at infinity. J. Differ. Eqs. 202(1), 81–110 (2004) Communicated by P. Constantin

Commun. Math. Phys. 269, 223–238 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0081-6

Communications in

Mathematical Physics

All Inequalities for the Relative Entropy

Ben Ibinson, Noah Linden, Andreas Winter

Department of Mathematics, University of Bristol, Bristol BS8 1TW, United Kingdom. E-mail: [email protected]; [email protected]; [email protected]

Received: 9 February 2006 / Accepted: 6 March 2006
Published online: 10 August 2006 – © Springer-Verlag 2006

Abstract: The relative entropy of two n-party quantum states is an important quantity exhibiting, for example, the extent to which the two states are different. The relative entropy of the states formed by reducing two n-party states to a smaller number m of parties is always less than or equal to the relative entropy of the two original n-party states. This is the monotonicity of relative entropy. Using techniques from convex geometry, we prove that monotonicity under restrictions is the only general inequality satisfied by quantum relative entropies. In doing so we make a connection to secret sharing schemes with general access structures: indeed, it turns out that the extremal rays of the cone defined by monotonicity are populated by classical secret sharing schemes. A surprising outcome is that the structure of allowed relative entropy values of subsets of multiparty states is much simpler than the structure of allowed entropy values. And the structure of allowed relative entropy values (unlike that of entropies) is the same for classical probability distributions and quantum states.

1. Entropy and Relative Entropy

Entropy inequalities play a central role in information theory [5], classical or quantum. This is so because practically all capacity theorems are formulated in terms of entropy, and the same, albeit to a lesser degree, holds for many monotones of, for example, entanglement: e.g., the entanglement of formation [2] or squashed entanglement [4]. It may thus come as a surprise that until recently [13] essentially the only inequality known for the von Neumann entropies in a composite system is strong subadditivity

$$S(\rho_{AB}) + S(\rho_{BC}) \geq S(\rho_{ABC}) + S(\rho_B), \tag{1}$$

proved by Lieb and Ruskai [9]. We use the notation $\rho_{ABC}$ for the density operator representing the state of the system ABC, with the notation $\rho_{BC} = \operatorname{Tr}_A \rho_{ABC}$ etc. for the reduced states.


The relative entropy of two states ρ, σ (density operators of trace 1) is defined as

$$D(\rho\|\sigma) = \begin{cases} \operatorname{Tr}\,\rho(\log\rho - \log\sigma) & \text{if } \operatorname{supp}\rho \subset \operatorname{supp}\sigma, \\ +\infty & \text{otherwise,} \end{cases}$$

where supp ρ is the supporting subspace of the density operator ρ. Note that in this paper, log always denotes the logarithm to base 2.

Like von Neumann entropy, the relative entropy is used extensively in quantum information and entanglement theory to obtain capacity-like quantities and monotones. The most prominent example may be the relative entropy of entanglement [21, 22]. Many other applications of the relative entropy are illustrated in the review [20].

In this paper we study the universal relations between the relative entropies in a composite system and for general pairs of states. For the most part we shall restrict ourselves to finite dimensional spaces. What are the known inequalities? First of all, the relative entropy is always nonnegative, and indeed 0 iff ρ = σ (see the recent survey by Petz [15]). The most important, and indeed only known inequality, for the relative entropy is the monotonicity,

$$D(\rho_{AB}\|\sigma_{AB}) \geq D(\rho_A\|\sigma_A) \tag{2}$$

for a bipartite system AB. This relation can be derived from strong subadditivity, Eq. (1), as was shown in [9] and by Lindblad [10, 11] in the finite dimensional (and more generally: separable Hilbert space) case; Uhlmann [19] later showed it for general von Neumann algebras and the wider class of 2-positive maps replacing the partial trace. Conversely, strong subadditivity can be easily derived from Eq. (2); see e.g. Petz [15] again. The monotonicity of relative entropy was shown to be fundamentally important to quantum communication theory by Yuen and Ozawa [24], who showed that it can be used to prove the famous Holevo information bound [7].

Before returning to relative entropy we make a few further observations about entropy. For an n-party system, there are $2^n - 1$ non-trivial reduced states, with their entropies, so we can associate with each state a vector of $2^n - 1$ real coordinates. Pippenger [16], following the programme of Yeung and Zhang in the classical case [23], showed that, after going to the topological closure, the set of all entropy vectors is a convex cone. Hence it must be describable by linear (entropy) inequalities, like strong subadditivity, and one can ask if the entropy cone coincides with the cone defined by the "known" inequalities (strong subadditivity in the quantum case, additionally positivity of conditional entropy classically). This is indeed the case for n ≤ 3: the classical result is due to Yeung and Zhang [23], the quantum case to Pippenger [16]. Yeung and Zhang [25] have however found a new, "non-Shannon type" inequality for n = 4 classical parties, and Linden and Winter [13] found a new so-called constrained inequality for n = 4 quantum parties, providing evidence that to describe the entropy cones of four and more parties one needs new inequalities, too. In [13] Linden and Winter describe how the putative vector of entropies,

$$[S_A, S_B, \ldots, S_{ABCD}] = \lambda[3, 3, 2, 2, 4, 3, 3, 3, 3, 4, 4, 4, 3, 3, 2], \tag{3}$$

for λ ≥ 0, satisfies strong subadditivity for all subsets of parties ABCD, but is nonetheless not achievable by any quantum state ρ [i.e. there is no quantum state ρ such that $S_A = S(\rho_A)$, $S_B = S(\rho_B)$ etc. achieving the values in Eq. (3)].

Here we ask (and answer in the affirmative) the question of whether any vector $[D_A, D_B, \ldots, D_{ABCD}, \ldots]$ in which the numbers $D_A, \ldots, D_{ABCD}, \ldots$ satisfy the constraints of monotonicity for all subgroups may be realised as the relative entropy of pairs of states [i.e. for any such vector we show that there are states ρ and σ such that $D_A = D(\rho_A\|\sigma_A)$ etc.].

In this paper we prove the result that for relative entropy, monotonicity is necessary and sufficient to describe the complete set of realisable relative entropy vectors. This is a surprising discovery as relative entropy is a seemingly more complex functional than entropy. However strong subadditivity is sufficient to define all possible relative entropy vectors (as monotonicity is derived from it) whereas it cannot encapsulate normal von Neumann entropy.

Our approach is as follows: we show first, by adapting the Yeung-Pippenger techniques, that the topological closure of the set of all relative entropy vectors is a convex cone (Sect. 2). Then we study the extremal rays of the Lindblad-Uhlmann cone defined by monotonicity, in Sect. 3: they correspond one-to-one to so-called up-sets in the power set $2^{[n]}$ of $[n] = \{1, \ldots, n\}$. It remains to prove that every one of the rays is indeed populated by relative entropy vectors, which we do in Sect. 4. It turns out that the construction to show this depends heavily on secret sharing schemes, which we explain in Sect. 4, to make the paper self-contained, followed by an instructive example in Sect. 5, after which we conclude.
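The definition of the relative entropy and the monotonicity relation, Eq. (2), can be illustrated numerically in the classical special case, where states are diagonal in a product basis and reduce to probability distributions. The following is only a minimal sketch under that assumption: the function names `relative_entropy`, `marginal` and `random_joint` are hypothetical helpers introduced here, not notation from the paper.

```python
import math
import random
from itertools import product


def relative_entropy(p, q):
    """Return D(p||q) in bits for distributions given as dicts outcome -> probability."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue  # 0 log 0 = 0 by convention
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf  # supp p is not contained in supp q
        total += p_x * math.log2(p_x / q_x)
    return total


def marginal(joint, keep):
    """Reduce a joint distribution to the parties whose indices are listed in keep."""
    reduced = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        reduced[key] = reduced.get(key, 0.0) + prob
    return reduced


def random_joint(rng, sizes):
    """Draw a strictly positive random joint distribution on a product alphabet."""
    outcomes = list(product(*(range(size) for size in sizes)))
    weights = [rng.random() + 1e-3 for _ in outcomes]
    norm = sum(weights)
    return {outcome: w / norm for outcome, w in zip(outcomes, weights)}


rng = random.Random(0)  # fixed seed so that the run is repeatable
rho = random_joint(rng, (2, 3))    # classical state of parties A (index 0) and B (index 1)
sigma = random_joint(rng, (2, 3))
d_ab = relative_entropy(rho, sigma)
d_a = relative_entropy(marginal(rho, [0]), marginal(sigma, [0]))
print(f"D_AB = {d_ab:.4f}, D_A = {d_a:.4f}, monotone: {d_ab >= d_a}")
```

A single random pair is of course no proof; the sketch merely shows what the two sides of Eq. (2) look like for concrete distributions.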

2. The Cone of Relative Entropy Vectors

Define the set $\Lambda^*_n \subset \mathbb{R}_{\geq 0}^{2^n-1}$ of vectors $v = (v_S)_{\emptyset \neq S \subset [n]}$, with $[n] = \{1, 2, \ldots, n\}$: $v \in \Lambda^*_n$ iff there exist quantum states of n parties ρ, σ such that $D(\rho_S\|\sigma_S) = v_S$ for every non-empty subset S. Observe that there are $2^n - 1$ nonempty subsets S, which label the coordinates of $\mathbb{R}^{2^n-1}$ in some fixed way.

Lemma 1. The topological closure $\overline{\Lambda^*_n}$ of $\Lambda^*_n$ is a convex cone.

To be precise, it is enough to show that [16]:
1. (Additivity) for $v, w \in \Lambda^*_n$, $v + w \in \Lambda^*_n$;
2. (Approximate diluability) for all δ > 0 there exists ε > 0 such that for all $v \in \Lambda^*_n$ and 0 ≤ λ ≤ ε there is $w \in \Lambda^*_n$ with $\|\lambda v - w\| \leq \delta$.
(We use the sup norm in the proof below, but since all norms in finite dimensions are equivalent, the exact choice of the norm is irrelevant.)

Proof. Consider the following states ρ, ρ′, σ and σ′, where the prime indicates that the corresponding state lives on a system different from the unprimed states. Let us define v and v′ as the relative entropy vectors generated from taking entropy values of $D(\rho_S\|\sigma_S)$ and $D(\rho'_S\|\sigma'_S)$ respectively. Consider states $\tilde\rho = \rho \otimes \rho'$ and $\tilde\sigma = \sigma \otimes \sigma'$. To prove the first part of the lemma, we show $\tilde v = v + v'$ for the relative entropy vector $\tilde v$ of $\tilde\rho$, $\tilde\sigma$; in detail, for every S ⊂ [n],

$$D(\tilde\rho_S\|\tilde\sigma_S) = D(\rho_S \otimes \rho'_S \| \sigma_S \otimes \sigma'_S) = D(\rho_S\|\sigma_S) + D(\rho'_S\|\sigma'_S).$$

Then,

$$\begin{aligned} D(\tilde\rho_S\|\tilde\sigma_S) &= -S(\tilde\rho_S) - \operatorname{Tr}(\tilde\rho_S \log \tilde\sigma_S) \\ &= -S(\rho_S) - S(\rho'_S) - \operatorname{Tr}\big[(\rho_S \otimes \rho'_S) \log(\sigma_S \otimes \sigma'_S)\big]. \end{aligned}$$


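We use the fact that $\log(\sigma \otimes \sigma') = (\log\sigma) \otimes \mathbb{1} + \mathbb{1} \otimes (\log\sigma')$. Therefore,

$$\begin{aligned} D(\tilde\rho_S\|\tilde\sigma_S) &= -S(\rho_S) - S(\rho'_S) - \operatorname{Tr}(\rho_S \log\sigma_S) - \operatorname{Tr}(\rho'_S \log\sigma'_S) \\ &= D(\rho_S\|\sigma_S) + D(\rho'_S\|\sigma'_S). \end{aligned}$$

Therefore we can always construct a state that will give a vector in $\Lambda^*_n$ and is the sum of v and v′.

To prove the second part, choose ε such that ε ≤ 1/2 and $H_2(\epsilon) \leq \delta$, where $H_2(\epsilon)$ is the binary entropy of ε, $H_2(\epsilon) = -\epsilon\log\epsilon - (1-\epsilon)\log(1-\epsilon)$. Note that we can always choose a value of ε which satisfies these conditions for any δ. Let v be the relative entropy vector created by states ρ, σ. Consider the following states, $\tilde\rho = \lambda\rho + (1-\lambda)\sigma$ and $\tilde\sigma = \sigma$, with the entropy vector w created by states $\tilde\rho$, $\tilde\sigma$. Consider the following quantity that leads to the entropy vector w:

$$\begin{aligned} D(\tilde\rho_S\|\tilde\sigma_S) &= D\big(\lambda\rho_S + (1-\lambda)\sigma_S \,\big\|\, \sigma_S\big) \\ &= -S\big(\lambda\rho_S + (1-\lambda)\sigma_S\big) - \operatorname{Tr}\big[\big(\lambda\rho_S + (1-\lambda)\sigma_S\big)\log\sigma_S\big] \\ &= -S\big(\lambda\rho_S + (1-\lambda)\sigma_S\big) - \lambda\operatorname{Tr}(\rho_S\log\sigma_S) - (1-\lambda)\operatorname{Tr}(\sigma_S\log\sigma_S). \end{aligned} \tag{4}$$

We now make use of the following inequality, see for example [14]:

$$\sum_i p_i S(\rho_i) \leq S\Big(\sum_i p_i\rho_i\Big) \leq H(p_i) + \sum_i p_i S(\rho_i),$$

which here specialises to

$$\lambda S(\rho_S) + (1-\lambda)S(\sigma_S) \leq S\big(\lambda\rho_S + (1-\lambda)\sigma_S\big) \leq H_2(\lambda) + \lambda S(\rho_S) + (1-\lambda)S(\sigma_S).$$

Hence we can define a quantity α (depending on S) such that $0 \leq \alpha \leq H_2(\lambda) \leq H_2(\epsilon) \leq \delta$ and

$$S\big(\lambda\rho_S + (1-\lambda)\sigma_S\big) = \lambda S(\rho_S) + (1-\lambda)S(\sigma_S) + \alpha.$$

Therefore, Eq. (4) reads

$$\begin{aligned} D(\tilde\rho_S\|\tilde\sigma_S) &= -S\big(\lambda\rho_S + (1-\lambda)\sigma_S\big) - \lambda\operatorname{Tr}(\rho_S\log\sigma_S) - (1-\lambda)\operatorname{Tr}(\sigma_S\log\sigma_S) \\ &= -\lambda S(\rho_S) - (1-\lambda)S(\sigma_S) - \alpha - \lambda\operatorname{Tr}(\rho_S\log\sigma_S) - (1-\lambda)\operatorname{Tr}(\sigma_S\log\sigma_S) \\ &= \lambda D(\rho_S\|\sigma_S) + (1-\lambda)D(\sigma_S\|\sigma_S) - \alpha \\ &= \lambda D(\rho_S\|\sigma_S) - \alpha. \end{aligned}$$

Thus for our given vector v [the vector made from the relative entropies $D(\rho_S\|\sigma_S)$], we have found a w [the vector of the relative entropies $D(\tilde\rho_S\|\tilde\sigma_S)$] each of whose coordinates differs from that of λv by such an α, so that $\|\lambda v - w\| \leq \delta$ for all λ ≤ ε (where $H_2(\epsilon) \leq \delta$). This completes the proof.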
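Both steps of the proof can be checked on a small classical example. The sketch below assumes diagonal states, i.e. probability vectors, and uses illustrative numbers chosen here (they do not come from the paper); it verifies additivity under tensor products and that the dilution defect α lies between 0 and $H_2(\lambda)$.

```python
import math


def relative_entropy(p, q):
    """D(p||q) in bits for two probability vectors of equal length."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue
        if q_x == 0.0:
            return math.inf
        total += p_x * math.log2(p_x / q_x)
    return total


def binary_entropy(x):
    """H_2(x) in bits, with H_2(0) = H_2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


# Illustrative classical states (assumed values, for demonstration only).
p, q = [0.7, 0.2, 0.1], [0.25, 0.25, 0.5]
p_prime, q_prime = [0.5, 0.5], [0.9, 0.1]

# Additivity: the product states give the sum of the relative entropies.
p_tilde = [a * b for a in p for b in p_prime]
q_tilde = [a * b for a in q for b in q_prime]
additivity_gap = abs(
    relative_entropy(p_tilde, q_tilde)
    - relative_entropy(p, q)
    - relative_entropy(p_prime, q_prime)
)

# Approximate diluability: mix p into q with weight lam and compare with lam * D(p||q).
lam = 0.05
mixed = [lam * a + (1 - lam) * b for a, b in zip(p, q)]
alpha = lam * relative_entropy(p, q) - relative_entropy(mixed, q)

print(f"additivity gap = {additivity_gap:.2e}")
print(f"alpha = {alpha:.4f}, H_2(lam) = {binary_entropy(lam):.4f}")
```

Taking `lam` smaller shrinks $H_2(\lambda)$ and hence the allowed deviation from the ray through v, which is the content of approximate diluability.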

Remark 2. From the proof, one sees that also the vectors of relative entropies of classical states ρ and σ , i.e. states diagonal in a fixed product basis, form a cone up to topological closure. And indeed in Sect. 4 we show that they are identical.


3. The Lindblad-Uhlmann Cone

Define the convex cone $\Lambda_n \subset \mathbb{R}_{\geq 0}^{2^n-1}$: all vectors v satisfying the following inequalities, for all $[n] \supset S \supset S' \neq \emptyset$:

$$v_S \geq v_{S'}, \tag{5}$$
$$v_S \geq 0. \tag{6}$$

This defines the cone of all vectors that obey the only known inequality between relative entropies of subsystems, the Lindblad-Uhlmann monotonicity relation (which implies non-negativity).

Proposition 3. The extremal rays of $\Lambda_n$ are spanned by vectors u of the form

$$u_S = \begin{cases} 1 & \text{if } S \in \mathcal{U}, \\ 0 & \text{if } S \notin \mathcal{U}, \end{cases}$$

for a set family $\emptyset \neq \mathcal{U} \subset 2^{[n]}$ and $\emptyset \notin \mathcal{U}$ with the property that for all $S \in \mathcal{U}$ and $S' \supset S$, $S' \in \mathcal{U}$. (Such a set family is called an up-set.) Conversely, every up-set $\mathcal{U}$, by the above assignment, defines a vector $u \in \Lambda_n$ spanning an extremal ray.

Proof. Every extremal ray R of $\Lambda_n$ is spanned by a vector $v \in \Lambda_n$, such that $R = \mathbb{R}_{\geq 0}\, v$. It has the property that if $\lambda a + \mu b \in R$ for λ, μ > 0 and $a, b \in \Lambda_n$, then a, b ∈ R. With this every point in the cone is a positive linear combination of elements from extremal rays. In geometric terms, R is an edge of the cone $\Lambda_n$ [6]. It is a standard result from convex geometry (see [6]) that an extremal ray is specified by requiring that sufficiently many of the defining inequalities are satisfied with equality, in the sense that the solution space of these equations is one-dimensional. (Of course, in addition the remaining inequalities must hold.) In the present case, there are only two, very simple, types of inequalities. For a spanning vector v of an extremal ray R, the equations (i.e., inequalities satisfied with equality) take one of the following two forms: for A ⊂ B, C ⊂ [n],

$$v_A = v_B, \tag{7}$$
$$v_C = 0. \tag{8}$$

How can it be that v is specified by a set of such equations up to a scalar multiple? Since the equations only demand that an entry of v is 0 or that two entries are equal, it must be such that there exists a subset $\mathcal{U} \subset 2^{[n]}$ such that for all $A, B \in \mathcal{U}$, the corresponding entries of v are equal, $v_A = v_B = \nu$, while for $C \notin \mathcal{U}$, it holds that $v_C = 0$. Now, to satisfy all the monotonicity inequalities, $\mathcal{U}$ must be an up-set. (We note that v ≠ 0 to span a ray, hence ν ≠ 0.) Thus, v = νu for the vector u constructed from the up-set $\mathcal{U}$ in the statement of the proposition. This shows that every extremal ray is determined by an up-set.

For the other direction, we first observe that u constructed from an arbitrary up-set $\mathcal{U}$ as stated satisfies all the inequalities. Furthermore, it is clear that many inequalities will be saturated. To show that $R = \mathbb{R}_{\geq 0}\, u$ is extremal, we only need to find a set of $2^n - 2$ linearly independent equations of the form (7) and (8) that are satisfied. This is given by

$$v_A = v_{[n]} \ \text{ for } [n] \neq A \in \mathcal{U}, \qquad v_B = 0 \ \text{ for } B \notin \mathcal{U}.$$


Indeed, these equations leave only the freedom to choose $v_{[n]}$, and then all entries of v are determined. This concludes the proof that every up-set determines an extremal ray.

Example 4. The following table shows all the extremal rays and hence all possible up-sets for three parties up to permutations of parties.

         v_A   v_B   v_C   v_AB   v_AC   v_BC   v_ABC
Ray 1     0     0     0     0      0      0       1
Ray 2     0     0     0     1      0      0       1
Ray 3     0     0     0     1      1      0       1
Ray 4     0     0     0     1      1      1       1
Ray 5     1     0     0     1      1      0       1
Ray 6     1     0     0     1      1      1       1
Ray 7     1     1     0     1      1      1       1
Ray 8     1     1     1     1      1      1       1
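The count of rays in the table can be reproduced by brute force. The sketch below is an illustration added here, with hypothetical helper names (`is_up_set`, `canonical`): it enumerates all non-empty families of non-empty subsets of three parties, keeps those that are up-sets, and then identifies families that differ only by a relabelling of the parties.

```python
from itertools import combinations, permutations

n = 3
parties = range(n)
# All non-empty subsets of the parties, as frozensets.
subsets = [frozenset(c) for r in range(1, n + 1) for c in combinations(parties, r)]


def is_up_set(family):
    """True if every superset of a member of the family is also a member."""
    return all(t in family for s in family for t in subsets if s < t)


up_sets = []
for mask in range(1, 2 ** len(subsets)):  # mask 0 would be the empty family
    family = frozenset(s for i, s in enumerate(subsets) if (mask >> i) & 1)
    if is_up_set(family):
        up_sets.append(family)


def canonical(family):
    """Smallest relabelled copy of the family, used as a permutation-invariant key."""
    images = []
    for perm in permutations(parties):
        image = sorted(tuple(sorted(perm[i] for i in s)) for s in family)
        images.append(image)
    return tuple(min(images))


classes = {canonical(family) for family in up_sets}
print(len(up_sets), "up-sets,", len(classes), "up to permutations of the parties")
```

For n = 3 this gives 18 up-sets falling into 8 permutation classes, in agreement with the eight rays listed above.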

These up-sets are also represented in graphical form in Fig. 1. Note that every extremal ray of the relative entropy cone is very well structured and can be defined precisely with up-sets. The standard entropy cone however shows no such structure and its extremal rays, although realised by highly structured states, show far less structure in the actual entropy values of the extremal rays (see [16, 12]).

4. $\overline{\Lambda^*_n} = \Lambda_n$

Clearly $\Lambda^*_n \subset \Lambda_n$ since all actual states obey the Lindblad–Uhlmann monotonicity inequalities (5) and (6). Since $\Lambda_n$ is closed, we thus get $\overline{\Lambda^*_n} \subset \Lambda_n$. In this section we will show the opposite inclusion, $\overline{\Lambda^*_n} \supset \Lambda_n$, thus showing equality between the relative entropy cone and the Lindblad–Uhlmann cone.

To show this, it will clearly be enough to show that on every extremal ray of $\Lambda_n$ there exists a nonzero vector contained in $\Lambda^*_n$. In other words, if we can construct a pair of states that has a relative entropy vector on an extremal ray, for all possible extremal rays of $\Lambda_n$, then due to approximate diluability we can find entropy vectors along all points of all extremal rays. Since every point inside a cone can be made with a positive linear combination of points from its extremal rays, we obtain that every point inside the cone can be realised and $\Lambda_n = \overline{\Lambda^*_n}$. Achieving these states can be identified with classical secret sharing schemes (see for example [18]) as we will explain.

The formalism for a secret sharing scheme can be defined as follows. Imagine a defined secret bit that we want to share between a number of participants. We want only certain so-called "authorised" groups of participants to be able to recover the secret exactly, while unauthorised groups of parties get no information about the secret. It is clear that with every authorised group S, any group S′ ⊃ S will also be authorised. So, the authorised groups will form an up-set called an access structure.

Definition 5. An n-party secret sharing scheme for a bit b with access structure $\emptyset \neq \mathcal{U} \subset 2^{[n]}$, $\emptyset \notin \mathcal{U}$, consists of the following:


Fig. 1. All possible 'up-sets' for three parties, up to permutations. Arrows indicate which sets have the corresponding element as a subset, i.e. an arrow means "is a subset of". Every element that is inside a box or circle is defined as having relative entropy 1. A box indicates that we have chosen the set to have relative entropy 1, whereas a circle indicates that the set is forced to have relative entropy 1 because one of its subsets also has relative entropy 1. This 'forcing' of relative entropy via one of the subsets is represented as a solid black arrow

(i) Random variables $X_1(b), X_2(b), X_3(b), \ldots, X_n(b)$, each one associated with a participant labelled 1, . . . , n in the secret sharing scheme. $X_i(b)$ takes values in a set $\mathcal{X}_i$.
(ii) For $S \in \mathcal{U}$, denote $X_S(b) = (X_i(b) : i \in S)$, the collection of shares accessible to the group S.
(iii) For each $S \in \mathcal{U}$, there is a function $f_S : \mathcal{X}_S := \prod_{i \in S} \mathcal{X}_i \to \{0, 1\}$ s.t. $f_S(X_S(b)) = b$. For $S \notin \mathcal{U}$ however, $X_S(0)$ and $X_S(1)$ have the same distributions.

With this scheme the notion of an up-set is naturally included. Since an authorised group of parties is allowed to recover the secret, adding additional parties must also result in an authorised group since the decoding function can be chosen only to act on the previous authorised group. This is the defining feature of an up-set.

To relate this to a quantum information setting, we can construct the following density matrix based on a secret sharing scheme:

$$\rho(b) = \sum_{x_1 \ldots x_n} \Pr\{X_1(b) = x_1, \ldots, X_n(b) = x_n\}\, |x_1\rangle\langle x_1|^1 \otimes |x_2\rangle\langle x_2|^2 \otimes \cdots \otimes |x_n\rangle\langle x_n|^n. \tag{9}$$

The superscripts on the terms of the tensor product denote the label of the share. We denote a partial trace of the matrix as

$$\rho(b)_S = \sum_{x_j : j \in S} \Pr\{X_j(b) = x_j,\ \forall j \in S\} \bigotimes_{j \in S} |x_j\rangle\langle x_j|^j. \tag{10}$$

$\rho(b)_S$ has the following properties:


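• If $S \in \mathcal{U}$ then the supporting subspace of $\rho(0)_S$ is orthogonal to that of $\rho(1)_S$, which allows the group S to determine the secret bit exactly: $\rho(0)_S \perp \rho(1)_S$.
• If $S \notin \mathcal{U}$ then $\rho(0)_S = \rho(1)_S$ and no information about the secret can be obtained.

With this density matrix we can construct the following matrices for use in the relative entropy $D(\rho\|\sigma)$:

$$\rho_S = \rho(0)_S, \tag{11}$$
$$\sigma_S = \frac{1}{2}\big(\rho(0)_S + \rho(1)_S\big). \tag{12}$$

Note that if $S \notin \mathcal{U}$ then $\rho_S = \sigma_S$ and the relative entropy is zero. For $S \in \mathcal{U}$, we can calculate the relative entropy as follows:

$$D(\rho_S\|\sigma_S) = \operatorname{Tr}\left[\rho(0)_S \log\rho(0)_S - \rho(0)_S \log\left(\frac{\rho(0)_S}{2} + \frac{\rho(1)_S}{2}\right)\right]. \tag{13}$$

Using $\rho(0)_S \perp \rho(1)_S$,

$$D(\rho_S\|\sigma_S) = \operatorname{Tr}\left[\rho(0)_S \log\rho(0)_S - \left(\rho(0)_S \log\frac{\rho(0)_S}{2} + \rho(0)_S \log\frac{\rho(1)_S}{2}\right)\right]. \tag{14}$$

Since there are no elements in $\rho(0)_S$ that are present in $\rho(1)_S$, the third term is zero. Hence, expanding the second term,

$$D(\rho_S\|\sigma_S) = \operatorname{Tr}\left[\rho(0)_S \log\rho(0)_S - \rho(0)_S \log\rho(0)_S + \rho(0)_S (\log 2)\mathbb{1}\right] \tag{15}$$
$$= (\log 2)\operatorname{Tr}[\rho(0)_S] = 1. \tag{16}$$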
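The computation in Eqs. (11)–(16) can be traced on the smallest non-trivial case. The sketch below is an illustration chosen here rather than an example from the paper: a two-party scheme in which party 1 holds a uniformly random bit r and party 2 holds r XOR b, so that only the pair {1, 2} is authorised. All states are classical, so they are represented as probability distributions; the helper names are hypothetical.

```python
import math
from itertools import combinations


def share_distribution(b):
    """Joint distribution of the two shares (r, r XOR b) for secret bit b."""
    return {(r, r ^ b): 0.5 for r in (0, 1)}


def marginal(joint, keep):
    """Distribution of the shares held by the parties listed in keep."""
    reduced = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        reduced[key] = reduced.get(key, 0.0) + prob
    return reduced


def relative_entropy(p, q):
    """D(p||q) in bits for distributions given as dicts."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf
        total += p_x * math.log2(p_x / q_x)
    return total


def relative_entropy_vector(dist0, dist1, n):
    """D(rho_S || sigma_S) with rho = rho(0) and sigma = (rho(0) + rho(1)) / 2, for all S."""
    vector = {}
    for size in range(1, n + 1):
        for group in combinations(range(n), size):
            rho0 = marginal(dist0, group)
            rho1 = marginal(dist1, group)
            sigma = {x: 0.5 * rho0.get(x, 0.0) + 0.5 * rho1.get(x, 0.0)
                     for x in set(rho0) | set(rho1)}
            vector[group] = relative_entropy(rho0, sigma)
    return vector


vector = relative_entropy_vector(share_distribution(0), share_distribution(1), 2)
for group, value in vector.items():
    print(group, round(value, 6))
```

The single parties see the same distribution for both values of the secret and get relative entropy 0, while the authorised pair gets exactly 1 bit, which is the extremal ray of the up-set {{1, 2}}.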

Note that the relative entropy is constant and independent of the number of elements of S. Hence we have states from which we can produce relative entropies in the form of up-sets described in Proposition 3 by simply realising a classical secret sharing scheme with the required access structure. There exists a secret sharing scheme for every up-set structure, in fact for every access structure [8, 17]. Therefore for each extremal ray of $\Lambda_n$ there is a secret sharing scheme whose density operators according to Eqs. (9), (11) and (12) will produce the required relative entropy vector and hence prove that each extremal ray is realisable. Hence we have proved that $\overline{\Lambda^*_n} = \Lambda_n$ and thus that monotonicity under restrictions is the only inequality satisfied by relative entropies.

5. Simple Secret Sharing: Threshold Schemes

In this section we will describe a simple secret sharing scheme for a specialised access structure known as a threshold scheme. We will then build upon this scheme showing how we can construct schemes for any access structure. The threshold scheme was discovered by Shamir [17] and allows parties to recover a secret if and only if enough of the parties collaborate, such that their number reaches a predetermined threshold number of parties. Each party is given a part of the secret which we call a 'share' of the secret. There is a total of n shares, one share for each party. A threshold value k is also determined such that if a number of parties get together and pool their shares, and the number of shares they have is greater than or equal to k, then they can recover the secret precisely. However, if the number of shares is less than k, then no information can be extracted about the secret. Accordingly, these schemes are called (n, k)-threshold schemes, depending on the number of parties and the desired threshold value.

The construction of the threshold scheme is outlined as follows. The premise for the scheme is based on evaluations of a polynomial. Imagine the following polynomial:

$$y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_{m-1} x^{m-1}. \tag{17}$$

We label $a_0$ as the secret value and the shares as evaluations of this polynomial at different points. Geometry tells us that we need m evaluations of this polynomial to determine the coefficient $a_0$ and that if we have any fewer than m evaluations any value of $a_0$ would fit the given points. This means that if we have m or more evaluations we know the secret exactly and if we have fewer than m evaluations we know nothing about the secret. The evaluations of the polynomial become the 'shares' of the scheme and we perform the calculations over a finite field. Here is a formulation of the (n, k)-threshold scheme extracted from the original paper by Shamir [17].

• Choose a random polynomial of degree k − 1, $y(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_{k-1} x^{k-1}$, and let s be the secret where $s = a_0$, i.e. $a_1, \ldots, a_{k-1}$ are chosen independently and uniformly from the field GF(p) of p elements (integers modulo p).
• The shares are defined as $D_1 = y(1), D_2 = y(2), \ldots, D_i = y(i), \ldots, D_n = y(n)$.
• Any given subset of k of these $D_i$ values together with their indices can find the coefficients of y(x) by interpolation and hence find the value of s = y(0).
• Knowing k − 1 or fewer shares will not reveal the value of s as there exist polynomials that will fit the given points and allow $a_0 = 0$ or $a_0 = 1$, with every polynomial equally likely.
• We use a set of integers modulo a prime number p which forms a finite field allowing interpolation.
• Given that the secret is an integer we require p to be larger than both max s and n.
• If we only have k − 1 shares, there is one and only one polynomial that can be constructed for each value of s in GF(p). Since each polynomial is equally likely by construction, no information about the secret can be gained.

This scheme can be easily translated to the quantum density matrix defined in Eq. (9).
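The steps listed above can be sketched in a few lines. This is a minimal illustration, not Shamir's original presentation: the function names `make_shares` and `recover_secret` are hypothetical, the modular inverse is computed via Fermat's little theorem (valid because p is prime), and Python's `random` module is used only for demonstration and would not be suitable where real secrecy is required.

```python
import random
from itertools import combinations


def make_shares(secret, k, n, p, rng):
    """Split `secret` into n shares over GF(p) so that any k of them recover it."""
    if not (0 <= secret < p and 1 <= k <= n < p):
        raise ValueError("need 0 <= secret < p and 1 <= k <= n < p with p prime")
    # y(x) = a_0 + a_1 x + ... + a_{k-1} x^{k-1} with a_0 = secret, the rest uniform.
    coeffs = [secret] + [rng.randrange(p) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p)
            for x in range(1, n + 1)]


def recover_secret(shares, p):
    """Lagrange interpolation at x = 0 from a list of (index, value) pairs."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                numerator = numerator * (-x_j) % p
                denominator = denominator * (x_i - x_j) % p
        # pow(d, p - 2, p) is the inverse of d modulo the prime p.
        secret = (secret + y_i * numerator * pow(denominator, p - 2, p)) % p
    return secret


rng = random.Random(1)  # fixed seed so that the run is repeatable
shares = make_shares(secret=3, k=2, n=4, p=5, rng=rng)
recovered = [recover_secret(list(pair), 5) for pair in combinations(shares, 2)]
print(shares, recovered)
```

With k = 2 every pair of the four shares interpolates the same line and returns the secret, while a single share is compatible with every value of $a_0$.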
Most of the probabilities in the sum are zero except for the ones that are valid for a polynomial fitting the secret value, with shares labeling that part of the sum.

This scheme has a very specific access structure, but we can expand to more general access structures. If the number of shares n is larger than the number of parties, we have more shares than parties, allowing us to distribute multiple shares to single parties. This allows us to have access structures not possible with the simple threshold structure. Imagine that we require the access structure given in Fig. 2. We require that B and C cannot recover the secret individually, but that they can if they pool their resources. We also need A to be able to recover the secret independently. Under the normal threshold scheme, we need the threshold to be set at k = 1 so that the single party A can recover the secret. However, this means B and C will each also be able to recover the secret independently, so we cannot create the required access structure. However, if we use a scheme with more shares than parties, we can achieve this access structure, see Example 6. Many up-sets can be realised using this modified threshold scheme. The following example provides the required threshold scheme and the resulting density matrices.


Fig. 2. Diagram of up-set used in Example 6

Example 6. Imagine an n = 3 system, with parties labelled A, B and C respectively. Consider also the following up-set representing an extremal ray. This is Ray 6 as used in the previous section.

v_A   v_B   v_C   v_AB   v_AC   v_BC   v_ABC
 1     0     0     1      1      1       1

With this up-set we can now construct a secret sharing scheme to represent it. One of the easiest constructions to understand is the threshold scheme. The scheme required is a (4, 2)-threshold scheme: 4 is the total number of shares, and 2 or more shares are required to reconstruct the secret. We distribute the shares as follows: two shares to A and only one share to B and one to C. This leads us to the required access structure as shown below.

                  A   B   C   AB   AC   BC   ABC
Shares            2   1   1   3    3    2    4
Above threshold   ✓   ×   ×   ✓    ✓    ✓    ✓

Since we have a total of four shares, we have to construct the scheme over a finite field of 5 elements. In this example calculations will be assumed to be done over this finite field. Since the threshold is two shares, we only need to consider polynomials of order one, since two or more values suffice to recover a polynomial of order 1. Therefore the possible polynomials are as follows:

y = s,
y = s + x,
y = s + 2x,
y = s + 3x,
y = s + 4x.

We can now embed this scheme into a quantum system. Each system has the same number of qudits as the corresponding party has shares, with d being large enough to incorporate the finite field values (i.e. in this case d=5). For example system A has two qudits whereas system B only has one. We now construct the density matrices ρ(0) and ρ(1) as follows:


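$$\rho(0) = \frac{1}{5}\Big(|0000\rangle\langle 0000| + |1234\rangle\langle 1234| + |2413\rangle\langle 2413| + |3142\rangle\langle 3142| + |4321\rangle\langle 4321|\Big), \tag{18}$$

$$\rho(1) = \frac{1}{5}\Big(|1111\rangle\langle 1111| + |2340\rangle\langle 2340| + |3024\rangle\langle 3024| + |4203\rangle\langle 4203| + |0432\rangle\langle 0432|\Big). \tag{19}$$

A has the first two qudits, B the third and C the fourth. From this we can construct the overall system described previously. We take ρ = ρ(0) and $\sigma = \frac{1}{2}\big(\rho(0) + \rho(1)\big)$ as in Eqs. (11) and (12). As examples we may compute

$$\rho_A = \frac{1}{5}\Big(|00\rangle\langle 00| + |12\rangle\langle 12| + |24\rangle\langle 24| + |31\rangle\langle 31| + |43\rangle\langle 43|\Big), \tag{20}$$

$$\sigma_A = \frac{1}{10}\Big(|00\rangle\langle 00| + |12\rangle\langle 12| + |24\rangle\langle 24| + |31\rangle\langle 31| + |43\rangle\langle 43| + |11\rangle\langle 11| + |23\rangle\langle 23| + |30\rangle\langle 30| + |42\rangle\langle 42| + |04\rangle\langle 04|\Big). \tag{21}$$

Therefore it can be verified that the relative entropy of party A is log 2. Repeating this for party B,

$$\rho_B = \frac{1}{5}\Big(|0\rangle\langle 0| + |3\rangle\langle 3| + |1\rangle\langle 1| + |4\rangle\langle 4| + |2\rangle\langle 2|\Big) = \sigma_B. \tag{22}$$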
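The reduced states of this example are diagonal, so the whole relative entropy vector can be recomputed from the classical share distributions. The sketch below assumes the allocation described in the text (A holds y(1) and y(2), B holds y(3), C holds y(4), arithmetic modulo 5); the constant and function names are hypothetical.

```python
import math
from itertools import combinations

P = 5  # field size used in the example
HOLDINGS = {"A": (1, 2), "B": (3,), "C": (4,)}  # evaluation points held by each party


def share_distribution(secret):
    """Uniform distribution over the share strings (y(1), ..., y(4)) of y = secret + a*x."""
    return {tuple((secret + a * x) % P for x in range(1, 5)): 1.0 / P for a in range(P)}


def marginal(joint, positions):
    """Distribution of the shares at the given positions of the share string."""
    reduced = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in positions)
        reduced[key] = reduced.get(key, 0.0) + prob
    return reduced


def relative_entropy(p, q):
    """D(p||q) in bits for distributions given as dicts."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf
        total += p_x * math.log2(p_x / q_x)
    return total


dist0, dist1 = share_distribution(0), share_distribution(1)
vector = {}
for size in range(1, 4):
    for group in combinations("ABC", size):
        positions = [x - 1 for party in group for x in HOLDINGS[party]]
        rho0, rho1 = marginal(dist0, positions), marginal(dist1, positions)
        sigma = {x: 0.5 * rho0.get(x, 0.0) + 0.5 * rho1.get(x, 0.0)
                 for x in set(rho0) | set(rho1)}
        vector["".join(group)] = relative_entropy(rho0, sigma)

for group, value in vector.items():
    print(group, round(value, 6))
```

The printed values reproduce Ray 6: the entries for B and C vanish and all other entries equal 1.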

Since $\rho_B = \sigma_B$, the relative entropy for B is 0. All other relative entropies can be verified in this way.

Thus giving an unequal number of shares to the parties can achieve more complicated access structures. However not all access structures can be produced in this way. For example imagine that we have 4 parties A, B, C and D with the number of shares in each party being a, b, c and d respectively. We require that A and B can recover the secret and that C and D can recover the secret but no other two party combination. If A and B can recover the secret then their combined total of shares must be at least k, i.e. a + b ≥ k. Therefore either a ≥ k/2 or b ≥ k/2. Similarly we can claim that c ≥ k/2 or d ≥ k/2. Say that in this case a ≥ k/2, c ≥ k/2. Hence there exists another two party combination, A and C, whose number of shares is at least k and who can therefore recover the secret, i.e. a + c ≥ k. Therefore the access structure is impossible to produce with this scheme. However there are general methods for dealing with arbitrary access structures [1, 8]. These allow us to represent any extremal ray. One strategy is to create a hierarchy of threshold schemes. Here we illustrate the strategy with an example.

Example 7. Imagine an n = 4 system whose parties we label A, B, C and D respectively. Consider also the following up-set representing an extremal ray.

v_A  v_B  v_C  v_D  v_AB  v_AC  v_AD  v_BC  v_BD  v_CD  v_ABC  v_ABD  v_ACD  v_BCD  v_ABCD
 0    0    0    0    1     0     0     0     0     1     1      1      1      1      1

Note that the access structure representing this ray requires that no single party has access to the secret and that, among the pairs, only A and B collaborating, and C and D collaborating, will be authorised. Also any greater number of parties will always contain an authorised group and is therefore also authorised. The required access structure can be represented by two schemes. This is illustrated in Fig. 3. Each scheme is a (2, 2)-threshold scheme: 2 shares in total with a threshold of 2 shares for recovering the secret. We distribute the shares as follows: in one scheme (scheme α) we give 1 share to A and 1 share to B. In the other scheme (scheme β) we give 1 share to C and 1 share to D. This ensures that the secret can be recovered by authorised parties via at least one of the schemes reaching threshold, as shown below.

Group   Shares (scheme α)   Shares (scheme β)   Above threshold
A       1                   0                   ×
B       1                   0                   ×
C       0                   1                   ×
D       0                   1                   ×
AB      2                   0                   ✓ α
AC      1                   1                   ×
AD      1                   1                   ×
BC      1                   1                   ×
BD      1                   1                   ×
CD      0                   2                   ✓ β
ABC     2                   1                   ✓ α
ABD     2                   1                   ✓ α
ACD     1                   2                   ✓ β
BCD     1                   2                   ✓ β
ABCD    2                   2                   ✓ α, β

Since we have a total of two shares for each scheme, we construct the scheme using a finite field of 3 elements. From now on calculations will be assumed to be done over this finite field. Since the threshold is two shares, we only need to consider polynomials of order one, since two or more coordinates suffice to recover a polynomial of order 1. Therefore the possible polynomials are as follows:

y = s,
y = s + x,
y = s + 2x.

In the construction of the quantum density matrix we need to consider all possible sets of shares the individual parties can have. The possible combinations are presented in the following tables:

Table 1. All possible shares for s = 0

Scheme α      Scheme β      A    B    C    D
y = s         y = s         0∗   0∗   ∗0   ∗0
y = s         y = s + x     0∗   0∗   ∗1   ∗2
y = s         y = s + 2x    0∗   0∗   ∗2   ∗1
y = s + x     y = s         1∗   2∗   ∗0   ∗0
y = s + x     y = s + x     1∗   2∗   ∗1   ∗2
y = s + x     y = s + 2x    1∗   2∗   ∗2   ∗1
y = s + 2x    y = s         2∗   1∗   ∗0   ∗0
y = s + 2x    y = s + x     2∗   1∗   ∗1   ∗2
y = s + 2x    y = s + 2x    2∗   1∗   ∗2   ∗1

Table 2. All possible shares for s = 1

Scheme α      Scheme β      A    B    C    D
y = s         y = s         1∗   1∗   ∗1   ∗1
y = s         y = s + x     1∗   1∗   ∗2   ∗0
y = s         y = s + 2x    1∗   1∗   ∗0   ∗2
y = s + x     y = s         2∗   0∗   ∗1   ∗1
y = s + x     y = s + x     2∗   0∗   ∗2   ∗0
y = s + x     y = s + 2x    2∗   0∗   ∗0   ∗2
y = s + 2x    y = s         0∗   2∗   ∗1   ∗1
y = s + 2x    y = s + x     0∗   2∗   ∗2   ∗0
y = s + 2x    y = s + 2x    0∗   2∗   ∗0   ∗2


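Each party has two registers, one for each scheme. If a party has no share then the register associated with that scheme is put into a fixed state (here $|{*}\rangle_A$, $|{*}\rangle_B$, $|{*}\rangle_C$, $|{*}\rangle_D$) which is uncorrelated to the variables for that scheme. Thus the density matrix ρ(0) is

$$\begin{aligned} \rho(0) = \frac{1}{9}\Big( & |0{*}\rangle\langle 0{*}|_A\, |0{*}\rangle\langle 0{*}|_B\, |{*}0\rangle\langle {*}0|_C\, |{*}0\rangle\langle {*}0|_D \\ +\, & |0{*}\rangle\langle 0{*}|_A\, |0{*}\rangle\langle 0{*}|_B\, |{*}1\rangle\langle {*}1|_C\, |{*}2\rangle\langle {*}2|_D \\ +\, & |0{*}\rangle\langle 0{*}|_A\, |0{*}\rangle\langle 0{*}|_B\, |{*}2\rangle\langle {*}2|_C\, |{*}1\rangle\langle {*}1|_D \\ +\, & |1{*}\rangle\langle 1{*}|_A\, |2{*}\rangle\langle 2{*}|_B\, |{*}0\rangle\langle {*}0|_C\, |{*}0\rangle\langle {*}0|_D \\ +\, & |1{*}\rangle\langle 1{*}|_A\, |2{*}\rangle\langle 2{*}|_B\, |{*}1\rangle\langle {*}1|_C\, |{*}2\rangle\langle {*}2|_D \\ +\, & |1{*}\rangle\langle 1{*}|_A\, |2{*}\rangle\langle 2{*}|_B\, |{*}2\rangle\langle {*}2|_C\, |{*}1\rangle\langle {*}1|_D \\ +\, & |2{*}\rangle\langle 2{*}|_A\, |1{*}\rangle\langle 1{*}|_B\, |{*}0\rangle\langle {*}0|_C\, |{*}0\rangle\langle {*}0|_D \\ +\, & |2{*}\rangle\langle 2{*}|_A\, |1{*}\rangle\langle 1{*}|_B\, |{*}1\rangle\langle {*}1|_C\, |{*}2\rangle\langle {*}2|_D \\ +\, & |2{*}\rangle\langle 2{*}|_A\, |1{*}\rangle\langle 1{*}|_B\, |{*}2\rangle\langle {*}2|_C\, |{*}1\rangle\langle {*}1|_D \Big). \end{aligned}$$

Similarly we can construct ρ(1) by repeating the process but setting the secret bit to be 1, i.e. $|1{*}\rangle_A |1{*}\rangle_B |{*}1\rangle_C |{*}1\rangle_D$ etc., leading to the density matrix ρ(1):

$$\begin{aligned} \rho(1) = \frac{1}{9}\Big( & |1{*}\rangle\langle 1{*}|_A\, |1{*}\rangle\langle 1{*}|_B\, |{*}1\rangle\langle {*}1|_C\, |{*}1\rangle\langle {*}1|_D \\ +\, & |1{*}\rangle\langle 1{*}|_A\, |1{*}\rangle\langle 1{*}|_B\, |{*}2\rangle\langle {*}2|_C\, |{*}0\rangle\langle {*}0|_D \\ +\, & |1{*}\rangle\langle 1{*}|_A\, |1{*}\rangle\langle 1{*}|_B\, |{*}0\rangle\langle {*}0|_C\, |{*}2\rangle\langle {*}2|_D \\ +\, & |2{*}\rangle\langle 2{*}|_A\, |0{*}\rangle\langle 0{*}|_B\, |{*}1\rangle\langle {*}1|_C\, |{*}1\rangle\langle {*}1|_D \\ +\, & |2{*}\rangle\langle 2{*}|_A\, |0{*}\rangle\langle 0{*}|_B\, |{*}2\rangle\langle {*}2|_C\, |{*}0\rangle\langle {*}0|_D \\ +\, & |2{*}\rangle\langle 2{*}|_A\, |0{*}\rangle\langle 0{*}|_B\, |{*}0\rangle\langle {*}0|_C\, |{*}2\rangle\langle {*}2|_D \\ +\, & |0{*}\rangle\langle 0{*}|_A\, |2{*}\rangle\langle 2{*}|_B\, |{*}1\rangle\langle {*}1|_C\, |{*}1\rangle\langle {*}1|_D \\ +\, & |0{*}\rangle\langle 0{*}|_A\, |2{*}\rangle\langle 2{*}|_B\, |{*}2\rangle\langle {*}2|_C\, |{*}0\rangle\langle {*}0|_D \\ +\, & |0{*}\rangle\langle 0{*}|_A\, |2{*}\rangle\langle 2{*}|_B\, |{*}0\rangle\langle {*}0|_C\, |{*}2\rangle\langle {*}2|_D \Big). \end{aligned}$$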

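We notice that in both states ρ(0) and ρ(1) in this example the state $|\ast\rangle_A |\ast\rangle_B |\ast\rangle_C |\ast\rangle_D$ factors out so that we could equally well take

\[
\begin{aligned}
\rho(0) = \frac{1}{9}\Big( & |0000\rangle\langle 0000| + |0012\rangle\langle 0012| + |0021\rangle\langle 0021| + |1200\rangle\langle 1200| + |1212\rangle\langle 1212| \\
 & + |1221\rangle\langle 1221| + |2100\rangle\langle 2100| + |2112\rangle\langle 2112| + |2121\rangle\langle 2121| \Big),
\end{aligned}
\tag{23}
\]

\[
\begin{aligned}
\rho(1) = \frac{1}{9}\Big( & |1111\rangle\langle 1111| + |1120\rangle\langle 1120| + |1102\rangle\langle 1102| + |2011\rangle\langle 2011| + |2020\rangle\langle 2020| \\
 & + |2002\rangle\langle 2002| + |0211\rangle\langle 0211| + |0220\rangle\langle 0220| + |0202\rangle\langle 0202| \Big).
\end{aligned}
\tag{24}
\]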

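[Note however that in more complicated examples parties need shares from more than one scheme.] From this we can construct the overall system described previously; for example, for parties AB,

\[
\rho_{AB} = \frac{1}{3}\Big( |00\rangle\langle 00| + |12\rangle\langle 12| + |21\rangle\langle 21| \Big),
\tag{25}
\]

\[
\sigma_{AB} = \frac{1}{6}\Big( |00\rangle\langle 00| + |12\rangle\langle 12| + |21\rangle\langle 21| + |11\rangle\langle 11| + |20\rangle\langle 20| + |02\rangle\langle 02| \Big).
\tag{26}
\]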


Fig. 3. Diagram of up-set used in Example 6

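Therefore it can be verified that the relative entropy of parties AB is log 2. Repeating this for parties BC, the reduced state obtained from (23) is uniform over all nine pairs of shares, and the same holds for the state obtained from (24), so that

\[
\rho_{BC} = \frac{1}{9}\Big( |00\rangle\langle 00| + |01\rangle\langle 01| + |02\rangle\langle 02| + |10\rangle\langle 10| + |11\rangle\langle 11| + |12\rangle\langle 12| + |20\rangle\langle 20| + |21\rangle\langle 21| + |22\rangle\langle 22| \Big) = \sigma_{BC}.
\tag{27}
\]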
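Because all the states in this example are diagonal in the share basis, the relative entropies reduce to classical Kullback-Leibler divergences and can be checked directly. The sketch below makes two assumptions that are read off from Eqs. (25) and (26) rather than quoted from the general construction: ρ is the reduced state of ρ(0), and σ is the reduced state of the equal mixture of ρ(0) and ρ(1). It also assumes share evaluation points x = 1 for A and C and x = 2 for B and D. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

P = 3  # order of the finite field GF(3)


def shares(secret):
    """All (A, B, C, D) share tuples for the given secret (assumed x = 1, 2)."""
    return [((secret + ma) % P, (secret + 2 * ma) % P,
             (secret + mb) % P, (secret + 2 * mb) % P)
            for ma, mb in product(range(P), repeat=2)]


def marginal(rows, parties):
    """Uniform distribution over rows, restricted to the named parties."""
    idx = ["ABCD".index(name) for name in parties]
    counts = Counter(tuple(row[i] for i in idx) for row in rows)
    return {key: count / len(rows) for key, count in counts.items()}


def mix(p, q):
    """Equal mixture of two distributions given as dictionaries."""
    return {k: 0.5 * p.get(k, 0.0) + 0.5 * q.get(k, 0.0) for k in set(p) | set(q)}


def relative_entropy(p, q):
    """Classical relative entropy D(p || q) in nats."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def d_rho_sigma(parties):
    """D(rho_S || sigma_S) for the subset S of parties, e.g. "AB"."""
    rho0 = marginal(shares(0), parties)
    rho1 = marginal(shares(1), parties)
    return relative_entropy(rho0, mix(rho0, rho1))
```

Under these assumptions `d_rho_sigma("AB")` evaluates to log 2 (in nats) and `d_rho_sigma("BC")` to 0, in line with the values stated in the text.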

Therefore the relative entropy for BC is 0. All other relative entropies can be verified in this way.

The idea of using a hierarchy of threshold schemes was discovered by Ito, Saito and Nishizeki [8] and requires an exponential number of threshold schemes to represent an access structure. The number of schemes required is irrelevant as long as a scheme exists and we can create the corresponding density matrix. A simpler scheme for general access structures was found by Benaloh and Leichter [1], which does not use threshold schemes but can be directly translated to the required density matrices in Eq. (9).

6. Conclusion

In this paper, we have determined the set of all relative entropy vectors for general states on (general) n-party systems: it coincides with the convex cone defined by non-negativity and monotonicity of the relative entropy. We have done this by first showing that the former set is indeed a convex cone, and then demonstrating that every extremal ray in the latter cone is realised by a specific pair of states. These extremal rays are characterised by up-sets in $2^{[n]}$, and the pairs of states correspond to (classical) secret sharing schemes. A particular consequence is that the cone of relative entropy vectors is the same for quantum states and for classical probability distributions. This is in marked contrast to the case of entropy vectors, where even for n = 2 classical and quantum entropy cones differ [16]. Beyond the characterisation in terms of convex geometry, our result also means that, apart from monotonicity, there can be no other universal relation between the relative entropy values of the reduced states in a composite system (except those that follow trivially from monotonicity). In this sense, quantum and classical relative entropy is completely characterised by the monotonicity relation.


We are now in a position to go back to our assumption of finite dimensional systems and the demand that all relative entropies are finite. Clearly, if some of the parties are described by infinite dimensional quantum systems, we still have monotonicity [19], even in the von Neumann algebra scenario, so the relative entropy vectors are all within the Lindblad-Uhlmann cone. In this case, and even in the finite dimensional case, some entries in a relative entropy vector may be positive infinity. However, even this does not present a problem, once we realise that the groups where the value is infinite form an up-set, so the vector can indeed be obtained as a limit of finite relative entropy vectors in the Lindblad-Uhlmann cone. Another mathematical peculiarity is the following: From the proof of achievability of all extremal rays of the Lindblad-Uhlmann cone, we discover that every point in the entropy cone is achievable rather than infinitely approximated, i.e. n = ∗n . This is due to the fact that every point on all extremal rays can be attained. To see this, simply choose ρ(0) and ρ(1) in Eq. (12) with different weights p and 1 − p (0 < p ≤ 1). Then the calculation following that equation shows that the relative entropy is either − log p or 0 depending on whether S is an authorised set or not. By additivity in Lemma 1 we obtain that every point on the extremal rays is realised, hence every point in the Lindblad-Uhlmann cone. We conclude the paper by commenting briefly on possible connections of our result to the entropy cone, and possibly to the relative entropy of entanglement. In the above arguments we have often used the formula $D(\rho\|\sigma) = -S(\rho) - \mathrm{Tr}\,\rho\log\sigma$, which means that if we make the restriction $\sigma = \frac{1}{d}\mathbb{1}$, the maximally mixed state in d dimensions, the relative entropies (now dependent only on ρ) evaluate to log d − S(ρ).
Going through the proof of Lemma 1 we see that for any number n of parties, the set of all these relative entropy vectors is also a convex cone, and one might think that its relations would capture all inequalities for the entropy. That this is too optimistic a hope is indicated by the fact that the relative entropy is expressed by the entropy and a term beyond what can be expressed by general entropies alone (essentially the log of the rank). And it is indeed not the case, since for example the nonnegativity of the relative entropy translates into S(ρ) ≤ log d. However, the fundamental fact that the entropy S(ρ) is nonnegative is not captured at all, since that would require an upper bound on the relative entropy depending on the dimension. Another way of looking at this (which was pointed out to us by M.-B. Ruskai) is by noting that the relative entropy is homogeneous in the trace of ρ and σ (assumed to be equal), whereas the entropy is not, unless one introduces a correction term, but then one loses nonnegativity. Acknowledgement. BI was supported by the U.K. Engineering and Physical Sciences Research Council. NL and AW acknowledge support by the EU project RESQ and the U.K. EPSRC’s IRC QIP.

References

1. Benaloh, J., Leichter, J.: Generalising Secret Sharing and Monotone Functions. Advances in Cryptology, CRYPTO 1998, LNCS 403, Berlin: Springer Verlag, 1990, pp. 27–35
2. Bennett, C.H., DiVincenzo, D.P., Smolin, J.A., Wootters, W.K.: Mixed state entanglement and quantum error correction. Phys. Rev. A 54, 3824–3851 (1996)
3. Cerf, N.J., Adami, C.: Negative entropy and information in quantum mechanics. Phys. Rev. Lett. 79, 5194–5197 (1997)
4. Christandl, M., Winter, A.: Squashed entanglement – An additive entanglement measure. J. Math. Phys. 45(3), 829–840 (2004)
5. Cover, T.M., Thomas, J.A.: Elements of Information Theory. New York: Wiley & Sons, 1991
6. Grünbaum, B.: Convex Polytopes. 2nd ed. prepared by V. Kaibel, V. Klee, G. Ziegler, Graduate Texts in Mathematics 221, Berlin: Springer Verlag, 2003
7. Holevo, A.S.: Bounds for the quantity of information transmitted by a quantum channel. Probl. Inform. Transm. 9(3), 177–183 (1973)
8. Ito, M., Saito, A., Nishizeki, T.: Secret Sharing Schemes releasing General Access Structure. In: Proc. IEEE Globecom ’87, Berlin-Heidelberg-New York: Springer, 1987, pp. 99–102
9. Lieb, E.H., Ruskai, M.B.: Proof of the strong subadditivity of quantum-mechanical entropy. J. Math. Phys. 14, 1938–1941 (1973)
10. Lindblad, G.: Expectations and entropy inequalities for finite quantum systems. Commun. Math. Phys. 39(2), 111–119 (1974)
11. Lindblad, G.: Completely positive maps and entropy inequalities. Commun. Math. Phys. 40(2), 147–151 (1975)
12. Linden, N., Maneva, E., Massar, S., Popescu, S., Roberts, D., Schumacher, B., Smolin, J.A., Thapliyal, A.V.: In preparation
13. Linden, N., Winter, A.: A new inequality for the von Neumann entropy. Commun. Math. Phys. 259, 129–138 (2005)
14. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge: Cambridge University Press, 2000
15. Petz, D.: Monotonicity of quantum relative entropy revisited. Rev. Math. Phys. 15(1), 79–91 (2003)
16. Pippenger, N.: The inequalities of quantum information theory. IEEE Trans. Inf. Theory 49(4), 773–789 (2003)
17. Shamir, A.: How to Share a Secret. Commun. ACM 22(11), 612–613 (1979)
18. Stinson, D.R.: An explication of secret sharing schemes. Designs, Codes and Cryptography 2(4), 357–390 (1992)
19. Uhlmann, A.: Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in an interpolation theory. Commun. Math. Phys. 54, 21–32 (1977)
20. Vedral, V.: The role of relative entropy in quantum information Theory. Rev. Mod. Phys. 74(1), 197–234 (2002)
21. Vedral, V., Plenio, M.B.: Entanglement measures and purification procedures. Phys. Rev. A 57, 1619–1633 (1998)
22. Vedral, V., Plenio, M.B., Rippin, M.A., Knight, P.L.: Quantifying entanglement. Phys. Rev. Lett. 78, 2275–2279 (1996)
23. Yeung, R.W.: A Framework for linear information inequalities. IEEE Trans. Inf. Theory 43(6), 1924–1934 (1997)
24. Yuen, H.P., Ozawa, M.: Ultimate information carrying limit of quantum systems. Phys. Rev. Lett. 70(4), 363–366 (1993)
25. Zhang, Z., Yeung, R.W.: On characterization of entropy function via information inequalities. IEEE Trans. on Inform. Theory 44(4), 1440–1452 (1998)

Communicated by M.B. Ruskai

Commun. Math. Phys. 269, 239–257 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0120-3


Absolutely Continuous Spectrum for the Anderson Model on a Tree: A Geometric Proof of Klein’s Theorem

Richard Froese1, David Hasler2, Wolfgang Spitzer3

1 Department of Mathematics, University of British Columbia, Vancouver, British Columbia, Canada. E-mail: [email protected]
2 Department of Mathematics, University of Virginia, Charlottesville, Virginia, USA. E-mail: [email protected]
3 Department of Physics, International University of Bremen, Bremen, Germany. E-mail: [email protected]

Received: 8 March 2006 / Accepted: 30 May 2006
Published online: 6 October 2006 – © Springer-Verlag 2006

Copyright © 2006 by the authors. This article may be reproduced in its entirety for non-commercial purposes.
Current address: Department of Theoretical Physics, University of Erlangen-Nürnberg, Staudtstrasse 7, 91058 Erlangen, Germany

Abstract: We give a new proof of a version of Klein’s theorem on the existence of absolutely continuous spectrum for the Anderson model on the Bethe Lattice at weak disorder.

Model and Statement of Main Results

It is widely believed that the Anderson model [An] should exhibit absolutely continuous spectrum at weak disorder in dimensions three and higher. But it is only for the Bethe lattice $\mathbb{B}$, or Cayley tree, that this has been established. The first proof was given by Klein [K1, K2], and his remained the only result of this kind until the recent work of Aizenman, Sims and Warzel [ASW]. These authors proved a stability result for absolutely continuous spectrum for the Anderson model on $\mathbb{B}$ that implies the existence of an absolutely continuous component in the spectrum for perturbations of the Anderson model on $\mathbb{B}$, also in the presence of a periodic background potential. For the related problem of proving absolutely continuous spectrum for slowly decaying potentials there has been recent progress ([D, DK, KLS, SS]). It is interesting to note that for the Bethe lattice, localization for large disorder has not yet been established at the band edge, but only for strictly larger energies (see [Ai, AM]). For more information about this model, and further references, we recommend the discussion in [ASW].

In this paper we give a new proof of a variant of Klein’s theorem, Theorem 1 below. Our proof is quite different from either of the two previous approaches. It is based on [FHS] where we proved existence of absolutely continuous spectrum for a class of deterministic potentials whose radial behaviour was restricted only by an $\ell^\infty$ bound. That proof was based on the contracting properties of the map φ, defined below, that arises in the recurrence relation for the Green’s function, thought of as a map between hyperbolic spaces. The key result of this paper, Lemma 4 below, is a version of this, adapted to the probabilistic setting. Klein is able to handle some random potentials that we cannot, since we require that the single site distribution has a finite fourth moment. On the other hand, our Theorem 6 quantifies how the finiteness of higher moments of the single site distribution leads to more decay in the probability distribution of the Green’s function.

The Bethe Lattice, $\mathbb{B}$, or Cayley tree of degree k is the infinite connected graph with no closed loops where each vertex has k nearest neighbours. In this paper, we set k = 3. We believe a similar proof works for all k. The Anderson model on $\mathbb{B}$ is given by the random Hamiltonian

\[ H = \Delta + q \]

on the Hilbert space

\[ \mathcal{H} = \ell^2(\mathbb{B}) = \Big\{ \varphi : \mathbb{B} \to \mathbb{C} \;:\; \sum_{x\in\mathbb{B}} |\varphi(x)|^2 < \infty \Big\}, \]

where q denotes a random potential, such that for each $x \in \mathbb{B}$, q(x) is an independently distributed real random variable with probability distribution ν, and Δ is the Laplacian defined by

\[ (\Delta\varphi)(x) = \sum_{y \,:\, d(x,y)=1} \varphi(y). \]

Here d(x, y) denotes the distance in the graph, that is, the number of edges in the shortest path joining x and y. The spectrum of the free Laplacian is $\sigma(\Delta) = [-2\sqrt{2}, 2\sqrt{2}]$. The main theorem which we will prove is the following.

Theorem 1. For any E, with $0 < E < 2\sqrt{2}$, there exist $\delta_1 > 0$ and $\delta_2 > 0$, such that for all ν with

\[ \int_{|q|\ge\delta_1} (1 + |q|^4)\, d\nu(q) \le \delta_2, \]

the spectrum of H is purely absolutely continuous in [−E, E] with probability one, i.e., we have almost surely

\[ \Sigma_{ac} \cap [-E, E] = [-E, E], \qquad \Sigma_{pp} \cap [-E, E] = \emptyset, \qquad \Sigma_{sc} \cap [-E, E] = \emptyset. \]

As was shown in [K1], Theorem 1 follows from the following fact. Let $R(E, \epsilon)$ be the strip in the complex plane defined by

\[ R(E, \epsilon) = \{ z \in \mathbb{H} : \mathrm{Re}\, z \in [-E, E],\ 0 < \mathrm{Im}\, z \le \epsilon \}. \]

Theorem 2. Let $x \in \mathbb{B}$. Under the hypotheses of Theorem 1,

\[ \sup_{\lambda \in R(E,\epsilon)} \mathbb{E}\Big( \big| \langle x | (H - \lambda)^{-1} | x \rangle \big|^2 \Big) < \infty \]

for some $\epsilon > 0$.


Here is a brief outline of the paper. Our main technical result is the contraction estimate in Lemma 4. This and the companion result Lemma 5 are used in Theorem 6 to prove that the probability distribution on the hyperbolic plane for $Z^{(x|y)}(\lambda)$, defined below, decays at infinity. This decay then implies the decay of the Green’s function required in Theorem 2. It seems possible to us that our method could be extended to handle some local correlations and periodic background potentials. However, such extensions would probably involve contraction estimates on multiple step recursion relations, with a corresponding increase in the level of complexity.

Let $\mathbb{H} = \{z \in \mathbb{C} : \mathrm{Im}(z) > 0\}$ denote the complex upper half plane. For convenience we fix an arbitrary site in $\mathbb{B}$ to be the origin and denote it by 0. Given two nearest neighbour sites $x, y \in \mathbb{B}$, we will denote by $\mathbb{B}^{(x|y)}$ the graph obtained by removing from $\mathbb{B}$ the branch emanating from x that passes through y. We will write $H^{(x|y)}$ for H when restricted to $\mathbb{B}^{(x|y)}$ and set

\[ G^{(x|y)}(\lambda) = \langle x | (H^{(x|y)} - \lambda)^{-1} | x \rangle. \]

We will use the following recursion relations. For a proof see [K1] or [FHS].

Proposition 3. For any $\lambda \in \mathbb{H}$,

\[ G(0,0,\lambda) = \langle 0 | (H - \lambda)^{-1} | 0 \rangle = -\Big( \sum_{x \,:\, \mathrm{dist}(x,0)=1} G^{(x|0)}(\lambda) + \lambda - q(0) \Big)^{-1}, \tag{1} \]

and for any two nearest neighbour sites $x, y \in \mathbb{B}$,

\[ G^{(x|y)}(\lambda) = -\Big( \sum_{x' \,:\, d(x,x')=1,\ x' \ne y} G^{(x'|x)}(\lambda) + \lambda - q(x) \Big)^{-1}. \tag{2} \]

It will turn out to be convenient to study the sum of two Green’s functions, i.e., for two nearest neighbour sites $x, y \in \mathbb{B}$ we set

\[ Z^{(x|y)}(\lambda) = \sum_{x' \,:\, d(x,x')=1,\ x' \ne y} G^{(x'|x)}(\lambda). \tag{3} \]

Using the recursion relation for $G^{(x|y)}(\lambda)$ we obtain the following recursion relation:

\[ Z^{(x|y)}(\lambda) = -\sum_{x' \,:\, d(x,x')=1,\ x' \ne y} \Big( Z^{(x'|x)}(\lambda) + \lambda - q(x') \Big)^{-1}. \]

This leads to the investigation of the transformation $\phi : \mathbb{H} \times \mathbb{H} \times \mathbb{R} \times \mathbb{R} \times \mathbb{H} \to \mathbb{H}$ defined by

\[ \phi(z_1, z_2, q_1, q_2, \lambda) = \frac{-1}{z_1 + \lambda - q_1} + \frac{-1}{z_2 + \lambda - q_2}. \tag{4} \]

If $\mathrm{Im}\,\lambda > 0$, the transformation $z \mapsto \phi(z, z, 0, 0, \lambda)$ has a unique fixed point, $z_\lambda$, in the upper half plane, i.e., with $\mathrm{Im}\, z_\lambda > 0$. Explicitly,

\[ z_\lambda = -\lambda/2 + \sqrt{(\lambda/2)^2 - 2}, \]

where we will always make the choice $\mathrm{Im}\sqrt{\cdot} \ge 0$ (and $\sqrt{a} > 0$ for $a > 0$). This fixed point as a function of $\lambda \in \mathbb{H}$ extends continuously onto the real axis. This extension yields for $\mathrm{Im}(\lambda) = 0$ and $|\lambda| < 2\sqrt{2}$ the fixed point

\[ z_\lambda = -\lambda/2 + i\sqrt{2 - (\lambda/2)^2}, \]

lying on an arc of the circle $|z| = \sqrt{2}$. When $\mathrm{Im}(\lambda) = 0$ and $|\lambda| \le E < 2\sqrt{2}$, the arc is strictly contained in the upper half plane. Thus when λ lies in the strip $R(E, \epsilon)$ with $0 < E < 2\sqrt{2}$ and $\epsilon$ sufficiently small, $\mathrm{Im}(z_\lambda)$ is bounded below and $|z_\lambda|$ is bounded above by a positive constant. We will use the weight function cd(z) defined by

\[ \mathrm{cd}(z) = 2\,\mathrm{Im}(z_\lambda)\big(\cosh(\mathrm{dist}_{\mathbb{H}}(z, z_\lambda)) - 1\big) = \frac{|z - z_\lambda|^2}{\mathrm{Im}(z)}. \tag{5} \]
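The fixed point and the two expressions for the weight function in (5) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the hyperbolic distance is computed from the standard upper half plane formula dist(z, w) = 2 artanh(|z − w| / |z − w̄|), which is a textbook identity rather than something taken from this paper.

```python
import cmath
import math


def fixed_point(lam):
    """z_lambda = -lam/2 + sqrt((lam/2)^2 - 2), choosing the root with Im >= 0."""
    root = cmath.sqrt((lam / 2) ** 2 - 2)
    if root.imag < 0:
        root = -root
    return -lam / 2 + root


def phi(z1, z2, q1, q2, lam):
    """The transformation of Eq. (4)."""
    return -1 / (z1 + lam - q1) - 1 / (z2 + lam - q2)


def cd(z, lam):
    """Right-hand side of Eq. (5): |z - z_lambda|^2 / Im(z)."""
    return abs(z - fixed_point(lam)) ** 2 / z.imag


def cd_from_distance(z, lam):
    """Middle expression of Eq. (5), via the hyperbolic distance to z_lambda."""
    w = fixed_point(lam)
    dist = 2 * math.atanh(abs(z - w) / abs(z - w.conjugate()))
    return 2 * w.imag * (math.cosh(dist) - 1)
```

For instance, with `lam = 1 + 0.1j` the value `phi(zl, zl, 0, 0, lam)` reproduces `zl = fixed_point(lam)` up to rounding, and for real `lam = 1.0` the modulus of the fixed point is the square root of 2.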

Up to constants, cd(z) is the hyperbolic cosine of the hyperbolic distance from z to $z_\lambda$, provided $\lambda \in R(E, \epsilon)$ with $0 < E < 2\sqrt{2}$ and $\epsilon$ sufficiently small. This notation suppresses the λ dependence. To prove Theorem 2, we will study the following function:

\[ \mu_{3,p}(z_1, z_2, z_3, q_1, q_2, q_3, q_4, \lambda) = \sum_{\sigma} \frac{\mathrm{cd}^p\big(\phi(z_{\sigma_1}, \phi(z_{\sigma_2}, z_{\sigma_3}, q_{\sigma_2}, q_{\sigma_3}, \lambda), q_{\sigma_1}, q_4, \lambda)\big)}{\mathrm{cd}^p(z_1) + \mathrm{cd}^p(z_2) + \mathrm{cd}^p(z_3)}, \tag{6} \]

where σ runs over the cyclic permutations of (1, 2, 3), i.e., $(\sigma_1, \sigma_2, \sigma_3) \in \{(1, 2, 3), (2, 3, 1), (3, 1, 2)\}$. Note that $\mu_{3,p}$ is well defined as long as $(z_1, z_2, z_3) \ne (z_\lambda, z_\lambda, z_\lambda)$. The proof of Theorem 1 is based on the following bounds, which will be proved in the next section. For small $|q_i|$ we have

Lemma 4. For any E, $0 < E < 2\sqrt{2}$ and any p > 1, there exist positive constants $\epsilon$, $\delta$, $\epsilon_0$ and a compact set $K \subset \mathbb{H}^3$ such that

\[ \mu_{3,p}\big|_{K^c \times [-\delta,\delta]^4 \times R(E,\epsilon_0)} \le 1 - \epsilon. \tag{7} \]

Here $K^c$ denotes the complement $\mathbb{H}^3 \setminus K$.

This lemma also holds for p = 1, but the proof is more involved. We will also need the following bounds, that hold for all $|q_i|$.

Lemma 5. For any E, $0 < E < 2\sqrt{2}$ and any $p \ge 1$, there exist positive constants $\epsilon_0$, C and a compact set $K \subset \mathbb{H}^3$ such that

\[ \mu_{3,p}\big|_{K^c \times \mathbb{R}^4 \times R(E,\epsilon_0)} \le C\Big(1 + \sum_{i=1}^{4} |q_i|^{2p}\Big). \tag{8} \]

Similarly, if we define

\[ \mu'_{3,p}(z_1, z_2, z_3, q, \lambda) = \frac{\mathrm{cd}\big(-(z_1 + z_2 + z_3 + \lambda - q)^{-1}\big)^p}{\mathrm{cd}(z_1)^p + \mathrm{cd}(z_2)^p + \mathrm{cd}(z_3)^p}, \qquad \mu'_{1,p}(z, q, \lambda) = \frac{\mathrm{cd}\big(-(z + \lambda - q)^{-1}\big)^p}{\mathrm{cd}(z)^p}, \]

then

\[ \mu'_{3,p}\big|_{K^c \times \mathbb{R} \times R(E,\epsilon_0)} \le C(1 + |q|^{2p}), \qquad \mu'_{1,p}\big|_{K^c \times \mathbb{R} \times R(E,\epsilon_0)} \le C(1 + |q|^{2p}). \]

Let ρ be the probability distribution for $Z^{(0|x)}(\lambda)$ on the hyperbolic plane given by $\rho(A) = \mathrm{Prob}(Z^{(0|x)}(\lambda) \in A)$. Although it is suppressed in the notation, ρ depends on λ, and for $\mathrm{Im}(\lambda) > 0$ the support of ρ is bounded. This follows, for example, from the fact that it is contained in the range of φ. Given Lemma 4 and Lemma 5, we can prove that the decay of ρ at infinity is preserved as $\mathrm{Im}(\lambda)$ becomes small, provided ν has enough finite moments and is concentrated near 0.

Theorem 6. Let x be a nearest neighbour of 0. For any E, $0 < E < 2\sqrt{2}$ and p > 1, there exist $\delta_1 > 0$, $\delta_2 > 0$ and $\epsilon > 0$, such that for all ν satisfying

\[ \int_{|q|\ge\delta_1} (1 + |q|^{2p})\, d\nu(q) \le \delta_2, \]

we have

\[ \sup_{\lambda \in R(E,\epsilon)} \mathbb{E}\Big( \mathrm{cd}^p\big(Z^{(0|x)}(\lambda)\big) \Big) < \infty. \]

Proof. Let $\delta_1$ be the δ given by Lemma 4, and choose $\epsilon_0$ and K that work in both Lemma 4 and Lemma 5. For $(z_1, z_2, z_3) \in K^c$, we estimate

\[
\begin{aligned}
\int_{\mathbb{R}^4} \mu_{3,p}(z_1, z_2, z_3, q_1, \dots, q_4, \lambda)\, d\nu(q_1) \cdots d\nu(q_4)
&\le (1 - \epsilon) \int_{[-\delta_1,\delta_1]^4} d\nu(q_1) \cdots d\nu(q_4)
 + C \int_{\mathbb{R}^4 \setminus [-\delta_1,\delta_1]^4} \Big(1 + \sum_{i=1}^{4} |q_i|^{2p}\Big)\, d\nu(q_1) \cdots d\nu(q_4) \\
&\le (1 - \epsilon) + C \Big( 1 + 4 M_{2p} - \Big(\int_{[-\delta_1,\delta_1]} d\nu(q)\Big)^4 - 4 \Big(\int_{[-\delta_1,\delta_1]} d\nu(q)\Big)^3 \int_{[-\delta_1,\delta_1]} |q|^{2p}\, d\nu(q) \Big) \\
&\le (1 - \epsilon) + C \Big( 1 + 4 M_{2p} - (1 - \delta_2)^4 - 4 (1 - \delta_2)^3 (M_{2p} - \delta_2) \Big) \\
&\le 1 - \epsilon/2,
\end{aligned}
\]

provided $\delta_2$ is sufficiently small. Here $M_{2p}$ denotes the moment $\int |q|^{2p}\, d\nu(q)$. The recursion relation for $Z^{(0|x)}(\lambda)$ implies that for any continuous function w(z),

\[ \int_{\mathbb{H}} w(z)\, d\rho(z) = \int_{\mathbb{H} \times \mathbb{H} \times \mathbb{R} \times \mathbb{R}} w(\phi(z_1, z_2, q_1, q_2, \lambda))\, d\rho(z_1)\, d\rho(z_2)\, d\nu(q_1)\, d\nu(q_2). \]

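Using this relation (twice) and the estimate above, we obtain for $\lambda \in R(E, \epsilon_0)$,

\[
\begin{aligned}
\mathbb{E}\Big( \mathrm{cd}^p\big(Z^{(0|x)}(\lambda)\big) \Big)
&= \int \mathrm{cd}^p(z)\, d\rho(z) \\
&= \int \mathrm{cd}^p\big(\phi(z_1, \phi(z_2, z_3, q_2, q_3, \lambda), q_1, q_4, \lambda)\big)\, d\rho(z_1) \cdots d\rho(z_3)\, d\nu(q_1) \cdots d\nu(q_4) \\
&= \frac{1}{3} \int \sum_{\sigma} \mathrm{cd}^p\big(\phi(z_{\sigma_1}, \phi(z_{\sigma_2}, z_{\sigma_3}, q_{\sigma_2}, q_{\sigma_3}, \lambda), q_{\sigma_1}, q_4, \lambda)\big)\, d\rho(z_1) \cdots d\nu(q_4) \\
&= \frac{1}{3} \int_{K^c} \Big( \int_{\mathbb{R}^4} \mu_{3,p}(z_1, z_2, z_3, q_1, \dots, q_4, \lambda)\, d\nu(q_1) \cdots d\nu(q_4) \Big) \big( \mathrm{cd}^p(z_1) + \mathrm{cd}^p(z_2) + \mathrm{cd}^p(z_3) \big)\, d\rho(z_1) \cdots d\rho(z_3) + C \\
&\le (1 - \epsilon/2) \int \mathrm{cd}^p(z)\, d\rho(z) + C,
\end{aligned}
\]

where C is some finite constant, only depending on the choice of K. This implies that for all $\lambda \in R(E, \epsilon_0)$,

\[ \mathbb{E}\Big( \mathrm{cd}^p\big(Z^{(0|x)}\big) \Big) \le \frac{2C}{\epsilon}. \]

Now we show how this theorem for p = 2 implies Theorem 2. We must transfer our decay estimate for the distribution ρ for $Z^{(0|x)}(\lambda)$ to the distributions $\rho_g$ for $G^{(0|x)}(\lambda)$ and finally to $\rho_G$ for $G(0, 0, \lambda)$, where these probability distributions are defined by

\[ \rho_G(A) = \mathrm{Prob}\{G(0, 0, \lambda) \in A\}, \qquad \rho_g(A) = \mathrm{Prob}\{G^{(0|x)}(\lambda) \in A\}. \]

Proof of Theorem 2. We will use the following inequality:

\[ |z| \le 4\, \frac{|z - w|^2}{\mathrm{Im}\, z} + 2|w|. \tag{9} \]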
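Inequality (9) is elementary, and a quick random search for counterexamples is an easy sanity check on the constants 4 and 2. The sketch below samples z and w in a box in the upper half plane; the sampling ranges, seed and function names are arbitrary illustrative choices, and passing such a test is of course no substitute for the proof.

```python
import random


def gap(z, w):
    """Right-hand side minus left-hand side of inequality (9)."""
    return 4 * abs(z - w) ** 2 / z.imag + 2 * abs(w) - abs(z)


def min_gap(samples=10000, seed=0):
    """Smallest value of gap(z, w) over random points with Im z, Im w > 0."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(samples):
        z = complex(rng.uniform(-5, 5), rng.uniform(1e-3, 5))
        w = complex(rng.uniform(-5, 5), rng.uniform(1e-3, 5))
        worst = min(worst, gap(z, w))
    return worst
```

A non-negative return value from `min_gap()` means that no sampled pair violates (9).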

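The inequality clearly holds for $|z| \le 2|w|$. In the complementary case, we have $|z| > 2|w|$ and thus $|z - w| \ge \big||z| - |w|\big| \ge |w|$, implying

\[ |z|\, \mathrm{Im}\, z \le |z|^2 \le 2|z - w|^2 + 2|w|^2 \le 4|z - w|^2 \]

and further $|z| \le 4|z - w|^2 / \mathrm{Im}\, z$. This proves (9). Using (9) with $w = z_\lambda$ yields that for $\lambda \in R(E, \epsilon)$,

\[ |z| \le 4\, \mathrm{cd}(z) + C, \]

where C depends only on E and $\epsilon$. To transfer the estimate on $\rho_g$ to one on $\rho_G$ we use the relation (1) and the estimate on $\mu'_{3,2}$ given by Lemma 5. Let R denote $R(E, \epsilon)$. Then

\[
\begin{aligned}
\sup_{\lambda \in R} \mathbb{E}\Big( \big|\langle 0 | (H - \lambda)^{-1} | 0 \rangle\big|^2 \Big)
&= \sup_{\lambda \in R} \int |z|^2\, d\rho_G(z) \\
&\le 32 \sup_{\lambda \in R} \int \mathrm{cd}^2(z)\, d\rho_G(z) + C \\
&= 32 \sup_{\lambda \in R} \int \mathrm{cd}^2\big(-1/(z_1 + z_2 + z_3 + \lambda - q)\big)\, d\rho_g(z_1)\, d\rho_g(z_2)\, d\rho_g(z_3)\, d\nu(q) + C \\
&\le 32 \sup_{\lambda \in R} \int_{K^c \times \mathbb{R}} \mu'_{3,2}(z_1, z_2, z_3, q, \lambda)\, \big( \mathrm{cd}^2(z_1) + \mathrm{cd}^2(z_2) + \mathrm{cd}^2(z_3) \big)\, d\rho_g(z_1)\, d\rho_g(z_2)\, d\rho_g(z_3)\, d\nu(q) + C \\
&\le C \int_{\mathbb{H} \times \mathbb{R}} (1 + |q|^4)\, \mathrm{cd}^2(z)\, d\rho_g(z)\, d\nu(q) + C \\
&\le C \int \mathrm{cd}^2(z)\, d\rho_g(z) + C.
\end{aligned}
\]

A completely analogous argument, using the relations (2) and (3) and the estimate of $\mu'_{1,p}$ in Lemma 5, yields

\[ \int \mathrm{cd}^2(z)\, d\rho_g(z) \le C \int \mathrm{cd}^2(z)\, d\rho(z) + C \]

and completes the proof.

Analysis of $\mu_2$

To analyze the function $\mu_{3,p}$ we will write it in terms of $\mu_2$, defined by

\[ \mu_2(z_1, z_2, q_1, q_2, \lambda) = \frac{2\, \mathrm{cd}(\phi(z_1, z_2, q_1, q_2, \lambda))}{\mathrm{cd}(z_1) + \mathrm{cd}(z_2)}, \]

initially as a function from $\mathbb{H}^2 \setminus \{(z_\lambda, z_\lambda)\} \times \mathbb{R}^2 \times R \to \mathbb{R}$. In this section $R = R(E, \epsilon)$ for some $0 < E < 2\sqrt{2}$ and $\epsilon > 0$. (Note that here and throughout this paper we are using $\mathbb{H}^n$ to denote a product of hyperbolic planes, and not n-dimensional hyperbolic space.)

Proposition 7. For all $(z_1, z_2) \in \mathbb{H}^2 \setminus \{(z_\lambda, z_\lambda)\}$ and $\lambda \in R$,

\[ \mu_2(z_1, z_2, 0, 0, \lambda) < 1. \]

Proof. For $z, w \in \mathbb{H}$ set

\[ c(w, z) = 2\big(\cosh(\mathrm{dist}_{\mathbb{H}}(w, z)) - 1\big) = \frac{|w - z|^2}{\mathrm{Im}(w)\, \mathrm{Im}(z)}. \]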


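Note that $z \mapsto c(w, z)$ is strictly convex. This can be seen for example by noting that its Hessian has strictly positive eigenvalues. Also, c(w, z) is invariant under hyperbolic isometries. Thus

\[ c(2w, z_1 + z_2) = c\Big(w, \frac{z_1 + z_2}{2}\Big) \le \frac{1}{2} c(w, z_1) + \frac{1}{2} c(w, z_2). \]

Substituting $-(z_1 + \lambda)^{-1}$ for $z_1$ and $-(z_2 + \lambda)^{-1}$ for $z_2$, in agreement with the definition (4) of φ, yields

\[
\begin{aligned}
c(2w, \phi(z_1, z_2, 0, 0, \lambda)) &\le \frac{1}{2} c\big(w, -(z_1 + \lambda)^{-1}\big) + \frac{1}{2} c\big(w, -(z_2 + \lambda)^{-1}\big) \\
&= \frac{1}{2} c\big(2w, -2(z_1 + \lambda)^{-1}\big) + \frac{1}{2} c\big(2w, -2(z_2 + \lambda)^{-1}\big).
\end{aligned}
\]

Now choose $2w = z_\lambda$. Since $z_\lambda$ is the fixed point of $z \mapsto -2(z + \lambda)^{-1}$ we obtain

\[ \mathrm{cd}(\phi(z_1, z_2, 0, 0, \lambda)) \le \frac{1}{2} \mathrm{cd}(z_1) + \frac{1}{2} \mathrm{cd}(z_2). \]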
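The averaged contraction bound just obtained is equivalent to the statement that the ratio 2 cd(φ(z_1, z_2, 0, 0, λ)) / (cd(z_1) + cd(z_2)) does not exceed one. A random numerical search gives a quick plausibility check. This is only a sketch: the value of λ, the sampling box, the seed and the function names are arbitrary choices made for illustration.

```python
import cmath
import random


def fixed_point(lam):
    """z_lambda = -lam/2 + sqrt((lam/2)^2 - 2), choosing the root with Im >= 0."""
    root = cmath.sqrt((lam / 2) ** 2 - 2)
    if root.imag < 0:
        root = -root
    return -lam / 2 + root


def phi(z1, z2, q1, q2, lam):
    """The transformation of Eq. (4)."""
    return -1 / (z1 + lam - q1) - 1 / (z2 + lam - q2)


def cd(z, lam):
    """Weight function of Eq. (5)."""
    return abs(z - fixed_point(lam)) ** 2 / z.imag


def mu2(z1, z2, q1, q2, lam):
    """Ratio 2 cd(phi(...)) / (cd(z1) + cd(z2))."""
    return 2 * cd(phi(z1, z2, q1, q2, lam), lam) / (cd(z1, lam) + cd(z2, lam))


def max_mu2(lam, samples=20000, seed=0):
    """Largest sampled value of mu2(z1, z2, 0, 0, lam) over a box in H x H."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        z1 = complex(rng.uniform(-5, 5), rng.uniform(0.05, 5))
        z2 = complex(rng.uniform(-5, 5), rng.uniform(0.05, 5))
        best = max(best, mu2(z1, z2, 0.0, 0.0, lam))
    return best
```

For example, `max_mu2(1 + 0.1j)` stays below 1 on the sampled points, as Proposition 7 predicts for Im λ > 0.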

If equality holds then strict convexity in the first estimate above implies $z_1 = z_2$. Then, since $\mathrm{Im}(\lambda) > 0$, $z \mapsto \phi(z, z, 0, 0, \lambda)$ is a strict contraction with fixed point $z_\lambda$ (see [FHS]). This implies that the common value of $z_1$ and $z_2$ must be $z_\lambda$.

We need to understand the behaviour of $\mu_2(z_1, z_2, q_1, q_2, \lambda)$ as $z_1$ and $z_2$ approach infinity, and λ approaches the real axis. We know from Proposition 7 that the value of $\mu_2$ is at most one, and wish to determine at what points it equals one. Thus it is natural to introduce the compactification $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R}$. Here $\overline{R}$ denotes the closure, and $\overline{\mathbb{H}}$ is the compactification of $\mathbb{H}$ obtained by adjoining the boundary at infinity. (The word compactification is not quite accurate here because of the factors of $\mathbb{R}$, but we will use the term nevertheless.)

The boundary at infinity is defined as follows. Cover the upper half plane model of the hyperbolic plane $\mathbb{H}$ with two coordinate patches, one where |z| is bounded below and one where |z| is bounded above. On the patch where |z| > C we use the co-ordinate function w = −1/z. Each chart looks like a semi-circle in the complex plane of the form $\{z \in \mathbb{C} : \mathrm{Im}(z) > 0, |z| < C\}$. The boundary at infinity consists of the sets $\{\mathrm{Im}(z) = 0\}$ and $\{\mathrm{Im}(w) = 0\}$ in the respective charts. The compactification $\overline{\mathbb{H}}$ is the upper half plane with the boundary at infinity adjoined. We will use i∞ to denote the point where w = 0.

We now think of $\mu_2$ as being defined in the interior of the compactification $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R}$ and ask how it behaves near the boundary. It turns out that in the co-ordinates introduced above, $\mu_2$ is a rational function. At most points on the boundary the denominator does not vanish in the limit, and $\mu_2$ has a continuous extension. There are, however, points where both numerator and denominator vanish, and at these singular points the limiting value of $\mu_2$ depends on the direction of approach.
By blowing up the singular points, it would be possible to define a compactification of $\mathbb{H}^2 \times \mathbb{R}^2 \times R$ to which $\mu_2$ extends continuously. However, this is more than we need for our proof. We will do a partial resolution of the singularities of $\mu_2$, consisting of two blow-ups of the simplest kind, and then extend $\mu_2$ to an upper semi-continuous function on the resulting compactification.

The reciprocal of the function cd(z),

\[ \chi(z) = 1/\mathrm{cd}(z) = \frac{\mathrm{Im}(z)}{|z - z_\lambda|^2}, \]

is a boundary defining function for $\overline{\mathbb{H}}$. This means that in each of the two charts above, χ is positive near infinity and vanishes exactly to first order on the boundary at infinity.

We will now describe our compactification of $\mathbb{H}^2 \times \mathbb{R}^2 \times R$. Start with $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R}$. The first blowup consists of writing $\chi(z_1)$, $\chi(z_2)$ in polar co-ordinates. Thus we introduce new variables $r_1$, $\omega_1$ and $\omega_2$ and impose the equations

\[ \chi(z_1) = r_1 \omega_1, \qquad \chi(z_2) = r_1 \omega_2, \tag{10} \]

and

\[ \omega_1^2 + \omega_2^2 = 1. \tag{11} \]

The blown up space is the variety in $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R} \times \mathbb{R}^3$ containing all points $(z_1, z_2, q_1, q_2, \lambda, r_1, \omega_1, \omega_2)$ that satisfy (10) and (11). In the region where $|z_1|$ and $|z_2|$ are bounded, we could use $\chi(z_1)$, $\chi(z_2)$, $\mathrm{Re}(z_1)$, $\mathrm{Re}(z_2)$, $q_1$, $q_2$, λ as local co-ordinates for the original space $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R}$. The image of such a co-ordinate chart near the boundary would be $[0, \epsilon)^2 \times I^2 \times \mathbb{R}^2 \times \overline{R}$ for some interval I. Local co-ordinates for the blown up space would be $r_1$, θ, $\mathrm{Re}(z_1)$, $\mathrm{Re}(z_2)$, $q_1$, $q_2$, λ, where $\omega_1 = \cos(\theta)$ and $\omega_2 = \sin(\theta)$. The image of such a chart in the blown up space would be $[0, \epsilon) \times [0, \pi/2] \times I^2 \times \mathbb{R}^2 \times \overline{R}$. Similarly, we could write local co-ordinates in the other regions.

The singular locus for the first blowup is the corner $\partial_\infty(\mathbb{H}) \times \partial_\infty(\mathbb{H}) \times \mathbb{R}^2 \times \overline{R}$ in $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R}$, defined by $\chi(z_1) = \chi(z_2) = 0$. Corresponding to each point in the singular locus is a quarter circle of points in the blown up space, parametrized by $\omega_1$, $\omega_2$. Away from the singular locus the original space and the blown up space are essentially the same, since we can solve for $r_1$, $\omega_1$, $\omega_2$ in terms of the original variables.

For the second blowup we introduce an additional real variable $r_2$ and two additional complex variables $\eta_1$ and $\eta_2$. We impose

\[ z_1 + \mathrm{Re}(\lambda) - q_1 = r_2 \eta_1, \qquad z_2 + \mathrm{Re}(\lambda) - q_2 = r_2 \eta_2, \tag{12} \]

with

\[ |\eta_1|^2 + |\eta_2|^2 = 1, \tag{13} \]

and $r_2 \ge 0$. The variables of the first and second blowups are not independent when $r_1, r_2 \ne 0$. In fact, since $\chi(z_1) = r_2\, \mathrm{Im}(\eta_1) / |r_2 \eta_1 - \mathrm{Re}(\lambda) + q_1 - z_\lambda|^2 = r_1 \omega_1$, and similarly for $\chi(z_2)$, we find that $r_1 r_2\, \mathrm{Im}(\eta_1)\, \omega_2\, |r_2 \eta_2 - \mathrm{Re}(\lambda) + q_2 - z_\lambda|^2$ and $r_1 r_2\, \mathrm{Im}(\eta_2)\, \omega_1\, |r_2 \eta_1 - \mathrm{Re}(\lambda) + q_1 - z_\lambda|^2$ are equal so that, when $r_1, r_2 \ne 0$,

\[ \mathrm{Im}(\eta_1)\, \omega_2\, |r_2 \eta_2 - \mathrm{Re}(\lambda) + q_2 - z_\lambda|^2 = \mathrm{Im}(\eta_2)\, \omega_1\, |r_2 \eta_1 - \mathrm{Re}(\lambda) + q_1 - z_\lambda|^2. \tag{14} \]

We will require that this equation be satisfied everywhere. Otherwise, there would be points in the blown up space (where $r_2 = 0$ and (14) is not satisfied) that are not in the closure of the interior of the original space.

As before, the twice blown up space is essentially the same as the once blown up space away from the singular locus $z_1 = -\mathrm{Re}(\lambda) + q_1$, $z_2 = -\mathrm{Re}(\lambda) + q_2$. Local co-ordinates for the twice blown up space near the singular locus are given by $r_2$, $\omega_1$, $\omega_2$, $\mathrm{Re}(\eta_1)$, $\mathrm{Re}(\eta_2)$, $q_1$, $q_2$, λ.

Define K to be the space obtained from $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R}$ by the two blowups described above. The topology is the one given by the local description as a closed subset of Euclidean space. The boundary at infinity is defined to be

\[ \partial_\infty K = \{\chi(z_1) = 0\} \cup \{\chi(z_2) = 0\} = \{r_1 = 0\} \cup \{\omega_1 = 0\} \cup \{\omega_2 = 0\}. \]

The set $K \setminus \partial_\infty K$ can be identified with $\mathbb{H}^2 \times \mathbb{R}^2 \times \overline{R}$. Extend $\mu_2$ to an upper semi-continuous function on K by defining, for points $k \in \partial_\infty K$,

\[ \mu_2(k) = \limsup_{\substack{k_n \to k \\ k_n \in K \setminus \partial_\infty K}} \mu_2(k_n). \]

Here $k_n \to k$ means convergence in K. More explicitly, $k_n$ is a point $(z_{1,n}, z_{2,n}, q_{1,n}, q_{2,n}, \lambda_n) \in \mathbb{H}^2 \times \mathbb{R}^2 \times \overline{R}$, and not only do these co-ordinates approach limiting values $(z_1, z_2, q_1, q_2, \lambda)$ in $\overline{\mathbb{H}}^2 \times \mathbb{R}^2 \times \overline{R}$, but also the co-ordinates $r_1$, $\omega_1$ and $\omega_2$ defined by (10) and (11) and the co-ordinates $r_2$, $\eta_1$ and $\eta_2$ defined by (12) and (13) approach limiting values as well. Of course, the co-ordinates $r_2$, $\eta_1$ and $\eta_2$ are only defined in the region where $|z_1|$ and $|z_2|$ are bounded. But, for these co-ordinates, we really care only about the point where $z_i = -\mathrm{Re}(\lambda) + q_i$, i = 1, 2, since away from the singular locus, the blowup co-ordinates are determined by the base co-ordinates $z_i$, $q_i$ and λ.

Lemma 8. Let Σ be the subset of K where $\mu_2 = 1$. Let $K_0$ denote the subset of $\partial_\infty K$ where $\lambda \in (-2\sqrt{2}, 2\sqrt{2})$, $q_1 = q_2 = 0$. Then

\[ \Sigma \cap K_0 = (\Sigma_1 \cup \Sigma_2 \cup \Sigma_3 \cup \Sigma_4) \cap K_0, \tag{15} \]

where

\[
\begin{aligned}
\Sigma_1 &= \{z_1 \ne -\lambda,\ z_2 \ne -\lambda,\ z_1 = z_2 \in \partial_\infty \mathbb{H},\ \omega_1 = \omega_2\}, \\
\Sigma_2 &= \{z_1 = -\lambda,\ z_2 \ne -\lambda,\ \omega_1 = 0\}, \\
\Sigma_3 &= \{z_1 \ne -\lambda,\ z_2 = -\lambda,\ \omega_2 = 0\}, \\
\Sigma_4 &= \{z_1 = -\lambda,\ z_2 = -\lambda,\ \eta_1 = e^{i\psi} \omega_1,\ \eta_2 = e^{i\psi} \omega_2 \text{ for some } \psi \in [0, \pi]\}.
\end{aligned}
\]

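Remark. In fact we will only use this theorem when both $z_1$ and $z_2$ are in $\partial_\infty \mathbb{H}$.

Proof. Assume for the moment that $(z_1, z_2, q_1, q_2, \lambda) \in \mathbb{H}^2 \times \mathbb{R}^2 \times R$. Since

\[ \chi(\phi(z_1, z_2, q_1, q_2, \lambda)) = \frac{\mathrm{Im}(z_1 + \lambda)\, |z_2 + \lambda - q_2|^2 + \mathrm{Im}(z_2 + \lambda)\, |z_1 + \lambda - q_1|^2}{|z_1 + \lambda - q_1 + z_2 + \lambda - q_2 + z_\lambda (z_1 + \lambda - q_1)(z_2 + \lambda - q_2)|^2}, \]

the function $\mu_2$ is given by

\[ \mu_2(z_1, z_2, q_1, q_2, \lambda) = \frac{2 \chi(z_1) \chi(z_2)\, |z_1 + \lambda - q_1 + z_2 + \lambda - q_2 + z_\lambda (z_1 + \lambda - q_1)(z_2 + \lambda - q_2)|^2}{\big( p_1 \chi(z_1) |z_1 - z_\lambda|^2 |z_2 + \lambda - q_2|^2 + p_2 \chi(z_2) |z_2 - z_\lambda|^2 |z_1 + \lambda - q_1|^2 \big) \big( \chi(z_1) + \chi(z_2) \big)}, \]

where $p_i = 1 + \mathrm{Im}(\lambda) / \mathrm{Im}(z_i)$.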
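The rational expression for $\mu_2$ follows from the definition by direct algebra, and it can be compared numerically with the defining ratio 2 cd(φ) / (cd(z_1) + cd(z_2)). The sketch below does this at a single test point; the function names and the test point are illustrative choices, and λ must be passed as a Python complex number so that its imaginary part is available.

```python
import cmath
import math


def fixed_point(lam):
    """z_lambda = -lam/2 + sqrt((lam/2)^2 - 2), choosing the root with Im >= 0."""
    root = cmath.sqrt((lam / 2) ** 2 - 2)
    if root.imag < 0:
        root = -root
    return -lam / 2 + root


def chi(z, lam):
    """Boundary defining function chi(z) = Im(z) / |z - z_lambda|^2 = 1 / cd(z)."""
    return z.imag / abs(z - fixed_point(lam)) ** 2


def mu2_definition(z1, z2, q1, q2, lam):
    """mu_2 as 2 cd(phi) / (cd(z1) + cd(z2)), written with cd = 1 / chi."""
    image = -1 / (z1 + lam - q1) - 1 / (z2 + lam - q2)
    return 2 / chi(image, lam) / (1 / chi(z1, lam) + 1 / chi(z2, lam))


def mu2_rational(z1, z2, q1, q2, lam):
    """mu_2 from the explicit rational expression with p_i = 1 + Im(lam)/Im(z_i)."""
    zl = fixed_point(lam)
    a = z1 + lam - q1
    b = z2 + lam - q2
    c1, c2 = chi(z1, lam), chi(z2, lam)
    p1 = 1 + lam.imag / z1.imag
    p2 = 1 + lam.imag / z2.imag
    numerator = 2 * c1 * c2 * abs(a + b + zl * a * b) ** 2
    denominator = (p1 * c1 * abs(z1 - zl) ** 2 * abs(b) ** 2
                   + p2 * c2 * abs(z2 - zl) ** 2 * abs(a) ** 2) * (c1 + c2)
    return numerator / denominator
```

The two functions should agree to rounding error at any point with Im z_1, Im z_2 > 0 and real q_1, q_2.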


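Define $\mu_2^*$ by setting $p_1 = p_2 = 1$ in this formula, that is,

\[ \mu_2^*(z_1, z_2, q_1, q_2, \lambda) = \frac{2 \omega_1 \omega_2\, |z_1 + \lambda - q_1 + z_2 + \lambda - q_2 + z_\lambda (z_1 + \lambda - q_1)(z_2 + \lambda - q_2)|^2}{\big( \omega_1 |z_1 - z_\lambda|^2 |z_2 + \lambda - q_2|^2 + \omega_2 |z_2 - z_\lambda|^2 |z_1 + \lambda - q_1|^2 \big) (\omega_1 + \omega_2)}. \tag{16} \]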

Clearly μ2 ≤ μ∗2 . Now let k ∈ ∩ K 0 . To show the inclusion ⊆ in (15) we must show that k is in i for some i ∈ {1, 2, 3, 4}. Let the of k, be given by the base co-ordinates √ co-ordinates √ z 1 , z 2 , q1 = 0, q2 = 0, λ ∈ (−2 2, 2 2), the first blow-up co-ordinates r1 , ω1 and ω2 and, if z 1 , z 2 = i∞, the second blow-up co-ordinates r2 , η1 and η2 . Since k ∈ ∂∞ K , μ2 (k) is defined as a lim sup. The points of continuity of μ∗2 are the points where the denominator of (16) does not vanish. Thus k satisfies one of the following four mutually disjoint conditions: (i) (ii) (iii) (iv)

k is a point of continuity for μ∗2 , z 1 = −λ and z 2 = −λ, z 1 = −λ, z 2 = −λ and ω1 = 0, z 1 = −λ, z 2 = −λ and ω2 = 0.

If conditions (iii) or (iv) hold then k lies in Σ_2 or Σ_3 and we are done. Suppose that (i) holds. Then

\[ 1 = \mu_2(k) = \limsup_{\substack{k_n\to k\\ k_n\in K\setminus\partial_\infty K}} \mu_2(k_n) \le \limsup_{\substack{k_n\to k\\ k_n\in K\setminus\partial_\infty K}} \mu_2^*(k_n) \le 1. \]

The last inequality holds because at a point of continuity, the lim sup is actually a limit which can be evaluated in any order. If we take the limit in λ and q_i first, we may use the fact that for λ ∈ (−2√2, 2√2), μ_2 = μ_2^*, and from Proposition 7 we see that the remaining limit in z_1 and z_2 can be at most 1. Thus we have μ_2^*(k) = 1 and we need to show that if (i) holds, and (ii), (iii) and (iv) do not, then k lies in one of the sets Σ_i.

Let us first consider the case where z_1 = z_2 = i∞. In this case we must introduce new variables w_i = −1/z_i, substitute into (16) and send w_1 and w_2 to zero. Using |z_λ|² = 2 for λ ∈ (−2√2, 2√2) we find that at this point μ_2^* = 4ω_1ω_2/(ω_1 + ω_2)². So μ_2^* = 1 implies that ω_1 = ω_2 and thus that k ∈ Σ_1.

Continuing with case (i) let us consider next the possibility that z_1 ∈ ∂_∞H and z_2 ∉ ∂_∞H. Then ω_1 = 0 and ω_2 = 1 and the numerator in (16) is zero. Away from points described by (ii), (iii) and (iv) the denominator is not zero so μ_2^*(k) = 0. This is impossible. When z_1 = i∞ we should first replace z_1 with −1/w_1 and send w_1 to zero. This leads to the same conclusion.

Thus we are left to consider points satisfying (i) where z_1 and z_2 are both in ∂_∞H and not the point at infinity. In other words z_1 and z_2 are real but not equal to −λ. In this case the condition μ_2^* = 1 can be rewritten as

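\[ \begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix}^{T} M \begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix} = 0, \qquad\text{with}\qquad M = \begin{pmatrix} m_{1,1} & m_{1,2}\\ m_{2,1} & m_{2,2}\end{pmatrix}, \tag{17} \]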


R. Froese, D. Hasler, W. Spitzer

where m_{1,2} = m_{2,1} and

\[ m_{1,1} = |z_1-z_\lambda|^2\,|z_2+\lambda|^2, \]
\[ m_{1,2} = \bigl(|z_1-z_\lambda|^2|z_2+\lambda|^2+|z_2-z_\lambda|^2|z_1+\lambda|^2\bigr)/2 - |z_1+\lambda+z_2+\lambda+z_\lambda(z_1+\lambda)(z_2+\lambda)|^2, \]
\[ m_{2,2} = |z_2-z_\lambda|^2\,|z_1+\lambda|^2. \]

Setting s_1 = z_1 + λ and s_2 = z_2 + λ and using s_1, s_2 ∈ R we find

\[ M = \begin{pmatrix} (s_1^2-\lambda s_1+2)\,s_2^2 & -s_1s_2\bigl(s_1s_2-\lambda(s_1+s_2)/2+2\bigr)\\ -s_1s_2\bigl(s_1s_2-\lambda(s_1+s_2)/2+2\bigr) & (s_2^2-\lambda s_2+2)\,s_1^2 \end{pmatrix}. \]

Since tr(M) ≥ 0 the condition (17) requires

\[ \det(M) = s_1^2 s_2^2 (s_1-s_2)^2 (2-\lambda^2/4) = 0. \]

But s_1 and s_2 are not zero. Therefore we conclude that s_1 = s_2 and thus z_1 = z_2. In addition,

\[ M = s^2(s^2-\lambda s+2)\begin{pmatrix} 1 & -1\\ -1 & 1\end{pmatrix}, \]

where s is the common value of s_1 and s_2. For λ ∈ (−2√2, 2√2), s² − λs + 2 > 0. Thus (17) implies that ω_1 = ω_2 and so k ∈ Σ_1.

Finally, we must deal with case (ii). Let (z_{1,n}, z_{2,n}, q_{1,n}, q_{2,n}, λ_n) ∈ H² × R² × R be a sequence that realizes the lim sup in the definition of μ_2(k). Define r̃_{2,n}, η̃_{1,n} and η̃_{2,n} via

\[ z_{1,n}+\lambda_n-q_{1,n} = \tilde r_{2,n}\tilde\eta_{1,n}, \qquad z_{2,n}+\lambda_n-q_{2,n} = \tilde r_{2,n}\tilde\eta_{2,n}, \qquad |\tilde\eta_{1,n}|^2+|\tilde\eta_{2,n}|^2 = 1. \]

By going to a subsequence if needed, we may assume that r̃_{2,n}, η̃_{1,n} and η̃_{2,n} converge to 0, η̃_1 and η̃_2 respectively. We may also assume that p_{i,n} = 1 + Im(λ_n)/Im(z_{i,n}) converge to p_i for i = 1, 2. Then we find that

\[ 1 = \mu_2(k) = \frac{\omega_1\omega_2|\tilde\eta_1+\tilde\eta_2|^2}{(p_1\omega_1|\tilde\eta_2|^2+p_2\omega_2|\tilde\eta_1|^2)(\omega_1+\omega_2)} \le \frac{\omega_1\omega_2|\tilde\eta_1+\tilde\eta_2|^2}{(\omega_1|\tilde\eta_2|^2+\omega_2|\tilde\eta_1|^2)(\omega_1+\omega_2)} \le 1, \tag{18} \]

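unless the denominator is zero, that is, ω_1 = 0, η̃_1 = 0 or ω_2 = 0, η̃_2 = 0, which we assume for the moment is not the case. The last inequality holds because it is equivalent to

\[ \begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix}^{T} \begin{pmatrix} |\tilde\eta_2|^2 & -\operatorname{Re}(\tilde\eta_1\overline{\tilde\eta_2})\\ -\operatorname{Re}(\tilde\eta_1\overline{\tilde\eta_2}) & |\tilde\eta_1|^2 \end{pmatrix} \begin{pmatrix}\omega_1\\ \omega_2\end{pmatrix} \ge 0, \]

and the matrix in this formula is positive semi-definite.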
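The positive semi-definiteness claim, and the resulting bound by 1 of the right-hand ratio in (18), can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the off-diagonal entry is −Re(η̃_1 · conj(η̃_2)), and the function names are illustrative only.

```python
import random

def form_matrix(eta1, eta2):
    """Symmetric 2x2 matrix of the quadratic form in (omega_1, omega_2).

    Assumption: the off-diagonal entry is -Re(eta1 * conj(eta2)).
    """
    off = -(eta1 * eta2.conjugate()).real
    return [[abs(eta2) ** 2, off], [off, abs(eta1) ** 2]]

def is_positive_semidefinite(m, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff trace >= 0 and determinant >= 0."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return trace >= -tol and det >= -tol

def bound_ratio(w1, w2, eta1, eta2):
    """Right-hand ratio in (18), i.e. the case p_1 = p_2 = 1."""
    num = w1 * w2 * abs(eta1 + eta2) ** 2
    den = (w1 * abs(eta2) ** 2 + w2 * abs(eta1) ** 2) * (w1 + w2)
    return num / den

def random_sample(rng):
    """Random point with eta_i in the closed upper half plane, omega_i > 0."""
    eta1 = complex(rng.uniform(-1, 1), rng.uniform(0, 1))
    eta2 = complex(rng.uniform(-1, 1), rng.uniform(0, 1))
    return rng.uniform(0.05, 1), rng.uniform(0.05, 1), eta1, eta2
```

The determinant of the matrix is |η̃_1|²|η̃_2|² − Re(η̃_1 conj(η̃_2))², which is non-negative by the Cauchy–Schwarz inequality; the sketch only confirms this on random samples.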


Still under the assumption that neither ω_1 = 0, η̃_1 = 0 nor ω_2 = 0, η̃_2 = 0 hold, we see that at least one of p_1 or p_2 must equal 1. Otherwise we would have a strict inequality in (18) which is impossible. If p_i = 1 then Im(λ_n)/Im(z_{i,n}) → 0. This implies that r̃_{2,n}/r_{2,n} → 1, because r_{2,n} ≥ Im(z_{i,n}) implies Im(λ_n)/r_{2,n} → 0 and

\[ \tilde r_{2,n}^{\,2} = \sum_{i=1}^{2} \Bigl( \operatorname{Re}(z_{i,n}-q_{i,n}+\lambda_n)^2 + \operatorname{Im}(z_{i,n}+\lambda_n)^2 \Bigr). \]

Now, from r̃_{2,n} η̃_{i,n} = r_{2,n} η_{i,n} + i Im(λ_n) we conclude that η̃_{i,n} and η_{i,n} have the same limit η_i. So, in fact, we have that (17) holds with

\[ M = \begin{pmatrix} |\eta_2|^2 & -\operatorname{Re}(\eta_1\overline{\eta_2})\\ -\operatorname{Re}(\eta_1\overline{\eta_2}) & |\eta_1|^2 \end{pmatrix}. \]

Since tr(M) > 0 this requires

\[ \det(M) = \tfrac{1}{2}\,|\eta_1|^2|\eta_2|^2\bigl(1-\cos(2(\arg(\eta_1)-\arg(\eta_2)))\bigr) = 0. \]

This means either η_1 = 0, η_2 = 0, arg(η_1) = arg(η_2) or arg(η_1) = arg(η_2) + π. If η_1 = 0 then (ω_1, ω_2)^T ∈ Ker(M) requires ω_1 = 0 and k ∈ Σ_4. Similarly, if η_2 = 0 then k ∈ Σ_4. If arg(η_1) = arg(η_2) = ψ then

\[ M = \begin{pmatrix} |\eta_2|^2 & -|\eta_1||\eta_2|\\ -|\eta_1||\eta_2| & |\eta_1|^2 \end{pmatrix} \]

and (17) and the fact that ω_1, ω_2 ≥ 0 implies that

\[ \begin{pmatrix} |\eta_1|\\ |\eta_2| \end{pmatrix} = \begin{pmatrix} \omega_1\\ \omega_2 \end{pmatrix}. \]

Thus η_1 = e^{iψ}ω_1 and η_2 = e^{iψ}ω_2, and again we have k ∈ Σ_4. The remaining possibility is that arg(η_1) = arg(η_2) + π. Since both η_1 and η_2 lie in the upper half plane, this implies that they are both real with opposite signs. Equation (17) then requires ω_1 = η_1 and ω_2 = η_2. But this is impossible, as ω_1 and ω_2 are both non-negative.

To complete the proof we must return to the possibility that ω_1 = 0, η̃_1 = 0 or ω_2 = 0, η̃_2 = 0. Clearly, at most one of these can hold. Suppose ω_1 = 0, η̃_1 = 0. (The other possibility is handled similarly.) Introduce one more set of variables s_n, α_{1,n} and α_{2,n} satisfying

\[ \omega_{1,n} = s_n^2\,\alpha_{1,n}, \qquad \tilde\eta_{1,n} = s_n^2\,\alpha_{2,n}, \qquad \alpha_{1,n}^2+|\alpha_{2,n}|^2 = 1. \]

Then sn → 0 and going to a subsequence we may assume that α1,n and α2,n converge to α1 and α2 . Then μ2 (k) = α1 /( p1 α1 + p2 |α2 |2 ) = 1 so that α2 = 0 and p1 = 1. But p1 = 1 implies η˜ 1 = η1 by the argument above. Thus η1 = η˜ 1 = 0 and ω1 = 0 which implies that k ∈ 4 .
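The determinant identity used in the real-boundary case of the proof, det(M) = s_1² s_2² (s_1 − s_2)² (2 − λ²/4), is a polynomial identity and can be confirmed exactly with rational arithmetic. The sketch below is illustrative only; the function names are not from the paper, and the matrix entries are taken from the formula for M in terms of s_1, s_2 and λ.

```python
from fractions import Fraction

def matrix_entries(s1, s2, lam):
    """Entries (m11, m12, m22) of the symmetric matrix M for real s1, s2."""
    m11 = (s1 ** 2 - lam * s1 + 2) * s2 ** 2
    m22 = (s2 ** 2 - lam * s2 + 2) * s1 ** 2
    m12 = -s1 * s2 * (s1 * s2 - lam * (s1 + s2) / 2 + 2)
    return m11, m12, m22

def determinant(s1, s2, lam):
    """det(M) computed directly from the entries."""
    m11, m12, m22 = matrix_entries(s1, s2, lam)
    return m11 * m22 - m12 ** 2

def closed_form(s1, s2, lam):
    """Factorised determinant claimed in the proof."""
    return s1 ** 2 * s2 ** 2 * (s1 - s2) ** 2 * (2 - lam ** 2 / 4)
```

Called with `Fraction` arguments, both functions return exact rationals, so equality can be tested without rounding error. When s_1 = s_2 = s the entries reduce to s²(s² − λs + 2) times (1, −1, 1), matching the last displayed form of M.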


Proofs of Lemma 4 and Lemma 5

Proof of Lemma 4. Extend μ_{3,p} to an upper semi-continuous function on \overline{H}³ × R⁴ × R by setting, at points Z_0, Q_0, λ_0 where it is not already defined,

\[ \mu_{3,p}(Z_0,Q_0,\lambda_0) = \limsup_{Z\to Z_0,\;Q\to Q_0,\;\lambda\to\lambda_0} \mu_{3,p}(Z,Q,\lambda). \]

Here the limsup is taken over points in H³ × R⁴ × R, and we are using the notation Z = (z_1, z_2, z_3) for points in H³ and Q = (q_1, q_2, q_3, q_4) for points in R⁴. The points Z, Q and λ are approaching their limits in the topology of \overline{H}³ × R⁴ × R. To prove the theorem it is then enough to show that

\[ \mu_{3,p}(Z,Q,\lambda) < 1 \tag{19} \]

for (Z, Q, λ) in the compact set ∂_∞(\overline{H}³) × {0}⁴ × [−E, E], since this implies that for some ε > 0, the upper semi-continuous function μ_{3,p}(Z, Q, λ) is bounded by 1 − 2ε on the set, and by 1 − ε in some neighbourhood.

We will rewrite μ_{3,p} in terms of the simpler function μ_2. Define

\[ \nu_i(Z) = \frac{\mathrm{cd}(z_i)}{\mathrm{cd}(z_1)+\mathrm{cd}(z_2)+\mathrm{cd}(z_3)}, \]

and the maps ξ_σ and τ_σ from H³ × R⁴ × R to H × H × R² × L labelled by a permutation σ of (1, 2, 3) and given by

\[ \xi_\sigma(Z,Q,\lambda) = (z_{\sigma_2}, z_{\sigma_3}, q_{\sigma_2}, q_{\sigma_3}, \lambda), \]
\[ \tau_\sigma(Z,Q,\lambda) = (z_{\sigma_1}, \phi(z_{\sigma_2}, z_{\sigma_3}, q_{\sigma_2}, q_{\sigma_3}, \lambda), q_{\sigma_1}, q_4, \lambda). \]

Then we have

\[ \mu_{3,p}(Z,Q,\lambda) = \frac{1}{\nu_1^p+\nu_2^p+\nu_3^p}\sum_\sigma \left(\frac{\mathrm{cd}\bigl(\phi(z_{\sigma_1},\phi(z_{\sigma_2},z_{\sigma_3},q_{\sigma_2},q_{\sigma_3},\lambda),q_{\sigma_1},q_4,\lambda)\bigr)}{\mathrm{cd}(z_1)+\mathrm{cd}(z_2)+\mathrm{cd}(z_3)}\right)^{p} \]
\[ \phantom{\mu_{3,p}(Z,Q,\lambda)} = \frac{1}{\nu_1^p+\nu_2^p+\nu_3^p}\sum_\sigma \left(\mu_2(\tau_\sigma(Z,Q,\lambda))\Bigl(\tfrac12\nu_{\sigma_1}+\tfrac14\,\mu_2(\xi_\sigma(Z,Q,\lambda))(\nu_{\sigma_2}+\nu_{\sigma_3})\Bigr)\right)^{p}. \tag{20} \]

Let R_1, Ω_1, Ω_2 and Ω_3 be three dimensional polar co-ordinates defined as functions of Z = (z_1, z_2, z_3) ∈ H³ by χ(z_1) = R_1Ω_1, χ(z_2) = R_1Ω_2, χ(z_3) = R_1Ω_3, and Ω_1² + Ω_2² + Ω_3² = 1.


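Notice that for any permutation σ of (1, 2, 3),

\[ \nu_{\sigma_1} = \frac{\Omega_{\sigma_2}\Omega_{\sigma_3}}{\Omega_1\Omega_2+\Omega_1\Omega_3+\Omega_2\Omega_3}. \tag{21} \]

Next, let r_1(z_1, z_2), ω_1(z_1, z_2) and ω_2(z_1, z_2) be the co-ordinates defined by (10) and (11). Then, for any permutation σ of (1, 2, 3) and any Z = (z_1, z_2, z_3) ∈ H³,

\[ \Omega_{\sigma_2}^2 = (\Omega_{\sigma_2}^2+\Omega_{\sigma_3}^2)\,\omega_1^2(z_{\sigma_2},z_{\sigma_3}), \qquad \Omega_{\sigma_3}^2 = (\Omega_{\sigma_2}^2+\Omega_{\sigma_3}^2)\,\omega_2^2(z_{\sigma_2},z_{\sigma_3}), \tag{22} \]

where each Ω_{σ_i} is evaluated at Z. To see this note that, since χ(z_{σ_2}) = r_1(z_{σ_2}, z_{σ_3}) ω_1(z_{σ_2}, z_{σ_3}) = R_1Ω_{σ_2} and χ(z_{σ_3}) = r_1(z_{σ_2}, z_{σ_3}) ω_2(z_{σ_2}, z_{σ_3}) = R_1Ω_{σ_3}, we have

\[ r_1^2(z_{\sigma_2},z_{\sigma_3}) = r_1^2(z_{\sigma_2},z_{\sigma_3})\bigl(\omega_1^2(z_{\sigma_2},z_{\sigma_3})+\omega_2^2(z_{\sigma_2},z_{\sigma_3})\bigr) = R_1^2(\Omega_{\sigma_2}^2+\Omega_{\sigma_3}^2). \]

Thus

\[ R_1^2\Omega_{\sigma_2}^2 = r_1^2(z_{\sigma_2},z_{\sigma_3})\,\omega_1^2(z_{\sigma_2},z_{\sigma_3}) = R_1^2(\Omega_{\sigma_2}^2+\Omega_{\sigma_3}^2)\,\omega_1^2(z_{\sigma_2},z_{\sigma_3}), \]

and since R_1 ≠ 0 for Z ∈ H³ the first equality of (22) follows. The second equality is proved in the same way. A similar argument also shows

\[ \Omega_{\sigma_1}^2 = \bigl(\Omega_{\sigma_1}^2+F^2(\Omega_{\sigma_2}^2+\Omega_{\sigma_3}^2)\bigr)\,\omega_1^2\bigl(z_{\sigma_1},\phi(z_{\sigma_2},z_{\sigma_3},q_{\sigma_2},q_{\sigma_3},\lambda)\bigr), \]
\[ F^2(\Omega_{\sigma_2}^2+\Omega_{\sigma_3}^2) = \bigl(\Omega_{\sigma_1}^2+F^2(\Omega_{\sigma_2}^2+\Omega_{\sigma_3}^2)\bigr)\,\omega_2^2\bigl(z_{\sigma_1},\phi(z_{\sigma_2},z_{\sigma_3},q_{\sigma_2},q_{\sigma_3},\lambda)\bigr). \tag{23} \]

Here each Ω_{σ_i} is evaluated at Z and

\[ F = \frac{\chi\bigl(\phi(z_{\sigma_2},z_{\sigma_3},q_{\sigma_2},q_{\sigma_3},\lambda)\bigr)}{r_1(z_{\sigma_2},z_{\sigma_3})} = \frac{2\,\omega_1(z_{\sigma_2},z_{\sigma_3})\,\omega_2(z_{\sigma_2},z_{\sigma_3})}{\mu_2(z_{\sigma_2},z_{\sigma_3},q_{\sigma_2},q_{\sigma_3},\lambda)\bigl(\omega_1(z_{\sigma_2},z_{\sigma_3})+\omega_2(z_{\sigma_2},z_{\sigma_3})\bigr)}. \tag{24} \]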

We will prove (19) by contradiction. For this suppose that μ_{3,p}(Z, Q, λ) = 1 for some (Z, Q, λ) ∈ ∂_∞(\overline{H}³) × {0}⁴ × [−E, E]. Then there must exist a sequence (Z_n, Q_n, λ_n) with Z_n → Z in \overline{H}³, Q_n → (0, 0, 0, 0) and λ_n → λ ∈ [−E, E] such that lim μ_{3,p}(Z_n, Q_n, λ_n) = 1. From now on Z = (z_1, z_2, z_3) and λ will denote the limiting values of the sequence Z_n and λ_n. Similarly, we will denote by ν_i and Ω_i the limits of ν_i(Z_n) and Ω_i(Z_n). We claim that

\[ \nu_1 = \nu_2 = \nu_3 = \tfrac13. \tag{25} \]

This follows from (20), the bound μ_2 ≤ 1 proved in Proposition 7 and convexity of x → x^p which imply

\[ 1 \le \frac{1}{\nu_1^p+\nu_2^p+\nu_3^p}\sum_\sigma \Bigl(\tfrac12\nu_{\sigma_1}+\tfrac14(\nu_{\sigma_2}+\nu_{\sigma_3})\Bigr)^{p} \le \frac{1}{\nu_1^p+\nu_2^p+\nu_3^p}\sum_\sigma \Bigl(\tfrac12\nu_{\sigma_1}^p+\tfrac14(\nu_{\sigma_2}^p+\nu_{\sigma_3}^p)\Bigr) = 1, \]


so the inequalities must actually be equalities. Since p > 1, strict convexity implies that equality only holds if ν_1 = ν_2 = ν_3. Since their sum is 1, their common value must be 1/3.

By going to a subsequence, we may assume that Ω_i(Z_n) converge. Then (25) and (21) imply that their limiting values along the sequence must be

\[ \Omega_1 = \Omega_2 = \Omega_3 = 1/\sqrt{3}. \tag{26} \]

One consequence is that

\[ z_i \in \partial_\infty H \tag{27} \]

for i = 1, 2, 3.

Now consider the values of ξ_σ(Z_n, Q_n, λ_n) and τ_σ(Z_n, Q_n, λ_n). Since these vary in a compact region in K we may, again by going to a subsequence, assume that they converge in K to values which we will denote ξ_σ and τ_σ. Returning to (20) and using (25), the upper semi-continuity of μ_2 and the bound μ_2 ≤ 1, we find that

\[ 1 = \lim_{n\to\infty} \frac13\sum_\sigma \left(\frac{\mu_2(\tau_\sigma(Z_n,Q_n,\lambda_n))\bigl(1+\mu_2(\xi_\sigma(Z_n,Q_n,\lambda_n))\bigr)}{2}\right)^{p} \le \frac13\sum_\sigma \left(\frac{\mu_2(\tau_\sigma)\bigl(1+\mu_2(\xi_\sigma)\bigr)}{2}\right)^{p} \le 1. \]

This implies that for every σ occurring in the sum we have μ_2(ξ_σ) = μ_2(τ_σ) = 1. This and (27) imply that for each σ, ξ_σ and τ_σ lie in the set Σ of Lemma 8.

Now consider the co-ordinates ω_1 and ω_2 for the point ξ_σ. These are the limiting values of ω_i(z_{σ_2}, z_{σ_3}) along our sequence. Equations (22) and (26) then imply that these limiting values are ω_1 = ω_2 = 1/√2. Examining the description of Σ in Lemma 8, we conclude that the H co-ordinates of ξ_σ, namely the limiting values of z_{σ_2} and z_{σ_3}, must be equal. Since this is true for every σ we conclude that z_1 = z_2 = z_3 ∈ ∂_∞H. Let z denote their common value.

We first consider the possibility that z ≠ −λ. The two H co-ordinates of the point τ_{(1,2,3)} are z and the limiting value of φ(z_2, z_3, q_2, q_3, λ). This limiting value is simply φ(z, z, 0, 0, λ) = −2/(z + λ) and is easily seen to be not equal to z. The only way that τ_σ with H co-ordinates z and φ(z, z, 0, 0, λ) can lie in Σ with z ≠ −λ is that φ(z, z, 0, 0, λ) = −λ and that the ω_2 co-ordinate is 0. The ω_2 co-ordinate is the limiting value of ω_2(z_1, φ(z_2, z_3, q_2, q_3, λ)) which we may use in taking the limit of Eq. (23). The limiting value of F in that equation can be computed from (24), since we know that the values of ω_i in that formula are 1/√2 and the value of μ_2 in that formula is 1. This gives F = 1/√2 and so the second equation of (23) yields 1/3 = 0 in the limit, which is impossible.

This leaves the possibility that z = −λ. Again, the H co-ordinates for the point τ_{(1,2,3)} are z and the limiting value of φ(z_2, z_3, q_2, q_3, λ). By going to a subsequence, we have assumed that this limiting value exists. However, in this case it is not clear what


the value is, since (−λ, −λ, 0, 0, λ) is the point where φ is not continuous. In fact, we will see that the limiting value, possibly after going one more time to a subsequence, is i∞. To see this we write

\[ \phi(z_{2,n},z_{3,n},q_{2,n},q_{3,n},\lambda_n) = \frac{-(z_{2,n}+\lambda_n-q_{2,n})-(z_{3,n}+\lambda_n-q_{3,n})}{(z_{2,n}+\lambda_n-q_{2,n})(z_{3,n}+\lambda_n-q_{3,n})} = \frac{-r_{2,n}(\eta_{1,n}+\eta_{2,n})-2i\operatorname{Im}(\lambda_n)}{(r_{2,n}\eta_{1,n}+i\operatorname{Im}(\lambda_n))(r_{2,n}\eta_{2,n}+i\operatorname{Im}(\lambda_n))}, \]

where r_{2,n}, η_{1,n} and η_{2,n} are the co-ordinates defined by (12) and (13). Since for our sequence, z_{2,n}, z_{3,n} → −λ, q_{2,n}, q_{3,n} → 0 we have r_{2,n} → 0. We also have Im λ_n → 0 so if we write (r_{2,n}, Im(λ_n)) in polar co-ordinates, that is, r_{2,n} = s_n α_{1,n} and Im(λ_n) = s_n α_{2,n} with α_{1,n}² + α_{2,n}² = 1, then s_n → 0. By going to a subsequence we may assume α_{1,n} and α_{2,n} converge to non-negative values α_1 and α_2. Then

\[ \phi(z_{2,n},z_{3,n},q_{2,n},q_{3,n},\lambda_n) = \frac{-\alpha_{1,n}(\eta_{1,n}+\eta_{2,n})-2i\alpha_{2,n}}{s_n(\alpha_{1,n}\eta_{1,n}+i\alpha_{2,n})(\alpha_{1,n}\eta_{2,n}+i\alpha_{2,n})}. \]

The denominator of this expression converges to 0. The numerator converges to −α_1(η_1 + η_2) − 2iα_2, where η_1 and η_2 are co-ordinates in K for ξ_{(1,2,3)}. Since ξ_{(1,2,3)} lies in Σ with H co-ordinates (−λ, −λ) we must have η_1 + η_2 = e^{iψ}√2 with ψ ∈ [0, π]. Here we used that the ω co-ordinates for ξ_{(1,2,3)} are both 1/√2. But now we see that it is impossible that α_1(η_1 + η_2) + 2iα_2 = 0, since the imaginary part being zero forces α_2 = 0 and ψ ∈ {0, π}, in which case α_1 = 1 so that α_1(η_1 + η_2) + 2iα_2 = ±√2 ≠ 0. This implies that the limiting value of φ is i∞.

Now we know that the point τ_{(1,2,3)} has H co-ordinates −λ and i∞. Thus τ_{(1,2,3)} ∈ Σ requires that the ω_1 co-ordinate of τ_{(1,2,3)} be zero. Arguing as above, we find that in the limit, the first equation of (23) reads 1/3 = 0. This contradiction concludes the proof of the theorem.

Proof of Lemma 5. Each term in the sum appearing in μ_{3,p} can be estimated

\[ \frac{\mathrm{cd}^p(\phi(\cdots\cdots))}{\mathrm{cd}^p(z_1)+\mathrm{cd}^p(z_2)+\mathrm{cd}^p(z_3)} \le 3^{p-1}\left(\frac{\mathrm{cd}(\phi(\cdots\cdots))}{\mathrm{cd}(z_1)+\mathrm{cd}(z_2)+\mathrm{cd}(z_3)}\right)^{p}, \]

where φ(······) denotes φ(z_{σ_1}, φ(z_{σ_2}, z_{σ_3}, q_{σ_2}, q_{σ_3}, λ), q_{σ_1}, q_4, λ). Therefore it is enough to prove

\[ \frac{\mathrm{cd}(\phi(\cdots\cdots))}{\mathrm{cd}(z_1)+\mathrm{cd}(z_2)+\mathrm{cd}(z_3)} \le C\Bigl(1+\sum_{i=1}^{4}|q_i|^2\Bigr). \tag{28} \]

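Let φ(···) denote φ(z_{σ_2}, z_{σ_3}, q_{σ_2}, q_{σ_3}, λ). Then

\[ \operatorname{Im}(\phi(\cdots)) = \frac{\operatorname{Im}(z_{\sigma_2}+\lambda)}{|z_{\sigma_2}+\lambda-q_{\sigma_2}|^2} + \frac{\operatorname{Im}(z_{\sigma_3}+\lambda)}{|z_{\sigma_3}+\lambda-q_{\sigma_3}|^2} \ge \frac{\operatorname{Im}(z_{\sigma_2})}{|z_{\sigma_2}+\lambda-q_{\sigma_2}|^2}. \]

Thus we have

\[ \frac{\mathrm{cd}(\phi(\cdots\cdots))}{\mathrm{cd}(z_1)+\mathrm{cd}(z_2)+\mathrm{cd}(z_3)} = \frac{\bigl|(z_{\sigma_1}+\lambda-q_{\sigma_1})+(\phi(\cdots)+\lambda-q_{\sigma_4})+z_\lambda(z_{\sigma_1}+\lambda-q_{\sigma_1})(\phi(\cdots)+\lambda-q_{\sigma_4})\bigr|^2}{\operatorname{Im}(z_{\sigma_1}+\lambda)\,|\phi(\cdots)+\lambda-q_{\sigma_4}|^2+\operatorname{Im}(\phi(\cdots)+\lambda)\,|z_{\sigma_1}+\lambda-q_{\sigma_1}|^2} \times \frac{1}{\sum_{j=1}^{3}|z_j-z_\lambda|^2/\operatorname{Im}(z_j)} \]
\[ \le \left(\frac{3\,|z_{\sigma_2}+\lambda-q_{\sigma_2}|^2}{\operatorname{Im}(z_{\sigma_2})} + \frac{3+3\,|z_{\sigma_1}+\lambda-q_{\sigma_1}|^2}{\operatorname{Im}(z_{\sigma_1})}\right)\frac{1}{\sum_{j=1}^{3}|z_j-z_\lambda|^2/\operatorname{Im}(z_j)}. \]

Choose the compact set K so that \sum_{j=1}^{3} |z_j − z_λ|²/Im(z_j) ≥ C > 0 for some constant C if (z_1, z_2, z_3) ∈ K^c. Then we can estimate each term depending on whether z_{σ_i} is close to z_λ. If it is sufficiently close, then Im(z_{σ_i}) is bounded below and |z_{σ_i}| is bounded above by a constant. Thus

\[ \operatorname{Im}(z_{\sigma_i})\sum_{j=1}^{3}|z_j-z_\lambda|^2/\operatorname{Im}(z_j) \ge \operatorname{Im}(z_{\sigma_i})\,C \ge C > 0 \]

and |z_{σ_i} + λ − q_{σ_i}|² ≤ C(1 + |q_{σ_i}|²), so we are done. Otherwise

\[ \operatorname{Im}(z_{\sigma_i})\sum_{j=1}^{3}|z_j-z_\lambda|^2/\operatorname{Im}(z_j) \ge |z_{\sigma_i}-z_\lambda|^2 \ge C(1+|z_{\sigma_i}|^2) \]

so that

\[ \frac{|z_{\sigma_i}+\lambda-q_{\sigma_i}|^2}{\operatorname{Im}(z_{\sigma_i})\sum_{j=1}^{3}|z_j-z_\lambda|^2/\operatorname{Im}(z_j)} \le C(1+|q_{\sigma_i}|^2) \]

in this case too. The estimates for μ 3, p and μ 1, p are very similar. We omit the details.

Acknowledgements. R. F. would like to thank Rafe Mazzeo for useful conversations. D. H. and W. S. would like to thank the Department of Mathematics at the University of British Columbia for hospitality.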


Communicated by B. Simon

Commun. Math. Phys. 269, 259–281 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0122-1

Communications in

Mathematical Physics

Spectral Measures of Small Index Principal Graphs

Teodor Banica¹, Dietmar Bisch²

¹ Department of Mathematics, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse, France. E-mail: [email protected]

² Department of Mathematics, Vanderbilt University, 1326 Stevenson Center, Nashville, TN 37240, USA. E-mail: [email protected]

Received: 24 March 2006 / Accepted: 28 May 2006
Published online: 29 September 2006 – © Springer-Verlag 2006

D.B. was supported by NSF under Grant No. DMS-0301173.

Abstract: The principal graph X of a subfactor with finite Jones index is one of the important algebraic invariants of the subfactor. If Δ is the adjacency matrix of X we consider the equation Δ = U + U^{−1}. When X has square norm ≤ 4 the spectral measure of U can be averaged by using the map u → u^{−1}, and we get a probability measure ε on the unit circle which does not depend on U. We find explicit formulae for this measure ε for the principal graphs of subfactors with index ≤ 4, the (extended) Coxeter-Dynkin graphs of type A, D and E. The moment generating function of ε is closely related to Jones' Θ-series.

Introduction

The Coxeter-Dynkin graphs of type A, D_even, E_6, E_8, and the extended Coxeter-Dynkin graphs of type ADE appear in the theory of subfactors as basic invariants for inclusions of II_1 factors with Jones index ≤ 4 ([6, 10–12], see also [4, 5]). They are fusion graphs of subfactor representations and capture the algebraic information contained in the standard invariant of the subfactor N ⊂ M (see e.g. [2, 12, 4, 5]). Such a graph X is bipartite and has a distinguished vertex 1. Of particular interest in subfactor theory are the numbers of length 2k loops on X based at 1, since these are the dimensions of the higher relative commutants associated to N ⊂ M. If combined in a formal power series f(z) we obtain the Poincaré series of X. A related series Θ(u), with z^{1/2} = u + u^{−1}, is considered by Jones in [8]. Jones made the remarkable discovery that if the subfactor has index > 4 then the coefficients of this series are necessarily positive integers.

The series f and Θ are natural invariants of the subfactor. In this paper we compute explicitly measures whose generating series of moments are (essentially) these two power series in the case when the graphs have square norm ≤ 4. These measures can then be regarded as invariants of the subfactors.


For a graph X of square norm ≤ 4 we have the following measure-theoretic version of Θ. Consider the equation Δ = U + U^{−1}, where Δ is the adjacency matrix of X. We can average the spectral measure of U by using the map u → u^{−1}, and we get a probability measure ε on the unit circle which does not depend on U. We find that for the (extended) Coxeter-Dynkin graphs of type A and D this measure ε is given by very simple formulae as follows:

\[ A_{n-1} \to \alpha\, d_n, \qquad D_{n+1} \to \alpha\, d'_n, \qquad A_\infty \to \alpha\, d, \]
\[ A^{(1)}_{2n} \to d_n, \qquad A_{-\infty,\infty} \to d, \qquad D^{(1)}_{n+2} \to (d'_1+d_n)/2, \qquad D_\infty \to (d'_1+d)/2. \]

Here d, d_n, d'_n are the uniform measure on the unit circle, the uniform measure supported on the 2n-th roots of unity, and the uniform measure supported on the 4n-th roots of unity of odd order. The fundamental density α is given by α(u) = 2 Im(u)² and corresponds via x = u + u^{−1} to the semicircle law from [14].

For the graphs of type E_6, E_7, E_8, E_6^{(1)}, E_7^{(1)}, and E_8^{(1)}, ε is given by the following formulae:

\[ E_6 \to \alpha\, d_{12} + (d_{12}-d_6-d_4+d_3)/2, \qquad E_7 \to \varepsilon_7, \qquad E_8 \to \varepsilon_8, \]
\[ E_6^{(1)} \to \alpha\, d_3 + (d_2-d_3)/2, \qquad E_7^{(1)} \to \alpha\, d_4 + (d_3-d_4)/2, \qquad E_8^{(1)} \to \alpha\, d_6 + (d_5-d_6)/2. \]

Here ε_7, ε_8 denote certain exceptional measures for which we do not have closed formulae (we do compute their moment generating series though).

These results are obtained by explicit computations. They could also be obtained by using planar algebra methods, see [8, 9, 13] for related work.

The measure ε computed in this article should be viewed as an analytic invariant for subfactors of index ≤ 4. It is unclear what the appropriate generalization of ε to subfactors of index > 4 should be. However, in light of Jones' work in [8], the measure ε should be related to certain representations of planar algebras. It should shed some light on the structure of subfactor planar algebras (or equivalently standard invariants) arising from subfactors with index > 4. We intend to come back to this question in future work.

Similar considerations make sense for quantum groups. The hope would be that an analytic invariant for quantum groups would emerge from the Weingarten formula in [1], and work here is in progress. This in turn might be related to the results in the present article via Di Francesco's formula in [3].

The paper is dedicated to the proofs of the above results and is organized as follows. Section 1 fixes the notation and contains some preliminaries. In Sects. 2, 3 and 4 we


divide the graphs of type A and D into three classes (circulant graphs, graphs with A_n tails, and graphs with fork tails) and we compute ε. In Sects. 5, 6 and 7 we discuss the graphs of type E using a key lemma in Sect. 3.

1. Spectral Measures and the Jones Series

We collect in this section several known results about spectral measures associated to graphs and their Stieltjes transforms. We relate them to a natural power series discovered by Jones in the context of planar algebras and their representation theory ([7, 8]).

Let X be a (possibly infinite) bipartite graph with distinguished vertex labeled by "1". Since X is bipartite, its adjacency matrix is given by

\[ \Delta = \begin{pmatrix} 0 & M\\ M^t & 0 \end{pmatrix}, \]

where M is a rectangular matrix with non-negative integer entries. If we let L = M M^t and N = M^t M, then

\[ \Delta^2 = \begin{pmatrix} L & 0\\ 0 & N \end{pmatrix}. \]

For a matrix T with entries labeled by the vertices of X, we use the following notation ("1" is the label of the distinguished vertex):

\[ \int T = T_{11}. \]

We call T_{11} the integral of the matrix T.

Definition 1.1. The spectral measure of X is the probability measure μ on R satisfying

\[ \int_{-\infty}^{\infty} \varphi(x)\,d\mu(x) = \int \varphi(\Delta) \]

for any continuous function ϕ : R → C.

Note that the spectral measure of Δ can be regarded as an invariant of X. The spectral measure is uniquely determined by its moments. The generating series of these moments is called the Stieltjes transform σ of μ, i.e.

\[ \sigma(z) = \int_{-\infty}^{\infty} \frac{1}{1-zx}\,d\mu(x). \]

This is related to the Poincaré series of X, which appears as the generating function of the numbers loop(2k), counting loops of length 2k on X based at 1. The Poincaré series of X is defined as

\[ f(z) = \sum_{k=0}^{\infty} \mathrm{loop}(2k)\,z^k. \]

We have then the following well-known result.


Proposition 1.1. Let X be a bipartite graph with spectral measure μ and Poincaré series f. The Stieltjes transform σ of μ is given by σ(z) = f(z²).

Proof. We compute σ(z) using the fact that the integral of Δ^l is loop(l),

\[ \sigma(z) = \int \frac{1}{1-z\Delta} = \sum_{l=0}^{\infty} \mathrm{loop}(l)\,z^l. \]

Since loop(l) = 0 for l odd, we have

\[ f(z^2) = \sum_{k=0}^{\infty} \mathrm{loop}(2k)\,z^{2k} = \sum_{l=0}^{\infty} \mathrm{loop}(l)\,z^l. \]

This proves the statement.

We assume now that the graph X has norm ≤ 2, that is the matrix Δ has norm ≤ 2. Thus the support of μ, which is contained in the spectrum of Δ, is contained in [−2, 2]. Let T be the unit circle, and consider the following map Φ : T → [−2, 2], defined by Φ(u) = u + u^{−1}. Any probability measure ε on T produces a probability measure μ = Φ_*(ε) on [−2, 2], according to the following formula, valid for any continuous function ϕ : [−2, 2] → C:

\[ \int_{R} \varphi(x)\,d\mu(x) = \int_{T} \varphi(u+u^{-1})\,d\varepsilon(u). \]

We can obtain in this way all probability measures on [−2, 2]. Given μ, there is a unique probability measure ε on T satisfying Φ_*(ε) = μ with the normalisation dε(u) = dε(u^{−1}).

Definition 1.2. The spectral measure of X (on T) is the probability measure ε on T given by

\[ \int_{T} \varphi(u+u^{-1})\,d\varepsilon(u) = \int \varphi(\Delta) \]

for any continuous function ϕ : [−2, 2] → C, with the normalisation dε(u) = dε(u^{−1}).

The generating series of the moments of ε (the Stieltjes transform) is given by

\[ S(q) = \int_{T} \frac{1}{1-qu}\,d\varepsilon(u). \]

Following Jones ([7, 8]), given a subfactor planar algebra P = (P_0^±, (P_k)_{k≥1}) with parameter δ, the associated Poincaré series is defined as

\[ f(z) = \frac12\bigl(\dim P_0^{+} + \dim P_0^{-}\bigr) + \sum_{k=1}^{\infty} \dim P_k\, z^k. \]

Jones introduced in [8] an associated series Θ, which is essentially obtained from the Poincaré series by a change of variables z → q/(1 + q²). In this paper we will call this series the Jones series. If δ > 2, then the Jones series is the dimension generating function for the multiplicities of certain Temperley-Lieb modules which appear in the decomposition


of the planar algebra P viewed as a module for the Temperley-Lieb planar algebra ([8], see also [9, 13]). Using the formula for the Θ-series from [8] we define the Jones series of the graph X by

\[ \Theta(q) = q + \frac{1-q}{1+q}\, f\!\left(\frac{q}{(1+q)^2}\right), \]

where f(z) is the Poincaré series of X. With these notations, we have then the following result.

Proposition 1.2. Let X be a bipartite graph with spectral measure ε (on T) and Jones series Θ. The Stieltjes transform of ε is given by 2S(q) = Θ(q²) − q² + 1.

Proof. We compute S in terms of ε,

\[ 2S(q) = \int_{T} \frac{1}{1-qu}\,d\varepsilon(u) + \int_{T} \frac{1}{1-qu^{-1}}\,d\varepsilon(u) = 1 + \int_{T} \frac{1-q^2}{1-q(u+u^{-1})+q^2}\,d\varepsilon(u). \]

We compute now Θ in terms of μ,

\[ \Theta(q^2) - q^2 = \frac{1-q^2}{1+q^2}\, f\!\left(\frac{q^2}{(1+q^2)^2}\right) = \int_{-\infty}^{\infty} \frac{1-q^2}{1-qx+q^2}\,d\mu(x). \]

The formula in the statement follows now from the definition of ε.

In the next sections we compute explicitly the spectral measure (on T) for graphs which appear in the classification of subfactors with index ≤ 4 (the so-called principal graphs). We use standard notation for these graphs (see for instance [5]).

2. Circulant Graphs

Consider the circulant graph A^{(1)}_{2n}, that is A^{(1)}_{2n} is the 2n-gon, and choose any vertex as the distinguished vertex 1.

Theorem 2.1. Let X be the circulant graph A^{(1)}_{2n}. The spectral measure of X (on T) is given by dε(u) = d_n u, where d_n is the uniform measure on 2n-th roots of unity.

Proof. We identify X with the group {w^k}_{0≤k≤2n−1} of 2n-th roots of unity, where w = e^{iπ/n}. The adjacency matrix of X acts on functions f ∈ C(X) in the following way:

\[ \Delta f(w^s) = f(w^{s-1}) + f(w^{s+1}). \]


Consider the following operators U and U^{−1}:

\[ U f(w^s) = f(w^{s+1}), \qquad U^{-1} f(w^s) = f(w^{s-1}). \]

We have Δ = U + U^{−1}. The moments of the spectral measure of U are obtained as follows:

\[ \int U^k = \langle U^k\delta_1, \delta_1\rangle = \langle \delta_{w^k}, \delta_1\rangle = \delta_{w^k,1}. \]

We compute now the moments of the measure in the statement,

\[ \int_{T} u^k\,d\varepsilon(u) = \int_{T} u^k\,d_n u = \delta_{w^k,1}. \]

Thus ε is the spectral measure of U, and together with the identity dε(u) = dε(u^{−1}), we get the result.

The graph A_{−∞,∞} is the set Z with consecutive integers connected by edges. Choose any vertex as the distinguished vertex labeled 1.

Theorem 2.2. Let X be the graph A_{−∞,∞}. The spectral measure of X (on T) is given by dε(u) = du, where du is the uniform measure on the unit circle.

Proof. This follows from Theorem 2.1 by letting n → ∞, or from a direct loop count.

3. Graphs with Tails

The Coxeter-Dynkin graph of type A_n, n ≥ 2, has n vertices and the distinguished vertex 1 labels a vertex at one end of the graph, i.e. A_n is a bipartite graph of the form

1 − α_1 − 2 − α_2 − 3 − α_3 ::: β,

where 1, 2, 3, . . . label the even vertices and α_1, α_2, . . . label the odd vertices. β is either even or odd depending on the parity of n.

Consider a sequence of graphs X_k obtained by adding A_{2k} tails to a finite graph Γ. We let

X_0 = 1 ::: Γ,
X_1 = 1 − α − 2 ::: Γ,
X_2 = 1 − α − 2 − β − 3 ::: Γ,

where 1 denotes the distinguished vertex, α, 2, β and 3, . . . denote vertices connected by single edges as indicated and Γ denotes a finite graph connected by a single edge to the preceding vertex. For instance, X_1 is obtained by attaching A_3 to Γ (since the vertex 2 of A_3 is identified with one of the vertices of Γ, we have attached an A_2-tail to Γ), X_2 by attaching A_5 to Γ, etc. Similarly we define the graph X_k.


We denote by L_0 the matrix appearing on the top left of the square of the adjacency matrix of X_0, that is:

\[ \Delta_0^2 = \begin{pmatrix} L_0 & 0\\ 0 & N_0 \end{pmatrix}. \]

We compute the Jones series of each X_k in the next lemma.

Lemma 3.1. The Jones series of the graphs X_k, k ≥ 0, is given by

\[ \frac{\Theta(q)-q}{1-q} = \frac{1-Pq^{2k}}{1-Pq^{2k+1}}, \]

where P is defined by the formula

\[ P = \frac{P_1-q^{-1}P_0}{P_1-qP_0} \]

and P_i = P_i(y) = det(y − K_i), i = 0, 1, with y = 2 + q + q^{−1}, and with K_0, K_1 being the following matrices:
– K_0 is obtained from L_0 by deleting the first row and column,
– K_1 is obtained from L_0 by adding 1 to the first entry.

Proof. We use the notation fixed in Sect. 1. The matrix M_k in the adjacency matrix of X_k is given by a row vector w and a matrix M, as follows:

\[ M_0 = \begin{pmatrix} w\\ M \end{pmatrix}, \qquad M_1 = \begin{pmatrix} 1 & \\ 1 & w\\ & M \end{pmatrix}, \qquad M_2 = \begin{pmatrix} 1 & & \\ 1 & 1 & \\ & 1 & w\\ & & M \end{pmatrix}. \]

The corresponding matrices L_k are given by the real number a = w w^t + 1, the row vector u = w M^t, the column vector v = M w^t, and the matrix N = M M^t as follows:

\[ L_0 = \begin{pmatrix} w\\ M \end{pmatrix}\begin{pmatrix} w^t & M^t \end{pmatrix} = \begin{pmatrix} a-1 & u\\ v & N \end{pmatrix}, \]
\[ L_1 = \begin{pmatrix} 1 & \\ 1 & w\\ & M \end{pmatrix}\begin{pmatrix} 1 & 1 & \\ & w^t & M^t \end{pmatrix} = \begin{pmatrix} 1 & 1 & \\ 1 & a & u\\ & v & N \end{pmatrix}, \]
\[ L_2 = \begin{pmatrix} 1 & & \\ 1 & 1 & \\ & 1 & w\\ & & M \end{pmatrix}\begin{pmatrix} 1 & 1 & & \\ & 1 & 1 & \\ & & w^t & M^t \end{pmatrix} = \begin{pmatrix} 1 & 1 & & \\ 1 & 2 & 1 & \\ & 1 & a & u\\ & & v & N \end{pmatrix}. \]

It is now clear what the form of the matrices M_k and L_k is for general k. Consider the matrix K_k, k ≥ 0, obtained from L_k by deleting the first row and column, in other words

\[ K_0 = (N), \qquad K_1 = \begin{pmatrix} a & u\\ v & N \end{pmatrix}, \qquad K_2 = \begin{pmatrix} 2 & 1 & \\ 1 & a & u\\ & v & N \end{pmatrix}, \]

and similarly for general k.


We have the following formula for the Poincaré series of X_k, with y = z^{−1}:

\[ f_k(z) = \int \frac{1}{1-zL_k} = \frac{\det(1-zK_k)}{\det(1-zL_k)} = y\cdot\frac{\det(y-K_k)}{\det(y-L_k)}. \]

The characteristic polynomials P_k = det(y − K_k) and Q_k = det(y − L_k) satisfy the following two identities, obtained by developing determinants at the top left:

\[ P_{k+1} = (y-2)P_k - P_{k-1}, \qquad Q_k = (y-1)P_k - P_{k-1}. \]

We consider the first identity and the second identity minus the first, with the change of variables y = 2 + q + q^{−1}, and obtain

\[ P_{k+1} = (q+q^{-1})P_k - P_{k-1}, \qquad Q_k = P_{k+1} + P_k. \]

If we let P_+ = P_1 − qP_0 and P_− = P_1 − q^{−1}P_0, the solutions of these equations can be written as follows:

\[ P_k = \frac{q^{-k}P_+ - q^{k}P_-}{q^{-1}-q}, \qquad Q_k = \frac{q^{-k}P_+ - q^{k+1}P_-}{1-q}. \]

We can compute now the series f_k by using the variables z = y^{−1} = q(1 + q)^{−2},

\[ f_k(z) = \frac{(1+q)^2}{q}\cdot\frac{P_k}{Q_k} = \frac{(1+q)^2}{q}\cdot\frac{1-q}{q^{-1}-q}\cdot\frac{q^{-k}P_+ - q^{k}P_-}{q^{-k}P_+ - q^{k+1}P_-} = (1+q)\,\frac{1-Pq^{2k}}{1-Pq^{2k+1}}. \]

And finally we obtain the Jones series as

\[ \Theta_k(q) - q = \frac{1-q}{1+q}\, f_k(z) = (1-q)\,\frac{1-Pq^{2k}}{1-Pq^{2k+1}}. \]

This proves the statement.

We are now ready to compute the spectral measure for A_n (on T).

Theorem 3.1. Let X be the Coxeter-Dynkin graph A_{n−1}, n ≥ 1, with n − 1 vertices. The spectral measure of X (on T) is given by dε(u) = α(u) d_n u, where α(u) = 2 Im(u)², and d_n is the uniform measure on 2n-th roots of unity.


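Proof. We use Lemma 3.1 to compute the Jones series of $X_k = A_{2k+2}$ by letting

$$M_0 = (1),\quad L_0 = (1),\quad K_0 = (\ ),\quad K_1 = (2),$$
$$P_0 = 1,\quad P_1 = q + q^{-1},\quad P = q^2.$$

Then

$$\frac{\Theta(q) - q}{1 - q} = \frac{1 - q^{2k+2}}{1 - q^{2k+3}}.$$

Similarly, Lemma 3.1 gives the Jones series of $X_k = A_{2k+3}$ by letting

$$M_0 = \begin{pmatrix}1\\1\end{pmatrix},\quad L_0 = \begin{pmatrix}1&1\\1&1\end{pmatrix},\quad K_0 = (1),\quad K_1 = \begin{pmatrix}2&1\\1&1\end{pmatrix},$$
$$P_0 = 1 + q + q^{-1},\quad P_1 = 1 + q + q^{-1} + q^2 + q^{-2},\quad P = q^3.$$

Then

$$\frac{\Theta(q) - q}{1 - q} = \frac{1 - q^{2k+3}}{1 - q^{2k+4}}.$$

The above two formulae give the Jones series for $A_{n-1}$ as

$$\frac{\Theta(q) - q}{1 - q} = \frac{1 - q^{n-1}}{1 - q^{n}}.$$

We use the following formula, valid for $m = 2nk + r$ with $r = 0, 1, \ldots, 2n - 1$:

$$\int_{\mathbb{T}}\frac{u^{-m}}{1 - qu}\,d_nu = \frac{q^{r}}{1 - q^{2n}}.$$

We can compute now the Stieltjes transform of $\varepsilon$:

$$\begin{aligned}
2S(q) - 1 &= -1 + \int_{\mathbb{T}}\frac{2 - u^2 - u^{-2}}{1 - qu}\,d_nu\\
&= -1 + \frac{2 - q^{2n-2} - q^{2}}{1 - q^{2n}}\\
&= \frac{1 + q^{2n} - q^{2n-2} - q^{2}}{1 - q^{2n}}\\
&= \frac{(1 - q^{2})(1 - q^{2n-2})}{1 - q^{2n}}.
\end{aligned}$$

Thus we have $2S(q) - 1 = \Theta(q^2) - q^2$, and we are done.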
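As a numerical cross-check of Theorem 3.1, one can integrate directly against the measure $\alpha\,d_n$. This is a sketch under the assumption, read off from the displayed computation, that the Stieltjes transform is $S(q) = \int_{\mathbb{T}} (1 - qu)^{-1}\,d\varepsilon(u)$; the function names and the sample values of $q$ and $n$ are illustrative.

```python
import cmath

def stieltjes_An(q, n):
    """Integral of alpha(u) / (1 - q u) over the 2n-th roots of unity, alpha(u) = 2 Im(u)^2."""
    total = 0
    for j in range(2 * n):
        u = cmath.exp(1j * cmath.pi * j / n)  # the j-th of the 2n-th roots of unity
        total += 2 * u.imag**2 / (1 - q * u)
    return total / (2 * n)  # uniform weights 1/(2n)

def closed_form(q, n):
    """Right-hand side (1 - q^2)(1 - q^(2n-2)) / (1 - q^(2n))."""
    return (1 - q**2) * (1 - q**(2 * n - 2)) / (1 - q**(2 * n))

q, n = 0.3, 5
assert 1e-12 > abs(2 * stieltjes_An(q, n) - 1 - closed_form(q, n))
```

The check uses floating-point arithmetic, so it supports the identity only up to rounding error at the sampled point.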


We compute next the Jones series for the Coxeter-Dynkin graph of type $D_n$, $n \ge 3$. The graph $D_n$ has $n$ vertices. It consists of two vertices connected to one other vertex, and this vertex in turn is connected to an A-tail, ending at the distinguished vertex 1.

Theorem 3.2. Let $X$ be the Coxeter-Dynkin graph $D_{n+1}$, $n \ge 2$, with $n + 1$ vertices. The spectral measure of $X$ (on $\mathbb{T}$) is given by $d\varepsilon(u) = \alpha(u)\,d'_nu$, where $\alpha(u) = 2\,\mathrm{Im}(u)^2$, and $d'_n$ is the uniform measure on $4n$-th roots of unity of odd order.

Proof. We use again Lemma 3.1 to compute the Jones series of $X_k = D_{2k+3}$ by letting

$$M_0 = \begin{pmatrix}1&1\end{pmatrix},\quad L_0 = (2),\quad K_0 = (\ ),\quad K_1 = (3),$$
$$P_0 = 1,\quad P_1 = -1 + q + q^{-1},\quad P = -q.$$

Thus

$$\frac{\Theta(q) - q}{1 - q} = \frac{1 + q^{2k+1}}{1 + q^{2k+2}}.$$

Similarly, Lemma 3.1 allows us to compute the Jones series of $X_k = D_{2k+4}$ by letting

$$M_0 = \begin{pmatrix}1\\1\\1\end{pmatrix},\quad L_0 = \begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix},\quad K_0 = \begin{pmatrix}1&1\\1&1\end{pmatrix},\quad K_1 = \begin{pmatrix}2&1&1\\1&1&1\\1&1&1\end{pmatrix},$$
$$P_0 = 2 + 2q + 2q^{-1} + q^2 + q^{-2},\quad P_1 = q + q^{-1} + 2q^2 + 2q^{-2} + q^3 + q^{-3},\quad P = -q^2.$$

Thus

$$\frac{\Theta(q) - q}{1 - q} = \frac{1 + q^{2k+2}}{1 + q^{2k+3}}.$$

From the above two formulae we deduce the Jones series of $D_{n+1}$ as

$$\frac{\Theta(q) - q}{1 - q} = \frac{1 + q^{n-1}}{1 + q^{n}}.$$

We use now the following identity:

$$\begin{aligned}
\frac{1 + q^{n-1}}{1 + q^{n}} &= \frac{(1 - q^{n})(1 + q^{n-1})}{(1 - q^{n})(1 + q^{n})}\\
&= \frac{1 - q^{n} + q^{n-1} - q^{2n-1}}{1 - q^{2n}}\\
&= \frac{2(1 - q^{2n-1}) - (1 + q^{n})(1 - q^{n-1})}{1 - q^{2n}}\\
&= 2\,\frac{1 - q^{2n-1}}{1 - q^{2n}} - \frac{1 - q^{n-1}}{1 - q^{n}}.
\end{aligned}$$
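The identity closing this computation is a statement about rational functions, so it can be spot-checked exactly at a rational point. A minimal sketch (the function names and the sample value of $q$ are illustrative assumptions):

```python
from fractions import Fraction

def lhs(q, n):
    """(1 + q^(n-1)) / (1 + q^n)."""
    return (1 + q**(n - 1)) / (1 + q**n)

def rhs(q, n):
    """2 (1 - q^(2n-1)) / (1 - q^(2n)) - (1 - q^(n-1)) / (1 - q^n)."""
    return 2 * (1 - q**(2 * n - 1)) / (1 - q**(2 * n)) - (1 - q**(n - 1)) / (1 - q**n)

q = Fraction(2, 5)  # arbitrary rational test point
assert all(lhs(q, n) == rhs(q, n) for n in range(2, 10))
```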


Multiplying by $1 - q$ and adding $q$ makes the Jones series $\Theta_m$ appear for $A_{m-1}$ computed in Theorem 3.1, and we obtain

$$\Theta(q) = 2\Theta_{2n}(q) - \Theta_n(q).$$

We compute now the Stieltjes transform of $\varepsilon$. Denote by $S_m$ the Stieltjes transform of the measure for $A_{m-1}$. Then

$$\begin{aligned}
2S(q) - 1 &= 2(2S_{2n}(q) - S_n(q)) - 1\\
&= 2(2S_{2n}(q) - 1) - (2S_n(q) - 1)\\
&= 2(\Theta_{2n}(q^2) - q^2) - (\Theta_n(q^2) - q^2)\\
&= (2\Theta_{2n}(q^2) - \Theta_n(q^2)) - q^2.
\end{aligned}$$

Thus we have $2S(q) - 1 = \Theta(q^2) - q^2$ as claimed.

Theorem 3.3. For $X = A_\infty$ the spectral measure is given by $d\varepsilon(u) = \alpha(u)\,du$, where $\alpha(u) = 2\,\mathrm{Im}(u)^2$, and $du$ is the uniform measure on the unit circle.

Proof. This follows from Theorems 3.1 or 3.2 with $n \to \infty$, or from a direct loop count.

4. Graphs with Fork Tails

Consider a sequence of graphs $X_k$ obtained by adding $D_{2k+2}$ tails to a given graph (compare with Sect. 3). We let

$$X_0 = {}^{1}_{2}\!>\!\alpha\,\cdots,\qquad X_1 = {}^{1}_{2}\!>\!\alpha - 3 - \beta\,\cdots,\qquad X_2 = {}^{1}_{2}\!>\!\alpha - 3 - \beta - 4 - \gamma\,\cdots.$$

As before we let $L_0$ be the matrix appearing on the top left of the square of the adjacency matrix of $X_0$, i.e. this square has the block form

$$\begin{pmatrix}L_0 & 0\\ 0 & N_0\end{pmatrix}.$$

Lemma 4.1. The Jones series of the graphs $X_k$ is given by

$$(\Theta(q) - q)(1 + q) = \frac{1 - Pq^{2k+1}}{1 + Pq^{2k}},$$

where $P$ is defined by

$$P = \frac{P_1 - q^{-1}P_0}{P_1 - qP_0},$$


where $P_i = P_i(y) = \det(y - J_i)$, $i = 0, 1$, with $y = 2 + q + q^{-1}$, and with $J_0$, $J_1$ being the following matrices:

– $J_0$ is obtained from $L_0$ by deleting the first two rows and columns.
– $J_1$ is obtained from $L_0$ by deleting the first row and column, then adding 1 to the first entry.

Proof. The matrix $M_k$ associated to the adjacency matrix of $X_k$ is described by a column vector $w$, and a matrix $M$, as follows (blank entries are zero):

$$M_0 = \begin{pmatrix}1&\\1&\\w&M\end{pmatrix},\quad
M_1 = \begin{pmatrix}1&&\\1&&\\1&1&\\&w&M\end{pmatrix},\quad
M_2 = \begin{pmatrix}1&&&\\1&&&\\1&1&&\\&1&1&\\&&w&M\end{pmatrix}.$$

The corresponding matrices $L_k = M_kM_k^t$ make the matrix $N = ww^t + MM^t$ appear and are given by

$$L_0 = \begin{pmatrix}1&1&w^t\\1&1&w^t\\w&w&N\end{pmatrix},\quad
L_1 = \begin{pmatrix}1&1&1&\\1&1&1&\\1&1&2&w^t\\&&w&N\end{pmatrix},\quad
L_2 = \begin{pmatrix}1&1&1&&\\1&1&1&&\\1&1&2&1&\\&&1&2&w^t\\&&&w&N\end{pmatrix}.$$

It is now clear what the form of the matrices $M_k$ and $L_k$ is for general $k$. Consider the matrix $K_k$, obtained from $L_k$ by deleting the first row and column, that is

$$K_0 = \begin{pmatrix}1&w^t\\w&N\end{pmatrix},\quad
K_1 = \begin{pmatrix}1&1&\\1&2&w^t\\&w&N\end{pmatrix},\quad
K_2 = \begin{pmatrix}1&1&&\\1&2&1&\\&1&2&w^t\\&&w&N\end{pmatrix},$$

and similarly for general $k$. Consider also the matrix $J_k$, obtained from $L_k$ by deleting the first two rows and columns, i.e.

$$J_0 = N,\quad
J_1 = \begin{pmatrix}2&w^t\\w&N\end{pmatrix},\quad
J_2 = \begin{pmatrix}2&1&\\1&2&w^t\\&w&N\end{pmatrix}.$$

We have then the following formula for the Poincaré series of $X_k$, with $y = z^{-1}$:

$$f_k(z) = \left(\frac{1}{1 - zL_k}\right)_{11} = \frac{\det(1 - zK_k)}{\det(1 - zL_k)} = y\cdot\frac{\det(y - K_k)}{\det(y - L_k)}.$$


The characteristic polynomials $P_k = \det(y - J_k)$, $Q_k = \det(y - K_k)$ and $R_k = \det(y - L_k)$ satisfy the following relations, obtained by developing the determinant at top left:

$$\begin{aligned}
P_{k+1} &= (y - 2)P_k - P_{k-1},\\
Q_k &= (y - 1)P_k - P_{k-1},\\
R_k &= (y - 1)Q_k - P_k - (y + 1)P_{k-1}.
\end{aligned}$$

The solutions of these equations can be written in terms of $P_+ = P_1 - qP_0$ and $P_- = P_1 - q^{-1}P_0$, namely

$$P_k = \frac{q^{-k}P_+ - q^{k}P_-}{q^{-1} - q},\qquad
Q_k = \frac{q^{-k}P_+ - q^{k+1}P_-}{1 - q},\qquad
R_k = \frac{q^{-k}P_+ + q^{k}P_-}{q/(1 + q)^2}.$$

We can compute now $f_k$ by using the variables $z = y^{-1} = q(1 + q)^{-2}$:

$$\begin{aligned}
f_k(z) &= \frac{(1 + q)^2}{q}\cdot\frac{Q_k}{R_k}\\
&= \frac{(1 + q)^2}{q}\cdot\frac{q/(1 + q)^2}{1 - q}\cdot\frac{q^{-k}P_+ - q^{k+1}P_-}{q^{-k}P_+ + q^{k}P_-}\\
&= \frac{1}{1 - q}\cdot\frac{1 - Pq^{2k+1}}{1 + Pq^{2k}}.
\end{aligned}$$

And finally we obtain the Jones series for $D_{n+1}$ as

$$\Theta_k(q) - q = \frac{1 - q}{1 + q}\,f_k(z) = \frac{1}{1 + q}\cdot\frac{1 - Pq^{2k+1}}{1 + Pq^{2k}}.$$

This proves the claim.

We proceed now with the computation of the spectral measure for the extended Coxeter-Dynkin graph $D_n^{(1)}$. The graph $D_n^{(1)}$, $n \ge 4$, has $n + 1$ vertices. It consists of the distinguished vertex 1, connected to a triple point, connected in turn to another vertex and to an A-tail ending at another triple point. This triple point is connected to two other vertices.

Theorem 4.1. Let $X$ be the extended Coxeter-Dynkin graph of type $D_{n+2}^{(1)}$, $n \ge 2$ ($n + 3$ vertices). The spectral measure of $X$ (on $\mathbb{T}$) is given by

$$d\varepsilon(u) = \frac{\delta_i + \delta_{-i}}{4} + \frac{d_nu}{2},$$

where $d_nu$ is the uniform measure on $2n$-th roots of unity and $\delta_w$ is the Dirac measure at $w \in \mathbb{T}$.


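Proof. We use Lemma 4.1 to compute the Jones series of $X_k = D_{2k+4}^{(1)}$ by letting

$$M_0 = \begin{pmatrix}1\\1\\1\\1\end{pmatrix},\quad
L_0 = \begin{pmatrix}1&1&1&1\\1&1&1&1\\1&1&1&1\\1&1&1&1\end{pmatrix},\quad
J_0 = \begin{pmatrix}1&1\\1&1\end{pmatrix},\quad
J_1 = \begin{pmatrix}2&1&1\\1&1&1\\1&1&1\end{pmatrix},$$
$$P_0 = 2 + 2q + 2q^{-1} + q^2 + q^{-2},\quad P_1 = q + q^{-1} + 2q^2 + 2q^{-2} + q^3 + q^{-3},\quad P = -q^2.$$

We obtain

$$(\Theta(q) - q)(1 + q) = \frac{1 + q^{2k+3}}{1 - q^{2k+2}}.$$

Similarly, Lemma 4.1 gives the Jones series of $X_k = D_{2k+5}^{(1)}$ by letting

$$M_0 = \begin{pmatrix}1&0&0\\1&0&0\\1&1&1\end{pmatrix},\quad
L_0 = \begin{pmatrix}1&1&1\\1&1&1\\1&1&3\end{pmatrix},\quad
J_0 = (3),\quad
J_1 = \begin{pmatrix}2&1\\1&3\end{pmatrix},$$
$$P_0 = -1 + q + q^{-1},\quad P_1 = 1 - q - q^{-1} + q^2 + q^{-2},\quad P = -q^3.$$

We obtain

$$(\Theta(q) - q)(1 + q) = \frac{1 + q^{2k+4}}{1 - q^{2k+3}}.$$

From the above two formulae we deduce that the Jones series for $D_{n+2}^{(1)}$ satisfies

$$(\Theta(q) - q)(1 + q) = \frac{1 + q^{n+1}}{1 - q^{n}}.$$

Hence we get the following explicit formula for the Jones series for $D_{n+2}^{(1)}$:

$$\begin{aligned}
\Theta(q) - q &= \frac{1}{1 + q}\cdot\frac{1 - q^{n} + q^{n} + q^{n+1}}{1 - q^{n}}\\
&= \frac{1}{1 + q}\left(1 + \frac{q^{n}(1 + q)}{1 - q^{n}}\right)\\
&= \frac{1}{1 + q} + \frac{q^{n}}{1 - q^{n}}\\
&= \frac{1}{1 + q} + \frac{1}{1 - q^{n}} - 1.
\end{aligned}$$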


We compute now the Stieltjes transform of the associated spectral measure $\varepsilon$:

$$2S(q) = \frac{1}{2}\left(\frac{1}{1 - qi} + \frac{1}{1 + qi}\right) + \int_{\mathbb{T}}\frac{1}{1 - qu}\,d_nu = \frac{1}{1 + q^2} + \frac{1}{1 - q^{2n}}.$$

Thus we have $2S(q) = \Theta(q^2) - q^2 + 1$, and we are done.

The spectral measure for the infinite Coxeter-Dynkin graph $D_\infty$ follows now. Recall that the graph $D_\infty$ has a triple point, connected to the distinguished vertex 1 and to another vertex, and to an $A_\infty$-tail.

Theorem 4.2. The spectral measure (on $\mathbb{T}$) of the Coxeter-Dynkin graph $D_\infty$ is given by

$$d\varepsilon(u) = \frac{\delta_i + \delta_{-i}}{4} + \frac{du}{2},$$

where $du$ is the uniform measure on the unit circle.

Proof. This follows from Theorem 4.1 by letting $n \to \infty$.

5. Exceptional Graphs

In this chapter we compute the spectral measures for the Coxeter-Dynkin graphs of type E. These graphs are

$$E_6 = F(2, 1, 2),\quad E_7 = F(2, 1, 3),\quad E_8 = F(2, 1, 4),$$
$$E_6^{(1)} = F(2, 2, 2),\quad E_7^{(1)} = F(3, 1, 3),\quad E_8^{(1)} = F(2, 1, 5).$$

Here we denote by $F(a, b, c)$ the graph with $a + b + c + 1$ vertices, consisting of a triple point which is connected to an $A_a$ tail, to an $A_b$ tail, and to an $A_c$ tail ending at the distinguished vertex 1. All of these graphs, with the exception of $E_7$, appear as principal graphs of subfactors with Jones index $\le 4$ (see for instance [4, 5]).

We use the following notation. Assume that $\varepsilon$ is a probability measure on the unit circle which is even, in the sense that all its odd moments are 0. The Stieltjes transform $S(q)$ of $\varepsilon$ is then a series in $q^2$, and the following definition makes sense.

Definition 5.1. The T series of an even measure $\varepsilon$ is given by

$$T(q) = \frac{2S(q^{1/2}) - 1}{1 - q},$$

where $S$ is the Stieltjes transform of $\varepsilon$.


It follows from Propositions 1.1 and 1.2 that the spectral measure of a graph is even, and that we have the formula

$$T(q) = \frac{\Theta(q) - q}{1 - q},$$

where $\Theta$ is the Jones series. In this section we compute the T series of exceptional graphs of type E.

Theorem 5.1. The T series of the Coxeter-Dynkin graphs $E_6$, $E_7$ and $E_8$ are given by the following formulae:

$$\begin{aligned}
T_6(q) &= \frac{(1 - q^{6})(1 - q^{8})}{(1 - q^{3})(1 - q^{12})},\\
T_7(q) &= \frac{(1 - q^{9})(1 - q^{12})}{(1 - q^{4})(1 - q^{18})},\\
T_8(q) &= \frac{(1 - q^{10})(1 - q^{15})(1 - q^{18})}{(1 - q^{5})(1 - q^{9})(1 - q^{30})}.
\end{aligned}$$

The proof of these results uses techniques from Sect. 3. We combine the proof with the proof of the next result.

Theorem 5.2. The T series of the extended Coxeter-Dynkin graphs $E_6^{(1)}$, $E_7^{(1)}$ and $E_8^{(1)}$ are given by the following formulae:

$$\begin{aligned}
T_6^{(1)}(q) &= \frac{1 - q^{12}}{(1 - q^{3})(1 - q^{4})(1 - q^{6})},\\
T_7^{(1)}(q) &= \frac{1 - q^{18}}{(1 - q^{4})(1 - q^{6})(1 - q^{9})},\\
T_8^{(1)}(q) &= \frac{1 - q^{30}}{(1 - q^{6})(1 - q^{10})(1 - q^{15})}.
\end{aligned}$$

Proof. We compute first the T series of $E_6$, $E_7$, $E_8$, $E_8^{(1)}$, which are all of the form $F(2, 1, n)$. We use Lemma 3.1 to compute the T series of $X_k = F(2, 1, 2k)$ by letting

$$M_0 = \begin{pmatrix}1&1\\0&1\end{pmatrix},\quad
L_0 = \begin{pmatrix}2&1\\1&1\end{pmatrix},\quad
K_0 = (1),\quad
K_1 = \begin{pmatrix}3&1\\1&1\end{pmatrix},$$
$$P_0 = 1 + q + q^{-1},\quad P_1 = q^2 + q^{-2},\quad P = -q\,\frac{1 + q - q^3}{1 - q^2 - q^3}.$$

We obtain

$$T(q) = \frac{(1 - q^2 - q^3) + q^{2k+1}(1 + q - q^3)}{(1 - q^2 - q^3) + q^{2k+2}(1 + q - q^3)}.$$


Similarly, Lemma 3.1 gives the T series of $X_k = F(2, 1, 2k + 1)$ by letting

$$M_0 = \begin{pmatrix}1&0\\1&0\\1&1\end{pmatrix},\quad
L_0 = \begin{pmatrix}1&1&1\\1&1&1\\1&1&2\end{pmatrix},\quad
K_0 = \begin{pmatrix}1&1\\1&2\end{pmatrix},\quad
K_1 = \begin{pmatrix}2&1&1\\1&1&1\\1&1&2\end{pmatrix},$$
$$P_0 = 1 + q + q^{-1} + q^2 + q^{-2},\quad P_1 = -1 + q^2 + q^{-2} + q^3 + q^{-3},\quad P = -q^2\,\frac{1 + q - q^3}{1 - q^2 - q^3}.$$

We obtain

$$T(q) = \frac{(1 - q^2 - q^3) + q^{2k+2}(1 + q - q^3)}{(1 - q^2 - q^3) + q^{2k+3}(1 + q - q^3)}.$$

To simplify this expression we introduce the following polynomials:

$$Q_n = (1 - q^2 - q^3) + q^{n}(1 + q - q^3).$$

From the above formulae we get then the T series of $F(2, 1, n)$ in terms of these polynomials as

$$T(q) = \frac{Q_{n+1}}{Q_{n+2}}.$$

The polynomials $Q_k$ needed for $E_6$, $E_7$, $E_8$, and $E_8^{(1)}$ are all cyclotomic:

$$\begin{aligned}
Q_3 &= \frac{(1 - q^{2})(1 - q^{8})}{1 - q^{4}},\\
Q_4 &= \frac{(1 - q^{2})(1 - q^{3})(1 - q^{12})}{(1 - q^{4})(1 - q^{6})},\\
Q_5 &= \frac{(1 - q^{2})(1 - q^{3})(1 - q^{18})}{(1 - q^{6})(1 - q^{9})},\\
Q_6 &= \frac{(1 - q^{2})(1 - q^{3})(1 - q^{5})(1 - q^{30})}{(1 - q^{6})(1 - q^{10})(1 - q^{15})},\\
Q_7 &= (1 - q^{2})(1 - q^{3})(1 - q^{5}).
\end{aligned}$$
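The five factorisations of $Q_3, \ldots, Q_7$ can be verified mechanically by cross-multiplying, which keeps everything inside integer polynomials. The sketch below represents a polynomial as a list of coefficients indexed by the power of $q$; the helper names are illustrative and not from the paper.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (index = power of q)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def product(exponents):
    """Coefficient list of the product of (1 - q^n) over the given exponents."""
    result = [1]
    for n in exponents:
        result = poly_mul(result, [1] + [0] * (n - 1) + [-1])
    return result

def Q(n):
    """Q_n = (1 - q^2 - q^3) + q^n (1 + q - q^3)."""
    coeffs = [0] * (n + 4)
    for power, c in [(0, 1), (2, -1), (3, -1), (n, 1), (n + 1, 1), (n + 3, -1)]:
        coeffs[power] += c
    return coeffs

# n -> (numerator exponents, denominator exponents) of the claimed factorisation
claimed = {
    3: ([2, 8], [4]),
    4: ([2, 3, 12], [4, 6]),
    5: ([2, 3, 18], [6, 9]),
    6: ([2, 3, 5, 30], [6, 10, 15]),
    7: ([2, 3, 5], []),
}

def factorisation_holds(n):
    """Check Q_n * (denominator) == (numerator) as integer polynomials."""
    num, den = claimed[n]
    return poly_mul(Q(n), product(den)) == product(num)

assert all(factorisation_holds(n) for n in claimed)
```

Taking quotients $Q_{n+1}/Q_{n+2}$ of these factorisations then yields the T series of Theorem 5.1 and the formula for $T_8^{(1)}$.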

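The formulae for the T series of $E_6$, $E_7$, $E_8$ and $E_8^{(1)}$ follow now.

We use again Lemma 3.1 to compute the T series of $X_k = F(2, 2, 2k)$ by letting

$$M_0 = \begin{pmatrix}1&1\\1&0\\0&1\end{pmatrix},\quad
L_0 = \begin{pmatrix}2&1&1\\1&1&0\\1&0&1\end{pmatrix},\quad
K_0 = \begin{pmatrix}1&0\\0&1\end{pmatrix},\quad
K_1 = \begin{pmatrix}3&1&1\\1&1&0\\1&0&1\end{pmatrix},$$
$$P_0 = 3 + 2q + 2q^{-1} + q^2 + q^{-2},\quad P_1 = -1 + q^2 + q^{-2} + q^3 + q^{-3},\quad P = -q\,\frac{1 + q - q^2}{1 - q - q^2}.$$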


We obtain

$$T(q) = \frac{(1 - q - q^2) + q^{2k+1}(1 + q - q^2)}{(1 - q - q^2) + q^{2k+2}(1 + q - q^2)}.$$

Factorising the numerator $N$ and denominator $D$ for $k = 1$, we get

$$\begin{aligned}
N &= 1 - q - q^2 + q^3 + q^4 - q^5 = (1 - q)(1 - q^2 + q^4),\\
D &= 1 - q - q^2 + q^4 + q^5 - q^6 = (1 - q)(1 - q^2)(1 - q^3).
\end{aligned}$$

Rewrite these expressions as

$$N = \frac{(1 - q)(1 - q^{2})(1 - q^{12})}{(1 - q^{4})(1 - q^{6})},\qquad D = (1 - q)(1 - q^{2})(1 - q^{3}),$$

and the formula for the T series of $E_6^{(1)}$ follows.

We use next Lemma 3.1 to compute the T series of $X_k = F(3, 1, 2k + 1)$ by letting

$$M_0 = \begin{pmatrix}1&0\\1&0\\1&1\\0&1\end{pmatrix},\quad
L_0 = \begin{pmatrix}1&1&1&0\\1&1&1&0\\1&1&2&1\\0&0&1&1\end{pmatrix},\quad
K_0 = \begin{pmatrix}1&1&0\\1&2&1\\0&1&1\end{pmatrix},\quad
K_1 = \begin{pmatrix}2&1&1&0\\1&1&1&0\\1&1&2&1\\0&0&1&1\end{pmatrix},$$
$$P_0 = 2 + 2q + 2q^{-1} + 2q^2 + 2q^{-2} + q^3 + q^{-3},$$
$$P_1 = -2 - q - q^{-1} + q^2 + q^{-2} + 2q^3 + 2q^{-3} + q^4 + q^{-4},$$
$$P = -q^2\,\frac{1 + q^2 - q^3}{1 - q - q^3}.$$

We obtain

$$T(q) = \frac{(1 - q - q^3) + q^{2k+2}(1 + q^2 - q^3)}{(1 - q - q^3) + q^{2k+3}(1 + q^2 - q^3)}.$$

Factorising the numerator $N$ and denominator $D$ for $k = 1$ gives

$$\begin{aligned}
N &= 1 - q - q^3 + q^4 + q^6 - q^7 = (1 - q)(1 - q^3 + q^6),\\
D &= 1 - q - q^3 + q^5 + q^7 - q^8 = (1 - q)(1 - q^3)(1 - q^4).
\end{aligned}$$

Rewrite these expressions as

$$N = \frac{(1 - q)(1 - q^{3})(1 - q^{18})}{(1 - q^{6})(1 - q^{9})},\qquad D = (1 - q)(1 - q^{3})(1 - q^{4}),$$

and the formula for the T series of $E_7^{(1)}$ follows.


6. Spectral Measures for Exceptional Graphs

We compute in this section the spectral measures of the exceptional graphs $E_6$ and $E_{6,7,8}^{(1)}$. We will express the T series computed in the previous section as linear combinations of elementary T series computed in the next two lemmas. The notation for the measures is as in Sects. 3 and 4.

Lemma 6.1. The T series of the measures $\alpha\,d_n$, $\alpha\,d'_n$, $d_n$, $d'_n$ are given by the following identities:

$$\begin{aligned}
T(\alpha\,d_n) &= \frac{1 - q^{n-1}}{1 - q^{n}},\\
T(\alpha\,d'_n) &= \frac{1 + q^{n-1}}{1 + q^{n}},\\
T(d_n) &= \frac{1}{1 - q}\cdot\frac{1 + q^{n}}{1 - q^{n}},\\
T(d'_n) &= \frac{1}{1 - q}\cdot\frac{1 - q^{n}}{1 + q^{n}}.
\end{aligned}$$

Proof. The first two identities appear in the proof of Theorems 3.1 and 3.2. For the third one we use that

$$\int_{\mathbb{T}} u^{k}\,d_nu = (2n|k),$$

where $(2n|k)$ is defined to be 1 when $2n$ divides $k$, and 0 otherwise. This gives the following identity for the Stieltjes transform:

$$S(q) = \sum_{s=0}^{\infty} q^{2ns} = \frac{1}{1 - q^{2n}}.$$

We can then compute the T series as in Definition 5.1 by

$$\begin{aligned}
T(d_n) &= \frac{2S(q^{1/2}) - 1}{1 - q}\\
&= \frac{1}{1 - q}\left(\frac{2}{1 - q^{n}} - 1\right)\\
&= \frac{1}{1 - q}\cdot\frac{1 + q^{n}}{1 - q^{n}}.
\end{aligned}$$

For the fourth identity in the lemma we use that $d'_n = 2d_{2n} - d_n$. Hence

$$\begin{aligned}
T(d'_n) &= 2T(d_{2n}) - T(d_n)\\
&= \frac{1}{1 - q}\left(2\cdot\frac{1 + q^{2n}}{1 - q^{2n}} - \frac{1 + q^{n}}{1 - q^{n}}\right)\\
&= \frac{1}{1 - q}\cdot\frac{1 - 2q^{n} + q^{2n}}{1 - q^{2n}}\\
&= \frac{1}{1 - q}\cdot\frac{(1 - q^{n})^2}{1 - q^{2n}}.
\end{aligned}$$

After simplification we obtain the formula in the statement.
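The relation $d'_n = 2d_{2n} - d_n$ used for the fourth identity can be seen at the level of point masses. The sketch below stores a measure on the $4n$-th roots of unity as a dictionary of exact weights indexed by the exponent $j$ of $e^{2\pi i j/4n}$; it assumes that "roots of odd order" means the roots with odd exponent $j$, and the function names are illustrative.

```python
from fractions import Fraction

def uniform_on_roots(order, ambient):
    """Weights of the uniform measure on the `order`-th roots of unity, indexed by
    exponents j of exp(2*pi*i*j/ambient); `order` must divide `ambient`."""
    step = ambient // order
    return {j: Fraction(1, order) if j % step == 0 else Fraction(0)
            for j in range(ambient)}

def d_prime(n):
    """2*d_{2n} - d_n, as weights on the 4n-th roots of unity."""
    d_2n = uniform_on_roots(4 * n, 4 * n)  # d_{2n}: uniform on 4n-th roots
    d_n = uniform_on_roots(2 * n, 4 * n)   # d_n: uniform on 2n-th roots
    return {j: 2 * d_2n[j] - d_n[j] for j in range(4 * n)}

n = 3
weights = d_prime(n)
# Mass 1/(2n) on each odd exponent, no mass on the even ones.
assert all(w == (Fraction(1, 2 * n) if j % 2 else 0) for j, w in weights.items())
```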


Theorem 6.1. The spectral measures of $E_{6,7,8}^{(1)}$ (on $\mathbb{T}$) are given by

$$\begin{aligned}
\varepsilon_6^{(1)} &= \alpha\,d_3 + (d_2 - d_3)/2,\\
\varepsilon_7^{(1)} &= \alpha\,d_4 + (d_3 - d_4)/2,\\
\varepsilon_8^{(1)} &= \alpha\,d_6 + (d_5 - d_6)/2,
\end{aligned}$$

where $\alpha(u) = 2\,\mathrm{Im}(u)^2$, and $d_nu$ is the uniform measure on $2n$-th roots of unity.

Proof. The T series in Theorem 5.2 can be written as

$$T_6^{(1)} = \frac{1 + q^{6}}{(1 - q^{3})(1 - q^{4})},\qquad
T_7^{(1)} = \frac{1 + q^{9}}{(1 - q^{4})(1 - q^{6})},\qquad
T_8^{(1)} = \frac{1 + q^{15}}{(1 - q^{6})(1 - q^{10})}.$$

Factoring by $1 + q^2$, $1 + q^3$ resp. $1 + q^5$ gives

$$T_6^{(1)} = \frac{1 - q^{2} + q^{4}}{(1 - q^{2})(1 - q^{3})},\qquad
T_7^{(1)} = \frac{1 - q^{3} + q^{6}}{(1 - q^{3})(1 - q^{4})},\qquad
T_8^{(1)} = \frac{1 - q^{5} + q^{10}}{(1 - q^{5})(1 - q^{6})}.$$

We get then the following formula, with $k = 2, 3, 5$ corresponding to $n = 6, 7, 8$:

$$T_n^{(1)} = \frac{1 - q^{k} + q^{2k}}{(1 - q^{k})(1 - q^{k+1})}.$$

We can rewrite this series in the following way:

$$\begin{aligned}
T_n^{(1)} &= \frac{1 - 2q^{k} + q^{2k}}{(1 - q^{k})(1 - q^{k+1})} + \frac{q^{k}}{(1 - q^{k})(1 - q^{k+1})}\\
&= \frac{1 - q^{k}}{1 - q^{k+1}} + \frac{1}{1 - q}\cdot\frac{q^{k} - q^{k+1}}{(1 - q^{k})(1 - q^{k+1})}\\
&= \frac{1 - q^{k}}{1 - q^{k+1}} + \frac{1}{1 - q}\left(\frac{1}{1 - q^{k}} - \frac{1}{1 - q^{k+1}}\right).
\end{aligned}$$

We can then write $T_n^{(1)}$ as a linear combination of the elementary T series from Lemma 6.1 as follows:

$$T_n^{(1)} = T(\alpha\,d_{k+1}) + (T(d_k) - T(d_{k+1}))/2.$$

By using linearity of the Stieltjes transform, hence of the T series, we get the formulae in the statement of the theorem.
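The decomposition of $T_n^{(1)}$ can be compared directly with the formulae of Theorem 5.2 at a rational test point. A minimal sketch using the elementary T series of Lemma 6.1 (the function names and the sample value of $q$ are illustrative assumptions):

```python
from fractions import Fraction

def T_alpha_d(q, n):
    """T series of alpha d_n (Lemma 6.1)."""
    return (1 - q**(n - 1)) / (1 - q**n)

def T_d(q, n):
    """T series of d_n (Lemma 6.1)."""
    return (1 + q**n) / ((1 - q) * (1 - q**n))

def T_extended(q, exps):
    """T series from Theorem 5.2: (1 - q^a) / ((1 - q^b)(1 - q^c)(1 - q^d))."""
    a, b, c, d = exps
    return (1 - q**a) / ((1 - q**b) * (1 - q**c) * (1 - q**d))

# k -> exponents (a, b, c, d) for the extended graphs of type E6, E7, E8
cases = {2: (12, 3, 4, 6), 3: (18, 4, 6, 9), 5: (30, 6, 10, 15)}

q = Fraction(1, 3)  # arbitrary rational test point
for k, exps in cases.items():
    combo = T_alpha_d(q, k + 1) + (T_d(q, k) - T_d(q, k + 1)) / 2
    assert combo == T_extended(q, exps)
```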


Theorem 6.2. The spectral measure of $E_6$ (on $\mathbb{T}$) is given by

$$\varepsilon_6 = \alpha\,d_{12} + (d_{12} - d_6 - d_4 + d_3)/2,$$

where $\alpha(u) = 2\,\mathrm{Im}(u)^2$, and $d_n$ is the uniform measure on $2n$-th roots of unity.

Proof. The T series of $E_6$ can be written as

$$T_6 = \frac{(1 + q^{3})(1 - q^{8})}{1 - q^{12}} = \frac{1 - q^{11}}{1 - q^{12}} + \frac{q^{3} - q^{8}}{1 - q^{12}}.$$

Note that we have the following identity:

$$\begin{aligned}
\frac{q^{3} - q^{8}}{1 - q^{12}} &= \frac{1}{1 - q}\cdot\frac{q^{3} - q^{4} - q^{8} + q^{9}}{1 - q^{12}}\\
&= \frac{1}{1 - q}\left(\frac{1}{1 - q^{12}} - \frac{1 + q^{6}}{1 - q^{12}} - \frac{1 + q^{4} + q^{8}}{1 - q^{12}} + \frac{1 + q^{3} + q^{6} + q^{9}}{1 - q^{12}}\right)\\
&= \frac{1}{1 - q}\left(\frac{1}{1 - q^{12}} - \frac{1}{1 - q^{6}} - \frac{1}{1 - q^{4}} + \frac{1}{1 - q^{3}}\right).
\end{aligned}$$

It follows now easily that $T_6$ can be written as a linear combination of the elementary T series in Lemma 6.1, namely

$$T_6 = T(\alpha\,d_{12}) + (T(d_{12}) - T(d_6) - T(d_4) + T(d_3))/2.$$

This gives the formula for the spectral measure $\varepsilon_6$ in the statement of the theorem.

7. Exceptional Measures: $E_7$ and $E_8$

Note that all spectral measures of the finite ADE graphs computed so far are linear combinations of measures of type $d_n$ and $\alpha\,d_n$ (observe that we have $d'_n = 2d_{2n} - d_n$).

Definition 7.1. A discrete measure supported by roots of unity is called cyclotomic if it is a linear combination of measures of type $d_n$, $n \ge 1$, and $\alpha\,d_n$, $n \ge 2$.

Note that we require $n \ge 2$ for the measure $\alpha\,d_n$. This is simply because $\alpha\,d_1$ is the null measure.

Theorem 7.1. The spectral measures of $E_7$, $E_8$ (on $\mathbb{T}$) are not cyclotomic.

Proof. From Theorem 5.1 we obtain the T series of $E_7$ as

$$T_7 = \frac{(1 - q^{9})(1 + q^{4} + q^{8})}{1 - q^{18}}.$$

This shows that the corresponding spectral measure $\varepsilon_7$ is supported by 36th roots of unity. Assume now that $\varepsilon_7$ is cyclotomic. Then

$$\varepsilon_7 \in \mathrm{span}\{d_n,\ \alpha\,d_m \mid n \ge 1,\ m \ge 2,\ n, m \mid 18\}.$$


Using the linearity of the T transform (with respect to the measure) this means

$$T_7 \in \mathrm{span}\{T(d_n),\ T(\alpha\,d_m) \mid n \ge 1,\ m \ge 2,\ n, m \mid 18\}.$$

We multiply the relevant T series by $(1 - q)(1 - q^{18})$, i.e. we consider the following degree 18 polynomials, where $n$, $m$ are as above:

$$P_n = (1 - q)(1 - q^{18})\,T(d_n),\qquad Q_m = (1 - q)(1 - q^{18})\,T(\alpha\,d_m),\qquad R_7 = (1 - q)(1 - q^{18})\,T_7.$$

The assumption that $\varepsilon_7$ is cyclotomic becomes then

$$R_7 \in \mathrm{span}\{P_n,\ Q_m \mid n \ge 1,\ m \ge 2,\ n, m \mid 18\}.$$

The above formula for $T_7$ and Lemma 6.1 lead to the following expressions:

$$P_n = (1 + q^{n})\cdot\frac{1 - q^{18}}{1 - q^{n}},\qquad
Q_m = (1 - q)(1 - q^{m-1})\cdot\frac{1 - q^{18}}{1 - q^{m}},\qquad
R_7 = (1 - q)(1 - q^{9})(1 + q^{4} + q^{8}).$$

The coefficients $c_k$ of each of these polynomials satisfy $c_k = c_{18-k}$, so in order to solve the system of linear equations resulting from our assumption, we can restrict attention to coefficients $c_k$ with $k = 0, 1, \ldots, 9$. Thus we have 10 equations, and the unknowns are the coefficients of $P_n$, $Q_m$ with $n \ge 1$, $m \ge 2$ and $n, m \mid 18$. The matrix of the system is given below. The rows correspond to the polynomials appearing, and the columns correspond to coefficients of $q^k$, with $k = 0, 1, \ldots, 9$:

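$$\begin{array}{l|rrrrrrrrrr}
 & c_0 & c_1 & c_2 & c_3 & c_4 & c_5 & c_6 & c_7 & c_8 & c_9\\ \hline
P_1 & 1 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2\\
P_2 & 1 & 0 & 2 & 0 & 2 & 0 & 2 & 0 & 2 & 0\\
P_3 & 1 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 2\\
P_6 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0\\
P_9 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2\\
P_{18} & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
Q_2 & 1 & -2 & 2 & -2 & 2 & -2 & 2 & -2 & 2 & -2\\
Q_3 & 1 & -1 & -1 & 2 & -1 & -1 & 2 & -1 & -1 & 2\\
Q_6 & 1 & -1 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0\\
Q_9 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 2\\
Q_{18} & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
R_7 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -2
\end{array}$$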

Comparing the $c_2$ and $c_4$ columns shows that this system of equations has no solution. This contradicts our assumption that $\varepsilon_7$ is cyclotomic.

The same method applies to $E_8$. From Theorem 5.1 we get the T series for $E_8$ as

$$T_8 = \frac{(1 + q^{5})(1 + q^{9})(1 - q^{15})}{1 - q^{30}}.$$


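Thus $\varepsilon_8$ is supported by 60th roots of unity. Now by using degree 30 polynomials $P_n$, $Q_m$ and $R_8$ defined as above, we get again a linear system of equations with the following matrix of coefficients:

$$\begin{array}{l|rrrrrrrrrrrrrrrr}
 & c_0 & c_1 & c_2 & c_3 & c_4 & c_5 & c_6 & c_7 & c_8 & c_9 & c_{10} & c_{11} & c_{12} & c_{13} & c_{14} & c_{15}\\ \hline
P_1 & 1 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2\\
P_2 & 1 & 0 & 2 & 0 & 2 & 0 & 2 & 0 & 2 & 0 & 2 & 0 & 2 & 0 & 2 & 0\\
P_3 & 1 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 2 & 0 & 0 & 2\\
P_5 & 1 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 2\\
P_6 & 1 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0\\
P_{10} & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 0 & 0\\
P_{15} & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2\\
P_{30} & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
Q_2 & 1 & -2 & 2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 & 2 & -2\\
Q_3 & 1 & -1 & -1 & 2 & -1 & -1 & 2 & -1 & -1 & 2 & -1 & -1 & 2 & -1 & -1 & 2\\
Q_5 & 1 & -1 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & -1 & 2\\
Q_6 & 1 & -1 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0\\
Q_{10} & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 2 & -1 & 0 & 0 & 0 & 0\\
Q_{15} & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 2\\
Q_{30} & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
R_8 & 1 & -1 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 1 & -2
\end{array}$$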
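Both coefficient matrices can be regenerated from the formulas for $P_n$, $Q_m$, $R_7$ and $R_8$, and the column comparisons used in the proof can be checked mechanically. The following is a minimal sketch with polynomials stored as coefficient lists; the helper names are illustrative and not from the paper.

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (index = power of q)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def two_term(n, sign):
    """Coefficient list of 1 + sign * q^n, for n >= 1."""
    return [1] + [0] * (n - 1) + [sign]

def geometric(n, N):
    """Coefficient list of (1 - q^N) / (1 - q^n) = 1 + q^n + ... + q^(N-n), for n | N."""
    return [1 if i % n == 0 else 0 for i in range(N - n + 1)]

def basis_rows(N):
    """Coefficient lists of P_n and Q_m for all divisors n, m of N (m >= 2)."""
    rows = {}
    for n in (d for d in range(1, N + 1) if N % d == 0):
        rows[f"P{n}"] = poly_mul(two_term(n, 1), geometric(n, N))
        if n >= 2:
            front = poly_mul(two_term(1, -1), two_term(n - 1, -1))  # (1 - q)(1 - q^(n-1))
            rows[f"Q{n}"] = poly_mul(front, geometric(n, N))
    return rows

def differing_rows(rows, i, j):
    """Names of the rows whose coefficients of q^i and q^j differ."""
    return sorted(name for name, r in rows.items() if r[i] != r[j])

# R_7 = (1 - q)(1 - q^9)(1 + q^4 + q^8)
R7 = poly_mul(poly_mul(two_term(1, -1), two_term(9, -1)), [1, 0, 0, 0, 1, 0, 0, 0, 1])
# R_8 = (1 - q)(1 - q^15)(1 + q^5)(1 + q^9)
R8 = poly_mul(poly_mul(two_term(1, -1), two_term(15, -1)),
              poly_mul(two_term(5, 1), two_term(9, 1)))

E7, E8 = basis_rows(18), basis_rows(30)

# E7: every P_n and Q_m has c2 == c4, but R7 does not.
assert differing_rows(E7, 2, 4) == [] and R7[2] != R7[4]
# E8: only Q5 separates c2 from c4 (where R8 agrees) and c6 from c12 (where R8 differs).
assert differing_rows(E8, 2, 4) == ["Q5"] and R8[2] == R8[4]
assert differing_rows(E8, 6, 12) == ["Q5"] and R8[6] != R8[12]
```

For $E_8$ the two checks are incompatible: the first forces the coefficient of $Q_5$ in any linear combination equal to $R_8$ to vanish, while the second forces it to be non-zero.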

Assume that $\varepsilon_8$ is cyclotomic. This means that $R_8$ appears as a linear combination of $P_n$, $Q_m$. Now, comparing the $c_2$ and $c_4$ columns shows that the coefficient of $Q_5$ must be zero, and comparing the $c_6$ and $c_{12}$ columns shows that the coefficient of $Q_5$ must be non-zero. Thus our assumption that $\varepsilon_8$ is cyclotomic is wrong.

Acknowledgements. Most of this paper was written while T.B. was visiting Vanderbilt University and D.B. was visiting the Université Paul Sabatier. The authors are grateful for the hospitality received at both institutions.

References

1. Banica, T., Collins, B.: Integration over compact quantum groups. http://arxiv.org/list/math.QA/0511253, 2005
2. Bisch, D.: Bimodules, higher relative commutants and the fusion algebra associated to a subfactor. Fields Inst. Comm. 13, 13–63 (1997)
3. Di Francesco, P.: Meander determinants. Commun. Math. Phys. 191, 543–583 (1998)
4. Evans, D., Kawahigashi, Y.: Quantum symmetries on operator algebras. Oxford: Oxford University Press, 1998
5. Goodman, F.M., de la Harpe, P., Jones, V.F.R.: Coxeter graphs and towers of algebras. Publ. MSRI 62, Berlin-Heidelberg-New York: Springer, 1989
6. Jones, V.F.R.: Index for subfactors. Invent. Math. 72, 1–25 (1983)
7. Jones, V.F.R.: Planar algebras I. http://arxiv.org/list/math.QA/9909027, 1999
8. Jones, V.F.R.: The annular structure of subfactors. Monogr. Enseign. Math. 38, 401–463 (2001)
9. Jones, V.F.R., Reznikoff, S.A.: Hilbert space representation of the annular Temperley-Lieb algebra. Preprint, available at http://math.berkeley.edu/~vfr/hilbertannular.pdf, 2003
10. Ocneanu, A.: Quantized groups, string algebras and Galois theory for algebras. London Math. Soc. Lect. Notes 136, 119–172 (1988)
11. Popa, S.: Classification of subfactors: the reduction to commuting squares. Invent. Math. 101, 19–43 (1990)
12. Popa, S.: Classification of amenable subfactors of type II. Acta Math. 172, 163–255 (1994)
13. Reznikoff, S.A.: Temperley-Lieb planar algebra modules arising from the ADE planar algebras. J. Funct. Anal. 228, 445–468 (2005)
14. Voiculescu, D.V., Dykema, K.J., Nica, A.: Free random variables. CRM Monograph Series 1, Providence, RI: AMS, 1993

Communicated by Y. Kawahigashi

Commun. Math. Phys. 269, 283–310 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0130-1

Communications in

Mathematical Physics

Convex Polytopes and Quasilattices from the Symplectic Viewpoint

Fiammetta Battaglia

Dipartimento di Matematica Applicata “G. Sansone”, Via S. Marta 3, 50139 Firenze, Italy. E-mail: [email protected]

Received: 5 May 2005 / Accepted: 18 May 2006
Published online: 4 November 2006 – © Springer-Verlag 2006

Abstract: We construct, for each convex polytope, possibly nonrational and nonsimple, a family of compact spaces that are stratified by quasifolds, i.e. each of these spaces is a collection of quasifolds glued together in a suitable way. A quasifold is a space locally modelled on $\mathbb{R}^k$ modulo the action of a discrete, possibly infinite, group. The way strata are glued to each other also involves the action of an (infinite) discrete group. Each stratified space is endowed with a symplectic structure and a moment mapping having the property that its image gives the original polytope back. These spaces may be viewed as a natural generalization of symplectic toric varieties to the nonrational setting.

Introduction

Consider a vector space $\mathfrak{d}$ of dimension $n$. In [D] Delzant shows that each $n$-dimensional simple convex polytope $\Delta \subset \mathfrak{d}^*$, rational with respect to a lattice in $\mathfrak{d}$ and satisfying an additional integrality condition, is the image of a symplectic toric manifold via moment mapping. The result is obtained by providing an explicit construction of such a manifold, which coincides in fact with the toric variety associated to the polytope and is naturally endowed with a symplectic structure whose symplectic volume is equal to the Euclidean volume of the polytope. In [P] Prato extends Delzant’s result to the nonrational setting, constructing, for each $n$-dimensional simple convex polytope $\Delta \subset \mathfrak{d}^*$, rational or not, a family of $2n$-dimensional compact symplectic spaces, called quasifolds, naturally associated to $\Delta$. Each space of the family admits the effective Hamiltonian action of an $n$-dimensional quasitorus $D$, with a moment mapping whose image is exactly $\Delta$. A quasifold is a topological space locally modelled on $\mathbb{R}^k$ modulo the action of a discrete, possibly infinite, group, and therefore it is not necessarily Hausdorff. A quasitorus is the natural analogue of a torus in this setting: it is the group and quasifold $\mathfrak{d}/Q$, where $Q$ is a quasilattice$^1$.

Partially supported by GNSAGA (CNR).

The idea underlying the present paper is that in order to extend these results to arbitrary convex polytopes, neither simple nor rational, quasifolds are still the natural structures but a further degree of singularity has to be allowed, just as in the case of classical toric varieties, in which nonsimplicity of the polytope brings in singularities which are not of finite group quotient type. We therefore introduce spaces stratified by quasifolds: nonsimplicity of the polytope causes the decomposition in strata of the corresponding topological spaces, whilst nonrationality of $\Delta$ produces the quasifold structure of the strata and intervenes in the way strata are glued to each other, leading to a definition of stratification that naturally extends the usual one.

More precisely we prove that to each $n$-dimensional convex polytope $\Delta \subset \mathfrak{d}^*$, there corresponds a family of compact spaces that are stratified by symplectic quasifolds. Different members of the family correspond, as in [P], to different choices of additional data attached to $\Delta$. Each space $M$ of the family admits the continuous effective action of an $n$-dimensional quasitorus $D$ and a continuous mapping $\Phi : M \longrightarrow \mathfrak{d}^*$ such that $\Phi(M) = \Delta$. The restriction of the $D$-action to each stratum is smooth and Hamiltonian, with moment mapping given by the restriction of $\Phi$. The proof is based on the explicit construction of $M$ as a symplectic quotient; the stratification of $M$ is the one naturally induced by the decomposition by isotropy type. In other words, it is the decomposition by singularity type and mirrors, as we shall see, the decomposition of the polytope in singular and nonsingular faces. Moreover, as in the rational case, the local structure of the stratification reflects the polytope shape.
If we restrict to the rational case we obtain, from the point of view of singularities, what is expected from the classical theory of toric varieties, and in addition we gain further insight into their symplectic geometry; it should also be noticed that these spaces provide a wide range of explicit examples of symplectic stratified spaces as defined in [SL] (further details in Remark 6.6). In the present paper, whose results have been announced, jointly with Prato, in [BP2], the spaces corresponding to arbitrary convex polytopes are described in the symplectic setting. They have a complex counterpart, strictly related to the theory of toric varieties (for further details cf. Remark 6.9; for the extension of the notion of toric variety to simple nonrational convex polytopes see [BP1]). This aspect will be treated in a subsequent paper [B].

Recent works by Karu and Bressler-Lunts deal with arbitrary convex polytopes from another viewpoint. In order to prove the Hard-Lefschetz theorem ([K]) and Hodge-Riemann bilinear relations ([BL]) for nonrational convex polytopes, they make use of a combinatorial construction, from the polytope data, of the intersection cohomology, thus bypassing the problem of the existence of a “variety” associated to $\Delta$. What we provide here is on the contrary an explicit construction of a geometric space, indeed of a whole family of geometric spaces, that naturally correspond to $\Delta$. The geometry and topology of our spaces and the relationship with properties of the corresponding polytope (volume, counting integer lattice points, Hard Lefschetz) are all very natural questions related to our work. A first step towards a better understanding of these different aspects, that will be pursued in the sequel, is to investigate cohomological properties of our spaces.

$^1$ Quasilattices are quasiperiodic structures underlying quasiperiodic tilings and quasicrystals’ atomic order. There is now an extremely rich literature on the subject, whose origin goes back to the eighties for quasicrystals [Sh et al.] (for an updated account see for example [AYP]), while aperiodic tilings were produced first in the sixties by R. Berger, followed by the works by Robinson [R] and Penrose [Pe].


The paper is organized as follows. After a preliminary section in which we recall the basics about quasifolds, we give in Sect. 2 the definition of space stratified by quasifolds. In Sect. 3 we explicitly construct, adopting the procedure introduced by Delzant in [D] and extended to the nonrational case by Prato in [P], a family of topological spaces naturally associated to a convex polytope $\Delta$. In the last two sections we describe the structure of these spaces, by proving that they are spaces stratified by symplectic quasifolds. To guide the exposition we provide two model examples worked out in detail.

1. Preliminaries about Quasifolds

In this section we recall basic definitions about quasifolds: for further details and for related notions, such as symplectic forms and Hamiltonian actions, we refer the reader to Prato’s article [P]. The local model for quasifolds is a manifold acted on diffeomorphically by a discrete group.

Definition 1.1 (Model). Let $\tilde{U}$ be a connected, simply connected manifold of dimension $k$ and let $\Gamma$ be a discrete group acting smoothly on $\tilde{U}$ so that the set of points, $\tilde{U}_0$, where the action is free, is connected and dense. Consider the space of orbits, $\tilde{U}/\Gamma$, of the action of the group $\Gamma$ on the manifold $\tilde{U}$, endowed with the quotient topology, and the canonical projection $p : \tilde{U} \to \tilde{U}/\Gamma$. A model of dimension $k$ is the triple $(\tilde{U}/\Gamma, p, \tilde{U})$, shortly $\tilde{U}/\Gamma$.

Remark 1.2. Nonclosed subgroups of Lie groups, for instance of tori, play a central role in our construction. Such groups are immersed Lie subgroups but of course they are not embedded. Most discrete groups we are dealing with are of this kind; more precisely, they are finitely generated nonclosed subgroups of Lie groups, and they are of course not discrete with the induced topology.

Remark 1.3. Let $(\tilde{U}/\Gamma, p, \tilde{U})$ be a triple with $\tilde{U}$ not necessarily simply connected and satisfying all other requirements of Definition 1.1. We can obtain a model from $(\tilde{U}/\Gamma, p, \tilde{U})$ by the following procedure: consider the universal cover, $\pi : U^\sharp \to \tilde{U}$, and its fundamental group, $\Pi$. The manifold $U^\sharp$ is connected and simply connected, the mapping $\pi$ is smooth, the discrete group $\Pi$ acts smoothly, freely and properly on the manifold $U^\sharp$ and $\tilde{U} = U^\sharp/\Pi$. Consider the extension of the group $\Gamma$ by the group $\Pi$,

$$1 \longrightarrow \Pi \longrightarrow \Lambda \longrightarrow \Gamma \longrightarrow 1,$$

defined as follows:

$$\Lambda = \{\lambda \in \mathrm{Diff}(U^\sharp) \mid \exists\, \gamma \in \Gamma \text{ s.t. } \pi(\lambda(u^\sharp)) = \gamma \cdot \pi(u^\sharp)\ \ \forall\, u^\sharp \in U^\sharp\}.$$

It is easy to verify that $\Lambda$ is a discrete group, that it acts on the manifold $U^\sharp$ according to the assumptions of Definition 1.1 and that $\tilde{U}/\Gamma = U^\sharp/\Lambda$.

Definition 1.4 (Submodel). Consider a model $(\tilde{U}/\Gamma, p, \tilde{U})$ and let $W$ be an open subset of $\tilde{U}/\Gamma$. We will say that $W$ is a submodel of $(\tilde{U}/\Gamma, p, \tilde{U})$, if $(W, p, p^{-1}(W))$ defines a model by means of Remark 1.3.

Definition 1.5 (Smooth mapping, diffeomorphisms of models). Given two models $(\tilde{U}/\Gamma, p, \tilde{U})$ and $(\tilde{W}/\Delta, q, \tilde{W})$, a mapping $f : \tilde{U}/\Gamma \longrightarrow \tilde{W}/\Delta$ is said to be smooth if there exists a smooth mapping $\tilde{f} : \tilde{U} \longrightarrow \tilde{W}$ such that $q \circ \tilde{f} = f \circ p$; we will say that $\tilde{f}$ is a lift of $f$. We will say that the smooth mapping $f$ is a diffeomorphism of models if it is bijective and if the lift $\tilde{f}$ is a diffeomorphism.


If the mapping f˜ is a lift of a smooth mapping of models f : U˜ / −→ W˜ / so are the mappings f˜γ (−) = f˜(γ · −), for all elements γ in and δ f˜(−) = δ · f˜(−), for all elements δ in . If the mapping f is a diffeomorphism, then these are the only other possible lifts and the groups and are isomorphic; for a proof see [P, orange and green lemmas]. Quasifolds are obtained by gluing together the models in the appropriate way: Definition 1.6 (Quasifold). A dimension k quasifold structure on a topological space M is the assignment of an atlas, or collection of charts, A = {(Uα , φα , U˜ α / α ) | α ∈ A} having the following properties: (1) The collection {Uα | α ∈ A} is an open cover of M; (2) For each index α in A the space U˜ α / α defines a model, where the set U˜ α is an open, connected, and simply connected subset of the space Rk , and the mapping φα is a homeomorphism of the space U˜ α / α onto the set Uα ; (3) For all indices α, β in A such that Uα ∩ Uβ = ∅, the sets φα−1 (Uα ∩ Uβ ) and φβ−1 (Uα ∩ Uβ ) are submodels of U˜ α / α and U˜ β / β respectively and the mapping gαβ = φβ−1 ◦ φα : φα−1 (Uα ∩ Uβ ) −→ φβ−1 (Uα ∩ Uβ ) is a diffeomorphism of models. We will then say that the mapping gαβ is a change of charts and that the corresponding charts are compatible; (4) The atlas A is maximal, that is: if the triple (U, φ, U˜ / ) satisfies Property (2) and is compatible with all the charts in A, then (U, φ, U˜ / ) belongs to A. Remark 1.7. To each point m ∈ M there corresponds a discrete group m defined as follows: take a chart (Uα , φα , U˜ α / α ) around m, then m is the isotropy group of α at any point u˜ ∈ U˜ α which projects down to m. One can check that this definition does not depend on the choice of the chart. Definition 1.8 (Submodel in a quasifold). An open subset W of M is a submodel in the quasifold M if there exists a chart (U, φ, U˜ / ) of M such that W ⊂ U and φ −1 (W ) is a submodel of U˜ / . Definition 1.9 (Smooth mapping, diffeomorphism). 
Let $M$ and $N$ be two quasifolds. A continuous mapping $f : M \to N$ is said to be smooth if there exists a chart $(U_\alpha, \phi_\alpha, \tilde U_\alpha/\Gamma_\alpha)$ around each point $m$ in the space $M$, a chart $(W_\alpha, \psi_\alpha, \tilde W_\alpha/\Delta_\alpha)$ around the point $f(m)$, and a smooth mapping of models $f_\alpha : \tilde U_\alpha/\Gamma_\alpha \to \tilde W_\alpha/\Delta_\alpha$ such that $\psi_\alpha \circ f_\alpha = f \circ \phi_\alpha$. If $f$ is bijective, and if each $f_\alpha$ is a diffeomorphism of models, we will say that $f$ is a diffeomorphism.

Definition 1.10 (Quasilattice, quasitorus). Let $\mathfrak d$ be a vector space of dimension $n$. A quasilattice in $\mathfrak d$ is the $\mathbb Z$-span, $Q$, of a set of $\mathbb R$-spanning vectors $X_1, \dots, X_d \in \mathfrak d$. We call a quasitorus of dimension $n$ the group and quasifold, covered by one chart, $D = \mathfrak d/Q$. Remark that $Q$ is a true lattice if and only if $\dim_{\mathbb Q} \operatorname{Span}_{\mathbb Q}\{X_1, \dots, X_d\} = n$. In this case $\mathfrak d/Q$ is a torus.
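The lattice criterion in Definition 1.10 can be illustrated in dimension one. The example below is not taken from the source; the choice of the vectors $1$, $\sqrt 2$ and $1/2$ is an assumption made only for illustration.

```latex
% Illustrative example (assumed data, not from the source): n = 1, d = 2.
\[
\mathfrak d = \mathbb R,\qquad X_1 = 1,\quad X_2 = \sqrt 2,\qquad
Q = \operatorname{Span}_{\mathbb Z}\{1,\sqrt 2\}
  = \{\,a + b\sqrt 2 \mid a, b \in \mathbb Z\,\}.
\]
% The rational span has dimension 2, not 1, so Q fails the lattice criterion.
\[
\dim_{\mathbb Q}\operatorname{Span}_{\mathbb Q}\{1,\sqrt 2\} = 2 \neq 1 = \dim \mathfrak d .
\]
% By contrast, X_1 = 1, X_2 = 1/2 gives a true lattice:
\[
\operatorname{Span}_{\mathbb Z}\{1,\tfrac12\} = \tfrac12\,\mathbb Z,\qquad
\dim_{\mathbb Q}\operatorname{Span}_{\mathbb Q}\{1,\tfrac12\} = 1 .
\]
```

In the first case $Q$ is dense in $\mathbb R$ and $\mathbb R/Q$ is a one-dimensional quasitorus that is not a torus; in the second case $\mathbb R/Q$ is a circle.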

Convex Polytopes and Quasilattices from the Symplectic Viewpoint


2. Stratifications by Quasifolds

We define the notion of space stratified by quasifolds in the generality we need for our purposes. For the general definition of stratification see [GM1].

Definition 2.1. Let $M$ be a topological space. A decomposition of $M$ by quasifolds is a collection of disjoint locally closed connected quasifolds $T_F$ ($F \in \mathcal F$), called pieces, such that
(1) The set $\mathcal F$ is finite and partially ordered;
(2) $M = \bigcup_{F \in \mathcal F} T_F$;
(3) $T_F \cap \overline{T_{F'}} \neq \emptyset$ if and only if $T_F \subseteq \overline{T_{F'}}$ if and only if $F \le F'$.
We also require that $\mathcal F$ has a maximal element and that the corresponding piece is open and dense in $M$. We call this piece the regular piece; the other pieces are called singular. We will then say that $M$ is an $n$-dimensional compact space decomposed by quasifolds, with $n$ the dimension of the regular set.

Remark 2.2. For the definition of stratification we need the following construction: let $L$ be a compact space decomposed by quasifolds; we will call cone over $L$, denoted by $C(L)$, the space $[0,1) \times L/\!\sim$, where two points $(t, l)$ and $(t', l')$ in $[0,1) \times L$ are equivalent if and only if $t = t' = 0$. This space is itself a space decomposed by quasifolds: for example when $L$ is a compact quasifold the space $C(L)$ decomposes into two pieces: one is the cone point, the other is given by the quasifold $(0,1) \times L$. A further construction to be considered is the following: let $t$ be a point in a quasifold $T$, $B \cong \tilde B/\Gamma$ a submodel in $T$ containing $t$, and $L$ a compact space decomposed by quasifolds. Notice first that the decomposition of $L$ induces a decomposition of the product $\tilde B \times C(L)$: to each piece $\mathcal L$ of $L$ there corresponds the piece $\tilde B \times (0,1) \times \mathcal L$; to cover the whole of $\tilde B \times C(L)$ we add a minimal piece, lying in the closure of all other pieces, given by $\tilde B$ times the cone point. Suppose, in addition, that $\Gamma$ acts freely on $\tilde B$ and that the space $L$ is endowed with an action of $\Gamma$ that preserves the decomposition. Then the product $\tilde B \times C(L)$ is acted on by $\Gamma$ and the quotient $(\tilde B \times C(L))/\Gamma$ inherits the decomposition of $\tilde B \times C(L)$ in pieces. Moreover the quotient $(\tilde B \times C(L))/\Gamma$ fibers over $B$ with fiber $C(L)$.

A stratification is a decomposition that is locally well behaved.

Definition 2.3. Let $M$ be an $n$-dimensional compact space decomposed by quasifolds. The decomposition of $M$ is said to be a stratification by quasifolds if each singular piece $T$, called stratum, satisfies the following conditions:
(i) let $r$ be the dimension of $T$; for every point $t \in T$ there exist: an open neighborhood $U$ of $t$ in $M$; a submodel $B \cong \tilde B/\Gamma$ in $T$ containing $t$ and such that $\Gamma$ acts freely on $\tilde B$; an $(n - r - 1)$-dimensional compact space $L$ decomposed by quasifolds, called the link of $t$; an action of the group $\Gamma$ on $L$, preserving the decomposition of $L$ and such that the pieces of the induced decomposition of $(\tilde B \times C(L))/\Gamma$ are quasifolds; finally a homeomorphism $h : (\tilde B \times C(L))/\Gamma \to U$ that respects the decompositions and takes each piece of $(\tilde B \times C(L))/\Gamma$ diffeomorphically into the corresponding piece of $U$;
(ii) the decomposition of the link $L$ satisfies condition (i).
The definition is recursive and, since the dimension of $L$ decreases at each step, we end up, after a finite number of steps, with links that are compact quasifolds.


Remark 2.4. Notice that, if the discrete groups $\Gamma$ are finite for any possible $F$, $t \in T_F$ and $B$, then the twisted products $(\tilde B \times C(L))/\Gamma$ become trivial and the singular strata turn out to be smooth manifolds, since the groups $\Gamma$ act freely. Therefore our stratification satisfies in this case the local triviality condition of the classical definition of stratification; moreover, strata are smooth, with the only possible exception of the principal stratum, which might be an orbifold. We shall learn from the examples how the twisting discrete groups $\Gamma$ arise naturally from the construction.

3. The Construction

Let $\mathfrak d$ be a real vector space of dimension $n$, and let $\Delta$ be a convex polytope of dimension $n$ in the dual space $\mathfrak d^*$. We want to associate to the polytope $\Delta$ a family of compact spaces that are suitably stratified by symplectic quasifolds. We construct these spaces as symplectic quotients, following the procedure which was first introduced by Delzant in [D] and then extended to nonrational simple convex polytopes by Prato in [P]. Write the polytope as
$$\Delta = \bigcap_{j=1}^{d} \{\mu \in \mathfrak d^* \mid \langle \mu, X_j\rangle \ge \lambda_j\} \tag{1}$$

for some elements $X_1, \dots, X_d$ in the vector space $\mathfrak d$ and some real numbers $\lambda_1, \dots, \lambda_d$. Let $Q$ be a quasilattice in the space $\mathfrak d$ containing the elements $X_j$ (for example the one that is generated by these elements, namely $\operatorname{Span}_{\mathbb Z}\{X_1, \dots, X_d\}$) and let $\{e_1, \dots, e_d\}$ denote the standard basis of $\mathbb R^d$; consider the surjective linear mapping
$$\pi : \mathbb R^d \to \mathfrak d, \qquad e_j \mapsto X_j. \tag{2}$$
Consider the $n$-dimensional quasitorus $\mathfrak d/Q$. The mapping $\pi$ induces a group homomorphism,
$$\Pi : T^d = \mathbb R^d/\mathbb Z^d \to \mathfrak d/Q. \tag{3}$$
We define $N$ to be the kernel of the mapping $\Pi$. The mapping $\Pi$ defines an isomorphism
$$T^d/N \to \mathfrak d/Q. \tag{4}$$

We construct a moment mapping for the Hamiltonian action of $N$ on $\mathbb C^d$. Consider the mapping
$$\Upsilon(z) = \sum_{j=1}^{d} \left(|z_j|^2 + \lambda_j\right) e_j^*,$$
where the $\lambda_j$'s are given in (1) and are uniquely determined by our choice of inward pointing normals to codimension 1 faces. The mapping $\Upsilon$ is a moment mapping for the standard action of $T^d$ on $\mathbb C^d$. Consider now the subgroup $N \subset T^d$ and the corresponding inclusion of Lie algebras $\iota : \mathfrak n \to \mathbb R^d$. The mapping $\Psi : \mathbb C^d \to \mathfrak n^*$ given by $\Psi = \iota^* \circ \Upsilon$ is a moment mapping for the induced action of $N$ on $\mathbb C^d$. We want to prove that the quotient $M = \Psi^{-1}(0)/N$, endowed with the quotient topology, is a space stratified by quasifolds. Notice that, by (3), the group $N$ is not closed in $T^d$ unless $Q$ is an honest lattice; moreover to each $\Delta$ there corresponds a whole family of quotients, given by all possible choices of normal vectors and of quasilattices $Q$ containing these vectors. Let


us recall that the polytope $\Delta$ is said to be rational if there exist a lattice $L$ and a choice of normals $X_j$ such that $X_j \in L$ for $j = 1, \dots, d$. Nonrational polytopes are for example the regular pentagon and Penrose's kite, whilst triangles are all rational. However we can associate to a rational polytope a "nonrational" space by making a nonrational choice of data attached to it. More precisely a choice of normals and quasilattice $Q$ is said to be rational if $Q$ is a true lattice; it is nonrational if the chosen quasilattice $Q$ is not a lattice. In our setting, in which the polytope can be nonsimple, the zero set $\Psi^{-1}(0)$ is not in general a smooth submanifold of $\mathbb R^{2d}$. Nonsimplicity of the polytope is responsible, as in the rational case, for the decomposition in strata of the quotient. To define the decomposition of $M$ in pieces we start by giving some further definitions on the polytope $\Delta$. Let us consider the open faces of $\Delta$. They can be described as follows. For each such face $F$ there exists a subset $I_F \subset \{1, \dots, d\}$ such that
$$F = \{\mu \in \Delta \mid \langle \mu, X_j\rangle = \lambda_j \text{ if and only if } j \in I_F\}. \tag{5}$$

The $n$-dimensional open face of $\Delta$, which we denote by $\operatorname{Int}(\Delta)$, corresponds to the empty subset. A partial order on the set of all faces of $\Delta$ is defined by setting $F \le F'$ (we say $F$ contained in $F'$) if $F \subseteq \overline{F'}$. The polytope $\Delta$ is the disjoint union of its faces. Let $r_F = \operatorname{card}(I_F)$; we have the following definitions:

Definition 3.1. A $p$-dimensional face $F$ of the polytope $\Delta$ is said to be singular if $r_F > n - p$, nonsingular if $r_F = n - p$.

Remark 3.2. Let $F$ be a $p$-dimensional singular face in $\mathfrak d^*$, then $p < n - 2$. For example: a polytope in $(\mathbb R^2)^*$ is simple; the singular faces of a nonsimple polytope in $(\mathbb R^3)^*$ must be 0-dimensional.

The following proposition is an adaptation of an analogous result in [G, P], to which we refer the reader for further details.

Proposition 3.3. The $n$-dimensional quasitorus $D = \mathfrak d/Q$ acts continuously and effectively on the topological space $M = \Psi^{-1}(0)/N$. Moreover $M$ is compact and a continuous mapping $\Phi : M \to \mathfrak d^*$ is defined such that $\Phi(M) = \Delta$.

Proof. Consider the exact sequence
$$0 \longrightarrow \mathfrak d^* \xrightarrow{\ \pi^*\ } (\mathbb R^d)^* \xrightarrow{\ \iota^*\ } \mathfrak n^* \longrightarrow 0. \tag{6}$$
By (6) we have that the mapping $(\pi^*)^{-1} \circ \Upsilon$ induces a continuous mapping from the quotient $M$ to $\mathfrak d^*$; we call this mapping $\Phi$. Now notice, by making use of the explicit expression of $\Upsilon$, that $z \in \Psi^{-1}(0)$ if and only if
$$|z_j|^2 = \langle \Phi([z]), X_j\rangle - \lambda_j, \qquad j = 1, \dots, d. \tag{7}$$
This implies that $\Phi(M) = \Delta$. Moreover properness of $\Upsilon$ implies that $M$ is compact. A continuous action
$$\tau : D \times M \to M \tag{8}$$
is defined via the isomorphism (4). This action is free on $\Phi^{-1}(\operatorname{Int}(\Delta))$.

Remark 3.4. An immediate consequence of the arguments used in the proof of Proposition 3.3 is that the mapping $\Phi$ induces a homeomorphism from the topological quotient $M/D$ onto $\Delta$.
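Relation (7) can be checked numerically on a small case. The sketch below is not from the source: it assumes the triangle in $(\mathbb R^2)^*$ with vertices $(0,0)$, $(1,0)$, $(0,1)$, normals $X_1 = (1,0)$, $X_2 = (0,1)$, $X_3 = (-1,-1)$ and $\lambda = (0,0,-1)$, so that the kernel of $\pi$ is spanned by $e_1 + e_2 + e_3$. The helper names `upsilon`, `psi`, `lift` and `phi` are hypothetical.

```python
import cmath
import math

# Assumed data (illustration only): a triangle in (R^2)^*.
NORMALS = [(1, 0), (0, 1), (-1, -1)]   # X_1, X_2, X_3
LAMBDAS = [0, 0, -1]                   # lambda_1, lambda_2, lambda_3
KERNEL = [(1, 1, 1)]                   # X_1 + X_2 + X_3 = 0 spans ker(pi)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def upsilon(z):
    """Components |z_j|^2 + lambda_j of the T^d moment mapping."""
    return [abs(zj) ** 2 + lam for zj, lam in zip(z, LAMBDAS)]


def psi(z):
    """Pair upsilon(z) with a basis of ker(pi), i.e. apply iota^*."""
    u = upsilon(z)
    return [dot(u, v) for v in KERNEL]


def lift(mu, phases):
    """Build z with |z_j|^2 = <mu, X_j> - lambda_j, as in relation (7)."""
    radii_sq = [dot(mu, X) - lam for X, lam in zip(NORMALS, LAMBDAS)]
    if min(radii_sq) < 0:
        raise ValueError("mu lies outside the polytope")
    return [cmath.rect(math.sqrt(r), t) for r, t in zip(radii_sq, phases)]


def phi(z):
    """Recover mu from z; valid here because X_1, X_2 are the standard basis."""
    u = upsilon(z)
    return (u[0], u[1])
```

Any point built by `lift` lies in the zero level set whatever phases are chosen, and `phi` returns the point of the triangle it was built from; a point such as `[1, 1, 1]` is off the level set.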


We introduce now two examples of nonsimple convex polytopes that have served as models for the whole construction. They are both rational polytopes with respect to the integer lattice $\mathbb Z^n \subset \mathbb R^n$, that is they both admit choices of normals contained in the integer lattice. Hence they admit rational and nonrational choices; the corresponding families of spaces include therefore spaces stratified by smooth manifolds/orbifolds and spaces stratified by quasifolds (cf. Remark 6.6). Our first example is a polytope in $(\mathbb R^3)^*$, the second one is a polytope in $(\mathbb R^4)^*$. This last example allows us to illustrate the features of our spaces to a greater extent, since it has singular faces of positive dimension. Both examples will be resumed in the last section.

Example 3.5 (Fig. 1). In $(\mathbb R^3)^*$ consider the pyramid given by the convex hull of $\nu = (0, 0, 1)$ (the apex), $\mu_1 = (1, 0, 0)$, $\mu_2 = (1, 1, 0)$, $\mu_3 = (0, 1, 0)$, $\mu_4 = (0, 0, 0)$. It is a nonsimple polytope with only one singular face: the apex $\nu$. We make the following choice of inward pointing normals:
$$\begin{array}{lll}
\mu_1, \mu_2, \nu & X_1 = (-1, 0, -1) & \lambda_1 = -1 \\
\mu_2, \mu_3, \nu & X_2 = (0, -p_2, -p_2) & \lambda_2 = -p_2 \\
\mu_3, \mu_4, \nu & X_3 = (1, 0, 0) & \lambda_3 = 0 \\
\mu_4, \mu_1, \nu & X_4 = (0, 1, 0) & \lambda_4 = 0 \\
\mu_1, \mu_2, \mu_3, \mu_4 & X_5 = (0, 0, p_5) & \lambda_5 = 0
\end{array} \tag{9}$$
with $p_j \in \mathbb R_{>0}$ for $j = 2, 5$. The $j$th item of the first column lists the vertices contained in the 1-codimensional affine space $\langle \mu, X_j\rangle = \lambda_j$. We choose the quasilattice $Q$ to be the one generated by $X_1, \dots, X_5$.

Fig. 1. Ex. 3.5 (vertex labels: ν, μ1, μ2, μ3, μ4)
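The index sets $I_\mu$ and the singularity test of Definition 3.1 can be read off table (9) mechanically. The following is a minimal sketch, assuming rational sample values for $p_2$ and $p_5$ (the source allows any positive reals, which exact arithmetic cannot represent); the function names are hypothetical.

```python
from fractions import Fraction

# Vertices of the pyramid of Example 3.5.
VERTICES = {
    "nu": (0, 0, 1),
    "mu1": (1, 0, 0),
    "mu2": (1, 1, 0),
    "mu3": (0, 1, 0),
    "mu4": (0, 0, 0),
}


def facet_data(p2=1, p5=1):
    """Normals X_j and constants lambda_j of table (9)."""
    normals = [(-1, 0, -1), (0, -p2, -p2), (1, 0, 0), (0, 1, 0), (0, 0, p5)]
    lambdas = [-1, -p2, 0, 0, 0]
    return normals, lambdas


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def index_sets(p2=1, p5=1):
    """Map each vertex to I_mu = {j : <mu, X_j> = lambda_j} (1-based)."""
    normals, lambdas = facet_data(p2, p5)
    result = {}
    for name, v in VERTICES.items():
        slack = [dot(v, X) - lam for X, lam in zip(normals, lambdas)]
        # Every vertex must satisfy all the inequalities in (1).
        assert all(s >= 0 for s in slack)
        result[name] = {j + 1 for j, s in enumerate(slack) if s == 0}
    return result


def singular_vertices(p2=1, p5=1, n=3):
    """Vertices (p = 0) with r_F > n - p, as in Definition 3.1."""
    return sorted(name for name, I in index_sets(p2, p5).items() if len(I) > n)
```

For instance `index_sets()["nu"]` is `{1, 2, 3, 4}`, so the apex has $r_\nu = 4 > 3$ and is the only singular vertex; the same holds for other positive rational values such as `singular_vertices(Fraction(3, 2), Fraction(1, 3))`.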

Example 3.6 (Fig. 2). In $(\mathbb R^4)^*$ consider the convex hull of $\nu_1 = (1, -1, 0, 0)$, $\nu_2 = (0, 0, 1, -1)$ and $\mu_1 = (1, 0, 0, 0)$, $\mu_2 = (0, 1, 0, 0)$, $\mu_3 = (0, 0, 1, 0)$, $\mu_4 = (0, 0, 0, 1)$. The resulting polytope is nonsimple, with nine 3-dimensional faces. It can be thought of as the 4-simplex in which the origin has been substituted by the edge $\nu_1\nu_2$. The singular faces are all of its vertices and some of its edges. We make the following choice of inward pointing normals:
$$\begin{array}{lll}
\nu_1, \nu_2, \mu_1, \mu_2 & X_1 = (0, 0, p_1, p_1) & \lambda_1 = 0 \\
\nu_1, \nu_2, \mu_3, \mu_4 & X_2 = (1, 1, 0, 0) & \lambda_2 = 0 \\
\nu_1, \nu_2, \mu_1, \mu_3 & X_3 = (-1, 0, -1, 0) & \lambda_3 = -1 \\
\nu_1, \nu_2, \mu_2, \mu_4 & X_4 = (2, 1, 2, 1) & \lambda_4 = 1 \\
\nu_2, \mu_1, \mu_2, \mu_3 & X_5 = (-p_5, -p_5, -p_5, 0) & \lambda_5 = -p_5 \\
\nu_1, \mu_1, \mu_2, \mu_4 & X_6 = (0, 0, 1, 0) & \lambda_6 = 0 \\
\nu_1, \mu_1, \mu_3, \mu_4 & X_7 = (-1, 0, -1, -1) & \lambda_7 = -1 \\
\nu_2, \mu_2, \mu_3, \mu_4 & X_8 = (p_8, 0, 0, 0) & \lambda_8 = 0 \\
\mu_1, \mu_2, \mu_3, \mu_4 & X_9 = (-1, -1, -1, -1) & \lambda_9 = -1
\end{array} \tag{10}$$
with $p_j \in \mathbb R_{>0}$ for $j = 1, 5, 8$. The $j$th item of the first column lists the vertices contained in the 1-codimensional affine space $\langle \mu, X_j\rangle = \lambda_j$. Thus we find that $r_\eta = 6$ for each vertex $\eta$ of the polytope. The singular edges are those with $r_F = 4$ (marked in red in the picture). For example: $I_{\nu_1} = \{1, 2, 3, 4, 6, 7\}$, $r_{\nu_1} = 6$; $I_{\nu_2} = \{1, 2, 3, 4, 5, 8\}$, $r_{\nu_2} = 6$; $I_{\nu_1\nu_2} = \{1, 2, 3, 4\}$, $r_{\nu_1\nu_2} = 4$, where $\nu_1\nu_2$ denotes the edge joining $\nu_1$ to $\nu_2$. We choose the quasilattice $Q$ to be the one generated by $X_1, \dots, X_9$.

Fig. 2. Ex. 3.6 (vertex labels: ν1, ν2, μ1, μ2, μ3, μ4)
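The counts quoted in Example 3.6 can be recomputed from table (10). The sketch below assumes the sample values $p_1 = p_5 = p_8 = 1$ and uses a standard face-lattice criterion that is not stated in the source: two vertices span an edge exactly when they are the only vertices lying on all the facets they share. The function names are hypothetical.

```python
from itertools import combinations

# Vertices of Example 3.6.
VERTICES = {
    "nu1": (1, -1, 0, 0), "nu2": (0, 0, 1, -1),
    "mu1": (1, 0, 0, 0), "mu2": (0, 1, 0, 0),
    "mu3": (0, 0, 1, 0), "mu4": (0, 0, 0, 1),
}
# Table (10) with the assumed sample values p1 = p5 = p8 = 1.
NORMALS = [
    (0, 0, 1, 1), (1, 1, 0, 0), (-1, 0, -1, 0), (2, 1, 2, 1),
    (-1, -1, -1, 0), (0, 0, 1, 0), (-1, 0, -1, -1), (1, 0, 0, 0),
    (-1, -1, -1, -1),
]
LAMBDAS = [0, 0, -1, 1, -1, 0, -1, 0, -1]
N_DIM = 4


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def index_sets():
    """Map each vertex to the 1-based set of facets that contain it."""
    sets = {}
    for name, v in VERTICES.items():
        slack = [dot(v, X) - lam for X, lam in zip(NORMALS, LAMBDAS)]
        assert min(slack) >= 0  # the vertex satisfies every inequality
        sets[name] = {j + 1 for j, s in enumerate(slack) if s == 0}
    return sets


def edges():
    """Pairs of vertices that are the only vertices on their common facets."""
    I = index_sets()
    found = {}
    for a, b in combinations(sorted(I), 2):
        common = I[a] & I[b]
        on_face = {c for c in I if common <= I[c]}
        if common and on_face == {a, b}:
            found[(a, b)] = common
    return found


def singular_edges():
    """Edges (p = 1) with r_F > n - p = 3, as in Definition 3.1."""
    return sorted(e for e, common in edges().items() if len(common) > N_DIM - 1)
```

With these sample values every vertex lies on six facets, and the edge joining `nu1` to `nu2` has common facet set `{1, 2, 3, 4}`, in agreement with the index sets listed in the example.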

4. Notation

We gather in this section the necessary notation to proceed with the proofs of the main results. Let $\mathbb K$ be one of the following sets: $\mathbb C$, $\mathbb C^*$, $\mathbb R$, $\mathbb R^*$, $\mathbb R_{\ge 0}$, $\mathbb R_{>0}$, all of them considered naturally immersed in $\mathbb C$. Let $J$ be a subset of $\{1, \dots, d\}$ and let $J^c$ be its complement. We denote by
$$\mathbb K^J = \{(z_1, \dots, z_d) \in \mathbb C^d \mid z_j \in \mathbb K \text{ if } j \in J,\ z_j = 0 \text{ if } j \notin J\}.$$
For example if $J = \emptyset$ then $\mathbb K^{J^c} = \{(z_1, \dots, z_d) \in \mathbb C^d \mid z_j \in \mathbb K\} = \mathbb K^d$. We have $\mathbb K^d = \mathbb K^J \times \mathbb K^{J^c}$. Let $z \in \mathbb K^d$; we denote by $z_J$ its projection onto the factor $\mathbb K^J$. The stabilizer of the $T^d$-action at any point in $(\mathbb C^*)^{J^c}$ is the torus
$$T^J = \{(a_1, \dots, a_d) \in T^d \mid a_j = 1 \text{ if } j \notin J\}$$
of Lie algebra $\mathbb R^J$. Let $F$ be a $p$-dimensional face and let $I_F$ be the corresponding set of indices. To lighten the notation we omit the $I$ and write $T^F$ for $T^{I_F}$, $\mathbb R^F$ for $\mathbb R^{I_F}$, $(\mathbb C^*)^{F^c}$ for $(\mathbb C^*)^{I_F^c}$, etc. Moreover we set
$$N^F = N \cap T^F \tag{11}$$
and
$$\mathfrak n^F = \mathfrak n \cap \mathbb R^F$$
its Lie algebra. Let $\mathfrak d_F = \pi(\mathbb R^F)$, where $\pi : \mathbb R^d \to \mathfrak d$ is the projection defined in (2); notice that $\mathfrak d_F \cap Q$ is a quasilattice. The subgroup $N^F$ of $N$ has dimension $(r_F - n + p)$; moreover $T^F/N^F \cong \mathfrak d_F/(\mathfrak d_F \cap Q)$. Now let $\mu$ be a vertex of $\Delta$. Define $\mathcal I_\mu$ to be the set of subsets $I$ of $I_\mu$ such that the set $\{X_j \mid j \in I\}$ is a basis of $\mathfrak d$. If $\mu$ is nonsingular then $\mathcal I_\mu$ contains just the element $I_\mu$. We denote by
$$\mathcal I = \bigcup_{\mu \text{ vertex of } \Delta} \mathcal I_\mu.$$
For each $I \in \mathcal I$ we have that the group $N \cap T^I$ is discrete, since $\pi(\mathbb R^I)$ is $n$-dimensional. We set
$$\Gamma_I = N \cap T^I, \quad \text{for } I \in \mathcal I. \tag{12}$$
Now let $F$ be a singular face of $\Delta$ of dimension $p > 0$ and let $\mu$ be a vertex contained in $F$, hence $I_F \subset I_\mu$. Take $I \in \mathcal I_\mu$ such that $\operatorname{card}(I \cap I_F) = (n - p)$. We define the discrete group
$$\Gamma_{I \cap I_F} = N^F \cap T^{I \cap I_F}. \tag{13}$$
Consider now the diagrams
$$\Gamma_I \hookrightarrow T^{I \cap I_F} \times T^{I \setminus (I \cap I_F)} \longrightarrow T^{I \cap I_F}, \tag{14}$$
$$\Gamma_I \hookrightarrow T^{I \cap I_F} \times T^{I \setminus (I \cap I_F)} \longrightarrow T^{I \setminus (I \cap I_F)}. \tag{15}$$

Define ˇ II ∩I F and ˇ I \(I ∩I F ) to be the images of the composition of mappings (14) and (15) respectively. The groups ˇ II ∩I F and ˇ I \(I ∩I F ) are discrete groups such that × ˇ I \(I ∩I F ) . Moreover the following group exact sequence is defined: I = ˇ I I ∩I F

I

,I

F 1 → I \I ∩I F −→ ˇ I \(I ∩I F ) −→ ˇ II ∩I F / I ∩I F −→ 1.

(16)

Remark 4.1. The discrete groups just defined have a crucial role in the sequel. Notice that they need not be finite, but they are so whenever Q is an honest lattice. They are trivial if, for example, we can choose inward pointing normals {X 1 , . . . , X d } such that: i) the quasilattice generated by {X 1 , . . . , X d } is a lattice L and we choose Q = L; ii) for each I ∈ I the set {X j , j ∈ I } is a basis for Q. In particular a Delzant polytope in a lattice L realizes the above condition if the X j ’s are taken to be primitive in L.


5. The Stratification

We are now ready to define the decomposition of $M$. The singular pieces are given by $T_F = \Phi^{-1}(F)$ with $F$ a singular face; the regular piece, which contains $\Phi^{-1}(\operatorname{Int}(\Delta))$, is given by $T_{\mathrm{reg}} = \bigcup_{F\ \mathrm{nonsing.}} \Phi^{-1}(F)$.

Remark 5.1. It is important to point out that for any face $F$ of $\Delta$ the set $\Phi^{-1}(F)$ is non-empty and is precisely given by $(\Psi^{-1}(0) \cap (\mathbb C^*)^{F^c})/N$; this follows easily from the proof of Proposition 3.3 (for further details cf. [G]). Remark 5.1 allows us to characterize the pieces $T_F$ in the standard way, by the isotropy group attached to each of them.

Remark 5.2. The regular piece $T_{\mathrm{reg}}$ is given by the quotient $\Psi^{-1}(0)_{\mathrm{reg}}/N$, where $\Psi^{-1}(0)_{\mathrm{reg}} = \bigcup_{F\ \mathrm{nonsing.}} (\Psi^{-1}(0) \cap (\mathbb C^*)^{F^c})$. Therefore the stabilizer of $N$ at each point of $\Psi^{-1}(0)_{\mathrm{reg}}$ is discrete and $\Psi^{-1}(0)_{\mathrm{reg}}$ is precisely the set of regular points of the moment mapping $\Psi$, while the stabilizer of $N$ corresponding to each singular piece $T_F$ is precisely the $(r_F - n + p)$-dimensional group $N^F$ defined in (11).

Theorem 5.3 (Quasifold structure of strata). The subset $T_F$ of $M$ corresponding to each $p$-dimensional singular face of $\Delta$ is a $2p$-dimensional quasifold. The subset $T_{\mathrm{reg}}$ is a $2n$-dimensional quasifold. These subsets give a decomposition by quasifolds of $M$.

Before giving the proof of this theorem, we prove two key lemmas; they ensure that the behavior of the group $N$ and its subgroups $N^F$ does not differ much from that of a torus; noncompactness can always be concentrated within a discrete group. The first lemma is the real version of [BP1, Lemma 2.3]; we recall the proof for completeness.

Lemma 5.4 (The group $N$). Let $\mu$ be a vertex of the polytope $\Delta$ and let $I \in \mathcal I_\mu$. Then we have that
(i) $T^d/T^I \cong N/\Gamma_I$;
(ii) $N = \Gamma_I \exp(\mathfrak n)$;
(iii) given any complement $\mathfrak s$ of $\mathbb R^I$ in $\mathbb R^d$, we have that $\mathfrak n = \{Y - \pi_I^{-1}(\pi(Y)) \mid Y \in \mathfrak s\}$.

Proof. (i) Consider the group homomorphism
$$\lambda_I : N \to T^d/T^I, \qquad n \mapsto [n].$$
Since $\mathfrak n$ and $\mathbb R^I$ are complementary, $\lambda_I$ is surjective. The kernel of $\lambda_I$ is given by $\Gamma_I$, therefore $\lambda_I$ induces an isomorphism $T^d/T^I \cong N/\Gamma_I$.
(ii) Every element in $N$ can be written in the form $\exp(X)$, where $X \in \mathbb R^d$ is such that $\pi(X) \in Q$. Write now $X = \big(X - \pi_I^{-1}(\pi(X))\big) + \pi_I^{-1}(\pi(X))$; it is easy to check that $X - \pi_I^{-1}(\pi(X)) \in \mathfrak n$, and that $\exp(\pi_I^{-1}(\pi(X))) \in \Gamma_I$. The group $\Gamma_I \cap \exp(\mathfrak n)$ is not necessarily trivial, so the decomposition is not necessarily unique.


(iii) Every element of the form $Y - \pi_I^{-1}(\pi(Y))$, with $Y \in \mathfrak s$, clearly belongs to $\mathfrak n$. Conversely, write every element $V \in \mathfrak n$ as $V = X + Y$ according to the decomposition $\mathbb R^d = \mathbb R^I \oplus \mathfrak s$, and notice that $\pi(V) = 0$ implies that $X = -\pi_I^{-1}(\pi(Y))$.

Lemma 5.5 (The group $N^F$). Let $F$ be a singular face of $\Delta$ and let $\mu$ be a vertex contained in $F$. For any choice of $I \in \mathcal I_\mu$ such that $\operatorname{card}(I \cap I_F) = (n - p)$ we have:
(i) $T^F/T^{(I \cap I_F)} \cong N^F/\Gamma_{I \cap I_F}$;
(ii) $N^F = \Gamma_{I \cap I_F} \exp(\mathfrak n^F)$;
(iii) $\mathfrak n^F = \{Y - \pi_{I \cap I_F}^{-1}(\pi_F(Y)) \mid Y \in \mathbb R^{I_F \setminus I \cap I_F}\}$.

Proof. See the proof of Lemma 5.4. Notice that also in this case the intersection $\Gamma_{I \cap I_F} \cap \exp(\mathfrak n^F)$ may not be trivial, so the decomposition need not be unique.

Remark 5.6. We exhibit here, by means of Lemmas 5.4 and 5.5, an explicit basis of $\mathfrak n$, which is very useful for explicit computations. Consider the flag in $\mathfrak n$,
$$\mathfrak n^F \subset \mathfrak n^\mu \subset \mathfrak n. \tag{17}$$
Let $A^I = (a^I_{ij})_{i \in I} \in M_{n \times d}$ be the matrix of the projection $\pi : \mathbb R^d \to \mathfrak d$ with respect to the standard basis of $\mathbb R^d$ and the basis $\{X_h, h \in I\}$ of $\mathfrak d$. When clear from the context we will omit the $I$'s and write simply $A = (a_{ij})$. A basis of $\mathfrak n$ adapted to the flag (17) is given by the following vectors:
$$e_k - \sum_{h \in I \cap I_F} a_{hk} e_h, \qquad k \in I_F \setminus I \cap I_F; \tag{18}$$
$$e_l - \sum_{h \in I} a_{hl} e_h, \quad l \in I_\mu \setminus (I \cup I_F); \qquad e_r - \sum_{h \in I} a_{hr} e_h, \quad r \notin I_\mu. \tag{19}$$
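The basis vectors of Remark 5.6 can be computed directly: the column $(a_{hk})_{h \in I}$ holds the coordinates of $X_k$ in the basis $\{X_h, h \in I\}$. The sketch below is not from the source; it ignores the ordering by the flag (17) and simply returns one vector $e_k - \sum_{h \in I} a_{hk} e_h$ for each $k \notin I$, using the pyramid of Example 3.5 with the assumed sample values $p_2 = p_5 = 1$. The helper names are hypothetical.

```python
from fractions import Fraction

# Normals of Example 3.5 with the assumed sample values p2 = p5 = 1.
PYRAMID_NORMALS = [(-1, 0, -1), (0, -1, -1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]


def solve(columns, target):
    """Solve sum_i c_i * columns[i] = target by Gauss-Jordan elimination
    over the rationals. Raises StopIteration if the columns are dependent."""
    n = len(target)
    rows = [[Fraction(columns[c][r]) for c in range(n)] + [Fraction(target[r])]
            for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]


def kernel_basis(normals, I):
    """Return {k: e_k - sum_{h in I} a_hk e_h} for every k outside I.
    I is a set of 1-based indices such that {X_h : h in I} is a basis."""
    d = len(normals)
    I = sorted(I)
    columns = [normals[h - 1] for h in I]
    basis = {}
    for k in range(1, d + 1):
        if k in I:
            continue
        a = solve(columns, normals[k - 1])  # coordinates a_hk of X_k
        vec = [Fraction(0)] * d
        vec[k - 1] = Fraction(1)
        for h, a_hk in zip(I, a):
            vec[h - 1] -= a_hk
        basis[k] = vec
    return basis


def project(normals, vec):
    """Apply pi: send sum_j vec_j e_j to sum_j vec_j X_j."""
    n = len(normals[0])
    return [sum(c * X[i] for c, X in zip(vec, normals)) for i in range(n)]
```

For the vertex $\mu_4$ of the pyramid, `kernel_basis(PYRAMID_NORMALS, {3, 4, 5})` returns the vectors $e_1 + e_3 + e_5$ and $e_2 + e_4 + e_5$, and `project` confirms that both lie in the kernel of $\pi$.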

Proof of Theorem 5.3. We have already observed in Remarks 5.1 and 5.2 that the regular and singular pieces are quotients of suitable subsets of $\Psi^{-1}(0)$ by the group $N$. In order to construct, for each piece, a quasifold atlas, we construct local slices for the corresponding subset of $\Psi^{-1}(0)$. This leads to the construction of local models. The crucial point is that the natural bijective mapping, from each local model thus obtained into the piece in consideration, is closed (cf. Step (I,c)). We prove the statement separately for the regular and singular pieces and divide the proof in steps, each with a title, in order to simplify the exposition and make it easier to refer to parts of the proof.

Part I. The regular stratum. The fact that $T_{\mathrm{reg}}$ is a $2n$-dimensional symplectic quasifold acted on by $D$ descends from the general result [P, Thm 3.1]. We need to give here an explicit proof.

(I,a) Construction of local models. In order to prove that $T_{\mathrm{reg}}$ is a $2n$-dimensional quasifold we construct a collection of charts covering $T_{\mathrm{reg}}$. For each $I \in \mathcal I$ consider the open subset $\widehat U_I$ of $\Psi^{-1}(0)_{\mathrm{reg}}$ defined by $\widehat U_I = \Psi^{-1}(0) \cap (\mathbb C^I \times (\mathbb C^*)^{I^c})$. The group $N$ acts on each $\widehat U_I$ with discrete stabilizer and the open subsets $\widehat U_I$ cover the whole regular set $\Psi^{-1}(0)_{\mathrm{reg}}$. Therefore the quotients $\widehat U_I/N$, that we denote by $U_I$, give an open covering of $T_{\mathrm{reg}}$. Consider now a vertex $\mu$ and an $I \in \mathcal I_\mu$. Let $\Gamma_I$ be the discrete group defined


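in (12). We want to prove that there exist an open subset $\tilde U_I \subset \mathbb C^n$ and a mapping $\phi_I : \tilde U_I/\Gamma_I \to U_I$ such that $\tilde U_I/\Gamma_I$ is a model, in the sense of Remark 1.3, and the mapping $\phi_I$ is a homeomorphism. The components of the moment mapping $\Psi$ with respect to the basis dual to the adapted basis given in (18,19) are as follows: the first $r_\mu - n$ components are given by
$$-\sum_{h \in I} a_{hk}\left(|z_h|^2 + \lambda_h\right) + \left(|z_k|^2 + \lambda_k\right) \tag{20}$$
with $k \in I_\mu \setminus I$. Since for each $k \in I_\mu \setminus I$ we have
$$\lambda_k = \langle \mu, X_k\rangle = \Big\langle \mu, \sum_{h \in I} a_{hk} X_h \Big\rangle = \sum_{h \in I} a_{hk} \langle \mu, X_h\rangle = \sum_{h \in I} a_{hk} \lambda_h, \tag{21}$$
the expression (20) reduces to
$$-\sum_{h \in I} a_{hk} |z_h|^2 + |z_k|^2 \tag{22}$$
with $k \in I_\mu \setminus I$. The remaining $d - r_\mu$ components are given by
$$-\sum_{h \in I} a_{hr}\left(|z_h|^2 + \lambda_h\right) + \left(|z_r|^2 + \lambda_r\right), \tag{23}$$
where $r \notin I_\mu$ and, by (21), $\big(\sum_{h \in I} a_{hr} \lambda_h\big) - \lambda_r > 0$. We define the set $\tilde U_I$ by
$$\tilde U_I = \Big\{u \in \mathbb C^I \ \Big|\ \sum_{h \in I} a_{hk} |u_h|^2 > 0 \text{ for } k \in I_\mu \setminus I,\ \ \sum_{h \in I} a_{hr} |u_h|^2 > -\Big(\Big(\sum_{h \in I} a_{hr} \lambda_h\Big) - \lambda_r\Big) \text{ for } r \notin I_\mu\Big\}.$$
The set $\tilde U_I$ is a nonempty open subset of $\mathbb C^I \cong \mathbb C^n$, star-shaped with respect to the origin; the quotient $\tilde U_I/\Gamma_I$ is a model, in the sense of Remark 1.3. We construct now the homeomorphism $\phi_I : \tilde U_I/\Gamma_I \to U_I$. Consider the mapping
$$F_I : \tilde U_I \to (\mathbb R_{>0})^{I^c}$$
defined as follows:
$$(F_I)_k(u) = 0, \quad k \in I; \qquad (F_I)_l(u) = \sqrt{\sum_{h \in I} a_{hl} |u_h|^2}, \quad l \in I_\mu \setminus I;$$
and
$$(F_I)_r(u) = \sqrt{\sum_{h \in I} a_{hr} |u_h|^2 + \sum_{h \in I} a_{hr} \lambda_h - \lambda_r}, \quad r \notin I_\mu.$$
Now define the following mapping:
$$\tilde U_I \to U_I, \qquad u \mapsto [u + F_I(u)]. \tag{24}$$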
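As a worked illustration of this construction (not taken from the source), consider the pyramid of Example 3.5 at the nonsingular vertex $\mu_4$, with $I = I_{\mu_4} = \{3, 4, 5\}$. The square roots in $F_I$ are assumed here because they are what the vanishing of (23) requires.

```latex
% Coordinates of X_1, X_2 in the basis X_3 = e_1, X_4 = e_2, X_5 = p_5 e_3:
\[
X_1 = -X_3 - \tfrac{1}{p_5} X_5, \qquad
X_2 = -p_2 X_4 - \tfrac{p_2}{p_5} X_5 ,
\]
\[
a_{31} = -1,\ a_{41} = 0,\ a_{51} = -\tfrac{1}{p_5}; \qquad
a_{32} = 0,\ a_{42} = -p_2,\ a_{52} = -\tfrac{p_2}{p_5}.
\]
% Since I_{mu_4} \setminus I is empty, only components of type (23) occur, with
\[
\sum_{h \in I} a_{h1}\lambda_h - \lambda_1 = 1 > 0, \qquad
\sum_{h \in I} a_{h2}\lambda_h - \lambda_2 = p_2 > 0 .
\]
% The chart domain and the slice map therefore read
\[
\tilde U_I = \Big\{ u \in \mathbb C^{\{3,4,5\}} \ \Big|\
|u_3|^2 + \tfrac{1}{p_5}|u_5|^2 < 1,\ \
|u_4|^2 + \tfrac{1}{p_5}|u_5|^2 < 1 \Big\},
\]
\[
(F_I)_1(u) = \sqrt{1 - |u_3|^2 - \tfrac{1}{p_5}|u_5|^2}, \qquad
(F_I)_2(u) = \sqrt{p_2\big(1 - |u_4|^2 - \tfrac{1}{p_5}|u_5|^2\big)} .
\]
```

These formulas agree with relation (7): for $\mu = (\mu_x, \mu_y, \mu_z)$ one has $|z_3|^2 = \mu_x$, $|z_4|^2 = \mu_y$, $|z_5|^2 = p_5 \mu_z$, and hence $|z_1|^2 = 1 - \mu_x - \mu_z$ and $|z_2|^2 = p_2(1 - \mu_y - \mu_z)$.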


We want to prove that the induced mapping $\phi_I$ from $\tilde U_I/\Gamma_I$ to $U_I$ is a homeomorphism. Observe first that the induced mapping $\phi_I$ is continuous, and injective by definition of $\Gamma_I$. It is also surjective; this can be proved as follows.

(I,b) Surjective mapping. In order to prove surjectivity of $\phi_I$ take any element $u + w \in \widehat U_I$ with $u \in \mathbb C^I$, $w \in (\mathbb C^*)^{I^c}$. By Lemma 5.4 we can choose an element $a \in N$ such that $a \cdot (u + w) = u' + w'$ with $u' \in \mathbb C^I$ and $w' \in (\mathbb R_{>0})^{I^c}$. Therefore $w' = F_I(u')$ and $\phi_I$ is surjective. In other words the image of the mapping
$$\tilde\phi_I : \tilde U_I \to \widehat U_I, \qquad u \mapsto u + F_I(u),$$
is a slice whose saturation is exactly $\widehat U_I$.

(I,c) Closed mapping. In order to prove that $\phi_I$ is closed, we need to check that $N(\tilde\phi_I(C))$ is closed for each $\Gamma_I$-invariant closed subset $C$ of $\tilde U_I$. Let $a_m(u_m + F_I(u_m))$ be a sequence in $N(\tilde\phi_I(C))$ converging to $x$, which necessarily belongs to $\Psi^{-1}(0)$. We prove that $x$ lies in $N(\tilde\phi_I(C))$. By Lemma 5.4 (the key fact here) the elements $a_m$ of $N$ can be decomposed as $g_m \exp(-\pi_I^{-1}(\pi(Y_m))) \exp(Y_m)$, for suitable elements $Y_m \in \mathbb R^{I^c}$ and $g_m \in \Gamma_I$. Remark now that the sequence $\exp(Y_m)$ is in the torus $T^{I^c}$, therefore it does admit a subsequence, that we call in the same way, converging to an element of the form $\exp(Y)$, with $Y \in \mathbb R^{I^c}$. We can now choose a sequence $Y'_m \in \mathbb R^{I^c}$ such that $\exp Y'_m = \exp Y_m$, for each $m$, with $Y'_m$ converging to $Y$. This leads to $a_m = k_m b_m$, with $k_m \in \Gamma_I$ and $b_m = \exp(-\pi_I^{-1}(\pi(Y'_m))) \exp(Y'_m)$ converging to $b = \exp(-\pi_I^{-1}(\pi(Y))) \exp(Y)$ by continuity. Therefore $a_m(u_m + F_I(u_m))$ becomes $b_m(u'_m + F_I(u'_m))$, where $u'_m = k_m u_m$; here we use the invariance of $F_I$ under the action of $\Gamma_I$. By continuity of the $T^d$-action this implies that $(u'_m + F_I(u'_m)) \to b^{-1}x$; in particular the sequence $u'_m$ is convergent and its limit $w$ is in $C$, since $C$ is closed. Therefore we can deduce that $x = b(w + F_I(w))$.

(I,d) Universal covering. Now we compute the fundamental group of $\tilde U_I$. Denote by
$$C_I = \Big\{\rho \in (\mathbb R_{\ge 0})^I \ \Big|\ \sum_{h \in I} a_{hk} \rho_h > 0 \text{ for } k \in I_\mu \setminus I,\ \ \sum_{h \in I} a_{hr} \rho_h > -\Big(\sum_{h \in I} a_{hr} \lambda_h - \lambda_r\Big) \text{ for } r \notin I_\mu\Big\}.$$
The set $C_I$ is an intersection of half-spaces, it is therefore convex. Denote by $\{\rho_h = 0\}$ the coordinate hyperplane $\{\rho \in \mathbb R^I \mid \rho_h = 0\}$. Let $I_* = \{h \in I \mid \{\rho_h = 0\} \cap C_I = \emptyset\}$ and let $\ell = \operatorname{card}(I_*)$; then $\pi_1(\tilde U_I) = \mathbb Z^\ell$. When $\ell = 0$, the quotient $\tilde U_I/\Gamma_I$ is a model; when $\ell > 0$ we construct a chart by taking the universal covering $U_I^\sharp$ of $\tilde U_I$ and the discrete group $\Gamma_I^\sharp$, extension of $\Gamma_I$ by $\pi_1(\tilde U_I)$, as explained in Remark 1.3. We thus obtain a model homeomorphic to $U_I$.

(I,e) Change of charts.
To prove that the charts constructed above are compatible we need to check that the changes of charts are diffeomorphisms of models; more precisely: consider two subsets $I$ and $J$ in $\mathcal I$. Suppose that the corresponding charts $U_I$ and $U_J$ have nonempty intersection. We want to prove that the mapping
$$\phi_J^{-1} \circ \phi_I : \phi_I^{-1}(U_I \cap U_J) \to \phi_J^{-1}(U_I \cap U_J)$$
is a diffeomorphism of models. For simplicity we consider the case in which $\tilde U_I$ and $\tilde U_J$ are both simply connected. We adapt the proof given in [BP1, Thm 2.2]. Let
$$\tilde W_I = \tilde U_I \cap \big(\mathbb C^{I \cap J} \times (\mathbb C^*)^{I \setminus (I \cap J)}\big)$$
and
$$\tilde W_J = \tilde U_J \cap \big(\mathbb C^{I \cap J} \times (\mathbb C^*)^{J \setminus (I \cap J)}\big).$$
Then $W_I = \tilde W_I/\Gamma_I$ is exactly $\phi_I^{-1}(U_I \cap U_J)$ and $W_J = \tilde W_J/\Gamma_J$ is exactly $\phi_J^{-1}(U_I \cap U_J)$, so they are submodels of $\tilde U_I/\Gamma_I$ and $\tilde U_J/\Gamma_J$ as required by the definition. In order to have simply connected open sets we pass to the universal coverings $W_I^\sharp$ and $W_J^\sharp$ of $\tilde W_I$ and $\tilde W_J$ respectively. We have
$$W_I^\sharp = \{(u, \rho, \theta) \in \mathbb C^{I \cap J} \times (\mathbb R_{>0})^{I \setminus (I \cap J)} \times \mathbb R^{I \setminus (I \cap J)} \mid (u, \sqrt{\rho} \exp(\theta)) \in \tilde U_I\}$$
and
$$W_J^\sharp = \{(u, \rho, \theta) \in \mathbb C^{I \cap J} \times (\mathbb R_{>0})^{J \setminus (I \cap J)} \times \mathbb R^{J \setminus (I \cap J)} \mid (u, \sqrt{\rho} \exp(\theta)) \in \tilde U_J\},$$
where $(\sqrt{\rho} \exp \theta)_j = \sqrt{\rho_j} \exp(\theta_j)$. These are simply connected open sets acted on by the discrete groups
$$\Gamma_I^\sharp = \{(\exp X, Y) \mid X \in \mathbb R^{I \cap J},\ Y \in \mathbb R^{I \setminus (I \cap J)},\ \pi(X + Y) \in Q\}$$
and
$$\Gamma_J^\sharp = \{(\exp X, Y) \mid X \in \mathbb R^{I \cap J},\ Y \in \mathbb R^{J \setminus (I \cap J)},\ \pi(X + Y) \in Q\}$$
in the following manner:
$$\Gamma_I^\sharp \times W_I^\sharp \to W_I^\sharp, \qquad ((\exp X, Y), (u, \rho, \theta)) \mapsto (\exp X \cdot u, \rho, \theta + Y),$$
analogously for the $\Gamma_J^\sharp$-action. Remark that the projections $W_I^\sharp \to \tilde W_I$ and $W_J^\sharp \to \tilde W_J$ induce homeomorphisms $W_I^\sharp/\Gamma_I^\sharp \cong W_I$ and $W_J^\sharp/\Gamma_J^\sharp \cong W_J$. We now exhibit an equivariant homeomorphism $g_{IJ}^\sharp$ that projects down to $g_{IJ}$. This is given by the following mapping:
$$g_{IJ}^\sharp : W_I^\sharp \to W_J^\sharp, \qquad (u, \rho, \theta) \mapsto \exp(\pi_J^{-1} \cdot \pi)(\theta) \cdot u + \big(F_I(u, \sqrt{\rho} \exp \theta)\big)_{(J \setminus I \cap J)}.$$
It is straightforward to check that this is a continuous, injective mapping between open subsets of $\mathbb C^n$ whose Jacobian matrix has rank $2n$ at every point, therefore $g_{IJ}^\sharp$ is a diffeomorphism. Now add all compatible charts to obtain a complete atlas. The key point that allows us to construct the lift $g_{IJ}^\sharp$ is the following: we can choose, for a point in $U_I \cap U_J$, two representatives, one in the slice corresponding to the open neighborhood

$\widehat U_I$ and one in the slice corresponding to the open neighborhood $\widehat U_J$. The mapping $g_{IJ}^\sharp$ expresses how to go from the first representative to the other by moving along the orbit that joins the two slices.

Part II. Singular strata. Singular strata are easier to describe as quasifolds, since each of them is covered by one chart. But let us first take care of vertices: for each singular vertex $\mu$ of $\Delta$, the stratum $T_\mu = (\Psi^{-1}(0) \cap (\mathbb C^*)^{\mu^c})/N^\mu$ is a point, since by Lemma 5.4 the group $N^\mu$ acts transitively on $\Psi^{-1}(0) \cap (\mathbb C^*)^{\mu^c}$. Now let $F$ be a singular face of $\Delta$ of dimension $p > 0$, let $\mu$ be a vertex contained in $F$ and let $I \in \mathcal I_\mu$ be such that $\operatorname{card}(I \cap I_F) = (n - p)$. The quotient
$$\big(\Psi^{-1}(0) \cap (\mathbb C^*)^{F^c}\big)/N$$
is a quasifold covered by one chart, defined in the following way. Consider the open subset of $(\mathbb C^*)^p$:
$$\check U_{F,I} = \Big\{w \in (\mathbb C^*)^{I \setminus (I \cap I_F)} \cong (\mathbb C^*)^p \ \Big|\ \sum_{h \in I \setminus (I \cap I_F)} a_{hl} |w_h|^2 > 0 \text{ for } l \in I_\mu \setminus (I \cup I_F),\ \ \sum_{h \in I \cap I_F} a_{hr} \lambda_h + \sum_{h \in I \setminus (I \cap I_F)} a_{hr}\left(|w_h|^2 + \lambda_h\right) - \lambda_r > 0 \text{ for } r \notin I_\mu\Big\},$$
and the mapping
$$\check U_{F,I} \to T_F, \qquad w \mapsto [w + \check\phi_{F,I}(w)] \tag{25}$$
with
$$\check\phi_{F,I}(w)_k = 0, \quad k \in I \cup I_F; \qquad \check\phi_{F,I}(w)_l = \sqrt{\sum_{h \in I \setminus (I \cap I_F)} a_{hl} |w_h|^2}, \quad l \in I_\mu \setminus (I \cup I_F);$$
and
$$\check\phi_{F,I}(w)_r = \sqrt{\sum_{h \in I \cap I_F} a_{hr} \lambda_h + \sum_{h \in I \setminus (I \cap I_F)} a_{hr}\left(|w_h|^2 + \lambda_h\right) - \lambda_r}, \quad r \notin I_\mu.$$
The quotient $\check U_{F,I}/\check\Gamma_{I \setminus (I \cap I_F)}$ is a model, in the sense of Remark 1.3: the induced mapping from $\check U_{F,I}/\check\Gamma_{I \setminus (I \cap I_F)}$ to $T_F$ is continuous; it is injective since $N \cap T^{(I \cup I_F)} = \Gamma_I N^F$. It is also surjective and closed: this can be easily proved by applying the same arguments of Part I, Steps (I,b) and (I,c). The open set $\check U_{F,I}$ is not simply connected, therefore, as we have done for some of the charts of the regular piece, we have to consider its universal covering in order to have a proper chart. We then add all compatible charts in order to obtain a complete atlas. In particular the charts corresponding to those $J$'s in $\mathcal I$ satisfying the hypothesis of Lemma 5.5 are compatible.

Part III. The decomposition. It remains to check that the described pieces, indexed by the singular faces of $\Delta$ plus the open face, give a decomposition of $M$ according to Definition 2.1. It is easy to verify that all pieces are locally closed and connected. The set of indexes certainly satisfies point (1) of Definition 2.1. Point (2) follows from Proposition 3.3, while point (3) is a consequence of Remark 5.1. Moreover the regular piece is open since the set of regular points in $\Psi^{-1}(0)$ is open; it is also dense since it contains the dense set $\Phi^{-1}(\operatorname{Int}(\Delta))$.

Remark 5.7. Each point $m$ in the space $M$ lies in a stratum, therefore it follows from Remark 1.7 that there is a well defined discrete group $\Gamma_m$ attached to it.

Remark 5.8. A singular face has at most dimension $n - 3$, therefore a singular piece has at most dimension $2n - 6$.


Remark 5.9. The decomposition of $M$ is induced by the decomposition of $\Psi^{-1}(0)$ given by the manifolds $\Psi^{-1}(0) \cap (\mathbb C^*)^{F^c}$, with $F$ singular, and the open subset $\Psi^{-1}(0)_{\mathrm{reg}}$ of $\Psi^{-1}(0)$. The quasifold structure of each piece $T_F$ is naturally induced by the smooth structure of $\Psi^{-1}(0) \cap (\mathbb C^*)^{F^c}$; the quasifold structure of $T_{\mathrm{reg}}$ is induced by the smooth structure of $\Psi^{-1}(0)_{\mathrm{reg}}$. Let $p : \Psi^{-1}(0) \to M$ be the projection; we have the following:

Theorem 5.10 (Symplectic structure of strata). Each piece $T_F$ ($T_{\mathrm{reg}}$) of the decomposition of $M$ has a natural symplectic structure induced by the quotient procedure, that is, its pull-back via $p$ coincides with the restriction of the standard symplectic form of $\mathbb C^d$ to the manifold $\Psi^{-1}(0) \cap (\mathbb C^*)^{F^c}$ ($\Psi^{-1}(0)_{\mathrm{reg}}$).

Proof. Consider the regular piece. As in the classical reduction procedure, the standard symplectic structure of $\mathbb C^d$ induces a symplectic structure on each slice, and therefore, via pullback, a symplectic structure, $\Gamma_I$ ($\Gamma_I^\sharp$)-invariant, on each open subset $\tilde U_I \subset \mathbb C^n$ ($U_I^\sharp \subset \mathbb C^n$). The structure induced is the standard one and respects the changes of charts, thus defining a symplectic structure on the quasifold $T_{\mathrm{reg}}$. The proof for the singular pieces goes in the same way.

Theorem 5.11 (Quasitorus action on $M$). The restriction of the $D$-action and of the mapping $\Phi$ to each piece of the space $M$ is smooth, the action of $D$ is Hamiltonian and a moment mapping is given by the restriction of $\Phi$.

Proof. We refer to [P] for the definition of Hamiltonian action of a quasitorus on a quasifold and for the definition of moment mapping with respect to this action. To prove that the action $\tau$ defined in (8) is smooth and Hamiltonian we have to prove that it is so when lifted to local models. From Proposition 3.3 it then follows that the restriction of $\Phi$ to each stratum is a moment mapping. We consider first the regular stratum $T_{\mathrm{reg}}$. For each $I \in \mathcal I$, we have that the following diagram commutes:
$$\begin{array}{ccc}
\mathfrak d \times \tilde U_I & \xrightarrow{\ \tilde\tau_I\ } & \tilde U_I \\
(X, u) & \longmapsto & \exp(\pi_I^{-1}(X)) \cdot u \\
\downarrow & & \downarrow \\
(\mathfrak d/Q) \times (\tilde U_I/\Gamma_I) & \xrightarrow{\ \tau_I\ } & \tilde U_I/\Gamma_I \\
([X], [u]) & \longmapsto & [\exp(\pi_I^{-1}(X)) \cdot u] \\
\downarrow & & \downarrow \\
(\mathfrak d/Q) \times U_I & \xrightarrow{\ \tau\ } & U_I = \widehat U_I/N \\
([X], \phi_I([u])) & \longmapsto & [\exp(\pi_I^{-1}(X)) \cdot u + F_I(u)]
\end{array}$$

moreover τ˜I is a smooth mapping and the action is Hamiltonian with respect to the standard symplectic form on U˜ I . For each singular piece T F we proceed in the same manner; consider the diagram

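\[
\begin{array}{ccc}
d \times \check U_{F,I} & \xrightarrow{\ \tilde\tau_{F,I}\ } & \check U_{F,I} \\
(X, w) & \longmapsto & \exp(\pi_I^{-1}(X))_1 \cdot w \\
\downarrow & & \downarrow \\
(d/Q) \times (\check U_{F,I}/\check\Gamma_{I \setminus (I \cap I_F)}) & \xrightarrow{\ \tau_{F,I}\ } & \check U_{F,I}/\check\Gamma_{I \setminus (I \cap I_F)} \\
([X], [w]) & \longmapsto & [\exp(\pi_I^{-1}(X))_1 \cdot w] \\
\downarrow & & \downarrow \\
(d/Q) \times T_F & \xrightarrow{\ \tau\ } & T_F \\
([X], [w + \check\phi_{F,I}(w)]) & \longmapsto & [\exp(\pi_I^{-1}(X))_1 \cdot w + \check\phi_{F,I}(w)]
\end{array}
\]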

where exp(π I−1 (X ))1 stands for the first factor of exp(π I−1 (X )) in the decomposition T I = T I \I ∩I F × T I ∩I F . The diagram is commutative, the mapping τ˜F,I is smooth and the action is Hamiltonian with respect to the standard symplectic form on Uˇ F,I . 6. The Stratification: Local Structure Now we need to prove that our decomposition has good local behavior. Let t0 be a point in the singular 2 p-dimensional piece T F ; we want to construct a link of t0 satisfying Definition 2.3. Let C F be the complexification of the Lie algebra R F . The mapping ϒ restricted to C F gives rise a moment mapping ϒ F for the action of T F on C F . The to ∗ F F ∗ 2 mapping ϒ F : C −→ R is given by ϒ F (z) = j∈I F (|z j | + λ j )e j . Consider now the Hamiltonian action of the (r F −n+ p)-dimensional group N F on C F , induced by that of T F : a moment mapping is then given by F = ι∗F ◦ ϒ F , where ι F : n F −→ R F is the inclusion map. Using (5) we find that F (z) = j∈I F |z j |2 ι∗F (e∗j ), hence F−1 (0) is a cone. Let j F : d F −→ d be the inclusion map. Consider the exact sequence π F∗

ι∗F

0 −→ (d F )∗ −→ (R F )∗ −→ (n F )∗ −→ 0, where π F is the restriction to R F of the projection π : Rd −→ d. Define F = ∗ j∈I F {μ ∈ d | μ, X j ≥ λ j }. By repeating the argument of Proposition 3.3 a continuous surjective mapping F : F−1 (0)/N F −→ j F∗ ( F ) is defined by setting F ([z]) = (π F∗ )−1 (ϒ F (z)). Remark 6.1. We enumerate here properties of F that we need in the sequel. (1) F = j F∗ ( F ) is an (n − p)-dimensional cone with vertex j F∗ (F); (2) if G is a q-dimensional face of containing F, then j F∗ (G) is a (q − p)-dimensional face of F ; (3) for each j ∈ I F we can find b j ∈ (0, 1] such that, taken Y = j∈I F b j X j , the ∗ intersection F ∩ {ξ ∈ d F | ξ, Y = j∈I F λ j b j + } is a nonempty convex polytope, F, , of dimension (n − p − 1) ( ∈ R>0 ); (4) let G be a q-dimensional face of properly containing F, then G F = j F∗ (G)∩{ξ ∈ ∗ d F | ξ, Y = j∈I F λ j b j +} is a (q − p−1)-dimensional face of F, . Moreover G F is singular in F, if and only if G is singular in .


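Fix $Y \in d_F$ as specified in the previous remark; then the following lemma holds:

Lemma 6.2 (Local structure). For each $t_0$ in the singular piece $T_F$, the space $L_{F,\epsilon} = \Phi_F^{-1}(\Delta_{F,\epsilon})$ satisfies the first point of Definition 2.3 for a suitable $\epsilon$.

Proof. Let
\[
S_{F,\epsilon} = \Big\{ z \in \mathbb{C}^F \;\Big|\; \sum_{j \in I_F} b_j |z_j|^2 = \epsilon \Big\}
\]
and let
\[
(\Psi_F^{-1}(0))_\epsilon = \Psi_F^{-1}(0) \cap \Big\{ z \in \mathbb{C}^F \;\Big|\; \sum_{j \in I_F} b_j |z_j|^2 < \epsilon \Big\}.
\]
Notice that $L_{F,\epsilon}$ is nothing but the quotient $(\Psi_F^{-1}(0) \cap S_{F,\epsilon})/N^F$ and therefore $(\Psi_F^{-1}(0))_\epsilon/N^F = C(L_{F,\epsilon})$.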
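The cone structure used here rests on a scaling argument that can be checked numerically before passing to the quotient by the group. The following Python sketch is only an illustration under stated assumptions: it borrows the single quadratic constraint and the weights b = (1, 1, 1, p2) from Example 3.5, takes p2 = √2 as an arbitrary nonrational sample value, and the helper names (`constraint`, `weight`, `random_zero_point`, `cone_coordinates`) are ours and do not appear in the paper. The group action itself is not modelled.

```python
import cmath
import math
import random

P2 = math.sqrt(2)            # assumed sample value of p2 (nonrational)
B = (1.0, 1.0, 1.0, P2)      # weights b_j, following the choice (1, 1, 1, p2)


def constraint(z):
    """Quadratic constraint |z1|^2 - |z2|^2/p2 - |z4|^2 + |z3|^2 of Example 3.5."""
    a = [abs(w) ** 2 for w in z]
    return a[0] - a[1] / P2 - a[3] + a[2]


def weight(z):
    """The function sum_j b_j |z_j|^2; its level set at eps plays the role of S."""
    return sum(b * abs(w) ** 2 for b, w in zip(B, z))


def random_zero_point(rng):
    """Sample a nonzero point on the zero level set of the constraint."""
    r2, r4 = rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0)
    total = r2 ** 2 / P2 + r4 ** 2       # must equal |z1|^2 + |z3|^2
    share = rng.uniform(0.0, 1.0)        # split of the total between z1 and z3
    moduli = (math.sqrt(share * total), r2, math.sqrt((1 - share) * total), r4)
    return [m * cmath.exp(2j * math.pi * rng.random()) for m in moduli]


def cone_coordinates(z, eps):
    """Split z into a radial parameter t > 0 and the point z / t of weight eps."""
    t = math.sqrt(weight(z) / eps)
    return t, [w / t for w in z]


rng = random.Random(0)
z = random_zero_point(rng)
t, on_link = cone_coordinates(z, eps=0.5)
# Both checks hold up to rounding error: the rescaled point stays on the zero
# level set and lands on the level set of the weight function.
print(abs(constraint(on_link)) < 1e-12, abs(weight(on_link) - 0.5) < 1e-12)
```

Scaling z by t > 0 multiplies both functions by t², so the zero level set is preserved and every nonzero point on it has exactly one rescaled representative of weight eps; this is the radial coordinate of the cone, seen before quotienting.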

The space L F, is our candidate link for t0 . Recall that the decomposition in pieces of the space M reflects the geometry of the polytope and is defined via the mapping . The decompositions in pieces of both spaces ( F−1 (0)) /N F and L F, are defined via the mapping F exactly in the same way and they are related accordingly to Remark 2.2. The arguments used in the proof of Theorems 5.3 and 5.10 apply with no important changes to show that the two decompositions satisfy Definition 2.1 and that their pieces are quasifolds, symplectic in the case of ( F−1 (0)) /N F . Consider for instance the singular pieces of L F, . Each singular piece corresponds to a singular face G of properly containing F. Let q be the dimension of G. We want to prove that the singular piece −1 F (G F ) is a quasifold of dimension (2q − 2 p − 1), covered by one chart. The polytope to be considered is now F, . We can choose I ∈ I such that card(I ∩ IG ) = n − q and card(I ∩ I F ) = n − p, then, for the transversality condition of Remark 6.1, point (3), there exists a j ∈ I ∩ I F \ I ∩ IG such that, having set ch = bh + k∈I F \(IG ∪(I ∩I F )) bk ahk , the coefficient c j = 0. By h = j we shall mean h∈(I F ∩I )\((IG ∩I )∪{ j}) . Define the dis∩I F ∩I F in such a way that I ∩I F = ˇ I ∩I F \I ∩IG × ˇ II ∩I . crete groups ˇ I ∩I F \I ∩IG and ˇ II ∩I G G As in Theorem 5.3, in order to construct a local model we have to define a suitable slice. Consider the open subset: ∗ [(I F ∩I )\(IG ∩I )]\{ j} F Uˇ G,I, | (1/c j )( − h = j ch |wh |2 ) > 0, j = {(θ j , w) ∈ R × (C ) 2 2 h = j ahk |wh | + (a jk /c j )( − h = j ch |wh | ) > 0, k ∈ I F \ (IG ∪ (I ∩ I F ))} acted on by the group Z × ˇ I ∩I F \(I ∩IG ) in the following way: F ˇF (Z × ˇ I ∩I F \(I ∩IG ) ) × Uˇ G,I, j −→ UG,I, j (m, exp(T )), (θ j , w) −→ (θ j + m + T j , exp(T(I F ∩I )\{ j} )w).

(26)

(27)

−1 F ˇ The homeomorphism from the model Uˇ ,I, j /Z × I ∩I F \(I ∩IG ) to F (G F ) ⊂ L G F , is induced by the continuous mapping:

Uˇ G F ,I, j −→ −1 F (G F ) (θ j , w) −→ [w + φˇ G F (θ j , w)]

(28)


with l ∈ IG ∪ (I F )c ∪ (I ∩ I F ), φˇ G F (θ j , w)l = 0, φˇ G F (θ j , w) j = (1/c j ) − h = j ch |wh |2 e(2πiθ j ) , 2 2 φˇ G F (θ j , w)k = h = j ahk |wh | + (a jk /c j )( − h = j ch |wh | ) k ∈ I F \ (I G ∪ (I ∩ I F ).

The proof that the mapping (28) induces a homeomophism onto −1 F (G F ) goes very similarly to that given for the mapping (25), in the proof of Theorem 5.3, Part I, but since we deal now with the group N F , the key result here is Lemma 5.5. The atlas, obtained by adding all compatible charts, contains, in this case too, all of the charts corresponding to those J ∈ I and j ∈ J satisfying the conditions specified above. The mapping h F . Let F be a singular face of dimension p > 0 and let t0 be a point in T F . We prove that near t0 our space M is homeomorphic to the twisted product of an open subset of T F by a cone over the link L F, . Let I ∈ I be such that Uˇ F,I /ˇ I \(I ∩I F ) gives a model for T F as constructed in the proof of Theorem 5.3, Part II. In what follows we identify T F with this model. The discrete group ˇ I \(I ∩I F ) acts on the quotient ( F−1 (0)) /N F and on the product Uˇ F,I × (( F−1 (0)) /N F ) in the following way: ˇ I \(I ∩I F ) × ( F−1 (0)) /N F −→ ( F−1 (0)) /N F (g

,

[z])

−→

[ I F ,I (g)z]

;

ˇ I \(I ∩I F ) × Uˇ F,I × (( F−1 (0)) /N F ) −→ Uˇ F,I × ( F−1 (0)) /N F (g

,

(w, [z]))

−→

(gw, I F ,I (g)[z])

.

By atlases, it is straightforward to check that the quotient making use of the explicit −1 F ˇ U F,I × (( F (0)) /N ) /ˇ I \(I ∩I F ) inherits the decomposition in strata of the product, and that each of these strata has an induced quasifold structure. This holds for any ˇ ˇ I \I ∩I F is a model in the sense of Remark 1.3. We open subset Bˇ ⊂ Uˇ F,I such that B/ want to choose now an and a T I \(I ∩I F ) - invariant open subset Bˇ of Uˇ F,I such that ˇ ˇ I \(I ∩I F ) and the mapping h F from the twisted product t0 ∈ B/ Bˇ × (( −1 (0)) /N F ) /ˇ I \(I ∩I F ) F

to the open subset of M ⎛ ⎝ −1 (0) ∩ ( Bˇ × {z ∈ C F |

⎞ b j |z j |2 < } × C(I F

∪I )c

)⎠ /N

j∈I F

given by # " h F ([w, [z]]) = w + z + (FI (z (I ∩I F ) + w))(I ∪I F )c is well defined and surjective. Let w0 ∈ Uˇ F,I be a point that projects down to t0 , namely t0 = [w0 + φˇ F,I (w 0 )]. We can choose a positive real constant c and a Bˇ containing w 0


ˇ Denote by in such a way that (φˇ F,I (w)l )2 > c for all l ∈ (I ∪ I F )c and for all w ∈ B. ˇ ˇ I \(I ∩I F ) . The open subset B is a B the open neighborhood in T F homeomorphic to B/ submodel in T F . Choose now > 0 in such a way that for each [z] ∈ ( F−1 (0)) /N F we have ahl |z h |2 > −c, l ∈ (I ∪ I F )c . h∈I ∩I F

With these choices the mapping h F is well defined, continuous and injective. It is easy to check, via Lemma 5.4, that h F is surjective. Moreover a simple adaptation of the argument used in the proof of Theorem 5.3, Step (I,c), shows that the mapping h F is closed. Finally we observe that by construction the mapping h F takes strata into strata and its restriction to each stratum is a diffeomorphism of quasifolds, according to Definition 1.9. For each singular vertex μ the mapping h μ is defined on the cone C(L μ, ) and satisfies all the required properties provided that > 0 is chosen in such a way that 2 c h∈I ahr |z h | > −( h∈I ahr λh − λr ) for each r ∈ Iμ . Lemma 6.3 (The link of the link). Let F be a singular face of the convex polytope and t0 be a point in T F . The compact space L F, is a link of t0 . Proof. We need to prove that the decomposition of the compact space L F, , defined in Lemma 6.2, is itself a stratification, i.e. it satisfies the recursive Definition 2.3. Let G0 ⊂ G1 ⊂ G2 ⊂ · · · ⊂ Gk be a sequence of singular faces of such that dim(G r ) = qr , with G 0 = F and such that there are no singular faces containing G k . Notice that IG r ⊂ IG r −1 for each r = 1, . . . , k. Now take a sequence of Y r ∈ dG r and a sequence of r in order to obtain, according to Remark 6.1, a corresponding sequence of convex polytopes G r ,r , all nonsimple but the last one. Choose an I ∈ I and a sequence of indices jr −1 ∈ (I ∩ IG r −1 ) \ (I ∩ IG r ) such that the following two conditions are verified: the first is card(I ∩ IG r ) = n − qr for each r = 0, . . . , k ; for r = 1, . . . , k set crh−1 = bhr −1 + k bkr −1 ahk , where, if h ∈ (I ∩ IG r −1 ) \ (I ∩ IG r ), then k ranges in IG r −1 \ (IG r ∪ (I ∩ IG r −1 )), if h ∈ I ∩ IG r ,

= 0. Let then k ∈ IG r −1 \ I ∩ IG r −1 , the second condition then is that the coefficient crjr−1 −1 tr be a point in the singular piece LG r of L G r −1 ,r −1 corresponding to the face G r . The space L G r ,r , defined as in Lemma 6.2 for a point in the piece TG r , is the candidate link of tr . Notice that L G k ,k is a quasifold. The proof of the theorem is complete if we can prove that, for each such point tr , with r = 1, . . . , k, the space L G r −1 ,r −1 satisfies the first point of Definition 2.3, namely we need to prove that, near tr , the space L G r −1 ,r −1 is homeomorphic to the twisted product of an open subset of LG r by a cone over the link G −1 L G r ,r . In order to do so define Uˇ G rr,I, jr −1 in analogy with (26). Consider now the discrete I ∩I

I ∩I

G G groups ˇ I ∩IGr −1 \I ∩IGr and ˇ I ∩IG r −1 such that I ∩IGr −1 = ˇ I ∩IGr −1 \I ∩IGr × ˇ I ∩IG r −1 . r r G −1 as indicated in (27), it also acts on The group Z × ˇ I ∩IGr −1 \I ∩IGr acts on Uˇ G rr,I, j

(G−1r (0))r /N G r in the following way:

(Z × ˇ I ∩IGr −1 \I ∩IGr ) × (G−1r (0))r /N G r −→ (G−1r (0))r /N G r ((m, g) , [z])

−→

(g)[z]


I ∩I

G where is the natural epimorphism : ˇ I ∩IGr −1 \I ∩IGr −→ ˇ I ∩IG r −1 / I ∩IGr . As in the r proof of Lemma 6.2 we choose an open neighborhood Bˇ r in C[(I ∩IGr −1 )\(I ∩IGr )]\{ jr −1 } , G invariant by the action of T d , such that R × Bˇ r is contained in Uˇ r −1 and the quo-

G r ,I, jr −1

tient R × Bˇ r /(Z × ˇ I ∩IGr −1 \I ∩IGr ) contains tr . Consider now the mapping h r from the twisted product $ R × Bˇ r × (G−1r ,r (0)/N G r ) Z × ˇ I ∩IGr −1 \I ∩IGr to the open subset of L G r −1 ,r −1 $ c G−1r −1 (0) ∩ SG r −1 ,r −1 ∩ Bˇ r × C{ jr −1 }∪IGr ∪(I ∩IGr −1 ) N G r −1 , given by h r ([θ j , w, [z]]) = [w + z + x], where xh = 0 for all h ∈ [(IG r ∪ (I ∩ IG r −1 )) \ { jr −1 }] ∪ (IG r −1 )c ,

⎛ ⎞ ) ⎝r −1 − crh−1 |(w + z)h |2 ⎠e(2πiθ j ) x jr −1 = (1/crjr−1 −1 h = jr −1

and

⎛ ⎞ ahs |(w + z)h |2 + (a js /crjr−1 ) ⎝r −1 − crh |(w + z)h |2 ⎠, xs = −1 h = jr −1

h = jr −1

where h = jr −1 ranges in I ∩ IG r −1 \ I ∩ IG r and s ∈ IG r −1 \ ((I ∩ IG r −1 ) ∪ IG r ). The sequence of neighborhoods Bˇr , r = 1, . . . , k, and the sequence of r > 0, r = 0, . . . , k, can be chosen in such a way that the mapping h r is well defined for each r = 1, . . . , k. A straightforward adaptation of the arguments used to check the properties of the mapping h F in Lemma 6.2 shows that h r is continuous, bijective and closed. Moreover h r , restricted to each stratum, is a quasifold diffeomorphism. Theorem 6.4. The decomposition of the space M is a stratification by quasifolds according to Definition 2.3. Proof. The proof is an immediate consequence of Lemma 6.2 and Lemma 6.3. Remark 6.5. The link L F, fibers naturally over a space corresponding to the polytope F, , this can be proved as follows: let ann(Y ) = {ξ ∈ d∗F | ξ, Y = 0} and let k F : ann(Y ) −→ d∗F be the natural inclusion. Fix a point ξ0 ∈ {ξ ∈ d∗F | ξ, Y = denote by F, the polytope F, viewed in the subspace ann(Y ). j∈I F λ j b j + } and We have F, = j∈I F { ξ ∈ ann(Y ) | ξ, k ∗F (X j ) ≥ λ j − ξ0 , X j }. Now apply the construction described in Sect. 3 to the polytope F, , with the choice of normals k ∗F (X j ) and quasilattice k ∗F (d F ∩ Q). Denote by (N F )Y , (n F )Y and F,Y, the group thus obtained, its Lie algebra and the relative moment mapping respectively. Let Y˜ = j∈I F b j e j , then it is straightforward to check that:


(i) $(n^F)_Y = n^F \oplus \mathrm{Span}\{\tilde Y\}$;
(ii) $(N^F)_Y/N^F \cong \exp(\mathrm{Span}\{\tilde Y\})$;
(iii) the moment mapping, written in components according to the direct sum (i), is given by $\Psi_{F,Y,\epsilon}(z) = \big(\Psi_F(z), \sum_{j \in I_F} b_j |z_j|^2 - \epsilon\big)$.

Therefore the link $L_{F,\epsilon}$ is exactly the quotient $\Psi_{F,Y,\epsilon}^{-1}(0)/N^F$, hence $L_{F,\epsilon}$ fibers over the symplectic quotient $\Psi_{F,Y,\epsilon}^{-1}(0)/(N^F)_Y$, with fiber the 1-dimensional group $(N^F)_Y/N^F$. Although the coefficients $b_j$ can be chosen to be rational, thus obtaining a compact fiber, this is not always the most natural choice, as we shall see in the examples.

Remark 6.6. If the polytope $\Delta$ is rational, namely if the $X_j$'s can be chosen to be in a lattice $L$, then all discrete groups involved become finite, and therefore, as we have already observed in Remark 2.4, the stratification becomes locally trivial, singular strata are smooth and the principal stratum is either smooth or an orbifold. Each link $L_{F,\epsilon}$ is in this case a fiber bundle, with fiber a 1-dimensional torus, over the symplectic stratified space corresponding to the polytope $\Delta_{F,\epsilon}$, as specified in Remark 6.5. If in addition the polytope admits a choice of $X_j$'s and $Q$ that satisfies conditions (i) and (ii) specified in Remark 4.1, then the principal stratum is also smooth. The quotient $\Psi^{-1}(0)/N$ provides, in the rational case, an explicit example of the symplectic stratified spaces described in [SL], provided that we refine the stratification of the principal stratum by considering the strata given by finite group isotropy type. However, the results obtained by Sjamaar-Lerman in [SL] do not seem extendable to our specific case, since they are based on Mather's results, which, in their known form, do not apply to our context. If the polytope is simple, each corresponding space has no singular strata and we find exactly the family of symplectic quasifolds constructed by Prato in [P]. This family includes, when $\Delta$ is rational in a lattice $L$, the symplectic orbifolds associated with the pair $(\Delta, L)$, constructed in [LT]; it also includes, when $\Delta$ satisfies Delzant's integrality condition, the symplectic toric manifold constructed in [D]. The simple case is significant in that it already makes clear that orbifold structures are naturally associated to rational polytopes; this explains why the principal stratum, which is associated with the regular part of the polytope, is, in general, an orbifold. As in the simple case, additional conditions have to be satisfied in order to have a smooth principal stratum.

We are now ready to work out in detail Examples 3.5 and 3.6. For detailed examples of quasifolds, and in particular of the symplectic quasifolds corresponding to simple convex polytopes, we refer the reader to [P].

Example 3.5 resumed. The regular stratum is a symplectic quasifold of dimension 6. It is covered by the open sets $U_{\mu_j}$, for $j = 1, \dots, 4$. The corresponding models are $\tilde U_{\mu_j}/\Gamma_{\mu_j}$. The only singular stratum is the point $T_\nu = [0, 0, 0, 0, z_5]$. We want to describe $M$ as a cone in a neighborhood of $T_\nu$. Let $I = \{2, 3, 4\} \subset I_\nu$. The corresponding matrix is
\[
A_I = \begin{pmatrix} 1/p_2 & 1 & 0 & 0 & -p_5/p_2 \\ -1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & -p_5 \end{pmatrix}.
\]
The subset $\Psi^{-1}(0) \subset \mathbb{C}^5$ is given by the following equations:
\[
|z_1|^2 - \tfrac{1}{p_2}|z_2|^2 - |z_4|^2 + |z_3|^2 = 0, \qquad |z_5|^2 + \tfrac{p_5}{p_2}|z_2|^2 + p_5|z_4|^2 - p_5 = 0,
\]


while the 2-dimensional group N = exp(n) is the following subgroup of T 5 : N = {(e2πi x , e2πi(−(1/ p2 )x+( p5 / p2 )y) , e2πi x , e−2πi(x+ p5 y) , e2πi y ) | x, y ∈ R}. We construct now the link of Tν : the cone ν−1 (0)/N ν is the quotient of {(z 1 , z 2 , z 3 , z 4 , 0) | |z 1 |2 − 1/ p2 |z 2 |2 − |z 4 |2 + |z 3 |2 = 0} modulo the action of the 1-dimensional group N ν = N ∩ T ν = {(e2πi x , e2πi(−1/ p2 x+ p5 / p2 h) , e2πi x , e2πi(−x+ p5 h) , 1) |h ∈ Z, x ∈ R}. Choose Y˜ = (1, 1, 1, p2 ) according to Remarks 6.1, 6.5 and define the mapping −1 (0)/N ν −→ M h ν : ν,

[z] −→ [z 1 , z 2 , z 3 , z 4 , p5 (1 − |z 2 |2 − |z 4 |2 ) ], where is chosen in such a way that if 3j=1 |z j |2 + p2 |z 4 |2 < , then −|z 2 |2 − |z 4 |2 > −1. The link L ν, = (ν−1 (0) ∩ Sν, )/N ν is a compact quasifold: it is the quotient {(z 1 , z 2 , z 3 , z 4 ) | (1 + 1/ p2 )|z 2 |2 + 2|z 4 |2 = ; |z 1 |2 + |z 3 |2 = /(1 + p2 )}/N ν . The group N ν is a subgroup of the 2-dimensional group (N ν )Y = N1 × N2 , where N1 = {{(e2πi x , 1, e2πi x , 1, 1) |x ∈ R} and N2 = {(1, e2πi(−1/ p2 y+ p5 / p2 h) , 1, e2πi(−y+ p5 h) , 1) |h ∈ Z, y ∈ R}. The quotient (N ν )Y /N ν is isomorphic to exp(Span{Y˜ }). Therefore the link fibers over S 3 /N1 × S 3 /N2 , which is a product of two quasispheres (cf. [P] for exact definitions and details). The fiber is the 1-dimensional group (N ν )Y /N ν , which is nonclosed if p2 is nonrational. Recall from [P] that quasispheres are symplectic quasifolds associated to an interval in R, therefore the link L ν, consistently corresponds to ν, , which, with our choice of Y˜ , is a product of intervals. If the p j ’s are rational we obtain a cone over an orbifold. If the p j ’s are all equal to 1, then conditions (i) and (ii) of Remark 4.1 are satisfied and the corresponding space is stratified by smooth manifolds. Near the singular point, the space is a cone over L ν, , which is in this case a fiber bundle over S 2 × S 2 with fiber S 1 . Example 3.6 resumed. The regular stratum is an 8-dimensional quasifold, a collection of charts is given by the open subsets U I , with I ranging in I. We want to describe M in a neighborhood of the singular point Tν1 = [0, 0, 0, 0, z 5 , 0, 0, z 8 , z 9 ] and in a neighborhood of a point in the 2-dimensional singular stratum Tν1 ν2 . Consider I = {1, 2, 3, 6} ⊂ Iν1 . The corresponding matrix is ⎛ ⎞ 1 0 0 1/ p1 0 0 −1/ p1 0 −1/ p1 0 −1 ⎟ ⎜ 0 1 0 1 − p5 0 0 . AI = ⎝ 0 0 1 −1 0 0 1 − p8 0 ⎠ 0 0 0 0 − p5 1 1 − p8 0
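The quadratic part of each equation defining the zero level set in Example 3.6 can be read off from the columns of the matrix A_I. The Python sketch below is a consistency check under stated assumptions: the 4 × 9 array is our reading of the printed entries (rows labelled by I = {1, 2, 3, 6}), the rule "coefficient of |z_h|² in the k-th equation is −a_hk" is the convention we infer from the formulas of Section 6, and p1, p5, p8 are given arbitrary sample values. The constant terms (0, 0, p5, p8, 1) of the equations do not come from A_I and are not produced by the code.

```python
import math

# Assumed sample values; any positive reals would do for this check.
P1, P5, P8 = math.sqrt(2), math.sqrt(3), math.sqrt(5)
I = (1, 2, 3, 6)   # row labels of A_I

# Rows of A_I (columns 1..9), as we read the printed matrix of Example 3.6.
A_I = {
    1: [1, 0, 0, 1 / P1, 0, 0, -1 / P1, 0, -1 / P1],
    2: [0, 1, 0, 1, -P5, 0, 0, 0, -1],
    3: [0, 0, 1, -1, 0, 0, 1, -P8, 0],
    6: [0, 0, 0, 0, -P5, 1, 1, -P8, 0],
}


def quadratic_equation(k):
    """Coefficients of |z_1|^2..|z_9|^2 in |z_k|^2 - sum_{h in I} a_hk |z_h|^2."""
    coeffs = [0.0] * 9
    coeffs[k - 1] = 1.0
    for h in I:
        a_hk = A_I[h][k - 1]
        if a_hk:
            coeffs[h - 1] = -a_hk
    return coeffs


# One equation for each index k outside I.
for k in (4, 5, 7, 8, 9):
    print(k, quadratic_equation(k))
```

For k = 7, for instance, the nonzero coefficients are 1/p1 on |z_1|², −1 on |z_3|² and |z_6|², and 1 on |z_7|², which agrees with the quadratic part of the first equation listed for this example.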


The subset −1 (0) ⊂ C9 is given by the following equations: |z 7 |2 + 1/ p1 |z 1 |2 − |z 3 |2 − |z 6 |2 = 0, |z 4 |2 − 1/ p1 |z 1 |2 − |z 2 |2 + |z 3 |2 = 0, 2 2 2 |z 5 | + p5 |z 2 | + p5 |z 6 | − p5 = 0, |z 8 |2 + p8 |z 3 |2 + p8 |z 6 |2 − p8 = 0, |z 9 |2 + 1/ p1 |z 1 |2 + |z 2 |2 − 1 = 0, while the 5-dimensional group N = exp(n) is the following subgroup of T 9 : N= exp{(1/ p1 (−x + z + w), −x + p5 y + w, x − z + p8 t, x, y, p5 y − z + p8 t, z, t, w)| x, y, z, t, w ∈ R}. We construct now the link of Tν1 : the cone ν−1 (0)/N ν1 is the quotient of 1 {(z 1 , z 2 , z 3 , z 4 , 0, z 6 , z 7 , 0, 0) | |z 4 |2 − 1/ p1 |z 1 |2 − |z 2 |2 + |z 3 |2 = 0, |z 7 |2 + 1/ p1 |z 1 |2 − |z 3 |2 − |z 6 |2 = 0} modulo the action of the 2-dimensional group N ν1 = N ∩ T ν1 given by exp {(1/ p1 (−x + z + l), −x + p5 h + l, x − z + p8 k, x, h, p5 h − z + p8 k, z, k, l) | x, z ∈ R h, k, l ∈ Z}. We choose Y˜ = (1, 1, 1, 1, 1, 1) according to Remarks 6.1, 6.5 and define the mapping (0)/N ν1 −→ M h ν1 : ν−1 1 , [z] −→ [z ], where for j ∈ Iν1 , z j = z j 2 2 z 5 = − p5 |z 2 | − p5 |z 6 | + p5 , z 8 = − p8 |z 3 |2 − p8 |z 6 |2 + p8 , z 9 = −1/ p1 |z 1 |2 − |z 2 |2 + 1, and is chosen in such a way that if j∈Iν |z j |2 < then 1

−|z 2 |2 − |z 6 |2 > −1,

−|z 3 |2 − |z 6 |2 > −1,

−1/ p1 |z 1 |2 − |z 2 |2 > −1. (0) ∩ Sν1 , )/N ν1 is a 7-dimensional stratified space. The singular The link L ν1 , = (ν−1 1 pieces are 1-dimensional and correspond to the singular polytope edges stemming from ν1 , namely: ν1 ν2 , ν1 μ1 and ν1 μ4 . They are the quotients {(0, 0, 0, 0, 0, z 6 , z 7 , 0, 0) | |z 7 |2 = |z 6 |2 = /2}/N ν1 , {(0, z 2 , 0, z 4 , 0, 0, 0, 0, 0) | |z 4 |2 = |z 2 |2 = /2}/N ν1 , {(z 1 , 0, z 3 , 0, 0, 0, 0, 0, 0) | 1/ p1 |z 1 |2 = |z 3 |2 = /(1 + p1 )}/N ν1 . The mapping h ν1 maps strata into strata diffeomorphically.


Let us now consider the 2-dimensional singular stratum Tν1 ν2 . Recall that Iν1 ν2 = {1, 2, 3, 4} and that we have chosen I = {1, 2, 3, 6}. Following the proof of Theorem 5.3 we can construct a local model in a neighborhood of a point t0 of the stratum: it is given by Uˇ I,ν1 ν2 = {w6 ∈ C∗ | |w6 |2 < 1} modulo the free action of the discrete group ˇ I \I ∩Iν1 ν2 ; denote by w0 the point in Uˇ I,ν1 ν2 projecting down to t0 . The group ˇ I \I ∩Iν1 ν2 is obtained by considering I = {(e2πi1/ p1 (−h+l+r ) , e2πi(−h+ p5 k+r ) , e2πi(h−l+ p8 m) , 1, 1, e2πi( p5 k−l+ p8 m) , 1, 1, 1) | h, k, l, m, r ∈ Z} then ˇ I \Iν1 ν2 ∩I = {(1, 1, 1, 1, 1, e2πi( p5 k−l+ p8 m) , 1, 1, 1) | k, l, m ∈ Z}. We also have the discrete group ˇ II ∩Iν

1 ν2

= {e2πi1/ p1 (−h+l+r ) , e2πi(−h+ p5 k+r ) , e2πi(h−l+ p8 m) , 1, 1, 1, 1, 1, 1) | h, k, l, m, r ∈ Z}.

Moreover, if p5 , p8 / p5 ∈ R \ Q then I ∩Iν1 ν2 = {(e2πi(1/ p1 )(−h+l+r ) , 1, 1, 1, 1, 1, 1, 1, 1) | h, l, r ∈ Z}. Recall that a natural group epimorphism is defined from the group ˇ I \Iν1 ν2 ∩I onto the (0)/N ν1 ν2 is the quotient of group ˇ II ∩Iν ν / I ∩Iν1 ν2 . The cone ν−1 1 ν2 1 2

{(z 1 , z 2 , z 3 , z 4 , 0, 0, 0, 0, 0) | |z 4 |2 − 1/ p1 |z 1 |2 − |z 2 |2 + |z 3 |2 = 0} by the action of the 1-dimensional group N ν1 ν2 = N ∩ T ν1 ν2 given by {(e2πi1/ p1 (−x+l+r ) , e2πi(−x+r ) , e2πi(x−l) , e2πi x , 1, e−2πil , e2πil , 1, e2πir | x ∈ R l, r ∈ Z}. We choose Y˜ = (1, p1 , 1, 1) according to Remarks 6.1, 6.5 and define the mapping h ν1 ν2 : ( B˜ × ν−1 (0)/N ν1 ν2 ) / ˇ I \(I ∩Iν1 ν2 ) −→ M 1 ν2 , [w, [z]] −→ [z ], where z j = z j z 6 = w6 , z 5 = − p5 |z 2 |2 − p5 |w6 |2 + p5 , z 7 = −1/ p1 |z 1 |2 + |z 3 |2 + |w6 |2 , z 8 = − p8 |z 3 |2 − p8 |w6 |2 + p8 , z 9 = −1/ p1 |z 1 |2 − |z 2 |2 + 1

for j ∈ Iν1 ν2


and a positive constant c is chosen in such a way that B˜ = {w6 ∈ C∗ | c < |w6 |2 < 1−c} is well defined and contains w0 , is chosen in such a way that if |z 1 |2 + p1 |z 2 |2 + |z 3 |2 + |z 4 |2 < then −|z 2 |2 > −c,

−|z 3 |2 > −c,

−1/ p1 |z 1 |2 − |z 2 |2 > −c,

−1/ p1 |z 1 |2 + |z 3 |2 > −c. Here we can touch the twisting group ˇ I \I ∩Iν1 ν2 . Recall from Lemma 5.4 that N / I ∼ = T d /T I ; namely I , when infinite, represents, intuitively, the nonclosed part of N . The twisting group ˇ I \I ∩Iν1 ν2 is a subgroup of I and it does act on both sides of the product ( B˜ × ν−1 (0)/N ν1 ν2 ). The link L ν1 ν2 , = (ν−1 (0) ∩ Sν1 ν2 , )/N ν1 ν2 is a compact 1 ν2 , 1 ν2 quasifold which fibers over the product of two quasispheres, with fiber a 1-dimensional group isomorphic to exp(Span{Y˜ }), similarly to the link found in the pyramid example. To exemplify the proof of Theorem 6.3 let us consider the sequence of singular faces ν1 ⊂ ν1 ν2 . The corresponding polytopes, ν1 ,1 and ν1 ν2 ,2 , can be visualized in Fig. 3, the link of the singular point Tν1 , namely L ν1 ,1 , is a fibration over a space corresponding to ν1 ,1 , the link of the link, at a singular point in the stratum corresponding to ν1 ν2 , is a fibration over a space corresponding to ν1 ν2 ,2 . The fibers are, as we have seen, 1-dimensional abelian groups, possibly nonclosed. If the p j ’s are rational the space corresponding to our polytope is stratified by orbifolds. If the p j ’s are all equal to 1, then conditions (i) and (ii) of Remark 4.1 are satisfied and the corresponding space is, near to each singular stratum, a trivial bundle over the stratum itself; moreover the strata are smooth manifolds. Δν1 ν2

Fig. 3. The link of the link (figure labels: the polytopes Δν1,ε1 and Δν1ν2,ε2, and the edges ν1μ1, ν1μ2, ν1μ3, ν1μ4)

Remark 6.7. Theorem 6.4 proves that the decomposition of $M$ is in fact a stratification. Moreover, from Theorems 5.3, 5.10, we know that each piece of the stratification of $M$ has the structure of a symplectic quasifold, naturally induced by that of $\mathbb{C}^d$. This suggests that the symplectic forms defined on each stratum glue together to give rise to a globally defined symplectic form on the stratified space $M$, thus making sense of the notion of differential form defined on $M$.

Remark 6.8. In the light of Theorem 5.11, we can view the mapping $\Phi$ as a moment mapping for the action of the $n$-dimensional quasitorus $D$ on the $2n$-dimensional compact space $M$ stratified by symplectic quasifolds. By Proposition 3.3, the image $\Phi(M)$ of the moment mapping $\Phi$ is exactly the polytope $\Delta$.

Remark 6.9. The remark above emphasizes the relationship between the space $M$ and the polytope $\Delta$, which is very neat in the symplectic setting. From the complex point of view we have a compact space $X$, stratified by complex quasifolds, and a homeomorphism from $M$ onto $X$ that is a diffeomorphism restricted to the strata. The space $X$ is $n$-dimensional and is acted on by the complexified torus $D_{\mathbb{C}}$ of the same dimension. Such an action has a dense open orbit, corresponding to the open set $\Phi^{-1}(\mathrm{Int}(\Delta))$. Complex toric spaces corresponding to $\Delta$ will be treated in [B].

Acknowledgements. This work began in collaboration with Elisa Prato and the results presented here have been announced in the joint paper [BP2]. The project developed in these works was initiated by Prato's article [P] and then carried on jointly in [BP1]. I am very grateful to Elisa Prato for having introduced me to the beautiful subject of quasifolds; working together has been an enriching experience.

References

[AYP] Abe, E., Yan, Y., Pennycook, S.J.: Quasicrystals as cluster aggregates. Nature Materials 3, 759–767 (2004)
[B] Battaglia, F.: Compactification of complex quasitori and nonrational convex polytopes. In preparation
[BP1] Battaglia, F., Prato, E.: Generalized toric varieties for simple nonrational convex polytopes. Intern. Math. Res. Notices 24, 1315–1337 (2001)
[BP2] Battaglia, F., Prato, E.: Nonrational, nonsimple convex polytopes in symplectic geometry. Electron. Res. Announc. Amer. Math. Soc. 8, 29–34 (2002)
[BL] Bressler, P., Lunts, V.: Hard Lefschetz theorem and Hodge-Riemann relations for intersection cohomology of nonrational polytopes. Ind. Univ. Math. J. 54(1), 263–307 (2005)
[D] Delzant, T.: Hamiltoniens périodiques et image convexe de l'application moment. Bull. Soc. Math. France 116, 315–339 (1988)
[GM1] Goresky, M., MacPherson, R.: Stratified Morse Theory. New York: Springer Verlag, 1988
[G] Guillemin, V.: Moment maps and combinatorial invariants of Hamiltonian T^n-spaces. Progress in Mathematics 122, Boston: Birkhäuser, 1994
[K] Karu, K.: Hard Lefschetz Theorem for Nonrational Polytopes. Invent. Math. 157(2), 419–447 (2004)
[LT] Lerman, E., Tolman, S.: Hamiltonian torus actions on symplectic orbifolds and toric varieties. Trans. A.M.S. 349(10), 4201–4230 (1997)
[Pe] Penrose, R.: The rôle of æsthetics in pure and applied mathematical research. Bull. Inst. Math. Applications 10, 266–271 (1974)
[P] Prato, E.: Simple non-rational convex polytopes via symplectic geometry. Topology 40, 961–975 (2001)
[R] Robinson, R.M.: Undecidability and nonperiodicity for tilings of the plane. Invent. Math. 12, 177–209 (1971)
[Sh et al.] Shechtman, D., Blech, I., Gratias, D., Cahn, J.W.: Metallic phase with long-range orientational order and no translational symmetry. Phys. Rev. Lett. 53, 1951–1953 (1984)
[SL] Sjamaar, R., Lerman, E.: Stratified symplectic spaces and reduction. Ann. of Math. 134, 375–422 (1991)

Communicated by A. Connes

Commun. Math. Phys. 269, 311–365 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0131-0

Communications in

Mathematical Physics

Distribution of Resonances for Open Quantum Maps

Stéphane Nonnenmacher (1), Maciej Zworski (2)

(1) Service de Physique Théorique, CEA/DSM/PhT, Unité de Recherche Associée au CNRS, CEA/Saclay, 91191 Gif-sur-Yvette, France. E-mail: [email protected]
(2) Mathematics Department, University of California, Evans Hall, Berkeley, CA 94720, USA. E-mail: [email protected]

Received: 30 May 2005 / Accepted: 9 August 2006
Published online: 4 November 2006 – © Springer-Verlag 2006

Abstract: We analyze a simple model of a quantum chaotic scattering system, namely the quantized open baker's map. This model provides a numerical confirmation of the fractal Weyl law for the semiclassical density of quantum resonances. The fractal exponent is related to the dimension of the classical repeller. We also consider a variant of this model, for which the full resonance spectrum can be rigorously computed, and satisfies the fractal Weyl law. For that model, we also compute the shot noise of the conductance through the system, and obtain a value close to the prediction of random matrix theory.

1. Introduction

1.1. Statement of the results. In this paper we analyze simple models of classical chaotic open systems and of their quantizations. They provide a numerical confirmation of the fractal Weyl law for the density of quantum resonances of such systems. The exponent in that law is related to the dimension of the classical repeller of the system. In a simplified model, a rigorous argument gives the full resonance spectrum, which satisfies the fractal Weyl law. Our model is similar to models recently studied in atomic and mesoscopic physics (see §2.4 below).

Before stating the main result we remark that in this paper we use the mathematicians' notation $h$ for what the physicists call $\hbar$. This is partly to stress that our $h$ is a small parameter in asymptotic analysis, not necessarily interpreted as the Planck constant.

Theorem 1. There exist families of symplectic relations, $B \subset \mathbb{T}^2 \times \mathbb{T}^2$, and of their (subunitary) quantizations, $B_h \in \mathcal{L}(\mathbb{C}^N)$, $N = (2\pi h)^{-1}$, such that
\[
\#\{\lambda \in \mathrm{Spec}(B_h) : |\lambda| \ge r\} = c(r)\, h^{-\nu} + o(h^{-\nu}), \quad r > 0,
\]
\[
h = h_k = (2\pi D^k)^{-1} \to 0, \quad k \to \infty,
\]
\[
\nu = \dim\big(\Gamma_-(B) \cap W_+(B)\big), \quad c(r) = (2\pi)^{-\nu}\, \chi_{[0, r_0(B)]}(r), \quad 0 < r_0(B) < 1,
\]

where the integer parameter $D > 1$ depends on $B$. The set $\Gamma_-(B) \subset \mathbb{T}^2$ is the forward trapped set of $B$ and $W_+(B)$ is the unstable manifold of $B$ at any point of $\Gamma_-(B)$. The eigenvalues are counted with multiplicities. In the model discussed in detail we took $D = 3$. The asymptotics are actually much more precise and include uniform angular distribution (see Prop. 5.5). The resonances lie on a lattice, and some of this structure is also seen in more generic situations computed numerically (some numerical results have been presented in [40, 38, 39]). Each symplectic relation $B$ (or “multivalued symplectic map”) is defined together with the probabilities, for any point, to be mapped to each of its images: $B$ thus represents a certain stochastic process. The quantizations $B_h$ quantize the relations together with their jump probabilities in the precise sense given in §4.4. In the models used in Theorem 1 we can compute the conductance and the shot noise power (or the closely related Fano factor); see §2.4.3 and references given there for physics background and §6 for precise definitions.

Theorem 2. Suppose that the models in Theorem 1 have the openings consisting of two “leads” of equal width (see §6.1 for a detailed description), so that each lead carries the same number, $M(h) \sim h^{-1}$, of scattering channels. Then, the quantum conductance (6.2) between the two leads satisfies
\[ g(h) = \frac{1}{2}\, M(h)\, \bigl(1 + o(1)\bigr), \quad h = h_k \to 0. \tag{1.1} \]
The Fano factor (6.3) is given by
\[ F(h) = \frac{11}{80}\, \frac{M(h)^{\nu}}{g(h)}\, \bigl(1 + o(1)\bigr), \quad h = h_k \to 0, \tag{1.2} \]

where the exponent $\nu$ is the same as in Theorem 1. The theorem should be interpreted as follows. In (1.1) we see that for a model of scattering through a chaotic cavity, approximately one half of the scattering channels get transmitted from one lead to the other, the other half being reflected back (this is natural and well known). The asymptotics in (1.2) are more interesting. We see that the fractal Weyl law, $h^{-\nu}$, appears in the expression for the Fano factor. In the interpretation of the Fano factor in terms of “shot noise” (see §2.4.3), 11/80 gives the average “shot noise” per “nonclassical transmission channel”. This number is close to the random matrix theory prediction for this quantity, namely 1/8 [26, 57]. In fact, had (1.2) come from a physical experiment rather than an asymptotic computation, it would be regarded as being in very good agreement with random matrix theory.¹

Much of the paper is devoted to rigorous definitions of the objects appearing in the statements of the two theorems. In this section we give some general indications, with detailed references to previous works appearing below. We consider the two-torus $\mathbb{T}^2 = [0, 1) \times [0, 1)$ as our classical phase space (with coordinates $\rho = (q, p)$). Classical observables are functions on $\mathbb{T}^2$ and classical dynamics is given in terms of relations, $B \subset \mathbb{T}^2 \times \mathbb{T}^2$, which are unions of truncated graphs of symplectic (area and orientation preserving) maps $\mathbb{T}^2 \to \mathbb{T}^2$. An example is given by the baker’s relation
\[ (\rho'; \rho) = (q', p'; q, p) \in B \iff \begin{cases} q' = 3q, \quad p' = p/3, & 0 \le q \le 1/3, \\ q' = 3q - 2, \quad p' = (p + 2)/3, & 2/3 \le q < 1. \end{cases} \tag{1.3} \]

¹ We are grateful to Yan Fyodorov for this amusing comment.

Distribution of Resonances for Open Quantum Maps


This is a “rectangular horseshoe” modeling a Poincaré map of a chaotic open system: some points (here $\{\rho : 1/3 < q < 2/3\}$) are thrown out “to infinity” at each iteration. For relations such as $B$ we can define the forward and backward trapped sets (see (2.4) for the definition in the case of flows):
\[ \rho \in \Gamma_- \iff \exists\, \{\rho_j\}_{j=0}^{\infty}, \quad \rho_0 = \rho, \quad (\rho_j; \rho_{j-1}) \in B, \ j > 0, \]
\[ \rho \in \Gamma_+ \iff \exists\, \{\rho_j\}_{j=-\infty}^{0}, \quad \rho_0 = \rho, \quad (\rho_j; \rho_{j-1}) \in B, \ j \le 0. \]

In the example (1.3), $\Gamma_- = C \times [0, 1)$, $\Gamma_+ = [0, 1) \times C$, where $C$ is the usual $\frac{1}{3}$-Cantor set. We also define the trapped set $K = \Gamma_+ \cap \Gamma_-$ and, at points of $K$, the stable and unstable manifolds, $W_\pm$. In the case of the above baker’s relation,
\[ \nu = \dim\, \Gamma_- \cap W_+ = \dim\, \Gamma_+ \cap W_- = \frac{1}{2} \dim K = \frac{\log 2}{\log 3}, \]

but for general (possibly multivalued) relations these equalities do not hold. A quantization (in the sense made rigorous in §4.5) of $B$ is given by
\[ B_h = F_N^{*} \begin{pmatrix} F_{N/3} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & F_{N/3} \end{pmatrix}, \quad h = (2\pi N)^{-1}, \quad 3 \mid N, \tag{1.4} \]

where $F_M$ is the discrete Fourier transform on $\mathbb{C}^M$.

Table 1. Number of eigenvalues of $B_h$ in the regions $\{|\lambda| > r\}$, for $2\pi h = 1/N$, $N$ given by powers of 3.

N = 3^k   r = 0.1   r = 0.2   r = 0.3   r = 0.4   r = 0.5   r = 0.6   r = 0.7   r = 0.8
k = 1           5         5         5         5         5         4         3         3
k = 2          14        14        10         9         8         8         7         6
k = 3          32        26        23        19        16        16        14         5
k = 4          63        53        45        40        33        33        30         6
k = 5         124       103        85        78        71        65        63        11
k = 6         237       196       161       150       142       131       128        12
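The block structure of (1.4) can be made concrete with a short sketch. It is only an illustration: the normalization and sign of the discrete Fourier transform, $(F_M)_{jk} = M^{-1/2} e^{-2\pi i jk/M}$, are assumptions of this sketch (the precise convention is fixed later in the paper), and the helper names `dft_matrix`, `conj_transpose`, `matmul` and `open_baker_matrix` are hypothetical.

```python
import cmath
import math


def dft_matrix(m):
    """Unitary DFT on C^m; the sign and normalization are assumptions of this sketch."""
    scale = 1.0 / math.sqrt(m)
    return [[scale * cmath.exp(-2j * math.pi * j * k / m) for k in range(m)]
            for j in range(m)]


def conj_transpose(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[i][j].conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def open_baker_matrix(n):
    """Matrix of the form (1.4): F_N^* applied to diag(F_{N/3}, 0, F_{N/3})."""
    if n % 3 != 0:
        raise ValueError("N must be divisible by 3")
    m = n // 3
    f_small = dft_matrix(m)
    middle = [[0j] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            middle[i][j] = f_small[i][j]                  # first diagonal block
            middle[2 * m + i][2 * m + j] = f_small[i][j]  # third diagonal block
    return matmul(conj_transpose(dft_matrix(n)), middle)
```

Since $F_N^*$ is unitary, $B_h^* B_h$ is the diagonal projector $\operatorname{diag}(I, 0, I)$ under these assumptions, which is one way to see that $B_h$ is subunitary: the middle third of the position basis is sent to zero.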

Table 2 shows the analogies between the eigenvalues of this subunitary quantum map and the resonances of a Schrödinger operator for a scattering situation (see §2.1). For $B_h$ given by (1.4) we are unable to prove the fractal Weyl law presented in the last line of Table 2, but numerical results strongly support its validity [40]. A striking illustration is given by tripling $N$, in which case the number of eigenvalues approximately doubles, in agreement with the fractal Weyl law (for $\nu = \log 2/\log 3$ one has $3^{\nu} = 2$); see Table 1. The family of subunitary quantum maps in the main theorem is obtained by simplifying $B_h$, and is described explicitly in (5.2). It is a quantization of a more complicated multivalued relation for which $\Gamma_+ = \mathbb{T}^2$, $\Gamma_- = C \times [0, 1)$, and $\Gamma_- \cap W_+ \simeq C$ (see Proposition 5.1). Theorem 1 follows from the more precise Proposition 5.4.
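The doubling under tripling of $N$ can be checked directly on the counts of Table 1. The following sketch computes, for each pair of consecutive rows, the local exponent $\log(\text{count}_{k+1}/\text{count}_k)/\log 3$, which the fractal Weyl law predicts to approach $\nu = \log 2/\log 3 \approx 0.63$. The dictionary `COUNTS` transcribes Table 1; the function name `tripling_exponents` is hypothetical.

```python
import math

# Eigenvalue counts from Table 1: COUNTS[r][k - 1] = #{|lambda| > r} for N = 3**k.
COUNTS = {
    0.1: [5, 14, 32, 63, 124, 237],
    0.2: [5, 14, 26, 53, 103, 196],
    0.3: [5, 10, 23, 45, 85, 161],
    0.4: [5, 9, 19, 40, 78, 150],
    0.5: [5, 8, 16, 33, 71, 142],
    0.6: [4, 8, 16, 33, 65, 131],
    0.7: [3, 7, 14, 30, 63, 128],
    0.8: [3, 6, 5, 6, 11, 12],
}

# Exponent predicted by the fractal Weyl law for the baker's relation (1.3).
NU = math.log(2) / math.log(3)


def tripling_exponents(counts):
    """Local exponents log(count[k+1] / count[k]) / log 3 between consecutive rows."""
    return [math.log(b / a) / math.log(3) for a, b in zip(counts, counts[1:])]


if __name__ == "__main__":
    print(f"predicted exponent: {NU:.4f}")
    for r, counts in sorted(COUNTS.items()):
        exps = ", ".join(f"{e:.3f}" for e in tripling_exponents(counts))
        print(f"r = {r}: {exps}")
```

For instance, in the $r = 0.5$ column the count goes from 71 to 142 between $k = 5$ and $k = 6$, an exact doubling, so the last local exponent in that column equals $\nu$.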


1.2. Organization of the paper. In §2 we present related results from recent mathematical, numerical, and physics literature. In particular, in §2.4.3 we give the physical motivation for the objects appearing in Theorem 2 above. Section 3 is devoted to the review of classical dynamics used in our models, stressing the dynamics of open baker’s relations. In §4 we first review the quantization of tori. We assume the knowledge of semiclassical quantization in $T^*\mathbb{R}^n$ (pseudodifferential operators) but otherwise the presentation is self-contained. The definitions of Lagrangian states associated to smooth and singular Lagrangian submanifolds are based on the ideas of Guillemin, Hörmander, Melrose, and Uhlmann in microlocal analysis but, partly due to technical differences, we give direct proofs of the properties we need in this paper. These properties are used to analyze the quantizations of the baker’s relation coming from the work of Balazs, Voros, Saraceno and Vallejos. Numerical results for the (usual) quantization of the open baker’s relation have been presented in [40]; we briefly summarize them in §4.6. In §5 we discuss the toy model $B_h$, with two different interpretations. That section contains the proof of Theorem 1. Finally, in Section 6 we give precise definitions of the objects appearing in Theorem 2 and, in a lengthy computation, we give its proof.

2. Motivation and Background

In this section we discuss motivating topics in mathematics and theoretical physics, and survey related results.

2.1. Schrödinger operators. The original motivation comes from the study of resonances in potential scattering. The simplest case is given by considering the following quantum Hamiltonian:
\[ H = -h^2 \Delta + V(q), \quad V \in C_c^{\infty}(\mathbb{R}^n; \mathbb{R}). \tag{2.1} \]

By assuming that the potential vanishes near infinity and that it is infinitely differentiable, we eliminate the need for technical assumptions — see [22] and [53] for more general settings, in the analytic and C ∞ categories respectively. For instance, as noted in [50, (c.32)–(c.33)] the theory of [22] applies to arbitrary homogeneous polynomial potentials at nondegenerate energy levels.

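Table 2. Analogies between Schrödinger propagators and open quantum maps.

Schrödinger propagator | Open quantum map
$h \to 0$ | $N = (2\pi h)^{-1} \to \infty$
$\chi \exp\bigl(-it(-h^2\Delta + V)/h\bigr)\chi$, $t \ge 0$, $\chi$ a cut-off on the interaction region | $B_h^t$, $t = 0, 1, \dots$, $B_h$ a subunitary matrix
$e^{-itz/h}$, $z$ a resonance of $H = -h^2\Delta + V$ | $\lambda^t$, $\lambda$ an eigenvalue of $B_h \in \mathcal{L}(\mathbb{C}^N)$
$z \in [E - h, E + h] - i[0, \gamma h]$ | $1 \ge |\lambda| > r > 0$
$\#\{z \in [E - h, E + h] - i[0, \gamma h]\} \sim C(\gamma)\, h^{-\mu_E}$ | $\#\{\lambda, |\lambda| > r\} \sim C(r)\, N^{\nu}$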

Before discussing open systems we recall the well known results for closed systems, obtained for instance by considering $H$ above on a bounded domain $\Omega \subset \mathbb{R}^n$ and imposing a self-adjoint boundary condition at $\partial\Omega$ (Dirichlet or Neumann). Then the spectrum,


$\operatorname{Spec}(H)$, of $H$ is discrete and, at a non-degenerate energy level $E$, its density is described by the celebrated Weyl law:
\[ \#\{\operatorname{Spec}(H) \cap [E - \delta, E + \delta]\} = \frac{1}{(2\pi h)^n} \iint_{|p^2 + V(q) - E| < \delta} dq\, dp + O(h^{1-n}), \tag{2.2} \]

see [14, 25] and references given there. We note that this implies a precise upper bound
\[ \#\{\operatorname{Spec}(H) \cap [E - Ch, E + Ch]\} = O(h^{1-n}), \tag{2.3} \]
which can be improved further by making assumptions on the classical flow of the Hamiltonian $p^2 + V(q)$ on $\Omega$, see [14, 25]. For open systems, with the simplest example given by the Hamiltonian in (2.1), real eigenvalues are replaced by complex resonances. The simplest definition (easily made rigorous in the case (2.1)) comes from considering the meromorphic continuation of the resolvent. Defining the Green’s function $G(z, q', q)$ for $\operatorname{Im} z > 0$ through
\[ (z - H)^{-1} u(q') = \int_{\mathbb{R}^n} G(z, q', q)\, u(q)\, dq, \quad u \in C_c^{\infty}(\mathbb{R}^n), \]

then $G(z, q', q)$ admits a meromorphic continuation in $z$ across the real axis. Its poles for $\operatorname{Im} z < 0$ (which do not depend on $(q', q)$) are the quantum resonances of $H$. Counting of resonances is affected by the dynamical structure of the scatterer much more dramatically than counting of eigenvalues of closed systems. Since we are now counting points in the complex plane we need to make geometric choices dictated by dynamical and physical considerations. Here we consider scatterers and energies exhibiting a hyperbolic classical flow, and regions in the lower half-plane which lie at a distance proportional to $h$ from the real axis. This choice is motivated as follows. Quantum mechanics interprets a resonance at $z = E - i\gamma$ in terms of a metastable state, which decays proportionally to $\exp(-t\gamma/h)$. Hence for $\gamma \gg h$ the decay is so rapid that the state is invisible. On the other hand, for many chaotic scatterers there are no resonances with $\gamma \ll h$. One class for which this is known rigorously consists of the Laplacians on co-compact quotients $\mathbb{H}^n/\Gamma$, $H = -h^2 \Delta_{\mathbb{H}^n/\Gamma}$, when the dimension of the limit set satisfies $\delta(\Gamma) < (n - 1)/2$. This follows from the work of Patterson and Sullivan; see the discussion below and [37]. After a complex deformation (see [53] and references given there) the long living quantum states should semiclassically concentrate on the set of phase space points which do not escape to infinity, that is on the trapped set $K_E$ defined as follows: let $\Xi_H$ be the Hamilton vector field of the Hamiltonian $H(q, p) = p^2/2 + V(q)$:
\[ \Xi_H = \sum_{j=1}^{n} p_j\, \partial_{q_j} - \partial_{q_j} V(q)\, \partial_{p_j}. \]
Then
\[ K_E \stackrel{\mathrm{def}}{=} \Gamma_+(E) \cap \Gamma_-(E), \]
with the forward/backward trapped sets
\[ \Gamma_\pm(E) \stackrel{\mathrm{def}}{=} \{\rho \in \Sigma_E : \exp t\,\Xi_H(\rho) \not\to \infty, \ \mp t \to \infty\}. \tag{2.4} \]


Suppose that the flow generated by the Hamilton vector field $\Xi_H$ is hyperbolic near $K_{E'}$ for $E'$ close to a non-degenerate energy $E$. That means that the field $\Xi_H$ does not vanish on the energy surfaces $\Sigma_{E'} = \{p^2 + V(q) = E'\} \subset T^*\mathbb{R}^n$ for $E' \approx E$, and that for $\rho \in \Sigma_{E'}$ near $K_{E'}$,
\[ T_\rho \Sigma_{E'} = \mathbb{R}\,\Xi_H(\rho) \oplus E_+(\rho) \oplus E_-(\rho), \]
\[ \Sigma_{E'} \ni \rho \longmapsto E_\pm(\rho) \subset T_\rho \Sigma_{E'} \ \text{is continuous}, \]
\[ d(\exp t\,\Xi_H)(E_\pm(\rho)) = E_\pm(\exp t\,\Xi_H(\rho)), \tag{2.5} \]
\[ \|d(\exp t\,\Xi_H)(X)\| \le C e^{\pm\lambda t} \|X\|, \quad \text{for all } X \in E_\pm(\rho), \ \mp t \ge 0. \]
Weaker assumptions are possible; see [50, §5] and [53, §7]. Typically, the set $K_E$ has a fractal structure and in the semiclassical estimates the Minkowski dimension naturally appears:
\[ \dim K_E = 2n - 1 - \sup\Bigl\{ c : \limsup_{\epsilon \to 0} \epsilon^{-c} \operatorname{vol}\bigl(\{\rho \in \Sigma_E : \operatorname{dist}(K_E, \rho) < \epsilon\}\bigr) < \infty \Bigr\}. \]
We say that $K_E$ is of pure dimension if the supremum is attained. For simplicity of the presentation we assume that this is the case. Under these assumptions the estimate (2.3) has an analogue for chaotic open systems [53]. For $C_0 > 0$ there exists $C_1$ such that
\[ \#\{\operatorname{Res}(H) \cap \{z : |z - E| < C_0 h\}\} \le C_1 h^{-\mu_E}, \quad \dim K_E = 2\mu_E + 1. \tag{2.6} \]

We notice that for a closed system the trapped set is the entire energy surface, so that in that case $\mu_E = n - 1$, hence (2.6) is consistent with (2.3). In this note we use open quantum maps to provide the first evidence that this precise estimate is optimal. We should also mention that, as was already stressed in the work of Sjöstrand [50], the estimates involving the dimension are only reasonable when the flow is strictly hyperbolic. In the case of more complicated flows the estimates should be stated in terms of properties of escape or Lyapunov functions associated to the flow (see [50, 53]). For expository reasons the estimates involving the dimension are however most persuasive.

2.2. Survey of related results. The first indication that fractal dimensions enter into counting laws for quantum resonances of chaotic open systems appears in a result of Sjöstrand [50]:
\[ \#\operatorname{Res}(H) \cap \{z : |z - E| < \delta, \ \operatorname{Im} z > -\gamma\} \le C_1\, \delta \left(\frac{h}{\gamma}\right)^{-n} \gamma^{-\frac{m}{2}}, \tag{2.7} \]
\[ Ch \le \gamma \le 1/C, \quad \max(h^{\frac{1}{2}}, h/\gamma) \le \delta \le 2/C, \]
where $m$ is any number greater than the dimension of the trapped set in the shell $H^{-1}(E - 1/C_2, E + 1/C_2)$. In a homogeneous situation, such as for instance obstacle scattering, the dimension of $K_E$, $2\mu_E + 1$, is independent of $E$, so that $m > 2(\mu_E + 1)$. The improvement in [53] quoted in (2.6) lies in providing a bound for the number of resonances in a smaller region $D(E, Ch) = \{z \in \mathbb{C} : |z - E| < Ch\}$. Heuristic arguments suggesting that the estimate (2.7) should be optimal were given in [31] and [32].


Another class of Hamiltonians with chaotic classical flows and fractal trapped sets is given by Laplacians on convex co-compact quotients, $\mathbb{H}/\Gamma$. Here $\Gamma$ is a discrete subgroup of isometries of the hyperbolic plane $\mathbb{H}$, such that

• All elements $\gamma \in \Gamma$ are hyperbolic, which means that their action on $\mathbb{H}$ can be represented as
\[ \alpha \circ \gamma \circ \alpha^{-1}(x, y) = e^{\ell(\gamma)}(x, y), \quad (x, y) \in \mathbb{H} \simeq \mathbb{R}_+ \times \mathbb{R}, \quad \alpha \in \operatorname{Aut}(\mathbb{H}). \tag{2.8} \]
• If $\pi : \mathbb{H} \to \mathbb{H}/\Gamma$, and $\Lambda(\Gamma) \subset \partial\mathbb{H}$ is the limit set of $\Gamma$, that is the set of limit points of $\{\gamma(z) : \gamma \in \Gamma\}$, $z \in \mathbb{H}$, then $\pi(\text{convex hull}(\Lambda(\Gamma)))$ is compact.

The trapped set is determined by $\Lambda(\Gamma)$: trapped trajectories are given by geodesics connecting two points of $\Lambda(\Gamma)$ at infinity, and $\dim K_E = 2\delta + 1$, $\delta = \dim \Lambda(\Gamma)$. The limit set is always of pure dimension, which coincides with its Hausdorff dimension. A nice feature of this model is the exact correspondence between the resonances of $H = h^2(-\Delta_{\mathbb{H}/\Gamma} - 1/4)$, and the zeros of the Selberg zeta function, $Z(s)$:²
\[ z \in \operatorname{Res}(H) \iff Z(s) = 0, \quad z = h^2\bigl(s(1 - s) - 1/4\bigr), \quad \operatorname{Re} s \le \delta, \tag{2.9} \]
where the multiplicities of zeros and resonances agree. The Selberg zeta function is defined by the analytic continuation of
\[ Z(s) = \prod_{\{\gamma\}} \prod_{k \ge 0} \bigl(1 - e^{-(s + k)\ell(\gamma)}\bigr), \quad \operatorname{Re} s > \delta, \]
where $\{\gamma\}$ denotes a conjugacy class of a primitive element $\gamma \in \Gamma$ (an element which is not a power of another element), and we take a product over distinct primitive conjugacy classes (each of which corresponds to a primitive closed orbit). The length $\ell(\gamma)$ of the corresponding closed orbit appears in (2.8). The exact analogue of (2.6) is given by
\[ \#\{s : Z(s) = 0, \ \operatorname{Re} s > -C_0, \ r < \operatorname{Im} s < r + C_1\} \le C_2\, r^{\delta}, \tag{2.10} \]
which is a consequence of an estimate established by Guillopé-Lin-Zworski [20] in a more general setting of convex co-compact Schottky groups in any dimension,
\[ |Z(s)| \le C_K\, e^{C_K |s|^{\delta}}, \quad \operatorname{Re} s \ge -K, \ \text{for any } K. \tag{2.11} \]
This improved earlier estimates of [59], the proof of which was largely based on [50]. In the (non-quantum) context of rational maps on the complex plane, similar results were obtained concerning the zeros of associated zeta functions [11, 54]. Take $f$ a uniformly expanding rational map on $\mathbb{C}$ (for instance $z \mapsto z^2 + c$, $c < -2$), and call $f^n$ its $n$-fold composition. The zeta function associated with this map is given by
\[ Z(s) = \exp\left( -\sum_{n=1}^{\infty} n^{-1} \sum_{f^n(z) = z} \frac{|(f^n)'(z)|^{-s}}{1 - |(f^n)'(z)|^{-1}} \right). \tag{2.12} \]

² We refer to [42] for this and a general treatment. The term $\frac{1}{4}$ in the definition of the Hamiltonian $H$ comes from requiring that the bottom of the spectrum of $H$ is 0, so that Green’s function $(H - \lambda^2)^{-1}$ is meromorphic in $\lambda \in \mathbb{C}$.


Then the number of resonances in a strip is also given by a law of the type (2.10), where $\delta$ is replaced by the dimension of the Julia set:
\[ J = \overline{\bigcup_{n \ge 1} \{z : f^n(z) = z\}}. \]
Note that this set is also made of “trapped orbits”.

2.3. Survey of numerical results. The first model investigated numerically was perhaps the hardest one for which to obtain definitive results. Lin [30, 31] studied the semiclassical Schrödinger Hamiltonian (2.1) with a potential made of 3 Gaussian “bumps”. The semiclassical resonances were computed using the method of complex scaling and were counted in boxes of type $[E - \delta, E + \delta] - i[0, h]$ with $\delta$ fixed. The purpose was to verify optimality of Sjöstrand’s estimate (2.7) with these parameters. The results were encouraging but not conclusive. Since for small values of $h$ the method of [30] required the use of large matrices to discretize the Hamiltonians, the range of $h$’s was rather limited. A different point of view was taken by Lu-Sridhar-Zworski [32] where resonances for the three discs scatterer in the plane were computed using the semiclassical zeta function of Eckhardt-Cvitanović, Gaspard, and others (see for instance [12, 18, 58] and references therein). The zeta function is computed using the cycle expansion method loosely based on the Ruelle theory of dynamical zeta functions. Although it is not rigorously known if the resonances computed by this method approximate resonances of the Dirichlet Laplacian in the exterior of the discs, or even if the semiclassical zeta function has an analytic continuation, proceeding this way is widely accepted in the physics literature. Resonances $z = h^2 k^2$ were counted in regions
\[ \{k \in \mathbb{C} : 1 \le \operatorname{Re} k \le r, \ \operatorname{Im} k \ge -\gamma\}, \quad r \to \infty, \tag{2.13} \]

which under semiclassical rescaling correspond to counting in $[1/2, 2] - i[0, \gamma h/2]$, $h \to 0$. Let us denote the number of resonances (zeros of the semiclassical zeta function) in (2.13) by $N(r, \gamma)$. The fractal Weyl law corresponds to the claim that for $\gamma$ large enough,
\[ N(r, \gamma) \sim C(\gamma)\, r^{\mu + 1}, \quad r \to \infty, \tag{2.14} \]
where $2\mu + 1$ is the dimension of the trapped set in the three dimensional energy shell (for such homogeneous problems, all energy shells are equivalent). In [32] the prediction (2.14) was tested by linear fitting of $\log N(r, \gamma)$ as a function of $\log r$: $\log N(r, \gamma) = (\alpha(\gamma) + 1) \log r + O(1)$. We found that the coefficient $\alpha(\gamma)$ was independent of $\gamma$ for $\gamma$ large enough, and that it agreed with $\mu$. The counting was done for three different equilateral disc configurations, parametrized by $\rho = R/a$, where $a$ is the radius of each disc, and $R$ the distance between them. We also noticed that if $\gamma_\rho$ is the classical rate of decay for the $\rho$ configuration, then $\alpha_\rho(x\gamma_\rho/2)/\mu_\rho$ is essentially independent of $\rho$ for $1 < x < 1.5$. This corresponds to a numerical observation that for each $\rho$ the distribution of resonance widths (imaginary parts) peaks near $\gamma = \gamma_\rho/2$.
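The fitting step described above is an ordinary least-squares fit in log-log coordinates. The three-disc counts of [32] are not reproduced in this paper, so the sketch below applies the same step to the $r = 0.1$ column of Table 1 instead, with $N = 3^k$ playing the role of $r$; in that setting the slope estimates $\nu$ rather than $\alpha(\gamma) + 1$. The function name `loglog_slope` is hypothetical.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx


# Matrix sizes N = 3**k, k = 1..6, and the r = 0.1 column of Table 1.
SIZES = [3 ** k for k in range(1, 7)]
COUNTS_R01 = [5, 14, 32, 63, 124, 237]

if __name__ == "__main__":
    print(f"fitted exponent:    {loglog_slope(SIZES, COUNTS_R01):.3f}")
    print(f"predicted exponent: {math.log(2) / math.log(3):.3f}")
```

With only six data points, and the smallest matrices included, the fitted slope should be read as a rough consistency check rather than as a measurement of the exponent.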


Encouraged by the results of [32], the cycle method was used in [20] to count the zeros of the Selberg zeta function for a certain Schottky quotient, but the results were not definitive. For the dynamical zeta function (2.12) with $f(z) = z^2 + c$, $c < -2$, the resonances were computed by Strain-Zworski [54], using a different method based on the theory of the transfer operator on Hilbert spaces of holomorphic functions introduced in [20]. The zeros were counted in a region of the same type as in (2.13), $\{s : \operatorname{Re} s > -K, \ 0 \le \operatorname{Im} s \le r\}$, where real parts and imaginary parts exchange their meaning due to different conventions.³ By reaching very high values of $r$ we saw a very good agreement of the log-log fit with the fractal Weyl law, with $\mu$ given by the dimension of the Julia set. In the model considered in this paper, we can verify the optimality of the fractal Weyl law on much smaller scales (see Table 1 and the numerics presented in [40, 38, 39]). That could not be seen in the other approaches.

2.4. Related models in physics. The behaviour of quantum open systems has been recently investigated in situations where the classical dynamics has chaotic features. The physical motivation can originate from nuclear or atomic physics (study of the lifetime statistics of metastable states, possibly leading to ionization), mesoscopic physics (study of the conductance, conductance fluctuations, shot noise in quantum dots or quantum wires), and from waveguides (optical wave propagation in an optical fiber with some dissipation, microwave propagation in an open microwave cavity).

2.4.1. Kicked rotator with absorbing boundaries. In [3, 7] a kicked rotator with absorption was used to model the process of ionization. The classical kicked rotator is Chirikov’s standard map on the cylinder, which is a paradigmatic model for transitions from regular to chaotic motion [9]. Quantizing the map on $L^2(\mathbb{T}^1)$ results in a unitary operator $U$, a first instance of a quantum map. To model the ionization process which takes place at some threshold momentum $p_{\mathrm{ion}}$, the authors truncate the map $U$ to the subspace $\mathcal{H}_{\mathrm{ion}} = \operatorname{span}\{|p_j\rangle : |p_j| \le p_{\mathrm{ion}}\}$: a particle reaching that threshold is ionized, or equivalently “escapes to infinity”. Here the discrete values $p_j = 2\pi h j$ are the eigenvalues of the momentum operator on $L^2(\mathbb{T}^1)$; the space $\mathcal{H}_{\mathrm{ion}}$ is thus of dimension $N \approx p_{\mathrm{ion}}/\pi h$. This projection leads to an open quantum map, namely the subunitary propagator $U_{\mathrm{ion}} = \Pi_{\mathrm{ion}} U$, where $\Pi_{\mathrm{ion}}$ is the orthogonal projector on $\mathcal{H}_{\mathrm{ion}}$. For the parameters used by [3], the classical dynamics is diffusive, meaning that a particle starting from $p = 0$ will need many kicks to reach the ionization threshold. The matrix $U_{\mathrm{ion}}$ was numerically diagonalized for various values of $h$ with $p_{\mathrm{ion}}$ fixed, and the distribution of the $N$ level widths $\gamma_i = -2 \ln |\lambda_i|$, $\lambda_i \in \operatorname{Spec}(U_{\mathrm{ion}})$, was found approximately independent of $h$, such that the number of resonances $n(N, \gamma) = \#\{\gamma_i \le \gamma\}$ scales like $C(\gamma) N$ in this case. In subsequent works [47, 60, 17], this distribution was shown to correspond to an ensemble of random subunitary matrices, more precisely the ensemble formed by the $[\alpha N] \times [\alpha N]$ upper-left corner ($0 < \alpha < 1$ fixed) of a large $N \times N$ matrix drawn in the Circular Unitary Ensemble (that is, the set $U(N)$ equipped with Haar measure).

³ Although frustrating, the different conventions of semiclassical, obstacle, and hyperbolic scattering show how the same phenomenon appears in historically different fields.


2.4.2. Quasi-bound states in an open quantum map. Recently, Schomerus and Tworzydło [49] have performed a similar study for the quantized kicked rotator on the torus (obtained from the map of the former section by periodizing the momentum variable). They also “opened” the map by assuming that particles reaching a certain position window $q \in L$ “escape to infinity”. The quantum projector associated with these “escape windows” is denoted by $\Pi_L$, so that the remaining subunitary quantum map reads $U_{\mathrm{op}} = (I - \Pi_L) U$. The main difference with the case studied in the previous section lies in the strongly chaotic motion (as opposed to diffusive), due to a different choice of parameters. The map has a positive Lyapunov exponent $\lambda$, and a typical trajectory will escape after a few kicks: the average “dwell time”, called $\tau_D$, is of order unity. The eigenmodes associated with eigenvalues bounded away from zero are called “quasi-bound states”, as opposed to the “instantaneous decay modes” associated with very small eigenvalues. The authors provide numerical and heuristic evidence that, in the semiclassical limit, the number of quasi-bound states grows like $N_{\mathrm{eff}} = N^{1 - 1/(\lambda\tau_D)}$. This shows that most eigenvalues of $U_{\mathrm{op}}$ are very close to zero, while only a small fraction $N_{\mathrm{eff}}/N$ remains bounded away from zero. The authors also plot the distribution of the $\sim N_{\mathrm{eff}}$ quasi-bound eigenvalues: again, it resembles the spectrum of a random subunitary matrix obtained by keeping the upper-right corner block of size $N_{\mathrm{eff}}$ of a $[\tau_D N_{\mathrm{eff}}]$-dimensional random unitary matrix. The quantized baker’s relation we will study in §4.6–5 will be of similar nature. For the map (4.38), the fractal dimension $\nu$ given in (3.7) can be shown to be close to the formula $1 - 1/(\lambda\tau_D)$, in the limit when the dwell time $\tau_D$ is large compared to unity (limit of small opening).

2.4.3. Conductance through an open chaotic cavity. The “scattering approach to semiclassical quantization” [4, 15, 43, 41] consists in quantizing the return map on a Poincaré surface of section of the Hamiltonian system under study. Within this approach, the scattering matrix of the open system can be expressed as a “multiple-scattering expansion” in terms of the quantized return map. Using that framework, Beenakker et al. [57] study the quantum kicked rotator defined in the previous section, in order to understand the fluctuations of conductance through a quantum dot. The evolution inside the closed dot is represented by the same unitary matrix $U$ as in the last subsection, and its opening $L$ is split into two intervals, $L_2$ and $L_1$, which represent the two “leads” bringing in and taking out the charge carriers from the dot. The orthogonal projector corresponding to these openings reads $\Pi_L = \Pi_{L_1} \oplus \Pi_{L_2}$. The conductance can then be analyzed from the scattering matrix of the dot:
\[ \tilde S(\vartheta) = \Pi_L \bigl\{e^{-i\vartheta} - U(1 - \Pi_L)\bigr\}^{-1} U\, \Pi_L. \tag{2.15} \]
Here $\vartheta \in [0, 2\pi)$ is called the quasi-energy. In terms of this parameter, the “physical half-plane” corresponds to $\operatorname{Im} \vartheta > 0$: the matrix $\tilde S(\vartheta)$ has no singularity in this region. In contrast, the resonances analyzed in the previous section, which are the poles of $\tilde S(\vartheta)$, are situated in the region $\operatorname{Im} \vartheta < 0$. While $\tilde S(\vartheta)$ is unitary, its subblock $t \stackrel{\mathrm{def}}{=} \Pi_{L_2} \tilde S(\vartheta)\, \Pi_{L_1}$ describes the transmission from the lead $L_1$ to the lead $L_2$. The dimensionless conductance (which depends on $\vartheta$) is given by the Landauer-Büttiker formula $g = \operatorname{tr}(tt^*)$. The eigenvalues of $tt^*$ (called “transmission eigenvalues”) can be either close to 1 (corresponding to a total transmission), or close to 0 (corresponding to a total reflection), or in between. The last case corresponds to genuinely quantum transmission eigenmodes, which are partly transmitted, partly reflected, due to interference phenomena inside the dot. The “quantum shot


noise” is due to these intermediate transmission eigenvalues. A simple measure of that noise is given by the Fano factor [6] $F = \operatorname{tr}\bigl(tt^*(1 - tt^*)\bigr)/\operatorname{tr}(tt^*)$. Using similar arguments as in the former section, the authors show that the number of intermediate transmission eigenvalues also scales like $N_{\mathrm{eff}}$, and thereby estimate the Fano factor, by assuming that these eigenvalues are distributed according to the prediction of random matrix theory. In Section 6 we will analytically compute both the conductance and the Fano factor in the case of the open quantum relation $B_h$.

3. Classical Dynamics

3.1. Symplectic geometry on tori. We consider the simplest class of compact symplectic manifolds, the tori,
\[ \mathbb{T}^{2n} \stackrel{\mathrm{def}}{=} \mathbb{R}^{2n}/\mathbb{Z}^{2n} \simeq (\mathbb{I} \times \mathbb{I})^n, \quad \omega = \sum_{\ell=1}^{n} dq_\ell \wedge dp_\ell, \quad (q, p) \in \mathbb{T}^{2n}. \]

Here and in what follows, we identify the interval $\mathbb{I} = [0, 1)$ with the circle $\mathbb{T}^1 = \mathbb{R}/\mathbb{Z}$. A Lagrangian (submanifold) $\Lambda \subset \mathbb{T}^{2n}$ is an $n$-dimensional embedded submanifold of $\mathbb{T}^{2n}$ such that $\omega|_\Lambda = 0$. We recall the following well known fact (see for instance [24, Theorem 21.3.2]):

Proposition 3.1. Suppose that $\Lambda \subset \mathbb{T}^{2n}$ is a Lagrangian submanifold, and that $(q_0, p_0) \in \Lambda$. Then, after a possible permutation of indices, there exists $k$, $0 \le k \le n$, and a splitting of coordinates:
\[ q = (q', q''), \quad p = (p', p''), \quad q' = (q_1, \dots, q_k), \quad p'' = (p_{k+1}, \dots, p_n), \]
such that the map $\Lambda \ni (q, p) \longmapsto (q'', p') \in \mathbb{I}^{n-k} \times \mathbb{I}^k$ is bijective from a neighbourhood $V$ of $(q_0, p_0)$ to a neighbourhood $W$ of $(q_0'', p_0')$. Consequently there exists a function, $S = S(q'', p')$ defined on $W$, such that $\Lambda \cap V$ is generated by the function $S$, that is,
\[ \Lambda \cap V = \bigl\{ \bigl(d_{p'} S(q'', p'), q''; \ p', -d_{q''} S(q'', p')\bigr), \ (q'', p') \in W \bigr\}. \]
In this paper we will also consider singular Lagrangian manifolds obtained by taking finite unions of Lagrangians with piecewise smooth boundaries.

3.2. Symplectic relations

3.2.1. Symplectic maps. A symplectic (or “canonical”) diffeomorphism on the torus $\mathbb{T}^{2n}$ is a diffeomorphism $\kappa : \mathbb{T}^{2n} \to \mathbb{T}^{2n}$ which leaves invariant the symplectic form on $\mathbb{T}^{2n}$: $\kappa^* \omega = \omega$. An equivalent characterization of such a map is through its graph $\Gamma_\kappa$, which is the $2n$-dimensional embedded submanifold of $\mathbb{T}^{2n} \times \mathbb{T}^{2n}$, defined as
\[ \Gamma_\kappa = \bigl\{ (\rho'; \rho) : \rho = (q, p) \in \mathbb{T}^{2n}, \ \rho' = \kappa(\rho) \bigr\}. \]


Using the identification $\mathbb{I}^n = \mathbb{R}^n/\mathbb{Z}^n$, we set up the reflection map $\mathbb{I}^n \ni p \mapsto -p \in \mathbb{I}^n$, and define the twisted graph [24, Section 25.2]
\[ \Gamma'_\kappa = \bigl\{ (q', q; p', -p) : (q', p'; q, p) \in \Gamma_\kappa \bigr\} \subset \mathbb{T}^{4n}. \tag{3.1} \]
Then the diffeomorphism $\kappa$ is symplectic iff $\Gamma'_\kappa$ is a Lagrangian submanifold of $\mathbb{T}^{4n}$ (equipped with the symplectic form $\sum_{j=1}^{n} dq'_j \wedge dp'_j + dq_j \wedge dp_j$). For this reason, we will sometimes denote $\Gamma'_\kappa$ by $\Gamma_\kappa$. The definition of the twisted graph is clearly dependent on the choice of the splitting of variables $(q, p)$, which will be related to a choice of polarization in the quantization process. More generally, one can consider invertible maps on $\mathbb{T}^{2n}$ which are smooth and symplectic except on a negligible set of singularities (say, discontinuities on a hypersurface). The twisted graph of such a map is then a singular Lagrangian submanifold of $\mathbb{T}^{4n}$.

Example. The usual “baker’s map” is the following piecewise-linear transformation $\kappa$ on $\mathbb{T}^2$:
\[ \kappa(q, p) \stackrel{\mathrm{def}}{=} \begin{cases} (2q, \ p/2) & \text{if } 0 \le q < 1/2, \\ (2q - 1, \ p/2 + 1/2) & \text{if } 1/2 \le q < 1. \end{cases} \tag{3.2} \]
The twisted graph of $\kappa$:
\[ \Gamma'_\kappa \stackrel{\mathrm{def}}{=} \bigl\{ (q', q; p', -p) : (q, p) \in \mathbb{T}^2, \ (q', p') = \kappa(q, p) \bigr\} \]
is a singular Lagrangian submanifold of $\mathbb{T}^4$. It can be decomposed into $\Gamma'_\kappa = \Gamma'_0 \cup \Gamma'_1$, with the components
\[ \Gamma'_j = \Bigl\{ \Bigl(2q - j, q; \ \frac{p + j}{2}, -p\Bigr) : j/2 \le q < j/2 + 1/2, \ p \in \mathbb{I} \Bigr\} = \bigl\{ (2q - j, q; \ p', -2p' + j) : j/2 \le q, p' < j/2 + 1/2 \bigr\}. \]
Each $\Gamma'_j$ is locally Lagrangian in $\mathbb{T}^4$ and, as a manifold with corners, it is diffeomorphic to a 2-dimensional square.

3.2.2. Multivalued symplectic maps. A canonical (or symplectic) relation is an arbitrary subset $\Gamma \subset \mathbb{T}^{2n} \times \mathbb{T}^{2n}$, such that $\Gamma' = \{(q', q; p', -p) : (q', p'; q, p) \in \Gamma\}$ is a Lagrangian submanifold of $\mathbb{T}^{4n}$. We are interested in symplectic relations coming from multivalued symplectic maps. A multivalued map is the union of finitely many components $\kappa_j$, where $\kappa_j$ is a canonical diffeomorphism between an open subset $S_j$ of $\mathbb{T}^{2n}$ with piecewise smooth boundary and its image $S'_j = \kappa_j(S_j) \subset \mathbb{T}^{2n}$. A priori, the sets $S_j$ (respectively $S'_j$) can overlap, and their union can be a proper subset of $\mathbb{T}^{2n}$. Each map $\kappa_j$ is associated to its graph $\Gamma_j = \{(\kappa_j(\rho); \rho) : \rho \in S_j\}$,


and the symplectic relation can now be defined through its graph
\[ \Gamma = \bigcup_j \Gamma_j, \]
or equivalently its twisted graph $\Gamma'$ (defined from $\Gamma$ as in (3.1)). $\Gamma'$ is a singular Lagrangian in $\mathbb{T}^{4n}$. The inverse relation can be defined by
\[ \Gamma^{-1} \stackrel{\mathrm{def}}{=} \bigl\{ (\rho; \rho') : (\rho'; \rho) \in \Gamma \bigr\} = \bigcup_j \bigl\{ (\kappa_j^{-1}(\rho'); \rho') : \rho' \in S'_j \bigr\}, \]
and the composition of two relations by
\[ \widetilde\Gamma \circ \Gamma \stackrel{\mathrm{def}}{=} \bigl\{ (\rho''; \rho) \in \mathbb{T}^{4n} : \exists\, \rho' \in \mathbb{T}^{2n}, \ (\rho'; \rho) \in \Gamma \ \text{and} \ (\rho''; \rho') \in \widetilde\Gamma \bigr\}. \]
Following [24, Theorem 21.2.4], we note that $\widetilde\Gamma \circ \Gamma$ will be a (locally) smooth symplectic relation if $\widetilde\Gamma \times \Gamma \subset \mathbb{T}^{4n} \times \mathbb{T}^{4n}$ intersects $\{(\rho'', \rho', \rho', \rho) : \rho'', \rho', \rho \in \mathbb{T}^{2n}\} \subset \mathbb{T}^{4n} \times \mathbb{T}^{4n}$ cleanly, that is, the intersections of tangent spaces are the tangent spaces of intersections. We can then iterate a relation $\Gamma$, defining a multivalued dynamical system $\{\Gamma^n : n \in \mathbb{Z}\}$ on $\mathbb{T}^{2n}$. In §3.4 we will give a stochastic interpretation to this system.

3.3. Open baker’s relation. The dynamics we will consider takes place on the 2-torus phase space, $\mathbb{T}^2 = \{\rho = (q, p) : q, p \in \mathbb{I}\}$. On this phase space, we define two vertical strips $S_j$ ($j = 1, 2$) from the data of four real numbers $D_1, D_2 > 1$ and $\ell_1, \ell_2 \ge 0$:
\[ S_j = \bigl\{ (q, p) : q \in I_j, \ p \in \mathbb{I} \bigr\}, \quad \text{with} \quad I_j = \Bigl( \frac{\ell_j}{D_j}, \frac{\ell_j + 1}{D_j} \Bigr), \quad j = 1, 2. \tag{3.3} \]
The strips are assumed to be disjoint, which is the case if we impose the conditions:
\[ \frac{\ell_1 + 1}{D_1} \le \frac{\ell_2}{D_2} \quad \text{and} \quad \frac{\ell_2 + 1}{D_2} \le 1. \]
The corresponding baker’s relation is made of two components $B_j$, $j = 1, 2$, associated with linear symplectic maps defined on the two strips:
\[ B_j = \Bigl\{ (\rho'; \rho) : (q', p') = \Bigl( D_j q - \ell_j, \ \frac{p + \ell_j}{D_j} \Bigr), \ \rho = (q, p) \in S_j \Bigr\}. \tag{3.4} \]
The baker’s relation is defined as the graph $B = B_1 \cup B_2$. One clearly notices that each component map is a hyperbolic diffeomorphism, with positive stretching exponent $\log D_1$ (resp. $\log D_2$). At all points where the map is defined, the unstable (stable) direction is the horizontal (vertical) one.
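As an illustration of (3.3) and (3.4), the following sketch applies one step of the open baker’s relation in exact rational arithmetic and counts the number of steps before a point leaves the strips. It assumes the strips are the open intervals $(\ell_j/D_j, (\ell_j + 1)/D_j)$ with integer $D_j$ and $\ell_j$; the boundary convention is a choice made for this sketch, and the names `baker_step` and `escape_time` are hypothetical.

```python
from fractions import Fraction


def baker_step(q, p, branches):
    """One step of the open baker's relation.

    branches is a list of integer pairs (D_j, ell_j). Returns the image (q', p')
    given by (3.4), or None when q lies in no strip, i.e. the point escapes.
    Strips are treated as open intervals (an assumption of this sketch).
    """
    for d, ell in branches:
        if Fraction(ell, d) < q < Fraction(ell + 1, d):
            return d * q - ell, (p + ell) / d
    return None


def escape_time(q, p, branches, max_steps):
    """Time at which the forward orbit escapes, or None if it survives max_steps steps."""
    for n in range(max_steps):
        image = baker_step(q, p, branches)
        if image is None:
            return n + 1
        q, p = image
    return None


# The example (1.3) corresponds to D_1 = D_2 = 3, ell_1 = 0, ell_2 = 2.
TERNARY = [(3, 0), (3, 2)]

if __name__ == "__main__":
    print(baker_step(Fraction(1, 4), Fraction(1, 2), TERNARY))
    print(escape_time(Fraction(1, 5), Fraction(1, 2), TERNARY, 20))
    print(escape_time(Fraction(1, 4), Fraction(1, 2), TERNARY, 20))
```

In the ternary example the point $q = 1/4$ belongs to the Cantor set (its orbit alternates between $1/4$ and $3/4$), so it never escapes, whereas $q = 1/5$ is mapped to $3/5$, which lies in the gap, and escapes at the second step.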


Since the two strips are disjoint, each point $\rho \in \mathbb{T}^2$ has at most one image. In the notations of Proposition 3.1 (taking $q' = q$, $q'' = q'$), each Lagrangian component $B_j$ can be generated by the function
$$S_j(q, p') = D_j \Big(q - \frac{\ell_j}{D_j}\Big)\Big(p' - \frac{\ell_j}{D_j}\Big), \quad \text{defined on the square } (q, p') \in I_j \times I_j. \tag{3.5}$$
Let $\pi_L, \pi_R : \mathbb{T}^2 \times \mathbb{T}^2 \to \mathbb{T}^2$ be the projections on the left and right factors respectively. From the definition (3.4), the set $\pi_R(B) = S_1 \cup S_2$ is made of the points $\rho \in \mathbb{T}^2$ which have an image through the relation $B$. Hence, a point $\rho \notin \pi_R(B)$ is said to escape from the torus at time 1. Similarly, a point $\rho \notin \pi_L(B) = \pi_R(B^{-1})$ is said to escape from $\mathbb{T}^2$ at time $-1$. This "escape" is the reason why we call this relation an "open" relation: the system is not "closed" because it sends particles "to infinity", both in the future and in the past. We define
$$\Gamma_\pm \stackrel{\rm def}{=} \bigcap_{n=1}^{\infty} \pi_R\big(B^{\mp n}\big), \tag{3.6}$$
the set of points which never escape from $\mathbb{T}^2$ in the past, respectively in the future. One checks that these subsets have the form
$$\Gamma_- = C \times I, \qquad \Gamma_+ = I \times C,$$
where $C \subset I$ is a "cookie-cutter set" in the sense of [16]: if we consider the two contracting maps on $I$,
$$f_j(q) = \frac{q + \ell_j}{D_j}, \quad q \in I, \quad j = 1, 2,$$
this closed set is defined as
$$C = \overline{\bigcup_{n \in \mathbb{N}} \big\{q \in I : f_{j_1} \circ \cdots \circ f_{j_n}(q) = q \ \text{for some sequence } j_m \in \{1, 2\}\big\}}.$$

The Hausdorff dimension of $C$ (which is equal to its Minkowski and box-counting dimensions) is given by the unique $0 < \nu < 1$ solving
$$D_1^{-\nu} + D_2^{-\nu} = 1. \tag{3.7}$$
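As a small numerical illustration of (3.7), the following sketch solves the equation for $\nu$ by bisection. The function name `cantor_dimension` and the tolerance are illustrative choices, not part of the original text; the code only assumes $D_1, D_2 > 1$ with $1/D_1 + 1/D_2 < 1$, which holds when the two strips are disjoint and do not fill the torus.

```python
import math


def cantor_dimension(d1: float, d2: float, tol: float = 1e-12) -> float:
    """Solve d1**(-nu) + d2**(-nu) = 1 for nu in (0, 1) by bisection (eq. (3.7)).

    Hypothetical helper: assumes d1, d2 > 1 and 1/d1 + 1/d2 < 1.
    """
    def excess(nu: float) -> float:
        return d1 ** (-nu) + d2 ** (-nu) - 1.0

    if d1 <= 1 or d2 <= 1 or excess(1.0) >= 0:
        raise ValueError("need d1, d2 > 1 and 1/d1 + 1/d2 < 1")

    lo, hi = 0.0, 1.0  # excess(0) = 1 > 0 and excess(1) < 0; excess is decreasing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Symmetric case D1 = D2 = 3: the closed form is log 2 / log 3.
    print(cantor_dimension(3, 3), math.log(2) / math.log(3))
```

In the symmetric case $D_1 = D_2 = D$ the bisection can be compared with the closed form $\log 2/\log D$, which follows directly from (3.7).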

The trapped set (or set of nonwandering points) is defined as the set of points which never escape from $\mathbb{T}^2$:
$$K = \Gamma_+ \cap \Gamma_- = C \times C, \qquad \dim K = 2\nu.$$
The baker's relation is a hyperbolic invertible map on the set $K$, which is a "fractal repeller". This relation is a model of Smale's horseshoe mechanism.
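To make the escape mechanism concrete, here is a minimal sketch of the open baker's relation (3.4) in exact rational arithmetic. The names `baker_step` and `escape_time` are hypothetical, the default parameters are those of the symmetric 3-baker ($D_1 = D_2 = 3$, $\ell_1 = 0$, $\ell_2 = 2$), and the code assumes integer $D_j$, $\ell_j$ and open intervals $I_j$.

```python
from fractions import Fraction


def baker_step(q, p, D=(3, 3), ell=(0, 2)):
    """Image (q', p') of (q, p) under (3.4), or None if (q, p) lies in no strip.

    Assumes integer D_j and ell_j (so Fractions stay exact) and open intervals I_j.
    """
    for Dj, lj in zip(D, ell):
        if Fraction(lj, Dj) < q < Fraction(lj + 1, Dj):
            return Dj * q - lj, (p + lj) / Dj
    return None


def escape_time(q, p, max_steps=50):
    """Smallest n such that (q, p) has no image through B^n, or None if none is found."""
    for n in range(1, max_steps + 1):
        image = baker_step(q, p)
        if image is None:
            return n
        q, p = image
    return None


if __name__ == "__main__":
    print(escape_time(Fraction(1, 2), Fraction(0)))     # middle third: leaves at once
    print(escape_time(Fraction(1, 4), Fraction(1, 2)))  # q = 1/4 has a periodic orbit in q
```

A point with $q = 1/4$ alternates between $q = 1/4$ and $q = 3/4$ and so never escapes in the future, consistent with $1/4$ belonging to the middle-third Cantor set.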

Distribution of Resonances for Open Quantum Maps


The simplest case consists in considering a symmetric baker's relation, with $D_1 = D_2 = D$ and $\ell = \ell_1 = D - \ell_2 - 1$. In that case, (3.7) reduces to $2 D^{-\nu} = 1$, that is, $\nu = \dfrac{\log 2}{\log D}$.

3.4. Weighted symplectic relations. To give a multivalued map a physical meaning, we assign Markovian weights $P_j(\rho)$ to the different "jumps", $\rho \to \kappa_j(\rho)$. The associated dynamical system is then stochastic, each point $\rho$ having finitely many images with well-prescribed transition probabilities $P_j(\rho)$. The sum of all the probabilities from $\rho$ must satisfy $0 \le P(\rho) \stackrel{\rm def}{=} \sum_j P_j(\rho) \le 1$, so that $1 - P(\rho)$ is the probability that $\rho$ "escapes to infinity". The weights associated with the inverse relation $\Gamma^{-1}$ are the same: each point $\rho'$ jumps back to $\kappa_j^{-1}(\rho')$ with probability $P'_j(\rho') = P_j(\kappa_j^{-1}(\rho'))$. Hence, the weights must also satisfy $0 \le \sum_j P'_j(\rho') \le 1$. Such a weighted relation (in geometric optics one would speak of a "ray-splitting" map) induces a discrete-time evolution of "mass distributions", which is in general dissipative: the total mass decreases at each step, the system expelling part of the mass "to infinity".

In more mathematical terms, we assume that the symplectic relation $\Gamma \subset \mathbb{T}^{2n} \times \mathbb{T}^{2n}$ comes with a nonnegative measure (or weight) $\mu$ on $\Gamma$, which for any $\chi_\alpha \in C^\infty(\mathbb{T}^{2n}, [0, 1])$, $\alpha = L, R$, satisfies
$$(\pi_\alpha)_* \big(\pi_L^* \chi_L\, \pi_R^* \chi_R\, \mu\big) = g_\alpha^{\chi_L \chi_R}\, \frac{\omega^n}{n!}, \qquad g_\alpha^{\chi_L \chi_R} \in C^\infty(\mathbb{T}^{2n}), \quad 0 \le g_\alpha^{\chi_L \chi_R} \le 1, \tag{3.9}$$
where $\pi_L, \pi_R : \Gamma \to \mathbb{T}^{2n}$ are projections on left and right factors respectively, and $\omega$ is the symplectic form on $\mathbb{T}^{2n}$. The condition (3.9) implies that $\pi_\alpha|_\Gamma$ is a local bijection, which forces $\Gamma$ to be a piecewise smooth union of graphs of symplectic transformations, as defined in §3.2.2. When $\Gamma$ is singular, that is, a union of smooth symplectic relations with boundaries, we demand that
$$g_\alpha^{\chi_L \chi_R} \in C^\infty(\mathbb{T}^{2n}) \quad \text{if } \operatorname{supp}(\pi_L^* \chi_L\, \pi_R^* \chi_R) \cap \partial\Gamma = \emptyset,$$
where $\partial\Gamma$ is the union of the boundaries of the smooth components. The reason for introducing the measure $\mu$ is to have a quantity independent of the choice of coordinates on $\Gamma$. On $\mathbb{T}^{2n}$, an obvious intrinsic measure is given by the symplectic form, hence the functions $g_\alpha^{\chi_L \chi_R}$ are well defined. Building an atlas of the manifold $\Gamma$, we can use these functions to describe $\mu$ in local coordinates.


We denote a weighted relation by $(\Gamma, \mu)$. As explained above, one can invert such a relation, as well as compose them. If $(\rho'; \rho) \in \Gamma \setminus \partial\Gamma$, the probability of a transition from $\rho$ to $\rho' = \kappa_{j_1}(\rho)$ is obtained by letting $\chi_R$ (resp. $\chi_L$) be supported in a sufficiently small neighbourhood of $\rho$ (resp. of $\rho'$), with $\chi_R(\rho) = 1$, $\chi_L(\rho') = 1$. This probability is then given by
$$P_{j_1}(\rho) = g_R^{\chi_L \chi_R}(\rho) = g_L^{\chi_L \chi_R}(\rho') = P'_{j_1}(\rho'). \tag{3.10}$$

Examples. The simplest example is given by a graph $\Gamma_\kappa$ of a symplectic transformation $\kappa : \mathbb{T}^{2n} \to \mathbb{T}^{2n}$, in which case the density $\mu$ is obtained by taking $\mu = \pi_L^*(\omega^n/n!) = \pi_R^*(\omega^n/n!)$, where the equality follows from $\kappa^* \omega = \omega$. A slightly more complicated example is given by taking a union of two non-intersecting graphs $\Gamma_j$ of $\kappa_j$, $j = 1, 2$, and putting
$$\mu = (\pi_R|_{\Gamma_1})^*(g_1\, \omega^n/n!) + (\pi_R|_{\Gamma_2})^*(g_2\, \omega^n/n!),$$
where $g_j \in C^\infty(\mathbb{T}^{2n}; [0, 1])$ satisfy $g_1 + g_2 \le 1$ and $g_1 \circ \kappa_1^{-1} + g_2 \circ \kappa_2^{-1} \le 1$. In this case, $g_j(\rho) = P_j(\rho)$. In the case of an open baker $B$ defined in §3.3, for instance the symmetric 3-baker (1.3), a natural $\mu$ comes from pulling back the Liouville measure $\omega$ to each component $B_j$ given in (3.4). One obtains
$$(\pi_R)_* \mu = \mathbb{1}_{I_1 \cup I_2}(q)\, dq\, dp, \qquad (\pi_L)_* \mu = \mathbb{1}_{I_1 \cup I_2}(p')\, dq'\, dp'. \tag{3.11}$$

These equations fully determine the measure $\mu$ on $B$. A more interesting example, which will be relevant in §5, is given by the following multivalued generalization of the symmetric 3-baker:
$$\tilde B = \bigcup_{\ell = 0}^{2} \big(B + (0, \ell/3; 0, 0)\big) = \bigcup_{k=1}^{2} \bigcup_{j=0}^{2} B_{kj}, \quad \text{where}$$
$$B_{kj} = \Big\{\Big(3q, \frac{p + j}{3};\ q, p\Big) : q \in I_k,\ p \in I\Big\}, \qquad I_1 = (0, 1/3), \quad I_2 = (2/3, 1). \tag{3.12}$$
Each point $\rho \in S_1 \cup S_2 = (I_1 \cup I_2) \times I$ has 3 images, and each point $\rho' \in \mathbb{T}^2$ has two preimages. The following measure on $\tilde B$ will arise in the quantum model studied in §5. We define it explicitly on each component $B_{kj}$, using the right projection on $S_k$:
$$(\pi_R)_* \tilde\mu|_{B_{1j}} = \frac{\sin^2(\pi p)}{9 \sin^2(\pi (p + j)/3)}\, \mathbb{1}_{I_1}(q)\, dq\, dp, \qquad (\pi_R)_* \tilde\mu|_{B_{2j}} = \frac{\sin^2(\pi p)}{9 \sin^2(\pi (p + j - 2)/3)}\, \mathbb{1}_{I_2}(q)\, dq\, dp, \qquad j = 0, 1, 2. \tag{3.13}$$
The functions on the right-hand sides are the probabilities $P_j(\rho)$. The sum of these components reads
$$(\pi_R)_* \tilde\mu = \Bigg(\sum_{j=0}^{2} \frac{1}{9}\, \frac{\sin^2(\pi p)}{\sin^2 \pi (p/3 + j/3)}\Bigg) \mathbb{1}_{I_1 \cup I_2}(q)\, dq\, dp = \mathbb{1}_{I_1 \cup I_2}(q)\, dq\, dp.$$


Here we used the fact$^4$ that $\sum_{j=0}^{D-1} \sin^2(Dx)/\sin^2(x + j\pi/D) = D^2$, with $D = 3$ and $x = \pi p/3$. This right pushforward is identical to that of (3.11): in both cases, any point $\rho \in (S_1 \cup S_2)$ has zero escape probability, $1 - P(\rho) = 0$. By contrast, the left pushforward of $\tilde\mu$ is given by
$$(\pi_L)_* \tilde\mu = \frac{\sin^2 3\pi p'}{9} \Big(\frac{1}{\sin^2 \pi p'} + \frac{1}{\sin^2 \pi (p' - 2/3)}\Big)\, dq'\, dp'.$$
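The trigonometric identity and the two pushforwards can be checked numerically. The sketch below is only a sanity check under the formulas (3.13) as stated above; the function names are illustrative and the arguments must avoid the zeros of the denominators (for instance $p = 0$).

```python
import math


def sine_sum(x: float, D: int) -> float:
    """Sum over j = 0..D-1 of sin^2(D x) / sin^2(x + j pi / D); expected to equal D^2."""
    return sum(math.sin(D * x) ** 2 / math.sin(x + j * math.pi / D) ** 2
               for j in range(D))


def right_probabilities(p: float) -> list:
    """Transition probabilities P_j, j = 0, 1, 2, on the strip S_1, read off from (3.13)."""
    return [math.sin(math.pi * p) ** 2 / (9 * math.sin(math.pi * (p + j) / 3) ** 2)
            for j in range(3)]


def left_density(pp: float) -> float:
    """Density of the left pushforward with respect to dq' dp', at momentum p' = pp."""
    prefactor = math.sin(3 * math.pi * pp) ** 2 / 9
    return prefactor * (1 / math.sin(math.pi * pp) ** 2
                        + 1 / math.sin(math.pi * (pp - 2 / 3)) ** 2)


if __name__ == "__main__":
    print(sine_sum(0.37, 3))              # close to 9
    print(sum(right_probabilities(0.3)))  # close to 1: no escape through B~
    print(left_density(0.5))              # strictly less than 1: escape through the inverse
```

The right probabilities sum to one, while the left density stays below one at generic $p'$; the deficit is the escape probability through the inverse relation.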

Almost any point $\rho' \in \mathbb{T}^2$ has a nonzero escape probability through $\tilde B^{-1}$. This left pushforward is obviously different from that of $\mu$.

$^4$ The value of the sum at $x = 0$ is equal to $D^2$, and the sum is invariant under translation $x \to x + k\pi/D$. Fejér's formula for the Cesàro mean of the Fourier series shows that the sum is a trigonometric polynomial of degree $D - 1$ in $x$, hence it is constant.

4. Quantized Maps and Relations

Before giving the definition of the quantized baker's relation, we need to define the quantum Hilbert space corresponding to $\mathbb{T}^2$, as well as the algebra of quantum observables.

4.1. Quantized tori. The quantization of tori $\mathbb{T}^{2n} = \mathbb{R}^{2n}/\mathbb{Z}^{2n}$ has a long tradition in mathematical physics [21, 13, 5]. It can be considered as a special case of the Berezin-Toeplitz quantization of compact symplectic Kähler manifolds; see [27] and references given there. Here we will give a self-contained presentation of the simplest case from the point of view of pseudodifferential operators. We first recall from [14] the quantization of functions $f \in C_b^\infty(T^*\mathbb{R}^n)$,
$$C_b^\infty(T^*\mathbb{R}^n) \stackrel{\rm def}{=} \Big\{f \in C^\infty(T^*\mathbb{R}^n) : \forall \alpha, \beta \in \mathbb{N}^n, \ \sup_{(q,p) \in T^*\mathbb{R}^n} |\partial_q^\alpha \partial_p^\beta f(q, p)| < \infty\Big\}.$$
To any $f \in \mathcal{S}(T^*\mathbb{R}^n)$ we associate its $h$-Weyl quantization, that is, the operator $f^w(q, hD)$ acting as follows on $\psi \in \mathcal{S}(\mathbb{R}^n)$:
$$[f^w(q, hD)\, \psi](q) \stackrel{\rm def}{=} \frac{1}{(2\pi h)^n} \iint f\Big(\frac{q + r}{2}, p\Big)\, e^{\frac{i}{h}\langle q - r, p\rangle}\, \psi(r)\, dr\, dp. \tag{4.1}$$
This operator clearly has the mapping properties
$$f^w(q, hD) : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n), \qquad f^w(q, hD) : \mathcal{S}'(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n).$$
It can be shown [14, Lemma 7.8] that $f \mapsto f^w(q, hD)$ can be extended to any $f \in C_b^\infty(T^*\mathbb{R}^n)$, and that the resulting operator has the same mapping properties. Furthermore, $f^w(q, hD)$ is a bounded operator on $L^2(\mathbb{R}^n)$.

We now introduce quantum spaces associated with the torus $\mathbb{T}^{2n}$. For that aim, we fix our notations for the semiclassical Fourier transform on $\mathcal{S}'(\mathbb{R}^n)$:
$$F_h \psi(p) \stackrel{\rm def}{=} \frac{1}{(2\pi h)^{n/2}} \int \psi(q)\, e^{-\frac{i}{h}\langle q, p\rangle}\, dq,$$
and as usual in quantum mechanics, $F_h \psi(p)$ is the "momentum representation" of the state $\psi$. The torus quantum space is made of distributions $\psi \in \mathcal{S}'(\mathbb{R}^n)$ which are periodic both in position and in momentum:
$$\psi(q + \ell) = \psi(q), \qquad F_h \psi(p + \ell) = F_h \psi(p), \qquad \ell \in \mathbb{Z}^n. \tag{4.2}$$
Let us denote by $\mathcal{H}_h^n$ this space of distributions. We have the following elementary

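Lemma 4.1. $\mathcal{H}_h^n \ne \{0\}$ if and only if $h = (2\pi N)^{-1}$ for some positive integer $N$, in which case $\dim \mathcal{H}_h^n = N^n$ and $\mathcal{H}_h^n$ is generated by the following basis:
$$\mathcal{H}_h^n = \operatorname{span}\Big\{\frac{1}{\sqrt{N^n}} \sum_{\ell \in \mathbb{Z}^n} \delta(q - \ell - j/N) : j \in (\mathbb{Z}/N\mathbb{Z})^n\Big\}. \tag{4.3}$$
The distributions forming this basis will be denoted by
$$|Q_j\rangle, \quad \text{where } Q_j = \frac{j}{N} \in I^n \text{ is the position on which that state is microlocalized.} \tag{4.4}$$
One can check that for such a value of $h$, the Fourier transform $F_h$ maps $\mathcal{H}_h^n$ to itself. In the above basis, it is represented by the discrete Fourier transform
$$(F_N)_{j j'} = \frac{e^{-2i\pi \langle j, j'\rangle/N}}{N^{n/2}}, \qquad j, j' \in (\mathbb{Z}/N\mathbb{Z})^n. \tag{4.5}$$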
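For $n = 1$ the matrix (4.5) is the usual normalized discrete Fourier transform, and its unitarity can be verified directly. The following is a minimal sketch in plain Python; the helper names `dft_matrix` and `is_unitary` are hypothetical.

```python
import cmath


def dft_matrix(N: int) -> list:
    """Matrix (F_N)_{j j'} = exp(-2 i pi j j' / N) / sqrt(N), i.e. (4.5) with n = 1."""
    return [[cmath.exp(-2j * cmath.pi * j * k / N) / N ** 0.5 for k in range(N)]
            for j in range(N)]


def is_unitary(M: list, tol: float = 1e-10) -> bool:
    """Check that the rows of M are orthonormal, i.e. M M^* = Id up to tol."""
    N = len(M)
    for a in range(N):
        for b in range(N):
            inner = sum(M[a][k] * M[b][k].conjugate() for k in range(N))
            if abs(inner - (1.0 if a == b else 0.0)) > tol:
                return False
    return True


if __name__ == "__main__":
    print(is_unitary(dft_matrix(7)))
```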

It is also easy to check the following

Lemma 4.2. Suppose that $f \in C_b^\infty(\mathbb{R}^n \times \mathbb{R}^n)$ satisfies $f(q + \ell, p + m) = f(q, p)$ for any $\ell, m \in \mathbb{Z}^n$. Then the operator $f^w(q, hD)$ maps $\mathcal{H}_h^n$ to itself.

Identifying a function $f \in C^\infty(\mathbb{T}^{2n})$ with a periodic function on $\mathbb{R}^{2n}$, we will write $\operatorname{Op}_h(f)$ for the restriction of $f^w(q, hD)$ on $\mathcal{H}_h^n$,
$$C^\infty(\mathbb{T}^{2n}) \ni f \longmapsto \operatorname{Op}_h(f) \in \mathcal{L}(\mathcal{H}_h^n).$$
We remark that $\operatorname{Op}_h(1) = \mathrm{Id}$. The vector space $\mathcal{H}_h^n$ can be equipped with a natural Hilbert structure.

Lemma 4.3. There exists a unique (up to a multiplicative constant) Hilbert structure on $\mathcal{H}_h^n$ for which all $\operatorname{Op}_h(f) : \mathcal{H}_h^n \to \mathcal{H}_h^n$ with $f \in C^\infty(\mathbb{T}^{2n}; \mathbb{R})$ are self-adjoint. One can choose the constant such that the basis in (4.3) is orthonormal. This implies that the Fourier transform on $\mathcal{H}_h^n$ (represented by the unitary matrix (4.5)) is unitary.

Proof. Let $\langle \bullet, \bullet\rangle_0$ be the inner product for which the basis in (4.3) is orthonormal. We write the operator $f^w(q, hD)$ on $\mathcal{H}_h^n$ explicitly in that basis using the Fourier expansion of its symbol:
$$f(q, p) = \sum_{\ell, m \in \mathbb{Z}^n} \hat f(\ell, m)\, e^{2\pi i (\langle \ell, q\rangle + \langle m, p\rangle)}.$$
For that let $L_{\ell,m}(q, p) = \langle \ell, q\rangle + \langle m, p\rangle$, so that
$$f^w(q, hD) = \sum_{\ell, m \in \mathbb{Z}^n} \hat f(\ell, m) \exp\big(2\pi i L^w_{\ell,m}(q, hD)\big).$$


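Applying this operator to the distributions (4.4), we get
$$\exp\big(2\pi i L^w_{\ell,m}(q, hD)\big)\, |Q_j\rangle = \exp\Big(\frac{\pi i}{N}\big(2\langle j, \ell\rangle - \langle m, \ell\rangle\big)\Big)\, |Q_{j-m}\rangle,$$
and consequently,
$$f^w(q, hD)\, |Q_j\rangle = \sum_{m \in \mathbb{Z}^n/(N\mathbb{Z})^n} F_{mj}\, |Q_m\rangle, \qquad F_{mj} = \sum_{\ell, r \in \mathbb{Z}^n} \hat f(\ell, j - m - rN)\, (-1)^{\langle r, \ell\rangle} \exp\Big(\frac{\pi i}{N}\langle j + m, \ell\rangle\Big).$$
Since
$$\bar F_{jm} = \sum_{\ell, r \in \mathbb{Z}^n} \widehat{\bar f}(-\ell, j - m + rN)\, (-1)^{\langle r, \ell\rangle} \exp\Big(-\frac{\pi i}{N}\langle j + m, \ell\rangle\Big) = \sum_{\ell, r \in \mathbb{Z}^n} \widehat{\bar f}(\ell, j - m - rN)\, (-1)^{\langle r, \ell\rangle} \exp\Big(\frac{\pi i}{N}\langle j + m, \ell\rangle\Big),$$
we see that for real $f$, $f = \bar f$, $F_{jm} = \bar F_{mj}$. This means that $f^w(q, hD)$ is self-adjoint for the inner product $\langle \bullet, \bullet\rangle_0$. We also see that the map $f \mapsto (F_{jm})_{j, m \in (\mathbb{Z}/N\mathbb{Z})^n}$ is onto, from $C^\infty(\mathbb{T}^{2n}; \mathbb{R})$ to the space of Hermitian matrices. Any other metric on $\mathcal{H}_h^n$ could be written as $\langle u, v\rangle = \langle Bu, v\rangle_0 = \langle u, Bv\rangle_0$. If $\langle f^w u, v\rangle = \langle u, f^w v\rangle$ for all $f$'s, then $B f^w = f^w B$ for all $f$'s, and hence for all Hermitian matrices. That shows that $B = c\, \mathrm{Id}$, as claimed. $\square$

This choice of normalization $\langle \bullet, \bullet\rangle_0$ can be obtained in a natural way, if we use the following periodization operator to construct $\mathcal{H}_h^n$ from $\mathcal{S}(\mathbb{R}^n)$ [5]:

Lemma 4.4. For any $h = (2\pi N)^{-1}$, the periodization operator $P_{\mathbb{T}^{2n}} : \mathcal{S}(\mathbb{R}^n) \to \mathcal{H}_h^n$ defined below is surjective:
$$\forall \psi \in \mathcal{S}(\mathbb{R}^n), \qquad [P_{\mathbb{T}^{2n}} \psi](Q_j) \stackrel{\rm def}{=} \frac{1}{N^{n/2}} \sum_{\nu \in \mathbb{Z}^n} \psi(Q_j - \nu), \qquad j \in (\mathbb{Z}/N\mathbb{Z})^n. \tag{4.6}$$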
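The matrix elements used in the proof of Lemma 4.3 can be tested numerically for $n = 1$. The sketch below builds $\operatorname{Op}_h(f)$ from finitely many Fourier coefficients $\hat f(\ell, m)$, using the action of $\exp(2\pi i L^w_{\ell,m})$ on $|Q_j\rangle$ quoted in that proof, and checks that a real symbol gives a Hermitian matrix. The names `translation`, `weyl_matrix` and `is_hermitian` are illustrative, and the symbol is assumed to be a trigonometric polynomial.

```python
import cmath


def translation(N: int, l: int, m: int) -> list:
    """Matrix of exp(2 pi i L^w_{l,m}) in the basis |Q_j>, for n = 1.

    Column j has a single entry exp(i pi (2 j l - m l) / N) in row (j - m) mod N.
    """
    T = [[0j] * N for _ in range(N)]
    for j in range(N):
        T[(j - m) % N][j] = cmath.exp(1j * cmath.pi * (2 * j * l - m * l) / N)
    return T


def weyl_matrix(N: int, coeffs: dict) -> list:
    """Matrix of Op_h(f) for f = sum of coeffs[(l, m)] * exp(2 pi i (l q + m p))."""
    F = [[0j] * N for _ in range(N)]
    for (l, m), c in coeffs.items():
        T = translation(N, l, m)
        for a in range(N):
            for b in range(N):
                F[a][b] += c * T[a][b]
    return F


def is_hermitian(F: list, tol: float = 1e-12) -> bool:
    N = len(F)
    return all(abs(F[a][b] - F[b][a].conjugate()) <= tol
               for a in range(N) for b in range(N))


if __name__ == "__main__":
    # Real symbol f = cos(2 pi q) + cos(2 pi p) + sin(2 pi (q + p)).
    f_hat = {(1, 0): 0.5, (-1, 0): 0.5, (0, 1): 0.5, (0, -1): 0.5,
             (1, 1): -0.5j, (-1, -1): 0.5j}
    print(is_hermitian(weyl_matrix(5, f_hat)))
```

For a symbol depending on $q$ only, the same construction gives a diagonal matrix with entries $f(Q_j)$, as expected for a multiplication operator.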

In the rest of this article we will always assume that $h = (2\pi N)^{-1}$ for some $N \in \mathbb{N}$, so the semiclassical limit corresponds to $N \to \infty$. The scalar product on $\mathcal{H}_h^n$ will be $\langle \bullet, \bullet\rangle_0$. From now on we will omit the subscript 0, and also often use Dirac's notation $\langle \bullet | \bullet\rangle$ for this product. For instance, the $j$-th component of a state $\psi \in \mathcal{H}_h^n$ in the basis (4.4) will be denoted by $\psi(Q_j) = \langle Q_j | \psi\rangle$. The Hilbert norm associated with $\langle \bullet, \bullet\rangle$ will simply be written $\|\bullet\|$.

4.2. Lagrangian states. We want to characterize the semiclassical localization in phase space of sequences of states of the form $\psi = \{\psi_h \in \mathcal{H}_h^n\}_{h \to 0}$. In general we will assume that each element of this sequence is normalized, $\|\psi_h\| = 1$, but all definitions can be extended to sequences such that the norms satisfy $\|\psi_h\| = O(h^K)$ as $h \to 0$, for some fixed $K \in \mathbb{R}$ (the sequence $\psi$ is then said to be tempered).


The localization of this sequence is first characterized through its microsupport, or wave front set, which is the following subset of $\mathbb{T}^{2n}$:
$$\operatorname{WF}_h(\psi) = \complement \big\{\rho \in \mathbb{T}^{2n} : \exists f \in C^\infty(\mathbb{T}^{2n}),\ f(\rho) \ne 0,\ \|\operatorname{Op}_h(f)\psi_h\| = O(h^\infty)\big\}, \tag{4.7}$$
where $\complement$ stands for the set theoretical complement. It is not hard to show [44, Prop. IV-8] that this definition is equivalent to the following: $\rho \notin \operatorname{WF}_h(\psi)$ if and only if there exists a neighbourhood $W_\rho$ of $\rho$ such that, for any $f \in C^\infty(\mathbb{T}^{2n})$ supported in $W_\rho$, $\|\operatorname{Op}_h(f)\psi_h\| = O(h^\infty)$. This yields the following

Lemma 4.5. For any function $f \in C^\infty(\mathbb{T}^{2n})$ with $f \equiv 0$ in an open neighbourhood of $\operatorname{WF}_h(\psi)$, we have $\|\operatorname{Op}_h(f)\psi_h\| = O(h^\infty)$. As a consequence, the microsupport of a sequence $\psi = \{\psi_h\}$, $\|\psi_h\| \asymp h^K$, cannot be empty.

Proof. The (compact) support of $f$ can be covered by finitely many $W_{\rho_i}$, and using a partition of unity associated with these sets we can decompose it as $f = \sum_i f_i$, with $\operatorname{supp}(f_i) \subset W_{\rho_i}$. We get the result by linearity, and using the second definition of $\operatorname{WF}_h(\psi)$. $\square$

We also make the following observation:

Lemma 4.6. Let $\psi = \{\psi_h \in \mathcal{H}_h^n\}_{h \to 0}$ be a tempered sequence. Considering $\psi_h$ as an $N^n$-component vector in the basis (4.3), we define $\bar\psi_h$ as the vector with complex conjugate components. Then
$$\operatorname{WF}_h(\bar\psi) = \{(q, -p) : (q, p) \in \operatorname{WF}_h(\psi)\}.$$

Proof. The definition (4.1) of Weyl's quantization gives, for any function $f \in C^\infty(\mathbb{T}^{2n})$,
$$\operatorname{Op}_h(f)\, \bar\psi = f^w(q, hD)\, \bar\psi = \overline{\bar f^w(q, -hD)\, \psi}.$$
The lemma follows from the definition (4.7) of the wave front set. $\square$

Now let $\Lambda \subset \mathbb{T}^{2n}$ be a union of Lagrangian submanifolds of $\mathbb{T}^{2n}$ with piecewise smooth boundaries.

Definition 4.7. A sequence of states $\psi = \{\psi_h \in \mathcal{H}_h^n\}$ is a Lagrangian state associated to $\Lambda$, which we denote by $\psi \in I(\Lambda)$, if for any $M \in \mathbb{N}$ and any sequence of functions $f_j \in C^\infty(\mathbb{T}^{2n})$, $f_j|_\Lambda = 0$, $1 \le j \le M$, we have
$$\|\operatorname{Op}_h(f_M) \circ \cdots \circ \operatorname{Op}_h(f_1)\, \psi_h\| = O(h^M)\, \|\psi_h\|. \tag{4.8}$$

From the definition (4.7) of the microsupport, we obtain that, if the sequence $\psi$ is tempered, then
$$\psi \in I(\Lambda) \Longrightarrow \operatorname{WF}_h(\psi) \subset \Lambda. \tag{4.9}$$
Indeed, suppose that $\rho \notin \Lambda$. Then there exists $f \in C^\infty(\mathbb{T}^{2n})$ such that $f|_\Lambda = 0$ and $f \equiv 1$ in a neighbourhood of $\rho$. We can also find $a \in C^\infty(\mathbb{T}^{2n})$ such that $f = 1$ on a neighbourhood of the support of $a$, and $a(\rho) \ne 0$. The symbol calculus (see [14, Chap. 7]) shows that for any $M$, $\operatorname{Op}_h(a)\operatorname{Op}_h(f)^M = \operatorname{Op}_h(a) + O_M(h^\infty)$. On the other hand $\|\operatorname{Op}_h(f)^M \psi_h\| = O(h^M \|\psi_h\|)$, and as $M$ is arbitrary and $\psi$ tempered, $\|\operatorname{Op}_h(a)\, \psi_h\| = O(h^\infty)$. In view of (4.7), this gives (4.9).

We stress that the opposite implication in (4.9) is not true in general. To see that, consider $n = 1$ and the Lagrangian $\Lambda = \{(0, p) : p \in I\} \subset \mathbb{T}^2$. Let $\psi_h \in \mathcal{H}_h^1$ be the "torus coherent state at the origin":
$$\psi_h(Q_j) = \Big(\frac{2}{N}\Big)^{1/4} \sum_{r \in \mathbb{Z}} \exp\{-\pi N (Q_j - r)^2\}, \qquad j = 0, \ldots, N - 1.$$
Then one can check that $\|\psi_h\| \to 1$ as $h \to 0$, and that $\operatorname{WF}_h(\psi) = \{(0, 0)\} \subset \Lambda$. On the other hand, $\|\operatorname{Op}_h(\sin(2\pi q))\, \psi_h\| \sim \pi\sqrt{2h}$, which shows that $\psi_h \notin I(\Lambda)$.

In the physics literature, Lagrangian states are usually called WKB states, and are introduced as Ansätze for eigenstates of integrable systems, using Bohr-Sommerfeld quantization formulae [28]. For instance, in the case $n = 1$, if $\Lambda$ is generated by the function $S \in C^\infty(I)$:
$$\Lambda_S = \{(q, -S'(q)),\ q \in I\}, \tag{4.10}$$
then for any function $a(q) \in C^\infty(I)$, the state $\psi_h \in \mathcal{H}_h^1$ defined as
$$\psi_h(Q_j) = \frac{a(Q_j)}{\sqrt{N}} \exp(-2i\pi N S(Q_j)), \qquad j = 0, \ldots, N - 1, \tag{4.11}$$

is in $I(\Lambda_S)$. In the next proposition, we generalize this construction to any dimension.

Proposition 4.8. Let $\Lambda \subset \mathbb{T}^{2n}$ be an embedded Lagrangian manifold. Then for any $\rho_0 \in \Lambda$ there exist Lagrangian states $\psi \in I(\Lambda)$, such that $\rho_0 \in \operatorname{WF}_h(\psi)$.

Proof. We take $\rho_0 = (q_0, p_0) \in \Lambda$, and assume that there exists a neighbourhood $V$ of $\rho_0$, and a function $S \in C^\infty(\pi(V))$ (where $\pi(q, p) = q$), such that $\Lambda \cap V = \{(q; -d_q S(q)),\ q \in \pi(V)\}$. This is a particular case of Proposition 3.1. The general case of a generating function $S(q', p'')$ can be transformed to that of $S = S(q)$ using the symplectic rotation $(q'', p'') \to (-p'', q'')$. On the quantum mechanical side, this rotation is performed through a partial Fourier transform in the variable $q''$. Our construction below can be transposed to this general case through this Fourier transform (which acts covariantly on the Weyl quantization). We also assume that the neighbourhood $V$ is contained in the interior of $I^{2n}$, and we identify $\pi(V)$ with a subset of $I^n$. We first construct a Lagrangian state in $L^2(\mathbb{R}^n)$:
$$u_h(q) = a(q)\, e^{-\frac{i}{h} S(q)}, \tag{4.12}$$
with a symbol $a \in C^\infty(\mathbb{R}^n)$ compactly supported inside $\pi(V)$, and such that $a(q_0) \ne 0$. This state admits the norm $\|u_h\|_{L^2} = \|a\|_{L^2}$. For any $f \in C^\infty(\mathbb{T}^{2n})$, we apply the operator $f^w(q, hD)$ to that state. Although we could do it directly using (4.1), we prefer to reduce the problem to the case of $S = 0$ by conjugation with the unitary multiplication operator
$$v(q) \longmapsto [e^{\frac{i}{h} S^w(q)} v](q) = e^{\frac{i}{h} S(q)} v(q), \tag{4.13}$$


where we can assume that $S \in C_b^\infty(\mathbb{R}^n)$. We then apply the operator
$$G^w(q, hD) \stackrel{\rm def}{=} e^{\frac{i}{h} S^w(q)}\, f^w(q, hD)\, e^{-\frac{i}{h} S^w(q)}$$
to the function $a(q)$. The symbol calculus shows that $G(q, p)$ admits an $h$-expansion, with principal symbol $g(q, p) = f(q, p - d_q S(q))$: if $f$ vanishes on $\Lambda$, then $g$ vanishes on $\{(q, 0) : q \in \pi(V)\}$. We get
$$[f^w(q, hD)\, u_h](q) = e^{-\frac{i}{h} S^w(q)}\, G^w(q, hD)\, a(q) = e^{-\frac{i}{h} S^w(q)}\, g^w(q, hD)\, a(q) + O(h).$$
The explicit integral
$$[g^w(q, hD)\, a](q) = \frac{1}{(2\pi h)^n} \iint g\Big(\frac{q + r}{2}, p\Big)\, a(r)\, e^{\frac{i}{h}\langle q - r, p\rangle}\, dr\, dp$$
can be evaluated through the stationary phase method. The derivative of the phase vanishes at $r = q$, $p = 0$, so the integral admits the following expansion [23, Section 7.7] for $q \in \pi(V)$:
$$[g^w(q, hD)\, a](q) = L_0(g\, a)(q) + h\, L_1(g\, a)(q) + O(h^2). \tag{4.14}$$

Here each function $L_j(g\, a)$ is obtained by applying a certain differential operator (in $(r, p)$) on the function $g((q + r)/2, p)\, a(r)$, taking the output at the point $(r = q, p = 0)$. The first term is simply $L_0(g\, a)(q) = g(q, 0)\, a(q)$. For $q$ outside $\pi(V)$, the nonstationary phase estimates show that
$$f^w(q, hD)\, u_h(q) = O\bigg(\Big(\frac{h}{h + \operatorname{dist}(q, \pi(V))}\Big)^{\infty}\bigg). \tag{4.15}$$
If $f(\rho_0) \ne 0$, then $L_0(g\, a)$ is nonzero in a neighbourhood $W$ of $q_0$, and we obtain
$$\|f^w(q, hD)\, u_h\|_{L^2(\mathbb{R}^n)} = \|g^w(q, hD)\, a\|_{L^2(\mathbb{R}^n)} + O(h) \ge \|L_0(g\, a)\|_{L^2(W)} + O(h). \tag{4.16}$$
The left-hand side is thus bounded from below by a positive constant. By contrast, if $f$ vanishes on $\Lambda$, then at each point $q \in \pi(V)$ we get $L_0(g\, a)(q) = 0$, which implies that $\|f^w(q, hD)\, u_h\|_{L^2(\mathbb{R}^n)} = O(h)$. The same procedure can be iterated to show that, for any family of functions $f_i \in C^\infty(\mathbb{T}^{2n})$ vanishing on $\Lambda$, the function
$$u_h^{(M)}(q) \stackrel{\rm def}{=} h^{-M}\, [f_M^w \circ \cdots \circ f_1^w\, u_h](q) \tag{4.17}$$
is uniformly bounded and smooth on $\mathbb{R}^n$, and very small outside $\pi(V)$, as in (4.15). As a result,
$$\|f_M^w(q, hD) \circ \cdots \circ f_1^w(q, hD)\, u_h\|_{L^2(\mathbb{R}^n)} = h^M\, \|u_h^{(M)}\| = O(h^M). \tag{4.18}$$
We can now carry over the estimates (4.16), (4.18) onto the state $\psi_h = P_{\mathbb{T}^{2n}} u_h \in \mathcal{H}_h^n$, where $P_{\mathbb{T}^{2n}}$ is the periodizing operator (4.6). Since $a(q)$ was supported inside $\pi(V) \subset I^n$, this state admits the following representation, which generalizes (4.10):
$$\psi_h(Q_j) = \frac{u_h(Q_j)}{N^{n/2}} = \frac{a(Q_j)}{N^{n/2}} \exp(-2i\pi N S(Q_j)), \qquad j \in (\mathbb{Z}/N\mathbb{Z})^n. \tag{4.19}$$


The norm of this state is therefore the sum
$$\|\psi_h\|^2 = N^{-n} \sum_{j \in (\mathbb{Z}/N\mathbb{Z})^n} |a(Q_j)|^2 = \int dq\, |a(q)|^2 + O(h^\infty),$$
where we used the smoothness of $a(q)$. Similarly, the projection on $\mathcal{H}_h^n$ of the function $u_h^{(M)}$ defined in (4.17) satisfies
$$P_{\mathbb{T}^{2n}} u_h^{(M)}(Q_j) = \frac{u_h^{(M)}(Q_j)}{N^{n/2}} + O(h^\infty), \qquad j \in (\mathbb{Z}/N\mathbb{Z})^n. \tag{4.20}$$
From the smoothness of $u_h^{(M)}$, we obtain the "projected version" of (4.18):
$$\|\operatorname{Op}_h(f_M) \circ \cdots \circ \operatorname{Op}_h(f_1)\, \psi_h\| = h^M\, \|P_{\mathbb{T}^{2n}} u_h^{(M)}\| = h^M\, \|u_h^{(M)}\|_{L^2(\mathbb{R}^n)} + O(h^\infty) = O(h^M).$$
On the other hand, if $f(\rho_0) \ne 0$, one easily deduces from (4.16) that
$$\|\operatorname{Op}_h(f)\, \psi_h\| = \|f^w(q, hD)\, u_h\|_{L^2(\mathbb{R}^n)} + O(h^\infty) \ge C + O(h), \qquad C > 0.$$
These estimates show that the family $\psi \in I(\Lambda)$, and that $\rho_0 \in \operatorname{WF}_h(\psi)$. $\square$
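The counterexample given before Proposition 4.8 (the torus coherent state at the origin, for $n = 1$) lends itself to a quick numerical check. The sketch below assumes that $\operatorname{Op}_h(\sin(2\pi q))$ acts as multiplication by $\sin(2\pi Q_j)$ in the position basis, which holds because the symbol depends on $q$ only, and truncates the sum over $r$ to a few images. The function names and the cutoff `n_images` are illustrative choices.

```python
import math


def coherent_state(N: int, n_images: int = 3) -> list:
    """Components psi_h(Q_j), j = 0..N-1, of the torus coherent state at the origin.

    The sum over r in Z is truncated to |r| <= n_images (the omitted terms are tiny).
    """
    return [(2.0 / N) ** 0.25
            * sum(math.exp(-math.pi * N * (j / N - r) ** 2)
                  for r in range(-n_images, n_images + 1))
            for j in range(N)]


def norm(v: list) -> float:
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def sin_q_norm(N: int) -> float:
    """Norm of Op_h(sin(2 pi q)) psi_h, with Op_h acting as multiplication by sin(2 pi Q_j)."""
    psi = coherent_state(N)
    return norm([math.sin(2 * math.pi * j / N) * x for j, x in enumerate(psi)])


if __name__ == "__main__":
    N = 400
    h = 1.0 / (2 * math.pi * N)
    print(norm(coherent_state(N)))                    # close to 1
    print(sin_q_norm(N), math.pi * math.sqrt(2 * h))  # comparable: decay is only O(sqrt(h))
```

The second line of output illustrates why this state is not Lagrangian for $\Lambda = \{q = 0\}$: the norm decays like $\sqrt{h}$ rather than $h$, although $\sin(2\pi q)$ vanishes on $\Lambda$.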

Remark 4.1. The definition of $I(\Lambda)$ mimics the Hörmander-Melrose definition of Lagrangian distributions [24, Def. 25.1.1] (see [1] for an adaptation to the standard semiclassical setting). The requirement that $\Lambda$ is Lagrangian reflects the uncertainty principle, in the following sense. A Lagrangian submanifold is the lowest dimensional submanifold for which the conclusion of Proposition 4.8 holds, that is, for any $\rho \in \Lambda$, there exists a state $\psi$ satisfying $\psi \in I(\Lambda)$ and $\rho \in \operatorname{WF}_h(\psi)$. Indeed, let $\Lambda$ be an embedded submanifold of $\mathbb{T}^{2n}$. Let us assume that $\psi \in I(\Lambda)$, so (4.8) must hold for any family of functions $f_j|_\Lambda = 0$. From the identity
$$\frac{i}{h}\, [\operatorname{Op}_h(f_i), \operatorname{Op}_h(f_j)] = \operatorname{Op}_h(\{f_i, f_j\}) + O(h),$$
we see that $\|\operatorname{Op}_h(\{f_i, f_j\})\, \psi_h\| = O(h)$. As in the proof of (4.9), we can show that if $\{f_i, f_j\}(\rho) \ne 0$ for some $\rho \in \Lambda$, then $\rho \notin \operatorname{WF}_h(\psi)$. Hence, if we want the conclusion of Proposition 4.8 to hold for $\Lambda$, then this submanifold must satisfy
$$\forall f_i, f_j \in C^\infty(\mathbb{T}^{2n}), \qquad f_i|_\Lambda = f_j|_\Lambda = 0 \Longrightarrow \{f_i, f_j\}|_\Lambda = 0.$$
This property means that $\Lambda$ is co-isotropic, and must be of dimension $\ge n$. Lagrangian manifolds are co-isotropic manifolds of minimal dimension.


4.2.1. Singular Lagrangian states. We now give an example where $\Lambda$ is a union of Lagrangians with piecewise smooth boundaries (in §3.2 we called such a $\Lambda$ a singular Lagrangian). Let $\Lambda_S$ be given by (4.10) and $\psi_h$ by (4.11). Let us truncate $\psi_h$ to some proper subinterval $[Q, Q'] \subset I$, that is, replace the symbol $a(q)$ by the discontinuous function $\tilde a(q) = a(q)\, \mathbb{1}_{[Q, Q']}(q)$. That gives a state $\tilde\psi_h \in \mathcal{H}_h^1$. One could expect $\tilde\psi_h$ to be a Lagrangian state in $I(\Lambda_S)$ (as is $\psi_h$), or rather in $I(\tilde\Lambda_S)$, where
$$\tilde\Lambda_S \stackrel{\rm def}{=} \Lambda_S \cap ([Q, Q'] \times I).$$
This is not the case: one needs to include in the Lagrangian the singularity set
$$\Lambda_{\rm sing} = \{(Q, p) : p \in I\} \cup \{(Q', p) : p \in I\},$$
which is the "periodized" conormal bundle of the boundary $\partial\tilde\Lambda_S$. We will indeed prove that $\tilde\psi_h \in I(\tilde\Lambda_S \cup \Lambda_{\rm sing})$, which can be considered as a semiclassical, discrete analogue of singular Lagrangian distributions of Guillemin-Uhlmann [19] and Melrose-Uhlmann [34]. We have the following

Lemma 4.9. Let us truncate the state (4.19) to a hypercube $H \subset I^n$, $H = \prod_{\ell=1}^{n} [\alpha_\ell, \beta_\ell]$:
$$\tilde\psi_h(Q_j) = \frac{a(Q_j)\, \mathbb{1}_H(Q_j)}{N^{n/2}} \exp(-2i\pi N S(Q_j)), \qquad j \in (\mathbb{Z}/N\mathbb{Z})^n. \tag{4.21}$$
Then $\tilde\psi_h$ is associated with the singular Lagrangian $\tilde\Lambda_S \cup \Lambda_{\rm sing}$, where $\tilde\Lambda_S = \{(q, -d_q S(q)),\ q \in H\}$ and
$$\Lambda_{\rm sing} = \bigcup_{\ell=1}^{n} \Big(\big\{(q, p) : q_\ell = \alpha_\ell,\ p_\ell \in I,\ q_m \in [\alpha_m, \beta_m],\ p_m = -d_{q_m} S(q),\ m \ne \ell\big\} \cup \big\{(q, p) : q_\ell = \beta_\ell,\ p_\ell \in I,\ q_m \in [\alpha_m, \beta_m],\ p_m = -d_{q_m} S(q),\ m \ne \ell\big\}\Big). \tag{4.22}$$

Remark. It would be tempting to generalize the lemma by replacing the hypercube $H$ by an arbitrary set $\mathcal{S}$ with smooth boundaries. However, if $n = 2$, $S \equiv 0$, $a \equiv 1$, and $\partial\mathcal{S}$ does not contain a segment with rational slopes, then
$$\operatorname{WF}_h(\tilde\psi_h) = (\mathcal{S} \times \{0\}) \cup (\partial\mathcal{S} \times I^2).$$
The second component being 3-dimensional, this set is certainly not contained in a finite union of Lagrangians.

Proof. As in the proof of Proposition 4.8, we can, by conjugation with the operator (4.13), reduce the proof to the case $S = 0$. We first consider states defined on $\mathbb{R}^n$, localized on the hypercube $H \subset \mathbb{R}^n$:
$$u_h(q) = \mathbb{1}_H(q)\, a(q), \qquad a \in C^\infty(\mathbb{R}^n). \tag{4.23}$$
We use the following

Lemma 4.10. Let $\tilde\Lambda_0 = H \times \{0\}$ and $\Lambda_{\rm sing}$ be as in Lemma 4.9. The ideal $J$ of periodic functions vanishing on the singular Lagrangian $\tilde\Lambda_0 \cup \Lambda_{\rm sing}$ is (infinitely) generated by
$$g_j(p, q) \stackrel{\rm def}{=} \sin \pi(q_j - \alpha_j)\, \sin \pi(q_j - \beta_j)\, \sin(\pi p_j),$$
$$g_{ij}(q, p) \stackrel{\rm def}{=} \sin(\pi p_i)\, \sin(\pi p_j), \qquad i \ne j, \quad 1 \le i, j \le n,$$
$$\phi_j(q, p) = \phi(q_j, p_1, \cdots, p_{j-1}, p_{j+1}, \cdots, p_n), \quad \text{where } \phi(q_j, \bullet) \equiv 0, \ \alpha_j \le q_j \le \beta_j,$$
$$\psi(q), \quad \text{where } \psi \in C^\infty(I^n) \text{ vanishes on } H.$$


Proof. We only give the proof for the following model ($n = 2$), which contains all the basic ingredients of the general case. Let us study the ideal of functions vanishing on
$$\big(\{q_1 = p_2 = 0\} \cup \{q_2 = p_1 = 0\} \cup \{p_1 = p_2 = 0\}\big) \cap \{q_1 \ge 0,\ q_2 \ge 0\}. \tag{4.24}$$
The functions vanishing on the first factor in the intersection are generated by $q_1 p_1$, $q_2 p_2$, and $p_1 p_2$. Writing an arbitrary function $F(q, p)$ as
$$F(q_1, q_2, p_1, p_2) = F_0(p_1, p_2) + q_1 F_1(q_1, q_2, p_2) + q_2 F_2(q_1, q_2, p_1) + q_1 p_1 F_{11}(q_1, q_2, p_1, p_2) + q_2 p_2 F_{22}(q_1, q_2, p_1, p_2),$$
we need to find conditions for $q_1 F_1(q_1, q_2, p_2)$ and $q_2 F_2(q_1, q_2, p_1)$ to vanish on (4.24). We treat the first function by expanding it as
$$F_1(q_1, q_2, p_2) = F_{10}(q_1, q_2) + p_2 F_{12}(q_1, p_2) + q_2 p_2 F_{122}(q_1, q_2, p_2).$$
This forces $F_{10}(q_1, q_2)$ to vanish identically in $\{q_1, q_2 \ge 0\}$ and $F_{12}(q_1, p_2)$ to vanish identically in $\{q_1 \ge 0\}$. The function $F_2(q_1, q_2, p_1)$ is treated identically. Hence the functions vanishing on (4.24) are generated by $q_1 p_1$, $q_2 p_2$, $p_1 p_2$, and all the smooth functions $\psi(q_1, q_2)$, $\phi_1(q_1, p_2)$, $\phi_2(q_2, p_1)$ vanishing on $\{q_1, q_2 \ge 0\}$. The transposition to the torus setting gives the lemma for that case. The general case can be proven similarly. $\square$

This lemma means that any $F \in J$ can be decomposed as
$$F = \sum_{j \ne i} f_{ij}\, g_{ij} + \sum_j \big(f_{jj}\, g_j + f_j\, \phi_j + \psi\big),$$
where the functions $f_\bullet$ are smooth and either periodic or antiperiodic in each variable, so that $f_\bullet g_\bullet$ are periodic in all variables. The action of each term $(fg)^w(q, hD)$ on the state (4.23) can be written
$$(fg)^w u_h = \big((fa)^w \circ g^w + h\, L(f, a, g)\big)\, \mathbb{1}_H,$$
where $L(f, a, g)$ is a pseudodifferential operator of norm $O(1)$. Therefore, we are reduced to studying the action of the generators $g^w(q, hD)$, $g = g_{ij}, g_j, \phi_j, \psi$, on the characteristic function $\mathbb{1}_H(q)$. We first note that $\psi\, \mathbb{1}_H = \phi_j^w\, \mathbb{1}_H \equiv 0$, so there is nothing to prove in this case. For each $j \in \{1, \ldots, n\}$, the generator $g_j$ contains a factor $\sin(\pi p_j)$. Up to an error $O(h)$, we first quantize this factor and apply it to $\mathbb{1}_H$:
$$\sin(\pi h D_j)\, \mathbb{1}_H(q) = \frac{1}{2i}\big(\mathbb{1}_H(q_j + \pi h, q') - \mathbb{1}_H(q_j - \pi h, q')\big) \stackrel{\rm def}{=} b_j(q).$$
The function $b_j(q)$ is supported in the strips $S_j = \{|q_j - \alpha_j| \le \pi h\} \cup \{|q_j - \beta_j| \le \pi h\}$, where it takes values $\pm 1$. We now apply the remaining factors of $g_j$: this amounts to multiplying $b_j(q)$ by the product $\sin \pi(q_j - \alpha_j)\, \sin \pi(q_j - \beta_j)$, and gives a function $O(h)$. Taking the error into account, we obtain $\|g_j^w\, \mathbb{1}_H\|_{L^2(\mathbb{R}^n)} = O(h)$. In the case of $g_{ij}$, $i \ne j$, we apply $\sin(\pi h D_i)$ to $b_j(q)$: the resulting function takes values $\pm 1$ on its support $S_i \cap S_j$, so that $\|g_{ij}^w\, \mathbb{1}_H\|_{L^2(\mathbb{R}^n)} = O(h)$. We have now proved that $\|F^w u_h\|_{L^2(\mathbb{R}^n)} = O(h)$ for any $F \in J$. The procedure can be iterated to any finite product of functions $F_i \in J$, yielding the estimate (4.18).


The proof is completed by the periodization argument as in the proof of Proposition 4.8. The only slight difference lies in the fact that the analogues of the functions $u_h^{(M)}(q)$ of (4.17) may now have discontinuities near $\partial D$, so that $\|P_{\mathbb{T}^{2n}} u_h^{(M)}\| - \|u_h^{(M)}\|_{L^2(\mathbb{R}^n)} = O(h)$ instead of $O(h^\infty)$. $\square$

4.3. Quantum relations. Suppose that $\Gamma' \subset \mathbb{T}^{2n} \times \mathbb{T}^{2n}$ is a Lagrangian submanifold. The basic example is given by the twisted graph $\Gamma'_\kappa$ of a symplectic diffeomorphism $\kappa$ on $\mathbb{T}^{2n}$ (see Sect. 3.2.1):
$$\Gamma'_\kappa = \big\{(q', q; p', -p) : (q', p') = \kappa(q, p),\ (q, p) \in \mathbb{T}^{2n}\big\}.$$
As we noticed in that section, the choice of change of sign depends on the choice of the splitting of variables $(q, p)$, which is itself related to the choice of a polarization in the quantization $a \mapsto \operatorname{Op}_h(a)$ [24, §25.2]. This somewhat cumbersome convention is explained as follows. Any state $v \in \mathcal{H}_h^n$ is naturally identified with a linear form $f_v \in (\mathcal{H}_h^n)^*$ through $f_v(w) = \langle v, w\rangle$. In our notations$^5$, this scalar product is antilinear in the first component. To make the identification linear, we choose instead
$$v \in \mathcal{H}_h^n \Longrightarrow f_v(\bullet) = \langle \bar v, \bullet\rangle, \tag{4.25}$$
where states $v$ are written as vectors in the basis (4.3). Let $\mathcal{L}(\mathcal{H}_h^n) \simeq \mathcal{H}_h^n \otimes (\mathcal{H}_h^n)^*$ be the space of linear operators on $\mathcal{H}_h^n$. The linear identification (4.25) of $\mathcal{H}_h^n$ with $(\mathcal{H}_h^n)^*$ gives the identification
$$\mathcal{L}(\mathcal{H}_h^n) \simeq \mathcal{H}_h^{2n}, \quad \text{through } (u \otimes v)(w) = u\, \langle \bar v, w\rangle, \qquad u, v, w \in \mathcal{H}_h^n. \tag{4.26}$$
We observe that the norm on $\mathcal{H}_h^{2n}$ is the same as the Hilbert-Schmidt norm on $\mathcal{L}(\mathcal{H}_h^n)$:
$$\|T\|_{\mathcal{H}_h^{2n}} = \big(\operatorname{tr}_{\mathcal{H}_h^n}(T^* T)\big)^{1/2}. \tag{4.27}$$
It is related to the operator norm on $\mathcal{L}(\mathcal{H}_h^n)$ as follows:
$$\|T\|_{\mathcal{L}(\mathcal{H}_h^n)} \le \|T\|_{\mathcal{H}_h^{2n}} \le N^{n/2}\, \|T\|_{\mathcal{L}(\mathcal{H}_h^n)}. \tag{4.28}$$
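The two norms in (4.27) and (4.28) are easy to compare on small matrices. The sketch below is an illustration for $n = 1$ in plain Python: the Hilbert-Schmidt norm is computed exactly from the entries, while the operator norm is only approximated by power iteration on $T^*T$. The helper names `hs_norm` and `op_norm` and the iteration count are assumptions of this example.

```python
import cmath
import math


def hs_norm(T: list) -> float:
    """Hilbert-Schmidt norm (tr T^* T)^{1/2}, i.e. the Euclidean norm of all entries."""
    return math.sqrt(sum(abs(x) ** 2 for row in T for x in row))


def op_norm(T: list, iters: int = 500) -> float:
    """Approximate operator norm (largest singular value) by power iteration on T^* T."""
    N = len(T)
    v = [1.0 + 0.1 * k for k in range(N)]  # generic starting vector
    for _ in range(iters):
        w = [sum(T[a][b] * v[b] for b in range(N)) for a in range(N)]              # T v
        u = [sum(T[a][b].conjugate() * w[a] for a in range(N)) for b in range(N)]  # T^* T v
        size = math.sqrt(sum(abs(x) ** 2 for x in u))
        v = [x / size for x in u]
    w = [sum(T[a][b] * v[b] for b in range(N)) for a in range(N)]
    return math.sqrt(sum(abs(x) ** 2 for x in w))


if __name__ == "__main__":
    N = 6
    # Unitary example: the discrete Fourier transform (4.5) with n = 1.
    F = [[cmath.exp(-2j * cmath.pi * j * k / N) / math.sqrt(N) for k in range(N)]
         for j in range(N)]
    print(op_norm(F), hs_norm(F), math.sqrt(N))  # upper bound in (4.28) is attained
```

For a unitary matrix the upper bound in (4.28) is attained, while for a rank-one matrix the two norms coincide, so both inequalities are sharp.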

In particular, unitary operators have Hilbert-Schmidt norm $N^{n/2} = (2\pi h)^{-n/2}$. The identification (4.26) dictates the way an operator of the type $A_1 \otimes A_2$ (with $A_i \in \mathcal{L}(\mathcal{H}_h^n)$) acts on $u \otimes v \in \mathcal{H}_h^{2n} \simeq \mathcal{L}(\mathcal{H}_h^n)$. Indeed, if we take any $w \in \mathcal{H}_h^n$, we have
$$[(A_1 \otimes A_2)(u \otimes v)](w) = [A_1 u \otimes A_2 v](w) = A_1 u\, \langle \overline{A_2 v}, w\rangle = A_1 u\, \langle \bar v, A_2^t w\rangle = (A_1 u \otimes v) \circ A_2^t\, (w).$$
Here $A_2^t$ is the transpose of the operator $A_2$, written as a matrix in the basis (4.3). In the case $A_1 = \operatorname{Op}_h(a)$, $A_2 = \operatorname{Op}_h(b)$ for some real functions $a, b \in C^\infty(\mathbb{T}^{2n})$, one checks that $A_2^t = \operatorname{Op}_h(\tilde b)$, with the same twisted function as in the proof of Lemma 4.6: $\tilde b(q, p) = b(q, -p)$. By linearity, for any $C_h \in \mathcal{H}_h^{2n} \simeq \mathcal{L}(\mathcal{H}_h^n)$, we have
$$\operatorname{Op}_h(a \otimes b)\, C_h = \operatorname{Op}_h(a) \circ C_h \circ \operatorname{Op}_h(\tilde b). \tag{4.29}$$

$^5$ This is the physicists' convention.

The sign change in the tilting $\Gamma \rightsquigarrow \Gamma'$ parallels the transformation $a(\rho')\, b(\rho) \rightsquigarrow a(\rho')\, \tilde b(\rho)$. We are now in position to quantize a symplectic map, more generally a symplectic relation as defined in Sect. 3.2.

Definition 4.11. A semiclassical sequence $U = \{U_h \in \mathcal{H}_h^{2n}\}_{h \to 0}$ satisfying
$$\|U_h\|_{\mathcal{H}_h^{2n}} = O(h^K), \quad \text{for some fixed } K \in \mathbb{R}, \tag{4.30}$$
is a quantum relation associated with the symplectic relation $\Gamma$ if $U$ is a Lagrangian state in $I(\Gamma')$, in the sense of Definition 4.7. Explicitly, for any $M \in \mathbb{N}$ and any sequence of functions $g_j \in C^\infty(\mathbb{T}^{2n} \times \mathbb{T}^{2n})$, $g_j|_{\Gamma'} = 0$, $1 \le j \le M$, we must have
$$\|\operatorname{Op}_h(g_M) \circ \cdots \circ \operatorname{Op}_h(g_1)\, U_h\|_{\mathcal{H}_h^{2n}} = O(h^M)\, \|U_h\|_{\mathcal{H}_h^{2n}}. \tag{4.31}$$
The assumption that $U_h$ is tempered in the sense of (4.30) (which also implies temperedness in the operator norm) is necessary to ensure that composing $U_h$ with residual ($O(h^\infty)$) terms produces residual terms. That is a standard assumption in $C^\infty$ semiclassical calculi (see [1, 51]), and will be used in the proof of Prop. 4.12. The quantum weighted relations defined in §4.4 will naturally be tempered, having norms $\|U_h\|_{\mathcal{H}_h^{2n}} = O(h^{-n/2})$.

If a function $g \in C^\infty(\mathbb{T}^{4n})$ vanishes on $\Gamma$, then the function $\tilde g$ defined as $\tilde g(q', p'; q, p) = g(q', p'; q, -p)$ vanishes on $\Gamma'$. The condition $g_j|_{\Gamma'} = 0$ can thus be written $\tilde g_j|_\Gamma = 0$. We also note that (4.31) entails a version of Egorov's theorem. If $f_L, f_R \in C^\infty(\mathbb{T}^{2n})$ satisfy
$$(\rho', \rho) \in \Gamma \Longrightarrow f_L(\rho') = f_R(\rho),$$
then we have
$$\|\operatorname{Op}_h(f_L)\, U_h - U_h\, \operatorname{Op}_h(f_R)\|_{\mathcal{H}_h^{2n}} = O(h)\, \|U_h\|_{\mathcal{H}_h^{2n}}. \tag{4.32}$$
Indeed, the function $f \stackrel{\rm def}{=} f_L \otimes 1 - 1 \otimes f_R$ vanishes on $\Gamma$, so that $\tilde f$ vanishes on $\Gamma'$. We then simply apply the definition (4.31) with $g_1 = \tilde f$ and use (4.29). When $\Gamma$ is a graph of a symplectic transformation, $f_R$ is the pullback of $f_L$, and we get a statement similar to the standard Egorov's theorem.

Remark 4.2. Following Sect. 4.2, in the case when $\Gamma'$ is a Lagrangian with boundaries projecting on a hypercube, it is useful to include in the definition sequences $U = \{U_h\}$ in the (larger) space $I(\Gamma' \cup \Lambda_{\rm sing})$; the quantum baker's relation we define in the next section will belong to such an enlarged space.


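Through the identification (4.26), $U_h$ is an operator on $\mathcal{H}_h^n$. We now show that this operator "classically transports" the microsupport of a sequence $w = \{w_h \in \mathcal{H}_h^n\}$.

Proposition 4.12. Take $U = \{U_h \in \mathcal{H}_h^{2n} \simeq \mathcal{L}(\mathcal{H}_h^n)\}$ a quantum relation $U \in I(\Gamma')$. Then for any sequence $w = \{w_h \in \mathcal{H}_h^n\}$, $\|w_h\| \asymp 1$, the microsupport of the image sequence $U(w) = \{U_h(w_h)\}$ satisfies:
$$\operatorname{WF}_h(U(w)) \subset \Gamma(\operatorname{WF}_h(w)) = \big\{\rho' \in \mathbb{T}^{2n} : \exists\, \rho \in \operatorname{WF}_h(w),\ (\rho', \rho) \in \Gamma\big\}.$$

Proof. Assume that $\rho_0 \notin \Gamma(\operatorname{WF}_h(w))$, which means that $\Gamma^{-1}(\rho_0) \cap \operatorname{WF}_h(w) = \emptyset$. Then there exists a function $f \in C^\infty(\mathbb{T}^{2n})$ with $f \equiv 1$ near $\rho_0$ but with $\operatorname{supp}(f)$ sufficiently small so that $\Gamma^{-1}(\operatorname{supp}(f)) \Subset \complement \operatorname{WF}_h(w)$. Consequently, there exists a function $g \in C^\infty(\mathbb{T}^{2n})$ with $g \equiv 1$ near $\operatorname{WF}_h(w)$ but $g \equiv 0$ on $\Gamma^{-1}(\operatorname{supp}(f))$. The function $f \otimes \tilde g \in C^\infty(\mathbb{T}^{4n})$ then automatically vanishes on $\Gamma'$. Our aim is to show that $\rho_0 \notin \operatorname{WF}_h(U(w))$. For this, we introduce one further function $a \in C^\infty(\mathbb{T}^{2n})$ such that $a(\rho_0) > 0$ and $f \equiv 1$ on $\operatorname{supp}(a)$. As in the proof of (4.9) we see that for any $M \in \mathbb{N}$, $\operatorname{Op}_h(a)\operatorname{Op}_h(f)^M = \operatorname{Op}_h(a) + O(h^\infty)$. Hence
$$\|\operatorname{Op}_h(a)\, U_h w_h\| = \|\operatorname{Op}_h(a)\operatorname{Op}_h(f)^M U_h w_h\| + O(h^\infty) \le \|\operatorname{Op}_h(a)\operatorname{Op}_h(f)^M U_h \operatorname{Op}_h(g)^M w_h\| + \|\operatorname{Op}_h(a)\operatorname{Op}_h(f)^M U_h (1 - \operatorname{Op}_h(g)^M)\, w_h\| + O(h^\infty).$$
To bound the second term on the right-hand side, we notice that the function $(1 - g^M)$ vanishes near $\operatorname{WF}_h(w)$, so from Lemma 4.5 we get $\|(1 - \operatorname{Op}_h(g)^M)\, w_h\| = O(h^\infty)$; from the temperedness of $U_h$, the second term is thus residual. The first term on the right-hand side is estimated using the identity
$$\operatorname{Op}_h(f)^M\, U_h\, \operatorname{Op}_h(g)^M = \operatorname{Op}_h(f \otimes \tilde g)^M\, U_h.$$
Because $f \otimes \tilde g$ vanishes on $\Gamma'$, the Hilbert-Schmidt norm of that operator is $O(h^{M+K})$, where $K$ comes from the temperedness of $U_h$, (4.30). Using (4.28), we thus get $\|\operatorname{Op}_h(a)\, U_h w_h\| = O(h^{M+K})$ for an arbitrary $M \in \mathbb{N}$, which shows that $\rho_0 \notin \operatorname{WF}_h(U(w))$. $\square$

4.4. Quantized weighted relations. In Sect. 3.4 we equipped a symplectic relation $\Gamma$ with a measure, or weight $\mu$.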
In order to associate to the weighted relation (, μ) a sequence of operators Uh ∈ Hh2n , we need to elaborate on Definition 4.11, thereby defining a subfamily I ( , μ) I ( ). In the standard microlocal context [24, Sect. 25.1], a Lagrangian state ψ ∈ I () has a well defined amplitude, or symbol, which is a section of the Maslov half density bundle over the Lagrangian submanifold — see [24, Theorem 25.1.9]. The local aspects of this procedure have recently been adapted to the semiclassical case [1], and a similar approach can be used in the case of T4n . Although one could characterize the operators quantizing (, μ) in terms of their symbols (grossly speaking, the absolute square of the symbol should equal the weight

Distribution of Resonances for Open Quantum Maps


μ), we won’t do it here, in order to avoid technical issues involved in the description of the symbol map. Instead, in the definition below we use bilinear expressions in Uh , which allows us to avoid introducing symbols. Definition 4.13. Let (, μ) be a weighted piecewise smooth relation as defined in §3.4 and let U ∈ I ( ∪ sing ), in the sense of Definition 4.11 and Remark 4.2. For any χα ∈ C ∞ (T2n ; [0, 1]), α = L , R, we define def

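$U_{\chi_L\chi_R} = \mathrm{Op}_h(\chi_L)\, U_h\, \mathrm{Op}_h(\chi_R)$. We say that U quantizes the weighted relation $(\Gamma, \mu)$ if for all $\chi_L, \chi_R$ with sufficiently small supports satisfying $\operatorname{supp}(\chi_L \otimes \chi_R) \cap \Gamma_{\mathrm{sing}} = \emptyset$,

\[
U_{\chi_L\chi_R}\, U^{*}_{\chi_L\chi_R} = \mathrm{Op}_h\big(g_L^{\chi_L\chi_R}\big) + O(h), \qquad
U^{*}_{\chi_L\chi_R}\, U_{\chi_L\chi_R} = \mathrm{Op}_h\big(g_R^{\chi_L\chi_R}\big) + O(h), \tag{4.33}
\]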

where $g_\alpha^{\chi_L\chi_R}$ are the functions given in (3.9), and the remainder is O(h) in the operator norm on $\mathcal{L}(\mathcal{H}_h^n)$. We then write $U = \{U_h\} \in I(\Gamma' \cup \Gamma'_{\mathrm{sing}}, \mu)$.

The conditions on the smallness of the supports of $\chi_\alpha$ guarantee that the operators appearing on the left in (4.33) are of the form $\mathrm{Op}_h(f)$, $f \in C^\infty(\mathbb{T}^{2n})$. This follows from the fact that $\Gamma$ is locally a graph, see §3.4.

If $\Gamma$ is the graph of a symplectic diffeomorphism κ and $\mu = \pi_L^{*}(\omega^n/n!) = \pi_R^{*}(\omega^n/n!)$, then $U_h$ is unitary to leading order:

\[
U_h^{*} U_h = I + C_h, \qquad U_h U_h^{*} = I + D_h, \qquad
\|C_h\|_{\mathcal{L}(\mathcal{H}_h^n)} = O(h), \quad \|D_h\|_{\mathcal{L}(\mathcal{H}_h^n)} = O(h).
\]

For h small, $(I + C_h)^{-1/2}$ and $(I + D_h)^{-1/2}$ exist, so one way to make the quantization strictly unitary is to replace $U_h$ by $U_h (I + C_h)^{-1/2}$ or by $(I + D_h)^{-1/2} U_h$.

The condition (4.33) can be interpreted as follows. Suppose that $\psi \in \mathcal{H}_h^n$, $\|\psi\| = 1$, is microlocalized at a single “regular” point $\rho_0$: $\mathrm{WF}_h(\psi) = \{\rho_0\} \subset \mathbb{T}^{2n} \setminus \pi_R(\Gamma_{\mathrm{sing}})$, and $\Gamma(\rho_0) = \bigcup_{j=1}^{J} \rho_j$, $\rho_j = \kappa_j(\rho_0)$. Then,

\[
U_h \psi = \sum_{j=1}^{J} \psi_j + O(h^\infty), \qquad
\|\psi_j\|^2 = P_j(\rho_0) + O(h), \qquad
\mathrm{WF}_h(\psi_j) \subset \rho_j .
\]

From Lemma 4.5, if $P_j(\rho_0) \neq 0$ then $\mathrm{WF}_h(\psi_j) = \rho_j$. A similar statement holds for $U_h^{*}$.


Indeed, if for each j = 0, · · · , J we take χ j ∈ C ∞ (T2n ; [0, 1]) supported in a small neighbourhood of ρ0 , resp. ρ j , and equal to 1 near that point, (3.10) shows that χ χ

g R j 0 (ρ0 ) = P j (ρ0 ) for j = 1, . . . , J . On the other hand, Proposition 4.12 gives Uh ψ = Uh Oph (χ0 )ψ + O(h ∞ ) = WFh (Uχ j χ0 ψ) ⊂ ρ j .

J j=1

Uχ j χ0 ψ + O(h ∞ ),

(4.34)

def

If we take ψ j = Uχ j χ0 ψ then χ χ

ψ j 2 = #Uχ∗ j χ0 Uχ j χ0 ψ, ψ = #Oph (g R j 0 )ψ, ψ + O(h) = P j (ρ0 ) + O(h). Example. We now consider a special case of quantum relations Uh , of the form #Q j |Uh |Q k = N −n/2 a(Q j , Q k ) exp 2πi N S(Q j , Q k ) , (4.35) where a, S ∈ C ∞ (T2n × T2n ) and the generating function S(q , q) satisfies the nondegeneracy condition det(∂q2 q S) = 0 near the support of a(q , q). Using Definition 4.11 we see that Uh is associated to the graph S of the symplectic transformation (q, −∂q S) → (q , ∂q S). To be more precise, Uh ∈ I ( S , μ S ),

for μ S = |a(q , q)|2 dq dq, def

(4.36)

where we used the coordinates (q , q) on S . Projecting this measure on the left and right tori, we get: ⎛ ⎞

π L∗ μ S = ⎝ |a(q , q)|2 | det(∂q2 q S)|−1 ⎠ dq dp, q : p=−∂q S(q ,q)

⎛ π R∗ μ S = ⎝

⎞

(4.37)

|a(q , q)|2 | det(∂q2 q S)|−1 ⎠ dq dp .

q : p =∂q S(q ,q)

The above sums are always finite. This example will be used to analyze the quantum baker’s relations studied in the next sections. 4.5. Quantized baker’s relation. We explicitly construct quantum relations Bh ∈ L(Hh1 ) associated with the “open baker’s maps” described in Sect. 3.3. For simplicity, we will assume that the coefficients D j and j are integers. Besides, we will only consider the subsequence of Planck’s constants of the form h = (2π N )−1 such that N /D1 = M1 ∈ N and N /D2 = M2 ∈ N (that is, N is a multiple of lcm(D1 , D2 )). Restricting ourselves to this subsequence, we define the quantization of the baker’s relation (3.4) as the following operators (written as N × N matrices in the bases (4.3)): ⎛ ⎞ 0 0 0 0 0 0 0⎟ def ⎜0 F M 1 0 = B1,h + B2,h . Bh = F N∗ ◦ ⎝ (4.38) 0 0 0 F M2 0⎠ 0 0 0 0 0


The numbers of columns in successive blocks are respectively given by 1 M1 , M1 , 2 M2 − (1 + 1)M1 , M2 , (D2 − 2 − 1)M2 , and F M is the discrete Fourier transform given in (4.5). These matrices obviously generalize the unitary matrices associated with the closed baker’s map [2]. We now check that the matrices (4.38) satisfy Definition 4.13 if we select the appropriate Lagrangian surface on T4 , namely by adjoining a singularity set sing to the twisted graph B (see Remark 4.2), and equip B with the weight μ described in (3.11). By linearity, we can separately consider the two blocks B j,h . Let us study the left block B1,h . Since the classical relation B1 is generated by the function S1 (q, p ) of (3.5), it is natural to express the operator B1,h in the mixed representation ( p , q), that is by a matrix from the basis |Q j to the basis {|Pk }. Since the change of basis matrix, (|Pk #Q j |) j,k=0,...,N −1 , equals F N , the operator A1,h defined as the matrix def

(#Q k |A1,h |Q j ) j,k=0,...,N −1 = (#Pk |B1,h |Q j ) j,k=0,...,N −1 = F N ◦ B1,h is given by the Fourier block F M1 at the same position as in (4.38), and zeros everywhere else. The following lemma reduces finding the (weighted) Lagrangian relation associated to B1,h to finding the (weighted) Lagrangian associated to A1,h . We denote by F the following transformation of T2n : F(q, p) = ( p, −q). It means we rotate by −π/2 around the origin in each plane (qi , pi ). We denote by FL the transformation of T4n acting through F on the left coordinates (q , p ) and leaving the right coordinates unchanged. def

Lemma 4.14. Suppose that Uh ∈ L(Hhn ) Hh2n and that Vh = F N ◦ Uh . Then, for any (possibly singular) Lagrangian C ∈ T4n , Uh ∈ I (C ) ⇐⇒ Vh ∈ I (D ), where D = FL (C ), equivalently D = FL (C) = {( p , −q ; q, p) : (q , p ; q, p) ∈ C}. Furthermore, Uh ∈ I (C , μ) ⇐⇒ Vh ∈ I (D , ν), with ν = FL∗ μ. Proof. The transformation C → D results from a general composition formula which can be proved by mimicking the semiclassical proof in [1]. Here it follows from the covariance properties of Weyl quantization with respect to the Fourier transform: for any a ∈ C ∞ (T2n ), (4.39) Fh−1 Oph (a) ◦ Fh = Oph (a ◦ F). As a result, for any f ∈ C ∞ (T4n ), Oph ( f )(Fh ◦ Uh ) = Fh ◦ Oph ( f ◦ FL )(Uh ). This identity proves the first assertion. Using (4.39), we notice that for any χ L , χ R ∈ C ∞ (T2n ; [0, 1]), the cutoff propagator Vχ L χ R satisfies χ ◦F χ R

Vχ∗L χ R Vχ L χ R = Uχ∗L ◦F χ R Uχ L ◦F χ R = Oph (g RL Vχ L χ R Vχ∗L χ R

=

Fh Uχ L ◦F χ R Uχ∗L ◦F χ R

Fh∗

=

) + O(h),

χ ◦F χ R Oph (g L L

◦ F −1 ) + O(h).


Using the pushforward of functions FL∗ f = f ◦ FL−1 and the fact that π R ◦ FL = π R , we get χ ◦F χ R

g RL χ ◦F χ R

gL L

−1 = π R∗ (π L∗ (FL∗ χ L ) π R∗ χ R μ) = π R∗ (π L∗ χ L π R∗ χ R FL∗ μ),

−1 ◦ F −1 = π L∗ FL∗ (π L∗ (FL∗ χ L ) π R∗ χ R μ) = π L∗ (π L∗ χ L π R∗ χ R FL∗ μ).

This proves that Vh is associated with the weight ν = FL∗ μ on D .

' &

Let us now describe the weighted Lagrangian associated with the operator A1,h . The kernel of that operator vanishes outside the square H = I1 × I1 , where I1 = [1 /D1 , 1 +1/D1], and on H it takes the values # " ! " ! D1 1l H (Q k , Q j ) exp −2iπ N S1 (Q k , Q j ) . Q k |A1,h |Q j = Pk |B1,h |Q j = N (4.40) The operator A1,h has the same form as in (4.35), with the (obviously nondegenerate) √ generating function S = −S1 and symbol a(q , q) = D1 1l H (q , q). If we forget (for a moment) the discontinuities of the symbol, we find that A1,h is associated with the graph S1 = q , −(D1 q − 1 ); q, (D1 q − 1 ) : q, q ∈ I1 , equipped with the weight μ S1 = D1 1l H (q , q) dq dq. From Lemma 4.14, the operator B1,h = F N∗ ◦ A1,h is associated with the graph FL−1 ( S1 ) = (D1 q − 1 ), q ; q, (D1 q − 1 ) : q, q ∈ I1 = B1 and the weight −1 μ1 = FL∗ μ S1 = D1 1l H ( p , q) dp dq, def

which can be expressed as π R∗ μ1 = 1l I1 (q) dq dp, π L∗ μ1 = 1l I1 ( p ) dq dp . It represents the half part of the weight (3.11). Let us now take the discontinuities of a(q , q) into account. Since they occur at the boundary of the square H , they have the same consequences as in Lemma 4.9. Namely, we must add to the Lagrangian S 1 a “singular” Lagrangian, which is the union of 4 pieces, each piece sitting above a side of H . This Lagrangian should then be rotated through FL−1 as well. For instance, the side {q ∈ I1 , q = 1 /D1 } leads (after rotation) to the singular Lagrangian 1 sing,1 = q = 0, q = ; p , p : p ∈ I1 , p ∈ I , D1 which contains the corresponding side of ∂ B1 . Similar Lagrangians sing,i , i = 2, 3, 4, contain the other sides of ∂ B1 . The same analysis applies to B2,h and hence we have proved the


Proposition 4.15. The sequence of matrices $\{B_h\}$ given in (4.38) quantizes the classical baker’s relation $B = B_1 \cup B_2$ of (3.4), in the sense of Definitions 4.11, 4.13, and Remark 4.2:

\[
B_h \in I\Big( B' \cup \bigcup_{j=1}^{8} \Gamma'_{\mathrm{sing},j},\ \mu \Big),
\]

where the weight μ is given by (3.11).

This quantization of the baker’s relation is very close to the “quantum horseshoe” defined by Saraceno-Vallejos in [45]. The operator $B_h$ is contracting, and its eigenstates can be seen as “metastable states”, “decaying states” or “resonances”. This contraction mirrors the decay of a classical probability density evolved through the open map B (due to the “escape” of particles to infinity). This classical decay can be analyzed in terms of a “conditionally invariant measure” on $\mathbb{T}^2$ [8], which decays according to the classical decay rate $\gamma_{cl} = -\log(D_1^{-1} + D_2^{-1})$.

4.6. Numerical check of the Weyl law for the baker’s relation. We have numerically computed the spectra of the quantum baker relations for various symmetric and nonsymmetric baker’s relations. Results for the symmetric “3-baker” (D = 3, ℓ = 0) were presented in [38] (see also Table 1), some for the “5-baker” (D = 5, ℓ = 1) were given in [40], while a nonsymmetric map (D1 = 32, D2 = 3/2) was studied in [39]. In the symmetric cases, the trapped set is a pure Cantor set of dimension $2d = 2\,\frac{\log 2}{\log D}$, so that for any 1 > r > 0, the number of resonances in the annulus {|λ| > r} is expected to scale as

\[
\#\{\lambda \in \mathrm{Spec}(B_h) : |\lambda| \ge r\} \sim C(r)\, N^{\frac{\log 2}{\log D}}
\]

in the limit N → ∞. Our numerics for both maps show that this scaling is roughly satisfied along any sequence N → ∞; much better fits are obtained for N taken along geometric sequences of the type $N = N_o D^k$, with $N_o$ fixed and k → ∞ (as in Table 1), which leads us to the following weaker conjecture for the symmetric maps:

\[
\#\{\lambda \in \mathrm{Spec}(B_h) : |\lambda| \ge r\} \sim C(N_o, r)\, N^{\frac{\log 2}{\log D}}, \qquad
(2\pi h)^{-1} = N = N_o D^k, \quad k \to \infty .
\]
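The role of geometric sequences in the conjectured law can be made concrete with a small computation. The sketch below is only an illustration of the scaling $N^{\log 2/\log D}$ and uses no numerical data from the paper; the function names `weyl_exponent` and `predicted_growth` are hypothetical. Along $N = N_o D^k$, the predicted count is $N_o^{d}\, 2^k$ with $d = \log 2/\log D$, so each step k → k+1 doubles it.

```python
import math

def weyl_exponent(D: int) -> float:
    """Exponent log 2 / log D appearing in the conjectured fractal Weyl law."""
    return math.log(2) / math.log(D)

def predicted_growth(N0: int, D: int, k: int) -> float:
    """Value of N**(log 2 / log D) for N = N0 * D**k (profile function set to 1)."""
    N = N0 * D ** k
    return N ** weyl_exponent(D)

if __name__ == "__main__":
    # Along N = N0 * 3**k the predicted number of resonances doubles at each step.
    for k in range(1, 5):
        ratio = predicted_growth(5, 3, k + 1) / predicted_growth(5, 3, k)
        print(k, round(ratio, 12))
```

The doubling is independent of $N_o$, which is consistent with letting the profile function depend on the root of the sequence.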

In this conjecture, the “profile function” $C(N_o, r)$ may (slightly) depend on the “root” $N_o$ of the geometric sequence. The special role played by geometric sequences is probably due to the strong relationship between the symmetric D-baker and the D-nary decomposition. By contrast, for the nonsymmetric map the fractal Weyl law seems accurate for an “arbitrary” sequence N → ∞ [39], which was also the case for the nonlinear map studied in [49].

5. A Toy Model

Let us explicitly compute the matrix elements of the two vertical blocks $B_{1,h}$, $B_{2,h}$ in (4.38), for the symmetric 3-baker. Both are N × N/3 matrices, which we index by 0 ≤ k ≤ N − 1, 0 ≤ l ≤ N/3 − 1:


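\[
(B_{1,h})_{kl} =
\begin{cases}
\dfrac{\sqrt{3}}{N}\, \dfrac{1 - \omega_3^{k}}{1 - e^{2i\pi (k-3l)/N}} & \text{if } k \neq 3l, \\[2mm]
1/\sqrt{3} & \text{if } k = 3l,
\end{cases}
\qquad
(B_{2,h})_{kl} = \omega_3^{2k}\, (B_{1,h})_{kl}, \tag{5.1}
\]

where $\omega_3 = e^{2i\pi/3}$.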
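The closed form (5.1) can be checked against the matrix product in (4.38). The following is a minimal sketch, assuming that the discrete Fourier transform (4.5) uses the convention $(F_M)_{jk} = M^{-1/2} e^{-2i\pi jk/M}$ (an assumption, since (4.5) is not reproduced here, but one that is consistent with (5.1)). For the symmetric 3-baker the inner matrix of (4.38) reduces to the block diagonal matrix diag($F_{N/3}$, 0, $F_{N/3}$). The function names are hypothetical.

```python
import cmath
import math

def dft(M):
    # Assumed convention for (4.5): (F_M)_{jk} = M^{-1/2} exp(-2 i pi j k / M)
    return [[cmath.exp(-2j * math.pi * j * k / M) / math.sqrt(M)
             for k in range(M)] for j in range(M)]

def baker_blocks(N):
    """Blocks B1, B2 (each N x N/3) of B_h = F_N^* diag(F_M, 0, F_M), with M = N/3."""
    M = N // 3
    FM = dft(M)
    # Adjoint of F_N: entries N^{-1/2} exp(+2 i pi k m / N)
    FN_adj = [[cmath.exp(2j * math.pi * k * m / N) / math.sqrt(N)
               for m in range(N)] for k in range(N)]
    B1 = [[sum(FN_adj[k][m] * FM[m][l] for m in range(M))
           for l in range(M)] for k in range(N)]
    B2 = [[sum(FN_adj[k][2 * M + m] * FM[m][l] for m in range(M))
           for l in range(M)] for k in range(N)]
    return B1, B2

def b1_closed_form(N, k, l):
    """Right-hand side of (5.1) for the entry (B_{1,h})_{kl}."""
    w3 = cmath.exp(2j * math.pi / 3)
    if k == 3 * l:
        return 1 / math.sqrt(3)
    denom = N * (1 - cmath.exp(2j * math.pi * (k - 3 * l) / N))
    return math.sqrt(3) * (1 - w3 ** k) / denom

def max_deviation(N):
    """Largest entrywise gap between the matrix product and the closed form (5.1)."""
    B1, B2 = baker_blocks(N)
    w3 = cmath.exp(2j * math.pi / 3)
    dev = 0.0
    for k in range(N):
        for l in range(N // 3):
            ref = b1_closed_form(N, k, l)
            dev = max(dev, abs(B1[k][l] - ref),
                      abs(B2[k][l] - w3 ** (2 * k) * ref))
    return dev

if __name__ == "__main__":
    print(max_deviation(9), max_deviation(27))
```

Both deviations should be at the level of floating point rounding; N must be a multiple of 3.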

The largest matrix elements are near the “tilted diagonals” k ≈ 3l, and decay as 1/|k − 3l| away from them (see Fig. 6 in [40]). Being unable to rigorously analyze the spectrum of Bh , we replace this matrix by the following simplified model:

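\[
\widetilde{B}_h = \widetilde{B}_N = [\widetilde{B}_{1,h},\ 0,\ \widetilde{B}_{2,h}], \qquad
(\widetilde{B}_{1,h})_{kl} =
\begin{cases}
1/\sqrt{3} & \text{if } l = \lfloor k/3 \rfloor, \\
0 & \text{if } l \neq \lfloor k/3 \rfloor,
\end{cases}
\qquad
(\widetilde{B}_{2,h})_{kl} = \omega_3^{2k}\, (\widetilde{B}_{1,h})_{kl}, \tag{5.2}
\]

where $\lfloor x \rfloor$ denotes the integer part of x. For N = 9, this gives

\[
\widetilde{B}_{N=9} = \frac{1}{\sqrt{3}}
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & \omega_3^2 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & \omega_3 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & \omega_3^2 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & \omega_3 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & \omega_3^2 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & \omega_3
\end{pmatrix},
\qquad \omega_3 = e^{2\pi i/3}. \tag{5.3}
\]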
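The definition (5.2) is simple enough to generate directly. The sketch below builds the toy matrix for any N divisible by 3 and prints it in units of $1/\sqrt{3}$, writing `w` for $\omega_3$; for N = 9 the output can be compared row by row with (5.3). The helper names `toy_model` and `symbolic_rows` are hypothetical.

```python
import cmath
import math

def toy_model(N):
    """Toy matrix of (5.2): blocks [B1~, 0, B2~], each of size N x N/3."""
    M = N // 3
    w3 = cmath.exp(2j * math.pi / 3)
    s = 1 / math.sqrt(3)
    B = [[0j] * N for _ in range(N)]
    for k in range(N):
        l = k // 3                            # only l = floor(k/3) is nonzero
        B[k][l] = s                           # left block B1~
        B[k][2 * M + l] = s * w3 ** (2 * k)   # right block B2~ = w3^(2k) * B1~
    return B

def symbolic_rows(N):
    """Rows of sqrt(3) * B~ rendered with the symbols 0, 1, w, w^2 (w = omega_3)."""
    w3 = cmath.exp(2j * math.pi / 3)
    names = [(0, "0"), (1, "1"), (w3, "w"), (w3 ** 2, "w^2")]
    rows = []
    for row in toy_model(N):
        cells = []
        for entry in row:
            scaled = entry * math.sqrt(3)
            cells.append(next(n for v, n in names if abs(scaled - v) < 1e-9))
        rows.append(" ".join(cells))
    return rows

if __name__ == "__main__":
    for line in symbolic_rows(9):
        print(line)
```

The middle third of the columns vanishes identically, which is the sharp cutoff responsible for the aliasing discussed in the text.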

The model has been obtained “by hand”, by replacing “lower order” terms in the matrix $B_h$ by 0, keeping only nonzero elements on the “tilted diagonals”, and replacing $(1 - e^{2\pi i(\pm 1)/3})/(N(1 - e^{2i\pi(\pm 1)/N}))$ by 1. The new matrix $\widetilde{B}_h$ retains some qualitative features of $B_h$ but there is no immediate connection between their spectra: the “lower order” terms are not small enough for that, and $\widetilde{B}_h$ cannot be considered as a “small perturbation” of $B_h$. The simplicity of the matrices $\widetilde{B}_h$ will allow us to prove (in the case $N = 3^k$, k ∈ N) the fractal Weyl law which we could numerically observe for $B_h$ (see Sect. 5.2). It is interesting to notice that the simplified operator $\widetilde{B}_h$ is in fact not associated with the same classical relation as $B_h$:

Proposition 5.1. In the notations of Sect. 4.2, the quantum relation $\{\widetilde{B}_h\}$ is associated with the weighted relation $(\widetilde{B}, \tilde{\mu})$ given by (3.12) and (3.13):

\[
\widetilde{B}_h \in I(\widetilde{B}' \cup \Gamma'_{\mathrm{sing}}, \tilde{\mu}), \qquad
\Gamma_{\mathrm{sing}} = \bigcup_{j=0}^{2} \Gamma_{\mathrm{sing},j}, \qquad
\text{where } \Gamma_{\mathrm{sing},j} = \{ q' = 0,\ q = j/3 ;\ p', p \ :\ p', p \in I \}.
\]

Proof. In place of $\widetilde{B}_h$ we will consider $\widetilde{A}_h = F_N \circ \widetilde{B}_h$, and apply Lemma 4.14. From the structure of $\widetilde{B}_h$, the operator $\widetilde{A}_h$ can obviously be split into $\widetilde{A}_{1,h} + \widetilde{A}_{2,h}$. We will analyze the first component in detail, the analysis for the second one being similar. The matrix $\langle Q_k | \widetilde{A}_{1,h} | Q_j \rangle$ is nonzero in the vertical strip $I_1 \times I$, with $I_1 = [0, 1/3)$:

\[
\langle Q_k | \widetilde{A}_{1,h} | Q_j \rangle = \frac{\mathbf{1}_{I_1}(Q_j)}{\sqrt{3N}} \Big( \sum_{\ell=0}^{2} e^{-2i\pi \ell Q_k} \Big) \exp(-6i\pi N Q_k Q_j).
\]

Like $A_{1,h}$ (see §4.5), this operator is of the form (4.35), with generating function $S(q', q) = -S_1(q', q) = -3 q' q$ and discontinuous symbol

\[
a(q', q) = \mathbf{1}_{I_1}(q)\, \frac{e^{-2i\pi q'} \sin(3\pi q')}{\sqrt{3}\, \sin(\pi q')}.
\]

1,h is therefore associated with the graph Forgetting about discontinuities, A S1 = (q , p = −3q; q, p = −3q ), : q ∈ I, q ∈ I1 , and the weight μ S1 = |a(q , q)|2 dq dq = 1l I1 (q)

sin2 (3πq ) dq dq. 3 sin2 (πq )

After applying the transformation of Lemma 4.14, this leads to the graph 2 FL−1 ( S1 ) = (q = 3q, q ; p , p = 3 p ), : q ∈ I1 , p ∈ I = B1 j , j=0

and the weight −1 FL∗ μ S1 = 1l I1 (q)

sin2 (3π p ) dp dq. 3 sin2 (π p )

Through the change of variable (q, p ) → (q, p), we see that this is the weight (3.13) on the component B1 . The discontinuities of a(q , q) only occur along the two segments {(q ∈ I, q = 0)}, {(q ∈ I, q = 1/3)}: they generate the singular Lagrangian j Dsing, j = q = 0, q = ; p ∈ I, p ∈ I , j = 0, 1, 3 sing,0 , sing,1 . which transforms under FL−1 into the components Similarly, the second part of the matrix, B2,h , is associated to the twisted graph B2 with weight μ˜ | ' B2 and the two singular components sing,2 , sing,0 . & As explained in Sect. 3.4, the graph B can be obtained by adjoining to each point (ρ ; ρ) ∈ B the points (ρ + (0, 1/3); ρ) and (ρ + (0, 2/3); ρ). This “aliasing” is due to the diffraction created by the sharp cutoff in the matrix Bh , as opposed to the “smooth” decay of coefficients in Bh . A similar aliasing was observed in [56] for the graph associated with the unitary matrices A2k defined in (5.9): instead of quantizing the standard 2-baker (3.2), they are associated with a multivalued map obtained from it by aliasing. This observation was obtained using the propagation of coherent states. Both B and B share the same forward trapped set − = − = C ×I (see Section 3.3), but the backwards trapped set of B is easily shown to be + = T2 , which drastically differs from + . This asymmetry between − and + reflects the fact that, unlike B, the relation B is not time reversal symmetric. The fact that Bh is not associated with the relation B should not bother us too much though. In the next section, we will give a more “formal” construction of the matrix Bh , in the case where N is a power of 3 (this construction will also hold for any symmetric D-baker, for N a power of D). We will show that this matrix naturally appears through a “nonstandard” (Walsh) quantization of the open 3-baker relation B.


5.1. Walsh quantization of the baker’s relation. The Walsh model of harmonic analysis was originally developed for fast signal processing [29]. It has been used recently in mathematics to obtain simpler (and provable) versions of statements of the usual harmonic analysis; see [36] for an application in scattering theory and for pointers to the recent literature. The major advantage of Walsh harmonic analysis is the possibility to completely localize a wavepacket both in position and momentum: for our problem, this has the effect of avoiding diffraction problems due to the discontinuities of the map, which spoil the usual semiclassics [46]. Closer to our context, Meenakshisundaram and Lakshminarayan recently used the Hadamard Fourier transform (which is related to the Walsh transform we give below) to analyze the multifractal structure of some eigenstates of the (unitary) quantum 2-baker $B_h$ [33].

5.1.1. The quantum torus as a system of quantum Dits. We first fix the coefficient D ∈ N (D ≥ 2) of the symmetric baker’s relation (3.8), and will consider in this section only the inverse Planck’s constants of the form $N = D^k$ for some k ∈ N. In this case, integers $j \in \mathbb{Z}_{D^k} = \{0, \ldots, D^k - 1\}$ are in one-to-one correspondence with the words $\epsilon = \epsilon_1 \epsilon_2 \cdots \epsilon_k$ made of symbols (or “Dits”) $\epsilon_\ell \in \mathbb{Z}_D$:

\[
\mathbb{Z}_{D^k} \ni j = \sum_{\ell=1}^{k} \epsilon_\ell\, D^{k-\ell}. \tag{5.4}
\]
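The correspondence (5.4) between integers and words of Dits is the ordinary base-D expansion with the most significant digit first. A minimal sketch, with hypothetical helper names `to_word` and `from_word`:

```python
def to_word(j, D, k):
    """Digits eps_1, ..., eps_k of j in base D, most significant first, as in (5.4)."""
    if not 0 <= j < D ** k:
        raise ValueError("j must lie in {0, ..., D**k - 1}")
    return [(j // D ** (k - l)) % D for l in range(1, k + 1)]

def from_word(eps, D):
    """Inverse map: j = sum over l of eps_l * D**(k - l)."""
    k = len(eps)
    return sum(e * D ** (k - l) for l, e in enumerate(eps, start=1))

if __name__ == "__main__":
    # For D = 3, k = 2: j = 5 corresponds to the word 12, i.e. Q_j = 5/9 = 0.12 in base 3.
    print(to_word(5, 3, 2), from_word([1, 2], 3))
```

Listing `to_word(j, D, k)` for increasing j produces the words in lexicographic order, since Python compares lists lexicographically.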

The natural order for $j \in \mathbb{Z}_{D^k}$ corresponds to the lexicographic order for the symbolic words $\{\epsilon \in (\mathbb{Z}_D)^k\}$. This way, each position eigenstate $|Q_j\rangle$ of the basis (4.3) can be associated with the unique symbolic sequence $\epsilon_1 \epsilon_2 \cdots \epsilon_k$ which gives its D-nary expansion

\[
Q_j = \frac{j}{N} = 0 \cdot \epsilon_1 \epsilon_2 \cdots \epsilon_k . \tag{5.5}
\]

Let us denote the canonical basis of $\mathbb{C}^D$ by $\{e_0, e_1, \ldots, e_{D-1}\}$. Then, each $|Q_j\rangle$ can be written as

\[
|Q_j\rangle = e_{\epsilon_1} \otimes e_{\epsilon_2} \otimes \cdots \otimes e_{\epsilon_k}. \tag{5.6}
\]

Following [48], we denote each $|Q_j\rangle$ by $|\epsilon\rangle = |\epsilon_1 \epsilon_2 \cdots \epsilon_k\rangle$ to emphasize the above tensor product decomposition. This way, the quantum space $\mathcal{H}_h^1$ is naturally identified with the tensor product of k spaces $\mathbb{C}^D$:

\[
\mathcal{H}_h^1 = (\mathbb{C}^D)_1 \otimes (\mathbb{C}^D)_2 \otimes \cdots \otimes (\mathbb{C}^D)_k .
\]

In the quantum computing framework, each space $(\mathbb{C}^D)_\ell$ is interpreted as a “quantum Dit”, or “quDit”, and the basis $\{|\epsilon\rangle\}$ is called the computational basis [35]. Viewed in our toral phase space, the quDit $(\mathbb{C}^D)_\ell$ is associated with the scale $D^{-\ell}$ in the position variable, so $(\mathbb{C}^D)_1$ is called the “most significant quDit”.

5.1.2. Walsh Fourier transform. The discrete Fourier transform of (4.5) (with n = 1, $N = D^k$) is the Fourier transform (in the sense of abstract harmonic analysis) on the group $\mathbb{Z}_{D^k}$. More explicitly, each row of $F_{D^k}$ corresponds to a character $j' \mapsto \exp(-2i\pi j j'/D^k)$ of $\mathbb{Z}_{D^k}$. Using (5.4), the matrix elements can be factorized:

\[
(F_{D^k})_{jj'} = D^{-k/2} \exp\Big(-2i\pi \frac{j j'}{D^k}\Big)
= D^{-k/2} \prod_{\ell=1}^{k} \exp\Big(-2i\pi \frac{\tilde{\epsilon}_\ell(jj')}{D^\ell}\Big). \tag{5.7}
\]

Notice that each $\tilde{\epsilon}_m(jj')$ can be easily expressed in terms of the symbols of j and j':

\[
\tilde{\epsilon}_m(jj') = \sum_{\ell + \ell' = k + m} \epsilon_\ell(j)\, \epsilon_{\ell'}(j').
\]

The Walsh Fourier transform is the Fourier transform on the group $(\mathbb{Z}_D)^k$. It can be defined by keeping only the first factor on the right-hand side of (5.7): one obtains the matrix

\[
(W_k)_{jj'} = D^{-k/2} \exp\Big(-2i\pi \frac{\tilde{\epsilon}_1(jj')}{D}\Big)
= \prod_{\ell=1}^{k} D^{-1/2}\, \omega_D^{-\epsilon_\ell(j)\, \epsilon_{k+1-\ell}(j')}, \qquad \omega_D = e^{2i\pi/D}. \tag{5.8}
\]

Using the identification $\mathcal{H}_h^1 \simeq (\mathbb{C}^D)^{\otimes k}$, this definition can be recast as follows.

Lemma 5.2. The Walsh Fourier transform $W_k$ acts simply on tensor product states:

\[
W_k (v_1 \otimes \cdots \otimes v_k) = F_D v_k \otimes \cdots \otimes F_D v_1, \qquad v_\ell \in \mathbb{C}^D, \quad \ell = 1, \ldots, k.
\]

Here $F_D = W_1$ is the discrete Fourier transform on $\mathbb{C}^D$. As a result, $W_k$ is a unitary transformation on $\mathcal{H}_h^1$.

The proof consists in a straightforward algebraic check. As opposed to the discrete Fourier transform, the Walsh Fourier transform does not entangle the different quDits: a tensor product state is sent to another tensor product state.

Example. To illustrate this simple lemma we take D = 2, and consider the following $2^k \times 2^k$ matrix, k ≥ 1:

\[
A_{2^k} = [A_{0,2^k},\ A_{1,2^k}], \qquad
(A_{j,2^k})_{nm} =
\begin{cases}
(-1)^{jn}/\sqrt{2}, & m = \lfloor n/2 \rfloor, \\
0, & m \neq \lfloor n/2 \rfloor,
\end{cases}
\qquad 0 \le n \le 2^k - 1, \quad 0 \le m \le 2^{k-1} - 1. \tag{5.9}
\]

For instance when k = 2 we get

\[
A_{2^2} = \frac{1}{\sqrt{2}}
\begin{pmatrix}
1 & 0 & 1 & 0 \\
1 & 0 & -1 & 0 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & -1
\end{pmatrix}.
\]

This sequence of matrices has been obtained as the “extreme” possibility among a family of different quantizations of the (closed) 2-baker’s map [48] (see footnote 6), and its semiclassical properties were further studied in [56]. In a different context, this (unitary) matrix belongs to the family of transfer matrices associated with the de Bruijn graph with $2^k$ vertices [55]. The transformation $A_{2^k}$ acts as follows on tensor product states:

\[
v_1 \otimes \cdots \otimes v_k \longmapsto v_2 \otimes \cdots \otimes v_k \otimes F_2 v_1 .
\]

6 We thank M. Saraceno for pointing out this interpretation to us.
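The statement of Lemma 5.2 can be tested numerically on small examples. The sketch below builds $W_k$ entrywise from (5.8) and compares $W_k(v_1 \otimes \cdots \otimes v_k)$ with $F_D v_k \otimes \cdots \otimes F_D v_1$; it also measures how far $W_k$ is from being unitary. All function names are hypothetical, and tensor products are ordered so that the first factor is the most significant Dit, as in (5.4) and (5.6).

```python
import cmath
import math
from functools import reduce

def digits(j, D, k):
    """Base-D digits of j, most significant first (0-indexed list)."""
    return [(j // D ** (k - 1 - i)) % D for i in range(k)]

def walsh(D, k):
    """Matrix W_k of (5.8): product over l of D^{-1/2} w^(-eps_l(j) eps_{k+1-l}(j'))."""
    N = D ** k
    w = cmath.exp(2j * math.pi / D)
    W = []
    for j in range(N):
        ej = digits(j, D, k)
        row = []
        for jp in range(N):
            ejp = digits(jp, D, k)
            s = sum(ej[l] * ejp[k - 1 - l] for l in range(k))
            row.append(w ** (-s) / math.sqrt(N))
        W.append(row)
    return W

def kron(u, v):
    """Tensor product of two vectors, first factor most significant."""
    return [a * b for a in u for b in v]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def lemma_gap(D, vs):
    """Max entrywise difference between W_k(v1 x ... x vk) and F v_k x ... x F v_1."""
    k = len(vs)
    W, F = walsh(D, k), walsh(D, 1)
    lhs = matvec(W, reduce(kron, vs))
    rhs = reduce(kron, [matvec(F, v) for v in reversed(vs)])
    return max(abs(a - b) for a, b in zip(lhs, rhs))

def unitarity_gap(W):
    """Max entrywise deviation of W W^* from the identity."""
    n = len(W)
    return max(abs(sum(W[i][m] * W[j][m].conjugate() for m in range(n)) - (i == j))
               for i in range(n) for j in range(n))

if __name__ == "__main__":
    print(lemma_gap(3, [[1, 2j, -1], [0.5, 1, 3]]))
    print(unitarity_gap(walsh(3, 2)))
```

Both printed numbers should be at rounding level. Note the reversal of the factors: it comes from the pairing of $\epsilon_\ell(j)$ with $\epsilon_{k+1-\ell}(j')$ in (5.8).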


The action of $A_{2^k}$ on tensor product states implies that this matrix can be easily expressed in terms of the Walsh Fourier transform (for D = 2):

\[
A_{2^k} = W_k \begin{pmatrix} W_{k-1} & 0 \\ 0 & W_{k-1} \end{pmatrix}, \tag{5.10}
\]

where the 2 × 2 block structure corresponds to the most significant (leftmost) qubit. This expression exactly parallels the one defining the Balazs-Voros (unitary) quantum baker [2]. Compared to this “usual” quantum baker, $A_{2^k}$ is thus obtained by replacing the discrete Fourier matrices $F_{2^k}$, $F_{2^{k-1}}$ by their Walsh analogues $W_k$, $W_{k-1}$. The matrix $A_{2^k}$ is unitary; as we will see in the next section, our toy model $\widetilde{B}_h$ for the quantum open 3-baker (see Eq. (5.2)) is its subunitary analogue.

5.2. Resonances for the Walsh quantization of the open baker relation. In this section we set D = 3, and concentrate on the symmetric 3-baker (1.3). By analogy with the example in the last section, we modify the quantization (4.38, 5.1), in the case $N = 3^k$, by replacing the discrete Fourier matrices by their Walsh analogues. The resulting operator exactly coincides with the toy model (5.2) introduced in the beginning of this section:

Lemma 5.3. In the case $N = 3^k$, the matrix $\widetilde{B}_h$ defined in (5.2) can be rewritten in terms of the Walsh Fourier transforms as follows:

\[
\widetilde{B}_h = W_k^{*} \begin{pmatrix} W_{k-1} & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & W_{k-1} \end{pmatrix}.
\]

We omit the simple algebraic proof. If we define the “truncated” inverse Fourier matrix

\[
\widetilde{F}_3^{*} \overset{\mathrm{def}}{=} \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 0 & 1 \\ 1 & 0 & \omega_3^2 \\ 1 & 0 & \omega_3 \end{pmatrix}, \tag{5.11}
\]

the toy model $\widetilde{B}_h$ acts as follows on tensor product states:

\[
\widetilde{B}_h (v_1 \otimes \cdots \otimes v_k) = v_2 \otimes v_3 \otimes \cdots \otimes v_k \otimes \widetilde{F}_3^{*} v_1 . \tag{5.12}
\]
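The shift-like action (5.12) can be confirmed directly from the entrywise definition (5.2). The following sketch builds the toy matrix for $N = 3^k$, applies it to a tensor product of vectors in $\mathbb{C}^3$, and compares the result with $v_2 \otimes \cdots \otimes v_k \otimes \widetilde{F}_3^{*} v_1$. The helper names are hypothetical; the tensor product convention is that the first factor is the most significant Dit.

```python
import cmath
import math
from functools import reduce

W3 = cmath.exp(2j * math.pi / 3)

def toy_model(N):
    """Toy matrix of (5.2) for N divisible by 3."""
    M = N // 3
    s = 1 / math.sqrt(3)
    B = [[0j] * N for _ in range(N)]
    for k in range(N):
        l = k // 3
        B[k][l] = s
        B[k][2 * M + l] = s * W3 ** (2 * k)
    return B

def truncated_f3_adj():
    """Truncated inverse Fourier matrix of (5.11)."""
    s = 1 / math.sqrt(3)
    return [[s, 0, s], [s, 0, s * W3 ** 2], [s, 0, s * W3]]

def kron(u, v):
    return [a * b for a in u for b in v]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def shift_gap(vs):
    """Max entrywise gap between B~ (v1 x ... x vk) and v2 x ... x vk x F3~* v1."""
    k = len(vs)
    lhs = matvec(toy_model(3 ** k), reduce(kron, vs))
    rhs = reduce(kron, list(vs[1:]) + [matvec(truncated_f3_adj(), vs[0])])
    return max(abs(a - b) for a, b in zip(lhs, rhs))

if __name__ == "__main__":
    print(shift_gap([[1, 2, 3], [0.5j, -1, 2]]))
    print(shift_gap([[1, 0, 1j], [2, 1, 0], [0, 1, 1]]))
```

The middle component of $v_1$ never enters the result, which reflects the zero column of (5.11).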

The form (5.12) is particularly convenient for computing the spectrum of $\widetilde{B}_h$. We start by computing the spectrum of its power $(\widetilde{B}_h)^k$, which is enough to obtain the radial distribution of resonances (that is, the distribution of resonance widths).

Proposition 5.4. Let $\lambda_\pm$, $|\lambda_-| < |\lambda_+|$, be the eigenvalues of the matrix

\[
\Omega_3 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 \\ 1 & \omega_3 \end{pmatrix}.
\]

The non-zero eigenvalues of $(\widetilde{B}_h)^k$ (for $N = (2\pi h)^{-1} = 3^k$) are given by $\lambda_+^{k-p} \lambda_-^{p}$, 0 ≤ p ≤ k, each occurring with multiplicity $\binom{k}{p}$. From this we get the radial distribution of the eigenvalues of $\widetilde{B}_h$ (counted with multiplicities):

\[
\forall\, r \in [0, 1], \qquad
\frac{1}{2^k}\, \#\{\lambda \in \mathrm{Spec}(\widetilde{B}_h) : |\lambda| \ge r\} \xrightarrow{k \to \infty} C(r), \qquad
C(r) = \begin{cases} 1, & r < |\det \Omega_3|^{1/2}, \\ 0, & r > |\det \Omega_3|^{1/2}. \end{cases} \tag{5.13}
\]

Hence the nontrivial resonances accumulate near the circle of radius $r_0(\widetilde{B}) = |\det \Omega_3|^{1/2}$.

p=0

=

k

1 k log(|λ− λ+ | 2 /r ) . , ρ= log(|λ+ |/|λ− |) p

H (− p/k + 1/2 + ρ)

p=0

Using Stirling’s formula, one easily gets in the limit k → ∞: k 1 k H (− p/k + 1/2 + ρ) ∼ k 2 p p=0

#

2k π

ρ −∞

e−2kx d x → H (ρ). 2

This expression shows that the distribution of resonances is semiclassically dominated ' by the degrees | p − k/2| = O(k 1/2 ), and proves the second part of the proposition. & & √ √ 3 , with approximate values The explicit eigenvalues are λ± = 1+i√ 3 ± 11−i3 24 4 3

λ+ ≈ 0.8390 + i0.0942, |λ+ | ≈ 0.8443, λ− ≈ −0.5504 + i4058, |λ− | ≈ 0.6838. √ The geometric mean of their moduli is r0 ( Bh ) = |λ− λ+ |1/2 = | det 3 | = 3−1/4 . We need to analyze the spectrum of Bh more precisely to show that the distribution of resonances is asymptotically uniform with respect to the angular variable. Proposition 5.5. Let h = (2π 3k )−1 . As a set, the nontrivial spectrum of Bh is given by {λ+ } ∪ {λ− } ∪

1− p/k p/k λ−

{ωλ+

ωk =1

: 1 ≤ p ≤ k − 1}.


For each p ≠ 0, k, the k eigenvalues asymptotically have the same degeneracy $\frac{1}{k}\binom{k}{p}$, which shows that their distribution is uniform in the angular variable. Therefore, for any continuous function f ∈ C(D(0, 1)) we have (counting multiplicities in the LHS):

\[
\frac{1}{2^k} \sum_{0 \neq \lambda \in \mathrm{Spec}(\widetilde{B}_h)} f(\lambda) \xrightarrow{k \to \infty} \int_0^{2\pi} f\big(|\lambda_- \lambda_+|^{1/2}, \theta\big)\, \frac{d\theta}{2\pi} .
\]

Proof. To classify the nontrivial spectrum of $\widetilde{B}_h$, we will use the eigenvectors $v_\pm$ of $\widetilde{F}_3^{*}$ associated with the eigenvalues $\lambda_\pm$. Call $\{\eta = \eta_1 \eta_2 \cdots \eta_k : \eta_\ell \in \{\pm\}\} \simeq (\mathbb{Z}_2)^k$ the set of binary sequences of length k. The number of symbols $\eta_\ell = -$ in the sequence η is called the degree of η. The cyclic shift τ acts on these sequences as $\tau(\eta_1 \cdots \eta_k) = \eta_2 \cdots \eta_k \eta_1$. The shift allows us to partition $(\mathbb{Z}_2)^k$ into periodic orbits, each orbit $O = \{\eta, \tau(\eta), \ldots, \tau^{\ell_O - 1}(\eta)\}$ being of (primitive) period $\ell_O = \ell_\eta$. Since $\tau^k = \mathrm{id}$, the primitive period must divide k. We call deg(O) the common degree of the elements of O and observe that

\[
k \mid \ell_O \deg(O). \tag{5.14}
\]

To each sequence η we associate the state $|\eta\rangle \overset{\mathrm{def}}{=} v_{\eta_1} \otimes v_{\eta_2} \otimes \cdots \otimes v_{\eta_k}$, which is obviously an eigenstate of $(\widetilde{B}_h)^k$, with eigenvalue $\lambda_+^{k - \deg(\eta)} \lambda_-^{\deg(\eta)}$. These $2^k$ states form an independent family, which spans the nontrivial eigenspaces of $\widetilde{B}_h$. This operator acts very simply on these states:

\[
\forall \eta \in (\mathbb{Z}_2)^k, \qquad \widetilde{B}_h |\eta\rangle = \lambda_{\eta_1} |\tau(\eta)\rangle .
\]

Hence, for any orbit O, $\widetilde{B}_h$ leaves invariant the $\ell_O$-dimensional subspace $V_O = \mathrm{span}\{|\eta\rangle,\ \eta \in O\}$. To compute the spectrum of $\widetilde{B}_h|_{V_O}$ we first observe that it is contained in the set of k-th roots of $\lambda_+^{k - \deg(O)} \lambda_-^{\deg(O)}$, which in view of (5.14) is equal to

\[
S_O \overset{\mathrm{def}}{=} \big\{ \omega_{\ell_O}^{j}\, \lambda_+^{1 - \deg(O)/k} \lambda_-^{\deg(O)/k},\ j = 0, \ldots, \ell_O - 1 \big\}.
\]

We claim that Spec( Bh |VO ) = SO (clearly with no degeneracies). In fact, let O : VO → VO be defined by O |τ (η) = ω− |τ (η), for a choice of η ∈ O. This O operator is invertible on VO . By a verification on basis elements, Bh |VO ◦ O = ωO O ◦ Bh |VO , j Bh |VO ) for any j. and hence if λ ∈ Spec( Bh |VO ) then ωO λ ∈ Spec( Since O = O =⇒ VO ∩ VO = {0}, enumerating the orbits decomposition of (Z2 )k yields the full nontrivial spectrum of Bh . In spite of the large degeneracies, this nontrivial spectrum does not contain any Jordan block. The degree p = 0 corresponds to the unique orbit O = {η = + + · · · +}, so the eigenvalue λ+ is simple. Similarly, the degree p = k leads to the simple eigenvalue λ− . For any degree 1 ≤ p ≤ k − 1, call g = gcd(k, p). The sequences of degree p will take all possible periods η = k/, where ∈ N, |g. We show below that, in the semiclassical limit, the huge majority of the sequences of any degree p = 0, k have primitive period k.


Lemma 5.6. There exists C > 0, K > 0 s.t., for any k ≥ K and any degree 1 ≤ p ≤ k − 1, # η ∈ (Z2 )k : deg(η) = p, η < k log k ≤C . k k # η ∈ (Z2 ) : deg(η) = p Proof. We still use g = gcd(k, p). If g = 1, then all orbits of degree p are of primitive period k. If g > 1, there exists > 1, |g. For any P prime divisor of , any sequence of primitive period η = k/ is also of (nonnecessarily primitive) period k/P. Any sequence of degree p and (nonnecessarily primitive) period k/P can be seen as the P repetitions of a sequence of k/P bits, among p/P take the value (−). Therefore, the number which of such sequences is exactly k/P . As a consequence, we have p/P # η ∈ (Z2 )k : deg(η) = p, η < k ≤ # η ∈ (Z2 )k : deg(η) = p

P prime, P|g

k/P p/P

k .

(5.15)

p

We will now estimate each term in the above sum. From the symmetry can assume p ≤ k/2. Expanding the coefficient kp into

k p

=

k k− p , we

k k(k − 1) · · · (k − p + 1) , = p( p − 1) · · · 1 p we notice that both the numerator and the denominator contain exactly p/P factors , while the ratio of the remaining which are multiples of P. Their ratio gives k/P p/P factors is k (k − 1) · · · (k − P + 1)(k − P − 1) · · · (k − p + 1) p k/P = ( p − 1) · · · ( p − P + 1)( p − P − 1) · · · 1 p/P

≥

k k− p+1 ≥ + 1. 1 2

Here we used the fact that each factor (k − m)/( p − m) > 1, 0 ≤ m ≤ p − 2, and only kept explicit the last factor. The last inequality comes from p ≤ k/2. We have obtained a uniform upper bound for each term in the sum of (5.15). By > 0 s.t. the number of prime factors of any standard arguments, there exists K , C log k, so the number of terms in the sum is ≤ C log k. As a result, (5.15) k ≥ K is ≤ C log k/(k + 2), which proves the lemma. & is bounded from above by C ' This lemma shows that the orbits of period O < k have a negligible contribution to the asymptotic density of resonances. We can therefore act as if, for any 1 ≤ p ≤ k − 1, each orbit of degree p had period k, leading to the k eigenvalues j 1− p/k p/k ωk λ+ λ− , j = 0, . . . , k − 1 . In the semiclassical limit, these k eigenvalues are 1− p/k p/k

uniformly distributed on the circle of radius |λ+ λ− |, and each of them has multi plicity k1 kp . This shows that the asymptotic resonance distribution is circular-symmetric, with the radial distribution described in Proposition 5.4. & '
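The combinatorics behind Lemma 5.6 can be explored by brute force for small k. The sketch below enumerates binary sequences of length k and degree p (encoding the symbol − as 1 and + as 0, a convention chosen here for illustration), computes primitive periods under the cyclic shift, checks the divisibility relation (5.14), and returns the fraction of sequences whose primitive period is smaller than k. The function names are hypothetical.

```python
from itertools import product

def primitive_period(eta):
    """Smallest l >= 1 such that shifting the tuple eta cyclically by l leaves it unchanged."""
    k = len(eta)
    for l in range(1, k + 1):
        # the primitive period always divides k, so only divisors need testing
        if k % l == 0 and eta[l:] + eta[:l] == eta:
            return l

def degree(eta):
    """Number of symbols equal to 1 (standing for the symbol '-')."""
    return sum(eta)

def nonprimitive_fraction(k, p):
    """Fraction of sequences of length k and degree p with primitive period < k."""
    seqs = [s for s in product((0, 1), repeat=k) if degree(s) == p]
    bad = sum(1 for s in seqs if primitive_period(s) < k)
    return bad / len(seqs)

def check_divisibility(k):
    """Relation (5.14): k divides (primitive period) * (degree) for every sequence."""
    return all((primitive_period(s) * degree(s)) % k == 0
               for s in product((0, 1), repeat=k))

if __name__ == "__main__":
    print(check_divisibility(8))
    for k, p in ((6, 3), (6, 2), (7, 3), (12, 6)):
        print(k, p, nonprimitive_fraction(k, p))
```

When gcd(k, p) = 1 the fraction is exactly zero, and otherwise it is small compared with 1, in line with the bound of the lemma. The enumeration costs $2^k$ steps, so it is only practical for small k.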


Remark 5.1. Several features of the (nontrivial) spectrum of $\widetilde{B}_h$ are very different from what one expects for a random subunitary matrix of size $2^k \times 2^k$: the (logarithms of the) resonances form a regular lattice, most eigenvalues are highly degenerate, and the radial density is a delta function at $r_0(\widetilde{B})$. Actually, the only generic feature seems to be the global fractal scaling of the Weyl law, and the uniform angular distribution.

Remark 5.2. The radial density of resonances is governed by $r_0(\widetilde{B}) = \sqrt{|\det \Omega_3|}$, which seems to depend on the subtleties of the quantization. As an example of this fact, in Section 6 we will consider the open baker’s map with D = 4 obtained by keeping only the second and third strips. It has Lyapunov exponent log 4, and the Cantor set C (see §3.3) has dimension ν = log 2/log 4 = 1/2. A second open map, obtained by removing the first and third strips, has the same characteristics. However, if we Walsh-quantize these two maps, the resulting spectra are very different. These spectra are obtained from the eigenvalues of different 2 × 2 blocks of the inverse Fourier matrix $F_4^{*}$. The first map leads to the block

\[
\Omega_4 = \frac{1}{2} \begin{pmatrix} i & -1 \\ -1 & 1 \end{pmatrix},
\]

with two nonzero eigenvalues $\lambda_\pm$ of different moduli, so the spectrum of its Walsh quantization will satisfy the fractal Weyl law, and be concentrated around the circle of radius $\sqrt{|\det \Omega_4|} = 2^{-3/4}$. By contrast, the second map leads to the singular block

\[
\frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.
\]

The nontrivial spectrum of its Walsh quantization then reduces to a simple eigenvalue $\lambda_+ = 1$. In that case, the Weyl law is singular, corresponding to the profile function C(r) ≡ 0. This qualitative difference between both spectra cannot be explained from the classical maps, but is due to the phases in the matrix elements of the quantum maps.

6. Conductance in the Walsh Model

6.1. Quantum transport. In this section, we consider open baker’s relations for which the “opening” consists in two disjoint intervals, which are supposed to represent two “leads” connecting a quantum dot to the outside world. We will prove Theorem 2 in this setting: (1.1) in §6.2 and (1.2) in §6.3.

The baker’s relations defined in §3.3 can all be seen as truncations of invertible maps on $\mathbb{T}^2$. More precisely there exists an invertible baker’s map, $\kappa : \mathbb{T}^2 \to \mathbb{T}^2$, such that the graph $B = B_1 \cup B_2$ of the open baker’s map is

\[
B = \Gamma_\kappa \cap \{ (q, p) : q \in I_1 \cup I_2 = \mathcal{I},\ p \in I \}.
\]

For admissible values of N, one can quantize the closed map κ into a unitary transformation $U_h = U_{\kappa,h}$ on $\mathcal{H}_h^1$ by straightforwardly generalizing the method of [2]. Multiplying this unitary operator by the quantum projector $\Pi_{\mathcal{I}} = \sum_{Q_j \in \mathcal{I}} |Q_j\rangle \langle Q_j|$, we obtain the quantum open baker’s map (4.38)

\[
B_h = U_{\kappa,h}\, \Pi_{\mathcal{I}} .
\]

Distribution of Resonances for Open Quantum Maps


To obtain an agreement with the notations of §2.4.3, we can interpret the set I = I1 ∪ I2 as the “wall” of the quantum dot, while the complementary set L = [0, 1) \ I represents the “openings” of the dot, perfectly connected with the “leads”. In the previous sections, we studied the resonances, that is, the eigenvalues of $B_h = U_{\kappa,h}\Pi_I$ (or of $\tilde B_h$ when choosing the Walsh quantization). Now, we want to study the “transport” through the dot, using the formalism presented in §2.4.3. We assume that the opening L splits into two disjoint “leads” L = L1 ∪ L2, and we study the transmission matrix from lead L1 to lead L2 (for simplicity, both leads will have the same width). This matrix is obtained by decomposing the scattering matrix (2.15)
\[ \tilde S(\vartheta) = \sum_{n\ge 0} \Pi_L\, e^{i\vartheta} U_h \big(\Pi_I\, e^{i\vartheta} U_h\big)^{n}\, \Pi_L \]
into 4 blocks, according to the decomposition $\Pi_L = \Pi_{L_1} \oplus \Pi_{L_2}$. The transmission matrix from L1 to L2 is defined as the block
\[ t(\vartheta) \stackrel{\rm def}{=} \sum_{n\ge 1} e^{in\vartheta}\, \Pi_{L_2} U_h (\Pi_I U_h)^{n-1} \Pi_{L_1} = \sum_{n\ge 1} e^{in\vartheta}\, t_n . \tag{6.1} \]
Because $\Pi_{L_1}$ and $\Pi_{L_2}$ have the same rank M = N|L1|, t(ϑ) is a square matrix of size M. According to Landauer’s theory of coherent transport, each eigenvalue $T_i(\vartheta)$ of the matrix $t^*(\vartheta)t(\vartheta)$ corresponds to a “transmission channel”. The dimensionless conductance of the system is then given by the sum over these transmission eigenvalues:
\[ g(\vartheta) = \operatorname{tr}\big(t^*(\vartheta)\,t(\vartheta)\big) . \tag{6.2} \]
A transmission channel is “classical” if the eigenvalue $T_i$ is very close to unity (perfect transmission) or close to zero (perfect reflection). The intermediate values correspond to the “nonclassical channels”, the importance of which is reflected in the noise power
\[ P(\vartheta) = \operatorname{tr}\Big(t^*t(\vartheta)\,\big(Id - t^*t(\vartheta)\big)\Big), \quad\text{or the Fano factor}\quad F(\vartheta) = \frac{P(\vartheta)}{g(\vartheta)} . \tag{6.3} \]
In general it may be necessary to perform the “ensemble averaging” (averaging over ϑ) to obtain significant results [57]. However, for the model we will study below, both conductance and noise power will depend very little on ϑ, so this averaging will not be necessary. To alleviate notations we will suppress the dependence on ϑ in the transmission matrix t.

Our model. In the remainder of this section, we will compute the quantities characterizing the transport through the “dot” when $U_h$ is a Walsh-quantized baker’s map similar to the operator (5.9), but with D = 4 instead of D = 2. The sequence of values of h consequently is given by $2\pi h_k = 4^{-k}$, k = 1, 2, · · · . We will choose the two leads L1 = [0, 1/4] and L2 = [3/4, 1]: this way, the projectors $\Pi_{L_i}$ and $\Pi_I = Id - \Pi_{L_1} - \Pi_{L_2}$ can be represented as tensor product operators:
\[ \Pi_{L_1} = \pi_0 \otimes Id_4 \otimes\cdots\otimes Id_4,\qquad \Pi_{L_2} = \pi_3 \otimes Id_4 \otimes\cdots\otimes Id_4,\qquad \Pi_I = \pi_I \otimes Id_4 \otimes\cdots\otimes Id_4, \]


where $\pi_i = |e_i\rangle\langle e_i|$ is a rank-1 orthogonal projector acting on $\mathbb{C}^4$, and we note $\pi_I = \pi_1 \oplus \pi_2$. The “inside” propagator for this model, namely $\tilde B_h = U_h\Pi_I$, is the first one among the two quantum maps mentioned in Remark 5.2: its nontrivial spectrum satisfies the fractal Weyl law with exponent ν = 1/2, and is concentrated near the radius $r_0(\tilde B) = 2^{-3/4}$. The number of scattering channels in each lead is the rank of $\Pi_{L_1}$ (equal to that of $\Pi_{L_2}$). It is given by 1/4 of the total dimension, and we denote it by
\[ M(h) = \tfrac14\,(2\pi h)^{-1} = 4^{k-1}, \qquad h \in \{h_k\}. \tag{6.4} \]

The number of channels is “macroscopic”, and each channel is “fully coupled” to the leads. We are therefore in a very nonperturbative régime, where resonances have no memory at all of the eigenvalues of the closed (unitary) system.
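The radius quoted above from Remark 5.2 can be checked by hand or numerically. The following is a minimal sketch, assuming the 2 × 2 block for the retained strips is (1/2)[[i, −1], [−1, 1]] and the singular block is (1/2)[[1, 1], [1, 1]]; the helper name `eigenvalues_2x2` is ours and is not taken from the paper.

```python
import cmath

# 2x2 block of the inverse Fourier matrix F4* (rows/columns 1 and 2),
# assumed here to be the block attached to the first map of Remark 5.2.
omega = [[0.5j, -0.5], [-0.5, 0.5]]

def eigenvalues_2x2(m):
    """Return (eigenvalue, eigenvalue, determinant) of a 2x2 complex matrix."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)   # roots of x^2 - tr*x + det
    return (tr + disc) / 2, (tr - disc) / 2, det

lam_a, lam_b, det = eigenvalues_2x2(omega)
r0 = abs(det) ** 0.5   # geometric mean of the two eigenvalue moduli

# Singular block attached to the second map: eigenvalues 1 and 0.
sing = [[0.5, 0.5], [0.5, 0.5]]
mu_a, mu_b, det_sing = eigenvalues_2x2(sing)

print("moduli of the eigenvalues:", abs(lam_a), abs(lam_b))
print("r0 =", r0, " 2**(-3/4) =", 2 ** -0.75)
print("singular block eigenvalues:", mu_a, mu_b)
```

Both eigenvalues of the first block have modulus below one and distinct moduli, which is the situation in which the fractal Weyl law is stated to hold.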

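6.2. Conductance. We will crucially use the fact that all operators under consideration act nicely on the tensor product structure $H_h^1 = (\mathbb{C}^4)^{\otimes k}$, that is, they do not entangle the quDits. It is then suitable to compute the trace of $t^*t$ in a basis adapted to this tensor product, and we naturally choose the computational (or position) basis. The conductance is then given by
\[ \operatorname{tr}(t^*t) = \sum_{Q_j\in L_1} \langle Q_j|t^*t|Q_j\rangle = \sum_{j=0}^{4^{k-1}-1} \big\|t|Q_j\rangle\big\|^2 . \]
Let us consider an arbitrary $j = \epsilon_1\epsilon_2\cdots\epsilon_k$ with $\epsilon_1 = 0$, so that $0 \le j \le 4^{k-1}-1$. Using (6.1) we write
\[ t|Q_j\rangle = \sum_{n\ge1} e^{in\vartheta}\, \Pi_{L_2} U_h (\Pi_I U_h)^{n-1} \Pi_{L_1}|Q_j\rangle = \sum_{n\ge1} e^{in\vartheta}\, t_n|Q_j\rangle , \tag{6.5} \]
so that
\[ \big\|t|Q_j\rangle\big\|^2 = \sum_{m,n\ge1} e^{i(n-m)\vartheta}\, \langle Q_j|t_m^*\,t_n|Q_j\rangle . \]
From now on, we replace the notation $|Q_j\rangle$, $j \in [0, 4^{k-1}-1]$, by the symbolic notation $|\boldsymbol{\epsilon}\rangle$, where the sequence $\boldsymbol{\epsilon} = 0\,\epsilon_2\cdots\epsilon_k$ corresponds to j.

6.2.1. Classical transmission channels. To understand the action of $t_n$ on $|\boldsymbol{\epsilon}\rangle$ we notice that $\Pi_I U_h$ acts on tensor products as
\[ \Pi_I U_h (v_1\otimes\cdots\otimes v_k) = \pi_I v_2 \otimes\cdots\otimes v_k \otimes F_4^* v_1 . \]
If n < k, we obtain
\[ t_n|\boldsymbol{\epsilon}\rangle = \pi_3 e_{\epsilon_{n+1}} \otimes e_{\epsilon_{n+2}} \otimes\cdots\otimes e_{\epsilon_k} \otimes F_4^* e_0 \otimes F_4^*\pi_I e_{\epsilon_2} \otimes\cdots\otimes F_4^*\pi_I e_{\epsilon_n} , \tag{6.6} \]
from which we draw the following lemma.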


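Lemma 6.1. Consider a sequence $\boldsymbol{\epsilon} = 0\,\epsilon_2\epsilon_3\cdots\epsilon_k$, and assume that there exists an index $2 \le \ell \le k$ such that $\epsilon_\ell \in \{0, 3\}$. Let $\ell_0$ be the smallest such index. Then
\[ \big\|t|\boldsymbol{\epsilon}\rangle\big\| = \begin{cases} 0 & \text{if } \epsilon_{\ell_0} = 0,\\ 1 & \text{if } \epsilon_{\ell_0} = 3. \end{cases} \]
This shows that $|\boldsymbol{\epsilon}\rangle$ is a classical transmission channel.

Proof. For any $1 \le n \le \ell_0 - 2$, $\epsilon_{n+1} \in \{1, 2\}$ by assumption. Hence the first quDit on the right-hand side of (6.6) vanishes and $t_n|\boldsymbol{\epsilon}\rangle = 0$. Furthermore, the state $(\Pi_I U_h)^{\ell_0-1}|\boldsymbol{\epsilon}\rangle$ admits as first quDit $\pi_I e_{\epsilon_{\ell_0}} = 0$, so that $t_n|\boldsymbol{\epsilon}\rangle = 0$ for any $n \ge \ell_0$. The only remaining term in (6.5) is $t_{\ell_0-1}|\boldsymbol{\epsilon}\rangle$:
• if $\epsilon_{\ell_0} = 0$, the first quDit of that term is $\pi_3 e_0 = 0$, so $t_{\ell_0-1}|\boldsymbol{\epsilon}\rangle = t|\boldsymbol{\epsilon}\rangle = 0$.
• if $\epsilon_{\ell_0} = 3$, $t_{\ell_0-1}|\boldsymbol{\epsilon}\rangle = e_3 \otimes e_{\epsilon_{\ell_0+1}} \otimes\cdots\otimes F_4^* e_0 \otimes F_4^* e_{\epsilon_2} \otimes\cdots\otimes F_4^* e_{\epsilon_{\ell_0-1}}$. Since $F_4^*$ is unitary, $\|t_{\ell_0-1}|\boldsymbol{\epsilon}\rangle\| = \|t|\boldsymbol{\epsilon}\rangle\| = 1$. □

The number of classical channels discussed in Lemma 6.1 is easy to compute: it is obtained by removing from the set $[0, 4^{k-1}-1] \equiv \{\epsilon_2\cdots\epsilon_k \in (\mathbb{Z}_4)^{k-1}\}$ the sequences such that $\epsilon_\ell \in \{1, 2\}$ for all $2 \le \ell \le k$ (these will be called “nonclassical sequences”). The number of the latter is $2^{k-1}$, so the number of classical channels is $4^{k-1} - 2^{k-1}$. Among them, half are fully reflected, $\|t|\boldsymbol{\epsilon}\rangle\| = 0$, and half are fully transmitted, $\|t|\boldsymbol{\epsilon}\rangle\| = 1$. Hence the conductance through these classical channels is
\[ \operatorname{tr}_{cl}(t^*t) = \frac{4^{k-1} - 2^{k-1}}{2} . \]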
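The counting of classical channels can be reproduced by brute-force enumeration. The sketch below applies the rule of Lemma 6.1 (the first digit in {0, 3} decides between reflection and transmission) to every sequence of k − 1 base-4 digits; the function names `classify` and `channel_counts` are illustrative only.

```python
from itertools import product

def classify(tail):
    """Classify a digit sequence eps_2 ... eps_k (digits in 0..3) as in Lemma 6.1.

    The first digit lying in {0, 3} decides: 0 means full reflection,
    3 means full transmission. Sequences made only of 1s and 2s are nonclassical.
    """
    for digit in tail:
        if digit == 0:
            return "reflected"
        if digit == 3:
            return "transmitted"
    return "nonclassical"

def channel_counts(k):
    """Count the three kinds of input channels for k quDits."""
    counts = {"reflected": 0, "transmitted": 0, "nonclassical": 0}
    for tail in product(range(4), repeat=k - 1):
        counts[classify(tail)] += 1
    return counts

k = 6
counts = channel_counts(k)
print(counts)
print("classical conductance:", counts["transmitted"])
```

Since each fully transmitted channel contributes 1 and each reflected channel contributes 0, the classical conductance is simply the number of transmitted sequences.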

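Remark 6.1. Such classical channels are mentioned in the analysis of [57] for the transmission through an open kicked rotator. They sit in the phase space regions above the lead L1 which are either sent back to L1, or sent to L2 through the classical dynamics, in a time smaller than the Ehrenfest time $T_{Ehr} = \log N/\log 4 = k$. For our baker’s map, these regions are vertical strips of widths $4^{-\ell}$, $\ell = 2, \dots, k$, which exit to L1 or L2 at time $\ell$. The particularity of the Walsh quantization is the exact full transmission (or reflection) through these channels.

6.2.2. Nonclassical transmission channels. The nonclassical channels are necessarily combinations of the position states $|\boldsymbol{\epsilon}\rangle$ with $\epsilon_\ell \in \{1, 2\}$ for all $2 \le \ell \le k$ (“nonclassical” sequences or states). The associated positions $4Q_j = 0\cdot\epsilon_2\epsilon_3\cdots\epsilon_k$ lie close to the Cantor set C, such that $\Gamma_- = C \times I$ is the set of points never escaping through the open map (see Eq. (3.6)). For such a state $|\boldsymbol{\epsilon}\rangle$, the term (6.6) vanishes for all n < k, due to the first quDit $\pi_3 e_{\epsilon_{n+1}} = 0$. That state therefore accomplishes k “unitary bounces” inside the cavity, before it starts to decay out of it. We first consider the terms $t_{k+m}|\boldsymbol{\epsilon}\rangle$ for $0 \le m < k$:
\[ t_k|\boldsymbol{\epsilon}\rangle = (e_3/2) \otimes F_4^* e_{\epsilon_2} \otimes F_4^* e_{\epsilon_3} \otimes\cdots\otimes F_4^* e_{\epsilon_k}, \]
while for 0 < m < k,
\[ t_{k+m}|\boldsymbol{\epsilon}\rangle = \pi_3 F_4^* e_{\epsilon_{m+1}} \otimes F_4^* e_{\epsilon_{m+2}} \otimes\cdots\otimes F_4^* e_{\epsilon_k} \otimes F_4^*\pi_I F_4^* e_0 \otimes F_4^*\pi_I F_4^* e_{\epsilon_2} \otimes\cdots\otimes F_4^*\pi_I F_4^* e_{\epsilon_m} . \tag{6.7} \]
(The quDit following $F_4^* e_{\epsilon_k}$ is obtained by iterating $\Pi_I U_h$ once more on the first quDit $F_4^* e_0$ of $t_k|\boldsymbol{\epsilon}\rangle$ before projection; this is the form consistent with the norm (6.8) below.) An explicit computation shows that
\[ \|\pi_3 F_4^* e_j\|^2 = \frac14, \qquad \|\pi_I F_4^* e_j\|^2 = \frac12, \qquad j = 0, \dots, 3, \]
so that
\[ \big\|t_{k+m}|\boldsymbol{\epsilon}\rangle\big\|^2 = \frac{1}{4\times 2^m}, \qquad 0 \le m \le k-1 . \tag{6.8} \]
For larger times n = pk + m, p > 1, m ∈ [0, k − 1], the state $t_n|\boldsymbol{\epsilon}\rangle$ is obtained from (6.7) by inserting the operator $(\pi_I F_4^*)^{p-1}$ in front of each quDit $e_{\epsilon_\ell}$. Since $\pi_I F_4^*$ has a spectral radius |λ+| < 1, the norms of these states will decay exponentially fast as n → ∞. This argument gives the following lemma.

Lemma 6.2. For any 0 < δ < 1, there exists C > 0 such that, for any k ≥ 1 and any nonclassical state $|\boldsymbol{\epsilon}\rangle$, we have
\[ \sum_{m > [\delta k]} \big\|t_{k+m}|\boldsymbol{\epsilon}\rangle\big\| \le C\, 2^{-\delta k/2} . \]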
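Formula (6.8) and the vanishing of the terms with n < k can be probed on a small instance. The sketch below is not the authors' code: it assumes the tensor-product action U_h(v1 ⊗ ⋯ ⊗ vk) = v2 ⊗ ⋯ ⊗ vk ⊗ F4* v1 stated in §6.2.1, takes (F4*)_{cd} = i^{cd}/2, and stores a state as a list of 4^K amplitudes indexed by base-4 digits (first quDit most significant). All function names are ours.

```python
K = 3                                   # number of quDits, so N = 4**K = 64
DIM = 4 ** K
POW_I = [1 + 0j, 1j, -1 + 0j, -1j]      # powers of i; (F4*)[c][d] = i**(c*d) / 2

def apply_U(state):
    """Assumed Walsh-quantized baker's map: shift the quDits, apply F4* to the old first one."""
    out = [0j] * DIM
    top = DIM // 4
    for idx, amp in enumerate(state):
        if amp == 0:
            continue
        d1, rest = divmod(idx, top)     # d1 = first quDit, rest = remaining digits
        for c in range(4):
            out[rest * 4 + c] += 0.5 * POW_I[(c * d1) % 4] * amp
    return out

def project_first(state, allowed):
    """Keep only the components whose first quDit lies in `allowed`."""
    top = DIM // 4
    return [amp if (idx // top) in allowed else 0j for idx, amp in enumerate(state)]

def norm_sq(state):
    return sum(abs(a) ** 2 for a in state)

def t_n(eps_tail, n):
    """Return t_n |eps> for eps = 0 eps_2 ... eps_K (an input state in lead L1)."""
    idx = 0
    for d in (0,) + tuple(eps_tail):
        idx = idx * 4 + d
    state = [0j] * DIM
    state[idx] = 1.0
    for _ in range(n - 1):
        state = project_first(apply_U(state), {1, 2})   # Pi_I U_h
    return project_first(apply_U(state), {3})           # Pi_{L2} U_h

tail = (1, 2)                            # a nonclassical sequence eps = 0 1 2
for m in range(K):
    print(m, norm_sq(t_n(tail, K + m)), 1 / (4 * 2 ** m))
```

The same routine can be used to test Lemma 6.1, for instance with the tail (1, 3), for which only the term n = 2 survives.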

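Neglecting errors of order $O(2^{-\delta k/2})$, we just need to compute $\big\|\sum_{m=0}^{[\delta k]} e^{i(k+m)\vartheta}\, t_{k+m}|\boldsymbol{\epsilon}\rangle\big\|^2$. From (6.8) we already know the diagonal terms:
\[ \sum_{m=0}^{[\delta k]} \big\|t_{k+m}|\boldsymbol{\epsilon}\rangle\big\|^2 = \frac12 + O(2^{-\delta k}) . \tag{6.9} \]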

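In the next lemma we will show that the contribution to the conductance of the nondiagonal terms is negligible in the semiclassical limit.

Lemma 6.3. Let 0 < δ ≤ 1/5. There exists C = C(δ) > 0 such that for any k ≥ 1,
\[ \frac{\#\big\{\boldsymbol{\epsilon}\ \text{nonclassical},\ \exists\, m \ne m' \in [0, \delta k],\ \langle\boldsymbol{\epsilon}|t^*_{k+m'}\,t_{k+m}|\boldsymbol{\epsilon}\rangle \ne 0\big\}}{\#\{\text{nonclassical } \boldsymbol{\epsilon}\}} \le C\, 2^{-\delta k/2} . \]
In other words, in the semiclassical limit, a “generic” nonclassical state $|\boldsymbol{\epsilon}\rangle$ will satisfy
\[ \forall m, m' \in [0, \delta k], \qquad m \ne m' \Longrightarrow \langle\boldsymbol{\epsilon}|t^*_{k+m'}\,t_{k+m}|\boldsymbol{\epsilon}\rangle = 0 . \]

Proof. Take an arbitrary nonclassical state $|\boldsymbol{\epsilon}\rangle$, and any $m, m' \in [0, \delta k]$, $m > m'$. From (6.7), the first (k − m) quDits of the states $t_{k+m}|\boldsymbol{\epsilon}\rangle$ and $t_{k+m'}|\boldsymbol{\epsilon}\rangle$ are respectively
\[ \pi_3 F_4^* e_{\epsilon_{m+1}} \otimes F_4^* e_{\epsilon_{m+2}} \otimes\cdots\otimes F_4^* e_{\epsilon_k}, \qquad \pi_3 F_4^* e_{\epsilon_{m'+1}} \otimes F_4^* e_{\epsilon_{m'+2}} \otimes\cdots\otimes F_4^* e_{\epsilon_{k+m'-m}} . \]
Due to the unitarity of $F_4^*$ and the fact that the $e_i$ form an orthonormal basis of $\mathbb{C}^4$, the two states $t_{k+m}|\boldsymbol{\epsilon}\rangle$, $t_{k+m'}|\boldsymbol{\epsilon}\rangle$ will be orthogonal if the sequences $\epsilon_{m+2}\cdots\epsilon_k$ and $\epsilon_{m'+2}\cdots\epsilon_{k+m'-m}$ are not equal. Since we took m < δk, these two sequences are subsequences of length (k − m − 1) ≥ (1 − δ)k of the sequence $\boldsymbol{\epsilon}$, shifted from one another by (m − m′) steps. If the two subsequences are equal, then all subsequences
\[ \epsilon_{k-(p+1)\ell+1}\cdots\epsilon_{k-p\ell}, \quad p = 0, \dots, R, \qquad \ell \stackrel{\rm def}{=} (m - m'), \quad R \stackrel{\rm def}{=} \Big[\frac{k-m-1}{\ell}\Big], \]
have to be equal to each other. Hence $\boldsymbol{\epsilon}$ contains a subsequence of length $(R+1)\ell$ which is periodic with period $\ell$.

Let us count the number of such sequences $\boldsymbol{\epsilon}$. Once we have fixed the $\ell$ bits $\epsilon_{k-\ell+1}\cdots\epsilon_k$, the remaining free bits are $\epsilon_2\cdots\epsilon_{k-(R+1)\ell}$. The number #(m, m′) of such sequences is therefore $2^{\ell} \times 2^{k-(R+1)\ell-1}$. From the definition of R, we get
\[ \#(m, m') < 2^{2m-m'} \le 2^{2\delta k} . \]
Taking into account all possible pairs (m, m′), we obtain the following bound for the number of nongeneric nonclassical channels:
\[ \#\big\{\boldsymbol{\epsilon}\ \text{nonclassical},\ \exists\, m, m',\ 0 \le m' < m \le \delta k,\ \langle\boldsymbol{\epsilon}|t^*_{k+m'}\,t_{k+m}|\boldsymbol{\epsilon}\rangle \ne 0\big\} \le (\delta k)^2\, 2^{2\delta k} . \]
Since $\#\{\text{nonclassical } \boldsymbol{\epsilon}\} = 2^{k-1}$ and 2δ ≤ 2/5, we have proven the lemma. □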

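From Lemma 6.2 and Eq. (6.9), a generic nonclassical sequence $\boldsymbol{\epsilon}$ will satisfy $\|t|\boldsymbol{\epsilon}\rangle\|^2 = \frac12 + O(2^{-\delta k/2})$. For a nongeneric nonclassical sequence $\boldsymbol{\epsilon}$, we use the simple bound $\|t|\boldsymbol{\epsilon}\rangle\|^2 \le 1$. As a result, we get the following estimate for the “nonclassical conductance”:
\[ \operatorname{tr}_{noncl}(t^*t) = \sum_{\substack{\boldsymbol{\epsilon}\ \text{nonclassical}\\ \text{generic}}} \big\|t|\boldsymbol{\epsilon}\rangle\big\|^2 + \sum_{\substack{\boldsymbol{\epsilon}\ \text{nonclassical}\\ \text{nongeneric}}} \big\|t|\boldsymbol{\epsilon}\rangle\big\|^2 = 2^{k-1}\Big(\frac12 + O(2^{-\delta k/2})\Big) . \tag{6.10} \]
Adding this to the “classical conductance”, we get the full conductance
\[ g(\vartheta) = \operatorname{tr}\big(t^*t(\vartheta)\big) = \frac{4^{k-1}}{2} + O\big(2^{(1-\delta/2)k}\big) . \tag{6.11} \]
The implied constant is independent of ϑ ∈ [0, 2π) and 0 < δ ≤ 1/5. The number of scattering channels in our model is given by $M(h) = 4^{k-1}$, see (6.4), so we have proven (1.1) in Theorem 2.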

6.3. Noise power. The conductance corresponds to the first moment of the distribution of transmission eigenvalues. It cannot distinguish between a purely classical transport ($T_i \in \{0, 1\}$) and a quantum one (some $T_i$ take intermediate values). To do so, we need to compute the second moment of these eigenvalues, that is, the trace
\[ \operatorname{tr}\big((t^*t)^2\big) = \sum_{Q_j\in L_1} \big\|t^*t|Q_j\rangle\big\|^2 , \]
or equivalently the noise power (6.3). As in the previous section, we split the sum on the right-hand side between the classical and nonclassical states $|Q_j\rangle = |\boldsymbol{\epsilon}\rangle$. Lemma 6.1 shows that half the classical states are in the kernel of $t^*t$, half in the eigenspace of $t^*t$ associated with the eigenvalue 1 (full transmission). As a consequence, the trace over the classical states takes the value
\[ \operatorname{tr}_{cl}\big((t^*t)^2\big) = \frac{4^{k-1} - 2^{k-1}}{2} . \]
Obviously, the classical states are noiseless.


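The contribution of the nonclassical states is more delicate. According to Lemma 6.2, for any nonclassical state $|\boldsymbol{\epsilon}\rangle$ we have (for any 0 < δ < 1)
\[ t|\boldsymbol{\epsilon}\rangle = \sum_{m=0}^{[\delta k]} e^{i(k+m)\vartheta}\, t_{k+m}|\boldsymbol{\epsilon}\rangle + O(2^{-\delta k/2}) . \]
We now apply to each state $t_{k+m}|\boldsymbol{\epsilon}\rangle$, m ≤ δk, the adjoint operator
\[ t^* = \sum_{n\ge0} e^{-in\vartheta}\, t_n^* . \tag{6.12} \]
According to (6.7), the state $t_{k+m}|\boldsymbol{\epsilon}\rangle$ has the form
\[ t_{k+m}|\boldsymbol{\epsilon}\rangle = e_3 \otimes F_4^*\pi_I w_2 \otimes\cdots\otimes F_4^*\pi_I w_k , \]
for some explicit set of quDits $w_\ell \in \mathbb{C}^4$, $2 \le \ell \le k$. From the expressions
\[ t_n^* = \Pi_{L_1} U_h^* (\Pi_I U_h^*)^{n-1} \Pi_{L_2}, \qquad \Pi_I U_h^*(v_1\otimes\cdots\otimes v_k) = \pi_I F_4 v_k \otimes v_1 \otimes\cdots\otimes v_{k-1}, \]
we can easily write the action of the operators $t_n^*$ on $t_{k+m}|\boldsymbol{\epsilon}\rangle$: if n < k, then
\[ t_n^*\, t_{k+m}|\boldsymbol{\epsilon}\rangle = \pi_0\pi_I w_{k-n+1} \otimes \dots = 0 . \]
The first non-trivial case of n = k is given by
\[ t_k^*\, t_{k+m}|\boldsymbol{\epsilon}\rangle = \pi_0 F_4 e_3 \otimes \pi_I w_2 \otimes\cdots\otimes \pi_I w_k , \]
while for any 1 ≤ m′ ≤ k − 1, we have
\[ |\psi_{m',m}(\boldsymbol{\epsilon})\rangle \stackrel{\rm def}{=} t^*_{k+m'}\, t_{k+m}|\boldsymbol{\epsilon}\rangle = \pi_0 F_4\pi_I w_{k-m'+1} \otimes \pi_I F_4\pi_I w_{k-m'+2} \otimes\cdots\otimes \pi_I F_4\pi_I w_k \otimes \pi_I F_4 e_3 \otimes \pi_I w_2 \otimes\cdots\otimes \pi_I w_{k-m'} . \]
The above state is obtained by first evolving $|\boldsymbol{\epsilon}\rangle$ k + m times through the “inside propagator” $U_h\Pi_I$, then projecting on the “output lead” L2, then evolving backwards (through $\Pi_I U_h^*$) k + m′ times, and finally projecting on the “input lead” L1. As for the case of the operator t, we see that by increasing m′ we increase the number of quDits on which we apply the operator $\pi_I F_4$. Therefore, for any index m, the norm of $|\psi_{m',m}(\boldsymbol{\epsilon})\rangle$ will decrease exponentially fast with m′. As in Lemma 6.2, we truncate the expansion (6.12) to the range m′ ≤ δk, thereby omitting a remainder $O(2^{-\delta k/2})$. We now replace the quDits $w_\ell$ by their explicit values, which depend on the index m. We introduce the following notations for operators on $\mathbb{C}^4$:
\[ P_{\alpha\beta} \stackrel{\rm def}{=} \pi_\alpha F_4 \pi_\beta F_4^*, \qquad \text{with } \alpha, \beta \in \{0, I, 3\} . \]
The explicit decomposition of $|\psi_{m',m}(\boldsymbol{\epsilon})\rangle$ depends on the sign of $\ell \stackrel{\rm def}{=} m - m'$, and on whether m, m′ = 0 or not (we will often omit to indicate the dependence on $\boldsymbol{\epsilon}$):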


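• for m′ = m,
\[ |\psi_{0,0}\rangle = P_{03} e_0 \otimes e_{\epsilon_2} \otimes\cdots\otimes e_{\epsilon_k}, \qquad |\psi_{m,m}\rangle = P_{0I} e_0 \otimes P_{II} e_{\epsilon_2} \otimes\cdots\otimes P_{II} e_{\epsilon_m} \otimes P_{I3} e_{\epsilon_{m+1}} \otimes e_{\epsilon_{m+2}} \otimes\cdots\otimes e_{\epsilon_k} . \tag{6.13} \]
• for m = m′ + ℓ, ℓ > 0,
\[ |\psi_{0,\ell}\rangle = P_{03} e_{\epsilon_{\ell+1}} \otimes e_{\epsilon_{\ell+2}} \otimes\cdots\otimes e_{\epsilon_k} \otimes \pi_I F_4^* e_0 \otimes \pi_I F_4^* e_{\epsilon_2} \otimes\cdots\otimes \pi_I F_4^* e_{\epsilon_\ell}, \]
\[ |\psi_{m',m'+\ell}\rangle = P_{0I} e_{\epsilon_{\ell+1}} \otimes P_{II} e_{\epsilon_{\ell+2}} \otimes\cdots\otimes P_{II} e_{\epsilon_{\ell+m'}} \otimes P_{I3} e_{\epsilon_{\ell+m'+1}} \otimes e_{\epsilon_{\ell+m'+2}} \otimes\cdots\otimes e_{\epsilon_k} \otimes \pi_I F_4^* e_0 \otimes \pi_I F_4^* e_{\epsilon_2} \otimes\cdots\otimes \pi_I F_4^* e_{\epsilon_\ell} . \tag{6.14} \]
• for m = m′ + ℓ, ℓ < 0,
\[ |\psi_{|\ell|,0}\rangle = \pi_0 F_4 e_{\epsilon_{k-|\ell|+1}} \otimes \pi_I F_4 e_{\epsilon_{k-|\ell|+2}} \otimes\cdots\otimes \pi_I F_4 e_{\epsilon_k} \otimes P_{I3} e_0 \otimes e_{\epsilon_2} \otimes\cdots\otimes e_{\epsilon_{k-|\ell|}}, \]
\[ |\psi_{m+|\ell|,m}\rangle = \pi_0 F_4 e_{\epsilon_{k-|\ell|+1}} \otimes \pi_I F_4 e_{\epsilon_{k-|\ell|+2}} \otimes\cdots\otimes \pi_I F_4 e_{\epsilon_k} \otimes P_{II} e_0 \otimes P_{II} e_{\epsilon_2} \otimes\cdots\otimes P_{II} e_{\epsilon_m} \otimes P_{I3} e_{\epsilon_{m+1}} \otimes e_{\epsilon_{m+2}} \otimes\cdots\otimes e_{\epsilon_{k-|\ell|}} . \tag{6.15} \]
We notice that each of these states contains subfactors $e_{\epsilon_{m+2}} \otimes\cdots\otimes e_{\epsilon_k}$ if m ≥ m′, and $e_{\epsilon_{m+2}} \otimes\cdots\otimes e_{\epsilon_{k+m-m'}}$ if m < m′. Compared to its position in the tensor product expansion of $|\boldsymbol{\epsilon}\rangle$, this subfactor is shifted by m′ − m = −ℓ steps. From this remark, and using similar methods as for Lemma 6.3, we can easily prove the following lemma.

Lemma 6.4. Let 0 < δ < 1/6 and for any pair of indices (m, m′), denote ℓ = m − m′. There exists C = C(δ) > 0 such that
\[ \frac{\#\big\{\boldsymbol{\epsilon}\ \text{nonclass.} : \exists\, m_1, m_1', m_2, m_2' \in [0, \delta k],\ \ell_1 \ne \ell_2,\ \langle\psi_{m_1',m_1}|\psi_{m_2',m_2}\rangle \ne 0\big\}}{\#\{\text{nonclass. } \boldsymbol{\epsilon}\}} \le C\, 2^{-Ck} . \]
In other words, for a generic nonclassical sequence $\boldsymbol{\epsilon}$, any two states $|\psi_{m_1',m_1}(\boldsymbol{\epsilon})\rangle$, $|\psi_{m_2',m_2}(\boldsymbol{\epsilon})\rangle$ with $m_i, m_i' \le \delta k$ will be orthogonal to each other if $\ell_1 \ne \ell_2$, that is, if the shifts between their respective forward and backward evolution times are different.

From now on we assume that $\boldsymbol{\epsilon}$ is a generic nonclassical sequence in the sense of the above lemma. If we group the states $|\psi_{m',m}(\boldsymbol{\epsilon})\rangle$ sharing the same shift ℓ into
\[ |\Psi_\ell(\boldsymbol{\epsilon})\rangle \stackrel{\rm def}{=} \sum_{\substack{0 \le m, m' \le \delta k\\ m = m' + \ell}} |\psi_{m',m}(\boldsymbol{\epsilon})\rangle , \]
then genericity implies that $\langle\Psi_\ell(\boldsymbol{\epsilon})|\Psi_{\ell'}(\boldsymbol{\epsilon})\rangle = 0$ if ℓ ≠ ℓ′. The square-norm of $t^*t|\boldsymbol{\epsilon}\rangle$ is then given by
\[ \big\|t^*t|\boldsymbol{\epsilon}\rangle\big\|^2 = \sum_{|\ell| \le \delta k} \|\Psi_\ell(\boldsymbol{\epsilon})\|^2 + O(2^{-\delta k/2}) . \tag{6.16} \]
As we will see, no further simplification occurs in this expression, meaning that two different states $|\psi_{m',m}\rangle$ with the same ℓ will generally interfere with each other. Our remaining task consists in computing each square norm on the right-hand side of (6.16). We will use the explicit tensor decompositions (6.13)–(6.15), and notice that the overlap between two states $|\psi_{m',m}\rangle$ is the product of the overlaps of their tensor factors. We split the lengthy, yet straightforward computation according to the value of ℓ.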


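6.3.1. Norm of $\Psi_0$. We have
\[ \|\Psi_0\|^2 = \sum_{m \le \delta k} \|\psi_{m,m}\|^2 + 2 \sum_{0 \le m < n \le \delta k} \operatorname{Re}\langle\psi_{m,m}|\psi_{n,n}\rangle . \tag{6.17} \]
The successive diagonal terms take the values $\|\psi_{0,0}\|^2 = \|P_{03} e_0\|^2 = \frac{1}{16}$, while for m ≥ 1,
\[ \|\psi_{m,m}\|^2 = \|P_{0I} e_0\|^2 \Big(\prod_{\ell=2}^{m} \|P_{II} e_{\epsilon_\ell}\|^2\Big) \|P_{I3} e_{\epsilon_{m+1}}\|^2 = \frac14 \Big(\frac38\Big)^{m-1} \frac18 . \]
The sum over the diagonal terms is therefore
\[ \sum_{m \le \delta k} \|\psi_{m,m}\|^2 = \frac{9}{80} + O\big((3/8)^{\delta k}\big) . \]
The nondiagonal terms read, for any 0 < n ≤ δk,
\[ \langle\psi_{0,0}|\psi_{n,n}\rangle = \langle P_{03} e_0, P_{0I} e_0\rangle \Big(\prod_{\ell=2}^{n} \langle e_{\epsilon_\ell}, P_{II} e_{\epsilon_\ell}\rangle\Big) \langle e_{\epsilon_{n+1}}, P_{I3} e_{\epsilon_{n+1}}\rangle = \frac18 \Big(\frac12\Big)^{n-1} \frac14 , \]
and for 0 < m < n ≤ δk, one similarly gets
\[ \langle\psi_{m,m}|\psi_{n,n}\rangle = \frac14 \Big(\frac38\Big)^{m-1} \frac{1 \pm i}{16} \Big(\frac12\Big)^{n-m-1} \frac14 . \]
The sign is + if $\epsilon_{m+1} = 1$, and − if $\epsilon_{m+1} = 2$. Adding up the real parts of these off-diagonal terms, we obtain
\[ 2 \sum_{0 \le m < n \le \delta k} \operatorname{Re}\langle\psi_{m,m}|\psi_{n,n}\rangle = \frac{3}{20} + O(2^{-\delta k}) . \]
We notice that this contribution is of the same order as the diagonal one. Summing the diagonal and nondiagonal parts yields the norm
\[ \|\Psi_0\|^2 = \frac{21}{80} + O(2^{-\delta k}) . \tag{6.18} \]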
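The single-quDit quantities entering §6.3.1 and the limiting value 21/80 can be verified numerically. The sketch below assumes (F4*)_{rc} = i^{rc}/2 and P_{αβ} = π_α F4 π_β F4* as defined above, and evaluates the sums for the sequence with all digits equal to 1 (the real parts do not depend on this choice). The truncation M = 60 stands in for the limit δk → ∞, and the helper names are ours.

```python
POW_I = [1 + 0j, 1j, -1 + 0j, -1j]
F4_STAR = [[0.5 * POW_I[(r * c) % 4] for c in range(4)] for r in range(4)]  # inverse Fourier
F4 = [[F4_STAR[c][r].conjugate() for c in range(4)] for r in range(4)]       # its adjoint
SETS = {"0": {0}, "I": {1, 2}, "3": {3}}

def matvec(mat, vec):
    return [sum(mat[r][c] * vec[c] for c in range(4)) for r in range(4)]

def proj(label, vec):
    """Orthogonal projector pi_0, pi_I or pi_3 acting on C^4."""
    return [v if i in SETS[label] else 0j for i, v in enumerate(vec)]

def P(alpha, beta, vec):
    """P_{alpha beta} vec = pi_alpha F4 pi_beta F4* vec."""
    return proj(alpha, matvec(F4, proj(beta, matvec(F4_STAR, vec))))

def e(j):
    return [1 + 0j if i == j else 0j for i in range(4)]

def inner(u, v):
    return sum(a.conjugate() * b for a, b in zip(u, v))

def nsq(v):
    return inner(v, v).real

d0 = nsq(P("0", "3", e(0)))                         # ||P03 e0||^2
a = nsq(P("0", "I", e(0)))                          # ||P0I e0||^2
b = nsq(P("I", "I", e(1)))                          # ||PII e_eps||^2
c = nsq(P("I", "3", e(1)))                          # ||PI3 e_eps||^2
x0 = inner(P("0", "3", e(0)), P("0", "I", e(0)))    # <P03 e0, P0I e0>
h = inner(e(1), P("I", "I", e(1)))                  # <e_eps, PII e_eps>
g = inner(e(1), P("I", "3", e(1)))                  # <e_eps, PI3 e_eps>
y = inner(P("I", "3", e(1)), P("I", "I", e(1)))     # <PI3 e_eps, PII e_eps>

M = 60  # truncation of the sums (stand-in for delta*k)
diag = d0 + sum(a * b ** (m - 1) * c for m in range(1, M + 1))
off = sum(x0 * h ** (n - 1) * g for n in range(1, M + 1))
off += sum(a * b ** (m - 1) * y * h ** (n - m - 1) * g
           for m in range(1, M + 1) for n in range(m + 1, M + 1))
total = diag + 2 * off.real

print(d0, a, b, c)
print(diag, 2 * off.real, total)
```

The three printed sums can be compared with 9/80, 3/20 and 21/80 respectively.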

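6.3.2. Norm of $\Psi_\ell$ with ℓ > 0. From Eq. (6.14) we notice that all states $|\psi_{m,m+\ell}\rangle$, 0 ≤ m ≤ δk − ℓ, share the same ℓ last quDits, which results in a common factor
\[ \prod_{j=1}^{\ell} \|\pi_I F_4^* e_{\epsilon_j}\|^2 = \frac{1}{2^{\ell}} \]
in the norm $\|\Psi_\ell\|^2$. To avoid taking this factor into account at all steps, we rather consider the states $|\tilde\psi_{m,m+\ell}\rangle$ obtained by removing these ℓ last quDits from $|\psi_{m,m+\ell}\rangle$. We first compute the square-norm $\|\tilde\psi_{0,\ell}\|^2 = \frac{1}{16}$. For all m > 0, the first quDit of $|\tilde\psi_{m,m+\ell}\rangle$ is $P_{0I} e_{\epsilon_{\ell+1}}$. From the explicit expression of $P_{0I}$, this quDit vanishes if $\epsilon_{\ell+1} = 2$, so that
\[ \text{if } \epsilon_{\ell+1} = 2, \qquad \|\Psi_\ell(\boldsymbol{\epsilon})\|^2 = \frac{1}{16}\,\frac{1}{2^{\ell}} \quad \text{for all } 1 \le \ell \le \delta k . \tag{6.19} \]
In the opposite case $\epsilon_{\ell+1} = 1$, the states $|\tilde\psi_{m,m+\ell}\rangle$ are nontrivial: for any 0 < m ≤ δk − ℓ,
\[ \|\tilde\psi_{m,m+\ell}\|^2 = \frac18 \Big(\frac38\Big)^{m-1} \frac18 , \quad\text{so that}\quad \sum_{m=0}^{[\delta k]-\ell} \|\tilde\psi_{m,m+\ell}\|^2 = \frac{7}{80} + O\big((3/8)^{\delta k-\ell}\big) . \tag{6.20} \]
We then compute the off-diagonal terms:
\[ \langle\tilde\psi_{0,\ell}|\tilde\psi_{m,m+\ell}\rangle = \frac{-1-i}{16}\,\frac{1}{2^{m-1}}\,\frac14 \quad \text{for } 0 < m \le \delta k - |\ell| , \]
\[ \langle\tilde\psi_{m,m+\ell}|\tilde\psi_{n,n+\ell}\rangle = \frac18 \Big(\frac38\Big)^{m-1} \frac{1 \pm i}{16}\,\frac{1}{2^{n-m-1}}\,\frac14 \quad \text{for } 1 \le m < n \le \delta k - |\ell| \]
(the sign ± in the last line depends on $\epsilon_{\ell+m+1}$). Summing up the real parts yields:
\[ 2 \sum_{0 \le m < n \le \delta k - \ell} \operatorname{Re}\langle\tilde\psi_{m,m+\ell}|\tilde\psi_{n,n+\ell}\rangle = -\frac{1}{20} + O(2^{-\delta k+\ell}) . \]
Adding this to the diagonal terms (6.20), restoring the factor $2^{-\ell}$, and using (6.19) results in the following norm (which explicitly depends on $\boldsymbol{\epsilon}$):
\[ \|\Psi_\ell(\boldsymbol{\epsilon})\|^2 = \frac{1}{2^{\ell}} \Big(\frac{3}{80}\,\delta_{\epsilon_{\ell+1}=1} + \frac{1}{16}\,\delta_{\epsilon_{\ell+1}=2}\Big) + O(2^{-\delta k}) \quad \text{for any } 1 \le \ell \le \delta k . \tag{6.21} \]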

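6.3.3. Norm of $\Psi_\ell$ with ℓ < 0. As in the previous case, we notice from (6.15) that all components of $\Psi_\ell$ share the same |ℓ| first quDits, which contribute a factor
\[ \|\pi_0 F_4 e_{\epsilon_{k-|\ell|+1}}\|^2 \Big(\prod_{j=2}^{|\ell|} \|\pi_I F_4 e_{\epsilon_{k-|\ell|+j}}\|^2\Big) = \frac14\,\frac{1}{2^{|\ell|-1}} . \tag{6.22} \]
We call $|\tilde\psi_{m+|\ell|,m}\rangle$ the states with these |ℓ| quDits removed. They have the norms
\[ \|\tilde\psi_{|\ell|,0}\|^2 = \frac18, \quad\text{and for } m \ge 1, \quad \|\tilde\psi_{m+|\ell|,m}\|^2 = \frac18 \Big(\frac38\Big)^{m-1} \frac18 . \]
Hence, the diagonal contribution reads
\[ \sum_{m=0}^{[\delta k]-|\ell|} \|\tilde\psi_{m+|\ell|,m}\|^2 = \frac{3}{20} + O\big((3/8)^{\delta k-|\ell|}\big) . \]
The nondiagonal terms take the values
\[ \langle\tilde\psi_{|\ell|,0}|\tilde\psi_{n+|\ell|,n}\rangle = \frac{-1+i}{16}\,\frac{1}{2^{n-1}}\,\frac14 \quad \text{for } 0 < n \le \delta k - |\ell| , \]
\[ \langle\tilde\psi_{m+|\ell|,m}|\tilde\psi_{n+|\ell|,n}\rangle = \frac18 \Big(\frac38\Big)^{m-1} \frac{1 \pm i}{16}\,\frac{1}{2^{n-m-1}}\,\frac14 \quad \text{for } 1 \le m < n \le \delta k - |\ell| . \]
These contributions sum up to
\[ 2 \sum_{0 \le m < n \le \delta k - |\ell|} \operatorname{Re}\langle\tilde\psi_{m+|\ell|,m}|\tilde\psi_{n+|\ell|,n}\rangle = -\frac{1}{20} + O(2^{-\delta k+|\ell|}) . \]
Putting this together with the diagonal contributions and restoring the factor (6.22) yields
\[ \|\Psi_\ell\|^2 = \frac{1}{20}\,\frac{1}{2^{|\ell|}} + O(2^{-\delta k}), \qquad -\delta k \le \ell \le -1 . \tag{6.23} \]

6.3.4. Summing up. We can now sum over all shifts ℓ, |ℓ| ≤ δk, for a given generic nonclassical sequence $\boldsymbol{\epsilon}$. The sum over the shifts ℓ ≤ 0 is simple, and independent of the sequence $\boldsymbol{\epsilon}$:
\[ \sum_{-\delta k \le \ell \le 0} \|\Psi_\ell(\boldsymbol{\epsilon})\|^2 = \frac{25}{80} + O(k\, 2^{-\delta k}) . \]
The sum over the shifts ℓ > 0 is slightly more delicate, since the norm of $|\Psi_\ell(\boldsymbol{\epsilon})\rangle$ depends on $\boldsymbol{\epsilon}$, see Eq. (6.21). However, we notice that the set of generic nonclassical sequences can be partitioned into “mirror pairs” $(\boldsymbol{\epsilon}, \bar{\boldsymbol{\epsilon}})$ such that $\forall \ell \in [2, k]$, $\epsilon_\ell = 1 \iff \bar\epsilon_\ell = 2$. Summing the norms over a “mirror pair” is easy: for 1 ≤ ℓ ≤ δk,
\[ \|\Psi_\ell(\boldsymbol{\epsilon})\|^2 + \|\Psi_\ell(\bar{\boldsymbol{\epsilon}})\|^2 = \frac{1}{2^{\ell}}\,\frac{1}{10} + O(2^{-\delta k}) . \]
This contribution is identical (up to the remainder) with $\|\Psi_{-\ell}(\boldsymbol{\epsilon})\|^2 + \|\Psi_{-\ell}(\bar{\boldsymbol{\epsilon}})\|^2$, which shows a sort of symmetry between positive and negative shifts. Summing over all |ℓ| ≤ δk, we get, for any mirror pair $(\boldsymbol{\epsilon}, \bar{\boldsymbol{\epsilon}})$ of generic nonclassical sequences:
\[ \big\|t^*t|\boldsymbol{\epsilon}\rangle\big\|^2 + \big\|t^*t|\bar{\boldsymbol{\epsilon}}\rangle\big\|^2 = \frac{58}{80} + O(2^{-\delta k/2}) . \]
Using Lemma 6.4, we obtain the trace over the nonclassical states:
\[ \operatorname{tr}_{noncl}\big((t^*t)^2\big) = 2^{k-2}\Big(\frac{58}{80} + O(2^{-Ck}) + O(2^{-\delta k/2})\Big) . \]
Subtracting this expression from the “nonclassical conductance” (6.10), and calling $\tilde C = \min(C, \frac12\delta)$, we finally obtain the noise power:
\[ P = \operatorname{tr}\big(t^*t - (t^*t)^2\big) = \operatorname{tr}_{noncl}\big(t^*t - (t^*t)^2\big) = 2^{k-1}\Big(\frac{11}{80} + O(2^{-\tilde C k})\Big) . \]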
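The bookkeeping that leads from (6.18), (6.21) and (6.23) to the factor 11/80 is simple rational arithmetic, and it can be replayed exactly. The sketch below keeps only the leading terms (all remainders are dropped) and replaces the sums over shifts by their limits, using the geometric series sum over ℓ ≥ 1 of 2^(−ℓ) = 1; the variable names are ours.

```python
from fractions import Fraction as Fr

geo = Fr(1)                      # limit of sum_{l >= 1} 2**(-l)

shift_zero = Fr(21, 80)          # (6.18): shift l = 0
neg_shifts = Fr(1, 20) * geo     # (6.23) summed over l = -1, -2, ...
nonpositive = shift_zero + neg_shifts

# (6.21) summed over a mirror pair, then over l = 1, 2, ...
pos_pair = (Fr(3, 80) + Fr(1, 16)) * geo

pair_total = 2 * nonpositive + pos_pair       # ||t*t eps||^2 + ||t*t eps_bar||^2
second_moment_per_state = pair_total / 2      # average over the pair
first_moment_per_state = Fr(1, 2)             # leading term of (6.10) per state

noise_per_state = first_moment_per_state - second_moment_per_state
print(nonpositive, pos_pair, pair_total, noise_per_state)
print("difference from the random-matrix value 1/8:", noise_per_state - Fr(1, 8))
```

Multiplying `noise_per_state` by the number 2^(k−1) of nonclassical states gives the leading term of the noise power.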


This proves (1.2) in Theorem 2. As remarked in §1.1, the factor 11/80 is close to the random-matrix prediction for this quantity, namely 1/8 [26, 57]. This is in contrast with our Remark 5.1 that the semiclassical resonance spectrum of the propagator inside the dot, $\tilde B_h = U_h\Pi_I$, is quite different from that of a random subunitary matrix. Somehow, the matrix t, obtained by summing iterates of $\tilde B_h$, has acquired some “genericity”, as far as the distribution of its singular values is concerned. It would be interesting (but quite cumbersome) to compute the higher moments of that distribution.

Acknowledgements. We are grateful to Christof Thiele and Terry Tao for pointing out the “Walsh” interpretation of our toy model, and to the anonymous referee for his comments. The first author thanks Marcos Saraceno for his insights on that model, and André Voros for interesting questions. He is also grateful to UC Berkeley for the hospitality in April 2004. Generous support of both authors by the National Science Foundation under the grant DMS-0200732 is also gratefully acknowledged.

References
1. Alexandrova, I.: Semi-Classical Wavefront Set and Fourier Integral Operators. To appear in Can. J. Math., available at http://arxiv.org/list/math.AP/0407460, 2004
2. Balazs, N.L., Voros, A.: The quantized baker’s transformation. Ann. Phys. (NY) 190, 1–31 (1989)
3. Borgonovi, F., Guarneri, I., Shepelyansky, D.L.: Statistics of quantum lifetimes in a classically chaotic system. Phys. Rev. A 43, 4517–4520 (1991)
4. Bogomolny, E.B.: Semiclassical quantization of multidimensional systems. Nonlinearity 5, 805–866 (1992)
5. Bouzouina, A., De Bièvre, S.: Equipartition of the eigenfunctions of quantized ergodic maps on the torus. Commun. Math. Phys. 178, 83–105 (1996)
6. Büttiker, M.: Scattering theory of thermal and excess noise in open conductors. Phys. Rev. Lett. 65, 2901–2904 (1990)
7. Casati, G., Maspero, G., Shepelyansky, D.L.: Relaxation process in a regime of quantum chaos. Phys. Rev. E 56, R6233–6236 (1997)
8. Chernov, N., Markarian, R.: Ergodic properties of Anosov maps with rectangular holes. Boletim Sociedade Brasileira Matematica 28, 271–314 (1997)
9. Chirikov, B.V.: Time-dependent quantum systems. In: Chaos et physique quantique. (École d’été des Houches, Session LII, 1989), M.J. Giannoni, A. Voros, J. Zinn-Justin, eds., Amsterdam: North Holland, 1991
10. Chirikov, B.V., Izrailev, F.M., Shepelyansky, D.L.: Dynamical stochasticity in classical and quantum mechanics. Math. Phys. Rev. 2, 209–267 (1981), Soviet Sci. Rev. Sect. 2 C, Math. Phys. Rev. 2, Harwood Academic, Chur
11. Christianson, H.: Growth and zeros of the zeta function for hyperbolic rational maps. Preprint 2003, to appear in Can. J. Math., available at http://math.berkeley.edu/~hans
12. Cvitanović, P., Eckhardt, B.: Periodic-orbit quantization of chaotic systems. Phys. Rev. Lett. 63, 823–826 (1989)
13. Degli Esposti, M.: Quantization of the orientation preserving automorphisms of the torus. Ann. Inst. Henri Poincaré 58, 323–341 (1993)
14. Dimassi, M., Sjöstrand, J.: Spectral Asymptotics in the semi-classical limit. Cambridge: Cambridge University Press, 1999
15. Doron, E., Smilansky, U.: Semiclassical quantization of chaotic billiards: a scattering approach. Nonlinearity 5, 1055–1084 (1992)
16. Falconer, K.: Techniques in fractal geometry. New York: J. Wiley & Sons, 1997
17. Fyodorov, Y.V., Sommers, H.-J.: Statistics of resonance poles, phase shifts and time delays in quantum chaotic scattering: Random matrix approach for systems with broken time-reversal invariance. J. Math. Phys. 38, 1918–1981 (1997); ibid: Spectra of random contractions and scattering theory for discrete-time systems. JETP Lett. 72, 422–426 (2000)
18. Gaspard, P., Alonso, D., Burghardt, I.: New Ways of Understanding Semiclassical Quantization. Adv. Chem. Phys. 90, 105–364 (1995)
19. Guillemin, V., Uhlmann, G.: Oscillatory integrals with singular symbols. Duke Math. J. 48, 251–267 (1981)
20. Guillopé, L., Lin, K., Zworski, M.: The Selberg zeta function for convex co-compact Schottky groups. Comm. Math. Phys. 245, 149–176 (2004)


21. Hannay, J.H., Berry, M.V.: Quantization of linear maps on a torus - Fresnel diffraction by a periodic grating. Physica D 1, 267–290 (1980)
22. Helffer, B., Sjöstrand, J.: Résonances en limite semi-classique. Mémoires de la S.M.F. 114(3), (1986)
23. Hörmander, L.: The Analysis of Linear Partial Differential Operators. Vol. I–II. Berlin Heidelberg New York: Springer Verlag, 1983
24. Hörmander, L.: The Analysis of Linear Partial Differential Operators. Vol. III–IV. Berlin Heidelberg New York: Springer Verlag, 1985
25. Ivrii, V.: Microlocal Analysis and Precise Spectral Asymptotics. Springer Verlag, 1998
26. Jalabert, R.A., Pichard, J.-L., Beenakker, C.W.J.: Universal Quantum Signatures of Chaos in Ballistic Transport. Europhys. Lett. 27, 255–260 (1994)
27. Karabegov, A., Schlichenmaier, M.: Identification of Berezin-Toeplitz deformation quantization. J. Reine Angew. Math. 540, 49–76 (2001)
28. Keller, J.B.: Corrected Bohr-Sommerfeld quantum conditions for nonseparable systems. Ann. Phys. 4, 180–188 (1958)
29. Lifermann, J.: Les méthodes rapides de transformation du signal: Fourier, Walsh, Hadamard, Haar. Paris: Masson, 1979
30. Lin, K.: Numerical study of quantum resonances in chaotic scattering. J. Comp. Phys. 176, 295–329 (2002)
31. Lin, K., Zworski, M.: Quantum resonances in chaotic scattering. Chem. Phys. Lett. 355, 201–205 (2002)
32. Lu, W., Sridhar, S., Zworski, M.: Fractal Weyl laws for chaotic open systems. Phys. Rev. Lett. 91, 154101 (2003)
33. Meenakshisundaram, N., Lakshminarayan, A.: Multifractal eigenstates of quantum chaos and the Thue-Morse sequence. Phys. Rev. E 71, 065303(R) (2005)
34. Melrose, R.B., Uhlmann, G.: Lagrangian intersection and the Cauchy problem. Comm. Pure Appl. Math. 22, 483–519 (1979)
35. Miquel, C., Paz, J.P., Saraceno, M.: Quantum computers in phase space. Phys. Rev. A 65, 062309 (2002)
36. Muscalu, C., Thiele, C., Tao, T.: A Carleson-type theorem for a Cantor group model of the Scattering Transform. Nonlinearity 19, 219–246 (2003)
37. Naud, F.: Classical and Quantum lifetimes on some non-compact Riemann surfaces. J. Phys. A 38, 10721–10729 (2005)
38. Nonnenmacher, S.: Fractal Weyl law for open chaotic maps. In: Mathematical physics of quantum mechanics. Asch, J., Joye, A. (Eds.), Lect. Notes in Physics 690, Berlin: Springer, 2006
39. Nonnenmacher, S., Rubin, M.: Resonant eigenstates in quantum chaotic scattering. http://arxiv.org/list/nlin.CD/0608069, 2006
40. Nonnenmacher, S., Zworski, M.: Fractal Weyl laws in discrete models of chaotic scattering. J. Phys. A 38, 10683–10702 (2005), invited paper in a special issue on Trends in quantum chaotic scattering
41. Ozorio de Almeida, A.M., Vallejos, R.O.: Poincaré recurrence theorem and the unitarity of the S-matrix. Chaos, Solitons and Fractals 11, 1015–1020 (2000)
42. Patterson, S.J., Perry, P.: The divisor of Selberg’s zeta function for Kleinian groups (with an Appendix by C.L. Epstein). Duke Math. J. 106, 321–390 (2001)
43. Prosen, T.: General quantum surface-of-section method. J. Phys. A 28, 4133–4155 (1995)
44. Robert, D.: Autour de l’approximation semi-classique. Basel: Birkhäuser, 1987
45. Saraceno, M., Vallejos, R.O.: The quantized D-transformation. Chaos 6, 193–199 (1996)
46. Saraceno, M., Voros, A.: Towards a semiclassical theory of the quantum baker’s map. Physica D 79, 206–268 (1994)
47. Savin, D.V., Sokolov, V.: Quantum versus classical decay laws in open chaotic systems. Phys. Rev. E 56, R4911–4913 (1997); Frahm, K.: Quantum relaxation in open chaotic systems. Phys. Rev. E 56, R6237–6240 (1997)
48. Schack, R., Caves, C.M.: Shifts on a finite qubit string: a class of quantum baker’s maps. Appl. Algebra Engrg. Comm. Comput. 10, 305–310 (2000)
49. Schomerus, H., Tworzydło, J.: Quantum-to-classical crossover of quasi-bound states in open quantum systems. Phys. Rev. Lett. 93, 154102 (2004)
50. Sjöstrand, J.: Geometric bounds on the density of resonances for semiclassical problems. Duke Math. J. 60, 1–57 (1990)
51. Sjöstrand, J., Zworski, M.: Quantum monodromy and semiclassical trace formulae. J. Math. Pure Appl. 81, 1–33 (2002)
52. Sjöstrand, J., Zworski, M.: Elementary linear algebra for advanced spectral problems. Preprint 2003, http://math.berkeley.edu/~zworski/ela.ps.gz, and http://arxiv.org/list/sp/0312166, 2003
53. Sjöstrand, J., Zworski, M.: Fractal upper bounds on the density of semiclassical resonances. Preprint 2005, to appear in Duke Math. J., available at http://math.berkeley.edu/~zworski/sz10.ps.gz
54. Strain, J., Zworski, M.: Growth of the zeta function for a quadratic map and the dimension of the Julia set. Nonlinearity 17, 1607–1622 (2004)


55. Tanner, G.: Spectral statistics for unitary transfer matrices of binary graphs. J. Phys. A 33, 3567–3585 (2000)
56. Tracy, M.M., Scott, A.J.: The classical limit for a class of quantum baker’s maps. J. Phys. A 35, 8341–8360 (2002)
57. Tworzydło, J., Tajic, A., Schomerus, H., Beenakker, C.W.: Dynamical model for the quantum-to-classical crossover of shot noise. Phys. Rev. B 68, 115313 (2003); Jacquod, Ph., Sukhorukov, E.V.: Breakdown of universality in quantum chaotic transport: the two-phase dynamical fluid model. Phys. Rev. Lett. 92, 116801 (2004)
58. Wirzba, A.: Quantum Mechanics and Semiclassics of Hyperbolic n-Disk Scattering Systems. Phys. Rep. 309, 1–116 (1999)
59. Zworski, M.: Dimension of the limit set and the density of resonances for convex co-compact Riemann surfaces. Inv. Math. 136, 353–409 (1999)
60. Życzkowski, K., Sommers, H.-J.: Truncations of random unitary matrices. J. Phys. A 33, 2045–2057 (2000)

Communicated by P. Sarnak

Commun. Math. Phys. 269, 367–399 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0132-z

Communications in

Mathematical Physics

The Phase Transitions from Chiral Nematic Toward Smectic Liquid Crystals Sookyung Joo1 , Daniel Phillips2, 1 Institute for Mathematics and its Applications, University of Minnesota, 400 Lind Hall, 207 Church Street

S.E., Minneapolis, MN 55455, USA. E-mail: [email protected]

2 Mathematics Department, Purdue University, West Lafayette, IN 47907, USA.

E-mail: [email protected] Received: 24 June 2005 / Accepted: 15 June 2006 Published online: 4 November 2006 – © Springer-Verlag 2006

Research supported by NSF grants DMS-0306516 and DMS-0456286.

Abstract: A Chen-Lubensky energy is used to investigate phase transitions from chiral nematic to smectic C* and smectic A* liquid crystal phases. We consider a liquid crystalline material confined between two parallel plates, where the dimensions of the domain $\Omega$ are assumed to be large relative both to the width of a smectic layer and the material’s chiral pitch. We take boundary conditions so that the smectic phase melts at the plates’ surfaces and prove the existence of energy minimizers in an admissible set consisting of order parameters $\Psi \in \mathcal{H}_0^2(\Omega)$ and molecular directors $\mathbf{n} \in \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2)$. Then under the physically observed assumption that the Frank elasticity constants become large near a phase transition, we establish estimates for the transition region separating phases. In particular we derive analytic estimates proving that chirality lowers the transition temperature regime above which minimizers are nematic and below which minimizers are in a smectic phase.

1. Introduction

1.1. Chen-Lubensky model. Liquid crystal phases form when a material has a degree of positional or orientational ordering yet stays in a liquid state. In this paper we study minimizers of the Chen-Lubensky free energy as a way to describe the influence of temperature and material properties on the phases of stable liquid crystal configurations. Liquid crystal molecules are anisotropic. Here we concentrate on the case of long thin molecules where the local order is largely due to near neighbor interactions. Liquid crystals possess one-dimensional order in the nematic phase. Here the molecules are symmetric and tend to slide by one another but locally align, roughly in the direction of their principal axis called a molecular director, $\mathbf{n}$. If nonsymmetric molecules are dispersed in a nematic liquid crystal a chiral nematic phase forms. This phase is distinguished by the fact that the molecular director field $\mathbf{n}$ is no longer locally uniform but


undergoes a small helical distortion. As temperature decreases the mechanical interactions become more pronounced and layered structures form called smectic phases. In the smectic A phase, the molecules form layers such that their directors are parallel to the layer’s normal, while smectic C liquid crystals have layers with $\mathbf{n}$ tilted with respect to the layer normal. See [4, 6, 8]. Here we investigate liquid crystals displaying chiral nematic behavior above a transition temperature and smectic structure below the transition. As the temperature is lowered an energetic competition forms between the tendency of the molecules to form layers of uniform thickness in the smectic phase and the bending of these layers caused by the helical twist of the director configuration in nematic chiral structures. The main objective of this article is to analytically characterize the transition regime. We single out three parameters appearing in the energy: $q \sim 1/d$, where $d$ is the smectic layer thickness, $\tau$ measuring the chiral twist, and $r = T - T_{NA}$ the temperature of the material relative to $T_{NA}$, denoting the transition temperature for a nonchiral ($\tau = 0$) smectic A material. We find decreasing curves $r = \underline{r}(q\tau)$ and $r = \bar{r}(q\tau)$ in the $q\tau$–$r$ plane where $\underline{r}(q\tau) < \bar{r}(q\tau)$ such that minimizers for which $r > \bar{r}(q\tau)$ are in the nematic phase and those for which $r < \underline{r}(q\tau)$ are in a smectic phase. The fact that the curves are decreasing is significant, since this indicates that forming stiffer (thinner) smectic layers by increasing $q$ or increasing the chiral twist $\tau$ results in lowering the transition temperature, which is the physically observed phenomenon. Furthermore we determine the structure of the bounding transition curves. We use the Chen-Lubensky model to describe the transition between chiral nematic and smectic A* or C* liquid crystal phases. (See [5, 20, 21, 25].)
This is defined by a second order energy and is a generalization of the first order covariant Landau-de Gennes free energy introduced to study the nematic to smectic A phase transition [18]. The model is defined in terms of the director $\mathbf{n}$ and the complex valued smectic order parameter $\Psi$. We consider the energy

$$F = \int_\Omega [F_S + F_N]\,dx,$$

where $\Omega \subset \mathbb{R}^3$ is a bounded domain with a piecewise smooth boundary. $F_N$ is the Oseen-Frank energy density for a nematic,

$$F_N = K_1(\nabla\cdot\mathbf{n})^2 + K_2(\mathbf{n}\cdot\nabla\times\mathbf{n} + \tau)^2 + K_3|\mathbf{n}\times(\nabla\times\mathbf{n})|^2 + (K_2 + K_4)\big(\mathrm{tr}(\nabla\mathbf{n})^2 - (\nabla\cdot\mathbf{n})^2\big),$$

where $K_1$, $K_2$ and $K_3$ are the splay, twist, and bend elastic constants, respectively. The parameter $\tau$ denotes the cholesteric twist. We note that each of the four terms summing to $F_N$ vanishes when the director is the special helical configuration

$$\mathbf{n}_\tau(x) = (\cos(\tau z), \sin(\tau z), 0).$$

In the nonchiral case ($\tau = 0$), any constant vector $\mathbf{n} = \bar{\mathbf{n}}$ with $|\bar{\mathbf{n}}| = 1$ makes $F_N$ vanish. It is observed in experiments that the constants $K_2$ and $K_3$ become large near the chiral nematic–smectic A transition and that all three constants $K_1$, $K_2$, and $K_3$ are large near the chiral nematic–smectic C transition [5, 8]. We prove that these phenomena force the director of a minimizer close to a rotation of $\mathbf{n}_\tau$, and that this plays a major role in showing that chirality lowers the transition temperature as described above.
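The claim that every term of the Oseen-Frank density vanishes on the helical configuration can be checked by direct differentiation. The following is a short worked verification, not taken from the paper; it only uses $\mathbf{n}_\tau(x) = (\cos(\tau z), \sin(\tau z), 0)$ and the convention $(\nabla\mathbf{n})_{ij} = \partial_j n_i$, which is an assumption about the index ordering (the trace below does not depend on it).

```latex
\begin{align*}
\nabla\cdot\mathbf{n}_\tau &= \partial_x\cos(\tau z) + \partial_y\sin(\tau z) + \partial_z 0 = 0,\\
\nabla\times\mathbf{n}_\tau &= \big(-\partial_z\sin(\tau z),\ \partial_z\cos(\tau z),\ 0\big)
   = -\tau\,(\cos(\tau z), \sin(\tau z), 0) = -\tau\,\mathbf{n}_\tau,\\
\mathbf{n}_\tau\cdot\nabla\times\mathbf{n}_\tau + \tau &= -\tau|\mathbf{n}_\tau|^2 + \tau = 0,\\
\mathbf{n}_\tau\times(\nabla\times\mathbf{n}_\tau) &= -\tau\,\mathbf{n}_\tau\times\mathbf{n}_\tau = 0,\\
\mathrm{tr}(\nabla\mathbf{n}_\tau)^2 &= \sum_{i,j}\partial_j n_i\,\partial_i n_j = 0
   \quad\text{(only } \partial_z n_1,\ \partial_z n_2 \text{ are nonzero, and } n_3 \equiv 0\text{)}.
\end{align*}
```

In particular $\nabla\times\mathbf{n}_\tau + \tau\mathbf{n}_\tau = 0$, which is the relation that characterizes the limiting directors when the Frank constants become large.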


In order to associate smectic and nematic structure with a state $(\mathbf{n}, \Psi)$ we write $\Psi(x) = \rho(x)e^{i\varphi(x)}$. Then the molecular mass density is defined by

$$\delta(x) = \rho_0(x) + \tfrac{1}{2}\big(\Psi(x) + \Psi^*(x)\big) = \rho_0(x) + \rho(x)\cos\varphi(x),$$

where $\rho_0$ is a locally uniform mass density, $\rho(x)$ is the mass density of the smectic layers, and $\varphi$ parameterizes the layers so that $\nabla\varphi$ is the direction of the layer normal. If $\Psi \equiv 0$ then $\rho(x) = |\Psi(x)| = 0$. This corresponds to a state with no layered structure and is said to be in the nematic phase if $\tau = 0$, and in the chiral nematic phase if $\tau \neq 0$. These phases are denoted as N and N* respectively. If $\Psi \not\equiv 0$ the state corresponds to being smectic if $\tau = 0$ and chiral smectic if $\tau \neq 0$. These phases are labelled as Sm A(C) and Sm A*(C*) respectively. In [5] Chen and Lubensky introduce the smectic free energy density

$$F_S = D_\perp\big|D_{\mathbf{n}}^2\Psi - \mathbf{n}\cdot(\mathbf{n}\cdot D_{\mathbf{n}})D_{\mathbf{n}}\Psi\big|^2 - C_\perp\big(|D_{\mathbf{n}}\Psi|^2 - |\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2\big) + C_\parallel|\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2 + r|\Psi|^2 + \frac{g}{2}|\Psi|^4,$$

where $D_{\mathbf{n}} = \nabla - iq\mathbf{n}$, $D_{\mathbf{n}}^2 = D_{\mathbf{n}}\cdot D_{\mathbf{n}}$, $r = T - T_{NA}$, and such that the coefficients $q$, $D_\perp$, $C_\parallel$, and $g$ are positive ([5, 25]). The energy $F$ is designed to describe chiral nematic to smectic C* transitions if $C_\perp > 0$ and to smectic A* if $C_\perp \le 0$. The energy density $F_S$ is degenerate in that it lacks second order coercivity in the direction $\mathbf{n}$. Nevertheless, since $C_\parallel > 0$, the energy is non-degenerate in this direction as well but with first order growth. In [21], Luk’yanchuk introduces a modified energy by making the second order gradient term isotropic, and we adopt his modified energy here,

$$F = \int_\Omega \big(F_S^* + F_N\big)\,dx,$$

where

$$F_S^* = D|D_{\mathbf{n}}^2\Psi|^2 - C_\perp|D_{\mathbf{n}}\Psi|^2 + C_\parallel|\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2 + \frac{g}{2}|\Psi|^4 + r|\Psi|^2.$$

We have replaced $C_\parallel + C_\perp$ by $C_\parallel$ since $C_\perp$ can be considered much smaller than $C_\parallel$, which we will see more clearly in the next subsection. In fact both $F_S^*$ and $F_S$ minimize at the same uniform smectic state and due to this they should produce the same qualitative behavior. To see this let $(\mathbf{n}, \Psi)$ be such that $\Psi = \rho e^{i\varphi}$ where $\mathbf{n}$ and $\rho$ are constants, and $\varphi$ is linear. Then

$$F_S^* = D|\nabla\varphi - q\mathbf{n}|^4\rho^2 - C_\perp|\nabla\varphi - q\mathbf{n}|^2\rho^2 + C_\parallel(\mathbf{n}\cdot\nabla\varphi - q)^2\rho^2 + r\rho^2 + \frac{g}{2}\rho^4$$
$$\phantom{F_S^*} = D\Big(|\nabla\varphi - q\mathbf{n}|^2 - \frac{C_\perp}{2D}\Big)^2\rho^2 + C_\parallel(\mathbf{n}\cdot\nabla\varphi - q)^2\rho^2 + \Big(r - \frac{C_\perp^2}{4D}\Big)\rho^2 + \frac{g}{2}\rho^4.$$

If $C_\perp \le 0$ we see the energy is minimized if and only if $\nabla\varphi = q\mathbf{n}$. This corresponds to a uniform smectic A state where the direction of molecules is perpendicular to the layers having thickness $d = 2\pi/q$. See Fig. 1(b). If $C_\perp > 0$ however, the energy is


Fig. 1. (a) nematic phase (b) smectic A phase (c) smectic C phase

minimized if and only if $\mathbf{n}\cdot\nabla\varphi = q$ and $|\nabla\varphi - q\mathbf{n}|^2 = \frac{C_\perp}{2D}$. This corresponds to a uniform smectic C state with tilt angle $\theta$, between the director and the layer normal, determined by $\tan^2\theta = C_\perp/(2Dq^2)$ and layer thickness satisfying $(2\pi/d)^2 = q^2 + \frac{C_\perp}{2D}$. (See Fig. 1(c).) We note that the same tilt angle and layer thickness are obtained from the original Chen-Lubensky model as well. Throughout this paper we assume that $\Psi \in \mathcal{H}_0^2(\Omega)$, that is, the smectic phase melts at $\partial\Omega$. Such a boundary condition can result from applying chemical surface treatments or imposing mechanical stresses. See [3, 9, 15, 16]. Using integration by parts, we can rewrite the energy as

$$F = \int_\Omega \Big( D\Big|D_{\mathbf{n}}^2\Psi + \frac{C_\perp}{2D}\Psi\Big|^2 + C_\parallel|\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2 + \frac{g}{2}|\Psi|^4 + \Big(r - \frac{C_\perp^2}{4D}\Big)|\Psi|^2 \Big)\,dx + \int_\Omega F_N\,dx.$$

For $C_\perp > 0$, if $r \ge C_\perp^2/(4D)$ then $\Psi \equiv 0$ minimizes $F_S^*$ and minimizers for $F$ will be chiral nematic. If $r < C_\perp^2/(4D)$ however, then $\Psi \not\equiv 0$ may be more stable and in this case we have the smectic C* phase. On the other hand, if $C_\perp \le 0$, then the region above the line $r = 0$ directly indicates the chiral nematic phase. The goal of our paper is to determine the phase diagram for both N* – Sm C* and N* – Sm A* transitions, which should appear below the lines $r = \frac{C_\perp^2}{4D}$ and $r = 0$, respectively.
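The uniform-state computation of this subsection can be illustrated numerically. The sketch below is not from the paper: it evaluates the modified smectic density for $\Psi = \rho e^{i\varphi}$ with constant $\mathbf{n}$, $\rho$ and linear $\varphi$, under the assumed parameterization $\nabla\varphi = k_\parallel\mathbf{n} + k_\perp\mathbf{w}$ with $\mathbf{w}$ a unit vector orthogonal to $\mathbf{n}$. The function names and the sample parameter values are hypothetical.

```python
import math

def uniform_state_energy(D, C_perp, C_par, r, g, q, rho, k_par, k_perp):
    """Modified smectic density F*_S for Psi = rho*exp(i*phi) with constant n, rho
    and grad(phi) = k_par*n + k_perp*w (w a unit vector orthogonal to n)."""
    tilt_sq = (k_par - q) ** 2 + k_perp ** 2      # |grad(phi) - q n|^2
    return (D * tilt_sq ** 2 * rho ** 2
            - C_perp * tilt_sq * rho ** 2
            + C_par * (k_par - q) ** 2 * rho ** 2  # (n . grad(phi) - q)^2 term
            + r * rho ** 2
            + 0.5 * g * rho ** 4)

def smectic_c_geometry(D, C_perp, q):
    """Tilt angle and layer thickness of the uniform smectic C state (C_perp > 0)."""
    k_perp_sq = C_perp / (2.0 * D)                 # |grad(phi) - q n|^2 at the minimum
    theta = math.atan(math.sqrt(k_perp_sq) / q)    # tan^2(theta) = C_perp / (2 D q^2)
    d = 2.0 * math.pi / math.sqrt(q ** 2 + k_perp_sq)
    return theta, d

# Hypothetical sample values, chosen only for illustration.
D, C_perp, C_par, r, g, q, rho = 1.0, 0.5, 8.0, 0.01, 1.0, 2.0, 0.3
theta, d = smectic_c_geometry(D, C_perp, q)
e_min = uniform_state_energy(D, C_perp, C_par, r, g, q, rho, q, math.sqrt(C_perp / (2 * D)))
```

At the minimizing wave vector the quadratic coefficient in $\rho$ reduces to $r - C_\perp^2/(4D)$, which is why the line $r = C_\perp^2/(4D)$ is the natural reference for the N* – Sm C* transition.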

1.2. Statement of main results. We consider a bounded simply connected Lipschitz domain $\Omega$ in $\mathbb{R}^3$ cut by two plates so that the height $h$ of $\Omega$ is fixed. The domain has piecewise $C^{2,\alpha}$ surface $\partial\Omega$ for some $\alpha > 0$ and $\Omega \supset C_{R,h}(0)$, a cylinder with sufficiently large radius $R$ and height $h$ and center at the origin. We use $H$ to denote a Banach space with real-valued elements, $\mathcal{H}$ if the elements are complex-valued, and $\mathbf{H}$ if the elements are vector-valued. The admissible set in this paper is

$$\mathcal{A} = \{(\Psi, \mathbf{n}) \in \mathcal{H}_0^2(\Omega; \mathbb{C}) \times \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2) : \mathbf{n}(x, y, \pm h)\cdot\mathbf{e}_3 = 0 \text{ for } (x, y, \pm h) \in \partial\Omega\}.$$

Hence we seek minimizers for $F$ where $\Psi \in \mathcal{H}_0^2(\Omega)$ and $\mathbf{n}$ is parallel to the plates on the top and the bottom plates. We note that $(\Psi, \mathbf{n}_\tau)$ is in $\mathcal{A}$ if $\Psi \in \mathcal{H}_0^2(\Omega)$. In Sect. 2, we prove the existence of minimizers for $F$ when $\Psi \in \mathcal{H}_0^2(\Omega)$ and $\mathbf{n} \in \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2)$ under the assumptions on the Frank constants, (2.2).


In Sect. 3, assuming that $K_2$ and $K_3$ are large, we show that the director $\mathbf{n}$ from a minimizer is close to a rotation of $\mathbf{n}_\tau$ about the vector $\mathbf{e}_3$. More precisely, we prove that given $\varepsilon > 0$, if $\tau \ge \tau_0$ for some fixed positive constant $\tau_0$, if $K_2$ and $K_3$ are sufficiently large, and if $(\Psi, \mathbf{n})$ is a minimizer for $F$ in $\mathcal{A}$, then there exists $\tilde{\mathbf{n}}_\tau$, a rotation of $\mathbf{n}_\tau$ about $\mathbf{e}_3$, such that

$$\|\mathbf{n}(x) - \tilde{\mathbf{n}}_\tau(x)\|_{4,\Omega} < \varepsilon. \qquad (1.1)$$

When we consider the N* – C* transition, we can take $\tau_0 = 0$. In Sects. 4 and 5, we draw the nematic-smectic C* transition region in the $q\tau$–$r$ plane for certain values of $q\tau$. As in [1], we assume $q \gg \tau$ and $K_2 = K_3$ in both cases. However, the results in this paper are still valid with any sufficiently large $K_2$ and $K_3$. Furthermore, $C_\perp < 3Dq^2$ and $\sigma \ge 2$ are presumed. From the previous section, we see that the value $\sigma_\perp := C_\perp/(2Dq^2)$ is related to the angle $\theta$ between the director and the layer normal by the equation $\sigma_\perp = \tan^2\theta$. Thus $\sigma_\perp \ll 1$ for materials that have a small tilt angle near the transition. In this paper, we assume to be definite that $\sigma_\perp < 1.5$. On the other hand, the parameter $\sigma := C_\parallel/(4Dq^2)$ is considered to be larger than 1. (See [21] as well as the references therein.) Here we assume $\sigma \ge 2$ for simplicity. In order to study the stability of the chiral nematic and smectic C* phases, we derive eigenvalue estimates for $F_S^*(\Psi, \tilde{\mathbf{n}}_\tau)$ and then rely on the estimate (1.1). We deal with two cases. Let

$$A_1(\tilde{c}_2, c_3) := \{(q, \tau) : Dq\tau \ge c_3 C_\perp,\ q \ge \tilde{c}_2\tau,\ \tau \ge R^{-1},\ q\tau \ge h^{-2}\},$$
$$A_2(c_3) := \{(q, \tau) : Dq\tau < c_3 C_\perp,\ Dq/(C_\perp R^3) \le \tau,\ D/(C_\perp h^6) \le q\tau\},$$

where $\sigma := C_\parallel/(4Dq^2)$ is a fixed constant. In this paper, we consider a large domain relative to $q^{-1}$ and $\tau^{-1}$. In Theorem 5.1 we prove that there exist constants $c_3$, $c_6$, and $c_7$, depending only on $\sigma$, and a constant $K$ depending on $q$ and $\Omega$ such that if $K_1, K_2 \ge K$, if $(\Psi, \mathbf{n})$ is a minimizer for $F$ in $\mathcal{A}$, and if $(q, \tau) \in A_2(c_3)$, then

1) $r \ge \bar{r}_C := \frac{C_\perp^2}{4D} - c_6 D^{1/3} C_\perp^{2/3} (q\tau)^{4/3}$ implies $\Psi \equiv 0$ (chiral nematic), and
2) $r < \underline{r}_C := \frac{C_\perp^2}{4D} - c_7 D^{1/3} C_\perp^{2/3} (q\tau)^{4/3}$ implies $\Psi \not\equiv 0$ (smectic C*).

In Theorems 4.1 and 6.1 we prove that there exist constants $\tilde{c}_2$, $\tilde{c}_4$, and $\tilde{c}_5$, depending only on $\sigma$, and $K$ depending on $q$ and $\Omega$ such that if $K_1, K_2 \ge K$, if $(\Psi, \mathbf{n})$ is a minimizer for $F$ in $\mathcal{A}$, and if $(q, \tau) \in A_1(\tilde{c}_2, c_3)$, then

1) $r > \bar{r}_C := \frac{C_\perp^2}{4D} - \tilde{c}_4 D(q\tau)^2$ implies $\Psi \equiv 0$ (chiral nematic), and
2) $r < \underline{r}_C := \frac{C_\perp^2}{4D} - \tilde{c}_5 D(q\tau)^2$ implies $\Psi \not\equiv 0$ (smectic C*).

Here we take $\tilde{c}_2 = \max(c_2, c_{10})$, $\tilde{c}_4 = \min(c_4, c_{11})$, and $\tilde{c}_5 = \max(c_5, c_{12})$. From the result above, we can conclude that the boundary separating the chiral nematic and smectic C* phase lies between $\bar{r}_C$ and $\underline{r}_C$, where their formulas change at $q\tau = c_3 C_\perp/D$. See Fig. 2. It is illuminating to express these formulas in terms of $b = \tau/q$, $D$, $\tilde{r} = r/q^4$, and $\sigma_\perp$. We have

$$\tilde{r} = D\big(\sigma_\perp^2 - c\,\sigma_\perp^{2/3} b^{4/3}\big) \text{ for } b < 2c_3\sigma_\perp \quad\text{and}\quad \tilde{r} = D\big(\sigma_\perp^2 - c\,b^2\big) \text{ for } b \ge 2c_3\sigma_\perp.$$
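The rescaled form of the transition curves can be checked by substituting $C_\perp = 2Dq^2\sigma_\perp$ into the two thresholds, with $b = \tau/q$ and $\tilde{r} = r/q^4$. The short computation below is a sketch of that substitution; the generic constant $c$ is assumed to absorb the numerical factor $2^{2/3}$ that appears in the second line.

```latex
\begin{align*}
\frac{C_\perp^2}{4D} &= D q^4 \sigma_\perp^2, \\
D^{1/3} C_\perp^{2/3} (q\tau)^{4/3} &= 2^{2/3}\, D q^4\, \sigma_\perp^{2/3} \Big(\frac{\tau}{q}\Big)^{4/3}, \\
D (q\tau)^2 &= D q^4 \Big(\frac{\tau}{q}\Big)^{2}, \\
q\tau = \frac{c_3 C_\perp}{D} \ &\Longleftrightarrow\ \frac{\tau}{q} = 2 c_3 \sigma_\perp .
\end{align*}
```

Dividing the first three lines by $q^4$ gives the two branches of $\tilde{r}$, and the last line shows that the change of regime occurs at $b = 2c_3\sigma_\perp$.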


Fig. 2. Chiral Nematic - Chiral Smectic Phase Diagram (panels: N* − C* Phase Transition and N* − A* Phase Transition; axes: qτ and r)

Finally, in Sect. 7, we estimate the nematic-smectic A* transition region in the $q\tau$–$r$ plane when the lateral side of the domain is assumed smooth. The analysis used in this section is highly dependent on the one in Sect. 4. In this case however it is expected that $K_1$ remains bounded near the phase transition. Because of this we need an alternative way to control the divergence of the director from the minimizer resulting from (3.5). Our approach is to use gauge invariance techniques as in [1]. In Theorem 7.1 we prove that there are a universal positive constant $e_1$ and constants $e_5$, $e_6$ depending only on $\sigma$ and a constant $K_A$ depending on $q$, $\Omega$, $D$, and $C_\perp$ such that if $C_\perp \le 0$, $\tau \ge R^{-1}$, $q\tau \ge h^{-2}$, $q \ge e_1\tau$, and $K_2 \ge K_A$ and if $(\Psi, \mathbf{n})$ is a minimizer for $F$ in $\mathcal{A}$, then the following two statements hold:

1) $r > \bar{r}_A := -e_5 \max\{D(q\tau)^2, |C_\perp| q\tau\}$ implies $\Psi \equiv 0$, and
2) $r < \underline{r}_A := -e_6 \max\{D(q\tau)^2, |C_\perp| q\tau\}$ implies $\Psi \not\equiv 0$.

We should mention that these estimates for both N* – Sm C* and N* – Sm A* phase diagrams are consistent with the results found in [18, 21, 25] (where in [18, 25], the original Chen-Lubensky model is used). Related analyses based on the Landau-de Gennes free energy for the N* – Sm A* transition were done in [1, 24].

2. Admissible Sets and the Existence of Minimizers

We find that the energy becomes

$$F = \int_\Omega \Big(D|D_{\mathbf{n}}^2\Psi|^2 - C_\perp|D_{\mathbf{n}}\Psi|^2 + C_\parallel|\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2 + r|\Psi|^2 + \frac{g}{2}|\Psi|^4\Big)\,dx$$
$$\quad + \int_\Omega \big[(K_1 - K_2 - K_4)(\nabla\cdot\mathbf{n})^2 + (K_3 - K_2 - K_4)|\mathbf{n}\times(\nabla\times\mathbf{n})|^2\big]\,dx - K_4\int_\Omega (\mathbf{n}\cdot(\nabla\times\mathbf{n}))^2\,dx$$
$$\quad + (K_2 + K_4)\int_\Omega |\nabla\mathbf{n}|^2\,dx + 2\tau K_2\int_\Omega \mathbf{n}\cdot(\nabla\times\mathbf{n})\,dx + \tau^2 K_2|\Omega|, \qquad (2.1)$$


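using the following identities for $\mathbf{n} \in \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2)$:

$$\mathrm{tr}(\nabla\mathbf{n})^2 - (\nabla\cdot\mathbf{n})^2 = |\nabla\mathbf{n}|^2 - |\nabla\times\mathbf{n}|^2 - (\nabla\cdot\mathbf{n})^2,$$
$$|\nabla\times\mathbf{n}|^2 = |\mathbf{n}\cdot(\nabla\times\mathbf{n})|^2 + |\mathbf{n}\times(\nabla\times\mathbf{n})|^2.$$

Definition 1. A set $\mathcal{A} \subset \mathcal{H}_0^2(\Omega) \times \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2)$ is called admissible if $\mathcal{A} \neq \emptyset$ and $\mathcal{A}$ is weakly sequentially compact in $\mathcal{H}_0^2(\Omega) \times \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2)$.

We impose the same restriction on the Frank constants as in [1] to ensure the existence of minimizers and the estimate in Sect. 3.

Assumptions. $g$, $D$, $C_\perp$, $\sigma \equiv \frac{C_\parallel}{4Dq^2}$ are fixed constants with $g$, $D$, and $\sigma$ positive, $q \ge 0$ and $\tau \ge 0$. There exist fixed positive constants $c_0$ and $c_1$ such that

$$c_1 \ge K_2 + K_4 \ge c_0 > 0, \quad K_1 \ge K_2 + K_4, \quad K_3 \ge K_2 + K_4, \quad\text{and}\quad 0 \ge K_4. \qquad (2.2)$$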

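Throughout this paper, $C$ denotes a positive constant, which does not always have to be the same.

Theorem 2.1. Let $F$ be as in (2.1) such that the above assumptions hold. Then there exists a minimizer for $F$ in $\mathcal{A}$.

Proof. Let $(\Psi_j, \mathbf{n}_j)$ be a minimizing sequence for $F$ in $\mathcal{A}$. Note that

$$\int_\Omega F_N\,dx \ge \int_\Omega \big[c_0|\nabla\mathbf{n}_j|^2 - K_4(\mathbf{n}_j\cdot\nabla\times\mathbf{n}_j)^2 + 2\tau K_2\,\mathbf{n}_j\cdot(\nabla\times\mathbf{n}_j)\big]\,dx$$
$$\ge c_0\int_\Omega |\nabla\mathbf{n}_j|^2\,dx - (K_4 + \varepsilon K_2)\int_\Omega (\mathbf{n}_j\cdot(\nabla\times\mathbf{n}_j))^2\,dx - C \ge c_0\int_\Omega |\nabla\mathbf{n}_j|^2\,dx - C,$$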

taking ε = −K 4 /(2K 2 ) and C is a positive constant depending on the parameters and ||. Thus F∗S + c0 |∇n j |2 dx − C F( j , n j ) ≥ D 2 g 2 |Dn j j | + C⊥ |Dn j j |2 + r | j |2 + | j |4 + c0 |∇n j |2 dx = 2 2

2 D 2 2C⊥ 2 2C⊥ j − | j |2 dx − C +

Dn j j + 2 D D and hence

2 |D2n j j | + |Dn j ψ j |2 + |ψ j |4 + |∇n j |2 dx ≤ C.

(2.3)


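In particular we see that

$$\|\nabla\Psi_j\|_2 = \|D_{\mathbf{n}_j}\Psi_j + iq\mathbf{n}_j\Psi_j\|_2 \le C. \qquad (2.4)$$

Using (2.3), (2.4), $|\mathbf{n}_j| = 1$, and integration by parts, we have

$$\|\nabla^2\Psi_j\|_2 = C\|D_{\mathbf{n}_j}^2\Psi_j + iq(\nabla\cdot\mathbf{n}_j)\Psi_j + 2iq\,\mathbf{n}_j\cdot\nabla\Psi_j + q^2\Psi_j\|_2 \le C + C\|\Psi_j\|_\infty. \qquad (2.5)$$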

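By the Gagliardo-Nirenberg inequality ([10, 22, 23]), there are positive constants $C_1$ and $C_2$ depending only on the Lipschitz domain $\Omega$ for which

$$\|\Psi_j\|_\infty \le C_1\|\nabla^2\Psi_j\|_2^{3/4}\|\Psi_j\|_2^{1/4} + C_2\|\Psi_j\|_2.$$

Thus we have, using Young’s inequality, for any $\varepsilon > 0$,

$$\|\Psi_j\|_\infty \le C\varepsilon\|\nabla^2\Psi_j\|_2 + C\varepsilon^{-3}\|\Psi_j\|_2. \qquad (2.6)$$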
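The passage from the Gagliardo-Nirenberg bound to (2.6) is the weighted form of Young's inequality with conjugate exponents $4/3$ and $4$. A sketch of the step, writing $a = \|\nabla^2\Psi_j\|_2$ and $b = \|\Psi_j\|_2$ (these abbreviations are introduced here only for the computation):

```latex
\begin{align*}
a^{3/4}\, b^{1/4}
  = (\varepsilon a)^{3/4}\,(\varepsilon^{-3} b)^{1/4}
  \le \tfrac{3}{4}\,\varepsilon a + \tfrac{1}{4}\,\varepsilon^{-3} b,
  \qquad a, b \ge 0,\ \varepsilon > 0 .
\end{align*}
```

Multiplying by $C_1$ and adding $C_2 b$ gives a bound of the form $C\varepsilon a + C\varepsilon^{-3} b$ whenever $\varepsilon \le 1$, since then $C_2 \le C_2\varepsilon^{-3}$; small $\varepsilon$ is the case used when the first term is absorbed into the left-hand side of (2.5).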

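Combining this with (2.5), we are led to

$$\|\nabla^2\Psi_j\|_2^2 \le C.$$

Since we have $\|\Psi_j\|_{\mathcal{H}^2(\Omega)} \le C$ and $\|\mathbf{n}_j\|_{\mathbf{W}^{1,2}(\Omega)} \le C$, it follows, for a subsequence, still labelled $\{(\Psi_j, \mathbf{n}_j)\}$, that

$$\Psi_j \rightharpoonup \Psi_\infty \text{ in } \mathcal{H}^2(\Omega), \qquad \Psi_j \to \Psi_\infty \text{ in } \mathcal{W}^{1,4}(\Omega),$$
$$\mathbf{n}_j \rightharpoonup \mathbf{n}_\infty \text{ in } \mathbf{W}^{1,2}(\Omega), \qquad \mathbf{n}_j \to \mathbf{n}_\infty \text{ almost everywhere in } \Omega,$$

where $\mathbf{n}_\infty \in \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2)$. Thus $(\Psi_\infty, \mathbf{n}_\infty) \in \mathcal{A}$ and

$$F(\Psi_\infty, \mathbf{n}_\infty) \le \liminf_{j\to\infty} F(\Psi_j, \mathbf{n}_j) = \inf_{(\Psi, \mathbf{n}) \in \mathcal{A}} F(\Psi, \mathbf{n}).$$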

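3. The Effect of Large Frank Constants

We assume that

$$K_2 = K_3 \qquad (3.1)$$

to simplify the analysis. As in [1], the following analysis is still valid for sufficiently large $K_2$ and $K_3$ without the assumption (3.1). Now the energy (2.1) becomes

$$F(\Psi, \mathbf{n}) = D\int_\Omega \Big|D_{\mathbf{n}}^2\Psi + \frac{C_\perp}{2D}\Psi\Big|^2\,dx + C_\parallel\int_\Omega |\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2\,dx + \frac{g}{2}\int_\Omega |\Psi|^4\,dx + \Big(r - \frac{C_\perp^2}{4D}\Big)\int_\Omega |\Psi|^2\,dx$$
$$\quad + \int_\Omega \big[K_1(\nabla\cdot\mathbf{n})^2 + K_2|\nabla\times\mathbf{n} + \tau\mathbf{n}|^2\big]\,dx + (K_2 + K_4)\int_\Omega \big(|\nabla\mathbf{n}|^2 - |\nabla\times\mathbf{n}|^2 - (\nabla\cdot\mathbf{n})^2\big)\,dx. \qquad (3.2)$$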


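Lemma 3.1. Assume that the Frank constants satisfy (2.2) and (3.1), $K_2 \ge 4c_1$ and $\tau \le q_0$. There are positive constants $M_1$ and $M_2$, independent of $\tau$ and $K_2$, so that if $(\Psi_\infty, \mathbf{n}_\infty)$ is a minimizer for $F$ in $\mathcal{A}$ then

$$\|\nabla\mathbf{n}_\infty\|_{\mathbf{L}^2(\Omega)}^2 \le M_1|\Omega|, \qquad (3.3)$$
$$\|\nabla\times\mathbf{n}_\infty + \tau\mathbf{n}_\infty\|_{\mathbf{L}^2(\Omega)}^2 \le \frac{M_2}{K_2}|\Omega|. \qquad (3.4)$$

Furthermore, if $K_1 \ge 2c_1$ as well then there is a positive constant $M_3$ independent of $\tau$ such that $\mathbf{n}_\infty$ from a minimizer $(\Psi_\infty, \mathbf{n}_\infty)$ satisfies

$$\|\nabla\cdot\mathbf{n}_\infty\|_{L^2(\Omega)}^2 \le \frac{M_3}{K_1}|\Omega|. \qquad (3.5)$$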

Proof. Since ( ∞ , n∞ ) is a minimizer for F in A, 0 = F(0, nτ )

C⊥ ∞

2

D D2n∞ ∞ + |n∞ · Dn∞ ∞ |2 dx ≥ dx + C

2D ⎤ ⎡ C2 2 2 2 g 1 || C⊥ ∞ 2 ∞ ⊥ ⎣ −r −r + + F N (n )⎦ dx − | | − g 4D 2g 4D 2 2 2 || C⊥ −r + ≥− F N (n∞ ) dx. 2g 4D By the definition of F N from (3.2), we obtain (K 1 − K 2 − K 4 )(∇ · n∞ )2 + c0 |∇n∞ |2 + K 2 |∇ × n∞ + τ n∞ |2 dx

−(K 2 + K 4 )

|∇ × n∞ |2 dx ≤

C⊥ 2 −r 4D

2

|| . 2g

(3.6)

Since K 1 − K 2 − K 4 ≥ 0, and c1 > K 2 + K 4 , 2 2 || C⊥ ∞ 2 ∞ ∞ 2 ∞ 2 −r . c0 |∇n | + K 2 |∇ × n + τ n | − c1 |∇ × n | dx ≤ 4D 2g Hence using the assumption K 2 ≥ 4c1 and the elementary inequality, 2 (a − b)2 ≥ a2 − b2 , on K22 |∇ × n∞ + τ n∞ |2 , we get the first part of the lemma. The second part of the lemma follows from (3.6) by doing the same procedure with additional assumption K 1 ≥ 2c1 . Corollary 3.1. Let {( j , n j )} be a sequence of minimizers for F with Frank constants j j j {(K 2 , K 4 )} satisfying (2.2), (3.1) and such that lim j→∞ K 2 = ∞. Then there is a subsequence {( jk , n jk )}and a function n∞ ∈ W1,2 (, S2 ) so that n jk n∞ in W1,2 () as jk → ∞ where n∞ satisfies ∇ × n∞ + τ n∞ = 0 in . Now we need the following result from Lemma 3 in [1].


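Lemma 3.2. Let $\tau \neq 0$ and consider $\mathbf{n} \in \mathbf{W}^{1,2}(\Omega, \mathbb{S}^2)$ such that $\nabla\times\mathbf{n} + \tau\mathbf{n} = 0$ in $\Omega$. Then

$$\mathbf{n}(x) = Q\mathbf{n}_\tau(Q^t x) \equiv \tilde{\mathbf{n}}_\tau(x) \text{ for some } Q \in SO(3).$$

We now take the admissible set to be

$$\mathcal{A} \equiv \{(\Psi, \mathbf{n}) \in \mathcal{H}_0^2(\Omega) \times \mathbf{W}^{1,2}(\Omega; \mathbb{S}^2) : \mathbf{n}(x, y, \pm h)\cdot\mathbf{e}_3 = 0 \text{ for } (x, y, \pm h) \in \partial\Omega\}.$$

Next we define $SO(2) \times I$ to be the subgroup of $SO(3)$ representing rotations through the angle $\theta$ about the vector $\mathbf{e}_3$, i.e.,

$$SO(2) \times I = \left\{ Q \equiv \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \in SO(3) : 0 \le \theta < 2\pi \right\}.$$

In the following lemma, we show that in order for a rotation of $\mathbf{n}_\tau$ to be in the admissible set $\mathcal{A}$, the rotation should be in $SO(2) \times I$.

Lemma 3.3. Suppose $\tilde{\mathbf{n}}_\tau(x_1, x_2, \pm h)\cdot\mathbf{e}_3 = 0$ for $(x_1, x_2, \pm h) \in \partial\Omega$, where $\tilde{\mathbf{n}}_\tau = Q\mathbf{n}_\tau(Q^t x)$ for some $Q \in SO(3)$. Then $Q \in SO(2) \times I$.

Proof. Let $Q = (a_{ij})$. From the hypothesis,

$$a_{31}\cos\Big(\tau\sum_{k=1}^{3} a_{k3}x_k\Big) + a_{32}\sin\Big(\tau\sum_{k=1}^{3} a_{k3}x_k\Big) = 0 \text{ for any } (x_1, x_2, \pm h) \in \partial\Omega. \qquad (3.7)$$

If $a_{32} \neq 0$ then

$$-\frac{a_{31}}{a_{32}} = \tan\big(\tau(a_{13}x_1 + a_{23}x_2 \pm a_{33}h)\big) \text{ for any } (x_1, x_2, \pm h) \in \partial\Omega.$$

Because $a_{13}x_1 + a_{23}x_2$ is a constant for every $x_1, x_2$ on the top and the bottom plates, we have $a_{13} = a_{23} = 0$ and hence $a_{31} = a_{32} = 0$ from $QQ^t = I$. This is a contradiction. Also if $a_{32} = 0$ and $a_{31} \neq 0$, then by (3.7), we get $\cos(\tau\sum_{k=1}^{3} a_{k3}x_k) = 0$ for $(x_1, x_2, \pm h) \in \partial\Omega$, which leads to $a_{31} = 0$ by the same argument. Now with $a_{31} = a_{32} = 0$, since $Q \in SO(3)$ and $QQ^t = I$, we get $a_{33} = 1$ and $a_{13} = a_{23} = 0$. Then the upper left $2 \times 2$ submatrix of $Q$ must be in $SO(2)$.

The following theorem is our main goal in this section. We show that the director from a minimizer is close to a rotation of $\mathbf{n}_\tau$ about $\mathbf{e}_3$ in the $\mathbf{L}^4$ norm sense if $K_2$ is sufficiently large and $\tau$ is away from 0.

Theorem 3.1. Suppose there is $\tau_0 > 0$ such that $\tau_0 \le \tau \le q_0$ and $0 \le q \le q_0$ and $|r| \le r_0$. Then given $\varepsilon > 0$ there exists a constant $\Pi = \Pi(\varepsilon, \tau_0, q_0, r_0)$ so that if $K_2 \ge \Pi$ and $(\Psi, \mathbf{n})$ minimizes $F$ in $\mathcal{A}$ then

$$\|\mathbf{n}(x) - Q\mathbf{n}_\tau(Q^t x)\|_{4,\Omega} < \varepsilon$$

for some $Q \in SO(2) \times I$. Furthermore, if $K_1 \ge \Pi$ as well, then the above inequality holds for any $0 \le \tau \le q_0$.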
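The rotated directors appearing in Theorem 3.1 have a simple closed form: for $Q \in SO(2) \times I$ with angle $\theta$, $Q\mathbf{n}_\tau(Q^t x) = (\cos(\theta + \tau z), \sin(\theta + \tau z), 0)$, because $Q$ fixes $\mathbf{e}_3$ and so $Q^t x$ has the same $z$-coordinate as $x$. The following is a minimal numerical sketch of that identity, not part of the paper; the helper names and the sample values are hypothetical.

```python
import math

def n_tau(tau, x):
    """Helical director n_tau(x) = (cos(tau z), sin(tau z), 0)."""
    z = x[2]
    return (math.cos(tau * z), math.sin(tau * z), 0.0)

def rot_e3(theta):
    """Rotation through the angle theta about e3, i.e. an element of SO(2) x I."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def matvec(Q, v):
    return tuple(sum(Q[i][k] * v[k] for k in range(3)) for i in range(3))

def transpose(Q):
    return tuple(tuple(Q[k][i] for k in range(3)) for i in range(3))

def rotated_director(theta, tau, x):
    """Q n_tau(Q^t x) for the rotation Q about e3 with angle theta."""
    Q = rot_e3(theta)
    return matvec(Q, n_tau(tau, matvec(transpose(Q), x)))

# Hypothetical sample point and parameters.
theta, tau, x = 0.7, 1.3, (0.2, -0.5, 0.9)
n_rot = rotated_director(theta, tau, x)
```

The third component of `n_rot` is zero at every point, so these rotations satisfy the boundary condition $\mathbf{n}\cdot\mathbf{e}_3 = 0$ on the plates, in line with Lemma 3.3.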


Proof. Suppose not. Then there exist ε0 > 0 and sequences of minimizers {( j , n j )} j j j j j with {(K 1 , K 2 , K 4 , τ j , q j , r j )}, such that K 2 ≥ j (K 1 ≥ j as well, for the second part of the theorem) and

n j − n˜ τ 4, ≥ ε0 for any n˜ τ of the form n˜ τ (x) = Qnτ (Q t x) with Q ∈ S O(2) × I for each j = 1, 2, . . .. Note that we can take M1 , M2 , M3 to be uniform in τ, q and r for τ, q ≤ q0 and |r | ≤ r0 . Since ∇n j 22 ≤ M1 || and |n j | = 1, there exists a subsequence still labelled with j, {n j }, n∞ ∈ W1,2 (, S2 ), and τ∞ ∈ [τ0 , q0 ] such that τ j → τ∞ , n j n∞ in W1,2 (), n j → n∞ in L4 (), n j → n∞ in L2 (∂), n j → n∞ almost everywhere on ∂ as j → ∞. For the L2 convergence on the boundary, we used the estimate, from the proof of Theorem 1.5.1.10 in [11], |u|2 dσ ≤ C(){ u L 2 () ∇u L 2 () + u 2L 2 () } ∂

for all u ∈ W 1,2 (). From these limits and (3.4), ∇ × n∞ + τ∞ n∞ = 0 in . If τ∞ > 0, we see that n∞ (x) = Qnτ∞ (Q t x) for some Q ∈ S O(3) by Lemma 3.2. Also since n j (x, y, ±h) · e3 = 0 for (x, y, ±h) ∈ ∂ for each j, we have n∞ (x, y, ±h)· e3 = 0 for (x, y, ±h) ∈ ∂. From Lemma 3.3, n∞ (x) = Qnτ∞ (Q t x) with Q ∈ S O(2) × I. Since lim j→∞ n j = n∞ and lim j→∞ nτ j = nτ∞ in L4 () we see lim n j − n˜ τ j 4, = 0,

j→∞

(3.8)

where n˜ τ j (x) = Qnτ j (Q t x) and this is a contradiction. This proves the first part of the theorem. For the second part of the theorem, we may have τ∞ = 0. Then from (3.4) and (3.5) j along with K 1 ≥ j, we have ∇ × n∞ = 0, ∇ · n∞ = 0 and |n∞ | = 1. Thus we have n∞ = ∇ρ locally, where ρ = 0 and |∇ρ| = 1. Then we derive that n∞ is a constant by a direct calculation. In fact, we have for each j, ∂i ρ ∂i j ρ = 0 by differentiating |∇ρ| = 1 and then taking the divergence gives (∂i j ρ)2 = 0 with repeated indices. Since n∞ (x, y, ±h) · e3 = 0 for (x, y, ±h) ∈ ∂, we can find Q ∈ S O(2) × I such that n∞ = Qn0 (Q t x). Again we have (3.8) and this is a contradiction.


4. Analysis for $Dq\tau \gg C_\perp$

4.1. The chiral nematic phase. In Sects. 4, 5, and 6, $C_\perp$ is always positive. In this section we characterize $\bar{r}$ in terms of $q\tau$, above which the liquid crystal is in the chiral nematic phase. Since $\Psi \equiv 0$ corresponds to the nematic phase which has locally uniform mass density everywhere, we will show that the smectic order parameter $\Psi \equiv 0$ if $r \ge \bar{r}$. In fact, we prove that if the Frank constants are large, if $\tau \ll q$ and if $Dq\tau \gg C_\perp$ then there is a constant $c_4$ such that $\bar{r} = -c_4 D(q\tau)^2$. We already know that if $r \ge \frac{C_\perp^2}{4D}$ then the chiral nematic phase is stable. When $\tau = 0$, the curve separating the nematic and smectic C phase is $r = \frac{C_\perp^2}{4D}$ over $\mathbb{R}^3$. Thus we see that in a large domain, the chiral nematic regime extends below the transition temperature $T_{NC}$ between the nematic and smectic C phase due to the chirality. For the next lemma, we basically follow the proof of Lemma 5 in [1], where a covering argument inspired by [12] and eigenvalue estimates on discs for the Ginzburg-Landau system, developed in [2], are used.

Lemma 4.1. Suppose $q\tau$ is positive. There exist positive universal constants $c_2$ and $m$ such that if $q \ge c_2\tau$ then

$$\int_{\mathbb{R}^3} |(\nabla - iq\tilde{\mathbf{n}}_\tau)\Psi|^2\,dx \ge 8mq\tau\int_{\mathbb{R}^3} |\Psi|^2\,dx$$

for all $\Psi \in \mathcal{H}^1(\mathbb{R}^3)$ and all $\tilde{\mathbf{n}}_\tau$ of the form $\tilde{\mathbf{n}}_\tau(x) = Q\mathbf{n}_\tau(Q^t x)$ for some $Q \in SO(3)$.

Proof. For $\Psi \in \mathcal{H}^1(\mathbb{R}^3)$, we cover $\mathbb{R}^3$ with cylinders, $C(x)$, of radius $(q\tau)^{-1/2}$ and height $2(q\tau)^{-1/2}$, with axis parallel to $\nabla\times\tilde{\mathbf{n}}_\tau(x)$ and the center at $x$. We select a subcover $\{C(x_j)\}$ with at most $S$ cylinders overlapping at each point so that

$$\sum_{j=1}^{N}\int_{C(x_j)} |(\nabla - iq\tilde{\mathbf{n}}_\tau)\Psi|^2\,dx \le S\int_{\mathbb{R}^3} |(\nabla - iq\tilde{\mathbf{n}}_\tau)\Psi|^2\,dx.$$

Note that the subcover may be selected in such a way that $S$ is a universal number. We follow the proof of Lemma 5 in [1] to find

$$\int_{\mathbb{R}^3} |(\nabla - iq\tilde{\mathbf{n}}_\tau)\Psi|^2\,dx \ge \frac{q\tau\,\sigma(1)}{2S}\int_{\mathbb{R}^3} |\Psi|^2\,dx,$$

where $\sigma$ is a continuous real valued function from Proposition 2.7 in [12]. In particular, $\sigma(t) > 0$ for $t > 0$. We proved the lemma with $c_2 = \sigma(1)/16$ and $m = \sigma(1)/(16S)$, which are both universal constants.

Lemma 4.2. Let $c_2$, $m$, $q$ and $\tau$ be as in Lemma 4.1. Then there exists a constant $c_4$ such that if $C_\perp < mDq\tau$ then

$$\frac{D}{2}\int_\Omega |D_{\tilde{\mathbf{n}}_\tau}^2\Psi|^2\,dx - 2C_\perp\int_\Omega |D_{\tilde{\mathbf{n}}_\tau}\Psi|^2\,dx \ge 3c_4 Dq^2\tau^2\int_\Omega |\Psi|^2\,dx$$

for all $\Psi \in \mathcal{H}_0^2(\Omega)$ and all $\tilde{\mathbf{n}}_\tau$ of the form $\tilde{\mathbf{n}}_\tau(x) = Q\mathbf{n}_\tau(Q^t x)$ for some $Q \in SO(3)$.


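Proof. We may assume that $\|\Psi\|_2 \neq 0$. By integration by parts and Hölder’s inequality, we have

$$\int_\Omega |D_{\tilde{\mathbf{n}}_\tau}\Psi|^2\,dx = -\int_\Omega D_{\tilde{\mathbf{n}}_\tau}^2\Psi\cdot\Psi^*\,dx \le \|D_{\tilde{\mathbf{n}}_\tau}^2\Psi\|_2\,\|\Psi\|_2$$

for all $\Psi \in \mathcal{H}_0^2(\Omega)$. Then we have

$$\frac{D}{2}\int_\Omega |D_{\tilde{\mathbf{n}}_\tau}^2\Psi|^2\,dx - 2C_\perp\int_\Omega |D_{\tilde{\mathbf{n}}_\tau}\Psi|^2\,dx \ge \|D_{\tilde{\mathbf{n}}_\tau}\Psi\|_2^2\left(\frac{D}{2}\,\frac{\|D_{\tilde{\mathbf{n}}_\tau}\Psi\|_2^2}{\|\Psi\|_2^2} - 2C_\perp\right).$$

From Lemma 4.1, the proof is complete with $3c_4 = 16m^2$.

Next we derive the estimate for $\bar{r}$ when $Dq\tau \gg C_\perp$, with the help of (3.5) and Theorem 3.1.

Lemma 4.3. Let $c_2$, $m$, $c_4$, $q$ and $\tau$ be as in Lemma 4.2. There exists a constant $\Pi_1 = \Pi_1(q, \Omega)$ so that if $(\Psi, \mathbf{n})$ is a minimizer for $F$ in $\mathcal{A}$ with $K_1, K_2 \ge \Pi_1$ and $c_4 Dq^2\tau^2 > -r$ then $\Psi = 0$.

Proof. Since $(\Psi, \mathbf{n})$ is a minimizer it satisfies the Euler equation

$$D\,D_{\mathbf{n}}^4\Psi + C_\perp D_{\mathbf{n}}^2\Psi - C_\parallel\{(\mathbf{n}\cdot D_{\mathbf{n}})^2\Psi + (\mathbf{n}\cdot D_{\mathbf{n}}\Psi)(\nabla\cdot\mathbf{n})\} = -(g|\Psi|^2 + r)\Psi$$

in the sense of $\mathcal{H}_0^2(\Omega)$. Multiplying the equation by the complex conjugate $\Psi^*$ and integrating by parts result in

$$F_0 \equiv \int_\Omega \big(D|D_{\mathbf{n}}^2\Psi|^2 - C_\perp|D_{\mathbf{n}}\Psi|^2 + C_\parallel|\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2\big)\,dx = \int_\Omega \big(-r|\Psi|^2 - g|\Psi|^4\big)\,dx \le -r\int_\Omega |\Psi|^2\,dx. \qquad (4.1)$$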

Using integration by parts,

C⊥

2

dx + C⊥ D D2n + |Dn |2 dx ≤ D

C2 −r + ⊥ D

||2 dx.

The second term in the left-hand side gives |∇|2 dx ≤ C1 ||2 dx

(4.2)

(4.3)

since −r < c4 Dq 2 τ 2 < c4 Dq 4 . Here C1 is a constant depending on D, C⊥ , and q. Also from (4.2), |D2n |2 dx ≤ C ||2 dx.

D2n

= − 2iqn · ∇ − iq∇ · n − q 2 and using (2.6), Using the definition of (3.3), and (4.3), for any ε > 0,

2 ≤ C ( 2 + ∇ 2 + ∞ ∇ · n 2 ) ≤ C 2 + C M1 || ε ∇ 2 2 + ε−3 2


for some constant C depending on D, C⊥ and q. Taking ε small such that εC M1 || < 1/2, and using integration by parts, we have

∇ 2 22 = 22 ≤ C2 22 ,

(4.4)

where C2 is a constant depending on q, C⊥ , and D. From the definitions of D2n and Dn , we have |D2n˜ τ + 2iq(n˜ τ − n) · ∇ − iq∇ · n|2 dx F0 ≥ D −C⊥ |Dn˜ τ + iq(n˜ τ − n)|2 dx.

By Hölder’s inequality, we get F0 ≥

D 2

Dn˜ τ 22 − 2C⊥ Dn˜ τ 22 − 8Dq 2 ∇ 24 n˜ τ − n 24 2 −2Dq 2 2∞ ∇ · n 22 − 2C⊥ q 2 24 n˜ τ − n 24 .

In order to control terms with 4 and ∇ 4 we use the Sobolev imbedding theorem [14]. Using (2.6) and Lemma 4.2, F0 ≥ 3c4 Dq 2 τ 2 22 − C0 n˜ τ − n 24 + ∇ · n 22 22 . Here, we have used (4.3), and (4.4) and C0 is a constant depending on q, D, C⊥ and 2 . Now apply Theorem 3.1 with ε satisfying C0 ε2 = c4 D mC⊥D . Then there exists Π = Π (ε, q, c4 Dq 4 ) such that if K 1 , K 2 ≥ Π then n˜ τ − n 4 < ε. If M3 || m D 2 K 1 , K 2 ≥ max Π, 4c1 , C0 := Π1 (q, ) c4 D C⊥ then by applying (3.5), we get −r 22 ≥ F0 ≥ c4 Dq 2 τ 2 22 , combining with (4.1). Since c4 Dq 2 τ 2 > −r , we see ≡ 0 in .

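4.2. The smectic C* phase. In this section we establish the characterization for $\underline{r}$ when $Dq\tau > C_\perp$. For certain values of $q\tau$ and with $K_1$, $K_2$ and $K_3$ large, we show that if $r \le \underline{r}$, then $\Psi \not\equiv 0$, i.e., the material is not in the chiral nematic phase. We find that the estimates $\underline{r}$ and $\bar{r}$ are equal up to a multiplicative constant.

Lemma 4.4. Let $\tau \ge R^{-1}$, $q\tau \ge h^{-2}$, and $\tau < q$. Then there exists a constant $c_5 > 0$ depending on $\sigma = C_\parallel/(4Dq^2)$ so that for each $Q \in SO(2) \times I$ there exists $\tilde{\Psi} \in \mathcal{H}_0^2(\Omega)$ for which

$$\int_\Omega \big(D|D_{\tilde{\mathbf{n}}_\tau}^2\tilde{\Psi}|^2 + C_\parallel|\tilde{\mathbf{n}}_\tau\cdot D_{\tilde{\mathbf{n}}_\tau}\tilde{\Psi}|^2\big)\,dx < \frac{c_5}{4}D(q\tau)^2\int_\Omega |\tilde{\Psi}|^2\,dx.$$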


Proof. Denoting the left-hand side of the desired inequality by G, we get ˜ 2 + (C + 8Dq 2 )|n˜ τ · ∇ ˜ + q 2 | ˜ − iq | ˜ 2 ) dx. G ≤ (2D|

Recall that, for some 0 ≤ θ < 2π, n˜ τ (x) = (cos(θ + τ z), sin(θ + τ z), 0)t . Writing cos(θ + τ z) = cos θ − τ z sin θ − τ z cos θ −

(τ z)2 2

(τ z)2 2

(4.5)

cos(θ + τ ξ ), sin(θ + τ z) = sin θ +

sin(θ + τ η) for some |ξ |, |η| ≤ |z|, we find G ≤ I1 + I2 + I3 ,

where the integrals I1 , I2 and I3 are given by ˜ 2 dx, ˜ + q 2 | | I1 = 2D 2 ˜ x + (sin θ + τ z cos θ ) ˜ y − iq | ˜ 2 dx, I2 = 8Dq (σ + 2) |(cos θ − τ z sin θ ) ˜ x |2 + | ˜ y |2 ) dx. I3 = 4Dq 2 (σ + 2) τ 4 z 4 (|

˜ = f · exp{iq(x cos θ + y sin θ )}, we have Letting | f + 2iq(cos θ f x + sin θ f y )|2 dx, I1 = 2D I2 = 8Dq 2 (σ + 2) |(cos θ − τ z sin θ ) f x + (sin θ + τ z cos θ ) f y |2 dx, 2 I3 = 4Dq (σ + 2) τ 4 z 4 (| f x + iq cos θ f |2 + | f y + iq sin θ f |2 ) dx.

We define two real cut-off functions $f_1(x, y)$, $f_2(z)$ supported in $B_{R_0}(0)$ and $(-h_0, h_0)$, respectively, with $f_1 \equiv 1$ in $B_{R_0/2}(0)$ and $f_2 \equiv 1$ in $(-h_0/2, h_0/2)$, where $R_0 = 1/\tau$ and $h_0 = 1/\sqrt{q\tau}$, so that

$$|\nabla f_1| \le C\tau, \quad |\nabla^2 f_1| \le C\tau^2, \quad |(f_2)_z| \le C\sqrt{q\tau}, \quad |(f_2)_{zz}| \le Cq\tau.$$

Set $f = f_1 f_2$ and $\Omega_0 = B_{R_0} \times (-h_0, h_0)$. From the bounds for the derivatives of $f_1$ and $f_2$ and the fact that $\tau < q$ we can estimate $I_1$, $I_2$, and $I_3$, so as to obtain

$$G \le CDq^2\tau^2|\Omega_0|,$$

where $C$ denotes a constant depending only on $\sigma$ in this proof. Since $\int_\Omega |\tilde{\Psi}|^2\,dx = C|\Omega_0|$ for some constant $C$, the lemma is proved.

Lemma 4.5. There exists a constant Π2 = Π2 (q, ) so that if q ≥ τ ≥ R −1 , qτ ≥ h −2 , K 1 , K 2 ≥ Π2 , Dqτ > C⊥ and if (0, n ) minimizes F in A then 2 ˜ 2 2 2 2 ˜ ˜ ˜ 2 dx (4.6) (D|Dn | − C⊥ |Dn | + C |n · Dn | ) dx < c5 D(qτ ) ||

˜ ∈ for some

H02 ().


Proof. Following the proof of Lemma 9 of [1], we find that if 4c1 ≤ K 2 and if 2c1 ≤ K 1 , then

K1 K2 (∇ · n )2 + c0 |∇n |2 + |∇ × n + τ n |2 dx ≤ 2c1 τ 2 ||. 2 2

(4.7)

˜ Now as in Theorem 3.1, given ε > 0, there is a constant Π˜ = Π(ε, q, ) so that if ˜ be τ ≤ q and K 1 , K 2 ≥ Π˜ then n − n˜ τ 4; < ε for some Q ∈ S O(2) × I. Let associated with Q as in Lemma 4.4. Denote the left-hand side of (4.6) by F . We have ˜ 22 + 2C n˜ τ · Dn˜ τ

˜ 22 + 3Dq 2 (∇ · n )

˜ 22 F ≤ 2D D2n˜ τ

˜ 22 + 2C (n − n˜ τ ) · ∇

˜ 22 + 12Dq 2 (n − n˜ τ ) · ∇

c5 ˜ 22 + 3Dq 2

˜ 2∞ ∇ · n 22 + C Dq 2 ∇

˜ 24 n − n˜ τ 24 D(qτ )2

≤ 2 ≡ I1 + I2 + I3 .

˜ as in Lemma 4.4. Since || ˜ ≤ 1 and by (4.7), Estimate I2 ≤ C Dq 2

τ 2 || || C Dq 4 || q 3 ˜ 22 ≤ 0 ˜ 22 ,

≤ C0 Dq 2 τ 2 K1 K 1 |0 | K1 π

where |0 | = π τ −2 (qτ )−1/2 ≥ π/q 3 is used and 1 √ ˜ 22 I3 ≤ C Dq 2 (τ 2 + qτ + q 2 ) ε2 |0 | 2 ≤ C1 Dq 5 q ε2

5 for some constant C, C0 and C1 depending # only on σ . Choose$ε small so that C1 Dq 2 2 7 √ 2 c C ˜ 4c1 , C0 4D q 2 || we obtain the desired q ε = 54D⊥ . Then setting Π2 = max Π, πc C

inequality (4.6).

5 ⊥

Theorem 4.1. We suppose that $\tau \ge R^{-1}$ and $q\tau \ge h^{-2}$. Then there are universal positive constants $c_2$, $m$, $c_4$, a constant $c_5$ depending only on $\sigma$, and a constant $K$ depending on $q$, $\Omega$, $D$, and $C_\perp$ such that if $mDq\tau > C_\perp$, $q \ge c_2\tau$, and $K_1, K_2 \ge K$ and if $(\Psi, \mathbf{n})$ is a minimizer for $F$ in $\mathcal{A}$, then the following two statements hold:

1) $r > \bar{r}_C := -c_4 D(q\tau)^2$ implies $\Psi \equiv 0$, and
2) $r < \underline{r}_C := -c_5 D(q\tau)^2$ implies $\Psi \not\equiv 0$.

Proof. Set $K = \max(\Pi_1, \Pi_2)$ where the constants $\Pi_1$, $\Pi_2$ are from Lemma 4.3 and Lemma 4.5, respectively. The constants $c_2$, $m$, $c_4$, and $c_5$ are from the previous lemmas. The first statement follows from Lemma 4.3. The second statement will follow from the contrapositive argument given in Theorem 3 of [1], applying Lemma 4.5.


5. Analysis for $Dq\tau \ll C_\perp$

5.1. The chiral nematic phase. For the estimate of $\bar{r}$ when $Dq\tau \gg C_\perp$, we were able to treat the terms in the smectic energy separately, since the term multiplied by $C_\perp$ was dominated by the gradient term with coefficient $D$. In this section we have the opposite case: we have to deal with these terms simultaneously, along with the term with the coefficient $C_\parallel$. To get $\bar{r}$ for the region $Dq\tau \ll C_\perp$ we need several lemmas.

Lemma 5.1. Let $A$ and $b$ be given positive constants. Then there exists a universal constant $C > 0$ such that
\[
\int_{\mathbb{R}} |f'' + Af|^2 + b^4x^4|f|^2\,dx \ge C A^{2/3} b^{4/3} \int_{\mathbb{R}} |f|^2\,dx, \tag{5.1}
\]
\[
\int_{\mathbb{R}} |f'' + Af|^2 + b^2x^2|f|^2\,dx \ge C\sqrt{A}\,b \int_{\mathbb{R}} |f|^2\,dx \tag{5.2}
\]
for all $f \in H^2(\mathbb{R})$. Furthermore, if we have
\[
0 < l_1 \le \frac{b^2A}{a^6} \le l_2 < \infty \tag{5.3}
\]
for some positive constants $l_1$ and $l_2$, then there exists a constant $C = C(l_1, l_2) > 0$ such that
\[
\int_{\mathbb{R}} |b^2f'' + Af|^2 + (x^2 - a^2)^2|f|^2\,dx \ge C\sqrt{A}\,b\,a \int_{\mathbb{R}} |f|^2\,dx \tag{5.4}
\]

for all $f \in H^2(\mathbb{R})$.

Proof. Let $y = \sqrt{A}\,x$ and $g(y) = f(x)$. Then (5.1) is equivalent to
\[
\int_{\mathbb{R}} |g'' + g|^2 + \left(\frac{b}{A}\right)^4 y^4|g|^2\,dy \ge C\left(\frac{b}{A}\right)^{4/3} \int_{\mathbb{R}} |g|^2\,dy.
\]
Denoting $\tau^3 = (b/A)^4$ and using the change of variables $z = \sqrt{\tau}\,y$, $h(z) = g(y)$, it is equivalent to
\[
\int_{\mathbb{R}} \frac{1}{\tau}|\tau h'' + h|^2 + z^4|h|^2\,dz \ge C \int_{\mathbb{R}} |h|^2\,dz.
\]
By taking account of the Fourier transform, it suffices to prove that
\[
\int_{\mathbb{R}} \frac{1}{\tau}(\tau x^2 - 1)^2|f|^2 + |f''|^2\,dx \ge C \int_{\mathbb{R}} |f|^2\,dx \tag{5.5}
\]
for all $f \in H^2(\mathbb{R})$. We may assume that $\int_{\mathbb{R}} |f|^2\,dx = 1$. We suppose that this inequality is false, i.e., there exists $\{C_n\}$ with $\lim_{n\to\infty} C_n = 0$, for which we can find $\tau_n$ and $f_n \in H^2(\mathbb{R})$ with $\|f_n\|_2 = 1$ for each $n = 1, 2, \ldots$ such that
\[
\int_{\mathbb{R}} \tau_n(x^2 - \theta_n^2)^2|f_n|^2 + |f_n''|^2\,dx \le C_n, \tag{5.6}
\]
where $\theta_n^2 = \tau_n^{-1}$. We can see from the second integral that $\|f_n\|_{H^2(\mathbb{R})} \le C$ and hence $\|f_n\|_\infty \le C$. We also have
\[
\tau_n(x^2 - \theta_n^2)^2 \ge \tau_n\theta_n^2 \min\{(x - \theta_n)^2, (x + \theta_n)^2\} = \min\{(x - \theta_n)^2, (x + \theta_n)^2\}.
\]
Then
\[
\int_{\mathbb{R}} \min\{(x - \theta_n)^2, (x + \theta_n)^2\}|f_n|^2\,dx \longrightarrow 0 \quad \text{as } n \to \infty.
\]
This implies that the mass of $|f_n|^2$ must concentrate near $\pm\theta_n$, which contradicts the fact that $f_n$ is uniformly bounded. The second inequality can be proved similarly, since (5.2) is equivalent to
\[
\int_{\mathbb{R}} \frac{1}{\tau}(\tau x^2 - 1)^2|f|^2 + |f'|^2\,dx \ge C \int_{\mathbb{R}} |f|^2\,dx.
\]
We also see that the last inequality (5.4) is equivalent to
\[
\int_{\mathbb{R}} \left(\frac{b^2}{a^2}x^2 - A\right)^2|f|^2 + a^4|f'' + f|^2\,dx \ge C\sqrt{A}\,ab \int_{\mathbb{R}} |f|^2\,dx.
\]
Note that by (5.3), setting $\theta^2 = Aa^2/b^2$, it is enough to prove
\[
\int_{\mathbb{R}} \sqrt{l_1}\,\min\{(x - \theta)^2, (x + \theta)^2\}|f|^2 + l_2^{-1/2}|f'' + f|^2\,dx \ge C \int_{\mathbb{R}} |f|^2\,dx
\]
for all $f \in H^2(\mathbb{R})$. The same process as above leads to the conclusion.
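The contradiction argument in the proof of Lemma 5.1 rests on the pointwise bound $\tau_n(x^2 - \theta_n^2)^2 \ge \min\{(x - \theta_n)^2, (x + \theta_n)^2\}$ when $\tau_n\theta_n^2 = 1$. The following is a minimal numerical sanity check of that bound on a grid; it is not part of the original proof, and the function name, the grid and the tolerance are illustrative assumptions.

```python
import math

def double_well_bound_holds(tau, half_width=10.0, steps=4001, tol=1e-12):
    """Grid check of tau*(x^2 - theta^2)^2 >= min((x-theta)^2, (x+theta)^2), theta^2 = 1/tau.

    half_width, steps and tol are illustrative choices, not taken from the paper.
    """
    theta = 1.0 / math.sqrt(tau)
    for k in range(steps):
        x = -half_width + 2.0 * half_width * k / (steps - 1)
        lhs = tau * (x * x - theta * theta) ** 2
        rhs = min((x - theta) ** 2, (x + theta) ** 2)
        # Equality occurs at x = 0 and x = +-theta, so allow a rounding margin.
        if lhs < rhs - tol * (1.0 + rhs):
            return False
    return True

results = [double_well_bound_holds(tau) for tau in (0.01, 0.5, 1.0, 7.0, 300.0)]
print(results)
```

The bound follows from $(x^2 - \theta^2)^2 = (x - \theta)^2(x + \theta)^2$ together with $\max\{|x - \theta|, |x + \theta|\} \ge \theta$; the grid check only illustrates it.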

Corollary 5.1. Let $f \in H^2(\mathbb{R})$ and $a > 0$, $A > 0$ and $b > 0$ be given. Suppose there is a positive constant $k_1$ such that $k_1 b\sqrt{A} \le a^3$. Then
\[
\int_{\mathbb{R}} |b^2f'' + Af|^2 + (x^2 - a^2)^2|f|^2\,dx \ge C\sqrt{A}\,b\,a\left(1 - \frac{c\sqrt{A}\,b}{a^3} - \frac{cb^3}{a^5\sqrt{A}}\right) \int_{\mathbb{R}} |f|^2\,dx,
\]
where $C$ and $c$ are positive constants depending on $k_1$.

" Proof. We may assume that f ∈ Cc∞ (R) by density argument and R | f |2 d x = 1. Let {φ j }, j = 1, 2, 3 be a partition of unity on the support of f , subordinate to the covering (−∞, 0), (0, ∞), (− a2 , a2 ). Note that |φ j | ≤ Ca , |φ j | ≤ aC2 for j = 1, 2, 3. The left hand side of the desired estimate is bounded below by 3

j=1 R

|b2 f φ j + A f φ j |2 + (x 2 − a 2 )2 | f φ j |2 d x


! because 0 ≤ φ j ≤ 1 and 3j=1 φ j = 1. We can find a constant c such that 2 2 |b f φ j + A f φ j | d x = |b2 ( f φ j ) − 2b2 f φ j − b2 f φ j + A f φ j |2 d x j

R

≥ j

j

R

1 2 cb4 | |b ( f φ j ) + A f φ j |2 d x − 2 a R 2 R

( f φ j ) | 2 d x − j

cb4 a4

! writing f = ( j f φ j ) . Using integration by parts, we see that |b2 f φ j + A f φ j |2 d x j

R

≥ j

2

2 c b2 c Ab2 cb4

f φ j

d x − 2 − 4 . c b ( f φ j ) + A + 2 a a a R

Hence we see that |b2 f + A f |2 + (x 2 − a 2 )2 | f |2 d x ≥ R

Fj − c j

where F j :=

Ab2 b4 + 4 a2 a

,

(5.7)

c |b2 ( f φ j ) + A f φ j |2 + (x 2 − a 2 )2 | f φ j |2 d x. R

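Applying (5.2) to $F_1$, using a change of variables if necessary, we get
\[
F_1 \ge \int_{\mathbb{R}} c|b^2(f\varphi_1)'' + Af\varphi_1|^2 + a^2(x + a)^2|f\varphi_1|^2\,dx \ge C\sqrt{A}\,b\,a \int_{\mathbb{R}} |f\varphi_1|^2\,dx.
\]
We may have a similar result for $F_2$. For $f\varphi_3$ supported in $(-\frac{a}{2}, \frac{a}{2})$,
\[
F_3 \ge \int_{\mathbb{R}} (x^2 - a^2)^2|f\varphi_3|^2\,dx \ge \frac{9}{16}a^4 \int_{\mathbb{R}} |f\varphi_3|^2\,dx \ge \frac{9k_1}{16}\,b\,a\sqrt{A} \int_{\mathbb{R}} |f\varphi_3|^2\,dx.
\]
Thus the proof of the lemma is complete from (5.7).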
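The last step of the proof of Corollary 5.1 uses two elementary lower bounds on the double-well weight $(x^2 - a^2)^2$: it dominates $a^2(x + a)^2$ on $(-\infty, 0]$ and it is at least $\frac{9}{16}a^4$ on $[-\frac{a}{2}, \frac{a}{2}]$. A small grid check of both bounds is sketched below; the function name, the sampled range $[-5a, 0]$ and the tolerance are assumptions made only for this illustration.

```python
def corollary_weight_bounds_hold(a, steps=2001, tol=1e-12):
    """Grid check of the two pointwise bounds on (x^2 - a^2)^2 used for F_1 and F_3.

    The left half line is truncated to [-5a, 0]; this truncation is an
    illustrative assumption, not something stated in the paper.
    """
    for k in range(steps):
        t = k / (steps - 1)
        # Bound used for F_1: (x^2 - a^2)^2 >= a^2 (x + a)^2 for x <= 0.
        x = -5.0 * a * (1.0 - t)
        lhs = (x * x - a * a) ** 2
        if lhs < a * a * (x + a) ** 2 - tol * (1.0 + lhs):
            return False
        # Bound used for F_3: (x^2 - a^2)^2 >= (9/16) a^4 for |x| <= a/2.
        x = -0.5 * a + a * t
        lhs = (x * x - a * a) ** 2
        if lhs < 9.0 / 16.0 * a ** 4 - tol * (1.0 + lhs):
            return False
    return True

weight_checks = [corollary_weight_bounds_hold(a) for a in (0.1, 1.0, 3.0, 25.0)]
print(weight_checks)
```

Both bounds are sharp (equality holds at $x = 0$ and at $x = \pm\frac{a}{2}$ respectively), which is why the check carries a small rounding margin.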

Now we prove the lemma which plays an important role in obtaining $\bar{r}$ when $Dq\tau \ll C_\perp$. Lemma 5.1 and Corollary 5.1 will be used to prove it.

Lemma 5.2. Suppose $\sigma_\perp < 1.5$. There exist positive constants $d_1 < 1$ and $d_2$ depending only on $\sigma_\parallel$ such that if $b < d_1\sigma_\perp$ then
\[
\int_{\mathbb{R}} |g'' + (\sigma_\perp - \alpha^2 - \beta^2 - 2\alpha)g|^2 + 4\sigma_\parallel(\alpha\cos bz + \beta\sin bz - 1 + \cos bz)^2|g|^2\,dz \ge d_2\,\sigma_\perp^{2/3} b^{4/3} \int_{\mathbb{R}} |g|^2\,dz \tag{5.8}
\]
for all $\alpha, \beta \in \mathbb{R}$ and for all $g \in H^2(\mathbb{R})$.

Proof. We may assume that $g \in C_c^\infty(\mathbb{R})$ by density and $\int_{\mathbb{R}} |g|^2\,dz = 1$. Let $F$ be the left-hand side of (5.8). Setting $(\alpha + 1, \beta) = s(\cos\theta, \sin\theta)$ for $s \ge 0$, we get
\[
F = \int_{\mathbb{R}} |g'' + (\sigma_\perp + 1 - s^2)g|^2 + 4\sigma_\parallel\big(s\cos(bz - \theta) - 1\big)^2|g|^2\,dz.
\]
By a change of variable, it suffices to prove that there exists a constant $d_2$ such that
\[
F = \int_{\mathbb{R}} |b^2f'' + (\sigma_\perp + 1 - s^2)f|^2 + 4\sigma_\parallel\big(s\cos x - 1\big)^2|f|^2\,dx \ge d_2\,\sigma_\perp^{2/3} b^{4/3} \tag{5.9}
\]
for all $f \in H^2(\mathbb{R})$ with $\|f\|_2 = 1$. Using integration by parts, the estimate (5.9) is obvious for $s^2 \ge 1 + \sigma_\perp + (\sigma_\perp b^2)^{1/3}$. Now let $s^2 < 1 + \sigma_\perp + (\sigma_\perp b^2)^{1/3}$.

First we claim that there is a constant $d_2$ such that $F \ge 2d_2\,\sigma_\perp^{2/3} b^{4/3}$ for any $f \in H_0^2(2n\pi - \frac{\pi}{3}, 2n\pi + \frac{\pi}{3})$ or $f \in H_0^2(2n\pi + x_0, 2(n+1)\pi - x_0)$, where $x_0 = \cos^{-1}(\frac{1}{\sqrt{3}})$ and $n$ is an integer. We can easily prove the claim for $f \in H_0^2(2n\pi + x_0, 2(n+1)\pi - x_0)$ as follows. If $s < \sqrt{3}$, then on the interval $[2n\pi + x_0, 2(n+1)\pi - x_0]$, $(s\cos x - 1)^2$ has a minimum at $x = 2n\pi + x_0$ and $2(n+1)\pi - x_0$ for each integer $n$. Thus
\[
(s\cos x - 1)^2 \ge \frac{1}{3}(s - \sqrt{3})^2
\]
whenever $s < \sqrt{3}$. Since $s^2 < 1 + \sigma_\perp + (\sigma_\perp b^2)^{1/3} < 2.5 + 1.5\,d_1^{2/3} < 2.7$ for small enough $d_1$, we see that $(s\cos x - 1)^2 \ge \frac{1}{3}(s - \sqrt{3})^2$ is bounded below by a positive constant if $d_1$ is sufficiently small.

Next we consider five cases for $f \in H_0^2(2n\pi - \frac{\pi}{3}, 2n\pi + \frac{\pi}{3})$. We may assume that $s \ge \frac{1}{2}$; otherwise we have $(s\cos x - 1)^2 \ge \frac{1}{4}$. Throughout this proof, $C$ and $c$ denote constants depending only on $\sigma_\parallel$.

Case 1. $s^2 \le 1 + a_1(\sigma_\perp b^2)^{1/3}$, where $a_1$ is yet to be determined. By Taylor's theorem,
\[
(s\cos x - 1)^2 = \left(s - 1 - \frac{s\cos\gamma}{2}(x - 2n\pi)^2\right)^2 \tag{5.10}
\]

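for some $\gamma \in (-\frac{\pi}{3} + 2n\pi, \frac{\pi}{3} + 2n\pi)$. Note that $\cos\gamma \ge \frac{1}{2}$. From $s \ge \frac{1}{2}$, we obtain
\[
\begin{aligned}
(s\cos x - 1)^2 &\ge \frac{1}{2}\left(\frac{s\cos\gamma}{2}(x - 2n\pi)^2 + 1 - s + \frac{a_1(\sigma_\perp b^2)^{1/3}}{1 + s}\right)^2 - \frac{a_1^2(\sigma_\perp b^2)^{2/3}}{(1 + s)^2} \\
&\ge \frac{s^2}{32}(x - 2n\pi)^4 - \frac{a_1^2(\sigma_\perp b^2)^{2/3}}{(1 + s)^2} \ge \frac{1}{128}(x - 2n\pi)^4 - a_1^2(\sigma_\perp b^2)^{2/3}.
\end{aligned}
\]
Then from (5.9),
\[
F \ge \frac{1}{2}\int_{\mathbb{R}} |b^2f'' + (1 + \sigma_\perp - s^2 + a_1(\sigma_\perp b^2)^{1/3})f|^2\,dx + \frac{\sigma_\parallel}{32}\int_{\mathbb{R}} (x - 2n\pi)^4|f|^2\,dx - (4\sigma_\parallel + 1)a_1^2(\sigma_\perp b^2)^{2/3}.
\]
Now we apply (5.1). Using a change of variables,
\[
\begin{aligned}
F &\ge C\left(1 + \sigma_\perp - s^2 + a_1(\sigma_\perp b^2)^{1/3}\right)^{2/3}\left(b\,\sigma_\parallel^{1/4}\right)^{4/3} - (4\sigma_\parallel + 1)a_1^2(\sigma_\perp b^2)^{2/3} \\
&\ge \left(C\sigma_\parallel^{1/3} - (4\sigma_\parallel + 1)a_1^2\right)(\sigma_\perp b^2)^{2/3}.
\end{aligned}
\]
Then we obtain the claim if we choose $a_1$ small enough so that $a_1^2 < \dfrac{C\sigma_\parallel^{1/3}}{2(4\sigma_\parallel + 1)}$.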

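Case 2. $1 + \sigma_\perp - a_2(\sigma_\perp b^2)^{1/3} \le s^2 < 1 + \sigma_\perp + (\sigma_\perp b^2)^{1/3}$, where $a_2$ is to be determined. Then using a change of variable if necessary and from (5.10), we get
\[
\begin{aligned}
F &\ge \int_{\mathbb{R}} |b^2f'' + (1 + \sigma_\perp - s^2)f|^2 + 4\sigma_\parallel\left(s - 1 - \frac{s\cos\gamma}{2}x^2\right)^2|f|^2\,dx \\
&\ge \int_{\mathbb{R}} \frac{b^4}{2}|f''|^2 + 4\sigma_\parallel\left(s - 1 - \frac{s\cos\gamma}{2}x^2\right)^2|f|^2\,dx - a_2^2(\sigma_\perp b^2)^{2/3}.
\end{aligned}
\]
We may assume that $1 < s < 2$ provided $d_1$ is sufficiently small, from the assumption. Now notice that
\[
3(s - 1) \ge (s - 1)(s + 1) = s^2 - 1 \ge \sigma_\perp - a_2(b^2\sigma_\perp)^{1/3} \ge \sigma_\perp(1 - a_2 d_1^{2/3}).
\]
Let $h$ be the Fourier transform of $f$. Then we have from (5.1),
\[
\begin{aligned}
F &\ge \frac{1}{2\pi}\int_{\mathbb{R}} \frac{b^4}{2}y^4|h|^2 + 4\sigma_\parallel\left|\frac{s\cos\gamma}{2}h_{yy} + (s - 1)h\right|^2 dy - a_2^2(b^2\sigma_\perp)^{2/3} \\
&\ge C(s - 1)^{2/3}b^{4/3} - a_2^2(b^2\sigma_\perp)^{2/3} \ge (\sigma_\perp b^2)^{2/3}\left(C(1 - a_2 d_1^{2/3})^{2/3} - a_2^2\right).
\end{aligned}
\]
We choose $a_2$ such that $4a_2^2 < C$ and then choose $d_1$ such that $2a_2 d_1^{2/3} < 1$.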

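Case 3. $1 + a_1(\sigma_\perp b^2)^{1/3} < s^2 \le 1 + M(\sigma_\perp b^2)^{1/3}$, where $M$ is to be determined. By (5.10) and a change of variables,
\[
F \ge \int_{\mathbb{R}} |b^2f'' + (1 + \sigma_\perp - s^2)f|^2 + \left(\sqrt{\sigma_\parallel}\,s\cos\gamma\,x^2 - 2\sqrt{\sigma_\parallel}(s - 1)\right)^2|f|^2\,dx. \tag{5.11}
\]
Since we have, for sufficiently small $d_1$,
\[
s < \sqrt{2}, \qquad \frac{1}{3}a_1(\sigma_\perp b^2)^{1/3} \le s - 1 \le M(\sigma_\perp b^2)^{1/3}, \qquad \frac{\sigma_\perp}{2} \le 1 + \sigma_\perp - s^2 < \sigma_\perp,
\]
we can see that the assumption for (5.4) is satisfied. In fact,
\[
cM^{-3} \le \frac{b^2(1 + \sigma_\perp - s^2)}{(s - 1)^3} \le ca_1^{-3}.
\]
Thus we get
\[
F \ge Cb\sqrt{1 + \sigma_\perp - s^2}\,\sqrt{s - 1} \ge C(b^2\sigma_\perp)^{2/3}. \tag{5.12}
\]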

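Case 4. $1 + M(\sigma_\perp b^2)^{1/3} < s^2 \le 1 + \frac{\sigma_\perp}{2} - a_2(\sigma_\perp b^2)^{1/3}$. For sufficiently small $d_1$, we have
\[
\frac{1}{3}M(\sigma_\perp b^2)^{1/3} \le s - 1 \le \sigma_\perp, \qquad s < 2, \qquad \frac{\sigma_\perp}{2} \le 1 + \sigma_\perp - s^2 < \sigma_\perp.
\]
Now we apply Corollary 5.1 with (5.11). Since
\[
\frac{b\sqrt{1 + \sigma_\perp - s^2}}{(s - 1)^{3/2}} < cM^{-3/2} \quad \text{and} \quad \frac{b^3}{(s - 1)^{5/2}\sqrt{1 + \sigma_\perp - s^2}} < cM^{-5/2}d_1^{4/3},
\]
having $M$ large enough, we obtain
\[
F \ge Cb\sqrt{1 + \sigma_\perp - s^2}\,\sqrt{s - 1}\left(1 - cM^{-3/2} - cM^{-5/2}\right) \ge C(b^2\sigma_\perp)^{2/3}.
\]

Case 5. $1 + \frac{\sigma_\perp}{2} - a_2(\sigma_\perp b^2)^{1/3} < s^2 < 1 + \sigma_\perp - a_2(\sigma_\perp b^2)^{1/3}$. We can apply Corollary 5.1 with (5.11), since $k_1$ depends only on $\sigma_\parallel$ and $a_2$, using $s - 1 \ge \frac{s^2 - 1}{3} \ge \frac{\sigma_\perp}{12}$ for sufficiently small $d_1$. Also, since
\[
\frac{b\sqrt{1 + \sigma_\perp - s^2}}{(s - 1)^{3/2}} < cd_1 \quad \text{and} \quad \frac{b^3}{(s - 1)^{5/2}\sqrt{1 + \sigma_\perp - s^2}} < cd_1^{8/3},
\]
having $d_1$ small enough, we get
\[
F \ge Cb\sqrt{1 + \sigma_\perp - s^2}\,\sqrt{s - 1}\left(1 - cd_1^{8/3} - cd_1\right) \ge C\sqrt{a_2}\,(b^2\sigma_\perp)^{2/3}.
\]
Thus we proved the claim for $f \in H_0^2(2n\pi - \frac{\pi}{3}, 2n\pi + \frac{\pi}{3})$ for each integer $n$.

Now we consider $f \in C_c^\infty(\mathbb{R})$. We find a partition of unity $\{\varphi_j\}$ subject to $\{(2n\pi - \frac{\pi}{3}, 2n\pi + \frac{\pi}{3})\}$ and $\{(2n\pi + x_0, 2(n+1)\pi - x_0)\}$, where $x_0 = \cos^{-1}(\frac{1}{\sqrt{3}})$. Note that $x_0 < \frac{\pi}{3}$. From the claim, we have the estimate (5.8) for $f\varphi_j$ for any $j$. As in the proof of Corollary 5.1, we derive
\[
F \ge \sum_j \int_{\mathbb{R}} |b^2(f\varphi_j)'' + (\sigma_\perp + 1 - s^2)f\varphi_j|^2 + 4\sigma_\parallel(s\cos x - 1)^2|f\varphi_j|^2\,dx - C'b^2\sum_j \int_{\mathbb{R}} |f\varphi_j|^2\,dx.
\]
By the claim, we get
\[
F \ge \sum_j \left(2d_2(b^2\sigma_\perp)^{2/3} - C'b^2\right)\int_{\mathbb{R}} |f\varphi_j|^2\,dx \ge d_2(b^2\sigma_\perp)^{2/3}
\]
if $d_1$ is chosen such that $2d_2 - C'd_1^{2/3} > d_2$. The above inequality follows from the fact that only a finite number of $\varphi_j$'s are nonvanishing at every point in $\mathbb{R}$.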
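Two elementary facts carry the covering argument in the proof of Lemma 5.2: the bound $(s\cos x - 1)^2 \ge \frac{1}{3}(s - \sqrt{3})^2$ on $[x_0, 2\pi - x_0]$ for $0 \le s < \sqrt{3}$, and the overlap condition $x_0 < \frac{\pi}{3}$. The sketch below checks both numerically for a few sample values of $s$ (taking $n = 0$ by periodicity); the function name, the sample values and the grid are illustrative assumptions.

```python
import math

# x0 = arccos(1/sqrt(3)), as in the proof of Lemma 5.2.
X0 = math.acos(1.0 / math.sqrt(3.0))

def far_interval_bound_holds(s, steps=2001, tol=1e-12):
    """Grid check of (s cos x - 1)^2 >= (s - sqrt(3))^2 / 3 on [x0, 2 pi - x0]."""
    lower = (s - math.sqrt(3.0)) ** 2 / 3.0
    for k in range(steps):
        x = X0 + (2.0 * math.pi - 2.0 * X0) * k / (steps - 1)
        # Equality holds at both endpoints, hence the rounding margin.
        if (s * math.cos(x) - 1.0) ** 2 < lower - tol:
            return False
    return True

# The two families of intervals overlap only if x0 < pi/3.
covering_overlaps = X0 < math.pi / 3.0
far_checks = [far_interval_bound_holds(s) for s in (0.0, 0.5, 1.0, 1.3, math.sqrt(2.7))]
print(covering_overlaps, far_checks)
```

The largest sample, $s = \sqrt{2.7}$, corresponds to the bound $s^2 < 2.7$ used in the proof.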

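Lemma 5.3. Suppose $\sigma_\perp \equiv \dfrac{C_\perp}{2Dq^2} < 1.5$ and $\sigma_\parallel \equiv \dfrac{C_\parallel}{4Dq^2} \ge 2$. Then there is a positive constant $c_6$ depending only on $\sigma_\parallel$ such that if $Dq\tau < \frac{1}{2}d_1C_\perp$, where $d_1$ is from Lemma 5.2, then
\[
\int_\Omega D|D^2_{\tilde{\mathbf{n}}_\tau}\Psi|^2 - C_\perp|D_{\tilde{\mathbf{n}}_\tau}\Psi|^2 + C_\parallel|\tilde{\mathbf{n}}_\tau\cdot D_{\tilde{\mathbf{n}}_\tau}\Psi|^2\,dx \ge \left(-\frac{C_\perp^2}{4D} + 4c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}\right)\int_\Omega |\Psi|^2\,dx \tag{5.13}
\]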

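for all $\Psi \in H_0^2(\Omega)$ and all $\tilde{\mathbf{n}}_\tau$ of the form $\tilde{\mathbf{n}}_\tau(\mathbf{x}) = Q\mathbf{n}_\tau(Q^t\mathbf{x})$ for some $Q \in SO(2) \times I$.

Proof. We may assume that $\int_\Omega |\Psi|^2\,dx = 1$. Denoting the left-hand side of (5.13) by $F_1$, we find
\[
\begin{aligned}
F_1 + \frac{C_\perp^2}{4D} &= D\int_\Omega \left|D^2_{\tilde{\mathbf{n}}_\tau}\Psi + \frac{C_\perp}{2D}\Psi\right|^2 dx + C_\parallel\int_\Omega |\tilde{\mathbf{n}}_\tau\cdot D_{\tilde{\mathbf{n}}_\tau}\Psi|^2\,dx \\
&\ge \int_\Omega \frac{D}{2}\left|\Delta\Psi + q^2\Psi + \frac{C_\perp}{2D}\Psi\right|^2 + (C_\parallel - 4Dq^2)|\tilde{\mathbf{n}}_\tau\cdot\nabla\Psi - iq\Psi|^2\,dx.
\end{aligned}
\]
Let $f = \Psi\cdot\exp[-iq\{(\cos\theta)x + (\sin\theta)y\}]$, where $\theta$ is given in (4.5). Then
\[
\begin{aligned}
F_1 + \frac{C_\perp^2}{4D} &\ge \frac{D}{2}\int_\Omega \left|\Delta f + 2iq(f_x\cos\theta + f_y\sin\theta) + \frac{C_\perp}{2D}f\right|^2 dx \\
&\quad + \frac{C_\parallel}{2}\int_\Omega |\cos(\theta + \tau z)f_x + \sin(\theta + \tau z)f_y + iq(\cos\tau z - 1)f|^2\,dx.
\end{aligned}
\]
Now we define the dimensionless units, i.e., we let
\[
h(u, v, w) = f(x, y, z), \quad \sigma_\parallel = \frac{C_\parallel}{4Dq^2}, \quad \sigma_\perp = \frac{C_\perp}{2Dq^2}, \quad b = \frac{\tau}{q}, \quad u = qx,\ v = qy,\ w = qz, \quad H = qh, \quad \bar\Omega = q\Omega.
\]
Then
\[
\begin{aligned}
F_1 + \frac{C_\perp^2}{4D} &\ge \frac{Dq^4}{2}\int_{\bar\Omega} |\Delta h + 2i(h_u\cos\theta + h_v\sin\theta) + \sigma_\perp h|^2\,\frac{du\,dv\,dw}{q^3} \\
&\quad + 2Dq^4\sigma_\parallel\int_{\bar\Omega} |\cos(\theta + bw)h_u + \sin(\theta + bw)h_v + i(\cos bw - 1)h|^2\,\frac{du\,dv\,dw}{q^3}.
\end{aligned}
\]
Let $g$ be the Fourier transform of $h$ in $u$ and $v$. Then
\[
\begin{aligned}
F_1 + \frac{C_\perp^2}{4D} &\ge \frac{Dq^4}{8\pi^2q^3}\int_{\mathbb{R}^3} |g_{ww} + (\sigma_\perp - \zeta^2 - \eta^2 - 2\zeta\cos\theta - 2\eta\sin\theta)g|^2\,dw\,d\zeta\,d\eta \\
&\quad + \frac{2\sigma_\parallel Dq^4}{4\pi^2q^3}\int_{\mathbb{R}^3} \big((\zeta\cos\theta + \eta\sin\theta)\cos bw + (-\zeta\sin\theta + \eta\cos\theta)\sin bw + \cos bw - 1\big)^2|g|^2\,dw\,d\zeta\,d\eta.
\end{aligned}
\]
We define $(\alpha, \beta)$ by a rotation of $(\zeta, \eta)$ by an angle $\theta$, i.e.,
\[
\alpha = \zeta\cos\theta + \eta\sin\theta \quad \text{and} \quad \beta = -\zeta\sin\theta + \eta\cos\theta.
\]
Then
\[
\begin{aligned}
F_1 + \frac{C_\perp^2}{4D} &\ge \frac{Dq^4}{8\pi^2q^3}\int_{\mathbb{R}^3} |g_{ww} + (\sigma_\perp - \alpha^2 - \beta^2 - 2\alpha)g|^2\,dw\,d\alpha\,d\beta \\
&\quad + \frac{2\sigma_\parallel Dq^4}{4\pi^2q^3}\int_{\mathbb{R}^3} (\alpha\cos bw + \beta\sin bw + \cos bw - 1)^2|g|^2\,dw\,d\alpha\,d\beta.
\end{aligned}
\]
Now we apply Lemma 5.2. Then
\[
F_1 + \frac{C_\perp^2}{4D} \ge \frac{Dq^4}{8\pi^2q^3}\,d_2\,\sigma_\perp^{2/3}b^{4/3}\int_{\mathbb{R}^3} |g|^2\,d\alpha\,d\beta\,dw = \left(\frac{1}{2}\right)^{5/3}d_2\,D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}\int_\Omega |\Psi|^2\,dx.
\]
Thus we proved the lemma with $4c_6 = (1/2)^{5/3}d_2$.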
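The bookkeeping at the end of the proof of Lemma 5.3 can be checked numerically. The sketch below verifies, for sample parameter values, that $\frac{Dq^4}{2}\sigma_\perp^{2/3}b^{4/3}$ equals $(\frac{1}{2})^{5/3}D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}$ when $\sigma_\perp = C_\perp/(2Dq^2)$ and $b = \tau/q$, and that the rotation $(\zeta, \eta) \mapsto (\alpha, \beta)$ turns $\zeta^2 + \eta^2 + 2\zeta\cos\theta + 2\eta\sin\theta$ into $\alpha^2 + \beta^2 + 2\alpha$. The function names and sample values are assumptions made for illustration; the prefactor $\frac{Dq^4}{2}$ assumes the Parseval normalization $\int |g|^2\,d\alpha\,d\beta = 4\pi^2\int |h|^2\,du\,dv$, which matches the $8\pi^2$ in the displayed estimate.

```python
import math

def scaling_sides(D, q, tau, C_perp):
    """Return both sides of the identity behind 4*c6 = (1/2)^(5/3) * d2 (with d2 factored out)."""
    sigma_perp = C_perp / (2.0 * D * q * q)
    b = tau / q
    lhs = 0.5 * D * q ** 4 * sigma_perp ** (2.0 / 3.0) * b ** (4.0 / 3.0)
    rhs = (0.5 ** (5.0 / 3.0) * D ** (1.0 / 3.0)
           * C_perp ** (2.0 / 3.0) * (q * tau) ** (4.0 / 3.0))
    return lhs, rhs

def symbol_before_and_after_rotation(zeta, eta, theta):
    """Return the quadratic symbol in (zeta, eta) and in the rotated variables (alpha, beta)."""
    alpha = zeta * math.cos(theta) + eta * math.sin(theta)
    beta = -zeta * math.sin(theta) + eta * math.cos(theta)
    original = zeta ** 2 + eta ** 2 + 2.0 * zeta * math.cos(theta) + 2.0 * eta * math.sin(theta)
    rotated = alpha ** 2 + beta ** 2 + 2.0 * alpha
    return original, rotated

scaling_ok = all(
    math.isclose(*scaling_sides(D, q, tau, C), rel_tol=1e-12)
    for (D, q, tau, C) in [(1.0, 1.0, 1.0, 1.0), (0.3, 7.0, 0.02, 2.5), (12.0, 0.4, 3.0, 0.9)]
)
rotation_ok = all(
    math.isclose(*symbol_before_and_after_rotation(z, e, t), rel_tol=1e-12, abs_tol=1e-12)
    for (z, e, t) in [(1.0, 2.0, 0.3), (-0.7, 0.1, 2.0), (3.0, -4.0, -1.1)]
)
print(scaling_ok, rotation_ok)
```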

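Finally we are able to characterize $\bar{r}$.

Lemma 5.4. Let $q$, $\tau$, $d_1$ and $c_6$ be as in Lemma 5.3. Suppose $\tau \ge \tau_0$ for some positive number $\tau_0$. Then there exists a constant $\Pi_3 = \Pi_3(q, \Omega)$ so that if $(\Psi, \mathbf{n})$ is a minimizer to $\mathcal{F}$ in $\mathcal{A}$ with $K_1, K_2 \ge \Pi_3$ and
\[
-\frac{C_\perp^2}{4D} + c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} > -r
\]
then $\Psi \equiv 0$.

Proof. Let $(\Psi, \mathbf{n})$ be a minimizer. Then as in Lemma 4.3, (4.1), (4.2), (4.3), and (4.4) follow from
\[
-r < -\frac{C_\perp^2}{4D} + c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} < -\frac{C_\perp^2}{4D} + c_6D^{1/3}C_\perp^{2/3}q^{8/3}.
\]
We get
\[
\begin{aligned}
F_0 &:= \int_\Omega \left(D|D^2_{\mathbf{n}}\Psi|^2 - C_\perp|D_{\mathbf{n}}\Psi|^2 + C_\parallel|\mathbf{n}\cdot D_{\mathbf{n}}\Psi|^2\right) dx \\
&= D\int_\Omega \left|D^2_{\tilde{\mathbf{n}}_\tau}\Psi + \frac{C_\perp}{2D}\Psi + 2iq(\tilde{\mathbf{n}}_\tau - \mathbf{n})\cdot\nabla\Psi - iq(\nabla\cdot\mathbf{n})\Psi\right|^2 dx \\
&\quad + C_\parallel\int_\Omega |\tilde{\mathbf{n}}_\tau\cdot D_{\tilde{\mathbf{n}}_\tau}\Psi + (\mathbf{n} - \tilde{\mathbf{n}}_\tau)\cdot\nabla\Psi|^2\,dx - \frac{C_\perp^2}{4D}\int_\Omega |\Psi|^2\,dx.
\end{aligned}
\]
Now using Lemma 5.3,
\[
F_0 \ge \left(2c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} - \frac{C_\perp^2}{4D}\right)\|\Psi\|_2^2 - (8Dq^2 + C_\parallel)\|\nabla\Psi\|_4^2\|\tilde{\mathbf{n}}_\tau - \mathbf{n}\|_4^2 - 2Dq^2\|\Psi\|_\infty^2\|\nabla\cdot\mathbf{n}\|_2^2. \tag{5.14}
\]
The same proof as in Lemma 4.3 can be applied to obtain
\[
F_0 \ge \left(c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} - \frac{C_\perp^2}{4D}\right)\int_\Omega |\Psi|^2\,dx.
\]
Note that when we apply Theorem 3.1, we need to take $\varepsilon$ depending on $\tau_0$. Combining this with (4.1), we have
\[
\left(-\frac{C_\perp^2}{4D} + c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}\right)\int_\Omega |\Psi|^2\,dx \le -r\int_\Omega |\Psi|^2\,dx.
\]
Since we have $-\frac{C_\perp^2}{4D} + c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} > -r$, we see that $\Psi \equiv 0$.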

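If we impose a more restrictive condition on the director, then the same result can be obtained as in Lemma 5.4 without the assumption $\tau \ge \tau_0 > 0$. For now take the admissible set
\[
\mathcal{A} \equiv \{(\Psi, \mathbf{n}) \in H_0^2(\Omega) \times W^{1,2}(\Omega; S^2) : \mathbf{n}(x, y, h) = Q\mathbf{n}_\tau(Q^t\mathbf{x})(x, y, h) \text{ for some } Q \in SO(2) \times I,\ \mathbf{n}(x, y, -h)\cdot\mathbf{e}_3 = 0 \text{ for } (x, y, -h) \in \partial\Omega\}.
\]

Lemma 5.5. Let $q$, $\tau$, $d_1$ and $c_6$ be as in Lemma 5.3. Then there exists a constant $\Pi_3 = \Pi_3(q, \Omega)$ so that if $(\Psi, \mathbf{n})$ is a minimizer to $\mathcal{F}$ in $\mathcal{A}$ with $K_1, K_2 \ge \Pi_3$ and
\[
-\frac{C_\perp^2}{4D} + c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} > -r
\]
then $\Psi \equiv 0$.

Proof. Using the notation from the proof of Lemma 5.4, from (5.14), we get a constant $C$ depending on $D$, $C_\perp$, $q$, $\sigma_\parallel$ and $\Omega$ such that
\[
-r \ge 2c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} - \frac{C_\perp^2}{4D} - C\|\tilde{\mathbf{n}}_\tau - \mathbf{n}\|_4^2 - C\|\nabla\cdot\mathbf{n}\|_2^2.
\]
Together with (3.6), we have
\[
\tilde{r} \equiv \frac{C_\perp^2}{4D} - r > 2c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} - C\|\tilde{\mathbf{n}}_\tau - \mathbf{n}\|_4^2 - \frac{C}{K_1}\left(\frac{C_\perp^2}{4D} - r\right)^2 \equiv I_1 - I_2 - I_3.
\]
By hypothesis, $\tilde{r} < c(q\tau)^{4/3}$, where $c = c_6D^{1/3}C_\perp^{2/3}$. Hence, if $K_1 \ge 2Cc(q\tau)^{4/3}$, we have
\[
I_3 = \frac{C}{K_1}\tilde{r}^2 \le \frac{Cc^2}{K_1}(q\tau)^{8/3} \le \frac{c}{2}(q\tau)^{4/3}.
\]
We use the Poincaré inequality on $\mathbf{n} - \tilde{\mathbf{n}}_\tau$, which vanishes on a portion of the boundary. Then we control $I_2$ by (3.6),
\[
\begin{aligned}
I_2 &\le C\|\nabla(\tilde{\mathbf{n}}_\tau - \mathbf{n})\|_2^{4/3}\|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|_4^{2/3} \le C(\tau^{4/3} + \tilde{r}^{4/3})\|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|_4^{2/3} \\
&\le C(\tau^{4/3} + (q\tau)^{16/9})\|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|_4^{2/3} \le (q\tau)^{4/3}\,C\|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|_4^{2/3},
\end{aligned}
\]
where $C$ is a constant independent of $\tau$. By Theorem 3.1 with $\varepsilon$ such that $C\varepsilon^{2/3} < \frac{c_6}{2}$, there is a constant such that $\|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|_4 < \varepsilon$ if $K_1$ and $K_2$ are at least that constant. Thus we have
\[
I_2 \le \frac{c}{2}(q\tau)^{4/3},
\]
and then $\frac{C_\perp^2}{4D} - r > c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}$. Again we see $\Psi \equiv 0$.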

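5.2. The smectic C* phase.

Lemma 5.6. Let $\tau \ge \sqrt{(qD)/(C_\perp R^3)}$, $q\tau \ge h^{-3}\sqrt{D/C_\perp}$ and $C_\perp < 3Dq^2$. Then there exists a constant $c_7 > 0$ depending only on $\sigma_\parallel$ so that for each $Q \in SO(2) \times I$, if $Dq\tau < C_\perp$ then there exists $\tilde\Psi \in H_0^2(\Omega)$ for which
\[
\int_\Omega \left(D|D^2_{\tilde{\mathbf{n}}_\tau}\tilde\Psi|^2 - C_\perp|D_{\tilde{\mathbf{n}}_\tau}\tilde\Psi|^2 + C_\parallel|\tilde{\mathbf{n}}_\tau\cdot D_{\tilde{\mathbf{n}}_\tau}\tilde\Psi|^2\right) dx < \left(-\frac{C_\perp^2}{4D} + \frac{c_7}{4}D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}\right)\int_\Omega |\tilde\Psi|^2\,dx.
\]

Proof. Let
\[
G_0 = \int_\Omega D\left|D^2_{\tilde{\mathbf{n}}_\tau}\tilde\Psi + \frac{C_\perp}{2D}\tilde\Psi\right|^2 + C_\parallel|\tilde{\mathbf{n}}_\tau\cdot D_{\tilde{\mathbf{n}}_\tau}\tilde\Psi|^2\,dx.
\]
Then as in Lemma 4.4, $G_0 \le I_1 + I_2 + I_3$, where
\[
I_1 = 2D\int_\Omega \left|\Delta\tilde\Psi + q^2\tilde\Psi + \frac{C_\perp}{2D}\tilde\Psi\right|^2 dx
\]
and $I_2$, $I_3$ are the same as in Lemma 4.4. We let
\[
\tilde\Psi = f\cdot\exp\left\{iq(x\cos\theta + y\sin\theta) + i\sqrt{\frac{C_\perp}{2D}}\,(-x\sin\theta + y\cos\theta)\right\}
\]
and define real cut-off functions $f_1(x, y)$ and $f_2(z)$ supported in $B_{R_0}$ and $(-h_0, h_0)$, respectively, with
\[
R_0 = \left(\frac{Dq}{C_\perp\tau^2}\right)^{1/3} \quad \text{and} \quad h_0 = \left(\frac{D}{C_\perp q^2\tau^2}\right)^{1/6}.
\]
Let $\Omega_0 = B_{R_0} \times (-h_0, h_0)$. Now define $f = f_1f_2$. Then as in Lemma 4.4, we may derive
\[
\begin{aligned}
\int_\Omega D|D^2_{\tilde{\mathbf{n}}_\tau}\tilde\Psi|^2 - C_\perp|D_{\tilde{\mathbf{n}}_\tau}\tilde\Psi|^2 + C_\parallel|\tilde{\mathbf{n}}_\tau\cdot D_{\tilde{\mathbf{n}}_\tau}\tilde\Psi|^2\,dx &= G_0 - \frac{C_\perp^2}{4D}\int_\Omega |\tilde\Psi|^2\,dx \\
&\le \left(\frac{c_7}{4}D^{1/3}C_\perp^{2/3}(q\tau)^{4/3} - \frac{C_\perp^2}{4D}\right)\int_\Omega |\tilde\Psi|^2\,dx.
\end{aligned}
\]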

Just as in Sect. 4, we can characterize the transition curve for Dqτ < c3 C⊥ , where c3 = d1 /2 was found in Lemma 5.2.


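Theorem 5.1. We suppose that
\[
\sigma_\parallel \ge 2, \qquad \frac{C_\perp}{2Dq^2} < 1.5, \qquad \sqrt{\frac{Dq}{C_\perp R^3}} \le \tau, \qquad \text{and} \qquad \sqrt{\frac{D}{C_\perp h^6}} \le q\tau.
\]
Then there are positive constants $c_3$, $c_6$, and $c_7$ depending only on $\sigma_\parallel$ and a constant $K$ depending on $q$, $\Omega$, $D$, and $C_\perp$ such that if $Dq\tau < c_3C_\perp$ and $K_1, K_2 \ge K$, and if $(\Psi, \mathbf{n})$ is a minimizer for $\mathcal{F}$ in $\mathcal{A}$, then we have the following two statements:

1) $r > \bar{r}_C := \dfrac{C_\perp^2}{4D} - c_6D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}$ implies $\Psi \equiv 0$, and
2) $r < \underline{r}_C := \dfrac{C_\perp^2}{4D} - c_7D^{1/3}C_\perp^{2/3}(q\tau)^{4/3}$ implies $\Psi \not\equiv 0$.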

6. Analysis for $Dq\tau \propto C_\perp$

In this section we estimate the phase transition region in the $r$–$q\tau$ plane for
\[
c_3C_\perp \le Dq\tau \le \frac{C_\perp}{m}, \tag{6.1}
\]
where $m$ is from Lemma 4.1 and $c_3$ is from Theorem 5.1. This is the case remaining from the analyses in Sects. 4 and 5. We note that in terms of $b = \tau/q$ and $\sigma_\perp$, the inequalities (6.1) are equivalent to
\[
2c_3\sigma_\perp \le b \le \frac{2\sigma_\perp}{m}. \tag{6.2}
\]

Lemma 6.1. Given $\sigma_\parallel > 0$ and $\sigma_\perp$, $b$ satisfying (6.2), there exist constants $c_8, c_9 > 0$ depending only on $\sigma_\parallel$ so that if $b < c_8$ then
\[
\int_{\mathbb{R}} |g'' + (\sigma_\perp - \alpha^2 - \beta^2 - 2\alpha)g|^2\,dz + 4\sigma_\parallel\int_{\mathbb{R}} (\alpha\cos bz + \beta\sin bz - 1 + \cos bz)^2|g|^2\,dz \ge c_9b^2\int_{\mathbb{R}} |g|^2\,dz
\]
for all $\alpha, \beta \in \mathbb{R}$ and for all $g \in H^2(\mathbb{R})$.

Proof. Using (6.2) we can assume that $c_8$ is sufficiently small so that $\sigma_\perp \le \frac{1}{8}$. We begin by arguing as in Lemma 5.2. It suffices to prove that there exists a constant $c_9 > 0$ so that
\[
F = \int_{\mathbb{R}} \left(|b^2f'' + (\sigma_\perp + 1 - s^2)f|^2 + 4\sigma_\parallel(s\cos x - 1)^2|f|^2\right) dx \ge c_9b^2
\]

for all $s \ge 0$ and $f \in H^2(\mathbb{R})$ with $\|f\|_2 = 1$. Just as in Lemma 5.2 the estimate is straightforward for $0 \le s^2 \le 1 - \sigma_\perp$ and $1 + 2\sigma_\perp \le s^2$. Using the periodicity of the coefficients, it then suffices to find $c_9 > 0$ so that
\[
F_0 = \int_{-\pi}^{\pi} \left(|b^2f'' + (\sigma_\perp + 1 - s^2)f|^2 + 4\sigma_\parallel(s\cos x - 1)^2|f|^2\right) dx \ge c_9b^2 \tag{6.3}
\]
for all $1 - \sigma_\perp \le s^2 \le 1 + 2\sigma_\perp$, $f \in H^2(-\pi, \pi)$ with $\|f\|_2 = 1$, and all $b$ sufficiently small. For $s \ge 1$ let $x(s) \in [0, \frac{\pi}{2})$ be such that $s\cos x(s) - 1 = 0$, and set $x(s) = 0$ for $0 \le s \le 1$. It follows that there is a constant $C > 0$ so that
\[
(x^2 - x(s)^2)^2 \le C(s\cos x - 1)^2
\]
for $-\pi \le x \le \pi$ and $0 \le s^2 \le \frac{5}{4}$. If (6.3) fails to hold, there exist sequences $b_n$, $s_n$, $(\sigma_\perp)_n$, $f_n$ such that $b_n > 0$, $\lim_{n\to\infty} b_n = 0$, $\lim_{n\to\infty} s_n = 1$, $\theta_n = ((\sigma_\perp)_n + 1 - s_n^2)/b_n \in [-1, 2]$, and $f_n \in H^2(-\pi, \pi)$ with $\|f_n\|_2 = 1$ for which
\[
\int_{-\pi}^{\pi} |b_nf_n'' + \theta_nf_n|^2\,dx + (b_n)^{-2}\int_{-\pi}^{\pi} (x^2 - x(s_n)^2)^2|f_n|^2\,dx \to 0 \tag{6.4}
\]
as $n \to \infty$. From the first integral and the fact that $\|f_n\|_2 = 1$ we have the local estimate
\[
b_n\int_{-\pi/2}^{\pi/2} |f_n'|^2\,dx \le C,
\]
where $C$ is independent of $n$. From this we obtain the Hölder estimate
\[
|f_n(x) - f_n(y)| \le C\left|\frac{x - y}{b_n}\right|^{1/2} \quad \text{for } x, y \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]. \tag{6.5}
\]
Now fix $0 < \delta < 1$ and set
\[
I_{\pm n} = \left(-\delta b_n^{1/2} \pm x(s_n),\ \delta b_n^{1/2} \pm x(s_n)\right). \tag{6.6}
\]
From the second integral in (6.4) and $\|f_n\|_2 = 1$ we find
\[
\lim_{n\to\infty}\int_{I_{+n}\cup I_{-n}} |f_n|^2\,dx = 1.
\]
Using this, (6.5), and (6.6) we see there is a constant $c_0 > 0$ such that for $\delta$ sufficiently small and all $n$ sufficiently large,
\[
|f_n|^2 \ge \frac{c_0}{(b_n)^{1/2}} \quad \text{on either } I_{+n} \text{ or } I_{-n}.
\]
It follows that
\[
\liminf_{n\to\infty}\,(b_n)^{-2}\int_{-\pi/2}^{\pi/2} (x^2 - x(s_n)^2)^2|f_n|^2\,dx > 0,
\]
and this contradicts (6.4).

With this lemma, the existence of curves $\underline{r}_C(\cdot)$ and $\bar{r}_C(\cdot)$, for $q\tau$ satisfying (6.1), follows just as in Lemmas 4.4, 5.3, 5.4 and Theorem 5.1.

Theorem 6.1. Given $\sigma_\parallel \ge 2$ we suppose that $R^{-1} \le \tau$, $h^{-2} \le q\tau$ and that $C_\perp$, $D$, $q$, and $\tau$ satisfy (6.1). Then there are positive constants $c_{10}$, $c_{11}$ and $c_{12}$ depending only on $\sigma_\parallel$ and a constant $\overset{\approx}{K}$ depending on $q$, $\Omega$, $D$, and $C_\perp$ such that if $c_{10}\tau \le q$ and $K_1, K_2 \ge \overset{\approx}{K}$, and if $(\Psi, \mathbf{n})$ is a minimizer for $\mathcal{F}$ in $\mathcal{A}$, then we have the following two statements:

1) $r > \bar{r}_C := \dfrac{C_\perp^2}{4D} - c_{11}D(q\tau)^2$ implies $\Psi \equiv 0$, and
2) $r < \underline{r}_C := \dfrac{C_\perp^2}{4D} - c_{12}D(q\tau)^2$ implies $\Psi \not\equiv 0$.
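Theorems 4.1, 5.1 and 6.1 split the parameter space according to the size of $Dq\tau$ relative to $C_\perp$. The sketch below, offered only as an illustration, selects the applicable theorem and returns the form of its transition curves; it also evaluates (6.1) and (6.2) separately so that their equivalence can be checked with $b = \tau/q$ and $\sigma_\perp = C_\perp/(2Dq^2)$. The numerical values passed for `m` and `c3` are hypothetical (the text gives no numbers for them), the function names are assumptions, and the remaining hypotheses of the theorems (on $\sigma_\parallel$, $R$, $h$, $K_1$, $K_2$) are not checked.

```python
def dimensionless(D, q, tau, C_perp):
    """Return (b, sigma_perp) with b = tau/q and sigma_perp = C_perp / (2 D q^2)."""
    return tau / q, C_perp / (2.0 * D * q * q)

def regime(D, q, tau, C_perp, m, c3):
    """Pick the theorem whose size condition on D*q*tau holds.

    m and c3 are hypothetical inputs; they should satisfy c3 < 1/m so that
    the three ranges do not overlap. Returns (name, base, scale): both
    transition curves have the form base - c * scale for a constant c.
    """
    x = D * q * tau
    if m * x > C_perp:
        return "Theorem 4.1", 0.0, D * (q * tau) ** 2
    base = C_perp ** 2 / (4.0 * D)
    if x < c3 * C_perp:
        scale = D ** (1.0 / 3.0) * C_perp ** (2.0 / 3.0) * (q * tau) ** (4.0 / 3.0)
        return "Theorem 5.1", base, scale
    return "Theorem 6.1", base, D * (q * tau) ** 2

def in_intermediate_range(D, q, tau, C_perp, m, c3):
    """Evaluate (6.1) and (6.2) independently so their agreement can be compared."""
    b, sigma_perp = dimensionless(D, q, tau, C_perp)
    by_61 = c3 * C_perp <= D * q * tau <= C_perp / m
    by_62 = 2.0 * c3 * sigma_perp <= b <= 2.0 * sigma_perp / m
    return by_61, by_62

# Hypothetical constants, chosen only to exercise the three branches.
M_HYP, C3_HYP = 0.5, 0.25
print(regime(1.0, 1.0, 1.0, 0.1, M_HYP, C3_HYP)[0])
print(regime(1.0, 1.0, 0.01, 1.0, M_HYP, C3_HYP)[0])
print(regime(1.0, 1.0, 1.0, 1.0, M_HYP, C3_HYP)[0])
```

Returning the pair `(base, scale)` rather than a threshold keeps the undetermined constants $c_4, \ldots, c_{12}$ out of the code.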

7. The Chiral Nematic to Smectic A* Phase Transition

In this section, we study the N*–SmA* phase transition based on the modified Chen-Lubensky energy with $C_\perp \le 0$. Since near the N*–SmA* phase transition only $K_2$ and $K_3$ can be assumed to be sufficiently large, we need to work on the estimate of the divergence of the director from the minimizer. Throughout this section, $C_\perp \le 0$ is assumed. The domain $\Omega$ in this section is assumed to have a smooth lateral side.

7.1. The chiral nematic phase

Lemma 7.1. Let $q\tau \ge h^{-2}$. There are universal constants $e_1$, $e_2$ and a constant $\Pi_5(q, \Omega)$ so that if $(\Psi, \mathbf{n})$ is a minimizer for $\mathcal{F}_A$ in $\mathcal{A}$ with $K_2 \ge \Pi_5$, $q \ge e_1\tau$ and
\[
e_2^2D(q\tau)^2 + e_2|C_\perp|q\tau > -r
\]
then $\Psi = 0$.

Proof. Following the proof of Lemma 4.3 it suffices to prove that there are positive constants $e_1$, $e_2$ such that if $(\Psi, \mathbf{n})$ is a minimizer for $\mathcal{F}_A$ in $\mathcal{A}$ with $K_2 \ge \Pi_5$, $q \ge e_1\tau$ then
\[
\int_\Omega D|D^2_{\mathbf{n}}\Psi|^2 + |C_\perp||D_{\mathbf{n}}\Psi|^2\,dx \ge \left(e_2^2D(q\tau)^2 + e_2|C_\perp|q\tau\right)\int_\Omega |\Psi|^2\,dx. \tag{7.1}
\]
As in the proof of Lemma 4.3, the inequalities (4.3) and (4.4) hold. Using Lemma 4.1 and the proof of Lemma 4.2 we see that there are positive constants $e_1$, $e_2$ such that if $q \ge e_1\tau$ then
\[
\int_\Omega D|D^2_{\tilde{\mathbf{n}}_\tau}\Psi|^2 + |C_\perp||D_{\tilde{\mathbf{n}}_\tau}\Psi|^2\,dx \ge \left(2e_2^2D(q\tau)^2 + 2e_2|C_\perp|q\tau\right)\int_\Omega |\Psi|^2\,dx \tag{7.2}
\]
for all $\Psi \in H_0^2(\Omega)$ and all $\tilde{\mathbf{n}}_\tau$. The only obstacle to following the proof of Lemma 4.3 in this case is that we cannot assume that $K_1$ is large enough. Thus, in order to estimate $\nabla\cdot\mathbf{n}$ we use a "change of gauge" method as in [1]. Let $\tilde{\mathbf{n}}_\tau$ be as in Theorem 3 and $u$ be a solution of
\[
\Delta u = \nabla\cdot\mathbf{n} \ \text{ in } \Omega, \qquad \frac{\partial u}{\partial\nu} = (\mathbf{n} - \tilde{\mathbf{n}}_\tau)\cdot\nu \ \text{ on } \partial\Omega, \tag{7.3}
\]

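where $\nu$ is the exterior normal to $\Omega$. Set $\Psi = e^{iqu}\Phi$. Then we obtain
\[
\begin{aligned}
|D^2_{\mathbf{n}}\Psi|^2 &= \left|D^2_{\tilde{\mathbf{n}}_\tau}\Phi + 2iq(\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau)\cdot\nabla\Phi + iq(\Delta u - \nabla\cdot\mathbf{n})\Phi - q^2|\nabla u|^2\Phi + 2q^2\,\mathbf{n}\cdot\nabla u\,\Phi\right|^2 \\
&\ge \frac{1}{2}|D^2_{\tilde{\mathbf{n}}_\tau}\Phi|^2 - 8q^2|(\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau)\cdot\nabla\Phi|^2 - 2q^4\left|2\mathbf{n}\cdot\nabla u - |\nabla u|^2\right|^2|\Phi|^2 \\
&=: I_1 - I_2 - I_3.
\end{aligned} \tag{7.4}
\]
Also we have
\[
|D_{\mathbf{n}}\Psi|^2 = |D_{\tilde{\mathbf{n}}_\tau}\Phi + iq(\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau)\Phi|^2 \ge \frac{1}{2}|D_{\tilde{\mathbf{n}}_\tau}\Phi|^2 - q^2|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^2|\Phi|^2 =: I_4 - I_5.
\]
As in the proof of Lemma 7 from [1], we need to estimate the term with $\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau$. However, since the domain for the elliptic problem (7.3) is not smooth, we need to use the reflection principle, by the even extension of the solution to (7.3). Using the extension in the neighborhood of the nonsmooth boundary part, we have the same estimate as we have for $u$ in a domain with smooth boundary. Hence we get
\[
\begin{aligned}
q^2\|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau\|^2_{L^4(\Omega)} &\le Cq^2\|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau\|^2_{W^{1,2}(\Omega)} \\
&\le C(\Omega)q^2\|\nabla\times(\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau)\|^2_{L^2(\Omega)} \\
&\le C(\Omega)q^2\left(\|\nabla\times(\mathbf{n} - \tilde{\mathbf{n}}_\tau) + \tau(\mathbf{n} - \tilde{\mathbf{n}}_\tau)\|^2_{L^2(\Omega)} + \tau^2\|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|^2_{L^2(\Omega)}\right) \\
&\le C(\Omega)q^2\left(\frac{1}{K_2} + \tau^2\|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|^2_{4;\Omega}\right).
\end{aligned} \tag{7.5}
\]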

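Now we see that $I_5$ can be easily controlled by Hölder's inequality and (7.5). Next we need an estimate on $\|\nabla\Phi\|_{L^4(\Omega)}$ in order to control $I_2$. In fact,
\[
\begin{aligned}
\|\nabla\Phi\|^2_{L^4(\Omega)} = \|\nabla\Psi - iq\nabla u\,\Psi\|^2_{L^4(\Omega)} &\le C\|\nabla\Psi\|^2_{L^4(\Omega)} + Cq^2\|\nabla u\|^2_{L^6(\Omega)}\|\Psi\|^2_{L^{12}(\Omega)} \\
&\le C\|\nabla^2\Psi\|^2_{L^2(\Omega)} + Cq^2\|\nabla^2u\|^2_{L^2(\Omega)}\|\Psi\|^2_{L^{12}(\Omega)},
\end{aligned}
\]
and then elliptic regularity, the Sobolev inequality and (3.3), (4.4) lead to
\[
\|\nabla\Phi\|^2_{L^4(\Omega)} \le C\|\Psi\|^2_{L^2(\Omega)} + Cq^2\left(\|\nabla\cdot\mathbf{n}\|^2_{L^2(\Omega)} + \|\mathbf{n} - \tilde{\mathbf{n}}_\tau\|^2_{H^1(\Omega)}\right)\|\Psi\|^2_{L^2(\Omega)} \le C(\Omega, q)\|\Psi\|^2_{L^2(\Omega)}.
\]
Now we have to estimate $I_3$. Note that, using $|\mathbf{n}| = |\tilde{\mathbf{n}}_\tau| = 1$,
\[
\begin{aligned}
\left|2\mathbf{n}\cdot\nabla u - |\nabla u|^2\right|^2 &= \left|2\mathbf{n}\cdot(\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau) + 2\mathbf{n}\cdot(\mathbf{n} - \tilde{\mathbf{n}}_\tau) - |\nabla u|^2\right|^2 \\
&\le 8|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^2 + 2\left|2\mathbf{n}\cdot(\mathbf{n} - \tilde{\mathbf{n}}_\tau) - |(\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau) + \mathbf{n} - \tilde{\mathbf{n}}_\tau|^2\right|^2 \\
&\le 32|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^2 + 6|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^4 + 6\left|2\mathbf{n}\cdot(\mathbf{n} - \tilde{\mathbf{n}}_\tau) - |\mathbf{n} - \tilde{\mathbf{n}}_\tau|^2\right|^2 \\
&\le 32|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^2 + 6|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^4.
\end{aligned}
\]
Combining with the Sobolev inequality and (4.3) leads to
\[
\begin{aligned}
I_3 &\le Cq^4\int_\Omega \left(|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^2 + |\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau|^4\right)|\Phi|^2\,dx \\
&\le Cq^4\left(\|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau\|^2_{W^{1,2}(\Omega)} + \|\nabla u - \mathbf{n} + \tilde{\mathbf{n}}_\tau\|^4_{W^{1,2}(\Omega)}\right)\|\Phi\|^2_{L^2(\Omega)}.
\end{aligned}
\]
Again by using (7.5), we establish the estimate on the error term $I_3$ as well, and hence for sufficiently large $K_2$ the error terms $I_2$, $I_3$ and $I_5$ can be controlled as we did in Lemma 4.3. Thus, the claim (7.1) follows from (7.2) and the estimates on the error terms.

7.2. The smectic A* phase

Lemma 7.2. There exist constants $e_3$, $e_4$ and $\Pi_6 = \Pi_6(q, \Omega)$ so that if $q \ge \tau \ge R^{-1}$, $q\tau \ge h^{-2}$, $K_2 \ge \Pi_6$, and if $(0, \mathbf{n})$ minimizes $\mathcal{F}_A$ in $\mathcal{A}$ then
\[
\int_\Omega \left(D|D^2_{\mathbf{n}}\tilde\Psi|^2 + |C_\perp||D_{\mathbf{n}}\tilde\Psi|^2 + C_\parallel|\mathbf{n}\cdot D_{\mathbf{n}}\tilde\Psi|^2\right) dx < \left(e_3D(1 + \sigma_\parallel)(q\tau)^2 + e_4|C_\perp|q\tau\right)\int_\Omega |\tilde\Psi|^2\,dx \tag{7.6}
\]
for some $\tilde\Psi \in H_0^2(\Omega)$.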

Proof. First using the same eigenfunction found in the proof of Lemma 4.4, we see that there is a constant e3 > 0 and e4 > 0 such that for each Q ∈ S O(2) × I there exists ˜ ∈ H2 () for which 0

2 ˜ 2 2 2 ˜ ˜ D Dn˜ τ + |C⊥ ||Dn˜ τ | + C |n˜ τ · Dn˜ τ | dx 1 ˜ 2 dx. < e3 D(1 + σ )(qτ )2 + e4 |C⊥ |qτ || (7.7) 4 Now we start with the proof as we did in Lemma 4.5 to hold (4.7) and the estimate on n − n˜ τ 4; for sufficiently large K 2 and τ ≥ τ0 = R −1 . Since we do not have the divergence of K 1 , we need to carry out the “change of gauge” method as before. ˜ = eiqu , where u is a solution to (7.3). The first two terms of (7.6) will be Set approximated by the first two of (7.7) by exactly the same proof as in Lemma 7.1. Now we turn to the third term of (7.6). We have ˜ 2 dx = |n · Dn | |n˜ τ · Dn˜ τ + (n − n˜ τ ) · ∇ + iqn · ∇u |2 dx

≤ 2 n˜ τ · Dn˜ τ 22 + C(q) n − n˜ τ 24 ∇ 24

+ Cq 2 ∇u − n + n˜ τ 24 24 + Cq 2 n − n˜ τ 24 24 .

(7.8)

˜ and elliptic regularity for a solution of (7.3), we have From the construction of ˜ 22 ≤ C(, q)

˜ 22 .

∇ 24 ≤ C(q) 1 + ∇u 24

As in Lemma 7.1, the last three terms in (7.8) can be treated as error terms. Now with the help of (7.7), we obtain (7.6).

Lemma 7.1 and Lemma 7.2 yield the following theorem, based on the proofs of the previous theorems.

Theorem 7.1. We suppose that $\tau \ge R^{-1}$ and $q\tau \ge h^{-2}$. Let
\[
\beta = \max\left\{D(q\tau)^2,\ |C_\perp|q\tau\right\}.
\]
Then there are a universal positive constant $e_1$, constants $e_5$, $e_6$ depending only on $\sigma_\parallel$, and a constant $K_A$ depending on $q$, $\Omega$, $D$, and $C_\perp$ such that if $q \ge e_1\tau$ and $K_2 \ge K_A$, and if $(\Psi, \mathbf{n})$ is a minimizer for $\mathcal{F}_A$ in $\mathcal{A}$, then the following two statements hold:

1) $r > \bar{r}_A := -e_5\beta$ implies $\Psi \equiv 0$, and
2) $r < \underline{r}_A := -e_6\beta$ implies $\Psi \not\equiv 0$.

References

1. Bauman, P., Carme Calderer, M., Liu, C., Phillips, D.: The phase transition between chiral nematic and smectic A* liquid crystals. Arch. Rat. Mech. Anal. 165, 161–186 (2002)
2. Bauman, P., Phillips, D., Tang, Q.: Stable nucleation for the Ginzburg-Landau system with an applied magnetic field. Arch. Rat. Mech. Anal. 142, 1–43 (1998)
3. Cagnon, M., Durand, G.: Positional anchoring of smectic liquid crystals. Phys. Rev. Lett. 70(18), 2742–2745 (1993)
4. Chandrasekhar, S.: Liquid Crystals. 2nd Edition. Cambridge: Cambridge University Press, 1992
5. Chen, J.-H., Lubensky, T.C.: Landau-Ginzburg mean-field theory for the nematic to smectic-C and nematic to smectic-A phase transitions. Phys. Rev. A 14(3), 1202–1207 (1976)
6. Collings, P.J., Patel, J.S.: Handbook of Liquid Crystal Research. Oxford: Oxford University Press, 1997
7. Dierking, I.: Textures of Liquid Crystals. New York: Wiley-VCH, 2003
8. De Gennes, P.G., Prost, J.: The Physics of Liquid Crystals. 2nd Edition. Oxford: Clarendon Press, 1993
9. Elston, S., Mottram, N.: Order parameter variations in smectic liquid crystals. In: Advances in Liquid Crystals: A Special Volume of Advances Chemical Physics, edited by J. Vig, 113, New York: Wiley, 2000, pp. 317–339
10. Friedman, A.: Partial Differential Equations. Malabar, FL: Robert E. Krieger Publishing Company, 1983
11. Grisvard, P.: Elliptic Problems in Nonsmooth Domains. London: Pitman Advanced Publishing Program, 1985
12. Giorgi, T., Phillips, D.: The breakdown of superconductivity due to strong fields for the Ginzburg-Landau models. SIGEST Article, SIAM Rev. 44(2), 237–256 (2002)
13. Girault, V., Raviart, P.A.: Finite Element Methods for Navier Stokes Equations. Berlin: Springer-Verlag, 1986
14. Gilbarg, D., Trudinger, N.S.: Elliptic Partial Differential Equations of Second Order. Berlin-Heidelberg-New York: Springer, 1998
15. Iannacchione, G., Finotello, D.: Specific heat dependence on orientational order at cylindrically confined liquid crystal phase transitions. Phys. Rev. E 50(6), 4780–4795 (1994)
16. Kralj, S., Žumer, S.: Smectic-A structures in submicrometer cylindrical cavities. Phys. Rev. E 54(2), 1610–1617 (1996)
17. Lubensky, T.C., Renn, S.R.: Abrikosov dislocation lattice in a model of the cholesteric to smectic-A transition. Phys. Rev. A 38(4), 2132–2147 (1988)
18. Lubensky, T.C., Renn, S.R.: Twist–grain–boundary phases near the nematic–smectic-A–smectic-C point in liquid crystals. Phys. Rev. A 41(8), 4392–4401 (1990)
19. Ladyzhenskaja, O.A., Solonnikov, V.A., Ural'ceva, N.N.: Linear and quasi-linear equations of parabolic type. Providence, RI: Amer. Math. Soc. (1968)
20. Lubensky, T.C.: Abrikosov vortex lattices in liquid crystals. Physica A 220, 99–112 (1995)
21. Luk'yanchuk, I.: Phase transition between the cholesteric and twist grain boundary C phases. Phys. Rev. E 57(1), 574–581 (1998)
22. Nirenberg, L.: On elliptic partial differential equations. Ann. Sc. Norm. Sup. Pisa 13, 123–131 (1959)
23. Nirenberg, L.: An extended interpolation inequality. Ann. Sc. Norm. Sup. Pisa 20, 733–738 (1966)
24. Pan, X.: Landau-de Gennes model of liquid crystals and critical wave number. Commun. Math. Phys. 239(1-2), 343–382 (2003)
25. Renn, S.R.: Multicritical behavior of Abrikosov vortex lattices near the cholesteric - smectic A - smectic C* point. Phys. Rev. A 45(2), 953–973 (1992)

Communicated by A. Kupiainen

Commun. Math. Phys. 269, 401–424 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0093-2

Communications in

Mathematical Physics

Mass Generation in a Fermionic Model with Finite Range Time Dependent Interactions

Vieri Mastropietro

Dipartimento di Matematica, Università di Roma "Tor Vergata", Via della Ricerca Scientifica, 00133 Roma, Italy. E-mail: [email protected]

Received: 10 September 2005 / Accepted: 29 May 2006
Published online: 8 November 2006 – © Springer-Verlag 2006

Abstract: Bardeen, Cooper and Schrieffer in their paper on the theory of superconductivity introduced a model of interacting fermions (BCS model) in which the (instantaneous) interaction is only between electrons of opposite momentum and spin (Cooper pairs). Subsequently it was claimed that in the thermodynamic limit the BCS model is equivalent to the (exactly solvable) quadratic mean field BCS model in which the phenomenon of mass generation is present; a rigorous proof of this equivalence is however still an open problem. In this paper we consider an interacting fermionic model in which the Cooper pairs interact through a finite range time dependent interaction. For this model (quartic in the fermions and not solvable) we are able to prove the generation of mass in the thermodynamic limit and its equivalence with the mean field BCS model. The proof is achieved by a convergent perturbation expansion about mean field theory.

1. Introduction and Main Results

1.1. BCS and mean field BCS model. Bardeen, Cooper and Schrieffer [BCS] developed their theory of superconductivity by introducing and studying the BCS model, describing a system of electrons with an instantaneous infinite range interaction involving only electrons of opposite momentum and spin (Cooper pairs); the Hamiltonian of this model is
\[
H_{BCS} = \sum_{\sigma=\pm}\int_V dx\, a^+_{x,\sigma}\left(-\frac{\nabla^2}{2} - \mu\right)a^-_{x,\sigma} - \frac{\lambda}{V}\left[\int_V dx\, a^+_{x,+}a^+_{x,-}\right]\left[\int_V dy\, a^-_{y,-}a^-_{y,+}\right], \tag{1.1}
\]
where $\nabla^2$ is the Laplacian, $\mu$ is the chemical potential, $a^\pm_{x,\sigma}$ are creation or annihilation spin $\frac{1}{2}$ fermionic field operators in a $d$-dimensional box with side $L$ and $V = L^d$, $m$ is the electron effective mass and $\lambda > 0$ is the (attractive) coupling. The model is not solvable but it was shown in [BCS] that a superconducting phase is energetically favorable with respect to a normal phase. Later on it was realized that the properties of such


superconducting phase are identical to those of the mean field BCS model, an exactly solvable model with Hamiltonian

$$H_{MF} = \lambda^{-1}|\Delta|^2 + \sum_{\sigma=\pm}\int_V d\vec x\, a^+_{\vec x,\sigma}\Big(-\frac{\nabla^2}{2}-\mu\Big)a^-_{\vec x,\sigma} - (\Delta+h)\Big[\int_V d\vec y\, a^-_{\vec y,-}a^-_{\vec y,+}\Big] - (\bar\Delta+h)\Big[\int_V d\vec y\, a^+_{\vec y,+}a^+_{\vec y,-}\Big], \tag{1.2}$$

where $\Delta$ is a complex number to be determined by minimizing the ground state energy (that is, $\Delta$ solves the BCS gap equation), and the external field $h$ has to be removed after the thermodynamic limit. It has been argued in several papers, starting from [BR], that in the limit $V \to \infty$ the BCS model (1.1) and the mean field model (1.2) are equivalent in the sense that they have the same finite temperature correlation functions; this seems quite natural also by analogy with lattice classical statistical mechanics, in which infinite range interactions give mean field behavior in the thermodynamic limit. Indeed many arguments have been given to support the equivalence of (1.1) and (1.2) over the years but, as far as I know, a rigorous proof is still lacking.

In [BZT] the identity of the correlation functions for $H_{BCS}$ and $H_{MF}$ in the thermodynamic limit was claimed, by using a diagrammatic (i.e. perturbative) approach and showing by a graph by graph analysis that the difference of the two correlation functions vanishes in the thermodynamic limit. However the validity of such an argument presupposes that the perturbation expansion is convergent, and there are many examples in which properties established at a perturbative level are not valid because of lack of convergence. A similar perturbative argument in a more modern language has been given in [SHML], in which the similarity of the perturbative expansion of the BCS model with the so-called $1/N$ expansion is pointed out. In [ML] it was proved that the ground states of $H_{BCS}$ and $H_{MF}$ are equal up to corrections vanishing in the limit, but no information was obtained on the finite temperature correlation functions. In [B] and [H] it was argued that $H_{BCS}$ can be replaced by $H_{MF}$ in the infinite volume limit on the basis of the fact that certain commutators vanish in the limit, allowing the replacement of operators by c-numbers.

Unfortunately it was possible to use such ideas only to prove, see [TW], the convergence of $H_{BCS}$ to $H_{MF}$ in a rather small subspace of states with no single particle excitation and in which the "gap equation" holds; this does not imply convergence of the finite temperature correlation functions (involving the trace over a complete set of states). The question was then reconsidered, later on, in [T], in which the equivalence of the correlation functions of $H_{BCS}$ and $H_{MF}$ in the thermodynamic limit was finally proved, but only if the fermionic dispersion relation $\varepsilon(\vec k)$ is assumed to be a constant, $\varepsilon(\vec k) = \mathrm{const}$ (degenerate BCS model), as in this case $H_{BCS}$ can be explicitly diagonalized. The assumption that the system is dispersionless means that the energy band in the single electron spectrum is completely flat: this corresponds to perfectly localized electrons with no hopping, or to electrons with an infinite mass. It is not obvious at all that the results in [T] survive if we add a small hopping term, and in any case the proof would require a delicate analysis; see [Ta] for a similar problem in the case of ferromagnetism in the Hubbard model with nearly flat bands. Finally in [M] a new equivalence proof based on a functional integral approach was given, but the analysis involves an unjustified exchange of limits (between the thermodynamic limit and the removal of the cutoffs necessary to give a meaning to the functional integral). The conclusion of this historical review is that, after so many years of attempts, there are still no rigorous


results on the equivalence of the BCS model with the mean field BCS model, nor on the validity of BCS theory beyond mean field. In this paper we consider an interacting fermionic model in which the Cooper pairs interact through a finite range time dependent interaction, instead of an instantaneous interaction as in (1.1). For this model, quartic and not solvable, we are able to prove the mass generation and the equivalence with the BCS mean field model (1.2) in the thermodynamic limit. The proof of this statement is partly based on the original ideas in [BZT]: we introduce a perturbative expansion about mean field theory which is convergent if the range of the interaction is long enough, and whose terms vanish in the thermodynamic limit. The key point is that, contrary to [BZT], we avoid an expansion in terms of Feynman diagrams (which has bad convergence properties); instead we consider a different expansion in terms of products of determinants (which one cannot expand, otherwise the good convergence properties are lost) and we prove that at each order they vanish in the thermodynamic limit at least as $V^{-1}$; the uniform convergence of the expansion is established via determinant bounds for fermionic expectations. The perturbation theory about mean field theory uses as a small parameter the inverse range of the time dependent potential; this is a classical approach in classical statistical mechanics to prove phase transitions beyond mean field theory, see for instance [LMP].

1.2. Main results. The model we consider is expressed in terms of Grassmann functional integrals defined in the following way. Given $[0, L]^d \subset \mathbb{Z}^d$, the inverse temperature $\beta$ and the integer $M$, we introduce in $\Lambda = [0, L]^d \times [0, \beta]$ a lattice $\Lambda_M$, whose sites are given by the space-time points $\mathbf{x} = (\vec x, x_0) = (n_1, \ldots, n_d, n_0 a_0)$, $a_0 = \beta/M$, $n_1, \ldots, n_d = 0, \ldots, L-1$ and $n_0 = 0, 1, \ldots, M-1$. We consider also the set $D$ of space-time momenta $\mathbf{k} = (\vec k, k_0)$, with $\vec k = (k_1, \ldots, k_d)$ and $k_i = \frac{2\pi n_i}{L}$, $-[L/2] \le n_i \le [(L-1)/2]$, and $k_0 = \frac{2\pi}{\beta}(n_0 + \frac12)$, $n_0 \in \mathbb{Z}$, $-M \le n_0 \le M-1$; $M$ plays the role of an ultraviolet cutoff for $k_0$ while the lattice introduces an intrinsic cutoff for $\vec k$. In the limit $L \to \infty$, $\vec k \in \mathbb{T}^d$. With each $\mathbf{k} \in D$ we associate four Grassmannian variables $\hat\psi^{\varepsilon}_{\mathbf{k},\sigma}$, $\varepsilon, \sigma \in \{+, -\}$. The functional integration $\int D\psi$ is defined as the linear functional on the Grassmann algebra generated by the variables $\hat\psi^{\varepsilon}_{\mathbf{k},\sigma}$, such that, given a monomial $Q(\hat\psi)$ in the variables $\hat\psi^{\varepsilon}_{\mathbf{k},\sigma}$, its value is 0, except in the case $Q(\hat\psi) = \prod_{\mathbf{k}\in D,\sigma=\pm} \hat\psi^-_{\mathbf{k},\sigma}\hat\psi^+_{\mathbf{k},\sigma}$, up to a permutation of the variables. In this case the value of the functional is determined, by using the anticommuting properties of the variables, by $\int D\psi\, Q(\hat\psi) = 1$. We also define the Grassmannian field as, with $V = L^d$,

$$\psi^{\varepsilon}_{\mathbf{x},\sigma} = \frac{1}{V\beta}\sum_{\mathbf{k}\in D} e^{i\varepsilon\mathbf{k}\mathbf{x}}\,\hat\psi^{\varepsilon}_{\mathbf{k},\sigma}, \qquad \mathbf{x}\in\Lambda_M. \tag{1.3}$$

Note that the ultraviolet cutoff $M$ is introduced just for technical reasons, in order to have finite dimensional Grassmann integrals, and it has to be removed before the thermodynamic limit. We define

$$\mathcal{V} = -\frac{\lambda}{V}\int d\mathbf{x}\int d\mathbf{y}\; v(x_0-y_0)\,\psi^+_{\mathbf{x},+}\psi^+_{\mathbf{x},-}\psi^-_{\mathbf{y},-}\psi^-_{\mathbf{y},+}, \tag{1.4}$$


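where $\int d\mathbf{x}$ stands for $a_0\sum_{\mathbf{x}\in\Lambda_M}$ and $v(x_0-y_0)$ is a Kac potential with a long but finite range $\kappa^{-1}$, given by

$$v(x_0-y_0) = \frac{1}{\beta}\sum_{p_0=\frac{2\pi}{\beta}n_0,\; n_0\in\mathbb{Z}} e^{-ip_0(x_0-y_0)}\,\frac{\kappa^2}{\kappa^2+p_0^2}. \tag{1.5}$$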
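The sense in which (1.5) has range $\kappa^{-1}$ can be made concrete with a small numerical sketch. It compares a truncated version of the Fourier sum (1.5) with the periodized exponential $\frac{\kappa}{2}\cosh(\kappa(\beta/2-\tau))/\sinh(\kappa\beta/2)$, which is the standard closed form of this type of Matsubara sum for $0\le\tau\le\beta$ and is not stated in the text. The function names and the truncation `n_max` are illustrative choices.

```python
import math

def kac_potential(tau, kappa, beta, n_max=20000):
    """Truncated Fourier sum (1.5) evaluated at tau = x0 - y0."""
    total = 1.0  # n0 = 0 term: kappa^2 / kappa^2
    for n in range(1, n_max + 1):
        p0 = 2.0 * math.pi * n / beta
        # the +n0 and -n0 terms combine into a cosine
        total += 2.0 * math.cos(p0 * tau) * kappa**2 / (kappa**2 + p0**2)
    return total / beta

def kac_closed_form(tau, kappa, beta):
    """Periodized exponential of decay length 1/kappa, for 0 <= tau <= beta."""
    return 0.5 * kappa * math.cosh(kappa * (0.5 * beta - tau)) / math.sinh(0.5 * kappa * beta)

if __name__ == "__main__":
    kappa, beta = 0.5, 4.0
    for tau in (0.5, 1.0, 2.0):
        print(tau, kac_potential(tau, kappa, beta), kac_closed_form(tau, kappa, beta))
```

For $\kappa\beta \gg 1$ the closed form reduces to $\frac{\kappa}{2}e^{-\kappa\tau}$ away from the boundary, which is the sense in which small $\kappa$ means long range in imaginary time.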

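Moreover

$$P(d\psi) = N^{-1} D\psi\cdot\exp\Big\{-\frac{1}{V\beta}\sum_{\sigma=\pm}\sum_{\mathbf{k}\in D}\hat\psi^+_{\mathbf{k},\sigma}\big(-ik_0+\varepsilon(\vec k)-\mu\big)\hat\psi^-_{\mathbf{k},\sigma}\Big\}, \tag{1.6}$$

with $\varepsilon(\vec k) = \sum_{i=1}^d (1-\cos k_i)$ and $N = \prod_{\mathbf{k}\in D}\big[(V\beta)^{-2}\big(-k_0^2-(\varepsilon(\vec k)-\mu)^2\big)\big]$. The two point correlation functions are given by the following Grassmann integrals:

$$\langle\hat\psi^{\varepsilon}_{\mathbf{k},\sigma}\hat\psi^{-\varepsilon}_{\mathbf{k}',\sigma}\rangle_{L,\beta,h} = \lim_{M\to\infty}\frac{\int P(d\psi)\,e^{-\mathcal{V}-h\sum_{\sigma=\pm}\int d\mathbf{x}\,\psi^{\sigma}_{\mathbf{x},\sigma}\psi^{\sigma}_{\mathbf{x},-\sigma}}\;\hat\psi^{\varepsilon}_{\mathbf{k},\sigma}\hat\psi^{-\varepsilon}_{\mathbf{k}',\sigma}}{\int P(d\psi)\,e^{-\mathcal{V}-h\sum_{\sigma=\pm}\int d\mathbf{x}\,\psi^{\sigma}_{\mathbf{x},\sigma}\psi^{\sigma}_{\mathbf{x},-\sigma}}}. \tag{1.7}$$

The equivalence with the BCS mean field model (1.2) can be established in the thermodynamic limit, at least if the range is long enough, as shown by the following theorem.

Theorem. Assume $\mu < d$ and $\lambda > 0$; there exists $\beta_c(\lambda) > 0$ such that, for $\beta \ge \beta_c(\lambda)$ and $0 < \kappa < \kappa_0 = C^{-1}\lambda^{-\frac12}\beta^{-d/2-2}$ for a suitable constant $C$, the Schwinger functions (1.7) with $v(x_0-y_0)$ given by (1.5) are such that

$$\lim_{h\to0^+}\lim_{L\to\infty}\langle\hat\psi^-_{\mathbf{k},\sigma}\hat\psi^+_{\mathbf{k},\sigma}\rangle_{L,\beta,h} = \frac{ik_0+(\varepsilon(\vec k)-\mu)}{k_0^2+(\varepsilon(\vec k)-\mu)^2+\Delta(\beta)^2}, \tag{1.8}$$

$$\lim_{h\to0^+}\lim_{L\to\infty}\langle\hat\psi^+_{\mathbf{k},+}\hat\psi^+_{-\mathbf{k},-}\rangle_{L,\beta,h} = \frac{\Delta(\beta)}{k_0^2+(\varepsilon(\vec k)-\mu)^2+\Delta(\beta)^2}, \tag{1.9}$$

where $\Delta(\beta)$ is the real negative solution of the BCS gap equation

$$1 = \lambda\int_{\mathbb{T}^d}\frac{d\vec k}{(2\pi)^d}\,\frac{\tanh\Big(\frac{\beta}{2}\sqrt{(\varepsilon(\vec k)-\mu)^2+\Delta(\beta)^2}\Big)}{2\sqrt{(\varepsilon(\vec k)-\mu)^2+\Delta(\beta)^2}}, \tag{1.10}$$

and $\beta_c(\lambda)$ is the minimal $\beta$ such that (1.10) admits a solution.

Equation (1.10) is the well known gap equation found in [BCS], and the r.h.s. of (1.8), (1.9) are the Schwinger functions of the BCS mean field model. It is well known that, if $\beta = T^{-1}$, (1.10) has a solution for $T \le T_c$ with $T_c = Ae^{-a/\lambda}$, with $A$, $a$ suitable positive constants, and, for $T$ close to $T_c$, $\Delta(\beta) \simeq B(T_c-T)^{\frac12}$ for a suitable constant $B$. The above theorem then implies that, if $\lambda, \kappa, \beta$ are chosen so that $\beta \ge \beta_c$ and $0 < \kappa < C^{-1}\lambda^{-\frac12}\beta^{-d/2-2}$, then the phenomenon of spontaneous mass generation is present, as $\langle\hat\psi^+_{\mathbf{k},+}\hat\psi^+_{-\mathbf{k},-}\rangle$ is different from zero and the Schwinger functions in coordinate space have an exponential decay with rate proportional to $\Delta(\beta)$.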
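As an illustration of how the gap equation (1.10) determines $\Delta(\beta)$ and the threshold $\beta_c(\lambda)$, here is a minimal numerical sketch in dimension $d = 1$ with the dispersion $\varepsilon(k) = 1-\cos k$ of (1.6). The midpoint quadrature, the bisection solver and the names `gap_rhs` and `solve_gap` are assumptions made for this example only. The solver returns $|\Delta|$; the theorem selects the negative root.

```python
import math

def gap_rhs(delta, lam, beta, mu, n=2000):
    """Right-hand side of (1.10) for d = 1, by the midpoint rule on [-pi, pi]."""
    total = 0.0
    for j in range(n):
        k = -math.pi + (j + 0.5) * 2.0 * math.pi / n
        e = (1.0 - math.cos(k)) - mu            # epsilon(k) - mu
        big_e = math.sqrt(e * e + delta * delta)
        if big_e < 1e-12:
            total += beta / 4.0                 # limit of tanh(beta*E/2)/(2E) as E -> 0
        else:
            total += math.tanh(0.5 * beta * big_e) / (2.0 * big_e)
    return lam * total / n                      # (1/2pi) * integral dk ~ average over grid

def solve_gap(lam, beta, mu):
    """Return |Delta| solving (1.10), or None if beta is below the threshold."""
    if gap_rhs(0.0, lam, beta, mu) <= 1.0:
        return None                             # no non-zero solution: beta < beta_c
    lo, hi = 0.0, lam                           # rhs(lam) <= 1/2 because E >= |Delta|
    for _ in range(60):                         # rhs is decreasing in |Delta|
        mid = 0.5 * (lo + hi)
        if gap_rhs(mid, lam, beta, mu) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(solve_gap(1.0, 50.0, 0.5))   # low temperature: a gap is found
    print(solve_gap(1.0, 0.1, 0.5))    # high temperature: None
```

The existence test at $\Delta = 0$ relies on the right-hand side of (1.10) being decreasing in $|\Delta|$, so a non-zero solution exists exactly when its value at $\Delta = 0$ exceeds 1.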


Note that we can prove convergence only for small $\kappa$, namely for $\kappa \le C^{-1}\lambda^{-\frac12}\beta^{-d/2-2}$ with a suitable constant $C$; we have not tried to optimize the power of $\beta^{-1}$ in the above bound, and small improvements could easily be obtained using the techniques of this paper, which however would not change our results qualitatively. Of course it would be interesting to prove the same theorem up to $\kappa = \infty$, so obtaining a real solution of the BCS model with instantaneous interaction, or at least up to $\kappa$ independent of $\lambda$ and $\beta$.

1.3. Physical motivations. In order to discuss the physical meaning of the model (1.7) we note first that, if we replace $v(x_0-y_0)$ by $\delta(x_0-y_0)$ in (1.4), the functional integral (1.7) is the imaginary time finite temperature two point Schwinger function of the BCS model on a lattice, with Hamiltonian (1.1) in which the Laplacian is replaced by its discrete version

$$\hat\Delta f(\vec x) = \sum_{j=1}^d\big[f(\vec x+\hat e_j)-2f(\vec x)+f(\vec x-\hat e_j)\big],$$

where $f : \mathbb{Z}^d \to \mathbb{R}$ and $\hat e_j$, $j = 1, \ldots, d$, are the unit vectors of $\mathbb{Z}^d$; see for instance [NO]. The BCS model (1.1) is not a truly fundamental one; a realistic model for describing superconductivity should include the interaction of electrons with bosonic fields, describing phonons or other excitations in metals, which are rather complex. One can then consider as a model a fermionic functional integral in which the effect of such interactions is phenomenologically taken into account by an effective interaction between electrons. Such an effective interaction should be time dependent; in fact the instantaneous Coulomb interaction between fermions is repulsive, and the attractive interaction responsible for superconductivity is due to the interaction of the electrons with some bosonic excitation, resulting in an effective time dependent interaction between electrons, see for instance [CEKO]. Then a realistic model could be given by (1.7) with interaction

$$\mathcal{V}_R = -\frac14\,\frac{1}{(V\beta)^4}\sum_{\sigma,\sigma'}\sum_{\mathbf{k}_1,\ldots,\mathbf{k}_4}\lambda\, v_{\sigma,\sigma'}(\mathbf{k})\,\hat\psi^+_{\mathbf{k}_1,\sigma}\hat\psi^-_{\mathbf{k}_2,\sigma}\hat\psi^+_{\mathbf{k}_3,\sigma'}\hat\psi^-_{\mathbf{k}_4,\sigma'}\,\delta(\mathbf{k}_1-\mathbf{k}_2+\mathbf{k}_3-\mathbf{k}_4), \tag{1.11}$$

with $v_{\sigma,\sigma'}(\mathbf{k}) \equiv v_{\sigma,\sigma'}(\mathbf{k}_1, \ldots, \mathbf{k}_4)$ in general depending on temporal and spatial momenta. The analytic difficulties in studying a model with interaction (1.11) with a generic $v_{\sigma,\sigma'}(\mathbf{k})$ motivate the search for simpler models. The BCS model (1.1) is equivalent to the assumption that $v_{\sigma,\sigma'}(\mathbf{k})$ is non vanishing only on the Cooper pairs, that is $v_{\sigma,\sigma'}(\mathbf{k}) = \delta_{\sigma,-\sigma'}\,\delta_{\vec k_1,-\vec k_3}\,\delta_{\vec k_2,-\vec k_4}$; this has the effect that in the interaction there are three independent sums on time momenta but only two on space momenta. Renormalization Group arguments, suggesting that the relevant interactions are the ones involving the momenta of the four electrons close to the Fermi surface, combined with the geometrical constraints to which four vectors with zero sum constrained to a convex surface are subjected, give a (very partial) justification for this assumption [BG, FMRT, Sh]. With this choice the interaction can be written as

$$\mathcal{V}_{BCS} = -\frac{\lambda}{(\beta V)^3}\sum_{\mathbf{k},\mathbf{k}'}\;\sum_{\vec p=0,\; p_0=\frac{2\pi}{\beta}n_0,\; n_0\in\mathbb{Z}}\hat\psi^+_{\mathbf{k},+}\hat\psi^+_{-\mathbf{k}+\mathbf{p},-}\hat\psi^-_{\mathbf{k}',-}\hat\psi^-_{-\mathbf{k}'+\mathbf{p},+}, \tag{1.12}$$

and the interaction (1.12) is precisely the one appearing in the model Hamiltonian (1.1). Note that the interaction (1.12) cannot be factorized in a simple way, as the sum


over $p_0$ has the effect that it is not the square of the total number of Cooper pairs. Hence, even if the Hamiltonian (1.1) with interaction (1.12) seems much more tractable than the full model with interaction (1.11), this lack of factorization has prevented, despite so many attempts, a rigorous analysis of the BCS model and a proof of the validity of the BCS equations (1.8)-(1.9)-(1.10) in the thermodynamic limit. A much more drastic simplification consists in assuming that also the temporal momenta are in Cooper pairs (but there is no real justification for this assumption, as there are no geometrical constraints for the time variables), $v_{\sigma,\sigma'}(\mathbf{k}) = \delta_{\sigma,-\sigma'}\,\delta_{\mathbf{k}_1,-\mathbf{k}_3}\,\delta_{\mathbf{k}_2,-\mathbf{k}_4}$. In this case one has perfect factorization in the interaction, which can be written as, if $\varepsilon = \pm$,

$$\mathcal{V}_{DR} = -2N^+N^-, \qquad N^{\varepsilon} = \frac{\sqrt{\lambda}}{(2\beta V)^{1/2}}\,\frac{1}{\beta V}\sum_{\mathbf{k}}\hat\psi^{\varepsilon}_{\mathbf{k},\varepsilon}\hat\psi^{\varepsilon}_{-\mathbf{k},-\varepsilon}, \tag{1.13}$$

where $N^{\varepsilon}$ is the total density of Cooper pairs. The model in this case is called the Doubly Reduced model; the fact that the interaction is perfectly factorized allows one to show the equivalence of the Doubly Reduced model and the mean field model (1.2). This can be established in the thermodynamic limit [L], by an argument essentially identical to the one for the Ising model with infinite range interaction. The lack of rigorous results for the BCS model with interaction (1.12) (at least with a realistic dispersion relation), and the intrinsic mean field nature of the Doubly Reduced model (1.13), motivate the introduction of our model, in which the interaction is given by (1.4) or, equivalently, by

$$\mathcal{V} = -\frac{\lambda}{(\beta V)^3}\sum_{\mathbf{k},\mathbf{k}'}\;\sum_{\vec p=0,\; p_0=\frac{2\pi}{\beta}n_0,\; n_0\in\mathbb{Z}}\hat v(p_0)\,\hat\psi^+_{\mathbf{k},+}\hat\psi^+_{-\mathbf{k}+\mathbf{p},-}\hat\psi^-_{\mathbf{k}',-}\hat\psi^-_{-\mathbf{k}'+\mathbf{p},+}, \tag{1.14}$$

with $\hat v(p_0)$ given by (1.5). The model reduces to the BCS model when $\kappa = \infty$ (local interaction), and to the Doubly Reduced model when $\kappa = 0$ (infinite range interaction). In the model defined by Eqs. (1.7)-(1.14) the interaction is not factorized, so that it is a truly non mean field model for which the phenomenon of spontaneous mass generation via the BCS mechanism can be proved for $\kappa$ small enough. The validity of a similar result up to $\kappa = \infty$ is an open problem.

The rest of the paper is organized in the following way. In §2 we perform a partial Hubbard-Stratonovich transformation and we show that a standard saddle point analysis allows us to prove the theorem, if suitable bounds are valid for the corrections to mean field. In §3 an expansion for (1.7) around mean field theory is introduced, and such bounds are shown to be indeed true.

2. Saddle Point Analysis

2.1. Partial Hubbard-Stratonovich transformation. We start the analysis of (1.7) by splitting the interaction $\mathcal{V}$ (1.14) as a sum of two terms, one with $n_0 = 0$ and the other with $n_0 \ne 0$,

$$\mathcal{V} = \bar{\mathcal{V}} + \hat{\mathcal{V}}, \tag{2.1}$$

$$\bar{\mathcal{V}} = -\frac{\lambda}{(\beta V)^3}\sum_{\mathbf{k},\mathbf{k}'}\hat\psi^+_{\mathbf{k},+}\hat\psi^+_{-\mathbf{k},-}\hat\psi^-_{\mathbf{k}',-}\hat\psi^-_{-\mathbf{k}',+} = -2N^+N^-, \tag{2.2}$$

$$\hat{\mathcal{V}} = -\frac{\lambda}{(\beta V)^3}\sum_{\mathbf{k},\mathbf{k}'}\;\sum_{\vec p=0,\; |p_0|\ge\frac{2\pi}{\beta}}\hat v(p_0)\,\hat\psi^+_{\mathbf{k},+}\hat\psi^+_{-\mathbf{k}+\mathbf{p},-}\hat\psi^-_{\mathbf{k}',-}\hat\psi^-_{-\mathbf{k}'+\mathbf{p},+}, \tag{2.3}$$


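with $N^\pm$ defined in (1.13). Let us consider the generating function $S_{L,\beta,h}(J)$,

$$e^{S_{L,\beta,h}(J)} = \int P(d\psi)\,e^{2N^+N^-}\,e^{-\hat{\mathcal{V}}}\,e^{-h\frac{\sqrt{2\beta V}}{\sqrt\lambda}N^+ - h\frac{\sqrt{2\beta V}}{\sqrt\lambda}N^-}\,e^{\int d\mathbf{x}\sum_\sigma[J^+_{\mathbf{x},\sigma}\psi^-_{\mathbf{x},\sigma}+\psi^+_{\mathbf{x},\sigma}J^-_{\mathbf{x},\sigma}]}, \tag{2.4}$$

where $J^\pm$ are external Grassmann fields, so that

$$\langle\psi^{\varepsilon}_{\mathbf{x},\sigma}\psi^{\varepsilon'}_{\mathbf{y},\sigma'}\rangle = \frac{\partial^2}{\partial J^{-\varepsilon}_{\mathbf{x},\sigma}\,\partial J^{-\varepsilon'}_{\mathbf{y},\sigma'}}\,S_{L,\beta,h}(J)\Big|_{J=0}. \tag{2.5}$$

We use the identity (Hubbard-Stratonovich transformation), with $\phi = u+iv$, $\bar\phi = u-iv$, $u, v \in \mathbb{R}$ and $a, b \in \mathbb{R}$,

$$e^{2ab} = \frac{1}{2\pi}\int_{\mathbb{R}^2}du\,dv\;e^{-\frac12|\phi|^2}\,e^{a\phi+b\bar\phi}. \tag{2.6}$$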
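The identity (2.6) can be checked numerically for real $a$, $b$; the following sketch does so by the trapezoidal rule. The function name and grid parameters are illustrative. It uses the fact that the integrand factorizes into a $u$-integral of $e^{-u^2/2+(a+b)u}$ and a $v$-integral of $e^{-v^2/2}\cos((a-b)v)$, the sine part vanishing by symmetry.

```python
import math

def hs_integral(a, b, half_width=12.0, n=4000):
    """(1/2pi) * int du dv exp(-|phi|^2/2) exp(a*phi + b*conj(phi)), phi = u + i v."""
    shift = a + b                      # location of the maximum of the u-integrand
    step = 2.0 * half_width / n
    iu = 0.0
    iv = 0.0
    for j in range(n + 1):
        w = 0.5 if j in (0, n) else 1.0        # trapezoidal end-point weights
        t = -half_width + j * step
        u = t + shift                          # grid centred on the Gaussian peak
        iu += w * math.exp(-0.5 * u * u + shift * u)
        iv += w * math.exp(-0.5 * t * t) * math.cos((a - b) * t)
    return iu * iv * step * step / (2.0 * math.pi)

if __name__ == "__main__":
    for a, b in ((0.3, -0.7), (1.0, 0.5)):
        print(hs_integral(a, b), math.exp(2.0 * a * b))
```

In the text the identity is applied with $a$, $b$ replaced by the even Grassmann elements $N^\pm$, which is a formal extension of the real case checked here.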

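We can then rewrite the above expression as

$$e^{S_{L,\beta,h}(J)} = \frac{1}{2\pi}\int_{\mathbb{R}^2}du\,dv\;e^{-\frac12|\phi|^2}\int P(d\psi)\,e^{-\hat{\mathcal{V}}}\,e^{(\phi-h\frac{\sqrt{2\beta V}}{\sqrt\lambda})N^+ + (\bar\phi-h\frac{\sqrt{2\beta V}}{\sqrt\lambda})N^-}\,e^{\int d\mathbf{x}\sum_\sigma[J^+_{\mathbf{x},\sigma}\psi^-_{\mathbf{x},\sigma}+\psi^+_{\mathbf{x},\sigma}J^-_{\mathbf{x},\sigma}]}. \tag{2.7}$$

Performing the change of variables $(u, v) \to \sqrt{2\beta V}\,(u+\frac{h}{\sqrt\lambda}, v)$ and defining $N^{\varepsilon} \equiv \frac{\sqrt\lambda}{(2\beta V)^{1/2}}D^{\varepsilon}$ we obtain

$$e^{S_{L,\beta,h}(J)} = \frac{\beta V}{\pi}\int_{\mathbb{R}^2}du\,dv\;e^{-\beta V(v^2+(u+\frac{h}{\sqrt\lambda})^2)}\,e^{-\beta V F_{L,\beta,h}(u,v)+B_{L,\beta,h}(u,v,J)}, \tag{2.8}$$

where

$$e^{-\beta V F_{L,\beta,h}(u,v)+B_{L,\beta,h}(u,v,J)} = \int P(d\psi)\,e^{-\hat{\mathcal{V}}}\,e^{\sqrt\lambda\phi D^+ + \sqrt\lambda\bar\phi D^-}\,e^{\int d\mathbf{x}\sum_\sigma[J^+_{\mathbf{x},\sigma}\psi^-_{\mathbf{x},\sigma}+\psi^+_{\mathbf{x},\sigma}J^-_{\mathbf{x},\sigma}]} \tag{2.9}$$

and (by definition) $B_{L,\beta,h}(u, v, J)$ vanishes for $J = 0$, so that $F_{L,\beta,h}(u, v)$ is given by

$$e^{-\beta V F_{L,\beta,h}(u,v)} = \int P(d\psi)\,e^{-\hat{\mathcal{V}}}\,e^{\sqrt\lambda\phi D^+ + \sqrt\lambda\bar\phi D^-}. \tag{2.10}$$

The conclusion is that (1.7) can be written as

$$\langle\hat\psi^{\varepsilon}_{\mathbf{k},\sigma}\hat\psi^{\varepsilon'}_{-\varepsilon\varepsilon'\mathbf{k},-\varepsilon\varepsilon'\sigma}\rangle = \frac{1}{Z_{L,\beta,h}}\int_{\mathbb{R}^2}du\,dv\;e^{-\beta V(v^2+(u+\frac{h}{\sqrt\lambda})^2)}\,e^{-\beta V F_{L,\beta}(u,v)}\,\hat S^{\varepsilon,\varepsilon'}_{L,\beta}(\mathbf{k},u,v), \tag{2.11}$$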


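where $S^{\varepsilon,\varepsilon'}_{L,\beta}(\mathbf{k},u,v) = \frac{\partial^2}{\partial J^{-\varepsilon}_{\mathbf{k},\sigma}\,\partial J^{-\varepsilon'}_{-\varepsilon\varepsilon'\mathbf{k},-\varepsilon\varepsilon'\sigma}}B(u,v,J)\big|_{J=0}$ ($\sigma$-independent) and

$$Z_{L,\beta,h} = \int_{\mathbb{R}^2}du\,dv\;e^{-\beta V(v^2+(u+\frac{h}{\sqrt\lambda})^2)}\,e^{-\beta V F_{L,\beta}(u,v)}. \tag{2.12}$$

2.2. Proof of the Theorem. We can write

$$F_{L,\beta}(u,v) = t_{BCS}(u,v) + \bar F_{L,\beta}(u,v), \tag{2.13}$$

where, if $E(\vec k) = \varepsilon(\vec k)-\mu$ and $\phi = u+iv$,

$$t_{BCS}(u,v) = -\frac{1}{V}\sum_{\vec k}\frac{2}{\beta}\log\frac{\cosh\Big(\frac{\beta}{2}\sqrt{E^2(\vec k)+\lambda|\phi|^2}\Big)}{\cosh\Big(\frac{\beta}{2}E(\vec k)\Big)} \tag{2.14}$$

is the free energy in the mean field BCS model (see for instance [L]) and $\bar F_{L,\beta}(u,v)$ is the rest. In the following section we prove the following lemma.

Lemma 1. There exist constants $C$, $C_1$ such that, if $0 < \kappa < \kappa_0 = C^{-1}\lambda^{-\frac12}\beta^{-d/2-2}$, then

$$|\bar F_{L,\beta}(u,v)| \le C_1\frac{\lambda}{V}\kappa^2\beta^{d+2}\log\beta. \tag{2.15}$$

The above lemma is the key technical result of the present paper: it says that the correction to mean field behaviour vanishes in the thermodynamic limit, if the range is long enough. Calling $V\bar F_{L,\beta}(u,v) \equiv \hat F_{L,\beta}(u,v)$, we can write the two point Schwinger functions as

$$\frac{1}{Z_{L,\beta,h}}\int_{\mathbb{R}^2}du\,dv\;e^{-\beta V[v^2+(u+\frac{h}{\sqrt\lambda})^2+t_{BCS}(u,v)]}\,e^{-\beta\hat F_{L,\beta}(u,v)}\,S^{\varepsilon,\varepsilon'}_{L,\beta}(\mathbf{k},u,v). \tag{2.16}$$

By the saddle point theorem, for $\beta$ large enough

$$\lim_{L\to\infty}\frac{e^{-\beta V(v^2+(u+\frac{h}{\sqrt\lambda})^2+t_{BCS}(u,v))}}{\int du\,dv\;e^{-\beta V(v^2+(u+\frac{h}{\sqrt\lambda})^2+t_{BCS}(u,v))}} = \delta(u-u_0)\,\delta(v), \tag{2.17}$$

where $u_0$ is given by the negative (for $h > 0$) solution of

$$u_0\left[\lambda\int\frac{d\vec k}{(2\pi)^d}\,\frac{\tanh\Big(\frac{\beta}{2}\sqrt{E^2(\vec k)+\lambda u_0^2}\Big)}{2\sqrt{E^2(\vec k)+\lambda u_0^2}} - 1\right] = \frac{2h}{\sqrt\lambda}, \tag{2.18}$$

which in the limit $h \to 0$ reduces to the BCS equation (1.10). Moreover the following lemma also holds.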


Lemma 2. There exist constants $C$, $C_2$ such that, if $0 < \kappa < \kappa_0 = C^{-1}\lambda^{-\frac12}\beta^{-d/2-2}$, then

$$S^{-,+}_{L,\beta}(\mathbf{k},u,v) = \frac{ik_0+(\varepsilon(\vec k)-\mu)}{k_0^2+(\varepsilon(\vec k)-\mu)^2+\lambda|\phi|^2} + R^{-,+}_{L,\beta}(\mathbf{k},u,v), \tag{2.19}$$

$$S^{+,+}_{L,\beta}(\mathbf{k},u,v) = \frac{\sqrt\lambda\,\phi}{k_0^2+(\varepsilon(\vec k)-\mu)^2+\lambda|\phi|^2} + R^{+,+}_{L,\beta}(\mathbf{k},u,v), \tag{2.20}$$

with

$$|R^{-,+}_{L,\beta}(\mathbf{k},u,v)|,\; |R^{+,+}_{L,\beta}(\mathbf{k},u,v)| \le C_2\lambda\kappa^2\beta^{3d+5}V^{-1}. \tag{2.21}$$

Hence, by inserting (2.19)-(2.20) in (2.16) and using (2.15)-(2.17)-(2.21), we prove the theorem.

Remark. One could perform the Hubbard-Stratonovich transformation for the full interaction $\mathcal{V}$ (not only for $\bar{\mathcal{V}}$), so writing the partition function, if $N_M$ is a normalization factor, as

$$\lim_{M\to\infty}N_M\int\Big[\prod_{p_0=\frac{2\pi}{\beta}n_0,\;|n_0|\le 2M}d\phi_{p_0}\Big]\,e^{-F_{L,\beta}(\{\phi\})} \tag{2.22}$$

with

$$F_{L,\beta}(\{\phi\}) = \frac12\sum_{p_0=\frac{2\pi}{\beta}n_0,\;|n_0|\le 2M}\big(p_0^2\kappa^{-2}+1\big)|\phi_{p_0}|^2 - \log\int P(d\psi)\,\exp\Big\{\frac{\sqrt\lambda}{2(\beta V)^{3/2}}\sum_{\mathbf{k},\;p_0=\frac{2\pi}{\beta}n_0,\;\vec p=0}\big[\phi_{p_0}\hat\psi^+_{\mathbf{k},+}\hat\psi^+_{-\mathbf{k}+\mathbf{p},-}+\bar\phi_{p_0}\hat\psi^-_{\mathbf{k},-}\hat\psi^-_{-\mathbf{k}+\mathbf{p},+}\big]\Big\}. \tag{2.23}$$

A similar representation holds also for the two point correlation function. Note that if one performs the limits in (2.22) in the wrong order, that is $L \to \infty$ with the ultraviolet cutoff $M$ fixed, the evaluation of the two point function becomes immediate; in fact $F_{L,\beta}$ has a global minimum corresponding to $\phi^*_{p_0} = \delta_{p_0,0}\phi^*$, with $\phi^*$ given by the BCS mean field solution; hence, keeping $M$ finite and letting $L \to \infty$, the saddle point theorem can be applied and the mean field behaviour is immediately recovered for any $\kappa$. However the limit $M \to \infty$ must be taken before the thermodynamic limit, and the naive saddle point analysis cannot be applied (the corrections are $O(C^M V^{-1})$). Some relationship is believed to hold between the problem of mass generation in the BCS model and the analogous problem in the so called $1/N$-expansion, where in certain cases it has been possible, by cluster expansion techniques and Peierls estimates, to prove that fluctuations around mean field are negligible for large $N$, see for instance [K, KMR, MVH]; whether such methods could be applied to recover our result, or, more interestingly, whether they could be applied to larger $\kappa$, possibly up to the case of local interaction ($\kappa = \infty$), is an open problem.


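3. Vanishing of the Corrections to Mean Field

3.1. The partition function. In §3.1, §3.2 and §A1.1 we will show that the expansions for $F_{L,\beta,h}(u,v)$ and for the two point function are convergent for sufficiently long range interactions. Then in §3.3 and §A1.2 we will complete the proof of Lemmas 1 and 2, by showing that the corrections to mean field are indeed vanishing in the thermodynamic limit. Recalling that $\phi = u+iv$, we write

$$\int P(d\psi)\,e^{\sqrt\lambda\phi D^+ + \sqrt\lambda\bar\phi D^-}\,e^{-\hat{\mathcal{V}}(\psi)} = e^{-\beta V t_{BCS}(u,v)}\int P_\phi(d\psi)\,e^{-\hat{\mathcal{V}}(\psi)} = e^{-\beta V t_{BCS}(u,v)-\beta V\bar F_{L,\beta}(u,v)}, \tag{3.1}$$

where

$$\hat{\mathcal{V}}(\psi) = -\frac{\lambda}{V}\int d\mathbf{x}\,d\mathbf{y}\;\tilde v(x_0-y_0)\,\psi^+_{\mathbf{x},+}\psi^+_{\mathbf{x},-}\psi^-_{\mathbf{y},-}\psi^-_{\mathbf{y},+}, \tag{3.2}$$

where $\int d\mathbf{x}$ is a symbol for "$a_0\sum_{\mathbf{x}\in\Lambda_M}$" and

$$\tilde v(x_0-y_0) = \frac{1}{\beta}\sum_{p_0=\frac{2\pi}{\beta}n_0,\;n_0\ne0}e^{-ip_0(x_0-y_0)}\,\frac{\kappa^2}{\kappa^2+p_0^2}. \tag{3.3}$$

Moreover

$$P_\phi(d\psi) = \prod_{\mathbf{k}}\frac{d\hat\psi^+_{\mathbf{k}}\,d\hat\psi^-_{\mathbf{k}}}{N(\mathbf{k})}\,\exp\Big\{-\frac{1}{V\beta}\sum_{\mathbf{k}}\sum_{\varepsilon,\varepsilon'=\pm}\hat\psi^{\varepsilon}_{\varepsilon\mathbf{k},\varepsilon}\,T_{\varepsilon,\varepsilon'}(\mathbf{k})\,\hat\psi^{-\varepsilon'}_{\varepsilon'\mathbf{k},\varepsilon'}\Big\}, \tag{3.4}$$

where $N(\mathbf{k})$ is the normalization of $P_\phi(d\psi)$,

$$t_{BCS}(u,v) = -\frac{1}{V\beta}\sum_{\mathbf{k}}\log\frac{k_0^2+E(\vec k)^2+\lambda|\phi|^2}{k_0^2+E(\vec k)^2} \tag{3.5}$$

and the $2\times2$ matrix $T(\mathbf{k})$ is given by

$$T(\mathbf{k}) = \begin{pmatrix}-ik_0+E(\vec k) & \sqrt\lambda\,\phi\\ \sqrt\lambda\,\bar\phi & -ik_0-E(\vec k)\end{pmatrix}. \tag{3.6}$$

We can write $t_{BCS}(u,v)$ as (2.14) by explicitly performing the sums over $k_0$, and of course $|t_{BCS}(u,v)| \le |\lambda|C[1+|\phi|]$. If $\varepsilon, \varepsilon' = \pm$, the propagator of $P_\phi(d\psi)$ is given by

$$\int P_\phi(d\psi)\,\psi^{-\varepsilon}_{\mathbf{x},\varepsilon}\psi^{\varepsilon'}_{\mathbf{y},\varepsilon'} \equiv g_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y}) = \frac{1}{V\beta}\sum_{\mathbf{k}}e^{-i\mathbf{k}(\mathbf{x}-\mathbf{y})}\,[T^{-1}(\mathbf{k})]_{\varepsilon,\varepsilon'}. \tag{3.7}$$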
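The leading terms of Lemma 2 can be read off from the inverse of the matrix (3.6). The sketch below inverts $T(\mathbf{k})$ for one sample momentum and compares the first row of $T^{-1}$ with the explicit fractions in (2.19) and (2.20). The helper names, the sample values, and the identification of rows and columns with the index order $(+,-)$ are assumptions made for this illustration.

```python
def t_matrix(k0, e, phi, lam):
    """The 2x2 matrix T(k) of (3.6); e stands for E(k) = eps(k) - mu."""
    s = lam ** 0.5
    return [[-1j * k0 + e, s * phi],
            [s * phi.conjugate(), -1j * k0 - e]]

def inverse_2x2(m):
    """Inverse of a 2x2 complex matrix by the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

if __name__ == "__main__":
    k0, e, phi, lam = 0.7, -0.3, complex(0.4, 0.2), 1.5
    inv = inverse_2x2(t_matrix(k0, e, phi, lam))
    denom = k0**2 + e**2 + lam * abs(phi)**2
    print(inv[0][0], (1j * k0 + e) / denom)       # diagonal entry vs (2.19)
    print(inv[0][1], (lam**0.5) * phi / denom)    # off-diagonal entry vs (2.20)
```

The determinant of $T(\mathbf{k})$ equals $-(k_0^2+E^2+\lambda|\phi|^2)$, which is why the same denominator appears in both entries and why a non-zero $\phi$ acts as a mass term in the propagator.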

As usual $\bar F_{L,\beta}(u,v)$ can be represented in terms of Feynman diagrams, obtained starting from the "graph elements" in Fig. 1 and "contracting" them in all possible ways, leaving no lines uncontracted. However such a representation is not suitable for proving convergence of the series (the best bounds we are able to find for the $k$-th order grow as $k!$), so that a different one has to be used.


Fig. 1. Graphical representation of (3.2); the wiggly line represents $\lambda V^{-1}\tilde v(x_0-y_0)$ and the oriented lines represent the fields $\psi^\pm$

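We decompose the propagator $g_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y})$ into a sum of two propagators supported in the regions of $k_0$ "large" and "small", respectively. The regions of $k_0$ large and small are defined in terms of a smooth compact support function $H_0(t)$, $t \in \mathbb{R}^+$, such that

$$H_0(t) = \begin{cases}1 & \text{if } t < 1/\gamma,\\ 0 & \text{if } t > 1,\end{cases} \tag{3.8}$$

with $\gamma > 1$. We define $h(k_0) = H_0(|k_0|)$ so that we can rewrite $g_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y})$ as

$$g_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y}) = g^{(u.v.)}_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y}) + g^{(i.r.)}_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y}), \tag{3.9}$$

where

$$g^{(i.r.)}_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y}) = \frac{1}{V\beta}\sum_{\mathbf{k}}e^{-i\mathbf{k}(\mathbf{x}-\mathbf{y})}\,h(k_0)\,[T^{-1}(\mathbf{k})]_{\varepsilon,\varepsilon'}, \tag{3.10}$$

$$g^{(u.v.)}_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y}) = \frac{1}{V\beta}\sum_{\mathbf{k}}e^{-i\mathbf{k}(\mathbf{x}-\mathbf{y})}\,(1-h(k_0))\,[T^{-1}(\mathbf{k})]_{\varepsilon,\varepsilon'} \tag{3.11}$$

are called the infrared and the ultraviolet propagator. We can then write $P_\phi(d\psi) = P_\phi(d\psi^{(u.v.)})P_\phi(d\psi^{(i.r.)})$, and in the Appendix we will show that

$$\int P_\phi(d\psi^{(u.v.)})\,e^{-\hat{\mathcal{V}}(\psi^{(i.r.)}+\psi^{(u.v.)})} = N_0\,e^{-\mathcal{V}^{(0)}(\psi^{(i.r.)})} \tag{3.12}$$

with $N_0$ a normalization factor and

$$\mathcal{V}^{(0)} = -\frac{\lambda}{V}\int d\mathbf{x}\,d\mathbf{y}\;\tilde v(x_0-y_0)\,\psi^+_{\mathbf{x},+}\psi^+_{\mathbf{x},-}\psi^-_{\mathbf{y},-}\psi^-_{\mathbf{y},+} + \sum_{n=1}^{\infty}\int d\mathbf{x}_1\ldots d\mathbf{x}_{2n}\;W^{(0)}_{2n,\underline\sigma,\underline\varepsilon}(\mathbf{x}_1,\ldots,\mathbf{x}_{2n})\prod_{i=1}^{2n}\psi^{\varepsilon_i}_{\mathbf{x}_i,\sigma_i}, \tag{3.13}$$

where $\underline\sigma = \sigma_1, \ldots, \sigma_{2n}$, $\underline\varepsilon = \varepsilon_1, \ldots, \varepsilon_{2n}$ and

$$\frac{1}{V\beta}\int d\mathbf{x}_1\cdots d\mathbf{x}_{2n}\,|W^{(0)}_{2n,\underline\sigma,\underline\varepsilon}(\mathbf{x}_1,\ldots,\mathbf{x}_{2n})| \le C^n|\lambda|^{\max(1,n-1)}(\kappa^2\beta^2)^{\max(1,n-1)}. \tag{3.14}$$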


Remark. The above bound is suitable for proving the convergence of the expansion for $F_{L,\beta,h}(u,v)$ or the two point function, but it is not strong enough for proving Lemmas 1 and 2, which show that the corrections to mean field are vanishing in the thermodynamic limit. We will show in §A1.2 that, indeed, the bound can be improved, obtaining an extra $V^{-1}$.

3.2. Convergence of the infrared integration. We define a distance $d(\mathbf{x},\mathbf{y})_{L,\beta} = (d_\beta(x_0,y_0), d_L(x_1,y_1), \ldots, d_L(x_d,y_d))$ as

$$d_\beta(x_0,y_0) = \frac{\beta}{\pi}\sin\frac{\pi}{\beta}(x_0-y_0), \qquad d_L(x_i,y_i) = \frac{L}{\pi}\sin\frac{\pi}{L}(x_i-y_i). \tag{3.15}$$

In order to perform the infrared integration we need the large distance behaviour of the infrared propagator.

Lemma 3. For any integer $N \ge 1$ the following bounds hold:

$$|g^{(i.r.)}_{\varepsilon,\varepsilon}(\mathbf{x}-\mathbf{y})| \le \frac{C_N}{1+[\beta^{-1}|d(\mathbf{x}-\mathbf{y})|]^N}, \tag{3.16}$$

$$|g^{(i.r.)}_{\varepsilon,-\varepsilon}(\mathbf{x}-\mathbf{y})| \le \frac{\sqrt\lambda|\phi|}{\sqrt\lambda|\phi|+\beta^{-1}}\,\frac{C_N}{1+[\beta^{-1}|d(\mathbf{x}-\mathbf{y})|]^N}. \tag{3.17}$$

Proof. The above bounds follow by integrating by parts. Consider integers $N_0, N_1, \ldots, N_d$ and note that, if $i = 1, \ldots, d$,

$$\Big[\prod_{i=1}^d d_L(x_i,y_i)^{N_i}\Big]d_\beta(x_0,y_0)^{N_0}\,g^{(i.r.)}_{\varepsilon,\varepsilon'}(\mathbf{x}-\mathbf{y}) = e^{-i\pi(L^{-1}\sum_{i=1}^d x_iN_i+x_0\beta^{-1}N_0)}\,(-i)^{N_0+\sum_iN_i}\,\frac{1}{V\beta}\sum_{\mathbf{k}}e^{-i\mathbf{k}(\mathbf{x}-\mathbf{y})}\Big[\prod_{i=1}^d\partial^{N_i}_{k_i}\Big]\partial^{N_0}_{k_0}\big[h(k_0)[T^{-1}(\mathbf{k})]_{\varepsilon,\varepsilon'}\big], \tag{3.18}$$

where $\partial_{k_i}$ and $\partial_{k_0}$ denote the discrete derivatives. Note that the sum over $\vec k$ is finite (with a number of terms equal to $V$), so that, if $N = N_0+\sum_{i=1}^dN_i > 1$, the absolute value of the r.h.s. of (3.18) can be bounded, uniformly in $\phi$, by

$$C_N\Big[\prod_{i=1}^d|d_L(x_i,y_i)|^{-N_i}\Big]|d_\beta(x_0,y_0)|^{-N_0}\,\frac{1}{\beta}\sum_{n_0\in\mathbb{N},\;|n_0|\le c\beta}\beta^{N+1}|2n_0+1|^{-N-1}, \tag{3.19}$$

where $C_N$, $c$ are constants. This implies the bounds (3.16), (3.17).

By the definition of truncated expectations it is

$$\int P_\phi(d\psi^{(i.r.)})\,e^{-\mathcal{V}^{(0)}(\psi^{(i.r.)})} = e^{\sum_{n=0}^{\infty}(-1)^n\frac{1}{n!}E^T(\mathcal{V}^{(0)};n)}, \tag{3.20}$$

where

$$E^T(X;n) = \frac{\partial^n}{\partial\lambda^n}\log\int P(d\psi)\,e^{\lambda X(\psi)}\Big|_{\lambda=0}. \tag{3.21}$$


We write (3.13) as

$$\sum_P\int d\mathbf{x}_P\,W(\mathbf{x}_P)\,\tilde\psi(P), \tag{3.22}$$

where $P$ is the set of field labels appearing in (3.13), $W(\mathbf{x}_P)$ are the kernels in (3.13), that is $\lambda V^{-1}\tilde v(x_0-y_0)$ or $W^{(0)}_{2n,\underline\sigma,\underline\varepsilon}(\mathbf{x}_1, \ldots, \mathbf{x}_{2n})$, and

$$\tilde\psi(P) = \prod_{f\in P}\psi^{\varepsilon(f)}_{\mathbf{x}(f),\sigma(f)}. \tag{3.23}$$

We can write

$$E^T(\mathcal{V}^{(0)};n) = \sum_{P_1,\ldots,P_n}\int d\mathbf{x}_{P_1}\ldots\int d\mathbf{x}_{P_n}\,W(\mathbf{x}_{P_1})\ldots W(\mathbf{x}_{P_n})\,E^T(\tilde\psi(P_1)\ldots\tilde\psi(P_n)). \tag{3.24}$$

The fermionic truncated expectations can also be expressed by the formula (see [Le] or [GM] for a detailed derivation)

$$E^T(\tilde\psi(P_1)\ldots\tilde\psi(P_s)) = \sum_T\prod_{l\in T}g_{\varepsilon_l,\varepsilon'_l}(\mathbf{x}_l-\mathbf{y}_l)\int dP_T(\mathbf{t})\,\det G^T(\mathbf{t}), \tag{3.25}$$

where $s \ge 1$, $\tilde\psi(P)$ is defined in (3.23), and

a) $T$ is a set of lines forming an anchored tree between the clusters of points $P_1, \ldots, P_s$, i.e. $T$ is a set of lines which becomes a tree if one identifies all the points in the same cluster.

b) $\mathbf{t} = \{t_{i,i'} \in [0,1],\ 1 \le i, i' \le s\}$, and $dP_T(\mathbf{t})$ is a probability measure with support on a set of $\mathbf{t}$ such that $t_{i,i'} = \mathbf{u}_i\cdot\mathbf{u}_{i'}$ for some family of vectors $\mathbf{u}_i \in \mathbb{R}^s$ of unit norm.

c) $G^T(\mathbf{t})$ is an $(N-s+1)\times(N-s+1)$ matrix, with $2N = |P_1|+\cdots+|P_s|$, whose elements are given by $G^T_{ij,i'j'} = t_{i,i'}\,g_{\varepsilon,\varepsilon'}(\mathbf{x}_{ij}-\mathbf{y}_{i'j'})$ with $(f^-_{ij}, f^+_{i'j'})$ not belonging to $T$.

If $s = 1$ the sum over $T$ is empty, but we can still use the above equation by interpreting the r.h.s. as 1 if $P_1$ is empty, and $\det G(P_1)$ otherwise. We bound the determinant using the Gram-Hadamard inequality, stating that, if $M$ is a square matrix with elements $M_{ij}$ of the form $M_{ij} = \langle A_i, B_j\rangle$, where $A_i$, $B_j$ are vectors in a Hilbert space with scalar product $\langle\cdot,\cdot\rangle$, then

$$|\det M| \le \prod_i\|A_i\|\cdot\|B_i\|, \tag{3.27}$$

where $\|\cdot\|$ is the norm induced by the scalar product.
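The Gram-Hadamard inequality (3.27) is what replaces the $n!$ growth of a term-by-term expansion of the determinant. As a sanity check, the sketch below builds a matrix $M_{ij} = \langle A_i, B_j\rangle$ from random real vectors and compares $|\det M|$ with $\prod_i\|A_i\|\,\|B_i\|$. Real Euclidean vectors, the helper names and the sizes are illustrative choices, not the Hilbert space used in the text.

```python
import math
import random

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d                      # a row swap flips the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d

def gram_hadamard_check(n, dim, seed):
    """Return (|det M|, prod ||A_i|| ||B_i||) for M_ij = <A_i, B_j>."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    B = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    dot = lambda x, y: sum(s * t for s, t in zip(x, y))
    M = [[dot(A[i], B[j]) for j in range(n)] for i in range(n)]
    bound = 1.0
    for i in range(n):
        bound *= math.sqrt(dot(A[i], A[i])) * math.sqrt(dot(B[i], B[i]))
    return abs(det(M)), bound

if __name__ == "__main__":
    for seed in range(3):
        print(gram_hadamard_check(5, 7, seed))
```

A bound of the form $C^n$ on an $n\times n$ determinant, rather than $n!\,C^n$ from summing its $n!$ terms in absolute value, is the gain exploited in (3.31).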


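Let $\mathcal{H} = \mathbb{R}^s\otimes\mathcal{H}_0$, where $\mathcal{H}_0$ is the Hilbert space of complex four dimensional vectors $F(\mathbf{k}) = (F_1(\mathbf{k}), \ldots, F_4(\mathbf{k}))$, $F_i(\mathbf{k})$ being a function on the set $D$, with scalar product

$$\langle F, G\rangle = \frac{1}{L\beta}\sum_{\mathbf{k}}\sum_{i=1}^4F_i^*(\mathbf{k})\,G_i(\mathbf{k}). \tag{3.28}$$

One can check that

$$G^T_{ij,i'j'}(\mathbf{t}) = t_{i,i'}\,g^{(i.r.)}_{\varepsilon_l,\varepsilon'_l}(\mathbf{x}_{ij}-\mathbf{y}_{i'j'}) = \big\langle\mathbf{u}_i\otimes A_{\mathbf{x}(f^-_{ij}),\varepsilon(f^-_{ij})},\;\mathbf{u}_{i'}\otimes B_{\mathbf{x}(f^+_{i'j'}),\varepsilon(f^+_{i'j'})}\big\rangle, \tag{3.29}$$

where $\mathbf{u}_i \in \mathbb{R}^s$, $i = 1, \ldots, s$, are the vectors such that $t_{i,i'} = \mathbf{u}_i\cdot\mathbf{u}_{i'}$, and

$$A_{\mathbf{x},\varepsilon}(\mathbf{k}) = e^{i\mathbf{k}\mathbf{x}}\sqrt{\frac{h(k_0)}{k_0^2+E^2(\vec k)+\lambda|\phi|^2}}\cdot\begin{cases}\big(-ik_0+E(\vec k),\,0,\,(\sqrt\lambda\phi)^{1/2},\,0\big), & \text{if } \varepsilon = +,\\ \big(0,\,(\sqrt\lambda\bar\phi)^{1/2},\,0,\,-ik_0-E(\vec k)\big), & \text{if } \varepsilon = -,\end{cases}$$

$$B_{\mathbf{y},\varepsilon}(\mathbf{k}) = e^{-i\mathbf{k}\mathbf{y}}\sqrt{\frac{h(k_0)}{k_0^2+E^2(\vec k)+\lambda|\phi|^2}}\cdot\begin{cases}\big(-ik_0+E(\vec k),\,(\sqrt\lambda\bar\phi)^{1/2},\,0,\,0\big), & \text{if } \varepsilon = +,\\ \big(0,\,0,\,(\sqrt\lambda\phi)^{1/2},\,-ik_0-E(\vec k)\big), & \text{if } \varepsilon = -.\end{cases} \tag{3.30}$$

Hence from (3.27), as $\|A\| \le C_1\sqrt{|\log\beta|}$ and $\|B\| \le C_2\sqrt{|\log\beta|}$ (see (3.19)), we find

$$|\det G^T(\mathbf{t})| \le C_3^{N-s+1}|\log\beta|^{N-s+1}, \tag{3.31}$$

where $C_1$, $C_2$, $C_3$ are constants. By using the above formula in (3.24) we get

$$|E^T(\mathcal{V}^{(0)};n)| \le \sum_{P_1,\ldots,P_n}(C_4|\log\beta|)^{\frac12[|P_1|+\ldots+|P_n|]-n+1}\int d\mathbf{x}_{P_1}\ldots\int d\mathbf{x}_{P_n}\,|W(\mathbf{x}_{P_1})|\ldots|W(\mathbf{x}_{P_n})|\sum_T\Big[\prod_{l\in T}|g^{(i.r.)}_{\varepsilon,\varepsilon'}(\mathbf{x}_l-\mathbf{y}_l)|\Big], \tag{3.32}$$

where we have used that $\int dP_T(\mathbf{t}) = 1$. The number of addends in $\sum_T$ is bounded by $n!\,C^n$ for a suitable constant $C$; this follows from Cayley's formula. Integration over the coordinates can be done by remembering that $T$ is a set of lines forming an anchored tree between the clusters of points $P_1, \ldots, P_n$, and we can associate to $T$ a tree $\tilde T$ connecting all the points $\mathbf{x}_{P_1}, \ldots, \mathbf{x}_{P_n}$ in the following way: $\tilde T$ is formed by the lines in $T$ and by a set of lines (chosen arbitrarily) connecting among them the points $\mathbf{x}_{P_i}$. Hence the integration over all the coordinates can be done by integrating over all the $n-1$ coordinate differences of the extreme points of the lines in $T$, using that, by Lemma 3, each integration contributes to the final bound for $E^T(\mathcal{V}^{(0)};n)$ a factor $\beta^{d+1}$;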


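the integration over the other coordinate differences can be performed by using (3.14) or the fact that

$$|\tilde v(x_0-y_0)| = \Big|\frac{1}{\beta}\sum_{k_0\ne0,\;k_0=\frac{2\pi}{\beta}n_0}e^{ik_0(x_0-y_0)}\,\frac{\kappa^2}{\kappa^2+k_0^2}\Big| \le \frac{\kappa^2\beta^2}{(2\pi)^2\beta}\sum_{n_0\ne0}\frac{1}{n_0^2} \le \beta^{-1}C(\kappa\beta)^2, \tag{3.33}$$

which implies

$$\int d\mathbf{x}\,V^{-1}|\tilde v(x_0)| \le C(\kappa\beta)^2 \tag{3.34}$$

if $C$ is a suitable constant. Finally we get, assuming $\kappa\beta\sqrt{|\log\beta|} \le \frac12C^{-1}$ in order to sum over $P_i$,

$$|E^T(\mathcal{V}^{(0)};n)| \le n!\prod_{i=1}^n\Big[\sum_{P_i}C^{|P_i|}|\lambda|^{\max(1,|P_i|/2-1)}(\kappa^2\beta^2)^{\max(1,|P_i|/2-1)}|\log\beta|^{|P_i|/2-1}\Big]\times(\beta V)\,\beta^{(n-1)(d+1)}|\log\beta|$$
$$\le (\beta V)\,n!\,C^n\lambda^n(\kappa^2\beta^2)^n|\log\beta|^{n+1}\beta^{(d+1)n}\beta^{-(d+1)} \le (\beta V)\,C^n\lambda^n(\kappa^2\beta^{d+3+\eta})^n\beta^{-(d+1)}\,n! \tag{3.35}$$

for a constant $0 \le \eta < 1$. Hence, by assuming $\kappa \le C^{-1}\lambda^{-\frac12}\beta^{\frac{-d-3-\eta}{2}} \equiv \kappa_0$ and using (3.20), it follows that

$$|\bar F_{L,\beta}(u,v)| \le C\lambda\kappa^2\beta^{2+\eta}. \tag{3.36}$$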
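The elementary estimate (3.33) can be made explicit: bounding each term by $\kappa^2/p_0^2$ and using $\sum_{n\ge1}n^{-2} = \pi^2/6$ gives $\sup|\tilde v| \le \kappa^2\beta/12$, so the constant in $\beta^{-1}C(\kappa\beta)^2$ can be taken as $C = 1/12$. The sketch below checks this against a truncated sum and against the closed form $\frac{\kappa}{2}\coth(\kappa\beta/2)-\beta^{-1}$, a standard value of this Matsubara sum that is not taken from the text. Function names and the truncation are illustrative.

```python
import math

def v_tilde_sup(kappa, beta, n_max=100000):
    """Sum of the absolute values of the terms in (3.3), truncated at |n0| <= n_max."""
    total = 0.0
    for n in range(1, n_max + 1):
        p0 = 2.0 * math.pi * n / beta
        total += 2.0 * kappa**2 / (kappa**2 + p0**2)   # +n0 and -n0 terms
    return total / beta

def v_tilde_sup_exact(kappa, beta):
    """Closed form of the full sum: (kappa/2) coth(kappa*beta/2) - 1/beta."""
    return 0.5 * kappa / math.tanh(0.5 * kappa * beta) - 1.0 / beta

if __name__ == "__main__":
    for kappa, beta in ((0.3, 5.0), (1.0, 2.0), (2.0, 10.0)):
        bound = kappa**2 * beta / 12.0                 # = (1/12) * (kappa*beta)^2 / beta
        print(v_tilde_sup(kappa, beta), v_tilde_sup_exact(kappa, beta), bound)
```

For $\kappa\beta \ll 1$ the exact value approaches the bound $\kappa^2\beta/12$, so the estimate is sharp in the long-range regime used in the theorem.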

3.3. Extracting a volume factor. The above analysis says that $\bar F_{L,\beta,h}$, which is the correction to mean field, is given by a convergent expansion if the interaction range is long enough. A closer look at the Feynman graphs shows, however, that each graph obeys a much better bound, as it vanishes as $V \to \infty$. Expanding in terms of Feynman graphs corresponds to evaluating the determinants as sums of $n!$ terms, and gives up the combinatorially better bound based on the Gram-Hadamard inequality. However an extra factor $V^{-1}$ can be gained in the estimates without expanding in graphs (i.e. without losing convergence). We consider first the case in which all the kernels $W(\mathbf{x}_P)$ in (3.24) are associated to $\lambda V^{-1}\tilde v$, so that we have to bound

$$\int d\mathbf{x}_1\int d\mathbf{y}_1\int d\underline{\mathbf{x}}\int d\underline{\mathbf{y}}\;\Big[\prod_{i=1}^n\frac{\lambda}{V}\tilde v(x_{0,i}-y_{0,i})\Big]E^T(\tilde\psi(\mathbf{x}_1\cup\mathbf{y}_1)\ldots\tilde\psi(\mathbf{x}_n\cup\mathbf{y}_n)), \tag{3.37}$$

where $d\underline{\mathbf{x}} = d\mathbf{x}_2\ldots d\mathbf{x}_n$, $d\underline{\mathbf{y}} = d\mathbf{y}_2\ldots d\mathbf{y}_n$ and we defined $\tilde\psi(\mathbf{x}) = \psi^+_{\mathbf{x},+}\psi^+_{\mathbf{x},-}$, $\tilde\psi(\mathbf{y}) = \psi^-_{\mathbf{y},-}\psi^-_{\mathbf{y},+}$ and $\tilde\psi(\mathbf{x}\cup\mathbf{y}) = \psi^+_{\mathbf{x},+}\psi^+_{\mathbf{x},-}\psi^-_{\mathbf{y},-}\psi^-_{\mathbf{y},+}$. We use a well known property


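of fermionic truncated expectations, see for instance [Le]:

$$E^T(\tilde\psi(\mathbf{x}\cup\mathbf{y})\tilde\psi(P_2)\ldots\tilde\psi(P_n)) = \sum_{K_1,K_2,\;K_1\cap K_2=\emptyset,\;K_1\cup K_2=\{2,\ldots,n\}}(-1)^{\pi}\,E^T\Big(\tilde\psi(\mathbf{x})\prod_{j\in K_1}\tilde\psi(P_j)\Big)\,E^T\Big(\tilde\psi(\mathbf{y})\prod_{j\in K_2}\tilde\psi(P_j)\Big) + E^T(\tilde\psi(\mathbf{x})\tilde\psi(\mathbf{y})\tilde\psi(P_2)\ldots\tilde\psi(P_n)), \tag{3.38}$$

where $(-1)^{\pi}$ is the parity of the permutation necessary to bring the Grassmann variables on the r.h.s. of (3.38) to the original order. Note that the number of terms in the sum in the r.h.s. of (3.38) is bounded by $C^n$ for a suitable constant $C$. By (3.38) we can write $E^T(\mathcal{V}^{(0)};n)$ in (3.20) in the following way ($n \ge 2$):

$$E^T(\mathcal{V}^{(0)};n) = \int d\mathbf{x}_1d\mathbf{y}_1\,\frac{\lambda}{V}\tilde v(x_{0,1}-y_{0,1})\sum_{K_1,K_2;\;K_1\cap K_2=\emptyset,\;K_1\cup K_2=\{2,\ldots,n\}}(-1)^{\pi}H_2(\mathbf{x}_1;K_1)H_2(\mathbf{y}_1;K_2) + \int d\mathbf{x}_1d\mathbf{y}_1\,\frac{\lambda}{V}\tilde v(x_{0,1}-y_{0,1})\,H_4(\mathbf{x}_1,\mathbf{y}_1), \tag{3.39}$$

where

$$H_2(\mathbf{x}_1;K_1) = \Big[\prod_{i\in K_1}\int d\mathbf{x}_id\mathbf{y}_i\Big]\Big[\prod_{i\in K_1}\frac{\lambda}{V}\tilde v(x_{0,i}-y_{0,i})\Big]E^T\Big(\tilde\psi(\mathbf{x}_1)\prod_{i\in K_1}\tilde\psi(\mathbf{x}_i\cup\mathbf{y}_i)\Big), \tag{3.40}$$

$$H_2(\mathbf{y}_1;K_2) = \Big[\prod_{i\in K_2}\int d\mathbf{x}_id\mathbf{y}_i\Big]\Big[\prod_{i\in K_2}\frac{\lambda}{V}\tilde v(x_{0,i}-y_{0,i})\Big]E^T\Big(\tilde\psi(\mathbf{y}_1)\prod_{i\in K_2}\tilde\psi(\mathbf{x}_i\cup\mathbf{y}_i)\Big), \tag{3.41}$$

$$H_4(\mathbf{x}_1,\mathbf{y}_1) = \int d\underline{\mathbf{x}}\,d\underline{\mathbf{y}}\;\Big[\prod_{i=2}^n\frac{\lambda}{V}\tilde v(x_{0,i}-y_{0,i})\Big]E^T(\tilde\psi(\mathbf{x}_1)\tilde\psi(\mathbf{y}_1)\tilde\psi(\mathbf{x}_2\cup\mathbf{y}_2)\ldots\tilde\psi(\mathbf{x}_n\cup\mathbf{y}_n)). \tag{3.42}$$

As is clear from Fig. 2, in (3.39) we separate the terms becoming disconnected by cutting a wiggly line from the terms remaining connected.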

Fig. 2. Graphical representation of (3.38)


The crucial point is that, by translation invariance, $H_2(x; K_1)$ and $H_2(y; K_2)$ are independent of $x$ and $y$, so that the first addend in (3.39) vanishes (because $\tilde v(p_0) = 0$ if $p_0 = 0$):

\[ \frac{1}{\beta}\sum_{\substack{p_0\neq 0 \\ p_0 = \frac{2\pi}{\beta} n_0}} \frac{\kappa^2}{\kappa^2+p_0^2}\,\delta_{p_0,0}\, H_2(0; K_1)\, H_2(0; K_2) = 0. \tag{3.43} \]

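On the other hand we can write

\[ \frac{1}{\beta V}\int dx_1\, dy_1\, \frac{\lambda}{V}\,\big|\tilde v(x_{0,1}-y_{0,1})\, H_4(x_1, y_1)\big| \le \lambda V^{-1}\beta^{-1}(\kappa\beta)^2\, \frac{1}{\beta V}\int dx_1\, dy_1\, |H_4(x_1, y_1)|. \tag{3.44} \]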

The truncated expectation in the r.h.s. of (A1.18) is expressed by the formula (3.25) and the bounds leading to (3.35) can be repeated; we get, for n ≥ 1, 1 λ (κβ)2 β −1 Vβ V

dx

dy|H4 (x, y)| ≤

1 1 n C (λκ 2 β 2 )n β (d+1)n | log β|n n! βV

(3.45)

so that for $\kappa \le \kappa_0$ and summing over $n$ (dividing by $n!$) we get the bound (2.15). Note that, with respect to the bound (3.35), we have a fermionic integration, giving an extra $\beta^{d+1}\log\beta^{-1}$, replacing an integration over $\tilde v$, giving an extra $V^{-1}\beta^{-1}$.

Remark. The bound (3.45) has been found assuming that all the kernels $W(x_P)$ in (3.24) are associated to $\lambda V^{-1}\tilde v$: this is equivalent to the replacement of the fermionic propagator $g_{\varepsilon,\varepsilon}(x-y)$ (3.9) with $g^{(i.r.)}_{\varepsilon,\varepsilon}(x-y)$, that is, it is equivalent to imposing an ultraviolet cutoff in the imaginary time variable. In the full fermionic propagator the sum over $k_0$ is unbounded and the summand is $O(k_0^{-1})$, so that the sum is improperly convergent in the limit $M \to \infty$. However, it is well known that the ultraviolet cutoff on the "time direction" can be removed in any dimension for lattice fermion systems, see [GLM], and this lack of convergence does not affect the large distance properties. We repeat such analysis for completeness in the Appendix, showing that indeed (3.45) holds also in the general case.

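3.4. The integration of S. If $S^{\varepsilon,\varepsilon}_{L,\beta}(x, y, u, v)$ is the Fourier transform of $S^{\varepsilon,\varepsilon}_{L,\beta,n}(k, u, v)$ in (2.13), we can write

\[ S^{\varepsilon,\varepsilon}_{L,\beta}(x, y, u, v) = g_{\varepsilon,\varepsilon}(x, y) + \sum_{n=1}^{\infty}\frac{1}{n!}\,\bar S^{\varepsilon,\varepsilon}_{L,\beta,n}(x, y, u, v), \tag{3.46} \]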


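and, using again (3.38), $\bar S^{\varepsilon,\varepsilon}_{L,\beta,n}(x, y, u, v)$ can be written for $n \ge 2$ as

\[ \begin{aligned} \bar S^{\varepsilon,\varepsilon}_{L,\beta,n}(x, y; u, v) ={}& \int dx_1\, dy_1\, \frac{\lambda}{V}\,\tilde v(x_{0,1}-y_{0,1}) \sum_{\substack{K_1,K_2;\ K_1\cap K_2=\emptyset \\ K_1\cup K_2=\{2,\ldots,n\}}} \bar H_4(x_1, x, y; K_1)\, H_2(y_1; K_2) \\ &+ \int dx_1\, dy_1\, \frac{\lambda}{V}\,\tilde v(x_{0,1}-y_{0,1}) \sum_{\substack{K_1,K_2;\ K_1\cap K_2=\emptyset \\ K_1\cup K_2=\{2,\ldots,n\}}} H_2(x_1; K_1)\, \bar H_4(y_1, x, y; K_2) \\ &+ \int dx_1\, dy_1\, \frac{\lambda}{V}\,\tilde v(x_{0,1}-y_{0,1})\, \bar H_6(x_1, y_1, x, y), \end{aligned} \tag{3.47} \]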

where H¯ 2 , H¯ 4 , H¯ 6 are defined analogously to (3.40),(3.41),(A1.18). Again the first two terms in (3.47) vanish by translation invariance. The last term admits the bound ! ! !λ ! 1 λ −1 1 ! ¯ ˜ 0,1 − y0,1 ) H6 (x1 , y1 , x, y)!! ≤ β (κβ)2 dxdy dx1 dy1 ! v(x βV V βV V dxdy dx1 dy1 | H¯ 6 (x1 , y1 , x, y)| ≤ Cn

1 1 n C (λκ 2 β 2 )n β (d+1)(n+2) | log β|max(0,n−1) n! βV

(3.48)

from which (2.21) follows.

Fig. 3. Graphical representation of (3.45); the dotted lines represent the external lines

Appendix A1. The Ultraviolet Integration

A1.1. Multiscale analysis. The integration of the ultraviolet part (3.12) can be done by a multiscale analysis; it is quite standard and we refer to [GLM] or § 3 of [BM] for details. It is convenient to introduce a multiscale decomposition of the ultraviolet part of the propagator by writing

\[ g^{(u.v.)}_{\varepsilon,\varepsilon}(x-y) \equiv g^{[1,N]}_{\varepsilon,\varepsilon}(x-y) = \sum_{k=1}^{N} g^{(k)}_{\varepsilon,\varepsilon}(x-y), \tag{A1.1} \]


Fig. 4. An example of tree τ

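where

\[ g^{(k)}_{\varepsilon,\varepsilon}(x-y) = \frac{1}{V\beta}\sum_{\mathbf{k}\in D} h_k(k_0)\, e^{-i\mathbf{k}(x-y)}\, g_{\varepsilon,\varepsilon}(\mathbf{k}) \tag{A1.2} \]

with $h_k(k_0) = H_0(\gamma^{-k}|k_0|) - H_0(\gamma^{-k+1}|k_0|)$, and $N$ and $M$ are proportional by the compact support properties of $h_k$. Note that $\lim_{N\to\infty} g^{[1,N]}(x-y) = g^{(u.v.)}(x-y)$ and that, for any integer $K \ge 0$, $g^{(k)}(x-y)$ satisfies the bound

\[ |g^{(k)}_{\varepsilon,\varepsilon}(x-y)| \le \frac{C_K}{1 + \big(\gamma^{k}|d_\beta(x_0-y_0)| + |d_L(\vec x-\vec y)|\big)^{K}}. \tag{A1.3} \]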

We associate to any propagator $g^{(k)}_{\varepsilon,\varepsilon}(x, y)$ a Grassmann field $\psi^{(k)}$ and an integration $P(d\psi^{(k)})$ with propagator $g^{(k)}(x-y)$. We can write $V^{(0)}$ as:

\[ V^{(0)}(\phi) + V\beta E_1 = -\lim_{N\to\infty}\log\int P(d\psi^{(1)})\cdots P(d\psi^{(N)})\, e^{-V(\psi^{[1,N]}+\phi)}. \tag{A1.4} \]

We can integrate iteratively the fields on scale $N, N-1, \ldots, h+1$ and after each integration we can rewrite the r.h.s. of (A1.4) in terms of a new effective potential $V^{(h)}$:

\[ (\mathrm{A1.4}) = \lim_{N\to\infty}\Big\{ V\beta\sum_{j=h+1}^{N} E_j - \log\int P(d\psi^{(1)})\cdots P(d\psi^{(h)})\, e^{-V^{(h)}(\psi^{[1,h]}+\phi)} \Big\}, \tag{A1.5} \]

with $V^{(h)}(\psi^{[1,h]})$ admitting a representation in terms of trees defined in the following way:

1) Let us consider the family of all trees which can be constructed by joining a point $r$, the root, with an ordered set of $n \ge 1$ points, the endpoints of the unlabeled tree, so that $r$ is not a branching point. $n$ will be called the order of the unlabeled tree and the branching points will be called the non-trivial vertices. The unlabeled trees are partially ordered from the root to the endpoints in the natural way; we shall use the symbol $<$ to denote the partial order. Two unlabeled trees are identified if they can be superposed by a suitable continuous deformation, so that the endpoints with the same index coincide. It is then easy to see that the number of unlabeled trees with $n$ endpoints is bounded by $4^n$. We shall consider also the labeled trees (to be called simply trees in the following); they are defined by associating some labels with the unlabeled trees, as explained in the following items.

2) We associate a label $h \ge 0$ with the root and we denote by $\mathcal{T}_{(h,N),n}$ the corresponding set of labeled trees with $n$ endpoints. Moreover, we introduce a family of vertical lines, labeled by an integer taking values in $[h, N]$, and we represent any tree $\tau \in \mathcal{T}_{(h,N),n}$ so that, if $v$ is an endpoint or a non-trivial vertex, it is contained in a vertical line with index $h_v > h$, to be called the scale of $v$, while the root is on the line with index $h$. The tree will intersect the vertical lines in a set of points different from the root and the endpoints; these points will be called trivial vertices. The set of the vertices of $\tau$ will be the union of the endpoints, the trivial vertices and the non-trivial vertices. Note that, if $v_1$ and $v_2$ are two vertices and $v_1 < v_2$, then $h_{v_1} < h_{v_2}$. Moreover, there is only one vertex immediately following the root, which will be denoted $v_0$ and cannot be an endpoint; its scale is $h+1$.

3) With each endpoint $v$ of scale $h_v$ we associate $\hat V(\psi^{(\le h_v+1)})$ (2.3). Given a vertex $v$ which is not an endpoint, $x_v$ will denote the family of all space-time points associated with one of the endpoints following $v$.

4) We introduce a field label $f$ to distinguish the field variables appearing in the terms $\hat V$ associated with the endpoints. The set of field labels associated with the endpoint $v$ will be called $I_v$. Analogously, if $v$ is not an endpoint, we shall call $I_v$ the set of field labels associated with the endpoints following the vertex $v$; $x(f)$, and $\sigma(f)$ will denote the space-time point, the $\sigma$ index and the $\omega$ index, respectively, of the field variable with label $f$.

We call a trivial tree a tree containing only the root and an endpoint. The effective potential can then be written in the following way:

\[ V^{(h)}(\psi^{(\le h)}) + V\beta\tilde E_{h+1} = \sum_{n=1}^{\infty}\ \sum_{\tau\in\mathcal{T}_{(h,N),n}} V^{(h)}(\tau, \psi^{(\le h)}), \tag{A1.6} \]

where, if $v_0$ is the first vertex of the non-trivial tree $\tau$ and $\tau_1, \ldots, \tau_s$ ($s = s_{v_0}$) are the subtrees of $\tau$ with root $v_0$, $V^{(h)}(\tau, \psi^{(\le h)})$ is defined inductively by the relation, if $s > 1$:

\[ V^{(h)}(\tau, \psi^{(\le h)}) = \frac{(-1)^{s+1}}{s!}\, E^T_{h+1}\big[ V^{(h+1)}(\tau_1, \psi^{(\le h+1)}) \ldots V^{(h+1)}(\tau_s, \psi^{(\le h+1)}) \big]. \tag{A1.7} \]

If $s = 1$ then $V^{(h)}(\tau, \psi^{(\le h)}) = E_{h+1}[V^{(h+1)}(\tau_1, \psi^{(\le h+1)})]$ if $\tau_1$ is not a trivial tree; on the contrary, if $\tau_1$ is trivial then $V^{(h)}(\tau, \psi^{(\le h)}) = E_{h+1}[\hat V^{(h+1)}(\psi^{(\le h+1)})] - \hat V^{(h+1)}(\psi^{(\le h)})$. By iterating (A1.7) we can write $V^{(h)}(\tau, \psi^{(\le h)})$ in the following way. We associate with any vertex $v$ of the tree a subset $P_v$ of $I_v$, the external fields of $v$. These subsets must satisfy various constraints. First of all, if $v$ is not an endpoint and $v_1, \ldots, v_{s_v}$ are the vertices immediately following it, then $P_v \subset \cup_i P_{v_i}$; if $v$ is an endpoint, $P_v = I_v$. We shall denote by $Q_{v_i}$ the intersection of $P_v$ and $P_{v_i}$; this definition implies that $P_v = \cup_i Q_{v_i}$. The subsets $P_{v_i}\setminus Q_{v_i}$, whose union will be made, by definition, of the internal fields of $v$, have to be non-empty if $s_v > 1$, or if $s_v = 1$ and $v_1$ is an endpoint. We call $\chi$-vertices the vertices of $\tau$ such that the set of internal lines is not empty; $V_\chi(\tau)$ will denote the set of all $\chi$-vertices of $\tau$. Given $\tau \in \mathcal{T}_{(h,N),n}$, there are many possible choices of the subsets $P_v$, $v \in \tau$, compatible with all the constraints; we shall denote by $\mathcal{P}_\tau$ the family of all these choices and by $\mathbf{P}$ the elements of $\mathcal{P}_\tau$.


Then we can write

\[ V^{(h)}(\tau, \psi^{(\le h)}) = \sum_{\mathbf{P}\in\mathcal{P}_\tau} V^{(h)}(\tau, \mathbf{P}); \tag{A1.8} \]

$V^{(h)}(\tau, \mathbf{P})$ can be represented as

\[ V^{(h)}(\tau, \mathbf{P}) = \int dx_{v_0}\, \tilde\psi^{(\le h)}(P_{v_0})\, K^{(h+1)}_{\tau,\mathbf{P}}(x_{v_0}), \tag{A1.9} \]

with $K^{(h+1)}_{\tau,\mathbf{P}}(x_{v_0})$ defined inductively (recall that $h_{v_0} = h+1$) by the equation, valid for any $v \in \tau$ which is not an endpoint,

\[ K^{(h_v)}_{\tau,\mathbf{P}}(x_v) = \frac{1}{s_v!}\prod_{i=1}^{s_v}\big[K^{(h_v+1)}_{v_i}(x_{v_i})\big]\; E^T_{h_v}\big[\tilde\psi^{(h_v)}(P_{v_1}\setminus Q_{v_1}) \ldots \tilde\psi^{(h_v)}(P_{v_{s_v}}\setminus Q_{v_{s_v}})\big], \tag{A1.10} \]

where, if $v$ is an endpoint, $K^{(h_v)}_v(x_v)$ is the kernel $\lambda V^{-1}\tilde v(x_0-y_0)$. By using the representation of the truncated expectation analogous to (3.21) and the Gram inequality we get that the contribution from a tree $\tau \in \mathcal{T}_{(1,h),n}$ associated to a kernel with $2l$ external legs can be bounded as (see §3.14 [BM] for details in a similar case):

\[ \frac{1}{V\beta}\int dx_1\cdots dx_{2l}\, |W^{(h)}_{2l}(\tau; x_1; \ldots; x_{2l})| \le C^n\, |\lambda(\kappa\beta)^2|^n\, \gamma^{-h(n-1)} \prod_{v\in V_\chi(\tau)} \gamma^{-(h_v-h_{v'})(n_v-1+z_v)}, \tag{A1.11} \]

where v is the χ -vertex immediately preceding v on τ , n v is the number of endpoints following v on τ and z v = 1 if n v = 1 and 0 otherwise. In deriving (A1.11) we have used that the vertices v with n v = 1 are associated to the factor (tadpole contribution) λ − + (h v ) ψy,−σ gσ,σ (x − y) (A1.12) dx dyv(x ˜ 0 − y0 )ψx,−σ V σ as the contraction of ψ + ψ + or ψ − ψ − in Vˆ is vanishing by momentum conservation; then (h ) we can bound the kernel of (A1.12) using the propagator gσ,σv (x − y) to integrate over the coordinates (instead of the interaction) , so obtaining the bound Cλ(κβ)2 V −1 γ −h v . Moreover, calling Tv the anchored tree in the representation (3.25) for EhTv , the integration over the coordinates has been done over the tree T˜ , obtained adding to T = ∪v∈τ Tv the lines connecting the couple of points associated to the endpoints. To each of the line (h ) (h ) of T˜ is associated a propagator gε,εv or a factor λV −1 v; ˜ the integration of gε,εv produces a factor γ −(d+1)h v and the integration of λV −1 v˜ a factor (κβ)2 . In order to sum over τ and P we note that the number of unlabeled trees is ≤ 4n ; fix an unlabeled tree, the number of terms in the sum over the various labels of the tree is bounded by C n , except the sums over the scale labels and the sets P. Regarding the sum over T , it is empty if sv = 1. If sv > 1 and Nvi ≡ |Pvi | − |Q vi |, the number


of anchored trees with $d_i$ lines branching from the vertex $v_i$ can be bounded, by using Cayley's formula, by

\[ \frac{(s_v-2)!}{(d_1-1)!\cdots(d_{s_v}-1)!}\, N_{v_1}^{d_1}\cdots N_{v_{s_v}}^{d_{s_v}}; \tag{A1.13} \]

hence the number of addenda in $\sum_{T\in\mathbf{T}}$ is bounded by $\prod_{v\ \mathrm{not\ e.p.}} s_v!\, C^{\sum_{i=1}^{s_v}|P_{v_i}|-|P_v|}$. In order to bound the sums over the scale labels and $\mathbf{P}$ we first use the inequality, following from (A1.11),

\[ \prod_{v\in V_\chi(\tau)} \gamma^{-(h_v-h_{v'})(n_v-1+z_v)} \le \Big[\prod_{v\in V_\chi(\tau)} \gamma^{-\frac{1}{40}(h_v-h_{v'})}\Big]\Big[\prod_{v\in V_\chi(\tau)} \gamma^{-\frac{|P_v|}{40}}\Big]. \tag{A1.14} \]

The factors $\gamma^{-\frac{1}{40}(h_{\tilde v}-h_{\tilde v'})}$ in the r.h.s. of (A1.14) allow us to bound the sums over the scale labels by $C^n$. The sum over $\mathbf{P}$ can be bounded by using the following combinatorial inequality. Let $\{p_v, v \in \tau\}$ be a set of integers such that $p_v \le \sum_{i=1}^{s_v} p_{v_i}$ for all $v \in \tau$ which are not endpoints; then (see for instance App. 6 of [GM])

\[ \sum_{\mathbf{P}}\ \prod_{v\in V_\chi(\tau)} \gamma^{-\frac{|P_v|}{40}} \le \prod_{v\in V_\chi(\tau)}\ \sum_{p_v} \gamma^{-\frac{p_v}{40}}\, B\Big(\sum_{i=1}^{s_v} p_{v_i},\, p_v\Big) \le C^n, \tag{A1.15} \]

where $B(n, m)$ is the binomial coefficient. This completes the proof of (3.14).
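The counting in (A1.13) rests on Cayley's formula: the number of labeled trees on $s$ vertices in which vertex $i$ has degree $d_i$ is $(s-2)!/\prod_i (d_i-1)!$. The following minimal Python sketch (the function names are illustrative and not taken from the paper) checks this count against a brute-force enumeration of Prüfer sequences, under the standard fact that a vertex of degree $d$ appears $d-1$ times in the Prüfer sequence of a tree.

```python
from collections import Counter
from itertools import product
from math import factorial


def trees_with_degrees(degrees):
    """Cayley's count of labeled trees with the prescribed vertex degrees."""
    s = len(degrees)
    if s < 2 or min(degrees) < 1 or sum(degrees) != 2 * (s - 1):
        return 0  # not the degree sequence of a tree
    denom = 1
    for d in degrees:
        denom *= factorial(d - 1)
    return factorial(s - 2) // denom


def brute_force_degree_counts(s):
    """Count trees per degree sequence by enumerating all Pruefer sequences."""
    counts = Counter()
    for seq in product(range(s), repeat=s - 2):
        occurrences = Counter(seq)
        # vertex i has degree 1 + (number of times i occurs in the sequence)
        counts[tuple(1 + occurrences[i] for i in range(s))] += 1
    return counts


counts = brute_force_degree_counts(5)
assert all(trees_with_degrees(d) == c for d, c in counts.items())
assert sum(counts.values()) == 5 ** 3  # Cayley: s^(s-2) labeled trees
```

Summing the formula over all admissible degree sequences recovers $s^{s-2}$, which is the source of the factorial-times-$C^n$ bound on the number of anchored trees.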

A1.2. Extracting a volume factor. The proof of (2.15), (2.21) in the general case is essentially identical to the one in §3.3. We write F L ,β as e−βV F L ,β = lim

N →∞

P (≤N ) (dψ)e−V (ψ) =

∞ 1 T E (V; n), n! ≤N

(A1.16)

n=1

(i.r.)

where P (≤N ) (dψ) is the fermionic integration with respect to the full propagator gε,ε (x− y) + have

[1,N ] gε,ε (x

− y). Each truncated expectation can be decomposed as in §3.3 and we T (V; n) E≤N

=

dx1 dy1

λ v(x ˜ 0,1 − y0,1 )H4 (x1 , y1 ), V

(A1.17)

where n λ T ˜ 1 )ψ(y ˜ 1 )ψ(x ˜ 2 ∪y2 ) . . . ψ(x ˜ n ∪yn )), v(x0,i−y0,i )]E≤N (ψ(x H4 (x1 , y1 ) = dx dy[ V i=2

(A1.18) where the analogue of the first addend of (3.39) is vanishing. This means that λ e−βV F L ,β = dxdy v(x ˜ 0 − y0 )G N (x, y), (A1.19) V


where G N (x, y) =

∂2 log ∂φx ∂ φ¯ y


P(dψ (i.r.) )P(dψ (1) ) · · · P(dψ (N ) )

×e−V (ψ)+

σ

− + ψ+ − ¯ dx[φ(x)ψx,σ x,−σ +φ(x)ψx,−σ ψx,σ ]

|φ=0 . (A1.20)

G N (x, y) is given by a sum over trees similar to (A1.6) with the only difference that T(−1,N )n is the set of trees defined after (A1.5) with root scale h ≥ −1 (instead of h ≥ 0), and to a non-trivial vertex with scale 0 is associated E0T , the fermionic integration with (0) (i.r ) respect to the propagator gε,ε (x, y) ≡ gε,ε (x, y); moreover, in addition to the normal endpoints associated to V , there are two special endpoints associated to terms linear in φ in the exponent of the r.h.s. of (A1.20). By repeating the analysis from (3.20) to (3.35) we get G N (x, y) =

∂2 log ∂φx ∂ φ¯ y × P(dψ (i.r.) ) ×e−V (ψ

(i.r.) )+

σ

+(i.r.)

dx[φ(x)ψx,σ

−(i.r.)

−(i.r.)

¯ ψx,−σ +φ(x)ψ x,−σ ψx,σ +(i.r.)

]+B0 (ψ (i.r.) ,φ)

|φ=0 , (A1.21)

where $B_0(\psi^{(i.r.)}, \phi)$ is a sum of monomials in $\psi^{(i.r.)}$ and $\phi, \bar\phi$, given by a sum over trees bounded by the r.h.s. of (A1.11) in which $n_v$ is the sum of normal and special endpoints and $[\lambda(\kappa\beta)^2]^n$ is replaced by $[\lambda(\kappa\beta)^2]^{n-n_s}$, with $n$ the number of endpoints in $\tau$ and $n_s$ the number of $\phi, \bar\phi$ fields. Repeating the analysis leading from (3.20) to (3.35) we can bound the r.h.s. of (A1.20) by

\[ \frac{1}{V\beta}\,\frac{\lambda}{V}\Big[\sup_{r_0}|\tilde v(r_0)|\Big]\int dx\, dy\, |G_N(x, y)| \le \frac{C_1}{V}\,\lambda\kappa^2\beta^{d+2}\log\beta. \tag{A1.22} \]

By repeating a similar analysis also for $S_{L,\beta}(k, u, v)$, (2.15), (2.21) again follow.

References

[BCS] Bardeen, J., Cooper, L.N., Schrieffer, J.R.: Phys. Rev. 108, 1175 (1957)
[B] Bogolubov, N.N.: J.E.T.P. 7, 41 (1958)
[BG] Benfatto, G., Gallavotti, G.: J. Stat. Phys. 59(3-4), 541–664 (1990)
[BM] Benfatto, G., Mastropietro, V.: Rev. Math. Phys. 13, 11,1323–1425 (2001)
[BR] Bardeen, J., Rickayzen, G.: Phys Rev 118, 936 (1960)
[BZT] Bogolubov, N.N., Zubarev, D.N., Tserkovnikov, I.A.: Sov. Phys.-Doklady 2, 535 (1958)
[CEKO] Carlson, E.W., Emery, V.J., Kivelson, S.A., Orgad, D.: In: Physics of Conventional and Unconventional superconductors, K.H. Bennemann, J.B. Keherson, eds. Berlin Heidelberg New York: Springer, 2002
[GLM] Gallavotti, G., Lebowitz, J., Mastropietro, V.: J. Stat. Phys. 108(5), 831–861 (2002)
[GM] Gentile, G., Mastropietro, V.: Phys. Rep. 352, 273–437 (2001)
[FMRT] Feldman, J., Magnen, J., Rivasseau, V., Trubowitz, E.: Europhys. Lett. 24(7), 521–526 (1993)
[H] Haag, R.: Nuovo Cimento 25, 287 (1962)
[K] Kupiainen, A.: Commun. Math. Phys. 73, 273–294 (1980)
[KMR] Kopper, K., Magnen, J., Rivasseau, V.: Commun. Math. Phys. 169, 121–180 (1995)
[L] Lehmann, D.: Commun. Math. Phys. 198, 427–468 (1998)
[Le] Lesniewski, A.: Commun. Math. Phys. 108, 437–467 (1987)
[LMP] Lebowitz, J., Presutti, E., Mazel, A.E.: Phys. Rev. Lett. 80, 4701–4704 (1998)
[M] Muhlschlegel, B.: J. Math. Phys. 3, 522–530 (1962)
[ML] Mattis, D., Lieb, L.H.: J. Math. Phys. 2, 602–609 (1961)
[MVH] Marchetti, D.H.U., Faria da Veiga, P.A., Hurd, T.R.: Commun. Math. Phys. 179, 623–646 (1996)
[NO] Negele, J.W., Orland, H.: Quantum Many-Particle Systems, New York: Addison Wesley, (1998)
[Sh] Shankar, R.: Rev. Mod. Phys. 66, 129–192 (1994)
[SHML] Salmhofer, M., Honerkamp, C., Metzner, W., Lauscher, O.: http://arxiv.org/list/ cond-mat 0409725, 2004
[T] Thirring, W.: Commun. Math. Phys. 7, 181–189 (1967)
[Ta] Tasaky, H.: Phys. Rev. Lett. 73(8), 11158–1162 (1994)
[TW] Thirring, W., Wehrl, W.: Commun. Math. Phys. 4, 303–314 (1966)

Communicated by G. Gallavotti

Commun. Math. Phys. 269, 425–471 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0137-7

Communications in

Mathematical Physics

The Distribution of the Free Path Lengths in the Periodic Two-Dimensional Lorentz Gas in the Small-Scatterer Limit

Florin P. Boca^{1,2}, Alexandru Zaharescu^{1,2}

1 Department of Mathematics, University of Illinois at Urbana-Champaign, 1409 W. Green Street, Urbana, IL 61801, USA. E-mail: [email protected]
2 Institute of Mathematics "Simion Stoilow" of the Romanian Academy, P.O. Box 1-764, RO-014700 Bucharest, Romania. E-mail: [email protected]

Received: 10 October 2005 / Accepted: 11 July 2006
Published online: 4 November 2006 – © Springer-Verlag 2006

In memory of Walter Philipp

Abstract: We study the free path length and the geometric free path length in the model of the periodic two-dimensional Lorentz gas (Sinai billiard). We give a complete and rigorous proof for the existence of their distributions in the small-scatterer limit and explicitly compute them. As a corollary one gets a complete proof for the existence of the constant term $c = 2 - 3\ln 2 + \frac{27\zeta(3)}{2\pi^2}$ in the asymptotic formula $h(T) = -2\ln\varepsilon + c + o(1)$ of the KS entropy of the billiard map in this model, as conjectured by P. Dahlqvist.

1. Introduction and Main Results

A periodic two-dimensional Lorentz gas (Sinai billiard) is a billiard system on the two-dimensional torus with one or more circular regions (scatterers) removed. This model in classical mechanics was introduced by Lorentz [31] in 1905 to describe the dynamics of electrons in metals. The associated dynamical system is simple enough to allow a comprehensive study, yet complex enough to exhibit chaos. According to Gutzwiller [26]: "The original billiard of Sinai was designed to imitate, in the most simple-minded manner, a gas of hard spherical balls which bounce around inside a finite enclosure. The formidable technical difficulties of this fundamental problem were boiled down to the shape of a square for the enclosure, and the collisions between the balls were reduced to a single point particle hitting a circular hard wall at the center of the enclosure." The model was intensively studied from the point of view of dynamical systems [10, 13, 14, 21, 22, 24, 34]. Our primary goal here is to estimate the free-path length (first return time) in this periodic two-dimensional model in the small-scatterer limit.

We solve the following three open problems: (1) the existence and computation of the distribution of the free path length, previously considered in [9, 11, 16], (2) the existence and computation of the distribution of the geometric free path length, previously shown, but not fully proved, in [14],


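(3) the existence and computation of the second (constant) term in the asymptotic formula of the KS entropy $h(T_\varepsilon)$ of the billiard map in this model, previously studied in [12–14, 21].

For each $\varepsilon \in (0, \frac12)$ let $Z_\varepsilon = \{x \in \mathbb{R}^2 ;\ \mathrm{dist}(x, \mathbb{Z}^2) \ge \varepsilon\}$, denote by $\partial Z_\varepsilon$ the boundary $\mathbb{Z}^2 + \varepsilon\mathbb{T}$ of $Z_\varepsilon$, and define the free path length (also called first exit time) as the Borel map given by

\[ \tau_\varepsilon(x, \omega) = \inf\{\tau > 0 ;\ x + \tau\omega \in \partial Z_\varepsilon\}, \qquad x \in Z_\varepsilon,\ \omega \in \mathbb{T}. \]

If $\tan\omega$ is irrational, then $\tau_\varepsilon(x, \omega) < \infty$ for every $x \in Z_\varepsilon$. We consider the probability space $(Y_\varepsilon, \mu_\varepsilon)$, with $Y_\varepsilon = Z_\varepsilon/\mathbb{Z}^2 \subseteq [0, 1)^2$ and $\mu_\varepsilon$ the normalized Lebesgue measure on $Y_\varepsilon$. Let $e_t = e_{(t,\infty)}$ denote the characteristic function of $(t, \infty)$. For every $t > 0$ the probability that $\tau_\varepsilon(x, \omega) > \frac{t}{2\varepsilon}$ is given by

\[ P_\varepsilon(t) = \mu_\varepsilon\big(\{(x, \omega) \in Y_\varepsilon \times [0, 2\pi) ;\ 2\varepsilon\tau_\varepsilon(x, \omega) > t\}\big) = \int_{Y_\varepsilon\times\mathbb{T}} e_t(2\varepsilon\tau_\varepsilon)\, d\mu_\varepsilon. \]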
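The definition of $\tau_\varepsilon$ and $P_\varepsilon(t)$ can be probed numerically. The sketch below is an illustration only, not part of the paper's method: `free_path` marches along the ray in steps of 1/2 and solves the ray-circle quadratic for the nearby lattice points, and `estimate_P` is a plain Monte Carlo estimator with uniform position (rejecting points inside a scatterer) and uniform direction. The function names, the step size, the cutoff `max_len` and the sample sizes are all assumptions made for this example.

```python
import math
import random


def free_path(x, y, omega, eps, max_len=1.0e4):
    """First tau > 0 at which (x, y) + tau*(cos w, sin w) hits a circle of
    radius eps centred at a point of Z^2; math.inf if none before max_len."""
    dx, dy = math.cos(omega), math.sin(omega)
    best = math.inf
    s = 0.0
    # A circle entered at time tau has its centre within distance < 1 of some
    # sample point with parameter s <= tau + 1, so floor/ceil candidates suffice.
    while s <= min(best + 1.0, max_len):
        px, py = x + s * dx, y + s * dy
        for cx in {math.floor(px), math.ceil(px)}:
            for cy in {math.floor(py), math.ceil(py)}:
                rx, ry = cx - x, cy - y
                b = rx * dx + ry * dy
                disc = b * b - (rx * rx + ry * ry - eps * eps)
                if disc >= 0.0:
                    tau = b - math.sqrt(disc)  # entry time into the circle
                    if 1e-12 < tau < best:
                        best = tau
        s += 0.5
    return best


def estimate_P(t, eps, samples=2000, seed=0):
    """Monte Carlo estimate of P_eps(t) = Prob(2*eps*tau_eps > t)."""
    rng = random.Random(seed)
    hits = done = 0
    while done < samples:
        x, y = rng.random(), rng.random()
        if math.hypot(x - round(x), y - round(y)) < eps:
            continue  # point lies inside a scatterer: reject
        omega = rng.uniform(0.0, 2.0 * math.pi)
        if 2.0 * eps * free_path(x, y, omega, eps) > t:
            hits += 1
        done += 1
    return hits / samples
```

For instance, `estimate_P(0.5, 0.05)` returns the empirical fraction of sampled trajectories whose rescaled free path exceeds 0.5; the statistical error of such an estimate decays only like the inverse square root of the sample size.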

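Lower and upper bounds for $P_\varepsilon$ of correct order of magnitude were established by Bourgain, Golse and Wennberg [9], using the rational channels introduced by Bleher [3]. More recently, Caglioti and Golse [11] have proved the existence of the Cesàro lim sup and lim inf means, proving for large $t$ that

\[ \limsup_{\delta\to 0^+} \frac{1}{|\ln\delta|}\int_\delta^{1/4} P_\varepsilon(t)\,\frac{d\varepsilon}{\varepsilon} = \frac{2}{\pi^2 t} + O\Big(\frac{1}{t^2}\Big) = \liminf_{\delta\to 0^+} \frac{1}{|\ln\delta|}\int_\delta^{1/4} P_\varepsilon(t)\,\frac{d\varepsilon}{\varepsilon}. \tag{1.1} \]

In Sects. 2–7 below we prove the existence of the limit $P(t)$ of $P_\varepsilon(t)$ as $\varepsilon \to 0^+$ and explicitly compute it.

Theorem 1. For every $t > 0$ and $\delta > 0$,

\[ P_\varepsilon(t) = P(t) + O_\delta(\varepsilon^{1/8-\delta}) \qquad (\varepsilon \to 0^+), \]

with

\[ P(t) = \frac{6}{\pi^2}\cdot \begin{cases} \dfrac{\pi^2}{6}(1-t) + \dfrac{t^2}{2} & \text{if } 0 < t \le 1; \\[2mm] \displaystyle\int_0^{t-1}\psi(x, t)\, dx + \int_{t-1}^{1}\phi(x, t)\, dx & \text{if } 1 < t \le 2; \\[2mm] \displaystyle\int_0^{1}\psi(x, t)\, dx & \text{if } t > 2, \end{cases} \]

\[ \psi(x, t) = \frac{(1-x)^2}{x}\Big( 2\ln\frac{t-x}{t-2x} - \frac{t}{x}\ln\frac{(t-x)^2}{t(t-2x)} \Big), \]

\[ \phi(x, t) = \frac{1-t}{x}\ln\frac{1}{t-x} + \frac{(t-x)(x-t+1)}{x} + \frac{(1-x)^2}{x}\Big( 2\ln\frac{t-x}{1-x} - \frac{t}{x}\ln\frac{t-x}{t(1-x)} \Big). \]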


After a direct computation the above formula for $P(t)$ yields

\[ P(t) = \frac{24}{\pi^2}\sum_{n=1}^{\infty} \frac{2^n - 1}{n^2(n+1)^2(n+2)\, t^n}, \qquad t \ge 2, \]

and thus for large $t$ we find

\[ P(t) = \frac{2}{\pi^2 t} + O\Big(\frac{1}{t^2}\Big), \]

which agrees with (1.1). The related "homogeneous" problem when the trajectory starts at the origin $O$ and the phase space is a subinterval of the velocity range $[0, 2\pi)$ was studied by Gologan and the authors. The limit distribution

\[ H(t) = \lim_{\varepsilon\to 0^+} \frac{1}{2\pi}\big|\{\omega \in [0, 2\pi) ;\ \varepsilon\tau_\varepsilon(O, \omega) > t\}\big| = \lim_{\varepsilon\to 0^+} \frac{1}{2\pi}\int_{\mathbb{T}} e_t\big(\varepsilon\tau_\varepsilon(O, \omega)\big)\, d\omega, \]

where $|\ |$ denotes the Lebesgue measure, was shown to exist and explicitly computed in [6, 7]. Unlike $P$, the function $H$ is compactly supported on the interval $[0, 1]$. Interestingly, in the particular situation where the scatterers are vertical segments, this case is related to some old problems in diophantine approximation investigated by Erdös, Szüsz and Turán [17, 18], Friedman and Niven [20], and by Kesten [28].

The main tools used to prove Theorem 1 are a certain three-strip partition of $[0, 1)^2$ and the Weil-Salié estimate for Kloosterman sums [19, 27, 35]. The latter is used in infinitesimal form with respect to the parameter $\omega$ to count the number of solutions of equations of the form $xy \equiv 1 \pmod q$ in various regions in $\mathbb{R}^2$. This approach, somewhat reminiscent of the circle method, produces good estimates, allowing us to keep the error terms under control. It was developed and used recently in many situations to study problems related to the spacing statistics of Farey fractions and lattice points in $\mathbb{R}^2$ [1, 4–7]. A possible source of better estimates for the error terms might be further cancellations in certain sums of Kloosterman sums, of the form [15, 23, 29]

\[ S = \sum_{a,b}\sum_{c} h_{a,b}(c)\, S(a, b; c). \]

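The three-strip partition of $\mathbb{T}^2$ is related to the continued fraction decomposition of the slope of the trajectory. Following work of Blank and Krikorian [2] on the longest orbit of the billiard, Caglioti and Golse explicitly introduced this partition and used it in conjunction with ergodic properties of the Gauss map [11] to prove (1.1). We will use it in Sect. 3 in a suitable setting for our computations.

One can also consider the phase space $\Sigma_\varepsilon^+ = \{(x, \omega) \in \partial Y_\varepsilon \times \mathbb{T} ;\ \omega\cdot n_x > 0\}$ with $n_x$ the inward unit normal at $x \in \partial Y_\varepsilon$ and the probability measure $\nu_\varepsilon$ on $\Sigma_\varepsilon^+$ obtained by normalizing the Liouville measure $\omega\cdot n_x\, dx\, d\omega$ to mass one. Consider also the distribution

\[ G_\varepsilon(t) = \nu_\varepsilon\big(\{(x, \omega) \in \Sigma_\varepsilon^+ ;\ 2\varepsilon\tau_\varepsilon(x, \omega) > t\}\big) = \int_{\Sigma_\varepsilon^+} e_t(2\varepsilon\tau_\varepsilon)\, d\nu_\varepsilon \]

of the geometric free path length $\tau_\varepsilon(x, \omega)$. The first moment (geometric mean free path length) of $\tau_\varepsilon$ with respect to $\nu_\varepsilon$ can be expressed as

\[ \int_{\Sigma_\varepsilon^+} \tau_\varepsilon\, d\nu_\varepsilon = \frac{\pi|Y_\varepsilon|}{|\partial Y_\varepsilon|} = \frac{1 - \pi\varepsilon^2}{2\varepsilon}. \tag{1.2} \]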


Fig. 1. The graphs of P(t), G(t), and respectively g(t)

Equality (1.2) is a consequence of a more general formula of Santaló [33], who extended earlier work of Pólya on the mean visible distance in a forest [32]. The formulation from (1.2) appears in [12, 13, 16]. Knowledge of the mean free path does not, however, give any information on other moments or on the limiting distribution of the free path in the small-scatterer limit. Our number theoretical analysis leads to the following solution of this limiting distribution problem, proved in Sects. 8–11 below.

Theorem 2. For every $t > 0$ and $\delta > 0$,

\[ G_\varepsilon(t) = G(t) + O_\delta(\varepsilon^{1/8-\delta}) \qquad (\varepsilon \to 0^+), \]

with

\[ G(t) = \frac{6}{\pi^2}\cdot \begin{cases} \dfrac{\pi^2}{6} - t & \text{if } 0 < t \le 1; \\[2mm] -2 + t + (t-1)\ln\dfrac{1}{t-1} + \displaystyle\int_0^{t-1}\widetilde\psi(x, t)\, dx + \int_{t-1}^{1}\widetilde\phi(x, t)\, dx & \text{if } 1 < t \le 2; \\[2mm] \displaystyle\int_0^{1}\widetilde\psi(x, t)\, dx & \text{if } t > 2, \end{cases} \]

\[ \widetilde\psi(x, t) = \frac{(1-x)^2}{x^2}\ln\frac{(t-x)^2}{t(t-2x)}, \qquad \widetilde\phi(x, t) = \frac{1}{x}\ln\frac{1}{t-x} + \frac{(1-x)^2}{x^2}\ln\frac{t-x}{t(1-x)}. \]

We note the equalities

\[ G(t) = -P'(t), \qquad t > 0, \tag{1.3} \]

and

\[ g(t) := -G'(t) = P''(t) = \frac{6}{\pi^2}\cdot \begin{cases} 1 & \text{if } 0 < t \le 1; \\[1mm] \dfrac{1}{t} + 2\Big(1 - \dfrac{1}{t}\Big)^2\ln\Big(1 - \dfrac{1}{t}\Big) - \dfrac{1}{2}\Big(1 - \dfrac{2}{t}\Big)^2\ln\Big|1 - \dfrac{2}{t}\Big| & \text{if } t > 1. \end{cases} \tag{1.4} \]

The latter also yields

\[ g(t) = \frac{24}{\pi^2 t^2}\sum_{n=1}^{\infty} \frac{2^n - 1}{n(n+1)(n+2)\, t^n}, \qquad t \ge 2. \tag{1.5} \]
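The series and closed forms above lend themselves to a quick numerical cross-check. The Python sketch below is an illustration under stated assumptions: it evaluates the $t \ge 2$ series for $P$ and $g$ by truncation, the closed form (1.4) with the absolute value inside the second logarithm, and the integral $\frac{6}{\pi^2}\int_0^1 \psi(x,t)\,dx$ by a midpoint rule. The function names, the truncation length and the grid size are choices made for this example, not taken from the paper.

```python
import math

SIX_OVER_PI2 = 6.0 / math.pi ** 2


def P_series(t, terms=200):
    """Truncated series for P(t), intended for t > 2 (geometric convergence)."""
    total = sum(((2.0 / t) ** n - t ** (-n)) / (n * n * (n + 1) ** 2 * (n + 2))
                for n in range(1, terms + 1))
    return 4.0 * SIX_OVER_PI2 * total  # 24 / pi^2 = 4 * 6 / pi^2


def g_series(t, terms=200):
    """Truncated series (1.5) for g(t), intended for t > 2."""
    total = sum(((2.0 / t) ** n - t ** (-n)) / (n * (n + 1) * (n + 2))
                for n in range(1, terms + 1))
    return 4.0 * SIX_OVER_PI2 * total / (t * t)


def g_closed(t):
    """Closed form (1.4) for g(t)."""
    if t <= 1.0:
        return SIX_OVER_PI2
    u, w = 1.0 - 1.0 / t, 1.0 - 2.0 / t
    second = 0.0 if w == 0.0 else 0.5 * w * w * math.log(abs(w))
    return SIX_OVER_PI2 * (1.0 / t + 2.0 * u * u * math.log(u) - second)


def psi(x, t):
    """Integrand psi(x, t) of Theorem 1 (t > 2, 0 < x < 1)."""
    a = 2.0 * math.log((t - x) / (t - 2.0 * x))
    b = (t / x) * math.log((t - x) ** 2 / (t * (t - 2.0 * x)))
    return (1.0 - x) ** 2 / x * (a - b)


def P_integral(t, n=4000):
    """Midpoint-rule value of (6/pi^2) * int_0^1 psi(x, t) dx for t > 2."""
    h = 1.0 / n
    return SIX_OVER_PI2 * h * sum(psi((i + 0.5) * h, t) for i in range(n))
```

Comparing `P_integral(3.0)` with `P_series(3.0)`, or `g_closed(3.0)` with `g_series(3.0)`, gives a direct consistency check of the formulas; at $t = 2$ the truncated series converge only polynomially, so more terms would be needed there.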


Remarkably, formulas (1.4) and (1.5) were found by Dahlqvist [14]. That approach however does not provide a rigorous proof for the existence of the limit distribution, because it fails to control in a quantitative way the uniform distribution of his variable (see the comments after formulas (75) and (86) in [14]). In the final section we use some standard analysis arguments and properties of the dilogarithm and trilogarithm to estimate

\[ C_\varepsilon := \ln\int_{\Sigma_\varepsilon^+} \tau_\varepsilon\, d\nu_\varepsilon - \int_{\Sigma_\varepsilon^+} \ln\tau_\varepsilon\, d\nu_\varepsilon. \]

It was conjectured by Friedman, Kubo and Oono [21] that $C_\varepsilon$ is convergent as $\varepsilon \to 0^+$. Its hypothetical limit $C$ was estimated to be $0.44 \pm 0.001$ in [21] and $\approx 0.43$ in [8]. This conjecture was known to imply [13, 21] the asymptotic formula

\[ h(T_\varepsilon) = -2\ln\varepsilon + 2 - C + o(1) \qquad \text{as } \varepsilon \to 0^+ \]

for the KS entropy of the associated billiard map. In [12] Chernov proved that $C_\varepsilon$ remains bounded when $\varepsilon \to 0^+$, without however giving any estimate for the bounds. The constant $C$ was identified by Dahlqvist [14, formula (73)] as being $3\ln 2 - \frac{9\zeta(3)}{4\zeta(2)} = 0.43522513609\ldots$. The conjecture of Friedman, Kubo and Oono, in the more precise form provided by Dahlqvist, follows now from Theorem 2.

Theorem 3. In the small scatterer limit $\varepsilon \to 0^+$ the following holds:

(i) $C_\varepsilon = 3\ln 2 - \dfrac{9\zeta(3)}{4\zeta(2)} + o(1)$.

(ii) $h(T_\varepsilon) = -2\ln\varepsilon + 2 - 3\ln 2 + \dfrac{9\zeta(3)}{4\zeta(2)} + o(1)$.
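The constant in Theorem 3 is easy to evaluate numerically. The short Python sketch below is an illustration only; `zeta3` is a hypothetical helper that sums the series for $\zeta(3)$ and adds the leading tail estimate $1/(2N^2)$, and $\zeta(2) = \pi^2/6$ is used in closed form. It also checks that $2 - C$ agrees with the form $2 - 3\ln 2 + \frac{27\zeta(3)}{2\pi^2}$ quoted in the abstract.

```python
import math


def zeta3(terms=100000):
    """zeta(3) from a partial sum plus the leading-order tail 1/(2 N^2)."""
    partial = sum(1.0 / k ** 3 for k in range(1, terms + 1))
    return partial + 1.0 / (2.0 * terms ** 2)


ZETA2 = math.pi ** 2 / 6.0

# Limit of C_eps according to Theorem 3 (i)
C = 3.0 * math.log(2.0) - 9.0 * zeta3() / (4.0 * ZETA2)

# Constant term of the entropy asymptotics, in the two equivalent forms
c_from_theorem = 2.0 - C
c_from_abstract = 2.0 - 3.0 * math.log(2.0) + 27.0 * zeta3() / (2.0 * math.pi ** 2)

assert abs(c_from_theorem - c_from_abstract) < 1e-12
```

The value of `C` lies between the earlier numerical estimates $\approx 0.43$ and $0.44 \pm 0.001$ mentioned above.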

These methods work for any convex scatterer due to the good error control they give when integrating over the velocity in very short intervals. To keep the presentation of the paper neat we have chosen to only consider circular scatterers. In dimension $\ge 3$ the problem of the existence of the limiting distribution of the free path length in the small-scatterer limit remains open and is manifestly difficult. Partial results in this direction have appeared in [9, 24, 25].

2. Farey Fractions and Summation over Primitive Lattice Points

In this section we collect some basic properties of Farey fractions and outline the summation method that will allow us to estimate the limit distribution of the free path length when the size of scatterers tends to zero. For each positive integer $Q$, let $F_Q$ denote the set of Farey fractions of order $Q$. These are the rational numbers $\gamma = \frac{a}{q}$ with coprime integers $a, q$ such that $1 \le a \le q \le Q$. For each interval $I \subseteq [0, 1]$ the number of elements in the set $F_Q(I) = I \cap F_Q$ can be expressed, using elementary arguments on Möbius and Euler-Maclaurin summation, as

\[ \#F_Q(I) = \frac{Q^2|I|}{2\zeta(2)} + O(Q\ln Q). \]
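The counting formula for $\#F_Q(I)$ can be checked directly for moderate $Q$. The following Python sketch is an illustration only (the function names are not from the paper): it enumerates $F_Q$ by brute force with exact rational arithmetic and compares the count on an interval with the main term $Q^2|I|/(2\zeta(2)) = 3Q^2|I|/\pi^2$.

```python
from fractions import Fraction
from math import gcd, log, pi


def farey(Q):
    """Sorted Farey fractions a/q with gcd(a, q) = 1 and 1 <= a <= q <= Q."""
    return sorted(Fraction(a, q)
                  for q in range(1, Q + 1)
                  for a in range(1, q + 1) if gcd(a, q) == 1)


def count_in_interval(Q, lo, hi):
    """Number of elements of F_Q in the closed interval [lo, hi]."""
    return sum(1 for g in farey(Q) if lo <= g <= hi)


def main_term(Q, lo, hi):
    """Q^2 |I| / (2 zeta(2)), using zeta(2) = pi^2 / 6."""
    return 3.0 * Q * Q * float(hi - lo) / pi ** 2


Q = 100
n_all = count_in_interval(Q, Fraction(0), Fraction(1))
n_mid = count_in_interval(Q, Fraction(1, 4), Fraction(3, 4))
# The discrepancies are far below the size Q ln Q of the error term.
assert abs(n_all - main_term(Q, Fraction(0), Fraction(1))) <= Q * log(Q)
assert abs(n_mid - main_term(Q, Fraction(1, 4), Fraction(3, 4))) <= Q * log(Q)
```

The brute-force enumeration costs on the order of $Q^2$ operations, which is fine for illustration but not for large $Q$.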


If $\gamma = \frac{a}{q} < \gamma' = \frac{a'}{q'}$ are two consecutive elements in $F_Q$, then

\[ a'q - aq' = 1 \qquad \text{and} \qquad q + q' > Q. \tag{2.1} \]
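Relation (2.1) can be verified exhaustively for small orders. The Python sketch below is an illustration only, with hypothetical helper names: it builds $F_Q$ by brute force and tests both conditions on every pair of neighbours.

```python
from fractions import Fraction
from math import gcd


def farey(Q):
    """Sorted Farey fractions a/q with gcd(a, q) = 1 and 1 <= a <= q <= Q."""
    return sorted(Fraction(a, q)
                  for q in range(1, Q + 1)
                  for a in range(1, q + 1) if gcd(a, q) == 1)


def satisfies_2_1(Q):
    """True if every pair of consecutive elements a/q < a'/q' of F_Q
    satisfies a'q - aq' = 1 and q + q' > Q."""
    fr = farey(Q)
    for g, gp in zip(fr, fr[1:]):
        a, q = g.numerator, g.denominator
        ap, qp = gp.numerator, gp.denominator
        if ap * q - a * qp != 1 or q + qp <= Q:
            return False
    return True


assert all(satisfies_2_1(Q) for Q in range(2, 40))
```

`Fraction` keeps numerators and denominators in lowest terms, so `numerator` and `denominator` are exactly the coprime pair $(a, q)$ used in the text.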

This shows on the one hand that the denominators of consecutive Farey fractions of order $Q$ are exactly the primitive integer points in the set $Q\mathcal{T} = \{(Qx, Qy) ;\ 0 < x, y \le 1,\ x + y > 1\}$, and on the other hand that denominators uniquely determine consecutive Farey fractions. For instance, $a$ is the unique integer in $[0, q]$ for which $(q - a)q' \equiv 1 \pmod q$. In many instances in this paper we will seek to estimate sums of type

\[ S_{f,\Omega,I}(Q) = \sum_{\substack{\gamma\in F_Q(I) \\ (q,q')\in Q\Omega}} f(q, q', a), \]

where $I \subseteq [0, 1]$ is an interval, $\Omega \subseteq \mathcal{T}$ a region, and $f$ a $C^1$ function. These kinds of sums can be roughly approximated by some integrals, with control on error terms given by the following two results, which will be systematically used in this process. The first one is a standard fact and is a plain consequence of the Möbius summation (for a proof see [4, Lemma 2.3]).

Lemma 1. Let $0 < a < b$ and $f$ be a $C^1$ function on $[a, b]$. Then

\[ \sum_{a<k\le b} \frac{\varphi(k)}{k}\, f(k) = \frac{1}{\zeta(2)}\int_a^b f(x)\, dx + O\Big(\ln b\,\Big(\|f\|_\infty + \int_a^b |f'|\Big)\Big), \]

where $\varphi$ denotes Euler's totient function.

The second one is a consequence of Weil's type bounds for Kloosterman sums (cf. [7, Lemma 2.2]).

Lemma 2. Let $q \ge 1$ be an integer, $\mathcal{I}$ and $\mathcal{J}$ intervals with $|\mathcal{I}|, |\mathcal{J}| < q$, $f$ a $C^1$ function on $\mathcal{I}\times\mathcal{J}$, and $T \ge 1$ an integer. Then for every $\delta > 0$,

\[ \sum_{\substack{a\in\mathcal{I},\ b\in\mathcal{J} \\ ab\equiv 1\ (\mathrm{mod}\ q)}} f(a, b) = \frac{\varphi(q)}{q^2}\iint_{\mathcal{I}\times\mathcal{J}} f(x, y)\, dx\, dy + E, \]

with

\[ E = E(q, T, f, |\mathcal{I}|, |\mathcal{J}|, \delta) \ll_\delta T^2 q^{\frac12+\delta}\,\|f\|_\infty + T q^{\frac32+\delta}\,\|Df\|_\infty + \frac{|\mathcal{I}|\,|\mathcal{J}|}{T}\,\|Df\|_\infty, \]

where we denote · ∞ = · ∞,I ×J and D f = | ∂∂ xf | + | ∂∂ yf |. When = {(x, y) ; α < x ≤ β, ξ(x) ≤ y ≤ η(x)} is a subset of T = {(x, y) ; 0 < x, y ≤ 1, x + y > 1}, the above mentioned properties of Farey fractions lead to S f,,I (Q) = f (q, q , a) = f (q, q , q − a), α Q
α Q

where we denote

\[ J(x) = \{y ;\ (x, y) \in \Omega\} = [\xi(x), \eta(x)] \subseteq (1 - x, 1], \qquad x \in (\alpha, \beta]. \]
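The main term in Lemma 2 can be illustrated in the simplest case $f \equiv 1$, where the sum just counts pairs $(a, b)$ in a box with $ab \equiv 1 \pmod q$. The Python sketch below is an illustration under stated assumptions: the intervals are integer ranges inside $[1, q]$, the interval length is approximated by the number of integers it contains, and the helper names are hypothetical. It relies on `pow(a, -1, q)` for the modular inverse, which requires Python 3.8 or later.

```python
from math import gcd


def phi(q):
    """Euler's totient function by direct counting."""
    return sum(1 for a in range(1, q + 1) if gcd(a, q) == 1)


def count_inverse_pairs(q, I, J):
    """Number of integer pairs (a, b), a in I = (lo, hi), b in J = (lo, hi)
    (inclusive integer ranges inside [1, q], q >= 2), with a*b = 1 mod q."""
    (lo_a, hi_a), (lo_b, hi_b) = I, J
    count = 0
    for a in range(lo_a, hi_a + 1):
        if gcd(a, q) == 1:
            b = pow(a, -1, q)  # the unique inverse of a in [1, q - 1]
            if lo_b <= b <= hi_b:
                count += 1
    return count


def main_term(q, I, J):
    """phi(q) / q^2 times the number of integer points in the box I x J."""
    (lo_a, hi_a), (lo_b, hi_b) = I, J
    return phi(q) / q ** 2 * (hi_a - lo_a + 1) * (hi_b - lo_b + 1)


q = 1009  # a prime modulus, chosen only for illustration
full = count_inverse_pairs(q, (1, q), (1, q))
assert full == phi(q)  # over the full box the count is exactly phi(q)
```

For a proper sub-box such as `count_inverse_pairs(q, (1, 500), (1, 500))` the count fluctuates around `main_term(q, (1, 500), (1, 500))`; the size of that fluctuation is what the Kloosterman-sum bound in Lemma 2 controls.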

The inner sum above is mastered by Lemma 2, being approximated by

\[ \frac{\varphi(q)}{q^2}\iint_{QJ(q/Q)\times q(1-I)} f(q, q', q - a)\, dq'\, da = \frac{\varphi(q)}{q^2}\iint_{QJ(q/Q)\times qI} f(q, q', a)\, dq'\, da. \]

Thus

\[ S_{f,\Omega,I}(Q) = \sum_{\alpha Q<q\le\beta Q} \frac{\varphi(q)}{q}\, V(q) + \text{error term}, \]

where we take

\[ V(q) = \frac{1}{q}\iint_{QJ(q/Q)\times qI} f(q, q', a)\, dq'\, da = Q\iint_{J(q/Q)\times I} f(q, Qy, q\gamma)\, dy\, d\gamma. \]

When the error term is small enough, this sum is mastered by Lemma 1, giving

\[ \begin{aligned} S_{f,\Omega,I}(Q) &= \frac{1}{\zeta(2)}\int_{\alpha Q}^{\beta Q} V(q)\, dq + \text{error} \\ &= \frac{Q}{\zeta(2)}\int_{\alpha Q}^{\beta Q} dq \iint_{J(q/Q)\times I} dy\, d\gamma\, f(q, Qy, q\gamma) + \text{error} \\ &= \frac{Q^2}{\zeta(2)}\int_{\alpha}^{\beta} dx \iint_{J(x)\times I} dy\, d\gamma\, f(Qx, Qy, Qx\gamma) + \text{error} \\ &= \frac{1}{\zeta(2)}\iiint_{Q\Omega\times I} f(v, w, v\gamma)\, dv\, dw\, d\gamma + \text{error}. \end{aligned} \]

3. A Partition of the Unit Square

In this section we give an account of the three-strip partition mentioned in the introduction. This approach, slightly different from that in [11], is suitable for computations involving Farey fraction partitions of the unit interval.

In the first part of this section we shall consider a fixed (small) $\varepsilon > 0$ and let $Q = \big[\frac{1}{2\varepsilon}\big]$ be the integer part of $\frac{1}{2\varepsilon}$. For each $\gamma = \frac{a}{q} \in F_Q$, consider the points $N_\gamma(q, a + \varepsilon)$ and $S_\gamma(q, a - \varepsilon)$. Consider also the points $N_0(0, \varepsilon)$ and $S_0(0, -\varepsilon)$, and denote by $\mathcal{S}_\gamma$ the strip determined by the lines $N_0N_\gamma$ and $S_0S_\gamma$. A segment does not interfere with an open strip when their intersection is empty. Throughout this section $\gamma = \frac{a}{q} < \gamma' = \frac{a'}{q'}$ will be two consecutive fractions in $F_Q$, so that (2.1) is fulfilled. In particular this gives

\[ 2\varepsilon\max\{q, q'\} \le 2\varepsilon Q \le 1 \qquad \text{and} \qquad 2\varepsilon(q + q') > 2\varepsilon(Q + 1) > 1. \tag{3.1} \]

The slope of a segment $AB$ is denoted by $t_{AB}$. Set $t_P = t_{OP}$.

Lemma 3. The segment $N_\gamma S_\gamma$ does not interfere with the strip $\mathcal{S}_{\gamma'}$, and the segment $N_{\gamma'}S_{\gamma'}$ does not interfere with the strip $\mathcal{S}_\gamma$.

Fig. 2. The strips $\mathcal{S}_\gamma$ and $\mathcal{S}_{\gamma'}$

Proof. First, we show that $S_{\gamma'}$ lies above the line $N_0N_\gamma$ of equation $y - \varepsilon - \frac{ax}{q} = 0$, which amounts to $a' - 2\varepsilon - \frac{aq'}{q} \ge 0$. The latter is equivalent to $1 - 2\varepsilon q \ge 0$, which is true by (3.1). Furthermore, $N_\gamma$ lies below the line $S_0S_{\gamma'}$ of equation $y + \varepsilon - \frac{a'}{q'}x = 0$, as a result of $a + 2\varepsilon < \frac{a'q}{q'}$ being equivalent to $2\varepsilon q' < 1$.

For each k ∈ N0 = {0, 1, 2, . . . } set qk = q + kq, γk =

ak = a + ka, qk = q + kq , ak = a + ka ,

a + 2ε ak ak − 2ε , tk = , uk = k , qk qk qk

αk = arctan tk , βk = arctan u k . The following three relations hold for every k ∈ N = {1, 2, . . . }: − ak−1 qk , ak−1 qk − ak qk−1 = 1 = ak qk−1

(3.2)

− ak−1 q , ak−1 q − aqk−1 = 1 = a qk−1

(3.3)

min{2εqk , 2εqk } k≥1

≥ 2ε(q + q) > 1.

(3.4)

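As a result of (2.1) and (3.1)–(3.4), it is seen that
\[ \gamma' = \gamma_0 = \frac{a'}{q'} > \gamma_1 > \gamma_2 > \dots > \gamma_k \xrightarrow{k \to \infty} \frac{a}{q} = \gamma, \]
and that
\[ \gamma = \frac{a}{q} \xleftarrow{\infty \leftarrow k} t_k \le t_{k-1} \le \dots \le t_1 \le t_0 = \frac{a' - 2\varepsilon}{q'} < \frac{a + 2\varepsilon}{q} = u_0 \le u_1 \le \dots \le u_{k-1} \le u_k \xrightarrow{k \to \infty} \frac{a'}{q'} = \gamma' . \]
So putting
\[ I_{\gamma,0} = (t_0, u_0], \qquad I_{\gamma,k} = (t_k, t_{k-1}], \qquad I_{\gamma,-k} = (u_{k-1}, u_k], \qquad k \in \mathbb{N}, \]
we end up with a partition $(I_{\gamma,k})_{k \in \mathbb{Z}}$ of the interval $(\gamma, \gamma')$. Next we consider the points $N_{\gamma_k}(q_k, a_k + \varepsilon)$ and $S_{\gamma_k}(q_k, a_k - \varepsilon)$, proving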

Lemma 4. The following inequalities hold for every $k \ge 1$:
(i) $t_{N_0 N_{\gamma_{k-1}}} > t_{N_0 N_{\gamma_k}} > t_{N_0 S_{\gamma_{k-1}}} \ge t_{N_0 S_{\gamma_k}} \ge t_{N_0 N_\gamma}$.
(ii) $t_{S_0 S_{\gamma'}} \ge t_{S_0 N_\gamma} > t_{S_0 S_{\gamma_1}} \ge t_{S_0 S_{\gamma_k}}$.

Proof. The inequalities in (i) are equivalent to
\[ \frac{a_{k-1}}{q_{k-1}} > \frac{a_k}{q_k} > \frac{a_{k-1} - 2\varepsilon}{q_{k-1}} \ge \frac{a_k - 2\varepsilon}{q_k} \ge \frac{a}{q}, \]
which follow from (3.2), (3.3), (3.1), and from (3.4). The inequalities in (ii) are equivalent to
\[ \frac{a'}{q'} \ge \frac{a + 2\varepsilon}{q} > \frac{a_1}{q_1} = \frac{a + a'}{q + q'} \ge \frac{a_k}{q_k}, \]
which follow from (2.1), (3.2), (3.4), and $a_1 q_k - a_k q_1 = k - 1$.
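To make the partition $(I_{\gamma,k})_{k\in\mathbb{Z}}$ concrete, the endpoints $t_k$ and $u_k$ can be tabulated exactly. This is a small sketch under the definitions $t_k = (a_k - 2\varepsilon)/q_k$ and $u_k = (a'_k + 2\varepsilon)/q'_k$ given above; the helper name `strip_slopes` and the sample values ($\gamma = 0/1$, $\gamma' = 1/2$, $\varepsilon = 1/5$, so $Q = 2$) are illustrative choices, not taken from the paper.

```python
from fractions import Fraction

def strip_slopes(a, q, a1, q1, eps, kmax):
    """Return ([t_0..t_kmax], [u_0..u_kmax]) for consecutive Farey fractions a/q < a1/q1."""
    t, u = [], []
    for k in range(kmax + 1):
        t.append((a1 + k * a - 2 * eps) / (q1 + k * q))   # t_k = (a_k - 2 eps) / q_k
        u.append((a + k * a1 + 2 * eps) / (q + k * q1))   # u_k = (a'_k + 2 eps) / q'_k
    return t, u

# Sample data (hypothetical): gamma = 0/1, gamma' = 1/2, eps = 1/5.
t, u = strip_slopes(0, 1, 1, 2, Fraction(1, 5), 6)
```

With these values $t_0 = 3/10 < u_0 = 2/5$, the $t_k$ decrease towards $\gamma = 0$ and the $u_k$ increase towards $\gamma' = 1/2$, as the displayed chain of inequalities predicts.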

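Consider the half-infinite strip $S = S_\omega = \{(x, y + x \tan\omega) ; x > 0, -\varepsilon \le y \le \varepsilon\}$ of direction $\omega$, top line passing through $N_0$, and bottom line passing through $S_0$. Assume that $\gamma < \tan\omega < \gamma'$. For each $y_0 \in [-\varepsilon, \varepsilon]$ we wish to find the first vertical segment of form $\{m\} \times [n - \varepsilon, n + \varepsilon]$, $m, n \in \mathbb{N}$, that intersects the line of slope $\tan\omega$ passing through $(0, y_0)$. In other words, we wish to calculate
\[ q(\omega, y_0) = \inf\{n \in \mathbb{N} ; \| y_0 + n \tan\omega \| \le \varepsilon\}, \]
where we denote $\|x\| = \mathrm{dist}(x, \mathbb{Z})$, $x \in \mathbb{R}$. We shall assume that $\tan\omega$ is irrational and split the discussion according to the three cases where the slope of $\omega$ belongs to one of the intervals $\bigl(\frac{a}{q}, \frac{a' - 2\varepsilon}{q'}\bigr]$, $\bigl(\frac{a' - 2\varepsilon}{q'}, \frac{a + 2\varepsilon}{q}\bigr]$, or $\bigl(\frac{a + 2\varepsilon}{q}, \frac{a'}{q'}\bigr)$.

Proposition 1. Let $\gamma = \frac{a}{q} < \gamma' = \frac{a'}{q'}$ be consecutive fractions in $\mathcal{F}_Q$. Suppose $\tan\omega \in \bigl(\frac{a}{q}, \frac{a' - 2\varepsilon}{q'}\bigr]$ is irrational and $\tan\omega \in I_{\gamma,k} = (t_k, t_{k-1}]$ for some $k \in \mathbb{N}$. Set
\[ w_{B_k} = w_{B_k}(\omega) = q_k \tan\omega - a_k + 2\varepsilon, \quad w_{C_k} = w_{C_k}(\omega) = -q_{k-1} \tan\omega + a_{k-1} - 2\varepsilon, \quad w_{A_+} = w_{A_+}(\omega) = -q \tan\omega + a + 2\varepsilon, \]
\[ I_{A_+} := [-\varepsilon, -\varepsilon + w_{A_+}), \quad I_{C_k} := [-\varepsilon + w_{A_+}, -\varepsilon + w_{A_+} + w_{C_k}], \quad I_{B_k} := (-\varepsilon + w_{A_+} + w_{C_k}, \varepsilon] = (\varepsilon - w_{B_k}, \varepsilon], \]
and
\[ L(\omega, y_0) = \begin{cases} L_{A_+}(\omega) := q & \text{if } y_0 \in I_{A_+}; \\ L_{C_k}(\omega) := q_{k+1} & \text{if } y_0 \in I_{C_k}; \\ L_{B_k}(\omega) := q_k & \text{if } y_0 \in I_{B_k}. \end{cases} \]
Then for any $* \in \{A_+, B_k, C_k\}$ we have $0 \le w_* \le 2\varepsilon$ and
\[ q(\omega, y_0) = L(\omega, y_0) = L_*(\omega), \qquad y_0 \in I_* . \]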


Fig. 3. The case tan ω ∈ Iγ ,k , k ∈ N

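Furthermore, if $S_*$ denotes the parallelogram of height $\{0\} \times I_*$, angle $\omega$ between its side and the horizontal direction, and side length $\frac{L_*(\omega)}{\cos\omega}$, then
\[ \mathrm{area}(S_{A_+}) + \mathrm{area}(S_{B_k}) + \mathrm{area}(S_{C_k}) = w_{A_+} L_{A_+} + w_{B_k} L_{B_k} + w_{C_k} L_{C_k} = 1. \]
Moreover, $\{S_{A_+}, S_{B_k}, S_{C_k}\} \bmod \mathbb{Z}^2$ provides a partition of the unit square $[0, 1)^2$ (we allow the boundaries of these three sets to intersect).

Proof. Taking stock of Lemma 4, we notice that the line of slope $\tan\omega$ through $S_0$ intersects the vertical line $N_\gamma S_\gamma$ at a point between $N_\gamma$ and $S_\gamma$ (see Fig. 3). Also, because
\[ t_{N_0 S_{\gamma_k}} = t_k < \tan\omega \le t_{k-1} = t_{N_0 S_{\gamma_{k-1}}} < t_{N_0 N_{\gamma_k}}, \]
the line of slope $\tan\omega$ through $N_0$ (respectively through $N_\gamma$) intersects the line $N_{\gamma_k} S_{\gamma_k}$ (respectively $N_{\gamma_{k+1}} S_{\gamma_{k+1}}$) between $N_{\gamma_k}$ and $S_{\gamma_k}$ (respectively between $N_{\gamma_{k+1}}$ and $S_{\gamma_{k+1}}$). The segment $N_{\gamma_{k-1}} S_{\gamma_{k-1}}$ is placed above these two parallel lines because $\tan\omega \le t_{k-1} = t_{N_0 S_{\gamma_{k-1}}}$. Next, we find that the intersections with the vertical axis of the lines $y - a - \varepsilon = (x - q) \tan\omega$ and $y - a_k + \varepsilon = (x - q_k) \tan\omega$, which have slope $\tan\omega$ and pass through $N_\gamma$ and respectively $S_{\gamma_k}$, are $(0, \varepsilon + a - q \tan\omega)$ and respectively $(0, -\varepsilon + a_k - q_k \tan\omega)$, whence the required values of $w_{A_+}$, $w_{B_k}$ and $w_{C_k}$ follow. Notice that
\[ 2\varepsilon > w_{A_+} = 2\varepsilon + a - q \tan\omega \ge 2\varepsilon + a - q\, \frac{a' - 2\varepsilon}{q'} = \frac{2\varepsilon(q + q') - 1}{q'} > 0, \]
\[ 2\varepsilon > \frac{1 - 2\varepsilon q}{q_{k-1}} = q_k\, \frac{a_{k-1} - 2\varepsilon}{q_{k-1}} - a_k + 2\varepsilon \ge w_{B_k} = q_k \tan\omega - a_k + 2\varepsilon > 0, \]
\[ 2\varepsilon > \frac{1 - 2\varepsilon q}{q_k} = a_{k-1} - 2\varepsilon - q_{k-1}\, \frac{a_k - 2\varepsilon}{q_k} > a_{k-1} - 2\varepsilon - q_{k-1} \tan\omega = w_{C_k} \ge 0. \]
Besides, one clearly has $w_{A_+} + w_{B_k} + w_{C_k} = 2\varepsilon$, and it is easy to check by a direct calculation that
\[ \sum_{* \in \{A_+, B_k, C_k\}} \mathrm{area}(S_*) = \sum_{* \in \{A_+, B_k, C_k\}} w_* L_* = 1. \]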


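It remains to check that the interiors of the subsets $S_* \bmod \mathbb{Z}^2 \subseteq [0, 1)^2$, $* \in \{A_+, B_k, C_k\}$, are disjoint. If not, there exist two points $P, P'$ inside $\cup_* S_*$ such that $P - P' \in \mathbb{Z}^2$. The latter is preserved by translating the segment $PP'$ to a parallel segment. Owing to the shape of $\cup_* S_*$ we may thus assume that, say, $P$ lies on the $y$-axis; hence $P = P(0, y_0)$ and $P' = P'(n, m + y_0)$ for some $y_0 \in [-\varepsilon, \varepsilon]$, $m, n \in \mathbb{N}^*$. The line of slope $\tan\omega$ which passes through $P'$ intersects the $y$-axis at $(0, m + y_0 - n \tan\omega)$. Hence $-\varepsilon \le m + y_0 - n \tan\omega \le \varepsilon$, which shows that $\| y_0 - n \tan\omega \| = \| -y_0 + n \tan\omega \| \le \varepsilon$. By the first part of the proposition this gives $n \ge L(\omega, -y_0)$, thus $P'$ must belong to the boundary, which is a contradiction.

Proposition 2. Let $\gamma = \frac{a}{q} < \gamma' = \frac{a'}{q'}$ be consecutive fractions in $\mathcal{F}_Q$. Suppose $\tan\omega \in \bigl(\frac{a' - 2\varepsilon}{q'}, \frac{a'}{q'}\bigr)$ is irrational.

(i) If $\tan\omega \in I_{\gamma,0} = (t_0, u_0]$, then the analog of Proposition 1 holds true, with (see footnote 1)
\[ w_{B_0} = w_{B_0}(\omega) = q' \tan\omega - a' + 2\varepsilon \in (0, 2\varepsilon), \quad w_{C_0} = w_{C_0}(\omega) = -(q' - q) \tan\omega + a' - a - 2\varepsilon \in [0, 2\varepsilon), \]
\[ w_{A_0} = w_{A_0}(\omega) = w_{A_+}(\omega) = -q \tan\omega + a + 2\varepsilon \in [0, 2\varepsilon), \]
\[ I_{A_0} := [-\varepsilon, -\varepsilon + w_{A_0}), \quad I_{C_0} := [-\varepsilon + w_{A_0}, -\varepsilon + w_{A_0} + w_{C_0}], \quad I_{B_0} := (-\varepsilon + w_{A_0} + w_{C_0}, \varepsilon] = (\varepsilon - w_{B_0}, \varepsilon], \]
\[ L(\omega, y_0) = \begin{cases} L_{A_0}(\omega) := q & \text{if } y_0 \in I_{A_0}; \\ L_{C_0}(\omega) := q + q' & \text{if } y_0 \in I_{C_0}; \\ L_{B_0}(\omega) := q' & \text{if } y_0 \in I_{B_0}. \end{cases} \]

(ii) If $k \in \mathbb{N}$ and $\tan\omega \in I_{\gamma,-k} = (u_{k-1}, u_k]$, then the analog of Proposition 1 holds true, with
\[ w_{B_-} = w_{B_-}(\omega) = w_{B_0}(\omega) = q' \tan\omega - a' + 2\varepsilon \in (0, 2\varepsilon), \quad w_{C_{-k}} = w_{C_{-k}}(\omega) = q'_{k-1} \tan\omega - a'_{k-1} - 2\varepsilon \in (0, 2\varepsilon), \]
\[ w_{A_{-k}} = w_{A_{-k}}(\omega) = -q'_k \tan\omega + a'_k + 2\varepsilon \in [0, 2\varepsilon), \]
\[ I_{A_{-k}} = [-\varepsilon, -\varepsilon + w_{A_{-k}}), \quad I_{C_{-k}} = [-\varepsilon + w_{A_{-k}}, -\varepsilon + w_{A_{-k}} + w_{C_{-k}}], \quad I_{B_-} = (-\varepsilon + w_{A_{-k}} + w_{C_{-k}}, \varepsilon] = (\varepsilon - w_{B_-}, \varepsilon], \]
\[ L(\omega, y_0) = \begin{cases} L_{A_{-k}}(\omega) := q'_k & \text{if } y_0 \in I_{A_{-k}}; \\ L_{C_{-k}}(\omega) := q'_{k+1} & \text{if } y_0 \in I_{C_{-k}}; \\ L_{B_-}(\omega) := q' & \text{if } y_0 \in I_{B_-}. \end{cases} \]

Proof. (i) follows as in the proof of Proposition 1, using
\[ \varepsilon > a' - \varepsilon - q' \tan\omega \ge a + \varepsilon - q \tan\omega \ge -\varepsilon, \qquad \tan\omega \in I_{\gamma,0}. \]
(ii) follows as in the proof of Proposition 1, using
\[ \varepsilon > a' - \varepsilon - q' \tan\omega > a'_k + \varepsilon - q'_k \tan\omega \ge -\varepsilon, \qquad \tan\omega \in I_{\gamma,-k}. \]

Footnote 1. Note that in both cases $q < q'$ or $q' < q$ we get $0 \le w_{C_0} < 2\varepsilon$.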
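The three-strip statement of Proposition 1 can be sanity-checked by brute force on a small instance. The sketch below is only an illustration: it uses a rational slope so that all arithmetic is exact (the proposition itself assumes $\tan\omega$ irrational), and the sample data $\gamma = 0/1$, $\gamma' = 1/2$, $\varepsilon = 1/4$, $k = 1$, $\tan\omega = 1/5 \in (t_1, t_0] = (1/6, 1/4]$ as well as the function names are hypothetical choices.

```python
from fractions import Fraction
from math import floor

def dist_to_Z(x):
    """Distance from the rational number x to the nearest integer."""
    frac = x - floor(x)
    return min(frac, 1 - frac)

def first_hit(tan_w, y0, eps, nmax=10_000):
    """Brute-force q(omega, y0): the least n >= 1 with ||y0 + n tan(omega)|| <= eps."""
    for n in range(1, nmax + 1):
        if dist_to_Z(y0 + n * tan_w) <= eps:
            return n
    return None

def three_strips(a, q, a1, q1, k, tan_w, eps):
    """Widths and lengths (w, L) of the strips A_+, C_k, B_k from Proposition 1."""
    qk, ak = q1 + k * q, a1 + k * a          # q_k, a_k
    qkm, akm = qk - q, ak - a                # q_{k-1}, a_{k-1}
    wA = -q * tan_w + a + 2 * eps
    wB = qk * tan_w - ak + 2 * eps
    wC = -qkm * tan_w + akm - 2 * eps
    return {"A": (wA, q), "C": (wC, qk + q), "B": (wB, qk)}

eps, tan_w = Fraction(1, 4), Fraction(1, 5)
strips = three_strips(0, 1, 1, 2, 1, tan_w, eps)
total_width = sum(w for w, _ in strips.values())   # should equal 2 * eps
total_area = sum(w * L for w, L in strips.values())  # should equal 1
```

In this instance $I_{A_+} = [-1/4, 1/20)$, $I_{C_1} = [1/20, 3/20]$ and $I_{B_1} = (3/20, 1/4]$, and sampling one interior point from each interval with `first_hit` reproduces the lengths $q = 1$, $q_2 = 4$ and $q_1 = 3$.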


Fig. 4. The case tan ω ∈ Iγ ,0

Fig. 5. The case tan ω ∈ Iγ ,−k , k ∈ N

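We now start investigating the case where the scatterers are vertical slits. Propositions 1 and 2 will only be applied for $\varepsilon = \frac{1}{2Q}$, corresponding to the case of vertical slits of height $\frac{1}{Q}$. The Lebesgue measure of a Borel set $A$ in $\mathbb{R}^d$, $d = 1, 2, 3$, will be denoted by $|A|$. Throughout the paper $\tau_\delta(x, \omega)$ will denote the free path length in the periodic two-dimensional Lorentz gas with vertical slits of height $2\delta$ as scatterers centered at all integer lattice points. Given $\lambda > 0$, $I = [\tan\omega_0, \tan\omega_1] \subseteq [0, 1]$ with $0 \le \omega_0 \le \omega_1 \le \frac{\pi}{4}$, and $Q \ge 1$ integer, we denote
\[ \widetilde{P}_{I,Q}(\lambda) = \bigl| \{(x, \omega) ; x \in [0, 1)^2, \ \omega_0 \le \omega \le \omega_1, \ \tau_{1/(2Q)}(x, \omega) > \lambda\} \bigr| . \]
Although the cases $0 < t < 1$, $1 < t < 2$, $t > 2$ will be considered separately, applying Propositions 1 and 2 to $2\varepsilon = \frac{1}{Q}$, we can write for all $t, \varepsilon_* > 0$,
\[ \widetilde{P}_{I,Q}\Bigl(\frac{t}{2\varepsilon_*}\Bigr) = \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\alpha_k}^{\alpha_{k-1}} w_{C_k}(\omega) \max\Bigl\{ q_{k+1} - \frac{t \cos\omega}{2\varepsilon_*}, 0 \Bigr\}\, d\omega + \text{eight other similar terms where } \frac{t \cos\omega}{2\varepsilon_*} \text{ appears.} \tag{3.5} \]

Lemma 5. For any interval $I = [\tan\omega_0, \tan\omega_1] \subseteq [0, 1]$ such that $|I| \asymp \varepsilon^c$ with fixed $0 < c < 1$ and small $\varepsilon > 0$, and any (large) integer $Q = \frac{\cos\omega_0}{2\varepsilon} + O(\varepsilon^{c-1})$, the estimate
\[ \widetilde{P}_{I,Q}\Bigl(\frac{t}{2\varepsilon}\Bigr) = P_{I,Q}(t) + O(\varepsilon^{2c}) \tag{3.6} \]
holds uniformly in $t$ on compact subsets of $(0, \infty)$. Here $P_{I,Q}(t)$ is obtained by substituting $tQ$ in place of $\frac{t \cos\omega}{2\varepsilon_*}$ in (3.5), that is
\[ \begin{aligned} P_{I,Q}(t) := {} & \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\alpha_k}^{\alpha_{k-1}} \Bigl( w_{C_k}(\omega) \max\{q_{k+1} - tQ, 0\} + w_{B_k}(\omega) \max\{q_k - tQ, 0\} + w_{A_+}(\omega) \max\{q - tQ, 0\} \Bigr)\, d\omega \\ & + \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_0}^{\beta_0} \Bigl( w_{C_0}(\omega) \max\{q + q' - tQ, 0\} + w_{B_0}(\omega) \max\{q' - tQ, 0\} + w_{A_0}(\omega) \max\{q - tQ, 0\} \Bigr)\, d\omega \\ & + \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\beta_{k-1}}^{\beta_k} \Bigl( w_{C_{-k}}(\omega) \max\{q'_{k+1} - tQ, 0\} + w_{A_{-k}}(\omega) \max\{q'_k - tQ, 0\} + w_{B_-}(\omega) \max\{q' - tQ, 0\} \Bigr)\, d\omega, \end{aligned} \tag{3.7} \]
with
\[ \begin{aligned} & w_{A_+}(\omega) = w_{A_0}(\omega) = Q^{-1} + a - q \tan\omega, \qquad w_{B_-}(\omega) = w_{B_0}(\omega) = q' \tan\omega - a' + Q^{-1}, \\ & w_{C_0}(\omega) = Q^{-1} - w_{A_0}(\omega) - w_{B_0}(\omega), \\ & w_{B_k}(\omega) = q_k \tan\omega - a_k + Q^{-1}, \qquad w_{C_k}(\omega) = a_{k-1} - Q^{-1} - q_{k-1} \tan\omega, \\ & w_{C_{-k}}(\omega) = q'_{k-1} \tan\omega - a'_{k-1} - Q^{-1}, \qquad w_{A_{-k}}(\omega) = Q^{-1} + a'_k - q'_k \tan\omega, \\ & \alpha_k = \arctan \frac{a_k - Q^{-1}}{q_k}, \qquad \beta_k = \arctan \frac{a'_k + Q^{-1}}{q'_k}, \qquad k \in \mathbb{N}. \end{aligned} \tag{3.8} \]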

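Proof. Using the inequality $\max\{w_A, w_B, w_C\} \le Q^{-1} \ll \varepsilon$, which is a consequence of $w_{A_+} + w_{B_k} + w_{C_k} = Q^{-1}$ and of the similar relations for $k = 0$ and $k \le -1$, the estimate (see also (7.2))
\[ \sup_{\omega \in I} \Bigl| Q - \frac{\cos\omega}{2\varepsilon} \Bigr| \ll \varepsilon^{c-1} + \frac{|\cos\omega_0 - \cos\omega_1|}{2\varepsilon} \le \varepsilon^{c-1} + \frac{|\tan\omega_1 - \tan\omega_0|}{2\varepsilon} \ll \varepsilon^{c-1}, \]
and the inequalities
\[ |\max(x, 0) - \max(y, 0)| \le |x - y| \qquad\text{and}\qquad \sum_{\gamma \in \mathcal{F}_Q(I)} \frac{1}{qq'} \le |I| + \frac{1}{Q} \ll \varepsilon^c, \]
it follows that we can replace $\frac{t \cos\omega}{2\varepsilon}$ by $tQ$ in (3.5) at a cost which is
\[ \ll \sum_{\gamma \in \mathcal{F}_Q(I)} \frac{1}{Q}\, \varepsilon^{c-1}\, \frac{1}{qq'} \ll \varepsilon^c \sum_{\gamma \in \mathcal{F}_Q(I)} \frac{1}{qq'} \ll \varepsilon^{2c}. \tag{3.9} \]
This establishes (3.6).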


Equality (3.7) will be at the center of most of the forthcoming computations because it shows how the estimation of the distribution of the free path length reduces to estimates on sums involving Farey fractions. There is an alternative approach to estimating $\widetilde{P}_{I,Q}$, by using a monotonicity argument instead of the continuity argument which is based on (3.9). Such an argument will be used in the proof of Theorem 2. In the remainder of the paper, given $I = [\tan\omega_0, \tan\omega_1] \subseteq [0, 1]$ we denote
\[ c_I = \int_I \frac{du}{1 + u^2} = \omega_1 - \omega_0 . \tag{3.10} \]

4. The Case 0 < t ≤ 1

The aim of this section is to prove the following result.

Proposition 3. Suppose $I$ is a subinterval of $[0, 1]$ of size $|I| \asymp Q^{-c}$ for $0 < c < 1$. Then for every $c_1 > 0$ with $c + c_1 < 1$ and $\delta > 0$,
\[ P_{I,Q}(t) = \Bigl( 1 - t + \frac{t^2}{2\zeta(2)} \Bigr) c_I + O_\delta(E_{c,c_1,\delta}(Q)) \qquad (Q \to \infty), \]
with
\[ E_{c,c_1,\delta}(Q) = Q^{\max\{2c_1 - 1/2 + \delta,\ -c - c_1\}} . \]

The estimate is uniform in t ∈ (0, 1]. Before starting to estimate PI,Q , the following remark is in order. Remark 1. If I ⊆ [0, 1] is an interval with |I | ≥ 1 1 qq ≤ Q ≤ |I | we have

f (γ ) =

γ ∈F Q γ ∈I

1 Q,

then as a consequence of γ − γ =

f (γ ) + O

γ ∈F Q γ ∈I

f ∞ Q

.

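As a result, replacing the condition $\gamma \in I$ by $\gamma' \in I$ only produces an error of order $Q^{-1}$, which has no impact on any of the forthcoming estimates. Thus in Propositions 3, 4, 5, 6, 7, 8 the assumption $|I| \asymp Q^{-c}$ can be replaced by the weaker assumption $|I| \ll Q^{-c}$ and $|I| \ge Q^{-1}$.

Then we notice that since $\min\{q_k, q'_k\} \ge q + q' > tQ$ for all $k \ge 1$, we can write, according to (3.7) and (3.8),
\[ P_{I,Q}(t) = \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\alpha_k}^{\alpha_{k-1}} S^-_{Q,\gamma,k}(\omega)\, d\omega + \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_0}^{\beta_0} S^{(0)}_{Q,\gamma,k}(\omega)\, d\omega + \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\beta_{k-1}}^{\beta_k} S^+_{Q,\gamma,k}(\omega)\, d\omega, \]
with
\[ \begin{aligned} S^-_{Q,\gamma,k}(\omega) &= w_{C_k}(\omega)(q_{k+1} - tQ) + w_{B_k}(\omega)(q_k - tQ) + w_{A_+}(\omega) \max\{q - tQ, 0\}, \\ S^{(0)}_{Q,\gamma,k}(\omega) &= w_{C_0}(\omega)(q + q' - tQ) + w_{A_+}(\omega) \max\{q - tQ, 0\} + w_{B_-}(\omega) \max\{q' - tQ, 0\}, \\ S^+_{Q,\gamma,k}(\omega) &= w_{C_{-k}}(\omega)(q'_{k+1} - tQ) + w_{A_{-k}}(\omega)(q'_k - tQ) + w_{B_-}(\omega) \max\{q' - tQ, 0\}. \end{aligned} \]
Here the formulas for the width of the strips are as in (3.8), and we take
\[ \alpha_k = \arctan \frac{a_k - Q^{-1}}{q_k}, \quad \beta_k = \arctan \frac{a'_k + Q^{-1}}{q'_k}, \quad \alpha_\infty = \arctan \frac{a}{q}, \quad \beta_\infty = \arctan \frac{a'}{q'} . \tag{4.1} \]
Taking into account the equalities
\[ q_{k+1} w_{C_k} + q_k w_{B_k} + q w_{A_+} = 1 = q'_{k+1} w_{C_{-k}} + q'_k w_{A_{-k}} + q' w_{B_-}, \tag{4.2} \]
\[ w_{C_k} + w_{B_k} + w_{A_+} = \frac{1}{Q} = w_{C_{-k}} + w_{A_{-k}} + w_{B_-}, \tag{4.3} \]
and
\[ w_{C_0}(\omega) = \frac{1}{Q} - w_{A_+}(\omega) - w_{B_-}(\omega), \]
we can write
\[ P_{I,Q}(t) = \sum_{\gamma \in \mathcal{F}_Q(I)} \bigl( T^{(1)}_{Q,\gamma} + \dots + T^{(5)}_{Q,\gamma} \bigr), \]
with
\[ \begin{aligned} T^{(1)}_{Q,\gamma} &= \max\{q - tQ, 0\} \int_{\alpha_\infty}^{\beta_0} w_{A_+}(\omega)\, d\omega, \qquad T^{(2)}_{Q,\gamma} = \max\{q' - tQ, 0\} \int_{\alpha_0}^{\beta_\infty} w_{B_-}(\omega)\, d\omega, \\ T^{(3)}_{Q,\gamma} &= (q + q' - tQ) \int_{\alpha_0}^{\beta_0} \Bigl( \frac{1}{Q} - w_{A_+}(\omega) - w_{B_-}(\omega) \Bigr)\, d\omega, \\ T^{(4)}_{Q,\gamma} &= \int_{\alpha_\infty}^{\alpha_0} \bigl( (tQ - q) w_{A_+}(\omega) + 1 - t \bigr)\, d\omega, \qquad T^{(5)}_{Q,\gamma} = \int_{\beta_0}^{\beta_\infty} \bigl( (tQ - q') w_{B_-}(\omega) + 1 - t \bigr)\, d\omega . \end{aligned} \]
Rewriting the terms in a convenient way we arrive at
\[ P_{I,Q}(t) = A_0 + A_1 + A_2 + A_3, \tag{4.4} \]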


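where
\[ \begin{aligned} A_0 &= (1 - t) \sum_{\gamma \in \mathcal{F}_Q(I)} (\beta_\infty - \alpha_\infty), \\ A_1 &= \sum_{\gamma \in \mathcal{F}_Q(I)} \bigl( \max\{q - tQ, 0\} + tQ - q \bigr) \int_{\alpha_\infty}^{\beta_0} w_{A_+}(\omega)\, d\omega = - \sum_{\gamma \in \mathcal{F}_Q(I)} \min\{q - tQ, 0\} \int_{\alpha_\infty}^{\beta_0} w_{A_+}(\omega)\, d\omega, \\ A_2 &= \sum_{\gamma \in \mathcal{F}_Q(I)} \bigl( \max\{q' - tQ, 0\} + tQ - q' \bigr) \int_{\alpha_0}^{\beta_\infty} w_{B_-}(\omega)\, d\omega = - \sum_{\gamma \in \mathcal{F}_Q(I)} \min\{q' - tQ, 0\} \int_{\alpha_0}^{\beta_\infty} w_{B_-}(\omega)\, d\omega, \\ A_3 &= \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_0}^{\beta_0} \Bigl( \frac{q + q'}{Q} - 1 - q' w_{A_+}(\omega) - q w_{B_-}(\omega) \Bigr)\, d\omega . \end{aligned} \]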

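Remark first that $A_3 = 0$, as a result of
\[ \frac{q + q'}{Q} - 1 - q' w_{A_+}(\omega) - q w_{B_-}(\omega) = \frac{q + q'}{Q} - 1 - q' \Bigl( \frac{1}{Q} + a - q \tan\omega \Bigr) - q \Bigl( \frac{1}{Q} + q' \tan\omega - a' \Bigr) = a' q - a q' - 1 = 0. \tag{4.5} \]
The next elementary statement will be repeatedly used.

Lemma 6. For any $\lambda, \mu \in \mathbb{R}$ we have, uniformly in $c \in [0, 1]$ as $h \to 0^+$,
\[ \int_{\arctan c}^{\arctan(c + h)} (\lambda \tan\omega + \mu)\, d\omega = (\lambda c + \mu) \Bigl( \frac{h}{1 + c^2} - \frac{h^2 c}{(1 + c^2)^2} \Bigr) + \frac{h^2 \lambda}{2(1 + c^2)} + O(h^3 (|\lambda| + |\mu|)), \tag{4.6} \]
\[ \int_{\arctan(c - h)}^{\arctan c} (\lambda \tan\omega + \mu)\, d\omega = (\lambda c + \mu) \Bigl( \frac{h}{1 + c^2} + \frac{h^2 c}{(1 + c^2)^2} \Bigr) - \frac{h^2 \lambda}{2(1 + c^2)} + O(h^3 (|\lambda| + |\mu|)). \tag{4.7} \]

Proof. Applying to our situation Taylor's formula
\[ \int_a^{a + \xi} f(x)\, dx = \xi f(a) + \frac{\xi^2}{2} f'(a) + O(\| f'' \|_\infty |\xi|^3) \]
together with
\[ \xi = \arctan(c + h) - \arctan c = \frac{h}{1 + c^2} - \frac{h^2 c}{(1 + c^2)^2} + O(h^3), \tag{4.8} \]
we get
\[ \begin{aligned} \int_{\arctan c}^{\arctan(c + h)} \Bigl( \tan\omega + \frac{\mu}{\lambda} \Bigr)\, d\omega &= \Bigl( \frac{h}{1 + c^2} - \frac{h^2 c}{(1 + c^2)^2} + O(h^3) \Bigr) \Bigl( c + \frac{\mu}{\lambda} \Bigr) + \frac{1}{2} \Bigl( \frac{h}{1 + c^2} - \frac{h^2 c}{(1 + c^2)^2} + O(h^3) \Bigr)^2 (1 + c^2) + O(h^3) \\ &= \Bigl( \frac{h}{1 + c^2} - \frac{h^2 c}{(1 + c^2)^2} \Bigr) \Bigl( c + \frac{\mu}{\lambda} \Bigr) + \frac{h^2}{2(1 + c^2)} + O\Bigl( h^3 \Bigl( \frac{|\mu|}{|\lambda|} + 1 \Bigr) \Bigr), \end{aligned} \]
whence (4.6) follows for $\lambda \ne 0$. The case $\lambda = 0$ is a direct consequence of (4.8), while (4.7) is derived from (4.6) by changing $h$ into $-h$.

This result will only be applied in cases where $\lambda c + \mu = 0$. We shall also use the following weaker form of (4.8):
\[ \arctan(c + h) - \arctan c = \frac{h}{1 + c^2} + O(h^2). \tag{4.9} \]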
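The expansion (4.6) is easy to test numerically, because the integrand has the elementary antiderivative $-\lambda \ln\cos\omega + \mu\omega$ and $\cos(\arctan x) = (1 + x^2)^{-1/2}$. The following sketch compares the exact value with the main terms of (4.6); the function names are illustrative.

```python
import math

def exact_integral(lam, mu, c, h):
    """Integral of lam*tan(w) + mu over [arctan c, arctan(c + h)], in closed form."""
    log_part = 0.5 * lam * math.log((1 + (c + h) ** 2) / (1 + c ** 2))
    angle_part = mu * (math.atan(c + h) - math.atan(c))
    return log_part + angle_part

def lemma6_main_terms(lam, mu, c, h):
    """Right-hand side of (4.6) without the O(h^3) remainder."""
    return ((lam * c + mu) * (h / (1 + c ** 2) - h ** 2 * c / (1 + c ** 2) ** 2)
            + h ** 2 * lam / (2 * (1 + c ** 2)))

def remainder(lam, mu, c, h):
    """Absolute difference between the exact value and the main terms."""
    return abs(exact_integral(lam, mu, c, h) - lemma6_main_terms(lam, mu, c, h))
```

For moderate $\lambda, \mu$ and $c \in [0, 1]$ the remainder stays below $h^3(|\lambda| + |\mu|)$, in line with the uniform error term of the lemma.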

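It remains to estimate $A_0$, $A_1$ and $A_2$. By (4.9) it is immediate that
\[ A_0 = (1 - t) \sum_{\gamma \in \mathcal{F}_Q(I)} \Bigl( \frac{1}{qq'(1 + \gamma^2)} + O\Bigl( \frac{1}{q^2 q'^2} \Bigr) \Bigr). \tag{4.10} \]
This shows in conjunction with
\[ \sum_{\gamma \in \mathcal{F}_Q} \frac{1}{q^2 q'^2} \ll \sum_{q=1}^{Q} \frac{1}{q^2} \sum_{Q/2 \le q' \le Q} \frac{1}{q'^2} \ll \sum_{q=1}^{Q} \frac{1}{q^2} \cdot \frac{1}{Q} \ll \frac{1}{Q} \tag{4.11} \]
and with the subsequent Lemma 7 that
\[ A_0 = c_I (1 - t) + O_\delta(E_{c,c_1,\delta}(Q)). \tag{4.12} \]

Lemma 7. Let $c, c_1 > 0$ such that $c + c_1 < 1$. Then for any interval $I \subseteq [0, 1]$ with $|I| \asymp Q^{-c}$ and $\delta > 0$,
\[ \sum_{\gamma \in \mathcal{F}_Q(I)} \frac{1}{qq'(1 + \gamma^2)} = c_I + O_\delta(E_{c,c_1,\delta}(Q)). \]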
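As an informal illustration of Lemma 7, the sum can be evaluated for the full interval $I = [0, 1]$, where $c_I = \pi/4$. Since $\gamma' - \gamma = 1/(qq')$, the sum is a left Riemann sum for $\int_0^1 du/(1+u^2)$. The sketch below uses the standard next-term recurrence for Farey sequences; the function names are illustrative, and the case of short intervals $|I| \asymp Q^{-c}$ treated in the lemma is not covered by this simple check.

```python
from math import pi

def farey_pairs(Q):
    """Yield (a, q, a1, q1) for all consecutive Farey fractions a/q < a1/q1 of order Q in [0, 1]."""
    a, q, a1, q1 = 0, 1, 1, Q
    while a1 <= q1:
        yield a, q, a1, q1
        k = (Q + q) // q1                      # standard next-term recurrence
        a, q, a1, q1 = a1, q1, k * a1 - a, k * q1 - q

def lemma7_sum(Q):
    """Sum of 1 / (q q' (1 + gamma^2)) over consecutive pairs gamma < gamma' in F_Q."""
    return sum(1.0 / (q * q1 * (1.0 + (a / q) ** 2)) for a, q, a1, q1 in farey_pairs(Q))

approx = lemma7_sum(200)   # close to pi / 4 for the full interval I = [0, 1]
```

Because $1/(1+u^2)$ is decreasing on $[0,1]$, the sum overshoots $\pi/4$ by at most $\max(\gamma' - \gamma)\cdot\tfrac12 \le \tfrac{1}{2Q}$.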


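Proof. We decompose the sum above as $S_1 + S_2$, according to whether $q' > q$ or $q > q'$. Thus we can write
\[ S_1 = \sum_{q=1}^{Q} \ \sum_{\substack{q' \in \mathcal{I} := (\max\{Q - q, q\}, Q] \\ a \in \mathcal{J} := qI \\ -aq' \equiv 1 \pmod q}} f_q(q', a), \tag{4.13} \]
where we put
\[ f_q(q', a) = \frac{1}{qq'(1 + a^2/q^2)}, \qquad q' \in \mathcal{I}, \ a \in \mathcal{J}, \ q \in [1, Q - 1]. \]
The inclusion $\mathcal{I} \subseteq (\frac{Q}{2}, Q]$ gives
\[ 0 \le f_q(q', a) = \frac{q}{q'(q^2 + a^2)} \le \frac{1}{qq'} \le \frac{2}{qQ}; \]
\[ 0 \le |D f_q(q', a)| = \Bigl| \frac{\partial f_q}{\partial q'}(q', a) \Bigr| + \Bigl| \frac{\partial f_q}{\partial a}(q', a) \Bigr| = \frac{q}{q'^2 (q^2 + a^2)} + \frac{q}{q'} \cdot \frac{2a}{(q^2 + a^2)^2} \le \frac{1}{qq'^2} + \frac{2}{q^2 q'} \le \frac{4}{qQ^2} + \frac{4}{q^2 Q} \le \frac{8}{q^2 Q} . \]
Applying Lemma 2 with $T = [Q^{c_1}]$, the inner sum in (4.13) can be expressed as
\[ \begin{aligned} & \frac{\varphi(q)}{q^2} \int_{\mathcal{I}} \frac{dq'}{qq'} \int_{qI} \frac{da}{1 + a^2/q^2} + O_\delta\Bigl( Q^{2c_1} q^{1/2 + \delta}\, \frac{1}{qQ} + Q^{c_1} q^{3/2 + \delta}\, \frac{1}{q^2 Q} + \frac{q^2 Q^{-c}}{Q^{c_1} q^2 Q} \Bigr) \\ & \qquad = c_I\, \frac{\varphi(q)}{q}\, V(q) + O_\delta\bigl( Q^{2c_1 - 1} q^{-1/2 + \delta} + Q^{c_1 - 1} q^{-1/2 + \delta} + Q^{-1 - c - c_1} \bigr), \end{aligned} \]
where
\[ V(q) = \int_{\mathcal{I}} \frac{dq'}{qq'} = \frac{1}{q} \ln \frac{Q}{\max\{q, Q - q\}}, \qquad q \in (0, Q]. \]
The function
\[ W(x) := \begin{cases} \dfrac{1}{x} \ln \dfrac{1}{\max\{x, 1 - x\}} & \text{if } x \in (0, 1]; \\ 1 & \text{if } x = 0, \end{cases} \]
is bounded and has finite total variation on $[0, 1]$, hence
\[ M := \sup_{x \in [0, 1]} |W(x)| + \int_0^1 |W'(x)|\, dx = O(1). \]
Since $V(Qx) = \frac{W(x)}{Q}$, Lemma 1 yields
\[ \sum_{q=1}^{Q} \frac{\varphi(q)}{q}\, V(q) = \frac{1}{\zeta(2)} \int_0^Q V(q)\, dq + O\Bigl( \ln Q \Bigl( \sup_{q \in (0, Q]} |V(q)| + \int_0^Q |V'(q)|\, dq \Bigr) \Bigr) = \frac{1}{\zeta(2)} \int_0^1 W(x)\, dx + O\Bigl( \frac{\ln Q}{Q} \Bigr). \]
Hence
\[ S_1 = c_I \sum_{q=1}^{Q} \frac{\varphi(q)}{q}\, V(q) + O_\delta(E_{c,c_1,\delta}(Q)) = \frac{c_I}{\zeta(2)} \int_0^1 W(x)\, dx + O_\delta(E_{c,c_1,\delta}(Q)). \tag{4.14} \]
Using a familiar identity of Euler (cf. formula (1.8) in [30]) we find that
\[ \int_0^1 W(x)\, dx = - \int_0^{1/2} \frac{\ln(1 - x)}{x}\, dx - \int_{1/2}^1 \frac{\ln x}{x}\, dx = - \int_0^{1/2} \frac{\ln(1 - x)}{x}\, dx + \frac{\ln^2 2}{2} = \frac{\zeta(2)}{2} - \frac{\ln^2 2}{2} + \frac{\ln^2 2}{2} = \frac{\zeta(2)}{2}, \]
which we combine with (4.14) to get
\[ S_1 = \frac{c_I}{2} + O_\delta(E_{c,c_1,\delta}(Q)). \tag{4.15} \]
Finally we employ
\[ \frac{1}{1 + \gamma'^2} = \frac{1}{1 + \gamma^2} + O(\gamma' - \gamma) = \frac{1}{1 + \gamma^2} + O\Bigl( \frac{1}{qq'} \Bigr) \tag{4.16} \]
and (4.11) to treat
\[ S_2 = \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q > q'}} \frac{1}{qq'(1 + \gamma^2)} . \]
Using (4.16) and (4.11) we see that
\[ S_2 = \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q > q'}} \frac{1}{qq'(1 + \gamma'^2)} + O\Bigl( \frac{1}{Q} \Bigr) = \sum_{q'=1}^{Q} \ \sum_{\substack{q \in (\max\{Q - q', q'\}, Q] \\ a' \in q'I \\ a'q \equiv 1 \pmod{q'}}} \frac{1}{qq'(1 + a'^2/q'^2)} + O\Bigl( \frac{1}{Q} \Bigr). \]
Changing $a'$ to $q' - a'$, reversing the roles of $q$ and $q'$, and using
\[ \int_{q'(1 - I)} \frac{dx}{1 + (1 - x/q')^2} = c_I\, q', \]
it follows that $S_2$ is given by the same expression as in (4.15).

Next we estimate $A_1$ and find, taking $c = \frac{a + 1/Q}{q}$, $h = \frac{1}{qQ}$, $\lambda = -q$, $\mu = a + \frac{1}{Q}$ in (4.7), that
\[ \begin{aligned} \int_{\alpha_\infty}^{\beta_0} w_{A_+}(\omega)\, d\omega &= \int_{\arctan \frac{a}{q}}^{\arctan \frac{a + 1/Q}{q}} \Bigl( \frac{1}{Q} + a - q \tan\omega \Bigr)\, d\omega = \frac{1}{2qQ^2 \bigl( 1 + (a + 1/Q)^2/q^2 \bigr)} + O\Bigl( \frac{1}{q^2 Q^3} \Bigr) \\ &= \frac{1}{2qQ^2 (1 + \gamma^2)} + O\Bigl( \frac{1}{q^2 Q^3} \Bigr). \end{aligned} \tag{4.17} \]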


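Since
\[ \sum_{\gamma \in \mathcal{F}_Q} \frac{Q}{q^2 Q^3} \le \frac{1}{Q^2} \sum_{q=1}^{Q} \frac{\varphi(q)}{q^2} \ll \frac{\ln Q}{Q^2} = O\Bigl( \frac{1}{Q} \Bigr), \]
we infer from (4.17) and the definition of $A_1$ that
\[ A_1 + O(Q^{-1}) = \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q \le tQ}} \frac{tQ - q}{2qQ^2 (1 + a^2/q^2)} = \sum_{1 \le q \le tQ} \frac{tQ - q}{2qQ^2} \sum_{\substack{q' \in (Q - q, Q] \\ a \in qI \\ -aq' \equiv 1 \pmod q}} \frac{1}{1 + a^2/q^2} . \tag{4.18} \]
Applying Lemma 2 to $\mathcal{I} = (Q - q, Q]$, $\mathcal{J} = qI$, $f_q(q', a) = \frac{1}{1 + a^2/q^2}$, for which $\| f_q \|_\infty \le 1$ and $\| D f_q \|_\infty \le \frac{2}{q}$, and taking $T = [Q^{c_1}]$, the inner sum above becomes
\[ \frac{\varphi(q)}{q^2}\, q \int_{qI} \frac{da}{1 + a^2/q^2} + O_\delta\Bigl( Q^{2c_1} q^{1/2 + \delta} + Q^{c_1} q^{3/2 + \delta}\, \frac{1}{q} + \frac{q^2 |I|}{Q^{c_1} q} \Bigr) = c_I\, \varphi(q) + O_\delta\bigl( Q^{2c_1} q^{1/2 + \delta} + Q^{-c - c_1} q \bigr), \]
which inserted back into (4.18) gives that $A_1 + O(Q^{-1})$ may be written as
\[ \frac{c_I}{2Q^2} \sum_{1 \le q \le tQ} \frac{\varphi(q)}{q} (tQ - q) + O_\delta\Bigl( \sum_{q=1}^{Q} \frac{Q}{qQ^2} \bigl( Q^{2c_1} q^{1/2 + \delta} + Q^{-c - c_1} q \bigr) \Bigr) = \frac{c_I}{2Q^2} \sum_{1 \le q \le tQ} \frac{\varphi(q)}{q} (tQ - q) + O_\delta(E_{c,c_1,\delta}(Q)). \tag{4.19} \]
Applying now Lemma 1 to the main term above with $V(q) = tQ - q$, $q \in [1, tQ]$, we find that
\[ A_1 = \frac{c_I}{2Q^2 \zeta(2)} \int_0^{tQ} (tQ - q)\, dq + O_\delta\bigl( Q^{-1 + \delta} + E_{c,c_1,\delta}(Q) \bigr) = \frac{c_I}{2\zeta(2)} \int_0^t (t - x)\, dx + O_\delta(E_{c,c_1,\delta}(Q)) = \frac{c_I t^2}{4\zeta(2)} + O_\delta(E_{c,c_1,\delta}(Q)). \tag{4.20} \]
In a similar way we find
\[ A_2 = \frac{c_I t^2}{4\zeta(2)} + O_\delta(E_{c,c_1,\delta}(Q)), \]
and therefore
\[ P_{I,Q}(t) = \Bigl( 1 - t + \frac{t^2}{2\zeta(2)} \Bigr) c_I + O_\delta(E_{c,c_1,\delta}(Q)), \]
which proves Proposition 3.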
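The step from (4.19) to (4.20) replaces the weighted totient sum $\sum_{q \le tQ} \frac{\varphi(q)}{q}(tQ - q)$ by $\frac{1}{\zeta(2)}\int_0^{tQ}(tQ - q)\,dq$. A quick numerical sketch of this replacement, with the factor $c_I$ left out and with a standard totient sieve (function names are illustrative), is:

```python
import math

def totients(n):
    """Return a list phi with phi[m] = Euler's totient of m for 0 <= m <= n (phi[0] = 0)."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                      # p is prime: not yet reduced by a smaller prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p
    return phi

def a1_main_term(Q, t):
    """(1 / (2 Q^2)) * sum_{1 <= q <= tQ} phi(q)/q * (tQ - q), i.e. A_1 / c_I up to the error term."""
    phi = totients(Q)
    top = min(int(t * Q), Q)
    s = sum(phi[q] / q * (t * Q - q) for q in range(1, top + 1))
    return s / (2 * Q * Q)

def a1_limit(t):
    """Limiting value t^2 / (4 zeta(2)) from (4.20)."""
    return t * t / (4 * (math.pi ** 2 / 6))
```

For $Q = 2000$ and $t = 0.7$ the two quantities agree to well within $10^{-3}$, consistent with an error term that tends to zero as $Q \to \infty$.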


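5. The Case t > 2

In this section we shall evaluate the contribution of the integrals on $[\alpha_k, \alpha_{k-1}]$ in (3.7) when $k \ge 1$ and $t > 2$. In this situation there is a unique nonnegative integer, given by
\[ K = K(\gamma, t) = \Bigl[ \frac{tQ - q'}{q} \Bigr] \ge 0, \]
for which
\[ q_K \le tQ < q_{K+1} . \tag{5.1} \]
When $t \ge 2$ it follows that $K \ge 1$, and we prove

Proposition 4. Suppose $I$ is a subinterval of $[0, 1]$ of size $|I| \asymp Q^{-c}$ for $0 < c < 1$. Then for every $c_1 > 0$ with $c + c_1 < 1$ and $\delta > 0$,
\[ P_{I,Q}(t) = \frac{c_I}{\zeta(2)} \int_0^1 \psi(x, t)\, dx + O_\delta(E_{c,c_1,\delta}(Q)) \qquad (Q \to \infty), \]
with $\psi$ as in Theorem 1 and $E_{c,c_1,\delta}$ as in Proposition 3. The estimate is uniform in $t$ on compacts of $(2, \infty)$.

Next, $\alpha_k$ and $\beta_k$ will be as in (4.1) and the widths $w_*$ as in (3.8). Since $t > 2$, then $q + q' < tQ$ and the second sum in (3.7) is zero. In the beginning we fix $\gamma = \frac{a}{q} \in \mathcal{F}_Q(I)$ and estimate
\[ S_2(\gamma, t) := \sum_{k=K+1}^{\infty} \int_{\alpha_k}^{\alpha_{k-1}} \Bigl( q_{k+1} w_{C_k}(\omega) + q_k w_{B_k}(\omega) - tQ \bigl( w_{C_k}(\omega) + w_{B_k}(\omega) \bigr) \Bigr)\, d\omega . \]
Using (4.2) and (4.3), taking $c = \frac{a}{q}$, $h = \frac{a_K - 1/Q}{q_K} - \frac{a}{q} = \frac{1 - q/Q}{qq_K}$, $\lambda = q$, $\mu = -a$ in (4.6), and also owing to
\[ \alpha_K - \alpha_\infty = \arctan \frac{a_K - 1/Q}{q_K} - \arctan \frac{a}{q} = \frac{1 - q/Q}{qq_K (1 + a^2/q^2)} + O\Bigl( \frac{1}{q^2 q_1^2} \Bigr), \]
we infer that
\[ \begin{aligned} S_2(\gamma, t) &= \int_{\alpha_\infty}^{\alpha_K} \Bigl( 1 - q w_{A_+}(\omega) - tQ \Bigl( \frac{1}{Q} - w_{A_+}(\omega) \Bigr) \Bigr)\, d\omega \\ &= \int_{\alpha_\infty}^{\alpha_K} \Bigl( q^2 \tan\omega + 1 - \frac{q}{Q} - aq \Bigr)\, d\omega - tQ \int_{\alpha_\infty}^{\alpha_K} (q \tan\omega - a)\, d\omega \\ &= \Bigl( 1 - \frac{q}{Q} \Bigr) (\alpha_K - \alpha_\infty) + (q - tQ) \int_{\alpha_\infty}^{\alpha_K} (q \tan\omega - a)\, d\omega \\ &= \Bigl( 1 - \frac{q}{Q} \Bigr) (\alpha_K - \alpha_\infty) + (q - tQ)\, \frac{(1 - q/Q)^2 q}{2q^2 q_K^2 (1 + \gamma^2)} + O\Bigl( \frac{Q}{q^2 q_K^3} \Bigr) \\ &= \frac{(1 - q/Q)^2}{qq_K (1 + \gamma^2)} + (q - tQ)\, \frac{(1 - q/Q)^2}{2qq_K^2 (1 + \gamma^2)} + O\Bigl( \frac{1}{q^2 q_1^2} \Bigr) = \frac{(1 - q/Q)^2 (q + 2q_K - tQ)}{2qq_K^2 (1 + \gamma^2)} + O\Bigl( \frac{1}{q^2 q_1^2} \Bigr). \end{aligned} \]
On the other hand, taking $c = \frac{a_{K-1} - 1/Q}{q_{K-1}}$, $h = \frac{a_{K-1} - 1/Q}{q_{K-1}} - \frac{a_K - 1/Q}{q_K} = \frac{1 - q/Q}{q_{K-1} q_K}$, $\lambda = -q_{K-1}$, $\mu = a_{K-1} - \frac{1}{Q}$ in (4.7), and also using
\[ 0 \le \frac{1}{1 + \gamma^2} - \frac{1}{1 + c^2} \le \frac{a_{K-1} - 1/Q}{q_{K-1}} - \frac{a}{q} \le \frac{1}{qq_{K-1}}, \]
we estimate
\[ \begin{aligned} \int_{\alpha_K}^{\alpha_{K-1}} w_{C_K}(\omega)\, d\omega &= \int_{\alpha_K}^{\alpha_{K-1}} \Bigl( a_{K-1} - \frac{1}{Q} - q_{K-1} \tan\omega \Bigr)\, d\omega = \frac{q_{K-1} (1 - q/Q)^2}{2q_{K-1}^2 q_K^2 (1 + c^2)} + O\Bigl( \frac{q_{K-1}}{q_{K-1}^3 q_K^3} \Bigr) \\ &= \frac{(1 - q/Q)^2}{2q_{K-1} q_K^2} \Bigl( \frac{1}{1 + \gamma^2} + O\Bigl( \frac{1}{qq_{K-1}} \Bigr) \Bigr) + O\Bigl( \frac{1}{q_{K-1}^2 q_K^3} \Bigr) = \frac{(1 - q/Q)^2}{2q_{K-1} q_K^2 (1 + \gamma^2)} + O\Bigl( \frac{1}{qq_{K-1}^2 q_K^2} \Bigr). \end{aligned} \]
Using also $0 < q_{K+1} - tQ \le q$, this gives whenever $t > 2$ (so $K \ge 1$),
\[ S_1(\gamma, t) := (q_{K+1} - tQ) \int_{\alpha_K}^{\alpha_{K-1}} w_{C_K}(\omega)\, d\omega = \frac{(q_{K+1} - tQ)(1 - q/Q)^2}{2q_{K-1} q_K^2 (1 + \gamma^2)} + O\Bigl( \frac{1}{q'^2 (q + q')^2} \Bigr). \]
Since $t > 2$, the sum of integrals on $[\alpha_k, \alpha_{k-1}]$ in (3.7) becomes
\[ P^+_{I,Q}(t) := \sum_{\gamma \in \mathcal{F}_Q(I)} \bigl( S_1(\gamma, t) + S_2(\gamma, t) \bigr). \]
Making use of
\[ \sum_{\gamma \in \mathcal{F}_Q} \frac{1}{q'^2 (q + q')^2} \le \sum_{q'=1}^{Q} \frac{1}{q'^2} \sum_{Q - q' < q \le Q} \frac{1}{(q' + q)^2} \le \sum_{q'=1}^{Q} \frac{1}{q'^2} \sum_{k=Q+1}^{\infty} \frac{1}{k^2} \ll \frac{1}{Q} \]
and of
\[ \begin{aligned} \frac{(1 - q/Q)^2 (q_{K+1} - tQ)}{2q_{K-1} q_K^2 (1 + \gamma^2)} + \frac{(1 - q/Q)^2 (q + 2q_K - tQ)}{2qq_K^2 (1 + \gamma^2)} &= \frac{(1 - q/Q)^2 \bigl( q(q_{K+1} + q_{K-1}) + 2q_{K-1} q_K - tQ(q + q_{K-1}) \bigr)}{2qq_{K-1} q_K^2 (1 + \gamma^2)} \\ &= \frac{(1 - q/Q)^2 (2qq_K + 2q_{K-1} q_K - tQ q_K)}{2qq_{K-1} q_K^2 (1 + \gamma^2)} = \frac{(1 - q/Q)^2 (2q_K - tQ)}{2qq_{K-1} q_K (1 + \gamma^2)}, \end{aligned} \]
we find
\[ P^+_{I,Q}(t) = \sum_{\gamma \in \mathcal{F}_Q(I)} \frac{(1 - q/Q)^2 (2q_K - tQ)}{2qq_{K-1} q_K (1 + \gamma^2)} + O\Bigl( \frac{1}{Q} \Bigr). \]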


Fig. 6. The set k ∩ T

Next for each integer k ≥ 1 consider the sets t−y t −1 t −1 2 = k and Ik = , ∩ [0, 1), k = (x, y) ∈ R ; x k k−1 and for q ∈ Q Ik and k ≥ 1, respectively k ≥ 2, the intervals (see Fig. 6) q q q kq q (0) ,1 = ; , ∈ k−1 ∩ T ⊆ 1 − , 1 , = t− Jk,q Q Q Q Q Q kq q q q q q (1) Jk,q = 1 − , t − = ; , ∈ k ∩ T ⊆ 1 − , 1 . Q Q Q Q Q Q (0)

(1)

Note that |Q Jk,q |, |Q Jk,q | < q, that min{k ; |k ∩ T | > 0} = [t] − 1 ≥ 1, and that |Ik | = 0 unless k ≥ [t] ≥ 2. We also consider the function (1 − q/Q)2 (2qk − t Q) 2qqk qk−1 (1 + γ 2 ) (1 − q/Q)2 (2q + 2kq − t Q) = . 2q(q + kq)(q + (k − 1)q)(1 + a 2 /q 2 )

Qk × [0, q] (q, q , a) → f k (q, q , a) =


Using the one-to-one correspondence between the primitive integer points in Q(k ∩T ) and the set of consecutive Farey fractions γ = qa and γ = qa in F Q with [ Q−q q ] = k, we derive using the summation method described in Sect. 2 that + (t) PI,Q

=

∞

k=1

γ = qa ∈F Q (I ) (q,q )∈Q(k ∩T )

with Sk (q) =

f k (q, q , a) =

∞

Sk (q) + Tk (q) + O

k=2 q∈Q Ik

f k (q, q , a), Tk (q) =

(1)

1 Q

,

f k−1 (q, q , a).

(0)

q ∈Q Jk,q , a∈q I −aq =1 (mod q)

q ∈Q Jk,q , a∈q I −aq =1 (mod q) (1)

We aim to estimate Sk (q) and Tk (q) applying Lemma 2 to the intervals I = Q Jk,q , (0)

J = q I and the function f = f k (q, ·, ·), and respectively to I = Q Jk,q , J = q I and f = f k−1 (q, ·, ·). For (q, q ) ∈ Q(k ∩ T ) we have qk ≤ t Q < qk+1 , or equivalently qk−1 < 2qk − t Q ≤ qk . As a result, we see that (here k ≥ 2) f k (q, ·, ·)∞ ≤ f k−1 (q, ·, ·)∞ ≤

sup (1) q ∈Q Jk,q

sup (0)

q ∈Q Jk,q

1 qk 1 < , ≤ sup qqk qk−1 q ∈(Q−q,Q] q(q + q ) qQ qk−1 1 1 1 t . ≤ sup ≤ qqk−1 qk−2 q >(t−2)Q qq (t − 2)q Q qQ

1 The last estimate holds without the factor t−2 whenever k > [t]. In the remainder of this 1 section we will simply write t−2 1 with the understanding that this holds uniformly in t on compacts of (2, ∞). We also need to estimate the L ∞ -norm of D f k . It is easily seen that ∂ fk 2q 1 ∂a (q, ·, ·) ≤ q 2 f k (q, ·, ·)∞ q 2 Q , ∞ ∂ f k−1 1 2q 1 ∂a (q, ·, ·) ≤ q 2 f k−1 (q, ·, ·)∞ (t − 2)q 2 Q q 2 Q , ∞ ∂ fk |2qk qk−1 − (2qk − t Q)(qk + qk−1 )| 1 ∂q (q, ·, ·) ≤ 2q sup (1) 2 qk2 qk−1 ∞ q ∈Q Jk,q 1 qk + qk−1 qk (qk + qk−1 ) sup + sup 2 2 2 qq q qqk qk−1 (1) (1) qqk qk−1 k k−1 q ∈Q J q ∈Q J k,q

sup (1) q ∈Q Jk,q

k,q

1 1 1 1 ≤ sup < ≤ 2 , 2 )2 2 q(q + q q Q q Q qqk−1 q ∈(Q−q,Q]


and similarly ∂ f k−1 1 ∂q (q, ·, ·) q ∞

sup

q ∈[(t−2)Q,Q]


1 1 1 1 ≤ t ≤ 2 . q 2 (t − 2)2 q Q 2 q Q2 q Q

Applying now Lemma 2 to this situation with T = [Q c1 ], where 0 < c1 < determined later, we approximate Sk (q) + Tk (q) within error E k (q) δ Q 2c1 q 1/2+δ

1 2

is to be

1 1 1 + Q c1 q 3/2+δ + Q −c−c1 q 2 Qq Qq 2 Qq 2

δ Q 2c1 −1 q −1/2+δ + Q −1−c−c1 by ϕ(q) q2

(1)

Q Jk,q ×q I

= cI

f k (q, q , a) dq da +

ϕ(q) q2

(0)

Q Jk,q ×q I

f k−1 (q, q , a) dq da

ϕ(q) (1 − q/Q)2 · Wk (q), q 2q

where c I is as in (3.10) and Wk (q) = gk (q, q ) =

(1) Q Jk,q

gk (q, q ) dq +

(0) Q Jk,q

gk−1 (q, q ) dq ,

2qk − t Q , (q, q ) ∈ Q(k ∩ T ). qk qk−1

By a direct computation we find that t Q−kq Q 2qk − t Q 2qk−1 − t Q Wk (q) = W (q) = dq + dq qk qk−1 Q−q t Q−kq qk−1 qk−2 Q+(k−1)q tQ 2y − t Q 2y − t Q dy + dy = y(y − q) Q+(k−1)q y(y − q) t Q−q

tQ t Q y − q

t Q 2y − t Q dy = 2 ln(y − q) − ln =

q y t Q−q y(y − q) y=t Q−q = 2 ln

tQ (t Q − q)2 tQ − q − ln t Q − 2q q t Q(t Q − 2q)

is independent of k. Since the error terms sum up to ∞

(Q 2c1 −1 q −1/2+δ + Q −1−c−c1 ) = Q 2c1 −1

Q

q −1/2+δ + Q 1−1−c−c1

q=1

k=2 q∈Q Ik

E c,c1 ,δ (Q), we arrive at + PI,Q (t) = c I

Q ϕ(q) q=1

q

V (q) + Oδ (E c,c1 ,δ (Q)),

(5.2)
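The closed form of $W(q)$ can be cross-checked against a direct quadrature of $\int_{tQ-q}^{tQ} \frac{2y - tQ}{y(y-q)}\,dy$. The sketch below does this with a midpoint rule; the sample values $t = 3$, $Q = 100$, $q = 40$ and the function names are illustrative only.

```python
import math

def W_closed(t, Q, q):
    """Closed form 2 ln((tQ-q)/(tQ-2q)) - (tQ/q) ln((tQ-q)^2 / (tQ (tQ-2q))), valid for t > 2, 0 < q <= Q."""
    tQ = t * Q
    return (2 * math.log((tQ - q) / (tQ - 2 * q))
            - (tQ / q) * math.log((tQ - q) ** 2 / (tQ * (tQ - 2 * q))))

def W_quadrature(t, Q, q, n=20000):
    """Midpoint-rule value of the integral of (2y - tQ) / (y (y - q)) over [tQ - q, tQ]."""
    tQ = t * Q
    lo, hi = tQ - q, tQ
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += (2 * y - tQ) / (y * (y - q))
    return h * total
```

The partial-fraction decomposition behind the closed form is $\frac{2y - tQ}{y(y-q)} = \frac{tQ/q}{y} + \frac{2 - tQ/q}{y - q}$, which is what the bracketed antiderivative in the display encodes.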


with
\[ V(q) = \frac{(1 - q/Q)^2}{2q}\, W(q), \qquad q \in (0, Q]. \]
For $t > 2$ consider the function
\[ f_t(x) = \frac{\psi(x, t)}{2} = \frac{(1 - x)^2}{2x} \Bigl( 2 \ln\Bigl( 1 + \frac{x}{t - 2x} \Bigr) - \frac{t}{x} \ln\Bigl( 1 + \frac{x^2}{t(t - 2x)} \Bigr) \Bigr), \qquad x \in (0, 1]. \]
Using the Taylor series of the logarithm we obtain for small $x$,
\[ f_t(x) = \frac{(1 - x)^2}{2x} \Bigl( \frac{2x}{t - 2x} - \frac{x^2}{(t - 2x)^2} - \frac{t}{x} \cdot \frac{x^2}{t(t - 2x)} + O(x^3) \Bigr) = (1 - x)^2 \Bigl( \frac{1}{2(t - 2x)} - \frac{x}{2(t - 2x)^2} + O(x^2) \Bigr), \]
which shows that $f_t$ extends to a $C^1$ function on $[0, 1]$, and so
\[ \int_0^1 |f_t'(x)|\, dx \ll 1, \]
uniformly for $t$ in compacts of $(2, \infty)$. The equality $V(Qx) = Q^{-1} f_t(x)$, $x \in (0, 1]$, implies now that both $\| V \|_\infty$ and the total variation of $V$ on $(0, Q]$ are $\ll Q^{-1}$. Thus we may apply Lemma 1 to (5.2) and conclude, also using $c + c_1 < 1$, that
\[ P^+_{I,Q} = \frac{c_I}{\zeta(2)} \int_0^Q V(q)\, dq + O_\delta(E_{c,c_1,\delta}(Q)) = \frac{c_I}{\zeta(2)} \int_0^1 f_t(x)\, dx + O_\delta(E_{c,c_1,\delta}(Q)) = \frac{c_I}{2\zeta(2)} \int_0^1 \psi(x, t)\, dx + O_\delta(E_{c,c_1,\delta}(Q)). \tag{5.3} \]
One can see in a similar way that the contribution of integrals on the intervals $[\beta_{k-1}, \beta_k]$ in (3.7) for $k \ge 1$ and $t > 2$ is
\[ P^-_{I,Q}(t) = \frac{c_I}{2\zeta(2)} \int_0^1 \psi(x, t)\, dx + O_\delta(E_{c,c_1,\delta}(Q)). \tag{5.4} \]
Proposition 4 now follows from (5.3) and (5.4).

6. The Case 1 < t < 2

In this section we prove

Proposition 5. Suppose $I$ is a subinterval of $[0, 1]$ of size $|I| \asymp Q^{-c}$ for $0 < c < 1$. Then for any $c_1$ with $c + c_1 < 1$ and $\delta > 0$,
\[ P_{I,Q}(t) = \frac{c_I}{\zeta(2)} \Bigl( \int_0^{t-1} \psi(x, t)\, dx + \int_{t-1}^1 \phi(x, t)\, dx \Bigr) + O_\delta(E_{c,c_1,\delta}(Q)) \qquad (Q \to \infty), \]
with $\phi$ and $\psi$ as in Theorem 1. The estimate holds uniformly in $t$ on compacts of $(1, 2)$.


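In this case (3.7) gives
\[ \begin{aligned} P_{I,Q}(t) = {} & \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_0}^{\beta_0} w_{C_0}(\omega) \max\{q + q' - tQ, 0\}\, d\omega \\ & + \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\alpha_k}^{\alpha_{k-1}} \Bigl( w_{C_k}(\omega) \max\{q_{k+1} - tQ, 0\} + w_{B_k}(\omega) \max\{q_k - tQ, 0\} \Bigr)\, d\omega \\ & + \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\beta_{k-1}}^{\beta_k} \Bigl( w_{C_{-k}}(\omega) \max\{q'_{k+1} - tQ, 0\} + w_{A_{-k}}(\omega) \max\{q'_k - tQ, 0\} \Bigr)\, d\omega . \end{aligned} \]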

We break the main term above according as to whether q + q > t Q or q + q ≤ t Q. Thus we first estimate > (t) : = PI,Q

γ ∈F Q (I ) q+q >t Q

+

β0 1 − w A+ (ω) − w B− (ω) (q + q − t Q) dω Q α0 ∞ αk−1

wCk (ω)qk+1 + w Bk (ω)qk − t Q wCk (ω) + w Bk (ω) dω

α γ ∈F Q (I ) k=1 k q+q >t Q ∞ βk

+

γ ∈F Q (I ) k=1 q+q >t Q

βk−1

wC−k (ω)qk+1 + w A−k (ω)qk − t Q wC−k (ω) + w A−k (ω) dω.

Using (4.2) and (4.3) we may also write β0 1 > − w A+ (ω) − w B− (ω) (q + q − t Q) dω PI,Q (t) = Q α0 γ ∈F Q (I ) q+q >t Q

+

+

α0

γ ∈F Q (I ) α∞ q+q >t Q

1 1 − w A+ (ω)q − t Q − w A+ (ω) dω Q

β∞

γ ∈F Q (I ) β0 q+q >t Q

1 − w B− (ω)q − t Q

1 Q

− w B− (ω) dω

1 + A

2 + A

3 ,

0 + A =A with

0 = (1 − t) A

(β∞ − α∞ ),

γ ∈F Q (I ) q+q >t Q

2 = A

γ ∈F Q (I ) q+q >t Q

(t Q − q)

1 = A

(t Q − q)

γ ∈F Q (I ) q+q >t Q β∞ α0

w B− (ω) dω,

β0 α∞

w A+ (ω) dω,


Fig. 7. The set ∪∞ k=1 k ∩ T when 1 < t < 2

3 = A

γ ∈F Q (I ) q+q >t Q

β0 α0

q + q − 1 − q w A+ (ω) − qw B− (ω) dω. Q

0 , A

1 , A

2 and A

3 by noticing that (4.5) yields We proceed to estimate A

3 = 0. A

(6.1)

1 is estimated in a similar way as A1 was in Sect. 4, only with the difference Next A that the summation over γ ∈ F Q (I ) is being done under the additional requirement q + q > t Q. This is not going to produce any change in the error, and will only affect the main terms. As in (4.10) and (4.11) we obtain 1 1

+O . A0 = (1 − t) qq (1 + γ 2 ) Q γ ∈F Q (I ) q+q >t Q

Then, as in the proof of Lemma 7, we find that

0 = (1 − t) A

∞

f q (q , a) = c I (1 − t)

q=1 q ∈I :=(t Q−q,Q] a∈J :=q I −aq =1 (mod q)

ϕ(q) V (q) + Oδ (E c,c1 ,δ (Q)), q

(t−1)Q
where this time we take V (q) =

Q 1 ln , q ∈ ((t − 1)Q, Q]. q tQ − q

(x) and the function But V (Qx) = Q −1 V

(x) = 1 ln 1 , V x t−x

x ∈ [t − 1, 1],


is C 1 on [t − 1, 1]. Hence the L ∞ -norm and the total variation of V on the interval [(t − 1)Q, Q] are Q −1 , uniformly in t on compacts of (1, 2). Lemma 1 applies now and yields Q

0 = c I (1 − t) V (q) dq + Oδ (E c,c1 ,δ (Q)) A ζ (2) (t−1)Q 1 c I (1 − t) 1 1 ln d x + Oδ (E c,c1 ,δ (Q)). = (6.2) ζ (2) x t − x t−1 Proceeding as in Sect. 4 (see (4.18)–(4.20)) we find

1 = A

(t−1)Q
tQ − q 2q Q 2

1 1 +O 1 + a 2 /q 2 Q

t Q−q

ϕ(q) (t Q − q) q − (t − 1)Q + Oδ (E c,c1 ,δ (Q)) 2 q (t−1)Q
=

cI 2Q 2

This immediately gives

1 = A

cI 2ζ (2)

1 t−1

(t − x)(x − t + 1) d x + Oδ (E c,c1 ,δ (Q)). x

(6.3)

(t − x)(x − t + 1) d x + Oδ (E c,c1 ,δ (Q)). x

(6.4)

In a similar way we find

2 = A

cI 2ζ (2)

1 t−1

From (6.1)–(6.4) we now collect 1 1 cI (t − x)(x − t + 1) c I (1 − t) 1 1 > ln dx + dx (t) = PI,Q ζ (2) x t − x ζ (2) x t−1 t−1 +Oδ (E c,c1 ,δ (Q)). (6.5) It remains to estimate the contribution of Farey fractions of order Q with q + q ≤ t Q to PI,Q (t), which is < PI,Q (t) := B1 + B2 , where B1 denotes ∞ αk−1 wCk (ω) max{qk+1 − t Q, 0} + w Bk (ω) max{qk − t Q, 0} dω, γ ∈F Q (I ) k=1 αk q+q ≤t Q

and B2 denotes ∞ γ ∈F Q (I ) k=1 q+q ≤t Q

βk

βk−1

wC−k (ω) max{qk+1 − t Q, 0} + w A−k (ω) max{qk − t Q, 0} dω.


In this case one also has K =

t Q − q ≥ 1, q

and as in Sect. 5 we find B1 + Oδ (E c,c1 ,δ (Q)) ∞ αk−1 = wCk (ω) max{qk+1 − t Q, 0} + w Bk max{qk − t Q, 0} dω

=

k=1 γ ∈F Q (I ) αk (q,q )∈Q(k ∩T ) ∞

f 1 (q, q , a)

(t−1)Q
k=2 q∈Q Ik

(1 − x)2 t−x 2(x + y) − t dy dx x x y(x + y) 0 t−1 1−x t−1 1 t 1 1 (1 − x)2 t−x 2 cI cI − − d y d x, = ψ(x, t) d x + 2ζ (2) 0 2ζ (2) t−1 x y x y y+x 1−x =

cI 2ζ (2)

Sk (q) + Tk (q) +

t−1

ψ(x, t) d x +

and thus B1 =

cI 2ζ (2)

cI + 2ζ (2)

cI 2ζ (2)

1

1

(1 − x)2 x

t−1 t−1 0

t−x t t−x 2 ln − ln dx 1−x x t (1 − x)

ψ(x, t) d x + Oδ (E c,c1 ,δ (Q)).

In a similar way we find that B2 can too be expressed as in (6.6), and thus 1 t−x t t−x (1 − x)2 cI < 2 ln − ln dx (t) = PI,Q ζ (2) t−1 x 1−x x t (1 − x) t−1 cI + ψ(x, t) d x + Oδ (E c,c1 ,δ (Q)). ζ (2) 0

(6.6)

(6.7)

Proposition 5 follows now from (6.5) and (6.7). 7. Proof of Theorem 1

We may assume without loss of generality that ω ∈ 0, π4 , thus estimate for small ε > 0 the quantity

π 4

t

. Pε (t) = (x, ω) ∈ Yε × 0, ; τε (x, ω) > π 4 2ε

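We partition the interval $[0, 1]$ as a union of $N$ intervals $I_j = [\tan\omega_j, \tan\omega_{j+1}]$ of equal size, with $N = [\varepsilon^{-c}]$, thus with $|I_j| = \frac{1}{N} \asymp \varepsilon^c$, where $0 < c < 1$ is to be chosen later. For each $j$ we set
\[ Q_j^- = \Bigl[ \frac{\cos\omega_{j+1}}{2\varepsilon + 2\varepsilon^{c+1}} \Bigr], \qquad Q_j^+ = \Bigl[ \frac{\cos\omega_j}{2\varepsilon - 2\varepsilon^{c+1}} \Bigr] + 1. \]
Since $\omega_j \in [0, \frac{\pi}{4}]$, we have $Q_j^\pm \asymp \varepsilon^{-1}$, and thus $|I_j| \asymp \varepsilon^c$. Moreover, for $\omega \in [\omega_j, \omega_{j+1}]$ we have
\[ \frac{1}{2Q_j^+} \le \frac{\varepsilon - \varepsilon^{c+1}}{\cos\omega_j} < \frac{\varepsilon}{\cos\omega_j} \le \frac{\varepsilon}{\cos\omega} \le \frac{\varepsilon}{\cos\omega_{j+1}} \le \frac{\varepsilon + \varepsilon^{c+1}}{\cos\omega_{j+1}} \le \frac{1}{2Q_j^-} . \tag{7.1} \]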
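The bracketing (7.1) is what allows the slit model with $Q = Q_j^\pm$ to sandwich the original problem. A small numerical sketch of the definitions follows, assuming the indexing $\tan\omega_j = j/N$ for $j = 0, \dots, N-1$ (an illustrative convention) and evaluating the chain only at the two endpoints of $[\omega_j, \omega_{j+1}]$; a tiny tolerance guards against floating-point rounding.

```python
import math

def q_plus_minus(eps, c, j, N):
    """Return (w_j, w_{j+1}, Q_j^+, Q_j^-) for the j-th interval, assuming tan(w_j) = j / N."""
    w_j = math.atan(j / N)
    w_j1 = math.atan((j + 1) / N)
    q_plus = math.floor(math.cos(w_j) / (2 * eps - 2 * eps ** (c + 1))) + 1
    q_minus = math.floor(math.cos(w_j1) / (2 * eps + 2 * eps ** (c + 1)))
    return w_j, w_j1, q_plus, q_minus

def chain_71_holds(eps, c, j, N, tol=1e-12):
    """Check the chain (7.1) at the endpoints w_j and w_{j+1}."""
    w_j, w_j1, q_plus, q_minus = q_plus_minus(eps, c, j, N)
    terms = [
        1 / (2 * q_plus),
        (eps - eps ** (c + 1)) / math.cos(w_j),
        eps / math.cos(w_j),
        eps / math.cos(w_j1),
        (eps + eps ** (c + 1)) / math.cos(w_j1),
        1 / (2 * q_minus),
    ]
    return all(x <= y * (1 + tol) for x, y in zip(terms, terms[1:]))
```

For example, with $\varepsilon = 10^{-3}$ and $c = 1/8$ one has $N = [\varepsilon^{-c}] = 2$, and both intervals satisfy the chain.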

From the definition of Q ±j and from | cos y − cos x| ≤ | sin(x − y)| ≤ | tan x − tan y|,

x, y ∈ [0, π/4],

(7.2)

we infer that Q +j − Q −j

cos ω j cos ω j+1 εc+1 cos ω j − cos ω j+1 − + εc−1 2ε − 2εc+1 2ε + 2εc+1 ε 2ε

and Q ±j =

cos ω j + O(εc−1 ). 2ε

(7.3)

ω Remark 2. If ω ∈ 0, π4 and λ± are such that λ− < cos 2ε < λ+ , then for all x ∈ Yε we have τ1/(2λ− ) (x, ω) − ε.

τ1/(2λ+ ) (x, ω) + ε > τε (x, ω) > This shows in turn that if for each interval I = [tan ω0 , tan ω1 ] ⊆ [0, 1] we denote

t

, Pε,I :=

(x, ω) ; x ∈ Yε , tan ω ∈ I, τε (x, ω) > 2ε

then for any integers Q ± such that Q − < cos2εω1 < cos2εω0 < Q + we have t +ε t −ε 2

− π ε ≤ Pε,I (t) ≤ P I,Q + . P I,Q − 2ε 2ε By the previous remark we infer t +ε t −ε

P I j ,Q − − π ε2 ≤ Pε,I j (t) ≤ , P I j ,Q +j j 2ε 2ε

j = 1, . . . , N .

(7.4)

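For small $\varepsilon > 0$ we also have
\[
\frac{t}{2\varepsilon + 2\varepsilon^{c+1}} < \frac{t - \varepsilon}{2\varepsilon} < \frac{t + \varepsilon}{2\varepsilon} < \frac{t}{2\varepsilon - 2\varepsilon^{c+1}},
\]
uniformly in $t$ on compacts of $(0, \infty)$. Thus (7.4), (7.3), and Lemma 5 yield
\[
P_{\varepsilon,I_j}(t) \le \tilde P_{I_j,Q_j^+}\!\left( \frac{t}{2\varepsilon + 2\varepsilon^{c+1}} \right) = P_{I_j,Q_j^+}\!\left( \frac{t}{1 + \varepsilon^{c}} \right) + O(\varepsilon^{2c}) \tag{7.5}
\]
and
\[
P_{\varepsilon,I_j}(t) \ge \tilde P_{I_j,Q_j^-}\!\left( \frac{t}{2\varepsilon - 2\varepsilon^{c+1}} \right) - \pi\varepsilon^2 = P_{I_j,Q_j^-}\!\left( \frac{t}{1 - \varepsilon^{c}} \right) + O(\varepsilon^{2c}). \tag{7.6}
\]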


F. P. Boca, A. Zaharescu

By the definition of $P$ we see that for any compact interval $K \subset (0, \infty) \setminus \{1, 2\}$, there exists a constant $C_K > 0$ such that
\[
|P(t_1) - P(t_2)| \le C_K |t_1 - t_2|, \qquad t_1, t_2 \in K. \tag{7.7}
\]

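Now by Propositions 3, 4, 5 we know that for any $j \in \{1, 2, \dots, N\}$ and small $\varepsilon > 0$,
\[
P_{I_j,Q_j^\pm}\!\left( \frac{t}{1 \pm \varepsilon^{c}} \right) = P_{I_j,Q_j^\pm}\big( t(1 + O(\varepsilon^{c})) \big) = c_{I_j}\big( P(t) + O(\varepsilon^{c}) \big) + O_\delta\big( \varepsilon^{2c} + \varepsilon^{c + c_1} + \varepsilon^{1/2 - 2c_1 - \delta} \big),
\]
uniformly in $t$ on compacts of $(0, \infty) \setminus \{1, 2\}$. Here $P(t)$ is defined as in Theorem 1.1. Summing over $j$ the inequalities (7.5) and (7.6), and using also
\[
\sum_{j=1}^{N} c_{I_j} = \int_0^1 \frac{du}{1 + u^2} = \frac{\pi}{4}, \tag{7.8}
\]
$N \le \varepsilon^{-c}$, and (7.7), we gather
\[
\sum_{j=1}^{N} P_{\varepsilon,I_j}(t) = \frac{\pi}{4} P(t) + O_\delta\big( \varepsilon^{c} + \varepsilon^{c_1} + \varepsilon^{1/2 - 2c_1 - c - \delta} \big).
\]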
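The three exponents $c$, $c_1$ and $1/2 - 2c_1 - c$ in the error term above compete with each other, and the weakest of them governs the final rate. The sketch below (hypothetical function names, the arbitrarily small $\delta$ dropped, and a finite grid of candidate values as an assumption) searches for the pair $(c, c_1)$ that maximizes the smallest exponent, using exact rational arithmetic.

```python
from fractions import Fraction

def error_exponent(c, c1):
    """Smallest power of epsilon in O(eps^c + eps^c1 + eps^(1/2 - 2*c1 - c))."""
    return min(c, c1, Fraction(1, 2) - 2 * c1 - c)

def best_exponents(denominator=64):
    """Exhaustive search over c, c1 in {1/d, 2/d, ...} below 1/2, with d = denominator.

    Returns the tuple (exponent, c, c1) with the largest exponent.
    """
    grid = [Fraction(k, denominator) for k in range(1, denominator // 2)]
    return max((error_exponent(c, c1), c, c1) for c in grid for c1 in grid)
```

Since $c \ge m$, $c_1 \ge m$ and $1/2 - 2c_1 - c \ge m$ together force $4m \le 1/2$, the search can do no better than $m = 1/8$, attained only at $c = c_1 = 1/8$.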

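For obvious symmetry reasons it suffices to consider $\omega \in [0, \frac{\pi}{4}]$. Thus, after normalizing the Lebesgue measure $\mu_\varepsilon$ on $Y_\varepsilon$ by dividing by $\frac{\pi}{4}\,\mathrm{area}(Y_\varepsilon) = \frac{\pi(1 - \pi\varepsilon^2)}{4}$, we get
\[
P_\varepsilon(t) = P(t) + O_\delta\big( \varepsilon^{c} + \varepsilon^{c_1} + \varepsilon^{1/2 - 2c_1 - c - \delta} \big).
\]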

The proof of Theorem 1 is completed by taking $c = c_1 = \frac{1}{8}$.

8. The Geometric Free Path Length in the Case 0 < t ≤ 1

In this and the next two sections we shall take $\omega \in [0, \frac{\pi}{4}]$, and analyze the geometric free path length in the case of vertical scatterers of height $2\delta$ centered at integer lattice points. In this setup we will consider the phase space $\big( \tilde\Sigma_{\delta,I}, \frac{d\mu}{2\delta} \big)$, where $\delta > 0$, $I = [\tan \omega_0, \tan \omega_1] \subseteq [0, 1]$ is an interval, $\tilde\Sigma_{\delta,I} = [-\delta, \delta] \times [\omega_0, \omega_1]$ and $d\mu$ is the (non-normalized) Lebesgue measure on $\tilde\Sigma_{\delta,I}$. The trajectory will therefore start at a point $(0, y)$, $y \in [-\delta, \delta]$, under angle $\omega$, with $\tan \omega \in I$. Recall that the free path length is denoted by $\tilde\tau_\delta$ in this case. Given $\lambda > 0$, consider
\[
\tilde G_{\delta,I}(\lambda) = \frac{1}{2\delta} \big| \{ (y, \omega) \in \tilde\Sigma_{\delta,I} \,;\, \tilde\tau_\delta(y, \omega) > \lambda \} \big| = \frac{1}{2\delta} \int_{\omega_0}^{\omega_1} \int_{-\delta}^{\delta} e_\lambda\big( \tilde\tau_\delta(y, \omega) \big)\, dy\, d\omega. \tag{8.1}
\]
Actually it will suffice to take $\delta = \frac{1}{2Q}$ for properly chosen integers $Q$. The first goal will be to estimate the distribution of the free path length $\tilde\tau_{1/(2Q)}(x, \omega)$ when we average over $(x, \omega) \in \tilde\Sigma_{1/(2Q),I}$, under the assumptions that $I = [\tan \omega_0, \tan \omega_1] \subseteq [0, 1]$ is a


short interval of length $|I| \asymp \varepsilon^{1/8}$ for small $\varepsilon$, and that $Q$ is a (large) integer such that $Q = \frac{\cos \omega_0}{2\varepsilon} + O(\varepsilon^{1/8 - 1})$. Concretely, we will be interested in the quantity

I,Q (t) := G

1/(2Q),I (t) = Q|{(x, ω) ∈

1/(2Q),I ; G τ1/(2Q) (x, ω) > t}|. In the remainder of the paper we take c = c1 = 18 . We set 1 if x < λ;

λ (x) = e(−∞,λ) (x) = 0 if x ≥ λ. A direct application of Propositions 1 and 2, with widths w given by (3.8) and αk , βk by (4.1), yields the following formula, derived from (3.5) by replacing max{q − x, 0} with Q q (x), and valid for any t, ε∗ > 0:

I,Q G

t 2ε∗

=Q

∞

αk−1

γ ∈F Q (I ) k=1 αk

wCk (ω) qk+1

+ eight other terms where

t cos ω 2ε∗

t cos ω 2ε∗

dω

appears

in an analogous way.

(8.2)

This quantity will be compared with the one obtained by substituting t Q in place of in (8.2), as in Sect. 3. For this purpose we shall consider

t cos ω 2ε∗

G I,Q (t) := Q

∞

γ ∈F Q (I ) k=1

+ Q

A− Q,γ ,k (t, ω) dω

∞

γ ∈F Q (I ) k=1

βk

βk−1

+Q

β0

γ ∈F Q (I ) α0

(0)

A Q,γ ,k (t, ω) dω

A+Q,γ ,k (t, ω) dω,

(8.3)

with A− Q,γ ,k (t, ω) = wCk (ω) qk+1 (t Q) + w Bk (ω) qk (t Q) + w A+ (ω) q (t Q), (0)

A Q,γ ,k (t, ω) = wC0 (ω) q+q (t Q) + w B0 (ω) q (t Q) + w A0 (ω) q (t Q), (t Q) + w A (ω) q (t Q) + w B (ω) q (t Q). A+Q,γ ,k (t, ω) = wC−k (ω) qk+1 − −k k

Remark 3. If I = [tan ω0 , tan ω1 ] ⊆ [0, 1] and 0 < λ− ≤ cos2εω1 < cos2εω0 ≤ λ+ , then owing to (8.2), (8.3) and to the fact that x → λ (x) is monotonically decreasing we have tλ+

I,Q t ≤ G I,Q tλ− . ≤G G I,Q Q 2ε Q The argument, based on inequality (3.9) used to compare P I,Q 2εt with PI,Q (t) in Lemma 5, is not going to apply here because λ is not a Lipschitz function. Nevertheless, we can overcome this problem by appealing again to a soft monotonicity argument, based on Remark 3 and on the fact (which can be seen directly from the definition of the function G(t)) that for any compact K ⊂ (0, ∞) \ {1, 2}, there exists a constant C K > 0 such that |G(t1 ) − G(t2 )| ≤ C K |t1 − t2 |, t1 , t2 ∈ K . (8.4)


In this and the next two sections we will analyze the asymptotic of the quantity G I,Q (t) for large integers Q and short intervals I such that |I | Q −1/8 . We note at this point that the relation (1.3) is hinted by formula (8.3) and by

d max{q − t Q, 0} = −Q q (t Q), dt

t =

q . Q

For the sake of space, the error estimates which are similar to the ones already derived in the first part of the paper will only be sketched.

Proposition 6. For every interval $I \subseteq [0, 1]$ of size $|I| \asymp Q^{-1/8}$ and $\delta > 0$,
\[
G_{I,Q}(t) = \left( 1 - \frac{t}{\zeta(2)} \right) c_I + O_\delta(Q^{-1/4 + \delta}) \qquad (Q \to \infty).
\]

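The estimate holds uniformly in $t \in (0, 1]$.

Proof. Since $0 < t \le 1$, we have $\min\{q_k, q_k'\} \ge q + q' > tQ$ for all $k \ge 1$. Thus we infer from (8.3), as in formula (4.4), that
\[
G_{I,Q}(t) = G^{(1)}_{I,Q}(t) + G^{(2)}_{I,Q}(t) + G^{(3)}_{I,Q}(t),
\]
with
\[
\begin{aligned}
G^{(1)}_{I,Q}(t) :={}& Q \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\alpha_k}^{\alpha_{k-1}} \big( w_{C_k}(\omega) + w_{B_k}(\omega) \big)\, d\omega + Q \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_0}^{\beta_0} w_{C_0}(\omega)\, d\omega \\
& + Q \sum_{\gamma \in \mathcal{F}_Q(I)} \sum_{k=1}^{\infty} \int_{\beta_{k-1}}^{\beta_k} \big( w_{C_{-k}}(\omega) + w_{A_{-k}}(\omega) \big)\, d\omega \\
={}& Q \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_\infty}^{\alpha_0} \Big( \frac{1}{Q} - w_{A_+}(\omega) \Big)\, d\omega + Q \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\beta_0}^{\beta_\infty} \Big( \frac{1}{Q} - w_{B_-}(\omega) \Big)\, d\omega \\
& + Q \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_0}^{\beta_0} \Big( \frac{1}{Q} - w_{A_+}(\omega) - w_{B_-}(\omega) \Big)\, d\omega \\
={}& \sum_{\gamma \in \mathcal{F}_Q(I)} (\beta_\infty - \alpha_\infty) - Q \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_\infty}^{\beta_0} w_{A_+}(\omega)\, d\omega - Q \sum_{\gamma \in \mathcal{F}_Q(I)} \int_{\alpha_0}^{\beta_\infty} w_{B_-}(\omega)\, d\omega,
\end{aligned}
\]
\[
\begin{aligned}
G^{(2)}_{I,Q}(t) :={}& Q \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q > tQ}} \sum_{k=1}^{\infty} \int_{\alpha_k}^{\alpha_{k-1}} w_{A_+}(\omega)\, d\omega + Q \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q > tQ}} \int_{\alpha_0}^{\beta_0} w_{A_+}(\omega)\, d\omega \\
={}& Q \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q > tQ}} \int_{\alpha_\infty}^{\beta_0} w_{A_+}(\omega)\, d\omega,
\end{aligned}
\]
\[
\begin{aligned}
G^{(3)}_{I,Q}(t) :={}& Q \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q' > tQ}} \sum_{k=1}^{\infty} \int_{\beta_{k-1}}^{\beta_k} w_{B_-}(\omega)\, d\omega + Q \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q' > tQ}} \int_{\alpha_0}^{\beta_0} w_{B_-}(\omega)\, d\omega \\
={}& Q \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q' > tQ}} \int_{\alpha_0}^{\beta_\infty} w_{B_-}(\omega)\, d\omega.
\end{aligned}
\]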

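From (4.12) we gather
\[
\sum_{\gamma \in \mathcal{F}_Q(I)} (\beta_\infty - \alpha_\infty) = \sum_{\gamma \in \mathcal{F}_Q(I)} \Big( \arctan \frac{a'}{q'} - \arctan \frac{a}{q} \Big) = c_I + O_\delta(Q^{-1/4 + \delta}). \tag{8.5}
\]
On the other hand, (4.17) gives
\[
Q \int_{\alpha_\infty}^{\beta_0} w_{A_+}(\omega)\, d\omega = \frac{1}{2qQ(1 + \gamma^2)} + O\Big( \frac{1}{q^2 Q^2} \Big). \tag{8.6}
\]
We can show in a similar way that
\[
Q \int_{\alpha_0}^{\beta_\infty} w_{B_-}(\omega)\, d\omega = \frac{1}{2q'Q(1 + \gamma^2)} + O\Big( \frac{1}{q'^2 Q^2} \Big). \tag{8.7}
\]
From the formulas for $G^{(1)}_{I,Q}$, $G^{(2)}_{I,Q}$, $G^{(3)}_{I,Q}$ and from (8.5)–(8.7) we infer
\[
\begin{aligned}
G_{I,Q}(t) &= c_I - \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q \le tQ}} \Big( \frac{1}{2qQ(1 + \gamma^2)} + O\Big( \frac{1}{q^2 Q^2} \Big) \Big) - \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q' \le tQ}} \Big( \frac{1}{2q'Q(1 + \gamma^2)} + O\Big( \frac{1}{q'^2 Q^2} \Big) \Big) + O_\delta(Q^{-1/4 + \delta}) \\
&= c_I - \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q \le tQ}} \frac{1}{2qQ(1 + \gamma^2)} - \sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q' \le tQ}} \frac{1}{2q'Q(1 + \gamma^2)} + O_\delta(Q^{-1/4 + \delta}).
\end{aligned} \tag{8.8}
\]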


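Finally we show as at the end of Sect. 4 that
\[
\begin{aligned}
\sum_{\substack{\gamma \in \mathcal{F}_Q(I) \\ q \le tQ}} \frac{1}{2qQ(1 + \gamma^2)} &= \frac{1}{2Q} \sum_{1 \le q \le tQ} \frac{1}{q} \sum_{\substack{q' \in (Q - q, Q],\ a \in qI \\ -aq' \equiv 1 \ (\mathrm{mod}\ q)}} \frac{1}{1 + a^2/q^2} \\
&= \frac{1}{2Q} \sum_{1 \le q \le tQ} \frac{1}{q} \cdot \frac{\varphi(q)}{q^2} \cdot q^2 c_I + O_\delta(Q^{-1/4 + \delta}) \\
&= \frac{c_I}{2Q} \sum_{1 \le q \le tQ} \frac{\varphi(q)}{q} + O_\delta(Q^{-1/4 + \delta}) \\
&= \frac{c_I t}{2\zeta(2)} + O_\delta(Q^{-1/4 + \delta}).
\end{aligned}
\]
A similar formula holds for the second sum in (8.8), and therefore we get
\[
G_{I,Q}(t) = c_I - \frac{c_I t}{\zeta(2)} + O_\delta(Q^{-1/4 + \delta}).
\]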

It is clear that these estimates hold uniformly in t ∈ [0, 1].
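The last step of the computation above rests on the classical asymptotic $\sum_{q \le x} \varphi(q)/q = x/\zeta(2) + O(\ln x)$. A minimal numerical sketch of this fact follows; the function names are hypothetical and the sieve is one standard way to tabulate Euler's totient, not necessarily the route taken in Sect. 4.

```python
import math

def totients(n):
    """Euler's phi(0..n) computed with a sieve over primes."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                      # p untouched so far, hence prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p        # multiply phi[m] by (1 - 1/p)
    return phi

def normalized_totient_sum(Q, t):
    """(1/Q) * sum of phi(q)/q over 1 <= q <= t*Q."""
    limit = math.floor(t * Q)
    phi = totients(limit)
    return sum(phi[q] / q for q in range(1, limit + 1)) / Q
```

For large `Q` the value of `normalized_totient_sum(Q, t)` should be close to `t / zeta(2) = 6 t / pi**2`, which is the source of the main term $c_I t / (2\zeta(2))$.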

9. The Geometric Free Path in the Case t > 2

In this section we prove in the setting of Sect. 8 the following result.

Proposition 7. For every interval $I \subseteq [0, 1]$ of size $|I| \asymp Q^{-1/8}$ and $\delta > 0$,
\[
G_{I,Q}(t) = \frac{c_I}{\zeta(2)} \int_0^1 \frac{(1 - x)^2}{x^2} \ln \frac{(t - x)^2}{t(t - 2x)}\, dx + O_\delta(Q^{-1/4 + \delta}) \qquad (Q \to \infty).
\]
The estimate holds uniformly in $t$ on compacts of $(2, \infty)$.

Proof. We proceed as in Sect. 5, estimating first

S2 (γ , t) := Q

∞

αk−1

k=K +1 αk

wCk (ω) + w Bk (ω) dω

arctan a K −1/Q qK 1 = Q − w A+ (ω) dω = Q (q tan ω − a) dω a Q α∞ arctan q Q(1 − q/Q)2 Q(1 − q/Q)2 q Q Q = +O 2 3 = +O 2 3 , 2q 2 q K2 (1 + γ 2 ) q qK 2qq K2 (1 + γ 2 ) q qK

αK

(9.1)


and then

S1 (γ , t) = Q

α K −1

αK


wC K (ω) dω

1 a dω − q tan ω − K −1 K −1 a −1/Q Q arctan K q K Qq K −1 Q(1 − q/Q)2 q K −1 = +O 3 2(1 + γ 2 )q K2 −1 q K2 q K −1 q K3 1 Q(1 − q/Q)2 +O 2 2 . = 2(1 + γ 2 )q K −1 q K2 q q1

=Q

arctan

a K −1 −1/Q q K −1

(9.2)

In this case we find from (8.3), (9.1) and (9.2), as in Sect. 5, that Q(1 − q/Q)2 (q K −1 + q) ln Q + O G I,Q (t) = 2 Q 2(1 + γ 2 )qq K −1 q K2 γ ∈F Q (I ) Q(1 − q/Q)2 ln Q = + O (1 + γ 2 )qq K −1 q K Q γ ∈F Q (I )

=

∞

k (q) + O ln Q ,

Sk (q) + T Q k=2 q∈Q Ik

with

Sk (q) =

k (q) =

f k (q, q , a), T

(1)

q ∈Q Jk,q , a∈q I −aq =1 (mod q)

f k (q, q , a) =

f k−1 (q, q , q − a),

(0)

q ∈Q Jk,q , a∈q I −aq =1 (mod q)

Q(1 − q/Q)2 (1) , q ∈ I = Q Jk,q , a ∈ J = q I. (1 + a 2 /q 2 )qqk−1 qk

Employing the same technique as in Sect. 5 and the fact that one gets a similar result while integrating between βk−1 and βk , we find that G I,Q (t) can be expressed, up to an error term of order Oδ (Q −1/4+δ ), as ∞ ϕ(q)

f k (q, q , a) dq da + f k−1 (q, q , a) dq da (1) (0) q2 Q Jk,q ×q I Q Jk,q ×q I k=2 q∈Q Ik ∞

Q ϕ(q) Q(1 − q/Q)2 t Q−kq dq dq · + q q qk−1 qk Q−q t Q−kq qk−2 qk−1 k=2 q∈Q Ik Q+(k−1)q tQ ∞ ϕ(q) Q(1 − q/Q)2 dy dy · + = cI . q q y(y − q) Q+(k−1)q y(y − q) t Q−q

= cI

k=2 q∈Q Ik

= cI

Q ϕ(q) q=1

q

·

Q(1 − q/Q)2 (t Q − q)2 . ln q2 t Q(t Q − 2q)


This is further equal to
\[
\frac{c_I}{\zeta(2)} \int_0^Q \frac{Q(1 - q/Q)^2}{q^2} \ln \frac{(tQ - q)^2}{tQ(tQ - 2q)}\, dq
= \frac{c_I}{\zeta(2)} \int_0^Q \frac{Q\big(1 - \frac{q}{Q}\big)^2}{q^2} \ln \frac{(tQ - q)^2}{tQ(tQ - 2q)}\, dq
= \frac{c_I}{\zeta(2)} \int_0^1 \frac{(1 - x)^2}{x^2} \ln \frac{(t - x)^2}{t(t - 2x)}\, dx,
\]
which is the desired conclusion.

10. The Geometric Free Path in the Case 1 < t < 2

In this section we prove in the setting of Sect. 8 the following result.

Proposition 8. For every interval $I \subseteq [0, 1]$ of size $|I| \asymp Q^{-1/8}$ and $\delta > 0$,
\[
\begin{aligned}
G_{I,Q}(t) ={}& \frac{c_I}{\zeta(2)} \int_{t-1}^{1} \frac{1}{x} \ln \frac{1}{t - x}\, dx + \frac{c_I}{\zeta(2)} \bigg( -2 + t + (t - 1) \ln \frac{1}{t - 1} \\
& + \int_{t-1}^{1} \frac{(1 - x)^2}{x^2} \ln \frac{t - x}{t(1 - x)}\, dx + \int_0^{t-1} \frac{(1 - x)^2}{x^2} \ln \frac{(t - x)^2}{t(t - 2x)}\, dx \bigg) \\
& + O_\delta(Q^{-1/4 + \delta}) \qquad (Q \to \infty).
\end{aligned}
\]
The estimate holds uniformly in $t$ on compacts of $(1, 2)$.

Proof. Since $1 < t < 2$, we have $\max\{q, q'\} \le tQ$ and we infer from (8.2) that $G_{I,Q}(t) = G^{>}_{I,Q}(t) + G^{<}_{I,Q}(t)$, where $G^{>}_{I,Q}(t)$, respectively $G^{<}_{I,Q}(t)$, contains the contribution of Farey fractions in $\mathcal{F}_Q(I)$ with $q + q' > tQ$, respectively with $q + q' \le tQ$. When $q + q' > tQ$ we have $\min\{q_k, q_k'\} > tQ$, $k \ge 1$, and therefore

G> I,Q (t) := Q

∞ αk−1

α γ ∈F Q (I ) k=1 k q+q >t Q

β0

γ ∈F Q (I ) q+q >t Q

α0

+Q

=Q

γ ∈F Q (I ) q+q >t Q

+Q

=

wC0 (ω) dω + Q

∞ βk wC−k (ω) + w A−k (ω) dω

γ ∈F Q (I ) k=1 q+q >t Q

βk−1

β0 1 − w A+ (ω) − w B− (ω) dω + Q Q α0

β0

γ ∈F Q (I ) q+q >t Q

α∞

(β∞ − α∞ ) − Q

γ ∈F Q (I ) q+q >t Q

α0 1 − w A+ (ω) dω Q α∞

γ ∈F Q (I ) q+q >t Q

wCk (ω) + w Bk (ω) dω

γ ∈F Q (I ) q+q >t Q

1 − w B− (ω) dω Q

β∞

γ ∈F Q (I ) q+q >t Q

α0

w A+ (ω) dω − Q

w B− (ω) dω.


Standard considerations as in Sect. 6 and 8 show that, uniformly in t on compacts of (1, 2) and up to an error term of order O(Q −1 ln Q), G > I,Q (t) can be expressed as 1 1 1 − + qq (1 + γ 2 ) 2q Q(1 + γ 2 ) 2q Q(1 + γ 2 ) γ ∈F Q (I ) q+q >t Q

=

γ ∈F Q (I ) q+q >t Q

=

1 Q

γ ∈F Q (I ) q+q >t Q

1 1 − qq (1 + γ 2 ) q Q(1 + γ 2 )

(t−1)Q
1 q

=

1 Q

γ ∈F Q (I ) q+q >t Q

Q − q qq (1 + γ 2 )

Q − q

t Q−q
q (1 + a 2 /q 2 )

Q 1 ϕ(q) Q − q dq + Oδ (Q −1/4+δ ) · 2 qc I q q q t Q−q (t−1)Q

1 Q

Next we estimate G< I,Q (t)

:= Q

∞

αk−1

wCk (ω) + w Bk (ω) dω

γ ∈F Q (I ) k=1 αk q+q ≤t Q ∞ βk

+Q

γ ∈F Q (I ) k=1 q+q ≤t Q

βk−1

wC−k (ω) + w A−k (ω) dω,

and find as in Sects. 5, 6 and 9 that G < I,Q (t) can be expressed, up to an error term of −1/4+δ order Oδ (Q ), as 2

∞

k (q) + 2 Sk (q) + T

q∈Q I1 q ∈Q J (1) , a∈q I 1,q −aq =1 (mod q)

k=2 q∈Q Ik

cI = ζ (2) cI = ζ (2)

t Q−q Q cI Q(1 − q/Q)2 (t − x)2 d x + dq dq ln x2 t (t − 2x) ζ (2) (t−1)Q Q−q qq (q + q ) 1 (1 − x)2 (1 − x)2 (t − x)2 t−x ln ln d x + . x2 t (t − 2x) x2 t (1 − x) t−1

t−1(1 −

0

t−1 0

f 1 (q, q , a)

x)2


We conclude the proof by adding the formulas for $G^{<}_{I,Q}(t)$ and $G^{>}_{I,Q}(t)$.
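The main terms of Propositions 6, 7 and 8 should fit together continuously at $t = 1$ and $t = 2$, which gives a quick sanity check on the three formulas. The sketch below evaluates the main term of $G_{I,Q}(t)/c_I$ numerically. It is an illustration under assumptions: the function names are hypothetical, a plain midpoint rule stands in for any more careful quadrature, and the value at $t = 2$ (excluded from the propositions) is simply taken from the formula for $t > 2$.

```python
import math

ZETA2 = math.pi ** 2 / 6

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule; it never evaluates f at the endpoints a and b."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def limit_g(t):
    """Main term of G_{I,Q}(t) / c_I according to Propositions 6, 7 and 8."""
    if t <= 1:
        return 1 - t / ZETA2

    def far(x):
        # (t - x)^2 = t (t - 2x) + x^2, so the logarithm is log1p of a small term
        return (1 - x) ** 2 / x ** 2 * math.log1p(x * x / (t * (t - 2 * x)))

    if t >= 2:
        return midpoint(far, 0.0, 1.0) / ZETA2

    def near(x):
        return (1 - x) ** 2 / x ** 2 * math.log((t - x) / (t * (1 - x)))

    def first(x):
        return math.log(1 / (t - x)) / x

    bracket = (midpoint(first, t - 1, 1.0)
               - 2 + t + (t - 1) * math.log(1 / (t - 1))
               + midpoint(near, t - 1, 1.0)
               + midpoint(far, 0.0, t - 1))
    return bracket / ZETA2
```

At $t = 1$ the check reduces to $\int_0^1 \frac{1}{x} \ln \frac{1}{1 - x}\, dx = \zeta(2)$, so that the formula of Proposition 8 tends to $1 - 1/\zeta(2)$, the value given by Proposition 6.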

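11. Proof of Theorem 2

Identifying $\Sigma_\varepsilon^+$ with
\[
\{ (\varepsilon e^{i\alpha}, \omega) \,;\, \omega - \pi/2 \le \alpha \le \omega + \pi/2 \} = \{ (\varepsilon e^{i(\omega + \beta)}, \omega) \,;\, \beta \in [-\pi/2, \pi/2] \},
\]
the (non-normalized) Liouville measure on the phase space $\Sigma_\varepsilon^+$ is expressed as
\[
d\lambda_\varepsilon = \varepsilon \big\langle (\cos \alpha, \sin \alpha), (\cos \omega, \sin \omega) \big\rangle\, d\alpha\, d\omega = \varepsilon \cos(\omega - \alpha)\, d\alpha\, d\omega = \varepsilon \cos \beta\, d\beta\, d\omega.
\]
Next we shall consider a fixed interval $I = [\tan \omega_0, \tan \omega_1] \subseteq [0, 1]$, define
\[
\Sigma_{\varepsilon,I}^+ := \{ (\varepsilon e^{i(\omega + \beta)}, \omega) \,;\, |\beta| \le \pi/2,\ \omega_0 \le \omega \le \omega_1 \},
\]
and estimate
\[
G_{\varepsilon,I}(t) := \frac{1}{\varepsilon} \lambda_\varepsilon \Big( \Big\{ (x, \omega) \in \Sigma_{\varepsilon,I}^+ \,;\, \tau_\varepsilon(x, \omega) > \frac{t}{2\varepsilon} \Big\} \Big).
\]
To each point $P = \varepsilon e^{i(\omega + \beta)}$ we associate (see Fig. 8) the point $P'(0, y)$, where $y = \frac{\varepsilon \sin \beta}{\cos \omega} \in \big[ -\frac{\varepsilon}{\cos \omega}, \frac{\varepsilon}{\cos \omega} \big]$. Note that
\[
\lambda_\varepsilon(\Sigma_{\varepsilon,I}^+) = \varepsilon \int_{\omega_0}^{\omega_1} \int_{-\pi/2}^{\pi/2} \cos \beta\, d\beta\, d\omega = 2\varepsilon c_I.
\]
Since $PP'$ has slope $\tan \omega$, we have the obvious inequality
\[
\Big| \tau_\varepsilon(\varepsilon e^{i(\omega + \beta)}, \omega) - \tilde\tau_{\varepsilon/\cos \omega}\Big( \frac{\varepsilon \sin \beta}{\cos \omega}, \omega \Big) \Big| < 2\varepsilon,
\]

Fig. 8. The parametrization of $\Sigma_\varepsilon^+$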


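and as a consequence we can write
\[
\begin{aligned}
G_{\varepsilon,I}(t) &= \int_{\omega_0}^{\omega_1} \int_{-\pi/2}^{\pi/2} e_{t/(2\varepsilon)}\big( \tau_\varepsilon(\varepsilon e^{i(\omega + \beta)}, \omega) \big) \cos \beta\, d\beta\, d\omega \\
&\le \int_{\omega_0}^{\omega_1} \int_{-\pi/2}^{\pi/2} e_{t/(2\varepsilon) - 2\varepsilon}\Big( \tilde\tau_{\varepsilon/\cos \omega}\Big( \frac{\varepsilon \sin \beta}{\cos \omega}, \omega \Big) \Big) \cos \beta\, d\beta\, d\omega \\
&= \int_{\omega_0}^{\omega_1} \frac{\cos \omega}{\varepsilon} \int_{-\varepsilon/\cos \omega}^{\varepsilon/\cos \omega} e_{(t - 4\varepsilon^2)/(2\varepsilon)}\big( \tilde\tau_{\varepsilon/\cos \omega}(y, \omega) \big)\, dy\, d\omega.
\end{aligned}
\]
When $0 < \lambda_- \le \frac{\cos \omega_1}{2\varepsilon} < \frac{\cos \omega_0}{2\varepsilon} \le \lambda_+$, obvious monotonicity properties yield
\[
\begin{aligned}
G_{\varepsilon,I}(t) &\le 2\lambda_+ \int_{\omega_0}^{\omega_1} \int_{-1/(2\lambda_-)}^{1/(2\lambda_-)} e_{(t - 4\varepsilon^2)/(2\varepsilon)}\big( \tilde\tau_{1/(2\lambda_+)}(y, \omega) \big)\, dy\, d\omega \\
&= 2\lambda_+ \int_{\omega_0}^{\omega_1} \int_{-1/(2\lambda_+)}^{1/(2\lambda_+)} e_{(t - 4\varepsilon^2)/(2\varepsilon)}\big( \tilde\tau_{1/(2\lambda_+)}(y, \omega) \big)\, dy\, d\omega + O\Big( \lambda_+ \Big( \frac{1}{\lambda_-} - \frac{1}{\lambda_+} \Big) \Big) \\
&= 2\tilde G_{1/(2\lambda_+),I}\Big( \frac{t - 4\varepsilon^2}{2\varepsilon} \Big) + O\Big( \frac{\lambda_+}{\lambda_-} - 1 \Big),
\end{aligned} \tag{11.1}
\]
with $\tilde G_{\delta,I}$ as defined in (8.1). Using similar arguments we infer
\[
G_{\varepsilon,I}(t) \ge 2\tilde G_{1/(2\lambda_-),I}\Big( \frac{t + 4\varepsilon^2}{2\varepsilon} \Big) + O\Big( 1 - \frac{\lambda_-}{\lambda_+} \Big). \tag{11.2}
\]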

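Take now $\varepsilon > 0$ small, and suppose that $|I| \asymp \varepsilon^{1/8}$ and $Q^\pm$ are two integers such that
\[
Q^- \le \frac{\cos \omega_1}{2\varepsilon} \le \frac{\cos \omega_0}{2\varepsilon} \le Q^+, \qquad
Q^\pm = \frac{\cos \omega_0}{2\varepsilon} + O(\varepsilon^{1/8 - 1}), \qquad
\frac{Q^\pm}{Q^\mp} = 1 + O(\varepsilon^{1/8}).
\]
Such integers can be chosen for instance as at the beginning of Sect. 7 with $c = \frac{1}{8}$. Fix also a compact $K \subset (0, \infty) \setminus \{1, 2\}$. Applying successively (11.1), Remark 3, Propositions 6, 7, 8, and inequality (8.4), we infer that
\[
\begin{aligned}
G_{\varepsilon,I}(t) &\le 2\tilde G_{1/(2Q^+),I}\Big( \frac{t - 4\varepsilon^2}{2\varepsilon} \Big) + O\Big( \frac{Q^+}{Q^-} - 1 \Big)
= 2\tilde G_{I,Q^+}\Big( \frac{t - 4\varepsilon^2}{2\varepsilon} \Big) + O\Big( \frac{Q^+}{Q^-} - 1 \Big) \\
&\le 2G_{I,Q^+}\Big( \frac{(t - 4\varepsilon^2) Q^-}{Q^+} \Big) + O\Big( \frac{Q^+}{Q^-} - 1 \Big) \\
&= 2G_{I,Q^+}\big( (t - 4\varepsilon^2)(1 + O(\varepsilon^{1/8})) \big) + O(\varepsilon^{1/8})
= 2G_{I,Q^+}\big( t + O(\varepsilon^{1/8}) \big) + O(\varepsilon^{1/8}) \\
&= 2c_I G(t) + O_\delta(\varepsilon^{1/8 - \delta})
\end{aligned} \tag{11.3}
\]
uniformly in $t \in K$.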


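In a similar way we infer from (11.2) and the previous arguments that
\[
G_{\varepsilon,I}(t) \ge 2c_I G(t) + O_\delta(\varepsilon^{1/8 - \delta}) \tag{11.4}
\]
uniformly in $t \in K$.

Consider now a partition of $[0, 1]$ with intervals $\{I_j\}_{j=1}^{N}$, where $N = [\varepsilon^{-1/8}]$ and $|I_j| = \frac{1}{N} \asymp \varepsilon^{1/8}$. Summing over $j$ we find as a result of (11.3), (11.4) and (7.8) that
\[
G_{\varepsilon,[0,1]}(t) = \sum_{j=1}^{N} G_{\varepsilon,I_j}(t) = \frac{\pi}{2} G(t) + O_\delta(\varepsilon^{1/8 - \delta}),
\]
and thus
\[
\frac{\lambda_\varepsilon\big( \{ (x, \omega) \in \Sigma_{\varepsilon,[0,1]}^+ \,;\, 2\varepsilon\tau_\varepsilon(x, \omega) > t \} \big)}{\lambda_\varepsilon(\Sigma_{\varepsilon,[0,1]}^+)}
= \frac{\varepsilon G_{\varepsilon,[0,1]}(t)}{\lambda_\varepsilon(\Sigma_{\varepsilon,[0,1]}^+)}
= \frac{\pi}{2} \cdot \frac{\varepsilon G(t)}{2\varepsilon c_{[0,1]}} + O_\delta(\varepsilon^{1/8 - \delta})
= G(t) + O_\delta(\varepsilon^{1/8 - \delta}).
\]
For obvious symmetry reasons it suffices to consider $\omega \in [0, \frac{\pi}{4}]$, therefore
\[
G_\varepsilon(t) = G(t) + O_\delta(\varepsilon^{1/8 - \delta}),
\]
which ends the proof of Theorem 2.

12. Estimates of $C_\varepsilon = \ln \langle \tau_\varepsilon \rangle - \langle \ln \tau_\varepsilon \rangle$

In this section we prove Theorem 3 (i). Part (ii) then follows from (i) and from relation (2.8) in [13]. We consider the probability measures $\nu_0$ and $\tilde\nu_\varepsilon$ on $[0, \infty)$ defined by
\[
\int_0^\infty f(u)\, d\nu_0(u) = \int_0^\infty f(u) g(u)\, du, \qquad
\int_0^\infty f(u)\, d\tilde\nu_\varepsilon(u) = \int_{\Sigma_\varepsilon^+} f(2\varepsilon\tau_\varepsilon)\, d\nu_\varepsilon, \qquad f \in C_c([0, \infty)).
\]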

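As a result of Theorem 2,
\[
\lim_{\varepsilon \to 0^+} \int_t^\infty d\tilde\nu_\varepsilon(u) = \int_t^\infty d\nu_0(u), \qquad t > 0,
\]
which implies
\[
\lim_{\varepsilon \to 0^+} \int_0^\infty f(u)\, d\tilde\nu_\varepsilon(u) = \int_0^\infty f(u)\, d\nu_0(u), \qquad f \in C_c([0, \infty)),
\]
meaning that $\tilde\nu_\varepsilon \to \nu_0$ vaguely as $\varepsilon \to 0^+$. Since
\[
\lim_{\varepsilon \to 0^+} \frac{1}{x} \int_x^\infty d\tilde\nu_\varepsilon(u) = \frac{1}{x} \int_x^\infty d\nu_0(u), \qquad x \ge 1,
\]
and the map
\[
x \mapsto \frac{1}{x} \int_x^\infty d\nu_0(u) = \frac{1}{x} \int_x^\infty g(u)\, du
\]
belongs to $L^1([1, \infty), dx)$ because $g(u) = O(u^{-3})$, $u \ge 1$, the Lebesgue Dominated Convergence theorem yields
\[
\lim_{\varepsilon \to 0^+} \int_1^\infty \frac{1}{x} \int_x^\infty d\tilde\nu_\varepsilon(u)\, dx = \int_1^\infty \frac{1}{x} \int_x^\infty d\nu_0(u)\, dx < \infty.
\]
Using Fubini's theorem, these double integrals can also be expressed as
\[
\int_1^\infty \frac{1}{x} \int_1^\infty e_{[1,u]}(x)\, d\tilde\nu_\varepsilon(u)\, dx
= \int_1^\infty \int_1^\infty \frac{1}{x} e_{[1,u]}(x)\, dx\, d\tilde\nu_\varepsilon(u)
= \int_1^\infty \int_1^u \frac{dx}{x}\, d\tilde\nu_\varepsilon(u)
= \int_1^\infty \ln u\, d\tilde\nu_\varepsilon(u),
\]
and respectively as
\[
\int_1^\infty \int_1^\infty \frac{1}{x} e_{[1,u]}(x)\, d\nu_0(u)\, dx = \int_1^\infty \ln u\, d\nu_0(u).
\]
It follows that for any (small) $\varepsilon > 0$,
\[
\int_1^\infty \ln u\, d\tilde\nu_\varepsilon(u) < \infty, \tag{12.1}
\]
and also that
\[
\lim_{\varepsilon \to 0^+} \int_1^\infty \ln u\, d\tilde\nu_\varepsilon(u) = \int_1^\infty g(u) \ln u\, du. \tag{12.2}
\]
We show in a similar way that
\[
\lim_{\varepsilon \to 0^+} \int_0^1 \ln u\, d\tilde\nu_\varepsilon(u) = \int_0^1 g(u) \ln u\, du = \frac{6}{\pi^2} \int_0^1 \ln u\, du = -\frac{6}{\pi^2}
\]
by using Fubini's theorem which gives in turn
\[
\begin{aligned}
\int_0^1 \ln u\, d\tilde\nu_\varepsilon(u) &= -\int_0^1 \int_u^1 \frac{dx}{x}\, d\tilde\nu_\varepsilon(u) = -\int_0^1 \int_0^1 \frac{1}{x} e_{[u,1]}(x)\, dx\, d\tilde\nu_\varepsilon(u) \\
&= -\int_0^1 \frac{1}{x} \int_0^1 e_{[u,1]}(x)\, d\tilde\nu_\varepsilon(u)\, dx = -\int_0^1 \frac{1}{x} \int_0^x d\tilde\nu_\varepsilon(u)\, dx.
\end{aligned}
\]
By (12.1) and (12.2) we get
\[
-C := \int_0^\infty g(u) \ln u\, du = \lim_{\varepsilon \to 0^+} \int_0^\infty \ln u\, d\tilde\nu_\varepsilon(u) = \lim_{\varepsilon \to 0^+} \int_{\Sigma_\varepsilon^+} \ln(2\varepsilon\tau_\varepsilon)\, d\nu_\varepsilon = \ln 2 + \lim_{\varepsilon \to 0^+} \Big( \ln \varepsilon + \int_{\Sigma_\varepsilon^+} \ln \tau_\varepsilon\, d\nu_\varepsilon \Big).
\]
Since (1.2) yields
\[
\lim_{\varepsilon \to 0^+} \Big( \ln \int_{\Sigma_\varepsilon^+} \tau_\varepsilon\, d\nu_\varepsilon + \ln \varepsilon + \ln 2 \Big) = \lim_{\varepsilon \to 0^+} \Big( \ln \int_{\Sigma_\varepsilon^+} \tau_\varepsilon\, d\nu_\varepsilon - \ln \frac{1}{2\varepsilon} \Big) = 0,
\]
we collect
\[
\lim_{\varepsilon \to 0^+} \Big( \ln \int_{\Sigma_\varepsilon^+} \tau_\varepsilon\, d\nu_\varepsilon - \int_{\Sigma_\varepsilon^+} \ln \tau_\varepsilon\, d\nu_\varepsilon \Big) = C.
\]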

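Finally we outline the proof of the identity
\[
C = 3 \ln 2 - \frac{9\zeta(3)}{4\zeta(2)}. \tag{12.3}
\]
First we note that
\[
\int_0^1 g(t) \ln t\, dt = \frac{6}{\pi^2} \int_0^1 \ln t\, dt = -\frac{6}{\pi^2},
\]
so we may write
\[
-C = -\frac{6}{\pi^2} + C_1 + C_2 + C_3, \tag{12.4}
\]
where
\[
\begin{aligned}
C_1 &= \frac{6}{\pi^2} \int_1^\infty \Big( \frac{2}{t} + 2 \Big( 1 - \frac{1}{t} \Big)^2 \ln \Big( 1 - \frac{1}{t} \Big) \Big) \ln t\, dt, \\
C_2 &= \frac{6}{\pi^2} \int_1^2 \Big( -\frac{1}{t} - \frac{1}{2} \Big( 1 - \frac{2}{t} \Big)^2 \ln \Big( \frac{2}{t} - 1 \Big) \Big) \ln t\, dt, \\
C_3 &= \frac{6}{\pi^2} \int_2^\infty \Big( -\frac{1}{t} - \frac{1}{2} \Big( 1 - \frac{2}{t} \Big)^2 \ln \Big( 1 - \frac{2}{t} \Big) \Big) \ln t\, dt.
\end{aligned}
\]
The substitution $t = 2u$ leads to
\[
C_3 = -\frac{1}{2} C_1 - \frac{6}{\pi^2} \ln 2 \int_1^\infty \Big( \frac{1}{u} + \Big( 1 - \frac{1}{u} \Big)^2 \ln \Big( 1 - \frac{1}{u} \Big) \Big)\, du.
\]
By a direct computation, the integral above is equal to $2\big( \frac{\pi^2}{6} - 1 \big)$, thus
\[
C_3 = -\frac{C_1}{2} - \frac{6}{\pi^2} \ln 2 \cdot 2 \Big( \frac{\pi^2}{6} - 1 \Big) = -\frac{C_1}{2} - (2 \ln 2) \Big( 1 - \frac{6}{\pi^2} \Big). \tag{12.5}
\]
Next, a direct computation shows that
\[
C_1 = \frac{12}{\pi^2} \big( 2\zeta(3) - 1 \big). \tag{12.6}
\]
The relations (12.4)–(12.6) provide
\[
-C = C_2 + \frac{12}{\pi^2} \zeta(3) - \frac{12}{\pi^2} - 2 \Big( 1 - \frac{6}{\pi^2} \Big) \ln 2. \tag{12.7}
\]
But
\[
C_2 = -\frac{6}{\pi^2} \cdot \frac{\ln^2 2}{2} - \frac{3}{\pi^2} \int_1^2 \Big( 1 - \frac{2}{t} \Big)^2 \ln \Big( \frac{2}{t} - 1 \Big) \ln t\, dt,
\]
thus we get
\[
-C = \frac{12}{\pi^2} \zeta(3) - \frac{12}{\pi^2} - 2 \Big( 1 - \frac{6}{\pi^2} \Big) \ln 2 - \frac{6}{\pi^2} \cdot \frac{\ln^2 2}{2} - \frac{3}{\pi^2} C_4, \tag{12.8}
\]
with
\[
C_4 = \int_1^2 \Big( 1 - \frac{2}{t} \Big)^2 \ln \Big( \frac{2}{t} - 1 \Big) \ln t\, dt = C_5 - C_6 + C_7,
\]
where
\[
C_5 = \ln 2 \int_1^2 \Big( 1 - \frac{2}{t} \Big)^2 \ln t\, dt, \qquad
C_6 = \int_1^2 \Big( 1 - \frac{2}{t} \Big)^2 \ln^2 t\, dt, \qquad
C_7 = \int_1^2 \Big( 1 - \frac{2}{t} \Big)^2 \ln \Big( 1 - \frac{t}{2} \Big) \ln t\, dt.
\]
By a direct computation we find
\[
C_5 = \ln 2 - 2 \ln^3 2, \qquad C_6 = 6 - 8 \ln 2 - \frac{4}{3} \ln^3 2.
\]
As a result we gather
\[
C_4 = \ln 2 - 2 \ln^3 2 - 6 + 8 \ln 2 + \frac{4}{3} \ln^3 2 + C_7,
\]
and so
\[
\begin{aligned}
-C ={}& \frac{12}{\pi^2} \zeta(3) - \frac{12}{\pi^2} - 2 \ln 2 + \frac{12 \ln 2}{\pi^2} - \frac{3}{\pi^2} \ln^2 2 - \frac{3}{\pi^2} \ln 2 + \frac{6}{\pi^2} \ln^3 2 \\
& + \frac{18}{\pi^2} - \frac{24 \ln 2}{\pi^2} - \frac{4}{\pi^2} \ln^3 2 - \frac{3}{\pi^2} C_7 \\
={}& \frac{12}{\pi^2} \zeta(3) + \frac{6}{\pi^2} - \frac{15}{\pi^2} \ln 2 - 2 \ln 2 - \frac{3}{\pi^2} \ln^2 2 + \frac{2}{\pi^2} \ln^3 2 - \frac{3}{\pi^2} C_7.
\end{aligned} \tag{12.9}
\]
By a careful computation we find
\[
C_7 = -\ln^2 2 - 5 \ln 2 + \frac{2\pi^2}{3} \ln 2 + 4 \mathrm{Li}_3\Big( \frac{1}{2} \Big) - 4\zeta(3) + 2,
\]
where $\mathrm{Li}_3$ denotes the trilogarithm function
\[
\mathrm{Li}_3(z) = \sum_{m=1}^{\infty} \frac{z^m}{m^3}, \qquad |z| \le 1.
\]
Using the equality (cf. [30, formula (6.12)])
\[
\mathrm{Li}_3\Big( \frac{1}{2} \Big) = \frac{7}{8} \zeta(3) - \frac{\pi^2}{12} \ln 2 + \frac{\ln^3 2}{6}
\]
we infer
\[
C_7 = -\ln^2 2 - 5 \ln 2 + \frac{\pi^2}{3} \ln 2 - \frac{\zeta(3)}{2} + \frac{2}{3} \ln^3 2 + 2.
\]
Inserting this back into (12.9) we finally find
\[
\begin{aligned}
-C ={}& \frac{12}{\pi^2} \zeta(3) + \frac{6}{\pi^2} - \frac{15}{\pi^2} \ln 2 - 2 \ln 2 - \frac{3}{\pi^2} \ln^2 2 + \frac{2}{\pi^2} \ln^3 2 + \frac{3}{\pi^2} \ln^2 2 \\
& + \frac{15}{\pi^2} \ln 2 - \ln 2 + \frac{3}{2\pi^2} \zeta(3) - \frac{2}{\pi^2} \ln^3 2 - \frac{6}{\pi^2} \\
={}& -3 \ln 2 + \frac{27}{2\pi^2} \zeta(3) = -3 \ln 2 + \frac{9\zeta(3)}{4\zeta(2)}.
\end{aligned}
\]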
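The closed forms in this computation lend themselves to a numerical cross-check. The sketch below uses hypothetical function names, a hard-coded double-precision value of $\zeta(3)$ and a plain midpoint rule, all of which are assumptions made for illustration. It compares the series for $\mathrm{Li}_3(1/2)$ with the quoted closed form, integrates $C_7$ numerically, and evaluates the right-hand side of (12.9) against (12.3).

```python
import math

ZETA3 = 1.2020569031595942      # Apery's constant zeta(3), double precision
LN2 = math.log(2.0)

def li3_half(terms=80):
    """Li_3(1/2) from the defining power series sum of z^m / m^3."""
    return sum(0.5 ** m / m ** 3 for m in range(1, terms + 1))

def li3_half_closed():
    """Closed form 7/8 zeta(3) - pi^2/12 ln 2 + ln^3(2)/6."""
    return 7 * ZETA3 / 8 - math.pi ** 2 * LN2 / 12 + LN2 ** 3 / 6

def c7_numeric(n=200000):
    """Midpoint rule for C7 = integral over [1, 2] of (1 - 2/t)^2 ln(1 - t/2) ln t."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = 1.0 + (k + 0.5) * h
        total += (1 - 2 / t) ** 2 * math.log(1 - t / 2) * math.log(t)
    return h * total

def c7_closed():
    """C7 after inserting the closed form of Li_3(1/2)."""
    return (-LN2 ** 2 - 5 * LN2 + math.pi ** 2 * LN2 / 3
            - ZETA3 / 2 + 2 * LN2 ** 3 / 3 + 2)

def minus_c_from_12_9(c7):
    """Right-hand side of (12.9) for a given value of C7."""
    p2 = math.pi ** 2
    return (12 * ZETA3 / p2 + 6 / p2 - 15 * LN2 / p2 - 2 * LN2
            - 3 * LN2 ** 2 / p2 + 2 * LN2 ** 3 / p2 - 3 * c7 / p2)

def c_closed():
    """C = 3 ln 2 - 9 zeta(3) / (4 zeta(2)), as in (12.3)."""
    return 3 * LN2 - 9 * ZETA3 / (4 * (math.pi ** 2 / 6))
```

With these definitions, `-minus_c_from_12_9(c7_closed())` and `c_closed()` should agree up to rounding, and both come out close to 0.4352.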


Acknowledgements. We are grateful to Professor Giovanni Gallavotti and to one of the referees of [7] for bringing to our attention reference [9] in 2002.

References

1. Augustin, V., Boca, F.P., Cobeli, C., Zaharescu, A.: The h-spacing distribution between Farey points. Math. Proc. Cambridge Phil. Soc. 131, 23–38 (2001)
2. Blank, S., Krikorian, N.: Thom’s problem on irrational flows. Internat. J. Math. 4, 721–726 (1993)
3. Bleher, P.: Statistical properties of two-dimensional periodic Lorentz with infinite horizon. J. Stat. Phys. 66, 315–373 (1992)
4. Boca, F.P., Cobeli, C., Zaharescu, A.: Distribution of lattice points visible from the origin. Commun. Math. Phys. 213, 433–470 (2000)
5. Boca, F.P., Cobeli, C., Zaharescu, A.: A conjecture of R.R. Hall on Farey points. J. Reine Angew. Math. 535, 207–236 (2001)
6. Boca, F.P., Gologan, R.N., Zaharescu, A.: The average length of a trajectory in a certain billiard in a flat two-torus. New York J. Math. 9, 303–330 (2003)
7. Boca, F.P., Gologan, R.N., Zaharescu, A.: The statistics of the trajectory of a billiard in a flat two-torus. Commun. Math. Phys. 240, 53–73 (2003)
8. Bouchaud, J.-P., Le Doussal, P.: Numerical study of a D-dimensional periodic Lorentz gas with universal properties. J. Stat. Phys. 41, 225–248 (1985)
9. Bourgain, J., Golse, F., Wennberg, B.: On the distribution of free path lengths for the periodic Lorentz gas. Commun. Math. Phys. 190, 491–508 (1998)
10. Bunimovich, L.: Billiards and other hyperbolic systems. In: Dynamical systems, ergodic theory and applications, edited by Ya.G. Sinai, Encyclopaedia Math. Sci. Vol. 100, Berlin: Springer-Verlag, 2000, pp. 192–233
11. Caglioti, E., Golse, F.: On the distribution of free path lengths for the periodic Lorentz gas. III. Commun. Math. Phys. 236, 199–221 (2003)
12. Chernov, N.: New proof of Sinai’s formula for the entropy of hyperbolic billiard systems. Application to Lorentz gases and Bunimovich stadium. Funct. Anal. and Appl. 25(3), 204–219 (1991)
13. Chernov, N.: Entropy values and entropy bounds. In: Hard ball systems and the Lorentz gas, edited by D. Szász, Encyclopaedia Math. Sci., Vol. 101, Berlin: Springer-Verlag, 2000, pp. 121–143
14. Dahlqvist, P.: The Lyapunov exponent in the Sinai billiard in the small scatterer limit. Nonlinearity 10, 159–173 (1997)
15. Deshouillers, J.-M., Iwaniec, H.: Kloosterman sums and Fourier coefficients of cusp forms. Invent. Math. 70, 219–288 (1982/1983)
16. Dumas, H.S., Dumas, L., Golse, F.: Remarks on the notion of mean free path for a periodic array of spherical obstacles. J. Stat. Phys. 87(3/4), 943–950 (1997)
17. Erdös, P.: Some results on diophantine approximation. Acta Arith. 5, 359–369 (1959)
18. Erdös, P., Szüsz, P., Turán, P.: Remarks on the theory of diophantine approximation. Colloq. Math. 6, 119–126 (1958)
19. Estermann, T.: On Kloosterman’s sum. Mathematika 8, 83–86 (1961)
20. Friedman, B., Niven, I.: The average first recurrence time. Trans. Amer. Math. Soc. 92, 25–34 (1959)
21. Friedman, B., Oono, Y., Kubo, I.: Universal behaviour of Sinai billiard systems in the small-scatterer limit. Phys. Rev. Lett. 52, 709–712 (1984)
22. Gallavotti, G.: Lectures on the billiard. In: Dynamical systems, theory and applications (Rencontres, Battelle Res. Inst., Seattle, Wash., 1974), edited by J. Moser, Lecture Notes in Phys. Vol. 38, Berlin-Heidelberg-New York: Springer-Verlag, 1975, pp. 236–295
23. Goldfeld, D., Sarnak, P.: Sums of Kloosterman sums. Invent. Math. 71, 243–250 (1983)
24. Golse, F.: On the statistics of free-path lengths for the periodic Lorentz gas. In: XIV International Congress on Mathematical Physics (Lisbon, 2003), edited by J.-C. Zambrini, River Edge, NJ: World Sci. Publ., 2006, pp. 439–446
25. Golse, F., Wennberg, B.: On the distribution of free path lengths for the periodic Lorentz gas. II. M2AN Math. Model. Numer. Anal. 34, 1151–1163 (2000)
26. Gutzwiller, M.: Physics and arithmetic chaos in the Fourier transform. In: The mathematical beauty of physics (Saclay, 1996), edited by J.M. Drouffe, J.B. Zuber, Adv. Series in Math. Phys. Vol. 24, River Edge, NJ: World Sci. Publ., 1997, pp. 258–280
27. Hooley, C.: An asymptotic formula in the theory of numbers. Proc. London Math. Soc. 7, 396–413 (1957)
28. Kesten, H.: Some probabilistic theorems on diophantine approximations. Trans. Amer. Math. Soc. 103, 189–217 (1962)
29. Kuznetsov, N.V.: The Petersson conjecture for forms of weight zero and Linnik’s conjecture. Mat. Sb. (N.S.) 111(153), 334–383, 479 (1980)
30. Lewin, L.: Dilogarithms and associated functions. London: Macdonald & Co., 1958
31. Lorentz, H.A.: Le mouvement des électrons dans les métaux. Arch. Néerl. 10, 336 (1905). Reprinted in Collected papers. Vol. 3. The Hague: Martinus Nijhoff, 1936
32. Pólya, G.: Zahlentheoretisches und wahrscheinlichkeitstheoretisches über die sichtweite im walde. Arch. Math. Phys. 27, 135–142 (1918)
33. Santaló, L.A.: Sobre la distribucion probable de corpusculos en un cuerpo. Deducida de la distribucion en sus secciones y problemas analogos. Rev. Un. Mat. Argentina 9, 145–164 (1943)
34. Sinai, Y.G.: Dynamical systems with elastic reflections. Ergodic properties of dispersing billiards. Russ. Math. Surveys 25, 137–189 (1970)
35. Weil, A.: On some exponential sums. Proc. Nat. Acad. Sci. USA 34, 204–207 (1948)

Communicated by P. Sarnak

Commun. Math. Phys. 269, 473–492 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0140-z

Communications in

Mathematical Physics

Generalized Particle Statistics in Two-Dimensions: Examples from the Theory of Free Massive Dirac Field

Dario Salvitti

Dipartimento di Matematica, Università di Roma “La Sapienza”, P.le Aldo Moro 2, 00185 Roma, Italy. E-mail: [email protected]

Received: 1 November 2005 / Accepted: 13 July 2006
Published online: 4 November 2006 – © Springer-Verlag 2006

Dedicated to the memory of Sabrina Picucci Abstract: In the framework of algebraic quantum field theory we analyze the anomalous statistics exhibited by a class of automorphisms of the observable algebra of the twodimensional free massive Dirac field, constructed by fermionic gauge group methods. The violation of Haag duality, the topological peculiarity of a two-dimensional spacetime and the fact that unitary implementers do not lie in the global field algebra account for strange behaviour of statistics, which is no longer an intrinsic property of sectors. Since automorphisms are not inner, we exploit asymptotic abelianness of intertwiners in order to construct a braiding for a suitable C ∗ -tensor subcategory of End(A ). We define two inequivalent classes of path connected bi-asymptopias, selecting only those sets of nets which yield a true generalized statistics operator. 1. Introduction The intrinsic definition of particle statistics in the approach of Algebraic Quantum Field Theory (AQFT) in a four-dimensional space-time is provided by assigning to superselection sectors equivalence classes of permutation group representations, which describe the statistics of multiparticle states. In a (3+1)-dimensional space-time, fields and particles obey Bose-Fermi alternative, exhibiting the more general bosonic or fermionic parastatistics, while in lower dimensional Minkowski space statistics are described, in general, by braid group representations. The first models leading to particles described by a one-dimensional representation of the braid group (anyons) are in [25], while higher dimensional representations describe plektons. In a (2+1)-dimensional spacetime, strictly local quantum fields are always subject to the normal commutation rules, but particles carrying “topological charges”, created from the vacuum by the action of fields localized in cones, may exhibit intermediate statistics. The statistics of a sector describes the interchange of identical charges. 
In two dimensions, DHR theory allows for two distinct statistics operators (one the inverse of the other), since the causal complement of a bounded region has two connected components.


The statistics operator is a topological invariant if the pairs of spatially separated auxiliary regions can be continuously deformed from one to the other, maintaining a relative spacelike distance. Therefore, the braid group enters in the description of DHR superselection charges localized in two-dimensional double cones, for intervals of the real line or in (2+1)-dimensional theories for charges localized in space-like cones. Superselection sectors in four-dimensional theories are classified by equivalence classes of irreducible representations of the compact group of internal symmetries [12]. However, if the superselection category in low dimensional theories is not symmetric but only braided, such a group may not exist. Indeed, some models of (1+1)-dimensional conformal fields exhibit a superselection structure which seems not to fit any representation group theory. Why “generalized” particles statistics? Well, this is necessary since the algebraic approach to local field theories which do not fulfill Haag duality and which does not allow non-inner automorphisms of the underlying field algebra does not yield a well defined notion of statistics. Thus, we need to extend it to physical theories which do not fit the prescriptions of the algebraic framework totally, as in the case of smeared-out kink operators [20] in the context of the two-dimensional free massive Dirac field in the formalism of relativistic second quantization developed in [5]. There, not only is Haag duality violated (Sect. 3), but a family of unitary operators implementing DHR automorphisms is not in the field algebra, forcing us to explore alternative tools. In a more general setting, field theories in (1+1)-dimensions satisfying twisted Haag duality and the split property for wedges and having an unbroken (i.e. unitarily implemented) group of inner symmetries G give rise to a not Haag dual observable algebra A = F G [16]. Split property for wedges has been proven recently by Buchholz and Lechner for the Bose and Fermi cases [3]. Together with the argument in [16], this proves that the observable algebra is not Haag dual when F is any finite product of free massive Bose and Fermi fields and G is non-trivial. We remark that “free” anyons are studied in a two-dimensional space-time since no (2+1)-dimensional model of free anyons can exist¹ [17].

¹ This notion of “free” anyons refers to the on-mass-shell nature of the Fourier transform. In d=1+1 the anyon operators can live together in the same Hilbert space as the free Fermions, but they are not really free in the mass-shell sense.

In the setting of CAR algebras on the fermionic Fock space there exists a natural notion of second quantization more appropriate for a theory of relativistic particles. The theory of Fermionic gauge groups [5, 6] displays a wide class of unitarily implementable automorphisms on the antisymmetric Fock space. Our choice is the natural one [1], i.e. that for which the winding numbers (the charges) are easily computable through index formulae [21], while the zero charge implementers are the well known smeared-out kink operators since they can modify the statistics of a sector [22]. Implementable gauge groups in the one-particle Dirac theory lead to a model which exhibits strange statistics. A class of Bogoliubov automorphisms unitarily implementable in the Fock representation induce a family of localized and transportable automorphisms of the observable algebra [1], implemented by non local operators which are not even contained in the field algebra F . Since our investigations are strongly influenced by the violation of Haag duality and the non-locality of implementers, which gives rise to non-inner automorphisms, we begin with a discussion of the arguments leading to the known results, in order to emphasize the need to clarify the notion of statistics even for theories not fulfilling all the axioms of AQFT, and to better understand the developments presented in this paper. Statistics of

sectors, approached first with “classical” DHR theory [7–10, 12, 13], depends not only on the charge (i.e. on the sector), but also on a continuous parameter which indexes a collection of unitarily implementable automorphisms which carry no charge, but modify the statistics of the composed sector [22]. Unfortunately, since the net of local observables does not fulfill Haag duality, some results largely exploited in AQFT are no longer true in the setting with which we are concerned here, and we are able to produce counterexamples. However, the statistics operator still possesses all of the formal properties as it does in DHR theory, since unitary intertwiners between automorphisms of the same translation equivalent class are always local elements of A , even if Haag duality is violated. After computing the statistics operator formally, the question remains as to whether the braiding obtained in this way has a genuine meaning in terms of statistics. Actually, we cannot proceed step by step along DHR theory alone, as it deals with local objects and often exploits Haag duality as a fundamental technical assumption. We are now analyzing a theory that allows for intertwiners not lying in the algebra where endomorphisms act and where endomorphisms are not locally inner but are inner only in a asymptotical sense. We appeal to a more recent notion of braiding [2], where the condition of asymptotic abelianness of intertwiners allows us to define bi-asymptopias, giving rise to a braiding for a suitable full subcategory of End(A ). In higher dimensions, Roberts has shown that a DHR sector of a non-Haag dual net A extends to a DHR sector of the dual net A d , and the latter can be studied with the usual methods [19]. In (1+1)-dimensional massive theories this fails since A d , satisfying Haag duality and the split property for wedges, has no localized sectors as Mueger has shown in [15]. 
The net $\mathcal{A}$ may have non-trivial localized sectors, but they necessarily become solitons when they are extended to $\mathcal{A}^d$. In (1+1)-dimensional free massive Dirac field theory we exclude those braidings that do not give rise to true statistics, since they have their DHR counterpart in pseudostatistics operators constructed without remaining in the same connected component. Bi-asymptopias relative to different components are not mutually cofinal, nor path connected, due to the geometry of a two-dimensional space-time. A direct computation for each connected component yields different braidings, i.e. the category is not symmetric. Physically speaking, particles described by this theory are neither bosons nor fermions for almost all values of the solitonic parameter λ.

The present article is organized as follows. In Sect. 2 our assumptions are stated: relativistic second quantization, implementable gauge groups in the (1+1)-dimensional free massive Dirac field theory, index formulae for smeared-out kink operators in the formalism developed in [6, 5, 21]. In Sect. 3 we analyze a field theory model arising from the fermionic gauge group theory when applied to the Dirac field in two-dimensional Minkowski space. Some known results from [1] are derived. Investigation of the statistics in the framework of DHR analysis leads to anomalous behaviour of the charge composition, and statistics is not an intrinsic property of the sector². In Sect. 4 we prove that unitary implementers are not elements of the global field algebra $\mathcal{F}$ for almost all values of the continuous parameter λ, thus giving a complete classification of their localization properties.

2 In the conformally invariant zero mass (or short distance) limit the situation changes radically and the standard DHR theory becomes again fully applicable. This phenomenon is inexorably linked to the emergence of new sectors in this limit (the disorder becomes charge-carrying) and has been observed in [14].


D. Salvitti

In Sect. 5 we prove asymptotic abelianness in order to exhibit a pair of disjoint (i.e. not path connected) bi-asymptopias which give rise to an authentic braiding for the $C^*$-subcategory of End($\mathcal{A}$) generated by our family of automorphisms. We compute the braiding explicitly and give a natural condition to be imposed in (1+1) dimensions in order to avoid those braidings with no true counterpart in the DHR setting.

2. Preliminaries

Since we work in the algebraic setting, we briefly list the axioms appropriate for theories where observables are defined from fields through a principle of gauge invariance.

1. The field algebra $\mathcal{F}$ is the inductive limit of the net of von Neumann algebras $O \to \mathcal{F}(O)$ and its action on the Hilbert space $\mathcal{H}$ is irreducible.

2. There exists a strongly continuous unitary representation $L \to U(L)$ of the Poincaré group $\mathcal{P}$ on $\mathcal{H}$ inducing automorphisms $\alpha_L$ of the field algebra, and the action on local algebras is geometric, i.e. $\alpha_L(\mathcal{F}(O)) = \mathcal{F}(LO)$. Moreover, there exists a unit vector $\Omega \in \mathcal{H}$, the vacuum vector, unique up to a phase, which is left invariant by $U(L)$, $L \in \mathcal{P}$. The vector state induced by $\Omega$ is the vacuum state $\omega_0$ of $\mathcal{F}$, $\omega_0(F) = (\Omega, F\Omega)$.

3. (Reeh-Schlieder property for double cones) The vacuum vector $\Omega$ is cyclic and separating for every algebra $\mathcal{F}(O)$.

4. There exists a faithful representation $g \to \beta_g$ of a compact group $G$, the gauge group, by automorphisms of $\mathcal{F}$. $\beta_g$ commutes with $\alpha_L$ and $\beta_g(\mathcal{F}(O)) = \mathcal{F}(O)$. Moreover, for $F \in \mathcal{F}(O)$, the correspondence $g \to \beta_g(F)$ is weakly continuous.

5. (Normal commutation relations) There exists a $k \in G$, with $k^2 = e$, such that, setting

$$\mathcal{F}_\pm(O) = \{F \in \mathcal{F}(O) : \beta_k(F) = \pm F\}, \qquad (1)$$

we have that $\mathcal{F}_+(O_1)$ commutes with $\mathcal{F}(O_2)$ and $\mathcal{F}_-(O_1)$ anticommutes with $\mathcal{F}_-(O_2)$ when $O_1 \subset O_2'$.

If the unitary $\Gamma(g)$ implements the automorphism $\beta_g$, we can reformulate (1) by requiring twisted locality: $\mathcal{F}(O)^\tau \subset \mathcal{F}(O')'$, where

$$\mathcal{F}(O)^\tau := Z\mathcal{F}(O)Z^*, \qquad Z = \frac{1 + i\Gamma(k)}{1 + i},$$

defines the twisted algebra.
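As a small sanity check on the twist operator, the following sketch verifies on a toy two-dimensional graded space that $Z = (1 + iK)/(1 + i)$ is unitary and that $Z^2 = K$ whenever $K$ is a self-adjoint unitary. The 2×2 matrix `K` standing in for the implementer of $\beta_k$, and all helper names, are illustrative assumptions and not objects taken from the paper.

```python
# Toy check of the twist operator Z = (1 + iK)/(1 + i).
# Assumption: K = diag(1, -1) is a hypothetical finite-dimensional stand-in
# for the self-adjoint unitary implementing beta_k (K* = K, K^2 = 1).

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    """Entrywise comparison up to a tolerance."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

IDENTITY = [[1, 0], [0, 1]]
K = [[1, 0], [0, -1]]
Z = [[(IDENTITY[i][j] + 1j * K[i][j]) / (1 + 1j) for j in range(2)]
     for i in range(2)]

z_is_unitary = close(matmul(Z, dagger(Z)), IDENTITY)
z_squared_is_k = close(matmul(Z, Z), K)
print(z_is_unitary, z_squared_is_k)
```

Since $Z^2 = K$, twisting twice amounts to conjugating with $K$, which maps each local field algebra onto itself.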

2.1. Relativistic second quantization. We now fix notation and give an overview of the fundamentals of relativistic second quantization. Let $\mathcal{H}$ be a separable complex Hilbert space with inner product $(\cdot,\cdot)$. The fermionic Fock space $\mathcal{F}_a(\mathcal{H})$ is the completion of the vector space $D_{at} := \bigoplus_{n=0}^{\infty} \wedge^n \mathcal{H}$ of antisymmetric algebraic tensors with respect to the “natural” scalar product $\langle \oplus_{n \ge 0}\,\xi_n, \oplus_{n \ge 0}\,\eta_n \rangle = \sum_{n \ge 0} (\xi_n, \eta_n)$, with the standard unit vector $\Omega := (1, 0, 0, \dots)$ defined as the vacuum vector. For every $f \in \mathcal{H}$, the creation operator $c^*(f)$ and its adjoint $c(f)$, the annihilation operator, are defined on the whole fermionic Fock space, with $\|c^*(f)\| = \|f\|$, and they satisfy the canonical anticommutation relations:

$$\{c(f), c(g)\} = \{c^*(f), c^*(g)\} = 0, \qquad \{c(f), c^*(g)\} = (f, g)1$$


for arbitrary $f, g \in \mathcal{H}$. If $U \in B(\mathcal{H})$, we define the operator $\Gamma(U)$ on $\wedge^n \mathcal{H}$ by

$$\Gamma(U)\, c^*(f_1) \cdots c^*(f_n)\Omega = c^*(U f_1) \cdots c^*(U f_n)\Omega, \qquad (2)$$

which has the property $\Gamma(U)\Gamma(V) = \Gamma(UV)$. In particular, if $U \in \mathcal{U}(\mathcal{H}) := \{A : \mathcal{H} \to \mathcal{H},\ A \text{ unitary}\}$, the correspondence $c^*(f) \to c^*(Uf)$ is an automorphism of the CAR algebra, unitarily implemented by $\Gamma(U)$ (in the Fock representation):

$$\Gamma(U)\, c^*(f)\, \Gamma(U)^* = c^*(Uf) \qquad (3)$$

as follows from (2). Moreover, an arbitrary $A \in B(\mathcal{H})$ induces on $\mathcal{F}_a(\mathcal{H})$ a sum operator $d\Gamma(A)$ such that $\exp(it\, d\Gamma(A)) = \Gamma(e^{itA})$, which preserves the adjoint and the commutator [5]. In the concrete cases that we discuss in this paper, $\mathcal{H} = \mathcal{H}_+ \oplus \mathcal{H}_-$, where $\mathcal{H}_\pm$ are copies of the same function space $L^2 \equiv H$. Let $P_\delta$ ($\delta = +, -$) be the projectors onto $\mathcal{H}_\pm$. If $A$ is an operator on $\mathcal{H}$, we set $A_{\delta\delta'} := P_\delta A P_{\delta'}$, $\delta, \delta' = +, -$. In this notation, $A$ is a block matrix whose entries are endomorphisms of $L^2$. Such a decomposition of $\mathcal{H}$ is related to the well known fact that the free Dirac Hamiltonian $H_m$ has spectrum $(-\infty, -m] \cup [m, +\infty)$, where $m \ge 0$ denotes the rest mass of the particle. Here, $P_\pm$ are the spectral projectors of $H_m$ onto $[m, +\infty)$ and $(-\infty, -m]$ respectively. Instead of non-relativistic second quantization $A \to d\Gamma(A)$, we work with another irreducible representation of the CAR algebra, defined by

$$\tilde{c}(f) := c(P_+ f) + c^*(P_- f). \qquad (4)$$

The one-particle Hilbert space is then defined by $\mathcal{H}_1 := P_+\mathcal{H} \oplus P_-\mathcal{H}$ while the physical Hilbert space is $\mathcal{F}_a(\mathcal{H}_1)$. If $U \in \mathcal{U}(\mathcal{H})$ satisfies $[U, P_\delta] = 0$, then the automorphism $\tilde{c}(f) \to \tilde{c}(Uf)$ is unitarily implemented by

$$\tilde\Gamma(U) := \Gamma(U_{++})\,\Gamma(\bar{U}_{--}), \qquad (5)$$

with the compact notation $\Gamma(U_{++}) \equiv \Gamma(U_{++} \oplus 1)$, $\Gamma(\bar{U}_{--}) \equiv \Gamma(1 \oplus \bar{U}_{--})$. Here the bar over an operator stands for the action by a fixed conjugation $J$ on $\mathcal{H}$, i.e. $\bar{U} = JUJ$. For arbitrary $U \in \mathcal{U}(\mathcal{H})$, the Shale-Stinespring theorem states that there exists a unitary $\tilde\Gamma(U)$ on the antisymmetric Fock space such that

$$\tilde{c}(Uf) = \tilde\Gamma(U)\, \tilde{c}(f)\, \tilde\Gamma(U)^* \qquad \forall f \in \mathcal{H}$$

if and only if the off-diagonal parts of $U$, namely $U_{\delta,-\delta}$, are Hilbert-Schmidt (HS) operators. Unitaries on $\mathcal{H}$ inducing automorphisms of the CAR algebra which are unitarily implementable on $\mathcal{F}_a(\mathcal{H})$ form a group, denoted $G_2$. Let $g_2 := \{A \in B(\mathcal{H}) : A_{\delta,-\delta} \in HS\}$ be the complex Lie algebra of $G_2$. By a suitable choice for the phases of the unitary operator implementing the automorphism of the CAR algebra, it is always possible to define a one-parameter strongly continuous group

$$\tilde\Gamma(e^{itA}) = e^{it\, d\tilde\Gamma(A)}, \qquad A = A^* \in g_2,$$

where $d\tilde\Gamma(A)$ is the self-adjoint generator. The arbitrary additive constant in the definition of $d\tilde\Gamma(A)$ is fixed by requiring that $(\Omega, d\tilde\Gamma(A)\Omega) = 0$. In Sect. 3, where we deal with the Dirac field, we shall employ the more common notation $\pi(\phi(f))$ rather than $\tilde{c}(f)$.


Definition 1. The charge operator $Q$ is the generator of the one-parameter group induced by the identity: $Q := d\tilde\Gamma(1) = d\Gamma(P_+) - d\Gamma(P_-)$.

Under the action of the charge operator, fermionic Fock space splits into a direct sum of charge sectors $\mathcal{F}_a(\mathcal{H}) = \bigoplus_{n \in \mathbb{Z}} \mathcal{F}_n$, and

$$\tilde\Gamma(U)\mathcal{F}_n = \mathcal{F}_{n+q(U)}, \qquad U \in G_2,$$

where $q(U) \in \mathbb{Z}$ is the Fredholm index of $U_{--}$. The additive property $q(U_1 U_2) = q(U_1) + q(U_2)$ also holds. In other terms, if $q(U) = 1$, the vacuum sector $\mathcal{F}_0$ can be connected to the charge-$n$ sector by applying $\tilde\Gamma(U)^n$, while $q(e^{itA}) = 0$, $A = A^* \in g_2$, since $q(e^{itA}) = q(1) = 0$ by virtue of the continuity in $t$. Hence the charged sectors are left invariant by $\tilde\Gamma(e^{itA})$. We end this overview with an identity of great relevance for computations:

$$\tilde\Gamma(-1)\,\tilde\Gamma(U) = (-1)^{q(U)}\, \tilde\Gamma(U)\,\tilde\Gamma(-1).$$

In view of the subsequent applications, we cite three useful propositions which establish the commutation rules between unitary implementers and their self-adjoint generators [5].

Proposition 1. For every $A, B \in g_2$, on the domain $D$ of finite particle vectors there holds

$$[d\tilde\Gamma(A), d\tilde\Gamma(B)] = d\tilde\Gamma([A, B]) + C(A, B)1, \qquad (6)$$

where $C(A, B) := \mathrm{Tr}(A_{-+}B_{+-} - B_{-+}A_{+-})$ is the Schwinger term. Here $D := \{F \in \mathcal{F}_a(\mathcal{H}) : F = P_l F, \text{ for some } l \in \mathbb{N}\}$ and $P_l$ denotes the spectral projector of the particle number operator on $[0, l]$.

Proposition 2. Let $A, B \in g_2$, $A = A^*$, $B = B^*$ and $[A, B] = 0$. We have:

$$\tilde\Gamma(e^{iA})\,\tilde\Gamma(e^{iB}) = e^{-C(A,B)/2}\, \tilde\Gamma(e^{i(A+B)}). \qquad (7)$$

The third proposition establishes the commutation rule between second quantization operators in the case that one of them is a charge shift (i.e. it carries a non-zero charge).

Proposition 3. Let $U \in G_2$, $A = A^* \in g_2$, $[U, A] = 0$. Then

$$\tilde\Gamma(e^{iA})\,\tilde\Gamma(U) = e^{i(\tilde\Gamma(U)\Omega,\ d\tilde\Gamma(A)\tilde\Gamma(U)\Omega)}\ \tilde\Gamma(U)\,\tilde\Gamma(e^{iA}). \qquad (8)$$
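To make the Schwinger term of Proposition 1 concrete, the sketch below evaluates $C(A, B) = \mathrm{Tr}(A_{-+}B_{+-} - B_{-+}A_{+-})$ for finite matrices. It assumes a toy four-dimensional space with $P_+$ projecting onto the first two coordinates; this finite-dimensional setting is an illustration only and is not the setting of the paper. In finite dimensions the traces of the diagonal blocks cancel, so that $C(A, B) = \mathrm{Tr}(P_-[A, B])$, which the code checks together with the antisymmetry $C(A, B) = -C(B, A)$. All names are hypothetical.

```python
# Finite-dimensional toy model of the Schwinger term.
# Assumption: H = C^4, P_+ projects onto coordinates 0, 1 and P_- onto 2, 3.

PLUS = (0, 1)
MINUS = (2, 3)

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def schwinger(a, b):
    """C(A, B) = Tr(A_{-+} B_{+-} - B_{-+} A_{+-}) computed block-wise."""
    return sum(a[i][j] * b[j][i] - b[i][j] * a[j][i]
               for i in MINUS for j in PLUS)

def trace_p_minus_commutator(a, b):
    """Tr(P_- [A, B]): trace of the commutator restricted to the minus block."""
    ab, ba = matmul(a, b), matmul(b, a)
    return sum(ab[i][i] - ba[i][i] for i in MINUS)

# Two arbitrary sample matrices with small integer real and imaginary parts.
A = [[complex(i + 2 * j, i * j - 1) for j in range(4)] for i in range(4)]
B = [[complex(j - i * i, 2 * i + j) for j in range(4)] for i in range(4)]

c_ab = schwinger(A, B)
c_ba = schwinger(B, A)
c_via_trace = trace_p_minus_commutator(A, B)
print(c_ab, c_via_trace)
```

In infinite dimensions the individual traces $\mathrm{Tr}(P_- AB)$ and $\mathrm{Tr}(P_- BA)$ need not exist, which is why only the off-diagonal blocks, assumed Hilbert-Schmidt, enter the definition.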

2.2. Implementable gauge groups in the one-particle Dirac theory. In the theory of (1+1)-dimensional free massive Dirac field, gauge transformations are operators of multiplication by unitary matrices on Hˇ ≡ L 2 (R, d x)⊗2 , the image of H ≡ L 2 (R, dp)⊗2 by means of the Fourier transformation F, employed to diagonalize the differential operator representing the free Dirac Hamiltonian. Since we are interested in lifting these unitaries to the Fock space, we must consider only gauge transformations which define unitaries in G2 . We denote by H1 (R) the Sobolev space, which consists of all absolutely


continuous functions of $L^2(\mathbb{R})$ with derivatives in $L^2(\mathbb{R})$. Once we have introduced the group (under pointwise multiplication) $L_e U(1) := \{u \in \mathrm{Map}(\mathbb{R}, U(1)) \mid u(\cdot) - 1 \in H_1(\mathbb{R})\}$, we define two faithful unitary representations $\check\pi_\pm$ of $L_e U(1)$ on $\check{\mathcal{H}}$, given by $\check\pi_+(u) = u(x) \oplus 1$ and $\check\pi_-(u) = 1 \oplus u(-x)$, i.e. they act as multiplication by a function on one component space only. Then $\pi_s(u) \in G_2$, and two projective unitary representations $\tilde\Gamma(\pi_\pm)$ on $\mathcal{F}_a(\mathcal{H})$ are automatically defined. Another global gauge transformation has the form $e^{i\varphi_+} \oplus e^{i\varphi_-}$, $\varphi_\pm \in (0, 2\pi)$. In the case of interest to us, i.e. rest mass of the particle $m > 0$, we put $\varphi_+ = \varphi_-$, otherwise the HS condition would be violated. We end this paragraph with a formula for $\mathrm{Ind}\,U_{--}$. We are interested in operators of the form

$$(\check{U} f)(x) = u(x) f(x), \qquad u(x) \in U(2), \qquad f \in \check{\mathcal{H}},$$

where the 2 × 2 matrix $u(x)$ is assumed to be diagonal:

$$u(x) = \begin{pmatrix} u_+(x) & 0 \\ 0 & u_-(x) \end{pmatrix}, \qquad u_\pm(x) \in \mathbb{C}. \qquad (9)$$

We set $A := F^{-1}\check{A}F$ for $\check{A}$ an operator on $\check{\mathcal{H}}$. We consider continuous multipliers of the form (9) such that, for each $s$, there exists $u_\infty \in C(\{\pm 1\}, U(1))$ satisfying

$$u_s(x) - u_\infty\!\left(\frac{x}{|x|}\right) = o(1), \qquad |x| \to \infty, \qquad s = +, -.$$

These multipliers form a group, denoted by $G_h$, and their Fredholm indices are easily computable in view of the following [21].

Theorem 1. Let $U \in G_h$. Then

$$\mathrm{Ind}\, U_{--} = w(u_+ u_-^{-1}), \qquad (10)$$

where $w$ is the winding number which, by convention, is positive on the map $x \to \frac{x - i}{x + i}$.
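The winding number entering Theorem 1 can be estimated numerically by accumulating phase increments along the real line. The sketch below does this for the reference map $x \to (x - i)/(x + i)$ and for a kink-type multiplier $x \to e^{i\pi n s(x)}$. The profile `step`, a merely $C^1$ odd function equal to ±1 outside (−1, 1), is a hypothetical stand-in for a smooth kink profile, and the function names and the sampling scheme are assumptions made for illustration.

```python
import cmath
import math

def winding_number(u, samples=20000):
    """Net number of counterclockwise turns of u(x) around 0 for x in R.

    The real line is compactified through x = tan(theta); the phase
    increments between consecutive samples are summed and divided by 2*pi.
    """
    thetas = [-math.pi / 2 + math.pi * (k + 0.5) / samples
              for k in range(samples)]
    values = [u(math.tan(t)) for t in thetas]
    total = sum(cmath.phase(b / a) for a, b in zip(values, values[1:]))
    return round(total / (2 * math.pi))

def cayley(x):
    """Reference map on which the winding number is +1 by convention."""
    return (x - 1j) / (x + 1j)

def step(x):
    """Hypothetical odd, increasing profile equal to +1/-1 outside (-1, 1)."""
    if x >= 1:
        return 1.0
    if x <= -1:
        return -1.0
    return (3 * x - x ** 3) / 2

def kink(n):
    """Multiplier x -> exp(i*pi*n*step(x)) for an integer n."""
    return lambda x: cmath.exp(1j * math.pi * n * step(x))

print(winding_number(cayley), winding_number(kink(2)))
```

For an integer $n$ the kink multiplier takes the same value at both ends of the line, and its phase advances by $2\pi n$ in between, so the routine returns $n$.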

In order to complete the discussion of the operators we shall employ in the next section, we remark that all our unitaries induce automorphisms of the CAR algebra which are unitarily implementable on the Fock space. Indeed, if $x_1 \to \alpha(x_1)$ is an odd, monotonically increasing, $C^\infty$ real valued function, which equals 1 at the right of the interval (−1, 1), we introduce the smeared-out kink operators

$$\check{U}_{\lambda,\epsilon} := e^{i\pi\lambda\alpha(\cdot/\epsilon)}, \qquad \lambda \in \mathbb{C},\ \epsilon > 0. \qquad (11)$$

The off-diagonal parts of $U_{\lambda,\epsilon}$ are HS for every $\lambda \in \mathbb{C}$, so it induces Bogoliubov transformations unitarily implementable for every $\lambda \in \mathbb{R}$ [21].


3. Strange Statistics in Two-Dimensional Free Massive Dirac Field Theory

The theory of fermionic gauge groups reveals itself as a natural setting for the construction of a model which exhibits anomalous statistics [1]. The initial Hilbert space is $L^2(\mathbb{R}, \mathbb{C}^2)$. Denoting by $\mathcal{K}$ the set of double cones in the (1+1)-dimensional Minkowski space, let $B_O$ be the base at time $t$ of $O \in \mathcal{K}$. The algebra of fields localizable in $O$ is defined in the usual way,

$$\mathcal{F}(O) = \{\pi(\phi(e^{itH_m} f)) : \mathrm{supp}(f) \subset B_O\}'',$$

and the global field algebra $\mathcal{F}$ is the $C^*$-inductive limit of the net $\{\mathcal{F}(O)\}_{O \in \mathcal{K}}$. The gauge invariant parts (i.e. the subalgebras left invariant by $\mathrm{Ad}\,\tilde\Gamma(e^{i\gamma})$, $\gamma \in \mathbb{R}$) form the net of observables. Let us mention that the free massive Dirac field theory in (1+1) dimensions fulfills twisted duality $\mathcal{F}(O)^\tau = \mathcal{F}(O')'$ (the proof of this is independent of the space-time dimension; see [4] for details), but the net of observables does not fulfill Haag duality for double cones. Indeed, if $O, O_1, O_2$ are double cones, with bases at $t = 0$, such that $O_1, O_2$ lie in different connected components of $O'$, and if $f_i$ are test functions with $\mathrm{supp}(f_i) \subset O_i$ ($i = 1, 2$), then the observable $\pi(\phi(f_1))^*\pi(\phi(f_2))$ is contained in $\mathcal{A}(O)'$ but not in $\mathcal{A}(O')''$ [1, 16]. The automorphism of $\mathcal{F}$ defined by $\rho_Z(\pi(\phi(f))) := \pi(\phi(Zf))$ is unitarily implemented in the Fock representation when $Z$ is one of the following unitaries of $L^2(\mathbb{R}, \mathbb{C}^2)$:

$$(U(n) f)(x) = (e^{i\pi n\varepsilon(x)} f_1(x), f_2(x)), \quad n \in \mathbb{Z},$$

$$(V(\lambda) f)(x) = e^{i\pi\lambda\vartheta(x)} f(x), \quad \lambda \in \mathbb{R},$$

which correspond in (9) to the choice $u_+(x) = \exp(i\pi n\varepsilon(x))$, $u_- \equiv 1$ and $u_\pm(x) = \exp(i\pi\lambda\vartheta(x))$, respectively. Here, the functions $\varepsilon$ and $\vartheta$ are characterized by the same properties as $\alpha$, relative to generic intervals $(-\epsilon, \epsilon)$, resp. $(-\theta, \theta)$, instead of (−1, 1). We emphasize that this result is valid only in the massive case [18]. The gauge group U(1) acts on $\mathcal{F}$ through $e^{i\gamma} \to \mathrm{Ad}\,\tilde\Gamma(e^{i\gamma})$, and the self-adjoint unitary operator inducing the twisting is $\tilde\Gamma(-1)$.
We refine Propositions 1 and 2 to suit our purpose [18]. The computation of the statistics operator needs commutation rules between implementers when translated to mutually space-like regions. Let $O$ be a double cone with basis $(-\epsilon, \epsilon)$ at $t = 0$, and $x, x' \in \mathbb{R}$ such that $O + x' \subset (O + x)'$. Since the Schwinger term for the pair $V(\lambda)_x$, $V(\lambda')_{x'}$ vanishes when $O + x' \subset (O + x)'$, Proposition 2 establishes that the projective representation $\tilde\Gamma$ is multiplicative in almost all cases of interest to us. Proposition 3 applies to the case $U = U(n)_x$ and $e^{iA} = V(\lambda)_{x'}$, giving

$$\tilde\Gamma(V(\lambda)_{x'})\,\tilde\Gamma(U(n)_x) = e^{i\pi n\lambda\,\mathrm{sgn}(x - x')}\ \tilde\Gamma(U(n)_x)\,\tilde\Gamma(V(\lambda)_{x'}),$$

$$\tilde\Gamma(U(n)_x)\,\tilde\Gamma(U(n')_{x'}) = \tilde\Gamma(U(n')_{x'})\,\tilde\Gamma(U(n)_x).$$

As will be clear later, we consider only even charge $n$, while the real number λ is left arbitrary. Since $G_2$ is a group, the product $U(n)V(\lambda)$ is unitarily implementable too. We set $W \equiv W(n, \lambda) := U(n)V(\lambda)$. Choosing the same generating function $\vartheta = \varepsilon$ we obtain a unitary operator on $L^2(\mathbb{R}, \mathbb{C}^2)$ defined by: $W := e^{i\pi(n+\lambda)\varepsilon(\cdot)} \oplus e^{i\pi\lambda\varepsilon(\cdot)}$. Here $W$ acts on both components of $L^2(\mathbb{R}, \mathbb{C}^2)$ as multiplication by two distinct functions. We note that $U(n) \equiv W(n, 0)$ and $V(\lambda) \equiv W(0, \lambda)$. In order to determine the


charge carried by the automorphism $\rho_W$ we must evaluate $q(W)$. The Fredholm index of $W_{--}$ can be easily computed as an immediate application of Theorem 1, and it equals $n$. Note that $V(\lambda)$ belongs to the connected component of the identity, therefore $\mathrm{Ind}\,V(\lambda) = \mathrm{Ind}(1) = 0$, $\lambda \in \mathbb{R}$. Thus $n$ is the charge carried by $W$, with no contribution from $V(\lambda)$. (The charge is entirely transported by $U(n)$ while $V(\lambda)$ is neutral.)

Proposition 4. Automorphisms of $\mathcal{F}$ of the form $\rho_W$ induce, on restriction, automorphisms of the observable algebra $\mathcal{A}$.

Proof. Since any ∗-homomorphism between $C^*$-algebras is continuous, the statement is an immediate consequence of the inclusion

$$\rho_W(\mathcal{A}(O_1)) \subset \mathcal{A}(O_1), \qquad \forall O_1 \in \mathcal{K}.$$

In order to prove this, let us consider a unitary W (n, λ) with generating function ε centred in O ∈ K. (We assume that all double cones have base at t = 0.) We note that ρW (F (O1 )) ⊂ F (O1 ). Indeed, if supp( f ) ⊂ BO1 , then supp(W f ) ⊂ BO1 too, hence: ρW (π(φ( f ))) ≡ π(φ(W f )) ∈ F (O1 ). (eiγ )] = 0 for all A ∈ A (O1 ) and γ ∈ R, since the adjoint Moreover, [ρW (A), iγ ˜ ) commute between themselves in view of Proposition 3. ˜ ) and Ad(V actions Ad(e The claim follows from the gauge invariance of A. 2 Proposition 5. Automorphisms of A defined as in Proposition 4 are localizable in double cones. / Proof. Let A := π(φ( f )), supp( f ) ⊂ BO1 , where O1 ⊂ O . Obviously, if x ∈ supp( f ) then (W f )(x) = 0. If x ∈ supp( f ) ⊂ BO1 , then x ∈ / BO and ε(x) = ±1, and we have (W f )(x) = e±πiλ f (x), ˜ ±πiλ )(A), ∀A ∈ F (O1 ). Now, it is then evident that, if A ∈ hence ρW (A) = Ad(e U(1) A (O1 ) ≡ F (O1 ) , then ρW (A) = A, i.e. ρW acts trivially on A (O ). 2 Proposition 6. For each unitary W defined as above, ρW := Ad (W ) defines a localizable and translatable automorphism of A . Proof. We have just proved localizability: ρW |A (O ) = id|A (O ) , where O is the double cone in whose base the unitary W (i.e. its generating function ε) is “centred”, and ˜ ⊂ (A (O)) ˜ for every double cone O˜ ⊃ O. In order to prove translatability, ρW (A (O)) let us observe that denoting the translates by Wx := T (x)W T (−x), the automorphism ˜ ρWx is localized in O + x. Moreover, the unitary (Wx W ∗ ) intertwines ρW := Ad(W ) ˜ and ρWx := Ad(Wx ), and induces an equivalence between them since it is a local ˜ with O˜ ⊃ O ∪ Ox . The gauge invariance observable. Indeed, (Wx W ∗ ) ∈ A (O), comes from the commutation relation of Proposition 3 between (V ) and (ei A ), where ∗ the inner product (8) now vanishes. Indeed, since q(Wx W ) = q(Wx ) + q(W ∗ ) ≡ q(W ) − q(W ) = 0, it follows that the intertwiner (Wx W ∗ ) preserves the charge and so (Wx W ∗ ) ∈ C. 
We thus have: ( (Wx W ∗ ), d (γ 1) (Wx W ∗ )) = (, d (γ 1)) = 0.


In order to prove that (Wx W ∗ ) ∈ F (O˜ ) , let us consider O1 ⊂ O˜ and supp( f ) ⊂ BO1 . In order to evaluate the expression (Wx W ∗ )π(φ( f )) (Wx W ∗ )∗ = ρWx ◦ ρW ∗ (π(φ( f ))),

(12)

we notice that ρW ∗ ≡ ρW−1 is still localized in O, and then, since W (n, λ)∗ = W (−n, −λ), ρW ∗ (π(φ( f ))) = π(φ(e−iπ λ f )). Analogously ρWx (π(φ( f ))) = π(φ(eiπ λ f )), so the right side of (12) reduces to π(φ( f )) and the result follows from twisted duality. 2 ˜ In the previous proof we have incidentally established that unitaries (W ) are gauge invariant if and only if q(W ) = 0. We are now in a position to perform the computation of the statistics operator ερW . For simplicity, we start with an automorphism ρW localized in a double cone O centred at the origin. Proposition 7. If the automorphism ρW is localized in a double cone centred at the origin, its statistics operator is ερW = e±2πinλ 1, according to the connected component of O . Proof. Since we work at t = 0, we omit the component-subscript and consider x ∈ R. The automorphism ρWx is localized in O + x and is unitarily equivalent to ρW through ˜ x W ∗ ). Then, the intertwiner (W (W ) (Wx ) (Wx W ∗ )∗ ρW ( (Wx W ∗ )) = (W ) (Wx )∗ (W ∗ )2 . ερW = (Here, and in the sequel, we omit all cocyles since they are always coupled with their conjugate). With this convention the previous expression yields (U )∗ (W )∗ . (W ) (Wx∗ ) (U ) (V ) (Ux ) (Vx ) (V )∗ By Proposition 3 and our remarks on the specific cases discussed in [18], ˜ ˜ ˜ (Ux ) (V ) (Ux ) ≡ (ei X (λ) ) (U (n)x ) = ei((Ux ),d(X (λ))(Ux )) (ei X (λ) ) (Ux ) = eiπ nλ sgn(x) (V ),

(Ux )∗ with (U ). We then obtain while (V ) commutes with (Vx ), and (W ) (Vx∗ ) (W )∗ . (U ) (Vx ) (U )∗ eiπ nλ sgn(x) Repeating the same arguments for the two central terms, one has ερW = e2πinλ sgn(x) 1. 2
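The dependence of the statistics phase $e^{\pm 2\pi i n\lambda}$ on the chosen component of the causal complement can be explored numerically. The sketch below evaluates the phase for both signs and labels a pair $(n, \lambda)$ according to whether $2n\lambda$ is an even integer, an odd integer, or not an integer. The function names and the labels `"bose"`, `"fermi"` and `"braid"` are illustrative assumptions; exact rational arithmetic is used for λ so that the integrality test is reliable.

```python
import cmath
import math
from fractions import Fraction

def statistics_phase(n, lam, side):
    """Phase exp(2*pi*i*n*lam*side); side = +1 or -1 labels the component."""
    return cmath.exp(2j * math.pi * n * float(lam) * side)

def classify(n, lam):
    """Label the statistics according to the value of 2*n*lam."""
    twice = Fraction(2 * n) * Fraction(lam)
    if twice.denominator != 1:
        return "braid"   # phase differs from +1 and -1
    return "bose" if twice.numerator % 2 == 0 else "fermi"

def components_agree(n, lam, tol=1e-12):
    """True when both components of the complement give the same phase."""
    return abs(statistics_phase(n, lam, +1) - statistics_phase(n, lam, -1)) < tol

for lam in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 3)):
    print(lam, classify(2, lam), components_agree(2, lam))
```

The two components give the same phase exactly when $e^{4\pi i n\lambda} = 1$, i.e. when $2n\lambda$ is an integer, which is the case of ordinary statistics.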


If the automorphism is translated to a double cone $O + x$, for arbitrary $x$, it assumes the form $\rho_{W_x}$, with $W$ localized around the origin. With a proof identical to that of Prop. 7, one easily obtains the following.

Proposition 8. For every $x \in \mathbb{R}$,

$$\varepsilon_{\rho_{W_x}} = e^{2\pi i n\lambda\,\mathrm{sgn}(x - y)}\, 1,$$

with $O + y$ the auxiliary double cone, spatially separated from $O + x$, used in the construction of the statistics operator.

Remark. In view of the decomposition $\rho_W = \rho_U \rho_V$, an alternative method of evaluating $\varepsilon_{\rho_W}$ is based on the identity

$$\varepsilon_{\rho_U \rho_V} = \rho_U(\varepsilon(\rho_U, \rho_V))\, \varepsilon_{\rho_U}\, \rho_U^2(\varepsilon_{\rho_V})\, \rho_U(\varepsilon(\rho_V, \rho_U)). \qquad (13)$$

A straightforward computation reduces (13) to $\varepsilon_{\rho_W} = \rho_U(\varepsilon_M(\rho_U, \rho_V))$, where the monodromy operator is simply $\varepsilon_M(\rho_U, \rho_V) = e^{2\pi i n\lambda\,\mathrm{sgn}(y - x)}\, 1$. We also observe that we could have determined the latter by exploiting the low-dimensional quantum field theory as formulated in [13], where only the statistics phases are involved. Indeed, since $\kappa_W = e^{2\pi i n\lambda\,\mathrm{sgn}(y - x)}$ and $\kappa_U = \kappa_V = 1$, the claim follows from

$$\varepsilon(\rho_V, \rho_U)\,\varepsilon(\rho_U, \rho_V) = \frac{\kappa_W}{\kappa_U \kappa_V}\, 1$$

[13, Lemma 3.3]. Therefore, these results are still consistent with the general theory of local quantum fields in low dimension.

We have incidentally noticed that in a (1+1)-dimensional massive QFT the statistics of a product may not coincide with the product of statistics, i.e. composition of DHR morphisms with ordinary statistics may generate braid statistics. This possibility is excluded by (3+1)-dimensional QFT with Haag duality [8, p.179], where $\varepsilon_{\xi_1}\varepsilon_{\xi_2} = \varepsilon_{\xi_1 \xi_2}$ for two arbitrary superselection sectors $\xi_1, \xi_2$ (i.e. equivalence classes of Poincaré covariant localized automorphisms). The factorization property of statistics is no longer true in theories where non-ordinary statistics can occur. In our model this property is equivalent to the triviality of monodromy, i.e. it holds if and only if the automorphism carries ordinary statistics. Since unitary intertwiners between translation equivalent automorphisms are local observables, the violation of the multiplicative property of statistics cannot be attributed to the violation of Haag duality but to the geometry of space-time. It is easily seen that the statistics operator depends on the translation equivalence class of automorphisms but not on its representative. The following corollary is then evident.

Corollary 1. The statistics operator $\varepsilon_{\rho_W}$ gives rise to a one-dimensional representation of the braid group if and only if $2n\lambda \notin \mathbb{Z}$.

We end this section with an expression for the statistics operator $\varepsilon(\rho_W, \rho_{W'})$ when the unitaries $W$ and $W'$ are centred in the same interval. If the induced automorphisms are localized in $O$, let $x, y \in \mathbb{R}^2$ be such that $O_x$ and $O_y$ lie in the right component of $O'$, with $O_x \succ O_y$, where by $O_x \succ O_y$ we mean that $O_x$ lies in the right component


of the space-like complement of O y . Let ρWx be localized in Ox and equivalent to ρW . Analogously, let ρW be localized in O y and equivalent to ρW . One then has y

∗

ε(ρW , ρW ) = (W W y ) × (W Wx∗ ) ◦ (Wx W ∗ ) × (W y W ∗ )

∗ (Wx W ∗ )ρW ( (W y ) (W Wx ∗ ) (W y )∗ (W y W ∗ )) = (W W y ) ∗ (W y W ∗ ) (W y ) (W Wx ∗ ) (W y )∗ (Wx W ∗ ) = (W W y )

= e−iπ(nλ +n λ) (W ) (W )∗ (W ) (W )∗ = e−πi(nλ +n λ) 1. On the other hand, if we choose $O_x \prec O_y$, the exponent in the last member changes sign. Therefore

$$\varepsilon(\rho_W, \rho_{W'}) = e^{i\pi(n\lambda' + n'\lambda)\,\mathrm{sgn}(y - x)}\, 1. \qquad (14)$$

4. Charge Implementers are not Quasilocal

As already stated, charge implementers $\tilde\Gamma(W)$ do not belong to the observable algebra. In this section we will show that they are not even in the field algebra $\mathcal{F}$ when $\lambda \neq 0 \bmod 2$. As a first step we observe that if $\tilde\Gamma(W) \in \mathcal{F}$, then its translates $\tilde\Gamma(W)_x \in \mathcal{F}$ too, through $\alpha_x(\pi(\phi(f))) = \pi(\phi(\tau_x f))$, where $(\tau_x f)(\xi) := f(\xi - x)$. Once we have determined the statistics operator, we can exclude the trivial case $\lambda = 0$ (no kinks present), since it yields ordinary statistics. Obviously, this is not the unique value of λ for which the statistics reduce to the ordinary one. The other values which realize this possibility depend on the charge, since they are given by $2n\lambda \in \mathbb{Z}$, and are trivially taken into account. For $x \in \mathbb{R}$ such that $O + x \subset O'$, we have

$$\tilde\Gamma(W)\,\tilde\Gamma(W_x) = e^{2\pi i q\lambda\,\mathrm{sgn}(x)}\ \tilde\Gamma(W_x)\,\tilde\Gamma(W), \qquad (15)$$

where $q \equiv q(W) = q(W_x)$ is the charge carried by both $\rho_W$ and $\rho_{W_x}$. For a general field $F$ we have

$$F^\tau = F_+ - iF_-\tilde\Gamma(-1), \qquad (16)$$

where $F_\pm$ denote the bosonic, resp. fermionic, part of $F$. This can be easily seen from the explicit form of the twisted field in the general case:

$$F^\tau = \mathrm{Ad}Z(F), \qquad Z = \frac{1 + i\tilde\Gamma(-1)}{1 + i}$$

(the symbol 1 always denotes the identity operator on the corresponding Hilbert space, as is clear from the context). We are interested in the case $2q\lambda \notin \mathbb{Z}$, since this leads to a contradiction: the commutators are non-vanishing, according to (15). Let $\tilde\Gamma(W) \in \mathcal{F}$. There exists a sequence of local fields $\{F_n\}_{n \in \mathbb{N}}$ norm convergent to $\tilde\Gamma(W)$, with $F_n \in \mathcal{F}(O_n)$, $n \in \mathbb{N}$. Since $\tilde\Gamma(W)^\tau = \tilde\Gamma(W)$, we have

$$\|F_n^\tau - F_n\| \le 2\|\tilde\Gamma(W) - F_n\|. \qquad (17)$$

On the other hand, according to (16), we have $F_n^\tau - F_n = -(1 + i)F_n^- Z$, where $F_n^-$ denotes the fermionic part of $F_n$, and thus $\|F_n^\tau - F_n\| = \sqrt{2}\,\|F_n^-\|$. By virtue of (17) we then have

$$\|F_n^-\| \le \sqrt{2}\,\|\tilde\Gamma(W) - F_n\|. \qquad (18)$$
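The two algebraic facts used here, namely the decomposition (16) of the twisted field and the identity $\|F^\tau - F\| = \sqrt{2}\,\|F^-\|$, can be checked on a toy example. The sketch below assumes a two-dimensional graded space in which the grading operator is `K = diag(1, -1)`, a hypothetical stand-in for the implementer of the twist, and it uses the Frobenius norm, which like the operator norm is invariant under multiplication by a unitary. The sample matrix `F` is arbitrary.

```python
import math

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def frobenius(a):
    """Frobenius norm, a unitarily invariant stand-in for the operator norm."""
    return math.sqrt(sum(abs(x) ** 2 for row in a for x in row))

K = [[1, 0], [0, -1]]          # toy grading operator (assumption)
Z = [[1, 0], [0, -1j]]         # (1 + iK)/(1 + i) evaluated for this K
F = [[2 + 1j, 0.5 - 3j], [-1.5j, 4]]

F_even = [[F[0][0], 0], [0, F[1][1]]]   # commutes with K
F_odd = [[0, F[0][1]], [F[1][0], 0]]    # anticommutes with K

F_twisted = matmul(matmul(Z, F), dagger(Z))
odd_times_k = matmul(F_odd, K)
F_formula = [[F_even[i][j] - 1j * odd_times_k[i][j] for j in range(2)]
             for i in range(2)]

difference = [[F_twisted[i][j] - F[i][j] for j in range(2)] for i in range(2)]
lhs = frobenius(difference)
rhs = math.sqrt(2) * frobenius(F_odd)
print(lhs, rhs)
```

The factor $\sqrt{2}$ is simply $|1 + i|$, and the unitary $Z$ drops out of the norm.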

Let now $M > 0$ be such that $\|F_n\| < M$. An ε/3 argument yields

$$\|[\tilde\Gamma(W), \tilde\Gamma(W)_x]\| < (M + 1)\,\|\tilde\Gamma(W) - F_n\| + \|F_n F_{n,x} - \tilde\Gamma(W)_x\tilde\Gamma(W)\|, \qquad (19)$$

where we have used the fact that the translations, being implemented by unitary operators, are norm-preserving. Here $F_{n,x}$ denotes the translate of $F_n$ by $x$, for the generic $n \in \mathbb{N}$. The second term on the right hand side of (19) is dominated by

$$\|F_n F_{n,x} - F_{n,x} F_n\| + \|F_{n,x} F_n - \tilde\Gamma(W)_x\tilde\Gamma(W)\|.$$

Since $F_n$ and $F_{n,x}$ are local fields, $[F_n, F_{n,x}] = 2F_n^- F_{n,x}^-$ when $O_n + x \subset O_n'$. In this case we obtain, using (18):

$$\|F_n F_{n,x} - F_{n,x} F_n\| \le 2\,\|F_n^-\|\,\|F_{n,x}^-\| \le 4\,\|\tilde\Gamma(W) - F_n\|^2.$$

In the last step we have exploited the commutativity between the actions $\alpha_{\mathbb{R}^2}$ and $\beta_{U(1)}$ in order to yield $(F_x)^- = (F^-)_x$ for each quasilocal field $F$. Let now $\varepsilon > 0$, and let $n_\varepsilon \in \mathbb{N}$ be a positive integer such that

$$\|\tilde\Gamma(W) - F_n\| < \frac{\varepsilon}{3(M + 1)}, \qquad \|F_{n,x} F_n - \tilde\Gamma(W)_x\tilde\Gamma(W)\| < \frac{\varepsilon}{3}$$

for all $n \ge n_\varepsilon$. The integer $n_\varepsilon$ is independent of $x$. For $n = n_\varepsilon$, let $x > 0$ be such that $O_{n_\varepsilon} + x \subset O_{n_\varepsilon}'$. Then, the left-hand side of (19) can be made arbitrarily small, contradicting (15). This proof, though intuitive, does not work if $2q\lambda \in \mathbb{Z}$, since no contradiction is obtained in the latter case. In particular, our arguments exclude the case $q = 0$. We will give an alternative proof which overcomes this impediment, based on twisted duality for the field algebra.

Theorem 2. For every $\lambda \neq 0 \bmod 2$ the unitary implementers $\tilde\Gamma(W)$ are not elements of the field algebra $\mathcal{F}$. If $\lambda = 0 \bmod 2$, they are local elements of the field algebra.

Proof. If $\tilde\Gamma(W) \in \mathcal{F}$, let $\{F_n\}_{n \in \mathbb{N}}$ be a sequence of local elements of $\mathcal{F}$ norm convergent to $\tilde\Gamma(W)$, with $F_n \in \mathcal{F}(O_n)$ for suitable double cones $O_n$. Since $F_n^\tau \in \mathcal{F}(O_n)^\tau = \mathcal{F}(O_n')'$, $[F_n^\tau, A] = 0$, $A \in \mathcal{F}(O_n')$, $n \in \mathbb{N}$. If $\hat{O}_n \subset O_n'$, we have $[F_n^\tau, \pi(\phi(f))] = 0$, $\mathrm{supp}(f) \subset B_{\hat{O}_n}$, $n \in \mathbb{N}$. With this notation it follows that

$$\|\tilde\Gamma(W)^\tau \pi(\phi(f)) - \pi(\phi(f))\tilde\Gamma(W)^\tau\|$$
$$\le \|(\tilde\Gamma(W)^\tau - F_n^\tau)\pi(\phi(f))\| + \|F_n^\tau \pi(\phi(f)) - \pi(\phi(f))F_n^\tau\| + \|\pi(\phi(f))(F_n^\tau - \tilde\Gamma(W)^\tau)\|$$
$$\le 2\,\|\pi(\phi(f))\|\,\|\tilde\Gamma(W)^\tau - F_n^\tau\| + \|F_n^\tau \pi(\phi(f)) - \pi(\phi(f))F_n^\tau\|. \qquad (20)$$

Without loss of generality, we consider only normalized functions. Then, since the correspondence $f \to \pi_P(\phi(f))$ is isometric, $\|\pi(\phi(f))\| = 1$. Fixing an arbitrary $\epsilon > 0$, let $n_\epsilon \in \mathbb{N}$ be such that $\|\tilde\Gamma(W)^\tau - F_n^\tau\| < \epsilon/2$ for every $n \ge n_\epsilon$. For $n = n_\epsilon$ and $\mathrm{supp}(f) \subset B_{\hat{O}_{n_\epsilon}}$, where $\hat{O}_{n_\epsilon} \subset O_{n_\epsilon}'$, the expression in the last line of (20) is less than $\epsilon$. Therefore, for each fixed $q$ and λ, $\|\tilde\Gamma(W)^\tau \pi(\phi(f)) - \pi(\phi(f))\tilde\Gamma(W)^\tau\|$ can be made arbitrarily small by choosing a suitable function.


On the other hand, since $\tilde\Gamma(W)^\tau = \tilde\Gamma(W)$, the expression in the first line of (20) assumes the simpler form

$$\|e^{-i\pi\lambda}\pi(\phi(f))\tilde\Gamma(W) - \pi(\phi(f))\tilde\Gamma(W)\| = |e^{-i\pi\lambda} - 1|, \qquad (21)$$

in view of the CAR relations and the unitarity of $\tilde\Gamma(W)$. Thus, we have a contradiction between (20) and (21), since the latter establishes that, for $\lambda \neq 0 \bmod 2$, the norm is a strictly positive constant. Finally, if $\lambda = 0 \bmod 2$ the unitary implementers are local elements of $\mathcal{F}$. Indeed, if $W$ is centred in $O$, $\tilde\Gamma(W)\pi(\phi(f))\tilde\Gamma(W)^* = \pi(\phi(f))$ for every $f$ with support spatially separated from $O$. Therefore, by twisted duality, $\tilde\Gamma(W) \in \mathcal{F}(O')' = \mathcal{F}(O)^\tau$. On the other hand, since the twisting is involutive on any local field algebra, i.e. $\mathcal{F}(O)^{\tau\tau} = \mathcal{F}(O)$, $\tilde\Gamma(W) \in \mathcal{F}(O)$ follows from $\tilde\Gamma(W)^\tau \equiv \tilde\Gamma(W)$. □

This result shows that the strange behaviour of statistics for this model appears only when the implementers are not elements of the field algebra, confirming that there is no contradiction between what we expected from the general theory of superselection sectors and the peculiarities arising from this model. When the “solitonic” parameter λ vanishes, the statistics is again trivial, i.e. conventional Bose-Einstein or Fermi-Dirac. Since only the zero charge $\tilde\Gamma(W)$ are gauge invariant, the unique cases in which the implementers are observables (and local) are when $\lambda \in 2\mathbb{Z}$ and $q = 0$. The classification of the localizability property of our implementers is thus complete.

5. Braiding Structure and Asymptotic Abelianness

In order to approach the study of strange statistics with a more general tool, we observe that the well known AQFT as formulated in [9, 10 and 12] cannot be applied here in its entirety, since Haag duality is violated by (1+1)-dimensional free massive Dirac field theory. A more appropriate setting seems to be that proposed in [2] in order to construct symmetric tensor $C^*$-categories in QED, since it extends to theories where intertwiners are not contained in the algebra where the endomorphisms act. Moreover, the endomorphisms are not necessarily locally inner³, but only in an asymptotic sense.
Asymptotic abelianness yields a tensor C ∗ -category starting directly from representations, without exploiting Haag duality. We set up notation and state definitions. Let us introduce two sets of nets ρW → UρW and ρW → VρW for every object ρW . Each net consists of unitary intertwiners in (ρW , ρWxm ), where ρWxm tends pointwise in norm to the identity morphism on A for suitable sequences {xm }m . Definition 2 (Asymptotic abelianness). A field theory model satisfies asymptotic abelianness if, given intertwiners R ∈ (ρW , ρW ), S ∈ (ρW , ρW ) and nets Um ∈ UρW , Um ∈ Uρ , Vn ∈ Vρ , Vn ∈ Vρ , W

W

W

Um RUm∗

× Vn SVn∗ − Vn SVn∗ × Um RUm∗ −→ 0

in norm as $m, m', n, n' \to \infty$.

3 We recall that $\tilde\Gamma(W) \notin \mathcal{A}$ in almost all cases.

(22)


Finally, the sets of nets must be compatible with products: for each pair ρW , ρW ∈ , there exist Um ∈ UρW and Um ∈ Uρ such that Um × Um ∈ UρW ρ , and similarly W W for V. Here denotes a semigroup of endomorphism of A . If the nets satisfy all these conditions, they give rise to a bi-asymptopia. Theorem 3. [2]. If ρW , ρW ∈ , then: ε(ρW , ρW ) :=

lim Vn∗ × Um∗ Um × Vn

m,n→∞

exists, is independent of Um ∈ UρW and Vn ∈ Vρ , and lies in (ρW ρW , ρW ρW ). MoreW over, if R ∈ (ρW , ρW ), S ∈ (ρW , ρW ) and ρW ∈ , then: ε(ρW , ρW ) ◦ R × S = S × R ◦ ε(ρW , ρW ), ε(ρW ρW , ρW ) = ε(ρW , ρW ) × 1ρ ◦ 1ρW × ε(ρW , ρW ), W

ε(ρW , ρW ρW ) = 1ρ × ε(ρW , ρW ) ◦ ε(ρW , ρW ) × 1ρ . W

W

We apply this method to our class of automorphisms in the setting of the (1+1)-dimensional free massive Dirac field. In light of the computation performed in the previous section, it remains to verify the condition of asymptotic abelianness in order to have a bi-asymptopia and consequently a braiding for our category. Starting from (2+1)-dimensional models, where a space-like cone $C$ and its opposite $-C$ are usually chosen as asymptotic localization regions in the definition of the families $\mathcal{U}$ and $\mathcal{V}$, we extend the method to a (1+1)-dimensional space-time by choosing the two standard wedges $W_\pm$. Since in (1+1) dimensions there is a natural notion of right and left, hence of $+\infty$ and $-\infty$, we set

\[ \mathcal{U}_{\rho_W} := \{(U_a)_a \subset (\rho_W, \rho_{W_a}),\ a \to +\infty\}, \qquad \mathcal{V}_{\rho_W} := \{(V_b)_b \subset (\rho_W, \rho_{W_b}),\ b \to -\infty\}, \]

where the two families of nets are contained in $W_+$, $W_-$ respectively. (This condition implies that the nets are contained in the causal complement of every bounded region for large values of the indices.) Let $R : \rho_W \to \rho_{W'}$, $S : \rho_{\tilde W} \to \rho_{\tilde W'}$, and $z, \zeta \in \mathbb{C}$ be defined by

\[ R = z\,\Gamma(W' W^{*}), \qquad S = \zeta\,\Gamma(\tilde W' \tilde W^{*}). \]

As intertwiners $U_m \in \mathcal{U}_{\rho_W}$, $U'_{m'} \in \mathcal{U}_{\rho_{W'}}$, $V_n \in \mathcal{V}_{\rho_{\tilde W}}$, $V'_{n'} \in \mathcal{V}_{\rho_{\tilde W'}}$, we set:

\[ U_m = \lambda_m \Gamma(W_{x_m} W^{*}), \qquad x_m \xrightarrow{\,W_+\,} +\infty, \]
\[ U'_{m'} = \lambda'_{m'} \Gamma(W'_{x'_{m'}} W'^{*}), \qquad x'_{m'} \xrightarrow{\,W_+\,} +\infty, \]
\[ V_n = \mu_n \Gamma(\tilde W_{y_n} \tilde W^{*}), \qquad y_n \xrightarrow{\,W_-\,} -\infty, \]
\[ V'_{n'} = \mu'_{n'} \Gamma(\tilde W'_{y'_{n'}} \tilde W'^{*}), \qquad y'_{n'} \xrightarrow{\,W_-\,} -\infty, \]

where $\lambda_m, \lambda'_{m'}, \mu_n, \mu'_{n'} \in \mathbb{C}$ are defined as the scalar $z$, for every $m, m', n, n'$. To simplify notation, in the sequel $x'_m$ stands for $x'_{m'}$. The term of the sequence to which we refer


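will be clear from the context. Analogous simplifications will be adopted for $y_n$, $\lambda_m$, $\mu_n$. We now perform the computation of the limit in (22), which now assumes the form:

\begin{align*}
& \lambda'_m \Gamma(W'_{x'_m} W'^{*})\, z\, \Gamma(W' W^{*})\, \bar\lambda_m \Gamma(W_{x_m} W^{*})^{*} \times \mu'_n \Gamma(\tilde W'_{y'_n} \tilde W'^{*})\, \zeta\, \Gamma(\tilde W' \tilde W^{*})\, \bar\mu_n \Gamma(\tilde W_{y_n} \tilde W^{*})^{*} \\
& \quad - \mu'_n \Gamma(\tilde W'_{y'_n} \tilde W'^{*})\, \zeta\, \Gamma(\tilde W' \tilde W^{*})\, \bar\mu_n \Gamma(\tilde W_{y_n} \tilde W^{*})^{*} \times \lambda'_m \Gamma(W'_{x'_m} W'^{*})\, z\, \Gamma(W' W^{*})\, \bar\lambda_m \Gamma(W_{x_m} W^{*})^{*} \\
& = \lambda'_m \bar\lambda_m z\, D(W'_{x'_m} W'^{*}, W' W^{*})\, D(W'_{x'_m} W^{*}, W W_{x_m}^{*})\, \Gamma(W'_{x'_m} W_{x_m}^{*}) \\
& \quad \times \mu'_n \bar\mu_n \zeta\, D(\tilde W'_{y'_n} \tilde W'^{*}, \tilde W' \tilde W^{*})\, D(\tilde W'_{y'_n} \tilde W^{*}, \tilde W \tilde W_{y_n}^{*})\, \Gamma(\tilde W'_{y'_n} \tilde W_{y_n}^{*}) - (\leftrightarrow), \tag{23}
\end{align*}

where $(\leftrightarrow)$ denotes the cross product of the same terms in the inverse order. Here $D(A, B)$ is the cocycle of the projective representation $\Gamma$, i.e. $\Gamma(W_1)\Gamma(W_2) = D(W_1, W_2)\,\Gamma(W_1 W_2)$. Collecting all scalars in a factor $q$, (23) reduces to

\[ q \left[ \Gamma(W'_{x'_m} W_{x_m}^{*}) \times \Gamma(\tilde W'_{y'_n} \tilde W_{y_n}^{*}) - \Gamma(\tilde W'_{y'_n} \tilde W_{y_n}^{*}) \times \Gamma(W'_{x'_m} W_{x_m}^{*}) \right] \tag{24} \]

so, to evaluate the asymptotic behaviour of (22) it suffices to compute the limit of the expression in parentheses. Since

\[ \Gamma(W'_{x'_m} W_{x_m}^{*}) : \rho_{W_{x_m}} \longrightarrow \rho_{W'_{x'_m}}, \qquad \Gamma(\tilde W'_{y'_n} \tilde W_{y_n}^{*}) : \rho_{\tilde W_{y_n}} \longrightarrow \rho_{\tilde W'_{y'_n}}, \]

the expression in (24) becomes:

\begin{align*}
& q \left[ \Gamma(W'_{x'_m} W_{x_m}^{*})\, \Gamma(W_{x_m})\, \Gamma(\tilde W'_{y'_n} \tilde W_{y_n}^{*})\, \Gamma(W_{x_m})^{*} - \Gamma(\tilde W'_{y'_n} \tilde W_{y_n}^{*})\, \Gamma(\tilde W_{y_n})\, \Gamma(W'_{x'_m} W_{x_m}^{*})\, \Gamma(\tilde W_{y_n})^{*} \right] \\
& = q' \left[ \Gamma(W'_{x'_m})\, \Gamma(\tilde W'_{y'_n})\, \Gamma(\tilde W_{y_n})^{*}\, \Gamma(W_{x_m})^{*} - \Gamma(\tilde W'_{y'_n})\, \Gamma(W'_{x'_m})\, \Gamma(W_{x_m})^{*}\, \Gamma(\tilde W_{y_n})^{*} \right],
\end{align*}

where

\[ q' := q\, D(W'_{x'_m}, W_{x_m}^{*})\, D(\tilde W'_{y'_n}, \tilde W_{y_n}^{*}). \]

By construction $y_n < x_m$, $y'_n < x'_m$ for $m, n$ sufficiently large; hence the unitaries $W'_{x'_m}$ and $\tilde W'_{y'_n}$ are centred in disjoint intervals for $m, n$ sufficiently large. Equivalently, the corresponding implementers $\Gamma(W'_{x'_m})$ and $\Gamma(\tilde W'_{y'_n})$ are localized in causally disjoint regions. We are then in a position to exploit the commutation rules between second quantization operators we have established in Sect. 2. For example,

\[ \Gamma(W'_{x'_m})\, \Gamma(\tilde W'_{y'_n}) = e^{-i\pi(n\tilde\lambda + \tilde n\lambda)}\, \Gamma(\tilde W'_{y'_n})\, \Gamma(W'_{x'_m}), \]

and analogously for $W_{x_m}$. Performing the substitutions, it turns out that (22) vanishes for large values of the indices and thus, a fortiori, tends to zero. We have thus shown the following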


Proposition 9. The subcategory of End($\mathcal{A}$) generated from $\Delta$ admits a braiding structure $\varepsilon$.

This model shows that the method of asymptotic abelianness, in the form stated in [2], cannot be applied to (1+1)-dimensional massive theories in order to obtain a generalized statistics operator, since the definition of bi-asymptopias is not consistent with the geometric peculiarity of space-time which may give rise to two distinct statistics operators. More precisely, we compare the braiding $\varepsilon$ of Theorem 3 with the statistics operator $\varepsilon(\rho_W, \rho_{\tilde W})$ computed in the purely algebraic setting as in (14). If $\{x_m\}_m$ and $\{y_n\}_n$ are such that $\mathcal{O}_{x_m}$ and $\mathcal{O}_{y_n}$ are space-like separated for sufficiently large $m$ and $n$, one has

\begin{align*}
V_n^{*} \times U_m^{*} \circ U_m \times V_n
&= \bar\mu_n \Gamma(\tilde W \tilde W_{y_n}^{*}) \times \bar\lambda_m \Gamma(W W_{x_m}^{*}) \circ \lambda_m \Gamma(W_{x_m} W^{*}) \times \mu_n \Gamma(\tilde W_{y_n} \tilde W^{*}) \\
&= \Gamma(\tilde W \tilde W_{y_n}^{*})\, \Gamma(\tilde W_{y_n})\, \Gamma(W W_{x_m}^{*})\, \Gamma(\tilde W_{y_n}^{*})\, \Gamma(W_{x_m} W^{*})\, \Gamma(W)\, \Gamma(\tilde W_{y_n} \tilde W^{*})\, \Gamma(W)^{*},
\end{align*}

which coincides with the expression of $\varepsilon(\rho_W, \rho_{\tilde W})$ gained by transporting $\rho_W$ and $\rho_{\tilde W}$ resp. to $\rho_{W_{x_m}}$ and $\rho_{\tilde W_{y_n}}$. Since the two nets of double cones are contained in distinct components of $\mathcal{O}'$, this is incompatible with the basic prescriptions of AQFT, since this procedure would be equivalent to not remaining in a fixed connected component! We remark that asymptotic abelianness requires that the two nets of double cones $\mathcal{O}_m^{U}$ and $\mathcal{O}_n^{V}$, which appear in $\mathcal{U}_{\rho_W}$ and $\mathcal{V}_{\rho_W}$, do lie in distinct components of $\mathcal{O}'$, since only this configuration guarantees that $\mathcal{O}_m^{U} - \mathcal{O}_n^{V}$ tends space-like to infinity as $m, n \to \infty$. (There are no alternatives in (1+1) dimensions with no additional constraint.) So, the theory of bi-asymptopias may not lead to true statistics when the Minkowski space is not at least (2+1)-dimensional and braid statistics occurs. As in the AQFT setting, we can always collect path connected bi-asymptopias into equivalence classes, but without additional prescription on the double limit in Definition 2, we could include objects which have no physical meaning, since they correspond in (1+1) dimensions to working with both components of $\mathcal{O}'$ at the same time.

A natural way to generalize this approach to two-dimensional theories is to reformulate some definitions, giving a restricted notion of asymptotic abelianness appropriate to all dimensions. Instead of performing the double limit as in Definition 2 and Theorem 3, we choose a particular “direction”, for example the diagonal one, i.e. $m = n$,

\[ \varepsilon(\rho_W, \rho_{\tilde W}) := \lim_{n\to\infty} V_n^{*} \times U_n^{*} \circ U_n \times V_n, \]

provided $\mathcal{O}_{x_n}$ and $\mathcal{O}_{y_n}$ are space-like separated for $m$ and $n$ sufficiently large (analogously for the definitions of asymptotic abelianness and bi-asymptopias). All properties continue to be valid, since the true reason for sending to infinity the two nets of double cones is to exploit morphisms which commute. For example, with the definition $\mathcal{O}_n^{V} := \mathcal{O}_n^{U} \pm \hat e_1 n$, all conditions are satisfied and each pair of nets lies in the wedge $W_\pm$ for large values of the indices. (Here $\hat e_1$ denotes the unit vector in the $x_1$-direction.) In this way, the braiding structure arising from asymptotic abelianness of intertwiners coincides with the braiding computed for a Haag-dual net whose morphisms are all inner, and it gives rise to a true braided tensor $C^{*}$-category in all other cases, e.g. the present one (where the statistics operator now has a genuine, not simply formal, meaning of statistics). In other terms, we have excluded all cases in which intertwiners satisfy asymptotic abelianness, but the limits in Theorem 3 do not give rise to a statistics operator. The braiding induced by these bi-asymptopias is really non symmetric, since


bi-asymptopias $\{\mathcal{U}, \mathcal{V}\}$, $\{\mathcal{V}, \mathcal{U}\}$ are not path connected. In (3+1) dimensions all particles exhibit ordinary statistics since we can choose $\mathcal{O}_n^{V} = -\mathcal{O}_n^{U}$ when we deal with strictly localized morphisms. This ensures the possibility of changing continuously from $\{\mathcal{U}, \mathcal{V}\}$ to $\{\mathcal{V}, \mathcal{U}\}$ along a chain of double cones. In (2+1) dimensions, where cones give the better notion of localization, one can choose $\rho_a$ in such a way that $a$ tends to space-like infinity remaining in a space-like cone $C$, resp. $-C$, for $\mathcal{U}_\rho$, $\mathcal{V}_\rho$, and it is always possible to interchange the two cones by a sequence of allowed moves. This is not possible in (1+1) dimensions, since $\mathcal{O}'$ is not connected and thus, a fortiori, is not arcwise connected. We remark that a distinction between $\varepsilon(\rho, \sigma)$ and $\varepsilon(\sigma, \rho)^{*}$ for a generic pair of DHR morphisms cannot be achieved by interchanging the roles of $\mathcal{U}$ and $\mathcal{V}$, since both nets of double cones are in the same connected component of $\mathcal{O}'$. Hence, in two dimensions to each morphism (object) we must associate two bi-asymptopias, one for each side of $\mathcal{O}$. A direct computation of the limits in Theorem 3 for each connected component separately gives two distinct values, $e^{\pm 2\pi i n\lambda}$, which coincide with those found before.

Remark. Although two arbitrary automorphisms of the form $\rho_W$ are always connected by a similarity transformation (i.e., $\rho_{W_2} = \mathrm{Ad}\,\Gamma(W_2 W_1^{*}) \circ \rho_{W_1}$), this does not imply that they are unitarily equivalent through a local element of the observable algebra, since the unitary intertwiner is not necessarily in $\mathcal{A}$. More precisely, if $(n_1, \lambda_1) \neq (n_2, \lambda_2)$, then $\Gamma(W_1 W_2^{*}) \notin \mathcal{A}$ as a consequence of the previous observation and of group relations for unitaries $W(\,\cdot\,, \cdot\,; \varepsilon)$, i.e. $W(n, \lambda) W(n', \lambda') = W(n + n', \lambda + \lambda')$, $W(n, \lambda)^{*} = W(-n, -\lambda)$. On the other hand, products $W_x W^{*}$ have a different behaviour, since

\[ W(n, \lambda; \varepsilon)_x\, W(n, \lambda; \varepsilon)^{*} = W(n, \lambda; \tau_x \varepsilon - \varepsilon) \tag{25} \]

and, as already stated, $\Gamma(W_x W^{*}) \in \mathcal{A}(\tilde{\mathcal{O}})$, where $\tilde{\mathcal{O}} \supset \mathcal{O} \cup \mathcal{O}_x$. We emphasize that the notation in (25) may give rise to ambiguities, since the operator on the right-hand side carries no charge even if $n \neq 0$, due to the particular form of the generating function.

Conclusions

In the setting of AQFT we have shown that a family of localized and transportable automorphisms of the observable algebra $\mathcal{A}$ exhibits non-ordinary statistics. Inside each sector one has different braiding structures labelled by a solitonic parameter $\lambda$ which reflects the action of smeared-out kink operators carrying no charge. Owing to non-locality of charge implementers, statistics is not an invariant of the sector, as already known in some two-dimensional particle theories or in solitonic theories. On the underlying ordinary structure, smeared-out kink operators give rise to a continuous family of braided tensor categories in the sense of the theory of bi-asymptopias. The results are consistent with AQFT, which must be handled carefully here when tackling problems arising from the non-locality of unitary implementers, the violation of Haag duality and the topological peculiarity of (1+1)-dimensional space-time. Owing to the latter, some results of local field theory are no longer valid in a two-dimensional world, giving rise to a range of intermediate situations and strengthening the concept that for massive theories in (1+1) dimensions statistics is not an intrinsic characteristic of sectors a priori [23]. In the present case, since Haag duality can be overcome by peculiarities of the model, strangeness of statistics has its origin in the fact that implementers do not lie in the field algebra. The interpretation of the braiding structure of this model extends to the CAR algebra the constructive method exploited for the Weyl algebra. Since not all the braidings


obtained in this way give rise to a notion of statistics compatible with the DHR analysis, but only those constructed from pairs of sets of nets which tend to the same space-like infinity, the method of bi-asymptopias can be carried over to (1+1)-dimensional space-time only if we add a compatibility condition. This kind of selection criterion reflects the “initial condition” which determines uniquely the statistics operator in the standard algebraic approach, i.e. trivialization of $\varepsilon(\rho_W, \rho_{\tilde W})$ for $\rho_W \prec \rho_{\tilde W}$ [13]. The particles described by this model are “statistical schizons”, since the same sector allows “pseudo” statistical descriptions and they exist in the same Hilbert space either as bosons or as fermions or as proper anyons [22], i.e. in two-dimensional massive theories not only the spin but also the statistics is a convention.

Acknowledgement. I am greatly indebted to the supervisor of my PhD thesis, S. Doplicher, for many helpful discussions and much encouragement. I would like to thank C. D’Antoni, J. Roberts and M. Gabriel for a careful reading of the manuscript, and B. Schroer for useful correspondence. It is a pleasure to thank A. Silva, who was the coordinator of the PhD program at the Department of Mathematics of “La Sapienza”, University of Rome, in the period when this work was carried out.

References

1. Adler, C.: Braid group statistics in two-dimensional quantum field theory. Rev. Math. Phys. 7, 907–924 (1996)
2. Buchholz, D., Doplicher, S., Morchio, G., Roberts, J.E., Strocchi, F.: Asymptotic abelianness and braided tensor C*-categories. http://arxiv.org/list/math-ph/0209038, 2002
3. Buchholz, D., Lechner, G.: Modular nuclearity and localization. Annales Henri Poincare 5, 1065–1080 (2004)
4. Baumgärtel, H., Jurke, M., Lledó, F.: Twisted duality of the CAR-algebra. J. Math. Phys. 43, 4158–4179 (2002)
5. Carey, A.L., Ruijsenaars, S.N.M.: On fermionic gauge groups, current algebras and Kac-Moody algebras. Acta Appl. Math. 10, 1–86 (1987)
6. Carey, A.L., Hurst, C.A., O’Brien, D.M.: Automorphisms of the canonical anticommutation relation and index theory. J. Funct. Anal. 48, 360–393 (1982)
7. Doplicher, S., Haag, R., Roberts, J.E.: Fields, observables and gauge transformations I. Commun. Math. Phys. 1, 1–23 (1969)
8. Doplicher, S., Haag, R., Roberts, J.E.: Fields, observables and gauge transformations II. Commun. Math. Phys. 15, 173–200 (1969)
9. Doplicher, S., Haag, R., Roberts, J.E.: Local observables and particle statistics I. Commun. Math. Phys. 23, 199–230 (1971)
10. Doplicher, S., Haag, R., Roberts, J.E.: Local observables and particle statistics II. Commun. Math. Phys. 35, 49–85 (1974)
11. Doplicher, S., Roberts, J.E.: C*-algebras and duality for compact groups: why there is a compact group of internal symmetries in particle physics. Proceedings of the International Conference on Mathematical Physics, Marseille (1986), Singapore: World Scientific, 1987
12. Doplicher, S., Roberts, J.E.: Why there is a field algebra with a compact gauge group describing the superselection structure in particle physics. Commun. Math. Phys. 131, 51–107 (1990)
13. Fredenhagen, K., Rehren, K.-H., Schroer, B.: Superselection sectors with braid group statistics and exchange algebras II: geometric aspects and conformal covariance. Rev. Math. Phys., Special Issue, 113–157 (1992)
14. Köberle, R., Marino, E.C.: Duality, mass spectrum and vacuum expectation values. Phys. Lett. B126, 475–480 (1983)
15. Mueger, M.: Superselection structure of massive quantum field theories in 1+1 dimensions. Rev. Math. Phys. 10, 1147–1170 (1998)
16. Mueger, M.: Quantum double actions on operator algebras and orbifold quantum field theories. Commun. Math. Phys. 181, 137–181 (1998)
17. Mund, J.: No-go theorem for ‘free’ relativistic anyons in d=2+1. Lett. Math. Phys. 43, 319–328 (1998)
18. Pressley, A., Segal, G.: Loop groups. Oxford: Clarendon Press, 1986
19. Roberts, J.E.: Lectures on algebraic quantum field theory. In: The algebraic theory of superselection sectors: Introduction and recent results. Singapore: World Scientific, 1990
20. Ruijsenaars, S.N.M.: The Wightman axioms for the fermionic Federbush model. Commun. Math. Phys. 87, 181–228 (1982)
21. Ruijsenaars, S.N.M.: Index formulas for generalized Wiener-Hopf operators and boson-fermion correspondence in 2N dimensions. Commun. Math. Phys. 124, 553–593 (1989)
22. Schroer, B.: Scattering properties of anyons and plektons. Nucl. Phys. B369, 478–498 (1992)
23. Schroer, B.: Two-dimensional models as testing ground for principles and logarithmic structures. Ann. Phys. 321, 435–479 (2006)
24. Schroer, B., Swieca, J.A.: Spin and statistics of quantum kinks. Nucl. Phys. B121, 505–513 (1977)
25. Wilczek, F.: Quantum mechanics of fractional spin particles. Phys. Rev. Lett. 49, 957–1149 (1983)

Communicated by Y. Kawahigashi

Commun. Math. Phys. 269, 493–532 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0144-8

Communications in

Mathematical Physics

A Variational Principle for KPP Front Speeds in Temporally Random Shear Flows

James Nolen¹, Jack Xin²

¹ Department of Mathematics, University of Texas at Austin, Austin, TX 78712, USA. E-mail: [email protected]
² Department of Mathematics, University of California at Irvine, Irvine, CA 92697, USA. E-mail: [email protected]

Received: 3 November 2005 / Accepted: 7 July 2006
Published online: 4 November 2006 – © Springer-Verlag 2006

Abstract: We establish the variational principle of Kolmogorov-Petrovsky-Piskunov (KPP) front speeds in temporally random shear flows with sufficiently decaying correlations. A key quantity in the variational principle is the almost sure Lyapunov exponent of a heat operator with random potential. To prove the variational principle, we use the comparison principle of solutions, the path integral representation of solutions, and large deviation estimates of the associated stochastic flows. The variational principle then allows us to analytically bound the front speeds. The speed bounds imply the linear growth law in the regime of large root mean square shear amplitude at any fixed temporal correlation length, and the sublinear growth law if the temporal decorrelation is also large enough, the so-called bending phenomenon.

1. Introduction

Reaction-diffusion front propagation in strongly time dependent random media arises in premixed flame propagation problems ([9, 26, 35, 36, 42, 43] and references), interacting particle systems ([28, 10] and references) and population biology ([37] and references). A fundamental issue is to characterize, bound and compute the large time front speed, an upscaled quantity that depends on statistics of the random media in a highly nontrivial manner. In combustion literature, ad hoc and formal procedures abound for approximation, such as closures and renormalization group methods [35, 43]. In this paper, we establish a variational principle for the propagation speeds of KPP reaction-diffusion fronts through temporally random shear flows. The variational characterization then allows us to estimate and compute the statistical properties of front speeds with both accuracy and ease. Related computational results will be presented separately [34]. The model equation is:

\[ u_t = \frac{1}{2}\Delta_z u + B \cdot \nabla_z u + f(u), \tag{1} \]


where $u = u(z, t)$, $z = (x, y) \in \mathbb{R} \times \mathbb{R}^{n-1}$, $n \ge 2$; $B = (b(y, t), 0, \dots, 0)$, and $b(y, t)$ is a stationary Gaussian process in $t$, with a deterministic profile in $y$, to be made more precise later. The nonlinear function $f(u) \in C^1([0, 1])$ is a KPP nonlinearity: $f(u) > 0$ for $u \in (0, 1)$, $f(0) = f(1) = 0$, $f'(0) = \sup_{u \in (0,1)} f(u)/u$. An example is $f(u) = u(1 - u)$. For compactly supported initial data bounded between 0 and 1, solutions of (1) develop into propagating fronts separating the domain into a region where $u \approx 1$ and the rest where $u \approx 0$, which correspond to burned (hot) and unburned (cold) states in combustion.

In case $B$ is periodic in $z$ and $t$, KPP type front dynamics and speeds have been recently studied for both shear and more general incompressible flows [22, 25, 26, 30, 32, 29]. Exact traveling front solutions exist [30, 32, 29], extending those in spatially periodic media [5, 7, 40, 39]; see also [4] and [41] for reviews. For temporally random shear flows, it is more efficient to study front solutions asymptotically without constructing exact traveling fronts. This line of work goes back to Gärtner and Freidlin [18, 16], where variational principles of KPP front speeds in spatially periodic media are obtained by combining large deviation techniques and Feynman-Kac representation formulas of KPP solutions. See also [16] and references therein for related results on KPP fronts through one dimensional spatial random media. We shall further develop this approach to treat the temporally random shear flows, which generate more complexities in path integrals and unbounded variations in time.

Let us make precise our assumptions on the shear field. The function $b(y, t) = b(y, t, \hat\omega)$ is a mean zero Gaussian random field over $(y, t)$, periodic in $y$ with period $L$ for each fixed $t$, and stationary in $t$ for each fixed $y$. The field $b$ is defined over probability space $(\hat\Omega, \hat{\mathcal{F}}, Q)$ and has covariance function $\Gamma(y_1, y_2, t_1, t_2) = E_Q[b(y_1, t_1) b(y_2, t_2)]$. The following assumptions hold on $b(y, t)$:

A1 (Periodicity in $y$). Let $C^{0,1}_P(D)$ denote the space of Lipschitz continuous functions that are periodic on the period cell $D = [0, L]^{n-1}$. For each $\hat\omega \in \hat\Omega$, there is a continuous map $J(\cdot, \hat\omega) : [0, +\infty) \to C^{0,1}_P(D)$ such that $b(\cdot, t, \hat\omega) = J(t, \hat\omega)$.

A2 (Stationarity in $t$). For each $s \in \mathbb{R}^+$ there is a measure preserving transformation $\tau_s : \hat\Omega \to \hat\Omega$ such that $b(y, \cdot + s, \hat\omega) = b(y, \cdot, \tau_s \hat\omega)$. Hence, $\Gamma$ depends only on $y_1$, $y_2$ and $|t - s|$.

A3 (Ergodicity). The transformation $\tau_s$ is ergodic: if a set $A \in \hat{\mathcal{F}}$ is invariant under the transformation $\tau_s$, then either $Q(A) = 0$ or $Q(A) = 1$.

A4. The field $b$ is mean zero, almost surely continuous in $(y, t)$, and has uniformly bounded variance:

\[ E[b(y, t)] = 0, \qquad E[b(y, t)^2] \le \sigma^2 \quad \text{for all } y \in D,\ t \ge 0. \tag{2} \]

A5 (Decay of Temporal Correlations). The function $\hat\Gamma(r) = \sup_{y_1, y_2} \Gamma(y_1, y_2, 0, r)$ is integrable over $[0, \infty)$:

\[ \int_0^\infty \hat\Gamma(r)\, dr = p_1 < \infty \tag{3} \]

for some finite constant $p_1 > 0$. This constant will appear later in estimates of the front speed.

A6. There is a finite constant $p_2 > 0$ such that

\[ |\Gamma(y_1, y_2, s, t) - \Gamma(y_1, y_3, s, t)| \le p_2\, |y_3 - y_2|\, \hat\Gamma(|s - t|). \]
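To make the constant $p_1$ in Assumption A5 concrete, the following sketch evaluates the integral in (3) for one hypothetical choice of the covariance bound, $\hat\Gamma(r) = \sigma^2 e^{-r/\ell}$ (the form one obtains, for instance, when the temporal factor is a stationary Ornstein-Uhlenbeck process). The parameters `sigma` and `ell`, the truncation `r_max`, and the function name are illustrative assumptions and do not come from the paper. For this choice the integral is $p_1 = \sigma^2 \ell$, which the numerical value should approximate.

```python
import math


def p1_exponential(sigma: float, ell: float, r_max: float = 60.0, n: int = 200_000) -> float:
    """Trapezoid-rule approximation of p_1 = int_0^inf Gamma_hat(r) dr.

    Assumption (not from the paper): Gamma_hat(r) = sigma^2 * exp(-r / ell).
    The integral is truncated at r_max, where the integrand is negligible.
    """
    h = r_max / n
    # End-point contributions of the trapezoid rule.
    total = 0.5 * (sigma**2 + sigma**2 * math.exp(-r_max / ell))
    # Interior nodes.
    for k in range(1, n):
        total += sigma**2 * math.exp(-k * h / ell)
    return total * h


sigma, ell = 1.5, 0.7            # hypothetical amplitude and correlation length
p1_numeric = p1_exponential(sigma, ell)
p1_exact = sigma**2 * ell        # closed form for the exponential covariance
print(p1_numeric, p1_exact)
```

In this example a shorter correlation length `ell` gives a smaller $p_1$, which is the quantity entering the later Gaussian estimates.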


For example, a field satisfying Assumptions A1-A6 might have the form $b(y, t, \hat\omega) = \sum_{j=1}^{N} b_1^j(y)\, b_2^j(t, \hat\omega)$, where the functions $b_1^j(y)$ are deterministic, Lipschitz continuous and periodic over $D$, and the functions $b_2^j(t, \hat\omega)$ are mean zero, stationary Gaussian fields in $t$.

Before stating the main results, let us define the family of Markov processes associated with the linear part of the operator in (1). For a fixed $\hat\omega \in \hat\Omega$ and for each $z \in \mathbb{R}^n$, $t \ge 0$, let $Z^{z,t}(s) = (X^{z,t}(s), Y^{z,t}(s)) \in \mathbb{R}^n$ solve the Itô equation:

\[ dZ^{z,t}(s) = B(Z^{z,t}(s), t - s)\, ds + dW(s), \qquad s \in [0, t], \tag{4} \]

with initial condition $Z^{z,t}(0) = z = (x, y) \in \mathbb{R}^n$, where $W(s) = (W_1(s), W_2(s)) \in \mathbb{R}^n$ is the $n$-dimensional Wiener process with $W(0) = 0$. Because of the shear structure of $B$, we therefore have

\[ X^{z,t}(s) = x + \int_0^s b(y + W_2(\tau), t - \tau)\, d\tau + W_1(s), \qquad Y^{z,t}(s) = y + W_2(s). \tag{5} \]

Let $P^{z,t}$ denote the corresponding family of measures on $C([0, t]; \mathbb{R}^n)$. As we will see, the KPP front speed depends on large deviations of the random variable

\[ \eta_z^t(\kappa t) = \frac{z - Z^{z,t}(\kappa t)}{\kappa t}, \tag{6} \]

which is the average velocity of a trajectory over time interval $[0, \kappa t]$. The need for the parameter $\kappa$ results from the time dependence of the field $b(y, t)$ and will become more apparent later.

Now we state the main results. First, the following proposition allows us to characterize the speed of propagation:

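Proposition 1. Assume that A1-A6 hold for the process $b(y, t)$. There is a set $\hat\Omega_0 \subset \hat\Omega$ such that $Q(\hat\Omega_0) = 1$ and for any $\hat\omega \in \hat\Omega_0$ and any $\lambda \in \mathbb{R}^n$, the limit

\[ \mu(\lambda, z) = \mu(\lambda) = \lim_{t \to \infty} \frac{1}{t} \log E\left[ e^{-\lambda \cdot (Z^{z,t}(t) - z)} \right] \tag{7} \]

exists uniformly over $z \in \mathbb{R}^n$ and locally uniformly over $\lambda \in \mathbb{R}^n$. The limit $\mu(\lambda)$ is a finite constant for all $\hat\omega \in \hat\Omega_0$ and independent of $z \in \mathbb{R}^n$. Moreover, $\mu(\lambda) \ge 0$, and $\mu(\lambda)$ is both convex and super-linear: $\mu(\lambda)/|\lambda| \to +\infty$ as $|\lambda| \to \infty$.

If we let $S(c)$ be the Legendre transform of $\mu(\lambda)$,

\[ S(c) = \sup_{\lambda \in \mathbb{R}^n} \left[ c \cdot \lambda - \mu(\lambda) \right], \tag{8} \]

then we find that the speed of propagation can be bounded above in terms of $S$.

Theorem 1 (Upper bound on front speed). Let $b(y, t, \hat\omega)$ satisfy Assumptions A1-A6. Let $u(z, t, \hat\omega)$ solve (1) with initial condition $u(z, 0, \hat\omega) = u_0(z)$, where $u_0(z) \in [0, 1]$ has compact support and is independent of $\hat\omega$. Then, for any closed set $F \subset \{c \in \mathbb{R}^n \mid S(c) - f'(0) > 0\}$,

\[ \lim_{t \to \infty} \sup_{c \in F} u(ct, t, \hat\omega) = 0 \]

uniformly in $c \in F$, for almost every $\hat\omega \in \hat\Omega$.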
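As a sanity check on (7) and (8), consider the degenerate case $B \equiv 0$, which is an illustrative assumption and not the setting of the theorems. Then $Z^{z,t}(t) - z$ is a Wiener process, $E[e^{-\lambda \cdot W(t)}] = e^{|\lambda|^2 t/2}$, so $\mu(\lambda) = |\lambda|^2/2$ and $S(c) = |c|^2/2$. The sketch below approximates the Legendre transform on a one-dimensional grid (the grid bounds and step are arbitrary choices) and reports, for $f(u) = u(1-u)$ with $f'(0) = 1$, whether a speed $c$ lies in the set $\{S(c) > f'(0)\}$ of Theorem 1.

```python
def legendre_transform(mu, c, lam_grid):
    """Grid approximation of S(c) = sup_lam [c * lam - mu(lam)] in one dimension."""
    return max(c * lam - mu(lam) for lam in lam_grid)


def mu_no_shear(lam):
    # Assumption: B = 0, so Z - z is a Wiener process and mu(lam) = lam^2 / 2.
    return 0.5 * lam * lam


lam_grid = [k * 1e-3 for k in range(-10_000, 10_001)]  # lam in [-10, 10]
f_prime_0 = 1.0  # f(u) = u(1 - u) gives f'(0) = 1

for c in (0.5, 1.3, 1.5, 2.0):
    S = legendre_transform(mu_no_shear, c, lam_grid)
    region = "inside" if S > f_prime_0 else "outside"
    print(f"c = {c}: S(c) ~ {S:.6f}, c^2/2 = {0.5 * c * c:.6f}, {region} the set of Theorem 1")
```

In this no-shear case the boundary $S(c) = f'(0)$ sits at $|c| = \sqrt{2 f'(0)}$, the classical KPP speed for diffusion coefficient $1/2$.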


Furthermore, the speed of propagation can be bounded below in terms of the function $S$:

Theorem 2 (Lower bound on front speed). Let $b(y, t, \hat\omega)$ satisfy Assumptions A1-A6. Let $u(z, t, \hat\omega)$ solve (1) with initial condition $u(z, 0, \hat\omega) = u_0(z)$, where $u_0(z) \in [0, 1]$ has compact support and is independent of $\hat\omega$. Then, for any compact set $K \subset \{c \in \mathbb{R}^n \mid S(c) - f'(0) < 0\}$,

\[ \lim_{t \to \infty} \inf_{c \in K} u(ct, t, \hat\omega) = 1 \tag{9} \]

uniformly in $c \in K$, for almost every $\hat\omega \in \hat\Omega$.

Therefore, if for each unit vector $e \in \mathbb{R}^n$ we define the constant $c^* = c^*(e) > 0$ by the variational formula:

\[ c^*(e) = \inf_{\lambda \cdot e > 0} \frac{\mu(\lambda) + f'(0)}{\lambda \cdot e}, \tag{10} \]

we see from the definition of $S$ that the front spreads asymptotically with speed equal to $c^*(e)$ in the direction of the vector $e$. Although the solution $u$ depends on $\hat\omega \in \hat\Omega$ since $B$ is a random variable over $\hat\Omega$, the function $S(c)$ and the speeds $c^*(e)$ are independent of $\hat\omega$. They are almost surely constant with respect to $Q$, a consequence of the ergodicity Assumption A3. Hence, we will refer to the constant $c^*(e)$ as the front speed in the direction $e$. We will frequently suppress the dependence of $u$ on $\hat\omega$ for clarity of notation. If the initial data is front-like and aligned with the shear ($u_0(z) = \chi_{\{x<0\}}(x, y)$), then this variational formula reduces to the one derived by Berestycki and Nirenberg [7] for waves traveling in a cylinder with a time-independent, spatially periodic flow.

Theorems 1 and 2 extend our recent results on KPP front speeds in temporally periodic incompressible flows [32, 29] and classical results of Gärtner and Freidlin (see [18] and Theorem 7.3.1, p. 494 of [16]) where they treated the case of spatially periodic advecting flows. Our proofs are built on those, with additional ingredients to handle both the time-dependence and the stochastic nature of the field $B$. For example, in the periodic case, $\mu(\lambda)$ is the principal eigenvalue of a periodic-parabolic operator [32, 29], and perturbation theory [21] implies that $\mu(\lambda)$ is differentiable in $\lambda$. It then follows from Theorem 7.1.1 and Theorem 7.1.2 of [16] that the random variables $\eta_z^t(t)$ satisfy a large deviation principle with convex rate function $S(c)$ given by (8). However, if $\mu(\lambda)$ is not known to be differentiable, the large deviation property needs a new proof. In the present case, $\mu(\lambda)$ is not an eigenvalue of a linear operator, so we cannot readily apply the perturbation theory [21] to get differentiability. Instead, we will show that a rate function exists and is convex, and that it continues to satisfy (8). In fact, $\mu(\lambda)$ is related to the almost sure principal Lyapunov exponent of a heat operator with random potential [42], known as the parabolic Anderson problem ([8, 12] and references). Dynamical aspects of principal Lyapunov exponents as an extension of principal eigenvalues are recently studied in [27]. Regularity of $\mu(\lambda)$ is an interesting problem in itself.

The paper is organized as follows. In Sect. 2, we prove some important technical bounds on the process $Z^{z,t}(s)$. In Sect. 3, we prove Proposition 1 which defines the principal Lyapunov exponent $\mu(\lambda)$. In Sect. 4, we prove Theorem 1, the upper bound on the front speed. In Sect. 5, we adapt the method of [16] to prove Theorem 2, the lower bound on the front speed. A technical estimate (Lemma 7) and a large deviations estimate on the process $Z^{z,t}(s)$ are needed here; these are proven in Sects. 6 and 7. We


will make frequent use of the subadditive ergodic theorem and the Borell inequality for Gaussian fields [1, 2, 23]. In Sect. 8, we use the variational formula (10) to derive analytical estimates on the front speed $c^*$ based on properties of the Lyapunov exponent $\mu$. In particular, we demonstrate that if the shear field is $\delta \sum_{j=1}^{N} b_1^j(y)\, b_2^j(t, \hat\omega)$, the front speed $c^*$ grows linearly with large $\delta$. On the other hand, if the shear field is $\delta \sum_{j=1}^{N} b_1^j(y)\, b_2^j(\delta t, \hat\omega)$ with the temporal correlation length decreasing accordingly, the speed enhancement is no faster than $O(\sqrt{\delta})$. This is analogous to the decrease of front speeds with increasing frequency of temporally oscillating periodic shear flows [22, 30, 32]. The reduction of speed enhancement due to rapid temporal decorrelation is known as the bending phenomenon in combustion literature [3, 13, 22]; here we obtain a rigorous proof in the stochastic setting. We note that linear (large $\delta$) and quadratic (small $\delta$) speed growth laws are known for deterministic flow patterns with channel structures ([4–6, 11, 19, 24, 38] and references), also for spatially random shears inside infinite cylinders [33, 31] or white in time Gaussian shears in the entire space [42]. Our variational bounds show consistent results for temporally random shears with sufficiently decaying correlations. The speed growth laws known to date are not sensitive to the form of the nonlinearity as long as fronts propagate out of the initial data. In this sense, KPP plays the role of a solvable model and KPP front speeds carry universal properties, similar to Burgers equation for conservation laws [14].
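The variational formula (10) is easy to evaluate once $\mu$ is known. As a minimal sketch, take again the hypothetical no-shear case $\mu(\lambda) = \lambda^2/2$ in one direction (an assumption used only for illustration; in the random shear setting $\mu$ is not available in closed form). Minimizing $(\lambda^2/2 + f'(0))/\lambda$ over $\lambda > 0$ gives $\lambda = \sqrt{2 f'(0)}$ and $c^* = \sqrt{2 f'(0)}$, which the grid search below should reproduce for $f'(0) = 1$.

```python
import math


def front_speed(mu, f_prime_0, lam_grid):
    """Grid approximation of c* = inf_{lam > 0} (mu(lam) + f'(0)) / lam."""
    return min((mu(lam) + f_prime_0) / lam for lam in lam_grid)


def mu_no_shear(lam):
    # Assumption: no advection, so mu(lam) = lam^2 / 2.
    return 0.5 * lam * lam


lam_grid = [k * 1e-4 for k in range(1, 100_001)]  # lam in (0, 10]
f_prime_0 = 1.0                                   # f(u) = u(1 - u)

c_star = front_speed(mu_no_shear, f_prime_0, lam_grid)
print(c_star, math.sqrt(2.0 * f_prime_0))
```

Any upper bound on $\mu$ inserted into `front_speed` yields an upper bound on $c^*$, which is the way the variational formula is used to derive speed estimates.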
Though the arguments in our proofs rely on the periodicity of $b(y, t)$ in $y$ to provide compactness in the $y$ dimensions, they can be easily modified to solve the same problem in an infinite cylinder with the zero Neumann boundary condition on the sides of the cylinder, a case considered originally by Berestycki and Nirenberg [7] for time-independent, spatially-periodic shear flow. In the infinite cylinder, the compactness property remains, and the process $Z^{z,t}(s)$ just needs to be reflected when it hits the boundary $\mathbb{R} \times \partial\Omega$. It is also not necessary for the process $b(y, t)$ to be Gaussian. Our proofs of the bounds in Sects. 1, 6, and 7 shall rely on the powerful Borell inequality for Gaussian processes, yet it is easy to see that if the estimates of Sects. 1, 6, and 7 hold for a given process, the main results extend.

2. Estimates on $Z^{z,t}(s)$

In this section we derive some technical estimates on the process $Z^{z,t}(s)$ which follow from our structural assumptions on the field $B$ and the Borell inequality for Gaussian fields. Let us first note that by changing variables $r = s - t$, $v = s + t$, it is easy to see from our assumptions on the field $B$ that

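\[ \int_0^T \int_0^T \sup_{y_1, y_2} \Gamma(y_1, y_2, s, t)\, ds\, dt \le 2 \int_0^{\sqrt{2}\,T} \int_0^{T/\sqrt{2}} \hat\Gamma(r)\, dr\, dv \le 2\sqrt{2}\, p_1 T \tag{11} \]

and for $H \in [0, T]$,

\[ \int_0^T \int_H^T \sup_{y_1, y_2} \Gamma(y_1, y_2, s, t)\, ds\, dt \le \sqrt{2}\, |T - H| \int_0^{T/\sqrt{2}} \hat\Gamma(r)\, dr \le \sqrt{2}\, |T - H|\, p_1. \tag{12} \]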
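The bound (11) can be checked numerically in a simple case. The sketch below assumes, purely for illustration, that $\sup_{y_1,y_2}\Gamma(y_1, y_2, s, t) = \sigma^2 e^{-|s-t|/\ell}$, so that $p_1 = \sigma^2 \ell$; the parameter values and the midpoint grid size are arbitrary. For this covariance the double integral has the closed form $2\sigma^2\ell\,(T - \ell(1 - e^{-T/\ell}))$, which is at most $2 p_1 T$ and hence below the right-hand side $2\sqrt{2}\, p_1 T$ of (11).

```python
import math


def double_integral(sigma: float, ell: float, T: float, n: int = 400) -> float:
    """Midpoint-rule value of int_0^T int_0^T sigma^2 exp(-|s - t| / ell) ds dt.

    Assumption (not from the paper): the supremum of the covariance over
    (y1, y2) equals sigma^2 * exp(-|s - t| / ell).
    """
    h = T / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        for j in range(n):
            t = (j + 0.5) * h
            total += sigma**2 * math.exp(-abs(s - t) / ell)
    return total * h * h


sigma, ell, T = 1.0, 0.7, 5.0      # hypothetical parameters
p1 = sigma**2 * ell                # integral of Gamma_hat over [0, infinity)

lhs = double_integral(sigma, ell, T)
closed_form = 2 * sigma**2 * ell * (T - ell * (1 - math.exp(-T / ell)))
rhs = 2 * math.sqrt(2) * p1 * T    # right-hand side of (11)
print(lhs, closed_form, rhs)
```

The linear growth in $T$ visible here is what later yields the variance bound $\sigma_t^2 \le 2\sqrt{2}\, p_1 t$.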


Let $\rho(s) \in C([0, +\infty), \mathbb{R}^{n-1})$ with $\rho(0) = 0$ be fixed. For $y \in D$, define $\rho^y(s) = y + \rho(s)$. For fixed $t > 0$, the integral

\[ f(y, s) = \int_0^s b(\rho^y(\tau), t - \tau)\, d\tau \]

is a Gaussian random field over $M = D \times [0, t]$, with respect to the measure $Q$. The Borell inequality for Gaussian fields states that if $\|f\| = \sup_{(y,s) \in M} f(y, s)$ is almost surely finite, then $E_Q[\|f\|] < \infty$ and for any $u > 0$,

\[ Q\left( \|f\| - E[\|f\|] > u \right) \le e^{-u^2/(2\sigma_t^2)}, \tag{13} \]

where $\sigma_t^2 = \sup_{(y,s) \in M} E_Q[f^2]$ (see [1]). By (11), $\sigma_t^2 \le 2\sqrt{2}\, p_1 t$. So, using inequality (13), we can control deviations of $\|f\|$, if we bound the growth of $E[\|f\|]$.

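Lemma 1. There is a finite constant $C > 0$ such that

\[ E_Q[\|f\|] \le C t^{1/2}. \tag{14} \]

Proof of Lemma 1. The expectation $E[\|f\|]$ can be bounded by the metric entropy relation

\[ E[\|f\|] \le C \int_0^{\delta} \left( \log N(\epsilon) \right)^{1/2} d\epsilon, \]

where $\delta = \mathrm{diam}(M)/2$ in the metric

\[ d((x, s), (y, z)) = \left( E\left[ (f(x, s) - f(y, z))^2 \right] \right)^{1/2} \]

and $N(\epsilon)$ is the minimum number of $\epsilon$-balls required to cover $M$ (see [1]). Using (11) and (12), a straightforward computation shows that

\[ E\left[ (f(x, s) - f(y, s))^2 \right] \le C\, |x - y|\, t \]

and

\[ E\left[ (f(y, s) - f(y, z))^2 \right] \le C\, |s - z| \]

for some finite constant $C$, independent of $\rho$. Therefore,

\[ d((x, s), (y, z)) \le C_1 |s - z|^{1/2} + C_2 \left( |x - y|\, t \right)^{1/2}, \]

and there are constants $C_3$, $C_4$ independent of $t$ and $\epsilon$ such that $d((x, s), (y, z)) \le \epsilon$ whenever $|s - z| \le C_3 \epsilon^2$ and $|x - y| \le C_4 \epsilon^2 / t$. For $\epsilon \in (0, \mathrm{diam}(M)/2]$, we have the bound

\[ N(\epsilon) \le \max\left( C_5 \frac{t^2}{\epsilon^4},\, 1 \right) \]

and

\[ E[\|f\|] \le C \int_0^{C_5^{1/4} t^{1/2}} \left( \log\left( C_5 \frac{t^2}{\epsilon^4} \right) \right)^{1/2} d\epsilon = C_6\, t^{1/2} \int_0^1 \left( \log\left( \frac{1}{\epsilon^4} \right) \right)^{1/2} d\epsilon \le C_7\, t^{1/2}. \tag{15} \]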
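The last step in (15) uses only that the rescaled entropy integral is finite. With the substitution $\epsilon = e^{-x}$ one finds $\int_0^1 (\log(1/\epsilon^4))^{1/2}\, d\epsilon = 2\int_0^\infty \sqrt{x}\, e^{-x}\, dx = 2\,\Gamma(3/2) = \sqrt{\pi}$, where $\Gamma$ here is the Euler gamma function, not the covariance. The sketch below confirms this value by a midpoint rule; the truncation `x_max` and the number of nodes are arbitrary numerical choices.

```python
import math


def entropy_integral(x_max: float = 50.0, n: int = 200_000) -> float:
    """Midpoint rule for 2 * int_0^inf sqrt(x) * exp(-x) dx.

    This equals int_0^1 sqrt(log(1 / eps^4)) d(eps) after substituting
    eps = exp(-x); the tail beyond x_max is negligible.
    """
    h = x_max / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += math.sqrt(x) * math.exp(-x)
    return 2.0 * h * total


value = entropy_integral()
print(value, math.sqrt(math.pi), 2.0 * math.gamma(1.5))
```

Since the integral is a fixed number, all of the $t$ dependence in (15) sits in the prefactor $t^{1/2}$, as stated in (14).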


Note that the constants depend on the assumed properties of the process $b$ and the size of the domain $D$, but not on the particular function $\rho(s)$. If $u \ge 2 C_7 t^{1/2}$, then by (13),

\[ Q\left( \|f\| > u \right) \le e^{-(u - E\|f\|)^2/(2\sigma_t^2)} \le e^{-u^2/(8\sigma_t^2)} \le e^{-u^2/(32 p_1 t)}. \]

It now follows that

Lemma 2. For any $\eta > 0$ and for $t \ge t_0 = t_0(\eta) = (2C_7/\eta)^2$,

\[ Q\left( \sup_{y \in D,\ s \in [0, t]} \int_0^s b(y + \rho(\tau), t - \tau)\, d\tau > \eta t \right) \le e^{-\eta^2 t/(32 p_1)} \]

for any $\rho \in C([0, \infty), \mathbb{R}^{n-1})$, $\rho(0) = 0$.

In applying this lemma, the continuous function $\rho$ will be a realization of the Wiener process $W_2(s) \in \mathbb{R}^{n-1}$.

Lemma 3. For $\eta > 0$, $z \in \mathbb{R}^n$, define the Markov time

\[ \tau_{\eta,z}(t) = \min\{ s \ge 0 \mid |X^{z,t}(s) - x| \ge \eta t \} \]

with $\tau_{\eta,z}(t) = +\infty$ if the set on the right is empty. Then there are constants $K_1$, $K_2$ such that

\[ Q\left( P\left( \inf_{z \in \mathbb{R}^n} \tau_{\eta,z}(t) \le t \right) > e^{-K_2 \eta^2 t/2} \right) \le K_1 e^{-K_2 \eta^2 t/2} \]

for all $t > 0$.

Proof of Lemma 3. Note that for the Wiener process $W_1(s)$ with $W_1(0) = 0$,

\[ P\left( \sup_{s \in [0, t]} |W_1(s)| \ge \eta t \right) \le 2\sqrt{\frac{2}{\pi}} \int_{\eta\sqrt{t}}^{\infty} e^{-x^2/2}\, dx \le K_1 e^{-\eta^2 t/2}. \tag{16} \]
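The second inequality in (16) can be made explicit. Since $\int_a^\infty e^{-x^2/2}\, dx = \sqrt{\pi/2}\ \mathrm{erfc}(a/\sqrt{2})$, the middle term of (16) equals $2\,\mathrm{erfc}(\eta\sqrt{t/2})$, and the standard bound $\mathrm{erfc}(x) \le e^{-x^2}$ for $x \ge 0$ shows that $K_1 = 2$ is one admissible constant. The sketch below checks this for a few sample values; the function name and the values of $\eta$ and $t$ are illustrative choices.

```python
import math


def reflection_bound(eta: float, t: float) -> float:
    """Middle term of (16): 2 * sqrt(2/pi) * int_{eta sqrt(t)}^inf exp(-x^2/2) dx,
    written in closed form as 2 * erfc(eta * sqrt(t) / sqrt(2))."""
    a = eta * math.sqrt(t)
    return 2.0 * math.erfc(a / math.sqrt(2.0))


K1 = 2.0  # one admissible constant, from erfc(x) <= exp(-x^2) for x >= 0
for eta in (0.1, 0.5, 1.0, 2.0):
    for t in (0.5, 1.0, 10.0, 50.0):
        middle = reflection_bound(eta, t)
        upper = K1 * math.exp(-eta**2 * t / 2.0)
        print(f"eta = {eta}, t = {t}: {middle:.3e} <= {upper:.3e}")
```

The same closed form with $\eta$ replaced by $\eta/2$ gives the term $K_1 e^{-\eta^2 t/8}$ that appears when the event is split into two halves.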

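The point of the lemma is that at large times and almost surely with respect to $Q$, the process $X^{z,t}(s)$ also behaves like a Wiener process in the sense that a bound like (16) holds. By definition of $\tau_{\eta,z}(t)$,
\[ P\Bigl( \inf_{z\in\mathbb{R}^n} \tau_{\eta,z}(t) \le t \Bigr) = P\Bigl( \sup_{s\in[0,t],\, z\in\mathbb{R}^n} |f_t(y,s) + W_1(s)| \ge \eta t \Bigr). \]
Using Tchebyshev's inequality, (16), and Lemma 2 we see that for any $\eta > 0$, $\alpha > 0$:
\[
\begin{aligned}
Q\Bigl( P\Bigl( \inf_{z\in\mathbb{R}^n} \tau_{\eta,z}(t) \le t \Bigr) > \alpha \Bigr)
&\le \alpha^{-1} E_Q\Bigl[ P\Bigl( \sup_{s\in[0,t],\, z\in\mathbb{R}^n} |f_t(y,s) + W_1(s)| \ge \eta t \Bigr) \Bigr] \\
&= \alpha^{-1} E_P\Bigl[ Q\Bigl( \sup_{s\in[0,t],\, z\in\mathbb{R}^n} |f_t(y,s) + W_1(s)| \ge \eta t \Bigr) \Bigr] \\
&\le \alpha^{-1} E_P\Bigl[ Q\Bigl( \sup_{s\in[0,t],\, z\in\mathbb{R}^n} |f_t(y,s)| \ge \eta t/2 \Bigr) \Bigr]
 + \alpha^{-1} P\Bigl( \sup_{s\in[0,t],\, z\in\mathbb{R}^n} |W_1(s)| \ge \eta t/2 \Bigr) \\
&\le \alpha^{-1} \bigl( 2e^{-\eta^2 t/32 p_1} + K_1 e^{-\eta^2 t/8} \bigr) \le \alpha^{-1} K_1 e^{-K_2 \eta^2 t}
\end{aligned}
\]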


J. Nolen, J. Xin

for $t$ sufficiently large, for some constants $K_1, K_2 > 0$. The result now follows from a choice of $\alpha = e^{-K_2\eta^2 t/2}$.

Lemma 4. There are constants $K_1, K_2 > 0$ such that, except on a set of $Q$-measure zero,
\[ \sup_{z\in\mathbb{R}^n} P\bigl( \tau_{\eta,z}(t) \le t \bigr) \le K_1 e^{-K_2\eta^2 t} \tag{17} \]
for $t$ sufficiently large depending on $\hat\omega$ and $\eta$.

Proof of Lemma 4. Lemma 3 and the Borel-Cantelli lemma imply that outside a set of $Q$-measure zero
\[ P\Bigl( \inf_{z\in\mathbb{R}^n} \tau_{\eta,z}(k) \le k \Bigr) \le e^{-K_2\eta^2 k/2} \tag{18} \]
if $k \in \mathbb{Z}$ is sufficiently large. Now we want to extend this to all real $t$ sufficiently large. Let $t \in [k, k+1]$, $t = k + \tau$, $\tau \in [0,1]$,
\[
\begin{aligned}
\sup_{z\in\mathbb{R}^n} P\Bigl( \sup_{s\in[0,t]} |X^{z,t}(s) - x_0| \ge t\eta \Bigr)
&\le \sup_{z\in\mathbb{R}^n} P\Bigl( \sup_{s\in[0,\tau]} |X^{z,t}(s) - x_0| \ge t\eta/2 \Bigr) \\
&\quad + \sup_{z\in\mathbb{R}^n} P\Bigl( \sup_{s\in[\tau,t]} |X^{z,t}(s) - X^{z,t}(\tau)| \ge t\eta/2 \Bigr).
\end{aligned}
\]

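By the Markov property, this is bounded by
\[
\begin{aligned}
&\le \sup_{z\in\mathbb{R}^n} P\Bigl( \sup_{s\in[0,\tau]} |X^{z,t}(s) - x_0| \ge t\eta/2 \Bigr)
 + \sup_{\bar z\in\mathbb{R}^n} P\Bigl( \sup_{s\in[0,k]} |X^{\bar z,k}(s) - \bar x_0| \ge t\eta/2 \Bigr) \\
&\le P\Bigl( \sup_{z\in\mathbb{R}^n,\, t\in[k,k+1],\, s\in[0,1]} |X^{z,t}(s) - x_0| \ge k\eta/2 \Bigr)
 + P\Bigl( \sup_{\bar z\in\mathbb{R}^n,\, s\in[0,k]} |X^{\bar z,k}(s) - \bar x_0| \ge k\eta/2 \Bigr).
\end{aligned}
\tag{19}
\]
By (18), the second term on the right side of (19) is bounded ($Q$-a.s.) by
\[ P\Bigl( \sup_{\bar z\in\mathbb{R}^n,\, s\in[0,k]} |X^{\bar z,k}(s) - \bar x_0| \ge k\eta/2 \Bigr) \le e^{-K_3\eta^2 k} \tag{20} \]
for $k \in \mathbb{Z}$ sufficiently large. To bound the other term in (19), it suffices to show that
\[ P\Bigl( \sup_{y\in D,\, r\in[0,1],\, s\in[0,1]} |f_k(y,s,r)| \ge k\eta/2 \Bigr) \le e^{-K_4\eta^2 k} \tag{21} \]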

for $k \in \mathbb{Z}$ sufficiently large, where
\[ f_k(y,s,r) = \int_0^s b(W_2^y(\tau), r + k - \tau)\, d\tau. \]
Note that $f_k(y,s,r)$ is a centered Gaussian field (with respect to $Q$) over $D \times [0,1] \times [0,1]$, and its distribution is invariant with respect to $k > 0$, due to the stationarity of $b(y,t)$. For any fixed path $W_2^y(\omega)$, the Borell inequality implies that for $k$ sufficiently large
\[ Q\Bigl( \sup_{y\in D,\, r\in[0,1],\, s\in[0,1]} |f_k(y,s,r)| \ge k\eta/2 \Bigr) \le K_5 e^{-K_6\eta^2 k^2} \]
for some constants $K_5, K_6 > 0$, independent of $k$ and the realization $W_2^y(\omega)$. Therefore, proceeding as in the proof of Lemma 2, we see that
\[ Q\Bigl( P\Bigl( \sup_{y\in D,\, r\in[0,1],\, s\in[0,1]} |f_k(y,s,r)| \ge k\eta/2 \Bigr) \ge e^{-K_6\eta^2 k^2/2} \Bigr) \le K_7 e^{-K_6\eta^2 k^2/2}. \]
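The Borel-Cantelli step that converts this estimate into (21) can be spelled out as follows. This is a sketch in which the events $E_k$ and the index $k_0(\hat\omega)$ are names introduced here for illustration only.

```latex
% E_k: the set of realizations for which the inner P-probability is not yet small.
\[
E_k = \Bigl\{ \hat\omega :\
  P\Bigl( \sup_{y\in D,\ r,s\in[0,1]} |f_k(y,s,r)| \ge k\eta/2 \Bigr)
  \ge e^{-K_6 \eta^2 k^2/2} \Bigr\},
\qquad
Q(E_k) \le K_7\, e^{-K_6 \eta^2 k^2/2}.
\]
% The bounds are summable, because k^2 \ge k for k \ge 1 gives a geometric series:
\[
\sum_{k\ge 1} Q(E_k)
  \le K_7 \sum_{k\ge 1} e^{-K_6 \eta^2 k/2} < \infty .
\]
% Borel-Cantelli: Q-almost surely only finitely many E_k occur, so for k \ge k_0(\hat\omega)
\[
P\Bigl( \sup_{y\in D,\ r,s\in[0,1]} |f_k(y,s,r)| \ge k\eta/2 \Bigr)
  < e^{-K_6 \eta^2 k^2/2}
  \le e^{-K_4 \eta^2 k}
\quad\text{whenever } k \ge 2K_4/K_6 .
\]
```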

Now (21) follows from the Borel-Cantelli lemma. We complete the proof by combining (20) and (21). The next estimate gives a coarse bound on very large excursions of the random process X z,t , the first component of the process Z z,t : Lemma 5. There are constants K 1 , K 2 > 0 independent of κ ∈ (0, 1] such that, except on a set of Q-measure zero,

z,t 2 sup P sup X (s) − x ≥ ηt | W 0 ∈ ≤ K 1 e−K 2 η t/κ (22) z∈Rn

2

s∈[0,κt]

for any open set ⊂ Rn , for any κ ∈ (0, 1], η > 0, and for t sufficiently large depending on ω, ˆ κ, and η. In particular, the lemma holds with = Rn . Using the fact that the Y z,t component of the process Z z,t is a Wiener process, the lemma implies the following corollary: Corollary 1. There are constants K 1 , K 2 > 0 independent of κ ∈ (0, 1] such that, except on a set of Q-measure zero,

\[ \sup_{z\in\mathbb{R}^n} P\Bigl( \sup_{s\in[0,\kappa t]} |Z^{z,t}(s) - z| \ge \eta t \Bigr) \le K_1 e^{-K_2\eta^2 t/\kappa} \tag{23} \]

for any κ ∈ (0, 1], η > 0, and for t sufficiently large depending on ω, ˆ κ, and η. Proof of Lemma 5. Lemma 4 encompasses the special case that κ = 1 and = Rn . For κ < 1, modify the preceding bounds for the field s f (y, s) = b(ρ y (τ ), t − τ ) dτ 0


considered over Mκ = D × [0, κt]. Now we have σt2 = sup(x,s)∈M E Q [ f 2 ] ≤ p1 κt, so √ we find that E[ f ] ≤ C κt for some constant C > 0. Then, just as in Lemma 3, we have

z,t 2

X (s) − x ≥ ηt > e−K 2 η t/2κ ≤ K 1 e−K 2 η2 t/2κ , Q P sup s∈[0,κt],z∈Rn

and for = Rn , the rest follows as in the proof of Lemma 3. To bound the more general conditional probability in Lemma 5 (with = Rn ), observe that whenever P(W20 ∈ ) > 0,

z,t 2 t/2κ 0 −K η

X (s) − x ≥ ηt |W ∈ > e 2 Q P sup 2

s∈[0,κt],z∈Rn

=Q

P

sup

s∈[0,κt],z∈Rn

z,t

X (s) − x ≥ ηt, W 0 ∈ > P W 0 ∈ e−K 2 η2 t/2κ 2 2

2

z,t e K 2 η t/2κ 0

X (s) − x ≥ ηt, W ∈ EQ P sup ≤ 0 2 P W2 ∈ s∈[0,κt],z∈Rn 2

z,t e K 2 η t/2κ

E P χW 0 ∈ Q . (24) X (s) − x ≥ ηt ≤ 0 sup 2 P W2 ∈ s∈[0,κt],z∈Rn

By Lemma 2, the probability Q(sups∈[0,κt],z∈Rn X z,t (s) − x ≥ ηt) is bounded independently of the realization of W20 , so the right-hand side of (24) is bounded by 2

z,t e K 2 η t/2κ

E P χW 0 ∈ Q X (s) − x ≥ ηt sup 2 P W20 ∈ s∈[0,κt],z∈Rn e K 2 η t/2κ 2 2 0 E P χW 0 ∈ K 1 e−K 2 η t/κ = K 1 e−K 2 η t/2κ . 2 P W2 ∈ 2

≤

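Then the rest follows as in Lemma 3.

3. The Lyapunov Exponent μ(λ)

In this section we prove Proposition 1. We study the limit
\[ \mu(\lambda, z) = \lim_{t\to\infty} \frac{1}{t} \log E\bigl[ e^{-\lambda\cdot(Z^{z,t}(t) - z)} \bigr]. \tag{25} \]
Notice that this is equivalent to the limit
\[ \mu(\lambda, z) = \frac{|\lambda|^2}{2} + \lim_{t\to\infty} \frac{1}{t} \log \varphi(z,t), \tag{26} \]
where $\varphi(z,t) > 0$ solves the auxiliary initial value problem
\[ \varphi_t = \tfrac{1}{2}\Delta_z \varphi + (B - \lambda)\cdot\nabla\varphi - \lambda\cdot B(z,t)\,\varphi, \qquad \varphi(z,0) \equiv 1. \tag{27} \]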


To see this, use the Feynman-Kac representation to express $\varphi$ as
\[ \varphi = \tilde E^z\Bigl[ e^{-\int_0^t \lambda\cdot B(\tilde Z^{z,t}(s),\, t-s)\, ds} \Bigr], \]
where $\tilde Z$ solves
\[ d\tilde Z^{z,t}(s) = \bigl( B(\tilde Z^{z,t}(s), t-s) - \lambda \bigr)\, ds + dW(s). \]
This induces a measure $\tilde P^{z,t}$ that is absolutely continuous with respect to $P^{z,t}$. The Girsanov theorem [20] implies that
\[ \frac{d\tilde P}{dP} = e^{-\int_0^t \lambda\cdot dW(s) - \frac{1}{2}|\lambda|^2 t}. \]
Hence
\[ \varphi = \tilde E^z\Bigl[ e^{-\int_0^t \lambda\cdot B(\tilde Z^{z,t}(s),\, t-s)\, ds} \Bigr] = e^{-\frac{|\lambda|^2}{2} t}\, E^z\bigl[ e^{-\lambda\cdot(Z^{z,t}(t) - z)} \bigr], \]
establishing (26). Without the drift term, Eq. (27) is called the parabolic Anderson problem (see [8] and [12]). We will denote by $\rho(\lambda)$ the limit $\lim_{t\to\infty} \frac{1}{t}\log\varphi(z,t)$. Therefore, $\mu(\lambda)$ exists independent of $z$ if and only if $\rho(\lambda)$ exists independent of $z$.

The proof that $\mu(\lambda)$ exists almost surely with respect to $Q$, independent of $z$, relies on the sub-additive ergodic theorem and a Harnack-type estimate based on techniques in [12], provided we assume the necessary decay of temporal correlation of the process $B(y,t)$. Following [12], we define for any continuous path $W \in C([0,t], \mathbb{R}^n)$ the exponential
\[ \xi(t, W) = e^{-\int_0^t \lambda_1 b(W_2(s) + z,\, t-s)\, ds - \lambda_1 W_1(t) - \lambda_2 W_2(t)}, \]
which is the exponential term $e^{-\lambda\cdot(Z^{z,t}(t) - z)}$ for a fixed realization of the Wiener process. For any fixed path $W$, $\xi(t,W)$ is lognormal with mean
\[ E_Q[\xi(t,W)] = e^{\frac{\lambda^2\hat\sigma^2}{2}}\, e^{-\lambda_1 W_1(t) - \lambda_2 W_2(t)}, \tag{28} \]
where
\[ \hat\sigma^2 = \int_0^t\!\!\int_0^t \Gamma(X_s, X_r, s, r)\, ds\, dr \le 2\sqrt{2}\, p_1 t, \tag{29} \]

by (11). Note that $\hat\sigma^2$ is bounded independently of the particular path $W$. For $0 \le s < t$, define the random variables
\[ q_z(\lambda,s,t) = E^{z,t}\bigl[ e^{-\lambda\cdot(Z^{z,t}(t-s) - z)} \bigr], \qquad q_I(\lambda,s,t) = \inf_{z\in D} q_z(\lambda,s,t), \qquad q_S(\lambda,s,t) = \sup_{z\in D} q_z(\lambda,s,t). \]
Using the sub-additive ergodic theorem, we will show that the limits
\[ \lim_{t\to\infty} \frac{1}{t} \log q_I(\lambda,0,t) = \mu_I(\lambda) \tag{30} \]


and
\[ \lim_{t\to\infty} \frac{1}{t} \log q_S(\lambda,0,t) = \mu_S(\lambda) \tag{31} \]
exist and are finite, almost surely with respect to $Q$. Then we will show $\mu_I(\lambda) = \mu_S(\lambda)$, and therefore $\mu(\lambda) = \mu_I(\lambda) = \mu_S(\lambda)$ is well-defined, independently of $z$. By the Markov property of the Wiener process we have for any $s < r < t$:
\[
\begin{aligned}
q_I(\lambda,s,t) &= \inf_z E^{z,t}\Bigl[ e^{-\lambda\cdot(Z^{z,t}(t-r) - z)}\, E\bigl[ e^{-\lambda\cdot(Z^{a,r}(r-s) - a)} \,\big|\, Z^{z,t}(t-r) = a \bigr] \Bigr] \\
&\ge \inf_z E^{z,t}\bigl[ e^{-\lambda\cdot(Z^{z,t}(t-r) - z)} \bigr]\ \inf_a E^{a,r}\bigl[ e^{-\lambda\cdot(Z^{a,r}(r-s) - a)} \bigr] \\
&= q_I(\lambda,s,r)\, q_I(\lambda,r,t).
\end{aligned}
\]
Therefore, $\log(q_I(\lambda,s,t))$ is super-additive:
\[ \log(q_I(\lambda,s,t)) \ge \log(q_I(\lambda,s,r)) + \log(q_I(\lambda,r,t)) \]
for any $0 \le s < r < t$. Similarly, the function $\log(q_S(\lambda,s,t))$ is sub-additive. By the stationarity of $B$, $\tau_r \log(q_I(\lambda,s,t)) = \log(q_I(\lambda,s+r,t+r))$ for any $r \ge 0$. In order to apply the ergodic theorem, we must show that $\log q_I$ and $\log q_S$ are integrable. First,
\[
\begin{aligned}
E_Q[\log(q_I(\lambda,s,t))] &\le \log E_Q[q_I(\lambda,s,t)] \le \log \inf_z E_P E_Q\bigl[ e^{-\lambda\cdot(Z^{z,t}(t-s) - z)} \bigr] \\
&= \log \inf_z E_P\Bigl[ e^{\frac{\lambda^2\hat\sigma^2}{2}}\, e^{-\lambda_1 W_1(t-s) - \lambda_2 W_2(t-s)} \Bigr] \\
&\le \frac{|\lambda|^2 |t-s|}{2} + \log e^{\frac{\lambda^2\sigma^2}{2}} \\
&\le \frac{|\lambda|^2 |t-s|}{2} + \sqrt{2}\,\lambda^2 p_1 |t-s|.
\end{aligned}
\tag{32}
\]

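By Jensen's Inequality,
\[
\begin{aligned}
E_Q \log(q_I(\lambda,s,t)) &\ge E_Q \log\Bigl( E_P\, e^{\inf_z -\lambda\cdot Z^{z,t}(t-s)} \Bigr) \\
&\ge E_P E_Q\Bigl[ \inf_z\, -\lambda\cdot(Z^{z,t}(t-s) - z) \Bigr] \\
&= \frac{|\lambda|^2 |t-s|}{2} + E_P E_Q\Bigl[ \inf_z\, -\int_0^t \lambda_1 b(W_2(s) + z, t-s)\, ds \Bigr].
\end{aligned}
\]
This last term is finite, by the Borell inequality. Note also that if $M(\hat\omega) = \sup_{t\in[0,1],\, z\in D} |B(z,t)|$, then
\[
\begin{aligned}
\sup_{s,t\in[0,1]} |\log q_I(\lambda,s,t)| &\le |\lambda_1| M(\hat\omega) + \sup_{s,t\in[0,1]} \log E\bigl[ e^{-\lambda_1 W_1(t-s) - \lambda_2 W_2(t-s)} \bigr] \\
&\le |\lambda_1| M(\hat\omega) + \frac{|\lambda|^2}{2},
\end{aligned}
\]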


and the latter is integrable with respect to $Q$. It now follows from the sub-additive ergodic theorem (Theorem 2.5 of [2]) and the continuity of $q(\lambda,0,t)$ with respect to $t$ that the limit (30) exists almost surely and is finite:
\[ \lim_{t\to\infty} \frac{\log(q_I(\lambda,0,t))}{t} = \sup_t \frac{E_Q \log(q_I(\lambda,0,t))}{t} = \mu_I. \tag{33} \]
Also, by (32), $\mu_I \le \frac{|\lambda|^2}{2} + \sqrt{2}\,|\lambda|^2 p_1$. Because $b$ is ergodic with respect to translation in $t$, $\mu_I(\lambda)$ is constant on a set of full measure (with respect to $Q$).

Now we show that $(1/t)\log(q_S(\lambda,0,t))$ is integrable. A lower bound on the expectation follows from Jensen's inequality:
\[ E_Q \log q_S(\lambda,s,t) \ge \sup_z E_Q E^{z,t}\bigl[ -\lambda\cdot(Z^{z,t}(t-s) - z) \bigr] = \sup_z E^{z,t} E_Q\bigl[ -\lambda\cdot(Z^{z,t}(t-s) - z) \bigr] = 0. \tag{34} \]

Now we derive an upper bound. The Borell inequality and Theorem 3.2 of [1] (p. 63, let $\alpha = 1$) imply that there is a finite constant $K_0 > 0$ such that
\[ E_Q\Bigl[ e^{\sup_z -\int_0^{t_i - t_j} \lambda_1 b(W_2(\tau) + z,\, t_i - \tau)\, d\tau} \Bigr] < K_0 < \infty \tag{35} \]
if $\hat\sigma^2 < \frac{1}{2}$, where $\hat\sigma^2$ is the variance of the integral
\[ -\int_0^{t_i - t_j} \lambda_1 b(W_2(\tau) + z, t_i - \tau)\, d\tau \]
with respect to $Q$. This variance is small when $|t_i - t_j|$ is small. Thus by (29), there is a constant $K_1 > 0$ such that (35) holds when $|t_i - t_j| \le K_1$. Now for any $s < t$, let $N$ be the smallest integer greater than $|t-s|/K_1$ and $s = t_0 < t_1 < t_2 < \cdots < t_N = t$ with $|t_{i+1} - t_i| = \Delta t = |t-s|/N \le K_1$ for all $i = 0, \ldots, N-1$. Jensen's inequality implies that
\[
\begin{aligned}
E_Q \log(q_S(\lambda, t_i, t_{i+1})) &\le \log E_P\Bigl[ e^{-\lambda\cdot W(t_{i+1} - t_i)}\, E_Q\bigl[ e^{\sup_z -\lambda_1 \int_0^{t_{i+1} - t_i} b(W_2(s) + z,\, t-s)\, ds} \bigr] \Bigr] \\
&\le \log E_P\bigl[ e^{-\lambda\cdot W(t_{i+1} - t_i)} K_0 \bigr] = \frac{|\lambda|^2 (t_{i+1} - t_i)}{2} + \log K_0.
\end{aligned}
\tag{36}
\]

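Combining this with the subadditivity of $\log(q_S(\lambda,s,t))$, we derive the upper bound
\[
\begin{aligned}
E_Q \log(q_S(\lambda,s,t)) &\le \sum_{i=0}^{N-1} E_Q \log(q_S(\lambda,t_i,t_{i+1})) \\
&\le \frac{|\lambda|^2 (t-s)}{2} + N \log K_0 \\
&\le \frac{|\lambda|^2 (t-s)}{2} + \Bigl( \frac{|t-s|}{K_1} + 1 \Bigr) \log K_0.
\end{aligned}
\]
The last inequality follows from our definition of $N$. Moreover,
\[ E_P\bigl[ e^{-|\lambda| \|B\| |t-s| - \lambda\cdot W(t-s)} \bigr] \le q_S(\lambda,s,t) \le E_P\bigl[ e^{|\lambda| \|B\| |t-s| - \lambda\cdot W(t-s)} \bigr], \]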


so that
\[ \sup_{s,t\in[0,K_1]} |\log q_S(\lambda,s,t)| \le K_1 \Bigl( |\lambda| \|B\| + \frac{\lambda^2}{2} \Bigr), \tag{37} \]
where $\|B\|$ denotes $\sup_{t\in[0,K_1],\, z\in D} |B(z,t)|$. The right side of (37) is integrable. So, we can apply the sub-additive ergodic theorem to conclude that the limit
\[ \lim_{t\to\infty} \frac{\log(q_S(\lambda,0,t))}{t} = \inf_t \frac{E_Q \log(q_S(\lambda,0,t))}{t} = \mu_S, \tag{38} \]

holds almost surely with $\mu_S$ a constant, $\mu_S \in [0,\infty)$. The convergence along continuous time follows from (37), the continuity of $q_S(\lambda,0,t)$, and Theorem 2.5 of [2]. As with $\mu_I$, $\mu_S$ is constant on a set of full measure, because of the ergodicity of $b$ with respect to translation in $t$.

Clearly $\mu_I \le \mu_S$. To show that $\mu_I = \mu_S$, we will need a kind of Harnack inequality to compare the quantities $q_I(\lambda,0,t)$ and $q_S(\lambda,0,t)$. Such a result has been obtained in [12] in the case that $b(y,t)$ is Gaussian in both space and time, with a white-noise temporal dependence. Under Assumptions A1-A6, however, the arguments of [12] imply that the following estimate also holds in the present case.

Theorem 3 (Cranston and Mountford [12]). For any fixed $M > 0$, there are positive constants $c_1$, $c_2$ such that outside an event of $Q$-probability $e^{-\frac{1}{4} n^{5/6}}$, one has
\[ q_I(\lambda,0,n) \ge c_1 e^{-c_2 n^{11/12}}\, q_S(\lambda,0,n) - e^{-\frac{1}{4} n^{7/6}}. \]
From this result it follows immediately that
\[ \lim_{n\to\infty} \frac{1}{n} \log E\bigl[ e^{-\lambda\cdot(Z^{z,n}(n) - z)} \bigr] = \mu_I(\lambda) = \mu_S(\lambda), \]
uniformly in $z$. By (33) and (38), we see that this extends to continuous time
\[ \lim_{t\to\infty} \frac{1}{t} \log E\bigl[ e^{-\lambda\cdot(Z^{z,t}(t) - z)} \bigr] = \mu_I(\lambda) = \mu_S(\lambda) = \mu(\lambda). \tag{39} \]

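We have now shown that for each $\lambda \in \mathbb{R}^n$, $\mu(\lambda)$ is well-defined, independent of $z \in \mathbb{R}^n$, almost surely with respect to $Q$. This means that for each $\lambda \in \mathbb{R}^n$, there is a set $\hat\Omega_\lambda \subset \hat\Omega$ such that $Q(\hat\Omega_\lambda) = 1$ and (39) holds for all $\hat\omega \in \hat\Omega_\lambda$. We claim that the set
\[ \hat\Omega_0 = \bigcap_{\lambda\in\mathbb{R}^n} \hat\Omega_\lambda \tag{40} \]
has full measure: $Q(\hat\Omega_0) = 1$. It is clear that the set $\hat\Omega_0' = \bigcap_{\lambda\in\mathbb{Q}^n} \hat\Omega_\lambda$ has full measure, since $\mathbb{Q}^n$ is a countable set.

Lemma 6. $\hat\Omega_0 = \hat\Omega_0'$. So, $Q(\hat\Omega_0) = 1$.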


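Proof. Clearly $\hat\Omega_0 \subset \hat\Omega_0'$. For $\lambda \in \mathbb{R}^n$, $t > 0$, $z \in D$, we define the quantities
\[
\begin{aligned}
\mu_z(\lambda,t) &= \frac{1}{t} \log E^{z,t}\bigl[ e^{-\lambda\cdot Z(t)} \bigr] = \frac{1}{t} \log q_z(\lambda,0,t), \\
\mu^+(\lambda,t) &= \inf_{z\in D} \mu_z(\lambda,t), \\
\mu^-(\lambda,t) &= \sup_{z\in D} \mu_z(\lambda,t).
\end{aligned}
\]
So, for all $\lambda \in \mathbb{Q}^n$ and $\hat\omega \in \hat\Omega_0'$, $\mu^+(\lambda,t) \to \mu(\lambda)$ and $\mu^-(\lambda,t) \to \mu(\lambda)$ as $t \to \infty$. We claim that $\mu_z(\lambda,t)$ and $\mu^+(\lambda,t)$ are convex in $\lambda$, for each $t > 0$. Let $r \in [0,1]$, $\lambda_1, \lambda_2 \in \mathbb{R}^n$. By Hölder's inequality,
\[ E\bigl[ e^{-r\lambda_1\cdot Z^{z,t}(t) - (1-r)\lambda_2\cdot Z^{z,t}(t)} \bigr] \le E\bigl[ e^{-\lambda_1\cdot Z^{z,t}(t)} \bigr]^r\, E\bigl[ e^{-\lambda_2\cdot Z^{z,t}(t)} \bigr]^{1-r}. \tag{41} \]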

Upon taking a logarithm and dividing by t, this inequality implies that μz (r λ1 + (1 − r )λ2 , t) ≤ r μz (λ1 , t) + (1 − r )μz (λ2 , t). Hence, μz (λ, t) is convex. Since μ+ (λ, t) is a supremum of convex functions (μz ), it must also be convex. It follows that the function μ(λ) is continuous in λ, since for each λ ∈ 0 , μ(λ) is the finite, pointwise limit of continuous convex functions μ+ (λ, t) (pointwise for λ ∈ Qn , a dense subset of Rn ). Moreover, μ(λ) must be uniformly continuous on compact sets. ˆ fixed, μ+ (λ, t) → μ(λ) and μ− (λ, t) → μ(λ) Next, we claim that for each ωˆ ∈ 0 n locally uniformly in λ ∈ R . Let δ ⊂ Rn be a closed ball of radius δ. For > 0, let −k k ∈ Z be large enough so that |μ(λ1 ) − μ(λ2 )| < whenever

+ |λ1 − λ2 | ≤ n2 and

λ1 , λ2 ∈ 2δ . Then, let t0 > 0 be large enough so that μ (λ, t) − μ(λ) ≤ for all λ ∈ 2δ ∩ 2−k Zn and t ≥ t0 . Such a t0 exists since 2δ ∩ 2−k Zn is a finite subset of ⊂ Qn . For k sufficiently large, depending on δ, any λ ∈ δ can be expressed as a convex combination of points in the set {λ j } j = 2δ ∩!2−k Zn : λ= wjλj (42) j

such that w j ≥ 0, j w j = 1. Moreover, we can require that w j = 0 if λ j − λ ≥ n2−k . Therefore, for t sufficiently large, w j λ j , t) μ+ (λ, t) = μ+ (

≤

j

≤+

j

w j μ+ (λ j , t) (by convexity)

w j μ(λ j ) (by choice of t0 )

j

≤+

w j ( + μ(λ)) = 2 + μ(λ).

j

This implies that lim sup μ+ (λ, t) − μ(λ) ≤ 0.

t→∞ λ∈

δ

(43)


Now suppose that for some > 0, there are sequences z k → z 0 ∈ D, tk → ∞, and λk → λ0 ∈ δ such that μz k (λk , tk ) < μ(λ0 ) − , for k = 1, 2, 3, . . . . Then for any λ ∈ Qn , μz k (λ , tk ) − μz k (λk , tk ) μz k (λ , tk ) − μ(λ0 ) + ≥ . |λk − λ | |λk − λ |

(44)

Since λ ∈ Qn , μz k (λ , tk ) → μ(λ ) as k → ∞.

Therefore, by choosing λ sufficiently z

k close to λ0 , we can make μ (λ , tk ) − μ(λ0 ) < /2 for k sufficiently large, since μ(λ) is continuous in λ. Hence, the right-hand side of (44) can be made arbitrarily large. That is, the slopes of the secant lines through the points (λ , μz k (λ , tk )) and (λk , μz k (λk , tk )) can be made arbitrarily large for k large, since λk → λ0 . However, because the functions μz k (λ, tk ) are convex in λ and bounded above by μ+ (λ, tk ), this contradicts (43). Hence,

lim inf μ(λ) − μ− (λ, t) ≤ 0.

(45)

t→∞ λ∈δ

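Equations (43) and (45) imply the claim that $\mu^+(\lambda,t) \to \mu(\lambda)$ and $\mu^-(\lambda,t) \to \mu(\lambda)$ locally uniformly in $\lambda \in \mathbb{R}^n$ for each $\hat\omega \in \hat\Omega_0'$. Therefore, $\hat\Omega_0' \subset \hat\Omega_0$.

To complete the proof of Proposition 1, we now show that $\mu(\lambda)$ is super-linear in $\lambda$. Clearly $\mu(0) = f(0)$. Let $\lambda = (\lambda_1, 0)$, $\lambda_1 \in \mathbb{R}$, so that the nonzero component of $\lambda$ is in the $x$-direction. Then $\varphi$ in (27) can be chosen to depend only on the $y$ variable: $\varphi = \varphi(y,t)$. Thus, the problem (27) reduces to
\[ \varphi_t = \tfrac{1}{2}\Delta_y \varphi - \lambda_1 b(y,t)\,\varphi, \qquad \varphi(y,0) \equiv 1. \tag{46} \]
Since $b(y,t)$ has the same distribution as $-b(y,t)$, we conclude that $\rho(-(\lambda_1,0)) = \rho((\lambda_1,0))$ for all $\lambda_1 \in \mathbb{R}$. Hence $\rho((\lambda_1,0))$ and $\mu((\lambda_1,0))$ are even functions of $\lambda_1$. Using the Feynman-Kac representation for $\rho((\lambda_1,0))$ as in (41), we find that $\rho((\lambda_1,0))$ is convex in $\lambda_1$. Since $\rho(0) = 0$, we conclude that
\[ \rho((\lambda_1,0)) \ge 0 \quad\text{and}\quad \mu((\lambda_1,0)) \ge \lambda_1^2/2, \qquad \forall \lambda_1 \in \mathbb{R}. \tag{47} \]
If we choose $\lambda = (0,\lambda_2)$, $\lambda_2 \in \mathbb{R}$, so that the nonzero component of $\lambda$ is in the $y$-direction, then in (27) $\varphi$ can be chosen to be constant, $\varphi \equiv 1$, for all time. Hence
\[ \rho((0,\lambda_2)) = 0 \quad\text{and}\quad \mu((0,\lambda_2)) = \lambda_2^2/2, \qquad \forall \lambda_2 \in \mathbb{R}. \tag{48} \]
Combining (47), (48), and the convexity of $\mu(\lambda)$, we conclude that $\mu(\lambda)$ is super-linear. This completes the proof of Proposition 1.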


4. Proof of Theorem 1 The proof of Theorem 1 is based on the assumption that f (u) ≤ f (0)u. This allows us to construct a super-solution to Eq. (1) as follows. Consider the solution to the auxiliary initial value problem t =

1 z + (B − λ) · ∇ + (|λ|2 /2 + f (0) − λ · B(z, t)), (z, 0) ≡ 1, 2

where = (z, t) > 0 is periodic in z. As shown at the beginning of Sect. 3, we can express μ(λ) in terms of : μ(λ) = − f (0) + lim

t→∞

1 log (z, t). t

(49)

Now suppose S(c) − f (0) > 0. Then for > 0 sufficiently small, there exists λ > 0 such that λ · c > μ(λ) + f (0) + 2. By Proposition 1, μ(λ) is finite, and there is a function R = R(z, t) such that |R| → 0 as t → ∞, uniformly in z, and (y, t) = eμ(λ)t+ f

(0)t+R(z,t)t

.

(50)

If δ > 0 is sufficiently small, then λ · c > μ(λ) + f (0) + , whenever c − c < δ. Then for any α > 1, we also have λ · αc > μ(λ) + f (0) + . If we define the function ψ(z, t) = e−λ·z (z, t), then ψ solves the equation 1 z ψ + B · ∇ψ + f (0)ψ, 2 ψ(z, 0) = e−λ·z , ψt =

(51)

and ψ is a super solution to the original nonlinear equation (1), since f (u) ≤ f (0)u. Combining (50) and (51), we see that

lim sup ψ(αc t, t) = lim sup e−λ·αc t (αc t, t) t→∞

t→∞

≤ lim sup e−t+R(αc t,t)t t→∞

= 0, since |R(z, t)| < for t sufficiently large. By definition of R(z, t), the limit is uniform in α and δ, for α ≥ 1 and δ small. After multiplying ψ by a constant, if necessary, the maximum principle implies that u(z, t) ≤ ψ(z, t) for all z and t. The function u is therefore trapped below ψ which moves with velocity c . Now we piece together a collection of such super solutions. If F is bounded, then it is compact, since it is closed. From the above analysis, we see that we can pick finite sets {c j } ⊂ F and {λ j } such that F ⊂ j Uδ j (c j ) ⊂ {S(c) − f (0) > 0} and λ j · c > μ(λ j ) + whenever c ∈ Uδ j (c j ). If we define ψ j according to (51) with λ = λ j , and set ˆ ψ(z, t) = inf ψ j (z, t), j

we see that lim

sup

t→∞ c∈F,α>1

u(αct, t) ≤ lim

sup

t→∞ c∈F,α>1

ˆ ψ(αct, t) = 0.


Since $\{S(c) - f'(0) \le 0\}$ is bounded, the general result follows from the fact that the limit is uniform in $\alpha > 1$. This completes the proof of Theorem 1.

Note that the function $R(z,t) = R(z,t,\hat\omega)$ depends on the realization $\hat\omega \in \hat\Omega$. However, such a function exists almost surely with respect to $Q$, according to Proposition 1.

5. Proof of Theorem 2

Proving Theorem 2 requires more estimates on the random solutions $u(z,t)$ and the processes $Z^{z,t}(s)$. The following estimate is a lower bound analogous to Lemma 7.3.3 of [16]. It estimates the exponential decay rate of $u$ in terms of the function $S$:

Lemma 7. For any compact set $K \subset \{c \in \mathbb{R}^n \mid S(c) - f'(0) > 0\}$,
\[ \liminf_{t\to\infty} \frac{1}{t} \log \inf_{c\in K} u(ct,t) \ge -\max_{c\in K} (S(c) - f'(0)). \tag{52} \]
This lemma and the estimates of Sects. 2 and 3 represent the main technical difficulty in extending the work of [16] to the present case with a stochastic time dependence in $b(y,t)$. For the moment, however, we delay the proof of this lemma and show how the bound leads to Theorem 2. Lemma 7 is proved in the next section.

The proof of Theorem 2 is based on the observation that when $u < h < 1$, the reaction rate can be bounded below. For each $u \in (0,1]$, define the reaction rate $\zeta$ by
\[ \zeta(u) = \frac{f(u)}{u} \]
and $\zeta(0) = f'(0)$. Now Eq. (1) can be written
\[ u_t = \tfrac{1}{2}\Delta_z u + b(y,t)\, u_x + \zeta(u)\, u. \tag{53} \]
By the properties of $f(u)$ we see that $\zeta(u) > 0$ for $u \in [0,1)$, $\zeta(u)$ is continuous for $u \in [0,1]$, and $\zeta(0) \ge \zeta(u)$ for any $u \in [0,1]$. If $h \in (0,1)$ we define a lower bound on $\zeta$:
\[ \zeta_h = \inf_{u\in(0,h)} \zeta(u) > 0. \]
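As a concrete illustration of these definitions, consider the logistic nonlinearity $f(u) = u(1-u)$. This particular $f$ is an assumption chosen here as a standard KPP example and is not singled out in the text.

```latex
% Hypothetical example: f(u) = u(1-u), so f(0) = f(1) = 0 and f'(0) = 1.
\begin{align*}
\zeta(u) &= \frac{f(u)}{u} = 1 - u \quad (0 < u \le 1), \qquad \zeta(0) = f'(0) = 1, \\
f(u) &= u - u^2 \le f'(0)\,u \quad\text{for } u \in [0,1], \\
\zeta_h &= \inf_{u\in(0,h)} (1 - u) = 1 - h > 0 \quad\text{for } h \in (0,1).
\end{align*}
```

In this example the lower bound $\zeta_h$ decreases to $0$ as $h \to 1$ and increases to $f'(0)$ as $h \to 0$.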

So, in regions where $u$ is bounded away from one, the reaction rate can be bounded below by $\zeta_h > 0$.

For a fixed $\hat\omega \in \hat\Omega$, we can estimate $u(z,t)$ using the Feynman-Kac formula for the solution of (53):
\[ u(z,t) = E\Bigl[ e^{\int_0^t \zeta(t-s,\, u(Z^{z,t}(s),\, t-s))\, ds}\, u_0(Z^{z,t}(t)) \Bigr], \tag{54} \]
where the expectation is with respect to measure $P^{z,t}$. If $\tau$ is any stopping time, we also have
\[ u(z,t) = E\Bigl[ e^{\int_0^{t\wedge\tau} \zeta(t-s,\, u(Z^{z,t}(s),\, t-s))\, ds}\, u\bigl( Z^{z,t}(t - (t\wedge\tau)),\, t - (t\wedge\tau) \bigr) \Bigr], \tag{55} \]
where $t\wedge\tau = \min(t,\tau)$. Therefore, we can obtain estimates on $u$ by carefully choosing stopping times and restricting the expectation to certain sets of paths. The exponential


term inside the expectation will be large when the path Z z,t (s) passes through regions where u is small and the reaction rate is large. On the other hand, if u(Z z,t (t − (t ∧ τ )), t − (t ∧ τ )) is too small, then the expectation as a whole may be small. Now we follow the ideas of Freidlin [16] (see p. 494). For s ∈ R, define the set (s) = {c ∈ Rn | S(c) − f (0) = s} and (s) = {c ∈ Rn | S(c) − f (0) ≤ s}. For any δ > 0 and T > 1, define ⎛ T = ⎝[(δ) × {1}] ∪ [

⎞ (t(δ)) × {t}]⎠ .

1≤t≤T

This defines the boundary of a region that spreads outward in z, linearly in t. Outside this region u may be close to zero, but on the boundary of this region, we have the crucial lower bound from Lemma 7: u(z, s) ≥ e−2δt for all (z, s) ∈ t

(56)

for t sufficiently large. Let K be a compact set K ⊂ {c ∈ Rn | S(c) − f (0) < 0} and z = ct for some c ∈ K . For h ∈ (0, 1), t, η > 0, define the Markov times σh (t) = σ (t) = τη (t) = σˆ (t) =

min{s ∈ [0, t]| u(Z z,t (s), t − s) ≥ h}, min{s ∈ [0, t]| (Z z,t (s), t − s) ∈ t },

min{s ∈ [0, t]| Z z,t (s) − z > ηt}, σh (t) ∧ σ (t).

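(We set these variables equal to $+\infty$ if the sets on the right are empty.) Using (55) with the stopping time $\hat\sigma(t)$ we express $u(z,t)$ as
\[ u(z,t) = E\Bigl[ e^{\int_0^{t\wedge\hat\sigma} \zeta(t-s,\, u(Z^{z,t}(s),\, t-s))\, ds} \times u\bigl( Z^{z,t}(t - (t\wedge\hat\sigma)),\, t - (t\wedge\hat\sigma) \bigr) \bigl( \chi_{A_1} + \chi_{A_2} + \chi_{A_3} \bigr) \Bigr], \tag{57} \]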

where $A_1$, $A_2$, and $A_3$ are the disjoint sets
\[ A_1 = \{\omega \mid \sigma_h(t) \le t\}, \qquad A_2 = \{\omega \mid \sigma_h(t) > t,\ \sigma(t) \ge rt\}, \qquad A_3 = \{\omega \mid \sigma_h(t) > t,\ \sigma(t) < rt\} \]
for some $r \in (0,1)$ to be chosen. Note that $P(A_1) = P^{z,t}(\sigma_h(t) \le t)$ is the probability that a particle starting at $z = ct$ will encounter the “hot region”, $u \ge h$, at or before time $t$. Because the sets $A_1$, $A_2$, and $A_3$ are disjoint, the expectation (57) splits into three integrals. The first integral, over $A_1$, can be bounded below by
\[ E\Bigl[ e^{\int_0^{t\wedge\hat\sigma} \zeta(t-s,\, u(Z^{z,t}(s),\, t-s))\, ds}\, u\bigl( Z^{z,t}(t - (t\wedge\hat\sigma)),\, t - (t\wedge\hat\sigma) \bigr) \chi_{A_1} \Bigr] \ge h\, P(A_1), \tag{58} \]


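since $\zeta \ge 0$. The second integral, over $A_2$, can be bounded below by
\[ E\Bigl[ e^{\int_0^{t\wedge\hat\sigma} \zeta(t-s,\, u(Z^{z,t}(s),\, t-s))\, ds}\, u\bigl( Z^{z,t}(t - (t\wedge\hat\sigma)),\, t - (t\wedge\hat\sigma) \bigr) \chi_{A_2} \Bigr] \ge e^{-2\delta t}\, e^{\zeta_h r t}\, P(A_2). \tag{59} \]
Combining (58) and (59) we have
\[ u(z,t) \ge h\, P(A_1) + e^{-2\delta t + \zeta_h r t}\, P(A_2). \tag{60} \]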

If we choose δ to be small, depending on h and r , then −2δt +ζh r t > 0. Since u(z, t) ≤ 1 for all (z, t), (60) then implies that P(A2 ) → 0 exponentially fast, if δ is small. Therefore, if we can also show that P(A3 ) → 0, then P(A1 ) → 1, and (60) implies the desired result (9), since h can be chosen arbitrarily close to 1. The compact set K is bounded away from the boundary of (0) ⊂ (δ), so we can choose η small and then r ∈ (0, 1) sufficiently small so that r t < σ (t) ≤ t − 1 whenever τη (t) > t. In other words, the trajectory Z z,t (s) stays in the set t for at least some fixed proportion of the interval [0, t]. Therefore, P(A3 ) ≤ P(σ (t) < r t) ≤ P(τη (t) ≤ t). By Corollary 1, sup P z,t τη (t) < t → 0

z∈Rn

(61)

as $t \to \infty$, for all $\eta > 0$, except on a set of $Q$-measure zero. Hence $P(A_3) \to 0$, uniformly over $c \in K$. This completes the proof of Theorem 2.

We note that the main difficulty in extending the argument of [16] for the periodic case is the manner in which the estimates (56) and (61) are obtained. In [16], estimate (61) followed from the uniform boundedness of the field $B$ and the independence of $B$ with respect to time, properties that we do not have in the present case.

6. Proof of Lemma 7

The main issue in proving the estimate of Lemma 7 (and thus the lower bound (56)) is whether the random variable
\[ \eta_z^t(\kappa t) = \frac{z - Z^{z,t}(\kappa t)}{\kappa t} \tag{62} \]

satisfies a large deviation principle with a convex rate function that can be characterized by μ(λ), almost surely with respect to Q. The variable ηzt (κt) is the average speed of a trajectory over time interval [0, κt]. ˆ the random variables ηzt (κt) satisfy a large deviation Definition 1. For fixed ωˆ ∈ , principle with a convex rate function S(c) if there exists a convex function S(c), independent of z ∈ Rn , such that (i) For each s ≥ 0, the set (s) = {c ∈ R| S(c) ≤ s} is compact.


(ii) For any δ, h > 0, there exists t0 > 0 such that for all t > t0 , P d(ηzt (κt), (s)) > δ ≤ e−κt (s−h) . (iii) For any δ, h > 0, there exists t0 > 0 such that for all t > t0 , P ηzt (κt) ∈ Uδ (c) ≥ e−κt (S(c)+h) .

(63)

If such a function $S(c)$ exists, it might depend on the parameter $\kappa \in (0,1]$, and it might depend on $\hat\omega \in \hat\Omega$. However, we will show that

Theorem 4. Almost surely with respect to $Q$, the random variables $\eta_z^t(\kappa t)$ satisfy a large deviation principle (with respect to $P^{z,t}$) with a convex rate function $S(c)$ that is independent of $\kappa$ and $\hat\omega \in \hat\Omega$.

We postpone the proof for the moment while we finish the proof of Lemma 7. By Proposition 1, the quantity $\mu(\lambda)$ is well defined and is almost surely constant with respect to $Q$ for $\lambda \in \mathbb{R}^n$, independent of $\kappa$. Since, by our assumption of Theorem 4, the variables $\eta_z^t(\kappa t)$ have a convex rate function, it follows (see Sect. 5.1 of [17]) that the rate function $S(c)$ is the same convex function defined by (8):
\[ S(c) = \sup_{\lambda\in\mathbb{R}^n} [\, c\cdot\lambda - \mu(\lambda) \,]. \tag{64} \]
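To make the Legendre transform in (64) concrete, it can be evaluated in the shear-free case $b \equiv 0$. That case is an illustrative assumption made here and is not the setting of the paper, but it is consistent with the value of $\mu$ in (48).

```latex
% Assumption for illustration: b \equiv 0, so Z^{z,t}(t) - z = W(t) and, by (25),
\[
\mu(\lambda) = \lim_{t\to\infty} \frac{1}{t} \log E\bigl[e^{-\lambda\cdot W(t)}\bigr]
            = \frac{|\lambda|^2}{2}.
\]
% The supremum in (64) is attained at \lambda = c:
\[
S(c) = \sup_{\lambda\in\mathbb{R}^n} \Bigl[\, c\cdot\lambda - \frac{|\lambda|^2}{2} \Bigr]
     = \frac{|c|^2}{2}.
\]
% The set where S(c) - f'(0) \le 0 is then a ball:
\[
\{\, c : S(c) - f'(0) \le 0 \,\} = \{\, c : |c| \le \sqrt{2 f'(0)} \,\}.
\]
```

The radius $\sqrt{2f'(0)}$ is the classical KPP front speed for $u_t = \tfrac{1}{2}\Delta u + f(u)$, which suggests how the condition $S(c) - f'(0) = 0$ plays the role of a front speed in the general case.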

Thus, our use of the notation $S(c)$ in Definition 1, Theorem 4, and (8) anticipates this equivalence. The characterization (64) does not hold if the rate function is not convex, in which case the Legendre transform of $\mu$ is equal to the convex envelope of the rate function. Let us emphasize that $S(c)$ is independent of $\kappa \in (0,1]$ and $\hat\omega \in \hat\Omega$, although the constants $t_0$ in Definition 1 may depend on $\kappa$, $\hat\omega$. Now, by definition of $S(c)$,
\[ \liminf_{t\to\infty} \frac{1}{\kappa t} \log \inf_{z\in\mathbb{R}^n} P^{z,t}\bigl( \eta_z^t(\kappa t) \in U_\delta(c) \bigr) \ge -S(c) > -\infty, \tag{65} \]
and the lower bound (52) of Lemma 7 is
\[ \liminf_{t\to\infty} \frac{1}{t} \log \inf_{c\in K} u(ct,t) \ge f'(0) - \max_{c\in K} S(c). \tag{66} \]
To prove the lower bound we now use the Feynman-Kac formula to relate (65) to (66), as in the arguments of Freidlin in Lemma 7.3.2 in [16]. The compactness of $K$ implies that it suffices to show that given any $\epsilon > 0$, and any $c$ for which $S(c) - f'(0) > 0$,
\[ \liminf_{t\to\infty} \frac{1}{t} \log \inf_{\tilde c\in U_\delta(c)} u(\tilde c t, t) \ge f'(0) - S(c) - \epsilon \tag{67} \]
for $\delta > 0$ sufficiently small. Without loss of generality, we assume that the initial data is the characteristic function of a small ball centered at the origin: $u_0(z) \ge \chi_{U_\delta(0)}(z)$ for some $\delta > 0$. We define $q$ to be the limit on the left-hand side of (67):
\[ q = \liminf_{t\to\infty} \frac{1}{t} \log \inf_{z\in U_{\delta t}(ct)} u(z,t). \tag{68} \]
We also assume that $S(c) - f'(0) > 0$.


Step 1. The first step is essentially the same as in [16]. Suppose for the moment that $q$ is finite. By the representation (55) we have for any $\kappa \in (0,1]$,
\[ \inf_{\tilde c\in U_\delta(c)} u(t\tilde c, t) \ge \inf_{\tilde c\in U_\delta(c)} E\Bigl[ e^{\int_0^{\kappa t} \zeta(t-s,\, u(Z(s),\, t-s))\, ds}\, u(Z(t - \kappa t),\, t - \kappa t)\, \chi_A \Bigr] \tag{69} \]
for any $\mathcal{F}_{s\le t}$-measurable set $A$. Recall that when $u \le h$, the reaction rate $\zeta(u)$ is bounded below by $\zeta_h > 0$. If we choose $A$ to be the set of paths satisfying both
\[ Z^{z,t}(\kappa t) \in U_{(1-\kappa)\delta t}((1-\kappa)tc) \tag{70} \]
and
\[ u(Z^{z,t}(s), t-s) \le h \quad\text{for all } s \in [0,\kappa t], \tag{71} \]
then from (69) and the assumption that $q$ is finite we have a lower bound
\[ q \ge \zeta_h + \liminf_{t\to\infty} \frac{1}{\kappa t} \log \inf_{\tilde c\in U_\delta(c)} P(A), \tag{72} \]

provided that the limit on the right also exists and is finite. Step 2. Now we bound the right-hand side of (72) and show how it relates to (65). Since we have assumed that S(c) − f (0) > 0, Theorem 1 implies that there is δ sufficiently small so that for any h ∈ (0, 1) there is a constant t0 > 0, depending on h, such that u(c t, t) ≤ h for all c ∈ U6δ (c), t ≥ t0 . Now if κ < 1/2 and

sup Z z,t (s) − (t − s)c ≤ 3δt,

(73)

s∈[0,κt]

then (71) is achieved along such paths when t > 2t0 . Next, if c˜ ∈ Uδ (c) is written c˜ = c + δe1 for some e1 ∈ Rn with |e1 | < 1, then define cˆ = c + 2δe1 . Then for any |e2 | < 1, ct ˜ − κt cˆ + κtδe2 ∈ U(1−κ)δt ((1 − κ)ct).

(74)

It follows that for each c˜ ∈ Uδ (c) there is a cˆ ∈ U2δ (c) such that (70) is achieved whenever ηzt (κt) ∈ Uδ (c), ˆ where η is defined by (6). This gives us a lower bound on P(A) in terms of the ηzt (κt), the average speed of a trajectory over [0, κt]: inf

c∈U ˜ δ (c),z=ct

≥

inf

P(A)

c∈U ˆ 2δ (c),z=ct ˆ

(75)

P

z,t t

sup Z (s) − (t − s)c ≤ 3δt, ηz (κt) ∈ Uδ (c) ˆ .

s∈[0,κt]

For κ sufficiently small, κ < (2δ)/(3 max(1, |c|)), we see that

z,t

sup P sup Z (s) − (t − s)c ≥ 3δt c∈U ˆ 2δ (c),z=ct ˆ

≤ sup P z∈Rn

s∈[0,κt]

z,t sup Z (s) − z ≥ δt/3 .

s∈[0,κt]


By Corollary 1 there are constants K 1 , K 2 > 0 independent of κ such that (except possibly on a set of Q-measure zero)

z,t 2 sup Z (s) − z ≥ δt/3 ≤ K 1 e−K 2 δ t/κ (76) sup P z∈Rn

s∈[0,κt]

for t sufficiently large, depending on ω. ˆ Therefore, for any M > 0, by choosing κ arbitrarily small, we can make K 2 δ 2 /κ 2 > M, so that

z,t 1 sup Z (s) − (t − s)c ≥ 2δt lim sup log sup P t→∞ κt s∈[0,κt] z∈Rn 1 2 log(K 1 e−K 2 δ t/κ ) ≤ −M. t→∞ κt

≤ lim

Therefore, from (72) and (75) we now see that for κ sufficiently small, q ≥ ζh + lim inf t→∞

1 κt

inf

c∈U ˆ 2δ

(c),z∈Rn

P ηzt (κt) ∈ Uδ (c) ˆ ,

(77)

provided that the limit on the right is finite and bounded below, independent of κ. However, this follows immediately from Theorem 4 and the lower bound (65), since S(c) is independent of κ. Then (67) follows by letting h → 0 so that ζh → f (0). Step 3. It remains to establish the initial claim that q > −∞, almost surely with respect to Q. To see this, define for any c ∈ Rn , (78) qˆδ (c, t) = inf P z,t Z z,t (t) ∈ Uδ (0) , z∈Uδ (tc)

ˆ We will show that for any bounded set ⊂ Rn , there is a a random variable over . finite constant K 1 > 0 such that the limit lim inf t→∞

1 log qˆδ (c, t) ≥ −K 1 t

(79)

holds uniformly over c ∈ . This immediately implies that q > −∞. For z ∈ Uδ (ct), let us write X z,t (s) as X z,t (s) = x + I z,t (s) + W10 (s), where I z,t is the first integral term in (5) and W10 (0) = 0. Note that the integral I z,t is independent of W1 , due to the shear structure of the flow. For simplicity of notation, we will write W2 (t) ∈ Uδ (0) to mean that |W2 (t)| < δ, even though Uδ (0) generally denotes an n-dimensional ball. First, we claim that for z ∈ Uδ (tc), 2 y P Z tz,t ∈ Uδ (0)| W2 (t) ∈ Uδ/2 (0) ≥ e−(3|c| +1)t for t sufficiently large. For ωˆ ∈ fixed, let M > 0 and define the set

A M = A M (t) = {w ∈ | sup I z,t (s) ≤ Mt}. z∈D,s∈[0,t]

(80)


Using the fact that W1 (s) and I z,t (s) are independent, we see that for z ∈ Uδ (tc), y

P(Z tz,t ∈ Uδ (0)| W2 (t) ∈ Uδ/2 (0)) y

≥ P(W10 (t) ∈ Uδ/2 (0) − I z,t (t) − x| W2 (t) ∈ Uδ/2 (0)) y

≥ P(W10 (t) ∈ Uδ/2 (0) − I z,t (t) − x, A M | W2 (t) ∈ Uδ/2 (0)) y

≥

inf P(W10 (t) ∈ Uδ/2 (0) + eMt ˆ + ct + δ eˆ2 )P(A M | W2 (t) ∈ Uδ/2 (0)) |eˆ1 |,|eˆ2 |≤1 (Mt+|c|t+δ)2 δ y 2t ≥√ P(A M | W2 (t) ∈ Uδ/2 (0)). (81) e− 2π t y

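By Lemma 5, $P(A_M \mid W_2^y(t) \in U_{\delta/2}(0)) \ge 1/2$ for t sufficiently large, depending on $\hat\omega$ and M. Moreover, if we choose $M = \max(1, |c|)$, then $P(A_M \mid W_2^y(t) \in U_{\delta/2}(0)) \ge 1/2$ for t sufficiently large, independent of c. Using this in (81) establishes (80), for t sufficiently large. Now, we can bound $\hat q_\delta$:

\[
\begin{aligned}
\hat q_\delta(c,t) &= \inf_{z\in U_\delta(tc)} P\bigl(Z_t^{z,t} \in U_\delta(0)\bigr) \\
&\ge \inf_{z\in U_\delta(tc)} P\bigl(Z_t^{z,t} \in U_\delta(0),\; W_2^y(t) \in U_{\delta/2}(0)\bigr) \\
&= \inf_{z\in U_\delta(tc)} E\Bigl[\chi_{W_2^y(t)\in U_{\delta/2}(0)}\, P\bigl(Z_t^{z,t} \in U_\delta(0) \,\big|\, W_2^y(t) \in U_{\delta/2}(0)\bigr)\Bigr] \\
&\ge \inf_{z\in U_\delta(tc)} E\Bigl[\chi_{W_2^y(t)\in U_{\delta/2}(0)}\, e^{-(3|c|^2+1)t}\Bigr] \\
&\ge e^{-(3|c|^2+1)t} \inf_{z\in U_\delta(tc)} P\bigl(W_2^y(t) \in U_{\delta/2}(0)\bigr) \ge e^{-(4|c|^2+1)t}
\end{aligned}
\tag{82}
\]

for t sufficiently large. Therefore,

\[ \liminf_{t\to\infty} \frac{1}{t} \log \hat q_\delta(c,t) \ge -(4|c|^2+1) \tag{83} \]

is finite almost surely with respect to Q. For any bounded subset of $\mathbb{R}^n$, we can choose $K_1$ to be

\[ K_1 = 1 + \sup_c 4|c|^2 < \infty, \tag{84} \]

where the supremum is taken over c in that subset.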

This establishes the claim (79). Having shown that q is finite, we have completed the proof of Lemma 7.

For use in the next section, we now show that for all $t > 0$, $\log(\hat q_\delta(c,t))$ is integrable with respect to Q. Note that bound (82) holds for t sufficiently large, depending on $\hat\omega$, so more work is needed in order to establish the integrability of $\log(\hat q_\delta(c,t))$.

Lemma 8. For each $c \in \mathbb{R}^n$,

\[ \sup_{t>1} E_Q\left[\,\left|\frac{1}{t} \log \hat q_\delta(c,t)\right|\,\right] < \infty. \tag{85} \]


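Proof. Using (81) we see that

\[
\begin{aligned}
\frac{1}{t}\log \hat q_\delta(c,t)
&\ge \frac{1}{t}\log P\bigl(Z_t^{z,t} \in U_\delta(0),\; W_2^y(t) \in U_{\delta/2}(0)\bigr) \\
&\ge \frac{1}{t}\log\Bigl[ P\bigl(W_2^y(t) \in U_{\delta/2}(0)\bigr)\, \frac{\delta}{\sqrt{2\pi t}}\, e^{-\frac{((1+2|c|)t+\delta)^2}{t}}\, P\bigl(A_M \,\big|\, W_2^y(t) \in U_{\delta/2}(0)\bigr) \Bigr] \\
&\ge -C_1 + \frac{1}{t}\log\Bigl[ e^{-\frac{((1+2|c|)t+2\delta)^2}{t}}\, P\bigl(A_M \,\big|\, W_2^y(t) \in U_{\delta/2}(0)\bigr) \Bigr]
\end{aligned}
\]

for a constant $C_1 > 0$ depending only on c and δ, for $t \ge 1$. This constant is uniformly bounded for c in a bounded set and $\delta > 0$ fixed. Let $\hat g$ be the term inside the logarithm:

\[ \hat g = e^{-\frac{(Mt+|c|t+2\delta)^2}{2t}}\, P\bigl(A_M \,\big|\, W_2^y(t) \in U_{\delta/2}(0)\bigr), \]

a random variable with respect to Q. Then for $\alpha \ge 2C_1$,

\[
\begin{aligned}
Q\Bigl(\frac{1}{t}\log \hat q_\delta(c,t) \le -\alpha\Bigr)
&\le Q\Bigl(\frac{1}{t}\log \hat g \le -\alpha/2\Bigr) \\
&= Q\bigl(\hat g \le e^{-\alpha t/2}\bigr) \\
&= Q\Bigl( P\bigl(A_M \,\big|\, W_2^y(t) \in U_{\delta/2}(0)\bigr) \le e^{-\alpha t/2}\, e^{\frac{(Mt+|c|t+2\delta)^2}{2t}} \Bigr).
\end{aligned}
\tag{86}
\]

Also, from Lemma 5,

\[ Q\Bigl( P\bigl(A_M \,\big|\, W_2^y(t) \in U_{\delta/2}(0)\bigr) \le 1 - e^{-K_2 M^2 t/2} \Bigr) \le K_1 e^{-K_2 M^2 t/2}. \tag{87} \]

It is easy to see that there exist constants $K_3, K_4 > 0$ independent of t such that whenever $t \ge 1$, $M = K_3\sqrt{\alpha}$, and $\alpha \ge K_4 |c|^2$, we have

\[ e^{-\alpha t/2}\, e^{\frac{(Mt+|c|t+2\delta)^2}{2t}} \le 1/2 \le 1 - e^{-K_2 M^2 t/2}. \]

By combining (86) and (87), we now conclude that

\[ Q\Bigl(\frac{1}{t}\log \hat q_\delta(c,t) \le -\alpha\Bigr) \le K_1 e^{-K_2 K_3^2 \alpha t/2} \tag{88} \]

whenever $\alpha \ge K_4 |c|^2$ and $t \ge 1$. It follows that for $t \ge 1$,

\[
E_Q\left[\,\left|\frac{1}{t}\log \hat q_\delta(c,t)\right|\,\right]
= \int_0^\infty Q\Bigl(\Bigl|\frac{1}{t}\log \hat q_\delta(c,t)\Bigr| \ge \alpha\Bigr)\, d\alpha
\le K_4|c|^2 + \int_{K_4|c|^2}^\infty K_1 e^{-K_2 K_3^2 \alpha t/2}\, d\alpha < \infty.
\tag{89}
\]

This is bounded uniformly in t, for $t \ge 1$.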
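The right-hand side of (89) can be evaluated in closed form, which makes the uniformity in t explicit: the exponential tail integrates to $\frac{2K_1}{K_2K_3^2 t}\,e^{-K_2K_3^2K_4|c|^2 t/2}$, which decreases in t. The short Python sketch below checks this closed form against a numerical quadrature. The numerical values chosen for $K_1,\dots,K_4$ and $|c|$ are placeholders picked only for illustration; the paper does not specify them.

```python
import math


def tail_integral_bound(c_norm, t, K1, K2, K3, K4):
    """Closed form of K4|c|^2 + int_{K4|c|^2}^inf K1 exp(-K2 K3^2 a t / 2) da."""
    threshold = K4 * c_norm ** 2
    rate = K2 * K3 ** 2 * t / 2.0
    return threshold + K1 * math.exp(-rate * threshold) / rate


def tail_integral_numeric(c_norm, t, K1, K2, K3, K4, span=80.0, steps=200_000):
    """Same quantity by the midpoint rule on [threshold, threshold + span]."""
    threshold = K4 * c_norm ** 2
    rate = K2 * K3 ** 2 * t / 2.0
    h = span / steps
    total = 0.0
    for i in range(steps):
        alpha = threshold + (i + 0.5) * h
        total += K1 * math.exp(-rate * alpha)
    return threshold + total * h


# Placeholder constants (assumptions for illustration only).
constants = dict(K1=2.0, K2=1.0, K3=1.0, K4=3.0)
for t in (1.0, 2.0, 10.0):
    print(t, tail_integral_bound(1.0, t, **constants))
```

The printed values decrease with t, so the supremum over $t \ge 1$ is attained at $t = 1$ in this toy setting.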


7. Proof of Large Deviation Estimates

In this section we prove Theorem 4. We work first with the case $\kappa = 1$. For $c \in \mathbb{R}^n$ and $0 \le r < s < t$, define the probability

\[ q_\delta^z(c,r,t) = P^{z,t}\bigl(z - Z^{z,t}(t-r) \in U_{\delta(t-r)}(c(t-r))\bigr) = P^{z,t}\bigl(\eta_z^t(t-r) \in U_\delta(c)\bigr) \]

and

\[ q_\delta^+(c,r,t) = \sup_{z\in D} q_\delta^z(c,r,t), \qquad q_\delta^-(c,r,t) = \inf_{z\in D} q_\delta^z(c,r,t). \]

The quantity $q_\delta^z(c,r,t)$ is the probability that a trajectory should have average velocity sufficiently close to c, over a given time interval. This probability depends on the starting point z, so $q_\delta^+$ and $q_\delta^-$ are the maximum and minimum possible probabilities. The quantities $q_\delta^z$, $q_\delta^+$, and $q_\delta^-$ also depend on $\hat\omega \in \hat\Omega$, but we will use the sub-additive ergodic theorem to show that $(1/t)\log q_\delta^-(c,0,t)$ converges to a finite constant, $-S_\delta(c)$, almost surely with respect to Q. From these constants we will recover the desired rate function S(c) as the limit of $S_\delta(c)$ as $\delta \to 0$. Then, we will derive a Harnack-type inequality to compare $(1/t)\log q_\delta^-(c,0,t)$ and $(1/t)\log q_\delta^+(c,0,t)$ and show that S(c) satisfies the requirements for Definition 1. The same analysis will extend to the case of $\kappa < 1$.

Define the events

\[
\begin{aligned}
A &= \bigl\{\omega \in \Omega \mid z - Z^{z,t}(t-r) \in U_{\delta(t-r)}(c(t-r))\bigr\} = \bigl\{\eta_z^t(t-r) \in U_\delta(c)\bigr\}, \\
B &= \bigl\{\omega \in \Omega \mid z - Z^{z,t}(t-s) \in U_{\delta(t-s)}(c(t-s))\bigr\} = \bigl\{\eta_z^t(t-s) \in U_\delta(c)\bigr\}.
\end{aligned}
\]

Note that event B is $\mathcal{F}^t_{s\le\tau}$ measurable for any $\tau \ge t-s$. Using the Markov property of the Wiener process, we find that $\log(q_\delta^-(c,s,t))$ is super-additive for each $c \in \mathbb{R}^n$ since

\[
\begin{aligned}
q_\delta^-(c,r,t) &= \inf_z P(A) \ge \inf_z P(A \cap B) \\
&\ge \inf_z P\bigl(\bigl\{-Z^{z,t}(t-r) + Z^{z,t}(t-s) \in U_{\delta(s-r)}(c(s-r))\bigr\} \cap B\bigr) \\
&= \inf_z E\Bigl[\chi_B\, P\bigl(-Z^{z,t}(t-r) + Z^{z,t}(t-s) \in U_{\delta(s-r)}(c(s-r)) \,\big|\, \mathcal{F}_{t-s}\bigr)\Bigr] \\
&= \inf_z E\Bigl[\chi_B\, P\bigl(-Z^{z,t}(t-r) + Z^{z,t}(t-s) \in U_{\delta(s-r)}(c(s-r)) \,\big|\, Z^{z,t}(t-s)\bigr)\Bigr] \\
&\ge \inf_z E\Bigl[\chi_B \inf_z P\bigl(z - Z^{z,s}(s-r) \in U_{\delta(s-r)}(c(s-r))\bigr)\Bigr] \\
&= \inf_z P\bigl(z - Z^{z,s}(s-r) \in U_{\delta(s-r)}(c(s-r))\bigr)\, \inf_z P(B) \\
&= q_\delta^-(c,r,s)\, q_\delta^-(c,s,t).
\end{aligned}
\]

Also, due to the stationarity of B with respect to t,

\[
\begin{aligned}
\tau_h q_\delta^-(c,r,t) &= \tau_h \inf_y P\bigl(z - Z^{z,t}(t-r) \in U_{\delta(t-r)}(c(t-r))\bigr) \\
&= \inf_y P\bigl(z - Z^{z,t+h}(t-r) \in U_{\delta(t-r)}(c(t-r))\bigr) = q_\delta^-(c,r+h,t+h).
\end{aligned}
\]


For any $\varepsilon > 0$, we can bound q below by translating in z and using (78):

\[ q_\delta^-(c,r,t) \ge \tau_r \hat q_\varepsilon(c,t-r) = \inf_{z\in U_\varepsilon(ct)} P\bigl(Z^{z,t}(t-r) \in U_\varepsilon(cr)\bigr) \tag{90} \]

if $\varepsilon < \delta(t-r)$. Hence, $\log(q_\delta^-(c,r,t))$ is integrable by (89). Kingman's ergodic theorem [23] now implies that the limit

\[ \lim_{n\to\infty} \frac{1}{n}\log q_\delta^-(c,0,n) = \sup_{n>0} \frac{1}{n} E_Q\bigl[\log q_\delta^-(c,0,n)\bigr] = -S_\delta(c) \tag{91} \]

exists and is a finite constant, Q-a.s., because of the ergodicity Assumption A2. To extend the convergence in (91) to continuous time, we employ a technique from [2] (see the proof of Theorem 2.5 therein). Let

\[ g(\hat\omega) = \sup_{\substack{r,t\in[0,2] \\ |r-t|\ge 1}} \bigl|\log(q_\delta^-(c,r,t))\bigr|. \]

Let $\Upsilon(\hat\omega) = \sup_{z\in D,\,t\in[0,2]} |B(z,t)|$. Then for all $0 \le r < t \le 2$, we can bound

\[ \sup_{y\in D} \left| \int_0^{t-r} b\bigl(W_2^y(\tau), t-\tau\bigr)\, d\tau \right| \le \Upsilon\, |t-r| \]

independently of the realization of $W_2^y$. As in (81),

\[
\begin{aligned}
P\bigl(z - Z^{z,t}(t-r) \in U_{\delta(t-r)}(c(t-r))\bigr)
&\ge \inf_{|\hat e_1|,|\hat e_2|\le 1} P\bigl(W_1^0(t-r) \in U_{\delta(t-r)/2}(0) + \hat e_1 \Upsilon (t-r) + (c + \delta\hat e_2)(t-r)\bigr) \\
&\qquad \times P\bigl(W_2^0(t-r) \in U_{\delta(t-r)/2}(0)\bigr) \\
&\ge \frac{\delta|t-r|}{\sqrt{2\pi|t-r|}}\, e^{-\frac{(t-r)^2(\Upsilon+|c|+\delta)^2}{2(t-r)}} \cdot \frac{\delta|t-r|}{\sqrt{2\pi|t-r|}}\, e^{-\frac{(t-r)^2(|c|+\delta)^2}{2(t-r)}}.
\end{aligned}
\]

Therefore, since $r,t \in [0,2]$ and $|r-t| \ge 1$ in the definition of $g(\hat\omega)$,

\[ 0 \le g(\hat\omega) \le K_1 + K_2 \Upsilon^2 \]

for some constants $K_1, K_2 > 0$ that depend on δ and c. Hence $g(\hat\omega)$ is integrable with respect to Q, since $\Upsilon^2$ is integrable by the Borell inequality. By the super-additivity of $\log q_\delta^-(c,r,t)$,

\[ \log q_\delta^-(c,0,n-1) - \tau_{n-1} g \le \log q_\delta^-(c,0,t) \le \log q_\delta^-(c,0,n+2) + \tau_n g \tag{92} \]

whenever $t \in (n, n+1)$, $n \in \mathbb{Z}$. The ergodic theorem implies that

\[ \frac{1}{N}\sum_{n=1}^{N} \tau_n g \to E[g] < \infty \tag{93} \]

almost surely. Therefore, $\frac{1}{n}\tau_n g \to 0$ almost surely as $n \to \infty$. It now follows from (92) that the limit along continuous time

\[ \lim_{t\to\infty} \frac{1}{t}\log q_\delta^-(c,0,t) = -S_\delta(c) \tag{94} \]

holds almost surely with respect to Q.
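The mechanism behind (91) and (94) has a deterministic analogue, Fekete's lemma: for a super-additive sequence $a_{m+n} \ge a_m + a_n$, the ratio $a_n/n$ converges to $\sup_n a_n/n$. The sketch below illustrates this with a toy sequence standing in for $\log q_\delta^-(c,0,n)$. The sequence $a_n = -n - \sqrt{n}$ is a made-up example, not a quantity from the paper, and the random setting handled by Kingman's theorem is not simulated here.

```python
import math


def toy_log_q(n: int) -> float:
    """Toy super-additive sequence playing the role of log q(c, 0, n)."""
    return -n - math.sqrt(n)


def is_superadditive(seq, upto: int) -> bool:
    """Check seq(m + n) >= seq(m) + seq(n) for all 1 <= m, n < upto."""
    return all(
        seq(m + n) >= seq(m) + seq(n)
        for m in range(1, upto)
        for n in range(1, upto)
    )


def normalised(seq, n: int) -> float:
    """The ratio seq(n) / n whose limit defines the (negative) rate."""
    return seq(n) / n


print(is_superadditive(toy_log_q, 60))
for n in (10, 100, 10_000, 1_000_000):
    print(n, normalised(toy_log_q, n))
```

Here the ratios increase towards $-1$, so the toy "rate" is 1; in the paper the corresponding limit is the constant $S_\delta(c)$.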


Now we extend this conclusion to the case of $\kappa < 1$, as well. If $\kappa \in (0,1)$ and $\delta > 0$, the stationarity of b(y, t) implies that

\[ \frac{1}{\kappa t}\log \inf_z P\bigl(\eta_z^t(\kappa t) \in U_\delta(c)\bigr) = \frac{1}{\kappa t}\log q_\delta^-(c,(1-\kappa)t,t) \to -S_\delta(c) \tag{95} \]

in distribution (with respect to Q) as $t \to \infty$, but this does not immediately imply pointwise, almost-sure convergence. However, the collection of sets $\{[(1-\kappa)t, t]\}_{t\ge 0}$ is a regular family of sets in the sense of [2], since $0 \le |[0,t]| \le C\,|[(1-\kappa)t,t]|$ for all t, with $C = \frac{1}{\kappa}$. It now follows from Theorem 2.8 of [2] that $\lim_{n\to\infty}\frac{1}{n}\log q_\delta^-(c,(1-\kappa)n,n)$ converges almost surely along any rational sequence. Therefore, (95) implies that, indeed, this limit is $-S_\delta(c)$,

\[ \lim_{n\to\infty} \frac{1}{n}\log q_\delta^-(c,(1-\kappa)n,n) = -S_\delta(c), \quad Q\text{-a.s.} \tag{96} \]

Finally, this convergence can be extended to continuous time, using the same technique as in (92).

For each $c \in \mathbb{R}^n$, $S_\delta(c)$ can be bounded above independently of $\delta > 0$ using (79) and (90). From the definition, it is clear that $S_\delta(c) \ge 0$ for all δ, and that

\[ S_{\delta_1}(c_1) \le S_{\delta_2}(c_2) \tag{97} \]

whenever $U_{\delta_1}(c_1) \supset U_{\delta_2}(c_2)$. In particular, $S_{\delta_1}(c) \le S_{\delta_2}(c)$ for $\delta_1 > \delta_2$, $c \in \mathbb{R}^n$. Therefore, we define for each $c \in \mathbb{R}^n$,

\[ S(c) = \lim_{\delta\to 0} S_\delta(c) = \sup_{\delta>0} S_\delta(c) \in [0,+\infty). \]

This will be the rate function described in the theorem.

Lemma 9. For all $\delta > 0$, the functions $S_\delta(c)$ are continuous and convex in c. Also, S(c) is continuous and convex in c.

Proof. The continuity and convexity of S(c) follows immediately from the fact that it is the finite, pointwise limit of the continuous, convex functions $S_\delta(c)$. The convexity of $S_\delta(c)$ follows from the Markov property of the process $Z^{z,t}$, as follows. Let $p \in [0,1]$ and $c_0 = pc_1 + (1-p)c_2$. Let $t > 0$ and denote $t_1 = pt$, $t_2 = (1-p)t$. Then we see that

\[
\begin{aligned}
q_\delta^-(c_0,0,t) &= \inf_z P\bigl(z - Z^{z,t}(t) \in U_{\delta t}(c_0 t)\bigr) \\
&\ge \inf_z P\bigl(z - Z^{z,t}(t) \in U_{\delta t}(c_0 t),\; z - Z^{z,t}(t_1) \in U_{\delta t_1}(c_1 t_1)\bigr) \\
&\ge \inf_z P\bigl(z - Z^{z,t-t_1}(t_2) \in U_{\delta t_2}(c_2 t_2)\bigr)\, \inf_z P\bigl(z - Z^{z,t}(t_1) \in U_{\delta t_1}(c_1 t_1)\bigr) \\
&= q_\delta^-(c_2,0,t_2)\, q_\delta^-(c_1,t_2,t).
\end{aligned}
\]

Hence

\[ -\frac{1}{t}\log q_\delta^-(c_0,0,t) \le -\frac{1}{t}\log q_\delta^-(c_2,0,(1-p)t) - \frac{1}{t}\log q_\delta^-(c_1,(1-p)t,t). \tag{98} \]


By the preceding arguments, both terms on the right converge (Q-a.s.) as $t \to \infty$ to the constants $(1-p)S_\delta(c_2)$ and $pS_\delta(c_1)$. Therefore, we infer that $S_\delta(c_0) \le (1-p)S_\delta(c_2) + pS_\delta(c_1)$. So, $S_\delta(c)$ is convex and must also be continuous in c, since it is finite for every $c \in \mathbb{R}^n$.

This establishes the existence and convexity of the function S(c). Part (iii) of Definition 1 follows from the definition of $S_\delta(c)$ and the fact that $S_\delta(c) \nearrow S(c)$ as $\delta \to 0$. To finish the proof of Theorem 4, we must establish a Harnack-type inequality to relate the probabilities

\[ P\bigl(z - Z^{z,t}(t) \in U_{\delta t}(ct)\bigr) \quad\text{and}\quad P\bigl(z' - Z^{z',t}(t) \in U_{\delta t}(ct)\bigr) \]

corresponding to different starting points $z, z' \in D$. This will allow us to remove the $\inf_z$ in the definition of q and S(c) and to establish parts (i) and (ii) of Definition 1. Unfortunately, the quantity $\log q_\delta^+$ is not sub-additive or super-additive, so we cannot use the ergodic theorem to show that $\frac{1}{t}\log q_\delta^+ \to -S_\delta(c)$, almost surely, as is the case with $\log q_\delta^-$.

Note that it is not true that two trajectories starting close together will remain close. For example, suppose the flow is $B(z,t) = (\sin(y), 0)$. Then if $z = (0,0)$ and $z' = (0,\pi)$, the X components of the trajectories will satisfy

\[ X^{z,t}(t) - X^{z',t}(t) = \int_0^t \bigl[\sin(W_2(s)) - \sin(\pi + W_2(s))\bigr]\, ds = 2\int_0^t \sin(W_2(s))\, ds, \]

which we expect will grow like $\sqrt{t}$. Nevertheless, the estimate we need must only relate the distributions of the two processes, not the individual trajectories for fixed realizations of W. We prove the following lemma:

Lemma 10. There are constants $K_1, K_2, K_3, K_4 > 0$ such that for all $\kappa \in (0,1]$, $c \in \mathbb{R}^n$, $\varepsilon > 0$, and $\delta > 0$,

\[ \inf_z P\bigl(\eta_z^t(\kappa t) \in U_{(1+\varepsilon)\delta}(c)\bigr) \ge K_4 e^{-K_3 \varepsilon\delta t} \sup_z P\bigl(\eta_z^t(\kappa t) \in U_\delta(c)\bigr) - K_1 e^{-K_3 \varepsilon^2\delta^2\kappa^2 t^2}. \]

Proof of Lemma 10. For clarity we let $\kappa = 1$. Extension to $\kappa < 1$ is straightforward, as in the proof of Lemma 5. Because of the shear flow structure, $(Z^{z,t}(t) - z)$ and $\eta_z^t(t)$ are independent of x (where $z = (x,y)$), and the component $Y^{z,t}$ is just a Wiener process. These two facts will enable us to estimate the cost of switching the initial point from z to $z'$. For $M > 0$ and $s \in (0,t)$ to be chosen, the Markov property of the process implies that

\[
\begin{aligned}
P\bigl(Z^{z,t}(t) \in U_{\delta t+2M}(z+ct),\; |Z^{z,t}(s) - z| \le M\bigr)
&= \int_{|\hat z - z|\le M} \rho(z,t,\hat z,s)\, P^{\hat z,s}\bigl(Z^{\hat z,s}(t-s) \in U_{\delta t+2M}(z+ct)\bigr)\, d\hat z \\
&\ge \int_{|\hat z - z|\le M} \rho(z,t,\hat z,s)\, P^{\hat z,s}\bigl(Z^{\hat z,s}(t-s) \in U_{\delta t+M}(\hat z+ct)\bigr)\, d\hat z,
\end{aligned}
\tag{99}
\]

where $\rho(z,t,\hat z,r)$ denotes the transition density of the Markov process $Z^{z,t}(r)$. The term $P^{\hat z,s}(Z^{\hat z,s}(t-s) \in U_{\delta t+M}(\hat z+ct))$ inside the integral is independent of $\hat x$ (where $\hat z = (\hat x,\hat y)$). Using this fact, we will bound the integral in (99) by first integrating over $\hat x$. Since $Y^{z,t}$ is just a Wiener process, the marginal distribution of ρ with respect to $\hat y$ is Gaussian:

\[ \int_{\mathbb{R}} \rho(z,t,\hat z,s)\, d\hat x = \frac{1}{\sqrt{2\pi s}}\, e^{-\frac{|\hat y - y|^2}{2s}} = F(\hat y - y, s). \]

Therefore,

\[ \int_{|x-\hat x|\le M} \rho(z,t,\hat z,s)\, d\hat x = \frac{1}{\sqrt{2\pi|s|}}\, F(\hat y - y, s) - P^{z,t}\bigl(A_M \,\big|\, Y^{z,t}(s) = \hat y\bigr), \]

where $A_M = \{\omega \mid |X^{z,t}(s) - x| \ge M\}$. This set will turn out to be very small. For $\varepsilon > 0$, let $s = 1$ and $M = \varepsilon\delta t/2$ (use $M = \varepsilon\delta\kappa t/2$ when $\kappa < 1$). Therefore, integrating only in $\hat x$, we have

\[
\begin{aligned}
&\int_{|\hat z - z|\le M} \rho(z,t,\hat z,s)\, P^{\hat z,s}\bigl(Z^{\hat z,s}(t-s) \in U_{\delta t+M}(\hat z+ct)\bigr)\, d\hat z \\
&\qquad \ge \int_{|\hat y - y|\le M/2} F(\hat y - y, s)\, P^{\hat z,s}\bigl(Z^{\hat z,s}(t-s) \in U_{\delta t+M}(\hat z+ct)\bigr)\, d\hat y - G_1,
\end{aligned}
\tag{100}
\]

where

\[ G_1 = M \sup_{\hat y} P^{z,t}\bigl(A_M \,\big|\, Y^{z,t}(s) = \hat y\bigr). \]

Now we switch the initial point from z to $z'$, such that $|y - y'| \le L$. Then, continuing from (100), we have

\[
\begin{aligned}
&\int_{|\hat y - y|\le M/2} F(\hat y - y, s)\, P^{\hat z,s}\bigl(Z^{\hat z,s}(t-s) \in U_{\delta t+M}(\hat z+ct)\bigr)\, d\hat y - G_1 \\
&\qquad \ge Ce^{-M/2} \int_{|\hat y - y'|\le \frac{M}{2}-L} F(\hat y - y', s)\, P^{\hat z,s}\bigl(Z^{\hat z,s}(t-s) \in U_{\delta t+M}(\hat z+ct)\bigr)\, d\hat y - G_1 \\
&\qquad \ge Ce^{-M/2} \int_{|\hat z - z'|\le \frac{M}{2}-L} \rho(z',t,\hat z,s)\, P^{\hat z,s}\bigl(Z^{\hat z,s}(t-s) \in U_{\delta t+M}(\hat z+ct)\bigr)\, d\hat z - G_1 \\
&\qquad \ge Ce^{-M/2}\, P\Bigl(Z^{z',t}(t) \in U_{\delta t}(z'+ct),\; |Z^{z',t}(s) - z'| \le \frac{M}{2} - L\Bigr) - G_1.
\end{aligned}
\tag{101}
\]

Combining (99) and (101) we have

\[
\begin{aligned}
P\bigl(Z^{z,t}(t) \in U_{\delta t+2M}(z+ct)\bigr)
&\ge Ce^{-M/2}\, P\Bigl(Z^{z',t}(t) \in U_{\delta t}(z'+ct),\; |Z^{z',t}(s) - z'| \le \frac{M}{2} - L\Bigr) - G_1 \\
&\ge Ce^{-M/2}\, P\bigl(Z^{z',t}(t) \in U_{\delta t}(z'+ct)\bigr) - Ce^{-M/2} G_2 - G_1,
\end{aligned}
\tag{102}
\]

where

\[ G_2 = \sup_{\hat z} P^{z,t}\bigl(A_{\frac{M}{2}-L}\bigr). \]

It follows from Lemma 5 that

\[ G_1, G_2 \le K_1 e^{-K_2 M^2} = K_1 e^{-K_2 \varepsilon^2\delta^2 t^2/4} \]

for t sufficiently large. Now the lemma follows from (102).


Since $\varepsilon > 0$ is arbitrary, the lemma implies that

Corollary 2. For all $\varepsilon, \delta > 0$, $c \in \mathbb{R}^n$, and $\kappa \in (0,1]$,

\[
\begin{aligned}
-\lim_{t\to\infty} \frac{1}{\kappa t}\log q_\delta^-(c,(1-\kappa)t,t) = S_\delta(c)
&\ge -\limsup_{t\to\infty} \frac{1}{\kappa t}\log q_\delta^+(c,(1-\kappa)t,t) \\
&\ge S_{(1+\varepsilon)\delta}(c)
\end{aligned}
\]

almost surely with respect to Q.

Now, using this estimate, we can establish parts (i) and (ii) from Definition 1. From Lemma 1 there are constants $K_1, K_2 > 0$ such that for t sufficiently large,

\[ P\bigl(|\eta_z^t(\kappa t)| \ge |c|\bigr) \le K_1 e^{-K_2 |c|^2 t/\kappa}. \tag{103} \]

This implies that $\lim_{|c|\to\infty} S(c)/|c| = +\infty$. Hence, (s) is a bounded set, for each $s \ge 0$. By continuity of S(c), (s) is compact. Let A be the set $A = \{c \in \mathbb{R}^n \mid d(c, (s)) > \delta\}$. We must show that for any fixed $\delta > 0$, $h > 0$,

\[ P\bigl(\eta_z^t(\kappa t) \in A\bigr) \le e^{-\kappa t(s-h)}, \tag{104} \]

for $t > 0$ sufficiently large. Because of the bound (103), it suffices to show that (104) holds with A replaced by any compact subset $A'$ of A (because $K_2|c|^2 > \kappa^2 s$ when $|c|$ is sufficiently large).

We have shown that for $c \in \mathbb{R}^n$, there is a set $V(c) \subset \hat\Omega$ such that $Q(V) = 0$ and the convergence (94) holds for all $\hat\omega \in \hat\Omega \setminus V(c)$. To obtain convergence on a set independent of c, we define the set $\hat V \subset \hat\Omega$ by

\[ \hat V = \bigcup_{c\in\mathbb{Q}^n} V(c), \tag{105} \]

which has measure zero. Therefore, for all $c \in \mathbb{Q}^n$, (94) holds for all $\hat\omega \in \hat\Omega \setminus \hat V$.

Now, we claim that we can choose $\varepsilon > 0$ small enough so that $\varepsilon < \delta$ and

\[ \inf_{c'\in A'} S_\varepsilon(c') > s - \frac{h}{2}. \tag{106} \]

If this were not so, then there must be a sequence $\varepsilon_k \to 0$ and $\{c_k\} \subset A'$ such that $S_{\varepsilon_k}(c_k) \le s - h/2$. Because $A'$ is compact, there must be a subsequence $c_{k_n}$ converging to some $c_0 \in A'$. Since $c_0 \in A'$, $S_\varepsilon(c_0) > s - h/4$ whenever ε is less than some $\varepsilon_0 > 0$. However, $U_{\varepsilon_k}(c_k) \subset U_{\varepsilon_0}(c_0)$ for k sufficiently large. It follows from (97) that $S_{\varepsilon_k}(c_k) > s - h/4$, for k sufficiently large. This is a contradiction, so the claim must hold. Having chosen ε to satisfy (106), cover $A'$ with a finite number of balls having size $\varepsilon/2$:

\[ A' \subset \bigcup_{n=1}^{N} U_{\varepsilon/2}(c_n) \]


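for some finite set $\{c_n\}_{n=1}^N \subset A' \cap \mathbb{Q}^n$. Therefore,

\[ P\bigl(\eta_z^t(\kappa t) \in A'\bigr) \le \sum_{n=1}^{N} P\bigl(\eta_z^t(\kappa t) \in U_{\varepsilon/2}(c_n)\bigr) \le \sum_{n=1}^{N} q_{\varepsilon/2}^+(c_n,(1-\kappa)t,t). \]

Now we apply Corollary 2 and inequality (106) to conclude that for all $\hat\omega \in \hat\Omega \setminus \hat V$,

\[
\begin{aligned}
\limsup_{t\to\infty} \frac{1}{\kappa t}\log P\bigl(\eta_z^t(\kappa t) \in A'\bigr)
&\le \limsup_{t\to\infty} \frac{1}{\kappa t}\log \sum_{n=1}^{N} q_{\varepsilon/2}^+(c_n,(1-\kappa)t,t) \\
&\le -\inf_{c'\in A'} S_{2\varepsilon/3}(c') < -\Bigl(s - \frac{h}{2}\Bigr)
\end{aligned}
\]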

for t sufficiently large. Thus, (104) holds almost surely for t sufficiently large. This proves part (ii) of Definition 1 for $\kappa \in (0,1]$ and completes the proof of Theorem 4.

8. Estimating c∗ With the Variational Formula

In this final section, we use formula (10) to derive some analytical bounds on c∗. Throughout this section we will assume that the shear b(y, t) has the form

\[ b(y,t) = \sum_{j=1}^{N} b_1^j(y)\, b_2^j(t), \tag{107} \]

where $b_1^j(y)$ are Lipschitz continuous and periodic in y, and $b_2^j(t)$ are stationary centered Gaussian fields such that Assumptions A1-A6 are satisfied. Also, we assume that the initial data is independent of y: $u(z,0,\hat\omega) = u_0(x)$, and we consider only front propagation in the direction $k = (1,0)$, which is aligned with the direction of the shear. Under these assumptions, the random solutions $u(z,t,\hat\omega)$ will be periodic in y for all time, and the maximum principle implies that the speed in the k direction is given by

\[ c^*(k) = \sup c\cdot k, \]

where the supremum is taken over the set $\{c \in \mathbb{R}^n \mid S(c) - f'(0) = 0\}$. Using the definition of S(c) and the fact that this set is convex, one can see that this supremum is achieved at a point $\hat c \in \mathbb{R}^n$ that satisfies

\[ S(\hat c) - f'(0) = 0 = \sup_{\lambda_1>0} \bigl[\hat c\cdot k\,\lambda_1 - \mu(\lambda_1 k)\bigr] - f'(0). \]

Consequently, the variational formula for the front speed reduces to the one-dimensional optimization problem

\[ c^*(k) = \inf_{\lambda_1>0} \frac{\mu(\lambda_1 k) + f'(0)}{\lambda_1}, \]


where μ is determined by the limit

\[ \mu(\lambda k) = \frac{\lambda^2}{2} + \lim_{t\to\infty} \frac{1}{t}\log \phi(y,t) \tag{108} \]

and

\[ \phi_t = \frac{1}{2}\Delta\phi - \lambda b(y,t)\phi, \qquad \phi|_{t=0} \equiv 1. \tag{109} \]

In the case of time-independent flows, c∗(k) is the minimal speed of the traveling wave in the direction k, as described in the work of Berestycki and Nirenberg [7]. Notice that (109) is the same as Eq. (46). As before, we will use ρ(λ) to denote the limit on the right side of (108).

We consider the scaling $b(y,t) \to \delta b(y,t)$ and the resulting enhancement of the corresponding speed $c^* = c^*(\delta)$. It is known [32] that if b(y, t) is periodic in both space and time, then $c^*(\delta) = c^*(0) + O(\delta^2)$ for δ small and $c^*(\delta) = c^*(0) + O(\delta)$ as $\delta \to \infty$. The following theorem gives analytical upper bounds consistent with this asymptotic behavior.

Theorem 5 (Bounds on c∗). For all $\delta \ge 0$, c∗(δ) satisfies the bounds
(i) $c^*(\delta) \ge c^*(0)$.
(ii) $c^*(\delta) = c^*(0)$ if $b(y,t) = b(t)$.
(iii) $c^*(\delta) \le c^*(0) + \delta \sum_{j=1}^{N} \|b_1^j\|_\infty E_Q\bigl[|b_2^j|\bigr]$.
(iv) $c^*(\delta) \le c^*(0)\sqrt{1 + \delta^2 p_1}$.
From (iv), we also have

\[ c^*(\delta) \le c^*(0)\Bigl(1 + \frac{\delta^2 p_1}{2}\Bigr) + O(\delta^3) \]

when δ is small. We also have a linear lower bound on the growth of c∗(δ) as $\delta \to \infty$.

Theorem 6 (Linear growth of c∗). The non-random constant $\bar C \in [0,+\infty)$ defined by

\[ \liminf_{\delta\to\infty} \frac{c^*(\delta)}{\delta} = \bar C \tag{110} \]

is equal to zero if and only if $b(y,t) \equiv b(t)$.

Proof of Theorem 5. The first bound (i) follows from (47) and the formula

\[ c^*(\delta) = \inf_{\lambda>0} \frac{\mu(\lambda k) + f'(0)}{\lambda} \ge \inf_{\lambda>0} \Bigl(\frac{\lambda}{2} + \frac{f'(0)}{\lambda}\Bigr) = c^*(0). \]

When φ satisfies (109), the function $\psi = \log(\phi)$ satisfies

\[ \psi_t = \frac{1}{2}\Delta\psi + \frac{1}{2}\bigl|\nabla_y\psi\bigr|^2 - \lambda b(y,t), \qquad \psi(y,0) \equiv 0. \tag{111} \]


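Integrating (111) over $D \times [0,t]$, we have

\[ \frac{1}{t}\int_D \psi(y,t)\, dy = \frac{1}{2t}\int_0^t\!\!\int_D \bigl|\nabla_y\psi\bigr|^2\, dy\, dt - \frac{\lambda}{t}\int_0^t\!\!\int_D b(y,t)\, dy\, dt. \]

Now let $t \to \infty$:

\[
\rho(\lambda) = \lim_{t\to\infty} \frac{1}{t}\int_D \psi(y,t)\, dy \ge -\lambda \lim_{t\to\infty} \frac{1}{t}\int_0^t\!\!\int_D b(y,t)\, dy\, dt = -\lambda E_Q\Bigl[\int_D b(y,t)\, dy\Bigr] = 0,
\tag{112}
\]

almost surely with respect to Q. If $b(y,t) = b(t)$, then the first integral on the right-hand side of (112) vanishes since $|\nabla_y\psi|^2 \equiv 0$. Then taking the limit as $t \to \infty$, we have equality:

\[ \rho(\lambda) = E_Q\Bigl[\int_D \delta b(t)\, dy\Bigr] = 0. \]

Hence $c^*(\delta) = c^*(0)$. This proves part (ii).

For the linear upper bound (iii), note that

\[
\frac{1}{t}\log E\Bigl[e^{\lambda\delta\int_0^t b(W(s),t-s)\, ds}\Bigr]
\le \frac{1}{t}\log E\Bigl[e^{|\lambda|\delta\sum_j \|b_1^j\|_\infty \int_0^t |b_2^j(t-s)|\, ds}\Bigr]
= |\lambda|\,\delta \sum_{j=1}^{N} \|b_1^j\|_\infty \frac{1}{t}\int_0^t \bigl|b_2^j(s)\bigr|\, ds.
\]

As $t \to \infty$, this last term converges almost surely to $|\lambda|\,\delta\sum_{j=1}^{N}\|b_1^j\|_\infty E_Q[|b_2^j|]$. Therefore, c∗(δ) always satisfies the linear upper bound

\[
c^*(\delta) = \inf_{\lambda>0} \frac{\mu(\lambda k) + f'(0)}{\lambda}
\le \inf_{\lambda>0}\Bigl(\frac{\lambda}{2} + \frac{f'(0)}{\lambda}\Bigr) + \delta\sum_{j=1}^{N}\|b_1^j\|_\infty E_Q\bigl[|b_2^j|\bigr]
= c^*(0) + \delta\sum_{j=1}^{N}\|b_1^j\|_\infty E_Q\bigl[|b_2^j|\bigr].
\tag{113}
\]

Finally, for upper bound (iv), observe that under the scaling $b \to \lambda\delta b$, the constant $p_1$ defined in Assumption A5 can be replaced by $p_1 \to \lambda^2\delta^2 p_1$. Then by (32), $\rho(\lambda) \le \frac{1}{2}\lambda^2\delta^2 p_1$ and

\[
\begin{aligned}
c^*(\delta) = \inf_{\lambda>0} \frac{\mu(\lambda k) + f'(0)}{\lambda}
&\le \inf_{\lambda>0}\Bigl(\frac{\lambda}{2} + \frac{f'(0)}{\lambda} + \frac{\lambda^2\delta^2 p_1}{2\lambda}\Bigr) \\
&= 2\sqrt{(1+\delta^2 p_1)\, f'(0)/2} \\
&= c^*(0)\sqrt{1+\delta^2 p_1} \\
&= c^*(0)\Bigl(1 + \frac{\delta^2 p_1}{2}\Bigr) + O(\delta^3).
\end{aligned}
\tag{114}
\]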
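The chain (114) rests on the elementary identity $\inf_{\lambda>0}(a\lambda + b/\lambda) = 2\sqrt{ab}$, applied with $a = (1+\delta^2 p_1)/2$ and $b = f'(0)$. The following Python sketch checks the closed form $c^*(0)\sqrt{1+\delta^2 p_1}$ against a direct numerical minimization over λ. The sample values of $f'(0)$, δ and $p_1$ are arbitrary assumptions, and the golden-section routine is a generic helper, not something taken from the paper.

```python
import math


def minimise_unimodal(g, lo, hi, iterations=200):
    """Golden-section search for the minimum value of a unimodal g on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iterations):
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if g(x1) < g(x2):
            hi = x2
        else:
            lo = x1
    return g(0.5 * (lo + hi))


def speed_bound_numeric(f_prime_0, delta, p1):
    """Minimise lambda/2 + f'(0)/lambda + lambda*delta^2*p1/2 over lambda > 0."""
    def g(lam):
        return 0.5 * lam * (1.0 + delta ** 2 * p1) + f_prime_0 / lam
    return minimise_unimodal(g, 1e-3, 1e3)


def speed_bound_closed_form(f_prime_0, delta, p1):
    """c*(0) * sqrt(1 + delta^2 p1), with c*(0) = sqrt(2 f'(0))."""
    c_star_0 = math.sqrt(2.0 * f_prime_0)
    return c_star_0 * math.sqrt(1.0 + delta ** 2 * p1)


# Arbitrary sample parameters (assumptions for illustration only).
for delta in (0.0, 0.5, 2.0, 10.0):
    print(delta, speed_bound_numeric(1.0, delta, 0.3), speed_bound_closed_form(1.0, delta, 0.3))
```

For small δ the two columns stay close to $c^*(0)(1 + \delta^2 p_1/2)$, while for large δ they grow linearly in δ, matching the two regimes described before Theorem 5.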


In proving Theorem 6, we will make use of the following lemma:

Lemma 11. The infimum of the curve $\frac{\mu(\lambda k) + f'(0)}{\lambda}$ over $(0,\infty)$ is achieved at a unique point $\lambda^* \in (0,\lambda_0]$, where $\lambda_0 = \sqrt{f'(0)/2}$. Moreover, there are no other local minima.

We will also make use of the following growth estimate on the principal Lyapunov exponents:

Proposition 2. There is a constant $K > 0$ such that for λ sufficiently large, $\rho(\lambda) \ge K\lambda$.

Proof of Lemma 11. This follows from the fact that $\mu(\lambda k) = \lambda^2/2 + \rho(\lambda)$ with ρ being convex in λ and $\rho(0) = 0$ (see discussion leading to (47)). The point $\lambda_0$ is the value of λ where the infimum of the curve $\lambda/2 + f'(0)/\lambda$ is attained.

Proof of Proposition 2. In the case that b(y, t) is a Gaussian field with white-noise time dependence, the authors of [12] studied the behavior of ρ(κ) as $\kappa \to 0$, where $\kappa > 0$ is a diffusion constant (replace with κ in (1)). Here we modify their strategy in order to treat the large advection limit when b has the form (107). For clarity of notation, we assume $y \in D = [-L/2, L/2]$. The argument generalizes to multiple dimensions in a straightforward way.

For $0 \le s < t < \infty$, let $A^k_{s,t}$ be the set of functions $g \in C^{0,1}([s,t];\mathbb{R})$ such that $g(s) = g(t) = 0$, and $\|g'\|_\infty \le k$. Define the random variable

\[ I^k(s,t) = \sup_{f\in A^k_{s,t}} \int_0^{t-s} b(f(\tau), t-\tau)\, d\tau. \]

The variable $I^k(s,t)$ is super-additive, and $\tau_h I^k(s,t,\hat\omega) = I^k(s+h,t+h,\hat\omega)$. Then, the sub-additive ergodic theorem implies that the limit

\[ \lim_{t\to\infty} \frac{1}{t} I^k(0,t) = \zeta(k) \]

exists Q-almost surely, and that ζ(k) is a non-random constant given by the formula

\[ \zeta(k) = \sup_{t>0} \frac{1}{t} E_Q\bigl[I^k(0,t)\bigr]. \tag{115} \]

We claim that $\zeta(k) > 0$. Therefore, given $\varepsilon \in (0,1)$,

\[ Q\Bigl( \sup_{f\in A^k_{0,t}} \int_0^t b(f(s), t-s)\, ds \ge (\zeta(k) - \varepsilon)t \Bigr) \ge 1 - \varepsilon \tag{116} \]

if t is sufficiently large. That is, for any small ε there is a set of probability at least $(1-\varepsilon)$ such that we can find $f(s) = f(s,\hat\omega) \in A^k_{0,t}$ satisfying

\[ \int_0^t b(f(s,\hat\omega), t-s, \hat\omega)\, ds \ge \zeta(k)(1-\varepsilon)t, \tag{117} \]

and we expect that Brownian paths staying close to this f will make a significant contribution to the exponential in the definition of ρ(λ). For a constant $\gamma > 0$ to be determined and $f \in A^k_{0,t}$, we let $B_t(f,\gamma)$ be the γ-neighborhood of f in $C([0,t];D)$:

\[ B_t(f,\gamma) = \bigl\{ X \in C([0,t],D) \;\big|\; \|X - f\|_{C^0} < \gamma \bigr\}. \]


Using the Girsanov transformation, one can show that there are constants $K_1, K_2$ independent of λ, t, and $f \in A^k_{0,t}$ such that

\[ P(B_t(f,\gamma)) \ge K_1 e^{-K_2(k^2 + 1/\gamma^2)t} \]

for $t > 1$. Because the $\{b_1^j(y)\}$ are assumed to be Lipschitz continuous, we see that for any path $X \in B_t(f,\gamma)$,

\[ \left| \int_0^t b(X(s), t-s)\, ds - \int_0^t b(f(s), t-s)\, ds \right| < \gamma M \sum_{j=1}^{N} \int_0^t \bigl|b_2^j(s)\bigr|\, ds, \tag{118} \]

where M is the maximum of the Lipschitz constants for the functions $\{b_1^j(y)\}_{j=1}^N$. By (117) and (118) with $\varepsilon > 0$ sufficiently small, there is a set of Q-probability at least $(1-\varepsilon)$ such that

\[
\begin{aligned}
E_P\Bigl[e^{\lambda\int_0^t b(W_s^y(s),t-s)\, ds}\Bigr]
&\ge E_P\Bigl[e^{\lambda\int_0^t b(W_s^y(s),t-s)\, ds}\, \chi_{B_t(f,\gamma)}\Bigr] \\
&\ge e^{\lambda\zeta(k)(1-\varepsilon)t}\, e^{-\lambda\gamma MV}\, P(B_t(f,\gamma)) \\
&\ge e^{\lambda\zeta(k)(1-\varepsilon)t}\, e^{-\lambda\gamma MV}\, K_1 e^{-K_2(k^2+1/\gamma^2)t},
\end{aligned}
\tag{119}
\]

where $V = \sum_{j=1}^{N}\int_0^t |b_2^j(s)|\, ds$ and $f \in A^k_{0,t}$ is chosen to satisfy (117). For t large, independent of λ, k, and γ, V can be bounded by

\[ V \le t\Bigl( \sum_{j=1}^{N} E\bigl[|b_2^j(0)|\bigr] + 1 \Bigr) \]

except on a set of probability less than ε. Therefore, we can choose γ small so that

\[ \gamma \le \frac{\varepsilon\,\zeta(k)}{M\bigl(\sum_{j=1}^{N} E[|b_2^j(0)|] + 1\bigr)}. \]

Hence $e^{\lambda\zeta(k)(1-\varepsilon)t}\, e^{-\lambda\gamma MV} \ge e^{\lambda\zeta(k)(1-2\varepsilon)t}$ for t sufficiently large. Then by choosing λ large,

\[ \lambda \ge \frac{K_2(k^2 + 1/\gamma^2)}{\varepsilon\,\zeta(k)}, \]

we obtain from (119),

\[ E_P\Bigl[e^{\lambda\int_0^t b(W_s^y(s),t-s)\, ds}\Bigr] \ge e^{\lambda(\zeta(k)-3\varepsilon)t} \]

with Q-probability at least $(1-2\varepsilon)$, for t sufficiently large, independent of λ. Since the limit defining ρ(λ) exists Q-almost surely, this establishes the proposition with $K = \zeta(k)(1-3\varepsilon)$, for any $\varepsilon \in (0,1)$, $k > 0$.

It remains to establish the claim that $\zeta(k) > 0$. Note that for all $k \ge 0$,

\[ E_Q\bigl[I^k(0,t)\bigr] \ge \sup_{f\in A^k_{0,t}} E_Q\Bigl[\int_0^t b(f(s), t-s)\, ds\Bigr] = 0. \]

Also, $E_Q[I^{k_2}(0,t)] \ge E_Q[I^{k_1}(0,t)]$ whenever $k_2 > k_1$, since $A^{k_2}_{0,t} \supset A^{k_1}_{0,t}$. Without loss of generality, suppose that there is an $\varepsilon > 0$ such that for all $j = 1,\dots,N$ we have $b_1^j(y) - b_1^j(0) \ne 0$ if $|y| < \varepsilon$ and $y \ne 0$. This means that the $b^j(y)$ do not have a flat spot touching $y = 0$. Define the set

\[ G = \bigl\{ \hat\omega \;\big|\; b_1^j(y)\, b_2^j(s) > b_1^j(0)\, b_2^j(s),\ \forall s \in [0,1],\ y \in (0,\varepsilon),\ j = 1,\dots,N \bigr\}. \]

Then $Q(G) > 0$. For $k > 0$, let $\tilde f \in A^k_{0,1}$ such that $\tilde f(s) \in (0,\varepsilon)$ for $s \in (0,1)$. Then we have

\[
\begin{aligned}
E_Q\bigl[I^k(0,1)\bigr]
&= E_Q\Bigl[\sup_{f\in A^k_{0,1}} \int_0^1 b(f(s), 1-s)\, ds\, \chi_G\Bigr] + E_Q\Bigl[\sup_{f\in A^k_{0,1}} \int_0^1 b(f(s), 1-s)\, ds\, \chi_{G^c}\Bigr] \\
&\ge E_Q\Bigl[\int_0^1 b(\tilde f(s), 1-s)\, ds\, \chi_G\Bigr] + E_Q\Bigl[\int_0^1 b(0, 1-s)\, ds\, \chi_{G^c}\Bigr] \\
&> E_Q\Bigl[\int_0^1 b(0, 1-s)\, ds\, \chi_G\Bigr] + E_Q\Bigl[\int_0^1 b(0, 1-s)\, ds\, \chi_{G^c}\Bigr] \\
&= E_Q\Bigl[\int_0^1 b(0, 1-s)\, ds\Bigr] = 0.
\end{aligned}
\]

Combining this with (115) establishes the claim that $\zeta(k) > 0$ for all $k > 0$.

Proof of Theorem 6. The fact that $\bar C \in [0,+\infty)$ follows from Theorem 5. Also, if $b(y,t) \equiv b(t)$ then $\bar C = 0$ since $c^*(\delta) = c^*(0)$ for all $\delta > 0$. By Lemma 11 there is a unique $\lambda = \lambda_\delta \in (0,\lambda_0]$ such that

\[ c^*(\delta) = \inf_{\lambda>0} \frac{\mu(\lambda k) + f'(0)}{\lambda} = \frac{\mu(\lambda_\delta k) + f'(0)}{\lambda_\delta}. \]

Let $\delta_j \to \infty$ as $j \to \infty$ and suppose that $\limsup_{j\to\infty}(\lambda_{\delta_j}\delta_j) \le M$. This implies that

\[ \liminf_{j\to\infty} \frac{c^*(\delta_j)}{\delta_j} = \liminf_{j\to\infty} \frac{\mu(\lambda_{\delta_j}k) + f'(0)}{\lambda_{\delta_j}\delta_j} \ge \liminf_{j\to\infty} \frac{f'(0)}{\lambda_{\delta_j}\delta_j} \ge \frac{f'(0)}{M} > 0. \]

So, in this case the result holds with $\bar C = f'(0)/M$. Now suppose $\lambda_{\delta_j}\delta_j$ is unbounded as $j \to \infty$. By Proposition 2, there is a positive constant K such that $\rho(\lambda_{\delta_j}\delta_j) \ge K\lambda_{\delta_j}\delta_j > 0$ for j sufficiently large. Note that Proposition 2 treats the case of $\delta = 1$; this is why we use $\rho(\lambda_{\delta_j}\delta_j)$ instead of $\rho(\lambda_{\delta_j})$. Therefore,

\[ \liminf_{j\to\infty} \frac{c^*(\delta_j)}{\delta_j} = \liminf_{j\to\infty} \Bigl( \frac{\lambda_{\delta_j}}{2\delta_j} + \frac{f'(0)}{\lambda_{\delta_j}\delta_j} + \frac{\rho(\lambda_{\delta_j}\delta_j)}{\lambda_{\delta_j}\delta_j} \Bigr) \ge K > 0, \]

since $\lambda_{\delta_j} \in (0,\lambda_0]$ and $\lambda_{\delta_j}\delta_j \to \infty$. Hence $\bar C \ge K > 0$.

Theorems 5 and 6 give linear upper and lower bounds on the enhancement of c∗ as the flow intensity increases. However, experiments with premixed flames have shown that increasing turbulence intensity does not lead to unlimited linear enhancement of the turbulent burning rate [36]. Denet [13] has proposed that this "bending" of the turbulent burning velocity in high-intensity flows can be explained by a rapid temporal decorrelation of the flow (see also Ashurst [3]). For the present model, the following upper bound confirms the hypothesis that rapid temporal decorrelation leads to sub-linear enhancement of the front speed. Notice that the derivation uses no information about the spatial structure of the flow other than the maximum value ($\|b_1^j\|_\infty$). As a result, it is likely that the actual speed may grow more slowly than $\delta^{1/2}$, or that c∗ eventually decreases with δ, as suggested by the numerical experiments of [13] and [3] for temporally periodic flows.

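Corollary 3. For $\delta > 0$, let $\{b_2^j(t)\}_{j=1}^N$ be a family of stationary Gaussian fields on $[0,\infty)$ satisfying $E_Q[b_2^j(s)\, b_2^k(t)] \le C_1 e^{-\alpha_{j,k}|t-s|}$, where $\alpha_{j,k} > 0$ and $C_1 > 0$. Then for the scaled flow $b_\delta(y,t) = \sum_{j=1}^N \delta\, b_1^j(y)\, b_2^j(\delta t)$,

\[ \limsup_{\delta\to\infty} \frac{c^*(\delta)}{\sqrt{\delta}} < +\infty. \tag{120} \]

Proof. For the flow $\sum_{j=1}^N b_1^j(y)\, b_2^j(\delta t)$,

\[
\hat\Gamma(r) = \sup_{y_1,y_2} \Gamma(y_1,y_2,0,r)
\le \sum_{j,k} \|b_1^j\|_\infty \|b_1^k\|_\infty\, E_Q\bigl[b_2^j(0)\, b_2^k(\delta r)\bigr]
\le \sum_{j,k} \|b_1^j\|_\infty \|b_1^k\|_\infty\, C_1 e^{-\alpha_{j,k}\delta|r|}.
\tag{121}
\]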

Then from (3) we have

\[ p_1 = \int_0^\infty \hat\Gamma(r)\, dr \le C_1 \sum_{j,k} \|b_1^j\|_\infty \|b_1^k\|_\infty \frac{1}{\alpha_{j,k}\,\delta}. \]
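Feeding this estimate of $p_1$ into bound (iv) of Theorem 5, with the amplitude factor δ of the scaled flow, gives $c^*(\delta) \le c^*(0)\sqrt{1 + \delta^2 p_1} \le c^*(0)\sqrt{1 + P\delta}$, where $P = C_1\sum_{j,k}\|b_1^j\|_\infty\|b_1^k\|_\infty/\alpha_{j,k}$. The Python sketch below tabulates the resulting upper bound on $c^*(\delta)/\sqrt{\delta}$. All numerical inputs (the sup norms, the decay rates $\alpha_{j,k}$, $C_1$ and $c^*(0)$) are invented for illustration, and the function names are hypothetical.

```python
import math


def p1_bound(C1, sup_norms, alpha, delta):
    """Upper bound on p1 for time scaling t -> delta * t, as in the display above."""
    n = len(sup_norms)
    total = sum(
        sup_norms[j] * sup_norms[k] / alpha[j][k]
        for j in range(n)
        for k in range(n)
    )
    return C1 * total / delta


def speed_bound_over_sqrt_delta(c_star_0, C1, sup_norms, alpha, delta):
    """Upper bound on c*(delta) / sqrt(delta) obtained from Theorem 5 (iv)."""
    p1 = p1_bound(C1, sup_norms, alpha, delta)
    return c_star_0 * math.sqrt(1.0 + delta ** 2 * p1) / math.sqrt(delta)


# Invented sample data: two modes with sup norms 1 and 0.5.
sup_norms = [1.0, 0.5]
alpha = [[1.0, 2.0], [2.0, 4.0]]
c_star_0 = math.sqrt(2.0)
for delta in (10.0, 100.0, 1.0e6):
    print(delta, speed_bound_over_sqrt_delta(c_star_0, 1.0, sup_norms, alpha, delta))
```

The tabulated ratio decreases towards the finite value $c^*(0)\sqrt{P}$ as δ grows, which is the sub-linear $\sqrt{\delta}$ growth asserted in (120).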

The result now follows from part (iv) of Theorem 5.

9. Conclusions

We have considered the propagation of KPP reaction fronts in temporally random shear flows with sufficiently decaying correlations. We showed that, under Assumptions A1-A6 on a Gaussian shear field, the front speeds obey a variational formula that extends the known variational formula in the case of periodic media. Using this formula, we derived basic bounds on the front speeds. As a function of large shear root mean square amplitude, the front speed obeys linear growth at fixed correlation length. However, front speed growth becomes sublinear if there is sufficient temporal decorrelation. Developing methods to generalize the variational front speed formula for nonshear random flows is left for future work.

Acknowledgement. The work was partially supported by NSF grants ITR-0219004, SCREMS-0322962, DMS-0506766. J. X. would like to thank Prof. M. Cranston for helpful communications. J. N. is grateful for support through a VIGRE graduate fellowship at UT Austin.

References 1. Adler, R.: An Introduction to Continuity, Extrema and Related Topics for General Gaussian Processes. Institute of Math Stat, Lecture Notes-Monograph Series, 12, 1990 2. Akcoglu, M.A., Krengel, U.: Ergodic theorems for superadditive processes. J. Reine Angew Math. 323, 53–67 (1981)

KPP Fronts in Temporally Random Shear Flows

531

3. Ashurst, Wm.T.: Flow-frequency effect upon Huygens front propagation. Combust. Theory Modelling 4, 99–105 (2000) 4. Berestycki, H.: The influence of advection on the propagation of fronts in reaction-diffusion equations. In: Nonlinear PDEs in Condensed Matter and Reactive Flows. NATO Science Series C 569, Berestycki, H., Pomeau, Y. eds. Doordrecht: Kluwer, 2003 5. Berestycki, H., Hamel, F.: Front Propagation in Periodic Excitable Media. Comm. Pure Appl. Math. 60, 949–1032 (2002) 6. Berestycki, H., Hamel, F., Nadirashvili, N.: Elliptic eigenvalue problems with large drift and applications to nonlinear propagation phenomena. Commun. Math Phys., 253(2), 451–480, (2005) 7. Berestycki, H., Nirenberg, L.: Travelling fronts in cylinders. Ann. Inst. H. Poincaré Anal. Non Linéaire 9, 497–572 (1992) 8. Carmona, R.A., Molchanov, S.A.: Parabolic Anderson problem and intermittency. Mem. Amer. Math. Soc. 108(518), viii+125 (1994) 9. Clavin, P., Williams, F.A.: Theory of premixed-flame propagation in large-scale turbulence. J. Fluid Mech. 90, 598–604 (1979) 10. Conlon, J., Doering, C. On Traveling Waves for the Stochastic FKKP Equation. J. Stat Phys. 120(3–4), 421–477 (2005) 11. Constantin, P., Kiselev, A., Oberman, A., Ryzhik, L.: Bulk burning rate in passive-reactive diffusion. Arch Rat. Mech Analy 154, 53–91 (2000) 12. Cranston, M., Mountford, T.: Lyapunov exponent for the parabolic Anderson model in R d . J. Funct. Anal. 236, 78–119 (2006) 13. Denet, B.: Possible role of temporal correlations in the bending of turbulent flame velocity. Combust. Theory Modelling 3, 585–589 (1999) 14. E, W., Sinai, Y.: New results in mathematical and statistical hydrodynamics. Russ. Math. Surv. 55(4), 635–666 (2000) 15. Ellis, R.S.: Entropy, Large Deviations, and Statistical Mechanics. New York: Springer-Verlag, 1985 16. Freidlin, M.I.: Functional Integration and Partial Differential Equations. Ann. Math. Stud. 109, Princeton, NJ: Princeton University Press, 1985 17. 
Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems. New York: Springer-Verlag, 1998 18. Gärtner, J., Freidlin, M.I.: The propagation of concentration waves in periodic and random media. Dokl. Acad. Nauk SSSR 249, 521–525 (1979) 19. Heinze, S., Papanicolaou, G., Stevens, A.: Variational principles for propagation speeds in inhomogeneous media. SIAM J. Applied Math. 62(1) 129–148 (2001) 20. Karatzas, I., Shreve, S.: Brownian Motion and Stochastic Calculus. New York: Springer-Verlag, 1991 21. Kato, T.: Perturbation Theory for Linear Operators. Berlin: Springer-Verlag, 1995 22. Khouider, B., Bourlioux, A., Majda, A.: Parameterizing turbulent flame speed-Part I: unsteady shears, flame residence time and bending. Combustion Theory and Modeling 5, 295–318 (2001) 23. Kingman, J.P.C.: The Ergodic Theory of Subadditive Stochastic Processes. J. Royal. Stat. Soc. Series B, 30(3), 499–510 (1968) 24. Kiselev, A., Ryzhik, L.: Enhancement of the traveling front speeds in reaction-diffusion equations with advection. Ann. de l’Inst. Henri Poincaré, Analyse Nonlinéaire, 18, 309–358 (2001) 25. Majda, A., Souganidis, P.E.: Large scale front dynamics for turbulent reaction-diffusion equations with separated velocity scales. Nonlinearity 7, 1–30 (1994) 26. Majda, A., Souganidis, P.E.: Flame fronts in a turbulent combustion model with fractal velocity fields. Comm. Pure Appl. Math. LI 1337–1348 (1998) 27. Mierczynski, J., Shen, W.: Exponential separation and principal Lyapunov exponent/spectrum for random/nonautonomous parabolic equations. J. Differ. Eqs. 191, 175–205 (2003) 28. Mueller, C., Sowers, R.: Random Traveling Waves for the KPP equation with Noise. J. Funct. Anal. 128, 439–498 (1995) 29. Nolen, J., Rudd, M., Xin, J.: Existence of KPP fronts in spatially-temporally periodic advection and variational principle for propagation speeds. Dynamics of PDE 2(1), 1–24, (2005) 30. 
Nolen, J., Xin, J.: Reaction diffusion front speeds in spatially-temporally periodic shear flows. SIAM J. Multiscale Modeling and Simulation 1(4), 554–570 (2003) 31. Nolen, J., Xin, J.: Min-Max Variational Principle and Front Speeds in Random Shear Flows. Meth. Appl. Anal. 11(4), 635–644 (2004) 32. Nolen, J., Xin, J.: Existence of KPP type fronts in space–time periodic shear flows and a study of minimal speeds based on variational principle. Discrete and Cont. Dyn. Syst. 13(5), 1217–1234 (2005) 33. Nolen, J., Xin, J.: A Variational Principle Based Study of KPP Minimal Front Speeds in Random Shears. Nonlinearity 18, 1655–1675 (2005) 34. Nolen, J., Xin, J.: Variational Principle Based Computation of KKP Front Speeds in Temporally Random Shear Flows. In preparation, 2006


35. Peters, N.: Turbulent Combustion. Cambridge: Cambridge University Press, 2000 36. Ronney, P.: Some open issues in premixed turbulent combustion. In: Modeling in Combustion Science (Buckmaster, J.D., Takeno, T. eds. Lecture Notes In Physics, 449, Berlin: Springer-Verlag, (1995), pp. 3–22 37. Shen, W.: Traveling Waves in Diffusive Random Media. J. Dyn. Diff. Eqs. 16(4), 1011–1060 (2004) 38. Vladimirova, N., Constantin, P., Kiselev, A., Ruchayskiy, O., Ryzhik, L.: Flame enhancement and quenching in fluid flows. Combust. Theory and Modeling 7, 487–508 (2003) 39. Xin, J.: Existence and stability of travelling waves in periodic media governed by a bistable nonlinearity. J. Dyn. Diff. Eqs. 3, 541–573 (1991) 40. Xin, J.: Existence of planar flame fronts in convective–diffusive periodic media. Arch. Rat. Mech. Anal. 121, 205–233 (1992) 41. Xin, J.: Front propagation in heterogeneous media. SIAM Review 42(2), 161–230 (2000) 42. Xin, J.: KPP front speeds in random shears and the parabolic Anderson problem. Meth. Appl. Anal. 10(2), 191–198 (2003) 43. Yakhot, V.: Propagation velocity of premixed turbulent flames. Comb. Sci. Tech 60, 191 (1988) Communicated by P. Constantin

Commun. Math. Phys. 269, 533–543 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0117-y

Communications in

Mathematical Physics

Entropy Production in Gaussian Thermostats

Nurlan S. Dairbekov^1, Gabriel P. Paternain^2

1 Kazakh British Technical University, Tole bi 59, 050000 Almaty, Kazakhstan. E-mail: [email protected]

2 Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge CB3 0WB, England. E-mail: [email protected]

Received: 4 January 2006 / Accepted: 11 May 2006 Published online: 19 September 2006 – © Springer-Verlag 2006

Abstract: We show that an arbitrary Anosov Gaussian thermostat on a surface is dissipative unless the external field has a global potential. This result is obtained by studying the cohomological equation of more general thermostats using the methods in [3].

1. Introduction

Gaussian thermostats provide interesting models in nonequilibrium statistical mechanics [6, 9, 21]. Given a closed Riemannian manifold $(M, g)$ and a vector field $E$ (the external field) on $M$, the Gaussian thermostat (or isokinetic dynamics, cf. [13]) is given by the differential equation

\[ \frac{D\dot\gamma}{dt} = E(\gamma) - \frac{\langle E(\gamma), \dot\gamma\rangle}{|\dot\gamma|^2}\,\dot\gamma. \tag{1} \]
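Equation (1) is built so that the speed $|\dot\gamma|$ is constant along solutions, which is why it defines a flow on the unit sphere bundle. The following is a minimal numerical sketch of that fact, under simplifying assumptions that are not in the paper: the manifold is replaced by the flat plane (so $D/dt$ is the ordinary derivative), the external field is constant, and the integrator is a hand-written fourth-order Runge-Kutta step. The names `thermostat_rhs`, `rk4_step`, `integrate` and `constant_field` are hypothetical helpers introduced only for this illustration.

```python
import math

def thermostat_rhs(state, E):
    """Right-hand side of (1) in the flat plane (an assumption of this sketch):
    x' = v,  v' = E - (<E, v> / |v|^2) v."""
    x, y, vx, vy = state
    Ex, Ey = E(x, y)
    s = (Ex * vx + Ey * vy) / (vx * vx + vy * vy)
    return (vx, vy, Ex - s * vx, Ey - s * vy)

def rk4_step(state, E, dt):
    """One classical Runge-Kutta step for the first-order system above."""
    def shift(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = thermostat_rhs(state, E)
    k2 = thermostat_rhs(shift(state, k1, dt / 2), E)
    k3 = thermostat_rhs(shift(state, k2, dt / 2), E)
    k4 = thermostat_rhs(shift(state, k3, dt), E)
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def integrate(state, E, dt=0.01, steps=500):
    """Advance the state by `steps` Runge-Kutta steps of size `dt`."""
    for _ in range(steps):
        state = rk4_step(state, E, dt)
    return state

def constant_field(x, y):
    """Hypothetical external field E = (1, 0), chosen only for illustration."""
    return (1.0, 0.0)

start = (0.0, 0.0, 0.0, 1.0)            # unit velocity, orthogonal to E
end = integrate(start, constant_field)
speed = math.hypot(end[2], end[3])      # stays close to 1 up to integration error
```

In this toy setting the velocity turns towards the direction of $E$ while its length stays equal to 1 up to integration error. The flat plane is neither closed nor Anosov, so the sketch illustrates only the isokinetic constraint, not the hypotheses of the theorems below.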

This equation defines a flow $\phi$ on the unit sphere bundle $SM$ of $M$ which reduces to the geodesic flow when $E = 0$. In general, Gaussian thermostats are not volume preserving and the purpose of the present paper is to characterize precisely those Anosov Gaussian thermostats in 2 degrees of freedom which do not preserve any smooth measure. When $\phi$ is Anosov and $\dim M = 2$ a result of E. Ghys [11] ensures that $\phi$ is topologically conjugate to the geodesic flow of a metric of constant negative curvature and thus $\phi$ is transitive and topologically mixing. For such a flow it is well known (cf. [14, Chap. 20]) that there exists a unique Gibbs state $\rho$ associated with the Hölder continuous potential $-\frac{d}{dt}\big|_{t=0}\log J^u_t$, where $J^u_t$ is the unstable Jacobian of $\phi$. The measure $\rho$ is characterized by being the maximum of

\[ \nu \mapsto h_\nu(\phi) - \int \frac{d}{dt}\Big|_{t=0}\log J^u_t \, d\nu, \]


where $\nu$ runs over all $\phi$-invariant Borel probability measures and $h_\nu(\phi)$ is the measure theoretic entropy of $\phi$ with respect to $\nu$. The unique measure $\rho$ is called the SRB measure of $\phi$. If $\tau$ is a probability measure which is absolutely continuous with respect to the Liouville measure of $SM$, then $\rho$ is also the weak limit of $\frac{1}{T}\int_0^T \phi_t^*\tau\,dt$ as $T \to \infty$. The entropy production of the state $\rho$ is given by (cf. [20])

\[ e_\phi(\rho) := -\int \operatorname{div}F\, d\rho = -\sum \text{Lyapunov exponents}, \]

where $F$ is the infinitesimal generator of $\phi$ and $\operatorname{div}F$ is the divergence of $F$ with respect to any volume form in $SM$. Fix a volume form $\Theta$ on $SM$. Any other volume form can be written as $f\Theta$ for some smooth positive function $f$. If we let $L_F$ be the Lie derivative of $\Theta$ along $F$, then

\[ L_F(f\Theta) = d(i_F f\Theta) = F(f)\Theta + f L_F\Theta = F(f)\Theta + f \operatorname{div}F\,\Theta. \]

Hence if $\widetilde{\operatorname{div}}F$ denotes the divergence of $F$ with respect to $f\Theta$ we have

\[ \widetilde{\operatorname{div}}F = F(\log f) + \operatorname{div}F. \tag{2} \]

In other words the two divergences are flow cohomologous (and thus $e_\phi$ is well defined for any $\phi$-invariant measure). Ruelle [20] has shown that $e_\phi(\rho) \ge 0$ with equality if and only if $\rho$ is also the SRB measure of the flow $\phi_{-t}$. If $\rho$ is an SRB measure for both $\phi_t$ and $\phi_{-t}$ then the theory of Gibbs states for Anosov flows (cf. [14, Prop. 20.3.10]) implies that $-\frac{d}{dt}\big|_{t=0}\log J^u_t$ and $\frac{d}{dt}\big|_{t=0}\log J^s_t$ are cohomologous (and the coboundary is the derivative along the flow of a Hölder continuous function). It follows that $\phi$ preserves an absolutely continuous invariant measure with positive continuous density (and this measure would have to be $\rho$). An application of the smooth Livšic theorem [15, Cor 2.1] shows that $\phi$ preserves an absolutely continuous invariant measure with positive continuous density if and only if $\phi$ preserves a smooth volume form. Using (2) we see that $e_\phi(\rho) = 0$ if and only if $\operatorname{div}F$ is a flow coboundary and we can take $\operatorname{div}F$ with respect to any volume form. Let $\theta$ be the 1-form dual to $E$, i.e., $\theta_x(v) = \langle E(x), v\rangle$. An easy calculation (see Lemma 3.2) shows that if we consider in $SM$ the volume form determined by the canonical contact 1-form, then $\operatorname{div}F(x, v) = -\theta_x(v)$. Thus $e_\phi(\rho) = 0$ if and only if there is a smooth solution $u$ to the cohomological equation

\[ F(u) = \theta. \tag{3} \]

We will show as a consequence of a more general result to be stated below that if dim M = 2 then (3) holds if and only if θ is an exact form, i.e. if and only if E has a global potential. Thus we obtain: Theorem A. An Anosov Gaussian thermostat on a closed surface has zero entropy production if and only if the external field E has a global potential. A system with eφ (ρ) > 0 is referred to as dissipative. Dissipative Gaussian thermostats provide a large class of examples to which one can apply the Fluctuation Theorem of G. Gallavotti and E.G.D. Cohen [7, 8, 5] (extended to Anosov flows by G. Gentile [10]) and this theorem is perhaps one of the main motivations for determining precisely which thermostats are dissipative. Observe that Gaussian thermostats are reversible in


the sense that the flip $(x, v) \mapsto (x, -v)$ conjugates $\phi_t$ with $\phi_{-t}$ (just as in the case of geodesic flows). We recall that the chaotic hypothesis of Gallavotti and Cohen asserts that for systems out of equilibrium, physically correct macroscopic results will be obtained by assuming that the microscopic dynamics is uniformly hyperbolic.

In [22], M. Wojtkowski proved Theorem A assuming that $E$ has a local potential (i.e. $\theta$ is closed) and in [1], F. Bonetto, G. Gentile and V. Mastropietro proved the theorem for the case of a metric of constant negative curvature and $\theta$ a small harmonic 1-form. We emphasize that we do not make any assumptions on $g$ or $E$ except that the underlying isokinetic dynamics is Anosov. Conditions under which the Anosov property holds have been given in [22, 23].

We now explain for which Anosov systems we can understand the cohomological equation (3) completely. Let $M$ be a closed manifold endowed with a Riemannian metric $g$. We consider a generalized isokinetic thermostat. This consists of a semibasic vector field $E(x, v)$, that is, a smooth map $TM \ni (x, v) \mapsto E(x, v) \in TM$ such that $E(x, v) \in T_xM$ for all $(x, v) \in TM$. As before the equation

\[ \frac{D\dot\gamma}{dt} = E(\gamma, \dot\gamma) - \frac{\langle E(\gamma, \dot\gamma), \dot\gamma\rangle}{|\dot\gamma|^2}\,\dot\gamma \]

defines a flow $\phi$ on the unit sphere bundle $SM$. These generalized thermostats are no longer reversible unless $E(x, v) = E(x, -v)$. Suppose now that $M$ is a closed oriented surface. We can write

\[ E(x, v) = \kappa(x, v)\,v + \lambda(x, v)\,iv, \]

where $i$ indicates rotation by $\pi/2$ according to the orientation of the surface and $\kappa$ and $\lambda$ are smooth functions. The evolution of the thermostat on $SM$ can now be written as

\[ \frac{D\dot\gamma}{dt} = \lambda(\gamma, \dot\gamma)\, i\dot\gamma. \tag{4} \]

If $\lambda$ does not depend on $v$, then $\phi$ is the magnetic flow associated with the magnetic field $\lambda\Omega_a$, where $\Omega_a$ is the area form of $M$. Of course, magnetic flows are Hamiltonian. If $\lambda$ depends linearly on $v$, we obtain the Gaussian thermostat (1). Let $\pi : SM \to M$ be the canonical projection.

Theorem B. Let $M$ be a closed oriented surface and consider a generalized isokinetic thermostat (4). Suppose the flow $\phi$ is Anosov and let $F$ be the vector field generating $\phi$. Let $h \in C^\infty(M)$ and let $\theta$ be a smooth 1-form on $M$. Then the cohomological equation $F(u) = h \circ \pi + \theta$ has a solution $u \in C^\infty(SM)$ if and only if $h = 0$ and $\theta$ is exact.

Note that by the smooth Livšic theorem [15] saying that $h \circ \pi + \theta = F(u)$ is equivalent to saying that $h \circ \pi + \theta$ has zero integral over every closed orbit of $\phi$. Theorem B was proved in [3] for the case of magnetic flows (i.e. $\lambda$ depends only on $x$). It was surprising for us that the theorem also holds for systems that do not preserve a smooth measure. The proof is also based on establishing a Pestov identity as in [2, 4] for geodesic flows, but some unexpected cancellations take place producing in the end formulas which are just what one needs to prove the theorem. Earlier proofs of Theorem B for some geodesic and magnetic flows using Fourier analysis can be found in [12, 18].


Finally we note that Theorem A also holds if we allow magnetic forces. Indeed Theorem B holds for a generalized thermostat and $\operatorname{div}F = -\theta$ even when we have a magnetic field present. The extension of Theorem A to isoenergetic thermostats (i.e. in the presence of potential forces) is discussed in Remark 5.1.

2. Preliminaries

Let $M$ be a closed oriented surface, $SM$ the unit sphere bundle and $\pi : SM \to M$ the canonical projection. The latter is in fact a principal $S^1$-fibration and we let $V$ be the infinitesimal generator of the action of $S^1$. Given a unit vector $v \in T_xM$, we will denote by $iv$ the unique unit vector orthogonal to $v$ such that $\{v, iv\}$ is an oriented basis of $T_xM$. There are two basic 1-forms $\alpha$ and $\beta$ on $SM$ which are defined by the formulas:

\[ \alpha_{(x,v)}(\xi) := \langle d_{(x,v)}\pi(\xi), v\rangle; \qquad \beta_{(x,v)}(\xi) := \langle d_{(x,v)}\pi(\xi), iv\rangle. \]

The form $\alpha$ is the canonical contact form of $SM$ whose Reeb vector field is the geodesic vector field $X$. The volume form $\alpha \wedge d\alpha$ gives rise to the Liouville measure $d\mu$ of $SM$. A basic theorem in 2-dimensional Riemannian geometry asserts that there exists a unique 1-form $\psi$ on $SM$ (the connection form) such that $\psi(V) = 1$ and

\[ d\alpha = \psi \wedge \beta, \tag{5} \]
\[ d\beta = -\psi \wedge \alpha, \tag{6} \]
\[ d\psi = -(K \circ \pi)\, \alpha \wedge \beta, \tag{7} \]

where $K$ is the Gaussian curvature of $M$. In fact, the form $\psi$ is given by

\[ \psi_{(x,v)}(\xi) = \left\langle \frac{DZ}{dt}(0), iv \right\rangle, \]

where $Z : (-\varepsilon, \varepsilon) \to SM$ is any curve with $Z(0) = (x, v)$ and $\dot Z(0) = \xi$ and $\frac{DZ}{dt}$ is the covariant derivative of $Z$ along the curve $\pi \circ Z$. For later use it is convenient to introduce the vector field $H$ uniquely defined by the conditions $\beta(H) = 1$ and $\alpha(H) = \psi(H) = 0$. The vector fields $X$, $H$ and $V$ are dual to $\alpha$, $\beta$ and $\psi$ and as a consequence of (5–7) they satisfy the commutation relations

\[ [V, X] = H, \qquad [V, H] = -X, \qquad [X, H] = KV. \tag{8} \]

Equations (5–7) also imply that the vector fields X, H and V preserve the volume form α ∧dα and hence the Liouville measure. Note that the flow of H is given by R −1 ◦ gt ◦ R, where R(x, v) = (x, iv) and gt is the geodesic flow. 3. An Integral Identity Henceforth (M, g) is a closed oriented surface and X , H , and V are the same vector fields on S M as in the previous section. Let λ be the smooth function on S M given by (4), and let F = X + λV be the generating vector field of the generalized thermostat.


From (8) we obtain:

\[ [V, F] = H + V(\lambda)V, \qquad [V, H] = -F + \lambda V, \qquad [F, H] = -\lambda F + (K - H(\lambda) + \lambda^2)V. \]

Lemma 3.1 (The Pestov identity). For every smooth function $u : SM \to \mathbb{R}$ we have

\[ 2Hu \cdot VFu = (Fu)^2 + (Hu)^2 - (K - H(\lambda) + \lambda^2)(Vu)^2 + F(Hu \cdot Vu) + V(\lambda)\,Hu \cdot Vu - H(Fu \cdot Vu) + V(Fu \cdot Hu). \]

Proof. Using the commutation formulas, we deduce:

\begin{align*}
2Hu \cdot VFu - V(Hu \cdot Fu)
&= Hu \cdot VFu - VHu \cdot Fu \\
&= Hu \cdot (FVu + [V, F]u) - Fu \cdot (HVu + [V, H]u) \\
&= Hu \cdot (FVu + Hu + V(\lambda)Vu) - Fu \cdot (HVu - Fu + \lambda Vu) \\
&= (Fu)^2 + (Hu)^2 + (FVu)(Hu) - (HVu)(Fu) - \lambda Fu \cdot Vu + Hu \cdot V(\lambda)Vu \\
&= (Fu)^2 + (Hu)^2 + F(Vu \cdot Hu) - H(Vu \cdot Fu) - [F, H]u \cdot Vu - \lambda Fu \cdot Vu + Hu \cdot V(\lambda)Vu \\
&= (Fu)^2 + (Hu)^2 + F(Vu \cdot Hu) + V(\lambda)\,Hu \cdot Vu - H(Vu \cdot Fu) - (K - H(\lambda) + \lambda^2)(Vu)^2,
\end{align*}

which is equivalent to the Pestov identity.

Now let $\Theta := \alpha \wedge d\alpha$. This volume form generates the Liouville measure $d\mu$.

Lemma 3.2. We have

\[ L_F\Theta = V(\lambda)\Theta; \tag{9} \]
\[ L_H\Theta = 0; \tag{10} \]
\[ L_V\Theta = 0. \tag{11} \]

Proof. Note that for any vector field $Y$, $L_Y\Theta = d(i_Y\Theta)$. Since $i_V\Theta = -\alpha \wedge \beta = -\pi^*\Omega_a$, where $\Omega_a$ is the area form of $M$, we see that $L_V\Theta = 0$. Similarly, $L_X\Theta = L_H\Theta = 0$. Finally $L_F\Theta = L_X\Theta + L_{\lambda V}\Theta = d(i_{\lambda V}\Theta) = V(\lambda)\Theta$.

Below we will use the following consequence of Stokes theorem. Let $N$ be a closed oriented manifold and $\Theta$ a volume form. Let $X$ be a vector field on $N$ and $f : N \to \mathbb{R}$ a smooth function. Then

\[ \int_N X(f)\,\Theta = -\int_N f\, L_X\Theta. \tag{12} \]

Integrating the Pestov identity over $SM$ against the Liouville measure $d\mu$, and using (10) and (11) we obtain:

\[ 2\int_{SM} Hu \cdot VFu \, d\mu = \int_{SM} (Fu)^2 \, d\mu + \int_{SM} (Hu)^2 \, d\mu + \int_{SM} \big(F(Hu \cdot Vu) + V(\lambda)\,Hu \cdot Vu\big) \, d\mu - \int_{SM} (K - H(\lambda) + \lambda^2)(Vu)^2 \, d\mu. \]


Using (12) and (9) we get:

\[ \int_{SM} \big(F(Hu \cdot Vu) + V(\lambda)\,Hu \cdot Vu\big) \, d\mu = 0, \]

and thus

\[ 2\int_{SM} Hu \cdot VFu \, d\mu = \int_{SM} (Fu)^2 \, d\mu + \int_{SM} (Hu)^2 \, d\mu - \int_{SM} (K - H(\lambda) + \lambda^2)(Vu)^2 \, d\mu. \tag{13} \]

We will derive one more integral identity. By the commutation relations, we have

\[ FVu = VFu - Hu - V(\lambda)Vu. \]

Therefore,

\[ (FVu)^2 = (VFu)^2 + (Hu)^2 + (V(\lambda))^2(Vu)^2 - 2VFu \cdot Hu - 2VFu \cdot V(\lambda)Vu + 2V(\lambda)Vu \cdot Hu. \]

Thus using again the commutation relations:

\[ (FVu)^2 = (VFu)^2 + (Hu)^2 + (V(\lambda))^2(Vu)^2 - 2VFu \cdot Hu - 2FVu \cdot V(\lambda)Vu - 2(V(\lambda))^2(Vu)^2. \]

Since $F(V(\lambda)(Vu)^2) = 2V(\lambda)Vu \cdot FVu + (Vu)^2 F(V(\lambda))$, we obtain

\[ (FVu)^2 = (VFu)^2 + (Hu)^2 - (V(\lambda))^2(Vu)^2 - 2VFu \cdot Hu - F(V(\lambda)(Vu)^2) + (Vu)^2 F(V(\lambda)). \]

Integrating this equation we obtain

\[ 2\int_{SM} Hu \cdot VFu \, d\mu = \int_{SM} (VFu)^2 \, d\mu + \int_{SM} (Hu)^2 \, d\mu - \int_{SM} (FVu)^2 \, d\mu + \int_{SM} F(V(\lambda))(Vu)^2 \, d\mu, \tag{14} \]

since by (12) and (9) we get

\[ \int_{SM} \big\{F(V(\lambda)(Vu)^2) + (V(\lambda))^2(Vu)^2\big\} \, d\mu = 0. \]

Combining (13) and (14) we arrive at the final integral identity of this section:

Theorem 3.3.

\[ \int_{SM} (FVu)^2 \, d\mu - \int_{SM} \mathbb{K}\,(Vu)^2 \, d\mu = \int_{SM} (VFu)^2 \, d\mu - \int_{SM} (Fu)^2 \, d\mu, \tag{15} \]

where $\mathbb{K} := K - H(\lambda) + \lambda^2 + F(V(\lambda))$.
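For the reader who wants the combination step written out, here is a short sketch of how (15) follows; it uses nothing beyond (13), (14) and the definition of $\mathbb{K}$. Equating the right-hand sides of (13) and (14), the terms $\int_{SM}(Hu)^2\,d\mu$ cancel and one is left with

\[ \int_{SM} (Fu)^2 \, d\mu - \int_{SM} (K - H(\lambda) + \lambda^2)(Vu)^2 \, d\mu = \int_{SM} (VFu)^2 \, d\mu - \int_{SM} (FVu)^2 \, d\mu + \int_{SM} F(V(\lambda))(Vu)^2 \, d\mu. \]

Moving $\int_{SM}(FVu)^2\,d\mu$ to the left, $\int_{SM}(Fu)^2\,d\mu$ to the right, and collecting the two terms that multiply $(Vu)^2$ gives

\[ \int_{SM} (FVu)^2 \, d\mu - \int_{SM} \big(K - H(\lambda) + \lambda^2 + F(V(\lambda))\big)(Vu)^2 \, d\mu = \int_{SM} (VFu)^2 \, d\mu - \int_{SM} (Fu)^2 \, d\mu, \]

which is (15).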


Of course this identity holds without any assumption on the underlying dynamics. In the next section we will show how to use the Anosov hypothesis to rewrite the left-hand side of (15) in terms of the stable or unstable bundles. At this point the proof differs from the one presented in [3]. We can no longer estimate the left-hand side of (15) using closed orbits and the non-negative Livšic theorem [16, 19] since in our context the Liouville measure is not necessarily invariant.

4. Using the Anosov Property

Recall that the Anosov property means that $T(SM)$ splits as $T(SM) = \mathbb{R}F \oplus E^u \oplus E^s$ in such a way that there are constants $C > 0$ and $0 < \rho < 1 < \eta$ such that for all $t > 0$ we have

\[ \|d\phi_{-t}|_{E^u}\| \le C\,\eta^{-t} \quad\text{and}\quad \|d\phi_t|_{E^s}\| \le C\,\rho^{t}. \]

The subbundles are then invariant and Hölder continuous and have smooth integral manifolds, the stable and unstable manifolds, which define a continuous foliation with smooth leaves. Let us introduce the weak stable and unstable bundles:

\[ E^+ = \mathbb{R}F \oplus E^s, \qquad E^- = \mathbb{R}F \oplus E^u. \]

Lemma 4.1. For any $(x, v) \in SM$, $V(x, v) \notin E^\pm(x, v)$.

Proof. Let $\Lambda(SM)$ be the bundle over $SM$ such that at each point $(x, v) \in SM$ consists of all 2-dimensional subspaces $W$ of $T_{(x,v)}SM$ with $F(x, v) \in W$. The map $(x, v) \mapsto \mathbb{V}(x, v) := \mathbb{R}F(x, v) \oplus \mathbb{R}V(x, v)$ is a section of $\Lambda(SM)$ and its image is a codimension one submanifold that we denote by $\Lambda_V$. Similarly the map $(x, v) \mapsto \mathbb{R}F(x, v) \oplus \mathbb{R}H(x, v)$ is a section of $\Lambda(SM)$ and its image is a codimension one submanifold that we denote by $\Lambda_H$. The flow $\phi$ naturally lifts to a flow $\phi^*$ acting on $\Lambda(SM)$ via its differential. Let $F^*$ be the infinitesimal generator of $\phi^*$.

Claim. $F^*$ is transversal to $\Lambda_V$.

To prove the claim we define a function $m : \Lambda(SM) \setminus \Lambda_H \to \mathbb{R}$ as follows. If $W \in \Lambda(SM) \setminus \Lambda_H$, then $H \notin W$. Thus there exists a unique $m = m(W)$ such that $mH + V \in W$. Clearly $m$ is smooth and $\Lambda_V = m^{-1}(0) \subset \Lambda(SM) \setminus \Lambda_H$. Fix $(x, v) \in SM$ and set $m(t) := m(\phi_t^*(\mathbb{V}(x, v)))$. By the definition of $m$, there exist functions $x(t)$ and $y(t)$ such that

\[ m(t)H(t) + V(t) = x(t)F(t) + y(t)\,d\phi_t(V). \]

Equivalently

\[ m(t)\,d\phi_{-t}(H(t)) + d\phi_{-t}(V(t)) = x(t)F + y(t)V. \]

Differentiating with respect to $t$ and setting $t = 0$ (recall that $m(0) = 0$) we obtain:

\[ \dot m(0)H + [F, V] = \dot x(0)F + \dot y(0)V. \]

But $[V, F] = H + V(\lambda)V$. Thus $\dot m(0) = 1$ which proves the Claim.


From the Claim it follows that $\Lambda_V$ determines an oriented codimension one cycle in $\Lambda(SM)$ and by duality it defines a cohomology class $m \in H^1(\Lambda(SM), \mathbb{Z})$. Set $E = E^\pm$. Given a continuous closed curve $\alpha : S^1 \to SM$, the index of $\alpha$ is $\nu(\alpha) := \langle m, [E \circ \alpha]\rangle$ (i.e. $\nu = E^*m \in H^1(SM, \mathbb{Z})$). The index of $\alpha$ only depends on the homology class of $\alpha$. Since $E$ is $\phi$-invariant, the Claim also ensures that if $\gamma$ is any closed orbit of $\phi$, then $\nu(\gamma) \ge 0$. Recall that according to Ghys [11] we know that $\phi$ is topologically conjugate to the geodesic flow of a metric of constant negative curvature. In particular, every homology class in $H_1(SM, \mathbb{Z})$ contains a closed orbit of $\phi$. Thus $\nu$ must vanish. If there exists $(x, v) \in SM$ for which $V(x, v) \in E(x, v)$, then using that every point of $\phi$ is non-wandering, we can produce exactly as in [17, Lemma 2.49] a closed curve $\alpha : S^1 \to SM$ with $\nu(\alpha) > 0$. This contradiction shows the lemma.

Remark 4.2. The reader will recognize that the index that appears in the proof of the lemma reduces to the Maslov index when $\phi$ is Hamiltonian. The proof of the lemma also follows the presentation in [17, Chap. 2] of analogous results for geodesic flows.

The lemma implies that there exist unique continuous functions $r^\pm$ on $SM$ such that

\[ H + r^+V \in E^+, \qquad H + r^-V \in E^-. \]

Note that the Anosov property implies that $r^+ \ne r^-$ everywhere. Below we will need to use that the functions $r^\pm$ satisfy a Riccati type equation along the flow. Note that $r^\pm$ are smooth along $\phi$ because $E^\pm$ are $\phi$-invariant.

Lemma 4.3. Let $r = r^\pm$. Then

\[ F(r - V(\lambda)) + r(r - V(\lambda)) + \mathbb{K} = 0. \]

Proof. Let $E = E^\pm$. Fix $(x, v) \in SM$, flow along $\phi$ and set

\[ \xi(t) := d\phi_{-t}\big(H(t) + r(t)V(t)\big). \]

By the definition of $r$, $\xi(t) \in E(x, v)$ for all $t$. Differentiating with respect to $t$ and setting $t = 0$ we obtain:

\[ \dot\xi(0) = [F, H] + F(r)V + r[F, V]. \]

Using that $[V, F] = H + V(\lambda)V$ and $[F, H] = -\lambda F + (K - H(\lambda) + \lambda^2)V$ we have

\[ \dot\xi(0) = -\lambda F - rH + \big\{F(r) + K - H(\lambda) + \lambda^2 - V(\lambda)r\big\}V. \]

Replacing $H$ by $\xi(0) - rV$ yields:

\[ \dot\xi(0) + r\xi(0) + \lambda F = \big\{r^2 + F(r) + K - H(\lambda) + \lambda^2 - V(\lambda)r\big\}V. \]

Since $\dot\xi(0) + r\xi(0) + \lambda F \in E$ we must have

\[ r^2 + F(r) + K - H(\lambda) + \lambda^2 - V(\lambda)r = 0, \]

which is the desired equation since $\mathbb{K} = K - H(\lambda) + \lambda^2 + F(V(\lambda))$.
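As a sanity check on Lemma 4.3, one can look at the simplest special case. The following worked example is an illustration only and is not taken from the paper: it assumes $\lambda \equiv 0$ (so that $F = X$ is the geodesic vector field) and constant curvature $K \equiv -1$. Then $\mathbb{K} = K = -1$ and the Riccati equation becomes

\[ X(r) + r^2 - 1 = 0, \]

which has the two constant solutions $r = 1$ and $r = -1$. These are consistent with the commutation relations (8): for $K = -1$,

\[ [X, H \pm V] = [X, H] \pm [X, V] = -V \mp H = \mp (H \pm V), \]

so the line fields spanned by $H + V$ and $H - V$ are invariant under the geodesic flow, one being contracted and the other expanded. In particular $r^+ \ne r^-$ in this model case, in agreement with the remark preceding the lemma.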


Here is the main result of this section:

Theorem 4.4. Let $\psi : SM \to \mathbb{R}$ be a smooth function and suppose $\phi$ is Anosov. Then for $r = r^\pm$,

\[ \int_{SM} (F\psi)^2 \, d\mu - \int_{SM} \mathbb{K}\,\psi^2 \, d\mu = \int_{SM} [F(\psi) - r\psi + \psi V(\lambda)]^2 \, d\mu \ge 0. \]

Moreover,

\[ \int_{SM} [F(\psi) - r\psi + \psi V(\lambda)]^2 \, d\mu = 0 \]

if and only if $\psi = 0$.

Proof. Let us expand $[F(\psi) - r\psi + \psi V(\lambda)]^2$:

\[ [F(\psi) - r\psi + \psi V(\lambda)]^2 = [F(\psi)]^2 + \psi^2 r^2 + \psi^2 [V(\lambda)]^2 - 2F(\psi)\psi r + 2F(\psi)\psi V(\lambda) - 2\psi^2 r V(\lambda). \]

Using that (see Lemma 4.3)

\[ F(r - V(\lambda)) + r(r - V(\lambda)) + \mathbb{K} = 0 \]

we obtain

\[ [F(\psi) - r\psi + \psi V(\lambda)]^2 = [F(\psi)]^2 - \mathbb{K}\psi^2 - F\big((r - V(\lambda))\psi^2\big) + \psi^2 [V(\lambda)]^2 - \psi^2 r V(\lambda). \]

If we integrate the last equality with respect to the Liouville measure $\mu$ we obtain as desired:

\[ \int_{SM} (F\psi)^2 \, d\mu - \int_{SM} \mathbb{K}\,\psi^2 \, d\mu = \int_{SM} [F(\psi) - r\psi + \psi V(\lambda)]^2 \, d\mu, \]

since by (12) and (9) we have the following cancellation:

\[ \int_{SM} \big\{-F\big((r - V(\lambda))\psi^2\big) + \psi^2 [V(\lambda)]^2 - \psi^2 r V(\lambda)\big\} \, d\mu = 0. \]

Suppose now

\[ \int_{SM} [F(\psi) - r\psi + \psi V(\lambda)]^2 \, d\mu = 0, \]

which implies $F(\psi) - r\psi + \psi V(\lambda) = 0$ everywhere. Since this holds for $r = r^\pm$ we deduce $(r^+ - r^-)\psi = 0$. But for an Anosov flow $r^+ - r^- \ne 0$ everywhere, thus $\psi = 0$.


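5. Proof of Theorem B

Let us now prove Theorem B. If $Fu = h \circ \pi + \theta$, then it is easy to see that the right-hand side of (15) is nonpositive. Indeed, since $\mu$ is invariant under $v \mapsto -v$ and $v \mapsto iv$ we have

\[ \int_{SM} \theta_x(v) \, d\mu = 0 \quad\text{and}\quad \int_{SM} (\theta_x(v))^2 \, d\mu = \int_{SM} (\theta_x(iv))^2 \, d\mu. \]

But $VFu = \theta_x(iv)$ and thus

\[ \int_{SM} (VFu)^2 \, d\mu - \int_{SM} (Fu)^2 \, d\mu = -\int_{SM} (h \circ \pi)^2 \, d\mu \le 0. \]

Setting $\psi = Vu$, we get

\[ \int_{SM} \big\{(F\psi)^2 - \mathbb{K}\psi^2\big\} \, d\mu \le 0. \tag{16} \]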
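The step leading to (16) can be spelled out as follows. This is a sketch of the intermediate computation, using only $Fu = h \circ \pi + \theta$, the fact that $V$ annihilates functions pulled back from $M$, and the two symmetries of $\mu$ quoted above. Pointwise,

\[ (Fu)^2 = (h \circ \pi)^2 + 2\,(h \circ \pi)\,\theta_x(v) + (\theta_x(v))^2, \qquad VFu = V(h \circ \pi) + V(\theta_x(v)) = \theta_x(iv). \]

The cross term $(h \circ \pi)\,\theta_x(v)$ is odd under $v \mapsto -v$, so its integral against $\mu$ vanishes, and therefore

\[ \int_{SM} (VFu)^2 \, d\mu - \int_{SM} (Fu)^2 \, d\mu = \int_{SM} (\theta_x(iv))^2 \, d\mu - \int_{SM} (\theta_x(v))^2 \, d\mu - \int_{SM} (h \circ \pi)^2 \, d\mu = -\int_{SM} (h \circ \pi)^2 \, d\mu. \]

Inserting this into (15) with $\psi = Vu$, so that $FVu = F\psi$, gives (16).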

By Theorem 4.4 this happens if and only if $\psi = 0$. This would give $Vu = 0$, which says that $u = f \circ \pi$, where $f$ is a smooth function on $M$. But in this case, since $d\pi_{(x,v)}(F) = v$ we have $Fu = df_x(v)$. This clearly implies the claim of the theorem.

Remark 5.1. Suppose that we include potential forces in our dynamics, that is, we consider the isoenergetic thermostat:

\[ \frac{D\dot\gamma}{dt} = -\nabla W + E(\gamma) - \frac{\langle E(\gamma), \dot\gamma\rangle}{|\dot\gamma|^2}\,\dot\gamma \tag{17} \]

on the energy level $\frac{1}{2}|v|^2 + W(x) = k$ (we assume that $|v|$ does not vanish on the energy level). Wojtkowski has pointed out [24, Theorem 2.4] that the dynamics of (17) reparametrized by arc-length defines a flow on $SM$ which coincides with the isokinetic thermostat with external field

\[ \tilde E := \frac{-\nabla W + E}{2(k - W)} = \frac{1}{2}\nabla(\log(k - W)) + \frac{E}{2(k - W)}. \]

Since the vanishing of entropy production and the Anosov property are unaltered by smooth time changes we conclude, applying Theorem A to $\tilde E$, that an Anosov isoenergetic thermostat has zero entropy production if and only if $E/2(k - W)$ has a global potential.

The question of whether Theorem B extends to higher dimension is more delicate. We hope to discuss this topic elsewhere.

Acknowledgements. The first author thanks the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge and Trinity College for hospitality and financial support while this work was in progress.


References

1. Bonetto, F., Gentile, G., Mastropietro, V.: Electric fields on a surface of constant negative curvature. Ergod. Th. Dynam. Sys. 20, 681–696 (2000)
2. Croke, C.B., Sharafutdinov, V.A.: Spectral rigidity of a negatively curved manifold. Topology 37, 1265–1273 (1998)
3. Dairbekov, N.S., Paternain, G.P.: Longitudinal KAM-cocycles and action spectra of magnetic flows. Math. Res. Lett. 12, 719–730 (2005)
4. Dairbekov, N.S., Sharafutdinov, V.A.: Some problems of integral geometry on Anosov manifolds. Ergod. Th. Dynam. Sys. 23, 59–74 (2003)
5. Gallavotti, G.: Reversible Anosov diffeomorphisms and large deviations. Math. Phys. Electronic J. 1, 1–12 (1995)
6. Gallavotti, G.: New methods in nonequilibrium gases and fluids. Open Sys. Inf. Dynam. 6, 101–136 (1999)
7. Gallavotti, G., Cohen, E.G.D.: Dynamical ensembles in nonequilibrium statistical mechanics. Phys. Rev. Lett. 74, 2694–2697 (1995)
8. Gallavotti, G., Cohen, E.G.D.: Dynamical ensembles in stationary states. J. Stat. Phys. 80, 931–970 (1995)
9. Gallavotti, G., Ruelle, D.: SRB states and nonequilibrium statistical mechanics close to equilibrium. Commun. Math. Phys. 190, 279–281 (1997)
10. Gentile, G.: Large deviation rule for Anosov flows. Forum Math. 10, 89–118 (1998)
11. Ghys, E.: Flots d’Anosov sur les 3-variétés fibrées en cercles. Ergod. Th. Dynam. Sys. 4, 67–80 (1984)
12. Guillemin, V., Kazhdan, D.: Some inverse spectral results for negatively curved 2-manifolds. Topology 19, 301–312 (1980)
13. Hoover, W.G.: Molecular Dynamics, Lecture Notes in Phys. 258, Berlin Heidelberg New York: Springer, 1986
14. Katok, A., Hasselblatt, B.: Introduction to the modern theory of dynamical systems. Encyclopedia of Mathematics and its Applications 54, Cambridge: Cambridge University Press, 1995
15. de la Llave, R., Marco, J.M., Moriyon, R.: Canonical perturbation theory of Anosov systems and regularity for the Livsic cohomology equation. Ann. Math. 123, 537–611 (1986)
16. Lopes, A.O., Thieullen, P.: Sub-actions for Anosov flows. Ergod. Th. Dynam. Sys. 25, 605–628 (2005)
17. Paternain, G.P.: Geodesic flows. Progress in Mathematics 180, Basel-Boston: Birkhäuser, 1999
18. Paternain, G.P.: The longitudinal KAM-cocycle of a magnetic flow. Math. Proc. Camb. Phil. Soc. 139, 307–316 (2005)
19. Pollicott, M., Sharp, R.: Livsic theorems, maximising measures and the stable norm. Dyn. Sys.: An Int. J. 19, 75–88 (2004)
20. Ruelle, D.: Positivity of entropy production in nonequilibrium statistical mechanics. J. Stat. Phys. 85, 1–23 (1996)
21. Ruelle, D.: Smooth dynamics and new theoretical ideas in nonequilibrium statistical mechanics. J. Stat. Phys. 95, 393–468 (1999)
22. Wojtkowski, M.P.: Magnetic flows and Gaussian thermostats on manifolds of negative curvature. Fund. Math. 163, 177–191 (2000)
23. Wojtkowski, M.P.: W-flows on Weyl manifolds and Gaussian thermostats. J. Math. Pures Appl. 79, 953–974 (2000)
24. Wojtkowski, M.P.: Weyl manifolds and Gaussian thermostats. Proceedings of the International Congress of Mathematicians, Beijing 2002, Vol. III, pp. 511–523, China: Higher Ed. Press/World Scientific (2002)

Communicated by G. Gallavotti

Commun. Math. Phys. 269, 545–556 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0119-9

Communications in

Mathematical Physics

Decoupling Problem for Systems of Quasi-Linear pde’s

Oleg I. Bogoyavlenskij

Department of Mathematics, Queen’s University, Kingston, K7L 3N6, Canada. E-mail: [email protected]

Received: 8 January 2006 / Accepted: 8 May 2006 Published online: 27 September 2006 – © Springer-Verlag 2006

Abstract: The necessary and sufficient conditions for the decoupling of a quasi-linear system of partial differential equations into k non-interacting subsystems are derived. Several necessary conditions for the decoupling are found and applied to the Benney system.

1. Introduction

As is known, the decoupling of a quasi-linear system of partial differential equations (pde’s)

\[ \frac{\partial u^i}{\partial t} = \sum_{m=1}^{n} A^i_m(u^1, \dots, u^n)\, \frac{\partial u^m}{\partial x} \tag{1.1} \]

into subsystems of a simpler form drastically affects the properties of its solutions and the computer time required for its numerical investigation. Until now no necessary and sufficient conditions for the decoupling were known. Courant’s decoupling problem [1] is formulated as follows: When can a given system (1.1) be locally decoupled in some coordinates $v^1(u), \dots, v^n(u)$ into $k$ non-interacting subsystems

\[ \frac{\partial v^{m_j+i}}{\partial t} = \sum_{\ell=1}^{n_j} \tilde A^{m_j+i}_{m_j+\ell}(v^{m_j+1}, \dots, v^{m_j+n_j})\, \frac{\partial v^{m_j+\ell}}{\partial x} \tag{1.2} \]

of some orders $n_1, \dots, n_k$ with $n_1 + \cdots + n_k = n$? Here $j = 1, \dots, k$, $i = 1, \dots, n_j$ and $m_j = n_1 + \cdots + n_{j-1}$; $u^1, \dots, u^n$ and $v^1(u), \dots, v^n(u)$ form systems of local coordinates in the Euclidean space $\mathbb{R}^n$. The first result concerning the decoupling problem follows from the Nijenhuis theorem [2]


on the $X_{n-1}$-forming sets of eigenvectors of the (1,1)-tensors $A^i_j(u^1, \dots, u^n)$. The theorem is proved in [2] and is equivalent to the following statement: The necessary and sufficient condition for the complete decoupling of a system (1.1) into $n$ non-interacting one-dimensional subsystems is the vanishing of the corresponding Nijenhuis tensor

\[ N^{\;j}_{A\,ik} = A^\alpha_i \frac{\partial A^j_k}{\partial u^\alpha} - A^\alpha_k \frac{\partial A^j_i}{\partial u^\alpha} + A^j_\alpha \frac{\partial A^\alpha_i}{\partial u^k} - A^j_\alpha \frac{\partial A^\alpha_k}{\partial u^i}, \tag{1.3} \]

provided that all eigenvalues of $A^i_j(u)$ are real and distinct. The more general case of complex eigenvalues of $A^i_j(u)$ was not studied in [2]. It is evident however that for a generic system of quasi-linear pde’s (1.1) some of the eigenvalues of $A^i_j(u)$ are complex.

In paper [3], we presented a necessary condition for the decoupling into $k$ non-interacting subsystems: The polynomial $P_N(V, \lambda) = \det(N_V - \lambda)$ should have degree $n - k$ in variables $V$ and should be a product of $k$ factors. Here $V$ is a tangent vector, $V \in T_u(\mathbb{R}^n)$, and $(N_V)^i_j = N^i_{\alpha j} V^\alpha$.

In Sect. 3, we prove Theorem 1 that gives the necessary and sufficient conditions for the decoupling of systems (1.1). The proof is based on a new algebraic identity derived in Sect. 2 that connects the Nijenhuis tensor $N_{B(A)}(U, V)$ with the $N_A(U, V)$, where $B(A)$ is an arbitrary polynomial in $A$. The identity resolves the Nijenhuis problem on the interconnections between the tensors $N_{B(A)}(U, V)$ and $N_A(U, V)$ posed in [2]. We use the identity for the polynomials $B(A)$ defined by the subfactors $P_j(\lambda)$ of the characteristic polynomial $P_c(\lambda) = \det(A - \lambda)$. The (1,1)-tensors $P_j(A)$ annihilate certain invariant subspaces of $A$ and thus lead to the Nijenhuis tensors $N_{P_j(A)}(U, V)$ of a simpler structure. Several necessary conditions for the decoupling are presented in Sect. 4 together with applications to the Benney system [4].

2. An Algebraic Identity for the Nijenhuis Tensors

I. Let $u$ and $v$ be tangent vectors at a point $x \in \mathbb{R}^n$, and $\tilde u$ and $\tilde v$ be arbitrary vector fields extending the vectors $u$ and $v$ and $[\tilde u, \tilde v]$ be the commutator of the vector fields. The Nijenhuis tensor is defined by the formula [2]

\[ N_A(u, v) = A^2[\tilde u, \tilde v] + [A\tilde u, A\tilde v] - A[A\tilde u, \tilde v] - A[\tilde u, A\tilde v]. \tag{2.1} \]
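To make the component formula (1.3) concrete, here is a small numerical sketch that evaluates it by central finite differences. Everything in it is an illustrative assumption rather than part of the paper: the function name `nijenhuis`, the storage convention `A(u)[j][i]` for $A^j_i$ (upper index first), the step size, and the two hypothetical $2 \times 2$ test systems `decoupled` and `coupled`.

```python
def nijenhuis(A, u, h=1e-5):
    """Approximate N^j_{ik} of (1.3) at the point u by central differences.

    A(u) must return an n x n nested list with A(u)[j][i] = A^j_i.
    The result is indexed as N[j][i][k].
    """
    n = len(u)
    A0 = A(u)
    dA = []  # dA[a][j][i] approximates dA^j_i / du^a
    for a in range(n):
        up = list(u); up[a] += h
        dn = list(u); dn[a] -= h
        Ap, Am = A(up), A(dn)
        dA.append([[(Ap[j][i] - Am[j][i]) / (2 * h) for i in range(n)]
                   for j in range(n)])
    N = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for j in range(n):
        for i in range(n):
            for k in range(n):
                # the four terms of (1.3), summed over the repeated index a
                N[j][i][k] = sum(
                    A0[a][i] * dA[a][j][k] - A0[a][k] * dA[a][j][i]
                    + A0[j][a] * (dA[k][a][i] - dA[i][a][k])
                    for a in range(n))
    return N

def decoupled(u):
    """Hypothetical system already in decoupled form: A = diag(u1, u2^2)."""
    return [[u[0], 0.0], [0.0, u[1] ** 2]]

def coupled(u):
    """Hypothetical diagonal system whose eigenvalues depend on the 'wrong' variables."""
    return [[u[1], 0.0], [0.0, u[0]]]

point = [1.0, 3.0]
N_dec = nijenhuis(decoupled, point)   # all components vanish up to rounding
N_cpl = nijenhuis(coupled, point)     # N^j_{12} is close to u2 - u1 = 2 for j = 1, 2
```

For the first system each diagonal entry depends only on its own coordinate and the tensor vanishes, as the Nijenhuis criterion predicts for a decoupled system with distinct real eigenvalues. For the second system formula (2.5) below gives $N_A(e_1, e_2) = (u^2 - u^1)(e_1 + e_2)$, which the finite-difference values reproduce at the chosen point.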

The expression (2.1) depends only on the tangent vectors u and v and is independent of their extensions u˜ and v˜ [2]. The Nijenhuis tensor (2.1) appears in many problems of mathematical physics and differential geometry, mostly as the vanishing condition N A (u, v) = 0: The Newlander-Nirenberg theorem [5] states that a quasi-complex structure A(x), A2 (x) = −1, is complex if and only if the Nijenhuis tensor N A (u, v) vanishes. The Gelfand-Dorfman-Magri-Morosi theorem [6, 7] states that the two Poisson structures P1 and P2 are compatible in Magri’s sense [8] if and only if N A (u, v) = 0, where A = P1 P2−1 . The condition N A (u, v) = 0 is used in [7, 9, 10] as the definition of the Poisson-Nijenhuis structures and in [11] as the definition of the Nijenhuis G-manifolds with applications


to the KP systems. The condition $N_A(u, v) = 0$ is used in [12] as a sufficient condition for the existence of conservation laws for systems of pde’s (1.1). In papers [3, 13], we applied the non-zero Nijenhuis and Haantjes tensors [14] to study the necessary criteria for the existence of the Hamiltonian and bi-Hamiltonian structures for systems (1.1).

Remark 1. Let a $(k, l)$-tensor $T(A)$ analytically depend on the entries of the (1,1)-tensor $A^i_j(x)$ and their partial derivatives up to a finite order $r$. If tensor $T(A)$ is equal to zero for all (1,1)-tensors $A^i_j(x)$ with distinct (complex) eigenvalues then $T(A) \equiv 0$ for any (1,1)-tensor $A^i_j(x)$. This evidently follows by continuation from the non-degenerate case $A^i_j(x)$ with distinct eigenvalues.

II. Let us consider a (1,1)-tensor

\[ B(A) = \sum_{m=0}^{k} b_m(x)\, A^m(x), \tag{2.2} \]

where coefficients $b_m(x)$ are arbitrary smooth functions.

Lemma 1. For any polynomial $B(A)$ (2.2), the Nijenhuis tensor $N_{B(A)}(u, v)$ is connected with $N_A(u, v)$ by the formula

\[ N_{B(A)}(u, v) = \sum_{m,l=1}^{k} b_m b_l \sum_{p<m,\; q<l} A^{m+l-p-q-2}\, N_A(A^p u, A^q v) + \sum_{m=0}^{k} \big[B(A)u(b_m)\,A^m v - B(A)v(b_m)\,A^m u - u(b_m)\,B(A)A^m v + v(b_m)\,B(A)A^m u\big]. \tag{2.3} \]

Proof. In (2.3), $B(A)u(b_m)$ is the derivative of the function $b_m(x)$ in the direction of the tangent vector $B(A)u$; the same holds for $B(A)v(b_m)$. We first assume that the operator $A(x)$ has distinct eigenvalues $\lambda_1(x), \dots, \lambda_n(x)$ corresponding to the eigenvectors $e_1(x), \dots, e_n(x)$. The operator $B(A)$ (2.2) has the same eigenvectors $e_i(x)$ with the eigenvalues $B(\lambda_i(x))$. For the Nijenhuis tensor $N_{B(A)}(u, v)$ (2.1), the formula

$$N_{B(A)}(e_i, e_j) = (B(A) - B(\lambda_i))(B(A) - B(\lambda_j))[e_i, e_j] + (B(\lambda_i) - B(\lambda_j))\bigl(e_i(B(\lambda_j))\,e_j + e_j(B(\lambda_i))\,e_i\bigr) \tag{2.4}$$

holds as a consequence of the Nijenhuis formula [2] for any (1,1)-tensor $A$:

$$N_A(e_i, e_j) = (A - \lambda_i)(A - \lambda_j)[e_i, e_j] + (\lambda_i - \lambda_j)\bigl(e_i(\lambda_j)\,e_j + e_j(\lambda_i)\,e_i\bigr). \tag{2.5}$$

As is known, the Bezout identity

$$B(z) - B(\lambda) = (z - \lambda)\,Q_B(z, \lambda) \tag{2.6}$$

holds for any polynomial $B(z) = \sum_{m=0}^{k} b_m(x) z^m$. Here $Q_B(z, \lambda)$ is the symmetric polynomial

$$Q_B(z, \lambda) = \sum_{m=1}^{k} b_m(x) \sum_{r+s=m-1} z^r \lambda^s. \tag{2.7}$$
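The Bezout identity (2.6) with the quotient (2.7), and the consequence $Q_B(\lambda, \lambda) = \partial B/\partial\lambda$ used in the next step, can be checked with exact arithmetic. The following is only a sketch for one polynomial with constant coefficients; the coefficient list and the sample values of $z$ and $\lambda$ are hypothetical choices made for illustration.

```python
from fractions import Fraction

def B(coeffs, z):
    """Evaluate B(z) = sum_m coeffs[m] * z**m."""
    return sum(b * z**m for m, b in enumerate(coeffs))

def Q(coeffs, z, lam):
    """Q_B(z, lam) = sum_{m>=1} b_m * sum_{r+s=m-1} z**r * lam**s, as in (2.7)."""
    total = 0
    for m in range(1, len(coeffs)):
        total += coeffs[m] * sum(z**r * lam**(m - 1 - r) for r in range(m))
    return total

def dB(coeffs, lam):
    """Derivative of B with respect to its argument, evaluated at lam."""
    return sum(m * b * lam**(m - 1) for m, b in enumerate(coeffs) if m >= 1)

coeffs = [3, -1, 4, 2]                      # hypothetical b_0, ..., b_3
z, lam = Fraction(5, 2), Fraction(-1, 3)    # arbitrary sample points

bezout_ok = B(coeffs, z) - B(coeffs, lam) == (z - lam) * Q(coeffs, z, lam)
derivative_ok = Q(coeffs, lam, lam) == dB(coeffs, lam)
print(bezout_ok, derivative_ok)
```

Rational arithmetic is used so that both comparisons are exact rather than subject to rounding.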

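The Bezout identity (2.6) yields $\partial B(\lambda)/\partial\lambda = Q_B(\lambda, \lambda)$. Hence for any tangent vector $e \in T_x(M^n)$, we have

$$e(B(\lambda)) = \sum_{m=0}^{k} \bigl(b_m\, e(\lambda^m) + e(b_m)\lambda^m\bigr) = Q_B(\lambda, \lambda)\, e(\lambda) + \sum_{m=0}^{k} e(b_m)\lambda^m. \tag{2.8}$$

In view of the identities (2.6) and (2.8), formula (2.4) takes the form:

$$\begin{aligned} N_{B(A)}(e_i, e_j) = {} & Q_B(A, \lambda_i)\,Q_B(A, \lambda_j)\,(A - \lambda_i)(A - \lambda_j)[e_i, e_j] \\ & + Q_B(\lambda_i, \lambda_j)(\lambda_i - \lambda_j)\bigl[Q_B(\lambda_j, \lambda_j)\,e_i(\lambda_j)\,e_j + Q_B(\lambda_i, \lambda_i)\,e_j(\lambda_i)\,e_i\bigr] \\ & + \sum_{m=0}^{k} (B(\lambda_i) - B(\lambda_j))\bigl[e_i(b_m)\lambda_j^m e_j + e_j(b_m)\lambda_i^m e_i\bigr]. \end{aligned}$$

Applying the Nijenhuis formula (2.5), we obtain

$$\begin{aligned} N_{B(A)}(e_i, e_j) = {} & Q_B(A, \lambda_i)\,Q_B(A, \lambda_j)\,N_A(e_i, e_j) \\ & + \sum_{m=0}^{k} \bigl[B(A)e_i(b_m)\,A^m e_j - B(A)e_j(b_m)\,A^m e_i - e_i(b_m)\,B(A)A^m e_j + e_j(b_m)\,B(A)A^m e_i\bigr]. \end{aligned} \tag{2.9}$$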

Formula (2.9) after substitution of (2.7) takes the form

$$\begin{aligned} N_{B(A)}(e_i, e_j) = {} & \sum_{m,l=1}^{k} b_m b_l \sum_{p<m,\,q<l} A^{m+l-p-q-2}\, N_A(A^p e_i, A^q e_j) \\ & + \sum_{m=0}^{k} \bigl[B(A)e_i(b_m)\,A^m e_j - B(A)e_j(b_m)\,A^m e_i - e_i(b_m)\,B(A)A^m e_j + e_j(b_m)\,B(A)A^m e_i\bigr]. \end{aligned} \tag{2.10}$$

Formula (2.3) for arbitrary vectors $u$ and $v$ follows by bilinearity from (2.10), for the case of distinct eigenvalues of $A(x)$. Thus (2.3) is proven for any (1,1)-tensor $A^i_j(x)$ having distinct eigenvalues. Applying Remark 1, we obtain that formula (2.3) holds for an arbitrary (1,1)-tensor $A^i_j(x)$.

Remark 2. Let $B(A)$ be any polynomial (2.2) that annihilates a (1,1)-tensor $A^i_j(x)$. Then formula (2.3) implies that the Nijenhuis tensor satisfies the identity

$$\sum_{m,l=1}^{k} b_m b_l \sum_{p<m,\,q<l} A^{m+l-p-q-2}\, N_A(A^p u, A^q v) = 0. \tag{2.11}$$

For example, for any quasi-complex structure, the minimal polynomial is $B(\lambda) = \lambda^2 + 1$, that is, $A^2 + I = 0$. Hence the identity (2.11), $A^2 N_A(u, v) + N_A(Au, Av) + A N_A(Au, v) + A N_A(u, Av) = 0$, holds for any quasi-complex structure $A$.
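The quasi-complex example can be unpacked term by term. The sketch below assumes the inner sum in (2.3) and (2.11) runs over $0 \le p < m$ and $0 \le q < l$. For $B(\lambda) = \lambda^2 + 1$ the coefficients are $b_0 = 1$, $b_1 = 0$, $b_2 = 1$, so only the pair $m = l = 2$ contributes:

```latex
\begin{aligned}
\sum_{m,l=1}^{2} b_m b_l \sum_{p<m,\; q<l} A^{m+l-p-q-2}\, N_A(A^p u, A^q v)
  &= \sum_{p,q=0}^{1} A^{2-p-q}\, N_A(A^p u, A^q v) \\
  &= A^2 N_A(u,v) + A\,N_A(u,Av) + A\,N_A(Au,v) + N_A(Au,Av).
\end{aligned}
```

Setting this sum to zero reproduces the four-term identity stated for quasi-complex structures.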

3. The Decoupling Problem

For a quasi-linear system of pde’s in the decoupled form (1.2), the different blocks $\tilde A^{m_j+i}_{m_j+\ell}(v^{m_j+1}, \dots, v^{m_j+n_j})$ of the operator $\tilde A^{\alpha}_{\beta}$ depend on different variables. Hence it is evident that in the generic case the eigenvalues corresponding to any two blocks $\tilde A^{m_j+i}_{m_j+\ell}(v^{m_j+1}, \dots, v^{m_j+n_j})$ do not coincide with each other almost everywhere for $x \in R^n$ (while inside a given block some eigenvalues can coincide).

Theorem 1. For a system of quasi-linear pde’s (1.1) to be locally reducible into $k$ non-interacting subsystems of some orders $n_1, \dots, n_k$ with $n_1 + \cdots + n_k = n$ it is necessary and sufficient that in the tangent spaces $T_x(R^n)$ there exist $k$ smooth distributions $L_{1x}, \dots, L_{kx}$ of dimensions $n_1, \dots, n_k$ such that $L_{1x} \oplus \cdots \oplus L_{kx} = T_x(R^n)$ and the conditions

$$A(L_{ix}) \subset L_{ix}, \qquad N_A(L_{ix}, L_{ix}) \subset L_{ix}, \qquad N_A(L_{ix}, L_{rx}) = 0 \tag{3.1}$$

hold provided that the eigenvalues of the operator $A(x)$ in any two different subspaces $L_{ix}$ and $L_{rx}$ are different almost everywhere for $x \in R^n$. Here $i \neq r$; $i, r \in \{1, \dots, k\}$.

Proof. The necessity. Suppose that in some coordinates $v^1, \dots, v^n$ system (1.1) is decoupled into $k$ non-interacting subsystems (1.2) in the subspaces $v^1, \dots, v^{n_1} \in L_1$, $v^{n_1+1}, \dots, v^{n_1+n_2} \in L_2$, ..., $v^{n-n_k+1}, \dots, v^n \in L_k$ that form the distributions $L_j$, $L_1 \oplus \cdots \oplus L_k = T(R^n)$. Then the (1,1)-tensor $\tilde A^{\alpha}_{\beta}(v^1, \dots, v^n)$ has the block-diagonal form and the corresponding Nijenhuis tensor is a direct sum of the Nijenhuis tensors (1.3) in each subspace $L_j$. Hence the necessary conditions (3.1) are evidently satisfied.

The sufficiency. First we prove that system (1.1) has a block-diagonal form in some local coordinates. Let $P(\lambda) = \det(A - \lambda)$ be the characteristic polynomial of the (1,1)-tensor $A^{\alpha}_{\beta}(x)$, $x \in R^n$. Let $A_i(x)$ be the restriction of the operator $A(x)$ onto the invariant subspace $L_{ix}$ and $P_i(\lambda) = \det(A_i(x) - \lambda)$ be the corresponding characteristic polynomial. Since $L_{1x} \oplus \cdots \oplus L_{kx} = T_x(R^n)$, we obtain $P(\lambda) = P_1(\lambda) \cdots P_k(\lambda)$. Let us define the polynomials

$$B_j(\lambda) = P_1(\lambda) \cdots P_{j-1}(\lambda) P_{j+1}(\lambda) \cdots P_k(\lambda) = P(\lambda)/P_j(\lambda). \tag{3.2}$$

By the Cayley-Hamilton theorem we have $P_\ell(A_\ell) = 0$. Hence we get $B_j(A_\ell) = 0$ for $j \neq \ell$; $j, \ell \in \{1, \dots, k\}$. On the subspaces $L_{jx}$, the operator $B_j(A_j)$ is non-degenerate almost everywhere for $x \in R^n$ because the operators $A_j$ and $A_\ell$ do not have coinciding eigenvalues for $j \neq \ell$. Hence the restriction of the operator $B_j(A)$ onto the invariant subspace $L_{jx} \subset T_x(R^n)$ is non-degenerate almost everywhere for $x \in R^n$ and is zero on all other subspaces $L_{\ell x} \subset T_x(R^n)$.

The polynomial $B_j(\lambda)$ has some form $B_j(\lambda) = \sum_{m=0}^{n-n_j} b_{jm}(x)\lambda^m$, where the coefficients $b_{jm}(x)$ depend on the point $x \in R^n$. Let $N_{B_j}(u, v)$ be the Nijenhuis tensor (2.1) defined by the (1,1)-tensor

$$B_j(A) = \sum_{m=0}^{n-n_j} b_{jm}(x) A^m. \tag{3.3}$$
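The role of the polynomials $B_j$ can be seen on a small constant matrix. The following sketch uses a hypothetical block-diagonal $3 \times 3$ example (one $2 \times 2$ block with eigenvalue 1 and one $1 \times 1$ block with eigenvalue 2), chosen only for illustration. With $P_1(\lambda) = (1 - \lambda)^2$ and $P_2(\lambda) = 2 - \lambda$, definition (3.2) gives $B_1 = P_2$ and $B_2 = P_1$.

```python
def matmul(X, Y):
    """Product of two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lincomb(a, X, b, Y):
    """Entrywise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Hypothetical block-diagonal operator: L_1 = span(e1, e2), L_2 = span(e3).
A = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 2]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

B1 = lincomb(2, I3, -1, A)            # B_1(A) = P_2(A) = 2I - A
I_minus_A = lincomb(1, I3, -1, A)
B2 = matmul(I_minus_A, I_minus_A)     # B_2(A) = P_1(A) = (I - A)^2

print(B1)   # invertible on L_1, zero on L_2
print(B2)   # zero on L_1, invertible on L_2
```

In this example each $B_j(A)$ vanishes on the other block and is non-degenerate on its own block, which is the property the proof relies on.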

Applying formula (2.3) for the (1,1)-tensor $B(A) = B_j(A)$ (3.3), we find

$$\begin{aligned} N_{B_j(A)}(u, v) = {} & \sum_{m,l=1}^{n-n_j} b_{jm} b_{jl} \sum_{p<m,\,q<l} A^{m+l-p-q-2}\, N_A(A^p u, A^q v) \\ & + \sum_{m=0}^{n-n_j} \bigl[B_j(A)u(b_{jm})\,A^m v - B_j(A)v(b_{jm})\,A^m u - u(b_{jm})\,B_j(A)A^m v + v(b_{jm})\,B_j(A)A^m u\bigr]. \end{aligned} \tag{3.4}$$

For any vectors $u, v \in L_{ix}$, $i = 1, \dots, k$, formula (3.4) and conditions (3.1) imply

$$N_{B_j(A)}(L_{ix}, L_{ix}) \subset L_{ix}. \tag{3.5}$$

For vectors $u \in L_{ix}$ and $v \in L_{rx}$, where $i \neq r$ and $i, r \neq j$, we have $B_j(A)u = 0$, $B_j(A)v = 0$. Hence formula (3.4) and conditions (3.1) yield

$$N_{B_j(A)}(L_{ix}, L_{rx}) = 0, \qquad i \neq r, \quad i, r \neq j. \tag{3.6}$$

Let us consider the $(n - n_j)$-dimensional distribution

$$M_j = L_1 + \cdots + L_{j-1} + L_{j+1} + \cdots + L_k. \tag{3.7}$$

Equations (3.5), (3.6) and (3.7) yield

$$N_{B_j(A)}(M_j, M_j) \subset M_j. \tag{3.8}$$

By the definition of the (1,1)-tensor $B_j(A)$, all vector fields $v(x) \in M_{jx}$ are zero eigenvector fields of $B_j(A)$, $B_j(A)v = 0$, because the (1,1)-tensor $B_j(A)$ annihilates the distribution $M_j$. Applying formula (2.5) for the (1,1)-tensor $B_j(A)$ and arbitrary vector fields $v, w \in M_j$, we obtain

$$N_{B_j(A)}(v, w) = (B_j(A))^2 [v, w] \in M_j. \tag{3.9}$$

Since $L_{jx} \oplus M_{jx} = T_x(R^n)$ and the restriction of the operator $B_j(A)$ onto the invariant subspace $L_{jx}$ is non-degenerate, the generalized zero-eigenspace of the operator $B_j(A)$ is exactly the subspace $M_{jx}$ (3.7). Hence Eq. (3.9) yields $[v, w](x) \in M_{jx}$ and $N_{B_j(A)}(v, w) = 0$ for any vector fields $v(x), w(x) \in M_{jx}$ and almost everywhere for $x \in R^n$. Hence by continuity $[v, w](x) \in M_{jx}$ everywhere and the distribution $M_j$ is involutive. Applying the Frobenius theorem [15], we obtain that each point $x \in R^n$ belongs to an $(n - n_j)$-dimensional integral submanifold that is tangent to the linear subspaces $M_{jx}$. Hence there exist $n_j$ functionally independent functions $f_{j1}(x), \dots, f_{jn_j}(x)$ such that $df_{jm}(L_i) = 0$, $i \neq j$, $m = 1, \dots, n_j$, and the differentials $df_{jm}$ form a basis of the dual space $L_j^*$. Hence all differentials $df_{jm}$ for $j = 1, \dots, k$ and $m = 1, \dots, n_j$ form a basis of the dual space $T^*(R^n) = L_1^* \oplus \cdots \oplus L_k^*$. Therefore the $n$ functions $f_{jm}(x)$ form a system of local coordinates on the manifold $R^n$. The integrability of the distributions $M_j$ implies the integrability of each distribution $L_j$ because $L_j$ is the intersection of all distributions $M_\ell$ for $\ell \neq j$. The distribution $L_j$ is defined by the equations $df_{im}(L_j) = 0$, $i \neq j$; $i, j \in \{1, \dots, k\}$, $m = 1, \dots, n_i$. Hence the corresponding integral submanifolds of the distribution $L_j$ in the local coordinates $v^\ell = f_{im}$ are $n_j$-dimensional planes defined by the equations $f_{im}(x) = c_{im} = \text{const}$, $i \neq j$; $i, j \in \{1, \dots, k\}$, $m = 1, \dots, n_i$. For $i = j$, the coordinates $f_{jm}$, $m = 1, \dots, n_j$, are arbitrary on $L_j$. Since the subspaces $L_{jx}$ are $A$-invariant, the (1,1)-tensor $A^{\alpha}_{\beta}$ has the block-diagonal form in the local coordinates $f_{im}(x)$ with $(n_j \times n_j)$-dimensional blocks. This yields the block-diagonalization of the system (1.1) in the local coordinates $v^\ell = f_{im}$, $\ell = 1, \dots, n$.

Let us prove that in the coordinates $v^\ell = f_{im}$ the operator $A_j$ defined by the $(n_j \times n_j)$-dimensional diagonal block does not depend on the coordinates $f_{im}$, where $i \neq j$ and $m = 1, \dots, n_i$. Let $e_{im} = \partial/\partial f_{im}$ be the basis of unit vector fields in these coordinates. The vector fields are constant and hence commute with each other. For a generic point $\{f^0_{im}\} = x_0 \in R^n$, we consider the (1,1)-tensor

$$B_{0j}(A(x)) = \sum_{m=0}^{n-n_j} b_{jm}(x_0) A^m(x) \tag{3.10}$$

that is defined by the formulae (3.2)–(3.3), where the coefficients $b_{jm}(x_0)$ are constants evaluated at the point $x_0$. By definition, we have

$$B_{0j}(M_{jx_0}) = 0, \qquad B_{0j}(e_{im}(x_0)) = 0, \quad i \neq j, \tag{3.11}$$

and the restriction of the operator $B_{0j}(A(x_0))$ onto the invariant subspace $L_{jx_0}$ is non-degenerate because the eigenvalues of the operator $A$ on different invariant subspaces $L_i$ and $L_j$ are different. Since the operators $A(x)$ have block-diagonal form, the operator $B_{0j}(A)(x)$ in a neighborhood of the point $x_0$ has the following diagonal blocks in the basis $e_{im}$:

$$B_{0j}(e_{im}) = \sum_{\ell=1}^{n_i} B^{i\ell}_{im}(x)\, e_{i\ell}, \quad i \neq j, \qquad B_{0j}(e_{jp}) = \sum_{q=1}^{n_j} B^{jq}_{jp}(x)\, e_{jq}. \tag{3.12}$$

Equations (3.11) imply that at the point $x_0$ all entries $B^{i\ell}_{im}(x_0) = 0$ for $i \neq j$ and the $n_j \times n_j$ matrix $B^{jq}_{jp}(x_0)$ is non-degenerate, $p, q \in \{1, \dots, n_j\}$. Applying Lemma 1 and formula (2.3) to the (1,1)-tensor $B_{0j}(A)$ (3.10) with constant coefficients $b_{jm}(x_0)$, we obtain that the Nijenhuis tensor $N_{B_{0j}}(u, v)$ is connected with the Nijenhuis tensor $N_A(u, v)$ by the formula

$$N_{B_{0j}}(u, v) = \sum_{m,l=1}^{n-n_j} b_{jm}(x_0)\, b_{jl}(x_0) \sum_{p<m,\,q<l} A^{m+l-p-q-2}\, N_A(A^p u, A^q v).$$

Hence Eqs. (3.1) imply

$$N_{B_{0j}}(L_{ix}, L_{jx}) = 0, \qquad i \neq j. \tag{3.13}$$

Applying formula (2.1) to the coordinate vector fields $e_{im} \in L_i$, $i \neq j$, and $e_{jp} \in L_j$, we obtain

$$N_{B_{0j}}(e_{im}, e_{jp}) = B_{0j}^2[e_{im}, e_{jp}] + [B_{0j}e_{im}, B_{0j}e_{jp}] - B_{0j}[B_{0j}e_{im}, e_{jp}] - B_{0j}[e_{im}, B_{0j}e_{jp}].$$

Equation (3.13) gives $N_{B_{0j}}(e_{im}, e_{jp}) = 0$. Hence using Eqs. (3.11) and (3.12) we find at the point $x_0 = \{f^0_{im}\}$ (with summation over repeated indices):

$$\begin{aligned} 0 &= \bigl(B^{jq}_{jp}(x_0)\, e_{jq}\bigr)\bigl(B^{i\ell}_{im}\bigr)\, e_{i\ell} + e_{im}\bigl(B^{jq}_{jp}\bigr)\, B^{jr}_{jq}(x_0)\, e_{jr} \\ &= B^{jq}_{jp}(x_0)\, \frac{\partial B^{i\ell}_{im}}{\partial f_{jq}}(x_0)\, e_{i\ell} + \frac{\partial B^{jq}_{jp}}{\partial f_{im}}(x_0)\, B^{jr}_{jq}(x_0)\, e_{jr}. \end{aligned}$$

Considering the $e_{jr}$-components and using the non-degeneracy of the $n_j \times n_j$ matrix $B^{jr}_{jq}(x_0)$, we find

$$\frac{\partial B^{jq}_{jp}}{\partial f_{im}}(x_0) = 0. \tag{3.14}$$

Since $B^{jq}_{jp}$ are the entries of the matrix

$$B_{0j}(A_j) = \sum_{m=0}^{n-n_j} b_{jm}(x_0)\,(A_j)^m$$

and the $x$-independent matrix map $A_j \to B_{0j}(A_j)$ is a diffeomorphism almost everywhere, Eq. (3.14) yields

$$\frac{\partial (A_j)^q_p}{\partial f_{im}}(x_0) = 0 \tag{3.15}$$

also almost everywhere in the space of matrices $A_j$. Hence by continuity Eq. (3.15) is true everywhere. Since the point $x_0 = \{f^0_{im}\} \in R^n$ is generic in $R^n$, we find from (3.15) that the entries $(A_j)^q_p$ of the restriction of the matrix $A^{\alpha}_{\beta}$ onto the invariant subspace $L_j$ do not depend on the variables $f_{im}$, where $i \neq j$ and $m = 1, \dots, n_i$. Hence system (1.1) has the block-diagonal form (1.2) in the coordinates $v^\ell = f_{im}$ and is decoupled into $k$ non-interacting subsystems.

4. Necessary Conditions for the Decoupling

I. For any tangent vectors $u, v \in T_x(R^n)$, we define the operator $N_u$ by the formula $N_u(v) = N_A(u, v)$, where $N_A(u, v)$ is the Nijenhuis tensor.

Definition 1. For any two non-negative integers p ≠ q, we define the differential 2-form pq (u, v) = Tr (Nu A p Nv Aq − Nv A p Nu Aq ). (4.1)

For any non-negative integers p1 , . . . , pk and tangent vectors u 1 , . . . , u k , we introduce the differential k-forms sign(τ ) Tr Nu τ (1) A p1 · · · Nu τ (k) A pk , p1 ... pk (u 1 , · · · , u k ) = τ

ω p1 ... pk−1 (u 1 , · · · , u k ) =

τ

sign(τ )Nu τ (1) A p1 · · · Nu τ (k−1) A pk−1 u τ (k) ,

(4.2)

where summation is taken over all permutations τ of k symbols and k-forms ω p1 ··· pk−1 are vector-valued. Since ω p1 ··· pm−1 (u 1 , . . . , u m ) is a vector and d(Tr Ar ) is a differential 1-form, their contraction d(Tr Ar )ω p1 ··· pm−1 defines a differential m-form. To apply Theorem 1 to a concrete system (1.1) one has to find the invariant distributions L j that satisfy Eqs. (3.1) where L 1x ⊕ · · · ⊕ L kx = Tx (R n ). The existence or non-existence of such distributions can be investigated using the following criteria. We use the Haantjes tensor H A (u, v) [14] that is defined in terms of the Nijenhuis tensor N A (u, v) (2.1): H A (u, v) = A2 N (u, v) + N (Au, Av) − AN (Au, v) − AN (u, Av).

(4.3)

Necessary conditions for the decoupling of system (1.1) into k non- interacting subsystems of m (or less) equations: 1) All m-forms p1 ··· pm and d(Tr Ar )(ω p1 ··· pm−1 ) (4.2) should be closed, and all (m + 1)forms p1 ··· pm+1 and ω p1 ··· pm should vanish. 2) The polynomials PN (u, λ) = det(Nu − λ) and PH (u, λ) = det(Hu − λ) should have degree n − k in variables u and should be products of k factors of degrees ≤ m − 1. 3) If m = 3 then Haantjes tensor H A (u, v) (4.3) should define the Lie algebra structures in Tx (R n ) that are direct sums of the 3-dimensional Lie algebras and an abelian Lie algebra; the (1, 3)-tensor [3] B H (u, v, w) = H (H (u, v), w) + H (H (v, w), u) + H (H (w, u), v)

(4.4)

should vanish. For the (1, 3)-tensor [3] B N (u, v, w) = N (N (u, v), w) + N (N (v, w), u) + N (N (w, u), v),

(4.5)

the differential 3-forms p (u, v, w) = d Tr(A p )(B N (u, v, w)) should be closed, d p = 0. Proof. 1) If a system (1.1) is reduced to the non-interacting diagonal blocks of dimensions m × m or less than the differential forms p1 ... pm (u 1 , . . . , u m ) and ω p1 ... pm−1 (u 1 , . . . , u m ) (4.2) are direct sums of their projections onto the invariant subspaces L j x of dimensions ≤ m. Hence the m-forms p1 ··· pm and d(Tr Ar )(ω p1 ··· pm−1 ) are closed and the (m + 1)-forms p1 ··· pm+1 and ω p1 ··· pm vanish. 2) For any manifold M n , the polynomial PN (u, λ) = det(Nu − λ) has degree ≤ n − 1 as a function of the tangent vectors u ∈ Tx (M n ). Indeed, the operators Nu are linear in variables u and the maximal degrees in u appear in the free term det Nu . However the latter is zero because Nu (u) = N (u, u) = 0. If the system (1.1) is decoupled into k non-interacting subsystems of ≤ m equation each then the operators Nu have form of k diagonal blocks of dimensions n 1 , . . . , n k . Hence the polynomial PN (u, λ) is a product of k factors of degrees ≤ n 1 − 1, . . . , n k − 1 and its total degree in variables u is ≤ n − k

since n 1 + · · · + n k = n. Each factor has degree ≤ m − 1 because n j ≤ m. The same is true for the polynomial PH (u, λ). 3) For the 3-dimensional case, the Haantjes tensor H A (u, v) of any (1,1)-tensor Aij defines the Lie algebra structures in the tangent spaces Tx (R 3 ) and thus tensor B H (u, v, w) (4.4) vanishes [3, 13]. For a decoupled system (1.1) with m ≤ 3, the (1,3)-tensor B H (u, v, w) (4.4) and the differential 3-forms p are direct sums of the tensors and forms for each invariant distribution L j . Hence the necessary conditions for m = 3 follow. Using the necessary conditions, we arrive at the following: Corollary 1. If for some non-negative integers p1 , . . . , pn the n-form p1 ··· pn or ω p1 ··· pn−1 (4.2) is non-zero or if the polynomial PN (u, λ) = det(Nu −λ) has degree n −1 in variables u, then the system of pde’s (1.1) cannot be decoupled into non-interacting subsystems. Example 1. Let us consider Benney’s system [4] in the Zakharov form [16], u it = −u i u i x −

$\sum_{j=1}^{k} \eta_{jx}, \qquad \eta_{it} = -\eta_i u_{ix} - u_i \eta_{ix}. \qquad (4.6)$

Let $e_i = \partial/\partial u_i$, $h_i = \partial/\partial \eta_i$ be the basis tangent vectors. The (1,1)-tensor $A^i_j$ for system (4.6) has the form: $A(e_i) = -u_i e_i - \eta_i h_i$, $A(h_i) = -(e_1 + \cdots + e_k) - u_i h_i$. The corresponding Nijenhuis tensor is defined by the formulae

$$N(e_i, e_j) = 0, \qquad N(h_i, h_j) = h_j - h_i, \qquad N(e_i, h_j) = -e_i. \tag{4.7}$$

Let $v$ be a tangent vector $v = \sum_i (x_i e_i + y_i h_i)$. We define a 1-form $\psi(v) = y_1 + \cdots + y_k$. Formulae (4.7) yield

$$N(v, w) = \psi(v)w - \psi(w)v. \tag{4.8}$$

Formula (4.8) yields that the (1,3)-tensor $B_N$ (4.5) vanishes. Hence the Nijenhuis tensor (4.7)–(4.8) defines a structure of a Lie algebra in $T_u(R^{2k})$. The Lie algebra is solvable. Indeed, let $L \subset T_u = T_u(R^{2k})$ be the hyperplane defined by the equation $\psi(L) = 0$. Formula (4.8) implies $N(T_u, T_u) \subset L$ and $N(L, L) = 0$. Let $h = (h_1 + \cdots + h_k)/k$. Since $\psi(h) = 1$, any tangent vector $v$ has the form $v = \bar v + \psi(v)h$, where $\psi(\bar v) = 0$. Formula (4.8) implies $N_v \bar w = \psi(v)\bar w$ for any $\bar w \in L$, and $N_v h = \psi(v)h - v = -\bar v \in L$. Hence we find

$$P_N(v, \lambda) = \det(N_v - \lambda) = -\lambda[\psi(v) - \lambda]^{2k-1}. \tag{4.9}$$

Since the polynomial $P_N(v, \lambda)$ (4.9) has the maximal possible degree $n - 1$ in $v$, $n = 2k$, we obtain that the Benney system is not reducible into non-interacting subsystems. Any small perturbation of the Benney system cannot be decoupled either. Indeed, the polynomial $P_N(v, \lambda)$ is a continuous function of the system’s (1,1)-tensor $A^i_j(u)$. Hence for all small perturbations the polynomial has the maximal possible degree $2k - 1$ and the above criteria apply.
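Formulas (4.8) and (4.9) lend themselves to a direct check with exact arithmetic. The sketch below takes $k = 2$ and a few arbitrary integer vectors (all hypothetical sample data). It verifies that the cyclic sum (4.5) vanishes for the bracket (4.8) and that $\det(N_v - \lambda)$ agrees with $-\lambda[\psi(v) - \lambda]^{2k-1}$ at several values of $\lambda$.

```python
from fractions import Fraction

k = 2            # number of (u_i, eta_i) pairs; the tangent space has dimension n = 2k
n = 2 * k

def psi(v):
    """1-form psi: sum of the h-components, with basis order (e_1..e_k, h_1..h_k)."""
    return sum(v[k:])

def N(v, w):
    """Bracket (4.8): N(v, w) = psi(v) w - psi(w) v."""
    return [psi(v) * wi - psi(w) * vi for vi, wi in zip(v, w)]

def B_N(u, v, w):
    """Cyclic sum (4.5): N(N(u,v),w) + N(N(v,w),u) + N(N(w,u),v)."""
    a, b, c = N(N(u, v), w), N(N(v, w), u), N(N(w, u), v)
    return [x + y + z for x, y, z in zip(a, b, c)]

def det(M):
    """Determinant by exact Gaussian elimination over Fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    size, d = len(M), Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, size):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return d

def char_poly_value(v, lam):
    """det(N_v - lam*I), where column j of N_v is N(v, basis_j)."""
    basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    cols = [N(v, b) for b in basis]
    return det([[cols[j][i] - (lam if i == j else 0) for j in range(n)]
                for i in range(n)])

v = [1, -2, 3, 4]                       # arbitrary sample vector, psi(v) = 7
checks = [char_poly_value(v, lam) == -lam * (psi(v) - lam) ** (n - 1)
          for lam in range(-3, 4)]
jacobi = B_N([1, 0, 2, -1], [0, 3, 1, 1], [2, 1, -1, 5])
print(all(checks), jacobi)
```

Agreement at seven distinct values of $\lambda$ determines a polynomial of degree $2k = 4$ in $\lambda$ uniquely, so for this sample vector the check covers (4.9) as a polynomial identity.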

II. If the (1,1)-tensor Am (1.1) has complex and real distinct eigenvalues then evidently there exist invariant 2-dimensional and 1-dimensional distributions L j satisfying equation L 1x ⊕ · · · ⊕ L kx = Tx (R n ). If conditions (3.1) are satisfied then Theorem 1 implies that system (1.1) is reducible to the form (1.2) with non-interacting 2 × 2 and 1 × 1 blocks. However inside the 2 × 2 blocks a non-trivial interaction is realized generically. Necessary conditions for the decoupling of system (1.1) into k 2- dimensional and n − 2k 1- dimensional non- interacting subsystems: 1) The Haantjes tensor H ij should be zero, 2) The Nijenhuis tensor N ij should define a solvable Lie algebra structure in each tangent space Tx (R n ); the (1, 3)-tensor B N (u, v, w) (4.5) should vanish. The Lie algebras should be direct sums of k 2-dimensional solvable Lie algebras and (n−2k)-dimensional abelian Lie algebra R n−2k , 3) The quadratic forms (u, u) N p = Tr (A p Nu A p Nu ) should be semi-positive definite and in some basis should have the form (v, v) N p = v12 + · · · + v2 , where ≤ k. 4) The differential 2-forms ω p (u, v) = d Tr(A p )(N (u, v)) and pq (u, v) (4.1) should be closed, dω p = 0, d pq = 0, and should have rank ≤ 2k. The proof follows from the 2-dimensional case that is studied in [13]. The necessary conditions imply the following consequence. Corollary 2. If for some non-negative integers p and q the 2-forms ω p (u, v) = d Tr(A p )(N (u, v)) or pq (u, v) (4.1) are not closed, or if quadratic form (u, u) N p = Tr (A p Nu A p Nu ) has rank > [n/2], or if the Haantjes tensor H ijk is non-zero then system (1.1) cannot be decoupled into non-interacting 2- and 1-dimensional subsystems. Remark 3. In paper [17] we prove that the necessary and sufficient conditions for the local reducibility of a system (1.1) into a block-diagonal form with k mutually interacting blocks [1] have the form A(L i x ) ⊂ L i x ,

H A (L i x , L i x ) ⊂ L i x ,

H A (L i x , L j x ) ⊂ L i x + L j x ,

where H A (u, v) is the Haantjes tensor (4.3) and all notations are the same as in Theorem 1. Acknowledgements. The author thanks Peter Lax for the useful discussion of the paper at the Conference LN2006.

References

1. Courant, R., Hilbert, D.: Methods of Mathematical Physics, II. New York: Interscience Publishers, 1962
2. Nijenhuis, A.: X n−1 -forming sets of eigenvectors. Proc. Kon. Ned. Akad. Amsterdam 54, 200–212 (1951)
3. Bogoyavlenskij, O. I.: Courant problems and their extensions. In: Proceedings of VII International Conference on hyperbolic problems, International Series of Numerical Mathematics, Vol. 129, Basel: Birkhauser Verlag, pp. 97–104 (1999)
4. Benney, D. J.: Some properties of long non-linear waves. Stud. Appl. Math. 52, 42–50 (1973)
5. Newlander, A., Nirenberg, L.: Complex analytic coordinates in almost-complex manifolds. Ann. Math. 65, 391–404 (1957)
6. Gelfand, I. M., Dorfman, I. Ya.: The Schouten bracket and Hamiltonian operators. Funct. Anal. Appl. 14, 223–226 (1980)
7. Magri, F., Morosi, C.: A geometrical characterization of integrable Hamiltonian systems through the theory of the Poisson-Nijenhuis manifolds. Quaderno S/19, Universita di Milano (1984)
8. Magri, F.: A simple model of an integrable Hamiltonian system. J. Math. Phys. 19, 1156–1162 (1978)
9. Kosmann-Schwarzbach, Y., Magri, F.: Poisson-Nijenhuis structures. Ann. Inst. Henri Poincare 53, 35–81 (1990)
10. Magri, F., Morosi, C., Ragnisco, O.: Reduction techniques for infinite-dimensional Hamiltonian systems: some ideas and applications. Commun. Math. Phys. 99, 115–140 (1985)
11. Magri, F., Morosi, C., Tondo, G.: Nijenhuis G-manifold and Lenard bicomplexes: a new approach to KP systems. Commun. Math. Phys. 115, 457–475 (1988)
12. Stone, A. P.: Generalized conservation laws. Proc. Amer. Math. Soc. 18, 868–873 (1967)
13. Bogoyavlenskij, O. I.: Necessary conditions for existence of non-degenerate Hamiltonian structures. Commun. Math. Phys. 182, 253–290 (1996)
14. Haantjes, J.: On X m -forming sets of eigenvectors. Proc. Kon. Ned. Akad. Amsterdam 58, 158–162 (1955)
15. Marsden, J. E., Ratiu, T. S.: Introduction to Mechanics and Symmetry. New York: Springer Verlag, 1999
16. Zakharov, V. E.: Benney equations and quasiclassical approximation in the inverse problem method. Funkt. Anal. App. 14, 15–24 (1980)
17. Bogoyavlenskij, O. I.: Block-diagonalizability problem for hydrodynamic type systems. J. Math. Phys. 47, 023504 (2006)

Communicated by L. Takhtajan

Commun. Math. Phys. 269, 557–569 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0129-7

On the Lagrangian Dynamics for the 3D Incompressible Euler Equations

Dongho Chae

Department of Mathematics, Sungkyunkwan University, Suwon 440-746, Korea. E-mail: [email protected]

Received: 24 March 2006 / Accepted: 25 May 2006
Published online: 5 October 2006 – © Springer-Verlag 2006

Abstract: In this paper we study the dynamical behaviors along the particle trajectories for some quantities of the 3D inviscid incompressible fluids. We construct evolution equations satisfied by scalar quantities composed of the spectrum of the deformation tensor, the Hessian of the pressure and the direction field of the vorticity, and study the dichotomy between the finite time singularity and the long time behaviors of the various scalar quantities.

The work was supported partially by the KOSEF Grant no. R01-2005-000-10077-0.

1. Introduction

We are concerned with the following Euler equations for the homogeneous incompressible fluid flows in a domain $\Omega \subset \mathbb{R}^3$:

$$\frac{Dv}{Dt} = -\nabla p, \tag{1.1}$$
$$\operatorname{div} v = 0, \tag{1.2}$$
$$v(x, 0) = v_0(x), \tag{1.3}$$

where $D/Dt$ is the material derivative defined by

$$\frac{D}{Dt} = \frac{\partial}{\partial t} + (v \cdot \nabla).$$

Here $v = (v_1, v_2, v_3)$, $v_j = v_j(x, t)$, $j = 1, 2, 3$, is the velocity of the flow, $p = p(x, t)$ is the scalar pressure, and $v_0$ is the given initial velocity, satisfying $\operatorname{div} v_0 = 0$. Since our analysis is along the particle trajectory of the fluid, which is defined below, our study does not depend on the specific domain and the associated boundary conditions. The system was derived by L. Euler in 1755 ([13]). Given $m \in \mathbb{Z}_+ \cup \{0\}$, let $H^m(\Omega)$ be the standard Sobolev space defined by

$$H^m(\Omega) = \Bigl\{ f \in L^2(\Omega) \;\Big|\; \sum_{k=0}^{m} \int_\Omega |D^k f(x)|^2\, dx < \infty \Bigr\},$$

where the derivatives are in the sense of distributions. The local in time solution of the Euler equations in the Sobolev space $H^m(\mathbb{R}^n)$ for $m > n/2 + 1$, $n = 2, 3$ was obtained in [16, 23], and many authors subsequently derived local well-posedness results in various other function spaces (see e.g. [8] and the references therein). One of the most outstanding open problems for the Euler equations is whether or not there exists any smooth initial data, say $v_0 \in C_0^\infty(\mathbb{R}^3)$, which evolves in finite time into a blowing up solution. The answer to this question is tremendously important for the fundamental understanding of the physics of turbulence. Even the numerical results on this problem are not yet conclusive (see e.g. [17, 14, 15]). For very interesting discussions of the physical motivation of this problem and the interplay of multifaceted approaches to it from the points of view of mathematics, numerics and physics we refer to the articles [9, 21], or Chapter 5 of [22]. In this direction of study there is a celebrated criterion of the blow-up due to Beale, Kato and Majda (called the BKM criterion) [1], which states for $m > 5/2$,

$$\limsup_{t \nearrow T_*} \|v(t)\|_{H^m} = \infty \quad \text{if and only if} \quad \int_0^{T_*} \|\omega(t)\|_{L^\infty}\, dt = \infty, \tag{1.4}$$

where curl v = ω is the vorticity of the fluid. Subsequently many authors refined this result, replacing the $L^\infty$ norm of the vorticity by weaker norms close to the $L^\infty$ norm ([19, 18, 7, 6]), or reducing the number of components of the vorticity ([7]). There were also approaches to derive blow-up criteria of a geometric nature ([9, 10, 12]). We also mention that a particular scenario leading to singularities from the vortex tubes is excluded in [11], while another scenario leading to self-similar singularities was excluded recently in [2]. In this paper we approach the problem from the viewpoint of the dichotomy between the finite time singularity and the long time behaviors of the regular solutions. In other words, we try to answer the following question: Suppose the solution of the 3D Euler equations has no finite time singularity. Can we say anything about the long time dynamics of the regular solutions? If we could find some kind of absurdity in the hypothetical long time dynamics, then we might be able to find a way to resolve the finite time singularity problem. The results in this paper were obtained in the course of the author’s attempt to answer the above question, and they are not yet sufficient to solve the original problem. We can say something about the hypothetical long time regular dynamics at least for the quantities defined as $\Phi_j$, j = 1, 2, 3 below. These quantities arise naturally when we study the dichotomy between the long time behaviors of the regular solutions and the singularities originating from the vortex dynamics, the dynamics of the vortex stretching rate α and the spectral dynamics of the deformation tensor respectively. They are scalar quantities consisting of the deformation tensor, the Hessian of the pressure, the direction field of the vorticity and the spectrum of the deformation tensor. The dynamics is along the particle trajectories of the flow.
This type of lagrangian dynamics approach to the finite time singularity problem was previously studied by the author of this article in [3], and here we clarify, refine and further develop the results obtained there. We also try to localize the spectral dynamics of the deformation tensor. The ‘average spectral dynamics’ was studied in [4]. We note that the pointwise spectral dynamics of the velocity gradient matrix (not of the deformation tensor) for a model problem of the Euler equations was expounded by Liu and Tadmor in [20]. We introduce some notations to state our main theorems.

Given velocity $v(x, t)$ and pressure $p(x, t)$, we introduce the 3 × 3 matrices

$$V_{ij} = \frac{\partial v_j}{\partial x_i}, \qquad S_{ij} = \frac{V_{ij} + V_{ji}}{2}, \qquad A_{ij} = \frac{V_{ij} - V_{ji}}{2}, \qquad P_{ij} = \frac{\partial^2 p}{\partial x_i \partial x_j},$$

with $i, j = 1, 2, 3$. Then, we have the decomposition $V = (V_{ij}) = S + A$, where $S = (S_{ij})$ represents the deformation tensor of the fluid, and $A = (A_{ij})$ is related to the vorticity ω by the formula

$$A_{ij} = \frac{1}{2} \sum_{k=1}^{3} \varepsilon_{ijk}\, \omega_k, \qquad \omega_i = \sum_{j,k=1}^{3} \varepsilon_{ijk} A_{jk}, \tag{1.5}$$

where $\varepsilon_{ijk}$ is the skew-symmetric tensor with the normalization $\varepsilon_{123} = 1$. Note that $P = (P_{ij})$ is the Hessian of the pressure. We use the notion of the fluid particle trajectory (Lagrangian variable) as a main tool in this paper. The function $X(a, t)$, which is called the particle trajectory, is defined as the unique solution of the ordinary differential equations

$$\frac{dX(a, t)}{dt} = v(X(a, t), t); \qquad X(a, 0) = a \in \Omega,$$

where $v(x, t)$ is the classical solution of the system (1.1)–(1.3). Let $\{(\lambda_k, \eta_k)\}_{k=1}^{3}$ be the eigenvalues and the normalized eigenvectors of $S$. We set $\lambda = (\lambda_1, \lambda_2, \lambda_3)$, and

$$|\lambda| = \Bigl(\sum_{k=1}^{3} \lambda_k^2\Bigr)^{1/2}, \qquad \rho_k = \eta_k \cdot P\eta_k \quad \text{for } k = 1, 2, 3.$$
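The two relations in (1.5) are mutually inverse, and together they give $A^2 = \tfrac{1}{4}(\omega \otimes \omega - |\omega|^2 I)$, an identity that is convenient when the antisymmetric part of the velocity gradient is rewritten in terms of the vorticity. A small exact check, using an arbitrary sample vector for ω (hypothetical data) and 0-based indices, might look as follows.

```python
from fractions import Fraction

R = range(3)

def eps(i, j, k):
    """Levi-Civita symbol on indices 0, 1, 2, normalized so that eps(0, 1, 2) = 1."""
    return (i - j) * (j - k) * (k - i) // 2

omega = [3, -1, 2]   # arbitrary sample vorticity vector

# First relation in (1.5): A_ij = (1/2) * sum_k eps_ijk * omega_k
A = [[Fraction(sum(eps(i, j, k) * omega[k] for k in R), 2) for j in R] for i in R]

# Second relation in (1.5): omega_i = sum_{j,k} eps_ijk * A_jk
omega_back = [sum(eps(i, j, k) * A[j][k] for j in R for k in R) for i in R]

# A^2 compared with (omega (x) omega - |omega|^2 I) / 4
A2 = [[sum(A[i][k] * A[k][j] for k in R) for j in R] for i in R]
norm2 = sum(w * w for w in omega)
rhs = [[Fraction(omega[i] * omega[j] - (norm2 if i == j else 0), 4) for j in R]
       for i in R]

print(omega_back == omega, A2 == rhs)
```

The check also confirms that the matrix built from ω is antisymmetric, as it must be for the decomposition $V = S + A$.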

We also denote $\eta_k(x, 0) = \eta^0_k(x)$, $\lambda_k(x, 0) = \lambda_{k,0}(x)$, $\lambda(x, 0) = \lambda_0(x)$, $\rho_k(x, 0) = \rho_{k,0}(x)$ for the quantities at $t = 0$. Let $\omega(x, t) \neq 0$, and let $\xi(x, t) = \omega(x, t)/|\omega(x, t)|$ be the direction field of the vorticity. At such a point $(x, t)$ we define the scalar fields $\alpha = \xi \cdot S\xi$, $\rho = \xi \cdot P\xi$, where $S$ and $P$ are the deformation tensor and the Hessian of the pressure respectively, associated with the flow. At the points where $\omega(x, t) = 0$ we define $\alpha(x, t) = \rho(x, t) = 0$. We denote $\alpha_0(x) = \alpha(x, 0)$, $\rho_0(x) = \rho(x, 0)$.

Theorem 1.1 (Vortex dynamics). Let $v_0 \in H^m(\Omega)$, $m > 5/2$, be given. We define

$$\Phi_1(a, t) = \frac{\alpha(X(a, t), t)}{|\omega(X(a, t), t)|}$$

and

$$\Sigma_1(t) = \{a \in \Omega \mid \alpha(X(a, t), t) > 0\}$$

associated with the classical solution $v(x, t)$. Suppose $a \in \Sigma_1(0)$ and $\omega_0(a) \neq 0$. Then one of the following holds true:

(i) (finite time singularity) The solution of the Euler equations blows up in finite time along the trajectory $\{X(a, t)\}$.
(ii) (regular dynamics) One of the following holds true:
(a) (finite time extinction of α) There exists $t_1 \in (0, \infty)$ such that $\alpha(X(a, t_1), t_1) = 0$.
(b) (long time behavior of $\Phi_1$) There exists an infinite sequence $\{t_j\}_{j=1}^{\infty}$ with $t_1 < t_2 < \cdots < t_j < t_{j+1} \to \infty$ as $j \to \infty$ such that for all $j = 1, 2, \dots$ we have $\Phi_1(a, 0) > \Phi_1(a, t_1) > \cdots > \Phi_1(a, t_j) > \Phi_1(a, t_{j+1}) > 0$ and $\Phi_1(a, t) \geq \Phi_1(a, t_j) > 0$ for all $t \in [0, t_j]$.

Theorem 1.2 (Dynamics of α). Let $v_0 \in H^m(\Omega)$, $m > 5/2$, be given. In case $\alpha(X(a, t), t) \neq 0$ we define

$$\Phi_2(a, t) = \frac{|\xi \times S\xi|^2(X(a, t), t) - \rho(X(a, t), t)}{\alpha^2(X(a, t), t)},$$

and

$$\Sigma_2^+(t) = \{a \in \Omega \mid \alpha(X(a, t), t) > 0,\ \Phi_2(a, t) > 1\},$$
$$\Sigma_2^-(t) = \{a \in \Omega \mid \alpha(X(a, t), t) < 0,\ \Phi_2(a, t) < 1\},$$

associated with $v(x, t)$. Suppose $a \in \Sigma_2^+(0) \cup \Sigma_2^-(0)$. Then one of the following holds true.
(i) (finite time singularity) The solution of the Euler equations blows up in finite time along the trajectory $\{X(a, t)\}$.
(ii) (regular dynamics) One of the following holds true:
(a) (finite time extinction of α) There exists $t_1 \in (0, \infty)$ such that $\alpha(X(a, t_1), t_1) = 0$.
(b) (long time behaviors of $\Phi_2$) Either there exists $T_1 \in (0, \infty)$ such that $\Phi_2(a, T_1) = 1$, or there exists an infinite sequence $\{t_j\}_{j=1}^{\infty}$ with $t_1 < t_2 < \cdots < t_j < t_{j+1} \to \infty$ as $j \to \infty$ such that one of the following holds:
(b.1) In the case $a \in \Sigma_2^+(0)$, for all $j = 1, 2, \dots$ we have $\Phi_2(a, 0) > \Phi_2(a, t_1) > \cdots > \Phi_2(a, t_j) > \Phi_2(a, t_{j+1}) > 1$ and $\Phi_2(a, t) \geq \Phi_2(a, t_j) > 1$ for all $t \in [0, t_j]$.
(b.2) In the case $a \in \Sigma_2^-(0)$, for all $j = 1, 2, \dots$ we have $\Phi_2(a, 0) < \Phi_2(a, t_1) < \cdots < \Phi_2(a, t_j) < \Phi_2(a, t_{j+1}) < 1$ and $\Phi_2(a, t) \leq \Phi_2(a, t_j) < 1$ for all $t \in [0, t_j]$.

Theorem 1.3 (Spectral dynamics). Let $v_0 \in H^m(\Omega)$, $m > 5/2$, be given. In case $\lambda(X(a, t), t) \neq 0$ we define

$$\Phi_3(a, t) = \frac{\sum_{k=1}^{3} \bigl[-\lambda_k^3 + \tfrac{1}{4}|\eta_k \times \omega|^2 \lambda_k - \rho_k \lambda_k\bigr](X(a, t), t)}{|\lambda(X(a, t), t)|^3},$$

and

$$\Sigma_3(t) = \{a \in \Omega \mid \lambda(X(a, t), t) \neq 0,\ \Phi_3(a, t) > 0\}$$

associated with $v(x, t)$. Suppose $a \in \Sigma_3(0)$. Then one of the following holds true:
(i) (finite time singularity) The solution of the Euler equations blows up in finite time along the trajectory $\{X(a, t)\}$.
(ii) (regular dynamics) One of the following holds true:

(a) (finite time extinction of λ) There exists $t_1 \in (0, \infty)$ such that $\lambda(X(a, t_1), t_1) = 0$.
(b) (long time behavior of $\Phi_3$) Either there exists $T_1 \in (0, \infty)$ such that $\Phi_3(a, T_1) = 0$, or there exists an infinite sequence $\{t_j\}_{j=1}^{\infty}$ with $t_1 < t_2 < \cdots < t_j < t_{j+1} \to \infty$ as $j \to \infty$ such that for all $j = 1, 2, \dots$ we have $\Phi_3(a, 0) > \Phi_3(a, t_1) > \cdots > \Phi_3(a, t_j) > \Phi_3(a, t_{j+1}) > 0$ and $\Phi_3(a, t) \geq \Phi_3(a, t_j) > 0$ for all $t \in [0, t_j]$.

2. Proof of the Main Theorems

Taking $\partial/\partial x_k$ of (1.1) yields

$$\frac{DV}{Dt} = -V^2 - P. \tag{2.1}$$

The symmetric part of (2.1) is

$$\frac{DS}{Dt} = -S^2 - A^2 - P, \tag{2.2}$$

from which, using the formula (1.5), we derive

$$\frac{DS}{Dt} = -S^2 + \frac{1}{4}\bigl(|\omega|^2 I - \omega \otimes \omega\bigr) - P, \tag{2.3}$$

where $I$ is the 3 × 3 unit matrix, and $(\omega \otimes \omega)_{ij} = \omega_i \omega_j$. The antisymmetric part of (2.1) is

$$\frac{DA}{Dt} = -SA - AS, \tag{2.4}$$

from which, using the formula (1.5) again, we easily obtain

$$\frac{D\omega}{Dt} = S\omega, \tag{2.5}$$

which is the well-known vorticity evolution equation. Note that from (2.5) we immediately have

$$\frac{D|\omega|}{Dt} = \alpha|\omega|, \tag{2.6}$$

which was derived previously in [10]. Below we denote

$$f'(X(a, t), t) = \frac{Df}{Dt}(X(a, t), t)$$

for simplicity.


2.1. The vortex dynamics.

Lemma 2.1. Suppose $\alpha_0(a) > 0$, and there exists $\varepsilon > 0$ such that
\[ \alpha_0(a)|\omega_0(a)| \ge \varepsilon|\omega_0(a)|^2. \tag{2.7} \]
Let us set
\[ T_* = \frac{1}{\varepsilon|\omega_0(a)|}. \tag{2.8} \]
Then, either the vorticity blows up no later than $T_*$, or there exists $t \in (0,T_*)$ such that
\[ \alpha(X(a,t),t)|\omega(X(a,t),t)| < \varepsilon|\omega(X(a,t),t)|^2. \tag{2.9} \]

Proof. Suppose that there is no blow-up of the solution on $[0,T_*]$, and the inequality
\[ \alpha(X(a,t),t)|\omega(X(a,t),t)| \ge \varepsilon|\omega(X(a,t),t)|^2 \tag{2.10} \]
persists on $[0,T_*]$. We will see that this leads to a contradiction. Combining (2.10) with (2.6), we have $|\omega|' \ge \varepsilon|\omega|^2$. Hence, by Gronwall's lemma, we obtain
\[ |\omega(X(a,t),t)| \ge \frac{|\omega_0(a)|}{1 - \varepsilon|\omega_0(a)|t}. \]
Thus we are led to $\limsup_{t \nearrow T_*}|\omega(X(a,t),t)| = \infty$, which contradicts the assumption that there is no blow-up on $[0,T_*]$.
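The comparison argument in this proof can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it integrates the borderline equation $y' = \varepsilon y^2$ (the equality case of the differential inequality for $|\omega|$) with a classical Runge-Kutta scheme and compares the result with the closed form $y_0/(1 - \varepsilon y_0 t)$, which blows up at $t = 1/(\varepsilon y_0)$. The function names and the values of `y0` and `eps` are hypothetical choices made only for illustration.

```python
def closed_form(y0, eps, t):
    """Solution of y' = eps * y**2, y(0) = y0 > 0, valid for t < 1 / (eps * y0)."""
    return y0 / (1.0 - eps * y0 * t)


def rk4_riccati(y0, eps, t_end, steps=20000):
    """Integrate y' = eps * y**2 on [0, t_end] with the classical RK4 scheme."""
    h = t_end / steps
    y = y0
    for _ in range(steps):
        k1 = eps * y * y
        k2 = eps * (y + 0.5 * h * k1) ** 2
        k3 = eps * (y + 0.5 * h * k2) ** 2
        k4 = eps * (y + h * k3) ** 2
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y


if __name__ == "__main__":
    y0, eps = 2.0, 0.5          # hypothetical stand-ins for |omega_0(a)| and epsilon
    t_star = 1.0 / (eps * y0)   # blow-up time of the comparison solution
    for frac in (0.5, 0.9, 0.99):
        t = frac * t_star
        print(f"t/T* = {frac}: numeric {rk4_riccati(y0, eps, t):.6f}, "
              f"closed form {closed_form(y0, eps, t):.6f}")
```

Any quantity satisfying the inequality with the same initial value stays above this comparison solution for as long as it remains regular, which is what forces the growth along the trajectory.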

Proof of Theorem 1.1. We first observe that the formula
\[ |\omega(X(a,t),t)| = \exp\left(\int_0^t \alpha(X(a,s),s)\,ds\right)|\omega_0(a)|, \]
which is obtained from (2.6), shows that $\omega(X(a,t),t) \neq 0$ if and only if $\omega_0(a) \neq 0$ for the particle trajectory $\{X(a,t)\}$ of the classical solution $v(x,t)$ of the Euler equations. Choosing $\varepsilon = \alpha_0(a)/|\omega_0(a)|$ in Lemma 2.1, we see that either the vorticity blows up no later than $T_* = 1/\alpha_0(a)$, or there exists $t_1 \in (0,T_*)$ such that
\[ \Phi_1(a,t_1) = \frac{\alpha(X(a,t_1),t_1)}{|\omega(X(a,t_1),t_1)|} < \frac{\alpha_0(a)}{|\omega_0(a)|} = \Phi_1(a,0). \]
Under the hypothesis that (i) and (ii)-(a) do not hold true, we may assume $a \in \Sigma_1(t_1)$ and repeat the above argument to find $t_2 > t_1$ such that $\Phi_1(a,t_2) < \Phi_1(a,t_1)$, and also $a \in \Sigma_1(t_2)$. Iterating the argument, we find a monotone increasing sequence $\{t_j\}_{j=1}^{\infty}$ such that $\Phi_1(a,t_j) > \Phi_1(a,t_{j+1})$ for all $j = 1, 2, 3, \ldots$. In particular we can choose each $t_j$ so that $\Phi_1(a,t) \ge \Phi_1(a,t_j)$ for all $t \in (t_{j-1},t_j]$. If $t_j \to t_\infty < \infty$ as $j \to \infty$, then we can proceed further to have $t_* > t_\infty$ such that $\Phi_1(a,t_\infty) > \Phi_1(a,t_*)$. Hence, we may set $t_\infty = \infty$.


2.2. The dynamics of $\alpha$. In order to prove Theorem 1.2 we establish the following two lemmas.

Lemma 2.2. Suppose $\alpha_0(a) > 0$, and there exists $\varepsilon > 0$ such that
\[ \rho_0(a) + (1+\varepsilon)\alpha_0^2(a) \le |\xi_0 \times S_0\xi_0|^2(a). \tag{2.11} \]
Let us set
\[ T_* = \frac{1}{\varepsilon\alpha_0(a)}. \tag{2.12} \]
Then, either the solution blows up no later than $T_*$, or there exists $t \in (0,T_*)$ such that
\[ \rho(X(a,t),t) + (1+\varepsilon)\alpha^2(X(a,t),t) > |\xi \times S\xi|^2(X(a,t),t). \tag{2.13} \]

Lemma 2.3. Suppose $\alpha_0(a) < 0$, and there exists $\varepsilon > 0$ such that
\[ \rho_0(a) + (1-\varepsilon)\alpha_0^2(a) \ge |\xi_0 \times S_0\xi_0|^2(a). \tag{2.14} \]
We set
\[ T_* = -\frac{1}{\varepsilon\alpha_0(a)}. \tag{2.15} \]
Then, either the solution blows up no later than $T_*$, or there exists $t \in (0,T_*)$ such that
\[ \rho(X(a,t),t) + (1-\varepsilon)\alpha^2(X(a,t),t) < |\xi \times S\xi|^2(X(a,t),t). \tag{2.16} \]

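Proof of Lemma 2.2. From (2.5) and (2.6) we have
\[ \xi' = \frac{\omega'}{|\omega|} - \frac{\omega|\omega|'}{|\omega|^2} = S\xi - \alpha\xi. \]
Hence, using (2.3), we derive
\begin{align*}
\alpha' &= \xi \cdot S'\xi + 2\xi' \cdot S\xi \\
&= \xi \cdot \left[-S^2 + \frac14\left(|\omega|^2 I - \omega \otimes \omega\right) - P\right]\xi + 2|S\xi|^2 - 2\alpha\,\xi \cdot S\xi \\
&= -|S\xi|^2 - \rho + 2|S\xi|^2 - 2\alpha^2 \\
&= -\alpha^2 + \left(|S\xi|^2 - \alpha^2\right) - \rho \\
&= -\alpha^2 + |\xi \times S\xi|^2 - \rho. \tag{2.17}
\end{align*}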
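The last step of (2.17) uses the Lagrange identity $|\xi \times S\xi|^2 = |S\xi|^2 - (\xi \cdot S\xi)^2$ for a unit vector $\xi$. As a sanity check only, the sketch below verifies this identity on randomly generated data; the helper names and the random sampling are illustrative assumptions, and the matrices are arbitrary symmetric matrices rather than deformation tensors of an actual Euler flow.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def matvec(S, x):
    return [dot(row, x) for row in S]


def identity_gap(S, xi):
    """Return |S xi|^2 - alpha^2 - |xi x S xi|^2, where alpha = xi . S xi and |xi| = 1."""
    s_xi = matvec(S, xi)
    alpha = dot(xi, s_xi)
    c = cross(xi, s_xi)
    return dot(s_xi, s_xi) - alpha ** 2 - dot(c, c)


def random_case(rng):
    """Random symmetric 3 x 3 matrix S and random unit vector xi (illustrative data)."""
    a = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
    S = [[0.5 * (a[i][j] + a[j][i]) for j in range(3)] for i in range(3)]
    w = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(dot(w, w))
    return S, [c / norm for c in w]


if __name__ == "__main__":
    rng = random.Random(0)
    worst = max(abs(identity_gap(*random_case(rng))) for _ in range(1000))
    print("largest deviation from the identity:", worst)
```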

Now suppose that there is no blow-up of the solution on $[0,T_*]$, and the inequality
\[ \rho(X(a,t),t) + (1+\varepsilon)\alpha^2(X(a,t),t) \le |\xi \times S\xi|^2(X(a,t),t) \tag{2.18} \]
persists on $[0,T_*]$. We will show that this leads to a contradiction. Combining (2.18) with (2.17), we have
\[ \alpha' \ge \varepsilon\alpha^2 \quad \text{on } [0,T_*], \]
which leads to
\[ \alpha(X(a,t),t) \ge \frac{\alpha_0(a)}{1 - \varepsilon\alpha_0(a)t}. \tag{2.19} \]


Hence, we obtain $\limsup_{t \nearrow T_*}\alpha(X(a,t),t) = \infty$, and by the Sobolev inequality,
\[ |\alpha(X(a,t),t)| \le |S(X(a,t),t)| \le \|\nabla v(\cdot,t)\|_{L^\infty} \le C\|v(\cdot,t)\|_{H^m} \]
for $m > 5/2$, we have $\limsup_{t \nearrow T_*}\|v(\cdot,t)\|_{H^m} = \infty$. Hence, either the solution of the Euler equations blows up on $[0,T_*]$ and the formal computations leading to (2.19) are not valid, or there exists $t \in (0,T_*]$ at which the inequality (2.18) does not hold true.

Proof of Lemma 2.3. Since the proof is similar to the above one, we will be brief. Suppose that there is no blow-up of the solution on $[0,T_*]$, and the inequality
\[ \rho(X(a,t),t) + (1-\varepsilon)\alpha^2(X(a,t),t) \ge |\xi \times S\xi|^2(X(a,t),t) \tag{2.20} \]
persists on $[0,T_*]$ with $T_* = -1/(\varepsilon\alpha_0(a))$. We will show that this leads to a contradiction. Combining (2.20) with (2.17), we have
\[ \alpha' \le -\varepsilon\alpha^2 \quad \text{on } [0,T_*], \]
which leads to
\[ \alpha(X(a,t),t) \le \frac{\alpha_0(a)}{1 + \alpha_0(a)\varepsilon t}. \tag{2.21} \]
Hence, we have $\liminf_{t \nearrow T_*}\alpha(X(a,t),t) = -\infty$, and the remaining part of the proof is the same as that of Lemma 2.2.

Lemma 2.4. Suppose $\alpha_0(a) > 0$, and
\[ \rho_0(a) + \alpha_0^2(a) < |\xi_0 \times S_0\xi_0|^2(a). \tag{2.22} \]
Let us set
\[ T_* = \frac{\alpha_0(a)}{|\xi_0 \times S_0\xi_0|^2(a) - \alpha_0^2(a) - \rho_0(a)}. \tag{2.23} \]
Then, either the solution blows up no later than $T_*$, or there exists $t \in (0,T_*)$ such that $\Phi_2(a,t) < \Phi_2(a,0)$.

Proof. We choose
\[ \varepsilon = \frac{|\xi_0 \times S_0\xi_0|^2(a) - \rho_0(a)}{\alpha_0^2(a)} - 1 \]
in Lemma 2.2. Then (2.11) is satisfied with such choice of $\varepsilon$, and (2.13) is equivalent to $\Phi_2(a,t) < \Phi_2(a,0)$. Hence, the conclusion of Lemma 2.4 follows from Lemma 2.2.

Similarly we have

Lemma 2.5. Suppose $\alpha_0(a) < 0$, and
\[ \rho_0(a) + \alpha_0^2(a) > |\xi_0 \times S_0\xi_0|^2(a). \tag{2.24} \]
Let us set $T_*$ as in (2.23). Then, either the solution blows up no later than $T_*$, or there exists $t \in (0,T_*)$ such that $\Phi_2(a,t) > \Phi_2(a,0)$.


Proof. We choose
\[ \varepsilon = \frac{\rho_0(a) - |\xi_0 \times S_0\xi_0|^2(a)}{\alpha_0^2(a)} + 1 \]
in Lemma 2.3. Then (2.14) is satisfied with such choice of $\varepsilon$, and (2.16) is equivalent to $\Phi_2(a,t) > \Phi_2(a,0)$. Hence, the conclusion of Lemma 2.5 follows from Lemma 2.3.

Proof of Theorem 1.2. Suppose both (i) and (ii)-(a) do not hold true. We prove that the case (ii)-(b.1) holds true. The proof of the other case (ii)-(b.2) is similar. We first recall that $\Phi_2(a,0) > 1$ is equivalent to (2.22). Applying Lemma 2.4 in this case, we find that there exists $t_1$ such that $\Phi_2(a,0) > \Phi_2(a,t_1)$. If $\Phi_2(a,t_1) \le 1$, then we are done by continuity of the mapping $t \mapsto \Phi_2(a,t)$. Otherwise, by applying Lemma 2.4 again, there exists $t_2 > t_1$ such that $\Phi_2(X(a,t_1),t_1) > \Phi_2(X(a,t_2),t_2)$. We repeat the argument until the value of $\Phi_2(a,t)$ reaches the value 1 after finitely many steps. Otherwise, we find a monotone increasing sequence $\{t_j\}_{j=1}^{\infty}$ such that $\Phi_2(a,t_j) > \Phi_2(a,t_{j+1})$ for all $j = 1, 2, 3, \ldots$. In particular we can choose each $t_j$ so that $\Phi_2(a,t) \ge \Phi_2(a,t_j)$ for all $t \in (t_{j-1},t_j]$. If $t_j \to t_\infty < \infty$ as $j \to \infty$, and $\Phi_2(a,t_\infty) > 1$, then we can proceed further to have $t_* > t_\infty$ such that $\Phi_2(a,t_\infty) > \Phi_2(a,t_*)$. Hence either $t_\infty < \infty$ and $\Phi_2(a,t_\infty) = 1$, or $t_\infty = \infty$. The former case corresponds to $\Phi_2(a,t)$ reaching the value 1 in finite time.

2.3. The spectral dynamics. In this section we prove Theorem 1.3, starting from the following lemma.

Lemma 2.6. We assume $\lambda_0(a) \neq 0$. Suppose there exists $\varepsilon > 0$ such that
\[ \sum_{k=1}^{3}\left[-\lambda_{k,0}^3 + \frac14\left|\eta_0^k \times \omega_0\right|^2\lambda_{k,0} - \rho_{k,0}\lambda_{k,0}\right](a) \ge \varepsilon|\lambda_0(a)|^3. \tag{2.25} \]
Let us set
\[ T_1 = \frac{1}{\varepsilon|\lambda_0(a)|}. \tag{2.26} \]
Then, either the solution blows up no later than $T_1$, or there exists $t \in (0,T_1)$ such that
\[ \sum_{k=1}^{3}\left[-\lambda_k^3 + \frac14\left|\eta_k \times \omega\right|^2\lambda_k - \rho_k\lambda_k\right](X(a,t),t) < \varepsilon|\lambda(X(a,t),t)|^3, \tag{2.27} \]
if $\lambda(X(a,t),t) \neq 0$ for all $t \in (0,T_1)$.

Proof. Taking the derivative $D/Dt$ of the eigenvalue-eigenvector relation $S\eta_k = \lambda_k\eta_k$, we have
\[ S'\eta_k + S(\eta_k)' = \lambda_k'\eta_k + \lambda_k(\eta_k)'. \]
Substituting (2.3) into this equation, and then taking the inner product with $\eta_k$, we have
\[ \eta_k \cdot \left[-S^2 + \frac14\left(|\omega|^2 I - \omega \otimes \omega\right) - P\right]\eta_k + \eta_k \cdot S(\eta_k)' = \lambda_k'\,\eta_k \cdot \eta_k + \lambda_k\,\eta_k \cdot (\eta_k)'. \]


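Since $(\eta_k \cdot \eta_k)' = 0$, the right-hand side is equal to $\lambda_k'$, while the left-hand side is
\[ -\lambda_k^2 + \frac14\left[|\omega|^2 - (\omega \cdot \eta_k)^2\right] - \rho_k = -\lambda_k^2 + \frac14|\eta_k \times \omega|^2 - \rho_k, \]
where we used the fact $\eta_k \cdot S(\eta_k)' = \lambda_k\,\eta_k \cdot (\eta_k)' = 0$. Hence, we obtain
\[ \lambda_k' = -\lambda_k^2 + \frac14|\eta_k \times \omega|^2 - \rho_k, \quad k = 1, 2, 3. \tag{2.28} \]
Multiplying (2.28) by $\lambda_k$ and summing over $k \in \{1, 2, 3\}$, we have
\[ \frac12\frac{D}{Dt}|\lambda|^2 = \sum_{k=1}^{3}\left[-\lambda_k^3 + \frac14|\eta_k \times \omega|^2\lambda_k - \rho_k\lambda_k\right]. \tag{2.29} \]
Now, let us suppose that we have no singularity on $[0,T_1]$ and the inequality
\[ \sum_{k=1}^{3}\left[-\lambda_k^3 + \frac14|\eta_k \times \omega|^2\lambda_k - \rho_k\lambda_k\right](X(a,t),t) \ge \varepsilon|\lambda(X(a,t),t)|^3 \tag{2.30} \]
persists for all $t \in [0,T_1]$. Similarly to the previous proofs we will see below that this leads to a contradiction. Combining (2.30) with (2.29), we obtain that $|\lambda|' \ge \varepsilon|\lambda|^2$ for all $t \in [0,T_1]$, and from this we have
\[ |\lambda(X(a,t),t)| \ge \frac{|\lambda_0(a)|}{1 - \varepsilon|\lambda_0(a)|t} \quad \forall t \in [0,T_1]. \]
Hence, $\limsup_{t \nearrow T_1}|\lambda(X(a,t),t)| = \infty$. This, combined with the estimates
\[ |\lambda(X(a,t),t)| \le 3\max_{1 \le k \le 3}|\lambda_k(X(a,t),t)| = 3\max_{1 \le k \le 3}|\eta_k \cdot S\eta_k(X(a,t),t)| \le 3|S(X(a,t),t)| \le 3\|\nabla v(\cdot,t)\|_{L^\infty} \le C\|v(\cdot,t)\|_{H^m} \]
for $m > 5/2$, provides us with $\limsup_{t \nearrow T_1}\|v(\cdot,t)\|_{H^m} = \infty$, which is the desired contradiction.

Using the above lemma we establish the following.

Lemma 2.7. Let us assume $\lambda_0(a) \neq 0$. Suppose
\[ \sum_{k=1}^{3}\left[-\lambda_{k,0}^3 + \frac14\left|\eta_0^k \times \omega_0\right|^2\lambda_{k,0} - \rho_{k,0}\lambda_{k,0}\right](a) > 0. \tag{2.31} \]
We set
\[ T_1 = \frac{|\lambda_0(a)|^2}{\sum_{k=1}^{3}\left[-\lambda_{k,0}^3 + \frac14\left|\eta_0^k \times \omega_0\right|^2\lambda_{k,0} - \rho_{k,0}\lambda_{k,0}\right](a)}. \tag{2.32} \]
Then, either the solution blows up no later than $T_1$, or there exists $t \in (0,T_1)$ such that $\Phi_3(a,t) < \Phi_3(a,0)$ if $\lambda(X(a,t),t) \neq 0$ for all $t \in (0,T_1)$.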


Proof. We set
\[ \varepsilon = \frac{\sum_{k=1}^{3}\left[-\lambda_{k,0}^3 + \frac14\left|\eta_0^k \times \omega_0\right|^2\lambda_{k,0} - \rho_{k,0}\lambda_{k,0}\right](a)}{|\lambda_0(a)|^3} \]
in Lemma 2.6. Then (2.25) is satisfied, and (2.27) is equivalent to $\Phi_3(a,t) < \Phi_3(a,0)$. Hence, the conclusion of Lemma 2.7 follows from Lemma 2.6.

Proof of Theorem 1.3. Although the proof is similar to that of Theorem 1.2, we present it here for the reader's convenience. Suppose both (i) and (ii)-(a) do not hold true. We prove that the case (ii)-(b) holds true. We first recall that $\Phi_3(a,0) > 0$ is equivalent to (2.31). Applying Lemma 2.7, we find that there exists $t_1$ such that $\Phi_3(a,0) > \Phi_3(a,t_1)$. If $\Phi_3(a,t_1) \le 0$, then we are done. Otherwise, applying Lemma 2.7 again, there exists $t_2 > t_1$ such that $\Phi_3(X(a,t_1),t_1) > \Phi_3(X(a,t_2),t_2)$. We repeat the argument until the value of $\Phi_3(a,t)$ reaches the value 0 after finitely many steps. Otherwise, we find a monotone increasing sequence $\{t_j\}_{j=1}^{\infty}$ such that $\Phi_3(a,t_j) > \Phi_3(a,t_{j+1})$ for all $j = 1, 2, 3, \ldots$. In particular we can choose each $t_j$ so that $\Phi_3(a,t) \ge \Phi_3(a,t_j)$ for all $t \in (t_{j-1},t_j]$. If $t_j \to t_\infty < \infty$ as $j \to \infty$, and $\Phi_3(a,t_\infty) > 0$, then we can proceed further to have $t_* > t_\infty$ such that $\Phi_3(a,t_\infty) > \Phi_3(a,t_*)$. Hence either $t_\infty < \infty$ and $\Phi_3(a,t_\infty) = 0$, or $t_\infty = \infty$. The former case corresponds to $\Phi_3(a,t)$ reaching the value 0 in finite time.

A. Addendum

Here we present a refinement of Theorem 2.1 in [3], which also gives a different formulation and proof of Lemma 2.2.

Theorem A.1. Let $v_0 \in H^m(\Omega)$, $m > 5/2$, be given. For such $v_0$ let us define a set $\tilde{S} \subset \Omega$ by
\[ \tilde{S} = \{a \in \Omega \mid \alpha_0(a) > 0,\ \omega_0(a) \neq 0,\ \exists\, \varepsilon \in (0,1) \text{ such that } \rho_0(a) + 2\alpha_0^2(a) - |\xi_0 \times S_0\xi_0|^2(a) \le (1-\varepsilon)^2\alpha_0^2(a)\}. \]
Let us set
\[ T_* = \frac{1}{\varepsilon\alpha_0(a)}. \tag{A.1} \]
Then, either the solution blows up no later than $T_*$, or there exists $t \in (0,T_*)$ such that
\[ \rho(X(a,t),t) + 2\alpha^2(X(a,t),t) - |\xi \times S\xi|^2(X(a,t),t) > (1-\varepsilon)^2\alpha^2(X(a,t),t). \tag{A.2} \]

Remark. We note that if we ignore the term $|\xi_0 \times S_0\xi_0|^2(a)$, then we have the condition $\rho_0(a) + \alpha_0^2(a) \le (-2\varepsilon + \varepsilon^2)\alpha_0^2(a) < 0$, since $\varepsilon \in (0,1)$. Thus $\tilde{S} \subset S$, where $S$ is the set defined in Theorem 2.1 of [3]. We also observe that the condition on the point $a$ in $\tilde{S}$ is actually equivalent to (2.11) in Lemma 2.2, after redefining $\varepsilon$.


Proof of Theorem A.1. As in the proof of Theorem 1.1 we have $\omega(X(a,t),t) \neq 0$ if and only if $\omega_0(a) \neq 0$. Hence the function $\Psi := 1/|\omega|$ is well defined along the particle trajectories. We compute
\[ \Psi'' = -\left(\frac{\alpha}{|\omega|}\right)' = -\frac{\alpha'}{|\omega|} + \frac{\alpha^2}{|\omega|} = \Psi\left(2\alpha^2 - |\xi \times S\xi|^2 + \rho\right). \]
Now let us suppose that we have persistence of the regularity of the solution on $[0,T_*]$, and the inequality
\[ \rho(X(a,t),t) + 2\alpha^2(X(a,t),t) - |\xi \times S\xi|^2(X(a,t),t) \le (1-\varepsilon)^2\alpha^2(X(a,t),t) \]
holds for $t \in [0,T_*]$. We will show that this hypothesis leads to a contradiction. We have
\[ \Psi'' = \Psi\left(2\alpha^2 - |\xi \times S\xi|^2 + \rho\right) \le (1-\varepsilon)^2\alpha_0(a)^2\,\Psi. \tag{A.3} \]
Setting $h = (1-\varepsilon)\alpha_0(a)$, and solving the differential inequality (A.3), we obtain
\[ \Psi \le \Psi_0\exp(-ht) + \frac{(h\Psi_0 + \Psi_0')\exp(-ht)}{2h}\left[\exp(2ht) - 1\right]. \]
Going back to $|\omega(X(a,t),t)| = 1/\Psi(X(a,t),t)$, we have
\[ |\omega(X(a,t),t)| \ge \frac{|\omega_0(a)|\exp(ht)}{1 - \left(\frac{|\omega_0(a)|'}{|\omega_0(a)|} - h\right)\frac{\exp(2ht)-1}{2h}} = \frac{|\omega_0(a)|\exp(ht)}{1 - \left[\alpha_0(a) - h\right]\frac{\exp(2ht)-1}{2h}} \ge \frac{|\omega_0(a)|}{1 - \varepsilon\alpha_0(a)t}. \]
Thus, we find $\lim_{t \nearrow T_*}|\omega(X(a,t),t)| = \infty$.

Acknowledgements. The author would like to thank Professor Peter Constantin for very helpful comments and suggestions.

References

1. Beale, J.T., Kato, T., Majda, A.: Remarks on the breakdown of smooth solutions for the 3-D Euler equations. Commun. Math. Phys. 94, 61–66 (1984)
2. Chae, D.: Nonexistence of self-similar singularities for the 3D incompressible Euler equations. http://arxiv.org/list/math.AP/0601060, 2006; http://arxiv.org/list/math.AP/0601661, 2006
3. Chae, D.: On the finite time singularities of the 3D incompressible Euler equations. Comm. Pure Appl. Math. 109, 0001–0021 (2006)
4. Chae, D.: On the spectral dynamics of the deformation tensor and new a priori estimates for the 3D Euler equations. Commun. Math. Phys. 263(3), 789–801 (2006)
5. Chae, D.: Remarks on the blow-up criterion of the 3D Euler equations. Nonlinearity 18, 1021–1029 (2005)
6. Chae, D.: Local existence and blow-up criterion for the Euler equations in the Besov spaces. Asymp. Anal. 38(3-4), 339–358 (2004)
7. Chae, D.: On the well-posedness of the Euler equations in the Triebel-Lizorkin spaces. Comm. Pure Appl. Math. 55, 654–678 (2002)
8. Chemin, J.-Y.: Perfect incompressible fluids. Oxford: Clarendon Press, 1998
9. Constantin, P.: Geometric statistics in turbulence. SIAM Rev. 36, 73–98 (1994)


10. Constantin, P., Fefferman, C., Majda, A.: Geometric constraints on potential singularity formulation in the 3-D Euler equations. Comm. P.D.E. 21(3-4), 559–571 (1996)
11. Córdoba, D., Fefferman, C.: On the collapse of tubes carried by 3D incompressible flows. Comm. Math. Phys. 222(2), 293–298 (2001)
12. Deng, J., Hou, T.Y., Yu, X.: Geometric and Nonblowup of 3D incompressible Euler flow. Comm. P.D.E. 30, 225–243 (2005)
13. Euler, L.: Opera omnia. Series Secunda 12, 274–361 (1755)
14. Grauer, R., Sideris, T.: Numerical computation of three dimensional incompressible ideal fluids with swirl. Phys. Rev. Lett. 67, 3511–3514 (1991)
15. Grauer, R., Sideris, T.: Finite time singularities in ideal fluids with swirl. Phys. D 88(2), 116–132 (1995)
16. Kato, T.: Nonstationary flows of viscous and ideal fluids in R^3. J. Funct. Anal. 9, 296–305 (1972)
17. Kerr, R.: Evidence for a singularity of the 3-dimensional, incompressible Euler equations. Phys. Fluids A 5, 1725–1746 (1993)
18. Kozono, H., Ogawa, T., Taniuchi, Y.: The critical Sobolev inequalities in Besov spaces and regularity criterion to some semilinear evolution equations. Math. Z. 242, 251–278 (2002)
19. Kozono, H., Taniuchi, Y.: Limiting case of the Sobolev inequality in BMO, with applications to the Euler equations. Commun. Math. Phys. 214, 191–200 (2000)
20. Liu, H., Tadmor, E.: Spectral dynamics of the velocity gradient field in restricted flows. Commun. Math. Phys. 228, 435–466 (2002)
21. Majda, A.: Vorticity, turbulence and acoustics in fluid flow. SIAM Rev. 33, 349–388 (1991)
22. Majda, A., Bertozzi, A.: Vorticity and Incompressible Flow. Cambridge: Cambridge Univ. Press, 2002
23. Temam, R.: On the Euler equations of incompressible flows. J. Funct. Anal. 20, 32–43 (1975)

Communicated by P. Constantin

Commun. Math. Phys. 269, 571–609 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0128-8

Communications in

Mathematical Physics

Random Skew Plane Partitions and the Pearcey Process Andrei Okounkov1 , Nicolai Reshetikhin2 1 Department of Mathematics, Princeton University, Princeton, NJ 08544-1000, USA.

E-mail: [email protected]

2 Department of Mathematics, University of California at Berkeley, Berkeley, CA 94720-3840, USA.

E-mail: [email protected] Received: 28 April 2005 / Accepted: 30 August 2005 Published online: 15 November 2006 – © Springer-Verlag 2006

Abstract: We study random skew 3D partitions weighted by $q^{\mathrm{vol}}$ and, specifically, the $q \to 1$ asymptotics of local correlations near various points of the limit shape. We obtain sine-kernel asymptotics for correlations in the bulk of the disordered region, Airy kernel asymptotics near a general point of the frozen boundary, and Pearcey kernel asymptotics near a cusp of the frozen boundary.

1. Introduction

A plane partition $\pi = (\pi_{ij})$ is an array of nonnegative numbers indexed by $(i,j) \in \mathbb{N}^2$ that is monotone, that is, $\pi_{ij} \ge \pi_{i+r,j+s}$ for $r, s \ge 0$, and finite in the sense that $\pi_{ij} = 0$ when $i + j \gg 1$. Plane partitions have an obvious generalization which we call skew plane partitions (see Appendix A for some details). A skew plane partition is again a monotone array $(\pi_{ij})$ which is now indexed by points $(i,j)$ of a skew shape $\lambda/\mu$, where $\mu \subset \lambda$ is a pair of ordinary partitions. We call $\mu$ and $\lambda$ the inner and outer shape of $\pi$, respectively. In fact, in this paper we will only consider the case when the outer shape $\lambda$ is an $a \times b$ rectangle. Here is an example with $\mu = (1,1)$ and $a = b = 5$:
\[ \pi = \begin{matrix} & 7 & 4 & 2 & 1 \\ & 5 & 3 & 2 & 0 \\ 6 & 3 & 1 & 1 & 0 \\ 4 & 1 & 1 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 \end{matrix} \tag{1} \]
Placing $\pi_{ij}$ cubes over the $(i,j)$ square in $\lambda/\mu$ gives a three-dimensional object which we will call a skew 3D partition and denote by the same letter $\pi$. Its volume is $|\pi| = \sum \pi_{ij}$. For $\pi$ as in (1), it is shown in Fig. 1.

Fig. 1. A skew 3D partition

Given a parameter $0 < q < 1$, define a probability measure on the set of all skew plane partitions with given inner and outer shapes by setting
\[ \mathrm{Prob}(\pi) \propto q^{|\pi|}. \tag{2} \]

The corresponding random skew 3D partition model has a natural random growth interpretation, the parameter $q$ being the fugacity. Also, a simple bijection, which should be clear from Fig. 1 and is recalled below, relates this model to a random tiling problem.

We are interested in the thermodynamic limit in which $q \to 1$ and both inner and outer shapes are rescaled by $1/r$, where $r = -\ln q \to +0$. The results of [6] imply the following form of the law of large numbers: scaled by $r$ in all directions, the surface of our random skew 3D partition converges to a nonrandom surface, the limit shape. This limit shape will be easy to see in the exact formulas discussed below. A simulation showing the formation of the limit shape is presented in Fig. 2.

An important qualitative feature of the limit shape is the presence of both ordered and disordered regions, separated by the frozen boundary. Furthermore, the frozen boundary has various special points: it has cusps (there is one forming in Fig. 2; it can be seen more clearly in Figs. 16 and 17) and also turning points where the limit shape is not smooth. In Figs. 15–17, the turning points are the points of tangency to any of the lines in the same figure.

One expects that the microscopic properties of the random surface, in particular the correlation functions of local operators, are universal in the sense that they are determined by the macroscopic behavior of the limit shape at that point. More specifically, one expects that:

1. in the bulk of the disordered region, the correlations are given by the incomplete beta kernel [12, 15] with the parameters determined by the slope of the limit shape (a special case of this is the discrete sine kernel);


Fig. 2. A large random skew 3D partition

2. at a general point of the frozen boundary, the correlations, suitably scaled, are given by the extended Airy kernel [17, 9, 19];

3. at a cusp of the frozen boundary, the correlations, suitably scaled, are given by the extended Pearcey kernel, discussed below and in [20].

In this paper we prove all these statements for the model at hand. The required techniques were developed in our paper [15], of which this one is a continuation. Namely, as will be reviewed below, our random skew plane partition model is a special case of the Schur process. This yields an exact contour integral formula for correlation functions. The asymptotics is then extracted by a direct, albeit laborious, saddle point analysis. The striking resemblance of the above list to the classification of singularities is not accidental for, as we will see, these three situations correspond precisely to the saddle point being a simple, double, or triple critical point. We will also see that the frozen boundary is essentially an algebraic curve and that it has precisely one cusp for each exterior corner of the inner shape $\mu$.

We expect that near a turning point the correlations behave like eigenvalues of a $k \times k$ corner of a GUE random $N \times N$ matrix, where $N \gg 0$ and $k$ plays the role of time. We hope to return to this question in a future paper.

The results presented here were obtained in 2002-03 and were reported by us at several conferences. The period between then and now saw many further developments in the field. Most notably, the Pearcey process, which we found to describe the behavior near a cusp of the frozen boundary, arose in the random matrix context in the work of Tracy and Widom [20]. Pearcey asymptotics for equal time correlations of eigenvalues were obtained earlier by Brezin and Hikami [3, 4] and also by Aptekarev, Bleher, and Kuijlaars [2]. We enjoyed and benefited from the correspondence with C. Tracy on the subject.
In [8], Ferrari and Spohn derived from the exact formulas of [15] the Airy process asymptotics in the case of unrestricted 3D partitions (the μ = ∅, a = b = ∞ case in our notation). In a related but technically more involved context, the Airy process asymptotics was found by K. Johansson in [10].


In [16], the partition function of the random surface model studied here was related to the topological vertex of [1] and, thus, to the Gromov-Witten theory of toric Calabi-Yau threefolds. There were many subsequent developments, some of which are reviewed in [14]. The papers [7, 18] may be the closest to the material presented here. Also, much more general results on algebraicity of the frozen boundary are now available [13]. We would like to thank R. Kenyon and C. Vafa for numerous discussions.

2. Preliminaries

2.1. Skew 3D partition as a sequence of its slices. We associate to $\pi$ the sequence $\{\lambda(t)\}$ of its diagonal slices, that is, the sequence of partitions
\[ \lambda(t) = (\pi_{i,t+i}), \quad i > \max(0,-t), \quad t \in \mathbb{Z}. \tag{3} \]
Throughout this paper, we assume that the outer shape of our skew partition is an $a \times b$ box and, in particular, we will use the letter $\lambda$ to denote diagonal slices, not the outer shape. The notation $\lambda \succ \nu$ as usual means that $\lambda$ and $\nu$ interlace, that is,
\[ \lambda_1 \ge \nu_1 \ge \lambda_2 \ge \nu_2 \ge \lambda_3 \ge \cdots. \]
It is easy to see that the sequence $\{\lambda(t)\}$ corresponds to a skew plane partition if and only if it satisfies the following conditions:

– if the slice $\lambda(t_0)$ is passing through an inner corner of the skew plane partition then
\[ \cdots \prec \lambda(t_0-2) \prec \lambda(t_0-1) \prec \lambda(t_0) \succ \lambda(t_0+1) \succ \lambda(t_0+2) \succ \cdots; \tag{4} \]

– if the slice $\lambda(t_0)$ is passing through an outer corner of the skew plane partition then
\[ \cdots \succ \lambda(t_0-2) \succ \lambda(t_0-1) \succ \lambda(t_0) \prec \lambda(t_0+1) \prec \lambda(t_0+2) \prec \cdots. \tag{5} \]

For example, the configuration $\{\lambda(t)\}$ corresponding to the partition (1) is
\[ (2) \prec (4) \prec (6,1) \succ (3,1) \prec (5,1) \prec (7,3,1) \succ (4,2) \succ (2) \succ (1). \]
We will denote the sequences of inner and outer corners of the inner shape by $\{v_i\}_{1 \le i \le N}$ and $\{u_i\}_{1 \le i \le N-1}$, respectively. We assume that they are numbered so that $v_1 < u_1 < v_2 < u_2 < \cdots < u_{N-1} < v_N$. We also assume that the point $t = 0$ is chosen so that
\[ \sum_{1 \le i \le N} v_i = \sum_{1 \le i \le N-1} u_i. \tag{6} \]
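The slice decomposition of the example (1) can be recomputed mechanically. The following is a small illustrative sketch: the array `PI`, the use of `None` for the cells removed by the inner shape, and the helper names are conventions chosen here, not notation from the text. It extracts the diagonal slices $\lambda(t)$, $t = j - i$, and records for each consecutive pair whether the interlacing is $\prec$ (printed as `<`) or $\succ$ (printed as `>`).

```python
# Example (1): outer shape 5 x 5, inner shape mu = (1, 1).
# None marks the two cells removed by the inner shape.
PI = [
    [None, 7, 4, 2, 1],
    [None, 5, 3, 2, 0],
    [6, 3, 1, 1, 0],
    [4, 1, 1, 0, 0],
    [2, 0, 0, 0, 0],
]


def diagonal_slice(pi, t):
    """Partition lambda(t) formed by the entries with j - i = t, zeros stripped."""
    parts = []
    for i, row in enumerate(pi):
        j = i + t
        if 0 <= j < len(row) and row[j] is not None:
            parts.append(row[j])
    return tuple(p for p in parts if p > 0)


def succeeds(lam, nu):
    """True if lam and nu interlace as lam_1 >= nu_1 >= lam_2 >= nu_2 >= ..."""
    n = max(len(lam), len(nu)) + 1
    l = list(lam) + [0] * (n - len(lam))
    v = list(nu) + [0] * (n - len(nu))
    return (all(l[k] >= v[k] for k in range(n))
            and all(v[k] >= l[k + 1] for k in range(n - 1)))


def slice_pattern(pi):
    """Slices for t = -(rows-1), ..., cols-1 and the relation between neighbours."""
    ts = range(-(len(pi) - 1), len(pi[0]))
    slices = [diagonal_slice(pi, t) for t in ts]
    pattern = ""
    for cur, nxt in zip(slices, slices[1:]):
        if succeeds(nxt, cur):
            pattern += "<"      # cur precedes nxt
        elif succeeds(cur, nxt):
            pattern += ">"      # cur succeeds nxt
        else:
            pattern += "?"      # not interlacing: not a valid configuration
    return slices, pattern


if __name__ == "__main__":
    slices, pattern = slice_pattern(PI)
    print(slices)
    print(pattern)
```

In this example the local maxima of the chain sit at $t = -2$ and $t = 1$ and the local minimum at $t = -1$, which is consistent with the normalization (6).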


2.2. Connection to tilings. There is a well-known mapping of 3D diagrams to tilings of the plane by rhombi. Namely, the tiles are the images of faces of the 3D diagram under the projection
\[ (x,y,z) \mapsto (t,h) = (y - x,\ z - (x+y)/2). \tag{7} \]
This mapping is a bijection between 3D diagrams and tilings with appropriate boundary conditions. The horizontal tiles of the tiling corresponding to the diagram in Fig. 1 are shown in Fig. 3. It is clear that the positions of horizontal tiles uniquely determine both the tiling and the partition $\pi$. The set
\[ \sigma(\pi) = \{(j - i,\ \pi_{ij} - (i+j-1)/2)\} \subset \mathbb{Z} \times \tfrac12\mathbb{Z} \tag{8} \]
is precisely the set of the centers of the horizontal tiles. Define
\[ B(t) = \frac12\sum_{i=1}^{N}|t - v_i| - \frac12\sum_{i=1}^{N-1}|t - u_i|. \]
The image of the inner boundary of our skew plane partitions in the $(h,t)$-plane is the curve $h = -B(t)$; see an example of this curve in Fig. 4. In particular, the highest layer of horizontal rhombi for an empty plane partition is the set of points with coordinates $h = -B(t) - 1/2$.

Fig. 3. Horizontal tiles of the tiling corresponding to the partition in Fig. 1 in (t, h)-coordinates
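The set $\sigma(\pi)$ in (8) and the function $B(t)$ can be checked against each other on the example (1). The sketch below is illustrative only: it restricts $\sigma(\pi)$ to the cells of the $5 \times 5$ box, and it assumes that the corners of the inner shape $\mu = (1,1)$ are $v_1 = -2$, $v_2 = 1$, $u_1 = -1$, as read off from the slice pattern of the example (an inference made here, not a statement taken from the text). Exact rational arithmetic is used so that half-integers compare exactly.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Example (1) and the empty skew plane partition of the same shape.
# None marks the cells removed by the inner shape mu = (1, 1).
PI = [
    [None, 7, 4, 2, 1],
    [None, 5, 3, 2, 0],
    [6, 3, 1, 1, 0],
    [4, 1, 1, 0, 0],
    [2, 0, 0, 0, 0],
]
EMPTY = [[None if x is None else 0 for x in row] for row in PI]

# Assumed corner positions for mu = (1, 1), inferred from the slices of (1).
V = [-2, 1]   # inner corners v_1, v_2
U = [-1]      # outer corner u_1


def B(t):
    """B(t) = (1/2) sum |t - v_i| - (1/2) sum |t - u_i|."""
    return HALF * sum(abs(t - v) for v in V) - HALF * sum(abs(t - u) for u in U)


def sigma(pi):
    """Centers (t, h) = (j - i, pi_ij - (i + j - 1)/2) of horizontal tiles in the box."""
    points = set()
    for i, row in enumerate(pi, start=1):
        for j, value in enumerate(row, start=1):
            if value is not None:
                points.add((j - i, value - Fraction(i + j - 1, 2)))
    return points


def top_layer(pi):
    """Largest h among the tile centers on each vertical line t = const."""
    top = {}
    for t, h in sigma(pi):
        top[t] = max(h, top.get(t, h))
    return top


if __name__ == "__main__":
    for t, h in sorted(top_layer(EMPTY).items()):
        print(t, h, -B(t) - HALF)
```

For the empty partition the two printed columns agree for every $t$ in the box, which is the statement $h = -B(t) - 1/2$ for the highest layer of horizontal rhombi.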

Fig. 4. Coordinates of corners

2.3. Partition function and correlation functions. Generalizing (2), introduce a probability measure on skew plane partitions by
\[ \mathrm{Prob}(\{\lambda(t)\}) \propto \prod_{t \in \mathbb{Z}} q_t^{|\lambda(t)|}, \tag{9} \]
where $0 \le q_t < 1$ are parameters. We assume that $q_t = 0$ for $t < u_0 = -a$ or $t > u_N = b$, so that the plane partition is confined to an $a \times b$ outer box. The homogeneous case when all nonzero $q_t$ are equal corresponds to (2). Notice also that $|\lambda(u_0)| = |\lambda(u_N)| = 0$ and therefore the probability measure depends only on $q_{u_0+1}, \ldots, q_{u_N-1}$. For fixed inner shape $\mu$, the partition function is defined by
\[ Z(\{q_t\},\mu) = \sum_{\{\lambda(t)\}}\prod_{t \in \mathbb{Z}} q_t^{|\lambda(t)|} = \sum_{\pi}\prod_{t \in \mathbb{Z}} q_t^{|\pi_t|}, \]
where $|\pi_t| = \sum_i \pi_{i,t+i}$. The correspondence $\pi \mapsto \sigma(\pi)$ defined in (8) makes a random skew partition a random subset of $\mathbb{Z} \times (\mathbb{Z} + \frac12)$, that is, a random point field on a lattice. This motivates the following

Definition 1. Given a subset $U \subset \mathbb{Z} \times (\mathbb{Z} + \frac12)$, define the corresponding correlation function by
\[ \rho(U) = \mathrm{Prob}(U \subset \sigma(\pi)) = \frac{1}{Z}\sum_{\pi,\ U \subset \sigma(\pi)}\ \prod_{t \in \mathbb{Z}} q_t^{|\pi_t|}. \tag{10} \]
These correlation functions depend on the parameters $q_t$ and on the fixed inner shape $\mu$ of skew plane partitions. Consider the following "local" functions on skew plane partitions:
\[ \rho_{h,t}(\pi) = \begin{cases} 1 & \text{if } (h,t) \in \sigma(\pi), \\ 0 & \text{otherwise.} \end{cases} \]
If $U = \{(h_1,t_1), \ldots, (h_n,t_n)\}$ with $t_1 \ge \cdots \ge t_n$ and $(h_i,t_i) \neq (h_j,t_j)$, the correlation function (10) can be written as
\[ \rho(U) = \langle \rho_{h_1,t_1} \cdots \rho_{h_n,t_n} \rangle = \frac{1}{Z}\sum_{\pi}\rho_{h_1,t_1}(\pi) \cdots \rho_{h_n,t_n}(\pi)\prod_{t \in \mathbb{Z}} q_t^{|\pi_t|}. \tag{11} \]
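In the simplest special case the partition function can be checked by brute force. The sketch below is an illustration under explicit assumptions: it takes the homogeneous weights $q_t = q$, an empty inner shape, and a $2 \times 2$ outer box, truncates the entries at a hypothetical cutoff `max_entry`, and compares the resulting sum with the classical product formula $\prod_{i=1}^{a}\prod_{j=1}^{b}(1 - q^{i+j-1})^{-1}$ for plane partitions with at most $a$ rows and $b$ columns. That formula is quoted here as a standard fact and is not derived in this section.

```python
def truncated_partition_sum(q, max_entry):
    """Sum of q**|pi| over 2 x 2 plane partitions with all entries <= max_entry."""
    total = 0.0
    for p11 in range(max_entry + 1):
        for p12 in range(p11 + 1):                    # rows weakly decrease
            for p21 in range(p11 + 1):                # columns weakly decrease
                for p22 in range(min(p12, p21) + 1):
                    total += q ** (p11 + p12 + p21 + p22)
    return total


def product_formula(q, a, b):
    """Classical generating function for plane partitions inside an a x b base."""
    z = 1.0
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            z /= 1.0 - q ** (i + j - 1)
    return z


if __name__ == "__main__":
    q = 0.3   # hypothetical value of the weight parameter
    print(truncated_partition_sum(q, 40), product_formula(q, 2, 2))
```

The cutoff only discards terms of order $q^{41}$ and higher, so the two numbers agree to within floating-point accuracy for moderate $q$.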


3. Schur Processes

3.1. General Schur processes. The Schur process, introduced in [15], is a probability measure on sequences of partitions. The parameters of the Schur process are sequences of pairs of functions $\{\phi_t^{\pm}(z)\}_{t \in \mathbb{Z}}$ such that $\phi^+(z)$ is analytic at $z = 0$ and $\phi^-(z)$ is analytic at $z = \infty$. For such a pair of functions $\phi^{\pm}(z)$ consider the skew Schur functions
\[ s_{\lambda/\mu}[\phi^+] = \det\left(\phi^+_{\lambda_i - \mu_j - i + j}\right) \]
and
\[ s_{\lambda/\mu}[\phi^-] = \det\left(\phi^-_{-\lambda_i + \mu_j + i - j}\right). \]
These determinants are effectively determinants of matrices of finite size. Some basic notions about Schur functions are recalled in Appendix A. Define the transition weight by the formula
\[ S_\phi(\lambda,\mu) = \sum_{\nu} s_{\lambda/\nu}[\phi^+]\, s_{\mu/\nu}[\phi^-]. \]
The sum here is finite.

Definition 2. The probabilities of the Schur process are given by
\[ \mathrm{Prob}(\{\lambda(t)\}) = \frac{1}{Z}\prod_{m \in \mathbb{Z}+1/2} S_{\phi_m}\left(\lambda(m - \tfrac12), \lambda(m + \tfrac12)\right), \]
where the transition weight $S_\phi$ is defined above, $\phi_t$ denotes the pair of functions $\phi_t^{\pm}$, and $Z$ is the normalizing factor (partition function)
\[ Z = \sum_{\{\lambda(t)\}}\ \prod_{m \in \mathbb{Z}+1/2} S_{\phi_m}\left(\lambda(m - \tfrac12), \lambda(m + \tfrac12)\right). \]
If instead of infinite sequences $\{\lambda(t)\}$ we have finite sequences of length $N$, we will say that the Schur process is of length $N$.

3.2. Polynomial Schur processes and height distributions on skew plane partitions. We will say that the Schur process is polynomial if the functions $\phi_t^{\pm}(z)^{-1}$ are polynomials in $z^{\pm 1}$. Let us show that the measure (9) is closely related to a polynomial Schur process. We will parametrize the inner shape as before by assuming that $v_i \in \mathbb{Z}$, $1 \le i \le N$, are the positions of the inner corners of the inner shape of the plane partition (see Fig. 4) and $u_i \in (v_i, v_{i+1})$, $1 \le i \le N-1$, are the positions of the outer corners.

Theorem 1. The restriction of the measure (9) to random variables supported on the subsequences $\{\lambda(m)\}$, $m < v_1$, $\{\lambda(v_i)\}$, $1 \le i \le N$, $\{\lambda(m)\}$, $m > v_N$, coincides with the polynomial Schur process with parameters


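\begin{align}
\phi_m^-(z) &= (1 - z^{-1}x_m^-)^{-1}, \quad \phi_m^+(z) = 1, \quad m < v_1, \tag{12} \\
\phi_i^+(z) &= \prod_{v_i < m < u_i} (1 - z x_m^+)^{-1}, \tag{13} \\
\phi_i^-(z) &= \prod_{u_i < m < v_{i+1}} (1 - z^{-1}x_m^-)^{-1}, \tag{14} \\
\phi_m^+(z) &= (1 - z x_m^+)^{-1}, \quad \phi_m^-(z) = 1, \quad m > v_N, \tag{15}
\end{align}
where the products are over $m \in \mathbb{Z} + \frac12$ and the parameters $x_t^{\pm}$ and $q_t$ are related as follows:
\begin{align}
\frac{x^+_{m+1}}{x^+_m} &= q_{m+\frac12}, \quad v_i < m < u_i - 1, \text{ or } m > v_N, \tag{16} \\
x^+_{u_i-\frac12}\, x^-_{u_i+\frac12} &= q_{u_i}^{-1}, \tag{17} \\
x^-_{v_i-\frac12}\, x^+_{v_i+\frac12} &= q_{v_i}, \tag{18} \\
\frac{x^-_m}{x^-_{m+1}} &= q_{m+\frac12}, \quad u_i < m < v_{i+1} - 1, \text{ or } m < v_1. \tag{19}
\end{align}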

Proof. Let us restrict the process (9) to the subsequence $\{\lambda(v_i)\}$. It is easy to see that the transition probability from $v_{i+1}$ to $v_i$ in such a subprocess is
\begin{align}
S^i(\lambda(v_i),\lambda(v_{i+1})) = \sum_{\lambda(u_i)}\ & s_{\lambda(v_i)/\lambda(u_i)}\left(x^+_{v_i+1/2}, x^+_{v_i+3/2}, \ldots, x^+_{u_i-1/2}\right) \tag{20} \\
\times\ & s_{\lambda(v_{i+1})/\lambda(u_i)}\left(x^-_{v_{i+1}-1/2}, x^-_{v_{i+1}-3/2}, \ldots, x^-_{u_i+1/2}\right), \tag{21}
\end{align}
where $x_m^{\pm}$ are related to $q_t$ as in (16). Thus the process (9) is a polynomial Schur process with $\phi_i^{\pm}$ given by (12). Conversely, it is clear that due to the identity (79) any polynomial Schur process can be extended to a probability measure (9) on sequences of interlacing partitions with parameters $\{q_t\}$ defined as in (12), (16).

It is easy to solve the equations for $x_m^{\pm}$ assuming $x^-_{u_0+1/2} = 1$. When $v_i < m \le u_i$,
\[ x_m^+ = q_{m-1/2}\, q_{m-3/2} \cdots q_{u_0+1}. \]
When $u_i < m \le v_{i+1}$,
\[ x_m^- = q_{m-1/2}^{-1}\, q_{m-3/2}^{-1} \cdots q_{u_0+1}^{-1}. \]
In particular these formulae together with $q_t < 1$ imply $x_m^- > 1$ for all $m$ and $x_m^+ < 1$ for all $m$, and
\[ x^+_{v_i+1/2} = q_{v_i} \cdots q_{u_{i-1}}\, x^+_{u_{i-1}-1/2} \quad \text{and} \quad x^-_{u_i+1/2} = q_{u_i}^{-1} \cdots q_{v_i}^{-1}\, x^-_{v_i-1/2}. \]


4. Fermionic Representation for Correlation Functions

4.1. For $m \in \mathbb{Z} + \frac12$ define $\varepsilon(m) = +$ if $v_i < m < u_i$ and $1 \le i \le N$, and $\varepsilon(m) = -$ if $u_i < m < v_{i+1}$ and $0 \le i \le N-1$. This is shown in Fig. 4. Define $D^+ = \{m \mid \varepsilon(m) = +\}$ and $D^- = \{m \mid \varepsilon(m) = -\}$. Let $x_m^{\pm}$ be positive numbers related to $q_t$ as in (16). Notice that for given $q_t$ the numbers $x_m^{\pm}$ are defined up to a transformation $x_m^{\pm} \to x_m^{\pm}a^{\pm 1}$.

Theorem 2. 1. The partition function for the height distribution on skew plane partitions can be represented as the matrix element of the product of vertex operators described in Appendix B.2 as follows:

Z=

u N >m>v N

×

v1 >m>u 0

− (xm+ ) · · ·

u i <m
(m) (m) + (xm− )v0 , v0

+ (xm− )

vi <m

=

− (xm+ ) (0) (0) −ε(m) (xmε(m) )v0 , v0

1 m∈Z+ 2 ,u 0 <m
(22) and

Z=

m 1 <m 2 ,m 1 ∈D − ,m 2 ∈D +

(1 − xm−1 xm+ 2 )−1 , m i ∈ Z + 21 .

2. Assume t1 > · · · > tn , then

1 ρh 1 ,t1 . . . ρh n ,tn = Z ×

ti <m

−ε(m) (xmε(m) )ψ j1 ψ ∗j1 . . .

1 m∈Z+ 2 ,u N >m>t1

−ε(m) (xmε(m) )ψ ji ψ ∗ji . . . ψ jn ψ ∗jn

−ε(m) (xmε(m) )

ti+1 <m
tn >m>u 0

−ε(m) (xmε(m) )v0(0) , v0(0) .

Here and below ji = h i + B(ti ) + 1/2. 3. Correlation functions (11) are determinants: ρh 1 ,t1 . . . ρh n ,tn = det(K ((ti , h i ), (tk , h k )))1≤i,k≤n ,

(23)

(24)

where K ((t1 , h 1 ), (t2 , h 2 )) 1 − (z, t1 )+ (w, t2 ) = (2πi)2 |z|
(25)


Here |w| < |z| for t1 ≥ t2 , |w| > |z| for t1 < t2 , R(t) = minm>t ((xm+ )−1 ) and we can choose R ∗ (t) = maxmt,m∈D + ,m∈Z+ 2

− (z, t) =

(1 − z −1 xm− ).

(27)

1 m
+ )−1 , R ∗ (t) = xv−i −1/2 and if u i ≤ t ≤ Remark 1. If vi ≤ t ≤ u i we have R(t) = (xt+1/2 − + −1 ∗ vi+1 we have R(t) = (xvi+1 +1/2 ) , R (t) = xt−1/2 . Notice that R(t) > R ∗ (t) for all t.

Proof. The fact that the partition function and correlation functions for the height distribution of the plane partitions are matrix elements of the product of vertex operators as above follows from formula (92) for matrix elements of products of vertex operators $\Gamma_\pm(x)$. Using the commutation relations (87), and the fact that $\Gamma_-(x)v_0^{(m)} = 0$ we obtain the product formula for the partition function. The operators $\psi_j\psi^*_j$ act on the vector $v_\lambda^{(0)}$ as follows:

$$\psi_j\psi^*_j v_\lambda^{(0)} = v_\lambda^{(0)} \quad \text{if } j = \lambda_i - i + 1/2 \text{ for some } i = 1, 2, \ldots,$$

and $\psi_j\psi^*_j v_\lambda^{(0)} = 0$ otherwise. Using this fact and the formula for the matrix elements of $\Gamma_\pm(x)$ we obtain the formula (23) for the correlation functions of densities. Moving operators $\Gamma_-$ to the right and $\Gamma_+$ to the left we obtain the following formula for the correlation functions:

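$$\langle \rho_{h_1,t_1} \cdots \rho_{h_n,t_n} \rangle = \big(\psi_{j_1}(t_1)\psi^*_{j_1}(t_1) \cdots \psi_{j_i}(t_i)\psi^*_{j_i}(t_i) \cdots \psi_{j_n}(t_n)\psi^*_{j_n}(t_n)\, v_0^{(0)},\ v_0^{(0)}\big),$$

where

$$\psi_j(t) = \prod_{m>t,\ m \in D^+} \Gamma_-(x_m^+) \prod_{m<t,\ m \in D^-} \Gamma_+(x_m^-)^{-1}\ \psi_j \prod_{m<t,\ m \in D^-} \Gamma_+(x_m^-) \prod_{m>t,\ m \in D^+} \Gamma_-(x_m^+)^{-1} \tag{28}$$

and

$$\psi^*_j(t) = \prod_{m>t,\ m \in D^+} \Gamma_-(x_m^+) \prod_{m<t,\ m \in D^-} \Gamma_+(x_m^-)^{-1}\ \psi^*_j \prod_{m<t,\ m \in D^-} \Gamma_+(x_m^-) \prod_{m>t,\ m \in D^+} \Gamma_-(x_m^+)^{-1}. \tag{29}$$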


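Here the operators on the right are given by power series. Commuting formal power series gives the following identities:

$$\Gamma_+(x)^{-1}\psi_k\Gamma_+(x) = \psi_k - x\psi_{k+1}, \qquad \Gamma_-(x)\psi_k\Gamma_-(x)^{-1} = \sum_{n \ge 0} x^n\psi_{k-n},$$

$$\Gamma_+(x)^{-1}\psi^*_k\Gamma_+(x) = \sum_{n \ge 0} x^n\psi^*_{k-n}, \qquad \Gamma_-(x)\psi^*_k\Gamma_-(x)^{-1} = \psi^*_k - x\psi^*_{k+1}.$$

Applying these identities to the formal Fourier transform of $\psi_j(t)$ and $\psi^*_j(t)$ we obtain:

$$\psi(z,t) = \prod_{m>t,\ m \in D^+} (1 - z x_m^+)^{-1} \prod_{m<t,\ m \in D^-} (1 - z^{-1}x_m^-)\ \psi(z), \tag{30}$$

$$\psi^*(z,t) = \prod_{m>t,\ m \in D^+} (1 - z x_m^+) \prod_{m<t,\ m \in D^-} (1 - z^{-1}x_m^-)^{-1}\ \psi^*(z), \tag{31}$$

where both sides are power series in $x_m^\pm$ and are formal Laurent power series in $z$. If $v \in F$, then $\psi(z)v \in z^{1/2}F[z^{-1}, z]]$ and $\psi^*(z)v \in z^{1/2}F[[z^{-1}, z]$. For the inverse Fourier transform of $\psi(z,t)v$ and $\psi^*(z,t)v$ we obtain:

$$\psi_k(t)v = \frac{1}{2\pi i}\int_{|z| < R(t)} \frac{\Phi_-(z,t)}{\Phi_+(z,t)}\, z^{-k-1}\psi(z)v\, dz, \tag{32}$$

$$\psi^*_k(t)v = \frac{1}{2\pi i}\int_{R^*(t) < |w| < 1} \frac{\Phi_+(w,t)}{\Phi_-(w,t)}\, w^{k-1}\psi^*(w)v\, dw. \tag{33}$$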

Here v ∈ F, both sides are vectors in F[[xm± ]] and these power series converge for sufficiently small x’s. The contour of integration for z is chosen in such a way that none of the zeros of + (z, t) will be inside of it, this gives |z| < R(t) = minm>t (|xm+ |−1 ). The contour of integration for w is such that none of the zeros of − (w, t) are outside the contour. This gives |w| > R ∗ (t) = maxm
$$K_{ab} = K((t_a,h_a),(t_b,h_b)) = \begin{cases} \big(\psi_{j_a}(t_a)\,\psi^*_{j_b}(t_b)\, v_0,\ v_0\big), & a \le b, \\ -\big(\psi^*_{j_b}(t_b)\,\psi_{j_a}(t_a)\, v_0,\ v_0\big), & a > b. \end{cases}$$

Notice that a ≤ b iff ta ≥ tb and a > b iff ta < tb . Now, substitute (32) and (33) into K ab and take into account (94). This proves the formula for correlation functions.


4.2. Notice that operators $\psi(z,t)$ and $\psi^*(z,t)$ satisfy the difference equations:

$$\psi(z,t+1) = (1 - z x^+_{t+1/2})\,\psi(z,t), \quad t \in D_+,$$
$$\psi(z,t+1) = (1 - z^{-1} x^-_{t+1/2})\,\psi(z,t), \quad t \in D_-,$$
$$\psi^*(z,t-1) = (1 - z x^+_{t-1/2})\,\psi^*(z,t), \quad t \in D_+,$$
$$\psi^*(z,t-1) = (1 - z^{-1} x^-_{t-1/2})\,\psi^*(z,t), \quad t \in D_-.$$

These difference equations give the following difference equations for correlation functions:

$$K((t_1,h_1),(t_2,h_2)) - K((t_1-1,h_1+1/2),(t_2,h_2)) + x^+_{t_1-1/2}\,K((t_1-1,h_1-1/2),(t_2,h_2)) = \delta_{t_1,t_2}\delta_{h_1,h_2}, \quad t_1 \in D_+, \tag{34}$$

$$K((t_1,h_1),(t_2,h_2)) - K((t_1-1,h_1-1/2),(t_2,h_2)) + x^-_{t_1-1/2}\,K((t_1-1,h_1+1/2),(t_2,h_2)) = \delta_{t_1,t_2}\delta_{h_1,h_2}, \quad t_1 \in D_-. \tag{35}$$

Using these equations and similar difference equations in $t_2$ one can express all correlation functions in terms of equal time correlation functions.

Remark 2. Equations (34) and (35) together with appropriate boundary conditions are the equations for the inverse Kasteleyn matrix for the corresponding dimer model.

4.3. The homogeneous restricted case. In the homogeneous restricted case $0 < q_t = q < 1$ for $u_0 \le t \le u_N$ and $q_t = 0$ otherwise. In the homogeneous case $x_m^\pm = a^{\pm 1}q^{\pm m}$. The partition function does not depend on $a$. The functions $\Phi_\pm(z,t)$ are

$$\Phi_+(z,t) = \prod_{m>t,\ m \in D^+} (1 - z q^m a), \tag{36}$$

$$\Phi_-(z,t) = \prod_{m<t,\ m \in D^-} (1 - z^{-1}q^{-m}a^{-1}).$$

In this case we have

$$R(t) = \begin{cases} a^{-1}q^{-v_i}, & u_i < t < v_i, \\ a^{-1}q^{-t}, & v_{i-1} < t < u_i, \end{cases} \qquad R^*(t) = \begin{cases} a^{-1}q^{-t}, & u_i < t < v_i, \\ a^{-1}q^{-v_{i-1}}, & v_{i-1} < t < u_i. \end{cases}$$

Notice that when $q \to 0$ the density of horizontal tiles converges to

$$\rho(h,t) = \frac{1}{(2\pi i)^2}\int_{|z| = 1+}\int_{|w| = 1-} \frac{1}{z-w}\, \frac{dz\, dw}{z^{h+B(t)+\frac12}\, w^{-h-B(t)+\frac12}}. \tag{37}$$

This integral is 1 when $(h,t)$ is on a "floor" and is 0 when it is on the "wall".
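As a sanity check of the product formula for $Z$ in Theorem 2, the following minimal sketch evaluates it for homogeneous weights, where $x^-_{m_1}x^+_{m_2} = q^{m_2-m_1}$. The function names and the encoding of the corners as a list `[u0, v1, u1, ..., vN, uN]` are illustrative assumptions, not notation from the paper. For $N = 1$ the result is compared with MacMahon's classical product for plane partitions with at most $a$ rows and $b$ columns, under the assumption that this is the $N = 1$ specialization.

```python
def half_integers(lo, hi):
    """Half-integers m with lo < m < hi, for integer endpoints lo < hi."""
    return [k + 0.5 for k in range(lo, hi)]


def partition_function(corners, q):
    """Product formula for Z with homogeneous weights x_m^(+/-) = q^(+/-m).

    corners = [u0, v1, u1, ..., vN, uN] (strictly increasing integers).
    Intervals (u_i, v_{i+1}) contribute to D^-, intervals (v_i, u_i) to D^+.
    """
    d_minus, d_plus = [], []
    for i in range(len(corners) - 1):
        target = d_minus if i % 2 == 0 else d_plus
        target.extend(half_integers(corners[i], corners[i + 1]))
    z = 1.0
    for m1 in d_minus:
        for m2 in d_plus:
            if m1 < m2:
                z /= 1.0 - q ** (m2 - m1)
    return z


def macmahon_box(a, b, q):
    """MacMahon's product for plane partitions with at most a rows, b columns."""
    z = 1.0
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            z /= 1.0 - q ** (i + j - 1)
    return z
```

For example, `partition_function([-2, 0, 3], 0.5)` and `macmahon_box(2, 3, 0.5)` evaluate the same six factors.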


5. The Thermodynamic Limit Here we will study the limit q → 1 of the homogeneous Gauss distribution on restricted skew plane partitions when the number of corners in the inner shape of diagrams remain finite. We assume that q = exp(−r ), r → +0 and Ui = r u i , Vi = r vi and N remain fixed and that U0 < V1 < U1 < · · · < VN < U N . 5.1. Asymptotics of the partition function. It is easy to compute the free energy of the system in this limit: log(1 − q n−m ) = F = − log Z = − m
1 − 2 r

Ui−1 <μ
1≤i≤ j≤N

V j <ν
log(1 − e

μ−ν

)dμdν + o

1 . r2

Similarly, one can compute the asymptotic of the average volume of a 3D partition: |π | = q

∂ Z = ∂q =

m
1 r3

n−m 1 − q n−m

Ui−1 <μ
i≤ j

V j <ν
1 ν−μ . dμdν + o 1 − eμ−ν r3

The first formula reflects essentially the two dimensional nature of the problem. The second formula implies that $r^{-1}$ is the characteristic length of the system when $r \to 0$.

5.2. The function S(z). Now let us analyze the correlation functions (25) in the limit $r \to +0$. Since $r^{-1}$ is a characteristic scale of the system in this limit we assume $\tau_i = t_i r$, $\chi_i = h_i r$ remain finite. Depending on the value of $(\chi, \tau)$ we will either keep the differences $t_i - t_j$ and $h_i - h_j$ finite, or we will scale them as appropriate powers of $r$. When $r \to +0$ the functions in the integral defining correlation functions behave as

$$\frac{\Phi_-(z,t)}{\Phi_+(z,t)}\, z^{-h-B(t)} = \exp\Big(\frac{S(z)}{r}\Big) F(z)\,(1 + O(r)),$$

where

$$S(z) = \int_{\mu<\tau,\ \mu \in D_-} \log(1 - z^{-1}e^{\mu})\, d\mu - \int_{\mu>\tau,\ \mu \in D_+} \log(1 - z e^{-\mu})\, d\mu - (\chi + B(\tau))\ln(z)$$

and $F(z)$ can be computed explicitly. In this limit the integral (25) becomes

$$K((h_1,t_1),(h_2,t_2)) = \frac{1}{(2\pi i)^2}\int_{C_z}\int_{C_w} e^{\frac{S(z;\chi_1,\tau_1) - S(w;\chi_2,\tau_2)}{r}}\, \frac{F(z;\chi_1,\tau_1)}{F(w;\chi_2,\tau_2)}\, \frac{\sqrt{zw}}{z-w}\, \frac{dz}{z}\frac{dw}{w}\,(1 + o(1)). \tag{38}$$


The integration contours are described in the previous section. For example, if $N = 2$ and $U_1 < \tau_1, \tau_2 < V_2$ the contours are:

$$C_z \times C_w = \begin{cases} e^{V_2} > |z|,\ |w| > |z|,\ |w| > e^{\tau} & \text{if } \tau_1 < \tau_2, \\ e^{V_2} > |z| > |w| > e^{\tau} & \text{if } \tau_1 \ge \tau_2. \end{cases}$$

The integral (38) can be computed by the steepest descent method. In order to do this one should first analyze critical points of $S(z)$ and then deform contours of integration accordingly.

6. Critical Points of S(z) and the Deformation of Contours

The function $S(z)$ can be written as a sum of dilogarithms:

$$S(z) = -(\chi + B(\tau))\ln(z) + \sum_{i=0}^{N} \mathrm{Li}_2(ze^{-U_i}) - \sum_{i=1}^{N} \mathrm{Li}_2(ze^{-V_i}) - \mathrm{Li}_2(ze^{-\tau}),$$

where

$$\mathrm{Li}_2(z) = \int_0^z \frac{\ln(1-x)}{x}\, dx.$$

Critical points of $S(z)$ are zeros of

$$z\frac{\partial}{\partial z}S(z) = -(\chi + B(\tau)) + \int_{\mu>\tau,\ \mu \in D_+} \frac{ze^{-\mu}}{1 - ze^{-\mu}}\, d\mu + \int_{\mu<\tau,\ \mu \in D_-} \frac{z^{-1}e^{\mu}}{1 - z^{-1}e^{\mu}}\, d\mu.$$

This is equivalent to the following equations.

1 − ze−Ui 1 − z −1 e V j+1 1 − ze−U j +log + =0 log log −(χ + B(τ ))− 1 − ze−τ 1 − z −1 eU j 1 − ze−V j N ≥ j>i

0≤ j
when Vi < τ < Ui and −(χ + B(τ ))−

log

0≤ j

1−z −1 eτ 1−z −1 e V j+1 1−ze−U j −log + =0 log 1−z −1 eUi 1−z −1 eU j 1−ze−V j N ≥ j>i

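when $U_i < \tau < V_{i+1}$. Exponentiating these equations we obtain

$$\exp(-\chi - B(\tau) - L(\tau)) \prod_{0 \le j \le N} (1 - ze^{-U_j}) = (1 - ze^{-\tau}) \prod_{1 \le j \le N} (1 - ze^{-V_j}), \tag{39}$$

where

$$L(\tau) = \begin{cases} \sum_{0 \le j < i} (V_{j+1} - U_j) & \text{if } V_i < \tau < U_i, \\ \sum_{0 \le j < i} (V_{j+1} - U_j) + \tau - U_i & \text{if } U_i < \tau < V_{i+1}. \end{cases}$$

The number $L(\tau)$ is the total length of $(-)$ intervals which are to the left of $\tau$. It is easy to see that $L(t) + B(t) = t/2 - u_0$.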


Therefore Eq. (39) can be written as

$$e^{\chi - \tau/2}z - e^{\chi + \tau/2} = f(z), \qquad \text{where} \qquad f(z) = e^{U_0}\, \frac{\prod_{i=0}^{N} (ze^{-U_i} - 1)}{\prod_{i=1}^{N} (ze^{-V_i} - 1)}. \tag{40}$$

6.1. The number of roots

Theorem 3. Equation (40) has either $N$ real solutions or $N - 2$ real solutions and two complex conjugate solutions.

Proof. The function $f(z)$ has simple poles at $z = v_i$ and $z = \infty$ and simple zeros at $z = u_i$. A sample graph of the function $f(z)$ is plotted in Fig. 5. Solutions to Eq. (40) are intersection points of the line $e^{\chi - \tau/2}z - e^{\chi + \tau/2}$ with the graph of the function $f(z)$. Any straight line of non-infinite slope obviously intersects the graph of $f(z)$ in at least $N - 2$ points (and in at most $N$ points, since the degree of $f$ equals $N$). Hence among the roots of (40) there is at most one complex conjugate pair.

6.2. The N = 1 case. First, consider Eq. (39). When $N = 1$ the normalization of $U$'s and $V$'s given by Eq. (6) implies $V_1 = 0$. Introduce variables $X = e^{\chi + \tau/2}$, $T = e^{-\tau}$, $\alpha = e^{U_0}$, $\beta = e^{-U_1}$.

Fig. 5. The graph of $f(z)$ for $\{U_0, V_1, U_1, V_2, U_2\} = \{-2, -0.5, 1.5, 2, 3\}$ (horizontal axis $z$, vertical axis $y$)


The equation for critical points of $S(z)$ is quadratic:

$$(\beta - XT)z^2 + (X + XT - (1 + \alpha\beta))z + (\alpha - X) = 0. \tag{41}$$

The discriminant of this equation is:

$$\Delta = (X - XT - \alpha + \beta)^2 - 2(X + XT)(1 - \alpha)(1 - \beta) + (1 - \alpha^2)(1 - \beta^2). \tag{42}$$
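The closed form (42) can be checked against the textbook discriminant $b^2 - 4ac$ of the quadratic (41). The sketch below does this numerically and also returns both roots; the function names are illustrative, and it assumes $\beta \ne XT$ so that the equation is genuinely quadratic.

```python
import cmath


def quadratic_coefficients(X, T, alpha, beta):
    """Coefficients (a, b, c) of Eq. (41), read as a z^2 + b z + c = 0."""
    a = beta - X * T
    b = X + X * T - (1 + alpha * beta)
    c = alpha - X
    return a, b, c


def discriminant_42(X, T, alpha, beta):
    """Right-hand side of Eq. (42)."""
    return ((X - X * T - alpha + beta) ** 2
            - 2 * (X + X * T) * (1 - alpha) * (1 - beta)
            + (1 - alpha ** 2) * (1 - beta ** 2))


def critical_points(X, T, alpha, beta):
    """Both roots of Eq. (41) as complex numbers (assumes a != 0)."""
    a, b, c = quadratic_coefficients(X, T, alpha, beta)
    root = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)
```

With a negative discriminant the two returned roots form a complex conjugate pair, which is the case relevant for the disordered region.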

The structure of solutions depends on the value of the discriminant $\Delta$.

• When $\Delta < 0$ there are two complex conjugate critical points.
• When $\Delta > 0$ these critical points are real.
• When $\Delta = 0$ the two simple critical points degenerate into one double critical point.

6.3. The N = 2 case. This is the smallest value of $N$ for which the function $S(z)$ can have a triple critical point. The function $S(z)$ for the case when $N = 2$ and $V_1 + V_2 < \tau < V_2$ is

$$S(z) = \int_{U_0}^{V_1} \ln(1 - z^{-1}e^{\mu})\, d\mu + \int_{V_1+V_2}^{\tau} \ln(1 - z^{-1}e^{\mu})\, d\mu - \int_{V_2}^{U_2} \ln(1 - z^{-1}e^{\mu})\, d\mu - (\chi + B(\tau))\ln z.$$

We choose branches of logarithms such that the derivative of $S(z)$ has branch cuts $[e^{U_0}, e^{V_1}]$, $[e^{V_1+V_2}, e^{\tau}]$, and $[e^{V_2}, e^{U_2}]$. The function $S(z)$ has three critical points. They are either all real or there is a complex conjugate pair of simple complex critical points. Geometrically these critical points correspond to intersection points of the line $e^{\chi - \tau/2}z - e^{\chi + \tau/2}$ with the graph of the function $f(z)$. When the line $e^{\chi - \tau/2}z - e^{\chi + \tau/2}$ intersects the graph of the function $f(z)$ transversally, the intersection points are simple critical points of $S(z)$. At double critical points the line $e^{\chi - \tau/2}z - e^{\chi + \tau/2}$ is tangent to $f(z)$ but it does not bisect the graph of $f(z)$ at the point where it is tangent to the graph. At a triple critical point the line is tangent to the graph of $f(z)$ and bisects it. By definition, a double critical point $z$ of the function $S(z)$ satisfies the two equations $S'(z) = 0$ and $S''(z) = 0$. We have

$$z\frac{\partial}{\partial z}S(z) = -\chi - \ln\big(ze^{-\frac{\tau}{2}} - e^{\frac{\tau}{2}}\big) + \ln f(z),$$

$$\Big(z\frac{\partial}{\partial z}\Big)^2 S(z) = -\frac{ze^{-\frac{\tau}{2}}}{ze^{-\frac{\tau}{2}} - e^{\frac{\tau}{2}}} + z\frac{f'(z)}{f(z)}.$$

This gives the system of equations for double critical points:

$$e^{\chi - \tau/2}z - e^{\chi + \tau/2} = f(z), \tag{43}$$
$$e^{\chi - \tau/2} = f'(z). \tag{44}$$

These equations define the curve in the $(\chi, \tau)$ plane. We will say the point $(\chi, \tau)$ on this curve is generic if it is not a triple critical point, i.e. if $S'''(z) \ne 0$, where $z$ is the corresponding double critical point of $S(z)$ (a solution to (43)).
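The system (43), (44) gives an explicit parametrization of the curve by the double critical point $z$: $e^{\chi - \tau/2} = f'(z)$ and $e^{\chi + \tau/2} = z f'(z) - f(z)$. The following minimal sketch evaluates it for the simpler case $N = 1$, $V_1 = 0$ of Sect. 6.2, where $f(z) = (z - \alpha)(\beta z - 1)/(z - 1)$, and checks that the discriminant (42) vanishes at the resulting $(\chi, \tau)$. All function names are illustrative assumptions.

```python
import math


def f_n1(z, alpha, beta):
    """f(z) of Eq. (40) for N = 1, V1 = 0, alpha = e^{U0}, beta = e^{-U1}."""
    return (z - alpha) * (beta * z - 1) / (z - 1)


def f_n1_prime(z, alpha, beta):
    """Derivative of f_n1 by the quotient rule."""
    numer = (z - alpha) * (beta * z - 1)
    numer_prime = 2 * beta * z - (1 + alpha * beta)
    return (numer_prime * (z - 1) - numer) / (z - 1) ** 2


def boundary_point(z, alpha, beta):
    """(chi, tau) solving (43)-(44) for a real double critical point z."""
    slope = f_n1_prime(z, alpha, beta)            # equals e^{chi - tau/2}
    intercept = z * slope - f_n1(z, alpha, beta)  # equals e^{chi + tau/2}
    if slope <= 0 or intercept <= 0:
        raise ValueError("no real (chi, tau) corresponds to this z")
    tau = math.log(intercept / slope)
    chi = 0.5 * math.log(intercept * slope)
    return chi, tau


def discriminant(chi, tau, alpha, beta):
    """Discriminant (42) with X = e^{chi + tau/2}, T = e^{-tau}."""
    X, T = math.exp(chi + tau / 2), math.exp(-tau)
    return ((X - X * T - alpha + beta) ** 2
            - 2 * (X + X * T) * (1 - alpha) * (1 - beta)
            + (1 - alpha ** 2) * (1 - beta ** 2))
```

Sweeping `z` over the values for which both exponentials are positive traces out the curve in the $(\chi, \tau)$ plane.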


Denote by $z_0$ the triple critical point of $S(z)$ and by $(\chi_0, \tau_0)$ the corresponding values of $\chi$ and $\tau$. By definition $S'(z_0) = S''(z_0) = S'''(z_0) = 0$, which gives one more equation in addition to (43): $f''(z_0) = 0$. It is clear from the shape of the graph of $f(z)$ that for each value of $\tau$ there are either 2 or 4 double critical points of $S(z)$. They satisfy the following inequalities:

1. If $-\infty < \tau < V_1$, then $z_1 < 0$ and $0 < z_2 < e^{V_1}$.
2. If $V_1 < \tau < U_1$, then $z_1 < 0$ and $e^{V_1} < z_2 < e^{\tau}$.
3. If $U_1 < \tau < \tau_0$, then $z_1 < 0$, $e^{V_1} < z_2 < z_0$, $z_0 < z_3 < e^{U_1}$, and $e^{\tau} < z_4 < e^{V_2}$.
4. If $\tau_0 < \tau < V_2$, then $z_1 < 0$ and $e^{\tau} < z_2 < e^{V_2}$.
5. If $V_2 < \tau < U_2$, then $z_1 < 0$ and $e^{V_2} < z_2$.

It is also clear that if $\tau_0 > U_1$, then $e^{V_1} < z_0 < e^{U_1}$.

6.4. Deformation of integration contours. We want to deform contours of integration in (38) to position them in the way the steepest descent method requires. If $z_c$ is a critical point of $S(z)$, and if it does not lie on a branch cut of the function $S(z)$, the integration contour should be deformed to a contour which lies on the curve $\Im(S(z)) = \Im(S(z_c))$. The function $S(z)$ has branch cuts along the real line. However, only $\exp\big(\frac{S(z)}{r}\big)F(z)$, which is the leading term of the asymptotic of $\frac{\Phi_-(z,t)}{\Phi_+(z,t)}z^{-h-B(t)}$, appears in the integral (38). Zeros of $\Phi_+(z)$ are accumulating along $(e^{V_2}, e^{U_2})$. Therefore we cannot deform $C_z$ through this segment but we can deform it through any other part of the real line. Similarly, zeros of $\Phi_-(w)$ are accumulating in the segments $(e^{U_0}, e^{V_1})$ and $(e^{U_1}, e^{\tau})$. Thus, the contour $C_w$ cannot be deformed through these segments but can be deformed through any other segment of the real line. Therefore we have to deform contours $C_z$ and $C_w$ to the union of appropriate branches of curves $\Im(S(z)) = \Im(S(z_c + i0))$ and $\Im(S(z)) = \Im(S(z_c - i0))$. Figures 6, 7, and 8


Fig. 6. Level curves of (S) when all critical points are simple

Fig. 7. Level curves of (S) when there is a double critical point

Fig. 8. Level curves of (S) when there is a triple critical point

Fig. 9. Deformation of the integration contour C z for a double critical point

show these curves for simple critical points, double critical points and the triple critical point, respectively. Deformed contours of integration C z and Cw are shown in Figs. 13,14,9,10,11,12 for simple, double, and triple critical points respectively. While deforming contours C z and Cw one should keep track of the residues at the pole z = w. We will discuss this later.

Fig. 10. Deformation of the integration contour Cw for a double critical point

Fig. 11. Deformation of the integration contour C z for a triple critical point

Fig. 12. Deformation of the integration contour Cw for a triple critical point

7. Simple Critical Points and Bulk Limit

It is easy to show that if as $r \to 0$ the coordinates $h_a$, $t_a$ in local correlation functions are scaled as $h_a = \chi_a/r$, $t_a = \tau_a/r$ for finite $\chi_a$ and $\tau_a$ with $\chi_a \ne \chi_b$ and $\tau_a \ne \tau_b$ for $a \ne b$, then

$$\lim_{r \to 0} \langle \rho_{h_1,t_1} \cdots \rho_{h_n,t_n} \rangle = \prod_{a=1}^{n} \rho(\chi_a, \tau_a),$$

where $\rho(\chi_a, \tau_a)$ is the limit of the one point correlation function (see below). This reflects the absence of correlations in the scaling limit between two macroscopically separated points.

Fig. 13. Deformation of the integration contour for a simple critical point when t1 ≥ t2

Fig. 14. Deformation of the integration contours for a simple critical point when t1 < t2

7.1. Bulk limit of correlation functions. Here we will compute the asymptotic of the integral (38) when there is a pair of complex conjugate critical points and when $h = h_1 - h_2$ and $t = t_1 - t_2$ are fixed in the limit $r \to 0$. Deform contours $C_z$ and $C_w$ to $C_1$ and $C_2$ respectively as shown in Fig. 13 for $t_1 \ge t_2$ and in Fig. 14 for $t_1 < t_2$. Taking into account the residue at $z = w$ we have the identity:

$$\int_{C_z}\int_{C_w} \cdots = \int_{C_1}\int_{C_2} \cdots + \int_{\gamma_\pm} \cdots. \tag{45}$$

Here we take $+$ for $t \ge 0$ and $-$ otherwise. If $U_i < \tau < V_{i+1}$ then $\gamma_+$ is a simple curve connecting $z_c$ and $\bar z_c$ and passing through the positive part of the real line with $e^{V_{i+1}} < z$. Similarly, $\gamma_-$ is a simple curve connecting $z_c$ and $\bar z_c$ and passing through the negative part of the real line. If $V_i < \tau < U_i$ then $\gamma_\pm$ are again simple curves connecting $z_c$ and $\bar z_c$ in such a way that $\gamma_-$ intersects the negative part of the real line and $\gamma_+$ intersects the positive part of the real line at $e^{\tau} < z$. As $r \to 0$ only the second term in the right hand side of (45) will have a finite limit. The first term will vanish. The computations are similar to [15]. The integrals along $\gamma_\pm$ converge as $r \to 0$ to

$$\kappa(h,t) = \begin{cases} B^{(\varepsilon(\tau))}_{\gamma_+}(t,\ h + \varepsilon(\tau)t/2), & t \ge 0, \\ B^{(\varepsilon(\tau))}_{\gamma_-}(t,\ h + \varepsilon(\tau)t/2), & t < 0, \end{cases} \tag{46}$$

where

$$B^{(\pm)}_{\gamma}(k,l) = \frac{1}{2\pi i}\int_{\gamma} (1 - e^{\mp\tau}w^{\pm 1})^k\, w^{-l-1}\, dw.$$

Here $\varepsilon(\tau) = 1$ when $V_i < \tau < U_i$ and $\varepsilon(\tau) = -1$ when $U_i < \tau < V_{i+1}$. The contours $\gamma_\pm$ are as above. For the limit of correlation functions we obtain the following answer:

$$\lim_{r \to 0} \langle \rho_{h_1,t_1} \cdots \rho_{h_n,t_n} \rangle = \det(\kappa_{a,b})_{1 \le a,b \le n},$$

where

$$\kappa_{a,b} = \kappa(h_a - h_b,\ t_a - t_b) = \lim_{r \to +0} K((t_a,h_a),(t_b,h_b)).$$

Correlation functions for finite $q$ satisfy the recurrence equations (34) and (35). In the limit these recurrence relations turn into difference equations for correlation functions (7.1):

$$\kappa(h,t) - \kappa(h + \varepsilon(\tau)/2,\ t-1) + e^{-\varepsilon(\tau)\tau}\,\kappa(h - \varepsilon(\tau)/2,\ t-1) = \delta_{h,0}\,\delta_{t,0},$$

where $\varepsilon(\tau) = +1$ when $\tau \in D^+$, $\varepsilon(\tau) = -1$ when $\tau \in D^-$. These equations can also be directly deduced from the integral representation of correlation functions.

Remark 3. The difference operator in these equations is the Kasteleyn matrix on the infinite hexagonal lattice. The pairwise correlation function (7.1) can be regarded as the inverse for this matrix with the boundary conditions determined by $(\tau, \chi)$.

For one-time correlation functions we have:

$$\kappa(h, 0) = |z_c|^{-h}\, \frac{\sin(\theta h)}{\pi h},$$

where $\theta$ is the argument of $z_c$, that is, $z_c = |z_c|e^{i\theta}$. Notice that the factor $|z_c|^{-h}$ does not contribute to the equal-time density correlation functions and we have:

$$\lim_{r \to 0} \langle \rho_{h_1,t} \cdots \rho_{h_n,t} \rangle = \det\Big(\frac{\sin(\theta(h_a - h_b))}{\pi(h_a - h_b)}\Big)_{1 \le a,b \le n}.$$

Here we assume that $h_1 > h_2 > \cdots > h_n$.
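The one-time formula can be illustrated numerically. At $t = 0$ the integrand in (46) reduces to $w^{-h-1}$, so $\kappa(h,0) = \frac{1}{2\pi i}\int_{\gamma_+} w^{-h-1}dw$. The sketch below integrates along a circular arc from $\bar z_c$ to $z_c$ through the positive real axis; this choice of path and orientation is an assumption made for illustration (any homotopic path avoiding the origin gives the same value for integer $h$), and the function names are hypothetical.

```python
import cmath
import math


def kappa_equal_time(h, z_c, steps=20000):
    """Midpoint-rule value of (1 / (2 pi i)) * integral of w^(-h-1) dw.

    The path is the arc |w| = |z_c| from conj(z_c) to z_c crossing the
    positive real axis; h is an integer and z_c lies in the upper half-plane.
    """
    radius, theta = abs(z_c), cmath.phase(z_c)
    dphi = 2 * theta / steps
    total = 0j
    for k in range(steps):
        phi = -theta + (k + 0.5) * dphi
        w = radius * cmath.exp(1j * phi)
        total += w ** (-h - 1) * (1j * w) * dphi  # dw = i w dphi on the arc
    return total / (2j * math.pi)


def sine_kernel(h, z_c):
    """Closed form |z_c|^(-h) sin(theta h) / (pi h); equals theta/pi at h = 0."""
    radius, theta = abs(z_c), cmath.phase(z_c)
    if h == 0:
        return theta / math.pi
    return radius ** (-h) * math.sin(theta * h) / (math.pi * h)
```

At $h = 0$ both functions return $\theta/\pi$, the value of the density of horizontal tiles.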


7.2. Limit shape. The form of the one-point correlation function implies that the density of horizontal tiles is

$$\rho(\tau, \chi) = \frac{\theta}{\pi},$$

where $\theta$ is the argument of $z_c$. The limit shape can be reconstructed from this density by integration:

$$z(\tau, \chi) = \int_{-b(\tau)}^{\chi} (1 - \rho(\tau, s))\, ds, \tag{47}$$

$$x(\tau, \chi) = z(\tau, \chi) - \chi - \frac{\tau}{2}, \qquad y(\tau, \chi) = z(\tau, \chi) - \chi + \frac{\tau}{2}. \tag{48}$$

Here $b(\tau) = \max(-\tau/2 + U_0,\ \tau/2 + U_N)$. Thus, the information about the limit shape is in the structure of critical points of the function $S(z)$. When the point $(\tau, \chi)$ is such that all critical points of $S(z)$ are real the limit of correlation functions (7.1) is either 1 or 0. It is zero if the maximum of $S(z)$ is inside the cycle of integration and it is 1 if it is outside. The corresponding point $(x, y, z)$ lies on a facet (flat part of the limit shape). If $(\tau, \chi)$ is such that there is a pair of complex conjugate simple critical points, the point $(x, y, z)$ lies in the disordered region (curved part of the limit shape). The frozen boundary of the limit shape (that is, the boundary between the disordered region and the facets) corresponds to $(\chi, \tau)$ for which there exists a real critical point of $S(z)$ of multiplicity at least 2. Cusps of the frozen boundary correspond to triple critical points. We plotted the frozen boundary in the $(\tau, \chi)$-plane in Fig. 15 for $N = 1$ and in Fig. 16 for $N = 2$. The vicinity of the cusp is magnified in Fig. 17. It is instructive to compare these curves with the result of the numerical simulation in Figure 2.

Fig. 15. An example of frozen boundary for N = 1

Fig. 16. An example of frozen boundary for N = 2


Fig. 17. The vicinity of a cusp

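8. Double Critical Points and the Scaling Limit Near the Boundary

8.1. Now, let us assume that $(\chi_0, \tau_0)$ are such that $z_0$ is a double critical point of $S(z)$. Consider the vicinity of $(z_0, \chi_0, \tau_0)$ with coordinates $z = z_0\exp(\xi)$, $\chi = \chi_0 + \delta\chi$, $\tau = \tau_0 + \delta\tau$. Lowest degree terms in the Taylor expansion around $(z_0, \chi_0, \tau_0)$ are:

$$S(z, \chi, \tau) = S(z_0, \chi_0, \tau_0) + \frac{A}{3!}\xi^3 - (\delta\chi - B\delta\tau)\xi + \frac12 C\delta\tau\,\xi^2 + \frac12 D\delta\tau^2\,\xi + \frac{1}{3!}E\delta\tau^3 + H\delta\tau + \frac12 G\delta\tau^2 - \ln(z_0)\delta\chi + \cdots,$$

where

$$A = \Big(z\frac{\partial}{\partial z}\Big)^3 S(z)\Big|_{z_0,\tau_0} = \frac{z_0^2 f''(z_0)}{f(z_0)}, \tag{49}$$

$$B = z\frac{\partial^2}{\partial\tau\,\partial z}S(z)\Big|_{z_0,\tau_0} = \frac{z_0e^{-\frac{\tau_0}{2}} + e^{\frac{\tau_0}{2}}}{2\big(z_0e^{-\frac{\tau_0}{2}} - e^{\frac{\tau_0}{2}}\big)}, \tag{50}$$

$$C = \Big(z\frac{\partial}{\partial z}\Big)^2\frac{\partial}{\partial\tau}S(z)\Big|_{z_0,\tau_0} = -\frac{z_0}{\big(z_0e^{-\frac{\tau_0}{2}} - e^{\frac{\tau_0}{2}}\big)^2}, \tag{51}$$

$$D = z\frac{\partial}{\partial z}\frac{\partial^2}{\partial\tau^2}S(z)\Big|_{z_0,\tau_0} = \frac{z_0}{\big(z_0e^{-\frac{\tau_0}{2}} - e^{\frac{\tau_0}{2}}\big)^2}, \tag{52}$$

$$E = \frac{\partial^3}{\partial\tau^3}S(z)\Big|_{z_0,\tau_0} = -\frac{z_0}{\big(z_0e^{-\frac{\tau_0}{2}} - e^{\frac{\tau_0}{2}}\big)^2}, \tag{53}$$

$$H = \frac{\partial}{\partial\tau}S(z)\Big|_{z_0,\tau_0} = \begin{cases} -\frac12\ln(z_0) + \ln(z_0 - e^{\tau_0}), & \tau_0 \in D_-, \\ -\frac12\ln(z_0) + \ln(e^{\tau_0} - z_0) - \tau_0, & \tau_0 \in D_+, \end{cases} \tag{54}$$

$$G = \frac{\partial^2}{\partial\tau^2}S(z)\Big|_{z_0,\tau_0} = \begin{cases} -\dfrac{e^{\tau_0}}{z_0 - e^{\tau_0}}, & \tau_0 \in D_-, \\[2mm] \dfrac{z_0}{e^{\tau_0} - z_0}, & \tau_0 \in D_+. \end{cases} \tag{55}$$

Notice that $C = -D$, $E = -D$, $D > 0$, $A/B > 0$. The sign of $A$ depends on the nature of the interface. It is positive if the frozen region is above the melted region, and it is negative otherwise. It changes signs at the points $z_0 = \exp(U_i)$ and at the triple critical points where $f''(z_0) = 0$. Rescaling local coordinates $\xi$, $\delta\chi$ and $\delta\tau$ as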

2

1

ξ = r 3 σ, δχ − Bδτ = r 3 x, δτ = r 3 y

594

A. Okounkov, N. Reshetikhin

we have S(z, χ , τ ) − S(z 0 , χ0 , τ0 ) 1 1 (H − B ln(z 0 ))y A = σ 3 − xσ + C yσ 2 + Dy 2 σ + r 3! 2 2 r 2/3 2 Gy − ln(z 0 )x 1 × + E y 3 + o(1). (56) r 1/3 3! 8.2. As r → 0 the leading asymptotic of the integral (38) is determined by the leading asymptotic of the integral: √ S(z χ1 ,τ1 )−S(w,χ2 ,τ2 ) zw dz dw 1 r K (h 1 , t1 ), (h 2 , t2 )) = e (1 + O(r 1/3 ). (2πi)2 C z Cw z−w z w (57) If V1 < τ < V2 , the contour of integration is 1 > |z| > |w| > e V1 if V1 < τ < U1 C z × Cw = e V2 > |z| > |w| > eτ if U1 < τ < V2 for t1 ≥ t2 . When t1 < t2 the only difference is that |z| < |w|. Assume that coordinates (h i , ti ) are in the vicinity of the point (χ0 , τ0 ), and that as r → 0 they scale in the following way: 2

1

1

χi = r h i = χ0 + r 3 xi − Br 3 yi , τi = r ti = τ0 + r 3 yi .

(58)

In this limit, conditions t1 ≥ t2 , t2 > t1 translate to y1 ≥ y2 and y2 > y1 respectively. Deform contours of integration in such a way that they pass through this critical point and will follow the branches of curves (S(z)) = (S(z 0 )) and (S(w)) = (S(z 0 )), where (S(z)) and −(S(w)) have local maximum. Deformed contours are described in Sect. 6.4. The leading contribution to the asymptotic comes from the vicinity of the point z 0 1 1 where we will use local coordinates z = z 0 (1 + r 3 σ ) and w = z 0 (1 + r 3 κ). The saddle point integration contours for σ and κ in the limit r → 0 are shown on Fig. 18 and Fig. 19 for A > 0. The leading term of the asymptotic of the integral (57) is given by H G ln(z 0 ) (y1 − y2 ) + 1/3 (y12 − y22 ) − 1/3 (x1 − x2 ) r 2/3 r r B ln(z 0 ) 1 D D − 2/3 (y1 −y2 )+ E(y13 −y23 ))K A ((x1 − y12 , y1 ), (x2 − y22 , y2 ))(1+O(r 1/3 )). r 3! 2 2 (59) K ((h 1 , t1 ), (h 2 , t2 )) = r 1/3 exp(

Fig. 18. Integration contour for σ


Fig. 19. Integration contour for κ

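For $y_1 \ge y_2$ the function $K_A$ is

$$K_A((a_1,b_1),(a_2,b_2)) = \frac{1}{(2\pi i)^2}\int_{C_+}\int_{C_-} \exp\Big(\frac{A}{3!}(\sigma^3 - \kappa^3) - \frac{D}{2}(b_1\sigma^2 - b_2\kappa^2) - a_1\sigma + a_2\kappa\Big)\frac{d\sigma\, d\kappa}{\sigma - \kappa}. \tag{60}$$

For $y_1 < y_2$ it has an extra term which comes from the residue at $\sigma = \kappa$:

$$K_A((a_1,b_1),(a_2,b_2)) = \frac{1}{(2\pi i)^2}\int_{C_+}\int_{C_-} \exp\Big(\frac{A}{3!}(\sigma^3 - \kappa^3) - \frac{D}{2}(b_1\sigma^2 - b_2\kappa^2) - a_1\sigma + a_2\kappa\Big)\frac{d\sigma\, d\kappa}{\sigma - \kappa} + \frac{1}{2\pi i}\int_{C_+} \exp\Big(-\frac{D}{2}(b_1 - b_2)\sigma^2 - (a_1 - a_2)\sigma\Big)d\sigma. \tag{61}$$

The function $K_A((x_1,y_1),(x_2,y_2))$ is discontinuous at $y_1 = y_2$. The discontinuity is determined by the second term. It is not difficult to see that it is equal to

$$\frac{1}{\sqrt{2\pi D|y_1 - y_2|}}\exp\Big(-\frac{(x_1 - x_2)^2}{2D|y_1 - y_2|}\Big).$$

Notice that since correlation functions (23) are given by determinants, the exponential factors in front of the integral in (60) will be canceled and we have:

$$\lim_{r \to 0} r^{-n/3}\langle \rho_{h_1,t_1} \cdots \rho_{h_n,t_n} \rangle = \det\Big(K_A\Big(\big(x_a - \tfrac{D}{2}y_a^2,\ y_a\big),\ \big(x_b - \tfrac{D}{2}y_b^2,\ y_b\big)\Big)\Big).$$

Here we assume that $(h_i, t_i)$ are scaled as in (58). It is easy to verify that

$$K_A((x_1,0),(x_2,0)) = \frac{\mathrm{Ai}(\lambda x_1)\,\mathrm{Ai}'(\lambda x_2) - \mathrm{Ai}(\lambda x_2)\,\mathrm{Ai}'(\lambda x_1)}{\lambda(x_1 - x_2)},$$

where $\mathrm{Ai}(x)$ is the Airy function:

$$\mathrm{Ai}(x) = \frac{1}{2\pi}\int_{\mathbb{R}} \exp\Big(\frac{i}{3}t^3 + ixt\Big)dt,$$

and $\lambda = \big(\frac{2}{A}\big)^{1/3}$.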
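The Gaussian form of the residue term quoted above can be checked by direct quadrature. The sketch below assumes that the contour $C_+$ in the single integral of (61) may be taken to be the imaginary axis oriented upwards, and writes `da`, `db` for the differences of the first and second arguments; both the contour choice and the names are assumptions for illustration. The integral converges only when `db < 0`.

```python
import math


def residue_term_quadrature(da, db, D, half_width=40.0, steps=40000):
    """(1 / (2 pi i)) * integral over the imaginary axis of
    exp(-(D/2) * db * sigma^2 - da * sigma) d sigma, for db < 0.

    With sigma = i s the integrand becomes exp((D/2) * db * s^2) * cos(da * s);
    the odd sine part integrates to zero. Trapezoid rule on a finite window.
    """
    if db >= 0:
        raise ValueError("the integral converges only for db < 0")
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        s = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(0.5 * D * db * s * s) * math.cos(da * s)
    return total * h / (2 * math.pi)


def gaussian_closed_form(da, db, D):
    """exp(-da^2 / (2 D |db|)) / sqrt(2 pi D |db|)."""
    scale = D * abs(db)
    return math.exp(-da * da / (2 * scale)) / math.sqrt(2 * math.pi * scale)
```

The two functions agree to quadrature accuracy, which is the heat-kernel form of the discontinuity at equal second arguments.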


For the scaling limit of the density of horizontal tiles we have: ρ A (x, y) =

2 2

C y 1 s3 − t 3 i dsdt −i exp −i − − x (s − t) . (62) 2 1/3 (2π ) λ R R 3 A 2A s − t − i0 It can also be written as ρ A (x, y) =

$$\int_{-\infty}^{\lambda\left(\frac{C^2y^2}{2A} - x\right)} \mathrm{Ai}(v)^2\, dv.$$

In this form it is clear that the density function is positive.

9. Triple Critical Points and the Scaling Limit Near the Cusp

9.1. The singularity of the limit curve. The following holds if $z_0$ is a triple critical point:

$$\Big(z\frac{\partial}{\partial z}\Big)^4 S(z)\Big|_{z_0} = z_0^3\, \frac{f^{(3)}(z_0)}{f(z_0)}. \tag{63}$$

The proof is straightforward. First, one computes the fourth derivative:

$$\Big(z\frac{\partial}{\partial z}\Big)^4 S(z) = \frac{z}{\big(ze^{-\frac{\tau}{2}} - e^{\frac{\tau}{2}}\big)^2} - \frac{2z^2e^{-\frac{\tau}{2}}}{\big(ze^{-\frac{\tau}{2}} - e^{\frac{\tau}{2}}\big)^3} + z\frac{f'}{f} + 3z^2\Big(\frac{f'}{f}\Big)' + z^3\Big(\frac{f'}{f}\Big)''.$$

This expression reduces to (63) after taking into account equations for $z_0$. The second derivative of $f(z)$ vanishes at $z_0$ and, as follows from the graph of $f(z)$, it is negative for $z < z_0$ and positive for $z > z_0$. This implies the positivity of $f^{(3)}(z_0)$. This means that the sign of $A$ is determined by the sign of $f(z_0)$. The latter is negative if $z_0 < \exp(U_1)$ (when the cusp on the limit shape is turned to the right) and is positive otherwise (i.e. when the cusp is turned to the left). Now, let us find the behavior of the boundary curve near the cusp. Assume that $z_0$ is a triple critical point corresponding to the cusp with singularity at $(\chi_0, \tau_0)$, i.e. $z_0$, $\chi_0$ and $\tau_0$ satisfy the system:

$$e^{\chi_0 - \tau_0/2} = f'(z_0), \tag{64}$$
$$e^{\chi_0 + \tau_0/2} = z_0f'(z_0) - f(z_0), \tag{65}$$
$$0 = f''(z_0). \tag{66}$$

If a point $(\chi, \tau)$ is on the boundary curve, it satisfies the equations:

$$e^{\chi - \tau/2} = f'(z), \qquad e^{\chi + \tau/2} = zf'(z) - f(z). \tag{67}$$

Let $(z, \chi, \tau)$ be a double critical point in a small vicinity of $(z_0, \chi_0, \tau_0)$: $\chi = \chi_0 + \delta\chi$, $\tau = \tau_0 + \delta\tau$, $z = z_0 + \varepsilon$. Then from Eqs. (67), (64) we obtain the following asymptotic of the boundary curve near the cusp:

$$\exp(\chi_0 - \tau_0/2)\,(\delta\chi - \delta\tau/2) = \frac12 f^{(3)}(z_0)\varepsilon^2 + \frac16 f^{(4)}(z_0)\varepsilon^3 + O(\varepsilon^4),$$

$$\exp(\chi_0 + \tau_0/2)\,(\delta\chi + \delta\tau/2) = \frac12 z_0f^{(3)}(z_0)\varepsilon^2 + \frac16\big(z_0f^{(4)}(z_0) + 2f^{(3)}(z_0)\big)\varepsilon^3 + O(\varepsilon^4)$$

when $\varepsilon \to 0$. From here we have:

$$\delta\chi = \frac14\exp\Big(-\chi_0 - \frac{\tau_0}{2}\Big)\Big((z_0 + \exp(\tau_0))f^{(3)}(z_0)\varepsilon^2 + \frac13\big((z_0 + \exp(\tau_0))f^{(4)}(z_0) + 2f^{(3)}(z_0)\big)\varepsilon^3 + \cdots\Big), \tag{68}$$

$$\delta\tau = \frac12\exp\Big(-\chi_0 - \frac{\tau_0}{2}\Big)\Big((z_0 - \exp(\tau_0))f^{(3)}(z_0)\varepsilon^2 + \frac13\big((z_0 - \exp(\tau_0))f^{(4)}(z_0) - 2f^{(3)}(z_0)\big)\varepsilon^3 + \cdots\Big). \tag{69}$$

After reparametrization

$$\varepsilon = \sigma - \frac{(z_0 - e^{\tau_0})f^{(4)}(z_0) - 2f^{(3)}(z_0)}{6f^{(3)}(z_0)(z_0 - e^{\tau_0})}$$

we have:

$$\delta\tau = \frac12\exp\Big(-\chi_0 - \frac{\tau_0}{2}\Big)(z_0 - \exp(\tau_0))f^{(3)}(z_0)\sigma^2 + O(\sigma^3),$$

$$\delta\chi = \frac14\exp\Big(-\chi_0 - \frac{\tau_0}{2}\Big)\Big((z_0 + \exp(\tau_0))f^{(3)}(z_0)\sigma^2 - \frac43\,\frac{\exp(\tau_0)}{z_0 - \exp(\tau_0)}f^{(3)}(z_0)\sigma^3\Big) + O(\sigma^4).$$

This is a parametrization of a cusp singularity in the boundary of the limit shape.

9.2. The asymptotic of correlation functions. Expanding the function $S(z, \chi, \tau)$ near the triple critical point $z_0$ we obtain the following lowest degree terms of the Taylor expansion:

$$S(z, \chi_0 + \delta\chi, \tau_0 + \delta\tau) = S(z_0, \chi_0, \tau_0) + \frac{A}{4!}\xi^4 - \delta\chi\,\xi + B\delta\tau\,\xi + \frac12 C\delta\tau\,\xi^2 + \frac12 D\delta\tau^2 - \ln(z_0)\delta\chi + H\delta\tau + o(1),$$

where

$$A = z_0^3\, \frac{f^{(3)}(z_0)}{f(z_0)}$$

and $B$ and $C$ are as before, given by (49), (51). Recall that $A < 0$ if the cusp of the limit shape is turned to the right and $A > 0$ if it is turned to the left. Scaling local coordinates as

$$\xi = r^{\frac14}\sigma, \qquad \delta\chi - B\delta\tau = r^{\frac34}x, \qquad \delta\tau = r^{\frac12}y$$

we obtain the following asymptotic for $S$:

$$\frac{S(z,\chi,\tau) - S(z_0,\chi_0,\tau_0)}{r} = \frac{A}{4!}\sigma^4 - x\sigma + \frac{C}{2}y\sigma^2 + \frac{D}{2}y^2 + \frac{H}{r^{1/2}}y - \frac{\ln(z_0)}{r^{1/4}}x + \frac{B\ln(z_0)}{r^{3/4}}y + o(1).$$

Now let us find the asymptotic of the integral (57) as $r \to 0$ assuming that coordinates are scaled as

$$rh_i = \chi_0 + r^{\frac34}x_i - Br^{\frac14}y_i, \qquad rt_i = \tau_0 + r^{\frac12}y_i, \tag{70}$$

where $(\chi_0, \tau_0)$ are coordinates of the tip of the cusp. The asymptotic of the integral (57) when $r \to 0$ under the scaling (70) can be evaluated by the steepest descent method. It is determined by the contribution from the triple critical point $z_0$. After deforming contours of integration as described in Sect. 6.4 the leading term of the asymptotic is given by the integral

$$K((h_1,t_1),(h_2,t_2)) = r^{1/4}\exp\Big(-\frac{\ln(z_0)}{r^{1/4}}(x_1 - x_2) + \frac{H}{r^{1/2}}(y_1 - y_2) + \frac{B\ln(z_0)}{r^{3/4}}(y_1 - y_2) + \frac{D}{2}(y_1^2 - y_2^2)\Big)\, K_P((x_1,y_1),(x_2,y_2))\,\big(1 + O(r^{1/4})\big), \tag{71}$$

where for $y_1 > y_2$ we have

$$K_P((x_1,y_1),(x_2,y_2)) = \frac{1}{(2\pi i)^2}\int_{C_1}\int_{C_2} \exp\Big(\frac{A}{4!}(\sigma^4 - \kappa^4) + \frac{C}{2}(y_1\sigma^2 - y_2\kappa^2) - x_1\sigma + x_2\kappa\Big)\frac{d\sigma\, d\kappa}{\sigma - \kappa} + \frac{1}{\sqrt{2\pi D|y_1 - y_2|}}\exp\Big(-\frac{(x_1 - x_2)^2}{2D|y_1 - y_2|}\Big). \tag{72}$$

For $y_1 < y_2$ there is an extra term coming from the residue at $\sigma = \kappa$:

$$K_P((x_1,y_1),(x_2,y_2)) = \frac{1}{(2\pi i)^2}\int_{C_1}\int_{C_2} \exp\Big(\frac{A}{4!}(\sigma^4 - \kappa^4) + \frac{C}{2}(y_1\sigma^2 - y_2\kappa^2) - x_1\sigma + x_2\kappa\Big)\frac{d\sigma\, d\kappa}{\sigma - \kappa}. \tag{73}$$

Integration contours for $A > 0$ are shown in Figs. 21 and 20. Notice that the exponential factors cancel in the limit of correlation functions and we have:

$$\lim_{r \to 0} r^{-n/4}\langle \rho_{h_1,t_1} \cdots \rho_{h_n,t_n} \rangle = \det\big(K_P((x_a,y_a),(x_b,y_b))\big).$$

Here we assume $(h_i, t_i)$ scale as in (70).


Fig. 20. Integration contour for κ

Fig. 21. Integration contour for σ

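10. Some Properties of Pearcey Kernel

Recall that the asymptotic of correlation functions is determined by the following integral, which we will call the Pearcey kernel:

$$K_P((x_1,y_1),(x_2,y_2)) = \frac{1}{(2\pi i)^2}\int_{C_1}\int_{C_2} \exp\Big(\frac{A}{4!}(\sigma^4 - \kappa^4) + \frac{C}{2}(y_1\sigma^2 - y_2\kappa^2) - x_1\sigma + x_2\kappa\Big)\frac{d\sigma\, d\kappa}{\sigma - \kappa}. \tag{74}$$

After the appropriate scaling of variables we can set $A = 6$ and $C = 1$. Below we will review some of its properties.

10.1. The function $K_P((x_1,y_1),(x_2,y_2))$ (with $A = 6$ and $C = 1$) satisfies the following differential equations:

$$\Big(\frac{\partial}{\partial y_i} - \frac12\frac{\partial^2}{\partial x_i^2}\Big)K_P = 0$$

for $i = 1, 2$ and

$$\Big(\frac{\partial}{\partial x_1} + \frac{\partial}{\partial x_2}\Big)K_P = -P_+(x_1,y_1)\,P_-(x_2,y_2),$$

where

$$P_\pm(x,y) = \frac{1}{2\pi i}\int_{C_\pm} \exp\Big(\pm\Big(\frac{\sigma^4}{4} + \frac12 y\sigma^2 - x\sigma\Big)\Big)d\sigma.$$

For functions $P_\pm$ we have:

$$\Big(\frac{\partial^3}{\partial x^3} \pm x + y\frac{\partial}{\partial x}\Big)P_\pm = 0,$$

$$\Big(\pm\frac{\partial}{\partial y} - \frac12\frac{\partial^2}{\partial x^2}\Big)P_\pm = 0.$$

10.2. The function $P_+(x,y)$ can be written as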

1 P+ (x, y) = − + − + 2πi ω R+ ω−1 R+ ω−3 R+ ω 3 R+ or, as 1 P+ (x, y) = (ω π

∞

o

i 2 s4 exp − − ys (e xωs − e−xωs )ds). 4 2

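From this integral representation one can find a power series formula for the integral:

$$P_+(x,y) = \frac{2}{\pi}\sum_{n \ge 0,\ m \ge 0,\ n+m \equiv 0\,(2)} (-1)^{\frac{n+m}{2}}\, \frac{2^{n-1}\,\Gamma\big(\frac{n+m+1}{2}\big)}{(2n+1)!\, m!}\, x^{2n+1}(-y)^m$$

or

$$P_+(x,y) = \frac{1}{\pi}\sum_{k,l \ge 0} \frac{(-1)^{k+l}\, 2^{2k}\, \Gamma(k + l + 1/2)}{(4k+1)!\,(2l)!}\, x^{4k+1}y^{2l} + \frac{1}{\pi}\sum_{k,l \ge 0} \frac{(-1)^{k+l}\, 2^{2k+1}\, \Gamma(k + l + 3/2)}{(4k+3)!\,(2l+1)!}\, x^{4k+3}y^{2l+1}.$$

Now, using the identity $\Gamma(k + 1/2) = 2^{-2k}\sqrt{\pi}\,\frac{(2k)!}{k!}$ we arrive at the formula

$$P_+(x,y) = \frac{1}{\sqrt{\pi}}\sum_{k,l \ge 0} \frac{(-1)^{k+l}\, 2^{-2l}\,(2k + 2l)!}{(4k+1)!\,(2l)!\,(k+l)!}\, x^{4k+1}y^{2l} + \frac{1}{\sqrt{\pi}}\sum_{k,l \ge 0} \frac{(-1)^{k+l}\, 2^{-2l-1}\,(2k + 2l + 2)!}{(4k+3)!\,(2l+1)!\,(k+l+1)!}\, x^{4k+3}y^{2l+1}.$$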
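The passage from the Gamma-function series to the factorial series rests only on the identity $\Gamma(k + 1/2) = 2^{-2k}\sqrt{\pi}\,(2k)!/k!$, so the two forms can be compared term by term. The following sketch sums both truncated series and is meant as a consistency check of the constants, not as an independent evaluation of the contour integral; the function names and truncation orders are illustrative choices.

```python
import math


def p_plus_series_nm(x, y, terms=40):
    """Series over n, m >= 0 with n + m even, written with Gamma functions."""
    total = 0.0
    for n in range(terms):
        for m in range(terms):
            if (n + m) % 2:
                continue
            sign = (-1) ** ((n + m) // 2)
            coeff = 2.0 ** (n - 1) * math.gamma((n + m + 1) / 2)
            coeff /= math.factorial(2 * n + 1) * math.factorial(m)
            total += sign * coeff * x ** (2 * n + 1) * (-y) ** m
    return 2 * total / math.pi


def p_plus_series_factorial(x, y, terms=20):
    """Series over k, l >= 0 after eliminating the Gamma functions."""
    total = 0.0
    for k in range(terms):
        for l in range(terms):
            sign = (-1) ** (k + l)
            even = math.factorial(2 * k + 2 * l) / (
                4.0 ** l * math.factorial(4 * k + 1)
                * math.factorial(2 * l) * math.factorial(k + l))
            odd = math.factorial(2 * k + 2 * l + 2) / (
                2 * 4.0 ** l * math.factorial(4 * k + 3)
                * math.factorial(2 * l + 1) * math.factorial(k + l + 1))
            total += sign * (even * x ** (4 * k + 1) * y ** (2 * l)
                             + odd * x ** (4 * k + 3) * y ** (2 * l + 1))
    return total / math.sqrt(math.pi)
```

With `terms=40` in the first function and `terms=20` in the second, both sums run over exactly the same monomials, so they should agree up to rounding.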

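10.3. The integral

$$P_-(x,y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\Big(-\frac{t^4}{4} + \frac12 yt^2 + ixt\Big)dt$$

can be easily expanded into a power series in $x$ and $y$:

$$P_-(x,y) = \frac{1}{\pi}\sum_{n,m \ge 0} \frac{(-1)^n\, 2^{n-3/2}\, \Gamma\big(\frac{n+m}{2} + 1/4\big)}{(2n)!\, m!}\, x^{2n}y^m.$$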
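The series for $P_-$ can be compared with a direct quadrature of the real-line integral. The sketch below assumes the normalization $\frac{1}{2\pi}\int_{-\infty}^{\infty}$, which is the prefactor that reproduces the stated series coefficients term by term; the function names, window and step count are illustrative choices.

```python
import math


def p_minus_series(x, y, terms=40):
    """Truncated power series for P_-(x, y)."""
    total = 0.0
    for n in range(terms):
        for m in range(terms):
            coeff = (-1) ** n * 2.0 ** (n - 1.5) * math.gamma((n + m) / 2 + 0.25)
            coeff /= math.factorial(2 * n) * math.factorial(m)
            total += coeff * x ** (2 * n) * y ** m
    return total / math.pi


def p_minus_quadrature(x, y, half_width=12.0, steps=20000):
    """Trapezoid rule for (1 / 2 pi) * integral of exp(-t^4/4 + y t^2/2 + i x t) dt.

    Only the even part exp(-t^4/4 + y t^2/2) * cos(x t) contributes.
    """
    h = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        t = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-t ** 4 / 4 + y * t * t / 2) * math.cos(x * t)
    return total * h / (2 * math.pi)
```

At the origin both reduce to $\Gamma(1/4)/(2^{3/2}\pi)$.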


Fig. 22. The scaling limit of the density of horizontal tiles near the cusp

10.4. One-time correlation functions can be expressed in terms of $P_\pm$ as follows:

$$K(x_1, x_2 \mid y) = \frac{P_+''(x_1,y)P_-(x_2,y) - P_+'(x_1,y)P_-'(x_2,y) + P_+(x_1,y)P_-''(x_2,y) + y\,P_+(x_1,y)P_-(x_2,y)}{x_1 - x_2}. \tag{75}$$

Here $P'(x,y) = \frac{\partial}{\partial x}P(x,y)$. This identity follows from

$$\int_{C_1}\int_{C_2} \Big(\frac{\partial}{\partial\sigma} + \frac{\partial}{\partial\kappa}\Big)\exp\Big(\frac{A}{4!}(\sigma^4 - \kappa^4) + \frac{C}{2}(y_1\sigma^2 - y_2\kappa^2) - x_1\sigma + x_2\kappa\Big)\frac{d\sigma\, d\kappa}{\sigma - \kappa} = 0. \tag{76}$$

Taking the limit $x_1 \to x_2 = x$ in (75) we obtain the following expression for the scaling limit of the density of tiles near the cusp:

$$\rho(x,y) = P_+'(x,y)P_-''(x,y) - P_+''(x,y)P_-'(x,y) - x\,P_+(x,y)P_-(x,y).$$

This function is plotted in Fig. 22.

A. Schur Functions

Recall that a partition is a sequence of integers $\lambda = (\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0)$. A diagram of a partition $\lambda$ (also known as its Young diagram) has $\lambda_1$ boxes in the upper row, $\lambda_2$ boxes in the next row, etc., see Fig. 23. Unless this leads to confusion, we identify partitions with their diagrams. The sum

$$|\lambda| = \sum_{i \ge 1} \lambda_i$$

is the area of the diagram $\lambda$.


Fig. 23. A Young diagram

Fig. 24. A semi-standard tableau

A skew diagram λ/μ is a pair of two partitions λ and μ such that λ1 ≥ μ1, λ2 ≥ μ2, . . . . Graphically, it is obtained from λ by removing the first μ1 boxes from the first row, the first μ2 boxes from the second row, etc. The size of the skew diagram λ/μ is |λ/μ| = |λ| − |μ|. A semi-standard tableau of shape λ/μ with entries 1, 2, 3, . . . is the result of writing numbers 1, 2, 3, . . . in the boxes of the diagram, one in each box, in such a way that the numbers weakly decrease along the rows and strictly decrease along the columns; see an example in Fig. 24. The Schur function corresponding to the skew diagram λ/μ is a symmetric function of variables x1, x2, . . . which can be defined as the sum over all semi-standard tableaux of shape λ/μ of monomials in $x_i$:

$$s_{\lambda/\mu}(x_1, x_2, \dots) = \sum_{T_{\lambda/\mu}} x_1^{m_1} x_2^{m_2} \cdots,$$

where $m_i$ is the number of occurrences of i in the tableau. Semi-standard tableaux of shape λ/μ can be identified with sequences of skew diagrams in the following way. Let us say that $\lambda \succ \mu$ if λ and μ interlace, that is, λ1 ≥ μ1 ≥ λ2 ≥ μ2 ≥ λ3 ≥ . . . . Let us call the sequence of diagrams λ(1), λ(2), . . . , λ(n) interlacing if $\lambda(1) \succ \lambda(2) \succ \cdots \succ \lambda(n)$. With such a sequence we associate the semi-standard tableau with |λ(1)| − |λ(2)| entries of n, |λ(2)| − |λ(3)| entries of n − 1, etc. It is clear that this correspondence gives a bijection between semi-standard tableaux of shape λ(1)/λ(n) and interlacing sequences of diagrams which start with λ(1) and end with λ(n). For skew Schur functions this bijection gives the following formula:

$$s_{\lambda/\mu}(x_1, \dots, x_n, 0, 0, \dots) = \sum_{\lambda \succ \lambda(1) \succ \cdots \succ \lambda(n-1) \succ \mu} x_1^{|\lambda| - |\lambda(1)|}\, x_2^{|\lambda(1)| - |\lambda(2)|} \cdots x_n^{|\lambda(n-1)| - |\mu|}. \qquad (77)$$

From now on we will write $s_{\lambda/\mu}(x_1, x_2, \dots, x_n)$ for $s_{\lambda/\mu}(x_1, \dots, x_n, 0, 0, \dots)$.
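The sum over interlacing sequences in (77) translates directly into a short recursion. The following is a minimal sketch, not an efficient algorithm: it peels off one variable at a time and enumerates all diagrams interlacing with the current one. The function names and the zero-padding convention for partitions are assumptions of this illustration.

```python
from itertools import product


def interlaced_below(lam):
    """Yield every nu with lam_1 >= nu_1 >= lam_2 >= nu_2 >= ... (same length as lam)."""
    bounds = list(lam) + [0]
    ranges = [range(bounds[i + 1], bounds[i] + 1) for i in range(len(lam))]
    yield from product(*ranges)


def skew_schur(lam, mu, xs):
    """Evaluate s_{lam/mu}(x_1, ..., x_n) as the sum over interlacing sequences in (77)."""
    size = max(len(lam), len(mu))
    lam = tuple(lam) + (0,) * (size - len(lam))  # pad with zeros to a common length
    mu = tuple(mu) + (0,) * (size - len(mu))

    def rec(top, rest):
        if not rest:
            # All variables used: the sequence must have arrived at mu.
            return 1 if top == mu else 0
        # The next variable carries the exponent |top| - |nu|.
        return sum(rest[0] ** (sum(top) - sum(nu)) * rec(nu, rest[1:])
                   for nu in interlaced_below(top))

    return rec(lam, tuple(xs))


if __name__ == "__main__":
    # s_{(2,1)}(x1, x2) = x1^2 x2 + x1 x2^2
    print(skew_schur((2, 1), (), (2, 3)))
    # Number of semi-standard tableaux of shape (2,1) with entries in {1, 2, 3}
    print(skew_schur((2, 1), (), (1, 1, 1)))
```

Setting all variables to 1 counts the interlacing sequences, hence the tableaux, and swapping the arguments in `xs` leaves the value unchanged, reflecting the symmetry of the Schur function.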

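Notice that

$$s_{\lambda/\mu}(x_1, 0, \dots) = \begin{cases} x_1^{|\lambda| - |\mu|}, & \mu \prec \lambda, \\ 0, & \mu \not\prec \lambda, \end{cases} \qquad (78)$$

and therefore we can write

$$s_{\lambda/\mu}(x_1, \dots, x_n) = \sum_{\lambda \succ \lambda(1) \succ \cdots \succ \lambda(n-1) \succ \mu} s_{\lambda/\lambda(1)}(x_1)\, s_{\lambda(1)/\lambda(2)}(x_2) \cdots s_{\lambda(n-1)/\mu}(x_n). \qquad (79)$$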

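The function $s_\lambda(x_1, x_2, \dots, x_n)$ is the character of the irreducible representation of $GL_n$ with the highest weight λ computed on the diagonal element with entries x1, x2, . . . , xn. The formula (77) is the result of the computation of this character in the Gelfand-Tsetlin basis.

B. Semiinfinite Forms and Vertex Operators

B.1. Semiinfinite forms. Let the space V be spanned by $\underline{k}$, $k \in \mathbb{Z} + \frac{1}{2}$. The space $F = \Lambda^{\frac{\infty}{2}} V$ is, by definition, spanned by vectors

$$v_S = \underline{s_1} \wedge \underline{s_2} \wedge \underline{s_3} \wedge \dots,$$

where $S = \{s_1 > s_2 > \dots\} \subset \mathbb{Z} + \frac{1}{2}$ is such a subset that both sets

$$S_+ = S \setminus \left(\mathbb{Z}_{\le 0} - \tfrac{1}{2}\right), \qquad S_- = \left(\mathbb{Z}_{\le 0} - \tfrac{1}{2}\right) \setminus S$$

are finite. We equip $\Lambda^{\frac{\infty}{2}} V$ with the inner product (., .) in which the basis $\{v_S\}$ is orthonormal. The space F is also called the fermionic Fock space. The infinite Clifford algebra Cl(V) is generated by elements $\psi_k, \psi_k^*$, $k \in \mathbb{Z} + \frac{1}{2}$, with defining relations

$$\psi_k \psi_l + \psi_l \psi_k = 0, \qquad \psi_k^* \psi_l^* + \psi_l^* \psi_k^* = 0, \qquad \psi_k \psi_l^* + \psi_l^* \psi_k = \delta_{k,l}.$$

It acts on the Fock space F as

$$\psi_k\, \underline{s_1} \wedge \underline{s_2} \wedge \underline{s_3} \wedge \cdots = \underline{k} \wedge \underline{s_1} \wedge \underline{s_2} \wedge \underline{s_3} \wedge \cdots,$$
$$\psi_k^*\, \underline{s_1} \wedge \cdots \wedge \underline{s_l} \wedge \underline{k} \wedge \underline{s_{l+1}} \wedge \cdots = (-1)^l\, \underline{s_1} \wedge \cdots \wedge \underline{s_l} \wedge \underline{s_{l+1}} \wedge \cdots,$$
$$\psi_k^*\, \underline{s_1} \wedge \underline{s_2} \wedge \underline{s_3} \wedge \cdots = 0, \qquad k \in \left(\mathbb{Z} + \tfrac{1}{2}\right) \setminus S.$$

The space F is an irreducible representation of Cl(V). Notice that the operator representing $\psi_k^*$ is conjugate to the operator representing $\psi_k$ with respect to the scalar product in F. The Lie algebra $gl_\infty$ of Z × Z-matrices with finitely many nonzero entries acts naturally on V and therefore acts diagonally on the semi-infinite wedge space F. This action extends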


to the action of a∞ [11] and is reducible. Irreducible components F(m) are eigenspaces of the operator C= ψk∗ ψk − ψk ψk∗ . k<0

k<0

The Fock space decomposes into the direct sum F = ⊕m∈Z F(m) of irreducible representations of a∞ [11]. The subspace F(m) is spanned by vectors s1 ∧ s2 ∧ s3 ∧ . . . with si = m − i + 1/2 for sufficiently large i. It is generated by the action of a∞ on the vacuum vector (the a∞ highest weight vector in F (m) ): v0(m) = m − 1/2 ∧ m − 3/2 ∧ m − 5/2 ∧ · · · . (m)

The operators $\psi_k$ and $\psi_k^*$ act on $v_0^{(m)}$ as

$$\psi_k v_0^{(m)} = 0, \qquad k \le m - 1/2, \qquad (80)$$

$$\psi_k^* v_0^{(m)} = 0, \qquad k > m - 1/2. \qquad (81)$$

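They can be regarded as $a_\infty$-intertwining operators $\psi : V \otimes F^{(m)} \to F^{(m+1)}$, where $V = \mathbb{C}^{\mathbb{Z} + \frac{1}{2}}$ is the vector representation of $gl_\infty$. Vectors in the space $F^{(m)}$ can be parameterized by partitions. For a partition λ define

$$v_\lambda^{(m)} = \underline{\lambda_1 + m - 1/2} \wedge \underline{\lambda_2 + m - 3/2} \wedge \cdots.$$

It is clear that these vectors span $F^{(m)}$.

B.2. Clifford algebra and vertex operators. Consider elements

$$\alpha_n = \sum_{k \in \mathbb{Z} + \frac{1}{2}} \psi_{k+n} \psi_k^*, \qquad n = \pm 1, \pm 2, \dots.$$

They satisfy commutation relations

$$[\alpha_n, \alpha_m] = -n\, \delta_{n,-m}, \qquad (82)$$
$$[\alpha_n, \psi_k] = \psi_{k+n}, \qquad (83)$$
$$[\alpha_n, \psi_k^*] = -\psi_{k-n}^*. \qquad (84)$$

It is clear that $\alpha_n v_0^{(m)} = 0$ for n < 0 and m ∈ Z. Vertex operators are the formal power series

$$\Gamma_+(x) = \exp\left( \sum_{n \ge 1} \frac{x^n}{n}\, \alpha_n \right), \qquad \Gamma_-(x) = \exp\left( \sum_{n \ge 1} \frac{x^n}{n}\, \alpha_{-n} \right).$$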


The operator $\Gamma_-(x)$ acts finitely in the space F (applied to any vector of F it acts as a polynomial in x). In particular

$$\Gamma_-(x)\, v_0^{(m)} = v_0^{(m)}.$$

The operator $\Gamma_+(x)$ is conjugate to $\Gamma_-(x)$:

$$(\Gamma_-(x) v, w) = (v, \Gamma_+(x) w), \qquad (85)$$

and since the scalar product is symmetric $(\Gamma_-(x) v, w) = (\Gamma_+(x) w, v)$. Notice that its action is defined in F not only as a formal power series in x. In a weak sense operators $\Gamma_\pm$ are operator-valued functions which are analytic at x = 0. Define the formal Fourier transform of $\psi_k, \psi_k^*$ as power series

$$\psi(z) = \sum_{k \in \mathbb{Z} + 1/2} z^k \psi_k, \qquad \psi^*(z) = \sum_{k \in \mathbb{Z} + 1/2} z^{-k} \psi_k^*. \qquad (86)$$

These operators and vertex operators satisfy the following commutation relations:

$$\Gamma_+(x) \Gamma_-(y) = (1 - xy)\, \Gamma_-(y) \Gamma_+(x), \qquad (87)$$
$$\Gamma_+(x) \psi(z) = (1 - z^{-1} x)^{-1}\, \psi(z) \Gamma_+(x), \qquad (88)$$
$$\Gamma_-(x) \psi(z) = (1 - xz)^{-1}\, \psi(z) \Gamma_-(x), \qquad (89)$$
$$\Gamma_+(x) \psi^*(z) = (1 - z^{-1} x)\, \psi^*(z) \Gamma_+(x), \qquad (90)$$
$$\Gamma_-(x) \psi^*(z) = (1 - xz)\, \psi^*(z) \Gamma_-(x). \qquad (91)$$

Here left and right sides are corresponding formal power series. The following is a well known statement.

Theorem 4. The following identity holds:

$$(\Gamma_-(x_1) \Gamma_-(x_2) \cdots \Gamma_-(x_n)\, v_\lambda^{(m)}, v_\mu^{(m)}) = s_{\lambda/\mu}(x_1, \dots, x_n). \qquad (92)$$

This is a well known statement; for a proof see for example [11]. The key step is to show that the identity (92) holds for n = 1, which can be easily derived from the fact that

$$v_\lambda^{(m)} = \psi_{q_1}^* \cdots \psi_{q_k}^*\, \psi_{p_1} \cdots \psi_{p_l}\, v_0^{(m-n)},$$

where q1, . . . , qk, p1, . . . , pl are the Frobenius coordinates of the partition λ, and from the identity

$$\Gamma_-(x)\, \psi^*(w_1) \cdots \psi^*(w_k)\, \psi(z_1) \cdots \psi(z_l)\, v_0^{(m)} = \prod_{i=1}^{l} (1 - x z_i)^{-1} \prod_{i=1}^{k} (1 - x w_i)\; \psi^*(w_1) \cdots \psi^*(w_k)\, \psi(z_1) \cdots \psi(z_l)\, v_0^{(m)} \qquad (93)$$

for generating functions.


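For matrix elements of generating functions ψ(z) and ψ*(z) we have:

$$(\psi^*(w) \psi(z)\, v_0^{(m)}, v_0^{(m)}) = \left(\frac{z}{w}\right)^{m+1/2} \frac{1}{1 - z/w}, \qquad |z| < |w|,$$
$$(\psi(z) \psi^*(w)\, v_0^{(m)}, v_0^{(m)}) = \left(\frac{z}{w}\right)^{m-1/2} \frac{1}{1 - w/z}, \qquad |z| > |w|. \qquad (94)$$

These identities follow from the summation of a geometric series. Let $A_i^a = \sum_j A_{ij}^a \psi_j$ and $B_i^a = \sum_j B_{ij}^a \psi_j^*$ for a = 1, . . . , n. The following identity is known as Wick’s Lemma:

$$(A_{i_1}^1 B_{j_1}^1 A_{i_2}^2 B_{j_2}^2 \cdots A_{i_n}^n B_{j_n}^n\, v_0^{(m)}, v_0^{(m)}) = \det(K_{ab})_{1 \le a, b \le n},$$

where

$$K_{ab} = \begin{cases} (A_{i_a}^a B_{j_b}^b\, v_0^{(m)}, v_0^{(m)}), & a \le b, \\ -(B_{j_b}^b A_{i_a}^a\, v_0^{(m)}, v_0^{(m)}), & a > b. \end{cases} \qquad (95)$$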

Notice that this identity also holds when v0(m) is replaced by any vλ(m) . C. Some Asymptotic for Limit Shapes C.1. The frozen boundary is singular at τ = V j . When τ → V j the singular branch of the boundary curve behaves as

χ (τ ) = 2 ln

|τ − V j | 2

+ |V j |/2 +

j−1 (Vi+1 − Ui ) i=0

k= j,1≤k≤N

+ ln

1≤k≤N

|1 − e−Uk +U j |

|1 − e−Uk +V j |

+ O(|τ − V j |).

C.2. The limit curve is tangent to the lines $\tau = U_0$ and $\tau = U_N$ at the points $(\chi_L, U_0)$ and $(\chi_R, U_N)$, where

$$\chi_L = \sum_{j=1}^{N} \ln \frac{1 - e^{U_0 - V_j}}{1 - e^{U_0 - U_j}} + \frac{U_0}{2},$$

$$\chi_R = \sum_{j=1}^{N} \ln \frac{1 - e^{U_N - V_j}}{1 - e^{U_N - U_j}} + U_0 - \frac{U_N}{2}.$$

If τ = U0 + or τ = U N − and → +0 the asymptotic of two double critical points z (±) (τ ) is U √ √ e 0 (1 ± √C L + O( )) if τ = U0 + (±) √ z (τ ) = U N , e (1 ± C R + O( )) if τ = U N − where C L and C R are some constants.


The boundary curve near this point behaves as: χ (τ ) =

√ χ L (1 + √(τ − U0 D L + O(|U0 − τ )|, if τ → U0 + 0 , χ L (1 + (U N − τ D R + O(|U N − τ |), if τ → U N − 0

where, again, D L and D R are some constants. A Similar solution exists near each point τ = U j , j = 1, . . . , N − 1.

C.3. Let us find the asymptotic of the density function near the left boundary of the limit shape. Assume χ = χ L + δχ , τ = U0 + δτ for some positive δτ → 0. Solutions to (40) have the asymptotic z = exp(U0 )(1+δz). Let us find δz as a function of δχ and δτ . We have the following asymptotical expansions: N N 1 − ze−V j 1 − eU0 −V j = (1 − Cδz + O(δz 2 )), −U j U0 −U j 1 − ze 1 − e j=1 j=1

where C=

N i=1

eU0 −V j eU0 −U j − > 0. U −U 1−e 0 j 1 − eU0 −U j

Keeping leading orders in δχ , δτ , and δz in (40) we obtain the equation for δz: Cδz 2 + δz((C + 1/2)δτ − δχ ) + δτ = 0. In the region δτ ∝ (δχ )2 we have two asymptotical solutions

δz 1,2

δχ ± 2

δτ δχ 2 − . 2 C

From here we obtain the asymptotic of the density function in this region: θ ρ = arctan π

4δτ −1 . Cδχ 2

Here θ is the argument of δz 1 . Notice that ρ → 1/2 − 0 as δχ → +0.


D. The Symmetry of Correlation Functions Change variables in the integral K ((t1 , h 1 ), (t2 , h 2 )) √ 1 − (z, t1 )+ (w, t2 ) zw − j1 j2 dzdw z w = (2πi)2 |z|<min{1,R(t1 ) R ∗ (t2 )<|w|<1 + (z, t1 )− (w, t2 ) z − w zw (96) from z to w−1 and from w to z −1 . It becomes K ((t1 , h 1 ), (t2 , h 2 )) 1 = (2πi)2 |w|−1 <min{1,R(t1 ) R ∗ (t2 )<|z|−1 <1 where ˜ + (z, t) =

˜ − (z, −t2 ) ˜ + (w, t1 ) √zw − j j dzdw z 2w 1 , ˜ + (z, t2 ) ˜ − (w, t1 ) z − w zw (97)

(1 − z x˜m+ ),

m

˜ − (z, t) =

(1 − z − x˜m− ),

m ∓ and x˜m± = x−m . Thus, we have the following “reflection” symmetry of correlation functions:

K (( j1 , t1 ), ( j2 , t2 )) = K˜ (( j2 , −t1 )( j1 , −t1 )). This symmetry is obvious on the “microscopical level”. It corresponds to the reflection of the tiling in the t-direction. Acknowledgements. N. R. is grateful to Laboratoire de Physique Theorique at Saclay for hospitality, where part of this work was done and to J.-B. Zuber and Ph. Di Francesco for interesting discussions. His work was partially supported by the NSF grant DMS-0070931, the CRDF grant, and by the Humboldt Foundation. The work of A. O. was partially supported by the Packard Foundation.

References 1. Aganagic, M., Klemm, A., Marino, M., Vafa, C.: The Topological Vertex. Commun. Math. Phys. 254, 425–478 (2005) 2. Aptekarev, A., Bleher, P., Kuijlaars, A.: Large n limit of Gaussian random matrices with external source, part II.http://arxiv.org/list/ math-ph/0408041,2004 3. Brézin, E., Hikami, S.: Universal singularity at the closure of a gap in a random matrix theory. Phys. Rev. E (3) 57(4), 4140–4149 (1998) 4. Brézin, E., Hikami, S.: Level Spacing of Random Matrices in an External Source. Phys. Rev. E (3) 58(6), part A, 7176–7185 (1998) 5. Cerf, R., Kenyon, R.: The low-temperature expansion of the Wulff crystal in the 3D Ising model. Commun. Math. Phys. 222, 147–179 (2001) 6. Cohn, H., Kenyon, R., Propp, J.: A variational principle for domino tilings. J. Amer. Math. Soc. 14, 297–346 (2001) 7. Dijkgraaf, R., Sinkovics, A., Temurhan, M.: Universal Correlators from Geometry. JHEP 0411, 012(2004)


8. Ferrari, P.L., Spohn, H.: Step fluctuations for a faceted crystal, J. Stat. Phys. 113, 1–46 (2003) 9. Johansson, K.: Discrete polynuclear growth and determinantal processes. Commun. Math. Phys., 242, 277–329 (2003) 10. Johansson, K.: The Arctic circle boundary and the Airy process. Ann. Probab. 33(1), 1–30 (2005) 11. Kac, V.: Infinite dimensional Lie algebras. Cambridge: Cambridge University Press, 1990 12. Kenyon, R.: Local statistics of lattice dimers. Ann. Inst. H. Poincaré Probab. Statist. 33(5), 591–618 (1997) 13. Kenyon, R., Okounkov, A.: Limit shapes and the complex Burgers equation, http://arxiv.org/list/mathph/0507007, 2005 14. Okounkov, A.: Random surfaces enumerating algebraic curves. http://arxiv.org/list/math-ph/0412008, 2004 15. Okounkov, A., Reshetikhin, N.: Correlation function of Schur process with application to local geometry of a random 3-dimensional Young diagram. J. Amer. Math. Soc. 16(3), 581–603 (2003) 16. Okounkov, A., Reshetikhin, N., Vafa, C.: Quantum Calabi-Yau and Classical Crystals. http://arxiv.org/list/hep-th/0309208, 2003 17. Praehofer, M., Spohn, H.: Scale Invariance of the PNG Droplet and the Airy Process. J. Stat. Phys. 108, 1071–1106 (2002) 18. Saulina, N., Vafa, C.: D-branes as Defects in the Calabi-Yau Crystal. http://arxiv.org/list/hep-th/0404246, 2004 19. Tracy, C., Widom, H.: Differential equations for Dyson process. Commun. Math. Phys. 252, 7–41 (2003) 20. Tracy, C., Widom, H.: The Pearcey Process. Commun. Math. Phys., 263, 381–400 (2006) Communicated by L. Takhtajan

Commun. Math. Phys. 269, 611–657 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0135-9

Communications in

Mathematical Physics

Quantum Spin Systems at Positive Temperature Marek Biskup, Lincoln Chayes, Shannon Starr Department of Mathematics, UCLA, Los Angeles, CA 90095, U.S.A. E-mail: [email protected] Received: 10 September 2005 / Accepted: 5 July 2006 Published online: 15 November 2006 – © M. Biskup, L. Chayes and S. Starr 2006

Abstract: We develop a novel approach to phase transitions in quantum spin models based on a relation to their classical counterparts. Explicitly, we show that whenever chessboard estimates can be used to prove a phase transition in the classical model, the corresponding quantum model will have a similar phase transition, provided the inverse temperature β and the magnitude of the quantum spins S satisfy $\beta \ll \sqrt{S}$. From the quantum system we require that it is reflection positive and that it has a meaningful classical limit; the core technical estimate may be described as an extension of the Berezin-Lieb inequalities down to the level of matrix elements. The general theory is applied to prove phase transitions in various quantum spin systems with $S \gg 1$. The most notable examples are the quantum orbital-compass model on $\mathbb{Z}^2$ and the quantum 120-degree model on $\mathbb{Z}^3$ which are shown to exhibit symmetry breaking at low temperatures despite the infinite degeneracy of their (classical) ground state.

Contents
1. Introduction
2. Preliminaries
2.1 Coherent states
2.2 Chessboard estimates
3. Main Results
3.1 Matrix elements of Gibbs-Boltzmann weights
3.2 Absence of clustering
3.3 Phase transitions in quantum models
4. Proofs
4.1 Bounds on matrix elements
4.2 Quasiclassical Peierls’ arguments

© 2006 by M. Biskup, L. Chayes and S. Starr. Reproduction, by any means, of the entire article for non-commercial purposes is permitted without charge.

4.3 Exhibiting phase coexistence
5. Applications
5.1 General considerations
5.2 Anisotropic Heisenberg antiferromagnet
5.3 Large-entropy models
5.4 Order-by-disorder transitions: Orbital-compass model
5.5 Order-by-disorder transitions: 120-degree model
6. Appendix
6.1 Technical claims: Large-entropy models
6.2 Technical claims: Orbital-compass model
6.3 Technical claims: 120-degree model

1. Introduction It is considered common knowledge that, for spin systems, the behavior of a quantum model at finite temperature is “like” the behavior of the corresponding classical model. However, beyond the level of heuristics, it is far from clear in what sense the above statement is meaningful. Another, slightly more academic way to “recover” the classical spin system is to consider spin-representations with spin-magnitude S and then let S → ∞. A standard argument as to why this should work is that the commutators between various spin operators are order-1/S smaller than the quantities themselves, and so the spins behave essentially classically when S is large. Notwithstanding, precise statements along these lines have only been made for the S → ∞ limit of the free energies [4, 27, 28, 36, 45] and specific types of 1/S corrections [12, 38, 39]. A common shortcoming of the above studies is that neither spells explicit conditions on the relative magnitude of β and S for which the classical behavior is exhibited. This is of importance because, at sufficiently low temperatures, the relevant excitations are quantum. For example, while the classical Heisenberg antiferromagnet on a finite bipartite graph has a continuum of ground states (related by the SO(3) symmetry), the quantum Heisenberg antiferromagnet has a unique ground state [37]. Another example is the 111-interface in the classical Ising model which, at zero temperature, is disordered but may be stabilized by appropriate (but arbitrarily small) quantum perturbations [9,32]. The control of the relevant quantum excitations is a non-trivial subject and is usually accomplished only when finite-temperature effects are of little significance for the overall behavior. The preceding discussion is particularly important for systems which undergo phase transitions. 
Here several techniques have been available—infrared bounds [20, 26], chessboard estimates [23–25,33] and contour expansions [10,13,14,35]—some of which (specifically, the latter two) are more or less based on the assumption that the quantum system of interest has a strong classical component. However, while certain conclusions happen to apply uniformly well even as S → ∞, the classical reference state of these techniques is usually discrete (e.g., Ising type). This is quite unlike the S → ∞ limit which inherently leads to a continuous-spin, Heisenberg-like model. Thus, the relation between the above “near-classical” techniques and the S → ∞ results discussed in the first paragraph is tenuous. The purpose of this paper is to provide a direct connection between the S → ∞ approach to the classical limit of quantum spin systems and the proofs of phase transitions by the traditional means of chessboard estimates. Explicitly, we establish the following general fact: Whenever chessboard estimates can be used to prove a phase


transition in the classical system, a corresponding transition will occur in the quantum system provided $\sqrt{S}$ is sufficiently larger than the inverse temperature. This permits us to prove phase transitions in systems with highly degenerate ground states, but without continuous symmetry, as well as certain temperature driven phase transitions which have not been accessible heretofore. To highlight the main idea of our approach, let us recall how chessboard estimates enter the proofs of phase transitions. Suppose a quantum system on the torus is partitioned into disjoint blocks and a projector on a “bad event” is applied in some of the blocks. The goal is to show that the expectation (in the quantum Gibbs state) of the product of these projectors decays exponentially with the number of bad blocks. Here the chessboard estimates offer a non-trivial simplification: The expectation, raised to the power of the inverse number of bad blocks, is maximized by the configuration in which all blocks are bad. In classical models, the latter quantity (sometimes referred to as the universal contour) is often fairly easy to estimate by properly accounting for energy and entropy of the allowed configurations. However, this is not the case once quantum effects come into play; the only general technique that has been developed for this purpose is the “principle of exponential localization” [25] which hinges on an approximate diagonalization of the “universal projectors” and model-specific spectral estimates. The main feature of our approach is that we bound the (relevant) universal contours directly, namely, by the universal contours for the classical (i.e., S = ∞) version of the quantum system. The technical estimate making this possible is a new bound on the matrix element of the Gibbs-Boltzmann weight relative to coherent states $|\Omega\rangle$, which is close in spirit to the celebrated Berezin-Lieb inequalities [4, 36].
The result is that $\langle \Omega | e^{-\beta H} | \Omega \rangle$ is dominated by the classical Gibbs-Boltzmann weight times a correction that is exponential in $O(\beta/\sqrt{S}) \times$ volume. Hence, if $\beta \ll \sqrt{S}$, the exponential growth-rate of partition functions, even those constrained by various projectors, is close to that of the classical system. This is ideally suited for an application of chessboard estimates and the corresponding technology, developed in [23–25,33], for proving first-order phase transitions. Unfortunately, the bound in terms of universal contour has to be performed before the “conversion” to the classical setting and so we still require that the quantum system is reflection positive. To showcase our approach, we provide proofs of phase transitions in the following five quantum systems (defined by their respective formal Hamiltonians):

(1) The anisotropic Heisenberg antiferromagnet:

$$H = + \sum_{\langle r, r' \rangle} S^{-2} \left( J_1 S_r^x S_{r'}^x + J_2 S_r^y S_{r'}^y + S_r^z S_{r'}^z \right), \qquad (1.1)$$

where 0 ≤ J1, J2 < 1.

(2) The non-linear XY-model:

$$H = - \sum_{\langle r, r' \rangle} P\left( \frac{S_r^x S_{r'}^x + S_r^y S_{r'}^y}{S^2} \right), \qquad (1.2)$$

where $P(x) = P_1(x^2) \pm x P_2(x^2)$ for two polynomials P1, P2 (of sufficiently high degree) with positive coefficients.

(3) The non-linear nematic model:

$$H = - \sum_{\langle r, r' \rangle} P\left( S^{-2} (S_r \cdot S_{r'})^2 \right), \qquad (1.3)$$

where $S_r \cdot S_{r'} = S_r^x S_{r'}^x + S_r^y S_{r'}^y + S_r^z S_{r'}^z$ and where P is a polynomial, typically of high degree, with positive coefficients.

(4) The orbital compass model on $\mathbb{Z}^2$:

$$H = \sum_{\langle r, r' \rangle} \begin{cases} S^{-2} S_r^x S_{r'}^x, & \text{if } r' = r \pm \hat{e}_x, \\ S^{-2} S_r^y S_{r'}^y, & \text{if } r' = r \pm \hat{e}_y. \end{cases} \qquad (1.4)$$

(5) The 120-degree model on $\mathbb{Z}^3$:

$$H = \sum_{\langle r, r' \rangle} S^{-2}\, T_r^j T_{r'}^j \quad \text{if } r' = r \pm \hat{e}_j, \qquad (1.5)$$

where

$$T_r^j = \begin{cases} S_r^x, & \text{if } j = 1, \\ -\frac{1}{2} S_r^x + \frac{\sqrt{3}}{2} S_r^y, & \text{if } j = 2, \\ -\frac{1}{2} S_r^x - \frac{\sqrt{3}}{2} S_r^y, & \text{if } j = 3. \end{cases} \qquad (1.6)$$
Here $\langle r, r' \rangle$ denotes a nearest-neighbor pair on $\mathbb{Z}^d$ (where unless specified we are only assuming d ≥ 2), the symbol $\hat{e}_j$ stands for the unit vector in the j-th lattice direction and $S_r = (S_r^x, S_r^y, S_r^z)$ is a triplet of spin-S operators for the spin at site r. The scaling of all interactions by the indicated inverse powers of S is necessary to make the S → ∞ limit meaningful. Model (1) has been included only for illustration; the requisite transition was proved for large anisotropy [25] and, in the context of the ferromagnet (which is not even reflection positive), for arbitrarily small anisotropy [31]. The classical versions of models (2–4) feature strong order-disorder transitions at intermediate temperatures; cf. [1, 16, 22, 33]. Here we will prove that corresponding transitions occur for large-S quantum versions of these systems. Models (4–5) are quite unusual even at the classical level: Notwithstanding the fact that the Hamiltonian has only discrete symmetries, there is a continuum of ground states. As was shown in [6, 7], at positive temperatures the degeneracy is lifted leaving only a finite number of preferential directions. The proofs of [6, 7] involve (classical) spin-wave calculations not dissimilar to those of [18, 19]. However, since the massless spin-wave excitations are central to the behavior of these systems, even at the classical level, it is by no means clear how to adapt the methods of [10,13,14,20,23–25,31,33,35] to these cases. The remainder of the paper is organized as follows: In the next section, we recall the formalism of coherent states, which is the basis of many S → ∞ limit results, and the techniques of reflection positivity and chessboard estimates, which underlie many proofs of phase transitions in quantum systems. In Sect. 3 we state our main theorems; the proofs come in Sect. 4. Applications to the various phase transitions in the aforementioned models are the subject of Sect. 5. The Appendix (Sect. 6) contains the proofs of some technical results that would detract from the main line of argument in Sects. 5.3–5.5.

2. Preliminaries

In this section, we summarize standard and well-known facts about the SU(2) coherent states (Sect. 2.1) and the techniques of chessboard estimates (Sect. 2.2). The purpose of


this section is mostly informative; a reader familiar with these concepts may skip this section altogether and pass directly to the statement of main results in Sect. 3.

2.1. Coherent states. Here we will recall the Bloch coherent states which were the basis for rigorous control of various classical limits of quantum spin systems [4,27,28,36,45]. In a well defined sense, these states are the “closest” objects to classical states that one can find in the Hilbert space. Our presentation follows closely Lieb’s article [36]; some of the calculations go back to [3]. The theory extends to general compact Lie groups, see [17, 45] for results at this level of generality. The literature on the subject of coherent states is quite large; we refer to, e.g., [2, 42] for comprehensive review and further references. Given S ∈ {1/2, 1, 3/2, . . . }, consider the (2S + 1)-dimensional irreducible representation of the Lie algebra su(2). The generators, $(S^x, S^y, S^z)$, obeying the commutation rules $[S^i, S^j] = i \varepsilon_{ijk} S^k$, are operators acting on $\mathrm{span}\{|M\rangle : M = -S, -S+1, \dots, S-1, S\} \simeq \mathbb{C}^{2S+1}$. In terms of spin-raising/lowering operators, $S^\pm = S^x \pm i S^y$, we have

$$S^z |M\rangle = M\, |M\rangle, \qquad S^+ |M\rangle = \sqrt{S(S+1) - M(M+1)}\; |M+1\rangle, \qquad S^- |M\rangle = \sqrt{S(S+1) - M(M-1)}\; |M-1\rangle. \qquad (2.1)$$
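The relations (2.1) determine the spin matrices completely, so the commutation rules can be checked directly. The sketch below builds $S^x, S^y, S^z$ from the ladder-operator matrix elements in (2.1) and measures how far $[S^x, S^y]$ is from $i S^z$. With the normalization $S^z|M\rangle = M|M\rangle$ used in (2.1), the commutator comes out as $i S^z$. The function names and the ordering of the basis (largest M first) are assumptions of this illustration.

```python
import math


def spin_matrices(two_s):
    """Return (Sx, Sy, Sz) for spin S = two_s / 2 in the basis |M>, M = S, S-1, ..., -S."""
    s = two_s / 2
    dim = two_s + 1
    ms = [s - i for i in range(dim)]
    sz = [[ms[i] if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    # S^+ |M> = sqrt(S(S+1) - M(M+1)) |M+1>: column j holds |M>, row j-1 holds |M+1>.
    sp = [[0.0] * dim for _ in range(dim)]
    for j in range(1, dim):
        m = ms[j]
        sp[j - 1][j] = math.sqrt(s * (s + 1) - m * (m + 1))
    # S^- is the transpose of S^+ because the matrix elements are real.
    sm = [[sp[j][i] for j in range(dim)] for i in range(dim)]
    sx = [[(sp[i][j] + sm[i][j]) / 2 for j in range(dim)] for i in range(dim)]
    sy = [[(sp[i][j] - sm[i][j]) / 2j for j in range(dim)] for i in range(dim)]
    return sx, sy, sz


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def commutator_defect(two_s):
    """Largest entry of |[Sx, Sy] - i Sz|; it should vanish up to rounding."""
    sx, sy, sz = spin_matrices(two_s)
    xy, yx = matmul(sx, sy), matmul(sy, sx)
    n = two_s + 1
    return max(abs(xy[i][j] - yx[i][j] - 1j * sz[i][j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    for two_s in (1, 2, 3, 6):
        print(two_s / 2, commutator_defect(two_s))
```

The same matrices make it visible that $S^x$ and $S^z$ have real entries while the entries of $S^y$ are purely imaginary.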

In particular, $S^x$ and $S^z$ are real while $S^y$ is purely imaginary. The classical counterpart of su(2)-spins are vectors on the two-dimensional unit sphere $S^2$ in $\mathbb{R}^3$. For each $\Omega \in S^2$, one defines the coherent state vector in the direction Ω to be

$$|\Omega\rangle = \sum_{M=-S}^{S} \binom{2S}{S+M}^{1/2} [\cos(\theta/2)]^{S+M}\, [\sin(\theta/2)]^{S-M}\, e^{i(S-M)\phi}\, |M\rangle. \qquad (2.2)$$

Here (θ, φ) are the spherical coordinates of Ω, with θ denoting the azimuthal angle and φ denoting the polar angle. Let $\zeta = \tan(\theta/2) e^{i\phi}$ denote the stereographic projection from $S^2$ to $\mathbb{C}$. Then (2.2) can be written as

$$|\Omega\rangle = e^{\zeta S^- - \bar{\zeta} S^+} |S\rangle = [1 + |\zeta|^2]^{-S}\, e^{\zeta S^-} |S\rangle = [\cos(\theta/2)]^{2S} \exp\bigl(\tan(\theta/2) e^{i\phi} S^-\bigr) |S\rangle. \qquad (2.3)$$

One important property of the coherent state $|\Omega\rangle$ is that it is an eigenvector of the matrix $\Omega \cdot S$ with maximal eigenvalue:

$$(\Omega \cdot S) |\Omega\rangle = S\, |\Omega\rangle. \qquad (2.4)$$

This equation characterizes the vector $|\Omega\rangle$ up to a phase factor. The choice of the phase factors may seem arbitrary, but in practice they will cancel in all the formulas we use. The fact that the states $|\Omega\rangle$ have been defined relative to the basis in (2.1) is inconsequential. Indeed, a rotation of a coherent state is, to within a harmless phase factor, the coherent state corresponding to the rotated vector. More precisely, for each $\omega \in S^2$ and $t \in \mathbb{R}$, one may consider the unitary $U_{\omega,t} = e^{it(\omega \cdot S)}$. Then, for any $\Omega \in S^2$, a simple calculation shows that

$$U_{\omega,t}\, (\Omega \cdot S)\, U_{\omega,t}^{+} = R_{\omega,t}(\Omega) \cdot S, \qquad (2.5)$$


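where $R_{\omega,t} \in SO(3)$ is the rotation about the ray passing through ω by the angle t. Because of this $U_{\omega,t}|\Omega\rangle$ satisfies (2.4) with Ω replaced by $R_{\omega,t}(\Omega)$ and so

$$U_{\omega,t} |\Omega\rangle = e^{i f(\Omega, \omega, t)}\, |R_{\omega,t}(\Omega)\rangle,$$

for some phase factor $f(\Omega, \omega, t)$. Since SU(2) is a double cover of SO(3), $f(\Omega, \omega, 2\pi)$ is not necessarily 0 (mod 2π); rather $e^{i f(\Omega, \omega, 2\pi)} = (-1)^{2S}$. The explicit formula (2.2) for $|\Omega\rangle$ yields

$$\langle \Omega' | \Omega \rangle = \left[ \cos(\theta/2) \cos(\theta'/2) + e^{i(\phi - \phi')} \sin(\theta/2) \sin(\theta'/2) \right]^{2S}. \qquad (2.6)$$

Defining the angle between Ω and Ω' to be Θ, one also has

$$|\langle \Omega' | \Omega \rangle| = [\cos(\Theta/2)]^{2S}. \qquad (2.7)$$

Another formula that is directly checked from (2.2) is

$$1 = \frac{2S+1}{4\pi} \int_{S^2} d\Omega\; |\Omega\rangle \langle \Omega|, \qquad (2.8)$$

where dΩ denotes the uniform surface measure on $S^2$ with total mass 4π. Given any operator A on $\mathbb{C}^{2S+1}$, one can form what is commonly known as the lower symbol, which is a function $\Omega \mapsto \langle A \rangle_\Omega$ defined by

$$\langle A \rangle_\Omega := \langle \Omega | A | \Omega \rangle. \qquad (2.9)$$

(Here and henceforth, $\langle \Omega | A | \Omega \rangle$ denotes the inner-product of $|\Omega\rangle$ with the vector $A|\Omega\rangle$.) While not entirely obvious, it turns out that the trace of A admits the formula

$$\mathrm{Tr}(A) = \frac{2S+1}{4\pi} \int_{S^2} d\Omega\; \langle A \rangle_\Omega. \qquad (2.10)$$

There is also a generalization of (2.8): There exists a function $\Omega \mapsto [A]_\Omega$ such that

$$A = \frac{2S+1}{4\pi} \int_{S^2} d\Omega\; [A]_\Omega\, |\Omega\rangle \langle \Omega|. \qquad (2.11)$$

Any such $\Omega \mapsto [A]_\Omega$ is called an upper symbol for A. Unfortunately, such a function is not unique and so $[A]_\Omega$ actually represents an equivalence class of functions. Obviously $\langle A + B \rangle_\Omega = \langle A \rangle_\Omega + \langle B \rangle_\Omega$. For the upper symbols, if $[A]_\Omega$ and $[B]_\Omega$ are upper symbols for A and B then $[A + B]_\Omega = [A]_\Omega + [B]_\Omega$ is an upper symbol for A + B. When A = 1, one has $\langle 1 \rangle_\Omega = 1$ and, by (2.8), one can also choose $[1]_\Omega = 1$. However, it is usually not the case that the lower symbol is also an upper symbol, e.g., we have

$$\langle S^x \rangle_\Omega = S \sin\theta \cos\phi, \qquad [S^x]_\Omega = (S+1) \sin\theta \cos\phi,$$
$$\langle S^y \rangle_\Omega = S \sin\theta \sin\phi, \qquad [S^y]_\Omega = (S+1) \sin\theta \sin\phi, \qquad (2.12)$$
$$\langle S^z \rangle_\Omega = S \cos\theta, \qquad [S^z]_\Omega = (S+1) \cos\theta.$$

As is easily checked, the leading order in S of these expressions is exactly the classical counterpart of the corresponding operator. For more complicated products of the spin components, both symbols develop lower-order “non-classical” corrections but, as was shown in [17, Theorem 2], the leading order term is always the classical limit.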
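Several of the single-spin formulas above are easy to confirm numerically from the explicit components in (2.2). The following sketch computes the coherent-state vector, its lower symbols for the three spin components, and the overlap of two coherent states, so that the results can be compared with (2.12) and (2.7). The function names and the indexing of components by k = S + M are assumptions of this illustration.

```python
import cmath
import math


def coherent_state(two_s, theta, phi):
    """Components of |Omega> from (2.2), indexed by k = S + M = 0, ..., 2S."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [math.sqrt(math.comb(two_s, k)) * c ** k * s ** (two_s - k)
            * cmath.exp(1j * (two_s - k) * phi) for k in range(two_s + 1)]


def inner(a, b):
    """Inner product <a|b>, antilinear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def lower_symbols(two_s, theta, phi):
    """Return (<Sx>, <Sy>, <Sz>) evaluated in the coherent state |Omega>."""
    spin = two_s / 2
    v = coherent_state(two_s, theta, phi)
    sz = sum((k - spin) * abs(v[k]) ** 2 for k in range(two_s + 1))
    # <S^-> uses S(S+1) - M(M-1) = (S+M)(S-M+1) = k (2S - k + 1).
    s_minus = sum(v[k - 1].conjugate() * math.sqrt(k * (two_s - k + 1)) * v[k]
                  for k in range(1, two_s + 1))
    # S^x = (S^+ + S^-)/2 and S^y = (S^+ - S^-)/(2i), with <S^+> the conjugate of <S^->.
    return s_minus.real, -s_minus.imag, sz


def overlap_modulus(two_s, theta1, phi1, theta2, phi2):
    return abs(inner(coherent_state(two_s, theta2, phi2), coherent_state(two_s, theta1, phi1)))


if __name__ == "__main__":
    two_s, theta, phi = 5, 0.9, 2.1
    print(lower_symbols(two_s, theta, phi))
    print(two_s / 2 * math.sin(theta) * math.cos(phi),
          two_s / 2 * math.sin(theta) * math.sin(phi),
          two_s / 2 * math.cos(theta))
```

The overlap can be compared with (2.7) by computing the angle Θ from $\cos\Theta = \cos\theta \cos\theta' + \sin\theta \sin\theta' \cos(\phi - \phi')$.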


The above formalism generalizes to collections of many spins. Let Λ be a finite set and, for each $r \in \Lambda$, let $(S_r^1, S_r^2, S_r^3)$ be the spin operator for the spin at r. We will assume that the spins at all sites have magnitude S, so we assume to have a joint (product) representation of these spins on $\mathcal{H}_\Lambda = \bigotimes_{r \in \Lambda} [\mathbb{C}^{2S+1}]_r$. Consider an assignment of a classical spin $\Omega_r \in S^2$ to each $r \in \Lambda$ and denote the resulting configuration $(\Omega_r)_{r \in \Lambda}$ by Ω. The desired product coherent state then is

$$|\Omega\rangle := \bigotimes_{r \in \Lambda} |\Omega_r\rangle. \qquad (2.13)$$

Given an operator A on $\mathcal{H}_\Lambda$, we define its lower symbol by the generalization of (2.9),

$$\langle A \rangle_\Omega = \langle \Omega | A | \Omega \rangle, \qquad \Omega \in (S^2)^{|\Lambda|}. \qquad (2.14)$$

With this lower symbol we may generalize (2.10) into

$$\mathrm{Tr}_{\mathcal{H}_\Lambda}(A) = \left( \frac{2S+1}{4\pi} \right)^{|\Lambda|} \int_{(S^2)^{|\Lambda|}} d\Omega\; \langle A \rangle_\Omega. \qquad (2.15)$$

There is also a representation of A in terms of an upper symbol $[A]_\Omega$,

$$A = \left( \frac{2S+1}{4\pi} \right)^{|\Lambda|} \int_{(S^2)^{|\Lambda|}} d\Omega\; [A]_\Omega\, |\Omega\rangle \langle \Omega|, \qquad (2.16)$$

where dΩ is the product surface measure on $(S^2)^{|\Lambda|}$ and where $\Omega \mapsto [A]_\Omega$ is now a function $(S^2)^{|\Lambda|} \to \mathbb{C}$. A special case of this formula is the resolution of the identity on $\mathcal{H}_\Lambda$. Note that (2.16) allows us to substitute $[A]_\Omega$ for $\langle A \rangle_\Omega$ in (2.15). It is easy to check that $\Omega \mapsto [A]_\Omega$ has the expected behavior under (tensor) product of operators, provided these respect the product structure of $\mathcal{H}_\Lambda$. Indeed, suppose that Λ is the disjoint union of $\Lambda_1$ and $\Lambda_2$ and let $|\Omega_1\rangle$ and $|\Omega_2\rangle$ be product coherent states from $\mathcal{H}_{\Lambda_1}$ and $\mathcal{H}_{\Lambda_2}$, respectively. Given two operators $A_1 : \mathcal{H}_{\Lambda_1} \to \mathcal{H}_{\Lambda_1}$ and $A_2 : \mathcal{H}_{\Lambda_2} \to \mathcal{H}_{\Lambda_2}$, let $[A_1]_{\Omega_1}$ and $[A_2]_{\Omega_2}$ be their associated upper symbols. Then

$$[A_1 \otimes A_2]_{(\Omega_1, \Omega_2)} := [A_1]_{\Omega_1} [A_2]_{\Omega_2} \qquad (2.17)$$

is an upper symbol of $A_1 \otimes A_2$ relative to state $|(\Omega_1, \Omega_2)\rangle = |\Omega_1\rangle \otimes |\Omega_2\rangle$. On the other hand, if $[A]_\Omega$ depends only on $(\Omega_r)_{r \in \Lambda'}$ where $\Lambda' \subset \Lambda$, then we can perform a partial trace in (2.16) by integrating over the $(\Omega_r)_{r \in \Lambda \setminus \Lambda'}$ and applying (2.8) for each integral.

2.2. Chessboard estimates. Next we will review the salient features of the technology of reflection positivity/chessboard estimates which was developed and applied to both classical and quantum systems in the works of F. Dyson, J. Fröhlich, R. Israel, E. Lieb, B. Simon and T. Spencer [20, 23–26]. Consider a $C^*$-algebra $\mathfrak{A}$ and suppose that $\mathfrak{A}_+$ and $\mathfrak{A}_-$ are commuting subalgebras which are “mirror images” of each other in the sense that there is an algebraic automorphism $\theta : \mathfrak{A} \to \mathfrak{A}$ such that $\theta(\mathfrak{A}_\pm) = \mathfrak{A}_\mp$ and $\theta^2 = \mathrm{id}$. Assuming that $\mathfrak{A}$ is represented in terms of complex matrices, for $A \in \mathfrak{A}$ we define $\bar{A}$ to be the complex conjugate (not the adjoint) of A. We will always assume that $\mathfrak{A}$ is closed under complex conjugation. Note that, since complex conjugation is not a “covariant operation,” the representation of $\mathfrak{A}$ ought to stay fixed throughout all calculations involving complex conjugation.


M. Biskup, L. Chayes, S. Starr

A relevant example of the above setting is a quantum spin-$S$ system on the $d$-dimensional torus $\mathbb T_L$ of $L \times \dots \times L$ sites, with $L$ even, which we think of as a union of two disjoint symmetric halves, $\mathbb T_L^+$ and $\mathbb T_L^-$. (Note that $\mathbb T_L$ can also be identified with $\mathbb Z^d / L\mathbb Z^d$. Of course the origin $0 \in \mathbb Z^d$ maps to the origin of the torus.) Then $\mathfrak A$ is the $C^\star$-algebra of all observables, represented by $(2S+1)^{|\mathbb T_L|}$-dimensional complex matrices, and $\mathfrak A_\pm$ are the sets of observables on $\mathbb T_L^\pm$, respectively. Explicitly, $\mathfrak A_+$ are matrices of the form $A_+ \otimes 1$, where $A_+$ "acts" only on $\mathbb T_L^+$, while the matrices in $\mathfrak A_-$ take the form $1 \otimes A_-$. The operation $\theta$ is the map that interchanges the "left" and "right" half of the torus; e.g., in a properly parametrized basis, $\theta(A_+ \otimes 1) = 1 \otimes A_+$. The fact that $\theta$ arises from a reflection leads to the following concept:

Definition 2.1. Let $\langle - \rangle$ be a state, i.e., a continuous linear functional, on $\mathfrak A$ and let $\theta$ be as above. We say that $\langle - \rangle$ is reflection positive (relative to $\theta$) if for all $A, B \in \mathfrak A_+$,

$$\langle A\, \theta(\overline B) \rangle = \overline{\langle B\, \theta(\overline A) \rangle} \tag{2.18}$$

and

$$\langle A\, \theta(\overline A) \rangle \ge 0. \tag{2.19}$$

The following condition, derived in [20, Theorem E.1] and in [25, Theorem 2.1], is sufficient for the Gibbs state to have the above property:

Theorem 2.2 (Reflection positivity: sufficient condition). Given a reflection of $\mathbb T_L$ as described above and using $\theta$ to denote the associated reflection operator, if the Hamiltonian of a quantum system on $\mathbb T_L$ can be written as

$$H = C + \theta(\overline C) - \int \varrho(d\alpha)\, D_\alpha\, \theta(\overline{D_\alpha}), \tag{2.20}$$

where $C, D_\alpha \in \mathfrak A_+$ and $\varrho$ is a (finite) positive measure, then the canonical Gibbs state $\langle - \rangle_{L,\beta}$, which is defined by

$$\langle A \rangle_{L,\beta} = \frac{\mathrm{Tr}_{\mathcal H_{\mathbb T_L}}(e^{-\beta H} A)}{\mathrm{Tr}_{\mathcal H_{\mathbb T_L}}(e^{-\beta H})}, \tag{2.21}$$

is reflection positive relative to $\theta$ for all $\beta \ge 0$.

The crux of the proof of (2.19) is the fact that the $\beta = 0$ state is generalized reflection positive, i.e., $\langle A_1 \theta(\overline{A_1}) \cdots A_n \theta(\overline{A_n}) \rangle_{L,0} \ge 0$. The rest follows by a Lie-Trotter expansion of $e^{-\beta H}$ into powers of the last term in (2.20); hence the need for a minus sign in front of the integral.

Remark 2.3. We reiterate that the reflections of $\mathbb T_L$ considered here are always for "planes of reflections" between sites. In classical models one can also consider the (slightly more robust) reflections for "planes" on sites. However, due to non-commutativity issues, Theorem 2.2 does not seem to generalize to quantum systems for these kinds of reflections.

Reflection positivity has two important (and related) consequences: Gaussian domination, leading ultimately to infrared bounds, and chessboard estimates. In this work we make no use of the former; we proceed by discussing the details of the latter.

Quantum Spin Systems at Positive Temperature


Let $\Lambda_B$ be a block of $B \times \dots \times B$ sites with the "lower-left" corner at the origin. Assuming that $L$ is a multiple of $B$, we can tile $\mathbb T_L$ by disjoint translates of $\Lambda_B$. The positions of these translates are given by $B$-multiples of vectors $t$ from the factor torus $\mathbb T_{L/B}$. In particular, if $\Lambda_B + r$ denotes the translate of $\Lambda_B$ by $r \in \mathbb T_L$, then $\mathbb T_L$ is the disjoint union $\bigcup_{t \in \mathbb T_{L/B}} (\Lambda_B + Bt)$. Let $\mathfrak A_{\Lambda_B}$ denote the algebra of observables in $\Lambda_B$, i.e., each $A \in \mathfrak A_{\Lambda_B}$ has the form $A = A_B \otimes 1$, where $A_B$ acts only on the portion of the Hilbert space corresponding to $\Lambda_B$. For each $A \in \mathfrak A_{\Lambda_B}$ and each $t \in \mathbb T_{L/B}$ with $|t| = 1$, we can define an antilinear operator $\hat\vartheta_t(A)$ in $\Lambda_B + Bt$ by

$$\hat\vartheta_t(A) = \theta(\overline A), \tag{2.22}$$

where $\theta$ is the operator of reflection along the corresponding side of $\Lambda_B$. By taking further reflections, we can define $\hat\vartheta_t(A)$ for every $t \in \mathbb T_{L/B}$. (Thus $\hat\vartheta_t$ is linear for even-parity $t$ and antilinear for odd-parity $t$; if every component of $t$ is even then $\hat\vartheta_t$ is simply the translation by $Bt$.) It is easy to check that the resulting $\hat\vartheta_t(A)$ does not depend on what sequence of reflections has been used to generate it. The fundamental consequence of reflection positivity, derived in a rather general form in [25, Theorem 2.2], is as follows:

Theorem 2.4 (Chessboard estimate). Suppose that the state $\langle - \rangle$ is reflection positive for any "plane of reflection" between sites on $\mathbb T_L$. Then for any $A_1, \dots, A_m \in \mathfrak A_{\Lambda_B}$ and any distinct vectors $t_1, \dots, t_m \in \mathbb T_{L/B}$,

$$\Bigl\langle \prod_{j=1}^m \hat\vartheta_{t_j}(A_j) \Bigr\rangle \le \prod_{j=1}^m \Bigl\langle \prod_{t \in \mathbb T_{L/B}} \hat\vartheta_t(A_j) \Bigr\rangle^{(B/L)^d}. \tag{2.23}$$

By (2.23) we may bound the expectation of a product of operators by a product of expectations of so-called "disseminated" operators. As we will show on explicit examples later, these are often easier to estimate. Note that the giant products above can be written in any order by our assumption that the block-operators in different blocks commute. A corresponding statement works also for classical reflection-positive measures. The only formal difference is that the $A_j$'s are replaced by functions, or indicators of events $\mathcal A_j$, which depend only on the spin configuration in $\Lambda_B$. Then Eq. (2.23) becomes

$$\mathbb P\Bigl( \bigcap_{j=1}^m \theta_{t_j}(\mathcal A_j) \Bigr) \le \prod_{j=1}^m \mathbb P\Bigl( \bigcap_{t \in \mathbb T_{L/B}} \theta_t(\mathcal A_j) \Bigr)^{(B/L)^d}. \tag{2.24}$$

Here $\theta_t(\mathcal A)$ is the (usual) reflection of $\mathcal A$ to the block $\Lambda_B + Bt$. (We reserve the symbol $\vartheta_t(\mathcal A)$ for an operation that more closely mimics $\hat\vartheta_t$ in the coherent-state representation; see the definitions right before Proposition 3.4.) References [5, 6, 8] contain a detailed account of the above formalism in the classical context; the original statements are, of course, due to [23–25].

Remark 2.5. Unlike its classical counterpart, the quantum version of reflection positivity is a rather mysterious concept. First, for most of the models listed in the introduction, in order to bring the Hamiltonian to the form (2.20), we actually have to perform some sort of rotation of the spins. (We may think of this as choosing a different representation of the spin operators.) The purpose of this operation is to have all spins "represented" by real-valued matrices, while making the overall sign of the interactions negative. This permits an application of Theorem 2.2. It is somewhat ironic that this works beautifully for antiferromagnets, which thus become effectively ferromagnetic, but fails miserably [47] for genuine ferromagnets. For XY-type models, when only two of the spin-components are involved in the interaction, we can always choose a representation in which all matrices are real valued. If only quadratic interactions are considered (as for the nematics) the overall sign is inconsequential but, once interactions of different degrees are mixed (even if we just add a general external field to the Hamiltonian), reflection positivity may fail again.
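The classical chessboard estimate (2.24) can be verified by brute force in a toy case. The sketch below uses the one-dimensional nearest-neighbor ferromagnetic Ising model on a ring of $L$ sites (with $L$ even), which is a standard reflection-positive example but is not one of the models treated in the paper; blocks have side $B = 1$, so the exponent $(B/L)^d$ equals $1/L$. The function names are illustrative only.

```python
import itertools
import math


def ising_ring_probability(L, beta, event):
    """P(event) under the Gibbs measure with H = -sum_i s_i s_{i+1} on a ring of L sites."""
    Z = 0.0
    numerator = 0.0
    for s in itertools.product((-1, 1), repeat=L):
        energy = -sum(s[i] * s[(i + 1) % L] for i in range(L))
        w = math.exp(-beta * energy)
        Z += w
        if event(s):
            numerator += w
    return numerator / Z


def chessboard_sides(L, beta, t1=0, t2=1):
    """Both sides of (2.24) for the single-site events {s = +1} at t1 and {s = -1} at t2."""
    lhs = ising_ring_probability(L, beta, lambda s: s[t1] == 1 and s[t2] == -1)
    # Disseminating a single-site event over all L blocks gives a constant configuration.
    all_plus = ising_ring_probability(L, beta, lambda s: all(x == 1 for x in s))
    all_minus = ising_ring_probability(L, beta, lambda s: all(x == -1 for x in s))
    rhs = all_plus ** (1.0 / L) * all_minus ** (1.0 / L)
    return lhs, rhs


if __name__ == "__main__":
    for beta in (0.0, 0.5, 1.0, 2.0):
        print(beta, chessboard_sides(6, beta))
```

At $\beta = 0$ the two sides coincide (both equal $1/4$), while for larger $\beta$ the left-hand side decays and the right-hand side stays of order one, which illustrates that the estimate is useful mainly when the disseminated events themselves are unlikely.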

3. Main Results

We now give precise statements of our main theorems. First we will state a bound on the matrix elements of the Gibbs-Boltzmann weight in the (overcomplete) basis of coherent states. On the theoretical side, this result generalizes the classic Berezin-Lieb inequalities [4, 36] and thus provides a more detailed demonstration of the approach to the classical limit as $S \to \infty$. On the practical side, the bound we obtain allows us to replace the "exponential localization" technique of Fröhlich and Lieb [25], which is intrinsically quantum, by an estimate for the classical version of the model. The rest of our results show in detail how Theorem 3.1 fits into the standard line of proof of phase transitions via chessboard estimates. In Sect. 5 we will apply this general strategy to the five models of interest.

3.1. Matrix elements of Gibbs-Boltzmann weights. We commence with a definition of the class of models to which our arguments apply. Consider a finite set $\Lambda \subset \mathbb Z^d$ and, for each $\Gamma \subset \Lambda$, let $h_\Gamma$ be an operator on $\mathcal H_\Lambda = \bigotimes_{r \in \Lambda} [\mathbb C^{2S+1}]_r$ that depends only on the spins in $\Gamma$. (I.e., $h_\Gamma$ is a tensor product of an operator on $\mathcal H_\Gamma$ and the unity on $\mathcal H_{\Lambda \setminus \Gamma}$.) We will assume that $h_\Gamma = 0$ if the size of $\Gamma$ exceeds some finite constant, i.e., each interaction term involves only a bounded number of spins. The Hamiltonian is then

$$H = \sum_{\Gamma\colon \Gamma \subset \Lambda} h_\Gamma. \tag{3.1}$$

Most of the interesting examples are such that $h_\Gamma = 0$ unless $\Gamma$ is a two point set $\{x, y\}$ containing a pair of nearest neighbors on $\mathbb Z^d$, as is the case of all of the models (1–5) discussed in Sect. 1. As already noted, our principal technical result is a bound on the matrix element $\langle \Omega | e^{-\beta H} | \Omega' \rangle$. To state this bound precisely, we need some more notation. Let $\Omega \mapsto [h_\Gamma]_\Omega$ be an upper symbol of the operator $h_\Gamma$ which, by (2.17), may be assumed independent of the components $(\Omega_r)_{r \notin \Gamma}$. We fix the upper symbol of $H$ to

$$[H]_\Omega = \sum_{\Gamma\colon \Gamma \subset \Lambda} [h_\Gamma]_\Omega. \tag{3.2}$$

We will also use $|\Gamma|$ to denote the number of elements in the set $\Gamma$ and $\|h_\Gamma\|$ to denote the operator norm of $h_\Gamma$ on $\mathcal H_\Gamma$.


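Let $|\Omega_r - \Omega'_r|$ denote the (3-dimensional) Euclidean distance of the points $\Omega_r$ and $\Omega'_r$ on $S^2$, and consider the following $\ell^1$ and $\ell^2$-norms on $(S^2)^{|\Lambda|}$:

$$\|\Omega - \Omega'\|_1 = \sum_{r \in \Lambda} |\Omega_r - \Omega'_r| \tag{3.3}$$

and

$$\|\Omega - \Omega'\|_2 = \Bigl( \sum_{r \in \Lambda} |\Omega_r - \Omega'_r|^2 \Bigr)^{1/2}. \tag{3.4}$$

Besides these two norms, we will also need the "mixed" quantity

$$d_S(\Omega, \Omega') = \sum_{r \in \Lambda} \Bigl( \sqrt S\, |\Omega_r - \Omega'_r| \wedge S\, |\Omega_r - \Omega'_r|^2 \Bigr), \tag{3.5}$$

where $\wedge$ denotes the minimum. This is not a distance function but, as will be explained in Lemma 4.2, it does satisfy an inequality which could be compared to the triangle inequality. Finally, from (2.7) we know that $|\langle \Omega_r | \Omega'_r \rangle| = 1 - O(S |\Omega_r - \Omega'_r|^2)$. Hence, there is $\eta > 0$ such that

$$|\langle \Omega | \Omega' \rangle| \le e^{-\eta S \|\Omega - \Omega'\|_2^2} \tag{3.6}$$

holds for all $S$, all $\Omega, \Omega' \in (S^2)^{|\Lambda|}$ and all $\Lambda$. We fix this $\eta$ throughout all forthcoming derivations. (Since $[\cos(\Theta/2)]^2 = 1 - \tfrac14 \|\Omega - \Omega'\|_2^2$ for a single spin, with $\Theta$ the angle between $\Omega$ and $\Omega'$, we have $\eta = 1/4$. But $\eta$ plays only a marginal role in our calculations so we will leave it implicit.) Our first main theorem then is:

Theorem 3.1. Suppose that there exists a number $R$ such that

$$|\Gamma| > R \;\Rightarrow\; h_\Gamma = 0, \tag{3.7}$$

and that, for some constants $c_0$ and $c_1$ independent of $S$ and $\Lambda$, we have

$$\sup_{x \in \Lambda} \sum_{\Gamma\colon x \in \Gamma \subset \Lambda} \|h_\Gamma\| \le c_0 \tag{3.8}$$

as well as the Lipschitz bound

$$\bigl| [h_\Gamma]_\Omega - [h_\Gamma]_{\Omega'} \bigr| \le c_1 \|\Omega - \Omega'\|_1 \|h_\Gamma\|, \qquad \Gamma \subset \Lambda. \tag{3.9}$$

Then for any constant $c_2 > 0$, there exists a constant $c_3 > 0$, depending only on $c_0$, $c_1$, $c_2$ and $R$, such that for all $\beta \le c_2 \sqrt S$,

$$\bigl| \langle \Omega | e^{-\beta H} | \Omega' \rangle \bigr| \le e^{-\beta [H]_\Omega - \eta\, d_S(\Omega, \Omega') + c_3 \beta |\Lambda| / \sqrt S} \tag{3.10}$$

holds for all $\Omega, \Omega' \in (S^2)^{|\Lambda|}$ and all finite $\Lambda$.

Note that we do not assume that the Hamiltonian is translation-invariant. In fact, as long as the conditions (3.7–3.9) hold as stated, the geometry of the underlying set is completely immaterial. For the diagonal elements, which is all we need in the subsequent derivations anyway, the above bound becomes somewhat more transparent: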


Corollary 3.2. Suppose (3.7–3.9) hold and let $c_2$ and $c_3$ be as in Theorem 3.1. Then for all $\beta$ and $S$ with $\beta \le c_2 \sqrt S$, all $\Omega \in (S^2)^{|\Lambda|}$ and all $\Lambda$,

$$e^{-\beta \langle H \rangle_\Omega} \le \langle \Omega | e^{-\beta H} | \Omega \rangle \le e^{-\beta [H]_\Omega + c_3 \beta |\Lambda| / \sqrt S}. \tag{3.11}$$

It is interesting to compare this result with the celebrated Berezin-Lieb inequalities [4, 36] which state the following bounds between quantum and classical partition functions:

$$\int_{(S^2)^{|\Lambda|}} \frac{d\Omega}{(4\pi)^{|\Lambda|}}\, e^{-\beta \langle H \rangle_\Omega} \le \frac{\mathrm{Tr}_{\mathcal H_\Lambda}(e^{-\beta H})}{(2S+1)^{|\Lambda|}} \le \int_{(S^2)^{|\Lambda|}} \frac{d\Omega}{(4\pi)^{|\Lambda|}}\, e^{-\beta [H]_\Omega}. \tag{3.12}$$

(An unpublished proof of E. Lieb, cf. [46], shows both inequalities are simple consequences of Jensen's inequality; the original proof [36] invoked also the "intrinsically non-commutative" Golden-Thompson inequality.) From Corollary 3.2 we now know that, to within a correction of order $\beta / \sqrt S$, the estimates corresponding to (3.12) hold even for the (diagonal) matrix elements relative to coherent states. However, the known proofs of (3.12) use the underlying trace structure in a very essential way and are not readily extended to a generalization along the lines of (3.11).

Remarks 3.3. Some comments are in order:

(1) The correction of order $\beta |\Lambda| / \sqrt S$ is the best one can do at the above level of generality. Indeed, when $\Omega$ and $\Omega'$ are close in the sense $\|\Omega - \Omega'\|_1 = O(|\Lambda| / \sqrt S)$, then $[H]_\Omega$ and $[H]_{\Omega'}$ differ by a quantity of order $c_1 |\Lambda| / \sqrt S$. Since the matrix element is symmetric in $\Omega$ and $\Omega'$, the bound must account for the difference. However, there is a deeper reason why $\beta / \sqrt S$ needs to be small for the classical Boltzmann weight to faithfully describe the matrix elements of the quantum Boltzmann weight. Consider a single spin with the Hamiltonian $H = S^{-1} S^z$, and let $\Omega$ correspond to the spherical angles $(\theta, \phi)$. A simple calculation shows that then

$$\langle \Omega | e^{-\beta H} | \Omega \rangle = \Bigl[ \cos^2(\theta/2)\, e^{-\frac12 \beta / S} + \sin^2(\theta/2)\, e^{\frac12 \beta / S} \Bigr]^{2S} = e^{-\beta \cos\theta + \frac{\beta^2}{4S}(1 - \cos^2\theta) + O(\beta^3 / S^2)}. \tag{3.13}$$

The term $\beta \cos\theta$ is the (now unambiguous) classical interaction in "state" $\Omega$. The leading correction is of order $\beta^2 / S$, which is only small if $\beta \ll \sqrt S$.

(2) Another remark that should be made, lest the reader think about optimizing over the many choices of upper symbols in (3.10): The constant $c_3$ depends on the upper symbol. For $h_\Gamma$ being a polynomial in spin operators, $[h_\Gamma]_\Omega$ may be chosen a polynomial too [17, Proposition 3]. This automatically ensures properties such as the Lipschitz continuity (as well as existence of the classical limit, cf. (3.14)). For more complex $h_\Gamma$'s, e.g., those defined by an infinite power series, one must carefully check the conditions (3.7–3.9) before Theorem 3.1 can be applied.

3.2. Absence of clustering. Our next task is to show how Theorem 3.1 can be applied to establish phase transitions in models whose ($S \to \infty$) classical version exhibits a phase transition that can be proved by means of chessboard estimates. The principal conclusion is the absence of clustering which, as we will see in Sect. 3.3, directly implies a quantum phase transition.
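The single-spin computation (3.13) from Remarks 3.3(1) lends itself to a quick numerical check. The sketch below evaluates the logarithm of the closed-form matrix element and compares it with the classical term plus the $\beta^2/S$ correction; it also checks the Jensen lower bound from (3.11) and an upper bound $\beta^2/(4S)$ on the gap, which follows from Hoeffding's lemma (an auxiliary fact brought in here, not a statement from the paper). Function names are illustrative.

```python
import math


def log_matrix_element(beta, S, theta):
    """log <Omega| exp(-beta S^z / S) |Omega>, using the closed form in (3.13)."""
    x = beta / (2 * S)
    c2 = math.cos(theta / 2) ** 2
    s2 = math.sin(theta / 2) ** 2
    return 2 * S * math.log(c2 * math.exp(-x) + s2 * math.exp(x))


def classical_plus_correction(beta, S, theta):
    """Exponent on the right of (3.13) without the O(beta^3 / S^2) remainder."""
    return -beta * math.cos(theta) + beta ** 2 / (4 * S) * (1 - math.cos(theta) ** 2)


def gap_to_classical(beta, S, theta):
    """log of the matrix element minus the classical exponent -beta * cos(theta)."""
    return log_matrix_element(beta, S, theta) + beta * math.cos(theta)


if __name__ == "__main__":
    for S in (5, 50, 500):
        beta, theta = 2.0, 1.2
        print(S, gap_to_classical(beta, S, theta), beta ** 2 / (4 * S))
```

The printed gap shrinks like $1/S$ at fixed $\beta$, in line with the statement that the classical weight is accurate only when $\beta \ll \sqrt S$.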


Consider the setting as described in Sect. 2.2, i.e., we have a torus $\mathbb T_L$ of side $L$ which is tiled by $(L/B)^d$ disjoint translates of a block $\Lambda_B$ of side $B$. For each operator $A$ in $\Lambda_B$ and each $t \in \mathbb T_{L/B}$, we write $\hat\vartheta_t(A)$ for the appropriate reflection (accompanied by complex conjugation if $t$ is an odd parity site) of $A$ "into" the block $\Lambda_B + Bt$. In addition to the operators on $\mathcal H_{\mathbb T_L} = \bigotimes_{r \in \mathbb T_L} [\mathbb C^{2S+1}]_r$, we will also consider events $\mathcal A$ on the space of classical configurations $(S^2)^{|\mathbb T_L|}$ equipped with the Borel product $\sigma$-algebra and the product surface measure $d\Omega = \prod_{r \in \mathbb T_L} d\Omega_r$. If $\mathcal A$ is an event that depends only on the configuration in $\Lambda_B$, we will call $\mathcal A$ a $B$-block event. For each $t \in \mathbb T_{L/B}$, we use $\theta_t(\mathcal A)$ to denote the event in $\Lambda_B + Bt$ that is obtained by (pure) reflection of $\mathcal A$ "into" $\Lambda_B + Bt$.

Given a quantum Hamiltonian $H$ of the form (3.1), let $\langle - \rangle_{L,\beta}$ denote the thermal state (2.21). Considering the classical Hamiltonian $H^\infty\colon (S^2)^{|\mathbb T_L|} \to \mathbb R$, which we define as

$$H^\infty(\Omega) = \lim_{S \to \infty} \langle H \rangle_\Omega = \lim_{S \to \infty} [H]_\Omega, \tag{3.14}$$

we use $\mathbb P_{L,\beta}$ to denote the usual Gibbs measure. Explicitly, for any event $\mathcal A \subset (S^2)^{|\mathbb T_L|}$,

$$\mathbb P_{L,\beta}(\mathcal A) = \int_{\mathcal A} d\Omega\, \frac{e^{-\beta H^\infty(\Omega)}}{Z_L(\beta)}, \tag{3.15}$$

where $Z_L(\beta)$ is the classical partition function. For each $B$-block event $\mathcal A$ we will also consider its disseminated version $\bigcap_{t \in \mathbb T_{L/B}} \theta_t(\mathcal A)$ and introduce the abbreviation

$$\mathfrak p_{L,\beta}(\mathcal A) = \Bigl[ \mathbb P_{L,\beta}\Bigl( \bigcap_{t \in \mathbb T_{L/B}} \theta_t(\mathcal A) \Bigr) \Bigr]^{(B/L)^d} \tag{3.16}$$

for the corresponding quantity on the right-hand side of (2.24). An application of (2.23) shows that $\mathcal A \mapsto \mathfrak p_{L,\beta}(\mathcal A)$ is an outer measure on the $\sigma$-algebra of $B$-block events (cf. [6, Theorem 6.3]). For each measurable set $\mathcal A \subset (S^2)^{|\mathbb T_L|}$ we consider the operator

$$\hat Q_{\mathcal A} = \Bigl( \frac{2S+1}{4\pi} \Bigr)^{|\mathbb T_L|} \int_{\mathcal A} d\Omega\, |\Omega\rangle\langle\Omega|. \tag{3.17}$$

Since the coherent states are overcomplete, this operator is not a projection; notwithstanding, we may think of it as a non-commutative counterpart of the indicator of the event $\mathcal A$. In order to describe the behavior of $\hat Q_{\mathcal A}$ under $\hat\vartheta_t$, we introduce the classical version $\vartheta_t$ of $\hat\vartheta_t$ which is defined as follows: Consider a "complex-conjugation" map $\sigma\colon (S^2)^{|\mathbb T_L|} \to (S^2)^{|\mathbb T_L|}$ which, in a given representation of the coherent states, has the effect

$$\overline{|\Omega\rangle\langle\Omega|} = |\sigma\Omega\rangle\langle\sigma\Omega|. \tag{3.18}$$

For the representation introduced in Sect. 2.1, we can choose $\sigma$ to be the reflection through the $xz$-plane (in spin space), i.e., if $\Omega = (\theta, \phi)$ then $\sigma(\Omega) = (\theta, -\phi)$. For even parity $t \in \mathbb T_{L/B}$, we simply have $\vartheta_t = \theta_t$ while for odd parity $t \in \mathbb T_{L/B}$ we have $\vartheta_t = \theta_t \circ \sigma$. Here are some simple facts about the $\hat Q$-operators:


Proposition 3.4. For any $B$-block event $\mathcal A$ we have

$$\hat\vartheta_t(\hat Q_{\mathcal A}) = \hat Q_{\vartheta_t(\mathcal A)}, \qquad t \in \mathbb T_{L/B}. \tag{3.19}$$

Moreover, if $\mathcal A_1, \dots, \mathcal A_m$ are $B$-block events and $t_1, \dots, t_m$ are distinct elements from $\mathbb T_{L/B}$, then

$$\bigl[ \hat Q_{\theta_{t_i}(\mathcal A_i)},\, \hat Q_{\theta_{t_j}(\mathcal A_j)} \bigr] = 0, \qquad 1 \le i < j \le m, \tag{3.20}$$

and

$$\hat Q_{\theta_{t_1}(\mathcal A_1)} \cdots \hat Q_{\theta_{t_m}(\mathcal A_m)} = \hat Q_{\theta_{t_1}(\mathcal A_1) \cap \dots \cap \theta_{t_m}(\mathcal A_m)}. \tag{3.21}$$

Finally, $\hat Q$ of the full space (i.e., $(S^2)^{|\mathbb T_L|}$) is the unity, $\hat Q_\emptyset = 0$, and if $\mathcal A_1, \mathcal A_2, \dots$ is a countable collection of disjoint events, then (in the strong-operator topology)

$$\hat Q_{\bigcup_{n=1}^\infty \mathcal A_n} = \sum_{n=1}^\infty \hat Q_{\mathcal A_n}. \tag{3.22}$$

In particular, $\hat Q_{\mathcal A^{\mathrm c}} = 1 - \hat Q_{\mathcal A}$ for any event $\mathcal A$.

Proof. The map $\hat\vartheta_t$ is a pure reflection for even-parity $t \in \mathbb T_{L/B}$ and so (3.19) holds by the fact that pure reflection of $\hat Q_{\mathcal A}$ is $\hat Q$ of the reflected $\mathcal A$. For odd-parity $t$, the relation (3.18) implies $\overline{\hat Q_{\mathcal A}} = \hat Q_{\sigma(\mathcal A)}$, which yields (3.19) in these cases as well. The remaining identities are easy consequences of the definitions and (2.8).

Remark 3.5. The last few properties listed in the lemma imply that the map $\mathcal A \mapsto \hat Q_{\mathcal A}$ is a positive-operator-valued (POV) measure, in the sense of [15]. As a consequence, if $\mathcal A \subset \mathcal A'$ then $\hat Q_{\mathcal A} \le \hat Q_{\mathcal A'}$ while if $\{\mathcal A_n\}$ is a countable collection of events, not necessarily disjoint, then

$$\hat Q_{\bigcup_{n=1}^\infty \mathcal A_n} \le \sum_{n=1}^\infty \hat Q_{\mathcal A_n}. \tag{3.23}$$

Both of these properties are manifestly true by the definition (3.17).

Before we state our next theorem, let us recall the "standard" setting for the application of chessboard estimates to proofs of phase transitions in classical models. Given $B$ that divides $L$, one typically singles out a collection $\mathcal G_1, \dots, \mathcal G_n$ of "good" $B$-block events and defines

$$\mathcal B = (\mathcal G_1 \cup \dots \cup \mathcal G_n)^{\mathrm c} \tag{3.24}$$

to be the corresponding "bad" $B$-block event. Without much loss of generality we will assume that $\mathcal B$ is invariant under "complex" reflections, i.e., $\vartheta_t(\mathcal B) = \tau_{Bt}(\mathcal B)$, where $\tau_r$ denotes the shift by $r$ on $(S^2)^{|\mathbb T_L|}$. In the best of situations, carefully chosen good events typically satisfy the conditions in the following definition:

Definition 3.6. We say that the "good" $B$-block events are incompatible if

(1) they are mutually exclusive, i.e., $\mathcal G_i \cap \mathcal G_j = \emptyset$ whenever $i \ne j$;


(2) their simultaneous occurrence at neighboring blocks forces an intermediate block (which overlaps the two neighbors) to be bad, i.e., there exists $\ell$ with $1 \le \ell < B$ such that

$$\theta_t(\mathcal G_i) \cap \theta_{t'}(\mathcal G_j) \subset \tau_{Bt + \ell(t' - t)}(\mathcal B) \tag{3.25}$$

holds for all $i \ne j$ and any $t, t' \in \mathbb T_{L/B}$ with $|t - t'| = 1$. Here $\tau_r$ is the shift by $r$.

These conditions are much easier to achieve in situations where we are allowed to use reflections through planes containing sites. Then, typically, one defines the $\mathcal G_i$'s so that the neighboring blocks cannot have distinct types of goodness. But as noted in Remark 2.3, we are not allowed to use these reflections in the quantum setting. Nevertheless, (1) and (2) taken together do ensure that a simultaneous occurrence of two distinct types of goodness necessarily enforces a "contour" of bad blocks. The weight of each such contour can be bounded by the quantity $\mathfrak p_{L,\beta}(\mathcal B)$ to the number of constituting blocks; it then remains to show that $\mathfrak p_{L,\beta}(\mathcal B)$ is sufficiently small. For quantum models, appropriate modifications of this strategy yield the following result:

Theorem 3.7. Consider a quantum spin system on $\mathbb T_L$ with spin $S$ and interaction for which the Gibbs state $\langle - \rangle_{L,\beta}$ from (2.21) is reflection positive for reflections through planes between sites on $\mathbb T_L$. Let $H^\infty$ be a function and $\xi > 0$ a constant such that, for all $L \ge 1$,

$$\sup_{\Omega \in (S^2)^{|\mathbb T_L|}} \bigl| [H]_\Omega - H^\infty(\Omega) \bigr| + \sup_{\Omega \in (S^2)^{|\mathbb T_L|}} \bigl| \langle H \rangle_\Omega - H^\infty(\Omega) \bigr| \le \xi\, |\mathbb T_L|. \tag{3.26}$$

Let $\mathcal G_1, \dots, \mathcal G_n$ be incompatible "good" $B$-block events and define $\mathcal B$ as in (3.24). Suppose that $\mathcal B$ is invariant under reflections and conjugation $\sigma$, i.e., $\vartheta_t(\mathcal B) = \tau_{Bt}(\mathcal B)$ for all $t \in \mathbb T_{L/B}$. Fix $\epsilon > 0$. Then there exists $\delta > 0$ such that if $\beta \le c_2 \sqrt S$ and

$$\mathfrak p_{L,\beta}(\mathcal B)\, e^{\beta(\xi + c_3 / \sqrt S)} < \delta, \tag{3.27}$$

where $c_2$ and $c_3$ are as in Theorem 3.1, we have

$$\langle \hat Q_{\mathcal B} \rangle_{L,\beta} < \epsilon \tag{3.28}$$

and, for all $i = 1, \dots, n$ and all distinct $t_1, t_2 \in \mathbb T_{L/B}$,

$$\Bigl\langle \hat Q_{\theta_{t_1}(\mathcal G_i)} \bigl[ 1 - \hat Q_{\theta_{t_2}(\mathcal G_i)} \bigr] \Bigr\rangle_{L,\beta} < \epsilon. \tag{3.29}$$

Here $\delta$ may depend on $\epsilon$ and $d$, but not on $\beta$, $S$, $n$ nor on the details of the model.

Remarks 3.8. Here are some notes concerning the previous theorem:

(1) By general results (e.g., [17]) on the convergence of upper and lower symbols as $S \to \infty$, the quantity $\xi$ in (3.26) can be made arbitrarily small by increasing $S$ appropriately. In fact, for two-body interactions, $\xi$ is typically a small constant times $1/S$ and so it provides a harmless correction to the term $c_3 / \sqrt S$ in (3.27). In particular, apart from the classical bound that $\mathfrak p_{L,\beta}(\mathcal B) \ll 1$, (3.27) will only require that $\beta \ll \sqrt S$.


(2) Note that the result is stated for pure reflections, $\theta_t(\mathcal G_i)$, of the good events, not their more complicated counterparts $\vartheta_t(\mathcal G_i)$. This is important for maintaining a close link between the nature of phase transition in the quantum model and its classical counterpart. We also note that $H^\infty$ is not required to be reflection positive for Theorem 3.7 to hold. (Notwithstanding, the classical Hamiltonian will be reflection positive for all examples in Sect. 5.)

(3) The stipulation that the $\vartheta_t$'s "act" on $\mathcal B$ only as translations is only mildly restrictive: Indeed, $\sigma(\mathcal B) = \mathcal B$ in all cases treated in the present work. However, if it turns out that $\sigma(\mathcal B) \ne \mathcal B$, the condition (3.27) may be replaced by

$$\sqrt{\mathfrak p_{L,\beta}(\mathcal B)\, \mathfrak p_{L,\beta}\bigl(\sigma(\mathcal B)\bigr)}\; e^{\beta(\xi + c_3 / \sqrt S)} < \delta, \tag{3.30}$$

which, since $\mathfrak p_{L,\beta}(\sigma(\mathcal B)) \le 1$, is anyway satisfied by a stricter version of (3.27) (this does need reflection positivity of $H^\infty$). Note that $\sigma(\mathcal B) = \mathcal B$ implies that every configuration in $\sigma(\mathcal G_i)$ is also good. In most circumstances we expect that $\sigma(\mathcal G_i)$ is one of the good events.

3.3. Phase transitions in quantum models. It remains to show how to adapt the main conclusion of Theorem 3.7 to the proof of phase transition in quantum systems. We first note that (3.27) is a condition on the classical model which, for $\delta$ small, yields a classical variant of (3.29),

$$\mathbb P_{L,\beta}\bigl( \theta_{t_1}(\mathcal G_i) \cap \theta_{t_2}(\mathcal G_i^{\mathrm c}) \bigr) < \epsilon, \qquad 1 \le i \le n. \tag{3.31}$$

Under proper conditions on $\epsilon$ and the probabilities of the $\mathcal G_i$'s, this yields absence of clustering for the classical torus Gibbs state which, by a conditioning "on the back of the torus" (see the paragraph before Lemma 4.5), implies the existence of multiple infinite-volume Gibbs measures. For a quantum system with an internal symmetry, a similar argument allows us to deal with the cases when the symmetry has been "spontaneously" broken. For instance (see [25]) in magnetic systems (3.29) might imply the non-vanishing of the spontaneous magnetization which, in turn, yields a discontinuity in some derivative of the free energy, i.e., a thermodynamic phase transition. In the cases with no symmetry, or in situations where the symmetry is not particularly useful, such as for temperature-driven phase transitions, we can still demonstrate a thermodynamic transition either by concocting an "unusual" external field (which couples to distinct types of good blocks) or by directly proving a jump e.g. in the energy density.

An elegant route to these matters is via the formalism of infinite-volume KMS states (see, e.g., [30, 46]). Let us recall the principal aspects of this theory: Consider the $C^\star$-algebra $\mathfrak A$ of quasilocal observables defined as the norm-closure of $\bigcup_{\Lambda \subset \mathbb Z^d} \mathfrak A_\Lambda$, where the union is over all finite subsets $\Lambda$ and where $\mathfrak A_\Lambda$ is the set of all bounded operators on the Hilbert space $\mathcal H_\Lambda = \bigotimes_{r \in \Lambda} [\mathbb C^{2S+1}]_r$. (To interpret the union properly, we note that if $\Lambda \subset \Lambda'$, then $\mathfrak A_\Lambda$ is isomorphic to a subset of $\mathfrak A_{\Lambda'}$, via the map $A \to A \otimes 1$ with $1$ being the identity in $\mathfrak A_{\Lambda' \setminus \Lambda}$.)

For each $L \ge 1$, let us identify $\mathbb T_L$ with the block $\Lambda_L$ and let $H_L$ be the Hamiltonian on $\mathbb T_L$ which we assume is of the form (3.1) with $h_\Gamma$ finite range and translation invariant. For each observable $A \in \mathfrak A_{\Lambda_L}$, let $\alpha_t^{(L)}(A) = e^{itH_L} A e^{-itH_L}$ be the strongly-continuous one-parameter family of operators representing the time evolution of $A$ in the Heisenberg picture. For $A$ local and $H_L$ finite range, by expanding into a series of commutators

$$\alpha_t^{(L)}(A) = \sum_{n \ge 0} \frac{(it)^n}{n!}\, [H_L, [H_L, \dots [H_L, A] \dots ]], \tag{3.32}$$

the map $t \mapsto \alpha_t^{(L)}(A)$ extends to all $t \in \mathbb C$, see [30, Theorem III.3.6]. Moreover, the infinite series representation of $\alpha_t^{(L)}(A)$ converges in norm, as $L \to \infty$, to a one-parameter family of operators $\alpha_t(A)$, uniformly in $t$ on compact subsets of $\mathbb C$. (These facts were originally proved in [43].)

A state $\langle - \rangle_\beta$ on $\mathfrak A$, i.e., a linear functional obeying $\langle A \rangle_\beta \ge 0$ if $A \ge 0$ and $\langle 1 \rangle_\beta = 1$, is called a KMS state (for the translation-invariant, finite-range interaction $H$ at inverse temperature $\beta$) if for all local operators $A, B \in \mathfrak A$, the equality

$$\langle AB \rangle_\beta = \langle \alpha_{-i\beta}(B)\, A \rangle_\beta, \tag{3.33}$$

also known as the KMS condition, holds. This condition is the quantum counterpart of the DLR equation from classical statistical mechanics and a KMS state is thus the counterpart of the infinite-volume Gibbs measure.

We proceed by stating two general propositions which will help us apply the results from previous sections to the proof of phase transitions. We begin with a statement which concerns phase transitions due to symmetry breaking:

Proposition 3.9. Consider the quantum spin systems as in Theorem 3.7 and suppose that the incompatible good block events $\mathcal G_1, \dots, \mathcal G_n$ are such that $\langle \hat Q_{\mathcal G_k} \rangle_{L,\beta}$ is the same for all $k = 1, \dots, n$. If (3.28–3.29) hold with an $\epsilon$ such that $(n+1)\epsilon < 1/2$, then there exist $n$ distinct KMS states $\langle - \rangle_\beta^{(k)}$, $k = 1, \dots, n$, which are invariant under translations by $B$ and for which

$$\langle \hat Q_{\mathcal G_k} \rangle_\beta^{(k)} \ge 1 - (n+1)\epsilon, \qquad k = 1, \dots, n. \tag{3.34}$$

The proposition says that there are at least $n$ distinct equilibrium states. There may be more, but not fewer. This ensures a phase transition, via phase coexistence. Our second proposition deals with temperature driven transitions. The following is a quantum version of one of the principal theorems in [33, 34]:

Proposition 3.10. Consider the quantum spin systems as in Theorem 3.7 and let $\mathcal G_1$ and $\mathcal G_2$ be two incompatible $B$-block events. Let $\beta_1 < \beta_2$ be two inverse temperatures and suppose that $\epsilon \in [0, 1/4)$ is such that for all $L \ge 1$,

(1) the bounds (3.28–3.29) hold for all $\beta \in [\beta_1, \beta_2]$,
(2) $\langle \hat Q_{\mathcal G_1} \rangle_{L,\beta_1} \ge 1 - 2\epsilon$ and $\langle \hat Q_{\mathcal G_2} \rangle_{L,\beta_2} \ge 1 - 2\epsilon$.

Then there exists an inverse temperature $\beta_{\mathrm t} \in [\beta_1, \beta_2]$ and two distinct KMS states $\langle - \rangle_{\beta_{\mathrm t}}^{(1)}$ and $\langle - \rangle_{\beta_{\mathrm t}}^{(2)}$ at inverse temperature $\beta_{\mathrm t}$ which are invariant under translations by $B$ and for which

$$\langle \hat Q_{\mathcal G_1} \rangle_{\beta_{\mathrm t}}^{(1)} \ge 1 - 4\epsilon \quad\text{and}\quad \langle \hat Q_{\mathcal G_2} \rangle_{\beta_{\mathrm t}}^{(2)} \ge 1 - 4\epsilon. \tag{3.35}$$

The underlying idea of the latter proposition is the existence of a forbidden gap in the density of, say, $\mathcal G_1$-blocks. Such "forbidden gap" arguments have been invoked in (limiting) toroidal states by, e.g., [29, 33, 34]; an extension to infinite-volume, translation-invariant, reflection-positive Gibbs states has appeared in [8]. Both propositions are proved in Sect. 4.3.


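4. Proofs

Here we provide the proofs of our general results from Sect. 3. We begin by the estimates of matrix elements of Gibbs-Boltzmann weight (Theorem 3.1) and then, in Sect. 4.2, proceed to apply these in quasiclassical Peierls' arguments which lie at the core of Theorem 3.7. Finally, in Sect. 4.3, we elevate the conclusions of Theorem 3.7 to coexistence of multiple KMS states, thus proving Propositions 3.9–3.10.

4.1. Bounds on matrix elements. The proof of Theorem 3.1 is based on a continuity argument whose principal estimate is encapsulated into the following claim:

Proposition 4.1. Suppose that (3.7–3.9) hold with constants $R$, $c_0$, and $c_1$. Let $\hat H_\Omega = H - [H]_\Omega$. Suppose there exist $c_2 > 0$ and $\epsilon > 0$ such that for all $\beta \le c_2 \sqrt S$,

$$\bigl| \langle \Omega | e^{-\beta \hat H_\Omega} | \Omega' \rangle \bigr| \le e^{-\eta\, d_S(\Omega, \Omega') + \beta \epsilon |\Lambda|} \tag{4.1}$$

is true for all $\Omega, \Omega' \in (S^2)^{|\Lambda|}$. Then there exists a constant $c_3$ depending on $c_0$, $c_1$, $c_2$ and $R$ (but not $\Lambda$, $S$ or $\epsilon$) such that for all $\beta \le c_2 \sqrt S$,

$$\Bigl| \frac{d}{d\beta} \langle \Omega | e^{-\beta \hat H_\Omega} | \Omega' \rangle \Bigr| \le \frac{c_3}{\sqrt S}\, |\Lambda|\, e^{-\eta\, d_S(\Omega, \Omega') + \beta \epsilon |\Lambda|}. \tag{4.2}$$

Before we commence with the proof, we will make a simple observation:

Lemma 4.2. For all $\Lambda$ and all $\Omega, \Omega', \Omega'' \in (S^2)^{|\Lambda|}$,

$$d_S(\Omega, \Omega') \le d_S(\Omega'', \Omega') + \sqrt S\, \|\Omega - \Omega''\|_1 + \sum_{r \in \Lambda} 1_{\{\Omega_r \ne \Omega''_r\}}. \tag{4.3}$$

Proof. Since all "norms" in the formula are sums over $r \in \Lambda$, it suffices to prove the above for $\Lambda$ having only one point. This is easy: For $\Omega = \Omega''$ the inequality is actually an equality. Otherwise, we apply the bounds $d_S(\Omega, \Omega') \le \sqrt S\, |\Omega - \Omega'|$ and $d_S(\Omega'', \Omega') + 1 \ge \sqrt S\, |\Omega'' - \Omega'|$ to convert the statement into the triangle inequality for the $\ell^1$-norm.

Proof of Proposition 4.1. Let us fix $\Omega$ and $\Omega'$ for the duration of this proof and abbreviate $M(\beta) = \langle \Omega | e^{-\beta \hat H_\Omega} | \Omega' \rangle$. We begin by expressing the derivative of $M(\beta)$ as an integral over coherent states. Indeed, $M'(\beta) = -\langle \Omega | \hat H_\Omega\, e^{-\beta \hat H_\Omega} | \Omega' \rangle$ and so inserting the upper-symbol representation (2.16) for $\hat H_\Omega = \sum_{\Gamma \subset \Lambda} (h_\Gamma - [h_\Gamma]_\Omega)$, we have

$$M'(\beta) = -\Bigl( \frac{2S+1}{4\pi} \Bigr)^{|\Lambda|} \int_{(S^2)^{|\Lambda|}} d\tilde\Omega\, \langle \Omega | \tilde\Omega \rangle \langle \tilde\Omega | e^{-\beta \hat H_\Omega} | \Omega' \rangle \sum_{\Gamma \subset \Lambda} \bigl( [h_\Gamma]_{\tilde\Omega} - [h_\Gamma]_\Omega \bigr). \tag{4.4}$$

By the fact that $[h_\Gamma]_{\tilde\Omega} - [h_\Gamma]_\Omega$ depends only on the portion of $\tilde\Omega$ on $\Gamma$, the integrals over the components of $\tilde\Omega$ outside $\Gamma$ can be carried out which yields

$$M'(\beta) = -\sum_{\Gamma \subset \Lambda} \Bigl( \frac{2S+1}{4\pi} \Bigr)^{|\Gamma|} \int_{(S^2)^{|\Gamma|}} d\Omega''_\Gamma\, \langle \Omega | \Omega'' \rangle \langle \Omega'' | e^{-\beta \hat H_\Omega} | \Omega' \rangle \bigl( [h_\Gamma]_{\Omega''} - [h_\Gamma]_\Omega \bigr). \tag{4.5}$$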

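Here, as for the rest of this proof, $\Omega''$ is set to $\Omega$ outside $\Gamma$ and to $\Omega''_\Gamma$ in $\Gamma$. Let $I_\Gamma$ denote the integral on the right-hand side of (4.5). Using (3.6), (4.1) and (3.9) we have

$$|I_\Gamma| \le c_1 \|h_\Gamma\|\, e^{\beta \epsilon |\Lambda|} \int_{(S^2)^{|\Gamma|}} d\Omega''_\Gamma\, e^{-\eta\, d_S(\Omega'', \Omega') - \eta S \|\Omega - \Omega''\|_2^2 - \beta([H]_{\Omega''} - [H]_\Omega)}\, \|\Omega'' - \Omega\|_1. \tag{4.6}$$

(Recall from the definition that $\hat H_\Omega = \hat H_{\Omega''} - [H]_\Omega + [H]_{\Omega''}$.) In order to bound the right-hand side, we need a few simple estimates. First, noting that

$$[H]_{\Omega''} - [H]_\Omega = \sum_{\Gamma'\colon \Gamma' \cap \Gamma \ne \emptyset} \bigl( [h_{\Gamma'}]_{\Omega''} - [h_{\Gamma'}]_\Omega \bigr), \tag{4.7}$$

(3.8) and (3.9) imply that, for some constant $c_4$ depending only on $c_0$, $c_1$ and $R$,

$$\bigl| [H]_{\Omega''} - [H]_\Omega \bigr| \le c_4 \|\Omega'' - \Omega\|_1 = c_4 \|\Omega''_\Gamma - \Omega_\Gamma\|_1. \tag{4.8}$$

Second, Lemma 4.2 tells us

$$-d_S(\Omega'', \Omega') \le -d_S(\Omega, \Omega') + \sqrt S\, \|\Omega - \Omega''\|_1 + |\Gamma|. \tag{4.9}$$

Finally, $\|\Omega'' - \Omega\|_1$ is bounded by $S^{-1/2}$ times the exponential of $\sqrt S\, \|\Omega'' - \Omega\|_1$. Since we are assuming that $\beta \le c_2 \sqrt S$, we conclude that

$$e^{-\eta\, d_S(\Omega'', \Omega') - \beta([H]_{\Omega''} - [H]_\Omega)}\, \|\Omega'' - \Omega\|_1 \le \frac{e^{\eta |\Gamma|}}{\sqrt S}\, e^{-\eta\, d_S(\Omega, \Omega') + c_5 \sqrt S\, \|\Omega - \Omega''\|_1} \tag{4.10}$$

for some constant $c_5$ independent of $S$ and $\Lambda$. Plugging this back in the integral (4.6), we get

$$|I_\Gamma| \le \frac{c_1 e^{\eta |\Gamma|}}{\sqrt S}\, \|h_\Gamma\|\, e^{\beta \epsilon |\Lambda| - \eta\, d_S(\Omega, \Omega')} \int_{(S^2)^{|\Gamma|}} d\Omega''_\Gamma\, e^{c_5 \sqrt S\, \|\Omega - \Omega''\|_1 - \eta S \|\Omega - \Omega''\|_2^2}. \tag{4.11}$$

To estimate the integral, we note that both norms in the exponent are sums over individual components. Hence, the integral is bounded by the product of $|\Gamma|$ integrals of the form

$$K = \int_{S^2} d\Omega''_r\, e^{c_5 \sqrt S\, |\Omega_r - \Omega''_r| - \eta S\, |\Omega_r - \Omega''_r|^2}, \tag{4.12}$$

where $\Omega_r$ and $\Omega''_r$ are vectors on $S^2$, representing the corresponding 3-dimensional components of $\Omega$ and $\Omega''$, and where $|\Omega_r - \Omega''_r|$ denotes Euclidean distance in $\mathbb R^3$. Parametrizing by $r = |\Omega_r - \Omega''_r|$ and integrating over the polar angle of $\Omega''_r$ relative to $\Omega_r$, we now get

$$K = \int_0^2 dr\, J(r)\, e^{-\frac12 \eta S r^2 + c_5 \sqrt S\, r}. \tag{4.13}$$

Here the Jacobian, $J(r)$, is the circumference of the circle $\{\Omega''_r \colon |\Omega''_r| = 1,\ |\Omega_r - \Omega''_r| = r\}$. But this circle has radius smaller than $r$ and so $J(r) \le 2\pi r$. Scaling $r$ by $S^{-1/2}$ yields $K \le c_6 / S$ for some constant $c_6 > 0$ independent of $S$. Plugging this back in (4.11), we then get

$$|I_\Gamma| \le \frac{c_1}{\sqrt S} \Bigl( \frac{c_6 e^\eta}{S} \Bigr)^{|\Gamma|} \|h_\Gamma\|\, e^{-\eta\, d_S(\Omega, \Omega') + \beta \epsilon |\Lambda|}. \tag{4.14}$$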
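The scaling claim $K \le c_6 / S$ after (4.13) can be illustrated numerically. The sketch below replaces $J(r)$ by its upper bound $2\pi r$ and evaluates the resulting integral by a midpoint rule; the values `eta = 0.25` and `c5 = 1.0` are placeholders chosen for illustration (the paper does not give a numerical value for $c_5$), and the function name is hypothetical.

```python
import math


def K_upper_bound(S, eta=0.25, c5=1.0, n=20000):
    """Midpoint rule for int_0^2 2*pi*r * exp(-eta*S*r^2/2 + c5*sqrt(S)*r) dr.

    This bounds the integral in (4.13) from above via J(r) <= 2*pi*r.
    The constants eta and c5 are illustrative placeholders.
    """
    h = 2.0 / n
    root_S = math.sqrt(S)
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += 2 * math.pi * r * math.exp(-0.5 * eta * S * r * r + c5 * root_S * r)
    return total * h


if __name__ == "__main__":
    for S in (10, 100, 1000, 10000):
        # S * K should stay bounded as S grows, i.e. K = O(1/S).
        print(S, S * K_upper_bound(S))
```

After the substitution $u = \sqrt S\, r$ the integral becomes $S^{-1}$ times an $S$-independent Gaussian-type integral over $[0, 2\sqrt S]$, which is why the product `S * K_upper_bound(S)` saturates at a constant once $S$ is moderately large.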


Inserting this into (4.5), using (3.7) to bound the terms exponential in $|\Gamma|$ by a constant depending only on $R$ (this is possible because there are $|\Gamma|$ factors of $S$ in the denominator of (4.14) that can be used to cancel the factors $(2S+1)$ in front of the integral in (4.5)) and applying (3.8), we get (4.2).

On the basis of Proposition 4.1, the proof of Theorem 3.1 is easily concluded:

Proof of Theorem 3.1. Let $c_2$ and $c_3$ be the constants from Proposition 4.1 and let $\epsilon = c_3 / \sqrt S$. We claim that (4.1) holds for all $\beta \le c_2 \sqrt S$. First, in light of (3.6) and the definition of $d_S(\Omega, \Omega')$, (4.1) holds for $\beta = 0$. This allows us to define $\beta_0$ to be the largest number such that (4.1) holds for all $\beta \in [0, \beta_0]$. Now, if $\beta \le \beta_0 \wedge c_2 \sqrt S$, then Proposition 4.1 and our choice of $\epsilon$ guarantee that the $\beta$-derivative of $|\langle \Omega | e^{-\beta \hat H_\Omega} | \Omega' \rangle|$ is no larger than that of the right-hand side of (4.1). We deduce (by continuity) that $\beta_0 = c_2 \sqrt S$. Using that $\hat H_\Omega = H - [H]_\Omega$, we now get (3.10).

Proof of Corollary 3.2. First we observe that the diagonal matrix element $\langle \Omega | e^{-\beta H} | \Omega \rangle$ is real and positive. The upper bound is then the $\Omega = \Omega'$ version of Theorem 3.1; the lower bound is a simple consequence of Jensen's (also known as the Peierls-Bogoliubov) inequality; see, e.g., [46, Theorem I.4.1].

4.2. Quasiclassical Peierls’ arguments. Our goal is to prove the bounds (3.28–3.29). To this end, let us introduce the quantum version of the quantity from (3.16): For any $B$-block event $\mathcal A$, let

\[ q_{L,\beta}(\mathcal A) = \Bigl\langle \prod_{t \in T_{L/B}} \hat Q_{\vartheta_t(\mathcal A)} \Bigr\rangle_{L,\beta}^{(B/L)^d}. \tag{4.15} \]

(Note that, by (3.19), this is of the form of the expectation on the right-hand side of (2.23).) First we will note the following simple consequence of Theorem 3.1:

Lemma 4.3. Let $\xi$ be as in (3.26) and let $c_2$ and $c_3$ be as in Theorem 3.1. If $\beta \le c_2\sqrt{S}$, then for any $B$-block event $\mathcal A$,

\[ q_{L,\beta}(\mathcal A) \le \bigl[\, p_{L,\beta}(\mathcal A)\, p_{L,\beta}(\sigma(\mathcal A)) \bigr]^{1/2} e^{\beta(\xi + c_3/\sqrt{S})}. \tag{4.16} \]

Proof. By (3.21) we have

\[ q_{L,\beta}(\mathcal A) = \bigl\langle \hat Q_{\widetilde{\mathcal A}} \bigr\rangle_{L,\beta}^{(B/L)^d} \quad\text{where}\quad \widetilde{\mathcal A} = \bigcap_{t \in T_{L/B}} \vartheta_t(\mathcal A). \tag{4.17} \]

Invoking the integral representation (3.17), the bounds from Corollary 3.2 and the definition of $\xi$ from (3.26),

\[ q_{L,\beta}(\mathcal A) \le \bigl[ P_{L,\beta}(\widetilde{\mathcal A}) \bigr]^{(B/L)^d} e^{\beta(\xi + c_3/\sqrt{S})}. \tag{4.18} \]

Now we may use (2.24) for the classical probability and we get (4.16).

Next we will invoke the strategy of [25] to write a bound on the correlator in (3.29) in terms of a sum over Peierls contours. Let $M_{L/B}$ denote the set of connected sets $Y \subset T_{L/B}$ with connected complement. By a contour we then mean the boundary of a set $Y \in M_{L/B}$, i.e., the set $\partial Y$ of nearest neighbor edges on $T_{L/B}$ with one endpoint in $Y$ and the other endpoint in $Y^c \subset T_{L/B}$. The desired bound is as follows:


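Lemma 4.4. Let $\mathcal G_1, \dots, \mathcal G_n$ be incompatible good events and let $\mathcal B$ be the bad event with the property that $\tau_{Bt}(\mathcal B) = \vartheta_t(\mathcal B)$ for all $t \in T_{L/B}$. Then for all distinct $t_1, t_2 \in T_{L/B}$ and all $i = 1, \dots, n$,

\[ \bigl\langle \hat Q_{\theta_{t_1}(\mathcal G_i)}\, \hat Q_{\theta_{t_2}(\mathcal G_i^c)} \bigr\rangle_{L,\beta} \le 2 \sum_{\substack{Y \colon Y \in M_{L/B} \\ t_1 \in Y,\ t_2 \notin Y}} \bigl[ 4\, q_{L,\beta}(\mathcal B) \bigr]^{\frac{1}{4d}|\partial Y|}. \tag{4.19} \]

Proof. We begin by noting that $t_1 \ne t_2$ and (3.20–3.21) give us

\[ \hat Q_{\theta_{t_1}(\mathcal G_i)}\, \hat Q_{\theta_{t_2}(\mathcal G_i^c)} = \Bigl( \frac{2S+1}{4\pi} \Bigr)^{|T_L|} \int_{\theta_{t_1}(\mathcal G_i) \cap \theta_{t_2}(\mathcal G_i^c)} d\Omega\, |\Omega\rangle\langle\Omega|. \tag{4.20} \]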

Now pick $\Omega \in \theta_{t_1}(\mathcal G_i) \cap \theta_{t_2}(\mathcal G_i^c)$ and let $Y' \subset T_{L/B}$ be the largest connected component of $B$-blocks (i.e., translates of $\Lambda_B$ by $Bt$, with $t \in T_{L/B}$) such that $t_1 \in Y'$ and that $\theta_t(\mathcal G_i)$ occurs for every $t \in Y'$. This set may not have connected complement, so we define $Y \in M_{L/B}$ to be the set obtained by filling the “holes” of $Y'$, except that which contains $t_2$. Note that all translates of $\Lambda_B$ corresponding to the boundary sites of $Y$ are of type $\mathcal G_i$.

In order to extract the weight of the contour, we will have to introduce some more notation. Decomposing the set of boundary edges $\partial Y$ into $d$ sets $\partial_1 Y, \dots, \partial_d Y$ according to the coordinate directions into which the edges are pointing, let $j$ be a direction where $|\partial_j Y|$ is maximal. Furthermore, let $Y_j^{\rm ext}$ be the set of sites in $Y^c$ which are on the “left” side of an edge in $\partial_j Y$. It is easy to see that this singles out exactly half of the sites in $Y^c$ that are at the endpoint of an edge in $\partial_j Y$.

Next we intend to show that the above setting implies the existence of at least $|Y_j^{\rm ext}|/2$ bad blocks whose position is more or less determined by $Y$. Recall that $\hat e_j$ denotes the unit vector in the $j$th coordinate direction. Since the good events satisfy the incompatibility condition (3.25), at least one of the following two possibilities must occur: either $\Omega \in \tau_{Bt}(\mathcal B)$ for at least half of $t \in Y_j^{\rm ext}$, or $\Omega \in \tau_{Bt + \ell\hat e_j}(\mathcal B)$ for at least half of $t \in Y_j^{\rm ext}$. (Here $\ell$ is the constant from the definition of incompatibility.) Indeed, if the former does not occur then more than half of $t \in Y_j^{\rm ext}$ mark a good block, but of a different type of goodness than $\mathcal G_i$. Since this block neighbors on a $\mathcal G_i$-block, incompatibility of good block events implies that a bad block must occur $\ell$ lattice units along the line between these blocks.

Let us temporarily abbreviate $K_j = |Y_j^{\rm ext}|$ and let $C_j(Y)$ be the set of collections of $K_j/2$ sites representing the positions of the aforementioned $K_j/2$ bad blocks. In light of $\tau_{Bt}(\mathcal B) = \vartheta_t(\mathcal B)$, the above argument implies

\[ \theta_{t_1}(\mathcal G_i) \cap \theta_{t_2}(\mathcal G_i^c) \subset \bigcup_{\substack{Y \colon Y \in M_{L/B} \\ t_1 \in Y,\ t_2 \notin Y}}\ \bigcup_{(t_i) \in C_j(Y)} \Bigl( \bigcap_{i=1}^{K_j/2} \vartheta_{t_i}(\mathcal B)\ \cup\ \bigcap_{i=1}^{K_j/2} \tau_{\ell\hat e_j} \vartheta_{t_i}(\mathcal B) \Bigr). \tag{4.21} \]

Therefore, using the fact that $\mathcal A \mapsto \hat Q_{\mathcal A}$ is a POV measure (cf. Remark 3.5), this implies

\[ \hat Q_{\theta_{t_1}(\mathcal G_i)}\, \hat Q_{\theta_{t_2}(\mathcal G_i^c)} \le \sum_{\substack{Y \colon Y \in M_{L/B} \\ t_1 \in Y,\ t_2 \notin Y}}\ \sum_{(t_i) \in C_j(Y)} \Bigl( \prod_{i=1}^{K_j/2} \hat Q_{\vartheta_{t_i}(\mathcal B)} + \prod_{i=1}^{K_j/2} \hat Q_{\tau_{\ell\hat e_j} \vartheta_{t_i}(\mathcal B)} \Bigr). \tag{4.22} \]


Here the two terms account for the two choices of where the bad events can occur and $j$ is the direction with maximal projection of the boundary of $Y$ as defined above. Since (2.23), (3.19) and $\theta(\mathcal B) = \mathcal B$ allow us to conclude that

\[ \Bigl\langle \prod_{i=1}^{K_j/2} \hat Q_{\vartheta_{t_i}(\mathcal B)} \Bigr\rangle_{L,\beta} \le q_{L,\beta}(\mathcal B)^{K_j/2}, \tag{4.23} \]

and since the translation invariance of the torus state $\langle - \rangle_{L,\beta}$ implies a similar bound is also valid for the second product, the expectation of each term in the sum in (4.22) is bounded by $2\, q_{L,\beta}(\mathcal B)^{K_j/2}$. The sum over $(t_i) \in C_j(Y)$ can then be estimated at $2^{K_j}$ which yields

\[ \bigl\langle \hat Q_{\theta_{t_1}(\mathcal G_i)}\, \hat Q_{\theta_{t_2}(\mathcal G_i^c)} \bigr\rangle_{L,\beta} \le 2 \sum_{\substack{Y \colon Y \in M_{L/B} \\ t_1 \in Y,\ t_2 \notin Y}} \bigl[ 4\, q_{L,\beta}(\mathcal B) \bigr]^{|Y_j^{\rm ext}|/2}. \tag{4.24} \]

From here the claim follows by noting that our choice of $j$ implies $|Y_j^{\rm ext}| \ge \frac{1}{2d}|\partial Y|$ (we assume that $4 q_{L,\beta}(\mathcal B) \le 1$ without loss of generality).

Proof of Theorem 3.7. By Lemma 4.3, the assumptions on $\mathcal B$, and (3.27) we have that $q_{L,\beta}(\mathcal B) < \delta$. Invoking a standard Peierls argument in toroidal geometry (see, e.g., the proof of [6, Lemma 3.2]), the right-hand side of (4.19) is bounded by a quantity $\eta(\delta)$ such that $\eta(\delta) \downarrow 0$ as $\delta \downarrow 0$. Choosing $\delta$ sufficiently small, we will thus have $\eta(\delta) \le \epsilon$, proving (3.29). The bound (3.28) is a consequence of the chessboard estimates which yield $\langle \hat Q_{\mathcal B} \rangle_{L,\beta} \le q_{L,\beta}(\mathcal B) < \delta$.

4.3. Exhibiting phase coexistence. In order to complete our general results, we still need to prove Propositions 3.9 and 3.10 whose main point is to guarantee existence of multiple translation-invariant KMS states. (Recall that, throughout this section, we work only with translation-invariant interactions.) Let us refer to

\[ T_L^+ = \bigl\{ x \in T_L \colon -\lfloor L/4 - 1/2 \rfloor \le x_1 \le \lceil L/4 - 1/2 \rceil \bigr\} \tag{4.25} \]

as the “front side” of the torus, and to $T_L^-$ as the “back side.” Let $\mathfrak A_L^+$ be the $C^*$-algebra of all observables localized in $T_L^+$ (i.e., an operator in $\mathfrak A_L^+$ acts as the identity on $T_L^-$). The construction of infinite-volume KMS states will be based on the following standard lemma:

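Lemma 4.5. Let $T_{L/B}$ be the factor torus and let $\Delta_M \subset T_{L/B}$ be a block of $M \times \dots \times M$ sites at the “back side” of $T_{L/B}$ (i.e., we have ${\rm dist}(0, \Delta_M) \ge \frac{L}{2B} - M$). Given a $B$-block event $\mathcal C$, let

\[ \hat\rho_{L,M}(\mathcal C) = \frac{1}{|\Delta_M|} \sum_{t \in \Delta_M} \hat Q_{\theta_t(\mathcal C)}. \tag{4.26} \]

Suppose that $\langle \hat Q_{\mathcal C} \rangle_{L,\beta} \ge c$ for all $L \gg 1$ and some constant $c > 0$, and define the “conditional” state $\langle - \rangle_{L,M;\beta}$ on $\mathfrak A_L^+$ by

\[ \langle A \rangle_{L,M;\beta} = \frac{\langle \hat\rho_{L,M}(\mathcal C)\, A \rangle_{L,\beta}}{\langle \hat\rho_{L,M}(\mathcal C) \rangle_{L,\beta}}. \tag{4.27} \]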


If $\langle - \rangle_\beta$ is a (subsequential) weak limit of $\langle - \rangle_{L,M;\beta}$ as $L \to \infty$ (along multiples of $B$) followed by $M \to \infty$, then $\langle - \rangle_\beta$ is a KMS state at inverse temperature $\beta$ which is invariant under translations by $B$.

Proof. Translation invariance is a consequence of “conditioning” on the spatially-averaged quantity (4.26). Thus, all we need to do is to prove that the limit state satisfies the KMS condition (3.33). Let $t \mapsto \alpha_t^{(L)}$ be the unitary evolution on $T_L$. If $B$ is a local observable that depends only on the “front” side of the torus, the fact that the interaction is finite range and that the series (3.32) converges in norm, uniformly in $L$, implies

\[ \bigl[ \alpha_t^{(L)}(B),\ \hat\rho_{L,M}(\mathcal C) \bigr] \xrightarrow[L \to \infty]{} 0 \tag{4.28} \]

in norm topology, uniformly in $t$ on compact subsets of $\mathbb C$. (Note that, for any $B$ localized inside a fixed finite subset of $\mathbb Z^d$, for large enough $L$, it will always be in the “front” side $T_L^+$, under the projection $\mathbb Z^d \to T_L = \mathbb Z^d / L\mathbb Z^d$.) This means that for any bounded local operators $A$ and $B$ on the “front” side of the torus,

\[ \bigl\langle \hat\rho_{L,M}(\mathcal C)\, A B \bigr\rangle_{L,\beta} = \bigl\langle \hat\rho_{L,M}(\mathcal C)\, \alpha_{-i\beta}^{(L)}(B)\, A \bigr\rangle_{L,\beta} + o(1), \qquad L \to \infty. \tag{4.29} \]

(Again, it is no restriction to say that $A$ and $B$ are on the “front” side, by simply letting $L$ be large enough.) Since $\alpha_{-i\beta}^{(L)}(B) \to \alpha_{-i\beta}(B)$ in norm, the state $A \mapsto \langle A \rangle_{L,M;\beta}$ converges, as $L \to \infty$ and $M \to \infty$, to a KMS state at inverse temperature $\beta$.

Proof of Proposition 3.9. By $\hat Q_{\mathcal B} + \hat Q_{\mathcal G_1} + \dots + \hat Q_{\mathcal G_n} = 1$, the symmetry assumption and (3.28) we know that

\[ \langle \hat Q_{\mathcal G_k} \rangle_{L,\beta} \ge \frac{1 - \epsilon}{n}. \tag{4.30} \]

So, if $\hat\rho_{L,M}(\mathcal G_k)$ is as in (4.26), the expectation $\langle \hat\rho_{L,M}(\mathcal G_k) \rangle_{L,\beta}$ is uniformly positive. This means that, for each $k = 1, \dots, n$, we can define the state $\langle - \rangle^{(k)}_{L,M;\beta}$ by (4.27) with the choice $\mathcal C = \mathcal G_k$. Using (3.29) we conclude

\[ \bigl\langle \hat Q_{\theta_t(\mathcal G_k)} \bigr\rangle^{(k)}_{L,M;\beta} \ge 1 - \frac{n\epsilon}{1 - \epsilon}, \qquad k = 1, \dots, n, \tag{4.31} \]

for any $t$ on the “front” side of $T_{L/B}$ (provided that $M \ll L/B$). For $(n+1)\epsilon < 1/2$, the right-hand side exceeds $1/2$ and so any thermodynamic limit of $\langle - \rangle^{(k)}_{L,M;\beta}$ as $L \to \infty$ and $M \to \infty$ is “dominated” by $\mathcal G_k$-blocks. Since, by Lemma 4.5, any such limit is a KMS state, we have $n$ distinct states satisfying, as is easy to check, (3.34).

Proof of Proposition 3.10. Consider the states $\langle - \rangle^{(1)}_{L,M;\beta}$ and $\langle - \rangle^{(2)}_{L,M;\beta}$ defined by (4.27) with $\mathcal C = \mathcal G_1$ and $\mathcal C = \mathcal G_2$, respectively. From assumption (1) we know that $a_k := \langle \hat\rho_{L,M}(\mathcal G_k) \rangle_{L,\beta} > 0$ for at least one $k = 1, 2$ and so, for each $\beta \in [\beta_1, \beta_2]$, at least one of these states is well defined. We claim that we cannot have $\langle \hat Q_{\mathcal G_k} \rangle^{(k)}_{L,M;\beta} < 1 - 4\epsilon$ for both $k = 1, 2$. Indeed, if that were the case then

\[ \hat\rho_{L,M}(\mathcal G_1) + \hat\rho_{L,M}(\mathcal G_2) + \hat\rho_{L,M}(\mathcal B) = 1, \tag{4.32} \]


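and the bounds (3.28–3.29) would yield

\[
\begin{aligned}
a_1 + a_2 &= \bigl\langle \hat Q_{\mathcal G_1} + \hat Q_{\mathcal G_2} \bigr\rangle_{L,\beta} \\
&= \langle \hat Q_{\mathcal G_1} \rangle^{(1)}_{L,M;\beta}\, \langle \hat Q_{\mathcal G_1} \rangle_{L,\beta} + \langle \hat Q_{\mathcal G_2} \rangle^{(2)}_{L,M;\beta}\, \langle \hat Q_{\mathcal G_2} \rangle_{L,\beta} \\
&\quad + \bigl\langle \hat\rho_{L,M}(\mathcal G_1)\, \hat Q_{\mathcal G_2} \bigr\rangle_{L,\beta} + \bigl\langle \hat\rho_{L,M}(\mathcal G_2)\, \hat Q_{\mathcal G_1} \bigr\rangle_{L,\beta} + \bigl\langle \hat\rho_{L,M}(\mathcal B)\, [1 - \hat Q_{\mathcal B}] \bigr\rangle_{L,\beta} \\
&< (1 - 4\epsilon)(a_1 + a_2) + 3\epsilon,
\end{aligned}
\tag{4.33}
\]

i.e., $4\epsilon(a_1 + a_2) < 3\epsilon$. Since $\epsilon \le 1/4$ this implies $a_1 + a_2 < 3/4 \le 1 - \epsilon$, in contradiction with assumption (1).

Hence, we conclude that the larger of $\langle \hat Q_{\mathcal G_k} \rangle^{(k)}_{L,M;\beta}$, $k = 1, 2$ (among those states that exist) must be at least $1 - 4\epsilon$. The same will be true about any thermodynamic limit of these states. Let $\Xi_k \subset [\beta_1, \beta_2]$, $k = 1, 2$, be the set of $\beta \in [\beta_1, \beta_2]$ for which there exists an infinite-volume, translation-invariant KMS state $\langle - \rangle_\beta$ such that $\langle \hat Q_{\mathcal G_k} \rangle_\beta \ge 1 - 4\epsilon$. Then $\Xi_1 \cup \Xi_2 = [\beta_1, \beta_2]$. Now, any (weak) limit of KMS states for inverse temperatures $\beta_n \to \beta$ is a KMS state at $\beta$, and so both $\Xi_1$ and $\Xi_2$ are closed. Since $[\beta_1, \beta_2]$ is closed and connected, to demonstrate a point in $\Xi_1 \cap \Xi_2$ it suffices to show that both $\Xi_1$ and $\Xi_2$ are non-empty. For that we will invoke condition (2) of the proposition: From $\langle \hat Q_{\mathcal G_1} \rangle_{L,\beta_1} \ge 1 - 2\epsilon$ we deduce

\[ \langle \hat Q_{\mathcal G_1} \rangle^{(1)}_{L,M;\beta_1} = 1 - \bigl\langle \hat Q_{\mathcal G_2} + \hat Q_{\mathcal B} \bigr\rangle^{(1)}_{L,M;\beta_1} \ge 1 - \frac{2\epsilon}{1 - 2\epsilon} \ge 1 - 4\epsilon, \tag{4.34} \]

and similarly for $\langle \hat Q_{\mathcal G_2} \rangle^{(2)}_{L,M;\beta_2}$. Thus $\beta_1 \in \Xi_1$ and $\beta_2 \in \Xi_2$, i.e., both sets are non-empty and so $\Xi_1 \cap \Xi_2 \ne \emptyset$ as claimed.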
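The two elementary inequalities used in the proofs above can be checked in exact rational arithmetic. The sketch assumes the readings $1 - n\epsilon/(1-\epsilon)$ for the right-hand side of (4.31) and $1 - 2\epsilon/(1-2\epsilon)$ for the middle expression of (4.34); the function names are illustrative only.

```python
from fractions import Fraction

def conditional_lower_bound(n, eps):
    """Right-hand side of (4.31) as read here: 1 - n*eps/(1 - eps)."""
    return 1 - n * eps / (1 - eps)

def endpoint_lower_bound(eps):
    """Middle expression of (4.34) as read here: 1 - 2*eps/(1 - 2*eps)."""
    return 1 - 2 * eps / (1 - 2 * eps)

def exceeds_half(n, eps):
    """True when the bound (4.31) is strictly larger than 1/2."""
    return conditional_lower_bound(n, eps) > Fraction(1, 2)

if __name__ == "__main__":
    # (n + 1) * eps < 1/2 is sufficient for the bound in (4.31) to exceed 1/2.
    for n in range(1, 6):
        eps = Fraction(1, 2 * (n + 1)) - Fraction(1, 1000)
        print(n, eps, conditional_lower_bound(n, eps), exceeds_half(n, eps))
    # eps <= 1/4 is exactly what makes the last step of (4.34) valid.
    for eps in (Fraction(1, 10), Fraction(1, 4), Fraction(3, 10)):
        print(eps, endpoint_lower_bound(eps) >= 1 - 4 * eps)
```

Using `Fraction` avoids floating-point rounding at the borderline value $\epsilon = 1/4$, where the last inequality in (4.34) holds with equality.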

5. Applications

Here we will discuss, with varying level of detail, the five quantum models described in the introduction. We begin by listing the various conditions of our main theorems which can be verified without much regard for the particulars of each model. Then, in Sect. 5.2, we proceed to discuss model (1) which serves as a prototype system for the application of our technique. Sections 5.3–5.5 are devoted to the details specific for models (2–5).

5.1. General considerations. Our strategy is as follows: For each model we will need to apply one of the two propositions from Sect. 3.3, depending on whether we are dealing with a “symmetry-breaking” transition (Proposition 3.9) or a temperature-driven energy-entropy transition (Proposition 3.10). The main input we need for this are the inequalities (3.28–3.29). These will, in turn, be supplied by Theorem 3.7, provided we can check the condition (3.27). Invoking Theorem 3.1, which requires that our model satisfies the mild requirements (3.7–3.9), condition (3.27) boils down to showing that $p_{L,\beta}(\mathcal B)$ is small for the requisite bad event. It is, for the most part, only the latter that needs to be verified on a model-specific basis; the rest can be done in some generality.


We begin by checking the most stringent of our conditions: reflection positivity. Here, as alluded to in Remark 2.5, we are facing the problem that reflection positivity may be available only in a particular representation of the model, which is often distinct from that in which the model is a priori defined. The “correct” representation is achieved by a unitary operation that, in all cases at hand, is a “product rotation” of all spins.

There are two rotations we will need to consider; we will express these by means of unitary operators $U_A$ and $U_B$. Consider the Hilbert space $\mathcal H_{T_L} = \bigotimes_{r \in T_L} [\mathbb C^{2S+1}]_r$ and let $(S_r^x, S_r^y, S_r^z)$ have the usual form, cf. (2.1), on $\mathcal H_{T_L}$. In this representation, the action of $U_A$ on a state $|\psi\rangle \in \mathcal H_{T_L}$ is defined by

\[ U_A |\psi\rangle = \prod_{r \in T_L} e^{i\frac{\pi}{2} S_r^y}\, e^{i\frac{\pi}{2} S_r^x}\, |\psi\rangle. \tag{5.1} \]

The effect of conjugating by this transformation is the cyclic permutation of the spin components $S_r^y \to S_r^x \to S_r^z \to S_r^y$. The second unitary, $U_B$, is defined as follows:

\[ U_B |\psi\rangle = \prod_{\substack{r \in T_L \\ \text{odd-parity}}} e^{i\pi S_r^y}\, |\psi\rangle. \tag{5.2} \]
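The action of these product rotations on a single spin can be checked numerically. The sketch below builds the standard spin-$S$ matrices, forms the single-site factors of (5.1) and (5.2), and conjugates the spin operators by them. It assumes the conventions $[S^x, S^y] = iS^z$ and conjugation $A \mapsto U A U^{-1}$; all function names are hypothetical.

```python
import numpy as np

def spin_matrices(S):
    """Return (Sx, Sy, Sz) for spin S in the basis m = S, S-1, ..., -S."""
    dim = int(round(2 * S)) + 1
    m = S - np.arange(dim)
    Sz = np.diag(m).astype(complex)
    Sp = np.zeros((dim, dim), dtype=complex)
    for i in range(1, dim):
        # Raising operator: S^+ |m> = sqrt(S(S+1) - m(m+1)) |m+1>
        Sp[i - 1, i] = np.sqrt(S * (S + 1) - m[i] * (m[i] + 1))
    Sm = Sp.conj().T
    return (Sp + Sm) / 2, (Sp - Sm) / 2j, Sz

def rotation(theta, A):
    """exp(i*theta*A) for a Hermitian matrix A, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(1j * theta * w)) @ V.conj().T

def conjugate(U, A):
    """U A U^{-1} for unitary U."""
    return U @ A @ U.conj().T

if __name__ == "__main__":
    Sx, Sy, Sz = spin_matrices(1.0)
    UA = rotation(np.pi / 2, Sy) @ rotation(np.pi / 2, Sx)  # one factor of (5.1)
    UB = rotation(np.pi, Sy)                                # one factor of (5.2)
    print(np.allclose(conjugate(UA, Sy), Sx))   # S^y -> S^x
    print(np.allclose(conjugate(UB, Sx), -Sx))  # S^x picks up a minus sign
```

Because both unitaries are products over sites, the single-site check carries over to the full torus: each factor acts only on its own spin.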

The effect of $U_B$ on spin operators is as follows: For even-parity $r$, the spin operators are as before. For odd-parity $r$, the component $S_r^y$ remains the same, while both $S_r^x$ and $S_r^z$ pick up a minus sign. Here are the precise conditions under which our models are reflection positive (RP):

Lemma 5.1. Let $U_A$ and $U_B$ be the unitary transformations defined above. Then:
(a) $U_A H U_A^{-1}$ is RP for models (4–5), and for model (2) with $P(x) = P_1(x^2) + xP_2(x^2)$.
(b) $U_B H U_B^{-1}$ is RP for models (1,3).
(c) $U_B U_A H U_A^{-1} U_B^{-1}$ is RP for model (2) with $P(x) = P_1(x^2) - xP_2(x^2)$.

Proof. (a) Under the unitary $U_A$ map, the Hamiltonians of models (4–5) are only using the $x$ and $z$-components of the spins, which are both real valued. The resulting interaction couples nearest-neighbor spins ferromagnetically, and thus conforms to (2.20).

(b) For two-body, nearest-neighbor interactions, $U_B$ has the effect

\[ S_r^\alpha S_{r'}^\alpha \to -S_r^\alpha S_{r'}^\alpha, \qquad \alpha = x, z, \tag{5.3} \]

while the $S_r^y S_{r'}^y$ terms remain unchanged. Writing

\[ S_r^y S_{r'}^y = -(iS_r^y)(iS_{r'}^y) \tag{5.4} \]

we can thus change the sign of all quadratic terms in the interaction and, at the same time, express all operators by means of real-valued matrices. Under the conditions given in Sect. 1, the Hamiltonians in (1.1) and (1.3) are then of the desired form (2.20).

(c) Finally, for model (2), we first apply the argument in (a). Then the effect of $U_B$ is that the minus sign in $P(x) = P_1(x^2) - xP_2(x^2)$ becomes a plus sign.

Our next items of general interest are the “easy” conditions of Theorem 3.1 and Theorem 3.7. These turn out to be quite simple to check:


Lemma 5.2. The transformed versions (as defined in Lemma 5.1) of the five models from Sect. 1 satisfy the conditions (3.7–3.9) with some finite $R$ and some $c_1$ independent of $S$. Moreover, for each of the models (1–5) there exists a constant $C$ such that (3.26) holds with $\xi = C/S$ for all $S$.

Proof. All interactions involve at most two spins so $R = 2$ suffices to have (3.7). Writing the interaction in the form (3.1), the normalization by powers of $S$ makes the corresponding norms $\|h_\Lambda\|$ bounded by a quantity independent of $S$. This means that (3.8) holds in any finite set (including the torus, with proper periodic extension of the $h_\Lambda$'s). As to the Lipschitz bound (3.9), this is the subject of Theorem 2 and Proposition 3 of [17]. Since $S^{-1}[S_r^\alpha]_\Omega = \Omega_r^\alpha + O(1/S)$, and similarly for the lower symbol, the same argument proves that $\xi = O(1/S)$.

To summarize our general observations, in order to apply Propositions 3.9–3.10, we only need to check the following three conditions:
(1) The requisite bad event is such that $\vartheta_t(\mathcal B) = \mathcal B$ for all $t \in T_{L/B}$.
(2) The occurrence of different types of goodness at neighboring $B$-blocks implies that a block placed in between the two (so that it contains the sites on the boundaries between them) is bad, cf. condition (2) of Definition 3.6.
(3) The quantity $p_{L,\beta}(\mathcal B)$ is sufficiently small.
In all examples considered in this paper, conditions (1–2) will be checked directly but condition (3) will require estimates specific for the model at hand. (Note that, since we are forced to work in the representation that makes the interaction reflection positive, the conditions (1–3) must be verified in this representation.)

Remark 5.3. It is noted that all of the relevant classical models, regardless of the signs of the interactions, are RP with respect to reflections in planes of sites. We will often use this fact to “preprocess” the event underlying $p_{L,\beta}(\mathcal B)$ by invoking chessboard estimates with respect to these reflections. We will also repeatedly use the subadditivity property of $\mathcal A \mapsto p_{L,\beta}(\mathcal A)$ as stated in [6, Theorem 6.3]. Both of these facts will be used without (much) apology.

5.2. Anisotropic Heisenberg antiferromagnet. Consider the reflection-positive version of the Hamiltonian (1.1) which (in the standard representation of the spin operators) on the torus $T_L$ takes the form

\[ H_L = -\sum_{\langle r, r' \rangle} S^{-2} \bigl( J_1 S_r^x S_{r'}^x - J_2 S_r^y S_{r'}^y + S_r^z S_{r'}^z \bigr). \tag{5.5} \]

(The classical version of $H_L$ is obtained by replacing each $S_r^\alpha$ by the corresponding component of $S\Omega_r$.) The good block events will be defined on a $2 \times \dots \times 2$ block $\Lambda_B$ (i.e., $B = 2$) and, roughly speaking, they will represent the two ferromagnetic states in the $z$-direction one can put on $\Lambda_B$. Explicitly, let $\mathcal G_+$ be the event that $\Omega_r = (\theta_r, \phi_r)$ satisfies $|\theta_r| < \kappa$ for all $r \in \Lambda_B$ and let $\mathcal G_-$ be the event that $|\theta_r - \pi| < \kappa$ for all $r \in \Lambda_B$.

Theorem 5.4 (Heisenberg antiferromagnet). Let $d \ge 2$ and let $0 \le J_1, J_2 < 1$ be fixed. For each $\epsilon > 0$ and each $\kappa > 0$, there exist constants $c$ and $\beta_0$ and, for all $\beta$ and $S$ with $\beta_0 \le \beta \le c\sqrt{S}$, there exist two distinct, translation-invariant KMS states $\langle - \rangle_\beta^+$ and $\langle - \rangle_\beta^-$ with the property

\[ \bigl\langle \hat Q_{\mathcal G_\pm} \bigr\rangle_\beta^\pm \ge 1 - \epsilon. \tag{5.6} \]


In particular, for all such $\beta$ we have

\[ \langle S_0^z \rangle_\beta^+ - \langle S_0^z \rangle_\beta^- > 0. \tag{5.7} \]

Proof. Let $\mathcal B = (\mathcal G_+ \cup \mathcal G_-)^c$ be the bad event. It is easy to check that $\vartheta_t$ acts on $\mathcal B$ only via translations. Moreover, if $\mathcal G_+$ and $\mathcal G_-$ occur at neighboring (but disjoint) translates of $\Lambda_B$, then the block between these is necessarily bad. In light of our general observations from Sect. 5.1, we thus only need to produce good bounds on $p_{L,\beta}(\mathcal B)$, the classical probability of bad behavior. Since these arguments are standard and appear, for all intents and purposes, in the union of Refs. [11,23,24,44], we will be succinct (and not particularly efficient).

Let $\Delta = \min\{(1 - J_1), (1 - J_2), 2/a_d\}$, where $a_d = d2^{d-1}$, and fix $\eta > 0$ with $\eta \ll 1$ such that

\[ 1 - \cos\eta - \Delta\sin^2\kappa < 0. \tag{5.8} \]

We will start with a lower estimate on the full partition function. For that we will restrict attention to configurations where $|\theta_r| \le \eta/2$ for all $r \in T_L$. The interaction energy of a pair of spins is clearly maximized when both the $x$ and $y$-terms are negative. This allows us to bound the energy by that in the isotropic case $J_1 = J_2 = 1$, i.e., the cosine of the angle between the spins. Hence, the energy between each neighboring pair is at most $(-\cos\eta)$. We arrive at

\[ Z_L(\beta) \ge \bigl[ V(\eta)\, e^{d\beta\cos\eta} \bigr]^{L^d}, \tag{5.9} \]

where the phase volume $V(\eta) = 2\pi[1 - \cos(\eta/2)]$ may be small but is anyway independent of $\beta$.

To estimate the constrained partition function in the numerator of $p_{L,\beta}(\mathcal B)$, we will classify the bad blocks into two distinct categories: First there will be blocks where not all spins are within $\kappa$ of the pole and, second, there will be those bad blocks which, notwithstanding their Ising nature, will have defects in their ferromagnetic pattern. We denote the respective events by $\mathcal B_1$ and $\mathcal B_2$. To bound $p_{L,\beta}(\mathcal B_1)$, since we may decorate the torus from a single site, we may as well run a single site argument $2^d$ times. We are led to consider the constrained partition function where every site is outside its respective polar cap. It is not hard to see that the maximal possible interaction is $1 - \Delta\sin^2\kappa$; we may estimate the measure of such configurations as full. Thus,

\[ p_{L,\beta}(\mathcal B_1) \le 2^d\, \frac{4\pi}{V(\eta)}\, e^{\beta d(1 - \cos\eta - \Delta\sin^2\kappa)}. \tag{5.10} \]

Note that, by (5.8), this is small when $\beta \gg 1$. The less interesting Ising violations are estimated as follows: The presence of such violations implies the existence of a bond with nearly antialigned spins. We estimate the interaction of this bond at $\cos(2\kappa)$. Now there are $a_d$ bonds on any cube so when we disseminate, using reflections through sites, we end up with at least one out of every $a_d$ bonds with this energy. The rest we may as well assume are fully “aligned” (and have energy at least negative one) and we might as well throw in full measure, for good measure. We thus arrive at

\[ p_{L,\beta}(\mathcal B_2) \le a_d\, \frac{4\pi}{V(\eta)}\, \exp\Bigl\{ \beta d \Bigl[ \frac{1}{a_d}\cos(2\kappa) + 1 - \frac{1}{a_d} - \cos\eta \Bigr] \Bigr\} \tag{5.11} \]


as our estimate for each such contribution to the Ising badness. Here the prefactor $a_d$ accounts for the choice of the “bad” bond. Since $1/a_d > \Delta/2$, the constant multiplying $\beta d$ in the exponent is less than the left-hand side of (5.8); hence $p_{L,\beta}(\mathcal B_2) \ll 1$ once $\beta \gg 1$ as well. It follows that, given $J_1, J_2 < 1$, we can find $\beta_0$ sufficiently large so that $p_{L,\beta}(\mathcal B) \le p_{L,\beta}(\mathcal B_1) + p_{L,\beta}(\mathcal B_2) \ll 1$ once $\beta \ge \beta_0$. The statement of the theorem is now implied by Proposition 3.9 and the $\pm$-symmetry of the model.
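The sign conditions behind (5.8), (5.10) and (5.11) can be tabulated for concrete parameters. The following is a minimal sketch, assuming the symbol lost from the extracted text in (5.8) is $\Delta = \min\{1-J_1, 1-J_2, 2/a_d\}$ and that the exponents read as in (5.10) and (5.11); the function names and the sample parameter values are illustrative only.

```python
import math

def peierls_exponents(d, J1, J2, kappa, eta):
    """Return (a_d, Delta, e1, e2) where e1 and e2 multiply beta*d in (5.10), (5.11)."""
    a_d = d * 2 ** (d - 1)
    delta = min(1 - J1, 1 - J2, 2 / a_d)
    e1 = 1 - math.cos(eta) - delta * math.sin(kappa) ** 2          # also the LHS of (5.8)
    e2 = math.cos(2 * kappa) / a_d + 1 - 1 / a_d - math.cos(eta)   # exponent in (5.11)
    return a_d, delta, e1, e2

def classical_bad_bounds(beta, d, J1, J2, kappa, eta):
    """Right-hand sides of (5.10) and (5.11) for the given parameters."""
    a_d, _, e1, e2 = peierls_exponents(d, J1, J2, kappa, eta)
    V = 2 * math.pi * (1 - math.cos(eta / 2))   # phase volume V(eta)
    pref = 4 * math.pi / V
    return 2 ** d * pref * math.exp(beta * d * e1), a_d * pref * math.exp(beta * d * e2)

if __name__ == "__main__":
    params = dict(d=2, J1=0.5, J2=0.5, kappa=0.3, eta=0.2)
    print(peierls_exponents(**params))
    for beta in (10.0, 100.0, 1000.0):
        print(beta, classical_bad_bounds(beta, **params))
```

Since $\cos(2\kappa) = 1 - 2\sin^2\kappa$, the exponent in (5.11) equals $1 - \cos\eta - (2/a_d)\sin^2\kappa$, so it never exceeds the one in (5.10) whenever $\Delta \le 2/a_d$. The prefactor $4\pi/V(\eta)$ is large for small $\eta$, which is why $\beta_0$ has to be taken large.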

5.3. Large-entropy models. Here we will state and prove order-disorder transitions in models (2–3). As in the previous subsection, most of our analysis is classical. While we note that much of the material of this section has appeared in some form before, e.g., in [11,16,21,22,33,44], here we must go a slightly harder route dictated by the quantum versions of reflection positivity.

We start with the observation that model (2) with $P(x) = P_1(x^2) - xP_2(x^2)$ is unitarily equivalent, via a rotation of all spins about the $z$-axis, to the same model with $P(x) = P_1(x^2) + xP_2(x^2)$. Hence, it suffices to consider only the case of the plus sign. We thus focus our attention on models with classical Hamiltonians of the form

\[ H^\infty(\Omega) = -\sum_{\langle r, r' \rangle} \sum_{k=1}^{p} c_k\, (\Omega_r \mathbin{\#} \Omega_{r'})^k, \qquad c_k \ge 0, \tag{5.12} \]

where $(\Omega_1 \mathbin{\#} \Omega_2)$ denotes the variant of the usual dot product $\Omega_1^{(x)}\Omega_2^{(x)} - \Omega_1^{(y)}\Omega_2^{(y)} + \Omega_1^{(z)}\Omega_2^{(z)}$ for model (3), and the “dot product among the first two components” for model (2).

We now state our assumptions which ensure that models (2) and (3) have the large entropy property. Let us regard the coefficients in (5.12) as an infinite (but summable) sequence, generally thought of as terminating when $k = p$. (For the most part we will require that $E_p$ be a polynomial. However, some of our classical calculations apply even for genuine power series.) The terms of this sequence may depend on $p$ so we will write them as $c^{(p)} = (c_1^{(p)}, c_2^{(p)}, \dots)$; we assume that the $\ell^1$-norm of each $c^{(p)}$ is one. Let $E_p \colon [-1, 1] \to \mathbb R$ be defined by

\[ E_p(x) = \sum_{k \ge 1} c_k^{(p)} x^k. \tag{5.13} \]

Here is the precise form of the large-entropy property:

Definition 5.5. We say that the sequence $(c^{(p)})$ has the large entropy property if there is a sequence $(\epsilon_p)$ of positive numbers with $\epsilon_p \downarrow 0$ such that the functions

\[ A_p(s) = E_p(1 - \epsilon_p s) \tag{5.14} \]

converge, uniformly on compact subsets of $[0, \infty)$, to a function $s \mapsto A(s)$ with

\[ \lim_{s \to 0^+} A(s) = 1 \quad\text{and}\quad \lim_{s \to \infty} A(s) = 0. \tag{5.15} \]

Remark 5.6. Despite the abstract formulation, the above framework amalgamates all known examples [21, 22] and provides plenty of additional generality. A prototypical example that satisfies Definition 5.5 is the sequence arising as the coefficients of the polynomial $E_p(x) = \bigl(\frac{1+x}{2}\bigr)^p$. A general class of sequences $c^{(p)}$ is defined from a probability density function $\phi \colon [0, 1] \to [0, \infty)$ via $c_k^{(p)} = \frac{1}{p}\phi(k/p)$. In these cases we can generically take $\epsilon_p = 1/p$ and the limiting function $A$ is then given by $A(s) = \int_0^1 \phi(\lambda) e^{-\lambda s}\, d\lambda$. However, as the example $E_p(x) = \bigl(\frac{1+x}{2}\bigr)^p$ shows, existence of such a density function is definitely not a requirement for the large-entropy property to hold. What is required is that the “distribution function” $\sum_{k \le ps} c_k^{(p)}$ is small for $s \ll 1$.
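The convergence required in Definition 5.5 can be observed numerically for the two examples of Remark 5.6. The sketch below assumes $\epsilon_p = 1/p$ in both cases (the remark states this only for the density-function class; for the polynomial example it is our choice) and takes $\phi \equiv 1$ as an illustrative density. Under these assumptions the candidate limits are $A(s) = e^{-s/2}$ and $A(s) = (1 - e^{-s})/s$, respectively.

```python
import numpy as np

def A_p_polynomial(s, p):
    """A_p(s) = E_p(1 - s/p) for E_p(x) = ((1 + x)/2)**p, assuming eps_p = 1/p."""
    return (1.0 - s / (2.0 * p)) ** p

def A_p_density(s, p, phi):
    """A_p(s) = sum_{k=1}^p c_k (1 - s/p)**k with c_k = phi(k/p)/p and eps_p = 1/p."""
    k = np.arange(1, p + 1)
    c = phi(k / p) / p
    return float(np.sum(c * (1.0 - s / p) ** k))

def uniform_density(lam):
    """phi = 1 on [0, 1]; the coefficients c_k = 1/p then have l^1-norm one."""
    return np.ones_like(lam)

if __name__ == "__main__":
    for p in (10, 100, 1000):
        s = 2.0
        print(p,
              A_p_polynomial(s, p), np.exp(-s / 2),
              A_p_density(s, p, uniform_density), (1 - np.exp(-s)) / s)
```

Both limits tend to 1 as $s \to 0^+$ and to 0 as $s \to \infty$, as (5.15) demands.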

Our analysis begins with the definition of good and bad events. First we will discuss the situation on bonds: The bond $\langle r, r' \rangle$ is considered to be energetically good if the attractive energy is larger (in magnitude) than some strictly positive constant $b$ (a number of order unity depending on gross details, where we recall that 1 is the optimal value), i.e., if

\[ E_p(\Omega_r \mathbin{\#} \Omega_{r'}) \ge b. \tag{5.16} \]

The entropically good bonds are simply the complementary events (so that every bond is a good bond). Crucial to the analysis is the fact, ensured by our large entropy assumption, that the crossover between the energetic and entropic phenotypes occurs when the deviation between neighboring spins is of the order $\sqrt{\epsilon_p}$.

We define the good block events $\mathcal G_{\rm ord}$ and $\mathcal G_{\rm dis}$ on the $2 \times \dots \times 2$-block $\Lambda_B$ as follows: $\mathcal G_{\rm ord}$ is the set of spin configurations where every bond on $\Lambda_B$ is energetically good while $\mathcal G_{\rm dis}$ collects all spin configurations where every bond on $\Lambda_B$ is entropically good. The requisite bad event is defined as $\mathcal B = (\mathcal G_{\rm ord} \cup \mathcal G_{\rm dis})^c$. Our fundamental result will be a proof that the density of energetically good blocks is discontinuous:

Theorem 5.7 (Large-entropy models). Consider a family of finite sequences $c^{(p)} = (c_k^{(p)})_{k \le p}$ and suppose that $E_p$ have the large entropy property in the sense of Definition 5.5. Consider the quantum spin systems with the Hamiltonian

\[ H^{(p)} = -\sum_{\langle r, r' \rangle} E_p\bigl( S^{-2}(S_r \mathbin{\#} S_{r'}) \bigr), \tag{5.17} \]

(with both interpretations of $(S_r \mathbin{\#} S_{r'})$ possible). Then there exists $b \in (0, 1)$ for which the associated energetic bonds have discontinuous density in the large $S$ quantum systems. Specifically, for every $\epsilon > 0$ there is a $p_0 < \infty$ so that for any $p > p_0$ and all $S$ sufficiently large, there is an inverse temperature $\beta_{\rm t}$ at which there exist two distinct, translation-invariant KMS states $\langle - \rangle_{\beta_{\rm t}}^{\rm ord}$ and $\langle - \rangle_{\beta_{\rm t}}^{\rm dis}$ with the property

\[ \bigl\langle \hat Q_{\mathcal G_{\rm ord}} \bigr\rangle_{\beta_{\rm t}}^{\rm ord} \ge 1 - \epsilon \quad\text{and}\quad \bigl\langle \hat Q_{\mathcal G_{\rm dis}} \bigr\rangle_{\beta_{\rm t}}^{\rm dis} \ge 1 - \epsilon. \tag{5.18} \]

With a few small additional ingredients, we show that the above implies that the energy density itself is discontinuous:

Corollary 5.8. There exist constants $b$ and $b'$, both strictly less than $1/2$, such that the energy density $e(\beta)$, defined via the $\beta$-derivative of the free energy, satisfies

\[ e(\beta) \begin{cases} \ge 1 - b', & \text{if } \beta > \beta_{\rm t}, \\ \le b, & \text{if } \beta < \beta_{\rm t}, \end{cases} \tag{5.19} \]

for all $p$ sufficiently large.

The bulk of the proof of this theorem again boils down to the estimate of $p_{L,\beta}(\mathcal B)$:


Proposition 5.9. There exist $b_0 \in (0, 1)$, $\Delta > 0$, $C < \infty$, and for each $b \in (0, b_0]$ there exists $p_0 < \infty$ such that

\[ \lim_{L \to \infty} p_{L,\beta}(\mathcal B) < C(\epsilon_p)^{\Delta} \tag{5.20} \]

holds for all $p \ge p_0$ and all $\beta \ge 0$.

Apart from a bound on $p_{L,\beta}(\mathcal B)$, we will also need to provide the estimates in condition (2) of Proposition 3.10. Again we state these in their classical form:

Proposition 5.10. There exist constants $C_1 < \infty$, $p_1 < \infty$ and $\Delta_1 > 0$ such that the following is true for all $p \ge p_1$: First, at $\beta = 0$ we have

\[ \limsup_{L \to \infty} p_{L,0}(\mathcal G_{\rm ord}) \le C_1(\epsilon_p)^{\Delta_1}. \tag{5.21} \]

Second, if $\beta_0 \in (0, \infty)$ is large enough, specifically if $e^{\beta_0 d} \ge \epsilon_p^{-2(1 + \Delta_1)}$, then

\[ \limsup_{L \to \infty} p_{L,\beta_0}(\mathcal G_{\rm dis}) \le C_1(\epsilon_p)^{\Delta_1}. \tag{5.22} \]

The proof of these propositions is somewhat technical; we refer the details to the Appendix, where we will also prove the corollary.

Proof of Theorem 5.7. We begin by verifying the three properties listed at the end of Sect. 5.1. As is immediate from the definitions, neighboring blocks of distinct type of goodness must be separated by a bad block. Similarly, reflections $\theta_t$ act on $\mathcal B$ only as translations. To see that the same applies to the “complex” reflections $\vartheta_t$, we have to check that $\mathcal B$ is invariant under the “complex conjugation” map $\sigma$. For that it suffices to verify that $\sigma(\Omega) \mathbin{\#} \sigma(\Omega') = \Omega \mathbin{\#} \Omega'$ for any $\Omega, \Omega' \in S^2$. This follows because both interpretations of $\mathbin{\#}$ are quadratic in the components of $\Omega$ and because $\sigma$ changes the sign of the $y$-component and leaves the other components intact.

Let $b < b_0$, where $b_0$ is as in Proposition 5.9. Then (5.20) implies that $p_{L,\beta}(\mathcal B) \ll 1$ once $p \gg 1$. Quantum chessboard estimates yield $\langle \hat Q_{\mathcal A} \rangle_{L,\beta} \le q_{L,\beta}(\mathcal A)$ which by means of Theorem 3.1 implies that both $\langle \hat Q_{\mathcal G_{\rm dis}} \rangle_{L,0}$ and $\langle \hat Q_{\mathcal G_{\rm ord}} \rangle_{L,\beta_0}$ are close to one once $L \gg 1$ and $\sqrt{S}$ is sufficiently large compared with $\beta_0$ (referring to Proposition 5.10). Theorem 3.7 then provides the remaining conditions required for application of Proposition 3.10; we conclude that there exists a $\beta_{\rm t} \in [0, \beta_0]$ and two translation-invariant KMS states $\langle - \rangle_{\beta_{\rm t}}^{\rm ord}$ and $\langle - \rangle_{\beta_{\rm t}}^{\rm dis}$ such that (5.18) holds.

Remarks 5.11. Again, a few remarks are in order:
(1) Note that the theorem may require larger $S$ for larger $p$, even though in many cases the transition will occur uniformly in $S \gg 1$ once $p$ is sufficiently large. The transition temperature $\beta_{\rm t}$ will generally depend on $p$ and $S$.
(2) There are several reasons why Theorem 5.7 has been stated only for polynomial interactions. First, while the upper symbol is easily (and, more or less, unambiguously) defined for polynomials, its definition for general functions may require some non-trivial limiting procedures that have not been addressed in the literature. Second, the reduction to the classical model, cf. Corollary 3.2, requires that the classical interaction be Lipschitz, which is automatic for polynomials but less so for general power series. In particular, Theorem 5.7 does not strictly apply to non-smooth (or even discontinuous) potentials even though we believe that, with some model-specific modifications of the proof of Theorem 3.1, we could include many such cases as well.


5.4. Order-by-disorder transitions: Orbital-compass model. We begin with the easier of the models (4–5), the 2D orbital compass model. We stick with the reflection-positive version of the Hamiltonian which, on $T_L$, is given by

\[ H_L = -S^{-2} \sum_{r \in T_L} \sum_{\alpha = x, z} S_r^{(\alpha)} S_{r + \hat e_\alpha}^{(\alpha)}, \tag{5.23} \]

with $\hat e_x, \hat e_y, \hat e_z$ denoting the unit vectors in (positive) coordinate directions. The number $B$ will only be determined later, so we define the good events for general $B$. Given $\kappa > 0$ (with $\kappa \ll 1$), let $\mathcal G_x$ be the event that all (classical) spins on a $B \times B$ block $\Lambda_B$ satisfy

\[ |\Omega_r \cdot \hat e_x| \ge \cos(\kappa). \tag{5.24} \]

Let $\mathcal G_z$ be the corresponding event in the $z$ spin-direction. Then we have:

Theorem 5.12 (Orbital-compass model). Consider the model with the Hamiltonian as in (1.4). For each $\epsilon > 0$ there exist $\kappa > 0$, $\beta_0 > 0$ and $c > 0$ and, for each $\beta$ with $\beta_0 \le \beta \le c\sqrt{S}$, there is a positive integer $B$ and two distinct, translation-invariant KMS states $\langle - \rangle_\beta^{(x)}$ and $\langle - \rangle_\beta^{(z)}$ such that

\[ \bigl\langle \hat Q_{\mathcal G_\alpha} \bigr\rangle_\beta^{(\alpha)} \ge 1 - \epsilon, \qquad \alpha = x, z. \tag{5.25} \]

In particular, for all $\beta$ with $\beta_0 \le \beta \le c\sqrt{S}$,

\[ \bigl\langle (S_r \cdot \hat e_\alpha)^2 \bigr\rangle_\beta^{(\alpha)} \ge S^2(1 - \epsilon), \qquad \alpha = x, z. \tag{5.26} \]

The proof is an adaptation of the results from [5–7] for the classical versions of order-by-disorder. Let $\mathcal B = (\mathcal G_x \cup \mathcal G_z)^c$ denote the requisite bad event. By definition, $\mathcal B$ is invariant under reflections of (classical) spins through the $xz$-plane; i.e., $\sigma(\mathcal B) = \mathcal B$. Since the restrictions from $\mathcal B$ are uniform over the sites in $\Lambda_B$, we have $\vartheta_t(\mathcal B) = \tau_{Bt}(\mathcal B)$. So, in light of our general claims from Sect. 5.1, to apply the machinery leading to Proposition 3.9, it remains to show that $p_{L,\beta}(\mathcal B)$ is small if $\beta \gg 1$ and the scale $B$ is chosen appropriately. For that let $H^\infty(\Omega)$ denote the classical version of the Hamiltonian (5.23). By completing the nearest-neighbor terms to a square, we get

\[ H^\infty(\Omega) = \frac{1}{2} \sum_{r \in T_L} \sum_{\alpha = x, z} \bigl( \Omega_r^{(\alpha)} - \Omega_{r + \hat e_\alpha}^{(\alpha)} \bigr)^2 + \sum_{r \in T_L} [\Omega_r^{(y)}]^2 - |T_L|. \tag{5.27} \]

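Here $\Omega_r^{(\alpha)}$ denotes the $\alpha$th Cartesian component of $\Omega_r$. Unfortunately, the event $\mathcal B$ is too complex to allow a direct estimate of $p_{L,\beta}(\mathcal B)$. Thus, we will decompose $\mathcal B$ into two events, $\mathcal B_{\rm E}$ and $\mathcal B_{\rm SW}$ depending on whether the “badness” comes from bad energy or bad entropy. Let $\Delta > 0$ be a scale whose size will be determined later. Explicitly, the event $\mathcal B_{\rm E}$ marks the situations that either

\[ |\Omega_r^{(y)}| \ge c_1 \Delta \tag{5.28} \]

for some site $r \in \Lambda_B$, or

\[ \bigl| \Omega_r^{(\alpha)} - \Omega_{r + \hat e_\alpha}^{(\alpha)} \bigr| \ge c_2 \Delta / B, \tag{5.29} \]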


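for some pair $r$ and $r + \hat e_\alpha$, both in $\Lambda_B$. Here $c_1, c_2$ are constants to be determined momentarily. The event $\mathcal B_{\rm SW}$ is simply given by

\[ \mathcal B_{\rm SW} = \mathcal B \setminus \mathcal B_{\rm E}. \tag{5.30} \]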

By the subadditity property of p L ,β , we have p L ,β (B) ≤ p L ,β (BE ) + p L ,β (BSW ). Since BE implies the existence of an energetically “charged” site or bond with energy about (/B )2 above its minimum, the value of p L ,β (BE ) is estimated relatively easily: ˜ p L ,β (BE ) ≤ cβ B 2 e−cβ

2 /B 2

,

(5.31)

for some constants c and c. ˜ (Here cB 2 accounts for possible positions of the “excited” bond/site and β comes from the lower bound on the classical partition function.) As to BSW , here we will decompose further into more elementary events: Given a collection of vectors w ˆ 1, . . . , w ˆ s that are uniformly spaced on the first quadrant of the (i) to be main circle, S1++ = { ∈ S2 : · eˆ y = 0, (x) ≥ 0, (z) ≥ 0}, we define BSW the set of configurations in BSW such that ˆ i(x) | + |(z) ˆ i(z) | ≥ cos(), |(x) r ·w r ·w

r ∈ B .

(5.32)

Since $\mathcal B_{SW}$ is disjoint from $\mathcal B_E$, on $\mathcal B_{SW}$ the $y$-component of every spin is less than order $\Delta$ and any neighboring pair of spins differ by angle at most $\Delta$ (up to a reflection). Hence, by choosing $c_1$ and $c_2$ appropriately, any two spins in $\Lambda_B$ will differ by less than $\Delta$ from some $\hat w_i$, i.e.,

$$\mathcal B_{SW}\subset\bigcup_{i=1}^s\mathcal B_{SW}^{(i)}, \tag{5.33}$$

provided that $s\Delta$ exceeds the total length of $S_1^{++}$. To estimate $p_{L,\beta}(\mathcal B_{SW})$ we will have to calculate the constrained partition function for the event $\mathcal B_{SW}^{(i)}$. The crucial steps of this estimate are encapsulated into the following three propositions:

Proposition 5.13. Consider the classical orbital compass model with the Hamiltonian $H^\infty(\Omega)$ as in (5.27) and suppose that $\Delta\ll1$. Then for all $i=1,\dots,s$,

$$p_{L,\beta}\big(\mathcal B_{SW}^{(i)}\big)\le2^{2B}e^{-B^2(F_{L,\Delta}(\hat w_i)-F_{L,\Delta}(\hat e_1))}, \tag{5.34}$$

where, for each $\hat w\in S_1^{++}=\{\hat v\in S_2:\hat v\cdot\hat e_2=0,\ \hat v^{(x)}\ge0,\ \hat v^{(z)}\ge0\}$,

$$F_{L,\Delta}(\hat w)=-\frac1{L^2}\log\int_{(S_2)^{|T_L|}}\Big(\frac{\beta e^\beta}{2\pi}\Big)^{|T_L|}d\Omega\,e^{-\beta H^\infty(\Omega)}\Big(\prod_{r\in T_L}1_{\{\Omega_r\cdot\hat w\ge\cos(\Delta)\}}\Big). \tag{5.35}$$

Proposition 5.14. For each $\epsilon>0$ there exists $\delta>0$ such that if

$$\beta\Delta^2>\frac1\delta\quad\text{and}\quad\beta\Delta^3<\delta, \tag{5.36}$$

then for all $L$ sufficiently large, $|F_{L,\Delta}(\hat w)-F(\hat w)|<\epsilon$ holds for any $\hat w\in S_1^{++}$ with $F$ given by

$$F(\hat w)=\frac12\int_{[-\pi,\pi]^2}\frac{dk}{(2\pi)^2}\log\widehat D_k(\hat w). \tag{5.37}$$

Here $\widehat D_k(\hat w)=\hat w_z^2|1-e^{ik_1}|^2+\hat w_x^2|1-e^{ik_2}|^2$.

Quantum Spin Systems at Positive Temperature


Proposition 5.15. The function $\hat w\mapsto F(\hat w)$ is minimized (only) by vectors $\hat w=\pm\hat e_x$ and $\hat w=\pm\hat e_z$.

The proofs of these propositions consist of technical steps which are deferred to the Appendix. We now finish the formal proof of the theorem subject to these propositions:

Proof of Theorem 5.12 completed. As already mentioned, the bad event is invariant under both spatial reflections $\theta_t$ and the "internal" reflection $\sigma$; hence $\vartheta_t(\mathcal B)=\tau_{Bt}(\mathcal B)$ as desired. Second, if two distinct good events occur in neighboring blocks, say $\Lambda_B$ and $\Lambda_B+B\hat e_1$, then at least one of the bonds between these blocks must obey (5.29); i.e., the box $\Lambda_B+\hat e_1$ is (energetically) bad. Third, we need to show that $p_{L,\beta}(\mathcal B)$ is small. We will set $\Delta$ and $B$ to the values

$$\Delta=\beta^{-5/12}\quad\text{and}\quad B\approx\log\beta. \tag{5.38}$$
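The scaling in (5.38) can be checked by direct arithmetic: with $\Delta=\beta^{-5/12}$ one has $\beta\Delta^2=\beta^{1/6}$, which grows, and $\beta\Delta^3=\beta^{-1/4}$, which decays, so both requirements of (5.36) are eventually met for any fixed $\delta$. The following minimal sketch tabulates these quantities; the function name `scales` and the rounding of $\log\beta$ to an integer block size are assumptions made only for illustration.

```python
import math

def scales(beta):
    """Return (Delta, B) for the choice Delta = beta**(-5/12), B ~ log(beta) of (5.38).

    Rounding log(beta) to the nearest positive integer is an assumption of
    this sketch; the text only requires B to be of order log(beta).
    """
    delta_scale = beta ** (-5.0 / 12.0)
    block = max(1, round(math.log(beta)))
    return delta_scale, block

for beta in (1e2, 1e4, 1e8):
    d, b = scales(beta)
    # beta * Delta^2 = beta^(1/6) increases, beta * Delta^3 = beta^(-1/4) decreases
    print(beta, b, beta * d ** 2, beta * d ** 3)
```

For any fixed $\delta$ the two printed columns eventually satisfy $\beta\Delta^2>1/\delta$ and $\beta\Delta^3<\delta$, although the growth of $\beta^{1/6}$ is slow.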

These choices make $p_{L,\beta}(\mathcal B_E)$ small once $\beta$ is sufficiently large and, at the same time, ensure that (5.36) holds for any given $\delta$. Since we have (5.34), Propositions 5.14–5.15 and the fact that $\mathcal B_{SW}^{(i)}$, being a subset of $\mathcal B$, is empty when $\hat w_i$ is within, say, $\kappa/2$ of $\pm\hat e_x$ or $\pm\hat e_z$ tell us that

$$p_{L,\beta}(\mathcal B_{SW})\le s\,e^{-\frac12\epsilon B^2} \tag{5.39}$$

once $B$ is sufficiently large. But $s$ is proportional to $1/\Delta$ and so this is small for $\beta$ sufficiently large. We conclude that as $\beta\to\infty$, we have $p_{L,\beta}(\mathcal B)\to0$ for the above choice of $B$ and $\Delta$. Having verified all required conditions, the $xz$-symmetry of the model puts us in a position to apply Proposition 3.9. Hence, for all sufficiently large $\beta$, there exist two infinite-volume, translation-invariant KMS states $\langle-\rangle_\beta^{(x)}$ and $\langle-\rangle_\beta^{(z)}$ such that (5.25) holds. To derive (5.26), we note that, for any vector $\hat w\in S_2$ and any single-spin coherent state $|\Omega\rangle$,

$$S\cdot\hat w\,|\Omega\rangle=S(\hat w\cdot\Omega)|\Omega\rangle+O(\sqrt S). \tag{5.40}$$

Hence, $(S_r\cdot\hat e_k)^2\hat Q_{\mathcal G_k}=S^2\hat Q_{\mathcal G_k}+O(S^{3/2})$, where all error terms indicate bounds in norm. Invoking (5.25), the bound (5.26) follows.

Remark 5.16. The 3D orbital-compass model is expected to undergo a similar kind of symmetry breaking, with three distinct states "aligned" along one of the three lattice directions. However, the actual proof (for the classical model, a version of this statement has been established in [7]) is considerably more involved because of the existence of (a large number of) inhomogeneous ground states that are not distinguished at the leading order of spin-wave free-energy calculations. We also note that an independent analysis of the classical version of the 2D orbital-compass model, using an approach similar to Refs. [6, 7] and [41], has been performed in [40].

5.5. Order-by-disorder transitions: 120-degree model. The statements (and proofs) for the 120-degree model are analogous, though more notationally involved. Consider six vectors $\hat v_1,\dots,\hat v_6$ defined by

$$\hat v_1=\hat e_x,\qquad\hat v_2=\tfrac12\hat e_x+\tfrac{\sqrt3}2\hat e_z,\qquad\hat v_3=-\tfrac12\hat e_x+\tfrac{\sqrt3}2\hat e_z, \tag{5.41}$$

$$\hat v_4=-\hat e_x,\qquad\hat v_5=-\tfrac12\hat e_x-\tfrac{\sqrt3}2\hat e_z,\qquad\hat v_6=\tfrac12\hat e_x-\tfrac{\sqrt3}2\hat e_z. \tag{5.42}$$


As is easy to check, these are the six sixth complex roots of unity. The reflection-positive version of the Hamiltonian on $T_L$ then has the form

$$H=-S^{-2}\sum_{r\in T_L}\sum_{\alpha=1,2,3}(S_r\cdot\hat v_{2\alpha})(S_{r+\hat e_\alpha}\cdot\hat v_{2\alpha}), \tag{5.43}$$

where $\hat e_1,\hat e_2,\hat e_3$ is yet another labeling of the usual triplet of coordinate vectors in $\mathbb Z^3$. To define good block events, let $\kappa>0$ satisfy $\kappa\ll1$ and let $\mathcal G_1,\dots,\mathcal G_6$ be the $B$-block events that all spins $\Omega_r$, $r\in\Lambda_B$, are such that

$$\Omega_r\cdot\hat v_\alpha\ge\cos(\kappa),\qquad\alpha=1,\dots,6, \tag{5.44}$$

respectively. Then we have:

Theorem 5.17 (120-degree model). Consider the 120-degree model with the Hamiltonian (5.43). For each $\epsilon>0$ there exist $\kappa>0$, $\beta_0>0$ and $c>0$ and, for each $\beta$ with $\beta_0\le\beta\le c\sqrt S$, there is a number $B$ and six distinct, translation-invariant states $\langle-\rangle_\beta^{(\alpha)}$, $\alpha=1,\dots,6$, such that

$$\big\langle\hat Q_{\mathcal G_\alpha}\big\rangle_\beta^{(\alpha)}\ge1-\epsilon,\qquad\alpha=1,\dots,6. \tag{5.45}$$

In particular, for all $\beta$ with $\beta_0\le\beta\le c\sqrt S$,

$$\big\langle S_r\cdot\hat v_\alpha\big\rangle_\beta^{(\alpha)}\ge S(1-\epsilon),\qquad\alpha=1,\dots,6. \tag{5.46}$$

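Fix $\kappa>0$ (with $\kappa\ll1$) and let $B$ and $\Delta$ be as in (5.38). Let $\mathcal B=(\mathcal G_1\cup\dots\cup\mathcal G_6)^c$ be the relevant bad event. It is easy to check that $\mathcal B$ is invariant with respect to $\sigma$ and, consequently, $\vartheta_t(\mathcal B)=\tau_{Bt}(\mathcal B)$ for all $t\in T_{L/B}$ as required. Introducing the projections

$$\Omega_r^{(\alpha)}=\Omega_r\cdot\hat v_\alpha,\qquad\alpha=1,\dots,6, \tag{5.47}$$

and noting that, for any vector $\hat w\in S_2$,

$$\sum_{\alpha=1,2,3}(\hat w\cdot\hat v_\alpha)^2=\frac32\big[1-(\hat w\cdot\hat e_y)^2\big], \tag{5.48}$$

the classical Hamiltonian $H^\infty(\Omega)$ can be written in the form

$$H^\infty(\Omega)=\frac12\sum_{r\in T_L}\sum_{\alpha=1,2,3}\big(\Omega_r^{(2\alpha)}-\Omega_{r+\hat e_\alpha}^{(2\alpha)}\big)^2+\frac32\sum_{r\in T_L}(\Omega_r\cdot\hat e_y)^2-\frac32|T_L|. \tag{5.49}$$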
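Both the projection identity (5.48) and the completed-square form (5.49) are finite algebraic statements, so they can be checked numerically on a small torus. The sketch below is only an illustration under stated assumptions: it takes $\hat v_k$ at angle $(k-1)\cdot60^\circ$ in the $xz$-plane (the sixth roots of unity), uses the classical energy $-\sum_{r,\alpha}\Omega_r^{(2\alpha)}\Omega_{r+\hat e_\alpha}^{(2\alpha)}$ as the reference, and all helper names are hypothetical.

```python
import itertools
import math
import random

# Assumed labeling: V[k] is v_{k+1}, at angle k*60 degrees in the xz-plane.
V = [(math.cos(k * math.pi / 3), 0.0, math.sin(k * math.pi / 3)) for k in range(6)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def random_unit_vector(rng):
    """Uniform point on the unit sphere via normalized Gaussian coordinates."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return tuple(x / norm for x in v)

def projection_identity_gap(w):
    """|LHS - RHS| of (5.48) for a unit vector w, summing over v_1, v_2, v_3."""
    lhs = sum(dot(w, V[k]) ** 2 for k in range(3))
    rhs = 1.5 * (1.0 - w[1] ** 2)
    return abs(lhs - rhs)

def energies(spins, L):
    """Return (direct nearest-neighbor energy, completed-square form (5.49))."""
    direct, squares = 0.0, 0.0
    for r in spins:
        for a in range(3):
            nb = list(r)
            nb[a] = (nb[a] + 1) % L          # periodic neighbor r + e_alpha
            v = V[2 * a + 1]                 # v_{2 alpha} for alpha = a + 1
            p, q = dot(spins[r], v), dot(spins[tuple(nb)], v)
            direct -= p * q
            squares += 0.5 * (p - q) ** 2
        squares += 1.5 * spins[r][1] ** 2    # (3/2) (Omega_r . e_y)^2
    squares -= 1.5 * len(spins)              # -(3/2) |T_L|
    return direct, squares

rng = random.Random(0)
L = 3
spins = {r: random_unit_vector(rng) for r in itertools.product(range(L), repeat=3)}
direct, squares = energies(spins, L)
print(projection_identity_gap(random_unit_vector(rng)), abs(direct - squares))
```

Both printed numbers should vanish up to floating-point rounding, reflecting that the sum of squared projections on three directions spaced by $60^\circ$ equals $3/2$ times the squared length of the in-plane part of the spin.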

As for the orbital-compass model, we will estimate $p_{L,\beta}(\mathcal B)$ by further decomposing $\mathcal B$ into more elementary bad events. Let $\mathcal B_E$ denote the event that the block $\Lambda_B$ contains an energetically "charged" site or bond. Explicitly, $\mathcal B_E$ is the event that either for some $r\in\Lambda_B$ we have

$$|\Omega_r\cdot\hat e_y|\ge c_1\frac\Delta B, \tag{5.50}$$

or, for some nearest-neighbor pair $r,r+\hat e_\alpha$ in $\Lambda_B$, we have

$$\big|\Omega_r\cdot\hat v_{2\alpha}-\Omega_{r+\hat e_\alpha}\cdot\hat v_{2\alpha}\big|\ge c_2\frac\Delta B. \tag{5.51}$$


Here $c_1$ and $c_2$ are constants that will be specified later. The complementary part of $\mathcal B$ will be denoted by $\mathcal B_{SW}$, i.e.,

$$\mathcal B_{SW}=\mathcal B\setminus\mathcal B_E. \tag{5.52}$$

By the fact that $\mathcal B_{SW}\subset\mathcal B_E^c$, on $\mathcal B_{SW}$ the energetics of the entire block is good; i.e., the configuration is near one of the ground states. Clearly, all constant configurations with zero $y$-component are ground states. However, unlike for the 2D orbital-compass model, there are other, inhomogeneous ground states which make the treatment of this model somewhat more complicated. Fortunately, we will be able to plug in the results of [6] more or less directly. As for the orbital-compass model, to derive a good bound on $p_{L,\beta}(\mathcal B_{SW})$ we will further partition $\mathcal B_{SW}$ into more elementary events. We begin with the events corresponding to the homogeneous ground states: Given a collection of vectors $\hat w_i$, $i=1,\dots,s$, that are uniformly spaced on the circle $S_1\subset S_2$ in the $xz$-plane, we define $\mathcal B_0^{(i)}$ to be the subset of $\mathcal B_{SW}$ on which

$$\Omega_r\cdot\hat w_i\ge\cos(\Delta),\qquad r\in\Lambda_B. \tag{5.53}$$

To describe the remaining "parts of $\mathcal B_{SW}$," we will not try to keep track of the entire "near ground-state" configuration. Instead, we will note that each inhomogeneous ground state contains a pair of neighboring planes in $\Lambda_B$ where the homogeneous configuration gets "flipped" through one of the vectors $\hat v_1,\dots,\hat v_6$. (We refer the reader to [6], particularly p. 259.) Explicitly, given a lattice direction $\alpha=1,2,3$ and a vector $\hat w_i\in S_1$, let $\hat w_i^\star$ denote the reflection of $\hat w_i$ through $\hat v_{2\alpha-1}$. For each $j=1,\dots,B-1$, we then define $\mathcal B_{\alpha,j}^{(i)}$ to be the set of spin configurations in $\mathcal B_{SW}$ such that for all $r\in\Lambda_B$,

$$\begin{aligned}\Omega_r\cdot\hat w_i&\ge\cos(\Delta)&&\text{if }r\cdot\hat e_\alpha=j,\\ \Omega_r\cdot\hat w_i^\star&\ge\cos(\Delta)&&\text{if }r\cdot\hat e_\alpha=j+1.\end{aligned} \tag{5.54}$$

(Note that $r\cdot\hat e_\alpha=j$ means that the $\alpha$th coordinate of $r$ is $j$. Hence, on $\mathcal B_{\alpha,j}^{(i)}$, the spins are near $\hat w_i$ on the $j$th plane orthogonal to $\hat e_\alpha$ and near $\hat w_i^\star$ on the $(j+1)$st plane in $\Lambda_B$.) The conditions under which these events form a partition of $\mathcal B$ are the subject of the following claim:

Proposition 5.18. Given $\kappa>0$, there exist $c_1,c_2>0$ such that if $\mathcal B_E$ and $\mathcal B_{SW}$ are defined as in (5.50–5.52) and if $\Delta$ and $B$ are such that $B\Delta\ll\kappa\ll1$ and $s\Delta>4\pi$, then

$$\mathcal B_{SW}\subseteq\bigcup_{i=1}^s\Big(\mathcal B_0^{(i)}\cup\bigcup_{\alpha=1,2,3}\bigcup_{j=1}^{B-1}\mathcal B_{\alpha,j}^{(i)}\Big). \tag{5.55}$$

Next we will attend to the estimates of $p_{L,\beta}$ for the various events constituting $\mathcal B$. As for the orbital-compass model, the event $\mathcal B_E$ is dismissed easily:

$$p_{L,\beta}(\mathcal B_E)\le c\beta B^3e^{-\tilde c\beta\Delta^2/B^2}, \tag{5.56}$$

where $c$ and $\tilde c$ are positive constants. As to the events $\mathcal B_0^{(i)}$, here we get:


Proposition 5.19. For each $\kappa>0$ there exists $\delta>0$ such that if $\beta$ and $\Delta$ obey

$$\beta\Delta^2>\frac1\delta\quad\text{and}\quad\beta\Delta^3<\delta, \tag{5.57}$$

then for all $L$ sufficiently large,

$$p_{L,\beta}\big(\mathcal B_0^{(i)}\big)\le e^{-B^3\rho_1(\kappa)},\qquad i=1,\dots,s. \tag{5.58}$$

Here $\rho_1(\kappa)>0$ for all $\kappa\ll1$. For the "inhomogeneous" events the decay rate is slower, but still sufficient for our needs.

Proposition 5.20. For each $\kappa>0$ there exists $\delta>0$ such that if $\beta$, $\Delta$ and $\delta$ obey (5.57), then for all $j=1,\dots,B-1$, all $\alpha=1,2,3$ and all $L$ sufficiently large,

$$p_{L,\beta}\big(\mathcal B_{\alpha,j}^{(i)}\big)\le e^{-B^2\rho_2(\kappa)},\qquad i=1,\dots,s. \tag{5.59}$$

Here $\rho_2(\kappa)>0$ for all $\kappa\ll1$. Again, the proofs of these propositions are deferred to the Appendix.

Proof of Theorem 5.17 completed. We proceed very much like for the orbital compass model. The core of the proof again boils down to showing that $p_{L,\beta}(\mathcal B)$ is small, provided $B$ is chosen appropriately. Let $\Delta$ and $B$ be related to $\beta$ as in (5.38). By (5.56), this choice makes $p_{L,\beta}(\mathcal B_E)$ small and, at the same time, makes (5.57) eventually satisfied for any fixed $\delta>0$. Invoking Propositions 5.19–5.20, and the subadditivity of $\mathcal A\mapsto p_{L,\beta}(\mathcal A)$, we have

$$p_{L,\beta}(\mathcal B_{SW})\le s\big(e^{-B^3\rho_1(\kappa)}+3Be^{-B^2\rho_2(\kappa)}\big), \tag{5.60}$$

which by the fact that $s=O(\Delta^{-1})$ implies $p_{L,\beta}(\mathcal B_{SW})\ll1$ once $\beta$ is sufficiently large. Using that $p_{L,\beta}(\mathcal B)\le p_{L,\beta}(\mathcal B_E)+p_{L,\beta}(\mathcal B_{SW})$, the desired bound $p_{L,\beta}(\mathcal B)\ll1$ follows. It is easy to check that the bad event $\mathcal B$ is preserved by "complex conjugation" $\sigma$ as well as reflections and so the $\vartheta_t$'s act on it as mere translations. Moreover, once $\kappa\ll1$, if two distinct types of goodness occur in neighboring blocks, all edges between the blocks are of high energy; any block containing these edges is thus bad. Finally, the model on the torus is invariant under rotation of all spins by $60^\circ$ in the $xz$-plane. This means that all conditions of Proposition 3.9 are satisfied and so, for $\beta\gg1$ and $S\gg\beta^2$, the quantum model features six distinct states obeying (5.45). From here we get (5.46).
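To see concretely why the bound (5.60) becomes small, one may read its right-hand side as $s$ multiplying both exponential terms and insert $s=1/\Delta=\beta^{5/12}$ and $B=\log\beta$ from (5.38). The sketch below evaluates this expression; the values of `rho1` and `rho2` are hypothetical placeholders (the text only asserts that they are positive), and the implicit constant in $s=O(\Delta^{-1})$ is set to one.

```python
import math

def sw_bound(beta, rho1=1.0, rho2=1.0):
    """Evaluate s * (exp(-B^3 rho1) + 3 B exp(-B^2 rho2)) for the scaling (5.38).

    Assumptions of this sketch: s = 1/Delta exactly, Delta = beta**(-5/12),
    B = log(beta) (not rounded), and placeholder rates rho1 = rho2 = 1.
    """
    s = beta ** (5.0 / 12.0)
    block = math.log(beta)
    return s * (math.exp(-block ** 3 * rho1) + 3.0 * block * math.exp(-block ** 2 * rho2))

for beta in (1e1, 1e2, 1e4):
    print(beta, sw_bound(beta))
```

The polynomial growth of $s$ in $\beta$ is overwhelmed by the factor $e^{-\rho_2(\log\beta)^2}$, which decays faster than any power of $\beta$.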

6. Appendix This section is devoted to the proofs of various technical statements from Sects. 5.3, 5.4 and 5.5. Some of the proofs in the latter two subsections are based on the corresponding claims from [6, 7]. In such cases we will indicate only the necessary changes.


6.1. Technical claims: Large-entropy models. Consider a sequence $(c^{(p)})$ satisfying the large-entropy property and assume, without loss of generality, that $\|c^{(p)}\|=1$ for all $p\ge1$. Our goal here is to provide the bounds on $p_{L,\beta}(\mathcal B)$ and the asymptotic statements concerning the dominance of the two types of goodness which were claimed in Propositions 5.9 and 5.10. We begin with a lower estimate on the full partition function.

Lemma 6.1. Let $t>0$ be fixed. Then there exist $p_1<\infty$ and constants $c_1,c_2\in(0,\infty)$ such that for all $p\ge p_1$ and all $\beta\ge0$,

$$\liminf_{L\to\infty}(Z_L)^{1/L^d}\ge\max\big\{c_1\epsilon_pe^{\beta dA_p(t)},\,c_2\big\}. \tag{6.1}$$

Proof. We will derive two separate bounds on the partition function per site. Focussing on the cases when $\Omega_r\odot\Omega_{r'}$ involves all three components of the spins, let us restrict attention to configurations when every spin is within angle $c\sqrt{\epsilon_p}$ of the vector $(0,0,1)$, where $c$ is a constant to be determined momentarily. Let $\Omega$ and $\Omega'$ be two vectors with this property. Then the (diamond) angle between $\Omega$ and $\Omega'$ is less than $2c\sqrt{\epsilon_p}$ and so

$$\Omega\odot\Omega'\ge\cos\big(2c\sqrt{\epsilon_p}\big)\ge1-2c^2\epsilon_p. \tag{6.2}$$

Choosing $2c^2=t$, we thus have $\Omega\odot\Omega'\ge1-t\epsilon_p$. This means that the energy of any bond in the configuration obeying these constraints is at least $A_p(t)$, while each spin has at least $1-\cos(c\sqrt{\epsilon_p})\approx\frac12c^2\epsilon_p$ surface area at its disposal. This implies that $(Z_L)^{1/L^d}$ is bounded below by the first term in the maximum with $c_1\approx\frac12c^2$. The other interpretation of $\Omega_r\odot\Omega_{r'}$ is handled analogously.

In order to derive the second bound, we will restrict all spins to a sector of angular aperture $\pi/2$, e.g., the one described as $\{\Omega=(\Omega_1,\Omega_2,\Omega_3)\in S^2:\Omega_1>1/\sqrt2\}$. This has area $a$ which is a fixed positive number. Moreover, the constraint ensures that the interaction between any two spins is non-positive; the partition function per site then boils down to the entropy of such configurations. To evaluate this entropy, we fix the configuration on the even sublattice. Every spin on the odd sublattice is then presented with $2d$ "spots" on this sector which it must avoid. The area of each such spot is a constant times $\epsilon_p$. It follows that $(Z_L)^{1/L^d}\ge a-O(\epsilon_p)$ which is positive once $p$ is sufficiently large.

Our next bound concerns the constrained partition function $Z_L^{\rm mix}(\mathcal L)$ obtained by disseminating a particular pattern $\mathcal L$ of ordered and disordered bonds (i.e. energetically and entropically good bonds) over the torus, when $\mathcal L$ is a genuine mixture of the two. That is, we assume that $\mathcal L$ contains bonds of both phenotypes. We remark that this dissemination is carried out by means of reflections in planes of sites (which is permissible by the nearest-neighbor nature of the interaction). Recall that $a_d=d2^{d-1}$ is the number of bonds entirely contained in the $2\times\dots\times2$ block $\Lambda_B$.

Lemma 6.2. Let $t>0$ be such that

$$\frac{1-(1-b)/a_d}{A_p(t)}\le1 \tag{6.3}$$

and

$$\Delta\overset{\rm def}{=}\min\Big\{1+\frac1{a_d}-\frac1{A_p(t)},\ \frac1{a_d}-\frac b{A_p(t)}\Big\}>0. \tag{6.4}$$


Then there exists a constant $c_3<\infty$ such that for any $\beta\ge0$ and any pattern $\mathcal L$ of ordered and disordered bonds (i.e. energetically and entropically good bonds) on $\Lambda_B$ containing at least one bond of each phenotype,

$$\limsup_{L\to\infty}Z_L^{\rm mix}(\mathcal L)^{1/L^d}\le c_3\max\big\{c_1\epsilon_pe^{\beta dA_p(t)},\,c_2\big\}(\epsilon_p)^{\Delta}. \tag{6.5}$$

Proof. Fix a pattern $\mathcal L$ as specified above. As usual, we call a bond disordered if it is entropically good. Let $f_b$ denote the fraction of disordered bonds in pattern $\mathcal L$. Let us call a vertex an "entropic site" if all bonds connected to it are disordered. (Note that this has two different, but logically consistent, connotations depending on whether we are speaking of a vertex in $\Lambda_B$ or in $T_L$.) Let $f_s$ denote the fraction of entropic sites in $\mathcal L$. Upon dissemination (by reflections through planes of sites), these numbers $f_b$ and $f_s$ will represent the actual fractions of disordered bonds and entropic sites in $T_L$, respectively. Now each disordered bond has an energy at most $b$, while we may estimate the energy of each ordered bond by 1. For each entropic site we will throw in full measure so we just need to estimate the entropy of the non-entropic sites. Here we note that each ordered bond disseminates into a "line" of ordered bonds, upon reflections. If we disregard exactly one bond on this "line of sites", then we see that there is a total measure proportional to $O(\epsilon_p^{L-1})$. Since this entropy is shared by the $L$ vertices on this line, the entropy density of each vertex on this line is $O(\epsilon_p)$ in the $L\to\infty$ limit. This is an upper bound for the entropy density for each non-entropic site. The bounds on energy show that the Boltzmann factor is no larger than

$$e^{\beta d(1-f_b)+\beta dbf_b}=e^{\beta d[1-(1-b)f_b]}. \tag{6.6}$$

We thus conclude that, for some constant $\tilde c_3$,

$$\limsup_{L\to\infty}Z_L^{\rm mix}(\mathcal L)^{1/L^d}\le\tilde c_3(\epsilon_p)^{1-f_s}e^{\beta d[1-(1-b)f_b]}. \tag{6.7}$$

Now, we may write the right-hand side as

$$\tilde c_3\big(\epsilon_pe^{\beta dA_p(t)}\big)^{\frac{1-(1-b)f_b}{A_p(t)}}(\epsilon_p)^{\Delta(\mathcal L)}, \tag{6.8}$$

where

$$\Delta(\mathcal L)=1-f_s-\frac{1-(1-b)f_b}{A_p(t)}. \tag{6.9}$$

Since $\mathcal L$ contains at least one entropic bond, we know $f_b\ge1/a_d$. Our choice of $t$ guarantees that $1-(1-b)f_b\le1-(1-b)/a_d\le A_p(t)$ and so the complicated exponent in (6.8) is bounded by 1. We may use the elementary inequality $X^\lambda Y^{1-\lambda}\le\max(X,Y)$, true whenever $X,Y\ge0$ and $0\le\lambda\le1$, to bound the term with the complicated power in (6.8) by the maximum in (6.5). (We set $X=c_1\epsilon_pe^{\beta dA_p(t)}$ and $Y=c_2$, absorbing extra order-1 constants into our eventual $c_3$.) It remains to show that $\Delta(\mathcal L)$ exceeds $\Delta$ in (6.4) whenever $\mathcal L$ contains both phenotypes of bonds. We will derive a relation between $f_s$ and $f_b$ that holds whenever $\mathcal L$ contains both phenotypes of bonds. We may give the argument in either picture (restricting to the small block $\Lambda_B$, or considering the full torus $T_L$ after disseminating $\mathcal L$); the two are entirely equivalent since the fractions of entropic bonds and sites are the same. We will give the argument in the small $2\times\dots\times2$ block $\Lambda_B$. Since $\mathcal L$ contains bonds of


both phenotypes there are at least two vertices in $\Lambda_B$ each of which "emanates" bonds of both phenotypes. We mark these sites, and for each of them we mark one of the incident entropically good (disordered) bonds. We now consider the bonds of $\Lambda_B$ to be split into half-bonds each of which is associated to the closest incident vertex (disregarding the midpoints). We label each half-bond as entropic or energetic, according to whether it is half of a full bond which is entropically or energetically good. Let $H$ be the total number of entropic half-bonds. Now note that for each entropic vertex, all $d$ of the half-bonds emanating from it (and contained in $\Lambda_B$) are "entropic half-bonds". We also have at least two additional entropic half-bonds associated to the two marked sites. Therefore the number of entropic half-bonds satisfies the bound $H\ge d2^df_s+2$. (Note that there are $2^df_s$ entropic sites.) Since there are $2a_d=d2^d$ total half-bonds in $\Lambda_B$, the proportion of entropic half-bonds is at least $f_s+1/a_d$. At this point let us observe that the proportion of entropic half-bonds is exactly the same as the proportion of entropic full-bonds, $f_b$. Therefore

$$f_b\ge f_s+\frac1{a_d}. \tag{6.10}$$

Plugging this into the formula for $\Delta(\mathcal L)$ we thus get

$$\Delta(\mathcal L)\ge1+\frac1{a_d}-f_b-\frac{1-(1-b)f_b}{A_p(t)}. \tag{6.11}$$

The right-hand side is linear in $f_b$. Allowing $f_b$ to take arbitrary values in $[0,1]$, it is therefore minimized at $f_b=0$ or $f_b=1$, i.e., by one of the two values in the minimum in (6.4). Hence, $\Delta(\mathcal L)\ge\Delta$ whereby (6.5) follows.

Proof of Proposition 5.9. As usual, we consider events disseminated by reflections in planes of lattice sites. Let $b_0<\frac1{1+a_d}$. If $b\le b_0$, then, as a calculation shows, the bound (6.4) holds as well as (6.3) for $t$ such that $A_p(t)\ge1-b$. Such a $t$ can in turn be chosen by the assumption that the model obeys the large-entropy condition. (This is where we need that $p$ is sufficiently large.) Hence, the bound in Lemma 6.2 is at our disposal. Now the maximum on the right-hand side of (6.5) is a lower bound on the full partition function per site; the lemma thus gives us bounds on $p_{L,\beta}$ of the events enforcing the various patterns on $\Lambda_B$. Since $\mathcal B$ can be decomposed into a finite union of such pattern-events, the desired inequality (5.20) follows.

Proof of Proposition 5.10. Again we work with events disseminated using reflections in planes of sites. In order to prove (5.21), we note that $E_p(\Omega_r\odot\Omega_{r'})\ge b$, which is what every bond $(r,r')$ in $\Lambda_B$ satisfies provided $\Omega\in\mathcal G_{\rm ord}$, implies $\Omega_r\odot\Omega_{r'}\ge1-c\epsilon_p$. The neighboring spins are thus constrained to be within angle $O(\sqrt{\epsilon_p})$ of each other. Disregarding an appropriate subset of these constraints (reusing the "line of sites" argument from the first part of the proof of Lemma 6.2) the desired bound follows. To prove (5.22), we note that the disseminated event $\mathcal G_{\rm dis}$ forces all bonds to have energy less than $b$. Lemma 6.1 implies that the corresponding $p_{L,\beta}$-functional is bounded above by $\tilde C_1(\epsilon_p)^{-1}e^{\beta d[b-A_p(t)]}$. Assuming that $b<1/2$ and $t$ is chosen so that $A_p(t)-b>1/2$, we see that if $\beta$ is large enough to satisfy

$$e^{\beta d}\ge\epsilon_p^{-2(1+\Delta_1)}, \tag{6.12}$$

then the $p_{L,\beta}$ bound is less than $\tilde C_1(\epsilon_p)^{\Delta_1}$.


Given the existing results on the discontinuity of energetic bonds, it is almost inconceivable that the energy density itself could be continuous. To mathematically rule out this possibility, we will show that, in actuality, very few of the energetic bonds have value in the vicinity of $b$. So while the previous argument only considered two types of bonds, we will henceforth have the following three types of bonds:

(1) strongly ordered if $E_p(\Omega_r\odot\Omega_{r'})\ge1-b'$,
(2) weakly ordered if $1-b'>E_p(\Omega_r\odot\Omega_{r'})\ge b$,
(3) disordered if $E_p(\Omega_r\odot\Omega_{r'})<b$.

Here $0<b',b<1/2$ are constants which we will choose later, although we already know that we have the restriction $b<1/(1+a_d)$ as was necessary in the proof of Proposition 5.9. A rather similar line of argument to that previously used for mixed patterns of ordered and disordered bonds handles the situation for mixed patterns of weak and strong order. For each pattern $\mathcal L$ of weakly and strongly ordered bonds on $\Lambda_B$, let $Z_L^{\rm ord}(\mathcal L)$ denote the partition function obtained by disseminating $\mathcal L$ all over the torus. Then we have:

Lemma 6.3. Let $t>0$ be a number such that

$$\Delta'\overset{\rm def}{=}1-\frac{1-b'/a_d}{A_p(t)}>0. \tag{6.13}$$

There exists a constant $c_4<\infty$ such that for any $\beta\ge0$ and any pattern $\mathcal L$ of weakly and strongly ordered bonds on the $2\times\dots\times2$ block $\Lambda_B$ containing at least one weakly ordered bond,

$$\limsup_{L\to\infty}Z_L^{\rm ord}(\mathcal L)^{1/L^d}\le c_4\max\big\{c_1\epsilon_pe^{\beta dA_p(t)},\,c_2\big\}(\epsilon_p)^{\Delta'}. \tag{6.14}$$

Proof. Consider an ordered pattern $\mathcal L$ with fraction $f_w$ of weakly ordered bonds. After dissemination all over $T_L$, there is a fraction $f_w$ of bonds on $T_L$ that are weakly ordered and a fraction $1-f_w$ that are strongly ordered. Putting energy $1-b'$ for each weakly ordered bond and 1 for each strongly ordered bond, the Boltzmann weight of any spin configuration contributing to $Z_L^{\rm ord}(\mathcal L)$ is at most

$$e^{\beta d(1-b')f_w+\beta d(1-f_w)}=e^{\beta d(1-b'f_w)}. \tag{6.15}$$

To calculate the entropy, we again use the "line of sites" argument from the first part of the proof of Lemma 6.2, which gives an entropy per site on the order of $O(\epsilon_p)$ in the $L\to\infty$ limit. This implies that the limsup of $Z_L^{\rm ord}(\mathcal L)^{1/L^d}$ is bounded by a constant times $\epsilon_pe^{\beta d(1-b'f_w)}$. Since $1-b'f_w\le1-b'/a_d$ we get

$$\limsup_{L\to\infty}Z_L^{\rm ord}(\mathcal L)^{1/L^d}\le\tilde c_4\big(\epsilon_pe^{\beta dA_p(t)}\big)^{\frac{1-b'/a_d}{A_p(t)}}(\epsilon_p)^{\Delta'}, \tag{6.16}$$

for some constant $\tilde c_4<\infty$. By (6.13), the exponent of the term $\epsilon_pe^{\beta dA_p(t)}$ is less than 1 and so the elementary inequality $X^\lambda Y^{1-\lambda}\le\max\{X,Y\}$ may be used again (as in the proof of Lemma 6.2), which readily yields the bound (6.14).


Proof of Corollary 5.8. The proof is based on thermodynamical arguments. First, standard calculations using coherent states show that

$$E_p\big(S^{-2}(S_r\odot S_{r'})\big)|\Omega\rangle=E_p(\Omega_r\odot\Omega_{r'})|\Omega\rangle+O(1/\sqrt S), \tag{6.17}$$

where the error term depends implicitly on $p$. Hence, for a given $p$ and $\delta>0$, we can find $S$ so large that for any $r,r'\in\Lambda_B$,

$$\frac{\big\langle E_p\big(S^{-2}(S_r\odot S_{r'})\big)\hat Q_{\mathcal A}\big\rangle}{\big\langle\hat Q_{\mathcal A}\big\rangle}\ \begin{cases}\ge1-b'-\delta,&\text{if }\mathcal A=\mathcal G_{\rm ord},\\ \le b+\delta,&\text{if }\mathcal A=\mathcal G_{\rm dis}.\end{cases} \tag{6.18}$$

(At the classical level the second case is by definition, whereas the first case follows from Lemma 6.3.) Since $\beta\mapsto e(\beta)$ is increasing, we conclude that (5.19) holds. As a technical point, we note that in the statement of the corollary we did not include the small corrections corresponding to $\delta>0$. This was primarily for aesthetic reasons: we wanted to state the simplest possible result. We can clearly accomplish this by taking $b$ and $b'$ to be a little smaller than is otherwise needed.

6.2. Technical claims: Orbital-compass model. Here we will prove Propositions 5.13–5.15 concerning the orbital-compass model. The proofs follow the strategy developed in the context of the 120-degree model [6].

Proof of Proposition 5.13. The proof goes by one more partitioning of $\mathcal B_{SW}^{(i)}$. Consider a spin configuration $\Omega=(\Omega_r)_{r\in T_L}\in\mathcal B_{SW}^{(i)}$. Since $\mathcal B_{SW}^{(i)}\subset\mathcal B_{SW}$ and $\Delta\ll1$, it is easy to check the following facts:

(1) The $y$-components of all spins in $\Lambda_B$ are small.
(2) The $x$-components of the spins along each "line of sites" (in $\Lambda_B$) in the $x$-direction are either all near the $x$-component of vector $\hat w_i$ or its negative.
(3) The same is true for the $z$-components of the spins on "lines of sites" in the $z$ lattice direction.

Thus, at the cost of reflecting the $x$-components of spins along each "line of sites" in the $x$-direction, and similarly for the $z$-components, we may assume that all spins are aligned with $\hat w_i$ in the sense that

$$\Omega_r\cdot\hat w_i\ge\cos(\Delta),\qquad r\in\Lambda_B. \tag{6.19}$$

Let $\mathcal B_{SW}^{(i,0)}$ denote the set of configurations satisfying (6.19). The above reflection preserves both the a priori measure and the Hamiltonian (5.27); the event $\mathcal B_{SW}^{(i)}$ is thus partitioned into $2^{2B}$ "versions" of the event $\mathcal B_{SW}^{(i,0)}$ all of which have the same value of the $p_{L,\beta}$-functional. Invoking the Subadditivity Lemma, (5.34) is proved once we show that

$$p_{L,\beta}\big(\mathcal B_{SW}^{(i,0)}\big)\le e^{-B^2(F_{L,\Delta}(\hat w_i)-F_{L,\Delta}(\hat e_1))}. \tag{6.20}$$

This follows by noting that $e^{-B^2F_{L,\Delta}(\hat w_i)}$ is, to within a convenient multiplier, the integral of the Boltzmann weight $e^{-\beta H^\infty(\Omega)}$ on the event $\mathcal B_{SW}^{(i,0)}$ while $e^{-B^2F_{L,\Delta}(\hat e_1)}$ provides a lower bound on the partition function (again, to within the same multiplier which thus cancels from the ratio).


Proof of Proposition 5.14. The principal idea is to derive upper and lower bounds on $F_{L,\Delta}(\hat w)$ which converge, in the limit $L\to\infty$, to the same Gaussian integral. Let us parametrize $\hat w\in S_1^{++}$ as $(\cos\theta,0,\sin\theta)$ and, given a spin configuration $\Omega$ that satisfies $\Omega_r\cdot\hat w\ge\cos(\Delta)$ for all $r\in T_L$, let us introduce the deviation variables $(\vartheta_r,\zeta_r)$ by the formula

$$\Omega_r=\Big(\sqrt{1-\zeta_r^2}\cos(\theta+\vartheta_r),\ \zeta_r,\ \sqrt{1-\zeta_r^2}\sin(\theta+\vartheta_r)\Big). \tag{6.21}$$

Noting that both $\vartheta_r$ and $\zeta_r$ are order $\Delta$, we derive that $H^\infty(\Omega)+|T_L|$ is, to within a quantity of order $L^2\Delta^3$, equal to the quadratic form

$$I_{L,\hat w}(\vartheta,\zeta)=\frac12\sum_{r\in T_L}\Big\{\hat w_z^2(\vartheta_r-\vartheta_{r+\hat e_x})^2+\hat w_x^2(\vartheta_r-\vartheta_{r+\hat e_z})^2\Big\}+\sum_{r\in T_L}\zeta_r^2. \tag{6.22}$$

The Jacobian of the transformation $\Omega_r\mapsto(\vartheta_r,\zeta_r)$ is unity. Next we will derive upper and lower bounds on the integral of $e^{-\beta I_{L,\hat w}}$ against the product of indicators in (5.35). For the upper bound we invoke the inequality

$$\prod_{r\in T_L}1_{\{\Omega_r\cdot\hat w\ge\cos(\Delta)\}}\le e^{\frac12\lambda\beta L^2\Delta^2}\exp\Big\{-\frac12\lambda\beta\sum_{r\in T_L}\vartheta_r^2\Big\}, \tag{6.23}$$

valid for each $\lambda\ge0$. The $\zeta_r$'s are then unrestricted and their integrals can be performed yielding a factor $\sqrt{2\pi/\beta}$ per integral. The integral over $\vartheta_r$'s involves passing to the Fourier components, which diagonalizes the covariance matrix. The result is best expressed in the $L\to\infty$ limit:

$$\liminf_{L\to\infty}F_{L,\Delta}(\hat w)\ge O(\beta\Delta^3)-\frac12\lambda\beta\Delta^2+F(\lambda,\hat w), \tag{6.24}$$

where

$$F(\lambda,\hat w)=\frac12\int_{[-\pi,\pi]^2}\frac{dk}{(2\pi)^2}\log\big\{\lambda+\widehat D_k(\hat w)\big\}. \tag{6.25}$$

By the Monotone Convergence Theorem, $F(\lambda,\hat w)$ converges to $F(\hat w)$ as $\lambda\downarrow0$. Since $\beta\Delta^3$ is less than $\delta$, which is up to us to choose, taking $\lambda\downarrow0$ on both sides of (6.24) we deduce that $F_{L,\Delta}(\hat w)\ge F(\hat w)-\epsilon$ for $L$ sufficiently large. It remains to derive the corresponding lower bound. Here we will still work with the parameter $\lambda$ above but, unlike for the upper bound, we will not be able to take $\lambda\downarrow0$ at the end. Consider the Gaussian measure $P_\lambda$ which assigns any Borel set $A\subset(\mathbb R\times\mathbb R)^{T_L}$ the probability

$$P_\lambda(A)=\frac1{Z_L(\lambda)}\Big(\frac\beta{2\pi}\Big)^{|T_L|}\int_A\exp\Big\{-\beta I_{L,\hat w}(\vartheta,\zeta)-\frac{\beta\lambda}2\sum_{r\in T_L}\vartheta_r^2\Big\}\prod_{r\in T_L}d\vartheta_r\,d\zeta_r. \tag{6.26}$$

Let $E_\lambda$ denote the corresponding expectation. From $\beta\lambda\ge0$ we get

$$\int_{(S_2)^{|T_L|}}d\Omega\,e^{-\beta I_{L,\hat w}(\vartheta,\zeta)}\Big(\prod_{r\in T_L}1_{\{\Omega_r\cdot\hat w\ge\cos(\Delta)\}}\Big)\ge Z_L(\lambda)\,E_\lambda\Big(\prod_{r\in T_L}1_{\{\Omega_r\cdot\hat w\ge\cos(\Delta)\}}\Big). \tag{6.27}$$


The free-energy corresponding to the normalization constant $Z_L(\lambda)$ is exactly $F(\lambda,\hat w)$ above. Thus, given $\epsilon>0$, we can find $\lambda>0$ such that $Z_L(\lambda)\ge e^{-L^2[F(\hat w)+\epsilon/2]}$ once $L\gg1$. It remains to show that the expectation is at least $e^{-L^2\epsilon/2}$ provided $\delta$ in (5.36) is sufficiently small. Here we first decrease the product by noting that

$$1_{\{\Omega_r\cdot\hat w\ge\cos(\Delta)\}}\ge1_{\{|\vartheta_r|\le\Delta/2\}}1_{\{|\zeta_r|\le\Delta/2\}}. \tag{6.28}$$

This decouples the $\zeta_r$'s from the $\vartheta_r$'s and allows us to use the independence of these fields under $P_\lambda$. Since the $\zeta_r$'s are themselves independent, the integral over $\zeta_r$ boils down to

$$E_\lambda\Big(\prod_{r\in T_L}1_{\{|\zeta_r|\le\Delta/2\}}\Big)=\prod_{r\in T_L}P_\lambda\big(|\zeta_r|\le\Delta/2\big)\ge\big(1-e^{-\lambda\beta\Delta^2/4}\big)^{L^2}, \tag{6.29}$$

where we used the standard tail bound for the normal distribution. Note that, for any fixed $\lambda>0$, the term $1-e^{-\lambda\beta\Delta^2/4}$ can be made as close to one as desired by increasing $\beta\Delta^2$ appropriately. The $\vartheta_r$'s are not independent, but reflection positivity through bonds shows that the corresponding indicators are positively correlated, i.e.,

$$E_\lambda\Big(\prod_{r\in T_L}1_{\{|\vartheta_r|\le\Delta/2\}}\Big)\ge\prod_{r\in T_L}P_\lambda\big(|\vartheta_r|\le\Delta/2\big). \tag{6.30}$$

The probability on the right-hand side is estimated using a variance bound:

$$P_\lambda\big(|\vartheta_r|>\Delta/2\big)\le\Big(\frac2\Delta\Big)^2{\rm Var}(\vartheta_r)=\frac4{\Delta^2}\,\frac1{L^2}\sum_{k\in T_L^\star}\frac1{\beta[\lambda+\widehat D_k(\hat w)]}\le\frac4{\lambda\beta\Delta^2}, \tag{6.31}$$

where $T_L^\star$ denotes the reciprocal torus. Again, for any fixed $\lambda$, $P_\lambda(|\vartheta_r|\le\Delta/2)$ can be made as close to one as desired once $\beta\Delta^2$ is sufficiently large. We conclude that, given $\epsilon>0$, we can choose $\delta$ such that $F_{L,\Delta}(\hat w)\le F(\hat w)+\epsilon$ once $L\gg1$. This finishes the proof.

Proof of Proposition 5.15. Since $\hat w_x^2+\hat w_z^2=1$, this is a simple consequence of Jensen's inequality and the strict concavity of the logarithm.
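The Jensen argument can be illustrated numerically. The sketch below approximates $F(\hat w)$ from (5.37) for $\hat w=(\cos\theta,0,\sin\theta)$ by a midpoint rule on $[-\pi,\pi]^2$, using $|1-e^{ik}|^2=2-2\cos k$. The grid size `n` and the function name are assumptions of this illustration; the midpoint grid is chosen so that the integrable logarithmic singularity at $k=0$ is never evaluated.

```python
import math

def spin_wave_free_energy(theta, n=200):
    """Midpoint-rule approximation of F(w) in (5.37), w = (cos theta, 0, sin theta).

    n is an illustrative grid size; the approximation error is small but
    nonzero because of the logarithmic singularity at k = 0.
    """
    wx2, wz2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    h = 2.0 * math.pi / n
    # |1 - e^{ik}|^2 = 2 - 2 cos k at the midpoints k_j = -pi + (j + 1/2) h
    a = [2.0 - 2.0 * math.cos(-math.pi + (j + 0.5) * h) for j in range(n)]
    total = 0.0
    for a1 in a:          # k_1 direction, weighted by w_z^2
        for a2 in a:      # k_2 direction, weighted by w_x^2
            total += math.log(wz2 * a1 + wx2 * a2)
    return 0.5 * total / n ** 2

for theta in (0.0, math.pi / 8, math.pi / 4):
    print(theta, spin_wave_free_energy(theta))
```

At $\theta=0$ the integrand reduces to $\log|1-e^{ik_2}|^2$, whose exact integral over a period is zero, so the computed value is close to zero, while for $0<\theta<\pi/2$ the values are strictly positive and largest at $\theta=\pi/4$, consistent with the minimizers $\pm\hat e_x$ and $\pm\hat e_z$.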

6.3. Technical claims: 120-degree model. Here we will provide the proofs of technical Propositions 5.18–5.20. The core of all proofs is the fact that any spin configuration $(\Omega_r)$ can be naturally deformed, by rotating along the main circle orthogonal to the $xz$-plane, to have zero $y$-component. An explicit form of this transformation is as follows: Let us write each $\Omega_r\in S_2$ using two variables $\zeta_r\in[-1,1]$ and $\theta_r\in[0,2\pi)$ interpreted as the cylindrical coordinates,

$$\Omega_r=\Big(\sqrt{1-\zeta_r^2}\cos\theta_r,\ \zeta_r,\ \sqrt{1-\zeta_r^2}\sin\theta_r\Big). \tag{6.32}$$

Then $\Omega_r'$ is the vector in which we set $\zeta_r=0$, i.e.,

$$\Omega_r'=(\cos\theta_r,0,\sin\theta_r). \tag{6.33}$$


(We have already used this transformation in the proof of Proposition 5.14.) An additional useful feature of this parametrization is that the surface (Haar) measure d r on S2 then decomposes into the product of the Lebesgue measure d r on S1 and the Lebesgue measure dζ r on [−1, 1]. Proof of Proposition 5.18. We will use the fact that, for configurations on B with vanishing component in the y-direction, this was already proved as Theorem 6.4 in [6]. Let ( r ) ∈ BSW and define ( r ) as above. Since | r · eˆ y | ≤ c1 /B for all r ∈ B , we have ( r − ) · eˆ y ≤ c1 /B, (6.34) r while ( r − r ) · eˆ α = O(2 /B 2 ),

α = x, z.

(6.35)

In particular, the configuration ( r ) is contained in the version of event BSW from [6], provided c2 is a sufficiently small numerical constant. Thus, under the condition B √ κ 1—which translates to the condition B κ 1 of [6, Theorem 6.4]—( r ) is contained in one of the events on the right-hand side of (5.55). But, at the cost of a slight adjustment of , the corresponding event will then contain also ( r ). To prove the bounds in the remaining two propositions, we will more or less directly plug in the results of [6]. This is possible because the y-component of the spins contributes only an additive factor to the overall spin-wave free energy. The crucial estimate is derived as follows: Lemma 6.4. There exists a constant c > 0 such that the following is true: Let 1 and (2α) (2α) let = ( r ) be a configuration on T L such that | r ·ˆe y | ≤ 2 and | r − r +ˆeα | ≤ , for all α = 1, 2, 3. Define = ( r ) as above. Then ∞ 2 H () − H ∞ ( ) − 3 ( y · eˆ y ) ≤ c3 L 3 . (6.36) 2 r ∈T L

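Proof. By the fact that $\Omega_r \cdot \hat e_y = O(\Delta)$ we have

$$\Omega_r \cdot \hat v_\alpha = \Omega'_r \cdot \hat v_\alpha + O(\Delta^2). \tag{6.37}$$

But then the assumption $\Omega_r^{(2\alpha)} - \Omega_{r+\hat e_\alpha}^{(2\alpha)} = O(\Delta)$ yields

$$\bigl[(\Omega_r - \Omega_{r+\hat e_\alpha}) \cdot \hat v_{2\alpha}\bigr]^2 = \bigl[(\Omega'_r - \Omega'_{r+\hat e_\alpha}) \cdot \hat v_{2\alpha}\bigr]^2 + O(\Delta^3). \tag{6.38}$$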

Using (5.49), this proves the claim. Proof of Proposition 5.19. The quantity p L ,β (B0(i) ) is the ratio of the partition function in which all spins are constrained to make an angle at most with w ˆ i , and the full (i) partition function. The restriction B0 ⊂ BSW can, for the most part, be ignored except for the w ˆ i ’s that are close to one of the six preferred directions. In such cases the fact (i) that κ tells us that B0 is empty whenever the angle between w ˆ i and the closest of vˆ 1 , . . . , vˆ 6 is less than, say, κ/2. In particular, we may restrict attention to the w ˆ i ’s that are farther than κ/2 from any of these vectors.


Viewing the collection of angles (θ r ) as a configuration of O(2)-spins, Lemma 6.4 tells * us that the Hamiltonian of ( r ) is, to within corrections of order L 3 3 , the sum 3 of 2 r ζ r2 and the Hamiltonian of the classical, O(2)-spin 120-degree model evaluated at configuration (θ r ). Since the measure d r equals the product dζ r dθ r on the respective domain, we may ignore the restriction of ζ r to values less than O() and (i) integrate the ζ r ’s. We conclude that p L ,β (B0 ) is bounded by the same quantity as for 3 the O(2)-spin 120-degree model times e O(β ) . Since β3 is controlled via (5.57), the desired bound follows from [6, Lemma 6.9]. Proof of Proposition 5.20. The proof is very much like that of the previous proposition. +(i) denote the event that the top line in (5.54) holds for all r ∈ B for which r · eˆ α Let B α, j is odd and the bottom line for all such r for which r · eˆ α is even. Chessboard estimates then yield (i) (i) 2/B + p L ,β Bα, . (6.39) j ≤ p L ,β Bα, j ! +(i) ) the assumptions of Lemma 6.4 are satisOn the disseminated event t ∈T L/B θ t (B α, j fied. Hence, we may again integrate out the ζ r ’s to reduce the calculation to that for O(2)-spins. The latter calculation was performed in detail in [6]; the desired bound is then proved exactly as Lemma 6.10 of [6] (explicitly, applying inequality (6.24) of [6] and the paragraph thereafter). Acknowledgements. This research was supported by the NSF grants DMS-0306167 and DMS-0505356. The authors wish to thank Elliott Lieb, Aernout van Enter and the anonymous referees for many useful comments on the first version of this paper.

References 1. Alexander, K., Chayes, L.: Non-perturbative criteria for Gibbsian uniqueness. Commun. Math. Phys. 189(2), 447–464 (1997) 2. Ali, S.T., Antoine, J.-P., Gazeau, J.-P., Mueller, U.A.: Coherent states and their generalizations: a mathematical overview. Rev. Math. Phys. 7(7), 1013–1104 (1995) 3. Arecchi, F.T., Courtens, E., Gilmore, R., Thomas, H.: Atomic coherent states in quantum optics. Phys. Rev. A 6(6), 2211–2237 (1972) 4. Berezin, F.A.: Covariant and contravariant symbols of operators (Russian). Izv. Akad. Nauk SSSR Ser. Mat. 36 1134–1167 (1972) [English translation: Math. USSR-Izv. 6, 1117–1151 (1973) (1972)] 5. Biskup, M., Chayes, L., Kivelson, S.A.: Order by disorder, without order, in a two-dimensional spin system with O(2)-symmetry. Ann. Henri Poincaré 5(6), 1181–1205 (2004) 6. Biskup, M., Chayes, L., Nussinov, Z.: Orbital ordering in transition-metal compounds: I. The 120-degree model. Commun. Math. Phys. 255, 253–292 (2005) 7. Biskup, M., Chayes, L., Nussinov, Z.: Orbital ordering in transition-metal compounds: II. The orbitalcompass model. In preparation 8. Biskup, M., Kotecký, R.: Forbidden gap argument for phase transitions proved by means of chessboard estimates. Commun. Math. Phys. 264(3), 631–656 (2006) 9. Bolina, O., Contucci, P., Nachtergaele, B., Starr, S.: Finite-volume excitations of the 111 interface in the quantum XXZ model. Commun. Math. Phys. 212(1), 63–91 (2000) 10. Borgs, C., Kotecký, R., Ueltschi, D.: Low temperature phase diagrams for quantum perturbations of classical spin systems. Commun. Math. Phys. 181(2), 409–446 (1996) 11. Chayes, L., Kotecký, R., Shlosman, S.B.: Staggered phases in diluted systems with continuous spins. Commun. Math. Phys. 189, 631–640 (1997) 12. Conlon, J.G., Solovej, J.P.: On asymptotic limits for the quantum Heisenberg model. J. Phys. A 23(14), 3199–3213 (1990) 13. Datta, N., Fernández, R., Fröhlich, J.: Low-temperature phase diagrams of quantum lattice systems. I. 
Stability for quantum perturbations of classical systems with finitely-many ground states. J. Stat. Phys. 84(3-4), 455–534 (1996)


14. Datta, N., Fernández, R., Fröhlich, J.: Low-temperature phase diagrams of quantum lattice systems. II. Convergent perturbation expansions and stability in systems with infinite degeneracy. Helv. Phys. Acta 69(5-6), 752–820 (1996) 15. Davies, E.B.: Quantum Theory of Open Systems. London: Academic Press Inc (London) Ltd., 1976 16. Dobrushin, R.L., Shlosman, S.B.: Phases corresponding to minima of the local energy. Selecta Math. Soviet. 1(4), 317–338 (1981) 17. Duffield, N.G.: Classical and thermodynamic limits for generalised quantum spin systems. Commun. Math. Phys. 127(1), 27–39 (1990) 18. Dyson, F.J.: General theory of spin-wave interactions. Phys. Rev. 102(5), 1217–1230 (1956) 19. Dyson, F.J.: Thermodynamic behavior of an ideal ferromagnet. Phys. Rev. 102(5), 1230–1244 (1956) 20. Dyson, F.J., Lieb, E.H., Simon, B.: Phase transitions in quantum spin systems with isotropic and nonisotropic interactions, J. Stat. Phys. 18, 335–383 (1978) 21. van Enter, A.C.D., Shlosman, S.B.: First-order transitions for n-vector models in two and more dimensions: Rigorous proof. Phys. Rev. Lett. 89, 285702 (2002) 22. van Enter, A.C.D., Shlosman, S.B.: Provable first-order transitions for nonlinear vector and gauge models with continuous symmetries. Commun. Math. Phys. 255, 21–32 (2005) 23. Fröhlich, J., Israel, R., Lieb, E.H., Simon, B.: Phase transitions and reflection positivity. I. General theory and long-range lattice models. Commun. Math. Phys. 62(1), 1–34 (1978) 24. Fröhlich, J., Israel, R., Lieb, E.H., Simon, B.: Phase transitions and reflection positivity. II. Lattice systems with short-range and Coulomb interations. J. Stat. Phys. 22(3), 297–347 (1980) 25. Fröhlich, J., Lieb, E.H.: Phase transitions in anisotropic lattice spin systems. Commun. Math. Phys. 60(3), 233–267. (1978) 26. Fröhlich, J., Simon, B., Spencer, T.: Infrared bounds, phase transitions and continuous symmetry breaking. Commun. Math. Phys. 50, 79–95 (1976) 27. 
Fuller, W., Lenard, A.: Generalized quantum spins, coherent states, and Lieb inequalities. Commun. Math. Phys. 67(1), 69–84 (1979) 28. Fuller, W., Lenard, A.: Addendum: “Generalized quantum spins, coherent states, and Lieb inequalities.” Commun. Math. Phys. 69(1), 99 (1979) 29. Gaw¸edzki, K.: Existence of three phases for a P(φ)2 model of quantum field. Commun. Math. Phys. 59(2), 117–142 (1978) 30. Israel, R.B.: Convexity in the Theory of Lattice Gases. With an introduction by Arthur S. Wightman. Princeton Series in Physics. Princeton, N.J.: Princeton University Press, 1979 31. Kennedy, T.: Long range order in the anisotropic quantum ferromagnetic Heisenberg model. Commun. Math. Phys. 100(3), 447–462 (1985) 32. Koma, T., Nachtergaele, B.: Low-lying spectrum of quantum interfaces. Abstracts of the AMS 17, 146 (1996) and unpublished notes 33. Kotecký, R., Shlosman, S.B.: First-order phase transitions in large entropy lattice models. Commun. Math. Phys. 83(4), 493–515 (1982) 34. Kotecký, R., Shlosman, S.B.: Existence of first-order transitions for Potts models. In: Albeverio, S., Ph. Combe, M. Sirigue-Collins (eds.), Proc. of the International Workshop — Stochastic Processes in Quantum Theory and Statistical Physics, Lecture Notes in Physics 173, Berlin-Heidelberg-New York: Springer-Verlag, 1982, pp. 248–253 35. Kotecký, R., Ueltschi, D.: Effective interactions due to quantum fluctuations. Commun. Math. Phys. 206(2), 289–335 (1999) 36. Lieb, E.H.: The classical limit of quantum spin systems. Commun. Math. Phys. 31, 327–340 (1973) 37. Lieb, E., Mattis, D.: Ordering energy levels of interacting spin systems. J. Math. Phys. 3(4), 749–751 (1962) 38. Michoel, T., Nachtergaele, B.: The large-spin asymptotics of the ferromagnetic XXZ chain. Markov Proc. Rel. Fields 11(2), 237–266 (2005) 39. Michoel, T., Nachtergaele, B.: Central limit theorems for the large-spin asymptotics of quantum spins. Probab. Theory Related Fields 130(4), 493–517 (2004) 40. 
Mishra, A., Ma, M., Zhang, F.-C., Guertler, S., Tang, L.-H., Wan, S.: Directional ordering of fluctuations in a two-dimensional compass model. Phys. Rev. Lett. 93(20), 207201 (2004) 41. Nussinov, Z., Biskup, M., Chayes, L., van den Brink, J.: Orbital order in classical models of transitionmetal compounds. Europhys. Lett. 67(6), 990–996 (2004) 42. Perelomov, A.: Generalized Coherent States and Their Applications, Texts and Monographs in Physics, Berlin: Springer-Verlag, 1986 43. Robinson, D.W.: Statistical mechanics of quantum spin systems II. Commun. Math. Phys. 7(3), 337–348 (1968) 44. Shlosman, S.B.: The method of reflective positivity in the mathematical theory of phase transitions of the first kind (Russian). Usp. Mat. Nauk 41(3)(249), 69–111, 240 (1986)


45. Simon, B.: The classical limit of quantum partition functions. Commun. Math. Phys. 71(3), 247–276 (1980) 46. Simon, B.: The Statistical Mechanics of Lattice Gases. Vol. I., Princeton Series in Physics, Princeton, NJ: Princeton, University Press, 1993 47. Speer, E.R.: Failure of reflection positivity in the quantum Heisenberg ferromagnet. Lett. Math. Phys. 10(1), 41–47 (1985) Communicated by H. Spohn

Commun. Math. Phys. 269, 659–691 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0136-8


Bosons in Disc-Shaped Traps: From 3D to 2D

K. Schnee¹, J. Yngvason¹,²

¹ Erwin Schrödinger Institute for Mathematical Physics, Boltzmanngasse 9, 1090 Vienna, Austria
² Institut für Theoretische Physik, Universität Wien, Boltzmanngasse 5, 1090 Vienna, Austria.

E-mail: [email protected]
Received: 4 October 2005 / Accepted: 28 July 2006
Published online: 4 November 2006 – © Springer-Verlag 2006

Abstract: We present a mathematically rigorous analysis of the ground state of a dilute, interacting Bose gas in a three-dimensional trap that is strongly confining in one direction so that the system becomes effectively two-dimensional. The parameters involved are the particle number, $N \gg 1$, the two-dimensional extension, $\bar L$, of the gas cloud in the trap, the thickness, $h \ll \bar L$, of the trap, and the scattering length $a$ of the interaction potential. Our analysis starts from the full many-body Hamiltonian with an interaction potential that is assumed to be repulsive, radially symmetric and of short range, but otherwise arbitrary. In particular, hard cores are allowed. Under the premises that the confining energy, $\sim 1/h^2$, is much larger than the internal energy per particle, and $a/h \to 0$, we prove that the system can be treated as a gas of two-dimensional bosons with scattering length $a_{2D} = h \exp(-(\mathrm{const.})h/a)$. In the parameter region where $a/h \ll |\ln(\bar\rho h^2)|^{-1}$, with $\bar\rho \sim N/\bar L^2$ the mean density, the system is described by a two-dimensional Gross-Pitaevskii density functional with coupling parameter $\sim Na/h$. If $|\ln(\bar\rho h^2)|^{-1} \lesssim a/h$ the coupling parameter is $\sim N|\ln(\bar\rho h^2)|^{-1}$ and thus independent of $a$. In both cases Bose-Einstein condensation in the ground state holds, provided the coupling parameter stays bounded.

1. Introduction

In recent experiments dilute Bose gases have been confined in magneto-optical traps in such a way that the particle motion is essentially frozen in one or two directions and the system becomes effectively lower dimensional [8, 9, 33, 20, 23, 10, 31]. This is an intrinsically quantum mechanical phenomenon because it is not necessary to have a trap width or thickness that is the size of an atom, but it suffices that the energy gap for the motion in the strongly confined direction(s) is large compared to the internal energy per particle. The case of highly elongated ("cigar-shaped") traps has received particular attention [21, 4, 2, 3, 7, 26, 15] (see also [15] for further references), because it opens the possibility to realize the one-dimensional Lieb-Liniger model [11], and even the limiting Girardeau-Tonks case [6, 10, 23] that exhibits strong correlations. A detailed, rigorous derivation of the one-dimensional behavior from the many-body Hamiltonian of a three-dimensional gas was given in the paper [15]. This is not a simple problem, one reason being that an approximate factorization of the ground state wave function in the longitudinal and transverse variables is in general not possible (in particular not for hard core potentials), and the proofs in [15] are, in fact, quite long.

In the present paper we carry out a corresponding analysis for thin, disc-shaped traps, i.e., traps with strong confinement in one direction so that a two-dimensional behavior is expected. Experimental realizations of such systems and possible mechanisms for creating them are discussed, e.g., in [8, 9, 1, 38, 31]. On the theoretical side the references [25, 22, 27, 28, 24] contain many valuable insights into their properties. There are several similarities with the emergence of one-dimensional behavior in cigar-shaped traps, but also some notable differences. As for cigar-shaped traps, there is a basic division of the parameter domain into two regions: one where a limit of a three-dimensional Gross-Pitaevskii (GP) theory applies, and a complementary region described by a "truly" low dimensional theory. In the case discussed in [15] the latter is a density functional theory based on the exact Lieb-Liniger solution for the energy of a strongly interacting (and highly correlated) one-dimensional gas with delta interactions. (Note that in 1D strong interactions mean low density.) In the present case, on the other hand, the gas is weakly interacting in all parameter regions. In the region not accessible from 3D GP theory the energy formula [32, 19, 14] for a dilute two-dimensional Bose gas with a logarithmic dependence on the density applies. To enter this region extreme dilution is required. The Lieb-Liniger region in the 1D case demands also quite dilute systems, but the requirement is even more stringent in 2D. This will be explained further below.

We recall from [32, 19] that the energy per particle of a dilute, homogeneous, two-dimensional Bose gas with density $\rho_{2D}$ and scattering length $a_{2D}$ of the interaction potential is (in units such that $\hbar = 2m = 1$)

$$e_{2D} \approx 4\pi\rho_{2D}\,|\ln(\rho_{2D}a_{2D}^2)|^{-1}. \tag{1.1}$$

The corresponding result in three dimensions is

$$e_{3D} \approx 4\pi\rho_{3D}\,a_{3D}, \tag{1.2}$$

cf. [18]. In the following we shall denote the two-dimensional density, $\rho_{2D}$, simply by $\rho$ and the three dimensional scattering length, $a_{3D}$, by $a$.

Our results establish rigorously, and in the many-body context, a relation between $a$, the thickness, $h$, of the trap (assumed to be $\gg a$) and the effective two-dimensional scattering length, $a_{2D}$. Essentially, as $h$ tends to zero, $a_{2D} = h\exp(-(\mathrm{const.})h/a)$. (The precise formula is given in Eq. (1.17) below.) If $|\ln\rho h^2| \ll h/a$, then $|\ln(\rho a_{2D}^2)| \approx h/a$, and the two-dimensional formula (1.1) leads to the same result as the three dimensional formula (1.2), because $\rho_{3D} \sim \rho/h$. The "true" two dimensional region requires $|\ln\rho h^2| \gtrsim h/a$ and hence the condition $\rho^{-1/2} \gtrsim h e^{h/a}$ for the interparticle distance, $\rho^{-1/2}$. This should be compared with the corresponding condition for the 1D Lieb-Liniger region in [15] where the interparticle distance is "only" required to be of the order or larger than $h^2/a$.

The basic formula $a_{2D} = h\exp(-(\mathrm{const.})h/a)$ for the scattering length appeared, to the best of our knowledge, first in [25]. It can be motivated by considering a weak, bounded potential, where perturbation theory can be used to compute the energy for a two-body problem that is directly related to the scattering length, cf. Appendix A in [19]. This perturbative calculation is carried out in Sect. 4 as a step in the proof of a lower bound for the many-body energy; its relation to the formula for $a_{2D}$ is explained in the remark after Corollary 4.2. We wish to stress, however, that deriving this formula in the context of two-body scattering is only a step towards the solution of the many-body problem that is the concern of the present paper.

We now define the setting and state the results more precisely. We consider $N$ identical, spinless bosons in a confining, three-dimensional trap potential and with a repulsive, rotationally symmetric pair interaction. We take the direction of strong confinement as the z-direction and write the points $\mathbf{x} \in \mathbb{R}^3$ as $(x, z)$, $x \in \mathbb{R}^2$, $z \in \mathbb{R}$. The Hamiltonian is

$$H_{N,L,h,a} = \sum_{i=1}^{N}\bigl(-\Delta_i + V_{L,h}(\mathbf{x}_i)\bigr) + \sum_{1\le i<j\le N} v_a(|\mathbf{x}_i - \mathbf{x}_j|) \tag{1.3}$$

with

$$V_{L,h}(\mathbf{x}) = V_L(x) + V_h^{\perp}(z) = \frac{1}{L^2}V(L^{-1}x) + \frac{1}{h^2}V^{\perp}(h^{-1}z), \tag{1.4}$$

$$v_a(|\mathbf{x}|) = \frac{1}{a^2}\,v(a^{-1}|\mathbf{x}|). \tag{1.5}$$

The confining potentials $V$ and $V^{\perp}$ are assumed to be locally bounded and tend to $\infty$ as $|x|$ and $|z|$ tend to $\infty$. The interaction potential $v$ is assumed to be nonnegative, of finite range and with scattering length 1; the scaled potential $v_a$ then has scattering length $a$. We regard $v$, $V^{\perp}$ and $V$ as fixed and $L$, $h$, $a$ as scaling parameters. The Hamiltonian (1.3) acts on symmetric wave functions in $L^2(\mathbb{R}^{3N}, d\mathbf{x}_1\cdots d\mathbf{x}_N)$. Its ground state energy, $E^{\rm QM}(N,L,h,a)$, scales with $L$ as

$$E^{\rm QM}(N,L,h,a) = \frac{1}{L^2}\,E^{\rm QM}(N,1,h/L,a/L). \tag{1.6}$$

Taking $N \to \infty$ but keeping $h/L$ and $Na/L$ fixed leads to a three dimensional Gross-Pitaevskii description of the ground state as proved in [13]. The corresponding energy functional is

$$\mathcal{E}^{\rm GP}_{3D}[\phi] = \int_{\mathbb{R}^3}\Bigl\{|\nabla\phi(\mathbf{x})|^2 + V_{L,h}(\mathbf{x})|\phi(\mathbf{x})|^2 + 4\pi Na|\phi(\mathbf{x})|^4\Bigr\}\,d^3\mathbf{x} \tag{1.7}$$

and the energy per particle is

$$E^{\rm GP}_{3D}(N,L,h,a)/N = \inf\Bigl\{\mathcal{E}^{\rm GP}_{3D}[\phi] : \int|\phi(\mathbf{x})|^2\,d^3\mathbf{x} = 1\Bigr\} = (1/L^2)\,E^{\rm GP}_{3D}(1,1,h/L,Na/L). \tag{1.8}$$

By Theorem 1.1 in [13], we have, for fixed $h/L$ and $Na/L$,

$$\lim_{N\to\infty}\frac{E^{\rm QM}(N,L,h,a)}{E^{\rm GP}_{3D}(N,L,h,a)} = 1. \tag{1.9}$$

It is important to note, however, that the estimates in [13] are not uniform in the ratio $h/L$ and the question what happens if $h/L \to 0$ is not addressed in that paper. It will be shown in the next section that a part of the parameter range for thin traps can be treated by considering, at fixed $Na/h$, the $h/L \to 0$ limit of $E^{\rm GP}_{3D}(1,1,h/L,Na/L)$, with the ground state energy for the transverse motion, $\sim 1/h^2$, subtracted. But this limit can evidently never lead to a logarithmic dependence on the density and it does not give the correct limit formula for the energy in the whole parameter range.

To cover all cases we have to consider a two-dimensional Gross-Pitaevskii theory of the type studied in [14], i.e.,

$$\mathcal{E}^{\rm GP}_{2D}[\varphi] = \int_{\mathbb{R}^2}\Bigl\{|\nabla\varphi(x)|^2 + V_L(x)|\varphi(x)|^2 + 4\pi Ng|\varphi(x)|^4\Bigr\}\,d^2x \tag{1.10}$$

with

$$g = |\ln(\bar\rho\,a_{2D}^2)|^{-1}. \tag{1.11}$$

Here $\bar\rho$ is the mean density, defined as in Eq. (1.6) in [14]. An explicit formula that is valid in the case $Ng \gg 1$ can be stated as follows. Let

$$\rho^{\rm TF}_N(x) = \frac{1}{8\pi}\bigl[\mu^{\rm TF}_N - V_L(x)\bigr]_+ \tag{1.12}$$

be the 'Thomas Fermi' density for particles at coupling constant 1 in the potential $V_L$, where $\mu^{\rm TF}_N$ is chosen so that $\int\rho^{\rm TF}_N = N$. Then

$$\bar\rho = N^{-1}\int_{\mathbb{R}^2}\rho^{\rm TF}_N(x)^2\,dx. \tag{1.13}$$

For simplicity we shall assume that $V$ is homogeneous of some degree $p > 0$, i.e., $V(\lambda x) = \lambda^p V(x)$, and in this case $\bar\rho \sim N^{p/(p+2)}/L^2 = N/\bar L^2$ with

$$\bar L = N^{1/(p+2)}L. \tag{1.14}$$
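As a sanity check of the scaling of the mean density, here is a minimal numerical sketch for one concrete trap. The choice $V(x) = |x|^2$ (so $p = 2$ and $V_L(x) = |x|^2/L^4$), the function name and the midpoint-rule quadrature are assumptions made for illustration only. For this trap the normalization condition gives $\mu^{\rm TF}_N = 4\sqrt{N}/L^2$, and the mean density evaluates to $\sqrt{N}/(3\pi L^2)$, in line with $\bar\rho \sim N^{p/(p+2)}/L^2$.

```python
import math

def tf_mean_density(N, L, steps=20000):
    """Particle number and mean density for the assumed trap V_L(x) = |x|^2 / L^4."""
    mu = 4.0 * math.sqrt(N) / L**2        # closed form for this trap: density integrates to N
    R = math.sqrt(mu) * L**2              # radius at which mu - V_L vanishes
    dr = R / steps
    total = 0.0                           # integral of the Thomas-Fermi density
    total_sq = 0.0                        # integral of its square
    for k in range(steps):
        r = (k + 0.5) * dr                # midpoint rule in the radial variable
        rho = (mu - r**2 / L**4) / (8.0 * math.pi)
        total += rho * 2.0 * math.pi * r * dr
        total_sq += rho**2 * 2.0 * math.pi * r * dr
    return total, total_sq / N
```

Calling `tf_mean_density(N, L)` returns the particle number recovered by quadrature together with the mean density. With $\bar L = N^{1/4}L$ for this trap, the combination $\bar\rho\,\bar L^2/N$ does not depend on $N$ or $L$.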

The length $\bar L$ measures the effective extension of the gas cloud of the $N$ particles in the two-dimensional trap. A box potential corresponds to $L = \bar L$, i.e., $p = \infty$ and hence $\bar\rho \sim N/L^2$.

The case $Ng = O(1)$ requires a closer look at the definition of $\bar\rho$. First, for any value of $Ng$ we can consider the minimizer $\varphi^{\rm GP}_{Ng}$ of (1.10) with normalization $\|\varphi^{\rm GP}_{Ng}\|_2 = 1$. The corresponding mean density is

$$\bar\rho_{Ng} = N\int|\varphi^{\rm GP}_{Ng}|^4. \tag{1.15}$$

A general definition of $\bar\rho$ amounts to solving the equation $\bar\rho = \bar\rho_{Ng}$ with $g$ as in (1.11). As discussed in [14] this gives the same result as (1.13) to leading order in $g$ when $Ng \gg 1$. In the case $|\ln Nh/L| \ll h/a$ (referred to as 'Region I' below) the coupling constant is simply $a/h$ and thus independent of $N$. Moreover, in a homogeneous potential of degree $p$ the effective length scale $\bar L$ is $\sim (Ng+1)^{1/(p+2)}L$ and thus of order $L$ if $Ng = O(1)$.

The energy per particle corresponding to (1.10) is

$$E^{\rm GP}_{2D}(N,L,g)/N = \inf\Bigl\{\mathcal{E}^{\rm GP}_{2D}[\varphi] : \int|\varphi(x)|^2\,d^2x = 1\Bigr\} = (1/L^2)\,E^{\rm GP}_{2D}(1,1,Ng). \tag{1.16}$$

Let $s_h$ be the normalized ground state wave function of the one-particle Hamiltonian $-d^2/dz^2 + V_h^{\perp}(z)$. It can be written as $s_h(z) = h^{-1/2}s(h^{-1}z)$ and the ground state energy as $e_h^{\perp} = h^{-2}e^{\perp}$, where $s(z)$ and $e^{\perp}$ are, respectively, the ground state wave function and ground state energy of $-d^2/dz^2 + V^{\perp}(z)$. We define the two dimensional scattering length by the formula

$$a_{2D} = h\exp\Bigl(-\Bigl(\int s(z)^4\,dz\Bigr)^{-1}h/2a\Bigr). \tag{1.17}$$

Then, using (1.11),

$$g = \Bigl|-\ln(\bar\rho h^2) + \Bigl(\int s(z)^4\,dz\Bigr)^{-1}h/a\Bigr|^{-1}. \tag{1.18}$$
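The interplay between the two terms in the coupling constant can be made concrete with a small numerical sketch. It assumes a harmonic transverse potential $V^{\perp}(z) = z^2$, for which $s(z) = \pi^{-1/4}e^{-z^2/2}$ and $\int s^4 = 1/\sqrt{2\pi}$; the function names and the parameter values in the usage note are hypothetical.

```python
import math

# Integral of s^4 for the assumed Gaussian s(z) = pi**(-1/4) * exp(-z*z/2).
S4_GAUSS = 1.0 / math.sqrt(2.0 * math.pi)

def a2d(h, a, s4=S4_GAUSS):
    """Two-dimensional scattering length: h * exp(-h / (2 * a * s4))."""
    return h * math.exp(-h / (2.0 * a * s4))

def coupling_g(rho_bar, h, a, s4=S4_GAUSS):
    """Coupling constant: 1 / | -ln(rho_bar * h^2) + h / (a * s4) |."""
    return 1.0 / abs(-math.log(rho_bar * h * h) + h / (a * s4))

def g_from_a2d(rho_bar, h, a, s4=S4_GAUSS):
    """Same quantity computed as 1 / |ln(rho_bar * a2d^2)|, as a consistency check."""
    return 1.0 / abs(math.log(rho_bar * a2d(h, a, s4) ** 2))
```

For instance, with $h = 1$, $a = 0.02$ and $\bar\rho = 10^{-3}$ the term $h/a$ dominates and `coupling_g` is close to `S4_GAUSS * a / h`, whereas with $h = 1$, $a = 0.1$ and $\bar\rho = 10^{-200}$ the logarithm dominates and `coupling_g` is close to $|\ln(\bar\rho h^2)|^{-1}$, essentially independent of $a$.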

The justification of the definition (1.17) is Theorem 1.1 below.

Remarks. Since $a_{2D}$ appears only under a logarithm, and $a/h \to 0$, one could, at least as far as leading order computations are concerned, equally well define the two dimensional scattering length as $a'_{2D} = b\exp\bigl(-(\int s(z)^4\,dz)^{-1}h/2a\bigr)$ with $b$ satisfying $c\,a \le b \le C\,h$ for some constants $c > 0$, $C < \infty$. In fact, if $g' = |\ln(\bar\rho\,(a'_{2D})^2)|^{-1}$, then

$$\frac{g}{g'} = 1 + \frac{2\ln(b/h)}{|-\ln(\bar\rho h^2) + (\mathrm{const.})h/a|} \to 1 \tag{1.19}$$

because $(a/h)\ln(b/h) \to 0$.

We can now state the limit theorem for the ground state energy:

Theorem 1.1. (From 3D to 2D, ground state energy). Let $N \to \infty$ and at the same time $h/L \to 0$ and $a/h \to 0$ in such a way that $h^2\bar\rho g \to 0$ (with $g$ given by Eq. (1.18)). Then

$$\lim\frac{E^{\rm QM}(N,L,h,a) - Nh^{-2}e^{\perp}}{E^{\rm GP}_{2D}(N,L,g)} = 1. \tag{1.20}$$

Remarks. 1. The condition $h^2\bar\rho g \to 0$ means that the ground state energy $h^{-2}e^{\perp}$ associated with the confining potential in the z-direction is much larger than the energy $\bar\rho g$. This is the condition of strong confinement in the z-direction. In the case that $h/a \gg |\ln(\bar\rho h^2)|$ we have $g \sim a/h$ and hence the condition in that region is equivalent to

$$\bar\rho a h \ll 1. \tag{1.21}$$

On the other hand, if $h/a \lesssim |\ln(\bar\rho h^2)|$ the strong confinement condition is equivalent to $h^2\bar\rho\,|\ln(h^2\bar\rho)|^{-1} \ll 1$, which means simply that

$$\bar\rho h^2 \ll 1. \tag{1.22}$$

Both (1.21) and (1.22) clearly imply $\bar\rho a_{2D}^2 \ll 1$, i.e., the gas is dilute in the 2D sense (and also in the 3D sense, $\rho_{3D}a^3 \ll 1$, because $\rho_{3D} = \rho/h$). This is different from the situation in cigar-shaped traps considered in [15] where the gas can be either dilute or dense in the 1D sense, depending on the parameters (although it is always dilute in the 3D sense).

2. It is, in fact, not necessary to demand $h/L \to 0$ explicitly in Theorem 1.1. The reason is as follows. In the region where $h/a \lesssim |\ln(\bar\rho h^2)|$, the strong confinement condition $\bar\rho h^2 \ll 1$ immediately implies $h/L \ll 1$ because $\bar\rho \gtrsim 1/L^2$, cf. Eq. (1.14). If $h/a \gg |\ln(\bar\rho h^2)|$, then at least $\bar\rho a h \ll 1$ holds true. This leaves only the alternatives $h/L \to 0$, or, if $h/L$ stays bounded away from zero, $Na/L \to 0$. But the latter alternative means, by the three dimensional Gross-Pitaevskii limit theorem [13], that the energy converges to the energy of a noninteracting, trapped gas, for which (1.20) obviously holds true.

We shall refer to the parameter region where $h/a \gg |\ln(\bar\rho h^2)|$ as Region I, and the one where $h/a \lesssim |\ln(\bar\rho h^2)|$ as Region II. In Region I we can take

$$g = \Bigl(\int s(z)^4\,dz\Bigr)a/h. \tag{1.23}$$

In Region II $g \sim |\ln(\bar\rho h^2)|^{-1}$, and in the extreme case that $h/a \ll |\ln(\bar\rho h^2)|$,

$$g = |\ln(\bar\rho h^2)|^{-1}. \tag{1.24}$$

In particular $g$ is then independent of $a$ (but dependent on $\bar\rho$). As remarked earlier, Region II applies only to very dilute gases since it requires interparticle distances $\bar\rho^{-1/2} \gtrsim h e^{h/a}$.

By Eq. (1.16) the relevant coupling parameter is $Ng$ rather than $g$ itself, and both Region I and Region II can be divided further, according to $Ng \ll 1$, $Ng \sim 1$, or $Ng \gg 1$. The case $Ng \ll 1$ corresponds simply to an ideal gas in the external trap potential. Note that this limit can both be reached from Region I by taking $a/h \to 0$ at fixed $\bar\rho h^2$, or from Region II by letting $\bar\rho h^2$ tend more rapidly to zero than $e^{-h/a}$. The case $Ng \sim 1$ in Region I corresponds to a GP theory with coupling parameter $\sim Na/h$ as was already explained, in particular after Eq. (1.9). The case $Ng \gg 1$ is the 'Thomas-Fermi' case where the gradient term in the energy functional (1.10) can be ignored. In Region II, the case $Ng \lesssim 1$ requires $\bar\rho^{-1/2} \gtrsim h e^{N}$ and is thus mainly of academic interest, while $\bar\rho^{-1/2} \ll h e^{N}$ (but still $h e^{h/a} \lesssim \bar\rho^{-1/2}$) corresponds to the TF case.

The subdivision of the parameter range just described is somewhat different from the situation described in [15]. The reason is the different form of the energy per particle of the low dimensional gas as function of the density.

Theorem 1.1 is a limit theorem for the ground state energy. By standard arguments (variation with respect to the trapping potential) it implies a convergence result for the one particle density in the ground state, $\rho^{\rm QM}_{N,L,h,a}(\mathbf{x})$ (cf. Sect. 2 in [15] and Theorem 8.2 in [17]): Define the 2D QM density by integrating over the transverse variable $z$, i.e.,

$$\hat\rho^{\rm QM}_{N,L,h,a}(x) := \int\rho^{\rm QM}_{N,L,h,a}(x,z)\,dz. \tag{1.25}$$

With $\bar L$ the extension of the system in the 2D trap, cf. (1.14), define the rescaled GP density $\tilde\rho$ by

$$\tilde\rho(x) = \bar L^2\,|\varphi^{\rm GP}(\bar L x)|^2, \tag{1.26}$$

where $\varphi^{\rm GP}$ is the minimizer of (1.10) with normalization $\int|\varphi^{\rm GP}(x)|^2\,dx = 1$. (Note that $\tilde\rho$ depends on $N$, $L$ and $g$.)


Theorem 1.2. (2D limit for the density). In the same limit as considered in Theorem 1.1,

$$\lim\Bigl(\frac{\bar L^2}{N}\,\hat\rho^{\rm QM}_{N,L,h,a}(\bar L x) - \tilde\rho(x)\Bigr) = 0 \tag{1.27}$$

in weak $L^1$ sense.

In the GP case where the coupling parameter $Ng$ stays bounded a much stronger result can be proved, namely convergence of the 1-particle density matrix and Bose-Einstein condensation (BEC) in the ground state. Recall that the one-body density matrix obtained from the ground state wave function $\Psi_0$ is

$$\gamma_0(\mathbf{x},\mathbf{x}') = N\int_{\mathbb{R}^{3(N-1)}}\Psi_0(\mathbf{x},\mathbf{x}_2,\ldots,\mathbf{x}_N)\,\Psi_0(\mathbf{x}',\mathbf{x}_2,\ldots,\mathbf{x}_N)^*\,d\mathbf{x}_2\cdots d\mathbf{x}_N. \tag{1.28}$$

BEC means that in the $N \to \infty$ limit it factorizes as $N\psi(\mathbf{x})\psi(\mathbf{x}')$ for some normalized $\psi$. This, in fact, is 100% condensation and was proved in [12] for a fixed trap potential in the Gross-Pitaevskii limit, i.e., for both $h/L$ and $Na/L$ fixed as $N \to \infty$. Here we extend this result to the case $h/L \to 0$ with $Ng$ and $L$ fixed. The function $\psi$ is the minimizer of the 2D GP functional (1.10) times the transverse function $s_h(z)$:

Theorem 1.3. (BEC in GP limit). If $N \to \infty$, $h/L \to 0$ while $Ng$ and $L$ are fixed, then

$$\frac{h}{N}\,\gamma_0(x,hz;\,x',hz') \to \varphi^{\rm GP}(x)\,\varphi^{\rm GP}(x')\,s(z)\,s(z') \tag{1.29}$$

in trace norm. Here $\varphi^{\rm GP}$ is the normalized minimizer of the GP functional (1.10).

There is a variant of Theorem 1.1 that applies to the thermodynamic limit in the 2D variable $x$ where the density becomes homogeneous in this variable. Let $E^{\rm box}(N,L,a)$ be the ground state energy of the Hamiltonian (1.3) with the potential $V_L(x)$ replaced by a 2D box of side length $L$ and define, for fixed $\rho$, $h$ and $a$,

$$e_{2D}(\rho,h,a) := \lim_{N,L\to\infty,\ N/L^2=\rho}\frac{E^{\rm box}(N,L,a) - Nh^{-2}e^{\perp}}{N}. \tag{1.30}$$

This is the energy per particle in the 2D thermodynamic limit (with the confining energy subtracted) and by standard arguments (cf. e.g., [30]) it is independent of the boundary conditions (Dirichlet, Neumann or periodic) imposed in the definition of $E^{\rm box}(N,L,a)$. We then have

Theorem 1.4. (From 3D to 2D, homogeneous case). If $h^2\rho g \to 0$ and $a/h \to 0$, with $g$ given by Eq. (1.18), then

$$\lim\frac{e_{2D}(\rho,h,a)}{4\pi\rho g} = 1. \tag{1.31}$$

The essentials of the dimensional reduction are already contained in the proof of Theorem 1.4 and are more transparent in this case than for inhomogeneous gases in Theorem 1.1. Hence we shall focus on the proof of Theorem 1.4, while the modifications that have to be made for a proof of Theorem 1.1 will be more briefly described with appropriate references to [13–15, 36], where analogous problems are discussed in some detail.


An abbreviated version of this paper appears as Chapter 9 in the Oberwolfach Seminars volume [17].

2. The 2D Limit of 3D GP Theory

As in [15], certain aspects of the dimensional reduction of the many-body system can be seen already in the much simpler context of GP theory. We therefore begin by considering the $h/L \to 0$ limit of the 3D GP ground state energy. The result is, apart from the confining energy, the 2D GP energy with coupling constant $g \sim a/h$. This shows in particular that Region II, where $g \sim |\ln(\bar\rho h^2)|^{-1}$, cannot be reached as a limit of 3D GP theory.

Theorem 2.1. (2D limit of 3D GP energy). Define $g = \bigl(\int s(z)^4\,dz\bigr)\,a/h$. If $h/L \to 0$, then

$$\frac{E^{\rm GP}_{3D}(N,L,h,a) - Nh^{-2}e^{\perp}}{E^{\rm GP}_{2D}(N,L,g)} \to 1 \tag{2.1}$$

uniformly in the parameters, as long as $\bar\rho a h \to 0$.

Remarks. Since $E^{\rm GP}_{2D}(1,L,Ng) \sim L^{-2} + \bar\rho a/h$, the condition $\bar\rho a h \to 0$ is equivalent to $h^2E^{\rm GP}_{2D}(1,L,Ng) \to 0$, which means simply that the 2D GP energy per particle is much less than the confining energy, $\sim 1/h^2$.

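Proof. Because of the scaling relation (1.8) it suffices to consider the case $N = 1$ and $L = 1$. For an upper bound to the 3D GP ground state energy we make the ansatz

$$\phi(\mathbf{x}) = \varphi^{\rm GP}(x)\,s_h(z), \tag{2.2}$$

where $\varphi^{\rm GP}$ is the minimizer of the 2D GP functional with coupling constant $g$. Then

$$\mathcal{E}^{\rm GP}_{3D}[\phi] = e^{\perp}/h^2 + E^{\rm GP}_{2D}(1,1,g) \tag{2.3}$$

and hence

$$E^{\rm GP}_{3D}(1,1,h,a) - e^{\perp}/h^2 \le E^{\rm GP}_{2D}(1,1,g). \tag{2.4}$$

For the lower bound we consider the one-particle Hamiltonian (in 3D)

$$H_{h,a} = -\Delta + V_{1,h}(\mathbf{x}) + 8\pi a\,|\varphi^{\rm GP}(x)|^2 s_h(z)^2. \tag{2.5}$$

Taking the 3D GP minimizer $\Phi$ as a test state gives

$$\begin{aligned}
\inf{\rm spec}\,H_{h,a} &\le E^{\rm GP}_{3D}(1,1,h,a) - 4\pi a\int_{\mathbb{R}^3}|\Phi(\mathbf{x})|^4\,d^3\mathbf{x} + 8\pi a\int_{\mathbb{R}^3}|\varphi^{\rm GP}(x)|^2 s_h(z)^2|\Phi(\mathbf{x})|^2\,d^3\mathbf{x}\\
&\le E^{\rm GP}_{3D}(1,1,h,a) + 4\pi a\int_{\mathbb{R}^3}|\varphi^{\rm GP}(x)|^4 s_h(z)^4\,d^3\mathbf{x}\\
&= E^{\rm GP}_{3D}(1,1,h,a) + 4\pi g\int_{\mathbb{R}^2}|\varphi^{\rm GP}(x)|^4\,dx.
\end{aligned} \tag{2.6}$$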


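To bound $H_{h,a}$ from below we consider first for fixed $x \in \mathbb{R}^2$ the Hamiltonian (in 1D)

$$H_{h,a,x} = -\partial_z^2 + V_h^{\perp}(z) + 8\pi a\,|\varphi^{\rm GP}(x)|^2 s_h(z)^2. \tag{2.7}$$

We regard $-\partial_z^2 + V_h^{\perp}(z)$ as its "free" part and $8\pi a|\varphi^{\rm GP}(x)|^2 s_h(z)^2$ as a perturbation. Since the perturbation is positive all eigenvalues of $H_{h,a,x}$ are at least as large as those of $-\partial_z^2 + V_h^{\perp}(z)$; in particular the first excited eigenvalue is $\sim 1/h^2$. The expectation value in the ground state $s_h$ of the free part is

$$\langle H_{h,a,x}\rangle = e^{\perp}/h^2 + 8\pi g\,|\varphi^{\rm GP}(x)|^2. \tag{2.8}$$

Temple's inequality [37, 29] gives

$$H_{h,a,x} \ge \bigl(e^{\perp}/h^2 + 8\pi g|\varphi^{\rm GP}(x)|^2\bigr)\Bigl(1 - \frac{\langle(H_{h,a,x} - \langle H_{h,a,x}\rangle)^2\rangle}{\langle H_{h,a,x}\rangle\,(\tilde e^{\perp} - e^{\perp})/h^2}\Bigr), \tag{2.9}$$

where $\tilde e^{\perp}/h^2$ is the lowest eigenvalue above the ground state energy of $-\partial_z^2 + V_h^{\perp}(z)$. Since

$$H_{h,a,x}\,s_h = (e^{\perp}/h^2)\,s_h + 8\pi a\,|\varphi^{\rm GP}(x)|^2 s_h^3 \tag{2.10}$$

we have $(H_{h,a,x} - \langle H_{h,a,x}\rangle)s_h = 8\pi|\varphi^{\rm GP}(x)|^2(a s_h^3 - g s_h)$ and hence, using $g = a\int s_h^4 = (a/h)\int s^4$,

$$\begin{aligned}
\langle(H_{h,a,x} - \langle H_{h,a,x}\rangle)^2\rangle &= (8\pi)^2|\varphi^{\rm GP}(x)|^4\int\bigl(a s_h(z)^3 - g s_h(z)\bigr)^2\,dz\\
&\le (8\pi)^2\,\|\varphi^{\rm GP}\|_\infty^4\,(a/h)^2\Bigl[\int s^6 - \Bigl(\int s^4\Bigr)^2\Bigr]\\
&\le \mathrm{const.}\ E^{\rm GP}_{2D}(1,1,g)^2,
\end{aligned} \tag{2.11}$$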

where we have used Lemma 2.1 in [14] to bound the term $g\|\varphi^{\rm GP}\|_\infty^2$ by $\mathrm{const.}\ E^{\rm GP}_{2D}(1,1,g)$. We thus see from (2.8) and the assumption $h^2E^{\rm GP}_{2D}(1,1,g) \to 0$ that the error term in the Temple inequality (2.9) is $o(1)$. Now $H_{h,a} = -\Delta_x + V(x) + H_{h,a,x}$, so from (2.9) we conclude that

$$H_{h,a} \ge \bigl((e^{\perp}/h^2) - \Delta_x + V(x) + 8\pi g|\varphi^{\rm GP}(x)|^2\bigr)(1 - o(1)). \tag{2.12}$$

On the other hand, the lowest energy of $-\Delta_x + V(x) + 8\pi g|\varphi^{\rm GP}(x)|^2$ is just $E^{\rm GP}_{2D}(1,1,g) + 4\pi g\int_{\mathbb{R}^2}|\varphi^{\rm GP}(x)|^4\,dx$. Combining (2.6) and (2.12) we thus get

$$E^{\rm GP}_{3D}(1,1,h,a) - e^{\perp}/h^2 \ge E^{\rm GP}_{2D}(1,1,g)(1 - o(1)). \tag{2.13}$$
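The variance computation entering the Temple estimate (2.11) can be checked numerically. The sketch below is an illustration only: it assumes a harmonic transverse potential $V^{\perp}(z) = z^2$, so that $s(z) = \pi^{-1/4}e^{-z^2/2}$, and uses a hand-written trapezoidal rule; the function names and sample parameters are hypothetical. It compares $\int(a s_h^3 - g s_h)^2\,dz$ with $(a/h)^2[\int s^6 - (\int s^4)^2]$, where $g = a\int s_h^4$.

```python
import math

def s(z):
    """Assumed transverse ground state for V_perp(z) = z^2 (Gaussian, L2-normalized)."""
    return math.pi ** -0.25 * math.exp(-0.5 * z * z)

def integrate(f, lo=-12.0, hi=12.0, n=48000):
    """Composite trapezoidal rule on [lo, hi]."""
    dz = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * dz)
    return total * dz

def variance_integral(a, h):
    """Return both sides of the identity behind the middle line of (2.11)."""
    s_h = lambda z: h ** -0.5 * s(z / h)            # rescaled ground state
    g = a * integrate(lambda z: s_h(z) ** 4, -12.0 * h, 12.0 * h)
    lhs = integrate(lambda z: (a * s_h(z) ** 3 - g * s_h(z)) ** 2,
                    -12.0 * h, 12.0 * h)
    rhs = (a / h) ** 2 * (integrate(lambda z: s(z) ** 6)
                          - integrate(lambda z: s(z) ** 4) ** 2)
    return lhs, rhs
```

For the Gaussian the bracket has the closed form $1/(\pi\sqrt{3}) - 1/(2\pi)$, which is strictly positive, so the variance vanishes only through the prefactor $(a/h)^2$.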


3. Upper Bound

3.1. Finite n bounds. We now turn to the many-body problem, i.e., the proof of Theorems 1.1 and 1.4. As in [15], the key lemmas are energy bounds in boxes with finite particle number. The bounds for the total system are obtained by distributing the particles optimally among the boxes. We start with the upper bound for the energy in a single box. Consider the Hamiltonian
$$H = \sum_{i=1}^{n}\left(-\Delta_i + V_h^\perp(z_i)\right) + \sum_{1\le i<j\le n} v_a(|\mathbf{x}_i - \mathbf{x}_j|) \tag{3.1}$$
in a region $\Lambda = \Lambda_2 \times \mathbb{R}$, where $\Lambda_2$ denotes a box of side length $\ell$ in the 2D $x$ variables. For the upper bound on the ground state energy of (3.1) we impose Dirichlet boundary conditions on the 2D Laplacian. The goal is to prove, for a given 2D density $\rho$ and parameters $a$ and $h$, that for a suitable choice of $\ell$ and corresponding particle number $n = \rho\ell^2$ the energy per particle, with the confining energy $e^\perp/h^2$ subtracted, is bounded above by
$$4\pi\rho\,|\ln(\rho a_{2D}^2)|^{-1}(1 + o(1)), \tag{3.2}$$

where $a_{2D}$ is given by Eq. (1.17). Moreover, the Dirichlet localization energy per particle, $\sim 1/\ell^2$, should be small compared to (3.2). The relative error, $o(1)$, in (3.2) tends to zero with the small parameters $a/h$ and $\rho a h$ (Region I), or $a/h$ and $\rho h^2$ (Region II). The choice of variational functions depends on the parameter regions and we are first concerned with Region II, i.e., the case $|\ln(\rho h^2)| \gtrsim h/a$.

3.1.1. Upper bound in Region II. Let $f_0(r)$ be the solution of the zero energy scattering equation
$$-\Delta f_0 + \tfrac12 v_a f_0 = 0, \tag{3.3}$$
normalized so that $f_0(r) = (1 - a/r)$ for $r \ge R_0$ with $R_0$ the range of $v_a$. Note that $R_0 = (\text{const.})\,a$ by the scaling (1.5). The function $f_0$ satisfies $0 \le f_0(r) \le 1$ and $0 \le f_0'(r) \le \min\{1/r, a/r^2\}$. This is seen by writing $f_0(r) = u(r)/r$ and $f_0'(r) = u'(r)/r - u(r)/r^2$ with $u''(r) = \tfrac12 v(r)u(r) \ge 0$. Since $u(r) = r - a$ for $r \ge R_0$ and $u(0) = 0$, convexity implies $u(r) \ge \max\{0, r - a\}$ and $u'(r) \le 1$. Hence $0 \le f_0'(r) \le \min\{1/r, a/r^2\}$ and $0 \le f_0(r) \le \lim_{r\to\infty}(1 - a/r) = 1$. For $R > R_0$ we define
$$f(r) = f_0(r)/(1 - a/R) \ \text{ for } 0 \le r \le R, \quad\text{and}\quad f(r) = 1 \ \text{ for } r > R. \tag{3.4}$$

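We define a two-dimensional potential by
$$W(x) = \frac{2\|s\|_4^4}{h}\int_{\mathbb{R}}\left[f'(|\mathbf{x}|)^2 + \tfrac12 v_a(|\mathbf{x}|)f(|\mathbf{x}|)^2\right]dz, \qquad \mathbf{x} = (x, z). \tag{3.5}$$
Clearly, $W(x) \ge 0$, and $W$ is rotationally symmetric with $W(x) = 0$ for $|x| \ge R$. Moreover, by partial integration, using (3.3), it follows that $W \in L^1(\mathbb{R}^2)$ with
$$\int_{\mathbb{R}^2} W(x)\,d^2x = \frac{8\pi a\|s\|_4^4}{h}\,(1 - a/R)^{-1}. \tag{3.6}$$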

Bosons in Disc-Shaped Traps: From 3D to 2D


Define, for $b > R$,
$$\varphi(r) = \begin{cases} \ln(R/a_{2D})/\ln(b/a_{2D}) & \text{if } 0 \le r \le R \\ \ln(r/a_{2D})/\ln(b/a_{2D}) & \text{if } R \le r \le b \\ 1 & \text{if } b \le r. \end{cases} \tag{3.7}$$
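As a small illustration of the profile (3.7), the following Python sketch evaluates $\varphi$ and can be used to check that it is continuous at $r = R$ and $r = b$ and takes values between 0 and 1 when $a_{2D} < R < b$. The function name, argument names and the sample values are hypothetical choices for illustration only.

```python
import math

def phi(r, big_r, b, a2d):
    """Piecewise logarithmic profile (3.7); assumes 0 < a2d < big_r < b."""
    if not (0.0 < a2d < big_r < b):
        raise ValueError("expected 0 < a2d < R < b")
    norm = math.log(b / a2d)
    if r <= big_r:
        return math.log(big_r / a2d) / norm   # constant inner value
    if r <= b:
        return math.log(r / a2d) / norm       # logarithmic interpolation
    return 1.0                                # outer value

# Hypothetical sample parameters: a2d = 1e-3, R = 0.1, b = 10.
samples = [phi(r, 0.1, 10.0, 1e-3) for r in (0.0, 0.05, 0.1, 1.0, 10.0, 20.0)]
```

The sampled values are non-decreasing in r, which reflects $\varphi' \ge 0$.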

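As test function for the three dimensional Hamiltonian (3.1) we shall take
$$\Psi(\mathbf{x}_1, \dots, \mathbf{x}_n) = F(\mathbf{x}_1, \dots, \mathbf{x}_n)\,G(\mathbf{x}_1, \dots, \mathbf{x}_n) \tag{3.8}$$
with
$$F(\mathbf{x}_1, \dots, \mathbf{x}_n) = \prod_{i<j} f(|\mathbf{x}_i - \mathbf{x}_j|) \quad\text{and}\quad G(\mathbf{x}_1, \dots, \mathbf{x}_n) = \prod_{i<j}\varphi(|x_i - x_j|)\prod_{k=1}^{n} s_h(z_k). \tag{3.9}$$
The parameters $R$, $b$ and also $\ell$ will eventually be chosen so that the errors compared to the expected leading term in the energy are small. As it stands, the function (3.8) does not satisfy Dirichlet boundary conditions, but this can be taken care of by multiplying the function with additional factors at energy cost $\sim 1/\ell^2$ per particle, which will turn out to be small compared to the energy of (3.8). Since $f(|\mathbf{x}_i - \mathbf{x}_j|)\varphi(|x_i - x_j|) = 1$ for $|x_i - x_j| \ge b$ and $s_h$ is normalized, the norm of $\Psi$ can be estimated as in Eq. (3.9) in [15],
$$\langle\Psi|\Psi\rangle \ge \ell^{2n}\left(1 - \frac{\pi n(n-1)}{2}\,\frac{b^2}{\ell^2}\right). \tag{3.10}$$
Next we consider the expectation value of $H$ with the wave function $\Psi$. By partial integration we have, for every $j$ (boldface $\boldsymbol{\nabla}_j$, $\boldsymbol{\Delta}_j$ denoting the three dimensional gradient and Laplacian),
$$\int|\boldsymbol{\nabla}_j(FG)|^2 = \int G^2|\boldsymbol{\nabla}_j F|^2 - \int F^2 G\,\boldsymbol{\Delta}_j G = \int G^2|\boldsymbol{\nabla}_j F|^2 - \int F^2 G\,\partial_{z_j}^2 G - \int F^2 G\,(\Delta_j G)$$
$$= \int G^2|\boldsymbol{\nabla}_j F|^2 - \int F^2 G\,\partial_{z_j}^2 G + \int F^2|\nabla_j G|^2 + 2\int F G\,(\nabla_j F)\cdot(\nabla_j G), \tag{3.11}$$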

where j and ∇ j are, respectively, the two dimensional Laplace operator and gradi ent. The term − F 2 G ∂z2j G together with Vh⊥ F 2 G 2 gives the confinement energy, (e⊥ / h 2 ) 2 . Next we consider the first and the third term in (3.11). Since 0 ≤ f ≤ 1, f ≥ 0 and sh is normalized, we have f (|xi − x j |)2 G 2 F 2 |∇ j G|2 + |∇ j F|2 G 2 ≤ |∇ j |2 + 2 j

+4

k
j

j

f (|xk − xi |) f (|xk − x j |)G 2 ,

i< j

(3.12)


where we have denoted 2

i< j

ϕ(|xi − x j |) by for short. Moreover, since 0 ≤ ϕ ≤ 1,

f (|xi − x j |)2 G 2 ≤ 2

i< j

f (|xi − x j |)2 sh (z i )2 s j (z j )2 .

(3.13)

i< j

By Young’s inequality
$$2\int_{\mathbb{R}^2} f'(|\mathbf{x}_i - \mathbf{x}_j|)^2\,s_h(z_i)^2 s_h(z_j)^2\,dz_i\,dz_j \le \frac{2\|s\|_4^4}{h}\int_{\mathbb{R}} f'(|(x_i - x_j, z)|)^2\,dz. \tag{3.14}$$

The right side gives rise to the first of the two terms in the formula (3.5) for the two dimensional potential W . The other term is provided by F 2 G 2 va (xi − x j ), using 0 ≤ f ≤ 1, 0 ≤ ϕ ≤ 1 and Young’s inequality. Altogether we obtain |H | − (ne⊥ / h 2 ) | ≤ |∇ j |2 +

i< j

n2

W (xi − x j )2 + R1 + R2

with

R1 = 2

and

R2 = 4

k

×

3

n

n2

j

n

(3.15)

F G(∇ j F) · (∇ j G)

f (|xk − xi |) f (|xk − x j |)G 2 ≤

(3.16)

2 n(n − 1)(n − 2)2(n−3) 3

f (|x1 − x2 |) f (|x2 − x3 |)sh (z 1 )2 sh (z 2 )2 sh (z 3 )2 ,

(3.17)

where 0 ≤ ϕ ≤ 1 has been used for the last inequality. The error term R1 is easily dealt with: It is zero because ϕ(r ) is constant for r ≤ R and f (r ) is constant for r ≥ R. The other term, R2 , is estimated as follows. Since f (r ) = 0 for r ≥ R we can use the Cauchy Schwarz inequality for the integration over x1 at fixed x2 to obtain f (|x1 − x2 |)sh (z 1 )2 dx1

≤

1/2

f (|x1 − x2 |) dx1

1/2

2

sh (z 1 ) dx1 4

|x1 −x2 |≤R

≤ (4π s ∞ a R 3 /3h 2 )1/2

(3.18)

with a = a(1 − a/R)−1 . The same estimate for the integration over x3 and a subsequent integration over x2 gives R2 ≤ (const.)2n n 3

a R3 . 4 h 2

(3.19)


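We need $R_2/\langle\Psi|\Psi\rangle$ to be small compared to the leading term in the energy, $\sim n^2\ell^{-2}|\ln(\rho h^2)|^{-1}$ with $\rho = n/\ell^2$. (Recall that we are in Region II where $|\ln(\rho h^2)| \gtrsim h/a$.) Moreover, the leading term should be large compared to the Dirichlet localization energy, which is $\sim n/\ell^2$. We are thus led to the conditions (the first comes from (3.10)):
$$\frac{n^2 b^2}{\ell^2} \ll 1, \qquad \frac{n a' R^3}{\ell^2 h^2}\,|\ln(\rho h^2)| \ll 1, \qquad \frac{n}{|\ln(\rho h^2)|} \gg 1, \tag{3.20}$$
which can also be written
$$\rho^2\ell^2 b^2 \ll 1, \qquad \frac{\rho a' R^3}{h^2}\,|\ln(\rho h^2)| \ll 1, \qquad \frac{|\ln(\rho h^2)|}{\rho\ell^2} \ll 1. \tag{3.21}$$
These conditions are fulfilled if we choose
$$R = h, \qquad b = \rho^{-1/2}|\ln(\rho h^2)|^{-\alpha} \tag{3.22}$$
with $\alpha > 1/2$ and
$$\rho^{-1/2}|\ln(\rho h^2)|^{1/2} \ll \ell \ll \rho^{-1/2}|\ln(\rho h^2)|^{\alpha}. \tag{3.23}$$
Note also that $n = \rho\ell^2 \gg 1$. It remains to compare
$$\langle\Psi|\Psi\rangle^{-1}\left(\sum_j\int_{\mathbb{R}^{2n}}|\nabla_j\Phi|^2 + \sum_{i<j}\int_{\mathbb{R}^{2n}} W(x_i - x_j)\,\Phi^2\right) \tag{3.24}$$
with the expected leading term of the energy, i.e., $4\pi(n^2/\ell^2)|\ln(n a_{2D}^2/\ell^2)|^{-1}$. We consider first the simplest case, i.e., $n = 2$. We have
$$\int_{\mathbb{R}^2}|\nabla\varphi|^2 = (\ln(b/a_{2D}))^{-2}\,2\pi\int_R^b\frac{dr}{r} = (\ln(b/a_{2D}))^{-2}\,2\pi\ln(b/R), \tag{3.25}$$
$$\frac12\int_{\mathbb{R}^2} W\varphi^2 = \frac{4\pi a'\|s\|_4^4}{h}\left(\frac{\ln(R/a_{2D})}{\ln(b/a_{2D})}\right)^2. \tag{3.26}$$
Inserting the formula (1.17) for $a_{2D}$ and using $R = h$, $b = \rho^{-1/2}|\ln(\rho h^2)|^{-\alpha}$ and $a' = a(1 + o(1))$ we have
$$\int_{\mathbb{R}^2}\left(|\nabla\varphi|^2 + \tfrac12 W\varphi^2\right) = 2\pi(\ln(b/a_{2D}))^{-2}\left[\ln(b/h) + (h/2a\|s\|_4^4)\right] = 4\pi|\ln(\rho a_{2D}^2)|^{-1}(1 + o(1)). \tag{3.27}$$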
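The first equality in (3.27) can be checked numerically. The following Python sketch assumes, as the bracket in (3.27) suggests, that (1.17) gives $\ln(h/a_{2D}) = h/(2a\|s\|_4^4)$, and it ignores the difference between $a'$ and $a$. With $R = h$ the sum of (3.25) and (3.26) then collapses to $2\pi/\ln(b/a_{2D})$. The function name and the sample values are hypothetical; logarithms are used throughout so that the exponentially small $a_{2D}$ is never formed.

```python
import math

def pair_energy_terms(a, h, b, s44):
    """Return (kinetic + potential, 2*pi/ln(b/a2D)) for R = h.

    Assumes ln(h/a2D) = h / (2*a*s44), where s44 stands for the fourth power
    of the L^4 norm of the shape function s."""
    log_h_over_a2d = h / (2.0 * a * s44)
    log_b_over_a2d = math.log(b / h) + log_h_over_a2d
    kinetic = 2.0 * math.pi * math.log(b / h) / log_b_over_a2d ** 2          # (3.25)
    potential = (4.0 * math.pi * a * s44 / h) * (log_h_over_a2d / log_b_over_a2d) ** 2  # (3.26)
    return kinetic + potential, 2.0 * math.pi / log_b_over_a2d

# Hypothetical sample values with a/h small and b much larger than h.
total, closed_form = pair_energy_terms(a=0.05, h=1.0, b=50.0, s44=0.4)
```

Since $2\pi/\ln(b/a_{2D}) = 4\pi/\ln(b^2/a_{2D}^2)$ and $b^2$ equals $\rho^{-1}$ up to a power of a logarithm, this is the origin of the leading term $4\pi|\ln(\rho a_{2D}^2)|^{-1}$.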

For $n > 2$ we can use the symmetry of $\Phi$ to write, using (3.27) as well as $0 \le \varphi(r) \le 1$,
$$\sum_j\int_{\Lambda_2^n}|\nabla_j\Phi|^2 + \sum_{i<j}\int_{\Lambda_2^n} W(x_i - x_j)\,\Phi^2 = n\int_{\Lambda_2^n}|\nabla_1\Phi|^2 + \frac12\,n\sum_{i=2}^{n}\int_{\Lambda_2^n} W(x_i - x_1)\,\Phi^2$$
$$\le 4\pi n^2\ell^{2(n-1)}\,|\ln(n a_{2D}^2/\ell^2)|^{-1}(1 + o(1)) + R_3 \tag{3.28}$$
with
$$R_3 = n^3\ell^{2(n-3)}\int_{\Lambda_2^3}\varphi'(|x_2 - x_1|)\,\varphi'(|x_3 - x_1|). \tag{3.29}$$
We estimate $R_3$ in the same way as (3.18), obtaining
$$R_3 \le (\text{const.})\,\ell^{2(n-2)} n^3 b^2\,(\ln(b/a_{2D}))^{-2}\,2\pi\ln(b/R). \tag{3.30}$$
The condition that $R_3$ has to be much smaller than the leading term, given by $4\pi n^2\ell^{2(n-1)}|\ln(n a_{2D}^2/\ell^2)|^{-1}$, is equivalent to
$$\frac{n b^2}{\ell^2}\,\ln(b/R) \ll 1. \tag{3.31}$$

With the choice (3.22) this holds if $\alpha > 1/2$.

3.1.2. Upper bound in Region I. In Region I the ansatz (3.8) can still be used, but this time we take $b = R$, i.e., $\varphi \equiv 1$. In this region $(a/h)|\ln(\rho h^2)| = o(1)$ and the leading term in the energy is $\sim n^2\ell^{-2}a/h$. Conditions (3.20) are now replaced by
$$\frac{n^2 R^2}{\ell^2} \ll 1, \qquad \frac{n R^3}{\ell^2 h} \ll 1, \qquad \frac{n a}{h} \gg 1, \tag{3.32}$$
where we have here used that $a' = a(1 + o(1))$, provided $R \gg a$. Note that the last condition in (3.32) means in particular that $n \gg 1$. Putting again $\rho = n/\ell^2$, (3.32) can be written as
$$\rho^2\ell^2 R^2 \ll 1, \qquad \frac{\rho R^3}{h} \ll 1, \qquad \frac{h}{\rho\ell^2 a} \ll 1. \tag{3.33}$$
By assumption, $a/h \ll 1$, but also $\rho a h \ll 1$ by the condition of strong confinement, cf. (1.21). We take
$$R = a(\rho a h)^{-\beta} \tag{3.34}$$
with $0 < \beta$, so $R \gg a$. Further restrictions come from the conditions (3.33): The first and the last of these conditions imply together that
$$\frac{h}{\rho a} \ll \ell^2 \ll \frac{1}{\rho^2 R^2}, \tag{3.35}$$
which can be fulfilled if
$$\rho a h \ll (\rho a h)^{2\beta}, \tag{3.36}$$
i.e., if $\beta < \tfrac12$. Note that this implies in particular $R \ll \rho^{-1/2}$. We can then take
$$\ell = \rho^{-1/2}(h/a)^{1/2}(\rho a h)^{-\gamma} \tag{3.37}$$
with
$$0 < \gamma < \frac{1 - 2\beta}{2}. \tag{3.38}$$


The second of the conditions (3.33) requires that
$$\frac{\rho R^3}{h} = (\rho a h)(a/h)^2(\rho a h)^{-3\beta} \ll 1, \tag{3.39}$$
which holds in any case if $\beta \le 1/3$. A possible choice satisfying all conditions is
$$\beta = \frac13, \qquad \gamma = \frac{1}{12}. \tag{3.40}$$
The error terms (3.33) are then bounded by $(\rho a h)^{1/6}$ (first and third term) and $(a/h)^2$ (second term). Finally, with $\Phi \equiv 1$, Eqs. (3.16), (3.10) and (3.6) give
$$\langle\Psi|\Psi\rangle^{-1}\langle\Psi|H|\Psi\rangle - (n e^\perp/h^2) \le \frac{4\pi n^2}{\ell^2}\,\frac{a\|s\|_4^4}{h}\,(1 + o(1)). \tag{3.41}$$
This completes the proof of the upper bound in boxes with finite $n$.
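The exponent bookkeeping behind (3.34)–(3.40) can be verified with exact rational arithmetic. The following Python sketch substitutes $R = a(\rho a h)^{-\beta}$ and $\ell = \rho^{-1/2}(h/a)^{1/2}(\rho a h)^{-\gamma}$ into the three conditions (3.33) and returns the resulting powers of $\rho a h$. The function name is a hypothetical helper; the extra factor $(a/h)^2$ in the second condition is noted in a comment rather than tracked.

```python
from fractions import Fraction

def region_one_exponents(beta, gamma):
    """Powers of (rho*a*h) in the three conditions (3.33)."""
    first = 1 - 2 * beta - 2 * gamma   # rho^2 ell^2 R^2 = (rho a h)^first
    second = 1 - 3 * beta              # rho R^3 / h = (a/h)^2 (rho a h)^second
    third = 2 * gamma                  # h / (rho ell^2 a) = (rho a h)^third
    return first, second, third

beta, gamma = Fraction(1, 3), Fraction(1, 12)   # the choice (3.40)
first, second, third = region_one_exponents(beta, gamma)
gamma_upper_limit = (1 - 2 * beta) / 2          # right side of (3.38)
```

With the choice (3.40) the first and third exponents equal 1/6 and the second is 0, so that the second error term is exactly $(a/h)^2$, in agreement with the statement following (3.40).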

3.2. Global bound, uniform case. The upper bound for the energy per particle in the 2D thermodynamic limit needed for Theorem 1.4 is now obtained by dividing $\mathbb{R}^2$ into Dirichlet boxes with side length $\ell$ satisfying (3.23) in Region II, or (3.35) in Region I, and distributing the particles evenly among the boxes. In other words, the trial wave function in a large box of side length $L$ is
$$\Psi = \prod_\alpha\Psi_\alpha, \tag{3.42}$$
where $\alpha$ labels the boxes of side length $\ell$ contained in the large box, and $\Psi_\alpha$ is the Dirichlet ground state wave function for $n = \rho\ell^2$ particles in the box $\alpha$, with $\rho = N/L^2$. The choice of $\ell$ guarantees in particular that the error associated with the Dirichlet localization is negligible. In order to avoid contributions from the interaction between particles in different boxes, the boxes should be separated by the range, $R_0$, of the interaction potential, and in the “corridors” between the boxes the wave function is put equal to zero. The number of particles in each box is then not exactly $\rho\ell^2$, but rather the smallest integer larger than $\rho\ell^2(\ell/(\ell + R_0))^2$. In order that the errors are negligible one needs $R_0/\ell = o(1)$, as well as $\rho\ell^2 \gg 1$, and both are guaranteed by the choice (3.37) of $\ell$.

3.3. Global bound in a trap. In Theorem 1.1 the system is inhomogeneous in 2D because of the trapping potential $V_L(x)$, and the distribution of the particles among the boxes has to be adjusted to the density given by the GP minimizer $\varphi^{GP}(x)$. As long as $\ell$, given by (3.37) or (3.23), is small compared to $L$ this can be done in an analogous way as in [15], Sect. 4.2. Since, however, the condition $\ell \ll L$ requires $N a/h \gg (\bar\rho a h)^{-1/12}$ in Region I, and $N/|\ln(\bar\rho h^2)| \gg |\ln(\bar\rho h^2)|^{\varepsilon}$ in Region II, this will not work in the whole parameter range. A better and only slightly more complicated choice that works in all cases is to replace the Jastrow-type function $\prod_{i<j} f(|\mathbf{x}_i - \mathbf{x}_j|)$ by a Dyson wave function of the type considered in [5] and [13], i.e.,
$$F(\mathbf{x}_1, \dots, \mathbf{x}_N) = \prod_{i=1}^{N} f(t_i(\mathbf{x}_1, \dots, \mathbf{x}_i)), \tag{3.43}$$
where
$$t_i(\mathbf{x}_1, \dots, \mathbf{x}_i) = \min_{1\le j\le i-1}|\mathbf{x}_i - \mathbf{x}_j| \tag{3.44}$$

is the distance between $\mathbf{x}_i$ and its nearest neighbor among the points $\mathbf{x}_1, \dots, \mathbf{x}_{i-1}$. To take care of the inhomogeneity in the 2D variables a factor $\prod_i\varphi^{GP}(x_i)$ is included, where $\varphi^{GP}$ is the normalized minimizer of the 2D GP functional with coupling constant $Ng$. Moreover, the local behavior in the 2D variables is modelled by a Dyson wave function
$$\Phi(x_1, \dots, x_N) = \prod_{i=1}^{N}\varphi(t_i(x_1, \dots, x_i)) \tag{3.45}$$
with
$$t_i(x_1, \dots, x_i) = \min_{1\le j\le i-1}|x_i - x_j|. \tag{3.46}$$
The global trial wave function is thus of the form
$$\Psi(\mathbf{x}_1, \dots, \mathbf{x}_N) = F(\mathbf{x}_1, \dots, \mathbf{x}_N)\,G(\mathbf{x}_1, \dots, \mathbf{x}_N) \tag{3.47}$$
with $F$ given by (3.43), and
$$G(\mathbf{x}_1, \dots, \mathbf{x}_N) = \Phi(x_1, \dots, x_N)\prod_i\varphi^{GP}(x_i)\,s_h(z_i). \tag{3.48}$$

The advantage of the Dyson wave functions over the Jastrow-type functions is that when computing the expectation values $\langle\Psi, H\Psi\rangle/\langle\Psi, \Psi\rangle$, cancellations between numerator and denominator take place that effectively improve the estimate (3.10) of the norm of $\Psi$.

3.3.1. Region I. In Region I, where one can take $\varphi \equiv 1$, the computation of the expectation value of the Hamiltonian is the same as in [13]. In fact, there is no need to do the computations explicitly, because the required bound can be obtained by combining Theorem 2.1 with Theorem III.1 in [13]. One checks that the parameter $b$ in Eq. (3.42) in [13] is proportional to $\bar\rho^{-1/3}h^{1/3}$ and hence the error terms in the upper bound of $E^{QM}$ in terms of the 3D GP energy are of the order $a/b = (a/h)^{2/3}(\bar\rho a h)^{1/3}$. On the other hand, by Theorem 2.1 the 3D GP energy minus confining energy approaches, as $h/L \to 0$, the 2D GP energy with coupling constant $(\int s^4)a/h$, and the approximation is uniform in the parameters as long as $\bar\rho a h \to 0$. Altogether we have the bound
$$E^{QM}(N, L, h, a) - N h^{-2}e^\perp \le E^{GP}_{2D}(N, L, (\textstyle\int s^4)a/h)\,(1 + o(1)), \tag{3.49}$$
where the error term $o(1)$ tends to zero if $h/L \to 0$, $a/h \to 0$ and $\bar\rho a h \to 0$.


3.3.2. Region II. In Region II one uses the ansatz (3.47)–(3.48) with f as in (3.4) with R = h and ϕ as in (3.7) with b = ρ¯ −1/2 | ln(ρh ¯ 2 )|α . Since the computations are analogous to those in [13] and in Eqs. (3.11)–(3.29) above we shall not write all details explicitly. Proceeding as in Sect. 3.1 in [13], using (3.11), we obtain for the expectation value of the Hamiltonian (1.3), N ||2 Fi−2 f (ti )2 ||2 v(|xi − x j |) , H (N ) + ≤2 | ||2 ||2 i=1 j
+

N i=1

where

F 2 G(−∇i2 G) + ||2 VL ,h (xi ) , ||2

⎧ ⎪ for i = k ⎨1 εik = −1 for ti = |xi − xk | . ⎪ ⎩0 otherwise

(3.50)

(3.51)

The next step is to decouple a given pair of variables, $\mathbf{x}_i$, $\mathbf{x}_j$, from the other variables in order to achieve cancellations between numerator and denominator in the terms on the right side of (3.50). As in [13] one defines for $i < j < p$,
$$F_{p,i} = f\Bigl(\min_{k<p,\,k\ne i}|\mathbf{x}_p - \mathbf{x}_k|\Bigr), \qquad F_{p,ij} = f\Bigl(\min_{k<p,\,k\notin\{i,j\}}|\mathbf{x}_p - \mathbf{x}_k|\Bigr). \tag{3.52}$$
Note that $F_{p,i}$ is independent of $\mathbf{x}_i$ and $F_{p,ij}$ is independent of $\mathbf{x}_i$ and $\mathbf{x}_j$. As in Eqs. (3.13)–(3.15) in [13] one has
$$F_{j+1}^2\cdots F_{i-1}^2F_{i+1}^2\cdots F_N^2 \le F_{j+1,j}^2\cdots F_{i-1,j}^2F_{i+1,ij}^2\cdots F_{N,ij}^2 \tag{3.53}$$
and
$$F_j^2\cdots F_N^2 \ge F_{j+1,j}^2\cdots F_{i-1,j}^2F_{i+1,ij}^2\cdots F_{N,ij}^2\left(1 - \sum_{k=1,\,k\ne i,j}^{N}\bigl(1 - f(|\mathbf{x}_j - \mathbf{x}_k|)^2\bigr)\right)\left(1 - \sum_{k=1,\,k\ne i}^{N}\bigl(1 - f(|\mathbf{x}_i - \mathbf{x}_k|)^2\bigr)\right). \tag{3.54}$$
In an analogous way one defines $\Phi_{p,i}$ and $\Phi_{p,ij}$ and obtains the corresponding equations (3.53) and (3.54) with $f$ replaced by $\varphi$ and $\mathbf{x}$ by the 2D variable $x$. Consider now the first two terms in (3.50). As in [13] one uses the estimates
$$f'(t_i)^2 \le \sum_{j=1}^{i-1} f'(|\mathbf{x}_i - \mathbf{x}_j|)^2, \quad\text{and}\quad F_i \le f(|\mathbf{x}_i - \mathbf{x}_j|). \tag{3.55}$$

For fixed i, j one can use (3.53) and (3.54) (and their analogues for ) to separate the contribution from the variables xi and x j from the rest of the integrand. The part that


depends only on the other variables cancels between numerator and denominator, while the i j contribution in the numerator is 2 f (|xi − x j |)2 + v(|xi − x j |) f (|xi − x j |)2 × sh (z i )2 sh (z j )2 ϕ(|(xi − x j |)2 ϕ GP (xi )2 ϕ GP (x j )2 dxi dx j .

(3.56)

Integration over z i and z j , using Young’s inequality, bounds (3.56) by W (xi − x j )ϕ(|(xi − x j |)ϕ GP (xi )2 ϕ GP (x j )2 ,

(3.57)

where W is defined in (3.5). This, in turn, can as in Eqs. (3.18)–(3.20) in [13] be bounded by ϕ GP (x)4 d x J

2

(3.58)

with J=

1 2

W (x)ϕ(|x|)2 d x

(3.59)

given by (3.26) The matching kinetic term, |∇ ϕ|2 , given by (3.25) is derived from the last term in (3.50) in the same way. These two terms together give, up to small errors, the correct coupling constant, i.e., the factor before ϕ GP (x)4 d x in the 2D GP functional. The other parts of the 2D GP functional as well as the confining energy N h −2 e⊥ follow from the last term in (3.50). It remains to look at the error terms. These come on the one hand from the i j contribution to the denominator, i.e., the last two factors in (3.54) and the corresponding factors with ϕ instead of f . On the other hand they come from the third term in (3.50) and corresponding terms with ϕ instead of f . The first mentioned errors have (cf. Eq. (3.21) in [15] and Eq. (3.1) in [14]) the form N ϕ GP 2∞ sh 2∞ (4π R 3 /3), and N ϕ GP 2∞ π b2 .

(3.60)

Since ϕ GP 2∞ ∼ L¯ −2 and sh 2∞ ∼ h −1 , while R = h and b = ρ¯ −1/2 | ln(ρh ¯ 2 )|−α , 2 2 −2α these terms are of the order ρh ¯ and | ln(ρh ¯ )| respectively. For the other errors one again exploits the cancellations between numerator and denominator and obtains in the same way as in [13] and [14] (cf. Eqs. (3.26) and (3.36) in [13], and (3.4)–(3.5) in 2 , N −1 , and | ln(ρh [14]) terms of the order (ρah) ¯ ¯ 2 )|−1 . This can be compared with the estimate for R2 in (3.17) where cancellations could not be used. All the mentioned error terms are small in Region II. In combination with the bound (3.49) for Region I we can thus state the upper bound in both regions as GP E QM (N , L , h, a) − N h −2 e⊥ ≤ E 2D (N , L , g)(1 + o(1)).

(3.61)


4. Scattering Length

As a preparation for the lower bound we consider in this section the perturbative calculation of the 2D scattering length of an integrable potential. Consider a 2D, rotationally symmetric potential $W \ge 0$ of finite range $R_0$. As discussed in Appendix A in [19] the scattering length is determined by minimizing, for $R \ge R_0$, the functional
$$E_R[\psi] = \int_{|x|\le R}\left(|\nabla\psi|^2 + \tfrac12 W|\psi|^2\right) \tag{4.1}$$
with boundary condition $\psi = 1$ for $|x| = R$. The Euler equation (zero energy scattering equation) is
$$-\Delta\psi + \tfrac12 W\psi = 0 \tag{4.2}$$
and for $r = |x| \ge R_0$ the minimizer, $\psi_0$, is
$$\psi_0(r) = \ln(r/a_{\rm scatt})/\ln(R/a_{\rm scatt}) \tag{4.3}$$

with a constant $a_{\rm scatt}$. This is, by definition, the 2D scattering length for the potential $W$. An equivalent definition follows by computing the energy,
$$E_R = E_R[\psi_0] = 2\pi/\ln(R/a_{\rm scatt}), \tag{4.4}$$
which means that
$$a_{\rm scatt} = R\exp(-2\pi/E_R). \tag{4.5}$$
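The equivalence of (4.4) and (4.5) is a simple inversion, which the following Python sketch makes explicit. It also illustrates that, although $E_R$ depends on the radius $R$ at which the boundary condition is imposed, the scattering length recovered from (4.5) does not. The function names and the sample value of the scattering length are hypothetical.

```python
import math

def energy_from_scattering_length(radius, a_scatt):
    """E_R = 2*pi / ln(R / a_scatt), Eq. (4.4); requires radius > a_scatt."""
    return 2.0 * math.pi / math.log(radius / a_scatt)

def scattering_length_from_energy(radius, energy):
    """a_scatt = R * exp(-2*pi / E_R), Eq. (4.5)."""
    return radius * math.exp(-2.0 * math.pi / energy)

a_true = 0.37  # hypothetical scattering length
recovered = [
    scattering_length_from_energy(radius, energy_from_scattering_length(radius, a_true))
    for radius in (1.0, 5.0, 40.0)
]
```

Each entry of `recovered` reproduces `a_true` up to rounding, independently of the radius used.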

Lemma 4.1. (Scattering length for soft potentials). Assume $W(x) = \lambda w(x)$ with $\lambda \ge 0$, $w \ge 0$, and $w \in L^1(\mathbb{R}^2)$, with $\int w(x)\,d^2x = 1$. Then, for $R \ge R_0$,
$$a_{\rm scatt} = R\exp\left(-\frac{4\pi + \eta(\lambda, R)}{\lambda}\right) \tag{4.6}$$
with $\eta(\lambda, R) \to 0$ for $\lambda \to 0$.

Proof. The statement is, by (4.5), equivalent to
$$E_R = \tfrac12\lambda\,(1 + o(1)), \tag{4.7}$$
where the error term may depend on $R$. The upper bound is clear by the variational principle, taking $\psi = 1$ as a test function. For the lower bound note first that $\psi_0 \le 1$. This follows from the variational principle: Since $W \ge 0$ the function $\tilde\psi_0(x) = \min\{1, \psi_0\}$ satisfies $E_R[\tilde\psi_0] \le E_R[\psi_0]$. Hence the function $\varphi_0 = 1 - \psi_0$ is nonnegative. It satisfies
$$-\Delta\varphi_0 + \tfrac12 W\varphi_0 = \tfrac12 W \tag{4.8}$$
and the Dirichlet boundary condition $\varphi_0 = 0$ for $|x| = R$. Integration of (4.2), using that $\psi_0(r) = 1$ for $r = R$, gives $E_R = \tfrac12\int W\psi_0 = \tfrac12\int W(1 - \varphi_0)$. Since $\varphi_0 \ge 0$ we thus need to show that
$$\|\varphi_0\|_\infty = o(1). \tag{4.9}$$


By (4.8), and since $\varphi_0$ and $W$ are both nonnegative, we have $-\Delta\varphi_0 \le \tfrac12 W$ and hence
$$\varphi_0(x) \le \frac12\int K_0(x, x')\,W(x')\,d^2x', \tag{4.10}$$
where $K_0(x, x')$ is the (nonnegative) integral kernel of $(-\Delta)^{-1}$ with Dirichlet boundary conditions at $|x| = R$. The kernel $K_0(x, x')$ is integrable (the singularity is $\sim \ln|x - x'|$) and hence, if $W$ is bounded, we have $\|\varphi_0\|_\infty \le (\text{const.})\lambda\|w\|_\infty = O(\lambda)$. If $w$ is not bounded we can, for every $\varepsilon > 0$, find a bounded $w_\varepsilon \le w$ with $\int(w - w_\varepsilon) \le \varepsilon$. Define $C_\varepsilon = \|w_\varepsilon\|_\infty$. Without restriction we can assume that $C_\varepsilon$ is a monotonically decreasing function of $\varepsilon$ and continuous. The function $g(\varepsilon) = \varepsilon/C_\varepsilon$ is then monotonically increasing in $\varepsilon$ (and hence decreasing if $\varepsilon$ decreases), continuous, and $g(\varepsilon) \to 0$ if $\varepsilon \to 0$. For every (sufficiently small) $\lambda$ there is an $\varepsilon(\lambda) = o(1)$ such that $g(\varepsilon(\lambda)) = \lambda$. Then
$$\|\varphi_0\|_\infty \le (\text{const.})\left(\varepsilon(\lambda) + \lambda C_{\varepsilon(\lambda)}\right) = (\text{const.})\,\varepsilon(\lambda) = o(1). \tag{4.11}$$

Corollary 4.2. (Scattering length for scaled, soft potentials). Assume $W_{R,\lambda}(x) = \lambda R^{-2}w_1(x/R)$ with $w_1 \ge 0$ fixed and $\int w_1 = 1$. Then the scattering length of $W_{R,\lambda}$ is
$$a_{\rm scatt} = R\exp\left(-\frac{4\pi + \eta(\lambda)}{\lambda}\right) \tag{4.12}$$
with $\eta(\lambda) \to 0$ for $\lambda \to 0$, independent of $R$.

Proof. This follows from Lemma 4.1 noting that, by scaling, the scattering length of $W_{R,\lambda}$ is $R$ times the scattering length of $\lambda w_1$. The latter is independent of $R$.

Remarks. If $W$ is obtained by averaging a 3D integrable potential $v$ over an interval of length $h$ in the $z$ variable, the formula (4.12), together with Eq. (A.8) in [19], motivates the exponential dependence of the effective 2D scattering length (1.17) of $v$ on $h/a$: The integral $\lambda = \int W(x)\,d^2x$ is $h^{-1}\int v(\mathbf{x})\,d^3x$, which for weak potentials is $h^{-1}8\pi a$ to lowest order, by Eq. (A.8) in [19]. Inserting this into (4.12) gives (1.17) (apart from the dependence on the shape function $s$). This heuristic argument is, of course, only valid for soft potentials $v$. An essential step in the lower bound in the next section is the replacement of $v$ by a soft potential to which this reasoning can indeed be applied.

5. Lower Bound

5.1. Finite boxes. As for the upper bound we consider first the homogeneous case and finite boxes, this time with Neumann boundary conditions. The optimal distribution of particles among the boxes is determined by using subadditivity and convexity arguments as in [18] and [15]. In the treatment of the lower bound there is a natural division line between the case where the mean particle distance $\rho^{-1/2}$ is comparable to or larger than $h$ and the case where it is much smaller than $h$. The first case includes Region II and a part (but not all) of Region I. When $\rho^{-1/2}$ is much smaller than $h$ the boxes have finite extension in the $z$ direction as well. The method is then a fairly simple modification of the 3D estimates in [13] (see also Sect. 4.4 in [15]) and will not be discussed further here. The derivation of a lower bound for the case that $\rho h^2 \le C < \infty$ proceeds by the following steps:


• Use Dyson’s Lemma [5, 18] to replace $v_a$ by an integrable 3D potential $U$, retaining part of the kinetic energy.
• Average the potential $U$ at fixed $x \in \mathbb{R}^2$ over the $z$-variable to obtain a 2D potential $W$. Estimate the error in this averaging procedure by using Temple’s inequality [37, 29] at each fixed $x$.
• The result is a 2D many body problem with an integrable interaction potential $W$ which, by Corollary 4.2, has the right 2D scattering length to lowest order in $a/h$, but reduced kinetic energy inside the range of the potential. This problem is treated in the same way as in [19, 14], introducing a 2D Dyson potential and using perturbation theory, again estimating the errors by Temple’s inequality.
• Choose the parameters (size $\ell$ of box, fraction $\varepsilon$ of the kinetic energy, range $R$ of potential $U$, as well as the corresponding parameters for the 2D Dyson potential) optimally to minimize the errors.

The first two steps are analogous to the corresponding steps in the proof of the lower bound in Theorem 3.1 in [15], cf. Eqs. (3.30)–(3.36) in [15]. It is, however, convenient to define the Dyson potential $U$ in a slightly different manner than in [18]. Namely, for $R \ge 2R_0$, with $R_0$ the range of $v_a$, we define
$$U_R(r) = \begin{cases}\frac{24}{7}R^{-3} & \text{for } \tfrac12 R < r < R \\ 0 & \text{otherwise.}\end{cases} \tag{5.1}$$
The reason is that this potential has a simple scaling with $R$ which is convenient when applying Corollary 4.2. Proceeding as in Eqs. (3.30)–(3.38) in [15] we write a general wave function as
$$\Psi(\mathbf{x}_1, \dots, \mathbf{x}_n) = f(\mathbf{x}_1, \dots, \mathbf{x}_n)\prod_{k=1}^{n}s_h(z_k), \tag{5.2}$$
and define $F(x_1, \dots, x_n) \ge 0$ by
$$|F(x_1, \dots, x_n)|^2 = \int|f(\mathbf{x}_1, \dots, \mathbf{x}_n)|^2\prod_{k=1}^{n}s_h(z_k)^2\,dz_k. \tag{5.3}$$
Note that $F$ is normalized if $\Psi$ is normalized. The analogue of Eq. (3.38) in [15] is
$$\langle\Psi|H|\Psi\rangle - \frac{n e^\perp}{h^2} \ge \sum_{i=1}^{n}\int\left[\varepsilon|\nabla_i F|^2 + (1 - \varepsilon)|\nabla_i F|^2\chi_{\min_{k\ne i}|x_i - x_k|\ge R}(x_i)\right]\prod_{k=1}^{n}d^2x_k$$
$$+ \sum_{i=1}^{n}\int\left[\varepsilon|\partial_i f|^2 + a'U_R(|\mathbf{x}_i - \mathbf{x}_{k(i)}|)\chi_{B_\delta}(z_{k(i)}/h)|f|^2\right]\prod_{k=1}^{n}s_h(z_k)^2\,d\mathbf{x}_k, \tag{5.4}$$
where $\nabla_i$ denotes the gradient with respect to $x_i$ and $\partial_j = d/dz_j$. Moreover, $\chi_{B_\delta}$ is the characteristic function of the subset $B_\delta \subset \mathbb{R}$, where $s(z)^2 \ge \delta$ for $\delta > 0$,
$$a' = a(1 - \varepsilon)\left(1 - 2R\|\partial s^2\|_\infty/(h\delta)\right), \tag{5.5}$$


and $k(i)$ denotes the index of the nearest neighbor to $\mathbf{x}_i$. In deriving (5.4) the Cauchy-Schwarz inequality has been used to bound the longitudinal kinetic energy of $f$ in terms of that of $F$, i.e.,
$$\sum_{i=1}^{n}\int|\nabla_i f|^2\prod_{k=1}^{n}s_h(z_k)^2\,d\mathbf{x}_k \ge \sum_{i=1}^{n}\int|\nabla_i F|^2\prod_{k=1}^{n}d^2x_k. \tag{5.6}$$
We now consider, for fixed $x_1, \dots, x_n$, the term
$$\sum_{i=1}^{n}\int\left[\varepsilon|\partial_i f|^2 + a'U_R(|\mathbf{x}_i - \mathbf{x}_{k(i)}|)\chi_{B_\delta}(z_{k(i)}/h)|f|^2\right]\prod_{k=1}^{n}s_h(z_k)^2\,dz_k. \tag{5.7}$$

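This can be estimated from below by the expectation value of $U_R$ over $z_i$ at fixed $x_i$, using Temple’s inequality to estimate the errors. The result is, by a calculation analogous to Eqs. (3.41)–(3.46) in [15],
$$\langle\Psi|H|\Psi\rangle - \frac{n e^\perp}{h^2} \ge \sum_{i=1}^{n}\int\Bigl[\varepsilon|\nabla_i F|^2 + (1 - \varepsilon)|\nabla_i F|^2\chi_{\min_{k\ne i}|x_i - x_k|\ge R}(x_i) + \tfrac12\sum_{j\ne i}W(x_i - x_j)|F|^2\Bigr]\prod_{k=1}^{n}d^2x_k, \tag{5.8}$$
where $W$ is obtained by averaging $a''U_R$ over $z$:
$$W(x - x') = 2a''\int_{\mathbb{R}\times\mathbb{R}}s_h(z)^2 s_h(z')^2\,U_R(|\mathbf{x} - \mathbf{x}'|)\,\chi_{B_\delta}(z'/h)\,dz\,dz'. \tag{5.9}$$
Here, $a'' = a'(1 - \eta)$ with an error term $\eta$ containing the error estimates from the Temple inequality and from including more than the nearest neighbor to $\mathbf{x}_i$. Moreover, since $\int U_R(\mathbf{x})\,d\mathbf{x} = 4\pi$, $U_R(|\mathbf{x} - \mathbf{x}'|) = 0$ for $|\mathbf{x} - \mathbf{x}'| > R$, and $|s(z)^2 - s(z')^2| \le R\|\partial_z s^2\|_\infty$ for $|z - z'| \le R$ we have the estimate
$$\int_{\mathbb{R}^2}W(x)\,d^2x \ge \frac{8\pi a''}{h}\left[\int_{B_\delta}s(z)^4\,dz - \frac{R}{h}\|\partial_z s^2\|_\infty\right] \ge \frac{8\pi a''}{h}\left[\|s\|_4^4 - \delta - \frac{R}{h}\|\partial_z s^2\|_\infty\right]. \tag{5.10}$$
As we will explain in a moment, the errors, and the replacement of $n - 1$ by $n$, require the following terms to be small:
$$\frac{n h^2 a}{\varepsilon R^3}, \qquad \frac{nR}{h}, \qquad \varepsilon, \qquad \frac1n, \qquad \delta, \qquad \frac{R}{h\delta}. \tag{5.11}$$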
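The normalization $\int U_R(\mathbf{x})\,d\mathbf{x} = 4\pi$ used for (5.10) follows directly from the definition (5.1), and the radius $R$ drops out. The following Python sketch checks this once exactly, with rational arithmetic, and once by a midpoint-rule quadrature for a concrete radius. The function names, the sample radius and the number of quadrature steps are illustrative assumptions.

```python
from fractions import Fraction
import math

def dyson_potential(r, big_r):
    """U_R(r) from (5.1): (24/7) * R**-3 on the shell R/2 < r < R, zero elsewhere."""
    if 0.5 * big_r < r < big_r:
        return 24.0 / 7.0 / big_r ** 3
    return 0.0

def shell_integral_over_4pi():
    """Exact value of (1/(4*pi)) times the integral of U_R over R^3.

    The shell R/2 < r < R has volume (4*pi/3) * R**3 * (1 - 1/8), so R cancels."""
    return Fraction(24, 7) * Fraction(1, 3) * (1 - Fraction(1, 8))

def shell_integral_numeric(big_r, steps=20000):
    """Midpoint-rule approximation of the integral of U_R over R^3 in polar coordinates."""
    dr = big_r / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * dr
        total += dyson_potential(r, big_r) * 4.0 * math.pi * r * r * dr
    return total
```

For example, `shell_integral_numeric(2.0)` agrees with $4\pi$ to high accuracy, and `shell_integral_over_4pi()` returns exactly 1.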

The rationale behind the first term is as follows. The Temple errors in the averaging procedure at fixed x1 , . . . , xn produce a factor similar to (2.9), namely

2 1

U (5.12) 1−a U (const.)ε/ h 2 − (const.)a U


with U m =

n


m ⊥ U (|xi − xk(i) |)χBδ (xk(i) /r )

i=1

n

sh (z j )2 dz j ,

(5.13)

j=1

cf. Eqs. (3.40)–(3.41) in [15]. The analogue of Eq. (3.42) in [15] is U ≤ (const.)n(n − 1)

s 44 , h R2

(5.14)

and U 2 ≤ (const.)n R −3 U by Schwarz’s inequality. Since the denominator in (5.12) must be positive, we see in particular that the particle number must satisfy n(n − 1) < (const.)

ε R2 ah

(5.15)

and the error is of the order nh 2 a/ε R 3 as claimed in (5.11). The estimate from below on U , obtained in the same way as Eq. (3.46) in [15], is ⎛ ⎞ n U ≥ θ (R − |xk − xi |)⎠ sh (zl )2 dzl U (|xi − x j |)χBδ (z j / h) ⎝1 − i= j

k, k=i, j

1 R ≥

W (xi − x j ) 1 − (n − 2) s 2∞ . 2a h

l=1

(5.16)

i= j

In particular, the second term in (5.11), n R/ h, should be small. The requirement that ε and n −1 are small needs no further comments. The potential W can be written as W (x) = λR −2 w1 (x/R) ,

(5.17)

where w1 is independent of R, with w1 (x)d x = 1 and

8πa s 4 (1 − η ) . λ= h

(5.18)

(5.19)

Here, η is an error term involving δ and R/(hδ) (cf. (5.5) and (5.10)) besides the other terms in (5.11). In particular, δ and R/(hδ) should be small. The 2D scattering length of (5.17) can be computed by Corollary 4.2 and has the right form (1.17) to leading order in λ. (Recall from the remark preceding Eq. (1.19) that R in (4.12) can be replaced by h as long as ca < R < Ch.) The Hamiltonian on the right side of Eq. (5.8) can now be treated with the methods of [19]. The only difference from the Hamiltonian discussed in that paper is the reduced kinetic energy inside the range of the potential W . This implies that λ in the error term η(λ) in Corr. 4.2 should be replaced by λ/ε, which requires a 1. εh

(5.20)

682

K. Schnee, J. Yngvason

Otherwise the method is the same as in [19]: a slight modification of Dyson’s Lemma ˜ to (Lemma 3.1 in [19]) allows to substitute for W a potential U˜ of larger range, R, which perturbation theory and Temple’s inequality can be applied. The modified Dyson lemma is discussed in the Appendix. The fraction of the kinetic energy borrowed for the application of Temple’s inequality as in Eq. (3.16) in [19] will be denoted by ε˜ . The errors that have now to be controlled are ε˜ , n R˜ 2 /2 ,

R , R˜

n2 . 2 ) ε˜ R˜ 2 ln( R˜ 2 /a2D

(5.21)

To explain these terms we refer to Eqs. (3.18)–(3.19) in [19] which contain the relevant estimates. Substituting ε˜ for ε and R˜ for R in these inequalities we see first that ε˜ and n R˜ 2 /2 should be small. The smallness of R/ R˜ guarantees that the “hole” of radius R in the 2D Dyson potential has negligible effect. Since the denominator in the Temple error in Eq. (3.19) in [19] must be positive we see also that the particle number n in the box should obey the bound 2 n(n − 1) < (const.) ε˜ ln( R˜ 2 /a2D ),

(5.22)

2 )). and the Temple error is bounded by (const.)n2 /(˜ε R˜ 2 ln( R˜ 2 /a2D We summarize the discussion so far in the following lemma:

Lemma 5.1. For all $n$ satisfying (5.15) and (5.22),
$$E_{\rm box}(n, \ell, h, a) \ge \frac{2\pi n(n-1)}{\ell^2}\,|\ln(a_{2D}^2/\tilde R^2)|^{-1}\left(1 - \mathcal{E}(n, \ell, h, a; \varepsilon, R, \tilde\varepsilon, \tilde R, \delta)\right), \tag{5.23}$$
where $\mathcal{E}$ tends to zero together with the terms listed in (5.11) and (5.21).

We note also from Eq. (3.19) in [19] that $K(n) = (1 - \mathcal{E})$ is decreasing in $n$ (for the other parameters fixed). If $K(n)$ is defined as zero for $n$ not satisfying (5.15) and (5.22), the estimate (5.23) holds for all $n$.

5.2. Global bound, uniform case. Using superadditivity of the energy (which follows from $W \ge 0$), monotonicity of $K(n)$ and convexity of $n(n-1)$ in the same way as in Eqs. (7)–(12) in [18], one sees that if $\rho = N/L^2$ is the density in the thermodynamically large box of side length $L$, the optimal choice of $n$ in the box of fixed side length $\ell$ is $n \sim \rho\ell^2$. We thus have to show that it is possible to choose the parameters $\varepsilon$, $R$, $\delta$, $\tilde\varepsilon$ and $\tilde R$ and $\ell$ in such a way that all the error terms (5.11) and (5.21), as well as $\delta$ and $R/(h\delta)$, are small, while $|\ln(a_{2D}^2/\tilde R^2)| = |\ln(a_{2D}^2\rho)|(1 + o(1))$. We note that the conditions $a/h \ll 1$ and $\rho|\ln(\rho a_{2D}^2)|^{-1} \ll 1/h^2$ imply $\rho a_{2D}^2 \to 0$ and hence $|\ln(\rho a_{2D}^2)|^{-1} \to 0$. We make the ansatz
$$\varepsilon = \left(\frac ah\right)^{\alpha}, \qquad \delta = \left(\frac ah\right)^{\alpha'}, \qquad R = h\left(\frac ah\right)^{\beta}, \tag{5.24}$$
and choose $\ell$ such that $L$ is a multiple of $\ell$ with
$$\rho^{-1/2} \ll \ell \lesssim \rho^{-1/2}\left(\frac ah\right)^{-\gamma}. \tag{5.25}$$


Then $n = \rho\ell^2 \gg 1$ and $R/(h\delta) = (a/h)^{\beta - \alpha'}$. The error terms (5.11) are also powers of $a/h$ and we have to ensure that all exponents are positive, in particular
$$\beta - \alpha' > 0, \qquad \beta - 2\gamma > 0, \qquad 1 - \alpha - 3\beta - 2\gamma > 0. \tag{5.26}$$
This is fulfilled, e.g., for
$$\alpha = \alpha' = \frac19, \qquad \beta = \frac29, \qquad \gamma = \frac1{18}, \tag{5.27}$$
with all the exponents (5.26) equal to 1/9. Note also that with this choice Eq. (5.20) is fulfilled. Next we write
$$\tilde\varepsilon = |\ln(\rho a_{2D}^2)|^{-\kappa}, \qquad \tilde R = \rho^{-1/2}|\ln(\rho a_{2D}^2)|^{-\zeta}. \tag{5.28}$$
Then $|\ln(a_{2D}^2/\tilde R^2)| = |\ln(a_{2D}^2\rho)|(1 + o(1))$. The error terms (5.21) are
$$\tilde\varepsilon = |\ln(\rho a_{2D}^2)|^{-\kappa}, \qquad \frac{R}{\tilde R} = \left(\frac ah\right)^{\beta}(\rho h^2)^{1/2}|\ln(\rho a_{2D}^2)|^{\zeta}, \qquad \frac{n\tilde R^2}{\ell^2} = |\ln(\rho a_{2D}^2)|^{-2\zeta}, \tag{5.29}$$
and
$$\frac{n\ell^2}{\tilde\varepsilon\tilde R^2\ln(\tilde R^2/a_{2D}^2)} = \left(\frac ah\right)^{-4\gamma}|\ln(\rho a_{2D}^2)|^{-(1-\kappa-2\zeta)}\left(1 + O(\ln|\ln(\rho a_{2D}^2)|)\right). \tag{5.30}$$
Since $(a/h)^{-4\gamma}|\ln(\rho a_{2D}^2)|^{-4\gamma} = O(1)$, the error term (5.30) can also be written as
$$\frac{n\ell^2}{\tilde\varepsilon\tilde R^2\ln(\tilde R^2/a_{2D}^2)} = O(1)\,|\ln(\rho a_{2D}^2)|^{-(1-\kappa-2\zeta-4\gamma)}\left(1 + O(\ln|\ln(\rho a_{2D}^2)|)\right). \tag{5.31}$$
The condition $\rho h^2 < C$ is used to bound $R/\tilde R$ in (5.29). Namely,
$$(\rho h^2)^{1/2}|\ln(\rho a_{2D}^2)|^{\zeta} \le (\text{const.})(h/a)^{\zeta}, \tag{5.32}$$
so
$$\frac{R}{\tilde R} = O(1)\left(\frac ah\right)^{\beta-\zeta}. \tag{5.33}$$
We choose now
$$\zeta = \frac19, \qquad \kappa = \frac29. \tag{5.34}$$
Then
$$\beta - \zeta = \frac19, \quad\text{and}\quad 1 - \kappa - 2\zeta - 4\gamma = \frac13. \tag{5.35}$$
This completes our discussion of the lower bound of the energy in the thermodynamic limit, i.e., in Theorem 1.4, for the case $\rho h^2 \le C < \infty$. As already mentioned, the case $\rho h^2 \gg 1$ can be treated with the 3D methods of [18, 13], taking care to retain uniformity in the parameter $h/\ell$. The analogous problem for the reduction from 3D to 1D is discussed in detail in [15], cf. in particular the lower bound in Theorem 3.2 in [15]. Since no new aspects arise in the reduction from 3D to 2D we refrain from discussing the case $\rho h^2 \gg 1$ further here.
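The choice of exponents in this subsection can be checked with exact rational arithmetic. The following Python sketch uses the values as we read them from (5.27) and (5.34), namely α = α' = 1/9, β = 2/9, γ = 1/18, ζ = 1/9 and κ = 2/9; this reading of the fractions is an assumption, and the variable names are chosen only for illustration.

```python
from fractions import Fraction

# Assumed reading of (5.27) and (5.34).
alpha = alpha_prime = Fraction(1, 9)
beta = Fraction(2, 9)
gamma = Fraction(1, 18)
zeta = Fraction(1, 9)
kappa = Fraction(2, 9)

# Exponents of a/h required to be positive in (5.26).
exponents_5_26 = (
    beta - alpha_prime,                      # from R / (h * delta)
    beta - 2 * gamma,                        # from n * R / h
    1 - alpha - 3 * beta - 2 * gamma,        # from n * h^2 * a / (eps * R^3)
)

# Exponents appearing in (5.35).
exponents_5_35 = (beta - zeta, 1 - kappa - 2 * zeta - 4 * gamma)

# Exponent of a/h in condition (5.20), a / (eps * h) = (a/h)^(1 - alpha).
exponent_5_20 = 1 - alpha
```

All three entries of `exponents_5_26` equal 1/9, and `exponents_5_35` equals (1/9, 1/3), consistent with the statements after (5.27) and in (5.35).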


5.3. Global bound in a trap and BEC. We now discuss the case when the system is inhomogeneous in the $x$ variables, i.e., the lower bound for Theorem 1.1. A possible approach, analogous to that of [13], is to modify Eq. (5.2) and write a general wave function as
$$\Psi(\mathbf{x}_1, \dots, \mathbf{x}_n) = f(\mathbf{x}_1, \dots, \mathbf{x}_n)\prod_{k=1}^{n}\varphi^{GP}(x_k)\,s_h(z_k). \tag{5.36}$$

This is always possible because $\varphi^{\rm GP}$ and $s_h$ are strictly positive. As in [13] the task then becomes to minimize a quadratic form in $f$ involving the GP density $\rho^{\rm GP}(x_i)=\varphi^{\rm GP}(x_i)^2$ both as a weight factor in the integration over the $x_i$'s and also as a replacement for the external potential $V_L(x_i)$. There is, however, a neat alternative approach to deal with inhomogeneities due to R. Seiringer [36] that is somewhat simpler and that we shall follow with appropriate modifications. This approach also leads quickly to a proof of Theorem 1.3.

5.3.1. GP case. We start by writing again the wave function in the form (5.2) but this time with $n=N$. With $H$ including the trapping potential $V_L$, cf. (1.3), and using the symmetry of $F$ and $f$ we obtain as in Eq. (5.4),

$$\langle\Psi|H|\Psi\rangle-\frac{N e^{\perp}}{h^{2}}\ \ge\ T+N\int V_L(x_1)F^2\prod_{k=1}^{N}dx_k+N\int\Big[\varepsilon|\partial_1 f|^2+a\,U_R(|x_1-x_{k(1)}|)\,\chi_{B_\delta}(z_{k(1)}/h)\,|f|^2\Big]\prod_{k=1}^{N}s_h(z_k)^2\,d\mathbf{x}_k,\tag{5.37}$$

where

$$T=N\left(\varepsilon\int_{|x_1-x_{k(1)}|\le R}|\nabla_1\Psi|^2\prod_{k=1}^{N}d\mathbf{x}_k+\int_{|x_1-x_{k(1)}|\ge R}|\nabla_1\Psi|^2\prod_{k=1}^{N}d\mathbf{x}_k\right).\tag{5.38}$$

Note that we have here not yet used Eq. (5.6) to bound the parallel kinetic energy of $f$ in terms of that of $F$. The reason is that we want to prove BEC for the wave function $\Psi$ and not only for $F$. The next step is to split the kinetic energy $T$, for given $\tilde R>R$ and $\tilde\varepsilon>0$, into two parts with the main contributions coming from length scales $>\tilde R$ and $<\tilde R$ respectively in the longitudinal directions. To write this in a compact way we introduce the notation

$$X=(x_2,\dots,x_N),\quad dX=dx_2\cdots dx_N;\qquad Z=(z_1,\dots,z_N),\quad dZ=dz_1\cdots dz_N\tag{5.39}$$

and R, R˜ ˜ X = {x 1 : |x 1 − x k(1) | ≥ R}.

(5.40)

Bosons in Disc-Shaped Traps: From 3D to 2D


The splitting of the kinetic energy is T ≥ T> + T< with >

T =N

R3N −2

εε˜ 2

ε˜ |∇1 |2 +

˜

2

XR, R

(5.41)

|∇1 |2 + 1 −

ε˜ 2

˜

R > X

|∇1 |2

d Xd Z, (5.42)

T

<

=N +

R2(N −1)

R˜ > X

ε˜ ε 1− 2

|∇1 F|2 + 1 −

ε˜ 2

˜ XR, R

|∇1 F|2

ε˜ 2 |∇ F| d X. 2 1

(5.43)

Here we have used (5.6) for $T^<$, but not for $T^>$. We then add and subtract $8\pi N^2 g\int_{\mathbb{R}^{2N}}\rho^{\rm GP}(x_1)|F|^2$ from the right side of (5.37) and, using (5.41), estimate it by $E^>+E^<$ with

$$E^{>}=T^{>}+N\int V_L(x_1)F^2+8\pi N^2 g\int\rho^{\rm GP}(x_1)|F|^2,\tag{5.44}$$

$$E^{<}=T^{<}+N\int\Big[\varepsilon|\partial_1 f|^2+a\,U_R(|x_1-x_{k(1)}|)\,\chi_{B_\delta}(z_{k(1)}/h)\,|f|^2\Big]\prod_{k=1}^{N}s_h(z_k)^2\,d\mathbf{x}_k-8\pi N^2 g\int\rho^{\rm GP}(x_1)|F|^2.\tag{5.45}$$

The two terms are now considered separately. The first remark is that $\varphi^{\rm GP}(x)$ is the ground state wave function of the 1-body Hamiltonian

$$-\Delta+V_L(x)+8\pi N g\,\rho^{\rm GP}(x)\tag{5.46}$$

and the energy is equal to the GP chemical potential

$$\mu^{\rm GP}(Ng)=E^{\rm GP}_{\rm 2D}(1,L,Ng)+4\pi N g\int\varphi^{\rm GP}(x)^4\,dx,\tag{5.47}$$

cf. Sect. 2 in [14]. Hence, if $T^>$ were replaced by the full kinetic energy, $\int|\nabla_1\Psi|^2$, then $E^>$ would be bounded below by $N\mu^{\rm GP}(Ng)=E^{\rm GP}_{\rm 2D}(N,L,g)+4\pi N^2 g\int\varphi^{\rm GP}(x)^4\,dx$. Using a variant of the generalized Poincaré inequalities of [16] it is shown in [35, 36] that this estimate also holds for $E^>$ up to small errors. Even a stronger result is true (cf. [36], Eq. (51)), because a contribution from the one particle density matrix in the longitudinal direction,

$$\gamma(x,x')=\int_{\mathbb{R}}\gamma\big((x,z),(x',z)\big)\,dz\tag{5.48}$$

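can be included: provided $\tilde R^2 N\to 0$ for $N\to\infty$, and $Ng$ is fixed,

$$E^{>}/N\ \ge\ \mu^{\rm GP}(Ng)+C\,{\rm Tr}\Big[\tfrac{1}{N}\gamma\,\big(1-|\varphi^{\rm GP}\rangle\langle\varphi^{\rm GP}|\big)\Big](1-o(1)),\tag{5.49}$$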

where the term o(1) goes to zero if ε, ε˜ → 0 go to zero after the limit N → ∞ has been taken. The term E < is treated in the same manner as in the homogeneous case by introducing boxes, α , of side length in the longitudinal directions. Defining ρα = sup ρ GP (x)

(5.50)

(E(n α , ) − 8π N gρα n α )

(5.51)

x∈ α

we have (cf. Eq. (60) in [36]) E < ≥ inf nα

α

˜ δ.) with E(n, ) the right side of (5.23). (It depends also on the parameters h, a; ε, R, ε˜ , R, The infimum is taken over all distributions of the N particles among the boxes and n α denotes the particle number in box α. Choosing the box length as in Eq. (5.25) with ρ ¯ and arguing exactly as in [36], Eqs. (71)–(78) replaced by the mean density ρ¯ ∼ N / L, we obtain E < ≥ −4π N g

|ϕ GP (x)|4 d x(1 + o(1)).

(5.52)

Adding (5.52) and (5.49) we obtain the bound

$$\frac{\langle\Psi|H|\Psi\rangle}{N}-\frac{e^{\perp}}{h^{2}}\ \ge\ E^{\rm GP}_{\rm 2D}(1,L,Ng)+C\,{\rm Tr}\Big[\tfrac{1}{N}\gamma\,\big(1-|\varphi^{\rm GP}\rangle\langle\varphi^{\rm GP}|\big)\Big](1-o(1)).\tag{5.53}$$

If $\Psi$ is the ground state wave function $\Psi_0$, then (5.53) together with the upper bound (3.61) proves both the convergence of the energy as claimed in Theorem 1.1 and also the convergence of the density matrix $\gamma(x,x')$ to $\varphi^{\rm GP}(x)\varphi^{\rm GP}(x')$ (in trace norm) as $N\to\infty$ for the case that $Ng$ is fixed (or at least bounded). The convergence of the full density matrix $\gamma_0(\mathbf{x},\mathbf{x}')$, scaled by $h$ in the variables $z$ and $z'$, to $\varphi^{\rm GP}(x)\varphi^{\rm GP}(x')s(z)s(z')$ follows from a simple energetic consideration: the energy gap between the lowest and the first excited level of $-\partial_z^2+V_h^{\perp}$ is large compared to $\langle\Psi|H|\Psi\rangle/N-e^{\perp}/h^2$ in the limit considered. Hence the contributions to the density matrix from the excited states in the perpendicular direction must vanish.

Remark. The above proof of BEC can be extended to allow a slow increase of $Ng$ to infinity with $N$, but this improvement is restricted by the available estimates of the error terms in (5.49) and (5.53) and is only marginal.


5.3.2. TF case. To complete the proof of Theorem 1.1 one must also consider the case that N g → ∞, i.e. the ‘Thomas-Fermi’ limit. Here the starting point is again (5.37), this time with T replaced by the smaller quantity T = ε

|x1 −xk(1) |≤R

|∇1 F|2

N

d xk +

k=1

|x1 −xk(1) |≥R

|∇1 F|2

N

d xk

(5.54)

k=1

cf. (5.6). We then use the TF equation

$$\rho^{\rm TF}(x)=\frac{1}{8\pi N g}\big[\mu^{\rm TF}-V_L(x)\big]_+\tag{5.55}$$

to trade the potential $V_L(x)$ for the TF density $\rho^{\rm TF}(x)$ which is the square of the minimizer of (1.10) with the kinetic term omitted. The TF chemical potential $\mu^{\rm TF}$ is determined by the normalization condition $\int\rho^{\rm TF}=1$ and satisfies, in analogy to (5.47),

$$\mu^{\rm TF}(Ng)=E^{\rm TF}_{\rm 2D}(1,L,Ng)+4\pi N g\int\varphi^{\rm TF}(x)^4\,dx.\tag{5.56}$$

From (5.55) it follows that

$$V_L(x)\ \ge\ \mu^{\rm TF}-8\pi(Ng)\,\rho^{\rm TF}(x).\tag{5.57}$$

Both μTF and ρ TF depend on N g and scale in a simple way if the potential VL (x) is a homogeneous function of some degree p > 0, cf. Eqs. (2.20)–(2.21) in [15]. We can now apply the box method to the minimization of

T +μ

TF

− 8π(N g)N

ρ

TF

(x1 )F

2

N

d xk

k=1

N

+N ε|∂1 f |2 + a U R (|x1 − xk(1) |)χBδ (z k(1) / h)| f |2 sh (z k )2 dxk ,

(5.58)

k=1

obtaining in analogy to (5.51) and using (5.56), TF (5.58) ≥ E 2D (N , L , g) + 4π(N g)N

ρ TF (x)2 d x + inf

− 8π N gραTF n α ).

nα

(E(n α , )

α

(5.59)

The remaining steps towards the estimate

$$(5.58)\ \ge\ E^{\rm TF}_{\rm 2D}(N,L,g)\,(1-o(1))\tag{5.60}$$

are now exactly like in Sect. 4 in [14].


6. Conclusions

We have investigated the dimensional reduction of a trapped, interacting Bose gas when the trap potential is strongly confining in one direction. Starting from the many-body Hamiltonian with repulsive, short range interactions we have shown rigorously how an effective 2D description of the ground state energy and density emerges and how the parameters of the 2D gas relate to those of the 3D gas. Two parameter regimes can be distinguished: one where the gas retains some of its 3D character despite the tight confinement and another where the situation is manifestly two-dimensional with a logarithmic dependence of the coupling parameter on the density. Moreover, we have shown that the trapped gas is Bose-Einstein condensed in the ground state provided the coupling parameter $Ng$ stays bounded.

A. Appendix: A Modification of Dyson's Lemma

Let $W(x)\ge 0$ be a rotationally symmetric potential in 2D with $W(x)=0$ for $|x|>R$. Denote by $B_R$ the ball of radius $R$ around the origin. For $0<\varepsilon$ and $R'\ge R$ define

$$E_{R',\varepsilon}=\min\left\{\int_{B_R}\Big[\varepsilon|\nabla\phi(x)|^2+\tfrac12 W(x)|\phi(x)|^2\Big]dx+\int_{B_{R'}\setminus B_R}|\nabla\phi(x)|^2\,dx\right\},\tag{A.1}$$

where the minimum is taken over $\phi\in H^1(B_{R'})$ with $\phi(x)=1$ for $|x|=R'$.

Lemma A.1.

$$E_{R',\varepsilon}=\frac{2\pi}{\ln(R'/R)+2\pi/E_{R,\varepsilon}}.\tag{A.2}$$

Proof. For any $c>0$ the minimum of the first integral in (A.1) with boundary condition $\phi(x)=c$ for $|x|=R$ is $c^2E_{R,\varepsilon}$ and the minimum of the second integral with the same boundary condition at $|x|=R$ and $\phi(x)=1$ for $|x|=R'$ is $2\pi\ln(R'/R)/(\ln(R'/b))^2$, where $b$ is determined by $\ln(R/b)/\ln(R'/b)=c$. Adding the two contributions and minimizing over $c$ gives (A.2).

Lemma A.2. (Modified Dyson Lemma) Let $W$ be as above, let $\tilde R>R$ and let $\tilde U(r)\ge 0$ be any function with support in $[R,\tilde R]$ satisfying

$$2\pi\int_R^{\tilde R}\tilde U(R')\,E^{-1}_{R',\varepsilon}\,R'\,dR'\le 1\tag{A.3}$$

with $E_{R',\varepsilon}$ as in (A.2). Let $B\subset\mathbb{R}^2$ be star-shaped with respect to 0. Then, for all functions $\phi\in H^1(B)$,

$$\int_{B\cap B_R}\Big[\varepsilon|\nabla\phi(x)|^2+\tfrac12 W(x)|\phi(x)|^2\Big]dx+\int_{B\cap(B_{\tilde R}\setminus B_R)}|\nabla\phi(x)|^2\,dx\ \ge\ \int_B\tilde U(|x|)\,|\phi(x)|^2\,dx.\tag{A.4}$$
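The minimization over $c$ in the proof of Lemma A.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the values chosen for $E_{R,\varepsilon}$, $R$ and $R'$ are arbitrary, the function names are hypothetical, and the annulus contribution is written as $2\pi(1-c)^2/\ln(R'/R)$, which coincides with $2\pi\ln(R'/R)/(\ln(R'/b))^2$ when $\ln(R/b)/\ln(R'/b)=c$.

```python
import math

def glued_energy(c, inner_energy, radius, outer_radius):
    """c^2 * E_{R,eps} plus the Dirichlet energy of the radial harmonic function
    on R < |x| < R' that equals c at |x| = R and 1 at |x| = R'."""
    annulus = 2.0 * math.pi * (1.0 - c) ** 2 / math.log(outer_radius / radius)
    return c * c * inner_energy + annulus

def closed_form(inner_energy, radius, outer_radius):
    """Right side of (A.2)."""
    return 2.0 * math.pi / (math.log(outer_radius / radius) + 2.0 * math.pi / inner_energy)

def minimize_over_c(inner_energy, radius, outer_radius, steps=200000):
    """Brute-force minimum over a uniform grid of boundary values c in [0, 1]."""
    best = float("inf")
    for i in range(steps + 1):
        c = i / steps
        best = min(best, glued_energy(c, inner_energy, radius, outer_radius))
    return best

# Illustrative numbers only; they are not taken from the paper.
inner_energy, radius, outer_radius = 0.8, 1.0, 5.0
numeric = minimize_over_c(inner_energy, radius, outer_radius)
exact = closed_form(inner_energy, radius, outer_radius)
print(numeric, exact)
```

The grid search is deliberately naive; since the objective is a quadratic in $c$, the minimizer can also be written down directly, which is what leads to the closed form (A.2).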


Proof. The proof is very similar to that of Lemma 3.1 in [19]. In polar coordinates, $r,\theta$, one has $|\nabla\phi|^2\ge|\partial\phi/\partial r|^2$ so it suffices to prove the analogue of (A.4) for each angle $\theta\in[0,2\pi)$. Denote $\phi(r,\theta)$ simply by $f(r)$, and let $R(\theta)$ denote the distance of the origin to the boundary of $B$ along the ray $\theta$. It suffices to consider the case $R\le R(\theta)$ (here, $W\ge0$ is used) and the estimate to prove is

$$\int_0^{R}\Big[\varepsilon|\partial f(r)/\partial r|^2+\tfrac12W(r)|f(r)|^2\Big]r\,dr+\int_R^{\min\{\tilde R,R(\theta)\}}|\partial f(r)/\partial r|^2\,r\,dr\ \ge\ \int_R^{\min\{\tilde R,R(\theta)\}}\tilde U(r)\,|f(r)|^2\,r\,dr.\tag{A.5}$$

For the given value of $\theta$, consider the disc $B_{R(\theta)}$ centered at the origin in $\mathbb{R}^2$ and of radius $R(\theta)$. Our function $f$ defines a rotationally symmetric function, $x\mapsto f(|x|)$ on $B_{R(\theta)}$, and (A.5) is equivalent to

$$\int_{B_{R(\theta)}\cap B_{R}}\Big[\varepsilon|\nabla f(|x|)|^2+\tfrac12W(|x|)|f(|x|)|^2\Big]dx+\int_{B_{R(\theta)}\cap(B_{\tilde R}\setminus B_{R})}|\nabla f(|x|)|^2\,dx\ \ge\ \int_{B_{R(\theta)}}\tilde U(|x|)\,|f(|x|)|^2\,dx.\tag{A.6}$$

If $R\le R'\le\min\{\tilde R,R(\theta)\}$ the left side of (A.6) is not smaller than the same quantity with $B_{R(\theta)}$ replaced by the smaller disc $B_{R'}$. (Again, $W\ge0$ is used.) According to (A.1) this integral over $B_{R'}$ is at least $E_{R',\varepsilon}|f(R')|^2$. Hence, for every such $R'$,

$$\int_0^{R}\Big[\varepsilon|\partial f(r)/\partial r|^2+\tfrac12W(r)|f(r)|^2\Big]r\,dr+\int_R^{\min\{\tilde R,R(\theta)\}}|\partial f(r)/\partial r|^2\,r\,dr\ \ge\ (2\pi)^{-1}E_{R',\varepsilon}\,|f(R')|^2.\tag{A.7}$$

The proof is completed by multiplying both sides of (A.7) by $2\pi\tilde U(R')E^{-1}_{R',\varepsilon}R'$ and, finally, integrating with respect to $R'$ from $R$ to $\min\{\tilde R,R(\theta)\}$, using (A.3).

A convenient choice for the Dyson potential $\tilde U$ is

$$\tilde U(r)=\begin{cases}\nu(\tilde R)^{-1}&\text{if }R\le r\le\tilde R,\\ 0&\text{otherwise},\end{cases}\tag{A.8}$$

with

$$\nu(\tilde R)=2\pi\int_R^{\tilde R}E^{-1}_{R',\varepsilon}\,R'\,dR'.\tag{A.9}$$

In the case considered in this paper the potential $W$ is integrable with coupling constant $\lambda\sim(a/h)(1+o(1))$ while $\varepsilon\sim(a/h)^{1/9}$. In particular $\lambda/\varepsilon\to0$ as $a/h\to0$. From Sect. 4 and the discussion of Eqs. (5.19)–(5.20) we thus conclude that in our case

$$E_{R,\varepsilon}=\frac{2\pi}{\ln(R/a_{\rm 2D})}\,(1+o(1)).\tag{A.10}$$

By Eq. (A.2) we then also have

$$E_{R',\varepsilon}=\frac{2\pi}{\ln(R'/a_{\rm 2D})}\,(1+o(1)),\tag{A.11}$$

and thus

$$\nu(\tilde R)=\tfrac14\,\tilde R^2\ln(\tilde R^2/a_{\rm 2D}^2)\,(1+o(1)).\tag{A.12}$$

Thus, for R and ε chosen as in (5.24) and (5.27), the modified Dyson Lemma gives, up to negligible errors, exactly the same result as the standard 2D Dyson Lemma for a potential with scattering length a2D . Acknowledgements. We thank Robert Seiringer and Elliott Lieb for helpful discussions. KS thanks the Institute for Theoretical Physics, ETH Zürich for hospitality and JY the Science Institute of the University of Iceland, Reykjavik, where parts of this work were done. This work is supported by the Post Doctoral Training Network HPRN-CT-2002-00277 of the European Union and a grant P17176-N02 of the Austrian Science Fund (FWF).

References

1. Colombe, Y., Kadio, D., Olshanii, M., Mercier, B., Lorent, V., Perrin, H.: Schemes for loading a Bose-Einstein condensate into a two-dimensional dipole trap. J. Optics B 5, 155 (2003)
2. Das, K.K.: Highly anisotropic Bose-Einstein condensates: Crossover to lower dimensionality. Phys. Rev. A 66, 053612-1–7 (2002)
3. Das, K.K., Girardeau, M.D., Wright, E.M.: Crossover from One to Three Dimensions for a Gas of Hard-Core Bosons. Phys. Rev. Lett. 89, 110402-1–4 (2002)
4. Dunjko, V., Lorent, V., Olshanii, M.: Bosons in Cigar-Shaped Traps: Thomas-Fermi Regime, Tonks-Girardeau Regime, and In Between. Phys. Rev. Lett. 86, 5413–5416 (2001)
5. Dyson, F.J.: Ground-State Energy of a Hard-Sphere Gas. Phys. Rev. 106, 20–26 (1957)
6. Girardeau, M.D.: Relationship between systems of impenetrable bosons and fermions in one dimension. J. Math. Phys. 1, 516 (1960)
7. Girardeau, M.D., Wright, E.M.: Bose-Fermi Variational Theory for the BEC-Tonks Crossover. Phys. Rev. Lett. 87, 210402-1–4 (2001)
8. Görlitz, A., et al.: Realization of Bose-Einstein Condensates in Lower Dimension. Phys. Rev. Lett. 87, 130402-1–4 (2001)
9. Greiner, M., Bloch, I., Mandel, O., Hänsch, T.W., Esslinger, T.: Exploring Phase Coherence in a 2D Lattice of Bose-Einstein Condensates. Phys. Rev. Lett. 87, 160405 (2001)
10. Kinoshita, T., Wenger, T., Weiss, D.S.: Observation of a One-Dimensional Tonks-Girardeau Gas. Science 305, 1125–1128 (2004)
11. Lieb, E.H., Liniger, W.: Exact Analysis of an Interacting Bose Gas. I. The General Solution and the Ground State. Phys. Rev. 130, 1605–1616 (1963)
12. Lieb, E.H., Seiringer, R.: Proof of Bose-Einstein Condensation for Dilute Trapped Gases. Phys. Rev. Lett. 88, 170409-1–4 (2002)
13. Lieb, E.H., Seiringer, R., Yngvason, J.: Bosons in a Trap: A Rigorous Derivation of the Gross-Pitaevskii Energy Functional. Phys. Rev. A 61, 043602 (2000)
14. Lieb, E.H., Seiringer, R., Yngvason, J.: A Rigorous Derivation of the Gross-Pitaevskii Energy Functional for a Two-dimensional Bose Gas. Commun. Math. Phys. 224, 17 (2001)
15. Lieb, E.H., Seiringer, R., Yngvason, J.: One-Dimensional Behavior of Dilute, Trapped Bose Gases. Commun. Math. Phys. 244, 347–393 (2004). See also: One-Dimensional Bosons in Three-Dimensional Traps. Phys. Rev. Lett. 91, 150401-1–4 (2003)
16. Lieb, E.H., Seiringer, R., Yngvason, J.: Poincaré Inequalities in Punctured Domains. Ann. Math. 158, 1067–1080 (2003)
17. Lieb, E.H., Seiringer, R., Solovej, J.P., Yngvason, J.: The Mathematics of the Bose gas and its Condensation. Basel: Birkhäuser 2005
18. Lieb, E.H., Yngvason, J.: Ground State Energy of the Low Density Bose Gas. Phys. Rev. Lett. 80, 2504–2507 (1998)
19. Lieb, E.H., Yngvason, J.: The Ground State Energy of a Dilute Two-dimensional Bose Gas. J. Stat. Phys. 103, 509 (2001)
20. Moritz, H., Stöferle, T., Köhl, M., Esslinger, T.: Exciting Collective Oscillations in a Trapped 1D Gas. Phys. Rev. Lett. 91, 250402 (2003)
21. Olshanii, M.: Atomic Scattering in the Presence of an External Confinement and a Gas of Impenetrable Bosons. Phys. Rev. Lett. 81, 938–941 (1998)
22. Olshanii, M., Pricoupenko, L.: Rigorous approach to the problem of ultraviolet divergences in dilute Bose gases. Phys. Rev. Lett. 88, 010402 (2002)
23. Paredes, B., Widera, A., Murg, V., Mandel, O., Fölling, S., Cirac, I., Shlyapnikov, G.V., Hänsch, T.W., Bloch, I.: Tonks-Girardeau gas of ultracold atoms in an optical lattice. Nature 429, 277–281 (2004)
24. Petrov, D.S., Gangardt, D.M., Shlyapnikov, G.V.: Low-dimensional trapped gases. J. Phys. IV France 116, 3–44 (2004)
25. Petrov, D.S., Holzmann, M., Shlyapnikov, G.V.: Bose-Einstein Condensates in Quasi-2D Trapped Gases. Phys. Rev. Lett. 84, 2551 (2000)
26. Petrov, D.S., Shlyapnikov, G.V., Walraven, J.T.M.: Regimes of Quantum Degeneracy in Trapped 1D Gases. Phys. Rev. Lett. 85, 3745–3749 (2000)
27. Pitaevskii, L., Stringari, S.: Bose-Einstein Condensation. Oxford: Oxford Science Publications, 2003
28. Pricoupenko, L.: Variational approach for the two-dimensional trapped Bose-Einstein condensate. Phys. Rev. A 70, 013601 (2004)
29. Reed, M., Simon, B.: Methods of Modern Mathematical Physics IV. New York: Academic Press, 1978
30. Robinson, D.W.: The Thermodynamic Pressure in Quantum Statistical Mechanics. Lecture Notes in Physics, Vol. 9, Berlin Heidelberg New York: Springer, 1971
31. Rychtarik, D., Engeser, B., Nägerl, H.-C., Grimm, R.: Two-dimensional Bose-Einstein condensate in an optical surface trap. Phys. Rev. Lett. 92, 173003 (2004)
32. Schick, M.: Two-Dimensional System of Hard Core Bosons. Phys. Rev. A 3, 1067–1073 (1971)
33. Schreck, F., et al.: Quasipure Bose-Einstein Condensate Immersed in a Fermi Sea. Phys. Rev. Lett. 87, 080403 (2001)
34. Seiringer, R.: Gross-Pitaevskii Theory of the Rotating Bose Gas. Commun. Math. Phys. 229, 491–509 (2002)
35. Seiringer, R.: Ground state asymptotics of a dilute, rotating gas. J. Phys. A: Math. Gen. 36, 9755–9778 (2003)
36. Seiringer, R.: Dilute, Trapped Bose gases and Bose-Einstein Condensation. In: Large Coulomb Systems, Derezinski, J., Siedentop, H. (eds.) Lect. Notes in Phys. 695, Berlin Heidelberg New York: Springer, 2006, pp. 251–276, available at http://www.esi.ac.at/preprints/ESI-Preprints.html, 2004
37. Temple, G.: The theory of Rayleigh's Principle as Applied to Continuous Systems. Proc. Roy. Soc. London A 119, 276–293 (1928)
38. Zobay, O., Garraway, B.M.: Atom trapping and two-dimensional Bose-Einstein condensates in field-induced adiabatic potentials. Phys. Rev. A 69, 023605 (2004)

Communicated by H.-T. Yau

Commun. Math. Phys. 269, 693–713 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0142-x


On the Global Evolution of Vortex Filaments, Blobs, and Small Loops in 3D Ideal Flows

Luigi C. Berselli, Massimiliano Gubinelli

Dipartimento di Matematica Applicata “U.Dini”, Università di Pisa, Pisa I-56127, Italy. E-mail: [email protected]; [email protected]

Received: 28 October 2005 / Accepted: 7 July 2006
Published online: 10 November 2006 – © Springer-Verlag 2006

Abstract: We consider a wide class of approximate models of evolution of singular distributions of vorticity in three dimensional incompressible fluids and we show that they have global smooth solutions. The proof exploits the existence of suitable Hamiltonian functions. The approximate models we analyze (essentially discrete and continuous vortex filaments and vortex loops) are related to some problems of classical physics concerning turbulence and also to the numerical approximation of flows with very high Reynolds number. Finally, we apply our strategy to discrete models for filaments used in numerical methods.

1. Introduction

In this paper we consider the Euler equations describing the motion of an ideal (inviscid), incompressible and homogeneous fluid in $\mathbb{R}^n$ ($n=2,3$):

$$\frac{\partial u}{\partial t}+(u\cdot\nabla)\,u+\nabla p=0,\tag{1.1}$$

$$\nabla\cdot u=0,\tag{1.2}$$

and with the associated initial condition $u|_{t=0}=u_0(x)$, where $u=(u_1,\dots,u_n)$ is the velocity field, while the scalar $p$ is the kinematic pressure. To solve (1.1) and (1.2), the introduction of the vorticity field is crucial. This quantity, which measures the “amount of rotation” inside the fluid, is responsible for the big differences between the 2D and the 3D case, see Wolibner [37] and Yudovich [39]. If $n=2$, taking the curl of Eq. (1.1), one gets

$$\frac{\partial\omega}{\partial t}+(u\cdot\nabla)\,\omega=0,$$

where $\omega:=\operatorname{curl}u=(0,0,\partial_1u_2-\partial_2u_1)$, that is a pure transport equation. Then, a bounded initial vorticity remains bounded for all positive times. Let us denote by $X_t(x)$ the position at time $t$ of the fluid particle that at time $t=0$ was in $x$. We have the following equation (the equation for the characteristics)

$$\frac{dX_t(x)}{dt}=u(t,X_t(x)).\tag{1.3}$$

The vorticity field is simply transported along the trajectories of particles $X_t$, for $t\ge0$, remaining always perpendicular to the plane of motion. Besides, the vector $\omega$, having two components identically zero, behaves essentially as a scalar. When $n=3$, the situation becomes much more complicated. The vorticity associated to $u$ is a 3D vector field satisfying the following equation:

$$\frac{\partial\omega}{\partial t}+(u\cdot\nabla)\,\omega=(\omega\cdot\nabla)\,u.$$

In contrast to the 2D case, the vorticity $\omega$ is no longer conserved and the fundamental complication arising in three dimensions is the presence on the right-hand side of $(\omega\cdot\nabla)\,u$, the “vortex stretching term,” which is, roughly speaking, of order $|\omega|^2$. Some numerical simulations suggest that solutions of the ideal fluid equations may develop singularities and that the possible singularities may be detected by the blow-up of the vorticity field $\omega$, see Beale, Kato, and Majda [2] and references therein. Moreover, accurate direct numerical simulations of 3D turbulent fluids show that the regions where $|\omega(x)|$ is big have an elongated “filament” shape, see for instance Bell and Markus [3], Vincent and Meneguzzi [36], and the reviews in Chorin [9] and Frisch [18]. Thus, a consistent part of the current research activity is devoted to finding suitable mathematical descriptions of these (and other geometric) vortex structures, see for instance Constantin [10]. In recent years also the use of the Lagrangian approach seems promising to detect different intrinsic geometric properties of fluid flows, see Constantin [11].
We briefly recall some recent contributions on the study of these filaments, motivated also by application to the numerical simulation: Chorin [9] proposed a model based on self-avoiding random walks, using ideas coming from the theory of polymers; Gallavotti [19] suggested the introduction of non-smooth curves (like Brownian motions), to avoid some of the divergences arising in the equations; Lions [26] and Lions and Majda [27] proved a mean field result based on an approximate model with “nearly parallel” vortex filaments; Flandoli [16] introduced a probabilistic description and overcame certain divergences in the energy by considering filaments with a fractal cross-section. Furthermore, by using the latter model, authors in [17] have defined certain Gibbs measures, while Bessaih and Flandoli [5] proved a mean field result. A recent overview on vortex computational methods can be found in Cottet and Koumoutsakos [15]. On the other hand, relevant advances in applied mathematics have been obtained in the study of 2D models for fluids (like the quasi-geostrophic equations) that replicate the 3D vorticity behavior via “analytic and geometric analogies,” see Constantin, Majda, and Tabak [12] and Córdoba and Córdoba [13]. This is obtained with a smoothing or a modification (with appropriate scaling properties) of the Biot-Savart law. Finally, very recently Córdoba et al. (see [14] and references therein) studied the breakdown of smooth tubes of vorticity and other singularities concentrated on small but elongated sets.

Global Evolution of Vorticity in 3D Ideal Flows


The basic goal of this paper is to show global existence of smooth solutions for models of line vortices, supplementing previous work in [4]. In particular, we shall exhibit a Hamiltonian function. The main results will be derived by showing that the energy is conserved and also that the velocity induced by a singular distribution of vorticity is smooth enough. We shall consider rectifiable curves, but similar results can also be proved for Hölder curves and some rough paths. We shall also consider the vortex loop approximation (see Buttke [7] and Chorin [9]) for the equivalent equations of motion involving the impulse density. Also for this model we shall prove similar results of existence of smooth solutions for arbitrarily large times. Finally, we discuss the evolution for approximate equations in which a smooth vortex filament is approximated by a discretization and we shall connect the existence of solutions with the total and quadratic variation of the curve. This latter model is a simpler numerical approximation to the line vortex evolution problem.

Plan of the paper. In Sect. 2 we recall the motivation for the introduction of models with vorticity concentrated on singular sets together with classical methods to obtain integrable equations. In Sect. 3 we introduce a proper Hamiltonian and we give the main analytical results: conservation of energy and estimates of the velocity in terms of the energy that prove global existence of strong solutions. In Sect. 4 we use the same technique to deal also with the problem of vortex loops in magnetization variables. Finally, in Sect. 5 questions connected with the numerical simulation of a line vortex are put in the framework of the previous results.

2. Vortex Motion and Approximate Models

To introduce the problem, we shortly recall the discussion in Sect. 2 of [4], where the reader can find full details.
Curves that are always parallel to the vorticity vector are known as vortex lines, and vortex lines passing through the points of a closed curve define a volume called vortex tube. A vortex filament is a vortex tube that is immediately surrounded by irrotational fluid, even if the usage of this expression is neither uniform nor consistent. It can also be used to designate infinitesimal vortex tubes or also patterns of vorticity that are not properly formed by vortex lines. A line vortex is a singular distribution in which infinite vorticity is concentrated on a line, such that the circulation around a closed circuit threaded by the line (the strength) is finite. A line vortex is the result of a limiting process in which a vortex filament (of finite strength) is contracted to a curve, the strength $\Gamma>0$ being kept constant. The laws of vortex motion, first formulated by Helmholtz [21], state that in an ideal fluid vortex lines and tubes move with the fluid and that the strength of each vortex tube (flux through any cross-section) remains constant in time. Hence, we consider an ideal fluid with the vorticity $\omega$ concentrated on a smooth curve $\gamma$. Neglecting the transverse size of the filament, we can write the vorticity field $\omega$ generated by $\gamma$ (that is described parametrically as a continuous function $\gamma:[0,1]\to\mathbb{R}^3$ such that $\gamma(0)=\gamma(1)$) and, formally,

$$\omega(t,x)=\Gamma\int_0^1\delta(x-\gamma(t,\xi))\,\gamma_\xi(t,\xi)\,d\xi\qquad\forall\,x\in\mathbb{R}^3,\ \forall\,t\in\,]0,T[,\tag{2.1}$$

where $\delta(\,.\,)$ is the Dirac delta function, $\xi$ is the arc-length parameter, $t$ represents the time, and $\gamma_\xi$ is the derivative with respect to the arc-length.
696

L. C. Berselli, M. Gubinelli

A discussion on the precise meaning of the formal vorticity field (2.1) and its rigorous distributional definition can be found, for example, in the Appendix of [4].

Remark 1. The above formula can also be written formally as the distribution

$$\omega(t,x)=\Gamma\int_0^1\delta(x-\gamma_\xi)\,d_\xi\gamma,$$

meaning that the integral could be the Riemann-Stieltjes integral or some other defined in a more general setting. In the sequel we shall not write explicitly the dependence of $\omega$ and $u$ on $\gamma$ since we shall always consider vorticity (and corresponding velocity) “generated” by a distribution concentrated on a curve $\gamma$.

It is well-known that the kinetic energy associated to (2.1) (we shall denote by $\langle a,b\rangle$ the scalar product in $\mathbb{R}^3$ of the vectors $a$ and $b$)

$$E(t)=\frac{\Gamma^2}{8\pi}\int_0^1\!\!\int_0^1\frac{\langle\gamma_\xi(t,\xi),\gamma_\eta(t,\eta)\rangle}{|\gamma(t,\xi)-\gamma(t,\eta)|}\,d\xi\,d\eta,$$

is infinite for any reasonably smooth curve $\gamma$. This explains the reason for the study of approximate models (as the Localized Induction Approximation (LIA) or the Rosenhead approximation, see Saffman [34]) or for the same problem with curves supported by non-smooth lines (see Gallavotti [19] and recent results in [6]). To derive the evolution equation satisfied by a line vortex the essential tool is the Biot-Savart formula that gives a representation formula for the velocity, in terms of the vorticity:

$$u(x)=-\frac{1}{4\pi}\int_{\mathbb{R}^3}\frac{(x-y)}{|x-y|^3}\wedge\omega(y)\,dy,$$

where $a\wedge b\in\mathbb{R}^3$ is the vector (exterior) product of the vectors $a,b\in\mathbb{R}^3$. Inserting the expression (2.1) of the vorticity field in the Biot-Savart formula, using the equation for the characteristics (1.3), and imposing that the curve $\gamma$ is transported by the velocity field $u$, we get the following equation:

$$\frac{\partial\gamma(t,\xi)}{\partial t}=-\frac{\Gamma}{4\pi}\int_0^1\frac{\gamma(t,\xi)-\gamma(t,\eta)}{|\gamma(t,\xi)-\gamma(t,\eta)|^3}\wedge\gamma_\eta(t,\eta)\,d\eta.$$

This equation (the relation between its solutions and weak solutions to the 3D Euler equations can be found, for instance, in the Appendix of [4]) was used for the first time by J.J. Thomson, who proposed a cut-off in the integral in order to control an infinite energy, see [34]. An approximation based on the Taylor series expansion of the kernel in the above singular integral gives the (LIA) analytically solved by Hasimoto [20], while recent results in this direction are those by Klein and Majda [24, 25].

The de-singularized model studied analytically in [4] is that proposed by Rosenhead [33],

$$\frac{\partial\gamma(t,\xi)}{\partial t}=-\frac{\Gamma}{4\pi}\int_0^1\frac{\gamma(t,\xi)-\gamma(t,\eta)}{\big(|\gamma(t,\xi)-\gamma(t,\eta)|^2+\mu^2\big)^{3/2}}\wedge\gamma_\eta(t,\eta)\,d\eta,\tag{2.2}$$

for some $\mu>0$. The choice was motivated by the fact that (2.2) gives advantages for numerical integration and, starting with the work by Moore [31], it has been used in the applied context for some calculations related to aircraft trailing vortices.
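To make the right-hand side of the Rosenhead model concrete, here is a minimal sketch that evaluates the de-singularized Biot-Savart integral on a circular ring. Everything specific in it is an assumption for illustration: the circular shape, the uniform quadrature in the parameter $\eta$, the values of the strength and of $\mu$, and the function names. By symmetry the velocity of a planar ring should point along its axis and be the same at every point of the ring, which is what the code lets one observe.

```python
import math

def cross(a, b):
    """Vector product a ^ b in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ring(n, radius=1.0):
    """Points gamma(eta_j) and derivatives gamma_eta(eta_j) of a circle in the
    xy-plane, with eta_j = j / n in [0, 1)."""
    points, tangents = [], []
    for j in range(n):
        theta = 2.0 * math.pi * j / n
        points.append((radius * math.cos(theta), radius * math.sin(theta), 0.0))
        tangents.append((-2.0 * math.pi * radius * math.sin(theta),
                         2.0 * math.pi * radius * math.cos(theta), 0.0))
    return points, tangents

def rosenhead_velocity(i, points, tangents, strength, mu):
    """Quadrature of -(strength / 4 pi) * int (g_i - g(eta)) / (|.|^2 + mu^2)^(3/2) ^ g_eta d eta."""
    n = len(points)
    total = [0.0, 0.0, 0.0]
    for j in range(n):
        diff = tuple(points[i][k] - points[j][k] for k in range(3))
        dist2 = sum(d * d for d in diff)
        weight = 1.0 / ((dist2 + mu * mu) ** 1.5 * n)
        c = cross(diff, tangents[j])
        for k in range(3):
            total[k] += weight * c[k]
    factor = -strength / (4.0 * math.pi)
    return tuple(factor * t for t in total)

points, tangents = ring(200)
velocities = [rosenhead_velocity(i, points, tangents, strength=1.0, mu=0.1)
              for i in range(0, 200, 25)]
print(velocities[0])
```

The smoothing parameter keeps every term of the sum finite, including the term with $j=i$, so no special treatment of the diagonal is needed; this is the practical advantage for numerical integration mentioned in the text.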


2.1. On the Rosenhead model. Regarding the Rosenhead model, a first analytical result has been recently proved in [4]. In that reference it is shown (as a particular case) that if the initial curve is smooth and closed then the initial value problem is well-posed in natural Sobolev spaces. In the sequel we shall use the customary $W^{k,p}$ and $H^k=W^{k,2}$ spaces, see Adams [1]. To our knowledge the following is essentially the first rigorous existence and uniqueness result known for this model.

Theorem 1. [See [4]]. Let $\gamma_0(\xi)\in H^1_\#(0,1)$, where $H^1_\#(0,1)$ denotes the subset of closed curves $\mathbb{R}\to\mathbb{R}^3$ belonging to the Sobolev space $H^1(0,1)$. Then, there exist a strictly positive $T^*\le T$ and a unique curve $\gamma$ satisfying

a) $\gamma\in W^{1,\infty}(0,T^*;L^2(0,1))\cap L^\infty(0,T^*;H^1_\#(0,1))$;

b) $\displaystyle\gamma(t,\xi)=\gamma_0(\xi)-\frac{\Gamma}{4\pi}\int_0^t\!\!\int_0^1\frac{\gamma(\tau,\xi)-\gamma(\tau,\eta)}{\big(|\gamma(\tau,\xi)-\gamma(\tau,\eta)|^2+\mu^2\big)^{3/2}}\wedge\gamma_\eta(\tau,\eta)\,d\eta\,d\tau,\qquad 0\le t<T^*,$

that is a strong solution to (2.2) in the time interval $[0,T^*)$. In addition, the time $T^*>0$ may be estimated from below and it depends just on the $H^1$-norm of the initial datum.

Remark 2. As proved in [4] the results of Theorem 1 hold also in the presence of more general regularized equations of which the Rosenhead model is a particular case. The assumption that the kernel function $\nabla\varphi$ in an approximate Biot-Savart law

$$u(x)=-\frac{1}{4\pi}\int_{\mathbb{R}^3}\nabla\varphi(x-y)\wedge\omega(y)\,dy,$$

(cf. with the $\nabla\varphi$ of the section below) has first and second order derivatives uniformly bounded is sufficient in order to have the same results. Hence the local existence of strong solutions is inherited by the models for line vortex we shall consider in the sequel. A similar result has been recently proved in a more general stochastic context in [6] with a different goal and weaker hypotheses on the smoothness of the curve $\gamma$. Anyway, in both papers the question of the global solvability is left open and some continuation criteria involving the length of the curve itself were derived.

Our main goal is now to show that, for a class of regularizations including the Rosenhead model (and that share some symmetries with the original problem), the above local theorem is in fact global.

2.2. A class of regularized problems. A wide and interesting class of regularized evolution equations can be defined as follows. Let $\varphi:\mathbb{R}^3\to\mathbb{R}$ be a scalar function and define the velocity field $u$ associated to the vortex filament $\gamma$ as

$$u(x):=(\operatorname{curl}\varphi)*\omega=\int_0^1\nabla\varphi(x-\gamma(\xi))\wedge\gamma_\xi(\xi)\,d\xi,\tag{2.3}$$

where the convolution $*$ is to be understood in the sense of distributions. On the kernel $\varphi$ we impose the following conditions that will allow the energy to be well-defined.

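Hypothesis A. We shall assume that:

- The function $\varphi$ is even:

$$\varphi(-x)=\varphi(x)\qquad\forall\,x\in\mathbb{R}^3.\tag{A.1}$$

- The function $\varphi$ has a real and non-negative Fourier transform:

$$\hat\varphi(k):=\int_{\mathbb{R}^3}e^{i\langle k,x\rangle}\varphi(x)\,dx\ \ge\ 0\qquad\forall\,k\in\mathbb{R}^3.\tag{A.2}$$

- The Fourier transform $\hat\varphi(k)$ is integrable over $\mathbb{R}^3$:

$$\int_{\mathbb{R}^3}\hat\varphi(k)\,dk<+\infty.\tag{A.3}$$

- The function $\varphi$ is smooth enough, in order that

$$\int_{\mathbb{R}^3}(1+|k|^2)^2\,\hat\varphi(k)\,dk<\infty.\tag{A.4}$$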
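Assumption (A.2) says that $\varphi$ is a positive definite function, and this is what makes a quadratic form of the type $\frac12\sum_{i,j}\varphi(\gamma_i-\gamma_j)\langle d_i,d_j\rangle$ non-negative for any closed polygon with vertices $\gamma_i$ and edge vectors $d_i$. The sketch below checks this on two examples. The kernel $1/(|x|^2+\mu^2)^{1/2}$ with unit numerator, the test curves, the discretization by vertices and edges, and the function names are all assumptions made for illustration only.

```python
import math

def phi_smoothed(x, mu):
    """Illustrative even kernel 1 / sqrt(|x|^2 + mu^2) (unit normalization assumed)."""
    return 1.0 / math.sqrt(x[0] ** 2 + x[1] ** 2 + x[2] ** 2 + mu * mu)

def discrete_energy(points, mu):
    """0.5 * sum_{i,j} phi(g_i - g_j) <d_i, d_j>, with d_i = g_{i+1} - g_i on a closed polygon."""
    n = len(points)
    edges = [tuple(points[(i + 1) % n][k] - points[i][k] for k in range(3))
             for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff = tuple(points[i][k] - points[j][k] for k in range(3))
            dot = sum(edges[i][k] * edges[j][k] for k in range(3))
            total += phi_smoothed(diff, mu) * dot
    return 0.5 * total

def circle(n):
    return [(math.cos(2 * math.pi * j / n), math.sin(2 * math.pi * j / n), 0.0)
            for j in range(n)]

def trefoil(n):
    pts = []
    for j in range(n):
        t = 2 * math.pi * j / n
        pts.append((math.sin(t) + 2 * math.sin(2 * t),
                    math.cos(t) - 2 * math.cos(2 * t),
                    -math.sin(3 * t)))
    return pts

energy_circle = discrete_energy(circle(120), mu=0.2)
energy_trefoil = discrete_energy(trefoil(120), mu=0.2)
print(energy_circle, energy_trefoil)
```

The pairwise scalar products $\langle d_i,d_j\rangle$ can be negative for antiparallel edges, so the non-negativity of the total is not obvious term by term; it comes from the sign of the Fourier transform of the kernel, applied to each Cartesian component of the edge vectors.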

These assumptions are satisfied by a wide range of functions ϕ, covering also well-known cases with a physically meaningful interpretation. Remark 3. [An explicit relevant example]. The function ϕ R (x) =

(|x|2 + μ2 )1/2

for some μ > 0 would be a natural candidate satisfying all the above assumptions. Assumption (A.1) is trivially satisfied. The Assumption (A.2) on the non-negativity of the Fourier transform can be verified by a direct computation:

e ik, x d x x|2 + μ2 )1/2 R3 (| ∞ 2 2 e ik, x −t|x | −tμ =√ dt d x t 1/2 π 0 R3 ∞ −|k |2 /(4t)−tμ2 e dt ≥ 0, = π t2 0

= ϕ R (k)

where we used the fact that

π = λ

∞

e −λt t −1/2 dt,

0

and we employed the explicit expression of the Fourier transform of the Gaussian kernel. Regarding the Assumption (A.3), the Fourier Transform may be expressed in terms of a Bessel function of the second kind y(z) = Bessel(1, z), defined as a solution of the differential equation z 2 y + zy − (z 2 + 1)y = 0. Hence, the integral over R3 can be split into the inner and outer part. The first one is bounded since the function ϕ R (k) −2 ) near the origin, hence the inner integral converges. On the other hand, is O(|k| the decay at infinity necessary to show the convergence of the outer integral derives = 2μ2 Bessel(1, 2|k|μ)(μ| −1 . The fast decay at k|) directly by observing that ϕ R (k)

Global Evolution of Vorticity in 3D Ideal Flows


infinity (enough to have both (A.3) and (A.4) satisfied) derives from the properties of this special function. In this case we obtain, by using as a kernel the function $\varphi_R(x)$, exactly the Rosenhead model where the velocity field is given by the equation:
$$u(x) = -\frac{1}{4\pi} \int_0^1 \frac{(x-\gamma(\xi)) \wedge \gamma_\xi(\xi)}{\left(|x-\gamma(\xi)|^2+\mu^2\right)^{3/2}}\,d\xi.$$

The evolution problem can be set up in different Banach spaces of closed paths where local solutions exist and are unique. For example we can have solutions for the initial condition in the Sobolev space $H^1_\#(0,1)$, in the space of Hölder continuous functions with exponent greater than 1/2, and also in some spaces of rough paths, see [4, 6]. Details of the proofs (and some specific assumptions) depend however on the functional setting, so we will show the proofs in the case of solutions living in $H^1_\#(0,1)$, using the framework of Theorem 1.

3. Energy and Global Solutions

For the evolution problem
$$\frac{\partial \gamma}{\partial t}(t,\xi) = u(t, \gamma(t,\xi)), \tag{3.1}$$
with velocity given by (2.3), we can identify a function which plays the role of a “kinetic” energy.

Definition 1 (Kinetic energy). The function $\mathcal{H}$, defined on the space of smooth curves, is the “kinetic energy” for the smoothed evolution problem associated to (3.1),
$$\mathcal{H}(\gamma) := \frac{1}{2} \int_0^1\!\!\int_0^1 \varphi(\gamma(\xi)-\gamma(\eta))\, \langle \gamma_\xi(\xi), \gamma_\eta(\eta) \rangle \,d\xi\,d\eta. \tag{3.2}$$

The main result we shall prove is that $\mathcal{H}(\gamma)$ is constant in time along solutions of (3.1). Then, the existence of the energy allows us to exploit a-priori estimates to have global existence for initial data with finite energy.

Remark 4. This particular expression for the energy has been discussed also by Marsden and Weinstein [30] and Holm [22]. In fact, they show that on the space of closed curves in $\mathbb{R}^3$ there exists a natural Poisson structure and that -formally- with respect to this structure the function (3.2) is the Hamiltonian function which generates the flow described by (3.1).

First, we show that the function $\mathcal{H}(\gamma)$ is well-defined.

Lemma 1. For each smooth enough curve $\gamma$, it holds that $0 \le \mathcal{H}(\gamma) < +\infty$.

Proof. In terms of Fourier variables, the energy $\mathcal{H}(\gamma)$ can be written as
$$\mathcal{H}(\gamma) = \frac{1}{2(2\pi)^3} \int_0^1\!\!\int_0^1\!\!\int_{\mathbb{R}^3} \widehat{\varphi}(k)\, e^{-i\langle k, \gamma(\xi)-\gamma(\eta)\rangle}\, \langle \gamma_\xi(\xi), \gamma_\eta(\eta)\rangle \,dk\,d\xi\,d\eta = \frac{1}{2(2\pi)^3} \int_{\mathbb{R}^3} \widehat{\varphi}(k) \left| \int_0^1 e^{i\langle k, \gamma(\xi)\rangle}\, \gamma_\xi(\xi)\,d\xi \right|^2 dk,$$


which, thanks to the assumption on the non-negativity of the Fourier transform of $\varphi$, proves that the energy is a non-negative quantity. Moreover, for any $\gamma \in H^1_\#(0,1)$ we have the obvious estimate
$$\left| \int_0^1 e^{i\langle k, \gamma(\xi)\rangle}\, \gamma_\xi(\xi)\,d\xi \right| \le \int_0^1 |\gamma_\xi(\xi)|\,d\xi.$$
Actually we observe that sharper estimates are possible, even if they are not needed at this stage. Then,
$$\mathcal{H}(\gamma) \le \frac{1}{2(2\pi)^3} \int_{\mathbb{R}^3} \widehat{\varphi}(k)\,dk \left( \int_0^1 |\gamma_\xi(\xi)|\,d\xi \right)^2.$$

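The final observation is that both integrals are finite (use Assumption (A.3) and the fact that $\gamma$ is a rectifiable curve), showing that $\mathcal{H}(\gamma)$ is a well-defined energy for any $\gamma \in H^1_\#(0,1)$.

Next, we show that the function $\mathcal{H}(\gamma)$ behaves as a constant of motion if $\gamma$ evolves under the flow associated to $u$.

Lemma 2. Let $\gamma(t,\xi)$ be a local smooth (as those of Theorem 1) solution of the problem (3.1), then
$$\frac{d\mathcal{H}(\gamma(t,\cdot))}{dt} = 0.$$

Proof. We shall show by an explicit computation that the energy $\mathcal{H}(\gamma)$ is invariant. The completely anti-symmetric Levi-Civita tensor $\epsilon_{ijk}$ will be used to write, with the Einstein repeated indices convention, that
$$u^i(t,x) = \epsilon_{ijk} \int_0^1 \nabla^j \varphi(x - \gamma(t,\xi))\, \gamma^k_\xi(t,\xi)\,d\xi.$$
Then, with explicit index notation, we have:
$$\frac{d\mathcal{H}(\gamma(t,\cdot))}{dt} = \int_0^1\!\!\int_0^1 \nabla^k \varphi(\gamma(t,\xi)-\gamma(t,\eta))\, \partial_t\gamma^k(t,\xi)\, \gamma^i_\xi(t,\xi)\, \gamma^i_\eta(t,\eta)\,d\xi\,d\eta + \int_0^1\!\!\int_0^1 \varphi(\gamma(t,\xi)-\gamma(t,\eta))\, \partial_t\gamma^i_\xi(t,\xi)\, \gamma^i_\eta(t,\eta)\,d\xi\,d\eta,$$
where, for simplicity, $\partial_t\gamma(t) := \partial\gamma(t)/\partial t$. By integrating by parts the $\xi$-integral in the second term we get
$$\begin{aligned}
\frac{d\mathcal{H}(\gamma(t,\cdot))}{dt} &= \int_0^1\!\!\int_0^1 \nabla^k \varphi(\gamma(t,\xi)-\gamma(t,\eta))\, \partial_t\gamma^k(t,\xi)\, \gamma^i_\xi(t,\xi)\, \gamma^i_\eta(t,\eta)\,d\xi\,d\eta \\
&\quad - \int_0^1\!\!\int_0^1 \nabla^k \varphi(\gamma(t,\xi)-\gamma(t,\eta))\, \partial_t\gamma^i(t,\xi)\, \gamma^k_\xi(t,\xi)\, \gamma^i_\eta(t,\eta)\,d\xi\,d\eta \\
&= \epsilon_{cab}\,\epsilon_{cij} \int_0^1\!\!\int_0^1 \nabla^a \varphi(\gamma(t,\xi)-\gamma(t,\eta))\, \partial_t\gamma^i(t,\xi)\, \gamma^j_\xi(t,\xi)\, \gamma^b_\eta(t,\eta)\,d\xi\,d\eta,
\end{aligned}$$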


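where we used the fact that $\epsilon_{cab}\,\epsilon_{cij} = \delta_{ai}\delta_{bj} - \delta_{aj}\delta_{bi}$. Next, by definition of $u$ we have
$$\frac{d\mathcal{H}(\gamma(t,\cdot))}{dt} = \epsilon_{cij} \int_0^1 u^c(\gamma(t,\xi))\, \partial_t\gamma^i(t,\xi)\, \gamma^j_\xi(t,\xi)\,d\xi = \epsilon_{cij} \int_0^1 u^c(\gamma(t,\xi))\, u^i(\gamma(t,\xi))\, \gamma^j_\xi(t,\xi)\,d\xi = 0,$$
where we used in sequence Eq. (3.1) of motion for $\gamma(t)$ and the complete anti-symmetry of the tensor $\epsilon_{ijk}$.

The next step is to show that if the kinetic energy is bounded, then the velocity $u$ associated to the evolution problem is smooth. In particular, it will follow that the velocity induced by vorticity concentrated over an $H^1$-curve is very regular. The regularity of the smoothing kernel is inherited by the velocity $u$ even if the framework is that of a singular problem.

Lemma 3. For any $0 \le n \in \mathbb{N}$, we have the bound
$$\|\nabla^n u(t,\cdot)\|_{L^\infty} \le \frac{1}{2\pi^{3/2}} \left[ \int_{\mathbb{R}^3} |k|^{2(1+n)}\, \widehat{\varphi}(k)\,dk \right]^{1/2} \mathcal{H}^{1/2}(\gamma(t,\cdot)), \tag{3.3}$$
provided that the integral $\int_{\mathbb{R}^3} |k|^{2(1+n)}\, \widehat{\varphi}(k)\,dk$ is finite.

Proof. The proof follows easily by using the Cauchy-Schwarz inequality. In fact,
$$|u(t,x)| = \left| \frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} e^{i\langle k, x\rangle}\, \widehat{\varphi}(k)\,(ik) \wedge \int_0^1 e^{-i\langle k, \gamma(t,\xi)\rangle}\, \gamma_\xi(t,\xi)\,d\xi\,dk \right|$$
and since by assumption (A.3) $\sqrt{\widehat{\varphi}(k)}$ is well defined
$$\begin{aligned}
&= \left| \frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} e^{i\langle k, x\rangle}\, \sqrt{\widehat{\varphi}(k)}\,(ik) \wedge \sqrt{\widehat{\varphi}(k)} \int_0^1 e^{-i\langle k, \gamma(t,\xi)\rangle}\, \gamma_\xi(t,\xi)\,d\xi\,dk \right| \\
&\le \frac{1}{(2\pi)^3} \left[ \int_{\mathbb{R}^3} |k|^2\, \widehat{\varphi}(k)\,dk \right]^{1/2} \left[ \int_{\mathbb{R}^3} \widehat{\varphi}(k) \left| \int_0^1 e^{-i\langle k, \gamma(t,\xi)\rangle}\, \gamma_\xi(t,\xi)\,d\xi \right|^2 dk \right]^{1/2} \\
&= \frac{1}{2\pi^{3/2}} \left[ \int_{\mathbb{R}^3} |k|^2\, \widehat{\varphi}(k)\,dk \right]^{1/2} \mathcal{H}^{1/2}(\gamma(t,\cdot)),
\end{aligned}$$
and in a similar fashion we can bound all the derivatives of $u$, provided that the kernel $\varphi$ is smooth enough:
$$\|\nabla^n u(t,\cdot)\|_{L^\infty} \le \frac{1}{2\pi^{3/2}} \left[ \int_{\mathbb{R}^3} |k|^{2(1+n)}\, \widehat{\varphi}(k)\,dk \right]^{1/2} \mathcal{H}^{1/2}(\gamma(t,\cdot)).$$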
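The case n = 0 of the bound in Lemma 3 can be checked numerically on a polygonal approximation of a closed curve. The sketch below is illustrative only and rests on two assumptions that are not taken from the text: the kernel is the Gaussian $\varphi(x) = e^{-|x|^2/2}$, and the integrals over the curve are replaced by sums over vertices $x_\alpha$ with edge vectors $\xi_\alpha = x_{\alpha+1} - x_\alpha$. For this kernel $\int_{\mathbb{R}^3} |k|^2 \widehat{\varphi}(k)\,dk = 3(2\pi)^3$, so the bound reads $|u(x)| \le \sqrt{6}\,\mathcal{H}^{1/2}$; the same Cauchy-Schwarz argument applies verbatim to the sums. All function names are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def phi(d):
    """ASSUMED kernel phi(x) = exp(-|x|^2 / 2), evaluated at the vector d."""
    return math.exp(-0.5 * dot(d, d))


def grad_phi(d):
    """Gradient of the assumed Gaussian kernel at d."""
    w = phi(d)
    return [-c * w for c in d]


def closed_polygon(n):
    """Vertices x_a and edges xi_a = x_{a+1} - x_a of a closed test curve."""
    xs = [[math.cos(2 * math.pi * j / n),
           math.sin(2 * math.pi * j / n),
           0.3 * math.sin(4 * math.pi * j / n)] for j in range(n)]
    xis = [[q - p for p, q in zip(xs[j], xs[(j + 1) % n])] for j in range(n)]
    return xs, xis


def energy(xs, xis):
    """Discrete energy (1/2) * sum_{a,b} phi(x_a - x_b) <xi_a, xi_b>."""
    total = 0.0
    for xa, xia in zip(xs, xis):
        for xb, xib in zip(xs, xis):
            total += phi([p - q for p, q in zip(xa, xb)]) * dot(xia, xib)
    return 0.5 * total


def velocity(x, xs, xis):
    """Discrete velocity sum_a grad phi(x - x_a) ^ xi_a."""
    u = [0.0, 0.0, 0.0]
    for xa, xia in zip(xs, xis):
        c = cross(grad_phi([p - q for p, q in zip(x, xa)]), xia)
        u = [ui + ci for ui, ci in zip(u, c)]
    return u


xs, xis = closed_polygon(40)
H = energy(xs, xis)
rng = random.Random(0)
samples = [[rng.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(200)]
worst = max(math.sqrt(dot(u, u)) for u in (velocity(x, xs, xis) for x in samples))
bound = math.sqrt(6.0 * H)  # sqrt(6) * H^(1/2) for the Gaussian kernel
print(H, worst, bound)
```

The sampled maximum of $|u|$ stays below the bound, and the energy is positive, in line with Lemma 1.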


Essentially we shall use this lemma just for $n = 1$ to prove global existence of solutions, hence Assumption (A.4) will be enough. On the other hand, to prove higher regularity of $u$ these extra conditions on $\widehat{\varphi}(k)$ are needed.

Theorem 2. The evolution problem (3.1) has a unique global solution for any initial condition in $H^1_\#(0,1)$.

Proof. Denote by $\psi \in H^1_\#(0,1)$ the initial condition, then we already know that the local solution $\gamma(t,\xi)$ exists in a (possibly small) time interval $t \in [0, T^*[$, depending on the $L^2$-norm of $\psi_\xi$, recall Theorem 1. Let us suppose per absurdum that this is the maximal existence interval for the smooth solution starting from $\psi$, hence that $\lim_{t \to T^*} \|\gamma_\xi(t,\cdot)\| = +\infty$. However, by taking the derivative of (3.1) with respect to $\xi$, and with an integration over $[0,t] \subset [0,T^*[$, we have now the additional a-priori estimate ($\|\cdot\|$ denotes the $L^2(0,1)$-norm)
$$\begin{aligned}
\|\gamma_\xi(t,\cdot)\| &\le \|\gamma_\xi(0,\cdot)\| + \int_0^t \|\nabla u(s,\cdot)\|_{L^\infty}\, \|\gamma_\xi(s,\cdot)\|\,ds \\
&\le \|\gamma_\xi(0,\cdot)\| + C \int_0^t \mathcal{H}^{1/2}(\gamma(s,\cdot))\, \|\gamma_\xi(s,\cdot)\|\,ds \\
&= \|\gamma_\xi(0,\cdot)\| + C\, \mathcal{H}^{1/2}(\gamma(0,\cdot)) \int_0^t \|\gamma_\xi(s,\cdot)\|\,ds,
\end{aligned}$$
where the first line is an easy bound for the norm of the solution, the second line comes from Lemma 3, and the last line from the fact that the energy is constant along solutions (as stated in Lemma 2). Then, by using the Gronwall inequality we have
$$\sup_{t \in [0,T^*[} \|\gamma_\xi(t,\cdot)\| \le \|\gamma_\xi(0,\cdot)\|\, e^{C\, \mathcal{H}^{1/2}(\gamma(0,\cdot))\, T^*} < +\infty,$$
which guarantees the boundedness of $\|\gamma_\xi(t,\cdot)\|$. This proves the existence of the global solution, since any local solution with uniformly bounded $H^1(0,1)$ norm can be uniquely continued up to any positive time.

3.1. Hölder initial conditions. In Ref. [6] the authors prove that for sufficiently regular functions $\varphi$ the vortex filament equation has local solutions for initial conditions $\gamma(0)$ which are Hölder continuous functions of exponent $\alpha$, with $\alpha > 1/2$. In this case, the line integrals of the form
$$\int_0^1 \langle f(\gamma(t,\xi)), \gamma_\xi(t,\xi) \rangle\,d\xi = \int_0^1 \langle f(\gamma(t,\xi)), d_\xi \gamma(t,\xi) \rangle \tag{3.4}$$
must be understood as limits of Riemann sums: such a limit exists if the function $f : \mathbb{R}^3 \to \mathbb{R}^3$ is at least $C^1$ and this process defines the integral in the sense of Young [38]. Moreover, in the same paper, the authors extend the local existence result to a class of initial conditions living in a space of rough paths (see the book of Lyons for more details [28, 29]) that are a special class of Hölder paths with some additional structure. For rough paths it is possible to define the line integrals (as those in Eq. (3.4)) by means


of natural “renormalized” Riemann sums. In this way it has been proved that the filament equation (with sufficient regularity of the kernel ϕ) admits local solutions starting from almost every 3D Brownian loop (i.e. curves chosen according to the Wiener measure restricted to the subspace of closed curves). The proof that the energy is constant along solutions extends also to these different functional settings and the existence of global solutions can be proved with the same strategy used for H 1 paths. We leave the technical details to the interested reader, since they are outside the scope of the present paper. For example, the extension of Lemma 2 to Hölder solutions can be done using convergence of approximating discretizations and the computations contained in Sect. 5.

Remark 5. The relevance of this additional result can be seen in the light of the K41 theory and the Onsager conjecture on possible singularities in the sense of C 0,α velocity fields. This may suggest that velocity behaves as a Hölder continuous velocity field (see [18]), hence as a singularity diffused all over the fluid. More recent developments link intermittency with concentration of singularities on small sets, described by Hölder, hence (multi-)fractal sets, see [18, 19].

4. Impulse Formulation

In this section we consider another singular distribution of vorticity that is well-known and useful in scientific computing, since it involves vector fields (such as the “magnetization”) that may have non-vanishing divergence. In particular, we shall consider Buttke loops that (according to the terminology of Chorin [9]) are small loops of vorticity in an irrotational background, which evolve according to the 3D Euler equation. The first study of the kinematic interaction of an immersed body in an inviscid irrotational flow dates back to Kelvin [23] and his analysis was based on the study of fluid impulse. It was in that paper that he introduced a model of “core-less vortices” in order to explain some experimental facts on eddy formation at the boundary. The interaction of a vortex ring (line) in an inviscid flow has also been studied by Roberts [32], by using as canonical variables the position of the centroid of each ring and its impulse (called by Roberts “momentum of vorticity”). A proper Hamiltonian formulation of the 3D Euler equations has been discovered independently by different authors: see for instance Oseledets [35] for a development in a continuum setting, with the canonical variables being the position and the impulse density. Then, around 1990 Buttke [7] linked the discrete formulation of Roberts to fast and efficient discrete methods for the numerical simulation of turbulent flows. We consider also this setting, since by using essentially the same techniques of the previous section we are able to prove a global existence result also for the latter model.

4.1. Buttke loops and a discrete problem. The main idea (refer for instance to [7, 9]) is to introduce a new vector valued variable $m$ (that is called magnetization or vortex magnetization), that is obtained by adding to the velocity $u$ a gradient at $t = 0$:
$$m = u + \nabla q. \tag{4.1}$$
The unknown $m$ does not satisfy the incompressibility constraint, but it has compact support and it satisfies $\operatorname{curl} u = \operatorname{curl} m$.


Then, we can interpret $m$ as essentially local, while $\nabla q$ is an extensive field. The decomposition (4.1) resembles the Helmholtz decomposition, even if $m$ is different from $u$, or from momentum density. The vector $m$ is related to the so-called “effective vorticity” $\xi$ by the relation $\xi = \operatorname{curl} m$. Finally, the introduction of the new (compactly supported) variable $m$ can be seen as a gauge transformation, known as “geometric gauge.”

The equation of evolution for $m$ can be easily determined:
$$\frac{\partial m}{\partial t} + (u \cdot \nabla)\, m + m \cdot (\nabla u)^T = 0,$$
where $u = P m$, $P$ denoting the Leray projection over divergence-free vector fields. The main point is that we have an equivalent equation for $u$ plus an arbitrary gradient at the initial time. A proper choice of this gradient leads to the study of vortex loops. In fact, if $\operatorname{curl} u$ has support within a small ball $B$, then the resulting $m$ has support within the same ball and we shall assume that
$$m(x) = M\, \rho(x - x_B) \qquad \text{for some } x_B \in B.$$
Here $M$ is a vector in $\mathbb{R}^3$, while $\rho : \mathbb{R}^3 \to \mathbb{R}$ is a non-negative smooth function with support within $B$ and such that $\int_{\mathbb{R}^3} \rho(x)\,dx = 1$. The “magnet” $M \rho(x - x_B)$ has a simple interpretation: the velocity field induced is the same as a small vorticity loop (or vortex ring), with $M$ perpendicular to the plane of the loop and $|M| = \pi R^2$. Then, from the physical point of view the vector $m$ can be interpreted as a vortex dipole density.

The above interpretation can be used to deduce a Hamiltonian system for a more complicated distribution of vorticity, see Buttke [7]. This approach works as a de-singularization of the problem: the resulting discrete dynamics we shall consider includes in an implicit way a regularization that will allow us to prove global existence of smooth solutions as well as good computational properties investigated in [7].

The discrete model is derived by considering a finite sum of magnets and denoting by $x_\alpha$, for $\alpha = 1, \dots, N$, the positions of the loops and by $m_\alpha$ the corresponding magnetization vectors, that are no longer constant vectors. Then, the velocity field $u$ of the fluid is given by the solution of the equation
$$u = M + \nabla q, \qquad \text{with} \qquad M(x,t) = \sum_{\alpha=1}^N m_\alpha(t)\, \rho(x - x_\alpha),$$
where the scalar $q$ is determined by the equation $\Delta q = -\nabla \cdot M$. The equation of motion for $M$ is (component-wise)
$$\frac{\partial M^i}{\partial t} + \sum_{j=1}^3 u^j\, \nabla^j M^i = -\sum_{j=1}^3 M^j\, \nabla^i u^j, \qquad i = 1, 2, 3,$$


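and the “loop particles” move according to the system of ordinary differential equations
$$\begin{cases}
\dot{m}^i_\alpha = -\displaystyle\sum_{j=1}^3 m^j_\alpha\, \nabla^i u^j(x_\alpha) \\[2mm]
\dot{x}^i_\alpha = u^i(x_\alpha),
\end{cases} \tag{4.2}$$
where the index $\alpha$ runs over particle labels. Equations (4.2) form a Hamiltonian system, if we consider the conjugate pairs of variables $(x_\alpha, m_\alpha)_{\alpha=1,\dots,N}$ with symplectic structure
$$\sum_{\alpha=1}^N \sum_{i=1}^3 dm^i_\alpha \wedge dx^i_\alpha$$
and Hamiltonian function
$$\mathcal{H}(\{x_\alpha, m_\alpha\}) = \frac{1}{2} \sum_{\alpha,\beta=1}^N \Big[ m_\alpha \cdot m_\beta\, \rho(x_\alpha - x_\beta) + (m_\alpha \cdot \nabla)(m_\beta \cdot \nabla)\,\psi(x_\alpha - x_\beta) \Big],$$
where $\psi$ is the solution of $-\Delta \psi = \rho$. Hence system (4.2) takes the canonical form
$$\begin{cases}
\dot{x}_\alpha = \dfrac{\partial \mathcal{H}}{\partial m_\alpha} \\[2mm]
\dot{m}_\alpha = -\dfrac{\partial \mathcal{H}}{\partial x_\alpha}.
\end{cases}$$

Remark 6. The analogy between magnetostatics and fluid mechanics is that the vorticity corresponds to the electric current and the velocity corresponds to the magnetic induction. Since the name “magnetization” may be misleading, we prefer to use Chorin’s notation and to refer to them as Buttke loops or vorticity loops.

In order to show that the energy is constant on solutions we pass to the wave-number notation. Let $\widehat{\rho}(k)$ be the Fourier transform of $\rho(x)$. The energy $\mathcal{H}(\{x_\alpha, m_\alpha\})$ can be written in terms of wave-numbers as follows:
$$\mathcal{H}(\{x_\alpha, m_\alpha\}) = \frac{1}{2(2\pi)^3} \int_{\mathbb{R}^3} \widehat{\rho}(k) \left| \sum_{\alpha=1}^N \Pi_k m_\alpha\, e^{i\langle k, x_\alpha\rangle} \right|^2 dk,$$
where $\Pi_k$ is the projection in the plane orthogonal to the vector $k$. From this representation it is clear that the energy is positive definite. Moreover, the velocity field has the form
$$u(x) = \frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} \widehat{\rho}(k) \sum_{\alpha=1}^N \Pi_k m_\alpha\, e^{-i\langle k, x - x_\alpha\rangle}\,dk$$
and the matrix $\nabla u$ appearing in the equation for $m$ is
$$[\nabla u]^{lj}(x) = \frac{1}{(2\pi)^3} \int_{\mathbb{R}^3} \widehat{\rho}(k)\, (-ik)^l \sum_{\alpha=1}^N (\Pi_k m_\alpha)^j\, e^{-i\langle k, x - x_\alpha\rangle}\,dk.$$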
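The projection onto the plane orthogonal to a wave vector, which appears in the Fourier representation of the energy and of the velocity, can be made concrete in a few lines. The sketch below assumes the standard formula $m - k\,\langle k, m\rangle / |k|^2$ for that projection (valid for $k \ne 0$); the function name `project_orthogonal` and the sample vectors are hypothetical. It illustrates two facts used implicitly above: the projected vector is orthogonal to $k$ (so each Fourier mode of $u$ is divergence-free), and projecting cannot increase the length.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def project_orthogonal(k, m):
    """Return m minus its component along the nonzero vector k."""
    s = dot(k, m) / dot(k, k)
    return [mi - s * ki for mi, ki in zip(m, k)]


k = [1.0, 2.0, 2.0]
m = [3.0, 0.0, 1.0]
p = project_orthogonal(k, m)

print(dot(k, p))             # orthogonality: numerically zero
print(dot(p, p), dot(m, m))  # the projection does not increase the norm
```

Applying `project_orthogonal` twice returns the same vector up to rounding, as expected of a projection.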


4.2. The periodic problem. In order to avoid problems related to the decay at infinity and possibly non-converging integrals, we restrict to the periodic setting. Thus, we fix a box of linear size $2\pi$ and consider the periodic version of the above problem, i.e., we substitute the Fourier transform with a Fourier series expansion. The energy is now given by
$$\mathcal{H}(\{x_\alpha, m_\alpha\}) = \frac{1}{2(2\pi)^3} \sum_{k \in \mathbb{Z}^3} \widehat{\rho}(k) \left| \sum_{\alpha=1}^N \Pi_k m_\alpha\, e^{i\langle k, x_\alpha\rangle} \right|^2,$$
where $k$ runs over $\mathbb{Z}^3$ and the coordinates $x_\alpha$ are restricted to the box $]-\pi, \pi[^3$. The velocity is then
$$u(x) = \frac{1}{(2\pi)^3} \sum_{k \in \mathbb{Z}^3} \widehat{\rho}(k) \sum_{\alpha=1}^N \Pi_k m_\alpha\, e^{-i\langle k, x - x_\alpha\rangle} \qquad \forall\, x \in\, ]-\pi, \pi[^3.$$
By using standard techniques we are able to prove the following result.

Proposition 1. The solution for the evolution problem of a finite number of loops in the periodic box exists (and is unique) for any positive time, provided that the initial condition has finite energy and the function $\widehat{\rho}(k)$ is non-negative and satisfies the decay estimate
$$\sum_{k \in \mathbb{Z}^3} |k|^4\, \widehat{\rho}(k) < +\infty.$$

Remark 7. The condition of decay of $\widehat{\rho}(k)$ as $|k| \to +\infty$ required in Proposition 1 is easily satisfied if $\rho(x)$ is a smooth enough function. The condition on the non-negativity of $\widehat{\rho}(k)$ is not automatically satisfied for each smooth function $\rho(x)$. A wide class of smooth functions with compact support and non-negative (hence real) Fourier coefficients can be identified as follows. Consider a smooth (say $C^\infty$ on the box) “bump” function $b(x)$ over the ball $B(0,\delta)$ and, in addition, suppose that $b(x)$ is “even,” implying that the Fourier coefficients have vanishing imaginary part. Then, define $\rho(x) := (b * b)(x)$ and this will turn out to be smooth, non-negative, and null outside the ball $B(0, 2\delta)$ of radius twice that of the original one. Finally, by using the convolution theorem it will follow that
$$\widehat{\rho}(k) = \widehat{b * b}(k) = \widehat{b}(k) \cdot \widehat{b}(k) \ge 0.$$

Proof of Proposition 1. Again the proof is based on an energy estimate and an a-priori bound of the velocity in terms of the energy. The main point of the proof is the following


Cauchy-Schwarz inequality:
$$\begin{aligned}
|u(x)| &\le \frac{1}{(2\pi)^3} \sum_{k \in \mathbb{Z}^3} \widehat{\rho}(k) \left| \sum_{\alpha=1}^N \Pi_k m_\alpha\, e^{-i\langle k, x - x_\alpha\rangle} \right| \\
&\le \frac{1}{(2\pi)^3} \left[ \sum_{k \in \mathbb{Z}^3} \widehat{\rho}(k) \right]^{1/2} \left[ \sum_{k \in \mathbb{Z}^3} \widehat{\rho}(k) \left| \sum_{\alpha=1}^N \Pi_k m_\alpha\, e^{-i\langle k, x - x_\alpha\rangle} \right|^2 \right]^{1/2} \\
&= \frac{1}{2\pi^{3/2}} \left[ \sum_{k \in \mathbb{Z}^3} \widehat{\rho}(k) \right]^{1/2} \mathcal{H}^{1/2}(\{x_\alpha, m_\alpha\}),
\end{aligned}$$
and in the same way it easily follows that
$$|u| + |\nabla u| + |\nabla^2 u| \le \frac{1}{2\pi^{3/2}} \left[ \sum_{k \in \mathbb{Z}^3} |k|^4\, \widehat{\rho}(k) \right]^{1/2} \mathcal{H}^{1/2}(\{x_\alpha, m_\alpha\}),$$
showing that we can control the regularity of $u$ in terms of the energy $\mathcal{H}(\{x_\alpha, m_\alpha\})$. By a direct differentiation it is also clear that $\mathcal{H}(\{x_\alpha, m_\alpha\})$ is constant along (smooth enough) solutions of Eqs. (4.2) so that we have good control over $u$ for any positive time. This implies the global existence of solutions.

5. A Class of Discrete Models of Evolution of Vorticity

In this section we study a class of discrete models for the evolution of vorticity which includes discrete versions of the line vortex model. These models could be of interest in the numerical simulation using lattice methods as suggested in [9]. In the sequel we shall always assume on the function $\varphi$ the four natural hypotheses (A.1)–(A.4) we introduced in Sect. 2.2. We consider distributions of vorticity giving rise to a velocity field of the form
$$u(x) = \sum_{\alpha=1}^N \nabla\varphi(x - x_\alpha) \wedge \xi_\alpha,$$
that is parameterized by the set of $6N$ variables $\{x_\alpha, \xi_\alpha\}_{\alpha=1,\dots,N}$. The physical interpretation of this equation is that of describing the superposition of $N$ vortex “blobs” (i.e. concentrations of vorticity) indexed by $\alpha$, situated at points $x_\alpha$, and whose vorticity is directed along the vector $\xi_\alpha$. These blobs determine the initial datum and there is no assumption that the solution remains of the same type for all positive times. This is the difference between blobs and the loops considered in the previous section. A priori we can think of each of the variables $\{x_\alpha, \xi_\alpha\}_\alpha$ as an independent degree of freedom. However, if we set $\xi_\alpha = x_{\alpha+1} - x_\alpha$ (with the understanding that $x_{N+1} = x_1$), then this model corresponds to a natural discretization of the line vortex model previously described. This approach may also be linked with numerical methods that


try to simulate the discrete line vortex equations (e.g. the LIA model) by solving the equation for the tangent vector, since $\xi_\alpha$ represents the tangent vector for a piecewise linear closed line (see Buttke [8] and Zhou [40]). For the moment we do not impose any constraint on the variables $\xi$ (either static or dynamic) and we just impose that the points $x_\alpha$ are transported by the flow $u$:
$$\dot{x}_\alpha = u(x_\alpha), \qquad \alpha = 1, \dots, N. \tag{5.1}$$
This is the discrete counterpart of (1.3). If we want to address the problem of the global (in time) dynamics of these models we can introduce the following useful quantities:
$$\mathcal{L} = \sum_{\alpha=1}^N |\xi_\alpha| \qquad \text{and} \qquad \mathcal{A} = \sum_{\alpha=1}^N |\xi_\alpha|^2,$$
that are respectively the length and the quadratic variation of the curve itself. Moreover, in analogy with the Hamiltonian introduced in Sect. 3, we define an energy function (which controls the magnitude and regularity of the velocity field $u$) as follows:
$$\mathcal{H}(\{x_\alpha, \xi_\alpha\}) = \frac{1}{2} \sum_{\alpha,\beta=1}^N \varphi(x_\alpha - x_\beta)\, \langle \xi_\alpha, \xi_\beta \rangle.$$
It is not difficult to show that we have an analogue of Lemma 3, which allows us to control $u$ in terms of $\mathcal{H}(\{x_\alpha, \xi_\alpha\})$ and, of course, we have the following trivial bounds:
$$\|\nabla^n u\|_{L^\infty} \le c_{n,\varphi}\, \mathcal{L}, \qquad \text{and} \qquad \mathcal{H} \le \|\varphi\|_{L^\infty}\, \mathcal{L}^2,$$
where the constants $c_{n,\varphi}$ depend only on $\varphi$ and the number $n$ of derivatives. (We assume that $\varphi$ is sufficiently regular for all the constants in this section to be well defined. For example it is enough to take $\varphi$ infinitely differentiable.)

Let us now compute the time derivative of the energy. Taking into account Eq. (5.1), but leaving the possibility of an arbitrary time-derivative $\partial_t \xi_\alpha$ for $\xi_\alpha$, we obtain
$$\frac{d\mathcal{H}(\{x_\alpha, \xi_\alpha\})}{dt} = \sum_{\alpha,\beta=1}^N \varphi(x_\alpha - x_\beta)\, \langle \xi_\alpha, \partial_t \xi_\beta \rangle - \sum_{\alpha,\beta=1}^N u(x_\beta) \cdot \nabla\varphi(x_\alpha - x_\beta)\, \langle \xi_\alpha, \xi_\beta \rangle.$$
We now analyze the second sum appearing in the above equality, in order to recover some symmetries or cancellation properties. We get
$$\begin{aligned}
I &= \sum_{\alpha,\beta=1}^N u(x_\beta) \cdot \nabla\varphi(x_\alpha - x_\beta)\, \langle \xi_\alpha, \xi_\beta \rangle - \sum_{\alpha,\beta=1}^N \xi_\alpha \cdot u(x_\beta)\;\, \xi_\beta \cdot \nabla\varphi(x_\alpha - x_\beta) \\
&= \sum_{\alpha,\beta=1}^N \langle \xi_\alpha \wedge \nabla\varphi(x_\alpha - x_\beta),\, \xi_\beta \wedge u(x_\beta) \rangle \\
&= -\sum_{\beta=1}^N \langle u(x_\beta),\, \xi_\beta \wedge u(x_\beta) \rangle = 0,
\end{aligned}$$


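implying that we can write the following equivalent expression for the time derivative of $\mathcal{H}(\{x_\alpha, \xi_\alpha\})$:
$$\frac{d\mathcal{H}(\{x_\alpha, \xi_\alpha\})}{dt} = \sum_{\alpha,\beta=1}^N \varphi(x_\alpha - x_\beta)\, \langle \xi_\alpha, \partial_t \xi_\beta \rangle - \sum_{\alpha,\beta=1}^N \langle \xi_\alpha, u(x_\beta) \rangle\, \langle \xi_\beta, \nabla\varphi(x_\alpha - x_\beta) \rangle.$$
We consider now the second term and we can write
$$\begin{aligned}
A &= \sum_{\alpha,\beta=1}^N \langle \xi_\alpha, u(x_\beta) \rangle\, \langle \xi_\beta, \nabla\varphi(x_\alpha - x_\beta) \rangle \\
&= -\sum_{\beta=1}^N \langle \xi_\beta, \nabla\Psi(x_\beta) \rangle + \sum_{\alpha,\beta=1}^N \sum_{i,j=1}^3 \varphi(x_\alpha - x_\beta)\, \xi^i_\alpha\, \nabla^j u^i(x_\beta)\, \xi^j_\beta,
\end{aligned}$$
where
$$\Psi(x) = \sum_{\alpha=1}^N \varphi(x - x_\alpha)\, \langle \xi_\alpha, u(x) \rangle.$$
Finally, the rate of change of the energy takes the form
$$\frac{d\mathcal{H}(\{x_\alpha, \xi_\alpha\})}{dt} = \sum_{\alpha,\beta=1}^N \sum_{i=1}^3 \varphi(x_\alpha - x_\beta)\, \xi^i_\alpha \Big[ \partial_t \xi^i_\beta - \sum_{j=1}^3 \nabla^j u^i(x_\beta)\, \xi^j_\beta \Big] + \sum_{\beta=1}^N \xi_\beta \cdot \nabla\Psi(x_\beta).$$
We can rewrite the above expression by setting $\Phi(x) = \sum_{\alpha=1}^N \varphi(x - x_\alpha)\, \xi_\alpha$, hence obtaining
$$\frac{d\mathcal{H}(\{x_\alpha, \xi_\alpha\})}{dt} = \sum_{\beta=1}^N \sum_{i=1}^3 \Phi^i(x_\beta) \Big[ \partial_t \xi^i_\beta - \sum_{j=1}^3 \nabla^j u^i(x_\beta)\, \xi^j_\beta \Big] + \sum_{\beta=1}^N \xi_\beta \cdot \nabla\Psi(x_\beta), \tag{5.2}$$
with $\Psi = \Phi \cdot \operatorname{curl} \Phi$ since -by definition- $u(x) := \operatorname{curl} \Phi(x)$. By using a first order Taylor expansion with integral remainder, the introduction of the auxiliary function $\Psi$ allows us to rewrite the second term of (5.2) as follows:
$$\sum_{\beta=1}^N \xi_\beta \cdot \nabla\Psi(x_\beta) = \sum_{\beta=1}^N \big[ \Psi(x_\beta + \xi_\beta) - \Psi(x_\beta) \big] - \sum_{\beta=1}^N \langle \xi_\beta, \Theta_\beta\, \xi_\beta \rangle, \tag{5.3}$$
where $\Theta_\beta$ is the matrix
$$\Theta^{ij}_\beta = \int_0^1 (1 - \sigma)\, \nabla^j \nabla^i \Psi(x_\beta + \sigma \xi_\beta)\,d\sigma.$$
We observe that analogously to Lemma 3 we can control the function $\Phi$ in terms of the energy $\mathcal{H}(\{x_\alpha, \xi_\alpha\})$ according to the following lemma, whose straightforward proof is left to the reader.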


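Lemma 4. For any $0 \le n \in \mathbb{N}$, we have the bound
$$\|\nabla^n \Phi\|_{L^\infty} \le \left[ \int_{\mathbb{R}^3} |k|^{2n}\, \widehat{\varphi}(k)\,dk \right]^{1/2} \mathcal{H}^{1/2}(\{x_\alpha, \xi_\alpha\}).$$
In the same way we can bound $\Psi$ since, e.g., it holds
$$\|\Psi\|_{L^\infty} = \|\Phi \cdot \operatorname{curl} \Phi\|_{L^\infty} \le c_\varphi\, \mathcal{H},$$
where $c_\varphi$ is some positive constant depending just on the Sobolev regularity of $\varphi$. At this point (see also [9]) we may recognize two interesting cases:

A) The discrete filament model, obtained by setting $\xi_\alpha = x_{\alpha+1} - x_\alpha$. In this case the first term in the r.h.s. of Eq. (5.3) vanishes due to the fact that
$$\sum_{\beta=1}^N \big[ \Psi(x_\beta + \xi_\beta) - \Psi(x_\beta) \big] = \sum_{\beta=1}^N \big[ \Psi(x_{\beta+1}) - \Psi(x_\beta) \big] = 0.$$

B) A blob model, obtained by considering each couple $(x_\alpha, \xi_\alpha)$ as describing (an approximation of) a vortex blob and for which the dynamics of $\xi_\alpha$ is fixed by prescribing that this vector is transported by the flow at the point $x_\alpha$, i.e.
$$\dot{\xi}_\alpha = [\nabla u(x_\alpha)]^T \cdot \xi_\alpha, \qquad \alpha = 1, \dots, N.$$
In this case the first term in the r.h.s. of Eq. (5.2) vanishes identically.

For these two models we prove the following result:

Lemma 5. In the discrete filament model we have
$$|\partial_t \mathcal{H}| \le c_1\, \mathcal{H}\mathcal{A} + c_2\, \mathcal{H}^{1/2} \mathcal{A}.$$
In the blob model we have
$$|\partial_t \mathcal{H}| \le c_3\, N\, \mathcal{H}^{1/2} + c_4\, \mathcal{H}^{1/2} \mathcal{A}, \qquad |\partial_t \mathcal{H}| \le c_5\, \mathcal{L}\, \mathcal{H}^{1/2}. \tag{5.4}$$
Moreover, in both models we have
$$|\partial_t \mathcal{A}| \le c_6\, \mathcal{H}^{1/2} \mathcal{A}, \qquad |\partial_t \mathcal{L}| \le c_7\, \mathcal{H}^{1/2} \mathcal{L}.$$
The positive constants $c_i$, for $i = 1, \dots, 7$, depend just on the function $\varphi$.

Proof. Given Lemma 4, the only nontrivial part is to justify the first term appearing in the bound for the discrete filament model, i.e., $c_1 \mathcal{H}\mathcal{A}$. This term comes from the expression
$$G = \sum_{\beta=1}^N \sum_{i=1}^3 \Phi^i(x_\beta) \Big[ \partial_t \xi^i_\beta - \sum_{j=1}^3 \nabla^j u^i(x_\beta)\, \xi^j_\beta \Big]$$
appearing as the first term in the r.h.s. of Eq. (5.2). In the discrete filament model we have (note that $\xi$ depends just on $t$, hence the partial derivative is in fact a total derivative)
$$\dot{\xi}_\alpha = \dot{x}_{\alpha+1} - \dot{x}_\alpha = u(x_{\alpha+1}) - u(x_\alpha)$$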


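and, by a first order Taylor expansion, we get
$$\dot{\xi}_\alpha = [\nabla u(x_\alpha)]^T \cdot \xi_\alpha + \Upsilon_\alpha(\xi_\alpha \otimes \xi_\alpha),$$
where the tensor $\Upsilon_\alpha$ is given by the formula
$$\Upsilon^{kij}_\alpha = \int_0^1 (1 - \sigma)\, \nabla^i \nabla^j u^k(x_\alpha + \sigma \xi_\alpha)\,d\sigma.$$
Then, we can bound $G$ as follows:
$$\begin{aligned}
|G| &\le \|\Phi\|_{L^\infty} \sum_{\beta=1}^N \sum_{i=1}^3 \Big| \partial_t \xi^i_\beta - \sum_{j=1}^3 \nabla^j u^i(x_\beta)\, \xi^j_\beta \Big| = \|\Phi\|_{L^\infty} \sum_{\beta=1}^N |\Upsilon_\beta(\xi_\beta \otimes \xi_\beta)| \\
&\le \|\Phi\|_{L^\infty} \sum_{\beta=1}^N |\xi_\beta|^2 \sup_{ijk} |\Upsilon^{ijk}_\beta| \le \|\Phi\|_{L^\infty}\, \|\nabla^2 u\|_{L^\infty} \sum_{\beta=1}^N |\xi_\beta|^2 \\
&\le c_1\, \mathcal{H}\mathcal{A}.
\end{aligned}$$
The other bounds are proved similarly.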
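The cancellation that distinguishes the discrete filament model (case A) from the blob model (case B) is a telescoping sum over a closed polygon, and it holds for any scalar function. The sketch below illustrates this with a hypothetical stand-in `psi` for the auxiliary scalar function (it is not the function defined through the kernel in the text) and with hypothetical helper names; the blob-type vectors are chosen arbitrarily, unrelated to the positions, to show that no cancellation is guaranteed in that case.

```python
import math


def psi(x):
    """Arbitrary smooth scalar test function (a hypothetical stand-in)."""
    return math.sin(x[0]) * math.cos(2.0 * x[1]) + x[2] ** 2


def increment_sum(xs, xis):
    """Sum over beta of psi(x_beta + xi_beta) - psi(x_beta)."""
    return sum(psi([p + q for p, q in zip(x, xi)]) - psi(x)
               for x, xi in zip(xs, xis))


n = 12
xs = [[math.cos(2 * math.pi * j / n),
       math.sin(2 * math.pi * j / n),
       0.1 * j * (n - j)] for j in range(n)]

# Case A: xi_beta = x_{beta+1} - x_beta on a closed polygon (indices mod n).
filament_xis = [[q - p for p, q in zip(xs[j], xs[(j + 1) % n])] for j in range(n)]

# Case B (illustrative): vectors unrelated to the positions.
blob_xis = [[0.1, 0.0, 0.0] for _ in range(n)]

filament_sum = increment_sum(xs, filament_xis)
blob_sum = increment_sum(xs, blob_xis)
print(filament_sum, blob_sum)
```

In case A the sum vanishes up to rounding, while in case B it is generically nonzero, which is why only the trivial bound proportional to N is available there.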

Remark 8. We note that the term $c_3 N \mathcal{H}^{1/2}$ in the rate of energy change of the vortex blobs model derives from the trivial bound
$$\left| \sum_{\beta=1}^N \big[ \Psi(x_\beta + \xi_\beta) - \Psi(x_\beta) \big] \right| \le 2N\, \|\Psi\|_{L^\infty}$$
which, however, misses any possible cancellation property of the various terms. Heuristically we expect that, if the evolution is constrained in a bounded volume (e.g. by imposing periodic boundary conditions on a torus), the distribution of the set of points $\{x_\alpha\}_\alpha$ should not differ significantly from that of the points $\{x_\alpha + \xi_\alpha\}_\alpha$ and thus that the two sums $\sum_\alpha \Psi(x_\alpha)$ and $\sum_\alpha \Psi(x_\alpha + \xi_\alpha)$ should not be too different (at least not of the order of $N$).

6. Conclusions

We considered several classes of problems related to the evolution of special patterns of singular vorticity. Various kinds of regularization have been analyzed and they are based either 1) on a smoother version of the Biot-Savart law, linking velocity and vorticity or 2) on a “point vortex” special expression of the solution. Under reasonable assumptions we have been able to prove global-in-time existence of smooth solutions for these various models, employing the Hamiltonian structure of the underlying 3D Euler equations. In the case of small vortex loops we solved a periodic problem involving a finite number of them. In the final section we discussed two discrete models of vortex evolution in the light of the above results.


References

1. Adams, R.A.: Sobolev spaces. Pure and Applied Mathematics, Vol. 65, New York-London: Academic Press, 1975
2. Beale, J.T., Kato, T., Majda, A.: Remarks on the breakdown of smooth solutions for the 3-D Euler equations. Commun. Math. Phys. 94(1), 61–66 (1984)
3. Bell, J., Markus, D.: Vorticity intensification and the transition to turbulence in the three-dimensional Euler equation. Commun. Math. Phys. 147(2), 371–394 (1992)
4. Berselli, L.C., Bessaih, H.: Some results for the line vortex equation. Nonlinearity 15(6), 1729–1746 (2002)
5. Bessaih, H., Flandoli, F.: A mean field result with application to 3D vortex filaments. In: Probabilistic methods in fluids, River Edge, NJ: World Sci. Publishing, 2003, pp. 22–34
6. Bessaih, H., Gubinelli, M., Russo, F.: The evolution of a random vortex filament. Ann. Prob. 33(5), 1825–1855 (2005)
7. Buttke, T.F.: Velicity methods: Lagrangian numerical methods which preserve the Hamiltonian structure of incompressible fluid flow. In: Vortex flows and related numerical methods (Grenoble, 1992), NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 395, Dordrecht: Kluwer Acad. Publ., 1993, pp. 39–57
8. Buttke, T.F.: Numerical study of superfluid turbulence in the self-induction approximation. J. Comput. Phys. 76, 301 (1988)
9. Chorin, A.J.: Vorticity and turbulence. New York: Springer-Verlag, 1994
10. Constantin, P.: Geometric statistics in turbulence. SIAM Rev. 36(1), 73–98 (1994)
11. Constantin, P.: Near identity transformations for the Navier-Stokes equations. In: Handbook of mathematical fluid dynamics, Vol. II, Amsterdam: North-Holland, 2003, pp. 117–141
12. Constantin, P., Majda, A.J., Tabak, E.: Formation of strong fronts in the 2-D quasi-geostrophic thermal active scalar. Nonlinearity 7(6), 1495–1533 (1994)
13. Córdoba, A., Córdoba, D.: A maximum principle applied to quasi-geostrophic equations. Commun. Math. Phys. 249(3), 511–528 (2004)
14. Córdoba, A., Córdoba, D., Fefferman, C.L., Fontelos, M.A.: A geometrical constraint for capillary jet breakup. Adv. Math. 187(1), 228–239 (2004)
15. Cottet, G.-H., Koumoutsakos, P.D.: Vortex methods, theory and practice. Cambridge: Cambridge Univ. Press, 2002
16. Flandoli, F.: A probabilistic description of small scale structures in 3D fluids. Ann. Inst. H. Poincaré Probab. Statist. 38(2), 207–228 (2002)
17. Flandoli, F., Gubinelli, M.: Gibbs ensembles of vortex filaments. Probab. Theory Related Fields 22(3), 317–340 (2002)
18. Frisch, U.: Turbulence. The legacy of A.N. Kolmogorov. Cambridge: Cambridge Univ. Press, 1995
19. Gallavotti, G.: Foundations of fluid dynamics. Translated from the Italian. Texts and Monographs in Physics. Berlin: Springer-Verlag, 2002
20. Hasimoto, H.: A soliton on a vortex filament. J. Fluid. Mech. 51, 477–485 (1972)
21. Helmholtz, H.: Über Integrale der hydrodynamischen Gleichungen welche den Wirbelbewegungen entsprechen. Crelle J. 55, 25 (1858)
22. Holm, D.D.: Rasetti-Regge Dirac bracket formulation of Lagrangian fluid dynamics of vortex filaments. Nonlinear waves: computation and theory, II (Athens, GA, 2001). Math. Comput. Simulation 62(1-2), 53–63 (2003)
23. Lord Kelvin (Sir William Thomson): On vortex motion. Trans. Royal Soc. Edin. 25, 217–260 (1869)
24. Klein, R., Majda, A.J.: Self-stretching of a perturbed vortex filament. I. The asymptotic equation for deviations from a straight line. Phys. D 49(3), 323–352 (1991)
25. Klein, R., Majda, A.J.: Self-stretching of a perturbed vortex filament. II. Structure of solutions. Phys. D 53(2–4), 267–294 (1991)
26. Lions, P.L.: On Euler equations and statistical physics. Scuola Normale Superiore, Pisa, 1997
27. Lions, P.L., Majda, A.J.: Equilibrium statistical theory for nearly parallel vortex filaments. Comm. Pure Appl. Math. 53(1), 76–142 (2000)
28. Lyons, T.J.: Differential equations driven by rough signals. Rev. Mat. Iberoamericana 14(2), 215–310 (1998)
29. Lyons, T.J., Qian, Z.: System control and rough paths. Oxford Mathematical Monographs. Oxford: Oxford Univ. Press, 2002
30. Marsden, J.E., Weinstein, A.: Coadjoint orbits, vortices, and Clebsch variables for incompressible fluids. Order in chaos (Los Alamos, N.M., 1982). Phys. D 7(1-3), 305–323 (1983)
31. Moore, D.W.: Finite amplitude waves on aircraft trailing vortices. Aero. Quarterly 23, 307–314 (1972)
32. Roberts, P.: A Hamiltonian theory for weakly interacting vortices. Mathematika 19, 169–179 (1972)
33. Rosenhead, L.: The spread of vorticity in the wake behind a cylinder. Proc. Royal Soc. 127, 590–612 (1930)


34. Saffman, P.G.: Vortex dynamics. Cambridge: Cambridge Univ. Press, 1992
35. Oseledets, V.I.: On a new way of writing the Navier-Stokes equation: the Hamiltonian formalism. Russ. Math. Surv. 44, 210–211 (1988)
36. Vincent, A., Meneguzzi, M.: The spatial structure and the statistical properties of homogeneous turbulence. J. Fluid. Mech. 225(1), 1–25 (1991)
37. Wolibner, W.: Un théorème sur l’existence du mouvement plan d’un fluide parfait homogène incompressible, pendant un temps infiniment longue. Math. Z. 37, 698–726 (1933)
38. Young, L.C.: An inequality of Hölder type, connected with Stieltjes integration. Acta Math. 67, 251–282 (1936)
39. Yudovich, V.I.: Non-stationary flow of an ideal incompressible liquid. Comput. Math. & Math. Phys. 3, 1407–1456 (1963) (Russian)
40. Zhou, H.: On the motion of a slender vortex filament. Phys. Fluids 9, 970–981 (1997)

Communicated by P. Constantin

Commun. Math. Phys. 269, 715–763 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0143-9


Tunneling in Two Dimensions

G. Bellettini (1,2), A. De Masi (3), N. Dirr (4), E. Presutti (1)

1 Dipartimento di Matematica, Università di Roma “Tor Vergata”, via della Ricerca Scientifica, 00133 Roma, Italy. E-mail: [email protected]; [email protected]
2 INFN Laboratori Nazionali di Frascati, Frascati, Italy
3 Dipartimento di Matematica, Università di L’Aquila, via Vetoio, loc. Coppito, 67100 L’Aquila, Italy. E-mail: [email protected]
4 Max Planck Institute for Mathematics in the Sciences, Inselstr. 22, D-04103 Leipzig, Germany. E-mail: [email protected]

Received: 2 November 2005 / Accepted: 20 July 2006
Published online: 8 November 2006 – © Springer-Verlag 2006

Abstract: Tunneling is studied here as a variational problem formulated in terms of a functional which approximates the rate function for large deviations in Ising systems with Glauber dynamics and Kac potentials, [9]. The spatial domain is a two-dimensional square of side $L$ with reflecting boundary conditions. For $L$ large enough the penalty for tunneling from the minus to the plus equilibrium states is determined. Minimizing sequences are fully characterized and shown to have approximately a planar symmetry at all times, thus departing from the Wulff shape in the initial and final stages of the tunneling. In a final section (Sect. 11), we extend the results to $d = 3$ but their validity in $d > 3$ is still open.

1. Introduction

Tunneling in the $d = 2$ ferromagnetic Ising model at low temperatures has been the object of many studies, mainly focused on metastability, namely the analysis of the Glauber dynamics when an external magnetic field $h > 0$ is present and the initial state is close to the minus Gibbs state at $h = 0$. We are instead interested here in studying a bistable equilibrium with “oscillations” between the two minimizers. Such a case has been considered by Martinelli, [26], in the nearest-neighbor ferromagnetic Ising model in a $d = 2$ square of side $L$, proving upper and lower bounds for the [random] transition time from the “plus” to the “minus” state (and vice versa) in the limit as $L \to \infty$. Much earlier Comets had attacked the problem in the context of Ising systems with Kac interactions. Supposing the side $L$ of the square to be proportional to the range $\gamma^{-1}$ of the Kac interaction, Comets [9] derived the large deviations rate function in the asymptotics of small $\gamma$. A “sharp” analysis of the path followed during the tunneling is however still an open problem in both models.

This research has been partially supported by MURST and NATO Grant PST.CLG.976552 and COFIN, Prin n.2004028108.


Tunneling is usually studied in two steps: the first one is based on a loss of memory property, namely that configurations close to one of the two stable states can be successfully coupled with large probability before leaving the neighborhood. Such estimates seem within the reach of the present techniques, as in [15] very strong properties of Glauber dynamics have been established. The second step for tunneling requires solving a variational problem involving the large deviations rate function. In this paper we concentrate on the latter aspect and study tunneling in a purely variational setting. For simplicity we replace the Comets rate function by an “easier functional”, already considered in [3] in the $d = 1$ version of the model. The extension to the true Comets functional and then to the Ising system may still require non-trivial work, but we believe that the main physical features of the actual tunneling excursion are already captured by our results.

The extension from $d = 1$ to $d > 1$ is in general far from trivial. Large deviations and tunneling have been studied by Jona-Lasinio and Mitter, [22], for stochastic perturbations of the Allen-Cahn equation, partially extending the $d = 1$ work by Faris and Jona-Lasinio, [18] (see also [10]), but, as far as we know, a full analysis in $d = 2$ is open also for the Ginzburg-Landau action functional associated with the Allen-Cahn equation. Even more subtle is the analysis of tunneling under time constraints, namely when the excursion between the two stable states is required to occur within a given time interval. The picture in such a case may be dramatically different if time is short, and the optimal pattern may involve multiple nucleations. Results of this type are proved in $d = 1$ for the Ginzburg-Landau functional and Allen-Cahn equation, [23, 24], and for the non local interaction considered here, [11]; most of the proofs are still missing in the multi-dimensional case, but a clear picture of the phenomenon can at least be outlined, [24].

Geometric patterns are the main issues in a multi-dimensional analysis. In the sharp interface limit (i.e. when the spatial domain, a square of side $L$ in our case, is observed in rescaled variables so that it always appears as a unit square as $L \to \infty$) the tunneling orbits are moving surfaces which describe the boundaries of the set where the plus phase is located. In $d = 1$ this is simply a point which moves from an endpoint of the unit interval to the other one (Neumann boundary conditions are responsible for the nucleation to start from the boundaries of the domain). To see geometrical effects we thus need to go to $d > 1$. An important role is then played by the Wulff shape. As is well known (and briefly discussed in Sect. 2) in $d = 2$ dimensions the set with minimal perimeter for a given area $\theta$ is a quarter of a circle around a vertex of the unit square $Q_1$, or a rectangle with three sides lying on $\partial Q_1$ (again this is due to Neumann boundary conditions). Rectangles appear if their area and the area of the complement (in $Q_1$) are both larger than a critical value $\theta_{\rm crit}$, otherwise we observe a quarter of a circle.

As Wulff shapes describe states with minimal free energies under the area constraint, one usually expects that if the process is “slow” and the transformation “adiabatic” then the tunneling patterns are determined by sequences of “equilibrium” Wulff shapes. It is however evident from the above description that tunneling orbits cannot always be close to Wulff shapes as there is a discontinuity at $\theta_{\rm crit}$. One possible scenario is depicted in (a) of Fig. 1 where the Wulff shape is deformed to interpolate around $\theta_{\rm crit}$ between the two different regimes.
We will prove instead that the optimal tunneling in our diffused interface model is all the way planar as in pattern (b) of Fig. 1, namely that it is convenient to nucleate initially in a less efficient way, the cost being recovered in the end. Our results hold whenever there exists a stable invariant manifold which connects a saddle point of minimal energy to the


Fig. 1. In a) and b) we depict two possible tunneling paths in the sharp interface regime. In fig. a) a small droplet (Wulff shape) of the + phase (dark region) nucleates at a vertex of the square $Q_L$. It then invades $Q_L$ as time increases, gradually changing its interface, and eventually becomes a rectangle. Our results, valid for the diffused interface model, show that a) is not minimizing, and that the minimizing path is the one corresponding to fig. b). In this path we have initially a nucleation of a flat interface (dark rectangular region), which smoothly invades $Q_L$

stable equilibria and which does not consist entirely of Wulff shapes. More discussions on this point can be found in Sect. 2.

The content of the paper is outlined in Sect. 2, Subsect. 2.9, after defining the model and stating the main results; the extension to $d = 3$ is discussed in Sect. 11. For reasons of brevity imposed by the journal we have written in a separate paper, [2], the analysis of the invariant manifolds for a non-local version of the Allen-Cahn equation, which is used here to characterize the optimal tunneling orbits.

2. Definitions and Results

We consider a continuum model of a two-dimensional magnet where states are functions $m \in L^\infty(Q_L, [-1,1])$, $Q_L = \{r \in \mathbb{R}^2 : |r \cdot e_1| \le L/2,\ |r \cdot e_2| \le L/2\}$, with $r \cdot e_1$ and $r \cdot e_2$ the $x$ and $y$ components of $r$. $m(r)$ is interpreted as a magnetization density which may be related, by a coarse graining procedure, to an underlying Ising spin configuration, hence the restriction to $[-1,1]$. Time evolution is described by orbits which are smooth functions $u = u(r,t)$, $r \in Q_L$, $t$ in $\mathbb{R}$ or in an interval of $\mathbb{R}$, $|u| \le 1$.

2.1. The penalty functional. The “action” of an orbit $u(\cdot)$ restricted to an interval $[t_0, t_1]$ of its domain of definition is $A_{L;t_0,t_1}(u) = F_L(u(\cdot,t_0)) + I_{L;t_0,t_1}(u)$, where $F_L(m)$, the free energy of the state $m$, is

  $F_L(m) = \int_{Q_L} \phi_\beta(m)\,dr + \frac{1}{4} \int_{Q_L \times Q_L} J^{\rm neum}(r,r')\,[m(r) - m(r')]^2\,dr\,dr'$.   (2.1)

$J^{\rm neum}(r,r')$ is the interaction coupling constant (with Neumann boundary conditions), namely $J^{\rm neum}(r,r') = \sum_{r'' \sim r'} J(r,r'')$, where $r'' \sim r'$ means that $r''$ is equal to $r'$ modulo reflections along the lines $\{y = \pm(2n+1)L/2\}$ and $\{x = \pm(2n+1)L/2\}$, $n \in \mathbb{Z}$. We suppose $J(r,r') = J(0, r'-r)$ and make the following “technical assumptions” on $J(0,r)$: $J(0,r)$ only depends on $|r|$; it is a smooth non-negative function supported in the unit ball; $\int J(0,r)\,dr = 1$;

  $j(0,x) = \int J((0,0),(x,y))\,dy$   (2.2)

is a non-increasing function of $x$ when $x > 0$. In (2.1) we take $\beta > 1$,

  $\phi_\beta(m) = \tilde\phi_\beta(m) - \min_{|s| \le 1} \tilde\phi_\beta(s)$,   $\tilde\phi_\beta(m) = -\frac{m^2}{2} - \frac{1}{\beta} S(m)$,

  $S(m) = -\frac{1-m}{2} \log \frac{1-m}{2} - \frac{1+m}{2} \log \frac{1+m}{2}$.
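The local free energy density just defined can be evaluated numerically. The following is a minimal sketch, not taken from the paper: the function names (`entropy`, `phi_tilde`, `m_beta`, `phi`) and the bisection bracket are illustrative assumptions. It uses the fact that the minimizers of $\tilde\phi_\beta$ on $[-1,1]$ for $\beta > 1$ are the two non-zero solutions of $m = \tanh(\beta m)$, the value called $m_\beta$ in Subsect. 2.3.

```python
import math


def entropy(m):
    """S(m) = -(1-m)/2 log((1-m)/2) - (1+m)/2 log((1+m)/2), with 0 log 0 = 0."""
    total = 0.0
    for p in ((1.0 - m) / 2.0, (1.0 + m) / 2.0):
        if p > 0.0:
            total -= p * math.log(p)
    return total


def phi_tilde(m, beta):
    """Unnormalized density: -m^2/2 - S(m)/beta."""
    return -0.5 * m * m - entropy(m) / beta


def m_beta(beta, tol=1e-13):
    """Positive solution of m = tanh(beta*m), found by bisection (needs beta > 1)."""
    if beta <= 1.0:
        raise ValueError("a positive solution exists only for beta > 1")
    lo, hi = 1e-6, 1.0  # assumed bracket: g(lo) > 0 > g(hi), g(m) = tanh(beta*m) - m
    if math.tanh(beta * lo) - lo <= 0.0:
        raise ValueError("beta too close to 1 for this bracketing interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.tanh(beta * mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def phi(m, beta):
    """Normalized density phi_beta(m) = phi_tilde(m) - min phi_tilde, so min phi = 0."""
    return phi_tilde(m, beta) - phi_tilde(m_beta(beta), beta)
```

For instance, `m_beta(2.0)` is close to 0.9575, and `phi(m, 2.0)` vanishes at `m = ±m_beta(2.0)` while being strictly positive at `m = 0`, which is the double-well structure behind the two equilibrium states.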

Finally,

  $I_{L;t_0,t_1}(u) = \frac{1}{4} \int_{t_0}^{t_1} \int_{Q_L} [u_t - f_L(u)]^2\,dr\,dt$,

where $u_t$ is the time derivative of $u$ and

  $f_L(u) = -\frac{\delta F_L(u)}{\delta u} = J^{\rm neum} * u - a_\beta(u)$,   $a_\beta(u) = \frac{1}{\beta}\,{\rm arctanh}(u)$,   $J^{\rm neum} * u(r) = \int_{Q_L} J^{\rm neum}(r,r')\,u(r')\,dr'$.   (2.3)
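The expression for $f_L$ in (2.3) can be checked directly from (2.1). The short derivation below is a sketch under two assumptions that are not spelled out in the text at this point: that $J^{\rm neum}(r,r')$ is symmetric in its arguments, and that $\int_{Q_L} J^{\rm neum}(r,r')\,dr' = 1$ (which follows from $\int J(0,r)\,dr = 1$ by unfolding the reflections). It also uses $S'(m) = -{\rm arctanh}(m)$.

```latex
\begin{align*}
\frac{\delta F_L(m)}{\delta m}(r)
  &= \phi_\beta'(m(r)) + \int_{Q_L} J^{\rm neum}(r,r')\,[m(r) - m(r')]\,dr' \\
  &= \Big(-m(r) + \tfrac{1}{\beta}\,{\rm arctanh}(m(r))\Big)
     + m(r) - J^{\rm neum} * m(r) \\
  &= a_\beta(m(r)) - J^{\rm neum} * m(r),
\end{align*}
% hence f_L(m) = -\delta F_L(m)/\delta m = J^{\rm neum} * m - a_\beta(m), as in (2.3).
```

In particular the critical points of $F_L$ are the solutions of $m = \tanh\{\beta J^{\rm neum} * m\}$.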

As mentioned in the Introduction, $A_{L;t_0,t_1}(u)$ is a simplified version of the Comets large deviations rate function for Glauber dynamics in a ferromagnetic Ising system with Kac potential $J_\gamma(r,r') = \gamma^2 J(\gamma r, \gamma r')$. Later on, in the course of the proofs, we will consider rectangles $Q_{L',L} = \{(x,y) : |x| \le L',\ |y| \le L/2\}$ with $L' \in (0,+\infty]$ and call channel the set $Q_{\infty,L}$. The definition of $F_L$ in (2.1) naturally extends to domains $Q_{L',L}$ in which cases it will be denoted by $F_{Q_{L',L}}$, as a functional on $L^\infty(Q_{L',L}, [-1,1])$.

2.2. Dynamics: the semigroups $S_t$ and $T_t$. We denote by $S_t$ the semi-group generated by the $L^2$-gradient dynamics, namely $S_t(u_0) = u(\cdot,t)$ is the solution to the non-local evolution equation

  $u_t = f_L(u) = -\frac{\delta F_L(u)}{\delta u}$,   $u(\cdot,0) = u_0$.   (2.4)

The velocity field $f_L(u)$ is Lipschitz when restricted to sets of the form $\{\|u\|_\infty \le b\}$, $b < 1$. Then it is not difficult to prove global existence of $S_t(u_0)$ if $\|u_0\|_\infty < 1$, see [2] for details. The semigroup $T_t(u_0)$ generated by the equation

  $u_t = -u + \tanh\{\beta J^{\rm neum} * u\}$,   $u(\cdot,0) = u_0$,   (2.5)

has been much more studied in the literature, as (2.5) is the limit equation derived from Glauber dynamics with Kac potentials in a scaling limit, see [15]. Thus for ease of reference and in order to exploit results already existing in the literature we will often in the sequel consider $T_t(u_0)$, regarded either in $Q_L$ or in $Q_{\infty,L}$. Observe that it also decreases the energy $F_L$, and that its fixed points are the same as those of $S_t$, which are critical points of $F_L$.

2.3. The cost of tunneling. The action $A_{L;t_0,t_1}(u)$ is always non-negative, as the integrands in $F_L$ and $I_{L;t_0,t_1}$ are non-negative. Actually $A_{L;t_0,t_1}(u) > 0$ unless $u(r,t) \equiv \pm m_\beta$, where $m_\beta > 0$ is such that $m_\beta = \tanh\{\beta m_\beta\}$ (recall the assumption $\beta > 1$). Therefore $m^{(\pm)}(r) \equiv \pm m_\beta$ have the interpretation of the [only] two equilibrium states of the system and tunneling describes orbits which connect such states. Thus the space of tunneling orbits in a time $T > 0$ is

  $\mathcal{U}_{L,T} = \{u \in C^\infty(Q_L \times [0,T]) : u(r,0) = -m_\beta,\ u(r,T) = m_\beta \text{ for all } r \in Q_L\}$

and, calling $I_{L,T}(u) = I_{L;0,T}(u)$, we define the cost of tunneling as

  $P_L := \inf_{T > 0}\ \inf_{u \in \mathcal{U}_{L,T}} I_{L,T}(u)$,   (2.6)

noticing that since $F_L(m^{(-)}) = 0$, $A_{L,T}(u) = A_{L;0,T}(u) = I_{L,T}(u)$ when $u \in \mathcal{U}_{L,T}$. As mentioned in the Introduction the problem is completely different if restrictions on $T$ are imposed, but in this paper we will only study problem (2.6). To motivate our results let us first describe some properties of $A_{L;t_0,t_1}$.

2.4. Reversibility. First notice that $I_{L;t_0,t_1}(u) = 0$ if $u(\cdot,t) = S_{t-t_0}(u(\cdot,t_0))$, $S_t$ being defined in Subsect. 2.2. Given $u(\cdot,t)$, $t \in [t_0,t_1]$, call $u^{\rm rev}(\cdot,t_0+s) = u(\cdot,t_1-s)$, $s \in [0,t_1-t_0]$. Then

  $A_{L;t_0,t_1}(u) = A_{L;t_0,t_1}(u^{\rm rev})$.   (2.7)

To show (2.7), which is proved in [3], it suffices to expand the square in the integral defining $I_{L;t_0,t_1}$ and recall that $f_L(u) = -\delta F_L(u)/\delta u$. As a consequence of (2.7),

  $I_{L;t_0,t_1}(u) \ge F_L(u(\cdot,t_1)) - F_L(u(\cdot,t_0))$,   (2.8)

  $I_{L;t_0,t_1}(u) = F_L(u(\cdot,t_1)) - F_L(u(\cdot,t_0))$ if $u(\cdot,t_0) = S_{t_1-t_0}(u(\cdot,t_1))$.   (2.9)
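The expansion of the square mentioned above can be written out as follows. This is a sketch of the computation, with the abbreviation $I = I_{L;t_0,t_1}$ introduced here only for brevity; it uses the definition of $I$ with its prefactor $\frac14$, the change of variable $t \mapsto t_0 + t_1 - t$ for $u^{\rm rev}$, and $f_L = -\delta F_L/\delta u$.

```latex
\begin{align*}
I(u^{\rm rev}) &= \frac14 \int_{t_0}^{t_1}\!\!\int_{Q_L} [-u_t - f_L(u)]^2\,dr\,dt
               = \frac14 \int_{t_0}^{t_1}\!\!\int_{Q_L} [u_t + f_L(u)]^2\,dr\,dt, \\
I(u) - I(u^{\rm rev}) &= -\int_{t_0}^{t_1}\!\!\int_{Q_L} u_t\, f_L(u)\,dr\,dt
   = \int_{t_0}^{t_1} \frac{d}{dt} F_L(u(\cdot,t))\,dt
   = F_L(u(\cdot,t_1)) - F_L(u(\cdot,t_0)).
\end{align*}
% Adding F_L(u(\cdot,t_0)) to both sides gives A(u) = F_L(u(\cdot,t_1)) + I(u^{rev}) = A(u^{rev}), i.e. (2.7).
% Since I(u^{rev}) >= 0 this yields (2.8); equality holds iff I(u^{rev}) = 0,
% i.e. iff u^{rev} follows the flow S_t, which is the condition in (2.9).
```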

Remark 2.1. Note that $I_{L,T}(u) \ge F_L(u(\cdot,t))$ for any $t \in [0,T]$.

2.5. The Wulff shape. Given any tunneling orbit $u \in \mathcal{U}_{L,T}$ and $\alpha \in (-m_\beta, m_\beta)$, by continuity there must be a time $t \in (0,T)$ when $u(\cdot,t) \in \Sigma_\alpha$,

  $\Sigma_\alpha = \Big\{ m \in L^\infty(Q_L,[-1,1]) : \frac{1}{|Q_L|}\int_{Q_L} m = \alpha \Big\}$.

Thus from (2.8),

  $I_{L,T}(u) \ge \inf\{F_L(m) : m \in \Sigma_\alpha\}$ for any $\alpha \in (-m_\beta, m_\beta)$,   (2.10)

hence the intuition that optimality in tunneling requires closeness to the Wulff shape, namely the minimizer on the r.h.s. of (2.10). The Wulff problem is well understood in the limit $L \to \infty$. As the infimum on the r.h.s. of (2.10) grows proportionally to $L$ ($L^{d-1}$ in $d$ dimensions), it is natural to renormalize the free energy by dividing by $L$ and have (see [15–17])

  $\lim_{L\to\infty} \inf\Big\{ \frac{F_L(m)}{L} : m \in \Sigma_\alpha \Big\} = c_\beta \inf\Big\{ P(E, {\rm int}(Q_1)) : E \subseteq Q_1,\ |E| = \frac12 - \vartheta_\alpha \Big\}$,

where $P(E, {\rm int}(Q_1))$ denotes the perimeter of the BV set $E$ in the interior of $Q_1$ (namely the intersection with $\partial Q_1$ does not contribute); $|E|$ is the Lebesgue measure of $E$; $\vartheta_\alpha \in (-1/2, 1/2)$ is defined by

  $\Big(\frac12 - \vartheta_\alpha\Big) m_\beta - \Big(1 - \Big(\frac12 - \vartheta_\alpha\Big)\Big) m_\beta = \alpha$.   (2.11)

Equation (2.11) has a clear geometrical interpretation, the magnetization $\alpha$ being realized by putting $m_\beta$ in the rectangle $\{(x,y) \in Q_1 : x \ge \vartheta_\alpha\}$ and $-m_\beta$ in its complement. $c_\beta$, the surface tension, is equal to $c_\beta = F^{(1)}(\bar m)$. Namely $c_\beta$ is the one-dimensional free energy $F^{(1)}$ of the one-dimensional instanton $\bar m(x)$, $x \in \mathbb{R}$, where

  $F^{(1)}(m) = \int_{\mathbb{R}} \phi_\beta(m)\,dx + \frac14 \int_{\mathbb{R}}\int_{\mathbb{R}} j(x,x')\,[m(x) - m(x')]^2\,dx\,dx'$,   (2.12)

with $j(x,x')$ as in (2.2) and $\bar m$ the non-zero, antisymmetric solution of

  $\bar m = \tanh\{\beta\, j * \bar m\}$.   (2.13)

The limit Wulff problem

  $\inf\Big\{ P(E, {\rm int}(Q_1)) : E \subseteq Q_1,\ |E| = \frac12 - \theta \Big\}$   (2.14)

of minimizing the perimeter functional $P(E, {\rm int}(Q_1))$ is explicitly solved. Indeed (2.14) admits a solution and any solution $E_\theta$ is such that $Q_1 \cap \partial E_\theta$ is smooth and thus a critical point, [25]. Moreover $Q_1 \cap \partial E_\theta$ is connected and has constant curvature. Hence it is contained either in a circle or in a line. In addition the contact between $\partial E_\theta$ and $\partial Q_1$ is orthogonal. Let $\theta_{\rm crit}$ be defined by

  $\frac12 - \theta_{\rm crit} = \frac{\pi R^2}{4}$, where $\frac{2\pi R}{4} = 1$.

Then the following result holds.

Proposition 2.1 [25]. If $|\theta| \le \theta_{\rm crit}$ then $Q_1 \cap \partial E_\theta$ is a segment parallel to one of the coordinate axes and intersecting two of the opposite sides of $\partial Q_1$. If $|\theta| \ge \theta_{\rm crit}$ then $Q_1 \cap \partial E_\theta$ is a quarter of a circle of radius $\frac{2}{\sqrt{\pi}}\big(\frac12 - \theta\big)^{1/2}$ centered at one of the four corners of $Q_1$.
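The competition between the two candidate shapes in Proposition 2.1 can be illustrated numerically. The sketch below is an illustration only: the names `THETA_CRIT`, `quarter_circle_perimeter` and `wulff_perimeter` are hypothetical, and the code simply compares the two candidates named in the proposition (a straight segment of length 1 and a quarter circle at a corner), assuming the interface is shared by $E$ and its complement so that only the smaller of the two areas matters.

```python
import math

# theta_crit solves 1/2 - theta_crit = pi R^2 / 4 with 2 pi R / 4 = 1, i.e. R = 2/pi.
THETA_CRIT = 0.5 - 1.0 / math.pi


def quarter_circle_perimeter(area):
    """Arc length of a quarter disc of the given area: pi r^2 / 4 = area."""
    radius = 2.0 * math.sqrt(area / math.pi)
    return 0.5 * math.pi * radius


def wulff_perimeter(theta):
    """Smaller of the two candidate interior perimeters for |E| = 1/2 - theta.

    Valid for theta in (-1/2, 1/2); the straight segment has length 1.
    """
    area = 0.5 - theta
    smaller = min(area, 1.0 - area)  # the interface bounds both E and its complement
    return min(1.0, quarter_circle_perimeter(smaller))
```

With these definitions `wulff_perimeter(0.0)` equals 1 (planar interface), `wulff_perimeter(0.4)` equals `math.sqrt(0.1 * math.pi)`, about 0.56 (corner droplet), and the two candidates cost the same at `theta = THETA_CRIT`, about 0.18, which is where the minimizing shape changes discontinuously.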


Remark 2.2. As already remarked in the Introduction, for $L$ large enough a tunneling orbit cannot always be close to the Wulff shape, as the Wulff shape varies discontinuously when $\alpha$ crosses the critical value at which $\vartheta_\alpha = \theta_{\rm crit}$. When $\alpha = 0$ the Wulff shape is planar and this may suggest that optimal orbits become eventually (approximately) planar. Two scenarios are then conceivable: (a) the plus phase grows initially as a quarter of circle around a corner and then progressively deforms to end up as a planar wave as $\alpha \to 0$; (b) the plus phase is planar from the very beginning, so that in the limit picture the perimeter is discontinuous at time 0, jumping from 0 to its maximal value. In any case, both scenarios evidently contradict the intuitive idea that optimal orbits follow Wulff shapes. A discussion on this issue can be found in [27] in the context of statistical mechanics. Planar symmetry suggests relevance of $d = 1$ tunneling, which is the subject of the next subsection.

2.6. Tunneling in one dimension. Let $F_L^{(1)}$ be defined by (2.12) with $\mathbb{R}$ replaced by $[-L/2, L/2]$ and with Neumann boundary conditions. Let $m \in L^\infty([-L/2,L/2],[-1,1])$ and set

  $m^e(r) = m(r \cdot e_1)$,   $r \in Q_L$.

Then

  $F_L(m^e) = L\,F_L^{(1)}(m)$.   (2.15)

Let $\mathcal{U}^{(1)}_{L,T}$ be the $d = 1$ tunneling orbits in a time $T$ and $P_L^{(1)}$ the $d = 1$ tunneling cost associated with the functional $F^{(1)}$. We then have from (2.15),

  $P_L \le L\,P_L^{(1)}$.   (2.16)
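Identity (2.15) can be verified by integrating out the vertical variable. The computation below is a sketch which assumes that, for every $y$, $\int_{-L/2}^{L/2} J^{\rm neum}((x,y),(x',y'))\,dy' = j^{\rm neum}(x,x')$, where $j^{\rm neum}$ denotes $j$ reflected at $x = \pm L/2$ (the reflections in $y$ unfold the $y'$-integral onto the whole line).

```latex
\begin{align*}
F_L(m^e)
 &= \int_{Q_L} \phi_\beta(m(x))\,dx\,dy
  + \frac14 \int_{Q_L \times Q_L} J^{\rm neum}(r,r')\,[m(x) - m(x')]^2\,dr\,dr' \\
 &= L \int_{-L/2}^{L/2} \phi_\beta(m(x))\,dx
  + \frac{L}{4} \int_{-L/2}^{L/2}\!\!\int_{-L/2}^{L/2} j^{\rm neum}(x,x')\,[m(x) - m(x')]^2\,dx\,dx'
  = L\,F_L^{(1)}(m).
\end{align*}
% The factor L comes from the free integration over y in [-L/2, L/2].
```

The same argument applied to the integrand of $I_{L,T}$ suggests that a planar extension of a one-dimensional tunneling orbit costs $L$ times its one-dimensional cost, which is the content of (2.16).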

In [3, 4] it is proved that

  $P_L^{(1)} = F_L^{(1)}(\hat m_L)$,   (2.17)

where $\hat m_L$ is the unique non-zero, strictly monotone antisymmetric function of $x$ which solves the equation

  $\hat m_L(x) = \tanh\{\beta\, j^{\rm neum} * \hat m_L(x)\}$,   $|x| \le L/2$,   (2.18)

with $j^{\rm neum}$ obtained from $j$, see (2.2), by reflections at $\pm L/2$ (thus $\hat m_L$ is a critical point of $F_L^{(1)}$).

2.7. $S_t$-invariant manifolds. It is proved in [2] that $\hat m^e_L := (\hat m_L)^e$ is “dynamically connected” to $m^{(\pm)}$ in the sense that there are two $S_t$-invariant, one-dimensional manifolds, $W_\pm = \{v_L^{(\pm)}(\cdot,s),\ s \in \mathbb{R}\}$, which connect $\hat m^e_L$ to $m^{(+)}$ and, respectively, to $m^{(-)}$. $v_L^{(\pm)}(\cdot,s)$ are planar functions (i.e. constant in the vertical direction) which satisfy the following two properties:

  $\lim_{s\to-\infty} \|v_L^{(\pm)}(\cdot,s) - \hat m^e_L\|_2 = 0$,   $\lim_{s\to\infty} \|v_L^{(\pm)}(\cdot,s) - m^{(\pm)}\|_2 = 0$,

where $\|\cdot\|_2$ is the $L^2$ norm in $Q_L$, and

  $S_t(v_L^{(\pm)}(\cdot,s)) = v_L^{(\pm)}(\cdot,s+t)$ for all $s \in \mathbb{R}$ and all $t \ge 0$.

Moreover

  $F_L(v_L^{(\pm)}(\cdot,s)) < L\,F_L^{(1)}(\hat m_L)$ for any $s \in \mathbb{R}$.   (2.19)


2.8. Main results.

Theorem 2.3. For $L$ large enough

  $P_L = L\,F_L^{(1)}(\hat m_L)$.   (2.20)

Theorem 2.3 will be proved starting from Sect. 5. It suggests that the best strategy for tunneling is to use orbits with planar symmetry, a statement made precise in Theorem 2.4 below which will be proved in Sect. 4 using heavily results from [2].

Theorem 2.4. For all $L$ large enough, if $\{T_n, u_n\}$ is a minimizing sequence for (2.6), then $\lim_{n\to+\infty} T_n = +\infty$ and, given any $\epsilon > 0$ there exists a positive integer $n_0$ such that for any $n \ge n_0$, $u_n$ (or its image under a rotation by an integer multiple of $\pi/2$) has the following properties. There is $s \in (0,T_n)$ so that $\|u_n(\cdot,s) - \hat m^e_L\|_2 \le \epsilon$ and there are $\tau$ and $\tau'$ positive so that

  $\|u_n(\cdot,t) - v_L^{(-)}(\cdot,\tau - t)\|_2 \le \epsilon$,   $t \in [0,s]$,   (2.21)

  $\|u_n(\cdot,t) - v_L^{(+)}(\cdot,-\tau' + (t-s))\|_2 \le \epsilon$,   $t \in [s,T_n]$.   (2.22)

Theorem 2.4 proves that the best tunneling is obtained by orbits which have (approximately) a planar symmetry and which (approximately) follow the one-dimensional manifolds connecting saddle and stable points, first in the time reverse direction and then, after crossing the saddle, along the forward time direction. Initially the orbits look far from optimal, in the sense that it would be cheaper to gain the same value of total magnetization by following a different pattern, closer to the corresponding Wulff shape; but overall such an initial cost is recovered by smaller costs afterwards. In the limit $L \to \infty$ and rescaling penalties by dividing by $L$, we see that in optimal orbits the free energy jumps at time 0 to a value which then remains constant: in the limit the whole penalty is paid at time $0^+$. Thus pattern b) in Fig. 1 rather than a) is what we actually observe in tunneling events.

2.9. Content of the paper. In Sect. 3 we reduce the proof of Theorem 2.3 to the proof that
• when $u(\cdot,t) \in \Sigma_\alpha$ with $|\alpha|$ small then $u(\cdot,t)$ is very close to a planar instanton, Theorem 3.1;
• calling $m = u(\cdot,t)$, $t$ as above, then either $T_s(m) \to \hat m^e_L$ as $s \to \infty$, or else $T_s(m)$ at some time $s$ is close to a planar instanton suitably shifted away from the origin, Theorem 3.2;
• if $m$ is close to a planar instanton suitably shifted away to the right or to the left of the origin, then $T_s(m)$ is attracted by $m^{(-)}$ or respectively by $m^{(+)}$, Theorem 3.3.

We conclude Sect. 3 by showing that indeed Theorem 2.3 follows from Theorems 3.1–3.3. In Sect. 4 we prove Theorem 2.4 as a consequence of Theorem 2.3 and of existence and stability of the invariant manifolds $W_\pm$, properties which are proved in a companion paper, [2]. In Sects. 5–7 we prove Theorem 3.1: in Sect. 5 we quote from the literature lower bounds on the free energy cost of deviations from equilibrium (Peierls estimates). In Sect. 6 we prove that the distance from an instanton can be controlled in terms of the free energy, Theorem 6.1, and in Sect. 7 we conclude the proof of Theorem 3.1. In Sect. 8 we prove Theorems 3.2 and 3.3, relying again on the companion paper [2], thus concluding the proof of Theorem 2.3. In the Appendix, Sects. 9 and 10, we prove some spectral properties of operators obtained by linearizing the flows $T_t$ and $S_t$ which have been used in the proofs of Theorems 3.1–3.3. The extension of the results to $d = 3$ is sketched in Sect. 11.


3. Scheme of Proof of Theorem 2.3

By (2.16) and (2.17), $P_L \le L\,F_L^{(1)}(\hat m_L)$, so that Theorem 2.3 will be proved once we show that for $L$ large enough

  $P_L \ge L\,F_L^{(1)}(\hat m_L)$.   (3.1)

Thus we may take arbitrarily $\epsilon > 0$, restrict to $T > 0$ and $u \in \mathcal{U}_{L,T}$ such that

  $I_{L,T}(u) \le P_L + \epsilon \le L\,F_L^{(1)}(\hat m_L) + \epsilon$   (3.2)

and show that if $L$ is large enough for any such $u$,

  $I_{L,T}(u) \ge L\,F_L^{(1)}(\hat m_L)$.   (3.3)

The main point is an a-priori characterization of the tunneling orbits which satisfy (3.2) at times $t$ when $u(\cdot,t) \in \Sigma_\alpha$ with $|\vartheta_\alpha| < \theta_0$ ($\vartheta_\alpha$ as in (2.11)) where $\theta_0$ is fixed arbitrarily with the only requirement that

  $0 < \theta_0 < \theta_{\rm crit}$   (3.4)

(how large $L$ is in our analysis will depend also on the value of $\theta_0$). As we will see in Sect. 7, the proof of convergence to the Wulff shape as $L \to \infty$, see Proposition 7.1, essentially contains closeness to the instanton in the following sense: For any $\delta > 0$ there are $\epsilon(\delta) > 0$ and $L(\delta)$ so that if $0 < \epsilon < \epsilon(\delta)$, $L > L(\delta)$, $m \in \Sigma_\alpha$ with $|\vartheta_\alpha| \le \theta_0$ and $F_L(m) < L\,F_L^{(1)}(\hat m_L) + \epsilon$, then, modulo a rotation of an integer multiple of $\pi/2$, there is $\xi \in (-L/2, L/2)$ so that $\|m - \bar m_{\xi,L}\|_1 \le \delta L^2$, where

  $\bar m_{\xi,L}(r) = \bar m(r \cdot e_1 - \xi)$,   $r \in Q_L$.   (3.5)

The bound $\|m - \bar m_{\xi,L}\|_1 \le \delta L^2$ is still far from what is needed in our proof of (3.1), but it is an important ingredient in the proof of a much sharper estimate, where “the error” $\|m - \bar m_{\xi,L}\|_2$ vanishes instead of growing as $L \to \infty$. This is the main technical point in the paper; its precise statement is the content of:

Theorem 3.1. There are $L_0$ and $\epsilon_0(L) \in (0, L^{-100})$, so that for any $L \ge L_0$ if

  $m \in \Sigma_\alpha$ with $|\vartheta_\alpha| \le \theta_0$ and $F_L(m) < L\,F_L^{(1)}(\hat m_L) + \epsilon$,   $\epsilon \in (0, \epsilon_0(L))$   (3.6)

then there exists $\xi \in (-\theta_0 L - 1, \theta_0 L + 1)$ such that, modulo a rotation of an integer multiple of $\pi/2$,

  $\|m - \bar m_{\xi,L}\|_2 < L^{-100}$.   (3.7)

Remarks. Theorem 3.1 as well as Theorems 3.2 and 3.3 below, are proved in the next sections and in an appendix. The bound $L^{-100}$ is not optimal. Analogously to (8.9), it can be proved that $|\vartheta_\alpha + \xi/L| < L^{-100}$. Our proof of Theorem 3.1 uses in an essential way two dimensions but it extends to $d = 3$ using an argument due to Bodineau and Ioffe, [5], and an extension of the theory of Wulff shapes to $d = 3$, [29], see Sect. 11.

The proof of (3.1) proceeds with a characterization of the critical points of $F_L(\cdot)$. For this purpose we use the dynamics with semigroup $T_t$ defined by Eq. (2.5).


Theorem 3.2. There exists $L_1 \ge L_0$ such that for any $L \ge L_1$ the following holds. If $m$ satisfies (3.6) then either there is a time $t$ when $T_t(m) \in \Sigma_{\alpha'}$ for some $\alpha'$ such that $|\vartheta_{\alpha'}| = \theta_0$, or else $\lim_{t\to\infty} T_t(m) = \hat m^e_L$ in $L^2(Q_L)$ (modulo a rotation of an integer multiple of $\pi/2$).

Theorem 3.3. There exists $L_2 \ge L_1$ such that for any $L \ge L_2$ the following holds. If $m \in \Sigma_\alpha$ for some $\alpha$ such that $\vartheta_\alpha = \pm\theta_0$ and if there exists $\xi$ such that $\|m - \bar m_{\xi,L}\|_2 < L^{-100}$, then

  $\lim_{t\to\infty} T_t(m) = m^{(\mp)}$ in $L^2(Q_L)$.   (3.8)

The proof of (3.1), assuming Theorems 3.1, 3.2 and 3.3, is then concluded using the following corollary:

Corollary 3.4. Let $L_2$ be as in Theorem 3.3 and $L > L_2$. Then for any $u \in \mathcal{U}_{L,T}$ which satisfies (3.2) with $\epsilon$ as in (3.6), there exists $t^* \in (0,T)$ so that

  $F_L(u(\cdot,t^*)) \ge L\,F_L^{(1)}(\hat m_L)$,   (3.9)

and

  $\lim_{t\to\infty} T_t(u(\cdot,t^*)) = \hat m^e_L$ in $L^2(Q_L)$.   (3.10)

Proof. Let $u \in \mathcal{U}_{L,T}$ be as in the statement, $\alpha(t)$ such that $u(\cdot,t) \in \Sigma_{\alpha(t)}$ and $I = \{t \in [0,T] : |\vartheta_{\alpha(t)}| \le \theta_0\}$. Since $\vartheta_{\alpha(0)} = 1/2$, $\vartheta_{\alpha(T)} = -1/2$ and $\theta_0 < 1/2$, by continuity there is an interval $[t_0,t_1] \subset I$, where $\vartheta_{\alpha(t_0)} = \theta_0$ and $\vartheta_{\alpha(t_1)} = -\theta_0$. By Theorems 3.1, 3.2 and 3.3, $[t_0,t_1]$ is the disjoint union of the intervals $I_+$, $I_-$ and $\hat I$, respectively where $u(\cdot,t)$ is attracted by $m^{(+)}$, $m^{(-)}$ and $\hat m^e_L$. By Theorem 3.3 $I_+ \ni t_1$ and $I_- \ni t_0$, thus $I_\pm$ are both non-empty. Moreover, since the equilibria $\pm m_\beta$ are stable, by the continuity of motion $I_+$ and $I_-$ are open in $I$. Since $[t_0,t_1]$ is connected it cannot be the union of two disjoint non-empty open sets, so necessarily also $\hat I \ne \emptyset$ and hence there is a time $t^* \in (t_0,t_1)$ so that (3.10) holds. Since $T_t$ decreases the energy $F_L$, see (6.2), (3.9) follows from (3.10), (6.2) and (2.15).

Conclusion of the proof of Theorem 2.3. From (2.8) and (3.9) it follows that, if $L$ is large enough,

  $I_{L,T}(u) = A_{L,T}(u) \ge A_{L,t^*}(u) = F_L(u(\cdot,0)) + I_{L,t^*}(u) \ge F_L(u(\cdot,t^*)) \ge L\,F_L^{(1)}(\hat m_L)$,

hence (3.3). Theorem 2.3 is proved.

In $d = 1$, see [3, 4] (and [18] for Allen-Cahn), it is proved that for $L$ large enough if $F_L^{(1)}(m) \le F_L^{(1)}(\hat m_L) + \epsilon$ and $m$ is a critical point, then $m \in \{m^+, m^-, \hat m_L\}$. (The statement is evident in Allen-Cahn once formulated in terms of a one dimensional mechanical point in a conservative force field.) It then follows that if $P_L^{(1)}(u) \le F_L^{(1)}(\hat m_L) + \epsilon$ then at all $t$, $u(\cdot,t)$ is attracted by $\{m^+, m^-, \hat m_L\}$. In $d = 2$ we know that such a property is valid only at times $t$ when $u(\cdot,t) \in \Sigma_\alpha$ with $\alpha$ such that $|\vartheta_\alpha| \le \theta_0$. As shown above the proof of Theorem 2.3 can be worked out also with such a weaker statement, but there could be problems when extending the analysis to Glauber dynamics in Ising models with Kac potentials.

We have shown that the proof of Theorem 2.3 reduces to the proof of Theorems 3.1, 3.2 and 3.3, which is given in the next sections. While the proof of Theorems 3.2 and 3.3 is an extension of the proof of analogous statements in $d = 1$, see [3], the proof of Theorem 3.1 requires really new considerations, due to the geometrical complexities of a higher dimension, and it will take most of the paper.


4. Proof of Theorem 2.4

In this section we prove Theorem 2.4 using Theorem 2.3 which is thus taken for proved. Let $\{u_n, T_n\}$ be a minimizing sequence for (2.6), i.e., $u_n \in \mathcal{U}_{L,T_n}$ and $\lim_{n\to\infty} I_{L,T_n}(u_n) = P_L = L\,F_L^{(1)}(\hat m_L)$, where the last equality follows from Theorem 2.3. Then for any $\epsilon > 0$ there exists $n_\epsilon$ so that for any $n \ge n_\epsilon$,

  $I_{L,T_n}(u_n) \le L\,F_L^{(1)}(\hat m_L) + \epsilon$,   $L\,F_L^{(1)}(\hat m_L) = F_L(\hat m^e_L)$.   (4.1)

By Corollary 3.4 if $L > L_2$ and $\epsilon$ is as in (3.6), then for any $n \ge n_\epsilon$ there is a time $s_n \in (0,T_n)$ ($s_n$ will be the time $s$ in Theorem 2.4) so that

  $\lim_{t\to\infty} T_t\,u_n(\cdot,s_n) = \hat m^e_L$,   $F_L(u_n(\cdot,s_n)) \ge L\,F_L^{(1)}(\hat m_L)$.   (4.2)

By (2.8), $I_{L,T_n}(u_n) \ge I_{L,s_n}(u_n) \ge F_L(u_n(\cdot,s_n))$, then, using (4.1),

  $0 \le F_L(u_n(\cdot,s_n)) - F_L(\hat m^e_L) \le \epsilon$.   (4.3)

The function

  $w_n(\cdot,t) = u_n(\cdot,s_n - t)$,   $t \in (0,s_n)$   (4.4)

satisfies the identity

  $\frac{dw_n}{dt} = J^{\rm neum} * w_n - \frac{1}{\beta}\,{\rm arctanh}(w_n) + K_n$,   (4.5)

where $K_n$ is defined by (4.5) itself. We then consider (4.5) as an equation in $w_n$, regarding $K_n$ as a “known term”. In the next lemma we will prove that $K_n$ is “small” and then, as a consequence and relying heavily on [2], that $w_n$ follows closely the $S_t$-invariant manifold $W_-$.

Lemma 4.1. Let $\epsilon > 0$, $u_n$ satisfy (4.1), $s_n$ as in (4.2), $w_n$ as in (4.4) and $K_n$ as in (4.5). Then for $n$ sufficiently large,

  $\|K_n\|^2 := \int_0^{s_n}\!\!\int_{Q_L} K_n(r,t)^2\,dr\,dt < \epsilon$.   (4.6)

Furthermore there exists $c > 0$ independent of $n$ so that

  $\|w_n(\cdot,0) - \hat m^e_L\|_2^2 \le c\,\epsilon$.   (4.7)

Proof. From (4.1) and (2.15) it follows that

  $F_L(\hat m^e_L) + \epsilon \ge \int_0^{s_n}\!\!\int_{Q_L} [(u_n)_t - f_L(u_n)]^2 = I_{L,s_n}(u_n)$.   (4.8)

By (2.7) and recalling that $F_L(u_n(\cdot,0)) = 0$,

  $I_{L,s_n}(u_n) = A_{L,s_n}(u_n) = A_{L,s_n}(w_n) = I_{L,s_n}(w_n) + F_L(u_n(\cdot,s_n)) = \|K_n\|^2 + F_L(u_n(\cdot,s_n))$,

which, together with (4.8) and (4.2), implies (4.6).


In [2, Theorem 7.2] it is proved that there is $c$ so that

  $\|m - \hat m^e_L\|_2^2 \le c\,[F_L(m) - F_L(\hat m^e_L)]$   (4.9)

for all $m \in L^\infty(Q_L, (-1,1))$ such that $\lim_{t\to\infty} \|T_t(m) - \hat m^e_L\|_2 = 0$. By (4.2) $u_n(\cdot,s_n) = w_n(\cdot,0)$ satisfies this condition, therefore (4.7) follows from (4.9) and (4.3).

We will prove the properties of $w_n$ stated in Theorem 2.4 by investigating the evolution equation (4.5) and exploiting that $K_n$ is small. Smallness of $K_n$ is however not enough: if we only knew the bounds on $K_n$ from Lemma 4.1 we could not predict (even approximately) the evolution of $w_n$. Recall in fact that $\hat m^e_L$ is a stationary solution of the unperturbed evolution so that, no matter how small $K_n$ is, it would nonetheless be larger than the unperturbed force in a correspondingly small neighborhood of $\hat m^e_L$. In other words, when close to $\hat m^e_L$ the evolution is essentially ruled by $K_n$. Besides this, the initial datum $w_n(\cdot,0)$ is in the domain of attraction of $\hat m^e_L$ with “the wrong dynamics” $T_t$; under the “right evolution” $S_t$ it may no longer converge to $\hat m^e_L$ but rather to $m^-$ or even $m^+$. In conclusion the evolution of $w_n(\cdot,0)$ may have completely different behavior if we only had the information in Lemma 4.1 concerning smallness of $K_n$ and closeness of $w_n(\cdot,0)$ to $\hat m^e_L$.

Let us now recall what is proved in [2], in particular Theorem 7.3 of [2]. Call $S_t^K(m)$ the flow generated by the equation $u_t = J^{\rm neum} * u - \beta^{-1}\,{\rm arctanh}(u) + K$, $u(\cdot,0) = m$, where $K = K(r,t)$, $(r,t) \in Q_L \times \mathbb{R}_+$, is a smooth space-time dependent force. Then for any $\zeta$ and $\tau$ positive there is $\epsilon' > 0$ so that if $\|K\| < \epsilon'$ and $\|m - \hat m^e_L\|_2 < \epsilon'$ only the following two alternatives hold:
• For all times $t \ge 0$, $\|S_t^K(m) - \hat m^e_L\|_2 < \zeta$.
• There are $t^* > 0$ and $\sigma \in \{-,+\}$ so that $\|S_t^K(m) - \hat m^e_L\|_2 < 2\,\|v_L^{(\sigma)}(\cdot,-\tau) - \hat m^e_L\|_2$ for all $t \le t^*$ while $\|S_t^K(m) - v_L^{(\sigma)}(\cdot,-\tau + (t - t^*))\|_2 < \zeta$ for all $t \ge t^*$.

Let us now prove the statements in Theorem 2.4 referring to $W_-$, calling $\epsilon^*$ the parameter $\epsilon$ in Theorem 2.4 to avoid confusion with the $\epsilon$ in (4.1) and identifying $s = s_n$. Recall that $u_n(s_n - t) = S_t^{K_n}(w_n(0))$, $t \in [0,s_n]$; we are only writing the time variable in the argument of the functions. We choose: $\tau$ such that $\sup_{s \le -\tau} \|v_L^{(-)}(s) - \hat m^e_L\|_2 \le \epsilon^*/10$; $\zeta < \epsilon^*/10$; $\epsilon'$ is determined by $\tau$ and $\zeta$ as above; $\epsilon$ in (4.1) so that $\epsilon < \epsilon'$ and $c\,\epsilon < \epsilon^*$, $c$ as in (4.7), so that the inequality $\|u_n(\cdot,s_n) - \hat m^e_L\|_2 \le \epsilon^*$ in Theorem 2.4 follows from (4.7).

Since $u_n(0) = m^{(-)}$ the first alternative above is excluded and in the second alternative $\sigma = -$. Let $t^*$ be as in the second alternative. We then have $\|u_n(s_n - t) - v_L^{(-)}(-\tau + t - t^*)\|_2 < \epsilon^*$ for $t \in [t^*, s_n]$. For $t \in [0,t^*]$ we write

  $\|u_n(s_n - t) - v_L^{(-)}(-\tau + t - t^*)\|_2 \le \|u_n(s_n - t) - \hat m^e_L\|_2 + \|v_L^{(-)}(-\tau + t - t^*) - \hat m^e_L\|_2$

which is $\le 3 \sup_{s \le -\tau} \|v_L^{(-)}(s) - \hat m^e_L\|_2 \le 3\epsilon^*/10$. Equation (2.21) is thus proved, the parameter $\tau$ of Theorem 2.4 being equal to $-\tau + (s_n - t^*)$ in the present notation.

The proof of (2.22) is analogous. We now take

  $w_n^+(\cdot,t) = u_n(\cdot,s_n + t)$,   $t \in [0, T_n - s_n]$,


so that $w_n^+$ satisfies the “equation”

  $\frac{dw_n^+}{dt} = J^{\rm neum} * w_n^+ - \frac{1}{\beta}\,{\rm arctanh}(w_n^+) + K_n^+$   (4.10)

with $K_n^+$ defined by (4.10). Analogously to Lemma 4.1, $\|K_n^+\| < \epsilon$ for $n$ large enough. Since $S^{K_n^+}_{T_n - s_n}(w_n^+(\cdot,0)) = m^{(+)}$, the first alternative is again excluded and the second one is followed with $\sigma = +$. Again we require $\tau$ so large and $\zeta$ so small (and $\epsilon'$ correspondingly small) that $\|v_L^{(+)}(\cdot,-\tau) - \hat m^e_L\|_2 \le \epsilon^*/10$ and $\zeta < \epsilon^*$. Then (2.22) follows with $\tau' = \tau + t^*$ ($t^*$ the time appearing in the second alternative applied to the present case). Notice finally that if $\epsilon^* \to 0$ the time $\tau$ in the above construction diverges and then we need also $T_n \to \infty$ as stated in Theorem 2.4.

5. Local Equilibrium and Peierls Estimates

The heuristics behind the proof of Theorem 3.1 goes as follows. The Wulff theorem and the limit Wulff shape suggest that if $u(\cdot,\cdot)$ satisfies (3.2), at times $t$ when $u(\cdot,t) \in \Sigma_\alpha$ with $|\vartheta_\alpha| \le \theta_0$, to “zero order” $u(\cdot,t)$ looks like

  $W_{\alpha,L} := m_\beta\,\mathbf{1}_{\{(x,y) : x \ge L\vartheta_\alpha\}} - m_\beta\,\mathbf{1}_{\{(x,y) : x < L\vartheta_\alpha\}}$.   (5.1)

To a next approximation we expect $u(\cdot,t)$ close to $\bar m_{\xi,L}$ with $\xi$ such that $\bar m_{\xi,L} \in \Sigma_\alpha$. Behind this picture is the intuition that it does not pay to have deviations from $+m_\beta$ and $-m_\beta$ away from the interface and that the actual profile at the interface is not exactly as sharp as in $W_{\alpha,L}$ but rather the diffuse interface defined by the $d = 1$ instanton $\bar m$ shifted by $\xi$. In this section we quote from the literature lower bounds on the free energy due to deviations from equilibrium [Peierls estimates], in the next one we prove lower bounds due to deviations from the instanton shape and in Sect. 7 we use all that to prove Theorem 3.1.

Local equilibrium and deviations from equilibrium as usual in statistical mechanics are defined in terms of “averages” and of “coarse grained” variables. We briefly recall the main notion adapted to the present context.

Definition 5.1 (Coarse graining). We denote by $\mathcal D^{(\ell)}$, $\ell > 0$, the partition of $\mathbb R^2$ into the squares $\{(x,y) : x \in [n\ell,(n+1)\ell),\ y \in [n'\ell,(n'+1)\ell)\}$, $n, n'$ integers, and by $C_r^{(\ell)}$ the square of $\mathcal D^{(\ell)}$ which contains $r$. Then the $\ell$-coarse grained image $m^{(\ell)}$ of a function $m \in L^\infty(\mathbb R^2)$ is

$$m^{(\ell)}(r) := \frac{1}{\ell^2}\int_{C_r^{(\ell)}} m(r')\,dr'.$$

Definition 5.2 (Geometrical notions). A set is $\mathcal D^{(\ell)}$-measurable if it is a union of squares in $\mathcal D^{(\ell)}$, two sets are connected if their closures have non-empty intersection and $B$ is a vertical connection if it is a $\mathcal D^{(\ell_+)}$-measurable, connected set which is connected to both lines $\{y = \pm L/2\}$. Given a $\mathcal D^{(\ell_+)}$-measurable region $\Lambda \subset Q_L$ we call $\delta^{\ell_+}_{\rm out}[\Lambda]$ the union of all squares of $\mathcal D^{(\ell_+)}$ in $Q_L \setminus \Lambda$ which are connected to $\Lambda$.


Definition 5.3 (Phase indicators). Given an “accuracy parameter” $\zeta > 0$ and $m \in L^\infty(\mathbb R^2,[-1,1])$, we define the “local phase indicator”

$$\eta^{(\zeta,\ell)}(m;r) = \begin{cases} \pm 1 & \text{if } |m^{(\ell)}(r) \mp m_\beta| \le \zeta,\\ 0 & \text{otherwise.}\end{cases}$$

Given $\ell_- > 0$, $\ell_+$ an integer multiple of $\ell_-$, $\mathcal D^{(\ell_+)}$ a coarser partition of $\mathcal D^{(\ell_-)}$, we define the “global phase indicator”

$$\Theta^{(\zeta,\ell_-,\ell_+)}(m;r) = \begin{cases} \pm 1 & \text{if } \eta^{(\zeta,\ell_-)}(m;\cdot) = \pm 1 \text{ in } C_r^{(\ell_+)} \cup \delta^{\ell_+}_{\rm out}[C_r^{(\ell_+)}],\\ 0 & \text{otherwise.}\end{cases}$$

$\eta^{(\zeta,\ell)}(m;r)$ and $\Theta^{(\zeta,\ell_-,\ell_+)}(m;r)$ are defined also for functions $m \in L^\infty(Q_L,[-1,1])$ by simply extending $m$ to $\mathbb R^2$ by reflections along the lines $\{y = (2n+1)L/2\}$ and $\{x = (2n+1)L/2\}$, $n \in \mathbb Z$.

Definition 5.3 introduces the notion of “local equilibrium”: a point $r$ is attributed to the plus phase if $\Theta^{(\zeta,\ell_-,\ell_+)}(m;r) = 1$, to the minus phase if $\Theta^{(\zeta,\ell_-,\ell_+)}(m;r) = -1$ while, if $\Theta^{(\zeta,\ell_-,\ell_+)}(m;r) = 0$, $r$ belongs to a contour, contours being the maximal connected components of $\{r : \Theta^{(\zeta,\ell_-,\ell_+)}(m;r) = 0\}$. Local equilibrium in $r$ requires closeness to $m_\beta$ in a large region, the 9 squares in Fig. 2. By choosing $\ell_-$ small we try to approximate point-wise closeness (which would be too strong a request as the energy is defined by integrals) while by taking $\ell_+$ large we try to approximate global equilibrium. Very little is needed for local equilibrium to fail as exemplified in Fig. 2.

Definition 5.4 (Choice of parameters). We choose $\ell_-$ and $\ell_+$ as functions of $\zeta$ and $L$. The definition is used only when $\zeta$ is small and $L$ much larger than $\ell_+$, and the dependence on $L$ is only through the requirement that $L\ell_\pm^{-1}$ is an integer. We require that for $\zeta$ small enough: $\ell_- \in [\zeta^2/2, \zeta^2]$; $\ell_+ \in [\zeta^{-4}/2, \zeta^{-4}]$ with $\ell_+$ an integer multiple of $\ell_-$; $Q_L$ to be the closure of a union of squares of $\mathcal D^{(\ell_+)}$; each square of $\mathcal D^{(\ell_+)}$ to be the union of squares of $\mathcal D^{(\ell_-)}$.

Fig. 2. Nine large squares belonging to $\mathcal D^{(\ell_+)}$. The small squares are instead elements of $\mathcal D^{(\ell_-)}$. Even if $\eta^{(\zeta,\ell_-)}(m;\cdot) = 1$ in all small squares except the one in grey, nonetheless $\Theta^{(\zeta,\ell_-,\ell_+)}(m;r) = 0$
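The coarse graining and the two phase indicators of Definitions 5.1 and 5.3 can be illustrated on a discretized profile. The following is only a minimal sketch under stated assumptions: the function is sampled on a uniform grid, one small square consists of `k_minus` by `k_minus` grid cells, one large square of `ratio` by `ratio` small squares, and only neighbours inside the sampled box are inspected (which is consistent with the reflection rule, since a reflected neighbour mirrors a square already in the box). The names `coarse_grain`, `local_indicator` and `global_indicator` and all numerical values are hypothetical and do not come from the paper.

```python
def coarse_grain(m, k):
    """Average a 2D grid (list of rows) over disjoint k-by-k blocks."""
    rows, cols = len(m) // k, len(m[0]) // k
    return [[sum(m[i][j]
                 for i in range(I * k, (I + 1) * k)
                 for j in range(J * k, (J + 1) * k)) / (k * k)
             for J in range(cols)]
            for I in range(rows)]


def local_indicator(m, k_minus, m_beta, zeta):
    """Local indicator: +1 / -1 where the block average is within zeta
    of +m_beta / -m_beta, and 0 otherwise."""
    def eta(avg):
        if abs(avg - m_beta) <= zeta:
            return 1
        if abs(avg + m_beta) <= zeta:
            return -1
        return 0
    return [[eta(a) for a in row] for row in coarse_grain(m, k_minus)]


def global_indicator(eta, ratio):
    """Global indicator on the coarse partition: +1 / -1 only if eta is
    constantly +1 / -1 on the large square and on every large square
    touching it (sides or corners), and 0 otherwise."""
    rows, cols = len(eta) // ratio, len(eta[0]) // ratio

    def block_value(I, J):
        vals = {eta[i][j]
                for i in range(I * ratio, (I + 1) * ratio)
                for j in range(J * ratio, (J + 1) * ratio)}
        return vals.pop() if len(vals) == 1 else 0

    blk = [[block_value(I, J) for J in range(cols)] for I in range(rows)]
    theta = []
    for I in range(rows):
        row = []
        for J in range(cols):
            near = {blk[a][b]
                    for a in range(max(I - 1, 0), min(I + 2, rows))
                    for b in range(max(J - 1, 0), min(J + 2, cols))}
            row.append(blk[I][J] if len(near) == 1 else 0)
        theta.append(row)
    return theta


# Mimic Fig. 2: 3x3 large squares, a single defective small square in a corner.
m_beta, zeta, k_minus, ratio = 0.9, 0.05, 2, 3   # illustrative values only
n = 3 * ratio * k_minus
m = [[m_beta] * n for _ in range(n)]
for i in range(k_minus):
    for j in range(k_minus):
        m[i][j] = 0.0                             # the "grey" small square
theta = global_indicator(local_indicator(m, k_minus, m_beta, zeta), ratio)
print(theta)
```

In this toy configuration a single defective small square is enough to set the global indicator to zero in the central large square, which is the situation depicted in Fig. 2.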


We have the following two theorems, whose proof is (essentially) contained in [28]:

Theorem 5.5. There exist $c > 0$ and $\omega > 0$ such that if $\zeta > 0$ is small enough and $\ell_-$, $\ell_+$ and $L$ are as above, the following assertions hold. Let $\Lambda \subset Q_L$ be $\mathcal D^{(\ell_-)}$-measurable, and let $m$ be such that $\eta^{(\zeta,\ell_-)}(m;r) = 1$ for all $r \in Q_L$ at distance $\le 1$ from $\Lambda$. Then there exists a function $\psi$ satisfying $\psi = m$ outside $\Lambda$, $\eta^{(\zeta,\ell_-)}(\psi;\cdot) = 1$ in $\Lambda$,

$$\psi(r) = \tanh\{\beta J^{\rm neum} * \psi(r)\}, \quad r \in \Lambda,$$
$$|\psi(r) - m_\beta| \le c\,e^{-\omega\,{\rm dist}(r,\,Q_L \setminus \Lambda)}, \quad r \in \Lambda,$$
$$F_L(\psi) \le F_L(m).$$

The analogous statement holds if $\eta^{(\zeta,\ell_-)}(m;\cdot) = -1$, provided $m_\beta$ is replaced by $-m_\beta$.

Theorem 5.6. There exists $c_1 > 0$ such that if $\zeta$ is small enough and $\ell_-$, $\ell_+$ and $L$ are as above, the following assertions hold. Let $\Lambda \subset Q_L$ be $\mathcal D^{(\ell_+)}$-measurable and let $m$ be such that $\Theta^{(\zeta,\ell_-,\ell_+)}(m;\cdot) = 1$ in $\delta^{\ell_+}_{\rm out}[\Lambda]$. Then there exists a function $\psi$ satisfying $\psi = m$ outside $\Lambda$, $\eta^{(\zeta,\ell_-)}(\psi;\cdot) = 1$ in $\Lambda$,

$$F_L(m) \ge F_L(\psi) + c_1 \zeta^2 (\ell_-)^2 N_0, \tag{5.2}$$

where $N_0$ denotes the number of squares of $\mathcal D^{(\ell_+)} \cap \Lambda$, where $\Theta^{(\zeta,\ell_-,\ell_+)}(m;\cdot) = 0$. The analogous statement holds if $\Theta^{(\zeta,\ell_-,\ell_+)}(m;\cdot) = -1$ in $\delta^{\ell_+}_{\rm out}[\Lambda]$.

6. Free Energy Bounds in the Channel

This section continues the “preparation” to the proof of Theorem 3.1. We will estimate here the cost of deviations from the instanton shape. The natural setup for the problem is the channel $Q_{\infty,L} = \{(x,y) : |y| \le L/2\}$; in the next section we will in fact eventually reduce from $Q_L$ to $Q_{\infty,L}$. Our main result is an extension to $Q_{\infty,L}$ of a $d = 1$ result in [28]:

Theorem 6.1. There is $c$ so that for any $L$ large enough and for any $m \in L^\infty(Q_{\infty,L},[-1,1])$ such that uniformly in $y$

$$\liminf_{x\to\infty} m(x,y) > 0 \quad\text{and}\quad \limsup_{x\to-\infty} m(x,y) < 0$$

and such that for some $\xi \in \mathbb R$, $\|m - \bar m^e_\xi\|_2^2 < \infty$,

$$F_{Q_{\infty,L}}(m) - F_{Q_{\infty,L}}(\bar m^e) \ge \begin{cases} cL^{-[22+36\beta]}, & \text{if } \inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2^2 > L^{-24\beta-8},\\[4pt] cL^{-[2+12(\beta+1)]}\,\inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2^2, & \text{if } \inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2^2 \le L^{-24\beta-8}.\end{cases} \tag{6.1}$$

The dependence on $L$ in (6.1) is not optimal. We cannot possibly have

$$F_{Q_{\infty,L}}(m) - F_{Q_{\infty,L}}(\bar m^e) \ge c \inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2^2, \quad c > 0,$$


because $\inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2^2$ can be made arbitrarily large while keeping the free energy bounded: just take $m$ as a piecewise constant function of $x$ which as $x$ increases from $-\infty$ to $+\infty$ has values $-m_\beta$, $m_\beta$, $-m_\beta$ and $m_\beta$. Then the $L^2$ norm increases to $\infty$ as the two intermediate intervals are made long enough while the free energy is bounded by $4cL$, $c = \int_{x \le 0}\int_{x' \ge 0} j(x,x')\,dx\,dx'$. Thus the lower bound can hold only if $\inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2^2$ is small enough.

Theorem 6.1 is proved at the end of the section. Its proof, essentially perturbative, is obtained by expanding $F_{Q_{\infty,L}}(m)$ around $F_{Q_{\infty,L}}(\bar m^e)$. The linear term disappears because the instanton is a critical point; the quadratic term becomes then the leading one. Its analysis requires the study of the spectral properties of a linear operator, which is the second derivative of the functional and hence also the operator obtained by linearizing the time flow around $\bar m^e$. The spectral properties of such an operator are interesting in their own right, their analysis far from trivial and rather long. We have thus decided to just use in this section the outcome of the theory leaving details and proofs to an appendix, where the issue is presented in a self contained fashion. Thus a spectral gap estimate will prove the desired lower bounds to a second order approximation; an analysis of the energy landscape away from the instanton shape, where non-linear effects are dominant, requires a different set of ideas. Both close to and away from the instanton shape, dynamical properties of the flow $T_t(m)$ play a dominant role, as well as in the proofs of Theorems 5.5-5.6. We thus begin our analysis by quoting from the literature some basic properties of the time flow.

6.1. Monotonicity of energy. The semigroup $T_t$ generated by (2.5) (either in $\mathbb R^d$ or else in $Q_L$ or in $Q_{\infty,L}$ with $J \to J^{\rm neum}$, at the moment our notation does not distinguish among them) has the following properties (which explain why it is useful in proving energy bounds):

(i) $T_t$ decreases the energy $F$ (respectively in $\mathbb R^d$ or else in $Q_L$ and in $Q_{\infty,L}$): $F(T_t(m)) \le F(T_s(m))$ for $s \le t$, and if $\lim_{t\to\infty} T_t(m) = m^*$ uniformly on the compacts then

$$\liminf_{t\to\infty} F(T_t(m)) \ge F(m^*). \tag{6.2}$$

(ii) As $t \to \infty$, $T_t(m)$ converges by subsequences uniformly on the compacts to a solution of the stationary equation $m = \tanh\{\beta J * m\}$ (with $J \to J^{\rm neum}$ in $Q_L$ or $Q_{\infty,L}$).

6.2. Properties of the instanton. In [17] it is proved that there exists $a > 0$ so that

$$\lim_{x\to\infty} e^{\alpha x}\,\bar m'(x) = a,$$

where $\alpha > 0$ is such that

$$p_- \int_{\mathbb R} j(0,x)\,e^{\alpha x}\,dx = 1, \qquad p_- = \lim_{x\to\infty} p(x) = \beta(1 - m_\beta^2) < 1. \tag{6.3}$$
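The inequality $p_- = \beta(1 - m_\beta^2) < 1$ appearing in (6.3) can be checked numerically. The sketch below assumes, as the constant solutions of the stationary equation $m = \tanh\{\beta J * m\}$ with a probability kernel suggest, that $m_\beta$ is the positive solution of $m = \tanh(\beta m)$ with $\beta > 1$. The function name `mean_field_magnetization` and the sample values of $\beta$ are hypothetical choices for illustration.

```python
import math


def mean_field_magnetization(beta, tol=1e-14, max_iter=100_000):
    """Positive solution of m = tanh(beta * m), found by fixed-point
    iteration started from m = 1 (assumes beta > 1)."""
    if beta <= 1.0:
        raise ValueError("a positive solution exists only for beta > 1")
    m = 1.0
    for _ in range(max_iter):
        m_next = math.tanh(beta * m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


for beta in (1.1, 1.5, 2.0, 3.0):       # illustrative inverse temperatures
    m_beta = mean_field_magnetization(beta)
    p_minus = beta * (1.0 - m_beta ** 2)
    print(f"beta={beta}: m_beta={m_beta:.6f}, p_minus={p_minus:.6f}")
```

The same quantity $p_-$ is the slope of $m \mapsto \tanh(\beta m)$ at $m_\beta$, so $p_- < 1$ is also what makes this fixed-point iteration converge.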


The finite volume instanton $\hat m_L$ is close to $\bar m$ restricted to $[-L/2, L/2]$; we will just need here that their energies are exponentially close: there are $c > 0$ and $\omega > 0$ so that for all $L$,

$$|F_L^{(1)}(\hat m_L) - F^{(1)}(\bar m)| \le c\,e^{-\omega L}. \tag{6.4}$$

A function $m$ on $\mathbb R$ “is close in shape to an instanton” if $m$ is close to a translate $\bar m_\xi$ of $\bar m$, $\bar m_\xi(x) = \bar m(x - \xi)$, $\xi \in \mathbb R$. Usually $\xi$ is chosen by minimizing a weighted $L^2$ distance of $m$ from the instanton manifold $\{\bar m_\xi, \xi \in \mathbb R\}$. We will use here the notion of center of $m$: $\xi_m$ is a “center of $m$” if $\int_{\mathbb R} m\,\bar m'_{\xi_m}\,p_{\xi_m}^{-1} = 0$, $p_\xi = \beta(1 - \bar m_\xi^2)$. $\xi_m$ is then a critical point of $\xi \to \|m - \bar m_\xi\|_\xi^2 := \int_{\mathbb R} (m - \bar m_\xi)^2 p_\xi^{-1}$, i.e. the $p_\xi^{-1}$-weighted $L^2$ distance. Existence and uniqueness of the center are proved when either $\inf_\xi \|m - \bar m_\xi\|_2$ or $\inf_\xi \|m - \bar m_\xi\|_\infty$ are small enough, see [28]. The proof extends straightforwardly to the case of the channel, $Q_{\infty,L}$, where $\xi_m$ is defined as

$$\int_{Q_{\infty,L}} m(r)\,\bar m'_{\xi_m}(r\cdot e_1)\,p_{\xi_m}^{-1}(r\cdot e_1)\,dr = 0, \qquad p_\xi(x) = \beta[1 - \bar m_\xi^2(x)]. \tag{6.5}$$

The precise statement (in the $L^2$ case) is given in the lemma below where we show that the center of $m$ is related to the minimization of the usual $L^2$ distance from the instanton manifold.

Lemma 6.1. There are $c$ and $\epsilon$ positive so that if $m \in L^\infty(Q_{\infty,L},[-1,1])$ and $\inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2 < \epsilon$ then there is a unique $\xi_m$ such that (6.5) holds, $\|m - \bar m^e_{\xi_m}\|_2 \le c\epsilon$ and

$$\|m - \bar m^e_{\xi_m}\|_2^2 \le \frac{1}{1 - m_\beta^2}\,\inf_{\xi\in\mathbb R}\|m - \bar m^e_\xi\|_2^2. \tag{6.6}$$

Proof. The proof of existence and uniqueness of the center and that m − m¯ eξm 2 ≤ c are a simple extension of their proof in d = 1, [28], and are omitted. It remains to prove (6.6). Without loss of generality we may suppose ξm = 0. Let then [m(r ) − m¯ ξ (r · e1 )]2 dr, b := f (0). f (ξ ) := p(r · e1 ) Q ∞,L

≤ [β(1 − m 2β )]−1 < (1 − m 2β )−1 and m − m¯ e 2 ≤ c then b2 ≤ (1 − Since pξ−1 m 2 −1 2 m β ) (c ) for all m such that inf m − m¯ eξ 2 < . ξ ∈R

We claim that f (ξ ) has a unique minimum at ξ = 0 if is small enough. Call f and f the first and second derivatives of f w.r.t. ξ . By an explicit computation, f (0) = 0 and (m¯ )2 (m¯ )2 dr − 2b[ dr ]1/2 . f (0) ≥ 2 p p (m¯ )2 (m¯ )2 dr ]1/2 < 2−1 dr . Then there is ∗ > 0 We suppose so small that 2b[ p p so that (m¯ )2 dr, |ξ | ≤ ∗ , f (ξ ) ≥ p


which proves that f (ξ ) has a unique minimum at ξ = 0 when ξ ∈ [− ∗ , ∗ ]. Call (m¯ − m¯ ξ )2 A(ξ )2 = dr, A2 = inf ∗ A(ξ )2 > 0 |ξ |≥ p and suppose so small that b < A/2. Write f (ξ ) = [{m − m} ¯ − {m¯ ξ − m}] ¯ 2 p −1 dr = 2 b b2 + A(ξ )2 − 2 (m − m)( ¯ m¯ ξ − m) ¯ p −1 dr , hence f (ξ ) ≥ A(ξ )2 1 − ≥ A2 /4. A(ξ ) Then f (0) = b2 < A2 /4 ≤ inf ∗ f (ξ ) |ξ |≥

thus proving the claim that 0 is the unique minimizer of f . Using that pξm < β and that 1 − m¯ 2ξ > 1 − m 2β we then have m

− m¯ eξm 22

≤ Q ∞,L

≤

β [m − m¯ eξm ]2 = inf β pξem ξ ∈R

[m − m¯ eξ ]2

Q ∞,L

pξem

1 inf m − m¯ eξ 22 . 1 − m 2β ξ ∈R

6.3. Spectral estimates. There are several results on the linear stability of the instanton shape in d = 1 which extend to the case of the channel Q ∞,L . In an Appendix (Sects. 9 and 10) we will prove the following statements where, to simplify notation, we drop the superscript “e”to denote the extension of a function on R to the channel Q ∞,L . Recalling that g L (m) := −m + tanh{β J neum ∗ m} and that g L (m¯ ξ ) = 0, the first order term in the expansion of g L (m¯ ξ + ψ), ψ = m − m¯ ξ , gives ξ ψ = −ψ + pξ J neum ∗ ψ,

pξ = β(1 − m¯ 2ξ ).

(6.7)

We will regard ξ as an operator on L ∞ and/or L 2 . It is easily checked that ξ has an eigenvalue 0 with eigenvector m¯ ξ and that ξ is self-adjoint on L 2 (Q L , pξ−1 ). In Sect. 10 we will prove the existence of a L 2 spectral gap: there is a positive number κ (called a in Theorem 10.1) so that κ ψ, ξ ψξ ≤ − 2 ψ, ψξ , ψ, m¯ ξ ξ = 0, (6.8) L where ·, ·ξ is the scalar product in L 2 (Q L , pξ−1 ). A spectral gap in L ∞ is proved in Sect. 9: there is c > 0 so that (see Theorem 9.4) eξ t ψ ∞ ≤ ce−(κ/L

2 )t

ψ ∞ , ψ, m¯ ξ ξ = 0.

(6.9)

The orthogonality condition in (6.8)–(6.9) is behind the definition of center of a function in (6.5). Indeed if ξ = ξm then ψ = m − m¯ ξ fulfills the requirement in (6.8)–(6.9) as (m − m¯ ξ ), m¯ ξ ξ = 0.

(6.10)

Recall in fact that, m¯ ξ , m¯ ξ ξ = 0 because m¯ is antisymmetric and m¯ symmetric, then (6.10) with ξ = ξm follows from (6.5).


6.4. Stability of the instanton. We start by proving a weaker version of (6.1):

Theorem 6.2. Let $m \in L^\infty(Q_{\infty,L},[-1,1])$ be such that $\liminf_{x\to\pm\infty} m(x,y) \gtrless 0$ uniformly in $y$. Then there exists $\xi$ such that

$$\lim_{t\to\infty} T_t(m) = \bar m^e_\xi \quad \text{in } L^\infty(Q_{\infty,L}) \tag{6.11}$$

and

$$F_{Q_{\infty,L}}(m) \ge F_{Q_{\infty,L}}(\bar m^e_\xi) = F_{Q_{\infty,L}}(\bar m^e) = c_\beta L. \tag{6.12}$$

Proof. The analogous statements hold for the instanton $\bar m$ in $d = 1$ and indeed the proof of the theorem is a simple adaptation of the $d = 1$ proof. We just sketch the main lines. The first step uses in an essential way the spectral gap property of the previous subsection to extend from linear to local stability. The argument is standard and allows one to conclude that if $\|m - \bar m^e_\xi\|_\infty \le \epsilon$ with $\epsilon > 0$ small enough, then (6.11) is verified. The global stability statement in the theorem follows from the above local stability using more involved arguments based on a comparison theorem (ferromagnetic inequalities). The proof is however essentially as in $d = 1$, see [12, 16, 17], and its details are omitted. Equation (6.12) follows from (6.11) and (i) of Subsect. 6.1.

Lemma 6.3. Let $m \in L^\infty(Q_{\infty,L},[-1,1])$. Then

$$\|\nabla(T_t(m) - e^{-t}m)\|_\infty \le \beta\,\|\nabla J\|_\infty. \tag{6.13}$$

Moreover, there exists $\tau > 0$ so that for any $t \ge \tau$ and any $m \in L^\infty(Q_{\infty,L},[-1,1])$,

$$\|T_t(m)\|_\infty \le m_\beta + \frac{1 - m_\beta}{2}. \tag{6.14}$$

Proof. The integral version of (2.5) yields

$$T_t(m) - e^{-t}m = \int_0^t e^{-(t-s)}\tanh\{\beta J^{\rm neum} * T_s(m)\}\,ds,$$

hence (6.13). A comparison theorem holds for (2.5) so that $T_t(-1) \le T_t(m) \le T_t(1)$ which then gives (6.14).

Lemma 6.4. There exists a constant $c > 0$ such that if $m \in L^\infty(Q_{\infty,L},[-1,1])$ and $\xi \in \mathbb R$ then

$$\|T_t(m) - \bar m^e_\xi\|_\infty \le 2e^{-t} + c\,\big(\|T_t(m) - \bar m^e_\xi\|_2 + e^{-t}\|m - \bar m^e_\xi\|_2\big)^{2/3}.$$

Proof. We may assume $\xi = 0$ and write simply $\bar m^e$. The function $\psi = T_t(m) - \bar m^e - e^{-t}(m - \bar m^e)$ has bounded derivative hence (see for instance [20]) there exists $c > 0$ (which depends on the $L^\infty$ norm of the derivative) so that $\|\psi\|_\infty \le c\,\|\psi\|_2^{2/3}$. Thus

$$\|T_t(m) - \bar m^e\|_\infty \le \|\psi\|_\infty + 2e^{-t} \le 2e^{-t} + c\,\|T_t(m) - \bar m^e - e^{-t}(m - \bar m^e)\|_2^{2/3}.$$


Lemma 6.5. If $m \in L^\infty(Q_{\infty,L},[-1,1])$ and $m - \bar m^e_\xi \in L^2(Q_{\infty,L})$, then for all $t \ge 0$,

$$e^{-2(\beta+1)t}\,\|m - \bar m^e_\xi\|_2^2 \le \|T_t(m) - \bar m^e_\xi\|_2^2 \le e^{2(\beta-1)t}\,\|m - \bar m^e_\xi\|_2^2 \tag{6.15}$$

and

$$\frac{d}{dt}\|T_t(m) - \bar m^e_\xi\|_2^2 \le 2(\beta-1)\,\|T_t(m) - \bar m^e_\xi\|_2^2. \tag{6.16}$$

Proof. Suppose again $\xi = 0$ and write simply $\bar m^e$. Let $v = T_t(m) - \bar m^e$, then

$$v_t = -v + \tanh\{\beta J^{\rm neum} * T_t(m)\} - \tanh\{\beta J^{\rm neum} * \bar m^e\}.$$

Since $|\tanh\{\beta A\} - \tanh\{\beta B\}| \le \beta|A - B|$ and since $J^{\rm neum}(r,r')$ is a symmetric transition probability kernel,

$$\frac12\frac{d}{dt}\|v\|_2^2 + \|v\|_2^2 \le \beta\,(|v|, J^{\rm neum} * |v|) \le \beta\,\|v\|_2^2,$$
$$\frac12\frac{d}{dt}\|v\|_2^2 + \|v\|_2^2 \ge -\beta\,(|v|, J^{\rm neum} * |v|) \ge -\beta\,\|v\|_2^2.$$

Hence (6.16) which, by integration, yields (6.15).

Proof of Theorem 6.1. To simplify notation we omit in this proof the superscript “e” to denote extensions to Q ∞,L and start by establishing some preliminary results. By Lemma 6.5 at any time t and for any ξ , Tt (m) − m¯ ξ 22 < ∞, as this holds at time 0 by assumption. Moreover by (6.16) for any ξ , Tt (m) − m¯ ξ 22 is a continuous function of t and for any t, Tt (m) − m¯ ξ 22 is a continuous function of ξ which diverges as |ξ | → ∞, (recall the properties of m¯ in Subsect. 6.2). It then follows that inf ξ Tt (m) − m¯ ξ 22 is a min and it is a continuous function of t. We can now start the proof of Theorem 6.1 and consider first inf m − m¯ ξ 22 >L −24β−8 . ξ∈R

There are then two possible alternatives: (a) at all times inf ξ Tt (m) − m¯ ξ 22 > L −24β−8 ; (b) there is a time t < ∞ when inf ξ Tt (m) − m¯ ξ 22 ≤ L −24β−8 . Case (b). Since inf ξ Tt (m) − m¯ ξ 22 is a continuous function of t, there is a time t0 when inf ξ Tt0 (m) − m¯ ξ 22 = L −24β−8 and since FQ ∞,L (Tt0 (m)) ≤ FQ ∞,L (m), this case is actually contained in the case when inf m − m¯ ξ 22 ≤ L −24β−8 , which is examined next ξ ∈R

(postponing the analysis of case (a)). We thus suppose inf m − m¯ ξ 22 =: m − m¯ ξˆ 22 ≤ L −24β−8

ξ ∈R

(6.17)

and set e−τ := L −6 and m ∗ := Tτ (m). By Lemma 6.5, m ∗ − m¯ ξˆ 2 ≤ L 6(β−1)−12β−4 <

, as in Lemma 6.1, for L large enough. Then there exists ξm∗ =: ξ ∗ and (6.5)–(6.6) hold. By definition we have FQ ∞,L (m ∗ ) − FQ ∞,L (m¯ ξ ∗ ) 1 =− S(m ∗ ) − S(m¯ ξ ∗ ) β Q ∞,L

1 − 2

Q ∞,L ×Q ∞,L

J neum (r, r ) m ∗ (r )m ∗ (r ) − m¯ ξ ∗ (r )m¯ ξ ∗ (r ) .


Calling v = m ∗ − m¯ ξ ∗ and α = max( m ∗ ∞ , m¯ ξ ∗ ∞ ), − S(m ∗ ) − S(m¯ ξ ∗ ) ≥ −S (m¯ ξ ∗ )v +

1 α |v|3 . v2 − 3(1 − α 2 )2 2(1 − m¯ 2ξ ∗ )

By (6.14) if L is large enough, α ≤ (1 + m β )/2 < 1. Calling Lξ = pξ−1 ξ , Lξ v = J neum ∗ v − pξ−1 v,

pξ = β(1 − m¯ 2ξ ),

where ξ ∗ is defined in (6.7). We denote by (v, w) the scalar product on L 2 (Q ∞,L ) and regard Lξ as an operator on L 2 (Q ∞,L ). We have 1 α v ∞ (v, v). FQ ∞,L (m ∗ ) − FQ ∞,L (m¯ ξ ∗ ) ≥ − (v, Lξ ∗ v) − 2 3β(1 − α 2 )2 Since (v, Lξ ∗ v) = v, ξ ∗ vξ ∗ , recalling that v, m¯ ξ ∗ ξ ∗ = 0, by the L 2 spectral gap theorem, (6.8), and using that pξ ∗ ≥ β[1 − m 2β ] we get κ α v ∞ v, vξ ∗ − (v, v) 2 2L 3β(1 − α 2 )2 κ α v ∞ ≥ (v, v) − . 2L 2 β(1 − m 2β ) 3β(1 − α 2 )2

FQ ∞,L (m ∗ ) − FQ ∞,L (m¯ ξ ∗ ) ≥

By Lemma 6.4 with t = τ and ξ = ξ ∗ after recalling that v = Tτ (m) − m¯ ξ ∗ , 2/3 v ∞ ≤ 2L −6 + c m ∗ − m¯ ξ ∗ 2 + L −6 m − m¯ ξ ∗ 2 . Let ξˆ ∈ R be as in (6.17), then by (6.6) and Lemma 6.5, m ∗ − m¯ ξ ∗ 2 ≤ ≤

1 1 − m 2β 1 1 − m 2β

m ∗ − m¯ ξˆ 2 ≤

1 1 − m 2β

e(β−1)τ m − m¯ ξˆ 2

L 6(β−1)−12β−4 .

By Lemma 6.5, e−(β+1)τ m − m¯ ξ ∗ 2 ≤ m ∗ − m¯ ξ ∗ 2 so that, for L large enough, 2/3 v ∞ ≤ 2L −6 + c L 6(β−1)−12β−4 + L −6+6(β+1)+6(β−1)−12β−4 ≤ 3L −6 , and FQ ∞,L (m ∗ ) − FQ ∞,L (m¯ ξ ∗ ) ≥

κ (v, v). 4L 2 β(1 − m 2β )

By (6.15), v 22 ≥ m − m¯ ξ ∗ 22 e−2(β+1)τ ≥ inf m − m¯ ξ 22 e−2(β+1)τ . Recalling that e−τ = L −6 , FQ ∞,L (m) − FQ ∞,L (m¯ ξ ∗ ) ≥

ξ

κ L −[2+12(β+1)] inf m − m¯ ξ 22 . ξ 4β(1 − m 2β )
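The powers of $L$ in the estimates above can be tracked mechanically. The following sketch only does the exponent bookkeeping, under the choices made in the proof ($e^{-\tau} = L^{-6}$ and the threshold $L^{-24\beta-8}$ in (6.17)); multiplicative constants are ignored. An exponent $a\beta + b$ is stored as the pair $(a, b)$; the helper names `lin`, `add`, `scale` and `value` are hypothetical.

```python
from fractions import Fraction


def lin(a, b):
    """Exponent a*beta + b of L, stored as a pair of exact rationals."""
    return (Fraction(a), Fraction(b))


def add(*terms):
    """Multiplying powers of L adds their exponents."""
    return (sum(t[0] for t in terms), sum(t[1] for t in terms))


def scale(c, term):
    """Raising a power of L to the power c scales its exponent."""
    return (Fraction(c) * term[0], Fraction(c) * term[1])


def value(term, beta):
    """Evaluate the exponent a*beta + b at a given beta."""
    return term[0] * Fraction(beta) + term[1]


initial = lin(-12, -4)   # square root of the threshold L^{-24 beta - 8}
growth = lin(6, -6)      # e^{(beta-1) tau} with e^{-tau} = L^{-6}
back = lin(6, 6)         # e^{(beta+1) tau}

# L2 distance of m* from the instanton: L^{6(beta-1) - 12 beta - 4}
first = add(growth, initial)
# L^{-6} times the L2 distance of m: L^{-6 + 6(beta+1) + 6(beta-1) - 12 beta - 4}
second = add(lin(0, -6), back, first)
# spectral gap factor L^{-2} times e^{-2(beta+1) tau}
gap = add(lin(0, -2), scale(2, lin(-6, -6)))

print(first, second, gap)
```

Both `first` and `second`, once raised to the power 2/3 as in Lemma 6.4, carry exponents strictly below $-6$, which is consistent with the bound $3L^{-6}$ on the sup norm of $v$ for $L$ large; `gap` reproduces the exponent $-[2+12(\beta+1)]$ of the final estimate.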


Case (a), namely when at all times t, inf ξ Tt (m) − m¯ ξ 22 > L −24β−8 . By Theorem 6.2 for any > 0 there are t and ξ so that Tt (m) − m¯ ξ ∞ < . Call m ∗ = Tt (m) and write

FQ ∞,L (m ∗ ) − FQ ∞,L (m¯ ξ ) ≥ φβ (m ∗ ) dr − c L 2 + Le−αL |x−ξ |>L

having used (6.3) to bound the contribution of {|x − ξ | > L} to FQ ∞,L (m¯ ξ ). As already remarked there is c > 0 so that φβ (m ∗ ) = φβ (m ∗ ) − φβ (m β ) ≥ c |m ∗ − m β |2 , m ∗ > 0, hence

|x−ξ |>L

c φβ (m ) dr ≥ 2 ∗

|m ∗ − m¯ ξ |2 − c Le−αL .

|x−ξ |>L

Thus there is a new constant c > 0 so that

FQ ∞,L (m ∗ ) − FQ ∞,L (m¯ ξ ) ≥ m ∗ − m¯ ξ 22 − c L 2 + Le−αL ,

and since m ∗ − m¯ ξ 22 ≥ L −24β−8 we obtain (6.1) for small enough.

7. Proof of Theorem 3.1

In this section we will prove that orbits whose penalty is close to optimal approximately have an instanton shape at times when the total magnetization is small. Given $\theta_0 \in (0, \theta_{\rm crit})$, see (3.4), we fix once and for all $\theta_1$ and $\theta_2$ so that

$$\frac12 > \theta_2 > \theta_1 > \theta_0 \tag{7.1}$$

(the values of $L$ for which our analysis applies depend on the actual choice of such parameters). Let $W_{\alpha,L}$ be as in (5.1) and for any $\delta > 0$ set

$$N_{\delta,L} := \bigcup_{|\vartheta_\alpha| \le \theta_0} \Big\{ m \in \Sigma_\alpha : \frac{1}{L^2}\int_{Q_L} |m - W_{\alpha,L}| < \delta \Big\}, \tag{7.2}$$

and in the sequel we will restrict to functions $m$ which satisfy

$$m \in N_{\delta,L}, \qquad F_L(m) < c_\beta L + \epsilon L, \qquad c_\beta = F^{(1)}(\bar m). \tag{7.3}$$

We will see that as > 0 gets small (7.3) forces m progressively closer to m¯ eξ , for a suitable value of ξ . Before entering into the whole issue we remark: Lemma 7.1. For every a > 0 there is L a so that for all L ≥ L a the following holds. Let (1) (u n , Tn ) be an optimizing orbit, namely such that lim inf I L ,Tn (u n ) ≤ FL (mˆ L )L. Then for all n large enough and all t ∈ [0, Tn ], FL (u n (·, t)) < cβ L +

n→∞

1 La

L , L ≥ La.

(7.4)


Proof. For any δ > 0 if n is large enough, FL (u n (·, t)) < FL (mˆ L )L + δ. By (6.4) FL (u n (·, t)) < L(cβ + ce−ωL ) + δ. Choose δ = (2L a )−1 and L a so that ce−ωL a < (2L a )−1 and (7.4) follows.

L −a

Thus we can take in (7.3) = with a as large as desired, provided L ≥ L a and that we restrict to optimizing sequences. Our first result is a corollary of the convergence theorem to the Wulff shape of Subsect. 2.5. ¯ m ∈ α Proposition 7.1. For any δ > 0 there exist > 0 and L¯ such that if L ≥ L, with |ϑα | ≤ θ0 and FL (m) < cβ L + L, then m ∈ Nδ,L (modulo a rotation of an integer multiple of π/2). Proof. In the course of the proof we use the following notation: given a set A ⊂ Q 1 we call f A the function equal to m β in A and to −m β in Q 1 \ A and f A,L its image as a function on Q L , i.e. f A,L (Lr ) = f A (r ). If L = 1 we simply write f A . Let E ϑα ⊆ Q 1 be a solution of (2.14) with |E ϑα | = 1/2 − ϑα . We argue by contradiction. Thus we suppose that there is δ > 0 such that for any

> 0 and any L¯ positive the following holds. There exist α such that |ϑα | ≤ θ0 , L > L¯ and m ∈ α such that FL (m) < cβ L + L and min − |m − f E ϑα ,L dr | ≥ δ. E ϑα

QL

We can then find an increasing sequence {L h } converging to +∞ as h → +∞, αh such that |ϑαh | ≤ θ0 , and functions m h ∈ αh satisfying FL h (m h ) 1 min − |m h − f E ϑα ,L h dr | ≥ δ. < cβ + , (7.5) E ϑα Lh h h Q Lh

Rescale the functions m h by defining vh (r ) := m h (L h r ), r ∈ Q 1 . Then there is a (not relabelled) subsequence so that αh → α as h → +∞ with |ϑα | ≤ θ0 while {vh } con1 verges in L (Q 1 ) to a function f A , i.e. equal to m β in A and to −m β in Q 1 \ A, A ∈ BV , and − f A = α, [2]. Using the -convergence of the rescaled sequence of functionals, cβ ≥ lim inf

FL h (m h ) ≥ cβ P(A, int(Q 1 )). Lh

(7.6)

Since |A| ∈ [ 21 − θ0 , 21 + θ0 ], P(A, int(Q 1 )) ≥ 1 (1 being the minimal perimeter when the area is in [ 21 − θ0 , 21 + θ0 ]) hence from (7.6) P(A, int(Q 1 )) = 1 and A is a minimizer of the perimeter. By rescaling the second equation in (7.5), min − |vh − f E ϑα | ≥ δ. E ϑα

h

h

Q1

As h → ∞ (along the converging subsequence) min − | f A − f E ϑα | ≥ δ, E ϑα

(7.7)

Q1

which gives the desired contradiction because the left-hand side on (7.7) vanishes.


All functions Wα,L in Nδ,L have value +m β in {(x, y) : x ≥ θ0 L} and value −m β in {(x, y) : x ≤ −θ0 L}, a property which evidently fails for the generic element of Nδ,L . A weaker property however holds, namely there are two vertical strips (a precise definition is given below), one in A+ = {(x, y) : x ∈ [θ0 L , θ1 L]} (see (7.1)) and the other one in A− = −A+ , where on a large fraction of points (ζ,− ,+ ) = 1, respectively (ζ,− ,+ ) = −1. Under the additional assumption that (7.3) holds with small enough (yet independent of L) there are “vertical connections” (see Definition 5.2) where identically (ζ,− ,+ ) = 1 and (ζ,− ,+ ) = −1. If we further strengthen the assumption by supposing = L −2 and L large, then η(ζ,− ) = 1 for x ≥ θ2 L and η(ζ,− ) = −1 for x ≤ −θ2 L. We define the [vertical] strips S(n) by S(n) := [n+ , (n + 1)+ ) × [−L/2, L/2) − + and call Z ± L ⊂ Z the set of all n ∈ Z such that S(n) ⊂ A± and Z L = Z L ∪ Z L .

Proposition 7.2. There exists a constant c = c(ζ, − , + ) such that for any m ∈ Nδ,L cδ (ζ,− ,+ ) (m, ·) = ±1 in at most N := there are n ± ∈ Z ± L squares δ L such that θ1 − θ0 of D(+ ) inside S(n ± ). Proof. The value of (ζ,− ,+ ) (m, ·) on S(n) is determined by the value of η(ζ,− ) on a strip which is three times larger than S(n). With reference to Fig. 2 in fact if the middle square is in S(n) then all 9 squares are needed to determine the value of (ζ,− ,+ ) in the middle one. Set then S (3) (n) := [(n − 1)+ , (n + 2)+ ) × [−L/2, L/2). By definition of Nδ,L , n∈Z − L

S (3) (n)

|m + m β | +

n∈Z +L

S (3) (n)

|m − m β | ≤ 3

QL

|m − Wα,L | < 3δL 2 .

(3) Since the cardinality of Z − L is ≥ L((θ1 −θ0 )/2 (for L large), there are two strips S (n ± ) such that 2+ 6+ δ 2 = |m ∓ m β | ≤ 3δL L L(θ1 − θ0 ) θ1 − θ0 S (3) (n ± ) −

which implies that η(ζ, ) (m, ·) = ±1 on at most Nδ :=

6+ δL θ1 − θ0

1 ζ (− )2

squares of D(− ) inside S (3) (n ± ). Thus there are at most 9Nδ squares in S(n + ) (resp. in S(n − )) where (ζ,− ,+ ) = 1 (resp. (ζ,− ,+ ) = −1). Proposition 7.3. There are δ, L ∗ and ∗ all positive so that if m satisfies (7.3) with L ≥ L ∗ and ∈ (0, ∗ ), then there are two vertical connections B∓ , one in B− = {(x, y) : x ∈ [−Lθ2 , −Lθ0 ]} and the other one in B+ = {(x, y) : x ∈ [Lθ0 , Lθ2 ]}, where (ζ,− ,+ ) (m, ·) is identically equal to −1 and respectively to +1.


Proof. The proof by contradiction is based on successive modifications of m into new functions so that if the vertical connections were absent then the final function would have both energy smaller than the initial one and larger than cβ L + L, which is the desired contradiction. We next outline the main steps postponing their proofs. By symmetry we may restrict to the case where the vertical connection is absent in B− and it may or may not be absent in B+ . 1. The absence of a vertical connection in B− implies that the set

r ∈ Q L : (ζ,− ,+ ) (m; r ) > −1 connects S(n − ) to {(x, y) ∈ Q L : x = −θ2 L}. From this it will follow that the number K 0 of D(+ ) -squares strictly to the left of S (3) (n − ), where (ζ,− ,+ ) (m; ·) = 0 is K 0 ≥ c0 (θ2 − θ1 )L, c0 a positive constant. 2. It is possible to modify m only in S (3) (n − ) in such a way that the new function m˜ verifies (ζ,− ,+ ) (m; ˜ r ) = −1, r ∈ S(n − ) and FL (m) ˜ ≤ FL (m) + c 2+ δL, c a positive constant. 3. By Theorem 5.6 applied to m˜ with the region strictly to the left of S(n − ) there exists m ∗ = m˜ on c such that η(ζ,− ) (m ∗ ; ·) = −1 on and FL (m ∗ ) ≤ FL (m) ˜ − c1 ζ 2 2− K 0 . 4. By Theorem 5.5 we can further modify m ∗ into a new function ψ− equal to m ∗ outside in such a way that η(ζ,− ) (ψ− ; r ) = −1, r ∈ , ψ− (x, y) = −m β , x < −Lθ2 − 1 and FL (ψ− ) ≤ FL (m ∗ ) + c e−ω(1/2−θ2 )L , c a positive constant. Conclusion of proof. Call ψ the function where the analogous modifications are made to the right of the origin, namely by repeating Steps 2-4 above (notice that a vertical connection in B+ may very well exist, in which case we do not have the lower bound for the corresponding K 0 as in Item 1). The “previous errors” occur therefore twice while the gain term occurs only once, in the worst case, then FL (m) ≥ FL (ψ) − 2c 2+ δL− − 2c e−ω(1/2−θ2 )L + c1 ζ 2 2− {c0 (θ2 − θ1 )L} . (7.8) Since ψ(x, y) = ±m β in x ≥ L/2 − 1 and respectively x ≤ −L/2 + 1, FL (ψ) = ˜ where ψ(x, ˜ y) = −m β for x ≤ −L/2 and = m β for x ≥ L/2. Then by FQ ∞,L (ψ), (6.12), FL (m) ≥ cβ L − 2c 2+ δL − 2c e−ω(1/2−θ2 )L + c1 ζ 2 2− {c0 (θ2 − θ1 )L} ,

(7.9)

which for δ small enough yields for all L ≥ L ∗ , L ∗ large enough, FL (m) ≥ cβ L + Choosing ∗ =

c1 2 2 ζ − {c0 (θ2 − θ1 )L} . 2

(7.10)

c1 2 2 ζ − {c0 (θ2 − θ1 )}, 2 FL (m) ≥ cβ L + ∗ L

which contradicts FL (m) < cβ L + L because < ∗ .

(7.11)


Remarks. The above argument is strictly two dimensional. Indeed the lower bound on K 0 grows like L in all dimensions (the “thin fingers effect”), while the error in Item 2 grows as cδL d−1 which in d > 2 wins against the “gain term”. A different argument developed by Bodineau and Ioffe, [5], applies in d > 2 and since the theory of Wulff shape can be partially extended to d = 3 the result extends to d = 3 as sketched in Sect. 11. While Items 3 and 4 are self explanatory, Items 1 and 2 do need a proof.: Proof of Item 1. We call ± or 0 a D(+ ) -square where (ζ,− ,+ ) (m, ·) is ±1 or 0, respectively and, given C ∈ D(+ ) , we set

there is x > x with (x , y) ∈ C , ∈ D(+ ) ∩ Q L : for any (x, y) ∈ C Sleft (C) := C there is y ∈ − L , L ∈ D(+ ) ∩ Q L : for any (x, y) ∈ C Svert (C) := C 2 2 with (x, y ) ∈ C . Denote by K the number of 0-squares to the left of S(n − ) (included). Item 1 then follows from the two alternatives below: Case (i). Assume that there exists a − square C0 ∈ D(+ ) ∩ S(n − ) such that the strip Sleft (C0 ) ∩ {(x, y) ∈ Q L : −θ2 L ≤ x ≤ n − + } contains only − squares. For each C in the strip we have that Svert (C ) contains at least one − square, because C ⊂ Svert (C ). On the other hand Svert (C ) cannot consist entirely of − squares by our assumption that there is no vertical connection. Since the sets (ζ,− ,+ ) = 1 and (ζ,− ,+ ) = −1 are (θ2 − θ1 )L . not connected, there must be at least one 0-square in Svert (C ). Thus K ≥ 2+ Case (ii). Any − square C0 ∈ D(+ ) ∩ S(n − ) is such that Sleft (C0 ) contains at least a L 0-square. In this case K ≥ − Nδ by definition of S(n − ). + Proof of Item 2. Call m˜ the function obtained from m by putting −m β on all squares connected to those in S (3) (n − ), where η(ζ,− ) (m, ·) is not identically −1. Then FL (m) ≥ FL (m) ˜ − c J 2+ Nδ , where c J > 0 is a constant depending only on J hence by Proposition 7.2 Item 2 is proved. As already remarked Items 3 and 4 are self explanatory and the proposition is therefore proved. Corollary 7.4. In the same context as in Proposition 7.3 assume in addition that (7.3) is verified with = L −2 . Then η(ζ,− ) (m, (x, y)) = ±1 for all x ≥ θ2 L − 1 and respectively x ≤ −θ2 L + 1. Proof. By Proposition 7.3 there are two vertical connections B∓ respectively to the right of x = −θ2 L and to the left of x = θ2 L, where (ζ,− ,+ ) (m, ·) = ∓1. Arguing again by contradiction and referring for definiteness to what happens to the left of B− , if (ζ,− ,+ ) (m, ·) = −1 somewhere on the left of B− , necessarily (ζ,− ,+ ) (m, ·) = 0


somewhere to the left of B− . Then by Theorem 5.6 there is ψ equal to m to the right of B− (included) with (ζ,− ,+ ) (ψ, ·) = −1 on the left of B− and such that FL (m) ≥ FL (ψ) + c1 ζ 2 2− . The same argument used in the proof of Proposition 7.3 shows then that FL (m) ≥ cβ L − c e−ω(1/2−θ2 )L + c1 ζ 2 2− which leads to a contradiction because L = L −1 < c1 ζ 2 (− )2 − c e−ω(1/2−θ2 )L for L large enough. Lemma 7.1 and Theorem 7.5 below conclude the proof of Theorem 3.1. Theorem 7.5. Assume m ∈ α , |ϑα | ≤ θ0 and that ¯ + , FL (m) ≤ L F (1) (m)

< L −600−[2+24(β−1)] .

(7.12)

Then there exists ξ with |ξ | ≤ θ0 L + 1 such that m¯ eξ − m 22 ≤ L −100 . Proof. To simplify notation we omit also in this proof the superscript “e” to denote extension to Q ∞,L . We distinguish two cases, Case 1 is when (7.13) below is satisfied and Case 2 when it is not (we will see that the second case contradicts the assumptions of the theorem and thus it will not occur). Let θ2 be as in Corollary 7.4. Case 1. There exists |ξ | ≤ θ1 L such that, m¯ ξ − m 2L 2 (Q

θ2 L ,L )

≤ L −300 .

(7.13)

We split the free energy as FL (m) = FQ θ2 L−1,L m Q θ2 L−1,L | m Q cθ

2 L−1,L

+ FQ cθ

2 L−1,L

m Q cθ

2 L−1,L

,

(7.14)

where for f, g ∈ L ∞ (Q L , (−1, 1)) and A ⊆ Q L , 1 J neum (r, r )[ f (r ) − f (r )]2 dr dr , FA ( f ) := φβ ( f ) dr + 4 A×A A 1 J neum (r, r )[ f (r ) − g(r )]2 dr dr . FA ( f |g) := FA ( f ) + 2 A×(Q L \A) Since |φβ (m) − φβ (m)| ¯ ≤ c|m − m|, ¯ there is a new constant c so that FQ θ2 L−1,L m Q θ2 L−1,L | m Q cθ L−1,L − FQ θ2 L−1,L m¯ ξ 1 Q θ2 L−1,L | m¯ ξ 1 Q cθ 2

2

≤ cL m¯ ξ − m L 2 (Q θ

2

L−1,L

. L ,L )

By (7.13) and the exponential convergence of m¯ ξ to its asymptotes, for L large enough, FQ θ2 L−1,L m Q θ2 L−1,L | m Q cθ L−1,L ≥ L F (1) (m) ¯ − L −148 . 2


By (7.14) and (7.12),
\[ L F^{(1)}(\bar m) + \varepsilon - L F^{(1)}(\bar m) + L^{-148} \ge F_{Q^c_{\theta_2 L-1,L}}\big(m_{Q^c_{\theta_2 L-1,L}}\big). \tag{7.15} \]

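Using again the exponential convergence of $\bar m_\xi$, there are positive constants $\bar c$ and $\bar c'$ so that
\[ \|m - \bar m_\xi\|_{L^2(Q^c_{\theta_2 L-1,L})} \le \|m - \mathrm{sign}(x)\, m_\beta\|_{L^2(Q^c_{\theta_2 L-1,L})} + \bar c\, e^{-\bar c' L}. \tag{7.16} \]
We postpone the proof that there is a constant $c > 0$ so that
\[ F_{Q^c_{\theta_2 L-1,L}}\big(m_{Q^c_{\theta_2 L-1,L}}\big) \ge c \int_{|x| \ge L\theta_2 - 1} |m - \mathrm{sign}(x)\, m_\beta|^2. \tag{7.17} \]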

By (7.13) and (7.15)–(7.17), $\|\bar m_\xi - m\|_2^2 \le L^{-300} + L^{-147}$ for $L$ large enough. Since by assumption m ∈ α with $|\vartheta_\alpha| \le \theta_0$, then, for $L$ large enough, $|\xi| \le \theta_0 L + 1$, so that the analysis of Case 1 will be complete once we prove (7.17), which we do next. By Corollary 7.4, for $L$ sufficiently large, $\eta^{(\zeta,\ell_-)}(m,r) = \mp 1$ when $x < -\theta_2 L + 1$ and $x > \theta_2 L - 1$ respectively. Using this we are going to prove that for any $r \in Q^c_{\theta_2 L-1,L}$ with $x > 0$,
\[ \phi_\beta(m(r)) + \frac12 \int_{Q^c_{\theta_2 L-1,L}} J^{\rm neum}(r,r')[m(r) - m(r')]^2\,dr' \ge c\,(m(r) - m_\beta)^2 \tag{7.18} \]
which yields (7.17). We consider only $x > 0$ as the case $x < 0$ is proved in exactly the same way. To prove (7.18), we first observe that $\phi_\beta(\pm m_\beta) = 0$, $\phi_\beta(m) > 0$ for $m \notin \{\pm m_\beta\}$ and $\phi_\beta(m)$ is strictly convex at $\pm m_\beta$. Therefore there exists a constant $c > 0$ such that $\phi_\beta(m) \ge c \min\{|m - m_\beta|^2, |m + m_\beta|^2\}$. Thus if $m(r) > 0$ the first term on the l.h.s. of (7.18) already yields the bound.
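The passage from (7.15)–(7.17) to the bound $L^{-147}$ in Case 1 can be spelled out as follows. This is only a sketch: $c$, $\bar c$, $\bar c'$ are the constants of (7.16)–(7.17), $\varepsilon$ denotes the energy excess of (7.12) (so that $\varepsilon \le L^{-148}$), and the elementary inequality $(a+b)^2 \le 2a^2 + 2b^2$ is used in the last line.

```latex
\begin{align*}
F_{Q^c_{\theta_2 L-1,L}}\big(m_{Q^c_{\theta_2 L-1,L}}\big)
  &\le \varepsilon + L^{-148} \le 2L^{-148}
  && \text{by (7.15) and (7.12)},\\
\int_{|x|\ge L\theta_2-1} |m-\mathrm{sign}(x)\,m_\beta|^2
  &\le \frac{2}{c}\,L^{-148}
  && \text{by (7.17)},\\
\|m-\bar m_\xi\|^2_{L^2(Q^c_{\theta_2 L-1,L})}
  &\le \frac{4}{c}\,L^{-148} + 2\bar c^{\,2} e^{-2\bar c' L} \le L^{-147}
  && \text{by (7.16), for $L$ large}.
\end{align*}
```

Adding the contribution $L^{-300}$ of (7.13) on the remaining part of the domain gives the stated bound on $\|\bar m_\xi - m\|_2^2$.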

If $m(r) \le 0$ we call $J^{(\ell_-)}(r,r') = \fint_{C^{(\ell_-)}_{r'}} J^{\rm neum}(r,r'')\,dr''$, $r' \in Q^c_{\theta_2 L-1,L}$, and, by the Lipschitz continuity of $J$,
\[ \int_{Q^c_{\theta_2 L-1,L}} J^{\rm neum}(r,r')\big[m(r) - m(r')\big]^2 \ge \int_{Q^c_{\theta_2 L-1,L}} J^{(\ell_-)}(r,r')\big[m(r) - m(r')\big]^2 - c\,\ell_-. \tag{7.19} \]
By Cauchy-Schwarz,
\[ \fint_{C^{(\ell_-)}_{r'}} \big[m(r) - m(r'')\big]^2 \ge \Big[m(r) - \fint_{C^{(\ell_-)}_{r'}} m(r'')\Big]^2 \ge \big[m_\beta - \zeta\big]^2, \]
which inserted in (7.19) gives
\[ \int_{Q^c_{\theta_2 L-1,L}} J^{\rm neum}(r,r')\big[m(r) - m(r')\big]^2 \ge \frac12 (m_\beta - \zeta)^2 - c\,\ell_- \]
because
\[ \int_{Q^c_{\theta_2 L-1,L}} J^{\rm neum}(r,r') > \frac12. \]
Supposing $\zeta$ small enough, recall $\ell_- \le \zeta^2/2$, the r.h.s. is $\ge m_\beta^2/4$ and (7.18) follows.

Case 2. The complementary case is when (7.13) does not hold; we will prove that such a case cannot actually happen. Indeed by Corollary 7.4 and Theorem 5.5 there is a function $\psi$ equal to $m$ on $|x| \le \theta_2 L$, such that $\psi = \pm m_\beta$ on $x > L/2 - 1$ and $x < -L/2 + 1$ and
\[ F_L(m) \ge F_L(\psi) - c\,e^{-\omega(1/2 - \theta_2)L}. \tag{7.20} \]

Calling $\phi$ the function on $Q_{\infty,L}$ equal to $\psi$ on $Q_L$ and to $\pm m_\beta$ on $x < -L/2$ and $x > L/2$, by Theorem 6.1 we have
\[ F_L(\psi) = F_{Q_{\infty,L}}(\phi) \ge c_\beta L + \inf_\xi\, c\,L^{-[2+24(\beta-1)]} \int_{|x| < \theta_2 L} |m - \bar m_\xi|^2 \ge c_\beta L + c\,L^{-[2+24(\beta-1)]-300}. \tag{7.21} \]
Equations (7.20)–(7.21) contradict (7.12) for $L$ large.

8. Proof of Theorems 3.2 and 3.3 The proof follows the one dimensional analysis in [6], see also [7, 8, 14], and uses some spectral properties proved in an appendix, Sects. 9 and 10. Recalling that g L (m) := −m + tanh{β J neum ∗ m}, the first order term in f in the expansion of g L (m¯ ξ,L + f ), |ξ | < L/2, gives ξ,L f = − f + pξ,L J neum ∗ f,

pξ,L = β cosh−2 {β J neum m¯ eξ,L }

(8.1)

(the zero order term is however not missing because m¯ ξ,L is not a critical point unless L = ∞). We will regard here ξ,L as an operator on L ∞ (Q L ), fix in the sequel r ∈ (0, 1) and restrict to ξ such that |ξ | ≤ r L/2. ξ,L is a shorthand for m¯ ξ,L which is among the operators m considered in Sect. 9. Due to the planar symmetry some of its spectral properties just follow from the d = 1 analysis and are valid for all L large enough, see Subsect. 9.1, here we just mention that the maximal eigenvalue is λξ,L with eigenvector a strictly positive, planar function eξ,L (·). In the notation of Subsect. 9.1 eigenvalue and eigenfunction are denoted by λm and em respectively, where m = m¯ ξ,L .


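8.1. Fibers. Following [6], we introduce fibers in the space $L^\infty(Q_L; [-1,1])$, defined as
\[ B_{\xi,L} := \big\{ m \in L^\infty(Q_L;[-1,1]) : m = \bar m_{\xi,L} + \phi,\ \pi_{\xi,L}(\phi) = 0 \big\}, \tag{8.2} \]
where
\[ \pi_{\xi,L}(\phi) = \frac{\langle e_{\xi,L}, \phi\rangle_{\xi,L}}{\langle e_{\xi,L}, e_{\xi,L}\rangle_{\xi,L}}, \qquad \langle f, g\rangle_{\xi,L} := \int_{Q_L} p_{\xi,L}^{-1}\, f g, \tag{8.3} \]
and call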

B ,ξ,L := m¯ ξ,L + φ ∈ Bξ,L : φ ∞ ≤ , B ,ξ,L := m ∈ B ,ξ,L : m(x, y) = m(x, 0) .

(8.4)

8.2. Spectral analysis. The crucial property of ξ,L is invertibility: the inverse −1 ξ,L of ξ,L exists and it is a bounded operator on the space {φ : πξ,L (φ) = 0}, namely on the points of the fiber Bξ,L parameterized by φ = m − m¯ ξ,L . More precisely there is a constant κ > 0 so that sup

φ:πξ,L (φ)=0, φ ∞ =1

2 −1 ξ,L φ ∞ ≤ κ L .

(8.5)

In d = 1 the bound on the r.h.s. is independent of L, the extension to d = 2 is proved in the appendix as a direct consequence of Theorem 9.4, see (9.26). Theorem 8.1. For any r < 1 and all L large enough, the only solution of g L (m) = 0 with m ∈ B L −3 ,ξ,L , |ξ | ≤ r L/2 is mˆ eL . Proof. The analogous property in d = 1 has been proved in a stronger form in [6], thus the theorem will follow once we show that any solution of g L (m) = 0 in {B L −3 ,ξ,L , |ξ | ≤ r L/2} is necessarily in {B L −3 ,ξ,L , |ξ | ≤ r L/2}. Following Sect. 4 of [6], we consider the auxiliary equation g L (m) − πξ,L g L (m) eξ,L = 0, m ∈ B L −3 ,ξ,L . (8.6) We will prove that any solution of (8.6) is in B L −3 ,ξ,L . The theorem will then follow because if g L (m) = 0 then m satisfies (8.6). For φ as above we define Rξ,L (φ) = g L (m¯ ξ,L + φ) − g L (m¯ ξ,L ) − ξ,L φ. By a Taylor expansion to second order, there is c so that Rξ,L (φ) ∞ ≤ c φ 2 ∞ , Rξ,L (φ1 )− Rξ,L (φ2 ) ∞ ≤ c{ φ1 ∞ + φ2 ∞ } φ1 −φ2 ∞ . (8.7) For L large enough, let Aξ,L be the following operator on Bξ,L : " # g L (m¯ ξ,L )−πξ,L g L (m¯ ξ,L ) eξ,L Aξ,L (φ) := −−1 ξ,L " # + Rξ,L (φ) − πξ,L (Rξ,L (φ))eξ,L .


If φ is a fixed point of Aξ,L (·) and φ ≤ L −3 , then m¯ ξ,L + φ solves (8.6). In [13] it has been proved that there is C so that, for α as in (6.3), g L (m¯ ξ,L ) ∞ ≤ Ce−α(L−2|ξ |) which implies that for L large enough ¯ g L (m¯ ξ,L ) − πξ,L g L (m¯ ξ,L ) eξ,L ∞ ≤ Ce−cL . From (8.5) and (8.7) it then follows that

$ % ¯ Aξ,L (φ) ∞ ≤ c0 L 2 Ce−cL + L −6 .

Thus, for all L large enough Aξ,L maps the set B L −3 ,ξ,L into itself. Moreover Aξ,L maps B L −3 ,ξ,L into itself. By (8.7) and (8.5) we have Aξ,L (φ1 ) − Aξ,L (φ2 ) ∞ ≤ δ φ1 − φ2 ∞ ,

δ < L −1 1 ,

so that Aξ,L is a contraction on B L −3 ,ξ,L and since B L −3 ,ξ,L is invariant, the unique fixed point φξ is in B L −3 ,ξ,L , namely it has planar symmetry. As already remarked, solutions of (8.6) are fixed points of Aξ,L . We have thus shown that solutions of (8.6) have planar symmetry, which, as argued before, proves the theorem. 8.3. Proof of Theorem 3.2. Let m, L and as in Theorem 3.2. Since the function t → α(t), α(t) = − Tt (m), t ≥ 0, is continuous and since |ϑα(0) | ≤ θ0 , either there is QL

a time t ∗ ≥ 0 when |ϑα(t ∗ ) | = θ0 or else any limit point m ∗ (in L ∞ ) of Tt (m) is in α with |ϑα | ≤ θ0 . Being a limit point, m ∗ is stationary and by lower semi-continuity, FL (m ∗ ) ≤ (1) FL (m) < L FL (mˆ L ) + . Since < 0 (L), by Theorem 3.1 there is ξ , |ξ | ≤ θ0 L + 1, so that m ∗ − m¯ ξ,L 2 < L −100 . Since m ∗ is stationary its derivative is bounded, hence there is a constant c so that 2/3 m ∗ − m¯ ξ,L ∞ ≤ c m ∗ − m¯ ξ,L 2 < c(L −100 )2/3 . (8.8) We omit the proof that if m − m¯ ξ,L ∞ < ζ , ζ small enough, then m is in a fiber Bξ ,L with |ξ − ξ | ≤ cζ , which is analogous to its d = 1 version proved in [6]. Using such a statement by (8.8) for L large enough m ∈ B L −3 ,ξ ,L , |ξ | ≤ r L/2, r < 1 and by Theorem 8.1 we then conclude that m ∗ = mˆ eL . Theorem 3.2 is proved. 8.4. Proof of Theorem 3.3. By symmetry we may restrict to m ∈ α with ϑα = −θ0 . By assumption m − m¯ ξ,L 2 < L −100 ; we are going to show that for L large enough, | − θ0 −

ξ | ≤ L −100 . L

(8.9)

4L −100 Indeed, m − m¯ ξ,L 1 ≤ 4 m − m¯ ξ,L 2 < 4L −100 , so that | − m − − m¯ ξ,L | ≤ L2 Q Q L L and (8.9) follows for L large enough because − m = α, ϑα = −θ0 and using (6.3). QL


Theorem 8.2. For any $\varepsilon$ and $r \in (0,1)$ there is $L(\varepsilon, r)$ so that for all $L > L(\varepsilon, r)$ the following holds. Let $m \in L^\infty(Q_L)$ be such that there is $\xi_0 \in (-\frac{L}{2}, -\frac{rL}{2})$ so that $\|m - \bar m_{\xi_0,L}\|_\infty \le \varepsilon$; then
\[ \lim_{t\to\infty} \|T_t(m) - m^+\|_\infty = 0. \tag{8.10} \]

Proof. By assumption
\[ m(x,y) \ge \bar m_{\xi_0}(x) - 2\varepsilon, \qquad \text{for all } (x,y) \in Q_L. \]
In Proposition 8.2, Theorem 8.3 and Proposition 8.4 of [4] it has been proved that for $L$ large (how large depending on $\varepsilon$ and $r$) $T_t(\bar m_{\xi_0} - 2\varepsilon)$ converges to $m^+$. Thus (8.10) follows from the comparison theorem.

By Lemmas 6.4 and 6.5,
\[ \|T_t(m) - \bar m_{\xi,L}\|_\infty \le 2e^{-t} + c\Big\{ e^{-t} + e^{2(\beta-1)t}\, \|m - \bar m_{\xi,L}\|_2^{2/3} \Big\}. \]
Choosing $t$ suitably large (independently of $L$) the r.h.s. becomes $< \varepsilon$ and by (8.9), $T_t(m)$ satisfies the assumption of Theorem 8.2 with $r < \theta_0$ and $L$ large enough. Then Theorem 3.3 follows from Theorem 8.2, noticing that convergence in $L^\infty(Q_L)$ implies convergence in $L^2(Q_L)$, because $Q_L$ is bounded.

Appendix

9. Spectral Estimates, sup Norms

The analysis in this appendix refers to functions on the square $Q_L$ and on the channel $Q_{\infty,L}$; in the latter case we will consider only one function, the instanton $\bar m^e$. For brevity we call planar a function or a kernel where the dependence on the point $r$ is only via its $x$ coordinate $x = r \cdot e_1$.

Definition. The set $M_L$ consists of the instanton $\bar m^e \in L^\infty(Q_{\infty,L}, (-1,1))$ and of the family of planar functions $m \in L^\infty(Q_L, [-1,1])$ which are in either one of the following two classes ($r$ below is a fixed number in $(0,1)$):
• $\bar m_{\xi,L}$, $|\xi| \le \frac{rL}{2}$
• $\|m - \hat m^e_L\|_\infty \le \varepsilon(L)$, $\varepsilon(L) > 0$ a small number which will be fixed later.

9.1. Maximal eigenvalue and eigenvector. We call $A_m$, $m \in M_L$, the operator on $L^\infty(Q_L)$, or on $L^\infty(Q_{\infty,L})$ if $m = \bar m^e$, whose kernel is
\[ A_m(r,r') = p_m(r)\, J^{\rm neum}(r,r'). \tag{9.1} \]
If $m = \bar m_{\xi,L}$, then $p_m = \beta \cosh^{-2}\{\beta J^{\rm neum} * m\}$, otherwise $p_m = \beta(1 - m^2)$. If $m = \bar m^e$ or $m = \hat m^e_L$ the two expressions coincide. The different choices are due to different applications, e.g. if we linearize around the flow $T_t(m)$ or $S_t(m)$. In [13] it is proved that given $r \in [0,1)$ there are $L_r$ and $\varepsilon(L)$ so that for all $L \ge L_r$, and any $m \in M_L$, there are $\lambda_m > 0$ and $e_m$ so that, with $s_m = p_m^{-1} e_m$,
\[ \int A_m(r,r')\, e_m(r')\,dr' = \lambda_m e_m(r), \qquad \int s_m(r)\, A_m(r,r')\,dr = \lambda_m s_m(r'). \tag{9.2} \]


$e_m$ is a strictly positive, smooth planar function in $L^\infty(Q_L)$ that we normalize so that
\[ \int s_m e_m = \int e_m^2\, p_m^{-1} = \langle e_m, e_m\rangle_m = 1. \tag{9.3} \]
$\lambda_m$ is an eigenvalue of $A_m$ with strictly positive right and left eigenvectors, $e_m$ and $s_m$, in agreement with the Perron-Frobenius theorem which is indeed behind the proof of the above statements. The function $e^{(1)}_m(x)$ on $[-L/2, L/2]$ (or $\mathbb R$ for the instanton), defined by $x \to e_m(r)$, $r \cdot e_1 = x$, is the eigenvector for the $d = 1$ problem with interaction $j$ as in (2.2); however $\int e^{(1)}_m(x)^2\,dx = L^{-1}$ due to (9.3). In the case $m = \bar m^e$, $\lambda_m = 1$ and $e_m(r) = c\,\bar m'(r \cdot e_1)/\sqrt L$, $c$ a normalization independent of $L$. The above statements are verified in a large class of functions $m$; those which follow are instead more restrictive. All bounds below are uniform in $M_L$ but we keep reference to the specific $m \in M_L$ for future applications.

• There are $c_\pm > 0$ and $\alpha_m > 0$ so that
\[ 1 - c_+ e^{-2\alpha_m L} \le \lambda_m \le 1 + c_+ e^{-2\alpha_m L}. \tag{9.4} \]

• For each $m \in M_L$ define $x_m$ as $x_m = 0$ if $m = \bar m^e$, $x_m = \xi$ if $m = \bar m_{\xi,L}$ and $x_m = 0$ for the remaining $m$. Then there are $s > 0$ and $\delta < 1$ so that
\[ p_m(r) \le \delta, \qquad |r \cdot e_1 - x_m| \ge s, \tag{9.5} \]
and there are $\alpha_m > 0$, $\alpha'_m > 0$ and $c$ so that
\[ e_m(r) \le \frac{c}{\sqrt L}\, e^{-\alpha_m |r\cdot e_1 - x_m|}, \qquad e_m(r)^{-1} \le c \sqrt L\, e^{\alpha'_m |r\cdot e_1 - x_m|}. \tag{9.6} \]

• We will also use that there is a constant $c$ so that
\[ \|p_m^{-1}\|_\infty \le c. \tag{9.7} \]

• As mentioned, all the previous bounds are uniform in $M_L$, by suitably resetting the coefficients.

9.2. Reduction to Markov chains. Let $K_m$ be the Markov operator whose transition probability kernel is
\[ K_m(r,r') = \frac{A_m(r,r')\, e_m(r')}{\lambda_m\, e_m(r)}. \tag{9.8} \]
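A finite-dimensional analogue of the transformation (9.8) may help fix ideas. The sketch below is only illustrative: the $3\times 3$ positive matrix `A` is a hypothetical stand-in for the kernel $A_m$, and the Perron pair $(\lambda, e)$ is obtained by plain power iteration rather than by the arguments of [13]. It checks that the transformed kernel has unit row sums, i.e. that it is a transition probability.

```python
def perron_pair(A, iters=1000):
    """Power iteration for the top eigenvalue/eigenvector of a positive matrix."""
    n = len(A)
    e = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        Ae = [sum(A[i][j] * e[j] for j in range(n)) for i in range(n)]
        lam = max(Ae)                # e is kept with largest entry 1, so max(Ae) -> lambda
        e = [v / lam for v in Ae]
    return lam, e


def markov_kernel(A):
    """Discrete analogue of (9.8): K[i][j] = A[i][j] * e[j] / (lam * e[i])."""
    lam, e = perron_pair(A)
    n = len(A)
    return [[A[i][j] * e[j] / (lam * e[i]) for j in range(n)] for i in range(n)]


# Hypothetical positive "kernel"; any matrix with strictly positive entries works.
A = [[2.0, 1.0, 1.0],
     [1.0, 3.0, 1.0],
     [1.0, 1.0, 4.0]]
K = markov_kernel(A)
row_sums = [sum(row) for row in K]
```

Each row sum equals $(Ae)_i/(\lambda e_i)$, which is $1$ exactly when $e$ is the right eigenvector with eigenvalue $\lambda$; this is the only property of $e_m$ used to make $K_m$ a Markov kernel.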

Since Anm (r, r ) = em (r )λnm K mn (r, r )em (r )−1 we will derive bounds on Anm and consequently on the spectrum of Am and of m := Am − 1 from properties of K mn . The important point of the transformation (9.8) is that K m is a Perron-Frobenius Markov kernel to which the high temperature Dobrushin techniques apply. Calling x = r · e1 and y = r · e2 we can write K m (r, r ) as K m (r, r ) = Pm (x, x )qx,x (y, y ),

(9.9)


where, relative to the measure $K_m(r,r')\,dr'$, $P_m(x,x')$ is the marginal distribution of $x'$ and $q_{x,x'}(y,y')$ is the conditional distribution of $y'$ given $x'$ (to simplify notation we sometimes drop the suffix $m$). The explicit expression of $P_m(x,x')$ is
\[ P_m(x,x') = \frac{p_m(x)\, j^{\rm neum}(x,x')\, e_m(x')}{\lambda_m\, e_m(x)}, \tag{9.10} \]
where $j(x,x')$ is defined in (2.2) and $e_m(x) \equiv e_m(x,y)$ (recall that $e_m(r)$ is planar). Equation (9.10) is (9.8) in the $d = 1$ case with interaction $j(x,x')$. Notice that due to the planar symmetry assumption the marginal $P_m(x,x')$ does not depend on the $y$ coordinate of $r$. In the sequel we will consider the probability density
\[ q_{x,x'}(z) = \frac{J((x,0),(x',z))}{j(x,x')} \tag{9.11} \]

on $\mathbb R$, noticing that the variable $y' := y + z$ modulo reflections at $\pm nL/2$ has the law $q_{x,x'}(y,y')$; sometimes, by an abuse of notation, we will write $q_{x,x'}(z)$ for $q_{x,x'}(y,y')$.

9.3. One dimensional results. To study the dependence on the initial point $r$ of the Markov chain with transition probability kernel $K_m(r,r')$ we will use couplings (for brevity we may shorthand $x = r \cdot e_1$ and $y = r \cdot e_2$). We first recall some one dimensional results proved in [13]. Call
\[ W^{(1)}[x,x'] = \big[w_m(x) + w_m(x')\big]\, 1_{\{x \ne x'\}}, \qquad w_m(x) := e_m(x)^{-1}; \tag{9.12} \]
$e_m$ and $m \in M_L$ below are regarded as functions of $x$.

Theorem 9.1. There are $c$ and $\omega^{(1)}$ positive and for any $(x_0, x_0') \in [-L/2, L/2]^2$ (or $\mathbb R^2$) a process on $([-L/2, L/2]^2)^{\mathbb N}$ (or $(\mathbb R^2)^{\mathbb N}$) whose expectation is denoted by $E^{(1)}_{x_0,x_0'}$ so that its marginal distributions are the Markov chains with transition probability (9.10) and, for any $L$ large enough and $n \ge 1$,
\[ E^{(1)}_{x_0,x_0'}\, W^{(1)}[x(n), x'(n)] \le c\, W^{(1)}[x(0), x'(0)]\, e^{-\omega^{(1)} n}. \tag{9.13} \]
Moreover if for some $n$, $x(n) = x'(n)$ then $x(n+k) = x'(n+k)$ for all $k \ge 0$.

9.4. Couplings and Wasserstein distance. For any $(r_0, r_0') \in Q_L \times Q_L$ (or $Q_{\infty,L} \times Q_{\infty,L}$ if $m = \bar m^e$) we define a process $\{r(n), r'(n), n \in \mathbb N\}$, $r(0) = r_0$, $r'(0) = r_0'$, with values on $Q_L \times Q_L$ (or $Q_{\infty,L} \times Q_{\infty,L}$) as follows: The marginal distribution of $\{x(n), x'(n), n \in \mathbb N\}$ is set equal to the law $P^{(1)}_{x_0,x_0'}$ of the process defined in Theorem 9.1. To complete the definition we must give the law of $\{y(n), y'(n), n \in \mathbb N\}$ conditioned on the trajectory
\[ (x, x') = \{(x(n), x'(n)),\ n \in \mathbb N\}, \]
which we consider in the sequel as fixed. Define then $n_0, n_1 \in \mathbb N \cup \{+\infty\}$ as
\[ n_0 := \inf\big\{n \in \mathbb N : x(n) = x'(n)\big\}, \qquad n_1 := \inf\big\{n \in \mathbb N : n \ge n_0 \text{ and } |y(n) - y'(n)| \le 1\big\}, \]


where the infimum over the empty set is defined as +∞. This means that n 0 is the first time when the x-coordinates couple, and n 1 is the first time at which the y-coordinates get close after the x-coordinates have coupled. For n ≤ n 1 , y(n) and y (n) are independent of each other and distributed with the law of the Markov chain with transition probability (9.11) which starts respectively from y0 and y0 . If n 1 < ∞ the conditional law of {y(n), y (n), n ∈ [n 1 , n 1 + k0 ]}, k0 as in Lemma 9.2 below, given y(n 1 ), y (n 1 ) is , the probability in Lemma 9.2 below. If y(n 1 + k0 ) = y (n 1 + k0 ), y (n) = y(n) for n ≥ n 1 + k0 with y(n) having the law of the Markov chain with transition probability (9.11). If instead y(n 1 + k0 ) = y (n 1 + k0 ) we repeat the previous procedure with n 0 replaced by n 1 + k0 and so forth. Lemma 9.2. There are π0 and k0 positive and for any (y0 , y0 , X ), |y0 − y0 | ≤ 1, X = (x0 , .., xk0 ), a probability = (y0 ,y0 ,X ) on [−L/2, L/2]k0 +1 × [−L/2, L/2]k0 +1 such that the marginal distributions of y(·) and y (·) are the Markov chains with tran(1) sition probability (9.11) starting from y0 and y0 and (Ex ,x below as in Theorem 9.1), 0 0 (1) Ex ,x (y0 ,y0 ,X ) {y(k0 ) = y (k0 )} ≥ π0 . (9.14) 0

0

The lemma follows easily from the smoothness properties of the transition kernel, its proof is just as in its one dimensional version in [13] and it is omitted. We call Pr0 ,r0 the joint law of {r (n), r (n), n ∈ N} as defined above and denote by Er0 ,r0 expectation w.r.t. Pr0 ,r0 . Pr0 ,r0 is a coupling of the Markov chains starting from r0 and r0 and with transition probability K m . Indeed, for any f ∈ L ∞ (Q L ) or f ∈ L ∞ (Q ∞,L ), and any n ≥ 1, K mn (r0 , r ) f (r ), Er0 ,r0 f (r (n)) = K mn (r0 , r ) f (r ). (9.15) Er0 ,r0 f (r (n) = QL

QL

W [r, r ],

on Q L × Q L or on Q ∞,L × Q ∞,L as Recalling (9.12) we define a distance " # W [r, r ] = wm (r )+wm (r ) 1{r =r } = W (1) [x, x ], wm (r ) := em (r )−1 (9.16) (x = r · e1 above) and call

\[ R_{n,r_0,r_0'} = E_{r_0,r_0'}\big( W[r(n), r'(n)] \big). \tag{9.17} \]
$R_{n,r_0,r_0'}$ is an upper bound for the Wasserstein distance between $K_m^n(r_0, \cdot)$ and $K_m^n(r_0', \cdot)$ relative to the distance (9.16).

Theorem 9.3. There are positive constants $L^*$, $c$ and $\omega$ so that for any of the above chains and any $L > L^*$, $n \ge 1$:
\[ R_{n,r_0,r_0'} \le c\, e^{-(\omega/L^2) n}\, W[r_0, r_0']. \tag{9.18} \]

The proof of Theorem 9.3, which is postponed, uses Theorem 9.1 to reduce to the case when x(·) = x (·). Then the y coordinates (regarded on the whole axis and then reduced to [−L/2, L/2] by reflections) perform independent random walks with increments having the same law (which depends on x) till when they get to distance ≤ 1. By Lemma 9.2 after a time they couple k0 with probability π0 > 0 and the proof of Theorem 9.3 will then be concluded with an estimate of the probability of the time when two independent walks get closer than 1. We will see that such a probability is positive independent of L and of the starting points provided the time is proportional to L 2 (recall that the y coordinates are defined modulo reflections at ±n L/2).


9.5. L ∞ bounds. The Markov chain K m has an invariant probability measure μ(r )dr (recall the normalization of em and sm in Subsect. 9.1) 2 −1 μ(r ) := sm (r )em (r ) = em (r ) pm (r ) , μ(r )K m (r, r )dr = μ(r ). (9.19) Let ψ ∈ L ∞ (Q L ) and u = ψwm . By the invariance of μ, K mn (r0 , r )[u(r ) − μ(u)]dr = μ(r0 )Er0 ,r0 u(r (n)) − u(r (n)) dr0 . (9.20) u(r ˜ ) u(r ˜ ) wm (r ) − wm (r ), u˜ = u − μ(u) where, by an wm (r ) wm (r ) abuse of notation, μ(u) = μ(r )u(r )dr . Thus

We write u(r ) − u(r ) =

u(r ) − u(r ) ≤ u˜ ∞ W [r, r ]. wm

(9.21)

Hence by (9.18) " # K n (r0 , r ) u(r )−μ(u) ≤ u˜ ∞ ce−(ω/L 2 )n wm (r0 )+C . (9.22) m wm −1 The term with C is obtained by writing wm (r )μ(r ) = em pm which, by (9.6) and (9.7), is bounded. Moreover, recalling that u = ψwm , u(r ˜ ) ˜ ), ψ˜ := ψ − em ψ, em m . = ψ(r wm (r )

(9.23)

By (9.22) and (9.8), %n

$ An (r0 , r )ψ(r ˜ ˜ ∞ c λm e−(ω/L 2 ) wm (r0 )+C ) ≤ em (r0 ) ψ m

(9.24)

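which using (9.6) proves:

Theorem 9.4. There are positive constants $L^*$, $c$ and $\omega$ so that for any of the above chains, any $L > L^*$, $n \ge 1$ and any $\psi$ such that $\langle \psi, e_m\rangle_m = 0$,
\[ \|A_m^n \psi\|_\infty \le c'\, \big[\lambda_m e^{-(\omega/L^2)}\big]^n\, \|\psi\|_\infty, \tag{9.25} \]
where $c' = c\big[1 + C\|e_m\|_\infty\big]$, and for any $t > 0$ (recalling that the operator under study is $A_m - 1$)
\[ \|e^{(A_m - 1)t}\psi\|_\infty \le e^{-t} c'\, \|\psi\|_\infty \sum_{n=0}^\infty \frac{(\lambda_m e^{-(\omega/L^2)}\, t)^n}{n!} \le c'\, e^{-(\omega/2L^2)t}\, \|\psi\|_\infty. \tag{9.26} \]
The last bound follows for $L$ large enough by bounding $-1 + \lambda_m e^{-x} < |\lambda_m - 1| + e^{-x} - 1$, $e^{-x} - 1 \le -3x/4$ ($x > 0$ small enough), $|\lambda_m - 1| \le \omega/(4L^2)$, by (9.4) for $L$ large enough.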
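For convenience, the arithmetic behind the last inequality in (9.26) can be written out; this is a sketch with the shorthand $x := \omega/L^2$, which is assumed small.

```latex
\begin{align*}
e^{-t}\sum_{n\ge 0}\frac{(\lambda_m e^{-x} t)^n}{n!}
  &= \exp\big\{ t\,(-1+\lambda_m e^{-x}) \big\}, \qquad x := \omega/L^2,\\
-1+\lambda_m e^{-x}
  &= (\lambda_m-1)\,e^{-x} + (e^{-x}-1)
   \le |\lambda_m-1| + (e^{-x}-1)\\
  &\le \frac{x}{4} - \frac{3x}{4} = -\frac{x}{2} = -\frac{\omega}{2L^2}.
\end{align*}
```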


9.6. A preliminary lemma. In the proof of Theorem 9.3 and in Sect. 10 as well we will use Lemma 9.5 below. With reference to (9.6), define for $m \in M_L$,
\[ w_{m;a}(r) = w_m(r)\, e^{a|r\cdot e_1 - x_m|}, \qquad a \ge 0, \tag{9.27} \]
\[ k_m(n,r) := \min\big\{ n,\ (|r\cdot e_1 - x_m| - (s+1))\, 1_{|r\cdot e_1 - x_m| - (s+1) > 0} \big\}, \tag{9.28} \]
where $s$ is as in (9.5).

Lemma 9.5. Let $\delta$ be as in (9.5). Then there exist positive constants $L^*$, $a_0$, $c$ and $\delta_1 \in (\delta, 1)$ such that for any $0 < a < a_0$ and $L > L^*$ the following holds. If $m \in M_L$ then for any $n \ge 1$,
\[ \int K_m^n(r,r')\, w_{m;a}(r')\,dr' \le c\, \delta_1^{k_m(n,r)}\, w_{m;a}(r). \tag{9.29} \]
All the above coefficients can be taken uniformly in $m \in M_L$.

Proof. Call $x = r \cdot e_1$ and $P_s(r,r') = K_m(r,r')$ if $|x - x_m| \ge s$ and $= 0$ otherwise; let $E_r$ be the expectation of the Markov process with transition probability $K_m$ starting from $r$ so that
\[ \int K_m^n(r,r')\, w_{m;a}(r')\,dr' = E_r\big( w_{m;a}(r(n)) \big). \]
We decompose the expectation on the r.h.s. by using the sets
\[ A_0 = \{r(\cdot) : |x(0) - x_m| \le s\}, \]
\[ A_k := \{r(\cdot) : |x(t) - x_m| > s,\ t = 0, .., k-1;\ |x(k) - x_m| \le s\}, \quad k \ge 1, \]
\[ B_h := \{r(\cdot) : |x(t) - x_m| > s,\ t = h, .., n;\ |x(h-1) - x_m| \le s\}, \quad h \ge 1, \]
\[ C_n = \{r(\cdot) : |x(t) - x_m| > s,\ t = 0, .., n\}, \qquad D_n = \{r(\cdot) : |x(n) - x_m| \le s\}. \]
Then,

K mn (r, r )wm;a (r )dr =

Psn (r, r )wm;a (r )dr + Psk (r, r0 )1|x0−xm |≤s Er0 n≥h>k

where φl (r ) := that

× 1|r (h−k−1)·e1 −xm |≤s φn−h (r (h −k)) dr0 ,(9.30)

Psl (r, r )wm;a (r )dr for l ∈ N. By (9.8), (9.5) and (9.7) there is c so

$ %l a e δ wm;a (r ) Psl (r, r )wm;a (r )dr ≤ c λ−1 m

(9.31)

a because |x − x| ≤ l. By (9.4) for L large and a small enough λ−1 m e δ =: δ1 < 1. Note that only for k ≥ km (r, n) the corresponding terms in (9.30) are nonzero, hence δ1k+n−h wm;a (r ), K mn (r, r )wm;a (r )dr ≤ c n≥h>km (r,n)

and (9.29) then follows. By the last item in Subsect. 9.1 all coefficients in the above bounds can be chosen uniformly in m ∈ M L so that the proof of the lemma is complete.


9.7. Proof of Theorem 9.3. Given n call n 0 the integer part of n/2 and shorthand ξn = (r (n), r (n)). Then . (9.32) Eξ0 W [ξn ] = Eξ0 Eξn0 W [ξn ] 1xn0 =xn + 1xn0 =xn 0

0

When xn 0 = xn 0 we bound W [ξn ] ≤ wm x(n) + wm x (n) , namely we drop the characteristic function that r (n) = r (n) so that the expectations (·) and r (·) relative to r(1) uncouple. Then by Lemma 9.5 with a = 0, Eξn0 W [ξn ] 1xn0 =xn ≤ cW [xn 0 , xn 0 ], 0 hence by Theorem 9.1, (1) (9.33) Eξ0 Eξn0 W [ξn ] 1xn0 =xn ≤ c W [ξ0 ]e−ω n 0 .

0

To bound Eξn0 W [ξn ] with xn 0 = xn 0 we recall from Theorem 9.1 and the definition of Pξ0 that x(i) = x (i) for all i ≥ n 0 , so that W [ξn ] = 2wm (x(n))1 yn0 = yn . 0

(9.34)

We distinguish two cases: First case, |x0 − xm | > n. We bound W [ξn ] ≤ 2wm (x(n)) and get (9.35) Eξ0 Eξn0 W [ξn ] 1xn0 =xn ≤ 2Er0 wm (r (n)) ≤ cδ1n wm (r0 ) 0

having used Lemma 9.5 with a = 0 and with c above a suitable constant. Second case, |x0 −xm | ≤ n. To decouple x from (y, y ) we use Hölder. Let p −1 +q −1 = 1 then, supposing (for instance) wm (x0 ) ≤ wm (x0 ), 1/ p Eξ0 Eξn0 W [ξn ] 1xn0 =xn ≤ 2Er0 wm (r (n)) p 0 1/q ×Pξ0 x(n 0 ) = x (n 0 ); y(n) = y (n) . (9.36) We use the second inequality in (9.6) to write % p−1 $ √ √ −1 wm (r ) p ≤ (c L)eαm |x−xm | em (r ) = (c L) p−1 wm;a (r ), a = αm ( p − 1). (9.37) Taking p−1 > 0 small enough we can apply Lemma 9.5 and recalling that |x0 −xm | ≤ n we get √ √ |x −x | Er0 wm (x(n)) p ≤ c ( L) p−1 wm;a (r0 )δ1 0 m ≤ c ( L) p−1 wm (r0 ). (9.38) The last inequality is valid for p − 1 > 0 small enough. Then

\[ E_{r_0}\big( w_m(x(n))^p \big)^{1/p} \le C (\sqrt L)^{1-1/p}\, w_m(r_0)\, e_m(r_0)^{1-1/p} \le C'\, w_m(r_0), \tag{9.39} \]
having used the first inequality in (9.6).

Conclusions. In the first case, $|x_0 - x_m| > n$, the bound (9.35) concludes the proof, while in the second case we need to prove that $P_{\xi_0}\big( x(n_0) = x'(n_0);\ y(n) \ne y'(n) \big)$ is exponentially small, which is done in the next subsection.


9.8. Coupling the y coordinates. In this subsection we suppose r0 = (x0 , y0 ) and r0 = (x0 , y0 ), namely that the initial x coordinates are the same. This is indeed what happens at time n 0 in the case we have to study and, to simplify notation, we have just reset time n 0 equal to 0. We will prove that at a time cL 2 the y coordinates are the same with probability not smaller than a number π > 0, uniformly in ξ0 and L. By iteration this will prove that (shorthanding ξ0 = (r0 , r0 )) 2 (9.40) Pξ0 {y(n) = y (n)} ≤ (1 − π )e−n/(cL ) which inserted in (9.36) will conclude the proof of (9.18). Let τ = inf n ∈ N : |y(n) − y (n)| ≤ 1 .

(9.41)

We will prove that

Proposition 9.6. There are $k_1 > 0$ and $\pi_1 > 0$ so that
\[ \inf_{x_0, y_0, y_0'} P_{\xi_0}\{\tau \le k_1 L^2\} \ge \pi_1. \tag{9.42} \]

Proposition 9.6 and Lemma 9.2 prove (9.40) with $\pi = \pi_0 \pi_1$ and $cL^2 > k_0 + k_1 L^2$. In the sequel we will prove Proposition 9.6. Since $y(n)$ and $y'(n)$ are independent of each other till $\tau$, we may as well, and will in the sequel, consider $P_{\xi_0}$ defined so that $y(n)$ and $y'(n)$ are independent of each other at all times. Shorthand
\[ Z_n = [y'(n) - y'(0)] - [y(n) - y(0)], \]
and call
\[ \sigma := \inf_x \int P(x,x')\, q_{x,x'}(z)\, z^2\,dx'\,dz > 0. \]
Positivity follows because there is $c$ so that $\dfrac{e_m(r)}{e_m(r')} \le c$ for any $|(r - r') \cdot e_1| \le 2$, see [13].

Lemma 9.7. There is $c$ so that for any $n \ge 1$ and for any $\xi_0$ with $x_0 = x_0'$,
\[ E_{\xi_0} Z_n = 0, \qquad E_{\xi_0} Z_n^2 \ge 2\sigma n, \qquad E_{\xi_0} Z_n^4 \le c\, n^2. \tag{9.43} \]

Proof. We write z n = Z n − Z n−1 , n ≥ 1, so that Z n = z 1 + · · · + z n . For any k, n with k < n and any measurable function f on R, using that J (0, r ) depends on |r | and qx,x (z) = qx,x (−z), Eξ0 f (z k )z n = Eξ0 f (z k ) (u − u)qxn−1 ,x (u)qxn−1 ,x (u )P(xn−1 , x)dudu d x , =0 hence the first equality in (9.43) after setting f = 1. Analogously, recalling also the definition of σ , 2 Eξ0 z n = Er0 ,r0 (u − u)2 qxn−1 ,x (u)qxn−1 ,x (u )P(xn−1 , x)dudu d x 2 (u + u 2 )qxn−1 ,x (u)qxn−1 ,x (u )P(xn−1 , x)dudu d x ≥ 2σ, = Eξ0


hence the lower bound in (9.43). The upper bound in (9.43) is derived by noticing that by symmetry Eξ0 Z n4 = Eξ0 z 4j + 12 z i2 z 2j ≤ cn 2 . j≤n

i< j≤n

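Proof of Proposition 9.6. We have
\[ \{\tau \le n\} \supseteq \{|Z_n| > L,\ \mathrm{sign}(Z_n) = \mathrm{sign}(y'(0) - y(0))\} \]
because $y'(k) - y(k)$ jumps at most by 2. By symmetry,
\[ P_{\xi_0}\big( |Z_n| > L,\ \mathrm{sign}(Z_n) = \mathrm{sign}(y_0' - y_0) \big) = \frac12\, P_{\xi_0}\big( |Z_n| > L \big) \]
so that
\[ P_{\xi_0}\big( \tau \le n \big) \ge \frac12\, P_{\xi_0}\big( |Z_n| > L \big). \]
We have
\[ E_{\xi_0}(Z_n^2) = E_{\xi_0}(Z_n^2\, 1_{|Z_n| \le L}) + E_{\xi_0}(Z_n^2\, 1_{|Z_n| > L}) \le L^2 + E_{\xi_0}(Z_n^4)^{1/2}\, P_{\xi_0}(|Z_n| > L)^{1/2}. \]
Moreover, using (9.43) and the choice of $\sigma n$, we obtain that for $n > L^2 \sigma^{-1}$,
\[ P_{\xi_0}(|Z_n| > L)^{1/2} \ge \frac{2\sigma n - L^2}{(c n^2)^{1/2}} \ge \frac{\sigma}{\sqrt c}, \]
hence (9.42).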
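The content of Proposition 9.6 can be illustrated by a small simulation. The sketch below is not the paper's chain: it assumes, purely for illustration, that the $y$ increments are uniform on $[-1,1]$ (symmetric, of range 1), it ignores the $x$ coordinates, and the names `reflect`, `hit_by` and `estimate` as well as the value `k1 = 2.0` are hypothetical. It estimates the probability that two independent walks reflected at $\pm L/2$ come within distance 1 before time $k_1 L^2$, starting from opposite walls.

```python
import random


def reflect(y, L):
    """Fold a point of the real line into [-L/2, L/2] by reflections at the walls."""
    u = (y + L / 2) % (2 * L)        # position within one period of the unfolded line
    if u > L:
        u = 2 * L - u
    return u - L / 2


def hit_by(L, horizon, rng):
    """True if two independent reflected walks get within distance 1 by `horizon`."""
    y, yp = -L / 2, L / 2            # start at opposite walls
    for _ in range(horizon):
        if abs(y - yp) <= 1:
            return True
        # Assumed increment law: uniform on [-1, 1].
        y = reflect(y + rng.uniform(-1, 1), L)
        yp = reflect(yp + rng.uniform(-1, 1), L)
    return abs(y - yp) <= 1


def estimate(L, k1=2.0, trials=200, seed=0):
    """Monte Carlo estimate of P(tau <= k1 * L^2) for the toy walks above."""
    rng = random.Random(seed)
    horizon = int(k1 * L * L)
    return sum(hit_by(L, horizon, rng) for _ in range(trials)) / trials


p_small, p_large = estimate(10), estimate(20)
```

Doubling $L$ while keeping the horizon proportional to $L^2$ should leave the estimate bounded away from zero, which is the diffusive scaling behind the factor $L^2$ in (9.18).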

10. Spectral Gap We regard here m = Am − 1, m ∈ M L , as an operator on the weighted L 2 -spaces −1 dr ) or on L 2 (Q −1 L 2 (Q L , pm ¯ e and denote by ·, ·m the scalar ∞,L , p dr ) if m = m product. On such spaces m is self-adjoint, it has eigenvalue λm − 1 with eigenvector the planar function em . We will prove here that: Theorem 10.1. There is a > 0 so that for all L large enough, f, m f m a ≤ − 2. f, f L m f : f,em m =0 sup

(10.1)

A crucial point in the proof of Theorem 10.1, which is given in the remainder of this section, is that the operator $A_m - 1$ is self-adjoint. The mere existence of a spectral gap then follows from Weyl's theorem by the same argument used in [16] for the $d = 1$ case. The argument is however abstract and does not allow one to determine the dependence on $L$ of the spectral gap. Notice on the other hand that for the Allen-Cahn equation $m_t = \Delta m - V'(m)$ the question trivializes because the linearized operator is a sum of two commuting operators, $\big\{\frac{d^2}{dx^2} - V''(\bar m(x))\big\} + \frac{d^2}{dy^2}$, so that it is the non-local nature of the interaction which is behind all difficulties we find here.


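Notation. To simplify notation we fix $m \in M_L$ and shorthand $\langle\cdot,\cdot\rangle$ for $\langle\cdot,\cdot\rangle_m$. We call $M$ the self-adjoint operator equal to $A_m$ on $\{f : \langle f, e_m\rangle = 0\}$, while $M e_m = 0$. We denote by $\|M\|$ its norm:
\[ \|M\| = \sup_{f \ne 0} \frac{|\langle f, M f\rangle|}{\langle f, f\rangle} = \sup_{f : \langle f, e_m\rangle = 0} \frac{|\langle f, A_m f\rangle|}{\langle f, f\rangle}. \tag{10.2} \]

Lemma 10.2. If there is $a > 0$ so that for all $L$ large enough,
\[ \log \|M\| \le -\frac{2a}{L^2} \tag{10.3} \]
then (10.1) holds.

Proof. If (10.3) holds, then
\[ \sup_{f : \langle f, e_m\rangle = 0} \frac{\langle f, (A_m - 1) f\rangle}{\langle f, f\rangle} \le -1 + \|M\| \le -1 + e^{-2a/L^2} \le -\frac{a}{L^2} \]
for $L$ large enough.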
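The last inequality in the proof of Lemma 10.2 is elementary; a sketch of the estimate, with the shorthand $y := 2a/L^2$ and assuming $L$ so large that $y \le 1$, reads as follows.

```latex
\begin{align*}
e^{-y} &\le 1 - y + \frac{y^2}{2} \le 1 - \frac{y}{2}
  && \text{for } 0 \le y \le 1,\\
-1 + e^{-2a/L^2} &\le -\frac{1}{2}\cdot\frac{2a}{L^2} = -\frac{a}{L^2}
  && \text{for } L^2 \ge 2a.
\end{align*}
```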

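To bound $\log \|M\|$ we use the spectral theorem:

Proposition 10.3.
\[ \log \|M\| = \sup_{f \ne 0,\ \|f\|_\infty < \infty}\ \liminf_{n\to\infty}\ \frac{1}{2n} \log \frac{\langle f, M^{2n} f\rangle}{\langle f, f\rangle}. \tag{10.4} \]
Equality holds with limsup as well.

Proof. Equation (10.4) is a direct consequence of the spectral theorem for self-adjoint operators, as we are going to see. Let $\langle f, f\rangle = 1$ and $n$ be even. Since $\langle f, M^n f\rangle \le \|M\|^n$,
\[ \limsup_{n\to\infty} \frac{1}{2n} \log \langle f, M^{2n} f\rangle \le \log \|M\|. \tag{10.5} \]
For the reverse inequality we use the spectral theorem to say that for any $0 \le \lambda < \|M\|$ there is a non-zero orthogonal projection $P_\lambda$ which commutes with $M$ and such that for any $n \ge 1$, $M^{2n} P_\lambda \ge \lambda^{2n} P_\lambda$. Since $L^\infty$ is dense in $L^2$, given any $0 \le \lambda < \|M\|$ there are $f$ and $R$ such that $\|f\|_\infty < R$ and $P_\lambda f \ne 0$. Then writing $f$ in $\langle f, M^{2n} f\rangle$ as $f = P_\lambda f + (1 - P_\lambda) f$ and expanding,
\[ \langle f, M^{2n} f\rangle \ge \langle P_\lambda f, M^{2n} P_\lambda f\rangle \ge \lambda^{2n} \langle P_\lambda f, P_\lambda f\rangle, \qquad \langle P_\lambda f, P_\lambda f\rangle > 0, \]
the first inequality using that $M$ and $P_\lambda$ commute, so that the mixed terms vanish. Hence
\[ \sup_{f : \langle f, f\rangle = 1,\ \|f\|_\infty < \infty}\ \liminf_{n\to\infty}\ \frac{1}{2n} \log \langle f, M^{2n} f\rangle \ge \log \lambda, \]
thus
\[ \sup_{f : \langle f, f\rangle = 1,\ \|f\|_\infty < \infty}\ \liminf_{n\to\infty}\ \frac{1}{2n} \log \langle f, M^{2n} f\rangle \ge \log \|M\| \]
which, together with (10.5), yields (10.4).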
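The identity (10.4) is easy to test numerically in finite dimensions. The following sketch uses a hypothetical symmetric $2\times 2$ matrix in place of $M$ and the plain Euclidean scalar product in place of $\langle\cdot,\cdot\rangle_m$; it computes $\frac{1}{2n}\log\big(\langle f, M^{2n} f\rangle/\langle f, f\rangle\big)$ for increasing $n$ and compares it with $\log\|M\|$.

```python
import math


def apply(M, v):
    """Matrix-vector product for small dense matrices stored as lists of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def growth_rate(M, f, n):
    """(1/2n) * log( <f, M^{2n} f> / <f, f> ) for a symmetric matrix M."""
    v = f[:]
    for _ in range(n):
        v = apply(M, v)
    # For symmetric M, <f, M^{2n} f> = <M^n f, M^n f>.
    num = sum(x * x for x in v)
    den = sum(x * x for x in f)
    return math.log(num / den) / (2 * n)


# Hypothetical symmetric example; its norm is the largest |eigenvalue|.
M = [[0.5, 0.2],
     [0.2, 0.3]]
norm_M = (0.8 + math.sqrt(0.2)) / 2      # eigenvalues are (0.8 +/- sqrt(0.04 + 0.16)) / 2
rates = [growth_rate(M, [1.0, 0.0], n) for n in (5, 50, 200)]
```

The computed rates never exceed $\log\|M\|$, as in (10.5), and approach it because the chosen $f$ has a non-zero component along the top eigenvector, which plays the role of $P_\lambda f \ne 0$ in the proof.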


Proof of (10.3). We consider f such that f, em = 0, f, f = 1 and f ∞ ≤ R and we look for upper bounds on f, M n f , n even. We define g = f /em . Recalling (9.19), gμ = f, em = 0. By (9.8) and shorthanding K for K m , n λ−n m f, M f =

=

g[K n g]μ dr =

g(r )g(r )[K n (r, r ) − μ(r )] dr μ(r )dr

g(r )g(r )[K n (r, r ) − K n (r , r )] μ(r ) dr dr μ(r )dr,

where we have used the invariance of μ with respect to K , see (9.19). Calling Q rn ,r 0

0

(dr, dr ) the distribution of (r (n), r (n)) of the Markov chain with law Pr0 ,r0 defined in Subsect. 9.4 we have n g(r )[K n (r, r ) − K n (r , r )]dr = [g(r1 ) − g(r2 )] Q r,r (dr 1 dr 2 ). (10.6) With such notation, n λ−n m f, M f =

n g(r ) [g(r1 )−g(r2 )] Q r,r (dr 1 dr 2 )μ(r ) dr μ(r )dr.

With reference to (9.5)-(9.6), we split the domain of integration into the two sets {|x − xm | ≤ n, |x − xm | ≤ n} and its complement, denoting by x and x the x coordinates of r and r . We call n I := g(r ) [g(r1 )−g(r2 )] Q r,r (dr 1 dr 2 )μ(r ) dr μ(r )dr. (10.7) {|x−xm |≤n,|x −xm |≤n}

Recalling that g = f /em , f ∞ ≤ R and with W defined in (9.16), proceeding as in (9.21), W [r1 , r2 ] n I ≤ R2 Q r,r (dr1 dr2 )μ(r ) dr μ(r )dr em (r ) {|x−xm |≤n,|x −xm |≤n} 2 −(ω/L 2 )n

≤ cR e

{|x−xm |≤n,|x −xm |≤n}

W [r, r ] μ(r ) dr μ(r )dr, em (r )

where we have used (9.18). By the definition of W [r, r ] we have for r = r , W [r, r ] 1 1 = 2 + , em (r ) em (r ) em (r )em (r ) hence

{|x−xm |≤n,|x −xm |≤n}

≤ {|x−xm |≤n}

W [r, r ] μ(r ) dr μ(r )dr em (r )

−1 pm dr

+

−1 dr em pm

2 .


√ em ≤ c L, so that for a suitable constant c,

By (9.6) and (9.7),

I ≤ c R 2 e−(ω/L

2 )n

nL.

(10.8)

Note that in the case Q L the better bound f, M n f ≤ c R 2 e−(ω/L

2 )n

L2

follows directly from the spectral gap in L ∞ , see (9.25). For the channel Q L , however, the entire analysis presented in this section is necessary. n We will see below that all other contributions to λ−n m f, M f are smaller (for L large enough). In the complement of {|x − xm | ≤ n, |x − xm | ≤ n} we use (10.6) backwards to rewrite the integrals in terms of g(r )[K n (r, r ) − K n (r , r )]. We will not exploit the minus sign and bound separately the two terms in the difference. We start with the term Z:= g(r )g(r )K n (r, r ) μ(r ) dr dr μ(r )dr {|x−xm |≥n;|x −xm |≤n}

|g(r )g(r )|K n (r, r ) dr μ(r )dr =: Zˆ .

≤

(10.9)

{|x−xm |≥n}

Call K s (r, r ) = K (r, r ) if |x − xm | ≥ s and 0 otherwise. Then Zˆ =

n

Z h , where

h=0

|g(r )g(r )|K sn (r, r ) dr μ(r )dr,

Z0 = {|x−xm |≥n}

(10.10)

Zh =

|g(r )g(r

)|K sn−h (r, r )K h (r , r ) dr dr μ(r )dr.

{|x−xm |≥n,|x −xm |≤s}

To bound Z 0 we use (9.5) to write K s (r, r ) ≤ δ J neum (r, r )

em (r ) λm em (r )

and get, with

f 22

=

Z0 ≤

f 2, n−1 λ−n m δ

| f (r ) f (r )|(J neum )n (r, r ) dr dr

n−1 ≤ f 22 λ−n , m δ

(10.11)


|g(r )|K sn−h (r, r ) dr μ(r )dr

Z h ≤ γh R {|x−xm |≥n,|x −xm |≤s}

≤ γh R 2

em (r ),

{|x−xm |≥n}

γh :=

sup

|x −xm |≤s

√ em (x )−1 K h (r , r )dr ≤ c L.

(10.12)

The last inequality follows from Lemma 9.5 and (9.6), with c = c(s) a constant independent of h. By (9.6), √ (10.13) em (r ) ≤ c Le−(αm /2)n sup |x−xm |≥n

so that Z h ≤ c R 2 Le−(αm /2)n . In conclusion, there is c so that n 2 −(αm /2)n . δ + n R Le Z ≤ c f 22 λ−n m The next term we examine is B :=

|g(r )g(r )|K n (r , r ) μ(r ) dr dr μ(r )dr

{|x−xm |≥n;|x −xm |≤n}

√ ≤ c R 2 Le−(αm /2)n √ ≤ c R 2 Le−(αm /2)n

{|x −xm |≤n} μ(r )

em (x )

where we have used (10.13). The next term is C :=

K n (r , r ) μ(r ) dr dr em (x )

dr ≤ c R 2 Le−(αm /2)n ,

|g(r )g(r )|K n (r , r ) μ(r ) dr dr μ(r )dr

{|x−xm |≤n;|x −xm |≥n}

which is equal to Z , see (10.9). The next one is D := |g(r )g(r )|K n (r, r ) μ(r ) dr dr μ(r )dr {|x−xm |≤n;|x −xm |≥n}

which is equal to B, see (10.15). The last two terms are G and H : |g(r )g(r )|K n (r, r ) μ(r ) dr dr μ(r )dr G := {|x−xm |≥n;|x −xm |≥n}

≤ ce−2αm n

{|x−xm |≥n}

(10.14)

|g(r )g(r )|K n (r, r ) dr μ(r )dr = ce−2αm n Zˆ ,

(10.15)


where Zˆ is defined in (10.9). By (10.13) and (9.19), H := |g(r )g(r )|K n (r , r ) μ(r ) dr dr μ(r )dr {|x−xm |≥n;|x −xm |≥n}

≤ cR

2

√

Le

−(αm /2)n

K n (r , r ) μ(r ) dr dr ≤ c R 2 Le−(αm /2)n . em (r )

In conclusion we have proved that there is a constant $c$ so that

\[ \langle f, M^n f\rangle \le c\lambda_m^n \Big( R^2 e^{-(\omega/L^2)n}\, nL + \|f\|_2^2 \lambda_m^{-n}\delta^n + (n+1) R^2 L e^{-(\alpha_m/2)n} \Big). \]

Hence for $L$ large enough,

\[ \limsup_{n\to\infty} \frac{1}{2n}\log\langle f, M^{2n} f\rangle \le \log\lambda_m - \frac{\omega}{L^2}, \]

which by (9.4) and Proposition 10.3 yields (10.3).

11. Extension to d > 2

While the paper is written for a strictly two dimensional system most of the arguments are not really two dimensional. We will sketch here the proof of the extension to d = 3 and see what is missing in d > 3. Thus we call Q L = [−L/2, L/2]d with the cost of tunneling still denoted by PL . (1)

Theorem 11.1. In d = 3 for L large enough PL = L 2 FL (mˆ L ) and (2.21)–(2.22) are also valid. Sketch of proof. The dimensional restriction to d = 3 comes uniquely from the limit Wulff problem as the isoperimetric inequality below is proved only in d = 3. The remaining parts of the proof work instead also in d > 3. The obvious analogy of (5.1) is, Wα,L := m β 1{x: x1 ≥Lϑα } − m β 1{x: x1 0 such that the solution E θ with θ ∈ (− , ) having volume 1/2 − θ is such that Q 1 ∩ ∂ E θ is contained in plane parallel to a coordinate plane.


Assume now by contradiction that there exists a sequence $(\theta_n) \subset (0, 1)$ converging to 1/2 such that $Q_1 \cap \partial E_{\theta_n}$ is not contained in a plane parallel to a coordinate plane. Then $E_{\theta_n}$ belongs to one of the four remaining types listed in [29]. Passing to a (not relabeled) subsequence if necessary, by the compactness theorem for functions of bounded variation with uniformly bounded BV norm, it follows that the sequence $(E_{\theta_n})$ converges in $L^1(Q_1)$ to a finite perimeter set $E$ as $n \to +\infty$, the boundary of which cannot be contained in a plane parallel to a coordinate plane.

11.2. Couplings and Wasserstein distance. We refer here to Subsect. 9.4 (with the same title), where we have defined the coupling used in the proof of the spectral estimates and which must be modified in $d > 2$. Call $x_i$, $i = 1, \dots, d$, the coordinates, $x_1$ the one in the direction of the channel, and $x_i$, $i > 1$, the transversal ones. The coupling is then defined iteratively so that once $x_j = x'_j$, $j \le k-1$, they stay together while the other coordinates move independently until $x_k = x'_k$, and so on. Thus at each step the requirement is that two one-dimensional walks meet each other, and the estimate is then the same as in Sect. 9. The remaining analysis of the spectral estimates is essentially independent of the dimension, and details are omitted.

11.3. The Bodineau-Ioffe argument. As a direct consequence of the Wulff estimate above we obtain Proposition 7.1. The next step consists in passing to the problem in the channel by finding vertical connections (which are now hyperplanes instead of stripes) close to the sides of the cube. In other words, we have to show that the interface is “flat” even on the mesoscopic scale, or, again phrased differently, that fingers do not grow too far. We sketch the proof for any dimension $d \ge 2$.
Define S(n) := [n+ , (n + 1)+ ) × [−L/2, L/2)d−1 , then we immediately obtain from the proof of Proposition 7.2 that for any m ∈ Nδ,L (ζ,− ,+ ) (m, ·) = ±1 in at most there are n ± ∈ Z ± L such that + Nδ := cδL d−1 (11.1) (θ1 − θ0 )ζ d− hypercubes of D(+ ) inside S(n ± ). The constant c depends only on the dimension. Let us now explain what we mean by a vertical connection (Definition 5.2) in higher dimensions. Definition 11.2. A vertical connection B is a D(+ ) -measurable connected set such that for any (y2 , . . . , yd ) ∈ [−L/2, L/2] there exists x ∈ [−L/2, L/2] such that (x, y2 , . . . , yn ) ∈ B. In particular the union of all cubes with sidelength + touching a given hyperplane normal to the x1 -axis is a vertical connection. Now we show that Proposition 7.2 can be extended to higher dimensions. Let us prove the existence of the positive connection B+ , the existence of B− being similar. To proceed we need the following definitions, see also Fig. 3: A(i) := {x : x1 ∈ [i+ , (i + 1)+ )} ∩ Q L , H (i) := A(i) ∩ {x : (ζ,− ,+ ) (m, x) ∈ {0, −1}}, f (i) := |H (i)|.


Fig. 3. We depict (for simplicity in a two-dimensional setting) the plus-squares (white), the minus- and zero-squares (dark), the slices A(i), S(n±) and, in gray color, the set H(i)

Proof. Case 1. Let M L := −1 + L/2 be the number of slices in the right half of the cube. Assume that there exists i 0 ∈ {n + , . . . , M L } such that f (i 0 ) < C0 ζ 2 d−

ML

d−2

( f ( j)) d−1 ,

(11.2)

j=i 0

where C0 is a constant to be specified later, which depends only on J, β and the dimension. We show the existence of a connection B+ by contradiction, i.e. if there is no connection, then we can construct a function m 2 such that a connection exists for m 2 and F(m) ≥ F(m 2 ) + for some > 0, thus deriving the desired contradiction as in the 2d-case. The function m 2 is constructed in two steps. First, we obtain a function m 1 by “cutting” the contour in the most naive way at level i 0 : Define m 1 (x) := m β for x ∈ H (i 0 ) and m 1 (x) = m(x) elsewhere. Then there is a c1 depending only on β, J and the dimension such that FL (m 1 ) ≤ FL (m) + c1 f (i 0 )d−1 + . m 1 has the property that the sets {x : x1 > + i 0 and (ζ,− ,+ ) (m, x) ∈ {0, −1}} and A(n + ) are not connected. We can then apply Theorem 5.6 and conclude that there exists m 2 such that m 2 = m on {x1 < + n + } and − c2 ζ 2 (− )d N0 , FL (m 2 ) ≤ FL (m 1 ) − c2 ζ 2 (− )d N0 ≤ FL (m) + f (i 0 )c1 d−1 + where N0 denotes the number of zero-cubes in {x : (i 0 + 1)+ ≤ x1 ≤ + M L }. & It remains to estimate N0 . Note that by definition of contours the boundary of j≥i 0 H ( j) is a union of 0-cubes, hence −1 N0 ≥ (2dd−1 + )

j≥i 0

Hd−1 (∂ P[H ( j)]),


where Hd−1 is the (d-1)-dimensional Hausdorff measure, and P[(x1 , . . . , xn )] = (x2 , . . . , xn ) is the projection on the plane {x1 = 0}. By the isoperimetric inequality, d−2

d−2

Hd−1 (∂ P[H ( j)]) ≥ cd−1 P[H ( j)] d−1 = cd−1 f ( j) d−1 , where cd−1 is the isoperimetric constant in the interior of [0, L]d−1 . (Note that the isoperimetric inequality holds because we may assume that | f ( j)| ≤ (1/2)L d−1 for all i 0 ≤ j ≤ M L , or N0 ∼ M L /4 follows immediately for δ sufficiently small.) Therefore, if (11.2) holds, then N0 ≥

2dd−1 + C0 cd−1 ζ 2 d−

f (i 0 ), and

F(m) − F(m 2 ) ≥ f (i 0 )d−1 (c1 − 2dc2 /(cd−1 C0 )) > α f (i 0 ), + where we can require α > 0 for an appropriate choice of C0 . Note that we may assume | f (i 0 )| ≥ 1, or H (i 0 ) contains no cube and therefore A(i 0 ) is a connection. Hence this contradicts (7.4) for L sufficiently large and C0 chosen appropriately. Case 2. On the other hand, if (11.2) is false for all i ∈ {n + , . . . , −1 + (L/2)} then, by 'i+1 d−2 solving the resulting difference inequality for the function g(i) := j=n + ( f ( j)) d−1 we obtain that f (n + ) ≥ cL d−1 , where c does not depend on δ. (See also Sect. 4.12 (proof of Lemma 4.4) in [5].) This contradicts the fact that f (n + ) ≤ δL d−1 for δ sufficiently small. Acknowledgements. We are grateful to Dmitry Ioffe and Laszlo Zsido for helpful discussions.

References

1. Alberti, G., Bellettini, G., Cassandro, M., Presutti, E.: Surface tension in Ising systems with Kac potentials. J. Stat. Phys. 82, 743–796 (1996)
2. Bellettini, G., De Masi, A., Dirr, N., Presutti, E.: Optimal orbits, invariant manifolds and finite volume instantons. Preprint, http://cvgmt.sns.it/people/belletti/, 2005
3. Bellettini, G., De Masi, A., Presutti, E.: Tunnelling for nonlocal evolution equations. J. Nonlin. Math. Phys. 12(Suppl.) 1, 50–63 (2005)
4. Bellettini, G., De Masi, A., Presutti, E.: Energy levels of a non local evolution equation. J. Math. Phys. 46, 1–31 (2005)
5. Bodineau, T., Ioffe, D.: Stability of interfaces and stochastic dynamics in the regime of partial wetting. Ann. Henri Poincaré 5, 871–914 (2004)
6. Buttà, P., De Masi, A., Rosatelli, E.: Slow motion and metastability for a nonlocal evolution equation. J. Stat. Phys. 112, 709–764 (2003)
7. Chen, X.: Existence, uniqueness and asymptotic stability of traveling waves in non local evolution equations. Adv. Diff. Eqs. 2, 125–160 (1997)
8. Chmaj, A., Ren, X.: Homoclinic solutions of an integral equation: existence and stability. J. Diff. Eq. 155, 17–43 (1999)
9. Comets, F.: Nucleation for a long range magnetic model. Ann. Inst. H. Poincaré Probab. Statist. 23(2), 135–178 (1987)
10. Cassandro, M., Olivieri, E., Picco, P.: Small random perturbations of infinite dimensional dynamical systems and the nucleation theory. Ann. Inst. Henri Poincaré 44, 343–396 (1986)
11. De Masi, A., Dirr, N., Presutti, E.: Interface Instability under Forced Displacements. Annales Henri Poincaré 7(3), 471–511 (2006)
12. De Masi, A., Gobron, T., Presutti, E.: Traveling fronts in non-local evolution equations. Arch. Rat. Mech. Anal. 132, 143–205 (1995)
13. De Masi, A., Olivieri, E., Presutti, E.: Spectral properties of integral operators in problems of interface dynamics and metastability. Markov Process. Related Fields 4, 27–112 (1998)
14. De Masi, A., Olivieri, E., Presutti, E.: Critical droplet for a non local mean field equation. Markov Process. Related Fields 6, 439–471 (2000)
15. De Masi, A., Orlandi, E., Presutti, E., Triolo, L.: Glauber evolution with Kac potentials. I. Mesoscopic and macroscopic limits, interface dynamics. Nonlinearity 7, 1–67 (1994)
16. De Masi, A., Orlandi, E., Presutti, E., Triolo, L.: Stability of the interface in a model of phase separation. Proc. Roy. Soc. Edinburgh Sect. A 124, 1013–1022 (1994)
17. De Masi, A., Orlandi, E., Presutti, E., Triolo, L.: Uniqueness and global stability of the instanton in non local evolution equations. Rend. Math. Appl. (7) 14, 693–723 (1994)
18. Faris, W.G., Jona Lasinio, G.: Large fluctuations for nonlinear heat equation with noise. J. Phys. A.: Math. Gen. 15, 3025–3055 (1982)
19. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems. Grund. der. Math. Wiss. 260, Berlin-New York: Springer-Verlag, 1984
20. Fife, P.C.: Mathematical aspects of reacting and diffusing systems. Lecture Notes in Biomathematics 28, Berlin-New York: Springer-Verlag, 1979
21. Fusco, G., Hale, J.K.: Slow motion manifolds, dormant instability and singular perturbations. J. Dyn. Diff. Eqs. 1, 75–94 (1989)
22. Jona-Lasinio, G., Mitter, P.K.: Large deviation estimates in the stochastic quantization of $\phi^4_2$. Commun. Math. Phys. 130, 111–121 (1990)
23. Kohn, R.V., Reznikoff, M.G., Tonegawa, Y.: The sharp interface limit of the action functional for Allen Cahn in one space dimension. Calc. Var. Partial Diff. Eqs. 25(4), 503–534 (2006)
24. Kohn, R.V., Otto, F., Reznikoff, M.G., Vanden-Eijnden, E.: Action minimization and sharp interface limits for the stochastic Allen-Cahn Equation. Preprint (2005)
25. Massari, U., Miranda, M.: Minimal Surfaces of Codimension One. Notas de Matemática, Amsterdam: North-Holland, 1984
26. Martinelli, F.: On the two-dimensional dynamical Ising model in the phase coexistence region. J. Stat. Phys. 76(5–6), 1179–1246 (1994)
27. Olivieri, E., Vares, M.E.: Large Deviations and Metastability. Cambridge: Cambridge University Press, 2005
28. Presutti, E.: From Statistical Mechanics towards Continuum Mechanics. Course given at Max-Planck Institute, Leipzig, 1999
29. Ros, A.: The isoperimetric problem. In: Global theory of minimal surfaces. Clay Math. Proc., 2, Providence, RI: Amer. Math. Soc., 2005, pp. 175–209

Communicated by J.L. Lebowitz

Commun. Math. Phys. 269, 765–808 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0102-5

Communications in Mathematical Physics

Asymptotic Behavior Near Transition Fronts for Equations of Generalized Cahn–Hilliard Form

Peter Howard

Department of Mathematics, Texas A&M University, College Station, TX 77843, USA. E-mail: [email protected]

Received: 4 November 2005 / Accepted: 6 June 2006
Published online: 8 September 2006 – © Springer-Verlag 2006

Abstract: We consider the asymptotic behavior of perturbations of standing wave solutions arising in evolutionary PDE of generalized Cahn–Hilliard form in one space dimension. Such equations are well known to arise in the study of spinodal decomposition, a phenomenon in which the rapid cooling of a homogeneously mixed binary alloy causes separation to occur, resolving the mixture into its two components with their concentrations separated by sharp transition layers. Motivated by work of Bricmont, Kupiainen, and Taskinen [5], we regard the study of standing waves as an interesting step toward understanding the dynamics of these transitions. A critical feature of the Cahn–Hilliard equation is that the linear operator that arises upon linearization of the equation about a standing wave solution has essential spectrum extending onto the imaginary axis, a feature that is known to complicate the step from spectral to nonlinear stability. Under the assumption of spectral stability, described in terms of an appropriate Evans function, we develop detailed asymptotics for perturbations from standing wave solutions, establishing phase-asymptotic orbital stability for initial perturbations decaying with appropriate algebraic rate.

1. Introduction

We consider the asymptotic behavior of perturbations of standing-wave solutions $\bar u(x)$, $\bar u(\pm\infty) = u_\pm$, arising as equilibrium solutions in equations of generalized Cahn–Hilliard form

\[ u_t = (b(u)u_x)_x - (c(u)u_{xxx})_x, \qquad x, u, b, c \in \mathbb{R},\ t > 0, \tag{1.1} \]

for which we assume

(H0) $b, c \in C^2(\mathbb{R})$,
(H1) $b(u_\pm) > 0$, $c(\bar u(x)) \ge c_0 > 0$, $x \in \mathbb{R}$.

Our analysis is motivated by the distinguished role the Cahn–Hilliard model plays in the study of spinodal decomposition, a phenomenon in which the rapid cooling of a


homogeneously mixed binary alloy causes separation to occur, resolving the mixture into its two components with their concentrations separated by sharp transition layers [9, 11]. In this context, and for the case of incompressible fluids, the Cahn–Hilliard equation is

\[ u_t = \nabla\cdot\{M(u)\nabla(F'(u) - \kappa\Delta u)\}, \qquad \frac{\partial u}{\partial\nu} = \frac{\partial\Delta u}{\partial\nu} = 0;\quad x \in \partial\Omega, \tag{1.2} \]

where $\Omega$ denotes a bounded open subset of $\mathbb{R}^n$ ($n$ typically 2 or 3), $\nu$ denotes the unit normal vector to $\partial\Omega$, $u$ denotes the concentration of one component of the binary alloy (or possibly a difference between this concentration and the concentration at homogeneous mixing), $F(u)$ denotes the bulk free energy of the alloy, and the typically small parameter $\kappa$ is a measure of the strength of interfacial energy. (In the case that $\kappa$ depends on $u$, a new term of the form $-\frac{1}{2}\kappa'(u)|\nabla u|^2$ appears in (1.2); see [9]. We take this natural form of the equation from [12].)

An interesting feature of equations of form (1.1) is that the second and fourth order regularizations can balance, and a wide variety of stationary solutions can arise, including standing waves. For example, in the case of (1.1) with

\[ b(u) = \frac{3}{2}u^2 - \frac{1}{2}; \qquad c(u) \equiv 1 \tag{1.3} \]

(the case studied in [5], corresponding with $F(u) = \frac{1}{8}u^4 - \frac{1}{4}u^2$, $M(u) \equiv 1$, and $\kappa = 1$ in (1.2)), one readily verifies that the standing wave

\[ \bar u(x) = \tanh\frac{x}{2} \tag{1.4} \]

is such a solution, often referred to as the kink solution. Motivated by the elegant result of Bricmont, Kupiainen, and Taskinen [5] (which we state below for comparison with the current analysis), we regard the study of standing waves as an interesting first step toward understanding the dynamics of these transitions. (See also [36] and [38] for related results in the case of multiple space dimensions.)

We stress that our focus is on standing waves only, and remark that linearization of (1.1) about a traveling wave leads to a linearized problem with non-vanishing convection and consequently an entirely different flavor than the problem that arises upon linearization about a standing wave. Indeed, upon shifting to a coordinate system moving along with the shock, a flux function $f(u) = -su$ is introduced, and the wave can be regarded as a regularized undercompressive shock profile. Such problems have been considered in [16], and are quite similar to the problems considered in [25, 26].

A critical feature of equations of form (1.1) is that the linear operator that arises upon linearization of the equation about a standing wave solution has essential spectrum extending onto the imaginary axis, a feature that is known to complicate the step from spectral to nonlinear stability. The purpose of this paper is to study precisely this step from spectral to nonlinear stability. In particular, under the assumption of spectral stability (described below in terms of an appropriate Evans function and verified in the particular case (1.1)–(1.3) with wave (1.4)), we develop detailed asymptotics for perturbations from standing wave solutions, establishing phase-asymptotic orbital stability for initial perturbations decaying with algebraic rate $(1 + |x|)^{-3/2}$.
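The statement that (1.4) is a stationary solution of (1.1) with coefficients (1.3) can be checked numerically. The following Python sketch is an illustration only and is not part of the original analysis: the helper names (`kink`, `stationary_flux`, `d1`, `d2`, `d3`), the finite-difference step and the sample grids are all assumptions made here. It uses the observation that a standing wave of (1.1) must make the flux $b(u)u_x - c(u)u_{xxx}$ constant in $x$, and that this constant is zero because the derivatives of $\tanh(x/2)$ vanish as $x \to \pm\infty$. It also checks that $b(u) = F''(u)$ for the free energy quoted above.

```python
import math

def b(u):
    """Diffusion coefficient from (1.3); c(u) is identically 1 in this case."""
    return 1.5 * u * u - 0.5

def F(u):
    """Bulk free energy quoted in the text: F(u) = u^4/8 - u^2/4."""
    return u ** 4 / 8.0 - u ** 2 / 4.0

def kink(x):
    """Standing wave (1.4)."""
    return math.tanh(x / 2.0)

# Central finite differences (hypothetical helpers); each has O(h^2) error.
def d1(f, x, h=1e-2):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-2):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def d3(f, x, h=1e-2):
    return (f(x + 2 * h) - 2.0 * f(x + h)
            + 2.0 * f(x - h) - f(x - 2 * h)) / (2.0 * h ** 3)

def stationary_flux(x):
    """b(u) u_x - u_xxx evaluated on the kink; zero for a standing wave of (1.1)."""
    return b(kink(x)) * d1(kink, x) - d3(kink, x)

xs = [0.25 * k for k in range(-40, 41)]  # sample points in [-10, 10]
us = [0.05 * k for k in range(-20, 21)]  # sample values in [-1, 1]

flux_residual = max(abs(stationary_flux(x)) for x in xs)
energy_mismatch = max(abs(b(u) - d2(F, u)) for u in us)

print("max |b(u) u_x - u_xxx| on the kink:", flux_residual)
print("max |b(u) - F''(u)|:", energy_mismatch)
```

Both printed quantities should be at the level of the finite-difference truncation error. The same helpers also show that $b(u_\pm) = b(\pm 1) = 1 > 0$, as required by (H1), while $b(\bar u(0)) = -1/2$.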
Our approach to this problem will be to extend to this setting methods developed previously in the context of conservation laws with diffusive and/or dispersive regularity,

\[ u_t + f(u)_x = (b(u)u_x)_x + (c(u)u_{xx})_x + \dots, \tag{1.5} \]


which also have no spectral gap. More precisely, we proceed by computing pointwise estimates on the Green’s function for the linear equation that arises upon linearization of (1.1) about the wave $\bar u(x)$ (employing a contour-shifting approach introduced in [22, 52]; see also [23, 45]). Such estimates are dependent upon the spectrum of the linear operator, which we understand here in terms of an appropriate Evans function (see, for example, [14, 21, 35, 52] and the discussion and references below). Finally, we employ the local tracking method developed in [34], an approach through which Green’s function estimates on the linearized operator can be used to approximately locate shifts from the standing wave $\bar u$.

Our general approach is similar to the analysis of [5], in which the authors also employ Green’s function estimates on the linear operator in order to close an iteration on the perturbation in some appropriately weighted space. A fundamental difference between the two analyses is that in [5], this iteration is carried out by the renormalization group method, a theory that has its origins in particle physics (see [8]) and was introduced in the context of time-asymptotic behavior for nonlinear PDE in [17, 18, 20], and further developed in [3, 4] (it pre-dates the current approach by about eight years). Briefly, the renormalization group method is an approach toward understanding asymptotic behavior of PDE that makes use of certain natural scalings in the PDE. This method is ideally suited for cases such as (1.1), for which the equation that arises upon linearization about the wave $\bar u(x)$ has the same natural scaling as one would use for the heat equation. Indeed, though the analysis of [5] is carried out in the context of a self-adjoint linearized equation (the integrated equation), the method could in principle be extended to the current setting, for which the linearized operator is not generally self-adjoint even after integration. Nonetheless, the Green’s function estimates required for any such extension would almost certainly be obtained by methods very similar to those employed here, which have been developed particularly for the case of non-self-adjoint problems. In this way, the current approach seems to represent a change of thinking from the point of view of classical mathematical physics to that of modern Evans function techniques. We emphasize, however, that the spectral analysis of both [5] and the current analysis rely on the self-adjoint nature of the equation, and in the non-self-adjoint case spectral stability is left as a separate analysis (see the discussion following Remark 1.1).

A more fundamental difficulty with the use of such a scaling technique as the renormalization group method in the context of the Cahn–Hilliard equation arises in the extension to multiple space dimensions, in which case it is known that the leading eigenvalue of the linearized operator about a planar wave $\bar u(x_1)$ (leading in the case of stability) scales as $\lambda \sim |\xi|^3$ (see [50]), and consequently introduces a new cubic scale into the problem, which competes with the natural heat-equation scaling of (1.1). (Here, $\xi \in \mathbb{R}^{d-1}$ denotes a Fourier variable associated with spatial components transverse to the planar wave.) A different approach is taken in this setting in [36, 38], quite similar to the method employed here, and the authors conclude stability in space dimensions $d \ge 3$ for the planar wave

\[ \bar u(x_1) = \tanh\frac{x_1}{2}, \]

arising in (1.2) with $F(u) = \frac{1}{8}u^4 - \frac{1}{4}u^2$, $M(u) \equiv 1$ and $\kappa = 1$. The primary difference between the approach of [36, 38] and the current analysis is the local tracking function $\delta(t)$ employed here, which can be generalized to the case of multiple space dimensions as a function of $t$ and the transverse variable $\tilde x = (x_2, x_3, \dots, x_d)$ (see [25–28]). It is precisely this local tracking, which does not seem to fit in any straightforward fashion into the renormalization group framework, that allows us here to obtain


stability for more slowly decaying data than considered in [5], and that allows for the extension of this method to the case of space dimensions $d \ge 2$ (an analysis we carry out in a separate paper). Finally, we note that in addition to the standing waves considered in the current analysis and the planar waves discussed above, the Cahn–Hilliard equation admits a wide range of periodic solutions, and the methods here can be extended to that setting in a manner similar to the analyses of Oh and Zumbrun in the case of periodic solutions arising in the context of viscous conservation laws [46, 47].

Before setting up the analysis, we mention that equations of form (1.1) have been shown to arise in the area of mathematical biology in the context of cell formation and aggregation [44], and also that equations similar to (1.1) arise naturally in the modeling of thin film flows, for which under certain circumstances the height $h(t, x)$ of a film moving along an inclined plane can be modeled by fourth order equations

\[ h_t + (h^2 - h^3)_x = \beta(u^3 u_x)_x - \gamma(u^3 u_{xxx})_x \]

(see [6] and the references therein).

As in the case of conservation laws, it is readily seen that solutions $u(t, x)$ initially near $\bar u(x)$ will not generally approach $\bar u(x)$, but rather will approach a translate of $\bar u(x)$ determined uniquely by the amount of perturbation mass (measured as $\int_{\mathbb{R}} (u(0, x) - \bar u(x))\,dx$) carried into the shock layer. We proceed, then, by defining the perturbation

\[ v(t, x) = u(t, x + \delta(t)) - \bar u(x), \tag{1.6} \]

for which $\delta(t)$ will be chosen by the analysis to track the location of the perturbed solution in time. In this way, we compare the shapes of $u$ and $\bar u$ rather than their locations. (The type of stability we will conclude is orbital.) Substituting $v$ into (1.1), we obtain the perturbation equation

\[ v_t + (a(x)v)_x = (b(x)v_x)_x - (c(x)v_{xxx})_x + \dot\delta(t)(\bar u_x(x) + v_x) + Q(x, v, v_x, v_{xxx})_x, \tag{1.7} \]

with

\[ a(x) = -b'(\bar u)\bar u_x + c'(\bar u)\bar u_{xxx}; \quad b(x) = b(\bar u(x)); \quad c(x) := c(\bar u(x)); \]
\[ Q(x, v, v_x, v_{xxx}) = O(|v v_x|) + O(|v v_{xxx}|) + O(e^{-\eta|x|} v^2), \tag{1.8} \]

where $\eta > 0$ and $Q$ is a continuously differentiable function of its arguments. According to Hypotheses (H0)–(H1), we have that $a \in C^1(\mathbb{R})$, $b, c \in C^2(\mathbb{R})$ and for $k = 0, 1$,

\[ |\partial_x^k a(x)| = O(e^{-\alpha|x|}); \quad |\partial_x^k (b(x) - b_\pm)| = O(e^{-\alpha|x|}); \quad |\partial_x^k (c(x) - c_\pm)| = O(e^{-\alpha|x|}), \tag{1.9} \]

where $\alpha > 0$ and the subscripts $\pm$ denote the asymptotic limits as $x \to \pm\infty$. Regarding the convection coefficient $a(x)$, we note that it decays to 0 at both $\pm\infty$ (see Sect. 2 for more details). In comparison with conservation laws $u_t + f(u)_x = (b(u)u_x)_x$, this corresponds with the degenerate case, for which one of the asymptotic convection coefficients $a_\pm = f'(u_\pm)$ for a traveling wave $\bar u(x - st)$ is equal to the wave speed $s$, and both values can be taken without loss of generality as 0 (see [23, 24]). In the case (1.1)–(1.3) the wave (1.4) satisfies

\[ a(x) = -3\bar u(x)\bar u_x(x), \]


from which we see that for $x < 0$ we have $a(x) > 0$, while for $x > 0$ we have $a(x) < 0$. In this way, the wave (1.4) is similar to Lax waves in the sense that characteristic speeds (values of the convection coefficient $a(x)$) impinge on the transition layer from both sides. The critical difference, again, is that in the case of equations of form (1.1) these characteristic speeds approach 0 asymptotically. Finally, we mention that whereas this decay of $a(x)$ is algebraic in the case of degenerate viscous shock waves, it is exponential in the current setting, a difference critical in the Evans function analysis of Sect. 2.

Another important feature of equations of form (1.1) is that the diffusion coefficient $b(\bar u(x))$ may not always be positive. In the case (1.1)–(1.3) with wave (1.4), we have

\[ b(\bar u) = \frac{3}{2}\bar u^2 - \frac{1}{2}, \tag{1.10} \]

which is asymptotically positive at both $\pm\infty$, but negative near the transition layer at $x = 0$. Our assumptions (H0)–(H1) clearly do not give us control over the possible destabilizing effect of this feature of the equations. The effect is, however, encoded in the spectrum of the linear operator

\[ Lv = -(c(x)v_{xxx})_x + (b(x)v_x)_x - (a(x)v)_x, \tag{1.11} \]

which we discuss further below. For now, it suffices to remark that the essential spectrum of $L$ consists of the negative real axis, up to the imaginary axis, and so, as in the case of regularized conservation laws, the step from spectral stability to nonlinear stability is problematic.

Integrating (1.7), we have (after integration by parts and upon observing that $e^{Lt}\bar u_x = \bar u_x$) the integral equation

\[ v(t, x) = \int_{-\infty}^{+\infty} G(t, x; y)v_0(y)\,dy + \delta(t)\bar u_x(x) - \int_0^t \int_{-\infty}^{+\infty} G_y(t - s, x; y)\Big[ Q(y, v, v_y, v_{yyy}) + \dot\delta(s)v \Big]\,dy\,ds, \tag{1.12} \]

where $G(t, x; y)$ denotes a Green’s function for the linear part of (1.7):

\[ G_t = LG; \qquad G(0, x; y) = \delta_y(x). \tag{1.13} \]

We divide $G(t, x; y)$ into terms for which the $x$ dependence is exactly $\bar u_x(x)$ (the excited terms) and the remainder, $\tilde G(t, x; y)$. In general, the excited terms do not decay for large time and correspond with mass accumulating at the origin, shifting the asymptotic location of the waves. Writing

\[ G(t, x; y) = \tilde G(t, x; y) + \bar u_x(x)e(t, y), \]

we have


\[ \begin{aligned} v(t, x) ={}& \int_{-\infty}^{+\infty} \tilde G(t, x; y)v_0(y)\,dy + \bar u_x(x)\int_{-\infty}^{+\infty} e(t, y)v_0(y)\,dy + \delta(t)\bar u_x(x) \\ &- \int_0^t \int_{-\infty}^{+\infty} \tilde G_y(t - s, x; y)\Big[ Q(y, v, v_y, v_{yyy}) + \dot\delta(s)v \Big]\,dy\,ds \\ &- \bar u_x(x)\int_0^t \int_{-\infty}^{+\infty} e_y(t - s, y)\Big[ Q(y, v, v_y, v_{yyy}) + \dot\delta(s)v \Big]\,dy\,ds. \end{aligned} \]

Choosing $\delta(t)$ to eliminate precisely the excited term, we have

\[ \delta(t) = -\int_{-\infty}^{+\infty} e(t, y)v_0(y)\,dy + \int_0^t \int_{-\infty}^{+\infty} e_y(t - s, y)\Big[ Q(y, v, v_y, v_{yyy}) + \dot\delta(s)v \Big]\,dy\,ds, \tag{1.14} \]

after which we have

\[ v(t, x) = \int_{-\infty}^{+\infty} \tilde G(t, x; y)v_0(y)\,dy - \int_0^t \int_{-\infty}^{+\infty} \tilde G_y(t - s, x; y)\Big[ Q(y, v, v_y, v_{yyy}) + \dot\delta(s)v \Big]\,dy\,ds. \tag{1.15} \]

Integral equations for $v_x$ and $\dot\delta$ can immediately be obtained by differentiation of both sides of (1.14) and (1.15). Our approach will be to determine estimates on $\tilde G(t, x; y)$ and $e(t, y)$ sufficient for closing simultaneous iterations on $v(t, x)$, $v_x(t, x)$, $\delta(t)$, and $\dot\delta(t)$. To this end, we analyze the Green’s function $G(t, x; y)$ through its Laplace transform $G_\lambda(x, y)$, which satisfies the ODE ($t$ transformed to $\lambda$)

\[ LG_\lambda - \lambda G_\lambda = -\delta_y(x), \tag{1.16} \]

and can be estimated by standard methods. Letting $\phi_1^+(x; \lambda)$ and $\phi_2^+(x; \lambda)$ denote the (necessarily) two linearly independent asymptotically decaying solutions at $+\infty$ of the eigenvalue ODE

\[ L\phi = \lambda\phi, \tag{1.17} \]

and $\phi_1^-(x; \lambda)$ and $\phi_2^-(x; \lambda)$ similarly the two linearly independent asymptotically decaying solutions at $-\infty$, we write $G_\lambda(x, y)$ as a linear combination

\[ G_\lambda(x, y) = \begin{cases} \phi_1^-(x; \lambda)N_1^+(y; \lambda) + \phi_2^-(x; \lambda)N_2^+(y; \lambda), & x < y, \\ \phi_1^+(x; \lambda)N_1^-(y; \lambda) + \phi_2^+(x; \lambda)N_2^-(y; \lambda), & x > y. \end{cases} \tag{1.18} \]

Insisting, as usual, on the continuity of $G_\lambda(x, y)$ and its first two $x$-derivatives in $x$, and on the jump in $\partial_x^3 G_\lambda(x, y)$ at $x = y$, we have

\[ \begin{aligned} N_1^+(y; \lambda) &= \frac{W(\phi_1^+, \phi_2^+, \phi_2^-)}{c(y)W_\lambda(y)}; & N_2^+(y; \lambda) &= -\frac{W(\phi_1^+, \phi_2^+, \phi_1^-)}{c(y)W_\lambda(y)}; \\ N_1^-(y; \lambda) &= -\frac{W(\phi_1^-, \phi_2^-, \phi_2^+)}{c(y)W_\lambda(y)}; & N_2^-(y; \lambda) &= \frac{W(\phi_1^-, \phi_2^-, \phi_1^+)}{c(y)W_\lambda(y)}, \end{aligned} \tag{1.19} \]

where $W(\phi_1, \phi_2, \dots, \phi_n)$ denotes a square determinant of column vectors created by augmentation with an appropriate number of $x$-derivatives and $W_\lambda(y) := W(\phi_1^-, \phi_2^-, \phi_1^+, \phi_2^+)$. We see immediately from (1.18) and (1.19) that, away from essential spectrum,


$G_\lambda(x, y)$ is bounded so long as $W_\lambda(y)$ is bounded away from 0. Since $W_\lambda(y)$ is a Wronskian for (1.17), for each fixed $\lambda$ its sign does not change as $y$ varies. In light of this, we define the Evans function as

\[ D(\lambda) = W_\lambda(0). \tag{1.20} \]
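To make the Wronskian definition (1.20) concrete, the following Python sketch evaluates an Evans function for a toy problem. It is an illustration only and does not treat the fourth-order operator $L$ of (1.11). The assumption made here is to replace $L$ by the second-order operator $L_{\mathrm{toy}}\phi = \phi'' + 2\,\mathrm{sech}^2(x)\,\phi$, for which the decaying solutions of $L_{\mathrm{toy}}\phi = \lambda\phi$ are explicit: with $\mu = \sqrt{\lambda} > 0$, $\phi^+(x) = (\mu + \tanh x)e^{-\mu x}$ decays at $+\infty$ and $\phi^-(x) = (\mu - \tanh x)e^{\mu x}$ decays at $-\infty$. The function names (`phi_plus`, `phi_minus`, `wronskian`, `evans`) are hypothetical.

```python
import math

def phi_plus(x, mu):
    """Value and x-derivative of the toy solution decaying as x -> +infinity."""
    t = math.tanh(x)
    s = 1.0 / math.cosh(x) ** 2          # sech^2(x), the derivative of tanh
    e = math.exp(-mu * x)
    return (mu + t) * e, (s - mu * (mu + t)) * e

def phi_minus(x, mu):
    """Value and x-derivative of the toy solution decaying as x -> -infinity."""
    t = math.tanh(x)
    s = 1.0 / math.cosh(x) ** 2
    e = math.exp(mu * x)
    return (mu - t) * e, (-s + mu * (mu - t)) * e

def wronskian(y, lam):
    """W_lambda(y) = phi^- (phi^+)' - (phi^-)' phi^+ for real lambda > 0."""
    mu = math.sqrt(lam)
    pm, dpm = phi_minus(y, mu)
    pp, dpp = phi_plus(y, mu)
    return pm * dpp - dpm * pp

def evans(lam):
    """Toy analogue of (1.20): the Wronskian evaluated at y = 0."""
    return wronskian(0.0, lam)

for lam in (0.25, 1.0, 4.0):
    print(lam, evans(lam), wronskian(1.3, lam))
```

In this toy problem the Wronskian works out to $2\mu(1 - \mu^2)$ for every $y$. It vanishes at $\lambda = 1$, which is the eigenvalue of $L_{\mathrm{toy}}$ with eigenfunction $\mathrm{sech}\,x$, and it is a polynomial in $\mu = \sqrt{\lambda}$ rather than in $\lambda$. The exact constancy in $y$ is special to operators without a first-derivative term. For (1.17) one should expect only that $W_\lambda(y)$ keeps its sign as $y$ varies, as stated above.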

Introduced by Evans in the context of nerve impulse equations [14] (see also the early analysis of Jones for pulse solutions to the FitzHugh–Nagumo equation [35]), the Evans function serves as a characteristic function for the operator $L$. More precisely, away from essential spectrum, zeros of the Evans function correspond in location and multiplicity with eigenvalues of the operator $L$, an observation that has been made precise in [1] in the case (pertaining to reaction–diffusion equations) of isolated eigenvalues, and in [21, 52] and [40] in the cases (pertaining respectively to conservation laws and the nonlinear Schrödinger equation) of nonstandard “effective” eigenvalues embedded in essential spectrum. (The latter correspond with resonant poles of $L$, as also examined in [48].)

Briefly, we remark on the connection between our form of the Evans function (1.20) and the definition of Evans et al. [1, 14, 35]. Letting $\Phi_k^\pm := (\phi_k^\pm, {\phi_k^\pm}', {\phi_k^\pm}'', {\phi_k^\pm}''')^{\mathrm{tr}}$, define

\[ Y_\pm(x; \lambda) = \Phi_1^\pm(x; \lambda) \wedge \Phi_2^\pm(x; \lambda), \tag{1.21} \]

where $\wedge$ represents a wedge product on the space of vectors in $\mathbb{R}^4$ (i.e., $\wedge$ denotes an associative, bilinear operation on vectors $v \in \mathbb{R}^4$, characterized by the relation between standard $\mathbb{R}^4$ basis vectors $e_j$ and $e_k$, $e_j \wedge e_k = -e_k \wedge e_j$). In this case, we obtain through straightforward calculation that

\[ D(\lambda) = e^{-\int_0^x \mathrm{tr}A(y; \lambda)\,dy}\, Y_-(x; \lambda) \wedge Y_+(x; \lambda), \tag{1.22} \]

which is the form of [1, 14, 35]. (As described below, the matrix $A$ is simply the matrix that arises when (1.17) is written as a first order system.) One notable advantage of the formulation (1.22) is that the 2-forms $Y_\pm(x; \lambda)$ are the asymptotically decaying solutions of the equations $Y_\pm' = \mathcal{A}_\pm(x; \lambda)Y_\pm$, where $\mathcal{A}$ is defined by

\[ \begin{aligned} Y_\pm' = (\Phi_1^\pm \wedge \Phi_2^\pm)' &= {\Phi_1^\pm}' \wedge \Phi_2^\pm + \Phi_1^\pm \wedge {\Phi_2^\pm}' \\ &= (A\Phi_1^\pm) \wedge \Phi_2^\pm + \Phi_1^\pm \wedge A\Phi_2^\pm = \mathcal{A}_\pm\,\Phi_1^\pm \wedge \Phi_2^\pm = \mathcal{A}_\pm Y_\pm. \end{aligned} \]

In this way, the wedge formulation can remain analytically valid in cases for which the wedge products $Y_\pm$ cannot be uniquely resolved into components. In the current setting, the $\Phi_k^\pm$ remain sufficiently distinct (away from essential spectrum) that the two formulations are equivalent.

We will observe in Sect. 2 that $D(\lambda)$ is not analytic at $\lambda = 0$, but can be extended analytically on the Riemann surface $\rho = \sqrt{\lambda}$. (This is quite similar to the analysis of Kapitula and Rubin in the case of the complex Ginzburg–Landau equation [39], and the analyses of Guès et al. and of Zumbrun in the case of multidimensional shock stability in the vicinity of glancing points [19, 51], but differs markedly from the case of degenerate conservation laws, for which the Evans function has a dependence of the form $\sqrt{\lambda}\ln\lambda$ [23, 24].) In light of this, we define the function $D_a(\rho) := D(\lambda)$. The primary

772

P. Howard

concern of this analysis is to show that under assumptions (H0)–(H1), nonlinear stability of standing wave solutions u(x) ¯ of (1.1) is implied by the spectral condition (D): (D): The Evans function D(λ) has precisely one zero in {Reλ ≥ 0}, necessarily at 0, and ∂ρρ Da (0) = 0. Remark 1.1. Under the assumption of Condition (D), the point spectrum of L, aside from the distinguished point at λ = 0, lies to the left of a parabolic contour, which passes through the negative real axis and opens into the negative real half plane, λs (k) = −c˜2 k 2 + i c˜1 k + c˜0 .

(1.23)

We will denote this contour $\Gamma_s$. The essential spectrum of L consists of the negative real axis. While Condition (D) is generally quite difficult to verify analytically (see, for example, [13, 15, 35]), it can be analyzed numerically [2, 33, 31]. In the case of Eqs. (1.1)–(1.3) and the standing wave (1.4), (D) is straightforward to verify (see [5] or Lemma 2.6 of the current analysis, in which we review the analysis of [5] for the sake of completeness). We are now in a position to state the first theorem of the paper.

Theorem 1.1. Suppose $\bar u(x)$ is a standing wave solution to (1.1) and suppose (H0)–(H1) hold, as well as stability criterion (D). Then for some fixed $C$ and $C_E$, and for positive constants M and K sufficiently large, and for η > 0, depending only on the spectrum and coefficients of L, the Green's function G(t, x; y) described in (1.13) satisfies the following estimates for y ≤ 0 (with symmetric estimates in the case y ≥ 0).

(I) For either $|x - y| \ge Kt$ or $t \le 1$, and for α a multi-index in the variables x and y,

$$|\partial^\alpha G(t,x;y)| \le C t^{-\frac{1+|\alpha|}{4}} e^{-\frac{|x-y|^{4/3}}{M t^{1/3}}}, \quad |\alpha| \le 3.$$

(II) For $|x - y| \le Kt$ and $t \ge 1$,

$$G(t,x;y) = \tilde G(t,x;y) + E(t,x;y),$$

where

(i) y ≤ x ≤ 0:

$$\begin{aligned}
\tilde G(t,x;y) &= \frac{C}{\sqrt{4b_-t}}\Big(e^{-\frac{(x-y)^2}{4b_-t}} - e^{-\frac{(x+y)^2}{4b_-t}}\Big) + O(t^{-1})e^{-\frac{(x-y)^2}{Mt}},\\
\tilde G_y(t,x;y) &= \frac{C}{\sqrt{4b_-t}}\,\partial_y\Big(e^{-\frac{(x-y)^2}{4b_-t}} - e^{-\frac{(x+y)^2}{4b_-t}}\Big) + O(t^{-3/2})e^{-\frac{(x-y)^2}{Mt}} + O\Big(\frac{|x+y|}{t^{3/2}}\Big)e^{-\frac{(x+y)^2}{Mt}},\\
\tilde G_x(t,x;y) &= O\Big(\frac{|x-y|}{t^{3/2}}\Big)e^{-\frac{(x-y)^2}{Mt}} + O(t^{-1/2}e^{-\eta|x|})e^{-\frac{y^2}{Mt}} + O(t^{-3/2})e^{-\frac{(x-y)^2}{Mt}},\\
\tilde G_{xy}(t,x;y) &= O(t^{-3/2})e^{-\frac{(x-y)^2}{Mt}} + O\Big(\frac{|y|}{t^{3/2}}e^{-\eta|x|}\Big)e^{-\frac{y^2}{Mt}}.
\end{aligned}$$

(ii) x ≤ y ≤ 0:

$$\begin{aligned}
\tilde G(t,x;y) &= \frac{C}{\sqrt{4b_-t}}\Big(e^{-\frac{(x-y)^2}{4b_-t}} - e^{-\frac{(x+y)^2}{4b_-t}}\Big) + O(t^{-1})e^{-\frac{(x-y)^2}{Mt}},\\
\tilde G_y(t,x;y) &= O\Big(\frac{|x-y|}{t^{3/2}}\Big)e^{-\frac{(x-y)^2}{Mt}} + O(t^{-3/2})e^{-\frac{(x-y)^2}{Mt}},\\
\tilde G_x(t,x;y) &= \frac{C}{\sqrt{4b_-t}}\,\partial_x\Big(e^{-\frac{(x-y)^2}{4b_-t}} - e^{-\frac{(x+y)^2}{4b_-t}}\Big) + O\Big(\frac{|x+y|}{t^{3/2}}\Big)e^{-\frac{(x+y)^2}{Mt}} + O(t^{-1/2}e^{-\eta|x|})e^{-\frac{y^2}{Mt}} + O(t^{-3/2})e^{-\frac{(x-y)^2}{Mt}},\\
\tilde G_{xy}(t,x;y) &= O(t^{-3/2})e^{-\frac{(x-y)^2}{Mt}}.
\end{aligned}$$

(iii) y ≤ 0 ≤ x:

$$\begin{aligned}
\tilde G(t,x;y) &= O(t^{-1/2})e^{-\frac{(x-\frac{b_+}{b_-}y)^2}{Mt}},\\
\tilde G_y(t,x;y) &= O\Big(\frac{|x-\frac{b_+}{b_-}y|}{t^{3/2}}\Big)e^{-\frac{(x-\frac{b_+}{b_-}y)^2}{Mt}} + O\Big(\frac{|y|}{t^{3/2}}e^{-\eta|x|}\Big)e^{-\frac{y^2}{Mt}} + O(t^{-3/2})e^{-\frac{(x-\frac{b_+}{b_-}y)^2}{Mt}},\\
\tilde G_x(t,x;y) &= O\Big(\frac{|x-\frac{b_+}{b_-}y|}{t^{3/2}}\Big)e^{-\frac{(x-\frac{b_+}{b_-}y)^2}{Mt}} + O(t^{-1/2}e^{-\eta|x|})e^{-\frac{y^2}{Mt}} + O(t^{-3/2})e^{-\frac{(x-\frac{b_+}{b_-}y)^2}{Mt}},\\
\tilde G_{xy}(t,x;y) &= O(t^{-3/2})e^{-\frac{(x-\frac{b_+}{b_-}y)^2}{Mt}} + O\Big(\frac{|y|}{t^{3/2}}e^{-\eta|x|}\Big)e^{-\frac{y^2}{Mt}},
\end{aligned}$$

and in all cases

$$\begin{aligned}
E(t,x;y) &= C_E\,\bar u_x(x)\int_{-\infty}^{\frac{y}{\sqrt{4b_-t}}} e^{-\zeta^2}\,d\zeta\,\big(1 + O(e^{-\eta|y|})\big),\\
E_y(t,x;y) &= \frac{C_E}{\sqrt{4b_-t}}\,\bar u_x(x)\,e^{-\frac{y^2}{4b_-t}}\big(1 + O(e^{-\eta|y|})\big).
\end{aligned}$$
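As a consistency check on the two displays for the excited term, the following short computation uses only the formulas stated in Theorem 1.1. It is a leading-order sketch: the contribution from differentiating the $O(e^{-\eta|y|})$ correction is ignored. It indicates how the stated form of $E_y$ matches that of $E$, and why $E$ does not decay as $t$ grows for fixed $y$.

```latex
\partial_y \int_{-\infty}^{y/\sqrt{4b_-t}} e^{-\zeta^2}\,d\zeta
  = \frac{1}{\sqrt{4b_-t}}\,e^{-\frac{y^2}{4b_-t}},
\qquad
\lim_{t\to\infty}\int_{-\infty}^{y/\sqrt{4b_-t}} e^{-\zeta^2}\,d\zeta
  = \int_{-\infty}^{0} e^{-\zeta^2}\,d\zeta
  = \frac{\sqrt{\pi}}{2}.
```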

The estimates of Theorem 1.1 are quite similar to those of Lemma 2.1 in [38], in which the authors are considering a multidimensional generalization of (1.1)–(1.3). Similar estimates are also obtained in [5] during the course of the analysis, though in that case the authors work with the integrated equation and consequently the term E(t, x; y), which does not decay as t grows, does not appear. In both of these cases, the analysis is restricted to the case of (1.2), with $F(u) = (1/8)u^4 - (1/4)u^2$, $M(u) \equiv 1$, and $\kappa = 1$, and the wave (1.4).


The estimates of Theorem 1.1 are sufficient for establishing the following theorem on the perturbations v(t, x).

Theorem 1.2. Suppose $\bar u(x)$ is a standing wave solution to (1.1), and suppose (H0)–(H1) hold, as well as stability criterion (D). Then for Hölder continuous initial perturbations $(u(0,x) - \bar u(x)) \in C^{0+\gamma}(\mathbb{R})$, γ > 0, with

$$\int_{-\infty}^{+\infty}(u(0,x) - \bar u(x))\,dx = 0, \qquad |u(0,x) - \bar u(x)| \le E_0(1+|x|)^{-3/2}, \tag{1.24}$$

for some $E_0$ sufficiently small, and for δ(t) as defined in (1.14), there holds

$$\begin{aligned}
|u(t,x+\delta(t)) - \bar u(x)| &\le CE_0(1+|x|+\sqrt{t})^{-3/2},\\
|\partial_x(u(t,x+\delta(t)) - \bar u(x))| &\le CE_0\Big[t^{-1/4}(1+t)^{-1/4}(1+|x|+\sqrt{t})^{-3/2} + (1+t)^{-3/4}e^{-\eta|x|}\Big],\\
|\delta(t)| &\le CE_0(1+t)^{-1/4},\\
|\dot\delta(t)| &\le CE_0(1+t)^{-5/4}.
\end{aligned}\tag{1.25}$$

In the estimates of Theorem 1.2, we have included $v_x(t,x)$ in order to take advantage of the particular form of the nonlinearity Q, in which the term $O(|vv_x|)$ is better than $O(v^2)$, which arises in a similar analysis in the case of regularized conservation laws.

Remark 1.2. We note that in the current setting the assumption of zero mass initial data can be taken without loss of generality. As in the case of Lax waves arising in regularized conservation laws, the asymptotic shift of the wave can be located here entirely from the initial perturbation. In this way, the shift to a zero mass perturbation is simply a change of perspective, in which we consider the stability of the asymptotically selected wave rather than of the original wave. The validity of this a priori selection of the asymptotic translate is justified precisely by our decay estimate on δ(t), from which we verify that perturbations of the zero-mass wave do indeed approach the zero-mass wave. Of course, this means the stability we conclude is orbital.

In light of Remark 1.2, we can state the following theorem regarding initial perturbations with non-zero mass.

Corollary 1.1. Suppose $\bar u(x)$ is a standing wave solution to (1.1), and suppose (H0)–(H1) hold, as well as stability criterion (D). Then for Hölder continuous initial perturbations $(u(0,x) - \bar u(x)) \in C^{0+\gamma}(\mathbb{R})$, γ > 0, with $|u(0,x) - \bar u(x)| \le E_0(1+|x|)^{-3/2}$, for some $E_0$ sufficiently small, there holds

$$|u(t,x+x_0) - \bar u(x)| \le CE_0\Big[(1+|x|+\sqrt{t})^{-3/2} + (1+t)^{-1/4}e^{-\eta|x|}\Big], \tag{1.26}$$

where

$$x_0 = -\frac{1}{u_+ - u_-}\int_{-\infty}^{+\infty}\big(u(0,x) - \bar u(x)\big)\,dx. \tag{1.27}$$


We note specifically the relationship between Corollary 1.1 and the result of [5]. In [5], the authors employ the renormalization group method to establish the following result on (1.1)–(1.3):

BKT Theorem 1.1. For any p > 2, there exists δ > 0 such that for $\|v_0\|_{X_p} \le \delta$,

$$X_p := \Big\{v_0 \in C^0(\mathbb{R}) : \|v_0\|_X := \sup_{x\in\mathbb{R}}(1+|x|^{p+1})|v_0(x)| < \infty\Big\},$$

the Cauchy problem (1.1)–(1.3) with $u(1,x) = \bar u(x) + v_0(x)$, where $\bar u(x)$ is as in (1.4), has a unique smooth solution u(t, x), t ∈ (1, ∞), satisfying

$$\|u(t,\cdot) - \bar u(\cdot - x_0)\|_{L^\infty(\mathbb{R})} \to 0 \quad \text{as } t \to \infty,$$

for some $x_0 \in \mathbb{R}$ depending only on $v_0$. Moreover, for some constants A and B depending only on $v_0$, there holds

$$u(t,x+x_0) = \bar u(x) + \frac{1}{\sqrt{4\pi t}}\frac{d}{dx}\Big(e^{-\frac{x^2}{4t}}\big(A\bar u(x) + B\big)\Big) + o(t^{-1}).$$

The difference between Corollary 1.1 and the result of [5] (for Eq. (1.1)–(1.3), with wave (1.4)) lies predominantly in the difference in assumptions on the initial perturbation. In taking more rapidly decaying initial perturbations, we could increase the decay rate of our perturbation in time. More precisely, in the estimate of Theorem 1.2, for 1 < r < 2 initial perturbations of size $(1+|x|)^{-r}$ give temporal decay on $u(t,x+\delta(t)) - \bar u(x)$ at rate $(1+t)^{-r/2}$ and δ(t) decay at rate $(1+t)^{(1-r)/2}$. For r > 2, δ(t) decays at the maximal rate $(1+t)^{-1/2}$, and the estimate of Corollary 1.1 becomes

$$|u(t,x+x_0) - \bar u(x)| \le CE_0\Big[(1+t)^{-1/2}e^{-\eta|x|} + (1+t)^{-1}\Big], \tag{1.28}$$

which almost recovers the detail of [5] Theorem 1.1. Finally, we note that the second order term in [5] Theorem 1.1 is the analogue here of the diffusion waves of Liu [41], and could be recovered in the current setting by an analysis similar to that of [29, 30, 49].

Outline of the paper. In Sect. 2, we carry out a general analysis of the Evans function for the eigenvalue problem (1.17), providing full details in the case (1.1)–(1.3) with wave (1.4). In addition, we use this analysis to develop estimates on the Laplace-transformed Green's function $G_\lambda(x,y)$. In Sect. 3, we prove Theorem 1.1, while in Sect. 4 we prove Theorem 1.2. Finally, some representative estimates on the integrations arising from our integral equations are provided in Sect. 5.

2. Analysis of the Evans Function

In this section we analyze the Evans function, as defined in (1.20), for our linear eigenvalue problem (1.17). We begin with a remark on the structure of standing waves $\bar u(x)$ arising in the context of (1.1). Upon substitution of such a wave into (1.1), we have the ODE,

$$(c(\bar u)\bar u_{xxx})_x - (b(\bar u)\bar u_x)_x = 0. \tag{2.1}$$


Observing that $\bar u_x(x) \to 0$ as x → ±∞, we can integrate (2.1) to obtain the third order equation

$$c(\bar u)\bar u_{xxx} - b(\bar u)\bar u_x = 0. \tag{2.2}$$

Asymptotically, solutions to (2.2) behave like solutions to the constant coefficient equations

$$c_\pm\bar u_{xxx} - b_\pm\bar u_x = 0, \tag{2.3}$$

whose solutions grow and decay with rates $\pm\sqrt{b_\pm/c_\pm}$ and 0. Since the solutions with rate 0 correspond with constant solutions to (2.1), we see that any solution that does not blow up must approach its endstates at exponential rate.

We now begin our analysis of the Evans function by writing the eigenvalue problem (1.17) as a first order system

$$W' = A(x;\lambda)W, \tag{2.4}$$

where

$$A(x;\lambda) = \begin{pmatrix}
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1\\
-\frac{\lambda}{c(x)} - \frac{a'(x)}{c(x)} & -\frac{a(x)}{c(x)} + \frac{b'(x)}{c(x)} & \frac{b(x)}{c(x)} & -\frac{c'(x)}{c(x)}
\end{pmatrix}.$$

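Under assumptions (H0)–(H1), A(x; λ) has the asymptotic behavior

$$A(x;\lambda) = \begin{cases} A_-(\lambda) + E(x;\lambda), & x < 0,\\ A_+(\lambda) + E(x;\lambda), & x > 0,\end{cases}$$

where

$$\lim_{x\to\pm\infty}A(x;\lambda) = A_\pm(\lambda) = \begin{pmatrix}
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 1\\
-\frac{\lambda}{c_\pm} & 0 & \frac{b_\pm}{c_\pm} & 0
\end{pmatrix}, \tag{2.5}$$

and for |λ| bounded $E(x;\lambda) = O(e^{-\alpha|x|})$. The eigenvalues of $A_\pm(\lambda)$ satisfy the relation

$$\mu^2 = \frac{-b_\pm \pm \sqrt{b_\pm^2 - 4\lambda c_\pm}}{-2c_\pm}.$$

Ordering these so that $j < k \Rightarrow \mathrm{Re}\,\mu_j(\lambda) \le \mathrm{Re}\,\mu_k(\lambda)$, we have

$$\begin{aligned}
\mu_1^\pm(\lambda) &= -\sqrt{\frac{b_\pm}{c_\pm}} + O(|\lambda|); & \mu_2^\pm(\lambda) &= -\frac{1}{\sqrt{b_\pm}}\sqrt{\lambda} + O(|\lambda|^{3/2}),\\
\mu_3^\pm(\lambda) &= +\frac{1}{\sqrt{b_\pm}}\sqrt{\lambda} + O(|\lambda|^{3/2}); & \mu_4^\pm(\lambda) &= +\sqrt{\frac{b_\pm}{c_\pm}} + O(|\lambda|),
\end{aligned}$$

with associated eigenvectors $V_k^\pm = (1, \mu_k^\pm, (\mu_k^\pm)^2, (\mu_k^\pm)^3)^{\mathrm{tr}}$. A straightforward way in which to understand these expressions is through the relationship

$$b_\pm - \sqrt{b_\pm^2 - 4\lambda c_\pm} = \frac{4c_\pm\lambda}{b_\pm + \sqrt{b_\pm^2 - 4\lambda c_\pm}},$$

where the right-hand side is now in the form of λ multiplied by a function that for |λ| sufficiently small is analytic in λ. We note that for |λ| sufficiently small, and away from essential spectrum (the negative real axis), these eigenvalues remain distinct. Also, we clearly have $\mu_1^\pm = -\mu_4^\pm$ and $\mu_2^\pm = -\mu_3^\pm$.

Lemma 2.1. For the eigenvalue problem (1.17), assume $a \in C^1(\mathbb{R})$, $b, c \in C^2(\mathbb{R})$, with $b_\pm > 0$ and $c_\pm > 0$, and additionally that (1.9) holds. Then for some $\bar\alpha > 0$ and k = 0, 1, 2, 3, we have the following estimates on a choice of linearly independent solutions of (1.17). For |λ| ≤ r, some r > 0 sufficiently small, there holds:

(i) For x ≤ 0:

$$\begin{aligned}
\partial_x^k\phi_1^-(x;\lambda) &= e^{\mu_3^-(\lambda)x}\big(\mu_3^-(\lambda)^k + O(e^{-\bar\alpha|x|})\big),\\
\partial_x^k\phi_2^-(x;\lambda) &= e^{\mu_4^-(\lambda)x}\big(\mu_4^-(\lambda)^k + O(e^{-\bar\alpha|x|})\big),\\
\partial_x^k\psi_1^-(x;\lambda) &= e^{\mu_1^-(\lambda)x}\big(\mu_1^-(\lambda)^k + O(e^{-\bar\alpha|x|})\big),\\
\partial_x^k\psi_2^-(x;\lambda) &= \frac{1}{\mu_2^-(\lambda)}\Big(\mu_2^-(\lambda)^k e^{\mu_2^-(\lambda)x} - \mu_3^-(\lambda)^k e^{\mu_3^-(\lambda)x}\Big) + O(e^{-\bar\alpha|x|}).
\end{aligned}$$

(ii) For x ≥ 0:

$$\begin{aligned}
\partial_x^k\phi_1^+(x;\lambda) &= e^{\mu_1^+(\lambda)x}\big(\mu_1^+(\lambda)^k + O(e^{-\bar\alpha|x|})\big),\\
\partial_x^k\phi_2^+(x;\lambda) &= e^{\mu_2^+(\lambda)x}\big(\mu_2^+(\lambda)^k + O(e^{-\bar\alpha|x|})\big),\\
\partial_x^k\psi_1^+(x;\lambda) &= \frac{1}{\mu_3^+(\lambda)}\Big(\mu_3^+(\lambda)^k e^{\mu_3^+(\lambda)x} - \mu_2^+(\lambda)^k e^{\mu_2^+(\lambda)x}\Big) + O(e^{-\bar\alpha|x|}),\\
\partial_x^k\psi_2^+(x;\lambda) &= e^{\mu_4^+(\lambda)x}\big(\mu_4^+(\lambda)^k + O(e^{-\bar\alpha|x|})\big).
\end{aligned}$$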
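The small-|λ| expansions of the eigenvalues $\mu_k^\pm(\lambda)$ stated before Lemma 2.1 can be sanity-checked numerically. The sketch below assumes only that the characteristic polynomial of $A_\pm(\lambda)$ in (2.5) is $c\mu^4 - b\mu^2 + \lambda$; the function names and the sample values of `lam`, `b`, `c` are illustrative choices, not taken from the paper. The small root of the quadratic in $\mu^2$ is evaluated through the rationalized identity quoted above, which also avoids cancellation.

```python
import cmath

def spatial_eigenvalues(lam, b, c):
    """Roots of c*mu**4 - b*mu**2 + lam = 0, ordered by increasing real part."""
    disc = cmath.sqrt(b * b - 4.0 * lam * c)
    # Small root in mu**2, using
    # b - sqrt(b^2 - 4*lam*c) = 4*c*lam / (b + sqrt(b^2 - 4*lam*c)).
    small_sq = 2.0 * lam / (b + disc)
    large_sq = (b + disc) / (2.0 * c)
    roots = []
    for sq in (small_sq, large_sq):
        r = cmath.sqrt(sq)
        roots.extend([r, -r])
    return sorted(roots, key=lambda z: z.real)

def leading_order(lam, b, c):
    """Leading terms of mu_1..mu_4 from the expansions in the text."""
    slow = cmath.sqrt(lam) / cmath.sqrt(b)
    fast = cmath.sqrt(b / c)
    return [-fast, -slow, slow, fast]

# Illustrative (hypothetical) values of lambda and the endstate coefficients.
lam, b, c = 1e-4, 2.0, 0.5
exact = spatial_eigenvalues(lam, b, c)
approx = leading_order(lam, b, c)
errors = [abs(e - a) for e, a in zip(exact, approx)]
```

For these sample values the fast rates differ from $\pm\sqrt{b/c}$ by less than |λ| and the slow rates differ from $\pm\sqrt{\lambda/b}$ by less than $|\lambda|^{3/2}$, in line with the stated error orders.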

Proof. Lemma 2.1 can be proved by standard methods such as those of [52], pp. 779–781. We remark here only on the choice of our slow growth solutions $\psi_2^-(x;\lambda)$ and $\psi_1^+(x;\lambda)$. We could alternatively make the choice

$$\begin{aligned}
\partial_x^k\tilde\psi_2^-(x;\lambda) &= e^{\mu_2^-(\lambda)x}\big(\mu_2^-(\lambda)^k + O(e^{-\bar\alpha|x|})\big),\\
\partial_x^k\tilde\psi_1^+(x;\lambda) &= e^{\mu_3^+(\lambda)x}\big(\mu_3^+(\lambda)^k + O(e^{-\bar\alpha|x|})\big).
\end{aligned}$$

The difficulty with this choice is that these growth solutions coalesce with the slow decay solutions $\phi_1^-$ and $\phi_2^+$ as λ → 0. (See [23], in which this latter convention is taken.) Following the analysis of [5], we take the linear combination of solutions stated in order to have a set of solutions of (1.17) that remains linearly independent as λ → 0. In order to see that $\psi_2^-(x;\lambda)$ and $\psi_1^+(x;\lambda)$ are valid choices, we note that their derivation depends precisely on this coalescence with the slow decay solutions $\phi_1^-$ and $\phi_2^+$ as λ → 0. In the case, for example, of $\psi_2^-(x;\lambda)$, we proceed by letting $\phi_0^-(x)$ be any λ = 0 solution of (1.17) that neither grows nor decays as x → −∞. (The existence of such a solution follows immediately from the slow decay solution $\phi_1^-$.) We search for solutions of the form $\psi_2^-(x;\lambda) = \phi_0^-(x)w(x;\lambda)$, for which direct substitution reveals


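$$-(c(x)\phi_0^- w''')' - \big(c(x)(3\phi_0^{-\prime}w'' + 3\phi_0^{-\prime\prime}w')\big)' - c(x)\phi_0^{-\prime\prime\prime}w' + b(x)\phi_0^{-\prime}w' + (b(x)\phi_0^- w')' - a(x)\phi_0^- w' = \lambda\phi_0^- w, \tag{2.6}$$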

and we make the critical observation that the only coefficient of undifferentiated w is $\lambda\phi_0^-$. In this way, the first order system associated with (2.6) has the form ($W_k = \partial^k w$)

$$W' = A_-(\lambda)W + E(x;\lambda)W,$$

where $A_-(\lambda)$ is precisely as in (2.5), while E(x; λ) has the particular form

$$E(x;\lambda) = \begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
\lambda O(e^{-\alpha|x|}) & O(e^{-\alpha|x|}) & O(e^{-\alpha|x|}) & O(e^{-\alpha|x|})
\end{pmatrix}, \tag{2.7}$$

for some α > 0. Searching for solutions $W = e^{\mu_3^- x}Z$, we have the equation

$$Z' + \mu_3^- I Z = A_-(\lambda)Z + EZ,$$

which can be integrated as

$$Z(x) = V_3^-(\lambda) + \int_{-\infty}^{x} e^{(A_- - \mu_3^- I)(x-y)}PE(y;\lambda)Z(y;\lambda)\,dy - \int_{x}^{-M} e^{(A_- - \mu_3^- I)(x-y)}QE(y;\lambda)Z(y;\lambda)\,dy,$$

where similarly as in [52], P is a projection operator projecting onto the eigenspace associated with $\mu_1^-$, $\mu_2^-$, and $\mu_3^-$, while Q is the projection operator projecting onto the eigenspace associated with $\mu_4^-$. In this way, the first integral decays at exponential rate due to the exponential decay of E and the second integral decays at exponential rate due to the exponential decay of the combination $e^{(\mu_4^- - \mu_3^-)(x-y)}E(y;\lambda)$. Consequently, a standard contraction mapping closes for all |x| sufficiently large. This in turn justifies a direct iteration, in which $V_3^-(\lambda)$ is taken as a first approximation for Z(x). Upon substitution for a second iterate, we observe

$$E(y;\lambda)V_3^-(\lambda) = \begin{pmatrix}
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
\lambda O(e^{-\alpha|y|}) & O(e^{-\alpha|y|}) & O(e^{-\alpha|y|}) & O(e^{-\alpha|y|})
\end{pmatrix}\begin{pmatrix}1\\ \mu_3^-(\lambda)\\ \mu_3^-(\lambda)^2\\ \mu_3^-(\lambda)^3\end{pmatrix} = O(|\lambda^{1/2}|e^{-\alpha|y|}).$$

Continuing the iteration, we conclude

$$\phi_1^-(x;\lambda) = e^{\mu_3^-(\lambda)x}\phi_0^-(x)\big(1 + O(|\lambda^{1/2}|e^{-\alpha|x|})\big).$$

Similarly, a slow growth solution can be constructed from $\phi_0^-(x)$, as

$$\tilde\psi_2^-(x;\lambda) = e^{\mu_2^-(\lambda)x}\phi_0^-(x)\big(1 + O(|\lambda^{1/2}|e^{-\alpha|x|})\big).$$

Any linear combination of $\phi_1^-(x;\lambda)$ and $\tilde\psi_2^-(x;\lambda)$ is also a solution of (1.17), and so we are justified in defining

$$\psi_2^-(x;\lambda) := \frac{1}{\mu_2^-}\big(\tilde\psi_2^-(x;\lambda) - \phi_1^-(x;\lambda)\big).$$

We have, then,

$$\begin{aligned}
\frac{1}{\mu_2^-}\big(\tilde\psi_2^-(x;\lambda) - \phi_1^-(x;\lambda)\big)
&= \frac{1}{\mu_2^-}\Big(e^{\mu_2^-(\lambda)x}\phi_0^-(x)\big(1 + O(|\lambda^{1/2}|e^{-\alpha|x|})\big) - e^{\mu_3^-(\lambda)x}\phi_0^-(x)\big(1 + O(|\lambda^{1/2}|e^{-\alpha|x|})\big)\Big)\\
&= \frac{\phi_0^-(x)}{\mu_2^-}\Big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x} + O(|\lambda^{1/2}|e^{-\alpha|x|})\Big).
\end{aligned}$$

The estimate we state is obtained by scaling $\phi_0^-(x)$ as $\phi_0^-(x) = 1 + O(e^{-\alpha|x|})$. The case of $\psi_1^+(x;\lambda)$ can be analyzed similarly.
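To see concretely why the combination defining $\psi_2^-$ stays independent of $\phi_1^-$ in the limit λ → 0, one can drop the exponentially small corrections and use the relation $\mu_2^- = -\mu_3^-$ noted before Lemma 2.1. The following is only a leading-order sketch and is not part of the proof.

```latex
\frac{e^{\mu_2^- x} - e^{\mu_3^- x}}{\mu_2^-}
  = \frac{e^{-\mu_3^- x} - e^{\mu_3^- x}}{-\mu_3^-}
  = \frac{2\sinh(\mu_3^- x)}{\mu_3^-}
  \;\longrightarrow\; 2x \quad (\lambda \to 0),
\qquad\text{whereas}\qquad
e^{\mu_2^- x},\; e^{\mu_3^- x} \;\longrightarrow\; 1 .
```

The unscaled pair therefore collapses onto a single bounded solution at λ = 0, while the rescaled difference converges to a linearly growing one.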

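Lemma 2.2. Under the assumptions of Lemma 2.1, and for $\phi_k^\pm$, $\psi_k^\pm$ as in Lemma 2.1, with k = 0, 1 and for D(λ) as in (1.20) we have the following estimates. For some $\tilde\alpha > 0$,

(i) For x ≤ 0:

$$\begin{aligned}
\partial_x^k\frac{W(\phi_1^-,\phi_2^-,\psi_1^-)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}e^{-\mu_2^-(\lambda)x}\Big((-\mu_2^-)^k(\mu_1^- - \mu_4^-)(\mu_1^- - \mu_3^-)(\mu_4^- - \mu_3^-) + O(e^{-\tilde\alpha|x|})\Big),\\
\partial_x^k\frac{W(\phi_1^-,\phi_2^-,\psi_2^-)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}e^{-\mu_1^-(\lambda)x}\Big((-\mu_1^-)^k(\mu_2^- - \mu_4^-)\Big(\frac{\mu_2^- - \mu_3^-}{\mu_2^-}\Big)(\mu_4^- - \mu_3^-) + O(e^{-\tilde\alpha|x|})\Big),\\
\partial_x^k\frac{W(\phi_1^-,\psi_1^-,\psi_2^-)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}e^{-\mu_4^-(\lambda)x}\Big((-\mu_4^-)^k(\mu_2^- - \mu_1^-)\Big(\frac{\mu_2^- - \mu_3^-}{\mu_2^-}\Big)(\mu_1^- - \mu_3^-) + O(e^{-\tilde\alpha|x|})\Big),\\
\partial_x^k\frac{W(\phi_2^-,\psi_1^-,\psi_2^-)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}\Big(\frac{(\mu_2^- - \mu_1^-)(\mu_2^- - \mu_4^-)(\mu_1^- - \mu_4^-)}{\mu_2^-}\Big(e^{-\mu_3^-(\lambda)x}(-\mu_3^-)^k - e^{-\mu_2^-(\lambda)x}(-\mu_2^-)^k\Big) + O(|\lambda^{1/2}|e^{-\tilde\alpha|x|})\Big).
\end{aligned}$$

(ii) For x ≥ 0:

$$\begin{aligned}
\partial_x^k\frac{W(\phi_1^+,\phi_2^+,\psi_1^+)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}e^{-\mu_4^+(\lambda)x}\Big((-\mu_4^+)^k\Big(\frac{\mu_3^+ - \mu_2^+}{\mu_3^+}\Big)(\mu_3^+ - \mu_1^+)(\mu_2^+ - \mu_1^+) + O(e^{-\tilde\alpha|x|})\Big),\\
\partial_x^k\frac{W(\phi_1^+,\phi_2^+,\psi_2^+)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}e^{-\mu_3^+(\lambda)x}\Big((-\mu_3^+)^k(\mu_4^+ - \mu_2^+)(\mu_4^+ - \mu_1^+)(\mu_2^+ - \mu_1^+) + O(e^{-\tilde\alpha|x|})\Big),\\
\partial_x^k\frac{W(\phi_1^+,\psi_1^+,\psi_2^+)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}\Big(\frac{(\mu_4^+ - \mu_3^+)(\mu_4^+ - \mu_1^+)(\mu_3^+ - \mu_1^+)}{\mu_3^+}\Big(e^{-\mu_2^+(\lambda)x}(-\mu_2^+)^k - e^{-\mu_3^+(\lambda)x}(-\mu_3^+)^k\Big) + O(|\lambda^{1/2}|e^{-\tilde\alpha|x|})\Big),\\
\partial_x^k\frac{W(\phi_2^+,\psi_1^+,\psi_2^+)}{c(x)W_\lambda(x)} &= \frac{1}{c(0)D(\lambda)}e^{-\mu_1^+(\lambda)x}\Big((-\mu_1^+)^k(\mu_4^+ - \mu_3^+)(\mu_4^+ - \mu_2^+)\Big(\frac{\mu_3^+ - \mu_2^+}{\mu_3^+}\Big) + O(e^{-\tilde\alpha|x|})\Big).
\end{aligned}$$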

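Proof. The estimates of Lemma 2.2 are each carried out in a similar straightforward manner, so we proceed only in the case of

$$\partial_x\frac{W(\phi_2^-,\psi_1^-,\psi_2^-)}{c(x)W_\lambda(x)}.$$

Computing directly, and observing $(c(x)W_\lambda(x))' = 0$, we have

$$\partial_x\frac{W(\phi_2^-,\psi_1^-,\psi_2^-)}{c(x)W_\lambda(x)} = \frac{\partial_x W(\phi_2^-,\psi_1^-,\psi_2^-)}{c(x)W_\lambda(x)}.$$

For the denominator, we note Abel's relation,

$$W_\lambda(x) = W_\lambda(0)e^{\int_0^x \mathrm{tr}\,A(y;\lambda)\,dy} = D(\lambda)e^{-\int_0^x\frac{c'(y)}{c(y)}\,dy} = D(\lambda)\frac{c(0)}{c(x)},$$

or

$$c(x)W_\lambda(x) = c(0)D(\lambda). \tag{2.8}$$

For the numerator, we compute

$$\begin{aligned}
\partial_x W(\phi_2^-,\psi_1^-,\psi_2^-) &= \det\begin{pmatrix}\phi_2^- & \psi_1^- & \psi_2^-\\ \phi_2^{-\prime} & \psi_1^{-\prime} & \psi_2^{-\prime}\\ \phi_2^{-\prime\prime\prime} & \psi_1^{-\prime\prime\prime} & \psi_2^{-\prime\prime\prime}\end{pmatrix}\\
&= \frac{1}{\mu_2^-}\det\begin{pmatrix}1 & 1 & 1\\ \mu_4^- & \mu_1^- & \mu_2^-\\ (\mu_4^-)^3 & (\mu_1^-)^3 & (\mu_2^-)^3\end{pmatrix}e^{(\mu_1^- + \mu_2^- + \mu_4^-)x}\\
&\quad - \frac{1}{\mu_2^-}\det\begin{pmatrix}1 & 1 & 1\\ \mu_4^- & \mu_1^- & \mu_3^-\\ (\mu_4^-)^3 & (\mu_1^-)^3 & (\mu_3^-)^3\end{pmatrix}e^{(\mu_1^- + \mu_3^- + \mu_4^-)x} + O(e^{-\tilde\alpha|x|}).
\end{aligned}\tag{2.9}$$

We mention particularly that in the exponentially decaying terms of this last calculation, we have observed the estimate

$$\Big|\frac{1}{\mu_2^-(\lambda)}\big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x}\big)O(e^{-\bar\alpha|x|})\Big| \le C\frac{1}{|\mu_2^-(\lambda)|}|\sqrt{\lambda}\,x|\,O(e^{-\bar\alpha|x|}) = O(e^{-\tilde\alpha|x|}),$$

for some $0 < \tilde\alpha < \bar\alpha$. Computing the determinants, we find

$$\begin{aligned}
\partial_x W(\phi_2^-,\psi_1^-,\psi_2^-) = \frac{1}{\mu_2^-}\Big(&(-\mu_3^-)(\mu_2^- - \mu_1^-)(\mu_2^- - \mu_4^-)(\mu_1^- - \mu_4^-)e^{(\mu_1^- + \mu_2^- + \mu_4^-)x}\\
&- (-\mu_2^-)(\mu_3^- - \mu_1^-)(\mu_3^- - \mu_4^-)(\mu_1^- - \mu_4^-)e^{(\mu_1^- + \mu_3^- + \mu_4^-)x}\Big) + O(e^{-\tilde\alpha|x|}),
\end{aligned}$$

for which we observe that with $\mu_1^- = -\mu_4^-$ and $\mu_2^- = -\mu_3^-$, we have

$$(\mu_2^- - \mu_1^-)(\mu_2^- - \mu_4^-)(\mu_1^- - \mu_4^-) = (\mu_3^- - \mu_1^-)(\mu_3^- - \mu_4^-)(\mu_1^- - \mu_4^-),$$

and

$$e^{(\mu_1^- + \mu_2^- + \mu_4^-)x} = e^{-\mu_3^- x}, \qquad e^{(\mu_1^- + \mu_3^- + \mu_4^-)x} = e^{-\mu_2^- x}.$$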
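The determinant step in this proof rests on an elementary identity: for rows $(1, \mu, \mu^3)$, the determinant equals the Vandermonde product times the sum of the three rates. The sketch below checks that identity, and the symmetry used afterwards, on integer sample rates. The specific values $\mu_1 = -3$, $\mu_2 = -1$, $\mu_3 = 1$, $\mu_4 = 3$ are hypothetical and chosen only to satisfy $\mu_1 = -\mu_4$, $\mu_2 = -\mu_3$.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def derivative_wronskian_coefficient(p, q, r):
    """det with rows (1, mu, mu^3), as arises in d/dx of a 3x3 Wronskian of exponentials."""
    return det3([[1, 1, 1], [p, q, r], [p**3, q**3, r**3]])

def closed_form(p, q, r):
    """Vandermonde product times the sum of the rates."""
    return (q - p) * (r - p) * (r - q) * (p + q + r)

# Hypothetical rates with the symmetry mu1 = -mu4, mu2 = -mu3.
mu1, mu2, mu3, mu4 = -3, -1, 1, 3
first = derivative_wronskian_coefficient(mu4, mu1, mu2)   # multiplies exp(-mu3 * x)
second = derivative_wronskian_coefficient(mu4, mu1, mu3)  # multiplies exp(-mu2 * x)
```

Since the four rates sum to zero, the factor $p + q + r$ becomes minus the omitted rate, which is where the prefactors $(-\mu_3^-)$ and $(-\mu_2^-)$ come from.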

Combining these last observations with (2.8), we conclude our argument for the case of interest. The remaining estimates follow similarly.

In addition to Lemmas 2.1 and 2.2, we have the following estimates on the $N_k^\pm$.

Lemma 2.3. Under the assumptions of Theorem 1.1, we have the following estimates on the $N_k^\pm$ of (1.19) (with similar estimates for x ≥ 0). For x ≤ 0, k = 0, 1, and for |λ| ≤ r, some r sufficiently small,

$$\begin{aligned}
\partial_x^k N_1^-(x;\lambda) &= O(|\lambda^{-1+\frac{k}{2}}|)e^{-\mu_2^-(\lambda)x},\\
\partial_x^k N_2^-(x;\lambda) &= O(|\lambda^{-\frac12+\frac{k}{2}}|)e^{-\mu_2^-(\lambda)x},\\
\partial_x^k N_1^+(x;\lambda) &= O(|\lambda^{-\frac12}|)\Big(e^{\mu_2^-(\lambda)x}\mu_2^-(\lambda)^k - e^{\mu_3^-(\lambda)x}\mu_3^-(\lambda)^k\Big) + O\big(|\lambda^{-\frac12+\frac{k}{2}}|\big)e^{\mu_3^-(\lambda)x},\\
\partial_x^k N_2^+(x;\lambda) &= O\big(|\lambda^{-1+\frac{k}{2}}|\big)e^{-\mu_2^-(\lambda)x} + O(1)e^{-\mu_4^-(\lambda)x}.
\end{aligned}$$


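Proof. We note at the outset that for x ≤ 0, we will take the expansions

$$\phi_k^+(x;\lambda) = A_k^+(\lambda)\phi_1^-(x;\lambda) + B_k^+(\lambda)\phi_2^-(x;\lambda) + C_k^+(\lambda)\psi_1^-(x;\lambda) + D_k^+(\lambda)\psi_2^-(x;\lambda). \tag{2.10}$$

Without loss of generality, we take the convention

$$\phi_1^+(x;0) = \bar u_x(x) = \phi_2^-(x;0), \tag{2.11}$$

according to which there must hold

$$\begin{aligned}
A_1^+(\lambda) &= O(|\lambda^{1/2}|); & B_1^+(\lambda) &= 1 + O(|\lambda^{1/2}|); & C_1^+(\lambda) &= O(|\lambda^{1/2}|); & D_1^+(\lambda) &= O(|\lambda^{1/2}|);\\
A_2^+(\lambda) &= O(1); & B_2^+(\lambda) &= O(1); & C_2^+(\lambda) &= O(1); & D_2^+(\lambda) &= O(1).
\end{aligned}\tag{2.12}$$

As each estimate of Lemma 2.3 is proven similarly, we provide details only in the case of $\partial_x N_1^-$. We have, similarly as in the proof of Lemma 2.2,

$$\partial_x N_1^-(x;\lambda) = -\frac{\partial_x W(\phi_1^-,\phi_2^-,\phi_2^+)}{c(x)W_\lambda(x)},$$

for which we compute

$$\partial_x W(\phi_1^-,\phi_2^-,\phi_2^+) = \det\begin{pmatrix}\phi_1^- & \phi_2^- & \phi_2^+\\ \phi_1^{-\prime} & \phi_2^{-\prime} & \phi_2^{+\prime}\\ \phi_1^{-\prime\prime\prime} & \phi_2^{-\prime\prime\prime} & \phi_2^{+\prime\prime\prime}\end{pmatrix}.$$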

In order to derive expressions for the terms $\phi_k^{\pm\prime\prime\prime}$, we consider the eigenvalue problem (1.17). In general, we have the relation

$$-(c(x)\phi_k^{\pm\prime\prime\prime})' + (b(x)\phi_k^{\pm\prime})' - (a(x)\phi_k^\pm)' = \lambda\phi_k^\pm. \tag{2.13}$$

For $\phi_1^-$ and $\phi_2^-$, upon integration from −∞ up to x, and keeping in mind that a(x) decays at an exponential rate as x → ±∞, we have

$$-c(x)\phi_k^{-\prime\prime\prime} + b(x)\phi_k^{-\prime} - a(x)\phi_k^- = \lambda\int_{-\infty}^{x}\phi_k^-(y;\lambda)\,dy.$$

The most straightforward case is $\phi_2^-(x;\lambda)$, which for x < 0 decays at an exponential rate even for λ = 0. In this case, we define

$$W_2^-(x;\lambda) = \int_{-\infty}^{x}\phi_2^-(y;\lambda)\,dy = O(1)e^{\mu_4^-(\lambda)x}. \tag{2.14}$$

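Proceeding similarly for $\phi_1^-(x;\lambda)$, which fails to decay for λ = 0, we define

$$\begin{aligned}
W_1^-(x;\lambda) &= \sqrt{\lambda}\int_{-\infty}^{x}\phi_1^-(y;\lambda)\,dy = \sqrt{\lambda}\int_{-\infty}^{x}e^{\mu_3^-(\lambda)y}\big(1 + O(e^{-\bar\alpha|y|})\big)\,dy\\
&= \frac{\sqrt{\lambda}}{\mu_3^-(\lambda)}e^{\mu_3^-(\lambda)x} + \sqrt{\lambda}\,O(e^{-\alpha|x|}),
\end{aligned}$$

so that

$$-c(x)\phi_1^{-\prime\prime\prime} + b(x)\phi_1^{-\prime} - a(x)\phi_1^- = \sqrt{\lambda}\,W_1^-(x;\lambda),$$

with

$$W_1^-(x;\lambda) = e^{\mu_3^-(\lambda)x}\Big(\frac{\sqrt{\lambda}}{\mu_3^-(\lambda)} + O(|\lambda^{1/2}|e^{-\alpha|x|})\Big).$$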

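Finally, for $\phi_2^+(x;\lambda)$ in the case x < 0, we must expand in terms of the linearly independent solutions $\phi_k^-$ and $\psi_k^-$,

$$\phi_2^+(x;\lambda) = A_2^+(\lambda)\phi_1^-(x;\lambda) + B_2^+(\lambda)\phi_2^-(x;\lambda) + C_2^+(\lambda)\psi_1^-(x;\lambda) + D_2^+(\lambda)\psi_2^-(x;\lambda).$$

In this case, we integrate (2.13) for y ∈ [x, +∞) to obtain

$$c(x)\phi_2^{+\prime\prime\prime}(x;\lambda) - b(x)\phi_2^{+\prime}(x;\lambda) + a(x)\phi_2^+(x;\lambda) = \lambda\int_{x}^{+\infty}\phi_2^+(y;\lambda)\,dy =: \sqrt{\lambda}\,W_2^+(x;\lambda),$$

where we divide this final integration up as

$$\int_{x}^{+\infty}\phi_2^+(y;\lambda)\,dy = \int_{x}^{-M}\phi_2^+(y;\lambda)\,dy + \int_{-M}^{+\infty}\phi_2^+(y;\lambda)\,dy,$$

where M > 0 is chosen sufficiently large so that the estimates of Lemma 2.1 hold for x ≤ −M. For the integration over y > −M, we have

$$\int_{-M}^{+\infty}\phi_2^+(y;\lambda)\,dy = \int_{-M}^{0}O(1)\,dy + \int_{0}^{+\infty}e^{\mu_2^+(\lambda)y}\big(1 + O(e^{-\bar\alpha|y|})\big)\,dy = O(|\lambda^{-1/2}|).$$

On the other hand, for integration over y < −M, we have

$$\int_{x}^{-M}\phi_2^+(y;\lambda)\,dy = A_2^+(\lambda)\int_{x}^{-M}\phi_1^-(y;\lambda)\,dy + B_2^+(\lambda)\int_{x}^{-M}\phi_2^-(y;\lambda)\,dy + C_2^+(\lambda)\int_{x}^{-M}\psi_1^-(y;\lambda)\,dy + D_2^+(\lambda)\int_{x}^{-M}\psi_2^-(y;\lambda)\,dy.$$

Combining these last observations, we conclude

$$\begin{aligned}
W_2^+(x;\lambda) = O(1) + \sqrt{\lambda}\Big(&A_2^+(\lambda)\int_{x}^{-M}\phi_1^-(y;\lambda)\,dy + B_2^+(\lambda)\int_{x}^{-M}\phi_2^-(y;\lambda)\,dy\\
&+ C_2^+(\lambda)\int_{x}^{-M}\psi_1^-(y;\lambda)\,dy + D_2^+(\lambda)\int_{x}^{-M}\psi_2^-(y;\lambda)\,dy\Big) = O(1)e^{\mu_1^-(\lambda)x}.
\end{aligned}$$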


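Omitting coefficient dependence on y for brevity of notation, we have

$$\begin{aligned}
\det\begin{pmatrix}\phi_1^- & \phi_2^- & \phi_2^+\\ \phi_1^{-\prime} & \phi_2^{-\prime} & \phi_2^{+\prime}\\ \phi_1^{-\prime\prime\prime} & \phi_2^{-\prime\prime\prime} & \phi_2^{+\prime\prime\prime}\end{pmatrix}
&= \det\begin{pmatrix}\phi_1^- & \phi_2^- & \phi_2^+\\ \phi_1^{-\prime} & \phi_2^{-\prime} & \phi_2^{+\prime}\\ \frac{b}{c}\phi_1^{-\prime} - \frac{a}{c}\phi_1^- - \frac{\sqrt{\lambda}}{c}W_1^- & \frac{b}{c}\phi_2^{-\prime} - \frac{a}{c}\phi_2^- - \frac{\lambda}{c}W_2^- & \frac{b}{c}\phi_2^{+\prime} - \frac{a}{c}\phi_2^+ + \frac{\sqrt{\lambda}}{c}W_2^+\end{pmatrix}\\
&= \det\left[\begin{pmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ -\frac{a}{c} & \frac{b}{c} & 1\end{pmatrix}\begin{pmatrix}\phi_1^- & \phi_2^- & \phi_2^+\\ \phi_1^{-\prime} & \phi_2^{-\prime} & \phi_2^{+\prime}\\ -\frac{\sqrt{\lambda}}{c}W_1^- & -\frac{\lambda}{c}W_2^- & \frac{\sqrt{\lambda}}{c}W_2^+\end{pmatrix}\right]\\
&= \det\begin{pmatrix}\phi_1^- & \phi_2^- & \phi_2^+\\ \phi_1^{-\prime} & \phi_2^{-\prime} & \phi_2^{+\prime}\\ -\frac{\sqrt{\lambda}}{c}W_1^- & -\frac{\lambda}{c}W_2^- & \frac{\sqrt{\lambda}}{c}W_2^+\end{pmatrix}.
\end{aligned}\tag{2.15}$$

We proceed by expanding this last determinant along its bottom row, giving

$$-\frac{\sqrt{\lambda}}{c(x)}W_1^-(x;\lambda)W(\phi_2^-,\phi_2^+) + \frac{\lambda}{c(x)}W_2^-(x;\lambda)W(\phi_1^-,\phi_2^+) + \frac{\sqrt{\lambda}}{c(x)}W_2^+(x;\lambda)W(\phi_1^-,\phi_2^-). \tag{2.16}$$

For the first expression in (2.16), we observe

$$W(\phi_2^-,\phi_2^+) = A_2^+(\lambda)W(\phi_2^-,\phi_1^-) + C_2^+(\lambda)W(\phi_2^-,\psi_1^-) + D_2^+(\lambda)W(\phi_2^-,\psi_2^-) = O(1)e^{(\mu_1^- + \mu_4^-)y} = O(1),$$

while similarly

$$W(\phi_1^-,\phi_2^+) = O(1)e^{(\mu_1^- + \mu_3^-)y}.$$

In this way, the terms in (2.16) give respectively

$$O(|\lambda^{1/2}|)e^{(\mu_1^- + \mu_3^- + \mu_4^-)y} + O(|\lambda|)e^{(\mu_1^- + \mu_3^- + \mu_4^-)y} + O(|\lambda^{1/2}|)e^{(\mu_1^- + \mu_3^- + \mu_4^-)y}.$$

We now recall the relationship $\mu_3^- + \mu_2^- = 0$, and note that according to spectral criterion (D), there holds $D(\lambda) = c_1\lambda + O(|\lambda^{3/2}|)$. Upon division, then, by $c(y)W_\lambda(y) = c(0)D(\lambda)$, we conclude the claimed estimate. The remaining estimates of Lemma 2.3 follow similarly.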

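Lemma 2.4. Under the assumptions of Theorem 1.1, and for $C_k^+(\lambda)$ and $D_k^+(\lambda)$ as defined in (2.10), there holds

$$C_1^+(\lambda)D_2^+(\lambda) - C_2^+(\lambda)D_1^+(\lambda) = O(|\lambda|).$$

Proof. We first observe that by augmenting (2.10) with y-derivatives up to third order, we obtain the matrix equations

$$\begin{pmatrix}\phi_1^- & \phi_2^- & \psi_1^- & \psi_2^-\\ \phi_1^{-\prime} & \phi_2^{-\prime} & \psi_1^{-\prime} & \psi_2^{-\prime}\\ \phi_1^{-\prime\prime} & \phi_2^{-\prime\prime} & \psi_1^{-\prime\prime} & \psi_2^{-\prime\prime}\\ \phi_1^{-\prime\prime\prime} & \phi_2^{-\prime\prime\prime} & \psi_1^{-\prime\prime\prime} & \psi_2^{-\prime\prime\prime}\end{pmatrix}\begin{pmatrix}A_k^+\\ B_k^+\\ C_k^+\\ D_k^+\end{pmatrix} = \begin{pmatrix}\phi_k^+\\ \phi_k^{+\prime}\\ \phi_k^{+\prime\prime}\\ \phi_k^{+\prime\prime\prime}\end{pmatrix}. \tag{2.17}$$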

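Proceeding by Cramer's rule, we have, for example,

$$C_1^+(\lambda) = \frac{W(\phi_1^-,\phi_2^-,\phi_1^+,\psi_2^-)}{W(\phi_1^-,\phi_2^-,\psi_1^-,\psi_2^-)},$$

where the W notation is defined immediately following (1.19). By linear independence, the denominator of this last expression is non-zero, while for the numerator, we compute similarly as in (2.15),

$$\begin{aligned}
W(\phi_1^-,\phi_2^-,\phi_1^+,\psi_2^-) &= \det\begin{pmatrix}\phi_1^- & \phi_2^- & \phi_1^+ & \psi_2^-\\ \phi_1^{-\prime} & \phi_2^{-\prime} & \phi_1^{+\prime} & \psi_2^{-\prime}\\ \phi_1^{-\prime\prime} & \phi_2^{-\prime\prime} & \phi_1^{+\prime\prime} & \psi_2^{-\prime\prime}\\ -\frac{\sqrt{\lambda}}{c(x)}W_1^- & -\frac{\lambda}{c(x)}W_2^- & -\frac{\lambda}{c(x)}W_1^+ & \psi_2^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_2^{-\prime} + \frac{a(x)}{c(x)}\psi_2^-\end{pmatrix}\\
&= \Big(\psi_2^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_2^{-\prime} + \frac{a(x)}{c(x)}\psi_2^-\Big)W(\phi_1^-,\phi_2^-,\phi_1^+) + O(|\lambda|),
\end{aligned}$$

where $\psi_2^-$ is treated in a distinguished manner because it cannot be integrated to an asymptotic limit. Proceeding similarly, we also have

$$\begin{aligned}
W(\phi_1^-,\phi_2^-,\psi_1^-,\psi_2^-)\,C_2^+(\lambda) &= W(\phi_1^-,\phi_2^-,\phi_2^+,\psi_2^-)\\
&= \Big(\psi_2^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_2^{-\prime} + \frac{a(x)}{c(x)}\psi_2^-\Big)W(\phi_1^-,\phi_2^-,\phi_2^+) + O(|\lambda^{1/2}|),\\
W(\phi_1^-,\phi_2^-,\psi_1^-,\psi_2^-)\,D_1^+(\lambda) &= W(\phi_1^-,\phi_2^-,\psi_1^-,\phi_1^+)\\
&= -\Big(\psi_1^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_1^{-\prime} + \frac{a(x)}{c(x)}\psi_1^-\Big)W(\phi_1^-,\phi_2^-,\phi_1^+) + O(|\lambda|),\\
W(\phi_1^-,\phi_2^-,\psi_1^-,\psi_2^-)\,D_2^+(\lambda) &= W(\phi_1^-,\phi_2^-,\psi_1^-,\phi_2^+)\\
&= -\Big(\psi_1^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_1^{-\prime} + \frac{a(x)}{c(x)}\psi_1^-\Big)W(\phi_1^-,\phi_2^-,\phi_2^+) + O(|\lambda^{1/2}|).
\end{aligned}$$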


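Combining these observations, we compute

$$\begin{aligned}
&W(\phi_1^-,\phi_2^-,\psi_1^-,\psi_2^-)^2\big(C_1^+(\lambda)D_2^+(\lambda) - C_2^+(\lambda)D_1^+(\lambda)\big)\\
&\quad= \Big[\Big(\psi_2^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_2^{-\prime} + \frac{a(x)}{c(x)}\psi_2^-\Big)W(\phi_1^-,\phi_2^-,\phi_1^+) + O(|\lambda|)\Big]\\
&\qquad\times\Big[-\Big(\psi_1^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_1^{-\prime} + \frac{a(x)}{c(x)}\psi_1^-\Big)W(\phi_1^-,\phi_2^-,\phi_2^+) + O(|\lambda^{1/2}|)\Big]\\
&\qquad- \Big[\Big(\psi_2^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_2^{-\prime} + \frac{a(x)}{c(x)}\psi_2^-\Big)W(\phi_1^-,\phi_2^-,\phi_2^+) + O(|\lambda^{1/2}|)\Big]\\
&\qquad\times\Big[-\Big(\psi_1^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_1^{-\prime} + \frac{a(x)}{c(x)}\psi_1^-\Big)W(\phi_1^-,\phi_2^-,\phi_1^+) + O(|\lambda|)\Big]\\
&\quad= \Big(\psi_2^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_2^{-\prime} + \frac{a(x)}{c(x)}\psi_2^-\Big)W(\phi_1^-,\phi_2^-,\phi_1^+)\,O(|\lambda^{1/2}|)\\
&\qquad- \Big(\psi_1^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_1^{-\prime} + \frac{a(x)}{c(x)}\psi_1^-\Big)W(\phi_1^-,\phi_2^-,\phi_2^+)\,O(|\lambda|)\\
&\qquad- \Big(\psi_2^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_2^{-\prime} + \frac{a(x)}{c(x)}\psi_2^-\Big)W(\phi_1^-,\phi_2^-,\phi_2^+)\,O(|\lambda|)\\
&\qquad+ \Big(\psi_1^{-\prime\prime\prime} - \frac{b(x)}{c(x)}\psi_1^{-\prime} + \frac{a(x)}{c(x)}\psi_1^-\Big)W(\phi_1^-,\phi_2^-,\phi_1^+)\,O(|\lambda^{1/2}|) + O(|\lambda^{3/2}|).
\end{aligned}$$

Recalling that $\phi_2^-(x;0) = \bar u_x(x) = \phi_1^+(x;0)$, we have that determinants involving both of these functions must vanish as λ → 0, necessarily at rate $|\lambda^{1/2}|$ or better by analyticity in ρ. We conclude the claimed estimate.

We next state our main theorem on behavior of the Evans function for small values of λ, which we note is simply the extension to the current setting of Proposition 2.7 of [7].

Lemma 2.5. Under the assumptions of Theorem 1.1, and for D(λ) as defined in (1.20), we have

$$D(\lambda) = \gamma(u_+ - u_-)\lambda + O(|\lambda^{3/2}|),$$

with

$$\gamma = -\frac{1}{c(0)}\det\begin{pmatrix}\phi_1^-(0;\lambda) & \bar u_x(0) & \phi_2^+(0;\lambda)\\ \phi_1^{-\prime}(0;\lambda) & \bar u_{xx}(0) & \phi_2^{+\prime}(0;\lambda)\\ \phi_1^{-\prime\prime}(0;\lambda) & \bar u_{xxx}(0) & \phi_2^{+\prime\prime}(0;\lambda)\end{pmatrix}.$$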

Proof. Observing that the solutions $\phi_k^\pm(x;\lambda)$ are analytic in the variable $\rho := \sqrt{\lambda}$, we set $D_a(\rho) := D(\lambda)$. We have, then,

$$\begin{aligned}
\partial_\rho D_a(0) &= W(\partial_\rho\phi_1^-,\phi_2^-,\phi_1^+,\phi_2^+) + W(\phi_1^-,\partial_\rho\phi_2^-,\phi_1^+,\phi_2^+)\\
&\quad+ W(\phi_1^-,\phi_2^-,\partial_\rho\phi_1^+,\phi_2^+) + W(\phi_1^-,\phi_2^-,\phi_1^+,\partial_\rho\phi_2^+)\\
&= W(\phi_1^-,\partial_\rho(\phi_2^- - \phi_1^+),\bar u_x(0),\phi_2^+),
\end{aligned}$$

where the terms for which neither $\phi_1^+$ nor $\phi_2^-$ are differentiated are 0 due to the choice of scaling (2.11). We observe here that the fast-decaying solutions $\phi_2^-$ and $\phi_1^+$ have been derived analytically in $\lambda = \rho^2$, and thus

$$\partial_\rho\phi_2^-(x;0) = 0, \qquad \partial_\rho\phi_1^+(x;0) = 0. \tag{2.18}$$

In this way, we see immediately that $\partial_\rho D_a(0) = 0$. Proceeding similarly for $\partial_{\rho\rho}D_a(\rho)$, we find

$$\begin{aligned}
\partial_{\rho\rho}D_a(0) &= 2W(\partial_\rho\phi_1^-(0;0),\partial_\rho(\phi_2^- - \phi_1^+)(0;0),\bar u_x(0),\phi_2^+(0;0))\\
&\quad+ 2W(\phi_1^-(0;0),\partial_\rho(\phi_2^- - \phi_1^+)(0;0),\bar u_x(0),\partial_\rho\phi_2^+(0;0))\\
&\quad+ 2W(\phi_1^-(0;0),\partial_\rho\phi_2^-(0;0),\partial_\rho\phi_1^+(0;0),\phi_2^+(0;0))\\
&\quad+ W(\phi_1^-(0;0),\partial_{\rho\rho}(\phi_2^- - \phi_1^+)(0;0),\bar u_x(0),\phi_2^+(0;0)).
\end{aligned}$$

Upon substitution of (2.18), all terms with only single partials are eliminated, and we have

$$\partial_{\rho\rho}D_a(0) = W(\phi_1^-(0;0),\partial_{\rho\rho}(\phi_2^- - \phi_1^+)(0;0),\bar u_x(0),\phi_2^+(0;0)).$$

In order to understand ρ derivatives of $\phi_2^-$ and $\phi_1^+$, we proceed similarly as in (2.13), setting

$$-(c(x)\phi_k^{\pm\prime\prime\prime})' + (b(x)\phi_k^{\pm\prime})' - (a(x)\phi_k^\pm)' = \rho^2\phi_k^\pm, \tag{2.19}$$

where prime here denotes differentiation with respect to x. Keeping in mind that $\phi_2^-$ and $\phi_1^+$ both decay at exponential rate even for λ = 0, we have, upon integration on (−∞, x],

$$-c(x)\phi_2^{-\prime\prime\prime} + b(x)\phi_2^{-\prime} - a(x)\phi_2^- = \rho^2 W_2^-,$$

where $W_2^-$ is as in (2.14), and similarly for $\phi_1^+$. Upon differentiation with respect to ρ, we find

$$-c(x)(\partial_\rho\phi_2^-)''' + b(x)(\partial_\rho\phi_2^-)' - a(x)(\partial_\rho\phi_2^-) = 2\rho W_2^- + \rho^2(\partial_\rho W_2^-),$$

with again a similar relationship for $\phi_1^+$. We next take a second ρ derivative of (2.19) to obtain, upon integration and evaluation at ρ = 0,

$$-c(x)(\partial_{\rho\rho}\phi_2^-)''' + b(x)(\partial_{\rho\rho}\phi_2^-)' - a(x)(\partial_{\rho\rho}\phi_2^-) = 2W_2^-(x;0).$$

Recalling that

$$W_2^-(x;\rho) := \int_{-\infty}^{x}\phi_2^-(y;\rho)\,dy,$$

we have

$$W_2^-(x;0) = \int_{-\infty}^{x}\bar u_y(y)\,dy = \bar u(x) - u_-,$$

and similarly

$$W_1^+(x;0) = \int_{+\infty}^{x}\bar u_y(y)\,dy = \bar u(x) - u_+.$$

We conclude

$$-c(x)(\partial_{\rho\rho}(\phi_2^- - \phi_1^+))''' + b(x)(\partial_{\rho\rho}(\phi_2^- - \phi_1^+))' - a(x)(\partial_{\rho\rho}(\phi_2^- - \phi_1^+)) = 2(u_+ - u_-).$$


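Finally, we have

$$\begin{aligned}
\partial_{\rho\rho}D_a(0) &= \det\begin{pmatrix}
\phi_1^-(0;0) & \partial_{\rho\rho}(\phi_2^- - \phi_1^+)(0;0) & \bar u_x(0) & \phi_2^+(0;0)\\
\phi_1^{-\prime}(0;0) & \partial_{\rho\rho}(\phi_2^- - \phi_1^+)'(0;0) & \bar u_{xx}(0) & \phi_2^{+\prime}(0;0)\\
\phi_1^{-\prime\prime}(0;0) & \partial_{\rho\rho}(\phi_2^- - \phi_1^+)''(0;0) & \bar u_{xxx}(0) & \phi_2^{+\prime\prime}(0;0)\\
\phi_1^{-\prime\prime\prime}(0;0) & \partial_{\rho\rho}(\phi_2^- - \phi_1^+)'''(0;0) & \bar u_{xxxx}(0) & \phi_2^{+\prime\prime\prime}(0;0)
\end{pmatrix}\\
&= \det\begin{pmatrix}
\phi_1^-(0;0) & \partial_{\rho\rho}(\phi_2^- - \phi_1^+)(0;0) & \bar u_x(0) & \phi_2^+(0;0)\\
\phi_1^{-\prime}(0;0) & \partial_{\rho\rho}(\phi_2^- - \phi_1^+)'(0;0) & \bar u_{xx}(0) & \phi_2^{+\prime}(0;0)\\
\phi_1^{-\prime\prime}(0;0) & \partial_{\rho\rho}(\phi_2^- - \phi_1^+)''(0;0) & \bar u_{xxx}(0) & \phi_2^{+\prime\prime}(0;0)\\
0 & -\frac{2}{c(0)}(u_+ - u_-) & 0 & 0
\end{pmatrix},
\end{aligned}$$

from which the claimed relationship is immediate.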

We now state for completeness a lemma asserting that (D) is known to hold in the case (1.1)–(1.3) with wave (1.4).

Lemma 2.6. For the case (1.1)–(1.3), and for the standing front (1.4), spectral condition (D) holds.

Proof. In order to verify the first assertion of (D), namely that the only zero that D(λ) has with non-negative real part is λ = 0, we first note that the integrated variable $w(x) = \int_{-\infty}^{x}\phi(y;\lambda)\,dy$ satisfies the integrated eigenvalue problem

$$\mathcal{L}w = \lambda w; \qquad \mathcal{L}w := \partial_x(-\partial_{xx} + 1 + V(x))w_x,$$

where $V(x) = -(3/2)\cosh^{-2}(x/2)$, and $\mathcal{L}$ is clearly a self-adjoint operator. Away from essential spectrum, the eigenvalues of $\mathcal{L}$ correspond with those of L, and so aside from the point λ = 0 (which is an eigenvalue of L but not of $\mathcal{L}$), our study reduces to that of $\mathcal{L}$, which has been considered in [5, 36, 38] (see additionally the alternative analysis of [10]). For completeness, we briefly review the observations of [5, 36, 38]. As pointed out in [36, 38], the middle operator

$$Mw := -w_{xx} + (1 + V(x))w$$

has been shown in [43] to have two isolated eigenvalues at 0 and $\frac{3}{4}$, and essential spectrum on the line [1, ∞). It is an immediate consequence of the Spectral Theorem (see e.g. [37], p. 360) that M is a (non-strictly) positive operator. In the event that λ is an eigenvalue of $\mathcal{L}$ (necessarily real), we have that for some eigenfunction w associated with λ that $(Mw_x)_x = \lambda w$. Upon multiplication by $\bar w$ and integration over $\mathbb{R}$, we have

$$-\int_{-\infty}^{+\infty}\bar w_x M w_x\,dx = \lambda\|w\|_{L^2}^2.$$

We can conclude by the positivity of M that λ ≤ 0. According to Lemma 2.5, the second assertion in stability criterion (D) is verified if γ ≠ 0. This can be checked by a careful study of solutions to the eigenvalue problem (1.17). In the case of (1.1)–(1.3), and profile (1.4), (1.17) becomes

$$-\phi_{xxxx} + (b(x)\phi)_{xx} = \lambda\phi, \tag{2.20}$$

with

$$b(x) = \frac{3}{2}\bar u(x)^2 - \frac{1}{2}; \qquad \bar u(x) = \tanh\frac{x}{2}.$$
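The spectral facts about the middle operator M quoted in this proof, together with the identity $b(x) = 1 + V(x)$ that links (2.20) to M, can be sanity-checked numerically. The sketch below is an illustration only: the candidate eigenfunctions $\mathrm{sech}^2(x/2)$ and $\mathrm{sech}(x/2)\tanh(x/2)$ are standard Pöschl–Teller bound states supplied here as an assumption (they are not written out in the text), and the second derivative is approximated by a central difference with a hypothetical step `h`.

```python
import math

def V(x):
    return -1.5 / math.cosh(x / 2) ** 2

def b(x):
    return 1.5 * math.tanh(x / 2) ** 2 - 0.5

def apply_M(f, x, h=1e-3):
    """(M f)(x) = -f''(x) + (1 + V(x)) f(x), with f'' by central differences."""
    second = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2
    return -second + (1.0 + V(x)) * f(x)

def ground_state(x):
    # Assumed eigenfunction for the eigenvalue 0 (proportional to the wave derivative).
    return 1.0 / math.cosh(x / 2) ** 2

def excited_state(x):
    # Assumed eigenfunction for the eigenvalue 3/4.
    return math.tanh(x / 2) / math.cosh(x / 2)

sample_points = [-3.0, -1.0, 0.5, 2.0]
residual_zero = max(abs(apply_M(ground_state, x)) for x in sample_points)
residual_three_quarters = max(
    abs(apply_M(excited_state, x) - 0.75 * excited_state(x)) for x in sample_points
)
potential_mismatch = max(abs(b(x) - (1.0 + V(x))) for x in sample_points)
```

All three residuals come out at the level of the finite-difference error, consistent with eigenvalues 0 and 3/4 and with $b = 1 + V$.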

In the case of $\phi_1^-(x;\lambda)$, (2.20) can be integrated on (−∞, x], and we obtain

$$-\phi^-_{1\,xxx} + (b(x)\phi_1^-)_x = \sqrt{\lambda}\,W_1^-(x;\lambda),$$

where as in the proof of Lemma 2.3,

$$W_1^-(x;\lambda) = e^{\mu_3^-(\lambda)x}\Big(\frac{\sqrt{\lambda}}{\mu_3^-(\lambda)} + O(|\lambda^{1/2}|e^{-\alpha|x|})\Big).$$

Setting λ = 0, we have

$$-\phi_1^-(x;0)_{xxx} + (b(x)\phi_1^-(x;0))_x = 0. \tag{2.21}$$

Proceeding similarly, we see that $\phi_2^+(x;0)$ solves precisely the same equation. We can solve (2.21) exactly by integrating once, and solving the equation

$$-\phi_{xx} + b(x)\phi = c_1.$$

Taking $c_1 = 1$, we find the three linearly independent solutions

$$\begin{aligned}
\phi_1^0(x) &= \bar u_x(x),\\
\phi_2^0(x) &= \bar u_x(x)\Big[2\sinh\frac{x}{2}\cosh^3\frac{x}{2} + 3\sinh\frac{x}{2}\cosh\frac{x}{2} + \frac{3}{2}x\Big],\\
\phi_3^0(x) &= \cosh^2\frac{x}{2}.
\end{aligned}$$

It follows that $\phi_1^-(x;0)$ and $\phi_2^+(x;0)$ must both be linear combinations of $\phi_2^0$ and $\phi_3^0$, where without loss of generality we can choose to remove a constant multiple of the exponentially decaying function $\bar u_x$ without affecting the estimates of Lemma 2.1. More precisely, we obtain a scaling that agrees with that of Lemma 2.1 by choosing

$$\phi_1^-(x;0) = -(\phi_2^0(x) + \phi_3^0(x)), \qquad \phi_2^+(x;0) = \phi_2^0(x) - \phi_3^0(x).$$

Computing directly, we find γ = 2. This concludes the proof of Lemma 2.6.
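The value γ = 2 can be reproduced numerically from the explicit solutions above and the determinant formula of Lemma 2.5. The sketch below assumes $c(0) = 1$ for the model case (consistent with the leading coefficient in (2.20)) and approximates the x-derivatives at 0 by central differences with a hypothetical step `h`; it is a check on the arithmetic, not a replacement for the computation in the text.

```python
import math

def u_bar_x(x):
    # Derivative of the wave tanh(x/2).
    return 0.5 / math.cosh(x / 2) ** 2

def phi2_0(x):
    s, c = math.sinh(x / 2), math.cosh(x / 2)
    return u_bar_x(x) * (2.0 * s * c**3 + 3.0 * s * c + 1.5 * x)

def phi3_0(x):
    return math.cosh(x / 2) ** 2

def phi1_minus(x):
    return -(phi2_0(x) + phi3_0(x))

def phi2_plus(x):
    return phi2_0(x) - phi3_0(x)

def jet(f, x=0.0, h=1e-3):
    """Value, first and second derivative of f at x (central differences)."""
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2
    return [f(x), d1, d2]

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def gamma(c0=1.0):
    # Columns: phi_1^-, u_bar_x, phi_2^+ ; rows: 0th, 1st, 2nd x-derivative at 0.
    cols = [jet(phi1_minus), jet(u_bar_x), jet(phi2_plus)]
    rows = [[col[k] for col in cols] for k in range(3)]
    return -det3(rows) / c0

gamma_value = gamma()
```

The exact entries at x = 0 are (−1, 1/2, −1), (−2, 0, 2) and (−1/2, −1/4, −1/2), whose determinant is −2, so the finite-difference value of `gamma_value` should agree with 2 up to discretization error.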

We conclude this section by combining the estimates of Lemmas 2.1–2.3 to obtain estimates on $G_\lambda(x,y)$ for values of |λ| sufficiently small. We have the following lemma.

Lemma 2.7. Under the assumptions of Theorem 1.1, and for $G_\lambda(x,y)$ as defined in (1.18), we have the following estimates. For $G_\lambda(x,y) = \tilde G_\lambda(x,y) + E_\lambda(x,y)$, and |λ| < r, for some suitably small constant r, there holds


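(i) For $y \le x \le 0$:
\[
\begin{aligned}
\tilde{G}_\lambda(x,y) &= c_1\lambda^{-1/2}\big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x}\big)e^{-\mu_2^-(\lambda)y} + O(|\lambda^{-1/2}|)e^{\mu_3^-(\lambda)(x+y)} + O(1)e^{\mu_2^-(\lambda)(x-y)},\\
\partial_y\tilde{G}_\lambda(x,y) &= c_1\frac{-\mu_2^-(\lambda)}{\lambda^{1/2}}\big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x}\big)e^{-\mu_2^-(\lambda)y} + O(1)e^{\mu_3^-(\lambda)(x+y)}\\
&\quad + O(|\lambda^{1/2}|)e^{\mu_2^-(\lambda)(x-y)} + O(|\lambda^{-1/2}|e^{-\eta|x|})e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x-y|}),\\
\partial_x\tilde{G}_\lambda(x,y) &= O(1)\big(e^{\mu_2^-(\lambda)x} + e^{\mu_3^-(\lambda)x}\big)e^{-\mu_2^-(\lambda)y} + O(|\lambda^{-1/2}|e^{-\eta|x|})e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x-y|}),\\
\partial_{xy}\tilde{G}_\lambda(x,y) &= O(|\lambda^{1/2}|)\big(e^{\mu_2^-(\lambda)x} + e^{\mu_3^-(\lambda)x}\big)e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x|})e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x-y|}).
\end{aligned}
\]

(ii) For $x \le y \le 0$:
\[
\begin{aligned}
\tilde{G}_\lambda(x,y) &= c_2\lambda^{-1/2}e^{-\mu_2^-(\lambda)x}\big(e^{\mu_2^-(\lambda)y} - e^{\mu_3^-(\lambda)y}\big) + O(|\lambda^{-1/2}|)e^{\mu_3^-(\lambda)(x+y)} + O(1)e^{\mu_3^-(\lambda)(x-y)},\\
\partial_y\tilde{G}_\lambda(x,y) &= O(1)e^{-\mu_2^-(\lambda)x}\big(e^{\mu_2^-(\lambda)y} + e^{\mu_3^-(\lambda)y}\big) + O(|\lambda^{-1/2}e^{-\eta|x|}|)e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x-y|}),\\
\partial_x\tilde{G}_\lambda(x,y) &= c_2\frac{-\mu_2^-(\lambda)}{\lambda^{1/2}}e^{-\mu_2^-(\lambda)x}\big(e^{\mu_2^-(\lambda)y} - e^{\mu_3^-(\lambda)y}\big) + O(1)e^{\mu_3^-(\lambda)(x+y)}\\
&\quad + O(|\lambda^{1/2}|)e^{\mu_3^-(\lambda)(x-y)} + O(|\lambda^{-1/2}e^{-\eta|x|}|)e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x-y|}),\\
\partial_{xy}\tilde{G}_\lambda(x,y) &= O(|\lambda^{1/2}|)e^{-\mu_2^-(\lambda)x}\big(e^{\mu_2^-(\lambda)y} + e^{\mu_3^-(\lambda)y}\big) + O(|\lambda^{1/2}|e^{-\eta|x|})e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x-y|}).
\end{aligned}
\]

(iii) For $y \le 0 \le x$:
\[
\begin{aligned}
\tilde{G}_\lambda(x,y) &= O(|\lambda^{-1/2}|)e^{\mu_2^+(\lambda)x - \mu_2^-(\lambda)y},\\
\partial_y\tilde{G}_\lambda(x,y) &= O(1)e^{\mu_2^+(\lambda)x - \mu_2^-(\lambda)y} + O(e^{-\eta|x|})e^{-\mu_2^-(\lambda)y},\\
\partial_x\tilde{G}_\lambda(x,y) &= O(1)e^{\mu_2^+(\lambda)x - \mu_2^-(\lambda)y} + O(|\lambda^{-1/2}|e^{-\eta|x|})e^{-\mu_2^-(\lambda)y},\\
\partial_{xy}\tilde{G}_\lambda(x,y) &= O(|\lambda^{1/2}|)e^{\mu_2^+(\lambda)x - \mu_2^-(\lambda)y} + O(e^{-\eta|x|})e^{-\mu_2^-(\lambda)y},
\end{aligned}
\]
and in all cases
\[
\begin{aligned}
E_\lambda(x,y) &= \frac{c_E}{\lambda}\bar{u}_x(x)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\eta|y|})\big),\\
\partial_y E_\lambda(x,y) &= c_E\frac{-\mu_2^-(\lambda)}{\lambda}\bar{u}_x(x)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\eta|y|})\big).
\end{aligned}
\]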

Remark 2.1. We note that the difference terms
\[
c_1\lambda^{-1/2}\big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x}\big), \tag{2.22}
\]
which clearly vanish at $x = 0$, arise naturally from our choice of $\psi_2^-(x;\lambda)$ and moreover are expected since the mass accumulating near $x = 0$ is recorded in the excited terms. However, since we additionally have a term
\[
O(|\lambda^{-1/2}|)e^{\mu_3^-(\lambda)(x+y)},
\]
there is no real advantage to be taken from the cancellation. In this way, our estimates do not seem quite as sharp as those of [38], obtained for a multidimensional generalization of (1.1)–(1.3). We would also mention that we will point out in the proof of Lemma 2.7 precisely why it is that $y$-derivatives of $E_\lambda(x,y)$ do not have a term that would correspond with differentiation of $e^{-\eta y}$.

Proof. We observe at the outset the relations
\[
\begin{aligned}
\partial_y N_1^-(y;\lambda) &= -C_2^+(\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_1^-)(y;\lambda)}{c(y)W_\lambda(y)} - D_2^+(\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_2^-)(y;\lambda)}{c(y)W_\lambda(y)},\\
\partial_y N_2^-(y;\lambda) &= C_1^+(\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_1^-)(y;\lambda)}{c(y)W_\lambda(y)} + D_1^+(\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_2^-)(y;\lambda)}{c(y)W_\lambda(y)}.
\end{aligned} \tag{2.23}
\]
Since proofs of the four estimates in each case are similar, we provide details only for the estimates on $y$-derivatives. The case of $y$-derivatives is chosen because there is a subtle point in this case in which the estimates of Lemma 2.2 do not quite suffice, and we must use (2.23) and the estimates of Lemma 2.3.

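Case (i): $y \le x \le 0$. For $y \le x \le 0$, we have
\[
\begin{aligned}
\partial_y G_\lambda(x,y) &= \phi_1^+(x;\lambda)\partial_y N_1^-(y;\lambda) + \phi_2^+(x;\lambda)\partial_y N_2^-(y;\lambda)\\
&= \Big[(A_2^+C_1^+ - A_1^+C_2^+)\phi_1^-(x) + (B_2^+C_1^+ - B_1^+C_2^+)\phi_2^-(x) + (D_2^+C_1^+ - D_1^+C_2^+)\psi_2^-(x)\Big]\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_1^-)(y)}{c(y)W_\lambda(y)}\\
&\quad + \Big[(A_2^+D_1^+ - A_1^+D_2^+)\phi_1^-(x) + (B_2^+D_1^+ - B_1^+D_2^+)\phi_2^-(x) + (D_1^+C_2^+ - D_2^+C_1^+)\psi_1^-(x)\Big]\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_2^-)(y)}{c(y)W_\lambda(y)}.
\end{aligned} \tag{2.24}
\]
We group these into three sets of terms, beginning with the leading order $\tilde{G}_\lambda$ expression
\[
\begin{aligned}
(D_2^+C_1^+ &- D_1^+C_2^+)\psi_2^-(x)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_1^-)(y)}{c(y)W_\lambda(y)}\\
&= \big(c_1\lambda^{-1/2} + O(1)\big)\big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x} + O(|\lambda^{1/2}|e^{-\alpha|x|})\big)e^{-\mu_2^-(\lambda)y}\big(-\mu_2^-(\lambda) + O(e^{-\tilde{\alpha}|y|})\big)\\
&= c_1\frac{-\mu_2^-(\lambda)}{\lambda^{1/2}}\big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x}\big)e^{-\mu_2^-(\lambda)y} + O(|\lambda^{1/2}|)e^{\mu_2^-(\lambda)(x-y)}\\
&\quad + O(|\lambda^{1/2}|e^{-\eta|x|})e^{-\mu_2^-(\lambda)y} + O(e^{-\eta(|x|+|y|)}).
\end{aligned}
\]
We note in particular that we have made critical use here of Lemma 2.4; otherwise, we have employed the estimates of Lemma 2.2 in straightforward fashion.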


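Next, we consider contributions to the excited term $\partial_y E_\lambda(x,y)$. In this case, we take
\[
\begin{aligned}
B_1^+(\lambda)\phi_2^-(x;\lambda)\partial_y N_1^-(y;\lambda) &= -B_1^+(\lambda)\phi_2^-(x;\lambda)\Big[C_2^+(\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_1^-)(y;\lambda)}{c(y)W_\lambda(y)} + D_2^+(\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_2^-)(y;\lambda)}{c(y)W_\lambda(y)}\Big]\\
&= \big(c_1\lambda^{-1/2} + O(1)\big)\big(\bar{u}_x(x) + O(|\lambda^{1/2}|e^{-\eta|x|})\big)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\eta|y|})\big)\\
&= c_1\lambda^{-1/2}\bar{u}_x(x)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\eta|y|})\big) + O(e^{-\eta|x|})e^{-\mu_2^-(\lambda)x}.
\end{aligned}
\]
It is of particular importance to note that the undifferentiated excited term is precisely the same thing with $y$-derivatives omitted. In this way, we see that $y$-differentiation improves the singularity of $E_\lambda(x,y)$ as $\lambda \to 0$. For the remaining terms, we group according to (2.23) in order to obtain
\[
\begin{aligned}
A_2^+(\lambda)\phi_1^-(x;\lambda)\partial_y N_2^-(y;\lambda) &+ A_1^+(\lambda)\phi_1^-(x;\lambda)\partial_y N_1^-(y;\lambda) + B_2^+(\lambda)\phi_2^-(x;\lambda)\partial_y N_2^-(y;\lambda)\\
&+ \big(D_2^+(\lambda)C_1^+(\lambda) - D_1^+(\lambda)C_2^+(\lambda)\big)\psi_1^-(x;\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_2^-)}{c(y)W_\lambda(y)}.
\end{aligned} \tag{2.25}
\]
For the first three terms in (2.25), we have an estimate by
\[
\big(c_1 + O(|\lambda^{1/2}|)\big)e^{\mu_3^-(\lambda)x - \mu_2^-(\lambda)y}\big(1 + O(e^{-\eta|x|})\big) = O(1)e^{\mu_3^-(\lambda)x - \mu_2^-(\lambda)y}.
\]
For the final term in (2.25), keeping in mind Lemma 2.4, we have
\[
\big(D_2^+(\lambda)C_1^+(\lambda) - D_1^+(\lambda)C_2^+(\lambda)\big)\psi_1^-(x;\lambda)\frac{\partial_y W(\phi_1^-,\phi_2^-,\psi_2^-)}{c(y)W_\lambda(y)} = O(1)e^{\mu_1^-(\lambda)(x-y)}. \tag{2.26}
\]

Case (ii): $x \le y \le 0$. For $x \le y \le 0$, we have
\[
\partial_y G_\lambda(x,y) = \phi_1^-(x;\lambda)\partial_y N_1^+(y;\lambda) + \phi_2^-(x;\lambda)\partial_y N_2^+(y;\lambda). \tag{2.27}
\]
Beginning again with the leading order $\tilde{G}_\lambda$ estimate, we have
\[
\phi_1^-(x;\lambda)\partial_y N_1^+(y;\lambda) = O(1)e^{\mu_3^-(\lambda)(x-y)}, \tag{2.28}
\]
while for the excited and correction terms, we compute
\[
\begin{aligned}
\phi_2^-(x;\lambda)\partial_y N_2^+(y;\lambda) &= \big(\bar{u}_x(x) + O(|\lambda^{1/2}|e^{-\eta|x|})\big)\Big[\big(k_1\lambda^{-1/2} + O(1)\big)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\alpha|y|})\big) + O(1)e^{-\mu_4^-(\lambda)y}\Big]\\
&= k_1\lambda^{-1/2}\bar{u}_x(x)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\alpha|y|})\big) + O(e^{-\eta|x|})e^{-\mu_2^-(\lambda)y} + O(e^{-\eta|x-y|}).
\end{aligned}
\]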


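For the excited term, we take
\[
\partial_y E_\lambda(x,y) = k_1\lambda^{-1/2}\bar{u}_x(x)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\alpha|y|})\big).
\]

Case (iii): $y \le 0 \le x$. For $y \le 0 \le x$, we have
\[
\partial_y G_\lambda(x,y) = \phi_1^+(x;\lambda)\partial_y N_1^-(y;\lambda) + \phi_2^+(x;\lambda)\partial_y N_2^-(y;\lambda). \tag{2.29}
\]
For the leading order $\tilde{G}_\lambda$ term, we take
\[
\phi_2^+(x;\lambda)\partial_y N_2^-(y;\lambda) = O(1)e^{\mu_2^+(\lambda)x - \mu_2^-(\lambda)y},
\]
while for the excited and correction terms, we compute
\[
\phi_1^+(x;\lambda)\partial_y N_1^-(y;\lambda) = \big(\bar{u}_x(x) + O(|\lambda^{1/2}|e^{-\eta|x|})\big) \times \big(n_1\lambda^{-1/2} + O(1)\big)e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\tilde{\alpha}|y|})\big).
\]
For the excited term, we take
\[
\partial_y E_\lambda(x,y) = n_1\bar{u}_x(x)\lambda^{-1/2}e^{-\mu_2^-(\lambda)y}\big(1 + O(e^{-\tilde{\alpha}|y|})\big).
\]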

The remaining cases can be established similarly.
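The estimates above are driven by the slow rates $\mu_2^-(\lambda)$ and $\mu_3^-(\lambda) = -\mu_2^-(\lambda)$, whose small-$\lambda$ expansion $\mu_2^-(\lambda) = -\sqrt{\lambda/b_-} + O(|\lambda^{3/2}|)$ is used repeatedly in Section 3. The sketch below illustrates that expansion under an assumption that is not stated in this section: that the rates are the small roots of the constant-coefficient characteristic equation $-\mu^4 + b_-\mu^2 = \lambda$, obtained from (2.20) by replacing $b(x)$ with its limit $b_- = b(-\infty) = 1$. The function names are hypothetical.

```python
import cmath

def slow_decay_rate(lam, b_minus=1.0):
    # Small root of -mu^4 + b_minus*mu^2 = lam with non-positive real part
    # (assumed model for mu_2^-). mu^2 = (b - sqrt(b^2 - 4*lam))/2 is written
    # in the equivalent form 2*lam/(b + sqrt(...)) to avoid cancellation.
    disc = cmath.sqrt(b_minus ** 2 - 4.0 * lam)
    mu_squared = 2.0 * lam / (b_minus + disc)
    return -cmath.sqrt(mu_squared)

def scaled_remainder(lam, b_minus=1.0):
    # |mu_2^-(lam) + sqrt(lam/b_-)| / |lam|^(3/2); stays bounded as lam -> 0
    # if the remainder is O(|lam|^(3/2)).
    leading = -cmath.sqrt(lam / b_minus)
    return abs(slow_decay_rate(lam, b_minus) - leading) / abs(lam) ** 1.5

def characteristic_residual(lam, b_minus=1.0):
    # How well the computed rate satisfies the assumed characteristic equation.
    mu = slow_decay_rate(lam, b_minus)
    return abs(-mu ** 4 + b_minus * mu ** 2 - lam)

if __name__ == "__main__":
    for lam in (1e-2, 1e-3, 1e-4):
        print(lam, scaled_remainder(lam))
```

For $b_- = 1$ the scaled remainder tends to $1/2$ as $\lambda \to 0$, in line with the next term $-\lambda^{3/2}/(2b_-^{5/2})$ of the expansion of the assumed root.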

Lemma 2.8. Suppose the conditions of Theorem 1.1 hold, and $G_\lambda(x,y)$ is as defined in (1.18). Then for $|\lambda| \ge R$, some $R$ sufficiently large, and for $\lambda$ to the right of the contour $\Gamma_s$ defined in Remark 1.1, we have the following estimates. For some $\beta > 0$ and for multi-index $\alpha$ in $x$ and $y$,
\[
|\partial^\alpha G_\lambda(x,y)| \le C|\lambda^{-\frac{3-k}{4}}|e^{-\beta|\lambda^{1/4}||x-y|}, \qquad |\alpha| \le 3.
\]

Remark 2.2. Regarding the proof of Lemma 2.8, we note that large $|\lambda|$ behavior corresponds with small $t$ behavior, for which the fourth order effects dominate. Consequently, the proof of Lemma 2.8 is almost precisely the same as that of the corresponding Lemma 3.2 of [25], carried out in the case of multidimensional equations
\[
u_t + \sum_{j=1}^{d} f^j(u)_{x_j} = -\sum_{jklm}\big(b^{jklm}(u)u_{x_jx_kx_l}\big)_{x_m}.
\]
We omit it here.

Lemma 2.9. Suppose the conditions of Theorem 1.1 hold, and $G_\lambda(x,y)$ is as defined in (1.18). Then for $r \le |\lambda| \le R$, $r$ as in Lemma 2.7 and $R$ as in Lemma 2.8, and for $\lambda$ to the right of the contour $\Gamma_s$ defined in Remark 1.1, we have the following estimates. For some constant $C$ sufficiently large, and for multi-index $\alpha$ in $x$ and $y$,
\[
|\partial^\alpha G_\lambda(x,y)| \le C, \qquad |\alpha| \le 3.
\]


3. Proof of Theorem 1.1

In this section, we obtain estimates on the Green's function $G(t,x;y)$ through the inverse Laplace transform representation
\[
G(t,x;y) = \frac{1}{2\pi i}\int_\Gamma e^{\lambda t}G_\lambda(x,y)\,d\lambda, \tag{3.1}
\]
where $\Gamma$ denotes a contour that lies to the right of the point spectrum of $L$ and loops through the point at $\infty$. The validity of (3.1) can be established in a straightforward manner from the estimates of Lemmas 2.7, 2.8, and 2.9. See, in particular, Corollary 7.4 of [52].

3.1. Small time estimates. In the case $|x-y| \ge Kt$, for some $K$ sufficiently large, and also in the case of $t \le 1$ (in fact, for any fixed finite bound), behavior of the Green's function is entirely dominated by fourth order effects. In this case, the analysis of [25], carried out in the case of multidimensional equations
\[
u_t + \sum_{j=1}^{d} f^j(u)_{x_j} = -\sum_{jklm}\big(b^{jklm}(u)u_{x_jx_kx_l}\big)_{x_m},
\]
extends immediately to the current setting and we can conclude, as there,
\[
|\partial^\alpha G(t,x;y)| \le Ct^{-\frac{1+|\alpha|}{4}}e^{-\frac{|x-y|^{4/3}}{Mt^{1/3}}},
\]

where $\alpha$ denotes a standard multi-index in $x$ and $y$ with $|\alpha| \le 3$. The proof is carried out through the estimates of Lemma 2.8 and representation (3.1).

3.2. Large time estimates. In the case $|x-y| \le Kt$, $K$ as in Sect. 3.1, we proceed predominately through the estimates of Lemma 2.7, employing Lemmas 2.8 and 2.9 only for integration over $|\lambda| \ge r$, which corresponds roughly with the non-critical cases of small and medium range times. As the analyses of several of the terms in Lemma 2.7 are quite similar, we will proceed by giving full details only for the most critical cases and pointing out the salient points for the remainder.

Case (i), $y \le x \le 0$. For the case $y \le x \le 0$, we begin with the leading term
\[
c_1\lambda^{-1/2}\big(e^{\mu_2^-(\lambda)x} - e^{\mu_3^-(\lambda)x}\big)e^{-\mu_2^-(\lambda)y},
\]
for which we first evaluate
\[
\frac{c_1}{2\pi i}\int_\Gamma \frac{e^{\lambda t + \mu_2^-(\lambda)(x-y)}}{\sqrt{\lambda}}\,d\lambda.
\]
Recalling the expansion
\[
\mu_2^-(\lambda) = -\sqrt{\frac{\lambda}{b_-}} + O(|\lambda^{3/2}|),
\]
we have that for $|\lambda^{3/2}(x-y)| \le \epsilon$, $\epsilon > 0$ sufficiently small, there holds
\[
e^{\lambda t + \mu_2^-(\lambda)(x-y)} = e^{\lambda t - \sqrt{\lambda/b_-}\,|x-y|}\big(1 + O(|\lambda^{3/2}(x-y)|)\big).
\]
We focus first, then, on integrals
\[
\frac{c_1}{2\pi i}\int_\Gamma \frac{e^{\lambda t - \sqrt{\lambda/b_-}\,(x-y)}}{\sqrt{\lambda}}\,d\lambda,
\]
and treat the higher order terms later as small corrections. For $|x-y| \le \epsilon_1 t$, $\epsilon_1 > 0$ sufficiently small, we take the contour
\[
\sqrt{\frac{\lambda}{b_-}} = \frac{|x-y|}{2tb_-} + ik. \tag{3.2}
\]
Our approach will be to follow the contour (3.2) until we strike the contour $\Gamma_s$ of Remark 1.1, and from that point to follow $\Gamma_s$ out to the point at $\infty$, employing the estimates of Lemmas 2.8 and 2.9. We denote the truncated contour taken prior to intersection with $\Gamma_s$ as $\Gamma^*$, and let $\pm k^*$ denote the values of $k$ at which $\Gamma^*$ strikes $\Gamma_s$. We remark before proceeding that in the case $\epsilon_1 t \le |x-y| \le Kt$, we take the similar contour
\[
\sqrt{\frac{\lambda}{b_-}} = \frac{(x-y)}{Lt} + ik, \tag{3.3}
\]
where $L$ is chosen sufficiently large, and we have
\[
\lambda(k) = b_-\frac{(x-y)^2}{L^2t^2} + 2ikb_-\frac{x-y}{Lt} - b_-k^2; \qquad d\lambda = 2i\sqrt{b_-}\sqrt{\lambda}\,dk.
\]
In this case, we lose the precise kernel scaling (as stated for the leading order $\tilde{G}$ term in Theorem 1.1), but with $|x-y| \ge \epsilon_1 t$, we have exponential decay in $t$, and the term can be subsumed into the higher order estimates. Returning to our main contour (3.2), we observe the relationships
\[
\lambda(k) = \frac{(x-y)^2}{4b_-t^2} + ik\frac{x-y}{t} - b_-k^2; \qquad d\lambda = 2i\sqrt{b_-}\sqrt{\lambda}\,dk. \tag{3.4}
\]
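The relationships (3.4) can be checked numerically by integrating along the full line (3.2), without truncation at $\pm k^*$, and comparing against the Gaussian kernel $\frac{c_1}{\sqrt{\pi t}}e^{-(x-y)^2/(4b_-t)}$ that they produce. The following is a minimal sketch under stated assumptions: `dist` stands for $x - y > 0$, the values of $t$, `dist`, $b_-$ and $c_1$ are arbitrary illustrations, the principal branch of the square root is used (valid because the real part of (3.2) is positive), and the contour integral is approximated by the trapezoid rule on $k \in [-k_{\max}, k_{\max}]$.

```python
import cmath
import math

def contour_integral(t, dist, b_minus=1.0, c1=1.0, k_max=12.0, n=6000):
    # (c1 / 2 pi i) * integral of exp(lam*t - sqrt(lam/b_-)*dist) / sqrt(lam) dlam
    # along sqrt(lam/b_-) = dist/(2 t b_-) + i k, parametrized by k.
    a = dist / (2.0 * t * b_minus)
    h = 2.0 * k_max / n
    total = 0.0 + 0.0j
    for j in range(n + 1):
        k = -k_max + j * h
        z = complex(a, k)                      # sqrt(lam / b_-) on the contour
        lam = b_minus * z * z
        dlam_dk = 2j * math.sqrt(b_minus) * cmath.sqrt(lam)   # second relation in (3.4)
        integrand = (cmath.exp(lam * t - cmath.sqrt(lam / b_minus) * dist)
                     / cmath.sqrt(lam) * dlam_dk)
        weight = 0.5 if j in (0, n) else 1.0   # trapezoid end-point weights
        total += weight * integrand * h
    return c1 * total / (2j * math.pi)

def gaussian_kernel(t, dist, b_minus=1.0, c1=1.0):
    # Closed form obtained from the Gaussian integral in k.
    return c1 / math.sqrt(math.pi * t) * math.exp(-dist ** 2 / (4.0 * b_minus * t))

def lambda_on_contour(k, t, dist, b_minus=1.0):
    # First relation in (3.4).
    return dist ** 2 / (4.0 * b_minus * t ** 2) + 1j * k * dist / t - b_minus * k ** 2

if __name__ == "__main__":
    print(contour_integral(2.0, 0.7), gaussian_kernel(2.0, 0.7))
```

On the contour the exponent reduces to $-(x-y)^2/(4b_-t) - b_-k^2t$, so the oscillatory factors cancel exactly and the quadrature converges very quickly.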

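In this way,
\[
\frac{c_1}{2\pi i}\int_{\Gamma^*} \frac{e^{\lambda t - \sqrt{\lambda/b_-}\,|x-y|}}{\sqrt{\lambda}}\,d\lambda = \frac{c_1\sqrt{b_-}}{\pi}e^{-\frac{(x-y)^2}{4b_-t}}\int_{-k^*}^{+k^*} e^{-b_-k^2t}\,dk = \frac{c_1}{\sqrt{\pi t}}e^{-\frac{(x-y)^2}{4b_-t}}, \tag{3.5}
\]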

wherein we arrive at the sharp leading order estimate of Theorem 1.1. Along the contour s , we clearly have exponential decay in t, which in the case |x − y| ≤ K t gives an estimate which can be subsumed into the higher order terms (see the more detailed remarks in the paragraph labeled higher order corrections). Proceeding similarly in the case λt−μ− (λ)x−μ− (λ)y 3 2 c1 e dλ, √ 2πi λ


− and keeping in mind that μ− 2 (λ) = −μ3 (λ), we determine as above an expression

c1 − (x+y)2 √ e 4b− t , πt plus higher order corrections. Higher order corrections. We next consider in more detail the higher order corrections left over from the estimates above. First, we have the O(|λ3/2 |) correction, arising from integrals λt− bλ− (x−y) e O(|λ3/2 (x − y)|)dλ. √ λ Proceeding with the contour (3.3), we have

e

λt− bλ (x−y)

∗

−

√ λ

O(|λ3/2 (x − y)|)dλ

+k ∗

|x − y|3 | + |k 2 ||x − y|)dk t2 −k ∗ |x − y|3 (x−y)2 |x − y| (x−y)2 |x − y| = O( ) + O( 3/2 ) e− 2Lt = O( 3/2 )e− 4Lt . 2 t t t Also, regarding the correction terms from the scattering estimates, we note that in the case |x − y| ≤ K t, we have ≤ Ce−

(x−y)2 2Lt

e−b− k t O(| 2

|x − y| |x − y| |x − y|2 |x − y| ≥ = , K K Kt K 2t so that exponential decay in time, which is always the case along the contour s , gives the broadened kernel decay. Second and third summands in G˜ λ (x, y). The second estimate on G˜ λ (x, y) in the case y ≤ x ≤ 0 can be analyzed precisely as above. Though the third can also be analyzed in a similar manner, we carry out a portion of the calculation in order to indicate one important feature of most of the derivative estimates. We have − 1 O(1)eλt+μ2 (λ)(x−y) dλ. 2πi t≥

Proceeding as above along the contour (3.2) (for |λ| suitably small and to first order in |λ1/2 |), we have 2 +k ∗ |x − y| 1 − (x−y) 2 e 4b− t O(1)e−b− k t 2i + ik dk. 2πi 2tb− −k ∗ For the first term in the square brackets, we immediately obtain the sought estimate by 2 |x − y| (x−y) 4b− t , e t 3/2 subject to higher order corrections as discussed above. For the second term in the square brackets, a normed estimate provides the slightly worse estimate by

C

Ct −1 e

(x−y)2 4b− t

.


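The point we would like to make here is that the analyticity of our $O(1)$ term in $\rho = \sqrt{\lambda}$ allows us to take advantage of the cancellation
\[
\int_{-k^*}^{+k^*} e^{-b_-k^2t}\,k\,dk = 0.
\]
In this way, the final estimate on this term takes the form
\[
O\Big(\frac{|x-y|}{t^{3/2}}e^{-\frac{|x-y|^2}{Mt}}\Big).
\]

Case (iii), $y \le 0 \le x$. In the case $y \le 0 \le x$, we have the leading term
\[
O(|\lambda^{-1/2}|)e^{\mu_2^+(\lambda)x - \mu_2^-(\lambda)y},
\]
which differs from the case $x \le y \le 0$ only in the different forms of $\mu_2^+(\lambda)$ and $\mu_2^-(\lambda)$. Recalling the expansions
\[
\mu_2^-(\lambda) = -\sqrt{\frac{\lambda}{b_-}} + O(|\lambda^{3/2}|), \qquad \mu_2^+(\lambda) = -\sqrt{\frac{\lambda}{b_+}} + O(|\lambda^{3/2}|),
\]
we write
\[
\mu_2^+(\lambda)x - \mu_2^-(\lambda)y = -\sqrt{\frac{\lambda}{b_+}}\,x + \sqrt{\frac{\lambda}{b_-}}\,y + O\big(|\lambda^{3/2}|(|x| + |y|)\big).
\]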

We focus on the primary contribution, b− b+ λ λ λ λt− (x− y) 1 1 λt− x+ y b+ b− b− b+ b− O(|λ−1/2 |)e dλ = O(|λ−1/2 |)e dλ, 2πi 2πi which can be analyzed precisely as in the previous cases if (for |x − bb−+ y| ≤ t, some suitably small) we take the contour |x − bb−+ y| λ + ik, = b− 2b− t through which we obtain the estimate by

O(t −1/2 )e

−

2 x− bb+ y − 4b− t

.

b+ b−

Similarly as in the previous cases, for |x − y| ≥ t, we take the crude contour b+ − y x b− λ + ik, = b− Lt for a suitably large choice of L. We finally remark that estimates on the excited terms can be obtained similarly, as can estimates on differentiated terms.


4. Proof of Theorem 1.2

In this section, we combine the estimates of Theorem 1.1 with the integral representations (1.14) and (1.15), and corresponding integral representations for $v_x$ and $\dot{\delta}(t)$ found by direct differentiation, to determine estimates on the perturbation $v(t,x)$ and the local tracking function $\delta(t)$. To this end, we state two lemmas corresponding with estimates on the linear and nonlinear integrals in (1.14) and (1.15).

Lemma 4.1. For $G(t,x;y)$ as described in Theorem 1.1, and for $v_0(y)$ as described in Theorem 1.2, there holds
\[
\begin{aligned}
\Big|\int_{-\infty}^{+\infty} \tilde{G}(t,x;y)v_0(y)\,dy\Big| &\le CE_0(1 + |x| + \sqrt{t})^{-3/2},\\
\Big|\int_{-\infty}^{+\infty} \tilde{G}_x(t,x;y)v_0(y)\,dy\Big| &\le CE_0\Big[t^{-1/4}(1+t)^{-1/4}(1 + |x| + \sqrt{t})^{-3/2} + (1+t)^{-1/4}e^{-\eta|x|}\Big],\\
\int_{-\infty}^{+\infty} |e_y(t,y)||V_0(y)|\,dy &\le C(1+t)^{-1/4},\\
\int_{-\infty}^{+\infty} |e_{yt}(t,y)|V_0(y)\,dy &\le C(1+t)^{-5/4}.
\end{aligned}
\]

Lemma 4.2. For G(t, x; y) as in Theorem 1.1, and √ (s, y) = s −3/4 (1 + s)−1/2 (1 + |y| + s)−3/2 + s −3/4 (1 + s)−3/4 e−η|y| , there holds t +∞ 0

t 0

−∞ +∞

−∞

t 0

t 0

|G˜ y (t − s, x; y)|(s, y)dyds ≤ C(1 + |x| +

√ −3/2 t) ,

|G˜ yx (t − s, x; y)|(s, y)dyds ≤ Ct −1/4 (1 + t)−1/4 (1 + |x| + +∞ −∞ +∞

−∞

√ −3/2 t) ,

|e y (t − s, y)|(s, y)dyds ≤ C(1 + t)−1/2 ,

|e yt (t − s, y)|(s, y)dyds ≤ C(1 + t)−1 .

We note that the expressions in (s, y) follow from the linear estimates of Lemma 4.1 and the nonlinearites ˙ O(|vvx |), O(|vvx x x |), O(|e−η|x| v 2 |), O(|δ(t)v|). In addition to Lemma 4.1 and Lemma 4.2, we require the following lemma regarding small time behavior of the perturbation v(t, x). The salient points are that (1) for small t, the behavior of v(t, x) is dominated by fourth order effects, and (2) the behavior of derivatives vx , vx x , and vx x x can be linked to the behavior of v. Lemma 4.3. Under the assumptions of Theorem 1.1, and under the additional restriction of Hölder continuity on the initial perturbation v0 ∈ C 0+γ , γ > 0, the integral γ equations (1.14) and (1.15) determine a unique local solution v ∈ C 0+ 4 (t) ∩ C 0+γ (x),


γ

δ ∈ C 1+ 4 (t), extending so long as |v|C 0+γ remains bounded. Moreover, on this time interval √ sup |v(t, x)|(1 + |x| + t)3/2 x∈R

˙ + 1) are uniformly bounded, and for τ > 0 remains continuous so long as both it and δ(t sufficiently small and t ≥ τ , √ √ sup |∂xk v(t, x)|(1 + |x| + t)3/2 ≤ Cτ −k/4 sup |v(t − τ, x)|(1 + |x| + t − τ )3/2 , x∈R

x∈R

for k = 1, 2, 3, and −1 √ sup |∂xk+1 v(t, x)| t −1/4 (1 + t)−1/4 (1 + |x| + t)−3/2 + (1 + t)−3/4 e−η|x| x∈R

√ ≤ Cτ −k/4 sup |∂x v(t − τ, x)| (t − τ )−1/4 (1 + (t − τ ))−1/4 (1 + |x| + t − τ )−3/2 x∈R

+ (1 + (t − τ ))−3/4 e−η|x|

−1

,

for $k = 1, 2$.

We proceed now by defining the iteration variable
\[
\zeta(t) = \sup_{s\in[0,t],\,y\in\mathbb{R}}\bigg[\frac{|v(s,y)|}{(1 + |y| + \sqrt{s})^{-3/2}} + \frac{|v_y(s,y)|}{s^{-1/4}(1+s)^{-1/4}(1 + |y| + \sqrt{s})^{-3/2} + (1+s)^{-3/4}e^{-\eta|y|}} + \frac{|\delta(s)|}{(1+s)^{-1/4}} + \frac{|\dot{\delta}(s)|}{(1+s)^{-5/4}}\bigg]. \tag{4.1}
\]

Claim 4.1. Suppose there exists a constant $C$ so that
\[
\zeta(t) \le C(E_0 + \zeta(t)^2), \tag{4.2}
\]
where $E_0$ is as in Theorem 1.2. Then for $E_0$ sufficiently small, there holds $\zeta(t) < 2CE_0$.

Proof of Claim 4.1. We first observe that we have control over $\zeta(0)$ directly from the definition of $\zeta(t)$. Recalling the relation $|v(0,y)| \le E_0(1 + |y|)^{-3/2}$, we see immediately that the first quotient in the definition of $\zeta(0)$ is bounded by $E_0$. In addition, according to Lemma 4.3 (with $k = 1$) there holds
\[
\sup_{y\in\mathbb{R}}\frac{|v_y(s,y)|}{(1 + |y| + \sqrt{s})^{-3/2}} \le C\tau^{-1/4}\sup_{y\in\mathbb{R}}\frac{|v(s-\tau,y)|}{(1 + |y| + \sqrt{s-\tau})^{-3/2}},
\]
for $s \ge \tau$, $\tau$ suitably small, so that
\[
\lim_{s\to 0}\,\sup_{y\in\mathbb{R}}\frac{|v_y(s,y)|}{s^{-1/4}(1+s)^{-1/4}(1 + |y| + \sqrt{s})^{-3/2}} \le C\sup_{y\in\mathbb{R}}\frac{|v(0,y)|}{(1 + |y|)^{-3/2}} \le CE_0.
\]


Finally, we can choose $\delta(t)$ in a smooth fashion so that $\delta(0) = 0$ and $\dot{\delta}(0) = 0$. In this way, we have $\zeta(0) \le C_1E_0$, for a constant $C_1$. We proceed now by choosing $E_0$ sufficiently small so that $C_1^2E_0 < 1$ and $4C^2E_0 < 1$. First, this choice ensures
\[
\zeta(0) \le C(E_0 + \zeta(0)^2) \le C(E_0 + C_1^2E_0^2) < 2CE_0.
\]
Next, let $T$ denote the first time (if it exists) for which we have equality $\zeta(T) = 2CE_0$. We have, then,
\[
\zeta(T) \le C(E_0 + \zeta(T)^2) = C(E_0 + 4C^2E_0^2) < 2CE_0,
\]
a contradiction. We observe that continuity of $\zeta(t)$, critical for our argument, is clear from the short time estimates of Lemma 4.3. This concludes the proof of Claim 4.1.

We proceed now to verify the assumption of Claim 4.1 through a straightforward calculation involving the integral representations (1.14) and (1.15), combined with the small time estimates of Lemma 4.3. In particular, for the variable $v_{xxx}$, which is not carried through the iteration, we employ Lemma 4.3 to write
\[
|v_{yyy}(s,y)| \le Cs^{-1/2}\Big[s^{-1/4}(1+s)^{-1/4}(1 + |y| + \sqrt{s})^{-3/2} + (1+s)^{-3/4}e^{-\eta|y|}\Big]\zeta(0).
\]
Focusing on the case $v(t,x)$, we have
\[
\begin{aligned}
|v(t,x)| &\le \Big|\int_{-\infty}^{+\infty}\tilde{G}(t,x;y)v_0(y)\,dy\Big| + \int_0^t\int_{-\infty}^{+\infty}|\tilde{G}_y(t-s,x;y)|\Big(|Q(y,v,v_y,v_{yyy})| + |\dot{\delta}(s)v|\Big)\,dy\,ds\\
&\le CE_0(1 + |x| + \sqrt{t})^{-3/2}\\
&\quad + \zeta(t)^2\int_0^t\int_{-\infty}^{+\infty}|\tilde{G}_y(t-s,x;y)|\Big[s^{-3/4}(1+s)^{-1/2}(1 + |y| + \sqrt{s})^{-3/2} + s^{-3/4}(1+s)^{-3/4}e^{-\eta|y|}\Big]\,dy\,ds\\
&\le C(E_0 + \zeta(t)^2)(1 + |x| + \sqrt{t})^{-3/2}.
\end{aligned}
\]
In this way, we have
\[
\frac{|v(t,x)|}{(1 + |x| + \sqrt{t})^{-3/2}} \le C_1E_0(1 + \zeta(t)^2).
\]
Taking a supremum norm over both sides of this last expression, and noting that $\zeta(t)$ is a non-decreasing function, we conclude
\[
\sup_{s\in[0,t],\,x\in\mathbb{R}}\frac{|v(s,x)|}{(1 + |x| + \sqrt{s})^{-3/2}} \le C_1E_0(1 + \zeta(t)^2).
\]

Proceeding similarly for $v_x$, $\delta(t)$, and $\dot{\delta}(t)$, we conclude $\zeta(t) \le CE_0(1 + \zeta(t)^2)$, from which Theorem 1.2 follows from Claim 4.1.
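The continuity argument behind Claim 4.1 can be illustrated with a few lines of arithmetic. The sketch below, with illustrative values of $C$ and $E_0$ that are not taken from the paper, computes the smaller root $z_-$ of $Cz^2 - z + CE_0 = 0$. When $4C^2E_0 < 1$, the inequality $z \le C(E_0 + z^2)$ holds on $[0, z_-]$ and fails on an open interval containing $2CE_0$, so a continuous $\zeta$ that starts in $[0, z_-]$ and satisfies (4.2) cannot reach $2CE_0$.

```python
import math

def admissible_threshold(C, E0):
    # Smaller root z_- of C*z^2 - z + C*E0 = 0, written in the
    # cancellation-free form 2*C*E0 / (1 + sqrt(1 - 4*C^2*E0)).
    disc = 1.0 - 4.0 * C * C * E0
    if disc <= 0.0:
        raise ValueError("smallness condition 4*C**2*E0 < 1 is violated")
    return 2.0 * C * E0 / (1.0 + math.sqrt(disc))

def satisfies_hypothesis(z, C, E0):
    # The bootstrap inequality (4.2) evaluated at a candidate value z of zeta.
    return z <= C * (E0 + z * z)

if __name__ == "__main__":
    C, E0 = 3.0, 0.01          # illustrative constants with 4*C^2*E0 = 0.36 < 1
    z_minus = admissible_threshold(C, E0)
    print(z_minus, 2.0 * C * E0)
    print(satisfies_hypothesis(0.5 * z_minus, C, E0))     # inside [0, z_-]
    print(satisfies_hypothesis(2.0 * C * E0, C, E0))      # excluded value
```

The excluded value $2CE_0$ is exactly the contradiction used in the proof: at $\zeta(T) = 2CE_0$ the right-hand side of (4.2) is $CE_0(1 + 4C^2E_0) < 2CE_0$.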


5. Proof of Technical Lemmas

In this section, we prove Lemma 4.1 and Lemma 4.2. The proof of Lemma 4.3 is almost identical to that of Lemma 3.4 of [26] and is omitted. At the outset, we recall the notation
\[
G(t,x;y) = \tilde{G}(t,x;y) + E(t,x;y), \qquad E(t,x;y) = \bar{u}_x(x)e(t,y).
\]

Proof of Lemma 4.1. We divide the analysis up into Cases I and II from Theorem 1.1.

Case I, $|x-y| \ge Kt$ or $t \le 1$. For $|x-y| \ge Kt$ or for $t \le 1$, and for the first estimate of Lemma 4.1, we have integrals
\[
\int_{-\infty}^{+\infty} t^{-1/4}e^{-\frac{|x-y|^{4/3}}{Mt^{1/3}}}(1 + |y|)^{-3/2}\,dy,
\]
where for simplicity we have extended the range of $y$ over all of $\mathbb{R}$. We first observe that for $|x-y| \ge Kt$, the kernel decays at exponential rate in $t$, and we have
\[
e^{-\frac{K^{4/3}}{2M}t}\int_{-\infty}^{+\infty} t^{-1/4}e^{-\frac{|x-y|^{4/3}}{2Mt^{1/3}}}(1 + |y|)^{-3/2}\,dy.
\]
(For $t \le 1$, we need not keep track of $t$ decay.) We now divide the integration over $y$ into the subinterval $y \in [-|x|/2, |x|/2]$ and its complement, for which we compute
\[
\begin{aligned}
\int_{-\frac{|x|}{2}}^{+\frac{|x|}{2}} t^{-1/4}e^{-\frac{|x-y|^{4/3}}{2Mt^{1/3}}}(1 + |y|)^{-3/2}\,dy &+ \int_{[-\frac{|x|}{2},\frac{|x|}{2}]^c} t^{-1/4}e^{-\frac{|x-y|^{4/3}}{2Mt^{1/3}}}(1 + |y|)^{-3/2}\,dy\\
&\le C_1t^{-1/4}e^{-\frac{|x|^{4/3}}{M2^{7/3}t^{1/3}}} + C_2(1 + |x|)^{-3/2},
\end{aligned} \tag{5.1}
\]

where the expressions on the final right-hand side are respective estimates, the first obtained by integration of (1 + |y|)−3/2 and the second obtained through integration of the L 1 kernel. Finally, we note that for t ≥ |x|, we have exponential decay in both t and |x| from the term exp(−K 4/3 /(2M)t), while for |x| ≥ t, the kernel decay in (5.1) gives exponential decay in |x|. Combining these observations, we obtain a much better estimate than the one claimed; in fact, the estimate here is already sufficient to give the second estimate of Lemma 4.1 as well. (The analysis is clearly driven by the case |x − y| ≤ K t.) Case II, |x − y| ≤ K t and t ≥ 1. For the case |x − y| ≤ K t and t ≥ 1, we first observe through integrating by parts and taking advantage of our zero-mass condition on v0 that we have sufficient decay in t. In order to see this more precisely, we write +∞ +∞ ˜ x; y)v0 (y)dy = ˜ y (t, x; y)V0 (y)dy , G(t, G −∞

−∞

where V0 (y) =

y

−∞

v0 (z)dz,

and consider the primary G˜ y contribution

(x−y)2 |x − y| − (x−y)2 e Mt = O(t −1 )e− 2Mt . O 3/2 t

(5.2)


For this contribution, we estimate +∞ (x−y)2 t −1 e− 2Mt (1 + |y|)−1/2 dy −∞

=

√ + t √ − t

t

−1 − (x−y) 2Mt

2

e

(1 + |y|)

−1/2

dy +

√ |y|≥ t

t −1 e−

(x−y)2 2Mt

(1 + |y|)−1/2 dy ≤ Ct −3/4 ,

which gives the required decay in time. (The calculations for other contributions to G˜ are almost identical.) In order to establish the appropriate decay in |x|, we divide the integration up as +∞ ˜ x; y)v0 (y)dy G(t, −∞

=

+ |x| 2

− |x| 2

˜ x; y)v0 (y)dy + G(t,

|x| c [− |x| 2 ,+ 2 ]

˜ x; y)v0 (y)dy. G(t,

(5.3)

For the first integral on the right-hand side of (5.3), we integrate by parts to obtain ˜ x; |x| )V0 (|x|/2)− G(t, ˜ x; − |x| )V0 (−|x|/2)− G(t, 2 2

+ |x| 2 − |x| 2

G˜ y (t, x; y)V0 (y)dy. (5.4)

In all cases, the first two (boundary) expressions in (5.4) provide the estimate
\[
Ct^{-1/2}e^{-\frac{x^2}{Lt}}(1 + |x|)^{-1/2},
\]
for some constants $C$ and $L$ sufficiently large. We easily obtain the claimed decay in $|x|$ upon observing (for $|x| \ge 1$)
\[
t^{-1/2}e^{-\frac{x^2}{Lt}} = t^{-1/2}\frac{|x|}{|x|}e^{-\frac{x^2}{Lt}} \le C_1|x|^{-1}e^{-\frac{x^2}{2Lt}}. \tag{5.5}
\]
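The inequality (5.5) trades half of the Gaussian decay for a factor $|x|/\sqrt{t}$, using that $s\,e^{-s^2/(2L)}$ is bounded in $s = |x|/\sqrt{t}$. The sketch below makes the constant explicit as $C_1 = \sqrt{L/e}$ (the maximum of $s\,e^{-s^2/(2L)}$, attained at $s = \sqrt{L}$); this explicit value is our own computation rather than a constant stated in the paper, and the grid of test values is arbitrary.

```python
import math

def left_side(t, x, L):
    # t^(-1/2) * exp(-x^2 / (L t)), the boundary-term factor in (5.5).
    return t ** -0.5 * math.exp(-x * x / (L * t))

def right_side(t, x, L):
    # C_1 |x|^(-1) exp(-x^2 / (2 L t)) with the explicit choice C_1 = sqrt(L / e).
    return math.sqrt(L / math.e) / abs(x) * math.exp(-x * x / (2.0 * L * t))

def inequality_holds(t, x, L, slack=1e-12):
    # Allows a tiny relative slack for floating-point rounding at equality.
    return left_side(t, x, L) <= right_side(t, x, L) * (1.0 + slack)

if __name__ == "__main__":
    L = 4.0
    for t in (0.1, 1.0, 10.0, 100.0):
        for x in (1.0, 2.0, 5.0, 10.0, 50.0):
            print(t, x, inequality_holds(t, x, L))
```

The same device gives inequalities of the type (5.9), where a factor $(x-y)/(t-s)^{1/2}$ is absorbed at the cost of halving the exponent of the Gaussian kernel.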

For the final expression in (5.4), we again specialize to the case (5.2), for which we have 2

e

x − 4Mt

+ |x| 2

− |x| 2

t −1 e−

(x−y)2 4Mt

x2

(1 + |y|)−1/2 dy ≤ Ct −1 (1 + |x|)1/2 e− 4Mt ,

which can be shown by an argument similar to (5.5) to give decay |x|−3/2 . For the second integral on the right-hand side of (5.3), we observe that in all cases integrability of G˜ immediately gives the spatial decay (1 + |x|)−3/2 . These two decay rates (1 + t)−3/4 and (1 + |x|)−3/2 are sufficient to give the claimed estimate. The analysis of the second integral in Lemma 4.1 is almost identical to that of the first. We note here only that it is the expression O(

y

t

y2

e−η|x| )e− Mt 3/2

that gives rise to the exponentially decaying estimate.


Excited estimates. For the third estimate in Lemma 4.1, we integrate by parts and estimate (specializing to the case y ≤ 0) 0

2

C E − 4by t e − (1 + |y|)−1/2 dy √ −∞ 4b− t −√t 0 2 2 C E − 4by t C E − 4by t = e − (1 + |y|)−1/2 dy + √ √ e − (1+|y|)−1/2 dy ≤ Ct −1/4 . √ 4b− t 4b− t −∞ − t

For the fourth estimate in Lemma 4.1, we explicitly compute the time derivative of e y (t, y) as " e yt (t, y) =

2 # 2C E b− − 4by 2 t C E y2 − 4by t − − (1 + O(e−η|y| )), − e + e (4b− t)3/2 (4b− )3/2 t 5/2

from which the primary contribution is

0

−∞

t −3/2 e

− 4by

2

−t

(1 + |y|)−1/2 dy ≤ Ct −5/4 .

˙ bounded as t → 0. By virtue of the alternative Case I estimate for t ≤ 1, we can take δ(t) This concludes the proof of Lemma 4.1. Proof of Lemma 4.2. For Lemma 4.2, we have two nonlinearities to consider, the summands of (s, y). For each estimate on G(t, x; y), we will carry out detailed estimates only for the first of these summands, noting only that the analysis of the second is almost identical. Proceeding as in the proof of Lemma 4.1, we first divide the analysis into Cases I and II. Case I, |x − y| ≥ K (t − s) or t − s ≤ 1. For |x − y| ≥ K (t − s) or t − s ≤ 1, and for the first estimate of Lemma 4.2, we have integrals t 0

+∞

−∞

(t − s)−1/2 e

−

|x−y|4/3 M(t−s)1/3

(s, y)dyds,

where for simplicity we have extended the range of integration over all of R. For the first nonlinearity in (s, y), we have integrals t 0

+∞

−∞

4/3

(t − s)

|x−y| −1/2 − M(t−s)1/3 −3/4

e

s

(1 + s)−1/2 (1 + |y| +

√

s)−3/2 dyds.

(5.6)

Proceeding similarly as in Case I of Lemma 4.1, we first observe that for |x − y| ≥ K (t − s), the kernel decays at exponential rate in (t − s), and we have 0

t

e−

K 4/3 2M (t−s)

+∞

−∞

(t − s)−1/2 e

−

|x−y|4/3 2M(t−s)1/3

s −3/4 (1 + s)−1/2 (1 + |y| +

√

s)−3/2 dyds.


|x| We now divide the integration over y into the subintervals y ∈ [− |x| 2 , 2 ] and its complement, for which we compute

t

e−

K 4/3 2M (t−s)

0

|x| 2

− |x| 2

t K 4/3 + e− 2M (t−s) 0

≤ C1 ≤ C2

t

0

−

|x−y|4/3 2M(t−s)1/3

s −3/4 (1 + s)−1/2 (1 + |y| +

√ −3/2 s) dyds

|x−y|4/3

|x| − |x| 2 , 2

√ − c (t − s)−1/2 e 2M(t−s)1/3 s −3/4 (1+s)−1/2 (1+|y|+ s)−3/2 dyds −

|x|4/3 27/3 M(t−s)1/3

e−

K 4/3 2M (t−s) (t

− s)−1/4 e

e−

K 4/3 2M (t−s) (t

− s)−1/4 s −3/4 (1 + s)−1/2 (1 + |x| +

0

t

(t − s)−1/2 e

s −3/4 (1 + s)−1/2 (1 +

√ −3/2 s) ds

√ −3/2 s) ds.

(5.7)

In both of the last two estimates of (5.7), we can integrate the exponential decay in (t − s). We observe that for s ∈ [0, t/2], we have exponential decay in t, while for s ∈ [t/2, t], we have algebraic decay at rate (1 + t)−2 . In the event that |x| ≥ t, we clearly have a sufficient estimate due to the |x| decay, while for |x| ≤ t, we have a sufficient estimate due to the t decay. We proceed in almost identical fashion for the remaining nonlinearity. Again, the analysis is driven by the case |x − y| ≤ K (t − s). Case II, |x − y| ≤ K (t − s) and (t − s) ≥ 1. For the case |x − y| ≤ K (t − s), and for (t − s) ≥ 1, we proceed through the observation that in Cases (i) and (ii) of Theorem 1.1, we have the crude estimate (for t ≥ 1) (x−y)2 G˜ y (t, x; y) ≤ C(1 + t)−3/2 (1 + |x − y|)e− Mt , with additionally |G˜ x y (t, x; y)| ≤ C(1 + t)−3/2 e−

(x−y)2 Mt

y2

+ (1 + t)−3/2 (1 + |y|)e−η|x| e− Mt ,

while for Case (iii) we have |G˜ y (t, x; y)| ≤ C(1 + t)−3/2 (1 + |x − y|)e−

(x−y)2 Mt

y2

+ C(1 + t)−3/2 (1 + |y|)e−η|x| e− Mt ,

with additionally |G˜ x y (t, x; y)| ≤ C(1 + t)−3/2 e−

(x−y)2 Mt

y2

+ (1 + t)−3/2 (1 + |y|)e−η|x| e− Mt .

In Cases (i) and (ii), and for the first summand of (s, y), we consider integrals t 0

0

−∞

(x−y)2

(1 + (t − s))−3/2 (1 + |x − y|)e− M(t−s) s −3/4 (1 + s)−1/2 (1 + |y| +

√

s)−3/2 dyds.

(5.8) For t decay, we divide the integration over s into the subintervals s ∈ [0, t/2] and s ∈ [t/2, t]. Observing the inequality 2 (x−y)2 x − y − (x−y) e M(t−s) ≤ Ce− 2M(t−s) , 1/2 (t − s)

(5.9)


we determine an estimate by t/2 C1 (1 + t)−1 0

0 −∞

s −3/4 (1 + s)−1/2 (1 + |y| +

+C2 t −3/4 (1 + t)−5/4 ≤ C3 (1 + t)

−1

t/2


t

0

t/2 −∞

(1 + (t − s))−1 e−

s −3/4 (1 + s)−1/2 (1 +

0

+C2 t −3/4 (1 + t)−5/4

t

0

t/2 −∞

√ −3/2 s) dyds (x−y)2 Mt

dyds

√ −1/2 s) ds

(1 + (t − s))−1/2 ds

≤ C(1 + t)−1 .

(5.10)

For |x| decay, we divide the integration into subintervals of y, y ∈ (−∞, x/2] and y ∈ [x/2, 0]. For y ∈ (−∞, x/2], we observe (5.9) and integrate the heat kernel to obtain an estimate by t C(1 + |x|)−3/2 (1 + (t − s))−1/2 s −3/4 (1 + s)−1/2 ds ≤ C(1 + |x|)−3/2 . 0

Alternatively, for y ∈ [x/2, 0], we observe that we have kernel decay x2

e− Lt , and can otherwise proceed precisely as in (5.10) to obtain an estimate by x2

C(1 + t)−1 e− Lt . This last estimate is sufficient by the argument of (5.5). For the second (x-derivative) estimate of Lemma 4.2 in Cases (i) and (ii), we have two Green’s kernels to consider, of which we begin with integrals t 0 (x−y)2 √ (1 + (t − s))−3/2 e− M(t−s) s −3/4 (1 + s)−1/2 (1 + |y| + s)−3/2 dyds. (5.11) 0

−∞

For t decay, we divide the integration over s into subintervals s ∈ [0, t/2] and s ∈ [t/2, t] and obtain an estimate by t/2 √ C1 (1 + t)−3/2 s −3/4 (1 + s)−1/2 (1 + s)−1/2 ds 0 t √ + C2 t −3/4 (1 + t)−1/2 (1 + t)−3/2 (1 + (t − s))−1 ds t/2 −3/2

≤ C(1 + t)

.

For |x| decay, we divide the integration into subintervals of y, y ∈ (−∞, x/2] and y ∈ [x/2, 0]. For y ∈ (−∞, x/2], we integrate the heat kernel to obtain an estimate by t C(1 + |x|)−3/2 (1 + (t − s))−1 s −3/4 (1 + s)−1/2 ds ≤ C(1 + t)−1 (1 + |x|)−3/2 . 0


Alternatively, for y ∈ [x/2, 0], our t-decay argument leads to an estimate by x2

C(1 + t)−3/2 e− Lt , which is sufficient by the argument of (5.5). For the second Green’s kernel, we have integrals t 0 y2 √ (1 + (t − s))−3/2 (1 + |y|)e−η|x| e− M(t−s) s −3/4 (1 + s)−1/2 (1 + |y| + s)−3/2 dyds. 0

−∞

(5.12) In this case, we always have exponential decay in |x|, and so we focus entirely on temporal decay. Dividing the integration over s into subintervals, we obtain an estimate by t 0 y2 C1 e−η|x| (1 + t)−3/2 (1 + |y|)−1/2 e− M(t−s) s −3/4 (1 + s)−1/2 dyds −∞

0

+ C2 e

−η|x| −3/4

t

(1 + t)

−1/2

√ (1 + t)−3/2

t 0

≤ C(1 + t)−5/4 e−η|x| ,

0 −∞

y2

(1 + (t − s))−1 e− 2M(t−s) dyds

much better than required. The case y ≤ 0 ≤ x can be analyzed similarly as were the previous cases. Excited estimates. We last consider the third and fourth estimates of Lemma 4.2. For the third, and for the first summand in (s, y), we estimate t 0 2 √ CE − y % e 4b− (t−s) s −3/4 (1 + s)−1/2 (1 + |y| + s)−3/2 dyds 4b− (t − s) 0 −∞ t/2 t √ ≤ C1 t −1/2 s −3/4 (1 + s)−1/2 (1 + s)−1/2 ds + C2 t −3/4 (1 + t)−5/4 ds 0

≤ Ct

−1/2

t/2

,

(5.13)

where the seeming blow-up as t → 0 can be eliminated by proceeding alternatively for t small. For the fourth estimate in Lemma 4.2, and for the first summand of (s, y), we estimate t−1 0 y2 √ (t − s)−3/2 e− M(t−s) s −3/4 (1 + s)−1/2 (1 + |y| + s)−3/2 dyds 0

−∞

≤ C1 t

−3/2

t/2

s 0

−3/4

(1 + s)

−3/4

+ C2 t

−3/4

(1 + t)

−5/4

t−1

(t − s)−1 ds

t/2

≤ Ct −3/2 , where we can eliminate the seeming blow-up as t → 0 by proceeding alternatively for t small, and where the integration over s ∈ [t − 1, t] involves the small time estimates on G(t, x; y) and can be carried out as in Case I. This concludes the proof of Lemma 4.2. Acknowledgements. This research was partially supported by the National Science Foundation under Grant No. DMS–0500988.


References

1. Alexander, J., Gardner, R., Jones, C.K.R.T.: A topological invariant arising in the analysis of traveling waves. J. Reine Angew. Math. 410, 167–212 (1990)
2. Brin, L.: Numerical testing of the stability of viscous shock waves. Math. Comp. 70(235), 1071–1088 (2001)
3. Bricmont, J., Kupiainen, A.: Renormalization group and the Ginzburg–Landau equation. Commun. Math. Phys. 150, 193–208 (1992)
4. Bricmont, J., Kupiainen, A., Lin, G.: Renormalization group and asymptotics of solutions of nonlinear parabolic equations. Comm. Pure Appl. Math. 47, 893–922 (1994)
5. Bricmont, J., Kupiainen, A., Taskinen, J.: Stability of Cahn–Hilliard fronts. Comm. Pure Appl. Math. Vol. LII, 839–871 (1999)
6. Bertozzi, A.L., Münch, A., Shearer, M.: Undercompressive shocks in thin film flows. Physica D 134, 431–464 (1999)
7. Bertozzi, A.L., Münch, A., Shearer, M., Zumbrun, K.: Stability of compressive and undercompressive thin film traveling waves: The dynamics of thin film flow. European J. Appl. Math. 12, 253–291 (2001)
8. Bogoliubov, N.N., Shirkov, D.V.: The theory of quantized fields. New York: Interscience, 1959
9. Cahn, J.W.: On spinodal decomposition. Acta Metall. 9, 795–801 (1961)
10. Carlen, E.A., Carvalho, M.C., Orlandi, E.: A simple proof of stability of fronts for the Cahn–Hilliard equation. Commun. Math. Phys. 224, 323–340 (2001)
11. Cahn, J.W., Hilliard, J.E.: Free energy of a nonuniform system I. Interfacial free energy. J. Chem. Phys. 28, 258–267 (1958)
12. Novick-Cohen, A., Segel, L.A.: Nonlinear aspects of the Cahn–Hilliard equation. Physica D 10, 277–298 (1984)
13. Dodd, J.: Convection stability of shock profile solutions of a modified KdV–Burgers equations. Thesis under the direction of R. L. Pego, University of Maryland, 1996
14. Evans, J.W.: Nerve Axon Equations I–IV. Indiana U. Math. J. 21, 877–885 (1972); 22, 75–90 (1972); 22, 577–594 (1972); 24, 1169–1190 (1975)
15. Freistühler, H., Szmolyan, P.: Spectral stability of small shock waves. Arch. Ration. Mech. Anal. 164, 287–309 (2002)
16. Gao, H., Liu, C.: Instability of traveling waves of the convective–diffusive Cahn–Hilliard equation. Chaos, Solitons & Fractals 20, 253–258 (2004)
17. Goldenfeld, N., Martin, O., Oono, Y.: Asymptotics of the renormalization group. In: Asymptotics beyond all orders, Proceedings of a NATO Advanced Research Workshop on Asymptotics Beyond all Orders, Segur, H., Tanveer, S., Levine, H. eds, New York: Plenum Press, 1991, pp. 375–383
18. Goldenfeld, N., Martin, O., Oono, Y., Lin, F.: Anomalous dimensions and the renormalization group in a nonlinear diffusion process. Phys. Rev. Lett. 64, 1361–1364 (1990)
19. Guès, O., Métivier, G., Williams, M., Zumbrun, K.: Multidimensional viscous shocks. I. Degenerate symmetrizers and long time stability. J. Amer. Math. Soc. 18(1), 61–120 (2005) (electronic)
20. Goldenfeld, N., Oono, Y.: Renormalization group theory for two problems in linear continuum mechanics. Phys. A 177, 213–219 (1991)
21. Gardner, R., Zumbrun, K.: The Gap Lemma and geometric criteria for instability of viscous shock profiles. Comm. Pure Appl. Math. 51(7), 797–855 (1998)
22. Howard, P.: Pointwise estimates on the Green's function for a scalar linear convection–diffusion equation. J. Differ. Eqs. 155, 327–367 (1999)
23. Howard, P.: Pointwise estimates and stability for degenerate viscous shock waves. J. Reine Angew. Math. 545, 19–65 (2002)
24. Howard, P.: Local tracking and stability for degenerate viscous shock waves. J. Differ. Eqs. 186, 440–469 (2002)
25. Howard, P., Hu, C.: Pointwise Green's function estimates toward stability for multidimensional fourth order viscous shock fronts. J. Differ. Eqs. 218, 325–389 (2005)
26. Howard, P., Hu, C.: Nonlinear stability for multidimensional fourth order shock fronts. To appear in Arch. Rational Mech. Anal., DOI: 10.1007/s00205-005-0409-y, 2006
27. Hoff, D., Zumbrun, K.: Green's function bounds for multidimensional scalar viscous shock fronts. J. Differ. Eqs. 183, 368–408 (2002)
28. Hoff, D., Zumbrun, K.: Asymptotic behavior of multidimensional scalar viscous shock fronts. Indiana U. Math. J. 49, 427–474 (2000)
29. Howard, P., Raoofi, M.: Pointwise asymptotic behavior of perturbed viscous shock profiles. To appear in Advances in Differential Equations
30. Howard, P., Raoofi, M., Zumbrun, K.: Sharp pointwise bounds for perturbed viscous shock profiles. J. Hyperbolic Diff. Eqs. 3, 297–373 (2006)
31. Humpherys, J., Sandstede, B., Zumbrun, K.: Efficient computation of analytic bases in Evans function analysis of large systems. Preprint 2005

808

P. Howard

32. Humpherys, J., Zumbrun, K.: Spectral stability of small amplitude shock profiles for dissipative symmetric hyperbolic–parabolic systems. Z. Angew. Math. Phys. 53, 20–34 (2002) 33. Humpherys, J., Zumbrun, K.: An efficient shooting algorithm for Evans function calculations in large systems. http://arxiv.org/list/math.NA/0508020, 2005 34. Howard, P., Zumbrun, K.: Pointwise estimates and stability for dispersive–diffusive shock waves. Arch. Rational Mech. Anal. 155, 85–169 (2000) 35. Jones, C.K.R.T.: Stability of the traveling wave solution of the FitzHugh–Nagumo system. Trans. Amer. Math. Soc. 286(2), 431–469 (1984) 36. Korvola, T.: Stability of Cahn–Hilliard fronts in three dimensions. Doctoral dissertation, University of Helsinki, 2003 37. Kato, T.: Perturbation theory for linear operators. 2nd Edition, Berlin–Heidelberg–New York: SpringerVerlag, 1980 38. Korvola, T., Kupiainen, A., Taskinen, J.: Anomalous scaling for three-dimensional Cahn–Hilliard fronts. Comm. Pure Appl. Math. Vol. LVIII, 1–39 (2005) 39. Kapitula, T., Rubin, J.: Existence and stability of standing hole solutions to complex Ginzburg–Landau equations, Nonlinearity 13, 77–112 (2000) 40. Kapitula, T., Sandstede, B.: Stability of bright solitary-wave solutions to perturbed nonlinear Schrödinger equations. Physica D 124, 58–103 (1998) 41. Liu, T.–P.: Nonlinear stability of shock waves for viscous conservation laws. Memoirs AMS 56(328) (1985) 42. Liu, T.–P.: Pointwise convergence to shock waves for viscous conservation laws. Comm. Pure Appl. Math. 50(11), 1113–1182 (1997) 43. Landau, L.D., Lifshitz, E.M.: Quantum Mechanics. 3rd Ed, New York: Pergamon, 1981 44. Murray, J.D.: Mathematical Biology. Vol. 19 of Biomathematics, New York: Springer-Verlag, 1989 45. Mascia, C., Zumbrun, K.: Pointwise Green function bounds for shock profiles of systems with real viscosity. Arch. Rational Mech. Anal. 169, 177–263 (2003) 46. 
Oh, M., Zumbrun, K.: Stability of periodic solutions of viscous conservation laws: Analysis of the Evans function. Arch. Rat. Mech. Anal. 166, 99–166 (2003) 47. Oh, M., Zumbrun, K.: Stability of periodic solutions of conservation laws with viscosity: pointwise bounds on the Green function. Arch. Ration. Mech. Anal. 166(2), 167–196 (2003) 48. Pego, R.L., Weinstein, M.I.: Eigenvalues and instabilities of solitary waves. Philos. Trans. Roy. Soc. London Ser. A 340, 47–94 (1992) 49. Raoofi, M.: L p asymptotic behavior of perturbed viscous shock profiles. J. Hyperbolic Diff. Eqs. 2, 595–644 (2005) 50. Shinozaki, A., Oono, Y.: Dispersion relation around the kink solution of the Cahn–Hilliard equation. Phys. Rev. E 47, 804–811 (1993) 51. Zumbrun, K.: Multidimensional stability of planar viscous shock waves. In: Advances in the theory of shock waves, Progr. Nonlinear Differential Equations Appl. 47, Boston, MA: Birkhauser Boston, 2001, pp. 307–516 52. Zumbrun, K., Howard, P.: Pointwise semigroup methods and stability of viscous shock waves. Indiana Math. J. 47, 741–871 (1998) Communicated by P. Constantin

Commun. Math. Phys. 269, 809–831 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0146-6

Communications in

Mathematical Physics

Spin, Statistics, and Reflections II. Lorentz Invariance

Bernd Kuckert, Reinhard Lorenzen

II. Institut für Theoretische Physik, Luruper Chaussee 149, 22761 Hamburg, Germany. E-mail: [email protected]; [email protected]

Received: 22 December 2005 / Accepted: 18 July 2006
Published online: 14 November 2006 – © Springer-Verlag 2006

Dedicated to Professor H.-J. Borchers on the occasion of his 80th birthday

Abstract: The analysis of the relation between modular $P_1$CT-symmetry (a consequence of the Unruh effect) and Pauli's spin-statistics relation is continued. The result in the predecessor to this article is extended to the Lorentz symmetric situation. A model $G_L$ of the universal covering $\widetilde{L_+^\uparrow} \cong SL(2,\mathbb{C})$ of the restricted Lorentz group $L_+^\uparrow$ is constructed as a reflection group at the classical level. Based on this picture, a representation of $G_L$ is constructed from pairs of modular $P_1$CT-conjugations, and this representation can easily be verified to satisfy the spin-statistics relation.

1. Introduction

The spin-statistics connection and the search for its conceptual roots have been a prominent object of investigation in quantum field theory for decades; we refer to Refs. [18, 15, and 14] for detailed discussions of the literature in this field. Spectacular success has been achieved in deriving the spin-statistics connection from standard properties of quantum fields, but there has always remained some dissatisfaction because these results did not really dig up the physical roots of the principle. Recently, an angular-momentum additivity condition has been established as sufficient and necessary for Pauli's spin-statistics connection in quantum mechanics [15, 16], but this result does not include quantum fields, which will be of interest here. In particular, the analysis was confined to finite-component fields, which entails a strong assumption on the representation of the Lorentz group one wishes to investigate. There are, however, infinite-component quantum fields that are covariant under representations violating Pauli's spin-statistics connection [17]. What is more, the confinement to finite-component fields is of a purely technical nature; there is no evident physical motivation except the fact that it is met in practically all applications. So despite the merits of the old results, one agreed that some work remained to be done.

Emmy-Noether fellow of the Deutsche Forschungsgemeinschaft


In the 1990s, it was realized by several authors [9, 10, 12, 11] that the spin-statistics connection could be derived from the Unruh effect [20, 1, 2]. This phenomenon, which states that a uniformly accelerated observer experiences the vacuum of a quantum field as a thermal state, has recently been derived from basic stability properties of vacuum states [13]. The Unruh effect, in turn, implies an intrinsic form of $P_1$CT-symmetry, i.e., covariance under conjugations in charge, time, and one spatial direction [10]. These conjugations can be extracted from the algebra of field operators in an elementary intrinsic fashion invented by Tomita and Takesaki [19, 3]. This symmetry is referred to as modular $P_1$CT-symmetry.

In 1994/95, two spin-statistics theorems were obtained by Guido and Longo on the one hand [10] and by one of us on the other [12]. Guido and Longo derived the spin-statistics theorem from the Unruh effect, and their result applies to a large class of quantum field theories, including fields with an infinite number of components and massless fields. On the other hand, the result found in Ref. [12] merely assumes modular $P_1$CT-symmetry in the vacuum sector of a Haag-Kastler theory of observables, and it deduces the spin-statistics connection for massive single-particle states, which give rise to topological charges. The elements of the representation of the (homogeneous) symmetry group are products of two modular conjugations each, which yields an elementary algebraic argument. But massless fields are not included, and the setting prevents the observables from being covariant under more than one representation of the Poincaré group. So one result covers a larger class of fields by making stronger symmetry assumptions, whereas the other one minimizes the symmetry assumptions by considering a smaller class of fields.
The result presented in this article joins the advantages of these two approaches: Only modular $P_1$CT-symmetry is assumed, and the result covers fields satisfying an absolute minimum of standard assumptions. Not even covariance under a representation of the Lorentz group needs to be assumed from the outset; this representation will be constructed from the modular $P_1$CT-operators. As an important prerequisite, a model of the universal covering group of the restricted Lorentz group will be constructed first. The model is a kind of reflection group.

In Ref. [14], a model $G_R$ of the universal covering group $\widetilde{SO(3)} \cong SU(2)$ has been constructed from pairs of reflections at planes in $\mathbb{R}^3$. Considering a general quantum field theory, it was assumed that modular $P_1$CT-conjugations existed for all reflections along spacelike vectors in a fixed time-zero plane. This symmetry assumption has been shown to be sufficient to construct a covariant representation of $G_R$, and it is elementary to see that this representation exhibits the spin-statistics relation.

For the restricted Lorentz group $L_+^\uparrow$, such a representation has been constructed earlier by Buchholz, Dreyer, Florig and Summers [5, 8]. In a more recent article, Buchholz and Summers have given a much more straightforward proof [6]. The short cut found there was the decisive indication that a similar result could also be obtained for the universal covering $\widetilde{L_+^\uparrow} \cong SL(2,\mathbb{C})$, the goal being the generalization of the first derivation of the spin-statistics theorem already obtained in Ref. [12]. Some of their arguments will play an important role at the end of this paper, where such a generalization is established.

This article will be subdivided as follows. In Sect. 1.1, some preliminaries will be discussed; in Sect. 1.2, the construction of the covering group $G_L$ will be outlined in terms of definitions and statements, which will be proved in Sect. 2.
The construction of $G_L$ will be applied when proving a most general spin-statistics theorem for relativistic quantum fields, which is done in Sect. 3. In Sect. 4, it is shown that modular $P_1$CT-symmetry
implies full PCT-symmetry as well and how the present result is related to the results of Guido and Longo obtained in Ref. [10]. The article ends with a conclusion.

1.1. Preliminaries. Let $\mathbb{R}^{1+3}$ be the Minkowski spacetime with three spatial dimensions, denote by $g(\cdot,\cdot)$ its Lorentz metric $(x, y) \mapsto g(x, y) =: xy$, by $V_+$ the open forward light cone,¹ by $M_1^+$ the hyperboloid $\{x \in V_+ : x^2 = 1\}$, and by $H_1$ the spacelike unit hyperboloid $\{x \in \mathbb{R}^{1+3} : x^2 = -1\}$.

The Lorentz group $L$ has four connected components. The connected component $L_+^\uparrow =: L_1$ containing the unit element 1 is a subgroup of $L$ called the restricted Lorentz group. All $\mu \in L_1$ satisfy $\det \mu = 1$ and $\mu V_+ = V_+$. The fixed-point set $FP(\mu)$ of any $\mu \in L_1$ is a linear subspace of $\mathbb{R}^{1+3}$ with zero, one, two, or four dimensions. We call $\mu \in L_1$ a generalized boost if $FP(\mu)$ contains a two-dimensional spacelike subspace, and we call $\mu$ a generalized rotation if $FP(\mu)$ contains a two-dimensional timelike subspace.

The usual notions of boost and rotation require the choice of a time vector $e_0 \in M_1^+$ and its time-zero plane $e_0^\perp$. A generalized boost $\mu$ is a boost with respect to $e_0$ if $FP(\mu) \subset e_0^\perp$, and a generalized rotation $\mu$ is a rotation with respect to $e_0$ if $FP(\mu)^\perp \subset e_0^\perp$. For each generalized rotation or boost $\mu$ there is more than one $e_0 \in M_1^+$ with respect to which $\mu$ is a rotation or boost, respectively. Note that the unit element is both a generalized boost and a generalized rotation and that the fixed-point sets of all other generalized rotations and boosts are two-dimensional.

Crucial for the analysis to follow is the fact that each element of $L_1$ is a concatenation of two orthogonal reflections at two-dimensional spacelike planes.² As in Ref. [14], where the corresponding analysis was carried out for the simpler case of rotational symmetry, a simply connected covering of $L_1$ will now be constructed by endowing these planes with an orientation. There are several equivalent and useful descriptions of the set $O$ of oriented spacelike planes.

1. Rindler wedges. The spacelike complement $S' = \{x \in \mathbb{R}^{1+3} : (x - s)^2 < 0 \text{ for all } s \in S\}$ of a spacelike plane $S$ has two connected components, each of which specifies an orientation on $S$. These components are wedges and have been named after W. Rindler, who endowed them with a spacetime structure of their own. The geodesic observers in this spacetime structure are those observers in $\mathbb{R}^{1+3}$ that are uniformly accelerated perpendicular to $S$. The boundary of a Rindler wedge is a horizon for the Rindler observer. This physical role will be relevant in the discussion of the spin-statistics relation below.

2. Classes of zweibeine. Define a set of zweibeine $Z$ by
$$Z := \{\xi = (t_\xi, x_\xi) : t_\xi^2 = 1,\ x_\xi^2 = -1,\ x_\xi \perp t_\xi\}.$$
The set $\xi^\perp := \{t_\xi, x_\xi\}^\perp$ is a two-dimensional spacelike plane, and the wedge
$$W_\xi := \{x \in \mathbb{R}^{1+3} : -x x_\xi > |x t_\xi|\}$$
is one of its Rindler wedges. Define an equivalence relation $\xi \mathrel{\bar\sim} \eta$ on $Z$ by the condition $W_\xi = W_\eta$. Let $\bar Z$ be the quotient space $Z/\bar\sim$, and let $\bar\pi$ be the canonical projection from $Z$ onto $\bar Z$. For each $a = \bar\pi(\xi)$, denote by $W_a$ the wedge $W_\xi$ and by $a^\perp$ its edge $\xi^\perp$.

¹ The set $\{x \in \mathbb{R}^{1+3} : x^2 > 0\}$ has two connected components. The open forward light cone $V_+$ is the future-directed one with respect to the time orientation of $\mathbb{R}^{1+3}$.
² See Lemma 9 below.
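The wedge condition of item 2 can be explored numerically. The following Python sketch is an illustration only: it assumes NumPy, the metric signature $(+,-,-,-)$ implied by $t_\xi^2 = 1$ and $x_\xi^2 = -1$, and the hypothetical helper names `minkowski`, `in_wedge` and `boost01`. It checks on a random sample that boosting a zweibein within the plane spanned by $t_\xi$ and $x_\xi$ yields another zweibein with the same wedge, i.e., an equivalent element of $Z$.

```python
import numpy as np

# Lorentz metric with signature (+,-,-,-), so that t^2 = 1 for timelike unit vectors.
G = np.diag([1.0, -1.0, -1.0, -1.0])

def minkowski(x, y):
    """Inner product g(x, y) =: xy."""
    return x @ G @ y

def in_wedge(x, t, xs):
    """Test whether x lies in W_xi = {x : -x x_xi > |x t_xi|} for xi = (t, xs)."""
    return -minkowski(x, xs) > abs(minkowski(x, t))

def boost01(chi):
    """Boost with rapidity chi in the (e0, e1) plane; it fixes span{e2, e3}."""
    B = np.eye(4)
    B[0, 0] = B[1, 1] = np.cosh(chi)
    B[0, 1] = B[1, 0] = np.sinh(chi)
    return B

# Reference zweibein xi = (e0, e1): its wedge is the right wedge {x : x^1 > |x^0|}.
e0 = np.array([1.0, 0.0, 0.0, 0.0])
e1 = np.array([0.0, 1.0, 0.0, 0.0])

# A boosted zweibein eta = (B e0, B e1) is again an element of Z.
B = boost01(0.7)
t_eta, x_eta = B @ e0, B @ e1

# Compare wedge membership for xi and eta on a random sample of points.
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 4))
same_wedge = all(in_wedge(p, e0, e1) == in_wedge(p, t_eta, x_eta) for p in points)
print(same_wedge)
```

A finite sample cannot prove $W_\xi = W_\eta$, of course; the sketch only makes the equivalence relation concrete.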


Given $a \in \bar Z$, the hyperbola $\Gamma(a) := \{x_\xi : \xi \in \bar\pi^{-1}(a)\}$ is a geodesic of the Rindler spacetime structure on $W_a$. An action of the full Lorentz group $L$ on $Z$ from the left is defined by $\mu\xi = (\mu t_\xi, \mu x_\xi)$. Since $\xi \mathrel{\bar\sim} \eta$ implies $\mu\xi \mathrel{\bar\sim} \mu\eta$, this action induces an action on $\bar Z$ by $\mu\bar\pi(\xi) := \bar\pi(\mu\xi)$. Evidently, $W_{\mu a} = \mu W_a$. The subset $Z^+ := \{\xi \in Z : t_\xi \in V_+\}$ of $Z$ has the property that $\bar\pi(Z^+) = \bar\pi(Z)$. If one restricts the equivalence relation $\bar\sim$ to $Z^+$, one obtains an equivalence relation as well, and the corresponding quotient space is isomorphic with $\bar Z$. The restricted Lorentz group $L_1$ acts transitively on $Z^+$ (see Lemma 7 below), but not on $Z$ (since the elements of $L_1$ preserve time orientation).

3. Spectral decompositions of boosts. Given $a \in \bar Z$, the generalized boosts with fixed-point set $a^\perp$ give rise to a one-parameter group $(\mu^a_\chi)_\chi$ with the property that $(\mu^a_\chi x - x)^2 > 0$ for all $\chi > 0$ and $x \in W_a$. This group is unique up to multiplication of $\chi$ by a positive scalar. Conversely, given a one-parameter group $(\mu_\chi)_{\chi \in \mathbb{R}}$ of generalized boosts, the $\mu_\chi$ with $\chi \neq 0$ have a common fixed-point plane $S$. Furthermore, one verifies that there are an $\alpha > 0$ and a future-directed lightlike vector $\ell_+$ with $\mu_\chi \ell_+ = e^{\alpha\chi}\ell_+$ for all $\chi \in \mathbb{R}$, and a past-directed lightlike vector $\ell_-$ with $\mu_\chi \ell_- = e^{-\alpha\chi}\ell_-$ for all $\chi \in \mathbb{R}$. The convex hull of $\ell_+$, $\ell_-$, and $S$ is the closure of a Rindler wedge. Pairs of lightlike vectors have been used earlier by Buchholz, Dreyer, Florig, and Summers for the description of Rindler wedges [5].

The subsequent analysis will be formulated in terms of $\bar Z$ rather than the naturally isomorphic set $O$, but occasionally, the other descriptions show up in the proofs as well.

1.2. The construction of $G_L$: definitions and statements. For each $\xi \in Z$, let both $j_\xi$ and $j_{\bar\pi(\xi)}$ denote the orthogonal reflection by the plane $\xi^\perp = a^\perp$, where $a = \bar\pi(\xi)$, i.e., the map
$$j_\xi x \equiv j_{\bar\pi(\xi)} x := x - 2(x t_\xi)\, t_\xi + 2(x x_\xi)\, x_\xi.$$
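The defining formula for $j_\xi$ translates directly into a matrix. The following Python sketch is a numerical illustration under stated assumptions (NumPy, signature $(+,-,-,-)$, the hypothetical helper name `reflection`, and an arbitrarily chosen second zweibein). It checks that $j_\xi$ is an involutive Lorentz transformation with determinant 1 that reverses time orientation, so that a product of two such reflections preserves time orientation and has determinant 1.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # Lorentz metric, signature (+,-,-,-)

def reflection(t, xs):
    """Matrix of j_xi x = x - 2 (x t) t + 2 (x xs) xs for a zweibein xi = (t, xs)."""
    return np.eye(4) - 2.0 * np.outer(t, G @ t) + 2.0 * np.outer(xs, G @ xs)

# First zweibein: (e0, e1).
e0 = np.array([1.0, 0.0, 0.0, 0.0])
e1 = np.array([0.0, 1.0, 0.0, 0.0])

# Second zweibein (an arbitrary choice): t^2 = 1, xs^2 = -1, t xs = 0.
chi, phi = 0.4, 0.3
t = np.array([np.cosh(chi), 0.0, np.sinh(chi), 0.0])
xs = np.array([0.0, np.cos(phi), 0.0, np.sin(phi)])

Ja = reflection(e0, e1)
Jb = reflection(t, xs)
lam = Ja @ Jb   # the Lorentz transformation assigned to the pair of reflections

print(np.allclose(Jb.T @ G @ Jb, G))    # Jb preserves the metric
print(np.allclose(Jb @ Jb, np.eye(4)))  # Jb is an involution
print(Jb[0, 0] < 0, lam[0, 0] >= 1.0)   # Jb reverses, lam preserves time orientation
```

Since each reflection negates a timelike plane, it lies outside $L_1$, whereas the product of two reflections lies in $L_1$.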
Endow the set $\bar Z \times \bar Z =: M_L$ with the structure of the pair groupoid of $\bar Z$ with concatenation $\circ$, and define an operation of $L$ on $M_L$ by $\mu(a, b) := (\mu a, \mu b)$. With each $(a, b) \in M_L$, one can associate the Lorentz transformation $\lambda(a, b) := j_a j_b \in L_1$. Define a relation $\sim$ on $M_L$ by writing $m \sim n$ if and only if there exists a $\mu \in L_1$ with $n = \pm\mu^2 m$ and $\mu\lambda(m)\mu^{-1} = \lambda(m)$. Note that $m \sim n$ implies $\lambda(m) = \lambda(n)$.³

Proposition 1. The relation $\sim$ is an equivalence relation.

The proof will be given in Sect. 2.2. Let $G_L$ be the quotient space $M_L/\sim$, and denote by $\pi : M_L \to G_L$ the canonical projection of the relation $\sim$. Define $\pm 1 := \pi(a, \pm a)$ for arbitrary $a \in \bar Z$, and $-\pi(a, b) := \pi(a, -b)$ for $(a, b) \in M_L$.

As remarked, $m \sim n$ implies $\lambda(m) = \lambda(n)$, so a map $\tilde\lambda : G_L \to L_1$ can be defined by $\tilde\lambda(g) := \lambda(\pi^{-1}(g))$, and the diagram
$$
\begin{array}{ccc}
M_L & \xrightarrow{\ \pi\ } & G_L \\
 & {\scriptstyle\lambda}\searrow & \Big\downarrow{\scriptstyle\tilde\lambda} \\
 & & L_+^\uparrow
\end{array}
\tag{1}
$$
3 The square in the condition n = ±μ2 m is important in order to avoid trouble with rotations by the angle π . It has been forgotten in Ref. [14].

Spin, Statistics, and Reflections II. Lorentz Invariance

813

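commutes. All maps in this diagram are continuous. This holds for $\pi$ by definition, and it is evident for $\lambda$. To show continuity of $\tilde\lambda$, let $M \subset L_1$ be open. $\tilde\lambda^{-1}(M)$ is open if and only if $\pi^{-1}(\tilde\lambda^{-1}(M))$ is open. This set coincides with $\lambda^{-1}(M)$, which is open by continuity of $\lambda$.

Proposition 2. For each $g \in G_L$, one has $g \neq -g$ and $\tilde\lambda^{-1}(\tilde\lambda(g)) = \{g, -g\}$.

The proof of this proposition will be given in Sect. 2.3.

Theorem 1. (i) $\tilde\lambda$ is a covering map and endows $G_L$ with the structure of a two-sheeted covering space of $L_1$.
(ii) $G_L$ is simply connected.
(iii) There is a unique group product on $G_L$ with the property that the diagram
$$
\begin{array}{ccc}
M_L \times M_L & \xrightarrow{\ \circ\ } & M_L \\
\Big\downarrow{\scriptstyle\pi\times\pi} & & \Big\downarrow{\scriptstyle\pi} \\
G_L \times G_L & \longrightarrow & G_L \\
\Big\downarrow{\scriptstyle\tilde\lambda\times\tilde\lambda} & & \Big\downarrow{\scriptstyle\tilde\lambda} \\
L_1 \times L_1 & \xrightarrow{\ \cdot\ } & L_1
\end{array}
\tag{2}
$$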

commutes. This means that $G_L$ is isomorphic with the universal covering group of $L_1$.

The proof of this theorem and the following lemma will be given in Sect. 2.6.

Lemma 1 (Adjoint action of $G_L$ on itself). Given $h \in G_L$ and $(c, d) \in M_L$, one has
$$h\,\pi(c, d)\,h^{-1} = \pi\bigl(\tilde\lambda(h)c,\ \tilde\lambda(h)d\bigr). \tag{3}$$

2. The Construction of $G_L$: Proofs

The proofs of the statements made in the preceding section require an extended mathematical analysis, which will now be developed step by step.

2.1. Reflections of spacelike planes by spacelike planes. It may well be that the following lemma, which is highly plausible at first glance but somewhat tricky to prove, has been proved earlier by other authors. But since such a reference is not known to us, we prove it here.

Lemma 2. If $A$ and $B$ are two-dimensional spacelike subspaces of $\mathbb{R}^{1+3}$, then there exists a two-dimensional spacelike subspace $C$ such that $B$ is the image of $A$ under orthogonal reflection by $C$.


Proof. If $A$ and $B$ have nontrivial intersection, then there exist linearly independent nonzero vectors $a \in A$, $b \in B$, and $c \in A \cap B$. The one-dimensional timelike space $\{a, b, c\}^\perp$ is perpendicular to both $A$ and $B$, so $A$ and $B$ are subspaces of a common time-zero plane, and the problem boils down to the well-known three-dimensional euclidean case.

It remains to consider the case that $A$ and $B$ have trivial intersection. $A^\perp$ and $B^\perp$ are timelike planes and, hence, are spanned by future-directed lightlike vectors $x, y \in A^\perp$ and $v, w \in B^\perp$. Since $A$ and $B$ have trivial intersection, $x$, $y$, $v$, and $w$ are linearly independent, so the inner product between any two distinct vectors of these is strictly positive. Let $C$ be the plane spanned by the vectors $x - \alpha v$ and $y - \beta w$, where
$$\alpha := \sqrt{\frac{xy\; xw}{vw\; yv}} > 0 \quad\text{and}\quad \beta := \sqrt{\frac{xy\; yv}{vw\; xw}} > 0.$$
Then $C^\perp$ is spanned by $x + \alpha v$ and $y + \beta w$, since
$$(x - \alpha v)(x + \alpha v) = x^2 - \alpha^2 v^2 = 0 = (y - \beta w)(y + \beta w)$$
and since
$$
\begin{aligned}
(x - \alpha v)(y + \beta w) &= xy - \alpha\beta\, vw - \alpha\, yv + \beta\, xw \\
&= xy - \sqrt{\frac{xy\; xw}{vw\; yv}}\sqrt{\frac{xy\; yv}{vw\; xw}}\; vw - \sqrt{\frac{xy\; xw}{vw\; yv}}\; yv + \sqrt{\frac{xy\; yv}{vw\; xw}}\; xw \\
&= xy - \frac{xy}{vw}\, vw - \sqrt{\frac{xy}{vw}\,(xw)(yv)} + \sqrt{\frac{xy}{vw}\,(yv)(xw)} \\
&= 0 = \dots = (x + \alpha v)(y - \beta w).
\end{aligned}
$$
$C^\perp$ is timelike because $(x + \alpha v)^2 = 2\alpha\, xv > 0$, so $C$ is spacelike. Denote by $j_C$ the orthogonal reflection by $C$. One then finds
$$j_C x = \tfrac12\, j_C\bigl(\underbrace{(x + \alpha v)}_{\in C^\perp} + \underbrace{(x - \alpha v)}_{\in C}\bigr) = \tfrac12\bigl(-(x + \alpha v) + (x - \alpha v)\bigr) = -\alpha v \in B^\perp$$
and $j_C y = -\beta w \in B^\perp$.
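The construction in the proof of Lemma 2 can be tested numerically in the generic case of trivial intersection. The following Python sketch is an illustration under stated assumptions: NumPy, signature $(+,-,-,-)$, randomly generated planes, and the hypothetical helper names `dot`, `random_zweibein` and `reflection_negating`. It builds $\alpha$, $\beta$ and $j_C$ as in the proof and checks $j_C x = -\alpha v$, $j_C y = -\beta w$, and that conjugating the reflection by $A$ with $j_C$ gives the reflection by $B$.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # Lorentz metric, signature (+,-,-,-)

def dot(x, y):
    """Inner product g(x, y) =: xy."""
    return x @ G @ y

def random_zweibein(rng):
    """Random xi = (t, xs) with t future-directed, t^2 = 1, xs^2 = -1, t xs = 0."""
    n = rng.normal(size=3)
    t = np.concatenate(([np.sqrt(1.0 + n @ n)], n))
    r = rng.normal(size=4)
    xs = r - dot(r, t) * t            # orthogonal to the timelike t, hence spacelike
    return t, xs / np.sqrt(-dot(xs, xs))

def reflection_negating(N):
    """Reflection that negates the span of the columns of N and fixes its g-orthogonal complement."""
    P = N @ np.linalg.inv(N.T @ G @ N) @ N.T @ G   # g-orthogonal projector onto span(N)
    return np.eye(4) - 2.0 * P

rng = np.random.default_rng(1)
tA, xA = random_zweibein(rng)   # A := {tA, xA}^perp
tB, xB = random_zweibein(rng)   # B := {tB, xB}^perp

# Future-directed lightlike vectors spanning the timelike planes A^perp and B^perp.
x, y = tA + xA, tA - xA
v, w = tB + xB, tB - xB

alpha = np.sqrt(dot(x, y) * dot(x, w) / (dot(v, w) * dot(y, v)))
beta = np.sqrt(dot(x, y) * dot(y, v) / (dot(v, w) * dot(x, w)))

# C^perp is spanned by x + alpha v and y + beta w; j_C negates C^perp and fixes C.
jC = reflection_negating(np.column_stack([x + alpha * v, y + beta * w]))
jA = reflection_negating(np.column_stack([tA, xA]))
jB = reflection_negating(np.column_stack([tB, xB]))

print(np.allclose(jC @ x, -alpha * v), np.allclose(jC @ y, -beta * w))
print(np.allclose(jC @ jA @ jC, jB))   # the reflection by j_C A is the reflection by B
```

The last check uses that $j_C j_A j_C$ is the reflection by the plane $j_C A$; it equals $j_B$ exactly when $j_C A = B$.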

2.2. Proof of Proposition 1. Proposition 1 states that the relation $\sim$ is an equivalence relation. In contrast to the corresponding statement for the analysis of the rotation group and its universal covering, this is not self-evident. It will be proved in this section, together with some properties of the equivalence relation $\sim$.

Lemma 3. Let $\mu$ and $\nu$ be restricted Lorentz transformations.
(i) Suppose that $\mu \neq 1$. There exist at least one and at most two elements $\nu$ with $\mu = \nu^2$. If, in particular, $\mu^2 = 1$, there are two square roots which are inverses of each other.
(ii) The commutant of $\mu$ is an Abelian group if and only if $\mu^2 \neq 1$.
(iii) Given $\mu, \nu \in L_1$, suppose that $\mu^2 \neq 1 \neq \nu^2$ and $\mu^2\nu^2 = \nu^2\mu^2$. Then $\mu\nu = \nu\mu$.
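The proof of Lemma 3 works with a covering map from $SL(2,\mathbb{C})$ onto $L_1$. As a concrete illustration, the following Python sketch uses the standard realization $x \mapsto A X A^\dagger$ with $X = \sum_\nu x^\nu\sigma_\nu$; this particular choice, NumPy, and the helper names `covering_map` and `N` are assumptions made here, since the text allows any covering map. The sketch checks the two-to-one property and the example of part (i) with $\mu^2 = 1$: the rotation by the angle $\pi$ has two square roots which are inverses of each other.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # Lorentz metric, signature (+,-,-,-)

# Identity and Pauli matrices sigma_0, ..., sigma_3.
SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

def covering_map(A):
    """Lorentz matrix of x -> A X A^dagger, where X = sum_k x^k sigma_k."""
    L = np.empty((4, 4))
    for i in range(4):
        for k in range(4):
            L[i, k] = 0.5 * np.trace(SIGMA[i] @ A @ SIGMA[k] @ A.conj().T).real
    return L

def N(z):
    """Diagonal Jordan matrix N_z = diag(z, 1/z)."""
    return np.array([[z, 0], [0, 1 / z]], dtype=complex)

# A random element of SL(2, C), obtained by normalizing the determinant to 1.
rng = np.random.default_rng(2)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = M / np.sqrt(np.linalg.det(M))

lam = covering_map(A)
print(np.allclose(lam.T @ G @ lam, G))        # the image is a Lorentz transformation
print(np.allclose(covering_map(-A), lam))     # A and -A have the same image

# mu is the image of N_i: a rotation by pi, so mu != 1 but mu^2 = 1.
mu = covering_map(N(1j))
nu_plus = covering_map(N(np.exp(1j * np.pi / 4)))
nu_minus = covering_map(N(np.exp(-1j * np.pi / 4)))
print(np.allclose(nu_plus @ nu_plus, mu), np.allclose(nu_minus @ nu_minus, mu))
print(np.allclose(nu_plus @ nu_minus, np.eye(4)), not np.allclose(nu_plus, nu_minus))
```

Here `nu_minus` squares to the image of $N_{-i} = -N_i$, which coincides with `mu` because $A$ and $-A$ have the same image.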


Proof. The matrix group $SL(2,\mathbb{C})$ is well known to be isomorphic with the universal covering group of $L_1$. Let $\Lambda$ be any covering map from $SL(2,\mathbb{C})$ onto $L_1$. Then $\Lambda^{-1}(\Lambda(A)) = \{\pm A\}$ for any $A \in SL(2,\mathbb{C})$. The conjugacy classes of $SL(2,\mathbb{C})$ are classified by the Jordan matrices in $SL(2,\mathbb{C})$, which are
$$N_z := \begin{pmatrix} z & 0 \\ 0 & 1/z \end{pmatrix},\ z \in \dot{\mathbb{C}}, \qquad N_\infty := \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad N_{-\infty} := \begin{pmatrix} -1 & 1 \\ 0 & -1 \end{pmatrix},$$
so for each $A \in SL(2,\mathbb{C})$ there is a $z \in \dot{\mathbb{C}} \cup \{\pm\infty\}$ and a $P \in SL(2,\mathbb{C})$ with $A = P N_z P^{-1}$.

Proof of (i). Since $\mu \neq 1$ by assumption and since $[N_{-z}] = [-N_z]$, there exists an element $A = P N_z P^{-1} \in \Lambda^{-1}(\mu)$ with $\pm 1 \neq z \neq -\infty$.

If $z \neq \infty$, the elements of $\Lambda^{-1}(\mu)$ are $\pm A$, and $B_\pm \in SL(2,\mathbb{C})$ satisfy $B_\pm^2 = \pm A$ if and only if $\pm B_\pm = \pm P N_{w_\pm} P^{-1}$ for complex square roots $w_\pm$ of $\pm z$. One obtains two square roots $\nu_\pm := \Lambda(B_\pm) \equiv \Lambda(-B_\pm)$ of $\mu$.

If $z = \infty$, then $B := \pm P \begin{pmatrix} 1 & 1/2 \\ 0 & 1 \end{pmatrix} P^{-1}$ are the two square roots of $A$. Since the elements in $[N_{-\infty}]$ have no square roots in $SL(2,\mathbb{C})$, the only square root of $\mu$ is $\nu := \Lambda(B) \equiv \Lambda(-B)$.

If $\mu^2 = 1$ and $\nu^2 = \mu$, then $\nu^{-2} = \mu^{-1} = \mu$. The roots $\nu$ and $\nu^{-1}$ are distinct, since $\nu = \nu^{-1}$ would imply $1 = \nu\nu^{-1} = \nu^2 = \mu$, contradicting the assumption.

Proof of (ii). $\mu\nu = \nu\mu$ if and only if $AB = \pm BA$ for all $A \in \Lambda^{-1}(\mu)$ and $B \in \Lambda^{-1}(\nu)$. Given $A = P N_z P^{-1} \in SL(2,\mathbb{C})$ with $z \neq \pm 1$, the commutant of $A$ is the Abelian group $\{P N_w P^{-1} : w \in \dot{\mathbb{C}}\}$. The anticommutant of $A$ is trivial if $z \neq \pm i$; otherwise it consists of the matrices
$$P \begin{pmatrix} 0 & v \\ -1/v & 0 \end{pmatrix} P^{-1}.$$
These matrices neither commute nor anticommute with the elements of the commutant of $P N_{\pm i} P^{-1}$.

But if $\mu^2 \neq 1$, then there exists an $A = P N_z P^{-1} \in \Lambda^{-1}(\mu)$ with $\pm 1 \neq z \neq \pm i$, so the commutant $A^c$ of $A$ is an Abelian subgroup of $SL(2,\mathbb{C})$, and the anticommutant of $A$ is trivial. Accordingly, the commutant $\mu^c$ of $\mu$ is the Abelian group $\Lambda(A^c)$.

If $\mu^2 = 1$, all $z \in \mathbb{C}$ with $A = P N_z P^{-1}$ and $\Lambda(A) = \mu$ equal $\pm 1$ or $\pm i$. If $z = \pm 1$, then $\mu = 1$, and the commutant is $L_1$ and, hence, non-Abelian, and if $z = \pm i$, the above remarks apply.

Proof of (iii). Since $\mu^2 \neq 1 \neq \nu^2$, it follows from the preceding statement that the commutants $\mu^c$ and $\nu^c$ are maximal Abelian groups of the form $\Lambda\bigl(\{P N_z P^{-1} : z \in \dot{\mathbb{C}}\}\bigr)$ or $\Lambda\bigl(\bigl\{P \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} P^{-1} : t \in \mathbb{C}\bigr\}\bigr)$ for some $P \in SL(2,\mathbb{C})$. Consequently, the assumption $\nu^2 \in (\mu^2)^c$ implies $\nu \in (\mu^2)^c$, i.e., $\mu^2 \in (\nu^2)^c$. This yields the statement by the same argument.

Lemma 4. Consider $(a, b) \in M_L$, and shorthand $\lambda(a, b) =: \lambda$. There exists an element $c \in \bar Z$ with $a = \pm j_c b$. Shorthanding $\lambda(c, b) =: \mu$, one has $\mu^2 = \lambda$ and $(a, b) = (\mp\mu b, b)$.

Proof. Existence of $c$ follows from Lemma 2. The other statements follow from
$$j_c j_b j_c j_b = j_{j_c b}\, j_b = j_a j_b$$


and $a = \pm j_c b = \mp j_c j_b b = \mp\mu b$.

Proposition 1. $\sim$ is an equivalence relation.

Proof. Symmetry and reflexivity are evident, so it remains to prove transitivity. If $m \sim n$ and $n \sim r$, then $\lambda(m) = \lambda(n) = \lambda(r) =: \lambda$, and there exist elements $\mu$ and $\nu$ commuting with $\lambda$ and satisfying $\mu^2 m = \pm n$ and $\nu^2 n = \pm r$. If $\mu^2 = 1$ or $\nu^2 = 1$, one trivially has $m \sim r$. If $\nu^2\mu^2 = 1$, one even has $m = \pm r$. Writing $m = (a, b)$, it follows from $\nu^2\mu^2 m = \pm r$ that
$$\lambda = j_{\nu^2\mu^2 a}\, j_{\nu^2\mu^2 b} = \nu^2\mu^2\, j_a j_b\, \mu^{-2}\nu^{-2} = \nu^2\mu^2 \lambda \mu^{-2}\nu^{-2},$$
and one concludes from Lemma 3(ii) that there exists a square root $\kappa$ of $\nu^2\mu^2$ commuting with $\lambda$.

2.3. Proof of Proposition 2.

Lemma 5. $(a, b) \not\sim (a, -b)$ for $(a, b) \in M_L$, i.e., $g \neq -g$ for all $g \in G_L$.

Proof. The statement is evident for $b = \pm a$, so it remains to consider the case $\lambda(a, b) \neq 1$. Assume $(a, b) \sim (a, -b)$. By Lemma 4, there exists an element $\mu \in L_1$ with $\mu^2 = \lambda(a, b)$ and $(a, b) = (\pm\mu b, b)$, and by assumption, there exists an element $\nu \in L_1$ with $\nu\mu^2\nu^{-1} = \mu^2$ and $\nu^2(a, b) = \pm(a, -b)$. $\mu^2$ and $\nu^2$ commute and differ from 1, so $\mu$ and $\nu$ commute by Lemma 3(iii). Assume without loss that $a = \mu b$, then one obtains
$$(a, b) = (\mu b, b) = \pm\nu^{-2}(\mu b, -b) = \pm(\mu\nu^{-2} b, -\nu^{-2} b) = \pm(-\mu b, b) = \pm(-a, b),$$
leading to the contradiction $a = -a$ or $b = -b$, respectively.

Lemma 6. Suppose that $\lambda(a, b) = \lambda(c, d)$. Then
(i) $\lambda(a, c) = \lambda(b, d)$.
(ii) $\lambda(a, b)$ and $\lambda(a, c)$ commute.
(iii) $(a, b) \sim (c, d)$ or $(a, b) \sim (c, -d)$.

Proof of (i). $j_a j_c = j_a (j_c j_d) j_d = j_a (j_a j_b) j_d = j_b j_d$.

Proof of (ii). $j_a j_b\, j_c j_a\, j_b j_a = (j_a j_b)\, j_c\, (j_a j_b)\, j_a = j_c j_d\, j_c\, j_c j_d\, j_a = j_c j_a$.

Proof of (iii). Since by definition $(a, b) \sim (-a, -b)$, it suffices to prove $(a, b) \sim (\pm c, \pm d)$ for an arbitrary choice of signs. If $\lambda(a, b) = 1$ or $\lambda(c, a) = 1$, the statement is trivial. So assume $\lambda(a, b) \neq 1 \neq \lambda(c, a)$. By Lemma 4, there exist square roots $\nu_{ab}$ and $\nu_{cd}$ of $\lambda(a, b)$ and square roots $\nu_{ca}$ and $\nu_{db}$ of $\lambda(c, a)$ with $a = \pm\nu_{ab} b$, $c = \pm\nu_{cd} d$ and $c = \pm\nu_{ca} a$, $d = \pm\nu_{db} b$ for some choice of signs. It suffices to prove $\nu_{cd}\nu_{db} = \nu_{db}\nu_{cd}$ and $\nu_{ab} b = \pm\nu_{cd} b$, since these relations yield the statement by
$$(c, d) = (\pm\nu_{cd}\nu_{db} b, \pm\nu_{db} b) = \nu_{db}(\pm\nu_{cd} b, \pm b) = \nu_{db}(\pm\nu_{ab} b, \pm b) = \nu_{db}(\pm a, \pm b).$$


If $\lambda(a, b)^2 \neq 1$, one obtains $\nu_{cd}\nu_{db} = \nu_{db}\nu_{cd}$ from statement (ii) and Lemma 3(iii). The remaining condition $\nu_{ab} b = \pm\nu_{cd} b$ then follows from
$$j_{\nu_{ab} b} = j_c (j_c j_a) = j_c (j_d j_b) = j_c\bigl(\nu_{cd}(j_d j_b)\nu_{cd}^{-1}\bigr) = j_c (j_c j_{\nu_{cd} b}) = j_{\nu_{cd} b}.$$

If $\lambda(a, b)^2 = 1$, one obtains $b = \pm\lambda(a, b) b$ from $1 = j_a j_b j_a j_b = j_{-j_a b}\, j_b$. Lemma 3(i) implies $\nu_{ab}^{-1}\nu_{cd} = \lambda(a, b)$ or $\nu_{ab}^{-1}\nu_{cd} = 1$, proving $\nu_{ab} b = \pm\nu_{cd} b$. The proof is completed by
$$\nu_{cd}\,\lambda(d, b)\,\nu_{cd}^{-1} = j_c j_{\nu_{cd} b} = j_c j_{\nu_{ab} b} = j_c j_a = \lambda(d, b)$$
and an application of Lemma 3(iii) yielding $\nu_{cd}\nu_{db} = \nu_{db}\nu_{cd}$.

One now immediately obtains

Proposition 2. For each $g \in G_L$ the fiber $\tilde\lambda^{-1}(\tilde\lambda(g))$ contains precisely two elements.

Proof. $g \neq -g$ and $\tilde\lambda(g) = \tilde\lambda(-g)$ for all $g$, so each $\tilde\lambda^{-1}(\tilde\lambda(g))$ contains at least two elements. By construction, one has $\lambda(a, b) = \tilde\lambda(g)$ for each $(a, b) \in \pi^{-1}(g)$. If $(c, d) \in M_L$ satisfies $\lambda(c, d) = \tilde\lambda(g) = \lambda(a, b)$ as well, Lemma 6 implies that $(a, b) \sim (c, d)$ or $(a, b) \sim (c, -d)$, so $\tilde\lambda^{-1}(\tilde\lambda(g))$ contains at most two elements.

2.4. The sets $Z$ and $\bar Z$. The next goal is the proof of Theorem 1. This requires some preliminaries. Some properties of the sets $Z$ and $\bar Z$ will be worked out in this section, and the polar decomposition of restricted Lorentz transformations into rotations and boosts will be discussed in the next section.

For each $x \in \mathbb{R}^{1+3}$, denote the stabilizer of $x$ in $L_1$ as $S(x) := \{\mu \in L_1 : \mu x = x\}$, and for each subset $M$ of $\mathbb{R}^{1+3}$, define $S(M) := \bigcap_{x \in M} S(x)$.

Lemma 7. The actions of $L_1$ on $Z^+$ and of $L$ on $Z$ are transitive.

Proof. Consider any $\xi, \eta \in Z^+$. $M_1^+$ is an orbit of $L_1$, so there exists a Lorentz transformation $\mu$ with $t_\xi = \mu t_\eta$. This $\mu$ is not unique, since $t_\xi = \nu\mu t_\eta$ for each $\nu \in S(t_\xi)$. By construction, one already has $\mu x_\eta \perp t_\xi$, so it remains to be shown that $S(t_\xi)$ acts transitively on $H_1 \cap \{t_\xi\}^\perp$. But $S(t_\xi)$ is the group of rotations with respect to the time vector $t_\xi$, and $H_1 \cap \{t_\xi\}^\perp$ is the set of time-zero unit vectors, on which $S(t_\xi)$ acts transitively. The second statement now follows from the fact that $-1 \in L$.

Lemma 8. $\bar Z$ is a first-countable topological space.

Proof. Let $H$ be a Cauchy surface. Then the set $Z_H := \{\xi \in Z^+ : x_\xi \in H\}$ is a closed subset of $Z^+$. For each $\xi \in Z^+$, the intersection of the inextendible curve $\Gamma(\xi)$ with $H$ contains precisely one element $y_\xi$, and there is a unique generalized boost $\beta_H(\xi)$ with $y_\xi = \beta_H(\xi) x_\xi$. Define a map $\zeta_H : Z^+ \to Z_H$ by $\zeta_H(\xi) := \beta_H(\xi)\xi$. Then $\xi \mathrel{\bar\sim} \eta$ implies $\zeta_H(\xi) = \zeta_H(\eta)$ by construction, so a map $\bar\zeta_H : \bar Z \to Z_H$ is well defined by $\bar\zeta_H(\bar\pi(\xi)) = \zeta_H(\xi)$. The diagram
$$
\begin{array}{ccc}
Z^+ & \xrightarrow{\ \bar\pi\ } & \bar Z \\
 & {\scriptstyle\zeta_H}\searrow & \Big\downarrow{\scriptstyle\bar\zeta_H} \\
 & & Z_H
\end{array}
$$


commutes. All maps in this diagram are continuous. This holds for $\bar\pi$ by definition, and it is evident for $\zeta_H$. To show continuity of $\bar\zeta_H$, let $M \subset Z_H$ be open. $\bar\zeta_H^{-1}(M)$ is open if and only if $\bar\pi^{-1}(\bar\zeta_H^{-1}(M))$ is open. This set coincides with $\zeta_H^{-1}(M)$, which is open by continuity of $\zeta_H$. Since $\bar\zeta_H$ has the continuous inverse $\bar\pi|_{Z_H}$, one finds that $Z_H$ and $\bar Z$ are homeomorphic topological spaces. Since $Z_H$ is first-countable, so is $\bar Z$.

One immediately concludes the following corollary.

Corollary 1. $\bar Z$ and $M_L$ are Hausdorff spaces.

2.5. Polar decompositions on $L_1$. The next task will be the proof of Theorem 1, which, again, is much more involved than its prototype in Ref. [14]. A crucial instrument will be the decomposition of Lorentz transformations into rotations and boosts.

Specify a time direction by distinguishing a future-directed timelike unit vector $e_0$. Consider the euclidean inner product $\langle\cdot,\cdot\rangle_{e_0}$ on $\mathbb{R}^{1+3}$ defined by
$$\langle x, y\rangle_{e_0} := -g(x, y) + 2g(x, e_0)g(y, e_0).$$
Denote the adjoint of a linear map $T : \mathbb{R}^{1+3} \to \mathbb{R}^{1+3}$ with respect to this inner product by $T^*$. If $T$ is a restricted Lorentz transformation, then the positive operator $\hat\beta(T) := |T| := (T^*T)^{1/2}$ is a boost, and the orthogonal operator $\hat\rho(T) := T \cdot |T|^{-1} = T\hat\beta(T)^{-1}$ is a rotation; $\hat\beta(T)$ and $\hat\rho(T)$ yield the polar decomposition $T = \hat\rho(T)\hat\beta(T)$ of $T$. On $G_L$, define $\tilde\rho(g) := \hat\rho(\tilde\lambda(g))$ and $\tilde\beta(g) := \hat\beta(\tilde\lambda(g))$. To each time-zero unit vector $e$, assign the class $\bar e := \bar\pi(e_0, e)$.

The following lemma immediately follows from Lemma 2.1 in Ref. [6]; the proof is recalled here for the reader's convenience.

Lemma 9. $\tilde\lambda$ is onto.

Proof. We prove that $\lambda$ is onto, then the statement follows. $\lambda(a, \pm a) = 1$ for all $a \in \bar Z$, so it remains to show that $\lambda^{-1}(\mu) \neq \emptyset$ for each $\mu \neq 1$.

Suppose that $\mu =: \rho$ is a rotation, that $\tau$ is a root of $\rho$, and that $e$ is a time-zero unit vector in the rotation plane of $\rho$. Then
$$\rho = \rho\, j_{\bar e} j_{\bar e} = j_{\tau\bar e}\, j_{\bar e} = \lambda(\tau\bar e, \bar e).$$

Suppose that $\mu =: \beta$ is a boost, and let $e$ be a time-zero unit vector in the fixed-point set of $\beta$. Then
$$\beta = j_{\bar e} j_{\bar e}\, \beta = j_{\bar e}\, j_{\beta^{-1/2}\bar e} = \lambda(\bar e, \beta^{-1/2}\bar e).$$

In the remaining case that both $\hat\rho(\mu)$ and $\hat\beta(\mu)$ differ from 1, the rotation plane of $\hat\rho(\mu)$ and the fixed-point plane of $\hat\beta(\mu)$ are well-defined two-dimensional planes contained in the time-zero plane. Since the time-zero plane is three-dimensional, this implies that the intersection of these planes is nontrivial. Let $e$ be a unit vector in this intersection and let $\tau$ be a root of $\hat\rho(\mu)$. Then
$$\mu = \hat\rho(\mu)\, j_{\bar e} j_{\bar e}\, \hat\beta(\mu) = j_{\tau\bar e}\, j_{\hat\beta(\mu)^{-1/2}\bar e} = \lambda\bigl(\tau\bar e, \hat\beta(\mu)^{-1/2}\bar e\bigr).$$

Denote the set of rotations by $R$ and the set of boosts by $B$. Furthermore, define $\dot R := R \setminus \{1\}$ and $\dot B := B \setminus \{1\}$, and write $\ddot R := \{\sigma \in R : \sigma^2 \neq 1\}$.

Lemma 10. $\rho \in \dot R$ and $\beta \in \dot B$ commute if and only if $FP(\rho) = FP(\beta)^\perp$.

Proof. Assume $\rho\beta = \beta\rho$. If $x \in FP(\beta)$, then $\beta\rho x = \rho\beta x = \rho x$, so $\rho FP(\beta) = FP(\beta)$, whence one concludes that either $FP(\beta) = FP(\rho)$ or $FP(\beta) = FP(\rho)^\perp$. Since $FP(\beta)$ is a spacelike surface, whereas $FP(\rho)$ is timelike, one concludes $FP(\beta) = FP(\rho)^\perp$. That the condition is sufficient is trivial.
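The polar decomposition of Sect. 2.5 and the criterion of Lemma 10 can be illustrated numerically. The following Python sketch is an illustration under stated assumptions: NumPy, the choice $e_0 = (1, 0, 0, 0)$, for which $\langle\cdot,\cdot\rangle_{e_0}$ is the standard euclidean inner product and $T^*$ is the matrix transpose, and the hypothetical helper names `boost`, `rotation_z` and `polar_decomposition`. It recovers the rotation and the boost from their product and compares a commuting with a non-commuting pair.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])  # Lorentz metric, signature (+,-,-,-)

def boost(axis, chi):
    """Boost with rapidity chi along the spatial direction `axis` (time vector e0)."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    B = np.eye(4)
    B[0, 0] = np.cosh(chi)
    B[0, 1:] = B[1:, 0] = np.sinh(chi) * n
    B[1:, 1:] += (np.cosh(chi) - 1.0) * np.outer(n, n)
    return B

def rotation_z(phi):
    """Rotation by the angle phi about the e3 axis; it fixes e0 and e3."""
    R = np.eye(4)
    R[1, 1] = R[2, 2] = np.cos(phi)
    R[1, 2], R[2, 1] = -np.sin(phi), np.sin(phi)
    return R

def polar_decomposition(T):
    """Return (rho, beta) with T = rho beta and beta = (T^* T)^{1/2}, where T^* = T^T for e0 = (1,0,0,0)."""
    evals, evecs = np.linalg.eigh(T.T @ T)
    beta = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    rho = T @ np.linalg.inv(beta)
    return rho, beta

R = rotation_z(0.9)
B = boost([1.0, 2.0, 2.0], 0.5)
rho, beta = polar_decomposition(R @ B)
print(np.allclose(rho, R), np.allclose(beta, B))

# Lemma 10: a boost along e3 has fixed-point plane span{e1, e2} = FP(R)^perp and commutes with R,
# whereas a boost along e1 does not.
Bz, Bx = boost([0.0, 0.0, 1.0], 0.5), boost([1.0, 0.0, 0.0], 0.5)
print(np.allclose(R @ Bz, Bz @ R), np.allclose(R @ Bx, Bx @ R))
```

The square root is taken through an eigendecomposition of the symmetric positive matrix $T^*T$; other numerical routes to the polar decomposition would serve equally well.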

Spin, Statistics, and Reflections II. Lorentz Invariance

Lemma 11. (i) Consider $\mu \in L_1$ with polar decomposition $\mu = \rho\beta$. Then $\rho\beta = \beta\rho$ if and only if there exists a time-zero unit vector $e$ with $\mu \in S(\bar e)$. (ii) Given $a, b \in \bar Z$, one has $S(a) \cap S(b) \neq \{1\}$ if and only if $a = \pm b$.

Proof of (i). Each rotation or boost is contained in the stabilizer of $\bar e$ for some $e$, so statement (i) trivially holds for rotations or boosts. It remains to consider the case that $\rho \neq 1 \neq \beta$. If $\rho\beta = \beta\rho$, then it follows from Lemma 10 that the rotation axis of $\rho$ is parallel to the boost direction of $\beta$. Let $e$ be one of the two unit vectors on this axis, then $\rho$, $\beta$, and, hence, also $\rho\beta$ are contained in $S(W_{\bar e}) = S(\bar e)$. So the condition is necessary. If, conversely, $\mu \in S(\bar e)$, then there exists a unique boost $\gamma$ with $\gamma(\mu e_0, \mu e) = (e_0, e)$, and $\gamma \in S(\bar e)$ because $\gamma\bar e = \gamma\mu\bar e = \bar e$. Because $S(\bar e)$ is Abelian, $\gamma\mu = \mu\gamma = \rho\beta\gamma$. The product $\rho\beta\gamma$ has the fixed points $e_0$ and $e$ by definition of $\gamma$, so it is a rotation, and $\beta\gamma = 1$ by uniqueness of the polar decomposition. As seen above, $\gamma$ commutes with $\mu$, so $\beta^{-1}$ commutes with $\rho\beta$, i.e., $\rho = \beta^{-1}\rho\beta$.

Proof of (ii). Without loss, assume $a = \bar e$. If $\mu$ is a rotation, then $e$ is on the rotation axis of $\mu$, so $FP(\mu) = \bar e^{\perp\perp}$, and the plane $\bar e^\perp$ is mapped onto itself. The only other time-zero unit vector on the axis of $\mu$ is $-e$, so $b = \pm\bar e = \pm a$, as stated. If $\mu$ is a boost, then the vectors $\ell_+ := e + e_0$ and $\ell_- := e - e_0$ are eigenvectors of $\mu$ associated with distinct eigenvalues $\varepsilon$ and $\varepsilon^{-1}$. The vectors $\ell_\pm$ are perpendicular to $FP(\mu)$ by invariance of the metric: if $x \in FP(\mu)$, then $\varepsilon\, g(x, \ell_+) = g(x, \mu\ell_+) = g(\mu^{-1}x, \ell_+) = g(x, \ell_+)$, so $\varepsilon \neq 1$ implies $g(x, \ell_+) = 0$, and one obtains $FP(\mu) = \bar e^\perp$. It remains to consider the case that $\rho \neq 1 \neq \beta$. By Lemma 10, statement (i) implies $FP(\rho) \perp FP(\beta)$, so $\ell_\pm$ are fixed points of $\rho$ and, hence, eigenvectors not only of $\beta$, but also of $\mu$. Additional eigenvectors in $\bar e^\perp$ exist only if $\rho$ is a rotation by the angle $\pi$; their eigenvalue is $-1$. Since $\varepsilon \neq -1 \neq \varepsilon^{-1}$, the vectors $\ell_\pm$ are the only eigenvectors of $\mu$ with positive eigenvalues. By assumption, $\mu \in S(b) =: S(\bar\pi(f_0, f))$, so the polar decomposition of $\mu$ with respect to $f_0$ commutes. The reasoning just used yields that the lightlike vectors $f + f_0$ and $f - f_0$ are eigenvectors of $\mu$ with positive eigenvalues and, hence, proportional to $e + e_0$ and $e - e_0$, respectively, whence $\bar e = \pm\bar f$ and, hence, statement (ii) is obtained.

Lemma 12. Given any $\mu \in L_1$, suppose that the polar decomposition $\mu = \rho_{e_0}\beta_{e_0}$ commutes for all $e_0 \in M_1^+$. Then $\mu = 1$.

Proof. Because by assumption, $\rho_{e_0}\beta_{e_0} = \beta_{e_0}\rho_{e_0}$, there is some time-zero unit vector $e$ with $\mu \in S(\bar e)$. The subset $t_{\bar e} := \{d_0 \in M_1^+ : \bar\pi(d_0, d) = \bar e$ for some unit vector $d \perp d_0\}$ of $M_1^+$ is a hyperbola, so there exists some $f_0 \in M_1^+\setminus t_{\bar e}$. By assumption, the polar decomposition $\mu = \rho_{f_0}\beta_{f_0}$ commutes as well, so there is some unit vector $f \perp f_0$ with $\mu \in S(\bar\pi(f_0, f))$. By construction, $\bar\pi(f_0, f) \neq \pm\bar e$, so $W_{\bar\pi(f_0, f)} \neq \pm W_{\bar e}$, whence $S(\bar\pi(f_0, f)) \cap S(\bar e) = \{1\}$ by Lemma 11. Hence $\mu = 1$.
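Lemma 12 can be illustrated numerically: a transformation whose polar decomposition commutes for one time direction generally loses this property for another. The sketch below is an illustration under stated assumptions, not the authors' construction. It uses coordinates with $e_0 = (1,0,0,0)$ and $g = \mathrm{diag}(1,-1,-1,-1)$, and it uses the observation that for $f_0 = \Lambda e_0$ with a Lorentz transformation $\Lambda$, one has $\langle x, y\rangle_{f_0} = \langle\Lambda^{-1}x, \Lambda^{-1}y\rangle_{e_0}$, so the polar factors of $\mu$ with respect to $f_0$ are the $\Lambda$-conjugates of the polar factors of $\Lambda^{-1}\mu\Lambda$ with respect to $e_0$. All function names are hypothetical.

```python
import numpy as np

def rotation_z(angle):
    """4x4 Lorentz matrix for a rotation about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    T = np.eye(4)
    T[1:3, 1:3] = [[c, -s], [s, c]]
    return T

def boost(direction, rapidity):
    """4x4 Lorentz boost along a spatial direction."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    B = np.eye(4)
    B[0, 0] = np.cosh(rapidity)
    B[0, 1:] = np.sinh(rapidity) * n
    B[1:, 0] = np.sinh(rapidity) * n
    B[1:, 1:] += (np.cosh(rapidity) - 1.0) * np.outer(n, n)
    return B

def polar_commutator_norm(T):
    """Norm of [rho, beta] for the polar decomposition T = rho @ beta w.r.t. e0."""
    w, V = np.linalg.eigh(T.T @ T)
    beta = V @ np.diag(np.sqrt(w)) @ V.T
    rho = T @ np.linalg.inv(beta)
    return float(np.linalg.norm(rho @ beta - beta @ rho))

def commutator_norm_wrt(mu, Lam):
    """Same quantity for the time direction f0 = Lam @ e0, transported back to e0.

    The result vanishes exactly when the polar decomposition of mu
    with respect to f0 commutes.
    """
    return polar_commutator_norm(np.linalg.inv(Lam) @ mu @ Lam)

ez, ex = [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]
mu = rotation_z(0.7) @ boost(ez, 0.5)     # element of the stabilizer S(e-bar), e = ez

print(commutator_norm_wrt(mu, np.eye(4)))        # time direction e0
print(commutator_norm_wrt(mu, boost(ez, 0.3)))   # f0 on the hyperbola through e0
print(commutator_norm_wrt(mu, boost(ex, 0.5)))   # f0 off the hyperbola
```

The first two values vanish up to rounding, because boosting $e_0$ along $e$ stays on the hyperbola $t_{\bar e}$; the third does not, in line with the proof of Lemma 12.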

B. Kuckert, R. Lorenzen

For each $(\rho, \beta) \in \dot R \times B$, let $E(\rho, \beta)$ be the set of all time-zero unit vectors in $FP(\rho)^\perp \cap FP(\beta)$.

Proposition 3. (i) $E(\rho, \beta) \cong S^1$ if and only if $\rho\beta = \beta\rho$. (ii) Otherwise, $E(\rho, \beta) = \{\pm e\}$ for some time-zero unit vector $e$.

Proof of (i). If $\beta = 1$, then $E(\rho, \beta) = FP(\rho)^\perp \cap (\{0\} \times S^2)$, i.e., the intersection of the time-zero two-sphere with a two-dimensional spacelike subspace of the time-zero plane. Such an intersection is homeomorphic to $S^1$. If $\rho \neq 1 \neq \beta$, then $\rho\beta = \beta\rho$ if and only if $FP(\beta) \perp FP(\rho)$ by Lemma 10, and this holds if and only if $FP(\rho)^\perp \cap FP(\beta)$ is a two-dimensional spacelike plane, i.e., if and only if $E(\rho, \beta)$ is homeomorphic with $S^1$.

Proof of (ii). If $\rho\beta \neq \beta\rho$, then $FP(\rho)^\perp \cap FP(\beta)$ is not two-dimensional by Lemma 10, but since $FP(\rho)^\perp$ and $FP(\beta)$ are two-dimensional subspaces of the time-zero plane, their intersection is one-dimensional and contains two opposite time-zero unit vectors.

2.6. Proof of Theorem 1. Let $N^{e_0}$ be the set of all $(\tau, \beta) \in \ddot R \times B$ with $E(\tau, \beta) \cong \mathbb{Z}_2$ (cf. Prop. 3). Define a map $\lambda_1: N^{e_0} \to L_1$ by $\lambda_1(\sigma, \beta) := \sigma^2\beta$ and define $L_1^{e_0} := \lambda_1(N^{e_0})$. Furthermore, set $G_L^{e_0} := \tilde\lambda^{-1}(L_1^{e_0})$.

For each $\rho \in \ddot R$, there is a unique time-zero unit vector $a(\rho)$ with the property that $\rho$ is a right-handed rotation with respect to $a(\rho)$ by a rotation angle $\alpha(\rho)$ smaller than $\pi$. The functions $a(\cdot)$ and $\alpha(\cdot)$ are continuous on $\ddot R$, and $\alpha$ has a continuous extension to a function from all of $R$ onto the closed interval $[0, \pi]$; we denote this extension by $\alpha$ as well.

For each $\beta \in \dot B$, there exists a unique time-zero unit vector $b(\beta)$ with respect to which $\beta$ is a boost by a rapidity $\chi(\beta)$ greater than zero. The functions $b$ and $\chi$ are continuous, and the function $\chi$ has a continuous extension to all of $B$ with values in $\mathbb{R}_{\geq 0}$, which we denote by $\chi$ as well. The functions $\tilde\alpha: G_L \to [0, \pi]$ and $\tilde\chi: G_L \to \mathbb{R}_{\geq 0}$ defined by $\tilde\alpha(g) := \alpha(\tilde\rho(g))$ and $\tilde\chi(g) := \chi(\tilde\beta(g))$ are continuous.

Lemma 13. (i) The polar decomposition $\hat\rho \times \hat\beta: L_1 \to R \times B$ is continuous. (ii) The restriction of the group product in $L_1$ to $R \times B$ is a homeomorphism onto $L_1$. (iii) $N^{e_0}$ is a two-sheeted covering space of $L_1^{e_0}$ when endowed with the covering map $\lambda_1$.

Proof of (i). The group product in $L_1$, the map $\mu \mapsto \mu^*$, and the square-root function are continuous, so the map $\mu \mapsto \hat\beta(\mu) := \sqrt{\mu^*\mu}$ is continuous. Since the map $\mu \mapsto \mu^{-1}$ is continuous as well, one concludes that $\mu \mapsto \hat\rho(\mu) := \mu\hat\beta(\mu)^{-1}$ is continuous.

Proof of (ii). The group product is continuous and inverse to the continuous polar decomposition. Since the group product is onto, so is the polar decomposition.

Proof of (iii). $N^{e_0}$ is an open subset of $\ddot R \times B$, so it suffices to prove the corresponding statement for $\ddot R \times B$. So it remains to be shown that $\ddot R$ is a two-sheeted covering space when endowed with the covering map $\tau \mapsto \tau^2$. Continuity of this map follows from continuity of the group product. Conversely, each $\rho \in \ddot R$ has the two roots $[a(\rho), \alpha(\rho)/2]$ and $[-a(\rho), \pi - \alpha(\rho)/2]$, and since $a$ and $\alpha$ are continuous maps, the square map has a continuous local inverse.
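The two roots named in the proof of Lemma 13 (iii) can be written down explicitly. The sketch below is illustrative only and works with $3\times 3$ rotation matrices on the time-zero plane; it reads $[a, \alpha]$ as the right-handed rotation about the axis $a$ by the angle $\alpha$, and it recovers $a(\rho)$ and $\alpha(\rho)$ from the trace and the antisymmetric part of $\rho$. The function names are hypothetical.

```python
import numpy as np

def rotation3(axis, angle):
    """3x3 right-handed rotation about `axis` by `angle` (Rodrigues formula)."""
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    K = np.array([[0.0, -n[2], n[1]],
                  [n[2], 0.0, -n[0]],
                  [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def axis_angle(rho):
    """Axis a(rho) and angle alpha(rho) in (0, pi) of a rotation with rho^2 != 1."""
    alpha = np.arccos(np.clip((np.trace(rho) - 1.0) / 2.0, -1.0, 1.0))
    a = np.array([rho[2, 1] - rho[1, 2],
                  rho[0, 2] - rho[2, 0],
                  rho[1, 0] - rho[0, 1]]) / (2.0 * np.sin(alpha))
    return a, alpha

def square_roots(rho):
    """The two roots [a, alpha/2] and [-a, pi - alpha/2] of rho."""
    a, alpha = axis_angle(rho)
    return rotation3(a, alpha / 2.0), rotation3(-a, np.pi - alpha / 2.0)

rho = rotation3([1.0, 2.0, 2.0], 1.1)
tau_plus, tau_minus = square_roots(rho)
print(np.allclose(tau_plus @ tau_plus, rho))     # first root squares to rho
print(np.allclose(tau_minus @ tau_minus, rho))   # so does the second
print(np.allclose(tau_plus, tau_minus))          # but the two roots differ
```

The two roots differ by the rotation about $a(\rho)$ by the angle $\pi$, which is why the square map $\tau \mapsto \tau^2$ has exactly two sheets over $\ddot R$.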

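Lemma 14. For each $g \in G_L^{e_0}$, there is a unique square root $\tilde\tau(g)$ of $\tilde\rho(g)$ with $g = \pi(\tilde\tau(g)\bar e, \tilde\beta(g)^{-1/2}\bar e)$ for both $e \in E(\tilde\tau(g), \tilde\beta(g))$.

Proof. If $e \in FP(\tilde\beta(g))$, then $\lambda(\bar e, \tilde\beta(g)^{-1/2}\bar e) = \tilde\beta(g)$. If $e \in FP(\tilde\rho(g))^\perp$, there are precisely two $a \in \bar Z$ with $\lambda(a, \bar e) = \tilde\rho(g)$. Namely, if $\tau_\pm$ are the two square roots of the rotation $\tilde\rho(g)$, then $a_\pm = (\tau_\pm\bar e, \bar e)$ do the job. Accordingly, if $e \in E(\tilde\rho(g), \tilde\beta(g)) = FP(\tilde\rho(g))^\perp \cap FP(\tilde\beta(g))$, the non-equivalent pairs $m^+$ and $m^-$ defined by

$$m^\pm := (\tau_\pm\bar e, \bar e) \circ (\bar e, \tilde\beta(g)^{-1/2}\bar e) = (\tau_\pm\bar e, \tilde\beta(g)^{-1/2}\bar e)$$

satisfy $\lambda(m^\pm) = \tilde\lambda(g)$. By Proposition 2, exactly one of them is contained in $\pi^{-1}(g)$.

Define a "polar decomposition" $\eta: G_L^{e_0} \to N^{e_0}$ by $\eta(g) := (\tilde\tau(g), \tilde\beta(g))$. Evidently $\eta$ is a bijection, and the diagram

$$\begin{array}{ccc} & & G_L^{e_0} \\ & {}^{\eta}\swarrow & \downarrow{\scriptstyle\tilde\lambda} \\ N^{e_0} & \xrightarrow{\ \lambda_1\ } & L_1^{e_0} \end{array} \tag{4}$$

commutes. Next define the set $M_L^{e_0} := \{(\tau\bar e, \beta^{-1/2}\bar e) : (\tau, \beta) \in N^{e_0},\ e \in E(\tau, \beta)\}$, and define a map $\lambda_2: M_L^{e_0} \to N^{e_0}$ by $\lambda_2(m) := \eta(\pi(m))$. Then the diagrams

$$\text{(A)}\quad \begin{array}{ccc} M_L^{e_0} & \xrightarrow{\ \pi\ } & G_L^{e_0} \\ {\scriptstyle\lambda_2}\downarrow & \swarrow{\scriptstyle\eta} & \downarrow{\scriptstyle\tilde\lambda} \\ N^{e_0} & \xrightarrow{\ \lambda_1\ } & L_1^{e_0} \end{array} \qquad\text{and}\qquad \text{(B)}\quad \begin{array}{ccc} M_L^{e_0} & \xrightarrow{\ \pi\ } & G_L^{e_0} \\ {\scriptstyle\lambda_2}\downarrow & \searrow{\scriptstyle\lambda} & \downarrow{\scriptstyle\tilde\lambda} \\ N^{e_0} & \xrightarrow{\ \lambda_1\ } & L_1^{e_0} \end{array} \tag{5}$$

commute. Define a continuous function $e: N^{e_0} \to H_1 \cap e_0^\perp$ by

$$e(\rho, \beta) := \frac{a(\rho) \times b(\beta)}{|a(\rho) \times b(\beta)|},$$

where $\times$ denotes the vector product within the time-zero plane $e_0^\perp$.

Lemma 15. (i) $\lambda_{e_0} := \lambda|_{M_L^{e_0}}$ is an open map. (ii) $\lambda_2$ is continuous. (iii) $\eta$ is continuous.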

Proof of (i). $L_1^{e_0}$ is first-countable, so it suffices to show that for each sequence $(\mu_n)_n$ in $L_1^{e_0}$ converging to a limit $\mu$ and for each $m \in \lambda_{e_0}^{-1}(\mu)$, there exists a sequence $(m_n)_n$ converging to $m$ and satisfying $\lambda_{e_0}(m_n) = \mu_n$.

So let $(\mu_n)_n$ be a sequence in $L_1^{e_0}$ converging to $\mu$. Then $(\hat\rho(\mu_n), \hat\beta(\mu_n))$ converges to $(\hat\rho(\mu), \hat\beta(\mu))$ in $N^{e_0}$, by continuity of the functions $\hat\rho$ and $\hat\beta$. Consequently, the time-zero unit vectors $e_n := e(\hat\rho(\mu_n), \hat\beta(\mu_n))$ tend to the limit $e = e(\hat\rho(\mu), \hat\beta(\mu))$. Since $\bar\pi$ is continuous, the sequence $\bar e_n$ converges to $\bar e$.

Consider, without loss, the element $m := (\tau\bar e, \hat\beta(\mu)^{-1/2}\bar e)$ of the fiber $\lambda_{e_0}^{-1}(\mu)$. There exists a convergent sequence $(\tau_n)_n$ in $R$ with $\tau_n^2 = \hat\rho(\mu_n)$, and the sequence $(m_n)_n$ defined by $m_n := (\tau_n\bar e_n, \hat\beta(\mu_n)^{-1/2}\bar e_n)$ satisfies $\lambda_{e_0}(m_n) = \mu_n$ and $m_n \to m$. The same reasoning applies to the other elements of the fiber $\lambda_{e_0}^{-1}(\mu)$.

Proof of (ii). For each $m_1 \in M_L^{e_0}$, the fiber $\lambda_{e_0}^{-1}(\lambda_{e_0}(m_1))$ contains four elements $m_1, \dots, m_4$, and by the Hausdorff property, these have mutually disjoint open neighborhoods $U_1, \dots, U_4$. Since $\lambda_{e_0}$ is open by statement (i), their images are open, so $V := \lambda_{e_0}(U_1) \cap \dots \cap \lambda_{e_0}(U_4)$ is open. On the other hand, there is an open neighborhood $Y$ of $\lambda_2(m_1)$ with the property that $\lambda_1|_Y$ is a homeomorphism onto $W := \lambda_1(Y)$. Being a covering map, $\lambda_1$ is open, so $W$ is open. $V \cap W$ is open, and $\lambda_{e_0}$ is continuous, so the set $X := U_1 \cap \lambda_{e_0}^{-1}(V \cap W)$ is open and contains $m_1$. The diagram

$$\begin{array}{ccc} X & & \\ {\scriptstyle\lambda_2|_X}\downarrow & \searrow{\scriptstyle\lambda_{e_0}|_X} & \\ Y & \xrightarrow{\ \lambda_1|_Y\ } & V \cap W \end{array}$$

is a commutative diagram of bijections by construction. Since $\lambda_{e_0}|_X$ and $\lambda_1|_Y$ are homeomorphisms, so is $\lambda_2|_X$.

Proof of (iii). Using diagram (5)(B), one immediately concludes the statement from continuity of $\lambda_2$.

Lemma 16. $M_L^{e_0}$ is a two-sheeted covering space of $N^{e_0}$ when endowed with the covering map $\lambda_2$.

Proof. Define continuous maps $m_\pm: N^{e_0} \to M_L^{e_0}$ by $m_\pm(\tau, \beta) := (\pm\tau e(\tau, \beta), \pm\beta e(\tau, \beta))$. We show that these functions are local inverses of $\lambda_2$. For a given $x \in N^{e_0}$, write $y_\pm := m_\pm(x)$. Since $M_L^{e_0}$ is a Hausdorff space, there exist two disjoint open neighborhoods $Y_\pm$ of $y_\pm$. By continuity of $m_\pm$, the pre-images $X_\pm := m_\pm^{-1}(Y_\pm)$ are open, and $X := X_+ \cap X_-$ is an open neighborhood of $x$. By continuity of $\lambda_2$, the sets $W_\pm := \lambda_2^{-1}(X) \cap Y_\pm$ are open neighborhoods of $m_\pm(x)$ with $\lambda_2(W_+) = X = \lambda_2(W_-)$. As a consequence, the continuous maps $m_\pm|_X: X \to W_\pm$ are one-to-one and onto, their inverses being $\lambda_2$.

Proposition 4. (i) $\eta$ is a homeomorphism.

(ii) $G_L^{e_0}$ is a Hausdorff space. (iii) $G_L^{e_0}$ is a two-sheeted covering space of $L_1^{e_0}$ when endowed with the covering map $\tilde\lambda_{e_0}$.

Proof. The maps $\pi \circ m_+$ and $\pi \circ m_-$ coincide and are inverse to $\eta$ by construction. By continuity of $m_\pm$ and $\pi$, they are continuous. This proves (i) and implies (ii). By diagram (4), $\tilde\lambda_{e_0} = \lambda_1 \circ \eta$ is a concatenation of a homeomorphism and a two-sheeted covering map. This yields (iii).

Next we extend these results to $\dot G_L$. To this end, recall that $\mu \in L_1^{e_0}$ if and only if $\hat\rho(\mu)\hat\beta(\mu) \neq \hat\beta(\mu)\hat\rho(\mu)$.

Proposition 5. (i) For each $e_0 \in M_1^+$, the set $G_L^{e_0}$ is an open subset of $\dot G_L$. (ii) $\bigcup_{e_0 \in M_1^+} G_L^{e_0} = \dot G_L$. (iii) $\dot G_L$ is a two-sheeted covering space of $L_1\setminus\{1\}$ when endowed with the covering map $\tilde\lambda$.

Proof. If a sequence $\mu_n \to \mu$ in $L_1$ satisfies $\hat\rho(\mu_n)\hat\beta(\mu_n) = \hat\beta(\mu_n)\hat\rho(\mu_n)$, then $\hat\rho(\mu)\hat\beta(\mu) = \hat\beta(\mu)\hat\rho(\mu)$. Namely, one has $\hat\rho(\mu_n)\hat\beta(\mu_n)\hat\rho(\mu_n)^{-1}\hat\beta(\mu_n)^{-1} = 1$ for all $n$, so $\hat\rho(\mu)\hat\beta(\mu)\hat\rho(\mu)^{-1}\hat\beta(\mu)^{-1} = 1$ follows by continuity of the functions $\hat\beta$, $\hat\rho$, $\hat\beta(\cdot)^{-1}$, and $\hat\rho(\cdot)^{-1}$, and of the group product. As a consequence, the set $L_1^{e_0}$ has a closed complement and, hence, is an open subset of $L_1$. Accordingly, $G_L^{e_0} = \tilde\lambda^{-1}(L_1^{e_0})$ is open by continuity of $\tilde\lambda$. This proves (i).

It follows from Lemma 12 that $\bigcup_{e_0 \in M_1^+} L_1^{e_0} = L_1\setminus\{1\}$, and this proves statement (ii) by continuity of $\tilde\lambda$.

By statements (i) and (ii), there is, for each $g \in \dot G_L$, an open neighborhood restricted to which $\tilde\lambda$ is one-to-one and open. This proves (iii).

Proposition 6. (i) $G_L$ is a Hausdorff space. (ii) $\tilde\lambda$ is open.

Proof of (i). Being a union of Hausdorff spaces, $\dot G_L$ is a Hausdorff space, so it remains to prove that for each $g$ there are disjoint neighborhoods $U_1$ and $U_g$ of $1$ and $g \neq 1$, respectively (which implies that there are disjoint neighborhoods $-U_1$ and $-U_g$ of $-1$ and $-g$). $g \neq 1$ implies that $(\tilde\alpha(g), \tilde\chi(g)) \neq (0, 0)$. Since $\tilde\alpha$ and $\tilde\chi$ are continuous (footnote 4) and since $(\tilde\alpha(h), \tilde\chi(h)) = (0, 0)$ implies $h = 1$, the open sets $U_1 := (\tilde\alpha \times \tilde\chi)^{-1}([0, \varepsilon) \times [0, \varepsilon))$ and $U_g := (\tilde\alpha \times \tilde\chi)^{-1}((\tilde\alpha(g) - \varepsilon, \tilde\alpha(g) + \varepsilon) \times (\tilde\chi(g) - \varepsilon, \tilde\chi(g) + \varepsilon))$ are disjoint for sufficiently small $\varepsilon > 0$.

4 With respect to the relative topologies of the closed topological subspaces $[0, \pi]$ and $\mathbb{R}_{\geq 0}$ of $\mathbb{R}$.

Proof of (ii). It has been shown that $\dot G_L$ is a two-sheeted covering space when endowed with the covering map $\tilde\lambda$. Since $\tilde\lambda$ is continuous on all of $G_L$, it remains to be shown that $\tilde\lambda$

is open at $\pm 1$. $L_1$ is first countable, so it suffices to show that for each sequence $\mu_n \to 1$ in $L_1$ there exists a sequence $g_n \to 1$ in $G_L$ with $\tilde\lambda(g_n) = \mu_n$; note that the sequence $(-g_n)_n$ tends to $-1$ in this case (footnote 5). For each $n$ there is a $g_n \in \tilde\lambda^{-1}(\mu_n)$ with $\tilde\alpha(g_n) \leq \pi/2$. For any $\varepsilon > 0$, almost all $g_n$ satisfy $(\tilde\alpha(g_n), \tilde\chi(g_n)) \in [0, \pi] \times [0, \varepsilon]$. Since this is a compact set, the sequence $(\tilde\alpha(g_n), \tilde\chi(g_n))$ has at least one accumulation point. $\tilde\beta(g_n)$ tends to 1, so $\tilde\chi(g_n)$ tends to zero, so all accumulation points are in $[0, \pi] \times \{0\}$. The assumption $\mu_n \to 1$ further reduces the set of possible points to the set $\{(0, 0), (\pi, 0)\}$, and opting for $\tilde\alpha(g_n) \leq \pi/2$ rules out $(\pi, 0)$. So both $\tilde\alpha(g_n)$ and $\tilde\chi(g_n)$ tend to zero. It follows that $g_n$ tends to 1.

We now recall and prove the remaining statements made in Sect. 1.2.

Theorem 1(i). $G_L$ is a two-sheeted covering space of $L_1$ when endowed with the covering map $\tilde\lambda$.

Proof. $\dot G_L$ is a covering of $L_1\setminus\{1\}$ when endowed with the covering map $\tilde\lambda$, so all that remains to be shown is that $\tilde\lambda$ is a homeomorphism from some neighborhood $U$ of $1$ or $-1$ onto $\tilde\lambda(U)$. Since $G_L$ is a Hausdorff space, there exist disjoint neighborhoods $U_\pm$ of $\pm 1$. Since $\tilde\lambda$ is open, the images $V_\pm := \tilde\lambda(U_\pm)$ are open. The intersection $V := V_+ \cap V_-$ is an open neighborhood of $1 \in L_1$, and by continuity of $\tilde\lambda$, the sets $W_\pm := U_\pm \cap \tilde\lambda^{-1}(V_+ \cap V_-)$ are open and, hence, neighborhoods of $\pm 1 \in G_L$, respectively. Since $W_\pm$ have been constructed in such a fashion that $\tilde\lambda(W_+) = V = \tilde\lambda(W_-)$, the restrictions $\tilde\lambda_\pm$ to $W_\pm$ are one-to-one and onto, and since $\tilde\lambda$ is open, the inverse mappings are continuous.

Theorem 1(ii). $G_L$ is simply connected.

Proof. $\bar Z$ is pathwise connected, so $M_L = \bar Z \times \bar Z$ is pathwise connected, and since $\pi$ is continuous, $G_L = \pi(M_L)$ is pathwise connected. Since $G_L$ is a two-sheeted covering group of $L_1$, and since the fundamental group of $L_1$ is isomorphic with $\mathbb{Z}_2$, one concludes that $G_L$ is homeomorphic with the universal covering of $L_1$.

Theorem 1(iii). There is a unique group product on $G_L$ with the property that the diagram

$$\begin{array}{ccc} M_L \times M_L & \xrightarrow{\ \circ\ } & M_L \\ \downarrow{\scriptstyle\pi\times\pi} & & \downarrow{\scriptstyle\pi} \\ G_L \times G_L & \longrightarrow & G_L \\ \downarrow{\scriptstyle\tilde\lambda\times\tilde\lambda} & & \downarrow{\scriptstyle\tilde\lambda} \\ L_1 \times L_1 & \xrightarrow{\ \cdot\ } & L_1 \end{array} \tag{6}$$

commutes. Here the unlabeled middle arrow is the group product on $G_L$.

5 It suffices to consider sequences, since $L_1$ is first-countable (which we have not yet proved for $G_L$ at this stage). Namely, let $U_g \subset G_L$ be a neighborhood of any $g \in G_L$, and let $(\mu_n)_n$ be a sequence in $L_1$ converging to $\tilde\lambda(g)$. By assumption there is a sequence $g_n \to g$ with $\tilde\lambda(g_n) = \mu_n$. Since $g_n \to g$ and since $U_g$ is a neighborhood of $g$, one has $g_n \in U_g$ for almost all $n$, so $\mu_n = \tilde\lambda(g_n) \in \tilde\lambda(U_g)$ for almost all $n$. Since this holds for all sequences $\mu_n \to \tilde\lambda(g)$, one concludes that $\tilde\lambda(U_g)$ is a neighborhood of $\tilde\lambda(g)$ in $L_1$ by first-countability.

Proof. The outer arrows of the diagram commute, so it suffices to prove existence and uniqueness of a group product conforming with the lower part. But it is well known that each simply connected covering space $\tilde G$ of a topological group $G$ can be endowed with a unique group product such that $\tilde G$ is a covering group (footnote 6).

Lemma 1. Given $h \in G_L$ and $(c, d) \in M_L$, one has

$$h\,\pi(c, d)\,h^{-1} = \pi\big(\tilde\lambda(h)c, \tilde\lambda(h)d\big). \tag{7}$$

Proof. The function $F: G_L \to G_L$ defined by $F(h) := \pi\big(\tilde\lambda(h)c, \tilde\lambda(h)d\big)^{-1} h\,\pi(c, d)\,h^{-1}$ has the property that $\tilde\lambda(F(h)) = 1$ and that, hence, it takes values in the discrete set $\{\pm 1\} \subset G_L$. Since $F$ is continuous and $G_L$ is connected, $F$ is constant, and because $F(1) = 1$, it follows that $F(h) = 1$ for all $h$.

3. Spin & Statistics

The preceding section has provided the basis of a general spin-statistics theorem, which is the subject of this section. From an intrinsic form of symmetry under a charge conjugation combined with a time inversion and the reflection in one spatial direction, which is referred to as modular P$_1$CT-symmetry, a strongly continuous unitary representation $\tilde W$ of $G_L$ will be constructed. It is, then, elementary to show that $\tilde W$ exhibits Pauli's spin-statistics relation.

Let $F$ be an arbitrary quantum field on $\mathbb{R}^{1+3}$ in a Hilbert space $H$. The standard properties of the relativistic quantum field to be used here are practically the same as in Ref. [14] and are recalled here for the reader's convenience.

(A) Algebra of field operators. Let $C$ be a linear space of arbitrary dimension (footnote 7), and denote by $\mathcal{D}$ the space $C_0^\infty(\mathbb{R}^{1+3})$ of test functions on $\mathbb{R}^{1+3}$. The field $F$ is a linear function that assigns to each $\Phi \in C \otimes \mathcal{D}$ a linear operator $F(\Phi)$ in a separable Hilbert space $H$.

(A.1) $F$ is free from redundancies in $C$, i.e., if $c, d \in C$ and if $F(c \otimes \varphi) = F(d \otimes \varphi)$ for all $\varphi \in \mathcal{D}$, then $c = d$.

(A.2) Each field operator $F(\Phi)$ and its adjoint $F(\Phi)^\dagger$ are densely defined. There exists a dense subspace $D$ of $H$ contained in the domains of $F(\Phi)$ and $F(\Phi)^\dagger$ and satisfying $F(\Phi)D \subset D$ and $F(\Phi)^\dagger D \subset D$ for all $\Phi \in C \otimes \mathcal{D}$.

Denote by $\mathcal{F}$ the algebra generated by all $F(\Phi)|_D$ and all $F(\Phi)^\dagger|_D$. Defining an involution $*$ on $\mathcal{F}$ by $A^* := A^\dagger|_D$, the algebra $\mathcal{F}$ is endowed with the structure of a $*$-algebra. Let $\mathcal{F}(a)$ be the algebra generated by all $F(c \otimes \varphi)|_D$ and all $F(c \otimes \varphi)^\dagger|_D$ with $\mathrm{supp}(\varphi) \subset W_a$, where $W_a$ denotes, as above, the Rindler wedge of $a$. The algebra $\mathcal{F}(a)$ inherits the structure of a $*$-algebra from $\mathcal{F}$ by restriction of $*$.

(A.3) $\mathcal{F}(a)$ is non-Abelian for each $a$, and $a \neq b$ implies $\mathcal{F}(a) \neq \mathcal{F}(b)$.

(B) Cyclic vacuum vector. There exists a vector $\Omega \in D$ that is cyclic with respect to each $\mathcal{F}(a)$.

(C) Normal commutation relations. There exists a unitary and self-adjoint operator $k$ on $H$ with $k\Omega = \Omega$ and with $k\mathcal{F}(a)k = \mathcal{F}(a)$ for all $a$. Define $F_\pm := \frac{1}{2}(F \pm kFk)$. If $c$ and $d$ are arbitrary elements of $C$ and if $\varphi, \psi \in \mathcal{D}$ have spacelike separated supports, then

$$F_+(c \otimes \varphi)F_+(d \otimes \psi) = F_+(d \otimes \psi)F_+(c \otimes \varphi),$$
$$F_+(c \otimes \varphi)F_-(d \otimes \psi) = F_-(d \otimes \psi)F_+(c \otimes \varphi),$$
$$F_-(c \otimes \varphi)F_-(d \otimes \psi) = -F_-(d \otimes \psi)F_-(c \otimes \varphi)$$

for all $c, d \in C$.

The involution $k$ is the statistics operator, and $F_\pm$ are the bosonic and fermionic components of $F$, respectively. Defining $\kappa := (1 + ik)/(1 + i)$ and $F^t(d \otimes \psi) := \kappa F(d \otimes \psi)\kappa^\dagger$, the normal commutation relations read $[F(c \otimes \varphi), F^t(d \otimes \psi)] = 0$. This property is referred to as twisted locality. Denote $\mathcal{F}(a)^t := \kappa\mathcal{F}(a)\kappa^\dagger$.

These properties imply that $\Omega$ is separating with respect to each algebra $\mathcal{F}(a)$, i.e., there is no nonzero operator $A \in \mathcal{F}(a)$ with $A\Omega = 0$ (footnote 8). As a consequence, an antilinear operator $R_a: \mathcal{F}(a)\Omega \to \mathcal{F}(a)\Omega$ is defined by $R_a A\Omega := A^*\Omega$. This operator is closable. Its closed extension $S_a$ has a unique polar decomposition $S_a = J_a\Delta_a^{1/2}$ into an antiunitary operator $J_a$, which is called the modular conjugation, and a positive operator $\Delta_a^{1/2}$, which is called the modular operator (footnote 9). $J_a$ is an involution.

For each $a \in \bar Z$, let $j_a$ be the orthogonal reflection at the edge of $W_a$ and for each $\varphi \in \mathcal{D}$ define the test function $j_a\varphi \in \mathcal{D}$ by $j_a\varphi(x) := \varphi(j_a x)$.

(D) Modular P$_1$CT-symmetry. For each $a \in \bar Z$, there exists a linear involution $C_a$ in $C$ such that for all $c \in C$ and $\varphi \in \mathcal{D}$, one has $J_a F(c \otimes \varphi)J_a = F^t(C_a c \otimes j_a\varphi)$. The map $a \mapsto J_a$ is strongly continuous (footnote 10).

In the rest of the section it will be shown that pairs of modular P$_1$CT-reflections give rise to a strongly continuous representation of $G_L$ exhibiting Pauli's spin-statistics connection.

Lemma 17. Let $K$ be a unitary or antiunitary operator in $H$ with $K\Omega = \Omega$, and suppose there are $a, b \in \bar Z$ such that $K\mathcal{F}(a)K^\dagger = \mathcal{F}(b)$. Then $KJ_aK^\dagger = J_b$, and $K\Delta_aK^\dagger = \Delta_b$.

Proof. If $B \in \mathcal{F}(b)$, then $K^\dagger BK \in \mathcal{F}(a)$, so

$$KS_aK^\dagger B\Omega = KS_a(K^\dagger BK)\Omega = K(K^\dagger BK)^*\Omega = B^*\Omega = S_bB\Omega.$$

The statement now follows by uniqueness of the polar decomposition.

6 See, e.g., Props. 5 and 6 in Sect. I.VIII. in Ref. [7].
7 As in Ref. [14], $C$ is the "component space", and its dimension equals the number of components, which may be infinite in what follows.
8 For details on this and the following statements, see Ref. [14] or [3].
9 $S_a$, $J_a$, and $\Delta_a^{1/2}$ are the objects of the well-known modular theory developed by Tomita and Takesaki.
10 If one assumes covariance with respect to some strongly continuous representation of $G_L$ (which may also violate the spin-statistics connection), this is straightforward to derive. But covariance, as such, is not needed.

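This lemma yields a couple of important relations. For each $a \in \bar Z$, one has $k\mathcal{F}(a)k^\dagger = k\mathcal{F}(a)k = \mathcal{F}(a)$, so $kJ_ak = J_a$, whence

$$J_a\kappa = \kappa^\dagger J_a \tag{8}$$

follows by antilinearity of $J_a$. By modular P$_1$CT-symmetry, $J_a\mathcal{F}(a)J_a = \mathcal{F}^t(-a) = \kappa\mathcal{F}(-a)\kappa^\dagger$, so

$$\kappa^\dagger J_a\kappa = \kappa^\dagger J_a J_a J_a\kappa = J_{-a}. \tag{9}$$

It also follows from modular P$_1$CT-symmetry that $J_a\mathcal{F}(b)J_a = \mathcal{F}^t(j_ab)$, so

$$J_aJ_bJ_a = \kappa J_{j_ab}\kappa^\dagger = J_{-j_ab} = J_{j_aj_bb} = J_{\lambda(a,b)b}. \tag{10}$$

These consequences of Lemma 17 will be used extensively in what follows without mentioning them further. For every $(a, b) \in M_L$, define $W(a, b) := J_aJ_b$.

Theorem 2. (i) There exists a representation $\tilde W: G_L \to W(M_L)$ in $H$ with the property that the diagrams

$$\begin{array}{ccc} M_L \times M_L & \xrightarrow{\ \circ\ } & M_L \\ \downarrow{\scriptstyle\pi\times\pi} & & \downarrow{\scriptstyle\pi} \\ G_L \times G_L & \longrightarrow & G_L \\ \downarrow{\scriptstyle\tilde W\times\tilde W} & & \downarrow{\scriptstyle\tilde W} \\ \tilde W(G_L) \times \tilde W(G_L) & \xrightarrow{\ \cdot\ } & \tilde W(G_L) \end{array} \qquad\text{and}\qquad \begin{array}{ccc} M_L & \xrightarrow{\ \pi\ } & G_L \\ {\scriptstyle W}\downarrow & \swarrow{\scriptstyle\tilde W} & \\ W(M_L) & & \end{array} \tag{11}$$

commute.

(ii) There is a representation $\tilde D$ of $G_L$ in $C$ such that

$$\tilde W(g)F(c \otimes \varphi)\tilde W(g)^\dagger = F(\tilde D(g)c \otimes \tilde\lambda(g)\varphi) \quad\text{for all } g, c, \varphi, \tag{12}$$

where $\tilde\lambda(g)\varphi := \varphi(\tilde\lambda(g)^{-1}\,\cdot)$.

Proof of (i). Fix some $e_0 \in M_1^+$.

For each $r \in G_L$ with $\tilde\lambda(r) \in R$, there exists a unique rotation $\tau$ with $\tau^2 = \tilde\lambda(r)$ and $r = \pi(\tau\bar e, \bar e)$ for all time-zero unit vectors $e$ in $FP(\tilde\lambda(r))^\perp$. Write $m := (\tau\bar e, \bar e)$. For each $n := (\bar e', \bar f') \sim m$, there exists a rotation $\rho$ with $\rho\lambda(m)\rho^{-1} = \lambda(m)$ and $\rho^2m = n$. Because $a \mapsto J_a$ and, hence, also the map $W$ is continuous by assumption of modular P$_1$CT-symmetry, one can mimic the proof of Lemma 2.4 in Ref. [6] in order to show that $W(m) = W(n)$, and one can define a unitary operator $\tilde W_{e_0}(r)$ by $\tilde W_{e_0}(r) := W(m)$. It has been shown in Ref. [14] that these operators give rise to a representation of the subgroup $G_R := \tilde\lambda^{-1}(R)$ of $G_L$.

For each $b \in G_L$ with $\tilde\lambda(b) \in B$, the class $\pi^{-1}(b)$ contains elements of the form either $(\bar e, \beta\bar e)$ or $(\bar e, -\beta\bar e)$, where $\beta := \tilde\beta(b)^{-1/2}$. The one-parameter group of rotations around the boost direction of $\beta$ acts transitively on the set of such elements. If $n$ is a second such argument equivalent to $m$, one can, again, use the reasoning of Ref. [6] in order to show that $W(m) = W(n)$, and one can define $\tilde W_{e_0}(b) := W(m)$. Furthermore,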

if $(b_t)_t$ is a one-parameter subgroup of $G_L$ with $\tilde\lambda(b_t) \in B$, it follows from the results of Ref. [6] that $\tilde W(b_s)\tilde W(b_t) = \tilde W(b_{s+t})$.

The polar decomposition in $L_1$ can be lifted to a polar decomposition in $G_L$. Namely, given an arbitrary $g \in G_L$, there exist $r_g, b_g \in G_L$ with $\tilde\lambda(r_g) = \hat\rho(\tilde\lambda(g))$ and $\tilde\lambda(b_g) = \hat\beta(\tilde\lambda(g))$ and $r_gb_g = g$. This decomposition is unique up to replacement of $r_g$ and $b_g$ by $-r_g$ and $-b_g$, respectively. Therefore, the operator $\tilde W(g) := \tilde W(r_g)\tilde W(b_g)$ does not depend on the choice of this polar decomposition. For arbitrary $g = r_gb_g \in G_L$, define $\tilde W_{e_0}(g) := \tilde W_{e_0}(r_g)\tilde W_{e_0}(b_g)$. Note that the definition of $\tilde W_{e_0}(g)$ depends on $e_0$ as it stands; but the index will be dropped for the time being.

Lemma 18. Consider $g, h \in G_L$ with $\tilde\lambda(g), \tilde\lambda(hgh^{-1}) \in B$ and $\tilde\lambda(h) \in R \cup B$. Then $\tilde W(h)\tilde W(g)\tilde W(h)^\dagger = \tilde W(hgh^{-1})$.

Proof. It follows from Eq. (10) and Lemma 1 that for each $(a, b) \in \pi^{-1}(g)$, one has

$$\begin{aligned} \tilde W(h)\tilde W(\pi(a, b))\tilde W(h)^\dagger &= \tilde W(h)J_aJ_b\tilde W(h)^\dagger = J_{\tilde\lambda(h)a}J_{\tilde\lambda(h)b} \\ &= W(\tilde\lambda(h)a, \tilde\lambda(h)b) = \tilde W\big(\pi(\tilde\lambda(h)a, \tilde\lambda(h)b)\big) \\ &= \tilde W(h\pi(a, b)h^{-1}). \end{aligned} \tag{13}$$

Lemma 19. If $\tilde\lambda(g) \in S(a)$ for some $a \in \bar Z$, then $W(m) = \tilde W(g)$ for all $m \in \pi^{-1}(g)$.

Proof. Without loss, suppose that $a = \bar e$ for some time-zero unit vector $e$. If $g = r_gb_g$ and $h = r_hb_h$ with $\tilde\lambda(g), \tilde\lambda(h) \in S(\bar e)$, then Lemma 18 implies

$$\tilde W(h)\tilde W(g)\tilde W(h)^\dagger = \tilde W(r_h)\tilde W(b_h) \cdot \tilde W(r_g)\tilde W(b_g) \cdot \tilde W(b_h)^\dagger\tilde W(r_h)^\dagger = \tilde W(g). \tag{14}$$

Let $m \in M_L$ satisfy $W(m) = \tilde W(g)$. If $n \sim m$ and $n \neq m$, then there exists, by definition of $\sim$, a $\mu \in L_1$ with $\mu^2 \neq 1$, commuting with $\tilde\lambda(g)$ and satisfying $\mu^2m = \pm n$. Since $S(\bar e)$ is a maximal Abelian group and since $\tilde\lambda(g) \in S(\bar e)$ by assumption, one concludes $\mu \in S(\bar e)$, and for each $h$ with $\tilde\lambda(h) = \mu^2$, one obtains from Eq. (14),

$$W(n) = W(\pm\mu^2m) = W(\mu^2m) = \tilde W(h)\tilde W(g)\tilde W(h)^\dagger = \tilde W(g) = W(m).$$

Proof of (i) (contd.). Next let $g \in G_L$ be arbitrary with polar decomposition $r_gb_g$. $\tilde W(g)$ is an element of $W(M_L)$. Namely, recall that there exist a time-zero unit vector $e$ and a rotation $\tau$ such that $g = \pi(\tau\bar e, \beta^{-1/2}\bar e)$, where $\beta = \tilde\lambda(b_g) \in B$. One concludes

$$\tilde W(g) = \tilde W(r_g)\tilde W(b_g) = J_{\tau\bar e}J_{\bar e}J_{\bar e}J_{\beta^{-1/2}\bar e} = J_{\tau\bar e}J_{\beta^{-1/2}\bar e} \in W(M_L).$$

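$\tilde W$ is a representation. Namely,

$$\begin{aligned} \tilde W(g)\tilde W(h) &= \tilde W(r_g)\tilde W(b_g)\tilde W(r_h)\tilde W(b_h) \\ &= \tilde W(r_g)\tilde W(r_h) \cdot \tilde W(r_h)^\dagger\tilde W(b_g)\tilde W(r_h) \cdot \tilde W(b_h) \\ &=: \tilde W(r_gr_h)\,\tilde W(b_f)\tilde W(b_h). \end{aligned}$$

The last two terms implement the generalized boost

$$\tilde\lambda(b_f)\tilde\lambda(b_h) = \tilde\lambda(b_f)^{1/2} \cdot \tilde\lambda(b_f)^{1/2}\tilde\lambda(b_h)\tilde\lambda(b_f)^{1/2} \cdot \tilde\lambda(b_f)^{-1/2},$$

so for a time-zero unit vector $e$ orthogonal to the boost directions of $\tilde\lambda(b_f)$ and $\tilde\lambda(b_h)$, Lemma 19 yields

$$\begin{aligned} \tilde W(b_fb_h) &= J_{\pm\tilde\lambda(b_f)^{1/2}\bar e}J_{\tilde\lambda(b_h)^{-1/2}\bar e} = J_{\pm\tilde\lambda(b_f)^{1/2}\bar e}J_{\bar e}^2J_{\tilde\lambda(b_h)^{-1/2}\bar e} \\ &= J_{\bar e}J_{\pm\tilde\lambda(b_f)^{-1/2}\bar e}J_{\bar e}J_{\tilde\lambda(b_h)^{-1/2}\bar e} = \tilde W(b_f)\tilde W(b_h) \end{aligned}$$

for one of the signs. Now write $b_fb_h =: d = r_db_d$, then

$$\tilde W(g)\tilde W(h) = \tilde W(r_gr_hr_d)\tilde W(b_d) = \tilde W(r_gr_hr_db_d) = \tilde W(gh).$$

Proof of (ii). Define a map $D$ from $M_L$ into the automorphism group $\mathrm{Aut}(C)$ of $C$ by $D(a, b) := C_aC_b$. If $(a, b) \sim (c, d)$, then modular P$_1$CT-symmetry implies

$$\begin{aligned} F(C_aC_bc \otimes j_aj_b\varphi) &= W(a, b)F(c \otimes \varphi)W(a, b)^\dagger \\ &= W(c, d)F(c \otimes \varphi)W(c, d)^\dagger \\ &= F(C_cC_dc \otimes j_cj_d\varphi) \\ &= F(C_cC_dc \otimes j_aj_b\varphi) \end{aligned}$$

for all $c$ and all $\varphi$. Using Assumption (A.1), one obtains $C_aC_bc = C_cC_dc$ for all $c$, so $D(a, b) = D(c, d)$, and a map $\tilde D: G_L \to \mathrm{Aut}(C)$ is defined by $\tilde D(\pi(m)) := D(m)$. This map $\tilde D$ now inherits the representation property from $\tilde W$.

Theorem 3 (Spin-statistics connection).

$$F_\pm(c \otimes \varphi) = \frac{1}{2}\big(F(c \otimes \varphi) \pm F(\tilde D(-1)c \otimes \varphi)\big)$$

for all $c$ and all $\varphi$.

Proof. For each $a \in \bar Z$ one has $\tilde W(-1) = J_aJ_{-a} = J_a\kappa J_a\kappa^\dagger = J_a^2(\kappa^\dagger)^2 = k$, so

$$kF(c \otimes \varphi)k = \tilde W(-1)F(c \otimes \varphi)\tilde W(-1) = \tilde W(-1)F(c \otimes \varphi)\tilde W(-1)^\dagger = F(\tilde D(-1)c \otimes \varphi).$$

If, in particular, $\tilde D$ is irreducible with spin $s$, then $\tilde D(-1) = e^{2\pi is}$, so $F_- = 0$ for integer $s$ and $F_+ = 0$ for half-integer $s$.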
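Two elementary facts used in the proof of Theorem 3 can be checked in finite dimensions. The sketch below is illustrative and rests on assumptions not made in the text: $k$ is modeled by a diagonal matrix with entries $\pm 1$, and $\tilde D(-1)$ for spin $s$ is modeled by the rotation by $2\pi$ in the standard spin-$s$ representation with $J_z = \mathrm{diag}(s, s-1, \dots, -s)$. It checks that $\kappa = (1 + ik)/(1 + i)$ is unitary with $(\kappa^\dagger)^2 = k$, and that the full turn equals $e^{2\pi is}$ times the identity. The function names are hypothetical.

```python
import numpy as np

def kappa(k):
    """kappa = (1 + i k) / (1 + i) for an involution k with k @ k = 1."""
    identity = np.eye(k.shape[0])
    return (identity + 1j * k) / (1.0 + 1j)

def full_turn(s):
    """exp(-2 pi i J_z) in the spin-s representation, J_z = diag(s, s-1, ..., -s)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    return np.diag(np.exp(-2j * np.pi * m))

# Toy statistics operator: +1 on a "bosonic" and -1 on a "fermionic" sector.
k = np.diag([1.0, -1.0])
kap = kappa(k)
print(np.allclose(kap @ kap.conj().T, np.eye(2)))    # kappa is unitary
print(np.allclose(kap.conj().T @ kap.conj().T, k))   # (kappa^dagger)^2 = k

# The sign e^{2 pi i s} = (-1)^{2s} decides which of F_+ and F_- survives.
for s in (0.0, 0.5, 1.0, 1.5):
    sign = full_turn(s)[0, 0].real
    print(s, round(sign))
```

With the sign $+1$ (integer $s$) the combination $\frac12(F - F) = F_-$ vanishes, and with the sign $-1$ (half-integer $s$) the combination $F_+$ vanishes, which is the statement at the end of the proof.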

4. Other Modular Symmetries

Evidently, the operator $\Theta := J_{\bar e_1}J_{\bar e_2}J_{\bar e_3}$ implements a full PCT-symmetry. $\Theta$ depends on the handedness of the triple $(e_1, e_2, e_3)$ only [14].

As mentioned earlier, Guido and Longo obtained a spin-statistics theorem in the above spirit in Ref. [10]. Instead of the P$_1$CT-reflections, they assumed the modular groups associated with the algebras $\mathcal{F}(a + x)$ and the vacuum vector, which satisfy the KMS-condition, to implement Lorentz boosts, which is the abstract version of the Unruh effect (here $a + x$ denotes the wedge $a$ translated by $x \in \mathbb{R}^{1+3}$). Assuming the field algebras to be von Neumann algebras satisfying some standard properties, they constructed a positive-energy representation of $P_+^\uparrow$ not from P$_1$CT-reflections, but from the modular groups assigned to arbitrary Rindler wedges (for which the commutation relations requested for covariance are not assumed from the outset [4, 10]). This representation satisfies Pauli's spin-statistics relation.

Since both the above representation $\tilde W$ and their representation have been constructed from the basic elements of Tomita-Takesaki theory, one should expect them to coincide. In the following sense, this is the case. If we assume (within our setting) that the modular groups associated with the algebras $\mathcal{F}(a)$ and $\mathcal{F}^t(a)$, $a \in \bar Z$, implement boosts and generate a representation $U$ of $G_L$, then the representations $U$ and $\tilde W$ coincide.

Lemma 20. $U = \tilde W$.

Proof. It suffices to show that for each $\Delta_a^{it}$, $a \in \bar Z$, $t \in \mathbb{R}$, there exist $b, c \in \bar Z$ such that $\Delta_a^{it} = J_bJ_c$. If for some $a, b \in \bar Z$ there exist zweibeine $\xi \in \bar\pi^{-1}(a)$ and $\eta \in \bar\pi^{-1}(b)$ with $t_\xi = t_\eta$ and $x_\xi \perp x_\eta$, then it follows from Lemma 17 that $J_b\Delta_a^{it}J_b = \Delta_a^{-it}$ and $\Delta_a^{it/2}J_b\Delta_a^{-it/2} = J_{\Lambda_a(t/2)b}$ for all $t \in \mathbb{R}$, so

$$J_b\Delta_a^{it} = \Delta_a^{-it/2}J_b\Delta_a^{it/2} = J_{\Lambda_a(-t/2)b}, \quad\text{i.e.,}\quad \Delta_a^{it} = J_bJ_{\Lambda_a(-t/2)b}.$$

Conclusion Both the classical geometry and the fundamental quantum field theoretic representations of the restricted Lorentz group L 1 are based on reflection symmetries. At the classical level, a simply connected covering group G L of L 1 can be constructed from P1 T-reflections. For a typical quantum field F, a class of antiunitary P1 CT-operators exists that are fixed by the intrinsic structure of the respective field. These are the fundamental symmetries of quantum field theories, and they give rise to a unitary representation of the Lorentz group. In order to show this, the existence of such a representation does not need to be assumed from the outset. On the other hand, the construction yields a distinguished representation of the Lorentz group even in cases where several covariant representations are present. It may happen in such cases that representations satisfying Pauli’s relation coexist with representations violating it. In any case, the representation constructed from the modular P1 CT-conjugations exhibits the right spin-statistics connection, and this is, eventually, straightforward to see.

In more than 1 + 3 dimensions, the proof of a similar statement would require a considerable extension and refinement of our analysis. In general, more than two reflections are needed in order to construct an element of a higher dimensional Lorentz group. Accordingly, the formulation of an appropriate equivalence relation is unlikely to be a simple generalization of the definition of $\sim$ given above. Furthermore, the structure of the higher dimensional restricted Lorentz group and its universal covering group is more involved; as an example, the stabilizer group of a Rindler wedge is not Abelian. In 1 + 2 dimensions, the universal covering group of $L_+^\uparrow$ has an infinite number of sheets and the situation appears even more involved.

Acknowledgements. Thanks are due to Professor D. Arlt for critically reading preliminary versions of the manuscript. The authors have been supported by the Emmy-Noether-programme of the Deutsche Forschungsgemeinschaft. RL also acknowledges support from the Graduiertenkolleg “Zukünftige Entwicklungen der Teilchenphysik” at the University of Hamburg.

References
1. Bisognano, J.J., Wichmann, E.H.: On the Duality Condition for a Hermitian Scalar Field. J. Math. Phys. 16, 985–1007 (1975)
2. Bisognano, J.J., Wichmann, E.H.: On the Duality Condition for Quantum Fields. J. Math. Phys. 17, 303 (1976)
3. Bratteli, O., Robinson, D.W.: Operator Algebras and Quantum Statistical Mechanics. Berlin Heidelberg New York: Springer, 1987
4. Brunetti, R., Guido, D., Longo, R.: Modular Structure and Duality in Conformal Quantum Field Theory. Commun. Math. Phys. 156, 201–219 (1993)
5. Buchholz, D., Dreyer, O., Florig, M., Summers, S.J.: Geometric Modular Action and Spacetime Symmetry Groups. Rev. Math. Phys. 12, 475–560 (2000)
6. Buchholz, D., Summers, S.J.: An algebraic characterization of vacuum states in Minkowski space. III. Reflection maps. Commun. Math. Phys. 246, 625–641 (2004)
7. Chevalley, C.: Theory of Lie Groups. Princeton, NJ: Princeton University Press, 1946
8. Florig, M.: Geometric Modular Action. PhD thesis, University of Florida, Gainesville, 1999
9. Fröhlich, J., Marchetti, P.A.: Spin statistics theorem and scattering in planar quantum field theories with braid statistics. Nucl. Phys. B356, 533–573 (1991)
10. Guido, D., Longo, R.: An Algebraic Spin and Statistics Theorem. Commun. Math. Phys. 172, 517–534 (1995)
11. Guido, D., Longo, R.: The Conformal Spin and Statistics Theorem. Commun. Math. Phys. 181, 11–36 (1996)
12. Kuckert, B.: A New Approach to Spin & Statistics. Lett. Math. Phys. 35, 319–335 (1995)
13. Kuckert, B.: Covariant Thermodynamics of Quantum Systems: Passivity, Semipassivity, and the Unruh Effect. Ann. Phys. (N. Y.) 295, 216–229 (2002)
14. Kuckert, B.: Spin, Statistics, and Reflections, I. Ann. Henri Poincaré 6, 849–862 (2005)
15. Kuckert, B.: Spin & Statistics in Nonrelativistic Quantum Mechanics, I. Phys. Lett. A 332, 47–53 (2004)
16. Kuckert, B., Mund, J.: Spin & Statistics in Nonrelativistic Quantum Mechanics, II. Ann. Phys. (Leipzig) 14, 309–311 (2005)
17. Streater, R.F.: Local Field with the Wrong Connection Between Spin and Statistics. Commun. Math. Phys. 5, 88–98 (1967)
18. Streater, R.F., Wightman, A.S.: PCT, Spin & Statistics, and All That. New York: Benjamin, 1964
19. Takesaki, M.: Tomita’s Theory of Modular Hilbert Algebras and Its Applications. Lecture Notes in Mathematics 128. New York: Springer, 1970
20. Unruh, W.G.: Notes on black-hole evaporation. Phys. Rev. D14, 870–892 (1976)

Communicated by Y. Kawahigashi

Commun. Math. Phys. 269, 833–849 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0149-3

Communications in

Mathematical Physics

Generalized Kähler Manifolds and Off-shell Supersymmetry

Ulf Lindström (1,2), Martin Roček (3), Rikard von Unge (4), Maxim Zabzine (1,5)

1 Department of Theoretical Physics, Uppsala University, Box 803, SE-751 08 Uppsala, Sweden. E-mail: [email protected]
2 HIP-Helsinki Institute of Physics, University of Helsinki, P.O. Box 64 FIN-00014 Suomi, Finland
3 C.N.Yang Institute for Theoretical Physics, Stony Brook University, Stony Brook, NY 11794-3840, USA
4 Institute for Theoretical Physics, Masaryk University, 61137 Brno, Czech Republic
5 Kavli Institute of Theoretical Physics, University of California, Santa Barbara, CA 93106, USA

Received: 6 January 2006 / Accepted: 16 June 2006 Published online: 15 November 2006 – © Springer-Verlag 2006

Abstract: We solve the long standing problem of finding an off-shell supersymmetric formulation for a general N = (2, 2) nonlinear two dimensional sigma model. Geometrically the problem is equivalent to proving the existence of special coordinates; these correspond to particular superfields that allow for a superspace description. We construct and explain the geometric significance of the generalized Kähler potential for any generalized Kähler manifold; this potential is the superspace Lagrangian.

1. Introduction

Recently general N = (2, 2) supersymmetric sigma models have attracted considerable attention; the renewed interest comes both from physics and mathematics. The physics is related to compactifications with NS-NS fluxes, whereas the mathematics is associated with generalized complex geometry, in particular, generalized Kähler geometry, which is precisely the geometry of the target space of N = (2, 2) supersymmetric sigma models. The general N = (2, 2) sigma model originally described in [1] has been studied extensively in the physics literature. However, until now an N = (2, 2) off-shell supersymmetric formulation has not been known in the general case. At the physicist’s level of rigor, a description in terms of N = (2, 2) superfields would imply the existence of a single function that encodes the local geometry: a generalized Kähler potential. Geometrically the problem of N = (2, 2) off-shell supersymmetry amounts to the proper understanding of certain natural local coordinates and the generalized Kähler potential.

In the present work we resolve the issue of what constitutes a complete description of the target space geometry of a general N = (2, 2) sigma model. We show that the full set of fields consists of chiral, twisted chiral and semichiral fields. This was a natural guess after semichiral superfields were discovered in [2], and was explicitly conjectured by Sevrin and Troost [3]; however, in [4], which contains many useful and interesting results, the erroneous conclusion that this is not the case was reached.


The bulk of the paper is devoted to the proof that certain local coordinates for generalized Kähler geometry exist. From the point of view of N = (2, 2) supersymmetry these coordinates are natural and correspond to the basic superfield ingredients of the model.

The paper is organized as follows. In Sect. 2 we review the general N = (2, 2) sigma model and describe the generalized Kähler geometry. Section 3 states the problem of off-shell supersymmetry and explains what should be done to solve it. In Sect. 4 we describe three relevant Poisson structures and their symplectic foliations, and identify coordinates adapted to these foliations. For the sake of clarity in Sect. 5 we start with a special case when $\ker[J_+, J_-] = \emptyset$. In this case we show that the correct coordinates exist and we explain the existence of the generalized Kähler potential. Next, in Sect. 6 we extend our results to the general case. Finally, in Sect. 7 we summarize our results and explain some open problems.

Warning to mathematicians. Due to our background as physicists, we like to work in local coordinates with all indices written out. However, all expressions can be written in coordinate free form, except when we discuss the specific local coordinates in Sects. 5 and 6; however, even these local coordinates are merely a convenience, and an appropriate global reformulation of our results certainly exists.

2. Generalized Kähler Geometry

In this section we review the results on general N = (2, 2) supersymmetric sigma models from the original work [1] (some of these results were found independently in [5, 6]). We define our notation and introduce some relevant concepts. We start from the general N = (1, 1) sigma model written in N = (1, 1) superfields (see Appendix A for our conventions)
\[ S \propto \int d^2\sigma\, d^2\theta\; D_+\Phi^\mu D_-\Phi^\nu \bigl(g_{\mu\nu}(\Phi) + B_{\mu\nu}(\Phi)\bigr). \tag{2.1} \]

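The action (2.1) is manifestly supersymmetric under the usual supersymmetry transformations
\[ \delta_1(\epsilon)\Phi^\mu = -i(\epsilon^+ Q_+ + \epsilon^- Q_-)\Phi^\mu, \tag{2.2} \]
which form the standard supersymmetry algebra
\[ [\delta_1(\epsilon_1), \delta_1(\epsilon_2)]\Phi^\mu = -2i\epsilon_1^+\epsilon_2^+\,\partial_{++}\Phi^\mu - 2i\epsilon_1^-\epsilon_2^-\,\partial_{=}\Phi^\mu. \tag{2.3} \]
We may look for additional supersymmetry transformations of the form [1]
\[ \delta_2(\epsilon)\Phi^\mu = \epsilon^+ D_+\Phi^\nu J^\mu_{+\nu}(\Phi) + \epsilon^- D_-\Phi^\nu J^\mu_{-\nu}(\Phi). \tag{2.4} \]
Classically, the ansatz (2.4) is unique for dimensional reasons. The action (2.1) is invariant under the transformations (2.4) provided that
\[ J^\mu_{\pm\rho}\, g_{\mu\nu} = -g_{\mu\rho}\, J^\mu_{\pm\nu} \tag{2.5} \]
and
\[ \nabla^{(\pm)}_\rho J^\mu_{\pm\nu} \equiv J^\mu_{\pm\nu,\rho} + \Gamma^{\pm\mu}_{\rho\sigma} J^\sigma_{\pm\nu} - \Gamma^{\pm\sigma}_{\rho\nu} J^\mu_{\pm\sigma} = 0, \tag{2.6} \]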


where the two affine connections
\[ \Gamma^{\pm\mu}_{\rho\nu} = \Gamma^{\mu}_{\rho\nu} \pm g^{\mu\sigma} H_{\sigma\rho\nu} \tag{2.7} \]
have torsion determined by the field strength of $B_{\mu\nu}(\Phi)$:
\[ H_{\mu\rho\sigma} = \frac{1}{2}\bigl(B_{\mu\rho,\sigma} + B_{\rho\sigma,\mu} + B_{\sigma\mu,\rho}\bigr). \tag{2.8} \]
Indeed the functional (2.1) can be rewritten in terms of an extension of H to a ball whose boundary is the surface, modulo the usual arguments that apply to the bosonic WZW-term, namely $[H] \in H^3(M, \mathbb{Z})$. Next we impose the standard on-shell N = (2, 2) supersymmetry algebra. The first supersymmetry transformations (2.2) and the second supersymmetry transformations (2.4) automatically commute:
\[ [\delta_2(\epsilon_1), \delta_1(\epsilon_2)]\Phi^\mu = 0. \tag{2.9} \]

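The commutator of two second supersymmetry transformations,
\[ \begin{aligned} [\delta_2(\epsilon_1), \delta_2(\epsilon_2)]\Phi^\mu = {} & 2i\epsilon_1^+\epsilon_2^+\,\partial_{++}\Phi^\lambda\,(J^\mu_{+\nu}J^\nu_{+\lambda}) + 2i\epsilon_1^-\epsilon_2^-\,\partial_{=}\Phi^\lambda\,(J^\mu_{-\nu}J^\nu_{-\lambda}) \\ & - \epsilon_1^+\epsilon_2^+\, D_+\Phi^\lambda D_+\Phi^\rho\, N^\mu{}_{\lambda\rho}(J_+) - \epsilon_1^-\epsilon_2^-\, D_-\Phi^\lambda D_-\Phi^\rho\, N^\mu{}_{\lambda\rho}(J_-) \\ & + (\epsilon_1^+\epsilon_2^- + \epsilon_1^-\epsilon_2^+)\,(J^\mu_{+\nu}J^\nu_{-\lambda} - J^\mu_{-\nu}J^\nu_{+\lambda}) \bigl(D_+D_-\Phi^\lambda + \Gamma^{-\lambda}_{\sigma\nu}\, D_+\Phi^\sigma D_-\Phi^\nu\bigr), \end{aligned} \tag{2.10} \]
should satisfy the same algebra as the first (2.3), i.e.,
\[ [\delta_2(\epsilon_1), \delta_2(\epsilon_2)]\Phi^\mu = -2i\epsilon_1^+\epsilon_2^+\,\partial_{++}\Phi^\mu - 2i\epsilon_1^-\epsilon_2^-\,\partial_{=}\Phi^\mu. \tag{2.11} \]
In (2.10), $N(J_\pm)$ is the Nijenhuis tensor defined by
\[ N^\rho{}_{\mu\nu}(J) = J^\rho_\lambda\, \partial_{[\nu} J^\lambda_{\mu]} + \partial_\lambda J^\rho_{[\nu}\, J^\lambda_{\mu]}. \tag{2.12} \]
The field equations that follow from the action (2.1) are
\[ D_+D_-\Phi^\mu + \Gamma^{-\mu}_{\rho\sigma}\, D_+\Phi^\rho D_-\Phi^\sigma = 0. \tag{2.13} \]
The first two lines of (2.10) are purely kinematical, i.e., are independent of the form of the action; the last line involves the field equations (2.13), and follows after imposing (2.6). The algebra (2.10) is the usual supersymmetry algebra (2.3) when $J_\pm$ obey:
\[ J^\mu_{\pm\nu} J^\rho_{\pm\mu} = -\delta^\rho{}_\nu, \tag{2.14} \]
\[ N^\rho{}_{\mu\nu}(J_\pm) = 0; \tag{2.15} \]
the last term in (2.10) must also vanish; this is automatic on-shell, i.e., when the field equations (2.13) are satisfied. Thus the on-shell supersymmetry algebra requires that $J_\pm$ are integrable complex structures that preserve the metric; we may introduce the forms $\omega_\pm = g J_\pm$, which are not closed, but satisfy
\[ H_{\mu\nu\rho} = \mp J^\lambda_{\pm\mu} J^\sigma_{\pm\nu} J^\gamma_{\pm\rho}\, (d\omega_\pm)_{\lambda\sigma\gamma}, \tag{2.16} \]
as follows from (2.5), (2.6), (2.14) and (2.15).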


This is the full description of the most general N = (2, 2) sigma model [1]: The target manifold $(M, g, J_\pm, H)$ is a bihermitian complex manifold (i.e., there are two complex structures and a metric that is Hermitian with respect to both) and the two complex structures must be covariantly constant with respect to connections that differ by the sign of the torsion; this torsion is expressed in terms of a closed 3-form that obeys (2.16). This bihermitian geometry was first described in [1]. Subsequently, a different geometric interpretation was given in [7], and more recently, following ideas of Hitchin [8], Gualtieri [9] gave an entirely new description of this geometry in terms of generalized complex structures. This geometry is now known as generalized Kähler geometry.

3. N = (2,2) Off-Shell Supersymmetry

In the previous section, the field equations (2.13) are needed to close the supersymmetry algebra. To write the model in a manifestly N = (2, 2) covariant form, the algebra must close off-shell. As can be seen from (2.10), the algebra does close off-shell when the two complex structures commute [1]: $[J_+, J_-] = 0$. In this case, both complex structures and the product structure $\Pi = J_+J_-$ are integrable and simultaneously diagonalizable. The manifest N = (2, 2) formulation is given in terms of chiral ($\phi$) and twisted chiral ($\chi$) scalar superfields:
\[ \bar{\mathbb{D}}_\pm\phi = \mathbb{D}_\pm\bar\phi = 0, \qquad \bar{\mathbb{D}}_+\chi = \mathbb{D}_-\chi = \mathbb{D}_+\bar\chi = \bar{\mathbb{D}}_-\bar\chi = 0, \tag{3.17} \]
where $\mathbb{D}$ is the N = (2, 2) covariant derivative. The N = (2, 2) Lagrangian is a general real function $K(\phi, \bar\phi, \chi, \bar\chi)$, defined modulo (the equivalent of) a Kähler gauge transformation: $f(\phi, \chi) + \bar f(\bar\phi, \bar\chi) + g(\phi, \bar\chi) + \bar g(\bar\phi, \chi)$. This K serves as a potential both for the metric and for the antisymmetric B-field.

When $[J_+, J_-] \neq 0$, additional (auxiliary) spinorial N = (1, 1) fields are needed to close the algebra [10, 11]. The semichiral N = (2, 2) scalar superfields introduced in [2] give rise to such auxiliary fields when they are reduced to N = (1, 1) superspace. A complex left semichiral superfield $\mathbb{X}_L$ obeys
\[ \bar{\mathbb{D}}_+\mathbb{X}_L = \mathbb{D}_+\bar{\mathbb{X}}_L = 0, \tag{3.18} \]
and a right semichiral superfield $\mathbb{X}_R$ obeys
\[ \bar{\mathbb{D}}_-\mathbb{X}_R = \mathbb{D}_-\bar{\mathbb{X}}_R = 0. \tag{3.19} \]
For these multiplets, the N = (2, 2) nonlinear sigma model Lagrangian¹ is the real function $K(\mathbb{X}_L, \bar{\mathbb{X}}_L, \mathbb{X}_R, \bar{\mathbb{X}}_R)$, defined modulo $f(\mathbb{X}_L) + \bar f(\bar{\mathbb{X}}_L) + g(\mathbb{X}_R) + \bar g(\bar{\mathbb{X}}_R)$. Again, the function K is a potential for the metric and the antisymmetric B-field [2]. The target space has generalized Kähler geometry with $[J_+, J_-] \neq 0$ [12]. However, before our work, it was not known if all generalized Kähler geometries with $[J_+, J_-] \neq 0$ admit a description in terms of semichiral superfields. In [13], it is shown that the kernel of $[J_+, J_-]$ is parametrized completely by chiral and twisted chiral fields. This does not answer the question of whether semichiral multiplets similarly give a complete description of the cokernel. The issue has been addressed, e.g., in [14, 3, 15].

¹ In [2], for simplicity, no chiral or twisted chiral multiplets are considered, and hence $[J_+, J_-]$ is invertible.


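The general sigma model Lagrangian containing chiral, twisted chiral, and semichiral fields is a real function
\[ K(\phi, \bar\phi, \chi, \bar\chi, \mathbb{X}_L, \bar{\mathbb{X}}_L, \mathbb{X}_R, \bar{\mathbb{X}}_R) \tag{3.20} \]
defined modulo $f(\phi, \chi, \mathbb{X}_L) + g(\phi, \bar\chi, \mathbb{X}_R) + \bar f(\bar\phi, \bar\chi, \bar{\mathbb{X}}_L) + \bar g(\bar\phi, \chi, \bar{\mathbb{X}}_R)$. When there are several multiplets of each kind², the fields carry indices
\[ \begin{aligned} &\phi^{\alpha},\ \bar\phi^{\bar\alpha}, \quad \alpha = 1 \ldots d_c, \qquad && \chi^{\alpha'},\ \bar\chi^{\bar\alpha'}, \quad \alpha' = 1 \ldots d_t, \\ &\mathbb{X}_L^{a},\ \bar{\mathbb{X}}_L^{\bar a}, \quad a = 1 \ldots d_s, \qquad && \mathbb{X}_R^{a'},\ \bar{\mathbb{X}}_R^{\bar a'}, \quad a' = 1 \ldots d_s. \end{aligned} \tag{3.21} \]
We will also use the collective notation $A := \{\alpha, \bar\alpha\}$, $A' := \{\alpha', \bar\alpha'\}$, $\mathbf{A} := \{a, \bar a\}$ and $\mathbf{A}' := \{a', \bar a'\}$. To reduce the N = (2, 2) action to its N = (1, 1) form, we introduce the N = (1, 1) covariant derivatives D and extra supercharges Q:
\[ D_\pm = \mathbb{D}_\pm + \bar{\mathbb{D}}_\pm, \qquad Q_\pm = i(\mathbb{D}_\pm - \bar{\mathbb{D}}_\pm). \tag{3.22} \]
In terms of these, the (anti)chiral, twisted (anti)chiral and semi (anti)chiral superfields satisfy
\[ \begin{aligned} Q_\pm\phi &= J_c D_\pm\phi, & Q_\pm\chi &= \pm J_t D_\pm\chi, \\ Q_+\mathbb{X}_L &= J_s D_+\mathbb{X}_L, & Q_-\mathbb{X}_R &= J_s D_-\mathbb{X}_R, \end{aligned} \tag{3.23} \]
where the collective notation is used in the matrices, and where $J_c$, $J_t$, and $J_s$ are $2d_c$, $2d_t$, and $2d_s$ dimensional canonical complex structures of the form
\[ J = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}. \tag{3.24} \]
For the pair $(\phi, \chi)$ we use the same letters to denote the N = (1, 1) superfields, i.e., the lowest components of the N = (2, 2) superfields $(\phi, \chi)$. Each of the semi (anti)chiral fields gives rise to two N = (1, 1) fields:
\[ \begin{aligned} X_L &\equiv \mathbb{X}_L|, & \Psi_{L-} &\equiv Q_-\mathbb{X}_L|, \\ X_R &\equiv \mathbb{X}_R|, & \Psi_{R+} &\equiv Q_+\mathbb{X}_R|, \end{aligned} \tag{3.25} \]
where a vertical bar means that we take the $\theta^2 \propto \theta - \bar\theta$ independent component. The conditions (3.23) then also imply
\[ \begin{aligned} Q_+\Psi_{L-} &= J_s D_+\Psi_{L-}, & Q_-\Psi_{L-} &= i\partial_{=} X_L, \\ Q_-\Psi_{R+} &= J_s D_-\Psi_{R+}, & Q_+\Psi_{R+} &= i\partial_{++} X_R. \end{aligned} \tag{3.26} \]
Using the relations (3.22)-(3.26) we reduce the N = (2, 2) action to its N = (1, 1) form according to:
\[ \int d^2\xi\, d^2\theta\, d^2\bar\theta\; K(\phi^{A}, \chi^{A'}, \mathbb{X}_L^{\mathbf{A}}, \mathbb{X}_R^{\mathbf{A}'})\big| = \int d^2\xi\; \mathbb{D}^2\bar{\mathbb{D}}^2 K\big| = -\frac{i}{4}\int d^2\xi\; D^2 Q_+ Q_- K\big|. \tag{3.27} \]

² To be able to integrate out the auxiliary N = (1, 1) spinor superfields, we require an equal number of left and right semichiral superfields $\mathbb{X}_L$ and $\mathbb{X}_R$.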


Provided that the matrix
\[ K_{LR} \equiv \begin{pmatrix} K_{ab'} & K_{a\bar b'} \\ K_{\bar a b'} & K_{\bar a\bar b'} \end{pmatrix} \tag{3.28} \]
is invertible, the auxiliary spinors $\Psi_{L-}$, $\Psi_{R+}$ may be integrated out leaving us with an N = (1, 1) second order sigma model action of the type originally discussed in [1]. In (3.28) we use the notation $K_{ab'} \equiv \partial_a\partial_{b'}K$, etc. From this the metric and antisymmetric B-field may be read off in terms of derivatives of K, and from the form of the second supersymmetry the complex structures $J_\pm$ are determined. In a basis where the coordinates are arranged in a column as
\[ \begin{pmatrix} X_L^{\mathbf{A}} \\ X_R^{\mathbf{A}'} \\ \phi^{A} \\ \chi^{A'} \end{pmatrix}, \tag{3.29} \]
and introducing the notation (suppressing the hopefully obvious index structure)
\[ K_{LR}^{-1} = (K_{RL})^{-1}, \qquad C = JK - KJ = \begin{pmatrix} 0 & 2iK \\ -2iK & 0 \end{pmatrix}, \qquad A = JK + KJ = \begin{pmatrix} 2iK & 0 \\ 0 & -2iK \end{pmatrix}, \tag{3.30} \]
the complex structures read [4]
\[ J_+ = \begin{pmatrix} J_s & 0 & 0 & 0 \\ K_{RL}^{-1}C_{LL} & K_{RL}^{-1}J_sK_{LR} & K_{RL}^{-1}C_{Lc} & K_{RL}^{-1}C_{Lt} \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & J_t \end{pmatrix} \tag{3.31} \]
and
\[ J_- = \begin{pmatrix} K_{LR}^{-1}J_sK_{RL} & K_{LR}^{-1}C_{RR} & K_{LR}^{-1}C_{Rc} & K_{LR}^{-1}A_{Rt} \\ 0 & J_s & 0 & 0 \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & -J_t \end{pmatrix}, \tag{3.32} \]
where, e.g., $K_{Rc}$ is the matrix of second derivatives along R- and c-directions, etc. In Sects. 5 and 6, where we rederive these expressions from geometrical considerations, we explain the notation in greater detail. Finally, we compute the N = (1, 1) Lagrangian; the sum $E = \frac{1}{2}(g + B)$ of the metric g and B-field takes on the explicit form:
\[ \begin{aligned} E_{LL} &= C_{LL}K_{LR}^{-1}J_sK_{RL}, \\ E_{LR} &= J_sK_{LR}J_s + C_{LL}K_{LR}^{-1}C_{RR}, \\ E_{Lc} &= K_{Lc} + J_sK_{Lc}J_c + C_{LL}K_{LR}^{-1}C_{Rc}, \\ E_{Lt} &= -K_{Lt} - J_sK_{Lt}J_t + C_{LL}K_{LR}^{-1}A_{Rt}, \\ E_{RL} &= -K_{RL}J_sK_{LR}^{-1}J_sK_{RL}, \\ E_{RR} &= -K_{RL}J_sK_{LR}^{-1}C_{RR}, \\ E_{Rc} &= K_{Rc} - K_{RL}J_sK_{LR}^{-1}C_{Rc}, \\ E_{Rt} &= -K_{Rt} - K_{RL}J_sK_{LR}^{-1}A_{Rt}, \\ E_{cL} &= C_{cL}K_{LR}^{-1}J_sK_{RL}, \\ E_{cR} &= J_cK_{cR}J_s + C_{cL}K_{LR}^{-1}C_{RR}, \\ E_{cc} &= K_{cc} + J_cK_{cc}J_c + C_{cL}K_{LR}^{-1}C_{Rc}, \\ E_{ct} &= -K_{ct} - J_cK_{ct}J_t + C_{cL}K_{LR}^{-1}A_{Rt}, \\ E_{tL} &= C_{tL}K_{LR}^{-1}J_sK_{RL}, \\ E_{tR} &= J_tK_{tR}J_s + C_{tL}K_{LR}^{-1}C_{RR}, \\ E_{tc} &= K_{tc} + J_tK_{tc}J_c + C_{tL}K_{LR}^{-1}C_{Rc}, \\ E_{tt} &= -K_{tt} - J_tK_{tt}J_t + C_{tL}K_{LR}^{-1}A_{Rt}. \end{aligned} \tag{3.33} \]

It is interesting that there are no corrections from chiral and twisted chiral fields in the semichiral sector (where the results agree with [2, 12]), whereas in the chiral and twisted chiral sector the semichiral fields contribute substantially. Thus locally all objects $(J_\pm, g, B)$ are given in terms of second derivatives of a single real function K. By construction, the present geometry is generalized Kähler geometry and therefore satisfies all the relations from the previous section. In the rest of the paper we show that (locally) any generalized Kähler manifold has such a description.

4. Poisson Structures

In this section we describe three Poisson structures that arise in generalized Kähler geometry. We study these Poisson structures as we will use local coordinates adapted to their foliations. Since Poisson geometry is a rather novel subject for some physicists, we collect some basic facts in Appendix C. We start with the two real Poisson structures
\[ \pi_\pm \equiv (J_+ \pm J_-)\,g^{-1} = -g^{-1}(J_+ \pm J_-)^t, \tag{4.34} \]
which were introduced in [7] and later rederived by Gualtieri [9]. We can choose local coordinates in a neighborhood of a regular point $x_0$ of $\pi_-$ such that³
\[ \pi_-^{A\mu} = 0, \tag{4.35} \]
where $A$ label the coordinates along the kernel of $\pi_-$; using (4.34), in these coordinates the complex structures obey
\[ J^{A}_{+\nu} = J^{A}_{-\nu}. \tag{4.36} \]
Repeating the same argument for $\pi_+$ we get
\[ J^{A'}_{+\nu} = -J^{A'}_{-\nu}, \tag{4.37} \]

³ A regular point $x_0$ of a Poisson structure $\pi$ is a point where the rank of $\pi$ does not vary in a neighborhood of $x_0$; see Appendix C.


where $A'$ label the coordinates along the kernel of $\pi_+$. Moreover, as the combinations $(\pi_+ \pm \pi_-) \propto J_\pm$ are nondegenerate, the Poisson brackets defined by $\pi_+$ and $\pi_-$ cannot have common Casimir functions⁴ which parametrize the kernels of $\pi_\pm$. This means that the directions $A$ and $A'$ do not intersect and we can choose coordinates where both the relations (4.36) and (4.37) hold [7]. We denote the remaining directions by $\mathbf{A}$ and $\mathbf{A}'$ (for the moment, we do not distinguish $\mathbf{A}$ and $\mathbf{A}'$). Thus we have shown that there exist coordinates, labeled by $\mu = (\mathbf{A}, \mathbf{A}', A, A')$, where
\[ J_+ = \begin{pmatrix} * & * & * & * \\ * & * & * & * \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & J_t \end{pmatrix}, \qquad J_- = \begin{pmatrix} * & * & * & * \\ * & * & * & * \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & -J_t \end{pmatrix}, \tag{4.38} \]
and where $J_c$, $J_t$ are canonical complex structures defined as in (3.24). The existence of these coordinates was originally shown in [13]. Using Poisson geometry this result is rederived in [7]. We can thus choose local coordinates adapted to the following decomposition:
\[ \ker(J_+ - J_-) \oplus \ker(J_+ + J_-) \oplus \operatorname{coker}[J_+, J_-], \tag{4.39} \]
where we use the property
\[ [J_+, J_-] = (J_+ - J_-)(J_+ + J_-) = -(J_+ + J_-)(J_+ - J_-). \tag{4.40} \]
Another important Poisson structure
\[ \sigma = [J_+, J_-]\,g^{-1} \tag{4.41} \]
was introduced in [16]. It is related to the real Poisson structures (4.34):
\[ \sigma = \pm(J_+ \mp J_-)\,\pi_\pm = \mp(J_+ \pm J_-)\,\pi_\mp. \tag{4.42} \]
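The algebraic identity (4.40) uses only $J_\pm^2 = -1$, so it can be checked on any pair of complex structures. The sketch below is an illustration and not part of the original argument: it assumes a flat four-dimensional example in which $J_+$ and $J_-$ are left multiplications by unit imaginary quaternions, with the Euclidean metric $g = 1$ (so that $\pi_\pm = J_+ \pm J_-$ and $\sigma = [J_+, J_-]$, and (4.42) reduces to the same two products). The helper names and the coefficients 0.6 and 0.8 are arbitrary choices made here.

```python
# Pure-Python 4x4 matrix helpers (no external libraries needed).
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lincomb(a, A, b, B):
    """Return a*A + b*B entrywise."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_abs_diff(A, B):
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))

# Left multiplication by the unit quaternions i and j on R^4 (basis 1, i, j, k).
L_I = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]]
L_J = [[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, -1, 0, 0]]
IDENTITY = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
MINUS_ONE = lincomb(-1, IDENTITY, 0, IDENTITY)

# Illustrative (assumed) choice: J_plus = i, J_minus = 0.6 i + 0.8 j.
J_plus = L_I
J_minus = lincomb(0.6, L_I, 0.8, L_J)

commutator = lincomb(1, matmul(J_plus, J_minus), -1, matmul(J_minus, J_plus))
diff_mat = lincomb(1, J_plus, -1, J_minus)   # J_+ - J_-
sum_mat = lincomb(1, J_plus, 1, J_minus)     # J_+ + J_-
product_a = matmul(diff_mat, sum_mat)                            # (J_+ - J_-)(J_+ + J_-)
product_b = lincomb(-1, matmul(sum_mat, diff_mat), 0, IDENTITY)  # -(J_+ + J_-)(J_+ - J_-)

# Both structures square to -1, and both products reproduce the commutator.
square_error = max(max_abs_diff(matmul(J, J), MINUS_ONE) for J in (J_plus, J_minus))
identity_error = max(max_abs_diff(commutator, product_a),
                     max_abs_diff(commutator, product_b))
print(square_error, identity_error)
```

In this example the commutator is nonzero, so the check is not vacuous; both printed errors should be at the level of floating-point rounding.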

The identity (4.40) implies a relation between the kernels of the three structures
\[ \ker\sigma = \ker\pi_+ \oplus \ker\pi_-. \tag{4.43} \]
The symplectic leaf for $\sigma$ is $\operatorname{coker}[J_+, J_-]$. The Poisson structure $\sigma$ satisfies $J_\pm\,\sigma\,J_\pm^t = -\sigma$; this implies that in complex coordinates with respect to either $J_\pm$,
\[ \sigma = \sigma^{(2,0)} + \bar\sigma^{(0,2)}, \tag{4.44} \]
which implies that the real dimension of the symplectic leaves for $\sigma$ is a multiple of 4 (this was first proven in [3]). Indeed, $\sigma$ can be interpreted as the (2, 0) + (0, 2) projection of e.g., $\pi_+$, with respect to either $J = J_\pm$:
\[ (1 \pm iJ)\,\sigma\,(1 \pm iJ)^t = \mp 2i\,(1 \pm iJ)\,\pi_+\,(1 \pm iJ)^t. \tag{4.45} \]
It turns out that $\sigma^{(2,0)}$ is actually a holomorphic Poisson structure [16]:
\[ \bar\partial\sigma^{(2,0)} = 0. \tag{4.46} \]
As discussed above (4.39), we have established that along the kernel of $\sigma$, complex coordinates can be simultaneously chosen for both $J_+$ and $J_-$. Using the properties of the cokernel of $\sigma$, in particular (4.44, 4.46), in the next two sections we find natural coordinates along the symplectic leaf of $\sigma$ as well.

⁴ Casimir functions give the coordinates along which a Poisson structure is degenerate; see Appendix C.


5. Structure of coker[J+, J−]

To simplify the argument, we first consider the special case when $\ker[J_+, J_-] = \emptyset$ on M and $\sigma$ is thus invertible; this implies $d_c = d_t = 0$, and the complex dimension of M is $2d_s$. Since $\sigma$ is a Poisson structure, the two-form⁵
\[ \Omega = \sigma^{-1} \tag{5.47} \]
is closed, $d\Omega = 0$; it also satisfies $J_\pm^t\,\Omega\,J_\pm = -\Omega$. Choosing complex coordinates with respect to $J_+$,
\[ J_+ = \begin{pmatrix} i & 0 & 0 & 0 \\ 0 & -i & 0 & 0 \\ 0 & 0 & i & 0 \\ 0 & 0 & 0 & -i \end{pmatrix} \equiv \begin{pmatrix} J_s & 0 \\ 0 & J_s \end{pmatrix}, \tag{5.48} \]
we decompose the symplectic form $\Omega$ into its (2, 0) and (0, 2) parts [4],
\[ \Omega = \Omega_+^{(2,0)} + \bar\Omega_+^{(0,2)}. \tag{5.49} \]
Then $d\Omega = 0$ implies
\[ \partial\Omega_+^{(2,0)} = 0, \qquad \bar\partial\Omega_+^{(2,0)} = 0, \tag{5.50} \]
and its complex conjugate expressions with $\partial$ ($\bar\partial$) being a holomorphic (antiholomorphic) differential. Thus $\Omega_+^{(2,0)}$ is a holomorphic symplectic structure and according to Darboux’s theorem we can choose coordinates $\{q^a, \bar q^{\bar a}, p_a, \bar p_{\bar a}\}$, $a = 1 \ldots d_s$, such that
\[ \Omega_+^{(2,0)} = dq^a \wedge dp_a, \qquad \bar\Omega_+^{(0,2)} = d\bar q^{\bar a} \wedge d\bar p_{\bar a}. \tag{5.51} \]
These coordinates are compatible with (5.48); the choice of which coordinates we call q and which we call p is called a polarization. Alternatively, we can choose complex coordinates with respect to $J_-$; then we have $\Omega = \Omega_-^{(2,0)} + \bar\Omega_-^{(0,2)}$, and $\Omega_-^{(2,0)}$ is again a holomorphic symplectic structure. Thus we can introduce the coordinates $\{Q^{a'}, \bar Q^{\bar a'}, P^{a'}, \bar P^{\bar a'}\}$, $a' = 1 \ldots d_s$, such that
\[ \Omega_-^{(2,0)} = dQ^{a'} \wedge dP^{a'}, \qquad \bar\Omega_-^{(0,2)} = d\bar Q^{\bar a'} \wedge d\bar P^{\bar a'}. \tag{5.52} \]
In these coordinates $J_-$ has the form
\[ J_- = \begin{pmatrix} i & 0 & 0 & 0 \\ 0 & -i & 0 & 0 \\ 0 & 0 & i & 0 \\ 0 & 0 & 0 & -i \end{pmatrix} \equiv \begin{pmatrix} J_s & 0 \\ 0 & J_s \end{pmatrix}. \tag{5.53} \]
The coordinate transformation $\{q, p\} \to \{Q, P\}$ preserves $\Omega$, and hence is a canonical transformation (symplectomorphism). A canonical transformation can always be described by a generating function K that depends on a $d_s$-dimensional subset of the “old” coordinates $\{q, p\}$ and a $d_s$-dimensional subset of the “new” coordinates $\{Q, P\}$ (see, e.g., [17]). For simplicity, we choose our polarization such that the generating function K depends on the “old” q and the “new” P coordinates; it is a theorem that such a polarization always exists [17]. Thus in a neighborhood, the canonical transformation is given by the generating function K(q, P),
\[ p = \frac{\partial K}{\partial q}, \qquad Q = \frac{\partial K}{\partial P}. \tag{5.54} \]

⁵ This two-form was introduced in [4]; however the authors erroneously concluded that there exist obstructions to the existence of the coordinates that make $\Omega$ constant.
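As a short worked check (added here for illustration, with all indices and index sums suppressed), one can see why a transformation defined through (5.54) preserves the symplectic form. The sketch assumes that (5.54) is read together with its complex conjugate, i.e. that $K$ is a real function of $q, \bar q, P, \bar P$ with $\bar p = \partial K/\partial\bar q$ and $\bar Q = \partial K/\partial\bar P$:

```latex
\begin{align*}
dK &= p\,dq + \bar p\,d\bar q + Q\,dP + \bar Q\,d\bar P
   && \text{(chain rule with (5.54) and its conjugate)} \\
0 = d(dK) &= dp \wedge dq + d\bar p \wedge d\bar q + dQ \wedge dP + d\bar Q \wedge d\bar P \\
\Longrightarrow\quad
dq \wedge dp + d\bar q \wedge d\bar p &= dQ \wedge dP + d\bar Q \wedge d\bar P .
\end{align*}
```

The left-hand side of the last line is the form (5.49) with (5.51), and the right-hand side is the same two-form written in the Darboux coordinates adapted to $J_-$, as in (5.52).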

We now calculate $J_+$, $J_-$ and $\Omega$ in the “mixed” coordinates $\{q, P\}$. Consider $J_+$. In $\{q, P\}$ coordinates $J_+$ is given by
\[ J_+ = \left(\frac{\partial(q, p)}{\partial(q, P)}\right)^{-1} \begin{pmatrix} J_s & 0 \\ 0 & J_s \end{pmatrix} \left(\frac{\partial(q, p)}{\partial(q, P)}\right). \tag{5.55} \]
The transformation matrix is given as
\[ \frac{\partial(q, p)}{\partial(q, P)} = \begin{pmatrix} 1 & 0 \\ \frac{\partial p}{\partial q} & \frac{\partial p}{\partial P} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \frac{\partial^2 K}{\partial q\,\partial q} & \frac{\partial^2 K}{\partial P\,\partial q} \end{pmatrix} \equiv \begin{pmatrix} 1 & 0 \\ K_{LL} & K_{LR} \end{pmatrix}, \tag{5.56} \]
where in complex coordinates we have
\[ K_{LL} = \begin{pmatrix} K_{ab} & K_{a\bar b} \\ K_{\bar a b} & K_{\bar a\bar b} \end{pmatrix}, \qquad K_{LR} = \begin{pmatrix} K_{ab'} & K_{a\bar b'} \\ K_{\bar a b'} & K_{\bar a\bar b'} \end{pmatrix}, \tag{5.57} \]
and we have anticipated our identification of the generating function K(q, P) with the action $K(\mathbb{X}_L, \mathbb{X}_R)$ by introducing the labels R, L. We find
\[ J_+ = \begin{pmatrix} 1 & 0 \\ -K_{RL}^{-1}K_{LL} & K_{RL}^{-1} \end{pmatrix} \begin{pmatrix} J_s & 0 \\ 0 & J_s \end{pmatrix} \begin{pmatrix} 1 & 0 \\ K_{LL} & K_{LR} \end{pmatrix} = \begin{pmatrix} J_s & 0 \\ K_{RL}^{-1}C_{LL} & K_{RL}^{-1}J_sK_{LR} \end{pmatrix}, \tag{5.58} \]
where $K_{LR}$ and $C_{LL}$ are defined in (3.30) in terms of second derivatives of the generating function K. Thus in the coordinates $\{q, P\}$, $J_+$ is given by (5.58). Identifying the generating function K(q, P) with the action $K(\mathbb{X}_L, \mathbb{X}_R)$, this result coincides with the one we get from the semichiral sigma model [2, 4] (cf. (3.31) with no chiral or twisted chiral fields). Next we calculate $J_-$ in $\{q, P\}$ coordinates
\[ J_- = \left(\frac{\partial(Q, P)}{\partial(q, P)}\right)^{-1} \begin{pmatrix} J_s & 0 \\ 0 & J_s \end{pmatrix} \left(\frac{\partial(Q, P)}{\partial(q, P)}\right), \tag{5.59} \]
where
\[ \frac{\partial(Q, P)}{\partial(q, P)} = \begin{pmatrix} \frac{\partial Q}{\partial q} & \frac{\partial Q}{\partial P} \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \frac{\partial^2 K}{\partial q\,\partial P} & \frac{\partial^2 K}{\partial P\,\partial P} \\ 0 & 1 \end{pmatrix} \equiv \begin{pmatrix} K_{RL} & K_{RR} \\ 0 & 1 \end{pmatrix}. \tag{5.60} \]
In complex coordinates $K_{RL} = (K_{LR})^t$ is defined as in (5.57) and $K_{RR}$ is
\[ K_{RR} = \begin{pmatrix} K_{a'b'} & K_{a'\bar b'} \\ K_{\bar a'b'} & K_{\bar a'\bar b'} \end{pmatrix}. \tag{5.61} \]
Thus we can rewrite (5.59) as
\[ J_- = \begin{pmatrix} K_{LR}^{-1} & -K_{LR}^{-1}K_{RR} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} J_s & 0 \\ 0 & J_s \end{pmatrix} \begin{pmatrix} K_{RL} & K_{RR} \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} K_{LR}^{-1}J_sK_{RL} & K_{LR}^{-1}C_{RR} \\ 0 & J_s \end{pmatrix}, \tag{5.62} \]
where $C_{RR}$ was defined in (3.30). Once more, we have reproduced the semichiral expression (cf. (3.32)). Finally, in coordinates (q, P), $\Omega$ is given by
\[ \Omega = \begin{pmatrix} 0 & K_{LR} \\ -K_{RL} & 0 \end{pmatrix}. \tag{5.63} \]
In these coordinates the metric g is given by [4]
\[ g = \Omega\,[J_+, J_-], \tag{5.64} \]
and this is the same as from semichiral considerations. Thus we have shown that the metric can be expressed in terms of second derivatives of a single potential K. However, unlike the case of standard Kähler geometry, the metric is not linear in the derivatives of K. It is natural to refer to K as a generalized Kähler potential. This potential has the interpretation simultaneously as a superspace Lagrangian and as the generating function of a canonical transformation⁶ between the complex coordinates adapted to $J_+$ and the complex coordinates adapted to $J_-$. Furthermore, recalling that we have assumed $\ker[J_+, J_-] = \emptyset$ throughout this section, the form $(\Omega^{(2,0)})^{d_s}$ is nondegenerate and defines a holomorphic volume form. Thus this is a generalized Calabi-Yau manifold [8]. Finally, one may wonder if there actually exist examples where $\ker[J_+, J_-] = \emptyset$. The work of [2] provides a local example in four dimensions; in arbitrary dimensions, one can consider hyperkähler manifolds:

Theorem 1. A generalized Kähler manifold with the anticommutator of $J_+$ and $J_-$ constant, i.e., $\{J_+, J_-\} = c\,\mathbb{I}$, is a hyperkähler manifold whenever $|c| < 2$.

Proof. Using (2.6), the proof is straightforward in local coordinates. Alternatively one can observe that $B = \Omega\,\{J_+, J_-\}$ [4], and hence the torsion, which is proportional to dB, vanishes. The explicit complex structures of the hyperkähler manifold can be chosen as:
\[ I = J_+, \qquad J = \frac{1}{\sqrt{1 - \frac{c^2}{4}}}\left(J_- + \frac{c}{2}\,J_+\right), \qquad K = IJ. \tag{5.65} \]
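As a brief check (added for illustration), the structures in (5.65) do satisfy the quaternion algebra. The sketch below uses only $J_\pm^2 = -1$ and $\{J_+, J_-\} = c\,\mathbb{I}$, and assumes the normalization $J = (1 - c^2/4)^{-1/2}\,(J_- + \tfrac{c}{2}J_+)$; the condition $|c| < 2$ is what keeps the square root real.

```latex
\begin{align*}
J^2 &= \frac{1}{1-\frac{c^2}{4}}
       \Bigl(J_-^2 + \frac{c}{2}\,\{J_+,J_-\} + \frac{c^2}{4}\,J_+^2\Bigr)
     = \frac{1}{1-\frac{c^2}{4}}\Bigl(-1 + \frac{c^2}{2} - \frac{c^2}{4}\Bigr)\mathbb{I}
     = -\mathbb{I}, \\
\{I, J\} &= \frac{1}{\sqrt{1-\frac{c^2}{4}}}
       \Bigl(\{J_+,J_-\} + c\,J_+^2\Bigr)
     = \frac{c - c}{\sqrt{1-\frac{c^2}{4}}}\,\mathbb{I} = 0, \\
K^2 &= IJIJ = -I^2J^2 = -\mathbb{I}.
\end{align*}
```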

The construction we have presented can be applied to the hyperkähler case with a new generalized Kähler potential. Indeed from the condition $\{J_+, J_-\} = c\,\mathbb{I}$, we get a partial differential equation for K in the hyperkähler case. In [3] it has been pointed out that in four dimensions, for c = 0, this is the Monge-Ampère equation.

⁶ This situation was found previously for N = (4, 4) hyperkähler sigma models described in projective superspace [18].
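The block conjugations (5.58) and (5.62) can also be checked numerically. The following sketch is an illustration under stated assumptions, not the authors' computation: it takes $d_s = 1$, so every block is a 2×2 complex matrix, and fills $K_{LL}$, $K_{LR}$, $K_{RR}$ with arbitrary sample numbers chosen to respect the reality of K. Following the convention of (3.30), the symbol $K_{RL}^{-1}$ in the text is the inverse of the matrix $K_{LR}$, and $K_{LR}^{-1}$ is the inverse of $K_{RL} = (K_{LR})^t$; all helper names are hypothetical.

```python
# Pure-Python helpers for small complex matrices stored as lists of rows.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def neg(A):
    return [[-a for a in row] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def block(A, B, C, D):
    """Assemble the 4x4 matrix [[A, B], [C, D]] from 2x2 blocks."""
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

ONE = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]
JS = [[1j, 0], [0, -1j]]            # canonical complex structure (3.24)
J4 = block(JS, ZERO, ZERO, JS)

# Sample second derivatives of K (assumed values, for illustration only).
K_LL = [[0.3 + 0.1j, 0.7], [0.7, 0.3 - 0.1j]]
K_LR = [[1.0 + 0.2j, 0.4 + 0.5j], [0.4 - 0.5j, 1.0 - 0.2j]]
K_RR = [[0.2 + 0.3j, 0.9], [0.9, 0.2 - 0.3j]]
K_RL = transpose(K_LR)

# (5.58): conjugate by M = d(q,p)/d(q,P) and compare with the closed form.
X = inv2(K_LR)                                     # written K_RL^{-1} in the text
M = block(ONE, ZERO, K_LL, K_LR)
M_inv = block(ONE, ZERO, neg(matmul(X, K_LL)), X)
C_LL = sub(matmul(JS, K_LL), matmul(K_LL, JS))
J_plus = matmul(matmul(M_inv, J4), M)
J_plus_closed = block(JS, ZERO, matmul(X, C_LL), matmul(matmul(X, JS), K_LR))

# (5.62): conjugate by N = d(Q,P)/d(q,P) and compare with the closed form.
Y = inv2(K_RL)                                     # written K_LR^{-1} in the text
N = block(K_RL, K_RR, ZERO, ONE)
N_inv = block(Y, neg(matmul(Y, K_RR)), ZERO, ONE)
C_RR = sub(matmul(JS, K_RR), matmul(K_RR, JS))
J_minus = matmul(matmul(N_inv, J4), N)
J_minus_closed = block(matmul(matmul(Y, JS), K_RL), matmul(Y, C_RR), ZERO, JS)

inverse_error = max(max_abs_diff(matmul(M_inv, M), block(ONE, ZERO, ZERO, ONE)),
                    max_abs_diff(matmul(N_inv, N), block(ONE, ZERO, ZERO, ONE)))
err_plus = max_abs_diff(J_plus, J_plus_closed)
err_minus = max_abs_diff(J_minus, J_minus_closed)
print(inverse_error, err_plus, err_minus)
```

The first printed number confirms that the block inverses used in (5.58) and (5.62) are correct; the other two compare the conjugated matrices with the closed forms and should vanish up to rounding.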


6. General Case

We now turn to the general case with both $\ker([J_+, J_-])$ and $\operatorname{coker}([J_+, J_-])$ nontrivial. Essentially, we have to combine the arguments presented in the two previous sections. We assume that in a neighborhood of $x_0$, the ranks of $\pi_\pm$ are constant, and as a result, the rank of $\sigma$ is constant. We work in coordinates adapted to the symplectic foliation of $\sigma$. Combining the notations from previous sections, we can choose coordinates $\{q, p, z, z'\}$ in which $J_+$ has the canonical form
\[ J_+ = \begin{pmatrix} J_s & 0 & 0 & 0 \\ 0 & J_s & 0 & 0 \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & J_t \end{pmatrix}, \tag{6.66} \]
where we use the notation (3.24). The coordinates $z$ and $z'$ parametrize the kernels of $\pi_\mp$, respectively. Thus $\{z, z'\}$ parametrize the kernel of $\sigma$ and $\{q, p\}$ are the Darboux coordinates for a symplectic leaf. On a leaf the symplectic form is given by (5.51). Alternatively we can choose the coordinates $\{Q, P, z, z'\}$ in which $J_-$ has a canonical form⁷
\[ J_- = \begin{pmatrix} J_s & 0 & 0 & 0 \\ 0 & J_s & 0 & 0 \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & -J_t \end{pmatrix}. \tag{6.67} \]
Again (Q, P) are the Darboux coordinates on a leaf with the symplectic form given by (5.52). If we fix a leaf (i.e., put $(z, z')$ to a fixed value) then we can apply the discussion from Sect. 5. Thus we can choose new coordinates $\{q, P\}$ along a leaf in a neighborhood of $(q_0, p_0)$ (see the discussion of the existence of these coordinates in Sect. 5). There exists a generating function K such that the relations (5.54) are satisfied. This argument can be applied to a single leaf. If we change to another leaf then we get another generating function. Thus in a neighborhood of $x_0$ we have a family⁸ of generating functions $K(q, P, z, z')$ such that
\[ p = \frac{\partial K}{\partial q}, \qquad Q = \frac{\partial K}{\partial P} \tag{6.68} \]
is satisfied. With this definition, $K(q, P, z, z')$ is defined up to the addition of an arbitrary function $f(z, z')$. Now we can calculate $J_\pm$ in the coordinates $\{q, P, z, z'\}$; the complex structure $J_+$ is
\[ J_+ = \left(\frac{\partial(q, p, z, z')}{\partial(q, P, z, z')}\right)^{-1} \begin{pmatrix} J_s & 0 & 0 & 0 \\ 0 & J_s & 0 & 0 \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & J_t \end{pmatrix} \left(\frac{\partial(q, p, z, z')}{\partial(q, P, z, z')}\right). \tag{6.69} \]

⁷ We choose signs that are consistent with the sigma model results.
⁸ One may wonder if the dependence of K on $z$ and $z'$ is smooth; this is necessary to write the coordinate transformation to $\{q, P, z, z'\}$. The existence of these coordinates follows from Arnold’s result [17].


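The transformation matrix is given as
\[ \frac{\partial(q, p, z, z')}{\partial(q, P, z, z')} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \frac{\partial p}{\partial q} & \frac{\partial p}{\partial P} & \frac{\partial p}{\partial z} & \frac{\partial p}{\partial z'} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ \frac{\partial^2 K}{\partial q\,\partial q} & \frac{\partial^2 K}{\partial P\,\partial q} & \frac{\partial^2 K}{\partial z\,\partial q} & \frac{\partial^2 K}{\partial z'\,\partial q} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ K_{LL} & K_{LR} & K_{Lc} & K_{Lt} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \tag{6.70} \]
where in complex coordinates $K_{LL}$ and $K_{LR}$ were defined in (5.57) and
\[ K_{Lc} = \begin{pmatrix} K_{a\alpha} & K_{a\bar\alpha} \\ K_{\bar a\alpha} & K_{\bar a\bar\alpha} \end{pmatrix}, \qquad K_{Lt} = \begin{pmatrix} K_{a\alpha'} & K_{a\bar\alpha'} \\ K_{\bar a\alpha'} & K_{\bar a\bar\alpha'} \end{pmatrix}. \tag{6.71} \]
Next using (6.69) and (6.70) we calculate $J_+$,
\[ J_+ = \begin{pmatrix} J_s & 0 & 0 & 0 \\ K_{RL}^{-1}C_{LL} & K_{RL}^{-1}J_sK_{LR} & K_{RL}^{-1}C_{Lc} & K_{RL}^{-1}C_{Lt} \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & J_t \end{pmatrix}, \tag{6.72} \]
where all of the C matrices are defined in (3.30). This is exactly the same expression one gets from the sigma model considerations (3.31). Similarly, we calculate the form of $J_-$ in $\{q, P, z, z'\}$ coordinates:
\[ J_- = \left(\frac{\partial(Q, P, z, z')}{\partial(q, P, z, z')}\right)^{-1} \begin{pmatrix} J_s & 0 & 0 & 0 \\ 0 & J_s & 0 & 0 \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & -J_t \end{pmatrix} \left(\frac{\partial(Q, P, z, z')}{\partial(q, P, z, z')}\right), \tag{6.73} \]
\[ J_- = \begin{pmatrix} K_{LR}^{-1}J_sK_{RL} & K_{LR}^{-1}C_{RR} & K_{LR}^{-1}C_{Rc} & K_{LR}^{-1}A_{Rt} \\ 0 & J_s & 0 & 0 \\ 0 & 0 & J_c & 0 \\ 0 & 0 & 0 & -J_t \end{pmatrix}, \tag{6.74} \]
where again the C and A matrices were defined in (3.30) and $K_{Rc}$ and $K_{Rt}$ are
\[ K_{Rc} = \begin{pmatrix} K_{a'\alpha} & K_{a'\bar\alpha} \\ K_{\bar a'\alpha} & K_{\bar a'\bar\alpha} \end{pmatrix}, \qquad K_{Rt} = \begin{pmatrix} K_{a'\alpha'} & K_{a'\bar\alpha'} \\ K_{\bar a'\alpha'} & K_{\bar a'\bar\alpha'} \end{pmatrix}. \tag{6.75} \]
This is exactly the same expression one gets from the sigma model (3.32). We now consider the metric; in the coordinates $\{q, P, z, z'\}$, the metric has the form
\[ g = \begin{pmatrix} g_{\mathbf{A}\mathbf{B}} & g_{\mathbf{A}\mathbf{B}'} & g_{\mathbf{A}B} & g_{\mathbf{A}B'} \\ g_{\mathbf{A}'\mathbf{B}} & g_{\mathbf{A}'\mathbf{B}'} & g_{\mathbf{A}'B} & g_{\mathbf{A}'B'} \\ g_{A\mathbf{B}} & g_{A\mathbf{B}'} & g_{AB} & g_{AB'} \\ g_{A'\mathbf{B}} & g_{A'\mathbf{B}'} & g_{A'B} & g_{A'B'} \end{pmatrix}. \tag{6.76} \]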


The definition (4.41) of the Poisson structure $\sigma$ determines all the components of the metric g except those along the kernel of $\sigma$: $g_{AB}$, $g_{AB'}$, $g_{A'B}$, $g_{A'B'}$; this matches the ambiguity in the generating function $K(q, P, z, z')$ noted below (6.68). The remaining components of the metric can be expressed in terms of the second derivatives of $K(q, P, z, z')$ using the relation (2.16):
\[ J^\lambda_{+\mu} J^\sigma_{+\nu} J^\gamma_{+\rho}\,(d\omega_+)_{\lambda\sigma\gamma} = -J^\lambda_{-\mu} J^\sigma_{-\nu} J^\gamma_{-\rho}\,(d\omega_-)_{\lambda\sigma\gamma}. \tag{6.77} \]

This is obvious in the Kähler case ($J_+ = J_-$), and was shown to be true whenever $[J_+, J_-] = 0$ in [1]. In the general case, we argue as follows: choosing the local coordinates $(q, P, z, z')$ we can plug the complex structures (6.72) and (6.74) into (6.77). After this the relation (6.77) becomes a first order partial differential equation for the metric g. The differential equation contains the derivatives of K. However, we know a solution for g (which is indeed expressible completely in terms of the second derivatives of K): it is precisely the expression derived from the sigma model (see the expression for E in Sect. 3). Similarly, (2.16) can be used to determine the 2-form B in terms of the second derivatives of K. Thus we have established the existence of a generalization of the concept of a Kähler potential for generalized Kähler geometry. It is natural to refer to this function as a generalized Kähler potential. Of course, as we found in the previous section, the second derivatives of the generalized Kähler potential appear nonlinearly in the metric.

7. Summary and Discussion

We have resolved the long standing problem of finding a manifestly off-shell supersymmetric formulation for the general N = (2, 2) sigma model. We have shown that the full set of fields necessary for the description of the general N = (2, 2) sigma model consists of chiral, twisted chiral, and semichiral fields. At the geometrical level this implies important results about generalized Kähler geometry, in particular the existence of a generalized Kähler potential. Thus for a generalized Kähler manifold all the differential geometry can be locally encoded in a single real function. We have presented a geometrical proof of this which is essentially independent of sigma model considerations. The only assumption we made was the regularity of the Poisson structures $\pi_\pm$ in a given neighborhood; presumably, continuity allows one to relax this assumption in most cases of physical interest. In general, it would be interesting to go beyond this assumption; this would require the full apparatus of Poisson geometry, in particular a study of the transversal Poisson structures around $x_0$.

It follows that one can now discuss the general N = (2, 2) sigma models entirely within the powerful N = (2, 2) superfield formalism. In particular, problems such as finding quotients of generalized Kähler manifolds can be studied in all generality in this formalism. We plan to come back to this elsewhere. From the mathematical point of view, it would be interesting to systematically study the first order PDE for the metric that arises from Eq. (6.77). Taking into account the discussion in Sect. 5, we seem to have some new tools with which to study hyperkähler manifolds.

Acknowledgements. We are grateful to the 2005 Simons Workshop for providing the stimulating atmosphere where this work was initiated. UL was supported by EU grant (Superstring theory) MRTN-2004-512194 and VR grant 621-2003-3454. The work of MR was supported in part by NSF grant no. PHY-0354776 and Supplement for International Cooperation with Central and Eastern Europe PHY 0300634. The research of R.v.U. was supported by Czech Ministry of Education contract No. MSM0021622409 and by Kontakt grant ME649. The research of M.Z. was supported by VR-grant 621-2004-3177 and in part by the National Science Foundation under the Grant No. PHY99-07949.


A. N = (1,1) Supersymmetry

In this and the next appendix we collect our notation for N = (1, 1) and N = (2, 2) superspace. In our conventions we closely follow [19]. We use real (Majorana) two-component spinors $\psi^\alpha = (\psi^+, \psi^-)$. Spinor indices are raised and lowered with the second-rank antisymmetric symbol $C_{\alpha\beta}$, which defines the spinor inner product:

\[ C_{\alpha\beta} = -C_{\beta\alpha} = -C^{\alpha\beta}, \qquad C_{+-} = i, \qquad \psi_\alpha = \psi^\beta C_{\beta\alpha}, \qquad \psi^\alpha = C^{\alpha\beta}\psi_\beta. \tag{A.1} \]

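Throughout the paper we use (++, =) as worldsheet indices, and (+, −) as two-dimensional spinor indices. We also use superspace conventions where the pair of spinor coordinates of the two-dimensional superspace are labelled $\theta^\pm$, and the spinor derivatives $D_\pm$ and supersymmetry generators $Q_\pm$ satisfy

\[ D_+^2 = i\partial_{++}, \qquad D_-^2 = i\partial_{=}, \qquad \{D_+, D_-\} = 0, \qquad Q_\pm = iD_\pm + 2\theta^\pm \partial_{\overset{++}{=}}, \tag{A.2} \]

where $\partial_{\overset{++}{=}} = \partial_0 \pm \partial_1$. The supersymmetry transformation of a superfield $\Phi$ is given by

\[ \delta\Phi \equiv -i(\varepsilon^+ Q_+ + \varepsilon^- Q_-)\Phi = (\varepsilon^+ D_+ + \varepsilon^- D_-)\Phi - 2i(\varepsilon^+\theta^+\partial_{++} + \varepsilon^-\theta^-\partial_{=})\Phi. \tag{A.3} \]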

The components of a scalar superfield $\Phi$ are defined by projection as follows:

\[ \Phi| \equiv X, \qquad D_\pm\Phi| \equiv \psi_\pm, \qquad D_+ D_-\Phi| \equiv F, \tag{A.4} \]

where the vertical bar | denotes “the θ = 0 part”. The N = (1, 1) spinorial measure is conveniently written in terms of spinor derivatives:

\[ \int d^2\theta\, L = (D_+ D_- L)\big|. \tag{A.5} \]

B. N = (2, 2) Supersymmetry

In N = (2, 2) superspace, we have two independent N = (1, 1) subalgebras with spinor derivatives $D^1_\alpha$, $D^2_\alpha$; we define complex spinor derivatives

\[ D_\alpha \equiv \tfrac{1}{2}(D^1_\alpha + iD^2_\alpha), \qquad \bar D_\alpha = \tfrac{1}{2}(D^1_\alpha - iD^2_\alpha), \tag{B.1} \]

which obey the algebra

\[ \{D_+, \bar D_+\} = i\partial_{++}, \qquad \{D_-, \bar D_-\} = i\partial_{=}, \qquad \{D_\alpha, D_\beta\} = 0, \qquad \{\bar D_\alpha, \bar D_\beta\} = 0. \tag{B.2} \]

These can be written in terms of complex spinor coordinates:

\[ D_\pm = \partial_\pm + \tfrac{i}{2}\bar\theta^\pm \partial_{\overset{++}{=}}, \qquad \bar D_\pm = \bar\partial_\pm + \tfrac{i}{2}\theta^\pm \partial_{\overset{++}{=}}. \tag{B.3} \]
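As a consistency check, the algebra (B.2) can be recovered from the definitions (B.1) and from the coordinate form (B.3). The sketch below rests on assumptions that are only implicit in the text: that each real derivative $D^a_\alpha$ ($a = 1, 2$) obeys the N = (1, 1) relations (A.2), that the two sets anticommute with each other (our reading of "two independent subalgebras"), and that $\{\partial_\pm, \theta^\pm\} = \{\bar\partial_\pm, \bar\theta^\pm\} = 1$ with all other anticommutators of coordinates and derivatives vanishing.

```latex
\begin{align*}
% From (B.1): the cross terms -i\{D^1_+,D^2_+\} + i\{D^2_+,D^1_+\} cancel identically.
\{D_+, \bar D_+\}
  &= \tfrac14\,\{D^1_+ + iD^2_+,\; D^1_+ - iD^2_+\}
   = \tfrac14\left(2(D^1_+)^2 + 2(D^2_+)^2\right)
   = \tfrac14\left(2i\partial_{++} + 2i\partial_{++}\right)
   = i\partial_{++}, \\
% Here the assumption \{D^1_+, D^2_+\} = 0 is actually needed.
\{D_+, D_+\}
  &= \tfrac14\left(2(D^1_+)^2 - 2(D^2_+)^2 + 2i\{D^1_+, D^2_+\}\right)
   = \tfrac12\left(i\partial_{++} - i\partial_{++}\right) = 0, \\
% From (B.3): only the mixed terms between a derivative and its own coordinate survive.
\{D_+, \bar D_+\}
  &= \tfrac{i}{2}\{\partial_+, \theta^+\}\,\partial_{++}
   + \tfrac{i}{2}\{\bar\theta^+, \bar\partial_+\}\,\partial_{++}
   = i\partial_{++}.
\end{align*}
```

The minus-index relations follow in the same way with $\partial_{++}$ replaced by $\partial_{=}$.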


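In terms of the covariant derivatives, the supersymmetry generators are

\[ Q_\alpha = iD_\alpha + \theta^\beta\partial_{\alpha\beta}, \qquad \bar Q_\alpha = i\bar D_\alpha + \bar\theta^\beta\partial_{\alpha\beta}. \tag{B.4} \]

The supersymmetry transformation of a superfield $\Phi$ is then defined by

\[ \delta\Phi = i(\varepsilon^\alpha Q_\alpha + \bar\varepsilon^\alpha \bar Q_\alpha)\Phi. \tag{B.5} \]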

Irreducible representations of N = (2, 2) obey constraints that are compatible with the algebra (B.2); for example, a chiral superfield ($\bar D_\pm\Phi = 0$) has components defined via projections as follows:

\[ \Phi| \equiv X, \qquad D_\pm\Phi| \equiv \psi_\pm, \qquad D_+ D_-\Phi| \equiv F, \tag{B.6} \]

and a twisted chiral superfield ($\bar D_+\chi = D_-\chi = 0$) has components:

\[ \chi| \equiv \tilde X, \qquad D_+\chi| \equiv \tilde\psi_+, \qquad \bar D_-\chi| \equiv \tilde\psi_-, \qquad D_+\bar D_-\chi| \equiv \tilde F. \tag{B.7} \]

The N = (2, 2) spinorial measure is conveniently written in terms of spinor derivatives:

\[ \int d^2\theta\, d^2\bar\theta\, L = (D_+ D_- \bar D_+ \bar D_- L)\big|. \tag{B.8} \]

C. Poisson Geometry

A (d-dimensional) manifold M is Poisson if it admits an antisymmetric bivector $\pi \in \wedge^2 TM$ that satisfies the differential condition

\[ \pi^{\mu\nu}\partial_\nu\pi^{\rho\sigma} + \pi^{\rho\nu}\partial_\nu\pi^{\sigma\mu} + \pi^{\sigma\nu}\partial_\nu\pi^{\mu\rho} = 0. \tag{C.1} \]
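To make condition (C.1) concrete, the following minimal sketch checks it for one standard example that is not taken from the text: the linear bivector $\pi^{\mu\nu}(x) = \epsilon^{\mu\nu\kappa}x_\kappa$ on $\mathbb{R}^3$. The function names are hypothetical and chosen only for illustration. Because this π is linear in x, its derivatives are constants, so (C.1) can be evaluated exactly in integer arithmetic.

```python
from itertools import product


def levi_civita(i, j, k):
    """Totally antisymmetric symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def pi(x):
    """Illustrative bivector pi^{mu nu}(x) = sum_k eps^{mu nu k} x_k (an assumed example)."""
    return [[sum(levi_civita(m, n, k) * x[k] for k in range(3))
             for n in range(3)] for m in range(3)]


def d_pi(r, s, n):
    """Partial derivative d_n pi^{r s}; constant because pi is linear in x."""
    return levi_civita(r, s, n)


def jacobiator(x):
    """Left-hand side of (C.1) for every index triple (mu, rho, sigma)."""
    p = pi(x)
    return {
        (m, r, s): sum(p[m][n] * d_pi(r, s, n)
                       + p[r][n] * d_pi(s, m, n)
                       + p[s][n] * d_pi(m, r, n) for n in range(3))
        for m, r, s in product(range(3), repeat=3)
    }


def casimir_contraction(x):
    """Components of d(|x|^2) contracted with pi: 2 * x_mu * pi^{mu nu}."""
    p = pi(x)
    return [sum(2 * x[m] * p[m][n] for m in range(3)) for n in range(3)]
```

For this example every component of `jacobiator(x)` vanishes, and `casimir_contraction(x)` is zero, so $|x|^2$ has vanishing bracket with every function. Away from the origin this π has rank 2, and the level sets of $|x|^2$ (spheres) are two-dimensional.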

If π is invertible, $\pi^{-1}$ is a symplectic structure. The bivector π defines the conventional Poisson bracket

\[ \{f, g\} \equiv \pi(df, dg) = \pi^{\mu\nu}\partial_\mu f\,\partial_\nu g, \qquad f(x), g(x) \in C^\infty(M), \tag{C.2} \]

which is a bilinear map $C^\infty(M) \times C^\infty(M) \to C^\infty(M)$. Because of (C.1), the Poisson bracket (C.2) has the ordinary antisymmetry property and satisfies the standard Leibniz rule and Jacobi identity.

Next we recall that (locally) a Poisson manifold admits a foliation by symplectic leaves. Let M be a Poisson manifold with the Poisson structure $\pi^{\mu\nu}$, μ, ν = 1, 2, ..., d; choose a point $x_0$ such that in its neighborhood rank(π) = n is constant. Such a point is called regular.⁹ A vector field is locally Hamiltonian if it can be written as the contraction of the bivector π with a closed one-form e (locally e = df for some function f). The Lie bracket of two locally Hamiltonian vector fields is again locally Hamiltonian: for $v_A^\mu \equiv \pi^{\mu\nu}\partial_\nu f_A$,

\[ \mathcal{L}_{v_A} v_B^\mu \equiv v_A^\nu\partial_\nu v_B^\mu - v_B^\nu\partial_\nu v_A^\mu = \pi^{\mu\rho}\partial_\rho\big((\partial_\nu f_B)\,\pi^{\nu\lambda}(\partial_\lambda f_A)\big). \tag{C.3} \]

The maximum number of linearly independent locally Hamiltonian vector fields in the neighborhood of a regular point $x_0$ is clearly n = rank(π); then the Frobenius theorem implies that the vector fields locally generate an integral submanifold S through $x_0$, and it is always possible to introduce the local coordinates $x^\mu = \{x^A, x^i\}$, A = 1, ..., n, i = n + 1, ..., d in the neighborhood of $x_0$ such that S can be described by $x^i$ = constant and $x^A$ are the coordinates on S. The restriction of the Poisson bracket to the functions on the submanifold S is again a Poisson bracket, and is indeed a nondegenerate Poisson structure on S. As a result, in the coordinates $x^\mu = \{x^A, x^i\}$, π has the following form:

\[ \pi^{\mu\nu} = \begin{pmatrix} \pi^{AB} & 0 \\ 0 & 0 \end{pmatrix}. \tag{C.4} \]

Since $\pi^{AB} \equiv \pi|_S$ is nondegenerate, it is the inverse of a symplectic structure on S, and thus the Poisson manifold is foliated by symplectic leaves. In a generic coordinate system, there is a locally complete set of d − n independent Casimir functions $\{f_i(x)\}$ of π which have vanishing Poisson bracket with any function from $C^\infty(M)$. In these coordinates the symplectic leaves are determined locally by the conditions $f_i(x)$ = constant. For further details on the Poisson geometry the reader may consult the book [20].

⁹ In general, a non-regular Poisson manifold has singular points where the rank jumps [20]. We do not discuss these points and their neighborhoods here.

References

1. Gates, S.J., Hull, C.M., Roček, M.: Twisted multiplets and new supersymmetric nonlinear sigma models. Nucl. Phys. B248, 157 (1984)
2. Buscher, T., Lindström, U., Roček, M.: New supersymmetric sigma models with Wess-Zumino terms. Phys. Lett. B202, 94 (1988)
3. Sevrin, A., Troost, J.: Off-shell formulation of N = 2 non-linear sigma-models. Nucl. Phys. B492, 623 (1997)
4. Bogaerts, J., Sevrin, A., van der Loo, S., Van Gils, S.: Properties of semichiral superfields. Nucl. Phys. B562, 277 (1999)
5. Curtright, T.L., Zachos, C.K.: Geometry, topology and supersymmetry in nonlinear models. Phys. Rev. Lett. 53, 1799 (1984)
6. Howe, P.S., Sierra, G.: Two-dimensional supersymmetric nonlinear sigma models with torsion. Phys. Lett. B148, 451 (1984)
7. Lyakhovich, S., Zabzine, M.: Poisson geometry of sigma models with extended supersymmetry. Phys. Lett. B548, 243 (2002)
8. Hitchin, N.: Generalized Calabi-Yau manifolds. Q. J. Math. 54(3), 281–308 (2003)
9. Gualtieri, M.: Generalized complex geometry. Oxford University, DPhil thesis, 2004
10. Lindström, U.: Generalized N = (2,2) supersymmetric non-linear sigma models. Phys. Lett. B587, 216 (2004)
11. Lindström, U., Minasian, R., Tomasiello, A., Zabzine, M.: Generalized complex manifolds and supersymmetry. Commun. Math. Phys. 257, 235 (2005)
12. Lindström, U., Roček, M., von Unge, R., Zabzine, M.: Generalized Kaehler geometry and manifest N = (2,2) supersymmetric nonlinear sigma-models. JHEP 0507, 067 (2005)
13. Ivanov, I.T., Kim, B.B., Roček, M.: Complex structures, duality and WZW models in extended superspace. Phys. Lett. B343, 133 (1995)
14. Sevrin, A., Troost, J.: The geometry of supersymmetric sigma-models. In: Proceedings of the workshop “Gauge Theories, Applied Supersymmetry and Quantum Gravity”, London: Imperial College, 1996
15. Grisaru, M.T., Massar, M., Sevrin, A., Troost, J.: The quantum geometry of N = (2,2) non-linear sigma-models. Phys. Lett. B412, 53 (1997)
16. Hitchin, N.: Instantons, Poisson structures and generalized Kähler geometry. Commun. Math. Phys. 265, 131–164 (2006)
17. Arnold, V.I.: Mathematical methods of classical mechanics. Translated from the Russian by K. Vogtmann and A. Weinstein. Second Edition. Graduate Texts in Mathematics, 60. New York: Springer-Verlag, 1989
18. Lindström, U., Roček, M.: Private communication, in preparation
19. Hitchin, N.J., Karlhede, A., Lindström, U., Roček, M.: Hyperkähler Metrics And Supersymmetry. Commun. Math. Phys. 108, 535 (1987)
20. Vaisman, I.: Lectures on the Geometry of Poisson Manifolds, Progress in Mathematics, Vol. 118. Basel: Birkhäuser, 1994

Communicated by M.R. Douglas

Commun. Math. Phys. 269, 851–857 (2007) Digital Object Identifier (DOI) 10.1007/s00220-006-0121-2


Singular Continuous Spectrum for the Laplacian on Certain Sparse Trees

Jonathan Breuer

Institute of Mathematics, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel. E-mail: [email protected]

Received: 20 March 2006 / Accepted: 21 April 2006
Published online: 5 October 2006 – © Springer-Verlag 2006

Abstract: We present examples of rooted tree graphs for which the Laplacian has singular continuous spectral measures. For some of these examples we further establish fractional Hausdorff dimensions. The singular continuous components, in these models, have an interesting multiplicity structure. The results are obtained via a decomposition of the Laplacian into a direct sum of Jacobi matrices.

1. Introduction

This note deals with the spectral analysis of the discrete Laplacian on trees that have a certain sparseness in their coordination number (to be precisely defined below). We show that the spectral theory of the Laplacian on such trees bears similarities to the theory of one-dimensional Schrödinger operators with a sparse-barrier potential. In particular, this framework allows us to construct explicit examples of trees with singular continuous spectrum. Moreover, for some of these models, the spectral measures have fractional Hausdorff dimensions (see Theorem 4.2 below). Graphs with singular continuous spectrum are known to exist [10]; however, we are not aware of any previous explicit construction of a graph with non-trivial bounds on the spectral dimensions. What’s more, we show that the singular continuous components occur with multiplicities that are related to the symmetry of the tree (Theorem 2.5).

At the center of our analysis is a decomposition theorem (Theorem 2.4) for the Laplacian on a family of trees that exhibit a certain spherical symmetry. All our examples follow from this decomposition by applying known methods from the theory of sparse-potential Schrödinger operators, mentioned above. Thus, for the applications, we content ourselves with giving the proper references and a few general remarks.

The paper is structured as follows. The basic result for sparse trees with singular continuous spectrum is described in Sect. 2, along with the decomposition theorem. The proofs of Theorems 2.4 and 2.5 are presented in Sect. 3. Further examples are given in Sect. 4.


Fig. 1. An example of a SH rooted tree with κ0 = 1, κ1 = 2, κ2 = κ3 = κ4 = 1, κ5 = 2 …

2. Sparse Trees

Recall that for a combinatorial tree, the distance between two vertices is defined as the number of edges of the unique path between them.

Definition 2.1 (Spherically Homogeneous Rooted Tree). A rooted tree is called spherically homogeneous (SH) (see [2]) if any vertex v, at a distance j from the root, O, is connected with $\kappa_j$ vertices at a distance j + 1 from O.

A locally finite (that is, the valence of every vertex is finite) spherically homogeneous tree is uniquely determined by the sequence $\{\kappa_j\}_{j=0}^\infty$. Let $\{k_n\}_{n=1}^\infty$ be a sequence of natural numbers > 1, and $\{L_n\}_{n=1}^\infty$ be a strictly increasing sequence of natural numbers. A spherically homogeneous rooted tree, Γ, is said to be of type $\{L_n, k_n\}_{n=1}^\infty$, if

\[ \kappa_j = \begin{cases} k_n & j = L_n \text{ for some } n, \\ 1 & \text{otherwise.} \end{cases} \tag{2.1} \]
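Definition (2.1) translates directly into a short computation. The sketch below, with hypothetical helper names, builds the branching sequence $\kappa_j$ from finite lists of $L_n$ and $k_n$ and counts the vertices on each sphere around the root; truncating at a finite depth is an assumption made for illustration only.

```python
def kappa_sequence(L, k, depth):
    """kappa_j for j = 0..depth-1, following (2.1): k_n at j = L_n, else 1."""
    branching = dict(zip(L, k))
    return [branching.get(j, 1) for j in range(depth)]


def sphere_sizes(kappa):
    """Number of vertices at distance 0, 1, ..., len(kappa) from the root."""
    sizes = [1]  # the root itself
    for kj in kappa:
        sizes.append(sizes[-1] * kj)  # every vertex has kj forward neighbours
    return sizes
```

With `L = [1, 5]` and `k = [2, 2]` the first six values of κ reproduce the sequence quoted in the caption of Fig. 1 (κ0 = 1, κ1 = 2, κ2 = κ3 = κ4 = 1, κ5 = 2).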

We shall say that Γ is sparse if $(L_{n+1} - L_n) \to \infty$ rapidly, as n → ∞. Typical of the examples we construct is the following:

Theorem 2.2. Let $k_0 \ge 2$ be a natural number and let $k_n \equiv k_0$. Assume that $(L_{n+1} - L_n) \to \infty$ and let Γ be a SH rooted tree, of type $\{L_n, k_n\}_{n=1}^\infty$. Then the essential spectrum of Δ on Γ contains the interval [−2, 2] and, provided $(L_{n+1} - L_n)$ increase sufficiently rapidly, any spectral measure for Δ is purely singular continuous on (−2, 2).

By ‘sufficiently rapidly’ we mean that $(L_{n+1} - L_n)$ has to be made sufficiently large with respect to $\{(L_{i+1} - L_i)\}_{i}$ […] and

\[ (\tilde\Delta f)(x) = \sum_{y:\,d(x,y)=1} f(y) - \#\{y : d(x,y) = 1\}\cdot f(x), \tag{2.3} \]

where #A, for a finite set A, is the number of elements in A (d(x, y) denotes the distance on the tree). Although we formulate all our results for Δ, we note that they hold for $\tilde\Delta$ as well (with (−2, 2) ⊆ R replaced by (−4, 0) where necessary).


It is clear that if $\{k_n\}_{n=1}^\infty$ is a bounded sequence, then Δ on the tree is bounded and self-adjoint. For unbounded coordination number, the operator is unbounded and the issue of self-adjointness has to be addressed.

Definition 2.3. We call a SH rooted tree, Γ, normal if $\{k_n\}$ unbounded implies that $\limsup_{n\to\infty}(L_{n+1} - L_n) > 1$.

Standard methods imply that the Laplacian on normal SH rooted trees is self-adjoint. The following decomposition theorem allows us to represent Δ on Γ as a direct sum of Jacobi matrices

\[ J(\{a(j)\}_{j=1}^\infty, \{b(j)\}_{j=1}^\infty) = \begin{pmatrix} b(1) & a(1) & 0 & 0 & \dots \\ a(1) & b(2) & a(2) & 0 & \dots \\ 0 & a(2) & b(3) & a(3) & \ddots \\ \vdots & \vdots & \ddots & \ddots & \ddots \end{pmatrix} \tag{2.4} \]

with $b(j) \in \mathbb{R}$, $a(j) > 0$.

Theorem 2.4. Let Γ be a normal rooted SH tree of type $\{L_n, k_n\}_{n=1}^\infty$. Let

\[ M_n = \begin{cases} \prod_{j=1}^{n} k_j - \prod_{j=1}^{n-1} k_j & n > 1, \\ k_1 - 1 & n = 1, \\ 1 & n = 0. \end{cases} \tag{2.5} \]

Furthermore, let $R_0 = 0$ and $R_n = L_n + 1$, for n ≥ 1. Then Δ is unitarily equivalent to a direct sum of Jacobi matrices, each operating on a copy of $\ell^2(\mathbb{Z}^+)$:

\[ \Delta \cong \bigoplus_{n=0}^{\infty} \underbrace{(J_n \oplus J_n \oplus \cdots \oplus J_n)}_{M_n \text{ times}}, \tag{2.6} \]

where

\[ J_n = J(\{a_n(j)\}_{j=1}^\infty, \{b_n(j)\}_{j=1}^\infty) \tag{2.7} \]

with

\[ a_n(j) = \begin{cases} \sqrt{k_m} & j = R_m - R_n \text{ for some } m > n, \\ 1 & \text{otherwise,} \end{cases} \qquad b_n(j) \equiv 0. \tag{2.8} \]

Remarks.
1. For the case of a regular tree, a similar decomposition was discussed in [1, 8] (see also [13] and references therein, for a related result in the case of a metric tree).
2. As noted above, this theorem holds for $\tilde\Delta$ as well, albeit with different values for $\tilde b_n(j)$.

The next theorem is a simple corollary of Theorem 2.4. Its conditions are satisfied by all the cases we consider in this paper.

Theorem 2.5. Let Γ be a normal SH rooted tree of type $\{L_n, k_n\}_{n=1}^\infty$. Consider Δ on Γ. Let I ⊆ R be an interval such that all the spectral measures restricted to I are singular continuous. Let $P_I$ be the spectral projection onto I (associated with Δ), and let $M_n$ be as defined in (2.5). Then, $P_I\ell^2(\Gamma)$ decomposes as a direct sum of invariant spaces, $\bigoplus_{n=0}^\infty H_n$, such that Δ, restricted to $H_n$, has uniform multiplicity $M_n$ and the measure classes associated with the representation of Δ restricted to $H_n$ are mutually disjoint.
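The decomposition of Theorem 2.4 can be sanity-checked numerically on a finite piece of a tree. The sketch below is illustrative only and makes assumptions beyond the text: the tree is cut off at a finite depth, each Jacobi matrix is truncated to the radii that remain, the Laplacian is taken to be the adjacency operator (consistent with the spectrum [−2, 2] quoted above), and all function names are hypothetical. It compares the traces of powers of the adjacency matrix with the multiplicity-weighted traces of powers of the truncated Jacobi blocks; agreement for every power up to the number of vertices means the two spectra coincide with multiplicity.

```python
from math import isclose, sqrt


def tree_adjacency(kappa):
    """Adjacency matrix of the SH rooted tree cut off at depth len(kappa);
    kappa[j] is the number of forward neighbours of a vertex at distance j."""
    level, edges, n = [0], [], 1
    for kj in kappa:
        nxt = []
        for v in level:
            for _ in range(kj):
                edges.append((v, n))
                nxt.append(n)
                n += 1
        level = nxt
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    return A


def jacobi_blocks(kappa):
    """Pairs (M_n, truncated J_n) following (2.5)-(2.8)."""
    depth = len(kappa)
    starts = [0] + [j + 1 for j, kj in enumerate(kappa) if kj > 1]  # R_0, R_1, ...
    ks = [kj for kj in kappa if kj > 1]                             # k_1, k_2, ...
    blocks, alpha_prev, alpha = [], 0, 1                            # gives M_0 = 1
    for n, R in enumerate(starts):
        if n > 0:
            alpha_prev, alpha = alpha, alpha * ks[n - 1]
        size = depth - R + 1                                        # radii R..depth
        J = [[0.0] * size for _ in range(size)]
        for j in range(size - 1):
            # weight linking radius R + j to radius R + j + 1
            J[j][j + 1] = J[j + 1][j] = sqrt(kappa[R + j])
        blocks.append((alpha - alpha_prev, J))
    return blocks


def trace_of_power(M, p):
    """Trace of M**p by repeated multiplication (fine for small matrices)."""
    n = len(M)
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        P = [[sum(P[i][l] * M[l][j] for l in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(P[i][i] for i in range(n))


def moments_match(kappa, max_power):
    """True if tr(A^p) equals sum_n M_n tr(J_n^p) for p = 1..max_power."""
    A, blocks = tree_adjacency(kappa), jacobi_blocks(kappa)
    return all(
        isclose(trace_of_power(A, p),
                sum(M * trace_of_power(J, p) for M, J in blocks),
                rel_tol=1e-9, abs_tol=1e-9)
        for p in range(1, max_power + 1)
    )
```

Calling `moments_match([1, 2, 1, 1, 1, 2], 14)` exercises the tree of Fig. 1 cut off at depth 6. The check is only a finite-dimensional analogue of the theorem and says nothing about spectral type.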


Remark. It is not hard to show that the numbers $M_n$ are dimensions of certain irreducible representations of the symmetry group of Γ.

Theorem 2.4 makes it clear that the spectral analysis of the Laplacian on sparse trees reduces to that of Jacobi matrices with ‘bumps’ that are sparse along the subdiagonal and superdiagonal. Such matrices are analogous to discrete one-dimensional Schrödinger operators with potentials composed of sparse barriers. There is extensive literature on the spectral theory of such operators (see [5] for a review of the relevant theory), showing that such potentials give rise to a variety of interesting spectral phenomena. As noted in the introduction, all the examples we present in this paper are obtained by applying the (suitably modified) methods of the diagonal sparse case to the off-diagonal case and using Theorem 2.4. In particular, Theorem 2.2 follows from Theorem 2.4 by the methods in [7].

3. Decomposing the Laplacian

We start with some terminology and notation: We use the shorthand |v| ≡ d(v, O). For any $v \in V(\Gamma)$ (= the vertex set of Γ) we call the forward subtree of v, $\Gamma_v$, the subtree of Γ all of whose vertices, u, satisfy the following two conditions:
1. |v| ≤ |u|.
2. Any vertex v′ on the unique path connecting v and u satisfies |v| ≤ |v′|.
We shall use $S_\Gamma(r) \equiv \{v \in V(\Gamma) \mid |v| = r\}$.

Proof of Theorem 2.4. We shall decompose $\mathcal{H} = \ell^2(\Gamma)$ as a direct sum of spaces $\bigoplus_{n=0}^\infty \mathcal{H}_n$, each invariant under Δ, such that Δ restricted to $\mathcal{H}_n$ is unitarily equivalent to a direct sum of $M_n$ copies of $J_n$. We shall describe this decomposition inductively. We need a label for some of the vertices of Γ: At a distance $R_n$ from the root there are $\alpha_n \equiv \prod_{j=1}^n k_j$ vertices. These are naturally divided into $\alpha_{n-1}$ groups of $k_n$ vertices with common backward neighbor. We shall label the vertices on $S_\Gamma(R_n)$ by $\{v_{n,l}\}_{l=1}^{\alpha_n}$, where for each m = 1, 2, ..., $\alpha_{n-1}$, the vertices $\{v_{n,l}\}_{l=(m-1)k_n+1}^{mk_n}$ are all on the forward subtree of $v_{n-1,m}$. In order to streamline the notation, we shall use $\delta_{n,l}$ for $\delta_{v_{n,l}}$ (the delta function at the vertex $v_{n,l}$). We shall also use $\Gamma_{n,l}$ for $\Gamma_{v_{n,l}}$. Now, let us define $V_n = [\delta_{n,l}]_{l=1}^{\alpha_n}$, the linear span of $\{\delta_{n,l}\}_{l=1}^{\alpha_n}$, so that dim $V_n = \alpha_n$. Let $\varphi_0 = \delta_O$ and let

\[ \mathcal{H}_0 = [\Delta^n\varphi_0 \mid n = 0, 1, \ldots], \tag{3.1} \]

where [·] for a linear subspace of $\mathcal{H}$ denotes its closure. An orthogonal basis for $\mathcal{H}_0$ is obtained by ‘Gram-Schmidting’ the basis $\{\Delta^n\varphi_0\}$, which results in normalized, radially symmetric functions supported on spheres around O (the radial symmetry is a consequence of the spherical homogeneity of Γ). This implies that $\mathcal{H}_0$ is the subspace of radially symmetric functions.

Now assume that we have defined $\mathcal{H}_n$ for n = 0, 1, ..., (N − 1) such that:
1. $\mathcal{H}_i \perp \mathcal{H}_j$ for any i ≠ j and all spaces are invariant under Δ in the sense that $\Delta(D(\Delta) \cap \mathcal{H}_i) \subseteq \mathcal{H}_i$ (where D(Δ) is the domain of Δ).


2. For any vertex v with $|v| < R_N$, $\delta_v \in \bigoplus_{n=0}^{N-1}\mathcal{H}_n$.
3. For any l > (N − 1), $V_l \cap \bigoplus_{n=0}^{N-1}\mathcal{H}_n$ is $\alpha_{N-1}$ dimensional.
(All these properties hold trivially for N − 1 = 0 if we let $\alpha_0 = 1$.)

Recall that $M_n = \prod_{j=1}^n k_j - \prod_{j=1}^{n-1} k_j = \alpha_n - \alpha_{n-1}$. From the above, it follows that the orthogonal complement in $V_N$ (which is $\alpha_N$ dimensional) to $\bigoplus_{n=0}^{N-1}\mathcal{H}_n$ is $M_N$ dimensional and is spanned by $M_N$ mutually orthogonal unit vectors, $\varphi_{N,1}, \ldots, \varphi_{N,M_N}$. Writing

\[ \varphi_{N,j} = \sum_{l=1}^{\alpha_N} a^l_{N,j}\,\delta_{N,l}, \qquad 1 \le j \le M_N, \tag{3.2} \]

and recalling that for all m = 1, 2, ..., $\alpha_{N-1}$, $\{v_{N,l}\}_{l=(m-1)k_N+1}^{mk_N}$ have a common backward neighbor on $S_\Gamma(L_N)$, we get (from 2 above) that

\[ \sum_{l=(m-1)k_N+1}^{mk_N} a^l_{N,j} = 0 \tag{3.3} \]

for all m and j. Define

\[ \mathcal{H}_N = [\Delta^n\varphi_{N,j} \mid 1 \le j \le M_N,\ n = 0, 1, \ldots]. \tag{3.4} \]

Then we claim that

\[ \mathcal{H}_N = \bigoplus_{j=1}^{M_N}\mathcal{H}_{N,j} \tag{3.5} \]

with

\[ \mathcal{H}_{N,j} = [\Delta^n\varphi_{N,j} \mid n = 0, 1, \ldots]. \tag{3.6} \]

Indeed, (3.3) together with the spherical homogeneity of Γ implies that the orthogonal basis obtained from the Gram-Schmidt process applied to $\{\Delta^n\varphi_{N,j}\}$ is made of functions supported on $\{v \mid |v| \ge R_N\}$ and having the form

\[ \frac{1}{\rho_{N,j}(r)} \sum_{|v|=r} a^v_{N,j}\,\delta_v, \tag{3.7} \]

where $r \ge R_N$, $\rho_{N,j}(r) > 0$ is a normalizing factor and $a^v_{N,j} = a^l_{N,j}$ if $v \in \Gamma_{N,l}$. Together with the orthogonality of the various $\varphi_{N,j}$, this means that $\mathcal{H}_{N,j} \perp \mathcal{H}_{N,i}$ if i ≠ j. Thus we see that Properties 1–3 above hold for n = 0, 1, ..., N. Having constructed $\mathcal{H}_n$ for all n in this fashion, we get from Properties 1 and 2 that indeed

\[ \mathcal{H} = \bigoplus_{n=0}^{\infty}\mathcal{H}_n \tag{3.8} \]

and that each of the spaces in the direct sum is invariant under Δ in the sense that $\Delta(D(\Delta) \cap \mathcal{H}_n) \subseteq \mathcal{H}_n$. This almost means that Δ decomposes as a direct sum. The only missing point is that if Δ is unbounded, the above does not necessarily mean that $\mathcal{H}_n$ is invariant under $(\Delta - z)^{-1}$ for $z \in \mathbb{C}\setminus\mathbb{R}$. However, it is easy to see that the moment problem associated with the operation of Δ on $\varphi_{n,j}$ (any n and $1 \le j \le M_n$)


is determinate, so it follows from Proposition 4.15 in [11] that $\mathcal{H}_n$ is indeed invariant in both senses. Thus, we have that

\[ \Delta = \bigoplus_{n=0}^{\infty}\Delta_n \tag{3.9} \]

with $\Delta_n$ denoting the corresponding restricted operators. In order to complete the proof, we need to show that

\[ \Delta_n \cong \underbrace{J_n \oplus J_n \oplus \cdots \oplus J_n}_{M_n \text{ times}}. \tag{3.10} \]

Knowing (3.5) and (3.7), however, this is now a matter of simple computation (since it is easy to see that $\rho_{n,j}(r) = \sqrt{\#\,(S_\Gamma(r) \cap \Gamma_{n,l})}$ for any $1 \le l \le \alpha_n$ and $r \ge R_n$).

Proof of Theorem 2.5. Theorem 2.4 says that it suffices to consider the measures $\mu_n$, the spectral measures of $J_n$ and $\delta_1 \in \ell^2(\mathbb{Z}^+)$ (since this is a cyclic vector). Obviously, these measures occur with multiplicity at least $M_n$, so we only need to show that their singular continuous parts are mutually singular. Note now that for $n_1 > n_2$, $J_{n_1}$ can be obtained from $J_{n_2}$ by ‘stripping off’ the $(R_{n_1} - R_{n_2})$ leftmost columns and the same number of rows from the top. The fact that the singular continuous part of $\mu_{n_1}$ is singular with respect to the singular continuous part of $\mu_{n_2}$ now follows from the characterization of the appropriate supports in terms of m-functions (see e.g. [9]) and from the continued fraction expansion of m [3]. (The spaces $H_n$ are just $P_I(\mathcal{H}_n)$.)

4. Singular Continuous Spectrum for Sparse Trees

Let Γ be a sparse, normal SH rooted tree. As noted in the proof of Theorem 2.5, all the matrices $J_n$, in the decomposition of the Laplacian, are actually various ‘tails’ of $J_0$. Since, in the sparse models, it is the asymptotics that determine the spectral type, this means that the spectral analysis of the Laplacian reduces essentially to the analysis of a single Jacobi matrix.

A good reason for considering trees with unbounded $\{k_n\}$ is the fact that for trees with $k_n \to \infty$, it is easy to state explicit growth conditions on $\{L_n\}$ which make the spectrum singular continuous. In particular, $k_n \to \infty$ implies absence of absolutely continuous spectrum [6], so the following is a straightforward adaptation of an idea of Simon-Stolz [12] to our case:

Theorem 4.1. Let $\{k_n\}_{n=1}^\infty$ be a sequence of natural numbers such that $k_n \to \infty$ as n → ∞. Let $\alpha_n = \prod_{j=1}^n k_j$. Assume that $(L_{n+1} - L_n) \to \infty$ and let Γ be a SH rooted tree of type $\{L_n, k_n\}_{n=1}^\infty$. Then the spectrum of Δ on Γ consists of the interval [−2, 2] along with some discrete point spectrum outside this interval. If for some ε > 0,

\[ \limsup_{n\to\infty} \frac{(L_{n+1} - L_n)}{\alpha_n^{(1+\varepsilon)}} > 0, \tag{4.1} \]

then any spectral measure for Δ is purely singular continuous on (−2, 2).

As certain sparse potentials have been constructed with spectral measures of fractional Hausdorff dimensionality, it seems natural to try to construct trees with this property as well. An adaptation of an example of Jitomirskaya-Last [4] (see also [14]) achieves just that:


Theorem 4.2. Let $L_n = 2^{(n^n)}$. Let β > 0 and $k_n = L_n^\beta$. Let $\Gamma_\beta$ be the corresponding tree. Then the restriction to (−2, 2) of any spectral measure for Δ on $\Gamma_\beta$ is supported on a set of Hausdorff dimension $\frac{2}{2+\beta}$ and does not give weight to sets of Hausdorff dimension less than $\frac{1}{1+\beta}$.

Letting $k_n = L_n^{\beta_n(c)}$ with $\beta_n(c) = c\,\frac{(n+1)^{(n+1)}}{n^n}$ for some 1 > c > 0, we get that any spectral measure on (−2, 2) is purely singular continuous and supported on a set of Hausdorff dimension 0.
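To get a feel for the numbers in Theorem 4.2, the following small sketch evaluates the two dimension bounds, read here as 1/(1+β) and 2/(2+β), and the sequence $L_n = 2^{(n^n)}$ exactly. The helper names are hypothetical, and the code only tabulates the formulas; it does not verify any spectral statement.

```python
from fractions import Fraction


def hausdorff_bounds(beta):
    """Return (lower, upper) = (1/(1+beta), 2/(2+beta)) for beta > 0."""
    beta = Fraction(beta)
    if beta <= 0:
        raise ValueError("Theorem 4.2 assumes beta > 0")
    return 1 / (1 + beta), 2 / (2 + beta)


def sparse_levels(count):
    """First `count` values of L_n = 2**(n**n), n = 1, 2, ..."""
    return [2 ** (n ** n) for n in range(1, count + 1)]
```

For every β > 0 the lower bound is strictly below the upper bound, both tend to 1 as β → 0 and to 0 as β → ∞, and the gaps between consecutive $L_n$ grow extremely fast.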

Acknowledgements. We are grateful to Michael Aizenman, Nir Avni, Yoram Last, Barry Simon, and Simone Warzel for useful discussions. We also wish to thank Michael Aizenman for the hospitality of Princeton where some of this work was done. This research was supported in part by THE ISRAEL SCIENCE FOUNDATION (grant no. 188/02) and by Grant no. 2002068 from the United States-Israel Binational Science Foundation (BSF), Jerusalem, Israel.

References

1. Allard, C., Froese, R.: A Mourre estimate for a Schrödinger operator on a binary tree. Rev. Math. Phys. 12, 1655–1667 (2000)
2. Bass, H., Otero-Espinar, M.V., Rockmore, D.N., Tresser, C.P.L.: Cyclic Renormalization and Automorphism Groups of Rooted Trees. Lecture Notes in Mathematics, 1621, Berlin-Heidelberg-New York: Springer-Verlag, 1996
3. Gesztesy, F., Simon, B.: M-Functions and inverse spectral analysis for finite and semi-infinite Jacobi matrices. J. d’Analyse Math. 73, 267–297 (1997)
4. Jitomirskaya, S., Last, Y.: Power-law subordinacy and singular spectra, I. Half-line operators. Acta Math. 183, 171–189 (1999)
5. Last, Y.: Spectral theory of Sturm-Liouville operators on infinite intervals: A review of recent developments. In: Sturm-Liouville Theory: Past and Present, edited by Amrein, W. O., Hinz, A. M., Pearson, D. B., Basel: Birkhäuser Verlag, 2005, pp. 99–120
6. Last, Y., Simon, B.: Eigenfunctions, transfer matrices, and absolutely continuous spectrum of one-dimensional Schrödinger operators. Invent. Math. 135, 329–367 (1999)
7. Pearson, D.B.: Singular continuous measures in scattering theory. Commun. Math. Phys. 60, 13–36 (1978)
8. Romanov, R.V., Rudin, G.E.: Scattering on the Bruhat-Tits tree. I. Phys. Lett. A 198, 113–118 (1995)
9. Simon, B.: Spectral analysis of rank one perturbations and applications. In: Mathematical Quantum Theory, II: Schrödinger Operators. CRM Proceedings and Lecture Notes, edited by Feldman, J., Froese, R., Rosen, L., Vol. 8, Providence, RI: Amer. Math. Soc., 1995, pp. 109–149
10. Simon, B.: Operators with singular continuous spectrum, VI: Graph Laplacians and Laplace-Beltrami operators. Proc. Amer. Math. Soc. 124, 1177–1182 (1996)
11. Simon, B.: The classical moment problem as a self-adjoint finite difference operator. Adv. in Math. 137, 82–203 (1998)
12. Simon, B., Stolz, G.: Operators with singular continuous spectrum, V: Sparse potentials. Proc. Amer. Math. Soc. 124, 2073–2080 (1996)
13. Solomyak, M.: On the spectrum of the Laplacian on regular metric trees. Waves in Random Media 14, S155–S171 (2004)
14. Tcheremchantsev, S.: Dynamical analysis of Schrödinger operators with growing sparse potentials. Commun. Math. Phys. 253, 221–252 (2005)

Communicated by B. Simon
