Communications In Mathematical Physics - Volume 278


Author: M. Aizenman (Chief Editor)


Commun. Math. Phys. 278, 1–29 (2008)
Digital Object Identifier (DOI) 10.1007/s00220-007-0398-9

Communications in Mathematical Physics

Rigorous Remarks about Scaling Laws in Turbulent Fluids

F. Flandoli (1), M. Gubinelli (2), M. Hairer (3), M. Romito (4)

(1) Dipartimento di Matematica Applicata, Università di Pisa, via Buonarroti 1, 56127 Pisa, Italia
(2) Equipe de probabilités, statistique et modélisation, Université de Paris-Sud, 91405 Orsay Cedex, France
(3) Department of Mathematics, The University of Warwick, Coventry CV4 7AL, United Kingdom
(4) Dipartimento di Matematica, Università di Firenze, viale Morgagni 67/a, 50134 Firenze, Italia.
E-mail: [email protected]

Received: 18 July 2005 / Accepted: 12 September 2007
Published online: 8 December 2007 – © Springer-Verlag 2007

Abstract: A definition of scaling law for suitable families of measures is given and investigated. First, a number of necessary conditions are proved. They imply the absence of scaling laws for 2D stochastic Navier-Stokes equations and for the stochastic Stokes (linear) problem in any dimension, while they imply a lower bound on the mean vortex stretching in 3D. Second, for the 3D stochastic Navier-Stokes equations, necessary and sufficient conditions for scaling laws to hold are given, translating the problem into bounds for energy and enstrophy of high and low modes respectively. Unlike in the 2D case, the validity or invalidity of such conditions in 3D remains open.

1. Introduction

The scaling law devised by Kolmogorov and Obukhov for turbulent 3D fluids in 1941 (called K41 in the sequel) says that in the inertial range $S_2(r) \sim \epsilon^{2/3} r^{2/3}$, where $S_2(r)$ is the second order structure function and $\epsilon$ is the mean energy dissipation rate. Since Kolmogorov's work of 1962 (and supported later by experimental evidence), K41 scaling has been believed false and has been replaced by $S_2(r) \sim \epsilon^{2/3} r^{2/3} (r/L)^{\kappa}$ (called K62 in the sequel) for some small value of $\kappa > 0$. Here $L$ is the length scale at which energy is injected and the correction $(r/L)^{\kappa}$ accounts for the effects of small scale intermittency. We refer to [16,15,13,19] for further physical details. The exact value of $\kappa$ and the validity itself of the previous prediction are still open problems, although there is a general agreement about the K62 "anomalous" scaling law.

With respect to these difficult open problems the aim of the present work is very limited; in a sentence, our hope is just to fix some rigorous definitions and preliminary results in order to encourage further investigation by the mathematical community, especially the one dealing with stochastic partial differential equations. To be precise, our aims are the following:

1. We give one possible rigorous definition of scaling law inspired by the previous conjectures; this issue is not a priori obvious due to the fact that the scaling should hold only in a certain range of $r$'s, which does not extend to zero for finite viscosity $\nu > 0$, but it tends to extend to zero as $\nu \to 0$. We provide some mathematical examples to understand this definition (Remark 1.5, Example 2.3).

2. We rigorously prove that the 2D Navier-Stokes equations on the torus perturbed by a large class of additive white noise cannot fulfill such a scaling. This proves rigorously what is believed on the basis of convincing but still heuristic physical arguments, see the classic papers of Onsager [22], von Neumann [27], Batchelor [1] and Fjørtoft [7]. In particular the work of Lee [20] presents a clear (albeit non-rigorous) argument which excludes the K41 scaling in 2D by showing that such a scaling is incompatible with the conservation of enstrophy.

3. The same result as in (2) is true for the 3D Stokes problem, thus only 3D nonlinear effects could produce either K41 or K62. The relevance of non-linear terms for 3D turbulence of course has been conjectured a long time ago, see for instance Taylor [24,23] and Taylor and Green [25]. We rigorously prove a necessary condition: if such scaling laws are true, then one has lower bounds on vortex stretching. The result is proved for a 3D Navier-Stokes equation on the torus perturbed by additive white noise. These results are a rigorous version of an observation made by Batchelor and Townsend [2].

4. Although we cannot prove or disprove the scaling laws for the 3D stochastic Navier-Stokes equations, at least we give a number of necessary and/or sufficient conditions which could help both to understand the meaning of the scaling properties and for further investigation.

As already mentioned, we base our analysis on the stochastic Navier-Stokes equations on the torus $[0,1]^d$, with $d = 2, 3$,

$$\frac{\partial u}{\partial t} + (u \cdot \nabla) u + \nabla p = \nu \Delta u + \sum_{\alpha} h_\alpha(x)\, \dot\beta_\alpha(t) \qquad (1.1)$$

supplemented with the incompressibility condition $\operatorname{div} u = 0$ and periodic boundary conditions. Here, $h_\alpha(x)$ denote suitable vector fields and $\beta_\alpha(t)$ denote independent Brownian motions (the torus instead of a more realistic framework has been chosen for mathematical simplicity). Let us remark that the theoretical usefulness of the stochastic Navier-Stokes equations has been noted since the early work of Novikov [21], who observed (albeit in a non-rigorous setup) that the Itô formula implies a simple energy balance equation for the model (see Remark 1.1 below). In the limit $\nu \to 0$, Eq. (1.1) is a singular limit problem much like the boundary layer one, and so may be considered as a prototype of high Reynolds number singular limit problem, with some mathematical simplifications due to the advantages produced by stochastic analysis.

Let us also remark that for 2D Navier-Stokes equations in unbounded domains (or with large-scale dissipation) the theories of Batchelor [3] and Kraichnan [17] predict that $S_2(r) \approx r^2$ with logarithmic corrections. Moreover, bounds on the energy spectrum for the deterministic 2D Navier-Stokes equations in a periodic domain forced on one or two eigenmodes of the Laplacian have been rigorously established by Constantin et al. [5] (see also [11,12]). It should be noted that another possible and interesting approach to the zero-viscosity limit is the one adopted in [18] (for the 2D case), where the amplitude of the forcing noise is proportional to the square root of the viscosity.


1.1. Notations about function spaces. Let $\mathcal{T}$ be the torus $[0,1]^d$, with $d = 2, 3$, $\mathbf{L}^2(\mathcal{T})$ be the space of vector fields $u : \mathcal{T} \to \mathbb{R}^d$ with $L^2(\mathcal{T})$-components, $\mathbf{H}^\alpha(\mathcal{T})$ be the analogous Sobolev spaces, $\mathbf{C}(\mathcal{T})$ be the analogous space of continuous fields. Let $H$ be the space of all fields $u \in \mathbf{L}^2(\mathcal{T})$ such that $\operatorname{div} u = 0$, with zero mean, i.e. $\int_{\mathcal{T}} u(x)\,dx = 0$, and the trace of $u \cdot n$ on the boundary is periodic (where $n$ is the outer normal, see [26], Ch. I, Thm 1.2). Let $V$ be the space of divergence free, zero mean, periodic elements of $\mathbf{H}^1(\mathcal{T})$ and $D(A)$ be the space of divergence free, zero mean, periodic elements of $\mathbf{H}^2(\mathcal{T})$. Finally, let $\mathcal{D}$ be the space of infinitely differentiable divergence free, zero mean, periodic fields on $\mathcal{T}$. The spaces $V$, $D(A)$ and $\mathcal{D}$ are dense and compactly embedded in $H$. Let $A : D(A) \subset H \to H$ be the (Stokes) operator $Au = -\Delta u$ (componentwise).

Sometimes we shall also need the same framework for the torus $[0, L]^d$, $d = 2, 3$, with any $L > 0$. We set $\mathcal{T}_L = [0, L]^d$, $H_L$ equal to the set of all fields $u \in \mathbf{L}^2(\mathcal{T}_L)$ such that $\operatorname{div} u = 0$ and $u \cdot n$ on the boundary is periodic, and $V_L$, $D(A_L)$ and $A_L : D(A_L) \subset H_L \to H_L$ the analogs of $V$, $D(A)$ and $A$. Notice only that we normalize the norm as $|u|^2_{H_L} = L^{-d} \int_{\mathcal{T}_L} |u(x)|^2\,dx$ (so that, roughly speaking, $|u|^2_{H_L} \sim |u(0)|^2$ for homogeneous fields).

1.2. The class $\mathcal{P}$ of probability measures. If $\mu$ is a probability measure on a Banach space $X$ and $f$ is a function on $X$, we use the notation $\mu[f(u)] := \int_X f(u)\,d\mu(u)$ whenever the integral is well defined. Let $\mathcal{P}_0$ be the family of all probability measures $\mu$ on $H$ (equipped with the Borel $\sigma$-algebra) such that $\mu(D(A)) = 1$ ($D(A)$ is a Borel set in $H$). Since $\mathbf{H}^2(\mathcal{T}) \subset \mathbf{C}(\mathcal{T})$ by the Sobolev embedding theorem, the elements of $D(A)$ are continuous (have a continuous element in their equivalence class). Consequently, given $x_0 \in \mathcal{T}$, the mapping $u \mapsto u(x_0)$ is well defined on $D(A)$, with values in $\mathbb{R}^d$. In particular, any expression of the form $\mu[f(u(x_1), \ldots, u(x_n))]$ is well defined for given $x_1, \ldots, x_n \in \mathcal{T}$, given $\mu \in \mathcal{P}_0$, and suitable $f : \mathbb{R}^{nd} \to \mathbb{R}$ (for instance measurable non negative). It follows that $S_2^\mu(r)$ is well defined (possibly infinite) for every $\mu \in \mathcal{P}_0$. The same argument does not apply to $Du(x_0)$ and $D^2 u(x_0)$, at least in $d = 3$. This is why we use lengthy expressions like $\mu[\int_{\mathcal{T}} \|Du(x)\|^2\,dx]$, which are meaningful (possibly infinite) for every $\mu \in \mathcal{P}_0$. We denote by $\mathcal{P}$ the class of all $\mu \in \mathcal{P}_0$ such that

$$\mu\Big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\Big] < \infty$$

and, for every $a \in \mathcal{T}$ and every rotation $R$ that transforms the set of coordinate axes in itself,

$$\mu[f(u(\cdot - a))] = \mu[f(u)], \qquad \mu[f(u(R\,\cdot))] = \mu[f(Ru(\cdot))] \qquad (1.2)$$

for all continuous bounded $f : H \to \mathbb{R}$. In plain words, we impose space homogeneity and a discrete form of isotropy (compatible with the symmetries of the torus). In the following we will refer to this symmetry as partial or discrete isotropy.

Discrete isotropy is imposed for two reasons. First, it ensures that $S_2^\mu(r)$ is independent of the coordinate unit vector $e$, since given two such vectors $e, e'$ there is a rotation $R$ as above such that $Re' = e$, so


$$\mu\big[\|u(re) - u(0)\|^2\big] = \mu\big[\|u(Rre') - u(R0)\|^2\big] = \mu\big[\|R(u(re') - u(0))\|^2\big] = \mu\big[\|u(re') - u(0)\|^2\big].$$

Furthermore, we use discrete isotropy through Lemma A.3 in Appendix 3.2. Finally, notice that $S_2^\mu(r) < \infty$ for every $r > 0$ and $\mu \in \mathcal{P}$, by Lemma 2.1 below.

1.3. Definition of scaling law. For every $\mu \in \mathcal{P}$ we introduce the second order structure function

$$S_2^\mu(r) = \mu\big[\|u(r \cdot e) - u(0)\|^2\big] \qquad (1.3)$$

for some coordinate unit vector $e$, with $r > 0$ (the results proved below extend to the so called longitudinal structure function; we consider (1.3) to fix the ideas). The measures of $\mathcal{P}$ are supported on continuous vector fields, so the pointwise operations in (1.3) are meaningful. Moreover, the symmetries in $\mathcal{P}$ imply that $S_2^\mu(r)$ is independent of the coordinate unit vector $e$ (in addition most of the estimates proved in the sequel extend to every unit vector $e$).

There is not only one way to define a scaling law. Inspired by the K41 and K62 theories we choose the following definition. We prefer to avoid the additional parameter $L$ and work on the given torus of size one; to express the smallness of the inertial range of $r$'s with respect to the integral scale $L$ we shall restrict the range of $r$'s as explained below.

As a preliminary technical remark, notice that we are going to define K41 and K62 scaling law for a set $M \subset \mathcal{P} \times \mathbb{R}_+$ and not for a family of measures $\{\mu_\nu\}_{\nu > 0}$. The reason is that Eq. (1.1) may have (a priori) more than one stationary measure for any given $\nu$ and in certain claims it seems easier to consider a set of measures for a given $\nu$. Given $\nu > 0$ we use the notation $M_\nu$ for the section $\{\mu \in \mathcal{P} : (\mu, \nu) \in M\}$. Here and in the sequel, when we talk about a set $M \subset \mathcal{P} \times \mathbb{R}_+$, we tacitly assume that $M_\nu \neq \emptyset$ for all sufficiently small $\nu > 0$, since otherwise several definitions and statements would be just empty.

Given $(\mu, \nu) \in \mathcal{P} \times \mathbb{R}_+$, we define the mean energy dissipation rate as

$$\epsilon = \epsilon(\mu, \nu) := \nu \cdot \mu\Big[\int_{[0,1]^d} \|Du(x)\|^2\,dx\Big].$$

Remark 1.1. If $\mu$ is a stationary measure of (1.1) and a mean energy equality (coming from the Itô formula) can be rigorously proved, one can show that $\epsilon$ does not depend on $(\mu, \nu)$.

Given $(\mu, \nu) \in \mathcal{P} \times \mathbb{R}_+$, we also define the quantity

$$\eta = \eta(\mu, \nu) := \nu^{3/4}\, \epsilon(\mu, \nu)^{-1/4}. \qquad (1.4)$$

Remark 1.2. In the case of Eq. (1.1), $\eta$ is a length scale: $\nu$ has dimension $[L]^2[T]^{-1}$, $\epsilon$ has dimension $[L]^2[T]^{-3}$, so $\eta$ has dimension $[L]$. The only combination of powers of $\nu$ and $\epsilon$ having dimension $[L]$ is the $\eta$ above. This is the simplest reason to choose $\eta$ as a length scale involved in Kolmogorov theory. More refined arguments may be found in [13] and related references.
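The uniqueness claim in Remark 1.2 amounts to solving a 2x2 linear system for the exponents. The following is a minimal sketch of that computation, not part of the paper; the function name `length_exponents` and the encoding of dimensions as (length, time) exponent pairs are illustrative assumptions.

```python
from fractions import Fraction

# Dimensions as exponent pairs (length, time); both values are those of Remark 1.2.
NU = (Fraction(2), Fraction(-1))    # viscosity: [L]^2 [T]^-1
EPS = (Fraction(2), Fraction(-3))   # mean energy dissipation rate: [L]^2 [T]^-3


def length_exponents(nu_dim=NU, eps_dim=EPS):
    """Solve nu^a * eps^b = [L]^1 [T]^0 for (a, b) by Cramer's rule (hypothetical helper)."""
    # Length exponents:  a * nu_L + b * eps_L = 1
    # Time exponents:    a * nu_T + b * eps_T = 0
    det = nu_dim[0] * eps_dim[1] - eps_dim[0] * nu_dim[1]
    if det == 0:
        raise ValueError("the two dimensions are not independent")
    a = (1 * eps_dim[1] - eps_dim[0] * 0) / det
    b = (nu_dim[0] * 0 - 1 * nu_dim[1]) / det
    return a, b


if __name__ == "__main__":
    a, b = length_exponents()
    print(f"eta = nu^({a}) * eps^({b})")
```

Since the determinant of the system is nonzero, the solution is unique, which is the content of the remark.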


In the following definition $\kappa$ is a non-negative real number.

Definition 1.3. We say that a Kolmogorov type scaling law with exponent $\frac{2}{3} + \kappa$ holds true for a set $M \subset \mathcal{P} \times \mathbb{R}_+$ if there exist $\nu_0 > 0$, $C > c > 0$, $C_0 > 0$, and a monotone function $R_0 : (0, \nu_0] \to \mathbb{R}_+$ with $R_0(\nu) > C_0$ and $\lim_{\nu \to 0} R_0(\nu) = +\infty$, such that the bound

$$c \cdot r^{2/3 + \kappa} \le S_2^\mu(r) \le C \cdot r^{2/3 + \kappa}$$

holds for every pair $(\mu, \nu) \in M$ and every $r$ such that $\nu \in (0, \nu_0]$ and $C_0 \cdot \eta(\mu, \nu) < r < \eta(\mu, \nu) \cdot R_0(\nu)$, where $\eta(\mu, \nu)$ is defined by (1.4).

This definition corresponds to K41 theory in the case $\kappa = 0$ and to the more plausible K62 theory in the case $\kappa > 0$. In fact we should restrict our next investigation to the case $\kappa > 0$. However, since the validity of such a scaling law is still an open problem, although plausible, we find it of theoretical interest to analyse the necessary and/or sufficient conditions in the general case $\kappa \ge 0$. The previous definition is a particular case of the following notion.

Definition 1.4. We say that a scaling law with exponent $\alpha \in (0, 2)$ and length scale $\eta : M \to \mathbb{R}_+$ holds true for the structure function $S_2^\mu(r)$ on a set $M \subset \mathcal{P} \times \mathbb{R}_+$, if there exist a decreasing function $R_0 : [0, \infty) \to \mathbb{R}_+$, with $\lim_{\nu \to 0} R_0(\nu) = +\infty$, and constants $C_2 \ge C_1 > 0$, $C_3 > 0$, $\nu_0 > 0$, such that $R_0(\nu) > C_3$ and

$$C_1 \cdot r^\alpha \le S_2^\mu(r) \le C_2 \cdot r^\alpha \quad \text{for } r \in [C_3\, \eta(\mu, \nu),\ \eta(\mu, \nu) R_0(\nu)] \qquad (1.5)$$

for every $\nu \in (0, \nu_0)$ and every $\mu \in M_\nu$.

Remark 1.5. The divergent factor $R_0(\nu)$ in the previous definition is essential to have a non trivial notion. If, on the contrary, we simply ask that the scaling law holds on a bounded interval $r \in [C_3 \eta_\nu, C_4 \eta_\nu]$, we have a definition without real interest. Let us explain this fact with a (useless) definition and an example. Let us say that a family $M \subset \mathcal{P} \times \mathbb{R}_+$ satisfies a local $\alpha$ property, $\alpha < 2$, if there is a function $\eta(\mu, \nu)$ and constants $C_2 \ge C_1 > 0$, $C_4 \ge C_3 > 0$, $\nu_0 > 0$, such that

$$C_1 r^\alpha \le S_2^\mu(r) \le C_2 r^\alpha \quad \text{for } r \in [C_3\, \eta(\mu, \nu),\ C_4\, \eta(\mu, \nu)] \qquad (1.6)$$

for every $\nu \in (0, \nu_0)$ and every $\mu \in M_\nu$. As an example, consider a case in which the mapping $\nu \mapsto M_\nu$ is single valued and injective and

$$S_2^{\mu_\nu}(r) = \nu^{-1} r^2,$$

where $M_\nu = \{\mu_\nu\}$. This function $S_2^{\mu_\nu}(r)$ certainly does not have any interesting scaling exponent (different from 2) but satisfies the previous local $\alpha$ property simultaneously for a continuum of values of $\alpha$. Indeed, given any $\alpha \in (0, 2)$ take $\eta(\mu_\nu, \nu) = \nu^{\frac{1}{2 - \alpha}}$; then given a choice of $C_4 \ge C_3 > 0$, for every $r \in [C_3\, \eta(\mu_\nu, \nu), C_4\, \eta(\mu_\nu, \nu)]$, namely for $\nu^{-\frac{1}{2 - \alpha}} r \in [C_3, C_4]$, we have

$$S_2^{\mu_\nu}(r) = \big(\nu^{-\frac{1}{2 - \alpha}} r\big)^{2 - \alpha} r^\alpha \in [C_1, C_2] \cdot r^\alpha$$

with $C_1 = C_3^{2 - \alpha}$, $C_2 = C_4^{2 - \alpha}$. This example shows that the local $\alpha$ property is not a distinguished scaling property.
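The example of Remark 1.5 can be checked numerically. The sketch below is an illustration only: the helper names `s2_toy` and `local_ratio_range`, the sample values of $\nu$ and the window constants are assumptions made here. It evaluates $S_2(r)/r^\alpha$ on the window $[C_3\eta, C_4\eta]$ with $\eta = \nu^{1/(2-\alpha)}$ and reports its range.

```python
def s2_toy(r, nu):
    """Toy structure function of Remark 1.5: S_2(r) = r^2 / nu."""
    return r * r / nu


def local_ratio_range(alpha, nu, c3, c4, samples=50):
    """Min and max of S_2(r) / r^alpha for r in [c3*eta, c4*eta], eta = nu^(1/(2-alpha))."""
    eta = nu ** (1.0 / (2.0 - alpha))
    ratios = []
    for i in range(samples + 1):
        r = eta * (c3 + (c4 - c3) * i / samples)
        ratios.append(s2_toy(r, nu) / r ** alpha)
    return min(ratios), max(ratios)


if __name__ == "__main__":
    c3, c4 = 0.5, 3.0  # arbitrary window constants chosen for the illustration
    for alpha in (0.5, 2.0 / 3.0, 1.5):
        for nu in (1e-2, 1e-4, 1e-6):
            lo, hi = local_ratio_range(alpha, nu, c3, c4)
            print(alpha, nu, lo, hi, c3 ** (2 - alpha), c4 ** (2 - alpha))
```

For each $\alpha$ the printed range does not depend on $\nu$ and matches $[C_3^{2-\alpha}, C_4^{2-\alpha}]$, so the same function passes the local test for every exponent.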


2. Necessary Conditions for K41

2.1. General results. The results of this subsection apply to suitable families of probability measures, without any use of the Navier-Stokes equations. They will then be applied to the stochastic Navier-Stokes equations in the next subsection.

Given a measure $\mu \in \mathcal{P}$, $\mu \neq \delta_0$, we introduce the number $\theta(\mu)$ defined by the identity

$$\theta(\mu)^2 = \frac{\mu\big[\int_{[0,1]^d} \|Du(x)\|^2\,dx\big]}{\mu\big[\int_{[0,1]^d} \|D^2 u(x)\|^2\,dx\big]}, \qquad (2.1)$$

letting $\theta(\mu) = 0$ when $\mu[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx] = \infty$. If $\mu = \delta_0$, the numerator and denominator vanish and we arbitrarily define $\theta(\mu) = 1$. We have $\theta(\mu) \le C$, where the constant is universal and depends only on the Poincaré constant of the torus. By definition, we have

$$\theta(\mu)^2 = \frac{\epsilon(\mu, \nu)}{\nu \cdot \mu\big[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx\big]}$$

for every pair $(\mu, \nu) \in \mathcal{P} \times \mathbb{R}_+$. It follows from trivial dimensional analysis that $\theta$ has the dimension of a length. We interpret it as an estimate of the length scale where dissipation is more relevant. Indeed, very roughly, from

$$\frac{\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx}{\int_{\mathcal{T}} \|Du(x)\|^2\,dx} \sim \frac{\sum_k |k|^2 \big(|k|^2 |\hat u(k)|^2\big)}{\sum_k |k|^2 |\hat u(k)|^2}$$

we see that $\theta(\mu)^{-2}$ has the meaning of a typical squared wave number of dissipation (looking at $|k|^2 |\hat u(k)|^2$ as a sort of distribution in wave space of the dissipation).

Lemma 2.1. For every $\mu \in \mathcal{P}$ such that $\theta(\mu) > 0$ we have

$$\frac{1}{4d} \cdot r^2 \le \frac{S_2^\mu(r)}{\mu\big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\big]} \le r^2 \qquad (2.2)$$

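for every $r \in (0, \theta(\mu)/4d]$. The upper bound is true for every $r > 0$ even if $\theta(\mu) = 0$.

Proof. Since we want to use the Taylor formula for elements of $D(A)$, we use the mollification described in Appendix A. We denote by $\mu_\varepsilon$ the mollifications of $\mu$. We prove in Appendix A that, for given $r$ and $\mu$,

$$\lim_{\varepsilon \to 0} \mu_\varepsilon\big[\|Du(0)\|^2\big] = \mu\Big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\Big],$$

$$\lim_{\varepsilon \to 0} \mu_\varepsilon\big[\|D^2 u(0)\|^2\big] = \mu\Big[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx\Big],$$

$$\lim_{\varepsilon \to 0} \mu_\varepsilon\big[\|u(re) - u(0)\|^2\big] = \mu\big[\|u(re) - u(0)\|^2\big].$$

By space homogeneity of $\mu_\varepsilon$,

$$\mu_\varepsilon\big[\|u(re) - u(0)\|^2\big] \le r^2 \int_0^1 \mu_\varepsilon\big[\|Du(\sigma e)\|^2\big]\,d\sigma = r^2 \mu_\varepsilon\big[\|Du(0)\|^2\big]$$

and thus, by the previous convergence results,

$$\mu\big[\|u(re) - u(0)\|^2\big] \le r^2 \mu\Big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\Big].$$

This implies the right-hand inequality of (2.2) for every $r > 0$. On the other hand, for smooth vector fields we have

$$u(re) - u(0) = Du(0)\,re + r^2 \int_0^1 D^2 u(\sigma e)(e, e)\,d\sigma$$

and thus

$$\mu_\varepsilon\big[\|Du(0)\,re\|^2\big] \le 2 \mu_\varepsilon\big[\|u(re) - u(0)\|^2\big] + 2 \mu_\varepsilon\Big[\Big\|r^2 \int_0^1 D^2 u(\sigma e)(e, e)\,d\sigma\Big\|^2\Big].$$

Again from space homogeneity of $\mu_\varepsilon$,

$$\mu_\varepsilon\Big[\Big\|r^2 \int_0^1 D^2 u(\sigma e)(e, e)\,d\sigma\Big\|^2\Big] \le r^4 \mu_\varepsilon\big[\|D^2 u(0)\|^2\big],$$

and from Lemma A.3 of Appendix A,

$$\mu_\varepsilon\big[\|Du(0)e\|^2\big] = d^{-1} \mu_\varepsilon\big[\|Du(0)\|^2\big].$$

Therefore

$$\mu_\varepsilon\big[\|u(re) - u(0)\|^2\big] \ge \frac{r^2}{2d} \mu_\varepsilon\big[\|Du(0)\|^2\big] - r^4 \mu_\varepsilon\big[\|D^2 u(0)\|^2\big].$$

We thus have in the limit

$$S_2^\mu(r) \ge \frac{r^2}{2d} \mu\Big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\Big] - r^4 \mu\Big[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx\Big]$$

and therefore, by definition of $\theta(\mu)$,

$$S_2^\mu(r) \ge \Big(\frac{1}{2d} - \frac{r^2}{\theta(\mu)^2}\Big)\, \mu\Big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\Big] \cdot r^2.$$

This implies the left-hand inequality of (2.2) for $r \in (0, \theta(\mu)/4d]$. The proof is complete.

Theorem 2.2. Assume a scaling law with exponent $\alpha \in (0, 2)$ and length scale $\eta : M \to \mathbb{R}_+$ holds true for the structure function $S_2^\mu(r)$ on a set $M \subset \mathcal{P} \times \mathbb{R}_+$. Let $\theta(\mu)$ be the dissipation length scale defined above. Then the two length scales $\theta(\mu)$ and $\eta(\mu, \nu)$ are related by the property

$$\limsup_{\nu \to 0}\ \sup_{\mu \in M_\nu} \frac{\theta(\mu)}{\eta(\mu, \nu)} < \infty. \qquad (2.3)$$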


Proof. It is intuitively rather clear that (2.2) is in contradiction with (1.5) if the ranges of $r$ where the two properties hold overlap, so we need the bound (2.3). The proof below confirms this intuition by ruling out the possibility that the factor $\mu[\int_{\mathcal{T}} \|Du(x)\|^2\,dx]$ may produce a compensation. Moreover, let us notice that one could believe that the proof of (2.3) is trivial. But Remark 1.5 above shows that (2.2) and (1.6) are compatible: thus we feel that a detailed proof of (2.3) is necessary.

We argue by contradiction and assume that there exists a sequence $(\mu_n, \nu_n) \in M$, with $\nu_n \to 0$, such that

$$\lim_{n \to \infty} \frac{\theta(\mu_n)}{\eta(\mu_n, \nu_n)} = +\infty. \qquad (2.4)$$

Notice that, in such a case, $\theta(\mu_n)$ must be positive, so Lemma 2.1 applies. Let us consider two sequences $r_n'$ and $r_n''$ defined as follows: $r_n' = C_3\, \eta(\mu_n, \nu_n)$, $r_n'' = r_n' a_n$ with $\lim_{n \to \infty} a_n = +\infty$, $r_n'' \le \eta(\mu_n, \nu_n) R_0(\nu_n)$ and $r_n'' \le \theta(\mu_n)/(4d)$, where we ask that the last two inequalities are satisfied at least eventually. Such a sequence $r_n''$ exists because $\lim_{\nu \to 0} R_0(\nu) = +\infty$ and (2.4) is assumed. We have (eventually) that $r_n'$, $r_n''$ are both in $[C_3\, \eta(\mu_n, \nu_n), \eta(\mu_n, \nu_n) R_0(\nu_n)]$ and $(0, \theta(\mu_n)/4d]$, hence for both $r_n := r_n'$ and $r_n := r_n''$ we have

$$\frac{1}{4d} \beta_n r_n^2 \le S_2^{\mu_n}(r_n) \le \beta_n r_n^2, \qquad C_1 r_n^\alpha \le S_2^{\mu_n}(r_n) \le C_2 r_n^\alpha,$$

where we have set $\beta_n = \mu_n[\int_{\mathcal{T}} \|Du(x)\|^2\,dx]$. The contradiction will come from the fact that, if it could happen that $\beta_n$ adjusts the factor $r_n^2$ to produce $r_n^\alpha$, this cannot happen simultaneously for the two sequences $r_n = r_n'$ and $r_n = r_n''$. Indeed, from the previous inequalities we must have

$$C_1 r_n^\alpha \le \beta_n r_n^2, \qquad \beta_n r_n^2 \le 4d\, C_2 r_n^\alpha,$$

hence $\beta_n \ge C_1 r_n^{\alpha - 2}$, $\beta_n \le 4d\, C_2 r_n^{\alpha - 2}$ for both $r_n = r_n'$ and $r_n = r_n''$. But the inequalities

$$\beta_n \ge C_1 (r_n')^{\alpha - 2}, \qquad \beta_n \le 4d\, C_2 (r_n'')^{\alpha - 2}$$

and the assumption $\alpha < 2$ imply $r_n' \ge C r_n''$ eventually, for a suitable constant $C > 0$. This is impossible since $\lim_{n \to \infty} a_n = +\infty$. The proof is complete.

Example 2.3. Let us give an example of a function of $(\nu, r)$ which satisfies the properties of Definition 1.4 and also (2.2) (to see that they are compatible). It may look artificial, but it was devised on the basis of the vortex model of [10]. The function is

$$S_2^{\mu_\nu}(r) = \int_\eta^1 l^{2/3} \Big(\frac{l \wedge r}{l}\Big)^2 \frac{dl}{l}$$

with $\eta = \nu^{3/4}$. We have

$$r \le \eta \ \Rightarrow\ S_2^{\mu_\nu}(r) = \int_\eta^1 l^{2/3}\, \frac{r^2}{l^2}\, \frac{dl}{l} = \frac{3}{4} r^2 \big(\nu^{-1} - 1\big),$$

which is essentially the behaviour (2.2). On the other hand, if $r \in [\eta, 1]$, then

$$S_2^{\mu_\nu}(r) = \int_\eta^r l^{2/3}\, \frac{dl}{l} + \int_r^1 l^{2/3}\, \frac{r^2}{l^2}\, \frac{dl}{l} = \frac{9}{4} r^{2/3} - \frac{3}{2} \nu^{1/2} - \frac{3}{4} r^2,$$

which is bounded above and below by the order $r^{2/3}$ since $r \in [\nu^{3/4}, 1]$ ($\nu^{1/2} \le r^{2/3}$).

Let us finally state two general consequences of the previous theorem, that we shall apply to the stochastic Navier-Stokes equations.

Corollary 2.4. Given a family $M \subset \mathcal{P} \times \mathbb{R}_+$, if $\inf_{(\mu, \nu) \in M} \theta(\mu) > 0$, then no scaling law with exponent $\alpha \in (0, 2)$ may hold true with a length scale $\eta(\mu, \nu)$ such that $\liminf_{\nu \to 0} \inf_{\mu \in M_\nu} \eta(\mu, \nu) = 0$.

We shall see that this simple corollary applies to the 2D stochastic Navier-Stokes equation and the Stokes problem, so a Kolmogorov type scaling law is ruled out for these systems.

Let us apply the theorem to the case of a Kolmogorov type scaling law. We take, in the previous theorem, $\eta(\mu, \nu) = \nu^{3/4}\, \epsilon(\mu, \nu)^{-1/4}$ as in Definition 1.3. In the following result, $\mu[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx]$ may be infinite. In fact, in the next corollary we only use the property of $\eta(\mu, \nu)$ and not the scaling exponent $\frac{2}{3} + \kappa$.

Corollary 2.5. Let $M \subset \mathcal{P} \times \mathbb{R}_+$ be a family having a scaling law with the exponent $\alpha \in (0, 2)$ and the length scale $\eta(\mu, \nu)$ of Definition 1.3. Then there exist $\nu_0 > 0$ and $C > 0$ such that

$$\mu\Big[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx\Big] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}$$

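for every $\nu \in (0, \nu_0)$ and every $\mu \in M_\nu$.

Proof. From (2.3), the definition of $\eta(\mu, \nu)$ and the definition of $\theta^2(\mu)$ we have

$$\limsup_{\nu \to 0}\ \sup_{\mu \in M_\nu} \frac{\mu\big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\big]}{\nu^{3/2}\, \epsilon(\mu, \nu)^{-1/2}\, \mu\big[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx\big]} < \infty.$$

Thus, from the definition of $\epsilon(\mu, \nu)$,

$$\limsup_{\nu \to 0}\ \sup_{\mu \in M_\nu} \frac{\nu^{-5/2}\, \epsilon(\mu, \nu)^{3/2}}{\mu\big[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx\big]} < \infty.$$

This implies the claim of the corollary.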
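The closed forms stated in Example 2.3 can be compared with a direct quadrature of the defining integral. The sketch below is only a numerical sanity check under stated assumptions: the function names, the midpoint rule in the variable $t = \log l$, and the sample value of $\nu$ are choices made here, not taken from the paper.

```python
import math


def s2_closed_form(r, nu):
    """Closed form of the Example 2.3 integral with eta = nu^(3/4), for 0 < r <= 1."""
    eta = nu ** 0.75
    if r <= eta:
        return 0.75 * r * r * (1.0 / nu - 1.0)
    return 2.25 * r ** (2.0 / 3.0) - 1.5 * math.sqrt(nu) - 0.75 * r * r


def s2_quadrature(r, nu, steps=20000):
    """Midpoint rule for int_eta^1 l^(2/3) * (min(l, r)/l)^2 dl/l, using t = log(l)."""
    eta = nu ** 0.75
    t0 = math.log(eta)          # integration runs over t in [log(eta), 0]
    h = -t0 / steps
    total = 0.0
    for i in range(steps):
        l = math.exp(t0 + (i + 0.5) * h)
        total += l ** (2.0 / 3.0) * (min(l, r) / l) ** 2
    return total * h


if __name__ == "__main__":
    nu = 1e-4  # illustrative viscosity, so that eta = 1e-3
    for r in (1e-4, 1e-3, 1e-2, 1e-1, 1.0):
        exact = s2_closed_form(r, nu)
        print(r, exact, s2_quadrature(r, nu), exact / r ** (2.0 / 3.0))
```

The last printed column is the ratio $S_2(r)/r^{2/3}$, which should stay between fixed positive constants for $r \in [\eta, 1]$ and collapse like $r^{4/3}/\nu$ below $\eta$.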

Remark 2.6. Dimensional analysis says that $\nu$ has dimension $[L]^2[T]^{-1}$, $\epsilon$ has dimension $[L]^2[T]^{-3}$, so $\epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}$ has dimension $[L]^{-2}[T]^{-2}$, the correct dimension of $\mu[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx]$.

2.2. Application to stochastic Navier-Stokes equations. In this section we consider Eq. (1.1) in dimension 2 and 3 and also the corresponding linear equations (Stokes equations).


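2.2.1. The noise. Since we are dealing with spaces of translation invariant measures, we wish to consider classes of noises that produce such measures. Every Gaussian translation invariant noise is 'diagonal' with respect to the Stokes operator $A$ in the sense that eigenmodes are all independent. In order to give a rigorous definition for our driving noise, we define $\Lambda^{(\infty)} := \{k \in 2\pi \mathbb{Z}^d : |k| > 0\}$ and we assume that the noise of Eq. (1.1) has the form $\sum_{k \in \Lambda^{(\infty)}} \sigma_k \dot\beta_k(t) e^{-ik \cdot x}$, where $(\beta_k)_{k \in \Lambda^{(\infty)}}$ are independent $d$-dimensional Brownian motions and the coefficients $(\sigma_k)_{k \in \Lambda^{(\infty)}}$ are $d \times d$ complex-valued matrices such that $k \cdot \sigma_k = 0$ and $\sum_{k \in \Lambda^{(\infty)}} |\sigma_k|^2 < \infty$. Additional assumptions are: we assume that $\sigma_k = \sigma_{-k}$ for every $k \in \Lambda^{(\infty)}$, and $|\sigma_k| = |\sigma_{Rk}|$ for all $k \in \Lambda^{(\infty)}$ and for every coordinate rotation $R$. Together, they imply that the vector-valued random field $W(t, x) = \sum_{k \in \Lambda^{(\infty)}} \sigma_k \beta_k(t) e^{-ik \cdot x}$ is, for every $t \ge 0$, real and partially isotropic. Finally, in order to have measures with $\mu(D(A)) = 1$ we assume that

$$\sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2 < \infty,$$

since the values $|k|^2$ correspond to the eigenvalues of $A$.

2.2.2. The two-dimensional case. We assume $d = 2$. The following result is well known.

Lemma 2.7. Let $\mu$ be an invariant measure of (1.1) such that

$$\mu\Big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\Big] < \infty.$$

Then $\mu \in \mathcal{P}_0$ and

$$\nu \cdot \mu\Big[\int_{\mathcal{T}} \|Du(x)\|^2\,dx\Big] = \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |\sigma_k|^2,$$

$$\nu \cdot \mu\Big[\int_{\mathcal{T}} \|D \operatorname{curl} u(x)\|^2\,dx\Big] = \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2.$$

Proof. Given $\mu$, consider the (product) filtered probability space $(\Omega, \mathcal{A}, (\mathcal{A}_t)_{t \ge 0}, P)$ supporting both a family of independent $d$-dimensional Brownian motions $\beta_k(t)$, $k \in \Lambda^{(\infty)}$, and a non anticipating ($\mathcal{A}_0$-measurable) random variable $u_0$ with law $\mu$. The corresponding strong solution $u(t, x)$ of (1.1) is a stationary process and satisfies, due to the Itô formula, the balance relations

$$\frac{1}{2} \mathbb{E}^P\Big[\int_{\mathcal{T}} \|u(t, x)\|^2\,dx\Big] + \nu\, \mathbb{E}^P\Big[\int_0^t \int_{\mathcal{T}} \|Du(s, x)\|^2\,dx\,ds\Big] = \frac{1}{2} \mathbb{E}^P\Big[\int_{\mathcal{T}} \|u_0(x)\|^2\,dx\Big] + \frac{1}{2}\, t \sum_{k \in \Lambda^{(\infty)}} |\sigma_k|^2$$

and

$$\frac{1}{2} \mathbb{E}^P\Big[\int_{\mathcal{T}} \|\operatorname{curl} u(t, x)\|^2\,dx\Big] + \nu\, \mathbb{E}^P\Big[\int_0^t \int_{\mathcal{T}} \|D \operatorname{curl} u(s, x)\|^2\,dx\,ds\Big] = \frac{1}{2} \mathbb{E}^P\Big[\int_{\mathcal{T}} \|\operatorname{curl} u_0(x)\|^2\,dx\Big] + \frac{1}{2}\, t \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2.$$

The result easily follows from stationarity.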
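To see why identities of this form pin the dissipation length scale, one can look at a toy linear model in which each mode is an independent scalar Ornstein-Uhlenbeck process $dX_k = -\nu \lambda_k X_k\,dt + s_k\,dW_k$ with $\lambda_k = |k|^2$, whose stationary variance is $s_k^2 / (2\nu\lambda_k)$. The sketch below is an illustration under these simplifying assumptions: scalar real modes replace the matrix-valued $\sigma_k$, the mode set and amplitude profile are hypothetical, and the function names are invented here.

```python
import math


def stationary_variances(nu, lambdas, amplitudes):
    """Stationary variance s^2 / (2 * nu * lam) of dX = -nu*lam*X dt + s dW, per mode."""
    return [s * s / (2.0 * nu * lam) for lam, s in zip(lambdas, amplitudes)]


def dissipation_balance(nu, lambdas, amplitudes):
    """Return (nu * sum lam*v, nu * sum lam^2*v, theta^2) for the toy model."""
    v = stationary_variances(nu, lambdas, amplitudes)
    grad = sum(lam * vk for lam, vk in zip(lambdas, v))        # stands in for mu[int |Du|^2]
    hess = sum(lam * lam * vk for lam, vk in zip(lambdas, v))  # stands in for mu[int |D^2 u|^2]
    return nu * grad, nu * hess, grad / hess


if __name__ == "__main__":
    # Hypothetical one-dimensional mode set k = 2*pi*j with a decaying amplitude profile.
    lambdas = [(2.0 * math.pi * j) ** 2 for j in range(1, 50)]
    amplitudes = [j ** -3.0 for j in range(1, 50)]
    for nu in (1e-1, 1e-3, 1e-5):
        print(nu, dissipation_balance(nu, lambdas, amplitudes))
```

In this toy setting the first two outputs equal $\frac{1}{2}\sum s_k^2$ and $\frac{1}{2}\sum \lambda_k s_k^2$ for every $\nu$, so the ratio playing the role of $\theta^2$ does not shrink as $\nu \to 0$.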


Corollary 2.8. There exists a positive constant $\theta_0$, independent of $\nu$, such that $\theta(\mu) \ge \theta_0$ for every invariant measure $\mu \in \mathcal{P}$ of (1.1).

Proof. The property $\theta(\mu) \ge \theta_0$ follows from the definition of $\theta(\mu)$ and the two identities of the previous lemma, since $\int_{\mathcal{T}} \|D^2 u(x)\|^2 \le C \int_{\mathcal{T}} \|D \operatorname{curl} u(x)\|^2$ for a universal constant $C > 0$.

In the next theorem, when we say that $M \subset \mathcal{P} \times \mathbb{R}_+$ is a family of invariant measures of (1.1), we clearly understand that each element $(\mu, \nu) \in M$ has the property that $\mu$ is an invariant measure for the Markov semigroup associated to Eq. (1.1) with viscosity equal to $\nu$.

Theorem 2.9. In dimension $d = 2$, a family of invariant measures $M \subset \mathcal{P} \times \mathbb{R}_+$ of (1.1) cannot have any scaling law with exponent $\alpha \in (0, 2)$.

Remark 2.10. Under our assumptions on the noise, invariant measures of (1.1) that belong to $\mathcal{P}$ certainly exist. In principle there could exist invariant measures for (1.1) not belonging to $\mathcal{P}$, but this has recently been excluded under very weak conditions on the driving noise (see [14] and the references therein).

Remark 2.11. Consider Eq. (1.1) without the nonlinear term (called Stokes equations):

$$\frac{\partial u}{\partial t} + \nabla p = \nu \Delta u + \sum_{k \in \Lambda^{(\infty)}} \sigma_k \dot\beta_k(t) e^{-ik \cdot x}$$

in dimension $d = 2, 3$. Let $M \subset \mathcal{P} \times \mathbb{R}_+$ be a family of invariant measures for it. Then the same results of the previous theorem hold true. The proof is the same. Alternatively, one may work componentwise in the Fourier modes and prove easily the claims.

2.2.3. The three-dimensional case. The lack of knowledge about the well posedness of the 3D stochastic Navier-Stokes equations has, among its consequences, the absence of the Markov property, and therefore of the usual notion of invariant measure. One may introduce several variants. Here we adopt the following concept. Consider the usual Galerkin approximations, recalled in Appendix B. The equation with generic index $n$ in this scheme defines a Markov process, with the Feller property, and has invariant measures, by the classical Krylov-Bogoliubov method: if $X_n^x(t)$ is its solution starting from $x$ and $\nu_t^{n,x}$ is the law of $X_n^x(t)$ on $H$, by the Itô formula it is easy to get a bound of the form (see for instance [9])

$$\sup_{T \ge 0} \frac{1}{T} \int_0^T \mathbb{E}\, \|X_n^x(t)\|_V^2\,dt \le C < \infty,$$

which implies ([4] have been the first ones to use this elegant fast method) the necessary tightness in $T$ of the time averaged measures

$$\mu_T^{n,x} := \frac{1}{T} \int_0^T \nu_t^{n,x}\,dt.$$

If we choose the initial condition $x = 0$, then $\mu_T^{n,x} \in \mathcal{P}$ (in particular it is space homogeneous and partially isotropic), so there exist invariant measures in $\mathcal{P}$ for the Galerkin equation. Denote by $S^n$ the set of all such invariant measures (thus $S^n \subset \mathcal{P}$).


The constant $C$ in the estimate above is also independent of $n$; it follows that the invariant measures of the class $S^n$ just constructed fulfill the bound $\mu_n[\|\cdot\|_V^2] \le C$. In fact it is possible to show that every element of $S^n$ has this property, [8] (if we do not want to use this property, it is sufficient to restrict the definition of $S^n$ in the sequel). These facts imply that $\cup_n S^n$ is relatively compact in the weak topology of probability measures on $H$. We denote by $\mathcal{P}_{NS}^G(\nu)$ (the superscript $G$ will remind us that we use the particular procedure of Galerkin approximations) the set of limit points of $\cup_n S^n$, precisely defined as follows: a probability measure $\mu$ on $H$ belongs to $\mathcal{P}_{NS}^G(\nu)$ if there is a sequence $k_n \to \infty$ and elements $\mu_{k_n} \in S^{k_n}$ such that $\mu_{k_n}$ converges to $\mu$ in the weak topology of probability measures on $H$. The elements of the set $\mathcal{P}_{NS}^G(\nu)$ are space homogeneous and partially isotropic (these relations are stable under weak convergence). Furthermore, they have the other regularity properties required to belong to $\mathcal{P}$: finite second moment in $V$ comes from the previous estimates, $\mu(D(A)) = 1$ from a regularity result of [6], see also [8], summarized in the following lemma. Therefore $\mathcal{P}_{NS}^G(\nu) \subset \mathcal{P}$.

Lemma 2.12. Given $\nu > 0$, there is a constant $C_\nu > 0$ (depending on $\nu$) such that $\mu_n\big[|A\,\cdot|_H^{2/3}\big] \le C_\nu$ for every $n$ and every invariant measure $\mu_n \in S^n$.

Given $u \in V$, let $S_u$ be the tensor with $L^2(\mathcal{T})$ components

$$S_u = \frac{1}{2}\big(Du + Du^T\big)$$

(called the stress tensor). The scalar field $\langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle$ describes the stretching of the vorticity field. If we set $\xi = \operatorname{curl} u$, then formally we have

$$\partial_t \xi + (u \cdot \nabla)\xi = \nu \Delta \xi + S_u \xi + i \sum_{k \in \Lambda^{(\infty)}} k \times \sigma_k \dot\beta_k e^{-ik \cdot x}.$$

A formal application of the Itô formula yields the inequality

$$\nu \cdot \mu\Big[\int_{\mathcal{T}} \|D \operatorname{curl} u(x)\|^2\Big] \le \mu\Big[\int_{\mathcal{T}} \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\Big] + \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2$$

for $\mu \in \mathcal{P}_{NS}^G(\nu)$ (in fact formally the identity). Along with the general results of the previous sections we would get

$$\mu\Big[\int_{\mathcal{T}} \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\,dx\Big] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-3/2}. \qquad (2.5)$$

This would be the final result of this section, having an interesting physical interpretation. However we are not able to prove it in this form. We analyze the status of this inequality by presenting some related rigorous results. They are of two different natures: Corollary 2.14 reformulates it for the coarse graining scheme given by Galerkin approximations; Corollary 2.18 expresses the most natural statement directly for $\mu \in \mathcal{P}_{NS}^G(\nu)$ but it requires an additional unproved regularity assumption.


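Lemma 2.13. Given $\mu \in \mathcal{P}_{NS}^G(\nu)$, and $\mu_{n_k} \in S^{n_k}$ such that $\mu_{n_k}$ converges to $\mu$ in the weak topology of probability measures on $H$, then

$$\mu\big[|A\,\cdot|_H^2\big] \le \liminf_{k \to \infty} \mu_{n_k}\big[|A\,\cdot|_H^2\big].$$

The same is true for $\mu[\int_{\mathcal{T}} \|D \operatorname{curl} u(x)\|^2\,dx]$ in place of $\mu[|A\,\cdot|_H^2]$.

Proof. Let $\{\varphi_m\}_{m \in \mathbb{N}} \subset C_b(H)$ be a sequence that converges monotonically increasing to $|A\,\cdot|_H^2$ for every $x \in D(A)$ (it is easy to construct it by cut-off and finite dimensional approximations). Since $\mu(D(A)) = 1$, by the Beppo Levi theorem $\mu[\varphi_m] \to \mu[|A\,\cdot|_H^2]$. Given $\varepsilon > 0$, let $m_0$ be such that $\mu[\varphi_{m_0}] \ge \mu[|A\,\cdot|_H^2] - \varepsilon$. Since $\mu_{n_k}[\varphi_{m_0}] \to \mu[\varphi_{m_0}]$ as $k \to \infty$, eventually in $k$ we thus have $\mu_{n_k}[\varphi_{m_0}] \ge \mu[|A\,\cdot|_H^2] - 2\varepsilon$, and therefore also $\mu_{n_k}[|A\,\cdot|_H^2] \ge \mu[|A\,\cdot|_H^2] - 2\varepsilon$. This proves the first part of the lemma; the second one is similar.

Corollary 2.14. Let $M \subset \mathcal{P} \times \mathbb{R}_+$, with $M_\nu \subset \mathcal{P}_{NS}^G(\nu)$, be a family with the K41 scaling law, in the sense of Definition 1.4. Then there exist $\nu_0 > 0$ and $C > 0$ such that

$$\liminf_{k \to \infty} \mu_{n_k}\Big[\int_{\mathcal{T}} \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\,dx\Big] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-3/2}$$

for every $\nu \in (0, \nu_0)$, every $\mu \in M_\nu$ and every sequence $\mu_{n_k} \in S^{n_k}$ such that $\mu_{n_k}$ converges to $\mu$ in the weak topology of probability measures on $H$.

Proof. From the previous section we know that

$$\mu\Big[\int_{\mathcal{T}} \|D^2 u(x)\|^2\,dx\Big] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}.$$

Since

$$\langle A f, g \rangle_H = \langle \operatorname{curl} f, \operatorname{curl} g \rangle_H \qquad (2.6)$$

for every $f, g \in D(A)$, we have

$$\mu\Big[\int_{\mathcal{T}} \|D \operatorname{curl} u(x)\|^2\,dx\Big] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}$$

for a suitable universal constant $C > 0$. From the previous lemma we have

$$\liminf_{k \to \infty} \mu_{n_k}\Big[\int_{\mathcal{T}} \|D \operatorname{curl} u(x)\|^2\,dx\Big] \ge C\, \epsilon^{3/2}(\mu, \nu) \cdot \nu^{-5/2}.$$

Thus the claim of the corollary will follow from the inequality

$$\nu \cdot \mu_{n_k}\Big[\int_{\mathcal{T}} \|D \operatorname{curl} u(x)\|^2\,dx\Big] \le \mu_{n_k}\Big[\int_{\mathcal{T}} \langle S_u(x) \operatorname{curl} u(x), \operatorname{curl} u(x) \rangle\,dx\Big] + \frac{1}{2} \sum_{k \in \Lambda^{(\infty)}} |k|^2 |\sigma_k|^2. \qquad (2.7)$$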


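Let us sketch the proof of this inequality (see [8] for more details). Consider the Galerkin approximations

$$du^{(n)} + \big[\nu A u^{(n)} + \pi^{(n)} B(u^{(n)}, u^{(n)})\big]\,dt = \sum_{k\in\Lambda^{(n)}} \sigma_k\,d\beta_k\,e^{-ik\cdot x}$$

described in Appendix B. From the Itô formula for $\langle Au^{(n)}(t), u^{(n)}(t)\rangle_H$ we get

$$\langle Au^{(n)}(t), u^{(n)}(t)\rangle_H + 2\int_0^t \big\langle Au^{(n)}, \nu Au^{(n)} + \pi^{(n)}B(u^{(n)},u^{(n)})\big\rangle_H\,ds = \langle Au^{(n)}(0), u^{(n)}(0)\rangle_H + M_t^n + \frac12 \sum_{k\in\Lambda^{(n)}} |k|^2|\sigma_k|^2,$$

where $M_t^n$ is a square integrable martingale. We have

$$\langle Au^{(n)}, \pi^{(n)}B(u^{(n)},u^{(n)})\rangle_H = \langle Au^{(n)}, B(u^{(n)},u^{(n)})\rangle_H,$$

since $\pi^{(n)}$ is selfadjoint and commutes with $A$. Besides (2.6) we also have

$$\langle Af, B(g,g)\rangle_H = \langle \mathrm{curl}\,f, (g\cdot\nabla)\,\mathrm{curl}\,g + S_g\,\mathrm{curl}\,g\rangle_H,$$

hence $\langle Af, B(f,f)\rangle_H = \langle \mathrm{curl}\,f, S_f\,\mathrm{curl}\,f\rangle_H$ for every $f, g \in D(A)$. Therefore we have

$$|\mathrm{curl}\,u^{(n)}(t)|_H^2 + \int_0^t \big(2\nu|D\,\mathrm{curl}\,u^{(n)}|_H^2 + \langle \mathrm{curl}\,u^{(n)}, S_{u^{(n)}}\,\mathrm{curl}\,u^{(n)}\rangle_H\big)\,ds \le |\mathrm{curl}\,u^{(n)}(0)|_H^2 + M_t^n + \frac12 \sum_{k\in\Lambda^{(\infty)}} |k|^2|\sigma_k|^2.$$

This implies (2.7) and the proof is complete.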

Remark 2.15. We cannot conclude (2.5) from the previous corollary without further (unproved) assumptions on $\mu$ or $\{\mu_{n_k}\}$. This could be just a technical point due to the present lack of better regularity estimates for the 3D Navier-Stokes equations, or it could be a facet of a deeper phenomenon. Let us explain it with a cartoon argument. First recall that it is easy to construct, say on the torus $T$, a sequence $\{f_n\}$ of functions converging a.s. to zero, but with $\int_T f_n\,dx = 1$ (or even $\int_T f_n\,dx \to \infty$): just take the mollifiers of a Dirac delta distribution; if we like, the example can be modified so that the $f_n$ tend to develop singularities on a dense zero measure set in $T$, but the a.s. limit is still zero. Thus we see that for the limit measure $\mu$ we could have a small value of $\mu[\int_T \langle S_u(x)\,\mathrm{curl}\,u(x), \mathrm{curl}\,u(x)\rangle\,dx]$ even if some coarse graining procedure, here represented by the Galerkin approximations, could give us a large value of $\mu_{n_k}[\int_T \langle S_u(x)\,\mathrm{curl}\,u(x), \mathrm{curl}\,u(x)\rangle\,dx]$. Such arguments raise the question of the physical meaning of the true Navier-Stokes equations and possibly of their coarse graining approximations; this is not our aim, but we wanted to say that the previous corollary may perhaps be considered a result of possible physical interest in itself, even if we cannot rewrite it in the form (2.5).
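The cartoon argument of Remark 2.15 can be illustrated numerically. The following is a minimal sketch, assuming box functions on the torus [0, 1) as a stand-in for the mollifiers of the remark; the names `box` and `riemann_integral` are illustrative and do not come from the paper.

```python
def box(n, x):
    """Box function on the torus [0, 1): height n on (0, 1/n), zero elsewhere."""
    return float(n) if 0.0 < x < 1.0 / n else 0.0


def riemann_integral(n, m=100_000):
    """Midpoint Riemann sum of box(n, .) over [0, 1) using m cells."""
    return sum(box(n, (j + 0.5) / m) for j in range(m)) / m


x0 = 0.3  # an arbitrary fixed point of the torus
for n in (10, 100, 1000):
    # The integral stays close to 1 while the value at the fixed point x0 vanishes.
    print(n, riemann_integral(n), box(n, x0))
```

The mass of each function concentrates near the origin, so pointwise convergence to zero gives no control on the integrals. This is the mechanism that prevents passing from the Galerkin statement to the limit measure without extra uniform integrability.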


Lemma 2.16. Given $\mu \in P^G_{NS}(\nu)$, and every sequence $\mu_{n_k} \in S^{n_k}$ such that $\mu_{n_k}$ converges to $\mu$ in the weak topology of probability measures on $H$, we also have $\mu_{n_k} \to \mu$ weakly on $[W^{1,3}(T)]^3$.

Proof. From the lemma above, $\{\mu_{n_k}\}$ is bounded in probability on $D(A)$:

$$\mu_{n_k}(|Ax|_H > R) = \mu_{n_k}\big(|Ax|_H^{2/3} > R^{2/3}\big) \le R^{-2/3}\,\mu_{n_k}\big[|A\cdot|_H^{2/3}\big] \le \frac{C}{R^{2/3}}.$$

The embedding of $D(A)$ into $[W^{1,3}(T)]^3$ is compact: recall that the Sobolev embedding theorem gives us $W^{2,2} \subset W^{\beta,\frac{6}{2\beta-1}}$ for every $\beta \in (1,2)$, and the embedding of $W^{\beta,\frac{6}{2\beta-1}}$ in $W^{1,\frac{6}{2\beta-1}}$ is compact; choose then $\beta = 3/2$. Therefore $\{\mu_{n_k}\}$ is tight in $[W^{1,3}(T)]^3$ and we deduce that it converges weakly to $\mu$ also in $[W^{1,3}(T)]^3$.

Corollary 2.17. If $\mu \in P^G_{NS}(\nu)$ is the weak limit (in $H$ and thus in $[W^{1,3}(T)]^3$) of a sequence $\mu_{n_k} \in S^{n_k}$ such that $\mu_{n_k}[\|\cdot\|_V^{2+\varepsilon}] \le C$ for some $\varepsilon, C > 0$, then

$$\nu\cdot\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] = \frac12 \sum_{k\in\Lambda^{(\infty)}} |\sigma_k|^2.$$

If in addition $\mu_{n_k}[\|\cdot\|_V^{3+\varepsilon}] \le C$, then

$$\nu\cdot\mu\Big[\int_T \|D\,\mathrm{curl}\,u(x)\|^2\,dx\Big] \le \mu\Big[\int_T \langle S_u(x)\,\mathrm{curl}\,u(x), \mathrm{curl}\,u(x)\rangle\,dx\Big] + \frac12 \sum_{k\in\Lambda^{(\infty)}} |k|^2|\sigma_k|^2.$$

Proof. It is sufficient to apply repeatedly the following fact: if $\mu_n \to \mu$ weakly in a Polish space $X$, $\varphi \in C(X)$ and $\mu_n[|\varphi|^{1+\varepsilon}] \le C$, then $\mu_n[\varphi] \to \mu[\varphi]$. This fact is well known but we provide the proof for completeness. Let $Y_n$ and $Y$ be r.v.'s with law $\mu_n$ and $\mu$ resp., with values in $X$, such that $Y_n \to Y$ a.s. in $X$ (such random variables exist by the Skorokhod representation theorem). Then $\mu_n[\varphi] = E[\varphi(Y_n)]$, $\mu[\varphi] = E[\varphi(Y)]$, so by the Vitali convergence theorem it is sufficient to prove that $\varphi(Y_n)$ is uniformly integrable. By the Hölder inequality with $p = 1+\varepsilon$ and $1/p + 1/q = 1$, followed by the Chebyshev inequality, we have

$$E[\varphi(Y_n)\,1_{\varphi(Y_n)\ge\lambda}] \le \big(E[\varphi(Y_n)^p]\big)^{1/p}\,P(\varphi(Y_n)\ge\lambda)^{1/q} \le C\lambda^{-\delta}$$

with $\delta = p/q = \varepsilon$. Thus the uniform integrability is proved and the proof is complete.

Corollary 2.18. Let $M \subset P \times \mathbb{R}_+$, with $M_\nu \subset P^G_{NS}(\nu)$, be a family with the K41 scaling law, in the sense of Definition 1.4. Assume that every $\mu$ in $M$ is the weak limit of a sequence $\mu_{n_k} \in S^{n_k}$ such that

$$\mu_{n_k}[\|\cdot\|_V^{3+\varepsilon}] \le C$$

for some $\varepsilon, C > 0$. Then there exist $\nu_0 > 0$ and $C > 0$ such that (2.5) holds for every $\nu \in (0,\nu_0)$ and every $\mu \in M_\nu$.

Remark 2.19. If the K41 scaling law holds then vortex stretching must be intense. Heuristically, no geometrical depletion of such stretching may occur (in contrast to the 2D case where the stretching term is zero because $\mathrm{curl}\,u(x)$ is aligned with the eigenvector of eigenvalue zero of $S_u(x)$): indeed, if we extrapolate the behaviour $E[|Du|^2] \sim \frac{1}{\nu}$ as $Du \sim \frac{1}{\sqrt{\nu}}$, $\mathrm{curl}\,u \sim \frac{1}{\sqrt{\nu}}$, then we get $E[S_u\,\mathrm{curl}\,u\cdot\mathrm{curl}\,u] \sim \frac{1}{\nu\sqrt{\nu}}$ if there is no help from the geometry. Another way to explain this idea is the following sort of generalised Hölder inequality.


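Corollary 2.20. Let $M \subset P \times \mathbb{R}_+$, with $M_\nu \subset P^G_{NS}(\nu)$, be a family with a scaling law in the sense of Definition 1.3, fulfilling the assumptions of Corollary 2.18. Then there exist $\nu_0 > 0$ and $C > 0$ such that

$$\Big(\mu\Big[\int_T \|Du\|^2\,dx\Big]\Big)^{1/2} \le C\,\Big(\mu\Big[\int_T S_u\,\mathrm{curl}\,u\cdot\mathrm{curl}\,u\,dx\Big]\Big)^{1/3}$$

for every $\nu \in (0,\nu_0)$ and every $\mu \in M_\nu$.

Proof. From the previous corollary and the definition of $\epsilon(\mu,\nu)$ we have

$$\Big(\mu\Big[\int_T S_u\,\mathrm{curl}\,u\cdot\mathrm{curl}\,u\,dx\Big]\Big)^{1/3} \ge \big(C\,\epsilon^{3/2}(\mu,\nu)\cdot\nu^{-3/2}\big)^{1/3} = C\,\epsilon^{1/2}(\mu,\nu)\cdot\nu^{-1/2} = C\,\Big(\mu\Big[\int_T \|Du\|^2\,dx\Big]\Big)^{1/2}$$

(with a constant $C$ that may change from one expression to the next). The proof is complete.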

3. Necessary and Sufficient Conditions for Kolmogorov Type Scaling Laws

As we said in the introduction, we advise the reader that we cannot prove or disprove a form of K62 law in dimension three. We simply restate the scaling laws of Kolmogorov type in various ways, with the hope to shed some light on them and encourage further research. We continue with the notations and concepts just introduced in the last section on the 3D case. The result of this section can be formulated for Definition 1.3, but the presence of the factor $\epsilon(\mu,\nu)^{-1/4}$ in the definition of $\eta(\mu,\nu)$ makes some statements much less direct. So, having in mind the exploratory character of these equivalent conditions, we prefer to adopt a simplified form of our definition of the Kolmogorov type scaling law.

Definition 3.1. We say that a scaling law of Kolmogorov type with exponent $\frac23 + \kappa$ holds true for a set $M \subset P \times \mathbb{R}_+$ if there exist $\nu_0 > 0$, $C > c > 0$, $C_0 > 0$, and a monotone function $R_0 : (0,\nu_0] \to \mathbb{R}_+$ with $R_0(\nu) > C_0$ and $\lim_{\nu\to 0} R_0(\nu) = +\infty$, such that the bound

$$c\cdot r^{2/3+\kappa} \le S_2^\mu(r) \le C\cdot r^{2/3+\kappa} \tag{3.1}$$

holds for every pair $(\mu,\nu) \in M$ and every $r$ such that $\nu \in (0,\nu_0]$ and $C_0\nu^{3/4} < r < \nu^{3/4}R_0(\nu)$.

Recalling that $\eta(\mu,\nu) = \nu^{3/4}\epsilon(\mu,\nu)^{-1/4}$, we see that this definition is equivalent to Definition 1.3 if there exist $\epsilon_1 > \epsilon_0 > 0$ such that $\epsilon_0 \le \epsilon(\mu,\nu) \le \epsilon_1$ for all $(\mu,\nu) \in M$. Unfortunately, in 3D only the upper bound can be proven. However, this could be just a technical problem due to the fact that we can only use weak solutions (for slightly more regular solutions Corollary 2.17 implies that $\epsilon(\mu,\nu)$ would be bounded from above and below).

Consider the auxiliary stochastic Navier-Stokes equations

$$\partial_t\tilde u(t,x) + (\tilde u(t,x)\cdot\nabla)\tilde u(t,x) + \nabla\tilde p(t,x) = \tilde\nu\,\Delta\tilde u(t,x) + \sum_{k\in\Lambda_L^{(\infty)}} \sigma_k\,\dot\beta_k(t)\,e^{-ik\cdot x} \tag{3.2}$$

on the torus $[0,L]^3$ with $\mathrm{div}\,\tilde u = 0$ and periodic boundary conditions (the set $\Lambda_L^{(\infty)}$ is defined in (B.1)). As we shall see below (see the next section and Lemma B.1), we obtain this equation when we perform the following scaling transformation on the solutions $u$ of the original equation (1.1): $\tilde u(t,x) = L^{1/3}u(L^{-2/3}t, L^{-1}x)$ (and a suitably defined $\tilde p(t,x)$). The value of $\tilde\nu$ under this transformation is $\tilde\nu = \nu L^{4/3}$. This scaling transformation has been introduced in the mathematical-physics literature, see [19]. What makes it special is that no coefficient depending on the scale parameter appears in front of the noise, so the energy input per unit of time and space is the same for every $L$.

Similarly to the case $L = 1$, we may introduce the (non-empty) set $P^G_{NS}(\tilde\nu, L)$ of limit points of the (homogeneous and isotropic) invariant measures of the corresponding Galerkin approximations. Let us denote by $P^G_{NS}$ the set of all pairs $(\mu,\nu)$ such that $\mu \in P^G_{NS}(\nu)$. Similarly, let us denote by $\tilde P^G_{NS}$ the set of all triples $(\mu,\tilde\nu,L)$ such that $\mu \in P^G_{NS}(\tilde\nu, L)$.

3.1. Basic equivalent condition. Let us introduce the notation $P_L$ for the set of probability measures analogous to $P$, but on the torus $[0,L]^3$. Denote by $P_\cdot \times \mathbb{R}^2_+$ the set of all triples $(\mu,\tilde\nu,L)$ such that $(\tilde\nu,L) \in \mathbb{R}^2_+$ and $\mu \in P_L$.

Definition 3.2. We call an admissible region a set $D \subset \mathbb{R}^2_+$ of the following form:

$$D = \{(\tilde\nu,L) \in \mathbb{R}^2_+ \,;\ \tilde\nu \in (0,\tilde\nu_0),\ L > \tilde R_0(\tilde\nu)\},$$

where $\tilde\nu_0 > 0$ and $\tilde R_0 : (0,\tilde\nu_0] \to [1,\infty)$ is a strictly decreasing function with $\tilde R_0(\tilde\nu) \to \infty$ as $\tilde\nu \to 0$. An admissible region is depicted in the left-hand side of Fig. 3.1 below.

Condition A. A subset $\tilde M \subset P_\cdot \times \mathbb{R}^2_+$ is said to satisfy Condition A with anomalous exponent $\kappa$ if there exist an admissible region $D \subset \mathbb{R}^2_+$ and two constants $C > c > 0$ such that

$$cL^{-\kappa} \le \tilde\mu[\|\tilde u(e) - \tilde u(0)\|^2] \le CL^{-\kappa} \tag{3.3}$$

for every $(\tilde\mu,\tilde\nu,L) \in \tilde M$ with $(\tilde\nu,L) \in D$. We have denoted by $\tilde u$ the generic element of $H_L$.

Proposition 3.3. The set $\tilde P^G_{NS}$ satisfies Condition A with anomalous exponent $\kappa$ if and only if the set $P^G_{NS}$ has a scaling law of Kolmogorov type with exponent $\frac23 + \kappa$, in the sense of Definition 3.1.

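Proof. Given $L > 0$, consider the mapping $S_L : H_L \to H$ defined by

$$(S_L\tilde u)(x) = L^{-1/3}\,\tilde u(Lx). \tag{3.4}$$

This mapping induces a mapping $S$ from $P_\cdot \times \mathbb{R}^2_+$ to $P \times \mathbb{R}_+$ by

$$S(\tilde\mu,\tilde\nu,L) = \big(S_L^*\tilde\mu,\ \tilde\nu L^{-4/3}\big). \tag{3.5}$$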

Fig. 3.1. Effect of $K^{-1}$ on an admissible domain. (a) Parameter domain for Condition A (axes $\tilde\nu$ and $\tilde r$, with $\tilde\nu_0$ marked). (b) Image of the previous domain (axes $\nu$ and $r$, with the curve $r = \nu^{3/4}$ marked).

It follows immediately from Theorem B.2 that one has

$$P^G_{NS} = S(\tilde P^G_{NS}). \tag{3.6}$$

Furthermore, it follows immediately from the above definitions that if $(\mu,\nu) = S(\tilde\mu,\tilde\nu,\tilde r)$, then

$$S_2^\mu(r) = r^{2/3}\int_{H_{\tilde r}} \|u(e) - u(0)\|^2\,d\tilde\mu(u). \tag{3.7}$$

It therefore follows that, in order to prove the equivalence between Condition A and the Kolmogorov scaling law, it suffices to show that the domains of validity of Eq. 3.3 and of Eq. 3.1 are the same (with possibly different constants and functions $R_0$ and $\tilde R_0$), provided that $(\nu,r)$ and $(\tilde\nu,L)$ are related by

$$\tilde\nu = \nu r^{-4/3}, \qquad L = r^{-1}. \tag{3.8}$$

We denote by $K : (\nu,r) \to (\tilde\nu,L)$ the above map.

Condition A implies the condition in Definition 3.1. The domain of validity of Eq. 3.3 is given by

$$\tilde\nu \le \tilde\nu_0, \qquad L \ge \tilde R_0(\tilde\nu). \tag{3.9}$$

Under the map $K^{-1}$, this becomes

$$r \ge \Big(\frac{\nu}{\tilde\nu_0}\Big)^{3/4} \equiv C_0\nu^{3/4}, \qquad \frac1r \ge \tilde R_0(\nu r^{-4/3}). \tag{3.10}$$

Both domains are shown in Fig. 3.1.
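The change of variables (3.8) and the first condition in (3.10) are easy to sanity-check numerically. The sketch below is only illustrative: the function names `K` and `K_inv` and the sample values of the parameters are assumptions made for the example, not part of the paper.

```python
import math


def K(nu, r):
    """Map (nu, r) -> (nu_tilde, L) as in Eq. (3.8)."""
    return nu * r ** (-4.0 / 3.0), 1.0 / r


def K_inv(nu_tilde, L):
    """Inverse map (nu_tilde, L) -> (nu, r)."""
    return nu_tilde * L ** (-4.0 / 3.0), 1.0 / L


nu_tilde_0 = 0.5                 # hypothetical threshold of the admissible region
C0 = nu_tilde_0 ** (-3.0 / 4.0)  # constant appearing in the first condition of (3.10)

nu = 1e-3
r_boundary = C0 * nu ** 0.75     # lower end of the range of r at this viscosity
nu_tilde, L = K(nu, r_boundary)

# The boundary r = C0 * nu^(3/4) is sent to the line nu_tilde = nu_tilde_0.
print(math.isclose(nu_tilde, nu_tilde_0), L)
```

Larger values of `r` at fixed `nu` give smaller values of `nu_tilde`, which is why the condition on the rescaled viscosity turns into a lower bound on `r`.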

Fig. 3.2. Effect of $K$ on a domain of the type (3.13). (a) Parameter domain for K41 (axes $\nu$ and $r$, with $\nu_0$ and the curve $r = \nu^{3/4}$ marked). (b) Image of the previous domain (axes $\tilde\nu$ and $\tilde r$, with $\tilde\nu_0$ marked).

Defining the strictly decreasing function $F(x) = x^{-3/4}\tilde R_0(x)$, the second condition of Eq. 3.10 is of course equivalent to

$$\nu^{-3/4} \ge F(\nu r^{-4/3}). \tag{3.11}$$

This condition (as can be inferred from Fig. 3.1) can only be satisfied simultaneously with the first condition in Eq. 3.10 if $\nu \le \nu_0 \equiv F(\tilde\nu_0)^{-4/3}$. On this domain, Eq. 3.11 is equivalent to

$$r \le \Big(\frac{\nu}{F^{-1}(\nu^{-3/4})}\Big)^{3/4} \equiv \nu^{3/4}R_0(\nu), \tag{3.12}$$

where $R_0(x) = \big(F^{-1}(x^{-3/4})\big)^{-3/4}$. Additionally $R_0$ is well-defined on $(0,\nu_0]$ and it is greater than $C_0$ on this domain. Furthermore, since $F$ is decreasing, $R_0$ is strictly decreasing and it is easy to check that $\lim_{x\to 0} R_0(x) = \infty$ because the same property holds for $F$.

The condition in Definition 3.1 implies Condition A. The domain of validity of Eq. 3.1 is given by

$$\nu \le \nu_0, \qquad r\nu^{-3/4} \in [C_0, R_0(\nu)]. \tag{3.13}$$

Under the map $K$, this becomes

$$\tilde\nu L^{-4/3} \le \nu_0, \qquad \tilde\nu^{-3/4} \in [C_0, R_0(\tilde\nu L^{-4/3})]. \tag{3.14}$$

The second condition can be rewritten as

$$\tilde\nu \in [G(\tilde\nu L^{-4/3}), \tilde\nu_0], \tag{3.15}$$

where we defined $\tilde\nu_0 = C_0^{-4/3}$ and $G(x) = R_0(x)^{-4/3}$. Both of these domains are shown in Fig. 3.2.


We can rewrite as above the condition $\tilde\nu \ge G(\tilde\nu L^{-4/3})$ as

$$L \ge \Big(\frac{\tilde\nu}{G^{-1}(\tilde\nu)}\Big)^{3/4} \equiv \tilde R_0(\tilde\nu). \tag{3.16}$$

Again, it is an easy exercise to show that $\tilde R_0$ as defined above is monotone and satisfies $\lim_{x\to 0}\tilde R_0(x) = \infty$. The only points that remain to be clarified are:
a. We haven't taken the first equation in Eq. 3.14 into account.
b. The domain of definition of $R_0$ may not extend to $\tilde\nu_0$.
Both problems can be solved at once by simply choosing a smaller value for $\tilde\nu_0$.

Remark 3.4. Consider Eq. 3.2 and Condition A. We are in a situation where the energy injection rate per unit volume is independent of $L$ and $\nu$. In 3D there is clearly a cascade of energy from larger to smaller scales due to various instabilities. The Kolmogorov-Obukhov 1941 theory assumes that the cascade is homogeneous and uniform, so that at scales larger than the dissipation scale the flux of energy per unit of volume and time is independent of $L$ and $\nu$. Under this assumption, it would be natural to conjecture that Condition A holds with $\kappa = 0$. However, the homogeneity assumption underlying this theory was not confirmed by later experiments and investigations. On the contrary, the fluid tends to build up localized structures at every scale larger than dissipation, that survive for times longer than the average, where energy is confined for a while and then released. This produces space-time intermittency of energy distribution, energy flux and dissipation. The consequence is a depletion of the average value of $\|\tilde u(e) - \tilde u(0)\|^2$ as $L$ increases. It is then more natural to expect $\kappa > 0$ in Condition A and then in Eq. 3.1. This is supported by experiments but theoretically it is still unproved.

3.2. Necessary and sufficient conditions in terms of high and low modes. In this section, for notational simplicity, we drop the tildes in our notation. Recall that an admissible region is defined by

$$D = \{(\nu,L) \in \mathbb{R}^2_+ \,;\ \nu \in (0,\nu_0),\ L > R_0(\nu)\},$$

and that Condition A requires

$$cL^{-\kappa} \le \mu[\|u(e) - u(0)\|^2] \le CL^{-\kappa}$$

for every $(\mu,\nu,L)$ with $(\nu,L) \in D$.

We start with a preparatory lemma which depends on the scaling properties of the stochastic Navier-Stokes equations in an essential way. This is the only point in this section where specific information about the measures is being used.

Lemma 3.5. If $\tilde P^G_{NS}$ satisfies Condition A then there exist constants $C' > c' > 0$ and an admissible region $D'$ such that

$$L^{-\kappa}c' \le \sum_e \int_{1/2}^{3/2} \mu[\|u(\lambda e) - u(0)\|^2]\,d\lambda \le C'L^{-\kappa}$$

for every $(\mu,\nu,L) \in \tilde P^G_{NS}$ with $(\nu,L) \in D'$. The sum $\sum_e$ is extended to all coordinate unitary vectors. We simply have $C' = (1.5^{2/3}d)\cdot C$, $c' = (0.5^{2/3}d)\cdot c$, and $D'$ defined by $0.5^{4/3}\cdot\nu_0$ and $1.5\,R_0(1.5^{-4/3}\nu)$, where $\nu_0$ and $R_0(\nu)$ define $D$.


Proof. Given $\lambda \in [\frac12,\frac32]$ and $(\mu,\nu,L) \in \tilde P^G_{NS}$, namely $\mu \in P^G_{NS}(\nu,L)$, consider the measure $\mu_\lambda$ that corresponds to $\mu$ under the transformation $u \to \lambda^{-1/3}u(\lambda\,\cdot)$ used in the previous section, having the property

$$\mu[\|u(\lambda e) - u(0)\|^2] = \lambda^{2/3}\,\mu_\lambda[\|u(e) - u(0)\|^2].$$

By Theorem B.2 we know that $\mu_\lambda \in P^G_{NS}(\nu\lambda^{-4/3}, L/\lambda)$, hence $(\mu_\lambda, \nu\lambda^{-4/3}, L/\lambda)$ is in $\tilde P^G_{NS}$. Thus Condition A implies

$$L^{-\kappa}c \le \mu_\lambda[\|u(e) - u(0)\|^2] \le CL^{-\kappa}$$

if $\nu\lambda^{-4/3} < \nu_0$ and $L/\lambda > R_0(\nu\lambda^{-4/3})$. The first condition is true if $\nu < 0.5^{4/3}\nu_0$. The second one if $L > 1.5\,R_0(1.5^{-4/3}\nu)$. The proof can now be easily completed.

Let us use some Fourier analysis on the torus $T_L = [0,L]^d$ (see also Appendix 3.2). Every $u \in H_L$ is given by

$$u(x) = \sum_{k\in\Lambda_L^{(\infty)}} e^{-ik\cdot x}\,\hat u(k) \qquad\text{with}\qquad \hat u(k) := L^{-3}\int_{T_L} e^{ik\cdot x}u(x)\,dx$$

and we have the Parseval identity

$$L^{-3}\int_{T_L} \|u(x)\|^2\,dx = \sum_{k\in\Lambda_L^{(\infty)}} \|\hat u(k)\|^2.$$

We introduce another condition expressed in terms of the sum of the enstrophy of low modes and energy of high modes.

Condition B. A subset $\tilde M \subset P_\cdot \times \mathbb{R}^2_+$ is said to satisfy Condition B if there exist an admissible region $D \subset \mathbb{R}^2_+$ and two constants $C > c > 0$ such that

$$L^{-\kappa}c \le \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1} \|k\|^2\,\mu[\|\hat u(k)\|^2] + \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|>1} \mu[\|\hat u(k)\|^2] \le CL^{-\kappa}$$

for every $(\mu,\nu,L) \in \tilde P^G_{NS}$ such that $(\nu,L) \in D$.

Remark 3.6. Note that both the constants and the admissible regions involved in Conditions A and B need not necessarily be the same.

With this definition, we may establish a first basic theorem as a corollary of the previous lemma.

Theorem 3.7. Condition A implies Condition B.

Proof. For every $u \in H_L$ we have

$$\frac{1}{L^3}\int_{T_L} \|u(x+\lambda e) - u(x)\|^2\,dx = \sum_{k\in\Lambda_L^{(\infty)}} |e^{ik\cdot\lambda e} - 1|^2\,\|\hat u(k)\|^2,$$

where, by space homogeneity, the left-hand side has the same expectation as $\|u(\lambda e) - u(0)\|^2$. Thus, for every $\mu \in P^G_{NS}(\nu,L)$ we have

$$\sum_e \int_{1/2}^{3/2} \mu[\|u(\lambda e) - u(0)\|^2]\,d\lambda = \sum_{k\in\Lambda_L^{(\infty)}} \Big(\sum_e \int_{1/2}^{3/2} |e^{ik\cdot\lambda e} - 1|^2\,d\lambda\Big)\,\mu[\|\hat u(k)\|^2].$$

But there exist universal constants $C' > c' > 0$ such that

$$c'(\|k\|^2\wedge 1) \le \sum_e \int_{1/2}^{3/2} |e^{ik\cdot\lambda e} - 1|^2\,d\lambda \le C'(\|k\|^2\wedge 1).$$

Therefore, the quantities

$$\sum_e \int_{1/2}^{3/2} \mu[\|u(\lambda e) - u(0)\|^2]\,d\lambda \qquad\text{and}\qquad \sum_{k\in\Lambda_L^{(\infty)}} (\|k\|^2\wedge 1)\,\mu[\|\hat u(k)\|^2]$$

are "equivalent", up to universal constants. This proves the claim.
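The two-sided bound used in this proof can be explored numerically. The following is a minimal sketch under stated assumptions: it takes d = 3, uses the elementary closed form of the integral of |exp(i a λ) − 1|² = 2 − 2 cos(a λ) over λ in [1/2, 3/2], and samples only a small hypothetical grid of wave vectors, so it illustrates the equivalence rather than proving it. The function names are illustrative.

```python
import math


def shell_integral(a):
    """Integral over lambda in [1/2, 3/2] of |exp(i*a*lambda) - 1|^2, in closed form."""
    if a == 0.0:
        return 0.0
    return 2.0 - 4.0 * math.cos(a) * math.sin(a / 2.0) / a


def shell_integral_midpoint(a, m=2000):
    """The same integral by the midpoint rule, as an independent check."""
    h = 1.0 / m
    return h * sum(2.0 * (1.0 - math.cos(a * (0.5 + (j + 0.5) * h))) for j in range(m))


def ratio(k):
    """Sum over coordinate vectors e of the integral, divided by min(|k|^2, 1)."""
    norm2 = sum(c * c for c in k)
    return sum(shell_integral(c) for c in k) / min(norm2, 1.0)


# Hypothetical sample of wave-vector components (any nonzero k could be used).
grid = [0.0, 0.01, 0.1, 0.5, 1.0, 2.0, 3.0, 5.0, 10.0, 31.4]
ratios = [ratio((a, b, c)) for a in grid for b in grid for c in grid
          if (a, b, c) != (0.0, 0.0, 0.0)]
lo, hi = min(ratios), max(ratios)
print(lo, hi)  # both stay bounded away from 0 and infinity on this sample
```

Averaging over λ in [1/2, 3/2] is what removes the zeros of the integrand: for a fixed λ the factor vanishes whenever the relevant component of k is a multiple of 2π/λ, whereas the averaged quantity vanishes only at k = 0.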

We have at least a partial converse of the previous result. Let us introduce the following condition:

Condition C. A subset $\tilde M \subset P_\cdot \times \mathbb{R}^2_+$ is said to satisfy Condition C if there exist an admissible region $D \subset \mathbb{R}^2_+$ and two constants $C > c > 0$ such that

$$L^{-\kappa}c \le \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1/2} \|k\|^2\,\mu[\|\hat u(k)\|^2] \le \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1} \|k\|^2\,\mu[\|\hat u(k)\|^2] + \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|>1} \mu[\|\hat u(k)\|^2] \le CL^{-\kappa}$$

for every $(\mu,\nu,L) \in \tilde P^G_{NS}$ such that $(\nu,L) \in D$.

Note that Condition C implies directly Condition B. What is more interesting is the following:

Proposition 3.8. Condition C implies Condition A.

Proof. We have $\sum_e |e^{ik\cdot e} - 1|^2 \le C(\|k\|^2\wedge 1)$ for every $k$. Moreover if $\|k\| \le 1/2$ we have $c\|k\|^2 \le \sum_e |e^{ik\cdot e} - 1|^2$ for some constant $c > 0$. The claim then follows from the next lemma and the following inequality:

$$\sum_{k\in\Lambda_L^{(\infty)}} \sum_e |e^{ik\cdot e} - 1|^2\,\mu[\|\hat u(k)\|^2] \ge \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1/2} \sum_e |e^{ik\cdot e} - 1|^2\,\mu[\|\hat u(k)\|^2] \ge c \sum_{k\in\Lambda_L^{(\infty)},\ \|k\|\le 1/2} \|k\|^2\,\mu[\|\hat u(k)\|^2].$$


Lemma 3.9. $\tilde P^G_{NS}$ satisfies Condition A if and only if it satisfies the following Condition A': there exist $C > c > 0$, and an admissible region $D$ such that

$$L^{-\kappa}c \le \sum_{k\in\Lambda_L^{(\infty)}} \sum_e |e^{ik\cdot e} - 1|^2\,\mu[\|\hat u(k)\|^2] \le CL^{-\kappa}$$

for every $(\mu,\nu,L) \in \tilde P^G_{NS}$ such that $(\nu,L) \in D$.

Proof. From previous computations, we know that for every $\mu \in P^G_{NS}(\nu,L)$ we have

$$\sum_e \mu[\|u(e) - u(0)\|^2] = \sum_{k\in\Lambda_L^{(\infty)}} \sum_e |e^{ik\cdot e} - 1|^2\,\mu[\|\hat u(k)\|^2],$$

and this proves the claim.

Appendix A. Mollification of Measures

Some computations involving Taylor expansion require more regularity than that of typical fields under $\mu \in P$. For this reason we introduce mollifications of measures $\mu \in P$. Note that this technical effort is useless if the noise is more regular, since one can prove more regularity of the typical elements under $\mu \in P$.

Let $\varphi : \mathbb{R} \to \mathbb{R}$ be a smooth function with compact support, symmetric, non-negative, strictly positive at zero, with $\int_{\mathbb{R}^d} \varphi(\|x\|)\,dx = 1$. Set $\phi_\varepsilon(x) = \varepsilon^{-d}\varphi(\|x/\varepsilon\|)$, so $\int_{\mathbb{R}^d} \phi_\varepsilon(x)\,dx = 1$; $\{\phi_\varepsilon\}_{\varepsilon>0}$ is a family of usual smooth mollifiers. For every $u \in H$ set $u_\varepsilon(x) = \int_{\mathbb{R}^d} \phi_\varepsilon(x-y)u(y)\,dy$. Given $\mu \in P_0$, the mapping $u \to u_\varepsilon$ in $H$ induces an image measure $\mu_\varepsilon \in P_0$ which is in fact supported on smooth fields.

Lemma A.1. If $\mu \in P$, then $\mu_\varepsilon \in P$.

Proof. Using the change of variables $y' = y + a$ we have

$$u_\varepsilon(x-a) = \int_{\mathbb{R}^d} \phi_\varepsilon(x-y')\,u(y'-a)\,dy' \overset{\mathcal{L}}{=} \int_{\mathbb{R}^d} \phi_\varepsilon(x-y')\,u(y')\,dy',$$

where the last equality is understood in law under $\mu$, and it holds true as processes in $x$. Hence $u_\varepsilon(\cdot - a) \overset{\mathcal{L}}{=} u_\varepsilon(\cdot)$, which can be written in terms of measures as

$$\int_H f(u(\cdot - a))\,d\mu_\varepsilon(u) = \int_H f(u)\,d\mu_\varepsilon(u)$$

for all bounded continuous $f$, so the space homogeneity of $\mu_\varepsilon$ is proved. Similarly, we have

$$u_\varepsilon(Rx) = \int_{\mathbb{R}^d} \phi_\varepsilon(R(x - R^{-1}y))\,u(y)\,dy = \int_{\mathbb{R}^d} \phi_\varepsilon(x - R^{-1}y)\,u(y)\,dy$$

by the symmetry of $\phi_\varepsilon$, and so we can conclude that for all bounded continuous $f$,

$$\int_H f(u(R\,\cdot))\,d\mu_\varepsilon(u) = \int_H f(Ru(\cdot))\,d\mu_\varepsilon(u).$$

The proof is complete.


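Lemma A.2. For every $\mu \in P$, if

$$\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] < \infty \qquad\text{and}\qquad \mu\Big[\int_T \|D^2u(x)\|^2\,dx\Big] < \infty,$$

then $\mu[\|u(re) - u(0)\|^2] < \infty$ and

$$\lim_{\varepsilon\to 0} \mu_\varepsilon[\|Du(0)\|^2] = \mu\Big[\int_T \|Du(x)\|^2\,dx\Big],$$
$$\lim_{\varepsilon\to 0} \mu_\varepsilon[\|D^2u(0)\|^2] = \mu\Big[\int_T \|D^2u(x)\|^2\,dx\Big],$$
$$\lim_{\varepsilon\to 0} \mu_\varepsilon[\|u(re) - u(0)\|^2] = \mu[\|u(re) - u(0)\|^2].$$

Proof. Since for every $u \in D(A)$, $\int_T \|Du_\varepsilon(x)\|^2\,dx$ is trivially bounded by a constant depending on $\int_T \|Du(x)\|^2\,dx$ and $\int_T \|Du_\varepsilon(x)\|^2\,dx \to \int_T \|Du(x)\|^2\,dx$ as $\varepsilon \to 0$, by the Lebesgue theorem, $\mu_\varepsilon[\int_T \|Du(x)\|^2\,dx] \to \mu[\int_T \|Du(x)\|^2\,dx]$ as $\varepsilon \to 0$. But $\mu_\varepsilon$ is space homogeneous, hence $\mu_\varepsilon[\int_T \|Du(x)\|^2\,dx] = \mu_\varepsilon[\|Du(0)\|^2]$. This proves the first claim. The proof of the second one is entirely similar. For the third one, we have

$$\|u_\varepsilon(x+re) - u_\varepsilon(x)\|^2 = r^2\Big\|\int_0^1 Du_\varepsilon(x+\sigma re)\,e\,d\sigma\Big\|^2 \le r^2\int_0^1 \|Du_\varepsilon(x+\sigma re)\|^2\,d\sigma$$

for every $u \in D(A)$, hence

$$\int_T \|u_\varepsilon(x+re) - u_\varepsilon(x)\|^2\,dx \le r^2\int_0^1\!\!\int_T \|Du_\varepsilon(x+\sigma re)\|^2\,dx\,d\sigma \le Cr^2\int_T \|Du(x)\|^2\,dx.$$

Therefore, again by the Lebesgue theorem,

$$\lim_{\varepsilon\to 0} \mu_\varepsilon\Big[\int_T \|u(x+re) - u(x)\|^2\,dx\Big] = \mu\Big[\int_T \|u(x+re) - u(x)\|^2\,dx\Big]$$

and the third claim follows now from the space homogeneity of both $\mu_\varepsilon$ and $\mu$.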

We are now in the position to prove a quantitative consequence of isotropy, that we shall use in the sequel. In the next statement we understand that both terms in the equality are either finite and equal, or both infinite.

Lemma A.3. For every $\mu \in P$ and every coordinate unitary vector $e$ we have

$$\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] = d\,\mu\Big[\int_T \|Du(x)\cdot e\|^2\,dx\Big].$$

The same identity holds true for $\mu_\varepsilon$, moreover

$$\mu_\varepsilon[\|Du(0)\|^2] = d\,\mu_\varepsilon[\|Du(0)\cdot e\|^2].$$

Proof. Step 1. Denote by $e_1,\dots,e_d$ the coordinate unitary vectors. For $u \in D(A)$ we have $\|Du(x)\|^2 = \sum_{ij} |\partial_{x_j}u_i(x)|^2$ and $\|Du(x)\cdot e_j\|^2 = \sum_i |\partial_{x_j}u_i(x)|^2$, thus $\|Du(x)\|^2 = \sum_j \|Du(x)\cdot e_j\|^2$ and so $\mu_\varepsilon[\|Du(0)\|^2] = \sum_j \mu_\varepsilon[\|Du(0)\cdot e_j\|^2]$. Therefore

$$\mu\Big[\int_T \|Du(x)\|^2\,dx\Big] = \sum_j \mu\Big[\int_T \|Du(x)\cdot e_j\|^2\,dx\Big].$$

It is then sufficient to prove that all terms of the sums on the right-hand sides are equal, in order to prove the first and last claim of the lemma. We shall prove this below in Steps 2 and 3. Finally, the first assertion for $\mu_\varepsilon$ is a particular case of the first claim of the lemma, since $\mu_\varepsilon$ is an element of $P$.

Step 2. Now, given $j = 1,\dots,d$, by applying a rotation $R$ chosen as in the definition of $P$ such that $Re_1 = e_j$, for any given $N > 0$,

$$\mu_\varepsilon[\|Du(0)\cdot e_j\|^2\wedge N] = \lim_{r\to 0} \mu_\varepsilon[(r^{-2}\|u(re_j) - u(0)\|^2)\wedge N] = \lim_{r\to 0} \mu_\varepsilon[(r^{-2}\|u(re_1) - u(0)\|^2)\wedge N] = \mu_\varepsilon[\|Du(0)\cdot e_1\|^2\wedge N].$$

By monotone convergence in $N$, we get that $\mu_\varepsilon[\|Du(0)\cdot e_j\|^2]$ is independent of $j$. This proves one of the claims.

Step 3. From the previous step and homogeneity we have that the quantity $\mu_\varepsilon[\int_T \|Du(x)\cdot e_j\|^2\,dx]$ is also independent of $j$. Arguing as in the proof of the previous lemma, this integral converges to $\mu[\int_T \|Du(x)\cdot e_j\|^2\,dx]$, which is therefore also independent of $j$. The proof is complete.

Appendix B. Scaling Theorems

We consider again the torus $T_L = [0,L]^d$, the energy space $H_L$ with norm $|\cdot|_{H_L}$, the spaces $V_L$, $D(A_L)$, $D_L$ and the Stokes operator $A_L$ on $T_L$ introduced in Sect. 1.1. We define

$$\Lambda_L^{(\infty)} = \Big\{k \in \frac{2\pi}{L}\mathbb{Z}^d : |k|^2 > 0\Big\}, \tag{B.1}$$

and, for the purpose of Galerkin approximations, we introduce also

$$\Lambda_L^{(n)} = \Big\{k \in \frac{2\pi}{L}\mathbb{Z}^d : 0 < |k|^2 \le \Big(\frac{2\pi n}{L}\Big)^2\Big\}$$

so that $\Lambda_L^{(\infty)} = \cup_n \Lambda_L^{(n)}$. In particular, $\Lambda^{(\infty)} = \Lambda_1^{(\infty)}$.


B.1. Scaling theorem for Galerkin approximations. Let $V_L'$ be the dual of $V_L$; with proper identifications we have $V_L \subset H_L \subset V_L'$ with continuous injections. Let $B_L(\cdot,\cdot) : V_L \times V_L \to V_L'$ be the bilinear operator defined for all $u, v, w \in D_L$ as

$$\langle w, B_L(u,v)\rangle_{H_L} = \frac{1}{L^d}\sum_{i,j=1}^d \int_{T_L} u_i\,\frac{\partial v_j}{\partial x_i}\,w_j\,dx = \sum_{h+l=k} (l\cdot\hat u(h))\,\hat v(l)\cdot\hat w(k). \tag{B.2}$$

Given $L > 0$, $\nu > 0$ and $\theta > 0$, consider (formally) the equation in $H_L$,

$$du + [\nu A_L u + B_L(u,u)]\,dt = \theta \sum_{k\in\Lambda_L^{(\infty)}} \sigma_k^L\,d\beta_k^L\,e^{-ik\cdot x},$$

where $\beta_k^L = \beta_{Lk}$ and $\sigma_k^L = \sigma_{Lk}$, and $(\beta_k)_{k\in\Lambda^{(\infty)}}$ and $(\sigma_k)_{k\in\Lambda^{(\infty)}}$ have been introduced in Sect. 2.2.1 and are subject to the assumptions imposed therein, so that the random fields

$$W_L^{(n)}(t,x) = \sum_{k\in\Lambda_L^{(n)}} \sigma_k^L\,\beta_k^L(t)\,e^{-ik\cdot x}$$

and the field $W_L^{(\infty)}(t,x)$ similarly defined, are space-homogeneous and partially (in the sense of the rotations of the torus) isotropic.

Let $H_L^{(n)}$ be the subspace of $H_L$ corresponding to the modes with wavelengths in $\Lambda_L^{(n)}$ and consider the equation in $H_L^{(n)}$,

$$du^{(n)} + [\nu A_L u^{(n)} + \pi_L^{(n)}B_L(u^{(n)},u^{(n)})]\,dt = \theta \sum_{k\in\Lambda_L^{(n)}} \sigma_k^L\,d\beta_k^L\,e^{-ik\cdot x}, \tag{B.3}$$

where $\pi_L^{(n)}$ is the orthogonal projection of $H_L$ onto $H_L^{(n)}$.

Lemma B.1. If $u^{(n)}$ is a solution in $H_L^{(n)}$ of (B.3), with initial condition $u^{(n)}(0)$ and parameters $(\nu, L, \theta)$, then $\tilde u^{(n)}(t,x) := \lambda^\beta u^{(n)}(\lambda^{1+\beta}t, \lambda x)$ is a solution in $H_{L/\lambda}^{(n)}$ of Eq. (B.3) with initial condition $\tilde u^{(n)}(0)$ and parameters $(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$ (but with new Brownian motions).

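Proof. This statement is not clear a priori, especially because of the scaling transformation of the nonlinear term, so we give all the details. The solutions $u^{(n)}$ and $\tilde u^{(n)}$ (as a process in $H_{L/\lambda}^{(n)}$) are given as Fourier series by

$$u^{(n)}(t,x) = \sum_{k\in\Lambda_L^{(n)}} \hat u^{(n)}(t,k)\,e^{-ik\cdot x} \qquad\text{and}\qquad \tilde u^{(n)}(t,x) = \sum_{k\in\Lambda_{L/\lambda}^{(n)}} \hat{\tilde u}^{(n)}(t,k)\,e^{-ik\cdot x}.$$

The Fourier coefficients of $u^{(n)}$ and $\tilde u^{(n)}$ are related by the scaling

$$\hat{\tilde u}^{(n)}(t,k) = \frac{\lambda^\beta}{L^d}\int_{T_L} u^{(n)}(\lambda^{1+\beta}t, x')\,e^{ik'\cdot x'}\,dx' = \lambda^\beta\,\hat u^{(n)}(\lambda^{1+\beta}t, k') \tag{B.4}$$

(where $x' = \lambda x$ and $k' = k/\lambda$).

From Eq. (B.3) in integral form,

$$u^{(n)}(t) + \int_0^t [\nu A_L u^{(n)} + \pi_L^{(n)}B_L(u^{(n)},u^{(n)})](s)\,ds = u^{(n)}(0) + \theta \sum_{k\in\Lambda_L^{(n)}} \sigma_k^L\,\beta_k^L(t)\,e^{-ik\cdot x},$$

we have

$$\lambda^\beta u^{(n)}(\lambda^{1+\beta}t, \lambda x) + \lambda^{1+2\beta}\int_0^t [\nu A_L u^{(n)} + \pi_L^{(n)}B_L(u^{(n)},u^{(n)})](\lambda^{1+\beta}s, \lambda x)\,ds = \lambda^\beta u^{(n)}(0,\lambda x) + \lambda^{\frac{1+3\beta}{2}}\theta \sum_{k\in\Lambda_{L/\lambda}^{(n)}} \sigma_k^{L/\lambda}\,\tilde\beta_k^{L/\lambda}(t)\,e^{-ik\cdot x},$$

where $\tilde\beta_k^{L/\lambda}(t) := \lambda^{-\frac{1+\beta}{2}}\beta_{k/\lambda}^L(\lambda^{1+\beta}t)$ are new Brownian motions. The first term on the l.h.s. is $\tilde u^{(n)}(t,x)$, and the first term on the r.h.s. is $\tilde u^{(n)}(0,x)$. In addition, we have

$$A_{L/\lambda}\tilde u^{(n)}(t,x) = \lambda^{2+\beta}(A_L u^{(n)})(\lambda^{1+\beta}t, \lambda x).$$

The proof of the claim will be complete if we show that

$$\lambda^{1+2\beta}[\pi_L^{(n)}B_L(u^{(n)},u^{(n)})](\lambda^{1+\beta}t, \lambda x) = [\pi_{L/\lambda}^{(n)}B_{L/\lambda}(\tilde u^{(n)},\tilde u^{(n)})](t,x).$$

For every $\varphi \in V_{L/\lambda}$, by using the Fourier expression (B.2) of the non-linear term and the scaling of Fourier coefficients (B.4),

$$\langle \pi_{L/\lambda}^{(n)}B_{L/\lambda}(\tilde u^{(n)},\tilde u^{(n)})(t,\cdot), \varphi\rangle_{H_{L/\lambda}} = \langle B_{L/\lambda}(\tilde u^{(n)},\tilde u^{(n)})(t,\cdot), \pi_{L/\lambda}^{(n)}\varphi\rangle_{H_{L/\lambda}}$$
$$= \sum_{h+l=k} \big(l\cdot\hat{\tilde u}^{(n)}(t,h)\big)\,\hat{\tilde u}^{(n)}(t,l)\cdot\hat\varphi(k)$$
$$= \lambda^{1+2\beta}\sum_{h+l=k} \Big(\frac{l}{\lambda}\cdot\hat u^{(n)}\big(\lambda^{1+\beta}t, \tfrac{h}{\lambda}\big)\Big)\,\hat u^{(n)}\big(\lambda^{1+\beta}t, \tfrac{l}{\lambda}\big)\cdot\hat\varphi(k)$$
$$= \lambda^{1+2\beta}\langle \pi_L^{(n)}B_L(u^{(n)},u^{(n)})(\lambda^{1+\beta}t, \lambda\,\cdot), \varphi\rangle_{H_{L/\lambda}},$$

where the sums above are extended to all $h$, $l$ and $k \in \Lambda_{L/\lambda}^{(n)}$ such that $h + l = k$.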

B.2. Scaling theorem for stationary measures. Similarly to Sect. 2.2.3, denote by $P^G_{NS}(\nu, L, \theta)$ the set of probability measures that are the limit of homogeneous isotropic invariant measures of Eqs. (B.3). Given $\lambda > 0$ and $\beta \in \mathbb{R}$ and $\mu \in P^G_{NS}(\nu, L, \theta)$, let $u$ be a random field on $T_L$ with law $\mu$, define the random field $\tilde u$ on $T_{L/\lambda}$ as $\tilde u(x) = \lambda^\beta u(\lambda x)$ and let $\tilde\mu$ be the law of $\tilde u$ on $H_{L/\lambda}$. More intrinsically, $\tilde\mu$ is defined by the relation $\tilde\mu[f(u(\cdot))] = \mu[f(\lambda^\beta u(\lambda\,\cdot))]$ for every bounded continuous $f$ on $H_{L/\lambda}$.

Theorem B.2. If $\mu \in P^G_{NS}(\nu, L, \theta)$ then $\tilde\mu \in P^G_{NS}(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$.

Proof. The measure $\mu$ of the theorem is the weak limit of a sequence $\{\mu_{n_k}\}$ of invariant measures on $H_L^{(n_k)}$ of the Galerkin problems with indexes $n_k$. For each $n_k$, let $u^{(n_k)}$ be a stationary solution (on some probability space) of (B.3), with parameters $(\nu, L, \theta)$ and marginal $\mu_{n_k}$. Let $\tilde u^{(n_k)}$ be the rescaled process as above, which is a solution of (B.3) with parameters $(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$ (by the lemma above) and is a stationary process. Its marginal $\tilde\mu_{n_k}$ is the scaling of $\mu_{n_k}$, similarly to the relation defined above between $\mu$ and $\tilde\mu$. Moreover $\tilde\mu_{n_k}$ is an invariant measure for Eq. (B.3) with parameters $(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$. From the weak convergence of $\mu_{n_k}$ to $\mu$ it is now easy to deduce the weak convergence of $\tilde\mu_{n_k}$ to $\tilde\mu$. Therefore $\tilde\mu \in P^G_{NS}(\nu\lambda^{\beta-1}, L/\lambda, \lambda^{\frac{1+3\beta}{2}}\theta)$. The proof is complete.
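The parameter bookkeeping of Lemma B.1 and Theorem B.2 can be checked with a few lines of code. The following is a minimal sketch; the function name `rescaled_parameters` and the sample values are hypothetical. It also illustrates why the choice β = −1/3 is the one used in Section 3: the noise intensity is then unchanged, and with λ = 1/L the unit torus is mapped to a torus of side L with viscosity ν L^{4/3}.

```python
import math


def rescaled_parameters(nu, L, theta, lam, beta):
    """Parameters of the rescaled problem: (nu*lam^(beta-1), L/lam, lam^((1+3*beta)/2)*theta)."""
    return (nu * lam ** (beta - 1.0),
            L / lam,
            lam ** ((1.0 + 3.0 * beta) / 2.0) * theta)


nu, theta = 1e-2, 1.0      # hypothetical viscosity and noise intensity on the unit torus
side = 8.0                 # target torus size
nu_t, L_t, theta_t = rescaled_parameters(nu, 1.0, theta, lam=1.0 / side, beta=-1.0 / 3.0)

# With beta = -1/3 the noise intensity is untouched and nu_t equals nu * side^(4/3).
print(nu_t, L_t, theta_t, math.isclose(nu_t, nu * side ** (4.0 / 3.0)))
```

Applying the map with λ and then with 1/λ (same β) returns the original triple, which is a quick consistency check on the exponents.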

Acknowledgement. The authors wish to warmly thank the anonymous referee for several valuable comments and suggestions, that dramatically improved the paper.

References

1. Batchelor, G.K.: The Theory of Homogeneous Turbulence. Cambridge Monographs on Mechanics and Applied Mathematics. Cambridge: Cambridge University Press, 1953
2. Batchelor, G.K., Townsend, A.A.: Decay of vorticity in isotropic turbulence. Proc. R. Soc. Lond. A 190(1023), 534–550 (1947)
3. Batchelor, G.K.: Computation of the energy spectrum in homogeneous, two-dimensional turbulence. Phys. Fluids 12(2), 233–239 (1969)
4. Chow, P.-L., Khasminskii, R.Z.: Stationary solutions of nonlinear stochastic evolution equations. Stochastic Anal. Appl. 15(5), 671–699 (1997)
5. Constantin, P., Foias, C., Manley, O.P.: Effects of the forcing function spectrum on the energy spectrum in 2-D turbulence. Phys. Fluids 6(1), 427–429 (1994)
6. Da Prato, G., Debussche, A.: Ergodicity for the 3D stochastic Navier-Stokes equations. J. Math. Pures Appl. (9), 82(8), 877–947 (2003)
7. Fjørtoft, R.: On the changes in the spectral distribution of kinetic energy for two-dimensional, nondivergent flow. Tellus 5, 225–230 (1953)
8. Flandoli, F.: An introduction to 3D stochastic fluid dynamics. In: CIME Lectures Series, 2005, available at http://web.math.Unifi.it/users/cime//
9. Flandoli, F., Gatarek, D.: Martingale and stationary solutions for stochastic Navier-Stokes equations. Probab. Theory Related Fields 102(3), 367–391 (1995)
10. Flandoli, F., Gubinelli, M.: Statistics of a vortex filament model. Electron. J. Probab. 10(25), 865–900 (electronic) (2005)
11. Foias, C., Jolly, M.S., Manley, O.P.: Kraichnan turbulence via finite time averages. Commun. Math. Phys. 255(2), 329–361 (2005)
12. Foias, C., Jolly, M.S., Manley, O.P., Rosa, R.: Statistical estimates for the Navier-Stokes equations and the Kraichnan theory of 2-D fully developed turbulence. J. Stat. Phys. 108(3–4), 591–645 (2002)
13. Frisch, U.: Turbulence. Cambridge: Cambridge University Press, 1995
14. Hairer, M., Mattingly, J.C.: Ergodicity of the 2D Navier-Stokes equations with degenerate stochastic forcing. Ann. of Math. (2), 164(3), 993–1032 (2006)
15. Kolmogorov, A.N.: A refinement of previous hypotheses concerning the local structure of turbulence in a viscous incompressible fluid at high Reynolds number. J. Fluid Mech. 13, 82–85 (1962)
16. Kolmogorov, A.N.: The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers. Proc. Roy. Soc. London Ser. A 434(1890), 9–13 (1991) (translated from the Russian by V. Levin)
17. Kraichnan, R.H.: Inertial ranges in two-dimensional turbulence. Phys. of Fluids 10(7), 1417–1423 (1967)
18. Kuksin, S.B.: The Eulerian limit for 2D statistical hydrodynamics. J. Stat. Phys. 115(1-2), 469–492 (2004)
19. Kupiainen, A.: Statistical theories of turbulence. In: Advances in Mathematical Sciences and Applications, Tokyo: Gakkotosho, 2003
20. Lee, T.D.: Difference between turbulence in a two-dimensional fluid and in a three-dimensional fluid. J. Appl. Phys. 22(4), 524–524 (1951)
21. Novikov, E.A.: Functionals and the random-force method in turbulence theory. Sov. Phys. JETP 20, 1290–1294 (1965)
22. Onsager, L.: Statistical hydrodynamics. Nuovo Cimento (9), 6(Supplemento, 2(Convegno Internazionale di Meccanica Statistica)), 279–287 (1949)
23. Taylor, G.I.: Production and dissipation of vorticity in a turbulent fluid. Proc. R. Soc. Lond. A, 164(916), 15–23 (1938)
24. Taylor, G.I.: Observations and speculations on the nature of turbulence motion (1917). In: G.K. Batchelor, editor, Scientific Papers. Cambridge: Cambridge Univ. Press, 1971
25. Taylor, G.I., Green, A.E.: Mechanism of the production of small eddies from large ones. Proc. Roy. Soc. A 158, 499–521 (1937)
26. Temam, R.: Navier-Stokes Equations, Volume 2 of Studies in Mathematics and its Applications. Third ed., Amsterdam: North-Holland Publishing Co., 1984 (with an appendix by F. Thomasset)
27. von Neumann, J.: Recent theories of turbulence (1949). In: edited by A.H. Taub, Collected Works, Volume VI, London: Pergamon Press, 1961, pp. 437–472

Communicated by A. Kupiainen

Commun. Math. Phys. 278, 31–81 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0395-z

Communications in

Mathematical Physics

Random Matrices, Graphical Enumeration and the Continuum Limit of Toda Lattices

N. M. Ercolani1, K. D. T-R McLaughlin1, V. U. Pierce2

1 Dept. of Math., Univ. of Arizona, Tucson, AZ 85721, USA.

E-mail: [email protected]; [email protected]

2 Dept. of Math., The Ohio State University, Columbus, OH 43210, USA.

E-mail: [email protected] Received: 19 May 2006 / Accepted: 23 July 2007 Published online: 11 December 2007 – © Springer-Verlag 2007

Abstract: In this paper we derive analytic characterizations for and explicit evaluations of the coefficients of the matrix integral genus expansion. The expansion itself arises from the large N asymptotic expansion of the logarithm of the partition function of N × N Hermitian random matrices. Its g th coefficient is a generating function for graphical enumeration on Riemann surfaces of genus g. The case that we particularly consider is for an underlying measure that differs from the Gaussian weight by a single monomial term of degree 2ν. Our results are based on a hierarchy of recursively solvable differential equations, derived through a novel continuum limit, whose solutions are the coefficients we want to characterize. These equations are interesting in their own right in that their form is related to partitions of 2g + 1 and joint probability distributions for conditioned random walks.

1. Motivation and Background

The study of the Unitary Ensembles (UE) of random matrices [25] begins with a family of probability measures on the space of N × N Hermitian matrices. The measures are of the form

$$d\mu_t = \frac{1}{Z_N} \exp\{-N \operatorname{Tr}[V_t(M)]\}\, dM,$$

where the function $V_t$ is a scalar function, referred to as the potential of the external field, or simply the “external field” for short. Typically it is taken to be a polynomial, and written as follows:

$$V_t = \frac{1}{2}\lambda^2 + \sum_{j=1}^{\upsilon} t_j \lambda^j.$$

K. D. T-R McLaughlin was supported in part by NSF grants DMS-0451495 and DMS-0200749, as well as a NATO Collaborative Linkage Grant “Orthogonal Polynomials: Theory, Applications, and Generalizations” Ref no. PST.CLG.979738. N. M. Ercolani and V. U. Pierce were supported in part by NSF grants DMS-0073087 and DMS-0412310.

The partition function $Z_N$, which appears as a normalization factor in the UE measures, plays a central role in random matrix theory and its applications. It can be reduced to an integration over the eigenvalues which takes a form proportional to the integral (1.1), below, for the particular case when k = N. When all the coefficients $t_k$ in the external field are set equal to zero the associated ensemble, corresponding to $\mu_0$, is called the Gaussian Unitary Ensemble (GUE). Many simplifications occur in the Gaussian case (see [16] for explanations of any unfamiliar terms):

(1) The partition function, when all $t_k$ vanish, is a Gaussian integral, and can be evaluated exactly.
(2) The matrix moments, $\int \{\operatorname{Tr}(M^j)\}^k \, d\mu_0(M)$, can be evaluated, using Wick’s lemma, in terms of pair correlations of the matrix entries of M, which are complex normal random variables.
(3) The terms in these Wick coupling expansions are, in the manner of Feynman diagrams, in 1-1 correspondence with certain labelled, oriented graphs.

These observations led to the conjecture [5,13] that the logarithm of the partition function has an asymptotic expansion of the form:

$$\log \frac{Z_N(t)}{Z_N(0)} = N^2 e_0(t) + e_1(t) + \frac{1}{N^2} e_2(t) + \cdots,$$

where the coefficients $e_g(t)$ should be locally analytic functions of t. The Taylor coefficients of $e_g$ should enumerate topologically distinct labelled, connected oriented graphs that can be embedded into a Riemann surface of genus g in such a way that the complement of the graph in the surface is a disjoint union of contractible cells. Such a construction is referred to as a g-map (see Sect. 1.3 for a precise definition). The $e_g(t)$ are generating functions for counting the number of g-maps with given numbers of vertices of specified valence. This conjecture was proven in [16] for appropriate domains (see below). The present paper builds on these results to present a more detailed description of the coefficients $e_g(t)$ and related generating functions.
More precisely, our interest is to develop a systematic, rigorous description of the fine structure for the large N asymptotics of the following family of integrals:

$$Z_N^{(k)}(t_1, t_2, \ldots, t_\upsilon) = \int \cdots \int \exp\left\{ -N^2 \left[ \frac{1}{N} \sum_{j=1}^{k} V(\lambda_j; t_1, \ldots, t_\upsilon) - \frac{1}{N^2} \sum_{j \neq \ell} \log|\lambda_j - \lambda_\ell| \right] \right\} d^k\lambda,$$

$$V(\lambda; t_1, \ldots, t_\upsilon) = V_t(\lambda) = V(\lambda) = \frac{1}{2}\lambda^2 + \sum_{j=1}^{\upsilon} t_j \lambda^j, \tag{1.1}$$

where the parameters $\{t_1, \ldots, t_\upsilon\}$ are assumed to be such that the integral converges. For example, one may suppose that υ is even, and $t_\upsilon > 0$. We will sometimes refer to


the following set of $t = (t_1, \ldots, t_\upsilon)$ for which (1.1) converges. For any given T > 0 and γ > 0, define

$$\mathbb{T}(T, \gamma) = \Big\{ t \in \mathbb{R}^\upsilon : |t| \le T,\ t_\upsilon > \gamma \sum_{j=1}^{\upsilon-1} |t_j| \Big\}.$$

The parameter k is an integer that grows with N in such a way that $\lim_{N\to\infty} k/N = x$, where x is a finite non-zero value whose role will be specified more precisely later. In this paper we derive a hierarchy of differential equations which uniquely determine the coefficients in the asymptotic expansion of $\log Z_N^{(N)}$ for monic even coupling parameters; i.e., we present the $e_g(t_{2\nu})$, for arbitrary ν, as solutions to a system of ordinary differential equations. From this one can deduce functional analytic characterizations of these coefficients. Moreover, this ODE system can be solved recursively in g to explicitly construct $e_g(t_{2\nu})$. We illustrate this process by constructing closed form expressions for $e_g(t_{2\nu})$, in which ν appears as a parameter, for low values of g. This analysis of the fine structure of the $e_g$ can be extended to multiple coupling parameters and we present a limited illustration of this for the case of two parameters: $e_g(t_{2\nu_1}, t_{2\nu_2})$.

Remark. In [2], the so-called “Loop Equation” method is used to obtain some information about the fine structure of the coefficients. This approach is based on a formal derivation of a hierarchy of equations for the Cauchy transform of the mean density of eigenvalues. This interesting approach is unsatisfactory in that it relies on several interchanges of singular limits whose justification requires analytical considerations beyond the existence of the complete asymptotic expansion of the partition function. These analytical considerations are the subject of a forthcoming paper by Ercolani and McLaughlin [17].

1.1. Leading order asymptotics. The leading order behavior of $Z_N^{(k)}(t_1, t_2, \ldots, t_\upsilon)$ is rather classical, and is known for a very wide class of external fields V (see, for example, [22]). We will require the following result.

Theorem 1.1. There is T0 > 0 and γ0 > 0 so that for all t ∈ T(T, γ), x ∈ [1/2, 1], and k/N → x as k, N → ∞, the following holds true:

(1)
$$\lim_{N\to\infty} \frac{1}{k^2} \log\{Z_N^{(k)}(t_1, t_2, \ldots, t_\upsilon)\} = -I(x, t_1, \ldots, t_\upsilon), \tag{1.2}$$

where

$$I(x, t_1, \ldots, t_\upsilon) = \inf_{\text{Borel measures } \mu,\ \mu \ge 0,\ \int d\mu = 1} \left[ \frac{1}{x} \int V(\lambda)\, d\mu(\lambda) - \iint \log|\lambda - \eta|\, d\mu(\lambda)\, d\mu(\eta) \right]. \tag{1.3}$$


(2) There is a unique measure $\mu_V$ which achieves the infimum defined on the right-hand side of (1.3). This measure is absolutely continuous with respect to Lebesgue measure, and

$$d\mu_V = \psi\, d\lambda, \qquad \psi(\lambda) = \frac{1}{2\pi} \chi_{(\alpha,\beta)}(\lambda) \sqrt{(\lambda - \alpha)(\beta - \lambda)}\; h(\lambda),$$

where h(λ) is a polynomial of degree υ − 2, which is strictly positive on the interval [α, β] (recall that the external field V is a polynomial of degree υ). The polynomial h is defined by

$$h(z) = \frac{1}{2\pi i x} \oint \frac{V'(s)}{\sqrt{s - \alpha}\,\sqrt{s - \beta}}\, \frac{ds}{s - z},$$

where the integral is taken on a circle containing (α, β) and z in the interior, oriented counter-clockwise.

(3) There exists a constant l, depending on V, such that the following variational equations are satisfied by $\mu_V$:

$$2\int \log|\lambda - \eta|^{-1}\, d\mu_V(\eta) + x^{-1} V(\lambda) \ge l \quad \text{for } \lambda \in \mathbb{R} \setminus \operatorname{supp}(\mu_V),$$
$$2\int \log|\lambda - \eta|^{-1}\, d\mu_V(\eta) + x^{-1} V(\lambda) = l \quad \text{for } \lambda \in \operatorname{supp}(\mu_V). \tag{1.4}$$

(4) The endpoints α and β are determined by the equations

$$\int_\alpha^\beta \frac{V'(s)}{\sqrt{(s - \alpha)(\beta - s)}}\, ds = 0, \qquad \int_\alpha^\beta \frac{s V'(s)}{\sqrt{(s - \alpha)(\beta - s)}}\, ds = 2\pi x.$$

(5) The endpoints α(x, t) and β(x, t) are actually analytic functions of t and x, which possess smooth extensions to the closure of {x, t : x ∈ [1/2, 1], t ∈ T(T, γ)}. They also satisfy −α(1, 0) = β(1, 0) = 2. In addition, the coefficients of the polynomial h(λ) are also analytic functions of t and x, with smooth extensions to the closure of {x, t : x ∈ [1/2, 1], t ∈ T(T, γ)}, with h(λ, x = 1, t = 0) = 1.

Remark. The variational problem appearing in (1.3) is a fundamental component in the theory of random matrices, as well as integrable systems and approximation theory. It is well known (see, for example, [29]) that under general assumptions on V, the infimum is achieved at a unique measure $\mu_V$, called the equilibrium measure.
For external fields V that are analytic in a neighborhood of the real axis, and with sufficient growth at ∞, the equilibrium measure is supported on finitely many intervals, with density that is analytic on the interior of each interval, behaving at worst like a square root at each endpoint, (see [10] and [11]). Remark. We call the reader’s attention to the parameter, x, in the formulation of the variational problem. We will consider the variational problem for x ∈ (0, 1], and we are particularly interested in x near 1. This parameter represents the asymptotic ratio of k to N : x = lim N →∞ k/N .
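The endpoint conditions of Theorem 1.1 (4) can be checked numerically in a simple case. The following is a minimal sketch, assuming the even quartic field V(s) = s²/2 + t s⁴ (so that α = −β) and taking β from the relation 1 = z + 12xtz² with z = β²/(4x) that appears later in the text; the function names and the sample values of x and t are illustrative choices, not part of the paper.

```python
import math

def quartic_beta(x, t):
    """Endpoint beta for V = s^2/2 + t*s^4, from 1 = z + 12*x*t*z^2, z = beta^2/(4x)."""
    if t == 0:
        z = 1.0
    else:
        z = (-1.0 + math.sqrt(1.0 + 48.0 * x * t)) / (24.0 * x * t)
    return 2.0 * math.sqrt(x * z)

def endpoint_integrals(beta, t, nu, n=2000):
    """Return the two integrals of Theorem 1.1 (4) for alpha = -beta.

    With s = beta*sin(theta) the weight ds/sqrt(beta^2 - s^2) becomes d(theta),
    so a plain midpoint rule on [-pi/2, pi/2] is enough.
    """
    def dV(s):  # V'(s) for V(s) = s^2/2 + t*s^(2*nu)
        return s + 2 * nu * t * s ** (2 * nu - 1)

    h = math.pi / n
    first = second = 0.0
    for k in range(n):
        s = beta * math.sin(-math.pi / 2 + (k + 0.5) * h)
        first += dV(s) * h
        second += s * dV(s) * h
    return first, second

x, t = 0.8, 0.05  # sample parameter values (assumed for illustration)
beta = quartic_beta(x, t)
first, second = endpoint_integrals(beta, t, nu=2)
print(first, second, 2 * math.pi * x)
```

The first integral vanishes by symmetry, and the second should agree with 2πx up to rounding, which is the content of the two endpoint equations in this symmetric case.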


Remark. For a proof of (1.2), we refer the reader to [22]; however, this result is commonly known in the approximation theory literature.

Remark. It will prove useful to adopt the following alternative presentation for the function ψ:

$$\psi(\lambda) = \frac{1}{2\pi i} R_+(\lambda) h(\lambda), \quad \lambda \in (\alpha, \beta),$$

where the function R(λ) is defined via $R(\lambda)^2 = (\lambda - \alpha)(\lambda - \beta)$, with R(λ) analytic in C \ [α, β], and normalized so that R(λ) ∼ λ as λ → ∞. The subscript ± in $R_\pm(\lambda)$ denotes the boundary value obtained from the upper (lower) half plane.

1.2. Complete asymptotic expansion. In [16] it was established that a complete large N asymptotic expansion of (1.1) exists. In this paper we will use a straightforward generalization of this result:

Theorem 1.2. There is T > 0 and γ > 0 so that for t ∈ T(T, γ), and x = k/N in a neighborhood of x = 1, one has the N → ∞ asymptotic expansion

$$\log \frac{Z_N^{(k)}(t)}{Z_N^{(k)}(0)} = k^2 e_0(x, t) + e_1(x, t) + \frac{1}{k^2} e_2(x, t) + \cdots. \tag{1.5}$$

The meaning of this expansion is: if you keep terms up to order $k^{-2h}$, the error term is bounded by $Ck^{-2h-2}$, where the constant C is independent of x and t for all t ∈ T(T, γ) and for all x in the neighborhood of 1. For each j, the function $e_j(x, t)$ is an analytic function of the (complex) vector (x, t), in a neighborhood of (1, 0). Moreover, the asymptotic expansion of derivatives of $\log Z_N^{(k)}$ may be calculated via term-by-term differentiation of the above series.

Remark. In [16], this result was established in the case where x = 1, and under the assumption that t ∈ T(T, γ), for T small enough, and γ large enough, so that Theorem 1.2 holds true. Under these assumptions, Theorem 1.3 (below) was established. However, as observed in [16] (Remark 2.1, p. 2), the domain so defined is by no means the largest domain where the asymptotic expansion can be rigorously established. All that is required is the existence of a path through the space of parameters (values of x and t) connecting (x, t) to (1, 0) in such a way that all along the path, the associated equilibrium measure is supported on a single interval, with strict variational inequality on the support, strict positivity on the interval of support, and vanishing like a square root at both endpoints of the support. The collection of all such values of (x, t) defines a suitable candidate for a maximal domain, and the proof contained in [16] can easily be extended to show that the asymptotic expansion of the partition function holds on the interior of such a domain. In particular, the above theorem may be easily deduced along these lines.

Remark. Recently, Bleher and Its [8] have carried out a similar asymptotic expansion of the partition function for a 1-parameter family of external fields. A very interesting aspect of their work is that they establish the nature of the asymptotic expansion of the partition function through a critical phase transition.


1.3. Graphical enumeration and the partition function expansion. Our goal in the work we present here is to establish analytical characterizations of the coefficients $e_g$ and, when possible, to derive explicit expressions for these coefficients. This is what we mean by the fine structure of the expansion. In addition to providing the first proof of the asymptotic expansion described in Theorem 1.2, [16] also provides a very detailed explanation of the connection between the asymptotic expansion and enumerative geometry, originally investigated by physicists in the 70s and 80s (see, for example, [5,13], and references contained therein). Equipped with the existence of the asymptotic expansion (and the subsequent result that it may be differentiated term by term), one shows that there is a geometric characterization of each $e_g$ as a generating function for enumerating topologically distinct embeddings of graphs into Riemann surfaces of genus g.

A map D on a compact, oriented connected surface X is a pair D = (K(D), [ı]), where

(1) K(D) is a connected 1-complex;
(2) [ı] is an isotopical class of inclusions ı : K(D) → X;
(3) the complement of K(D) in X is a disjoint union of open cells (faces);
(4) the complement of $K_0(D)$ (vertices) in K(D) is a disjoint union of open segments (edges).

The $e_g$ enumerate labelled maps. To be precise we introduce the notion of a g-map, which is a map in which the surface X is the closed, oriented Riemann surface of genus g and which in addition carries a labelling (ordering) of the vertices.

Theorem 1.3 [16]. The coefficients in the asymptotic expansion (1.5) satisfy the following relations. Let g be a nonnegative integer. Then

$$e_g(t_1, \ldots, t_\upsilon) = \sum_{n_j \ge 1} \frac{1}{n_1! \cdots n_\upsilon!} (-t_1)^{n_1} \cdots (-t_\upsilon)^{n_\upsilon}\, \kappa_g(n_1, \ldots, n_\upsilon),$$

in which each of the coefficients $\kappa_g(n_1, \ldots, n_\upsilon)$ is the number of g-maps with $n_j$ j-valent vertices for j = 1, ..., υ.

1.4. Outline. The organization of this paper is as follows: In Sect. 2 we present the new results concerning the fine structure of the $e_g$ and related generating functions that will be proven and further explained in the remainder of the paper. Section 3 is concerned with the leading order term, $e_0$. The results here are fundamental for the characterization of all the higher order terms. We derive closed form expressions for $e_0$ as a function of each of the valence coupling parameters $t_{2\nu}$. We also relate these evaluations directly and explicitly to the enumeration of planar graphs. In Sect. 4 a continuum limit of the Toda Lattice hierarchy is rigorously derived in which the hierarchy of Toda times corresponds to the valence coupling parameters $t_{2\nu}$. This continuum limit is then used to derive another hierarchy of differential equations whose solutions are the $e_g$. Finally in Sect. 5 we show how the differential equations derived in the previous section are used to inductively generate explicit expressions for the $e_g$. From this we characterize the function-theoretic structure of the $e_g$ as well as present explicit formulae for the $e_g$ for low values of g. We also show how our results may be extended to the case of multiple times.


2. Results

For $e_0$ we have explicit formulas for monic even times.

Theorem 2.1. For potentials V of the form $V = \frac{1}{2}\lambda^2 + t_{2\nu}\lambda^{2\nu}$, the asymptotic expansion (1.5) holds true for all $t_{2\nu} \ge 0$, and in addition, we have the explicit formula

$$e_0 = \eta (z - 1)(z - r) + \frac{1}{2} \log(z),$$

where

$$\eta = \frac{(\nu - 1)^2}{4\nu(\nu + 1)}, \qquad r = \frac{3(\nu + 1)}{\nu - 1}, \qquad z = \frac{\beta^2}{4}.$$

Here 4z can be interpreted as the global analytic continuation of $\beta^2$ which determines the support (−β, β) of the equilibrium measure. The variable z is locally an analytic function of $t_{2\nu}$, which satisfies the algebraic relation

$$1 = z + 2\nu \binom{2\nu - 1}{\nu - 1} x^{\nu - 1}\, t_{2\nu}\, z^\nu.$$

The singularities of $e_0$ occur at z = 0 and z = ∞. The time derivative

$$\frac{\partial e_0}{\partial t_{2\nu}} = \binom{2\nu - 1}{\nu - 1} z^\nu \big((\nu - 1) z - (\nu + 1)\big)$$

is polynomial in z. One also has a local analytical representation (here the index $n_{2\nu}$ is replaced by n, so that $\kappa_0(0, \ldots, 0, n_{2\nu})$ becomes $\kappa_0(n)$),

$$e_0(t_{2\nu}) = \sum_{n=1}^{\infty} \kappa_0(n) \frac{(-t_{2\nu})^n}{n!}, \qquad \kappa_0(n) = (c_\nu)^n \frac{(\nu n - 1)!}{((\nu - 1) n + 2)!}, \qquad c_\nu = 2\nu \binom{2\nu - 1}{\nu - 1},$$

where $\kappa_0(n) = \kappa_0(n_{2\nu})$ is the generating function for 2ν-valent 0-maps.

To get a handle on how the higher coefficients $e_g$ depend on the parameters $t = t_{2\nu}$ we exploit a remarkable relation between the partition function $Z_N^{(N)}(t)$ and the solutions to the hierarchy of completely integrable semi-infinite Toda lattice equations. This relation is classically known, coming from several different directions: for Toda lattice equations and Jacobi matrices see [19]; for orthogonal polynomials and Jacobi matrices see, for example, [21]; for Hankel matrices and orthogonal polynomials see, for example, [30]; and for orthogonal polynomials and random matrix theory see [24]. This relation will be


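further explained in Sect. 4. These differential equations may be succinctly expressed through the semi-infinite tri-diagonal matrix

$$\mathcal{L} = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ b_0^2 & 0 & 1 & 0 & \cdots \\ 0 & b_1^2 & \ddots & \ddots & \ddots \\ 0 & 0 & \ddots & 0 & 1 \\ \vdots & \vdots & \ddots & b_n^2 & \ddots \end{pmatrix}.$$

The Toda Lattice system at level 2ν can then be defined as

$$\frac{1}{2} \frac{d b_k^2}{d\xi} = (\mathcal{L}^{2\nu})_{k+2,k} - (\mathcal{L}^{2\nu})_{k+1,k-1}, \tag{2.1}$$

$$(\mathcal{L}^{2\nu})_{k+1,k-1} = \sum_{\substack{i_1, i_2, \ldots, i_{2\nu+1};\ |i_{j+1} - i_j| = 1; \\ i_1 = k+1,\ i_{2\nu+1} = k-1}} \mathcal{L}_{k+1,i_2}\, \mathcal{L}_{i_2,i_3} \cdots \mathcal{L}_{i_{2\nu},k-1}. \tag{2.2}$$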
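The walk expansion (2.2) can be illustrated on a finite truncation of the matrix. The sketch below assumes rows and columns are indexed from 0, with entries equal to 1 on the superdiagonal and $b_i^2$ in position (i+1, i); the sample values of $b_i^2$ and the truncation size are arbitrary illustrative choices, not taken from the paper.

```python
def toda_matrix(b2, size):
    """Truncated tridiagonal matrix: L[i][i+1] = 1 and L[i+1][i] = b2[i]."""
    L = [[0] * size for _ in range(size)]
    for i in range(size - 1):
        L[i][i + 1] = 1
        L[i + 1][i] = b2[i]
    return L

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, p):
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(p):
        R = mat_mul(R, A)
    return R

def walk_sum(L, start, end, length):
    """Sum of products of matrix entries over nearest-neighbour walks start -> end."""
    n = len(L)
    total = 0

    def extend(pos, steps_left, weight):
        nonlocal total
        if steps_left == 0:
            if pos == end:
                total += weight
            return
        for nxt in (pos - 1, pos + 1):
            if 0 <= nxt < n:
                extend(nxt, steps_left - 1, weight * L[pos][nxt])

    extend(start, length, 1)
    return total

size, k, nu = 12, 4, 2
b2 = [3 * i + 2 for i in range(size - 1)]  # arbitrary sample values of b_i^2
L = toda_matrix(b2, size)
print(mat_pow(L, 2 * nu)[k + 1][k - 1], walk_sum(L, k + 1, k - 1, 2 * nu))
```

For ν = 1 the right-hand side of (2.1) reduces, under the same indexing assumption, to $b_{k+1}^2 b_k^2 - b_k^2 b_{k-1}^2$, which the same helper functions reproduce.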

The sum here is indexed by walks of length 2ν along the 1D integer lattice from k + 1 to k − 1. The solution of this system may be expressed directly in terms of the partition function $Z_k(t_1, t) = Z_k^{(k)}(t_1, t)$ associated to the potential $V = \frac{1}{2}\lambda^2 + t_1\lambda + t\lambda^{2\nu}$:

$$b_k^2(\xi) = k\, \frac{d^2}{dt_1^2}\, \frac{1}{k^2} \log Z_k(t_1, s) \Big|_{t_1 = -k^{-1/2}\xi_1 = 0,\ s = 2\xi k^{\nu - 1}}.$$

As a dynamical system, (2.1) should be considered as an initial value problem, with $b_k(0)^2 = k$. We can now state our next main result, which characterizes the continuum limit of the Toda lattice hierarchy.

Theorem 2.2. For all t ≥ 0, $b_k^2$ has a valid asymptotic expansion of the form

$$b_k^2 \sim k \left( z_0(s) + \frac{1}{k^2} z_1(s) + \frac{1}{k^4} z_2(s) + \cdots \right),$$

where $s = -2k^{\nu - 1} t$. The terms of this expansion are determined by the following partial differential scheme:

$$f_s = c_\nu f^\nu f_w + \frac{1}{k^2} F_1^{(\nu)}(f, f_w, f_{ww}, f_{www}) + \cdots + \frac{1}{k^{2g}} F_g^{(\nu)}(f, f_w, f_{w^{(2)}}, \ldots, f_{w^{(2g+1)}}) + \cdots$$

evaluated at w = 1, where

$$f(s, w) = f_0(s, w) + \frac{1}{k^2} f_1(s, w) + \cdots + \frac{1}{k^{2g}} f_g(s, w) + \cdots, \quad \text{and}$$
$$f(s, 1) = z_0(s) + \frac{1}{k^2} z_1(s) + \frac{1}{k^4} z_2(s) + \cdots,$$
$$f_g(s, w) = w^{1 - 2g}\, z_g(w^{\nu - 1} s).$$


Note that $b_k^2$ and $k f(s, 1)$ possess the same asymptotic expansion. The forcing term $F_j^{(\nu)}(\cdots)|_{w=1}$ is a homogeneous multinomial of degree ν + 1 in the $f_{w^{(r)}}$ which does not contain any instances of $z_\alpha$ for α ≥ j. These forcing terms have the following form:

$$F_g^{(\nu)} = \sum_{\substack{V : |V| = 2g+1 \\ \rho(V) \le \nu + 1}} d_V^{(\nu,g)}\, f^{\nu - \rho(V) + 1} \prod_{j=1}^{2g+1} \left( \frac{f_{w^{(j)}}}{j!} \right)^{r_j(V)},$$

where $V = \bigcup_{m=1}^{\rho(V)} V_m$ is a partition of 2g + 1; $r_j$ is the number of times a “part”, $V_m$, of cardinality $|V_m| = j$ appears in the partition; $\rho = \rho(V) = \sum_j r_j(V)$; and (ν,g)

dV

1 = #2g+1 j=1

r j ! 1≤i1 <···
coeff of x ν−ρ(V )+1 in P(x)

− coeff of x ν−ρ(V )+1 in Q(x) ; where, |Vσ (m) | m ρ(V ) ∂ (1 + xs )+2 P(x) = im − 2 ∂ xs σ ∈S m=1

s=1

i 1 −1

· (1 + x1 ) · · · (1 + xs )is −is−1 −1 · · · (1 + xρ )iρ −iρ−1 −1 (1 + xρ+1 )2ν−iρ |xµ =x , |Vσ (m) | m ρ(V ) ∂ Q(x) = (1 + xs )+1 im − 2 ∂ xs σ ∈S m=1

s=1

i 1 −1

· (1 + x1 )

· · · (1 + xs )is −is−1 −1 · · · (1 + xρ+1 )2ν−iρ |xµ =x .

Remark. We refer to the above as a partial differential scheme because it signifies not an equation to be solved but rather a prescription for generating a hierarchy of ordinary differential equations for the $z_g$. The ODE hierarchy is constructed from the scheme as follows. The ODE at level g is obtained by replacing the expansion f and its w-derivatives, $f_w$, $f_{ww}$, etc., in the PDE scheme by their order g truncations. Then, equating the coefficients of all terms of order $k^{-2g}$ in this truncated scheme and setting w = 1 yields a k-independent ODE in s which is the g-th equation of the continuum-Toda hierarchy. This is an ODE for $z_g(s)$ in terms of $z_j(s)$ for j ≤ g. The first equation in the hierarchy (i.e., the one coming from the $k^0$-coefficients of the above scheme) is a nonlinear ODE for $z_0$:

$$z_0'(s) = c_\nu z_0(s)^\nu \big[ z_0(s) + (\nu - 1) s z_0'(s) \big],$$

with the initial condition $z_0(0) = 1$. This ODE is solved implicitly by a solution to the algebraic equation

$$1 = z_0(s) - c_\nu s z_0(s)^\nu. \tag{2.3}$$

As indicated in Theorem 1.1 (5), this relation can also be derived directly from the characterization of the equilibrium measure. Relation (2.3) allows us to write s as well as derivatives of $z_0$ as rational functions of $z_0$. This will be exploited to arrive at the explicit representations of the $z_g$ given below.
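As a sanity check on the leading-order equation, the algebraic relation 1 = z₀ − c_ν s z₀^ν can be solved as a power series in s and substituted into the ODE. The sketch below uses exact rational arithmetic; the fixed-point iteration and the comparison with the Fuss–Catalan form of the coefficients (obtained from Lagrange inversion) are illustrative choices and are not taken from the paper.

```python
from fractions import Fraction
from math import comb

def c_nu(nu):
    return 2 * nu * comb(2 * nu - 1, nu - 1)

def series_mul(a, b, order):
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out

def series_pow(a, n, order):
    out = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(n):
        out = series_mul(out, a, order)
    return out

def z0_series(nu, order):
    """Taylor coefficients of z0(s) from the fixed point z = 1 + c_nu * s * z^nu."""
    z = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(order):  # each pass fixes one more coefficient
        p = series_pow(z, nu, order)
        z = [Fraction(1)] + [c_nu(nu) * p[n - 1] for n in range(1, order + 1)]
    return z

def ode_residual(nu, order):
    """Coefficients of z0' - c_nu z0^nu (z0 + (nu-1) s z0'), up to s^(order-1)."""
    z = z0_series(nu, order)
    dz = [(n + 1) * z[n + 1] for n in range(order)]
    inner = [z[n] + (nu - 1) * n * z[n] for n in range(order)]  # z0 + (nu-1) s z0'
    rhs = series_mul(series_pow(z[:order], nu, order - 1), inner, order - 1)
    return [dz[n] - c_nu(nu) * rhs[n] for n in range(order)]

print(z0_series(2, 3))  # first coefficients for nu = 2
```

For ν = 2 the coefficients are 12ⁿ times the Catalan numbers, and the residual of the ODE vanishes to the truncation order.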


The $k^{-2g}$ equation in the hierarchy is linear in $z_g$, and can be written as:

$$z_g'(s) = c_\nu \Big[ f_0^\nu f_{g\,w} + \nu f_0^{\nu - 1} f_g f_{0\,w} \Big]_{w=1} + \mathrm{Forcing}_g \big|_{w=1}, \tag{2.4}$$

where

$$\mathrm{Forcing}_g = \frac{c_\nu}{\nu + 1} \frac{\partial}{\partial w} \Bigg( \sum_{\substack{0 \le i_j < g \\ i_1 + \cdots + i_{\nu+1} = g}} f_{i_1} \cdots f_{i_{\nu+1}} \Bigg) + F_1^{(\nu)}[2g - 2] + F_2^{(\nu)}[2g - 4] + \cdots + F_g^{(\nu)}[0], \tag{2.5}$$

and $F_j^{(\nu)}[2r]$ denotes the coefficient of $k^{-2r}$ in $F_j^{(\nu)}$. We note that the terms in $\mathrm{Forcing}_g|_{w=1}$ depend only on $z_j$, j < g, and their derivatives.

Remark. This is a particular continuum limit of the Toda lattice, variants of which have been studied extensively in the integrable systems literature [12,6]. However, the application of this continuum limit of the Toda lattice to random matrix theory has, as far as we know, not been explored. In particular, this is not the continuum limit which arises in the so-called “double scaling” limit [31] in which the leading order equation is said to be the KdV equation. The leading order equation in our system is the inviscid Burgers’ equation. It is also important to note that the continuum limit equations are completely distinct from so-called Virasoro constraints, which have been much discussed in the physics literature (see [14,23] and references therein). For the case of planar maps (g = 0) such generating functions have received significant attention recently [7,9]. Our explicit calculations presented later can provide a basis for extending these studies.

Before proceeding to the statement of our next result we need to introduce a scale of function classes that will enable us to describe the functional nature of the coefficients $z_g$ that we have just introduced as well as that of the generating functions $e_g$. We will refer to the classes as iterated integrals of rational functions or iir for short. These classes are defined inductively in terms of the variable $z = z_0$ regarded as an independent variable. To begin with, the class contains rational functions of z. One then adds integrals of these rational functions with respect to dz. Next one considers the vector space of polynomials in products of these integrals over the field of rational functions in z and augments the space by integrals, with respect to dz, of these functions. Then take the vector space of polynomials in these latter integrals and add integrals of these.
One continues this iterative process up to any given finite stage. These are the classes of functions we refer to as iir. In our case the rational functions at any stage will be restricted to the sub-ring of functions whose poles are located at either z = 0 or z = ν/(ν − 1) for a fixed value of ν. These classes of functions will certainly include the class generated by polylogarithms [18] but may be larger. Theorem 2.3. (1) The coefficient z g appearing in Theorem 2.2 is of class iir in z 0 with singularities only possible at z 0 = 0 and z 0 = ν/(ν − 1).


(2) The coefficient $z_g$ is more explicitly presented as a function of $z_0$ through the following integral solution of Eq. (2.4):

$$z_g(s) = \frac{z_0(s)^{2(1-g)}}{\nu - (\nu - 1) z_0(s)} \int_1^{z_0(s)} \frac{\nu - (\nu - 1) y}{c_\nu\, y^{\nu + 3 - 2g}}\, \mathrm{Forcing}_g(y)\, dy.$$

(3) In the above equation, $\mathrm{Forcing}_g$, formerly a function of finitely many terms in the expansion of f(s, w) and their derivatives, is in fact a function of $z_0(s)$ alone, which will henceforth be denoted as $\mathrm{Forcing}_g(z_0)$.

We also derive a hierarchy of differential equations for the $e_g(t_{2\nu})$ with data given in terms of the $z_j$.

Theorem 2.4. The g-th equation in the hierarchy of equations governing $e_g(t_{2\nu})$ is

$$\frac{\partial^2}{\partial w^2} \Big[ w^{2-2g} e_g(-w^{\nu-1} s) \Big] \Big|_{w=1} = -\sum_{n=1}^{g} \frac{2}{(2n+2)!} \frac{\partial^{(2n+2)}}{\partial w^{(2n+2)}} \Big[ w^{2-2(g-n)} e_{g-n}(-w^{\nu-1} s) \Big] \Big|_{w=1} + \text{the } k^{-2g} \text{ term of } \log\left( \sum_{n=0}^{\infty} \frac{1}{k^{2n}} z_n(s) \right). \tag{2.6}$$

Equation (2.6) determines $e_g(-s)$ from a second order differential equation for $e_g$ with forcing terms depending on $e_n$, n < g, $z_n$, n ≤ g, and their derivatives.

Remark. Observe that the RHS of (2.6) is a function of $z_0(s)$ which will henceforth be denoted by $\mathrm{drivers}_g(z_0)$.

Theorem 2.5. (1) The coefficient $e_g(-s)$ is of class iir in $z_0$ with singularities restricted to $z_0 = 0$ and $z_0 = \nu/(\nu - 1)$.

(2) The solution of (2.6) may be represented as

$$\begin{aligned} e_g(-s) = {} & -\frac{1}{(2 - 2g)(1 - 2g)}\, \mathrm{drivers}_g(z_0(s)) \\ & - \frac{1}{2 - 2g} \left( \frac{c_\nu z_0(s)^\nu}{z_0(s) - 1} \right)^{(2-2g)/(\nu-1)} \int_1^{z_0(s)} \left( \frac{y - 1}{c_\nu y^\nu} \right)^{(2-2g)/(\nu-1)} \big[ \mathrm{drivers}_g(y) \big]^{\bullet}\, dy \\ & + \frac{1}{1 - 2g} \left( \frac{c_\nu z_0(s)^\nu}{z_0(s) - 1} \right)^{(1-2g)/(\nu-1)} \int_1^{z_0(s)} \left( \frac{y - 1}{c_\nu y^\nu} \right)^{(1-2g)/(\nu-1)} \big[ \mathrm{drivers}_g(y) \big]^{\bullet}\, dy \\ & + K_1 s^{(2g-2)/(\nu-1)} + K_2 s^{(2g-1)/(\nu-1)} \end{aligned} \tag{2.7}$$

when g ≠ 1, where $K_1$ and $K_2$ are constants of integration either determined by the requirement that $e_g$ be a locally analytic function of s or by the evaluation of $e_g$


for low values of ν through its combinatorial characterization; and, when g = 1,

$$\begin{aligned} e_1(-s) & = \frac{1}{\nu - 1} \Bigg[ \left( \frac{z_0(s) - 1}{c_\nu z_0(s)^\nu} \right)^{1/(\nu-1)} \int_1^{z_0(s)} \left( \frac{c_\nu y^\nu}{y - 1} \right)^{\nu/(\nu-1)} \frac{\nu - (\nu - 1) y}{c_\nu y^{\nu+1}}\, \mathrm{drivers}_1(y)\, dy - \int_1^{z_0(s)} \frac{\nu - (\nu - 1) y}{y (y - 1)}\, \mathrm{drivers}_1(y)\, dy \Bigg] \\ & = -\frac{1}{12} \log\big( \nu - (\nu - 1) z_0(s) \big), \end{aligned} \tag{2.8}$$

where we have chosen the principal branch of the logarithm. By $[\mathrm{drivers}_g(y)]^{\bullet}$ we mean the derivative of $\mathrm{drivers}_g(y)$ with respect to y. One can also derive explicit formulae for $e_g$, for low values of g, in terms of $z_0$ (see for example Sect. 5.9) from (2.7).
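The closed forms for $e_0$ (Theorem 2.1) and $e_1$ (Eq. (2.8)) can be compared with their Taylor expansions in t using exact arithmetic. The following is a rough sketch, assuming x = 1, z determined by 1 = z + c_ν t z^ν, and z₀(s) = z(t) with s = −t; the helper names and the truncation order are illustrative choices.

```python
from fractions import Fraction
from math import comb, factorial

ORDER = 5  # all power series in t are truncated after t^ORDER

def mul(a, b):
    out = [Fraction(0)] * (ORDER + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= ORDER:
                out[i + j] += ai * bj
    return out

def c_nu(nu):
    return 2 * nu * comb(2 * nu - 1, nu - 1)

def z_series(nu):
    """Series solution of 1 = z + c_nu * t * z^nu around t = 0 (x = 1 assumed)."""
    z = [Fraction(1)] + [Fraction(0)] * ORDER
    for _ in range(ORDER):
        p = [Fraction(1)] + [Fraction(0)] * ORDER
        for _ in range(nu):
            p = mul(p, z)
        z = [Fraction(1)] + [-c_nu(nu) * p[n - 1] for n in range(1, ORDER + 1)]
    return z

def log_series(a):
    """log of a series with a[0] == 1, using a * (log a)' = a'."""
    out = [Fraction(0)] * (ORDER + 1)
    for n in range(1, ORDER + 1):
        acc = n * a[n] - sum(k * out[k] * a[n - k] for k in range(1, n))
        out[n] = acc / n
    return out

def e0_series(nu):
    z = z_series(nu)
    eta = Fraction((nu - 1) ** 2, 4 * nu * (nu + 1))
    r = Fraction(3 * (nu + 1), nu - 1)
    prod = mul([z[0] - 1] + z[1:], [z[0] - r] + z[1:])
    return [eta * p + l / 2 for p, l in zip(prod, log_series(z))]

def e1_series(nu):
    z = z_series(nu)
    arg = [nu - (nu - 1) * z[0]] + [-(nu - 1) * zn for zn in z[1:]]
    return [-l / 12 for l in log_series(arg)]

def kappa0(nu, n):
    return Fraction(c_nu(nu) ** n * factorial(nu * n - 1),
                    factorial((nu - 1) * n + 2))

print(e0_series(2)[:3], e1_series(2)[:3])
```

For ν = 2 the expansion of $e_0$ begins −2t + 18t², matching $\kappa_0(1) = 2$ and $\kappa_0(2) = 36$, and the expansion of $e_1$ begins −t + 30t².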

3. Leading Order

The goal of this section is to prove the results stated in Theorem 2.1. The early subsections develop the necessary background and ingredients for the derivations of the explicit formulae presented in this theorem. The results themselves emerge in Subsects. 3.4 and 3.5. We will show that the leading order coefficient, $e_0(x, t)$, of the asymptotic partition function, $k^{-2} \log(Z_N^{(k)}(t)/Z_N^{(k)}(0))$, is found in terms of the equilibrium measure $\mu = \mu_{V_t/x}$. This leading order coefficient, which satisfies a potential-theoretic variational problem, has been studied from several perspectives in the literature. We point out that what is novel here is that we identify a general expression for this coefficient which has a similar form for all external fields of the form $(1/2)\lambda^2 + t_{2\nu}\lambda^{2\nu}$. Define ψ(λ) by dµ = ψ(λ)dλ. We denote the leading order behavior

$$E_t = \lim_{k\to\infty} -\frac{1}{k^2} \log Z_N^{(k)}(t) = I(x, t),$$

and

$$\lim_{k\to\infty} -\frac{1}{k^2} \log\left( \frac{Z_N^{(k)}(t)}{Z_N^{(k)}(0)} \right) = E_t - E_0.$$

More explicitly, by (1.3)

$$E_t = \int \frac{V_t(\lambda)}{x}\, d\mu(\lambda) + \iint \log|\lambda - \eta|^{-1}\, d\mu(\lambda)\, d\mu(\eta) \tag{3.1}$$
$$\phantom{E_t} = \left( \frac{V_t(\lambda)}{x}, \psi(\lambda) \right) - (L\psi, \psi), \tag{3.2}$$

where

$$(L f)(\lambda) = \int \log|\lambda - \eta|\, f(\eta)\, d\eta$$

is the logarithmic potential of the measure f(η)dη, and where the inner product (·, ·) is defined by

$$(f, g) = \int f(\lambda) g(\lambda)\, d\lambda.$$

Using (3.1) and (3.2) together with (1.5), we find that

$$e_0(t, x) = -\frac{1}{x} \left( \frac{1}{2}\lambda^2 + t\lambda^{2\nu}, \psi(\lambda) \right) + (L\psi, \psi) + \frac{1}{x} \left( \frac{1}{2}\lambda^2, \psi_0(\lambda) \right) - (L\psi_0, \psi_0). \tag{3.3}$$

We recall here the parameter l introduced in (1.4): for λ in the support of µ,

$$l = 2\int \log|\lambda - s|^{-1} \psi(s)\, ds + \frac{V(\lambda)}{x} = -2 (L\psi)(\lambda) + \frac{V(\lambda)}{x}.$$

Pairing this identity with ψ, which has total mass 1, gives $l = -2(L\psi, \psi) + (V, \psi)/x$. Using this we have the following reduced formula for $e_0$:

$$e_0 = -\frac{(V, \psi)}{2x} + \frac{(-l)}{2} + \frac{1}{4x} (\lambda^2, \psi_0) - \frac{(-l_0)}{2}. \tag{3.4}$$

This formula shows that there are two fundamental quantities that need to be calculated in order to evaluate $e_0$. These are the moment (V, ψ) and the Lagrange multiplier l associated to the constraint that the measure µ should have total mass 1. The other quantities appearing in (3.4), $(\lambda^2, \psi_0)$ and $l_0$, are evaluated by specializing the fundamental quantities at t = 0. Evaluating the fundamental quantities will require an explicit asymptotic expansion, for large λ, of the equilibrium measure ψ(λ)dλ. We develop this in the next section.

3.1. Explicit asymptotic expansion of the equilibrium measure. The equilibrium measure µ is of the form

$$\psi(\lambda) = \frac{1}{2\pi i}\, h(\lambda) \sqrt{\lambda^2 - \beta^2}\; \chi_{[-\beta,\beta]}(\lambda),$$

where h is a polynomial determined by

$$h(\lambda) = \frac{1}{2\pi i x} \oint \frac{V_t'(s)}{\sqrt{s^2 - \beta^2}\,(s - \lambda)}\, ds, \tag{3.5}$$

where the integral in (3.5) is taken along a simple closed counterclockwise contour large enough to contain the interval [−β, β] and the point λ. We will now evaluate the polynomial h by expanding the integrand of (3.5) for large λ and calculating the loop integral on this expansion.


Define the sequence $\{v_j\}_{j=0}^{\infty}$ by

$$\frac{1}{\lambda} \sqrt{\lambda^2 - \beta^2} = 1 - \sum_{i=0}^{\infty} \frac{v_i}{\lambda^{2i+2}},$$

whose Taylor coefficients can be computed to be

$$v_i = \frac{1}{4^i} \binom{2i - 1}{i - 1} \frac{\beta^{2i+2}}{i + 1}, \tag{3.6}$$

with $v_0$ defined to be $\beta^2/2$. Expand the polynomial h, given by (3.5), in terms of its coefficients $h_j$:

$$h(\lambda) = \frac{1}{x} \Bigg( 1 + \sum_{j=0}^{\nu - 1} h_j \lambda^{2j} \Bigg).$$

Next note that

$$\frac{\lambda}{\sqrt{\lambda^2 - \beta^2}} = \sum_{i=0}^{\infty} \frac{2(i + 1) v_i}{\beta^2} \frac{1}{\lambda^{2i}}.$$

A direct computation of (3.5) gives

$$h_j = 4\nu(\nu - j)\, t\, \frac{v_{\nu - 1 - j}}{\beta^2}. \tag{3.7}$$

The constraint that the total mass of µ should be 1,

$$1 = \int_{-\beta}^{\beta} \psi(\lambda)\, d\lambda = \frac{1}{2} \oint \psi(\lambda)\, d\lambda, \tag{3.8}$$

implicitly determines t as a function of $\beta^2$. Here, the integral is over a contour containing the interval [−β, β]. We compute the loop integral in (3.8) over a large contour and find that the constraint can be expressed as a relation between the coefficients $\{h_j\}$ and $\{v_j\}$:

$$2 = \sum_{j=0}^{\nu - 1} \frac{h_j v_j}{x} + \frac{v_0}{x}. \tag{3.9}$$

The relation (3.9) simplifies to the expression (using identities (3.6) and (3.7))

$$2x = \frac{\beta^2}{2} + \frac{\nu}{4^{\nu - 1}} \binom{2\nu - 1}{\nu - 1} t \beta^{2\nu}. \tag{3.10}$$

If we set $z = \beta^2/(4x)$, (3.10) becomes

$$1 = z + 2\nu \binom{2\nu - 1}{\nu - 1} x^{\nu - 1} t z^\nu.$$

When ν = 2,

$$h(\lambda) = \frac{1}{x} (1 + 2t\beta^2 + 4t\lambda^2),$$

with the constraint

$$1 = z + 12 x t z^2. \tag{3.11}$$
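The coefficient formula (3.6) and the reduction of the mass constraint (3.9) to (3.10) can be verified in exact arithmetic. In the sketch below, `b2` stands for β², and the sample values of β² and t are arbitrary; the recurrence used for the binomial series of √(1 − u) is a standard identity rather than something stated in the paper.

```python
from fractions import Fraction
from math import comb

def v_closed(i, b2):
    """v_i from (3.6), with v_0 = beta^2 / 2; b2 denotes beta^2."""
    if i == 0:
        return Fraction(b2) / 2
    return Fraction(comb(2 * i - 1, i - 1), 4 ** i * (i + 1)) * Fraction(b2) ** (i + 1)

def v_series(i, b2):
    """v_i read off from sqrt(1 - u) = 1 - sum_i (v_i / beta^(2i+2)) u^(i+1)."""
    a = Fraction(1)
    for n in range(1, i + 2):
        a = a * Fraction(2 * n - 3, 2) / n  # a_n = a_{n-1} * (n - 3/2) / n
    return -a * Fraction(b2) ** (i + 1)

def h_coeffs(nu, t, b2):
    """h_j from (3.7) for V = lambda^2/2 + t*lambda^(2 nu)."""
    return [4 * nu * (nu - j) * t * v_closed(nu - 1 - j, b2) / b2 for j in range(nu)]

def mass_constraint_sum(nu, t, b2):
    """x times the right-hand side of (3.9): v_0 + sum_j h_j v_j."""
    h = h_coeffs(nu, t, b2)
    return v_closed(0, b2) + sum(h[j] * v_closed(j, b2) for j in range(nu))

def mass_constraint_closed(nu, t, b2):
    """Right-hand side of (3.10)."""
    return Fraction(b2) / 2 + Fraction(nu * comb(2 * nu - 1, nu - 1), 4 ** (nu - 1)) * t * Fraction(b2) ** nu

b2, t = Fraction(7, 3), Fraction(1, 11)  # arbitrary sample values
print(h_coeffs(2, t, b2), mass_constraint_sum(2, t, b2))
```

For ν = 2 the list returned by `h_coeffs` is [2tβ², 4t], in agreement with the expression for h(λ) given above.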


3.2. Explicit calculation of l. We will first derive an analytic expression for l in terms of $\beta^2$. To this end we study the logarithmic potential of µ:

$$g(\lambda) = \int_{-\beta}^{\beta} \log(\lambda - s)\, \psi(s)\, ds.$$

This function is analytic in C \ (−∞, β]. For λ ∈ (−∞, β] we define two functions $g_+$ and $g_-$ by

$$g_\pm(\lambda) = \lim_{\epsilon \to 0} g(\lambda \pm i\epsilon).$$

Choosing the appropriate branch of the logarithm we find that these functions are expressible as

$$g_\pm(\lambda) = \int \log|\lambda - s|\, \psi(s)\, ds \pm i\pi \int_\lambda^\beta \psi(s)\, ds. \tag{3.12}$$

A calculation using (3.12) and (1.4) shows that

$$g_+(\lambda) + g_-(\lambda) - \frac{V(\lambda)}{x} + l = 2\int \log|\lambda - s|\, \psi(s)\, ds - \frac{V(\lambda)}{x} + l = 0, \tag{3.13}$$

and

$$g_+(\lambda) - g_-(\lambda) = -2 i \pi \int_\beta^\lambda \psi(s)\, ds. \tag{3.14}$$

Equation (3.14) implies that

$$g_+(\lambda) = g_-(\lambda) - 2 i \pi \int_\beta^\lambda \psi(s)\, ds;$$

therefore, if λ is in C \ (−∞, β], then Eq. (3.13) becomes

$$2 g(\lambda) - 2 i \pi \int_\beta^\lambda \psi(s)\, ds - \frac{V(\lambda)}{x} + l = 0,$$

and

$$-l = 2 g(\lambda) - \frac{V(\lambda)}{x} + \int_\beta^\lambda h(s) \sqrt{s^2 - \beta^2}\, ds. \tag{3.15}$$

This is the basic expression we shall use to calculate l. Since l is a constant it can be evaluated for any choice of λ. We will evaluate it by studying the limit of (3.15) as λ → ∞. First observe that in this limit

$$g(\lambda) = \log(\lambda) + O\left( \frac{1}{\lambda} \right).$$

Thus the principal issue is to develop an asymptotic expansion of the indefinite integral

$$\int_\beta^\lambda h(s) \sqrt{s^2 - \beta^2}\, ds. \tag{3.16}$$


N. M. Ercolani et al.

The details of the derivation are deferred to Appendix A; the result is the following:

λ β

* λ 2 − β 2 2 2 λ2 − β 2 1 2 2 2 2 + 2λ h(s) s − β = W (λ) λ − β x λ β λ 2 2 λ −β λ −2 log + , (3.17) β β

where
\[ W(\lambda) = \sum_{p=1}^{\nu-1} w_p \lambda^{2p}, \]
with
\[ w_p = \frac{1}{2 v_p (p+1)} \sum_{j=p}^{\nu-1} h_j v_j. \]

As a result of these calculations and taking the limit as λ → ∞, we deduce the formula 2 ν−1 ν−1 β 2 −l = β , (3.18) w j v j−1 − w j v j − 1 + log 4 j=1

j=1

in which terms of non-constant order (which must cancel in any case) have been dropped.

Remark. The relations generated by setting the non-constant terms equal to zero are equivalent to the moment conditions of Theorem 1.1 (4) for the measure ψ (see also [26]).

Equation (3.18) simplifies to the expression
\[ -l = \frac{4(2\nu-1)(2\nu-3)!\; t\,\beta^{2\nu}}{x\,\nu\,(\nu-2)!^2\, 4^{\nu}} - 1 + \log\frac{\beta^2}{4}. \tag{3.19} \]
Next, we use (3.10) to express t as a function of β² and x:
\[ t = -\frac{(\nu-1)!^2\, 4^{\nu-1}}{2\,(2\nu-1)!\,\beta^{2\nu}}\,(\beta^2 - 4x). \tag{3.20} \]
Substituting (3.20) into (3.19) we finally have
\[ -l = -\frac{\nu-1}{\nu}\left(\frac{\beta^2}{4x} - 1\right) - 1 + \log\frac{\beta^2}{4}. \tag{3.21} \]
As an example, consider ν = 2, for which Eq. (3.21) becomes
\[ -l = -\frac{1}{2}\left(\frac{\beta^2}{4x} - 1\right) - 1 + \log\frac{\beta^2}{4}. \]
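The passage from (3.19) to (3.21) can be checked in exact rational arithmetic. The sketch below compares only the non-logarithmic parts, since the terms −1 + log(β²/4) are common to both formulas. The helper names are ours, and the check assumes ν ≥ 2 so that (ν − 2)! in (3.19) is defined.

```python
from fractions import Fraction
from math import factorial


def t_from_beta(nu: int, beta2: Fraction, x: Fraction) -> Fraction:
    """Eq. (3.20): t expressed through beta^2 and x."""
    prefactor = Fraction(factorial(nu - 1) ** 2 * 4 ** (nu - 1),
                         2 * factorial(2 * nu - 1))
    return -prefactor * (beta2 - 4 * x) / beta2 ** nu


def rational_part_319(nu: int, beta2: Fraction, x: Fraction, t: Fraction) -> Fraction:
    """First term on the right-hand side of (3.19); requires nu >= 2."""
    numerator = 4 * (2 * nu - 1) * factorial(2 * nu - 3) * t * beta2 ** nu
    denominator = x * nu * factorial(nu - 2) ** 2 * 4 ** nu
    return numerator / denominator


def rational_part_321(nu: int, beta2: Fraction, x: Fraction) -> Fraction:
    """First term on the right-hand side of (3.21)."""
    return -Fraction(nu - 1, nu) * (beta2 / (4 * x) - 1)
```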


3.3. Explicit calculation of (V, ψ). Next we need to find an expression for (V, ψ): * 1 β 1 2 (3.22) λ + tλ2ν h(λ) λ2 − β 2 dλ. (V, ψ) = x −β 2 Evaluating (3.22) by regarding its double as a contour integral and computing the value in terms of v j and h j , we find that (V, ψ) =

ν−1 ν−1 t v1 tvν 1 + . h j v j+1 + h j v j+ν + 4x 4x 2x 2x j=0

j=0

This expression simplifies to
\[ (V,\psi) = \frac{4^{-1-\nu}\, t\,\beta^{2\nu+2}(2\nu)!}{x(\nu+1)(\nu-1)!^2} + \frac{\beta^4}{32x} + \frac{2\cdot 4^{-2\nu}\, t^2 (2\nu-1)!^2\,\beta^{4\nu}}{x\,\nu\,(\nu-1)!^4} + \frac{4^{-1-\nu}(\nu+1)\, t\,\beta^{2\nu+2}(2\nu)!}{x\,(\nu+1)!^2}. \tag{3.23} \]
As before, we simplify Eq. (3.23) by substituting the expression (3.20) for t in terms of β²:
\[ (V,\psi) = -\frac{-8x\beta^2\nu^2 + \beta^4\nu^2 - 2\beta^4\nu - 16x^2\nu - 16x^2 + 8\beta^2\nu x + \beta^4}{32\,x\,(\nu+1)\,\nu}. \]
When ν = 2, we find that
\[ (V,\psi) = \frac{1}{12}\beta^2 - \frac{1}{192x}\beta^4 + \frac{x}{4}. \]
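The simplification of (3.23) by means of (3.20) is a purely algebraic identity, so it can be tested with exact fractions. The following sketch does this, reading the four terms of (3.23) as written above; the function names are ours.

```python
from fractions import Fraction
from math import factorial as f


def t_from_beta(nu, beta2, x):
    """Eq. (3.20): t expressed through beta^2 and x."""
    return -Fraction(f(nu - 1) ** 2 * 4 ** (nu - 1), 2 * f(2 * nu - 1)) * (beta2 - 4 * x) / beta2 ** nu


def v_psi_323(nu, beta2, x, t):
    """Right-hand side of (3.23), with b = beta^2."""
    b = beta2
    term1 = Fraction(f(2 * nu), 4 ** (nu + 1)) * t * b ** (nu + 1) / (x * (nu + 1) * f(nu - 1) ** 2)
    term2 = b ** 2 / (32 * x)
    term3 = Fraction(2 * f(2 * nu - 1) ** 2, 4 ** (2 * nu)) * t ** 2 * b ** (2 * nu) / (x * nu * f(nu - 1) ** 4)
    term4 = Fraction((nu + 1) * f(2 * nu), 4 ** (nu + 1)) * t * b ** (nu + 1) / (x * f(nu + 1) ** 2)
    return term1 + term2 + term3 + term4


def v_psi_closed(nu, beta2, x):
    """Closed form of (V, psi) after eliminating t."""
    b = beta2
    numerator = (-8 * x * b * nu ** 2 + b ** 2 * nu ** 2 - 2 * b ** 2 * nu
                 - 16 * x ** 2 * nu - 16 * x ** 2 + 8 * b * nu * x + b ** 2)
    return -numerator / (32 * x * (nu + 1) * nu)
```

At the Gaussian point β² = 4x the closed form reduces to x/2 for every ν, which gives a second sanity check.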

3.4. The explicit formula for e0(t). We can now put all of the pieces together. Evaluating the above expressions when t = 0 (or equivalently when $\beta = 2\sqrt{x}$), we find $E_0 = -3/4 + \tfrac{1}{2}\log(x)$. Collecting all the components gives
\[ e_0 = \frac{\eta}{16x^2}\left(\beta^2 - 4x\right)\left(\beta^2 - 4xr\right) + \frac{1}{2}\log\frac{\beta^2}{4x} = \eta\,(z-1)(z-r) + \frac{1}{2}\log(z), \tag{3.24} \]
where
\[ \eta = \frac{(\nu-1)^2}{4\nu(\nu+1)}, \qquad r = \frac{3(\nu+1)}{\nu-1}, \qquad z = \frac{\beta^2}{4x}. \]
We have proven that e0 has an explicit representation depending only on ν (see the first part of Theorem 2.1). When ν = 2 the result is
\[ e_0 = \frac{1}{24}(z-1)(z-9) + \frac{1}{2}\log(z). \]
Remark. It is interesting to compare our expression here to a formula derived in [8]. The difference is that while our formula is more explicit, it is restricted to a 1-parameter family of times corresponding to a fixed valence of the vertices.
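The closed form (3.24) can be compared numerically with the components computed in Sects. 3.2 and 3.3. The sketch below assumes that the components are combined as $e_0 = -(V,\psi)/(2x) - l/2 - E_0$; this combination rule is our reading of "collecting all the components" and is not quoted from the text, and the function names are ours.

```python
from math import isclose, log


def e0_closed(nu: int, z: float) -> float:
    """Right-hand side of (3.24)."""
    eta = (nu - 1) ** 2 / (4 * nu * (nu + 1))
    r = 3 * (nu + 1) / (nu - 1)
    return eta * (z - 1) * (z - r) + 0.5 * log(z)


def e0_from_components(nu: int, z: float, x: float) -> float:
    """Assumed assembly: e0 = -(V, psi)/(2x) - l/2 - E0, with beta^2 = 4 x z."""
    b = 4 * x * z
    minus_l = -(nu - 1) / nu * (z - 1) - 1 + log(b / 4)              # (3.21)
    numerator = (-8 * x * b * nu ** 2 + b ** 2 * nu ** 2 - 2 * b ** 2 * nu
                 - 16 * x ** 2 * nu - 16 * x ** 2 + 8 * b * nu * x + b ** 2)
    v_psi = -numerator / (32 * x * (nu + 1) * nu)                      # closed form of (V, psi)
    big_e0 = -0.75 + 0.5 * log(x)                                      # value at t = 0
    return -v_psi / (2 * x) + minus_l / 2 - big_e0
```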


3.5. Enumeration of planar graphs. It follows from Theorem 1.3 that the coefficients, $\kappa_0^{(\nu)}(j)$, of the leading order term e0(t) count the number of planar 2ν-regular maps with j vertices. We have shown that e0(t) can be explicitly expressed in terms of the auxiliary function z(t). The latter solves the algebraic relation (3.11). So to find the Taylor coefficients of e0(t), we must first find the Taylor coefficients of z(t). We define
\[ \alpha = -c_\nu x^{\nu-1} t, \tag{3.25} \]
where
\[ c_\nu = 2\nu\binom{2\nu-1}{\nu-1}. \]

Then the polynomial relation defining z(t) is
\[ 1 = z(t) - \alpha z(t)^{\nu} \tag{3.26} \]
in a neighborhood of t = 0. (We view x here as a scaling parameter. When x is set to 1, we will recover the counting function.) The j-th coefficient of the Taylor expansion of z as a function of α near 0, $z = \sum_{j\ge 0} \zeta_j \alpha^j$, is of course given by
\[ \zeta_j = \frac{1}{2\pi i}\oint \frac{z(\alpha)}{\alpha^{j+1}}\,d\alpha. \]
Making the substitution u = z(α), and using the evaluation
\[ \frac{dz}{d\alpha} = \frac{z^{\nu}}{1 - \nu\alpha z^{\nu-1}} \tag{3.27} \]
derived by differentiating the relation (3.26), and also using (3.26) to eliminate α, this integral becomes
\[ \frac{1}{2\pi i}\oint_{u\sim 1} \frac{(\nu - (\nu-1)u)\,u^{\nu j}}{(u-1)^{j+1}}\,du. \tag{3.28} \]
Applying the binomial expansion, it is then straightforward to evaluate this loop integral and find that
\[ \zeta_j = (1-\nu)\binom{\nu j+1}{j} + \nu\binom{\nu j}{j} = \frac{1}{j}\binom{\nu j}{j-1}. \tag{3.29} \]
We note that these coefficients are precisely the higher Catalan numbers, which play a role in a wide variety of combinatorial problems. For a discussion of these applications and their relation to the work discussed here see [27].

A similar approach will yield coefficients for the other terms in the expression (3.24) for e0. For instance the Taylor coefficients of $\log(z(\alpha)) = \sum_{j=1}^{\infty} L_j^{(\nu)} \alpha^j$ are given by
\[ L_j^{(\nu)} = \frac{1}{2\pi i}\oint_{u\sim 1} \frac{\log(u)\,(1 - \nu\alpha u^{\nu-1})\,u^{j\nu}}{(u-1)^{j+1}}\,du. \]
Expanding the integrand in the vicinity of u = 1, we can evaluate these coefficients as
\[ L_j^{(\nu)} = \sum_{k+\ell = j-1} \frac{(-1)^k}{k+1}\left[(1-\nu)\binom{\nu j}{\ell} + \nu\binom{\nu j-1}{\ell}\right] = \frac{1}{j}\binom{\nu j-1}{j-1}. \]
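The closed form (3.29) can be cross-checked by inverting the relation (3.26) directly as a power series. The sketch below solves z = 1 + αz^ν by fixed-point iteration on truncated series with exact rational coefficients and compares the result with (3.29); the same series gives the coefficients of (z − 1)². The helper names `mul`, `z_series` and `zeta_closed` are ours.

```python
from fractions import Fraction
from math import comb


def mul(a, b, order):
    """Product of two power series (coefficient lists), truncated at alpha**order."""
    out = [Fraction(0)] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out


def z_series(nu, order):
    """Series of z(alpha) solving 1 = z - alpha * z**nu, by fixed-point iteration."""
    z = [Fraction(1)] + [Fraction(0)] * order
    for _sweep in range(order):          # each sweep fixes one more coefficient
        power = [Fraction(1)] + [Fraction(0)] * order
        for _factor in range(nu):
            power = mul(power, z, order)
        z = [Fraction(1)] + power[:order]  # z = 1 + alpha * z**nu
    return z


def zeta_closed(nu, j):
    """Eq. (3.29): zeta_j = (1/j) * binom(nu*j, j-1), with zeta_0 = 1."""
    return Fraction(1) if j == 0 else Fraction(comb(nu * j, j - 1), j)
```

For ν = 2 the coefficients are the ordinary Catalan numbers 1, 1, 2, 5, 14, 42.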


By the same method one also derives the coefficients in the expansion of the quadratic term, $(z(\alpha) - 1)^2 = \sum_{j=2}^{\infty} U_{2,j}^{(\nu)} \alpha^j$:
\[ U_{2,j}^{(\nu)} = \frac{2}{j}\binom{\nu j}{j-2}. \]
Finally one may assemble all these contributions in (3.24) and substitute for α as in (3.25) to conclude that the Taylor coefficients of e0 with respect to $t_{2\nu}$ satisfy
\[ \frac{\kappa_0^{(\nu)}(j)}{c_\nu^{\,j}} = j!\left(-(r-1)\,\eta\,\zeta_j + \eta\,U_{2,j}^{(\nu)} + \frac{1}{2}L_j^{(\nu)}\right) = -(r-1)\,\eta\,\frac{(\nu j)!}{((\nu-1)j+1)!} + \eta\,\frac{2(j-1)(\nu j)!}{((\nu-1)j+2)!} + \frac{1}{2}\,\frac{(\nu j-1)!}{((\nu-1)j)!}. \]
After simplifying this equation we get
\[ \kappa_0^{(\nu)}(j) = c_\nu^{\,j}\,\frac{(\nu j-1)!}{((\nu-1)j+2)!}. \]
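The final simplification can be verified in exact arithmetic by assembling the three contributions from their closed forms and comparing with the compact expression. In this sketch the function names are ours, and the closed forms used for $\zeta_j$, $U_{2,j}^{(\nu)}$ and $L_j^{(\nu)}$ are the binomial expressions $\frac{1}{j}\binom{\nu j}{j-1}$, $\frac{2}{j}\binom{\nu j}{j-2}$ and $\frac{1}{j}\binom{\nu j-1}{j-1}$.

```python
from fractions import Fraction
from math import comb, factorial


def c_nu(nu: int) -> int:
    return 2 * nu * comb(2 * nu - 1, nu - 1)


def kappa0(nu: int, j: int) -> Fraction:
    """Compact closed form c_nu**j * (nu*j - 1)! / ((nu - 1)*j + 2)!."""
    return Fraction(c_nu(nu) ** j * factorial(nu * j - 1), factorial((nu - 1) * j + 2))


def kappa0_assembled(nu: int, j: int) -> Fraction:
    """c_nu**j * j! * ( -(r-1)*eta*zeta_j + eta*U_{2,j} + L_j / 2 )."""
    eta = Fraction((nu - 1) ** 2, 4 * nu * (nu + 1))
    r = Fraction(3 * (nu + 1), nu - 1)
    zeta = Fraction(comb(nu * j, j - 1), j)
    u2 = Fraction(2 * comb(nu * j, j - 2), j) if j >= 2 else Fraction(0)
    log_coeff = Fraction(comb(nu * j - 1, j - 1), j)
    return c_nu(nu) ** j * factorial(j) * (-(r - 1) * eta * zeta + eta * u2 + log_coeff / 2)
```

For ν = 2 the first three values are 2, 36 and 1728.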

This establishes the second half of Theorem 2.1.

4. Continuum Toda Equations

We will show that the Toda Lattice equations at level 2ν, with the initial conditions $b_k = \sqrt{k}$, possess a continuum limit under an appropriate scaling. These equations will lead to a description of the evolution of the asymptotic partition function coefficients, $e_g(t)$. More precisely, in Subsect. 4.1 we establish Theorem 2.2 and in Subsect. 4.2 we establish Theorem 2.4. To this end, we first review some basic facts about the Toda lattice equations. The Toda Lattice Hierarchy is given by the differential equations
\[ \frac{dL}{d\xi_i} = [B_i(L), L], \tag{4.1} \]
where L is a symmetric tridiagonal matrix of the form
\[ L = \begin{pmatrix} a_1 & b_1 & 0 & 0 & 0 & \cdots \\ b_1 & a_2 & b_2 & 0 & 0 & \cdots \\ 0 & b_2 & a_3 & b_3 & 0 & \cdots \\ 0 & 0 & b_3 & a_4 & b_4 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}, \]
and $B_i(L) = (L^i)_+ - (L^i)_-$, where the plus subscript denotes upper triangular projection and the minus subscript denotes lower triangular projection. We briefly recall the notion of tau functions in terms of which the solution of the Toda Lattice equations at different time parameters can be expressed. Let $L_0$ be a semi-infinite tri-diagonal matrix of the type that arises as a 3-term recurrence relation for orthogonal polynomials with exponential weight. The dependence of the j-th tau function on the $\xi_i$ is given by
\[ \tau_j(\xi_1, \dots, \xi_\upsilon; L_0) = \left[\det M^{2}_{j}\right]^{1/2}, \]
where $M = \exp\left(\sum_{i=1}^{\upsilon} \xi_i L_0^i\right)$ and $M_j$ is the j × j upper left block of M in the basis of orthogonal polynomials with respect to $d\omega = \exp\left(\sum_{i=1}^{\upsilon} \xi_i \lambda^i\right) d\lambda$. From the matrix factorization method of solving the Toda equations [20] one finds that
\[ b_j(\xi) = \frac{\tau_{j+1}(\xi)\,\tau_{j-1}(\xi)}{\tau_j(\xi)^2}\, b_j(0). \tag{4.2} \]

Furthermore, from (4.2) and the differential equations for the $\xi_1$-Toda flow, one can deduce that
\[ b_j(\xi)^2 = \frac{1}{2}\frac{d^2}{d\xi_1^2} \log \tau_j(\xi)^2. \tag{4.3} \]
The representations (4.2) and (4.3) of $b_j$ in terms of tau functions are referred to as Hirota relations, familiar from the theory of soliton equations.

Finally, through the use of Hankel determinants, the tau function may be related to the partition function, $Z_k = Z_k^{(k)}$. Recall that the Hankel determinant [30] associated to the weight $d\omega(\lambda) = \exp\left(\sum_{i=1}^{\upsilon} \xi_i \lambda^i\right) d\lambda$ is defined as follows:
\[ H_k(\xi) = \begin{vmatrix} c_0 & c_1 & \cdots & c_{k-1} \\ c_1 & c_2 & \cdots & c_k \\ \vdots & \vdots & \ddots & \vdots \\ c_{k-1} & c_k & \cdots & c_{2k-2} \end{vmatrix}, \]
where $c_j = \int_{\mathbb{R}} \lambda^j\, d\omega(\lambda)$ are the moments of dω. There follows the relation
\[ b_j^2(\xi) = \frac{H_{j+1}(\xi)\,H_{j-1}(\xi)}{H_j(\xi)^2}. \]
The partition function may be obtained from the Hankel determinant by a rescaling of the time parameters (see, for example, [16, Eq. (1.29)]). Consequently one can relate the partition function to the tau function, in the case of interest to us, as
\[ \tau_k^2(\xi_1, \xi_{2\nu}; L_0) = \frac{Z_k\left(-\xi_1, -2k^{\nu-1}\xi_{2\nu}\right)}{Z_k(0)}, \]
where $L_0$ is the semi-infinite tridiagonal matrix with $b_k = \sqrt{k}$, $a_k \equiv 0$. For this initial condition, the evolution of L under the even 2ν-flow preserves the vanishing of $a_k$. It will be natural to study this Toda evolution in terms of the self-similar variable $s = 2k^{\nu-1}\xi$, where $\xi = \xi_{2\nu}$. We will start from the unscaled partition function (1.1):
\[ Z_1^{(k)}(-\xi_1, -\xi) = \int\cdots\int \exp\left\{-\sum_{j=1}^{k}\left(\frac{1}{2}\lambda_j^2 - \xi\lambda_j^{2\nu} - \xi_1\lambda_j\right)\right\} \mathcal{V}(\lambda)\, d^k\lambda, \]
where $\mathcal{V}(\lambda) = \prod_{j < l} |\lambda_j - \lambda_l|^2$.
Using the Hirota relations, (4.2) and (4.3), we have the following proposition.


Proposition 4.1. The Toda recurrence coefficients may be represented as follows:
\[ b_k^2 = \frac{1}{2k}\frac{d^2}{dt_1^2} \log\left[Z_k(t_1, -s)\right]\Big|_{t_1=0}, \qquad \text{where } s \le 0. \]
In addition, the following asymptotic expansion holds true:
\[ b_k^2 = k \sum_{g\ge 0} z_g(s)\,\frac{1}{k^{2g}}, \tag{4.4} \]

where $z_g(s)$ is an analytic function of s in a neighborhood of s = 0, and $z_0(0) = 1$, $z_g(0) = 0$, g > 0.

To prove this proposition we scale the eigenvalues in the unscaled partition function as $\lambda_j = N^{1/2}\tilde\lambda_j$, under which
\[ Z_1^{(k)} = N^{k^2/2} \int\cdots\int \exp\left(-N\sum_{j=1}^{k}\left\{\frac{1}{2}\tilde\lambda_j^2 - \xi N^{\nu-1}\tilde\lambda_j^{2\nu} - \frac{\xi_1}{\sqrt{N}}\tilde\lambda_j\right\}\right) \mathcal{V}(\tilde\lambda)\, d^k\tilde\lambda = N^{k^2/2}\, Z_N^{(k)}\left(-\frac{\xi_1}{\sqrt{N}}, -\xi N^{\nu-1}\right). \]
We observe that the Toda variables associated to our random matrix ensemble with the corresponding N scaling have the following representation:
\[ b_k^2 = \frac{1}{2}\frac{d^2}{d\xi_1^2}\log Z_1^{(k)} = \frac{1}{2}\frac{d^2}{d\xi_1^2}\log Z_N^{(k)}\left(-\frac{\xi_1}{\sqrt{N}}, -\xi N^{\nu-1}\right) = \frac{1}{2N}\frac{d^2}{dt_1^2}\log Z_N^{(k)}(t_1, t)\Big|_{t=-2\xi N^{\nu-1},\ t_1=-\xi_1/\sqrt{N}}. \]
We next define a spatial scale through the ratio between k and N: set k = xN. If we furthermore introduce the scaling $\tilde\lambda_j = \sqrt{x}\,\hat\lambda_j$, then we can rewrite
\[ Z_N^{(k)}(t_1, t) = x^{k^2/2} \int\cdots\int \exp\left(-k\sum_{j=1}^{k}\left\{\frac{1}{2}\hat\lambda_j^2 + t x^{\nu-1}\hat\lambda_j^{2\nu} + \frac{t_1}{\sqrt{x}}\hat\lambda_j\right\}\right) \mathcal{V}(\hat\lambda)\, d^k\hat\lambda = x^{k^2/2}\, Z_k^{(k)}\left(\frac{t_1}{\sqrt{x}}, t x^{\nu-1}\right). \]
Putting all the above together, the natural variable to consider for this continuum limit of the Toda Lattice Equations is
\[ \frac{1}{k} b_k^2 = \frac{1}{2k^2}\frac{d^2}{dt_1^2}\log Z_k^{(k)}(t_1, -s)\Big|_{t_1=-\xi_1/\sqrt{k},\ s=2k^{\nu-1}\xi}. \]

One may conclude directly from the main theorem of [16] that $\frac{1}{k^2}\log Z_k^{(k)}$ has a full asymptotic expansion in powers of $k^{-2}$, uniformly valid in an appropriate $(t_1, -s)$ domain which includes arbitrary negative s and simultaneously an open neighborhood of $t_1 = 0$. On this domain the asymptotic expansion of derivatives may be calculated via term by term differentiation of the original asymptotic expansion. Moreover, the coefficients in this expansion are complex analytic in a neighborhood of (0, 0). It follows that
\[ \frac{1}{2k^2}\frac{d^2}{dt_1^2}\log Z_k^{(k)}(t_1, -s)\Big|_{t_1=0} \]
has a full asymptotic expansion in powers of $k^{-2}$ uniformly valid for negative s and whose coefficients are analytic in a complex neighborhood of s = 0.

4.1. The continuum Toda hierarchy. In this subsection we prove Theorem 2.2. We begin with the Toda lattice systems (4.1) introduced earlier. It will prove convenient to transform this to the so-called Kostant-Toda lattice form [15]. This is achieved by conjugating L in (4.1) by the diagonal matrix
\[ D = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots \\ 0 & b_1^{-1} & 0 & 0 & \cdots \\ 0 & 0 & (b_1 b_2)^{-1} & 0 & \cdots \\ 0 & 0 & 0 & (b_1 b_2 b_3)^{-1} & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \]
The fundamental matrix variable for the Kostant-Toda system is
\[ \mathcal{L} = D^{-1} L D = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ b_1^2 & 0 & 1 & 0 & \cdots \\ 0 & b_2^2 & 0 & 1 & \cdots \\ 0 & 0 & b_3^2 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \]
Because D is diagonal it is straightforward to see that the RHS of (4.1) transforms as
\[ D^{-1}[B_{2\nu}(L), L]\,D = \left[D^{-1}B_{2\nu}(L)D,\; D^{-1}LD\right] = [B_{2\nu}(\mathcal{L}), \mathcal{L}]. \tag{4.5} \]
One also has $\mathcal{L}_\xi = (D^{-1}LD)_\xi = D^{-1}L_\xi D + [\mathcal{L}, D^{-1}D_\xi]$. Combining this observation with (4.5) in (4.1) we see that the Lax equation transforms to
\[ \mathcal{L}_\xi + [D^{-1}D_\xi, \mathcal{L}] = [B_{2\nu}(\mathcal{L}), \mathcal{L}]. \tag{4.6} \]
By direct calculation one can check that
\[ [D^{-1}D_\xi, \mathcal{L}] = -\frac{1}{2}\mathcal{L}_\xi + [D^{-1}D_\xi, \epsilon], \]
where
\[ \epsilon = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots \\ 0 & 0 & 1 & 0 & \cdots \\ 0 & 0 & 0 & 1 & \cdots \\ 0 & 0 & 0 & 0 & \ddots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \]
Using this observation the Lax equation (4.6) becomes
\[ \frac{1}{2}\mathcal{L}_\xi = [B_{2\nu}(\mathcal{L}), \mathcal{L}] - [D^{-1}D_\xi, \epsilon]. \]
Since the LHS is lower triangular, we may reduce this equation to
\[ \frac{1}{2}\mathcal{L}_\xi = [B_{2\nu}(\mathcal{L}), \mathcal{L}]_-. \]
Since $\mathcal{L}$ is tri-diagonal, the RHS may be further simplified as
\[ [B_{2\nu}(\mathcal{L}), \mathcal{L}]_- = \left[(\mathcal{L}^{2\nu})_+ - (\mathcal{L}^{2\nu})_-,\; \mathcal{L}\right]_- = -\left[(\mathcal{L}^{2\nu})_-,\; \mathcal{L}\right]_-. \]
Finally, since $\mathcal{L} = \mathcal{L}_- + \epsilon$ with $\mathcal{L}_-$ supported only on the first lower diagonal, it follows that $\mathcal{L}_\xi$ is only supported on the first lower diagonal and therefore we may use the following final form of our Lax equation:
\[ \frac{1}{2}\mathcal{L}_\xi = \left[\epsilon,\; (\mathcal{L}^{2\nu})_-\right]_{(-1)}, \]
where the subscript (−1) denotes projection onto the first lower subdiagonal. At the level of the matrix entries $b_k^2$ this system of equations becomes
\[ \frac{1}{2}\frac{db_k^2}{d\xi} = (\mathcal{L}^{2\nu})_{k+2,k} - (\mathcal{L}^{2\nu})_{k+1,k-1}, \tag{4.7} \]

where
\[ (\mathcal{L}^{2\nu})_{k+1,k-1} = \sum_{\substack{i_1, i_2, \dots, i_{2\nu+1};\ |i_{j+1}-i_j|=1; \\ i_1=k+1,\ i_{2\nu+1}=k-1}} \mathcal{L}_{k+1,i_2}\,\mathcal{L}_{i_2,i_3}\cdots\mathcal{L}_{i_{2\nu},k-1}, \tag{4.8} \]
as stated in (2.1) and (2.2). Note that in (4.8) the sum may be viewed as being taken over walks, w, from k+1 to k−1 of length 2ν. We will sometimes represent a walk w as a 2ν-vector of ±1's. It is clear that such a walk has ν + 1 "downturns" (vector entries of −1) and ν − 1 "upturns" (vector entries of +1). Therefore, specifying the locations of the downturns determines w. From this we see that the set of all such walks, {w}, is in one-to-one correspondence with all choices of ν + 1 numbers (the downturn locations), $\{j_1 < j_2 < \cdots < j_{\nu+1}\}$ from {1, . . . , 2ν}:
\[ \#\{w\} = \binom{2\nu}{\nu+1}. \]


Thus (4.8) becomes
\[ \sum_{\{w\}} b^2_{k+\ell_1(w)}\, b^2_{k+\ell_2(w)} \cdots b^2_{k+\ell_{\nu+1}(w)}, \tag{4.9} \]
where
\[ \ell_m(w) = j_m - 2m + 1. \tag{4.10} \]
Some explanation is perhaps necessary here. The sum appearing in (4.9) is taken over all choices of ν + 1 numbers, and each such choice is denoted by a member w of the set {w}. As described above, each w may be interpreted as a walk on the integer lattice, from k+1 to k−1 of length 2ν. Since the structure of the walks and all associated counting is independent of the value of k, we will from now on, for simplicity, take k = 0. The quantity $\ell_m(w)$ is defined to be the location on the integer lattice after the m-th downturn. It is now straightforward to compute that, given a walk with downturns at $j_1, \dots, j_{\nu+1}$, after m downturns, there have been $j_m - m$ upward steps, and m downward steps. Since the starting position is +1, (4.10) follows. Inserting these reductions into the Toda equations (4.7) we have
\[ \frac{1}{2}\left(b_k^2\right)_\xi = \sum_{\{w\}} \left( \prod_{m=1}^{\nu+1} b^2_{k+\ell_m(w)+1} - \prod_{m=1}^{\nu+1} b^2_{k+\ell_m(w)} \right). \tag{4.11} \]
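The walk representation (4.9)-(4.10) can be tested against a direct matrix computation. The sketch below builds a finite truncation of the Kostant-Toda matrix (entries 1 on the superdiagonal and $b_i^2$ on the subdiagonal), reads off the (k+1, k−1) entry of its 2ν-th power, and compares it with the sum over downturn locations. The function names, the test values of $b_i^2$, and the truncation size are our own choices; k must be at least ν + 1 so that no walk leaves the truncated lattice.

```python
from itertools import combinations
from math import comb


def walk_sum(b2, k, nu):
    """Sum (4.9) over downturn locations j_1 < ... < j_{nu+1}; also returns the number of walks."""
    total, count = 0, 0
    for downs in combinations(range(1, 2 * nu + 1), nu + 1):
        product = 1
        for m, j_m in enumerate(downs, start=1):
            product *= b2[k + j_m - 2 * m + 1]   # b^2 at location k + l_m(w), eq. (4.10)
        total += product
        count += 1
    return total, count


def matrix_entry(b2, k, nu, size):
    """(L^{2 nu})_{k+1, k-1} for the size x size truncation, L[i][i+1] = 1, L[i+1][i] = b2[i]."""
    row = [0] * (size + 2)          # 1-based positions, padded at both ends
    row[k + 1] = 1
    for _ in range(2 * nu):
        new = [0] * (size + 2)
        for i in range(1, size + 1):
            if row[i]:
                if i + 1 <= size:
                    new[i + 1] += row[i]              # up-step, weight 1
                if i - 1 >= 1:
                    new[i - 1] += row[i] * b2[i - 1]  # down-step, weight b_{i-1}^2
        row = new
    return row[k - 1]
```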

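By Proposition 4.1,
\[ b_k^2(\xi) = b_k^2\!\left(\frac{s_k}{2k^{\nu-1}}\right) = \dot{b}_k^2(s_k) = k\,\frac{1}{2k^2}\frac{d^2}{dt_1^2}\log\left(Z_k(t_1, -s_k)\right)\Big|_{t_1=0}, \tag{4.12} \]
and the differential equation (4.11) becomes
\[ k^{\nu-1}\frac{d}{ds_k}\dot{b}_k^2(s_k) = \sum_{\{w\}} \left( \prod_{m=1}^{\nu+1} \dot{b}^2_{k+\ell_m(w)+1}\!\left(s_{k+\ell_m+1}\right) - \prod_{m=1}^{\nu+1} \dot{b}^2_{k+\ell_m(w)}\!\left(s_{k+\ell_m}\right) \right). \tag{4.13} \]
From (4.12), and the asymptotic expansion of the logarithm of the partition function $Z_k$ appearing therein, one sees that for any fixed integer ℓ, the quantity $(k+\ell)^{-1}\,\dot{b}^2_{k+\ell}(s_{k+\ell})$ possesses an asymptotic expansion in even inverse powers of (k + ℓ), which we express in the following form:
\[ \dot{b}^2_{k+\ell}(s_{k+\ell}) = k \sum_{g\ge 0} z_g\!\left(s_k\left(1 + \frac{\ell}{k}\right)^{\nu-1}\right)\left(1 + \frac{\ell}{k}\right)^{1-2g} k^{-2g}. \tag{4.14} \]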

Fundamental asymptotic principle. Recall that the following two facts were proven rigorously in [16]:
(i) The asymptotic expansion satisfied by the partition function is differentiable term by term.
(ii) The coefficients $z_g(\bullet)$ are analytic functions of •.

Therefore, the terms appearing in the summation in (4.14) may be Taylor expanded (note that ℓ/k is asymptotically small) and the resulting multiple series may be re-summed. One may convince oneself of this as follows. One truncates the expansion (4.14) at order $k^{1-2h}$, and observes that the error is $O(k^{-1-2h})$. Then one computes Taylor expansions to order $k^{-2h}$ of all quantities appearing in the (now finite) sum, and observes that the error is $O(k^{-1-2h})$. Following this principle, we now introduce the scheme f(s, w):
\[ f(s, w) = \sum_{g\ge 0} z_g\!\left(s_k w^{\nu-1}\right) w^{1-2g} k^{-2g}. \tag{4.15} \]
Using this scheme, the differential equation (4.13) becomes:
\[ \begin{aligned} \frac{d}{ds} f(s,w)\Big|_{w=1} &= k f^{\nu+1} \sum_{\{w\}} \Bigg\{ \prod_{m=1}^{\nu+1}\left(1 + \frac{f_w}{f}\,\frac{\ell_m+1}{k} + \frac{f_{w^{(2)}}}{2f}\left(\frac{\ell_m+1}{k}\right)^{2} + \cdots + \frac{f_{w^{(h)}}}{h!\,f}\left(\frac{\ell_m+1}{k}\right)^{h} + \cdots\right) \\ &\qquad\qquad - \prod_{m=1}^{\nu+1}\left(1 + \frac{f_w}{f}\,\frac{\ell_m}{k} + \frac{f_{w^{(2)}}}{2f}\left(\frac{\ell_m}{k}\right)^{2} + \cdots + \frac{f_{w^{(h)}}}{h!\,f}\left(\frac{\ell_m}{k}\right)^{h} + \cdots\right) \Bigg\} \\ &= \sum_{\{w\}} \left[ (\nu+1) f^{\nu} f_w + O\!\left(\frac{1}{k^2}\right) \right] \\ &= (\nu+1)\binom{2\nu}{\nu+1} f^{\nu} f_w + O\!\left(\frac{1}{k^2}\right), \end{aligned} \tag{4.16} \]
where we note that $(\nu+1)\binom{2\nu}{\nu+1} = 2\nu\binom{2\nu-1}{\nu-1} = c_\nu$ as claimed in the statement of Theorem 2.2. From the expansion (4.15) one may now confirm that the higher order terms in the expansion of the RHS have the form described in Theorem 2.2. The justification for the form of the coefficient $d_V^{(\nu,g)}$ stated there requires some further explanation which we now provide.

4.1.1. Explicit Formulae for $d_V^{(\nu,g)}$ in Theorem 2.2. This subsection introduces some results, interesting in their own right, about connections between the coefficients $d_V^{(\nu,g)}$ and joint probability distributions for conditioned random walks. Given the explicit determination (4.10) of the $\ell_m$ in terms of the walks, it is natural to try to calculate $d_V^{(\nu,g)}$ by expressing this combinatorial quantity in terms of expectations with respect to the induced probability distribution on the family of random walks {w} conditioned to begin at +1 and end at −1. To this end we observe that given an ordered sequence of numbers $m_1 < m_2 < \cdots < m_r$ selected from the set {1, . . . , ν + 1}, and another ordered sequence $i_1 < \cdots < i_r$ selected from {1, . . . , 2ν},
\[ \#\left\{ w \;\middle|\; m_j\text{-th downturn of } w \text{ occurs at position } i_j,\ j = 1, \dots, r \right\} = \binom{i_1-1}{m_1-1}\binom{i_2-i_1-1}{m_2-m_1-1}\cdots\binom{i_r-i_{r-1}-1}{m_r-m_{r-1}-1}\binom{2\nu-i_r}{\nu+1-m_r}. \tag{4.17} \]


These joint probability distributions are fundamental for the count we will now describe. One sees directly from expanding the products in (4.16), that (ν,g) dV

=

{w}

#2g+1 j=1

r j (V )!

j=1

r j (V )!

m 1 , . . . , m ρ(V ) distinct ∈ {1, . . . , ν + 1}

{w}

1

= #2g+1 j=1

r j (V )!

1 = #2g+1 j=1 r j (V )! ⎛

ρ(V )

⎝

σ ∈Sρ(V )

···

⎛

ρ(V )

⎝

$ %|V | m i + 1 i −

i=1

m 1 < · · · < m ρ(V ) ∈ {1, . . . , ν + 1}

i1 − 1 m1 − 1

ρ(V )

−

⎛

ρ(V )

⎝

⎞ ⎠

ρ(V )

$

m i

%|Vi |

⎞ ⎠

%|V | $ m σ (i) + 1 i −

ρ(V )

$

m σ (i)

%|Vi |

⎞ ⎠

i=1

m 1 < · · · < m ρ(V ) ∈ {1, . . . , ν + 1}

i2 − i1 − 1 m2 − m1 − 1

$ %|V | i j − 2m j + 1 σ ( j)

m i

%|Vi |

i=1

j=1

i ρ(V ) − i ρ(V )−1 − 1 m ρ(V ) − m ρ(V )−1 − 1

$

i=1

{w}

ρ(V ) i=1

$ %|V | i j − 2m j + 2 σ ( j)

i=1

σ ∈Sρ(V ) 1≤i 1 ···
2ν − i ρ(V ) ν + 1 − m ρ(V )

$ %|V | m i + 1 i −

j=1

×

ρ(V )

⎝

m 1 , . . . , m ρ(V ) distinct ∈ {1, . . . , ν + 1}

1

= #2g+1

⎛

1

2ν − i ρ(V ) ν + 1 − m ρ(V )

···

i1 − 1 m1 − 1

i ρ(V ) − i ρ(V )−1 − 1 m ρ(V ) − m ρ(V )−1 − 1

i2 − i1 − 1 m2 − m1 − 1

···

⎞ ⎠,

where in the last equality we have applied (4.10) and (4.17). (ν,g) The formula for dV given in Theorem 2.2 now follows from the next proposition. Proposition 4.2. The inner-most summation may be re-expressed in terms of the following combinatorial coefficients:

=

coeff of x ν−r +1 in αn r n ∂ (1 + xs )+c in − 2 · (1 + x1 )i1 −1 · · · (1 + xr )ir −ir −1 −1 (1 + xr +1 )2ν−ir |xµ =x ∂ xs n=1 s=1 m 1 < · · · < m ρ(V ) ∈ {1, . . . , ν + 1} r n=1

(i n − 2m n + c)αn

i1 − 1 m1 − 1

i2 − i1 − 1 m2 − m1 − 1

···

ir − ir −1 − 1 m r − m r −1 − 1

2ν − ir ν + 1 − mr

.


To prove this we start with the LHS which, after expanding binomially, becomes αn i −1 i −i −1 ir −ir −1 −1 2ν−i r n 1 2 1 r ∂ 1 + xs +c ··· in − 2 · ∂ xs n=1 s=1 s1 =0 s2 =0 sr =0 sr +1 =0 i1 − 1 i2 − i1 − 1 i − ir −1 − 1 2ν − ir +1 ··· r x1s1 x2s2 · · · xrsr xrsr+1 |xµ =x sr +1 s1 s2 sr

coeff of x ν−r +1 in

= coeff of x ν−r +1 in

i1 − 1 s1

i1 − 1 s1

=

i1 − 1 s1

···

i 1 −1 i 2 −i 1 −1 s1 =0 s2 =0

i2 − i1 − 1 s2 r

···

ir −i r −1 −1 2ν−i r sr =0

2ν − ir sr +1

ir −i r −1 −1 2ν−i r sr =0

(i n − 2n + c − 2 (s1 + · · · + sn ))αn ·

sr +1 =0 n=1

ir − ir −1 − 1 sr ···

r

r

+1 x1s1 x2s2 · · · xrsr xrsr+1 |xµ =x

(i n − 2n + c − 2 (s1 + · · · + sn ))αn ·

sr +1 =0 n=1

ir − ir −1 − 1 sr

2ν − ir sr +1

x s1 +···+sr +1

(i n − 2n + c − 2 (s1 + · · · + sn ))αn ·

s1 +···+sr +1=ν−r +1 n=1

···

s1 =0 s2 =0

i2 − i1 − 1 s2

= coeff of x ν−r +1 in

i 1 −1 i 2 −i 1 −1

i2 − i1 − 1 s2

···

ir − ir −1 − 1 sr

2ν − ir sr +1

,

where in the first equality we use the fact that the monomials in $x_1, \dots, x_{r+1}$ are each simultaneous eigenfunctions for all the differential operators $i_n - 2\sum_{s=1}^{n}\left(1 + x_s\frac{\partial}{\partial x_s}\right) + c$, and therefore the action of the product of these operators on each eigenfunction may be replaced by the product of the corresponding eigenvalues times this function. We next make the following change of variables in the summations:
\[ s_1 = m_1 - 1, \quad s_2 = m_2 - m_1 - 1, \quad \dots, \quad s_k = m_k - m_{k-1} - 1, \quad \dots, \quad s_r = m_r - m_{r-1} - 1, \]
which then requires that $s_{r+1} = \nu - r + 1 - (s_1 + \cdots + s_r) = \nu - r + 1 - (m_r - r) = \nu + 1 - m_r$, and this completes the proof of the proposition.

4.2. Differential equations for $e_g(t)$. We will now show that there are differential equations whose solutions determine $e_g(-s)$ written in terms of the functions $z_g(s)$. We will show that these differential equations can be used to derive expressions for the Taylor coefficients of $e_g(-s)$, $\kappa_g^{(\nu)}(j)$, and verify Theorem 2.5.


Recall the Hirota relation (4.2). Restricted to the level 2ν-time flow this becomes
\[ b_k(\xi)^2 = \frac{\tau_{k+1}(\xi)^2\,\tau_{k-1}(\xi)^2}{\tau_k(\xi)^4}\, b_k(0)^2 \tag{4.18} \]
and is related to the partition function (1.1) through
\[ \tau_k(\xi)^2 = \frac{Z_k(-2\xi k^{\nu-1})}{Z_k(0)}. \]
The logarithm of (4.18) produces a hierarchy of difference equations (discrete Hirota relations): setting $s_k = 2k^{\nu-1}\xi$ in order to write
\[ \tau_k(\xi)^2 = \frac{Z_k(-s_k)}{Z_k(0)}, \]
and applying a logarithm to both sides of Eq. (4.18), one obtains
\[ \log\frac{Z_{k+1}(-s_{k+1})}{Z_{k+1}(0)} - 2\log\frac{Z_k(-s_k)}{Z_k(0)} + \log\frac{Z_{k-1}(-s_{k-1})}{Z_{k-1}(0)} = \log\left(b_k(\xi)^2\right) - \log\left(b_k(0)^2\right). \tag{4.19} \]
We will next describe a continuum limit of these discrete equations which will generate a hierarchy of differential equations for the functions $e_g(t_{2\nu})$ which appear in the asymptotic expansion (1.5). If we shift k → k ± 1, then $s_k$ shifts: $s_k \to s_{k\pm 1} = s_k(1 \pm 1/k)^{\nu-1}$. With this substitution the left-hand side of (4.19) becomes
\[ \log\frac{Z_{k+1}\!\left(-s(1+1/k)^{\nu-1}\right)}{Z_{k+1}(0)} - 2\log\frac{Z_k(-s)}{Z_k(0)} + \log\frac{Z_{k-1}\!\left(-s(1-1/k)^{\nu-1}\right)}{Z_{k-1}(0)}. \tag{4.20} \]
Note that from Theorem 1.2, each term in (4.20) possesses an asymptotic expansion of the form
\[ \log\frac{Z_{k+r}\!\left(-s(1+r/k)^{\nu-1}\right)}{Z_{k+r}(0)} = \sum_{g\ge 0}\left(1 + \frac{r}{k}\right)^{2-2g} k^{2-2g}\, e_g\!\left(-s\left(1 + \frac{r}{k}\right)^{\nu-1}\right), \]
with r = −1, 0, 1. The expression (4.20) is a centered difference. We utilize a straightforward lemma describing centered difference expansions:

Lemma 4.3. If G is an analytic function, then
\[ G(1+z) - 2G(1) + G(1-z) = \sum_{n\ge 1} \frac{2z^{2n}}{(2n)!}\, G^{(2n)}(1) \]
as z → 0.
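For a polynomial G the series in Lemma 4.3 terminates and the identity is exact, which makes it easy to test. The sketch below evaluates both sides in rational arithmetic; the polynomial is stored as a list of coefficients in increasing degree, and the helper names are ours.

```python
from fractions import Fraction
from math import factorial


def poly_eval(coeffs, w):
    """Evaluate sum_i coeffs[i] * w**i."""
    return sum(c * w ** i for i, c in enumerate(coeffs))


def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def centered_difference(coeffs, z):
    """Left-hand side of Lemma 4.3: G(1+z) - 2 G(1) + G(1-z)."""
    return poly_eval(coeffs, 1 + z) - 2 * poly_eval(coeffs, 1) + poly_eval(coeffs, 1 - z)


def even_derivative_series(coeffs, z):
    """Right-hand side of Lemma 4.3: sum over n >= 1 of 2 z^(2n) G^(2n)(1) / (2n)!."""
    total = Fraction(0)
    deriv, order = list(coeffs), 0
    while True:
        deriv = poly_deriv(poly_deriv(deriv))   # advance by two derivatives
        order += 2
        if not deriv:
            break
        total += 2 * z ** order / factorial(order) * poly_eval(deriv, 1)
    return total
```

For G(w) = w⁵ both sides equal 20z² + 10z⁴.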


To apply this lemma we note that (4.20) has an asymptotic expansion of the form
\[ \log\frac{Z_{k+1}\!\left(-s(1+1/k)^{\nu-1}\right)}{Z_{k+1}(0)} - 2\log\frac{Z_k(-s)}{Z_k(0)} + \log\frac{Z_{k-1}\!\left(-s(1-1/k)^{\nu-1}\right)}{Z_{k-1}(0)} = \sum_{g\ge 0} k^{2-2g}\left[ G_g\!\left(1 - \frac{1}{k}\right) - 2G_g(1) + G_g\!\left(1 + \frac{1}{k}\right) \right], \]
where
\[ G_g(w) = w^{2-2g}\, e_g\!\left(-w^{\nu-1}s\right). \]
Using Lemma 4.3, we have
\[ \log\frac{Z_{k+1}\!\left(-s_k(1+\tfrac{1}{k})^{\nu-1}\right)}{Z_{k+1}(0)} - 2\log\frac{Z_k(-s_k)}{Z_k(0)} + \log\frac{Z_{k-1}\!\left(-s_k(1-\tfrac{1}{k})^{\nu-1}\right)}{Z_{k-1}(0)} = \sum_{g\ge 0}\sum_{n\ge 1} k^{2-2g-2n}\,\frac{2}{(2n)!}\,\frac{\partial^{(2n)}}{\partial w^{(2n)}}\left[ w^{2-2g}\, e_g\!\left(-w^{\nu-1}s_k\right) \right]\Big|_{w=1}. \tag{4.21} \]
In terms of the coefficients $z_g$ defined in Proposition 4.1, the right-hand side of (4.19) becomes
\[ \log\left( \sum_{g\ge 0} k^{-2g} z_g(s) \right). \tag{4.22} \]
We get ODEs which determine $e_g(-s)$ by equating like orders in Eq. (4.19) after making the substitutions (4.21) and (4.22). The equations so derived are
\[ \sum_{n=0}^{g} \frac{2}{(2n+2)!}\,\frac{\partial^{(2n+2)}}{\partial w^{(2n+2)}}\left[ w^{2-2(g-n)}\, e_{g-n}\!\left(-w^{\nu-1}s\right) \right]\Big|_{w=1} = \text{the } k^{-2g} \text{ term in } \log\left( \sum_{n=0}^{\infty} k^{-2n} z_n(s) \right), \]
which establishes the hierarchy (2.6). These equations are recursive in the sense that we need to know $z_n$ for n ≤ g and $e_m$ for m < g in order to write a closed differential equation for $e_g$. We will see how this works out in the next section.

5. Continuum Toda Solutions and Fine Structure of the Asymptotic Partition Function

In this section we will prove Theorems 2.3 and 2.5. The equations of the continuum Toda Lattice hierarchy are given by (2.4) and (2.5) and $b_k(t)$ possesses an asymptotic expansion whose leading order, $z_0(s)$, is the branch of the solution of the polynomial equation
\[ 1 = z_0(s) - c_\nu\, s\, z_0(s)^{\nu} \tag{5.1} \]
which is regular at s = 0. Recall that
\[ c_\nu = 2\nu\binom{2\nu-1}{\nu-1}. \]


Implicitly differentiating (5.1) we have the basic identity
\[ ds = \frac{\nu - (\nu-1)z_0}{c_\nu z_0^{\nu+1}}\, dz_0, \tag{5.2} \]
which will be used often in the following sections.

5.1. Integrating the continuum Toda lattice hierarchy. Let $f_g(s, w) = w^{1-2g} z_g(w^{\nu-1}s)$. The $k^{-2g}$-order equation in (2.4) is linear in $z_g(s)$:
\[ z_g'(s) = c_\nu\left( f_0^{\nu} f_{g\,w} + \nu f_0^{\nu-1} f_g f_{0\,w} \right)\Big|_{w=1} + \mathrm{Forcing}_g\big|_{w=1}, \tag{5.3} \]
where $\mathrm{Forcing}_g$ is given by (2.5) and the entire RHS depends only on $z_j$, j < g and their derivatives. (Note: in what follows, if the context makes it clear that the forcing terms are being evaluated at w = 1 we will simply write $\mathrm{Forcing}_g$ rather than $\mathrm{Forcing}_g|_{w=1}$.) Expanding out the derivatives in Eq. (5.3),
\[ z_g' = c_\nu\left[ (1-2g) z_0^{\nu} z_g + (\nu-1) s z_0^{\nu} z_g' + \nu z_0^{\nu} z_g + \nu(\nu-1) s z_0^{\nu-1} z_g z_0' \right] + \mathrm{Forcing}_g. \tag{5.4} \]
Solving for $z_g'$:
\[ \left(1 - c_\nu(\nu-1) s z_0^{\nu}\right) z_g' = c_\nu\left[ (\nu+1-2g) z_0^{\nu} + \nu(\nu-1) s z_0^{\nu-1} z_0' \right] z_g + \mathrm{Forcing}_g. \tag{5.5} \]
We may inductively assume that $z_j(s)$ is a function of $z_0(s)$ for j < g, and as a result $\mathrm{Forcing}_g$ is a function of $z_0$. One can convert Eq. (5.5) to a differential equation for $z_g$ as a function of $z_0$ using Eqs. (5.1) and (5.2):
\[ \frac{dz_g}{dz_0} = \frac{\nu(2-2g) + (\nu-1)(2g-1) z_0}{z_0\left(\nu - (\nu-1)z_0\right)}\, z_g + \frac{\mathrm{Forcing}_g}{c_\nu z_0^{\nu+1}}. \tag{5.6} \]
Equation (5.6) is linear in $z_g$. Let
\[ \begin{aligned} G(z_0, y) &= \int_y^{z_0} \frac{\nu(2-2g) + (\nu-1)(2g-1)u}{u\left(\nu - (\nu-1)u\right)}\, du \\ &= \int_y^{z_0} \left[ (\nu-1)\left(\nu - (\nu-1)u\right)^{-1} + 2(1-g)u^{-1} \right] du \\ &= \Big[ -\log\left(\nu - (\nu-1)u\right) + 2(1-g)\log(u) \Big]_{u=y}^{u=z_0}. \end{aligned} \]
Integrating Eq. (5.6) one has
\[ z_g(s) = \int_1^{z_0} \exp\left(G(z_0, y)\right) \frac{1}{c_\nu y^{\nu+1}}\, \mathrm{Forcing}_g(y)\, dy = \frac{z_0^{2(1-g)}}{\nu - (\nu-1)z_0} \int_1^{z_0} \frac{\nu - (\nu-1)y}{c_\nu\, y^{\nu+3-2g}}\, \mathrm{Forcing}_g(y)\, dy, \tag{5.7} \]
where $z_0 = z_0(s)$. This establishes the basic formula stated in Theorem 2.3(3).


5.2. Singularities of $z_g$. In this section we complete the proof of Theorem 2.3. Expression (5.7) can be exploited to determine the singularity structure of $z_g(s)$. Using (5.2) we can solve for $z_0'(s)$:
\[ z_0'(s) = \frac{c_\nu z_0^{\nu+1}}{\nu - (\nu-1)z_0}. \]
With this we can re-express successive derivatives of $z_0$ as functions of just $z_0$ by using the chain rule. Doing so, we find the following result:

Lemma 5.1. The derivatives of $z_0(s)$ have the following form:
\[ z_0^{(j)}(s) = \frac{c_\nu^{\,j}\, z_0^{\nu j+1}}{\left(\nu - (\nu-1)z_0\right)^{2j-1}} \cdot (\text{Polynomial of degree } j-1). \tag{5.8} \]
Equation (5.8) together with (5.2) give the following result:
\[ s^j z_0^{(j)}(s) = \frac{z_0 (z_0-1)^j}{\left(\nu - (\nu-1)z_0\right)^{2j-1}} \cdot (\text{Polynomial of degree } j-1). \]
The terms in $\mathrm{Forcing}_g$ (2.5) are each of the form
\[ (f_{k_1})_{w^{(n_1)}}\, (f_{k_2})_{w^{(n_2)}} \cdots (f_{k_{\nu+1}})_{w^{(n_{\nu+1})}}\Big|_{w=1}, \tag{5.9} \]
where $f_k = w^{1-2k} z_k(w^{\nu-1}s)$, and $n_1 + n_2 + \cdots + n_{\nu+1} = 2(g - k_1 - k_2 - \cdots - k_{\nu+1})$. Expanding the derivatives in (5.9) one finds that $\mathrm{Forcing}_g$ is in fact comprised of terms of the form
\[ s^j\, z_{k_1}^{(j_1)}\, z_{k_2}^{(j_2)} \cdots z_{k_{\nu+1}}^{(j_{\nu+1})}, \]
where $j = j_1 + j_2 + \cdots + j_{\nu+1}$, $0 \le j \le 2(g - k_1 - k_2 - \cdots - k_{\nu+1})$.

Lemma 5.2. The function $z_g^{(j)}(s)$ (j derivatives of $z_g(s)$) is of class iir with singularities occurring only at $z_0 = 0$ or $z_0 = \nu/(\nu-1)$.

We will first prove by induction that $z_g(s)$ satisfies Lemma 5.2; see Sect. 5.3 for the initial step. Assume that $z_k^{(j)}(s)$ for k < g satisfies Lemma 5.2; this assumption, Lemma 5.1 and formula (5.1) show that the integrand in formula (5.7) is a polynomial in functions of class iir with singularities restricted to 0 and ν/(ν − 1) and having rational coefficients with at worst poles located at $z_0 = 0$. Thus the integral remains of class iir with the stated singularities and (5.7) is then this integral multiplied by a rational function with pole only at ν/(ν − 1).

We next prove, by induction on j, that the higher derivatives, $z_g^{(j)}(s)$, satisfy Lemma 5.2, which will complete the original induction. A derivative with respect to $z_0$ will raise the order of a singularity but will not introduce new singularities. Compute
\[ z_g^{(j)}(s) = \frac{d}{dz_0}\left[ z_g^{(j-1)}(s) \right] z_0'(s); \]
by the induction step (in j) and formula (5.8) this expression is of class iir with singularities only at $z_0 = 0$ and $z_0 = \nu/(\nu-1)$. With the forthcoming verification of the initial step of the induction (Subsect. 5.3), this completes the proof of Theorem 2.3.


5.3. Example: g = 1. Recall that $f_0(s, w) = w z_0(w^{\nu-1}s)$ and $f_1(s, w) = w^{-1} z_1(w^{\nu-1}s)$. The forcing for $z_1$ is given exclusively in terms of $z_0$ as
\[ c_\nu^{-1}\,\mathrm{Forcing}_1 = c_\nu^{-1} F_1^{(\nu)}[0] = \frac{\nu}{6} f_0^{\nu} f_{0\,www} + \frac{\nu(\nu-1)}{3} f_0^{\nu-1} f_{0\,w} f_{0\,ww} + \frac{\nu(\nu-1)(\nu-2)}{12} f_0^{\nu-2} f_{0\,w}^{3}. \]
The entire expression is evaluated at w = 1. We find that
\[ \begin{aligned} c_\nu^{-1} F_1^{(\nu)}[0]\big|_{w=1} &= \frac{\nu(\nu-1)(\nu-2)}{12} z_0^{\nu+1} + \frac{\nu(\nu-1)}{12}\left(9\nu^2 - 17\nu + 6\right) s z_0^{\nu} z_0' \\ &\quad + \frac{5\nu(\nu-1)^3}{6} s^2 z_0^{\nu} z_0'' + \frac{\nu(\nu-1)^3}{12}(7\nu-6)\, s^2 z_0^{\nu-1} (z_0')^2 \\ &\quad + \frac{\nu(\nu-1)^3}{6} s^3 z_0^{\nu} z_0''' + \frac{\nu(\nu-1)^4}{3} s^3 z_0^{\nu-1} z_0' z_0'' \\ &\quad + \frac{\nu(\nu-1)^4(\nu-2)}{12} s^3 z_0^{\nu-2} (z_0')^3. \end{aligned} \tag{5.10} \]
Inserting (5.10) into Eq. (5.7) with g = 1 (writing all derivatives out as rational expressions of $z_0$) yields
\[ z_1(t) = \frac{(\nu-1)\,\nu\,(z_0-1)\,z_0\left(-\nu^2 - 2z_0 + \nu z_0 + \nu^2 z_0\right)}{12\left(\nu - (\nu-1)z_0\right)^4}. \tag{5.11} \]
From this we see that not only does $z_1(t)$ satisfy Lemma 5.2, but in fact this expression is a rational function, without any of the potential complexities of the iir class. In fact all the examples we have worked out here for low values of g yield rational expressions in $z_0$ for $z_g$. Moreover, the poles of these expressions are restricted to $z_0 = \nu/(\nu-1)$.

Remark. The rationality of (5.11) seems to depend very sensitively on the precise coefficients appearing in $\mathrm{Forcing}_g$. We illustrate this by looking at the contributions that particular terms of (5.10) give to (5.11). The integrand in integral (5.7) has a term of the form
\[ \frac{\nu - (\nu-1)z_0}{z_0^{\nu+1}}\, s^2 z_0^{\nu-1} (z_0')^2 = \frac{(z_0-1)^2}{\nu - (\nu-1)z_0}, \]
where we have used Eq. (5.2) to eliminate $z_0'$. This term produces a logarithmic singularity at ν/(ν − 1); therefore one of the other terms must cancel this one to eliminate the logarithm.

From the representation (5.11) we can compute the Taylor coefficients of $z_1(t)$ using the technique outlined in formulas (3.27)-(3.29):
\[ \begin{aligned} \zeta_j^{(1)} &= \frac{c_\nu^{\,j}}{2\pi i}\oint_{u\sim 1} \frac{\left(\nu - (\nu-1)u\right) u^{\nu j-1}}{(u-1)^{j+1}}\, z_1(z_0 = u)\, du \\ &= \frac{c_\nu^{\,j}}{2\pi i}\oint_{u\sim 1} \frac{(\nu-1)\,\nu\, u^{\nu j}\left(-\nu^2 - 2u + \nu u + \nu^2 u\right)}{12\,(u-1)^{j}\left(\nu - (\nu-1)u\right)^3}\, du \\ &= \frac{c_\nu^{\,j}}{2\pi i}\oint_{w\sim 0} \frac{(\nu-1)\,\nu\,(w+1)^{\nu j}\left(\nu - 2 + (\nu^2+\nu-2)w\right)}{12\left(1 - (\nu-1)w\right)^3 w^{j}}\, dw. \end{aligned} \tag{5.12} \]
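The last integral in (5.12) is a residue at w = 0, that is, $c_\nu^{\,j}$ times the coefficient of $w^{j-1}$ in the rational function multiplying $w^{-j}$. This can be cross-checked against a direct power series expansion of (5.11) in the variable $\alpha = c_\nu s$, using $z_0 = \sum_j \frac{1}{j}\binom{\nu j}{j-1}\alpha^j$ from (3.29). The sketch below performs both computations with exact fractions; the helper names are ours, and the factor $c_\nu^{\,j}$ is omitted on both sides because the expansion is taken in α rather than in s.

```python
from fractions import Fraction
from math import comb


def mul(a, b, n):
    """Product of two power series, truncated at order n."""
    out = [Fraction(0)] * (n + 1)
    for i, ai in enumerate(a[: n + 1]):
        for j, bj in enumerate(b[: n + 1 - i]):
            out[i + j] += ai * bj
    return out


def inverse(a, n):
    """Reciprocal of a power series with nonzero constant term, truncated at order n."""
    out = [Fraction(0)] * (n + 1)
    out[0] = 1 / Fraction(a[0])
    for m in range(1, n + 1):
        acc = sum(a[i] * out[m - i] for i in range(1, min(m, len(a) - 1) + 1))
        out[m] = -acc / a[0]
    return out


def z1_series(nu, n):
    """Coefficients of z_1 in powers of alpha = c_nu * s, from the rational form (5.11)."""
    z0 = [Fraction(1)] + [Fraction(comb(nu * j, j - 1), j) for j in range(1, n + 1)]
    z0_minus_1 = [Fraction(0)] + z0[1:]
    linear = [(nu * nu + nu - 2) * c for c in z0]
    linear[0] -= nu * nu                       # -nu^2 + (nu^2 + nu - 2) * z0
    base = [-(nu - 1) * c for c in z0]
    base[0] += nu                              # nu - (nu - 1) * z0
    denom = [Fraction(12)] + [Fraction(0)] * n
    for _ in range(4):
        denom = mul(denom, base, n)            # 12 * (nu - (nu - 1) * z0)**4
    numer = mul(mul(z0_minus_1, z0, n), linear, n)
    numer = [nu * (nu - 1) * c for c in numer]
    return mul(numer, inverse(denom, n), n)


def z1_coefficient(nu, j):
    """nu(nu-1)/12 times the w**(j-1) coefficient of the integrand factor in (5.12)."""
    total = 0
    for i in range(j):
        h = (nu - 2) * comb(nu * j, i)
        if i >= 1:
            h += (nu - 1) * (nu + 2) * comb(nu * j, i - 1)
        m = j - 1 - i
        total += h * comb(m + 2, 2) * (nu - 1) ** m   # expansion of (1 - (nu-1) w)**(-3)
    return Fraction(nu * (nu - 1) * total, 12)
```

For ν = 2 both computations give the coefficients 0, 2/3, 6 for j = 1, 2, 3.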

In the last line we have made the change of variables w = u − 1. With the integral representation (5.12) one may conclude that
\[ \zeta_j^{(1)} = c_\nu^{\,j}\,\frac{\nu(\nu-1)}{12}\left[ w^{j-1} \text{ Taylor coefficient of } \frac{(w+1)^{\nu j}\left((\nu-2) + (\nu-1)(\nu+2)w\right)}{\left(1 - (\nu-1)w\right)^3} \right]. \]

5.4. Example: g = 2. The forcing terms for $z_2$ (2.5) are:

(1) The first term is the summation in (2.5):
\[ c_\nu\, \nu f_0^{\nu-1} f_1 f_{1\,w} + c_\nu\, \frac{\nu(\nu-1)}{2} f_0^{\nu-2} f_{0\,w} f_1^{2}. \]

(2) The middle term is $F_1^{(\nu)}[2]$:
\[ \begin{aligned} c_\nu^{-1} F_1^{(\nu)}[2] &= \frac{\nu^2}{6} f_0^{\nu-1} f_1 f_{0\,www} + \frac{\nu}{6} f_0^{\nu} f_{1\,www} + \frac{\nu(\nu-1)^2}{3} f_0^{\nu-2} f_1 f_{0\,w} f_{0\,ww} \\ &\quad + \frac{\nu(\nu-1)}{3} f_0^{\nu-1} f_{1\,w} f_{0\,ww} + \frac{\nu(\nu-1)}{3} f_0^{\nu-1} f_{0\,w} f_{1\,ww} \\ &\quad + \frac{\nu(\nu-1)(\nu-2)^2}{12} f_0^{\nu-3} f_1 f_{0\,w}^{3} + \frac{\nu(\nu-1)(\nu-2)}{4} f_0^{\nu-2} f_{0\,w}^{2} f_{1\,w}. \end{aligned} \]

(3) The last term arises from $F_2^{(\nu)}[0]$:
\[ \begin{aligned} \frac{1}{c_\nu} F_2^{(\nu)}[0]\big|_{w=1} &= \frac{\nu(2\nu-1)}{120} f_0^{\nu} f_{0\,w^{(5)}} + \frac{\nu(\nu-1)(18\nu-11)}{360} f_0^{\nu-1} f_{0\,w} f_{0\,w^{(4)}} \\ &\quad + \frac{\nu(\nu-1)(6\nu-5)}{72} f_0^{\nu-1} f_{0\,ww} f_{0\,www} \\ &\quad + \frac{\nu(\nu-1)(\nu-2)(23\nu-20)}{360} f_0^{\nu-2} f_{0\,w}^{2} f_{0\,www} \\ &\quad + \frac{\nu(\nu-1)(\nu-2)(62\nu-65)}{720} f_0^{\nu-2} f_{0\,w} f_{0\,ww}^{2} \\ &\quad + \frac{\nu(\nu-1)(\nu-2)(\nu-3)(16\nu-19)}{360} f_0^{\nu-3} f_{0\,ww} f_{0\,w}^{3} \\ &\quad + \frac{\nu(\nu-1)(\nu-2)(\nu-3)(\nu-4)(5\nu-7)}{1440} f_0^{\nu-4} f_{0\,w}^{5}. \end{aligned} \]

The three terms make up the forcing terms for $z_2$; each of them is expressed as a function of $z_0$. Substituting this explicit representation into Eq. (5.7) with g = 2 we find that
\[ \begin{aligned} z_2(t) = \frac{1}{1440}(\nu-1)\,\nu\,(z_0-1)\,z_0 \Big[ &(2\nu^6 - 14\nu^7 + 24\nu^8) \\ &+ (-12\nu^3 + 148\nu^4 - 546\nu^5 + 758\nu^6 - 252\nu^7 - 96\nu^8)\, z_0 \\ &+ (264\nu^2 - 1510\nu^3 + 25551\nu^4 - 500\nu^5 - 1789\nu^6 + 840\nu^7 + 144\nu^8)\, z_0^2 \\ &+ (-536\nu + 1396\nu^2 + 912\nu^3 - 4596\nu^4 + 2492\nu^5 + 1296\nu^6 - 868\nu^7 - 96\nu^8)\, z_0^3 \\ &+ (168 + 234\nu - 1467\nu^2 + 558\nu^3 + 1902\nu^4 - 1446\nu^5 - 267\nu^6 + 294\nu^7 + 24\nu^8)\, z_0^4 \Big] \cdot \left(\nu - (\nu-1)z_0\right)^{-9}. \end{aligned} \]


N. M. Ercolani et al.

Let
\[
\begin{aligned}
r_2(w) = \frac{c_\nu^j\,\nu(\nu-1)}{1440}\,(w+1)^{\nu j}\,\Big[\,&(\nu-2)(\nu-3)(\nu-4)(5\nu-7)\\
&+ 2(\nu-1)(\nu-2)(168 + 84\nu - 246\nu^2 + 73\nu^3)\,w\\
&+ (\nu-1)^2(\nu-2)(-504 - 1158\nu + 288\nu^2 + 497\nu^3)\,w^2\\
&+ 4(\nu-1)^3(-168 - 604\nu - 190\nu^2 + 288\nu^3 + 77\nu^4)\,w^3\\
&+ 3(\nu-1)^4(56 + 302\nu + 383\nu^2 + 130\nu^3 + 8\nu^4)\,w^4\Big]\,(1-(\nu-1)w)^{-8}.
\end{aligned}
\]
We compute the Taylor coefficients of $z_2(t)$ with a loop integral:
\[
\begin{aligned}
\zeta_j^{(2)} &= \frac{c_\nu^j}{2\pi i}\oint_{u\sim 1}\frac{(\nu-(\nu-1)u)\,u^{\nu j-1}}{(u-1)^{j+1}}\,z_2(z_0=u)\,du\\
&= \text{the } w^{j-1} \text{ coefficient of } r_2(w).
\end{aligned}
\]
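The loop integrals for $\zeta_j^{(1)}$ and $\zeta_j^{(2)}$ reduce to reading off one Taylor coefficient of a rational function, which can be done by multiplying truncated power series. The following is a minimal sketch for the $g = 1$ formula only, using exact rational arithmetic. The function name `zeta1` and the argument `c_nu` are illustrative and not part of the paper; the constant $c_\nu$ is defined elsewhere in the paper and is simply passed in.

```python
from fractions import Fraction
from math import comb


def zeta1(nu, j, c_nu):
    """Return the w^(j-1) Taylor coefficient of
    c_nu^j * nu(nu-1)/12 * (1+w)^(nu*j) * ((nu-2) + (nu-1)(nu+2) w) / (1-(nu-1)w)^3.
    """
    m = j - 1  # order of the coefficient we need
    # (1 + w)^(nu*j) = sum_k C(nu*j, k) w^k, truncated at order m
    binom = [comb(nu * j, k) for k in range(m + 1)]
    # the linear factor (nu-2) + (nu-1)(nu+2) w
    linear = [nu - 2, (nu - 1) * (nu + 2)]
    # (1 - a w)^(-3) = sum_k C(k+2, 2) a^k w^k with a = nu - 1
    inverse_cube = [comb(k + 2, 2) * (nu - 1) ** k for k in range(m + 1)]

    # Cauchy product of the three series, keeping only the w^m term
    total = 0
    for a, coeff_a in enumerate(binom):
        for b, coeff_b in enumerate(linear):
            c = m - a - b
            if c >= 0:
                total += coeff_a * coeff_b * inverse_cube[c]
    return Fraction(c_nu) ** j * Fraction(nu * (nu - 1), 12) * total
```

For $\nu = 2$ and $j = 1$ the result is 0 for any value of `c_nu`, because the constant term of the numerator carries the factor $(\nu - 2)$. The same series product applies to $r_2(w)$ once its quartic bracket is substituted for the linear factor.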

5.5. Example: g = 3. To compute $z_3$ we compute the forcing as above. Here we outline how the terms contributing to this forcing are found and then present the resulting explicit expression for $z_3$. There are six terms in formula (2.5) when $g = 3$:

(1-2) The first two terms occur in the summation in (2.5) when $g = 3$. One term contains three $f_1$'s and is
\[
\frac{\nu(\nu-1)(\nu-2)}{6} f_0^{\nu-3} f_1^3 f_{0w} + \frac{\nu(\nu-1)}{2} f_0^{\nu-2} f_1^2 f_{1w} .
\]
The other, containing an $f_1$ and an $f_2$, is
\[
\nu(\nu-1) f_0^{\nu-2} f_1 f_2 f_{0w} + \nu f_0^{\nu-1} f_1 f_{2w} + \nu f_0^{\nu-1} f_2 f_{1w} .
\]

(3-4) The next two terms are contributed by $F_1^{(\nu)}[4]$: this expression represents the fourth order contributions of $F_1^{(\nu)}$. The terms involving two $f_1$'s are
\[
\begin{aligned}
c_\nu\Big[\,&\frac{\nu^2}{6} f_0^{\nu-1} f_1 f_{1www} + \frac{\nu^2(\nu-1)}{12} f_0^{\nu-2} f_1^2 f_{0www} + \frac{\nu(\nu-1)}{3} f_0^{\nu-1} f_{1w} f_{1ww}\\
&+ \frac{\nu(\nu-1)^2}{3} f_0^{\nu-2} f_1 f_{0w} f_{1ww} + \frac{\nu(\nu-1)^2}{3} f_0^{\nu-2} f_1 f_{1w} f_{0ww}\\
&+ \frac{\nu(\nu-1)^2(\nu-2)}{6} f_0^{\nu-3} f_1^2 f_{0w} f_{0ww} + \frac{\nu(\nu-1)(\nu-2)}{4} f_0^{\nu-2} f_{0w} f_{1w}^2\\
&+ \frac{\nu(\nu-1)(\nu-2)^2}{4} f_0^{\nu-3} f_1 f_{0w}^2 f_{1w} + \frac{\nu(\nu-1)(\nu-2)^2(\nu-3)}{24} f_0^{\nu-4} f_1^2 f_{0w}^3\Big].
\end{aligned}
\]
The terms containing one $f_2$ are
\[
\begin{aligned}
c_\nu\Big[\,&\frac{\nu}{6} f_0^{\nu} f_{2www} + \frac{\nu^2}{6} f_0^{\nu-1} f_2 f_{0www} + \frac{\nu(\nu-1)}{3} f_0^{\nu-1} f_{0w} f_{2ww}\\
&+ \frac{\nu(\nu-1)}{3} f_0^{\nu-1} f_{2w} f_{0ww} + \frac{\nu(\nu-1)^2}{3} f_0^{\nu-2} f_2 f_{0w} f_{0ww}\\
&+ \frac{\nu(\nu-1)(\nu-2)}{4} f_0^{\nu-2} f_{0w}^2 f_{2w} + \frac{\nu(\nu-1)(\nu-2)^2}{12} f_0^{\nu-3} f_2 f_{0w}^3\Big].
\end{aligned}
\]

(5) The fifth term is
\[
\begin{aligned}
F_2^{(\nu)}[2] = c_\nu\Big[\,&\frac{\nu(2\nu-1)}{120} f_0^{\nu} f_{1w^{(5)}} + \frac{\nu^2(2\nu-1)}{120} f_0^{\nu-1} f_1 f_{0w^{(5)}}\\
&+ \frac{\nu(\nu-1)(18\nu-11)}{360} f_0^{\nu-1} f_{0w} f_{1w^{(4)}}\\
&+ \frac{\nu(\nu-1)(18\nu-11)}{360} f_0^{\nu-1} f_{1w} f_{0w^{(4)}}\\
&+ \frac{\nu(\nu-1)^2(18\nu-11)}{360} f_0^{\nu-2} f_1 f_{0w} f_{0w^{(4)}}\\
&+ \frac{\nu(\nu-1)(6\nu-5)}{72} f_0^{\nu-1} f_{0ww} f_{1www}\\
&+ \frac{\nu(\nu-1)(6\nu-5)}{72} f_0^{\nu-1} f_{1ww} f_{0www}\\
&+ \frac{\nu(\nu-1)^2(6\nu-5)}{72} f_0^{\nu-2} f_1 f_{0ww} f_{0www}\\
&+ \frac{\nu(\nu-1)(\nu-2)(23\nu-20)}{360} f_0^{\nu-2} f_{0w}^2 f_{1www}\\
&+ \frac{\nu(\nu-1)(\nu-2)(23\nu-20)}{180} f_0^{\nu-2} f_{0w} f_{1w} f_{0www}\\
&+ \frac{\nu(\nu-1)(\nu-2)^2(23\nu-20)}{360} f_0^{\nu-3} f_1 f_{0w}^2 f_{0www}\\
&+ \frac{\nu(\nu-1)(\nu-2)(62\nu-65)}{360} f_0^{\nu-2} f_{0w} f_{0ww} f_{1ww}\\
&+ \frac{\nu(\nu-1)(\nu-2)(62\nu-65)}{720} f_0^{\nu-2} f_{1w} f_{0ww}^2\\
&+ \frac{\nu(\nu-1)(\nu-2)^2(62\nu-65)}{720} f_0^{\nu-3} f_1 f_{0w} f_{0ww}^2\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(16\nu-19)}{360} f_0^{\nu-3} f_{0w}^3 f_{1ww}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(16\nu-19)}{120} f_0^{\nu-3} f_{0w}^2 f_{1w} f_{0ww}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)^2(16\nu-19)}{360} f_0^{\nu-4} f_1 f_{0w}^3 f_{0ww}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(\nu-4)(5\nu-7)}{288} f_0^{\nu-4} f_{0w}^4 f_{1w}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(\nu-4)^2(5\nu-7)}{1440} f_0^{\nu-5} f_1 f_{0w}^5\Big].
\end{aligned}
\]

(6) The last term is
\[
\begin{aligned}
F_3^{(\nu)}[0] = c_\nu\Big[\,&\frac{\nu}{7!}(3 - 8\nu + 6\nu^2)\, f_0^{\nu} f_{0w^{(7)}} + \frac{\nu(\nu-1)}{7!\cdot 3}(47 - 108\nu + 72\nu^2)\, f_0^{\nu-1} f_{0w} f_{0w^{(6)}}\\
&+ \frac{\nu(\nu-1)}{7!\cdot 2}(105 - 210\nu + 112\nu^2)\, f_0^{\nu-1} f_{0ww} f_{0w^{(5)}}\\
&+ \frac{\nu(\nu-1)}{7!\cdot 6}(539 - 1050\nu + 504\nu^2)\, f_0^{\nu-1} f_{0w^{(3)}} f_{0w^{(4)}}\\
&+ \frac{\nu(\nu-1)(\nu-2)}{7!\cdot 2}(86 - 163\nu + 86\nu^2)\, f_0^{\nu-2} f_{0w}^2 f_{0w^{(5)}}\\
&+ \frac{\nu(\nu-1)(\nu-2)}{7!\cdot 6}(1387 - 2416\nu + 1044\nu^2)\, f_0^{\nu-2} f_{0w} f_{0ww} f_{0w^{(4)}}\\
&+ \frac{\nu(\nu-1)(\nu-2)}{7!\cdot 3}(467 - 803\nu + 327\nu^2)\, f_0^{\nu-2} f_{0w} f_{0w^{(3)}}^2\\
&+ \frac{\nu(\nu-1)(\nu-2)}{7!\cdot 6}(1456 - 2359\nu + 882\nu^2)\, f_0^{\nu-2} f_{0ww}^2 f_{0w^{(3)}}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)}{7!\cdot 6}(410 - 665\nu + 270\nu^2)\, f_0^{\nu-3} f_{0w}^3 f_{0w^{(4)}}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)}{7!\cdot 2}(857 - 1309\nu + 458\nu^2)\, f_0^{\nu-3} f_{0w}^2 f_{0ww} f_{0w^{(3)}}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)}{7!\cdot 6}(1327 - 1932\nu + 620\nu^2)\, f_0^{\nu-3} f_{0w} f_{0ww}^3\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(\nu-4)}{7!\cdot 12}(788 - 1121\nu + 359\nu^2)\, f_0^{\nu-4} f_{0w}^4 f_{0w^{(3)}}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(\nu-4)}{7!\cdot 12}(2431 - 3315\nu + 974\nu^2)\, f_0^{\nu-4} f_{0w}^3 f_{0ww}^2\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(\nu-4)(\nu-5)}{7!\cdot 6}(228 - 290\nu + 77\nu^2)\, f_0^{\nu-5} f_{0w}^5 f_{0ww}\\
&+ \frac{\nu(\nu-1)(\nu-2)(\nu-3)(\nu-4)(\nu-5)(\nu-6)}{7!\cdot 72}(124 - 147\nu + 35\nu^2)\, f_0^{\nu-6} f_{0w}^7\Big].
\end{aligned}
\]

The entire forcing expression is a rational function of $z_0$ with singularities only at $z_0 = \nu/(\nu-1)$. We insert these forcing terms into Eq. (5.7) with $g = 3$ and find that
\[
z_3(t) = \frac{\nu(\nu-1)}{362880}\,\frac{z_0(z_0-1)}{(\nu-(\nu-1)z_0)^{14}}\,P_3(z_0-1),
\]
where $P_3$ denotes the polynomial
\[
\begin{aligned}
P_3(w) ={}& (\nu-2)(\nu-3)(\nu-4)(\nu-5)(\nu-6)(124 - 147\nu + 35\nu^2)\\
&+ (\nu-3)(\nu-2)(\nu-1)(104160 + 47584\nu - 332550\nu^2 + 270697\nu^3 - 83226\nu^4 + 8923\nu^5)\,w\\
&+ 3(\nu-2)(\nu-1)^2(312480 + 744980\nu - 1245750\nu^2 + 373091\nu^3 + 1085920\nu^4 - 485414\nu^5 + 67225\nu^6)\,w^2\\
&+ (\nu-2)(\nu-1)^3(-1562400 - 7251840\nu + 290690\nu^2 + 11468057\nu^3 - 2824078\nu^4 - 3154302\nu^5 + 1078663\nu^6)\,w^3\\
&+ (\nu-2)(\nu-1)^4(1562400 + 10781280\nu + 12588010\nu^2 - 10677353\nu^3 - 11255921\nu^4 + 3006363\nu^5 + 1779986\nu^6)\,w^4\\
&+ 3(\nu-1)^5(624960 + 5411808\nu + 10100796\nu^2 - 1315908\nu^3 - 9371695\nu^4 - 973573\nu^5 + 1835799\nu^6 + 308858\nu^7)\,w^5\\
&+ (\nu-1)^6(-624960 - 6823584\nu - 20098900\nu^2 - 16851720\nu^3 + 3867117\nu^4 + 8356442\nu^5 + 2223760\nu^6 + 119824\nu^7)\,w^6\\
&+ 5(\nu-1)^7(17856 + 235296\nu + 939236\nu^2 + 1505064\nu^3 + 1032603\nu^4 + 285860\nu^5 + 24472\nu^6 + 64\nu^7)\,w^7 .
\end{aligned}
\]
Let
\[
r_3(w) = \frac{c_\nu^j\,\nu(\nu-1)}{362880}\,(w+1)^{\nu j}\,P_3(w)\,(1-(\nu-1)w)^{-13}.
\]
We compute the Taylor coefficients of $z_3(t)$ with a loop integral:
\[
\begin{aligned}
\zeta_j^{(3)} &= \frac{c_\nu^j}{2\pi i}\oint_{u\sim 1}\frac{(\nu-(\nu-1)u)\,u^{\nu j-1}}{(u-1)^{j+1}}\,z_3(z_0=u)\,du\\
&= \text{the } w^{j-1} \text{ coefficient of } r_3(w).
\end{aligned}
\]


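5.6. Determining $e_g(-s_{2\nu})$. Theorem 2.4 gives a second order differential equation determining $e_g(-s_{2\nu})$:
\[
\frac{\partial^2}{\partial w^2}\Big[w^{2-2g}\,e_g(-w^{\nu-1}s)\Big] = \mathrm{drivers}_g, \tag{5.13}
\]
where the entire equation is evaluated at $w = 1$ and where the drivers are the right-hand side of Eq. (2.6):
\[
-\sum_{n=1}^{g}\frac{2}{(2n+2)!}\,\frac{\partial^{2n+2}}{\partial w^{2n+2}}\Big[w^{2-2(g-n)}\,e_{g-n}(-w^{\nu-1}s)\Big] + \text{the } k^{-2g} \text{ term of } \log\Big(\sum_{n=0}^{\infty}\frac{1}{k^{2n}}\,z_n(s)\Big).
\]
Expanding the two derivatives on the left-hand side of (5.13) and setting $w = 1$ we arrive at the basic ODE for the $e_g$:
\[
(2-2g)(1-2g)\,e_g(-s) - (\nu-1)(\nu+2-4g)\,s\,e_g'(-s) + (\nu-1)^2 s^2 e_g''(-s) = \mathrm{drivers}_g. \tag{5.14}
\]
If $g \neq 1$ one solves this equation by integrating factors as follows. Multiply both sides of (5.14) by $s^{\gamma_1}$ with
\[
\gamma_1 = \gamma_1^{(+)} = \frac{3-\nu-2g}{\nu-1} \quad\text{or}\quad \gamma_1^{(-)} = \frac{2-\nu-2g}{\nu-1}, \tag{5.15}
\]
to arrive at
\[
(2-2g)(1-2g)\,s^{\gamma_1} e_g(-s) - (\nu-1)(\nu+2-4g)\,s^{\gamma_1+1} e_g'(-s) + (\nu-1)^2 s^{\gamma_1+2} e_g''(-s) = s^{\gamma_1}\,\mathrm{drivers}_g. \tag{5.16}
\]
Let $A = (2-2g)(1-2g)/(\gamma_1+1)$. We then integrate Eq. (5.16) once,
\[
A\,s^{\gamma_1+1} e_g(-s) - (\nu-1)^2 s^{\gamma_1+2} e_g'(-s) = \int_0^s s_1^{\gamma_1}\,\mathrm{drivers}_g(s_1)\,ds_1 + K_1,
\]
where $K_1$ is a constant of integration. Multiplying by $s^{\gamma_2}$,
\[
A\,s^{\gamma_1+\gamma_2+1} e_g(-s) - (\nu-1)^2 s^{\gamma_1+\gamma_2+2} e_g'(-s) = s^{\gamma_2}\int_0^s s_1^{\gamma_1}\,\mathrm{drivers}_g(s_1)\,ds_1 + s^{\gamma_2} K_1, \tag{5.17}
\]
we see that if $\gamma_2$ satisfies
\[
\gamma_2 = \frac{(2-2g)(1-2g) - (\nu-1)^2(\gamma_1+1)(\gamma_1+2)}{(\gamma_1+1)(\nu-1)^2}, \tag{5.18}
\]
then the left-hand side of (5.17) is the $s$-derivative of $(\nu-1)^2 s^{\gamma_1+\gamma_2+2} e_g(-s)$, and one may solve Eq. (5.17) for $e_g(-s)$:
\[
e_g(-s) = \frac{1}{(\nu-1)^2}\,s^{-\gamma_1-\gamma_2-2}\int_0^s\!\!\int_0^{s_1} s_1^{\gamma_2} s_2^{\gamma_1}\,\mathrm{drivers}_g(s_2)\,ds_2\,ds_1 + K_1 s^{-\gamma_1-1} + K_2 s^{-\gamma_1-\gamma_2-2}, \tag{5.19}
\]
where $K_1$ and $K_2$ are constants of integration. Switching the order of integration in (5.19),
\[
e_g(-s) = \frac{1}{(\nu-1)^2}\,s^{-\gamma_1-\gamma_2-2}\int_0^s\!\!\int_{s_2}^{s} s_1^{\gamma_2} s_2^{\gamma_1}\,\mathrm{drivers}_g(s_2)\,ds_1\,ds_2 + K_1 s^{-\gamma_1-1} + K_2 s^{-\gamma_1-\gamma_2-2}, \tag{5.20}
\]
one can compute the $s_1$ integral in (5.20):
\[
e_g(-s) = \frac{1}{(\nu-1)^2}\,\frac{1}{\gamma_2+1}\Big[s^{-\gamma_1-1}\int_0^s s_2^{\gamma_1}\,\mathrm{drivers}_g\,ds_2 - s^{-\gamma_1-\gamma_2-2}\int_0^s s_2^{\gamma_1+\gamma_2+1}\,\mathrm{drivers}_g\,ds_2\Big] + K_1 s^{-\gamma_1-1} + K_2 s^{-\gamma_1-\gamma_2-2}.
\]
Applying an integration by parts to the remaining integrals (integrating the power of $s_2$ and differentiating the drivers with respect to $s_2$, where $(\mathrm{drivers}_g)^{\bullet}$ denotes the derivative with respect to $z_0$):
\[
\begin{aligned}
e_g(-s) = \frac{1}{(\nu-1)^2(\gamma_2+1)}\Big[&\Big(\frac{1}{\gamma_1+1} - \frac{1}{\gamma_1+\gamma_2+2}\Big)\mathrm{drivers}_g - \frac{s^{-\gamma_1-1}}{\gamma_1+1}\int_0^s s_2^{\gamma_1+1}\,(\mathrm{drivers}_g)^{\bullet} z_0'\,ds_2\\
&+ \frac{s^{-\gamma_1-\gamma_2-2}}{\gamma_1+\gamma_2+2}\int_0^s s_2^{\gamma_1+\gamma_2+2}\,(\mathrm{drivers}_g)^{\bullet} z_0'\,ds_2\Big] + K_1 s^{-\gamma_1-1} + K_2 s^{-\gamma_1-\gamma_2-2}.
\end{aligned}
\tag{5.21}
\]
We may now change variables and integrate with respect to $y = z_0(s_2)$ in (5.21):
\[
\begin{aligned}
e_g(-s) = \frac{1}{(\nu-1)^2(\gamma_2+1)}\Big[&\frac{\gamma_2+1}{(\gamma_1+1)(\gamma_1+\gamma_2+2)}\,\mathrm{drivers}_g\\
&- \frac{1}{\gamma_1+1}\Big(\frac{c_\nu z_0^{\nu}}{z_0-1}\Big)^{\gamma_1+1}\int_1^{z_0}\Big(\frac{y-1}{c_\nu y^{\nu}}\Big)^{\gamma_1+1}(\mathrm{drivers}_g)^{\bullet}\,dy\\
&+ \frac{1}{\gamma_1+\gamma_2+2}\Big(\frac{c_\nu z_0^{\nu}}{z_0-1}\Big)^{\gamma_1+\gamma_2+2}\int_1^{z_0}\Big(\frac{y-1}{c_\nu y^{\nu}}\Big)^{\gamma_1+\gamma_2+2}(\mathrm{drivers}_g)^{\bullet}\,dy\Big]\\
&+ K_1 s^{-\gamma_1-1} + K_2 s^{-\gamma_1-\gamma_2-2},
\end{aligned}
\tag{5.22}
\]
where $K_1$ and $K_2$ are constants determined in either of the following ways: (1) by the requirement that $e_g$ be a locally analytic function of $s$, or (2) by the evaluation of the first few Taylor coefficients of $e_g$ (for low values of $\nu$) through its combinatorial characterization. Direct calculation shows that either choice of $\gamma_1$ in (5.15) produces formula (2.7) in Theorem 2.5.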
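The bookkeeping of the exponents in (5.15) and (5.18) is easy to get wrong, so it can help to check it with exact arithmetic. The sketch below is illustrative only; the function name `integrating_exponents` is hypothetical. It evaluates $\gamma_1$, $\gamma_2$ and the two powers of $s$ multiplying $K_1$ and $K_2$, which should come out as $(2g-2)/(\nu-1)$ and $(2g-1)/(\nu-1)$ in some order.

```python
from fractions import Fraction


def integrating_exponents(nu, g, branch="+"):
    """Return (gamma1, gamma2, p1, p2), where gamma1 follows (5.15),
    gamma2 follows (5.18), and p1, p2 are the exponents of s that
    multiply K_1 and K_2 in (5.19)."""
    nu, g = Fraction(nu), Fraction(g)
    if branch == "+":
        gamma1 = (3 - nu - 2 * g) / (nu - 1)
    else:
        gamma1 = (2 - nu - 2 * g) / (nu - 1)
    if gamma1 + 1 == 0:
        # happens for g = 1 on the "+" branch: A = (2-2g)(1-2g)/(gamma1+1) is undefined
        raise ValueError("gamma1 = -1: the integrating-factor recipe does not apply")
    numerator = (2 - 2 * g) * (1 - 2 * g) - (nu - 1) ** 2 * (gamma1 + 1) * (gamma1 + 2)
    gamma2 = numerator / ((gamma1 + 1) * (nu - 1) ** 2)
    p1 = -gamma1 - 1            # exponent of s multiplying K_1
    p2 = -gamma1 - gamma2 - 2   # exponent of s multiplying K_2
    return gamma1, gamma2, p1, p2
```

Calling `integrating_exponents(nu, 0, "-")` gives $\gamma_1 = \gamma_2 = -(\nu-2)/(\nu-1)$, and `integrating_exponents(nu, 2, "+")` gives $\gamma_1 = -(\nu+1)/(\nu-1)$ and $\gamma_2 = -\nu/(\nu-1)$, the values used in the examples for $g = 0$ and $g = 2$.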


5.7. Example: g = 0. When $g = 0$ we have
\[
\gamma_1 = \gamma_1^{(-)} = -\frac{\nu-2}{\nu-1},
\]
then
\[
\gamma_2 = -\frac{\nu-2}{\nu-1}.
\]
The driver when $g = 0$ is just $\log(z_0)$. The two integrals in (5.22) may be evaluated separately:
\[
\int_1^{z_0}(y-1)^{1/(\nu-1)}\,y^{-(2\nu-1)/(\nu-1)}\,dy = \int_1^{z_0}(1-y^{-1})^{1/(\nu-1)}\,y^{-2}\,dy = \frac{\nu-1}{\nu}\,(1-z_0^{-1})^{\nu/(\nu-1)}, \tag{5.23}
\]
and
\[
\begin{aligned}
\int_1^{z_0}(y-1)^{2/(\nu-1)}\,y^{-(3\nu-1)/(\nu-1)}\,dy &= \int_1^{z_0}(1-y^{-1})^{2/(\nu-1)}\,y^{-3}\,dy\\
&= \frac{\nu-1}{\nu+1}\,(1-z_0^{-1})^{(\nu+1)/(\nu-1)}\,z_0^{-1} + \frac{\nu-1}{\nu+1}\int_1^{z_0}(1-y^{-1})^{(\nu+1)/(\nu-1)}\,y^{-2}\,dy\\
&= \frac{\nu-1}{\nu+1}\Big(\frac{z_0-1}{z_0^{\nu}}\Big)^{(\nu+1)/(\nu-1)} z_0^{\nu} + \frac{(\nu-1)^2}{2\nu(\nu+1)}\Big(\frac{z_0-1}{z_0^{\nu}}\Big)^{2\nu/(\nu-1)} z_0^{2\nu}.
\end{aligned}
\tag{5.24}
\]
Plugging (5.23) and (5.24) into (5.22) with $g = 0$ yields:
\[
\begin{aligned}
e_0(-s) &= \frac12\log(z_0) - \frac{\nu-1}{\nu}(z_0-1) + \frac12\,\frac{\nu-1}{\nu+1}(z_0-1) + \frac{(\nu-1)^2}{4\nu(\nu+1)}(z_0-1)^2\\
&= \frac12\log(z_0) + \frac{(\nu-1)^2}{4\nu(\nu+1)}\,(z_0-1)\Big(z_0 - \frac{3(\nu+1)}{\nu-1}\Big),
\end{aligned}
\]
where $K_1$ and $K_2$ are both 0 by the analyticity condition. This formula for $e_0$ agrees with the expression for $e_0(s)$ found using the equilibrium measure.

5.8. Example: g = 1. The drivers for $e_1$ are
\[
\frac{z_1}{z_0} - \frac{1}{12}\,\frac{\partial^4}{\partial w^4}\Big[w^2 e_0(-w^{\nu-1}s)\Big],
\]
evaluated at $w = 1$. As with the forcing terms for the $z_g$, this formula can be expressed as a rational function of $z_0$ with poles at $z_0 = 0$ and $z_0 = \nu/(\nu-1)$. We find that $g = 1$ is the exceptional case for the method used in formulas (5.16)-(5.22). We compute it directly from Eq. (5.14) with $g = 1$. The integrating factors are

$(\gamma_1, \gamma_2) = (-1, -\nu/(\nu-1))$, differing from the general choice of integrating factors (5.15) and (5.18). We find the integral formula:
\[
\begin{aligned}
e_1(-s) &= \frac{1}{(\nu-1)^2}\,s^{1/(\nu-1)}\int_0^s\!\!\int_0^{s_1} s_1^{-\nu/(\nu-1)}\,s_2^{-1}\,\mathrm{drivers}_1(s_2)\,ds_2\,ds_1 + K_1 + K_2 s^{1/(\nu-1)}\\
&= \frac{1}{(\nu-1)^2}\,s^{1/(\nu-1)}\int_0^s\!\!\int_{s_2}^{s} s_1^{-\nu/(\nu-1)}\,s_2^{-1}\,\mathrm{drivers}_1(s_2)\,ds_1\,ds_2 + K_1 + K_2 s^{1/(\nu-1)}\\
&= \frac{1}{\nu-1}\Big[s^{1/(\nu-1)}\int_0^s s_2^{-\nu/(\nu-1)}\,\mathrm{drivers}_1\,ds_2 - \int_0^s s_2^{-1}\,\mathrm{drivers}_1\,ds_2\Big] + K_1 + K_2 s^{1/(\nu-1)}\\
&= \frac{1}{\nu-1}\Big[\Big(\frac{z_0(s)-1}{c_\nu z_0(s)^{\nu}}\Big)^{1/(\nu-1)}\int_1^{z_0(s)}\Big(\frac{c_\nu y^{\nu}}{y-1}\Big)^{\nu/(\nu-1)}\frac{\nu-(\nu-1)y}{c_\nu y^{\nu+1}}\,\mathrm{drivers}_1\,dy\\
&\qquad\qquad - \int_1^{z_0(s)}\frac{\nu-(\nu-1)y}{y(y-1)}\,\mathrm{drivers}_1\,dy\Big] + K_1 + K_2 s^{1/(\nu-1)}\\
&= -\frac{1}{12}\log\big(\nu + (1-\nu)z_0\big) + K_1 + K_2 s^{1/(\nu-1)},
\end{aligned}
\]
where we have switched the order of integration, computed the $s_1$ integral, and changed coordinates to integrals with respect to $y = z_0(s_2)$. The final step is a direct computation and is left to the reader.

When $t = 0$, $\log(Z_N(t)/Z_N(0)) = 0$, therefore $e_1(0) = 0$ and so $K_1 = 0$. We notice that if $\nu \neq 2$ then $K_2 = 0$ by the analyticity condition; otherwise we have to determine $K_2$. The computation we will carry out in Sect. 5.10 shows that when $\nu = 2$, $e_1(-s) = s + O(s^2)$. Our expression for $e_1(-s)$ has the expansion (when $\nu = 2$)
\[
-\frac{1}{12}\log\big[2 - z_0(s)\big] + K_2 s = (1 + K_2)s + O(s^2),
\]
therefore $K_2 = 0$. This proves formula (2.8).

5.9. Example: g = 2. The drivers for $e_2(s)$ are
\[
\frac{z_2}{z_0} - \frac12\,\frac{z_1^2}{z_0^2} - \frac{1}{12}\,\frac{\partial^4}{\partial w^4}\Big[e_1(-w^{\nu-1}s)\Big] - \frac{1}{360}\,\frac{\partial^6}{\partial w^6}\Big[w^2 e_0(-w^{\nu-1}s)\Big],
\]
evaluated at $w = 1$. As with the forcing terms for the $z_g$ and the drivers for $e_1$, this expression can be written as a rational function of $z_0$ with poles at $z_0 = 0$ and $z_0 = \nu/(\nu-1)$. For $g = 2$ we find that
\[
\gamma_1 = \gamma_1^{(+)} = -\frac{\nu+1}{\nu-1}
\]

and
\[
\gamma_2 = -\frac{\nu}{\nu-1}.
\]
We compute the integrals in Eq. (5.22),
\[
\begin{aligned}
e_2(-s) = \frac{1}{2880}\,(\nu-1)(z_0-1)\,(\nu-(\nu-1)z_0)^{-5}\,\Big[\,&(-\nu^3 + 5\nu^4 + 8\nu^5)\\
&+ (-\nu^2 + 41\nu^3 - 24\nu^4 - 16\nu^5)\,z_0\\
&+ (44\nu - 89\nu^2 + 54\nu^3 - 17\nu^4 + 8\nu^5)\,z_0^2\\
&+ (-12 - 12\nu + 108\nu^2 - 132\nu^3 + 48\nu^4)\,z_0^3\\
&+ (-12 + 48\nu - 72\nu^2 + 48\nu^3 - 12\nu^4)\,z_0^4\Big]\\
&+ K_1 s^{2/(\nu-1)} + K_2 s^{3/(\nu-1)}.
\end{aligned}
\]

We notice that $K_1$ and $K_2$ will be zero by the analyticity condition unless $\nu = 2$, 3, or 4, in which case they will have to be determined by some other means such as that illustrated in the next subsection.

5.10. Evaluating the constants of integration. We will now outline a method for determining the constants of integration in formulas (2.7) and (2.8) of Theorem 2.5. We know, from [16], that $e_g(-s)$ is analytic in $s$ in a neighborhood of $s = 0$; therefore for many values of $\gamma_1$ both $K_1$ and $K_2$ will vanish to preserve analyticity. However, there are some values for which they will not: values of $\nu$ and $g$ which conspire to make $(2g-2)/(\nu-1)$ or $(2g-1)/(\nu-1)$ positive integers. To find these constants, when necessary, one may rely on the combinatorial interpretation of the Taylor coefficients of $e_g$ described in Theorem 1.3. Here we outline the method that we have used to evaluate these constants for $g \le 3$.

The set of maps $D$ having vertex set $K_0(D)$ of fixed cardinality at specified vertices can be placed in 1-1 correspondence with a class of subgroups of the permutation group $S_d$ where $d$, twice the number of edges, is determined by the cardinality and valences of $K_0(D)$ [4]. For simplicity, and also because it is the case of relevance for us, we take $K_0(D)$ to consist of $n$ vertices each of valence $2\nu$. For this class $d = 2\nu n$, which is the cardinality of the set of darts of $K(D)$. A dart is an oriented edge. Equivalently, we may define the abstract set of darts associated to this class of maps as $\Omega = \Omega_1 \cup \Omega_2$, where
\[
\Omega_1 = \{(v, e) : v \text{ is a vertex and } e \text{ is an edge with two distinct vertices, one of them being } v\},
\]
and
\[
\Omega_2 = \{(v, e, \pm) : v \text{ is a vertex and } e \text{ is an edge with a single vertex } v\}.
\]
The element $(v, e) \in \Omega_1$ represents the dart based at $v$ and going along $e$. The element $(v, e, \pm) \in \Omega_2$ represents the dart based at $v$ going along $e$ in the counterclockwise (resp. clockwise) orientation. For each edge there are two darts, therefore $|\Omega| = d$ and we can think of $S_d$ as acting by permutations on the set $\Omega$.

Given a map $(K(D), [\iota])$ with $K_0(D)$ specified as above, we define a subgroup of $S_d$ generated by two permutations $\sigma, \tau$. The orientation on $X$ induces (via $[\iota]$) a cyclic ordering on the darts attached to each vertex; the first permutation $\sigma$ is given by this action. Explicitly, $\sigma$ maps the element $(v, e) \in \Omega_1$ to the element $(v, \tilde e) \in \Omega_1$ or $(v, \tilde e, \pm) \in \Omega_2$, where $\tilde e$ is the edge counterclockwise in the orientation at $v$ from $e$. Likewise $\sigma$ maps the element $(v, e, \pm) \in \Omega_2$ to the element $(v, \tilde e) \in \Omega_1$ or $(v, \tilde e, \pm) \in \Omega_2$, where $\tilde e$ is the edge counterclockwise in the orientation at $v$ from $e$. The second permutation $\tau$ is given explicitly as the permutation which acts on $\Omega_1$ by sending $(v, e)$ to $(\tilde v, e)$, where $\tilde v$ is the other endpoint of $e$; $\tau$ acts on $\Omega_2$ by sending $(v, e, \pm)$ to $(v, e, \mp)$. From these descriptions one sees that $\sigma$ is a product of disjoint $2\nu$-cycles, with each cycle corresponding to a unique vertex in $K_0(D)$, and that $\tau$ is a product of disjoint 2-cycles, with each 2-cycle corresponding to a unique edge in $K_1(D)$.

Conversely, given a subgroup presented as above which also has the property that the group acts transitively on $\Omega$, one may construct a unique map. The transitivity condition ensures that the underlying map is connected. The permutation $\tau$ determines how the vertices are connected through edges to define the graph. The permutation $\sigma$ gives the orientation of the edges about each vertex. Together these two permutations determine the surface $X$ and the embedding class $[\iota]$.

The punchline is that we now have an algorithm for computing the coefficient of $s^n/n!$ in $e_g(-s)$ for finite $g$, $\nu$, and $n$: Let $d = 2\nu n$. Fix $\sigma$ to be a permutation formed by a disjoint product of $n$ $2\nu$-cycles in $S_d$. Then run over each disjoint product $\tau$ of $\nu n$ 2-cycles in $S_d$. Check if $(\sigma, \tau)$ is connected (by verifying that the orbit $\langle\sigma, \tau\rangle \cdot 1$ is all $d$ letters). If $(\sigma, \tau)$ is connected, compute the genus by Euler's formula $(1-\nu)n + F = \chi = 2 - 2g$, where $F$ (the number of faces) is given by the number of cycles in $\sigma \circ \tau$.
The details of this calculation together with some examples and a generalization to the case of unoriented maps are in a forthcoming paper [28]. The algorithm gives the following values for the coefficient of $s^j/j!$ in $e_g(-s)$:

g    ν    j    coefficient
1    2    0    0
1    2    1    1
2    2    2    0
2    2    3    1440
2    3    1    0
2    4    1    21
3    2    4    0
3    2    5    58060800
3    3    2    0
3    5    1    0
3    6    1    1485

Then K 1 and K 2 are chosen so that the j th coefficient of eg (−s) matches these numbers.
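The enumeration described in this subsection can be sketched directly: fix $\sigma$, run over all fixed-point-free involutions $\tau$, keep the transitive pairs, and read the genus from the number of cycles of $\sigma \circ \tau$. The following brute-force sketch is only practical for very small $d = 2\nu n$, and it is not the implementation of [28]; all function names are hypothetical.

```python
from collections import Counter


def perfect_matchings(items):
    """Yield every pairing of `items` as a dict sending each element to its partner."""
    if not items:
        yield {}
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in perfect_matchings(remaining):
            tau = dict(sub)
            tau[first] = partner
            tau[partner] = first
            yield tau


def count_cycles(perm):
    """Number of cycles of a permutation given as a dict."""
    seen, cycles = set(), 0
    for start in perm:
        if start not in seen:
            cycles += 1
            k = start
            while k not in seen:
                seen.add(k)
                k = perm[k]
    return cycles


def is_transitive(sigma, tau):
    """True if the group generated by sigma and tau moves dart 0 to every dart."""
    reached, stack = {0}, [0]
    while stack:
        k = stack.pop()
        for nxt in (sigma[k], tau[k]):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return len(reached) == len(sigma)


def count_maps_by_genus(nu, n):
    """Count the connected pairs (sigma, tau), for one fixed sigma, by genus."""
    d = 2 * nu * n
    # sigma rotates the 2*nu darts of each vertex; vertex v owns darts 2*nu*v .. 2*nu*v + 2*nu - 1
    sigma = {k: (k // (2 * nu)) * (2 * nu) + (k + 1) % (2 * nu) for k in range(d)}
    counts = Counter()
    for tau in perfect_matchings(list(range(d))):
        if not is_transitive(sigma, tau):
            continue
        faces = count_cycles({k: sigma[tau[k]] for k in range(d)})
        chi = (1 - nu) * n + faces  # Euler's formula: chi = 2 - 2g
        counts[(2 - chi) // 2] += 1
    return counts
```

For $\nu = 2$ and $n = 1$ there are three pairings of the four darts: two are planar and one has genus 1, so `count_maps_by_genus(2, 1)` returns `Counter({0: 2, 1: 1})`. The genus 1 count agrees with the tabulated value for $g = 1$, $\nu = 2$, $j = 1$.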

5.11. Multiple times. The constructions carried out in this paper extend to multiple even time parameters; this is a reflection of the commutativity of the underlying flows in the Toda Lattice hierarchy. However, the expressions found are more complicated and are less easily reduced to closed form than the monic time cases we have considered thus far. For brevity we will show how our results extend to two times, $t_{2\nu_1}$ and $t_{2\nu_2}$, where $\nu_1$ and $\nu_2$ are positive integers. Theorem 2.2 becomes

Theorem 5.3. In the limit as $k \to \infty$, $b_k(\xi_{2\nu_1}, \xi_{2\nu_2})^2$ has a valid asymptotic expansion of the form
\[
b_k(\xi_{2\nu_1}, \xi_{2\nu_2})^2 \sim k\Big(z_0(s_{2\nu_1}, s_{2\nu_2}) + \frac{1}{k^2}\,z_1(s_{2\nu_1}, s_{2\nu_2}) + \frac{1}{k^4}\,z_2(s_{2\nu_1}, s_{2\nu_2}) + \cdots\Big),
\]
where $s_{2\nu_i} = 2k^{\nu_i-1}\xi_{2\nu_i}$. The terms of this expansion are determined by the following partial differential scheme:
\[
f_{s_{2\nu_i}} = c_{\nu_i} f^{\nu_i} f_w + \frac{1}{k^2}\,F_1^{(\nu_i)}(f, f_w, f_{ww}, f_{www}) + \cdots + \frac{1}{k^{2g}}\,F_g^{(\nu_i)}(f, f_w, f_{w^{(2)}}, \ldots, f_{w^{(2g+1)}}) + \cdots \quad\text{evaluated at } w = 1;
\]
where
\[
f(s_{2\nu_1}, s_{2\nu_2}; w) = f_0(s_{2\nu_1}, s_{2\nu_2}; w) + \frac{1}{k^2}\,f_1(s_{2\nu_1}, s_{2\nu_2}; w) + \frac{1}{k^4}\,f_2(s_{2\nu_1}, s_{2\nu_2}; w) + \cdots,
\]
and $f_g$ has the form
\[
f_g(s_{2\nu_1}, s_{2\nu_2}; w) = w^{1-2g}\,z_g(w^{\nu_1-1}s_{2\nu_1}, w^{\nu_2-1}s_{2\nu_2}).
\]
This partial differential scheme yields a hierarchy of partial differential equations in the same manner as in Theorem 2.2. The functionals $F_g^{(\nu_i)}$ are identical to the ones found in Theorem 2.2.

The $z_g(s_{2\nu_1}, s_{2\nu_2})$ play the role of auxiliary functions in computing $e_g(-s_{2\nu_1}, -s_{2\nu_2})$. The differential equation determining $e_g(-s_{2\nu_1}, -s_{2\nu_2})$ remains largely unchanged from (5.13) because the construction in Sect. 4.2 applies for the multi-time case as well:

Theorem 5.4. The $g$th equation in the hierarchy of equations governing $e_g(t_{2\nu_1}, t_{2\nu_2})$ is
\[
\begin{aligned}
\frac{\partial^2}{\partial w^2}\Big[w^{2-2g}\,e_g(-w^{\nu_1-1}s_{2\nu_1}, -w^{\nu_2-1}s_{2\nu_2})\Big]\Big|_{w=1}
={}& -\sum_{n=1}^{g}\frac{2}{(2n+2)!}\,\frac{\partial^{2n+2}}{\partial w^{2n+2}}\Big[w^{2-2(g-n)}\,e_{g-n}(-w^{\nu_1-1}s_{2\nu_1}, -w^{\nu_2-1}s_{2\nu_2})\Big]\\
&+ \text{the } k^{-2g} \text{ term of } \log\Big(\sum_{n=0}^{\infty}\frac{1}{k^{2n}}\,z_n(s_{2\nu_1}, s_{2\nu_2})\Big).
\end{aligned}
\tag{5.25}
\]
Equation (5.25) determines $e_g(-s_{2\nu_1}, -s_{2\nu_2})$ from a second order partial differential equation for $e_g$ with forcing terms depending on $e_n$, $n < g$, $z_n$, $n \le g$, and their derivatives with respect to $s_{2\nu_1}$ and $s_{2\nu_2}$.

Theorem 5.5. The RHS of (5.25) will henceforth be denoted by $\mathrm{drivers}_g(s_{2\nu_1}, s_{2\nu_2})$. The solution of (5.25) may be represented as
\[
\begin{aligned}
e_g(-s_{2\nu_1}, -s_{2\nu_2}) = (\nu_1-1)^{-1}\Big[&s_{2\nu_1}^{-(1-2g)/(\nu_1-1)}\int_0^{s_{2\nu_1}}\big(\hat s_{2\nu_1}\big)^{(2-\nu_1-2g)/(\nu_1-1)}\,\mathrm{drivers}_g\big(\hat s_{2\nu_1}, u\,\hat s_{2\nu_1}^{(\nu_2-1)/(\nu_1-1)}\big)\,d\hat s_{2\nu_1}\\
&- s_{2\nu_1}^{-(2-2g)/(\nu_1-1)}\int_0^{s_{2\nu_1}}\big(\hat s_{2\nu_1}\big)^{(3-\nu_1-2g)/(\nu_1-1)}\,\mathrm{drivers}_g\big(\hat s_{2\nu_1}, u\,\hat s_{2\nu_1}^{(\nu_2-1)/(\nu_1-1)}\big)\,d\hat s_{2\nu_1}\Big]\\
&+ K_1(u)\,s_{2\nu_1}^{(2g-2)/(\nu_1-1)} + K_2(u)\,s_{2\nu_1}^{(2g-1)/(\nu_1-1)},
\end{aligned}
\tag{5.26}
\]
where $u = s_{2\nu_2}\,s_{2\nu_1}^{-(\nu_2-1)/(\nu_1-1)}$, and where $K_1(u)$ and $K_2(u)$ are analytic functions of $u$ in a neighborhood of $u = 0$.

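To prove Theorem 5.5, we first show that $z_0(s_{2\nu_1}, s_{2\nu_2})$ is given implicitly as the solution of an algebraic equation with coefficients depending on $s_{2\nu_1}$ and $s_{2\nu_2}$. Then we demonstrate that the $z_g(s_{2\nu_1}, s_{2\nu_2})$ are functions of $z_0(s_{2\nu_1}, s_{2\nu_2})$ and $z_0(s_{2\nu_1}, 0)$. Finally we compute $e_g(-s_{2\nu_1}, -s_{2\nu_2})$ by integrating the partial differential equation in Theorem 5.4.

The first order terms in the hierarchy of Theorem 5.3 are the pair of equations
\[
\begin{aligned}
\frac{dz_0}{ds_{2\nu_1}} &= c_{\nu_1} z_0^{\nu_1}\Big(z_0 + (\nu_1-1)s_{2\nu_1}\frac{dz_0}{ds_{2\nu_1}} + (\nu_2-1)s_{2\nu_2}\frac{dz_0}{ds_{2\nu_2}}\Big),\\
\frac{dz_0}{ds_{2\nu_2}} &= c_{\nu_2} z_0^{\nu_2}\Big(z_0 + (\nu_1-1)s_{2\nu_1}\frac{dz_0}{ds_{2\nu_1}} + (\nu_2-1)s_{2\nu_2}\frac{dz_0}{ds_{2\nu_2}}\Big);
\end{aligned}
\]
or in vector notation
\[
M\begin{pmatrix}\dfrac{dz_0}{ds_{2\nu_1}}\\[2mm] \dfrac{dz_0}{ds_{2\nu_2}}\end{pmatrix} = \begin{pmatrix}c_{\nu_1} z_0^{\nu_1+1}\\ c_{\nu_2} z_0^{\nu_2+1}\end{pmatrix}, \tag{5.27}
\]
where
\[
M = \begin{pmatrix}1 - c_{\nu_1}(\nu_1-1)s_{2\nu_1}z_0^{\nu_1} & -c_{\nu_1}(\nu_2-1)s_{2\nu_2}z_0^{\nu_1}\\ -c_{\nu_2}(\nu_1-1)s_{2\nu_1}z_0^{\nu_2} & 1 - c_{\nu_2}(\nu_2-1)s_{2\nu_2}z_0^{\nu_2}\end{pmatrix}.
\]
Invert $M$ in (5.27) to find a pair of ordinary differential equations for $z_0$:
\[
\frac{dz_0}{ds_{2\nu_i}} = -\frac{c_{\nu_i} z_0^{\nu_i+1}}{-1 + c_{\nu_1}(\nu_1-1)s_{2\nu_1}z_0^{\nu_1} + c_{\nu_2}(\nu_2-1)s_{2\nu_2}z_0^{\nu_2}},
\]
which, with initial condition $z_0(0, 0) = 1$, is satisfied implicitly by the solution of
\[
1 = z_0 - c_{\nu_1}s_{2\nu_1}z_0^{\nu_1} - c_{\nu_2}s_{2\nu_2}z_0^{\nu_2} \tag{5.28}
\]
which is regular at $(s_{2\nu_1}, s_{2\nu_2}) = (0, 0)$.

The $g$th pair of equations in the hierarchy of partial differential equations in Theorem 5.3 is
\[
M\begin{pmatrix}\dfrac{dz_g}{ds_{2\nu_1}}\\[2mm] \dfrac{dz_g}{ds_{2\nu_2}}\end{pmatrix} = z_g\,G_g + F_g; \tag{5.29}
\]
where
\[
G_g = \begin{pmatrix}c_{\nu_1}(\nu_1+1-2g)z_0^{\nu_1} + c_{\nu_1}\nu_1(\nu_1-1)s_{2\nu_1}z_0^{\nu_1-1}\dfrac{dz_0}{ds_{2\nu_1}} + c_{\nu_1}\nu_1(\nu_2-1)s_{2\nu_2}z_0^{\nu_1-1}\dfrac{dz_0}{ds_{2\nu_2}}\\[3mm] c_{\nu_2}(\nu_2+1-2g)z_0^{\nu_2} + c_{\nu_2}\nu_2(\nu_1-1)s_{2\nu_1}z_0^{\nu_2-1}\dfrac{dz_0}{ds_{2\nu_1}} + c_{\nu_2}\nu_2(\nu_2-1)s_{2\nu_2}z_0^{\nu_2-1}\dfrac{dz_0}{ds_{2\nu_2}}\end{pmatrix},
\qquad
F_g = \begin{pmatrix}\mathrm{Forcing}_g^{(\nu_1)}\\ \mathrm{Forcing}_g^{(\nu_2)}\end{pmatrix}\Bigg|_{w=1},
\]
and where $\mathrm{Forcing}_g^{(\nu_i)}$ is given by (2.5) with
\[
f_g(s_{2\nu_1}, s_{2\nu_2}; w) = w^{1-2g}\,z_g(w^{\nu_1-1}s_{2\nu_1}, w^{\nu_2-1}s_{2\nu_2}).
\]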


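It is useful at this stage to make the following change of variables: let $z_0(s_{2\nu_1}, 0) = y_0$ and denote $z_0(s_{2\nu_1}, s_{2\nu_2})$ as $z_0$. The constraint equation (5.28) becomes:
\[
1 = y_0 - c_{\nu_1}s_{2\nu_1}y_0^{\nu_1}, \tag{5.30}
\]
\[
1 = z_0 - c_{\nu_1}s_{2\nu_1}z_0^{\nu_1} - c_{\nu_2}s_{2\nu_2}z_0^{\nu_2}, \tag{5.31}
\]
and one can solve Eqs. (5.30) and (5.31) for $(s_{2\nu_1}, s_{2\nu_2})$ as functions of $(y_0, z_0)$,
\[
s_{2\nu_1} = (y_0-1)\,(c_{\nu_1}y_0^{\nu_1})^{-1}, \tag{5.32}
\]
\[
s_{2\nu_2} = \big(y_0^{\nu_1}(z_0-1) - z_0^{\nu_1}(y_0-1)\big)\,(c_{\nu_2}y_0^{\nu_1}z_0^{\nu_2})^{-1}, \tag{5.33}
\]
and then differentiate Eqs. (5.30) and (5.31) with respect to $s_{2\nu_1}$ and $s_{2\nu_2}$ and solve for $\frac{dy_0}{ds_{2\nu_1}}$, $\frac{dz_0}{ds_{2\nu_1}}$ and $\frac{dz_0}{ds_{2\nu_2}}$ as functions of $(y_0, z_0)$:
\[
\begin{aligned}
\frac{dy_0}{ds_{2\nu_1}} &= c_{\nu_1}y_0^{\nu_1+1}\,(\nu_1 - (\nu_1-1)y_0)^{-1},\\
\frac{dy_0}{ds_{2\nu_2}} &= 0,\\
\frac{dz_0}{ds_{2\nu_1}} &= c_{\nu_1}z_0^{\nu_1}\,\big(1 - \nu_1c_{\nu_1}s_{2\nu_1}z_0^{\nu_1-1} - \nu_2c_{\nu_2}s_{2\nu_2}z_0^{\nu_2-1}\big)^{-1},\\
\frac{dz_0}{ds_{2\nu_2}} &= c_{\nu_2}z_0^{\nu_2}\,\big(1 - \nu_1c_{\nu_1}s_{2\nu_1}z_0^{\nu_1-1} - \nu_2c_{\nu_2}s_{2\nu_2}z_0^{\nu_2-1}\big)^{-1}.
\end{aligned}
\]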
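The inversion (5.32)-(5.33) can be checked numerically by substituting it back into the constraints (5.30)-(5.31). The sketch below does this with exact rational arithmetic. The function names are hypothetical, and the constants `c1`, `c2` stand for $c_{\nu_1}$, $c_{\nu_2}$, whose values are defined elsewhere in the paper; any nonzero numbers serve for the check.

```python
from fractions import Fraction


def times_from_endpoints(y0, z0, nu1, nu2, c1, c2):
    """Evaluate (5.32)-(5.33): the two times as functions of (y0, z0)."""
    y0, z0 = Fraction(y0), Fraction(z0)
    s1 = (y0 - 1) / (c1 * y0 ** nu1)
    s2 = (y0 ** nu1 * (z0 - 1) - z0 ** nu1 * (y0 - 1)) / (c2 * y0 ** nu1 * z0 ** nu2)
    return s1, s2


def constraint_residuals(y0, z0, nu1, nu2, c1, c2):
    """Return (right side minus left side) of (5.30) and of (5.31);
    both vanish when (5.32)-(5.33) hold."""
    y0, z0 = Fraction(y0), Fraction(z0)
    s1, s2 = times_from_endpoints(y0, z0, nu1, nu2, c1, c2)
    r30 = y0 - c1 * s1 * y0 ** nu1 - 1
    r31 = z0 - c1 * s1 * z0 ** nu1 - c2 * s2 * z0 ** nu2 - 1
    return r30, r31
```

Setting $z_0 = y_0$ in `times_from_endpoints` returns $s_{2\nu_2} = 0$, consistent with the definition $y_0 = z_0(s_{2\nu_1}, 0)$.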

Changing variables in the system of differential equations (5.29) to differential equations for $z_g$ as a function of $(y_0, z_0)$, where $y_0$ evolves from 1 to $z_0(s_{2\nu_1}, 0)$ and $z_0$ from $z_0(s_{2\nu_1}, 0)$ to $z_0(s_{2\nu_1}, s_{2\nu_2})$, Eq. (5.29) becomes
\[
MC\begin{pmatrix}\dfrac{dz_g}{dy_0}\\[2mm] \dfrac{dz_g}{dz_0}\end{pmatrix} = z_g\,G_g + F_g,
\]
where
\[
C = \begin{pmatrix}c_{\nu_1}y_0^{\nu_1+1}(\nu_1 - (\nu_1-1)y_0)^{-1} & c_{\nu_1}z_0^{\nu_1}\big(1 - \nu_1c_{\nu_1}s_{2\nu_1}z_0^{\nu_1-1} - \nu_2c_{\nu_2}s_{2\nu_2}z_0^{\nu_2-1}\big)^{-1}\\ 0 & c_{\nu_2}z_0^{\nu_2}\big(1 - \nu_1c_{\nu_1}s_{2\nu_1}z_0^{\nu_1-1} - \nu_2c_{\nu_2}s_{2\nu_2}z_0^{\nu_2-1}\big)^{-1}\end{pmatrix},
\]
and $s_{2\nu_1}$ and $s_{2\nu_2}$ are given by Eqs. (5.32) and (5.33). If $MC$ is invertible then $z_g$ can be found by integrating the differential equation:
\[
\frac{dz_g}{dz_0} = z_g\big[C^{-1}M^{-1}G_g\big]_2 + \big[C^{-1}M^{-1}F_g\big]_2,
\]
where $[V]_2$ denotes the second component of the vector $V$. The initial condition is that $z_g(y_0, z_0 = y_0)$ agrees with Theorem 2.3 with $\nu = \nu_1$.

We will now prove Theorem 5.5. Start by expanding the LHS of (5.25):
\[
\begin{aligned}
&(2-2g)(1-2g)\,e_g - (\nu_1-1)(\nu_1+2-4g)\,s_{2\nu_1}e_{g\,s_{2\nu_1}} - (\nu_2-1)(\nu_2+2-4g)\,s_{2\nu_2}e_{g\,s_{2\nu_2}}\\
&\quad + (\nu_1-1)^2 s_{2\nu_1}^2 e_{g\,s_{2\nu_1}s_{2\nu_1}} + 2(\nu_1-1)(\nu_2-1)\,s_{2\nu_1}s_{2\nu_2}e_{g\,s_{2\nu_1}s_{2\nu_2}} + (\nu_2-1)^2 s_{2\nu_2}^2 e_{g\,s_{2\nu_2}s_{2\nu_2}}\\
&\qquad = \mathrm{drivers}_g(s_{2\nu_1}, s_{2\nu_2}).
\end{aligned}
\tag{5.34}
\]


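Changing variables to
\[
(\hat s_{2\nu_1}, u) = \big(s_{2\nu_1},\ s_{2\nu_2}\,s_{2\nu_1}^{-(\nu_2-1)/(\nu_1-1)}\big)
\]
induces
\[
\begin{pmatrix}\hat s_{2\nu_1}\dfrac{\partial}{\partial\hat s_{2\nu_1}}\\[2mm] u\,\dfrac{\partial}{\partial u}\end{pmatrix} = \begin{pmatrix}s_{2\nu_1}\dfrac{\partial}{\partial s_{2\nu_1}} + \dfrac{\nu_2-1}{\nu_1-1}\,s_{2\nu_2}\dfrac{\partial}{\partial s_{2\nu_2}}\\[2mm] s_{2\nu_2}\dfrac{\partial}{\partial s_{2\nu_2}}\end{pmatrix}.
\]
Equation (5.34) may be rewritten in these new variables as:
\[
(\nu_1-1)^2\Big(\hat s_{2\nu_1}\frac{\partial}{\partial\hat s_{2\nu_1}} + \frac{1-2g}{\nu_1-1}\Big)\Big(\hat s_{2\nu_1}\frac{\partial}{\partial\hat s_{2\nu_1}} + \frac{2-2g}{\nu_1-1}\Big)e_g = \mathrm{drivers}_g\big(\hat s_{2\nu_1}, u\,\hat s_{2\nu_1}^{(\nu_2-1)/(\nu_1-1)}\big), \tag{5.35}
\]
whose LHS is identical to the differential equation (5.14) for $e_g(-s_{2\nu_1})$. Equation (5.35) can be integrated to give
\[
\begin{aligned}
e_g(-s_{2\nu_1}, -s_{2\nu_2}) ={}& (\nu_1-1)^{-2}\,s_{2\nu_1}^{-(2-2g)/(\nu_1-1)}\int_0^{s_{2\nu_1}}\!\!\int_0^{s'_{2\nu_1}}\big(s'_{2\nu_1}\big)^{(2-\nu_1)/(\nu_1-1)}\big(s''_{2\nu_1}\big)^{(2-\nu_1-2g)/(\nu_1-1)}\\
&\qquad\cdot\ \mathrm{drivers}_g\big(s''_{2\nu_1}, u\,(s''_{2\nu_1})^{(\nu_2-1)/(\nu_1-1)}\big)\,ds''_{2\nu_1}\,ds'_{2\nu_1}\\
&+ K_1(u)\,s_{2\nu_1}^{(2g-2)/(\nu_1-1)} + K_2(u)\,s_{2\nu_1}^{(2g-1)/(\nu_1-1)},
\end{aligned}
\tag{5.36}
\]
where $u = s_{2\nu_2}\,s_{2\nu_1}^{-(\nu_2-1)/(\nu_1-1)}$ and $K_1(u)$ and $K_2(u)$ are functions of $u$ only. Switch the order of integration in (5.36),
\[
\begin{aligned}
e_g(-s_{2\nu_1}, -s_{2\nu_2}) ={}& (\nu_1-1)^{-2}\,s_{2\nu_1}^{-(2-2g)/(\nu_1-1)}\int_0^{s_{2\nu_1}}\!\!\int_{s''_{2\nu_1}}^{s_{2\nu_1}}\big(s'_{2\nu_1}\big)^{(2-\nu_1)/(\nu_1-1)}\big(s''_{2\nu_1}\big)^{(2-\nu_1-2g)/(\nu_1-1)}\\
&\qquad\cdot\ \mathrm{drivers}_g\big(s''_{2\nu_1}, u\,(s''_{2\nu_1})^{(\nu_2-1)/(\nu_1-1)}\big)\,ds'_{2\nu_1}\,ds''_{2\nu_1}\\
&+ K_1(u)\,s_{2\nu_1}^{(2g-2)/(\nu_1-1)} + K_2(u)\,s_{2\nu_1}^{(2g-1)/(\nu_1-1)},
\end{aligned}
\tag{5.37}
\]
and carry out the $s'_{2\nu_1}$ integral in (5.37) to find
\[
\begin{aligned}
e_g(-s_{2\nu_1}, -s_{2\nu_2}) = (\nu_1-1)^{-1}\Big[&s_{2\nu_1}^{-(1-2g)/(\nu_1-1)}\int_0^{s_{2\nu_1}}\big(s''_{2\nu_1}\big)^{(2-\nu_1-2g)/(\nu_1-1)}\,\mathrm{drivers}_g\big(s''_{2\nu_1}, u\,(s''_{2\nu_1})^{(\nu_2-1)/(\nu_1-1)}\big)\,ds''_{2\nu_1}\\
&- s_{2\nu_1}^{-(2-2g)/(\nu_1-1)}\int_0^{s_{2\nu_1}}\big(s''_{2\nu_1}\big)^{(3-\nu_1-2g)/(\nu_1-1)}\,\mathrm{drivers}_g\big(s''_{2\nu_1}, u\,(s''_{2\nu_1})^{(\nu_2-1)/(\nu_1-1)}\big)\,ds''_{2\nu_1}\Big]\\
&+ K_1(u)\,s_{2\nu_1}^{(2g-2)/(\nu_1-1)} + K_2(u)\,s_{2\nu_1}^{(2g-1)/(\nu_1-1)}.
\end{aligned}
\]
This proves formula (5.26) of Theorem 5.5.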


6. Conclusions

In this paper we have made a detailed study of the coefficients $e_g(x, t)$ in the asymptotic expansion of the logarithm of the random matrix partition function (1.5) for a single non-trivial time parameter, $t = t_{2\nu}$, and $x = k/N$ near 1. These analytic coefficients are generating functions for the enumeration of $g$-maps. In particular,

(1) We derived a hierarchy of differential equations for these generating functions.

(2) We described a procedure for solving these differential equations.

(3) Along the way to deriving the hierarchy we also derived a hierarchy of forced Burgers type equations for the auxiliary coefficients, which we denote $z_g(s)$; these are of combinatorial interest in their own right in that
\[
\frac{\partial^n}{\partial s^n}\,z_g(0) = \#\{\text{two-legged } g\text{-maps with } n\ 2\nu\text{-valent vertices}\}.
\]
A leg is an edge emerging from a univalent vertex, so that the leg is the only edge incident to that vertex.

(4) We have calculated explicit formulae for $e_g(-s)$ for small values of $g$. The $s$-derivatives, of sufficiently large order, turn out to be rational functions of the endpoints squared, $\beta^2 = 4z_0$. The endpoints referred to here are the endpoints of the support of the associated equilibrium measure. Moreover, the coefficients appearing in these expressions are rational constants.

The random matrix partition functions and their relations to graphical enumeration through diagrammatic expansion offer powerful tools for bringing methods of complex analysis to bear on fundamental questions in diverse fields such as statistical mechanics [13] and combinatorics [31,5]. However, real progress in this regard has been hampered by the fact that, until now, these connections have been based on formal procedures and conjectures. The results in this paper place many of these connections, for the large $N$ expansion of the UE partition function, on a rigorous foundation.
Moreover, the methods presented here, based on continuum limits of the Toda Lattice hierarchy, have yielded novel and effective procedures for explicitly calculating the relevant asymptotic generating functions. We hope that these results will help to spur renewed application of complex analytic methods in problems of statistical physics and combinatorics. Some results in this direction will appear in future work [27,28]. These analytical tools also point the way to novel combinatorial results. The calculations mentioned in the last item of the list above strongly motivate the following conjecture. Conjecture. The s-derivatives of sufficiently high order of the generating functions, eg , for fixed genus with arbitrary vertex valence number, can be expressed as rational functions of the endpoints of support of the equilibrium measure. Moreover, the coefficients appearing in these expressions are rational constants. As far as we know, such a conjecture has not appeared in the literature on combinatorics of maps. Another manuscript in preparation [17] rigorously establishes a nonlocal representation for the eg (1, t). In the physics literature such representations are referred to as loop equations [2]. We expect this to provide elegant methods that, together with the present manuscript, could enable us to prove the previous conjecture. Moreover, the


generalizations of this loop equation together with the results of Sect. 5.11 can help to guide the characterization of eg (t) as a function of multiple times. The derivation of explicit closed form expressions for eg (t) will have relevance to a number of other current research programs in the statistical mechanics of combinatorial analysis such as random graphs, random tilings and polynuclear growth models. Acknowledgements. V. U. Pierce would like to thank The University of Arizona, Brandeis University, Midwestern State University, and The Ohio State University for their hospitality and support; and Mark Adler and Yuji Kodama for many helpful discussions and their encouragement.

Appendix A

In this appendix we show how the expansion (3.17) for the integral (3.16) was derived. Define
\[
d_j = \int_\beta^\lambda s^{2j}\sqrt{s^2-\beta^2}\,ds, \tag{A.1}
\]
then (3.16) is
\[
\int_\beta^\lambda h(s)\sqrt{s^2-\beta^2}\,ds = \frac{1}{x}\Big(d_0 + \sum_{j=0}^{\nu-1} h_j d_j\Big).
\]

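Integration by parts ($u = s^{2j-1}$ and $dv = s\sqrt{s^2-\beta^2}\,ds$) of $d_j$ gives the equation
\[
d_j = \frac13\,\lambda^{2j-1}(\lambda^2-\beta^2)\sqrt{\lambda^2-\beta^2} - \frac{2j-1}{3}\int_\beta^\lambda s^{2j-2}(s^2-\beta^2)\sqrt{s^2-\beta^2}\,ds. \tag{A.2}
\]
Equation (A.2) produces a recursion relation for $d_j$:
\[
d_j = \frac12\,\frac{1}{j+1}\,\lambda^{2j-1}(\lambda^2-\beta^2)\sqrt{\lambda^2-\beta^2} + \frac12\,\frac{2j-1}{j+1}\,\beta^2 d_{j-1}. \tag{A.3}
\]
The initial condition of this recursion is
\[
d_0 = \int_\beta^\lambda\sqrt{s^2-\beta^2}\,ds = \frac12\,\lambda\sqrt{\lambda^2-\beta^2} - \frac12\,\beta^2\log\Big(\frac{\lambda}{\beta} + \frac{\sqrt{\lambda^2-\beta^2}}{\beta}\Big). \tag{A.4}
\]
The recursion relation (A.3), (A.4) is solved by
\[
d_j = S_j^{(1)}(\lambda)\,(\lambda^2-\beta^2)\,\frac{\sqrt{\lambda^2-\beta^2}}{\lambda} + S_j^{(2)}(\lambda)\,\frac{\sqrt{\lambda^2-\beta^2}}{\lambda} + S_j^{(3)}(\lambda)\,\log\Big(\frac{\lambda}{\beta} + \frac{\sqrt{\lambda^2-\beta^2}}{\beta}\Big), \qquad j \in \mathbb{N}.
\]
The functions $S_j^{(2)}$ and $S_j^{(3)}$ are simple. We find the expressions:
\[
S_j^{(2)} = \frac{1}{j+1}\,\frac{\beta^{2j}}{4^j}\binom{2j-1}{j-1}\lambda^2 = \frac{v_j}{\beta^2}\,\lambda^2,
\qquad
S_j^{(3)} = -\frac{1}{j+1}\,\frac{\beta^{2j+2}}{4^j}\binom{2j-1}{j-1} = -v_j.
\]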

80

N. M. Ercolani et al. (1)

The case of S j is more complicated; after some work we have (1) Sj

j j v j λ2i 1 2 j−2i+1 ( j − i + 2)β 2i−2 2 j−2i+2 1 2j − 1 . λ = j = $2 j−2i+1% j −1 j +1 2 2 vi (i + 1) j−i

i=1

(1)

(2)

i=1

(3)

The integral (3.16), in terms of S j , S j , and S j is

λ β

⎞ ⎛ ν−1 * 1 h(s) s 2 − β 2 ds = ⎝d0 + h jdj⎠ x j=0 ⎞ ⎛ ν−1 λ2 − β 2 1⎝ (1) ⎠ 2 2 λ −β = h j Sj x λ j=0 ⎞ ⎛ ν−1 λ2 − β 2 1 (2) (2) + h j Sj ⎠ + ⎝ S0 + x λ j=0 ⎞ ⎛ ν−1 1 ⎝ (3) λ λ2 − β 2 (3) ⎠ h j Sj log S0 + + . x β β j=0
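The recursion (A.3), (A.4) and the closed forms for $S_j^{(1)}$, $S_j^{(2)}$, $S_j^{(3)}$ can be sanity-checked numerically. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, the test values $\lambda = 2$, $\beta = 1$ are arbitrary, and $v_j$ is taken to be $4^{-j}\binom{2j-1}{j-1}\beta^{2j+2}/(j+1)$, as read off from the expression for $S_j^{(3)}$. The integral (A.1) is evaluated by Simpson's rule after the substitution $s = \beta\cosh t$, which removes the square-root behaviour at the lower endpoint.

```python
import math

def d_recursive(j, lam, beta):
    """d_j from the recursion (A.3), started at the closed form (A.4) for d_0."""
    root = math.sqrt(lam**2 - beta**2)
    log_term = math.log((lam + root) / beta)
    d = 0.5 * lam * root - 0.5 * beta**2 * log_term            # (A.4)
    for k in range(1, j + 1):                                   # (A.3)
        d = (lam**(2 * k - 1) * root**3 + (2 * k - 1) * beta**2 * d) / (2 * (k + 1))
    return d

def d_quadrature(j, lam, beta, n=2000):
    """d_j from (A.1) by Simpson's rule in the variable t, where s = beta*cosh(t)."""
    t_max = math.acosh(lam / beta)
    h = t_max / n                                               # n must be even
    def f(t):
        return beta**(2 * j + 2) * math.cosh(t)**(2 * j) * math.sinh(t)**2
    total = f(0.0) + f(t_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def v(j, beta):
    """Assumed form of v_j (valid for j >= 1)."""
    return math.comb(2 * j - 1, j - 1) * beta**(2 * j + 2) / (4**j * (j + 1))

def d_closed(j, lam, beta):
    """d_j for j >= 1, assembled from S^(1), S^(2) and S^(3)."""
    root = math.sqrt(lam**2 - beta**2)
    log_term = math.log((lam + root) / beta)
    s1 = 0.5 * v(j, beta) * sum(lam**(2 * i) / (v(i, beta) * (i + 1))
                                for i in range(1, j + 1))
    s2 = v(j, beta) * lam**2 / beta**2
    s3 = -v(j, beta)
    return (s1 * (lam**2 - beta**2) + s2) * root / lam + s3 * log_term
```

For instance, `d_recursive(3, 2.0, 1.0)`, `d_quadrature(3, 2.0, 1.0)` and `d_closed(3, 2.0, 1.0)` should agree to within the quadrature error.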

Using the relation (3.10) and the explicit expressions for the $S_j^{(k)}$ derived above, we finally arrive at the form of the expansion for (3.16) given in (3.17).

References

1. Abramowitz, M., Stegun, I.: Handbook of Mathematical Functions. Washington, D.C.: U.S. Govt. Printing Office, 1972
2. Ambjorn, J., Chekhov, L., Kristjansen, C.F., Makeenko, Yu.: Matrix Model Calculations Beyond the Spherical Limit. Nucl. Phys. B 404, 127–172 (1993)
3. Albeverio, S., Pastur, L., Shcherbina, M.: On the 1/n expansion for some unitary invariant ensembles of random matrices. Commun. Math. Phys. 224, 271–305 (2001)
4. Bauer, M., Itzykson, C.: Triangulations. Discr. Math. 156, 29–81 (1996)
5. Bessis, D., Itzykson, C., Zuber, J.B.: Quantum Field Theory Techniques in Graphical Enumeration. Adv. Appl. Math. 1, 109–157 (1980)
6. Bloch, A., Golse, F., Uribe, A.: Dispersionless Toda and Toeplitz operators. Duke Math. J. 117, 157–196 (2003)
7. Bouttier, J., Di Francesco, P., Guitter, E.: Census of Planar Maps: From the One-Matrix Model Solution to a Combinatorial Proof. Nucl. Phys. B 645, 477–499 (2002)
8. Bleher, P., Its, A.: Asymptotics of the partition function of a random matrix model. Ann. Inst. Fourier (Grenoble) 55(6), 1943–2000 (2005)
9. Bousquet-Melou, M., Schaeffer, G.: The Degree Distribution in Bipartite Planar Maps: Applications to the Ising Model. http://arXiv.org/list/math.CO/0211070, 2002
10. Deift, P., Kriecherbauer, T., McLaughlin, K.T.-R.: New results on the equilibrium measure for logarithmic potentials in the presence of an external field. J. Approx. Theory 95, 388–475 (1998)
11. Deift, P., Kriecherbauer, T., McLaughlin, K.T.-R., Venakides, S., Zhou, X.: Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Commun. Pure Appl. Math. 52(11), 1335–1425 (1999)
12. Deift, P., McLaughlin, K.T.-R.: A Continuum Limit of the Toda Lattice. Memoirs of the AMS, 131 (624), January 1998
13. Di Francesco, P., Ginsparg, P., Zinn-Justin, J.: 2D gravity and random matrices. Phys. Rep. 254, 1–133 (1995)
14. Dijkgraaf, R., Verlinde, E., Verlinde, H.: Loop Equations and Virasoro Constraints in Non-perturbative Two-Dimensional Quantum Gravity. Nucl. Phys. B 348, 435–456 (1991)
15. Ercolani, N.M., Flaschka, H., Singer, S.: The Geometry of the Full Toda Lattice. Progress in Mathematics 115, 181–226 (1993)
16. Ercolani, N.M., McLaughlin, K.D.T-R.: Asymptotics of the partition function for random matrices via Riemann-Hilbert techniques, and applications to graphical enumeration. Internat. Math. Research Notices 14, 755–820 (2003)
17. Ercolani, N.M., McLaughlin, K.D.T-R.: A Quick Derivation of the Loop Equations for Random Matrices. To appear in Probability, Geometry and Integrable Systems, Cambridge University Press
18. Erdélyi, A., Magnus, W., Oberhettinger, F., Tricomi, F.G.: Higher Transcendental Functions. Vol. 1. New York: Krieger, 1981, pp. 30–31
19. Flaschka, H.: On the Toda lattice. II. Inverse-scattering solution. Progr. Theoret. Phys. 51, 703–716 (1974)
20. Flaschka, H.: Integrable Systems and Torus Actions. In: Lectures on Integrable Systems, eds. O. Babelon, P. Cartier, Y. Kosmann-Schwarzbach. London: World Scientific, pp. 43–101, 1994
21. Geronimo, J.S., Case, K.M.: Scattering theory and polynomials orthogonal on the real line. Trans. Amer. Math. Soc. 258(2), 467–494 (1980)
22. Johansson, K.: On fluctuations of eigenvalues of random hermitian matrices. Duke Math. J. 91(1), 151–204 (1998)
23. Makeenko, Y., Marshakov, A., Mironov, A., Morozov, A.: Continuum versus Discrete Virasoro in One-Matrix Models. Nucl. Phys. B 356, 574–628 (1991)
24. Mehta, M.L., Gaudin, M.: On the density of eigenvalues of a random matrix. Nucl. Phys. 18, 420–427 (1960)
25. Mehta, M.L.: Random Matrices. 2nd Edition, San Diego, CA: Academic Press, 1991
26. Pierce, V.: The asymptotic expansion of the partition function of random matrices. PhD Thesis, The University of Arizona, 2004
27. Pierce, V.: Combinatoric Results to Planar Maps. http://arXiv.org/list/math.CO/0703160v1, 2007
28. Pierce, V.: An Algorithm for Map Enumeration. http://arXiv.org/list/math.CO/0610586, 2006
29. Saff, E.B., Totik, V.: Logarithmic Potentials with External Fields. New York: Springer-Verlag, 1997
30. Szegő, G.: Orthogonal Polynomials, AMS Colloquium Publications, Vol. 23, Providence, RI: Amer. Math. Soc., 1939
31. Witten, E.: Two-Dimensional Gravity and Intersection Theory on Moduli Space. In: Surveys in Differential Geometry, Vol 1, Boston: Ditl. Press, 1991, pp. 243–310
32. Zvonkin, A.: How to Draw a Group. Discrete Math. 180, 403–413 (1998)

Communicated by M. Aizenman

Commun. Math. Phys. 278, 83–99 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0361-9


Pairs of Compatible Associative Algebras, Classical Yang-Baxter Equation and Quiver Representations

Alexander Odesskii (1,2), Vladimir Sokolov (1)

1 Landau Institute for Theoretical Physics, Kosygina 2, 119334, Moscow, Russia. E-mail: [email protected]; [email protected]
2 School of Mathematics, The University of Manchester, Manchester, UK

Received: 17 November 2006 / Accepted: 23 April 2007 Published online: 7 November 2007 – © Springer-Verlag 2007

Abstract: Given an associative multiplication in matrix algebra compatible with the usual one or, in other words, a linear deformation of the matrix algebra, we construct a solution to the classical Yang-Baxter equation. We also develop a theory of such deformations and construct numerous examples. It turns out that these deformations are in one-to-one correspondence with representations of certain algebraic structures, which we call M-structures. We also describe an important class of M-structures related to the affine Dynkin diagrams of A, D, E-type. These M-structures and their representations are described in terms of quiver representations. Introduction Two associative algebras with multiplications (a, b) → ab and (a, b) → a ◦ b defined on the same finite dimensional vector space are said to be compatible if the multiplication a • b = ab + λ a ◦ b

(0.1)

is associative for any constant λ. The multiplication • can be regarded as a deformation of the multiplication (a, b) → ab linear in the parameter λ. In [1] we have studied multiplications compatible with the matrix product or, in other words, linear deformations of matrix multiplication. It turns out that these deformations of the matrix algebra are in one-to-one correspondence with representations of certain algebraic structures, which we call M-structures. The case of a direct sum of several matrix algebras corresponds to representations of the so-called P M-structures (see [1]). Given a pair of compatible associative products, one can construct a hierarchy of integrable systems of ODEs via the Lenard-Magri scheme [2]. The Lax representations for these systems are described in [3]. If one of the multiplications is the usual matrix product, the integrable systems are Hamiltonian gl(N )-models with quadratic Hamiltonians [4]. These systems can be regarded as a generalization of the matrix equations


considered in [5]. Their skew-symmetric reductions give rise to new integrable quadratic so(n)-Hamiltonians.

The main ingredient of the M-structure is a pair of associative algebras A and B of the same dimension. The simplest version of a structure of this kind can be regarded as an associative analog of the Lie bi-algebra [6]. We define an infinitesimal bi-algebra (see [20]) as a pair of associative algebras A and B with a non-degenerate pairing and a $B \otimes A^{op}$-module structure on the space L = A ⊕ B such that the algebra A acts on A ⊂ L by right multiplications, the algebra B acts on B ⊂ L by left multiplications and the pairing is invariant with respect to this action (that is, $(bb', a) = (b, b'a)$ and $(b, aa') = (ba, a')$ for $a, a' \in A$ and $b, b' \in B$). Here $A^{op}$ stands for the algebra opposite to A. Given an infinitesimal bi-algebra, one has the structure of an associative algebra on the space A ⊕ B ⊕ A ⊗ B (this is an analog of the Drinfeld double).

In this paper we introduce the notion of associative r-matrices, which is a particular case of the usual classical r-matrices. It turns out that the constant associative r-matrices can be classified in terms of infinitesimal bi-algebras. Moreover, one can introduce spectral parameters into the definition of infinitesimal bi-algebras and obtain a classification of non-constant associative r-matrices.

In [1] we have discovered an important class of M- and PM-structures. These structures are related to the Cartan matrices of affine Dynkin diagrams of the $\tilde{A}_{2k-1}$, $\tilde{D}_k$, $\tilde{E}_6$, $\tilde{E}_7$, and $\tilde{E}_8$-type. In this paper we describe these M-structures and their representations in terms of quiver representations.

The paper is organized as follows. In Sect. 1, we consider an associative analog of the classical Yang-Baxter equation.
Since semi-simple associative algebras are more rigid algebraic structures than semi-simple Lie algebras, it turns out to be possible to construct a developed theory of the associative Yang-Baxter equation in the semi-simple case. This theory is suitable for constructing a wide class of solutions to the Yang-Baxter equation. We are planning to write a separate paper devoted to a systematic search for solutions. In Sect. 2, we give an explicit construction of a solution to the Yang-Baxter equation from each pair of compatible Lie brackets provided that the first bracket is rigid. The corresponding r-matrices are not unitary and therefore they are not included in the classification by A. Belavin and V. Drinfeld [7]. In particular, compatible associative products give rise to solutions of the associative Yang-Baxter equation. This gives us a way to construct r-matrices related to M-structures. In Sect. 3 we recall the notion of M-structure and formulate the main results describing the relationship between associative multiplications in matrix algebra compatible with the usual matrix product and M-structures. In Sect. 4 we describe all M-structures with semi-simple algebras A and B. It turns out that such M-structures are related to the Cartan matrices of affine Dynkin diagrams of the $\tilde{A}_{2k-1}$, $\tilde{D}_k$, $\tilde{E}_6$, $\tilde{E}_7$, and $\tilde{E}_8$-type. We describe these M-structures and their representations in terms of representations of affine quivers [10–12]. In the Appendix we give explicit formulas for these M-structures of A and D types, their representations and for corresponding solutions to the classical Yang-Baxter equation.¹

¹ The explicit formulas for these M-structures of E type can be found in the preprint version of this article.
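The compatibility condition (0.1) lends itself to a quick numerical experiment. The sketch below is illustrative only and is not taken from the paper's method: all function names are hypothetical, the matrices are 2 × 2 with integer entries so that the arithmetic is exact, and testing finitely many triples and values of λ can refute compatibility but cannot prove it. It compares the product x ∘ y = xay (which appears as an example in Sect. 2) with the opposite product x ∘ y = yx, which is associative but not compatible with the matrix product.

```python
import random

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(x, y, lam):
    """Entrywise x + lam * y."""
    return [[p + lam * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def bullet(second):
    """Family lam -> (a • b = ab + lam * second(a, b)) from (0.1)."""
    return lambda lam: (lambda a, b: combine(matmul(a, b), second(a, b), lam))

def is_associative(prod, triples):
    return all(prod(prod(a, b), c) == prod(a, prod(b, c)) for a, b, c in triples)

def is_compatible(second, triples, lams=(1, 2, -3)):
    """Necessary test only: associativity of • on sample triples and sample lam."""
    family = bullet(second)
    return all(is_associative(family(lam), triples) for lam in lams)

rng = random.Random(0)

def rand2():
    return [[rng.randint(-3, 3) for _ in range(2)] for _ in range(2)]

# One hand-picked triple of matrix units plus random triples.
E12, E21, E11 = [[0, 1], [0, 0]], [[0, 0], [1, 0]], [[1, 0], [0, 0]]
triples = [(E12, E21, E11)] + [(rand2(), rand2(), rand2()) for _ in range(20)]

a = [[1, 2], [0, -1]]                                 # arbitrary fixed matrix
sandwich = lambda x, y: matmul(matmul(x, a), y)       # x ∘ y = x a y
opposite = lambda x, y: matmul(y, x)                  # x ∘ y = y x

print(is_compatible(sandwich, triples))   # True
print(is_compatible(opposite, triples))   # False
```

The first result reflects the identity xy + λxay = x(1 + λa)y; the second already fails on the triple of matrix units.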


1. Classical Yang-Baxter Equation

Let $\mathfrak{g}$ be a Lie algebra. Let r(u, v) be a meromorphic function of two complex variables with values in End($\mathfrak{g}$). For each u ∈ C we denote by $\mathfrak{g}_u$ a vector space canonically isomorphic to $\mathfrak{g}$. Let $\tilde{\mathfrak{g}} = \oplus_u \mathfrak{g}_u$. We define a bracket on the space $\tilde{\mathfrak{g}}$ by the formula

$$[x_u, y_v] = ([x, r(u,v)y])_u + ([r(v,u)x, y])_v. \tag{1.2}$$

Lemma 1.1. The bracket (1.2) defines a structure of a Lie algebra on $\tilde{\mathfrak{g}}$ iff r(u, v) satisfies the following equation:

$$[r(u,w)x, r(u,v)y] - r(u,v)[r(v,w)x, y] - r(u,w)[x, r(w,v)y] \in \mathrm{Cent}(\mathfrak{g}), \tag{1.3}$$

where x, y are arbitrary elements of $\mathfrak{g}$ and Cent($\mathfrak{g}$) stands for the center of $\mathfrak{g}$.

Proof. The proof of the lemma is straightforward.

Remark 1. Here and in the sequel by Lie algebra we mean partial Lie algebra. Namely, the bracket (1.2) is defined iff the functions r(u, v) and r(v, u) are defined at the point (u, v). The anti-commutativity condition and the Jacobi identity hold whenever the left hand side is defined.

Definition. The operator relation

$$[r(u,w)x, r(u,v)y] - r(u,v)[r(v,w)x, y] - r(u,w)[x, r(w,v)y] = 0 \tag{1.4}$$

is called the classical Yang-Baxter equation. A solution r(u, v) to the classical Yang-Baxter equation is called the classical r-matrix. Arguments of r(u, v) are called spectral parameters. Note that the arguments u, v of r could also be elements of $C^n$ for n > 1 or elements of some complex manifold called the manifold of spectral parameters.

Suppose $\mathfrak{g}$ possesses a non-degenerate invariant scalar product (·, ·). An r-matrix is called unitary if (x, r(u, v)y) = −(r(v, u)x, y).

Remark 2. There are several algebraic interpretations of the Yang-Baxter equation ([7–9]). For our purposes the interpretation from Lemma 1.1 is the most convenient. All definitions lead to the same equation for r(u, v) provided that the r-matrix is unitary. In particular, it is easy to see [8] that Eq. (1.4) is equivalent to the classical Yang-Baxter equation written in the tensor form. The unitary r-matrices were classified in [7]. The case of the non-unitary r-matrix was considered in ([8, 9]). There is no classification of r-matrices in the general case.

It turns out that a theory of (non-unitary) r-matrices can be developed in the special case of associative algebras. Let A be an associative algebra. Let r(u, v) be a meromorphic function in two complex variables with values in End(A). For each u ∈ C we denote by $A_u$ a vector space canonically isomorphic to A. Let $\tilde{A} = \oplus_u A_u$. We define a product on the space $\tilde{A}$ by the formula

$$x_u y_v = (x(r(u,v)y))_u + ((r(v,u)x)y)_v. \tag{1.5}$$


Lemma 1.2. The product (1.5) defines a structure of an associative algebra on $\tilde{A}$ iff r(u, v) satisfies the following equation:

$$(r(u,w)x)(r(u,v)y) - r(u,v)((r(v,w)x)y) - r(u,w)(x(r(w,v)y)) \in \mathrm{Null}(A), \tag{1.6}$$

where Null(A) is the set of z ∈ A such that zt = tz = 0 for all t ∈ A.

Proof. The proof of the lemma is straightforward.

Definition. The relation

$$(r(u,w)x)(r(u,v)y) - r(u,v)((r(v,w)x)y) - r(u,w)(x(r(w,v)y)) = 0 \tag{1.7}$$

is called the associative Yang-Baxter equation.

Lemma 1.3. Let $\mathfrak{g}$ be a Lie algebra with the bracket [x, y] = xy − yx. Then any solution of (1.7) is a solution of (1.4).

Proof. The proof of the lemma is straightforward.

Let $A = Mat_n$. It is easy to see that any operator in End(A) has the form $x \to a_1 x b^1 + \cdots + a_p x b^p$ for some matrices $a_1, \ldots, a_p, b^1, \ldots, b^p$. Moreover, p is the smallest possible for such a representation iff the sets of matrices $\{a_1, \ldots, a_p\}$ and $\{b^1, \ldots, b^p\}$ are both linearly independent.

Theorem 1.1. Let

$$r(u,v)x = a_1(u,v)\,x\,b^1(v,u) + \cdots + a_p(u,v)\,x\,b^p(v,u),$$

where $a_1(u,v), \ldots, b^p(u,v)$ are meromorphic functions with values in $Mat_n$ such that $\{a_1(u,v), \ldots, a_p(u,v)\}$ are linearly independent over the field of meromorphic functions in u, v, as well as $\{b^1(u,v), \ldots, b^p(u,v)\}$. Then r(u, v) satisfies (1.7) iff there exist meromorphic functions $\varphi_{i,j}^{k}(u,v,w)$ and $\psi_{k}^{i,j}(u,v,w)$ such that (summation over repeated indices is assumed)

$$a_i(u,v)\,a_j(v,w) = \varphi_{i,j}^{k}(u,v,w)\,a_k(u,w),$$
$$b^i(u,v)\,b^j(v,w) = \psi_{k}^{i,j}(u,v,w)\,b^k(u,w), \tag{1.8}$$
$$b^i(u,v)\,a_j(v,w) = \varphi_{j,k}^{i}(v,w,u)\,b^k(u,w) + \psi_{j}^{k,i}(w,u,v)\,a_k(u,w).$$

The tensors $\varphi_{i,j}^{k}(u,v,w)$ and $\psi_{k}^{i,j}(u,v,w)$ satisfy the following equations:

$$\varphi_{i,j}^{s}(u,v,w)\,\varphi_{s,k}^{l}(u,w,t) = \varphi_{i,s}^{l}(u,v,t)\,\varphi_{j,k}^{s}(v,w,t),$$
$$\psi_{s}^{i,j}(u,v,w)\,\psi_{l}^{s,k}(u,w,t) = \psi_{l}^{i,s}(u,v,t)\,\psi_{s}^{j,k}(v,w,t), \tag{1.9}$$
$$\varphi_{j,k}^{s}(v,w,t)\,\psi_{s}^{l,i}(t,u,v) = \varphi_{s,k}^{l}(u,w,t)\,\psi_{j}^{s,i}(w,u,v) + \varphi_{j,s}^{i}(v,w,u)\,\psi_{k}^{l,s}(t,u,w).$$

Proof. The proof of the theorem is similar to the proof of Theorem 3.1 from [1].

Remark 3. It is easy to give an invariant description of the corresponding algebraic structure. In the case of a constant r-matrix this leads to the infinitesimal bi-algebras [20] described in the Introduction.

Remark 4. A similar statement holds in the case of a semi-simple algebra A.
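The associative Yang-Baxter equation (1.7) is easy to test on a concrete candidate. The following sketch, which is an illustration and not part of the original argument, evaluates the left hand side of (1.7) in exact rational arithmetic for the commutative algebra $A = C^p$ with coordinatewise product, using the r-matrix of Example 2 below. The particular functions `phi` and `psi`, the value p = 3 and the sample points are arbitrary assumptions chosen so that no denominator vanishes.

```python
from fractions import Fraction

P = 3  # dimension of A = C^p (assumed for the illustration)

def phi(i, t):
    return t ** (i + 1) + i        # hypothetical non-constant phi_i

def psi(i, t):
    return 2 * t + i + 1           # hypothetical psi_i

def r(u, v, x):
    """Coordinates of r(u, v)x, where
    r(u, v)e_i = sum_j psi_i(v) / (phi_j(u) - phi_i(v)) e_j."""
    return [sum(psi(i, v) / (phi(j, u) - phi(i, v)) * x[i] for i in range(P))
            for j in range(P)]

def mul(x, y):
    """Product in C^p: e_i e_j = delta_ij e_i."""
    return [p * q for p, q in zip(x, y)]

def aybe_residual(u, v, w, x, y):
    """Left hand side of (1.7) evaluated on x, y."""
    t1 = mul(r(u, w, x), r(u, v, y))
    t2 = r(u, v, mul(r(v, w, x), y))
    t3 = r(u, w, mul(x, r(w, v, y)))
    return [p - q - s for p, q, s in zip(t1, t2, t3)]

u, v, w = Fraction(5), Fraction(3), Fraction(1, 2)
x, y = [1, -2, 4], [3, 0, 7]
residual = aybe_residual(u, v, w, x, y)
```

With these choices `residual` is the zero vector, in agreement with the partial-fraction identity 1/((X−Z)(X−Y)) − 1/((Y−Z)(X−Y)) + 1/((Y−Z)(X−Z)) = 0 that underlies the example.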

Example 1. Let $A = Mat_n$ and

$$r(u,v)x = \frac{1}{u-v}\,e(u,v)\,x\,f(v,u),$$

where

$$e(u,v)e(v,w) = e(u,w), \quad f(u,v)f(v,w) = f(u,w), \quad e(u,v)f(v,w) = \frac{u-v}{u-w}\,e(u,w) + \frac{v-w}{u-w}\,f(u,w). \tag{1.10}$$

Then r(u, v) is an associative r-matrix. These equations hold if we assume, for example, that e(u, v) = 1, $f(u,v) = (u + C)(v + C)^{-1}$, where C is an arbitrary constant matrix.

Example 2. Let $A = C^p$. The algebra A has a basis $\{e_i, i = 1, \ldots, p\}$ such that $e_i e_j = \delta_{i,j} e_i$. The formula

$$r(u,v)e_i = \sum_{1\le j\le p} \frac{\psi_i(v)}{\varphi_j(u) - \varphi_i(v)}\,e_j$$

gives an associative r-matrix for any functions $\varphi_1, \ldots, \varphi_p, \psi_1, \ldots, \psi_p$ of one variable, where $\varphi_1, \ldots, \varphi_p$ are not constant. This r-matrix can be written in the form

$$r(\vec{u}, \vec{v})e_i = \sum_{1\le j\le p} \frac{\psi_i(\vec{v})}{u_j - v_i}\,e_j,$$

where $\vec{u} = (u_1, \ldots, u_p)$, $\vec{v} = (v_1, \ldots, v_p)$, and $\psi_i(\vec{v})$ are functions of p variables. In this case the manifold of spectral parameters is $C^p$.

2. Compatible Products and Solutions to the Classical Yang-Baxter Equation

Two Lie brackets [·, ·] and $[\cdot,\cdot]_1$ defined on the same vector space $\mathfrak{g}$ are said to be compatible if $[\cdot,\cdot]_\lambda = [\cdot,\cdot] + \lambda[\cdot,\cdot]_1$ is a Lie bracket for any λ. In the papers [13–16] different applications of the notion of compatible Lie brackets to the integrability theory have been considered. Suppose that the bracket [·, ·] is rigid, i.e. $H^2(\mathfrak{g}, \mathfrak{g}) = 0$ with respect to [·, ·]. In this case the Lie algebras with the brackets $[\cdot,\cdot]_\lambda$ are isomorphic to the Lie algebra with the bracket [·, ·] for almost all values of the parameter λ. This means that there exists a meromorphic function $\lambda \to S_\lambda$ with values in End($\mathfrak{g}$) such that $S_0 = Id$ and

$$[S_\lambda(x), S_\lambda(y)] = S_\lambda([x, y] + \lambda[x, y]_1). \tag{2.11}$$

Theorem 2.1. The formula

$$r(u,v) = \frac{1}{u-v}\,S_u S_v^{-1} \tag{2.12}$$

defines a solution to the classical Yang-Baxter equation (1.4).

Proof. For r(u, v) given by (2.12), Eq. (1.4) is equivalent to

$$\frac{1}{(u-v)(u-w)}[S_u S_w^{-1}(x), S_u S_v^{-1}(y)] - \frac{1}{(u-v)(v-w)}S_u S_v^{-1}([S_v S_w^{-1}(x), y]) - \frac{1}{(u-w)(w-v)}S_u S_w^{-1}([x, S_w S_v^{-1}(y)]) = 0. \tag{2.13}$$

Using (2.11), we get

$$[S_u S_w^{-1}(x), S_u S_v^{-1}(y)] = S_u([S_w^{-1}(x), S_v^{-1}(y)] + u[S_w^{-1}(x), S_v^{-1}(y)]_1),$$
$$S_u S_v^{-1}([S_v S_w^{-1}(x), y]) = S_u([S_w^{-1}(x), S_v^{-1}(y)] + v[S_w^{-1}(x), S_v^{-1}(y)]_1),$$
$$S_u S_w^{-1}([x, S_w S_v^{-1}(y)]) = S_u([S_w^{-1}(x), S_v^{-1}(y)] + w[S_w^{-1}(x), S_v^{-1}(y)]_1).$$

Substituting these expressions into the left hand side of (2.13), we obtain the statement: the coefficients of both brackets vanish, since $\frac{1}{(u-v)(u-w)} - \frac{1}{(u-v)(v-w)} - \frac{1}{(u-w)(w-v)} = 0$ and $\frac{u}{(u-v)(u-w)} - \frac{v}{(u-v)(v-w)} - \frac{w}{(u-w)(w-v)} = 0$.

Remark 1. It is clear that the r-matrix (2.12) is unitary with respect to an invariant form (·, ·) if the operator $S_\lambda$ is orthogonal. In this case formula (2.11) implies that the form (·, ·) is invariant with respect to the second bracket.

Two associative algebras with multiplications (x, y) → xy and (x, y) → x ◦ y defined on the same finite dimensional vector space A are said to be compatible if the multiplication (0.1) is associative for any constant λ. Suppose $H^2(A, A) = 0$ with respect to the first multiplication; then there exists a meromorphic function $\lambda \to S_\lambda$ with values in End(A) such that $S_0 = Id$ and

$$S_\lambda(x)S_\lambda(y) = S_\lambda(xy + \lambda\, x \circ y). \tag{2.14}$$

The Taylor decomposition of $S_\lambda$ at λ = 0 has the following form:

$$S_\lambda = 1 + R\lambda + T\lambda^2 + \cdots, \tag{2.15}$$

where R, T, . . . are some linear operators on A. Substituting this decomposition into (2.14) and equating the coefficients of λ, we obtain the formula

$$x \circ y = R(x)y + xR(y) - R(xy), \tag{2.16}$$

where R is defined by (2.15). It is clear that for any a ∈ A the transformation

$$R \longrightarrow R + ad_a, \tag{2.17}$$

where $ad_a$ is the linear operator v → av − va, does not change the multiplication ◦.

Definition. Operators R and R′ are said to be equivalent if $R - R' = ad_a$ for some a ∈ A.

The following analog of Theorem 2.1 can be proved similarly.

Theorem 2.2. Suppose that $S_\lambda$ satisfies (2.14); then formula (2.12) defines a solution to the associative Yang-Baxter equation (1.7).

Remark 2. In the important particular case $S_\lambda = 1 + \lambda R$ the r-matrix (2.12) is equivalent to

$$r(u,v) = \frac{1}{u-v} + (v + R)^{-1}. \tag{2.18}$$

Let $A = Mat_N$. Consider the following classification problem: describe all possible associative multiplications ◦ compatible with the usual matrix product in A. Since $H^2(A, A) = 0$ for any semi-simple associative algebra A, an operator-valued meromorphic function $S_\lambda$ with the properties $S_0 = Id$ and (2.14) exists for any such multiplication and the multiplication is given by formula (2.16).


Example. Let $a \in Mat_N$ be an arbitrary matrix and R be the operator of left multiplication by a. Then (2.16) yields the multiplication x ◦ y = xay, which is associative and compatible with the standard one. It is clear that $S_\lambda$ can be chosen in the form $S_\lambda(x) = (1 + \lambda a)x$. In this case we have

$$r(u,v) = \frac{1}{u-v} + (v + a)^{-1}.$$
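The statements of this example can be verified directly for a particular matrix. The sketch below is a hedged illustration with hypothetical helper names and an arbitrarily chosen 2 × 2 matrix a; it works in exact rational arithmetic and checks three things for $S_\lambda(x) = (1 + \lambda a)x$: formula (2.16) with R(x) = ax, the intertwining relation (2.14), and the associative Yang-Baxter equation (1.7) for r(u, v) built from (2.12).

```python
from fractions import Fraction

def mat(rows):
    return [[Fraction(e) for e in row] for row in rows]

def mm(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lin(x, y, c=1):
    """Entrywise x + c * y."""
    return [[p + c * q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]

def scale(c, x):
    return [[c * p for p in row] for row in x]

def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]

ONE = mat([[1, 0], [0, 1]])
ZERO = mat([[0, 0], [0, 0]])
a = mat([[1, 2], [0, 3]])            # arbitrary matrix chosen for the test

def R(z):                            # left multiplication by a
    return mm(a, z)

def circ(p, q):                      # x ∘ y = x a y
    return mm(mm(p, a), q)

def S(lam, z):                       # S_lam(z) = (1 + lam a) z
    return mm(lin(ONE, a, lam), z)

def S_inv(lam, z):
    return mm(inv2(lin(ONE, a, lam)), z)

def r(u, v, z):                      # formula (2.12)
    return scale(1 / (u - v), S(u, S_inv(v, z)))

def aybe_residual(u, v, w, x, y):    # left hand side of (1.7)
    t1 = mm(r(u, w, x), r(u, v, y))
    t2 = r(u, v, mm(r(v, w, x), y))
    t3 = r(u, w, mm(x, r(w, v, y)))
    return lin(lin(t1, t2, -1), t3, -1)

x, y = mat([[2, 1], [1, 1]]), mat([[0, 1], [4, 5]])
lam = Fraction(2, 7)
u, v, w = Fraction(2), Fraction(5), Fraction(1, 2)

ok_216 = lin(lin(mm(R(x), y), mm(x, R(y))), R(mm(x, y)), -1) == circ(x, y)
ok_214 = mm(S(lam, x), S(lam, y)) == S(lam, lin(mm(x, y), circ(x, y), lam))
ok_17 = aybe_residual(u, v, w, x, y) == ZERO
```

All three flags come out true here; the last one reflects the fact that 1 + λa is linear in λ, so that $(v-w)S_u - (u-w)S_v + (u-v)S_w = 0$.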

Any linear operator R on the space $Mat_N$ may be written in the form $R(x) = a_1 x b^1 + \cdots + a_l x b^l$ for some matrices $a_1, \ldots, a_l, b^1, \ldots, b^l$. Indeed, the operators $x \to e_{i,j}\, x\, e_{i_1,j_1}$ form a basis in the vector space of linear operators on $Mat_N$. It is convenient to represent the operator R from formula (2.16) in the form

$$R(x) = a_1 x b^1 + \cdots + a_p x b^p + c\,x \tag{2.19}$$

with p being the smallest possible in the class of equivalence of R. This means that the matrices $\{a_1, \ldots, a_p, 1\}$ are linearly independent, as well as the matrices $\{b^1, \ldots, b^p, 1\}$. According to (2.16), the second product has the following form:

$$x \circ y = \sum_i (a_i x b^i y + x a_i y b^i - a_i x y b^i) + x c y. \tag{2.20}$$

It turns out that the matrices $\{a_1, \ldots, a_p, b^1, \ldots, b^p, c\}$ form a representation of a certain algebraic structure. We describe this structure in the next section.

3. M-Structures and the Corresponding Associative Algebras

In this section we formulate the results of the paper [1] and their simple consequences we will use below.

Definition. By weak M-structure on a linear space L we mean the following data:

• Two subspaces A and B and a distinguished element 1 ∈ A ∩ B ⊂ L.
• A non-degenerate symmetric scalar product (·, ·) on the space L.
• Associative products A × A → A and B × B → B with unity 1.
• A left action B × L → L of the algebra B and a right action L × A → L of the algebra A on the space L that commute with each other.

These data should satisfy the following properties:

1. dim A ∩ B = dim L/(A + B) = 1.
2. The restriction of the action B × L → L to the subspace B ⊂ L is the product in B. The restriction of the action L × A → L to the subspace A ⊂ L is the product in A.
3. $(a_1, a_2) = (b_1, b_2) = 0$ and $(b_1 b_2, v) = (b_1, b_2 v)$, $(v, a_1 a_2) = (v a_1, a_2)$ for any $a_1, a_2 \in A$, $b_1, b_2 \in B$ and v ∈ L.

It follows from these properties that (·, ·) defines a non-degenerate pairing between A/C1 and B/C1. Therefore dim A = dim B and dim L = 2 dim A. Given a weak M-structure L, we define an associative algebra U(L) generated by L and satisfying natural compatibility and universality conditions.


Definition. By weak M-algebra associated with a weak M-structure L we mean an associative algebra U(L) with a linear mapping j : L → U(L) such that the following conditions are satisfied:

1. j(b)j(x) = j(bx) and j(x)j(a) = j(xa) for a ∈ A, b ∈ B and x ∈ L.
2. For any algebra X with a linear mapping j′ : L → X satisfying property 1 there exists a unique homomorphism of algebras f : U(L) → X such that f ◦ j = j′.

It is easy to see that U(L) exists and is unique for given L.

Definition. A weak M-structure L is called an M-structure if there exists a central element K ∈ U(L) of the algebra U(L) quadratic with respect to L.

Theorem 3.1. Let L be an M-structure. Then there exists a basis $\{1, A_1, \ldots, A_p, B^1, \ldots, B^p, C\}$ in L such that $\{1, A_1, \ldots, A_p\}$ is a basis in A, $\{1, B^1, \ldots, B^p\}$ is a basis in B, and $K = A_1 B^1 + \cdots + A_p B^p + C$.

Theorem 3.2. Let R ∈ End(U(L)) be given by the formula $R(x) = A_1 x B^1 + \cdots + A_p x B^p + C x$, and ◦ be defined by (2.16). Then ◦ is associative and compatible with the usual product in U(L).

Notice that K = R(1).

Theorem 3.3. Let ◦ be an associative product in the space $Mat_N$ compatible with the usual one and written in the form (2.16), where R is given by (2.19) with p being the smallest possible in the class of equivalence of R. Then there exists an M-structure L with a representation $U(L) \to Mat_N$ such that dim A = dim B = p + 1, the image of A has the basis $\{1, a_1, \ldots, a_p\}$, and the image of B has the basis $\{1, b^1, \ldots, b^p\}$.

Definition. A representation of U(L) is called non-degenerate if its restrictions on the algebras A and B are exact.

Theorem 3.4. There is a one-to-one correspondence between N-dimensional non-degenerate representations of algebras U(L) corresponding to M-structures and associative products in $Mat_N$ compatible with the usual matrix product.

The structure of the algebra U(L) for an M-structure L can be described as follows.

Theorem 3.5. The algebra U(L) is spanned by the elements of the form $a\,b\,K^s$, where a ∈ A, b ∈ B, $s \in Z_+$.

We need also the following

Definition. Let L be a weak M-structure. By the opposite weak M-structure $L^{op}$ we mean the weak M-structure with the same linear space L, the same scalar product and algebras A, B replaced by the opposite algebras $B^{op}$, $A^{op}$, correspondingly.

It is easy to see that if L is an M-structure, then $L^{op}$ is an M-structure as well.


4. M-Structures with Semi-Simple Algebras A and B and Quiver Representations

4.1. Matrix of multiplicities. By $V^l$ we denote the direct sum of l copies of a linear space V. By definition, we put $V^0 = \{0\}$. Recall [17] that any semi-simple associative algebra over C has the form $\oplus_{1\le i\le r} End(V_i)$, any left End(V)-module has the form $V^l$, and any right End(V)-module has the form $(V^*)^l$ for some r and l.

Lemma 4.1. Let L be a weak M-structure. Suppose $A = \oplus_{1\le i\le r} End(V_i)$, where $\dim V_i = m_i$. Then L as a right A-module is isomorphic to $\oplus_{1\le i\le r} (V_i^*)^{2m_i}$.

Proof. Since any right A-module has the form $\oplus_{1\le i\le r} (V_i^*)^{l_i}$ for some $l_1, \ldots, l_r \ge 0$, we have $L = \oplus_{1\le i\le r} L_i$, where $L_i = (V_i^*)^{l_i}$. Note that A ⊂ L and, moreover, $End(V_i) \subset L_i$ for i = 1, . . . , r. Besides, $End(V_i) \perp L_j$ for i ≠ j. Indeed, we have $(v, a) = (v, Id_i\, a) = (v\, Id_i, a) = 0$ for $v \in L_j$, $a \in End(V_i)$, where $Id_i$ is the unity of the subalgebra $End(V_i)$. Since (·, ·) is non-degenerate and $End(V_i) \perp End(V_i)$ by property 3 of the weak M-structure, we have $\dim L_i \ge 2 \dim End(V_i)$. But $\sum_i \dim L_i = \dim L = 2 \dim A = \sum_i 2 \dim End(V_i)$, and we obtain $\dim L_i = 2 \dim End(V_i)$ for each i = 1, . . . , r, which is equivalent to the statement of Lemma 4.1.

Lemma 4.2. Let A and B be semi-simple associative algebras:

$$A = \oplus_{1\le i\le r} End(V_i), \quad B = \oplus_{1\le j\le s} End(W_j), \quad \dim V_i = m_i, \quad \dim W_j = n_j. \tag{4.21}$$

Then L as the $A^{op} \otimes B$-module is given by the formula

$$L = \oplus_{1\le i\le r,\ 1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}, \tag{4.22}$$

where $a_{i,j} \ge 0$ and

$$\sum_{j=1}^{s} a_{i,j}\, n_j = 2 m_i, \qquad \sum_{i=1}^{r} a_{i,j}\, m_i = 2 n_j. \tag{4.23}$$

Proof. It is known that any $A^{op} \otimes B$-module has the form $\oplus_{1\le i\le r,\ 1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}$, where $a_{i,j} \ge 0$. Applying Lemma 4.1, we obtain $\dim L_i = 2 m_i^2$, where $L_i = \oplus_{1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}$. This gives the first equation from (4.23). The second equation can be obtained similarly.

Definition. The r × s-matrix $(a_{i,j})$ from Lemma 4.2 is called the matrix of multiplicities of the weak M-structure L.

Definition. The r × s-matrix $(a_{i,j})$ is called decomposable if there exist partitions $\{1, \ldots, r\} = I \sqcup I'$ and $\{1, \ldots, s\} = J \sqcup J'$ such that $a_{i,j} = 0$ for $(i, j) \in I \times J' \sqcup I' \times J$.

Lemma 4.3. The matrix of multiplicities is indecomposable.

Proof. Suppose $(a_{i,j})$ is decomposable. We have $A = A' \oplus A''$, $B = B' \oplus B''$ and $L = L' \oplus L''$, where $A' = \oplus_{i\in I} End(V_i)$, $A'' = \oplus_{i\in I'} End(V_i)$, $B' = \oplus_{j\in J} End(W_j)$, $B'' = \oplus_{j\in J'} End(W_j)$, $L' = \oplus_{(i,j)\in I\times J} (V_i^* \otimes W_j)^{a_{i,j}}$, $L'' = \oplus_{(i,j)\in I'\times J'} (V_i^* \otimes W_j)^{a_{i,j}}$. Let $1 = e_1 + e_2$, where $e_1 \in L'$ and $e_2 \in L''$. It is clear that $e_1, e_2 \in A \cap B$. Therefore, dim A ∩ B > 1, which contradicts property 1 of the weak M-structure.


Note that if A is the matrix of multiplicities of a weak M-structure with semi-simple algebras A and B, then $A^t$ is the matrix of multiplicities for the opposite weak M-structure.

Theorem 4.1. Let L be a weak M-structure with semi-simple algebras A and B given by formula (4.21) and with L given by (4.22). Then there exists a simply laced affine Dynkin diagram [18] with vector spaces from the set $\{V_1, \ldots, V_r, W_1, \ldots, W_s\}$ assigned to each vertex in such a way that:

1. there is a one-to-one correspondence between this set and the set of vertices,
2. for any i, j the spaces $V_i$, $V_j$ are not connected by edges, as well as the spaces $W_i$, $W_j$,
3. $a_{i,j}$ is equal to the number of edges between $V_i$ and $W_j$,
4. the vector $(\dim V_1, \ldots, \dim V_r, \dim W_1, \ldots, \dim W_s)$ is a positive imaginary root of the diagram.

Proof. Consider a linear space with a basis $\{v_1, \ldots, v_r, w_1, \ldots, w_s\}$ and the symmetric bilinear form $(v_i, v_j) = (w_i, w_j) = 2\delta_{i,j}$, $(v_i, w_j) = -a_{i,j}$. Let $J = m_1 v_1 + \cdots + m_r v_r + n_1 w_1 + \cdots + n_s w_s$. It is clear that Eqs. (4.23) can be written as $(v_i, J) = (w_j, J) = 0$, which means that J belongs to the kernel of the form (·, ·). Therefore (see [19]) the matrix of the form is the Cartan matrix of a simply laced affine Dynkin diagram. It is also clear that J is a positive imaginary root.

On the other hand, consider a simply laced affine Dynkin diagram with a partition of the set of vertices into two subsets such that vertices of the same subset are not connected. It is clear that if such a partition exists, then it is unique up to transposition of subsets. Let $v_1, \ldots, v_r$ be roots corresponding to vertices of the first subset and $w_1, \ldots, w_s$ be roots corresponding to the second subset. We have $(v_i, v_j) = (w_i, w_j) = 2\delta_{i,j}$. Let $J = m_1 v_1 + \cdots + m_r v_r + n_1 w_1 + \cdots + n_s w_s$ be an imaginary root and $a_{i,j} = -(v_i, w_j)$. Then it is easy to see that (4.23) holds.

Remark. The interchanging of the subsets corresponds to the transposition of the matrix $(a_{i,j})$.

It is easily seen that among simply laced affine Dynkin diagrams only diagrams of the $\tilde{A}_{2k-1}$, $\tilde{D}_k$, $\tilde{E}_6$, $\tilde{E}_7$, and $\tilde{E}_8$-type admit a partition of the set of vertices into two subsets such that vertices of the same subset are not connected. The natural question arises: to describe all M-structures with the algebras A and B given by (4.21) and L given by (4.22), where the matrix $(a_{i,j})$ is constructed by an affine Dynkin diagram of the $\tilde{A}_{2k-1}$, $\tilde{D}_k$, $\tilde{E}_6$, $\tilde{E}_7$, and $\tilde{E}_8$-type. It turns out that these M-structures exist iff J is the minimal positive imaginary root.

4.2. M-structures related to affine Dynkin diagrams and quiver representations. We recall that a quiver is just a directed graph Q = (Ver, E), where Ver is a finite set of vertices and E is a finite set of arrows between them. If a ∈ E is an arrow, then $t_a$ and $h_a$ denote its tail and its head, respectively. Note that loops and several arrows with the same tail and head are allowed. A representation of the quiver Q is a set of vector spaces $L_x$ attached to each vertex x ∈ Ver and linear maps $f_a : L_{t_a} \to L_{h_a}$ attached to each arrow a ∈ E. The set of natural numbers $\dim L_x$ attached to each vertex x ∈ Ver is called the dimension of the representation. By affine quiver we mean a quiver such that the corresponding graph is an affine Dynkin diagram of ADE-type.
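Equations (4.23) say that the dimension vector lies in the kernel of the Cartan matrix of the bipartite diagram, and this is easy to test on small cases. The sketch below is an illustration only: the function names and the particular labelling of the vertices of $\tilde{A}_{2k-1}$, $\tilde{D}_4$ and $\tilde{E}_6$ by the spaces $V_i$ and $W_j$ are assumptions made for the example.

```python
def satisfies_4_23(a, m, n):
    """Check (4.23): sum_j a[i][j] n[j] = 2 m[i] and sum_i a[i][j] m[i] = 2 n[j]."""
    rows_ok = all(sum(a[i][j] * n[j] for j in range(len(n))) == 2 * m[i]
                  for i in range(len(m)))
    cols_ok = all(sum(a[i][j] * m[i] for i in range(len(m))) == 2 * n[j]
                  for j in range(len(n)))
    return rows_ok and cols_ok

def cycle_multiplicities(k):
    """A~_{2k-1}: a cycle with 2k vertices, V_i joined to W_i and W_{i-1} (mod k)."""
    a = [[0] * k for _ in range(k)]
    for i in range(k):
        a[i][i] += 1
        a[i][(i - 1) % k] += 1
    return a

# D~_4: one central vertex V_1 (dimension 2) joined to four leaves W_1..W_4.
d4 = [[1, 1, 1, 1]]

# E~_6: V = (centre, three leaves), W = (three middle vertices).
e6 = [[1, 1, 1],
      [1, 0, 0],
      [0, 1, 0],
      [0, 0, 1]]

# A_3 (finite type, V_1 - W_1 - V_2): no positive solution of (4.23) is expected.
a3 = [[1], [1]]

print(satisfies_4_23(cycle_multiplicities(4), [1] * 4, [1] * 4))   # True
print(satisfies_4_23(d4, [2], [1, 1, 1, 1]))                       # True
print(satisfies_4_23(e6, [3, 1, 1, 1], [2, 2, 2]))                 # True
print(satisfies_4_23(a3, [1, 1], [1]))                             # False
```

Note that (4.23) is homogeneous, so multiples such as m = (4), n = (2, 2, 2, 2) for $\tilde{D}_4$ also pass this test; the minimality of the imaginary root is a separate condition, as stated in the text.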


Theorem 4.2. Let L be an M-structure with semi-simple algebras A and B given by (4.21). Then there exists a representation of an affine Dynkin quiver such that:

1. There is a one-to-one correspondence between the set of vector spaces attached to vertices of the quiver and the set of vector spaces $\{V_1, \ldots, V_r, W_1, \ldots, W_s\}$. Each vector space from this set is attached to only one vertex.
2. For any a ∈ E the space attached to its tail $t_a$ is some of the $V_i$ and the space attached to its head $h_a$ is some of the $W_j$.
3. L as an $A^{op} \otimes B$-module is isomorphic to $\oplus_{a\in E} V_{t_a}^* \otimes W_{h_a}$.
4. The vector $(\dim V_1, \ldots, \dim V_r, \dim W_1, \ldots, \dim W_s)$ is the minimal imaginary positive root of the Dynkin diagram.
5. The element $1 \in L = \oplus_{a\in E} Hom(V_{t_a}, W_{h_a})$ is just $\sum_{a\in E} f_a$, where $f_a$ is the linear map attached to the arrow a.

Proof. In Theorem 4.1 we have already constructed the affine Dynkin diagram corresponding to L with vector spaces $\{V_1, \ldots, V_r, W_1, \ldots, W_s\}$ attached to the vertices. Note that each edge of this affine Dynkin diagram links some linear spaces $V_i$ and $W_j$. By definition, the direction of this edge is from $V_i$ to $W_j$. The decomposition of the element $1 \in L = \oplus_{1\le i\le r,\ 1\le j\le s} (V_i^* \otimes W_j)^{a_{i,j}}$ defines the element from $V_i^* \otimes W_j$. Since $V_i^* \otimes W_j = Hom(V_i, W_j)$, we obtain a representation of the quiver. We know already that $J = (\dim V_1, \ldots, \dim V_r, \dim W_1, \ldots, \dim W_s)$ is an imaginary positive root. It is easy to see that if it is not minimal, then dim A ∩ B > 1.

Now we can use the known classification of representations of affine quivers [10–12] to describe the corresponding M-structures. Note that each vertex of our quiver cannot be a tail of one arrow and a head of another arrow at the same time. Given a representation of such a quiver, it remains to construct an embedding A → L, B → L and a scalar product (·, ·) on the space L. We can construct the embedding A → L, B → L by the formula a → 1a, b → b1 for a ∈ A, b ∈ B whenever we know the element 1 ∈ L. After that it is not difficult to construct the scalar product.

Example. Consider the case $\tilde{A}_{2k-1}$. We have $\dim V_i = \dim W_i = 1$ for 1 ≤ i ≤ k. Let $\{v_i\}$ be a basis of $V_i^*$ and $\{w_i\}$ be a basis of $W_i$. Let $\{e_i\}$ be a basis of $End(V_i)$ such that $v_i e_i = v_i$ and $\{f_i\}$ be a basis of $End(W_i)$ such that $f_i w_i = w_i$. A generic element 1 ∈ L in a suitable basis in $V_i$, $W_i$ can be written in the form $1 = \sum_{1\le i\le k} (v_i \otimes w_i + \lambda v_{i+1} \otimes w_i)$, where the index i is taken modulo k and λ ∈ C is a generic complex number. The embedding A → L, B → L is the following: $e_i \to 1e_i = v_i \otimes w_i + \lambda v_i \otimes w_{i-1}$, $f_i \to f_i 1 = v_i \otimes w_i + \lambda v_{i+1} \otimes w_i$. It is clear that the vector space A ∩ B is spanned by the vector $\sum_i (v_i \otimes w_i + \lambda v_i \otimes w_{i-1})$ and that the algebra A ∩ B is isomorphic to C.

Let Q = (Ver, E) be an affine quiver and ρ be its representation constructed by a given M-structure L with semi-simple algebras A and B. Let $Ver = Ver_t \sqcup Ver_h$, where $Ver_t$ is the set of tails and $Ver_h$ is the set of heads of arrows. We have $\rho : x \to V_x$, $y \to W_y$, $a \to f_a$ for $x \in Ver_t$, $y \in Ver_h$ and a ∈ E. It turns out that representations of the algebra U(L) can also be described in terms of representations of the quiver Q.

Theorem 4.3. Suppose we have a representation of the algebra U(L) in a linear space N; then there exists a representation $\tau : x \to N_x$, $a \to \varphi_a$; x ∈ Ver, a ∈ E of the quiver Q such that

1. The restriction of the representation of the algebra U(L) on the subalgebra A ⊂ U(L) is isomorphic to $\oplus_{x\in Ver_t} V_x \otimes N_x$.


A. Odesskii, V. Sokolov

2. The restriction of the representation of the algebra $U(L)$ on the subalgebra $B \subset U(L)$ is isomorphic to $\oplus_{x \in Ver_h} W_x \otimes N_x$.
3. The formula $f = \sum_{a \in E} f_a \otimes \phi_a$ defines an isomorphism $f : \oplus_{x \in Ver_t} V_x \otimes N_x \to \oplus_{x \in Ver_h} W_x \otimes N_x$.

Proof. It is known that any representation of the algebra $End(V)$ has the form $V \otimes S$, where $S$ is a linear space. The action is given by $f(v \otimes s) = (f v) \otimes s$. Therefore $N$ has the form $N^a = \oplus_{x \in Ver_t} V_x \otimes N_x$ with respect to the action of $A = \oplus_{1 \le i \le r} End(V_i)$ and has the form $N^b = \oplus_{x \in Ver_h} W_x \otimes N_x$ with respect to the action of $B = \oplus_{1 \le j \le s} End(W_j)$ for some linear spaces $N_x$. Both linear spaces $N^a$ and $N^b$ are isomorphic to $N$. Thus we have linear spaces $N_x$ attached to each $x \in Ver$ and an isomorphism $f : \oplus_{x \in Ver_t} V_x \otimes N_x \to \oplus_{x \in Ver_h} W_x \otimes N_x$. Let $f = \sum_{x, y \in Ver} f_{x,y}$. It is easy to see that $f_{x,y} = 0$ if $x$ and $y$ are not linked by an arrow and $f_{x,y} = f_a \otimes \phi_a$ for some $\phi_a$ if $x = t_a$, $y = h_a$. Here $f_a$ is defined by Theorem 4.2 (see property 5). This gives us a linear map $\phi_a$ attached to each arrow $a \in E$.

Remark 1. It is clear that all statements of this section are valid for weak M-structures with semi-simple algebras $A$ and $B$. However, it is possible to check that any such weak M-structure has a quadratic central element $K$ and therefore is an M-structure.

Remark 2. It follows from Theorem 4.3 (see property 3) that
$$\dim N = \sum_{x \in Ver_t} m_x \dim N_x = \sum_{x \in Ver_h} n_x \dim N_x. \tag{4.24}$$

Moreover, if the representation τ is decomposable, then the representation of U (L) is also decomposable. Therefore, if the representation of U (L) is indecomposable, then dim τ must be a positive root with the property (4.24). If this root is real, then the representation does not depend on parameters and corresponds to some special value of K . If this root is imaginary, then the representation depends on one parameter and the action of K depends on this parameter also. In the Appendix we describe these representations for imaginary roots explicitly.
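The minimal positive imaginary root that appears in Theorem 4.2 can be checked numerically. The sketch below relies on the standard fact (not restated in the text) that for a simply laced affine diagram this root $\delta$ has coprime positive entries and satisfies $2\delta_x = \sum_{y \sim x} \delta_y$ at every vertex; the vertex numbering and helper names are assumptions made for illustration.

```python
from functools import reduce
from math import gcd

def is_minimal_null_root(num_vertices, edges, delta):
    """True if delta is positive, primitive, and annihilated by 2*I - adjacency."""
    neighbour_sum = [0] * num_vertices
    for u, v in edges:
        neighbour_sum[u] += delta[v]
        neighbour_sum[v] += delta[u]
    in_kernel = all(2 * delta[x] == neighbour_sum[x] for x in range(num_vertices))
    positive = all(c > 0 for c in delta)
    primitive = reduce(gcd, delta) == 1
    return in_kernel and positive and primitive

# Affine A_3: a 4-cycle, all entries equal to 1 (the case dim V_i = dim W_i = 1).
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Affine D_6: leaves 0, 1 on vertex 2, chain 2-3-4, leaves 5, 6 on vertex 4.
d6_edges = [(0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (4, 6)]
d6_delta = [1, 1, 2, 2, 2, 1, 1]

ok_A3 = is_minimal_null_root(4, cycle4, [1, 1, 1, 1])
ok_D6 = is_minimal_null_root(7, d6_edges, d6_delta)
```

The entries 1 and 2 in the $\tilde D$ example correspond to the summands $\mathbb{C}$ and $Mat_2$ of the algebras described in the Appendix.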

5. Appendix

In this Appendix we present explicit formulas for M-algebras with semi-simple algebras $A$ and $B$ based on known classification results on affine quiver representations. We also give formulas for the operator $R$ with values in $End(U(L))$. Note that $K = R(1)$. It turns out that in all cases
$$S_\lambda = 1 + \lambda R. \tag{5.25}$$

Moreover, the operator R satisfies a polynomial equation of degree 3 in the case A˜ 2k−1 and degree 4 in other cases. Using these equations, one can define (v + R)−1 with values in the localization C(K )⊗U (L), where C(K ) is the field of rational functions in K . Formula (2.18) gives us the corresponding universal r -matrix with values in C(K ) ⊗ U (L). For any representation of U (L) in a vector space N the image of this r -matrix is an r -matrix with values in End(N ).


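The case $\tilde A_{2k-1}$. The algebras $A$ and $B$ have bases $\{e_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$ and $\{f_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$, respectively, such that the multiplications are given by
$$e_i e_j = \delta_{i,j} e_i, \qquad f_i f_j = \delta_{i,j} f_i. \tag{5.26}$$
The M-algebra $U(L)$ is generated by $e_1, \dots, e_k, f_1, \dots, f_k$ with defining relations (5.26) and
$$e_1 + \dots + e_k = f_1 + \dots + f_k = 1, \qquad f_i e_j = 0, \quad j - i \ne 0, 1.$$
The operator $R$ can be written in the form:
$$R(x) = \sum_{1 \le i \le j \le k-1} e_i x f_j + f_k e_k x.$$
This operator satisfies the following equation:
$$K R(x) - (K + 1) R^2(x) + R^3(x) = 0.$$
From this equation we obtain
$$(v + R)^{-1}(x) = \frac{1}{v} x + \frac{1}{v(v+1)} (v + K)^{-1} \bigl( R^2(x) - (1 + v + K) R(x) \bigr).$$
The corresponding $r$-matrix is given by (2.18). For any generic value of $K$ the algebra $U(L)$ has the following irreducible representation $V$. There exist two bases $\{v_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$ and $\{w_i;\ i \in \mathbb{Z}/k\mathbb{Z}\}$ of the space $V$ such that
$$e_i v_j = \delta_{i,j} v_i, \qquad f_i w_j = \delta_{i,j} w_i, \qquad v_i = w_i - t w_{i-1}, \qquad i, j \in \mathbb{Z}/k\mathbb{Z}.$$
Here $t \in \mathbb{C}$ is a parameter of the representation. In this representation $K$ acts as multiplication by $1/(1 - t^k)$.

The case $\tilde D_{2k}$. The algebra $A \cong \mathbb{C} \oplus \mathbb{C} \oplus (Mat_2)^{k-2} \oplus \mathbb{C} \oplus \mathbb{C}$ has a basis $\{e_1, e_2, e_{2k}, e_{2k+1}, e_{2\alpha,i,j};\ 2 \le \alpha \le k-1,\ 1 \le i, j \le 2\}$ with multiplication
$$e_\alpha e_\beta = \delta_{\alpha,\beta} e_\beta, \qquad e_\alpha e_{\beta,i,j} = e_{\beta,i,j} e_\alpha = 0, \qquad e_{\alpha,i,j} e_{\beta,i',j'} = \delta_{\alpha,\beta} \delta_{j,i'} e_{\alpha,i,j'}. \tag{5.27}$$
The algebra $B \cong (Mat_2)^{k-1}$ has a basis $\{e_{2\alpha-1,i,j};\ 2 \le \alpha \le k,\ 1 \le i, j \le 2\}$ with multiplication
$$e_{\alpha,i,j} e_{\beta,i',j'} = \delta_{\alpha,\beta} \delta_{j,i'} e_{\alpha,i,j'}. \tag{5.28}$$
The M-algebra $U(L)$ is generated by $e_1, e_2, e_{2k}, e_{2k+1}, e_{\alpha,i,j};\ 3 \le \alpha \le 2k-1,\ 1 \le i, j \le 2$ with defining relations (5.27), (5.28) and
$$e_1 + e_2 + e_{2k} + e_{2k+1} + \sum_{2 \le \alpha \le k-1,\ 1 \le i \le 2} e_{2\alpha,i,i} = \sum_{2 \le \alpha \le k,\ 1 \le i \le 2} e_{2\alpha-1,i,i} = 1,$$
$$\begin{aligned}
e_{2\alpha-1,i,j} e_\beta &= 0, \quad 2 < \alpha < k,\ \beta = 1, 2, 2k, 2k+1,\\
e_{2\alpha-1,i,j} e_{2\beta,i',j'} &= 0, \quad \alpha \ne \beta, \beta + 1,\\
e_{3,1,2} e_1 &= e_{3,2,2} e_1 = e_{3,1,1} e_2 = e_{3,2,1} e_2 = 0,\\
e_{2\alpha-1,i,j} e_{2\alpha,i',j'} &= e_{2\alpha+1,i,j} e_{2\alpha,i',j'} = 0, \quad j \ne i',\\
e_{2\alpha-1,i,1} e_{2\alpha,1,j} &= e_{2\alpha-1,i,2} e_{2\alpha,2,j}, \qquad e_{2\alpha+1,i,1} e_{2\alpha,1,j} = e_{2\alpha+1,i,2} e_{2\alpha,2,j},\\
e_{2k-1,1,1} e_{2k} &= e_{2k-1,1,2} e_{2k}, \qquad e_{2k-1,2,1} e_{2k} = e_{2k-1,2,2} e_{2k},\\
e_{2k-1,1,2} e_{2k+1} &= \lambda e_{2k-1,1,1} e_{2k+1}, \qquad e_{2k-1,2,2} e_{2k+1} = \lambda e_{2k-1,2,1} e_{2k+1}.
\end{aligned}$$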


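The operator $R$ can be written in the form:
$$\begin{aligned}
R(x) ={}& \sum_{1 \le \alpha \le k-1} \bigl( \lambda e_1 x e_{2\alpha+1,2,2} - \lambda e_1 x e_{2\alpha+1,2,1} + e_2 x e_{2\alpha+1,1,1} - e_2 x e_{2\alpha+1,1,2} \\
& \qquad + e_{2k} x e_{2\alpha+1,1,1} + \lambda e_{2k} x e_{2\alpha+1,2,2} + \lambda e_{2k+1} x e_{2\alpha+1,1,1} + \lambda e_{2k+1} x e_{2\alpha+1,2,2} \bigr) \\
& + \sum_{2 \le \alpha \le k-1,\ 2 \le \beta \le k} \bigl( \lambda e_{2\alpha,1,1} x e_{2\beta-1,2,2} + e_{2\alpha,2,2} x e_{2\beta-1,1,1} \bigr) \\
& - \sum_{2 \le \alpha < \beta \le k} \bigl( \lambda e_{2\alpha,1,1} x e_{2\beta-1,2,1} + e_{2\alpha,2,2} x e_{2\beta-1,1,2} \bigr) \\
& + \sum_{2 \le \beta \le \alpha \le k-1} \bigl( \lambda e_{2\alpha,2,1} x e_{2\beta-1,2,2} + e_{2\alpha,1,2} x e_{2\beta-1,1,1} \bigr) + (1 - \lambda) e_{2k-1,2,2} e_{2k+1} x.
\end{aligned}$$
This operator satisfies the following equation:
$$R^4(x) - (1 + \lambda + K) R^3(x) + (\lambda + K + \lambda K) R^2(x) - \lambda K R(x) = 0.$$
From this equation we obtain
$$(v + R)^{-1}(x) = \frac{1}{v} x - \frac{1}{v(v+1)(v+\lambda)} (v + K)^{-1} \Bigl( R^3(x) - (1 + v + \lambda + K) R^2(x) + \bigl( v^2 + \lambda v + v + \lambda + (1 + v + \lambda) K \bigr) R(x) \Bigr),$$
where the signs are fixed by the requirement that the right-hand side equals $1/(v + r)$ on each root $r = 0, 1, \lambda, K$ of the quartic above, and the $r$-matrix is given by (2.18). For any generic value of $K$ the algebra $U(L)$ has the following irreducible representation $V$ of dimension $4k - 4$. There exist two bases $\{v_1, v_2, v_{2k}, v_{2k+1}, v_{2\alpha,i,j};\ 2 \le \alpha \le k-1,\ 1 \le i, j \le 2\}$ and $\{v_{2\alpha-1,i,j};\ 2 \le \alpha \le k,\ 1 \le i, j \le 2\}$ of the space $V$ such that
$$\begin{aligned}
e_\alpha v_\beta &= \delta_{\alpha,\beta} v_\beta, \quad \alpha, \beta = 1, 2, 2k, 2k+1,\\
e_\alpha v_{2\beta,i,j} &= e_{2\beta,i,j} v_\alpha = 0, \quad \alpha = 1, 2, 2k, 2k+1,\ 2 \le \beta \le k-1,\\
e_{2\alpha,i,j} v_{2\beta,i',j'} &= \delta_{\alpha,\beta} \delta_{j,i'} v_{2\alpha,i,j'}, \quad 2 \le \alpha, \beta \le k-1,\\
e_{2\alpha-1,i,j} v_{2\beta-1,i',j'} &= \delta_{\alpha,\beta} \delta_{j,i'} v_{2\alpha-1,i,j'}, \quad 2 \le \alpha, \beta \le k,
\end{aligned}$$
and
$$\begin{aligned}
v_1 &= v_{3,1,1}, \qquad v_2 = v_{3,2,2},\\
v_{2\alpha,i,j} &= v_{2\alpha+1,i,j} - v_{2\alpha-1,i,j}, \quad 2 \le \alpha \le k-1,\ i, j = 1, 2,\\
v_{2k} &= v_{2k-1,1,1} + v_{2k-1,2,1} + v_{2k-1,1,2} + v_{2k-1,2,2},\\
v_{2k+1} &= v_{2k-1,1,1} + \lambda v_{2k-1,2,1} + t v_{2k-1,1,2} + \lambda t v_{2k-1,2,2}.
\end{aligned}$$
Here $\lambda \in \mathbb{C}$ is a parameter of the algebra $U(L)$ and $t \in \mathbb{C}$ is a parameter of the representation. In this representation $K$ acts as multiplication by $\mu = \lambda(t - 1)/(t - \lambda)$.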
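The closed forms for $(v + R)^{-1}$ can be sanity-checked on the roots of the polynomial equations satisfied by $R$: the cubic factors as $R(R-1)(R-K)$ and the quartic as $R(R-1)(R-\lambda)(R-K)$, so a polynomial in $R$ inverts $v + R$ if it takes the value $1/(v + r)$ at every root $r$ (assuming the roots are distinct). The sketch below does this in exact rational arithmetic; the function names and sample values are illustrative assumptions, and the quartic expression is written with the signs that make the check succeed.

```python
from fractions import Fraction as Fr

def inv_cubic(r, v, K):
    """Candidate for 1/(v + r) when r is a root of r(r - 1)(r - K) = 0."""
    return 1 / v + (r**2 - (1 + v + K) * r) / (v * (v + 1) * (v + K))

def inv_quartic(r, v, lam, K):
    """Candidate for 1/(v + r) when r is a root of r(r - 1)(r - lam)(r - K) = 0."""
    poly = (r**3 - (1 + v + lam + K) * r**2
            + (v**2 + lam * v + v + lam + (1 + v + lam) * K) * r)
    return 1 / v - poly / (v * (v + 1) * (v + lam) * (v + K))

# Arbitrary rational sample values (assumptions, chosen so no denominator vanishes).
v, lam, K = Fr(3, 7), Fr(5, 2), Fr(-4, 3)

cubic_ok = all(inv_cubic(r, v, K) == 1 / (v + r) for r in (Fr(0), Fr(1), K))
quartic_ok = all(inv_quartic(r, v, lam, K) == 1 / (v + r)
                 for r in (Fr(0), Fr(1), lam, K))
```

Because the comparison uses `Fraction`, equality here is exact rather than approximate.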


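The case $\tilde D_{2k-1}$. The algebra $A \cong \mathbb{C} \oplus \mathbb{C} \oplus (Mat_2)^{k-2}$ has a basis $\{e_1, e_2, e_{2\alpha,i,j};\ 2 \le \alpha \le k-1,\ 1 \le i, j \le 2\}$ with multiplication
$$e_\alpha e_\beta = \delta_{\alpha,\beta} e_\beta, \qquad e_\alpha e_{\beta,i,j} = e_{\beta,i,j} e_\alpha = 0, \qquad e_{\alpha,i,j} e_{\beta,i',j'} = \delta_{\alpha,\beta} \delta_{j,i'} e_{\alpha,i,j'}. \tag{5.29}$$
The algebra $B \cong \mathbb{C} \oplus \mathbb{C} \oplus (Mat_2)^{k-2}$ has a basis $\{e_{2k-1}, e_{2k}, e_{2\alpha-1,i,j};\ 2 \le \alpha \le k-1,\ 1 \le i, j \le 2\}$ with multiplication
$$e_\alpha e_\beta = \delta_{\alpha,\beta} e_\beta, \qquad e_\alpha e_{\beta,i,j} = e_{\beta,i,j} e_\alpha = 0, \qquad e_{\alpha,i,j} e_{\beta,i',j'} = \delta_{\alpha,\beta} \delta_{j,i'} e_{\alpha,i,j'}. \tag{5.30}$$
The M-algebra $U(L)$ is generated by $e_1, e_2, e_{2k-1}, e_{2k}, e_{\alpha,i,j};\ 3 \le \alpha \le 2k-2,\ 1 \le i, j \le 2$ with defining relations (5.29), (5.30) and
$$e_\alpha e_\beta = 0, \quad \alpha = 2k-1, 2k,\ \beta = 1, 2,$$
$$e_1 + e_2 + \sum_{2 \le \alpha \le k-1,\ 1 \le i \le 2} e_{2\alpha,i,i} = e_{2k-1} + e_{2k} + \sum_{2 \le \alpha \le k-1,\ 1 \le i \le 2} e_{2\alpha-1,i,i} = 1,$$
$$\begin{aligned}
e_{2\alpha-1,i,j} e_\beta &= 0, \quad \alpha > 2,\ \beta = 1, 2,\\
e_{2\alpha-1,i,j} e_{2\beta,i',j'} &= 0, \quad \alpha \ne \beta, \beta + 1,\\
e_\alpha e_{2\beta,i,j} &= 0, \quad \beta < k-1,\ \alpha = 2k-1, 2k,\\
e_{3,1,2} e_1 &= e_{3,2,2} e_1 = e_{3,1,1} e_2 = e_{3,2,1} e_2 = 0,\\
e_{2\alpha-1,i,j} e_{2\alpha,i',j'} &= e_{2\alpha+1,i,j} e_{2\alpha,i',j'} = 0, \quad j \ne i',\\
e_{2\alpha-1,i,1} e_{2\alpha,1,j} &= e_{2\alpha-1,i,2} e_{2\alpha,2,j}, \qquad e_{2\alpha+1,i,1} e_{2\alpha,1,j} = e_{2\alpha+1,i,2} e_{2\alpha,2,j},\\
e_{2k-1} e_{2k-2,1,1} &= e_{2k-1} e_{2k-2,2,1}, \qquad e_{2k-1} e_{2k-2,1,2} = e_{2k-1} e_{2k-2,2,2},\\
e_{2k} e_{2k-2,2,1} &= \lambda e_{2k} e_{2k-2,1,1}, \qquad e_{2k} e_{2k-2,2,2} = \lambda e_{2k} e_{2k-2,1,2}.
\end{aligned}$$
The operator $R$ can be written in the form:
$$\begin{aligned}
R(x) ={}& (\lambda - 1) e_1 x e_{2k-1} + \sum_{2 \le \alpha \le k-1} \bigl( (\lambda - 1) e_1 x e_{2\alpha-1,2,2} + (\lambda - 1) e_{2\alpha,1,1} x e_{2k-1} \\
& \qquad - \lambda e_2 x e_{2\alpha-1,1,2} - e_1 x e_{2\alpha-1,2,1} + \lambda e_2 x e_{2\alpha-1,2,2} + \lambda e_1 x e_{2\alpha-1,1,1} \bigr) \\
& + \sum_{2 \le \alpha, \beta \le k-1} \bigl( (\lambda - 1) e_{2\alpha,1,1} x e_{2\beta-1,2,2} + \lambda e_{2\alpha,1,1} x e_{2\beta-1,1,1} + \lambda e_{2\alpha,2,2} x e_{2\beta-1,2,2} \bigr) \\
& + \sum_{2 \le \beta \le \alpha \le k-1} \bigl( \lambda e_{2\alpha,1,2} x e_{2\beta-1,1,1} + e_{2\alpha,2,1} x e_{2\beta-1,2,2} \bigr) \\
& - \sum_{2 \le \alpha < \beta \le k-1} \bigl( \lambda e_{2\alpha,2,2} x e_{2\beta-1,1,2} + e_{2\alpha,1,1} x e_{2\beta-1,2,1} \bigr) + (\lambda - 1) x e_{2k} e_{2k-2,2,2}.
\end{aligned}$$
This operator satisfies the following equation:
$$R^4(x) - R^3(x)(2\lambda - 1 + K) + R^2(x)(\lambda^2 - \lambda - K + 2\lambda K) - \lambda(\lambda - 1) R(x) K = 0.$$
From this equation we obtain
$$(v + R)^{-1}(x) = \frac{1}{v} x - \frac{1}{v(v+\lambda)(v+\lambda-1)} \Bigl( R^3(x) - R^2(x)(v + 2\lambda - 1 + K) + R(x)\bigl( v^2 + 2\lambda v + \lambda^2 - v - \lambda + (v - 1 + 2\lambda) K \bigr) \Bigr) (v + K)^{-1},$$
where the signs are fixed by the requirement that the right-hand side equals $1/(v + r)$ on each root $r = 0, \lambda, \lambda - 1, K$ of the quartic above,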


and the $r$-matrix is given by (2.18). For any generic value of $K$ the algebra $U(L)$ has the following irreducible representation $V$ of dimension $4k - 6$. There exist two bases $\{v_1, v_2, v_{2\alpha,i,j};\ 2 \le \alpha \le k-1,\ 1 \le i, j \le 2\}$ and $\{v_{2k-1}, v_{2k}, v_{2\alpha-1,i,j};\ 2 \le \alpha \le k-1,\ 1 \le i, j \le 2\}$ of the space $V$ such that
$$\begin{aligned}
e_\alpha v_\beta &= \delta_{\alpha,\beta} v_\beta, \quad \alpha, \beta = 1, 2,\\
e_\alpha v_{2\beta,i,j} &= e_{2\beta,i,j} v_\alpha = 0, \quad \alpha = 1, 2,\ 2 \le \beta \le k-1,\\
e_{2\alpha,i,j} v_{2\beta,i',j'} &= \delta_{\alpha,\beta} \delta_{j,i'} v_{2\alpha,i,j'}, \quad 2 \le \alpha, \beta \le k-1,\\
e_\alpha v_\beta &= \delta_{\alpha,\beta} v_\beta, \quad \alpha, \beta = 2k-1, 2k,\\
e_\alpha v_{2\beta-1,i,j} &= e_{2\beta-1,i,j} v_\alpha = 0, \quad \alpha = 2k-1, 2k,\ 2 \le \beta \le k-1,\\
e_{2\alpha-1,i,j} v_{2\beta-1,i',j'} &= \delta_{\alpha,\beta} \delta_{j,i'} v_{2\alpha-1,i,j'}, \quad 2 \le \alpha, \beta \le k-1,
\end{aligned}$$
and
$$\begin{aligned}
v_1 &= v_{3,1,1}, \qquad v_2 = v_{3,2,2},\\
v_{2\alpha,i,j} &= v_{2\alpha+1,i,j} - v_{2\alpha-1,i,j}, \quad 2 \le \alpha < k-1,\ i, j = 1, 2,\\
v_{2k-2,1,1} &= v_{2k-1} + v_{2k} - v_{2k-3,1,1}, \qquad v_{2k-2,2,2} = v_{2k-1} + \lambda t v_{2k} - v_{2k-3,2,2},\\
v_{2k-2,1,2} &= v_{2k-1} + t v_{2k} - v_{2k-3,1,2}, \qquad v_{2k-2,2,1} = v_{2k-1} + \lambda v_{2k} - v_{2k-3,2,1}.
\end{aligned}$$
Here $\lambda \in \mathbb{C}$ is a parameter of the algebra $U(L)$ and $t \in \mathbb{C}$ is a parameter of the representation. In this representation $K$ acts as multiplication by $\mu = t\lambda(1 - \lambda)/(1 - t\lambda)$.

Acknowledgements. The authors are grateful to I.Z. Golubchik, A. Vaintrob and M.A. Semenov-Tian-Shansky for useful discussions. The research was supported by the Manchester Institute for Mathematical Sciences (MIMS). The research was partially supported by: RFBR grant 05-01-00189, NSh grants 1716.2003.1 and 2044.2003.2.

References

1. Odesskii, A.V., Sokolov, V.V.: Algebraic structures connected with pairs of compatible associative algebras. International Mathematics Research Notices, 2006, Article ID 43734, 35 pages (2006)
2. Magri, F.: A simple model of the integrable Hamiltonian equation. J. Math. Phys. 19, 1156–1162 (1978)
3. Golubchik, I.Z., Sokolov, V.V.: Factorization of the loop algebra and integrable top-like systems. Theoret. Math. Phys. 141(1), 1329–1347 (2004)
4. Odesskii, A.V., Sokolov, V.V.: Integrable matrix equations related to pairs of compatible associative algebras. J. Phys. A: Math. Gen. 39, 12447–12456 (2006)
5. Mikhailov, A.V., Sokolov, V.V.: Integrable ODEs on associative algebras. Commun. Math. Phys. 211(1), 231–251 (2000)
6. Drinfeld, V.G.: Quantum groups. In: Proceedings of the International Congress of Mathematicians, Vols. 1, 2 (Berkeley, Calif., 1986), Providence, RI: Amer. Math. Soc., 1987, pp. 798–820
7. Belavin, A.A., Drinfeld, V.G.: Solutions of the classical Yang-Baxter equation for simple Lie algebras (Russian). Funkts. Anal. i Pril. 16(3), 1–29 (1982)
8. Semenov-Tian-Shansky, M.A.: What a classical r-matrix is (Russian). Funkts. Anal. i Pril. 17(4), 17–33 (1983)
9. Sklyanin, E.K.: Separation of variables: new trends. In: Quantum field theory, integrable models and beyond (Kyoto, 1994). Progr. Theoret. Phys. Suppl. No. 118, 35–60 (1995)
10. Schofield, A.: General representations of quivers. Proc. London Math. Soc. 65, 46–64 (1992)
11. Dlab, V., Ringel, C.M.: Indecomposable representations of graphs and algebras. Mem. Amer. Math. Soc. 173, Providence, RI: Amer. Math. Soc., 1976
12. Ringel, C.M.: Tame algebras and integral quadratic forms. Springer Lect. Notes 1099, Berlin-Heidelberg-New York: Springer, 1984


13. Golubchik, I.Z., Sokolov, V.V.: Compatible Lie brackets and integrable equations of the principle chiral model type. Func. Anal. and Appl. 36(3), 172–181 (2002)
14. Golubchik, I.Z., Sokolov, V.V.: Factorization of the loop algebras and compatible Lie brackets. J. Nonlin. Math. Phys. 12(1), 343–350 (2005)
15. Golubchik, I.Z., Sokolov, V.V.: Compatible Lie brackets and Yang-Baxter equation. Theoret. Math. Phys. 146(2), 159–169 (2006)
16. Odesskii, A.V., Sokolov, V.V.: Compatible Lie brackets related to elliptic curve. J. Math. Phys. 47, 013506 (2006)
17. Van Der Waerden, B.L.: Algebra. Berlin-Heidelberg-New York: Springer-Verlag, 1967
18. Bourbaki, N.: Groupes et algèbres de Lie. Paris: Hermann, 1968
19. Vinberg, E.B.: Discrete linear groups that are generated by reflections. Izv. Akad. Nauk SSSR, Ser. Mat. 35, 1072–1112 (1971)
20. Aguiar, M.: On the associative analog of Lie bialgebras. J. Algebra 244(2), 492–532 (2001)

Communicated by A. Connes

Commun. Math. Phys. 278, 101–116 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0380-6


A Uniform Quantum Version of the Cherry Theorem

Sandro Graffi1, Carlos Villegas-Blas2

1 Dipartimento di Matematica, Università di Bologna, Bologna, Italy. E-mail: [email protected]
2 Universidad Nacional Autonoma de Mexico, Instituto de Matematicas, Unidad Cuernavaca, Cuernavaca, Mexico

Received: 26 January 2007 / Accepted: 18 April 2007
Published online: 10 November 2007 – © Springer-Verlag 2007

Abstract: Consider in $L^2(\mathbb{R}^2)$ the operator family $H(\epsilon) := P_0(\hbar, \omega) + \epsilon F_0$. $P_0$ is the quantum harmonic oscillator with diophantine frequency vector $\omega$, $F_0$ a bounded pseudodifferential operator with symbol decreasing to zero at infinity in phase space, and $\epsilon \in \mathbb{C}$. Then there exist $\epsilon^* > 0$ independent of $\hbar$ and an open set $\Omega \subset \mathbb{C}^2 \setminus \mathbb{R}^2$ such that if $|\epsilon| < \epsilon^*$ and $\omega \in \Omega$, the quantum normal form near $P_0$ converges uniformly with respect to $\hbar$. This yields an exact quantization formula for the eigenvalues, and for $\hbar = 0$ the classical Cherry theorem on convergence of Birkhoff's normal form for complex frequencies is recovered.

1. Introduction and Statement of the Results

Consider in the phase space $\mathbb{R}^{2l}$ with canonical coordinates denoted $(x, \xi)$ the Hamiltonian system defined by the principal function
$$p_\epsilon(x, \xi; \omega) := p_0(x, \xi) + \epsilon f_0(x, \xi), \tag{1.1}$$
$$p_0(x, \xi; \omega) := \frac{1}{2}\bigl(|\xi|^2 + |\omega x|^2\bigr) = \sum_{k=1}^{l} \omega_k I_k(x, \xi), \tag{1.2}$$
$$I_k(x, \xi) := \frac{1}{2\omega_k}\bigl[\xi_k^2 + \omega_k^2 x_k^2\bigr], \qquad k = 1, \dots, l. \tag{1.3}$$
Here $f_0 : \mathbb{R}^{2l} \to \mathbb{R}$ is analytic; $f_0 = O([|\xi|^2 + |\omega x|^2]^{s/2})$, $s \ge 3$, as $|x| + |\xi| \to 0$, and $\epsilon \in \mathbb{R}$. Any analytic Hamiltonian near a non-degenerate elliptic equilibrium point can be written in the form (1.1). Let the frequencies $\omega := (\omega_1, \dots, \omega_l)$ fulfill a diophantine condition, i.e.
$$|\langle \omega, k \rangle| \ge \gamma |k|^{-\tau}, \quad \forall k \in \mathbb{Z}^l \setminus \{0\}, \quad |k| := |k_1| + \dots + |k_l|, \quad \gamma > 0,\ \tau > l - 1. \tag{1.4}$$

Partially supported by PAPIIT-UNAM IN106106-2.
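The diophantine condition (1.4) can be probed numerically on a finite box of integer vectors. The sketch below is only an illustration under stated assumptions: the function name, the truncation `max_norm`, and the example frequencies (1 and the golden mean) are not from the paper, and a finite search can refute but never prove the condition.

```python
import math

def diophantine_constant(omega, tau, max_norm):
    """Smallest |<omega, k>| * |k|**tau over integer k != 0 with |k|_1 <= max_norm."""
    l = len(omega)
    best = math.inf

    def visit(prefix, remaining):
        nonlocal best
        if len(prefix) == l:
            norm = sum(abs(c) for c in prefix)
            if norm > 0:
                inner = abs(sum(w * c for w, c in zip(omega, prefix)))
                best = min(best, inner * norm ** tau)
            return
        for c in range(-remaining, remaining + 1):
            visit(prefix + [c], remaining - abs(c))

    visit([], max_norm)
    return best

golden = (1 + 5 ** 0.5) / 2
gamma_golden = diophantine_constant((1.0, golden), tau=1.5, max_norm=40)
gamma_resonant = diophantine_constant((1.0, 2.0), tau=1.5, max_norm=40)
```

For the resonant pair the minimum is attained at a vector such as $k = (2, -1)$, where $\langle \omega, k \rangle = 0$, so no $\gamma > 0$ can work.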


Under these circumstances the Birkhoff theorem holds, namely (see e.g. [SM], Sect. 30): $\forall N \in \mathbb{N}$, $\forall p \in \mathbb{N}$, $\forall \epsilon \in \mathbb{R}$ one can construct an analytic, canonical bijection $(y, \eta) = \chi_{\epsilon,N}(x, \xi) : \mathbb{R}^{2l} \leftrightarrow \mathbb{R}^{2l}$ and a sequence of analytic functions $Y_p(I; \omega) : \mathbb{R}^l_+ \to \mathbb{R}$ such that:
$$p_\epsilon \circ \chi_{\epsilon,N}^{-1}(y, \eta) = \sum_{k=1}^{l} \omega_k I_k(y, \eta) + \sum_{p=1}^{N-1} Y_p(I(y, \eta); \omega)\, \epsilon^p + \epsilon^N R_N(y, \eta; \epsilon). \tag{1.5}$$
The $l$ functions $I := (I_k(y, \eta) : k = 1, \dots, l)$, the mechanical actions, are thus first integrals of the transformed Hamiltonian up to an error of order $\epsilon^N$. Hence the system is integrable if the remainder in (1.5) vanishes as $N \to \infty$, namely if the Birkhoff normal form
$$B(I; \omega, \epsilon) := \langle \omega, I \rangle + \sum_{p=1}^{\infty} Y_p(I; \omega)\, \epsilon^p, \qquad \langle \omega, I \rangle := \sum_{k=1}^{l} \omega_k I_k \tag{1.6}$$
converges when the actions belong to some ball $|I| < R$ of $\mathbb{R}^l_+$. However, as proved by Siegel [Si] in 1941, (1.6) is generically divergent (a particular convergence criterion was later isolated by Rüssmann [Ru]; see also [Ga]. It states that (1.6) converges if $Y_p(I, \omega) = Y_p(\langle \omega, I \rangle)$). Already in 1928, on the other hand, Cherry [Ch] (see also [SM], Sect. 30; a more recent proof can be found in [Ot]) remarked that, when $l = 2$, the normal form is convergent provided the frequencies $\omega$ are complex with non-vanishing imaginary part. Under this assumption the small denominator mechanism which generates the divergence becomes instead a large denominator one entailing the convergence.

We prove here that under the same assumptions on the frequencies, but much more restrictive conditions on the perturbation, the Cherry theorem holds in quantum mechanics as well, with estimates uniform with respect to the Planck constant $\hbar$. Namely, the quantum Birkhoff normal form (see [Sj]) converges uniformly with respect to $\hbar$, and this yields an exact quantization formula for the quantum spectrum. Consider indeed in $L^2(\mathbb{R}^2)$ the operator $H(\epsilon) = P_0(\hbar, \omega) + \epsilon F_0$ under the assumptions:

(A1) $P_0(\hbar, \omega)$ is the harmonic-oscillator Schrödinger operator with frequencies $\omega$:
$$P_0(\hbar, \omega)\psi = -\frac{1}{2}\hbar^2 \Delta\psi + \frac{1}{2}\bigl[\omega_1^2 x_1^2 + \omega_2^2 x_2^2\bigr]\psi, \qquad D(P_0) = H^2(\mathbb{R}^2) \cap L^2_2(\mathbb{R}^2). \tag{1.7}$$

(A2) Let $\omega_1 = a + ib$, $\omega_2 = c + id$, $a \ne 0$, $c \ne 0$, $\langle \omega_1, \omega_2 \rangle := ac + bd$. Then $\omega \in \Omega \subset \mathbb{C}^2$, where:
$$\Omega := \Bigl\{ \omega \in \mathbb{C}^2 \ \Big|\ 0 < \delta_1 \le |\omega| \le \delta_2;\ \frac{|\langle \omega_1, \omega_2 \rangle|}{|\omega_1 \omega_2|} = \frac{|ac + bd|}{\sqrt{(a^2 + b^2)(c^2 + d^2)}} \le \delta < 1 \Bigr\}. \tag{1.8}$$

To state the assumption on the perturbation $F_0$, define an analytic action $\Psi : \mathbb{T}^2 \times \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{C}^2 \times \mathbb{C}^2$, $(x, \xi) \mapsto (x', \xi') = \Psi_{\phi,\omega}(x, \xi)$ of $\mathbb{T}^2$ into $\mathbb{C}^2 \times \mathbb{C}^2$ through the flow of $p_0(\cdot, \omega)$ with real initial data $u := (x, \xi) \in \mathbb{R}^2 \times \mathbb{R}^2$ but complex frequencies $\omega \in \Omega$:
$$\begin{cases} x'_k := \dfrac{\xi_k}{\omega_k} \sin\phi_k + x_k \cos\phi_k, \\[4pt] \xi'_k := \xi_k \cos\phi_k - \omega_k x_k \sin\phi_k, \end{cases} \qquad k = 1, 2. \tag{1.9}$$
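Two elementary facts behind assumption (A2) and the flow (1.9) can be checked numerically. The sketch below is an illustration and not the paper's argument (its Lemma 2.5 is not shown in this section): first, the flow (1.9) preserves $\xi^2 + \omega^2 x^2$ even for complex $\omega$; second, if $|ac + bd| \le \delta |\omega_1| |\omega_2|$ then $|\omega_1 \nu_1 + \omega_2 \nu_2|^2 \ge (1 - \delta)(|\omega_1 \nu_1|^2 + |\omega_2 \nu_2|^2)$, which is one way to see the "large denominator" mechanism. All function names and sample values are assumptions.

```python
import cmath

def flow(x, xi, phi, omega):
    """One-degree-of-freedom version of (1.9); omega may be complex."""
    x_new = (xi / omega) * cmath.sin(phi) + x * cmath.cos(phi)
    xi_new = xi * cmath.cos(phi) - omega * x * cmath.sin(phi)
    return x_new, xi_new

def p0(x, xi, omega):
    """Quadratic Hamiltonian (xi^2 + omega^2 x^2) / 2 (no absolute values)."""
    return 0.5 * (xi ** 2 + (omega * x) ** 2)

def angle_ratio(w1, w2):
    """|ac + bd| / (|w1| |w2|) for w1 = a + ib, w2 = c + id, as in (1.8)."""
    return abs((w1 * w2.conjugate()).real) / (abs(w1) * abs(w2))

def large_denominator_holds(w1, w2, delta, max_nu=20):
    """Check |w1 n1 + w2 n2|^2 >= (1 - delta)(|w1 n1|^2 + |w2 n2|^2) on an integer grid."""
    for n1 in range(-max_nu, max_nu + 1):
        for n2 in range(-max_nu, max_nu + 1):
            lhs = abs(w1 * n1 + w2 * n2) ** 2
            rhs = (1 - delta) * (abs(w1 * n1) ** 2 + abs(w2 * n2) ** 2)
            if lhs < rhs - 1e-9:
                return False
    return True

w1, w2 = 1 + 1j, 1 - 0.5j            # sample complex frequencies (assumed values)
ratio = angle_ratio(w1, w2)          # about 0.316, so delta = 0.4 is admissible
x1, xi1 = flow(0.3, -1.2, 0.7, w1)
energy_drift = abs(p0(x1, xi1, w1) - p0(0.3, -1.2, w1))
```

The ratio in (1.8) is unchanged when $\omega$ is replaced by $i\omega$, and it equals 1 for real frequencies, where the lower bound degenerates.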


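Let $f(z) \in C(\mathbb{C}^2 \times \mathbb{C}^2; \mathbb{C})$. Then $f \circ \Psi_{\phi,\omega}(x, \xi) = f \circ \Psi_{\phi,\omega}(u)$, denoted $f_{\phi,\omega}(u)$, is a $2\pi$-periodic function of $\phi$ for every fixed $(u, \omega) \in \mathbb{R}^2 \times \mathbb{R}^2 \times \Omega$. We further denote $f(u) := f_{\phi,\omega}|_{\phi=0}$ and:

1. $f_{\nu,\omega}(u)$ the Fourier coefficients of $f_{\phi,\omega}(u)$:
$$f_{\nu,\omega}(u) := \frac{1}{(2\pi)^2} \int_{\mathbb{T}^2} f(\Psi_{\phi,\omega}(u))\, e^{-i\langle \nu, \phi \rangle}\, d\phi, \qquad \nu \in \mathbb{Z}^2.$$

2. $$\hat f_{\nu,\omega}(s) := \frac{1}{2\pi} \int_{\mathbb{R}^2 \times \mathbb{R}^2} f_{\nu,\omega}(u)\, e^{-i\langle s, u \rangle}\, du, \tag{1.10}$$
their space Fourier transform. Here $\hat g(s)$ is the Fourier transform of $g$:
$$\hat g(s) = \frac{1}{2\pi} \int_{\mathbb{R}^2 \times \mathbb{R}^2} g(u)\, e^{-i\langle s, u \rangle}\, du, \qquad g(u) \in L^1(\mathbb{R}^2 \times \mathbb{R}^2).$$

3. $\mathcal{F}_\sigma := \{ f \in L^1(\mathbb{R}^2 \times \mathbb{R}^2) \mid \|f\|_\sigma < +\infty \}$, $\sigma > 0$. Here:
$$\|f\|_\sigma := \int_{\mathbb{R}^2 \times \mathbb{R}^2} |\hat f(s)|\, e^{\sigma |s|}\, ds. \tag{1.11}$$

4. $\mathcal{A}_{\Omega,\rho,\sigma} := \{ f \in L^1(\mathbb{R}^2 \times \mathbb{R}^2) \cap C(\mathbb{C}^2 \times \mathbb{C}^2) \mid \|f\|_{\Omega,\rho,\sigma} < +\infty \}$, $\rho > 0$, $\sigma > 0$. Here:
$$\|f\|_{\Omega,\rho,\sigma} := \sup_{\omega \in \Omega} \sum_{\nu \in \mathbb{Z}^2} e^{\rho|\nu|}\, \|f_{\nu,\omega}\|_\sigma. \tag{1.12}$$

We can now state our assumption on the perturbation.

(A3) $F_0$ is a semiclassical pseudodifferential operator of order $\le 0$ with (Weyl) symbol $f_0 \in \mathcal{A}_{\Omega,\rho,\sigma}$ for some $\rho > 0$, $\sigma > 0$. Explicitly (notation as in [Ro]), $F_0 = Op^W_h(f_0)$,
$$(F_0 \psi)(x) = \frac{1}{h^2} \int_{\mathbb{R}^2 \times \mathbb{R}^2} e^{i\langle (x - y), \xi \rangle/\hbar}\, f_0((x + y)/2, \xi)\, \psi(y)\, dy\, d\xi, \qquad \psi \in \mathcal{S}(\mathbb{R}^2). \tag{1.13}$$

Remarks.
1. Since ([Ro], Sect. II.4) $\|F\|_{L^2 \to L^2} \le \|\hat f\|_{L^1}$, $F_0$ extends to a continuous operator in $L^2(\mathbb{R}^2)$ because:
$$\|F_0\|_{L^2 \to L^2} \le \|\hat f_0\|_{L^1} \le \|f_0\|_\sigma \le \|f_0\|_{\Omega,\rho,\sigma}. \tag{1.14}$$
2. Any $f \in \mathcal{A}_{\Omega,\rho,\sigma}$ admits a holomorphic continuation from $u = (x, \xi) \in \mathbb{R}^2 \times \mathbb{R}^2$ to the strip $\{ z = (z_1, z_2) \in \mathbb{C}^2 \times \mathbb{C}^2 \mid |\mathrm{Im}\, z| < \sigma \}$. Obviously this holomorphic continuation can be different from the function $f \circ \Psi_{\phi,\omega}(z_1, z_2) : \mathbb{C}^2 \times \mathbb{C}^2 \to \mathbb{R}$, as in the example $f = e^{-|z|^2} P(z) : \mathbb{C}^2 \times \mathbb{C}^2 \to \mathbb{C}$, $P$ any polynomial, discussed in the Appendix.

Since $F_0$ is bounded, $H(\epsilon)$ defined on $D(P_0)$ is closed with pure-point spectrum $\forall \epsilon \in \mathbb{C}$, and is self-adjoint for $\epsilon \in \mathbb{R}$ if $\omega \in \mathbb{R}^2_+$. Moreover, $P_0$ can be considered a semiclassical pseudodifferential operator of order 2 with symbol $p_0(x, \xi; \omega)$.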


Theorem 1.1. Let (A1–A3) be verified and let $\hbar^* > 0$. Then there exists $\epsilon^* > 0$ independent of $\hbar \in [0, \hbar^*]$ such that if $|\epsilon| < \epsilon^*$ the spectrum of $H(\epsilon)$ is given by the quantization formula
$$E_n(\hbar, \epsilon) = \langle \omega, n \rangle \hbar + \frac{1}{2}(\omega_1 + \omega_2)\hbar + \mathcal{N}(n\hbar, \hbar; \epsilon), \tag{1.15}$$
$$\mathcal{N}(n\hbar, \hbar; \epsilon) = \sum_{p=1}^{\infty} \mathcal{N}_p(n\hbar, \hbar)\, \epsilon^p. \tag{1.16}$$
Here $n = (n_1, n_2)$, $n_i = 0, 1, \dots$, and:
1. $\mathcal{N}_p(I, \hbar) : \mathbb{R}^2_+ \times [0, \hbar^*] \to \mathbb{C}$ is analytic in $I$ and continuous in $\hbar$;
2. The series (1.15) has convergence radius $\epsilon^*$ uniformly with respect to $\hbar \in [0, \hbar^*]$ and to $I$ in any compact subset of $\mathbb{R}^2_+$;
3. $\mathcal{N}_p(I, \hbar)$, $p = 1, 2, \dots$, admits an asymptotic expansion to all orders in $\hbar$; the order 0 term is the coefficient $Y_p(I)$ of the Birkhoff normal form.

Remarks.
1. The conditions of the Cherry theorem are much less restrictive than the present ones. In particular, the standard Schrödinger operator in which $f_0$ depends only on $x$ is excluded. On the other hand, in the classical case $\hbar = 0$ we obtain an improved version of the theorem: indeed, in our conditions the Birkhoff normal form converges, for $\epsilon$ small enough, in any compact of $\mathbb{R}^2$. To our knowledge this result is new.
2. Taking $\hbar = 0$ in $\mathcal{N}_p(I, \hbar)$, (1.15) becomes
$$E^{BS}_n(\hbar, \epsilon) := \langle \omega, n \rangle \hbar + \frac{1}{2}(\omega_1 + \omega_2)\hbar + \sum_{p=1}^{\infty} Y_p(n\hbar)\, \epsilon^p,$$
namely the Bohr-Sommerfeld quantization of the Birkhoff normal form. Formula (1.15) yields all corrections needed to recover the eigenvalues $E_n(\hbar, \epsilon)$.
3. For any fixed $n$ and $\hbar$ the series (1.15) coincides with the Rayleigh-Schrödinger perturbation expansion near the simple eigenvalue $\langle \omega, n \rangle \hbar + \frac{1}{2}(\omega_1 + \omega_2)\hbar$ of $P_0$ [GP].
4. Again for $l = 2$, under the same conditions on the frequencies, but under much more general conditions on the perturbation, Melin and Sjöstrand [MS] proved that the KAM iteration scheme applied to the full symbol of the Schrödinger operator converges for $I$ belonging to an open set of $\mathbb{R}^2_+$. This yields an exact quantization formula for the spectrum of $H(\epsilon)$. Under the present conditions it yields of course the same spectrum as (1.15), and reproduces it after expansion in powers of $\epsilon$. Under the present very particular conditions on the frequencies, Theorem 1.1 represents the sharpest version of the quantization formula with exponentially small remainder in $\epsilon$ of ([BGP], Prop. 3.1), valid for the same class of perturbations; namely, here the remainder vanishes. This is a consequence of the uniform exponential bound $|\mathcal{N}_p(n\hbar, \hbar)| < C^p$ for some $C > 0$ independent of $(n\hbar, \hbar)$ worked out in Proposition 2.2 below, in the same way as Proposition 3.1 of [BGP] follows from the uniform bound $|\mathcal{N}_p(n\hbar, \hbar)| < C^p p^{(2+\tau)p}$ valid in the general case of real diophantine frequencies with diophantine constant $\tau > l - 1$.

2. Proof of the Results

The proof is to be obtained in four steps.


1. Perturbation theory: the formal construction. Look for a unitary transformation $U(\omega, \epsilon, \hbar) = e^{iW(\epsilon)/\hbar} : L^2 \leftrightarrow L^2$, $W(\epsilon) = W^*(\epsilon)$, $\epsilon \in \mathbb{R}$, such that:
$$S(\epsilon) := U H(\epsilon) U^{-1} = P_0(\hbar, \omega) + \epsilon Z_1 + \epsilon^2 Z_2 + \dots + \epsilon^k R_k(\epsilon), \tag{2.1}$$
where $[Z_p, P_0] = 0$, $p = 1, \dots, k - 1$. Recall the formal commutator expansion:
$$e^{itW(\epsilon)/\hbar} H e^{-itW(\epsilon)/\hbar} = \sum_{l=0}^{\infty} t^l H_l, \qquad H_0 := H, \qquad H_l := \frac{[W, H_{l-1}]}{i\hbar\, l}, \quad l \ge 1. \tag{2.2}$$
Looking for $W(\epsilon)$ under the form of a power series, $W(\epsilon) = \epsilon W_1 + \epsilon^2 W_2 + \dots$, (2.2) becomes:
$$S = \sum_{s=0}^{k} \epsilon^s P_s + \epsilon^{k+1} R^{(k+1)}, \tag{2.3}$$
where
$$\begin{aligned}
P_s &= \frac{[W_s, P_0]}{i\hbar} + F_s, \quad s \ge 1, \qquad F_1 \equiv F_0,\\
F_s &= \sum_{r=2}^{s} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s \\ j_l \ge 1}} \frac{[W_{j_1}, [W_{j_2}, \dots, [W_{j_r}, P_0] \dots ]]}{(i\hbar)^r} + \sum_{r=1}^{s-1} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s-1 \\ j_l \ge 1}} \frac{[W_{j_1}, [W_{j_2}, \dots, [W_{j_r}, F_0] \dots ]]}{(i\hbar)^r}.
\end{aligned} \tag{2.4}$$
Since $F_s$ depends on $W_1, \dots, W_{s-1}$, (2.1) yields the recursive homological equations:
$$\frac{[W_s, P_0]}{i\hbar} + F_s = Z_s, \qquad [P_0, Z_s] = 0. \tag{2.5}$$
To solve for $S$, $W_s$, $Z_s$, we can equivalently look for their symbols; from now on, we denote by the same letter, but in lower case, the symbol $\sigma(A)$ of an operator $A$, except for the symbol of $S$, denoted $\Sigma$. Let us now recall the following relevant results (see e.g. [Fo], Sect. 3.4):
1. $\sigma([A, B]/i\hbar) = \{a, b\}_M$, where $\{a, b\}_M$ is the Moyal bracket of $a$ and $b$.
2. Given $(g, g') \in \mathcal{A}_{\omega,\sigma}$, their Moyal bracket $\{g, g'\}_M$ is defined as $\{g, g'\}_M = g \# g' - g' \# g$, where $\#$ is the composition of $g$, $g'$ considered as Weyl symbols.
3. In the Fourier transform representation, used throughout the paper, the Moyal bracket has the expression
$$(\{g, g'\}_M)^\wedge(s) = \frac{2}{\hbar} \int_{\mathbb{R}^{2n}} \hat g(s^1)\, \hat g'(s - s^1)\, \sin\bigl[\hbar (s - s^1) \wedge s^1 / 2\bigr]\, ds^1, \tag{2.6}$$
where, given two vectors $s = (v, w)$ and $s^1 = (v^1, w^1)$, $s \wedge s^1 := \langle w, v^1 \rangle - \langle v, w^1 \rangle$.
4. $\{g, g'\}_M = \{g, g'\}$ if either $g$ or $g'$ is quadratic in $(x, \xi)$.
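Property 4 can be tested on polynomial symbols in one degree of freedom. The sketch below uses the series $\{a, b\}_M = \sum_{j \ge 0} \frac{(-1)^j (\hbar/2)^{2j}}{(2j+1)!} P^{2j+1}(a, b)$, where $P$ is the Poisson bidifferential operator; the overall sign convention $P(a, b) = \partial_x a\, \partial_\xi b - \partial_\xi a\, \partial_x b$, the dictionary representation `{(i, j): c}` for $\sum c\, x^i \xi^j$, and all function names are assumptions made for this illustration.

```python
from fractions import Fraction as Fr
from math import comb, factorial

def d(poly, nx, nxi):
    """Apply d^nx/dx^nx d^nxi/dxi^nxi to a polynomial {(i, j): c}."""
    out = {}
    for (i, j), c in poly.items():
        if i >= nx and j >= nxi:
            fall = (factorial(i) // factorial(i - nx)) * (factorial(j) // factorial(j - nxi))
            out[(i - nx, j - nxi)] = c * fall
    return out

def mul(p, q):
    out = {}
    for (i, j), a in p.items():
        for (k, l), b in q.items():
            out[(i + k, j + l)] = out.get((i + k, j + l), 0) + a * b
    return out

def add(p, q, scale=1):
    """Return p + scale * q with zero coefficients dropped."""
    out = dict(p)
    for key, b in q.items():
        out[key] = out.get(key, 0) + scale * b
    return {k: c for k, c in out.items() if c != 0}

def bidiff(a, b, n):
    """n-th power of the Poisson bidifferential operator applied to (a, b)."""
    total = {}
    for m in range(n + 1):
        term = mul(d(a, n - m, m), d(b, m, n - m))
        total = add(total, term, (-1) ** m * comb(n, m))
    return total

def poisson(a, b):
    return bidiff(a, b, 1)

def moyal(a, b, hbar):
    """Moyal bracket of two polynomials; the series terminates for polynomials."""
    deg = max(i + j for (i, j) in a) + max(i + j for (i, j) in b)
    total, j = {}, 0
    while 2 * j + 1 <= deg:
        coeff = Fr((-1) ** j, factorial(2 * j + 1)) * (Fr(hbar) / 2) ** (2 * j)
        total = add(total, bidiff(a, b, 2 * j + 1), coeff)
        j += 1
    return total

hbar = Fr(1, 2)
p0 = {(0, 2): Fr(1, 2), (2, 0): Fr(9, 2)}   # (xi^2 + 9 x^2) / 2, i.e. omega = 3
a = {(3, 1): 1, (0, 4): 2}                  # x^3 xi + 2 xi^4
x3, xi3 = {(3, 0): 1}, {(0, 3): 1}
```

With a quadratic second argument all bidifferential terms of order three and higher vanish, so the two brackets coincide; for $x^3$ and $\xi^3$ the Moyal bracket picks up the extra constant $-\tfrac{3}{2}\hbar^2$.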


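Equations (2.2, 2.3, 2.4) then become, once written for the symbols:
$$\sigma\bigl(e^{iW(\epsilon)/\hbar} H e^{-iW(\epsilon)/\hbar}\bigr) = \sum_{l=0}^{\infty} \mathcal{H}_l, \qquad \mathcal{H}_0 := p_0 + \epsilon f_0, \qquad \mathcal{H}_l := \frac{\{w, \mathcal{H}_{l-1}\}_M}{l}, \quad l \ge 1, \tag{2.7}$$
$$\Sigma(\epsilon) = \sum_{s=0}^{k} \epsilon^s p_s + \epsilon^{k+1} r^{(k+1)}, \tag{2.8}$$
where
$$p_s := \{w_s, p_0\}_M + f_s, \quad s \ge 1, \qquad f_1 \equiv f_0, \tag{2.9}$$
$$f_s := \sum_{r=2}^{s} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s \\ j_l \ge 1}} \{w_{j_1}, \{w_{j_2}, \dots, \{w_{j_r}, p_0\}_M \dots \}_M + \sum_{r=1}^{s-1} \frac{1}{r!} \sum_{\substack{j_1 + \dots + j_r = s-1 \\ j_l \ge 1}} \{w_{j_1}, \{w_{j_2}, \dots, \{w_{j_r}, f_0\}_M \dots \}_M, \quad s > 1. \tag{2.10}$$
In turn, the recursive homological equations become:
$$\{w_s, p_0\}_M + f_s = \zeta_s, \qquad \{p_0, \zeta_s\}_M = 0. \tag{2.11}$$

2. Solution of the homological equation and estimates of the solution. $f \in \mathcal{A}_{\omega,\rho,\sigma}$ clearly entails the existence of the Fourier expansion of $f_{\phi,\omega}(u)$, and its uniform convergence with respect to $\phi \in \mathbb{T}^2$, $u$ on compacts of $\mathbb{R}^2 \times \mathbb{R}^2$, and $\omega \in \Omega$, namely:
$$f_{\phi,\omega}(u) = \sum_{\nu \in \mathbb{Z}^l} f_{\nu,\omega}(u)\, e^{i\langle \nu, \phi \rangle} \implies f(u) = \sum_{\nu \in \mathbb{Z}^l} f_{\nu,\omega}(u). \tag{2.12}$$
We further denote, for $\omega \in \Omega$ and $\rho > 0$:
$$\|f\|_{\omega,\sigma} := \sum_{\nu \in \mathbb{Z}^2} \|f_{\nu,\omega}\|_\sigma; \qquad \mathcal{A}_{\omega,\sigma} := \{ f(u) \in \mathcal{F}_\sigma \mid \|f(u)\|_{\omega,\sigma} < +\infty \}, \tag{2.13}$$
$$\|f\|_{\omega,\rho,\sigma} := \sum_{\nu \in \mathbb{Z}^2} e^{\rho|\nu|} \|f_{\nu,\omega}\|_\sigma; \qquad \mathcal{A}_{\omega,\rho,\sigma} := \{ f(u) \in \mathcal{A}_{\omega,\sigma} \mid \|f(u)\|_{\omega,\rho,\sigma} < +\infty \}, \tag{2.14}$$
$$\|f\|_{\Omega,\sigma} := \sup_{\omega \in \Omega} \|f\|_{\omega,\sigma}; \qquad \mathcal{A}_{\Omega,\sigma} := \{ f(u) \in \mathcal{F}_\sigma \mid \|f(u)\|_{\Omega,\sigma} < +\infty \}, \tag{2.15}$$
$$\|f\|_{\Omega,\rho,\sigma} := \sup_{\omega \in \Omega} \|f\|_{\omega,\rho,\sigma}. \tag{2.16}$$
Hence $\mathcal{A}_{\Omega,\rho,\sigma} = \{ f(u) \in \mathcal{F}_\sigma \mid \|f(u)\|_{\Omega,\rho,\sigma} < +\infty \}$ and clearly $\mathcal{A}_{\Omega,\rho,\sigma} \subset \mathcal{A}_{\Omega,\sigma} \subset \mathcal{F}_\sigma$. Moreover the following inequalities obviously hold:
$$\sup_{u \in \mathbb{R}^2 \times \mathbb{R}^2} |f_{\nu,\omega}(u)| \le \|\hat f_{\nu,\omega}(s)\|_{L^1} \le \|f_{\nu,\omega}\|_\sigma \le \|f\|_{\Omega,\sigma} \le \|f\|_{\Omega,\rho,\sigma}, \tag{2.17}$$
$$\|\hat f\|_{L^1} \le \|f\|_\sigma \le \|f\|_{\omega,\sigma} \le \|f\|_{\Omega,\sigma} \le \|f\|_{\Omega,\rho,\sigma}. \tag{2.18}$$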


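Now the key remark is that $\{a, p_0\}_M = \{a, p_0\}$ for any symbol $a$ because $p_0$ is quadratic in $(x, \xi)$. The homological equation (2.11) becomes therefore
$$\{w_s, p_0\} + f_s = \zeta_s, \qquad \{p_0, \zeta_s\} = 0. \tag{2.19}$$
We then have:

Proposition 2.1. Let $f \in \mathcal{A}_{\Omega,\rho,\sigma}$. Then the equation
$$\{w, p_0\} + f = \zeta, \qquad \{p_0, \zeta\} = 0 \tag{2.20}$$
admits the solutions $\zeta \in \mathcal{A}_{\Omega,\sigma}$, $w \in \mathcal{A}_{\Omega,\rho,\sigma}$,
$$\zeta := f_{0,\omega}; \qquad w := \sum_{\nu \ne 0} \frac{f_{\nu,\omega}}{i\langle \omega, \nu \rangle}, \tag{2.21}$$
with the property $\zeta \circ \Psi_\phi = \zeta$; i.e., $\zeta$ depends only on $I_1$, $I_2$. Moreover:
$$\|\zeta\|_{\Omega,\sigma} \le \|f\|_{\Omega,\sigma}; \qquad \|w\|_{\Omega,\rho,\sigma} \le \|f\|_{\Omega,\rho,\sigma}, \qquad \|\nabla w\|_{\Omega,\rho,\sigma} \le \frac{4C}{\sigma} \|f\|_{\Omega,\rho,\sigma} \tag{2.22}$$
for some $C(\Omega, \delta) > 0$.

To prove the proposition we need a preliminary result.

Lemma 2.1. Let $w$ be defined by (2.21), and $\Psi_{\phi,\omega}(x, \xi)$ by (1.9). Set:
$$\Phi_{\phi,\omega}(x, \xi) := \Psi_{i\phi,i\omega}(x, \xi), \tag{2.23}$$
that is: $\Phi_{\phi,\omega}(x, \xi) := (x'_k, \xi'_k)$, where:
$$\begin{cases} x'_k = x_k \cosh\phi_k + \dfrac{\xi_k}{\omega_k} \sinh\phi_k, \\[4pt] \xi'_k = \xi_k \cosh\phi_k + \omega_k x_k \sinh\phi_k, \end{cases} \qquad k = 1, 2. \tag{2.24}$$
Then one has, uniformly with respect to $(x, \xi)$ on compacts of $\mathbb{R}^4$:
$$w \circ \Psi_{\phi,\omega}(x, \xi) = \sum_{\nu \ne 0} \frac{f_{\nu,\omega}(x, \xi)}{i\langle \omega, \nu \rangle}\, e^{i\langle \nu, \phi \rangle}, \qquad \phi \in \mathbb{T}^2, \tag{2.25}$$
$$w \circ \Phi_{\phi,\omega}(x, \xi) = \sum_{\nu \ne 0} \frac{f_{\nu,i\omega}(x, \xi)}{\langle \omega, \nu \rangle}\, e^{-\langle \nu, \phi \rangle}, \qquad |\phi| \le \rho - \eta, \quad \forall\, 0 < \eta < \rho. \tag{2.26}$$
Moreover there is $C(\delta) > 0$ such that:
$$\|w\|_{\omega,\rho,\sigma} \le C \|f\|_{\omega,\rho,\sigma}; \qquad \|w\|_{i\omega,\rho,\sigma} \le C \|f\|_{i\omega,\rho,\sigma}. \tag{2.27}$$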
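In Fourier space the solution (2.21) is a mode-by-mode division. The sketch below treats the Fourier coefficients at a fixed phase-space point as plain complex numbers; the function name, the dictionary layout `{nu: f_nu}`, and the sample data are assumptions for illustration only.

```python
def solve_homological(f_coeffs, omega):
    """Split {nu: f_nu} into zeta = f_0 and w_nu = f_nu / (i <omega, nu>) for nu != 0."""
    zeta = f_coeffs.get((0, 0), 0)
    w = {}
    for nu, f_nu in f_coeffs.items():
        if nu == (0, 0):
            continue
        denom = 1j * (omega[0] * nu[0] + omega[1] * nu[1])
        w[nu] = f_nu / denom
    return zeta, w

# Sample complex frequencies with ac + bd = 0, and arbitrary sample coefficients.
omega = (1 + 2j, 2 - 1j)
f_coeffs = {(0, 0): 0.5, (1, 0): 0.2 - 0.1j, (1, -1): 0.05j, (-2, 3): 0.01}
zeta, w = solve_homological(f_coeffs, omega)

# Along the flow each mode evolves as exp(i <nu, omega> t), so the t-derivative
# at t = 0 of sum_nu w_nu exp(i <nu, omega> t) must return sum_{nu != 0} f_nu.
derivative = sum(1j * (omega[0] * nu[0] + omega[1] * nu[1]) * w_nu
                 for nu, w_nu in w.items())
residual = abs(derivative - (sum(f_coeffs.values()) - zeta))
```

For frequencies satisfying (A2) the divisor $\langle \omega, \nu \rangle$ cannot vanish for $\nu \ne 0$, which is what keeps this division harmless.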


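Proof. Let us first prove that (2.21), whose convergence is proved below, solves (2.20), and that $w \circ \Psi_{\phi,\omega}(x, \xi)$ admits the representation (2.25). Following the argument of [BGP], Lemma 3.6, let us write:
$$\begin{aligned}
\frac{d}{dt}\Big|_{t=0} w \circ \Psi_{\omega t,\omega}(x, \xi) = \{p_0, w\}(x, \xi) &= \frac{d}{dt}\Big|_{t=0} \sum_{0 \ne \nu \in \mathbb{Z}^2} \frac{f_{\nu,\omega} \circ \Psi_{\omega t,\omega}(u)}{i\langle \omega, \nu \rangle} \\
&= \frac{d}{dt}\Big|_{t=0} \sum_{0 \ne \nu \in \mathbb{Z}^2} \frac{f_{\nu,\omega}(u)\, e^{i\langle \nu, \omega \rangle t}}{i\langle \omega, \nu \rangle} = \sum_{0 \ne \nu \in \mathbb{Z}^2} f_{\nu,\omega}(u).
\end{aligned}$$
Clearly, this equality also entails $\zeta = f_{0,\omega}$. Consider now the expansions (2.25, 2.26). First, it is easy to check that $\omega \in \Omega$ if and only if $i\omega \in \Omega$. Now we have:
$$w_{\nu,\omega} = \frac{f_{\nu,\omega}(x, \xi)}{i\langle \omega, \nu \rangle},$$
and therefore, by a straightforward application of Lemma 2.5: $\|w_{\nu,\omega}\|_\sigma \le C \|f_{\nu,\omega}\|_\sigma$. Hence:
$$\|w\|_{\omega,\rho,\sigma} = \sum_{\nu \in \mathbb{Z}^2} e^{\rho|\nu|} \|w_{\nu,\omega}\|_\sigma \le C \sum_{\nu \in \mathbb{Z}^2} e^{\rho|\nu|} \|f_{\nu,\omega}\|_\sigma = C \|f\|_{\omega,\rho,\sigma} \qquad \forall\, \omega \in \Omega.$$
Therefore $f \in \mathcal{A}_{\Omega,\rho,\sigma}$ entails $w \circ \Psi_{\omega,\phi} \in \mathcal{A}_{\Omega,\rho,\sigma}$, whence the uniform convergence of the series (2.25). Now $i\omega \in \Omega$ if $\omega \in \Omega$; hence $w \circ \Psi_{i\omega,\phi} \in \mathcal{A}_{\Omega,\rho,\sigma}$. On the other hand, the replacement $\phi \to i\phi$ maps $\Psi_{\phi,i\omega}(x, \xi)$ into $\Phi_{\phi,\omega}(x, \xi)$, and the series (2.26) is uniformly convergent if $|\mathrm{Im}\,\phi| < \rho - \eta$, $0 < \eta < \rho$. Formula (2.26) is therefore proved. This concludes the proof of the lemma.

Proof of Proposition 2.1. Let us first prove that $\zeta$ depends only on $I_1$, $I_2$. Consider for the sake of simplicity $u = (x, \xi) \in \mathbb{R}^2$. Since $f \in \mathcal{A}_{\Omega,\rho,\sigma}$, we can write:
$$f_{\phi,\omega}(x, \xi) = \sum_{m,n=0}^{\infty} \frac{a_{mn}}{2^{m+n}} \Bigl[ \bigl(x + \tfrac{\xi}{i\omega}\bigr) e^{i\phi} + \bigl(x - \tfrac{\xi}{i\omega}\bigr) e^{-i\phi} \Bigr]^m \times \Bigl[ (-i\omega x + \xi) e^{i\phi} + (i\omega x + \xi) e^{-i\phi} \Bigr]^n.$$
The average over $\phi$ eliminates all terms but those proportional to
$$\Bigl[ \bigl(x + \tfrac{\xi}{i\omega}\bigr)\bigl(x - \tfrac{\xi}{i\omega}\bigr) \Bigr]^k \bigl[ (-i\omega x + \xi)(i\omega x + \xi) \bigr]^l,$$
i.e. to $I^k I^l$. The estimate $\|\zeta\|_{\omega,\sigma} \le \|f\|_{\omega,\sigma}$ is obvious, and entails $\|\zeta\|_{\Omega,\sigma} \le \|f\|_{\Omega,\sigma}$. The second estimate in (2.22) has been proved in Lemma 2.1 above. To prove the third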


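one, consider the function $w \circ \Psi_{\phi,\omega}(z)$ and compute, for $j = 1, 2$:
$$\frac{d}{d\phi_j} w \circ \Psi_{\phi,\omega}(z)\Big|_{\phi=0} = \Bigl[ \frac{\partial w}{\partial x_j} \frac{\partial x'_j}{\partial \phi_j} + \frac{\partial w}{\partial \xi_j} \frac{\partial \xi'_j}{\partial \phi_j} \Bigr]_{\phi=0} = \frac{\partial w}{\partial x_j} \frac{\xi_j}{\omega_j} - \frac{\partial w}{\partial \xi_j} \omega_j x_j = \sum_{0 \ne \nu \in \mathbb{Z}^2} \frac{\nu_j f_{\nu,\omega}}{i\langle \omega, \nu \rangle}.$$
Therefore, once more by Lemma 2.5,
$$\Bigl\| \frac{\partial w}{\partial x_j} \frac{\xi_j}{\omega_j} - \frac{\partial w}{\partial \xi_j} \omega_j x_j \Bigr\|_{\omega,\rho,\sigma} \le \sum_{0 \ne \nu \in \mathbb{Z}^2} e^{\rho|\nu|} \frac{|\nu_j|}{|\langle \omega, \nu \rangle|} \|f_{\nu,\omega}\|_\sigma \le C \sum_{0 \ne \nu \in \mathbb{Z}^2} e^{\rho|\nu|} \|f_{\nu,\omega}\|_\sigma = C \|f\|_{\omega,\rho,\sigma}.$$
This yields:
$$\Bigl\| \frac{\partial w}{\partial x_j} \frac{\xi_j}{\omega_j} - \frac{\partial w}{\partial \xi_j} \omega_j x_j \Bigr\|_{\Omega,\rho,\sigma} \le C \|f\|_{\Omega,\rho,\sigma}. \tag{2.28}$$
In the same way:
$$\frac{d}{d\phi_j} w \circ \Phi_{\phi,\omega}(z)\Big|_{\phi=0} = \Bigl[ \frac{\partial w}{\partial x_j} \frac{\partial x'_j}{\partial \phi_j} + \frac{\partial w}{\partial \xi_j} \frac{\partial \xi'_j}{\partial \phi_j} \Bigr]_{\phi=0} = \frac{\partial w}{\partial x_j} \frac{\xi_j}{\omega_j} + \frac{\partial w}{\partial \xi_j} \omega_j x_j = \sum_{0 \ne \nu \in \mathbb{Z}^2} \frac{\nu_j f_{\nu,i\omega}}{\langle \omega, \nu \rangle},$$
whence, by Lemma 2.5,
$$\Bigl\| \frac{\partial w}{\partial x_j} \frac{\xi_j}{\omega_j} + \frac{\partial w}{\partial \xi_j} \omega_j x_j \Bigr\|_{i\omega,\rho,\sigma} \le \sum_{0 \ne \nu \in \mathbb{Z}^2} e^{\rho|\nu|} \frac{|\nu_j|}{|\langle \omega, \nu \rangle|} \|f_{\nu,i\omega}\|_\sigma \le C \sum_{0 \ne \nu \in \mathbb{Z}^2} e^{\rho|\nu|} \|f_{\nu,i\omega}\|_\sigma = C \|f\|_{i\omega,\rho,\sigma}.$$
Recalling that $\omega \in \Omega$ if and only if $i\omega \in \Omega$ we get:
$$\Bigl\| \frac{\partial w}{\partial x_j} \frac{\xi_j}{\omega_j} + \frac{\partial w}{\partial \xi_j} \omega_j x_j \Bigr\|_{\Omega,\rho,\sigma} \le C \|f\|_{\Omega,\rho,\sigma}. \tag{2.29}$$
Denote now $s_j$, $t_j$ the Fourier dual variables of $(x_j, \xi_j)$, $j = 1, 2$. Then, by definition (we drop for the sake of simplicity the dependence on $\omega$):
$$\Bigl\| \frac{\partial w}{\partial x_j} \xi_j \Bigr\|_\sigma = \int_{\mathbb{R}^4} \Bigl| s_j \frac{\partial \hat w}{\partial t_j}(s_j, t_j) \Bigr|\, e^{\sigma(|s| + |t|)}\, ds\, dt.$$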


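Applying Lemma 2.3 to the integration over $t_j$ we get:
$$\begin{aligned}
\Bigl\| \frac{\partial w}{\partial x_j} \Bigr\|_{\omega,\sigma} &= \sum_{\nu \in \mathbb{Z}^2} \int_{\mathbb{R}^4} \bigl| s_j\, \hat w_{\nu,\omega}(s_j, t_j) \bigr|\, e^{\sigma(|s| + |t|)}\, ds\, dt \le \frac{2}{\sigma} \sum_{\nu \in \mathbb{Z}^2} \int_{\mathbb{R}^4} \Bigl| s_j \frac{\partial \hat w_{\nu,\omega}}{\partial t_j}(s_j, t_j) \Bigr|\, e^{\sigma(|s| + |t|)}\, ds\, dt \\
&= \frac{2}{\sigma} \sum_{\nu \in \mathbb{Z}^2} \Bigl\| \frac{\partial w_{\nu,\omega}}{\partial x_j} \xi_j \Bigr\|_\sigma = \frac{2}{\sigma} \Bigl\| \frac{\partial w}{\partial x_j} \xi_j \Bigr\|_{\omega,\sigma}.
\end{aligned}$$
Therefore, by (2.28, 2.29),
$$\Bigl\| \frac{\partial w}{\partial x_j} \Bigr\|_{\Omega,\rho,\sigma} \le \frac{2C|\omega_j|}{\sigma} \|f\|_{\Omega,\rho,\sigma}.$$
Analogously, applying this time Lemma 2.3 to the integration over $s_j$:
$$\Bigl\| \frac{\partial w}{\partial \xi_j} \Bigr\|_{\Omega,\rho,\sigma} \le \frac{2C}{\sigma|\omega_j|} \|f\|_{\Omega,\rho,\sigma}.$$
This is enough to prove the proposition.

3. Iterative Lemma.

Proposition 2.2. Set:
$$\mu := \frac{4 \|f_0\|_{\Omega,\rho,\sigma}}{\sigma}.$$
Let $\mu < 1/4$ and consider for $k = 1, 2, \dots$ the function
$$\Sigma_k := p_0 + \mathcal{Z}_k + v_k \tag{2.30}$$
with $\mathcal{Z}_k, v_k \in \mathcal{A}_{\Omega,\rho,\sigma}$, and let $\mathcal{Z}_k$ depend on $(I_1, I_2)$ only. Assume moreover:
$$\|\mathcal{Z}_k\|_{\Omega,\sigma} \le \begin{cases} 0, & \text{if } k = 0, \\ \sum_{s=0}^{k-1} (2\mu)^s, & \text{if } k \ge 1, \end{cases} \tag{2.31}$$
$$\|v_k\|_{\Omega,\rho,\sigma} \le (2\mu)^k \|f_0\|_{\Omega,\rho,\sigma}. \tag{2.32}$$
Let $S_k$ be the Weyl quantization of $\Sigma_k$. Then there exists a unitary map $T_k : L^2 \to L^2$, $T_k := e^{iW/\hbar}$, such that the Weyl symbol of the transformed operator $T_k S_k T_k^* =: S_{k+1}$ is given by (2.30) with $k + 1$ in place of $k$ and satisfies (2.31, 2.32) with $k + 1$ in place of $k$.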
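The role of the condition $\mu < 1/4$ is that the ratio $2\mu$ is below $1/2$, so the right-hand side of (2.31) stays below $1/(1 - 2\mu) < 2$ for every $k$ while the right-hand side of (2.32) decays geometrically. A small sketch in exact arithmetic follows; the function names and the sample value of $\mu$ are assumptions for illustration.

```python
from fractions import Fraction as Fr

def z_bound(mu, k):
    """Right-hand side of (2.31): 0 for k = 0, sum_{s=0}^{k-1} (2 mu)^s for k >= 1."""
    return sum((2 * mu) ** s for s in range(k))

def v_bound(mu, k, f0_norm):
    """Right-hand side of (2.32)."""
    return (2 * mu) ** k * f0_norm

mu = Fr(1, 5)                      # any value below 1/4 (assumed sample)
limit = 1 / (1 - 2 * mu)           # geometric-series limit of the Z_k bounds
z_bounds = [z_bound(mu, k) for k in range(0, 40)]
v_bounds = [v_bound(mu, k, Fr(3)) for k in range(0, 40)]
```

The partial sums increase towards `limit` without reaching it, and each remainder bound is exactly $2\mu$ times the previous one.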

Proof. As in [BGP], Proposition 3.2, the homological equation: { p0 , w} + vk = Vk

(2.33)

determines the symbol w of W . Here the second unknown Vk has to depend on (x, ξ ) only through I1 , I2 . Applying Proposition 1 we find that w and Vk exist and fulfill the estimates w,ρ,σ ≤ f 0 ,ρ,σ (2µ)k ; ∇w,ρ,σ ≤ (2µ)k+1 ; Vk ,ρ,σ ≤ f 0 ,ρ,σ (2µ)k . Define now: Zk+1 := Zk + Vk ; vk+1 :=

Zkl +

l≥1

Zk0 := Zk ; Zkl :=

vkl +

l≥1

pl0 ,

l≥1

1 {w, Zkl−1 } M , l

and analogous definitions for vkl and pl0 . Clearly vk+1 ∈ A,ρ,σ by Lemma 2.4 below. Then the symbol of the transformed operator has the form (2.30) with k + 1 in place of k. To get the estimates, for k ≥ 1 we can write, by Proposition 1 and Lemmas 2.2, 2.3, and 2.4:

(2µ)k+1 ≤ (2µ)k+1 , (2µ)l = 1 − 2µ l≥1 µ l ≤ 2µ, ≤ Zk ,σ · pl0 ,ρ,σ ≤ (2µ)k+1 , 1−µ

(vkl ),ρ,σ ≤ (2µ)k

l≥1

l≥1

Zkl ,σ

l≥2

whence the assertion in a straightforward way. Proof of Theorem 1. By Proposition 2 there is ∗ > 0 such that lim p0 + Zk := ()

k→∞

exists in the | · ,ρ,σ norm if || < ∗ . Then S() := O phW (()) is unitarily equivalent k ζ (l) l + to H (). Since Zk is a polynomial of order k −1 in , we can write k = p0 + l=1

vk , where ζ (l) (I1 , I2 ) are solutions of the homological equations (2.11); therefore S() has the form (2.1). Note that lim vk ,ρ,σ = 0 entails lim Rk L 2 →L 2 = 0. To sum k→∞

k→∞

up, the Weyl symbol (, h¯ ) has the convergent (uniform with respect to h¯ ) normal form (, h¯ ) = p0 (I ) +

∞

Zn (I, h¯ ) n .

n=1

Then the assertions of Theorem 1 follow exactly as in [Sj] (see also [BGP]). This concludes the proof.

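4. Auxiliary results.

Lemma 2.2. Let $(g, g', \nabla g, \nabla g') \in \mathcal{F}_\sigma$. Then:

$$\|\{g, g'\}_M\|_\sigma \le \|\nabla g\|_\sigma \, \|\nabla g'\|_\sigma. \tag{2.34}$$

If $(g, g', \nabla g, \nabla g') \in \mathcal{A}_{\omega,\rho,\sigma}$ then

$$\|\{g, g'\}_M\|_{\omega,\rho,\sigma} \le \|\nabla g\|_{\omega,\rho,\sigma} \, \|\nabla g'\|_{\omega,\rho,\sigma}, \tag{2.35}$$

and if $(g, g', \nabla g, \nabla g') \in \mathcal{A}_{\Omega,\rho,\sigma}$:

$$\|\{g, g'\}_M\|_{\Omega,\rho,\sigma} \le \|\nabla g\|_{\Omega,\rho,\sigma} \, \|\nabla g'\|_{\Omega,\rho,\sigma}. \tag{2.36}$$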

Proof. We repeat the argument of [BGP], Lemma 3.1. We have |s ∧ s 1 | ≤ |s| · |s 1 |. Hence by (2.6) and the definition of the σ − norm we get: 2 σ |s| {g, g } M σ = e ds |g(s) ˆ gˆ (s − s 1 )| · |sinh(h¯ (s − s 1 ) ∧ s 1 )/2| ds 1 h¯ R2l R2l 2 1 ≤ ds eσ (|s|+|s |) |g(s) ˆ gˆ (s 1 )| · |sinh(h¯ s ∧ s 1 )/2| ds 1 2l h¯ R2l R 1 σ |s| ≤ e |g(s)| ˆ ds eσ |s | |gˆ (s 1 )| · |s ∧ s 1 | ds 1 = 2l R2l R 1 σ |s| ≤ e |g(s)||s| ˆ ds eσ |s | |gˆ (s 1 )| · |s 1 | ds 1 = ∇gσ ∇g σ . R2l

R2l

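The remaining two inequalities follow from the first one by exactly the same argument of [BGP], Lemma 3.4. This concludes the proof of the lemma.

Lemma 2.3. Let $g \in \mathcal{F}_\sigma$, $u = (x, \xi) \in \mathbb{R}^{2l}$. Then:

$$\|g\|_\sigma \le \frac{1}{\sigma}\,\|u g\|_\sigma. \tag{2.37}$$

Proof. Setting $f(s) := \hat g(s)$, (2.37) is clearly equivalent to

$$\int_{\mathbb{R}^{2l}} e^{\sigma|s|}\,|f(s)|\,ds \le \frac{1}{\sigma}\int_{\mathbb{R}^{2l}} e^{\sigma|s|}\,|\nabla f(s)|\,ds. \tag{2.38}$$

We may limit ourselves to prove this inequality in the one-dimensional case, namely to show that:

$$\int_{\mathbb{R}} e^{\sigma|s|}\,|f(s)|\,ds \le \frac{1}{\sigma}\int_{\mathbb{R}} e^{\sigma|s|}\,|f'(s)|\,ds. \tag{2.39}$$

To see this, first write, for $s > 0$:

$$e^{\sigma s} f(s) = -\int_s^{\infty} e^{\sigma t} f'(t)\, e^{\sigma(s-t)}\,dt,$$

whence, for $A > 0$:

$$\int_A^{\infty} |e^{\sigma s} f(s)|\,ds \le \iint_{A \le s \le t \le \infty} |f'(t)|\, e^{\sigma s}\,ds\,dt = \int_A^{\infty} |f'(t)| \int_A^{t} e^{\sigma s}\,ds\,dt = \sigma^{-1}\int_A^{\infty} |f'(t)|\,(e^{\sigma t} - e^{\sigma A})\,dt \le \sigma^{-1}\int_A^{\infty} |f'(t)|\, e^{\sigma t}\,dt.$$

Likewise, for $s < 0$, $A < 0$:

$$e^{-\sigma s} f(s) = \int_{-\infty}^{s} e^{-\sigma t} f'(t)\, e^{-\sigma(s-t)}\,dt,$$

$$\int_{-\infty}^{A} |e^{-\sigma s} f(s)|\,ds \le \iint_{-\infty \le t \le s \le A} |f'(t)|\, e^{-\sigma s}\,ds\,dt = \int_{-\infty}^{A} |f'(t)| \int_t^{A} e^{-\sigma s}\,ds\,dt = \sigma^{-1}\int_{-\infty}^{A} |f'(t)|\,(e^{-\sigma t} - e^{-\sigma A})\,dt \le \sigma^{-1}\int_{-\infty}^{A} |f'(t)|\, e^{-\sigma t}\,dt.$$

Performing the limit $A \to 0$ in both inequalities we get (2.39). This concludes the proof of the lemma.

Lemma 2.4. Let $g \in \mathcal{A}_{\Omega,\rho,\sigma}$, $w \in \mathcal{A}_{\Omega,\rho,\sigma}$.

1. Define

$$g_r := \frac{1}{r}\{w, g_{r-1}\}_M, \quad r \ge 1; \qquad g_0 := g.$$

Then $g_r \in \mathcal{A}_{\Omega,\rho,\sigma}$ and the following estimate holds:

$$\|g_r\|_{\Omega,\rho,\sigma} \le \left(\frac{4\,\|\nabla w\|_{\Omega,\rho,\sigma}}{\sigma}\right)^{r} \|g\|_{\Omega,\rho,\sigma}. \tag{2.40}$$

2. Let $w$ solve the homological equation (2.11). Define the sequence $p_{r0}$, $r = 0, 1, \ldots$:

$$p_{00} := p_0; \qquad p_{r0} := \frac{1}{r}\{w, p_{r-1\,0}\}_M, \quad r \ge 1.$$

Then $p_{r0} \in \mathcal{A}_{\omega,\sigma}$ and fulfills the following estimate:

$$\|p_{r0}\|_{\Omega,\rho,\sigma} \le \left(4\sigma^{-1}\|\nabla w\|_{\Omega,\rho,\sigma}\right)^{r-1} \|f_0\|_{\Omega,\rho,\sigma}, \quad r \ge 1. \tag{2.41}$$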

Proof. Both estimates (2.40, 2.41) are straightforward consequences of Lemmas 2.2 and 2.3: as far as (2.41) is concerned, it is indeed enough to note that {w, p0 } = ζ − q, whence p10 ,ρ,σ + ∇ p10 ,ρ,σ ≤

4 f 0 ,ρ,σ . σ

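Lemma 2.5. If (A3) holds there is $C_\delta > 0$ independent of $\omega \in \Omega$ such that

$$|\omega_1\nu_1 + \omega_2\nu_2| \ge C_\delta \sqrt{\nu_1^2 + \nu_2^2}. \tag{2.42}$$

Proof. We have to show the existence of $C_\delta > 0$ such that

$$f(\nu_1, \nu_2) := \frac{|\omega_1\nu_1 + \omega_2\nu_2|^2}{\nu_1^2 + \nu_2^2} \ge C_\delta, \quad \forall\, (\nu_1, \nu_2) \in \mathbb{Z}^2,\ (\nu_1, \nu_2) \ne (0, 0). \tag{2.43}$$

Notice that $f$ is homogeneous of degree 0, namely $f(\mu\nu_1, \mu\nu_2) = f(\nu_1, \nu_2)$ $\forall\, (\nu_1, \nu_2) \in \mathbb{Z}^2$, $(\nu_1, \nu_2) \ne (0, 0)$, $\forall\, \mu \in \mathbb{R}$, $\mu \ne 0$. Hence it is enough to show that

$$F(x, y) := |\omega_1 x + \omega_2 y|^2 \ge C_\delta, \quad \forall\, (x, y) \in S^1 \tag{2.44}$$

or, writing $x = \cos\theta$, $y = \sin\theta$:

$$F(\theta) := \frac{1}{2}\left[|\omega_1|^2 + |\omega_2|^2\right] + \frac{1}{2}\left[|\omega_1|^2 - |\omega_2|^2\right]\cos 2\theta + \langle\omega_1, \omega_2\rangle \sin 2\theta \ge C.$$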

Note that F(0) = F(2π ) = |ω1 |2 . A simple study of the function F(θ ) : S 1 → R under the assumption (A2) shows the existence of Cδ ↓ 0 as δ ↑ 1 such that |F(θ )| ≥ Cδ ∀ θ ∈ S 1 . We omit the elementary details. Appendix Consider the function f : C4 → R, f (z) := e−|z| Pn (z), z ∈ C4 , |z|2 = 2

|z k |2 .

Here Pn (z) is a polynomial of degree n. Let us verify that f belongs to A,ρ,σ ; namely, there are ρ > 0, σ > 0 such that: eρ|ν| f ν,ω (u)σ < +∞. sup ω∈

ν∈Z2

It is clearly enough to consider the case u = (x, ξ ) ∈ R2 , n = 0. Set: ω := γ eiθ , 0 ≤ θ ≤ 2π , δ1 ≤ γ ≤ δ2 . Then: 2 | φ,ω (u)|2 = xcosφ + ωξ sinφ + |ξ cosφ − ωxsinφ|2 = Ax 2 + Bxξ + Cξ 2 A := cos2 φ + γ 2 sin2 φ;

B := cos θ (γ −1 − γ ) sin 2φ, C := cos2 φ + γ −2 sin2 φ.

Therefore we can write: f φ,ω (u) := f ◦ ω,φ (u) = e−Q(γ ,θ,φ)u,u , det Q = = Tr Q = κ :=

Q(γ , θ, φ) :=

A 21 B 1 2B C

,

cos4 φ + sin4 φ + [(γ −2 + γ 2 ) − cos2 θ (γ −1 − γ )2 ] sin2 φ cos2 φ 1 + κ(1 − cos2 θ )sin2 φcos2 φ, 2 + κsin2 φ γ −2 + γ 2 − 2 ≥ 0,

whence, ∀ (θ, φ) ∈ [0, 2π ] × [0, 2π ], 1 ≤ λ1 λ2 ≤ 1 + κ, 2 ≤ λ1 + λ2 ≤ 2 + κ, where 0 < λ1 (γ , θ, φ) ≤ λ2 (γ , θ, φ) denote the eigenvalues of Q(γ , θ, φ) > 0. This easily yields the uniform estimate: 1

1 ≤ λ1 (γ , θ, φ) ≤ λ2 (γ , θ, φ) ≤ D, D := 2 + κ + (2 + κ)2 − 4 . D 2 Consider now the Fourier coefficients f ν,ω (u) = f ν,γ ,θ (u): 2π 2π 1 1 f ν,γ ,θ (u) := f ◦ ω,φ (u)e−iνφ dφ = e−Q(γ ,θ,φ)u,u e−iνφ dφ, 2π 0 2π 0

and compute their Fourier transform: 2π 1 ˆ e−Q(γ ,θ,φ)u,u e−iνφ e−iu,s dφ du f ν,γ ,θ (s) = 2(π )2 R2 0 2π 2 −1 = e−Q (γ ,θ,φ)s,s/2 e−iνφ dφ, s ∈ R2 , √ 2 (2π ) det Q 0 1 C − 21 B . Q −1 (γ , θ, φ) = det Q − 21 B A Since 2 s, Q −1 (γ , θ, φ)s ≥ λ−1 2 s ≥

s2 D

∀ (θ, φ) ∈ [0, 2π ] × [0, 2π ] we get the (ν, θ, φ)-independent estimate 2π 2 1 2 −|s|2 /D | fˆν,γ ,θ (s)| ≤ e dφ = e−|s| /D . 2 (2π ) π 0 Therefore f ν,ω σ < +∞ ∀ σ > 0, ∀ ν ∈ Z2 . Let now φ ∈ C. Writing: det Q(γ , θ, φ) = 1 +

A(γ , θ ) 2 sin (2φ), 4

A(γ , θ ) := κ(1 − cos2 θ ) ≥ 0

we get (omitting the elementary details): det Q(γ , θ, φ) = 0,

|Im φ| <

1 arccosh(1 + 8/κ) . 4

Therefore the function −1

e−Q (γ ,θ,φ)s,s φ → √ := G γ ,θ,s (φ) det Q(γ , θ, φ) 1 arccosh(1 + 8/κ) := m(κ) uniformly 4 2 with respect to (γ , θ, s) ∈ [δ1 , δ2 ] × [0, 2π ] × R . In turn the analyticity entails, as is well known, that for any 0 < η < m(κ) there exists ρ1 > m(κ) − η independent of (γ , θ, s) ∈ [δ1 , δ2 ] × [0, 2π ] × R2 such that is analytic with respect to φ in the strip |Im φ| <

| fˆν,γ ,θ (s)| ≤

sup |G γ ,θ,s (φ)|e−ρ1 |ν| .

|Im φ|≤η

Since det Q(γ , θ, φ) = 0 for |Im φ| ≤ η, there exist K 1 (η) >, K 2 (η) > 0 independent of (γ , θ ) such that: |Q −1 (γ , θ, φ)s, s| ≥ K 1 |s|2 ,

1 < K 2 (η), √ | det Q(γ , θ, φ)|

and therefore K 2 (η) −K 1 |s|2 −ρ1 |ν| e | fˆν,γ ,θ (s)| ≤ e . 2π

This in turn entails the existence of K 3 (η) > 0 independent of ν such that, ∀ σ > 0: f ν,ω σ = eσ |s| | fˆν,γ ,θ (s)| ds ≤ K 3 e−ρ1 |ν| . R2

Hence, ∀ 0 < ρ < ρ1 : f ω,ρ,σ =

eρ|ν| f ν,ω σ < K (η)

ν∈Z2

for some K (η) > 0 independent of ω ∈ . We can thus conclude that f ,ρ,σ = sup eρ|ν| f ν,ω σ < K , ω∈

ν∈Z2

i.e., f ∈ A,ρ,σ . Remarks. We have checked that f ∈ A,ρ,σ . This entails f ∈ Fσ . By the Paley-Wiener 2 2 theorem, f φ,ω (u) = e−(Ax +Bxξ +Cξ ) must have, ∀ (φ, ω), a holomorphic continuation gφ,ω (z 1 , z 2 ) from u = (x, ξ ) ∈ R × R to z = (z 1 , z 2 ) = (x + i y, ξ + iη) ∈ C × C. This holomorphic continuation is clearly gφ,ω (z 1 , z 2 ) := e−Az 1 +Bz 1 z 2 +C z 2 . 2

2

gφ,ω (z 1 , z 2 ) of course does not coincide with z2 f ◦ φ,ω ((z 1 , z 2 )) = exp −[|z 1 cosφ + sinφ|2 + |z 2 cosφ − ωz 1 sinφ|2 ] ω when (y, η) = (0, 0). Acknowledgements. We thank Dario Bambusi for a critical reading of the manuscript and André Martinez for providing us a first proof of Lemma 2.3.

References

[BGP] Bambusi, D., Graffi, S., Paul, T.: Normal forms and quantization formulae. Commun. Math. Phys. 207, 173–195 (1999)
[Ch] Cherry, T.W.: On the solution of Hamiltonian systems of differential equations in the neighbourhood of a singular point. Proc. London Math. Soc. 27, 151–170 (1928)
[Fo] Folland, G.: Harmonic analysis in phase space. Princeton, NJ: Princeton University Press, 1988
[Ga] Gallavotti, G.: A criterion of integrability for perturbed harmonic oscillators. Wick ordering in classical mechanics. Commun. Math. Phys. 87, 365–383 (1982)
[GP] Graffi, S., Paul, T.: The Schrödinger equation and canonical perturbation theory. Commun. Math. Phys. 108, 25–41 (1987)
[MS] Melin, A., Sjöstrand, J.: Bohr-Sommerfeld quantization condition for non-selfadjoint operators in dimension 2. Autour de l’Analyse Microlocale. Astérisque No. 284, 181–244 (2003)
[Ot] Ottolenghi, A.: On convergence of normal forms for complex frequencies. J. Math. Phys. 34, 5205–5216 (1991)
[Ro] Robert, D.: Autour de l’approximation semiclassique. Basel: Birkhäuser, 1987
[Ru] Rüssmann, H.: Konvergente Reihenentwicklungen in der Störungstheorie der Himmelsmechanik. Selecta Mathematica, V, 93–60, Heidelberger Taschenbücher, 201. Berlin-New York: Springer, 1979
[Si] Siegel, C.L.: On the integrals of canonical systems. Ann. Math. 42, 806–822 (1941)
[Sj] Sjöstrand, J.: Semi-excited levels in non-degenerate potential wells. Asymptotic Analysis 6, 29–43 (1992)
[SM] Siegel, C.L., Moser, J.: Lectures on Celestial Mechanics. Berlin-Heidelberg-New York: Springer-Verlag, 1971

Communicated by B. Simon

Commun. Math. Phys. 278, 117–132 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0377-1

Communications in

Mathematical Physics

A Variational Analysis of Einstein–Scalar Field Lichnerowicz Equations on Compact Riemannian Manifolds

Emmanuel Hebey¹, Frank Pacard², Daniel Pollack³

¹ Université de Cergy-Pontoise, Département de Mathématiques, Site de Saint-Martin, 2 Avenue Adolphe Chauvin, 95302 Cergy-Pontoise Cedex, France. E-mail: [email protected]
² Université Paris XII, Département de Mathématiques, 61 Avenue du Général de Gaulle, 94010 Créteil Cedex, France. E-mail: [email protected]
³ University of Washington, Department of Mathematics, Box 354350, Seattle, WA 98195-4350, USA. E-mail: [email protected]

Received: 3 February 2007 / Accepted: 18 March 2007 Published online: 7 November 2007 – © Springer-Verlag 2007

Abstract: We establish new existence and non-existence results for positive solutions of the Einstein–scalar field Lichnerowicz equation on compact manifolds. This equation arises from the Hamiltonian constraint equation for the Einstein–scalar field system in general relativity. Our analysis introduces variational techniques, in the form of the mountain pass lemma, to the analysis of the Hamiltonian constraint equation, which has been previously studied by other methods. 1. Introduction One of the foundations in the mathematical analysis of the Einstein field equations of general relativity is the rigorous formulation of the Cauchy problem. The basic local existence result of Foures–Bruhat [10], and the important extension of this due to Choquet-Bruhat and Geroch [5], allows one to approach the study of globally hyperbolic spacetimes via the analysis of initial data sets. The Gauss and Codazzi equations impose constraints on the choices of initial data in general relativity, and these constraints are expressed by the Einstein constraint equations. This perspective, originally studied in the context of vacuum spacetimes, has also been successfully employed in the study of many non-vacuum models obtained by minimally coupling gravity to many of the classical matter and field sources, such as electromagnetism (via the Maxwell equations), Yang-Mills fields, fluids, and others [8,11,12]. One of the simplest non-vacuum systems is the Einstein–scalar field system which arises in coupling gravity to a scalar field satisfying a linear or non-linear wave equation with respect to the Lorentz metric describing the gravitational field. The Einstein–scalar field system, when posed in this generality, includes as special cases the (massless or massive) Einstein–Klein–Gordon equations as well as the vacuum Einstein equations with a (positive or negative) cosmological constant. Einstein–scalar field theories have been the subject of interesting developments in recent years. 
Among these are the recent attempts to use such theories to explain the observed acceleration of the expansion of the universe [16]–[19]. Using the conformal method, Choquet-Bruhat, Isenberg, and Pollack [6,7] reformulated the constraint equations for the Einstein–scalar field system as a determined system of nonlinear partial differential equations. The equations are semi-decoupled in the constant mean curvature (CMC) setting. One of these equations, the conformally formulated momentum constraint, is a linear elliptic equation and its solvability is easy to address. The other one, the conformally formulated Hamiltonian constraint, is a nonlinear elliptic equation (the Einstein–scalar field Lichnerowicz equation) as in (1.1) below (see [3] for a survey on the constraint equations, and in particular, the conformal method). This nonlinear equation, which contains both a positive critical Sobolev nonlinearity and a negative power nonlinearity, turns out to be of great mathematical interest. In this paper we provide a variational analysis of this equation under certain conditions on its coefficients.

The analysis of the Lichnerowicz equations which arise as the conformally formulated Hamiltonian constraint equations in both vacuum and non-vacuum settings has, in the past, been conducted primarily by either the method of sub- and supersolutions (i.e. a barrier method) or by perturbation or fixed point methods. This approach has been sufficient to allow for a complete understanding of solvability in, for example, the case of constant mean curvature vacuum initial data on compact manifolds [11]. In [7] this method was applied to constant mean curvature initial data for the Einstein–scalar field system on compact manifolds. In a number of cases, the method of sub- and supersolutions was shown to be sufficient to completely analyze the solvability of the Einstein–scalar field Lichnerowicz equation. In other cases, the limitations of this method were exposed and only partial results were obtained.
We establish here two general theorems concerning, respectively, non-existence and existence of positive solutions to the Einstein–scalar field Lichnerowicz equation (1.1). These results are of interest both for their application to questions of existence and non-existence of solutions of the Einstein–scalar field constraint equations and, more generally, for the introduction of variational techniques to the analysis of the constraint equations. We expect that similar variational techniques will be of use in resolving other open questions concerning initial data for the Cauchy problem in general relativity.

In what follows we let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. We let also $H^1(M)$ be the Sobolev space of functions in $L^2(M)$ with one derivative in $L^2(M)$. The $H^1$-norm on $H^1(M)$ is given by

$$\|u\|_{H^1} = \sqrt{\int_M \left(|\nabla u|^2 + u^2\right) dv_g}.$$

Let $2^\star = \frac{2n}{n-2}$, so that $2^\star$ is the critical Sobolev exponent for the embedding of $H^1$ into Lebesgue spaces. Let also $h$, $A$, and $B$ be smooth functions on $M$. We consider the following Einstein–scalar field Lichnerowicz type equations:

$$\Delta_g u + hu = B u^{2^\star - 1} + \frac{A}{u^{2^\star + 1}}, \tag{1.1}$$

where $\Delta_g = -\mathrm{div}_g \nabla$ is the Laplace–Beltrami operator, and $u > 0$. Unless otherwise stated, solutions are always required to be smooth and positive.

The relationship between the coefficients in (1.1) and initial data for the Einstein–scalar field system is as follows (see [7] for more details). We first note that the sign convention for the Laplace–Beltrami operator which we use here is the opposite of the one used in [7]. The conformal initial data for the purely gravitational portion of the Einstein–scalar field system consists of a background Riemannian metric $g$ (indicating a choice of conformal class for the physical metric) together with a symmetric (0, 2)-tensor $\sigma$ which is divergence-free and trace-free with respect to $g$ (so that $\sigma$ is what is commonly referred to as a transverse-traceless, or TT-tensor) and a scalar function $\tau$ representing the mean curvature of the Cauchy surface $M$ in the spacetime development of the initial data set. The initial data for the scalar field consists of two functions, $\psi$ and $\pi$ on $M$, representing respectively the initial value for the scalar field and its normalized time derivative. With respect to this set of conformal initial data, the constraint equations for the Einstein–scalar field system can be realized as a determined elliptic system whose unknowns consist of a positive scalar function $\phi$ and a vector field $W$ on $M$. As previously remarked, in the CMC case (when $\tau$ is constant) this system becomes semi-decoupled. This means that the portion of it corresponding to the momentum constraint equation is a linear, elliptic, vector equation for $W$ in which the unknown $\phi$ does not appear. This equation has a unique solution when $(M, g)$ has no conformal Killing vector fields. The solution, $W$, of this “conformally formulated momentum constraint equation” then appears in one of the coefficients of the “conformally formulated Hamiltonian constraint equation” which is what we refer to as the Einstein–scalar field Lichnerowicz equation. A positive solution $\phi$ of the Einstein–scalar field Lichnerowicz equation is then used with the vector field $W$ to transform the “conformal” initial data set $(g, \sigma, \tau, \psi, \pi)$ into a “physical” initial data set satisfying the Einstein–scalar field constraint equations (see [7]).
In terms of the conformal initial data set and the vector field $W$ (satisfying the conformally formulated momentum constraint equation) the coefficients of the Einstein–scalar field Lichnerowicz equation (1.1) are

$$h = c_n\left(R(g) - |\nabla\psi|_g^2\right), \quad A = c_n\left(|\sigma + DW|_g^2 + \pi^2\right), \quad B = -c_n\left(\frac{n-1}{n}\tau^2 - 4V(\psi)\right),$$

where $c_n = \frac{n-2}{4(n-1)}$, $R(g)$ is the scalar curvature, $\nabla$ is the covariant derivative for $g$, $V(\cdot)$ is the potential in the wave equation for the scalar field, and the operator $D$ is the conformal Killing operator relative to $g$, defined by $(DW)_{ab} := \nabla_a W_b + \nabla_b W_a - \frac{2}{n} g_{ab} \nabla_m W^m$. The kernel of $D$ consists of the conformal Killing fields on $(M, g)$. Note that relative to the notation of [7], we have $h = R_{g,\psi}$, $B = -B_{\tau,\psi}$ and $A = A_{g,W,\pi}$. We assume in what follows that $A \ge 0$ in $M$. This assumption implies no physical restrictions since we always have that $A \ge 0$ in the original Einstein–scalar field theory. One of the results of [7] is the definition of a conformal invariant, the Yamabe–scalar field conformal invariant, whose sign can be used, through a judicious choice of the background metric $g$, to control the sign of $h$.

We prove two types of results in this paper. The first one, in Sect. 2, establishes a set of sufficient conditions to guarantee the nonexistence of positive solutions of (1.1). The second one, in Sect. 3, is concerned with the existence of positive solutions of (1.1). Our existence result corresponds to (but generalizes) the case of initial data with a positive Yamabe–scalar field conformal invariant considered in [7]. More specifically the results presented here should be contrasted with the partial results indicated in the third row of Table 2 of [7], and specifically with Theorems 4 and 5 in Sections 5.4–5.5 of [7]. The results presented here apply, for example, when considering initial data for the Einstein–massive–Klein–Gordon system with small (relative to the mass), or zero, values of the mean curvature. The basic variational method employed here is to use the mountain pass lemma [1,15] to solve a family of ε-approximated equations, and then let ε → 0 to obtain a solution of (1.1). Finally, Sect. 4 contains a brief discussion of a class of slightly more general equations which arise when considering the Einstein–Maxwell–scalar field theory.

2. Nonexistence of Smooth Positive Solutions

Examples of nonexistence results involving pointwise conditions on $h$, $A$, and $B$ are easy to get. Let $u$ be a smooth positive solution of (1.1), and $x_0$ be a point where $u$ is minimum. Then $\Delta_g u(x_0) \le 0$ and we get that

$$h(x_0)\,u(x_0) \ge B(x_0)\,u(x_0)^{2^\star - 1} + A(x_0)\,u(x_0)^{-2^\star - 1}.$$

Let us assume that both $A$ and $B$ are positive functions. We have

$$h(x_0) \ge B(x_0)\,X + A(x_0)\,X^{1-n}, \tag{2.1}$$

where we have set $X = u(x_0)^{\frac{4}{n-2}}$. Studying the least value of the right hand side of (2.1) (considered as a function of $X$), we get that (1.1) does not possess a smooth positive solution if

$$\frac{n^n}{(n-1)^{n-1}} > \max_M \frac{(h^+)^n}{A\,B^{n-1}}. \tag{2.2}$$

It also follows from (2.1) that

$$u(x) \ge u(x_0) \ge \left(\min_M \frac{A}{h^+}\right)^{\frac{n-2}{4(n-1)}}$$

for all $x \in M$. The idea of getting such a bound will be used again in Sect. 3 when proving Theorem 3.1. We now obtain a nonexistence result involving the Lebesgue norm of the functions $A$, $B$ and $h$.

Theorem 2.1. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. Let also $h$, $A$, and $B$ be smooth functions on $M$ with $A \ge 0$ in $M$. If $B > 0$ in $M$, and

$$\left(\frac{n^n}{(n-1)^{n-1}}\right)^{\frac{n+2}{4n}} \int_M A^{\frac{n+2}{4n}}\, B^{\frac{3n-2}{4n}}\, dv_g > \int_M (h^+)^{\frac{n+2}{4}}\, B^{\frac{2-n}{4}}\, dv_g, \tag{2.3}$$

where $h^+ = \max(0, h)$, then the Einstein–scalar field Lichnerowicz equation (1.1) does not possess any smooth positive solution.

Proof. We assume that $B > 0$. Let $u$ be a smooth positive solution of (1.1). Integrating (1.1) over $M$ we get that

$$\int_M B u^{2^\star - 1}\, dv_g + \int_M \frac{A\, dv_g}{u^{2^\star + 1}} = \int_M h u\, dv_g. \tag{2.4}$$

By Hölder’s inequality,

$$\int_M h u\, dv_g \le \left(\int_M (h^+)^{\frac{n+2}{4}}\, B^{\frac{2-n}{4}}\, dv_g\right)^{\frac{4}{n+2}} \left(\int_M B u^{2^\star - 1}\, dv_g\right)^{\frac{n-2}{n+2}}.$$

Again by using Hölder’s inequality,

$$\int_M A^{\frac{n+2}{4n}}\, B^{\frac{3n-2}{4n}}\, dv_g \le \left(\int_M B u^{2^\star - 1}\, dv_g\right)^{\frac{3n-2}{4n}} \left(\int_M \frac{A\, dv_g}{u^{2^\star + 1}}\right)^{\frac{n+2}{4n}}.$$

Collecting these inequalities and using (2.4), we get

$$X + \left(\int_M A^{\frac{n+2}{4n}}\, B^{\frac{3n-2}{4n}}\, dv_g\right)^{\frac{4n}{n+2}} X^{1-n} \le \left(\int_M (h^+)^{\frac{n+2}{4}}\, B^{\frac{2-n}{4}}\, dv_g\right)^{\frac{4}{n+2}}, \tag{2.5}$$

where we have set

$$X = \left(\int_M B u^{2^\star - 1}\, dv_g\right)^{\frac{4}{n+2}}.$$

The study of the minimal value of the function of $X$ which appears on the left hand side of (2.5) implies that

$$\frac{n^n}{(n-1)^{n-1}} \left(\int_M A^{\frac{n+2}{4n}}\, B^{\frac{3n-2}{4n}}\, dv_g\right)^{\frac{4n}{n+2}} \le \left(\int_M (h^+)^{\frac{n+2}{4}}\, B^{\frac{2-n}{4}}\, dv_g\right)^{\frac{4n}{n+2}}.$$

This completes the proof of the theorem.
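Both (2.2) and the last step of the proof of Theorem 2.1 rest on the least value of a function of the form $X \mapsto bX + aX^{1-n}$ on $X > 0$. The following is a minimal numerical sketch of that computation, not part of the original argument: the function names `phi`, `argmin_phi`, `min_phi` and `grid_min` are hypothetical, and the constants `a`, `b` stand in for positive values of $A(x_0)$ and $B(x_0)$ in (2.1).

```python
import math


def phi(x: float, a: float, b: float, n: int) -> float:
    """phi(X) = b*X + a*X**(1 - n), the right-hand side of (2.1) for X > 0."""
    return b * x + a * x ** (1 - n)


def argmin_phi(a: float, b: float, n: int) -> float:
    """Critical point of phi, from phi'(X) = b - (n - 1)*a*X**(-n) = 0."""
    return ((n - 1) * a / b) ** (1.0 / n)


def min_phi(a: float, b: float, n: int) -> float:
    """Closed-form least value: n*(n-1)**((1-n)/n) * a**(1/n) * b**((n-1)/n)."""
    return n * (n - 1) ** ((1.0 - n) / n) * a ** (1.0 / n) * b ** ((n - 1.0) / n)


def grid_min(a: float, b: float, n: int,
             lo: float = 1e-3, hi: float = 1e3, steps: int = 20000) -> float:
    """Brute-force minimum of phi on a log-spaced grid (illustration only)."""
    ratio = (hi / lo) ** (1.0 / steps)
    return min(phi(lo * ratio ** k, a, b, n) for k in range(steps + 1))


if __name__ == "__main__":
    a, b, n = 2.0, 5.0, 3  # assumed sample values, chosen only for illustration
    print("argmin:", argmin_phi(a, b, n))
    print("closed-form minimum:", min_phi(a, b, n))
    print("grid minimum:", grid_min(a, b, n))
```

Raising the closed-form minimum to the power $n$ gives $\frac{n^n}{(n-1)^{n-1}}\, a\, b^{n-1}$, which is the constant appearing in (2.2); with $b = 1$ it is the constant in front of the left-hand integral in the final inequality of the proof.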

Many more restrictive nonexistence conditions can be obtained easily from (2.3). For example, replacing $B$ by $\min_M B$ in the two integrals in (2.3), we get that if

$$\left(\frac{n^n}{(n-1)^{n-1}}\right)^{\frac{n+2}{4n}} \int_M A^{\frac{n+2}{4n}}\, dv_g > \frac{\int_M (h^+)^{\frac{n+2}{4}}\, dv_g}{(\min_M B)^{\frac{(n-1)(n+2)}{4n}}}$$

is fulfilled, then (2.3) holds true and the Einstein–scalar field Lichnerowicz equation (1.1) does not possess any smooth positive solution. In the same spirit, note that condition (2.2) is more restrictive than (2.3) since, for any triple of functions satisfying (2.2), we have

$$\frac{n^n}{(n-1)^{n-1}}\, A\, B^{\frac{3n-2}{n+2}} > (h^+)^n\, B^{\frac{n(2-n)}{n+2}},$$

and raising this to the power $\frac{n+2}{4n}$ and integrating the result over $M$ yields (2.3).

In what follows we let $S = S(M, g)$, $S > 0$, be the Sobolev constant of $(M, g)$ defined as the smallest $S > 0$ such that

$$\int_M |u|^{2^\star}\, dv_g \le S \left(\int_M \left(|\nabla u|^2 + u^2\right) dv_g\right)^{\frac{2^\star}{2}} \tag{2.6}$$

for all $u \in H^1(M)$. Explicit upper bounds for $S$ can be given in special geometries, for instance (see Ilias [13]) when the Ricci curvature of the manifold is positive. Concerning lower bounds, it is well-known that $S \ge K_n^{2^\star}$, where $K_n$ is the sharp Sobolev constant in the $n$-dimensional Euclidean space for the Sobolev inequality $\|u\|_{L^{2^\star}} \le K_n \|\nabla u\|_{L^2}$. By letting $u = 1$ in (2.6) we also get that $S \ge V_g^{-2^\star/n}$, where $V_g$ is the volume of $M$ with respect to $g$. Using this, we prove a nonexistence result for solutions with an a priori bound on their $H^1$ energy.

Theorem 2.2. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. Let also $h$, $A$, and $B$ be smooth functions on $M$ with $A \ge 0$ in $M$. If $B$ is arbitrary, not necessarily positive, and

$$\int_M A^{\frac{1}{2}}\, dv_g > S \Lambda^{2^\star} \left(\max_M B^- + \frac{\max\left(1, \max_M h^+\right)}{S \Lambda^{\frac{4}{n-2}}}\right)^{\frac{1}{2}} \tag{2.7}$$

for some $\Lambda > 0$, where $B^- = \max(0, -B)$ and $S$ is as in (2.6), then the Einstein–scalar field Lichnerowicz equation (1.1) does not possess smooth positive solutions of energy $\|u\|_{H^1} \le \Lambda$. Moreover, (2.7) is sharp in the sense that the power $p = \frac{1}{2}$ in the left-hand side of (2.7) cannot be improved, and that the bound on the energy cannot be removed.

Proof. We prove here that (2.7) prohibits the existence of positive solutions of (1.1). The discussion on the sharpness of this condition is postponed until after the proof. Let $u$ be a smooth positive solution of (1.1) such that $\|u\|_{H^1} \le \Lambda$, $\Lambda > 0$. Let $C_h = \max\left(1, \max_M h^+\right)$, where $h^+ = \max(0, h)$. Then,

$$\int_M \left(|\nabla u|^2 + h u^2\right) dv_g \le C_h \int_M \left(|\nabla u|^2 + u^2\right) dv_g. \tag{2.8}$$

Multiplying (1.1) by $u$, and integrating over $M$, we get by (2.8) that

$$\int_M B u^{2^\star}\, dv_g + \int_M \frac{A\, dv_g}{u^{2^\star}} \le C_h \Lambda^2. \tag{2.9}$$

By the Sobolev inequality (2.6) we can write that

$$\int_M B u^{2^\star}\, dv_g \ge -\left(\max_M B^-\right) S \Lambda^{2^\star}, \tag{2.10}$$

where $B^- = \max(0, -B)$. Then, by combining (2.9)–(2.10) we get that

$$\int_M \frac{A\, dv_g}{u^{2^\star}} \le C_h \Lambda^2 + \left(\max_M B^-\right) S \Lambda^{2^\star}. \tag{2.11}$$

Now, Hölder’s inequality yields

$$\int_M A^{\frac{1}{2}}\, dv_g \le \left(\int_M \frac{A\, dv_g}{u^{2^\star}}\right)^{\frac{1}{2}} \left(\int_M u^{2^\star}\, dv_g\right)^{\frac{1}{2}}. \tag{2.12}$$

By combining this inequality with (2.11), and by the Sobolev inequality (2.6), we get that

$$\int_M A^{\frac{1}{2}}\, dv_g \le S \Lambda^{2^\star} \left(\max_M B^- + \frac{C_h}{S \Lambda^{\frac{4}{n-2}}}\right)^{\frac{1}{2}}.$$

This proves the theorem.
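The final step of the proof compresses a short computation. A sketch of it is given below, under the reading that $\Lambda$ denotes the energy bound $\|u\|_{H^1} \le \Lambda$, that $C_h = \max(1, \max_M h^+)$, and that the Sobolev inequality (2.6) is used in the form $\int_M u^{2^\star} dv_g \le S \Lambda^{2^\star}$; these readings are assumptions made for illustration.

```latex
\begin{align*}
\left(\int_M A^{1/2}\,dv_g\right)^{2}
  &\le \left(\int_M \frac{A\,dv_g}{u^{2^\star}}\right)
       \left(\int_M u^{2^\star}\,dv_g\right)
     && \text{by (2.12)} \\
  &\le \left(C_h\Lambda^{2} + S\Lambda^{2^\star}\max_M B^-\right) S\Lambda^{2^\star}
     && \text{by (2.11) and (2.6)} \\
  &= S^{2}\Lambda^{2\cdot 2^\star}
     \left(\max_M B^- + \frac{C_h}{S\,\Lambda^{2^\star-2}}\right).
\end{align*}
```

Since $2^\star - 2 = \frac{4}{n-2}$, taking square roots gives the displayed bound, which is incompatible with (2.7).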

We now discuss the sharpness of (2.7) in Theorem 2.2. The Yamabe equation on a Riemannian manifold $(M, g)$ may be written as

$$\Delta_g u + \frac{n-2}{4(n-1)} R(g)\, u = u^{2^\star - 1}, \tag{2.13}$$

where $R(g)$ is the scalar curvature of $g$. A positive solution $u > 0$ of (2.13) corresponds to a conformally related metric $\tilde g = u^{2^\star - 2} g$ with constant positive scalar curvature $R(\tilde g) = \frac{4(n-1)}{n-2}$. Now, any solution of (2.13) is a solution of (1.1) when we let $h = \frac{n-2}{4(n-1)} R(g)$, $B = \alpha$, and $A = (1 - \alpha)\, u^{2 \cdot 2^\star}$ for some $\alpha \in \mathbb{R}$. This provides a transformation rule for rewriting equations like (2.13) into equations like (1.1). On the unit sphere $(S^n, g)$, for which $R(g) = n(n-1)$, we know (see, for instance, Aubin [2]) that there exist families $(u_\varepsilon)_\varepsilon$ of solutions of (2.13), $\varepsilon > 0$, such that $\|u_\varepsilon\|_{H^1} = K_n^{-n} + o(1)$ for all $\varepsilon > 0$, and $\|u_\varepsilon\|_{L^p} \to +\infty$ as $\varepsilon \to 0$ for all $p > 2^\star$, where $K_n$ is the sharp Sobolev constant in the $n$-dimensional Euclidean space for the Sobolev inequality $\|u\|_{L^{2^\star}} \le K_n \|\nabla u\|_{L^2}$. Letting $\alpha = \frac{1}{2}$, the above transformation rule (2.13)→(1.1) provides a family of Einstein–scalar field Lichnerowicz type equations indexed by $\varepsilon > 0$, with $h$ and $B$ independent of $\varepsilon$, such that any equation in the family possesses a solution of energy less than or equal to $2K_n^{-n}$, and for which $\int_M A_\varepsilon^p\, dv_g \to +\infty$ as $\varepsilon \to 0$ for all $p > \frac{1}{2}$. This proves that the power $p = \frac{1}{2}$ in the left hand side of (2.7) cannot be improved. This example can be modified in different ways with the constructions given in Brendle [4] and in Druet and Hebey [9].

We prove next that the bound on the energy in Theorem 2.2 cannot be removed. By Druet and Hebey [9] we know that on the unit sphere in dimension $n \ge 6$, or on any quotient $(M, g)$ of the unit sphere in dimension $n \ge 6$, there exist families $(h_\varepsilon)_\varepsilon$ of smooth functions, such that $h_\varepsilon \to \frac{n(n-2)}{4}$ in $C^1(M)$, and families $(u_\varepsilon)_\varepsilon$ of smooth positive functions such that, for any $\varepsilon > 0$, $u_\varepsilon$ solves the Yamabe type equation

$$\Delta_{g_0} u_\varepsilon + h_\varepsilon u_\varepsilon = u_\varepsilon^{2^\star - 1}, \tag{2.14}$$

and such that $\|u_\varepsilon\|_{H^1} \to +\infty$ as $\varepsilon \to 0$. Rewriting (2.14) with the transformation rule (2.13)→(1.1), we see that the $u_\varepsilon$’s solve (1.1) with $h = h_\varepsilon$, $B = \alpha$, and $A = (1 - \alpha)\, u_\varepsilon^{2 \cdot 2^\star}$ for some $\alpha \in \mathbb{R}$. Letting $\alpha = \frac{1}{2}$, we get families of Einstein–scalar field Lichnerowicz type equations indexed by $\varepsilon > 0$ such that any equation in the family possesses a solution, $B$ is independent of $\varepsilon$, the $h_\varepsilon$’s converge in the $C^1$-topology to a positive constant function, and $\int_M A_\varepsilon^{1/2}\, dv_g \to +\infty$ as $\varepsilon \to 0$. In particular, we cannot hope to find a constant $C = C(n, h, B)$, depending on the manifold and continuously on $h$ and $B$ in the $C^0$-topology (as is the case for the constant in (2.7) when $\Lambda$ is fixed), such that if $\int_M A^{1/2}\, dv_g \ge C$, then the Einstein–scalar field Lichnerowicz type equation (1.1) does not possess a smooth positive solution. This proves that the bound on the energy in Theorem 2.2 cannot be removed.

In the same circle of ideas, we mention that if $B > 0$ in $M$, then we can give another form to (2.7) where the constant appears as $C\Lambda^2$. In order to get this dependency in $\Lambda^2$ we may proceed as in the proof of Theorem 2.2, but now getting bounds from the estimate (2.9). By (2.9), since we assumed that $B > 0$ in $M$, we can write that

$$\int_M u^{2^\star}\, dv_g \le \frac{C_h \Lambda^2}{\min_M B} \quad \text{and} \quad \int_M \frac{A\, dv_g}{u^{2^\star}} \le C_h \Lambda^2. \tag{2.15}$$

Then, by (2.12) as in the proof of the second part of Theorem 2.2, we get from (2.15) that (1.1) does not possess a smooth positive solution if

$$\int_M A^{\frac{1}{2}}\, dv_g > \frac{\max\left(1, \max_M h^+\right) \Lambda^2}{(\min_M B)^{\frac{1}{2}}}. \tag{2.16}$$

Condition (2.16) is complementary to the condition in Theorem 2.2. For large $\Lambda$’s, (2.16) is better than (2.7) since it involves the energy $\Lambda^2$ and not $\Lambda^{2(n-1)/(n-2)}$.

3. Existence of a Smooth Positive Solution

In this section we use the mountain pass lemma [1,15] to get existence results that complement the nonexistence results presented in Theorem 2.2. More precisely, we prove that if $\int_M A\, dv_g$ is sufficiently small, and $A > 0$ in $M$, then (1.1) possesses a solution. When $A \equiv 0$, (1.1) is the prescribed scalar curvature equation and we know from Kazdan and Warner [14] that there are situations in which the equation does not possess a solution. In the sequel we assume that the function $h$ is chosen so that $\Delta_g + h$ is coercive. This amounts to saying that there exists a constant $K_h = K(M, g, h) > 0$ such that

$$\int_M |u|^2\, dv_g \le K_h \int_M \left(|\nabla u|^2 + h u^2\right) dv_g$$

for all $u \in H^1(M)$. It will be convenient to define

$$\|u\|_{H^1_h} = \left(\int_M \left(|\nabla u|^2 + h u^2\right) dv_g\right)^{\frac{1}{2}}. \tag{3.1}$$

We also denote by $S_h = S(M, g, h) > 0$ the Sobolev constant defined to be the smallest constant $S_h > 0$ such that

$$\int_M |u|^{2^\star}\, dv_g \le S_h \left(\int_M \left(|\nabla u|^2 + h u^2\right) dv_g\right)^{\frac{2^\star}{2}} \tag{3.2}$$

for all $u \in H^1(M)$. Observe that, if $h > 0$ in $M$, then $\Delta_g + h$ is coercive and conversely coercivity implies that $\int_M h\, dv_g > 0$, and thus that $\max_M h > 0$. Also observe that if $A, B \ge 0$, $A + B > 0$, and if (1.1) possesses a smooth positive solution, then $\Delta_g + h$ is coercive. Indeed, in that case, there exists a function $u > 0$ such that $\Delta_g u + hu > 0$ everywhere in $M$, and the existence of such a $u$ implies the coercivity of $\Delta_g + h$. Finally, as already mentioned, when $h > 0$ in $M$, then $\Delta_g + h$ is coercive and we have the bound

$$S_h \le \max\left(1, \frac{1}{\min_M h}\right)^{\frac{2^\star}{2}} S,$$

where $S = S(M, g) > 0$ is the Sobolev constant defined in (2.6). We prove here that the following existence result holds true.

Theorem 3.1. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$. Let $h$, $A$, and $B$ be smooth functions on $M$ for which $\Delta_g + h$ is coercive, $A > 0$ in $M$, and $\max_M B > 0$. There exists a constant $C = C(n)$, $C > 0$ depending only on $n$, such that if
\[ \|\varphi\|_{H^1_h}^{2^\star} \int_M \frac{A}{\varphi^{2^\star}}\,dv_g \le \frac{C}{(S_h \max_M |B|)^{n-1}} \tag{3.3} \]
and
\[ \int_M B \varphi^{2^\star}\,dv_g > 0 \]
for some smooth positive function $\varphi > 0$ in $M$, where $\|\cdot\|_{H^1_h}$ is as in (3.1) and $S_h$ is as in (3.2), then the Einstein–scalar field Lichnerowicz equation (1.1) possesses a smooth positive solution.

Proof (Preliminary computations). We define $I^{(1)} : H^1(M) \to \mathbb{R}$ by
\[ I^{(1)}(u) = \frac{1}{2} \int_M \left(|\nabla u|^2 + h u^2\right) dv_g - \frac{1}{2^\star} \int_M B (u^+)^{2^\star}\,dv_g, \tag{3.4} \]

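and if we fix $\varepsilon > 0$ we define $I^{(2)}_\varepsilon : H^1(M) \to \mathbb{R}$ by
\[ I^{(2)}_\varepsilon(u) = \frac{1}{2^\star} \int_M \frac{A\,dv_g}{(\varepsilon + (u^+)^2)^{2^\flat}}, \tag{3.5} \]
where $2^\flat = 2^\star/2$.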

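Obviously, for any $u \in H^1(M)$ we can write
\[ \Phi\left(\|u\|_{H^1_h}\right) \le I^{(1)}(u) \le \Psi\left(\|u\|_{H^1_h}\right) \tag{3.6} \]
if the functions $\Phi, \Psi : [0, +\infty) \to \mathbb{R}$ are defined by
\[ \Phi(t) = \frac{1}{2} t^2 - \frac{\max_M |B|}{2^\star} S_h t^{2^\star} \tag{3.7} \]
and
\[ \Psi(t) = \frac{1}{2} t^2 + \frac{\max_M |B|}{2^\star} S_h t^{2^\star} \tag{3.8} \]
for $t \ge 0$, where $\|\cdot\|_{H^1_h}$ and $S_h > 0$ are as in (3.1) and (3.2). Let $t_0 > 0$ be given by
\[ t_0 = \left( \frac{1}{S_h \max_M |B|} \right)^{(n-2)/4} \tag{3.9} \]
so that $\Phi$ is increasing in $[0, t_0]$, and decreasing in $[t_0, +\infty)$. We define $\theta > 0$ such that
\[ \theta^2 = \frac{1}{2(n-1)} \]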


and $t_1 = \theta t_0$ for $t_0$ as in (3.9). It is easy to check that
\[ \Psi(t_1) \le \theta^2\,\frac{2^\star + 2}{2^\star - 2}\,\Phi(t_0) \le \frac{1}{2} \Phi(t_0), \tag{3.10} \]

where $\Phi$ and $\Psi$ are as in (3.7) and (3.8). Finally, we define the functional
\[ I_\varepsilon = I^{(1)} + I^{(2)}_\varepsilon, \tag{3.11} \]
where $I^{(1)}$ and $I^{(2)}_\varepsilon$ are as in (3.4) and (3.5). Let $\varphi \in C^\infty(M)$, $\varphi > 0$ in $M$, be the function in the statement of the theorem. In particular
\[ \int_M B \varphi^{2^\star}\,dv_g > 0, \tag{3.12} \]
and, without loss of generality, we can assume that $\|\varphi\|_{H^1_h} = 1$.

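Now, provided the constant $C$ in (3.3) is chosen to be
\[ C = \frac{2^\star - 2}{4}\,\theta^{2^\star}, \]
we find that (3.3) precisely translates into
\[ \frac{1}{2^\star} \int_M \frac{A}{(t_1 \varphi)^{2^\star}}\,dv_g \le \frac{1}{2} \Phi(t_0), \tag{3.13} \]
and by (3.6), (3.10), and (3.13) we get that
\[ I_\varepsilon(t_1 \varphi) \le \Phi(t_0) < I_\varepsilon(t_0 \varphi). \tag{3.14} \]
Finally, (3.12) implies that
\[ \lim_{t \to +\infty} I_\varepsilon(t \varphi) = -\infty. \]
Hence we can choose $t_2 > t_0$ such that
\[ I_\varepsilon(t_2 \varphi) < 0, \tag{3.15} \]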

where $I_\varepsilon$ is the functional in (3.11).

Application of the Mountain Pass Lemma. By (3.14) and (3.15), we can apply the mountain pass lemma [1,15] to the functional $I_\varepsilon$. Let
\[ c_\varepsilon = \inf_{\gamma \in \Gamma} \max_{u \in \gamma} I_\varepsilon(u), \tag{3.16} \]
where $\Gamma$ stands for the set of continuous paths joining $u_1 = t_1 \varphi$ to $u_2 = t_2 \varphi$. Observe that $c_\varepsilon > \Phi(t_0)$ and, taking the path $\gamma(t) = t \varphi$, for $t \in [t_1, t_2]$, we see that $c_\varepsilon$ is bounded uniformly as $\varepsilon$ tends to 0. We will keep in mind, for further use, that
\[ \Phi(t_0) < c_\varepsilon \le c \tag{3.17} \]
for all $\varepsilon$ small enough, where $c > 0$ is independent of $\varepsilon$.
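The constants entering the mountain pass geometry above can be checked by hand. The following is only a verification sketch, not part of the original argument. It assumes the standard value $2^\star = 2n/(n-2)$ of the critical Sobolev exponent, so that $2^\star - 2 = 4/(n-2)$, writes $\Phi$ and $\Psi$ for the two functions defined in (3.7) and (3.8), and uses the shorthand $K = S_h \max_M |B|$, which is ours.

```latex
\begin{align*}
\Phi'(t_0) = 0 \iff t_0^{2^\star-2} = K^{-1}
  &\quad\Longrightarrow\quad
  \Phi(t_0) = \Big(\tfrac12 - \tfrac1{2^\star}\Big) t_0^2 = \tfrac1n\, t_0^2, \\[4pt]
\Psi(t_1) = \theta^2 t_0^2 \Big(\tfrac12 + \tfrac{\theta^{2^\star-2}}{2^\star}\Big)
  &\le \theta^2 t_0^2 \Big(\tfrac12 + \tfrac1{2^\star}\Big)
   = \theta^2\, \frac{2^\star+2}{2^\star-2}\, \Phi(t_0)
   = \theta^2 (n-1)\, \Phi(t_0) = \tfrac12 \Phi(t_0), \\[4pt]
\frac{1}{2^\star} \int_M \frac{A}{(t_1\varphi)^{2^\star}}\, dv_g
  &\le \frac{C}{2^\star \theta^{2^\star}}\, K^{\frac n2 - (n-1)}
   = \frac{C}{2^\star \theta^{2^\star}}\, t_0^2
   = \frac{2^\star-2}{4 \cdot 2^\star}\, t_0^2
   = \frac{1}{2n}\, t_0^2 = \tfrac12 \Phi(t_0).
\end{align*}
```

The second line uses $\theta < 1$ and $K t_0^{2^\star} = t_0^2$; the third uses (3.3) with $\|\varphi\|_{H^1_h} = 1$, the identities $t_0^{-2^\star} = K^{n/2}$ and $t_0^2 = K^{-(n-2)/2}$, and the choice of $C$ made just before (3.13). Adding the last two lines gives the first inequality in (3.14).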


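By the mountain pass lemma we get that there exists a sequence $(u_k)_k$ in $H^1(M)$ such that
\[ I_\varepsilon(u_k) \to c_\varepsilon \quad \text{and} \quad I'_\varepsilon(u_k) \to 0 \tag{3.18} \]
as $k \to +\infty$. By (3.18),
\[ \int_M (\nabla u_k \nabla \varphi)\,dv_g + \int_M h u_k \varphi\,dv_g - \int_M B (u_k^+)^{2^\star - 1} \varphi\,dv_g = \int_M \frac{A u_k^+ \varphi\,dv_g}{(\varepsilon + (u_k^+)^2)^{2^\flat + 1}} + o\left(\|\varphi\|_{H^1_h}\right) \tag{3.19} \]
for all $\varphi \in H^1(M)$, where $(\nabla u_k \nabla \varphi)$ stands for the pointwise scalar product of $\nabla u_k$ and $\nabla \varphi$ with respect to $g$, and
\[ \frac{1}{2} \int_M \left(|\nabla u_k|^2 + h u_k^2\right) dv_g - \frac{1}{2^\star} \int_M B (u_k^+)^{2^\star}\,dv_g + \frac{1}{2^\star} \int_M \frac{A\,dv_g}{(\varepsilon + (u_k^+)^2)^{2^\flat}} = c_\varepsilon + o(1). \tag{3.20} \]
Combining (3.19) with $\varphi = u_k$, and (3.20), we get that
\[ \frac{1}{n} \int_M B (u_k^+)^{2^\star}\,dv_g + \frac{1}{2^\star} \int_M \frac{A\,dv_g}{(\varepsilon + (u_k^+)^2)^{2^\flat}} + \frac{1}{2} \int_M \frac{A (u_k^+)^2\,dv_g}{(\varepsilon + (u_k^+)^2)^{2^\flat + 1}} = c_\varepsilon + o\left(\|u_k\|_{H^1_h}\right) + o(1), \tag{3.21} \]
and it follows from (3.21) that for $k$ sufficiently large,
\[ \frac{1}{n} \int_M B (u_k^+)^{2^\star}\,dv_g \le 2 c_\varepsilon + o\left(\|u_k\|_{H^1_h}\right). \tag{3.22} \]
By (3.20) and (3.22) we then get that for $k$ sufficiently large,
\[ \int_M \left(|\nabla u_k|^2 + h u_k^2\right) dv_g \le \frac{n-2}{n} \int_M B (u_k^+)^{2^\star}\,dv_g + 4 c_\varepsilon \le 2n c_\varepsilon + o\left(\|u_k\|_{H^1_h}\right). \tag{3.23} \]
In particular, by (3.22) and (3.23),
\[ \int_M \left(|\nabla u_k|^2 + h u_k^2\right) dv_g \le 2n c_\varepsilon + 1, \quad \text{and} \quad -\frac{4n}{n-2}\,c_\varepsilon \le \int_M B (u_k^+)^{2^\star}\,dv_g \le 3n c_\varepsilon \tag{3.24} \]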

for $k$ sufficiently large, where $c_\varepsilon$ is as in (3.16). By (3.24), the sequence $(u_k)_k$ is bounded in $H^1(M)$. Up to passing to a subsequence we may then assume that there exists $u_\varepsilon \in H^1(M)$ such that $u_k \rightharpoonup u_\varepsilon$ weakly in $H^1(M)$, $u_k \to u_\varepsilon$ strongly in $L^p(M)$ for some $p > 2$, and $u_k \to u_\varepsilon$ almost everywhere in $M$ as $k \to +\infty$. As a consequence,
\[ (u_k^+)^{2^\star - 1} \rightharpoonup (u_\varepsilon^+)^{2^\star - 1} \ \text{weakly in } L^{2^\star/(2^\star - 1)}(M), \quad \text{and} \quad \frac{u_k^+}{(\varepsilon + (u_k^+)^2)^q} \to \frac{u_\varepsilon^+}{(\varepsilon + (u_\varepsilon^+)^2)^q} \ \text{strongly in } L^2(M) \tag{3.25} \]

for all $q > 0$, as $k \to +\infty$. Indeed, by (3.24), the $(u_k^+)^{2^\star - 1}$'s are bounded in $L^{2^\star/(2^\star - 1)}(M)$. Since they converge almost everywhere to $(u_\varepsilon^+)^{2^\star - 1}$, the first equation in (3.25) follows from standard integration theory. By Lebesgue's dominated convergence theorem we also have that $(\varepsilon + (u_k^+)^2)^{-q} \to (\varepsilon + (u_\varepsilon^+)^2)^{-q}$ strongly in $L^p(M)$ for all $p \ge 1$ and all $q > 0$, and since $u_k \to u_\varepsilon$ in $L^p(M)$ for some $p > 2$, we easily get that the second equation in (3.25) holds true. By (3.25), letting $k \to +\infty$ in (3.19), it follows that $u_\varepsilon$ satisfies
\[ \Delta_g u_\varepsilon + h u_\varepsilon = B (u_\varepsilon^+)^{2^\star - 1} + \frac{A u_\varepsilon^+}{(\varepsilon + (u_\varepsilon^+)^2)^{2^\flat + 1}} \tag{3.26} \]
in the weak sense. The weak maximum principle and (3.26) imply that $u_\varepsilon \ge 0$. As a consequence,
\[ \Delta_g u_\varepsilon + h u_\varepsilon = B u_\varepsilon^{2^\star - 1} + \frac{A u_\varepsilon}{(\varepsilon + u_\varepsilon^2)^{2^\flat + 1}} \tag{3.27} \]
in the weak sense.

Regularity and positivity of the solution. We may rewrite (3.27) as

\[ \Delta_g u_\varepsilon + \left( h - \frac{A}{(\varepsilon + u_\varepsilon^2)^{2^\flat + 1}} \right) u_\varepsilon = B u_\varepsilon^{2^\star - 1}, \]
and since
\[ h - \frac{A}{(\varepsilon + u_\varepsilon^2)^{2^\flat + 1}} \in L^\infty(M), \]
the regularity arguments developed in Trudinger [20] apply to (3.27). It follows that $u_\varepsilon \in L^s(M)$ for some $s > 2^\star$. Since we have that $A (\varepsilon + u_\varepsilon^2)^{-2^\flat - 1} u_\varepsilon \in L^p(M)$ if $u_\varepsilon \in L^p(M)$, and $u_\varepsilon \in L^s(M)$ for some $s > 2^\star$, the standard bootstrap procedure, together with regularity theory, gives that $u_\varepsilon \in H^{2,p}(M)$ for all $p \ge 1$, where $H^{2,p}$ is the Sobolev space of functions in $L^p$ with two derivatives in $L^p$. By the Sobolev embedding theorem we then get that the right-hand side in (3.27) is in $C^{0,\alpha}(M)$ for $\alpha \in (0, 1)$, and by regularity theory it follows that $u_\varepsilon \in C^{2,\alpha}(M)$ for $\alpha \in (0, 1)$. In particular, the strong maximum principle can be applied and we get that either $u_\varepsilon \equiv 0$, or $u_\varepsilon > 0$ in $M$. Then we easily get that $u_\varepsilon$ is smooth. By (3.24) and (3.25), letting $k \to +\infty$ in (3.21), we get that
\[ \frac{1}{2^\star} \int_M \frac{A\,dv_g}{(\varepsilon + u_\varepsilon^2)^{2^\flat}} \le (2^\star - 1)\,c, \tag{3.28} \]
where $c$ is the upper bound for $c_\varepsilon$. If, for a sequence of $\varepsilon_j$ tending to 0, $u_{\varepsilon_j}$ were to be equal to 0, we would conclude that
\[ \frac{1}{2^\star (2^\star - 1)\,\varepsilon_j^{2^\flat}} \int_M A\,dv_g \le c \tag{3.29} \]


which is clearly impossible since we have assumed that $A > 0$. Therefore, for $\varepsilon$ small enough, $u_\varepsilon \not\equiv 0$. Then, according to the above discussion, $u_\varepsilon$ is a smooth positive solution of (3.27). By (3.24), and standard properties of the weak limit, we also get that
\[ \int_M \left(|\nabla u_\varepsilon|^2 + h u_\varepsilon^2\right) dv_g \le 2n c_\varepsilon + 1 \tag{3.30} \]

for all $\varepsilon > 0$ small enough.

Passing to the limit as $\varepsilon$ tends to 0. In what follows we let $(\varepsilon_k)_k$ be a sequence of positive real numbers such that $\varepsilon_k \to 0$ as $k \to +\infty$ and (3.30) holds true with $\varepsilon = \varepsilon_k$ for all $k$, and let $u_k = u_{\varepsilon_k}$. Then $u_k$ is a smooth positive function in $M$ such that
\[ \Delta_g u_k + h u_k = B u_k^{2^\star - 1} + \frac{A u_k}{(\varepsilon_k + u_k^2)^{2^\flat + 1}} \tag{3.31} \]
in $M$ while, by (3.17) and (3.30), the sequence $(u_k)_k$ is bounded in $H^1(M)$. Let $x_k$ be a point where $u_k$ is minimum. Then $\Delta_g u_k(x_k) \le 0$ and we get with (3.31) that
\[ h(x_k) + |B|(x_k)\,u_k(x_k)^{2^\star - 2} \ge \frac{A(x_k)}{(\varepsilon_k + u_k(x_k)^2)^{2^\flat + 1}}. \tag{3.32} \]
Let $\delta_0 > 0$ be such that
\[ \delta_0^{2(2^\flat + 1)} \left( \max_M h + (\max_M |B|)\,\delta_0^{2^\star - 2} \right) = \frac{\min_M A}{2}. \]
By (3.32) we obtain that $u_k(x_k) \ge \delta_0$, and thus that
\[ \min_M u_k \ge \delta_0 \tag{3.33} \]

when $k$ is sufficiently large. Since $(u_k)_k$ is bounded in $H^1(M)$ we may assume that there exists $u \in H^1(M)$ such that, up to passing to a subsequence, $u_k \rightharpoonup u$ weakly in $H^1(M)$, $u_k \to u$ strongly in $L^p(M)$ for some $p > 2$, and $u_k \to u$ almost everywhere in $M$ as $k \to +\infty$. By (3.33), $u \ge \delta_0$ almost everywhere in $M$. Still by (3.33), we get with similar arguments to those used to prove (3.25) that
\[ u_k^{2^\star - 1} \rightharpoonup u^{2^\star - 1} \ \text{weakly in } L^{2^\star/(2^\star - 1)}(M), \quad \text{and} \quad \frac{u_k}{(\varepsilon_k + u_k^2)^{2^\flat + 1}} \to \frac{1}{u^{2^\star + 1}} \ \text{strongly in } L^2(M) \tag{3.34} \]

as $k \to +\infty$. By (3.31) and (3.34), letting $k \to +\infty$ in (3.31), we get that $u$ is a weak solution of the Einstein–scalar field Lichnerowicz equation (1.1). Rewriting (1.1) as
\[ \Delta_g u + \left( h - \frac{A}{u^{2^\star + 2}} \right) u = B u^{2^\star - 1}, \]
and since $h - A u^{-2^\star - 2} \in L^\infty(M)$, the regularity arguments developed in Trudinger [20] apply to (1.1). It follows that $u \in L^s(M)$ for some $s > 2^\star$. Since $u \ge \delta_0$ almost everywhere, and $\delta_0 > 0$, the standard bootstrap procedure, together with regularity theory, gives that $u$ is a smooth positive solution of (1.1). This ends the proof of the theorem.


As a remark, the above proof provides an explicit expression for the dimensional constant $C$ in (3.3). As another remark, it can be noted that when $\int_M B\,dv_g > 0$, then we can take $\varphi$ to be constant in (3.12). In particular, our existence result has the following corollary.

Corollary 3.1. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$ and $h$ a smooth function on $M$ for which $\Delta_g + h$ is coercive. There exists a constant $C = C(n, h)$, $C > 0$, such that if $A$ and $B$ are smooth functions on $M$, with $A > 0$ in $M$, $\max_M B > 0$, and $\int_M B\,dv_g > 0$, and if we further assume that
\[ (\max_M |B|)^{n-1} \int_M A\,dv_g \le C(n, h), \tag{3.35} \]
then the Einstein–scalar field Lichnerowicz equation (1.1) possesses a smooth positive solution.

When $A > 0$ and $B > 0$, we can also take $\varphi = A^{(n-2)/(4n)}$ in (3.12), and our existence result has the following corollary.

Corollary 3.2. Let $(M, g)$ be a smooth compact Riemannian manifold of dimension $n \ge 3$ and $h$ a smooth function on $M$ for which $\Delta_g + h$ is coercive. There exists a constant $C = C(n, h)$, $C > 0$, such that if $A$ and $B$ are smooth functions on $M$, with $A > 0$ and $B > 0$ in $M$, and if we further assume that
\[ (\max_M |B|)^{n-1}\,\left\| A^{(n-2)/(4n)} \right\|_{H^1}^{2^\star} \int_M A^{1/2}\,dv_g \le C(n, h), \tag{3.36} \]
then the Einstein–scalar field Lichnerowicz equation (1.1) possesses a smooth positive solution.

Interestingly, Sobolev embedding implies that
\[ \int_M A^{1/2}\,dv_g \le S \left\| A^{(n-2)/(4n)} \right\|_{H^1}^{2^\star}, \]
and so, if $A$ and $B$ satisfy (3.36), then
\[ (\max_M |B|)^{n-1} \left( \int_M A^{1/2}\,dv_g \right)^2 \le C(n, h)\,S, \]

which is reminiscent of the condition (with the opposite inequality) that ensured the non-existence of a solution, which was obtained in Theorem 2.2.

4. Einstein-Maxwell-Scalar Field Theory

The methods employed in Sects. 2 and 3 are strong enough to deal with additional nonlinear negative power terms in the equation of the form $C u^{-p}$ for $C \ge 0$ and $p > 1$. Such terms arise, for example, in the Einstein–Maxwell–scalar field theory. Given $(M, g)$ compact of dimension $n \ge 3$, we let $h$, $A$, $B$, and $C$ be smooth functions in $M$, and we briefly discuss in this section equations of the form
\[ \Delta_g u + h u = B u^{2^\star - 1} + \frac{A}{u^{2^\star + 1}} + \frac{C}{u^p}, \tag{4.1} \]


where $A, C \ge 0$ and $p > 1$. In the case of the Einstein–Maxwell–scalar field theory in (spatial) dimension $n = 3$ we have $p = 3$ and $C \ge 0$ represents the sum of the squares of the norms of the electric and magnetic fields on $M$. The approach we used to prove Theorem 2.2 deals with inequalities resulting from the signs of the coefficients and the powers of the unknown function $u$ and thus applies to (4.1). Let $\hat{p} = \frac{2^\star + p - 1}{2^\star - 1}$. Then, if we concentrate on getting nonexistence results of smooth positive solutions with no a priori bound on the energy, the approach we used to prove Theorem 2.2 gives in particular that (4.1) does not possess a smooth positive solution if $B > 0$ in $M$, $A, C \ge 0$ in $M$, and either (2.3) holds true, or
\[ \left( \frac{(\alpha + 1)^{\alpha + 1}}{\alpha^\alpha} \right)^{1/\hat{p}} \int_M C^{1/\hat{p}}\,B^{(\hat{p} - 1)/\hat{p}}\,dv_g > \int_M (h^+)^{(n+2)/4}\,B^{(2-n)/4}\,dv_g, \tag{4.2} \]

where $\alpha = (n-2)(p+1)/4$. We also get similar conditions to (4.2) for the nonexistence of solutions of (4.1) of energy bounded by $\Lambda$. The method we used to prove Theorem 3.1 applies to (4.1) as well. Assume $\Delta_g + h$ is coercive, $A, C \ge 0$ in $M$, $A + C > 0$ in $M$, and $\max_M B > 0$. Following the proof of Theorem 3.1 we get that there exists $\Lambda = \Lambda(n, p)$, $\Lambda > 0$ depending only on $n$ and $p$, such that if
\[ \int_M \frac{A}{\varphi^{2^\star}}\,dv_g \le \frac{\Lambda}{(S_h \max_M |B|)^{n-1}}, \qquad \int_M \frac{C}{\varphi^{p-1}}\,dv_g \le \frac{\Lambda}{(S_h \max_M |B|)^{\alpha}} \tag{4.3} \]
and
\[ \int_M B \varphi^{2^\star}\,dv_g > 0 \]

for some smooth positive function $\varphi > 0$ in $M$ such that $\|\varphi\|_{H^1_h} = 1$, where $\|\cdot\|_{H^1_h}$ is as in (3.1), $S_h$ is as in (3.2), and $\alpha$ is as in (4.2), then (4.1) possesses a smooth positive solution. As for (3.3), the constant in (4.3) can be made explicit.

References

1. Ambrosetti, A., Rabinowitz, P.: Dual variational methods in critical point theory and applications. J. Funct. Anal. 14, 349–381 (1973)
2. Aubin, T.: Nonlinear Analysis on Manifolds. Monge-Ampère Equations. Grund. der Math. Wissenschaften, 252. New York: Springer-Verlag, 1982
3. Bartnik, R., Isenberg, J.: The constraint equations. In: The Einstein Equations and the Large Scale Behavior of Gravitational Fields, edited by P.T. Chruściel, H. Friedrich, Basel: Birkhäuser, 2004, pp. 1–39
4. Brendle, S.: Blow-up phenomena for the Yamabe PDE in high dimensions. To appear J. Amer. Math. Soc., DOI:10.1090/S0894-0347-07-00575-9, 2007
5. Choquet-Bruhat, Y., Geroch, R.: Global aspects of the Cauchy problem in general relativity. Commun. Math. Phys. 14, 329–335 (1969)
6. Choquet-Bruhat, Y., Isenberg, J., Pollack, D.: The Einstein–scalar field constraints on asymptotically Euclidean manifolds. Chin. Ann. Math. Ser. B 27(1), 31–52 (2006)
7. Choquet-Bruhat, Y., Isenberg, J., Pollack, D.: The constraint equations for the Einstein–scalar field system on compact manifolds. Class. Quantum Grav. 24, 809–828 (2007)
8. Choquet-Bruhat, Y., York, J.: The Cauchy Problem. In: General Relativity and Gravitation - The Einstein Centenary, edited by A. Held, New York: Plenum, 1980, pp. 99–172
9. Druet, O., Hebey, E.: Blow-up examples for second order elliptic PDEs of critical Sobolev growth. Trans. Amer. Math. Soc. 357, 1915–1929 (2004)
10. Foures-Bruhat, Y.: Théorème d'existence pour certains systèmes d'équations aux dérivées partielles non linéaires. Acta Math. 88, 141–225 (1952)
11. Isenberg, J.: Constant mean curvature solutions of the Einstein constraint equations on closed manifolds. Class. Quantum Grav. 12, 2249–2274 (1995)


12. Isenberg, J., Maxwell, D., Pollack, D.: A gluing construction for non-vacuum solutions of the Einstein constraint equations. Adv. Theor. Math. Phys. 9(1), 129–172 (2005)
13. Ilias, S.: Constantes explicites pour les inégalités de Sobolev sur les variétés riemanniennes compactes. Ann. Inst. Fourier 33, 151–165 (1983)
14. Kazdan, J.L., Warner, F.W.: Scalar curvature and conformal deformation of Riemannian structure. J. Differ. Geom. 10, 113–134 (1975)
15. Rabinowitz, P.: Minimax methods in critical point theory with applications to differential equations. CBMS Regional Conference Series in Mathematics 65, Providence RI: Amer. Math. Soc., 1986
16. Rendall, A.: Accelerated cosmological expansion due to a scalar field whose potential has a positive lower bound. Class. Quantum Grav. 21, 2445–2454 (2004)
17. Rendall, A.: Mathematical properties of cosmological models with accelerated expansion. In: Analytical and numerical approaches to mathematical relativity, Lecture Notes in Phys. 692, Berlin: Springer, 2006, pp. 141–155
18. Rendall, A.: Intermediate inflation and the slow-roll approximation. Class. Quantum Grav. 22, 1655–1666 (2005)
19. Sahni, V.: Dark matter and dark energy. In: Physics of the Early Universe, edited by E. Papantonopoulos, Berlin: Springer, 2005
20. Trudinger, N.S.: Remarks concerning the conformal deformation of Riemannian structures on compact manifolds. Ann. Scuola Norm. Sup. Pisa 22, 265–274 (1968)

Communicated by G.W. Gibbons

Commun. Math. Phys. 278, 133–144 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0382-4


Catalytic Majorization and $\ell_p$ Norms

Guillaume Aubrun, Ion Nechita

Université de Lyon, Université Lyon 1, CNRS, UMR 5208 Institut Camille Jordan, Batiment du Doyen Jean Braconnier, 43, boulevard du 11 novembre 1918, 69622 Villeurbanne Cedex, France.

Received: 15 February 2007 / Accepted: 23 May 2007
Published online: 11 December 2007 – © Springer-Verlag 2007

Abstract: An important problem in quantum information theory is the mathematical characterization of the phenomenon of quantum catalysis: when can the surrounding entanglement be used to perform transformations of a jointly held quantum state under LOCC (local operations and classical communication)? Mathematically, the question amounts to describing, for a fixed vector $y$, the set $T(y)$ of vectors $x$ such that we have $x \otimes z \prec y \otimes z$ for some $z$, where $\prec$ denotes the standard majorization relation. Our main result is that the closure of $T(y)$ in the $\ell_1$ norm can be fully described by inequalities on the $\ell_p$ norms: $\|x\|_p \le \|y\|_p$ for all $p \ge 1$. This is a first step towards a complete description of $T(y)$ itself. It can also be seen as an $\ell_p$-norm analogue of the Ky Fan dominance theorem about unitarily invariant norms. The proof exploits links with another quantum phenomenon: the possibility of multiple-copy transformations ($x^{\otimes n} \prec y^{\otimes n}$ for given $n$). The main new tool is a variant of Cramér's theorem on large deviations for sums of i.i.d. random variables.

1. Introduction

The increasing interest that quantum entanglement has received in the past decade is due, in part, to its use as a resource in quantum information processing. We investigate the problem of entanglement transformation: under which conditions can an entangled state $|\phi\rangle$ be transformed into another entangled state $|\psi\rangle$? We restrict ourselves to LOCC protocols: Alice and Bob share $|\phi\rangle$ and have at their disposal only local operations (such as unitaries $U_A \otimes I_B$ for Alice) and classical communication. Nielsen showed in [15] that such a transformation is possible if and only if $\lambda_\phi \prec \lambda_\psi$, where "$\prec$" is the majorization relation and $\lambda_\phi$, $\lambda_\psi$ are the Schmidt coefficient vectors of $|\phi\rangle$ and $|\psi\rangle$, respectively. At practically the same time, Jonathan and Plenio [9] discovered a striking phenomenon: entanglement can help LOCC communication, without even being consumed. Precisely, they found states $|\phi\rangle$ and $|\psi\rangle$ such that $|\phi\rangle$ cannot be transformed into $|\psi\rangle$, but, with the help of a catalyst state $|\chi\rangle$, the transformation $|\phi\rangle \otimes |\chi\rangle \to |\psi\rangle \otimes |\chi\rangle$ is possible.


When such a catalyst exists, we say that the state $|\phi\rangle$ is trumped by $|\psi\rangle$ and we write $\lambda_\phi \prec_T \lambda_\psi$. We say then that $|\phi\rangle$ can be transformed into $|\psi\rangle$ by entanglement-assisted LOCC or ELOCC. It turns out that the trumping relation is much more complicated than the majorization relation; one can easily check on two given states $|\phi\rangle$ and $|\psi\rangle$ whether $\lambda_\phi \prec \lambda_\psi$ is satisfied or not, but there is no direct way to determine if $\lambda_\phi \prec_T \lambda_\psi$. Later, Bandyopadhyay et al. [2] discovered that a similar situation occurs when trying to transform by LOCC multiple copies of $|\phi\rangle$ into $|\psi\rangle$. It may happen that the transformation $|\phi\rangle \to |\psi\rangle$ is not possible, but when considering $n$ copies, one can transform $|\phi\rangle^{\otimes n}$ into $|\psi\rangle^{\otimes n}$. The phenomenon of multiple simultaneous LOCC transformations, or MLOCC, has been intensively studied in recent years and many similarities with ELOCC have been found [7,8].

In this note, we make some progress towards a complete characterization of both ELOCC and MLOCC. We show that a set of inequalities involving $\ell_p$ norms (see the remark on Conjecture 1 at the end of the paper) is equivalent to the fact that $|\phi\rangle$ can be approached by a sequence of states $|\phi_n\rangle$ which are MLOCC/ELOCC-dominated by $|\psi\rangle$. An important point is that we allow the dimension of $|\phi_n\rangle$ to exceed the dimension of $|\phi\rangle$. Our proof uses probabilistic tools; we introduce probability measures associated to $|\phi\rangle$ and $|\psi\rangle$ and we use large deviation techniques to show the desired result. Interestingly, the result can be reversed to give a characterization of $\ell_p$ norms that is similar to the Ky Fan characterization of unitarily invariant norms. We refer the interested reader to Sect. 3.

The rest of the paper is organized as follows: in Sect. 2 we introduce the notation and the general framework of entanglement transformation of bipartite states. We also state our main result, Theorem 1. The theorem is proved in Sect. 4. Conclusions and some directions for further study are sketched in Sect. 5. The Appendix at the end of the paper contains basic results from large deviation theory needed in the proof of the main theorem.

2. Notation and Statement of the Results

For $d \in \mathbb{N}^*$, let $P_d$ be the set of $d$-dimensional probability vectors: $P_d = \{x \in \mathbb{R}^d \text{ s.t. } x_i \ge 0, \sum_i x_i = 1\}$. If $x \in P_d$, we write $x^\downarrow$ for the decreasing rearrangement of $x$, i.e. the vector $x^\downarrow \in P_d$ such that $x$ and $x^\downarrow$ have the same coordinates up to permutation, and $x_i^\downarrow \ge x_{i+1}^\downarrow$. We shall also write $x_{\max}$ for $x_1^\downarrow$ and $x_{\min}$ for the smallest nonzero coordinate of $x$.

There is an operation on probability vectors that is fundamental in what follows: the tensor product $\otimes$. If $x = (x_1, \ldots, x_d) \in P_d$ and $x' = (x'_1, \ldots, x'_{d'}) \in P_{d'}$, the tensor product $x \otimes x'$ is the vector $(x_i x'_j)_{ij} \in P_{dd'}$; the way we order the coordinates of $x \otimes x'$ is immaterial for our purposes. We also define the direct sum $x \oplus x'$ as the concatenated vector $(x_1, \ldots, x_d, x'_1, \ldots, x'_{d'}) \in \mathbb{R}^{d+d'}$.

If $x \in P_d$ satisfies $x_d = 0$, it will be useful to identify $x$ with the truncated vector $(x_1, \ldots, x_{d-1}) \in P_{d-1}$. This identification induces a canonical inclusion $P_{d-1} \subset P_d$. Thus, every vector $x \in P_d$ can be thought of as a vector of $P_{d'}$ for all $d' \ge d$ by appending $d' - d$ null elements to $x$. We consider thus the set of all probability vectors $P_{<\infty} = \bigcup_{d > 0} P_d$. In other words, $P_{<\infty}$ is the set of finitely supported probability vectors.


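Let us now introduce the classical majorization relation [13,3]. If $x, y \in \mathbb{R}^d$ we define the submajorization relation $\prec_w$ as follows:
\[ x \prec_w y \quad \text{iff} \quad \forall k \in \{1, \ldots, d\}, \ \sum_{i=1}^k x_i^\downarrow \le \sum_{i=1}^k y_i^\downarrow, \]
and the majorization relation $\prec$ as
\[ x \prec y \quad \text{iff} \quad \sum_{i=1}^d x_i = \sum_{i=1}^d y_i \ \text{ and } \ \forall k \in \{1, \ldots, d-1\}, \ \sum_{i=1}^k x_i^\downarrow \le \sum_{i=1}^k y_i^\downarrow. \]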
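The two definitions above translate directly into a partial-sum comparison. The following is a minimal sketch; the function names `submajorized` and `majorized`, the tolerance `tol`, and the zero padding used to compare vectors of different sizes are our own choices, and the padding is only meaningful for vectors with non-negative entries.

```python
from itertools import accumulate


def partial_sums_desc(v):
    """Partial sums of the decreasing rearrangement of v."""
    return list(accumulate(sorted(v, reverse=True)))


def submajorized(x, y, tol=1e-12):
    """x ≺_w y for non-negative vectors, after padding the shorter one with zeros."""
    d = max(len(x), len(y))
    xs = partial_sums_desc(list(x) + [0.0] * (d - len(x)))
    ys = partial_sums_desc(list(y) + [0.0] * (d - len(y)))
    return all(a <= b + tol for a, b in zip(xs, ys))


def majorized(x, y, tol=1e-12):
    """x ≺ y: equal total mass plus the partial-sum inequalities."""
    return abs(sum(x) - sum(y)) <= tol and submajorized(x, y, tol)


y = [0.5, 0.25, 0.25, 0.0]
uniform_ok = majorized([0.25] * 4, y)            # the uniform vector is majorized by y
not_ok = majorized([0.4, 0.4, 0.1, 0.1], y)      # fails at k = 2: 0.8 > 0.75
deficient_ok = submajorized([0.3, 0.2], y)       # a deficient vector of total mass 0.5
```

The last call illustrates why submajorization is the convenient notion for vectors of total mass less than 1: the partial-sum inequalities hold although the total masses differ, so `majorized` would return `False` for the same pair.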

We usually work with probability vectors, for which both relations coincide. However, it will be useful in the proof to work with deficient vectors (of total mass less than 1) and to use submajorization. We write $S_d(y)$ for the set of vectors $x$ in $P_d$ which are majorized by $y$. It is well-known that $S_d(y)$ is a compact convex set whose extreme points are the vectors obtained by permuting the coordinates of $y$; many other characterizations of $S_d(y)$ are known [16,4]. This relation behaves well with respect to direct sums and tensor products: $x \prec y$ implies $x \oplus z \prec y \oplus z$ and $x \otimes z \prec y \otimes z$ for any $z \in P_{<\infty}$.

The majorization relation has been shown to have a very important role in quantum information. Nielsen has proved [15] that a state $|\phi\rangle$ belonging to Alice and Bob can be transformed into the state $|\psi\rangle$ by using local operations and classical communication (LOCC) if and only if $\lambda_\phi \prec \lambda_\psi$, where $\lambda_\phi$ (respectively $\lambda_\psi$) is the vector of eigenvalues of the density matrix for Alice's system when the joint system is in the state $|\phi\rangle$ (respectively $|\psi\rangle$). Not long after Nielsen's theorem, Jonathan and Plenio discovered a very intriguing phenomenon: there exist states $|\phi\rangle$ and $|\psi\rangle$ such that the transformation $|\phi\rangle \to |\psi\rangle$ is impossible by LOCC, but, with the aid of a catalyst state $|\chi\rangle$, the transformation $|\phi\rangle \otimes |\chi\rangle \to |\psi\rangle \otimes |\chi\rangle$ becomes possible; we say that $|\phi\rangle$ can be transformed into $|\psi\rangle$ by entanglement-assisted LOCC or ELOCC. This result has motivated a more complex relation between probability vectors: if $x, y \in P_d$, we say that $y$ trumps $x$ and write $x \prec_T y$ if there exists $z \in P_{<\infty}$ such that $x \otimes z \prec y \otimes z$. It is important to require that the auxiliary vector $z$ (called the catalyst) is finitely supported (see Remark 2). Given $y \in P_d$, we write $T_d(y)$ for the set of $d$-dimensional vectors trumped by $y$, that is $T_d(y) = \{x \in P_d \text{ s.t. } x \prec_T y\}$. The set $T_d(y)$ is in general larger than $S_d(y)$ [5] and much more complicated to describe. Up to now, there is no known simple procedure to decide whether $x \in T_d(y)$ or not. Hence, finding a tractable characterization of the relation $\prec_T$ (or, equivalently, of the set $T_d(y)$) has become an important open problem in quantum information theory [18]. The geometry of $T_d(y)$ has been studied in [4,5]: it is a bounded convex set that is neither closed nor open (provided $y$ is not too simple).

We shall introduce now another important extension of LOCC transformations. Bandyopadhyay et al. [2] found an example of entangled states $|\phi\rangle$ and $|\psi\rangle$ with the property that the LOCC transformation $|\phi\rangle \to |\psi\rangle$ is impossible but, when one tries to transform multiple copies of the states, the transformation $|\phi\rangle^{\otimes n} \to |\psi\rangle^{\otimes n}$ becomes possible. We say that $|\psi\rangle$ MLOCC-dominates $|\phi\rangle$. We introduce the analogue of the trumping relation for probability vectors:
\[ x \prec_M y \quad \text{iff} \quad \exists n \ge 1 \text{ s.t. } x^{\otimes n} \prec y^{\otimes n}, \]


and the set of probability vectors MLOCC-dominated by a given vector $y$: $M_d(y) = \{x \in P_d \text{ s.t. } x \prec_M y\}$. Not much is known about the set $M_d(y)$. It has been studied in [7] and shown to have many similarities with the set $T_d(y)$: for example it is neither closed nor open in general. One important point is that, for all $y$, we have $M_d(y) \subseteq T_d(y)$ (see [7]).

We report progress towards a description of the sets $M_d(y)$ and $T_d(y)$. The main ingredient of our approach is the following observation. Consider two vectors $x, y \in P_d$. Whether $x \prec y$, $x \prec_M y$, $x \prec_T y$ or not depends only on the non-zero coordinates of $x$ and $y$. Thus, it is possible to $\prec/\prec_M/\prec_T$-compare vectors of different sizes by appending the necessary number of zero coordinates to the end of one of them. Hence, it seems more natural (at least from a mathematical point of view) to consider the sets
\[ T_{<\infty}(y) = \{x \in P_{<\infty} \text{ s.t. } x \prec_T y\} = \{x \in P_{<\infty} \text{ s.t. } \exists z \in P_{<\infty} \text{ s.t. } x \otimes z \prec y \otimes z\} = \bigcup_{d' \ge d} T_{d'}(y) \]
and
\[ M_{<\infty}(y) = \{x \in P_{<\infty} \text{ s.t. } x \prec_M y\} = \{x \in P_{<\infty} \text{ s.t. } \exists n \ge 1 \text{ s.t. } x^{\otimes n} \prec y^{\otimes n}\} = \bigcup_{d' \ge d} M_{d'}(y). \]
The important point here is that both $T_{<\infty}(y)$ and $M_{<\infty}(y)$ no longer depend on the size of $y$, but only on the non-null coordinates of $y$. Of course, if $y \in P_d$, $T_d(y) = T_{<\infty}(y) \cap P_d$ and $M_d(y) = M_{<\infty}(y) \cap P_d$; this shows that the sets $T_{<\infty}(y)$ and $M_{<\infty}(y)$ are not closed either in general (otherwise $T_d(y)$ and $M_d(y)$ would also be closed). We then write $\overline{T_{<\infty}(y)}$ and $\overline{M_{<\infty}(y)}$ to denote the closure taken with respect to the $\ell_1$-norm, the natural topology in this setting (see Remark 3). Recall that for $p \ge 1$, the $\ell_p$ norm of a vector $x \in P_d$ is defined as
\[ \|x\|_p = \left( \sum_{i=1}^d x_i^p \right)^{1/p} \tag{1} \]
and $\|x\|_\infty = \max_i x_i$. We now come to our main result:

Theorem 1. Consider two vectors $x, y \in P_{<\infty}$. The following assertions are equivalent:
(a) $x \in \overline{M_{<\infty}(y)}$,
(b) $x \in \overline{T_{<\infty}(y)}$,
(c) $\|x\|_p \le \|y\|_p$, $\forall p \ge 1$.

Remark 1. Note that instead of demanding that $\|x\|_p \le \|y\|_p$ for all $p \ge 1$, it suffices to ask, for $x, y \in P_d$, that the inequality holds for all $p \in [1, p_{\max}(x, y)]$, where $p_{\max}(x, y) = \log d/(\log y_{\max} - \log x_{\max})$. The inequalities for $p > p_{\max}$ follow by simple computation. For such results in a more general setting, see [14].
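Condition (c), together with the cut-off of Remark 1, lends itself to a quick numerical check. The sketch below samples $p$ on a uniform grid of $[1, p_{\max}]$; the grid size, the tolerance and the function names are our own assumptions, and a grid check is a heuristic rather than a proof. It assumes $y_{\max} > x_{\max}$, so that $p_{\max}$ is well defined and positive. The test vectors are the pair discussed in Remark 2 of the source.

```python
import math


def lp_norm(v, p):
    """The l_p norm of a non-negative vector (zero entries are skipped)."""
    return sum(t ** p for t in v if t > 0) ** (1.0 / p)


def p_max(x, y):
    """Cut-off exponent of Remark 1; requires max(y) > max(x)."""
    d = max(len(x), len(y))
    return math.log(d) / (math.log(max(y)) - math.log(max(x)))


def satisfies_condition_c(x, y, steps=2000, tol=1e-12):
    """Grid check of ||x||_p <= ||y||_p for p in [1, p_max(x, y)]."""
    pm = p_max(x, y)
    grid = [1.0 + (pm - 1.0) * k / steps for k in range(steps + 1)]
    return all(lp_norm(x, p) <= lp_norm(y, p) + tol for p in grid)


x = [0.4, 0.4, 0.2]
y = [0.5, 0.25, 0.25]
cutoff = p_max(x, y)                 # log 3 / log 1.25, a little below 5
holds = satisfies_condition_c(x, y)
```

The reason a bounded range of exponents suffices is the elementary chain $\|x\|_p \le d^{1/p} x_{\max} \le y_{\max} \le \|y\|_p$, whose middle inequality is equivalent to $p \ge \log d/(\log y_{\max} - \log x_{\max})$.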


Remark 2. It is important to see at this point how the set $\overline{T_{<\infty}(y)}$ is related to the set $T_d(y)$. First of all, note that if we drop the closure, we have equality: $T_{<\infty}(y) \cap P_d = T_d(y)$ for $y \in P_d$. However, when taking the $\ell_1$ closure of the left-hand side, we obtain a strict inclusion: $\overline{T_d(y)} \subsetneq \overline{T_{<\infty}(y)} \cap P_d$. An example for such a vector is provided by the phenomenon of infinite-dimensional catalysis, discovered by Daftuar [4]. Take $y = (0.5, 0.25, 0.25)$ and $x = (0.4, 0.4, 0.2)$. It is obvious that $x \notin \overline{T_d(y)}$ because $x_3 < y_3$ and the condition $x_d \ge y_d$ is necessary for $x \in \overline{T_d(y)}$. However, there exists an infinite-dimensional catalyst $z = (1 - \alpha)(1, \alpha, \alpha^2, \ldots, \alpha^k, \ldots)$, where $\alpha = 2^{-1/8}$, such that $x \otimes z \prec y \otimes z$ and $\|x \otimes z\|_p \le \|y \otimes z\|_p$ for all $p \ge 1$. Note that $z$ is $\ell_p$-bounded and thus $\|x\|_p \le \|y\|_p$ for all $p \ge 1$. By the preceding theorem, we have that $x \in \overline{T_{<\infty}(y)} \cap P_3$. For further remarks on this topic, see Sect. 5.

Remark 3. The use of the $\ell_1$ norm is natural in this context from a mathematical point of view since $P_{<\infty}$ is a subset of the norm-closed hyperplane of $\ell_1$ defined by $\sum_i x_i = 1$. Let us explain also how it relates to other physically motivated distances between the approaching states $|\phi_n\rangle$ and the original state $|\phi\rangle$. Recall that $x$ is the eigenvalue vector of the reduced density matrix $\rho$ corresponding to Alice's (or, equivalently, to Bob's) part of the system. From the details of the proof (see also Sect. 5), one sees that the size of the approaching vectors $x_n$ increases with $n$. So, in order to compare $\rho$ and $\rho_n$, we have to realize them as density matrices on the same Hilbert space $H$. Moreover, we can suppose that the two states are diagonalizable in the same basis (Alice can achieve this by applying a local unitary basis change). As usual, we append the necessary number of zero eigenvalues to $x$ in order to have the same size as $x_n$. We obtain the following equality: $\|x - x_n\|_1 = \|\rho - \rho_n\|_{\mathrm{tr}}$. So, for Alice's part of the system, we obtain a convergence in the trace norm sense. It is well known that the trace norm distance is related to the probability that the two states can be distinguished by some measurement. Moreover, by using some classical inequalities (see [17], Chapter 9), the fidelity $F(\rho, \rho_n)$ can be shown to converge to 1.

3. An $\ell_p$ Version of Ky Fan Theorem

In this section, we explain how Theorem 1 can be seen as an analogue of the Ky Fan dominance theorem. We refer to [3] for background. We denote by $\mathcal{M}_d$ the space of complex $d \times d$ matrices. A norm $|||\cdot|||$ on $\mathcal{M}_d$ is said to be unitarily invariant if $|||UAV||| = |||A|||$ for all unitary matrices $U, V$. A norm $\|\cdot\|$ on $\mathbb{R}^d$ is said to be symmetric if
\[ \|(x_1, \ldots, x_d)\| = \|(\pm x_{\sigma(1)}, \ldots, \pm x_{\sigma(d)})\| \]
for all choices of signs in $\{\pm 1\}^d$ and all permutations $\sigma \in \mathfrak{S}_d$. It is well-known ([3], Theorem IV.2.1) that unitarily invariant norms on $\mathcal{M}_d$ are in 1-to-1 correspondence with symmetric norms on $\mathbb{R}^d$ (consider the restriction of $|||\cdot|||$ to diagonal matrices). Examples of unitarily invariant norms are given by Ky Fan norms, defined for $k = 1, 2, \ldots, d$ by
\[ |||A|||_{(k)} = \sum_{j=1}^k s_j(A), \]


where $s_1(A) \ge \cdots \ge s_d(A)$ denote the ordered singular values of a matrix $A$. The Ky Fan dominance theorem asserts that these norms are extremal among unitarily invariant norms in the following sense: if $A, B$ satisfy $|||A|||_{(k)} \le |||B|||_{(k)}$ for any $k = 1, \ldots, d$, then $|||A||| \le |||B|||$ for any unitarily invariant norm; this condition can also be formulated as $s(A) \prec_w s(B)$, where $s(\cdot)$ denotes the vector of singular values of a matrix. This gives a way to derive an infinite family of inequalities from a finite one. However, this may be too strong a requirement and one can wonder what happens for an important special class of unitarily invariant norms: the Schatten $p$-norms (or noncommutative $\ell_p$ norms), defined for $p \ge 1$ by
\[ |||A|||_p = \left( \sum_{j=1}^d s_j(A)^p \right)^{1/p}. \]

To state our result, we need to compare matrices of different sizes. If $d < d'$ we identify $M_d$ with the top-left corner of $M_{d'}$; this gives a natural inclusion $M_d \subset M_{d'}$ and we write $M_{<\infty} = \bigcup_d M_d$. Note that the tensor product of matrices is a well-defined operation on $M_{<\infty}$.

Theorem 2 Let $A, B \in M_d$. The following are equivalent:
(1) $|||A|||_p \le |||B|||_p$ for all $p \ge 1$.
(2) There exists in $M_{<\infty}$ a sequence $(A_n)$ so that $\lim_{n\to\infty} |||A_n - A|||_1 = 0$ and $|||A_n^{\otimes n}||| \le |||B^{\otimes n}|||$ for all unitarily invariant norms $|||\cdot|||$ (or, equivalently, so that $s(A_n^{\otimes n}) \prec_w s(B^{\otimes n})$).

Of course, a main difference between this result and the Ky Fan dominance theorem is that condition (2) here is hard to check and involves infinitely many inequalities.

Proof (sketch). Because of the bijective correspondence between unitarily invariant norms on matrices and symmetric norms on vectors, it is enough to prove the theorem for positive diagonal matrices. This is almost the content of the equivalence (a) $\Leftrightarrow$ (c) of Theorem 1. The only slight remark that we need in order to get condition (2) as stated here is the following: in Lemma 2 below, it follows from the proof that we can actually choose the integer $n$ so that $x^{\otimes N} \prec_w y^{\otimes N}$ for any $N \ge n$.

4. The Proof of Theorem 1

We shall prove the sequence of implications (a) ⇒ (b) ⇒ (c) ⇒ (a). The first two are well known; we sketch their proof for completeness. The third is the most difficult one and represents our contribution to the theorem.

(a) ⇒ (b) Because the closure is taken with respect to the same topology ($\ell_1$) for both $M_{<\infty}(y)$ and $T_{<\infty}(y)$, it is enough to show $M_{<\infty}(y) \subset T_{<\infty}(y)$. Let $x \in M_{<\infty}(y)$ and consider $n$ such that $x^{\otimes n} \prec y^{\otimes n}$. The trick here (see [7]) is to use the following $z$ as a catalyst:
\[ z = x^{\otimes(n-1)} \oplus x^{\otimes(n-2)} \otimes y \oplus \cdots \oplus x \otimes y^{\otimes(n-2)} \oplus y^{\otimes(n-1)}. \]
For simplicity we do not normalize $z$, but this is irrelevant. The vector $z$ has been constructed such that $x \otimes z = x^{\otimes n} \oplus w$ and $y \otimes z = y^{\otimes n} \oplus w$, where $w$ is the same in both expressions. This implies that $x \otimes z \prec y \otimes z$, i.e. $x \in T_{<\infty}(y)$.
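The bookkeeping behind the catalyst can be checked mechanically. The following is a minimal sketch in Python (exact rational arithmetic; the helper names `tensor`, `tensor_power`, `catalyst` and `common_part` are ours, and the vectors are arbitrary illustrative choices). It only verifies the two identities $x \otimes z = x^{\otimes n} \oplus w$ and $y \otimes z = y^{\otimes n} \oplus w$ as multisets of coordinates, which is all that majorization sees; it does not test majorization itself.

```python
from fractions import Fraction


def tensor(a, b):
    """Tensor product of two vectors, returned as a flat list."""
    return [s * t for s in a for t in b]


def tensor_power(a, n):
    """n-fold tensor power; the 0-th power is the one-dimensional vector (1)."""
    out = [Fraction(1)]
    for _ in range(n):
        out = tensor(out, a)
    return out


def catalyst(x, y, n):
    """Unnormalized z: direct sum of x^(n-1-j) (x) y^j for j = 0, ..., n-1."""
    z = []
    for j in range(n):
        z += tensor(tensor_power(x, n - 1 - j), tensor_power(y, j))
    return z


def common_part(x, y, n):
    """w: direct sum of x^(n-j) (x) y^j for j = 1, ..., n-1."""
    w = []
    for j in range(1, n):
        w += tensor(tensor_power(x, n - j), tensor_power(y, j))
    return w


# Illustrative vectors (not taken from the paper); dimensions may differ.
x = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
y = [Fraction(3, 5), Fraction(2, 5)]
n = 3
z = catalyst(x, y, n)
w = common_part(x, y, n)

# Compare sorted coordinate lists, i.e. compare the vectors up to permutation.
lhs_x, rhs_x = sorted(tensor(x, z)), sorted(tensor_power(x, n) + w)
lhs_y, rhs_y = sorted(tensor(y, z)), sorted(tensor_power(y, n) + w)
print(lhs_x == rhs_x, lhs_y == rhs_y)
```

Since appending the same block $w$ to both sides preserves majorization, $x^{\otimes n} \prec y^{\otimes n}$ then gives $x \otimes z \prec y \otimes z$.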


(b) ⇒ (c) Let $z \in P_{<\infty}$ be the catalyst for $x \prec_T y$: $x \otimes z \prec y \otimes z$. A function $\varphi : \mathbb{R}^d \to \mathbb{R}$ is said to be Schur-convex if $a \prec b$ implies $\varphi(a) \le \varphi(b)$. It is well-known (see [13,16]) that if $h : \mathbb{R} \to \mathbb{R}$ is a convex function, then $\varphi : x \mapsto \sum_{i=1}^{d} h(x_i)$ is Schur-convex. Consequently, the functions $x \mapsto \|x\|_p^p$ are Schur-convex for $p \ge 1$. Moreover, they satisfy the identity $\|x \otimes z\|_p = \|x\|_p \|z\|_p$, and similarly for $y$. Since $\|z\|_p$ is finite, we get that $\|x\|_p \le \|y\|_p$. To show that the same is true for $x \in \overline{T_{<\infty}(y)}$, it suffices to check that the set of $x \in \ell_1$ such that $\|x\|_p \le \|y\|_p$ is norm-closed; this follows from the inequality $\|\cdot\|_p \le \|\cdot\|_1$.

(c) ⇒ (a) We will adapt some techniques used by G. Kuperberg in a slightly different context [12]. In our proof, we allow deficient vectors, i.e. vectors with total mass smaller than 1, and we use submajorization. As in [12], we associate to a positive vector $x \in \mathbb{R}^d$ the measure $\mu_x = \sum_{i=1}^{d} x_i \delta_{\log x_i}$, where $\delta_z$ is the Dirac measure at point $z$. The basic property is that the tensor product operation of vectors corresponds to the convolution of associated measures: $\mu_{x \otimes y} = \mu_x * \mu_y$. The convolution of two measures $\mu$ and $\nu$ is defined by the relation $\mu * \nu(A) = (\mu \times \nu)(\{(x, y) \in \mathbb{R}^2 : x + y \in A\})$. Moreover, if $\mu$ and $\nu$ are probability measures and $X_\mu$ and $X_\nu$ denote independent random variables with laws, respectively, $\mu$ and $\nu$, then $\mu * \nu$ is the law of $X_\mu + X_\nu$. The following lemma gives a way to prove majorization using comparison of the tails of the associated measures.

Lemma 1. Let $x$ and $y$ be two vectors of $\mathbb{R}^d$ with non-negative components. Consider the measures $\mu_x$ and $\mu_y$ associated with $x$ and $y$. Assume that, for all $t \in \mathbb{R}$, $\mu_x[t, \infty) \le \mu_y[t, \infty)$. Then $x \prec_w y$.

Proof. Note that
\[ \mu_x[t, \infty) = \sum_{i : \log x_i \ge t} x_i = \sum_{i : x_i \ge \exp(t)} x_i. \]
Thus, for all $u > 0$, $\sum_{i : x_i \ge u} x_i \le \sum_{i : y_i \ge u} y_i$. For simplicity, we assume first that all coordinates of $y$ are distinct. We will show by induction on $k \in \{1, \dots, d\}$ that $\sum_{i=1}^{k} x_i^{\downarrow} \le \sum_{i=1}^{k} y_i^{\downarrow}$. For the first step, use $u = y_1^{\downarrow}$ to conclude that $x_1^{\downarrow} \le y_1^{\downarrow}$. Now, fix $k \in \{1, \dots, d-1\}$ and suppose that $\sum_{i=1}^{k} x_i^{\downarrow} \le \sum_{i=1}^{k} y_i^{\downarrow}$. If $x_{k+1}^{\downarrow} \le y_{k+1}^{\downarrow}$, the induction step is obvious. If $x_{k+1}^{\downarrow} > y_{k+1}^{\downarrow}$, we use $u = x_{k+1}^{\downarrow}$ to get
\[ \sum_{i=1}^{k+1} x_i^{\downarrow} \le \sum_{i : x_i \ge x_{k+1}^{\downarrow}} x_i \le \sum_{i : y_i \ge x_{k+1}^{\downarrow}} y_i \le \sum_{i : y_i \ge y_{k+1}^{\downarrow}} y_i = \sum_{i=1}^{k+1} y_i^{\downarrow}. \]

This completes the induction when y has distinct coordinates. The general case follows by approximating y by y + εn , where (εn ) is a suitable sequence of positive vectors tending to 0. The approximation is possible since the set of vectors y majorizing a fixed x is closed.
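Lemma 1 lends itself to a brute-force sanity check. The sketch below (Python, exact rationals; the function names are ours) tests the tail condition in the equivalent form $\sum_{x_i \ge u} x_i \le \sum_{y_i \ge u} y_i$ and compares it with submajorization on a small grid of vectors. It also exhibits a pair showing that the tail condition is sufficient but not necessary for $x \prec_w y$.

```python
from fractions import Fraction
from itertools import product


def tails_dominated(x, y):
    """Check sum_{x_i >= u} x_i <= sum_{y_i >= u} y_i at every atom u = x_i > 0.

    Between two consecutive atoms of mu_x the left side is constant while the
    right side is non-increasing in u, so testing the atoms is enough.
    """
    return all(
        sum(v for v in x if v >= u) <= sum(v for v in y if v >= u)
        for u in x if u > 0
    )


def submajorized(x, y):
    """x ≺_w y: partial sums of the decreasing rearrangements are ordered."""
    d = max(len(x), len(y))
    xs = sorted(x, reverse=True) + [0] * (d - len(x))
    ys = sorted(y, reverse=True) + [0] * (d - len(y))
    sx = sy = 0
    for a, b in zip(xs, ys):
        sx += a
        sy += b
        if sx > sy:
            return False
    return True


# Exhaustive check of the implication of Lemma 1 on a small grid of vectors.
grid = [Fraction(k, 10) for k in range(4)]
lemma_holds = all(
    submajorized(x, y)
    for x in product(grid, repeat=3)
    for y in product(grid, repeat=3)
    if tails_dominated(x, y)
)

# The converse fails: this pair is submajorized but violates the tail condition.
x_bad = [Fraction(2, 5), Fraction(2, 5)]
y_bad = [Fraction(1, 2), Fraction(3, 10)]
print(lemma_holds, submajorized(x_bad, y_bad), tails_dominated(x_bad, y_bad))
```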


We now get to the key lemma in our argument. We shall use a slightly modified version of Cramér's large deviations theorem (see Appendix).

Lemma 2. Let $x, y$ in $\mathbb{R}^d$, with nonnegative coordinates. Assume that for any $1 \le p \le \infty$, we have the strict inequality $\|x\|_p < \|y\|_p$. Then there exists an integer $n$ such that $x^{\otimes n} \prec_w y^{\otimes n}$.

Proof. Consider $x$ and $y$ satisfying the hypotheses of the lemma. We can assume by multiplying both vectors by a positive constant $K$ that $\|y\|_1 = 1$. Let $p = 1 - \|x\|_1 > 0$. We introduce the measures $\mu_x$ and $\mu_y$ associated to $x$ and $y$; $\mu_y$ is a probability measure but $\mu_x$ is not, so we add a mass at $-\infty$ by setting $\tilde\mu_x = \mu_x + p\,\delta_{-\infty}$. Let $X$ and $Y$ be random variables distributed according to $\tilde\mu_x$ and $\mu_y$ respectively. We denote by $(X_n)$ (resp. $(Y_n)$) a sequence of i.i.d. copies of $X$ (resp. $Y$). We are going to show that for $n$ large enough
\[ \forall t \in \mathbb{R}, \quad P(X_1 + \cdots + X_n \ge nt) \le P(Y_1 + \cdots + Y_n \ge nt). \tag{2} \]
This is equivalent to showing that
\[ \int_{nt}^{\infty} d\tilde\mu_x^{*n} = \int_{nt}^{\infty} d\mu_x^{*n} \le \int_{nt}^{\infty} d\mu_y^{*n}, \]

which, by the previous lemma, implies $x^{\otimes n} \prec_w y^{\otimes n}$. Note that the asymptotic behavior of the quantities appearing in (2) is governed by Cramér's theorem. Let $f_n(t) = P(X_1 + \cdots + X_n \ge nt)^{1/n}$ and $g_n(t) = P(Y_1 + \cdots + Y_n \ge nt)^{1/n}$. Applying Cramér's theorem (see Appendix), we obtain
\[ f(t) := \lim_{n\to\infty} f_n(t) = \begin{cases} 1 - p & \text{if } t \le E(X \mid X \ne -\infty), \\ e^{-\Lambda_X^*(t)} & \text{otherwise,} \end{cases} \]
\[ g(t) := \lim_{n\to\infty} g_n(t) = \begin{cases} 1 & \text{if } t \le E(Y), \\ e^{-\Lambda_Y^*(t)} & \text{otherwise.} \end{cases} \]
Note also that the log-Laplace of $X$, defined for $\lambda \in \mathbb{R}$ by $\Lambda_X(\lambda) = \log E e^{\lambda X}$, is related to the $\ell_p$ norms of $x$:
\[ \forall \lambda \ge 0, \quad \Lambda_X(\lambda) = \log \|x\|_{\lambda+1}^{\lambda+1}. \]
The same holds for $Y$: $\Lambda_Y(\lambda) = \log \|y\|_{\lambda+1}^{\lambda+1}$, and thus we have $\Lambda_X(\lambda) < \Lambda_Y(\lambda)$ for $\lambda \ge 0$. Let $M_X = \operatorname{esssup} X = \log \|x\|_\infty$ and $M_Y = \operatorname{esssup} Y = \log \|y\|_\infty$; by hypothesis $M_X < M_Y$. First of all, note that $f_n(t) = 0$ for $t > M_X$, so it suffices to show that $f_n \le g_n$ on $(-\infty, M_X]$, for $n$ large enough. We claim that $f < g$ on $(-\infty, M_Y)$, and thus on $(-\infty, M_X]$. Indeed, for $E(Y) \le t < M_Y$, the supremum in the definition of $\Lambda_Y^*(t)$ is attained at a point $\lambda_0 \ge 0$ (cf. Appendix), so we have that
\[ f(t) \le e^{-(\lambda_0 t - \Lambda_X(\lambda_0))} < e^{-(\lambda_0 t - \Lambda_Y(\lambda_0))} = g(t), \]
where the strict inequality follows from the fact that $\Lambda_X(\lambda) < \Lambda_Y(\lambda)$ for all $\lambda \ge 0$. For $t < E(Y)$, $g(t) = 1$ and $f(t) \le 1 - p < 1$. Moreover, the functions $f$ and $g$ admit finite limits in $-\infty$: $\lim_{t\to-\infty} f(t) = 1 - p$ and $\lim_{t\to-\infty} g(t) = 1$. Thus, on the compact set $[-\infty, M_X]$, the functions $f$ and $g$ are well-defined, non-increasing, continuous and satisfy $f < g$. We now use the following elementary fact: if a sequence of non-increasing functions defined on a compact interval $I$ converges pointwise towards a continuous limit, then the convergence is actually uniform on $I$ (for a proof see [19] Part 2, Problem 127; this statement is attributed to Pólya or to Dini depending on authors). We apply this result to $(f_n)$ and $(g_n)$ on the interval $I = [-\infty, M_X]$ to conclude that the convergence is uniform for both sequences. As $f < g$, we can therefore find $n$ large enough such that $f_n \le g_n$ on $I$, and thus on $\mathbb{R}$. This is equivalent to (2) and completes the proof of the lemma.

Remark 4. It is possible to avoid the use of Cramér's theorem by using low-technology estimates on large deviations probability instead, as done in [12]. This requires additional care to get the required uniform bounds and slightly obfuscates the argument. The only advantage is to give explicit bounds for the value of $n$ in Lemma 2, which our compactness argument does not. These bounds are quite bad anyway, and for example do not allow one to replace the $\ell_1$-closure in the main theorem by an $\ell_p$-closure for some $p < 1$.

Proof of (c) ⇒ (a) (continued). Recall that $x$ and $y$ are such that $\|x\|_p \le \|y\|_p$ for any $p \ge 1$ and that we want to find, for any $\varepsilon > 0$ small enough, a vector $x_\varepsilon \in M_{<\infty}(y)$ such that $\|x - x_\varepsilon\|_1 \le \varepsilon$. Let $d_x$ (resp. $d_y$) be the number of nonzero coordinates of $x$ (resp. $y$). We proceed as follows: let $0 < \varepsilon < 2 d_x x_{\min}$ and consider the (deficient) vector $x'_\varepsilon$ obtained from $x$ by subtracting $\varepsilon/2d_x$ from each of its nonzero coordinates. This implies that $x'_\varepsilon$ is a positive vector, $\|x - x'_\varepsilon\|_1 = \varepsilon/2$ and that $x'_\varepsilon$ satisfies the hypotheses of Lemma 2. Applying the lemma, we obtain the existence of an integer $n$ such that $(x'_\varepsilon)^{\otimes n} \prec_w y^{\otimes n}$.

Remember that $x'_\varepsilon$ is deficient; we now enlarge it into a vector $x_\varepsilon \in P_{<\infty}$ by adding mass $\varepsilon/2$. But since we want to keep the property $x_\varepsilon^{\otimes n} \prec_w y^{\otimes n}$ (which is identical to $x_\varepsilon^{\otimes n} \prec y^{\otimes n}$), a safe way to do this is to add a large number of coordinates, each of them being very small. More precisely, let $x_\varepsilon = x'_\varepsilon \oplus \delta^{\oplus D}$, where $\delta D = \varepsilon/2$ and $\delta$ is a positive number such that $\delta\,(x'_\varepsilon)_{\max}^{n-1} \le \min\big((x'_\varepsilon)_{\min}^{n}, y_{\min}^{n}\big)$. We claim that $x_\varepsilon^{\otimes n} \prec y^{\otimes n}$, that is, for any $k \ge 1$,
\[ \sum_{i=1}^{k} (x_\varepsilon^{\otimes n})_i^{\downarrow} \le \sum_{i=1}^{k} (y^{\otimes n})_i^{\downarrow}. \tag{3} \]
Indeed, $\delta$ has been chosen so that the $d_x^n$ largest coordinates of $x_\varepsilon^{\otimes n}$ are exactly the coordinates of $(x'_\varepsilon)^{\otimes n}$, so when $1 \le k \le d_x^n$, (3) follows from the relation $(x'_\varepsilon)^{\otimes n} \prec_w y^{\otimes n}$. If $d_x^n < k \le d_y^n$, the inequality also holds since the choice of $\delta$ guarantees $(x_\varepsilon^{\otimes n})_k^{\downarrow} \le (y^{\otimes n})_k^{\downarrow}$. Finally if $k \ge d_y^n$, (3) holds trivially since the right-hand side equals 1. In conclusion, $x_\varepsilon^{\otimes n} \prec y^{\otimes n}$, and thus $x_\varepsilon \in M_{<\infty}(y)$. But $x_\varepsilon$ has been constructed such that $\|x - x_\varepsilon\|_1 \le \varepsilon$ and thus $x \in \overline{M_{<\infty}(y)}$, which completes the proof of the theorem.

5. Conclusion and Further Remarks

In conclusion, we are able to give a nice description of the $\ell_1$-closure of the set $T_{<\infty}(y)$. However, this closure may be substantially larger than the usual closure $\overline{T_d(y)}$ in $P_d$, and requires approximation by vectors with growing support. Our result can be seen as a contribution to a conjecture attributed to Nielsen [4]:

Conjecture 1. Fix a vector $y \in P_d$. Then a vector $x \in P_d$ belongs to $\overline{T_d(y)}$ if and only if the following conditions are verified:
(1) For $p \ge 1$, $\|x\|_p \le \|y\|_p$.
(2) For $0 < p \le 1$, $\|x\|_p \ge \|y\|_p$.
(3) For $p < 0$, $\|x\|_p \ge \|y\|_p$.

M. Klimesh announced a proof of this conjecture in a short communication [10], but the solution has not appeared in print yet. However, his methods are different from our approach (private communication). Note that the definition of $\|\cdot\|_p$ given in (1) is extended to any $p \in \mathbb{R}^*$. For $p < 1$, $\|\cdot\|_p$ is not a norm in the usual sense. We have shown that the condition (1) above is equivalent to $x \in \overline{T_{<\infty}(y)}$. Notice however that $\overline{T_{<\infty}(y)}$ is in general larger than $\overline{T_d(y)}$; note also that the set of $x \in P_d$ that satisfy conditions (1–3) is closed. The "only if" part of the conjecture follows from standard convexity/concavity properties of the functionals $\|\cdot\|_p$, see [16,4]. This question also appears in [7] where it is formulated using the Rényi entropies. For any real $p \ne 1$, the $p$-Rényi entropy is defined for $x \in P_d$ as
\[ H_p(x) = \frac{\operatorname{sgn}(p)}{1 - p} \log_2 \Big( \sum_{i=1}^{d} x_i^p \Big). \]
The limit case $p = 1$ corresponds to the usual Shannon entropy. The conditions (1–3) of the conjecture can be concisely reformulated as "$H_p(x) \ge H_p(y)$ for all $p$".
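The entropy reformulation is easy to experiment with. Below is a small Python sketch (the function name `renyi_entropy` is ours) using the normalization $\operatorname{sgn}(p)/(1-p)$, under which $p \to 1$ recovers the Shannon entropy. The test pair is the example usually attributed to Jonathan and Plenio [9], for which $x$ is not majorized by $y$ but is trumped by $y$ with catalyst $(0.6, 0.4)$; that attribution is our recollection rather than something stated in the text. Since $y$ has a zero coordinate, only $p > 0$ is sampled here.

```python
import math


def renyi_entropy(x, p):
    """H_p(x) = sgn(p) / (1 - p) * log2(sum_i x_i^p); p = 1 gives the Shannon entropy."""
    if p == 0:
        raise ValueError("p = 0 is excluded")
    support = [v for v in x if v > 0]
    if p < 0 and len(support) < len(x):
        raise ValueError("p < 0 needs strictly positive coordinates")
    if p == 1:
        return -sum(v * math.log2(v) for v in support)
    power_sum = sum(v ** p for v in support)
    return math.copysign(1.0, p) / (1.0 - p) * math.log2(power_sum)


x = [0.4, 0.4, 0.1, 0.1]
y = [0.5, 0.25, 0.25, 0.0]
orders = (0.25, 0.5, 1, 2, 5, 50)
checks = [renyi_entropy(x, p) >= renyi_entropy(y, p) for p in orders]
print(checks)
```

With this sign convention a trumped vector has the larger entropy at every sampled order, as conditions (1) and (2) predict.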

An intermediate notion is the following: for $y \in P_d$, let $T^b_{<\infty}(y)$ be the set of vectors $x \in P_d$ such that there is a sequence $(x_n)$ in $T_{<\infty}(y)$ tending to $x$, with a uniform bound on the size of the support of $x_n$. We think that a description of $T^b_{<\infty}(y)$ could be related to the set of vectors which satisfy conditions (1) and (2), but not necessarily (3), in Conjecture 1.

There is one more consequence of our main theorem we would like to discuss. Recall that when defining catalysis, we insisted on the fact that the catalyst should be finitely supported. Let $P_\infty \subset \ell_1$ be the set of infinite-dimensional probability vectors, and for $y$ in $P_{<\infty}$, define the set $T_\infty(y)$ of (finitely supported) vectors trumped by $y$ using infinite catalysts:
\[ T_\infty(y) = \{x \in P_{<\infty} \text{ s.t. } \exists z \in P_\infty \text{ s.t. } x \otimes z \prec y \otimes z\}. \]
As shown in [4] (Sect. 4.3), in general $T_{<\infty}(y) \ne T_\infty(y)$. However, since $x \in T_\infty(y)$ implies $\|x\|_p \le \|y\|_p$ for all $p \ge 1$, it follows from our main theorem that $\overline{T_{<\infty}(y)} = \overline{T_\infty(y)}$.

6. Appendix: On Cramér's Theorem

We review here some facts from large deviations theory. A complete reference for all the material contained here is [6]. Let $X$ be a random variable taking values in $[-\infty, \infty)$. We allow $X$ to equal $-\infty$ with positive probability; this is a nonstandard hypothesis. We however exclude the trivial case $P(X = -\infty) = 1$. We write $E$ for the expectation.

We assume also that the conditional expectation $E(X \mid X \ne -\infty)$ is finite. The cumulant generating function $\Lambda_X$ of the random variable $X$ is defined for any $\lambda \in \mathbb{R}$ by
\[ \Lambda_X(\lambda) = \log E e^{\lambda X}. \]
It is a convex function taking values in $(-\infty, +\infty]$. Its convex conjugate $\Lambda_X^*$, sometimes called the Cramér transform, is defined as
\[ \Lambda_X^*(x) = \sup_{\lambda \in \mathbb{R}} \big( \lambda x - \Lambda_X(\lambda) \big). \tag{4} \]
Note that $\Lambda_X$ is a smooth and strictly convex function on $[0, +\infty]$. Moreover, $\Lambda_X'(0) = E(X \mid X \ne -\infty)$ and $\lim_{\lambda\to+\infty} \Lambda_X'(\lambda) = \operatorname{esssup}(X)$. Consequently, for any $x$ such that $E(X \mid X \ne -\infty) < x < \operatorname{esssup}(X)$, the supremum in (4) is attained at a unique point $\lambda \ge 0$. We now state Cramér's theorem in a suitable formulation.

Proposition 1. Let $X$ be a $[-\infty, +\infty)$-valued random variable such that $\Lambda_X(\lambda) < +\infty$ for any $\lambda \ge 0$. Let $(X_i)$ be a sequence of i.i.d. copies of $X$. Then for any $t \in \mathbb{R}$,
\[ \lim_{n\to\infty} \frac{1}{n} \log P(X_1 + \cdots + X_n \ge tn) = \begin{cases} \log P(X \ne -\infty) & \text{if } t \le E(X \mid X \ne -\infty), \\ -\Lambda_X^*(t) & \text{otherwise.} \end{cases} \]

Proof. Let $\hat X$ denote the random variable $X$ conditioned to be finite, that is for any Borel set $B \subset \mathbb{R}$,
\[ P(\hat X \in B) = \frac{1}{1 - p} P(X \in B), \]
where $p = P(X = -\infty)$. A consequence of the classical Cramér theorem ([6], Corollary 2.2.19) states that
\[ \forall t \in \mathbb{R}, \quad \lim_{n\to\infty} \frac{1}{n} \log P(\hat X_1 + \cdots + \hat X_n \ge tn) = - \inf_{s \ge t} \Lambda_{\hat X}^*(s). \tag{5} \]
One checks that $\Lambda_{\hat X} = \Lambda_X - \log(1 - p)$, and consequently
\[ \Lambda_{\hat X}^* = \Lambda_X^* + \log(1 - p). \tag{6} \]
Note also that
\[ P(X_1 + \cdots + X_n \ge tn) = (1 - p)^n P(\hat X_1 + \cdots + \hat X_n \ge tn). \tag{7} \]
Finally, note that the infimum on the right hand side of (5) is null for $t \le E(\hat X)$ and equals $\Lambda_{\hat X}^*(t)$ for $t > E(\hat X)$. This follows from the fact that the convex function $t \mapsto \Lambda_{\hat X}^*(t)$ attains its zero minimum at $t = E(\hat X)$ and is increasing for $t \ge E(\hat X)$. Thus, we can rewrite Eq. (5) as:
\[ \lim_{n\to\infty} \frac{1}{n} \log P(\hat X_1 + \cdots + \hat X_n \ge tn) = \begin{cases} 0 & \text{if } t \le E(\hat X), \\ -\Lambda_{\hat X}^*(t) & \text{otherwise.} \end{cases} \tag{8} \]
The proposition follows from the Eqs. (6)–(8).
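Proposition 1 can be illustrated numerically on a toy variable of our own choosing (it is not taken from the paper): $X = -\infty$ with probability $p$, and otherwise $X \in \{0, 1\}$ with $P(X = 1 \mid X \ne -\infty) = q$. Then $P(X_1 + \cdots + X_n \ge tn) = (1-p)^n P(\mathrm{Bin}(n, q) \ge tn)$, and for $q < t < 1$ one finds $\Lambda_X^*(t) = \mathrm{KL}(t \,\|\, q) - \log(1-p)$, where KL is the binary relative entropy. The Python sketch below compares the exact finite-$n$ rate with the predicted limit; agreement is only up to the usual $O(\log n / n)$ correction.

```python
import math


def log_binomial_tail(n, q, m):
    """log P(Bin(n, q) >= m), computed with a log-sum-exp for stability."""
    terms = [
        math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
        + j * math.log(q) + (n - j) * math.log(1 - q)
        for j in range(m, n + 1)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(v - top) for v in terms))


def empirical_rate(n, p, q, t):
    """(1/n) log P(X_1 + ... + X_n >= t n) for the toy variable (0 < t < 1)."""
    m = max(0, math.ceil(t * n))
    # every X_i must be finite, which costs a factor (1 - p)^n
    return math.log(1 - p) + log_binomial_tail(n, q, m) / n


def predicted_rate(p, q, t):
    """Right-hand side of Proposition 1 for the toy variable (0 < q, t < 1)."""
    if t <= q:                      # t <= E(X | X != -inf) = q
        return math.log(1 - p)
    kl = t * math.log(t / q) + (1 - t) * math.log((1 - t) / (1 - q))
    return math.log(1 - p) - kl     # equals -Lambda*_X(t)


n, p, q = 4000, 0.2, 0.3
gaps = {t: abs(empirical_rate(n, p, q, t) - predicted_rate(p, q, t)) for t in (0.1, 0.5, 0.8)}
print(gaps)
```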


Acknowledgement. We thank the referees for several helpful remarks that improved the presentation of the paper.

Note added in proof: Several preprints recently appeared on closely related topics. Proofs of Conjecture 1 were presented by Turgut in [20] and by Klimesh in [11]. Finally, we were able to pursue the large deviation approach used here to obtain more precise results in [1] (in particular, we get a description of $T^b_{<\infty}(y)$). The analogue of Conjecture 1 remains open for MLOCC transformations.

References
1. Aubrun, G., Nechita, I.: Stochastic domination for iterated convolutions and catalytic majorization. Preprint, available at arXiv:0707.0211
2. Bandyopadhyay, S., Roychowdhury, V., Sen, U.: Classification of nonasymptotic bipartite pure-state entanglement transformations. Phys. Rev. A 65, 052315 (2002)
3. Bhatia, R.: Matrix Analysis. Graduate Texts in Mathematics, Volume 169, New York: Springer-Verlag, 1997
4. Daftuar, S.K.: Eigenvalues Inequalities in Quantum Information Processing. Ph.D. Thesis, California Institute of Technology, 2004. Available at http://resolver.caltech.edu/CaltechETD:etd-03312004-100014
5. Daftuar, S.K., Klimesh, M.: Mathematical structure of entanglement catalysis. Phys. Rev. A (3) 64(4), 042314 (2001)
6. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Second edition. Applications of Mathematics (New York), 38, New York: Springer-Verlag, 1998
7. Duan, R., Feng, Y., Li, X., Ying, M.: Multiple-copy entanglement transformation and entanglement catalysis. Phys. Rev. A 71, 042319 (2005)
8. Duan, R., Ji, Z., Feng, Y., Li, X., Ying, M.: Some issues in quantum information theory. J. Comput. Sci. & Tech. 21(5), 776–789 (2006)
9. Jonathan, D., Plenio, M.B.: Entanglement-assisted local manipulation of pure quantum states. Phys. Rev. Lett. 83(17), 3566–3569 (1999)
10. Klimesh, M.: Entropy measures and catalysis of bipartite quantum state transformations. Extended abstract, ISIT 2004, Chicago, USA
11. Klimesh, M.: Inequalities that collectively completely characterize the catalytic majorization relation. Preprint, available at arXiv:0709.3680
12. Kuperberg, G.: The capacity of hybrid quantum memory. IEEE Trans. Inform. Theory 49, 1465–1473 (2003)
13. Marshall, A., Olkin, I.: Inequalities: Theory of Majorization and its Applications. Mathematics in Science and Engineering, 143. New York-London: Academic Press Inc., 1979
14. Mitra, T., Ok, E.: Majorization by $L_p$-Norms. Preprint, available at http://homepages.nyu.edu/~eo1/Papers-PDF/Major.pdf
15. Nielsen, M.: Conditions for a class of entanglement transformations. Phys. Rev. Lett. 83, 436 (1999)
16. Nielsen, M.: An introduction to majorization and its applications to quantum mechanics. Preprint, available at www.qinfo.org/talks/2002/maj/book.ps
17. Nielsen, M., Chuang, I.: Quantum Computation and Quantum Information. Cambridge: Cambridge University Press, 2000
18. Open problems in Quantum Information Theory, available at http://www.imaph.tu-bs.de/qi/problems/ or http://arxiv.org/list/quant-ph/0504166, 2005
19. Pólya, G., Szegö, G.: Problems and Theorems in Analysis. Berlin-New York: Springer-Verlag, 1978
20. Turgut, S.: Necessary and sufficient conditions for the trumping relation. Preprint, available at arXiv:0707.0444

Communicated by M.B. Ruskai

Commun. Math. Phys. 278, 145–177 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0387-z

Communications in

Mathematical Physics

Results on Normal Forms for FPU Chains

Andreas Henrici, Thomas Kappeler

Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland. E-mail: [email protected]; [email protected]

Received: 16 February 2007 / Accepted: 19 May 2007 Published online: 22 November 2007 – © Springer-Verlag 2007

Abstract: In this paper we prove, among other results, that near the equilibrium position, any periodic FPU chain with an odd number $N$ of particles admits a Birkhoff normal form up to order 4, whereas any periodic FPU chain with $N$ even admits a resonant normal form up to order 4. This resonant normal form of order 4 turns out to be completely integrable. Further, for $N$ odd, we obtain an explicit formula of the Hessian of its Hamiltonian at the fixed point.

1. Introduction

In this paper we consider FPU chains with $N$ particles of equal mass, normalized to be one. Such chains have been introduced by Fermi, Pasta, and Ulam [5], as models to test numerically the principle of thermalization as $N$ gets larger and larger. A FPU chain consists of a string of particles moving on the line or the circle interacting only with their nearest neighbors through nonlinear springs. Its Hamiltonian is given by
\[ H_V = \frac{1}{2} \sum_{n=1}^{N} p_n^2 + \sum_{n=1}^{N} V(q_n - q_{n+1}), \tag{1} \]
where $V : \mathbb{R} \to \mathbb{R}$ is a smooth potential. The corresponding Hamiltonian equations read ($1 \le n \le N$)
\[ \dot q_n = \partial_{p_n} H_V = p_n, \qquad \dot p_n = -\partial_{q_n} H_V = -V'(q_n - q_{n+1}) + V'(q_{n-1} - q_n). \]

Supported in part by the Swiss National Science Foundation.
Supported in part by the Swiss National Science Foundation, the programme SPECT and the European Community through the FP6 Marie Curie RTN ENIGMA (MRTN-CT-2004-5652).


Here $q_n$ denotes the displacement of the $n$th particle from its equilibrium position and $p_n$ is its momentum. If not stated otherwise we assume periodic boundary conditions $(q_{i+N}, p_{i+N}) = (q_i, p_i)$ $\forall i \in \{0, 1\}$. Without loss of generality, the potential $V : \mathbb{R} \to \mathbb{R}$ is assumed to have a Taylor expansion at 0 of the form
\[ V(x) = \frac{1}{2} \kappa x^2 - \frac{\alpha}{3!} x^3 + \frac{\beta}{4!} x^4 + \dots, \tag{2} \]
where $\kappa$ is the (linear) spring constant normalized to be 1 and $\alpha, \beta \in \mathbb{R}$ are parameters measuring the strength of the nonlinear interaction. The minus sign in front of the parameter $\alpha$ in the expansion (2) turns out to be convenient for later computations. Substituting the expression (2) for $V$ into (1), the corresponding expansion of $H_V$ is given by
\[ H_V = \frac{1}{2} \sum_{n=1}^{N} p_n^2 + \frac{1}{2} \sum_{n=1}^{N} (q_{n+1} - q_n)^2 + \frac{\alpha}{3!} \sum_{n=1}^{N} (q_{n+1} - q_n)^3 + \frac{\beta}{4!} \sum_{n=1}^{N} (q_{n+1} - q_n)^4 + \dots. \tag{3} \]

For any FPU chain, the total momentum $P = \frac{1}{N} \sum_{n=1}^{N} p_n$ is an integral of motion, and therefore the center of mass $Q = \frac{1}{N} \sum_{n=1}^{N} q_n$ evolves with constant velocity. Hence any FPU chain can be viewed as a family of Hamiltonian systems of $N - 1$ degrees of freedom, parametrized by the vector of initial conditions $(Q, P) \in \mathbb{R}^2$ with Hamiltonian independent of $Q$. In particular, for $N = 2$ any FPU chain is integrable, and hence we will concentrate on the case $N \ge 3$. Further note that for any vector $(Q, P) \in \mathbb{R}^2$, the origin in $\mathbb{R}^{2N-2}$ is an equilibrium point of the corresponding system. The momentum of such an equilibrium point is given by the constant vector $(p_1, \dots, p_N) = P\,(1, \dots, 1)$. Introduce the function $I = (I_k)_{1 \le k \le N-1}$, defined on $\mathbb{R}^{2N-2}$ with values in $\mathbb{R}^{N-1}$,
\[ I_k = \frac{1}{2} (x_k^2 + y_k^2). \tag{4} \]
Further define the function $H_{\alpha,\beta} : \mathbb{R}^{N-1} \to \mathbb{R}$, given by
\[ H_{\alpha,\beta}(I) := 2 \sum_{k=1}^{N-1} \sin\frac{k\pi}{N}\, I_k + \frac{1}{4N} \sum_{k=1}^{N-1} c_k I_k^2 + \frac{\beta - \alpha^2}{2N} \sum_{\substack{l \ne m \\ 1 \le l, m \le N-1}} \sin\frac{l\pi}{N} \sin\frac{m\pi}{N}\, I_l I_m, \tag{5} \]
where $c_k \equiv c_k(\alpha, \beta) := \alpha^2 + (\beta - \alpha^2) \sin^2\frac{k\pi}{N}$.

Surprisingly, the properties of periodic FPU chains near the equilibrium point strongly depend on the parity of the number $N$ of particles. If $N$ is odd, our results are the following ones:

Theorem 1.1. Let $\alpha, \beta \in \mathbb{R}$ with $(\alpha, \beta) \ne (0, 0)$. If $N \ge 3$ is odd, then any periodic FPU chain admits a Birkhoff normal form of order 4. More precisely, there are canonical coordinates $(x_k, y_k)_{1 \le k \le N-1}$ so that the Hamiltonian of any FPU chain, when expressed in these coordinates, takes the form
\[ \frac{N P^2}{2} + H_{\alpha,\beta}(I) + O(|(x, y)|^5) \]
with $H_{\alpha,\beta}(I)$ given by (5).


Corollary 1.2. Near the equilibrium state, any FPU chain with an odd number $N$ of particles can be approximated up to order 4 relative to its center of mass coordinates by an integrable system of $N - 1$ harmonic oscillators which are coupled at fourth order except if $\beta = \alpha^2$ (Toda case).

Denote by $Q_{\alpha,\beta}$ the Hessian of $H_{\alpha,\beta}(I)$ at $I = 0$. Note that $Q_{\alpha,\beta}$ is an $(N-1) \times (N-1)$ matrix which only depends on the parameters $\alpha$ and $\beta$. For the following result we do not have to assume that $N$ is odd.

Theorem 1.3. (i) For any given $\alpha \in \mathbb{R} \setminus \{0\}$, $\det(Q_{\alpha,\beta})$ is a polynomial in $\beta$ of degree $N - 1$ and has $N - 1$ real zeroes (counted with multiplicities). When listed in increasing order, the zeroes $\beta_k = \beta_k(\alpha)$ satisfy
\[ 0 < \beta_1 < \alpha^2, \qquad 2\alpha^2 < \beta_2 \le \dots \le \beta_{N-1} \]
and contain the $\frac{N-1}{2}$ distinct numbers
\[ \alpha^2 \Big( 1 + \big( \sin^2 \tfrac{k\pi}{N} \big)^{-1} \Big) \qquad (1 \le k \le \tfrac{N-1}{2}). \]
Moreover $\operatorname{index}(Q_{\alpha,\beta})$, defined as the number of negative eigenvalues of $Q_{\alpha,\beta}$, is given by
\[ \operatorname{index}(Q_{\alpha,\beta}) = \begin{cases} 1 & \text{for } \beta < \beta_1, \\ 0 & \text{for } \beta_1 < \beta < \beta_2, \\ N - 2 & \text{for } \beta > \beta_{N-1}. \end{cases} \]
(ii) For $\alpha = 0$, $\det(Q_{0,\beta})$ is a polynomial in $\beta$ of degree $N - 1$, and $\beta = 0$ is the only zero of $\det(Q_{0,\beta})$. It has multiplicity $N - 1$, and the index of $Q_{0,\beta}$ is given by
\[ \operatorname{index}(Q_{0,\beta}) = \begin{cases} 1 & \text{for } \beta < 0, \\ N - 2 & \text{for } \beta > 0. \end{cases} \]

Periodic FPU chains with an even number $N$ of particles do not admit a Birkhoff normal form up to order 4 due to resonances except if $\beta = \alpha^2$ (Toda case). Our analysis of odd FPU chains leads in the case of even FPU chains to a resonant Birkhoff normal form up to order 4. Define the functions $J = (J_k)_{1 \le k \le N-1}$ and $M = (M_k)_{1 \le k \le N-1}$ on $\mathbb{R}^{2N-2}$ with values in $\mathbb{R}^{N-1}$ given by
\[ J_k = \frac{1}{2} (x_k x_{N-k} + y_k y_{N-k}) \quad \text{and} \quad M_k = \frac{1}{2} (x_k y_{N-k} - x_{N-k} y_k), \tag{6} \]
and introduce
\[ R_{\alpha,\beta}(J, M) := \frac{\beta - \alpha^2}{4N} \big( R(J, M) + R_{\frac{N}{4}}(J, M) \big), \tag{7} \]
where
\[ R(J, M) = 4 \sum_{1 \le k < \frac{N}{4}} \sin\frac{2k\pi}{N} \big( J_k J_{\frac{N}{2}-k} - M_k M_{\frac{N}{2}-k} \big) \]


and
\[ R_{\frac{N}{4}}(J, M) = \begin{cases} J_{\frac{N}{4}}^2 - M_{\frac{N}{4}}^2 & \text{if } \frac{N}{4} \in \mathbb{N}, \\ 0 & \text{otherwise.} \end{cases} \]
Note that for $\alpha, \beta \in \mathbb{R}$ with $\beta = \alpha^2$ (Toda case), the expression $R_{\alpha,\beta}$ vanishes. Our main results on even periodic FPU chains are the following ones:

Theorem 1.4. Let $\alpha, \beta \in \mathbb{R}$ with $(\alpha, \beta) \ne (0, 0)$. If $N \ge 4$ is even, there are canonical coordinates $(x_k, y_k)_{1 \le k \le N-1}$ so that the Hamiltonian of any FPU chain, when expressed in these coordinates, takes the form $G_{\alpha,\beta}(I, J, M) + O(|(x, y)|^5)$, where
\[ G_{\alpha,\beta}(I, J, M) = \frac{N P^2}{2} + H_{\alpha,\beta}(I) - R_{\alpha,\beta}(J, M), \tag{8} \]
and where $H_{\alpha,\beta}(I)$ and $R_{\alpha,\beta}(J, M)$ are given by (5) and (7), respectively.

It is quite remarkable that $G_{\alpha,\beta}$ turns out to be completely integrable. To make this statement more precise, introduce for $1 \le k < \frac{N}{4}$,
\[ K_{k,\alpha,\beta}(J, M) = d_k^- (J_k^2 + M_k^2) + d_{\frac{N}{2}-k}^- \big( J_{\frac{N}{2}-k}^2 + M_{\frac{N}{2}-k}^2 \big) - 2(\beta - \alpha^2) \sin\frac{2k\pi}{N} \big( J_k J_{\frac{N}{2}-k} - M_k M_{\frac{N}{2}-k} \big), \]
where $d_k^- := -\alpha^2 + (\beta - \alpha^2) \sin^2\frac{k\pi}{N}$, and let
\[ K_{\frac{N}{4},\alpha,\beta}(J, M) = \begin{cases} -\alpha^2 J_{\frac{N}{4}}^2 + (\beta - 2\alpha^2) M_{\frac{N}{4}}^2 & \text{if } N \equiv 0 \bmod 4, \\ 0 & \text{otherwise.} \end{cases} \]

Theorem 1.5. Let $N \ge 4$ be an even integer. Then for any $\alpha, \beta \in \mathbb{R}$ with $(\alpha, \beta) \ne (0, 0)$, the truncated FPU Hamiltonian $G_{\alpha,\beta}$ given by (8) is completely integrable. The following $N - 1$ quantities are functionally independent integrals in involution:
\[ (I_k + I_{N-k})_{1 \le k \le \frac{N}{2}}, \qquad (I_k + I_{\frac{N}{2}+k})_{1 \le k < \frac{N}{4}}, \qquad (K_{k,\alpha,\beta})_{1 \le k \le \frac{N}{4}}. \]

Application 1. In the case where $N$ is odd, Theorems 1.1 and 1.3 allow one to apply, for any given $\alpha \in \mathbb{R}$, the classical KAM theorem (see e.g. [11]) near the equilibrium point to the FPU chain with Hamiltonian $H_V$ for a real analytic potential $V(x) = \frac{1}{2} x^2 - \frac{\alpha}{3!} x^3 + \frac{\beta}{4!} x^4 + \dots$ with $\beta \in \mathbb{R} \setminus \{\beta_1(\alpha), \dots, \beta_{N-1}(\alpha)\}$. Moreover, note that for any given $\alpha \in \mathbb{R} \setminus \{0\}$, the Hessian $Q_{\alpha,\beta}$ of $H_{\alpha,\beta}$ is positive definite for any $\beta$ satisfying $\beta_1(\alpha) < \beta < \beta_2(\alpha)$, in particular for $\alpha^2 \le \beta \le 2\alpha^2$. Hence one can apply Nekhoroshev's theorem (see e.g. [12]) near the equilibrium point to the FPU chain with Hamiltonian $H_V$ for $V$ with such $\beta$'s. These perturbation results confirm long standing conjectures, see e.g. [3].
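The statements about the Hessian can be probed numerically. The sketch below (Python, standard library only) assembles $Q_{\alpha,\beta}$ from (5) under our reading that the sum over $l \ne m$ runs over ordered pairs, so that $\partial^2 H_{\alpha,\beta} / \partial I_l \partial I_m = (\beta - \alpha^2) \sin\frac{l\pi}{N} \sin\frac{m\pi}{N} / N$ for $l \ne m$ and $\partial^2 H_{\alpha,\beta} / \partial I_k^2 = c_k / (2N)$. The helper names are ours. It then checks positive definiteness for $\alpha^2 \le \beta \le 2\alpha^2$ with $N$ odd, and that the determinant vanishes at $\beta = \alpha^2 (1 + 1/\sin^2\frac{k\pi}{N})$.

```python
import math


def hessian(N, alpha, beta):
    """Q_{alpha,beta}: Hessian at I = 0 of H_{alpha,beta} in (5) (ordered-pair reading)."""
    s = [math.sin(k * math.pi / N) for k in range(1, N)]
    gap = beta - alpha ** 2
    Q = [[0.0] * (N - 1) for _ in range(N - 1)]
    for l in range(N - 1):
        for m in range(N - 1):
            if l == m:
                Q[l][m] = (alpha ** 2 + gap * s[l] ** 2) / (2 * N)   # c_k / (2N)
            else:
                # each product I_l I_m appears twice in the ordered sum over l != m
                Q[l][m] = gap * s[l] * s[m] / N
    return Q


def is_positive_definite(A):
    """Attempt a Cholesky factorization; in exact arithmetic it succeeds iff A > 0."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    return False
                L[i][j] = math.sqrt(acc)
            else:
                L[i][j] = acc / L[j][j]
    return True


def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n, det = len(A), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[piv][c] == 0.0:
            return 0.0
        if piv != c:
            A[c], A[piv] = A[piv], A[c]
            det = -det
        det *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return det


N, alpha = 5, 1.0
definite = [is_positive_definite(hessian(N, alpha, b)) for b in (1.0, 1.5, 2.0)]
special = [alpha ** 2 * (1 + 1 / math.sin(k * math.pi / N) ** 2) for k in (1, 2)]
dets = [determinant(hessian(N, alpha, b)) for b in special]
print(definite, dets)
```

In this reading, $2N\,Q_{\alpha,\beta}$ is a diagonal matrix with entries $\alpha^2 - (\beta - \alpha^2) \sin^2\frac{k\pi}{N}$ plus a rank-one term, which makes the listed zeroes of the determinant transparent: two diagonal entries vanish simultaneously.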


Application 2. Theorem 1.5 can be used to show that any FPU chain with Dirichlet boundary conditions admits a Birkhoff normal form up to order 4 near the equilibrium point $(q, p) = (0, 0)$ by viewing such a system as an invariant symplectic submanifold of a periodic FPU chain. Consider a chain with $N$ ($N \ge 3$, not necessarily even) moving particles, Hamiltonian
\[ H_V^D = \frac{1}{2} \sum_{n=1}^{N} p_n^2 + \sum_{n=1}^{N} V(q_n - q_{n+1}) \]
and with endpoints assumed to be fixed, $q_0 = q_{N+1} = 0$.

Theorem 1.6. Let $\alpha, \beta \in \mathbb{R}$ with $(\alpha, \beta) \ne (0, 0)$. Then any FPU chain with $N \ge 3$ moving particles and satisfying Dirichlet boundary conditions admits a Birkhoff normal form of order 4, i.e. there are canonical coordinates $(x_k, y_k)_{1 \le k \le N}$ so that $H_V^D$ takes the form
\[ \frac{(N + 1) P^2}{2} + H_{\alpha,\beta}^D(I) + O(|(x, y)|^5), \]
where $I = (I_1, \dots, I_N)$ and $H_{\alpha,\beta}^D(I)$ is given by
\[ 2 \sum_{k=1}^{N} s_k I_k + \frac{1}{16(N+1)} \sum_{k=1}^{N} \big( \alpha^2 + 3(\beta - \alpha^2) s_k^2 \big) I_k^2 + \frac{\beta - \alpha^2}{32(N+1)} I_{\frac{N+1}{2}}^2 \ \Big( \text{only if } \tfrac{N+1}{2} \in \mathbb{N} \Big) + \frac{\beta - \alpha^2}{16(N+1)} \Big( 4 \sum_{\substack{l \ne m \\ 1 \le l, m \le N}} s_l s_m I_l I_m - \sum_{k=1}^{N} s_{2k} I_k I_{N+1-k} \Big), \tag{9} \]
with $s_k = \sin\frac{k\pi}{2N+2}$ ($1 \le k \le N$).

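Similarly as in the case of odd periodic FPU chains, one can analyze the Hessian $Q_{\alpha,\beta}^D$ of $H_{\alpha,\beta}^D(I)$ at $I = 0$.

Theorem 1.7. (i) For any given $\alpha \in \mathbb{R} \setminus \{0\}$, $\det(Q_{\alpha,\beta}^D)$ is a polynomial in $\beta$ of degree $N$ and has $N$ real zeroes (counted with multiplicities). When listed in increasing order, the zeroes $\beta_k = \beta_k(\alpha)$ satisfy
\[ \beta_1 \le \dots \le \beta_{\frac{N+1}{2}} < \alpha^2 < \beta_{\frac{N+3}{2}} \le \dots \le \beta_N. \]
Moreover $\operatorname{index}(Q_{\alpha,\beta}^D)$, defined as the number of negative eigenvalues of $Q_{\alpha,\beta}^D$, is given by
\[ \operatorname{index}(Q_{\alpha,\beta}^D) = \begin{cases} \frac{N+1}{2} & \text{for } \beta < \beta_1, \\ 0 & \text{for } \beta_{\frac{N+1}{2}} < \beta < \beta_{\frac{N+3}{2}}, \\ \frac{N-1}{2} & \text{for } \beta > \beta_N. \end{cases} \]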


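(ii) For $\alpha = 0$, $\det(Q_{0,\beta}^D)$ is a polynomial in $\beta$ homogeneous of degree $N$. Hence $\beta = 0$ is the only zero of $\det(Q_{0,\beta}^D)$ and the index of $Q_{0,\beta}^D$ is given by
\[ \operatorname{index}(Q_{0,\beta}^D) = \begin{cases} \frac{N+1}{2} & \text{for } \beta < 0, \\ \frac{N-1}{2} & \text{for } \beta > 0. \end{cases} \]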

Related work. Theorem 1.1 and Theorem 1.3 improve on earlier results of Rink [13]. In particular, Theorem 1.3 treats the case $\alpha \ne 0$ which was described in [13] as one of the major open problems in this topic. In fact, Theorem 1.1 and Theorem 1.3 solve all open problems stated in [13] for $N$ odd. Theorems 1.5, 1.6, and 1.7 considerably improve on earlier results of Rink in [13,14], and [15] (cf. also [10]) where the case $\alpha = 0$ was treated. Our approach has been shaped by our earlier work on the Toda lattice [6,7]. The latter one, introduced by Toda [16] and extensively studied in the sequel, is a special FPU chain which is completely integrable. It turns out that the canonical transformations which near the equilibrium bring the Toda lattice into Birkhoff normal form can be used for any FPU chain. In other words, the existence of the Birkhoff normal form stated in Theorem 1.1 is, at least partially, a consequence of the fact that the family of FPU chains, parametrized by $\alpha, \beta, \dots$, contains an integrable system, namely the Toda lattice. The literature on FPU chains is huge; see e.g. the recent survey article [3] by Berman and Izrailev. One of the most important open problems in this topic is the investigation of the dynamics of FPU chains when the number of particles gets larger and larger. It is likely that our results combined with results for the KdV equation established in [9] can be used for this purpose. For recent contributions in this direction see e.g. [1,2].

Outline. In Sect. 2, we review the notion of Birkhoff normal form. We show Theorem 1.1 in Sects. 3 and 4 and Theorem 1.4 in Sect. 5, whereas Theorem 1.3 will be proved in Sect. 6. Theorems 1.5, 1.6, and 1.7 together with a detailed analysis of the (quite complicated) level sets of the integrals in Theorem 1.5 are proved in the subsequent paper [8].

2. Birkhoff Normal Form

Consider an isolated equilibrium of a Hamiltonian system on some $2n$-dimensional symplectic manifold, i.e. an isolated singular point of the Hamiltonian vector field. Neglecting an irrelevant additive constant, the Hamiltonian, when expressed in canonical coordinates $w = (q, p)$ near the equilibrium with coordinates $q = 0$, $p = 0$, then has the form
\[ H = \frac{1}{2} \langle A w, w \rangle + \dots, \]
where $A$ is the symmetric $2n \times 2n$-Hessian of $H$ at 0 and the dots stand for terms of higher order in $w$. We now assume that the equilibrium point $w = 0$ is elliptic, i.e. the spectrum of the linearized system, $\dot w = J A w$, is purely imaginary, $\operatorname{spec}(J A) = \{\pm i\lambda_1, \dots, \pm i\lambda_n\}$ with real numbers $\lambda_1, \dots, \lambda_n$. Here $J = \begin{pmatrix} 0 & Id_n \\ -Id_n & 0 \end{pmatrix}$ is the standard symplectic structure of $\mathbb{R}^{2n}$. If $\operatorname{spec}(J A)$ is simple there exists a linear symplectic change of coordinates which brings the quadratic part of the Hamiltonian into normal form. Denoting the new coordinates by the same symbols as the old ones one has
\[ \langle A w, w \rangle = \sum_{i=1}^{n} \lambda_i (q_i^2 + p_i^2). \]

Definition 2.1. A Hamiltonian $H$ is in Birkhoff normal form up to order $m \ge 2$, if it is of the form
\[ H = N_2 + N_4 + \dots + N_m + H_{m+1} + \dots, \tag{10} \]

where the Nk , 2 ≤ k ≤ m, are homogeneous polynomials of order k, which are actually functions of q12 + p12 , . . . , qn2 + pn2 , and where Hm+1 + . . . stands for (arbitrary) terms of order strictly greater than m. If this holds for any m, the Hamiltonian is said to be in Birkhoff normal form and the coordinates (qi , pi )1≤i≤n are referred to as Birkhoff coordinates. Note that if a Hamiltonian H admits a Birkhoff normal form of order m, the coefficients of the expansion (10) up to order m are uniquely determined, as long as the normalizing transformation is of the form id + . . . . However, the normalizing transformation is by no means unique. There are well known theorems guaranteeing the existence of a Birkhoff normal form up to order m assuming that the frequencies λ1 , . . . , λn satisfy certain nonresonance conditions - see e.g. Theorem 4.3 in [9]. However, in the case under investigation, the nonresonance conditions for m = 4 are not satisfied. We will show by an explicit computation that an arbitrary odd periodic FPU chain admits a Birkhoff normal form up to order 4, while an even periodic FPU chain only admits a resonant normal form except in the case β = α 2 (Toda case).

3. Birkhoff Normal Form up to Order 2

Following the procedure of constructing Birkhoff coordinates for the Toda lattice [6] we first introduce relative coordinates (which are a canonical version of the Flaschka coordinates),
\[ v_i := q_{i+1} - q_i \ (1 \le i \le N - 1) \quad \text{and} \quad v_N := \frac{1}{N} \sum_{i=1}^{N} q_i, \tag{11} \]
and denote by $(u_i)_{1 \le i \le N}$ the corresponding conjugate variables. It turns out that $u_N = N \cdot P = \sum_{i=1}^{N} p_i$ and $q_{N+1} - q_N = -\sum_{k=1}^{N-1} v_k$. The Hamiltonian $H_V$ in (3), when expressed in these coordinates, takes the form $H_V = \frac{N P^2}{2} + \tilde H_V$ with $\tilde H_V = H_u + H_v$, where $H_u$ and $H_v$ only depend on $u = (u_i)_{1 \le i \le N-1}$ and $v = (v_i)_{1 \le i \le N-1}$, respectively, and are given by
\[ H_u = \frac{1}{2} \Big( u_1^2 + \sum_{l=1}^{N-2} (u_{l+1} - u_l)^2 + u_{N-1}^2 \Big), \]
\[ H_v = \frac{1}{2} \Big( \sum_{k=1}^{N-1} v_k^2 + \Big( \sum_{k=1}^{N-1} v_k \Big)^2 \Big) + \frac{\alpha}{3!} \Big( \sum_{k=1}^{N-1} v_k^3 - \Big( \sum_{k=1}^{N-1} v_k \Big)^3 \Big) + \frac{\beta}{4!} \Big( \sum_{k=1}^{N-1} v_k^4 + \Big( \sum_{k=1}^{N-1} v_k \Big)^4 \Big) + O(v^5). \]
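The expression for $H_v$ is a direct rewriting of the potential energy in (3): the $N - 1$ bonds inside the chain contribute $V(-v_k)$ and the bond closing the ring contributes $V(\sum_k v_k)$. A short Python check with exact rationals (the function names and sample values are ours; the potential is truncated after the quartic term, so the identity is exact) can confirm the signs of the cubic terms.

```python
from fractions import Fraction


def V(x, alpha, beta):
    """V(x) = x^2/2 - alpha x^3/3! + beta x^4/4!, truncated after order 4."""
    return x ** 2 / 2 - alpha * x ** 3 / 6 + beta * x ** 4 / 24


def potential_q(q, alpha, beta):
    """sum_{n=1}^{N} V(q_n - q_{n+1}) with periodic boundary conditions."""
    N = len(q)
    return sum(V(q[n] - q[(n + 1) % N], alpha, beta) for n in range(N))


def potential_v(v, alpha, beta):
    """H_v (up to order 4) in the relative coordinates v_1, ..., v_{N-1}."""
    s = sum(v)
    return (
        (sum(t ** 2 for t in v) + s ** 2) / 2
        + alpha * (sum(t ** 3 for t in v) - s ** 3) / 6
        + beta * (sum(t ** 4 for t in v) + s ** 4) / 24
    )


q = [Fraction(1, 3), Fraction(-1, 7), Fraction(2, 5), Fraction(0), Fraction(-3, 11)]
alpha, beta = Fraction(3, 2), Fraction(5, 4)
v = [q[i + 1] - q[i] for i in range(len(q) - 1)]     # v_1, ..., v_{N-1}
same = potential_q(q, alpha, beta) == potential_v(v, alpha, beta)
print(same)
```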

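To bring $\tilde H_V = H_u + H_v$ into Birkhoff normal form up to order 2 we introduce new coordinates $(\xi_k, \eta_k)_{1 \le k \le N-1}$. It turns out to be convenient to use complex notation, i.e. for $1 \le k \le N - 1$,
\[ \zeta_k := \frac{1}{\sqrt{2}} (\xi_k - i \eta_k), \qquad \zeta_{-k} := \frac{1}{\sqrt{2}} (\xi_k + i \eta_k), \tag{12} \]
where the minus sign in the definition of $\zeta_k$ is chosen so that $d\zeta_k \wedge d\zeta_{-k} = i\, d\xi_k \wedge d\eta_k$. The vector $\zeta = (\zeta_k)_{1 \le |k| \le N-1}$ is an element in the space
\[ Z := \big\{ z = (z_k)_{1 \le |k| \le N-1} \in \mathbb{C}^{2N-2} : z_{-k} = \overline{z_k} \ \ \forall\, 1 \le k \le N - 1 \big\}. \tag{13} \]
Further introduce the notation
\[ \lambda_k := \Big| \sin\frac{k\pi}{N} \Big|^{\frac{1}{2}} \qquad (0 \le |k| \le N - 1). \]
The proposed linear transformation $Z \to \mathbb{R}^{2N-2}$, $\zeta \mapsto (v, u)$ is then defined by
\[ u_1(\zeta) = \frac{1}{\sqrt{N}} \sum_{1 \le |k| \le N-1} \lambda_k \zeta_k, \tag{14} \]
\[ u_{l+1}(\zeta) - u_l(\zeta) = \frac{1}{\sqrt{N}} \sum_{1 \le |k| \le N-1} \lambda_k e^{2\pi i l k / N} \zeta_k \quad (1 \le l \le N - 2), \tag{15} \]
\[ -u_{N-1}(\zeta) = \frac{1}{\sqrt{N}} \sum_{1 \le |k| \le N-1} \lambda_k e^{2\pi i (N-1) k / N} \zeta_k, \tag{16} \]
and
\[ v_l(\zeta) = \frac{1}{\sqrt{N}} \sum_{1 \le |k| \le N-1} \lambda_k e^{2\pi i l k / N} e^{-i\pi k / N} \zeta_k \quad (1 \le l \le N - 1). \tag{17} \]
Note that (16) is actually a consequence of (14) and (15). As explained in [7] it follows from the construction of the Birkhoff map of the Toda lattice that this map is a canonical isomorphism; see Lemma B.1 in Appendix B for a self-contained proof.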

Results on Normal Forms for FPU Chains


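When expressed in the new coordinates, $H_u$ and $H_v$ take the form
\[ H_u(\zeta) = \frac12 G_2(\zeta), \qquad H_v(\zeta) = \frac12 G_2(\zeta) + \alpha G_3(\zeta) + \beta G_4(\zeta) + O(\zeta^5), \]
where
\[ G_2 := 2\sum_{k=1}^{N-1} \lambda_k^2\,\zeta_k\zeta_{-k}, \tag{18} \]
\[ G_3 := \frac{1}{6\sqrt N}\sum_{(k,k',k'')\in K_3} (-1)^{(k+k'+k'')/N}\,\lambda_k\lambda_{k'}\lambda_{k''}\,\zeta_k\zeta_{k'}\zeta_{k''}, \tag{19} \]
and
\[ G_4 := \frac{1}{24N}\sum_{(k,k',k'',k''')\in K_4} (-1)^{(k+k'+k''+k''')/N}\,\lambda_k\lambda_{k'}\lambda_{k''}\lambda_{k'''}\,\zeta_k\zeta_{k'}\zeta_{k''}\zeta_{k'''}, \tag{20} \]
with
\[ K_3 := \{ (k,k',k'')\in\mathbb{Z}^3 : 1\le|k|,|k'|,|k''|\le N-1 \text{ and } k+k'+k''\equiv 0 \bmod N \} \tag{21} \]
and
\[ K_4 := \{ (k,k',k'',k''')\in\mathbb{Z}^4 : 1\le|k|,|k'|,|k''|,|k'''|\le N-1 \text{ and } k+k'+k''+k'''\equiv 0 \bmod N \}. \tag{22} \]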

Note that $G_2$, $G_3$, and $G_4$ are independent of $\alpha$ and $\beta$. In particular they came up in [7] when we computed the Birkhoff normal form of the Toda lattice. For a detailed derivation of the formulas for $G_2$, $G_3$, and $G_4$ see Appendix B. Summarizing the results of this section we have that
\[ \tilde H_V(\zeta) = G_2(\zeta) + \alpha G_3(\zeta) + \beta G_4(\zeta) + O(\zeta^5) \]
is in Birkhoff normal form up to order 2. As a consequence, $\zeta = 0$ is an elliptic fixed point of the Hamiltonian $\tilde H_V$.

4. Proof of Theorem 1.1.

We now begin by transforming $\tilde H_V(\zeta)$ into its Birkhoff normal form up to order 4. Here we follow a standard procedure - see e.g. Sect. 14 in [9]. The phase space $Z$, defined in (13), is endowed with the Poisson bracket
\[ \{F,G\} = i\sum_{1\le|k|\le N-1} \sigma_k\,\frac{\partial F}{\partial\zeta_k}\frac{\partial G}{\partial\zeta_{-k}}, \]
where $\sigma_k = \operatorname{sgn}(k)$ is the sign of $k$. The Hamiltonian vector field $X_F$ associated to the Hamiltonian $F$ is then given by $X_F = i\sum_{1\le|k|\le N-1}\sigma_k\,\frac{\partial F}{\partial\zeta_{-k}}\frac{\partial}{\partial\zeta_k}$. With a first canonical


transformation we want to eliminate the third order term $\alpha G_3$ in $\tilde H_V(\zeta)$. By a standard procedure we construct such a canonical transformation on the phase space $Z$ as the time-1-map $\Phi_1 := X^t_{\alpha F_3}|_{t=1}$ of the flow $X^t_{\alpha F_3}$ of a real analytic Hamiltonian $\alpha F_3$ which is a homogeneous polynomial in $\zeta_k$ ($1\le|k|\le N-1$) of degree 3 and solves the homological equation
\[ \{G_2,\alpha F_3\} + \alpha G_3 = 0. \tag{23} \]
To simplify notation we momentarily write $F$ instead of $\alpha F_3$ and $H$ instead of $\tilde H_V$. Assuming for the moment that (23) can be solved and that $X^t_F$ is defined for $0\le t\le 1$ in some neighbourhood of the origin in $Z$, we can use Taylor's formula to expand $H\circ X^t_F$ around $t=0$,
\begin{align*}
H\circ X^t_F &= H\circ X^0_F + \int_0^t \frac{d}{ds}\big(H\circ X^s_F\big)\,ds \\
&= H + \int_0^t \{H,F\}\circ X^s_F\,ds \\
&= H + t\{H,F\} + \int_0^t ds\int_0^s \frac{d}{ds'}\big(\{H,F\}\circ X^{s'}_F\big)\,ds' \\
&= H + t\{H,F\} + \int_0^t (t-s)\,\{\{H,F\},F\}\circ X^s_F\,ds. \tag{24}
\end{align*}
When evaluating this expression at $t=1$, one gets
\[ H\circ\Phi_1 = G_2 + \{G_2,F\} + \int_0^1 (1-t)\,\{\{G_2,F\},F\}\circ X^t_F\,dt + \alpha G_3 + \int_0^1 \{\alpha G_3,F\}\circ X^t_F\,dt + \beta G_4 + O(\zeta^5). \]
Using that $\{G_2,F\} + \alpha G_3 = 0$, the latter expression simplifies and we get
\[ H\circ\Phi_1 = G_2 + \int_0^1 t\,\{\alpha G_3,F\}\circ X^t_F\,dt + \beta G_4 + O(\zeta^5). \]

Integrating by parts once more and taking into account that $F\equiv\alpha F_3$ is homogeneous of degree 3 one obtains, in view of (24),
\[ \tilde H_V\circ\Phi_1 = G_2 + \frac12\{\alpha G_3,\alpha F_3\} + \beta G_4 + O(\zeta^5). \tag{25} \]
Note that $\{G_3,F_3\}$ is homogeneous of order 4. Hence our first step is achieved. It remains to solve (23). Since $G_3$ contains only monomials with $(k,k',k'')\in K_3$ (cf. (21)), also $F_3$ need only contain such monomials,
\[ F_3 = \sum_{(k,k',k'')\in K_3} F^{(3)}_{kk'k''}\,\zeta_k\zeta_{k'}\zeta_{k''}, \]
which leads to
\[ \{G_2,F_3\} = i\sum_{1\le|k|\le N-1} 2\sigma_k\lambda_k^2\,\zeta_{-k}\frac{\partial F_3}{\partial\zeta_{-k}} = -i\sum_{(k,k',k'')\in K_3} (s_k+s_{k'}+s_{k''})\,F^{(3)}_{kk'k''}\,\zeta_k\zeta_{k'}\zeta_{k''}, \tag{26} \]


where $s_k := 2\sigma_k\lambda_k^2 = 2\sin\frac{k\pi}{N}$.

The following result is due to Beukers and Rink (cf. [13,15]):

Lemma 4.1. For any $(k,k',k'')\in K_3$, $s_k+s_{k'}+s_{k''}\ne 0$.

Let us remark that Lemma 4.1 also follows from the integrability of the Toda lattice (cf. [7]). We include the self-contained proof due to Beukers and Rink.

Proof. Suppose that $(k,k',k'')\in K_3$ satisfies $s_k+s_{k'}+s_{k''}=0$. It follows from $k+k'+k''\equiv 0 \bmod N$ that either $s_{k''} = -s_{k+k'}$ or $s_{k''} = s_{k+k'}$, according to whether $k+k'+k''\equiv 0$ or $k+k'+k''\equiv N \bmod 2N$. In the first case, it follows that
\[ 2i\sin\frac{k\pi}{N} + 2i\sin\frac{k'\pi}{N} - 2i\sin\Big(\frac{k\pi}{N}+\frac{k'\pi}{N}\Big) = 0. \tag{27} \]
Setting $x := e^{ik\pi/N}$ and $y := e^{ik'\pi/N}$, one can rewrite (27) as
\[ 0 = x - \frac1x + y - \frac1y - xy + \frac{1}{xy} = (1-x)(1-y)(1-xy)\,\frac{1}{xy}. \tag{28} \]
It follows that any solution of (28) contradicts the assumption $1\le|k|,|k'|,|k''|\le N-1$. Indeed, solutions with $x=1$ (i.e. $k\equiv 0 \bmod 2N$), $y=1$ (i.e. $k'\equiv 0 \bmod 2N$), or $xy=1$ (i.e. $k+k'\equiv 0 \bmod 2N$ and thus $k''\equiv 0 \bmod 2N$), contradict this assumption. In the second case, we have instead of (27),
\[ 2i\sin\frac{k\pi}{N} + 2i\sin\frac{k'\pi}{N} + 2i\sin\Big(\frac{k\pi}{N}+\frac{k'\pi}{N}\Big) = 0. \tag{29} \]
With $x$, $y$ as above, it now follows from (29) that
\[ 0 = x - \frac1x + y - \frac1y + xy - \frac{1}{xy} = -(1+x)(1+y)(1-xy)\,\frac{1}{xy}. \]
Again we conclude that any solution of (29) contradicts the assumption $1\le|k|,|k'|,|k''|\le N-1$. Indeed, solutions with $x=-1$ (i.e. $k\equiv N \bmod 2N$), $y=-1$ (i.e. $k'\equiv N \bmod 2N$), or $xy=1$ (i.e. $k+k'\equiv 0 \bmod 2N$ and thus $k''\equiv N \bmod 2N$), contradict this assumption.

By Lemma 4.1, one can define $F_3$ as follows:
\[ iF^{(3)}_{kk'k''} := \begin{cases} \dfrac{G^{(3)}_{kk'k''}}{s_k+s_{k'}+s_{k''}} & (k,k',k'')\in K_3, \\[2mm] 0 & \text{otherwise.} \end{cases} \]


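Then $\{G_2,\alpha F_3\} + \alpha G_3 = 0$. Written more explicitly, the nonzero coefficients of $F_3$ are
\[ iF^{(3)}_{kk'k''} = \frac{(-1)^{(k+k'+k'')/N}\,\big|\sin\frac{k\pi}{N}\sin\frac{k'\pi}{N}\sin\frac{k''\pi}{N}\big|^{1/2}}{6\sqrt N\,\big(2\sin\frac{k\pi}{N} + 2\sin\frac{k'\pi}{N} + 2\sin\frac{k''\pi}{N}\big)}. \]
In a second step we normalize the 4th order term $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$ in (25). We decompose this sum into its contribution to the Birkhoff normal form and the rest, to be transformed away in a moment. Let us first compute $\{G_3,F_3\}$ in a more explicit form:
\begin{align*}
\{G_3,F_3\} &= i\sum_{1\le|k|\le N-1}\sigma_k\frac{\partial G_3}{\partial\zeta_k}\frac{\partial F_3}{\partial\zeta_{-k}} = \sum_{1\le|k|\le N-1}\sigma_k\frac{\partial G_3}{\partial\zeta_k}\frac{\partial(iF_3)}{\partial\zeta_{-k}} \\
&= \frac1N\sum_{1\le|k|\le N-1}\sigma_k\Big(\frac36\sum_{\substack{1\le|l|,|m|\le N-1,\\ l+m=-k+rN}} (-1)^r\lambda_k\lambda_l\lambda_m\,\zeta_l\zeta_m\Big)\cdot\Big(\frac36\sum_{\substack{1\le|l'|,|m'|\le N-1,\\ l'+m'=k+r'N}} (-1)^{r'}\frac{\lambda_k\lambda_{l'}\lambda_{m'}}{s_{-k}+s_{l'}+s_{m'}}\,\zeta_{l'}\zeta_{m'}\Big) \\
&= \frac{1}{8N}\sum_{1\le|k|\le N-1}\ \sum_{\substack{1\le|l|,|m|,|l'|,|m'|\le N-1\\ l+m-rN=-k\\ l'+m'-r'N=k}} (-1)^{r+r'}\,\frac{s_k\,\lambda_l\lambda_m\lambda_{l'}\lambda_{m'}}{s_{-k}+s_{l'}+s_{m'}}\,\zeta_l\zeta_m\zeta_{l'}\zeta_{m'},
\end{align*}
where for the latter equality we used that $2\sigma_k\lambda_k^2 = s_k$. Setting
\[ \varepsilon_{lml'm'} := \frac{l+m+l'+m'}{N}, \tag{30} \]
one then gets
\begin{align*}
\{G_3,F_3\} &= \frac{1}{8N}\sum_{1\le|k|\le N-1}\ \sum_{\substack{l+m\equiv -k \bmod N\\ l'+m'\equiv k \bmod N}} (-1)^{\varepsilon_{lml'm'}}\,\frac{\lambda_l\lambda_m\lambda_{l'}\lambda_{m'}}{-1+(s_{l'}+s_{m'})/s_k}\,\zeta_l\zeta_m\zeta_{l'}\zeta_{m'} \\
&= \frac{1}{8N}\sum_{k=1}^{N-1}\ \sum_{\substack{l+m\equiv -k \bmod N\\ l'+m'\equiv k \bmod N}} (-1)^{\varepsilon_{lml'm'}}\,\frac{\lambda_l\lambda_m\lambda_{l'}\lambda_{m'}}{-1+(s_{l'}+s_{m'})/s_k}\,\zeta_l\zeta_m\zeta_{l'}\zeta_{m'} \\
&\quad + \frac{1}{8N}\sum_{k=1}^{N-1}\ \sum_{\substack{l+m\equiv k \bmod N\\ l'+m'\equiv -k \bmod N}} (-1)^{\varepsilon_{lml'm'}}\,\frac{\lambda_l\lambda_m\lambda_{l'}\lambda_{m'}}{-1-(s_{l'}+s_{m'})/s_k}\,\zeta_l\zeta_m\zeta_{l'}\zeta_{m'} \\
&= \frac{1}{8N}\sum_{k=1}^{N-1}\ \sum_{\substack{l+m\equiv -k \bmod N\\ l'+m'\equiv k \bmod N}} \Big(\frac{1}{-1+(s_{l'}+s_{m'})/s_k} + \frac{1}{-1-(s_l+s_m)/s_k}\Big)\cdot(-1)^{\varepsilon_{lml'm'}}\,\lambda_l\lambda_m\lambda_{l'}\lambda_{m'}\,\zeta_l\zeta_m\zeta_{l'}\zeta_{m'}.
\end{align*}
Note that for $k = l'+m'+r'N$ with $1\le k\le N-1$ and $r'\in\mathbb{Z}$ we have $s_k = |s_{l'+m'}|$.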


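Introduce$^1$ for any $(l,m,l',m')\in K_4$,
\[ c_{lml'm'} = \begin{cases} \dfrac{1}{-1+\frac{s_{l'}+s_{m'}}{|s_{l'+m'}|}} - \dfrac{1}{1+\frac{s_l+s_m}{|s_{l+m}|}} & \text{if } l+m\not\equiv 0 \bmod N, \\[3mm] 0 & \text{otherwise.} \end{cases} \tag{31} \]
We then get
\[ \frac{\alpha^2}{2}\{G_3,F_3\} = \frac{\alpha^2}{16N}\sum_{(l,m,l',m')\in K_4} c_{lml'm'}\,(-1)^{\varepsilon_{lml'm'}}\,\lambda_l\lambda_m\lambda_{l'}\lambda_{m'}\,\zeta_l\zeta_m\zeta_{l'}\zeta_{m'}. \tag{32} \]
Combined with formula (20) for $G_4$, the quantity $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$ becomes
\[ \frac{1}{24N}\sum_{(k,k',k'',k''')\in K_4} (-1)^{\varepsilon_{kk'k''k'''}}\Big(\beta + \frac{3\alpha^2}{2}c_{kk'k''k'''}\Big)\lambda_k\lambda_{k'}\lambda_{k''}\lambda_{k'''}\,\zeta_k\zeta_{k'}\zeta_{k''}\zeta_{k'''}. \tag{33} \]
We now decompose (33) into its contribution to the Birkhoff normal form of $H_V$ and the rest, and we denote by $\pi_N$ the projection onto the former one, whereas the latter one will be (partially) transformed away by a second transformation $\Phi_2$.

Lemma 4.2. The normal form part $\pi_N\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$ of $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$ is given by
\[ \pi_N\Big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\Big) = \frac{1}{4N}\Big(\sum_{l=1}^{N-1}\big(\alpha^2+(\beta-\alpha^2)\lambda_l^4\big)|\zeta_l|^4 + 2\sum_{1\le l\ne m\le N-1}(\beta-\alpha^2)\lambda_l^2\lambda_m^2|\zeta_l|^2|\zeta_m|^2\Big). \tag{34} \]
Proof. The indices $k,k',k'',k'''$ of the terms in $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$ contributing to the normal form satisfy $(k,k',k'',k''')\in K_4^N$, where
\[ K_4^N := \{(k,k',k'',k''')\in K_4 \mid \exists\, 1\le l\le m\le N-1 \text{ such that } \{k,k',k'',k'''\} = \{l,-l,m,-m\}\}. \tag{35} \]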

In the case $l=m$, $\{l,-l,l,-l\}$ in (35) is viewed as a set-like object whose two elements $l$ and $-l$ each have multiplicity two. We investigate $\pi_N(\beta G_4)$ and $\pi_N\big(\frac{\alpha^2}{2}\{G_3,F_3\}\big)$ separately. Let us start with $\beta G_4$. We distinguish the cases $l=m$ and $l\ne m$ in $K_4^N$. For $l=m$, there are $\binom{4}{2} = 6$ distinct permutations of $(k,k',k'',k''')$ in $K_4^N$, whereas for $l\ne m$, all $4! = 24$ permutations of $(l,m,-l,-m)$ are distinct. Hence we have
\begin{align*}
\pi_N(\beta G_4) &= \frac{\beta}{24N}\Big(6\sum_{l=1}^{N-1}\lambda_l^4|\zeta_l|^4 + 24\sum_{1\le l<m\le N-1}\lambda_l^2\lambda_m^2|\zeta_l|^2|\zeta_m|^2\Big) \\
&= \frac{\beta}{4N}\Big(\sum_{l=1}^{N-1}\lambda_l^4|\zeta_l|^4 + 2\sum_{1\le l\ne m\le N-1}\lambda_l^2\lambda_m^2|\zeta_l|^2|\zeta_m|^2\Big). \tag{36}
\end{align*}

$^1$ To keep the formula for $c_{lml'm'}$ as simple as possible we have not symmetrized the coefficients $c_{lml'm'}$.


Now let us compute $\pi_N\big(\frac{\alpha^2}{2}\{G_3,F_3\}\big)$. We have to single out the matches of (35) for which in addition the coefficient $c_{kk'k''k'''}$ in (32) does not vanish, i.e.
\[ k+k'\not\equiv 0 \bmod N \quad\text{and}\quad k+k'+k''+k'''\equiv 0 \bmod N. \]
There are two types of quadruples $(k,k',k'',k''')$ in $K_4^N$ which satisfy these additional conditions,
\[ \begin{cases} k+k''=0 \\ k'+k'''=0 \end{cases} \quad\text{or}\quad \begin{cases} k+k'''=0 \\ k'+k''=0. \end{cases} \tag{37} \]
In both cases, we have $s_{k''}+s_{k'''} = -(s_k+s_{k'})$, and therefore (31) reduces to
\[ c_{kk'k''k'''} = \frac{-2|s_{k+k'}|}{|s_{k+k'}|+s_k+s_{k'}}. \tag{38} \]
Note that (38) remains valid for $k+k'=N$, since in this case $s_{k+k'}=0$ and $s_k+s_{k'}>0$ as $k$ and $k'$ must satisfy $1\le k,k'\le N-1$, but not for $k+k'=0$, since in this case $|s_{k+k'}|+s_k+s_{k'}=0$. We first compute the diagonal part of $\pi_N\big(\frac12\{G_3,F_3\}\big)$. In this case, the two possibilities in (37) coincide and the solutions are
\[ (k,k',k'',k''') = \begin{cases} (l,l,-l,-l) \\ (-l,-l,l,l), \end{cases} \tag{39} \]
where $1\le l\le N-1$. The sum of the coefficients $c_{kk'k''k'''}$ for the two cases listed in (39) is
\[ c_{l,l,-l,-l} + c_{-l,-l,l,l} = -2|s_{2l}|\Big(\frac{1}{|s_{2l}|+2s_l} + \frac{1}{|s_{2l}|-2s_l}\Big) = \frac{-4s_{2l}^2}{s_{2l}^2-4s_l^2} = 4\cot^2\frac{l\pi}{N}. \]
We now turn to the off-diagonal part of $\pi_N\big(\frac12\{G_3,F_3\}\big)$. The quadruples $(k,k',k'',k''')\in K_4$ satisfying (37) for given $\{l,m\}\subseteq\{1,\dots,N-1\}$ with $l<m$, $(k,k') = (\pm l,\pm m)$, and $(k'',k''') = (\pm l,\pm m)$, are
\[ (k,k',k'',k''') = \begin{cases} (l,\ m,\ -l,\ -m) \\ (l,\ -m,\ -l,\ m) \\ (-l,\ m,\ l,\ -m) \\ (-l,\ -m,\ l,\ m). \end{cases} \tag{40} \]

The remaining matches are obtained from (40) by permuting the first and second or the third and fourth columns on the right-hand side of (40), bringing the total number of all matches to 16 = 4 · 4. Note that by formula (38), these permutations leave the value of the coefficients ckk k k invariant. Taking the sum of the coefficients ckk k k for all the


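quadruples listed in (40), we obtain
\begin{align*}
4\big(c_{l,m,-l,-m} + c_{l,-m,-l,m} + c_{-l,m,l,-m} + c_{-l,-m,l,m}\big)
&= -8\Big(\frac{|s_{l+m}|}{|s_{l+m}|+s_l+s_m} + \frac{|s_{l-m}|}{|s_{l-m}|+s_l-s_m} + \frac{|s_{l-m}|}{|s_{l-m}|-s_l+s_m} + \frac{|s_{l+m}|}{|s_{l+m}|-s_l-s_m}\Big) \\
&= -16\Big(\frac{s_{l-m}^2}{s_{l-m}^2-(s_l-s_m)^2} + \frac{s_{l+m}^2}{s_{l+m}^2-(s_l+s_m)^2}\Big) \\
&= \frac{-16\big(2s_{l-m}^2s_{l+m}^2 - s_{l-m}^2(s_l+s_m)^2 - s_{l+m}^2(s_l-s_m)^2\big)}{s_{l-m}^2s_{l+m}^2 + (s_l-s_m)^2(s_l+s_m)^2 - s_{l-m}^2(s_l+s_m)^2 - s_{l+m}^2(s_l-s_m)^2} \\
&= -16,
\end{align*}
since $s_{l-m}^2s_{l+m}^2 = (s_l-s_m)^2(s_l+s_m)^2$. Collecting terms, we thus have
\begin{align*}
\pi_N\Big(\frac{\alpha^2}{2}\{G_3,F_3\}\Big) &= \frac{\alpha^2}{16N}\Big(\sum_{l=1}^{N-1} 4\cos^2\frac{\pi l}{N}\,|\zeta_l|^4 - 16\sum_{1\le l<m\le N-1}\lambda_l^2\lambda_m^2|\zeta_l|^2|\zeta_m|^2\Big) \\
&= \frac{\alpha^2}{4N}\Big(\sum_{l=1}^{N-1}(1-\lambda_l^4)|\zeta_l|^4 - 2\sum_{1\le l\ne m\le N-1}\lambda_l^2\lambda_m^2|\zeta_l|^2|\zeta_m|^2\Big). \tag{41}
\end{align*}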
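The two coefficient sums entering (41), namely $4\cot^2\frac{l\pi}{N}$ on the diagonal and $-16$ off the diagonal, can be checked numerically straight from the definition (31). The following is a minimal sketch under our reading of (31); the function names are illustrative and not part of the original argument.

```python
import math


def s(k, n):
    """s_k = 2 sin(k*pi/N)."""
    return 2.0 * math.sin(k * math.pi / n)


def c(l, m, lp, mp, n):
    """Coefficient c_{l m l' m'} of (31); (l, m, lp, mp) is assumed to lie in K_4."""
    if (l + m) % n == 0:
        return 0.0
    first = 1.0 / (-1.0 + (s(lp, n) + s(mp, n)) / abs(s(lp + mp, n)))
    second = 1.0 / (1.0 + (s(l, n) + s(m, n)) / abs(s(l + m, n)))
    return first - second


def diagonal_sum(l, n):
    """Sum of the coefficients over the two quadruples in (39)."""
    return c(l, l, -l, -l, n) + c(-l, -l, l, l, n)


def off_diagonal_sum(l, m, n):
    """Sum over all 16 matches: 4 times the four quadruples in (40)."""
    return 4.0 * (c(l, m, -l, -m, n) + c(l, -m, -l, m, n)
                  + c(-l, m, l, -m, n) + c(-l, -m, l, m, n))
```

The off-diagonal sum is independent of $l$, $m$ and $N$, which is what makes the mixed term in (34) proportional to $\beta-\alpha^2$.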

Adding up (36) and (41), we obtain (34).

Now we want to remove [as much as possible of] the term $(\mathrm{Id}-\pi_N)\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$ from the Hamiltonian (25), $\tilde H_V\circ\Phi_1$, by a second coordinate transformation $\Phi_2$. In view of formulas (20) and (32) for $G_4$ and $\frac12\{G_3,F_3\}$, respectively, and in complete analogy to the first step we look for a transformation $\Phi_2$ of the form $\Phi_2 = X^t_{F_4}|_{t=1}$ with
\[ F_4 = \sum_{(k,k',k'',k''')\in K_4\setminus K_4^N} F^{(4)}_{kk'k''k'''}\,\zeta_k\zeta_{k'}\zeta_{k''}\zeta_{k'''}, \]
where $F^{(4)}_{\sigma(k,k',k'',k''')} = F^{(4)}_{(k,k',k'',k''')}$ for any permutation $\sigma(k,k',k'',k''')$ of the quadruple $(k,k',k'',k''')\in K_4\setminus K_4^N$. We would like to determine the coefficients of $F_4$ in such a way that
\[ \{G_2,F_4\} = -(\mathrm{Id}-\pi_N)\Big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\Big). \tag{42} \]
As in (26) one gets
\[ \{G_2,F_4\} = -i\sum_{(k,k',k'',k''')\in K_4\setminus K_4^N} (s_k+s_{k'}+s_{k''}+s_{k'''})\,F^{(4)}_{kk'k''k'''}\,\zeta_k\zeta_{k'}\zeta_{k''}\zeta_{k'''}, \tag{43} \]


and Eq. (42) combined with (33) leads to
\[ i(s_k+s_{k'}+s_{k''}+s_{k'''})\,F^{(4)}_{kk'k''k'''} = \frac{1}{24N}(-1)^{\varepsilon_{kk'k''k'''}}\Big(\beta + \frac{3\alpha^2}{2}c^S_{kk'k''k'''}\Big)\cdot\lambda_k\lambda_{k'}\lambda_{k''}\lambda_{k'''} \tag{44} \]
for any quadruple $(k,k',k'',k''')$ in $K_4\setminus K_4^N$. Here $c^S_{kk'k''k'''}$ denotes the symmetrized version of $c_{kk'k''k'''}$,
\[ c^S_{kk'k''k'''} := \frac{1}{4!}\sum_{\sigma\in S_4} c_{\sigma(k,k',k'',k''')}. \tag{45} \]
The following lemma due to Beukers and Rink (cf. [13]) determines the quadruples $(k,k',k'',k''')\in K_4\setminus K_4^N$ for which $s_k+s_{k'}+s_{k''}+s_{k'''}=0$. Let us introduce $K_4^{res} := K_+^{res}\cup K_-^{res}\subseteq K_4$, where
\[ K_\pm^{res} := \Big\{(k,k',k'',k''')\in K_4 \ \Big|\ \exists\, l\in\mathbb{N}: 1\le l\le\frac N4 \text{ so that } \{k,k',k'',k'''\} = \Big\{\pm l,\ \pm l\mp N,\ \frac N2\mp l,\ -\frac N2\mp l\Big\}\Big\}. \]
Note that if $N$ is odd, then $K_4^{res}=\emptyset$.

Lemma 4.3. Let $(k,k',k'',k''')\in K_4\setminus K_4^N$. Then $s_k+s_{k'}+s_{k''}+s_{k'''}=0$ if and only if $(k,k',k'',k''')\in K_4^{res}$. In particular, if $N$ is odd, then $s_k+s_{k'}+s_{k''}+s_{k'''}\ne 0$.

For the convenience of the reader a detailed proof of Lemma 4.3 is given in Appendix A. It is likely that Lemma 4.3 also can be proved using the integrability of the Toda lattice (cf. [7]). By Lemma 4.3, if $N$ is odd, (44) can be solved for any $(k,k',k'',k''')\in K_4\setminus K_4^N$, determining the coefficients $F^{(4)}_{kk'k''k'''}$ with $(k,k',k'',k''')\in K_4\setminus K_4^N$ in such a way that $F^{(4)}_{\sigma(k,k',k'',k''')} = F^{(4)}_{(k,k',k'',k''')}$ for any permutation $\sigma(k,k',k'',k''')$ of $(k,k',k'',k''')\in K_4\setminus K_4^N$. With this choice of $F_4$ the canonical transformation $\Phi_2$ is then defined by $X^t_{F_4}|_{t=1}$. Composing $\Phi_1$ and $\Phi_2$, we obtain the transformation $\Phi := \Phi_1\circ\Phi_2$. We have proved the following

Proposition 4.4. Assume that $N\ge 3$ is odd. The real analytic symplectic coordinate transformation $\zeta = \Phi(z)$, defined in a neighborhood of the origin in $Z$, transforms the Hamiltonian $\tilde H_V$ into its Birkhoff normal form up to order 4. More precisely,
\[ \tilde H_V\circ\Phi = G_2 + \pi_N\Big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\Big) + O(z^5), \tag{46} \]
with $G_2$ and $\pi_N\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$ given by (18) and (34), respectively.
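The dichotomy of Lemma 4.3, on which Proposition 4.4 rests, can be explored by brute force for small $N$: enumerate the quadruples in $K_4\setminus K_4^N$ (as multisets), keep those whose sines sum to zero, and compare with $K_4^{res}$. The sketch below is only a numerical illustration with a floating point tolerance; all function names are ours.

```python
import math
from itertools import combinations_with_replacement


def sine_sum(quad, n):
    """sum_i sin(k_i*pi/N), i.e. half of s_k + s_k' + s_k'' + s_k'''."""
    return sum(math.sin(k * math.pi / n) for k in quad)


def in_k4_normal(quad):
    """True if the multiset equals {l, -l, m, -m} for some l, m (the set K_4^N)."""
    rest = list(quad)
    for _ in range(2):
        k = rest.pop()
        if -k not in rest:
            return False
        rest.remove(-k)
    return True


def zero_sum_quadruples(n, tol=1e-9):
    """Multisets in K_4 minus K_4^N whose sines sum to zero, as sorted tuples."""
    idx = [k for k in range(-(n - 1), n) if k != 0]
    found = set()
    for quad in combinations_with_replacement(idx, 4):
        if sum(quad) % n != 0 or in_k4_normal(quad):
            continue
        if abs(sine_sum(quad, n)) < tol:
            found.add(quad)
    return found


def k4_res(n):
    """K_4^res as sorted tuples, following the definition of K_+^res and K_-^res."""
    res = set()
    if n % 2 == 0:
        half = n // 2
        for l in range(1, n // 4 + 1):
            res.add(tuple(sorted((l, l - n, half - l, -half - l))))
            res.add(tuple(sorted((-l, -l + n, half + l, -half + l))))
    return res
```

For odd $N$ the first set comes out empty, while for even $N$ it coincides with `k4_res(N)` in the cases we tried.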


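Theorem 1.1 can now be proved easily.

Proof of Theorem 1.1. Proposition 4.4 provides the Taylor series expansion of $\tilde H_V$ in terms of the actions $I = (I_k)_{1\le k\le N-1}$,
\[ I_k = \frac{x_k^2+y_k^2}{2}. \tag{47} \]
More precisely, $\tilde H_V\circ\Phi = H_{\alpha,\beta}(I) + O(z^5)$, where $H_{\alpha,\beta}(I)$ is defined by
\[ 2\sum_{k=1}^{N-1}\lambda_k^2 I_k + \frac{1}{4N}\sum_{k=1}^{N-1}\big(\alpha^2+(\beta-\alpha^2)\lambda_k^4\big)I_k^2 + \frac{\beta-\alpha^2}{2N}\sum_{\substack{l\ne m\\ 1\le l,m\le N-1}}\lambda_l^2\lambda_m^2 I_l I_m \tag{48} \]
and $\lambda_k = \big|\sin\frac{k\pi}{N}\big|^{1/2}$.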

This proves Theorem 1.1.
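For concreteness, the normal form (48) is easy to evaluate numerically. The sketch below (function name and argument layout are ours) computes $H_{\alpha,\beta}(I)$ for given actions; in the Toda case $\beta=\alpha^2$ the mixed term drops out and the quartic part reduces to $\frac{\alpha^2}{4N}\sum_k I_k^2$.

```python
import math


def h_alpha_beta(alpha, beta, actions):
    """Evaluate H_{alpha,beta}(I) from (48); actions = (I_1, ..., I_{N-1})."""
    n = len(actions) + 1
    # lambda_k^2 = |sin(k*pi/N)| for k = 1, ..., N-1
    lam2 = [abs(math.sin(k * math.pi / n)) for k in range(1, n)]
    linear = 2.0 * sum(l2 * i for l2, i in zip(lam2, actions))
    diagonal = sum((alpha ** 2 + (beta - alpha ** 2) * l2 ** 2) * i ** 2
                   for l2, i in zip(lam2, actions)) / (4.0 * n)
    # sum over ordered pairs (l, m) with l != m, as in (48)
    mixed = sum(lam2[a] * lam2[b] * actions[a] * actions[b]
                for a in range(n - 1) for b in range(n - 1) if a != b)
    return linear + diagonal + (beta - alpha ** 2) / (2.0 * n) * mixed
```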

5. Proof of Theorem 1.4.

Now we assume that $N$ is even. To obtain the normal form of the FPU Hamiltonian as claimed in Theorem 1.4 we continue the investigations of the previous section. According to Lemma 4.3, Eq. (44) might have no solution $F^{(4)}_{kk'k''k'''}$ for $(k,k',k'',k''')\in K_4^{res}$. We first compute the projection $\pi_{res}\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$ of $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$ onto those terms which are indexed by quadruples $(k,k',k'',k''')\in K_4^{res}$, i.e. the projection onto the resonant non-normal form part of $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$.

Lemma 5.1. Assume that $N$ is even. The resonant non-normal form part of $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$ is given by
\[ \pi_{res}\Big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\Big) = -\frac{\beta-\alpha^2}{4N}\big(R + R_{N/4}\big), \tag{49} \]
where
\[ R = \sum_{1\le l<\frac N4} s_{2l}\Big(\zeta_l\,\zeta_{-N+l}\,\zeta_{\frac N2-l}\,\zeta_{-\frac N2-l} + \zeta_{-l}\,\zeta_{N-l}\,\zeta_{\frac N2+l}\,\zeta_{-\frac N2+l}\Big) \tag{50} \]
and
\[ R_{N/4} = \begin{cases} \frac12\Big(\zeta_{\frac N4}^2\,\zeta_{-\frac{3N}{4}}^2 + \zeta_{\frac{3N}{4}}^2\,\zeta_{-\frac N4}^2\Big) & \text{if } \frac N4\in\mathbb{N}, \\[2mm] 0 & \text{otherwise.} \end{cases} \tag{51} \]

Proof. Consider the formula (33) for $\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}$. At this point we need to consider the symmetrized version (45) of the coefficients $c_{klk'l'}$ defined by (31). We claim that for any $(k_1,k_2,k_3,k_4)\in K_4^{res}$,
\[ c^S_{k_1k_2k_3k_4} = \frac{1}{4!}\sum_{\sigma\in S_4} c_{k_{\sigma(1)}k_{\sigma(2)}k_{\sigma(3)}k_{\sigma(4)}} = -\frac23. \tag{52} \]


Observe that ck1 k2 k3 k4 is invariant under the transpositions k1 ↔ k2 and k3 ↔ k4 . Hence (52) follows once we prove that (53) 4 ck1 k2 k3 k4 +ck1 k3 k2 k4 +ck1 k4 k2 k3 +ck2 k4 k1 k3 +ck2 k3 k1 k4 +ck3 k4 k1 k2 = −16. Note that any element (k1 , k2 , k3 , k4 ) ∈ K 4r es is, mod 2N , a permutation of an element of the form (l, −N + l, N /2 − l, −N /2 − l) with 1 ≤ |l| ≤ N /4. For such quadruples one gets by a straightforward computation ck1 k2 k3 k4 + ck3 k4 k1 k2 = −2 − 2 = −4 and, with cl = 2 cos lπ N, ck1 k3 k2 k4 + ck2 k4 k1 k3 = −

4 8 4 − =− 2 + (sl + cl ) 2 − (sl + cl ) s2l

ck1 k4 k2 k3 + ck2 k3 k1 k4 = −

4 8 4 − = . 2 + (sl − cl ) 2 − (sl − cl ) s2l

as well as

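Substituting these three identities into the left hand side of (53) leads to the claimed identity (53). Moreover, by the definition (30) of $\varepsilon_{lml'm'}$ one has for any $(k_1,k_2,k_3,k_4)\in K_4^{res}$ and any $\sigma\in S_4$, that $\varepsilon_{\sigma(k_1,k_2,k_3,k_4)} = \pm1$ and hence $(-1)^{\varepsilon_{\sigma(k_1,k_2,k_3,k_4)}} = -1$. Further,
\[ \lambda_{k_1}\lambda_{k_2}\lambda_{k_3}\lambda_{k_4} = \sin\frac{l\pi}{N}\cos\frac{l\pi}{N} = \frac12\sin\frac{2l\pi}{N} = \frac14|s_{2l}|. \]
Combining all these computations we get
\begin{align*}
\pi_{res}\Big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\Big)
&= \frac{1}{24N}\sum_{(k_1,k_2,k_3,k_4)\in K_4^{res}} (-1)\Big(\beta + \frac{3\alpha^2}{2}c^S_{k_1k_2k_3k_4}\Big)\lambda_{k_1}\lambda_{k_2}\lambda_{k_3}\lambda_{k_4}\,\zeta_{k_1}\zeta_{k_2}\zeta_{k_3}\zeta_{k_4} \\
&= -\frac{4!}{24N}(\beta-\alpha^2)\sum_{1\le l<\frac N4}\frac{s_{2l}}{4}\Big(\zeta_l\,\zeta_{-N+l}\,\zeta_{\frac N2-l}\,\zeta_{-\frac N2-l} + \zeta_{-l}\,\zeta_{N-l}\,\zeta_{\frac N2+l}\,\zeta_{-\frac N2+l}\Big) \\
&\quad - \underbrace{\frac{3!}{24N}(\beta-\alpha^2)\cdot\frac12\Big(\zeta_{\frac N4}^2\,\zeta_{-\frac{3N}{4}}^2 + \zeta_{-\frac N4}^2\,\zeta_{\frac{3N}{4}}^2\Big)}_{\text{only if } \frac N4\in\mathbb{N}} \\
&= -\frac{\beta-\alpha^2}{4N}\big(R + R_{N/4}\big), \tag{54}
\end{align*}
with $R$ and $R_{N/4}$ as defined by (50) and (51), respectively. Hence Lemma 5.1 is proved.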


By Lemma 5.1, if $N$ is even, Eq. (44) can be solved for any quadruple $(k,k',k'',k''')\in K_4\setminus(K_4^N\cup K_4^{res})$ in such a way that $F^{(4)}_{\sigma(k,k',k'',k''')} = F^{(4)}_{(k,k',k'',k''')}$ for any permutation $\sigma(k,k',k'',k''')$ of $(k,k',k'',k''')$. With this choice of $F_4$ the canonical transformation $\Phi_2$ is then defined by $X^t_{F_4}|_{t=1}$. Composing $\Phi_1$ and $\Phi_2$, we obtain the transformation $\Phi := \Phi_1\circ\Phi_2$ and have proved the following

Proposition 5.2. Assume that $N$ is even. The real analytic symplectic coordinate transformation $\zeta = \Phi(z)$, defined locally in a neighborhood of the origin $z=0$ in $Z$, transforms the Hamiltonian $\tilde H_V$ into the resonant Birkhoff normal form up to order 4,
\[ \tilde H_V\circ\Phi = G_2 + \pi_N\Big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\Big) + \pi_{res}\Big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\Big) + O(z^5), \]
with $G_2$, $\pi_N\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$, and $\pi_{res}\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$ given by (18), (34), and (49), respectively.

Proof of Theorem 1.4. We start with the formula for $\tilde H_V\circ\Phi$ given by Proposition 5.2 and treat the normal form terms $G_2 + \pi_N\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$ and the resonant normal form terms $\pi_{res}\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$ separately. With the action variables $I = (I_k)_{1\le k\le N-1}$ defined by (47) we see that $G_2 + \pi_N\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big) = H_{\alpha,\beta}(I)$, where $H_{\alpha,\beta}(I)$ is defined by (48). Concerning $\pi_{res}\big(\beta G_4 + \frac{\alpha^2}{2}\{G_3,F_3\}\big)$, we first express it in terms of the real variables $(x_k,y_k)_{1\le k\le N-1}$, related to the $\zeta_k$'s by $x_k = (\zeta_k+\zeta_{-k})/\sqrt2$ and $y_k = (\zeta_{-k}-\zeta_k)/(\sqrt2\,i)$. Note that
\begin{align*}
\zeta_l\,\zeta_{-N+l}\,\zeta_{\frac N2-l}\,\zeta_{-\frac N2-l} + \zeta_{-l}\,\zeta_{N-l}\,\zeta_{-\frac N2+l}\,\zeta_{\frac N2+l}
&= 2\,\mathrm{Re}\big(\zeta_l\,\zeta_{-N+l}\,\zeta_{\frac N2-l}\,\zeta_{-\frac N2-l}\big) \\
&= \frac12\Big[(x_lx_{N-l}+y_ly_{N-l})\big(x_{\frac N2-l}x_{\frac N2+l}+y_{\frac N2-l}y_{\frac N2+l}\big) \\
&\qquad - (x_ly_{N-l}-x_{N-l}y_l)\big(x_{\frac N2-l}y_{\frac N2+l}-x_{\frac N2+l}y_{\frac N2-l}\big)\Big] \\
&= 2\big(J_lJ_{\frac N2-l} - M_lM_{\frac N2-l}\big), \tag{55}
\end{align*}
where for any $1\le k\le N-1$,
\[ J_k = \frac12(x_kx_{N-k}+y_ky_{N-k}) \quad\text{and}\quad M_k = \frac12(x_ky_{N-k}-x_{N-k}y_k). \]
Hence $R$, given by (50), can be expressed in terms of $J_k$ and $M_k$ as follows:
\begin{align*}
R(J,M) &= \sum_{1\le l<\frac N4} s_{2l}\Big(\zeta_l\,\zeta_{-N+l}\,\zeta_{\frac N2-l}\,\zeta_{-\frac N2-l} + \zeta_{-l}\,\zeta_{N-l}\,\zeta_{\frac N2+l}\,\zeta_{-\frac N2+l}\Big) \\
&= 4\sum_{1\le l<\frac N4}\sin\frac{2l\pi}{N}\big(J_lJ_{\frac N2-l} - M_lM_{\frac N2-l}\big). \tag{56}
\end{align*}
Similarly, if $\frac N4\in\mathbb{N}$, one concludes from (55) that $R_{N/4}$, given by (51), can be expressed as
\[ R_{N/4}(J,M) = \frac12\Big(\zeta_{\frac N4}^2\,\zeta_{-\frac{3N}{4}}^2 + \zeta_{\frac{3N}{4}}^2\,\zeta_{-\frac N4}^2\Big) = J_{\frac N4}^2 - M_{\frac N4}^2. \tag{57} \]
Theorem 1.4 now follows from the formulas (48), (56), and (57).
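Identity (55) is purely algebraic and can be checked on arbitrary real data. The sketch below assumes the normalization $\zeta_k = (x_k - iy_k)/\sqrt2$, $\zeta_{-k} = \overline{\zeta_k}$, which is the one compatible with (47); the helper names are ours.

```python
import math


def zeta(x, y):
    """zeta_k = (x_k - i*y_k)/sqrt(2); zeta_{-k} is its complex conjugate."""
    return complex(x, -y) / math.sqrt(2.0)


def j_and_m(xk, yk, xp, yp):
    """J_k and M_k for the pair (k, N-k); (xp, yp) carry the index N-k."""
    return 0.5 * (xk * xp + yk * yp), 0.5 * (xk * yp - xp * yk)


def both_sides(xy_l, xy_nl, xy_h, xy_hp):
    """Left and right hand side of (55).

    The arguments are (x, y) pairs for the indices
    l, N-l, N/2-l and N/2+l, in this order.
    """
    z_l, z_nl = zeta(*xy_l), zeta(*xy_nl)
    z_h, z_hp = zeta(*xy_h), zeta(*xy_hp)
    # zeta_{-N+l} = conj(zeta_{N-l}) and zeta_{-N/2-l} = conj(zeta_{N/2+l})
    monomial = z_l * z_nl.conjugate() * z_h * z_hp.conjugate()
    j1, m1 = j_and_m(*xy_l, *xy_nl)
    j2, m2 = j_and_m(*xy_h, *xy_hp)
    return 2.0 * monomial.real, 2.0 * (j1 * j2 - m1 * m2)
```

Taking the last two arguments equal to the first two reproduces the case $l = N/4$ used in (57), up to the factor $\frac12$ in (51).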


6. Proof of Theorem 1.3.

To analyze the Hessian $Q_{\alpha,\beta}$ of (48) at $I=0$ we repeatedly encounter matrices of the form $E + \mathrm{diag}(\mu_1,\dots,\mu_{N-1})$, where $E$ is the $(N-1)\times(N-1)$-matrix
\[ E := \begin{pmatrix} 1 & \dots & 1 \\ \vdots & & \vdots \\ 1 & \dots & 1 \end{pmatrix} \tag{58} \]
and $(\mu_k)_{1\le k\le N-1}$ are given complex numbers. The determinant of the matrix $E + \mathrm{diag}(\mu_1,\dots,\mu_{N-1})$ can be explicitly computed.

Lemma 6.1. Let $(\mu_k)_{1\le k\le N-1}$ be given nonzero complex numbers. Then
\[ \det\big(E + \mathrm{diag}(\mu_1,\dots,\mu_{N-1})\big) = \Big(1 + \sum_{k=1}^{N-1}\frac{1}{\mu_k}\Big)\cdot\prod_{k=1}^{N-1}\mu_k. \tag{59} \]
In particular, $E + \mathrm{diag}(\mu_1,\dots,\mu_{N-1})$ is regular if and only if $\sum_{k=1}^{N-1}\frac{1}{\mu_k}\ne -1$.

Proof. Expanding $\det(E + \mathrm{diag}(\mu_1,\dots,\mu_{N-1}))$ with respect to its rows it follows that
\[ \det\big(E + \mathrm{diag}(\mu_1,\dots,\mu_{N-1})\big) = \prod_{k=1}^{N-1}\mu_k + \sum_{k=1}^{N-1}\prod_{l\ne k}\mu_l. \]

This leads to formula (59).
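Formula (59) can be confirmed in exact rational arithmetic for any concrete choice of the $\mu_k$. The following sketch uses a plain Gaussian elimination over `fractions.Fraction`; the helper names are ours, and the second check in the usage note evaluates the matrix $2E - \mathrm{Id}$ for $N=5$.

```python
from fractions import Fraction


def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def ones_plus_diag(mu):
    """The matrix E + diag(mu_1, ..., mu_{N-1}) with E the all-ones matrix (58)."""
    n = len(mu)
    return [[1 + (mu[i] if i == j else 0) for j in range(n)] for i in range(n)]


def lemma_6_1_rhs(mu):
    """Right hand side of (59): (1 + sum 1/mu_k) * prod mu_k."""
    product = Fraction(1)
    for m in mu:
        product *= m
    return (1 + sum(Fraction(1) / Fraction(m) for m in mu)) * product
```

With $\mu_k = -\frac12$ for all $k$ and $N-1 = 4$, the value `2**4 * det(ones_plus_diag([Fraction(-1, 2)] * 4))` is the determinant of $2E - \mathrm{Id}_4$, which equals $(-1)^5(2\cdot5-3) = -7$.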

First let us treat the case $\alpha = 0$, $\beta\ne 0$. The following proposition improves earlier results of Rink [13].

Proposition 6.2. Let $N$ be odd and assume that $\alpha = 0$ in (2). Then the following holds:
(i) The Birkhoff normal form of $H_V$ up to order 4 is given by $\frac{NP^2}{2} + H_{0,\beta}(I)$, where
\[ H_{0,\beta}(I) = 2\sum_{k=1}^{N-1}\lambda_k^2 I_k + \frac{\beta}{4N}\Big(\sum_{k=1}^{N-1}\lambda_k^4 I_k^2 + 2\sum_{1\le l\ne m\le N-1}\lambda_l^2\lambda_m^2 I_l I_m\Big). \tag{60} \]
(ii) For any $\beta\ne 0$, $H_{0,\beta}(I)$ is nondegenerate at $I=0$.

Proof. The Birkhoff normal form (60) of $H_V$ is given by the formula (48) evaluated at $\alpha=0$. To investigate the Hessian $Q_{0,\beta}$ of $H_{0,\beta}(I)$ at $I=0$ we write
\[ Q_{0,\beta} = \frac{\beta}{4N}\,\Lambda P\Lambda, \tag{61} \]
where
\[ \Lambda = \mathrm{diag}\Big(\sin\frac{k\pi}{N}\Big)_{1\le k\le N-1} \tag{62} \]


and
\[ P = 2\cdot\Big(E - \frac12\,\mathrm{Id}_{N-1}\Big). \]
In view of (59) it follows that
\[ \det Q_{0,\beta} = \Big(\frac{\beta}{4N}\Big)^{N-1}\cdot\det P\cdot\prod_{k=1}^{N-1}\sin^2\frac{k\pi}{N}, \]
where by Lemma 6.1,
\[ \det P = 2^{N-1}\big(1-2(N-1)\big)(-1/2)^{N-1} = (-1)^N(2N-3)\ne 0. \]
Hence, if $\beta\ne 0$, $\det Q_{0,\beta}\ne 0$, and the nondegeneracy of $H_{0,\beta}(I)$ at $I=0$ follows.

Lemma 6.3. If $\beta<0$, then $Q_{0,\beta}$ has one negative eigenvalue, whereas if $\beta>0$, then $Q_{0,\beta}$ has $N-2$ negative eigenvalues. In particular, for any $\beta\ne 0$, $Q_{0,\beta}$ is indefinite (and $H_{0,\beta}$ is therefore not convex).

Proof. We want to use the decomposition (61) of $Q_{0,\beta}$ to show that $Q_{0,\beta}$ can be deformed continuously to $\frac{\beta}{4N}P$: Consider for $0\le t\le 1$,
\[ Q_{0,\beta}(t) := \frac{\beta}{4N}\big(t\Lambda + (1-t)\,\mathrm{Id}\big)\,P\,\big(t\Lambda + (1-t)\,\mathrm{Id}\big). \]

As $t\Lambda + (1-t)\,\mathrm{Id}$ is positive definite for any $0\le t\le 1$ and $P$ is regular, $Q_{0,\beta}(t)$ is a symmetric regular $(N-1)\times(N-1)$-matrix for any $0\le t\le 1$. For $t=0$, $Q_{0,\beta}(0) = \frac{\beta}{4N}P$, whereas for $t=1$, $Q_{0,\beta}(1) = Q_{0,\beta}$. Therefore, $\mathrm{index}(Q_{0,\beta})$ (i.e. the number of negative eigenvalues of $Q_{0,\beta}$) coincides with $\mathrm{index}\big(\frac{\beta}{4N}P\big)$. The eigenvalues of $P$ are $\mu_1 = 2N-3$ with multiplicity one and $\mu_2 = -1$ with multiplicity $N-2$.

We now turn to the case $\alpha\ne 0$.

Proposition 6.4. Assume that $N$ is odd and $\alpha\ne 0$ in (2). Then, for $\alpha$ fixed, $\det Q_{\alpha,\beta}$ is a polynomial in $\beta$ of degree $N-1$ and has $N-1$ real zeroes (counted with multiplicities) which we list in increasing order and denote by $\beta_k = \beta_k(\alpha)$ ($1\le k\le N-1$). They satisfy $0<\beta_1<\alpha^2$, $2\alpha^2<\beta_2\le\dots\le\beta_{N-1}$ and contain the $(N-1)/2$ distinct numbers
\[ \alpha^2\Big(1 + \Big(\sin\frac{k\pi}{N}\Big)^{-2}\Big) \quad \Big(1\le k\le\frac{N-1}{2}\Big). \]
Moreover
\[ \mathrm{index}(Q_{\alpha,\beta}) = \begin{cases} 1 & \text{for } \beta<\beta_1, \\ 0 & \text{for } \beta_1<\beta<\beta_2, \\ N-2 & \text{for } \beta>\beta_{N-1}. \end{cases} \]


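Proof. Fix $\alpha\in\mathbb{R}\setminus\{0\}$ and consider the map $\beta\mapsto\det(Q_{\alpha,\beta})$. It follows from (48) that $\det(Q_{\alpha,\beta})$ is a polynomial in $\beta$ of degree at most $N-1$,
\[ \det(Q_{\alpha,\beta}) = \sum_{j=0}^{N-1} q_j\beta^j, \]
where $q_0 = \det(Q_{\alpha,0})$ and $q_{N-1} = \det(Q_{0,1})$. By Proposition 6.2, $\det(Q_{0,1})\ne 0$, hence the degree of the polynomial $\det(Q_{\alpha,\beta})$ is $N-1$. We claim that $\det(Q_{\alpha,\beta})$ has $N-1$ real zeroes (counted with multiplicities). For $|\beta|$ large enough, $\mathrm{index}(Q_{\alpha,\beta})$ is equal to $\mathrm{index}(Q_{0,\beta})$. By Lemma 6.3, $\mathrm{index}(Q_{0,\beta})$ is $N-2$ for $\beta>0$ and $1$ for $\beta<0$. Hence there exists $R>0$ such that $\mathrm{index}(Q_{\alpha,\beta}) = N-2$ for any $\beta>R$ and $\mathrm{index}(Q_{\alpha,\beta}) = 1$ for any $\beta<-R$. For $\beta=\alpha^2$, $Q_{\alpha,\alpha^2}$ is a positive multiple of the identity matrix, hence $\mathrm{index}(Q_{\alpha,\alpha^2}) = 0$. It then follows that $\mathrm{index}(Q_{\alpha,\beta})$ must change at least once in the open interval $(-\infty,\alpha^2)$ and at least $N-2$ times (counted with multiplicities) in $(\alpha^2,\infty)$. Since a change of $\mathrm{index}(Q_{\alpha,\beta})$ induces a zero of $\det(Q_{\alpha,\beta})$ (counted with multiplicities), our consideration shows that $\beta\mapsto\det(Q_{\alpha,\beta})$ has $N-1$ real zeroes. Further we have $\beta_1(\alpha)<\alpha^2<\beta_2(\alpha)$.

Next we prove that $\beta_1(\alpha)>0$, i.e. that $Q_{\alpha,\beta}$ is regular for any $\beta\le 0$. Write $Q_{\alpha,\beta}$ as a product,
\[ Q_{\alpha,\beta} = \frac{\alpha^2-\beta}{4N}\,\Lambda P_{\alpha,\beta}\Lambda, \tag{63} \]
where $\Lambda$ is given by (62) and $P_{\alpha,\beta}$ is given by
\[ P_{\alpha,\beta} = -2\Big(E + \mathrm{diag}\Big(-\frac12\Big(1+\frac{\gamma(\alpha,\beta)}{\sin^2\frac{k\pi}{N}}\Big)\Big)_{1\le k\le N-1}\Big), \tag{64} \]
where $E$ is given by (58) and
\[ \gamma(\alpha,\beta) := \frac{\alpha^2}{\alpha^2-\beta}. \tag{65} \]
As $-\infty<\beta\le 0$ it follows that $0<\gamma(\alpha,\beta)\le 1$ and $-\frac12\big(1+\frac{\gamma(\alpha,\beta)}{\sin^2\frac{k\pi}{N}}\big)<0$ for any $1\le k\le N-1$. Lemma 6.1 says that $P_{\alpha,\beta}$ is regular if $f(\gamma(\alpha,\beta))\ne 0$, where
\[ f(\gamma) := 1 - 2\sum_{k=1}^{N-1}\Big(1+\gamma\Big/\sin^2\frac{k\pi}{N}\Big)^{-1} \]
in the interval $0<\gamma\le 1$. Note that $f(\gamma)$ is increasing in $0<\gamma\le 1$ and $f(1)$ can be estimated as follows. Using that $N$ is assumed to be odd and keeping only the two terms $k = \frac{N-1}{2}$ and $k = \frac{N+1}{2}$ of the sum, one has
\begin{align*}
f(1) &= 1 - 2\sum_{k=1}^{N-1}\frac{\sin^2\frac{k\pi}{N}}{1+\sin^2\frac{k\pi}{N}} \le 1 - 4\,\frac{\sin^2\frac{(N-1)\pi}{2N}}{1+\sin^2\frac{(N-1)\pi}{2N}} \\
&= 1 - 4\,\frac{\cos^2\frac{\pi}{2N}}{1+\cos^2\frac{\pi}{2N}} = -3 + \frac{4}{1+\cos^2\frac{\pi}{2N}}.
\end{align*}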


As for $N\ge 3$,
\[ -3 + \frac{4}{1+\cos^2\frac{\pi}{2N}} \le -3 + \frac{4}{1+\cos^2\frac{\pi}{6}} = -\frac57, \]

we conclude that $f(1)<0$. Hence we have shown that $f(\gamma)<0$ for $0<\gamma\le 1$, and therefore $P_{\alpha,\beta}$ is regular for $\beta\le 0$ by Lemma 6.1. Hence we have proved that $0<\beta_1(\alpha)$.

By the same method we prove that $\beta_2(\alpha)>2\alpha^2$, or equivalently, since we have already shown that $\beta_2(\alpha)>\alpha^2$, that $Q_{\alpha,\beta}$ is regular for any $\alpha^2<\beta\le 2\alpha^2$. We decompose $Q_{\alpha,\beta}$ as in (63) and (64), and according to the definition (65) of $\gamma(\alpha,\beta)$, $\alpha^2<\beta\le 2\alpha^2$ corresponds to $\gamma(\alpha,\beta)\le -1$. For such $\gamma$'s, we have $1+\gamma/\sin^2\frac{k\pi}{N}<0$, and hence $-\frac12\big(1+\gamma/\sin^2\frac{k\pi}{N}\big)>0$ for any $1\le k\le N-1$. Moreover, it also follows that $\big(1+\gamma/\sin^2\frac{k\pi}{N}\big)^{-1}<0$, which allows us to conclude that
\[ f(\gamma) = 1 - 2\sum_{k=1}^{N-1}\Big(1+\gamma\Big/\sin^2\frac{k\pi}{N}\Big)^{-1} > 1 > 0. \]

According to Lemma 6.1, this proves the regularity of $P_{\alpha,\beta}$ and hence of $Q_{\alpha,\beta}$ for $\alpha^2<\beta\le 2\alpha^2$. Finally introduce $\mu_k := -\frac12\big(1+\gamma(\alpha,\beta)/\sin^2\frac{k\pi}{N}\big)$ and note that for $\beta$ with $\gamma(\alpha,\beta) = -\sin^2\frac{k_0\pi}{N}$ for some $1\le k_0\le\frac{N-1}{2}$ one has $\mu_{k_0} = \mu_{N-k_0} = 0$. As $k_0\ne N-k_0$ if $1\le k_0\le(N-1)/2$, it then follows that $P_{\alpha,\beta}$ has two equal rows and is therefore singular. Note that $\gamma(\alpha,\beta) = -\sin^2\frac{k_0\pi}{N}$ corresponds to $\beta = \alpha^2\big(1+\sin^{-2}\frac{k_0\pi}{N}\big)$ and we have proved that $\beta\mapsto\det(Q_{\alpha,\beta})$ has at least $(N-1)/2$ different zeroes in the interval $(\alpha^2,\infty)$. The statement about $\mathrm{index}(Q_{\alpha,\beta})$ easily follows from the above analysis.

Proof of Theorem 1.3. Part (i) is proved by Proposition 6.4, whereas (ii) follows from Proposition 6.2 and Lemma 6.3.

A. Proof of Lemma 4.3

For the convenience of the reader, we provide a detailed proof of Lemma 4.3 in this appendix. This lemma and its proof are due to Beukers and Rink - see ([13], Appendix A). Recall that $K_4\setminus K_4^N\subseteq\mathbb{Z}^4$ denotes the subset of quadruples $(k_1,k_2,k_3,k_4)$ satisfying $1\le|k_i|\le N-1$ ($1\le i\le 4$) and $k_1+k_2+k_3+k_4\equiv 0 \bmod N$ so that there are no integers $l,m$ with $\{l,m,-l,-m\} = \{k_1,k_2,k_3,k_4\}$, and $K_4^{res} := K_+^{res}\cup K_-^{res}\subseteq K_4$, where
\[ K_\pm^{res} := \Big\{(k_1,k_2,k_3,k_4)\in K_4 \ \Big|\ \exists\, l\in\mathbb{N}: 1\le l\le\frac N4 \text{ so that } \{k_1,k_2,k_3,k_4\} = \Big\{\pm l,\ \pm l\mp N,\ \frac N2\mp l,\ -\frac N2\mp l\Big\}\Big\}. \]
Note that $K_4^{res}=\emptyset$ if $N$ is odd. Let us restate Lemma 4.3 as follows:


Lemma A.1. ([13]). Let $(k_1,k_2,k_3,k_4)$ be an element of $K_4\setminus K_4^N$. Then $(k_1,k_2,k_3,k_4)\in K_4^{res}$ if and only if
\[ \sin\frac{k_1\pi}{N} + \sin\frac{k_2\pi}{N} + \sin\frac{k_3\pi}{N} + \sin\frac{k_4\pi}{N} = 0. \]
Let us make a few preparations for the proof of Lemma A.1. By a straightforward computation one sees that the “only if”-part of the claimed equivalence holds:

Lemma A.2. For any $(k_1,k_2,k_3,k_4)\in K_4^{res}$, one has $\sum_{i=1}^4\sin\frac{k_i\pi}{N} = 0$.

So it remains to prove the converse. First we consider some special cases.

Lemma A.3. Let $(k_1,k_2,k_3,k_4)\in K_4\setminus(K_4^N\cup K_4^{res})$. If there exist $l,m,n\in\mathbb{Z}$ such that
(i) $\{k_1,k_2,k_3,k_4\} = \{l,-l,m,n\}$, or
(ii) $\{k_1,k_2,k_3,k_4\} = \{l,N-l,m,n\}$ with $1\le l\le N-1$, or
(iii) $\{k_1,k_2,k_3,k_4\} = \{l,-N-l,m,n\}$ with $-(N-1)\le l\le -1$,
then
\[ \sum_{i=1}^4\sin\frac{k_i\pi}{N}\ne 0. \]
Proof. In case (i), it follows that $m+n=N$ (and thus $1\le m,n\le N-1$) or $m+n=-N$ (and thus $-(N-1)\le m,n\le -1$). Hence in both cases, $\sin\frac{m\pi}{N}$ and $\sin\frac{n\pi}{N}$ have the same sign and $\sum_{i=1}^4\sin\frac{k_i\pi}{N} = \sin\frac{m\pi}{N}+\sin\frac{n\pi}{N}\ne 0$. In the case (ii), by assumption, $m+n\equiv 0 \bmod N$. The case $m+n=0$ has already been treated under (i). If $m+n=N$, then $\sin\frac{k_i\pi}{N}>0$ for any $1\le i\le 4$. If $m+n=-N$, then $m<0$, and $m\notin\{-l,-N+l\}$. Thus $n=-N-m<0$, and therefore $\sum_{i=1}^4\sin\frac{k_i\pi}{N} = 2\sin\frac{l\pi}{N} - 2\sin\frac{(-m)\pi}{N}\ne 0$. The case (iii) is treated similarly as (ii).

Another special case is treated in the following lemma.

Lemma A.4. Assume that $(k_1,k_2,k_3,k_4)\in K_4\setminus K_4^N$ satisfies
\[ k_i+k_j\not\equiv 0 \bmod N \quad \forall\, 1\le i,j\le 4. \tag{66} \]
If there exist $l,n\in\{k_1,k_2,k_3,k_4\}$ with
\[ \sin\frac{l\pi}{N}+\sin\frac{n\pi}{N} = 0, \tag{67} \]
then
\[ \sum_{i=1}^4\sin\frac{k_i\pi}{N} = 0 \tag{68} \]
implies that $(k_1,k_2,k_3,k_4)\in K_4^{res}$.


Proof. From the assumptions (66)–(67) it follows that there exists $1\le l\le N-1$ so that $\{k_1,k_2,k_3,k_4\} = \{l,-N+l,m,n\}$ for some $m,n\in\mathbb{Z}$. Then $\sin\frac{l\pi}{N}+\sin\frac{(-N+l)\pi}{N} = 0$, and hence by (68), $\sin\frac{m\pi}{N}+\sin\frac{n\pi}{N} = 0$. W.l.o.g. assume that $1\le m\le N-1$. Then either $n=-m$ or $n=-N+m$. If $n=-m$, then $(k_1,k_2,k_3,k_4)\in K_4^{res}$ by Lemma A.3 (i). If $n=-N+m$, then one has
\[ \sum_{i=1}^4 k_i = 2l-N+2m-N = 2(l+m)-2N. \]
Note that $2(l+m)-2N$ cannot be an even multiple of $N$, as otherwise $l+m\equiv 0 \bmod N$, violating (66). If, in addition, $N$ is odd, then $2(l+m)-2N$ cannot be an odd multiple of $N$. Hence in the case $N$ is odd we conclude that $\sum_{i=1}^4 k_i\not\equiv 0 \bmod N$, contradicting the assumption $(k_1,k_2,k_3,k_4)\in K_4$.

If $N$ is even, it is however possible that $2(l+m)-2N$ equals $\pm N$: If $2(l+m)-2N = N$, i.e. $l+m = \frac32N$, it follows that $\frac N2<l,m\le N-1$, and $(k_1,k_2,k_3,k_4)\in K_-^{res}$ as $\{k_1,k_2,k_3,k_4\} = \{-l',-l'+N,\frac N2+l',-\frac N2+l'\}$ with $l' = l-\frac N2$. If $2(l+m)-2N = -N$, i.e. $l+m = \frac N2$, it follows similarly that $(k_1,k_2,k_3,k_4)\in K_+^{res}$ as $\{k_1,k_2,k_3,k_4\} = \{l,l-N,\frac N2-l,-\frac N2-l\}$. So in both cases, we conclude that $(k_1,k_2,k_3,k_4)\in K_4^{res}$.

In view of Lemma A.3 and Lemma A.4 in order to prove Lemma A.1 it remains to show the following

Lemma A.5. Assume that $(k_1,k_2,k_3,k_4)\in K_4$ satisfies (66). If for any $1\le i,j\le 4$,
\[ \sin\frac{k_i\pi}{N}+\sin\frac{k_j\pi}{N}\ne 0, \tag{69} \]
(and thus $(k_1,k_2,k_3,k_4)\notin K_4^N\cup K_4^{res}$), then
\[ \sum_{i=1}^4\sin\frac{k_i\pi}{N}\ne 0. \]

To prove Lemma A.5 let us first rewrite (68), using Euler's formula for the sine function,
\[ \sum_{1\le|j|\le 4}\zeta_j = 0, \tag{70} \]
where $\zeta_{\pm j} = \pm e^{\pm ik_j\pi/N}$ are $2N$th roots of unity. Note that for any quadruple $(k_1,k_2,k_3,k_4)\in K_4\setminus K_4^N$ satisfying (69) one has
\[ \zeta_j+\zeta_{j'}\ne 0 \quad \forall\, 1\le|j|\le|j'|\le 4. \]
Indeed for any $1\le|j|\le|j'|\le 4$ one has $\mathrm{Im}\,\zeta_j+\mathrm{Im}\,\zeta_{j'} = \sin\frac{k_{|j|}\pi}{N}+\sin\frac{k_{|j'|}\pi}{N}$ which does not vanish by assumption (69). Let us first discuss Eq. (70) and its solutions in general, i.e. we consider the equation
\[ \zeta_1+\dots+\zeta_8 = 0 \tag{71} \]
and want to study its solutions, $(\zeta_l)_{1\le l\le 8}$, on the unit circle $S^1 := \{z\in\mathbb{C} \mid |z| = 1\}$.


We need an auxiliary result which we discuss first. Let n ≥ 2 be arbitrary and assume that the sequence (ζi )1≤i≤n ⊆ S 1 has no vanishing subsums (i.e. l∈J ζl = 0 for any ∅ = J {1, . . . , n}) and satisfies the equation n

ζi = 0.

(72)

i=1

Let $M \in \mathbb{N}$ be the smallest integer with the property that $(\zeta_i/\zeta_j)^M = 1$ for all $1 \le i, j \le n$. Then there exists $\xi \in S^1$ so that $\zeta_i^M = \xi^M$ for any $1 \le i \le n$. W.l.o.g. we can assume that $\xi = 1$. Furthermore, let $p^k$ be a prime power dividing $M$ so that $M/p^k$ and $p$ are relatively prime and define

\[ M' := M/p \quad\text{and}\quad \eta := e^{2\pi i/p^k}. \tag{73} \]

Then for any $1 \le l \le n$ there exists a unique integer $0 \le \mu(l) \le p-1$ such that $\zeta_l = \tilde\zeta_l \cdot \eta^{\mu(l)}$, where $\tilde\zeta_l$ is an element of the field $K := \mathbb{Q}(e^{2\pi i/M'})$. (As $\zeta_l^M = 1$ there exists $0 \le r_l \le M-1$ with $\zeta_l = e^{\frac{2\pi i}{M} r_l}$. If $r_l \equiv 0 \bmod p$ choose $\mu(l) = 0$. If $r_l \not\equiv 0 \bmod p$ choose $1 \le \mu(l) \le p-1$ so that $r_l \equiv \frac{M}{p^k}\mu(l) \bmod p$.) Hence (72) can be written as

\[ 0 = \sum_{l=1}^{n} \zeta_l = \sum_{s=0}^{p-1} \Bigl( \sum_{l\in\mu^{-1}(s)} \zeta_l \Bigr) = \sum_{s=0}^{p-1} \Bigl( \sum_{l\in\mu^{-1}(s)} \tilde\zeta_l \Bigr) \eta^{s}. \tag{74} \]

We need the following algebraic fact (see e.g. [17], §60–61):

Proposition A.6. The minimal polynomial of $\eta = e^{2\pi i/p^k}$ over the field $K = \mathbb{Q}(e^{2\pi i/M'})$ is given by $X^p - \eta^p$ if $k \ge 2$ and $X^{p-1} + X^{p-2} + \cdots + X + 1$ if $k = 1$.

We now claim that $M$ is square-free, or equivalently that for any prime power $p^k$ dividing $M$,

\[ k = 1. \tag{75} \]

Indeed, Eq. (74) shows that the minimal polynomial of $\eta$ over $K$ has degree at most $p-1$, which by Proposition A.6 is only satisfied in the case $k = 1$. Further we claim that there exists $\sigma \in \mathbb{C}\setminus\{0\}$ so that

\[ \sum_{l\in\mu^{-1}(s)} \tilde\zeta_l = \sigma \quad \forall\, 0 \le s \le p-1. \tag{76} \]

The existence of such a $\sigma$ follows from Proposition A.6: As $k = 1$ by (75), the minimal polynomial of $\eta$ over $K$ is given by $X^{p-1} + X^{p-2} + \cdots + X + 1$. Since this is a polynomial of degree $p-1$, the polynomial on the right-hand side of (74) must be a scalar multiple of the minimal polynomial. Hence all the coefficients $\sum_{l\in\mu^{-1}(s)} \tilde\zeta_l$ have the same value $\sigma \in \mathbb{C}$. As $\sum_{l\in\mu^{-1}(s)} \zeta_l = \sigma\eta^s$, the additional property $\sigma \neq 0$ follows from the assumption that there are no vanishing subsums. Hence we can assume w.l.o.g. that $\sigma = 1$. Next we claim that

\[ p \le n. \tag{77} \]

Results on Normal Forms for FPU Chains


In other words, possible prime factors of $M$ are bounded by the number of summands in (72). To prove (77), note that it follows from (76) that for any $0 \le s \le p-1$ there exists $1 \le l \le n$ such that $\mu(l) = s$, i.e. the map $\mu : \{1,\dots,n\} \to \{0,\dots,p-1\}$ is onto. This establishes (77). The map $\mu$ induces the partition $(\#\mu^{-1}(s))_{0\le s\le p-1}$ of the positive integer $n$ into $p$ summands,

\[ n = \sum_{s=0}^{p-1} \#\mu^{-1}(s). \tag{78} \]
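The construction of the map $\mu$ and of the partition (78) can be illustrated on two concrete families of eighth-order sums. The following is a minimal sketch, not part of the original argument: it assumes $k = 1$ (so that $M/p$ is invertible mod $p$), takes $\xi = 1$, and represents each root of unity $\zeta_l = e^{2\pi i r_l/M}$ by its exponent $r_l$. The function names `mu` and `partition_of_n` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter


def mu(r: int, M: int, p: int) -> int:
    """Index mu(l) for zeta_l = exp(2*pi*i*r/M), assuming p divides M exactly once.

    Solves r = (M/p) * mu (mod p); M/p is invertible mod p because k = 1.
    For r divisible by p this returns 0, as in the text.
    """
    cofactor = (M // p) % p
    return (r * pow(cofactor, -1, p)) % p


def partition_of_n(exponents, M, p):
    """Sizes of the fibres mu^{-1}(s), s = 0..p-1, i.e. the summands in (78)."""
    counts = Counter(mu(r, M, p) for r in exponents)
    return [counts.get(s, 0) for s in range(p)]


# Exponents mod 42 of (-alpha, -alpha^2, gamma, ..., gamma^6),
# with alpha = exp(2*pi*i/3) and gamma = exp(2*pi*i/7).
exponents_p7 = [35, 7, 6, 12, 18, 24, 30, 36]

# Exponents mod 30 of (-alpha, -alpha^2, -alpha*beta, -alpha^2*beta,
# -alpha*beta^2, -alpha^2*beta^2, beta^3, beta^4), with beta = exp(2*pi*i/5).
exponents_p5 = [25, 5, 1, 11, 7, 17, 18, 24]

print(partition_of_n(exponents_p7, M=42, p=7))
print(partition_of_n(exponents_p5, M=30, p=5))
```

In both cases every fibre of $\mu$ is non-empty, which is the surjectivity used to prove (77).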

Lemma A.7. ([13], Appendix A). For any solution $\{\zeta_1,\dots,\zeta_8\}$ of (71) contained in $S^1$ without vanishing subsums there exists $\xi \in S^1$ such that either

\[ \{\zeta_1,\dots,\zeta_8\} = \{-\xi\alpha, -\xi\alpha^2\} \cup \{\xi\gamma^j \mid 1 \le j \le 6\} \tag{79} \]

or

\[ \{\zeta_1,\dots,\zeta_8\} = \{-\xi\alpha^l, -\xi\alpha^l\cdot\beta^i, -\xi\alpha^l\cdot\beta^j \mid 1 \le l \le 2\} \cup \{\xi\beta^k, \xi\beta^m\}, \tag{80} \]

where the quadruple $(i,j,k,m)$ is a permutation of $(1,2,3,4)$ and $\alpha := e^{2\pi i/3}$, $\beta := e^{2\pi i/5}$, $\gamma := e^{2\pi i/7}$.

Proof. By a straightforward computation one verifies that the sets of the form (79) or (80) satisfy (71). It remains to prove that these are the only solutions of (71) of this type. We classify the solutions of (71) according to the possible values of $p$, which we now assume to be the largest prime dividing $M$. Since $n = 8$, by (77), the possible values of $p$ are 2, 3, 5, and 7.

If $p = 2$, then, by (75), $M = 2$ and therefore there exists $\xi \in S^1$ so that $\zeta_j = \pm\xi$ for any $1 \le j \le n$. In this case there exists a solution of (72) without vanishing subsums only if $n = 2$. (In this case, they are given by $\{\zeta_1,\zeta_2\} = \xi\{1,-1\}$ with $\xi \in S^1$.)

If $p = 3$, then $M = 3$ or $M = 3\cdot 2$, and there exists a solution of (72) without vanishing subsums only if $n = 3$. (In this case, they are given by $\{\zeta_1,\zeta_2,\zeta_3\} = \xi\{1,\alpha,\alpha^2\}$ with $\xi \in S^1$.)

If $p = 5$, then $\eta = \beta$ in (73). Up to permutations, there are the following three partitions of 8 into 5 summands: $(4,1,1,1,1)$, $(3,2,1,1,1)$, and $(2,2,2,1,1)$. In a straightforward way one shows that the partitions $(4,1,1,1,1)$ and $(3,2,1,1,1)$ and their permutations give rise to solutions of Eq. (71) with vanishing subsums. E.g. the solutions corresponding to $(4,1,1,1,1)$ are given by $\xi\cdot(-\beta,-\beta^2,-\beta^3,-\beta^4,\beta,\beta^2,\beta^3,\beta^4)$ with $\xi \in S^1$, whereas the solutions corresponding to $(3,2,1,1,1)$ are $\xi\cdot(-i,1,i,-\alpha\beta,-\alpha^2\beta,\beta^2,\beta^3,\beta^4)$ with $\xi \in S^1$. On the other hand the partition $(2,2,2,1,1)$ leads to the solutions $(\zeta_1,\dots,\zeta_8) = \xi(-\alpha,-\alpha^2,-\alpha\beta,-\alpha^2\beta,-\alpha\beta^2,-\alpha^2\beta^2,\beta^3,\beta^4)$ with $\xi \in S^1$. They are the solutions (80) with $(i,j,k,m) = (1,2,3,4)$. Permutations of the partition $(2,2,2,1,1)$ again lead to solutions of the type (80), but with $(i,j,k,m)$ given by a permutation of $(1,2,3,4)$.

If $p = 7$, then $\eta = \gamma$ in (73). Then, up to permutations, $(2,1,1,1,1,1,1)$ is the only possible partition of 8 into 7 summands. The partition $(2,1,1,1,1,1,1)$ leads to the solutions $(\zeta_1,\dots,\zeta_8) = \xi(-\alpha,-\alpha^2,\gamma,\dots,\gamma^6)$ with $\xi \in S^1$, where we used that $1 = -\alpha-\alpha^2$. They are of type (79). Any permutation of $(2,1,1,1,1,1,1)$ leads to the same kind of solutions.
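The "straightforward computation" at the start of the proof can be reproduced numerically. The sketch below is only a floating-point sanity check under the normalization $\xi = 1$, not a proof: it builds the sets (79) and (80), confirms that they sum to zero, and searches all non-empty proper subsets for a vanishing subsum. The helper names and the tolerance `1e-9` are choices made for this illustration.

```python
import cmath
from itertools import combinations


def root(num, den):
    """The root of unity exp(2*pi*i*num/den)."""
    return cmath.exp(2j * cmath.pi * num / den)


alpha, beta, gamma = root(1, 3), root(1, 5), root(1, 7)


def type_79():
    """The set (79) with xi = 1."""
    return [-alpha, -alpha**2] + [gamma**j for j in range(1, 7)]


def type_80(i=1, j=2, k=3, m=4):
    """The set (80) with xi = 1; (i, j, k, m) is a permutation of (1, 2, 3, 4)."""
    six = [-alpha**l * beta**e for l in (1, 2) for e in (0, i, j)]
    return six + [beta**k, beta**m]


def has_vanishing_proper_subsum(zs, tol=1e-9):
    """True if some non-empty proper subset of zs sums to (numerically) zero."""
    n = len(zs)
    return any(abs(sum(c)) < tol
               for size in range(1, n)
               for c in combinations(zs, size))


for name, zs in (("(79)", type_79()), ("(80)", type_80())):
    print(name, "sums to zero:", abs(sum(zs)) < 1e-12,
          "| vanishing proper subsum:", has_vanishing_proper_subsum(zs))
```

Each set has only 254 non-empty proper subsets, so exhaustive enumeration is cheap here.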


Lemma A.8. ([13], Appendix A). For any solution $\{\zeta_1,\dots,\zeta_8\}$ of (71) contained in $S^1$ without vanishing subsums of length 2 but having a vanishing subsum of length 3, 4, or 5, there exist $\xi, \xi' \in S^1$ such that

\[ \{\zeta_1,\dots,\zeta_8\} = \{\xi\alpha^l \mid 0 \le l \le 2\} \cup \{\xi'\beta^m \mid 0 \le m \le 4\}, \tag{81} \]

where again $\alpha = e^{2\pi i/3}$ and $\beta = e^{2\pi i/5}$.

Proof. Again, one verifies by a direct computation that the solutions (81) of (71) have the desired properties. It remains to prove that they are the only ones. First note that under the hypotheses of the lemma, vanishing subsums of length 4 cannot occur, since the latter ones would imply the existence of vanishing subsums of length 2, which by assumption is excluded. Hence, in order to find solutions of (72) for $n = 8$ with the desired properties, we have to find all solutions of (72) without vanishing subsums for $n = 3$ and $n = 5$. Note that by (77), $p = n$ for $n = 3$ or $n = 5$. By the considerations in the proof of Lemma A.7, the former ones are given by $(\zeta_1,\zeta_2,\zeta_3) = \xi(1,\alpha,\alpha^2)$ and the latter ones by $(\zeta_1,\dots,\zeta_5) = \xi'(1,\beta,\beta^2,\beta^3,\beta^4)$ with $\xi, \xi' \in S^1$. This proves the lemma.

We are now ready to prove Lemma A.5.

Proof of Lemma A.5. We first select from (79), (80) and (81) all the solutions $(\zeta_1,\dots,\zeta_8)$ of (71) which are of the form (68) (after multiplication by $2i$). This amounts to selecting the solutions $(\zeta_1,\dots,\zeta_8)$ of (71) having the property that $\{\zeta_1,\dots,\zeta_8\}$ is invariant under the map $\zeta \mapsto -\zeta^{-1}$. It requires to choose $\xi$ and $\xi'$ in (79), (80), and (81) appropriately. Let us explain this procedure in detail for the solutions of type (79). First we rewrite the solution (79),

\[ (\zeta_1,\dots,\zeta_8) = \xi\cdot(-\alpha,-\alpha^2,\gamma,\gamma^2,\gamma^3,\gamma^4,\gamma^5,\gamma^6) = \Bigl( e^{\frac{2\pi i t_k}{42}}\, e^{\frac{2\pi i x}{42}} \Bigr)_{1\le k\le 8}, \]

where $\xi = e^{2\pi i x/42}$ with $x \in \mathbb{R}/42\mathbb{Z}$ and

\[ (t_1,\dots,t_8) = (6, 7, 12, 18, 24, 30, 35, 36). \tag{82} \]

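The required invariance of the set of the $\zeta_k$'s under the map $\zeta \mapsto -\zeta^{-1}$ is equivalent to the invariance of the set of the $(t_k + x)$'s under the map $t \mapsto 21 - t \pmod{42}$. Since the set (82) of the $t_k$'s is invariant under the map $t \mapsto -t \pmod{42}$, $\{t_k + x \mid 1 \le k \le 8\}$ is invariant under $t \mapsto 21 - t \pmod{42}$ if we choose $x := \frac{21}{2}$ or $\xi = i$. Then the equation $\sum_{i=1}^{8}\zeta_i = 0$ reads

\[ e^{\frac{11\pi i}{14}} + e^{\frac{5\pi i}{6}} + e^{\frac{15\pi i}{14}} + e^{\frac{19\pi i}{14}} + e^{\frac{23\pi i}{14}} + e^{\frac{27\pi i}{14}} + e^{\frac{\pi i}{6}} + e^{\frac{3\pi i}{14}} = 0, \]

or $\sin\frac{\pi}{6} + \sin\frac{3\pi}{14} + \sin\frac{15\pi}{14} + \sin\frac{19\pi}{14} = 0$. Choosing all arguments in $(0,\pi)$, the latter identity reads

\[ \sin\frac{\pi}{6} + \sin\frac{3\pi}{14} - \sin\frac{5\pi}{14} - \sin\frac{\pi}{14} = 0. \tag{83} \]

For the solutions of type (80), one gets

\[ \sin\frac{\pi}{6} + \sin\frac{13\pi}{30} - \sin\frac{7\pi}{30} - \sin\frac{3\pi}{10} = 0 \tag{84} \]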


and

\[ \sin\frac{\pi}{6} + \sin\frac{\pi}{30} - \sin\frac{11\pi}{30} + \sin\frac{\pi}{10} = 0. \tag{85} \]

Let us briefly explain how (84)–(85) can be obtained. Note that from the 24 permutations of $(1,2,3,4)$ in (80), there are only six which lead to different sets of the $\zeta_i$'s, since interchanging $i$ and $j$ or $k$ and $m$ leaves the set on the right-hand side of (80) invariant. In the resulting six different cases, we again write $\{\zeta_1,\dots,\zeta_8\} = \xi\cdot\{e^{2\pi i\cdot\frac{t_1}{30}},\dots,e^{2\pi i\cdot\frac{t_8}{30}}\}$ with $t_i$ in $\mathbb{R}/30\mathbb{Z}$. Then, up to translations, there are only two different types of solutions emerging from these six cases. With the appropriate choices of $\xi$, one gets the solutions (84) and (85). Finally, for the solutions of type (81), one gets

\[ \sin\frac{\pi}{2} - \sin\frac{\pi}{6} + \sin\frac{\pi}{10} - \sin\frac{3\pi}{10} = 0. \tag{86} \]
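The four trigonometric identities (83)–(86) and the symmetry of the exponent set (82) are easy to confirm in floating point. The snippet below is a minimal numerical check, not a derivation; the second identity of type (80) is entered with $\sin\frac{11\pi}{30}$ carrying the minus sign, which is the sign pattern for which the sum vanishes.

```python
from math import sin, pi

# Left-hand sides of the identities (83)-(86); each should vanish.
identities = {
    "(83)": sin(pi / 6) + sin(3 * pi / 14) - sin(5 * pi / 14) - sin(pi / 14),
    "(84)": sin(pi / 6) + sin(13 * pi / 30) - sin(7 * pi / 30) - sin(3 * pi / 10),
    "(85)": sin(pi / 6) + sin(pi / 30) - sin(11 * pi / 30) + sin(pi / 10),
    "(86)": sin(pi / 2) - sin(pi / 6) + sin(pi / 10) - sin(3 * pi / 10),
}

# The exponent set (82) and its invariance under t -> -t (mod 42).
t_set = {6, 7, 12, 18, 24, 30, 35, 36}
t_set_is_symmetric = {(-t) % 42 for t in t_set} == t_set

for label, value in identities.items():
    print(label, f"{value:.2e}")
print("set (82) invariant under t -> -t mod 42:", t_set_is_symmetric)
```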

The procedure to obtain (86) is basically the same as in the preceding cases. We write (81) as $\{\zeta_1,\dots,\zeta_8\} = \xi\cdot\{\alpha^l, \lambda\cdot\beta^m \mid 0 \le l \le 2,\ 0 \le m \le 4\}$ and first choose $\lambda \in S^1$ so that the set $\{\alpha^l, \lambda\cdot\beta^m \mid 0 \le l \le 2,\ 0 \le m \le 4\}$ is symmetric with respect to some axis through the origin, and then choose $\xi$ so that this axis is the imaginary axis.

To finish the proof of Lemma A.5 we show that all the solutions $(k_1,k_2,k_3,k_4)$ of $\sum_{i=1}^{4}\sin\frac{k_i\pi}{N} = 0$ obtained in (83)–(86) and the additional ones obtained by replacing $0 < x < \pi$ in $\sin x$ by $\pi - x$ satisfy $\sum_{i=1}^{4} k_i \not\equiv 0 \bmod N$ and hence are not in $K_4$. For the solutions obtained in (83)–(86), $N$ is even. Hence if $N$ is odd, then there is no quadruple $(k_1,k_2,k_3,k_4) \in K_4$ such that (68) and (69) are satisfied. This finishes the proof of Lemma A.5 in this case. For the rest of the proof, we assume that $N$ is even. If $N = 42r$ for some $r \in \mathbb{N}$, (83) becomes

\[ \sin\frac{7r\pi}{42r} + \sin\frac{9r\pi}{42r} + \sin\frac{(-3r)\pi}{42r} + \sin\frac{(-15r)\pi}{42r} = 0, \]

and we have $7r + 9r - 3r - 15r = -2r \not\equiv 0 \bmod 42r$. Hence the corresponding quadruple $(k_1,k_2,k_3,k_4)$ is not in $K_4$. For the quadruples obtained by replacing $0 < x < \pi$ in $\sin x$ by $\pi - x$ in some of the summands in (83), the condition $\sum_{i=1}^{4} k_i \not\equiv 0 \bmod 42r$ amounts to

\[ \pm 7 \pm 9 \pm 3 \pm 15 \not\equiv 0 \bmod 42 \tag{87} \]

for any combination of plus and minus signs. The relations (87) are easily verified. Similarly, one verifies that the quadruples $(k_1,k_2,k_3,k_4)$ satisfying (84), (85), or (86) are not in $K_4$ by showing that

\[ \pm 5 \pm 13 \pm 7 \pm 9 \not\equiv 0, \quad \pm 5 \pm 1 \pm 11 \pm 3 \not\equiv 0, \quad \pm 15 \pm 5 \pm 3 \pm 9 \not\equiv 0 \bmod 30, \tag{88} \]

again for any combination of plus and minus signs. Hence we have shown that none of the solutions $(k_1,k_2,k_3,k_4)$ of (68) is an element of $K_4$. This completes the proof of Lemma A.5.

Proof of Lemma A.1. The claimed statement follows from Lemma A.2, A.3, A.4, and A.5.
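The relations (87) and (88) involve only sixteen sign patterns each, so they can be checked exhaustively. The following minimal sketch does this; the helper name `some_signed_sum_vanishes` is introduced here for illustration only.

```python
from itertools import product


def some_signed_sum_vanishes(values, modulus):
    """True if some choice of signs makes the signed sum divisible by modulus."""
    return any(
        sum(sign * v for sign, v in zip(signs, values)) % modulus == 0
        for signs in product((1, -1), repeat=len(values))
    )


# (values, modulus) for the quadruples coming from (83), (84), (85), (86).
cases = {
    "(83)": ((7, 9, 3, 15), 42),
    "(84)": ((5, 13, 7, 9), 30),
    "(85)": ((5, 1, 11, 3), 30),
    "(86)": ((15, 5, 3, 9), 30),
}

for label, (values, modulus) in cases.items():
    print(label, "some signed sum vanishes mod", modulus, ":",
          some_signed_sum_vanishes(values, modulus))
```

In every case the four numbers are odd, so each signed sum is even, and the check reduces to ruling out the values $0$ and $\pm$ the modulus.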


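B. Details of Section 3

We begin by expressing the FPU Hamiltonian $H_V$ in relative coordinates. Introduce $(v = (v_j)_{1\le j\le N-1}, v_N) \in \mathbb{R}^N$ given by (11). Then $(v, v_N) = Mq$ is the linear change of the coordinates $q_1,\dots,q_N$, where $M$ is given by

\[ M = \begin{pmatrix} -1 & 1 & 0 & \dots & 0 \\ 0 & \ddots & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \dots & 0 & -1 & 1 \\ N^{-1} & \dots & \dots & \dots & N^{-1} \end{pmatrix}. \]

The variables $(u = (u_j)_{1\le j\le N-1}, u_N) \in \mathbb{R}^N$ conjugate to $(v, v_N)$ are then given by $(M^T)^{-1}p$. The inverse of $M^T$, $(M^T)^{-1}$, can be computed to be

\[ (M^T)^{-1} = \frac{1}{N}\begin{pmatrix} 1 & \dots & \dots & 1 \\ 2 & \dots & \dots & 2 \\ \vdots & & & \vdots \\ N & \dots & \dots & N \end{pmatrix} - \begin{pmatrix} 1 & 0 & \dots & \dots & 0 \\ 1 & 1 & 0 & \dots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 1 & \dots & \dots & 1 & 0 \\ 0 & \dots & \dots & \dots & 0 \end{pmatrix}. \tag{89} \]

Note that by (89), $u_k = kP - \sum_{j=1}^{k} p_j$ for any $1 \le k \le N-1$ and $u_N = NP$. Hence

\[ p_1 = -u_1 + P; \qquad p_N = u_{N-1} + P; \qquad p_k = (u_{k-1} - u_k) + P \quad (2 \le k \le N-1), \]

and thus

\[ \frac{1}{2}\sum_{j=1}^{N} p_j^2 = \frac{NP^2}{2} + \frac{1}{2}\Bigl( u_1^2 + (u_1 - u_2)^2 + \cdots + (u_{N-2} - u_{N-1})^2 + u_{N-1}^2 \Bigr). \]

Moreover, using that $q_{N+1} - q_N = q_1 - q_N = -\sum_{k=1}^{N-1}(q_{k+1} - q_k)$ one gets for any $s \in \mathbb{Z}_{\ge 1}$,

\[ \sum_{j=1}^{N} (q_{j+1} - q_j)^s = \sum_{k=1}^{N-1} v_k^s + (-1)^s \Bigl( \sum_{k=1}^{N-1} v_k \Bigr)^{s}. \]

Combining the two expressions displayed above yields $H_V = \frac{NP^2}{2} + \tilde H_V$, where $\tilde H_V$ only depends on $(v,u)$ and is given by

\[ \begin{aligned} \tilde H_V ={}& \frac{1}{2}\Bigl( u_1^2 + \sum_{l=1}^{N-2} (u_{l+1} - u_l)^2 + u_{N-1}^2 \Bigr) + \frac{1}{2}\Bigl( \sum_{k=1}^{N-1} v_k^2 + \Bigl(\sum_{k=1}^{N-1} v_k\Bigr)^{2} \Bigr) \\ &+ \frac{\alpha}{3!}\Bigl( \sum_{k=1}^{N-1} v_k^3 - \Bigl(\sum_{k=1}^{N-1} v_k\Bigr)^{3} \Bigr) + \frac{\beta}{4!}\Bigl( \sum_{k=1}^{N-1} v_k^4 + \Bigl(\sum_{k=1}^{N-1} v_k\Bigr)^{4} \Bigr) + O(v^5). \end{aligned} \tag{90} \]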
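The inversion formula (89) and the resulting splitting of the kinetic energy can be verified exactly with rational arithmetic. The sketch below is a check under stated assumptions, not the authors' computation: it reads the definition (11), which is not reproduced in this appendix, as $v_j = q_{j+1} - q_j$ for $1 \le j \le N-1$ and $v_N = \frac1N\sum_j q_j$ (this is what the displayed matrix $M$ encodes), and it takes $P = \frac1N\sum_j p_j$. All function names are hypothetical.

```python
from fractions import Fraction


def build_M(N):
    """Rows 1..N-1 encode v_j = q_{j+1} - q_j; row N is the mean of q (assumed)."""
    M = [[Fraction(0)] * N for _ in range(N)]
    for j in range(N - 1):
        M[j][j] = Fraction(-1)
        M[j][j + 1] = Fraction(1)
    M[N - 1] = [Fraction(1, N)] * N
    return M


def claimed_inverse_of_transpose(N):
    """Right-hand side of (89): entry (k, j) is k/N minus 1 if j <= k <= N-1."""
    return [[Fraction(k, N) - (1 if j <= k <= N - 1 else 0)
             for j in range(1, N + 1)]
            for k in range(1, N + 1)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def is_identity(A):
    n = len(A)
    return all(A[i][j] == (1 if i == j else 0)
               for i in range(n) for j in range(n))


def kinetic_energy_split(p):
    """Return (lhs, rhs) of the kinetic-energy identity for integer momenta p."""
    N = len(p)
    X = claimed_inverse_of_transpose(N)
    u = [sum(x * pj for x, pj in zip(row, p)) for row in X]  # u = (M^T)^{-1} p
    P = Fraction(sum(p), N)
    lhs = Fraction(sum(pj * pj for pj in p), 2)
    diffs = sum((u[l] - u[l + 1]) ** 2 for l in range(N - 2))
    rhs = N * P ** 2 / 2 + (u[0] ** 2 + diffs + u[N - 2] ** 2) / 2
    return lhs, rhs


N = 5
print(is_identity(matmul(transpose(build_M(N)), claimed_inverse_of_transpose(N))))
print(kinetic_energy_split([1, -2, 4, 0, 3]))
```

Using `Fraction` avoids any rounding, so equality can be tested exactly rather than up to a tolerance.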


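Note that for any values of $\alpha$ and $\beta$, the point $(v,u) = (0,0)$ is a critical point of the Hamiltonian $\tilde H_V$. To compute the Birkhoff normal form of $\tilde H_V$ up to order 2 near the fixed point $(v,u) = (0,0)$, we take the expansion (90) as a starting point and use the linearization of the Birkhoff map at $(v,u) = (0,0)$ (cf. [7]) to define new coordinates $(\xi_k,\eta_k)_{1\le k\le N-1}$. The following lemma gives an independent proof of the fact that this linear map, defined by (14)–(17), is canonical.

Lemma B.1. The linear transformation $Z \to \mathbb{R}^{2N-2}$, $\zeta \mapsto (v,u)$, as defined by (14)–(17), is a canonical isomorphism.

Proof. First let us show

\[ \{v_l(\zeta), u_m(\zeta)\} = i\,\delta_{lm}, \tag{91} \]
\[ \{v_l(\zeta), v_m(\zeta)\} = 0, \tag{92} \]
\[ \{u_l(\zeta), u_m(\zeta)\} = 0 \tag{93} \]

for any $1 \le l, m \le N-1$. Since $(v,u)$ are canonical coordinates on $\mathbb{R}^{2N-2}$, the proof of (91) amounts to showing that

\[ \sum_{k=1}^{N-1} \Bigl( \frac{\partial v_l}{\partial\zeta_k}\frac{\partial u_m}{\partial\zeta_{-k}} - \frac{\partial v_l}{\partial\zeta_{-k}}\frac{\partial u_m}{\partial\zeta_k} \Bigr) = i\,\delta_{lm} \]

for any $1 \le l, m \le N-1$. It follows from (14)–(17) that for any $1 \le k \le N-1$,

\[ \frac{\partial v_l}{\partial\zeta_k} = \frac{\lambda_k}{\sqrt{N}}\, e^{\pi i(2l-1)k/N}, \qquad \frac{\partial v_l}{\partial\zeta_{-k}} = \frac{\lambda_k}{\sqrt{N}}\, e^{-\pi i(2l-1)k/N}, \]
\[ \frac{\partial u_m}{\partial\zeta_k} = \frac{\lambda_k}{\sqrt{N}} \sum_{j=0}^{m-1} e^{2\pi i jk/N}, \qquad \frac{\partial u_m}{\partial\zeta_{-k}} = \frac{\lambda_k}{\sqrt{N}} \sum_{j=0}^{m-1} e^{-2\pi i jk/N}. \]

Hence

\[ \begin{aligned} \frac{\partial v_l}{\partial\zeta_k}\frac{\partial u_m}{\partial\zeta_{-k}} - \frac{\partial v_l}{\partial\zeta_{-k}}\frac{\partial u_m}{\partial\zeta_k} &= \frac{\lambda_k^2}{N}\Bigl( e^{\pi i(2l-1)k/N} \sum_{j=0}^{m-1} e^{-2\pi i jk/N} - e^{-\pi i(2l-1)k/N} \sum_{j=0}^{m-1} e^{2\pi i jk/N} \Bigr) \\ &= \sum_{j=0}^{m-1} \frac{\lambda_k^2}{N}\Bigl( e^{\frac{\pi i k}{N}(2l-2j-1)} - e^{\frac{\pi i k}{N}(2j-2l+1)} \Bigr) \\ &= \frac{2i}{N} \sum_{j=0}^{m-1} \sin\frac{k\pi}{N}\, \sin\frac{(2(l-j)-1)k\pi}{N} \\ &= \frac{i}{N} \sum_{j=0}^{m-1} \Bigl( \cos\frac{2k\pi(1-(l-j))}{N} - \cos\frac{2k\pi(l-j)}{N} \Bigr), \end{aligned} \]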


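where for the latter identity we used that $2\sin x\sin y = \cos(x-y) - \cos(x+y)$. Taking the sum over $k$ and changing the order of summation then leads to

\[ \begin{aligned} \sum_{k=1}^{N-1} \Bigl( \frac{\partial v_l}{\partial\zeta_k}\frac{\partial u_m}{\partial\zeta_{-k}} - \frac{\partial v_l}{\partial\zeta_{-k}}\frac{\partial u_m}{\partial\zeta_k} \Bigr) &= \frac{i}{N} \sum_{j=0}^{m-1} \sum_{k=1}^{N-1} \Bigl( \cos\frac{2k\pi(1-(l-j))}{N} - \cos\frac{2k\pi(l-j)}{N} \Bigr) \\ &= \frac{i}{N} \sum_{j=0}^{m-1} N(\delta_{l-j,1} - \delta_{l-j,0}) \\ &= i \sum_{j=0}^{m-1} (\delta_{l,j+1} - \delta_{l,j}) = i(\delta_{lm} - \delta_{l0}) = i\,\delta_{lm}, \end{aligned} \]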

as claimed. To prove (92) and (93) one argues in a similar way. From (91)–(93) it immediately follows that the linear map $\zeta \mapsto (v,u)$ is a canonical isomorphism.

We now compute $\tilde H_V$ in terms of the new variables $\zeta$. Write $\tilde H_V$ as $\tilde H_V = H_u + H_v$, where $H_u$ and $H_v$ denote the $u$- and $v$-dependent parts of (90), respectively. We compute $H_u(\zeta)$ and $H_v(\zeta)$ separately. To obtain $H_u(\zeta)$, we substitute (14)–(16) into the expression $\frac{1}{2}\bigl( u_1^2 + \sum_{l=1}^{N-2}(u_{l+1} - u_l)^2 + u_{N-1}^2 \bigr)$ and get

\[ \begin{aligned} H_u(\zeta) &= \frac{1}{2N} \sum_{l=0}^{N-1} \Bigl( \sum_{1\le|k|\le N-1} \lambda_k\, e^{2\pi i lk/N}\, \zeta_k \Bigr)^{2} \\ &= \frac{1}{2N} \sum_{1\le|k|,|k'|\le N-1} \lambda_k\lambda_{k'} \Bigl( \sum_{l=0}^{N-1} e^{2\pi i l(k+k')/N} \Bigr) \zeta_k\zeta_{k'}. \end{aligned} \]

Using again that $\sum_{l=0}^{N-1} e^{2\pi i lk/N} = N\delta_{k0}$ and $\lambda_k = \lambda_{-k}$ for any $1 \le |k| \le N-1$, one obtains

\[ H_u(\zeta) = \sum_{k=1}^{N-1} \lambda_k^2\, \zeta_k\zeta_{-k}. \]

Before computing $H_v(\zeta)$, we simplify its expansion in terms of the variables $(v_k)_{1\le k\le N-1}$. Define $v_0$ by the expression on the right-hand side of (17) evaluated at $l = 0$. Note that

\[ \sum_{l=0}^{N-1} v_l = \frac{1}{\sqrt{N}} \sum_{1\le|k|\le N-1} \lambda_k\zeta_k\, e^{-i\pi k/N} \sum_{l=0}^{N-1} e^{2\pi i lk/N} = 0. \]

Hence $\sum_{l=1}^{N-1} v_l = -v_0$ and therefore

\[ H_v = \sum_{l=0}^{N-1} \Bigl( \frac{1}{2} v_l^2 + \frac{\alpha}{3!} v_l^3 + \frac{\beta}{4!} v_l^4 \Bigr) + O(|v|^5). \tag{94} \]
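Two elementary sums carry the computations above: the orthogonality relation $\sum_{l=0}^{N-1} e^{2\pi i lk/N} = N\delta_{k0}$ for $|k| \le N-1$, and the cosine sum that produces the Kronecker deltas in the proof of Lemma B.1. The sketch below checks both numerically for one value of $N$; it is an illustration with hypothetical helper names and assumes $N \ge 3$ and integer arguments.

```python
import cmath
from math import cos, pi


def geometric_sum(k, N):
    """sum_{l=0}^{N-1} exp(2*pi*i*l*k/N); equals N if N divides k, else 0."""
    return sum(cmath.exp(2j * cmath.pi * l * k / N) for l in range(N))


def cosine_difference_sum(a, N):
    """sum_{k=1}^{N-1} [cos(2*k*pi*(1-a)/N) - cos(2*k*pi*a/N)] for an integer a."""
    return sum(cos(2 * k * pi * (1 - a) / N) - cos(2 * k * pi * a / N)
               for k in range(1, N))


N = 7
for k in (-2, -1, 0, 1, 2):
    print("k =", k, "->", round(abs(geometric_sum(k, N)), 9))
for a in (-1, 0, 1, 2):
    # Expected value: N * (delta_{a,1} - delta_{a,0}).
    print("a =", a, "->", round(cosine_difference_sum(a, N), 9))
```

With $a = l - j$, the second sum is exactly the inner sum over $k$ in the proof, and the telescoping over $j$ then gives $\delta_{lm} - \delta_{l0}$.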


Substituting the expression (17) for $v_l$ in the quadratic term in the expansion (94), we get

\[ \begin{aligned} \frac{1}{2}\sum_{l=0}^{N-1} v_l^2 &= \frac{1}{2N} \sum_{1\le|k|,|k'|\le N-1} \lambda_k\lambda_{k'} \Bigl( \sum_{l=0}^{N-1} e^{2\pi i l(k+k')/N} \Bigr) e^{-i\pi(k+k')/N}\, \zeta_k\zeta_{k'} \\ &= \sum_{k=1}^{N-1} \lambda_k^2\, \zeta_k\zeta_{-k}, \end{aligned} \]

where we again used that $\lambda_k = \lambda_{-k}$ and $\sum_{l=0}^{N-1} e^{2\pi i lk/N} = N\delta_{k0}$ for any $0 \le |k| \le N-1$. The terms of third and fourth order in $H_v$ are treated similarly. Combining the above computations leads to the claimed formula $\tilde H_V(\zeta) = G_2 + \alpha G_3 + \beta G_4 + O(\zeta^5)$ with $G_2$, $G_3$, and $G_4$ given by (18), (19), and (20), respectively.

Acknowledgement. It is a great pleasure to thank Yves Colin de Verdière and Percy Deift for valuable comments. We also would like to thank the referee for his suggestions of how to improve the exposition of our paper.

References

1. Bambusi, D., Ponno, A.: Korteweg-de Vries equation and energy sharing in Fermi-Pasta-Ulam. CHAOS 15, 015107 (2005)
2. Bambusi, D., Ponno, A.: On Metastability in FPU. Commun. Math. Phys. 264, 539–561 (2006)
3. Berman, G.P., Izrailev, F.M.: The Fermi-Pasta-Ulam problem: 50 years of progress. CHAOS 15(1), 015104.1–015104.18 (2005)
4. Broer, H.W.: KAM theory: the legacy of Kolmogorov’s 1954 paper. Bull. AMS (New Series) 41(4), 507–521 (2004)
5. Fermi, E., Pasta, J., Ulam, S.: Studies of non linear problems. Los Alamos Rpt. LA-1940 (1955). In: Collected Papers of Enrico Fermi. Chicago, IL: University of Chicago Press, 1965, Volume II, Theory, Methods and Applications, (2nd ed., New York: Marcel Dekker, 2000), pp. 978–988
6. Henrici, A., Kappeler, T.: Global Birkhoff coordinates for the periodic Toda lattice. Preprint, 2006
7. Henrici, A., Kappeler, T.: Birkhoff normal form for the periodic Toda lattice. http://arxiv.org/list/nlin.SI/0609045, 2006, to appear in Contemp. Math.
8. Henrici, A., Kappeler, T.: Resonant normal form for even periodic FPU chains. arXiv: 0709.2624 [nlin.SI]
9. Kappeler, T., Pöschel, J.: KdV & KAM. Ergebnisse der Mathematik, 3. Folge, 45. Berlin: Springer, 2003
10. Nishida, T.: A note on an existence of conditionally periodic oscillation in a one-dimensional lattice. Mem. Fac. Engrg. Kyoto Univ. 33, 27–34 (1971)
11. Pöschel, J.: Integrability of Hamiltonian Systems on Cantor Sets. Comm. Pure Appl. Math. 35, 653–695 (1982)
12. Pöschel, J.: On Nekhoroshev’s Estimate at an Elliptic Equilibrium. Int. Math. Res. Not. 4, 203–215 (1999)
13. Rink, B.: Symmetry and resonance in periodic FPU chains. Commun. Math. Phys. 218, 665–685 (2001)
14. Rink, B.: Direction reversing travelling waves in the Fermi-Pasta-Ulam chain. J. Nonlinear Science 12, 479–504 (2002)
15. Rink, B.: Proof of Nishida’s conjecture on anharmonic lattices. Commun. Math. Phys. 261, 613–627 (2006)
16. Toda, M.: Theory of Nonlinear Lattices, 2nd enl. ed., Springer Series in Solid-State Sciences 20. Berlin: Springer, 1989
17. Van der Waerden, B.L.: Algebra I. Heidelberger Taschenbücher. Berlin: Springer, 1966
18. Weissert, T.P.: The genesis of simulation in dynamics: pursuing the Fermi-Pasta-Ulam problem. New York: Springer, 1997

Communicated by G. Gallavotti

Commun. Math. Phys. 278, 179–191 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0384-2

Communications in

Mathematical Physics

Global Well-Posedness for a Smoluchowski Equation Coupled with Navier-Stokes Equations in 2D

P. Constantin1, Nader Masmoudi2

1 Department of Mathematics, The University of Chicago, 5734 S. University Avenue, Chicago, IL 60637, USA. E-mail: [email protected]

2 Courant Institute, New York University, 251 Mercer St, New York, NY 10012, USA. E-mail: [email protected]

Received: 22 February 2007 / Accepted: 10 May 2007
Published online: 7 November 2007 – © Springer-Verlag 2007

Abstract: We prove global existence for a nonlinear Smoluchowski equation (a nonlinear Fokker-Planck equation) coupled with Navier-Stokes equations in 2d. The proof uses a deteriorating regularity estimate in the spirit of [5] (see also [1]).

1. Introduction Systems coupling fluids and particles are of great interest in many branches of applied physics and chemistry. The equations attempt to describe the behavior of complex mixtures of particles and fluids, and as such, they present numerous challenges, simultaneously at three levels: at the level of their derivation, the level of their numerical simulation and that of their mathematical treatment. In this paper we concentrate solely on one aspect of the mathematical treatment, the regularity of solutions. The particles in the system are described by a probability distribution f (t, x, m) that depends on time t, macroscopic variable x ∈ Rn , and particle configuration m ∈ M. Here M is a smooth compact Riemannian manifold without boundary. The particles are transported by a fluid, agitated by thermal noise, and interact among themselves. This is reflected in a kinetic equation for the evolution of the probability distribution of the particles ([2,8]). The interaction between particles – a micro-micro interaction – is modeled in a mean-field fashion by a potential that represents the tendency of particles to favor certain coherent configurations. The interaction between particles occurs only when the concentration of particles is sufficiently high. Mathematically, this term is responsible for the nonlinearity of the Smoluchowski (Fokker-Planck) equation, and physically, it is responsible for nematic phase transitions. Because the particles are considerably small, and for smooth flows, the Lagrangian transport of the particles is modeled using a Taylor expansion of the velocity field. This gives rise to a drift term in the Smoluchowski equation that depends on the spatial gradient of velocity. It is a macro-micro term, and it causes mathematical difficulties in the regularity theory.


The fluid is described by the incompressible Navier-Stokes equations. The microscopic particles add stresses to the fluid. This is the micro-macro interaction and it is the most puzzling and important physical aspect of the problem. Indeed, while a macromicro interaction can be derived, in principle, by assuming that the macroscopic entities vary little on the scale of the microscopic ones, the “scaling up” of the effect of microscopic quantities to the macroscopic level is more mysterious. A principle based on an energy dissipation balance, and that recovers familiar results in simple cases was proposed in [6], where the regularity of nonlinear Fokker-Planck systems coupled with Stokes equations in 3D was also proved. The linear Fokker-Planck system coupled with Stokes equations was considered in [22]. The nonlinear Fokker-Planck equation driven by a time averaged Navier-Stokes system in 2D was studied in [7]. An approximate closure of the linear Fokker-Planck equation reduces the description to closed viscoelastic equations for the added stresses themselves. This leads to well-known non-Newtonian fluid models that have been studied extensively. For regularity results we refer to Lions and Masmoudi [19] where the existence of global weak solutions was proved for an Oldroyd-type model. In Guillopé and Saut [13] and [14], the existence of the local strong solution was proved. Also, Fernández-Cara, Guillén and Ortega [11,10] and [12] proved local well posedness in Sobolev spaces. We also mention Lin, Liu and Zhang [16] where a formulation based on the deformation tensor is used to study the Oldroyd-B model. Another model for the polymers is the FENE dumbbell model. From a mathematical point of view, this model was studied by several authors. In particular W. E, Li and Zhang [9], Jourdain, Lelievre and Le Bris [15] and Zhang and Zhang [23] proved local well-posedness. Moreover, Lin, Liu and Zhang [17] proved global existence near equilibrium. 
After the completion of the present work, we learned that Lin, Zhang, and Zhang [18] proved a result similar to our result for the co-rotational FENE model (see also [21]). Existence of global weak solutions was also proved in [20].

1.1. The model. Consider the system

\[ \begin{cases} \dfrac{\partial v}{\partial t} + v\cdot\nabla v - \nu\Delta v + \nabla p = \nabla\cdot\tau & \text{in } \Omega\times(0,T), \\[1ex] \dfrac{\partial f}{\partial t} + v\cdot\nabla f + \operatorname{div}_g(G(v,f)f) - \Delta_g f = 0 & \text{in } \Omega\times(0,T), \\[1ex] \operatorname{div} v = 0 & \text{in } \Omega\times(0,T), \end{cases} \tag{1} \]

where $\tau_{ij} = \int_M \gamma^{(1)}_{ij}(m) f(t,x,m)\,dm + \int_M\int_M \gamma^{(2)}_{ij}(m_1,m_2) f(t,x,m_1) f(t,x,m_2)\,dm_1\,dm_2$.

We denote $G(v,f) = \nabla_g U + W$, where $W = c^{ij}_{\alpha}\partial_j v_i$ and $U = \mathcal{K}f$ is a potential given by

\[ U(t,x,m) = \int_M K(m,q) f(t,x,q)\,dq \tag{2} \]

with a kernel $K$ which is a smooth, time and space independent symmetric function $K : M\times M \to \mathbb{R}$. We also take $\Omega = \mathbb{R}^2$.

1.2. Statement of the result.

Theorem 1.1. Take $v(0) \in W^{1+\varepsilon_0,r} \cap L^2(\mathbb{R}^2)$ and $f(0) \in W^{1,r}(H^{-s})$, for some $r > 2$ and $\varepsilon_0 > 0$, and $f_0 \ge 0$, $\int_M f_0 \in L^1 \cap L^\infty$. Then (1) has a global solution with $v \in L^\infty_{loc}(W^{1,r}) \cap L^2_{loc}(W^{2,r})$ and $f \in L^\infty_{loc}(W^{1,r}(H^{-s}))$. Moreover, for $T > T_0 > 0$, we have $v \in L^\infty((T_0,T); W^{2-\varepsilon,r})$.


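1.3. Preliminaries. We define $\mathcal{C}$ to be the ring of center 0, of small radius 1/2 and great radius 2. There exist two nonnegative radial functions $\chi$ and $\varphi$ belonging respectively to $\mathcal{D}(B(0,1))$ and to $\mathcal{D}(\mathcal{C})$ so that

\[ \chi(\xi) + \sum_{q\ge 0} \varphi(2^{-q}\xi) = 1, \tag{3} \]

\[ |p - q| \ge 2 \ \Rightarrow\ \operatorname{Supp}\varphi(2^{-q}\cdot) \cap \operatorname{Supp}\varphi(2^{-p}\cdot) = \emptyset. \tag{4} \]

For instance, one can take $\chi \in \mathcal{D}(B(0,1))$ such that $\chi \equiv 1$ on $B(0,1/2)$ and take $\varphi(\xi) = \chi(\xi/2) - \chi(\xi)$. Then, we are able to define the Littlewood-Paley decomposition. Let us denote by $\mathcal{F}$ the Fourier transform on $\mathbb{R}^d$. Let $h$, $\tilde h$, $\Delta_q$, $S_q$ ($q \in \mathbb{Z}$) be defined as follows:

\[ h = \mathcal{F}^{-1}\varphi \quad\text{and}\quad \tilde h = \mathcal{F}^{-1}\chi, \]

\[ \Delta_q u = \mathcal{F}^{-1}(\varphi(2^{-q}\xi)\mathcal{F}u) = 2^{qd}\int h(2^q y)\,u(x-y)\,dy, \]

\[ S_q u = \mathcal{F}^{-1}(\chi(2^{-q}\xi)\mathcal{F}u) = 2^{qd}\int \tilde h(2^q y)\,u(x-y)\,dy. \]

We use the para-product decomposition of Bony ([3]),

\[ uv = T_u v + T_v u + R(u,v), \]

where

\[ T_u v = \sum_{q\in\mathbb{Z}} S_{q-1}u\,\Delta_q v \quad\text{and}\quad R(u,v) = \sum_{|q-q'|\le 1} \Delta_q u\,\Delta_{q'} v. \]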
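The dyadic partition of unity (3) and the almost-orthogonality property (4) can be illustrated on the radial profiles of $\chi$ and $\varphi$. The sketch below uses one admissible smooth cutoff built from $e^{-1/t}$; this specific $\chi$ is an assumption made for the example (any smooth radial $\chi$ equal to 1 on $B(0,1/2)$ and supported in $B(0,1)$ would do), and the function names are hypothetical.

```python
from math import exp


def _g(t):
    """Smooth function vanishing for t <= 0 and positive for t > 0."""
    return exp(-1.0 / t) if t > 0 else 0.0


def chi(r):
    """Radial cutoff: equals 1 for |r| <= 1/2 and 0 for |r| >= 1."""
    r = abs(r)
    a, b = _g(1.0 - r), _g(r - 0.5)
    return a / (a + b)


def phi(r):
    """phi(xi) = chi(xi/2) - chi(xi), supported in the ring 1/2 <= |xi| <= 2."""
    return chi(r / 2.0) - chi(r)


def partition_sum(r, qmax):
    """chi(r) + sum_{q=0}^{qmax} phi(2^{-q} r); telescopes to chi(2^{-qmax-1} r)."""
    return chi(r) + sum(phi(r / 2.0 ** q) for q in range(qmax + 1))


for r in (0.0, 0.3, 0.75, 1.5, 10.0, 1000.0):
    print(r, partition_sum(r, qmax=12))
```

The truncated sum equals 1 as long as $|\xi| \le 2^{q_{\max}}$, which is the finite version of (3); the full identity follows by letting $q_{\max} \to \infty$.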

We define the inhomogeneous and homogeneous Besov spaces by

Definition 1.2. Let $s$ be a real number, $p$ and $r$ two real numbers greater than 1. Then we define the following norm:

\[ \|u\|_{B^s_{p,r}} \overset{def}{=} \|S_0 u\|_{L^p} + \Bigl\| \bigl( 2^{qs}\|\Delta_q u\|_{L^p} \bigr)_{q\in\mathbb{N}} \Bigr\|_{\ell^r(\mathbb{N})}, \]

and the following semi-norm:

\[ \|u\|_{\dot B^s_{p,r}} \overset{def}{=} \Bigl\| \bigl( 2^{qs}\|\Delta_q u\|_{L^p} \bigr)_{q\in\mathbb{Z}} \Bigr\|_{\ell^r(\mathbb{Z})}. \]

Definition 1.3.
• Let $s$ be a real number, $p$ and $r$ two real numbers greater than 1. We denote by $B^s_{p,r}$ the space of tempered distributions $u$ such that $\|u\|_{B^s_{p,r}}$ is finite.
• If $s < d/p$ or $s = d/p$ and $r = 1$ we define the homogeneous Besov space $\dot B^s_{p,r}$ as the closure of compactly supported smooth functions for the norm $\|\cdot\|_{\dot B^s_{p,r}}$.

We refer to [4] for the proof of the following results and for the multiplication law in Besov spaces.


Lemma 1.4.

\[ \|\Delta_q u\|_{L^b} \le 2^{d(\frac{1}{a}-\frac{1}{b})q}\, \|\Delta_q u\|_{L^a} \ \text{ for } b \ge a \ge 1, \qquad \|e^{t\Delta}\Delta_q u\|_{L^b} \le C\,2^{-ct2^{2q}}\, \|\Delta_q u\|_{L^b}. \]

The following corollary is straightforward.

Corollary 1.5. If $b \ge a \ge 1$, then we have the following continuous embeddings:

\[ B^s_{a,r} \subset B^{s-d(\frac{1}{a}-\frac{1}{b})}_{b,r}. \]

Definition 1.6. Let $p$ be in $[1,\infty]$ and $r$ in $\mathbb{R}$; the space $L^p_T(C^r)$ is the space of distributions $u$ such that

\[ \|u\|_{L^p(0,T;C^r)} \overset{def}{=} \sup_q\, 2^{qr}\, \|\Delta_q u\|_{L^p_T(L^\infty)} < \infty. \]

We will use the following theorem from [5].

Theorem 1.7. Let $v$ be the solution in $L^2_T(H^1)$ of the two dimensional Navier-Stokes system

\[ (NS_\nu) \quad \begin{cases} \dfrac{\partial v}{\partial t} + v\cdot\nabla v - \nu\Delta v = -\nabla p + f, \\ \operatorname{div} v = 0, \\ v|_{t=0} = v_0 \end{cases} \]

with an initial data in $L^2$ and an external force $f$ in $L^1_T(C^{-1}) \cap L^2_T(H^{-1})$; then, for any $\varepsilon$, a $T_0$ in the interval $]0,T[$ exists such that

\[ \|\nabla v\|_{L^1_{[T_0,T]}(C^0)} \le \varepsilon. \]

2. A Deteriorating Regularity Estimate

The main part of this section is the proof of a deteriorating regularity estimate for transport equations in the spirit of [1] and [5]. After this proof, we will apply this estimate in order to prove Theorem 1.1. We also denote $H = (-\Delta_g + I)^{-s/2}$ with $s > d/2 + 1$.

Theorem 2.1. Let $\sigma$ and $\beta$ be two elements of $]0,1[$ such that $\sigma + \beta < 1$. A constant $C$ exists that satisfies the following properties. Let $T$ and $\lambda$ be two positive numbers and $v$ a smooth divergence free vector field so that

\[ \sigma - \lambda\|\nabla v\|_{L^1_T(C^0)} \ge \beta \quad\text{and}\quad \sigma + \lambda\|\nabla v\|_{L^1_T(C^0)} \le 1 - \beta. \tag{5} \]

Consider two smooth functions $f$ and $v$ so that $f$ is the solution of

\[ \partial_t f + v\cdot\nabla f + \operatorname{div}_g(G(v,f)f) - \Delta_g f = 0, \qquad f|_{t=0} = f_0. \tag{6} \]


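Then we have, if $\lambda \ge 3C$,

\[ M^\sigma_\lambda(f) \le 3\|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + \frac{3C}{\lambda}\, M^{\sigma+1}_\lambda(v), \tag{7} \]

where

\[ M^\sigma_\lambda(v) \overset{def}{=} \sup_{t\in[0,T],\,q} 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|\Delta_q v(t)\|_{L^p} \tag{8} \]

and

\[ M^\sigma_\lambda(f) \overset{def}{=} \sup_{t\in[0,T],\,q} 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|\Delta_q f(t)\|_{L^p(H^{-s})} \tag{9} \]

with

\[ \Phi_{q,\lambda}(t,t') \overset{def}{=} \lambda\int_{t'}^{t} \bigl( \|S_{q-1}\nabla v(t'')\|_{L^\infty} + 1 \bigr)\,dt'', \qquad \Phi_{q,\lambda}(t) \overset{def}{=} \Phi_{q,\lambda}(t,0). \tag{10} \]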

def

We will use the notation $f_q = \Delta_q f$. Applying the operator $\Delta_q$ to the transport equation (6), we get

\[ \partial_t f_q + S_{q-1}v\cdot\nabla f_q + \operatorname{div}_g(G(S_{q-1}v, S_{q-1}f) f_q) - \Delta_g f_q + R_q(v,f) = 0, \qquad f_q|_{t=0} = \Delta_q f_0, \tag{11} \]

where $R_q$ is a remainder term. We denote

\[ N_q^2(t,x) = \int_M |H f_q|^2\,dm. \tag{12} \]

Applying $H$ to (11) and taking the $L^2$ norm on $M$, we get

\[ \frac{1}{2}\partial_t N_q^2 + S_{q-1}v\cdot\nabla N_q^2 + V(S_{q-1}v, S_{q-1}U, f_q) + |\nabla_g H f_q|^2 + \int_M H f_q\,(H R_q(v,f))\,dm = 0, \tag{13} \]

where

\[ V(v,h,f) = \partial_j v_i \int_M (H\operatorname{div}_g(c^{ij}_\alpha f))(H f)\,dm + \int_M (H\operatorname{div}_g(\nabla_g h\, f))(H f)\,dm. \tag{14} \]

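Hence, arguing as in [7], we have

\[ |V(S_{q-1}v, S_{q-1}U, f_q)| \le C\bigl( |\nabla S_{q-1}v| + \|H S_{q-1}f\|_{L^2(M)} \bigr) N_q^2. \]

We will use now the following lemma, postponing its proof:

Lemma 2.2. $R_q(v,f)$ satisfies

\[ 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|H R_q(v(t),f(t))\|_{L^p(L^2)} \le C e^{C\lambda\|\nabla v\|_{L^1_T(C^0)}} \Bigl( M^{\sigma+1}_\lambda(v) + \Bigl( 1 + \|S_q\nabla v(t)\|_{L^\infty} + \sum_{|q'-q|\le N} \|\Delta_{q'}\nabla v(t)\|_{L^\infty} \Bigr) M^\sigma_\lambda(f) \Bigr). \tag{15} \]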


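Taking the $L^p$ norm of $N_q$, we get

\[ \|N_q(t)\|_{L^p} \le \|N_q(0)\|_{L^p} + C\int_0^t \Bigl( \|H R_q(v(t'),f(t'))\|_{L^p(L^2)} + (1 + \|\nabla S_q v(t')\|_{L^\infty})\, \|N_q(t')\|_{L^p} \Bigr)\,dt'. \]

After multiplication by $2^{q\sigma - \Phi_{q,\lambda}(t)}$, we get

\[ \begin{aligned} 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|N_q(t)\|_{L^p} \le{}& 2^{q\sigma}\, \|N_q(0)\|_{L^p} + \int_0^t 2^{-\Phi_{q,\lambda}(t,t')}\, 2^{q\sigma - \Phi_{q,\lambda}(t')}\, \|\nabla S_q v(t')\|_{L^\infty}\, \|N_q\|_{L^p}\,dt' \\ &+ \int_0^t 2^{-\Phi_{q,\lambda}(t,t')}\, 2^{q\sigma - \Phi_{q,\lambda}(t')}\, \|H R_q(v(t'),f(t'))\|_{L^p(L^2)}\,dt'. \end{aligned} \]

Then, using the inequality (15) and taking the sup over $q$, we get

\begin{align} M^\sigma_\lambda(f) \le{}& \|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + e^{C\lambda\|\nabla v\|_{L^1_T(C^0)}} \sup_{t\in[0,T],\,q} \int_0^t 2^{-\Phi_{q,\lambda}(t,t')} \tag{16} \\ &\times \Bigl( M^{\sigma+1}_\lambda(v) + M^\sigma_\lambda(f)\Bigl( 1 + 2\|S_q\nabla v(t')\|_{L^\infty} + \sum_{|q'-q|\le N} \|\Delta_{q'}\nabla v(t')\|_{L^\infty} \Bigr) \Bigr)\,dt'. \tag{17} \end{align}

As $\lambda\|\nabla v\|_{L^1_T(C^0)}$ is smaller than $(\sigma - \beta)$, we have

\[ e^{C\lambda\|\nabla v\|_{L^1_T(C^0)}} \le e^{C(\sigma-\beta)}. \]

Moreover, by definition of $\Phi_{q,\lambda}(t,t')$, it is obvious that

\[ \int_0^t 2^{-\Phi_{q,\lambda}(t,t')} \bigl( \|S_q\nabla v(t')\|_{L^\infty} + 1 \bigr)\,dt' \le \frac{1}{\lambda\log 2}\cdot \]

Then, we obtain that

\[ \begin{aligned} M^\sigma_\lambda(f) &\le \|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + \frac{C}{\lambda}\, M^{\sigma+1}_\lambda(v) + C\|\nabla v\|_{L^1_T(C^0)}\, M^\sigma_\lambda(f) + \frac{C}{\lambda}\, M^\sigma_\lambda(f) \\ &\le \|f_0\|_{B^\sigma_{p,\infty}(H^{-s})} + \frac{C}{\lambda}\, M^{\sigma+1}_\lambda(v) + \frac{2C}{\lambda}\, M^\sigma_\lambda(f). \end{aligned} \]

Since $\lambda \ge 3C$, the last term is at most $\frac{2}{3} M^\sigma_\lambda(f)$; absorbing it into the left-hand side and multiplying by 3 gives (7). This proves the theorem, of course, if we prove the estimate (15) of the lemma. First of all, let us decompose the operator $R_q$. We have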
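The bound $\int_0^t 2^{-\Phi(t,t')}(a(t')+1)\,dt' \le \frac{1}{\lambda\log 2}$ used above holds because the integrand is an exact derivative: $\partial_{t'} 2^{-\Phi(t,t')} = \lambda\log 2\,(a(t')+1)\,2^{-\Phi(t,t')}$, so the integral equals $\frac{1 - 2^{-\Phi(t,0)}}{\lambda\log 2}$. The following minimal sketch checks this numerically for a scalar stand-in $a(t') \ge 0$ for $\|S_{q-1}\nabla v(t')\|_{L^\infty}$; the sample function `a_sample`, the value of $\lambda$, and the quadrature parameters are arbitrary choices made for illustration.

```python
from math import log, sin


def phi_increment(a, lam, t_lo, t_hi, steps=200):
    """lam * integral_{t_lo}^{t_hi} (a(s) + 1) ds, by the midpoint rule."""
    h = (t_hi - t_lo) / steps
    return lam * h * sum(a(t_lo + (i + 0.5) * h) + 1.0 for i in range(steps))


def weighted_integral(a, lam, t, steps=400):
    """integral_0^t 2^{-Phi(t,t')} (a(t') + 1) dt', by the midpoint rule."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        tp = (i + 0.5) * h
        total += 2.0 ** (-phi_increment(a, lam, tp, t)) * (a(tp) + 1.0)
    return h * total


def closed_form(a, lam, t):
    """(1 - 2^{-Phi(t,0)}) / (lam * log 2), the exact value of the integral."""
    return (1.0 - 2.0 ** (-phi_increment(a, lam, 0.0, t, steps=20000))) / (lam * log(2))


def a_sample(s):
    """A nonnegative test profile standing in for the gradient norm."""
    return 1.0 + sin(3.0 * s)


lam, t = 1.0, 1.0
print(weighted_integral(a_sample, lam, t), closed_form(a_sample, lam, t), 1.0 / (lam * log(2)))
```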

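\[ R_q(v,f) = \sum_{\ell=1}^{6} R_q^{\ell}(v,f) \quad\text{with} \]

\[ R_q^1(v,f) = \sum_{j=1}^{d} \Delta_q(T_{\partial_j f} v^j), \]

\[ R_q^2(v,f) = \sum_{j=1}^{d} [\Delta_q, T_{v^j}\partial_j] f, \]

\[ R_q^3(v,f) = \sum_{j=1}^{d} \Delta_q\partial_j R(v^j,f) + \Delta_{q-1}v^j\,\partial_j\Delta_{q+1}f_q - \Delta_{q-2}v^j\,\partial_j\Delta_{q-1}f_q, \]

\[ R_q^4(v,f) = \sum_{i,j=1}^{d} \operatorname{div}_g\bigl( c^{ij}_\alpha \Delta_q(T_f\partial_j v^i) \bigr) + \operatorname{div}_g\bigl( \Delta_q(T_f\nabla_g U) \bigr), \]

\[ R_q^5(v,f) = \sum_{i,j=1}^{d} \operatorname{div}_g\bigl( c^{ij}_\alpha [\Delta_q, T_{\partial_j v^i}] f \bigr) + \operatorname{div}_g\bigl( [\Delta_q, T_{\nabla_g U}] f \bigr), \]

\[ \begin{aligned} R_q^6(v,f) ={}& \sum_{i,j=1}^{d} \operatorname{div}_g c^{ij}_\alpha \bigl( R(\partial_j v^i, f) + \Delta_{q-1}\partial_j v^i\,\Delta_{q+1}f_q - \Delta_{q-2}\partial_j v^i\,\Delta_{q-1}f_q \bigr) \\ &+ \sum_{i,j=1}^{d} \operatorname{div}_g \bigl( R(\nabla_g U, f) + \Delta_{q-1}\nabla_g U\,\Delta_{q+1}f_q - \Delta_{q-2}\nabla_g U\,\Delta_{q-1}f_q \bigr). \end{aligned} \]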

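Indeed,

\[ \begin{aligned} \Delta_q(v\cdot\nabla f) &= \Delta_q\Bigl( \sum_{j=1}^{d} T_{\partial_j f}v^j + T_{v^j}\partial_j f + R(v^j,\partial_j f) \Bigr) \\ &= \sum_{\ell=1}^{2} R_q^{\ell}(v,f) + \sum_{j=1}^{d} T_{v^j}\partial_j\Delta_q f + \Delta_q R(v^j,\partial_j f). \end{aligned} \]

Then, we use that

\[ \begin{aligned} \sum_{j=1}^{d} T_{v^j}\partial_j f_q &= \sum_{|q-q'|\le 1} S_{q'-1}v^j\,\partial_j\Delta_{q'}f_q \\ &= S_{q-1}v^j\,\partial_j f_q + \sum_{|q-q'|\le 1} (S_{q'-1}v^j - S_{q-1}v^j)\,\partial_j\Delta_{q'}f_q \\ &= S_{q-1}v^j\,\partial_j f_q + \Delta_{q-1}v^j\,\partial_j\Delta_{q+1}f_q - \Delta_{q-2}v^j\,\partial_j\Delta_{q-1}f_q. \end{aligned} \]

Hence,

\[ \Delta_q(v\cdot\nabla f) = \sum_{\ell=1}^{3} R_q^{\ell}(v,f) + S_{q-1}v\cdot\nabla f_q. \]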


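In the same way, we have

\[ \Delta_q(\operatorname{div}_g(G(v,f)f)) = \sum_{\ell=4}^{6} R_q^{\ell}(v,f) + \operatorname{div}_g(G(S_{q-1}v, S_{q-1}f) f_q). \]

Let us estimate the six terms appearing above. Let us begin with $R_q^1(v,f)$. By definition of the paraproduct, we have

\[ R_q^1(v,f) = \sum_{j=1}^{d} \sum_{q'} \Delta_q(S_{q'-1}\partial_j f\,\Delta_{q'}v^j). \]

As, if $|q-q'| > 2$ then the above term is equal to 0, we deduce that

\[ \|H R_q^1(v(t),f(t))\|_{L^p(L^2)} \le C \sum_{|q-q'|\le 2} \|H S_{q'-1}\nabla f\|_{L^\infty(L^2)}\, \|\Delta_{q'}v(t)\|_{L^p}. \]

Using the fact that, if $|q-q'| \le 2$, then $\|H S_{q'-1}\nabla f\|_{L^\infty(L^2)} \le C2^q\|H f(t)\|_{L^\infty(L^2)} \le C2^q$, we infer that

\[ \|H R_q^1(v(t),f(t))\|_{L^p(L^2)} \le C2^q \sum_{|q-q'|\le 2} \|\Delta_{q'}v(t)\|_{L^p} \le C \sum_{|q-q'|\le 2} \|\nabla\Delta_{q'}v(t)\|_{L^p}. \]

Hence

\[ 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|H R_q^1(v(t),f(t))\|_{L^p(L^2)} \le C M^{\sigma+1}_\lambda(v) \sum_{|q-q'|\le 2} 2^{-\lambda\int_0^t \|S_q\nabla v(t')\|_{L^\infty}dt' + \lambda\int_0^t \|S_{q'}\nabla v(t')\|_{L^\infty}dt'}. \]

But, it is obvious that

\[ \Bigl| \int_0^t \|S_q\nabla v(t')\|_{L^\infty}dt' - \int_0^t \|S_{q'}\nabla v(t')\|_{L^\infty}dt' \Bigr| \le \int_0^t \|(S_q - S_{q'})\nabla v(t')\|_{L^\infty}dt'. \]

Using the fact that $|q-q'| \le 2$, we get

\[ \Bigl| \int_0^t \|S_q\nabla v(t')\|_{L^\infty}dt' - \int_0^t \|S_{q'}\nabla v(t')\|_{L^\infty}dt' \Bigr| \le C|q-q'|\, \|\nabla v\|_{L^1_T(C^0)}. \tag{18} \]

So it turns out that

\[ 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|H R_q^1(v(t),f(t))\|_{L^p(L^2)} \le 2^{C\lambda\|\nabla v\|_{L^1_T(C^0)}}\, M^{\sigma+1}_\lambda(v). \]

Now let us look at $R_q^2(v,f)$. By definition of the paraproduct, we have

\[ R_q^2(v,f) = -\sum_{j=1}^{d}\sum_{q'} [S_{q'-1}v^j\partial_j\Delta_{q'}, \Delta_q] f = -\sum_{j=1}^{d}\sum_{q'} [S_{q'-1}v^j, \Delta_q]\,\partial_j\Delta_{q'}f. \tag{19} \]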


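The terms of the above sum are equal to 0 except if $|q-q'| \le 2$. Moreover, by definition of the operators $\Delta_q$, we have

\[ [S_{q'-1}v^j, \Delta_q]\,\partial_j\Delta_{q'}f(x) = 2^{qd}\int_{\mathbb{R}^d} h(2^q(x-y))\bigl( S_{q'-1}v^j(x) - S_{q'-1}v^j(y) \bigr)\,\partial_j\Delta_{q'}f(y)\,dy. \]

So we infer that

\[ \|H[S_{q'-1}v^j, \Delta_q]\,\partial_j\Delta_{q'}f(x)\|_{L^2(M)} \le 2^{-q}\, \|\nabla S_{q'-1}v\|_{L^\infty}\, 2^{qd} \bigl( 2^q|\cdot| \times |h(2^q\cdot)| \bigr) \star \|H\partial_j\Delta_{q'}f\|_{L^2(M)}(x). \]

Hence,

\[ \|H[S_{q'-1}v^j, \Delta_q]\,\partial_j\Delta_{q'}f\|_{L^p(L^2(M))} \le 2^{-q}\, \|\nabla S_{q'-1}v\|_{L^\infty}\, \|H\partial_j\Delta_{q'}f\|_{L^p(L^2(M))}. \]

Then, we have, using inequality (18),

\[ 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|H[S_{q'-1}v^j, \Delta_q]\,\partial_j\Delta_{q'}f\|_{L^p(L^2(M))} \le C M^\sigma_\lambda(f)\, 2^{C\lambda\|v\|_{L^1_T(C^1)}} \sum_{|q-q'|\le 2} \bigl( \|\nabla(S_{q'-1} - S_q)v(t)\|_{L^\infty} + \|S_q\nabla v(t)\|_{L^\infty} \bigr). \]

So, we get

\[ 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|H R_q^2(v(t),f(t))\|_{L^p(L^2)} \le C M^\sigma_\lambda(f)\, 2^{C\lambda\|v\|_{L^1_T(C^1)}} \Bigl( \|S_q\nabla v(t)\|_{L^\infty} + \sum_{|q-q'|\le 2} \|\nabla\Delta_{q'}v(t)\|_{L^\infty} \Bigr). \tag{20} \]

For $R_q^3$, we have

\[ \begin{aligned} \|H R_q^3(v,f)\|_{L^p(L^2)} &\le C \sum_{\substack{|q'-q''|\le 1 \\ q'\ge q-2}} 2^q\, \|\Delta_{q'}v\|_{L^p}\, \|H\Delta_{q''}f\|_{L^\infty(L^2)} \\ &\le C \sum_{q'\ge q-2} 2^{q-q'}\, \|\Delta_{q'}\nabla v\|_{L^p}\, \|H f\|_{L^\infty(L^2)}. \end{aligned} \]

Hence,

\[ 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|H R_q^3(v,f)\|_{L^p(L^2)} \le C \sum_{q'\ge q-2} 2^{(1+\sigma)(q-q') - \Phi_{q,\lambda}(t) + \Phi_{q',\lambda}(t)}\, M^{\sigma+1}_\lambda(v)\, \|H f\|_{L^\infty(L^2)}. \]

Then, we see that the sum converges since

\[ |\Phi_{q,\lambda}(t) - \Phi_{q',\lambda}(t)| \le \lambda\|\nabla v\|_{L^1_T(C^0)}\, |q-q'| \le (\sigma-\beta)|q-q'| \]

and $1 + \sigma - (\sigma-\beta) = 1 + \beta > 0$. Hence, we get

\[ 2^{q\sigma - \Phi_{q,\lambda}(t)}\, \|H R_q^3(v,f)\|_{L^p(L^2)} \le C M^{\sigma+1}_\lambda(v)\, \|H f\|_{L^\infty(L^2)}. \tag{21} \]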


P. Constantin, N. Masmoudi

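The estimate for $R_q^4(v,f) = R_q^{4,1}(v,f) + R_q^{4,2}(f)$ is the same as the estimate for $R_q^1(v,f)$. Indeed, we have
\[ \|H R_q^{4,1}(v(t),f(t))\|_{L^p(L^2)} \le C \sum_{|q-q'|\le 2} \|\nabla_g H S_{q'-1} f\|_{L^\infty(L^2)}\, \|\Delta_{q'}\nabla v(t)\|_{L^p} \le C \sum_{|q-q'|\le 2} \|\Delta_{q'}\nabla v(t)\|_{L^p}, \]
where we used that $\|\nabla_g H S_{q'-1} f\|_{L^\infty(L^2)} \le C$. Hence, we conclude as for $R_q^1(v,f)$. Besides,
\[ \|H R_q^{4,2}(f(t))\|_{L^p(L^2)} \le C \sum_{|q-q'|\le 2} \|\nabla_g H S_{q'-1} f\|_{L^\infty(L^2)}\, \|\Delta_{q'}\nabla_g U\|_{L^p} \le C \sum_{|q-q'|\le 2} \|\Delta_{q'} f(t)\|_{L^p}. \]
Hence, we conclude as for $R_q^1(v,f)$ and get
\[ 2^{q\sigma-\Phi_{q,\lambda}(t)} \|H R_q^{4,2}(f(t))\|_{L^p(L^2)} \le 2^{C\lambda\|\nabla v\|_{L^1_T(C^0)}} M_\lambda^{\sigma}(f). \tag{22} \]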

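We write $R_q^5(v,f) = R_q^{5,1}(v,f) + R_q^{5,2}(f)$. The estimate for $R_q^5(v,f)$ is similar to the one for $R_q^2(v,f)$, with the only difference that we have to use the regularity of $\nabla v$. We have
\[ [\Delta_q, T_{\partial_j v^i}] f = -\sum_{j=1}^{d}\sum_{q'} [S_{q'-1}\partial_j v^i,\, \Delta_q]\,\partial_j\Delta_{q'} f. \]
The terms of the above sum are equal to 0 except if $|q-q'|\le 2$. Moreover, by definition of the operators $\Delta_q$, we have
\[ [S_{q'-1}\partial_j v^i, \Delta_q]\,\Delta_{q'} f(x) = 2^{qd}\int_{\mathbb{R}^d} h(2^q(x-y))\,\big(S_{q'-1}\partial_j v^i(x) - S_{q'-1}\partial_j v^i(y)\big)\,\Delta_{q'} f(y)\,dy. \]
So we infer that
\[ \|H R_q^{5,1}(v,f)\|_{L^2(M)} \le 2^{-q}\,|\nabla^2 S_{q'-1}v|\; 2^{qd}\big(2^q|\cdot|\times|h(2^q\cdot)|\big) \star \|\nabla_g H\Delta_{q'} f\|_{L^2(M)}(x). \]
Hence,
\[ \|H R_q^{5,1}(v,f)\|_{L^p(L^2(M))} \le 2^{-q}\|\nabla^2 S_{q'-1}v\|_{L^p}\, \|\nabla_g H\Delta_{q'} f\|_{L^\infty(L^2(M))}. \]
Then, we have, using inequality (18),
\[ 2^{q\sigma-\Phi_{q,\lambda}(t)} \|H R_q^{5,1}(v,f)\|_{L^p(L^2(M))} \le C \sum_{|q-q'|\le 2}\ \sum_{q''\le q'-1} 2^{(\sigma-1)(q-q'') - \Phi_{q,\lambda}(t) + \Phi_{q'',\lambda}(t)}\, M_\lambda^{\sigma+1}(v)\, \|\nabla_g H\Delta_{q'} f\|_{L^\infty(L^2)}. \]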


Hence,
\[ 2^{q\sigma-\Phi_{q,\lambda}(t)} \|H R_q^{5,1}(v,f)\|_{L^p(L^2(M))} \le C \sum_{q''\le q+1} 2^{-\beta(q-q'')}\, M_\lambda^{\sigma+1}(v)\, \|\nabla_g H f\|_{L^\infty(L^2)}, \]
and the sum is uniformly bounded since $\sigma - 1 + \lambda\|\nabla v\|_{L^1_T(C^0)} \le -\beta$. Then, we argue in a similar way for $\|H R_q^{5,2}(f)\|_{L^p(L^2(M))}$ and get
\[ \|H R_q^{5,2}(f)\|_{L^p(L^2(M))} \le 2^{-q}\|\nabla S_{q'-1} f\|_{L^p(L^2(M))}\, \|\nabla_g H\Delta_{q'} f\|_{L^\infty(L^2(M))}, \]
and we conclude as above with $M_\lambda^{\sigma+1}(v)$ replaced by $M_\lambda^{\sigma}(f)$. Finally, the estimate for $R_q^6(v,f)$ is exactly the same as the one for $R_q^3(v,f)$ since we also have $\|\nabla_g H f\|_{L^\infty(L^2)} \le C$.

3. Global Existence

Now, we turn to the proof of our main theorem. First, we notice that local existence, with $v \in L^\infty_{loc}([0,T); W^{1,r}) \cap L^2_{loc}([0,T); W^{2,r})$ and $f \in L^\infty_{loc}([0,T); W^{1,r}(H^{-s}))$, can be easily deduced from standard arguments. Moreover, from regularity estimates for the heat equation, we have for all $0 < T_0 < T$, $v \in L^\infty_{loc}((T_0,T); W^{2-\varepsilon,r})$. We want to prove that we can extend the solution beyond the time $T$. It is enough to prove that $\nabla v \in L^\infty((0,T)\times\mathbb{R}^2)$. The local existence result tells us that, for any $T_0$ in $]0,T[$, the solution $(v,f)$ of (1) belongs to the space $L^\infty_{loc}([T_0,T[; W^{2-\varepsilon,r}\times W^{1,r}(H^{-s}))$ for any $\varepsilon > 0$. The Sobolev type embeddings of Corollary 1.5 imply that
\[ (v,\tau) \in L^\infty_{loc}\Big([T_0,T[;\ B_{p,\infty}^{2-\varepsilon-2\left(\frac1r-\frac1p\right)} \times B_{p,\infty}^{1-2\left(\frac1r-\frac1p\right)}\Big). \]

Choosing $\varepsilon < 1 - 2/r$ and $p = \infty$ in the above assertion implies that $(v,\tau) \in L^\infty_{loc}(C^{1+\sigma}\times C^{\sigma}(H^{-s}))$, where $\sigma = 1 - \varepsilon - 2/r > 0$. So we can apply Theorem 1.7 and we can choose $T_0$ such that, with the notations of Theorem 2.1, we have
\[ \|\nabla v\|_{L^1_{[T_0,T]}(C^0)} \le \frac{\min(\sigma-\beta,\, 1-\sigma-\beta)}{3\lambda}. \]
The deteriorating regularity estimate of Theorem 2.1, applied with $\sigma$ and between $T_0$ and $T$, tells exactly that $f$ satisfies
\[ M_\lambda^{\sigma}(f) \le 3\|f(T_0)\|_{C^{\sigma}(H^{-s})} + \frac{3C}{\lambda}\, M_\lambda^{\sigma+1}(v). \tag{23} \]

Now, we have to estimate $\nabla v$. The two dimensional Navier-Stokes equation can be written as
\[ \partial_t v - \nu\Delta v = P(v\cdot\nabla v) + P D\tau, \]


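where $P$ denotes the Leray projector on the divergence free vector fields. Exactly along the same lines as in the proof of Theorem 2.1, we have
\[ 2^{q(\sigma+1)-\Phi_{q,\lambda}(t)} \|\Delta_q P(v\cdot\nabla v) - P(S_q v\cdot\nabla\Delta_q v)\|_{L^\infty} \le C M_\lambda^{\sigma+1}(v) \Big( \|S_q\nabla v(t)\|_{L^\infty} + \sum_{q'\ge q} 2^{q-q'}\|\nabla\Delta_{q'} v(t)\|_{L^\infty} \Big). \]
Moreover, it is obvious that
\[ 2^{q(\sigma-\frac12)-\Phi_{q,\lambda}(t)} \|P(S_q v\cdot\nabla\Delta_q v)\|_{L^\infty} \le C \|v(t)\|_{H^{\frac12}}\, M_\lambda^{\sigma+1}(v). \]
So it turns out that
\[ 2^{q(\sigma+1)-\Phi_{q,\lambda}(t)} \|\Delta_q P(v\cdot\nabla v)\|_{L^\infty} \le C M_\lambda^{\sigma+1}(v) \Big( \|S_q\nabla v(t)\|_{L^\infty} + \sum_{q'\ge q} 2^{q-q'}\|\nabla\Delta_{q'} v(t)\|_{L^\infty} + 2^{\frac{3q}{2}}\|v(t)\|_{H^{\frac12}} \Big). \tag{24} \]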

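Using well known estimates on the heat equation (see for instance [4]) and inequalities (23) and (24), we get that
\[ M_\lambda^{\sigma+1}(v) \le \|v(T_0)\|_{C^{\sigma+1}} + \Big( \frac{C}{\lambda} + 2^{\frac{3q}{2}} F_q(T_0,T) \Big) M_\lambda^{\sigma+1}(v) + \frac{C}{\nu}\, M_\lambda^{\sigma}(\tau) \]
with
\[ F_q(T_0,T) \overset{\text{def}}{=} \sup_{t\in[T_0,T]} \int_{T_0}^{t} e^{-c\nu 2^{2q}(t-t')}\, \|v(t')\|_{H^{\frac12}}\, dt'. \]
The Hölder inequality implies immediately that
\[ F_q(T_0,T) \le \frac{C}{\nu^{\frac34}}\, 2^{-\frac{3q}{2}}\, \|v\|_{L^4_{[T_0,T]}(H^{\frac12})}. \]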
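For readers who want the intermediate step, the bound on $F_q$ can be sketched as follows. This is only a worked illustration: it assumes that the kernel in $F_q$ is the decaying exponential $e^{-c\nu 2^{2q}(t-t')}$, applies Hölder's inequality with the conjugate exponents $4/3$ and $4$, and writes $c$, $C$ for generic constants that are assumed not to depend on $q$, $\nu$, $T_0$ or $T$.

```latex
\begin{aligned}
\int_{T_0}^{t} e^{-c\nu 2^{2q}(t-t')}\,\|v(t')\|_{H^{1/2}}\,dt'
 &\le \Big(\int_{T_0}^{t} e^{-\frac{4}{3}c\nu 2^{2q}(t-t')}\,dt'\Big)^{3/4}
      \Big(\int_{T_0}^{t} \|v(t')\|_{H^{1/2}}^{4}\,dt'\Big)^{1/4} \\
 &\le \Big(\frac{3}{4c\,\nu\,2^{2q}}\Big)^{3/4}\,\|v\|_{L^4_{[T_0,T]}(H^{1/2})}
  = \frac{C}{\nu^{3/4}}\,2^{-\frac{3q}{2}}\,\|v\|_{L^4_{[T_0,T]}(H^{1/2})},
 \qquad C = \Big(\frac{3}{4c}\Big)^{3/4}.
\end{aligned}
```

The factor $2^{-3q/2}$ obtained in this way is exactly what compensates the factor $2^{3q/2}$ multiplying $F_q(T_0,T)$ in the preceding estimate.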

Moreover, it is easy to see that $M_\lambda^{\sigma}(\tau) \le M_\lambda^{\sigma}(f)$. So, we infer that
\[ M_\lambda^{\sigma+1}(v) \le \|v(T_0)\|_{C^{\sigma+1}} + \frac{3C}{\nu}\|\tau_0\|_{C^{\sigma}} + \Big( \frac{C}{\lambda} + \frac{C}{\lambda\nu} + \frac{C}{\nu^{\frac34}}\|v\|_{L^4_{[T_0,T]}(H^{\frac12})} \Big) M_\lambda^{\sigma+1}(v). \]
Now it is enough to choose $T_0$ such that the quantity
\[ \frac{C}{\lambda} + \frac{C}{\lambda\nu} + \frac{C}{\nu^{\frac34}}\|v\|_{L^4_{[T_0,T]}(H^{\frac12})} \]
is small enough. Then, as $\sigma$ is greater than 0, the solution $(v,\tau)$ of the system (1) is such that $(\nabla v,\tau)$ belongs to $L^\infty([T_0,T]\times\mathbb{R}^2)$; this concludes the proof of Theorem 1.1.

Acknowledgements. The work of P.C. is partially supported by NSF-DMS grant 0504213. The work of N. M. is partially supported by NSF-DMS grant 0403983.


References

1. Bahouri, H., Chemin, J.-Y.: Équations de transport relatives à des champs de vecteurs non-lipschitziens et mécanique des fluides. Arch. Rational Mech. Anal. 127(2), 159–181 (1994)
2. Bird, R.B., Curtiss, C., Armstrong, R., Hassager, O.: Dynamics of polymeric liquids. Kinetic Theory Vol. 2, New York: Wiley, 1987
3. Bony, J.-M.: Calcul symbolique et propagation des singularités pour les équations aux dérivées partielles non linéaires. Ann. Sci. École Norm. Sup. (4), 14(2), 209–246 (1981)
4. Chemin, J.-Y.: Théorèmes d’unicité pour le système de Navier-Stokes tridimensionnel. J. d’Anal. Math. 77, 27–50 (1999)
5. Chemin, J.-Y., Masmoudi, N.: About lifespan of regular solutions of equations related to viscoelastic fluids. SIAM J. Math. Anal. 33(1), 84–112 (electronic) (2001)
6. Constantin, P.: Nonlinear Fokker-Planck Navier-Stokes systems. Commun. Math. Sci. 3(4), 531–544 (2005)
7. Constantin, P., Fefferman, C., Titi, E., Zarnescu, A.: Regularity for coupled two-dimensional nonlinear Fokker-Planck and Navier-Stokes systems. Commun. Math. Phys. 270, 789–811 (2007)
8. Doi, M., Edwards, S.F.: The Theory of Polymer Dynamics. Oxford: Oxford University Press, 1986
9. E, W., Li, T., Zhang, P.: Well-posedness for the dumbbell model of polymeric fluids. Commun. Math. Phys. 248(2), 409–427 (2004)
10. Fernández-Cara, E., Guillén, F., Ortega, R.R.: Some theoretical results for viscoplastic and dilatant fluids with variable density. Nonlinear Anal. 28(6), 1079–1100 (1997)
11. Fernández-Cara, E., Guillén, F., Ortega, R.R.: Some theoretical results concerning non-Newtonian fluids of the Oldroyd kind. Ann. Scuola Norm. Sup. Pisa Cl. Sci. (4), 26(1), 1–29 (1998)
12. Fernández-Cara, E., Guillén, F., Ortega, R.R.: The mathematical analysis of viscoelastic fluids of the Oldroyd kind, 2000
13. Guillopé, C., Saut, J.-C.: Existence results for the flow of viscoelastic fluids with a differential constitutive law. Nonlinear Anal. 15(9), 849–869 (1990)
14. Guillopé, C., Saut, J.-C.: Global existence and one-dimensional nonlinear stability of shearing motions of viscoelastic fluids of Oldroyd type. RAIRO Modél. Math. Anal. Numér. 24(3), 369–401 (1990)
15. Jourdain, B., Lelièvre, T., Le Bris, C.: Existence of solution for a micro-macro model of polymeric fluid: the FENE model. J. Funct. Anal. 209(1), 162–193 (2004)
16. Lin, F.-H., Liu, C., Zhang, P.: On hydrodynamics of viscoelastic fluids. Comm. Pure Appl. Math. 58(11), 1437–1471 (2005)
17. Lin, F.-H., Liu, C., Zhang, P.: On a Micro-Macro model for polymeric fluids near equilibrium. Comm. Pure Appl. Math. 60(6), 838–866 (2007)
18. Lin, F.-H., Zhang, P., Zhang, Z.: On the global existence of smooth solution to the 2-d FENE dumbbell model. Commun. Math. Phys., DOI:10.1007/s00220-007-0385-1
19. Lions, P.-L., Masmoudi, N.: Global solutions for some Oldroyd models of non-Newtonian flows. Chinese Ann. Math. Ser. B 21(2), 131–146 (2000)
20. Lions, P.-L., Masmoudi, N.: Global existence of weak solutions to micro-macro models. To appear C. R. Math. Acad. Sci. Paris, 2007
21. Masmoudi, N.: Well posedness for the FENE dumbbell model of polymeric flows. Preprint, 2007
22. Otto, F., Tzavaras, A.E.: Continuity of velocity gradients in suspensions of rod-like molecules. SFB preprint Nr. 141, 2004
23. Zhang, H., Zhang, P.: Local existence for the FENE-dumbbell model of polymeric fluids. Arch. Ration. Mech. Anal. 181(2), 373–400 (2006)

Communicated by A. Kupiainen

Commun. Math. Phys. 278, 193–252 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0386-0


Localization for Yang-Mills Theory on the Fuzzy Sphere

Harold Steinacker1, Richard J. Szabo2

1 Institut für Theoretische Physik, Universität Wien, Boltzmanngasse 5, A-1090 Wien, Austria. E-mail: [email protected]
2 Department of Mathematics, and Maxwell Institute for Mathematical Sciences, Heriot-Watt University, Colin Maclaurin Building, Riccarton, Edinburgh EH14 4AS, UK. E-mail: [email protected]

Received: 27 February 2007 / Accepted: 8 May 2007
Published online: 20 November 2007 – © Springer-Verlag 2007

Abstract: We present a new model for Yang-Mills theory on the fuzzy sphere in which the configuration space of gauge fields is given by a coadjoint orbit. In the classical limit it reduces to ordinary Yang-Mills theory on the sphere. We find all classical solutions of the gauge theory and use nonabelian localization techniques to write the partition function entirely as a sum over local contributions from critical points of the action, which are evaluated explicitly. The partition function of ordinary Yang-Mills theory on the sphere is recovered in the classical limit as a sum over instantons. We also apply abelian localization techniques and the geometry of symmetric spaces to derive an explicit combinatorial expression for the partition function, and compare the two approaches. These extend the standard techniques for solving gauge theory on the sphere to the fuzzy case in a rigorous framework.

Contents

1. Introduction and Summary
2. Symplectic Model for Yang-Mills Theory on the Fuzzy Sphere
   2.1 The fuzzy sphere
   2.2 Configuration space of gauge fields
   2.3 The Yang-Mills action
   2.4 Symplectic geometry of the configuration space
3. The Classical Configuration Space
   3.1 Classical solutions
   3.2 The classical action
   3.3 Local symplectic geometry of the configuration space
   3.4 Explicit decomposition at Yang-Mills critical surfaces
   3.5 Fluctuations around the critical surfaces
4. Nonabelian Localization
   4.1 Equivariant cohomology and the localization principle
   4.2 Explicit evaluation of the localization forms
   4.3 Localization at the vacuum moduli space
   4.4 Localization at maximally irreducible saddle points
5. Abelianization
6. Itzykson-Zuber Localization on the Configuration Space
7. Abelian Localization and Radial Coordinates
   7.1 Polar decomposition of the configuration space
   7.2 Evaluation of the abelianized partition function: U(1) gauge theory
   7.3 Evaluation of the abelianized partition function: U(n) gauge theory
8. Yang-Mills Critical Surfaces in Abelianized Localization
   8.1 Itzykson-Zuber localization on the symplectic leaves
   8.2 Radial coordinates for Yang-Mills critical surfaces
   8.3 Action of the gauge group

1. Introduction and Summary Gauge theory on the fuzzy sphere has been of interest for many years as the simplest example of a noncommutative gauge theory with finitely many degrees of freedom which retains all of the classical symmetries of the corresponding undeformed field theory (see for instance [1–12] and references therein). It can be formulated as an N × N matrix model, which provides a natural regularization preserving all symmetries of quantum gauge theory on the classical sphere which is recovered in the large N limit. At the classical level one finds non-trivial gauge field configurations such as monopoles which can be naturally described in terms of the noncommutative topology of projective modules. Besides Yang-Mills gauge theory which is the focus of this paper, certain other gauge theories on the fuzzy sphere naturally emerge in string theory upon quantizing the worldvolume dynamics of spherical D2-branes [14], obtained for instance as expansions about vacua of matrix models with a Chern-Simons term [15,16] describing superstrings in pp-wave backgrounds [17]. These models contain additional scalar degrees of freedom and are not considered here. The formulation of Yang-Mills theory as an N × N matrix model allows a nonperturbative quantization in terms of a finite-dimensional path integral [9]. This can then be evaluated in terms of an N -dimensional integral, and the classical result as a sum over two-dimensional instantons [18–20] is recovered in the commutative limit N → ∞. A different approach to evaluate the path integral was given in [11], which is also restricted to the large N limit. This indicates in particular that the model is void of the usual perturbative ambiguities which plague noncommutative gauge theories in higher dimensions, such as UV/IR mixing (see [21,22] for reviews). In this paper we will formulate a new model for quantum Yang-Mills theory on the fuzzy sphere, and solve it exactly. 
The model reduces to pure Yang-Mills theory on the classical sphere when N → ∞ without any spurious auxiliary scalar fields. The classical theory admits topologically non-trivial solutions as in previous matrix model formulations [9], including some purely noncommutative ones. Its main virtue is that the finite-dimensional configuration space of gauge fields can be described as a compact coadjoint orbit, which is naturally a symplectic manifold with a hamiltonian action of a nonabelian Lie symmetry group. The Yang-Mills action is the square of the corresponding moment map, and therefore our model can be solved exactly using nonabelian localization techniques [18,23–27] to cast the partition function as a sum over local contributions from the classical solutions of the gauge theory. It can also be solved


by abelian localization techniques which exploit the usual Duistermaat-Heckman theorem (see [28,29] for extensive treatments) and which provide an interesting alternative to the semiclassical expansion. Although the model described in this paper is fundamentally different from the fuzzy gauge theories that naturally emerge in string theory, which contain a Chern-Simons term in their action, nonabelian localization bears certain remarkable similarities to the nonabelian localization of Chern-Simons theory on Seifert homology spheres [27]. There are two main motivations behind the present work. Firstly, in the commutative case, two-dimensional gauge theories are exactly solvable and can be solved explicitly, either at strong coupling by exploiting the Migdal formula [30,31] which expresses it in terms of a sum over irreducible representations of its gauge group, or at weak coupling by using Poisson resummation techniques to cast it as a sum over two-dimensional instantons [18–20]. One would therefore like to have a similar picture in the noncommutative case. The instanton expansion can be readily generalized to provide the exact solution for gauge theory on a two-dimensional noncommutative torus [32,33]. However, in previous formulations of gauge theory on the fuzzy sphere this is not possible, either because extra scalar degrees of freedom not normally present in commutative Yang-Mills theory destroy the topological nature of the gauge theory and hence its exact solvability, or else because the exact solution does not decompose neatly into isolated contributions from classical solutions. Our model fills this gap, providing a gauge theory on the fuzzy sphere whose exact solution is on a unified footing with that of gauge theory on the noncommutative torus, in the same way that all two-dimensional gauge theories admit universal solutions. 
This is even apparent from the strong coupling expansions of the two noncommutative gauge theories [33,34], which exhibit the same degrees of complexity. However, the precise implementation of the nonabelian localization principle is rather different in the two cases. In the case of the torus, one starts from a rational noncommutative gauge theory and exploits Morita equivalence with commutative gauge theory to extract the exact instanton expansion, and then uses continuity arguments to extend the expansion to generic values of the noncommutativity parameter. On the fuzzy sphere, Morita equivalence is not available in this manner, and we will have to evaluate the quantum fluctuation integrals required in the semiclassical expansion explicitly. This entails a significantly larger amount of analysis and work than in the case of the torus. Secondly, our formulation of gauge theory on the fuzzy sphere provides a new finitedimensional model which can be solved explicitly by nonabelian localization techniques. In particular, we draw heavily on techniques developed recently in [27] to analyse higher critical points in ordinary two-dimensional Yang-Mills theory. In our case, the analysis is intrinsically finite-dimensional and in accord with rigorous results established in [24,26]. The techniques we exploit in this paper involve a beautiful mix of methods from random matrix theory and (both abelian and nonabelian) localization. In particular, we will throughout compare with some analogous results obtained directly from random matrix theory in [9]. Our approach thereby extends the toolkit of methods which can be generally used to treat gauge theories on fuzzy spaces. The outline of this paper is as follows. In Sect. 2 we introduce our new symplectic model for gauge theory on the fuzzy sphere, showing that it reduces to pure Yang-Mills theory on the classical sphere in the large N limit. 
We also describe in detail the standard construction of the symplectic structure on the coadjoint orbit space of gauge fields. In Sect. 3 we classify all classical solutions of the gauge theory, finding fuzzy versions of the usual instantons and monopoles as well as hosts of purely noncommutative solutions


such as fluxons [35]. We then give a detailed description of the local geometry of the configuration space near each Yang-Mills critical point. In Sect. 4 we review some general aspects of nonabelian localization, and apply it to compute precisely the contributions to the path integral from the vacuum and also higher unstable critical points, showing in each case that the standard instanton contributions on the sphere are recovered at $N \to \infty$. In Sects. 5, 6, and 7 we give an alternative description of the exact path integral in terms of abelian localization, which exploits the fact that the configuration space is a hermitian symmetric space to express the gauge field degrees of freedom in a suitable system of coordinates [36]. These coordinates have been previously used to evaluate integrals arising in random matrix theory in [37,38]. Finally, in Sect. 8 we compare the abelian and nonabelian localization approaches, indicating how to map between the Yang-Mills critical points and those of the abelianized localization. This is similar to the abelianized localization at higher critical points of ordinary Yang-Mills theory studied in [28], although in the fuzzy case the mapping is not one-to-one and is thus far more intricate.

2. Symplectic Model for Yang-Mills Theory on the Fuzzy Sphere

In this section we will introduce our new symplectic model for gauge theory on the fuzzy sphere. A similar formulation was given for gauge theory on fuzzy $\mathbb{CP}^2$ in [39]. This formulation will be particularly suitable for the approach that we take later on to computing the path integral using localization techniques.

2.1. The fuzzy sphere. Let $N \in \mathbb{N}$, and let $\xi_i$, $i = 1, 2, 3$ be the $N \times N$ hermitian coordinate generators of the fuzzy sphere $S_N^2 \cong \mathrm{Mat}_N$ which satisfy the relations
\[ \epsilon^{ij}{}_{k}\, \xi_i\, \xi_j = \mathrm{i}\, \xi_k \quad\text{and}\quad \xi_i\, \xi^i = \tfrac14 \big(N^2 - 1\big)\, \mathbb{1}_N, \tag{2.1} \]

where throughout repeated upper and lower indices are implicitly summed over. The deformation parameter is $\frac1N$ and $S_N^2$ becomes the algebra of functions on the classical unit sphere $S^2$ in the limit $N \to \infty$. The quantum space $S_N^2$ preserves the classical invariance under global rotations as follows. The $\xi_i$ generate an $N$-dimensional representation of the global $SU(2)$ isometry group. Under the adjoint action of $SU(2)$, this representation decomposes covariantly into $p$-dimensional irreducible representations $(p)$ of $SU(2)$ as
\[ \mathrm{Mat}_N \cong (1) \oplus (3) \oplus \cdots \oplus (2N-1), \tag{2.2} \]
which are interpreted as fuzzy spherical harmonics. This decomposition defines a natural map from $S_N^2$ to the space of functions on the commutative sphere. The integral of a function $f \in S_N^2$ over the fuzzy sphere is given by the trace of $f$, which coincides with the usual integral on $S^2$,
\[ \mathrm{Tr}(f) = \frac{N}{4\pi} \int_{S^2} d\Omega\; f, \tag{2.3} \]
where the above map is understood. Rotational invariance of the integral then corresponds to invariance of the matrix trace under the adjoint action of $SU(2)$.


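Following [9], let us combine the generators $\xi_i$ into a larger hermitian $\mathcal{N} \times \mathcal{N}$ matrix
\[ \Xi = \tfrac12\, \mathbb{1}_N \otimes \sigma^0 + \xi_i \otimes \sigma^i, \tag{2.4} \]
where $\mathcal{N} = 2N$, $\sigma^0 = \mathbb{1}_2$, while
\[ \sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^2 = \begin{pmatrix} 0 & \mathrm{i} \\ -\mathrm{i} & 0 \end{pmatrix} \qquad\text{and}\qquad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{2.5} \]
are the Pauli spin matrices obeying $\mathrm{Tr}\, \sigma^i = 0$ and
\[ \sigma^i\, \sigma^j = \delta^{ij}\, \mathbb{1}_2 + \mathrm{i}\, \epsilon^{ij}{}_{k}\, \sigma^k. \tag{2.6} \]
One easily finds from (2.1) and (2.6) the identities
\[ \Xi^2 = \frac{N^2}{4}\, \mathbb{1}_{\mathcal{N}} \quad\text{and}\quad \mathrm{Tr}(\Xi) = N. \tag{2.7} \]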
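The identities (2.7) follow in a few lines. The sketch below writes $\Xi$ for the matrix defined in (2.4) and uses only the relations (2.1), the product rule (2.6) and $\mathrm{Tr}\,\sigma^i = 0$; the intermediate grouping of terms is our own illustration rather than a step taken from the text.

```latex
\begin{aligned}
\Xi^2 &= \tfrac14\,\mathbb{1}_N\otimes\sigma^0 + \xi_i\otimes\sigma^i
         + \xi_i\,\xi_j\otimes\sigma^i\sigma^j \\
      &= \tfrac14\,\mathbb{1}_N\otimes\sigma^0 + \xi_k\otimes\sigma^k
         + \xi_i\,\xi^i\otimes\mathbb{1}_2
         + \mathrm{i}\,\epsilon^{ij}{}_{k}\,\xi_i\,\xi_j\otimes\sigma^k \\
      &= \tfrac14\,\mathbb{1}_{\mathcal N} + \xi_k\otimes\sigma^k
         + \tfrac14\big(N^2-1\big)\,\mathbb{1}_{\mathcal N} - \xi_k\otimes\sigma^k
       = \frac{N^2}{4}\,\mathbb{1}_{\mathcal N}, \\[4pt]
\operatorname{Tr}(\Xi) &= \tfrac12\,\operatorname{Tr}(\mathbb{1}_N)\operatorname{Tr}(\sigma^0)
         + \operatorname{Tr}(\xi_i)\operatorname{Tr}(\sigma^i)
       = \tfrac12\cdot N\cdot 2 + 0 = N .
\end{aligned}
```

In the third line the first relation in (2.1) gives $\mathrm{i}\,\epsilon^{ij}{}_{k}\,\xi_i\xi_j = \mathrm{i}\cdot\mathrm{i}\,\xi_k = -\xi_k$, which cancels the term linear in $\xi$.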

Since $\xi_i \otimes \sigma^i$ is an intertwiner of the Clebsch-Gordan decomposition $(N) \otimes (2) = (N-1) \oplus (N+1)$, this implies that $\Xi$ has eigenvalues $\pm\frac{N}{2}$ with respective multiplicities $N_\pm = N \pm 1$.

2.2. Configuration space of gauge fields. We will now describe the gauge field degrees of freedom in our formulation. To elucidate the construction in as transparent a way as possible, we begin with the abelian case of $U(1)$ gauge theory. To introduce $u(N)$ gauge fields $A_i$ on $S_N^2$, consider the covariant coordinates [40]
\[ C_i = \xi_i + A_i \quad\text{and}\quad C_0 = \tfrac12\, \mathbb{1}_N + A_0, \tag{2.8} \]
which transform under the gauge group $G = U(N)$ as $C_\mu \to U^{-1} C_\mu U$ for $\mu = 0, 1, 2, 3$ and $U \in U(N)$. We can again assemble them into a larger $\mathcal{N} \times \mathcal{N}$ matrix
\[ C = C_\mu \otimes \sigma^\mu. \tag{2.9} \]

Generically, these would consist of four independent fields, and we have to somehow reduce them to two tangential fields on $S_N^2$. There are several ways to do this. For example, one can impose the constraints $A_0 = 0$ and $C_i C^i = \frac{N^2-1}{4}\, \mathbb{1}_N$ as in [9], leading to a constrained hermitian multi-matrix model describing quantum gauge theory on the fuzzy sphere which recovers Yang-Mills theory on the classical sphere in the large $N$ limit. Here we will use a different approach and impose the constraints
\[ C^2 = \frac{N^2}{4}\, \mathbb{1}_{\mathcal{N}} \quad\text{and}\quad \mathrm{Tr}(C) = N \tag{2.10} \]
which is equivalent to requiring that $C$ has eigenvalues $\pm\frac{N}{2}$ with multiplicities $N_\pm = N \pm 1$. In terms of the components of (2.9), this amounts to the constraints
\[ C_i C^i + C_0^2 = \frac{N^2}{4}\, \mathbb{1}_N \quad\text{and}\quad \mathrm{i}\, \epsilon_i{}^{jk}\, C_j C_k + \{C_0, C_i\} = 0. \tag{2.11} \]
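The passage from (2.10) to (2.11) can be made explicit by expanding $C^2$ in the basis $\sigma^0, \sigma^k$. The following is a short worked sketch using only (2.6) and (2.9); the grouping of terms is ours.

```latex
\begin{aligned}
C^2 &= C_0^2\otimes\sigma^0 + \{C_0, C_i\}\otimes\sigma^i + C_i\,C_j\otimes\sigma^i\sigma^j \\
    &= \big(C_0^2 + C_i\,C^i\big)\otimes\sigma^0
       + \big(\{C_0, C_k\} + \mathrm{i}\,\epsilon^{ij}{}_{k}\,C_i\,C_j\big)\otimes\sigma^k , \\[4pt]
\operatorname{Tr}(C) &= 2\operatorname{Tr}(C_0) = N + 2\operatorname{Tr}(A_0).
\end{aligned}
```

Since $\sigma^0$ and the $\sigma^k$ are linearly independent, $C^2 = \frac{N^2}{4}\mathbb{1}_{\mathcal N}$ holds exactly when both brackets take the values stated in (2.11), and the trace condition in (2.10) then amounts to $\operatorname{Tr}(A_0) = 0$.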

We checked in Sect. 2.1 above that this is satisfied for $A_\mu = 0$, wherein $C = \Xi$. We can then consider the action of the unitary group $U(2N)$ given by
\[ C \longrightarrow U^{-1}\, C\, U \tag{2.12} \]


which generates a coadjoint orbit of $U(2N)$ and preserves the constraint (2.10). The gauge fields $A_\mu$ are in this way interpreted as fluctuations about the coordinates of the quantum space $S_N^2$. The constraint (2.10) ensures that the covariant coordinates (2.9) describe a dynamical fuzzy sphere. The gauge group $G = U(N)$ and the global isometry group $SU(2)$ of the sphere are subgroups of the larger symmetry group $U(2N)$. In particular, the generators of the gauge group are given by elements of the form $\phi = \phi_0 \otimes \sigma^0 \in \mathfrak{g} := u(N) \subset u(\mathcal{N})$, which defines the gauge algebra $\mathfrak{g}$. We thus claim that a possible configuration space of gauge fields is given by the single coadjoint orbit
\[ \mathcal{O} := \mathcal{O}(\Xi) = \big\{ C = U^{-1}\, \Xi\, U \;\big|\; U \in U(\mathcal{N}) \big\}, \tag{2.13} \]
where $\Xi \in u(2N)$ is given by (2.4). Explicitly, dividing by the stabilizer of $\Xi$ gives a representation of the orbit (2.13) as the symmetric space $\mathcal{O} \cong U(2N)/U(N+1) \times U(N-1)$ of dimension $\dim(\mathcal{O}) = 2(N^2 - 1)$. A similar construction was given in [39] for the case of $\mathbb{CP}^2$, and applied to $S_N^2$ in a different way in [11]. To justify this claim, we must check that the orbit $\mathcal{O}$ captures the correct number of degrees of freedom at least in the commutative limit $N \to \infty$, i.e. that the gauge fields $A_i$ are essentially tangent vector fields on $S_N^2$. The tangent space to $\mathcal{O}(\Xi)$ at a point $C$ is isomorphic to $T_C\mathcal{O} \cong u(\mathcal{N})/\mathfrak{r}$, where $\mathfrak{r} = u(N_+) \times u(N_-)$ is the stabilizer subalgebra of $\Xi$. This identification is equivariant with respect to the natural adjoint action of the Lie group $U(\mathcal{N})$. Explicitly, tangent vectors to $\mathcal{O}(\Xi)$ at $C$ have the form$^1$
\[ V_\phi = \mathrm{i}\, [C, \phi] \tag{2.14} \]

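for any hermitian element $\phi \in u(\mathcal{N})/\mathfrak{r}$,$^2$ which are just the generators of the unitary group $U(\mathcal{N})$ acting on $\mathcal{O}(\Xi)$ by the adjoint action. These actually describe vector fields on the entire orbit space $\mathcal{O}(\Xi)$. Here and in the following we use the symbol $C$ to denote both elements of $\mathcal{O}(\Xi)$, as well as the matrix of overcomplete coordinate functions on $\mathcal{O}(\Xi)$ defined using the embeddings $\mathcal{O}(\Xi) \hookrightarrow u(\mathcal{N}) \hookrightarrow \mathbb{C}^{\mathcal{N}^2}$.

The map J. Following [39], we can make the description of the tangent space to $\mathcal{O}$, spanned by the vectors $V_\phi$, more explicit as follows. Consider for $C \in \mathcal{O}$ the map
\[ J : u(\mathcal{N}) \longrightarrow su(\mathcal{N}) \tag{2.15} \]
defined by
\[ J(\phi) = \frac{1}{N}\, V_\phi = \frac{\mathrm{i}}{N}\, [C, \phi]. \tag{2.16} \]
Using (2.10) one finds that it satisfies
\[ J^3 = -J \tag{2.17} \]
and hence amounts to suitable projectors. Moreover, the map $J$ is an antihermitian operator with respect to the invariant Cartan-Killing inner product $\mathrm{Tr}(\phi\, \psi)$ on $u(\mathcal{N})$, since
\[ \mathrm{Tr}\big(\phi\, J(\psi)\big) = \frac{\mathrm{i}}{N}\, \mathrm{Tr}\big(\phi\, [C, \psi]\big) = -\frac{\mathrm{i}}{N}\, \mathrm{Tr}\big([C, \phi]\, \psi\big) = -\mathrm{Tr}\big(J(\phi)\, \psi\big). \tag{2.18} \]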
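The algebraic identities above lend themselves to a quick numerical sanity check. The following is a minimal sketch in plain Python (no external libraries), not part of the original construction: it assumes the standard spin-$(N-1)/2$ generators with $[\xi_1, \xi_2] = \mathrm{i}\xi_3$ together with the standard Pauli matrices (flipping the sign of both $\xi_2$ and $\sigma^2$ leaves $\xi_i \otimes \sigma^i$, and hence the matrix in (2.4), unchanged), and it samples a single point of the orbit by conjugating with a diagonal unitary. All function names (`spin_generators`, `big_xi`, `hermitian_sample`, `max_violation`) are hypothetical helpers introduced only for this illustration.

```python
import cmath
import math

def matmul(a, b):
    """Matrix product; matrices are lists of rows of complex numbers."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lincomb(ca, a, cb, b):
    """Entrywise ca * a + cb * b."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker product a (x) b."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def scaled_identity(n, c):
    return [[complex(c) if r == s else 0j for s in range(n)] for r in range(n)]

def trace(a):
    return sum(a[r][r] for r in range(len(a)))

def dist(a, b):
    """Largest entrywise deviation between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def comm(a, b):
    return lincomb(1, matmul(a, b), -1, matmul(b, a))

def spin_generators(n):
    """Spin-(n-1)/2 matrices with [xi_1, xi_2] = i xi_3 (assumed convention)."""
    j = (n - 1) / 2
    up = [[0j] * n for _ in range(n)]           # raising operator
    for k in range(1, n):
        m = j - k                               # weight of basis vector k
        up[k - 1][k] = complex(math.sqrt(j * (j + 1) - m * (m + 1)))
    down = [list(col) for col in zip(*up)]      # lowering operator (real transpose)
    xi1 = lincomb(0.5, up, 0.5, down)
    xi2 = lincomb(-0.5j, up, 0.5j, down)
    xi3 = [[complex(j - r) if r == s else 0j for s in range(n)] for r in range(n)]
    return [xi1, xi2, xi3]

PAULI = [
    [[0j, 1 + 0j], [1 + 0j, 0j]],
    [[0j, -1j], [1j, 0j]],
    [[1 + 0j, 0j], [0j, -1 + 0j]],
]

def big_xi(n):
    """The 2n x 2n matrix (1/2) 1 (x) sigma^0 + xi_i (x) sigma^i of (2.4)."""
    out = scaled_identity(2 * n, 0.5)
    for x, s in zip(spin_generators(n), PAULI):
        out = lincomb(1, out, 1, kron(x, s))
    return out

def hermitian_sample(size, seed):
    """Deterministic hermitian test matrix (symmetric real, antisymmetric imaginary part)."""
    return [[math.cos(seed * (r + s + 1)) + 1j * math.sin(seed * (r - s))
             for s in range(size)] for r in range(size)]

def J(c, phi, n):
    """J(phi) = (i/N) [C, phi], cf. (2.16)."""
    return lincomb(1j / n, matmul(c, phi), -1j / n, matmul(phi, c))

def max_violation(n):
    """Largest numerical violation of (2.1), (2.7)/(2.10), (2.17) and (2.18)."""
    xi = spin_generators(n)
    casimir = scaled_identity(n, 0)
    for x in xi:
        casimir = lincomb(1, casimir, 1, matmul(x, x))
    errs = [dist(casimir, scaled_identity(n, (n * n - 1) / 4)),
            dist(comm(xi[0], xi[1]), lincomb(1j, xi[2], 0, xi[2]))]
    size, big = 2 * n, big_xi(n)
    u = [cmath.exp(0.3j * k) for k in range(size)]          # diagonal unitary U
    c = [[big[r][s] * u[s] / u[r] for s in range(size)] for r in range(size)]  # U^{-1} Xi U
    errs.append(dist(matmul(c, c), scaled_identity(size, n * n / 4)))
    errs.append(abs(trace(c) - n))
    phi, psi = hermitian_sample(size, 0.7), hermitian_sample(size, 1.3)
    j1 = J(c, phi, n)
    errs.append(dist(J(c, J(c, j1, n), n), lincomb(-1, j1, 0, j1)))   # J^3 = -J
    errs.append(abs(trace(matmul(phi, J(c, psi, n))) + trace(matmul(J(c, phi, n), psi))))
    return max(errs)

if __name__ == "__main__":
    for n in (2, 3, 5):
        print(n, max_violation(n))
```

Running the script prints, for each sampled $N$, the largest deviation found, which should sit at the level of floating-point rounding. A diagonal $U$ is used only because it keeps the example dependency-free; it probes a single direction of the orbit, so this is a consistency check and not a test of the whole configuration space.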

1 To streamline notation, we will not write explicitly the local dependences of fields and operators defined at points C ∈ O. 2 With our conventions, the vector fields (2.14) are real.


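The map $J$ will play an instrumental role in this paper and its geometrical properties will be studied in more detail in the next section. Here we simply note the meaning of $J$ in the commutative limit $N \to \infty$. In component form with $\phi = \phi_\mu \otimes \sigma^\mu$, it acts as$^3$
\[ J(\phi) \approx -\frac{\mathrm{i}}{N}\, \big[ \phi_\mu \otimes \sigma^\mu,\; C_j \otimes \sigma^j \big] \approx -\frac{\mathrm{i}}{N}\, [\phi_\mu, C_j] \otimes \sigma^\mu \sigma^j + \frac{\mathrm{i}}{N}\, \phi_\mu\, C_j \otimes [\sigma^\mu, \sigma^j], \tag{2.19} \]
where we have set $C_0 \approx \frac12\, \mathbb{1}_N$ in the large $N$ limit as will be justified below. Thus at large $N$ this reduces to
\[ J(\phi) \approx O\big(\tfrac1N\big) - \epsilon^{ij}{}_{k}\, \phi_i\, x_j \otimes \sigma^k \tag{2.20} \]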

for “almost” commutative functions describing the gauge field fluctuations $A_\mu$. Here $\xi_i \approx \frac{N}{2}\, x_i$ define homogeneous coordinates $x_i$ on the sphere. This result means that if we interpret $\phi_i$ as a three-component vector field on the fuzzy sphere, including radial components, then the operator $J$ vanishes on the normal component and essentially coincides with the complex structure for tangential fields on the Kähler manifold $S^2$. In particular, the image of $J$, i.e. the space of tangent vectors (2.14) to $\mathcal{O}(\Xi)$ or small variations of the gauge field, indeed admits two independent field degrees of freedom. This implies that the orbit (2.13) describes two tangent vector fields on $S_N^2$. Hence the tangent space to $\mathcal{O}$ can be interpreted precisely as the space of tangent vector fields on the fuzzy sphere. This nicely reflects the affine nature of the space of gauge fields.

Nonabelian gauge theory. The generalization to nonabelian $U(n)$ gauge theory is very simple. One now takes
\[ \mathcal{N} = 2nN \tag{2.21} \]
and enlarges the matrix (2.4) to $\Xi \otimes \mathbb{1}_n$ (which we continue to denote as $\Xi$ for ease of notation). The configuration space is given by the $U(\mathcal{N})$ orbit (2.13) with
\[ C^2 = \frac{N^2}{4}\, \mathbb{1}_{\mathcal{N}} \quad\text{and}\quad \mathrm{Tr}(C) = nN. \tag{2.22} \]
Then $C$ has eigenvalues $\pm\frac{N}{2}$ of respective multiplicities $n(N \pm 1)$. The configuration space
\[ \mathcal{O} = U(2nN)/U(nN_+) \times U(nN_-) \tag{2.23} \]
describes $u(n)$-valued gauge fields on $S_N^2$. Its dimension is given by
\[ \dim(\mathcal{O}) = 2n^2 \big(N^2 - 1\big). \tag{2.24} \]
The gauge group is now given by $G = U(nN)$, and acts on the covariant coordinates $C_i = \xi_i \otimes \mathbb{1}_n + A_i$, $C_0 = \frac12\, \mathbb{1}_{nN} + A_0$ as $C_\mu \to U^{-1} C_\mu U$. This leads to the expected transformation law for the $u(n)$-valued gauge fields $A_i$. The corresponding gauge algebra is now $\mathfrak{g} := u(nN) \subset u(\mathcal{N})$, consisting of elements of the form $\phi = \phi_0 \otimes \sigma^0 \in \mathfrak{g}$.

3 Throughout, the notation ≈ will always mean an equality which is valid in the large N commutative limit.


2.3. The Yang-Mills action. Consider the action
\[ S = S(C) := \frac{N}{g}\, \mathrm{Tr}\big( C_0 - \tfrac12\, \mathbb{1}_{nN} \big)^2 \tag{2.25} \]

for $C \in \mathcal{O}$, which is invariant under the group of gauge transformations $G$ as well as global $SU(2)$ rotations. We claim that it reduces in the commutative limit $N \to \infty$ to the usual Yang-Mills action on the sphere $S^2$, and can therefore be taken as a definition of the Yang-Mills action on the fuzzy sphere $S_N^2$. We establish this explicitly below in the abelian case $n = 1$, the extension to general $n$ being obvious. Consider the three-component field strength [9]
\[ F_i := \mathrm{i}\, \epsilon_i{}^{jk}\, C_j C_k + C_i = \mathrm{i}\, \epsilon_i{}^{jk}\, [\xi_j, A_k] + \mathrm{i}\, \epsilon_i{}^{jk}\, A_j A_k + A_i, \tag{2.26} \]
where $C_i = \xi_i + A_i$ as in (2.8). To understand its significance, consider the “north pole” of $S_N^2$, where $\xi_3 \approx \frac{N}{2}\, x_3 = \frac{N}{2}\, \mathbb{1}_N$ (with unit radius), and one can replace the operators
\[ \mathrm{i}\, \mathrm{ad}_{\xi_i} \longrightarrow -\varepsilon_i{}^{j}\, \partial_j := -\varepsilon_i{}^{j}\, \frac{\partial}{\partial x^j} \tag{2.27} \]
in the commutative limit for $i, j = 1, 2$. Hence upon identifying the commutative gauge fields $A_i^{\rm cl}$ through
\[ A_i^{\rm cl} = -\varepsilon_i{}^{j}\, A_j, \tag{2.28} \]
the “radial” component $F_3$ of the field strength (2.26) reduces in the commutative limit to the standard expression
\[ F_3 \approx \partial_1 A_2^{\rm cl} - \partial_2 A_1^{\rm cl} + \mathrm{i}\, \big[ A_1^{\rm cl}, A_2^{\rm cl} \big]. \tag{2.29} \]
The constraint (2.11) now implies
\[ F_i + \big\{ C_0 - \tfrac12\, \mathbb{1}_N,\, C_i \big\} = F_i + \{A_0, C_i\} = 0, \qquad \{\xi_i, A^i\} + A_0 + A_i A^i + A_0 A_0 = 0. \tag{2.30} \]
Since only configurations with $A_0 = O(\frac1N)$ have finite action (2.25) and $\xi_3$ is of order $N$, this implies that $A_3$, $F_1$ and $F_2$ are of order $\frac1N$ at the north pole, while $A_1$ and $A_2$ can be finite of order 1. In particular, only the radial component $F_3$ survives the $N \to \infty$ limit, with
\[ F_3 = -\{A_0, C_3\} \approx -N A_0. \tag{2.31} \]
This analysis can be made global by considering the “radial” field strength $F_r = x^i F_i$, which reduces to the usual field strength scalar on $S^2$. The action (2.25) thus indeed reduces to the usual Yang-Mills action in the commutative limit with dimensionless gauge coupling $g$, giving
\[ S \approx \frac{1}{N g}\, \mathrm{Tr}\, (F_r)^2 \approx \frac{1}{4\pi g} \int_{S^2} d\Omega\; (F_r)^2. \tag{2.32} \]

Localization for Yang-Mills Theory on the Fuzzy Sphere


2.4. Symplectic geometry of the configuration space. The standard Kirillov-Kostant construction makes the orbit space (2.13) into a symplectic manifold [41]. Given two tangent vector fields $V_\phi$, $V_\psi$ as above with $\phi, \psi \in u(\mathcal{N})$, the symplectic two-form $\omega \in \Omega^2(\mathcal{O})$ is defined locally through its pairing with the bivector $V_\phi \wedge V_\psi$ as

$$\langle \omega, V_\phi \wedge V_\psi \rangle = i\, \mathrm{Tr}\, (C\, [\phi, \psi]). \tag{2.33}$$

Using trace manipulations it is easy to see that the kernel of this pairing coincides with the stabilizer algebra $\mathfrak{r}$, and hence it is nondegenerate on $\mathcal{O}$. We will derive below an explicit form of $\omega$ (2.47), which allows one to verify directly the well-known fact that $\omega$ is closed,

$$d\omega = 0. \tag{2.34}$$

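Thus $\omega$ indeed defines an invariant symplectic structure on $\mathcal{O}$. The tangent vectors $V_\phi$ are hamiltonian vector fields, and we claim that their generator is given by

$$H_\phi = \mathrm{Tr}\, (\phi\, C) \tag{2.35}$$

for $\phi \in u(\mathcal{N})$. Indeed, then $dH_\phi = \mathrm{Tr}\, (\phi\, dC)$, and by using the dual evaluation

$$\langle dC, V_\phi \rangle = i\, [C, \phi], \tag{2.36}$$

one has

$$\langle dH_\phi, V_\psi \rangle = i\, \mathrm{Tr}\, (\phi\, [C, \psi]) = -i\, \mathrm{Tr}\, (C\, [\phi, \psi]) = -\langle \omega, V_\phi \wedge V_\psi \rangle = -\langle \iota_{V_\phi} \omega, V_\psi \rangle, \tag{2.37}$$

where $\iota_{V_\phi}$ denotes contraction with the vector field $V_\phi$. Thus

$$dH_\phi = -\iota_{V_\phi} \omega \tag{2.38}$$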

as claimed. This means that the hamiltonian function (2.35) defines a periodic flow generated by the action of a one-parameter subgroup $C \mapsto e^{i t \phi}\, C\, e^{-i t \phi}$, $t \in \mathbb{R}$. The corresponding equivariant moment map $\mu : \mathcal{O} \to u(\mathcal{N})^\vee$ is the inclusion map which has the pairings

$$\langle \mu(C), \phi \rangle = H_\phi, \tag{2.39}$$

and it defines a representation of the Lie algebra $u(\mathcal{N})$ through the Poisson algebra corresponding to $\omega$. For gauge transformations $\phi = \phi_0 \otimes \sigma^0$, the moment map $\mu$ reduces to

$$\langle \mu(C), \phi \rangle = 2\, \mathrm{Tr}\, (\phi_0\, C_0) = \mathrm{Tr}\, \big(\phi_0\, (\mathbb{1}_{nN} + 2 A_0)\big). \tag{2.40}$$

In the commutative limit and for abelian gauge fields $n = 1$, this becomes

$$\langle \mu(C), \phi \rangle \approx \mathrm{Tr}\, (\phi_0) - \frac{2}{N}\, \mathrm{Tr}\, (\phi_0\, F_r) \approx -\frac{1}{2\pi} \int_{S^2} d\Omega\; \phi_0\, F_r \tag{2.41}$$

up to an irrelevant shift, which is just the anticipated moment map for Yang-Mills theory on the classical sphere [18]. Given the appropriate symplectic structure and moment


H. Steinacker, R. J. Szabo

map on the gauge field configuration space $\mathcal{O}$, the nonabelian localization principle for two-dimensional Yang-Mills theory can be applied for the action constructed as the square of the moment map. This is precisely the Yang-Mills action on $S_N^2$ given in (2.25). The constant term $\frac12\, \mathbb{1}_{nN}$ is just the first Chern number of a background gauge field configuration and is of no significance for this discussion. This procedure will be worked out in detail in Sect. 4.

More about the symplectic form. For later use, we will now derive some properties of the symplectic form introduced in (2.33). Consider the $i\, u(\mathcal{N})$-valued one-form on $\mathcal{O}$ given by

$$\theta := C^{-1}\, dC. \tag{2.42}$$

Given the constraints (2.10) and using $dC^2 = 0$, this can be rewritten as

$$\theta = \frac{4}{N^2}\, C\, dC = \frac{2}{N^2}\, [C, dC]. \tag{2.43}$$

It obeys the constraints

$$d\theta + \theta^2 = 0 \qquad \text{and} \qquad \mathrm{Tr}\, (\theta) = 0. \tag{2.44}$$

Thus $\theta \in \Omega^1(\mathcal{O}, i\, u(\mathcal{N}))$ is essentially the canonical invariant Maurer-Cartan one-form, with the additional property

$$[C, \theta] = -2\, J^2(dC) = 2\, dC, \tag{2.45}$$

where we have used the fact that $dC$ is tangent to the orbit space and applied the projection property (2.17). In particular, along with the fact that $C^2$ is constant, this implies that

$$C\, \theta + \theta\, C = 0. \tag{2.46}$$
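The identities (2.43), (2.45) and (2.46) all follow from one algebraic fact, and it may help to see them derived together. The sketch below assumes that the constraint (2.10) reads $C^2 = \frac{N^2}{4}\, \mathbb{1}$, which is what the expression $C^{-1} = \frac{4}{N^2}\, C$ implicit in (2.43) requires.

```latex
% Assumption: constraint (2.10) reads C^2 = (N^2/4) 1, as implied by C^{-1} = (4/N^2) C in (2.43).
\begin{align*}
C^2 = \tfrac{N^2}{4}\,\mathbb{1}
  &\;\Longrightarrow\; C^{-1} = \tfrac{4}{N^2}\,C, \qquad
  d(C^2) = C\,dC + dC\,C = 0, \\
\theta = C^{-1}\,dC &= \tfrac{4}{N^2}\,C\,dC
  = \tfrac{2}{N^2}\,\big(C\,dC - dC\,C\big) = \tfrac{2}{N^2}\,[C, dC], \\
C\,\theta &= \tfrac{4}{N^2}\,C^2\,dC = dC, \qquad
\theta\,C = \tfrac{4}{N^2}\,C\,dC\,C = -\tfrac{4}{N^2}\,C^2\,dC = -dC, \\
[C, \theta] &= C\,\theta - \theta\,C = 2\,dC, \qquad
C\,\theta + \theta\,C = 0.
\end{align*}
```

The only input beyond the constraint is that $dC$ anticommutes with $C$, which is the differentiated form of the same constraint.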

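Using again the constraint (2.10), the symplectic two-form (2.33) can be written as

$$\omega = -\frac{i}{2N^2}\, \mathrm{Tr}\, (C\, [dC, dC]) = \frac{i}{4}\, \mathrm{Tr}\, \big(C\, \theta^2\big). \tag{2.47}$$

To see this, we substitute this expression using (2.18) and (2.17) into

$$\begin{aligned} \langle \omega, V_\phi \wedge V_\psi \rangle &= -\frac{i}{N^2}\, \mathrm{Tr}\, \big(C\, [\,[C, \phi], [C, \psi]\,]\big) = i\, \mathrm{Tr}\, \big(C\, [J(\phi), J(\psi)]\big) \\ &= i\, \mathrm{Tr}\, \big([C, J(\phi)]\, J(\psi)\big) = -N\, \mathrm{Tr}\, \big(J^3(\phi)\, \psi\big) \\ &= N\, \mathrm{Tr}\, \big(J(\phi)\, \psi\big) = i\, \mathrm{Tr}\, \big([C, \phi]\, \psi\big) = i\, \mathrm{Tr}\, \big(C\, [\phi, \psi]\big) \end{aligned} \tag{2.48}$$

for any $\phi, \psi \in u(\mathcal{N})$, which coincides with the definition (2.33). Using (2.45) and (2.46), this identity gives a simple proof of the closure property (2.34) as

$$d\omega = \frac{i}{4}\, \mathrm{Tr}\, \big(dC\, \theta^2\big) = -\frac{i}{8}\, \mathrm{Tr}\, \big([\theta, C]\, \theta^2\big) = 0. \tag{2.49}$$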


3. The Classical Configuration Space

In this section we will investigate in detail the space of classical solutions of $U(n)$ gauge theory on the fuzzy sphere $S_N^2$ defined by the action (2.25). Understanding this space will be crucial for the exact solution of the quantum gauge theory, which as we will see in the next section is given exactly by its semiclassical expansion. We will first classify the solutions to the classical equations of motion, over which the partition function will be summed. Among these solutions we will find a variety of fluxons and, as in the case of gauge theory on the noncommutative torus, only a very small subset of all two-dimensional noncommutative instantons on $S_N^2$ map into the usual instantons of Yang-Mills theory on $S^2$ in the commutative limit $N \to \infty$. We will then thoroughly describe the local symplectic geometry of the configuration space $\mathcal{O}$ near each critical point of the Yang-Mills action, as symplectic integrals over these neighbourhoods will produce the required quantum fluctuation determinants in the semiclassical expansion.

3.1. Classical solutions. The critical points of the Yang-Mills action (2.25) are easy to find. Since the most general variation of a gauge field $C \in \mathcal{O}$ is given by $\delta C = [C, \phi]$, by varying (2.25) one finds that the critical points satisfy

$$0 = \mathrm{Tr}\, \big(\delta C_0\, (C_0 - \tfrac12\, \mathbb{1}_{nN})\big) = \mathrm{Tr}\, \big([C, \phi]\, C_0\big) = \mathrm{Tr}\, \big(\phi\, [C_0, C]\big) \tag{3.1}$$

for arbitrary $\phi \in u(\mathcal{N})/\mathfrak{r}$. They are therefore given by solutions of the equation $[C_0, C] = 0$, which agrees with the known saddle-points in the matrix model formulation of [9]. This equation is equivalent to

$$[C_0, C_i] = 0 \tag{3.2}$$

which together with (2.11) implies that

$$[C_i, C_j] = i\, \epsilon_{ijk}\, (2 C_0)\, C_k, \qquad C_0^2 = \frac{N^2}{4}\, \mathbb{1}_{nN} - C_i C^i. \tag{3.3}$$

For solutions with $C_0 \neq 0$, we can use (3.2) to define

$$L_i = \frac{1}{2 C_0}\, C_i, \tag{3.4}$$

and rewrite (3.3) as

$$[L_i, L_j] = i\, \epsilon_{ijk}\, L_k, \qquad L_i L^i = \frac{N^2}{16\, C_0^2} - \frac14\, \mathbb{1}_{nN}. \tag{3.5}$$

These equations mean that the critical points of the Yang-Mills action correspond to (isomorphism classes of) $(nN) \times (nN)$ unitary representations of the isometry group $SU(2)$, i.e. homomorphisms $\pi_{nN} : SU(2) \to U(nN)$. Up to isomorphism, for each integer $p \ge 1$ there is a unique irreducible $SU(2)$ representation $(p)$ of dimension $p$. Therefore, there is a one-to-one correspondence between classical solutions and ordered partitions $(n_1, \dots, n_k)$ of the integer $nN = n_1 + \cdots + n_k$, with $n_i$ the dimension of the $i$th irreducible subrepresentation in the representation $\pi_{nN}$ characterizing the given critical


point. Each such classical solution breaks the $U(nN)$ gauge symmetry locally to the centralizer $\prod_i U(k_i)$ of the homomorphism $\pi_{nN}$, where $k_i$ denotes the multiplicity of the blocks. They can be seen [9] to give precisely the usual two-dimensional instantons for $U(n)$ Yang-Mills theory on $S^2$. These solutions also agree with those that can be interpreted as configurations of D0-branes inside D2-branes [14], although the ones which will survive the large $N$ limit are different. Therefore, each critical point is labelled (up to gauge equivalence) by the set of dimensions $n_i$ of the irreducible representations, supplemented with a sign $s_i$ which is defined by $s_i = \mathrm{sgn}(C_0(n_i)) = \pm 1$ (in that representation) when $C_0(n_i) \neq 0$ and $s_i = 0$ if $C_0(n_i) = 0$. We can thereby label the critical surfaces, i.e. the connected components of the moduli space of classical solutions in $\mathcal{O}$, as

$$\mathcal{C}_{(n_1, s_1), \dots, (n_k, s_k)} \qquad \text{with} \qquad n_i \in \mathbb{N} \ \text{ and } \ s_i \in \{\pm 1, 0\} \tag{3.6}$$

with the constraints

$$1 \le n_1 \le n_2 \le \cdots \le n_k, \qquad \sum_{i=1}^{k} n_i = nN \qquad \text{and} \qquad \sum_{i=1}^{k} s_i = n, \tag{3.7}$$

and $s_i = 0$ only if $n_i = 1$. Any non-trivial irreducible representation with $n_i > 1$ and $C_0 \neq 0$ gives a contribution $\pm N$ to the trace $\mathrm{Tr}\, (C)$, which must be balanced in order to satisfy the eigenvalue multiplicity constraint (2.22). This is the role of the condition $\sum_i s_i = n$ in (3.7). Note that one can change the sign of any individual irreducible representation. The meaning of the blocks $(n_i, s_i)$ can be described as follows:

• $s_a = \pm 1$: In this case $C_0 \neq 0$, and hence $C_0 > \frac12$ due to (3.5). These solutions come with two signs. Note that any irreducible representation with small dimension will be highly suppressed in the large $N$ limit. The most extreme case is a sum of trivial representations, with $n_a = 1$, for which

$$C_i = 0 \qquad \text{and} \qquad C_0(n_a = 1) = s_a\, \frac{N}{2}. \tag{3.8}$$

• $s_a = 0$: In this case $C_0 = 0$ and $n_a = 1$, which implies that $C_i = c_i$ with $c_i \in \mathbb{R}$ and $\frac{N^2}{4} = c_i c_i$. These solutions are also suppressed at large $N$ but less so than those with $C_i = 0$ above. They correspond to fluxons [35] whose positions on $S^2$ are determined by the vector $c_i$.

Note that each such saddle-point (or more generally any gauge field configuration $C$) defines a projective module over the fuzzy sphere algebra $S_N^2$, obtained by writing $C$ in $2n \times 2n$ block-matrix form. The module then corresponds to a projector $\Pi_{(n_1, s_1), \dots, (n_k, s_k)} \in \mathrm{Mat}_{2n}(S_N^2)$. Let us describe some of these critical points explicitly.

Ground state. The vacuum solution has $k = n$ and is given by the critical surface $\mathcal{C}_{(N,1), \dots, (N,1)}$, which implies that $C_0 = \frac12\, \mathbb{1}_{nN}$. It follows that $C_i C^i = \frac{N^2 - 1}{4}\, \mathbb{1}_N$, which is the quadratic Casimir invariant of the $N$-dimensional irreducible representation of $SU(2)$. Using a suitable $U(nN)$ gauge transformation, it can be written as

$$C_i = \xi_i \otimes \mathbb{1}_n, \tag{3.9}$$


and we recover the original coordinates of the fuzzy sphere $S_N^2$. This is equivalent to the vanishing curvature condition $F = 0$. In the abelian case $n = 1$, an application of Schur’s lemma shows that the only matrix which commutes with $C$ is the constant matrix and so the gauge group $U(N)$ acts freely on the moduli space of vacuum solutions, corresponding simply to a change of basis in this case. For $n > 1$ the solution is a direct sum of $n$ identical representations. This commutes with the action of $u(n)$, and so now the gauge group $U(nN)$ contains a non-trivial stabilizer. The moduli space of flat connections is therefore isomorphic to the smooth manifold $U(nN)/U(n)$ in the nonabelian case. Note that any configuration near the vacuum, with small but finite action, is given by a small deformation of an irreducible $SU(2)$ representation describing $S_N^2$, and in particular the gauge field fluctuations $A_\mu$ are “small”. It is in this sense that the quantum gauge theory will describe a fluctuating theory of noncommutative fuzzy sphere geometries.

Fluxons. At the other extreme, if $C_0$ has several zero eigenvalues, i.e. several fluxons, the situation is much more complicated. For example, when $C_0 = 0$ and $n = 1$ we obtain a fuzzy version of the moduli space of constant curvature connections in genus 0 provided by the critical surface

$$\mu^{-1}(C_0 = 0) = \Big\{ C_i \in u(N) \;\Big|\; C_i C^i = \tfrac{N^2}{4}\, \mathbb{1}_N, \ [C_i, C_j] = 0 \Big\} \tag{3.10}$$

along with the condition (2.22) on the multiplicities of the eigenvalues of $C_i \otimes \sigma^i$. The action of the $U(N)$ gauge group on (3.10) can be used to simultaneously diagonalize the three matrices $C_i$. The Marsden-Weinstein symplectic reduction of the orbit space $\mathcal{O}$ is then essentially a symmetric product orbifold of the classical sphere $S^2$ given by

$$\mathcal{M}_0 := \mu^{-1}(C_0 = 0) /\!/\, U(N) \cong \mathrm{Sym}^N\big(S^2\big), \tag{3.11}$$

where $\mathrm{Sym}^N(S^2) := (S^2)^N / S_N$ and the quotient by the Weyl group $S_N \subset U(N)$ is the residual gauge symmetry acting by permutations of the real eigenvalues of the hermitian matrices $C_i$ representing the positions of the fluxons on $S^2$, which are indistinguishable. The fluxon moduli space $\mathcal{M}_0$ contains orbifold singularities arising from the fixed points of the $S_N$-action on $(S^2)^N$, which occur whenever two or more fluxon locations coincide. This is analogous to the vacuum solution of two-dimensional $U(N)$ gauge theory on a noncommutative torus wherein the moduli space of constant curvature connections is the symmetric product orbifold $\mathrm{Sym}^N(T^2)$ [32], and there is a natural correspondence between two-dimensional noncommutative instantons and fluxons [42]. In the present case the $U(N)$ action on the fluxon configuration space (3.10) also has additional fixed points. Note that the restriction of the symplectic two-form (2.47) to the moduli space $\mathcal{M}_0$ is given by

$$\omega\big|_{\mathcal{M}_0} = -\frac{4 i}{N^2} \sum_{a=1}^{N} \epsilon^{ijk}\, c_i^a\, dc_j^a \wedge dc_k^a, \tag{3.12}$$

where $c_i^a \in \mathbb{R}$ are the eigenvalues of $C_i$ with $\sum_i (c_i^a)^2 = \frac{N^2}{4}$ for each $a = 1, \dots, N$. With the usual embedding of the two-sphere $S^2 \hookrightarrow \mathbb{R}^3$, this is just the standard round symplectic two-form on the Kähler manifold $(S^2)^N$. Each fluxon contributes a suppression factor $e^{-\frac{N}{4g}}$ due to (2.25).


Instantons on $S^2$. The configurations which will dominate the path integral in the large $N$ classical limit are the low-energy solutions with small actions. These are solutions with $n$ partitions and critical surfaces $\mathcal{C}_{(n_1,1), \dots, (n_n,1)}$ with $n_i \approx N$. They correspond to the usual instantons of $U(n)$ gauge theory on $S^2$ with vanishing $U(1)$ flux, as shown in [9]. These solutions may also contain additional fluxons, which behave like localized flux tubes which ensure that the total $U(1)$ flux vanishes. Their contributions are suppressed by factors of at least $e^{-\frac{N}{4g}}$, however they do contribute in the double scaling, quantum plane limit wherein $S_N^2$ becomes noncommutative $\mathbb{R}^2$ [43,44].

Monopoles. As shown in [9,13], an irreducible representation with $n_i = N - m_i$ corresponds to the gauge field of a monopole with magnetic charge $m_i \in \mathbb{Z}$. Configurations with non-trivial $U(1)$ monopole number can therefore be obtained by relaxing the constraint (2.22) and replacing it by

$$\mathrm{Tr}\, (C) = nN - c_1, \tag{3.13}$$

where $c_1 = \sum_i m_i \in \mathbb{Z}$ is the first Chern number. In order to maintain the constraint $C^2 = \frac{N^2}{4}\, \mathbb{1}_{\mathcal{N}}$, the matrix dimension (2.21) must then be replaced with $\mathcal{N} = 2(nN - c_1)$. Some of these nontrivial $U(1)$ bundles are realized within the original configuration space (2.23), in the presence of trivial blocks with $n_a = 1$, $s_a = \pm 1$. For example, in the abelian case $n = 1$ the solutions in $\mathcal{C}_{(N-2,1),(1,1),(1,-1)}$ are naturally interpreted as monopoles with charge $m = 2$. The blocks $(1, \pm 1)$ have vanishing field strength $F_i = 0$, and are naturally interpreted as Dirac strings. They are suppressed by factors of at least $e^{-N^3/g}$. Replacing the trivial blocks with fluxons leads to vanishing global $U(1)$ flux as discussed above.

3.2. The classical action. The values of the Yang-Mills action (2.25) on the classical solutions obtained in Sect. 3.1 above will determine the classical contributions to the path integral in the next section. The action at these critical points can be evaluated as follows. Note that for each $p$-dimensional irreducible representation $L_i$ of the isometry group $SU(2)$, one has $L_i L^i = \frac{p^2 - 1}{4}\, \mathbb{1}_p$ and hence from (3.5) it follows that

$$\frac{N^2}{p^2}\, \mathbb{1}_p = 4\, C_0(p)^2 \tag{3.14}$$

on that representation, so that $C_0(p) = \pm \frac{N}{2p}\, \mathbb{1}_p$. Consider the reduced Yang-Mills action

$$S' := \frac{N}{g}\, \mathrm{Tr}\, C_0^2 = S + \frac{N}{g}\, \mathrm{Tr}\, (C_0) - \frac{N}{4g}\, \mathrm{Tr}\, (\mathbb{1}_{nN}) = S + \frac{n N^2}{4g}, \tag{3.15}$$

which is somewhat easier to manipulate than $S$. For a dominant solution with critical surface $\mathcal{C}_{(n_1,1), \dots, (n_n,1)}$ and $n_i > 1$, the action $S'$ is given by

$$S'\big((n_1, 1), \dots, (n_n, 1)\big) = \frac{N}{g} \sum_{i=1}^{n} n_i \Big(\frac{N}{2 n_i}\Big)^2 = \frac{N^3}{4g} \sum_{i=1}^{n} \frac{1}{n_i}. \tag{3.16}$$
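The steps from the $SU(2)$ Casimir to (3.16) can be spelled out block by block. The sketch below uses only (3.3), (3.4) and the Casimir value $\frac{p^2 - 1}{4}$ quoted above; the symbol $\mathrm{Tr}_{(n_i)}$ for the trace restricted to a single irreducible block is notation introduced here for illustration.

```latex
% Sketch: block-by-block evaluation leading to (3.14)-(3.16).
% Tr_{(n_i)} denotes the trace over one irreducible block (notation introduced here).
\begin{align*}
L_i L^i &= \frac{C_i C^i}{4 C_0^2}
         = \frac{\tfrac{N^2}{4} - C_0^2}{4 C_0^2}
         = \frac{N^2}{16\,C_0^2} - \frac14
  && \text{from (3.3) and (3.4)} \\
\frac{p^2 - 1}{4} &= \frac{N^2}{16\,C_0(p)^2} - \frac14
  \;\Longrightarrow\; C_0(p)^2 = \frac{N^2}{4 p^2},
  \quad C_0(p) = \pm\frac{N}{2p}
  && \text{Casimir of } (p) \\
\mathrm{Tr}_{(n_i)}\, C_0^2 &= n_i \cdot \frac{N^2}{4 n_i^2} = \frac{N^2}{4 n_i}
  && \text{one block of dimension } n_i \\
S' &= \frac{N}{g} \sum_{i=1}^{n} \frac{N^2}{4 n_i}
    = \frac{N^3}{4g} \sum_{i=1}^{n} \frac{1}{n_i}
\end{align*}
```

As a check, the vacuum has $n_i = N$ for all $i$, so $S' = \frac{n N^2}{4g}$ and (3.15) gives $S = 0$, as expected for the global minimum.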


While possible fluxon blocks with $n_i = 1$ do not contribute at all to $S'$, they do contribute $\frac{N}{4g}$ to the original action $S$ (2.25). Their total contributions to $S$ are proportional to the fluxon charge, i.e. the total number of blocks with $n_i = 1$, and agree with the usual fluxon action [35] in the quantum plane limit of $S_N^2$ [43]. The dominant configurations in the classical limit are therefore those with

$$n_i = N - m_i \qquad \text{and} \qquad \sum_{i=1}^{n} m_i = 0 \tag{3.17}$$

with small $m_i \in \mathbb{Z}$, for which

$$C_0(n_i) = \frac{N}{2 (N - m_i)}\, \mathbb{1}_{n_i} \approx \frac12 \Big(1 + \frac{m_i}{N}\Big)\, \mathbb{1}_{n_i}. \tag{3.18}$$

Note that then

$$\mathrm{Tr}\, (C_0) = \sum_{i=1}^{n} (N - m_i)\, \frac{N}{2 (N - m_i)} = \frac{nN}{2} \tag{3.19}$$

as required. It follows that

$$S\big((n_1, 1), \dots, (n_n, 1)\big) \approx \frac{N}{g} \sum_{i=1}^{n} (N - m_i) \Big(\frac{m_i}{2N}\Big)^2 + O\big(\tfrac{1}{N}\big) \approx \frac{1}{4g} \sum_{i=1}^{n} m_i^2, \tag{3.20}$$

which is the usual expression [19,20] for the classical action of $U(n)$ Yang-Mills theory on the sphere $S^2$ with trivial gauge bundle evaluated on the two-dimensional instanton on $S^2$ corresponding to a configuration of $n$ Dirac monopoles of magnetic charges $m_i \in \mathbb{Z}$. Non-trivial gauge bundles over $S^2$ of first Chern class $c_1 \in \mathbb{Z}$ are obtained by modifying the trace constraint as in (3.13).
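The large-$N$ expansion behind (3.20) can also be obtained from the exact reduced action, which makes the role of the condition $\sum_i m_i = 0$ visible. The sketch below combines (3.15), (3.16) and (3.17); the numerical values in the closing comments are a hypothetical illustration, not taken from the text.

```latex
% Sketch: exact action on C_{(n_1,1),...,(n_n,1)} with n_i = N - m_i, then its large-N expansion.
\begin{align*}
S &= S' - \frac{n N^2}{4g}
   = \frac{N^3}{4g} \sum_{i=1}^{n} \frac{1}{N - m_i} - \frac{n N^2}{4g}
   = \frac{N^2}{4g} \sum_{i=1}^{n} \frac{m_i}{N - m_i} \\
  &= \frac{N^2}{4g} \sum_{i=1}^{n}
     \Big( \frac{m_i}{N} + \frac{m_i^2}{N^2} + O\big(N^{-3}\big) \Big)
   = \frac{1}{4g} \sum_{i=1}^{n} m_i^2 + O\big(\tfrac{1}{N}\big)
   && \text{using } \textstyle\sum_i m_i = 0
\end{align*}
% Illustrative numbers (hypothetical choice): n = 2, N = 100, (m_1, m_2) = (1, -1).
%   exact:        S = (N^2 / 4g) (1/99 - 1/101) = (2500 / g) (2 / 9999) \approx 0.50005 / g
%   leading term: (1 / 4g) (1 + 1) = 0.5 / g
```

The term linear in $m_i$ would grow like $N$ if the charges did not sum to zero, which is why only configurations satisfying (3.17) have finite action in the commutative limit.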

3.3. Local symplectic geometry of the configuration space. We will now develop the local symplectic geometry of the configuration space of gauge fields near each Yang-Mills critical point. This is done by analysing in more detail the map (2.16), satisfying (2.17). We want to find a useful description of the tangent space $T_C\mathcal{O} \cong \mathrm{im}(J)$, i.e. of the local geometry of the orbit space $\mathcal{O}$. Since $J$ is an anti-hermitian operator with respect to the Cartan-Killing form on $u(\mathcal{N})$ (see (2.18)), it follows that the space $u(\mathcal{N})$ splits into two orthogonal subspaces as

$$u(\mathcal{N}) = \ker(J) \oplus \ker\big(J^2 + \mathbb{1}_{\mathcal{N}}\big), \tag{3.21}$$

where $\ker(J) = \mathfrak{r} = u(nN_+) \oplus u(nN_-)$ is the stabilizer subalgebra, while $\ker(J^2 + \mathbb{1}_{\mathcal{N}}) \cong T_C\mathcal{O}$ is the tangent space to the configuration space at $C \in \mathcal{O}$. In particular, $J$ defines a complex structure on $T_C\mathcal{O}$, and (3.21) is just the Cartan decomposition of $u(\mathcal{N})$ corresponding to the symmetric space $\mathcal{O}$. This follows immediately by noticing that the involutive automorphism

$$j : u(\mathcal{N}) \longrightarrow u(\mathcal{N}), \qquad \phi \longmapsto C\, \phi\, C^{-1} \tag{3.22}$$


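is $\mathbb{1}_{\mathcal{N}}$ on $\ker(J)$ and $-\mathbb{1}_{\mathcal{N}}$ on $\ker(J^2 + \mathbb{1}_{\mathcal{N}})$ upon using the constraints (2.10). Moreover, for any $V_\phi, V_\psi \in T_C\mathcal{O}$, from (2.47) one has

$$\langle \omega, V_\phi \wedge V_\psi \rangle = \frac{i}{N^2}\, \mathrm{Tr}\, \big([C, V_\phi]\, V_\psi\big) = \frac{1}{N}\, \mathrm{Tr}\, \big(J(V_\phi)\, V_\psi\big) \tag{3.23}$$

and

$$\langle \omega, V_\phi \wedge J(V_\psi) \rangle = \frac{1}{N}\, \mathrm{Tr}\, (V_\phi\, V_\psi), \tag{3.24}$$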

expressing the fact that the symplectic two-form $\omega$ makes the configuration space $\mathcal{O}$ into a Kähler manifold with respect to the complex structure (2.16). All of these properties are just standard features of hermitian symmetric spaces [36], as will be exploited at length in this paper. Consider the restriction of the map $J$ to the gauge algebra $\mathfrak{g} = u(nN) \subset u(\mathcal{N})$ containing elements of the form $g = \phi \otimes \sigma^0$. Since $J(\phi)$ is the infinitesimal gauge transformation of the gauge field $C$ generated by $\phi$, it describes the orbits of the gauge group $G = U(nN)$ acting on the configuration space $\mathcal{O}$, in $T_C\mathcal{O}$. Generically this action is free (apart from the trivial $u(1)$), but not for certain critical points. For example, for the vacuum solution (3.9) the subalgebra $\mathbb{1}_N \otimes u(n)$ commutes with $C$. The higher critical points in the nonabelian case generically have a smaller $u(1)^n$ centralizer algebra. More precisely, consider the kernel of $J$ at $C$ restricted to the gauge algebra $\mathfrak{g}$,

$$\mathfrak{s} := \ker(J) \cap \mathfrak{g}, \tag{3.25}$$

which is the Lie algebra of the subgroup of the gauge group that stabilizes $C$. The elements $\phi \in \mathfrak{s}$ are orthogonal to $T_C\mathcal{O}$ due to (3.21). Hence $\mathfrak{g}$ decomposes into orthogonal subspaces

$$\mathfrak{g} = \mathfrak{s} \oplus \mathfrak{g}', \tag{3.26}$$

where $\mathfrak{g}' = \mathfrak{s}^\perp =: \mathfrak{g} \ominus \mathfrak{s}$ contains the “proper” gauge transformations, acting freely near $C$. If $(n_1, \dots, n_n)$ is a partition of the integer $nN$ which does not contain trivial representations of $SU(2)$ (no fluxons), then $\mathfrak{g}'$ is the tangent space to the corresponding critical surface

$$\mathcal{C}_{(n_1,1), \dots, (n_n,1)} \subset \mathcal{O}, \qquad \mathcal{C}_{(n_1,1), \dots, (n_n,1)} \cong U(nN)/S, \tag{3.27}$$

where $S = \exp(\mathfrak{s})$. We claim that the subspaces $J(\mathfrak{g})$ and $\mathfrak{g}$ are linearly independent. For this, assume to the contrary that $J(\mathfrak{g})$ and $\mathfrak{g}$ are linearly dependent, i.e. $J(g) \in \mathfrak{g}$ for some $g \in \mathfrak{g}$. This implies that $[C_i, g] = 0$, and therefore $[C_0^2, g] = 0$ due to (2.11). Restricting attention to critical points $C$ for which the spectrum of $C_0$ is non-negative (the others being strongly suppressed at large $N$), this implies that $g$ commutes with the spectral projectors of $C_0$, and hence also with $C_0$ itself. Together with $[C_i, g] = 0$ it follows that $J(g) = 0$. However, $J(\mathfrak{g})$ and $\mathfrak{g}$ need not be orthogonal subspaces. Generically one then has

$$J^2(\mathfrak{g}) + J(\mathfrak{g}) \subset T_C\mathcal{O}. \tag{3.28}$$

The two subspaces are not orthogonal in general, since for $g_1, g_2 \in \mathfrak{g}$ one can compute the inner product

$$\mathrm{Tr}\, \big(J^2(g_1)\, J(g_2)\big) = \mathrm{Tr}\, \big(g_1\, J(g_2)\big) = -\frac{i}{N}\, \mathrm{Tr}\, (C\, [g_1, g_2]) = -\frac{1}{N}\, \langle \omega, V_{g_1} \wedge V_{g_2} \rangle = \frac{i}{N}\, \mathrm{Tr}\, (g_1\, [C_0, g_2]) \tag{3.29}$$


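which is non-vanishing in general. For the vacuum solution with $C_0 = \frac12\, \mathbb{1}_{nN}$, it follows from this expression that the subspaces are indeed orthogonal, and hence $J^2(\mathfrak{g}) \oplus J(\mathfrak{g}) \subset T_C\mathcal{O}$. In fact, one has

$$J^2(\mathfrak{g}) \oplus J(\mathfrak{g}) = T_C\mathcal{O} \qquad \text{if} \qquad C_0 = \tfrac12\, \mathbb{1}_{nN}, \tag{3.30}$$

which provides a useful description of the local geometry near the global minimum. To see (3.30), note first that in the abelian case $n = 1$ one has $\mathfrak{s} = u(1)$, and (3.30) then follows since $\dim(\mathcal{O}) = 2(N^2 - 1) = 2 \dim(\mathfrak{g}')$. In the nonabelian case, for the vacuum state the gauge stabilizer $\mathfrak{s} \cong u(n)$ has dimension $n^2$ and hence $\dim(J^2(\mathfrak{g}') \oplus J(\mathfrak{g}')) = 2 n^2 N^2 - 2 n^2 = \dim(\mathcal{O})$. In general, the subspaces $J(\mathfrak{g}) = J(\mathfrak{g} \ominus \mathfrak{s})$ and $J^2(\mathfrak{g})$ are not linearly independent, and we can define

$$E_0 := J(\mathfrak{g}) \cap J^2(\mathfrak{g}), \tag{3.31}$$

which is generically a non-trivial subspace. Define also the subspaces $\mathfrak{h}, \tilde{\mathfrak{h}} \subset \mathfrak{g} \ominus \mathfrak{s}$ with the properties that

$$J(\mathfrak{h}) = E_0 = J^2(\tilde{\mathfrak{h}}). \tag{3.32}$$

Since $J^2(\mathfrak{h}) = -J(\tilde{\mathfrak{h}})$ implies that $\mathfrak{h} \subset \tilde{\mathfrak{h}} \subset \mathfrak{h}$, we have

$$\mathfrak{h} = \tilde{\mathfrak{h}} \qquad \text{and} \qquad J(E_0) = E_0. \tag{3.33}$$

We can accordingly decompose the gauge algebra $\mathfrak{g}$ into orthogonal subspaces as

$$\mathfrak{g} = \mathfrak{g}_1 \oplus \mathfrak{h} \oplus \mathfrak{s}. \tag{3.34}$$

Since $J : \mathfrak{h} \to E_0$ is a bijection, there is a unique map

$$j : \mathfrak{h} \longrightarrow \mathfrak{h} \qquad \text{with} \qquad J^2(h) = J(j(h)) \tag{3.35}$$

for all $h \in \mathfrak{h}$ which satisfies $j^2 = -\mathbb{1}_{nN}$. Similarly, in order to span the entire tangent space at $C \in \mathcal{O}$ we generally have to introduce another subspace $E_1$, with $J(E_1) = E_1$, which gives the general decomposition

$$J(\mathfrak{g} \ominus \mathfrak{h}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{h}) \oplus E_0 \oplus E_1 = T_C\mathcal{O}. \tag{3.36}$$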
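The dimension count used above to establish (3.30) at the vacuum can be made explicit. The sketch below assumes that $\mathcal{O}$ is an orbit in $u(\mathcal{N})$ with $\mathcal{N} = 2nN$ whose stabilizer algebra has the two blocks of sizes $n(N+1)$ and $n(N-1)$; this is an interpretation of the notation in (3.21), chosen because it matches the dimensions quoted in the text. Here $\mathfrak{g}'$ denotes the orthogonal complement of the stabilizer $\mathfrak{s}$ in the gauge algebra $\mathfrak{g} = u(nN)$.

```latex
% Sketch of the dimension count at the vacuum.
% Assumptions: N_cal = 2 n N, and stabilizer blocks of sizes n(N+1) and n(N-1).
\begin{align*}
\dim \mathcal{O} &= (2nN)^2 - n^2 (N+1)^2 - n^2 (N-1)^2 = 2 n^2 N^2 - 2 n^2, \\
\dim \mathfrak{g} &= \dim u(nN) = n^2 N^2, \qquad
\dim \mathfrak{s} = \dim u(n) = n^2 \quad \text{(vacuum)}, \\
\dim\big( J^2(\mathfrak{g}') \oplus J(\mathfrak{g}') \big)
  &= 2\, \dim \mathfrak{g}' = 2 \big( n^2 N^2 - n^2 \big) = \dim \mathcal{O}.
\end{align*}
% For n = 1 this reduces to dim O = 2 (N^2 - 1) = 2 dim g', as quoted in the text.
```

The middle step uses that $J$ is injective on $\mathfrak{g}'$, since its kernel within the gauge algebra is exactly $\mathfrak{s}$ by (3.25).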

3.4. Explicit decomposition at Yang-Mills critical surfaces. We will now provide an explicit description of the various subspaces appearing in the decomposition of the tangent space (3.36). Consider the Yang-Mills critical surfaces $\mathcal{C}_{(n_1,1), \dots, (n_n,1)}$ and suppose first that $n_1 \neq \cdots \neq n_n$ are all distinct integers, corresponding to a completely nondegenerate solution. The elements $\phi$ of the subspace (3.25) satisfy $[C, \phi] = 0$. This implies that $\phi$ respects the block decomposition described by the given partition $(n_1, \dots, n_n)$, and is therefore proportional to $\mathbb{1}_{n_i}$ on each block. These are thus $u(1)^n$ degrees of freedom. If some $n_i$ are degenerate, this space is enhanced to

$$\mathfrak{s} = u(k_1) \times \cdots \times u(k_l) \tag{3.37}$$

for a critical surface with $C = \bigoplus_i C(n_i) \otimes \mathbb{1}_{k_i}$ and $n_i$ all distinct. For the vacuum this is $u(n)$, corresponding to the maximally degenerate solution, as in Sect. 3.3 above.


We wish to work out the map $J$ explicitly. For this, we decompose

$$\phi = \begin{pmatrix} \phi_{11} & \phi_{12} & \cdots \\ \phi_{21} & \phi_{22} & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}, \tag{3.38}$$

where $\phi_{ij} \in (n_i) \otimes (n_j)$ and as before $(p)$ denotes the $p$-dimensional irreducible representation of $SU(2)$. In the degenerate case, there is another factor corresponding to $u(k_j)$. The non-orthogonality of $J(\mathfrak{g})$ and $J^2(\mathfrak{g})$ in (3.29) is now easily understood as being simply due to the different $u(1)$ charges between the $SU(2)$ sectors of $\mathfrak{s}$. Since $[C, C_0] = 0$ at the Yang-Mills critical surfaces, one has $J([C_0, \phi]) = [C_0, J(\phi)]$. Thus the hermitian operator

$$\big(\mathrm{ad}_{\, i C_0}\big)_{ij} = i\, C_0(n_i) - i\, C_0(n_j) = i\, \frac{N}{2}\, \frac{n_j - n_i}{n_i\, n_j} =: i\, c_{ij} \tag{3.39}$$

acting on $\phi_{ij} \in (n_i) \otimes (n_j)$ commutes with $J$. This implies that we can decompose the subspaces in (3.36) such as $J(\mathfrak{h}) = J^2(\mathfrak{h}) = E_0$ into irreducible representations of the operator $\mathrm{ad}_{\, i C_0}$, i.e. into the various $u(1)$ blocks. Restricted to the diagonal blocks, $C_0(n_i)$ is proportional to the unit matrix $\mathbb{1}_{n_i}$, so that $\mathrm{Tr}\, (J(g_1)\, J^2(g_2)) = 0$ there as for the vacuum.

Global $SU(2)$ symmetry. To proceed further, we need to exploit an additional symmetry that we have neglected so far, the global rotation group $SU(2)$. Recall from Sect. 3.1 above that each saddle-point defines a representation of $SU(2)$ acting on the representation space $V \cong \mathbb{C}^{nN}$ as (3.4), and trivially on potential fluxon components. In the abelian case $n = 1$, this induces via the adjoint action the rotations of functions $f \mapsto J_i f = [L_i, f]$ in $S_N^2 \cong V \otimes V$, but it is a somewhat different symmetry for the nonabelian instantons. Let us decompose $V$ into irreducible representations as

$$V = \bigoplus_{i=1}^{n} (n_i). \tag{3.40}$$

This representation can be extended to the module $V \otimes \mathbb{C}^2$ for the action of the operators

$$J_i = L_i + \tfrac12\, \sigma_i, \tag{3.41}$$

which by construction commute with $C$,

$$[J_i, C] = 0, \tag{3.42}$$

on the critical surfaces. This follows from the fact that $C_i \otimes \sigma^i$ is an intertwiner for the action of $J_i$ on

$$V \otimes \mathbb{C}^2 = \bigoplus_{i=1}^{n} (n_i + 1) \oplus \bigoplus_{i=1}^{n} (n_i - 1) =: V^+ \oplus V^- \tag{3.43}$$

and $C$ has eigenvalues $\pm \frac{N}{2}$ on the component subspaces $V^\pm$. This enables one to decompose $C$ further using the projectors $\Pi_i^\pm$ onto the irreducible representations $(n_i \pm 1)$ with

$$\big[C, \Pi_i^\pm\big] = 0, \tag{3.44}$$


and the constrained covariant coordinates take the simple form

$$C = \frac{N}{2} \begin{pmatrix} \sum_{i=1}^{n} \Pi_i^+ & 0 \\ 0 & -\sum_{i=1}^{n} \Pi_i^- \end{pmatrix}. \tag{3.45}$$

In particular, since $C_0 \otimes \sigma^0$ is two-fold degenerate it follows that

$$C_0 \otimes \sigma^0 = \begin{pmatrix} \sum_{i=1}^{n} C_0(n_i)\, \Pi_i^+ & 0 \\ 0 & \sum_{i=1}^{n} C_0(n_i)\, \Pi_i^- \end{pmatrix} \tag{3.46}$$

separates the explicit blocks according to (3.39). The complex structure map $J$ respects this $SU(2)$ symmetry,

$$[J_i, J] = 0, \tag{3.47}$$

which enables one to decompose the tangent space $T_C\mathcal{O}$ into irreducible representations of the $SU(2)$ isometry group. With respect to the block decomposition (3.43), the subspace $\ker(J) \subset u(\mathcal{N})$ consists of block diagonal operators while $T_C\mathcal{O}$ consists of block off-diagonal operators, and the action of $J$ on tangent vectors is given explicitly by

$$J \begin{pmatrix} 0 & X \\ X^\dagger & 0 \end{pmatrix} = \begin{pmatrix} 0 & i X \\ -i X^\dagger & 0 \end{pmatrix}. \tag{3.48}$$

This is the obvious complex structure on $T_C\mathcal{O}$ compatible with the action of the isometry group. The decomposition of the tangent space $T_C\mathcal{O}$ into irreducible representations of $SU(2)$ is now provided by

$$T_C^-\mathcal{O} \cong \Big( \bigoplus_{i=1}^{n} (n_i + 1) \Big) \otimes \Big( \bigoplus_{j=1}^{n} (n_j - 1) \Big) = \bigoplus_{i,j=1}^{n} (n_i + 1) \otimes (n_j - 1), \tag{3.49}$$

where $T_C^\pm\mathcal{O} := T_C\mathcal{O}\big|_{V^\pm}$ corresponds to the upper-right, respectively lower-left, blocks in (3.48), and the different sectors $(i, j)$ are separated by the eigenvalues of the operator $\mathrm{ad}_{\, i C_0}$ in the irreducible case. Note in particular that the lowest spin component in the Clebsch-Gordan decomposition of $(n_i + 1) \otimes (n_i - 1)$ is a spin one field as appropriate for gauge fields. This implies $J(\mathfrak{g}_0) = 0$, where $\mathfrak{g}_0$ is the subspace of $SU(2)$ singlet components of $\mathfrak{g}$, and in fact $\mathfrak{g}_0 = \mathfrak{s}$ by Schur’s lemma.

Global minimum. Consider first the vacuum surface $\mathcal{C}_{(N,1), \dots, (N,1)}$. Compare the $SU(2)$-invariant decomposition of the gauge algebra $\mathfrak{g}$, given by

$$\mathfrak{g} \cong (N) \otimes (N) \otimes u(n) = \big( (1) \oplus (3) \oplus \cdots \oplus (2N - 1) \big) \otimes u(n) = \big( (1) \oplus (N + 1) \otimes (N - 1) \big) \otimes u(n), \tag{3.50}$$


with (3.49) in the degenerate case $C_0 = \frac12\, \mathbb{1}_{nN}$. It follows that the image of $J(\mathfrak{g})$ indeed covers all modes of $T_C\mathcal{O}$, and the complexification is achieved by adding $J^2(\mathfrak{g})$. This gives another proof of the decomposition (3.30). The singlet subspace of (3.50) is $\mathfrak{g}_0 = (1) \otimes u(n) \cong u(n) = \mathfrak{s}$.

Maximally irreducible saddle points. Now consider a generic, completely nondegenerate critical surface $\mathcal{C}_{(n_1,1), \dots, (n_n,1)}$, and the corresponding decomposition of $T_C\mathcal{O} = T_C^-\mathcal{O} \oplus T_C^+\mathcal{O}$ given by (3.49). The different sectors $(i, j)$ are distinguished by the eigenvalues of the operator $\mathrm{ad}_{\, i C_0}$. Hence we can pick some fixed pair $n_i > n_j$, and decompose

$$(n_i + 1) \otimes (n_j - 1) \cong \big( |n_i - n_j| + 3 \big)_{i c_{ij}} \oplus \big( |n_i - n_j| + 5 \big)_{i c_{ij}} \oplus \cdots \oplus \big( n_i + n_j - 1 \big)_{i c_{ij}} \subset T_C\mathcal{O}, \tag{3.51}$$

which has eigenvalue given by (3.39) as indicated by the subscripts. Similarly, one has

$$(n_j + 1) \otimes (n_i - 1) \cong \big( |n_i - n_j| - 1 \big)_{i c_{ji}} \oplus \big( |n_i - n_j| + 1 \big)_{i c_{ji}} \oplus \cdots \oplus \big( n_i + n_j - 1 \big)_{i c_{ji}} \subset T_C\mathcal{O} \tag{3.52}$$

(where $(0)$ is omitted) with $\mathrm{ad}_{\, i C_0}$ eigenvalue $i c_{ji} = -i c_{ij}$. The corresponding conjugate matrix decompositions $(n_j - 1) \otimes (n_i + 1)$ and $(n_i - 1) \otimes (n_j + 1)$ are determined by hermiticity. They are given respectively by (3.51) with eigenvalue $i c_{ji} = -i c_{ij}$ and by (3.52) with eigenvalue $i c_{ij}$. We denote the tangent space decomposition (3.49) determined by (3.51) and (3.52) as

$$T_C\mathcal{O} := \bigoplus_{i,j=1}^{n} \mathbb{C}\, \big| n;\, n_i + 1, n_j - 1;\, i c_{ij}, l \big\rangle_{T_C\mathcal{O}}, \tag{3.53}$$

where $n$ denotes the dimension of $(n)$, and we will drop its magnetic quantum number $l$ from now on. This defines a natural basis for $T_C\mathcal{O}$, in which the action of $J$ is given by block-wise multiplication with

$$J = \sigma^2 = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix} \tag{3.54}$$

as in (3.48), and the action of $\mathrm{ad}_{\, i C_0}$ by

$$\big( \mathrm{ad}_{\, i C_0} \big)_{ij} = |c_{ij}| \begin{pmatrix} 0 & \sigma^2 \\ \sigma^2 & 0 \end{pmatrix} \tag{3.55}$$

since its sign depends on $n_i \gtrless n_j$. In particular, by virtue of (3.23) the tangent space $T_C\mathcal{O}$ is naturally a symplectic vector space with symplectic form of type (1, 1) with respect to the complex structure $J$. This construction thereby defines a local symplectic model for the neighbourhood of the Yang-Mills critical point $C$ in the Kähler manifold $\mathcal{O}$. In the next section this model space will be used to evaluate fluctuation integrals over tubular neighbourhoods of the critical surfaces. In particular, all pertinent one-forms can


be explicitly evaluated on $T_C\mathcal{O}$ by using the explicit expressions for $C$ and $C_0$ in (3.45) and (3.46). Let us now look at the $SU(2)$-invariant decomposition of the gauge algebra $\mathfrak{g}$ given by

$$\begin{aligned} \mathfrak{g} &\cong \bigoplus_{i,j=1}^{n} (n_i) \otimes (n_j) \\ &= \bigoplus_{i,j=1}^{n} \big( |n_i - n_j| + 1 \big) \oplus \big( |n_i - n_j| + 3 \big) \oplus \cdots \oplus \big( n_i + n_j - 1 \big) \\ &=: \bigoplus_{i,j=1}^{n} \mathbb{C}\, \big| n;\, n_i, n_j;\, i c_{ij} \big\rangle_{\mathfrak{g}}. \end{aligned} \tag{3.56}$$

This can be compared with the $SU(2)$-invariant decomposition of the tangent space $T_C\mathcal{O}$ in (3.53) above, whose higher modes match perfectly with those of $\mathfrak{g}$ except for a doubling due to the complex structure $J$. There is, however, some mismatch in the low lying modes. In particular, $T_C\mathcal{O}$ contains the extra subspace

$$E_1 := \bigoplus_{i > j} \mathbb{C}\, \big| n_i - n_j - 1;\, n_j + 1, n_i - 1;\, -i c_{ij} \big\rangle_{T_C\mathcal{O}}, \tag{3.57}$$

which is not contained in $J(\mathfrak{g})$. On the other hand, the modes in the subspace

$$E_0 := \bigoplus_{i > j} \mathbb{C}\, \big| n_i - n_j + 1;\, n_j + 1, n_i - 1;\, -i c_{ij} \big\rangle_{T_C\mathcal{O}} \tag{3.58}$$

occur only once in $T_C\mathcal{O}$, which means that they are already spanned by the image $J(\mathfrak{g})$ since $J \neq 0$ on the non-trivial modes. This implies that $E_0 = J(E_0) = J(\mathfrak{h})$, where

$$\mathfrak{h} = \bigoplus_{i \neq j} \mathbb{C}\, \big|\, |n_i - n_j| + 1;\, n_i, n_j;\, i c_{ij} \big\rangle_{\mathfrak{g}}. \tag{3.59}$$
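A small worked case may help in reading the mode matching between (3.51), (3.52) and (3.56). The sketch below uses the standard Clebsch-Gordan rule $(a) \otimes (b) = (|a - b| + 1) \oplus (|a - b| + 3) \oplus \cdots \oplus (a + b - 1)$ for $SU(2)$ representations labelled by their dimension; the block sizes $n_i = 5$, $n_j = 3$ are a hypothetical choice made only for illustration.

```latex
% Illustrative example (hypothetical block sizes): n_i = 5, n_j = 3, one sector with i > j.
\begin{align*}
\mathfrak{g}:\quad (5) \otimes (3) &= (3) \oplus (5) \oplus (7)
  && \dim = 15, \text{ cf. (3.56)} \\
T_C\mathcal{O}:\quad (6) \otimes (2) &= (5) \oplus (7)
  && \dim = 12, \text{ cf. (3.51)} \\
T_C\mathcal{O}:\quad (4) \otimes (4) &= (1) \oplus (3) \oplus (5) \oplus (7)
  && \dim = 16, \text{ cf. (3.52)}
\end{align*}
% (5) and (7) occur in both tangent-space products: the higher modes of g, doubled.
% (3) = (n_i - n_j + 1) occurs only once in T_C O: this is the mode entering E_0 in (3.58).
% (1) = (n_i - n_j - 1) has no counterpart in (5) x (3): this is the mode entering E_1 in (3.57).
```

In this example the low-lying mismatch is confined to the two smallest representations, while every mode of dimension 5 or larger matches between the gauge algebra and the tangent space.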

The linear independence of the subspaces $J(\mathfrak{g} \ominus \mathfrak{h})$ and $J^2(\mathfrak{g} \ominus \mathfrak{h})$ follows from the explicit embedding $T_C\mathcal{O} \hookrightarrow u(\mathcal{N})$ given below. Therefore $J(\mathfrak{g} \ominus \mathfrak{h}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{h})$ spans the entire tangent space $T_C\mathcal{O}$ except for the subspace $E_1$, which gives the decomposition (3.36) with the various subspaces now explicitly identified. We have $J(E_0) = E_0$ and $J(E_1) = E_1$, with the action of $J$ given by diagonal eigenvalues $\pm i$ on the two components in (3.58) and (3.57). On the remaining space $T_C\mathcal{O} \ominus E_0 \ominus E_1$ the action of $J$ is obtained by exchanging the two components in (3.36). To complete this analysis, we need to explicitly embed $T_C\mathcal{O}$ into the space $u(\mathcal{N})$, which admits the $SU(2)$-invariant decomposition

$$u(\mathcal{N}) \cong \mathfrak{g} \otimes \big( (2) \otimes (2) \big) = \bigoplus_{i,j=1}^{n} \Big( (n_i + 1) \otimes (n_j + 1) \oplus (n_i - 1) \otimes (n_j - 1) \oplus (n_i + 1) \otimes (n_j - 1) \oplus (n_i - 1) \otimes (n_j + 1) \Big), \tag{3.60}$$


corresponding to (3.43). Since we know the action of J on the rhs, we can determine the map J : g −→ TC O → g ⊗ ((2) ⊗ (2))

(3.61)

using n

J |n; n k + 1, nl − 1; i ckl TC O J |n; n i , n j ; i ci j g =

TC O

k,l=1

×n; n k + 1, nl − 1; i ckl |n; n i , n j ; i ci j g + h.c. (3.62) The non-vanishing inner products in this expression can be written in terms of Wigner 6 j-symbols for the group SU (2), which are known explicitly. This also enables one to compute the projection 0 : TC O −→ g,

V0 ⊗ σ 0 + Vi ⊗ σ i −→ V0

(3.63)

as 0 |n; n i + 1, n j − 1; i ci j TC O n = |n; n k , nl ; i ckl g gn; n k , nl ; i ckl |n; n i + 1, n j − 1; i ci j TC O .

(3.64)

k,l=1

In the basis (3.36), one has the useful explicit formula 0 J (g) = ad i C0 (g) which is of order 0 (E 1 ) = {0}.

1 N

and can also be used for E 0 , while 0 J 2 (g) is of order

(3.65) 1 N2

and

General solutions. The case where some of the irreducible representations $(n_i)$ have multiplicity $k_i > 1$ is a combination of the structures above for the vacuum state and for the nondegenerate case. Now the basis (3.53) acquires additional labelling reflecting the $\mathfrak{u}(k_i)$ degrees of freedom, and it takes the symbolic form

\[
T_C O = \bigoplus_{i,j=1}^{l} \mathbb{C}\,\big|n; (n_i+1, a_i), (n_j-1, a_j); \mathrm{i}\,c_{ij}\big\rangle_{T_C O}. \tag{3.66}
\]

In particular, one can now easily compute the symplectic form on $T_C O$ using (3.23). It is essentially given by the complex structure $J$.

3.5. Fluctuations around the critical surfaces. We conclude this section with a summary of the salient features of the decompositions in Sects. 3.3 and 3.4 above, as pertaining to how they will be exploited in the next section to evaluate fluctuation integrals over the local neighbourhoods of Yang-Mills critical points. Recall that globally the critical surface (with no fluxons) through some critical point $C$ is given by the space of gauge transformations acting on $C$, as in (3.27). Its tangent space is embedded locally as

\[
T_C\, \mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)} = J(\mathfrak{g} \ominus \mathfrak{s}) \subset T_C O, \tag{3.67}
\]


which can be determined explicitly using (3.62). Recall also that the gauge stabilizer $\mathfrak{s}$ of $C$ consists of the $SU(2)$ singlets in $\mathfrak{g}$. It is given by $\mathfrak{s} \cong \mathfrak{u}(n)$ for the vacuum, and $\mathfrak{s} \cong \mathfrak{u}(1)^n$ for completely irreducible saddle-points. In particular, $\mathfrak{s}$ is never trivial, quite unlike the situation in ordinary two-dimensional Yang-Mills theory [27]. The global symmetry cannot be disentangled in the noncommutative case, and the nonabelian localization even at the global minimum is akin to that at higher critical points of two-dimensional Yang-Mills theory or more precisely at the flat connections of Chern-Simons gauge theory on a Seifert fibration [27]. The non-trivial part of the localization at higher critical points will therefore be given by fluctuation integrals over the spaces $E_0$, $E_1$ and $\mathfrak{s}$. The only effect of the remaining part $J(\mathfrak{g} \ominus \mathfrak{h}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{h})$ will be to induce normalization terms as for the vacuum critical point. In particular, the subspaces $J(\mathfrak{g} \ominus \mathfrak{s})$ and $J^2(\mathfrak{g} \ominus \mathfrak{s})$ locally model the tangent space $T_C O$ near the vacuum. To understand the physical meaning of the subspace $E_1$, note that the gauge field strength remains constant for variations along $\phi = X \in E_1$, since $\delta C_0 \big|_{E_1} = \mathrm{i}\,[C, \phi]_0 \in \Pi_0(E_1) = \{0\}$. Let us compute the second order variation of the Yang-Mills action, given by

\[
\mathrm{Tr}\big(C_0\, \delta^2 C_0\big) = -\,\mathrm{Tr}\big(C_0\, [\,[C, \phi], \phi]\big) = \mathrm{Tr}\big([C_0, \phi]\,[C, \phi]\big) = -N\, \mathrm{Tr}\big(\mathrm{ad}_{\mathrm{i}\,C_0}(\phi)\, J(\phi)\big). \tag{3.68}
\]

Restricting to fluctuations $\phi = X \in E_1$ with respect to the decomposition (3.57) one has

\[
\mathrm{Tr}\big(C_0\, \delta^2 C_0\big)\Big|_{E_1} = -N \sum_{i>j} \mathrm{Tr}\big(\mathrm{ad}_{\mathrm{i}\,C_0}(X_{ji}^\dagger)\, J(X_{ji})\big) = -2N \sum_{i>j} |c_{ij}|\, \mathrm{Tr}\big(X_{ji}^\dagger X_{ji}\big) \tag{3.69}
\]

by using the actions (3.54) and (3.55), cf. (4.85). For the maximally nondegenerate saddle-points, this fluctuation is thus negative, demonstrating that the two-dimensional instantons on the fuzzy sphere $S_N^2$ are generically unstable. On the other hand, since the subspace $E_0 = J(\mathfrak{h})$ is obtained through gauge transformations, it produces flat directions for the Yang-Mills action.

4. Nonabelian Localization

This section is the crux of the present paper, wherein we shall derive the semiclassical expansion of the partition function for Yang-Mills theory on the fuzzy sphere $S_N^2$ and show that it agrees with the known instanton expansion of quantum gauge theory on $S^2$ in the classical limit $N \to \infty$. We will begin by describing the nonabelian localization principle, adapted to our specific gauge theory. We will then explicitly evaluate the contributions from two extreme classes of Yang-Mills critical points, the vacuum and the maximally irreducible solutions, and show that they give the expected contributions to the path integral at large $N$. The intermediate contributions from degenerate solutions, which we do not treat in detail here, are somewhat more involved but can in principle be evaluated using our techniques. The contribution from the vacuum to the partition function could be expressed in terms of the abstract cohomological formula of [18] given


by intersection pairings on the vacuum moduli space, or by using the more explicit residue formula of [23]. The contributions from some higher unstable critical points to the nonabelian localization formula are formally described in [24,26,27], but the general cases that we need (including reducible saddle points) are not explicitly treated in full generality. Here we will directly evaluate, following [27], the explicit quantum fluctuation integrals near the critical points using the local symplectic geometry of the previous section.

4.1. Equivariant cohomology and the localization principle. The goal of this section is to compute the partition function of quantum Yang-Mills theory on the fuzzy sphere defined by the action (2.25) on the configuration space (2.13) of gauge fields. After an irrelevant shift of the covariant coordinates (2.8) which is equivalent to working with the reduced Yang-Mills action (3.15), it is defined by

\[
Z := \frac{1}{\mathrm{vol}(G)} \Big(\frac{g}{4\pi N}\Big)^{\dim(G)/2} \int_O dC\, \exp\Big(-\frac{N}{g}\, \mathrm{Tr}\, C_0^2\Big) = \frac{1}{\mathrm{vol}(G)} \Big(\frac{g'}{2\pi}\Big)^{\dim(G)/2} \int_O \exp\Big(\omega - \frac{1}{2g'}\, \mathrm{Tr}\, C_0^2\Big), \tag{4.1}
\]

where we have used the fact that the symplectic volume form $\omega^d/d!$, with $d := \dim_{\mathbb{C}}(O)$, defines the natural gauge invariant measure on $O$ provided by the Cartan-Killing riemannian volume form (up to some irrelevant normalization). This follows from the fact that the natural invariant metric on $O$ is a Kähler form. We have divided by the volume of the gauge group $G = U(nN)$ with respect to the invariant Cartan-Killing form and by another normalization factor for later convenience, and also introduced the rescaled gauge coupling

\[
g' = \frac{g}{2N}. \tag{4.2}
\]

We will now describe, following [18,27], how the technique of nonabelian localization can be applied to evaluate the symplectic integral (4.1) exactly. We begin by using a gaussian integration to rewrite (4.1) as

\[
Z = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times O} \Big[\frac{d\phi}{2\pi}\Big]\, \exp\Big(\omega - \mathrm{i}\, \mathrm{Tr}(C_0\, \phi) - \frac{g'}{2}\, \mathrm{Tr}\, \phi^2\Big), \tag{4.3}
\]

where the euclidean measure for integration over the gauge algebra $\phi \in \mathfrak{g} = \mathfrak{u}(nN)$ is determined by the invariant Cartan-Killing form. Since the moment map for the $G$-action on $O$ is given by (2.40), by (2.38) we have

\[
d\, \mathrm{Tr}(C_0\, \phi) = -\iota_{V_\phi}\, \omega. \tag{4.4}
\]
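The gaussian integration that trades the quadratic term $\mathrm{Tr}\, C_0^2$ for a linear coupling to the auxiliary field $\phi$ can be illustrated in one dimension. The Python sketch below is a toy analogue under stated assumptions: a single real number `c` stands in for $C_0$, a positive number `g` for the rescaled coupling, and the function name, cutoff and step size are illustrative choices. It compares a numerical evaluation of $\int \frac{d\phi}{2\pi}\, e^{-\mathrm{i} c \phi - g \phi^2/2}$ with the closed form $(2\pi g)^{-1/2}\, e^{-c^2/(2g)}$.

```python
import cmath
import math


def gaussian_fourier(c, g, half_width=12.0, step=0.01):
    """Riemann-sum estimate of the integral over phi of
    exp(-i*c*phi - g*phi**2/2) / (2*pi) on [-half_width, half_width]."""
    n_pts = int(round(2 * half_width / step)) + 1
    total = 0j
    for k in range(n_pts):
        phi = -half_width + k * step
        total += cmath.exp(-1j * c * phi - 0.5 * g * phi * phi)
    return total * step / (2 * math.pi)


c, g = 0.7, 1.3                       # illustrative values only
numeric = gaussian_fourier(c, g)
closed_form = math.exp(-c * c / (2 * g)) / math.sqrt(2 * math.pi * g)
```

The exponent $-c^2/(2g)$ of the closed form is the one-dimensional counterpart of the weight $-\frac{1}{2g'}\, \mathrm{Tr}\, C_0^2$; overall normalization factors depend on the conventions chosen for the measure.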

Introduce the BRST operator

\[
Q = d - \mathrm{i}\, \iota_{V_\phi}, \tag{4.5}
\]

where $d$ is the exterior derivative on $\Omega(O)$ and the contraction $\iota_{V_\phi}$ acts trivially on $\phi$. It preserves the gradation if one assigns charge $+2$ to the elements $\phi$ of $\mathfrak{g}$, and it satisfies

\[
Q^2 = -\mathrm{i}\, \{d, \iota_{V_\phi}\} = -\mathrm{i}\, L_{V_\phi}, \tag{4.6}
\]


where $L_{V_\phi}$ is the Lie derivative along the vector field $V_\phi$. Thus $Q^2 = 0$ exactly on the space

\[
\Omega_G(O) := \big(\mathbb{C}[[\mathfrak{g}]] \otimes \Omega(O)\big)^G \tag{4.7}
\]

consisting of gauge invariant differential forms on $O$ which take values in the ring of symmetric functions on the Lie algebra $\mathfrak{g}$. By construction one has

\[
Q\big(\omega - \mathrm{i}\, \mathrm{Tr}(C_0\, \phi)\big) = 0 \tag{4.8}
\]

using (2.34) and (4.4), and

\[
Q\, \mathrm{Tr}\, \phi^2 = 0. \tag{4.9}
\]

Therefore, the integrand of the partition function (4.3) defines a $G$-equivariant cohomology class in $H_G(O)$, and the value of $Z$ depends only on this class. The integral of any $Q$-exact equivariant differential form in $\Omega_G(O)$ over $\mathfrak{g} \times O$ is clearly 0, as is the integral of any $\iota_{V_\phi}$-exact form even if its argument is not gauge invariant. Thus $Z$ is unchanged by adding any $Q$-exact form to the action, which will fix a gauge for the localization. Hence we can replace it by

\[
Z = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times O} \Big[\frac{d\phi}{2\pi}\Big]\, \exp\Big(\omega - \mathrm{i}\, \mathrm{Tr}(C_0\, \phi) - \frac{g'}{2}\, \mathrm{Tr}\, \phi^2 + t\, Q\alpha\Big), \tag{4.10}
\]

which is independent of $t \in \mathbb{R}$ for any $G$-invariant one-form $\alpha$ on $O$, where

\[
Q\alpha = d\alpha - \mathrm{i}\, \langle \alpha, V_\phi \rangle. \tag{4.11}
\]

The independence of (4.10) on the particular representative $\alpha \in \Omega(O)^G$ of its equivariant cohomology class will play a crucial role in our evaluation of the partition function. Expanding the integrand of (4.10) by writing $\exp(t\, d\alpha)$ as a polynomial in $t$ and using the fact that the configuration space $O$ is compact, it follows that for $t \to \infty$ the integral localizes at the stationary points of $\langle \alpha, V_\phi \rangle$ in $\mathfrak{g} \times O$. By writing $V_\phi = V_a\, \phi^a$, where $\phi^a$ is an orthonormal basis of $\mathfrak{g}^\vee$, we have $\langle \alpha, V_\phi \rangle = \langle \alpha, V_a \rangle\, \phi^a$ and the critical points are thus determined by the equations

\[
\langle \alpha, V_a \rangle = 0, \tag{4.12}
\]

\[
\phi^a\, d\langle \alpha, V_a \rangle = 0. \tag{4.13}
\]
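The way an oscillatory integral of this kind collapses onto critical points can be seen in a much simpler setting. The Python sketch below is a toy abelian analogue, not the nonabelian computation carried out in this paper. It assumes the height function $z$ on the unit two-sphere with area form $dz\, d\theta$, and compares the integral of $\exp(\mathrm{i}\, t z)$ with the sum of the two fixed-point contributions at the poles given by the standard Duistermaat-Heckman formula. The function names and the value of `t` are illustrative.

```python
import cmath
import math


def sphere_integral(t, n_z=20000):
    """Midpoint-rule estimate of the integral of exp(i*t*z) over the unit
    sphere, using the area form dz dtheta with z in [-1, 1]."""
    h = 2.0 / n_z
    total = 0j
    for k in range(n_z):
        z = -1.0 + (k + 0.5) * h
        total += cmath.exp(1j * t * z)
    return 2 * math.pi * h * total      # the theta integral contributes 2*pi


def fixed_point_sum(t):
    """Contributions of the two critical points of z (north and south pole)."""
    north = cmath.exp(1j * t) / (1j * t)
    south = cmath.exp(-1j * t) / (-1j * t)
    return 2 * math.pi * (north + south)


t = 3.7                                  # illustrative value
direct = sphere_integral(t)
localized = fixed_point_sum(t)
closed_form = 4 * math.pi * math.sin(t) / t
```

In this toy case the stationary phase evaluation is exact for every `t`, which is the abelian prototype of the localization argument used here.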

Since (4.13) is invariant under rescaling of $\phi$ and the Lie algebra $\mathfrak{g}$ is contractible, the homotopy type of the space of solutions in $\mathfrak{g} \times O$ is unchanged by restricting to $\phi = 0$ and the saddle-points reduce to the zeroes of $\langle \alpha, V_a \rangle$ in $O$. Given the reduced Yang-Mills function (3.15), let us consider explicitly the invariant one-form $\alpha$ given by [27,32]

\[
\alpha = -\mathrm{i}\, \mathrm{Tr}\big(C_0\, [C, dC]_0\big) = g\, J\, dS. \tag{4.14}
\]

We claim that the vanishing locus of $\langle \alpha, V_a \rangle$ in this case coincides with the critical surfaces of the original Yang-Mills action (2.25) as found in Sect. 3.1. To see this, we note that the condition

\[
0 = \langle \alpha, V_a \rangle = \mathrm{Tr}\big(C_0\, [C, [C, \phi^a]\,]_0\big) = -\,\mathrm{Tr}\big([C, C_0]\, [C, \phi^a]\big) \tag{4.15}
\]


certainly holds whenever $[C, C_0] = 0$. On the other hand, by setting $\phi = C_0$ it implies

\[
0 = \langle \alpha, V_\phi \rangle = -\,\mathrm{Tr}\, [C, C_0]^2 \tag{4.16}
\]

which by nondegeneracy of the inner product implies that $[C, C_0] = 0$. Therefore the action in (4.10) has indeed the same critical points as the Yang-Mills action (2.25). Let us now explicitly establish, following [32], the localization of the partition function onto the classical solutions of the gauge theory. Plugging (4.14) and (4.11) into (4.10) and carrying out the integration over $\phi \in \mathfrak{g}$ gives

\[
\begin{aligned}
Z &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times O} \Big[\frac{d\phi}{2\pi}\Big]\, \exp\big(t\, d\alpha + \omega\big)\, \exp\Big(-\mathrm{i}\, \mathrm{Tr}(C_0\, \phi) - \frac{g'}{2}\, \mathrm{Tr}\, \phi^2 - \mathrm{i}\, t\, \mathrm{Tr}\big([C, [C, C_0]\,]\, \phi\big)\Big) \\
&= \frac{1}{\mathrm{vol}(G)} \Big(\frac{g'}{2\pi}\Big)^{\dim(G)/2} \int_O \exp\big(t\, d\alpha + \omega\big)\, \exp\Big(-\frac{1}{2g'}\, \mathrm{Tr}\, C_0^2 + \frac{t}{g'}\, \mathrm{Tr}\big(C_0\, [C, [C, C_0]\,]\big) - \frac{t^2}{2g'}\, \mathrm{Tr}\big([C, [C, C_0]\,]\big)^2\Big),
\end{aligned} \tag{4.17}
\]

where we have used $\mathrm{Tr}(C\, [C, -]) = 0$. The only configurations which contribute to (4.17) in the large $t$ limit are therefore solutions of the equation

\[
[C, [C, C_0]\,] = 0 \tag{4.18}
\]

which implies as in [32] that

\[
0 = \mathrm{Tr}\big(C_0\, [C, [C, C_0]\,]\big) = -\,\mathrm{Tr}\, [C, C_0]^2, \tag{4.19}
\]

giving $[C, C_0] = 0$ as desired. Therefore the integral (4.17) receives contributions only from the solutions of the Yang-Mills equations (3.2), which establishes the claimed localization. The local geometry in $\mathfrak{g} \times O$ about each critical point, as analysed in detail in the last section, determines the partition function as a sum of local contributions involving the values of the Yang-Mills action evaluated on the classical solutions as in Sect. 3.2. Consider an equivariant tubular neighbourhood $\mathcal{N}_{(n_1,s_1),\dots,(n_k,s_k)}$ of a critical surface $\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$ in $\mathfrak{g} \times O$. Since the partition function (4.10) is independent of $t$, we can consider its large $t$ limit as above, and this limit will always be implicitly assumed from now on. Let $W$ be a compact subset of $O$ with $W \cap \mathcal{C} = \emptyset$, where $\mathcal{C} := \bigcup_{(n_i,s_i)} \mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$. Then the integral over $W$ in (4.17) has a gaussian decay in $t \to \infty$. This means that in expanding $\exp(t\, d\alpha + \omega)$ into a finite sum of terms of the form $\omega^p \wedge (t\, d\alpha)^m$, we can disregard all terms which contain $\omega$ since they will be suppressed by factors of $\frac{1}{t}$ and vanish in the large $t$ limit. The only terms which survive the $t \to \infty$ limit are those with $p = 0$, $m = d$, and the integral therefore vanishes unless $\omega$ is replaced by $d\alpha$, except at the saddle point where $d\alpha = 0$. Then one has

\[
Z = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times O} \Big[\frac{d\phi}{2\pi}\Big]\, \exp\Big(t\, \big(d\alpha - \mathrm{i}\, \langle \alpha, V_\phi \rangle\big)\Big)\, \exp\Big(-\mathrm{i}\, \mathrm{Tr}(C_0\, \phi) - \frac{g'}{2}\, \mathrm{Tr}\, \phi^2\Big) \tag{4.20}
\]


in the vicinity of any critical point in which $d\alpha$ is nondegenerate. The integral $Z_{(n_1,s_1),\dots,(n_k,s_k)}$ in (4.20) over the neighbourhood $\mathcal{N}_{(n_1,s_1),\dots,(n_k,s_k)}$ is determined by the local behaviour of $\alpha$ and the $G$-action near $\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$. Then

\[
Z = \sum_{\substack{(n_1,s_1),\dots,(n_k,s_k) \\ \sum_i n_i = nN,\ \sum_i s_i = n}} Z_{(n_1,s_1),\dots,(n_k,s_k)}. \tag{4.21}
\]
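The trace identity behind (4.16) and (4.19) relies only on cyclicity of the trace, and the nondegeneracy argument on the fact that the commutator of two hermitian matrices is anti-hermitian. The Python sketch below checks both numerically. It is an illustration under simplifying assumptions: `C` and `C0` are independent random hermitian matrices of the same size rather than a covariant coordinate and its projection, and all helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """Matrix commutator [a, b] = a b - b a."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def trace(a):
    """Sum of the diagonal entries."""
    return sum(a[i][i] for i in range(len(a)))


def random_hermitian(n, rng):
    """Random n x n hermitian matrix with entries of order one."""
    m = [[0j] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = complex(rng.uniform(-1, 1), 0.0)
        for j in range(i + 1, n):
            z = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
            m[i][j] = z
            m[j][i] = z.conjugate()
    return m


rng = random.Random(0)
C = random_hermitian(4, rng)     # stand-in for C (assumption)
C0 = random_hermitian(4, rng)    # stand-in for C_0 (assumption)

K = commutator(C, C0)                              # [C, C0], anti-hermitian
lhs = trace(matmul(C0, commutator(C, K)))          # Tr(C0 [C, [C, C0]])
rhs = -trace(matmul(K, K))                         # -Tr([C, C0]^2)
frobenius_sq = sum(abs(x) ** 2 for row in K for x in row)
```

Since $-\mathrm{Tr}\, [C, C_0]^2$ equals the squared Frobenius norm of $[C, C_0]$, it is non-negative and vanishes only when the commutator itself vanishes.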

As expected [24], the sum over critical surfaces in (4.21) contains the sum over weights $1 \leq n_1 \leq n_2 \leq \cdots \leq n_k$ of the gauge group $G = U(nN)$. Our explicit computations will confirm the local behaviour of the partition function given by [24]

\[
Z_{(n_1,s_1),\dots,(n_k,s_k)} = (g')^{-\dim(G)}\, e^{-\frac{1}{2g'} \sum_i n_i^2}\, H_{(n_1,s_1),\dots,(n_k,s_k)}(g'). \tag{4.22}
\]

The smooth function $H_{(n_1,s_1),\dots,(n_k,s_k)} : \mathbb{R} \to \mathbb{C}$, which is bounded by a polynomial at infinity, is determined by the equivariant Euler class of the fixed point locus corresponding to the weight $(n_1, \dots, n_k)$ after reducing the integral over $\mathfrak{g}$ to its Cartan subalgebra, as we do explicitly in the next section.

4.2. Explicit evaluation of the localization forms. The explicit computation of the local contributions $Z_{(n_1,s_1),\dots,(n_k,s_k)}$ to the Yang-Mills partition function on $S_N^2$ will rely on the local behaviour of the invariant one-form $\alpha$ introduced in (4.14) near the Yang-Mills critical points. We will now pause to derive explicit expressions for the BRST transformations (4.11) on the subspaces appearing in the tangent space decomposition (3.36). Given the invariant Maurer-Cartan one-form (2.42) and the projector (3.63), consider the $\mathfrak{u}(nN)$-valued one-form

\[
\theta_0 := \Pi_0(\theta) = \tfrac{1}{2}\, \mathrm{tr}_\sigma(\theta), \tag{4.23}
\]

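where $\mathrm{tr}_\sigma$ denotes the partial trace over the spin matrices $\sigma^\mu$. It is given explicitly by

\[
\theta_0 = \frac{4}{N^2}\, (C\, dC)_0 = \frac{4}{N^2}\, \big(C_i\, dC^i + C_0\, dC_0\big) \tag{4.24}
\]

and satisfies

\[
d\theta_0 = -\tfrac{1}{2}\, \mathrm{tr}_\sigma\, \theta^2 = -\Pi_0\big(\theta^2\big). \tag{4.25}
\]

One has

\[
\langle \theta, V_\phi \rangle = \frac{2}{N^2}\, [C, V_\phi] = -\frac{2\,\mathrm{i}}{N}\, J(V_\phi) \qquad \text{and} \qquad \langle \theta_0, V_\phi \rangle = -\frac{2\,\mathrm{i}}{N}\, \Pi_0 J(V_\phi) \tag{4.26}
\]

for any tangent vector $V_\phi = \mathrm{i}\, [C, \phi]$. Using the identity $C\, dC = -dC\, C$, the localization one-form (4.14) can now be written as

\[
\alpha = -\mathrm{i}\, \frac{N^2}{2}\, \mathrm{Tr}(C_0\, \theta) = -\mathrm{i}\, \frac{N^2}{2}\, \mathrm{Tr}(C\, \theta_0). \tag{4.27}
\]

Hence the pairing in (4.11) is given by

\[
\langle \alpha, V_\phi \rangle = -N\, \mathrm{Tr}\big(C_0\, J(V_\phi)\big) = N\, \mathrm{Tr}\big(J(C_0)\, V_\phi\big) = -N^2\, \mathrm{Tr}\big(J^2(C_0)\, \phi\big). \tag{4.28}
\]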


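This vanishes on the critical surfaces, where $J(C_0) = 0$. Furthermore, for any $g \in \mathfrak{g}$ one has

\[
\big\langle \alpha, J^2(g) \big\rangle = -N\, \mathrm{Tr}\big(C_0\, J^3(g)\big) = N\, \mathrm{Tr}\big(C_0\, J(g)\big) = \mathrm{i}\, \mathrm{Tr}\big(C_0\, [C_0, g]\big) = 0, \tag{4.29}
\]

while for $e_0 \in E_0$ one has

\[
\langle \alpha, e_0 \rangle = \big\langle \alpha, J(e_0') \big\rangle = \big\langle \alpha, J^2(h) \big\rangle = 0 \tag{4.30}
\]

for some $h \in \mathfrak{h}$. Both identities (4.29) and (4.30) hold even off-shell. We also note the on-shell relations

\[
\langle \alpha, J(g) \rangle = -N\, \mathrm{Tr}\big(C_0\, J^2(g)\big) = 0 \qquad \text{and} \qquad \langle \alpha, e_1 \rangle = -N\, \mathrm{Tr}\big(C_0\, J(e_1)\big) = 0 \tag{4.31}
\]

for $e_1 \in E_1$. To evaluate the integral (4.20) using the stationary phase method, we must understand how it behaves near the Yang-Mills critical points. For this, we will study the local behaviour of the BRST variation (4.11), beginning with the pairing $\langle \alpha, V_\phi \rangle$. Let us write a generic gauge field of $O$ as $C = \bar{C} + \varepsilon\, \mathrm{i}\, [\bar{C}, \Phi] + \tfrac{1}{2}\, \varepsilon^2\, \mathrm{i}\, [\bar{C}, \mathrm{i}\, [\bar{C}, \Phi]\,] + O(\varepsilon^3)$, where $\bar{C}$ is the given critical point, $\Phi \in \mathfrak{su}(N)$ are the fluctuations around $\bar{C}$ and $\varepsilon$ is a small real parameter. Then

\[
\begin{aligned}
J^2(C_0) &= 0 + \varepsilon\, \Big( J^2\big(\mathrm{i}\, [\bar{C}, \Phi]_0\big) + \tfrac{\mathrm{i}}{N}\, \big[\mathrm{i}\, [\bar{C}, \Phi], J(\bar{C}_0)\big] + \tfrac{\mathrm{i}}{N}\, J\big([\mathrm{i}\, [\bar{C}, \Phi], \bar{C}_0]\big) \Big) + O\big(\varepsilon^2\big) \\
&= \varepsilon\, \Big( J^2\big((V_\Phi)_0\big) + \tfrac{\mathrm{i}}{N}\, J\big([V_\Phi, \bar{C}_0]\big) \Big) + O\big(\varepsilon^2\big),
\end{aligned} \tag{4.32}
\]

which for $\phi \in \mathfrak{g}$ gives

\[
\begin{aligned}
\langle \alpha, V_\phi \rangle &= -\varepsilon\, N^2\, \mathrm{Tr}\Big( J^2\big((V_\Phi)_0\big)\, \phi + \tfrac{\mathrm{i}}{N}\, \big[J(V_\Phi), \bar{C}_0\big]\, \phi \Big) + O\big(\varepsilon^2\big) \\
&= -\varepsilon\, N^2\, \mathrm{Tr}\Big( (V_\Phi)_0\, J^2(\phi) + J(V_\Phi)\, \tfrac{\mathrm{i}}{N}\, \big[\bar{C}_0, \phi\big] \Big) + O\big(\varepsilon^2\big) \\
&= -\varepsilon\, N^2\, \mathrm{Tr}\Big( V_\Phi\, \big( J^2(\phi)_0 - J(J(\phi)_0) \big) \Big) + O\big(\varepsilon^2\big),
\end{aligned} \tag{4.33}
\]

using (3.65). This is non-degenerate for $\phi \in \mathfrak{g} \ominus \mathfrak{s} \ominus \mathfrak{h}$, i.e. non-vanishing for some $V_\Phi \in T_C O$. To see this, it is sufficient to show that $J\big(J^2(\phi)_0 - J(J(\phi)_0)\big) \neq 0$. Indeed, assuming the contrary $J\big(J^2(\phi)_0\big) = J^2\big(J(\phi)_0\big)$ would imply that either $\phi \in \mathfrak{s}$, or $J(\phi)_0 \in \mathfrak{h}$ which amounts to $\phi \in \mathfrak{h} \oplus \mathfrak{s}$. On the other hand, this pairing is indeed degenerate for any $V_\Phi \in E_1$. For $\phi \in \mathfrak{s}$, the second-order contribution to the form (4.33) can be obtained from

\[
V_\phi = \mathrm{i}\, [C, \phi] = \mathrm{i}\, \varepsilon\, [V_\Phi, \phi] + O\big(\varepsilon^2\big) \tag{4.34}
\]

and

\[
J(C_0) = \tfrac{\mathrm{i}}{N}\, [C, C_0] = \tfrac{\mathrm{i}}{N}\, \varepsilon\, \big( [V_\Phi, C_0] + [C, (V_\Phi)_0] \big) + O\big(\varepsilon^2\big). \tag{4.35}
\]

It follows that

\[
\langle \alpha, V_\phi \rangle = -\varepsilon^2\, \mathrm{Tr}\Big( \mathrm{ad}_\phi(V_\Phi)\, \big( \mathrm{ad}_{C_0}(V_\Phi) + \mathrm{i}\, N\, J\big((V_\Phi)_0\big) \big) \Big) + O\big(\varepsilon^3\big). \tag{4.36}
\]

In particular, for $V_\Phi \in E_1$ this pairing simplifies to

\[
\langle \alpha, V_\phi \rangle = -\varepsilon^2\, \mathrm{Tr}\big( \mathrm{ad}_\phi(V_\Phi)\, \mathrm{ad}_{C_0}(V_\Phi) \big) + O\big(\varepsilon^3\big). \tag{4.37}
\]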

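We now turn to the exact part $d\alpha$ of (4.11). Using (2.44)–(2.46), one finds

\[
d\alpha = -\mathrm{i}\, \frac{N^2}{2}\, \mathrm{Tr}\big( dC\, \theta_0 - C_0\, \theta^2 \big) = -\mathrm{i}\, \frac{N^2}{2}\, \mathrm{Tr}\big( C\, \theta\, \theta_0 + C_0\, d\theta \big). \tag{4.38}
\]

For flat connections with $F = 0$, the second term in the first equality of (4.38) vanishes and one has

\[
d\alpha = -\mathrm{i}\, \frac{N^2}{2}\, \mathrm{Tr}(dC\, \theta_0) = -\mathrm{i}\, \frac{N^2}{2}\, \mathrm{Tr}(C\, \theta\, \theta_0) \qquad \text{if} \quad C_0 = \tfrac{1}{2}\, \mathbb{1}_{nN}. \tag{4.39}
\]

From (2.43) and (2.44) one generally has $\theta^2 = -\frac{4}{N^2}\, (dC)^2$, and hence

\[
\big\langle \mathrm{Tr}(C_0\, \theta^2), V_\phi \wedge V_\psi \big\rangle = \frac{4}{N^2}\, \mathrm{Tr}\big( C_0\, [\,[C, \phi], [C, \psi]\,] \big) = \frac{4}{N^2}\, \mathrm{Tr}\big( [C_0, [C, \phi]\,]\, [C, \psi] \big) = -\frac{4}{N^2}\, \mathrm{Tr}\big( \mathrm{ad}_{C_0}(V_\phi)\, V_\psi \big) \tag{4.40}
\]

for any pair of tangent vectors $V_\phi = \mathrm{i}\, [C, \phi]$ and $V_\psi = \mathrm{i}\, [C, \psi]$. Similarly, one has

\[
\big\langle \mathrm{Tr}(C\, \theta\, \theta_0), V_\phi \wedge V_\psi \big\rangle = \big\langle \mathrm{Tr}(dC\, \theta_0), V_\phi \wedge V_\psi \big\rangle = -\frac{2\,\mathrm{i}}{N}\, \mathrm{Tr}\big( V_\phi\, J(V_\psi)_0 - V_\psi\, J(V_\phi)_0 \big), \tag{4.41}
\]

which vanishes if any of the arguments belongs to the subspace $E_1$. If $V_\psi = J(h) \in E_0$ for some $h \in \mathfrak{h}$, then by using the map (3.35) along with (4.41) one computes the on-shell pairing

\[
\begin{aligned}
\big\langle \mathrm{Tr}(C\, \theta\, \theta_0), V_\phi \wedge V_\psi \big\rangle &= -\frac{2\,\mathrm{i}}{N}\, \mathrm{Tr}\big( V_\phi\, J(j(h))_0 - J(h)_0\, J(V_\phi) \big) \\
&= -\frac{2}{N^2}\, \mathrm{Tr}\big( \mathrm{ad}_{C_0}(V_\phi)\, j(h) + \mathrm{ad}_{C_0}(h)\, J(V_\phi) \big) \\
&= -\frac{2}{N^2}\, \mathrm{Tr}\big( N\, \mathrm{ad}_{C_0}(J(\phi))\, j(h) + \mathrm{ad}_{C_0}(h)\, J(V_\phi) \big) \\
&= -\frac{2}{N^2}\, \mathrm{Tr}\big( -N\, \mathrm{ad}_{C_0}(\phi)\, J^2(h) + \mathrm{ad}_{C_0}(h)\, J(V_\phi) \big) \\
&= -\frac{2}{N^2}\, \mathrm{Tr}\big( N\, \mathrm{ad}_{C_0}(J(\phi))\, J(h) - \mathrm{ad}_{C_0}(J(h))\, V_\phi \big) \\
&= -\frac{2}{N^2}\, \mathrm{Tr}\big( \mathrm{ad}_{C_0}(V_\phi)\, V_\psi - \mathrm{ad}_{C_0}(V_\psi)\, V_\phi \big).
\end{aligned} \tag{4.42}
\]

This coincides with (4.40), and in particular it vanishes unless the vector field $V_\phi$ also belongs to the subspace $E_0$. In summary, we have the on-shell evaluations

\[
\langle d\alpha, V_\phi \wedge V_\psi \rangle = 2\,\mathrm{i}\, \mathrm{Tr}\big( V_\phi\, \mathrm{ad}_{C_0}(V_\psi) \big) \qquad \text{if} \quad V_\psi \in E_1 \tag{4.43}
\]

and

\[
\langle d\alpha, V_\phi \wedge V_\psi \rangle = 0 \qquad \text{if} \quad V_\psi \in E_0. \tag{4.44}
\]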


4.3. Localization at the vacuum moduli space. We will now compute the localized partition function $Z_0 := Z_{(N,1),\dots,(N,1)}$ at the vacuum critical surface. We denote this gauge orbit as

\[
O_0 := \mathcal{C}_{(N,1),\dots,(N,1)} = \big\{ g\, C\, g^{-1} \;\big|\; g \in U(nN) \big\} \cong U(nN)/U(n). \tag{4.45}
\]

In this case the subspaces $E_0$ and $E_1$ in (3.36) are trivial. Localization implies that we can restrict ourselves to a $G$-equivariant tubular neighbourhood $\mathcal{N}_0 = \mathcal{N}_{(N,1),\dots,(N,1)}$ of the critical surface, under the action of the gauge group $G = U(nN)$. The neighbourhood $\mathcal{N}_0$ has an equivariant retraction [45, Chap. 27] by a local equivariant symplectomorphism onto the local symplectic model $\mathcal{F}_0$, defined to be an equivariant symplectic vector bundle over $O_0$ with fibre $J^2(\mathfrak{g} \ominus \mathfrak{s})$ which is a sub-bundle of the tangent bundle $TO$ restricted to $O_0$. This means that the tangent space to $\mathcal{F}_0$ at the vacuum critical point $C$ in (3.9) is given by $T_C O_0 \oplus J^2(\mathfrak{g} \ominus \mathfrak{s}) \cong J(\mathfrak{g} \ominus \mathfrak{s}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{s}) = T_C O$, the symplectic two-form on $\mathcal{F}_0$ is simply $\omega$, and the hamiltonian $G$-action on $\mathcal{F}_0$ descends from the moment map $\mu$. In physical terms, the gauge fields are decomposed along the vacuum moduli space $O_0$ plus infinitesimal non-gauge variations in the subspace $J^2(\mathfrak{g} \ominus \mathfrak{s})$. Due to the presence of the localization form $\alpha$ in the path integral, we can restrict ourselves to this model $\mathcal{F}_0$ and use it to replace the open neighbourhood $\mathcal{N}_0$ [27]. Indeed, because $\mathcal{F}_0$ is an equivariant retraction from $\mathcal{N}_0$, the $G$-equivariant cohomology of $\mathcal{N}_0$ is the same as that of $\mathcal{F}_0$. Furthermore, since the fibres of the bundle $\mathcal{F}_0$ are contractible, its $G$-equivariant cohomology is identified under pullback with the $S$-equivariant cohomology of its base space $O_0$, so that $H_G(\mathcal{N}_0) \cong H_S(O_0)$. Since $S$ acts trivially on $O_0$, one has $H_S(O_0) \cong \mathbb{C}[[\mathfrak{s}]]^S \otimes H(O_0)$ and the $S$-equivariant cohomology classes of $O_0$ coincide with ordinary cohomology classes of $O_0$ valued in the ring of invariant functions on the stabilizer $\mathfrak{s}$.
Putting everything together gives an isomorphism $H_G(\mathcal{N}_0) \cong \mathbb{C}[[\mathfrak{s}]]^S \otimes H(O_0)$ which reduces the equivariant integral over $\mathfrak{g} \times \mathcal{N}_0$ in (4.20) to an ordinary integral over $\mathfrak{s} \times O_0$. This is precisely the nonabelian localization that is formally carried out in [26], and will turn out to be very much like the localization at the trivial connection of Chern-Simons theory on a Seifert homology sphere [27]. In the present case, the integral over $\phi \in \mathfrak{s}$ will then give the interesting non-trivial quantum fluctuation determinants about the classical solution. We will now carry out this reduction explicitly. Let $g_i$ be an orthonormal basis of $\mathfrak{g}' = \mathfrak{g} \ominus \mathfrak{s}$, and consider the corresponding basis

\[
J_i = J(g_i) \qquad \text{and} \qquad \tilde{J}_j = J^2(g_j) \tag{4.46}
\]

of $T_C O = J(\mathfrak{g} \ominus \mathfrak{s}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{s})$, with the dual basis $\lambda^i$, $\tilde{\lambda}^j$ defined by

\[
\big\langle \lambda^i, J_j \big\rangle = \delta^i{}_j, \qquad \big\langle \tilde{\lambda}^i, \tilde{J}_j \big\rangle = \delta^i{}_j \qquad \text{and} \qquad \big\langle \lambda^i, \tilde{J}_j \big\rangle = \big\langle \tilde{\lambda}^i, J_j \big\rangle = 0. \tag{4.47}
\]

Introduce the functions

\[
f_i = \langle \alpha, J_i \rangle \tag{4.48}
\]

which vanish on-shell but have non-degenerate derivatives $df_i$ due to (4.33). Then by expanding $\phi = \phi^i\, g_i + \phi^a\, s_a$ into components $\phi^i$ along $\mathfrak{g} \ominus \mathfrak{s}$ and $\phi^a$ along $\mathfrak{s}$, we have

\[
\langle \alpha, V_\phi \rangle = N\, \langle \alpha, J(\phi) \rangle = N\, f_i\, \phi^i. \tag{4.49}
\]


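It follows that the localization one-form can be expanded as

\[
\alpha = f_i\, \lambda^i \tag{4.50}
\]

with

\[
d\alpha = df_i \wedge \lambda^i + f_i\, d\lambda^i. \tag{4.51}
\]

In particular, one has

\[
\frac{(d\alpha)^d}{d!} = \bigwedge_{i=1}^{d} df_i \wedge \lambda^i + f_j\, \Upsilon^j, \tag{4.52}
\]

where $d = \dim_{\mathbb{C}}(O) = n^2\, (N^2 - 1)$ is the (real) dimension of the vacuum orbit $O_0$. The forms $f_j\, \Upsilon^j$ vanish on-shell, and are killed by localization in the large $t$ limit. For example, inner products of the form $\langle \alpha, J(s) \rangle$, $s \in \mathfrak{s}$ are non-vanishing off-shell at second order due to (4.36), but these higher-order terms do not contribute because of the localization in the large $t$ limit. This can be seen explicitly by rescaling $f_i \to \sqrt{t}\, f_i$. The corresponding local contribution to the partition function (4.20) for $t \to \infty$ is then given by

\[
\begin{aligned}
Z_0 &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times \mathcal{F}_0} \Big[\frac{d\phi}{2\pi}\Big]\, \frac{t^d}{d!}\, (d\alpha)^d\; e^{-\mathrm{i}\, t\, \langle \alpha, V_\phi \rangle - \mathrm{i}\, \mathrm{Tr}(C_0 \phi) - \frac{g'}{2}\, \mathrm{Tr}(\phi^2)} \\
&= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times \mathcal{F}_0} \Big[\frac{d\phi}{2\pi}\Big]\, t^d \bigwedge_{i=1}^{d} df_i \wedge \lambda^i\; e^{-\mathrm{i}\, N\, t\, f_i \phi^i - \mathrm{i}\, \mathrm{Tr}(C_0 \phi) - \frac{g'}{2}\, \mathrm{Tr}(\phi^2)} \\
&= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{s}} \Big[\frac{d\phi}{2\pi}\Big]\, e^{-\mathrm{i}\, \mathrm{Tr}(C_0 \phi) - \frac{g'}{2}\, \mathrm{Tr}(\phi^2)}\; \frac{1}{N^d} \int_{O_0} \bigwedge_{i=1}^{d} \lambda^i.
\end{aligned} \tag{4.53}
\]

Here the $f_i$ integrals over the fibre $J^2(\mathfrak{g} \ominus \mathfrak{s})$ have produced delta-functions setting $\phi^i = 0$ in $\mathfrak{g} \ominus \mathfrak{s}$. We can carry out the integral over the moduli space $O_0$ in (4.53) by observing that

\[
\frac{1}{N^d} \int_{O_0} \bigwedge_{i=1}^{d} \lambda^i = \int_{G/S} \bigwedge_{i=1}^{d} \eta^i = \frac{\mathrm{vol}(G)}{\mathrm{vol}(S)}, \tag{4.54}
\]

where the pullbacks $J^*(\lambda^i) = \eta^i$ define left-invariant one-forms on the gauge group $G$ dual to $g_i$, with the map $N J$ regarded as the derivative of the diffeomorphism

\[
G/S \longrightarrow O_0, \qquad g \longmapsto g\, C\, g^{-1}. \tag{4.55}
\]

To evaluate the remaining integral over the gauge stabilizer algebra $\mathfrak{s} \cong \mathfrak{u}(n)$ in (4.53), we note that, for the vacuum critical point with $C_0 = \tfrac{1}{2}\, \mathbb{1}_{nN}$, the integrand defines a gauge invariant function $f : \mathfrak{u}(n) \to \mathbb{R}$. We may thus apply to it the Weyl integration formula which reduces its integral over $\mathfrak{u}(n)$ to an integral over the Lie algebra $\mathfrak{u}(1)^n$ of the maximal torus $U(1)^n$ of $U(n)$. It is given by

\[
\int_{\mathfrak{u}(n)} [d\phi]\, f(\phi) = \frac{\mathrm{vol}(U(n))}{n!\, (2\pi)^n} \int_{\mathbb{R}^n} [ds]\, \Delta(s)^2\, f(s), \tag{4.56}
\]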


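where we have identified $\mathfrak{u}(1)^n \cong \mathbb{R}^n$ in a basis where the Cartan subalgebra of $U(n)$ is represented by diagonal $n \times n$ matrices $s = \mathrm{diag}(s_1, \dots, s_n)$ by mapping them onto $n$-vectors $s = (s_1, \dots, s_n) \in \mathbb{R}^n$. Here

\[
\Delta(s) = \prod_{i<j} (s_i - s_j) = \det_{1 \leq i,j \leq n} \big( s_i^{\,j-1} \big) \tag{4.57}
\]

is the Vandermonde determinant, which is the Weyl determinant for $U(n)$ arising as the jacobian for the diagonalization of hermitian matrices on the left-hand side of (4.56). The factor $n!$ is the order of the Weyl group $S_n$ of $U(n)$ acting by permutations of the components $s_i$ of $s \in \mathbb{R}^n$, while $(2\pi)^n$ is the volume of the maximal torus $U(1)^n$ with respect to the chosen invariant Haar measure.

An integral identity. We will make use here and in Sect. 4.4 below of the integral identity

\[
\int_{\mathbb{R}^n} [ds]\, \Delta(s)^2\, e^{-\mathrm{i}\, \frac{N}{2} \sum_i s_i + \frac{\mathrm{i}}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2} = e^{-\frac{n N^2 - m N}{4 g}} \int_{\mathbb{R}^n} [ds]\, \Delta(s)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2}, \tag{4.58}
\]

where $m = \sum_i m_i$. To derive (4.58), we set $s = \sum_i s_i$ and $t_i = s_i - \frac{1}{n}\, s$ so that $\sum_i t_i = 0$. Then

\[
\begin{aligned}
\int_{\mathbb{R}^n} [ds]\, \Delta(s)^2\, e^{-\mathrm{i}\, \frac{N}{2} \sum_i s_i + \frac{\mathrm{i}}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2}
&= \int_{\mathbb{R}} ds\; e^{-\mathrm{i}\, \frac{N}{2}\, s + \mathrm{i}\, \frac{m}{4n}\, s} \int_{\mathbb{R}^n} [dt]\, \Delta(t)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i (t_i + \frac{1}{n} s)^2} \\
&= \int_{\mathbb{R}} ds\; e^{-\mathrm{i}\, \frac{N}{2}\, s + \mathrm{i}\, \frac{m}{4n}\, s - \frac{g}{4n}\, s^2} \int_{\mathbb{R}^n} [dt]\, \Delta(t)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2} \\
&= 2\, \sqrt{\frac{\pi n}{g}}\; e^{-\frac{(2 n N - m)^2}{16 n g}} \int_{\mathbb{R}^n} [dt]\, \Delta(t)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2}.
\end{aligned} \tag{4.59}
\]

On the other hand

\[
\begin{aligned}
\int_{\mathbb{R}^n} [ds]\, \Delta(s)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i s_i - \frac{g}{4} \sum_i s_i^2}
&= \int_{\mathbb{R}} ds\; e^{\mathrm{i}\, \frac{m}{4n}\, s} \int_{\mathbb{R}^n} [dt]\, \Delta(t)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i (t_i + \frac{1}{n} s)^2} \\
&= \int_{\mathbb{R}} ds\; e^{\mathrm{i}\, \frac{m}{4n}\, s - \frac{g}{4n}\, s^2} \int_{\mathbb{R}^n} [dt]\, \Delta(t)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2} \\
&= 2\, \sqrt{\frac{\pi n}{g}}\; e^{-\frac{m^2}{16 n g}} \int_{\mathbb{R}^n} [dt]\, \Delta(t)^2\, e^{\frac{\mathrm{i}}{4} \sum_i m_i t_i - \frac{g}{4} \sum_i t_i^2}.
\end{aligned} \tag{4.60}
\]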
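The identity (4.58) can be checked numerically in the smallest nontrivial case $n = 2$, where $\Delta(s)^2 = (s_1 - s_2)^2$. The Python sketch below is only a sanity check under illustrative assumptions: the values of `N`, `g` and the charges `m` are arbitrary choices, the integrals are approximated by Riemann sums on a finite grid, and the function name is hypothetical.

```python
import cmath
import math


def weighted_vandermonde_integral(linear, g, half_width=10.0, step=0.1):
    """Riemann-sum estimate over R^2 of
    (s1 - s2)**2 * exp(i*(linear[0]*s1 + linear[1]*s2) - (g/4)*(s1**2 + s2**2))."""
    n_pts = int(round(2 * half_width / step)) + 1
    grid = [-half_width + k * step for k in range(n_pts)]
    # One-dimensional weights for each of the two variables.
    w1 = [cmath.exp(1j * linear[0] * s - 0.25 * g * s * s) for s in grid]
    w2 = [cmath.exp(1j * linear[1] * s - 0.25 * g * s * s) for s in grid]
    total = 0j
    for s1, a in zip(grid, w1):
        for s2, b in zip(grid, w2):
            total += (s1 - s2) ** 2 * a * b
    return total * step * step


N, g = 1, 2.0                   # illustrative values only
m = (1.0, -2.0)                 # illustrative charges m_i
n, m_total = len(m), sum(m)

# Left-hand side of (4.58): linear coefficient -N/2 + m_i/4 for each s_i.
lhs = weighted_vandermonde_integral([-N / 2 + mi / 4 for mi in m], g)

# Right-hand side of (4.58): exponential prefactor times the shifted integral.
prefactor = math.exp(-(n * N ** 2 - m_total * N) / (4 * g))
rhs = prefactor * weighted_vandermonde_integral([mi / 4 for mi in m], g)
```

The agreement reflects the fact that $\Delta(s)$ is invariant under a common shift of all the $s_i$, so completing the square in the sum $s = \sum_i s_i$ only produces the exponential prefactor.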


Final reduction. From (4.53), (4.54) and (4.56) we obtain ! g dφ 1 2 Z0 = e − i Tr (C0 φ)− 2 Tr (φ ) vol(S) s 2π ! g ds 1 1 2 − i N2 i si − 4 = e (s) 2 n n! (2π ) 2π Rn

i

si2

,

(4.61)

where we have substituted (4.2) and used vol(S) = N N /2 vol(U (n)) with respect to the Cartan-Killing metric on s, since S = U (n) ⊗ 1l N . Applying the integral identity (4.58) therefore allows us to finally write the partition function as N2 g 2 1 1 − n 4g Z0 = e [ds] (s)2 e − 4 i si . (4.62) 2 +n n n! (2π ) Rn 2

The exponential prefactor in the above expression is the Boltzmann weight of the action (3.15) evaluated on the vacuum solution. The remaining quantum fluctuation integral is the standard expression [19] for the contribution from the global minimum of the Yang-Mills action on $S^2$ to the $U(n)$ sphere partition function. It arises from the trivial instanton configuration with vanishing monopole charges $m_i = 0$ in (3.17).

4.4. Localization at maximally irreducible saddle points. We now turn to the opposite extreme and look at the local contribution to the partition function (4.20) from a generic maximally non-degenerate critical surface. We denote this gauge orbit by

\[
O_{\max} := \mathcal{C}_{(n_1,1),\dots,(n_n,1)} = \big\{ g\, C\, g^{-1} \;\big|\; g \in U(nN - c_1) \big\} \cong U(nN - c_1)/U(1)^n \tag{4.63}
\]

and assume that the integers $n_1 > n_2 > \cdots > n_n$ are explicitly specified. Here we allow also $c_1 \neq 0$ which describes sectors with non-vanishing $U(1)$ monopole number (3.13). We want to compute the integral $Z_{\max}$ in (4.20) over a local neighbourhood $\mathcal{N}_{\max}$ of $O_{\max}$, which is independent of $t$ in the large $t$ limit. We first need to find a suitable basis for the tangent space $T_C O$ at the irreducible critical point $C$. The definition of the basis $J_i$, $\tilde{J}_i$ introduced in (4.46) naturally extends to include the non-trivial subspaces $E_0$, $E_1$ in this case with

\[
J_i = J(g_i), \qquad \tilde{J}_j = J^2(g_j), \qquad H_i = J(h_i) \in J(\mathfrak{h}) = E_0 \qquad \text{and} \qquad K_i \in E_1 \tag{4.64}
\]

for $g_i$ and $h_i$ an orthonormal basis of $\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}$ and of $\mathfrak{h} \ominus \mathfrak{s}$, respectively. The elements $K_i$ are assumed to form an orthonormal basis of $E_1$, orthogonal to $J(\mathfrak{g}) \oplus J^2(\mathfrak{g})$. Recall from Sect. 3.4 that $E_0$ and $E_1$ are naturally complex vector spaces, whose generators are embedded into the tangent space decomposition (3.36) as

\[
K_i = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & X_i & 0 \\ 0 & X_i^\dagger & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{4.65}
\]


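and similarly for $H_i$. The complex structure is given by the map $J$, which amounts to multiplying $X_i$ by $\mathrm{i}$. We accordingly take the real basis $K_i$ to be ordered as $\{K_i\} = \{(\tilde{K}_i, J(\tilde{K}_i))\}$, and similarly for $H_i$. As matrices, all of the generators $H_i$, $K_j$ are hermitian. The corresponding dual one-forms $\beta^i$, $\gamma^i$ are defined as usual by

\[
\big\langle \beta^i, H_j \big\rangle = \delta^i{}_j \qquad \text{and} \qquad \big\langle \gamma^i, K_j \big\rangle = \delta^i{}_j \tag{4.66}
\]

with all other pairings equal to 0. We need to evaluate the pairing $\langle \alpha, V_\phi \rangle$. It vanishes on-shell, and identically on $J^2(\mathfrak{g})$. Its evaluation on $J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ has the form $\langle \alpha, J(g_i) \rangle = f_i$, and as before this implies (4.49). Together with (4.30) and (4.31), it follows that the localization one-form $\alpha$ admits an expansion

\[
\alpha = f_i\, \lambda^i + g_i\, \beta^i + k_i\, \gamma^i, \tag{4.67}
\]

where $f_i$, $g_i$, $k_i$ vanish on-shell. We can evaluate

\[
d\alpha = df_i \wedge \lambda^i + f_i\, d\lambda^i + dg_i \wedge \beta^i + g_i\, d\beta^i + dk_i \wedge \gamma^i + k_i\, d\gamma^i \tag{4.68}
\]

using (4.44) and (4.43) to get

\[
\langle d\alpha, H_i \wedge H_j \rangle = 0 \qquad \text{and} \qquad \langle d\alpha, K_i \wedge K_j \rangle = A_{ij}, \tag{4.69}
\]

where

\[
A_{ij} = 2\,\mathrm{i}\, \mathrm{Tr}\big( K_i\, \mathrm{ad}_{C_0}(K_j) \big) \tag{4.70}
\]

is an antisymmetric matrix. Furthermore, $d\alpha$ vanishes when evaluated on mixed terms of the form $K_i \wedge J(g)$, $K_i \wedge J^2(g)$, $H_i \wedge J(g')$ and $H_i \wedge J^2(g')$ with $g \in \mathfrak{g}$, $g' \in \mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}$. Therefore

\[
d\alpha = df_i \wedge \lambda^i + \tfrac{1}{2}\, A_{ij}\, \gamma^i \wedge \gamma^j + O(f), \tag{4.71}
\]

where $O(f)$ denotes contributions which vanish on-shell such as $f_i\, d\lambda^i$. One then has

\[
\frac{(d\alpha)^{d - d_0}}{(d - d_0)!} = \mathrm{pfaff}(A) \bigwedge_{i=1}^{2 d_1} \gamma^i \wedge \Bigg( \bigwedge_{j=1}^{d - d_0 - d_1} df_j \wedge \lambda^j \Bigg) + O(f), \tag{4.72}
\]

where $d_0$ (resp. $d_1$) is the complex dimension of the vector space $E_0$ (resp. $E_1$), and

\[
\mathrm{pfaff}(A) = \epsilon^{i_1 \cdots i_{2 d_1}}\, A_{i_1 i_2} \cdots A_{i_{2 d_1 - 1}\, i_{2 d_1}} \tag{4.73}
\]

is the pfaffian of the antisymmetric matrix $A = (A_{ij})$. Let us now recall the local geometry and define its symplectic model. The $G$-equivariant tubular neighbourhood $\mathcal{N}_{\max}$ of $O_{\max}$ has an equivariant retraction [45] by a local equivariant symplectomorphism onto the local symplectic model $\mathcal{F}_{\max}$, defined to be an equivariant symplectic vector bundle over $O_{\max}$ with fibre $J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_1$ which is a sub-bundle of the tangent bundle $TO$ restricted to $O_{\max}$. This means that the tangent space to $\mathcal{F}_{\max}$ is given by

\[
T_C O_{\max} \oplus J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_1 \cong E_0 \oplus J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_1 = T_C O, \tag{4.74}
\]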


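the symplectic form on $\mathcal{F}_{\max}$ is simply $\omega$, and the hamiltonian $G$-action on $\mathcal{F}_{\max}$ descends from the moment map $\mu$. In physical terms, the gauge fields are split along the moduli space $O_{\max}$, plus infinitesimal non-gauge variations belonging to $J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ and unstable modes in the subspace $E_1$. Due to the presence of the localization form $\alpha$ in the action, we can restrict ourselves to this model $\mathcal{F}_{\max}$ replacing $\mathcal{N}_{\max}$. Identically to the case of Sect. 4.3 above, the canonical symplectic integral over $\mathfrak{g} \times \mathcal{N}_{\max}$ will in this way reduce to an integral over $\mathfrak{s} \times O_{\max}$ and the localization now resembles that at an irreducible flat connection of Chern-Simons theory [27]. We may now proceed to calculate

\[
\begin{aligned}
Z_{\max} &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times \mathcal{N}_{\max}} \Big[\frac{d\phi}{2\pi}\Big]\, \exp\Big( \omega + t\, \big(d\alpha - \mathrm{i}\, \langle \alpha, V_\phi \rangle\big) - \mathrm{i}\, \mathrm{Tr}(C_0\, \phi) - \frac{g'}{2}\, \mathrm{Tr}\, \phi^2 \Big) \\
&= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times O_{\max} \times J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \times E_1} \Big[\frac{d\phi}{2\pi}\Big]\, \frac{(t\, d\alpha)^{d - d_0}}{(d - d_0)!} \wedge \frac{\omega^{d_0}}{d_0!}\; e^{-\mathrm{i}\, t\, \langle \alpha, V_\phi \rangle - \mathrm{i}\, \mathrm{Tr}(C_0 \phi) - \frac{g'}{2}\, \mathrm{Tr}(\phi^2)} \\
&= \frac{1}{\mathrm{vol}(G)} \int_{(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus \mathfrak{h} \oplus \mathfrak{s}} \Big[\frac{d\phi}{2\pi}\Big]\, \mathrm{pfaff}(A) \int_{O_{\max} \times J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \times E_1} t^{d - d_0} \bigwedge_{i=1}^{2 d_1} \gamma^i \wedge \Bigg( \bigwedge_{j=1}^{d - d_0 - d_1} df_j \wedge \lambda^j \Bigg) \wedge \frac{\omega^{d_0}}{d_0!} \\
&\qquad \times\; e^{-\mathrm{i}\, t\, (N\, f_i \phi^i + \langle \alpha, V_{\phi'} \rangle) - \mathrm{i}\, \mathrm{Tr}(C_0 \phi) - \frac{g'}{2}\, \mathrm{Tr}(\phi^2)}
\end{aligned} \tag{4.75}
\]

with $\phi' \in \mathfrak{h} \oplus \mathfrak{s}$. In the second line we have used the fact that $d\alpha$ vanishes when evaluated on the subspace $E_0$, and therefore we need $d_0$ powers of $\omega$ to yield a non-trivial volume form. Then $(t\, d\alpha)^{d - d_0} \wedge \omega^{d_0}$ is the only term which survives in the large $t$ limit. We will modify this below by adding a second localization form $\alpha'$ in order to write the localization integral in the generic form (4.20) without the symplectic two-form $\omega$. We can now evaluate the integrals in (4.75) over $f_i$ in the fibre $J^2(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ and $\phi^i \in \mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}$ as in Sect. 4.3 above, which localizes for $t \to \infty$ to an integral over the subspace $E_1$ and the gauge orbit $O_{\max}$ given by

\[
\begin{aligned}
Z_{\max} &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{h} \oplus \mathfrak{s}} \Big[\frac{d\phi}{2\pi}\Big]\, \frac{t^{d_1}\, \mathrm{pfaff}(A)}{N^{d - d_0 - d_1}} \int_{O_{\max} \times E_1} \bigwedge_{i=1}^{2 d_1} \gamma^i \wedge \Bigg( \bigwedge_{j=1}^{d - d_0 - d_1} \lambda^j \Bigg) \wedge \frac{\omega^{d_0}}{d_0!} \\
&\qquad \times\; e^{-\mathrm{i}\, t\, \langle \alpha, V_\phi \rangle - \mathrm{i}\, \mathrm{Tr}(C_0 \phi) - \frac{g'}{2}\, \mathrm{Tr}(\phi^2)}.
\end{aligned} \tag{4.76}
\]

The gauge invariant volume form for the integration domain whose tangent space is $E_0$ is given by the symplectic volume form $\omega^{d_0}/d_0!$, since $d\alpha$ vanishes on $E_0$, but this will be modified below. It remains to compute the integral over $E_1$. Upon evaluating $\langle \alpha, V_\phi \rangle$ at second order on $E_1$, i.e. away from the critical surface, we will find below that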


this pairing becomes a quadratic form which leads to a localization through a gaussian integral. However, to evaluate it explicitly it is easier to first localize the integral over E 0 , which presently is a complicated non-gaussian integral which does not admit a gaussian approximation at t → ∞ and is difficult to evaluate in a closed analytic form. But this can be done by adapting a trick taken from [27], which amounts to adding a further suitable localization one-form α , or equivalently a cohomologically trivial form Qα , to the action in (4.20). Indeed, we may compute Z max using any other invariant form α which is homotopic to α on the open neighbourhood Nmax . The one-form α need only be non-vanishing on E 0 ⊂ Nmax , as the other integrals can be directly carried out. The localization form α . In order to evaluate the integrals over E 0 and h, following [27] we introduce an additional localization term exp(t Qα ) in the partition function with α := − i Tr (θ φ) = − N2 J d Tr (C φ) . (4.77) E0

E0

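The projection onto $E_0$ is equivalent to projecting $\phi \in \mathfrak{g}$ onto $\mathfrak{h}$. This one-form is equivariant on-shell, and it can be extended to the $G$-equivariant tubular neighbourhood $N_{\max}$ of the critical surface $O_{\max}$ as follows. On the tangent space $J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s}) \oplus E_0$ of $T O_{\max}$ (4.74) there is an equivariant projection onto the subspace $E_0$. In this way $\alpha'$ is properly defined on the local model, and can hence be extended to $N_{\max}$. One could also define $\alpha' = -\,i\,\chi\,\mathrm{Tr}\,(\theta\,\phi)\big|_{E_0}$ using a smooth $G$-invariant cutoff function $\chi$ with support near the given saddle-point and $\chi = 1$ in the tubular neighbourhood, which is globally well-defined over $N_{\max}$ as an equivariant differential form. Note that $t_1\,\alpha + t_2\,\alpha'$ vanishes only on the original critical points for any $t_1, t_2 \in \mathbb{R}$ with $t_1 \neq 0$, and no new ones are introduced. Then our previous computation (4.17) would essentially go through, since $\alpha'$ vanishes on $J(\mathfrak{g} \ominus \mathfrak{h} \ominus \mathfrak{s})$ and there are no critical points where $\mathrm{d}\chi \neq 0$. It is therefore just as good a localization form to use as $\alpha$ is. It follows that the modification of the canonical symplectic integral over $N_{\max}$ given by

$$ Z_{\max} = \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times N_{\max}} \Big[\frac{\mathrm{d}\phi}{2\pi}\Big]\, \exp\Big( \omega + t_1\,Q\alpha + t_2\,Q\alpha' - i\,\mathrm{Tr}\,(C_0\,\phi) - \tfrac{g}{2}\,\mathrm{Tr}\,\phi^2 \Big) \tag{4.78} $$

is independent of both $t_1, t_2 \in \mathbb{R}$. Then $\alpha'$ will localize the integral over $\mathfrak{h} \subset \mathfrak{g}$ as well as the integral over the unstable modes in $E_1$, without the need to expand $\langle \alpha, V_\phi \rangle$ to higher order.

Integration over $\mathfrak{h}$. The new localization form $\alpha'$ satisfies

$$ \mathrm{d}\alpha' = i\,\mathrm{Tr}\,\theta^2\,\phi\,\big|_{E_0} = -\,\tfrac{i}{2}\,\mathrm{Tr}\,(\theta\,[\phi, \theta])\big|_{E_0} \tag{4.79} $$

and

$$ \langle \alpha', V_{h_i} \rangle = -\,\tfrac{2}{N}\,\mathrm{Tr}\,\big(J(V_{h_i})\,\phi\big) = \tfrac{2}{N}\,\mathrm{Tr}\,\big(V_{h_i}\,J(\phi)\big) = 2\,\mathrm{Tr}\,\big(H_i\,J(\phi)\big) , \tag{4.80} $$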

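where $H_i = J(h_i)$ with $h_i$ a basis of $\mathfrak{h}$. This produces a gaussian integral localizing $\mathfrak{h}$ to the gauge stabilizer algebra $\mathfrak{s} \cong \mathfrak{u}(1)^n$. To evaluate it, we will need the matrix

$$ M_{ij} := \mathrm{Tr}\,(H_i\,H_j) \tag{4.81} $$

which is hermitian since we take $H_i$ and $h_i$ to be hermitian. Similarly, one has

$$ \langle \mathrm{d}\alpha', H_i \wedge H_j \rangle = \tfrac{4i}{N^2}\,\mathrm{Tr}\,\big(J(H_i)\,[s, J(H_j)]\big) = -\,\tfrac{4i}{N^2}\,\mathrm{Tr}\,\big(H_i\,[s, J^2(H_j)]\big) = \tfrac{4i}{N^2}\,\mathrm{Tr}\,\big(H_i\,[s, H_j]\big) =: \tfrac{4i}{N^2}\,\tilde{A}_{ij} , \tag{4.82} $$

where we have restricted to $\phi = s \in \mathfrak{s}$ using the localization. This implies that

$$ \mathrm{d}\alpha' = \tfrac{2i}{N^2}\,\tilde{A}_{ij}\,\beta^i \wedge \beta^j \qquad \text{and} \qquad \frac{(\mathrm{d}\alpha')^{d_0}}{d_0!} = \Big(\tfrac{4i}{N^2}\Big)^{d_0}\,\mathrm{pfaff}\,\tilde{A}\;\bigwedge_{i=1}^{2d_0} \beta^i . \tag{4.83} $$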

To evaluate the matrices $M = (M_{ij})$ and $\tilde{A} = (\tilde{A}_{ij})$ above explicitly, we recall that the basis $H_i := H_{kl;i}$ (where $k, l$ are block indices) of $E_0$ takes the block form

$$ H_{kl;i} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & Y_{lk;i} & 0 \\ 0 & Y_{lk;i}^\dagger & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} = J(h_{kl;i}) , \tag{4.84} $$

where $h_{kl;i} \in \mathfrak{h}$ is a hermitian block matrix with a similar block decomposition. They are orthogonal for different $k, l$, and we will often omit the indices $k, l$. Note that the complex structure on $E_0$ defined by the map $J$ is compatible with the natural complex structure on $\mathfrak{h}$. This basis is particularly useful for evaluating the pfaffian which appears in (4.83), because $\mathrm{ad}_s(H_{kl;i})$ for $s \in \mathfrak{s}$ acts as multiplication by $(s_k - s_l)$ in the upper-right blocks of (4.84). It follows that

$$ i\,\mathrm{ad}_{C_0}(H_{kl;i}) = c_{lk}\,J(H_{kl;i}) \qquad \text{and} \qquad i\,\mathrm{ad}_s(H_{kl;i}) = (s_l - s_k)\,J(H_{kl;i}) , \tag{4.85} $$

where the eigenvalues $c_{lk} > 0$ are defined in (3.39). These formulas hold only for $k > l$, and analogous statements are true for the subspace $E_1$. We can choose an orthogonal basis $Y_i$ such that $G_{ij} = 2\,\mathrm{Tr}\,Y_i\,Y_j^\dagger$ is diagonal, as $G_{ij}$ is a hermitian matrix. Then

$$ \mathrm{Tr}\,(H_i\,H_j) = \mathrm{Tr}\,\big(Y_i\,Y_j^\dagger + Y_i^\dagger\,Y_j\big) = G_{ij} , \qquad \mathrm{Tr}\,\big(H_i\,J(H_j)\big) = \mathrm{Tr}\,\big(i\,Y_i\,Y_j^\dagger - i\,Y_i^\dagger\,Y_j\big) = 0 . \tag{4.86} $$

This means that the symmetric matrix $M = (M_{ij})$ in (4.81) has the block decomposition

$$ M = \begin{pmatrix} G & 0 \\ 0 & G \end{pmatrix} \tag{4.87} $$

in the basis $(\tilde{H}_i, J(\tilde{H}_i))$, and similarly the matrix $\tilde{A}$ in (4.82) is given by

$$ \tilde{A}_{ij} = \mathrm{Tr}\,\big(H_i\,\mathrm{ad}_s(H_j)\big) = -\,i\,(s_l - s_k)\,\mathrm{Tr}\,\big(H_i\,J(H_j)\big) = -\,i\,(s_k - s_l) \begin{pmatrix} 0 & G \\ -G & 0 \end{pmatrix}_{ij} . \tag{4.88} $$

We can read off the pfaffian from this expression and use (4.87) to write it as

$$ \mathrm{pfaff}\,\tilde{A} = (-\,i)^{d_0}\,\sqrt{\det(M)}\;\prod_{k>l} (s_k - s_l)^{|n_k - n_l| + 1} . \tag{4.89} $$
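As a sanity check on the step from (4.88) to (4.89), the following Python sketch computes the pfaffian of a single block of the form $w \begin{pmatrix} 0 & G \\ -G & 0 \end{pmatrix}$ with diagonal $G$ and compares it with the standard closed form $(-1)^{d(d-1)/2}\,w^d \det G$. The function names, the sample values of $G$ and the weight $w = -i\,(s_k - s_l)$ are illustrative assumptions; the overall sign depends on the ordering of the basis, which the text fixes by a choice of orientation.

```python
import numpy as np


def pfaffian(a):
    """Pfaffian of an antisymmetric matrix, by expansion along the first row."""
    a = np.asarray(a, dtype=complex)
    size = a.shape[0]
    if size % 2 == 1:
        return 0.0j
    if size == 0:
        return 1.0 + 0.0j
    if size == 2:
        return a[0, 1]
    total = 0.0j
    for j in range(1, size):
        # Remove rows/columns 0 and j and recurse on the remaining minor.
        keep = [k for k in range(size) if k not in (0, j)]
        total += (-1) ** (j - 1) * a[0, j] * pfaffian(a[np.ix_(keep, keep)])
    return total


def weighted_block(g_diag, weight):
    """Return weight * [[0, G], [-G, 0]] with G = diag(g_diag)."""
    g = np.diag(np.asarray(g_diag, dtype=complex))
    zero = np.zeros_like(g)
    return weight * np.block([[zero, g], [-g, zero]])


def expected_pfaffian(g_diag, weight):
    """Closed form (-1)^{d(d-1)/2} * weight^d * det(G) for the block above."""
    d = len(g_diag)
    return (-1) ** (d * (d - 1) // 2) * weight ** d * np.prod(g_diag)


if __name__ == "__main__":
    g_values = [1.5, 0.7, 2.0]      # hypothetical diagonal entries of G
    w = -1j * 0.9                   # hypothetical weight -i (s_k - s_l)
    block = weighted_block(g_values, w)
    print(pfaffian(block), expected_pfaffian(g_values, w))
```

Since $\sqrt{\det M} = \det G$ for the block form (4.87), the ratio of this pfaffian to $\sqrt{\det M}$ is $\pm(-i\,(s_k - s_l))^{d}$ per block, which is the structure displayed in (4.89).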

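We can now evaluate the localization integral

$$ \int_{\mathfrak{h}} \Big[\frac{\mathrm{d}\phi}{2\pi}\Big]\, t_2^{d_0}\,\frac{(\mathrm{d}\alpha')^{d_0}}{d_0!}\; e^{-\,i\,t_2\,\langle \alpha', V_\phi \rangle} = \Big(\tfrac{4i}{N^2}\Big)^{d_0}\, t_2^{d_0}\;\mathrm{pfaff}\,\tilde{A} \int_{\mathfrak{h}} \Big[\frac{\mathrm{d}\phi}{2\pi}\Big]\, e^{-\,2\,i\,t_2\,\phi^i\,M_{ij}\,\phi^j}\;\bigwedge_{i=1}^{2d_0} \beta^i , \tag{4.90} $$

where $\phi = \phi^i\,h_i = \phi^{kl;i}\,h_{kl;i}$. The oscillatory gaussian integral is defined by analytic continuation $t_2 \to t_2 - i\,\varepsilon$ for a small positive parameter $\varepsilon$, which we are free to do as the partition function is formally independent of $t_2$. With this continuation understood and a suitable orientation of the vector space $\mathfrak{h}$, we readily compute

$$ \int_{\mathfrak{h}} \Big[\frac{\mathrm{d}\phi}{2\pi}\Big]\, t_2^{d_0}\,\frac{(\mathrm{d}\alpha')^{d_0}}{d_0!}\; e^{-\,i\,t_2\,\langle \alpha', V_\phi \rangle} = \Big(\tfrac{4i}{N^2}\Big)^{d_0}\,\frac{1}{(2\pi)^{2d_0}}\,\Big(-\frac{\pi}{2i}\Big)^{d_0}\,\frac{\mathrm{pfaff}\,\tilde{A}}{\sqrt{\det(M)}}\;\bigwedge_{i=1}^{2d_0} \beta^i = \frac{i^{d_0}}{(2\pi N^2)^{d_0}}\;\prod_{k>l} (s_k - s_l)^{|n_k - n_l| + 1}\;\bigwedge_{i=1}^{2d_0} \beta^i . \tag{4.91} $$

This integral thus produces a measure on $\mathfrak{s}$ which we will use below to perform the remaining integral over the stabilizer.

Integration over $E_1$. Now that the $\phi$-integration in (4.76) is localized onto $\mathfrak{s}$, we can proceed to evaluate the integral over $E_1$. This space has a basis $K_i$ with block decomposition $K_{kl;i}$ similar to (4.84) for $n \geq k > l \geq 1$ (for $k < l$ the $K_{kl;i}$ do not exist), which are non-vanishing if $n_k > n_l + 1$. We need to evaluate $\langle \alpha, V_s \rangle$ for $s \in \mathfrak{s}$ up to second order in the fluctuations about the critical point in $E_1$, which is non-tangential to the gauge orbit $O_{\max}$. For this, we introduce real linear coordinates $x^i, y^i$, $i = 1, \dots, d_1$ on $E_1$ such that a generic vector $V \in E_1$ is parametrized as

$$ V = x^i\,K_i + y^i\,J(K_i) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & z^i\,X_i & 0 \\ 0 & \bar{z}^i\,X_i^\dagger & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} , \tag{4.92} $$

where we have introduced complex coordinates $z^i = x^i + i\,y^i$. Then $\gamma^i = \mathrm{d}x^i$ and $\gamma^{i+d_1} = \mathrm{d}y^i$ for $i = 1, \dots, d_1$. As above, we can choose coordinates such that $G_{ij} = 2\,\mathrm{Tr}\,X_i\,X_j^\dagger$ is diagonal. Then (4.37) gives

$$ \langle \alpha, V_s \rangle = -\,\mathrm{Tr}\,\big(\mathrm{ad}_s(V)\,\mathrm{ad}_{C_0}(V)\big) = \big(x^i, y^i\big)\,\tilde{M}_{ij}(s) \begin{pmatrix} x^j \\ y^j \end{pmatrix} \tag{4.93} $$

to second order, where

$$ \tilde{M}_{ij}(s) = \mathrm{Tr}\,\big(K_i\,\mathrm{ad}_s\,\mathrm{ad}_{C_0}(K_j)\big) = (s_k - s_l)\,c_{kl}\,\mathrm{Tr}\,(K_i\,K_j) = (s_k - s_l)\,c_{kl} \begin{pmatrix} G & 0 \\ 0 & G \end{pmatrix}_{ij} \tag{4.94} $$

is a symmetric matrix and we have used the obvious analog of (4.85) for the basis $K_i$. Similarly, the antisymmetric matrix $A$ in (4.70) can be expressed as

$$ A_{ij} = 2\,i\,\mathrm{Tr}\,\big(K_{kl;i}\,\mathrm{ad}_{C_0}(K_{kl;j})\big) = 2\,c_{lk}\,\mathrm{Tr}\,\big(K_{kl;i}\,J(K_{kl;j})\big) = 2\,c_{kl} \begin{pmatrix} 0 & G \\ -G & 0 \end{pmatrix}_{ij} , \tag{4.95} $$

and using (4.94) its pfaffian is therefore given by

$$ \mathrm{pfaff}(A) = 2^{d_1}\,\sqrt{\det \tilde{M}(s)}\;\prod_{k>l} (s_k - s_l)^{1 - |n_k - n_l|} . \tag{4.96} $$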

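The pfaffians $\mathrm{pfaff}(\tilde{A})$ and $\mathrm{pfaff}(A)$ represent the $S$-equivariant Euler classes in $H_S(O_{\max})$ of equivariant bundles over $O_{\max}$ with fibres $E_0$ and $E_1$, respectively, in terms of the weights $s_k$ for the (trivial) $S$-action on $O_{\max}$. They are the typical representatives of fluctuations in equivariant localization [29,41], and they also appear in the nonabelian localization formulas of [27] and of [26]. Using the analytic continuation $t_1 \to t_1 - i\,\varepsilon$ and a suitable orientation of $E_1$ as before, we can now evaluate the oscillatory gaussian integral

$$ \int_{E_1} \prod_{i=1}^{d_1} \mathrm{d}x^i\,\mathrm{d}y^i\; t_1^{d_1}\, e^{-\,i\,t_1\,\langle \alpha, V_s \rangle} = \Big(\frac{\pi}{i}\Big)^{d_1}\,\frac{1}{\sqrt{\det \tilde{M}(s)}} . \tag{4.97} $$

Symplectic integral over $F_{\max}$. Putting the results (4.76), (4.91), (4.96) and (4.97) together, we may evaluate the large $t_1, t_2$ limit of the symplectic integral (4.78) to obtain

$$ \begin{aligned} Z_{\max} &= \frac{1}{\mathrm{vol}(G)} \int_{\mathfrak{g} \times F_{\max}} \Big[\frac{\mathrm{d}\phi}{2\pi}\Big]\, \exp\Big( \mathrm{d}(t_1\,\alpha + t_2\,\alpha') - i\,\langle t_1\,\alpha + t_2\,\alpha', V_\phi \rangle \Big)\; e^{-\,i\,\mathrm{Tr}\,(C_0\,\phi) - \frac{g}{2}\,\mathrm{Tr}\,(\phi^2)} \\ &= \Big(\frac{\pi}{i}\Big)^{d_1}\,\frac{1}{\mathrm{vol}(G)}\,\frac{i^{d_0}}{(2\pi N^2)^{d_0}}\,\frac{1}{N^{d - d_0 - d_1}} \int_{O_{\max}} \Big( \bigwedge_{j=1}^{d - d_0 - d_1} \lambda^j \Big) \wedge \bigwedge_{i=1}^{2d_0} \beta^i \\ &\qquad \times \int_{\mathfrak{s}} \Big[\frac{\mathrm{d}s}{2\pi}\Big]\, \prod_{k>l} (s_k - s_l)^{|n_k - n_l| + 1}\;\frac{\mathrm{pfaff}(A)}{\sqrt{\det \tilde{M}(s)}}\; e^{-\,i\,\mathrm{Tr}\,(C_0\,s) - \frac{g}{2}\,\mathrm{Tr}\,(s^2)} \\ &= \frac{1}{\mathrm{vol}(G)}\,\frac{i^{d_0 - d_1}}{(2\pi)^{d_0 - d_1}}\,\prod_{k=1}^{n} \sqrt{n_k} \int_{\mathbb{R}^n} \Big[\frac{\mathrm{d}s}{2\pi}\Big]\, \Delta(s)^2\; e^{-\,i\,\mathrm{Tr}\,(C_0\,s) - \frac{g}{2}\,\mathrm{Tr}\,(s^2)} \\ &\qquad \times \frac{1}{N^{d + d_0 - d_1}} \int_{O_{\max}} \Big( \bigwedge_{j=1}^{d - d_0 - d_1} \lambda^j \Big) \wedge \bigwedge_{i=1}^{2d_0} \beta^i , \end{aligned} \tag{4.98} $$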

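where we have transformed the integration over $\phi = s = \mathrm{diag}(s_1\,\mathbb{1}_{n_1}, \dots, s_n\,\mathbb{1}_{n_n}) \in \mathfrak{s}$ to an integral over $s = (s_1, \dots, s_n) \in \mathbb{R}^n$. We can carry out the integral over the moduli space $O_{\max}$ by observing again

$$ \frac{1}{N^{d + d_0 - d_1}} \int_{O_{\max}} \Big( \bigwedge_{j=1}^{d - d_0 - d_1} \lambda^j \Big) \wedge \bigwedge_{i=1}^{2d_0} \beta^i = \int_{G/S}\; \bigwedge_{j=1}^{d + d_0 - d_1} \eta^j = \frac{\mathrm{vol}(G)}{\mathrm{vol}(S)} , \tag{4.99} $$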

where J ∗ (λi ) = ηi are left-invariant one-forms on the gauge group G. Note that (4.99) includes the integral over E 0 , and dimR (g s) = d + d0 − d1 . We also have vol(S) = √ N 2π n in our metric on s, since S = k k k U (1) ⊗ 1ln k , and C 0 (n i ) = 2n i 1ln i . Using furthermore d0 − d1 = n 2 − n which is an even integer, we may then bring (4.98) into the form Z max = = =

in

2 −n

(2π )n i

2 +n

n 2 −n

(2π ) in

n 2 +n

2 −n

(2π )n

2 +n

[ds] (s)2 e − i

Rn

i

[ds] (s)2 e − 2

Rn

N n/2 n √ nk

Rn

[d˜s ]

Tr (C0 s)− g2 Tr (s 2 )

N

i

si − g4

' (

ni i N

(

s˜k −

N nk

si2

(4.100) 2

N nl s˜l

e

− 2i

)

g N3 n i s˜i − 4

i

i

s˜i2

,

k>l

k=1

√ where s˜i := n i /N si . Completing the square of the gaussian function of s˜i in (4.100) identifies the Boltzmann weight of the action (3.15) on the solution in

non-degenerate mi (3.16). In the large N limit, we substitute (3.17) with s˜i ≈ 1 + 2N si . Neglecting terms of order N1 then reduces (4.100) to Z max ≈ ±

1 (2π )

n 2 +n

i

Rn

[ds] (s)2 e − 2

N

i

si

e

i 4

i

m i si − g4

i

si2

, (4.101)

and an application of the integral identity (4.58) leads to our final result Z max ≈ ±

1 2 (2π )n +n

e

−n N

2 −m N 4g

Rn

[ds] (s)2 e

i 4

i

m i si − g4

i

si2

. (4.102)

The exponential prefactor in this formula exhibits the shift of the vacuum action, corresponding to the modification of the trace constraint (2.22) to (3.13), by the Chern class $c_1 = m = \sum_i m_i$. The remaining contributions coincide with the classical result [19] for the contribution to the $U(n)$ sphere partition function from the Yang-Mills instanton on $S^2$ specified by the configuration of magnetic monopole charges $m_1, \dots, m_n \in \mathbb{Z}$. In particular, using the standard manipulation of [19] one can change integration variables in (4.102) to identify the anticipated Boltzmann weight of the action (3.20).
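The oscillatory gaussian integrals used in this section, such as (4.97), rely on the standard formula $\int \mathrm{d}^{2d}v\; e^{-i\,\tau\,v \cdot M v} = (\pi / i\tau)^d / \sqrt{\det M}$ for a positive-definite symmetric matrix $M$, with the continuation $\tau = t - i\,\varepsilon$ making the integrand decay. The following Python sketch checks this numerically for a diagonal $M$; the function names, the sample eigenvalues and the finite value of $\varepsilon$ are assumptions made for illustration only.

```python
import numpy as np


def trapezoid(values, grid):
    """Composite trapezoidal rule on a uniform grid."""
    step = grid[1] - grid[0]
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))


def damped_gaussian_1d(coefficient, t, eps, half_width=12.0, points=24001):
    """Approximate the integral of exp(-i (t - i eps) c x^2) over the real line."""
    x = np.linspace(-half_width, half_width, points)
    tau = t - 1j * eps
    return trapezoid(np.exp(-1j * tau * coefficient * x ** 2), x)


def numeric_integral(eigenvalues, t, eps):
    """Product of one-dimensional integrals for M = diag(eigenvalues)."""
    result = 1.0 + 0.0j
    for c in eigenvalues:
        result *= damped_gaussian_1d(c, t, eps)
    return result


def closed_form(eigenvalues, t, eps):
    """(pi / (i tau))^d / sqrt(det M) for an even number 2d of eigenvalues."""
    if len(eigenvalues) % 2 != 0:
        raise ValueError("expected an even number of eigenvalues")
    tau = t - 1j * eps
    d = len(eigenvalues) // 2
    return (np.pi / (1j * tau)) ** d / np.sqrt(np.prod(eigenvalues))


if __name__ == "__main__":
    m_eigenvalues = [1.0, 2.5, 0.8, 1.7]   # hypothetical spectrum of M, d = 2
    print(numeric_integral(m_eigenvalues, 1.0, 0.5))
    print(closed_form(m_eigenvalues, 1.0, 0.5))
```

Multiplying by $t^{d}$ and letting $\varepsilon \to 0$ removes the dependence on $t$, which is why the result of (4.97) is independent of $t_1$.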

5. Abelianization

In the following sections we will describe an alternative technique of evaluating the partition function of $U(n)$ Yang-Mills theory on the fuzzy sphere $S_N^2$, within the framework of our symplectic model. This method can be regarded as a finite-dimensional version of the technique of abelianization for ordinary Yang-Mills theory in two dimensions [28], which can be used to derive the strong-coupling expansion of the gauge theory and agrees with the nonabelian localization. The advantage of this formalism is that it captures all classical contributions to the partition function in a single go and for any $N$, in contrast to nonabelian localization which requires analysis of each type of critical point individually and only yields tractable expressions in the large $N$ classical limit. Its downfall is that it leads to somewhat cumbersome expressions for the partition function which arise from a rather different sort of localization. This is analogous to the case of gauge theory on the two-dimensional noncommutative torus whose strong-coupling expansion involves the addition of infinitely many higher Casimir operators to the usual Migdal formula [34], or its matrix model regularization which is given by a complicated combinatorial formula [33]. This complexity makes it difficult to explicitly extract the contributions from fuzzy sphere instantons, and we will examine this problem more thoroughly in the next section. Here we shall derive in detail our alternative abelianized formula for the partition function (4.1), representing yet another new solution for quantum gauge theory on the fuzzy sphere.

Let us start from the partition function in the form (4.3). The crucial observation is that the function $f : \mathfrak{g} \to \mathbb{R}$ defined by the symplectic integral

$$ f(\phi) := \frac{1}{\mathrm{vol}(G)} \int_{O(\Xi)} \exp\Big( \omega - i\,\mathrm{Tr}\,(C_0\,\phi) - \tfrac{g}{2}\,\mathrm{Tr}\,\phi^2 \Big) \tag{5.1} $$

is gauge invariant. Analogously to what we did in Sect. 4.3, we may therefore apply the Weyl integration formula (4.56) which reduces its integral over the gauge algebra $\mathfrak{g} = \mathfrak{u}(nN)$ to an integral over the Lie algebra $\mathfrak{u}(1)^{nN}$ of the maximal torus $T = U(1)^{nN}$ of $G = U(nN)$. This rewriting of the $\phi$-integral in (4.3) is called diagonalization or abelianization, and it can be thought of as the eigenvalue representation of the gauge theory regarded as a matrix model. In this way we may bring the partition function into the form

$$ Z = \frac{1}{(nN)!} \int_{\mathbb{R}^{nN}} \Big[\frac{\mathrm{d}p}{2\pi}\Big]\, e^{-\frac{g}{4}\,\mathrm{Tr}\,(p^2)}\; \Delta(p)^2\; Z_O(p) , \tag{5.2} $$

where

$$ Z_O(p) = \int_{O(\Xi)} \exp\Big( \omega - \tfrac{i}{2}\,\mathrm{Tr}\,(p\,C) \Big) \tag{5.3} $$

is the Fourier transform of the orbit $O(\Xi)$ and we have identified $(nN)$-vectors with diagonal matrices $p = \mathrm{diag}(p_1, \dots, p_{nN}) \otimes \sigma^0$. Localization can then be applied to the symplectic integral (5.3) in three different ways, by:

1. Considering $p \in \mathfrak{u}(N)$ and observing that $Z_O(p)$ can be regarded as invariant under $p \to U^{-1}\,p\,U$ for $U \in U(N)$. One can then evaluate the integral over the orbit space $O(\Xi)$ directly using the Itzykson-Zuber formula (6.1) for the unitary group $U(N)$. This is essentially the calculation that was carried out in [9], which is adapted to the present formulation in Sect. 6. It amounts to an abelian localization of the original orbit integral via the Duistermaat-Heckman theorem.

2. Considering $p \in \mathfrak{u}(nN) \otimes \sigma^0$ and applying abelian localization to the maximal torus $T$ of the gauge group $G = U(nN)$. This will be elaborated in detail in Sect. 7, taking advantage of a suitable polar decomposition of the orbit space. This in turn will involve a localization onto the radial $U(N_+) \times U(N_-)$-foliation, accompanied by a fluctuation integral over the moduli space of symplectic leaves.

3. Adding a localization form $Q\alpha$ as in Sect. 4, and applying nonabelian localization techniques to write the partition function as a sum over local contributions from Yang-Mills critical points.

Technique 3 here was of course dealt with at length in Sect. 4, and will be compared in some detail to the other two approaches below. Comparison with Technique 1 first is interesting in its own right as a comparison between the matrix model approach of [9] to gauge theory on $S_N^2$ and the results of the present paper. It is also a useful warm-up to the abelianization approach of Technique 2 which shares some of its qualitative features. We will find that the abelianization technique through the polar decomposition of the configuration space exploits the radial coordinates in a rather explicit way to describe the local geometry of Yang-Mills critical surfaces, and it may also find useful applications in related considerations.

6. Itzykson-Zuber Localization on the Configuration Space

The integral (5.3) can be evaluated immediately using the Itzykson-Zuber formula [46], which we briefly recall. If $X, Y$ are $m \times m$ hermitian matrices with nondegenerate eigenvalues $x_i, y_i \in \mathbb{R}$, $i = 1, \dots, m$, then one has

$$ \int_{U(m)} [\mathrm{d}U]\; \exp\Big( i\,\tfrac{N}{s}\,\mathrm{Tr}\,\big(X\,U\,Y\,U^\dagger\big) \Big) = c_N(m, s)\; \frac{\det\limits_{1 \leq i, j \leq m} \Big( e^{\frac{iN}{s}\,x_i\,y_j} \Big)}{\Delta(x)\,\Delta(y)} , \tag{6.1} $$

where for $m \in \mathbb{N}$ and $s \in \mathbb{C}$ we have defined

$$ c_N(m, s) := \mathrm{vol}\,(U(m))\; (i\,N/s)^{-m(m-1)/2}\; \prod_{k=1}^{m-1} k! . \tag{6.2} $$
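The Itzykson-Zuber formula (6.1) can be checked numerically for small $m$. The sketch below is an illustration under stated assumptions, not part of the original computation: it uses the normalized Haar measure (so the factor $\mathrm{vol}(U(m))$ in (6.2) is dropped), a real parameter `t` in place of $iN/s$ so that a plain Monte Carlo average converges quickly, and helper names chosen here.

```python
import math

import numpy as np


def haar_unitary(m, rng):
    """Sample a Haar-distributed unitary matrix via QR with phase correction."""
    z = (rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))) / math.sqrt(2.0)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescales column j of q by the phase of r[j, j]


def vandermonde(v):
    """Product of (v_i - v_j) over i < j."""
    return math.prod(v[i] - v[j] for i in range(len(v)) for j in range(i + 1, len(v)))


def iz_right_hand_side(x, y, t):
    """prod_{k<m} k! * t^{-m(m-1)/2} * det(exp(t x_i y_j)) / (Delta(x) Delta(y))."""
    m = len(x)
    prefactor = math.prod(math.factorial(k) for k in range(1, m)) * t ** (-m * (m - 1) / 2)
    determinant = np.linalg.det(np.exp(t * np.outer(x, y)))
    return prefactor * determinant / (vandermonde(x) * vandermonde(y))


def iz_monte_carlo(x, y, t, samples, seed=0):
    """Monte Carlo estimate of the normalized Haar average of exp(t Tr(X U Y U^dagger))."""
    rng = np.random.default_rng(seed)
    big_x, big_y = np.diag(x), np.diag(y)
    total = 0.0
    for _ in range(samples):
        u = haar_unitary(len(x), rng)
        total += math.exp(t * np.trace(big_x @ u @ big_y @ u.conj().T).real)
    return total / samples


if __name__ == "__main__":
    x_vals, y_vals, t_val = [0.3, -0.5], [1.0, 0.2], 0.7   # hypothetical test data
    print(iz_right_hand_side(x_vals, y_vals, t_val))
    print(iz_monte_carlo(x_vals, y_vals, t_val, samples=20000, seed=1))
```

For $m = 2$ the right-hand side reduces to $(e^{ta} - e^{tb}) / (t\,(a - b))$ with $a = x_1 y_1 + x_2 y_2$ and $b = x_1 y_2 + x_2 y_1$, which makes the cancellation of the pole at coinciding eigenvalues explicit.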

Applied to the present situation for U (N ), this yields 1 Z O ( p) = [dU ] exp − 2i Tr U −1 U vol (U (N+ )) vol (U (N− )) U (N ) i e − 2 i j det 1≤i, j≤N , (6.3) = c1 (N , 2) () () where = diag( p1 , . . . , pn N )⊗σ 0 and c1 (N , 2) := c1 (N , 2)/vol(U (N+ )) vol(U (N− )). This formula can be understood as an abelian localization with respect to the action of the maximal torus group U (1)N on the flag manifold U (N )/U (1)N [29]. The corresponding fixed points are the solutions of the equation [C, ] = 0,

(6.4)

which are the saddle-points of the Itzykson-Zuber integral, and the expansion of the determinant in (6.3) into a sum over permutations π ∈ SN gives the sum over critical points in the localization formula. This is completely analogous to the abelianized localization of Sect. 7. However, the expression (6.3) is formal as it stands because both sets of eigenvalues i and i are degenerate, and correspondingly the critical surfaces are in fact nontrivial spaces. Therefore (6.3) has to be defined using an appropriate limiting procedure which removes the degeneracy. The partition function (4.3) is then given by i ! e − 2 i j det g c (N , 2) dp 2 1≤i, j≤N e − 4 Tr ( p ) ( p)2 Z= 1 , (6.5) (n N )! 2π () () Rn N where the set of eigenvalues i of consists of two copies of ( p1 , . . . , pn N ) and is therefore highly degenerate. While this explicit formula in terms of an n N -dimensional integral is very appealing, the ratio of degenerate determinants in (6.5) makes it difficult to evaluate explicitly [9], and its combinatorial expansion is even more intricate than that of Sect. 7.3. Thus far only an asymptotic analysis (of a slightly modified integral) has been made possible in [9]. The reason for this complexity is the fact that, without the addition of a suitable localization form Qα to the path integral (4.3), the localization is onto the solutions of Eq. (6.4) in O which are not related to the critical surfaces of the Yang-Mills action in any simple way. This will be explored in more detail below. 7. Abelian Localization and Radial Coordinates We now return to the symplectic orbit integral (5.3), and observe that it fulfills the conditions of the Duistermaat-Heckman theorem, or equivalently the abelian version of the localization theorem of Sect. 4.1. Therefore, we have mapped the original nonabelian localization problem to the simpler problem of abelian localization. 
Indeed, $\langle \mu_T(C), p \rangle = \mathrm{Tr}\,(p\,C)$ is just the restriction of the moment map $\mu : O(\Xi) \to \mathfrak{u}(N)^\vee$ to the maximal torus $T$ of the gauge group $G$. The torus action on the orbit space $O(\Xi)$ is the restriction of the adjoint $G$-action given by

$$ C \longmapsto P\,C\,P^{-1} \tag{7.1} $$

for $C = C_\mu \otimes \sigma^\mu = U\,\Xi\,U^{-1} \in O(\Xi)$, $U \in U(N)$ and $P \in T$. To compute the corresponding localization formula we need the fixed points of this $T$-action. They are given by those $C \in O(\Xi)$ which commute with the $T$-action generated by the element $p \in \mathfrak{u}(1)^{nN}$, so that

$$ [C, p] = 0 . \tag{7.2} $$

This equation will be studied in detail in Sect. 8. It is solved by those $U \in U(N)$ for which $U^{-1}\,P\,U$ lies in the stabilizer subgroup $U(nN_+) \times U(nN_-) \subset U(N)$ of the element $\Xi$ (with $N_\pm := N \pm 1$ as before). The saddle points $U$ are generically also labelled by permutation matrices in $U(N)$ representing elements $\pi \in S_{nN}$. On the configuration space $O$, the saddle point equation (7.2) means that $C$ commutes with the characteristic projectors of $p$, i.e. $C$ has the same block decomposition as $p$. The Fourier transform (5.3) will thus generically localize onto a subspace of $U(nN_+) \times U(nN_-)$ in $O$. It may be evaluated with the help of the degenerate version of the Duistermaat-Heckman theorem [29], which expresses it in terms of an integral over the

critical submanifold U (n N+ ) × U (n N− ) with the quantum fluctuation determinants determined by the T -equivariant Euler class of the normal bundle to the stabilizer [41]. While this can be worked out in principle, it is rather cumbersome to do in practise. Instead we will proceed in a more direct fashion by exploiting some further geometrical properties of the configuration space O, which in the next section will be related to the local symplectic geometry near each Yang-Mills critical point as analysed at length in Sect. 3. This explicit calculation will justify the abelianized localization a priori, with the quantum fluctuation determinants given by integrals over symplectic leaves of a foliation of the configuration space parametrized by abelian subspaces of the tangent spaces to O. The symplectic integral (5.3) could also be analysed using Fourier transform techniques along with the Guillemin-Lerman-Sternberg theorem [47], as in [23–25], but this leads to much more complicated combinatorial expressions than the ones we derive. 7.1. Polar decomposition of the configuration space. The key step in the evaluation of (5.3) is the introduction of radial coordinates on the orbit space (see [36–38] for details). Let us go back to the Cartan decomposition (3.21) at a given point C ∈ O. Let t be a maximal abelian subalgebra in the tangent space TC O ∼ = ker(J 2 + 1lN ). Then the radial coordinates on the orbit space O are given by (7.3) U = V R V −1 = V R j V −1 , where V ∈ U (n N+ ) × U (n N− ), modulo elements of the centralizer of t, and R ∈ exp(t) up to the adjoint action of the Weyl group of the restricted root system of the irreducible symmetric space O. By definition, they satisfy the respective commutation and anticommutation relations V = V

R = R −1 .

and

(7.4)

The corresponding covariant coordinate C ∈ O() is then given by C = U U −1 = V R R −1 V −1 = 21 V R 2 + R −2 V −1 =

1 2

V R 2 V −1 + V R −2 V −1 .

(7.5)

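The jacobian for the change of invariant integration measure on $O$ can be computed by standard techniques with the result

$$ \mathrm{d}C = r(n, N)\; [\mathrm{d}V]\; \prod_{i=1}^{\dim(\mathfrak{t})} \mathrm{d}r_i\; \prod_{\alpha > 0} \sin\big( (\alpha, \log R) \big)^{m_\alpha} , \tag{7.6} $$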

where4 r (n, N ) =

vol (U (N ))

2n

2 (N 2 −1)/2

vol (U (n N+ ))2 vol (U (n N− ))2 2n (N −2n N −3)/2

.

(7.7)

4 The normalization constant $r(n, N)$ is determined by the requirement $\int_O \mathrm{d}C = \mathrm{vol}(O)$.

The radial coordinates $r_i \in [0, \frac{\pi}{2}]$ are the eigenvalues of $U$, while $V$ are the angular coordinates with $[\mathrm{d}V]$ denoting the standard invariant Haar measure. The second product runs over positive roots of the restricted root lattice on $O$, and $m_\alpha$ is the multiplicity of the root $\alpha$ in the Cartan decomposition (3.21). The pairing is defined by choosing an orthonormal basis $e_i$ in weight space and identifying a root vector $\alpha$ with the dual element $\alpha^\vee = \sum_i \alpha_i\,e_i$. Then $(\alpha, \log R) = \sum_i \alpha_i\,r_i$. This polar decomposition defines a foliation of the configuration space $O$ by conjugacy classes under the adjoint action of the stabilizer subgroup. The radial symplectic leaves $L(R)$ of this foliation are parametrized by the abelian Lie group $\exp(\mathfrak{t})$.

Let us make this decomposition more explicit using the known data for the symmetric space (2.23) [37]. The restricted root lattice is given by the root system $BC_{nN_-} = B_{nN_-} \cup C_{nN_-}$, which has positive weights $e_i \pm e_j$, $2e_i$ and $e_i$ with $i, j = 1, \dots, nN_-$, $i < j$. The corresponding multiplicities are $m_{e_i \pm e_j} = 2$, $m_{2e_i} = 1$ and $m_{e_i} = 4n$. The gauge invariant volume form on $O$ thereby becomes

$$ \mathrm{d}C = r(n, N)\; [\mathrm{d}V]\; \prod_{i=1}^{nN_-} \mathrm{d}r_i\, \sin 2r_i\, \sin^{4n} r_i\; \prod_{i<j} \sin^2(r_i - r_j)\, \sin^2(r_i + r_j) . \tag{7.8} $$

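Using the trigonometric identities

$$ \sin(r_i - r_j)\,\sin(r_i + r_j) = \tfrac{1}{2}\,(\cos 2r_j - \cos 2r_i) \qquad \text{and} \qquad \sin^2 r_i = \tfrac{1}{2}\,(1 - \cos 2r_i) \tag{7.9} $$

and defining $\lambda_i := \cos 2r_i \in [-1, 1]$, we may bring the measure to the form

$$ \mathrm{d}C = \frac{r(n, N)}{2^{\,n^2 (N^2 - 1)}}\; [\mathrm{d}V]\; \Delta(\lambda)^2\; \prod_{i=1}^{nN_-} \mathrm{d}\lambda_i\, (1 - \lambda_i)^{2n} . \tag{7.10} $$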
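The change of variables from (7.8) to (7.10) can be verified step by step. With $m = nN_-$ radial coordinates, each factor $\mathrm{d}r_i \sin 2r_i$ contributes $2^{-1}$, each $\sin^{4n} r_i$ contributes $2^{-2n}$ and each pair $i < j$ contributes $2^{-2}$, giving the total power $m + 2nm + m(m-1) = n^2(N^2 - 1)$. The Python sketch below checks this counting and the pointwise identity between the two densities; the function names and sample points are illustrative assumptions.

```python
import math
import random


def exponent_of_two(n, N):
    """Total power of 1/2 collected when passing from r_i to lambda_i = cos(2 r_i)."""
    m = n * (N - 1)              # number of radial coordinates, n * N_minus
    from_measure = m             # d r sin(2r) = -(1/2) d lambda
    from_short_roots = 2 * n * m # sin^{4n} r = ((1 - lambda) / 2)^{2n}
    from_pairs = m * (m - 1)     # each pair i<j gives (lambda_j - lambda_i)^2 / 4
    return from_measure + from_short_roots + from_pairs


def weight_r(r, n):
    """Density multiplying prod dr_i in (7.8), without r(n, N) and [dV]."""
    out = 1.0
    for ri in r:
        out *= math.sin(2 * ri) * math.sin(ri) ** (4 * n)
    for i in range(len(r)):
        for j in range(i + 1, len(r)):
            out *= math.sin(r[i] - r[j]) ** 2 * math.sin(r[i] + r[j]) ** 2
    return out


def weight_lambda(lam, n):
    """Density Delta(lambda)^2 prod (1 - lambda_i)^{2n} appearing in (7.10)."""
    out = 1.0
    for li in lam:
        out *= (1.0 - li) ** (2 * n)
    for i in range(len(lam)):
        for j in range(i + 1, len(lam)):
            out *= (lam[i] - lam[j]) ** 2
    return out


def both_sides(r, n, N):
    """Return the (7.8) density and the (7.10) density pulled back to r."""
    lam = [math.cos(2 * ri) for ri in r]
    jacobian = math.prod(2 * math.sin(2 * ri) for ri in r)   # |d lambda / d r|
    pulled_back = 2.0 ** (-exponent_of_two(n, N)) * jacobian * weight_lambda(lam, n)
    return weight_r(r, n), pulled_back


if __name__ == "__main__":
    random.seed(3)
    n, N = 2, 3
    point = [random.uniform(0.1, 1.4) for _ in range(n * (N - 1))]
    print(exponent_of_two(n, N), n ** 2 * (N ** 2 - 1))
    print(both_sides(point, n, N))
```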

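A convenient choice for the radial coordinates is provided by setting

$$ \rho := \mathrm{diag}(r_1, \dots, r_{nN_-}) \tag{7.11} $$

and defining

$$ R = \mathrm{diag}\big( \sigma^0 \otimes \mathbb{1}_n\,,\; \exp( i\,\sigma^1 \otimes \rho) \big) = \mathrm{diag}\big( \sigma^0 \otimes \mathbb{1}_n\,,\; \sigma^0 \otimes \cos(\rho) + i\,\sigma^1 \otimes \sin(\rho) \big) . \tag{7.12} $$

We also choose a basis in which

$$ \Xi = \tfrac{N}{2}\,\mathrm{diag}\big( \mathbb{1}_{nN_+}\,,\; -\,\mathbb{1}_{nN_-} \big) = \tfrac{N}{2}\,\mathrm{diag}\big( \sigma^0 \otimes \mathbb{1}_n\,,\; \sigma^3 \otimes \mathbb{1}_{nN_-} \big) \tag{7.13} $$

and $V \in U(nN_+) \times U(nN_-)$ is given by

$$ V = \mathrm{diag}(V_+, V_-) , \tag{7.14} $$

with $V_\pm \in U(nN_\pm)$ and $[\mathrm{d}V] = [\mathrm{d}V_+]\,[\mathrm{d}V_-]$. The relations (7.4) are then automatically satisfied.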
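The algebraic properties of the radial parametrization (7.11)-(7.13) can be checked numerically. The sketch below builds the matrix $R$ of (7.12) and the diagonal matrix on the left-hand side of (7.13), called `Xi` in the code (a name chosen here), and verifies the anticommutation-type relation $R\,\Xi = \Xi\,R^{-1}$, the constraint $C^2 = \frac{N^2}{4}\,\mathbb{1}$ for $C = R\,\Xi\,R^{-1}$, and that the diagonal of $C$ carries $\frac{N}{2}\cos 2r_i$. The sample values of $n$, $N$ and $r_i$ are assumptions for illustration.

```python
import numpy as np

SIGMA0 = np.eye(2, dtype=complex)
SIGMA1 = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA3 = np.array([[1, 0], [0, -1]], dtype=complex)


def block_diag(a, b):
    """Block-diagonal matrix diag(a, b)."""
    top = np.hstack([a, np.zeros((a.shape[0], b.shape[1]), dtype=complex)])
    bottom = np.hstack([np.zeros((b.shape[0], a.shape[1]), dtype=complex), b])
    return np.vstack([top, bottom])


def radial_data(r, n, N):
    """Return (R, Xi) built from radial coordinates r of length n * (N - 1)."""
    r = np.asarray(r, dtype=float)
    m = n * (N - 1)
    if r.shape != (m,):
        raise ValueError("expected n * (N - 1) radial coordinates")
    # exp(i sigma1 x rho) = sigma0 x cos(rho) + i sigma1 x sin(rho), rho = diag(r)
    rotation = np.kron(SIGMA0, np.diag(np.cos(r))) + 1j * np.kron(SIGMA1, np.diag(np.sin(r)))
    big_r = block_diag(np.kron(SIGMA0, np.eye(n)), rotation)
    xi = 0.5 * N * block_diag(np.kron(SIGMA0, np.eye(n)), np.kron(SIGMA3, np.eye(m)))
    return big_r, xi


def exp_i(hermitian):
    """exp(i H) for a hermitian matrix H, via its eigendecomposition."""
    w, v = np.linalg.eigh(hermitian)
    return (v * np.exp(1j * w)) @ v.conj().T


if __name__ == "__main__":
    n, N = 2, 3
    r = np.array([0.2, 0.5, 0.9, 1.3])          # hypothetical radial coordinates
    big_r, xi = radial_data(r, n, N)
    c = big_r @ xi @ big_r.conj().T             # R is unitary, so R^{-1} = R^dagger
    print(np.allclose(big_r @ xi, xi @ big_r.conj().T))
    print(np.allclose(c @ c, 0.25 * N ** 2 * np.eye(2 * n * N)))
```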

7.2. Evaluation of the abelianized partition function: U (1) gauge theory. We will now explicitly evaluate the Fourier transform (5.3), beginning with the abelian case n = 1. Using (7.5) and (7.11)–(7.14), it is straightforward to work out the abelian moment map in (5.3) with the result µT (C), p = Tr p U U −1 = 21 Tr p V (R 2 + R −2 ) V −1 = Tr p V diag σ 0 , cos(2σ 1 ⊗ ρ) V −1 = N2 Tr diag( p1 σ 0 , p2 , . . . , p N ) V+ diag(σ 0 , λ1 , . . . , λ N− ) V+−1 − N2 Tr diag( p2 , . . . , p N ) V− diag(λ1 , . . . , λ N− ) V−−1 , (7.15) where we have used an inconsequential redefinition of the unitary matrix V+ by multiplication with an appropriate permutation matrix. Upon substitution into (5.3), we see that the two angular integrals decouple from each other. The integral over V− ∈ U (N− ) is now easily evaluated with the help of (6.1) with the result c N (N− , 4) ( p2 , . . . , p N ) (λ)

sgn(π− )

π− ∈SN−

N− '

e

iN 4

pi+1 λπ− (i)

.

(7.16)

i=1

The integral over $V_+ \in U(N_+)$ is more delicate since the Itzykson-Zuber formula will involve a ratio of degenerate determinants. Since both numerator and denominator of (6.1) are completely antisymmetric functions of the eigenvalues $x_i$ and $y_i$ independently, the limit where some eigenvalues coalesce gives a well-defined analytic function in $(x_i, y_i)$ because all poles are cancelled by zeroes in the determinant. We will regularize the $V_+$-integral by replacing the first $p_1$ entry in the last line of (7.15) with an auxiliary momentum variable $p_0 \in \mathbb{R}$, the second entry of 1 with an auxiliary radial variable $\lambda_0 \in [-1, 1]$, and then afterwards take the limits $p_0 \to p_1$, $\lambda_0 \to 1$. Defining $\lambda_N := 1$, the Itzykson-Zuber formula (6.1) applied to the regularized $V_+$-integral yields

$$ \frac{c_N(N_+, -4)}{\Delta(p_0, p_1, \dots, p_N)\;\Delta(\lambda_0, \lambda_1, \dots, \lambda_N)} \sum_{\pi_+ \in S_{N_+}} \mathrm{sgn}(\pi_+)\; e^{-\frac{iN}{4}\,p_0\,\lambda_{\pi_+(N)}}\; \prod_{i=0}^{N_-} e^{-\frac{iN}{4}\,p_{i+1}\,\lambda_{\pi_+(i)}} . \tag{7.17} $$

Taking the limit p0 → p1 first using l’Hôpital’s rule gives iN 4

p1

N

c N (N+ , −4)

( p1 − pi ) ( p) (λ0 , λ1 , . . . , λ N )

i=2

×

π+ ∈SN+

sgn(π+ ) λπ+ (N ) e −

iN 4

p1 λπ+ (N )

N− ' i=0

e−

iN 4

pi+1 λπ+ (i)

.

(7.18)

Finally, taking the limit λ0 → 1 again using l’Hôpital’s rule yields − i4N c N (N+ , −4) p1

N

( p1 − pi ) ( p)

i=2

N −

(7.19)

(1 − λi )2 (λ)

i=1

− i 4N p1 λ N− 1 − i N e − i4N p1 λ e − i4N p1 λ1 . . . λ − i4N p1 e e 1 N − iN iN 4 − i 4N p1 λ N− − i 4N p1 λ1 − i4N p1 − 4 p 1 e − 4 p1 e ... e e . × .. .. .. .. . . . . − i N p e − i4N p N e − i4N p N λ1 . . . e − i4N p N λ N− e − i4N p N N 4 Substituting the above into (5.3) gives us the expression

4 vol(O) Z O ( p) = − p1 ( p)2

×

N ! (N − 1)! √

sgn(π− )

π− ∈SN−

8N

N− '

N −2

(k!)2

k=1 N 2 −N

e

iN 4

N− ' l=1

1

−1

dλl

pi+1 λπ− (i)

i=1

− i 4N p1 λ N− 1 − i N e − i4N p1 λ e − i4N p1 λ1 . . . λ − i4N p1 e e 1 N − iN iN 4 − i 4N p1 λ N− − i 4N p1 λ1 − i4N p1 − 4 p 1 e − 4 p1 e ... e e . × .. .. .. .. . . . . − i N p e − i4N p N e − i4N p N λ1 . . . e − i4N p N λ N− e − i4N p N N 4 (7.20) We will now write the product of determinants in (7.20) as a single sum over the Weyl group S N of the original gauge symmetry group U (N ). For this, we embed S N− in the Weyl group S N as the subgroup of permutations π− of {1, . . . , N− , N } with π− (N ) = N . We perform a Laplace expansion of the second determinant in (7.20) into minors along the first row to write − i 4N p1 λ N− 1 − i N e − i4N p1 λ e − i4N p1 λ1 . . . λ − i4N p1 e e 1 N − iN iN 4 − i 4N p1 λ N− − i 4N p1 λ1 − i4N p1 − 4 p 1 e − 4 p1 e ... e e .. .. .. .. . . . . − i N p e − i4N p N e − i4N p N λ1 . . . e − i4N p N λ N− e − i4N p N N 4 + N '

iN iN = sgn(π+ ) 1 − i4N p1 e − 4 p1 e − 4 λi pπ+ (i) π+ ∈SN

−

i=1

N iN iN λi e − 4 4 i=1

λi p1

pπ+ (i) e −

iN 4

pπ+ (i)

⎤ N ' k=1 k=i

e−

iN 4

⎥ ⎦ . (7.21)

λk pπ+ (k) ⎥

When inserted into the expression (7.20), we can use the invariance of the radial integration measure and domain under permutations of the λi ’s to reduce the double sum over the Weyl groups to a single sum over the relative permutation π := π+ π−−1 ∈ S N with π(N ) = π+ (N ). The sum over π+ can be replaced by a sum over π , while the remaining sum over π− simply produces the order N ! of the Weyl group of U (N ). In this way we may bring the Fourier transform of the orbit into the form N

4 vol(O) Z O ( p) = − p1 ( p)2 ×

(k!)2

k=1

(N − 1)!

sgn(π )

√ N 2 −N 8N

1−

iN 4

p1

e−

iN 4

( p1 + pπ(N ) )

π ∈SN

×

N− ' i=1

−

1 −1

dλi e −

iN 4

λi ( pπ(i) − pi+1 )

N− iN iN pπ( j) e − 4 4

( pπ( j) + pπ(N ) )

j=1

×

−

1 −1

iN 4

dλ j λ j e −

iN 4

λ j ( p1 − p j+1 )

N− ' i=1 i= j

pπ(N ) e −

iN 4

( pπ(N ) + p1 )

N− ' i=1

1 −1

1 −1

dλi e −

iN 4

λi ( pπ(i) − pi+1 )

⎤ dλi e −

iN 4

λi ( pπ(i) − pi+1 ) ⎦

.

(7.22)

Finally, the radial integrations can be expressed in terms of the spectral sine-kernel of the unitary ensemble of random matrix theory and its derivative given by 1 1 sin x = dλ e − i λ x and K(x) := x 2 −1 1 sin x i 1 cos x − =− dλ λ e − i λ x . (7.23) K (x) = x x 2 −1 Then the abelianized partition function (5.2) is written as an exact expansion in gaussian momentum transforms given by 8 vol(O) N ! (N − 1)! Z =− 2 N (2π ) N 2

⎡

× ⎣ 1−

iN 4

p1

√ 2N

e

N −2

(k!)2

k=1 N 2 −N

− i 4N ( p1 + pπ(N ) )

g

e − 4N sgn(π ) [d p] p1 RN π ∈SN N− ' i=1

K

N 4

( pπ(i) − pi+1 )

i

pi2

Localization for Yang-Mills Theory on the Fuzzy Sphere N− iN N pπ( j) e − 4 4

+

( pπ( j) + pπ(N ) )

K

N 4

j=1

−

iN 4

pπ(N ) e −

iN 4

( p1 + pπ(N ) )

241

( p1 − p j+1 )

N− '

K

N 4

( pπ(i) − pi+1 )

i=1 i= j

N− '

K

N 4

⎤ ( pπ(i) − pi+1 ) ⎦ .

(7.24)

i=1

For low values of $N$, the momentum integrals in this formula can be computed in terms of transcendental error functions, which are the typical contributions in nonabelian localization [18] and reflect the occurrence of non-gaussian quantum fluctuation integrals. Note that there is a single momentum $p_1$ singled out in the formula (7.24). In the $U(n)$ case of Sect. 7.3 below there will be $n$ momenta singled out, which is where the sum over sets of $n$ integers required by the nonabelian localization formula in the large $N$ limit will come from. At $N \to \infty$, the spectral kernels $K\big(\frac{N}{4}\,(p_{\pi(i)} - p_{i+1})\big) \approx \frac{4\pi}{N}\,\delta(p_{\pi(i)} - p_{i+1})$ provide the necessary groupings of variables into partitions of $N$ arising from the sum over the residual gauge symmetry group $S_N$. The conjugacy class of a given permutation $\pi \in S_N$ is characterized entirely by its cycle decomposition, which contains $n_k \geq 0$ cycles of length $k$ for $k = 1, \dots, N$ with $N = \sum_k k\,n_k$ and $\mathrm{sgn}(\pi) = (-1)^{\sum_k (k-1)\,n_k}$. However, the saddle-point partitions here do not correspond to the cycles themselves, but rather to the numbers $N_{n_1, \dots, n_N}$ of cycles $(n_1, \dots, n_N)$. For instance, the vacuum state now corresponds to the instanton configuration with $N$ fluxons, i.e. only trivial representations due to the abelianization, with moduli space (3.11) as described in Sect. 3.1. The higher critical points consist of an even number of irreducible representations which are suppressed roughly as $e^{-N^3/2g\,n_i}$. This indicates that the radial coordinates on the configuration space $O$ are not so nicely adapted to the local symplectic geometry of the Yang-Mills critical surfaces. We will return to these issues in the next section.
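Two facts used in this paragraph lend themselves to a quick numerical check: the Fourier representation of the sine-kernel $K(x) = \sin x / x$ from (7.23) together with its delta-function weight $\int \mathrm{d}p\; K(\frac{N}{4}\,p) = \frac{4\pi}{N}$, and the sign rule $\mathrm{sgn}(\pi) = (-1)^{\sum_k (k-1) n_k}$ in terms of the cycle type. The Python sketch below illustrates both; all function names, the finite integration cutoff and the sample values are assumptions made here.

```python
import math
from itertools import permutations

import numpy as np


def trapezoid(values, grid):
    """Composite trapezoidal rule on a uniform grid."""
    step = grid[1] - grid[0]
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))


def sine_kernel(x):
    """K(x) = sin(x) / x with K(0) = 1."""
    return np.sinc(np.asarray(x, dtype=float) / np.pi)


def sine_kernel_fourier(x, points=20001):
    """(1/2) * integral over [-1, 1] of exp(-i lambda x)."""
    lam = np.linspace(-1.0, 1.0, points)
    return 0.5 * trapezoid(np.exp(-1j * lam * x), lam)


def sine_kernel_derivative_fourier(x, points=20001):
    """-(i/2) * integral over [-1, 1] of lambda exp(-i lambda x)."""
    lam = np.linspace(-1.0, 1.0, points)
    return -0.5j * trapezoid(lam * np.exp(-1j * lam * x), lam)


def kernel_weight(N, cutoff=200.0, points=400001):
    """Integral of K(N p / 4) over [-cutoff, cutoff]; tends to 4 pi / N."""
    p = np.linspace(-cutoff, cutoff, points)
    return trapezoid(sine_kernel(0.25 * N * p), p)


def cycle_counts(perm):
    """Map cycle length k to the number n_k of k-cycles of a permutation tuple."""
    seen = [False] * len(perm)
    counts = {}
    for start in range(len(perm)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length:
            counts[length] = counts.get(length, 0) + 1
    return counts


def sign_by_cycles(perm):
    return (-1) ** sum((k - 1) * nk for k, nk in cycle_counts(perm).items())


def sign_by_inversions(perm):
    size = len(perm)
    inversions = sum(1 for i in range(size) for j in range(i + 1, size) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


if __name__ == "__main__":
    print(sine_kernel_fourier(1.3), math.sin(1.3) / 1.3)
    print(kernel_weight(40), 4 * math.pi / 40)
    print(all(sign_by_cycles(p) == sign_by_inversions(p) for p in permutations(range(5))))
```

The slow $1/p$ decay of the kernel means the finite-cutoff integral only approaches $4\pi/N$ up to oscillating corrections, which is the sense in which the delta-function replacement holds at large $N$.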

7.3. Evaluation of the abelianized partition function: $U(n)$ gauge theory. The nonabelian case $n > 1$ becomes very complicated due to the increasing complexity of the combinatorics involved in regulating the Itzykson-Zuber integral (6.1) over $V_+ \in U(nN_+)$. We will therefore only briefly sketch the essential features, deferring the explicit evaluation in favour of a more formal, regulated combinatorial expansion. Consider the radial coordinates $\lambda_i$, $i = 1, \dots, nN_-$ on $O$ and add $2n$ new real variables $1 + \varepsilon_i$. We assemble them into the ordered set defined by

$$ \big( \lambda'_1, \dots, \lambda'_{nN_+} \big) := \big( 1 + \varepsilon_1, \dots, 1 + \varepsilon_{2n}, \lambda_1, \dots, \lambda_{nN_-} \big) . \tag{7.25} $$

Similarly, we double the first $n$ entries of the momentum vector $p = (p_1, \dots, p_{nN})$ and gather them into the ordered set defined by

$$ \big( p'_1, \dots, p'_{nN_+} \big) := \big( p_1 + \kappa, \dots, p_n + \kappa, p_1, \dots, p_n, p_{n+1}, \dots, p_{nN} \big) . \tag{7.26} $$

At the end we will take the limits $\varepsilon_i, \kappa \to 0$. The evaluation of the Fourier transform (5.3) now proceeds exactly as in Sect. 7.2 above. To organize the combinatorics, we use the identity

$$ \lim_{\varepsilon_i \to 0}\; \frac{\det\limits_{1 \leq i, j \leq nN_+} \Big( e^{-\frac{iN}{4}\,p'_i\,\lambda'_j} \Big)}{\Delta(\varepsilon)} = \frac{\mathrm{vol}\,(U(2n))}{c_N(2n, -4)} \sum_{Q \subset \{p'_i\}} \mathrm{sgn}\big( Q \to \{p'_i\} \big)\; e^{-\frac{iN}{4} \sum_i q_i}\; \Delta(q)\; \det\limits_{1 \leq i, j \leq nN_-} \Big( e^{-\frac{iN}{4}\,\hat{p}_i\,\lambda_j} \Big) , \tag{7.27} $$

where $\{\hat{p}_1, \dots, \hat{p}_{nN_-}\} = \{p'_1, \dots, p'_{nN_+}\} \setminus Q$ with $Q = \{q_1, \dots, q_{2n}\}$ a subset of $\{p'_1, \dots, p'_{nN_+}\}$ which is ordered according to (7.26), and the sign is determined by the parity of the embedding. The identity (7.27) can be derived by performing a Laplace expansion of the determinant on the left-hand side into the $2n$ rows containing the variables $1 + \varepsilon_i$, and using the limit formula

$$ \lim_{\varepsilon_i \to 0}\; \frac{\det\limits_{1 \leq i, j \leq 2n} \Big( e^{-\frac{iN}{4}\,q_i\,\varepsilon_j} \Big)}{\Delta(\varepsilon)} = \frac{\mathrm{vol}\,(U(2n))}{c_N(2n, -4)}\; \Delta(q) , \tag{7.28} $$

which follows from the Itzykson-Zuber formula (6.1). The Vandermonde determinants can also be factorized as

$$ \Delta(\lambda') = \Delta(\lambda)\; \Delta(\varepsilon)\; \prod_{i=1}^{nN_-} (1 - \lambda_i)^{2n} \tag{7.29} $$

up to higher order terms in $\varepsilon_i \to 0$, along with

$$ \Delta(p')\; \Delta(p_{n+1}, \dots, p_{nN}) = \kappa^n\; \Delta(p)^2\; \Delta(p_1, \dots, p_n)^2 \tag{7.30} $$

in the limit κ → 0. In this way the partition function (5.2) can be expanded as Z = ζn,N lim

κ→0

1 κn

Q⊂{ pi } g

sgn Q → { pi }

e − 4N i pi − i4N i qi (q) × [d p] e ( p1 , . . . , pn )2 Rn N n N− 1 iN ' e 4 pi+n λ j det dλl det × l=1

−1

2

1≤i, j≤n N−

1≤i, j≤n N−

e−

iN 4

pˆi λ j

,

(7.31)

where ( i N )2n +n N (1−n N+ ) vol(O) (n N )! (2π )n N 2n N− (2−n N+ ) 2

ζn,N :=

n N− −1

'

k=1

(k!)2

2n ' (m + n N− − 1)! . m!

m=1

(7.32) We now expand the two determinants in (7.31) into a double sum over the Weyl group Sn N− , and use permutation symmetry of the radial integration to rewrite it as a sum

over a single relative permutation exactly as in Sect. 7.2 above. Using (7.23) we arrive finally at the exact combinatorial expansion Z = 2n N− (n N− )! ζn,N lim

κ→0

×

Rn N

g

1 κn

Q⊂{ pi }

iN e − 4N i pi [d p] e− 4 2 ( p1 , . . . , pn ) 2

sgn Q → { pi }

sgn(π )

π ∈Sn N− i

'

n N− qi

(q)

K

N 4

( pˆ π(i) − pi+n ) .

i=1

(7.33) The combinatorics of the large function (7.33) can be described

N limit of the partition as follows. The sine-kernels K N4 ( pˆ π(i) − pi+n ) ≈ 4π N δ( pˆ π(i) − pi+n ) define a link from pˆ π(i) to pi+n . Following these, we obtain a set of open or closed links determined by π ∈ Sn N− . The open links must start at { p1 +κ, . . . , pn +κ, p1 , . . . , pn } (since those are not contained in the pi+n ) and end at {q1 , . . . , q2n } (since those are not contained in the pˆ i ). The closed links correspond to cycles in the conjugacy class of the permutation π . In iN particular, there are no factors e − 4 pi , i = 1, . . . , n or ( p1 , . . . , pn )2 for the internal variables, and hence we can explicitly evaluate the internal integrals. The difficulty lies in evaluating the sum over all possible distinct cycles for the internal variables in a closed form. Comparison with the constrained matrix model. In [9], quantum gauge theory on the fuzzy sphere S N2 was formulated as a multi-matrix model with action Smm =

1 Ng

Tr C 2 −

N2 4

1lN

2

(7.34)

and the constraint C0 = 21 1l N . It was shown that this matrix model also reproduces YangMills theory on S 2 in the large N limit. This differs from the formulation of the present 2 2 paper essentially by replacing the pair (action , constraint) given by (C − N4 1lN )2 , 2 (C0 − 21 1l N ) with the permuted pair (C0 − 21 1l N )2 , (C 2 − N4 1lN ) . This can be understood by imposing the respective constraints using gaussian terms in the actions, as then the tangential degrees of freedom are essentially the same in both cases. The symplectic formulation of the present paper has not only the advantage of applying the equivariant localization principle to systematically construct the instanton expansion of gauge theory on the fuzzy sphere, but it also somewhat simplifies the evaluation of the matrix integral. It also enables one in principle to keep control of the N1 corrections to Yang-Mills theory on S 2 , and the approximate delta-functions at N → ∞ responsible for the groupings of variables are more transparent along the lines explained in Sects. 7.2 and 7.3.

8. Yang-Mills Critical Surfaces in Abelianized Localization

In this final section we will elucidate the relationship between the nonabelian and abelianized localization approaches to the exact instanton expansion of Yang-Mills theory on the fuzzy sphere $S_N^2$. As discussed above, the critical surfaces for abelian localization are determined by the saddle-point equation (6.4), (7.2),

\[
[C,\Phi] = 0
\tag{8.1}
\]

for $\Phi = \phi\otimes\sigma^0$ with $\phi\in u(N)$, which can be assumed to be diagonal by using a gauge transformation. Its distinct eigenvalues $\varphi_\nu$ are arranged into degenerate blocks as

\[
\Phi = \bigoplus_{\nu=1}^{k}\,\varphi_\nu\,\mathbb{1}_{n_\nu}\otimes\sigma^0
\tag{8.2}
\]

with $\sum_\nu n_\nu = N$. Then $[C,\Phi]=0$ implies that the covariant coordinate

\[
C = U^{-1}\,\Xi\,U = \bigoplus_{\nu=1}^{k} C_\nu
\tag{8.3}
\]

has the same block decomposition as $\Phi$. Thus it can be diagonalized as

\[
C_\nu = V_\nu^{-1}\,\Xi_\nu\,V_\nu,
\tag{8.4}
\]

where $V_\nu$ is a $2n_\nu\times2n_\nu$ unitary matrix on the block defined by $\mathbb{1}_{n_\nu}\otimes\sigma^0$ in (8.2), and $\Xi_\nu$ has eigenvalues $\pm\frac N2$. Then comparing (8.3) and (8.4) implies

\[
\Big(\bigoplus_{\nu=1}^{k} V_\nu\Big)\,U^{-1}\,\Xi\,U\,\Big(\bigoplus_{\nu=1}^{k} V_\nu\Big)^{-1} = \bigoplus_{\nu=1}^{k}\Xi_\nu = \Pi^{-1}\,\Xi\,\Pi
\tag{8.5}
\]

for some permutation matrix $\Pi\in U(N)$ representing an element $\pi\in S_N/S_{N_+}\times S_{N_-}$, since both $\Xi$ and $\bigoplus_\nu\Xi_\nu$ are diagonal $N\times N$ matrices with the same set of degenerate eigenvalues. It follows that

\[
U\,\Big(\bigoplus_{\nu=1}^{k} V_\nu\Big)^{-1}\,\Pi^{-1} \in U(N_+)\times U(N_-),
\tag{8.6}
\]

and therefore $U\in U(N)$ is equal to $\Pi\,\bigoplus_\nu V_\nu$ times an element of the stabilizer subgroup $U(N_+)\times U(N_-)\subset U(N)$ of the element $\Xi$. We conclude that the gauge equivalence classes of solutions of the saddle point equation $[C,\Phi]=0$ in the configuration space $\mathcal{O}$ are described by the following data:

• A quotient permutation $\pi\in S_N/S_{N_+}\times S_{N_-}$;
• A unitary matrix in the stabilizer group $U(N_+)\times U(N_-)$; and
• A unitary block transformation $\bigoplus_\nu V_\nu$ adapted to the block decomposition (8.2) of $\Phi$.

It is evident that these critical surfaces are much larger than the critical surfaces of the original Yang-Mills action, and they are not even in any one-to-one correspondence with the Yang-Mills saddle points. Any such block configuration is degenerate for the action in (4.3), and contains some Yang-Mills blocks of Sect. 3.4 (with the irreducible low-energy critical surface $\mathcal{C}_{(N,1)}$ and possibly fluxons or other purely noncommutative solutions). The reason is the absence of any localization form $Q\alpha$, without which there is no way to separate the desired Yang-Mills blocks of Sect. 3.4 from these abelianized critical surfaces.
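The passage from (8.1) to the block decomposition (8.3) rests on an elementary fact: the (i, j) entry of a commutator with a diagonal matrix is the original entry times the difference of the two eigenvalues, so a vanishing commutator forbids couplings between distinct eigenvalues. The following is a minimal Python sketch of that fact only. The helper names and the sample eigenvalues are illustrative assumptions, and the doubling by σ⁰ is ignored.

```python
def commutator_with_diagonal(C, phi):
    """Return [C, diag(phi)]; its (i, j) entry is C[i][j] * (phi[j] - phi[i])."""
    n = len(phi)
    return [[C[i][j] * (phi[j] - phi[i]) for j in range(n)] for i in range(n)]


def vanishes(M, tol=1e-12):
    """True if every entry of M is zero up to the tolerance."""
    return all(abs(x) <= tol for row in M for x in row)


def is_block_diagonal(C, phi, tol=1e-12):
    """True if C only couples indices carrying equal eigenvalues of diag(phi)."""
    n = len(phi)
    return all(abs(C[i][j]) <= tol
               for i in range(n) for j in range(n) if phi[i] != phi[j])


# Hypothetical eigenvalues with degeneracies n_1 = 2 and n_2 = 1.
phi = [0.3, 0.3, -1.0]

# C_block respects the two blocks, C_mixed couples the distinct eigenvalues.
C_block = [[1.0, 2.0, 0.0],
           [2.0, -1.0, 0.0],
           [0.0, 0.0, 5.0]]
C_mixed = [[1.0, 0.0, 4.0],
           [0.0, -1.0, 0.0],
           [4.0, 0.0, 5.0]]

comm_block_vanishes = vanishes(commutator_with_diagonal(C_block, phi))
comm_mixed_vanishes = vanishes(commutator_with_diagonal(C_mixed, phi))
```

In this toy example the commutator vanishes exactly for the matrix that is block diagonal with respect to the degenerate eigenvalues, which is the content of the statement that C has the same block decomposition as the diagonal generator.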

8.1. Itzykson-Zuber localization on the symplectic leaves. We now consider the foliation of the orbit $\mathcal{O}(\Xi)\cong U(2N)/R$ by conjugacy classes under the adjoint action of the stabilizer group $R = U(N_+)\times U(N_-)$. The corresponding symplectic leaves $L(\lambda)$ are parametrized by the radial coordinates $\lambda_i\in[-1,1]$, $i=1,\dots,N_-$. For a given leaf $L(\lambda)$, the integral $\int_R [dV]\; e^{-\frac{i}{2}\langle\mu_T(C),\,p\rangle}$ is obtained by using the Itzykson-Zuber formula for the unitary groups $U(N_+)$ and $U(N_-)$, as we did in Sects. 7.2 and 7.3. As in Sect. 6 above, the Itzykson-Zuber formula can itself be regarded as a consequence of abelian localization, and the expansion of the resulting determinants in Sect. 7.2 is precisely the sum over the saddle-points on each leaf $L(\lambda)$. Let us identify these saddle-points explicitly. Choosing $\Xi$ as in (7.13), the critical points of the moment map (7.15) with respect to arbitrary variations of $(V_+,V_-)\in R$ are given by the solutions of the equations

\[
\big[\mathrm{diag}(p_2,\dots,p_N)\,,\,V_-\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-})\,V_-^{-1}\big] = 0,
\qquad
\big[\mathrm{diag}(p_1\sigma^0,p_2,\dots,p_N)\,,\,V_+\,\mathrm{diag}(\sigma^0,\lambda_1,\dots,\lambda_{N_-})\,V_+^{-1}\big] = 0.
\tag{8.7}
\]

As in Sect. 7.3, we consider for convenience the extended sets of radial coordinates (7.25) and momentum variables (7.26) for $n=1$. Then the first equation in (8.7) means that the matrix $V_-\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-})\,V_-^{-1}$ commutes with the spectral projectors of $(p_2,\dots,p_N)$, i.e. it has the same block decomposition, and similarly the second equation in (8.7) implies that the matrix $V_+\,\mathrm{diag}(\tilde\lambda_1,\dots,\tilde\lambda_{N_+})\,V_+^{-1}$ commutes with the spectral projectors of $\tilde p$. Using unitary transformations on each of these blocks, the matrix $V_-\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-})\,V_-^{-1}$ can then be diagonalized with the same eigenvalues $\lambda_i$. It follows that

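\[
\Big(\bigoplus_{\nu=1}^{k} U_\nu\Big)\,V_-\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-})\,V_-^{-1}\,\Big(\bigoplus_{\nu=1}^{k} U_\nu^{-1}\Big)
= \mathrm{diag}(\lambda_{\pi_-(1)},\dots,\lambda_{\pi_-(N_-)})
= \Pi_-\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-})\,\Pi_-^{-1}
\tag{8.8}
\]

for some $U_\nu\in SU(n_\nu)$, where $n_\nu$ labels the degenerate blocks of $(p_2,\dots,p_N)$ with $\sum_\nu n_\nu = N_-$ and $\Pi_-\in SU(N_-)$ is a permutation matrix corresponding to an element $\pi_-\in S_{N_-}$. If $\lambda_i$ are nondegenerate, this implies that $\big(\bigoplus_\nu U_\nu\big)\,V_- = \Pi_-$ and hence

\[
V_- = \Big(\bigoplus_{\nu=1}^{k} U_\nu^{-1}\Big)\,\Pi_-.
\tag{8.9}
\]

If some $\lambda_i$ are degenerate, it only follows that $\Pi_-^{-1}\big(\bigoplus_\nu U_\nu\big)\,V_-$ commutes with the spectral projectors of $\lambda$, so that $\Pi_-^{-1}\big(\bigoplus_\nu U_\nu\big)\,V_- = \bigoplus_\nu\tilde U_\nu$ for some $\tilde U_\nu\in SU(\tilde n_\nu)$. It follows that the angular saddle-point $V_-\in U(N_-)$ is given by

\[
V_- = \Big(\bigoplus_{\nu=1}^{k} U_\nu^{-1}\Big)\,\Pi_-\,\Big(\bigoplus_{\nu=1}^{k}\tilde U_\nu\Big).
\tag{8.10}
\]

Similar statements hold for the angular saddle-point $V_+\in U(N_+)$, with the additional feature that the first two entries of $\tilde p$ and $\tilde\lambda$ are degenerate by definition.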
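The second equality in (8.8) is the statement that conjugating a diagonal matrix by a permutation matrix only reorders its eigenvalues. A minimal Python sketch of this step follows. The convention that the permutation matrix has unit entries at positions (i, π(i)), the 0-based indexing, the helper names and the sample values are all assumptions made for the illustration.

```python
def permutation_matrix(perm):
    """Matrix P with P[i][perm[i]] = 1 (0-based indices)."""
    n = len(perm)
    return [[1 if perm[i] == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two square lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    """Transpose; for a permutation matrix this is also its inverse."""
    return [list(row) for row in zip(*A)]


def diag(values):
    """Diagonal matrix with the given diagonal entries."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical nondegenerate radial coordinates and a sample permutation.
lam = [-0.5, 0.0, 0.5, 0.9]
perm = [2, 0, 3, 1]

P = permutation_matrix(perm)
conjugated = matmul(matmul(P, diag(lam)), transpose(P))
permuted = diag([lam[perm[i]] for i in range(len(perm))])
```

With this convention the conjugated matrix is again diagonal and carries the entry lam[perm[i]] in position i, so `conjugated` and `permuted` coincide.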

In each case, the value of the action (7.15) is given by

\[
\langle\mu_T(C),\,p\rangle = \frac N4\,\sum_{i=1}^{N_+}\tilde p_i\,\tilde\lambda_{\pi_+(i)} - \frac N4\,\sum_{i=1}^{N_-} p_{i+1}\,\lambda_{\pi_-(i)}.
\tag{8.11}
\]

Therefore, each saddle-point is characterized by two permutation matrices $\Pi_\pm$ corresponding to $\pi_\pm\in S_{N_\pm}$, which may or may not generate non-trivial fibers on the homogeneous spaces of the group $\prod_\nu U(n_\nu)$ depending on the degeneracies of $p$ and $\lambda$. The integral over these $V_\pm$ orbits can then be evaluated using the Itzykson-Zuber formula leading to (7.16) and (7.17), which gives precisely the sum over the saddle points. The regularization required in (7.17) reflects the fact that the critical surfaces are no longer isolated points, due to the degeneracies of $\lambda_i$ and $p_i$.

The main point of this analysis is that these critical surfaces are again not in any one-to-one correspondence with those of the original Yang-Mills action. In fact, the abelian critical surfaces above contain as subspaces those of the Itzykson-Zuber localization on $\mathcal{O}(\Xi)$ discussed in Sect. 6 above, which are not only stationary on the symplectic leaves $L(\lambda)$ but also with respect to variations of the radial coordinates $\lambda_i$. However, even the critical surfaces for the Itzykson-Zuber localization on the configuration space $\mathcal{O}(\Xi)$ are not simply related to those of the Yang-Mills action. In particular, the variational problem for the action (8.11) does not determine the $\lambda_i$. A given radial saddle-point $\pi_\pm$ can thus correspond to various types of Yang-Mills solutions by appropriately choosing some $\lambda_i$, as we show explicitly in Sect. 8.2 below. This arbitrariness in the radial coordinates $\lambda_i$ is lifted by the addition of the localization one-form $\alpha$ of Sect. 4, which serves to single out the Yang-Mills saddle points from the new critical points. Nevertheless, it is instructive to work out the radial coordinates of some Yang-Mills saddle-points to illustrate the powerful workings of the polar decomposition.

8.2. Radial coordinates for Yang-Mills critical surfaces.
We will now work out the radial coordinates for the solutions of the Yang-Mills equations $[C_0,C_i]=0$, which will identify precisely the appropriate localization values of $\lambda_i$ for each critical surface of Sect. 3.1. Given $\Xi$ as in (7.13) we now consider the fuzzy sphere coordinates $\Pi_0\,\Xi\,\Pi_0^{-1}$ and correspondingly modify the radial coordinates (7.12) to

\[
R = \Pi_0\begin{pmatrix}\sigma^0 & 0\\ 0 & \exp(i\,\sigma^1\otimes\rho)\end{pmatrix}\Pi_0^{-1},
\tag{8.12}
\]

where $\Pi_0\in U(N)$ is a permutation matrix representing the cyclic permutation

\[
\pi_{(N_+)} = (1\;2\;\cdots\;N_+).
\tag{8.13}
\]

As we will see, the modification by $\Pi_0$, although irrelevant from the point of view of the path integral, will greatly simplify the explicit parametrization. Using this parametrization and (7.14), we can write the covariant coordinates (7.5) in the explicit form

\[
C = \frac N2\,\Pi_0\,V\begin{pmatrix}\sigma^0 & 0\\ 0 & \sigma^3\otimes\cos(2\rho)+\sigma^2\otimes\sin(2\rho)\end{pmatrix}V^{-1}\,\Pi_0^{-1}
= \frac N2\begin{pmatrix}
V_+\begin{pmatrix}1 & & \\ & \cos(2\rho) & \\ & & 1\end{pmatrix}V_+^{-1} & -\,i\,V_+\begin{pmatrix}0\\ \sin(2\rho)\\ 0\end{pmatrix}V_-^{-1}\\[2mm]
i\,V_-\big(0\,,\,\sin(2\rho)\,,\,0\big)\,V_+^{-1} & -V_-\cos(2\rho)\,V_-^{-1}
\end{pmatrix},
\tag{8.14}
\]

where we have applied the commutation relation $i\,[\sigma^1,\sigma^3] = 2\sigma^2$. The role of the cyclic permutation matrix $\Pi_0$ is to move the unit entries of $\sigma^0$ symmetrically around the matrix $\cos(2\rho)$. We note for later use that if the unitary matrices $V_\pm\in U(N_\pm)$ are block-diagonal, then so is $C$. We will now use this parametrization to illustrate the use of the radial coordinates by working out (8.14) explicitly for various classical gauge field configurations of Sect. 3.1.

The vacuum solution. The generators of the irreducible $N$-dimensional representation of the $su(2)$ Lie algebra (2.1) are given explicitly by

\[
(\xi_3)_{ij} = -\,\delta_{ij}\,\frac{N+1-2i}{2}
\qquad\text{and}\qquad
(\xi_+)_{ij} = \delta_{i+1,j}\,\sqrt{(N-i)\,i},
\tag{8.15}
\]

where $i,j=1,\dots,N$ and $\xi_\pm = \xi_1\pm i\,\xi_2$ with $\xi_- = \xi_+^\dagger$. The vacuum solution (2.4) in the abelian case $n=1$ thus has the explicit form

\[
C = \begin{pmatrix}\frac12\,\mathbb{1}_N+\xi_3 & \xi_+\\ \xi_- & \frac12\,\mathbb{1}_N-\xi_3\end{pmatrix}
= \begin{pmatrix}\frac12\,\mathrm{diag}(-N+2,\dots,N-2,N) & \xi_+\\ \xi_- & \frac12\,\mathrm{diag}(N,\dots,-N+4,-N+2)\end{pmatrix}
\tag{8.16}
\]

using the splitting into equal blocks of size $N$. This should be identified with (8.14), which splits into blocks of sizes $N_\pm$. Noting the explicit form of $\xi_\pm$ in (8.15) as raising and lowering operators, it follows that one can consistently take both $V_+\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-},1,1)\,V_+^{-1}$ and $V_-\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-})\,V_-^{-1}$ to be diagonal matrices. We can then consistently match the eigenvalues as

\[
N\,(\lambda_1,\dots,\lambda_{N_-},1,1) = (-N+2,\dots,N-2,N,N),
\tag{8.17}
\]

which gives

\[
\lambda_i = -\,\frac{N-2i}{N}\qquad\text{for}\quad i=1,\dots,N_-
\tag{8.18}
\]

and provides the eigenvalues of the radial matrix $R$ for the vacuum critical surface $\mathcal{C}_{(N,1)}$. Note that the eigenvalue $\frac N2$ from the second diagonal block $\frac12\,\mathbb{1}_N-\xi_3$ of $C$ in (8.16) is contained in the matrix $\frac N2\,V_+\,\mathrm{diag}(\lambda_1,\dots,\lambda_{N_-},1,1)\,V_+^{-1}$. It follows that $V_- = \Pi_-$ is a permutation matrix in $U(N_-)$, while $V_+ = \Pi_+\,U_2$ is a permutation matrix up to a possible conjugation with a unitary matrix $U_2\in SU(2)\subset U(N_+)$ acting on the two marked indices labelling the unit entries. We can absorb $\Pi_-$ by a redefinition of the $\lambda_i$, and hence take

\[
V_- = \mathbb{1}_{N_-}
\tag{8.19}
\]

without loss of generality. It is also enough to consider the case $U_2 = \mathbb{1}_{N_+}$. Comparing (8.14) with (8.16), it follows that

\[
V_+ = \Pi_+
\tag{8.20}
\]

is a permutation matrix representing the irreducible cycle (8.13) of length $N_+$. Furthermore, one has

\[
\sin(2\rho_i) = \sqrt{1-\lambda_i^2} = \sqrt{\frac{4i}{N}-\frac{4i^2}{N^2}} = \frac2N\,\sqrt{i\,(N-i)} = \frac2N\,(\xi_+)_{i,i+1}
\tag{8.21}
\]

for $i=1,\dots,N_-$, which is indeed the correct representation of $\xi_\pm$ in (8.15), embedded in the correct off-diagonal way in (8.14) due to the block decomposition into sizes $N_\pm$.

Let us point out one interesting feature of the covariant coordinate (8.16). The two diagonal entries of $\frac N2$ in the center of the matrix constitute a trivial $2\times2$ unit matrix $\sigma^0$ which completely decouples from the rest of $C$. This block can be traced to the $\sigma^0$ in the upper-left corner of the first line in (8.14), whose position is determined by the permutation matrix $\Pi_0$, or equivalently to the auxiliary radial coordinates $\tilde\lambda_i = 1+\varepsilon_i$, $i=1,2$. In fact, any explicit entry of $\pm\frac N2$ in $C$ necessarily decouples from the rest of $C$, for otherwise $C$ would have eigenvalues of modulus larger than $\frac N2$. This means, in particular, that we can permute these two entries using a suitable permutation matrix $V_+ = \Pi_+$ without any effect on $C$ (but it will have an effect on the momenta $p_i$ if they are included). This observation will be useful below.

This construction clearly generalizes to give the blocks $C(n_a)$ of size $2n_a$ of the critical surfaces $\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$ corresponding to irreducible $SU(2)$ representations of dimensions $n_a<N$. The most extreme case $n_a=1$ consists of the one-dimensional representation with $C_0(n_a=1) = \frac N2$ and $C_i(n_a=1) = 0$, whereby

\[
C(n_a=1) = \frac N2\,\sigma^0,
\tag{8.22}
\]

and hence only the explicit $\sigma^0$ block survives.

Nonabelian generalization. For $n\ge2$, the vacuum critical surface $\mathcal{C}_{(N,1),\dots,(N,1)}$ is associated with the solution (3.9) which is a direct sum of $n$ irreducible $SU(2)$ representations of dimension $N$. This can clearly be obtained by repeating the above construction $n$ times. In particular, $V_+ = (\Pi_+)^{\oplus n}$ is a product of $n$ “marked cycles” as above. Notice, however, that the same saddle point is obtained if one acts with an additional permutation of the $2n$ auxiliary radial coordinates $\tilde\lambda_i = 1$, $i=1,\dots,2n$ (recall that the explicit entries $\pm\frac N2$ of $C$ are always isolated). In doing this, the decomposition of $V_+$ into irreducible cycles gets modified. It can nonetheless be made into one irreducible cycle with $2n$ marked points which come in groups of two at equal distance, for example. This demonstrates that the mapping between the Yang-Mills saddle-points and those of the abelianization approach in Sect. 7 is complicated. In particular, it is not injective. Again, this construction generalizes to blocks of the critical surfaces $\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$ corresponding to irreducible $SU(2)$ representations of various dimensionalities.

Fluxons. Fix an integer $1\le n\le N$ and consider the block gauge field configuration of size $2n$ given by

\[
C = \frac N2\,V\,\big(\sigma^3\otimes\cos(2\rho)+\sigma^2\otimes\sin(2\rho)\big)\,V^{-1}
= \frac N2\begin{pmatrix}V_+\cos(2\rho)\,V_+^{-1} & -\,i\,V_+\sin(2\rho)\,V_-^{-1}\\ i\,V_-\sin(2\rho)\,V_+^{-1} & -V_-\cos(2\rho)\,V_-^{-1}\end{pmatrix},
\tag{8.23}
\]

which is almost the same as (8.14) above but without the $\sigma^0$ block. We choose

\[
\lambda_i = -\,\frac{n-2i}{n}\qquad\text{for}\quad i=1,\dots,n-1,
\tag{8.24}
\]

along with

\[
V_+ = \Pi_{(n)}\qquad\text{and}\qquad V_- = \mathbb{1}_{n-1},
\tag{8.25}
\]

where $\Pi_{(n)}\in U(n+1)$ is a cyclic permutation matrix representing $\pi_{(n)} := (1\;2\;\cdots\;n)$. Then we get explicitly

\[
C = \frac{N}{2n}\begin{pmatrix}\mathrm{diag}(-n+2,\dots,n-2) & \tilde\xi_+\\ \tilde\xi_- & \mathrm{diag}(n-2,\dots,-n+2)\end{pmatrix},
\tag{8.26}
\]

where $\tilde\xi_\pm$ are cyclic operators (rather than raising/lowering operators as before). In this case $C_0=0$, and hence this solution is part of the orbifold singularities for $n$ coincident fluxons in the moduli space (3.11) of Sect. 3.1, rather than an irreducible representation of the isometry group $SU(2)$. This construction is further used below. In particular, the special case $n=1$ gives a single fluxon $C(n=1) = \frac N2\,\sigma^3$. Then there exists a unitary transformation $U\in SU(2)$ such that

\[
U\,C(n=1)\,U^{-1} = \frac N2\,U\,\sigma^3\,U^{-1} = c_i\,\sigma^i,
\tag{8.27}
\]

which gives the position $c_i$ of the fluxon on the sphere $S^2$.

Multi-block solutions. Let us modify the previous radial solution by setting $\lambda_1 = \pm1$ and taking $\lambda_{i+1}$ to be given by (8.24), while keeping the angular variables (8.25) in $U(n+2)$ and $U(n)$ the same. Then the block covariant coordinates (8.23) of size $2(n+1)$ are given explicitly as

\[
C = \frac{N}{2n}\begin{pmatrix}\mathrm{diag}(-n+2,\dots,n-2,\pm n) & \xi_+\\ \xi_- & \mathrm{diag}(\mp n,n-2,\dots,-n+2)\end{pmatrix},
\tag{8.28}
\]

which is almost the same as the vacuum configuration (8.16) for an $n$-dimensional irreducible representation except that there are two explicit diagonal entries $\frac N2,-\frac N2$ instead of $\frac N2,\frac N2$. In particular, $C_0$ is no longer constant and hence the gauge fields (8.28) are not solutions of the Yang-Mills equations of motion. This can be cured by the addition of extra irreducible representations as follows.

One can now construct solutions of the Yang-Mills equations with several blocks and arbitrary parameters, i.e. the generic critical surfaces $\mathcal{C}_{(n_1,s_1),\dots,(n_k,s_k)}$, by joining an even number of copies of (8.28) in a suitable way. Fix another integer $m\ge1$ such that $n+m\le N$, and consider again the block covariant coordinate (8.23) of size $2(n+m)$ with

\[
\lambda_1 = 1,\qquad \lambda_i = -\,\frac{n-2(i-1)}{n}\quad\text{for}\quad i=2,\dots,n
\qquad\text{and}\qquad
\lambda_{j+n-1} = -\,\frac{m-2(j-1)}{m}\quad\text{for}\quad j=1,\dots,m.
\tag{8.29}
\]

The angular degrees of freedom are given by

\[
V_+ = \Pi_{(n+m)}\qquad\text{and}\qquad V_- = \mathbb{1}_{n+m-1}
\tag{8.30}
\]

in $U(n+m\pm1)$, corresponding to the cyclic permutation $\pi_{(n+m)}$ decomposed as

\[
\pi_{(n+m)} = (\pi_{(n)})_{1,\dots,n}\circ(\pi_{(m)})_{n+1,\dots,n+m}\circ(1\;\,n{+}1),
\tag{8.31}
\]

where the subscripts indicate the indices that the permutations act on. The role of the transposition $(1\;\,n{+}1)$ is to first interchange the explicit $1$ and $-1$ in (8.29) for the upper block in (8.23), which then takes the form of two copies of the matrix (8.28) but with the correct explicit diagonal entries $\pm\frac N2$. Since $V_+ = \Pi_{(n+m)}$ corresponds to an irreducible cycle, $C$ is a direct sum of two irreducible representations with opposite sign and hence lives on the critical surface block $\mathcal{C}_{(n,1),(m,-1)}$ with vanishing overall trace. This construction clearly generalizes to an arbitrary number of irreducible representations of the $SU(2)$ isometry group.
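Three statements of this subsection lend themselves to a quick numerical sanity check: that the vacuum covariant coordinate (8.16) squares to $\frac{N^2}{4}$ times the identity (so its eigenvalues are $\pm\frac N2$), that the radial coordinates (8.18) reproduce the off-diagonal entries as in (8.21), and that joining two cycles by a transposition as in (8.31) yields a single irreducible cycle. The following Python sketch is only an illustration under stated assumptions: the function names are hypothetical, the matrices are built from (8.15) and (8.16) as printed, and the rightmost factor in (8.31) is assumed to act first.

```python
import math


def vacuum_covariant_coordinate(N):
    """2N x 2N real matrix [[1/2 + xi3, xi+], [xi-, 1/2 - xi3]] from (8.15), (8.16)."""
    xi3 = [-(N + 1 - 2 * i) / 2 for i in range(1, N + 1)]
    C = [[0.0] * (2 * N) for _ in range(2 * N)]
    for i in range(N):
        C[i][i] = 0.5 + xi3[i]
        C[N + i][N + i] = 0.5 - xi3[i]
    for i in range(1, N):  # (xi+)_{i,i+1} = sqrt((N - i) i) with 1-based i
        entry = math.sqrt((N - i) * i)
        C[i - 1][N + i] = entry  # xi+ sits in the upper-right block
        C[N + i][i - 1] = entry  # xi- is its transpose in the lower-left block
    return C


def max_deviation_from_square(C, N):
    """Largest entry of |C^2 - (N^2/4) 1|."""
    size = len(C)
    worst = 0.0
    for i in range(size):
        for j in range(size):
            entry = sum(C[i][k] * C[k][j] for k in range(size))
            target = N * N / 4 if i == j else 0.0
            worst = max(worst, abs(entry - target))
    return worst


def radial_check(N):
    """Largest gap between sqrt(1 - lambda_i^2) and (2/N) sqrt(i (N - i))."""
    worst = 0.0
    for i in range(1, N):
        lam = -(N - 2 * i) / N
        gap = math.sqrt(1 - lam * lam) - 2 / N * math.sqrt(i * (N - i))
        worst = max(worst, abs(gap))
    return worst


def joined_cycle(n, m):
    """Compose an n-cycle, an m-cycle and the transposition joining them (0-based)."""
    size = n + m
    cyc_n = [(i + 1) % n if i < n else i for i in range(size)]
    cyc_m = [n + (i - n + 1) % m if i >= n else i for i in range(size)]
    swap = list(range(size))
    swap[0], swap[n] = swap[n], swap[0]
    # Assumed convention: the transposition acts first, then the two cycles.
    return [cyc_n[cyc_m[swap[i]]] for i in range(size)]


def cycle_lengths(perm):
    """Sorted cycle lengths of a permutation given as a 0-based list."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        if length:
            lengths.append(length)
    return sorted(lengths)
```

For example, `max_deviation_from_square(vacuum_covariant_coordinate(5), 5)` and `radial_check(7)` should both vanish up to rounding, and `cycle_lengths(joined_cycle(3, 4))` consists of a single cycle of length 7. The resulting cycle agrees with $(1\;2\;\cdots\;n{+}m)$ only up to a relabelling of the indices, which does not affect its irreducibility.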

8.3. Action of the gauge group. Finally, let us describe how the gauge symmetry acts on the radially foliated solutions. Recall that the gauge group $G\cong SU(nN)$ is embedded in the symmetry group of the orbit space $\mathcal{O}$ as $\phi = \phi_0\otimes\sigma^0$ in the Lie algebra of $G\subset SU(2nN)$. This embedding is well adapted to the modification of the radial coordinates in (8.12) by the permutation matrix $\Pi_0$. Indeed, there is an embedding of the “diagonal” subgroup $U(nN_-)\subset U(nN_+)\times U(nN_-)$ into $G$ given by taking $V_-$ into $\mathrm{diag}(\mathbb{1}_n,V_-)\otimes\sigma^0$ as

\[
V_- \longmapsto \begin{pmatrix}\begin{pmatrix}\mathbb{1}_n & \\ & V_-\end{pmatrix} & 0\\ 0 & \begin{pmatrix}\mathbb{1}_n & \\ & V_-\end{pmatrix}\end{pmatrix}.
\tag{8.32}
\]

This shows explicitly that a large part of the gauge group is part of the stabilizer group $R = U(nN_+)\times U(nN_-)$ which defines the foliation of the radial coordinates. Furthermore, there is an additional symmetry $SU(n)\subset U(nN_+)$ embedded into $G$ by taking $U$ into $\mathrm{diag}(U,\mathbb{1}_{nN_-})\otimes\sigma^0$ as

\[
U \longmapsto \begin{pmatrix}\begin{pmatrix}U & \\ & \mathbb{1}_{nN_-}\end{pmatrix} & 0\\ 0 & \begin{pmatrix}U & \\ & \mathbb{1}_{nN_-}\end{pmatrix}\end{pmatrix}.
\tag{8.33}
\]

This extra $SU(n)$ symmetry acts on the marked momenta $p_1,\dots,p_n$ of Sect. 7.3, and together with the degenerate Itzykson-Zuber localization it is thus responsible for the emergence of the nonabelian gauge symmetry in the commutative limit. The remainder $SU(nN)/SU(nN_-)\times SU(n)$ of the gauge group mixes the symplectic leaves, so that the radial foliation is not $G$-equivariant.

Acknowledgement. We thank C.-S. Chu, B. Dolan, H. Grosse, X. Martin and D. O’Connor for helpful discussions. The work of H.S. was supported in part by the FWF Project P16779-N02 and in part by the FWF Project P18657. The work of R.J.S. was supported in part by the EU-RTN Network Grant MRTN-CT-2004-005104.

References

1. Madore, J.: The Fuzzy Sphere. Class. Quant. Grav. 9, 69–88 (1992)
2. Grosse, H., Klimcik, C., Presnajder, P.: Towards Finite Quantum Field Theory in Noncommutative Geometry. Int. J. Theor. Phys. 35, 231–244 (1996)
3. Klimcik, C.: Gauge Theories on the Noncommutative Sphere. Commun. Math. Phys. 199, 257–279 (1998)
4. Carow-Watamura, U., Watamura, S.: Noncommutative Geometry and Gauge Theory on Fuzzy Sphere. Commun. Math. Phys. 212, 395–413 (2000)
5. Baez, S., Balachandran, A.P., Ydri, B., Vaidya, S.: Monopoles and Solitons in Fuzzy Physics. Commun. Math. Phys. 208, 787–798 (2000)
6. Grosse, H., Rupp, C.W., Strohmaier, A.: Fuzzy Line Bundles, the Chern Character and Topological Charges over the Fuzzy Sphere. J. Geom. Phys. 42, 54–63 (2002)
7. Grosse, H., Maceda, M., Madore, J., Steinacker, H.: Fuzzy Instantons. Int. J. Mod. Phys. A 17, 2095 (2002)
8. Presnajder, P.: Gauge Fields on the Fuzzy Sphere. Mod. Phys. Lett. A18, 2431–2438 (2003)
9. Steinacker, H.: Quantized Gauge Theory on the Fuzzy Sphere as Random Matrix Model. Nucl. Phys. B679, 66–98 (2004)
10. Castro-Villarreal, P., Delgadillo-Blando, R., Ydri, B.: A Gauge-Invariant UV-IR Mixing and the Corresponding Phase Transition for U(1) Fields on the Fuzzy Sphere. Nucl. Phys. B704, 111–153 (2005)
11. Ydri, B.: The One-Plaquette Model Limit of NC Gauge Theory in 2D. Nucl. Phys. B762, 148–188 (2007); Ydri, B.: Quantum Equivalence of NC and YM Gauge Theories in 2D and Matrix Theory. http://arxiv.org/list/hepth/0701057, 2007
12. Aschieri, P., Grammatikopoulos, T., Steinacker, H., Zoupanos, G.: Dynamical generation of fuzzy extra dimensions, dimensional reduction and symmetry breaking. JHEP 0609, 026 (2006)
13. Karabali, D., Nair, V.P., Polychronakos, A.P.: Spectrum of Schrödinger Field in a Noncommutative Magnetic Monopole. Nucl. Phys. B627, 565–579 (2002)
14. Alekseev, A.Yu., Recknagel, A., Schomerus, V.: Noncommutative Worldvolume Geometries: Branes on SU(2) and Fuzzy Spheres. J. High Energy Phys. 9909, 023 (1999); Brane Dynamics in Background Fluxes and Noncommutative Geometry. J. High Energy Phys. 0005, 010 (2000)
15. Iso, S., Kimura, Y., Tanaka, K., Wakatsuki, K.: Noncommutative Gauge Theory on Fuzzy Sphere from Matrix Model. Nucl. Phys. B604, 121–147 (2001)
16. Azuma, T., Bal, S., Nagao, K., Nishimura, J.: Nonperturbative Studies of Fuzzy Spheres in a Matrix Model with the Chern-Simons Term. J. High Energy Phys. 0405, 005 (2004); O’Connor, D., Ydri, B.: Monte Carlo Simulation of a NC Gauge Theory on The Fuzzy Sphere. JHEP 0611, 016 (2006)
17. Berenstein, D., Maldacena, J.M., Nastase, H.: Strings in Flat Space and pp-Waves from N = 4 Super Yang-Mills. J. High Energy Phys. 0204, 013 (2002)
18. Witten, E.: Two-Dimensional Gauge Theories Revisited. J. Geom. Phys. 9, 303–368 (1992)
19. Minahan, J.A., Polychronakos, A.P.: Classical Solutions for Two-Dimensional QCD on the Sphere. Nucl. Phys. B422, 172–194 (1994)
20. Gross, D.J., Matytsin, A.: Instanton Induced Large N Phase Transitions in Two-Dimensional and Four-Dimensional QCD. Nucl. Phys. B429, 50–74 (1994)
21. Douglas, M.R., Nekrasov, N.A.: Noncommutative Field Theory. Rev. Mod. Phys. 73, 977–1029 (2001)
22. Szabo, R.J.: Quantum Field Theory on Noncommutative Spaces. Phys. Rept. 378, 207–299 (2003)
23. Jeffrey, L.C., Kirwan, F.C.: Localization for Nonabelian Group Actions. Topology 34, 291–327 (1995); Intersection Theory on Moduli Spaces of Holomorphic Bundles of Arbitrary Rank on a Riemann Surface. Ann. Math. 148, 109–196 (1998)
24. Paradan, P.-E.: The Moment Map and Equivariant Cohomology with Generalized Coefficients. Topology 39, 401–444 (2000)
25. Jeffrey, L.C., Kiem, Y.-H., Kirwan, F.C., Woolf, J.: Cohomology Pairings on Singular Quotients in Geometric Invariant Theory. Transf. Groups 8, 217–259 (2003)
26. Woodward, C.T.: Localization for the Norm-Square of the Moment Map and the Two-Dimensional Yang-Mills Integral. J. Sympl. Geom. 3(1), 17–54 (2005)
27. Beasley, C., Witten, E.: Nonabelian Localization for Chern-Simons Theory. J. Diff. Geom. 70, 183–323 (2005)
28. Blau, M., Thompson, G.: Localization and Diagonalization: A Review of Functional Integral Techniques for Low-Dimensional Gauge Theories and Topological Field Theories. J. Math. Phys. 36, 2192–2236 (1995)
29. Szabo, R.J.: Equivariant Cohomology and Localization of Path Integrals. Lect. Notes Phys. M63, 1–319 (2000); Equivariant Localization of Path Integrals. http://arxiv.org/list/hepth/9608068, 1996
30. Migdal, A.A.: Recursion Equations in Gauge Field Theories. Sov. Phys. JETP 42, 413 (1975) [Zh. Eksp. Teor. Fiz. 69, 810–822 (1975)]
31. Rusakov, B.E.: Loop Averages and Partition Functions in U(N) Gauge Theory on Two-Dimensional Manifolds. Mod. Phys. Lett. A5, 693–703 (1990)
32. Paniak, L.D., Szabo, R.J.: Instanton Expansion of Noncommutative Gauge Theory in Two Dimensions. Commun. Math. Phys. 243, 343–387 (2003)
33. Paniak, L.D., Szabo, R.J.: Lectures on Two-Dimensional Noncommutative Gauge Theory 1: Classical Aspects. Sveske Fiz. Nauka A16, 1–27 (2003); Lectures on Two-Dimensional Noncommutative Gauge Theory 2: Quantization. Lect. Notes Phys. 662, Berlin-Heidelberg-New York: Springer, 2005, pp. 205–237
34. Paniak, L.D., Szabo, R.J.: Open Wilson Lines and Group Theory of Noncommutative Yang-Mills Theory in Two Dimensions. J. High Energy Phys. 0305, 029 (2003)
35. Gross, D.J., Nekrasov, N.A.: Solitons in Noncommutative Gauge Theory. J. High Energy Phys. 0103, 044 (2001)
36. Helgason, S.: Differential Geometry, Lie Groups and Symmetric Spaces. New York: Academic Press, 1978; Groups and Geometric Analysis: Integral Geometry, Invariant Differential Operators and Spherical Functions. New York: Academic Press, 1984
37. Caselle, M., Magnea, U.: Random Matrix Theory and Symmetric Spaces. Phys. Rept. 394, 41–156 (2004)
38. Szabo, R.J.: Finite Volume Gauge Theory Partition Functions in Three Dimensions. Nucl. Phys. B723, 163–197 (2005)
39. Grosse, H., Steinacker, H.: Finite Gauge Theory on Fuzzy CP^2. Nucl. Phys. B707, 145–198 (2005)
40. Madore, J., Schraml, S., Schupp, P., Wess, J.: Gauge Theory on Noncommutative Spaces. Eur. Phys. J. C 16, 161–167 (2000)
41. Berline, N., Getzler, E., Vergne, M.: Heat Kernels and Dirac Operators. Berlin-Heidelberg-New York: Springer-Verlag, 1992
42. Griguolo, L., Seminara, D., Szabo, R.J.: Instantons, Fluxons and Open Gauge String Theory. Adv. Theor. Math. Phys. 9, 775–860 (2005)
43. Chu, C.-S., Madore, J., Steinacker, H.: Scaling Limits of the Fuzzy Sphere at One Loop. J. High Energy Phys. 0108, 038 (2001)
44. Behr, W., Meyer, F., Steinacker, H.: Gauge Theory on Fuzzy S^2 × S^2 and Regularization on Noncommutative R^4. J. High Energy Phys. 0507, 040 (2005)
45. Guillemin, V., Sternberg, S.: Symplectic Techniques in Physics. Cambridge: Cambridge University Press, 1984
46. Itzykson, C., Zuber, J.-B.: The Planar Approximation. 2. J. Math. Phys. 21, 411 (1980)
47. Guillemin, V., Lerman, E., Sternberg, S.: On the Kostant Multiplicity Formula. J. Geom. Phys. 5, 721–750 (1988)

Communicated by A. Connes

Commun. Math. Phys. 278, 253–288 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0390-4

Communications in

Mathematical Physics

Lingering Random Walks in Random Environment on a Strip

Erwin Bolthausen1, Ilya Goldsheid2

1 Universität Zürich, Institut für Mathematik, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland. E-mail: [email protected]
2 School of Mathematical Sciences, Queen Mary and Westfield College, University of London, London E1 4NS, UK. E-mail: [email protected]

Received: 30 July 2007 / Accepted: 1 November 2007
Published online: 8 December 2007 – © Springer-Verlag 2007

Abstract: We consider a recurrent random walk (RW) in random environment (RE) on a strip. We prove that if the RE is i.i.d. and its distribution is not supported by an algebraic subsurface in the space of parameters defining the RE then the RW exhibits the $(\log t)^2$ asymptotic behaviour. The exceptional algebraic subsurface is described by an explicit system of algebraic equations. One-dimensional walks with bounded jumps in a RE are treated as a particular case of the strip model. If the one-dimensional RE is i.i.d., then our approach leads to a complete and constructive classification of possible types of asymptotic behaviour of recurrent random walks. Namely, the RW exhibits the $(\log t)^2$ asymptotic behaviour if the distribution of the RE is not supported by a hyperplane in the space of parameters which shall be explicitly described. And if the support of the RE belongs to this hyperplane then the corresponding RW is a martingale and its asymptotic behaviour is governed by the Central Limit Theorem.

1. Introduction

The aim of this work is to describe conditions under which a recurrent random walk in a random environment (RWRE) on a strip exhibits the $\log^2 t$ asymptotic behaviour. This slow, lingering movement of a walk was discovered by Sinai in 1982 [18]. At the time, this work brought to a logical conclusion the study of the so-called simple RWs (SRW) started by Solomon in [19] and by Kesten, Kozlov, and Spitzer in [14]. The somewhat misleading term “simple” is often used as an abbreviation describing a walk on a one-dimensional lattice with jumps to nearest neighbours. Our work was motivated by a question asked by Sinai in [18] about the validity of his (and related) results for other models. Perhaps the simplest extension of the SRW is presented by a class of one-dimensional walks whose jumps (say) to the left are bounded and to the right are of length at most one. These models were successfully studied by a number of authors and the relevant references can be found in [2]. We would like to quote one result concerning this special case since it is perhaps closest to our results stated below in Theorems 2 and 3. Namely, Bremont proved in [3] that if the environment is defined by a Gibbs measure on a sub-shift of finite type, then the asymptotic behaviour of a recurrent RW is either as in Sinai’s theorem, or it is governed by the Central Limit Law. General one-dimensional walks with bounded jumps (1DWBJ) were also studied by different authors. Key in [15] found conditions for recurrence of a wide class of 1DWBJ. Certain sufficient conditions for the Sinai behaviour of 1DWBJ were obtained by Letchikov in [17]. The results from [17] will be discussed in a more detailed way in Sect. 1.1 after the precise definition of the one-dimensional model is given. We refer the reader to [20] for further historical comments as well as for a review of other recent developments.

The main object of this paper is the RWRE on a strip. We prove (and this is the main result of this paper) that recurrent walks in independent identically distributed (i.i.d.) random environments on a strip exhibit the $\log^2 t$ asymptotic behaviour if the support of the distribution of the parameters defining the random environment does not belong to a certain algebraic subsurface in the space of parameters. This subsurface is defined by an explicit system of algebraic equations. The one-dimensional RW with bounded jumps can be viewed as a particular case of a RWRE on a strip. This fact was explained in [1] and we shall repeat this explanation here. Due to this reduction, our main result implies a complete classification of recurrent 1DWBJ in i.i.d. environments. Namely, the corresponding system of algebraic equations reduces in this case to one linear equation which defines a hyperplane in the space of parameters.
If the support of the distribution of parameters does not belong to this hyperplane, then the RW exhibits the Sinai behaviour (see Theorem 2 below). But if it does, then (Theorem 3 below) the corresponding random walk is a martingale and its asymptotic behaviour is governed by the Central Limit Law. In brief, recurrent 1DWBJ are either of the Sinai type, or they are martingales. In the case of a strip, a complete classification can also be obtained and it turns out that once again the asymptotic behaviour is either the Sinai, or is governed by the Invariance Principle. However, this case is less transparent and more technical even to describe in exact terms and we shall leave it for a future work. The paper is organized as follows. We state Sinai’s result and define a more general one-dimensional model in Sect. 1.1. Section 1.2 contains the definition of the strip model and the explanation of the reduction of the one-dimensional model to the strip case. Main results are stated in Sect. 1.3. Section 2 contains several statements which are then used in the proof of the main result, Theorem 1. In particular, we introduce random transformations associated with random environments in Sect. 2.2. It turns out to be natural to recall and to extend slightly, in the same Sect. 2.2, those results from [1] which are used in this paper. An important Lemma 5 is proved in Sect. 2.3; this lemma allows us to present the main algebraic statement of this work in a constructive form. In Sect. 2.4 we prove the invariance principle for the log of a norm of a product of certain matrices. This function plays the role of the so-called potential of the environment and is responsible for the Sinai behaviour of the random walk. It is used in the proof of our main result in Sect. 3. Finally the Appendix contains results of which many (if not all) are not new but it is convenient to have them in a form directly suited for our purposes. 
Among these, the most important for our applications is the Invariance Principle (IP) for “contracting” Markov chains (Sect. 4.1.3). Its proof is derived from a well known IP for general Markov chains which, in turn, is based on the IP for martingales.


Conventions. The following notations and terminology shall be used throughout the paper. $\mathbb{R}$ is the set of real numbers, $\mathbb{Z}$ is the set of integer numbers, and $\mathbb{N}$ is the set of positive integers. For a vector $x = (x_i)$ and a matrix $A = (a(i,j))$ we put

$$\|x\| \stackrel{\mathrm{def}}{=} \max_i |x_i|, \qquad \|A\| \stackrel{\mathrm{def}}{=} \max_i \sum_j |a(i,j)|.$$

Note that $\|A\| = \sup_{\|x\|=1} \|Ax\|$. We say that $A$ is strictly positive (and write $A > 0$) if all its matrix elements satisfy $a(i,j) > 0$. $A$ is called non-negative (and we write $A \ge 0$) if all $a(i,j)$ are non-negative. A similar convention applies to vectors.
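The two norms of the Conventions, and the identity between the row-sum norm and the induced norm, can be illustrated by a minimal sketch. The function names and the test matrix below are our own illustrative choices; the sketch relies on the elementary fact that the supremum of $\|Ax\|$ over the unit ball of the max norm is attained at a vector of signs $\pm 1$.

```python
from itertools import product

def vec_norm(x):
    """Max norm of a vector: max_i |x_i|."""
    return max(abs(v) for v in x)

def mat_norm(a):
    """Matrix norm of the Conventions: max_i sum_j |a(i, j)|."""
    return max(sum(abs(v) for v in row) for row in a)

def induced_norm(a):
    """sup ||Ax|| over ||x|| = 1; the sup is attained at a sign vector."""
    best = 0
    for signs in product((-1, 1), repeat=len(a[0])):
        image = [sum(v * s for v, s in zip(row, signs)) for row in a]
        best = max(best, vec_norm(image))
    return best

A = [[1, -2, 3], [-4, 5, 0]]   # arbitrary illustrative matrix
print(mat_norm(A), induced_norm(A))
```

The brute-force search over sign vectors is exponential in the dimension and is meant only as a check for small matrices.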

1.1. Sinai’s result and some of its extensions to 1DWBJ. Let $\omega \stackrel{\mathrm{def}}{=} (p_n)_{-\infty<n<\infty}$ be a sequence of independent identically distributed (i.i.d.) random variables satisfying $\varepsilon \le p_n \le 1 - \varepsilon$, where $\varepsilon > 0$. Put $q_n = 1 - p_n$ and consider a random walk $\xi(t)$ on a one-dimensional lattice with a starting point $\xi(0) = 0$ and transition probabilities

$$\mathrm{Pr}_\omega\{\xi(t+1) = n+1 \mid \xi(t) = n\} = p_n, \qquad \mathrm{Pr}_\omega\{\xi(t+1) = n-1 \mid \xi(t) = n\} = q_n,$$

thus defining a measure $\mathrm{Pr}_\omega\{\cdot\}$ on the space of trajectories of the walk. It is well known (Solomon, [19]) that this RW is recurrent in almost all environments $\omega$ if and only if $\mathbb{E} \ln \frac{p_n}{q_n} = 0$ (here $\mathbb{E}$ denotes the expectation with respect to the relevant measure $\mathbb{P}$ on the space of sequences). In [18] Sinai proved that if $\mathbb{E}\big(\ln \frac{p_n}{q_n}\big)^2 > 0$ and $\xi(\cdot)$ is recurrent then there is a weakly converging sequence of random variables $b_t(\omega)$, $t = 1, 2, \ldots$, such that

$$(\log t)^{-2}\xi(t) - b_t \to 0 \quad \text{as } t \to \infty. \tag{1.1}$$

The convergence in (1.1) is in probability with respect to the so-called annealed probability measure $\mathbb{P}(d\omega)\mathrm{Pr}_\omega$ (for precise statements see Sect. 1.3). The limiting distribution of $b_t$ was later found, independently, by Golosov [7,8] and Kesten [13].

The one-dimensional walk with bounded jumps on $\mathbb{Z}$ is defined similarly to the simple RW. Namely, let $\omega \stackrel{\mathrm{def}}{=} (p(n,\cdot))$, $n \in \mathbb{Z}$, be a sequence of non-negative vectors with $\sum_{k=-m}^{m} p(n,k) = 1$ and $m > 1$. Put $\xi(0) = 0$ and

$$\mathrm{Pr}_\omega(\xi(t+1) = n+k \mid \xi(t) = n) \stackrel{\mathrm{def}}{=} p(n,k), \qquad n \in \mathbb{Z}. \tag{1.2}$$
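The definition (1.2) can be turned into a short simulation. The following is only a sketch: the interface `p(n)` returning the list $(p(n,k))_{k=-m}^{m}$, the helper `iid_environment` and the particular way of drawing the random vectors are all assumptions made here for illustration. In particular, the environment drawn below is i.i.d. but is not tuned to be recurrent.

```python
import random

def iid_environment(m, seed=1):
    """Return p(n): a lazily sampled i.i.d. family of probability vectors.

    Each site gets one vector (p(n, -m), ..., p(n, m)), drawn once and cached.
    The sampling law (normalised uniform weights) is purely illustrative.
    """
    rng = random.Random(seed)
    cache = {}

    def p(n):
        if n not in cache:
            w = [rng.random() + 0.1 for _ in range(2 * m + 1)]
            total = sum(w)
            cache[n] = [v / total for v in w]
        return cache[n]

    return p

def simulate_walk(p, steps, m, seed=0):
    """Quenched path xi(0), ..., xi(steps) of the walk (1.2), with xi(0) = 0."""
    rng = random.Random(seed)
    jumps = list(range(-m, m + 1))
    pos, path = 0, [0]
    for _ in range(steps):
        # Jump k is chosen with probability p(pos, k).
        pos += rng.choices(jumps, weights=p(pos))[0]
        path.append(pos)
    return path

path = simulate_walk(iid_environment(m=2), steps=1000, m=2)
```

Keeping the environment fixed (cached) while the walk is resampled corresponds to the quenched measure $\mathrm{Pr}_\omega$; redrawing both corresponds to the annealed one.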

Suppose next that $p(n,\cdot)$ is a random sequence of vectors which is stationary in $n$ (in particular it can be i.i.d.). Sinai’s question can be put as follows: given that a RW is recurrent, what kind of asymptotic behaviour would one observe, and under what conditions? There were several attempts to extend Sinai’s result to the model (1.2). In particular, Letchikov [17] proved that if for some $\varepsilon > 0$ with $\mathbb{P}$-probability 1

$$p(n,1) \ge \sum_{k=-m}^{-2} p(n,k) + \varepsilon \quad \text{and} \quad p(n,-1) \ge \sum_{k=2}^{m} p(n,k) + \varepsilon$$

and the distribution of the i.i.d. random vectors $p(n,\cdot)$ is absolutely continuous with respect to the Lebesgue measure (on the relevant simplex), then the analogue of Sinai’s theorem holds. (In [17], there are also other restrictions on the distribution of the RE but they are much less important than the ones listed above.)

The technique we use in this work is completely different from that used in [2,3,15,17]. It is based on the methods from [1] and [6], and this work presents a further development of the approach to the analysis of the RWRE on a strip started there.


1.2. Definition of the strip model. The description of the strip model presented here is the same as in [1]. Let $(P_n, Q_n, R_n)$, $-\infty < n < \infty$, be a strictly stationary ergodic sequence of triples of $m \times m$ matrices with non-negative elements such that for all $n \in \mathbb{Z}$ the sum $P_n + Q_n + R_n$ is a stochastic matrix,

$$(P_n + Q_n + R_n)\mathbf{1} = \mathbf{1}, \tag{1.3}$$

where $\mathbf{1}$ is a column vector whose components are all equal to 1. We write the components of $P_n$ as $P_n(i,j)$, $1 \le i, j \le m$, and similarly for $Q_n$ and $R_n$. Let $(\Omega, \mathcal{F}, \mathbb{P}, T)$ be the corresponding dynamical system with $\Omega$ denoting the space of all sequences $\omega = (\omega_n) = ((P_n, Q_n, R_n))$ of triples described above, $\mathcal{F}$ being the corresponding natural $\sigma$-algebra, $\mathbb{P}$ denoting the probability measure on $(\Omega, \mathcal{F})$, and $T$ being a shift operator on $\Omega$ defined by $(T\omega)_n = \omega_{n+1}$. For fixed $\omega$ we define a random walk $\xi(t)$, $t \in \mathbb{N}$, on the strip $S = \mathbb{Z} \times \{1, \ldots, m\}$ by its transition probabilities $\mathcal{Q}_\omega(z, z_1)$ given by

$$\mathcal{Q}_\omega(z, z_1) \stackrel{\mathrm{def}}{=} \begin{cases} P_n(i,j) & \text{if } z = (n,i),\ z_1 = (n+1,j), \\ R_n(i,j) & \text{if } z = (n,i),\ z_1 = (n,j), \\ Q_n(i,j) & \text{if } z = (n,i),\ z_1 = (n-1,j), \\ 0 & \text{otherwise.} \end{cases} \tag{1.4}$$

This defines, for any starting point $z = (n,i) \in S$ and any $\omega$, a law $\mathrm{Pr}_{\omega,z}$ for the Markov chain $\xi(\cdot)$ by

$$\mathrm{Pr}_{\omega,z}(\xi(1) = z_1, \ldots, \xi(t) = z_t) \stackrel{\mathrm{def}}{=} \mathcal{Q}_\omega(z, z_1)\mathcal{Q}_\omega(z_1, z_2) \cdots \mathcal{Q}_\omega(z_{t-1}, z_t). \tag{1.5}$$

We call $\omega$ the environment or the random environment on a strip $S$. Denote by $\Omega_z$ the set of trajectories $\xi(\cdot)$ starting at $z$. $\mathrm{Pr}_{\omega,z}$ is the so-called quenched probability measure on $\Omega_z$. The semi-direct product $\mathbb{P}(d\omega)\mathrm{Pr}_{\omega,z}(d\xi)$ of $\mathbb{P}$ and $\mathrm{Pr}_{\omega,z}$ is defined on the direct product $\Omega \times \Omega_z$ and is called the annealed measure. None of our main results depends on the choice of the starting point $z$. We therefore write $\mathrm{Pr}_\omega$ instead of $\mathrm{Pr}_{\omega,z}$ when there is no danger of confusion.

The one-dimensional model (1.2) reduces to a RW on a strip due to the following geometric construction. Note first that it is natural to assume (and we shall do so) that at least one of the following inequalities holds:

$$\mathbb{P}\{\omega : p(x,m) > 0\} > 0 \quad \text{or} \quad \mathbb{P}\{\omega : p(x,-m) > 0\} > 0. \tag{1.6}$$

Consider the one-dimensional lattice as a subset of the $X$-axis in a two-dimensional plane. Cut this axis into equal intervals of length $m$ so that each of them contains exactly $m$ consecutive integer points. Turn each of these intervals around its leftmost integer point anti-clockwise by $\pi/2$. The image of $\mathbb{Z}$ obtained in this way is a part of a strip with distances between layers equal to $m$. Re-scaling the $X$-axis of the plane by $m^{-1}$ makes the distance between the layers equal to one. The random walk on the line is thus transformed into a random walk on a strip with jumps to nearest layers. The formulae for matrix elements of the corresponding matrices $P_n$, $Q_n$, $R_n$ result now from a formal description of this construction. Namely, present $x \in \mathbb{Z}$ as $x = nm + i$, where $1 \le i \le m$. This defines a bijection $x \leftrightarrow (n,i)$ between the one-dimensional lattice $\mathbb{Z}$ and the strip $S = \mathbb{Z} \times \{1, \ldots, m\}$. This bijection naturally transforms the


$\xi$-process on $\mathbb{Z}$ into a walk on $\mathbb{Z} \times \{1, \ldots, m\}$. The latter is clearly a random walk of type (1.5) and the corresponding matrix elements are given by

$$P_n(i,j) = p(nm+i,\, m+j-i), \qquad R_n(i,j) = p(nm+i,\, j-i), \qquad Q_n(i,j) = p(nm+i,\, -m+j-i). \tag{1.7}$$
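Formulae (1.7) translate directly into code. The sketch below assumes a hypothetical interface `p(x, k)` for the jump probabilities, returning 0 whenever $|k| > m$; the numerical vector `PVEC` is an arbitrary example. Exact rational arithmetic is used so that the stochasticity condition (1.3) can be checked without rounding.

```python
from fractions import Fraction

def strip_matrices(p, n, m):
    """Return (P_n, Q_n, R_n) built from formulae (1.7).

    p(x, k) is the probability of a jump from x to x + k and must vanish
    for |k| > m.  Indices i, j run over 1..m as in the text; the returned
    nested lists are 0-based.
    """
    idx = range(1, m + 1)
    P = [[p(n * m + i, m + j - i) for j in idx] for i in idx]
    R = [[p(n * m + i, j - i) for j in idx] for i in idx]
    Q = [[p(n * m + i, -m + j - i) for j in idx] for i in idx]
    return P, Q, R

# Illustrative site-independent jump law with m = 2.
PVEC = {-2: Fraction(1, 8), -1: Fraction(1, 4), 0: Fraction(1, 8),
        1: Fraction(1, 4), 2: Fraction(1, 4)}

def p_example(x, k):
    return PVEC.get(k, Fraction(0))

P, Q, R = strip_matrices(p_example, n=0, m=2)
row_sums = [sum(P[i]) + sum(Q[i]) + sum(R[i]) for i in range(2)]
```

Each row sum collects every value $p(nm+i, k)$, $-m \le k \le m$, exactly once, which is why (1.3) holds for the matrices so constructed.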

1.3. Main results. Denote by $J$ the following set of triples of $m \times m$ matrices:

$$J \stackrel{\mathrm{def}}{=} \{(P, Q, R) : P \ge 0,\ Q \ge 0,\ R \ge 0 \text{ and } (P + Q + R)\mathbf{1} = \mathbf{1}\}.$$

Let $J_0 \subset J$ be the support of the probability distribution of the random triple $(P_n, Q_n, R_n)$ defined above (obviously, this support does not depend on $n$). The two assumptions C1 and C2 listed below will be referred to as Condition C.

Condition C
C1 $(P_n, Q_n, R_n)$, $-\infty < n < \infty$, is a sequence of independent identically distributed random variables.
C2 There is an $\varepsilon > 0$ and a positive integer $l < \infty$ such that for any $(P, Q, R) \in J_0$ and all $i, j \in [1, m]$,

$$\|R^l\| \le 1 - \varepsilon, \qquad ((I - R)^{-1}P)(i,j) \ge \varepsilon, \qquad ((I - R)^{-1}Q)(i,j) \ge \varepsilon.$$

Remarks. 1. We note that, say, $((I - R_n)^{-1}P_n)(i,j)$ is the probability for a RW starting from $(n,i)$ to reach $(n+1,j)$ at its first exit from layer $n$. The inequality $\|R_n^l\| \le 1 - \varepsilon$ is satisfied in essentially all interesting cases and, roughly speaking, means that the probability for a random walk to remain in layer $n$ after a certain time $l$ is small uniformly with respect to $n$ and $\omega$.

2. If the strip model is obtained from the one-dimensional model, then C2 may not be satisfied by matrices (1.7). This difficulty can be overcome if we replace C2 by a much milder condition, namely:

C3 For $\mathbb{P}$-almost all $\omega$: (a) the strip $S$ is the (only) communication class of the walk, (b) there is an $\varepsilon > 0$ and a triple $(P, Q, R) \in J_0$ such that at least one of the following two inequalities holds: $((I - R)^{-1}P)(i,j) \ge \varepsilon$ for all $i, j \in [1, m]$, or $((I - R)^{-1}Q)(i,j) \ge \varepsilon$ for all $i, j \in [1, m]$.

Our proofs will be carried out under Condition C2. They can be modified to make them work also under Condition C3. Lemma 6, which is used in the proof of Theorem 1, is the main statement requiring a more careful treatment under Condition C3, and the corresponding adjustments are not difficult. However, the proofs become more technical in this case, and we shall not do this in the present paper.
If now the vectors $p(x,\cdot)$ defining matrices (1.7) are $\mathbb{P}$-almost surely such that $p(x,1) \ge \epsilon$ and $p(x,-1) \ge \epsilon$ for some $\epsilon > 0$, then it is easy to see that Condition C3 is satisfied. We note also that if in addition the inequalities $p(x,m) \ge \epsilon$ and $p(x,-m) \ge \epsilon$ hold $\mathbb{P}$-almost surely, then also C2 is satisfied.

For a triple of matrices $(P, Q, R) \in J_0$ denote by $\pi = \pi_{(P,Q,R)} = (\pi_1, \ldots, \pi_m)$ a row vector with non-negative components such that

$$\pi(P + Q + R) = \pi \quad \text{and} \quad \sum_{j=1}^{m} \pi_j = 1.$$


Note that the vector $\pi$ is uniquely defined. Indeed, the equation for $\pi$ can be rewritten as

$$\pi(I - R)\big((I - R)^{-1}P + (I - R)^{-1}Q\big) = \pi(I - R).$$

According to Condition C2, the stochastic matrix $(I - R)^{-1}P + (I - R)^{-1}Q$ has strictly positive elements (in fact they are $\ge 2\varepsilon$). Hence $\pi(I - R)$ is uniquely (up to a multiplication by a number) defined by the last equation and this implies the uniqueness of $\pi$. Consider the following subset of $J$:

$$J_{al} \stackrel{\mathrm{def}}{=} \{(P, Q, R) \in J : \pi(P - Q)\mathbf{1} = 0, \text{ where } \pi(P + Q + R) = \pi\}, \tag{1.8}$$

where obviously $\pi(P - Q)\mathbf{1} \equiv \sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i,j) - Q(i,j))$. Note that $J_{al}$ is an algebraic subsurface in $J$. We are now in a position to state the main result of this work:

Theorem 1. Suppose that Condition C is satisfied, the random walk $\xi(\cdot) = (X(\cdot), Y(\cdot))$ is recurrent, and $J_0 \not\subset J_{al}$. Then there is a sequence of random variables $b_t(\omega)$, $t = 1, 2, \ldots$, which converges weakly as $t \to \infty$ and such that for any $\epsilon > 0$,

$$\mathbb{P}\left\{\omega : \mathrm{Pr}_\omega\left(\left|\frac{X(t)}{(\log t)^2} - b_t\right| \le \epsilon\right) \ge 1 - \epsilon\right\} \to 1 \quad \text{as } t \to \infty. \tag{1.9}$$

Remarks. The algebraic condition in this theorem requires a certain degree of non-degeneracy of the support $J_0$ of the distribution of $(P_n, Q_n, R_n)$. It may happen that relations (1.9) hold even when $J_0 \subset J_{al}$. However, Theorem 3 shows that there are important classes of environments where relations (1.9) (or (1.11)) hold if and only if this non-degeneracy condition is satisfied.

We now turn to the one-dimensional model. It should be mentioned right away that Theorem 2 is essentially a corollary of Theorem 1. Denote by $\tilde{J}$ the set of all $(2m+1)$-dimensional probability vectors:

$$\tilde{J} \stackrel{\mathrm{def}}{=} \Big\{(p(j))_{-m \le j \le m} : p(\cdot) \ge 0 \text{ and } \sum_{j=-m}^{m} p(j) = 1\Big\}.$$

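Remember that in this model the environment is a sequence of vectors: $\omega = (p(x,\cdot))_{-\infty<x<\infty}$, where $p(x,\cdot) \in \tilde{J}$. Let $\tilde{J}_0 \subset \tilde{J}$ be the support of the distribution of the random vector $p(0,\cdot)$. Finally, put

$$\tilde{J}_{al} \stackrel{\mathrm{def}}{=} \Big\{p(\cdot) \in \tilde{J} : \sum_{j=-m}^{m} j\,p(j) = 0\Big\}. \tag{1.10}$$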

Theorem 2. Suppose that: (a) $p(x,\cdot)$, $x \in \mathbb{Z}$, is a sequence of i.i.d. vectors, (b) there is an $\varepsilon > 0$ such that $p(0,1) \ge \varepsilon$, $p(0,-1) \ge \varepsilon$, $p(0,m) \ge \varepsilon$, and $p(0,-m) \ge \varepsilon$ for any $p(0,\cdot) \in \tilde{J}_0$, (c) for $\mathbb{P}$-almost all environments $\omega$ the corresponding one-dimensional random walk $\xi(\cdot)$ is recurrent, (d) $\tilde{J}_0 \not\subset \tilde{J}_{al}$.


Then there is a weakly converging sequence of random variables $b_t(\omega)$, $t = 1, 2, \ldots$, such that for any $\epsilon > 0$,

$$\mathbb{P}\left\{\omega : \mathrm{Pr}_\omega\left(\left|\frac{\xi(t)}{(\log t)^2} - b_t\right| \le \epsilon\right) \ge 1 - \epsilon\right\} \to 1 \quad \text{as } t \to \infty. \tag{1.11}$$

Proof. Since the one-dimensional model reduces to a model on a strip, the result in question would follow if we could check that all conditions of Theorem 1 follow from those of Theorem 2. It is obvious from formulae (1.7) that the i.i.d. requirement (Condition C1) follows from condition (a) of Theorem 2. We have already mentioned above that Condition C2 follows from condition (b). The recurrence of the corresponding walk on a strip is also obvious. Finally, condition (d) implies the algebraic condition of Theorem 1. Indeed, formulae (1.7) show that matrices $P_n$, $Q_n$, $R_n$ are defined by probability vectors $p(nm+i,\cdot) \in \tilde{J}_0$, where $1 \le i \le m$. Put $n = 0$ and choose all these vectors to be equal to each other, say $p(i,\cdot) = p(\cdot) \in \tilde{J}_0$, where $1 \le i \le m$. A direct check shows that the triple of matrices $(P, Q, R)$ built from this vector has the property that $P + Q + R$ is doubly stochastic and irreducible (irreducibility follows from the conditions $p(1) \ge \varepsilon$ and $p(-1) \ge \varepsilon$). Hence the only probability vector $\pi$ satisfying $\pi(P + Q + R) = \pi$ is given by $\pi = (m^{-1}, \ldots, m^{-1})$. One more direct calculation shows that in this case

$$m\pi(P - Q)\mathbf{1} = \sum_{j=-m}^{m} j\,p(j).$$
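The two "direct checks" used in this step of the proof can be reproduced in exact arithmetic. The sketch below is an illustration under our own naming; the vector `pvec` is an arbitrary example with $m = 3$. It builds $(P, Q, R)$ from a single vector $p(\cdot)$ via (1.7), confirms that $P + Q + R$ is doubly stochastic, and compares $m\pi(P - Q)\mathbf{1}$ for the uniform $\pi$ with $\sum_j j\,p(j)$.

```python
from fractions import Fraction

def homogeneous_triple(pvec, m):
    """(P, Q, R) from (1.7) when p(x, .) = pvec for every x.

    pvec maps k in -m..m to p(k); all other jumps have probability 0.
    """
    def p(k):
        return pvec.get(k, Fraction(0))
    idx = range(1, m + 1)
    P = [[p(m + j - i) for j in idx] for i in idx]
    R = [[p(j - i) for j in idx] for i in idx]
    Q = [[p(-m + j - i) for j in idx] for i in idx]
    return P, Q, R

def drift_identity(pvec, m):
    """Return (m * pi (P - Q) 1 with uniform pi, sum_j j p(j))."""
    P, Q, R = homogeneous_triple(pvec, m)
    S = [[P[i][j] + Q[i][j] + R[i][j] for j in range(m)] for i in range(m)]
    # Column sums equal to 1: S is doubly stochastic, so pi is uniform.
    assert all(sum(S[i][j] for i in range(m)) == 1 for j in range(m))
    lhs = sum(P[i][j] - Q[i][j] for i in range(m) for j in range(m))
    rhs = sum(k * pk for k, pk in pvec.items())
    return lhs, rhs

pvec = {-3: Fraction(1, 10), -1: Fraction(2, 10), 0: Fraction(1, 10),
        1: Fraction(3, 10), 2: Fraction(1, 10), 3: Fraction(2, 10)}
lhs, rhs = drift_identity(pvec, m=3)
```

With the uniform $\pi$, the factor $m$ cancels the weights $\pi_i = m^{-1}$, so `lhs` is simply the sum of all entries of $P - Q$.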

Hence the condition $J_0 \not\subset J_{al}$ of Theorem 1 is satisfied if there is at least one vector $p(\cdot) \in \tilde{J}_0$ such that $\sum_{j=-m}^{m} j\,p(j) \ne 0$.

We conclude this section with a theorem which shows, among other things, that the algebraic condition of Theorem 2 is also necessary for having (1.11). This theorem does not require independence as such but in a natural sense it finalizes the classification of the one-dimensional recurrent RWs with bounded jumps in i.i.d. environments.

Theorem 3. Consider a one-dimensional RW and suppose that (a) $p(x,\cdot)$, $x \in \mathbb{Z}$, is a strictly stationary ergodic sequence of vectors, (b) there is an $\varepsilon > 0$ such that $p(0,1) \ge \varepsilon$ and $p(0,-1) \ge \varepsilon$ for any $p(0,\cdot) \in \tilde{J}_0$, (c) $\tilde{J}_0 \subset \tilde{J}_{al}$, that is

$$\sum_{j=-m}^{m} j\,p(j) = 0 \quad \text{for any } p(\cdot) \in \tilde{J}_0.$$

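Then: (i) The random walk $\xi(\cdot)$ is asymptotically normal in every(!) environment $\omega = (p(x,\cdot))_{-\infty<x<\infty}$. (ii) There is a $\sigma > 0$ such that for $\mathbb{P}$-a.e. $\omega$,

$$\lim_{t\to\infty} \mathrm{Pr}_\omega\left\{\frac{\xi(t)}{\sqrt{t}} \le x\right\} = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{x} e^{-\frac{u^2}{2\sigma^2}}\,du, \tag{1.12}$$

where $x$ is any real number and the convergence in (1.12) is uniform in $x$.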


Remarks about the proof of Theorem 3. The condition of this theorem implies that $\xi(t)$ is a martingale:

$$E_\omega(\xi(t) - \xi(t-1) \mid \xi(t-1) = k) = \sum_{j=-m}^{m} j\,p(k,j) = 0,$$

where $E_\omega$ denotes the expectation with respect to the probability measure $\mathrm{Pr}_\omega$ on the space of trajectories of the random walk (we assume that $\xi(0) = 0$). Let $U_n = \xi(n) - \xi(n-1)$ and put

$$\sigma_n^2 \stackrel{\mathrm{def}}{=} E_\omega(U_n^2 \mid \xi(n-1)) = \sum_{j=-m}^{m} j^2 p(\xi(n-1), j).$$

Obviously $\varepsilon \le \sigma_n^2 \le m^2$, where $\varepsilon$ is the same as in Theorem 3. Next put $V_n^2 \stackrel{\mathrm{def}}{=} \sum_{j=1}^{n} \sigma_j^2$ and $s_n^2 \stackrel{\mathrm{def}}{=} E_\omega(V_n^2) = E_\omega(\xi(n)^2)$. It is useful to note that $n\varepsilon \le V_n^2,\ s_n^2 \le nm^2$. Let $T_t = \inf\{n : V_n^2 \ge t\}$.

Statement (i) of Theorem 3 is a particular case of a much more general theorem of Drogin who in particular proves that $t^{-1/2}\xi(T_t)$ converges weakly to a standard normal random variable. We refer to [12], p. 98 for more detailed explanations.

Statement (ii) of Theorem 3 is similar to a well-known result by Lawler [16]. The main ingredient needed for proving (ii) is the following claim:

$$\text{The limit } \lim_{n\to\infty} n^{-1}V_n^2 = \lim_{n\to\infty} n^{-1}s_n^2 \text{ exists for } \mathbb{P}\text{-almost all } \omega. \tag{1.13}$$

Once this property of the variance of $\xi(\cdot)$ is established, (ii) becomes a corollary of Brown’s Theorem (see Theorems 9 and 10 in the Appendix or Theorem 4.1 in [12]). However, proving (1.13) is not an entirely straightforward matter. The proof we are aware of uses the approach known under the name “environment viewed from the particle”. This approach was used in [16] for proving properties of variances similar to (1.13); unfortunately, the conditions used in [16], formally speaking, are not satisfied in our case. Fortunately, Zeitouni in [20] found the way in which Lawler’s result can be extended to more general martingale-type random walks in random environments which include our case.

2. Preparatory Results

2.1. Elementary corollaries of Condition C. We start with several elementary observations following from C2. Lemma 3 and a stronger version of Lemma 1 can be found in [1]. Lemmas 2 and 4 are borrowed from [6].

Lemma 1. If Condition C2 is satisfied then for $\mathbb{P}$-almost every environment $\omega$ the whole phase space $S$ of the Markov chain $\xi(t)$ constitutes the (only) communication class of this chain.

Proof. Fix an environment $\omega$ and consider matrices

$$\tilde{P}_n \stackrel{\mathrm{def}}{=} (I - R_n)^{-1}P_n, \qquad \tilde{Q}_n \stackrel{\mathrm{def}}{=} (I - R_n)^{-1}Q_n.$$


Remark that $\tilde{P}_n(i,j)$ is the probability that the random walk $\xi$ starting at $(n,i)$ would reach $(n+1,j)$ at the time of its first exit from layer $n$; the probabilistic meaning of $\tilde{Q}_n(i,j)$ is defined similarly. $\tilde{P}_n(i,j) \ge \varepsilon > 0$ and $\tilde{Q}_n(i,j) \ge \varepsilon > 0$ because of Condition C2. It is now obvious that a random walk $\xi(\cdot)$ starting from any $z \in S$ would reach any $z_1 \in S$ with a positive probability.

Matrices of the form $(I - R - Q\psi)^{-1}$, $(I - R - Q\psi)^{-1}P$, and $(I - R - Q\psi)^{-1}Q$ arise in the proofs of many statements below. We shall list several elementary properties of these matrices.

Lemma 2. If Condition C2 is satisfied, $(P, Q, R) \in J_0$ and $\psi$ is any stochastic matrix, then there is a constant $C$ depending only on $\varepsilon$ and $m$ such that

$$\|(I - R - Q\psi)^{-1}\| \le C. \tag{2.1}$$

Proof. Note first that $\|R^l\| \le 1 - \varepsilon$ implies that for some $C_1$, uniformly in $R$,

$$\|(I - R)^{-1}\| \le \sum_{k=0}^{\infty} \|R^k\| \le C_1.$$

Next, it follows from $(P + Q + R)\mathbf{1} = \mathbf{1}$ that $(I - R)^{-1}P\mathbf{1} + (I - R)^{-1}Q\mathbf{1} = \mathbf{1}$ and $(I - R)^{-1}Q\mathbf{1} = \mathbf{1} - (I - R)^{-1}P\mathbf{1}$. Condition C2 implies that $(I - R)^{-1}P\mathbf{1} \ge m\varepsilon\mathbf{1}$. Hence

$$\|(I - R)^{-1}Q\| = \|(I - R)^{-1}Q\mathbf{1}\| = \|\mathbf{1} - (I - R)^{-1}P\mathbf{1}\| \le 1 - m\varepsilon.$$

Similarly, $\|(I - R)^{-1}P\| \le 1 - m\varepsilon$. Hence

$$\|(I - R - Q\psi)^{-1}\| = \|(I - (I - R)^{-1}Q\psi)^{-1}(I - R)^{-1}\| \le \big(1 - \|(I - R)^{-1}Q\psi\|\big)^{-1}\|(I - R)^{-1}\| \le C_1 m^{-1}\varepsilon^{-1} \equiv C.$$

The lemma is proved.

Lemma 3. ([1]). If Condition C2 is satisfied, $(P, Q, R) \in J$, and $\psi$ is a stochastic matrix, then $(I - R - Q\psi)^{-1}P$ is also stochastic.

Proof. We have to check that $(I - R - Q\psi)^{-1}P\mathbf{1} = \mathbf{1}$, which is equivalent to $P\mathbf{1} = (I - Q\psi - R)\mathbf{1} \Leftrightarrow (P + Q\psi + R)\mathbf{1} = \mathbf{1}$. Since $\psi\mathbf{1} = \mathbf{1}$ and $P + Q + R$ is stochastic, the result follows.

Lemma 4. Suppose that Condition C2 is satisfied and $(P, Q, R) \in J_0$, and let a matrix $\varphi \ge 0$ be such that $\varphi\mathbf{1} \le \mathbf{1}$. Then

$$((I - R - Q\varphi)^{-1}P)(i,j) \ge \varepsilon \quad \text{and} \quad ((I - R - Q\varphi)^{-1}Q)(i,j) \ge \varepsilon. \tag{2.2}$$

Proof. $(I - R - Q\varphi)^{-1}P \ge (I - R)^{-1}P$ and $(I - R - Q\varphi)^{-1}Q \ge (I - R)^{-1}Q$.
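Lemmas 3 and 4 lend themselves to an exact numerical check. The following sketch is only an illustration: the triple $(P, Q, R)$ and the stochastic matrix `psi` are arbitrary choices made here, and the helper names are ours. It solves $(I - R - Q\psi)X = P$ by Gauss-Jordan elimination over the rationals, verifies that $X$ is stochastic, and compares it entrywise with $(I - R)^{-1}P$.

```python
from fractions import Fraction as F

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def solve(mat, rhs):
    """Solve mat * X = rhs exactly by Gauss-Jordan elimination."""
    n = len(mat)
    aug = [list(mat[i]) + list(rhs[i]) for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def exit_matrices(P, Q, R, psi):
    """Return ((I - R - Q psi)^{-1} P, (I - R - Q psi)^{-1} Q)."""
    m = len(P)
    qpsi = matmul(Q, psi)
    core = [[F(int(i == j)) - R[i][j] - qpsi[i][j] for j in range(m)]
            for i in range(m)]
    return solve(core, P), solve(core, Q)

# Illustrative triple with (P + Q + R)1 = 1 and strictly positive P, Q.
P = [[F(2, 5), F(1, 10)], [F(1, 5), F(3, 10)]]
Q = [[F(1, 10), F(1, 5)], [F(1, 10), F(1, 10)]]
R = [[F(1, 10), F(1, 10)], [F(1, 5), F(1, 10)]]
psi = [[F(1, 3), F(2, 3)], [F(1), F(0)]]      # any stochastic matrix
zero = [[F(0), F(0)], [F(0), F(0)]]

with_psi, _ = exit_matrices(P, Q, R, psi)     # Lemma 3: stochastic
without, _ = exit_matrices(P, Q, R, zero)     # equals (I - R)^{-1} P
```

The entrywise comparison reflects the series expansion $(I - R - Q\varphi)^{-1} = \sum_k (R + Q\varphi)^k \ge \sum_k R^k$ behind the one-line proof of Lemma 4.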

2.2. Random transformations, related Markov chains, Lyapunov exponents, and recurrence criteria. The purpose of this section is to introduce objects listed in its title. These objects shall play a major role in the proofs of our main results. They shall also allow us to state the main results from [1] in the form which is suitable for our purposes.


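Random transformations and related Markov chains. Let $\Psi$ be the set of stochastic $m \times m$ matrices, $X$ be the set of unit vectors with non-negative components, and $M \stackrel{\mathrm{def}}{=} \Psi \times X$ the direct product of these two sets. Define a distance $\rho(\cdot,\cdot)$ on $M$ by

$$\rho((\psi, x), (\psi', x')) \stackrel{\mathrm{def}}{=} \|\psi - \psi'\| + \|x - x'\|. \tag{2.3}$$

For any triple $(P, Q, R) \in J_0$ denote by $g \equiv g_{(P,Q,R)}$ a transformation $g : M \to M$, where

$$g.(\psi, x) \stackrel{\mathrm{def}}{=} \big((I - R - Q\psi)^{-1}P,\ \|Bx\|^{-1}Bx\big), \tag{2.4}$$

and

$$B \equiv B_{(P,Q,R)}(\psi) \stackrel{\mathrm{def}}{=} (I - R - Q\psi)^{-1}Q. \tag{2.5}$$

The fact that $g$ maps $M$ into itself follows from Lemma 3.

Remarks. Here and in the sequel the notation $g.(\psi, x)$ is used instead of $g((\psi, x))$ and the dot is meant to replace the brackets and to emphasize the fact that $g$ maps $(\psi, x)$ into another pair from $M$. In fact this notation is often used in the theory of products of random matrices, e.g. $B.x \stackrel{\mathrm{def}}{=} \|Bx\|^{-1}Bx$; we thus have extended this tradition to another component of $g$.

If $\omega \in \Omega$ is an environment, $\omega = (\omega_n)_{-\infty<n<\infty}$ with $\omega_n = (P_n, Q_n, R_n) \in J_0$, we write $g_n \equiv g_{\omega_n}$ for the corresponding sequence of random transformations of $M$. For an $a \in \mathbb{Z}$ and a starting pair $(\psi_a, x_a) \in M$ put, for $n \ge a$,

$$(\psi_{n+1}, x_{n+1}) \stackrel{\mathrm{def}}{=} g_n.(\psi_n, x_n) \equiv \big((I - R_n - Q_n\psi_n)^{-1}P_n,\ \|B_n x_n\|^{-1}B_n x_n\big), \tag{2.6}$$

where we use a concise notation for matrices defined by (2.5):

$$B_n \stackrel{\mathrm{def}}{=} B_{\omega_n}(\psi_n) \equiv B_{(P_n,Q_n,R_n)}(\psi_n). \tag{2.7}$$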

Theorem 4. Suppose that Condition C is satisfied. Then:

a) For $\mathbb{P}$-a.e. sequence $\omega$ the following limits exist:

$$\zeta_n \stackrel{\mathrm{def}}{=} \lim_{a\to-\infty} \psi_n, \qquad y_n \stackrel{\mathrm{def}}{=} \lim_{a\to-\infty} x_n, \tag{2.8}$$

and $(\zeta_n, y_n)$ does not depend on the choice of the sequence $(\psi_a, x_a)$. Furthermore, the convergence in (2.8) is uniform in $(\psi_a, x_a)$.

b) The sequence of pairs $(\zeta_n, y_n) \equiv (\zeta_n(\omega), y_n(\omega))$, $-\infty < n < \infty$, is the unique sequence of elements from $M$ which satisfy the following infinite system of equations:

$$(\zeta_{n+1}, y_{n+1}) = \big((I - R_n - Q_n\zeta_n)^{-1}P_n,\ \|A_n(\omega)y_n\|^{-1}A_n(\omega)y_n\big), \qquad n \in \mathbb{Z}, \tag{2.9}$$

where

$$A_n \equiv A_n(\omega) \stackrel{\mathrm{def}}{=} (I - R_n - Q_n\zeta_n)^{-1}Q_n. \tag{2.10}$$

c) The enlarged sequence $(\omega_n, \zeta_n, y_n)$, $-\infty < n < \infty$, forms a stationary and ergodic Markov chain with components $\omega_n$ and $(\zeta_n, y_n)$ being independent of each other.


Proof. The first relation in (2.8) is the most important statement of our theorem and it also is the main content of Theorem 1 in [1]; it thus is known. The main difference between this theorem and Theorem 1 from [1] is that here we consider the extended sequence $(\psi_n, x_n)$, $n \ge a$, rather than just $(\psi_n)$, $n \ge a$. The proof of the second relation in (2.8) is based on two observations. First note that the first relation in (2.8) implies that $\lim_{a\to-\infty} B_n = A_n$. Next, it follows from the definition of the sequence $x_n$ that

$$x_n = \|B_{n-1} \cdots B_a x_a\|^{-1} B_{n-1} \cdots B_a x_a. \tag{2.11}$$

Estimates (2.1) and (2.2) imply that $\min_{i_1,i_2,i_3,i_4} \big(B_k(i_1,i_2)\big)^{-1}B_k(i_3,i_4) \ge \bar{\varepsilon}$ for some $\bar{\varepsilon} > 0$ and hence also $\min_{i_1,i_2,i_3,i_4} \big(A_k(i_1,i_2)\big)^{-1}A_k(i_3,i_4) \ge \bar{\varepsilon}$. It is well known (and can be easily derived from Lemma 15) that these inequalities imply the existence of

$$\lim_{a\to-\infty} \|A_n A_{n-1} \cdots A_a x_a\|^{-1} A_n A_{n-1} \cdots A_a x_a$$

and this limit does not depend on the choice of the sequence $x_a \ge 0$, $\|x_a\| = 1$. Combining these two limiting procedures we obtain the proof of the second relation in (2.8).

Part b) of the theorem is proved exactly as part b) of Theorem 1 from [1]. The Markov chain property and the independence claimed in part c) are obvious corollaries of the independence of the triples $(P_n, Q_n, R_n)$. And, finally, the ergodicity of the sequence $(\omega_n, \zeta_n, y_n)$ is due to the fact that the sequence $\omega_n$ is ergodic and $(\zeta_n, y_n)$ is a function of $(\omega_k)_{k \le n-1}$.

Remarks. The proof of Theorem 1 in [1] was obtained under much less restrictive assumptions than those listed in Condition C of this work. In particular, the i.i.d. condition which we impose on our environments (rather than having them just stationary and ergodic) is unimportant for parts a) and b) of Theorem 4 as well as for Theorem 5. However, the i.i.d. property is important for the proof of our main results.

The top Lyapunov exponent of products of matrices $A_n$ and the recurrence criteria. The top Lyapunov exponent of products of matrices $A_n$ will be denoted by $\lambda$ and it is defined by

$$\lambda \stackrel{\mathrm{def}}{=} \lim_{n\to\infty} \frac{1}{n} \log \|A_n A_{n-1} \cdots A_1\|. \tag{2.12}$$

The existence of the limit in (2.12) with $\mathbb{P}$-probability 1 and the fact that $\lambda$ does not depend on $\omega$ is an immediate corollary of Kingman’s sub-additive ergodic theorem; it was first proved in [5]. The Furstenberg formula states that

$$\lambda = \int_{J_0 \times M} \log \|(I - R - Q\zeta)^{-1}Qy\|\, \mu(dg)\, \nu(d(\zeta, y)), \tag{2.13}$$

where $\nu(d(\zeta, y))$ is the invariant measure of the Markov chain (2.6) and $\mu(dg)$ is the distribution of the set of triples $(P, Q, R)$ supported by $J_0$ (defined in Sect. 1.3). We use the shorter notation $dg$ rather than $d(P, Q, R)$ because, as we have seen above, every triple $(P, Q, R) \in J_0$ defines a transformation $g$. Besides, this notation is consistent with the one used in Sect. 4.1.3. We remark that a proof of (2.12) and (2.13) will be given in Sect. 2.4 as a natural part of the proof of the invariance principle for the sequence of random variables $\log \|A_n A_{n-1} \cdots A_1\|$.

We finish this section by quoting the recurrence criteria proved in [1].


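Theorem 5. Suppose that Condition C is satisfied. Then

a) $\lambda \gtrless 0$ if and only if for $\mathbb{P}$-a.e. environment $\omega$ one has (respectively) $\lim_{t\to\infty} \xi(t) = \mp\infty$ $\mathrm{Pr}_\omega$-almost surely.

b) $\lambda = 0$ if and only if for $\mathbb{P}$-a.e. $\omega$ the RW $\xi(\cdot)$ is recurrent, that is $\limsup_{t\to\infty} \xi(t) = +\infty$ and $\liminf_{t\to\infty} \xi(t) = -\infty$ $\mathrm{Pr}_\omega$-almost surely.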
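As a sanity check of Theorem 5, consider the case $m = 1$, in which the strip is the line and the triples are scalars $(p_n, q_n, r_n)$ with $p_n + q_n + r_n = 1$ (the lower-case notation is introduced here only for this illustration). Assuming Condition C, the objects of Theorem 4 and the exponent (2.12) can be computed explicitly:

```latex
\begin{align*}
\zeta_n &= 1 \quad \text{(the only stochastic $1 \times 1$ matrix)},\\
A_n &= (1 - r_n - q_n \zeta_n)^{-1} q_n = \frac{q_n}{p_n},\\
\lambda &= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \log \frac{q_k}{p_k}
         = \mathbb{E} \log \frac{q_0}{p_0}.
\end{align*}
```

The last equality is the law of large numbers for the i.i.d. variables $\log(q_k/p_k)$, which are bounded under C2. Thus $\lambda = 0$ reduces to Solomon's criterion $\mathbb{E} \ln \frac{p_n}{q_n} = 0$ of Sect. 1.1, and $\lambda > 0$ corresponds to a walk drifting to $-\infty$, in agreement with part a).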

2.3. One algebraic corollary of Theorems 4 and 5. Theorems 4 and 5 combined with a simple probabilistic observation lead to an algebraic result which plays a very important role in the proof of our algebraic condition. Suppose that the matrices $(P_n, Q_n, R_n)$ do not depend on $n$: $(P_n, Q_n, R_n) \equiv (P, Q, R)$, and the triple $(P, Q, R)$ satisfies Condition C2. In this case relations (2.8) mean that $\zeta_n = \zeta$ and $y_n = y$, where $\zeta$ is a unique stochastic matrix and $y \ge 0$ a unique unit vector such that

$$\zeta = (I - R - Q\zeta)^{-1}P, \quad \text{and} \quad Ay = e^{\lambda}y, \tag{2.14}$$

where the matrix $A$ is defined by $A \stackrel{\mathrm{def}}{=} (I - R - Q\zeta)^{-1}Q$.
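For a constant environment the pair $(\zeta, \lambda)$ of (2.14) can be approximated numerically. The sketch below is one possible scheme, not a construction taken from the text: it iterates $\psi \mapsto (I - R - Q\psi)^{-1}P$ starting from the identity (convergence is what Theorem 4 asserts under C2), re-normalises rows to guard against rounding drift, and then obtains $e^{\lambda}$ as the Perron eigenvalue of $A$ by power iteration. The function names, the iteration counts and the numerical triples are assumptions made for illustration. The helper `drift` computes $\pi(P - Q)\mathbf{1}$ for the invariant vector $\pi$ of $P + Q + R$.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def solve(mat, rhs):
    """Solve mat * X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    aug = [[float(v) for v in mat[i]] + [float(v) for v in rhs[i]]
           for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def core(Q, R, psi):
    """The matrix I - R - Q psi."""
    m = len(Q)
    qpsi = matmul(Q, psi)
    return [[(1.0 if i == j else 0.0) - R[i][j] - qpsi[i][j]
             for j in range(m)] for i in range(m)]

def zeta_and_lambda(P, Q, R, iters=500):
    """Approximate the stochastic solution zeta of (2.14) and lambda."""
    m = len(P)
    zeta = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for _ in range(iters):
        zeta = solve(core(Q, R, zeta), P)
        zeta = [[v / sum(row) for v in row] for row in zeta]  # rounding guard
    A = solve(core(Q, R, zeta), Q)
    y, growth = [1.0] * m, 1.0
    for _ in range(iters):            # power iteration in the max norm
        z = [sum(A[i][j] * y[j] for j in range(m)) for i in range(m)]
        growth = max(z)
        y = [v / growth for v in z]
    return zeta, math.log(growth)

def drift(P, Q, R, iters=500):
    """pi (P - Q) 1, with pi the invariant probability vector of P + Q + R."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * (P[i][j] + Q[i][j] + R[i][j]) for i in range(m))
              for j in range(m)]
    return sum(pi[i] * (P[i][j] - Q[i][j]) for i in range(m) for j in range(m))

P = [[0.4, 0.1], [0.2, 0.3]]
Q = [[0.1, 0.2], [0.1, 0.1]]
R = [[0.1, 0.1], [0.2, 0.1]]
zeta, lam = zeta_and_lambda(P, Q, R)
print(lam, drift(P, Q, R))
```

For this triple every row of $P - Q$ has a positive sum, so the mean displacement is positive and the computed $\lambda$ should come out negative, in line with the correspondence between the sign of $\lambda$ and the direction of transience stated in Theorem 5.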

Theorem 5 now states that a random walk in a constant environment is recurrent if $\lambda = 0$, transient to the right if $\lambda < 0$, and transient to the left if $\lambda > 0$. But the fact that the random environment does not depend on $n$ allows one to analyse the recurrence and transience properties of the random walk in a way which is much more straightforward than the one offered by Theorems 4 and 5. Namely, suppose that $\xi(t-1) = (X(t-1), Y(t-1)) = (k, i)$. Then the conditional probability $\mathrm{Pr}\{Y(t) = j \mid \xi(t-1) = (k,i)\} = P(i,j) + Q(i,j) + R(i,j)$ does not depend on $X(t-1)$ and thus the second coordinate of this walk is a Markov chain with a state space $(1, \ldots, m)$ and a transition matrix $P + Q + R$. Hence, if $\pi = (\pi_1, \ldots, \pi_m)$ is a probability vector such that $\pi(P + Q + R) = \pi$ then $\pi_i$ is the frequency of visits by the RW to the sites $(\cdot, i)$ of the strip.

Consider next the displacement $\eta(t) \stackrel{\mathrm{def}}{=} X(t) - X(t-1)$ of the coordinate $X$ of the walk which occurs between times $t-1$ and $t$. The random variable $\eta(t)$ takes values $1$, $-1$, or $0$ and the conditional distribution of the pair $(\eta(t), Y(t))$ is given by $\mathrm{Pr}\{(\eta(t), Y(t)) = (1, j) \mid \xi(t-1) = (k,i)\} = P(i,j)$, $\mathrm{Pr}\{(\eta(t), Y(t)) = (-1, j) \mid \xi(t-1) = (k,i)\} = Q(i,j)$, and $\mathrm{Pr}\{(\eta(t), Y(t)) = (0, j) \mid \xi(t-1) = (k,i)\} = R(i,j)$. It is essential that this distribution depends only on $i$ (and not on $k$) and thus this pair forms a time-stationary Markov chain. Let us denote by $E_{(k,i)}$ the corresponding conditional expectation with conditioning on $(\eta(t-1), Y(t-1)) = (k,i)$, $-1 \le k \le 1$, $1 \le i \le m$. We then have

$$E_{(k,i)}(\eta(t)) = \sum_{j=1}^{m} P(i,j) - \sum_{j=1}^{m} Q(i,j),$$

and the expectation of the same random variable with respect to the stationary distribution is thus given by $\sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i,j) - Q(i,j))$. Applying the law of large


numbers for Markov chains to the sequence $\eta(t)$ we obtain that with $\mathrm{Pr}$-probability 1,

$$\lim_{t\to\infty} t^{-1}X(t) = \lim_{t\to\infty} t^{-1} \sum_{k=1}^{t} \eta(k) = \sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i,j) - Q(i,j)),$$

and this limit is independent of $\xi(0)$. Since this result is equivalent to the statements of Theorems 4 and 5, we obtain the following

Lemma 5. Suppose that $(P, Q, R)$ satisfies Condition C2. Then $(\zeta, x) \in M$ satisfies Eq. (2.14) with $\lambda = 0$ if and only if

$$\sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i,j) - Q(i,j)) = 0. \tag{2.15}$$

Moreover $\lambda > 0$ if and only if $\sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i,j) - Q(i,j)) < 0$ (and thus $\lambda < 0$ if and only if $\sum_{i=1}^{m} \pi_i \sum_{j=1}^{m} (P(i,j) - Q(i,j)) > 0$).

2.4. The CLT and the invariance principle for $S_n$’s. The main goal of this section is to prove an invariance principle (IP) (and a CLT) for the sequence

$$S_n \stackrel{\mathrm{def}}{=} \log \|B_n \cdots B_1 x_1\| - n\lambda, \tag{2.16}$$

where matrices $B_n$ are defined by (2.7) and $\lambda$ is given by (2.13). Obviously, $S_n$ depends on $(\psi_1, x_1) \in M$. We shall prove that in fact the IP (and the CLT) are satisfied uniformly in $(\psi_1, x_1) \in M$. Moreover, exactly one of the two things takes place if the random walk is recurrent: either the asymptotic behaviour of $S_n$ is described by a non-degenerate Wiener process, or the support of the distribution of matrices $(P, Q, R)$ belongs to an algebraic manifold defined by Eq. (1.8).

To make these statements precise we first recall one of the definitions of the invariance principle associated with a general random sequence $S_n = \sum_{k=1}^{n} f_k$, with the convention $S_0 = 0$. Let $\{C[0,1], \mathcal{B}, P_W\}$ be the probability space where $C[0,1]$ is the space of continuous functions with the sup norm topology, $\mathcal{B}$ being the Borel $\sigma$-algebra generated by open sets in $C[0,1]$, and $P_W$ the Wiener measure. Define for $t \in [0,1]$ a sequence of random functions $v_n(t)$ associated with the sequence $S_n$. Namely, put

$$v_n(t) \stackrel{\mathrm{def}}{=} n^{-\frac{1}{2}}\big(S_k + f_{k+1}(tn - k)\big) \quad \text{if } k \le tn \le k+1,\ k = 0, 1, \ldots, n-1. \tag{2.17}$$

For a $\sigma > 0$ let $\{P_n^{\sigma}\}$ be the sequence of probability measures on $\{C[0,1], \mathcal{B}\}$ determined by the distribution of $\{\sigma^{-1}v_n(t),\ 0 \le t \le 1\}$.

Definition. A random sequence $S_n$ satisfies the invariance principle with parameter $\sigma > 0$ if $P_n^{\sigma} \to P_W$ weakly as $n \to \infty$. If the sequence $S_n$ depends on (another) parameter, e.g. $z_1$, then we say that $S_n$ satisfies the invariance principle with parameter $\sigma > 0$ uniformly in $z_1$ if for any continuous functional $\mathfrak{f} : C[0,1] \to \mathbb{R}$ one has: $E_n^{\sigma}(\mathfrak{f}) \to E_W(\mathfrak{f})$ uniformly in $z_1$ as $n \to \infty$. Here $E_n^{\sigma}$ and $E_W$ are expectations with respect to the relevant probabilities.
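The polygonal interpolation (2.17) is easy to misread because of the shifted index in $f_{k+1}$. A minimal sketch, with a function name of our own choosing and 0-based storage of the increments (so that `f[k]` holds $f_{k+1}$), may help to fix the convention:

```python
import math

def polygonal_path(f, t):
    """Value v_n(t) of (2.17) for increments f = [f_1, ..., f_n], 0 <= t <= 1."""
    n = len(f)
    k = min(int(math.floor(t * n)), n - 1)   # ensures k <= t*n <= k + 1
    s_k = sum(f[:k])                         # S_k, with S_0 = 0
    # Linear interpolation between S_k and S_{k+1} = S_k + f_{k+1}.
    return (s_k + f[k] * (t * n - k)) / math.sqrt(n)

increments = [1, -1, 1, 1]                   # illustrative data, n = 4
values = [polygonal_path(increments, t) for t in (0.0, 0.125, 0.25, 1.0)]
```

At the grid points $t = k/n$ the function returns $n^{-1/2}S_k$, and between them it is linear, so $v_n$ is a random element of $C[0,1]$ as required by the definition above.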


Let us state the invariance principle for the sequence $S_n$ given by (2.16). Note that in this case

$$S_n = \sum_{k=1}^{n} (\log \|B_k x_k\| - \lambda), \quad \text{where } x_k = \|B_{k-1}x_{k-1}\|^{-1}B_{k-1}x_{k-1},\ k \ge 2. \tag{2.18}$$

Put $z_n = (\psi_n, x_n)$ and $f_n = f(g_n, z_n)$, where the function $f$ is defined on the set of pairs $(g, z) \equiv ((P, Q, R), (\psi, x))$ by

$$f(g, z) \stackrel{\mathrm{def}}{=} \log \|(I - R - Q\psi)^{-1}Qx\| - \lambda. \tag{2.19}$$

Obviously in these notations $S_n = \sum_{k=1}^{n} f_k$. Denote by $\mathfrak{A}$ the Markov operator associated with the Markov chain $z_{n+1} = g_n.z_n$ defined by (2.6): if $F$ is a function defined on the state space $J_0 \times M$ of this chain then

$$(\mathfrak{A}F)(g, z) \stackrel{\mathrm{def}}{=} \int_{J_0} F(g', g.z)\,\mu(dg').$$

Using these notations we write $\nu(dz)$ (rather than $\nu(d(\psi, x))$) for the invariant measure of the chain $z_n$ and we denote by $M_0 \subset M$ the support of $\nu(dz)$.

Theorem 6. Suppose that Condition C is satisfied and the function $f$ is defined by (2.19). Then:

(i) The equation

$$F(g, z) - (\mathfrak{A}F)(g, z) = f(g, z) \tag{2.20}$$

has a unique solution $F(g, z)$ which is continuous on $J_0 \times M_0$ and

$$\int_{J_0 \times M} F(g, z)\,\mu(dg)\,\nu(dz) = 0.$$

Denote by

$$\sigma^2 = \int_{J_0 \times M_0} (\mathfrak{A}F^2 - (\mathfrak{A}F)^2)(g, y)\,\mu(dg)\,\nu(dy).$$

(ii) If $\sigma > 0$ then $\frac{S_n}{\sigma\sqrt{n}}$ converges in law towards the standard Gaussian distribution $N(0,1)$ and the sequence $S_n$ satisfies the invariance principle with parameter $\sigma$ uniformly in $(\psi_1, x_1) \in M$.

(iii) If $\sigma = 0$, then the function $F(g, y)$ depends only on $y$ and for every $(g, y) \in J_0 \times M_0$ one has

$$f(g, y) = F(y) - F(g.y). \tag{2.21}$$

(iv) If $\sigma = 0$ and $\lambda = 0$ then

$$J_0 \subset J_{al}, \tag{2.22}$$

with $J_{al}$ given by (1.8).

Proof. Statements (i), (ii), and (iii) of our theorem follow from Theorem 12. In order to be able to apply Theorem 12 we have to show that the sequence of random transformations $g_n$ has the so-called contraction property. Lemma 6 establishes this property. Relation (2.22) is then derived from (2.21) and one more general property of Markov chains generated by products of contracting transformations (Lemma 8).

Lingering Random Walks in Random Environment on a Strip


Lemma 6. Suppose that Condition C is satisfied and let
$$(\psi_{n+1}, x_{n+1}) = g_n.(\psi_n, x_n), \qquad (\psi'_{n+1}, x'_{n+1}) = g_n.(\psi'_n, x'_n), \qquad n \ge 1,$$
be two sequences from $M$. Then there is a $c$, $0 \le c < 1$, such that for any $(\psi_1, x_1), (\psi'_1, x'_1) \in M$,
$$\rho\big((\psi_n, x_n), (\psi'_n, x'_n)\big) \le \mathrm{const}\; c^n, \tag{2.23}$$
where $\rho(\cdot, \cdot)$ is defined by (2.3).

Proof of Lemma 6. We shall first prove that there is a $c_0 < 1$ such that $\|\psi_n - \psi'_n\| \le \mathrm{const}\; c_0^n$. The control of the $x$-component will then follow from this result. Let us introduce a sequence of $m \times m$ matrices $\phi_n$, $n \ge 1$, which we define recursively: $\phi_1 = 0$ and
$$\phi_{n+1} = (I - R_n - Q_n\phi_n)^{-1}P_n, \quad \text{if } n \ge 1. \tag{2.24}$$
Remarks. The matrices $\phi_n$ and $\psi_n$ were defined in a purely analytic way. Their probabilistic meaning is well known (see [1]) and shall also be discussed in Sect. 3.

Put $\Delta_k \stackrel{\mathrm{def}}{=} \psi_k - \phi_k$. To control the $\psi$-part of the sequence $(\psi_n, x_n)$ we need the following

Lemma 7. Suppose that Condition C is satisfied. Then there is a $c_0$, $0 \le c_0 < 1$, such that for any stochastic matrix $\psi_1 \in \Psi$ the matrix elements of the corresponding $\Delta_{n+1}$ are of the following form:
$$\Delta_{n+1}(i, j) = \alpha_n(i)c_n(j) + \tilde{\epsilon}_n(i, j). \tag{2.25}$$
Here $\alpha_n(i)$ and $c_n(j)$ depend only on the sequence $(P_j, Q_j, R_j)$, $1 \le j \le n$; the matrix $\tilde{\epsilon}_n = (\tilde{\epsilon}_n(i, j))$ is a function of $\psi_1$ and of the sequence $(P_j, Q_j, R_j)$, $1 \le j \le n$, satisfying $\|\tilde{\epsilon}_n\| \le C_1c_0^n$ for some constant $C_1$.

Corollary. If Condition C holds then
$$\|\psi_{n+1} - \psi'_{n+1}\| \le 2C_1c_0^n. \tag{2.26}$$

Proof of Corollary. Consider a sequence $\psi'_n$ which differs from $\psi_n$ in that the starting value for recursion (2.6) is $\psi'_1$. Put $\Delta'_k \stackrel{\mathrm{def}}{=} \psi'_k - \phi_k$. Applying the result of Lemma 7 to $\Delta'_{n+1}$ we obtain:
$$\Delta'_{n+1}(i, j) = \alpha_n(i)c_n(j) + \tilde{\epsilon}'_n(i, j). \tag{2.27}$$
It follows from (2.25), (2.27), and the definition of $\Delta_{n+1}$ and $\Delta'_{n+1}$ that $\|\psi_{n+1} - \psi'_{n+1}\| = \|\Delta_{n+1} - \Delta'_{n+1}\| \le \|\tilde{\epsilon}_n\| + \|\tilde{\epsilon}'_n\| \le 2C_1c_0^n$.
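The exponential forgetting of the starting matrix stated in (2.26) can be observed numerically. The following is only an illustrative sketch under assumptions that are not in the paper: the strip has width $m = 2$, the triples $(P_n, Q_n, R_n)$ are drawn by a hypothetical sampler `random_triple` with strictly positive entries, and the recursion is taken in the form $\psi_{n+1} = (I - R_n - Q_n\psi_n)^{-1}P_n$ used in (2.28). Two different stochastic starting matrices are driven by the same environment and the entrywise gap is recorded.

```python
import random

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def random_triple(rng):
    """Hypothetical sampler: positive 2x2 matrices (P, Q, R) with (P + Q + R) 1 = 1."""
    rows = []
    for _ in range(2):
        w = [rng.uniform(0.2, 1.0) for _ in range(6)]
        s = sum(w)
        rows.append([v / s for v in w])
    return [r[0:2] for r in rows], [r[2:4] for r in rows], [r[4:6] for r in rows]

def step(psi, triple):
    """One step of the recursion psi_{n+1} = (I - R_n - Q_n psi_n)^{-1} P_n."""
    P, Q, R = triple
    Qpsi = mul(Q, psi)
    M = [[(1.0 if i == j else 0.0) - R[i][j] - Qpsi[i][j] for j in range(2)] for i in range(2)]
    return mul(inv2(M), P)

rng = random.Random(0)
psi = [[1.0, 0.0], [0.0, 1.0]]      # one stochastic starting matrix
psi_alt = [[0.0, 1.0], [1.0, 0.0]]  # a different stochastic starting matrix
gaps = []
for n in range(60):
    triple = random_triple(rng)     # the same environment drives both sequences
    psi, psi_alt = step(psi, triple), step(psi_alt, triple)
    gaps.append(max(abs(psi[i][j] - psi_alt[i][j]) for i in range(2) for j in range(2)))
```

The list `gaps` holds the maximal entrywise difference after each step; in this toy environment it shrinks geometrically, in line with the bound $2C_1c_0^n$, while each iterate remains a stochastic matrix.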

Proof of Lemma 7. The main idea of this proof is the same as that of the proof of Theorem 1 from [1]. A very minor difference is that here we have to control the behaviour of ψn when n is growing while ψ1 is fixed; in [1] n was fixed while the starting point of the chain was tending to − ∞. A more important difference is that here we state the exponential speed of convergence of certain sequences and present the corresponding quantities in a relatively explicit way while in [1] the speed of convergence was not very essential (even though the exponential character of convergence had been clear already then).


To start, note that it follows from (2.6) and (2.24) that
$$\Delta_{n+1} = \big((I - R_n - Q_n\psi_n)^{-1} - (I - R_n - Q_n\phi_n)^{-1}\big)P_n = (I - R_n - Q_n\psi_n)^{-1}Q_n\Delta_n(I - R_n - Q_n\phi_n)^{-1}P_n = B_n\Delta_n\phi_{n+1}. \tag{2.28}$$
Iterating (2.28), we obtain
$$\Delta_{n+1} = B_n \cdots B_1\Delta_1\phi_2 \cdots \phi_{n+1} \equiv B_n \cdots B_1\psi_1\phi_2 \cdots \phi_{n+1}. \tag{2.29}$$

It follows from Lemma 4 that $\phi_n\mathbf{1} \le \mathbf{1}$. The matrix elements of the matrices $\phi_n$, $n \ge 2$, are strictly positive and, moreover, according to estimates (2.2) we have $\phi_n(i, j) \ge \varepsilon$ (and hence also $\phi_n(i, j) \le 1 - (m - 1)\varepsilon$). We are in a position to apply to the product of the matrices $\phi_n$ the presentation derived in Lemma 15 (with the $a_n$'s replaced by $\phi_n$'s). By the first formula in (4.16), we have
$$\phi_2 \cdots \phi_{n+1} = D_n\big[(c_n(1)\mathbf{1}, \ldots, c_n(m)\mathbf{1}) + \varphi_n\big],$$
where $D_n$ is a diagonal matrix, $c_n(j) \ge \delta$ with $\sum_{j=1}^{m} c_n(j) = 1$, and $\|\varphi_n\| \le (1 - m\delta)^{n-1}$ with $\delta > 0$ (and of course $m\delta < 1$). One can easily see that $\delta \ge m^{-1}\varepsilon^2$ (this follows from (4.15) and the above estimates for $\phi_n(i, j)$). We note also that the estimate for $c_n(j)$ follows from (4.17) and (4.18).

Put $c_0 \stackrel{\mathrm{def}}{=} 1 - m\delta$ and let $\mathcal{B}_n = B_n \cdots B_1\Delta_1D_n$. We then have $\Delta_{n+1} = \mathcal{B}_n\big[(c_n(1)\mathbf{1}, \ldots, c_n(m)\mathbf{1}) + \varphi_n\big]$, and thus
$$\Delta_{n+1}(i, j) = c_n(j)\sum_{k=1}^{m} \mathcal{B}_n(i, k)\Big(1 + \frac{\varphi_n(k, j)}{c_n(j)}\Big). \tag{2.30}$$
But all $\mathcal{B}_n(i, k) > 0$ and $\max_{k,j} |\varphi_n(k, j)|\,c_n^{-1}(j) \le \mathrm{const}\; c_0^n$. Hence
$$\frac{\Delta_{n+1}(i, l)}{\Delta_{n+1}(i, j)} = \frac{c_n(l)}{c_n(j)} + \epsilon_n(i, j, l), \tag{2.31}$$
where $|\epsilon_n(i, j, l)| < Cc_0^n$ with $C$ being some constant. It follows from (2.31) that
$$(\Delta_{n+1}(i, j))^{-1}\sum_{l=1}^{m} \Delta_{n+1}(i, l) = \frac{1}{c_n(j)} + \epsilon_n(i, j).$$
On the other hand remember that
$$\sum_{l=1}^{m} \Delta_{n+1}(i, l) = \sum_{l=1}^{m} \psi_{n+1}(i, l) - \sum_{l=1}^{m} \phi_{n+1}(i, l) = 1 - \sum_{l=1}^{m} \phi_{n+1}(i, l) \stackrel{\mathrm{def}}{=} \alpha_n(i).$$
Comparing these two expressions we obtain that
$$\Delta_{n+1}(i, j) = \alpha_n(i)c_n(j) + \tilde{\epsilon}_n(i, j), \tag{2.32}$$
where $|\tilde{\epsilon}_n(i, j)| \le C_1c_0^n$. Lemma 7 is proved.


We now turn to the difference $\|x_{n+1} - x'_{n+1}\|$. Let us denote by $b_n$ the transformation of the set $X$ of unit non-negative vectors defined by
$$b_n(x) = \|B_nx\|^{-1}B_nx, \quad \text{where } B_n = (I - R_n - Q_n\psi_n)^{-1}Q_n, \tag{2.33}$$
and $\psi_n$ are the same as above. The sequence $b'_n$ is defined in a similar way with the only difference that $\psi_n$ is replaced by $\psi'_n$. Inequality (2.26) implies that for some $C_2$,
$$\bar{\rho}(b_n, b'_n) \stackrel{\mathrm{def}}{=} \sup_{x \in X} \|b_n(x) - b'_n(x)\| \le C_2c_0^n.$$
A very general and simple Lemma 16 from the Appendix now implies that
$$\|x_{n+1} - x'_{n+1}\| \le C(\epsilon)(c_0 + \epsilon)^n(1 + \|x_1 - x'_1\|)$$

and this proves Lemma 6.

We can now easily prove the existence of the limit in (2.12) as well as Furstenberg's formula (2.13) for $\lambda$. To this end note that
$$\bar{S}_n(\zeta_1, \mathbf{1}) \stackrel{\mathrm{def}}{=} \log\|A_n \cdots A_1\| = \log\|A_n \cdots A_1\mathbf{1}\| = \sum_{k=1}^{n} f(g_k, z_k), \tag{2.34}$$
where the notation is chosen so as to emphasize the dependence of the sum $\bar{S}_n(\zeta_1, \mathbf{1})$ on the initial values $x_1 = \mathbf{1}$ and $\psi_1 = \zeta_1$ of the Markov chain. (Note the difference between $\bar{S}_n(\zeta_1, \mathbf{1})$ and the sum $S_n$ in (2.16).) Lemma 6 implies that
$$|\bar{S}_n(\zeta_1, \mathbf{1}) - \bar{S}_n(\psi_1, x_1)| \le C_3, \tag{2.35}$$
where the constant $C_3$ depends only on the parameter $\varepsilon$ from Condition C. But then, according to the law of large numbers applied to the Markov chain $(\omega_n, \zeta_n, y_n) \equiv (g_n, \zeta_n, y_n)$ defined in Theorem 4, the following limit exists with probability 1:
$$\lim_{n \to \infty} \frac{1}{n}\log\|A_n \cdots A_1\| = \lim_{n \to \infty} \frac{1}{n}\bar{S}_n(\zeta_1, y_1) = \lambda,$$

where λ is given by (2.13). Formula (2.13) implies that the mean value of the function f (g, z) defined by (2.19) is 0. Also, it is obvious that this function is Lipschitz on J0 × M in all variables. Hence, Theorem 12 applies to the sequence Sn and statements (i), (ii), and (iii) of Theorem 6 are thus proved. The case σ = 0 and λ = 0. Derivation of the algebraic condition for (P, Q, R). We start with a statement which is a corollary of a very general property proved in Lemma 13 from the Appendix. Lemma 8. Suppose that Condition C is satisfied and let g ∈ J0 , z g ∈ M be such that g.z g = z g . Then z g ∈ M0 ≡ suppν. Proof. According to Lemma 6, Condition C implies that every g ∈ J0 is contracting. Hence, by Lemma 13, z g ∈ M0 .


Derivation of the algebraic condition. According to Theorem 12 (see formula (4.10)), the equality $\sigma = 0$ implies that $f(g, z) = F(z) - F(g.z)$. Hence, if $z$ can be chosen to be equal to $z_g$, then it follows that $f(g, z_g) = 0$. In the context of the present theorem the function $f$ is given by $f(g, z) = \log\|(I - R - Q\psi)^{-1}Qx\|$ (recall that $\lambda = 0$), where $g = (P, Q, R) \in J_0$ and $z = (\psi, x) \in M_0 \subset \Psi \times X$. The equation $g.z_g = z_g$ is equivalent to saying that $z_g = (\psi, x)$ satisfies $(I - R - Q\psi)^{-1}P = \psi$ and $\|(I - R - Q\psi)^{-1}Qx\|^{-1}(I - R - Q\psi)^{-1}Qx = x$. The equation $f(g, z_g) = 0$ now reads $\log\|(I - R - Q\psi)^{-1}Qx\| = 0$ or, equivalently, $\|(I - R - Q\psi)^{-1}Qx\| = 1$. Hence the conditions $\sigma = 0$ and $\lambda = 0$ imply that all pairs $(g, z_g) \in J_0 \times M_0$ satisfy $(I - R - Q\psi)^{-1}P = \psi$ and $(I - R - Q\psi)^{-1}Qx = x$. But, by Lemma 5, this implies that $J_0 \subset J_{al}$, where $J_{al}$ is defined by (1.8).

3. Proof of Theorem 1

As we are in the recurrent situation, the Lyapunov exponent $\lambda = 0$. Throughout this section we denote by $C$ a generic positive constant which depends on nothing but $\varepsilon$ and $m$ and which may vary from place to place. If $f, g > 0$ are two functions, depending on $n \in \mathbb{Z}$, $i \in \{1, \ldots, m\}$, and maybe on other parameters, we write $f \asymp g$ if there exists a $C > 1$ such that $C^{-1}f \le g \le Cf$.

Potential and its properties. As before, $S_n$ is defined by (2.16). We put
$$\Phi_n(\omega) \equiv \Phi_n \stackrel{\mathrm{def}}{=} \begin{cases} \log\|A_n \cdots A_1\| & \text{if } n \ge 1, \\ 0 & \text{if } n = 0, \\ -\log\|A_0 \cdots A_{n+1}\| & \text{if } n \le -1, \end{cases} \tag{3.1}$$
where the matrices $A_n$ are defined in (2.10). If $n \ge 1$, then obviously $\Phi_n \equiv \bar{S}_n(\zeta_1, \mathbf{1})$ defined in (2.34). The random function $\Phi_n$ is the analog of the potential considered first in [18]. For $n \ge a$, $a \in \mathbb{Z}$, put
$$S_{a,n}(\omega; \psi_a, x_a) \equiv S_{a,n}(\omega) \stackrel{\mathrm{def}}{=} \log\|B_n \cdots B_ax_a\|, \tag{3.2}$$
where the matrices $B_n$ are defined by (2.7). Similarly to (2.35), one has that
$$|S_{a,n}(\omega; \zeta_a, \mathbf{1}) - S_{a,n}(\omega; \psi_a, x_a)| \le C, \tag{3.3}$$
which implies:
$$|S_{a,n}(\omega) - (\Phi_n(\omega) - \Phi_a(\omega))| \le C. \tag{3.4}$$
Since one of the conditions of Theorem 1 is $J_0 \not\subset J_{al}$, it follows from Theorem 6, part (iv) that $\Phi_n$ satisfies the invariance principle with a strictly positive parameter $\sigma$: $\sigma > 0$. The importance of the potential $\{\Phi_n\}_{n \in \mathbb{Z}}$ is due to the fact that it governs the stationary measure of our Markov chain; in fact it defines this stationary measure up to


a multiplication by a bounded function (see (3.7)). Namely, if $a < b$, we consider the Markov chain $\{\xi^{a,b}_t\}_{t \in \mathbb{N}}$ on
$$S_{a,b} \stackrel{\mathrm{def}}{=} \{a, \ldots, b\} \times \{1, \ldots, m\} \tag{3.5}$$
with transition probabilities (1.4) and reflecting boundary conditions at $L_a$ and $L_b$. This means that we replace $(P_a, Q_a, R_a)$ by $(I, 0, 0)$ and $(P_b, Q_b, R_b)$ by $(0, I, 0)$. This reflecting chain has a unique stationary probability measure which we denote by $\pi_{a,b} = (\pi_{a,b}(k, i))_{(k,i) \in S_{a,b}}$. A description of this measure was given in [1]. We repeat it here for the convenience of the reader. To this end introduce the row vectors $\nu_k \stackrel{\mathrm{def}}{=} Z(\pi_{a,b}(k, i))_{1 \le i \le m}$, $a \le k \le b$, where $Z$ is a (normalizing) factor. In terms of these vectors the invariant measure equation reads
$$\nu_k = \nu_{k-1}P_{k-1} + \nu_kR_k + \nu_{k+1}Q_{k+1} \ \text{ if } a < k < b, \qquad \nu_a = \nu_{a+1}Q_{a+1}, \qquad \nu_b = \nu_{b-1}P_{b-1}. \tag{3.6}$$
To solve Eq. (3.6), define for $a \le k < b$ matrices $\alpha_k$ by
$$\alpha_a \stackrel{\mathrm{def}}{=} Q_{a+1}, \quad \text{and} \quad \alpha_k \stackrel{\mathrm{def}}{=} Q_{k+1}(I - R_k - Q_k\psi_k)^{-1} \ \text{ when } a < k < b,$$
where $\{\psi_k\}_{k \ge a+1}$ are given by (2.6) with the initial condition $\psi_{a+1} = I$ (we take into account that $R_a = Q_a = 0$ in our case). We shall now check that $\nu_k$ can be found recursively as follows: $\nu_k = \nu_{k+1}\alpha_k$, $a \le k < b$, where $\nu_b$ satisfies $\nu_b\psi_b = \nu_b$. Indeed, the boundary condition at $b$ in (3.6) reduces to $\nu_b = \nu_b\alpha_{b-1}P_{b-1} = \nu_b\psi_b$, where we use the fact that $\alpha_{b-1}P_{b-1} = \psi_b$ because $Q_b = I$ (and also due to (2.6)). But $\psi_b$ is an irreducible stochastic matrix and therefore $\nu_b > 0$ exists and is uniquely defined up to a multiplication by a constant. We now have for $a < k < b$ that
$$\nu_{k-1}P_{k-1} + \nu_kR_k + \nu_{k+1}Q_{k+1} = \nu_{k+1}(\alpha_k\alpha_{k-1}P_{k-1} + \alpha_kR_k + Q_{k+1}) = \nu_{k+1}\alpha_k\big(Q_k\psi_k + R_k + (I - R_k - Q_k\psi_k)\big) = \nu_{k+1}\alpha_k = \nu_k.$$
Finally $\nu_a = \nu_{a+1}Q_{a+1}$ with $\alpha_a = Q_{a+1}$ and this finishes the proof of our statement. We now have that
$$\pi_{a,b}(k, \cdot) = \pi_{a,b}(b, \cdot)\,\alpha_{b-1}\alpha_{b-2} \cdots \alpha_k,$$
where as before $\pi_{a,b}(k, \cdot)$ is a row vector. Note next that
$$\alpha_{b-1}\alpha_{b-2} \cdots \alpha_k = B_{b-1} \cdots B_{k+1}(I - R_k - Q_k\psi_k)^{-1}.$$
From this, we get $\|\pi_{a,b}(k, \cdot)\| \asymp \|B_{b-1} \cdots B_{k+1}\|\,\|\pi_{a,b}(b, \cdot)\|$, and using (3.2), (3.4), which give $\|B_{b-1} \cdots B_{k+1}\| \asymp \exp[\Phi_b - \Phi_k]$, we obtain for $a \le k, l \le b$,
$$\frac{\|\pi_{a,b}(k, \cdot)\|}{\|\pi_{a,b}(l, \cdot)\|} \asymp \exp[-(\Phi_k - \Phi_l)]. \tag{3.7}$$


We also consider the “mirror situation” by defining for n ≤ a the matrices ψn− in a similar way as in (2.6) by setting −1 − ψn−1 = I − Rn − Pn ψn− Q n , n ≤ a, def

and a boundary condition ψa− . Then, as in Theorem 4 a), one has that ζn− = lima→∞ ψn− exists almost surely, and does not depend on the boundary condition ψa− . We then put def − −1 A− Pn , n = I − Rn − Pn ζn and the potential − n as (3.1): ⎧ − ⎨ log ||A− 0 . . . An−1 || def − n = 0 ⎩ − − log ||A− n . . . A−1 ||

if n ≥ 1 if n = 0 . if n ≤ −1

We could as well have worked with this potential, and therefore we obtain πa,b (k, ·) − exp − k − l . πa,b (l, ·) As 0 = − 0 = 0, we get

n − − ≤ C n

(3.8)

uniformly in $n$. It is convenient to slightly reformulate the invariance principle for the potential. For that consider $C_0(-\infty, \infty)$, the space of continuous functions $f : (-\infty, \infty) \to \mathbb{R}$ satisfying $f(0) = 0$. We equip $C_0(-\infty, \infty)$ with a metric for uniform convergence on compacta, e.g.
$$d(f, g) \stackrel{\mathrm{def}}{=} \sum_{k=1}^{\infty} 2^{-k}\min\Big[1, \sup_{x \in [-k,k]} |f(x) - g(x)|\Big], \tag{3.9}$$

and write $\mathcal{B}$ for the Borel $\sigma$-field, which is also the $\sigma$-field generated by the evaluation mappings $C_0(-\infty, \infty) \to \mathbb{R}$. We also write $P_W$ for the law of the double-sided Wiener measure on $C_0(-\infty, \infty)$. For $n \in \mathbb{N}$, we define
$$W_n\Big(\frac{k\sigma^2}{n}\Big) \stackrel{\mathrm{def}}{=} \frac{\Phi_k}{\sqrt{n}}, \quad k \in \mathbb{Z},$$
and define $W_n(t)$, $t \in \mathbb{R}$, by linear interpolation. $W_n$ is a random variable taking values in $C_0(-\infty, \infty)$. Weak convergence of $\{W_n(t)\}_{t \in \mathbb{R}}$ on $C_0(-\infty, \infty)$ is the same as weak convergence of $\{W_n(t)\}_{t \in [-N,N]}$ for any $N \in \mathbb{N}$, and therefore, we immediately get

Proposition 7. $W_n$ converges in law to $P_W$.

Let V be the subset of functions f ∈ C0 (−∞, ∞) for which there exist real numbers a f (x) > f (b), ∀x ∈ (a, b), f (c) > f (x) > f (b), ∀x ∈ (b, c). 4. For any γ > 0,
$$\sup_{x \in (a-\gamma, a)} f(x) > f(a), \qquad \sup_{x \in (c, c+\gamma)} f(x) > f(c).$$

It is clear that for $f \in V$, $a, b, c$ are uniquely defined by $f$, and we write occasionally $a(f), b(f), c(f)$. $f(b)$ is the unique minimum of $f$ in $[a, c]$. It is easy to prove that $V \in \mathcal{B}$, and $P_W(V) = 1$. If $\delta > 0$ and $f \in V$, we define
$$c_\delta(f) \stackrel{\mathrm{def}}{=} \inf\{x > c : f(x) = f(c) + \delta\}, \qquad a_\delta(f) \stackrel{\mathrm{def}}{=} \sup\{x < a : f(x) = f(a) + \delta\}.$$
If $\gamma > 0$, we set $V_{\delta,\gamma}$ to be the set of functions $f \in V$ such that
1. $$c_\delta(f) \le 1/\delta, \qquad a_\delta(f) \ge -1/\delta. \tag{3.10}$$
2. $$\sup_{b \le x < y \le c_\delta} [f(x) - f(y)] \le 1 - \delta, \tag{3.11}$$
$$\sup_{a_\delta \le y < x \le b} [f(x) - f(y)] \le 1 - \delta. \tag{3.12}$$
3. $$\inf_{x \in [a_\delta, c_\delta] \setminus (b-\gamma, b+\gamma)} f(x) \ge f(b) + \delta. \tag{3.13}$$


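It is evident that for any $\gamma > 0$, we have $V_{\delta,\gamma} \uparrow V$ for $\delta \downarrow 0$, and therefore, for any $\gamma, \eta > 0$ we can find $\delta_0(\gamma, \eta)$ such that for $\delta \le \delta_0$, $P_W(V_{\delta,\gamma}) \ge 1 - \eta$. It is easy to see that $P_W(\partial V_{\delta,\gamma}) = 0$, where $\partial$ refers to the boundary in $C_0(-\infty, \infty)$. Therefore, given $\gamma, \eta > 0$, we can find $N_0(\gamma, \eta)$ such that for $n \ge N_0$, $\delta \le \delta_0$, we have
$$P(W_n \in V_{\delta,\gamma}) \ge 1 - 2\eta. \tag{3.14}$$
For $t \in \mathbb{N}$, we set $n = n(t) \stackrel{\mathrm{def}}{=} [\log^2 t]$. If $W_{n(t)} \in V_{\delta,\gamma}$, then we put
$$b_t \stackrel{\mathrm{def}}{=} \frac{b(W_{n(t)})\log^2 t}{\sigma^2}, \qquad a_t \stackrel{\mathrm{def}}{=} \frac{a_\delta(W_{n(t)})\log^2 t}{\sigma^2}, \qquad c_t \stackrel{\mathrm{def}}{=} \frac{c_\delta(W_{n(t)})\log^2 t}{\sigma^2}.$$
Note that on $\{W_{n(t)} \in V_{\delta,\gamma}\}$, we have the following properties, translated from (3.10)-(3.13):
$$c_t \le \frac{\log^2 t}{\sigma^2\delta}, \qquad a_t \ge -\frac{\log^2 t}{\sigma^2\delta}, \tag{3.15}$$
$$\Phi_s - \Phi_{s'} \le (1 - \delta)\log t, \quad b_t \le s < s' \le c_t, \tag{3.16}$$
$$\Phi_s - \Phi_{s'} \le (1 - \delta)\log t, \quad a_t \le s' < s \le b_t, \tag{3.17}$$
$$\Phi_s \ge \Phi_{b_t} + \delta\log t, \quad s \in [a_t, c_t] \setminus (b_t - \gamma\log^2 t,\ b_t + \gamma\log^2 t), \tag{3.18}$$
$$\min(\Phi_{a_t}, \Phi_{c_t}) - \Phi_{b_t} \ge (1 + \delta)\log t. \tag{3.19}$$
Furthermore, if $0 \in [a_t, b_t]$, then
$$\sup_{0 \le s \le b_t} \Phi_s - \Phi_{b_t} \le \log t, \tag{3.20}$$
and similarly if $0 \in [b_t, c_t]$. (We neglect the trivial issue that $a_t, b_t, c_t$ may not be in $\mathbb{Z}$.) The main result is

Proposition 8. For $\omega \in \{W_{n(t)} \in V_{\delta,\gamma}\}$, we have for any $i \in \{1, \ldots, m\}$,
$$\Pr_{\omega,(0,i)}\big(X(t) \notin (b_t - \gamma\log^2 t,\ b_t + \gamma\log^2 t)\big) \le 4t^{-\delta/2},$$
if $t$ is large enough.

Together with (3.14), this proves our main result, Theorem 1. In all that follows, we keep $\gamma, \delta$ fixed, and assume that $\omega \in \{W_{n(t)} \in V_{\delta,\gamma}\}$. We will also suppress $\omega$ in the notation, and will take $t$ large enough, according to ensuing necessities. We first prove several estimates of probabilities characterizing the behaviour of a RW in a finite box in terms of the properties of the function $S_n$.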


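Lemma 9. Consider a random walk on $S_{a,b}$ with reflecting boundary conditions (see the discussion around (3.5)), and let $a < k < b$. Then
$$\Pr_{(k,i)}(\tau_a < \tau_b) \le C\sum_{y=k}^{b} \exp(\Phi_y - \Phi_a), \tag{3.21}$$
$$\Pr_{(k,i)}(\tau_b < \tau_a) \le C\sum_{y=a}^{k} \exp(\Phi_y - \Phi_b). \tag{3.22}$$
Here $\tau_a, \tau_b$ are the hitting times of the layers $L_a, L_b$.

Proof. We only have to prove (3.21). Equation (3.22) then follows in the mirrored situation and using (3.8).

Put $h_k(i) \stackrel{\mathrm{def}}{=} \Pr_{(k,i)}(\tau_b < \tau_a)$ and consider the column vectors $\mathbf{h}_k = (h_k(i))_{1 \le i \le m}$. In order to find $\mathbf{h}_k$ we introduce the matrices $\phi_{k+1} \stackrel{\mathrm{def}}{=} (\phi_{k+1}(i, j))_{1 \le i,j \le m}$, where
$$\phi_{k+1}(i, j) \stackrel{\mathrm{def}}{=} \Pr_{\omega,(k,i)}\big(\tau_{k+1} < \tau_a,\ \xi(\tau_{k+1}) = (k + 1, j)\big). \tag{3.23}$$
These matrices satisfy (2.24) (with $a = 0$) with the modified boundary condition $\phi_{a+1} = 0$. Equation (2.29) with the $\psi_k$'s defined by (2.6) now yields
$$\Delta_{k+1} = B_k \cdots B_{a+1}\psi_{a+1}\phi_{a+2} \cdots \phi_{k+1},$$
and hence
$$\|\Delta_{k+1}\| \le \|B_k \cdots B_a\| \le C\exp(\Phi_k - \Phi_a). \tag{3.24}$$
The Markov property also implies that $\mathbf{h}_k = \phi_{k+1}\mathbf{h}_{k+1}$, and hence
$$\mathbf{h}_k = \phi_{k+1}\phi_{k+2} \cdots \phi_b\mathbf{1} \quad \text{since } \mathbf{h}_b = \mathbf{1}. \tag{3.25}$$
We view the probabilities $\Pr_{(k,\cdot)}(\tau_a < \tau_b)$ as the column vector $\mathbf{1} - \mathbf{h}_k$. Then, writing $\phi_b = \psi_b - \Delta_b$ and using $\psi_b\mathbf{1} = \mathbf{1}$, we have
$$\Pr_{(k,\cdot)}(\tau_a < \tau_b) = \mathbf{1} - \phi_{k+1} \cdots \phi_b\mathbf{1} = \mathbf{1} - \phi_{k+1} \cdots \phi_{b-1}(\psi_b - \Delta_b)\mathbf{1} = \mathbf{1} - \phi_{k+1} \cdots \phi_{b-1}\mathbf{1} + \phi_{k+1} \cdots \phi_{b-1}\Delta_b\mathbf{1} \le \mathbf{1} - \phi_{k+1} \cdots \phi_{b-1}\mathbf{1} + \|\Delta_b\|\mathbf{1}.$$
Iterating this inequality, we obtain that
$$\Pr_{(k,\cdot)}(\tau_a < \tau_b) \le \sum_{y=k+1}^{b} \|\Delta_y\|\mathbf{1}$$
and (3.21) follows from (3.24).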
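For orientation, the bound (3.21) can be compared with the classical exact formula in the one-dimensional case. The sketch below rests on assumptions that are not part of the paper: $m = 1$, no holding ($r_k = 0$), hypothetical jump probabilities `p[k]` (right) and `q[k]` (left), and the one-dimensional analogue of the potential, $\Phi_k - \Phi_a = \sum_{a < j \le k} \log(q_j/p_j)$. For such a birth-death walk, $\Pr_k(\tau_a < \tau_b) = \sum_{y=k}^{b-1} e^{\Phi_y} / \sum_{y=a}^{b-1} e^{\Phi_y}$, which is dominated by the right-hand side of (3.21) with $C = 1$.

```python
import math
import random

rng = random.Random(1)
a, b = 0, 30

# Hypothetical 1D environment (m = 1, no holding): p[k] = step right, q[k] = step left.
p = {k: rng.uniform(0.3, 0.7) for k in range(a, b + 1)}
q = {k: 1.0 - p[k] for k in p}

# One-dimensional analogue of the potential: Phi_k - Phi_a = sum_{a < j <= k} log(q_j / p_j).
phi = {a: 0.0}
for k in range(a + 1, b + 1):
    phi[k] = phi[k - 1] + math.log(q[k] / p[k])

# Classical birth-death formula for Pr_k(tau_a < tau_b).
total = sum(math.exp(phi[y]) for y in range(a, b))
hit_a = {k: sum(math.exp(phi[y]) for y in range(k, b)) / total for k in range(a, b + 1)}

# Right-hand side of (3.21) with C = 1.
bound = {k: sum(math.exp(phi[y] - phi[a]) for y in range(k, b + 1)) for k in range(a, b + 1)}
```

Here `hit_a[k]` is harmonic for the walk in the interior of the box, equals 1 at `a` and 0 at `b`, and never exceeds `bound[k]`; the bound is only informative where the potential at and to the right of the starting site lies well below $\Phi_a$.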

Lemma 10. Let $a < b$, and let $\tau$ be the hitting time of $L_a \cup L_b$, the union of two layers. Then if $a \le k \le b$, we have
$$E_{(k,i)}(\tau) \le C(b - a)^2\exp\Big[\min\Big(\sup_{a \le s < t \le b}(\Phi(s) - \Phi(t)),\ \sup_{a \le s < t \le b}(\Phi(t) - \Phi(s))\Big)\Big].$$


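Proof. Consider the column vectors $\mathbf{e}_k = (E_{(k,i)}\tau)_{1 \le i \le m}$. These vectors satisfy $\mathbf{e}_a = \mathbf{e}_b = 0$, and for $a < k < b$:
$$\mathbf{e}_k = P_k\mathbf{e}_{k+1} + R_k\mathbf{e}_k + Q_k\mathbf{e}_{k-1} + \mathbf{1}. \tag{3.26}$$
To solve (3.26), we use an induction procedure which allows us to find a sequence of matrices $\phi_k$ and vectors $\mathbf{d}_k$ such that
$$\mathbf{e}_k = \phi_{k+1}\mathbf{e}_{k+1} + \mathbf{d}_k. \tag{3.27}$$
Namely, we put $\phi_{a+1} = 0$, $\mathbf{d}_a = 0$, which according to (3.27) implies that $\mathbf{e}_a = 0$. Suppose next that $\phi_k$ and $\mathbf{d}_{k-1}$ are defined for some $k > a + 1$. Then substituting $\mathbf{e}_{k-1} = \phi_k\mathbf{e}_k + \mathbf{d}_{k-1}$ into the main equation in (3.26) we have
$$\mathbf{e}_k = P_k\mathbf{e}_{k+1} + R_k\mathbf{e}_k + Q_k(\phi_k\mathbf{e}_k + \mathbf{d}_{k-1}) + \mathbf{1},$$
and hence
$$\mathbf{e}_k = (I - Q_k\phi_k - R_k)^{-1}(P_k\mathbf{e}_{k+1} + Q_k\mathbf{d}_{k-1} + \mathbf{1}),$$
which makes it natural to put
$$\phi_{k+1} = (I - Q_k\phi_k - R_k)^{-1}P_k \tag{3.28}$$
and
$$\mathbf{d}_k = B_k(\phi_k)\mathbf{d}_{k-1} + \mathbf{u}_k, \tag{3.29}$$
where $\mathbf{u}_k = (I - Q_k\phi_k - R_k)^{-1}\mathbf{1}$, $B_k(\phi_k) = (I - Q_k\phi_k - R_k)^{-1}Q_k$. The existence of the matrices $\phi_k$ follows from the fact that $\phi_k \ge 0$ and $\phi_k\mathbf{1} \le \mathbf{1}$. Iterating (3.27) and (3.29) we obtain
$$\mathbf{e}_k = \mathbf{d}_k + \phi_{k+1}\mathbf{d}_{k+1} + \cdots + \phi_{k+1} \cdots \phi_{b-1}\mathbf{d}_{b-1}$$
and
$$\mathbf{d}_k = \mathbf{u}_k + B_k(\phi_k)\mathbf{u}_{k-1} + \cdots + B_k(\phi_k) \cdots B_{a+1}(\phi_{a+1})\mathbf{u}_a.$$
Hence
$$\|\mathbf{e}_k\| \le \|\mathbf{d}_k\| + \|\mathbf{d}_{k+1}\| + \cdots + \|\mathbf{d}_{b-1}\| \le C(b - k)\max_{k \le j \le b-1} \|\mathbf{d}_j\|.$$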

But ||Bk (ϕk ) . . . Bl (ϕl )|| ≤ C supa≤s
sup

( (s) − (t)) .

a≤s
We obtain the same estimate with $\Phi$ replaced by $\Phi^-$, and using (3.8), we get the desired estimate.


Lemma 11. Let a ≤ k 0, Pr(k,i) (τb ≥ x, τb < τa ) ≤

C(b − a)2 exp supa≤s
Proof. Let $\tau$ again be the hitting time of $L_a \cup L_b$. It is obvious that $\Pr_{(k,i)}(\tau_b \ge x,\ \tau_b < \tau_a) \le \Pr_{(k,i)}(\tau \ge x)$. By the Markov inequality and Lemma 10, the result follows.

Lemma 12. Let $a < b$, and consider the chain $\{\xi_t\}$ on $S_{a,b}$ with reflecting boundary conditions on $a, b$, as above. Then for any $t \in \mathbb{N}$, $(k, i), (l, j) \in S_{a,b}$, we have
$$\Pr_{(k,i)}(\xi_t = (l, j)) \le C\exp[-(\Phi_l - \Phi_k)].$$
Proof.
$$\pi_{a,b}(l, j) = \sum_{(k',i')} \pi_{a,b}(k', i')\Pr_{(k',i')}(\xi_t = (l, j)) \ge \pi_{a,b}(k, i)\Pr_{(k,i)}(\xi_t = (l, j))$$
for all $(k, i), (l, j) \in S_{a,b}$, and all $t \in \mathbb{N}$. Hence $\Pr_{(k,i)}(\xi_t = (l, j)) \le \pi_{a,b}(l, j)/\pi_{a,b}(k, i)$, and the lemma now follows with (3.7).
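The only probabilistic input in this proof is the general fact that a stationary law $\pi$ dominates each single term of the stationarity identity, so that $P^t(x, y) \le \pi(y)/\pi(x)$ for every $t$. The following sketch checks this on a small chain; the transition matrix `K` is a hypothetical example and has nothing to do with the strip model.

```python
# Hypothetical irreducible, aperiodic chain on three states.
K = [[0.1, 0.9, 0.0],
     [0.6, 0.1, 0.3],
     [0.0, 0.8, 0.2]]
n = len(K)

def mat_mul(x, y):
    """Product of two n x n matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Stationary law by power iteration: pi <- pi K.
pi = [1.0 / n] * n
for _ in range(500):
    pi = [sum(pi[i] * K[i][j] for i in range(n)) for j in range(n)]

# worst_gap = max over t, x, y of P^t(x, y) - pi(y) / pi(x); it should never be positive.
worst_gap = float("-inf")
Pt = [row[:] for row in K]
for t in range(1, 31):
    for x in range(n):
        for y in range(n):
            worst_gap = max(worst_gap, Pt[x][y] - pi[y] / pi[x])
    Pt = mat_mul(Pt, K)
```

In the setting of Lemma 12 the ratio $\pi(y)/\pi(x)$ is then controlled through the potential by (3.7).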

We now have all the ingredients for the

Proof of Proposition 8. We may assume that $0 \in (a_t, b_t]$. The case of $0 \in (b_t, c_t)$ is handled similarly. We will write $a, b, c$ for $a_t, b_t, c_t$, to simplify notations. We write $J_t$ for the interval $(b - \gamma\log^2 t,\ b + \gamma\log^2 t)$. We have
$$\Pr_{(0,i)}(X(t) \notin J_t) \le \Pr_{(0,i)}(X(t) \notin J_t,\ \tau_b < \min(\tau_a, t)) + \Pr_{(0,i)}(\tau_b > \tau_a) + \Pr_{(0,i)}(\tau_b > t,\ \tau_a > \tau_b). \tag{3.30}$$
First we see that from Lemma 9, and (3.15), (3.19), (3.20),
$$\Pr_{(0,i)}(\tau_b > \tau_a) \le C(b - a)\exp\Big[\sup_{0 \le x \le b} \Phi_x - \Phi_a\Big] \le \frac{C\log^2 t}{\sigma^2\delta}\exp[-\delta\log t] \le t^{-\delta/2}, \tag{3.31}$$

if t is large enough, and from Lemma 11 and (3.17), C log4 t exp supa≤s
Pr(0,i) (τb > t, τa > τb ) ≤

(3.32)


By the Markov property, we get
$$\Pr_{(0,i)}(X(t) \notin J_t,\ \tau_b < \min(\tau_a, t)) \le \max_{s \le t,\ 1 \le j \le m} \Pr_{(b,j)}(X(s) \notin J_t). \tag{3.33}$$
Now
$$\Pr_{(b,j)}(X(s) \notin J_t) \le \Pr_{(b,j)}(\min(\tau_a, \tau_c) \le t) + \Pr_{(b,j)}\big(X^{(a,c)}(s) \notin J_t\big), \tag{3.34}$$
where $X^{(a,c)}$ is the chain with reflecting boundary conditions at $L_a$ and $L_c$. The second summand is estimated by Lemma 12 and (3.18), which give
$$\Pr_{(b,j)}\big(X^{(a,c)}(s) \notin J_t\big) \le C\sup_{l \notin J_t}\exp[-(\Phi_l - \Phi_b)] \le Ct^{-\delta} \le t^{-\delta/2}. \tag{3.35}$$
To estimate the first summand in (3.34) we observe that by (3.19),
$$\Pr_{(b-1,i)}(\tau_a < \tau_b) \le C\exp[-\Phi_a]\big(\exp[\Phi_{b-1}] + \exp[\Phi_b]\big) \le C\exp[-(1 + \delta)\log t] \le t^{-1-2\delta/3},$$
and similarly $\Pr_{(b+1,i)}(\tau_c < \tau_b) \le t^{-1-2\delta/3}$. If, starting in $(b, j)$, the chain reaches $L_a$ or $L_c$ in time $t$, there is at least one among the first $t/2$ of the excursions from $L_b$ which reaches $L_a \cup L_c$. By the above estimates, each such excursion has at most probability $t^{-1-2\delta/3}$ to be "successful", and therefore
$$\Pr_{(b,j)}(\min(\tau_a, \tau_c) \le t) \le 1 - \big(1 - t^{-1-2\delta/3}\big)^{t/2} \le t^{-\delta/2}. \tag{3.36}$$
Combining (3.30)–(3.36), we get
$$\Pr_{(0,i)}(X(t) \notin J_t) \le 4t^{-\delta/2}.$$
This proves the claim.

4. Appendix

Most (if not all) of the results in this appendix are not new. The main reason for including them is that we want to present them in the form which is needed for our purpose; this is particularly relevant in the case of Markov chains generated by contracting transformations. We also hope that a more self-contained paper makes for easier reading.

4.1. The CLT and the invariance principle (IP) for stationary Markov chains. We first recall, in Subsect. 4.1.1, the classical results of B. M. Brown [2] about the CLT and the IP for martingales. We then explain in Subsect. 4.1.2 that the reduction of the proof of the CLT for Markov chains to the martingale case invented by Gordin and Lifshits [10] can be easily extended to obtain the IP for Markov chains. Finally, in Subsect. 4.1.3, we prove that the Gordin-Lifshits conditions are satisfied for a class of Markov chains generated by contracting transformations.


4.1.1. The CLT and the IP for martingales (by B. M. Brown [2]). Let $\{S_n, \mathcal{F}_n\}$, $n = 1, 2, \ldots$ be a martingale on the probability space $(\Omega, \mathcal{F}, P)$. Put $U_n = S_n - S_{n-1}$ with $S_0 = 0$. The expectation with respect to $P$ is denoted by $E$, and $E_{j-1}$ stands for the conditional expectation $E(\cdot \mid \mathcal{F}_{j-1})$. Let $\sigma_n^2 = E_{n-1}(U_n^2)$, $V_n^2 = \sum_{j=1}^{n} \sigma_j^2$, and $s_n^2 = E(V_n^2) = E(S_n^2)$. The main assumption in [2] concerning martingales is:
$$V_n^2s_n^{-2} \to 1 \quad \text{in probability as } n \to \infty. \tag{4.1}$$
We say that the Lindeberg condition holds for the class of martingales satisfying (4.1) if for any $\varepsilon > 0$,
$$s_n^{-2}\sum_{j=1}^{n} E\big[U_j^2\,I(|U_j| \ge \varepsilon s_n)\big] \to 0 \quad \text{as } n \to \infty, \tag{4.2}$$
where $I(\cdot)$ is the indicator function of a set. For $t \in [0, 1]$ define a sequence of piecewise linear random functions
$$u_n(t) = s_n^{-1}\big\{S_k + U_{k+1}(ts_n^2 - s_k^2)(s_{k+1}^2 - s_k^2)^{-1}\big\} \quad \text{if } s_k^2 \le ts_n^2 \le s_{k+1}^2,\ k = 0, 1, \ldots, n - 1. \tag{4.3}$$

The following two theorems from [2] describe the asymptotic behaviour of the sequences $S_n$ and $u_n(\cdot)$.

Theorem 9. If (4.1) and (4.2) hold, then $S_n$ is asymptotically normal:
$$\lim_{n \to \infty} P\{s_n^{-1}S_n \le x\} = (2\pi)^{-1/2}\int_{-\infty}^{x} e^{-\frac{1}{2}y^2}\,dy \tag{4.4}$$
for all $x$. Furthermore, all finite dimensional distributions of $u_n(t)$ converge weakly, as $n \to \infty$, to those of a standard Wiener process $W(t)$ on $0 \le t \le 1$ (that is, $W(0) = 0$ and $EW^2(1) = 1$).

Theorem 10. Let $\{C[0, 1], \mathcal{B}, P_W\}$ be the probability space where $C[0, 1]$ is the space of continuous functions with the sup norm topology, $\mathcal{B}$ being the Borel $\sigma$-algebra generated by open sets in $C[0, 1]$, and $P_W$ the Wiener measure. Let $\{P_n\}$ be the sequence of probability measures on $\{C[0, 1], \mathcal{B}\}$ determined by the distribution of $\{u_n(t),\ 0 \le t \le 1\}$. Then if (4.1) and (4.2) hold, $P_n \to P_W$ weakly as $n \to \infty$.

4.1.2. The CLT and the IP for general Markov chains. In their famous work [10], Gordin and Lifshits reduced the proof of the CLT for Markov chains to that of martingales. They then applied the same approach to the proof of the invariance principle for Markov chains in [11]. We shall explain their method here for the sake of completeness. Let $z_k$, $k = 1, 2, \ldots$, be a stationary ergodic Markov chain with a phase space $(X, \mathcal{A})$, transition kernel $K(z, dy)$, and initial distribution $\kappa$. Let $f : X \to \mathbb{R}$ be a real valued function on $X$ such that $Ef(z) = 0$ and $\mathrm{Var} f(z) < \infty$ (all expectations are taken with respect to the measure $\kappa$). Let $L_2(X, \mathcal{A}, \kappa)$ be the natural Hilbert space associated with $X, \mathcal{A}, \kappa$. By $I$ we denote the identity operator in this space, and by $A$ the transition operator of the Markov chain: $AF(z) \stackrel{\mathrm{def}}{=} \int_X F(y)K(z, dy)$. Put
$$S_n = f(z_1) + \cdots + f(z_n) \quad \text{with the convention } S_0 = 0. \tag{4.5}$$


Theorem 11. Let $z_k$ be a Markov chain described above and suppose that the function $f$ with $Ef = 0$ can be presented as $f = (I - A)F$, where $F \in L_2(X, \mathcal{A}, \kappa)$ and $EF = 0$. Put $\sigma^2 = \|F\|^2 - \|AF\|^2 \equiv EF^2 - E(AF)^2$ and suppose that $\sigma > 0$. Then $S_n/(\sigma\sqrt{n})$ converges in law towards the standard Gaussian distribution $N(0, 1)$ and the sequence $S_n$ satisfies the invariance principle with parameter $\sigma$ in the sense of the definition given in Sect. 2.4.

Proof. Consider the identity which is due to Gordin ([9]) and was used by Gordin and Lifshits in [10]:
$$f(z_k) = U(z_k, z_{k+1}) + F(z_k) - F(z_{k+1}), \quad \text{where } U(z_k, z_{k+1}) = F(z_{k+1}) - (AF)(z_k).$$
This identity holds true because of the conditions imposed on $f$. Obviously, $E\{U(z_k, z_{k+1}) \mid z_k, \ldots, z_1\} = 0$. Denote $U_{k+1} = U(z_k, z_{k+1})$. In these notations we can write $S_n = \hat{S}_n + F(z_1) - F(z_{n+1})$, where $\hat{S}_n = \sum_{k=1}^{n} U_{k+1}$. It is clear that if $\mathcal{F}_n$ is the $\sigma$-algebra generated by the variables $z_1, \ldots, z_{n+1}$, then the sequence $\hat{S}_n$, $n = 1, 2, \ldots$ is a martingale with respect to the filtration $\mathcal{F}_n$, $n = 1, 2, \ldots$. Let us check that all conditions required by Theorems 9 and 10 are satisfied. Indeed, $\sigma_j^2 = E\{U_{j+1}^2 \mid z_j\} = (AF^2)(z_j) - [(AF)(z_j)]^2$ is a stationary sequence with $E\sigma_j^2 = \|F\|^2 - \|AF\|^2 = \sigma^2$. Relation (4.1) takes the form
$$(n\sigma^2)^{-1}\sum_{j=1}^{n} \sigma_j^2 \to 1$$
and is satisfied with probability 1 because of the Birkhoff Ergodic Theorem. Since $s_n = \sigma\sqrt{n}$, the Lindeberg condition (4.2) takes the form $\sigma^{-2}E\big[U_2^2\,I(|U_2| \ge \varepsilon\sigma\sqrt{n})\big] \to 0$ as $n \to \infty$, and is obviously satisfied. Finally, functions (4.3) are now given by
$$u_n(t) = n^{-\frac{1}{2}}\sigma^{-1}\big(\hat{S}_k + (tn - k)U_{k+2}\big) \quad \text{if } k \le tn \le k + 1,\ k = 0, 1, \ldots, n - 1,$$
and hence for $k \le tn \le k + 1$,
$$v_n(t) = u_n(t) + n^{-\frac{1}{2}}\sigma^{-1}\big(F(z_1) - F(z_{k+1}) + (tn - k)(F(z_{k+1}) - F(z_{k+2}))\big),$$
where $v_n(t)$ is as in (2.17). Since $F$ is square integrable and $z_n$ is a stationary sequence, it follows that $n^{-\frac{1}{2}}\max_{1 \le k \le n+1} |F(z_k)| \to 0$ with probability 1 as $n \to \infty$. Hence also $\sup_{0 \le t \le 1} |v_n(t) - u_n(t)| \to 0$ as $n \to \infty$ with probability 1. All statements of our theorem now follow from Theorems 9 and 10.

4.1.3. The CLT and the IP for Markov chains generated by contracting transformations. Consider the following setup: $(\Omega, \mathcal{F}, P)$ is a probability space; the related expectation is denoted by $E$. $M$ is a compact metric space equipped with a distance $\rho(\cdot, \cdot)$. $B$ is a semigroup of continuous Lipschitz transformations of $M$: for any $g \in B$ there is a constant $l_g$ such that $\rho(g.y, g.y') \le l_g\rho(y, y')$ for any $y, y' \in M$. Here and in the


sequel $g.y$ denotes the result of the action of $g \in B$ on $y \in M$; this notation will be used most of the time but in some cases we may write $g(y)$ rather than $g.y$.

For any $g_1, g_2 \in B$ put $\bar{\rho}(g_1, g_2) \stackrel{\mathrm{def}}{=} \sup_{y \in M} \rho(g_1.y, g_2.y)$. Obviously, $\bar{\rho}(\cdot, \cdot)$ defines a distance on $B$. We can now consider the Borel sigma-algebra generated by the corresponding open subsets of $B$; this sigma-algebra will be denoted by $\mathcal{S}$.

Consider a measurable mapping $g : \Omega \to B$, $\omega \mapsto g^{\omega}$, and for a set $E \in \mathcal{S}$ put $\mu(E) \stackrel{\mathrm{def}}{=} P\{\omega : g^{\omega} \in E\}$. We say that $g$ is a random transformation of $M$. Let $g_k \in B$, $k \ge 1$, be a sequence of independent copies of $g$. Without loss of generality we can assume that the $g_k$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$.

Denote by $g^{(j)} \stackrel{\mathrm{def}}{=} g_j \cdots g_1$ the product of the random transformations $g_1, \ldots, g_j$ and let $\mu^{(j)}$ be the probability distribution of the product $g^{(j)}$. This measure on $B$ is often called the $j$th convolution power of the measure $\mu$ and is denoted by $\mu^{(j)} = \mu^{*j} = \mu * \cdots * \mu$ ($j$ times). A sequence of random transformations $g_k$ is said to be contracting if there are constants $C > 0$ and $c$, $0 \le c < 1$, such that for any $y, y' \in M$ and any $n \ge 1$,
$$\int_B \rho(g.y, g.y')\,\mu^{(n)}(dg) \equiv E\rho(g_n \cdots g_1.y,\ g_n \cdots g_1.y') \le Cc^n. \tag{4.6}$$
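A toy instance may help to fix the meaning of (4.6). The sketch below is purely illustrative and rests on assumptions that are not in the paper: $M = [0, 1]$ with the usual distance, and the random transformations are hypothetical affine maps $g(y) = sy + t$ with slope $s \in \{0.3, 0.6\}$, sampled by the made-up helper `sample_map`. For such maps the distance between two trajectories driven by the same $g_k$ is the product of the slopes, so (4.6) holds with $C = 1$ and $c = 0.6$.

```python
import random

rng = random.Random(2)

def sample_map(rng):
    """Hypothetical random transformation of [0, 1]: g(y) = s * y + t."""
    s = rng.choice([0.3, 0.6])
    t = rng.uniform(0.0, 1.0 - s)  # keeps g([0, 1]) inside [0, 1]
    return s, t

def act(g, y):
    """Action g.y of the affine map g = (s, t) on the point y."""
    s, t = g
    return s * y + t

n_steps, n_runs = 20, 2000
y, y_alt = 0.0, 1.0
mean_dist = [0.0] * n_steps  # Monte Carlo estimate of E rho(g_n...g_1.y, g_n...g_1.y')
for _ in range(n_runs):
    u, v = y, y_alt
    for n in range(n_steps):
        g = sample_map(rng)            # the same map is applied to both points
        u, v = act(g, u), act(g, v)
        mean_dist[n] += abs(u - v) / n_runs
```

The entry `mean_dist[n]` estimates the left-hand side of (4.6) after $n + 1$ steps and stays below $0.6^{\,n+1}$ in every run, not only on average.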

Remarks. Perhaps it would be more natural to say that the contraction property holds if $\int_B \rho(g.y, g.y')\,\mu^{(n)}(dg) \le Cc^n\rho(y, y')$. However, (4.6) is sufficient for our purposes and is what we check in our applications.

As usual, products of random transformations generate a Markov chain with a state space $M$. Namely, let $\nu \equiv \nu(dy)$ be a probability measure on $M$ and let $y_1 \in M$ be chosen randomly according to the distribution $\nu$ and independently of all $g_j$'s. For $k \ge 1$ define $y_{k+1} \in M$ by $y_{k+1} \stackrel{\mathrm{def}}{=} g_k.y_k \equiv g^{(k)}.y_1$. The sequence of pairs $(g_k, y_k)$, $k \ge 1$, forms a Markov chain with a phase space $B \times M$; this chain will be denoted $(g, y)$. Note that the $(y)$-component of this chain, the sequence $y_k$, $k \ge 1$, is itself a Markov chain with the phase space $M$. Since $M$ is a compact space the chain $(y)$ has an invariant measure; we shall suppose from now on that $\nu$ is such a measure which, in turn, implies that $\mu(dg)\nu(dy)$ is an invariant measure of the chain $(g, y)$. It is well known (and easy to see) that if $g_k$ is a contracting sequence of random transformations then the Markov chain $(y)$ has a unique invariant measure. Let $L_2(B \times M)$ be the Hilbert space of $\mu \times \nu$ square integrable real valued functions and $C(B \times M)$ be its subset of continuous functions. Given an $f \in C(B \times M)$ let $S_n$ denote the related Birkhoff sums along a trajectory of the Markov chain $(g, y)$:
$$S_n = \sum_{k=1}^{n} f(g_k, y_k).$$
By $A$ we denote the following Markov operator acting in $L_2(B \times M)$ and preserving $C(B \times M)$:
$$(Af)(g, y) \stackrel{\mathrm{def}}{=} \int_B f(g', g.y)\,\mu(dg'). \tag{4.7}$$
It follows from (4.7) that
$$(A^kf)(g, y) = \int_{B \times B} f(g', \tilde{g}g.y)\,\mu(dg')\,\mu^{(k-1)}(d\tilde{g}). \tag{4.8}$$


Theorem 12. Suppose that the sequence of random transformations $g_k$ is contracting and $f$ is a continuous bounded function on $B \times M$ such that
(i) $\int_B f(g, y)\,\mu(dg)$ is Lipschitz on $M$, that is, for some $C_f$,
$$\Big|\int_B (f(g, y) - f(g, y'))\,\mu(dg)\Big| \le C_f\rho(y, y').$$
(ii) $\int_{B \times M} f(g, y)\,\mu(dg)\,\nu(dy) = 0$.
Then the equation
$$(I - A)F = f \tag{4.9}$$
has a solution $F(g, y)$ which is continuous on $B \times M$ and satisfies
$$\int_{B \times M} F(g, y)\,\mu(dg)\,\nu(dy) = 0.$$
Besides, this solution is unique in $L_2(B \times M)$. Denote
$$\sigma^2 = \int_{B \times M} (AF^2 - (AF)^2)(g, y)\,\mu(dg)\,\nu(dy).$$
If $\sigma > 0$ then $S_n/(\sigma\sqrt{n})$ converges in law towards the standard Gaussian distribution $N(0, 1)$ and the sequence $S_n$ satisfies the invariance principle with parameter $\sigma$. If $\sigma > 0$ and, in addition to (i), $|f(g, y) - f(g, y')| \le C_f(g)\rho(y, y')$ with $\int_B \log(1 + C_f(g))\,\mu(dg) < \infty$, then the invariance principle for the sequence $S_n$ is satisfied uniformly in $y_1 \in M$. If $\sigma = 0$, then the function $F(g, y)$ depends only on $y$ and for every $(g, y)$ in the support of $\mu \times \nu$ one has
$$f(g, y) = F(y) - F(g.y). \tag{4.10}$$

Proof. The existence of $F$. Equation (4.9) can be rewritten as $F = AF + f$ and, iterating this relation, one obtains a formal series:
$$F = \sum_{k=0}^{\infty} A^kf. \tag{4.11}$$
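The construction of $F$ by the series (4.11) and the resulting variance can be followed by hand on a finite chain. The sketch below is an illustration under assumptions that are not in the paper: a two-state chain with hypothetical transition probabilities `alpha`, `beta`, stationary law `kappa`, and a centred observable `f`; the operator `A` is the transition operator of Subsect. 4.1.2. For this chain every centred function is an eigenfunction of $A$ with eigenvalue $\rho = 1 - \alpha - \beta$, so $F = f/(1 - \rho)$ and $\sigma^2 = \frac{1 + \rho}{1 - \rho}Ef^2$, which gives an independent check.

```python
# Two-state chain with hypothetical transition matrix K and stationary law kappa.
alpha, beta = 0.3, 0.2
K = [[1 - alpha, alpha], [beta, 1 - beta]]
kappa = [beta / (alpha + beta), alpha / (alpha + beta)]

def A(F):
    """Transition operator: (AF)(z) = sum_y K(z, y) F(y)."""
    return [sum(K[z][y] * F[y] for y in range(2)) for z in range(2)]

def mean(F):
    """Expectation with respect to the stationary law kappa."""
    return sum(kappa[z] * F[z] for z in range(2))

# A kappa-centred observable f.
g = [1.0, -2.0]
f = [v - mean(g) for v in g]

# Partial sums of the series F = sum_{k >= 0} A^k f.
F, term = [0.0, 0.0], f[:]
for _ in range(200):
    F = [F[z] + term[z] for z in range(2)]
    term = A(term)

AF = A(F)
residual = max(abs(F[z] - AF[z] - f[z]) for z in range(2))       # checks (I - A)F = f
sigma2 = mean([v * v for v in F]) - mean([v * v for v in AF])   # ||F||^2 - ||AF||^2
```

With these numbers $\rho = 0.5$, $Ef^2 = 2.16$ and hence $\sigma^2 = 6.48$; the geometric decay of the terms $A^kf$ is what the contraction property provides in the general setting.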

Condition (ii) of the theorem and the invariance of the measure $\mu(dg)\nu(dy)$ imply that
$$\int_{B \times M} (A^kf)(g, y)\,\mu(dg)\,\nu(dy) = \int_{B \times M} f(g, y)\,\mu(dg)\,\nu(dy) = 0.$$
Hence, the convergence in (4.11) would follow if we prove that
$$|(A^kf)(g, y) - (A^kf)(\bar{g}, \bar{y})| \le \mathrm{const}\; c^{k-1} \quad \text{for any } (g, y), (\bar{g}, \bar{y}) \in \text{support of } \mu \times \nu. \tag{4.12}$$
But it follows from (4.8) and condition (i) of the theorem that
$$|(A^kf)(g, y) - (A^kf)(\bar{g}, \bar{y})| = \Big|\int_B \Big[\int_B \big(f(g', \tilde{g}g.y) - f(g', \tilde{g}\bar{g}.\bar{y})\big)\,\mu(dg')\Big]\mu^{(k-1)}(d\tilde{g})\Big| \le C_f\int_B \rho(\tilde{g}g.y, \tilde{g}\bar{g}.\bar{y})\,\mu^{(k-1)}(d\tilde{g}) \le C_fCc^{k-1},$$


where the last inequality is due to the contraction property (4.6). The existence and continuity of $F(g, y)$ are proved.

Uniqueness. As usual, to prove the uniqueness we have to show that the homogeneous equation $F = AF$ has only the trivial solution $F \equiv 0$ in the class of functions satisfying the condition $\int_{B\times M} F(g, y)\,\mu(dg)\,\nu(dy) = 0$. To check that this is the case assume that, to the contrary, there is an $F \in L_2(B \times M)$ such that $F \not\equiv 0$, $F$ satisfies the homogeneous equation, and $F$ has zero mean value. For a given $\varepsilon > 0$ find a function $\tilde F$ which is Lipschitz on $B \times M$ and approximates $F$ in the sense that $\|F - \tilde F\| \le \varepsilon$, where $\|\cdot\|$ denotes the $L_2(B \times M)$ norm. The function $\tilde F$ can always be chosen so that $\int_{B\times M} \tilde F(g, y)\,\mu(dg)\,\nu(dy) = 0$. Next, for any $n \ge 1$,

$$F = A^n F = A^n(F - \tilde F) + A^n \tilde F.$$

But then $A^n \tilde F \to 0$ uniformly in $(g, y)$ and $\|A^n(F - \tilde F)\| \le \varepsilon$. Since $\varepsilon$ can be made arbitrarily small, we conclude that $F \equiv 0$.

Proof of the CLT and the IP in the case $\sigma > 0$. According to Theorem 11 the existence of $F \in L_2(B \times M)$ satisfying Eq. (4.9) is the main condition under which both the Central Limit Theorem and the Invariance Principle hold for Birkhoff sums picked up along a realization of a trajectory of a Markov chain. The ergodicity of the Markov chain is the other condition which is needed and which in our case follows from the contraction property. The CLT and the IP are thus proved.

Proof of the uniform IP in the case $\sigma > 0$. We write $S_n(y_1)$ for $S_n$ in order to emphasize the dependence of this sequence on $y_1$. Clearly,

$$|S_n(y_1) - S_n(y_1')| \le \sum_{k=1}^{n} |f(g_k, y_k) - f(g_k, y_k')| \le \sum_{k=1}^{\infty} C_f(g_k)\,\rho(y_k, y_k'). \tag{4.13}$$

It follows from (4.6) (due to the Chebyshev inequality) that $P$-almost surely $\rho(y_k, y_k') \le e^{-\varepsilon k}$ for some $\varepsilon > 0$ and $k \ge k(\varepsilon, \omega)$. It is essential that $k(\varepsilon, \omega)$ does not depend on $y_1, y_1'$. Next, due to the condition imposed on the function $f$, the sequence $k^{-1}\log(1 + C_f(g_k)) \to 0$ as $k \to \infty$ $P$-almost surely. Hence the right-hand side of (4.13) is $P$-almost surely bounded and the corresponding estimate does not depend on $y_1, y_1'$. Let us now consider the dependence on $y_1$ of the relevant $v_n(t) = v_n(t; y_1)$ (see (2.17)). For $t \in [0, 1]$ and $k \le tn \le k + 1$, $k = 0, 1, \ldots, n - 1$, we have

$$v_n(t; y_1) - v_n(t; y_1') = n^{-\frac{1}{2}}\Big(S_k(y_1) - S_k(y_1') + \big(f_{k+1}(y_1) - f_{k+1}(y_1')\big)(tn - k)\Big)$$

with the obvious meaning of $f_{k+1}(y_1)$ and $f_{k+1}(y_1')$. It is now clear that $P$-almost surely $v_n(t; y_1) - v_n(t; y_1') \to 0$ as $n \to \infty$ uniformly in $y_1, y_1'$. This proves the uniformity of the invariance principle.

The case $\sigma = 0$. Note that

$$\big(AF^2 - (AF)^2\big)(g, y) = \int_B \Big(F(g', g.y) - \int_B F(\tilde g, g.y)\,\mu(d\tilde g)\Big)^2 \mu(dg').$$

Hence $\sigma = 0$ implies that for $\mu \times \nu$-almost all $(g, y)$ and $\mu$-almost all $g'$

$$F(g', g.y) = \int_B F(\tilde g, g.y)\,\mu(d\tilde g). \tag{4.14}$$


But $F(\cdot, \cdot)$ is a continuous function of both variables and hence (4.14) holds for any $(g, y)$ from the support of $\mu \times \nu$. This proves that $F$ depends only on the second variable: $F(g', g.y) \equiv F(g.y)$ (we note that $g.y$ runs over the whole of the support of $\nu$ when $(g, y)$ runs over the support of $\mu \times \nu$). Finally, one obtains (4.10) by substituting $F(y)$ (rather than $F(g, y)$) into (4.9).

4.1.4. Markov chains generated by contracting transformations: characterization of the support of the invariant measure. The aim of this section is to give a characterization of the support of an invariant measure of a Markov chain generated by contracting transformations in terms of fixed points of these transformations. We work here within the same setup as in Sect. 4.1.3. This applies to the sequence $g_j$, $j \ge 1$, the metric space $(M, \rho)$, the semigroup $B$ of transformations of $M$, and the Markov chain $y_j$ defined by $y_{j+1} = g_j.y_j$, $j \ge 1$ (with $y_1$ being a random element independent of all $g_j$'s). However, we shall suppose that $B$ is generated by the transformations belonging to the support $J_0$ of the distribution $\mu$ of the $g_j$'s. This difference is important for Lemma 14. Let $\nu$ be the stationary measure of our chain and $M_0$ be the support of $\nu$. As usual, we say that a transformation $g \in B$ is a contraction on a subset $M_0 \subset M$ if there is an $n \ge 1$ and a $c \in [0, 1)$ (both $n$ and $c$ may depend on $g$) such that $\rho(g^n.x', g^n.x'') \le c\,\rho(x', x'')$ for any $x', x'' \in M_0$. If $g \in B$, then by $x_g$ we denote a fixed point of the transformation $g$: $g.x_g = x_g$.

Lemma 13. If $g \in B$ is a contraction on $M$ then its fixed point $x_g \in M$ belongs to the support $M_0$ of the invariant measure $\nu$ of the Markov chain $y_j$.

Proof. Consider a random infinite sequence $g_1, g_2, \ldots$. Since $g \in J_0$, almost every such sequence has the property that for any $k \ge 1$ and any $\delta > 0$ there are infinitely many $i$'s such that each element of the part $g_i, \ldots, g_{i+nk-1}$ of the sequence approximates $g$ so closely that

$$\bar\rho\big(g^{nk}, g_i^{(nk)}\big) \le \delta, \quad \text{where } g_i^{(nk)} \stackrel{\mathrm{def}}{=} g_{i+nk-1} \cdots g_i.$$

Moreover, by the law of large numbers these $i$'s have a positive frequency. Since $\rho(x_g, g^{nk}.x) = \rho(g^{nk}.x_g, g^{nk}.x) \le c^k \rho(x_g, x)$ for any $x \in M$, we have that

$$\rho\big(x_g, g_i^{(nk)}.x\big) \le c^k \rho(x_g, x) + \rho\big(g^{nk}.x, g_i^{(nk)}.x\big) \le c^k \rho(x_g, x) + \delta.$$

Hence any (small) neighbourhood of $x_g$ is visited by the sequence $g_1^{(j)}.x$, $j \ge 1$, infinitely many times and, moreover, this happens with a positive frequency for almost every sequence $g_j$, $j \ge 1$. This implies that $x_g \in M_0$ and $(g, x_g) \in J_0 \times M_0$.

Note that if the invariant measure $\nu$ of our Markov chain is ergodic, then the support $M_0$ of this measure is a minimal set of $B$. The latter by definition means that the orbit $\{g.x : g \in B\}$ of any $x \in M_0$ is everywhere dense in $M_0$.

Lemma 14. Let $M_0 \subset M$ be a minimal set of $B$. Suppose that there exists a $\hat g \in B$ which is a contraction on $M_0$. Consider the set of all fixed points of $B$ belonging to $M_0$:

$$\mathrm{Fix}_{M_0}(B) \stackrel{\mathrm{def}}{=} \{x : x \in M_0 \text{ and there is a } g \in B \text{ such that } g.x = x\}.$$

Then $\mathrm{Fix}_{M_0}(B)$ is everywhere dense in $M_0$.


Proof. The contraction $\hat g$ given to us by the condition of the lemma has a fixed point $\hat x \in M_0$ (it may have other fixed points too, but we are interested only in this one). Since $M_0$ is minimal it coincides with the closure of the orbit $\{g.\hat x : g \in B\}$. For a given $g \in B$ let us consider the point $g.\hat x$. We shall now show that for a sufficiently large $n$ the transformation $g\hat g^n$ has a fixed point which we shall denote $x_{g\hat g^n}$. Indeed, for any $x', x'' \in M_0$,

$$\rho(g\hat g^n.x', g\hat g^n.x'') \le l_g\,\rho(\hat g^n.x', \hat g^n.x'') \le l_g c^n \rho(x', x'').$$

If $n$ is such that $l_g c^n < 1$, then there is a fixed point $x_{g\hat g^n}$ of $g\hat g^n$. On the other hand, it is obvious that $g\hat g^n.x \to g.\hat x$ as $n \to \infty$ uniformly in $x \in M_0$ because $\hat g^n.x \to \hat x$ uniformly in $x \in M_0$. It follows that in particular $x_{g\hat g^n} \to g.\hat x$ and this proves the lemma.

4.2. Products of positive matrices. Lemma 15 below explains two versions of a well known contraction property of products of positive matrices (see, e.g. [5]). The first version of this property has already been explained and proved in the Appendix to [1] and we therefore prove here only the second version. There is a slight difference in the notations used in this paper and those we have introduced in [1] and no difference in the proof; we emphasize once again that this is done for the purposes of completeness and convenience of references in the proofs of other theorems.

Lemma 15. Let $a_n = (a_n(i, j))$, $n = 1, 2, \ldots$ be a sequence of positive $m \times m$ matrices, $a_n > 0$. Put $\tilde H_n \stackrel{\mathrm{def}}{=} a_n a_{n-1} \cdots a_1$, $H_n \stackrel{\mathrm{def}}{=} a_1 a_2 \cdots a_n$ and denote

$$\tilde\delta_r = \min_{i,j,k}\, a_r(i, j)\,a_{r-1}(j, k)\Big(\sum_j a_r(i, j)\,a_{r-1}(j, k)\Big)^{-1}, \quad 2 \le r \le n,$$

$$\delta_r = \min_{i,j,k}\, a_r(i, j)\,a_{r+1}(j, k)\Big(\sum_j a_r(i, j)\,a_{r+1}(j, k)\Big)^{-1}, \quad 1 \le r \le n - 1. \tag{4.15}$$

Suppose that

$$\sum_{r=2}^{\infty} \tilde\delta_r = \infty.$$

Then the products $H_n$ and $\tilde H_n$ can be presented as follows:

$$H_n = D_n\big[(c_n(1)\mathbf{1}, \ldots, c_n(m)\mathbf{1}) + \phi_n\big], \quad \tilde H_n = \tilde D_n\big[(\tilde c(1)\mathbf{1}, \ldots, \tilde c(m)\mathbf{1}) + \tilde\phi_n\big], \tag{4.16}$$

where: $D_n$ and $\tilde D_n$ are diagonal matrices with positive diagonal elements; $\|\phi_n\| \le \prod_{r=1}^{n-1}(1 - m\delta_r)$, $\|\tilde\phi_n\| \le \prod_{r=2}^{n}(1 - m\tilde\delta_r)$; $\tilde c(j)$ are strictly positive numbers which are uniquely defined by the sequence $\{a_k\}_{k\ge 1}$, do not depend on $n$, and are such that $\sum_j \tilde c(j) = 1$; $c_n(j)$ are strictly positive numbers with $\sum_j c_n(j) = 1$ (note that the $c_n(j)$, unlike the $\tilde c(j)$, do depend on $n$ and, generally, do not have a limit).


Proof. Present $H_n$ as follows:

$$H_n = D_n D_n^{-1} a_1 D_{n-1} D_{n-1}^{-1} a_2 \cdots D_1^{-1} a_n = D_n \tilde a_1 \tilde a_2 \cdots \tilde a_n,$$

where $\tilde a_r \equiv D_{n-r+1}^{-1} a_r D_{n-r}$, $D_0 = I$, and $D_{n-r} = \mathrm{diag}(D_{n-r}(1), \ldots, D_{n-r}(m))$ are diagonal matrices, with $D_{n-r}(i)$ chosen so as to make the matrices $\tilde a_r$ stochastic. It is very easy to see that the only such choice is given by

$$D_{n-r}(i) \stackrel{\mathrm{def}}{=} \sum_{i_{r+1}, \ldots, i_n} a_{r+1}(i, i_{r+1})\,a_{r+2}(i_{r+1}, i_{r+2}) \cdots a_n(i_{n-1}, i_n)$$

and

$$\tilde a_r(i, j) = \frac{a_r(i, j) \sum_{i_{r+1}, \ldots, i_n} a_{r+1}(j, i_{r+1}) \cdots a_n(i_{n-1}, i_n)}{\sum_{i_r, i_{r+1}, \ldots, i_n} a_r(i, i_r)\,a_{r+1}(i_r, i_{r+1}) \cdots a_n(i_{n-1}, i_n)} \ge \delta_r. \tag{4.17}$$

It is well known that the last estimate implies the following presentation of the product of the stochastic matrices $\tilde a_1, \ldots, \tilde a_n$:

$$\tilde a_1 \tilde a_2 \cdots \tilde a_n = (c_n(1)\mathbf{1}, \ldots, c_n(m)\mathbf{1}) + \phi_n,$$

where

$$\min_i \tilde a_n(i, j) \le c_n(j) \le \max_i \tilde a_n(i, j) \tag{4.18}$$

and the matrices $\phi_n$ are such that

$$\|\phi_n\| \le \prod_{r=1}^{n-1}(1 - m\delta_r).$$
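The presentation (4.16) says that, after dividing each row of $H_n$ by its row sum (the diagonal factor $D_n$), all rows become nearly equal to the common vector $(c_n(1), \ldots, c_n(m))$. The following is a minimal numerical sketch of this effect, assuming random $3 \times 3$ matrices with entries drawn uniformly from $[1, 2]$; the helper names `row_normalised_product` and `row_spread`, the seed and the sizes are hypothetical choices made for illustration only.

```python
import numpy as np

def row_normalised_product(mats):
    """Return D_n^{-1} H_n for H_n = a_1 a_2 ... a_n, i.e. H_n with each row scaled to sum 1."""
    H = np.eye(mats[0].shape[0])
    for a in mats:
        H = H @ a
        H /= H.sum()  # global rescaling to avoid overflow; it does not change the row-normalised matrix
    return H / H.sum(axis=1, keepdims=True)

def row_spread(S):
    """Largest difference between two entries of the same column (zero iff all rows coincide)."""
    return float(np.max(S.max(axis=0) - S.min(axis=0)))

rng = np.random.default_rng(0)
m = 3
# Assumption: entries in [1, 2], so every factor is strictly positive with bounded ratios.
mats = [rng.uniform(1.0, 2.0, size=(m, m)) for _ in range(100)]

S = row_normalised_product(mats)   # plays the role of (c_n(1)1, ..., c_n(m)1) + phi_n
spread = row_spread(S)             # measures the size of phi_n
common_row = S[0]                  # approximates (c_n(1), ..., c_n(m))
```

With entries confined to $[1, 2]$ each normalised factor $\tilde a_r$ has entries at least $1/12$, so the spread contracts by a factor of at most $3/4$ per step; this crude bound is only a stand-in for the sharper factor $1 - m\delta_r$ of the lemma.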

4.3. A stability estimate. The stability property which we explain below is definitely well known to specialists in the relevant field. Given that the proof is very short, it seems that it is easier for us to prove it than to find a relevant reference. Let $b_n$ and $b_n'$ be two sequences of transformations of a metric space $(X, r)$ and

$$x_{n+1} \stackrel{\mathrm{def}}{=} b_n(x_n), \quad x_{n+1}' \stackrel{\mathrm{def}}{=} b_n'(x_n'), \quad n \ge 1,$$

with given initial values $x_1, x_1' \in X$. For any two transformations $b$ and $b'$ put $\bar\rho(b, b') \stackrel{\mathrm{def}}{=} \sup_{x\in X} r(b(x), b'(x))$.

Lemma 16. Suppose that
(a) $b_n$ are uniformly contracting, that is, there is a $c$, $0 \le c < 1$, such that for any $x, y \in X$ we have $r(b_n(x), b_n(y)) \le c\,r(x, y)$;
(b) $\bar\rho(b_n, b_n') \to 0$ as $n \to \infty$.
Then $r(x_n, x_n') \to 0$ as $n \to \infty$. If, instead of (b), a stronger property holds, namely $\bar\rho(b_n, b_n') \le C_2 c_0^n\,\bar\rho(b_1, b_1')$ for some $C_2$ and $c_0 < 1$, then for any $\varepsilon > 0$ there is a constant $C_3$ such that

$$r(x_n, x_n') \le C_3\,\tilde c^{\,n}\big(\bar\rho(b_1, b_1') + r(x_1, x_1')\big), \quad \text{where } \tilde c = \max(c, c_0) + \varepsilon. \tag{4.19}$$
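The first statement of the lemma can be illustrated on a toy example. The following is a minimal sketch, assuming $X = [0, 1]$ with the usual distance, the contractions $b_n(x) = cx + 0.25$ with $c = 0.5$, and the perturbed maps $b_n'(x) = b_n(x) + 0.2/n$, so that $\bar\rho(b_n, b_n') = 0.2/n \to 0$. These maps and the helper name `perturbed_orbits` are hypothetical and serve only as an illustration.

```python
def perturbed_orbits(c=0.5, n_steps=2000, x1=0.0, x1_prime=1.0):
    """Iterate x_{n+1} = b_n(x_n) and x'_{n+1} = b'_n(x'_n); return the gaps r_n = |x_n - x'_n|."""
    x, xp = x1, x1_prime
    gaps = [abs(x - xp)]                  # r_1
    for n in range(1, n_steps + 1):
        d_n = 0.2 / n                     # sup-distance between b_n and b'_n
        x = c * x + 0.25                  # b_n(x): a contraction of [0, 1] with constant c
        xp = c * xp + 0.25 + d_n          # b'_n(x'): the same map shifted by d_n
        gaps.append(abs(x - xp))          # r_{n+1}
    return gaps

c = 0.5
gaps = perturbed_orbits(c=c)

# The one-step inequality r_{n+1} <= c * r_n + d_n holds along the whole orbit.
recursion_holds = all(
    gaps[n] <= c * gaps[n - 1] + 0.2 / n + 1e-12
    for n in range(1, len(gaps))
)
final_gap = gaps[-1]
```

In this example the gap does not decay geometrically, because $d_n$ decays only like $1/n$; it settles near $d_n/(1 - c)$, which is consistent with the lemma's first statement but not with the stronger estimate (4.19), whose hypothesis fails here.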

Proof. Put $d_n \stackrel{\mathrm{def}}{=} \bar\rho(b_n, b_n')$ and $r_n \stackrel{\mathrm{def}}{=} r(x_n, x_n')$. Since

$$r(x_{n+1}, x_{n+1}') = r(b_n(x_n), b_n'(x_n')) \le r(b_n(x_n), b_n(x_n')) + r(b_n(x_n'), b_n'(x_n')) \le c\,r(x_n, x_n') + \bar\rho(b_n, b_n'),$$

we have that

$$r_{n+1} \le c\,r_n + d_n \le d_n + c\,d_{n-1} + \cdots + c^k d_{n-k} + c^{k+1} r_{n-k}. \tag{4.20}$$

For a given $\varepsilon > 0$ choose $k$ so that $c^k r_{n-k} \le \varepsilon$ (which is possible because $X$ is a compact space and thus $r_{n-k}$ is a uniformly bounded sequence). Next choose $N(\varepsilon, k)$ so that $d_{n-j} \le \varepsilon$ when $n - j \ge N(\varepsilon, k) - k$. It follows now from (4.20) that $r_n \le (2 - c)(1 - c)^{-1}\varepsilon$ when $n > N(\varepsilon, k)$. This proves the first statement of the lemma. To prove the second statement substitute $k = n$ into (4.20) and take into account the stronger estimates for $d_n$. Estimate (4.19) follows with an evident choice of $C_3$.

Remarks. The second statement of this lemma does not use the fact that $X$ is a compact space.

Acknowledgement. This work was supported by the following grants of the Swiss National Foundation: 200020-107739/1 and 200020-116348. We are grateful to the Isaac Newton Institute for its hospitality during the program Interaction and Growth in Complex Stochastic Systems held in Cambridge, UK in 2003. We also thank the European Science Foundation Research Networking Programme on Phase-Transitions and Fluctuation Phenomena for Random Dynamics in Spatially Extended Systems (RDSES) for its financial support.

References

1. Bolthausen, E., Goldsheid, I.: Recurrence and transience of random walks in random environments on a strip. Commun. Math. Phys. 214, 429–447 (2000)
2. Brémont, J.: On some random walks on Z in random medium. Ann. Probab. 30, 1266–1312 (2002)
3. Brémont, J.: Behavior of random walks on Z in Gibbsian medium. C. R. Acad. Sci. Série 1 Math. 338(11), 895–898 (2004)
4. Brown, B.M.: Martingale Central Limit Theorems. Ann. Math. Statist. 42, 59–66 (1971)
5. Furstenberg, H., Kesten, H.: Products of random matrices. Ann. Math. Statist. 31, 457–469 (1960)
6. Goldsheid, I.: Linear and Sub-linear Growth and the CLT for Hitting Times of a Random Walk in Random Environment on a Strip. Probability Theory and Related Fields, appeared on line in August, 2007, DOI:10.1007/s00440-007-0091-0
7. Golosov, A.: Localization of random walks in one-dimensional random environments. Commun. Math. Phys. 92, 491–506 (1984)
8. Golosov, A.: On the limit distributions for a random walk in a critical one-dimensional random environment. Usp. Mat. Nauk 41(2), 189–190 (1986)
9. Gordin, M.I.: The Central Limit Theorem for stationary processes. Soviet Math. Dokl. 10, 1174–1176 (1969)
10. Gordin, M.I., Lifshits, B.A.: The Central Limit Theorem for stationary Markov processes. Sov. Math. Dokl. 19(2), 392–394 (1978)
11. Gordin, M.I., Lifshits, B.A.: The Invariance principle for stationary Markov processes. “Teorija verojatnostej i ejo primenenija” 1978, issue 4, pp. 865–866 (in Russian)
12. Hall, P., Heyde, C.C.: Martingale limit theory and its application. New York: Academic Press, 1980
13. Kesten, H.: The limit distribution of Sinai’s random walk in a random environment. Physica A 138, 299–309 (1986)
14. Kesten, H., Kozlov, M.V., Spitzer, F.: Limit law for random walk in a random environment. Comp. Math. 30, 145–168 (1975)
15. Key, E.: Recurrence and transience criteria for random walk in a random environment. Ann. Prob. 12, 529–560 (1984)
16. Lawler, G.: Weak convergence of a random walks in a random environment. Commun. Math. Phys. 87, 81–87 (1982)
17. Letchikov, A.V.: Localization of one-dimensional random walks in random environment. Soviet Scientific Reviews Section C: Mathematical Physics Reviews. Chur, Switzerland: Harwood Academic Publishers, 1989, pp. 173–220
18. Sinai, Ya.G.: The limiting behavior of a one-dimensional random walk in a random medium. Theory Prob. Appl. 27, 256–268 (1982)
19. Solomon, F.: Random walks in a random environment. Ann. Prob. 3, 1–31 (1975)
20. Zeitouni, O.: Random walks in random environment, XXXI Summer school in Probability, St. Flour (2001). Lecture notes in Math. 1837, Berlin: Springer, 2004, pp. 193–312

Communicated by M. Aizenman

Commun. Math. Phys. 278, 289–306 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0402-4

Communications in

Mathematical Physics

Symplectic Fibrations and the Abelian Vortex Equations

T. Perutz

DPMMS, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, United Kingdom. E-mail: [email protected]

Received: 5 June 2006 / Accepted: 7 August 2007
Published online: 13 December 2007 – © Springer-Verlag 2007

Abstract: The $n$th symmetric product of a Riemann surface carries a natural family of Kähler forms, arising from its interpretation as a moduli space of abelian vortices. We give a new proof of a formula of Manton–Nasir [10] for the cohomology classes of these forms. Further, we show how these ideas generalise to families of Riemann surfaces. These results help to clarify a conjecture of D. Salamon [13] on the relationship between Seiberg–Witten theory on 3-manifolds fibred over the circle and symplectic Floer homology.

1. Introduction

1.1. Relative symmetric products. Consider a pair of smooth, oriented manifolds $X$ and $S$ with $\dim(X) - \dim(S) = 2$, and a proper submersion $\pi : X \to S$. Thus $\pi$ is a smooth fibre bundle, and its typical fibre is a compact orientable surface $\Sigma$.

Definition 1.1. The $r$th symmetric product bundle, or relative symmetric product, $\pi_r : \mathrm{Sym}^r_S(X) \to S$, is defined to be the quotient by the symmetric group $S_r$ of the fibre product

$$X^{\times r}_S = \{(x_1, \ldots, x_r) \in X^{\times r} : \pi(x_1) = \cdots = \pi(x_r)\}$$

with its natural projection to $S$.

1.1.1. Smooth structures. $\mathrm{Sym}^r_S(X)$ is a topological manifold, but it does not inherit a smooth structure from $X$. To make $\mathrm{Sym}^r_S(X)$ a smooth manifold one should choose a complex structure $j$ on the vertical tangent bundle $T^v X = \ker(D\pi) \subset TX$, compatibly with the orientations. Then the fibres become complex manifolds. The smooth atlas on the relative symmetric product is generated by charts which are obtained by fibrewise application of the elementary symmetric functions to ‘restricted charts’ $\Psi : D^2 \times U \to X$.


This means that there is a chart $\psi : U \to S$ such that (i) $\pi \circ \Psi = \psi \circ \mathrm{pr}_2$, and (ii) $\Psi : D^2 \times \{s\} \to X_{\psi(s)}$ is a holomorphic embedding, for each $s \in U$. As observed by Donaldson and Smith [4], the existence of such charts is a consequence of the parametrised Riemann mapping theorem. We will write $\mathrm{Sym}^r_S(X; j)$ when we want to emphasise that this is the smooth structure being considered. Different choices, say $j_0$, $j_1$, give distinct smooth structures. However, $\mathrm{Sym}^r_S(X; j_0)$ is diffeomorphic to $\mathrm{Sym}^r_S(X; j_1)$, as one can see by considering the relative symmetric product of $X \times [0, 1] \to S \times [0, 1]$, equipped with an interpolating family $j_t$.

1.1.2. Kähler forms. The symmetric product $\mathrm{Sym}^r(\Sigma)$ of a Riemann surface equipped with a Kähler form $\omega$ is itself a Kähler manifold. To be precise, a Kähler form is determined by a hermitian line bundle $(L, |\cdot|)$ of degree $r$ over $\Sigma$, together with a real parameter $\tau > 2\pi r / \int_\Sigma \omega$. The reason is that the symmetric product can be identified canonically with a moduli space of abelian vortices, and this has a natural quotient symplectic structure. There is a generalisation of this to the case of relative symmetric products. We first fix our conventions concerning families of symplectic manifolds:

Definition 1.2. (a) A symplectic fibration with typical fibre $(M, \omega)$ is a smooth fibre bundle $p : X \to S$ together with a vertical two-form $\omega$, i.e. a section of $\Lambda^2 (T^v X)^*$, such that each fibre $(X_s, \omega|X_s)$ is a symplectic manifold isomorphic to $(M, \omega)$.
(b) A locally Hamiltonian fibration (LHF) is a triple $(X, p, \Omega)$, where $p : X \to S$ is a smooth fibre bundle and $\Omega$ a closed two-form on $X$ such that $(X_s, \Omega|X_s)$ is a symplectic manifold for each $s \in S$.$^1$

Relative symmetric products of symplectic surface-fibrations are again symplectic fibrations: if $(p : X \to S, \omega)$ is a symplectic fibration with typical fibre $(\Sigma, \omega)$, and one specifies a hermitian line bundle over $X$ of fibrewise degree $r$ and a real parameter, then $\mathrm{Sym}^r_S(X) \to S$ becomes a symplectic fibration. In this paper we show how to promote this functor to locally Hamiltonian fibrations, using the abelian vortex equations. In doing so we extend Salamon’s work [13] which applies to bundles over $S^1$. Our method enables one to determine the cohomology classes of the closed forms which arise, in terms of natural operations relating the cohomologies of $X$ and $\mathrm{Sym}^r_S(X)$.

1.2. Statement of results. There is a sequence of natural operations sending cohomology classes on $X$ to cohomology classes on the relative symmetric product $\mathrm{Sym}^r_S(X)$ of $X \to S$. These come about via the universal (or tautological) divisor

$$\Delta^{\mathrm{univ}} \subset \mathrm{Sym}^r_S(X) \times_S X,$$

i.e., the locus of pairs $(D, x)$, where $x \in \mathrm{Supp}(D)$. This carries a codimension-two homology class relative to boundary, and dually, a cohomology class $\delta \in H^2(\mathrm{Sym}^r_S(X) \times_S X; \mathbb{Z})$.

$^1$ The term ‘locally Hamiltonian fibration’ is used in [9] in a slightly more restrictive way than here; there it is assumed that the base is 2-dimensional and that the form satisfies the normalisation condition introduced by Guillemin and Sternberg.


For example, when $X \to S$ is a holomorphic fibration, $\delta = c_1(\mathcal{O}(\Delta^{\mathrm{univ}}))$. Using the projection maps

$$\mathrm{Sym}^r_S(X) \xleftarrow{\;p_1\;} \mathrm{Sym}^r_S(X) \times_S X \xrightarrow{\;p_2\;} X,$$

and cup products in cohomology, define, for each $k \ge 0$, the map

$$H^*(X; \mathbb{Z}) \to H^{*+2k-2}(\mathrm{Sym}^r_S(X); \mathbb{Z}); \quad c \mapsto c^{[k]} := p_{1!}\big(p_2^* c \smile \delta^k\big). \tag{1}$$

These operations evidently behave in a natural way under base-change (i.e. pulling back by $S' \to S$). It is known [11, Lemma 2.1.1] that

$$c_1(T^v \mathrm{Sym}^r_S(X)) = \frac{1}{2}\big(c_1(T^v X)^{[1]} + 1^{[2]}\big).$$

Theorem 1. Let $(X, \pi, \Omega)$ be a proper, locally Hamiltonian surface-fibration over a manifold $S$, and $r$ a positive integer. Choose
• an $\Omega$-positive complex structure $j$ on $T^v X$;
• a hermitian line bundle $(L, |\cdot|)$ over $X$ such that $L|X_s$ has degree $r$ for each $s \in S$, and a unitary connection $A_{\mathrm{ref}}$ on $L$;
• a real parameter $\tau$. We require $\tau > 2\pi r a^{-1}$, where $a$ is the symplectic area of a fibre.
There is a procedure which associates to these data a closed two-form $v(\Omega, \tau, L)$ on the relative symmetric product $\mathrm{Sym}^r_S(X; j)$ which makes it a locally Hamiltonian fibration. This procedure is compatible with restriction of the base $S$. The form $v(\Omega, \tau, L)$ restricts on each fibre $\mathrm{Sym}^r(X_s)$ to the canonical Kähler form arising from the abelian vortex equations with parameter $\tau$. Its cohomology class is

$$[v(\Omega, \tau, L)] = 2\pi\tau[\Omega]^{[1]} - \pi\, 1^{[2]} \in H^2(\mathrm{Sym}^r_S(X); \mathbb{R}).$$

In particular, the class $[v(\Omega, \tau, L)]$ does not depend on the line bundle $L$.

By applying the theorem to fibrations $X \times U \to S \times U$, one sees that there is smooth dependence on parameters. It can be verified without difficulty that, when the base $S$ is the circle, the form $v(\Omega, \tau, L)$ coincides with the one found by Salamon in [13].

Remark 1.3. In the case where the base $S$ is a point, the result specialises to a formula for the cohomology class of the canonical Kähler form on the vortex moduli space (Theorem 3). This formula is due to Manton and Nasir [10]. Note, though, that their work relies on a local expansion of the Kähler form [14] whose derivation has not received the thoroughgoing analytic treatment a pure mathematician would ask for.

Some further remarks on the nature of the theorem are in order. The interesting thing is not the existence of closed, fibrewise-Kähler two-forms in the specified cohomology class. Indeed, a patching procedure due to Thurston, standard in symplectic geometry, gives an easy construction of such forms. The point is rather that, among such forms, there are some which have a definite geometric (specifically, gauge-theoretic) origin. This geometric construction is closely related to the Seiberg–Witten equations on fibred 3- and 4-manifolds: see [13] and our discussion of Floer homology below.

Let us say that locally Hamiltonian structures $\Omega_0$, $\Omega_1$ on the same fibre bundle $\pi : X \to S$ are isotopic if there exists a locally Hamiltonian structure $\Omega \in \Omega^2_{[0,1]\times X}$ on $\pi \times \mathrm{id} : [0, 1] \times X \to [0, 1] \times S$ with $\Omega|\{i\} \times X = \Omega_i$ for $i = 0, 1$. We call LHFs equivalent if they are related under the equivalence relation generated by isotopy and two-form-preserving bundle isomorphism.


Corollary 2. Fix a proper surface-bundle $\pi : X \to S$. Choose two sets of data $(\Omega_0, j_0, L_0, |\cdot|_0, A_{\mathrm{ref},0}, \tau_0)$, $(\Omega_1, j_1, L_1, |\cdot|_1, A_{\mathrm{ref},1}, \tau_1)$ as above, and suppose that $[\tau_0\Omega_0] = [\tau_1\Omega_1] \in H^2(X; \mathbb{R})$. Then the LHFs $(\mathrm{Sym}^r_S(X; j_0), \pi_r, v(\Omega_0, L_0, \tau_0))$ and $(\mathrm{Sym}^r_S(X; j_1), \pi_r, v(\Omega_1, L_1, \tau_1))$ are equivalent.

Proof. Because $\tau_0\Omega_0$ and $\tau_1\Omega_1$ represent the same cohomology class, the locally Hamiltonian fibrations $(X, \pi, \tau_0\Omega_0)$ and $(X, \pi, \tau_1\Omega_1)$ are isotopic: an isotopy is given by the form $\tau_0\Omega_0 + d(t\beta) \in Z^2(X \times [0, 1])$, where $\tau_1\Omega_1 - \tau_0\Omega_0 = d\beta$. This restricts to the slice $X \times \{t\}$ as $(1 - t)\tau_0\Omega_0 + t\tau_1\Omega_1$, and hence is positive on the fibres $X_{s,t}$ of $X \times [0, 1] \to S \times [0, 1]$. We can give $X \times [0, 1]$ a vertical complex structure $J$ by choosing a path $j_t$ between the given ones. In the case that $L_1 = L_0$, there is a hermitian line bundle $(L, |\cdot|)$ with connection over $X \times [0, 1]$ which restricts to $(L_i, |\cdot|_i)$ on the ends. The form $v(\tau_0\Omega_0 + d(t\beta), L, 1)$ on $\mathrm{Sym}^r_{S\times[0,1]}(X \times [0, 1]; J)$ restricts on the ends to $v(\tau_i\Omega_i, L_i, 1)$. Hence $(\mathrm{Sym}^r_S(X; j_0), \pi_r, v(\Omega_0, L_0, \tau_0))$ is equivalent to $(\mathrm{Sym}^r_S(X; j_1), \pi_r, v(\Omega_1, L_1, \tau_1))$. It remains to show that changing the line bundle does not affect things, and for this we may assume that $\Omega_0 = \Omega_1$ (write $\Omega$ for this single form) and $j_0 = j_1$. By the theorem, we can write $v(\Omega, L_0, \tau) - v(\Omega, L_1, \tau) = d\gamma$. Then, since $v(\Omega, L_0, \tau)$ and $v(\Omega, L_1, \tau)$ are both Kähler, the form $v(\Omega, L_0, \tau) + d(t\gamma)$ on $\mathrm{Sym}^r_S(X) \times [0, 1]$ gives an isotopy between them.

1.3. Floer homology for fibred three-manifolds. Floer homology for symplectic automorphisms works as follows. Let $\Lambda$ be the universal $\mathbb{Z}/2$-Novikov ring: the ring of formal ‘series’ $\sum_{\lambda\in\mathbb{R}} a(\lambda)t^\lambda$, where $a : \mathbb{R} \to \mathbb{Z}/2$ is a function such that $(-\infty, c] \cap \mathrm{Supp}(a)$ is finite for any $c \in \mathbb{R}$. Let $(X, p, \Omega)$ be a LHF over a compact one-manifold $S$, and suppose that its fibres are compact, ‘weakly monotone’ symplectic manifolds (i.e. $c_1(X_s)$ is positively proportional to the symplectic class, or else $c_1(X_s)$ vanishes on $\pi_2(X_s)$, or else every $S \in \pi_2(X_s)$ has absolute Chern number $|\langle c_1(X_s), [S]\rangle| \ge \dim(X_s)/2 - 2$). One can then associate with $(X, p, \Omega)$ a $\Lambda$-module $HF_*(X, p; \Omega)$. The underlying chain group is freely generated by the set of sections of $X$ which are horizontal for the natural connection determined by $\Omega$ (more precisely, by some generic perturbation of $\Omega$). The differential involves moduli spaces of pseudo-holomorphic sections of $X \times \mathbb{R} \to S \times \mathbb{R}$. Isotopic LHFs have isomorphic Floer homologies. Two-form-preserving bundle isomorphisms also give isomorphisms in Floer homology.

This theory is in a sense too rich: different locally Hamiltonian structures may give different modules. In the case where the fibres $X_s$ are complex manifolds, one way to make the theory more manageable is to consider closed two-forms which are not just fibrewise-symplectic, but actually fibrewise-Kähler. If one also fixes the cohomology class of these forms then the set of possible choices is a convex set, and $HF_*(X, p; \Omega)$ is independent of the specific choice of $\Omega$.

An example of this method of making Floer homology manageable occurs in work of Seidel [15], who applies it to mapping tori of automorphisms of a surface $\Sigma$ of genus $g \ge 2$. He thereby constructs invariants of mapping classes, $\pi_0\mathrm{Diff}^+(\Sigma) \ni [\phi] \mapsto HF_*([\phi])$. Though not well understood, these invariants are far from trivial: Seidel shows that the identity mapping class $[\mathrm{id}]$ is characterised by the property that, under a natural action by the homology of the surface on Floer homology, $H^2(\Sigma; \mathbb{Z}/2)$ does not annihilate the whole module.

One way to generalise Seidel’s set-up is as follows. Let $\pi : Y \to S^1$ be a three-manifold fibred over $S^1$, and consider its relative symmetric product $\mathrm{Sym}^r_{S^1}(Y; j)$. We make it an LHF using a closed, fibrewise-Kähler two-form drawn from a particular cohomology class. The output will then depend only on the cohomology class chosen: for each class $W$ which restricts to a Kähler class on the fibre $\mathrm{Sym}^r(\Sigma)$, we get a module $HF_*(\mathrm{Sym}^r_{S^1}(Y), \pi_r; W)$. The requirement that the fibres should be weakly monotone forces us to exclude the range $g/2 \le r < g - 1$, where $g$ is the genus of the fibre. Let us take $W$ to be one of the classes occurring in our theorem:

$$W = W(w) = 2\pi w^{[1]} - \pi\, 1^{[2]}, \quad \text{where } w = [\tau\Omega].$$

In this way we obtain a Floer homology module

$$HF_*(Y, \pi, r; w) := HF_*(\mathrm{Sym}^r_{S^1}(Y), \pi_r; W)$$

by giving only $(Y, \pi, r)$ together with a class $w \in H^2(Y; \mathbb{R})$ which integrates positively over the fibres of $\pi$. Corollary 2 implies that these modules are well-defined, up to canonical isomorphism. For discussion of the dependence on $w$ we refer to [16]. Now, we can of course represent $W$ by one of the forms $v(\Omega, \tau, L)$ supplied by the theorem.
Doing so is not of any great help in computing Floer homology, but it is highly relevant when we try to understand the relation between the symplectic Floer theory just discussed and the monopole Floer homology of the three-manifold $Y$. Monopole Floer homology is the Floer theory arising from the Chern–Simons–Dirac functional over a 3-manifold with $\mathrm{Spin}^c$-structure, a functional whose critical points are precisely the Seiberg–Witten monopoles. Specifically, the name refers to the theory constructed by Kronheimer–Mrowka in their authoritative forthcoming book [7]. It is a ‘perturbed’ version of monopole Floer homology which is of interest and for this we can again use $\Lambda$ as coefficient ring.

Salamon’s proposal from [13], based on an adiabatic limit computation, is that there should be an isomorphism between symplectic and monopole Floer homologies. Expressing the conjecture in terms of Kronheimer–Mrowka’s conventions (and in terms of the notions of this paper) requires a little care because Salamon’s conventions differ in various (inessential) ways. If I have accounted correctly for these discrepancies, the statement is that there is an isomorphism between symplectic Floer homology for $\mathrm{Sym}^r_{S^1}(Y)$ with the form $v(\Omega, \tau, L)$ (i.e., $HF_*(Y, \pi, r; w)$, where $w = [\tau\Omega]$) and a certain summand in the $\Lambda$-module

$$HM_*\big(Y; -4\pi w - 32\pi^2 c_1(T^v Y)\big),$$

the monopole Floer homology with perturbation class $-4\pi w - 32\pi^2 c_1(T^v Y)$. (The perturbation class is non-zero, providing we assume $g \ge 0$ or $\tau \gg 0$, so that there are no reducible monopoles and only one version of monopole Floer theory.) The summand in question is the direct sum of submodules $HM_*(Y, \mathfrak{t}; w)$, where $\mathfrak{t}$ ranges over those $\mathrm{Spin}^c$-structures on $Y$ for which $\langle c_1(\mathfrak{t}), [\Sigma]\rangle = \chi(\Sigma) + 2r$.

Let us tie up this discussion. On one hand, we can use pure symplectic geometry to build a group $HF_*(Y, \pi, r; w)$. Specifying the cohomology class on a relative symmetric product, namely $w^{[1]} - \pi\, 1^{[2]}$ or a multiple of it, is an essential part of the construction. On the other hand, the existence of the special forms $v(\Omega, \tau, L)$, and Salamon’s adiabatic limit, suggest that these modules should have a gauge-theoretic interpretation. We note finally that the modules $HF_*(Y, \pi, r; w)$ fit into a field theory for Lefschetz fibrations over surfaces with boundary, which has been studied by M. Usher [16] and the author [12] (the latter extends the framework to a larger class of singular fibrations). This too is thought to be intimately related to Seiberg–Witten theory.

2. The Vortex Equations

2.1. Review of moduli spaces of vortices. Fix a closed Riemann surface $(\Sigma, j)$, a Kähler form $\omega \in \Omega^{1,1}_\Sigma$, and a hermitian line bundle $(L, |\cdot|)$ over $\Sigma$, of degree $r > 0$. Let $\mathcal{A}(L, |\cdot|)$, or $\mathcal{A}(L)$, denote the space of $\mathrm{U}(1)$-connections (an affine space modelled on the imaginary one-forms $i\Omega^1_\Sigma$). The gauge group, of smooth maps from $\Sigma$ to $\mathrm{U}(1)$, is denoted by $\mathcal{G}$. Its Lie algebra is $i\Omega^0_\Sigma$. The pairing

$$\Omega^0_\Sigma \otimes i\Omega^0_\Sigma \to \mathbb{R}, \quad f \otimes ig \mapsto \int_\Sigma f g\,\omega$$

embeds $\Omega^0_\Sigma$ into the dual of $i\Omega^0_\Sigma$. We consider moment maps for Hamiltonian $\mathcal{G}$-actions as maps into $\Omega^0_\Sigma$. Connections, sections and gauge transformations are by default $C^\infty$, and the spaces are given their $C^\infty$ topologies. We also need $\mathcal{A}^2_1$, the space of $\mathrm{U}(1)$-connections of Sobolev class $L^2_1$ (i.e. differing from a smooth one by an $L^2_1$ form); the space of sections $L^2_1(L)$; and the Sobolev gauge group $\mathcal{G}^2_2 = L^2_2(\Sigma, \mathrm{U}(1))$. Note that a map $\Sigma \to \mathbb{C}$ of class $L^2_2$ is continuous, by the Sobolev embedding theorem, and hence has a pointwise norm.

2.1.1. Action of the gauge group. The conformal structure $j$ induces a Kähler structure on the space of connections $\mathcal{A}(L)$. Its two-form is

$$(a_1, a_2) \mapsto \int_\Sigma ia_1 \wedge ia_2, \quad a_1, a_2 \in i\Omega^1_\Sigma. \tag{2}$$

The complex structure is the Hodge star $a \mapsto *_j a$. The action of the gauge group $\mathcal{G}$ on $\mathcal{A}(L)$ is Hamiltonian, with (equivariant) moment map

$$\mathcal{A}(L) \to \Omega^0_\Sigma; \quad A \mapsto *iF_A. \tag{3}$$

The symplectic form $\omega$ induces a Kähler structure on $\Omega^0_\Sigma(L)$, with two-form

$$(\phi_1, \phi_2) \mapsto \int_\Sigma \mathrm{Im}\langle\phi_1, \phi_2\rangle\,\omega, \quad \phi_1, \phi_2 \in \Omega^0_\Sigma(L) \tag{4}$$

and complex structure $\phi \mapsto i\phi$. The gauge-action on $\Omega^0_\Sigma(L)$ is Hamiltonian with moment map

$$\Omega^0_\Sigma(L) \to \Omega^0_\Sigma; \quad \psi \mapsto \frac{1}{2}|\psi|^2. \tag{5}$$

The manifold $\mathcal{C}(L) := \mathcal{A}(L) \times \Omega^0_\Sigma(L)$ carries the product Kähler structure $\sigma$, which depends on both $j$ and $\omega$. The moment map $m$ for the diagonal $\mathcal{G}$-action is the sum of the moment maps of the factors,

$$m : \mathcal{C}(L) \to \Omega^0_\Sigma, \quad m(A, \psi) = *iF_A + \frac{1}{2}|\psi|^2. \tag{6}$$

The Chern–Weil formula gives some basic information about this moment map:

$$\frac{1}{2\pi}\int_\Sigma \tau\omega \;\begin{cases} < r: & m^{-1}(\tau) = \emptyset;\\ = r: & m(A, \psi) = \tau \Rightarrow \psi \equiv 0;\\ > r: & m(A, \psi) = \tau \Rightarrow \psi \not\equiv 0. \end{cases}$$
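The three cases can be read off by integrating the moment map over the surface. The following is a sketch under two assumptions that are implicit in the cases above rather than stated: the normalisation $\int_\Sigma iF_A = 2\pi r$ for a unitary connection on a bundle of degree $r$, and the convention that $*$ is the Hodge star of the Kähler metric, so that $(*\beta)\,\omega = \beta$ for a two-form $\beta$.

```latex
% Integrate m(A,\psi) = *iF_A + \tfrac12 |\psi|^2 against \omega over \Sigma.
\begin{align*}
  \int_\Sigma m(A,\psi)\,\omega
    &= \int_\Sigma iF_A + \frac{1}{2}\int_\Sigma |\psi|^2\,\omega
     = 2\pi r + \frac{1}{2}\,\|\psi\|_{L^2}^2 .
\intertext{If $m(A,\psi) = \tau$ is constant, the left-hand side equals $\int_\Sigma \tau\omega$, hence}
  \int_\Sigma \tau\omega - 2\pi r
    &= \frac{1}{2}\,\|\psi\|_{L^2}^2 \;\ge\; 0 ,
\end{align*}
% with equality if and only if \psi \equiv 0.
```

So no solution exists when $\frac{1}{2\pi}\int_\Sigma \tau\omega < r$, the section must vanish identically in the case of equality, and it cannot vanish identically when the inequality is strict.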

In fact, $m$ is submersive at $(A, \psi)$ precisely when $\psi \not\equiv 0$, which is also the locus on which the gauge-action is free. When $\int_\Sigma \tau\omega > 2\pi r$, the free gauge-action on $m^{-1}(\tau)$ admits local slices (see below), so the Kähler quotient $m^{-1}(\tau)/\mathcal{G}$ is a Kähler manifold.

2.1.2. The vortex equations. The vortex equations with parameter $\tau$ are the following coupled equations for a pair $(A, \psi) \in \mathcal{C}(L)$:
\[ \bar\partial_A \psi = 0 \quad \text{in } \Omega^{0,1}_\Sigma(L), \tag{7} \]
\[ m(A, \psi) = \tau \quad \text{in } \Omega^0_\Sigma. \tag{8} \]
Individually, we will refer to them as the Cauchy–Riemann equation and the moment map equation. The space of solutions $\widetilde{V}(L, \tau)$ is invariant under $\mathcal{G}$, and the quotient space $V(L, \tau) := \widetilde{V}(L, \tau)/\mathcal{G}$ is called the vortex moduli space. The fundamental results about $V(L, \tau)$ are as follows.

Proposition 2.1. Assume that $\int_\Sigma \tau\omega > 2\pi r$.
(a) The space $V(L, \tau)$ is a finite-dimensional, complex (therefore smooth and Kähler) submanifold of $m^{-1}(\tau)/\mathcal{G}$.
(b) The map $Z : V(L, \tau) \to \mathrm{Sym}^r(\Sigma)$, $[A, \psi] \mapsto \psi^{-1}(0)$, is an isomorphism of complex manifolds.


The unitary connection $A$ induces a holomorphic structure on $L$: a local section is holomorphic if and only if it lies in $\ker \bar\partial_A$. By means of the holomorphic structure, one attaches multiplicities to points of $\psi^{-1}(0)$, so that $\psi$ has $r$ zeros in all. This makes sense of $Z$. We write $L_A$ for $L$ with this holomorphic structure.

Item (a) is proved by an elliptic regularity argument, and we shall say a little more about it. As for (b), the statement that $Z$ is bijective is an existence and uniqueness theorem for solutions to the vortex equations. This is the heart of the theorem, and various proofs are known, see e.g. Jaffe and Taubes [6], García-Prada [5]. The 'degenerate' case, where $\int_\Sigma \tau\omega = 2\pi r$, is also interesting:

Addendum 2.2. When $\int_\Sigma \tau\omega = 2\pi r$, the moduli space $V(L, \tau) = \{(A, 0) : iF_A = \tau\omega\}/\mathcal{G}$ is a finite-dimensional, complex (therefore smooth and Kähler) submanifold of $m^{-1}(\tau)/\mathcal{G}$. The map $V(L, \tau) \to \mathrm{Pic}^L(\Sigma)$; $[A, 0] \mapsto L_A$ is an isomorphism of complex manifolds. Here $\mathrm{Pic}^L(\Sigma)$ is the Picard torus of holomorphic structures on $L$.

2.1.3. Smoothness of the moduli space. This is a standard application of elliptic theory. We run through it briefly in preparation for the family version considered later; see [13] for some more details. The tangent space to the affine space $\mathcal{C}^2_1$ is the space of pairs $(a, \phi)$, where $a$ is an imaginary one-form, $\phi$ a section, both of class $L^2_1$. One obtains local slices for the action of $\mathcal{G}$ by imposing the Coulomb gauge condition
\[ d^*(ia) + \mathrm{Im}\langle \psi, \phi \rangle = 0, \tag{9} \]
which says that $(a, \phi)$ is orthogonal to the gauge-orbit of $(A, \psi)$. Note that the left-hand side is gauge-equivariant. The linearisations of the two vortex equations at the solution $(A, \psi)$ are
\[ \bar\partial_A \phi + a^{0,1}\psi = 0, \qquad *ida + \mathrm{Re}\langle \psi, \phi \rangle = 0. \tag{10} \]
The second of these and (9) are real and imaginary parts of the single equation
\[ \bar\partial^*(a^{0,1}) - \tfrac{1}{2}\langle \psi, \phi \rangle = 0. \tag{11} \]
Hence the space of solutions to Eqs. (10, 9) is the kernel of the $\mathbb{C}$-linear differential operator
\[ D_{(A,\psi)} : (a, \phi) \mapsto \big(\bar\partial_A \phi + a^{0,1}\psi,\; \bar\partial^*(a^{0,1}) - \tfrac{1}{2}\langle \psi, \phi \rangle\big). \tag{12} \]
Now, $D_{(A,\psi)}$ is a compact perturbation of the Fredholm operator $(a, \phi) \mapsto (\bar\partial_A \phi, \bar\partial^*(a^{0,1}))$, which has index $(r + 1 - g) - (1 - g) = r$ (over $\mathbb{C}$). Hence $D_{(A,\psi)}$ is also Fredholm of index $r$. It is surjective (this can be seen by computing $D_{(A,\psi)} D_{(A,\psi)}^*$, see [13]), so its kernel has constant rank $r$. From this point it is straightforward to check, using the implicit function theorem, that $V(L, \tau)$ is a differentiable submanifold of $m^{-1}(\tau)/\mathcal{G}$. Since its tangent spaces $\ker(D_{A,\psi})$ are complex linear, it is a complex submanifold.
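The index count used above can be unpacked as follows. This is a sketch that assumes only the Riemann–Roch theorem on a closed surface of genus $g$; the symbols $h^0$, $h^1$ (dimensions of Dolbeault cohomology) and $\mathcal{O}$ (the trivial holomorphic line bundle) are notation introduced here for illustration.

```latex
% Index of the Cauchy-Riemann operator on the degree-r bundle L_A (Riemann-Roch):
\[
\operatorname{ind}_{\mathbb{C}}\big(\bar\partial_A : \Omega^0_\Sigma(L) \to \Omega^{0,1}_\Sigma(L)\big)
  = h^0(L_A) - h^1(L_A) = r + 1 - g .
\]
% The adjoint of \bar\partial on functions has the opposite index:
\[
\operatorname{ind}_{\mathbb{C}}\big(\bar\partial^* : \Omega^{0,1}_\Sigma \to \Omega^0_\Sigma\big)
  = -\big(h^0(\mathcal{O}) - h^1(\mathcal{O})\big) = -(1 - g) .
\]
% The index of a direct sum is the sum of the indices:
\[
(r + 1 - g) - (1 - g) = r .
\]
```

Since a compact perturbation does not change the Fredholm index, the same value $r$ holds for $D_{(A,\psi)}$, which is the complex dimension of $\mathrm{Sym}^r(\Sigma)$.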


2.2. The Kähler class on the vortex moduli space. As we have seen, the moduli space $V(L, \tau)$ is a complex manifold equipped with a canonical Kähler form $\sigma_\tau$. We write $\sigma_\tau$ also for its pullback by $Z^{-1}$, a Kähler form on $\mathrm{Sym}^r(\Sigma)$. The target of this section is to determine its cohomology class.

A $(2 - p)$-cycle $\zeta$ in $\Sigma$ gives rise to a closed subset $\delta_\zeta \subset \mathrm{Sym}^r(\Sigma)$ representing a $(2r - p)$-cycle: $\delta_\zeta$ consists of divisors $D \in \mathrm{Sym}^r(\Sigma)$ such that $\mathrm{mult}_x(D) = \mathrm{mult}_x(\zeta)$ for all $x \in \Sigma$. Using this map followed by Poincaré duality on $\mathrm{Sym}^r(\Sigma)$, we obtain a map $\nu_p : H_p(\Sigma; \mathbb{Z}) \to H^{2-p}(\mathrm{Sym}^r(\Sigma); \mathbb{Z})$. It is well-known that $\nu_1$ is an isomorphism. When $p = 2$, an isomorphism
\[ H_0(\Sigma; \mathbb{Z}) \oplus \Lambda^2 H_1(\Sigma; \mathbb{Z}) \xrightarrow{\;\cong\;} H^2(\mathrm{Sym}^r(\Sigma); \mathbb{Z}) \]
is given by $(a, b \wedge c) \mapsto \nu_0(a) + \nu_1(b) \cup \nu_1(c)$. We define
• $\eta \in H^2(\mathrm{Sym}^r(\Sigma); \mathbb{Z})$ to be the class corresponding to the point class in $H_0(\Sigma; \mathbb{Z})$;
• $\theta \in H^2(\mathrm{Sym}^r(\Sigma); \mathbb{Z})$ to be the class corresponding to the cup-product form on $H^1(\Sigma; \mathbb{Z})$ (here we think of the cup-product form as an element of $\mathrm{Hom}(\Lambda^2 H^1(\Sigma; \mathbb{Z}), \mathbb{Z}) = \Lambda^2 H_1(\Sigma; \mathbb{Z})$).

Often we conflate these integral classes with their images in real cohomology.

Theorem 3. The equation
\[ \frac{1}{2\pi}[\sigma_\tau] = \Big(\int_\Sigma \tau\omega\Big)\eta + 2\pi(\theta - r\eta) \]

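holds in $H^2(\mathrm{Sym}^r(\Sigma); \mathbb{R})$.

As already mentioned, this formula was found by Manton–Nasir [10]. Our (quite different) method of proof is to exhibit connections on two line bundles over the orbit space of irreducible pairs, $\mathcal{C}^*/\mathcal{G}$. The Chern classes of these line bundles restrict to $\eta$ and $\theta - r\eta$ on $V(L, \tau)$, while the appropriate linear combination of their curvature forms restricts exactly to the form $\sigma_\tau$.

2.2.1. Cohomology of the orbit space. We write $\mathcal{C}^* = \mathcal{C}^*(L)$ for the space of pairs $(A, \psi) \in \mathcal{C}(L)$ with $\psi$ not identically zero, $\mathcal{B}^*$ for the orbit space $\mathcal{C}^*/\mathcal{G}$, and $i : V(L, \tau) \to \mathcal{B}^*$ for the inclusion.

Lemma 2.3. $i$ induces a surjection on cohomology, and an isomorphism on $H^{\leq 2}$.

Proof. Using the cohomology slant product operation, define
\[ \mu_{\mathcal{B}} : H_*(\Sigma; \mathbb{Z}) \to H^{2-*}(\mathcal{B}^*; \mathbb{Z}), \quad h \mapsto c_1(\mathcal{L}_{\mathcal{B}})/h, \]
\[ \mu_{\mathrm{Sym}} : H_*(\Sigma; \mathbb{Z}) \to H^{2-*}(\mathrm{Sym}^r(\Sigma); \mathbb{Z}), \quad h \mapsto c_1(\mathcal{L}_{\mathrm{Sym}})/h. \]
Here the line bundle $\mathcal{L}_{\mathcal{B}} \to \mathcal{B}^* \times \Sigma$ is $\mathcal{L}_{\mathcal{B}} = \widetilde{\mathcal{L}}_{\mathcal{B}}/\mathcal{G}$, where the equivariant line bundle $\widetilde{\mathcal{L}}_{\mathcal{B}} \to \mathcal{C}^* \times \Sigma$ is the pullback of $L \to \Sigma$; and $\mathcal{L}_{\mathrm{Sym}} \to \mathrm{Sym}^r(\Sigma) \times \Sigma$ is the topological line bundle corresponding to the universal divisor $\Delta^{\mathrm{univ}} \subset \mathrm{Sym}^r(\Sigma) \times \Sigma$. These maps extend uniquely to ring homomorphisms
\[ \Lambda^* H_1(\Sigma; \mathbb{Z}) \otimes_{\mathbb{Z}} \mathbb{Z}[H_0(\Sigma)] \to H^*(\mathcal{B}^*; \mathbb{Z}), \]
\[ \Lambda^* H_1(\Sigma; \mathbb{Z}) \otimes_{\mathbb{Z}} \mathbb{Z}[H_0(\Sigma)] \to H^*(\mathrm{Sym}^r(\Sigma); \mathbb{Z}), \]
since the ring on the left is freely generated by $H_0(\Sigma; \mathbb{Z}) \oplus H_1(\Sigma; \mathbb{Z})$. These are homomorphisms of graded rings where the grading on the left is characterised by the property that $H_i(\Sigma; \mathbb{Z})$ has degree $2 - i$. The first of these two maps is an isomorphism [1, pp. 539–545]. The second is surjective, since the image of $\mu_{\mathrm{Sym}}$ contains $H^1(\mathrm{Sym}^r(\Sigma); \mathbb{Z})$ and the class $\eta$, and these generate the cohomology ring. To prove the lemma it suffices to show that $i^* \circ \mu_{\mathcal{B}} = Z^* \circ \mu_{\mathrm{Sym}}$. This follows from the fact that $(i \times 1)^* \mathcal{L}_{\mathcal{B}}$ is isomorphic to $(Z \times 1)^* \mathcal{L}_{\mathrm{Sym}}$. To see that these bundles are isomorphic, observe that the former has a tautological section which vanishes precisely along $(Z \times 1)^{-1}(\Delta^{\mathrm{univ}})$.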

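It is convenient to have some notation to hand for integral (co)homology classes on $\Sigma$. Let $e_0 \in H_0(\Sigma)$ be the class of a point, $e_2 \in H_2(\Sigma)$ the orientation class. Let $e^0 \in H^0(\Sigma)$, $e^2 \in H^2(\Sigma)$ be their duals. Let $\{\alpha_i, \beta_j\}_{1 \leq i, j \leq g}$ be a symplectic basis for $H_1(\Sigma)$, and $\{\alpha^i, \beta^j\}_{1 \leq i, j \leq g}$ the dual basis for $H^1(\Sigma)$. Now put
\[ \tilde\eta = \mu_{\mathcal{B}}(e_0), \qquad \tilde\theta = \sum_{i=1}^{g} \mu_{\mathcal{B}}(\alpha_i) \cup \mu_{\mathcal{B}}(\beta_i). \tag{13} \]

Lemma 2.4. $c_1(\mathcal{L}_{\mathcal{B}})^2/e_2 = 2r\tilde\eta - 2\tilde\theta$ in $H^2(\mathcal{B}^*; \mathbb{Z})$.

Proof. The group $H^2(\mathcal{B}^* \times \Sigma; \mathbb{Z})$ is the direct sum of its Künneth components $H^0(\mathcal{B}^*; \mathbb{Z}) \otimes H^2(\Sigma; \mathbb{Z})$, $H^1(\mathcal{B}^*; \mathbb{Z}) \otimes H^1(\Sigma; \mathbb{Z})$ and $H^2(\mathcal{B}^*; \mathbb{Z}) \otimes H^0(\Sigma; \mathbb{Z})$. The Chern class $c_1(\mathcal{L}_{\mathcal{B}})$ is tautologically the sum of
\[ \mu_{\mathcal{B}}(e_2) \otimes e^2 \in H^0(\mathcal{B}^*; \mathbb{Z}) \otimes H^2(\Sigma; \mathbb{Z}), \]
\[ \sum_{i=1}^{g} \big(\mu_{\mathcal{B}}(\alpha_i) \otimes \alpha^i + \mu_{\mathcal{B}}(\beta_i) \otimes \beta^i\big) \in H^1(\mathcal{B}^*; \mathbb{Z}) \otimes H^1(\Sigma; \mathbb{Z}), \]
\[ \mu_{\mathcal{B}}(e_0) \otimes e^0 \in H^2(\mathcal{B}^*; \mathbb{Z}) \otimes H^0(\Sigma; \mathbb{Z}). \]
Let us call these terms $A$, $B$ and $C$ respectively. Note that $A = r.1 \otimes e^2$ (by definition of $\mathcal{L}_{\mathcal{B}}$) and $C = \tilde\eta \otimes e^0$. The Künneth isomorphism is compatible with cup products, providing that one uses the graded tensor product of graded rings. Thus $A \cup C = r\tilde\eta \otimes e^2 = C \cup A$, and
\[ B^2 = \sum_{i=1}^{g} \big(\mu_{\mathcal{B}}(\alpha_i) \cup \mu_{\mathcal{B}}(\beta_i) - \mu_{\mathcal{B}}(\beta_i) \cup \mu_{\mathcal{B}}(\alpha_i)\big) \otimes e^2 = -2\tilde\theta \otimes e^2. \]
Hence $c_1(\mathcal{L}_{\mathcal{B}})^2/e_2 = 2(r\tilde\eta - \tilde\theta)$.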
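The bookkeeping behind the final step can be spelled out. The sketch below takes the values of $A \cup C$ and $B^2$ from the proof as given, and uses only two facts: the slant product with $e_2$ extracts the component in $H^2(\mathcal{B}^*) \otimes H^2(\Sigma)$, and classes of degree greater than 2 on $\Sigma$ vanish.

```latex
\[
c_1(\mathcal{L}_{\mathcal{B}})^2 = (A + B + C)^2
  = A^2 + B^2 + C^2 + (A \cup C + C \cup A) + (A \cup B + B \cup A) + (B \cup C + C \cup B).
\]
% Degree in the \Sigma-factor of each term:
%   A^2 : 4 (vanishes),   A \cup B : 3 (vanishes),
%   B^2 : 2,              A \cup C : 2,
%   B \cup C : 1,         C^2 : 0   (both killed by slant product with e_2).
\[
c_1(\mathcal{L}_{\mathcal{B}})^2 / e_2
  = \big(B^2 + 2\,A \cup C\big)/e_2
  = -2\tilde\theta + 2r\tilde\eta .
\]
```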


2.2.2. A connection on $\mathcal{L}_{\mathcal{B}}$. We now write down a canonical connection $\nabla$ on $\mathcal{L}_{\mathcal{B}}$, and compute its curvature. This calculation is modelled on that of Donaldson and Kronheimer [3, p. 195]. We will use the curvature form, together with its wedge-square, to construct a closed two-form on $\mathcal{B}^*$, representing a known cohomology class, whose restriction to $V(L, \tau)$ is $\sigma_\tau$. The connection $\nabla$ is concocted from two ingredients:
• a certain unitary, $\mathcal{G}$-invariant connection $\widetilde\nabla$ on the line bundle $\mathrm{pr}_2^* L \to \mathcal{C}^* \times \Sigma$;
• a certain connection $\Gamma$ on the principal $\mathcal{G}$-bundle $\mathcal{C}^* \to \mathcal{B}^*$, pulled back to $\mathcal{B}^* \times \Sigma$.

As explained in [3], such data determine a connection $\nabla$ on the quotient line bundle $\mathcal{L}_{\mathcal{B}} \to \mathcal{B}^* \times \Sigma$, characterised by the condition $(\nabla_v s)\hat{} = \widetilde\nabla_{\hat v}(\hat s)$ for local sections $s$ and vector fields $v$, where $\hat\cdot$ denotes $\Gamma$-horizontal lifting.

The connection $\widetilde\nabla$ is trivial in the $\mathcal{C}^*$-directions and tautological in the $\Sigma$-directions. To amplify: a section of $\mathrm{pr}_2^* L$ is a map $s : \mathcal{C}^* \times \Sigma \to L$ with $s(A, \psi, x) \in L_x$, and at the point $(A, \psi, x)$,
\[ \widetilde\nabla_{(a,\phi,v)} s = d_{A,v}\big(s|_{\{(A, \psi)\} \times \Sigma}\big)(x) + \frac{d}{dt}\Big|_{t=0} s(A + ta, \psi + t\phi, x). \tag{14} \]
The curvature of $\widetilde\nabla$ is given by
\[ \begin{aligned} F_{\widetilde\nabla}((0, 0, u), (0, 0, v)) &= F_A(u, v), \\ F_{\widetilde\nabla}((a, \phi, 0), (0, 0, v)) &= \langle a, v \rangle, \\ F_{\widetilde\nabla}((a, \phi, 0), (a', \phi', 0)) &= 0. \end{aligned} \tag{15} \]
We can obtain a connection on $\mathcal{C}^* \to \mathcal{B}^*$ from our gauge-fixing condition: the horizontal space over $[A, \psi]$ is the kernel of the linear operator $(a, \phi) \mapsto d^*(ia) - \mathrm{Im}\langle \psi, \phi \rangle$. To write down the connection one-form $\Gamma$, we need the Green's operator $G_\psi$ associated to the Laplacian
\[ \Delta_\psi = d^*d + |\psi|^2 : \Omega^0_\Sigma \to \Omega^0_\Sigma. \]
$\Delta_\psi$ is surjective (since $d^*d$ maps onto the functions of mean-value zero), inducing an isomorphism of $\ker(\Delta_\psi)^\perp$ with $\Omega^0_\Sigma$; its inverse is $G_\psi$.

Lemma 2.5. The connection one-form is given by
\[ \Gamma_{(A,\psi)}(a, \phi) = iG_\psi\big(d^*ia - \mathrm{Im}\langle \psi, \phi \rangle\big) \in i\Omega^0_\Sigma. \]

Proof. This form has the correct kernel, so to justify the assertion one simply observes that it is invariant under $\mathcal{G}$:
\[ \Gamma_{(A,\psi,x)}(-df, f\psi, 0) = f, \qquad f \in i\Omega^0_\Sigma. \]


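In accordance with the general pattern explained in [3], the curvature of the quotient connection $\nabla$ on $\mathcal{L}_{\mathcal{B}} \to \mathcal{B}^* \times \Sigma$ is given by
\[ \begin{aligned} F_\nabla((0, 0, u), (0, 0, v)) &= F_A(u, v), \\ F_\nabla((a, \phi, 0), (0, 0, v)) &= \langle a, v \rangle, \\ F_\nabla((a_1, \phi_1, 0), (a_2, \phi_2, 0)) &= 2iG_\psi\big(d^*ib - \mathrm{Im}\langle \psi, \chi \rangle\big). \end{aligned} \tag{16} \]
Here $(a_1, \phi_1)$ and $(a_2, \phi_2)$ are vector fields on $\mathcal{B}^*$ which are horizontal with respect to $\Gamma$; their Lie bracket is $(b, \chi)$.

Lemma 2.6. Suppose $(a_1, \phi_1)$ and $(a_2, \phi_2)$ are horizontal. Then $d^*(ib) - \mathrm{Im}\langle \psi, \chi \rangle = -\mathrm{Im}\langle \phi_1, \phi_2 \rangle$.

Proof. Denote the pair $(A + ta_1, \psi + t\phi_1)$ by $c_t$. Then, at $(A, \psi)$,
\[ b = \frac{1}{t}\big(a_2(c_0) - a_2(c_t)\big) + o(t), \qquad \chi = \frac{1}{t}\big(\phi_2(c_0) - \phi_2(c_t)\big) + o(t) \]
as $t \to 0$. But at $c_t$, $d^*ia_2 = \mathrm{Im}\langle \psi + t\phi_1, \phi_2(c_t) \rangle$, and from this one obtains
\[ d^*(ib) - \mathrm{Im}\langle \psi, \chi \rangle = -\lim_{t \to 0} \mathrm{Im}\langle \phi_1(c_0), \phi_2(c_t) \rangle = -\mathrm{Im}\langle \phi_1(c_0), \phi_2(c_0) \rangle. \]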

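2.2.3. Two-forms as curvature integrals. We are now in a position to write down closed two-forms representing $c_1(\mathcal{L}_{\mathcal{B}})/e_0$ and $c_1(\mathcal{L}_{\mathcal{B}})^2/e_2$ in de Rham cohomology.

Note. In this paragraph we insist that the tangent vectors $(a_j, \phi_j)$ are horizontal.

The class $c_1(\mathcal{L}_{\mathcal{B}})$ has the Chern–Weil representative $iF_\nabla/2\pi$, so
\[ c_1(\mathcal{L}_{\mathcal{B}})/e_0 = \frac{1}{2\pi}\int_\Sigma iF_\nabla \wedge \omega_0, \quad \text{where } \int_\Sigma \omega_0 = 1. \tag{17} \]
Explicitly, this representative for $c_1(\mathcal{L}_{\mathcal{B}})/e_0$ is the two-form
\[ ((a_1, \phi_1), (a_2, \phi_2)) \mapsto \frac{1}{\pi[\omega]}\int_\Sigma G_\psi\big(\mathrm{Im}\langle \phi_1, \phi_2 \rangle\big)\,\omega. \tag{18} \]
Similarly,
\[ c_1(\mathcal{L}_{\mathcal{B}})^2/e_2 = \frac{1}{4\pi^2}\int_\Sigma iF_\nabla \wedge iF_\nabla. \tag{19} \]
This integral involves the product of the first and third curvature terms, and the square of the second term. So $c_1(\mathcal{L}_{\mathcal{B}})^2/e_2$ has the representative
\[ ((a_1, \phi_1), (a_2, \phi_2)) \mapsto \frac{1}{\pi^2}\int_\Sigma G_\psi\big(\mathrm{Im}\langle \phi_1, \phi_2 \rangle\big)\,iF_A - \frac{1}{2\pi^2}\int_\Sigma ia_1 \wedge ia_2. \tag{20} \]
Notice the appearance of an expression familiar from (2) as the second term.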


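At this point we impose the moment map equation, restricting these forms and classes to the locus where $m(A, \psi) = \tau$. On that locus, the class
\[ 4\pi^2(\tilde\theta - r\tilde\eta) + 2\pi[\tau\omega]\tilde\eta = 2\pi\big(-\pi c_1(\mathcal{L}_{\mathcal{B}})^2/e_2 + [\tau\omega]\,c_1(\mathcal{L}_{\mathcal{B}})/e_0\big) \tag{21} \]
is represented by the form
\[ \begin{aligned} \int_\Sigma ia_1 \wedge ia_2 + 2\int_\Sigma G_\psi\big(\mathrm{Im}\langle \phi_1, \phi_2 \rangle\big)(\tau\omega - iF_A) &= \int_\Sigma ia_1 \wedge ia_2 + \int_\Sigma G_\psi\big(\mathrm{Im}\langle \phi_1, \phi_2 \rangle\big)|\psi|^2\omega \\ &= \int_\Sigma ia_1 \wedge ia_2 + \int_\Sigma \mathrm{Im}\langle \phi_1, \phi_2 \rangle\,\omega \\ &= \sigma((a_1, \phi_1), (a_2, \phi_2)). \end{aligned} \tag{22} \]
(Recall that $\sigma$ is our standard Kähler form on $\mathcal{C}^*$.) The penultimate equality uses the observation that, because the Laplacian of a function $f$ has mean value zero,
\[ \int_\Sigma f\,\omega = \int_\Sigma \Delta_\psi G_\psi(f)\,\omega = \int_\Sigma |\psi|^2 G_\psi(f)\,\omega. \]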
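The mean-value identity can be checked in one line. The sketch below assumes only the definition $\Delta_\psi = d^*d + |\psi|^2$ and the $L^2$ adjointness of $d$ and $d^*$ on the closed surface $\Sigma$; the symbol $h$ is a name introduced here for $G_\psi(f)$.

```latex
% Put h = G_\psi(f), so that \Delta_\psi h = f.
\[
\int_\Sigma f\,\omega
  = \int_\Sigma (d^*d h)\,\omega + \int_\Sigma |\psi|^2 h\,\omega
  = \langle dh, d1 \rangle_{L^2} + \int_\Sigma |\psi|^2 h\,\omega
  = \int_\Sigma |\psi|^2\, G_\psi(f)\,\omega ,
\]
% since d1 = 0 for the constant function 1.
```

Applied with $f = \mathrm{Im}\langle \phi_1, \phi_2 \rangle$, this converts the $|\psi|^2$-weighted integral of $G_\psi(f)$ into the plain integral of $f$.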

Proof of Theorem 3. What we have just found is that the class $2\pi([\tau\omega]\tilde\eta + 2\pi(\tilde\theta - r\tilde\eta))$ on $\mathcal{B}^*$, restricted to $m^{-1}(\tau)/\mathcal{G}$, is equal to $[\sigma_\tau]$. Restricting further to the vortex moduli space, we find that the class of our preferred Kähler form is
\[ 2\pi\big([\tau\omega]\tilde\eta + 2\pi(\tilde\theta - r\tilde\eta)\big)\big|_{V(L, \tau)} \in H^2(V(L, \tau); \mathbb{R}). \]
Hence, pulling back by $Z$, we find that the class of our Kähler form on $\mathrm{Sym}^r(\Sigma)$ is $2\pi([\tau\omega]\eta + 2\pi(\theta - r\eta))$, which is the formula we have been working towards.

2.3. The Duistermaat–Heckman formula. The Duistermaat–Heckman formula [2] for the variation of cohomology of symplectic quotients gives another proof that the cohomology class $[\sigma_\tau]$ varies linearly with $\tau$ (provided that $\tau$ is a constant function) and computes the slope.

Suppose that one has a Hamiltonian $S^1$-action on $(M, \omega)$, with moment map $\mu : M \to \mathfrak{t}^*$. Here $\mathfrak{t} = \mathrm{Lie}(S^1)$. Identify $\mathfrak{t}^*$ with $\mathbb{R}$ so that the lattice dual to $\exp^{-1}(1) \subset \mathfrak{t}$ corresponds to $\mathbb{Z} \subset \mathbb{R}$. Suppose that $\mu$ is proper, and that its restriction to $\mu^{-1}(I)$ is submersive, for some open interval $I \subset \mathbb{R}$. The family of symplectic quotients $(M_t, \omega_t)_{t \in I}$ is then a trivial fibre bundle, and a trivialisation gives an identification of the cohomology of $M_t$ with that of a fixed fibre $M_s$. The identification is canonical, hence $\{[\omega_t]\}_{t \in I}$ can be considered as a family of classes on $M_s$. Suppose that $S^1$ acts freely on $\mu^{-1}(s)$, so that $\mu^{-1}(s) \to M_s$ is a principal circle-bundle, with Chern class $c \in H^2(M_s; \mathbb{R})$. The Duistermaat–Heckman formula says that
\[ \frac{d}{dt}[\omega_t] = 2\pi c. \tag{23} \]

We apply this with $M = \bigcup_{\tau \in \mathbb{R}} \widetilde{V}(L, \tau)/\mathcal{G}_0$, where $\mathcal{G}_0$ is the based gauge group $\{u : \Sigma \to U(1) : u(x) = 1\}$, $x \in \Sigma$ an arbitrary basepoint, and $\tau \in \mathbb{R}$ stands for a constant function on $\Sigma$. The circle acts by constant gauge transformations. We take $I = (2\pi r/\int_\Sigma \omega, \infty)$; the Chern class $c$ of $\widetilde{V}(L, \tau)/\mathcal{G}_0 \to \widetilde{V}(L, \tau)/\mathcal{G}$ is $\eta$. Formula (23) gives
\[ \frac{d}{d\tau}[\sigma_\tau] = 2\pi\Big(\int_\Sigma \omega\Big)\eta, \]
which is consistent with our result. One can formally recover the constant term $4\pi^2(\theta - r\eta)$ by specialising to the degenerate parameter $\tau = 2\pi r/\int_\Sigma \omega$ (for which the formula $[\sigma_\tau] = 4\pi^2\theta$ is easily verified); however, justifying this formal manipulation would need further thought. Since Duistermaat and Heckman's proof identifies the variation in the symplectic forms with the curvature of a connection on $\mu^{-1}(s) \to M_s$, the two methods are perhaps not so different as they appear.

3. Families of Vortex Moduli Spaces

3.1. Construction of the vortex fibration. (a) Suppose that $X \to S$ is a smooth fibre bundle, where $X$ and $S$ are connected and oriented, and that the typical fibre is a compact surface $\Sigma$. Let $L \to X$ be a principal U(1)-bundle, and assume that $L|X_s \to X_s$ has degree $r > 0$. Consider $X \to S$ as a fibration with structure group $\mathrm{Diff}^+(\Sigma)$. Putting $P = L|X_s$, we can consider the composite map $L \to X \to S$ as a fibration with typical fibre $P$ and structure group $\mathrm{Diff}^+_P(\Sigma)$. The latter is the group of pairs $(\tilde g, g)$, where $\tilde g \in \mathrm{Aut}(P)$ is an automorphism covering $g \in \mathrm{Diff}^+(\Sigma)$, so it is an extension of $\mathrm{Diff}^+(\Sigma)$ by the gauge group.

There are natural left actions of $\mathrm{Diff}^+_P(\Sigma)$ on the space of connections $\mathcal{A}(P)$ and on the space of sections $\Omega^0_\Sigma(P)$. These arise through the covariance of connections and of sections; representing a connection by its one-form $A \in \Omega^1_P$, we have
\[ \tilde g.A = (\tilde g^{-1})^* A; \qquad \tilde g.\psi = \tilde g \circ \psi \circ g^{-1}. \]
One can then form the associated fibrations
\[ L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal{A}(P) \to S, \qquad L \times_{\mathrm{Diff}^+_P(\Sigma)} \Omega^0_\Sigma(P) \to S, \]
with structure group $\mathrm{Diff}^+_P(\Sigma)$. These may be thought of as the bundles of connections (resp. sections) along the fibres of $X \to S$:
\[ L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal{A}(P) \cong \{(s, A) : s \in S,\ A \in \mathcal{A}_{X_s}(L_s)\}, \]
\[ L \times_{\mathrm{Diff}^+_P(\Sigma)} \Omega^0_\Sigma(P) \cong \{(s, \psi) : s \in S,\ \psi \in \Omega^0_{X_s}(L_s)\}. \]
The first of these has the special property that it is a symplectic fibration: its structure group is reduced to the symplectic automorphism group of $\mathcal{A}(P)$. Other fibrations can be derived from these basic ones. The space $\mathcal{C}(P) = \mathcal{A}(P) \times \Omega^0_\Sigma(P \times_{U(1)} \mathbb{C})$, comprising pairs $(A, \psi)$, where $\psi$ is a section of the line bundle associated with $P$, is also a $\mathrm{Diff}^+_P(\Sigma)$-space (the action is the diagonal one), and so is $\mathcal{B}(P) = \mathcal{C}(P)/\mathcal{G}$,


because $\mathcal{G}$ acts on $\mathcal{C}(P)$ as a subgroup of $\mathrm{Diff}^+_P(\Sigma)$. The associated fibrations are
\[ \mathcal{C}_{X/S}(L) := L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal{C}(P), \qquad \mathcal{B}_{X/S}(L) := L \times_{\mathrm{Diff}^+_P(\Sigma)} \mathcal{B}(P). \]

(b) Suppose now that $X \to S$ is itself a symplectic fibration, i.e. that its structure group is reduced to $\mathrm{Aut}(\Sigma, \omega)$ for some area form $\omega$. Then the structure group of $L \to S$ is reduced to $\mathrm{Aut}_P(\Sigma, \omega)$, the group of pairs $(\tilde f, f)$ with $f^*\omega = \omega$, and $\mathcal{C}_{X/S}(L) \to S$ is again a symplectic fibration. Note that $P \times_{U(1)} \mathbb{C}$ is a hermitian line bundle, so our formula for the symplectic form on $\mathcal{C}(P)$ makes sense. Let $\{j_s \in \mathcal{J}(X_s, \omega_s)\}_{s \in S}$ be a smooth family of complex structures, compatible with the symplectic forms. The moment map $m : \mathcal{C}(P) \to \Omega^0_\Sigma$, $(A, \psi) \mapsto *iF_A + |\psi|^2/2$, generalises to a bundle map over $S$,
\[ m_{X/S} : \mathcal{C}_{X/S}(L) \to L \times_{\mathrm{Diff}^+_P(\Sigma)} \Omega^0_\Sigma. \]
We now take $\tau$ to be a constant. Then we have a sub-bundle $m^{-1}_{X/S}(\tau) \subset \mathcal{C}_{X/S}(L)$, projecting to a sub-bundle $\pi(m^{-1}_{X/S}(\tau)) \subset \mathcal{B}_{X/S}(L)$ under the quotient map $\pi : \mathcal{C}_{X/S}(L) \to \mathcal{B}_{X/S}(L)$, and $\pi(m^{-1}_{X/S}(\tau)) \to S$ has structure group $\mathrm{Aut}_P(\Sigma, \omega)$.

(c) We now impose a fibred version of the Cauchy–Riemann equation. This differs from what we have done so far in that it cannot be expressed in terms of associated bundles. The total space of the vortex fibration $V_{X/S}(L, \tau) \to S$ is the space of triples $[s, A, \psi] \in \pi(m^{-1}_{X/S}(\tau))$ satisfying $\bar\partial_{j_s, A}\psi = 0$. It maps to $S$ in the obvious way. The fibre over $s$ can be identified with the vortex moduli space $V_{X_s}(L|X_s, \tau)$, and so with $\mathrm{Sym}^r(X_s)$.

Lemma 3.1. The space $V_{X/S}(L, \tau)$ has a structure of smooth manifold which makes the projection $p : V_{X/S}(L, \tau) \to S$ a smooth submersion, hence a fibre bundle.

Proof. The linearisation of the defining equations for $V_{X/S}(L, \tau)$, and the fibrewise gauge-fixing condition, define an $\mathbb{R}$-linear operator $D_{(s,A,\psi)}$:
\[ D_{(s,A,\psi)}(v, a, \phi) = D_{A,\psi}(a, \phi) + P(v), \qquad v \in T_s S. \tag{24} \]
Here $P$ is the 0th-order operator $P(v) = \tfrac{1}{2} i (d_A\psi) \circ \frac{\partial j}{\partial v}$. The operator $D_{(s,A,\psi)}$ is thus Fredholm, of real index $2r + \dim(S)$, and surjective (since $D_{(A,\psi)}$ is). The kernel of $D_{(s,A,\psi)}$ is the putative tangent space to $V_{X/S}(L, \tau)$ at $(s, A, \psi)$, and the projection $\pi : \ker D_{(s,A,\psi)} \to T_s S$ is putatively the derivative of $p$. Note that $\pi$ is surjective, because its kernel is exactly $\ker D_{A,\psi}$, which we know has dimension $2r$. Now the standard elliptic theory which we sketched above gives smoothness of the vortex fibration and of the map $p$.


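3.2. Line bundles and cohomology operations. Let $\widetilde{\mathcal{L}}_{\mathcal{B}} \to \mathcal{C}^*_{X/S} \times_S X$ be the pullback of the line bundle $L \to X$. It is an equivariant line bundle under the fibrewise gauge-action, and so descends to a line bundle
\[ \mathcal{L}_{\mathcal{B}} \to \mathcal{B}^*_{X/S} \times_S X. \]
The universal divisor $\Delta^{\mathrm{univ}} \subset \mathrm{Sym}^r_S(X) \times_S X$ corresponds to a unique line bundle
\[ \mathcal{L}_{\mathrm{Sym}} \to \mathrm{Sym}^r_S(X) \times_S X. \]

Lemma 3.2. There is a natural isomorphism $(i \times 1)^* \mathcal{L}_{\mathcal{B}} \to (Z \times 1)^* \mathcal{L}_{\mathrm{Sym}}$, where $i$ is the inclusion of $V_{X/S}(L, \tau)$ in $\mathcal{B}^*_{X/S}$, and $Z$ the natural isomorphism of $V_{X/S}(L, \tau)$ with $\mathrm{Sym}^r_S X$.

Proof. The section $([A, \psi], x) \mapsto [\psi(x)]$ of $(i \times 1)^* \mathcal{L}_{\mathcal{B}}$ vanishes precisely along $(Z \times 1)^{-1}(\Delta^{\mathrm{univ}})$.

Using these two line bundles one can construct operations
\[ H^*(X) \to H^{*+2k-2}(\mathcal{B}^*_{X/S}(L)), \quad c \mapsto \tilde c^{[k]}, \]
\[ H^*(X) \to H^{*+2k-2}(\mathrm{Sym}^r_S(X)), \quad c \mapsto c^{[k]}, \]
defined for arbitrary coefficient rings. The second of these was discussed earlier (Eq. 1). Introduce the projections
\[ \mathcal{B}^*_{X/S} \xleftarrow{\;p_1\;} \mathcal{B}^*_{X/S} \times_S X \xrightarrow{\;p_2\;} X, \]
\[ \mathrm{Sym}^r_S(X) \xleftarrow{\;p_1\;} \mathrm{Sym}^r_S(X) \times_S X \xrightarrow{\;p_2\;} X, \]
and set
\[ \tilde c^{[k]} = p_{1!}\big(c_1(\mathcal{L}_{\mathcal{B}})^k \cup p_2^* c\big), \qquad c^{[k]} = p_{1!}\big(c_1(\mathcal{L}_{\mathrm{Sym}})^k \cup p_2^* c\big). \tag{25} \]
Because of the relation between $\mathcal{L}_{\mathcal{B}}$ and $\mathcal{L}_{\mathrm{Sym}}$, we have
\[ i^* \tilde c^{[k]} = Z^* c^{[k]}. \tag{26} \]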


3.3. Associated fibrations as locally Hamiltonian fibrations. In Sect. 3.1, we constructed various associated fibrations within the category of symplectic fibrations (fibre bundles with symplectic forms on the fibres). Our next task is to refine these constructions to the category of locally Hamiltonian fibrations. The vortex fibration will then become a LHF by restricting a closed two-form defined on a larger space. The cleanest way that I have found to do this is to 'reverse-engineer' our cohomology calculation for the vortex moduli space. This goes as follows.

We need a fibrewise-equivariant connection $\widetilde\nabla$ on the bundle $\widetilde{\mathcal{L}}_{\mathcal{B}} \to \mathcal{C}^*_{X/S} \times_S X$. To obtain one, choose a connection $A_{\mathrm{ref}}$ on $L \to X$. We define $\widetilde\nabla$ to be the unique connection which restricts to the natural one (14) on each fibre over $S$, and which is given by $A_{\mathrm{ref}}$ on $T^h X$. In conjunction with the fibrewise gauge-fixing condition, $\widetilde\nabla$ defines a quotient connection $\nabla$ on $\mathcal{L}_{\mathcal{B}} \to \mathcal{B}^*_{X/S} \times_S X$.

Definition 3.3. We define the closed two-form $\tilde v(\tau, L)$ on $\mathcal{B}^*_{X/S}$ by
\[ \tilde v(\tau, L) = 2\pi \int_{X/S} iF_\nabla \wedge \Big(\tau - \frac{\pi}{2}\, iF_\nabla\Big). \tag{27} \]
We define $v(\tau, L)$ to be the restriction of $\tilde v(\tau, L)$ to the vortex fibration $V_{X/S}(L, \tau)$.

Let us clarify the integration symbol here. Projection on the first factor makes $\mathcal{B}^*_{X/S} \times_S X$ a fibre bundle over $\mathcal{B}^*_{X/S}$. The fibre over a point of $\mathcal{B}^*_{X/S}$ which lies over $s \in S$ is $X_s$. It therefore makes sense to integrate down the fibres of $\mathcal{B}^*_{X/S} \times_S X \to \mathcal{B}^*_{X/S}$. In particular, a closed four-form $\alpha$ on $\mathcal{B}^*_{X/S} \times_S X$ gives rise to a closed two-form
\[ \int_{(\mathcal{B}^*_{X/S} \times_S X)/(\mathcal{B}^*_{X/S})} \alpha. \]
We write this more compactly as $\int_{X/S} \alpha$.

Bearing in mind that integration along the fibre corresponds to the cohomology pushforward, we can read off the cohomology class of $\tilde v(\tau, L)$:
\[ [\tilde v(\tau, L)] = 2\pi\big(\widetilde{[\tau]}^{[1]} - \pi\,\tilde 1^{[2]}\big). \]
Forming $\tilde v(\tau, L)$ is obviously compatible with restricting the base $S$. By our earlier calculations, the form $i^* \tilde v(\tau, L)$ on the vortex bundle restricts to the preferred Kähler form on each fibre. Thus
\[ [v(\tau, L)] = 2\pi\big([\tau]^{[1]} - \pi\,1^{[2]}\big). \]
Theorem 1 is now an immediate consequence of what we have done.

Acknowledgements. The work presented here formed a part of my doctoral thesis. I am grateful to my Ph.D. supervisor, Simon Donaldson, for his ideas and advice. Thanks also to Michael Thaddeus for pointing out the Duistermaat–Heckman method, and to Michael Usher for telling me about his related work [16]. I acknowledge support from EPSRC Research Grant EP/C535995/1.


References

1. Atiyah, M., Bott, R.: The Yang–Mills equations over Riemann surfaces. Philos. Trans. Roy. Soc. London Ser. A 308(1505), 523–615 (1983)
2. Duistermaat, J., Heckman, G.: On the variation in the cohomology of the symplectic form of the reduced phase space. Invent. Math. 69(2), 259–268 (1982)
3. Donaldson, S., Kronheimer, P.: The geometry of four-manifolds. Oxford Mathematical Monographs, Oxford: Oxford University Press, 1990
4. Donaldson, S., Smith, I.: Lefschetz pencils and the canonical class for symplectic four-manifolds. Topology 42(4), 743–785 (2003)
5. García-Prada, O.: A direct existence proof for the vortex equations over a compact Riemann surface. Bull. London Math. Soc. 26(1), 88–96 (1994)
6. Jaffe, A., Taubes, C.: Vortices and monopoles. Progress in Physics 2, Boston, MA: Birkhäuser, 1980
7. Kronheimer, P., Mrowka, T.: Monopoles and three-manifolds. New Mathematical Monographs, Vol. 10. Cambridge Univ. Press, (in press)
8. MacDonald, I.: Symmetric products of an algebraic curve. Topology 1, 319–343 (1962)
9. McDuff, D., Salamon, D.: J-holomorphic curves and symplectic topology. Amer. Math. Soc. Colloquium Publications 52. Providence, RI: Amer. Math. Soc., 2004
10. Manton, N., Nasir, S.: Volume of vortex moduli spaces. Commun. Math. Phys. 199(3), 591–604 (1999)
11. Perutz, T.: Surface-fibrations, four-manifolds, and symplectic Floer homology. Ph.D. thesis, Imperial College London, 2005
12. Perutz, T.: Lagrangian matching invariants for fibred four-manifolds: I. Geom. Topol. 11, 759–828 (2007)
13. Salamon, D.: Seiberg–Witten invariants of mapping tori, symplectic fixed points, and Lefschetz numbers. Turkish J. Math. 23(1), 17–143 (1999)
14. Samols, T.: Vortex scattering. Commun. Math. Phys. 145(1), 149–179 (1992)
15. Seidel, P.: Symplectic Floer homology and the mapping class group. Pacific J. Math. 206(1), 219–229 (2002)
16. Usher, M.: Vortices and a TQFT for Lefschetz fibrations on 4-manifolds. Algebr. Geom. Topol. 6, 1677–1743 (2006)

Communicated by N.A. Nekrasov

Commun. Math. Phys. 278, 307–327 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0401-5

Communications in

Mathematical Physics

Yang-Mills Detour Complexes and Conformal Geometry

A. Rod Gover (1), Petr Somberg (2), Vladimír Souček (2)

1 Department of Mathematics, The University of Auckland, Private Bag 92019, Auckland 1, New Zealand. E-mail: [email protected]
2 Mathematical Institute, Faculty of Mathematics and Physics, Charles University, Sokolovská 83, 186 75 Praha, Czech Republic. E-mail: [email protected]; [email protected]

Received: 7 July 2006 / Accepted: 2 July 2007
Published online: 8 January 2008 – © Springer-Verlag 2007

Abstract: Working over a pseudo-Riemannian manifold, for each vector bundle with connection we construct a sequence of three differential operators which is a complex (termed a Yang-Mills detour complex) if and only if the connection satisfies the full Yang-Mills equations. A special case is a complex controlling the deformation theory of Yang-Mills connections. In the case of Riemannian signature the complex is elliptic. If the connection respects a metric on the bundle then the complex is formally selfadjoint. In dimension 4 the complex is conformally invariant and generalises, to the full Yang-Mills setting, the composition of (two operator) Yang-Mills complexes for (anti-)self-dual Yang-Mills connections. Via a prolonged system and tractor connection a diagram of differential operators is constructed which, when commutative, generates differential complexes of natural operators from the Yang-Mills detour complex. In dimension 4 this construction is conformally invariant and is used to yield two new sequences of conformal operators which are complexes if and only if the Bach tensor vanishes everywhere. In Riemannian signature these complexes are elliptic. In one case the first operator is the twistor operator and in the other sequence it is the operator for Einstein scales. The sequences are detour sequences associated to certain BernsteinGelfand-Gelfand sequences. 1. Introduction In the study of Riemannian and pseudo-Riemannian geometry it is often valuable to use differential operators with good conformal behaviour. In the Riemannian setting, elliptic differential operators are particularly important. For example the conformal Laplacian controls the conformal variation of the scalar curvature. This was exploited heavily in the solution by Schoen, Aubin, Trudinger, and Yamabe (see [40]) of the “Yamabe Problem” of finding, via conformal rescaling, constant scalar curvature metrics on compact manifolds. 
Related curvature prescription problems and techniques have exploited the higher order conformal Laplacians of Paneitz, Graham et al. [8,18,35]. These operators


on functions (or really densities) also find a natural place in the recent developments [24,36] concerning the asymptotics and scattering theory of the conformally compact Poincaré-Einstein metric of Fefferman-Graham [23]. On many tensor and spinor fields there is no conformally invariant elliptic operator (taking values in an irreducible bundle); this follows from the classification of conformally invariant differential operators on the sphere [7,22]. This classification is based on the structure of generalised Verma modules and from this it follows that often the analogue, or replacement, for a conformal elliptic operator on the sphere is an elliptic complex of conformally invariant differential operators. However the situation is complicated for conformally curved structures. The requirement that a sequence of differential operators be both conformally invariant and form a complex is severe. On the other hand when such complexes exist they can be expected to play a serious role in treating the underlying structure. This idea is already well-established in the setting of self-dual 4-manifolds [1,19]. On fully conformally curved n-manifolds, with n even, there is a class of elliptic conformal complexes on differential forms [11]. Each of these is different to the de Rham complex, and these complexes generalise the conformally invariant operator of [35], with leading term n/2 . Another class of complexes is based around the (Fefferman-Graham) obstruction tensor [23]. This is a natural conformal 2-tensor that generalises, to higher even dimensions, the Bach tensor in dimension 4. It turns out that the formal deformations of obstruction-flat manifolds are controlled by a sequence of conformal operators, which form an elliptic complex if and only if the structure is obstruction-flat [12]. Unfortunately there is no obvious way to generalise either the construction in [11], or that in [12]. 
For 4-manifolds we construct here two conformal differential sequences which are (formally self-adjoint) complexes if and only if the (conformally invariant) Bach-tensor [2] vanishes everywhere. This condition is weaker than self-duality. In fact conformally Einstein manifolds are also Bach-flat and there are structures which are Bach-flat and neither conformally-Einstein nor half-flat [30]. Writing $T : S \to Tw$ for the usual twistor operator on Dirac spinors (as in e.g. [5]), in Theorem 4.5 we obtain a differential complex
\[ S \xrightarrow{\;T\;} Tw \xrightarrow{\;M\;} Tw \xrightarrow{\;T^*\;} S, \]
where $M$ is a third order Rarita-Schwinger type operator. On the other hand in Theorem 4.3 we construct
\[ \mathcal{E}^0 \xrightarrow{\;P\;} \mathcal{E}^{1,1} \xrightarrow{\;M^T\;} \mathcal{E}_{1,1} \xrightarrow{\;P^*\;} \mathcal{E}_0, \]
where $M^T$ is a second order conformal operator, similar in form to the operator which controls deformations of Einstein structures (see [6] and references therein), while $P$ is a curvature modification of the trace-free covariant Hessian. Non-vanishing solutions of $P$ give conformal factors $\sigma$ so that $\sigma^{-2} g$ is Einstein (see [3]); we show via the second sequence that the Bach tensor obstructs solutions. If the manifold is Riemannian then both of the complexes are elliptic. We have been intentionally explicit in treating these constructions, as it seems these complexes should play a fundamental role in conformal and Riemannian geometry. In the compact and Riemannian-signature setting the ellipticity implies that the complexes have finite dimensional cohomology spaces. In both cases the interpretation of the 0th-cohomology is well-known but as far as we know the first cohomology is a new global conformal invariant of Bach-flat structures.


Such conformal elliptic complexes have the scope to yield further geometric information through their detour torsion invariants [10]. In fact Theorems 4.3 and 4.5 construct short detour complexes in all dimensions $n \geq 3$ and $n \geq 4$ respectively. These complexes are conformally invariant only in dimension 4, but by construction have a simple conformal behaviour and may well be of interest for physics in the Lorentzian setting.

The route to the constructions and results mentioned above is really one of the main points of the article. We believe that it lays foundations for an eventual general treatment of a large class of related complexes, and also many of the results should be of independent interest. The simplest example of a detour complex is the Maxwell detour complex
\[ \mathcal{E}^0 \xrightarrow{\;d\;} \mathcal{E}^1 \xrightarrow{\;\delta d\;} \mathcal{E}_1 \xrightarrow{\;\delta\;} \mathcal{E}_0. \tag{1} \]
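That the Maxwell sequence is a complex can be verified directly. The sketch below assumes only the standard identities $d \circ d = 0$ and $\delta \circ \delta = 0$ for the exterior derivative and its formal adjoint.

```latex
% Composition of the first two operators:
\[
(\delta d) \circ d = \delta \circ (d \circ d) = 0 .
\]
% Composition of the last two operators:
\[
\delta \circ (\delta d) = (\delta \circ \delta) \circ d = 0 .
\]
```

Both compositions vanish identically, with no condition on the underlying metric; the twisted versions built from a connection $D$ are where a curvature condition enters.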

For each vector bundle V and connection D we construct, in Sect. 3.1, a curvature adjusted twisting of this complex with the property that it is again a complex if and only if the connection D is a (pure) Yang-Mills connection, see Theorem 3.2. In dimension 4 the resulting complexes are conformal. In Sects. 3.2 (in particular Theorem 3.4) we recover a class of these complexes by considering deformations of Yang-Mills connections. We show in Proposition 3.6 of Sect. 3.4 that, in dimension 4, the Yang-Mills detour complex generalises the composition of subcomplexes of Yang-Mills complexes arising from (anti-)self-dual connections. The next main item is a rather general construction, see diagram [D] in Sect. 4. This enables the Yang-Mills detour complex to be “translated” to yield new complexes. Broadly the motivational idea is this. If one has an overdetermined differential operator (of finite type) B 0 → B 1 then one may sometimes obtain a corresponding invariant connection on a prolonged system [33]. If the latter satisfies the Yang-Mills equations and, say, preserves a metric on the prolonged system, then the Yang-Mills detour complex on the prolonged system descends and extends B 0 → B 1 to a complex. In reality this is an over-simplification, but it contains the germ of the main idea. 2. Background: Conformal Geometry Recall that a conformal structure of signature ( p, q) on M is a smooth ray subbundle Q ⊂ S 2 T ∗ M whose fibre over x consists of conformally related signature-( p, q) metrics at the point x (and S 2 T ∗ M is the symmetric part of ⊗2 T ∗ M). Sections of Q are metrics g on M. So we may equivalently view the conformal structure as the equivalence class [g] of these conformally related metrics. The principal bundle π : Q → M has structure group R+ , and so each representation R+ x → x −w/2 ∈ End(R) induces a natural line bundle on (M, [g]) that we term the conformal density bundle E[w]. 
We shall write $\mathcal{E}[w]$ for the space of sections of this bundle and $\boldsymbol{g}$ denotes the conformal metric, that is the tautological section of $S^2T^*M \otimes E[2]$ determined by the conformal structure. On conformal manifolds this will be used to identify $TM$ with $T^*M[2]$. Note $E[w]$ is trivialised by a choice of metric $g$ from the conformal class, and we write $\nabla$ for the connection corresponding to this trivialisation (and term this the Levi-Civita connection on $E[w]$). It follows that (the coupled) $\nabla_a$ preserves the conformal metric. In dimensions $n \ge 3$ the Riemannian curvature can be decomposed into the totally trace-free Weyl curvature $C_{abcd}$ and a remaining part described by the symmetric Schouten tensor $P_{ab}$, according to

\[ R_{abcd} = C_{abcd} + 2\boldsymbol{g}_{c[a}P_{b]d} + 2\boldsymbol{g}_{d[b}P_{a]c}, \]

where $[\cdots]$ indicates antisymmetrisation over the enclosed indices. The Schouten tensor is a trace modification of the Ricci tensor $\mathrm{Ric}_{ab}$ and vice versa: $\mathrm{Ric}_{ab} = (n-2)P_{ab} + J\boldsymbol{g}_{ab}$, where we write $J$ for the trace $P_a{}^a$ of $P$. The Cotton tensor and Bach tensor are defined by, respectively,

\[ A_{abc} := 2\nabla_{[b}P_{c]a} \quad\text{and}\quad B_{ab} := \nabla^c A_{acb} + P^{dc}C_{dacb}. \tag{2} \]
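As a quick consistency check on these conventions, tracing the relation between the Ricci and Schouten tensors recovers the familiar expressions in terms of the scalar curvature. In the sketch below $\mathrm{Sc} := g^{ab}\mathrm{Ric}_{ab}$ is our own shorthand (it does not appear in the text), a metric $g$ from the conformal class is fixed, and $n \ge 3$ is assumed.

```latex
% Trace Ric_{ab} = (n-2) P_{ab} + J g_{ab} with g^{ab}, using g^{ab} g_{ab} = n.
\begin{align*}
\mathrm{Sc} &= g^{ab}\mathrm{Ric}_{ab} = (n-2)J + nJ = 2(n-1)J, \\
J &= \frac{\mathrm{Sc}}{2(n-1)}, \qquad
P_{ab} = \frac{1}{n-2}\Big(\mathrm{Ric}_{ab} - \frac{\mathrm{Sc}}{2(n-1)}\, g_{ab}\Big).
\end{align*}
```

In particular, for an Einstein metric, $\mathrm{Ric}_{ab} = \frac{1}{n}\mathrm{Sc}\, g_{ab}$, the second formula reduces to $P_{ab} = \frac{1}{n} J g_{ab}$.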

Under a conformal transformation we replace a choice of metric $g$ by the metric $\hat{g} = e^{2\omega}g$, where $\omega$ is a smooth function. Explicit formulae for the corresponding transformation of the Levi-Civita connection and its curvatures are given in e.g. [3,32]. We recall that, in particular, the Weyl curvature is conformally invariant, $\widehat{C}_{abcd} = C_{abcd}$. In dimension 4 $B_{ab}$ is conformally invariant.

We will write $\mathcal{E}^k[w]$ for the sections of the tensor product $E^k[w] := \wedge^k T^*M \otimes E[w]$. On conformal manifolds we use the notation $\mathcal{E}_k$ to mean the space of sections of $E_k := \wedge^k T^*M \otimes E[2k-n]$. This notation (following [11]) is suggested by the duality between the section spaces $\mathcal{E}^k$ and $\mathcal{E}_k$; compactly supported sections pair globally by contraction and integration. For any vector bundle $V$, $\mathcal{E}^k(V)$ is the space of smooth sections of $E^k(V) := \wedge^k T^*M \otimes V$, while $\mathcal{E}_k(V)$ means the space of sections of $E_k(V) := \wedge^k T^*M \otimes E[2k-n] \otimes V$. When a metric from the conformal class is fixed, these spaces will be identified.

In conformal geometry the de Rham complex is a prototype for a class of sequences of bundles and conformally invariant differential operators, each of the form

\[ \mathcal{B}^0 \to \mathcal{B}^1 \to \cdots \to \mathcal{B}^n, \]

where the vector bundles $\mathcal{B}^i$ are irreducible tensor-spinor bundles. On the $n$-sphere there is one such complex for each irreducible module $\mathbb{V}$ for the group $G = SO(n+1,1)$ of conformal motions; the space of solutions of the first (overdetermined) conformal operator $\mathcal{B}^0 \to \mathcal{B}^1$ is isomorphic to $\mathbb{V}$, and the sequence gives a resolution of this space viewed as a sheaf. These are the conformal cases of the (generalised) Bernstein-Gelfand-Gelfand (BGG) sequences, a class of sequences of differential operators that exist on any parabolic geometry [7,15]. As well as the operators $D_i : \mathcal{B}^i \to \mathcal{B}^{i+1}$ of the BGG sequence, in even dimensions there are conformally invariant “long operators” $L_k : \mathcal{B}^k \to \mathcal{B}^{n-k}$ for $k = 1, \ldots, n/2 - 1$ [7]. Thus there are sequences of the form

\[ \mathcal{B}^0 \xrightarrow{D_0} \mathcal{B}^1 \xrightarrow{D_1} \cdots \xrightarrow{D_{k-1}} \mathcal{B}^k \xrightarrow{L_k} \mathcal{B}^{n-k} \xrightarrow{D_{n-k}} \cdots \xrightarrow{D_{n-1}} \mathcal{B}^n, \]

and, following [10,11], we term these detour sequences since, in comparison to the BGG sequence, the long operator here bypasses the middle of the BGG sequence. Once again from the classification it follows that these detour sequences are in fact complexes in the case that the structure is conformally flat. The dimension 4 conformal complexes, constructed in Theorems 4.3 and 4.5 below, are detour sequences of this form with $k = 1$.

3. Yang-Mills Detour Complexes

3.1. The general construction. We work over a pseudo-Riemannian $n$-manifold $(M,g)$ of signature $(p,q)$ ($n \ge 2$). Let $V$ denote a vector bundle with a connection $D$. We denote by $F$ the curvature of $D$. We also write $D$ for the induced connection on the dual bundle $V^*$. We write $d^D$ for the connection-coupled exterior derivative operator $d^D : \mathcal{E}^k(V) \to \mathcal{E}^{k+1}(V)$. Of course we could equally consider $d^D : \mathcal{E}^k(V^*) \to \mathcal{E}^{k+1}(V^*)$, and for the formal adjoint of this we write $\delta^D : \mathcal{E}_{k+1}(V) \to \mathcal{E}_k(V)$.


Let us write $F\cdot$ for the action of the curvature on the twisted 1-forms, $F\cdot : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ given by

\[ (F\cdot\varphi)_a := F_a{}^b\varphi_b, \]

where we have indicated the abstract form indices explicitly, whereas the standard $\mathrm{End}(V)$ action of the curvature on the $V$-valued 1-form is implicit. Using this we construct a differential operator $M^D : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ by

\[ M^D\varphi = \delta^D d^D\varphi - F\cdot\varphi. \]

The operator $M^D$ has the property that its composition with $d^D$ is given simply by an algebraic action of the “Yang-Mills current” $\delta^D F$ on the bundle $V$, as follows.

Lemma 3.1. The composition of $M^D : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ with $d^D : \mathcal{E}^0(V) \to \mathcal{E}^1(V)$ is given by the exterior action of $\delta^D F$, as an $\mathrm{End}(V)$-valued 1-form:

\[ M^D d^D = \varepsilon(\delta^D F). \]

The composition of $\delta^D : \mathcal{E}_1(V) \to \mathcal{E}_0(V)$ with $M^D : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ is given by the interior action of $-\delta^D F$, as an $\mathcal{E}^1$-valued endomorphism of $\mathcal{E}^1(V)$:

\[ \delta^D M^D = -\iota(\delta^D F). \]

In these expressions the interior multiplication (indicated by $\iota(\cdot)$) and the exterior multiplication (indicated by $\varepsilon(\cdot)$) refer to the form index of $\delta^D F$.

Proof. For the connection $D$ coupled with the Levi-Civita connection $\nabla$, let us also write $D$. Then, again using the notation where we exhibit abstract tensor indices but suppress indices for the bundle $V$, a formula for $M^D$ on a twisted 1-form $\Phi_a$ is

\[ (M^D\Phi)_b = -D^aD_a\Phi_b + D^aD_b\Phi_a - F_b{}^a\Phi_a, \]

since the Levi-Civita connection is torsion-free. On the other hand for $\Psi \in \mathcal{E}^0(V)$, $(d^D\Psi)_a = D_a\Psi$. Thus

\[ (M^D d^D\Psi)_b = D^a(D_bD_a - D_aD_b)\Psi - F_b{}^aD_a\Psi = D^aF_{ba}\Psi - F_{ba}D^a\Psi = \big(\varepsilon(\delta^D F)\Psi\big)_b. \]

By a similar calculation (or using the above on $(V^*, D)$ and taking formal adjoints) we obtain

\[ \delta^D M^D\Phi = -\iota(\delta^D F)\Phi, \]

for $\Phi \in \mathcal{E}_1(V)$. (Note that $\iota(\delta^D F)\Phi = -(D^bF_b{}^a)\Phi_a$.)
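The “similar calculation” can be spelled out as follows. This is only a sketch in the conventions of the proof above: it assumes that $\delta^D$ acts on a twisted 1-form $\Phi_a$ by $-D^a\Phi_a$ and that $(D_bD_a - D_aD_b)$ acts on sections of $V$ by $F_{ba}$; the symbol $\Phi$ is simply our label for the twisted 1-form.

```latex
\begin{align*}
\delta^D M^D\Phi
 &= -D^b\big(-D^aD_a\Phi_b + D^aD_b\Phi_a - F_b{}^a\Phi_a\big) \\
 &= (D^bD^a - D^aD^b)D_a\Phi_b + D^b\big(F_b{}^a\Phi_a\big)
 && \text{(relabel $a \leftrightarrow b$ in the second term)} \\
 &= F^{ba}D_a\Phi_b + (D^bF_b{}^a)\Phi_a + F^{ba}D_b\Phi_a
 && \text{(the Ricci terms cancel in the double contraction)} \\
 &= (D^bF_b{}^a)\Phi_a
 && \text{(since $F^{ba}(D_a\Phi_b + D_b\Phi_a) = 0$ by antisymmetry of $F$)} \\
 &= -\iota(\delta^D F)\Phi .
\end{align*}
```

The Riemannian curvature drops out in the third line because the commutator acts on both tensor indices of $D_a\Phi_b$, producing two Ricci contractions that combine into $\mathrm{Ric}^{ab}$ contracted with the antisymmetric part of $D_a\Phi_b$.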

Remark. Note that to simplify the punctuation in calculations, we often view sections of vector bundles as order 0 operators. Thus for example $D^aF_{ba}\Psi$ has the same meaning as $D^a(F_{ba}\Psi)$.

If the connection $D$ is orthogonal or unitary for some inner product or Hermitian form on $V$ (then $V$ may be identified with $V^*$ and) the algebraic action $F\cdot : \mathcal{E}^1(V) \to \mathcal{E}_1(V)$ is easily verified to be formally self-adjoint and so, in this case, $M^D$ is formally self-adjoint. From these observations, and Lemma 3.1, we have the following.


Theorem 3.2. The sequence of operators,

\[ \mathcal{E}^0(V) \xrightarrow{\;d^D\;} \mathcal{E}^1(V) \xrightarrow{\;M^D\;} \mathcal{E}_1(V) \xrightarrow{\;\delta^D\;} \mathcal{E}_0(V) \tag{3} \]

is a complex if and only if the curvature $F$ of the connection $D$ satisfies the (pure) Yang-Mills equation $\delta^D F = 0$. In addition:
(i) If $D$ is an orthogonal or unitary connection then the sequence is formally self-adjoint.
(ii) In Riemannian signature the sequence is elliptic.
(iii) In dimension 4 the sequence (3) is conformally invariant.

Proof. It remains to show (ii) and (iii). For (ii) we need that the symbol sequence is exact. This sequence is simply a tensor product twisting by $V$ of the symbol sequence of the Maxwell detour complex (1) and so it is sufficient to check that case. But that case is an easy consequence of the algebraic Hodge decomposition on an inner product space. For (iii), the conformally well-defined formal adjoint of the exterior derivative $d : \mathcal{E}^k \to \mathcal{E}^{k+1}$ acts $\delta : \mathcal{E}_{k+1} \to \mathcal{E}_k$ (cf. e.g. [11]). Note that in even dimensions on middle order forms we have $\mathcal{E}^{n/2} = \mathcal{E}_{n/2}$ and so $\delta : \mathcal{E}^{n/2} \to \mathcal{E}_{n/2-1}$ is conformally invariant. The invariance persists if we twist by a connection $D$, and so from the definition of $M^D$ we have the result.

For a given connection $D$ on a vector bundle $V$, such that $\delta^D F = 0$, we will term the complex (3) of Theorem 3.2 the (corresponding) Yang-Mills detour complex. If $D$ is a Yang-Mills connection on a vector bundle $V$, then the dual connection on $V^*$ and the tensor product connection on any tensor power of these are also Yang-Mills. One might alternatively work with principal connections. If $\omega$ is a Yang-Mills connection on a principal bundle $P$ with structure group $G$, then we obtain a complex (3) for every finite dimensional representation of $G$.

3.2. A variational construction of the deformation detour. Returning to the general situation that began Sect. 3.1, let $V$ denote a vector bundle with a connection $D$ and denote by $F$ the curvature of $D$. Consider now a smoothly parametrised family of connections $D^t$ (on $V$) given, on a section $v \in \mathcal{E}^0(V)$, by

\[ D^t_a v = D_a v + A^t_a v, \tag{4} \]

where for each $t \in \mathbb{R}$, $A^t \in \mathcal{E}^1(\mathrm{End}V)$ and $A^0 = 0$. With $F^t$ denoting the curvature of $D^t$, we have

\[ F^t_{ab} = F_{ab} + D_aA^t_b - D_bA^t_a + [A^t_a, A^t_b], \]

where, once again, we write $D$ also to mean the connection on $V$ coupled with the Levi-Civita connection. It follows that the derivative of $F^t$ at $D = D^0$ is

\[ \dot{F}_{ab} = D_a\dot{A}_b - D_b\dot{A}_a \quad\text{where}\quad \dot{A}_a := \frac{d}{dt}A^t_a\Big|_{t=0}, \]

that is $\dot{F} = d^D\dot{A}$. Now we calculate the derivative, at $D$, of $\delta^{D^t}F^t$. We have

\[ \frac{d}{dt}\,g^{ab}D^t_aF^t_{bc}\Big|_{t=0} = D^b\dot{F}_{bc} + [\dot{A}^b, F_{bc}] = D^b(D_b\dot{A}_c - D_c\dot{A}_b) + [F_c{}^b, \dot{A}_b], \]

where $\dot{A}$ acts on $F$ and vice versa by the obvious composition of bundle endomorphisms. Note that, since the 1-form $\dot{A}$ has values in $\mathrm{End}V$, the last term here is $F\cdot\dot{A}$. Multiplying the display by $-1$ gives

\[ \frac{d}{dt}\,\delta^{D^t}F^t\Big|_{t=0} = M^D\dot{A}. \tag{5} \]
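To see why the sign change produces exactly $M^D\dot{A}$, one can match terms against the definition $M^D\varphi = \delta^D d^D\varphi - F\cdot\varphi$. The following is a sketch, assuming (as in the proof of Lemma 3.1) that $\delta^D$ acts on a twisted 2-form $\omega_{bc}$ by $-D^b\omega_{bc}$ and that the curvature acts on $\mathrm{End}(V)$-valued forms by commutator.

```latex
\begin{align*}
(\delta^D d^D\dot{A})_c &= -D^b\big(D_b\dot{A}_c - D_c\dot{A}_b\big), \\
(F\cdot\dot{A})_c &= [F_c{}^b, \dot{A}_b], \\
\frac{d}{dt}\big(\delta^{D^t}F^t\big)_c\Big|_{t=0}
 &= -\frac{d}{dt}\, g^{ab}D^t_aF^t_{bc}\Big|_{t=0}
  = (\delta^D d^D\dot{A})_c - (F\cdot\dot{A})_c
  = (M^D\dot{A})_c .
\end{align*}
```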

So we have, in particular, the following outcome.

Lemma 3.3. If $D$ is a Yang-Mills connection then the infinitesimal deformation $\dot{A}$ of $D$ is through Yang-Mills connections if and only if $M^D\dot{A} = 0$.

In the vector bundle picture, a so-called gauge transformation arises locally by acting on $V$ by a section $u$ of the fibre bundle $\mathrm{Aut}(V)$ of invertible elements in $\mathrm{End}(V)$. From the Leibniz rule for $D$ (viewed as a connection on the tensor powers of $V$ and $V^*$) it follows immediately that this pulls back to a transformation $D_a \mapsto D_a + u^{-1}D_au$ of the connection, whence

\[ F_{ab} \mapsto u^{-1}F_{ab}u \quad\text{and}\quad D^aF_{ab} \mapsto u^{-1}(D^aF_{ab})u. \tag{6} \]
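The rules in (6) follow from writing the transformed connection as a conjugate of $D$. A short sketch, in which $D'$ and $F'$ are our names for the transformed connection and its curvature, and $v$, $W$ are arbitrary sections of $V$ and $\mathrm{End}(V)$ respectively:

```latex
\begin{align*}
D'_a v &= D_a v + u^{-1}(D_a u)\,v = u^{-1}D_a(u v), \\
F'_{ab}\,v &= (D'_aD'_b - D'_bD'_a)v = u^{-1}(D_aD_b - D_bD_a)(u v) = u^{-1}F_{ab}\,u\,v, \\
D'_c W &= u^{-1}\,D_c\big(u W u^{-1}\big)\,u
\quad\Longrightarrow\quad
D'^aF'_{ab} = u^{-1}(D^aF_{ab})\,u .
\end{align*}
```

The last implication uses $W = F'_{ab} = u^{-1}F_{ab}u$, so that $uWu^{-1} = F_{ab}$; the Levi-Civita coupling on the form indices is unaffected by $u$.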

Thus if $u_s$ is a smoothly parametrised family of such transformations with $u_0 = \mathrm{id}_V$ and derivative

\[ \frac{d}{ds}u_s\Big|_{s=0} = \dot{u} \in \mathcal{E}^0(\mathrm{End}(V)), \]

then we obtain that the infinitesimal variation of $D^s$ is exactly $d^D\dot{u}$:

\[ \dot{D}_a = D_a\dot{u}. \tag{7} \]

So from this and (5) we have

\[ \frac{d}{ds}\,\delta^{D^s}F^s\Big|_{s=0} = M^D d^D\dot{u}. \]

On the other hand from (6) and (7) we get

\[ \frac{d}{ds}\,\delta^{D^s}F^s\Big|_{s=0} = (\delta^D F)\dot{u} - \dot{u}\,\delta^D F. \]

Putting the last two results together brings us to

\[ M^D d^D\dot{u} = \varepsilon(\delta^D F^{\mathrm{End}(V)})\dot{u}, \]

where $F^{\mathrm{End}(V)}$ is the curvature of $D$ viewed as a connection on $\mathrm{End}(V)$ (so e.g. $F^{\mathrm{End}(V)}\dot{u} = [F, \dot{u}]$). This agrees precisely with the specialisation of Lemma 3.1 to $\mathrm{End}(V)$ equipped with the connection induced from $D$ on $V$. In particular if $D$ is a Yang-Mills connection then so is the connection on $\mathrm{End}(V)$. Since $\mathrm{End}(V)$ carries the non-degenerate symmetric pairing $(U, W) = \mathrm{Tr}(UW)$ and this is preserved by $D$, it follows from (7) that $M^D$ is formally self-adjoint with respect to the global pairing obtained by integrating $(\,\cdot\,,\cdot\,)$. (The point is that the Yang-Mills equations are the Euler-Lagrange equations for the Lagrangian density $\mathrm{Tr}(F^{ab}F_{ab})$. So (7) shows that $M^D$ is the second variation of an action. By interchanging orders of variation one obtains the symmetry.) Thus from $M^D d^D = 0$ we also have $\delta^D M^D = 0$ and the following result.

Theorem 3.4. For a vector bundle $V$, with Yang-Mills connection $D$, the (formal) deformation detour complex

\[ \mathcal{E}^0(\mathrm{End}(V)) \xrightarrow{\;d^D\;} \mathcal{E}^1(\mathrm{End}(V)) \xrightarrow{\;M^D\;} \mathcal{E}_1(\mathrm{End}(V)) \xrightarrow{\;\delta^D\;} \mathcal{E}_0(\mathrm{End}(V)) \tag{8} \]

is formally self-adjoint. Its first cohomology $H^1(\mathrm{End}(V), D)$ is the formal tangent space at $D$ to the moduli space of Yang-Mills connections on $V$.

It follows from a general deformation theory that the complex (8) controls the full formal deformation theory of the Yang-Mills equations.

3.3. Examples: (pseudo-)Riemannian manifolds with harmonic curvature. On a pseudo-Riemannian (spin) manifold we write $\nabla$ for the Levi-Civita connection and $R$ for its curvature, the Riemannian curvature tensor. Riemannian structures satisfying $\delta^\nabla R = 0$ are said to have harmonic curvature. Einstein manifolds, for example, are harmonic in this sense. There is a rich theory of harmonic manifolds, see [6] and references therein. If $\delta^\nabla R = 0$ then, from Theorem 3.2, we get a detour complex (3) for $V$ any tensor (spin) bundle. For example if $TM$ is the tangent bundle then we have $M^\nabla : \mathcal{E}^1(TM) \to \mathcal{E}_1(TM)$ by

\[ S_b{}^c \mapsto -2\nabla^a\nabla_{[a}S_{b]}{}^c - R_{ba}{}^c{}_d S^{ad}. \]

This annihilates the covariant derivative of any tangent vector field.

3.4. Half-flat connections. In the setting of conformal (or pseudo-Riemannian) 4-manifolds, we observe here that when a vector bundle connection $D$ is half-flat then there is a very simple interpretation of the Yang-Mills detour complex. First we review, in our current notation, some relevant (well-known) background. Recall that on a conformal 4-manifold $M$ of signature $(p,q)$ we have $\star^2 = (-1)^{k(4-k)+q}$ on $k$-forms. In the case of Minkowskian signature let us write $\mathcal{E}^2_\pm$ for the $\pm i$-eigenspaces of $\star$. In the other signatures $\mathcal{E}^2_\pm$ means the $\pm 1$ eigenspaces of $\star$. In any case, since $\star$ is a symmetric endomorphism of $\mathcal{E}^2$, the decomposition of $\mathcal{E}^2$ into $\mathcal{E}^2_+ \oplus \mathcal{E}^2_-$ is orthogonal. Viewing the curvature $F$ (of $D$ on $V$) as a twisted 2-form, recall that the curvature, or the connection, is said to be self-dual (respectively anti-self-dual) if the component of $F$ in $\mathcal{E}^2_-(\mathrm{End}(V))$ (respectively in $\mathcal{E}^2_+(\mathrm{End}(V))$) is zero, $F_- = 0$ (respectively $F_+ = 0$).
So if a connection $D$ is half-flat, in this sense, then $\delta^D F$ is a multiple of $\star d^D F$. But this vanishes by the differential Bianchi identity for $F$. So $\delta^D F = 0$ for connections which are either self-dual or anti-self-dual and each case gives a special setting where the sequence (3) is a complex.

Let us write $d^D_\pm$ for the compositions given by $d^D : \mathcal{E}^1(U) \to \mathcal{E}^2(U)$ followed by the projections $\mathcal{E}^2(U) \to \mathcal{E}^2_\pm(U)$, where $U$ means either the bundle $V$ or its dual $V^*$. Thus by construction the operators $d^D_\pm : \mathcal{E}^1(U) \to \mathcal{E}^2_\pm(U)$ are conformally invariant. We write $\delta^D_\pm : \mathcal{E}^2_\pm(V) \to \mathcal{E}_1(V)$ for the operators formally adjoint to $d^D_\pm : \mathcal{E}^1(V^*) \to \mathcal{E}^2_\pm(V^*)$. By construction these also are conformally invariant. Now on $\Psi \in \mathcal{E}^0(V)$ we have $d^D d^D\Psi = F\Psi$. The projection of this into $\mathcal{E}^2_\pm$ vanishes for all $\Psi$ if and only if $F_\pm = 0$. By a similar observation for the composition $d^D d^D$ on $\mathcal{E}^0(V^*)$, and then taking formal adjoints, we see that we have the situation in the following proposition. These results are well-known.

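Proposition 3.5. The sequences

\[ \mathcal{E}^0(V) \xrightarrow{\;d^D\;} \mathcal{E}^1(V) \xrightarrow{\;d^D_+\;} \mathcal{E}^2_+(V) \quad\text{and}\quad \mathcal{E}^2_+(V) \xrightarrow{\;\delta^D_+\;} \mathcal{E}_1(V) \xrightarrow{\;\delta^D\;} \mathcal{E}_0(V) \]

are complexes if and only if $F_+ = 0$. Similarly the sequences

\[ \mathcal{E}^0(V) \xrightarrow{\;d^D\;} \mathcal{E}^1(V) \xrightarrow{\;d^D_-\;} \mathcal{E}^2_-(V) \quad\text{and}\quad \mathcal{E}^2_-(V) \xrightarrow{\;\delta^D_-\;} \mathcal{E}_1(V) \xrightarrow{\;\delta^D\;} \mathcal{E}_0(V) \]

are complexes if and only if $F_- = 0$. In Riemannian signature each of these is an elliptic complex.

Evidently then we obtain detour complexes by composing the twisted de Rham subcomplexes in the proposition. For example if the connection $D$ is anti-self-dual then there is a detour complex

\[ \mathcal{E}^0(V) \xrightarrow{\;d^D\;} \mathcal{E}^1(V) \xrightarrow{\;2\delta^D_+ d^D_+\;} \mathcal{E}_1(V) \xrightarrow{\;\delta^D\;} \mathcal{E}_0(V). \tag{9} \]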

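Similarly if $D$ is instead self-dual then there is a detour complex

\[ \mathcal{E}^0(V) \xrightarrow{\;d^D\;} \mathcal{E}^1(V) \xrightarrow{\;2\delta^D_- d^D_-\;} \mathcal{E}_1(V) \xrightarrow{\;\delta^D\;} \mathcal{E}_0(V). \tag{10} \]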

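The following result is a straightforward calculation.

Proposition 3.6. The complexes (9) and (10) are special cases of the twisted de Rham detour complex (3) of Theorem 3.2.

4. Translation via the Yang-Mills Detour Complex

We may use Theorem 3.2 to construct more exotic differential complexes. The ideas here are partly inspired by Eastwood’s curved translation principle [21,20] which in turn is a geometric adaptation of the Jantzen-Zuckermann translation functor from representation theory. Consider the following general situation. Suppose that there are vector bundles (or rather section spaces thereof) $\mathcal{B}^0$, $\mathcal{B}^1$, $\mathcal{B}_1$ and $\mathcal{B}_0$ and differential operators $L_0$, $L_1$, $L^1$, $L^0$, $\mathcal{D}$ and $\overline{\mathcal{D}}$ which act as indicated in the following diagram:

\[
\begin{array}{ccccccc}
\mathcal{E}^0(V) & \xrightarrow{\;d^D\;} & \mathcal{E}^1(V) & \xrightarrow{\;M^D\;} & \mathcal{E}_1(V) & \xrightarrow{\;\delta^D\;} & \mathcal{E}_0(V) \\
{\scriptstyle L_0}\big\uparrow & & {\scriptstyle L_1}\big\uparrow & & \big\downarrow{\scriptstyle L^1} & & \big\downarrow{\scriptstyle L^0} \\
\mathcal{B}^0 & \xrightarrow{\;\mathcal{D}\;} & \mathcal{B}^1 & \xrightarrow{\;M^B\;} & \mathcal{B}_1 & \xrightarrow{\;\overline{\mathcal{D}}\;} & \mathcal{B}_0
\end{array}
\tag*{[D]}
\]

The top sequence is (3) for a connection $D$ with curvature $F$ and the operator $M^B : \mathcal{B}^1 \to \mathcal{B}_1$ is defined to be the composition $L^1 M^D L_1$. Suppose that the squares at each end commute, in the sense that as operators $\mathcal{B}^0 \to \mathcal{E}^1(V)$ we have $d^D L_0 = L_1\mathcal{D}$ and as operators $\mathcal{E}_1(V) \to \mathcal{B}_0$ we have $L^0\delta^D = \overline{\mathcal{D}}L^1$. Then on $\mathcal{B}^0$ we have

\[ M^B\mathcal{D} = L^1 M^D L_1\mathcal{D} = L^1 M^D d^D L_0 = L^1\varepsilon(\delta^D F)L_0, \]

and similarly $\overline{\mathcal{D}}M^B = -L^0\iota(\delta^D F)L_1$. Thus if $D$ is Yang-Mills then the lower sequence, viz.

\[ \mathcal{B}^0 \xrightarrow{\;\mathcal{D}\;} \mathcal{B}^1 \xrightarrow{\;M^B\;} \mathcal{B}_1 \xrightarrow{\;\overline{\mathcal{D}}\;} \mathcal{B}_0, \tag{11} \]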

is a complex.

Remarks. Note that if the connection $D$ preserves a Hermitian or metric structure on $V$ then we need only the single commuting square $d^D L_0 = L_1\mathcal{D}$ on $\mathcal{B}^0$ to obtain such a complex; by taking formal adjoints we obtain a second commuting square $(L^0\delta^D = \overline{\mathcal{D}}L^1) : \mathcal{E}_1(V) \to \mathcal{B}_0$, where $\mathcal{B}_0$ and $\mathcal{B}_1$ are appropriate density twistings of the bundles dual to $\mathcal{B}^0$ and $\mathcal{B}^1$ respectively. Obviously for (11) to be a complex, it is sufficient (and necessary) for $L^1 M^D(d^D L_0 - L_1\mathcal{D})$ to vanish on $\mathcal{B}^0$ and for $(\overline{\mathcal{D}}L^1 - L^0\delta^D)M^D L_1$ to vanish on $\mathcal{B}^1$.

4.1. The complex for (almost) Einstein scales. We work in the setting of conformal $n$-manifolds, $n \ge 3$. We will construct here a diagram of the form [D] via the normal conformal tractor connection. The standard tractor bundle is a vector bundle with a conformally invariant connection that we may view as arising as an induced structure from the Cartan bundle and connection of [17]. In fact the Cartan connection is readily recovered from the tractor connection, see [14] where such connections and related calculus are described for the class of parabolic geometries (which also includes, for example, CR geometry, quaternionic structures and projective geometry). For our current construction it is not the normality of the tractor connection, in the sense of [14,17], that is important. Rather the key point is that it arises from a prolongation (as observed in [3]) of a certain (finite type) partial differential operator $P$ that we may take as the operator $\mathcal{D}$ for the diagram [D]. In terms of a metric $g$, this operator $P$ is given by

\[ P\sigma = \mathrm{TF}(\nabla_a\nabla_b\sigma + P_{ab}\sigma), \tag{12} \]

where $\sigma \in \mathcal{E}[1]$. Modulo the trace part, this is the differential operator which controls the conformal transformation of the Schouten tensor. In particular a metric $\sigma^{-2}\boldsymbol{g}$ is Einstein if and only if the scale $\sigma \in \mathcal{E}[1]$ is non-vanishing and satisfies $P\sigma = 0$.

In order to be explicit we give a construction of the tractor connection here, as it is the key to obtaining the required commutative diagram. For further details see [13]. We write $J^kE[1]$ for the bundle of $k$-jets of germs of sections of $E[1]$. Considering, at each point of the manifold, sections which vanish to first order at the given point reveals a canonical sequence,

\[ 0 \to S^2T^*M \otimes E[1] \to J^2E[1] \to J^1E[1] \to 0. \]

This is the jet exact sequence at 2-jets. Via the conformal metric $\boldsymbol{g}$, on a conformal manifold the bundle of symmetric covariant 2-tensors $S^2T^*M$ decomposes directly into the trace-free part, which we will denote $E^{1,1}$, and a pure trace part isomorphic to $E[-2]$, that is $S^2T^*M[1] = E^{1,1}[1] \oplus E[-1]$. The standard tractor bundle $\mathcal{T}$ may be defined as the quotient of $J^2E[1]$ by the image of $E^{1,1}[1]$ in $J^2E[1]$. By construction this is invariant, it depends only on the conformal structure. Also by construction, it is an extension of the 1-jet bundle

\[ 0 \to E[-1] \to \mathcal{T} \to J^1E[1] \to 0. \]

Note that there is a tautological operator $\mathbb{D} : \mathcal{E}[1] \to \mathcal{E}^0(\mathcal{T})$ which is simply the composition of the universal 2-jet differential operator $j^2 : \mathcal{E}[1] \to \mathcal{E}^0(J^2E[1])$ followed by the canonical projection $\mathcal{E}^0(J^2E[1]) \to \mathcal{E}^0(\mathcal{T})$. By construction this is invariant. Via a choice of metric $g$, and the Levi-Civita connection it determines, we obtain a differential operator $\mathcal{E}[1] \to \mathcal{E}[1] \oplus \mathcal{E}^1[1] \oplus \mathcal{E}[-1]$ by $\sigma \mapsto (\sigma, \nabla_a\sigma, -\frac{1}{n}(\Delta + J)\sigma)$ and this obviously determines an isomorphism

\[ \mathcal{E}^0(\mathcal{T}) \stackrel{g}{\cong} \mathcal{E}[1] \oplus \mathcal{E}^1[1] \oplus \mathcal{E}[-1]. \tag{13} \]

Changing to a conformally related metric $\hat{g} = e^{2\omega}g$ ($\omega$ a smooth function) gives a different isomorphism, which is related to the previous by the transformation formula

\[ \widehat{(\sigma, \mu_b, \tau)} = \big(\sigma,\ \mu_b - \sigma\Upsilon_b,\ \tau + \boldsymbol{g}^{bc}\Upsilon_b\mu_c - \tfrac{1}{2}\sigma\boldsymbol{g}^{bc}\Upsilon_b\Upsilon_c\big), \tag{14} \]

where $\Upsilon := d\omega$. Now we define a connection on $\mathcal{E}[1] \oplus \mathcal{E}^1[1] \oplus \mathcal{E}[-1]$ by the formula

\[ \nabla_a \begin{pmatrix} \sigma \\ \mu_b \\ \rho \end{pmatrix} := \begin{pmatrix} \nabla_a\sigma - \mu_a \\ \nabla_a\mu_b + \boldsymbol{g}_{ab}\rho + P_{ab}\sigma \\ \nabla_a\rho - P_{ab}\mu^b \end{pmatrix} \tag{15} \]

where, on the right-hand side, $\nabla$ is the Levi-Civita connection for $g$. Obviously this determines a connection on $\mathcal{T}$ via the isomorphism (13). What is more surprising is that if we repeat this using the metric $\hat{g}$, conformally related to $g$, in (13) and (15) we obtain the same connection on $\mathcal{T}$. This may easily be verified by, for example, directly calculating that under such a conformal change the right-hand side of (15) transforms in exactly the same way as a (1-form valued) invariant section of $\mathcal{T}$. That is, it transforms according to (14). The canonical connection on $\mathcal{T}$, so constructed, depends only on the conformal structure and is known as the (normal standard) tractor connection. In what follows we will use (13) without further explicit comment.

There is also a conformally invariant tractor metric $h$ on $\mathcal{T}$ given (as a quadratic form) by $(\sigma, \mu, \rho) \mapsto \boldsymbol{g}^{-1}(\mu, \mu) + 2\sigma\rho$. This is preserved by the connection and has signature $(p+1, q+1)$ (corresponding to $g$ of signature $(p,q)$). Note that, given a metric $g$, through (13) the tautological invariant operator $\mathbb{D}$ from above is given by the explicit formula

\[ \mathbb{D} : \mathcal{E}^0[1] \to \mathcal{E}^0(\mathcal{T}), \qquad \sigma \mapsto \big(\sigma,\ \nabla_a\sigma,\ -\tfrac{1}{n}(\Delta\sigma + J\sigma)\big). \]

This is called a differential splitting operator since through the jet projections there is a conformally invariant surjection $X : \mathcal{E}(\mathcal{T}) \to \mathcal{E}[1]$ which inverts $\mathbb{D}$. There is also a differential splitting operator

\[ E : \mathcal{E}^{1,1}[1] \to \mathcal{E}^1(\mathcal{T}), \qquad \psi_{ab} \mapsto \big(0,\ \psi_{ab},\ -(n-1)^{-1}\nabla^b\psi_{ab}\big) \]

(cf. [20]). An easy calculation verifies that this also is conformally invariant. We have the following.


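Proposition 4.1. With $\nabla$ denoting the tractor connection on $\mathcal{E}^0(\mathcal{T})$ we have

\[ \nabla\mathbb{D} = EP \]

as differential operators on $\mathcal{E}[1]$. For $\sigma \in \mathcal{E}[1]$, $\mathbb{D}\sigma$ is parallel if and only if $P\sigma = 0$.

Proof. The second statement is immediate from the first. A straightforward calculation verifies that either composition applied to $\sigma \in \mathcal{E}[1]$ yields

\[ \begin{pmatrix} 0 \\ \mathrm{TF}(\nabla_a\nabla_b\sigma + P_{ab}\sigma) \\ -\frac{1}{n}\nabla_a(\Delta\sigma + J\sigma) - P_a{}^c\nabla_c\sigma \end{pmatrix}. \]

In fact if a section $I \in \mathcal{E}^0(\mathcal{T})$ is parallel then $I = \mathbb{D}\sigma$ for some $\sigma \in \mathcal{E}[1]$, so a conformal manifold with a parallel tractor is almost Einstein in the sense that it has a section of $E[1]$ that gives an Einstein scale on an open dense subset (see [28] for further details).

Since the tractor connection is orthogonal (for the conformally invariant tractor metric $h$ given above) the formal adjoints of the operators above give another commutative square of operators. That is, with

\begin{align*}
P^* &: \mathcal{E}_{1,1}[-1] \to \mathcal{E}_0[-1], & \varphi_{ab} &\mapsto \nabla^a\nabla^b\varphi_{ab} + P^{ab}\varphi_{ab}, \\
E^* &: \mathcal{E}_1(\mathcal{T}) \to \mathcal{E}_{1,1}[-1], & (\alpha_a, \nu_{ab}, \tau_a) &\mapsto \nu_{(ab)_0} + \tfrac{1}{n-1}\nabla_{(a}\alpha_{b)_0}, \\
\mathbb{D}^* &: \mathcal{E}_0(\mathcal{T}) \to \mathcal{E}_0[-1], & (\sigma, \mu_b, \rho) &\mapsto \rho - \nabla^a\mu_a - \tfrac{1}{n}(\Delta\sigma + J\sigma), \\
\delta^\nabla &: \mathcal{E}_1(\mathcal{T}) \to \mathcal{E}_0(\mathcal{T}), & \Phi_{aB} &\mapsto -\nabla^a\Phi_{aB},
\end{align*}

where $\mathcal{E}_{1,1}$ denotes the space of sections of $E^{1,1} \otimes E[4-n]$, we have $\mathbb{D}^*\delta^\nabla = P^*E^*$ on $\mathcal{E}_1(\mathcal{T})$.

Finally observe that the curvature of the tractor connection, as calculated directly from (15), is

\[ \Omega_{ab}{}^C{}_D = \begin{pmatrix} 0 & 0 & 0 \\ A^c{}_{ab} & C_{ab}{}^c{}_d & 0 \\ 0 & -A_{dab} & 0 \end{pmatrix} \]

and hence (see e.g. [31] for further details),

\[ \nabla^a\Omega_{ab}{}^C{}_D = \begin{pmatrix} 0 & 0 & 0 \\ B^c{}_b & (n-4)A_b{}^c{}_d & 0 \\ 0 & -B_{db} & 0 \end{pmatrix} \tag{16} \]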

where, on the left-hand side, $\nabla$ is the Levi-Civita connection coupled with the tractor connection on $\mathrm{End}(\mathcal{T})$ induced from (15). Let us say that a pseudo-Riemannian manifold is semi-harmonic if its tractor curvature is Yang-Mills, that is $\nabla^a\Omega_{ab}{}^C{}_D = 0$. Note that in dimensions $n \neq 4$ this is not a conformally invariant condition and a semi-harmonic space is a Cotton space that is also Bach-flat. From our observations above, the semi-harmonic condition is conformally invariant in dimension 4 and according to the last display we have the following result.

Lemma 4.2. In dimension 4 the tractor connection (15) is a Yang-Mills connection if and only if the structure is Bach-flat.

This result is not new and equivalent observations have been known in the literature for some time [4,37,39]. It brings us to the following. Let us write $M^{\mathcal{T}}$ for the composition $E^*M^\nabla E$. On $h \in \mathcal{E}^{1,1}[1]$ we have

\[ (M^{\mathcal{T}}h)_{ab} = -\mathrm{TFS}\Big( \nabla^c(\nabla_c h_{ab} - \nabla_a h_{cb}) - \frac{1}{n-1}\nabla_a\nabla^c h_{bc} + C_a{}^c{}_b{}^d h_{cd} \Big), \]

and the following results.

Theorem 4.3. The sequence

\[ \mathcal{E}^0[1] \xrightarrow{\;P\;} \mathcal{E}^{1,1}[1] \xrightarrow{\;M^{\mathcal{T}}\;} \mathcal{E}_{1,1}[-1] \xrightarrow{\;P^*\;} \mathcal{E}_0[-1] \tag{17} \]

has the following properties:
(i) It is a formally self-adjoint sequence of differential operators and, for $\sigma \in \mathcal{E}^0[1]$,

\[ (M^{\mathcal{T}}P\sigma)_{ab} = -\mathrm{TFS}\big( B_{ab}\sigma - (n-4)A_{abc}\nabla^c\sigma \big), \tag{18} \]

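where $\mathrm{TFS}(\cdots)$ indicates the trace-free symmetric part of the tensor concerned. In particular it is a complex on semi-harmonic manifolds.
(ii) In the case of Riemannian signature the complex is elliptic.
(iii) In dimension 4, (17) is a sequence of conformally invariant operators and it is a complex if and only if the conformal structure is Bach-flat.

Proof. Setting $\mathcal{D} = P$, $L_0 = \mathbb{D}$, $L_1 = E$ we have the situation of the translation diagram [D] above, with the right square given by formal adjoints of these operators, and the tractor bundle connection pair $(\mathcal{T}, \nabla)$ used for $(V, D)$ in the top row. That is:

\[
\begin{array}{ccccccc}
\mathcal{E}^0(\mathcal{T}) & \xrightarrow{\;d^\nabla\;} & \mathcal{E}^1(\mathcal{T}) & \xrightarrow{\;M^\nabla\;} & \mathcal{E}_1(\mathcal{T}) & \xrightarrow{\;\delta^\nabla\;} & \mathcal{E}_0(\mathcal{T}) \\
{\scriptstyle \mathbb{D}}\big\uparrow & & {\scriptstyle E}\big\uparrow & & \big\downarrow{\scriptstyle E^*} & & \big\downarrow{\scriptstyle \mathbb{D}^*} \\
\mathcal{E}^0[1] & \xrightarrow{\;P\;} & \mathcal{E}^{1,1}[1] & \xrightarrow{\;M^{\mathcal{T}}\;} & \mathcal{E}_{1,1}[-1] & \xrightarrow{\;P^*\;} & \mathcal{E}_0[-1]
\end{array}
\]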

By construction the lower sequence (17) is formally self-adjoint, and in dimension 4 conformally invariant. If the structure is semi-harmonic then the upper sequence is a complex and hence, from the commutativity of the diagram, (17) is a complex. In particular on Bach-flat 4-manifolds we obtain a complex. On the other hand, from (18) it follows that in dimension 4 we obtain a complex only if the structure is Bach-flat.


From (15) we calculate $d^\nabla$ on the range of $E$ to obtain

\[ d^\nabla \begin{pmatrix} 0 \\ \nu \\ \tau \end{pmatrix} = \begin{pmatrix} 0 \\ Q\nu \\ * \end{pmatrix} \tag{19} \]

where, for $\nu \in \mathcal{E}^{1,1}[1]$, we have $\tau_a = -\frac{1}{n-1}\nabla^b\nu_{ab}$, $Q$ is given by

\[ (Q\nu)_{abc} = 2\nabla_{[a}\nu_{b]c} + 2\boldsymbol{g}_{c[a}\tau_{b]}, \]

and we do not need the details of the term indicated by $*$. It follows immediately from (19), and the formulae for the tractor metric, that we have

\[ M^{\mathcal{T}} := E^*M^\nabla E = Q^*Q + \mathrm{LOT}, \]

where $Q^*$ denotes the formal adjoint of the operator $Q$. In Riemannian signature the leading symbol of $Q^*Q$ has the same kernel as the leading symbol of $Q$, and it follows easily that the complex is elliptic. The “lower order terms” (indicated by LOT) in $M^{\mathcal{T}}$ arise simply from the tractor curvature in the formula for $M^D$ and amount to an action by the Weyl curvature. Including this yields the explicit formula for $M^{\mathcal{T}}$ given above the theorem. The expression (18) for the composition $M^{\mathcal{T}}P$ follows from this by a short direct calculation. (The calculation is even simpler if the result of Lemma 3.1 is imported.)

Remark. Note that the formula for $M^{\mathcal{T}}$ is closely related to, but not the same as, the operator which arises from deformations of Einstein structures (for the latter see e.g. [6] and references therein). It should be valuable to expose the geometric meaning of the first cohomology of the sequence (17).

Corollary 4.4. Einstein 4-manifolds are Bach-flat.

Proof. If a non-vanishing density $\sigma$ is an Einstein scale then, calculating in that scale, we have $M^{\mathcal{T}}P\sigma = -B\sigma$, where $B$ is the Bach tensor. On the other hand if $\sigma$ is an Einstein scale then $P\sigma = 0$ (see (12)).

Remarks. In fact, more generally, almost Einstein manifolds are also necessarily Bach-flat. Since an almost Einstein manifold has an Einstein scale on an open dense subspace, this follows by continuity of the Bach tensor. (The higher dimensional extension of this result is that even dimensional almost Einstein manifolds have vanishing Fefferman-Graham obstruction tensor, see e.g. [28] and references therein.) The result that Einstein metrics are Bach-flat is well-known by other means (see e.g. [38,31]). Nevertheless we feel the detour complex gives an interesting route to this.
In any dimension Einstein metrics satisfy $P_{ab} = \frac{1}{n}Jg_{ab}$ with $J$ constant, so it follows from the definitions of the Cotton tensor and the Bach tensor (2) that Einstein metrics are semi-harmonic. Thus there are many examples of semi-harmonic manifolds.

4.2. The twistor spinor complex. We assume here that we have a conformal spin structure. This is no restriction locally. For the purpose of being self-contained and having the results in a uniform notation we derive the basic spinor identities we require. An alternative treatment for many of these may be found in, e.g. [26]. We will use the spin-tractor connection below. This is often termed the local twistor connection [5,41]. The notation we use (and the basic tractor tools) follows [9] which presents a spin-tractor calculus developed by the first author and Branson. Following that source we write $S$ for the basic spinor bundle and $\overline{S} = S[-n]$ (i.e. the bundle that pairs globally in an invariant way with $S$ on conformal $n$-manifolds). Evidently the weight conventions here give $S$ a “neutral weight”. In terms of, for example, the Penrose weight conventions $S = E^\lambda[-\frac{1}{2}] = E_\lambda[\frac{1}{2}]$, where $E^\lambda$ denotes the basic contravariant spinor bundle in [41]. We write $\mathrm{Tw}$ for the so-called twistor bundle, that is the subbundle of $T^*M \otimes S[1/2]$ consisting of form spinors $u_a$ such that $\gamma^a u_a = 0$, where $\gamma_a$ is the usual Clifford symbol. We use $S$ and $\mathrm{Tw}$ also for the section spaces of these bundles. The twistor operator is the conformally invariant Stein-Weiss gradient $\mathbf{T} : S[1/2] \to \mathrm{Tw}$ given explicitly by

\[ \psi \mapsto \nabla_a\psi + \frac{1}{n}\gamma_a\gamma^b\nabla_b\psi. \]

The main result of this section is that this completes to a differential complex as follows.

Theorem 4.5. On semi-harmonic pseudo-Riemannian $n$-manifolds, $n \ge 4$, we have a differential complex

\[ S[1/2] \xrightarrow{\;\mathbf{T}\;} \mathrm{Tw} \xrightarrow{\;\mathbf{M}\;} \overline{\mathrm{Tw}} \xrightarrow{\;\mathbf{T}^*\;} \overline{S}[-1/2], \tag{20} \]

where $\mathbf{T}$ is the usual twistor operator, $\mathbf{T}^*$ its formal adjoint, and $\mathbf{M}$ is a third order operator given by the formula (26) below. The sequence is formally self-adjoint and in the case of Riemannian signature the complex is elliptic. In dimension 4 the sequence (20) is conformally invariant and it is a complex if and only if the conformal structure is Bach-flat.

Remarks. Of course on a fixed pseudo-Riemannian manifold we may ignore the conformal weights. Note also that, for example in dimension 4, under the chirality decomposition of this sequence, we get the two complexes

\[ S_\pm[1/2] \xrightarrow{\;\mathbf{T}\;} \mathrm{Tw}_\pm \xrightarrow{\;\mathbf{M}\;} \overline{\mathrm{Tw}}_\mp \xrightarrow{\;\mathbf{T}^*\;} \overline{S}_\mp[-1/2], \]

by the restriction of the operators $\mathbf{T}$, $\mathbf{T}^*$, and $\mathbf{M}$.

If we were to apply the construction below in dimension 3 then we would obtain a trivial operator $\mathbf{M}$. In this dimension the BGG sequence (see Sect. 2) takes the form (20), where the middle operator is of second order.

In the calculations which follow it will often be convenient to use abstract indices for the form bundles while at the same time not using any indices for the spinor bundles. We have already done this implicitly above, for example in the formula for the twistor operator which, in this notation, acts $\mathbf{T}_a : S[1/2] \to \mathrm{Tw}_a$. From the usual gamma matrices $\gamma^a$ satisfying $\gamma^a\gamma^b + \gamma^b\gamma^a = -2g^{ab}\,\mathrm{Id}$ we switch to the symbols $\beta := \gamma/\sqrt{2}$, so that

\[ \beta^a\beta^b + \beta^b\beta^a = -g^{ab}\,\mathrm{Id}; \tag{21} \]


this simplifies certain formulae in the following discussion. We denote the corresponding Dirac operator by $\slashed{D} := \beta^a\nabla_a$. Given a metric $g$ from the conformal class the spin-tractor bundle $\Sigma$ is given by

\[ \Sigma \stackrel{g}{\cong} S[1/2] \oplus S[-1/2]. \]

In the conformally related metric $\hat{g} = e^{2\omega}g$ we have a similar isomorphism and

\[ \widehat{\begin{pmatrix} \psi \\ \varphi \end{pmatrix}} = \begin{pmatrix} \psi \\ \varphi + \Upsilon_c\beta^c\psi \end{pmatrix}, \tag{22} \]

where $\Upsilon = d\omega$. In terms of the $g$-splitting the normal conformal spin-tractor connection is given by

\[ \nabla_a \begin{pmatrix} \psi \\ \varphi \end{pmatrix} = \begin{pmatrix} \nabla_a\psi + \beta_a\varphi \\ \nabla_a\varphi + P_{ab}\beta^b\psi \end{pmatrix}. \tag{23} \]

On the right side $\nabla$ means the usual Levi-Civita (spin) connection, while on the left the same notation is used for the spin-tractor connection. It is an easy exercise to verify directly that this is a conformally invariant connection. The normality follows from the characterisation of normal tractor connections (for irreducible parabolic geometries) given in Theorem 1.3 of [13].

The invariant pseudo-Hermitian form on spin-tractors is given by $\langle\varphi, \psi'\rangle + \langle\psi, \varphi'\rangle$ for a pair of spin-tractors $(\varphi, \psi)$, $(\varphi', \psi')$, where $\langle\cdot,\cdot\rangle$ is the usual Hermitian form on spinors (which is compatible with Clifford multiplication and is preserved by the Levi-Civita spin connection). It is readily verified that this is invariant under the transformations (22) and that it is preserved by the spin-tractor connection (23).

We subsequently calculate in a metric scale $g$ without further comment. We construct two differential splitting operators: $L_0 : S[1/2] \to \mathcal{E}^0(\Sigma)$ is given by

\[ \psi \mapsto \begin{pmatrix} \psi \\ \frac{2}{n}\slashed{D}\psi \end{pmatrix}; \tag{24} \]

$L_1 : \mathrm{Tw} \to \mathcal{E}^1(\Sigma)$ is given by

\[ \psi_a \mapsto \begin{pmatrix} \psi_a \\ \frac{2}{n-2}\big(\slashed{D}\psi_a - \frac{1}{n-1}\beta_a\nabla^b\psi_b\big) \end{pmatrix}. \tag{25} \]

It is a straightforward exercise to verify that these transform according to (22) and so are expressions, in the metric scale g, for conformally invariant operators. An essential feature of these operators is the following commutativity result.

Yang-Mills Detour Complexes


Proposition 4.6. With $\nabla$ denoting the spin-tractor connection on $\mathcal{E}^0(\Sigma)$ we have

$$d^\nabla L_0 = L_1 T,$$

as differential operators on S[1/2]. For ψ ∈ S[1/2], $L_0\psi$ is parallel if and only if $T\psi = 0$.

The last comment follows immediately from the commutativity of the square given that $L_1$ is a differential splitting operator (and so, in particular, $L_1\psi_a = 0 \Rightarrow \psi_a = 0$). A correspondence between parallel spin-tractors and twistor spinors dates back to [25]. The extra information in the proposition is that the operator $L_1$ is a conformally invariant tractor splitting operator. We shall postpone the proof of Proposition 4.6, as we prefer to first complete the proof of Theorem 4.5.

Suppose that for a metric $g$, from the conformal class, the tractor connection is semi-harmonic. Recall this is exactly the condition that the normal tractor connection is Yang-Mills. It follows immediately that the spin-tractor connection is also Yang-Mills, since this is induced from the same principal connection simply pulled back to the 2-1 covering Spin(p + 1, q + 1)-principal bundle. (Equivalently they arise from the same Cartan connection; this is the usual picture [14] and sufficient to see this result. From the Cartan picture one may easily extend to a principal bundle and connection from which the tractors are induced, and this is simply an alternative framework.) As observed above, in dimension 4 the Yang-Mills condition is exactly the condition that the metric (or conformal structure) is Bach-flat. Thus, from the proposition, the first part of Theorem 4.5 follows immediately from the commutative diagram below, where $M^\Sigma = L^1 M^\nabla L_1$,

$$
\begin{array}{ccccccc}
\mathcal{E}^0(\Sigma) & \xrightarrow{\;d^\nabla\;} & \mathcal{E}^1(\Sigma) & \xrightarrow{\;M^\nabla\;} & \mathcal{E}_1(\Sigma) & \xrightarrow{\;\delta^\nabla\;} & \mathcal{E}_0(\Sigma)\\[2pt]
\big\uparrow{\scriptstyle L_0} & & \big\uparrow{\scriptstyle L_1} & & \big\downarrow{\scriptstyle L^1} & & \big\downarrow{\scriptstyle L^0}\\[2pt]
S[1/2] & \xrightarrow{\;T\;} & \mathrm{Tw} & \xrightarrow{\;M^\Sigma\;} & \mathrm{Tw} & \xrightarrow{\;T^*\;} & S[-1/2]
\end{array}
\tag{26}
$$

Using that $\Sigma$ is a self-dual bundle, the operators in the square at the right end of the diagram are defined as the formal adjoints of the operators in the first square. So all squares commute and both horizontal sequences are formally self-adjoint.

To establish ellipticity we require the leading term of the operator $M^\Sigma$. Applying the spin-tractor twisted exterior derivative to $L_1 t$, for $t \in \mathrm{Tw}$, we obtain a result of the form

$$\begin{pmatrix} Kt\\ \ell\,DKt + m\,\beta\,\delta^\nabla Kt + \text{curvature}\end{pmatrix} \tag{27}$$

with $Kt$ given by

$$2\nabla_{a_1}t_{a_2} + \frac{4}{n-2}\,\beta_{a_1}\Big(Dt_{a_2} - \frac{1}{n-1}\,\beta_{a_2}\nabla^c t_c\Big),$$

where the indices $a_1a_2$ are implicitly skewed over, $\ell$ and $m$ are constants, and $\delta^\nabla$ is the spin-Levi-Civita connection twisted interior derivative. By construction $K$ is an invariant operator $K : \mathrm{Tw} \to \mathrm{Tw}_2$, where $\mathrm{Tw}_2$ is the subbundle of $\mathcal{E}^2 \otimes S[1/2]$ consisting of spin-forms annihilated by interior multiplication by β. It is a straightforward exercise (or one may use the BGG machinery of [15]) to construct a differential splitting operator $L_2 : \mathrm{Tw}_2 \to \mathcal{E}^2 \otimes \Sigma$. This has the form $s \mapsto (s,\ \ell\,Ds + m\,\beta\,\delta^\nabla s)$ (cf. $L_1$) where $\ell \neq 0$. On the other hand in the conformally flat case it follows easily, from the uniqueness of conformal differential operators, that $L_2K = d^\nabla L_1$. Thus we obtain the form of the bottom slot of (27).

Since the leading term of $M^\Sigma$ is obtained by composing $d^\nabla L_1$ with its formal adjoint, and $D$ is formally self-adjoint, it follows that the leading term of the operator $M^\Sigma$ is of the form $K^*\pi_2DK$. Here $K^*$ denotes the formal adjoint of $K$ and the projection $\pi_2$ is the projection of spinor-valued two-forms to the space $\mathrm{Tw}_2$. The operator $R_2 = \pi_2D$ on $\mathrm{Tw}_2$ is an elliptic and self-adjoint operator; it is a higher spin analogue of the Dirac operator. Similarly, if $\pi_1$ denotes the projection of spinor-valued one-forms to Tw, the operator $R_1 = \pi_1D$ is an elliptic self-adjoint operator on Tw (usually called the Rarita-Schwinger operator). Moreover, $R_1K^*$ is a multiple of $K^*R_2$. Hence the leading term of the operator $M^\Sigma$ is a multiple of $R_1K^*K$.

Since elements of $\mathrm{Tw}_2$ are annihilated by interior Clifford multiplication, it follows from the formula for $K$ that the symbol $\sigma_\xi(K^*)$ is simply interior multiplication by ξ, $\iota(\xi)$. Without loss of generality we may suppose that ξ is a unit vector. It is well known that the Rarita-Schwinger operator (on the flat space) can be composed with a third order constant coefficient operator to give the square of the Laplace operator. Hence the symbol of $R_1$ can be multiplied on the left to get $|\xi|^4 = 1$. By this left multiplication, $\sigma_\xi(K^*R_2K)t = \sigma_\xi(R_1K^*K)t = 0$ implies $\iota(\xi)\sigma_\xi(K)t = 0$. Now the explicit formula for $\sigma_\xi(K)t$ is

$$\varepsilon(\xi)t + \frac{2}{n-2}\,\varepsilon(\beta)\Big(\beta_\xi t - \frac{1}{n-1}\,\varepsilon(\beta)\iota(\xi)t\Big),$$

where $\varepsilon(\cdot)$ indicates exterior multiplication. Contracting with ξ and setting to zero we obtain ((n − 3) times)

$$(n-1)\,t = n\,\varepsilon(\xi)\iota(\xi)t + 2\,\varepsilon(\beta)\beta_\xi\,\iota(\xi)t.$$
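The contraction step can be checked by hand. The sketch below is not part of the original argument; it assumes the conventions $(\varepsilon(\xi)t)_{a_1a_2} = \xi_{a_1}t_{a_2} - \xi_{a_2}t_{a_1}$ and $\beta_\xi := \beta^a\xi_a$ (our reading of the notation), together with $|\xi| = 1$ and $\beta_\xi^2 = -\tfrac12$ from the Clifford relation (21).

```latex
% Assumed conventions (a reading of the notation, not stated in the text):
%   (\varepsilon(\xi)t)_{a_1 a_2} = \xi_{a_1} t_{a_2} - \xi_{a_2} t_{a_1},
%   \beta_\xi = \beta^a \xi_a,  |\xi| = 1,  \beta_\xi^2 = -1/2.
\begin{align*}
\iota(\xi)\varepsilon(\xi)t &= t - \varepsilon(\xi)\iota(\xi)t,\\
\iota(\xi)\varepsilon(\beta)s &= \beta_\xi s - \varepsilon(\beta)\iota(\xi)s
  \quad\text{for a spinor-valued one-form } s,\\
s &:= \beta_\xi t - \tfrac{1}{n-1}\,\varepsilon(\beta)\iota(\xi)t,\\
\beta_\xi s &= -\tfrac12\,t
  + \tfrac{1}{n-1}\bigl(\varepsilon(\beta)\beta_\xi + \varepsilon(\xi)\bigr)\iota(\xi)t,\\
\iota(\xi)s &= \tfrac{n-2}{n-1}\,\beta_\xi\,\iota(\xi)t,\\
\iota(\xi)\sigma_\xi(K)t &= \tfrac{n-3}{(n-2)(n-1)}
  \Bigl[(n-1)\,t - n\,\varepsilon(\xi)\iota(\xi)t
        - 2\,\varepsilon(\beta)\beta_\xi\,\iota(\xi)t\Bigr].
\end{align*}
```

Under these conventions the bracket vanishes exactly when $(n-1)t = n\,\varepsilon(\xi)\iota(\xi)t + 2\,\varepsilon(\beta)\beta_\xi\iota(\xi)t$, which is the displayed identity up to the non-zero factor for $n \neq 3$.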

The right hand side here is a multiple of $\sigma_\xi(T)\iota(\xi)t$. Thus $t$ is in the range of $\sigma_\xi(T)$, as required.

Completing the proof and remarks. From (16), (24), and Lemma (3.1) it follows easily that in dimension 4 the composition $M^\Sigma T$ on ϕ, a section of S[1/2], is, up to a non-zero multiple, a Clifford multiplication of the Bach tensor $B_{ab}\beta^b\varphi$. Thus the formally self-adjoint sequence (4.5) is a complex if and only if the structure is Bach-flat, as claimed in the theorem. If ϕ is a twistor spinor (i.e. $T\varphi = 0$) then this Clifford action of the Bach tensor on ϕ obviously vanishes. In Riemannian signature it is in fact straightforward to recover a parallel standard tractor from the parallel spin-tractor corresponding to a twistor spinor. Thus Riemannian manifolds admitting a twistor spinor are almost Einstein and so Bach-flat. In fact the last conclusion here is well-known [5].

Proof of Proposition 4.6. Let ψ be a section of S[1/2]. Using the formula (24) for the splitting operator and the expression (23) for the spin-tractor connection we have

$$\nabla_aL_0\psi = \nabla_a\begin{pmatrix}\psi\\ \frac{2}{n}D\psi\end{pmatrix} = \begin{pmatrix}\nabla_a\psi + \frac{2}{n}\beta_aD\psi\\ \frac{2}{n}\nabla_aD\psi + P_{ab}\beta^b\psi\end{pmatrix}.$$

Recalling that $\gamma = \sqrt{2}\,\beta$ and that $D = \beta^a\nabla_a$, we thus have

$$\nabla_aL_0\psi = \begin{pmatrix}T_a\psi\\ \frac{2}{n}\nabla_aD\psi + P_{ab}\beta^b\psi\end{pmatrix}.$$

From (25) it is clear that it remains to show that

$$\frac{2}{n}\nabla_aD\psi + P_{ab}\beta^b\psi = \frac{2}{n-2}\Big(DT_a\psi - \frac{1}{n-1}\beta_a\nabla^bT_b\psi\Big). \tag{28}$$

Let us note some simpler identities. First for the Levi-Civita spin-connection the curvature on a spinor ϕ is given by

$$[\nabla_a,\nabla_b]\varphi = -\frac12R_{abcd}\beta^c\beta^d\varphi,$$

where $R$ is the usual Riemannian curvature. Then from the Bianchi identities and the Clifford relation (21) we get

$$R_{abcd}\beta^b\beta^c\beta^d = \mathrm{Ric}_{ab}\beta^b, \qquad R_{abcd}\beta^a\beta^b\beta^c\beta^d = -\mathrm{Sc}/2. \tag{29}$$

Next an elementary calculation shows that $T^*$, the formal adjoint of $T$, is given on Tw by $\varphi_a \mapsto -\nabla^a\varphi_a$. Thus, for ψ in S[1/2],

$$-T^*T\psi = \Delta\psi + \frac{2}{n}D^2\psi,$$

(where $\Delta := \nabla^a\nabla_a$) since the spin-connection preserves the Clifford symbols. On the other hand since $D^2 = \beta^a\nabla_a\beta^b\nabla_b = \beta^a\beta^b\nabla_a\nabla_b$ and the βs anti-commute up to a trace, as in (21), while the ∇s commute up to curvature we obtain

$$D^2\psi = -\frac12\Delta\psi - \frac14R_{abcd}\beta^a\beta^b\beta^c\beta^d\psi,$$

and so using (29) we come to

$$\Delta\psi = -2D^2\psi + \frac14\mathrm{Sc}\cdot\psi.$$

This with the expression above for $T^*T$ gives

$$-T^*T\psi = 2\,\frac{1-n}{n}\,D^2\psi + \frac14\mathrm{Sc}\cdot\psi. \tag{30}$$

We are now ready to calculate the left-hand side of (28). Applying $\beta^b\nabla_b$ to the defining identity $\nabla_a\psi = -\frac{2}{n}\beta_aD\psi + T_a\psi$ we get

$$\beta^b\nabla_b\nabla_a\psi = -\frac{2}{n}\beta^b\beta_a\nabla_bD\psi + \beta^b\nabla_bT_a\psi.$$

Commuting the derivatives on the left and writing $D$ as a shorthand for $\beta^b\nabla_b$ (applied to e.g. $T\psi$), we obtain

$$\nabla_aD\psi + \frac12R_{abcd}\beta^b\beta^c\beta^d\psi = \frac{2}{n}\nabla_aD\psi + \frac{2}{n}\beta_aD^2\psi + DT_a\psi.$$
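The two rearrangements behind this step can be written out explicitly. This is a sketch of the intermediate algebra, using only the Clifford relation (21), the curvature formula for the spin connection quoted above, and the fact that the spin connection preserves the symbols β; it is not taken from the original text.

```latex
\begin{align*}
\beta^b\nabla_b\nabla_a\psi
  &= \nabla_a(\beta^b\nabla_b\psi) + \beta^b[\nabla_b,\nabla_a]\psi
   = \nabla_a D\psi + \tfrac12 R_{abcd}\,\beta^b\beta^c\beta^d\psi,\\
-\tfrac{2}{n}\,\beta^b\beta_a\nabla_b D\psi
  &= \tfrac{2}{n}\,\bigl(\beta_a\beta^b + \delta_a{}^b\bigr)\nabla_b D\psi
   = \tfrac{2}{n}\,\beta_a D^2\psi + \tfrac{2}{n}\,\nabla_a D\psi .
\end{align*}
```

The first line uses $[\nabla_b,\nabla_a]\psi = -\tfrac12R_{bacd}\beta^c\beta^d\psi$ and the antisymmetry of $R$ in its first pair of indices; the second uses $\beta^b\beta_a = -\beta_a\beta^b - \delta_a{}^b$.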


Next rearranging the terms and using (29) gives

$$\frac{n-2}{n}\nabla_aD\psi = \frac{2}{n}\beta_aD^2\psi - \frac12\mathrm{Ric}_{ab}\beta^b\psi + DT_a\psi.$$

Using now the identity (30) from above to substitute for $D^2\psi$ yields

$$\frac{n-2}{n}\nabla_aD\psi = -\frac12\Big(\mathrm{Ric}_{ab} - \frac{1}{2(n-1)}\mathrm{Sc}\,g_{ab}\Big)\beta^b\psi + DT_a\psi + \frac{1}{n-1}\beta_aT^*T\psi.$$

But multiplying this through with 2/(n − 2) and using that the Schouten tensor is $P_{ab} = \frac{1}{n-2}\big(\mathrm{Ric}_{ab} - \frac{1}{2(n-1)}\mathrm{Sc}\,g_{ab}\big)$, and once again that $T^*\varphi = -\nabla^a\varphi_a$, this gives exactly the expression (28), which is thus seen to be an identity.

Acknowledgements. ARG would like to thank the Royal Society of New Zealand for support via Marsden Grants no. 02-UOA-108 and 06-UOA-029. Part of the work was prepared during a visit of ARG supported by the E. Čech Center. PS and VS acknowledge the support of the grant GA ČR 201/05/2117 and the grant MSM 021620839. The first author would like to thank Helga Baum, Kengo Hirachi, Paul-Andi Nagy and Andrew Waldron for illuminating discussions.

References 1. Atiyah, M.F., Hitchin, N., Singer, I.M.: Self-duality in four-dimensional Riemannian geometry. Proc. Roy. Soc. London Ser A 362, 425–461 (1978) 2. Bach, R.: Zur Weylschen Relativitätstheorie und der Weylschen Erweiterung des Krümmungstensorbegriffs. Math. Z. 9, 110–135 (1921) 3. Bailey, T.N., Eastwood, M.G., Gover, A.R.: Thomas’s structure bundle for conformal, projective and related structures. Rocky Mountain J. Math. 24, 1191–1217 (1994) 4. Baston, R.J., Mason, L.J.: The conformal Einstein equations, In: Further advances in twistor theory: Volume II: Integrable systems, conformal geometry and gravitation, edited by L.J. Mason, L.P. Hughston, P.Z. Kobak, Essex: Longman, 1995 5. Baum, H., Friedrich, T., Grunewald, R., Kath, I.: Twistors and Killing spinors on Riemannian manifolds, Teubner-Texte zur Mathematik, 124. Stuttgart: B.G. Teubner Verlagsgesellschaft mbH, 1991 6. Besse, A.L.: Einstein manifolds. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), 10. Berlin: Springer-Verlag, 1987 7. Boe, B.D., Collingwood, D.H.: A comparison theory for the structure of induced representations, II. Math. Z. 190, 1–11 (1985) 8. Branson, T.: Sharp inequalities, the functional determinant, and the complementary series. Trans. Amer. Math. Soc. 347, 3671–3742 (1995) 9. Branson, T.: Conformal Structure and spin geometry. In: Dirac operators: yesterday and today. Proceedings of the Summer School and Workshop held in Beirut, August 27–September 7, 2001, edited by J.-P. Bourguignon, T. Branson, A. Chamseddine, O. Hijazi, R.J. Stanton. Somerville, MA: International Press, 2005 10. Branson, T.: Q-curvature and spectral invariants. Rend. Circ. Mat. Palermo (2) Suppl. No. 75, 11–55 (2005) 11. Branson, T., Gover, A.R.: Conformally invariant operators, differential forms, cohomology and a generalisation of Q-curvature. Comm. Part. Differ. Equs. 30, 1611–1669 (2005) 12. Branson, T., Gover, A.R.: The conformal deformation detour complex for the obstruction tensor. Proc. Amer. 
Math. Soc. 135, 2961–2965 (2007) 13. Čap, A., Gover, A.R.: Tractor bundles for irreducible parabolic geometries. In: Global analysis and harmonic analysis (Marseille-Luminy, 1999), Sémin. Congr. 4, Paris: Soc. Math. France, 2000, pp. 129–154 14. Čap, A., Gover, A.R.: Tractor calculi for parabolic geometries. Trans. Amer. Math. Soc. 354, 1511–1548 (2002) 15. Čap, A., Slovák, J., Souček, V.: Bernstein-Gelfand-Gelfand sequences. Ann. Math. 154, 97–113 (2001) 16. Čap, A., Souček, V.: Subcomplexes in Curved BGG-Sequences. http://arxiv.org/list/math.DG/0508534, 2005


17. Cartan, E.: Les espaces à connexion conforme. Ann. Soc. Pol. Math. 2, 171–202 (1923) 18. Chang, S.-Y.A.: Non-linear elliptic equations in conformal geometry. Zurich Lectures in Advanced Mathematics, Zürich: European Mathematical Society, 2004 19. Donaldson, S.K.: Floer homology groups in Yang-Mills theory. Cambridge Tracts in Mathematics 147. Cambridge: Cambridge University Press, 2002 20. Eastwood, M.: Notes on conformal differential geometry. Rend. Circ. Mat. Palermo (2) Suppl. No. 43, 57–76 (1996) 21. Eastwood, M.G., Rice, J.W.: Conformally invariant differential operators on Minkowski space and their curved analogues. Commun. Math. Phys. 109, 207–228 (1987); Erratum: Commun. Math. Phys. 144, 213 (1992) 22. Eastwood, M.G., Slovák, J.: Semiholonomic Verma modules. J. Alg. 197, 424–448 (1997) 23. Fefferman, C., Graham, C.R.: Conformal invariants. In: Elie Cartan et les mathématiques d’aujourd’hui, Astérisque 95–116, hors série (Paris: SMF, 1985) 24. Fefferman, C., Graham, C.R.: Q-curvature and Poincaré metrics. Math. Res. Lett. 9, 139–151 (2002) 25. Friedrich, T.: On the conformal relation between twistors and Killing spinors. Rend. Circ. Mat. Palermo (2) Suppl. No. 2, 59–75 (1989) 26. Friedrich, T.: Dirac-Operatoren in der Riemannschen Geometrie, Mit einem Ausblick auf die SeibergWitten-Theorie. Advanced Lectures in Mathematics. Braunschweig: Friedr. Vieweg & Sohn, 1997 27. Gover, A.R.: Aspects of parabolic invariant theory. Rend. Circ. Mat. Palermo (2) Suppl. No. 59, 25–47 (1999) 28. Gover, A.R.: Almost conformally Einstein manifolds and obstructions. In: Differential geometry and its applications, Prague: Matfyzpress, 2005, pp. 247–260 29. Gover, A.R.: Laplacian operators and Q-curvature on conformally Einstein manifolds. Math. Ann. 336, 311–334 (2006) 30. Gover, A.R., Leitner, F.: A sub-product construction of Poincare-Einstein metrics. http://arxiv.org/list/ math.DG/0608044, 2006 31. 
Gover, A.R., Nurowski, P.: Obstructions to conformally Einstein metrics in n dimensions. J. Geom. Phys. 56, 450–484 (2006) 32. Gover, A.R., Peterson, L.J.: Conformally invariant powers of the Laplacian, Q-curvature, and tractor calculus. Commun. Math. Phys. 235, 339–378 (2003) 33. Gover, A.R., Šilhan, J.: The conformal Killing equation on forms – prolongations and applications, Diff. Geom. Applic., to appear. http://arxiv.org/list/math.DG/0601751, 2006 34. Graham, C.R., Hirachi, K.: The ambient obstruction tensor and Q-curvature. In: AdS/CFT correspondence: Einstein metrics and their conformal boundaries, IRMA Lect. Math. Theor. Phys. 8, Zürich: Eur. Math. Soc., 2005, pp. 59–71 35. Graham, C.R., Jenne, R., Mason, L., Sparling, G.: Conformally invariant powers of the Laplacian, I: existence. J. London Math. Soc. 46, 557–565 (1992) 36. Graham, C.R., Zworski, M.: Scattering matrix in conformal geometry. Invent. Math. 152, 89–118 (2003) 37. Korzynski, M., Lewandowski, J.: The normal Cartan connection and the Bach tensor. Class. Quant. Grav. 20, 3745–3764 (2003) 38. Kozameh, C., Newman, E.T, Tod, K.P.: Conformal Einstein Spaces. GRG 17, 343–352 (1985) 39. Merkulov, S.: A conformally invariant theory of gravitation and electromagnetism. Class. Quant. Grav. 1, 349–354 (1984) 40. Lee, J.M., Parker, T.H.: The Yamabe problem. Bull. Amer. Math. Soc. 17, 37–91 (1987) 41. Penrose, R., Rindler, W.: Wolfgang, Spinors and space-time. Vol. 1 and Vol. 2. Spinor and twistor methods in space-time geometry, Cambridge Monographs on Mathematical Physics, Cambridge: Cambridge University Press, 1987, 1988 Communicated by A. Connes

Commun. Math. Phys. 278, 329–384 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0406-0

Communications in

Mathematical Physics

Wavepacket Preservation Under Nonlinear Evolution A. Babin, A. Figotin Department of Mathematics, University of California at Irvine, Irvine, CA 92697, USA. E-mail: [email protected]; [email protected] Received: 27 July 2006 / Accepted: 24 August 2007 Published online: 8 January 2008 – © Springer-Verlag 2007

Abstract: We study nonlinear systems of hyperbolic PDE's in $\mathbb{R}^d$, where hyperbolicity is understood in a wider sense, namely multiple roots of the characteristic equation are allowed and dispersive equations are permitted. They describe wave propagation in dispersive nonlinear media such as, for example, electromagnetic waves in nonlinear photonic crystals. The initial data is assumed to be a finite sum of wavepackets referred to as a multi-wavepacket. The wavepackets and the medium nonlinearity are characterized by two principal small parameters β and ϱ where: (i) $1/\beta$ is a factor describing spatial extension of involved wavepackets; (ii) $1/\varrho$ is a factor describing the relative magnitude of the linear part of the evolution equation compared to its nonlinearity. A key element in our approach is a proper definition of a wavepacket. Remarkably, the introduced definition has a flexibility sufficient for a wavepacket to preserve its defining properties under a general nonlinear evolution for long times. In particular, the corresponding wave vectors and the band numbers of involved wavepackets are "conserved quantities". We also prove that the evolution of a multi-wavepacket is described with high accuracy by a properly constructed system of envelope equations with a universal nonlinearity. The universal nonlinearity is obtained by a time averaging applied to the original nonlinearity; in simpler cases the averaged system turns into a system of Nonlinear Schrödinger equations.

1. Introduction The underlying physical subject of this work is propagation of a multi-wavepacket (a finite system of wavepackets) in a spatially dispersive and nonlinear medium, and we are particularly interested in electromagnetic waves propagation in nonlinear photonic crystals, see [4–7,55,56,58] and references therein, with the nonlinear optics constitutive relations, [12,15, Sects. 1,2, 42,48]. The mathematical subject of interest is the following general nonlinear evolutionary system

$$\partial_\tau U = -\frac{i}{\varrho}L(-i\nabla)U + F(U), \qquad U(r,\tau)|_{\tau=0} = h(r), \quad r \in \mathbb{R}^d, \tag{1}$$

where (i) $U = U(r,\tau)$, $r \in \mathbb{R}^d$, $U \in \mathbb{C}^{2J}$ is a 2J dimensional vector; (ii) $L(-i\nabla)$ is a linear self-adjoint differential (pseudodifferential) operator with constant coefficients with the symbol $L(k)$, which is a Hermitian 2J × 2J matrix; (iii) $F$ is a polynomial nonlinearity such that $F(0) = 0$, $F'(0) = 0$ and $F(U)$ is translation-invariant, i.e. if $T_aU(r) = U(r+a)$ for $a \in \mathbb{R}^d$ then $F(T_aU) = T_aF(U)$; (iv) $h = h(r)$ is assumed to be the sum of a finite number of wavepackets $h_l$, $l = 1,\dots,N$; (v) $\varrho > 0$ is a small parameter. In the case of nonlinear photonic crystals the components of the vector field $U(r)$ are the modal amplitudes of the electromagnetic field and the nonlinearity $F(U)$ is constructed from the nonlinear medium polarization in the adiabatic approximation, [15, Sects. 2.4.2]. The systems of the form (1) also describe as a particular case well-known equations, namely: complexification of the Nonlinear Schrödinger equation; coupled envelope equations which arise in nonlinear birefringent optical media, [41, Sect. 2i]; nonlinear Klein-Gordon and Sine-Gordon equations [61, Sect. 14.1], [43, Sect. 5.8.3], [44, Sect. 9.6]. Such equations appear in a number of physical problems: elementary particles, dislocations in crystals, propagation of Bloch's domain walls in the theory of ferromagnetism, self-induced transparency in nonlinear optics, the propagation of magnetic flux quanta in long Josephson transmission lines. Significance and importance of wavepacket solutions from both physical and mathematical points of view is discussed in [4–7], [41, Sect. 2], [55,58]. There are numerous problems involving small parameters only in the initial data which can be reduced to the form (1), for instance, problems with high frequency initial data or small initial data with consequent evolution on long time intervals (see Sect. 3 for details). We study the nonlinear evolution equation (1) on a finite time interval

$$0 \le \tau \le \tau_*, \quad\text{where } \tau_* > 0 \text{ is a fixed number.} \tag{2}$$

The time $\tau_*$ may depend on the $L^\infty$ norm of the initial data $h$ but, importantly, $\tau_*$ does not depend on ϱ. We consider classes of initial data such that wave evolution governed by (1) is significantly nonlinear on the time interval $[0,\tau_*]$ and the effect of the nonlinearity $F(U)$ does not vanish as $\varrho \to 0$. Since both the linear operator $L(-i\nabla)$ and the nonlinearity $F(U)$ are translation invariant, it is natural and convenient to recast the evolution equation (1) by applying to it the Fourier transform with respect to the space variables $r$, namely

$$\partial_\tau\hat U(k) = -\frac{i}{\varrho}L(k)\hat U(k) + \hat F(\hat U)(k), \qquad \hat U(k)\big|_{\tau=0} = \hat h(k), \tag{3}$$

where $\hat U(k)$ is the Fourier transform of $U(r)$, i.e.

$$\hat U(k) = \int_{\mathbb{R}^d}U(r)\,e^{-ir\cdot k}\,dr, \qquad U(r) = (2\pi)^{-d}\int_{\mathbb{R}^d}\hat U(k)\,e^{ir\cdot k}\,dk, \quad\text{where } r,k \in \mathbb{R}^d, \tag{4}$$

and $\hat F$ is the Fourier form of the nonlinear operator $F(U)$ involving convolutions. The nonlinear evolution equations (1), (3) are commonly interpreted as describing wave propagation in a nonlinear medium. We assume that the linear part $L(k)$ is a 2J × 2J Hermitian matrix with eigenvalues $\omega_{n,\zeta}(k)$ and eigenvectors $g_{n,\zeta}(k)$ satisfying

$$L(k)\,g_{n,\zeta}(k) = \omega_{n,\zeta}(k)\,g_{n,\zeta}(k), \quad \zeta = \pm, \quad \omega_{n,+}(k) \ge 0, \quad \omega_{n,-}(k) \le 0, \quad n = 1,\dots,J, \tag{5}$$


where ωn,ζ (k) are real-valued, continuous for all non-singular k functions, and vectors gn,ζ (k) ∈ C2J have unit length in the standard Euclidean norm. The functions ωn,ζ (k), n = 1, . . . , J , are called dispersion relations between the frequency ω and the wavevector k with n being the band number. We assume that the eigenvalues are naturally ordered by ω J,+ (k) ≥ . . . ≥ ω1,+ (k) ≥ 0 ≥ ω1,− (k) ≥ . . . ≥ ω J,− (k),

(6)

and for almost every k (with respect to the standard Lebesgue measure) the eigenvalues are distinct and, consequently, the above inequalities become strict. Importantly, we also assume the following diagonal symmetry condition ωn,−ζ (−k) = −ωn,ζ (k), ζ = ±, n = 1, . . . , J,

(7)

which is naturally present in many physical problems (see also Remark 14 below), and is a fundamental condition imposed on the matrix $L(k)$. In addition to that in many examples we also have

$$g_{n,\zeta}(k) = g^*_{n,-\zeta}(-k), \quad\text{where } z^* \text{ is complex conjugate to } z. \tag{8}$$

Very often we will use the following abbreviation:

$$\omega_{n,+}(k) = \omega_n(k). \tag{9}$$

From (7) we obtain

$$\omega_{n,-}(k) = -\omega_n(-k), \qquad \omega_{n,\zeta}(k) = \zeta\,\omega_n(\zeta k), \quad \zeta = \pm. \tag{10}$$

We also will often use the orthogonal projection $\Pi_{n,\zeta}(k)$ in $\mathbb{C}^{2J}$ onto the complex line defined by the eigenvector $g_{n,\zeta}(k)$, namely

$$\Pi_{n,\zeta}(k)\,\hat u(k) = \tilde u_{n,\zeta}(k)\,g_{n,\zeta}(k) = \hat u_{n,\zeta}(k), \quad n = 1,\dots,J, \quad \zeta = \pm. \tag{11}$$

As indicated by the title of this paper, we study the nonlinear problem (1) for initial data $\hat h$ in the form of a properly defined wavepacket or, more generally, a sum of wavepackets which we refer to as a multi-wavepacket. The simplest example of a wavepacket $w$ is provided by the following formula:

$$w(r,\beta) = \Phi_+(\beta r)\,e^{ik_*\cdot r}\,g_{n,+}(k_*), \quad r \in \mathbb{R}^d, \tag{12}$$

where $k_* \in \mathbb{R}^d$ is a wavepacket wave vector, $n$ is the band number, and $\beta > 0$ is a small parameter. We refer to the pair $(n,k_*)$ in (12) as the wavepacket nk-pair. Observe that the space extension of the wavepacket $w(r,\beta)$ is proportional to $\beta^{-1}$ and it is large for small β. Notice also that if $\beta \to 0$ the wavepacket $w(r,\beta)$ as in (12) tends, up to a constant factor, to the elementary eigenmode $e^{ik_*\cdot r}g_{n,\zeta}(k_*)$ of the operator $L(-i\nabla)$ with the corresponding eigenvalue $\omega_{n,\zeta}(k_*)$. We refer to wavepackets of the simple form (12) as simple wavepackets to underline the very special way the parameter β enters its representation. The function $\Phi_\zeta(r)$, which we call the wavepacket envelope, describes its shape and it can be any scalar complex-valued regular enough function, for example a function from Schwartz space. Importantly, as $\beta \to 0$ the $L^\infty$ norm of a wavepacket (12) remains constant, and, hence, nonlinear effects in (1) remain strong. Evolution of wavepackets in problems which can be reduced to the form (1) were studied for a variety of equations in numerous physical and mathematical papers, mostly


by asymptotic expansions with respect to a single small parameter similar to β, see [10,13,18,20,23,29,30,38,47,50,51] and references therein. We are interested in general properties of evolutionary systems of the form (1) with wavepacket initial data which hold for a wide class of nonlinearities and all values of the space dimension $d$ and of the number 2J of the system components. Our approach is not based on asymptotic expansions but involves the two small parameters β and ϱ with mild constraints on their relative smallness. The constraints can be expressed either in the form of certain inequalities or equalities, and a possible simple form of such a constraint can be a power law

$$\beta = C\varrho^{\kappa}, \quad\text{where } C > 0 \text{ and } \kappa > 0 \text{ are arbitrary constants.} \tag{13}$$

Of course, general features of wavepacket evolution are independent of particular values of the constant $C$. In addition to that, some fundamental properties such as wavepacket invariance, are also totally independent of the particular choice of the values of κ in (13), whereas other properties are independent of κ as it varies in certain intervals. For instance, dispersion effects are dominant for κ < 1/2, whereas the wavepacket superposition principle of [7] holds for κ < 1.

The qualitative picture of wavepacket evolution dependence on small β and ϱ is as follows. The parameter β enters problem (1) through the multi-wavepacket initial data $h(r,\beta)$, whereas ϱ enters it through the factor $1/\varrho$ before the linear part. Evidently the factor $1/\varrho$ determines the relative magnitude of the linear part compared to the nonlinearity and since $1/\varrho$ is large, one expects the linear part to provide an important input into solutions properties. This input includes, in particular, a key role of eigenmodes and eigenfrequencies (dispersion relations) in expressing the nonlinear evolution. Importantly, in many cases of interest though $1/\varrho$ is large, nonlinear phenomena are significant and this is the case when $\beta \le C\varrho^{1/2}$. More precisely, if $\beta \le C\varrho^{1/2}$ then, as in the case of finite-dimensional nonlinear ODE evolutionary systems, the large values of $1/\varrho$ lead to a well defined solution factorization into the fast (high frequency) and the slow (low frequency) components. The interplay between the fast and slow components is also similar to the ODE case, namely, the nonlinear evolution is associated primarily with the slow component governed by a nonlinear equation obtained from the original one by a certain canonical time averaging procedure. Our further analysis of the above mentioned interplay shows the following. Firstly, the linear superposition principle holds, [7], that is if κ < 1 is as in (13) and the initial data is a sum of generic wavepackets then the solution is the sum of the solutions for single involved wavepackets with precision $\varrho/\beta^{1+\epsilon}$ with arbitrary small ε. Secondly, properly defined wavepackets and their linear combinations are preserved under the nonlinear evolution (1), which is a subject of this paper.

In the light of the above discussion we introduce the slow variable $\hat u(k,\tau)$ by the formula

$$\hat U(k,\tau) = e^{-\frac{i\tau}{\varrho}L(k)}\,\hat u(k,\tau), \tag{14}$$

and recast Eq. (3) for it as follows:

$$\partial_\tau\hat u = e^{\frac{i\tau}{\varrho}L}\,\hat F\Big(e^{-\frac{i\tau}{\varrho}L}\hat u\Big), \qquad \hat u\big|_{\tau=0} = \hat h. \tag{15}$$
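The passage from the Fourier-transformed equation (3) to the slow-variable equation is a one-line substitution. The sketch below writes ϱ for the small parameter multiplying the linear part and treats $L = L(k)$ as a constant Hermitian matrix for each fixed $k$; both are our reading of the surrounding text.

```latex
% Sketch: substitute \hat U = e^{-\frac{i\tau}{\varrho}L}\hat u into (3).
\begin{align*}
\partial_\tau\hat U
  &= -\tfrac{i}{\varrho}\,L\,e^{-\frac{i\tau}{\varrho}L}\hat u
     + e^{-\frac{i\tau}{\varrho}L}\,\partial_\tau\hat u
   = -\tfrac{i}{\varrho}\,L\,\hat U
     + e^{-\frac{i\tau}{\varrho}L}\,\partial_\tau\hat u,\\
\text{so by (3)}\qquad
e^{-\frac{i\tau}{\varrho}L}\,\partial_\tau\hat u
  &= \hat F\bigl(e^{-\frac{i\tau}{\varrho}L}\hat u\bigr)
\quad\Longrightarrow\quad
\partial_\tau\hat u
   = e^{\frac{i\tau}{\varrho}L}\,
     \hat F\bigl(e^{-\frac{i\tau}{\varrho}L}\hat u\bigr).
\end{align*}
```

The stiff linear term $-\tfrac{i}{\varrho}L\hat U$ cancels exactly, so the large factor $1/\varrho$ survives only inside the oscillating exponentials; the initial condition is unchanged because the exponential equals the identity at $\tau = 0$.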

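Then we obtain an integral form of (15) by integrating it with respect to τ:

$$\hat u = \mathcal{F}(\hat u) + \hat h, \qquad \mathcal{F}(\hat u) = \mathcal{F}(\varrho)(\hat u) = \int_0^{\tau}e^{\frac{i\tau'}{\varrho}L}\,\hat F\Big(e^{-\frac{i\tau'}{\varrho}L}\hat u(\tau')\Big)\,d\tau', \tag{16}$$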


with an explicitly defined nonlinear polynomial integral operator $\mathcal{F} = \mathcal{F}(\varrho)$, which depends on the parameter ϱ. This operator is bounded uniformly with respect to ϱ in the Banach space $E = C([0,\tau_*],L^1)$ of functions $\hat v(k,\tau)$, $0 \le \tau \le \tau_*$, with the norm

$$\|\hat v(k,\tau)\|_E = \|\hat v(k,\tau)\|_{C([0,\tau_*],L^1)} = \sup_{0\le\tau\le\tau_*}\int_{\mathbb{R}^d}|\hat v(k,\tau)|\,dk, \tag{17}$$

where $L^1$ is the Lebesgue space of functions $\hat v(k)$ with the standard norm

$$\|\hat v(\cdot)\|_{L^1} = \int_{\mathbb{R}^d}|\hat v(k)|\,dk. \tag{18}$$

Sometimes we use more general weighted spaces $L^{1,a}$ with the norm

$$\|\hat v\|_{L^{1,a}} = \int_{\mathbb{R}^d}(1+|k|)^a\,|\hat v(k)|\,dk, \quad a \ge 0. \tag{19}$$

A rather elementary existence and uniqueness theorem (Theorem 29) implies that for a small and, importantly, independent of ϱ constant $\tau_* > 0$ this equation has a unique solution

$$\hat u(\tau) = \mathcal{G}\big(\mathcal{F}(\varrho),\hat h\big)(\tau), \quad \tau \in [0,\tau_*], \quad \hat u \in C^1([0,\tau_*],L^1), \tag{20}$$

where $\mathcal{G}$ denotes the solution operator for Eq. (16); it depends on the operator $\mathcal{F}(\varrho)$, which itself depends on the parameter ϱ. If $\hat u(k,\tau)$ is a solution to Eq. (16) we call the function $U(r,\tau)$ defined by (14), (4) an $\mathcal{F}$-solution to Eq. (1). We denote by $\hat L^1$ the space of functions $V(r)$ such that their Fourier transform $\hat V(k)$ belongs to $L^1$, and define $\|V\|_{\hat L^1} = \|\hat V\|_{L^1}$. Since

$$\|V\|_{L^\infty} \le (2\pi)^{-d}\|\hat V\|_{L^1} \quad\text{and}\quad \hat L^1 \subset L^\infty, \tag{21}$$

$\mathcal{F}$-solutions to (1) belong to $C^1([0,\tau_*],\hat L^1) \subset C^1([0,\tau_*],L^\infty)$.

We would like to define wavepackets in a form which explicitly allows them to be real valued. This is accomplished based on the symmetry (7) of the dispersion relations by introduction of a doublet wavepacket

$$w(r,\beta) = \Phi_+(\beta r)\,e^{ik_*\cdot r}\,g_{n,+}(k_*) + \Phi_-(\beta r)\,e^{-ik_*\cdot r}\,g_{n,-}(-k_*). \tag{22}$$

Such a wavepacket is real if $\Phi_-(r)$, $g_{n,-}(-k_*)$ is complex conjugate to $\Phi_+(r)$, $g_{n,+}(k_*)$, i.e. if

$$\Phi_-(r) = \Phi_+^*(r), \qquad g_{n,+}(k_*) = g_{n,-}(-k_*)^*. \tag{23}$$

Considering wavepackets with nk-pair $(n,k_*)$ we usually mean doublet ones as in (22), but sometimes $\Phi_+$ or $\Phi_-$ may be zero producing (12). To identify characteristic properties of a wavepacket suitable for our needs, let us look at the Fourier transform $\hat w(k,\beta)$ of an elementary wavepacket $w(r,\beta)$ defined by (12), that is

$$\hat w(k,\beta) = \beta^{-d}\,\hat\Phi\big(\beta^{-1}(k-k_*)\big)\,g_{n,\zeta}(k_*). \tag{24}$$

We call such $\hat w(k,\beta)$ a wavepacket too; obviously it possesses the following properties: (i) its $L^1$ norm is bounded (in fact, constant), uniformly in $\beta \to 0$; (ii) for every $\epsilon > 0$ the value $\hat w(k,\beta) \to 0$ for every $k$ outside a $\beta^{1-\epsilon}$-neighborhood of $k_*$, and the convergence is faster than any power of β if Φ is a Schwartz function. To explicitly interpret the last property we introduce a cutoff function $\Psi(\eta)$,

$$\Psi(\eta) = 1 \text{ for } |\eta| \le 1, \qquad \Psi(\eta) = 0 \text{ for } |\eta| > 1, \tag{25}$$

together with its shifted/rescaled modification

$$\Psi(k;k_*) = \Psi\big(k;k_*,\beta^{1-\epsilon}\big) = \Psi\big(\beta^{-(1-\epsilon)}(k-k_*)\big). \tag{26}$$
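Formula (24) and the claim that the $L^1$ norm of a simple wavepacket does not depend on β both follow from a change of variables. The sketch below uses the Fourier convention (4) and writes Φ for the envelope appearing in (12); the envelope symbol is our notation for the function lost in the source rendering.

```latex
% Sketch under the Fourier convention (4); \Phi denotes the envelope in (12),
% and g_{n,+}(k_*) has unit Euclidean length.
\begin{align*}
\hat w(k,\beta)
  &= \int_{\mathbb{R}^d}\Phi(\beta r)\,e^{-i(k-k_*)\cdot r}\,dr\;g_{n,+}(k_*)
   = \beta^{-d}\,\hat\Phi\bigl(\beta^{-1}(k-k_*)\bigr)\,g_{n,+}(k_*)
   \qquad (r' = \beta r),\\
\|\hat w(\cdot,\beta)\|_{L^1}
  &= \beta^{-d}\int_{\mathbb{R}^d}
       \bigl|\hat\Phi\bigl(\beta^{-1}(k-k_*)\bigr)\bigr|\,dk
   = \int_{\mathbb{R}^d}|\hat\Phi(\eta)|\,d\eta
   \qquad (\eta = \beta^{-1}(k-k_*)).
\end{align*}
```

The second line shows why the norm is constant in β: the prefactor $\beta^{-d}$ is exactly cancelled by the Jacobian of the rescaling, while the support of $\hat w$ concentrates in a neighborhood of $k_*$ of size of order β.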

If in an elementary wavepacket $w(r,\beta)$ defined by (24) $\Phi_\zeta(r)$ is a Schwartz function then

$$\Big\|\big(1-\Psi(\cdot,k_*,\beta^{1-\epsilon})\big)\,\hat w(\cdot,\beta)\Big\|_{L^1} \le C_{\epsilon,s}\,\beta^s, \quad 0 < \beta \le 1,$$

which holds for arbitrarily small $\epsilon > 0$ and arbitrarily large $s > 0$. Based on the above discussion we give the following definition of a wavepacket which is a minor variation of [7, Def. 8].

Definition 1 (Single-band wavepacket). Let ε, $0 < \epsilon < 1$, be a fixed number. For a given band number $n \in \{1,\dots,J\}$ and a wavevector $k_* \in \mathbb{R}^d$, a function $\hat h(\beta,k)$ is called a wavepacket with nk-pair $(n,k_*)$ and the degree of regularity $s > 0$ if there exists such $\beta_0 > 0$ that for $\beta < \beta_0$ the following conditions are satisfied:

(i) $\hat h(\beta,k)$ is $L^1$-bounded uniformly in β, i.e.

$$\|\hat h(\beta,\cdot)\|_{L^1} \le C, \quad 0 < \beta < \beta_0, \text{ for some } C > 0; \tag{27}$$

(ii) $\hat h(\beta,k)$ has the following structure:

$$\hat h(\beta,k) = \hat h_-(\beta,k) + \hat h_+(\beta,k) + \hat D_h, \quad 0 < \beta < \beta_0, \tag{28}$$

where

$$\hat h_\zeta(\beta,k) = \Psi\big(k,\zeta k_*,\beta^{1-\epsilon}\big)\,\Pi_{n,\zeta}(k)\,\hat h_\zeta(\beta,k), \quad \zeta = \pm, \tag{29}$$

with $\Psi(\cdot,\zeta k_*,\beta^{1-\epsilon})$ defined by (26) and $\hat D_h$ satisfying the following tail estimate:

$$\|\hat D_h\|_{L^1} \le C\beta^s, \quad 0 < \beta < \beta_0, \text{ for some } C > 0. \tag{30}$$

The inverse Fourier transform h (β, r) of a wavepacket hˆ (β, k) is also called a wavepacket. Point (ii) of the above definition means that the wavepacket hˆ (β, k) is composed of two functions hˆ ζ (β, k), ζ = ±, which take values the in the n th band eigenspace of L (k) and are localized near ζ k∗ , where (n, k∗ ) is the nk-pair of the wavepacket. The number β 0 usually is small and may depend on a wavepacket. Evidently, if a wavepacket has the degree of regularity s, it also has a smaller degree of regularity s ≤ s with the same . Observe that the degree of regularity s is related to the smoothness of ζ (r) in (12) so that the higher the smoothness is the higher s can ˆ ζ ∈ L 1,a then one can take any s < a, see Lemma 52 below. For be taken. Namely, if

example, if in the elementary wavepacket w (r, β) defined by (12) ζ (r) is a Schwartz function then it has arbitrarily large degree of regularity.


Remarkably, it turns out that wavepackets satisfying Definition 1 preserve their defining properties under nonlinear evolution. This is remarkable, in particular, since it is well known that the determination of classes of solutions which preserve their form under generic nonlinear evolution usually leads to infinite expansions, such as multi-scale expansions, power expansions, modal expansions, etc. Such expansions are often formally invariant, but they involve infinitely many rather complex terms, and establishing their convergence is a very hard problem, if there is any convergence at all. Our Definition 1 of a wavepacket involves only a finite number of terms, and its invariance is provided by the flexible tail term $\hat D_h$. We also find remarkable the very simplicity of the definition, which nevertheless allows for a sufficiently detailed analysis of the dynamics, including, in particular, rigorously justified NLS-type approximations of wavepacket dynamics presented in the following sections.

Our special interest is in waves that are finite sums of wavepackets, and we refer to them as multi-wavepackets.

Definition 2 (Multi-wavepacket). Let $S$ be a set of nk-pairs:

$S = \{(n_l, k_{*l}),\ l = 1, \ldots, N\} \subset \Sigma = \{1, \ldots, J\} \times \mathbb{R}^d, \quad (n_l, k_{*l}) \ne (n_{l'}, k_{*l'})$ for $l \ne l'$, (31)

and let $N = |S|$ be their number. Let $K_S$ be the set consisting of all different wavevectors $k_{*l}$ involved in $S$, with $|K_S| \le N$ being the number of its elements. $K_S$ is called the wavepacket k-spectrum, and without loss of generality we assume the indexing of elements in $S$ to be such that

$K_S = \{k_{*i},\ i = 1, \ldots, |K_S|\}$, i.e. $l_i = i$ for $1 \le i \le |K_S|$. (32)

A function $\hat h(\beta) = \hat h(\beta, k)$ is called a multi-wavepacket with nk-spectrum $S$ if it is a finite sum of wavepackets, namely

$\hat h(\beta, k) = \sum_{l=1}^{N} \hat h_l(\beta, k), \quad 0 < \beta < \beta_0$ for some $\beta_0 > 0$, (33)

where $\hat h_l$, $l = 1, \ldots, N$, is a wavepacket with nk-pair $(n_l, k_{*l}) \in S$ as in Definition 1.

Note that if $\hat h(\beta, k)$ is a wavepacket then $\hat h(\beta, k) + O(\beta^s)$ is a wavepacket as well with the same nk-spectrum, and the same is true for multi-wavepackets. Hence, we can introduce a multi-wavepacket equivalence relation "$\simeq$" of degree $s$ by

$\hat h_1(\beta, k) \simeq \hat h_2(\beta, k)$ if $\|\hat h_1(\beta, k) - \hat h_2(\beta, k)\|_{L^1} \le C\beta^s$ for some constant $C > 0$. (34)

Observe also that zero functions are (trivial) wavepackets for any given (n, k)-spectrum. A wavepacket with any pair (n, k) is equivalent to zero if its $L^1$ norm is bounded by $\beta^s$, and such trivial components of two equivalent wavepackets are excluded; the remaining sets of elements $(n_l, k_{*l})$ of spectra of two equivalent wavepackets must coincide.

Let us turn now to the abstract nonlinear problem (16) where (i) $F = F(\varrho)$ depends on $\varrho$ and (ii) the initial data $\hat h = \hat h(\beta)$ is a multi-wavepacket depending on $\beta$. We would like to state our first theorem on multi-wavepacket preservation under the evolution (16) for $\beta, \varrho \to 0$, which holds, as it turns out, provided its nk-spectrum $S$


satisfies a certain natural condition called resonance invariance. This condition is intimately related to the so-called phase and frequency matching conditions for stronger nonlinear interactions, and its concise formulation is as follows. We define for given dispersion relations $\{\omega_n(k)\}$ and any finite set $S \subset \{1, \ldots, J\} \times \mathbb{R}^d$ another finite set $R(S) \subset \{1, \ldots, J\} \times \mathbb{R}^d$, where $R$ is a certain algebraic operation described in Definition 18 below. It turns out that for any $S$ always $S \subseteq R(S)$, but if, in fact, $R(S) = S$ we call $S$ resonance invariant. The condition of resonance invariance is instrumental for the multi-wavepacket preservation, and there are examples showing that if it fails, i.e. $R(S) \ne S$, the wavepacket preservation does not hold. Importantly, the resonance invariance $R(S) = S$ allows resonances inside the multi-wavepacket; these include, in particular, resonances associated with the second and the third harmonic generations, resonant four-wave interaction, etc.

Theorem 3 (Multi-wavepacket preservation). Suppose that the nonlinear evolution is governed by (16) and the initial data $\hat h = \hat h(\beta, k)$ is a multi-wavepacket with nk-spectrum $S$ and the regularity degree $s$, and assume $S$ to be resonance invariant (see Definition 18 below). Let the dependence between the parameters $\varrho$ and $\beta$ be any function $\varrho = \rho(\beta)$ satisfying

$0 < \rho(\beta) \le C\beta^s$ for some constant $C > 0$, (35)

and let us set $\varrho = \rho(\beta)$. Then the solution $\hat u(\tau, \beta) = \mathcal{G}(F(\rho(\beta)), \hat h(\beta))(\tau)$ to (16) for any $\tau \in [0, \tau_*]$ is a multi-wavepacket with nk-spectrum $S$ and the regularity degree $s$, i.e.

$\hat u(\tau, \beta; k) = \sum_{l=1}^{N} \hat u_l(\tau, \beta; k)$, where $\hat u_l$ is a wavepacket with nk-pair $(n_l, k_{*l}) \in S$. (36)

The time interval length $\tau_* > 0$ depends only on the $L^1$-norms of $\hat h_l(\beta, k)$ and on $N$. The presentation (36) is unique up to the equivalence (34).

The above statement can be interpreted as follows. Modes in the nk-spectrum $S$ are always resonance coupled with modes in $R(S)$ through the nonlinear interactions, but if $R(S) = S$ then (i) all resonance interactions occur inside $S$ and (ii) only a small vicinity of $S$ is involved in nonlinear interactions leading to the multi-wavepacket preservation.

Many nonlinear evolution problems with small initial data can be readily reduced by elementary rescaling to the system (1) with a large parameter $1/\varrho$ before its linear part. For example, suppose that $F(V)$ is a homogeneous nonlinearity of degree $m$ ($m = 3$ for a cubic one) and that the nonlinear evolution is governed by

$\partial_t V = -iL(-i\nabla)V + F(V), \quad V(r, t)|_{t=0} = \varrho^{1/(m-1)} h(r), \quad r \in \mathbb{R}^d$, (37)

considered for small $\varrho$ on the large time interval $0 \le t \le \tau_*/\varrho$ with a fixed $\tau_* > 0$. Then the following simple change of variables:

$V(t) = \varrho^{1/(m-1)} U(\tau), \quad \tau = t\varrho$ (38)

transforms the problem (37) into the equivalent problem (1). In this case the inequality (35) describes a constraint between the spatial extension $1/\beta$ and the amplitude factor $\varrho^{1/(m-1)} = \rho(\beta)^{1/(m-1)}$ of the initial data. Observe that Eq. (37) does not have any small parameters, and both small parameters $\varrho$ and $\beta$ enter the problem through its initial data. Theorem 3 can be restated for problem (37) as follows:


Corollary 4 (Multi-wavepacket preservation). Let $V(r, t)$ be a solution to the nonlinear system (37), let $\rho(\beta)$ be as in (35), and set $\varrho = \rho(\beta)$. If the initial data are such that $\varrho^{-1/(m-1)} \hat V(k, 0) = \hat h(k)$ is a multi-wavepacket, then $\varrho^{-1/(m-1)} \hat V(k, t)$ remains a multi-wavepacket with the same nk-spectrum and the same degree of regularity for all times $t \in [0, \tau_*/\varrho]$.

The statements of Theorem 3 and Corollary 4 follow directly from the following general theorem, which makes no assumptions on the relations between $\beta, \varrho \to 0$.

Theorem 5 (Multi-wavepacket approximation). Let the initial data $\hat h$ in the integral equation (16) be a multi-wavepacket $\hat h(\beta, k)$ with nk-spectrum $S$ as in (31), the regularity degree $s$ and with the parameter $\epsilon > 0$ as in Definition 1. Assume that $S$ is resonance invariant in the sense of Definition 18 below. Let the cutoff function $\Psi(k, k_*)$ and the eigenvector projectors $\Pi_{n,\pm}(k)$ be defined by (26) and (11) respectively. For a solution $\hat u$ of (16) we set

$\hat u_l(\tau, \beta; k) = \left[\sum_{\zeta=\pm} \Psi(k, \zeta k_{*l}) \Pi_{n_l,\zeta}(k)\right] \hat u(\tau, \beta; k), \quad l = 1, \ldots, N$. (39)

Then every such $\hat u_l(k; \tau, \beta)$ is a wavepacket and

$\sup_{0 \le \tau \le \tau_*} \left\| \hat u(\tau, \beta; k) - \sum_{l=1}^{N} \hat u_l(\tau, \beta; k) \right\|_{L^1} \le C_1\varrho + C_2\beta^s$, (40)

where the constant $C_1$ does not depend on $\epsilon$, $s$ and $\beta$, and the constant $C_2$ does not depend on $\beta$.

It is interesting to note that the statement of Theorem 5 can be extended to the special limit case $\beta = 0$, $k_{*l} = 0$. In this case the initial data of (1) are constants in $r$ and we can consider solutions $U$ of (1) which do not depend on $r$. Then $\nabla U = 0$, the linear operator $L(-i\nabla)$ reduces to the multiplication by a matrix $L_0 = L(0)$, and the system (1) turns into a system of ordinary differential equations (ODE). Notice that (i) the structure of the eigenvalues (7) implies that the linear part is time-reversible; (ii) the nonlinear part can be an arbitrary polynomial. The extension of Theorem 5 to this case (see Theorem 11) states that in a generic, non-resonant situation, if the initial data are bounded and a set of eigenmodes of the matrix $L_0$ is excited at $\tau = 0$, then in the course of evolution on a time interval $[0, \tau_*]$, where $\tau_*$ depends on the magnitude of the initial data: (i) all remaining modes remain unexcited with accuracy proportional to $\varrho$, and (ii) only the originally excited modes can significantly evolve with this level of accuracy. For finite-dimensional systems governed by ODEs such a statement can be derived from the classical time-averaging principle, and the time-averaged equations remain nonlinear. For infinite-dimensional systems governed by PDEs and with the linear operator having a continuous spectrum, as in Theorem 5, the analysis is more complex, but the time-averaging still plays an important role, yielding an accurate approximation governed by a certain universal nonlinear PDE.

We would like to point out also that though Theorem 3 is a simple corollary of the more general Theorem 5, it is important that the statement (40) can be formulated as multi-wavepacket invariance. That, in particular, allows one to take the values $\hat u(\tau_*)$ as new


wavepacket initial data for (1) and extend the wavepacket invariance of a solution to the next time interval $\tau_* \le \tau \le \tau_{*1}$. This observation allows one to extend the wavepacket invariance to larger values of $\tau$ (up to the blow-up time or infinity) if some additional information about solutions with wavepacket initial data is available. In particular, the following theorem holds.

Theorem 6. Assume that all conditions of Theorem 3 are satisfied and, in addition to that, solutions $\hat u(\tau, \beta)$ of (16) with the multi-wavepacket initial data $\hat h(\beta)$ and $\varrho = \rho(\beta)$ exist on an interval $0 \le \tau < \tau_0$, $\tau_0 \le \infty$, and the estimate $\|\hat u(\cdot, \beta)\|_{C([0,\tau_1], L^1)} \le R(\tau_1)$ holds for any $\tau_1 < \tau_0$, where $R(\tau_1)$ does not depend on $\beta \le \beta_0$. Then the solution $\hat u(\tau, \beta) = \mathcal{G}(F(\rho(\beta)), \hat h(\beta))(\tau)$ to (16) for any $\tau < \tau_0$ is a multi-wavepacket with nk-spectrum $S$ and the regularity degree $s$, that is, (36) holds.

The derivation of the above statement from Theorem 3 is straightforward, with the following key points. The interval $\tau_*$ in Theorem 3 depends only on the $L^1$-norm of the initial data, and the solution $\hat u(\tau, \beta)$ is assumed to be bounded in $L^1$ by $R(\tau) \le R(T)$ for $0 \le \tau \le T$ for any $T < \tau_0$. Therefore, we can apply Theorem 3 consecutively on the intervals $[n\tau_*, (n+1)\tau_*]$ for all integers $n$ such that $0 \le n\tau_* \le T$ and conclude that if $\hat u(\tau, \beta)$ is a wavepacket for $\tau = n\tau_*$ it remains a wavepacket for $\tau \in [n\tau_*, (n+1)\tau_*]$. Note that the parameters $\beta_0$ and $C$ in Definition 1 may depend on a wavepacket and be different for different wavepackets. Importantly, $\tau_*$ in the statement of Theorem 5 does not depend on $\beta_0$ and $C$. Since for any fixed $T < \tau_0$ we apply Theorem 3 only a finite number of times, the solution $\hat u(\tau)$ is a wavepacket on the interval $[0, T]$ if $T < \tau_0$ (with some parameters $\beta_0(T) > 0$ and $C(T) < \infty$).

Note that the wavepacket form of solutions can be used to obtain long-time estimates of solutions. Namely, very often the behavior of every single wavepacket is well approximated by its own nonlinear Schrödinger equation (NLS), see [17,34,18,23,30,31,47,50,51,53] and references therein, see also Sect. 6. Many features of the dynamics governed by NLS-type equations are well understood, see [14,16,32,49,57,59] and references therein. These results can be used to obtain long-time estimates for every single wavepacket (as, for example, in [31]) and, with the help of the superposition principle, for the multi-wavepacket solution.

The wavepacket representation (36) from Theorem 3 can be used for a more detailed analysis of the dynamics of the wavepackets $\hat u_l(\tau, \beta)$ and of the interaction between them. The following theorem illustrates that by describing wavepacket interaction based on a system with a weakly universal nonlinearity similar to so-called coupled modes systems or NLS.

Theorem 7 (NLS-type approximation). Let the conditions of Theorem 5 hold and, in addition to that, let the initial data $\hat h_l(k)$ be of the form $\hat h_l = \hat h_{l,+} + \hat h_{l,-} + \hat D_l$, where $\hat h_{l,\zeta}(k) = \beta^{-d} \hat H_{l,\zeta}(\beta^{-1}(k - \zeta k_{*l}))\, g_{n_l,\zeta}(k)$ for $|k - \zeta k_{*l}| \le \beta^{1-\epsilon}$, $\zeta = \pm$, $\hat D_l$ satisfies (30), and every function $\hat H_{l,\zeta}(\eta)$, which may depend on $\beta$, is defined for all $\eta$ and is bounded in $L^{1,a}$ with a > s uniformly in $\beta$. Then one can write a nonlinear system of differential equations for $2N$ scalar envelope functions $z_{l,\zeta}(\tau, r)$ with the initial data $H_{l,\zeta}$; the linear part of the system has order $\mu \le 3$, and the nonlinearity is weakly universal as in (238) and has order $\nu \le 1$. Let $\hat z_{l,\zeta}(\tau, k)$, $l = 1, \ldots, N$, be the Fourier transform of a solution to this system. Then there exist $\beta_0 > 0$ and a constant


$C$ which does not depend on $\beta$, such that for $\beta \le \beta_0$ the solution $\hat u$ of (16) with initial data $\hat h$ can be approximated as follows:

$\sum_{l=1}^{N} \left\| \hat u_l(\tau, \beta) - \beta^{-d} \hat z_{l,\zeta}\left(\tau, \beta^{-1}(\cdot - k_{*l})\right) g_{n_l,\zeta} \right\|_E \le C\left[\varrho + \frac{\beta^{(\mu+1)(1-\epsilon)}}{\varrho} + \beta^{(\nu+1)(1-\epsilon)} + \beta^s\right]$. (41)

The above-mentioned system with a weakly universal nonlinearity is constructed based on Eq. (1) and the nk-spectrum $S$ with the help of the time averaging (70) described below. Note that in the simplest case when $\mu = 2$, $\nu = 0$, $N = 1$ (and $J$ is arbitrary) the resulting system with a universal nonlinearity is equivalent to the classical Nonlinear Schrödinger equation (NLS). If $N = 2$ and $k_{*1} = -k_{*2}$ we obtain the well-known coupled modes system for counterpropagating waves. This theorem applied to particular systems implies approximation theorems similar to the results of (i) [30,53,6,23] on NLS approximation; (ii) [6,24,47,52] on coupled mode approximation; (iii) [54] on three-wave approximations. Note also that (41) implies that if $\varrho = \beta^\kappa$ with $1 < \kappa < 2$, then both the first order hyperbolic equations ($\mu = 1$, $\nu = 0$) and the second-order NLS ($\mu = 2$, $\nu = 0$) provide an approximation for a solution $\hat u$ of (16), but NLS provides a better approximation $O(\beta^{1-\epsilon})$ compared with $O(\beta^{2(1-\epsilon)-\kappa})$ for first order hyperbolic equations.

Observe that in the form (22) for a simple wavepacket we require $g_{n,\pm}(k_*)$ to be an eigenvector of the Hermitian matrix $L(k_*)$, and one can wonder if $g_{n,\pm}(k_*)$ can be replaced with an arbitrary pair of vectors $g_\pm$ in the case $J > 1$. The answer is affirmative, since one can always expand any $g$ with respect to the basis $g_{n,\pm}(k)$ using $\Pi_{n,\pm}(k)$, but the result will be a multi-wavepacket with up to $2J$ components rather than a single wavepacket.

The rest of the paper is organized as follows. In the next section we illustrate important points of parameter dependence and wavepacket preservation based on examples. In Sect. 3 we formulate conditions of wavepacket preservation, including the key resonance invariance condition. In Sect. 4 we provide examples of different forms of equations and systems which involve small or large parameters and can be written in the form of (1) after a rescaling. In Sect. 5 we introduce and discuss integrated modal forms of the evolution equation. In Sect. 6 we introduce and study the wavepacket interaction system in its relation to the original system. In Sect. 7 we approximate the wavepacket interaction system by a certain minimal wavepacket interaction system, which in the simplest cases turns into the NLS or the coupled modes system.

2. Preliminary Discussion and Examples

Observe that the multi-wavepacket preservation as described in Theorems 3–7 states in different forms that (i) its modal composition is essentially preserved; (ii) its nk-spectrum (the set of nk-pairs $\{k_{*l}, n_l\}$) remains the same at all times; (iii) with good accuracy, no new modes are excited as a result of the nonlinear evolution. The preservation of multi-wavepackets as they evolve shows also that only the nonlinear interactions between small neighborhoods of the points $(k_{*l}, n_l)$ are essential and contribute constructively to the nonlinear dynamics, whereas the amplitude of modes with wavevectors $k$ outside


those neighborhoods is vanishingly small as $\beta, \varrho \to 0$. The latter is quite remarkable since the coupling term $\hat F(\hat U)(k)$ in (3) for such $k$ is not small. A qualitative explanation of that, confirmed by rigorous analysis, is based on the fact that the contribution of this term to the solution is a time integral involving highly oscillatory functions, which becomes vanishingly small as $\beta, \varrho \to 0$. This mechanism is similar to the classical averaging mechanism for systems of ordinary differential equations described, for instance, in [11]; the relevance of the averaging mechanism for long-wave asymptotics for hyperbolic systems of PDE is well known, see [30].

We would like to relate now the multi-wavepacket preservation property to the linear superposition principle for wavepackets established in [7]. According to that principle, if the initial state is $h = \sum_{l=1}^{N} h_l$, with $h_l$, $l = 1, \ldots, N$, being "generic" wavepackets, then the solution $\hat u(\tau) = \mathcal{G}(h)(\tau)$ to the evolution equation (15) equals with high accuracy the sum of the individual solutions $u_l$ of $N$ equations with respective initial data $h_l$. Namely, if $\beta, \varrho > 0$ satisfy the following relation:

$\beta, \varrho \to 0, \quad \frac{\beta}{\varrho} \ge C_1$ with some $C_1 > 0$, (42)

then for all times $0 \le \tau \le \tau_*$ we have

$\mathcal{G}\left(\sum_{l=1}^{N} w_l\right)(\tau) = \sum_{l=1}^{N} \mathcal{G}(w_l)(\tau) + D(\tau)$, (43)

$\|D(\tau)\|_E = \sup_{0 \le \tau \le \tau_*} \|D(\tau)\|_{L^\infty} \le C\frac{\varrho}{\beta^{1+\epsilon}} + C\beta$ for any $\epsilon > 0$. (44)

The linear superposition principle is formulated in [7] for $\beta = C_2\varrho^{1/2}$, but, in fact, the provided proofs of (43), (44) remain valid as long as (42) holds. Obviously, the bound $\beta/\varrho \ge C_1$ in (42) determines when (44) becomes trivial. This bound is sharp, and examples below show that when $\beta \sim \varrho$ the remainder $D(\tau)$ in (43) does not tend to zero when $\beta \to 0$.

Both the multi-wavepacket preservation and the linear superposition apply to sums of generic wavepackets. It is important to notice though that the multi-wavepacket preservation holds for any dependence between $\varrho$ and $\beta$ which satisfies (35), that is $\varrho(\beta) \le C\beta^q$ with arbitrarily small $q$, whereas the linear superposition holds if $\varrho(\beta) \le C\beta$. Thus, the bounds (42) on $\beta$ determine the range of its values for which both multi-wavepacket preservation and linear superposition hold simultaneously (provided some genericity conditions are satisfied). In this range wavepacket preservation provides additional information on the behavior of solutions with single wavepacket initial data, namely that the solution remains a single wavepacket. Obviously, the linear superposition principle does not follow from multi-wavepacket invariance. Below we use simple examples and models to discuss different ranges of the parameters $\varrho$ and $\beta$ where wavepacket preservation is valid but the solutions of equations exhibit different behavior.
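The qualitative explanation given above, that a time integral of a highly oscillatory function becomes small, can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the estimate used in the paper: the amplitude `f`, the frequency `phi` and the function name `oscillatory_integral` are hypothetical choices made only for illustration. For a smooth amplitude, one integration by parts bounds the integral of $e^{i\phi\tau/\varrho} f(\tau)$ over $[0, 1]$ by $(\varrho/\phi)(|f(0)| + |f(1)| + \int_0^1 |f'|)$, which is $4\varrho$ for the choice below.

```python
import cmath


def oscillatory_integral(rho, phi=1.0, n=50_000):
    """Midpoint-rule value of I(rho) = int_0^1 exp(i*phi*tau/rho) * f(tau) dtau.

    f(tau) = 1 + tau**2 is an illustrative smooth amplitude (an assumption,
    not a quantity taken from the paper).
    """
    d = 1.0 / n
    total = 0j
    for k in range(n):
        tau = (k + 0.5) * d
        total += cmath.exp(1j * phi * tau / rho) * (1.0 + tau * tau)
    return total * d


if __name__ == "__main__":
    # |I(rho)| shrinks roughly in proportion to rho as the oscillation speeds up.
    for rho in (0.1, 0.01, 0.001):
        print(rho, abs(oscillatory_integral(rho)))
```

The printed magnitudes stay below the integration-by-parts bound $4\varrho$, which mirrors how a coupling term that is not small pointwise can still contribute only $O(\varrho)$ to the solution.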

2.1. A model with explicit solutions and the effect of large group velocity. Here we introduce a simple model for our general system (1) with elementary solutions which makes explicit that in the limit $\varrho \to 0$ nonlinear effects do not vanish; in particular, the blow-up time does not tend to infinity. This example also shows that on the time scale where $\tau$ is of order 1 solutions undergo significant nonlinear evolution. The influence


of $\varrho$ on solutions through the group velocity in this example can be seen explicitly. The model is the following system of two coupled nonlinear first order hyperbolic equations for variables $u_1(x, \tau)$, $u_2(x, \tau)$ with one-dimensional spatial variable $x$:

$\partial_\tau u_1 = -\frac{c_1}{\varrho}\partial_x u_1 + F_1(u_1, u_2), \quad u_1|_{\tau=0} = h_1(x)$, (45)

$\partial_\tau u_2 = -\frac{c_2}{\varrho}\partial_x u_2 + F_2(u_1, u_2), \quad u_2|_{\tau=0} = h_2(x), \quad c_1 \ne c_2$, (46)

where the initial data $h_1$, $h_2$ are of wavepacket form:

$h_1(x) = \Phi_1(\beta x)\cos(k_{1*}x), \quad h_2(x) = \Phi_2(\beta x)\cos(k_{2*}x), \quad |k_{1*}| \ne |k_{2*}|$. (47)

We take the nonlinearity to be quadratic and of the following simple form:

$F_1(u_1, u_2) = u_1^2 + a_1u_1u_2, \quad F_2(u_1, u_2) = u_2^2 + a_2u_1u_2$. (48)

The system (45)–(47) allows for an explicit form of solutions with one-wavepacket initial data, describing a wave propagating with a constant speed controlled by the linear part and with a shape evolution controlled by the nonlinearity. This simplest case is then compared with the case of two-wavepacket initial data, for which an explicit solution is not available. In the case when $h_2 = 0$ the second equation has the trivial solution $u_2 = 0$ and the system (45)–(46) reduces to the single equation (45). The solution to this equation has the form of a traveling wave $v_1(x - \frac{c_1}{\varrho}\tau, \tau)$, where $v_1(y, \tau)$ is a solution of the ordinary differential equation

$\partial_\tau v_1 = F_1(v_1, 0), \quad v_1(y, 0) = h_1(y)$. (49)
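As a quick sanity check on the reduced problem (49): with the quadratic nonlinearity (48) one has $F_1(v, 0) = v^2$, so at each fixed $y$ (49) is the scalar ODE $\partial_\tau v = v^2$ with $v(0) = h_1(y)$. The sketch below, in which the function names and the sample value `h = 0.8` are hypothetical and chosen only for illustration, compares the closed-form solution $h/(1 - \tau h)$ with a standard fourth-order Runge-Kutta integration and evaluates the blow-up time $1/\sup h$, which involves no dependence on $\varrho$.

```python
def explicit_solution(h, tau):
    """Closed-form solution of dv/dtau = v**2 with v(0) = h, valid while tau*h < 1."""
    return h / (1.0 - tau * h)


def rk4_solution(h, tau, steps=2000):
    """Classical fourth-order Runge-Kutta integration of dv/dtau = v**2 from 0 to tau."""
    v = h
    dt = tau / steps
    for _ in range(steps):
        k1 = v * v
        k2 = (v + 0.5 * dt * k1) ** 2
        k3 = (v + 0.5 * dt * k2) ** 2
        k4 = (v + dt * k3) ** 2
        v += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v


def blow_up_time(sup_h):
    """Time at which h/(1 - tau*h) first becomes infinite, for sup_h = sup of h > 0."""
    return 1.0 / sup_h


if __name__ == "__main__":
    h, tau = 0.8, 1.0  # illustrative values with tau below the blow-up time
    print(explicit_solution(h, tau), rk4_solution(h, tau), blow_up_time(h))
```

Since the transport term in (45) only shifts the argument $y = x - c_1\tau/\varrho$, the same pointwise ODE governs the profile for every $\varrho$, which is why the blow-up time is insensitive to $\varrho$.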

The explicit formula in the case (49) yields

$u_1(x, \tau) = v_1\left(x - \frac{c_1}{\varrho}\tau, \tau\right) = \frac{h_1\left(x - \frac{c_1}{\varrho}\tau\right)}{1 - \tau h_1\left(x - \frac{c_1}{\varrho}\tau\right)} = \frac{\Phi_1\left(\beta\left(x - \frac{c_1}{\varrho}\tau\right)\right)\cos\left(k_{1*}\left(x - \frac{c_1}{\varrho}\tau\right)\right)}{1 - \tau\,\Phi_1\left(\beta\left(x - \frac{c_1}{\varrho}\tau\right)\right)\cos\left(k_{1*}\left(x - \frac{c_1}{\varrho}\tau\right)\right)}$ (50)

for a time interval $0 \le \tau < \tau_0$, where $\tau_0 = 1/\sup_y |h_1(y)|$ is the blow-up time. Obviously, the blow-up time does not depend on $\varrho$. Consequently, the wave propagates with the velocity $c_1/\varrho$, with its shape evolution being controlled by the nonlinearity. Similarly, when $h_1 = 0$ the first equation has the trivial solution $u_1 = 0$ and the system (45)–(46) reduces to the single equation (46), which has a solution in the form of a traveling wave $v_2(x - \frac{c_2}{\varrho}\tau, \tau)$ propagating with the velocity $c_2/\varrho$. Observe that for the simple model (45)–(47) the group velocity coincides with the velocity of a traveling wave.

The above model is not exactly solvable if both initial conditions $h_1$ and $h_2$ do not vanish. But one can still see the way $\varrho$ influences the nonlinear dynamics quite explicitly by applying the superposition principle from [6]. Indeed, let us assume that $h_1$ and $h_2$ are two nonzero initial wavepackets. Then the approximate superposition principle is applicable (in order to put the system in the framework of [6] we use the 4-component extension (115) and set $\varrho = \beta^\kappa$, $\kappa > 1$). According to the principle the exact solution


$(u_1, u_2)$ is approximated by $\left(v_1(x - \frac{c_1}{\varrho}\tau, \tau),\ v_2(x - \frac{c_2}{\varrho}\tau, \tau)\right)$, which is explicitly given by (50), with the accuracy $O(\varrho/\beta^{1+\epsilon}) = O(\beta^{\kappa-1-\epsilon})$ with arbitrarily small $\epsilon$ if $c_1 \ne c_2$. As is shown in [6], the validity of such an approximate presentation is due to the large difference $\frac{c_1 - c_2}{\varrho}$ of the group velocities of the two wavepackets.

2.2. Dispersive effects and nonlinearity. Based on an elementary example of the Nonlinear Schrödinger equation (NLS),

$\partial_\tau u = -\frac{i}{\varrho}\left(\gamma_0 u + i\gamma_1\partial_x u + \gamma_2\partial_x^2 u\right) + b|u|^2u, \quad u = u(x, \tau), \quad x \in \mathbb{R}$, (51)

with the initial data in the form of a wavepacket $u|_{\tau=0} = \Phi(\beta x)e^{ik_*x}$, we would like to explain here why we are interested mostly in the case

$\frac{\varrho}{\beta^2} \ge C > 0$, (52)

when the dispersion is not dominant. To make the dependence of $u$ on $\beta$ and $\varrho$ explicit we change the variables

$u(x) = v(\beta x)e^{ik_*x}, \quad \beta x = z$, (53)

and obtain the equation

$\partial_\tau v = -\frac{i}{\varrho}\left(\gamma_0' v + i\beta\gamma_1'\partial_z v + \gamma_2\beta^2\partial_z^2 v\right) + b|v|^2v, \quad v|_{\tau=0} = \Phi(z)$, (54)

where, by direct substitution of (53) into (51), $\gamma_0' = \gamma_0 - \gamma_1k_* - \gamma_2k_*^2$ and $\gamma_1' = \gamma_1 + 2\gamma_2k_*$. Changing variables once more,

$v(z, \tau) = e^{-i\tau\gamma_0'/\varrho}\, w\left(z + \frac{\beta}{\varrho}\gamma_1'\tau, \tau\right), \quad z + \frac{\beta}{\varrho}\gamma_1'\tau = y$, (55)

we obtain for the envelope $w$ the following standard NLS equation:

$\partial_\tau w = -\frac{i\beta^2}{\varrho}\gamma_2\partial_y^2 w + b|w|^2w, \quad w|_{\tau=0} = \Phi(y), \quad 0 \le \tau \le \tau_*$, (56)

with initial data independent of the parameters $\beta$, $\varrho$. The behavior of the solution $w$ to Eq. (56) on the time interval $0 \le \tau \le \tau_*$ is determined by the dispersion parameter $\beta^2/\varrho$, and evidently linear dispersive effects become significant when $\varrho/\beta^2$ is not too large. If $\beta^2/\varrho \to \infty$ and $\beta \to 0$, the solution tends to zero at every fixed $\tau = \tau_0 > 0$. Indeed, if we take $\varrho = \beta^\kappa$, $\kappa > 2$, and make another change of variables $\tau = t\beta^{\kappa-2}$, $w = \beta^{1-\kappa/2}W$, Eq. (56) reduces to the following problem with small initial data:

$\partial_t W = -i\gamma_2\partial_y^2 W + b|W|^2W, \quad W|_{t=0} = \beta^{\kappa/2-1}\Phi(y)$. (57)

For small enough $\beta$ the solution $W$ to this problem exists for all $t$ and $W(t) \to 0$ as $t \to \infty$ (see [16]). In particular, for $t = \tau_0\beta^{2-\kappa}$ we have $w(\tau_0) \to 0$ when $\beta \to 0$. In the general case, the solution dependence on small $\beta$, $\varrho$ is as follows. The dependence on the large $1/\varrho$ in (51) is completely described by the change of variables (55), yielding a

wave which (i) moves as a whole with a large group velocity $-\gamma_1'/\varrho$; (ii) has a slowly evolving shape as described by $v$ and $w$ in (53), (55), (56). The above observations show that for small $\varrho/\beta^2$ the dispersive effects dominate and control the nonlinear ones. Keeping that in mind and being interested in stronger nonlinear effects, we focus primarily on the case (52), i.e. $\varrho/\beta^2 \ge C > 0$, for which there are two scenarios of the nonlinear evolution. In the first scenario, when $\beta^2/\varrho \to 0$, the linear dispersion produces only a small correction to the solution of the equation $\partial_\tau w = b|w|^2w$, with that nonlinear equation governing the nonlinear dynamics of the envelope $w$ for $\tau_*$ being smaller than the blow-up time. In the second scenario, when $\beta^2 \sim \varrho$, Eq. (56) becomes independent of $\beta$, $\varrho$ and describes the evolution of the envelope $w$ governed by an interplay between the dispersion and the nonlinearity. The case $\beta^2 \sim \varrho$ can also be characterized as one where dispersive effects do occur but do not dominate nonlinear effects, and, as is well known, the dispersion can exactly balance the nonlinearity, yielding solitons.

2.3. A coupled modes system. Here we illustrate statements of the general theorem on the wavepacket preservation and the approximate superposition principle by a simple but still nontrivial example. Let us consider a system of two coupled NLS-type equations for variables $u_1(x, \tau)$, $u_2(x, \tau)$ with one-dimensional spatial variable $x$,

$\partial_\tau u_1 = -\frac{i}{\varrho}\left(\gamma_{01} + i\gamma_{11}\partial_x + \gamma_{21}\partial_x^2\right)u_1 + \left(b_{11}|u_1|^2 + b_{12}|u_2|^2\right)u_1 + c_{12}|u_2|^2u_2$, (58)

$\partial_\tau u_2 = -\frac{i}{\varrho}\left(\gamma_{02} + i\gamma_{12}\partial_x + \gamma_{22}\partial_x^2\right)u_2 + \left(b_{21}|u_1|^2 + b_{22}|u_2|^2\right)u_2 + c_{22}|u_1|^2u_1$, (59)

$u_1|_{\tau=0} = h_1(x) = \Phi_1(\beta x)e^{ik_{*1}x}, \quad u_2|_{\tau=0} = h_2(x) = \Phi_2(\beta x)e^{ik_{*2}x}$, (60)

where $\gamma_{ij}$ are real and $b_{ij}$ are complex coefficients, and the initial data in (60) are in the form of wavepackets with $\Phi_j(y)$ being Schwartz functions. Notice that if in the coupled modes system (58)–(60) $h_2 = 0$ and $c_{12} = c_{22} = 0$, then it has the trivial solution $u_2 = 0$ and reduces to a single NLS equation of the form (51). The dependence of the solution $\{u_1, u_2\}$ on the large $1/\varrho$ is captured by the change of variables (55). Namely, $u_1$ is a wave with a slowly varying envelope described by $v_1$ which moves with the large velocity $-\gamma_{11}/\varrho$. The dependence on $\beta$ is of the form $v_1(y, \tau) = w_1(\beta y, \tau)$ (see the following subsection for details). Similarly, we can consider the case when $h_1 = 0$, for which the first equation has the trivial solution $u_1 = 0$, so the system (58)–(59) reduces to the single equation (59) with the solution represented by a wave having a large spatial extension proportional to $1/\beta$ and moving with the large velocity $-\gamma_{12}/\varrho$.

2.3.1. The superposition principle. Let us assume here that $h_1 \ne 0$, $h_2 \ne 0$, $c_{12} \ne 0$, $c_{22} \ne 0$ and $\beta = \varrho^\kappa$, $0 < \kappa < 1$. Applying the superposition principle we obtain for generic $k_{*1}$, $k_{*2}$ the following representation of the exact solution:

$u_1(x, \tau) = v_1(x, \tau)e^{ik_{*1}x} + D_1, \quad u_2(x, \tau) = v_2(x, \tau)e^{ik_{*2}x} + D_2$,


where $v_1(x, \tau)$ is a solution of the NLS equation (58) with $b_{12} = c_{12} = 0$, with $v_2(x, \tau)$ being a solution to a similar decoupled NLS equation for $b_{21} = c_{22} = 0$, and $D_1$ and $D_2$ are small terms satisfying

$\sup_{0 \le \tau \le \tau_*}\|D_1(\cdot, \tau)\|_{L^\infty} + \sup_{0 \le \tau \le \tau_*}\|D_2(\cdot, \tau)\|_{L^\infty} \le C\beta^{\kappa'-1-\epsilon} + C\beta, \quad \kappa' = \kappa^{-1}$. (61)

We would like to emphasize here that the coupling terms $b_{12}|u_2|^2u_1 + c_{12}|u_2|^2u_2$ and $b_{21}|u_1|^2u_2 + c_{22}|u_1|^2u_1$ in Eqs. (58)–(59) are not small, whereas their ultimate contributions to the solutions are small. One can interpret that phenomenon as being due to the destructive wave interference and the mismatch of group velocities.

2.3.2. Wavepacket preservation. Here we assume that $h_1 \ne 0$, $h_2 = 0$, $c_{12} \ne 0$, $c_{22} \ne 0$ and $\varrho = \beta^\kappa$, $0 < \kappa \le 2$. According to the wavepacket preservation we have

$u_1(x, \tau) = v_1(x, \tau)e^{ik_{*1}x} + D_1, \quad u_2(x, \tau) = D_2$,

where $v_1(x, \tau)$ is a solution of (58) with $b_{12} = 0$, $c_{12} = 0$, and $D_1$ and $D_2$ are small terms satisfying

$\sup_{0 \le \tau \le \tau_*}\|D_1(\cdot, \tau)\|_{L^\infty} + \sup_{0 \le \tau \le \tau_*}\|D_2(\cdot, \tau)\|_{L^\infty} \le C\varrho$.

Notice once more (see the above section) an interesting phenomenon: Eq. (59) for $u_2(x, \tau)$ has a coupling term $b_{21}|u_1|^2u_2 + c_{22}|u_1|^2u_1$ which does not become small as $\beta, \varrho \to 0$, but, remarkably, its ultimate contribution to the solution is small.

2.3.3. Limitations of the superposition principle. Now we provide an example based on the system (58)–(60) with $c_{12} = c_{22} = 0$ showing that the above estimate (61) in the superposition principle is sharp in the sense that $\beta^{\kappa'-1-\epsilon}$ cannot be replaced by $\beta^{\kappa'-1+\epsilon}$ with $\kappa' \ge 1$. We set here $\kappa' = 1$ and $\varrho = \beta$. After the change of variables (53) for $u_1$, $u_2$ followed by yet another change of variables $\beta x = z$, $v_1 = e^{-i\tau\gamma_{01}/\beta}w_1$, $v_2 = e^{-i\tau\gamma_{02}/\beta}w_2$, we obtain from (58)–(60) the following system:

$\partial_\tau w_1 = -i\left(i\gamma_{11}\partial_z w_1 + \beta\gamma_{21}\partial_z^2 w_1\right) + \left(b_{11}|w_1|^2 + b_{12}|w_2|^2\right)w_1$,

$\partial_\tau w_2 = -i\left(i\gamma_{12}\partial_z w_2 + \beta\gamma_{22}\partial_z^2 w_2\right) + \left(b_{21}|w_1|^2 + b_{22}|w_2|^2\right)w_2$,

$w_1|_{\tau=0} = \Phi_1(z), \quad w_2|_{\tau=0} = \Phi_2(z)$.

This system has a regular dependence on $\beta$ as $\beta \to 0$, with the solution converging in $L^\infty$ to the solution of the system with $\beta = 0$. If we now set $b_{12} = b_{21} = 0$ in the last system, it turns into a system of two decoupled equations. Notice then that the difference between the solutions of the decoupled system and the original one does not tend to zero as $\beta \to 0$, implying that the superposition principle does not hold when $\varrho = \beta$.

2.4. Wavepacket interaction system with a universal nonlinearity. We will prove in the following sections that the dynamics of a multi-wavepacket with a universally resonance invariant nk-spectrum for a general system can be approximated with the accuracy $O(\varrho)$ by substituting the nonlinearity with a properly constructed universal or weakly universal one. Here we provide an example of a system, called wavepacket interaction


system, with a universal nonlinearity and show that its dynamics preserves simple wavepackets as in (12). It is shown later that universal nonlinearities are related to universally invariant multi-wavepackets in the sense of Definition 18. A wavepacket interaction system with universal nonlinearity has a form similar to NLS, namely

$\partial_\tau u_{j,\zeta} = \frac{1}{\varrho}\left[-i\zeta\gamma_{0,j}u_{j,\zeta} + \gamma_{1,j}\cdot\nabla_r u_{j,\zeta} - i\zeta\nabla_r\cdot\gamma_{2,j}\nabla_r u_{j,\zeta}\right] + F_{j,\zeta}(\vec u), \quad r \in \mathbb{R}^d$, (62)

$\vec u = (u_{1+}, u_{1-}, \ldots, u_{N+}, u_{N-}), \quad j = 1, \ldots, N, \quad \zeta = \pm$, (63)

$u_{j,\zeta}|_{\tau=0} = h_{j,\zeta}, \quad h_{j,\zeta}(r) = \Phi_{j,\zeta}(\beta r)e^{i\zeta k_{*j}\cdot r}$, (64)

where for every $j$ the coefficient $\gamma_{0,j} \in \mathbb{R}$, $\gamma_{1,j} \in \mathbb{R}^d$ is a vector, $\gamma_{2,j}$ is a symmetric $d \times d$ matrix, $\gamma_{1,j}\cdot\nabla_r$ is a first order scalar differential operator, $\nabla_r\cdot\gamma_{2,j}\nabla_r$ is a second order scalar differential operator, and the universal polynomial nonlinearities $F_{j,\zeta}$ have the following form:

$F_{j,\zeta}(\vec u) = \sum_{\nu=1}^{\nu_F}\sum_{|\vec\nu|=\nu} b_{\vec\nu,j,\zeta}\prod_{l=1}^{N}\left(u_{l,+}u_{l,-}\right)^{\nu_l}u_{j,\zeta}$, where $\vec\nu = (\nu_1, \ldots, \nu_N)$, $j = 1, \ldots, N$, $\zeta = \pm$. (65)

Remark 8. Notice that if we set $h_{j,-} = h_{j,+}^*$, $b_{\vec\nu,j,+} = b_{\vec\nu,j,-}^* = b_{\vec\nu,j}$ and $u_{j,+} = u_{j,-}^* = u_j$, then $u_{l,+}u_{l,-} = |u_{l,+}|^2$ and $F_{j,+}(\vec u)$ turns into

$F_j(u_1, \ldots, u_N) = \sum_{\nu=1}^{\nu_F}\sum_{|\vec\nu|=\nu} b_{\vec\nu,j}\prod_{l=1}^{N}|u_l|^{2\nu_l}u_j$, (66)

and the equations of (62) with $\zeta = +$ turn into

$\partial_\tau u_j = \frac{1}{\varrho}\left[-i\gamma_{0,j}u_j + \gamma_{1,j}\cdot\nabla_r u_j - i\nabla_r\cdot\gamma_{2,j}\nabla_r u_j\right] + F_j(u_1, \ldots, u_N), \quad u_j|_{\tau=0} = h_{j,+}, \quad j = 1, \ldots, N$. (67)

Obviously, a solution of (67) defines a solution $u_{j,+} = u_j$, $u_{j,-} = u_j^*$ of the system (62). In the simplest case $N = 1$, $d = 1$, (67) takes the form of the classical NLS:

$\partial_\tau u = \frac{\gamma_1}{\varrho}\partial_x u - i\frac{\gamma_2}{\varrho}\partial_x^2 u + b|u|^2u$.

Note that the universal nonlinearity $F_{j,\zeta}$ has the characteristic property

$F_{j,\zeta}\left(e^{i\phi_1t}u_{1,+}, e^{-i\phi_1t}u_{1,-}, \ldots, e^{i\phi_Nt}u_{N,+}, e^{-i\phi_Nt}u_{N,-}\right) = e^{i\zeta\phi_jt}F_{j,\zeta}(u_{1+}, u_{1-}, \ldots, u_{N+}, u_{N-})$, (68)

holding for an arbitrary set of values $\phi_i$. We also consider more general nonlinearities $F$ for which (68) holds for a fixed set of frequencies $\phi_l = \omega_{n_l}(k_{*l})$, and call them weakly


universal. We introduce now the averaging operator $A_T$ acting on polynomial functions $F : (\mathbb{C}^2)^N \to (\mathbb{C}^2)^N$ by

$(A_TF)_{j,\zeta} = (A_{T,\phi}F)_{j,\zeta} = \frac{1}{T}\int_0^T e^{-i\zeta\phi_jt}F_{j,\zeta}\left(e^{i\phi_1t}u_{1,+}, e^{-i\phi_1t}u_{1,-}, \ldots, e^{i\phi_Nt}u_{N,+}, e^{-i\phi_Nt}u_{N,-}\right)dt$, (69)

where $\phi = (\phi_1, \ldots, \phi_N)$. The operator $A_{T,\phi}$ depends on the frequency vector $\phi = (\phi_1, \ldots, \phi_N)$. If $F$ is a universal polynomial nonlinearity, then $(A_{T,\phi}F)_{j,\zeta} = F_{j,\zeta}$ for any choice of frequencies $\phi_1, \ldots, \phi_N$. Note that the averaging

$G_{av,j,\zeta}(\vec u) = \lim_{T\to\infty}(A_{T,\phi}G)_{j,\zeta}(\vec u)$ (70)

is defined for any polynomial nonlinearity $G : (\mathbb{C}^2)^N \to (\mathbb{C}^2)^N$. If $\phi$ is generic, then $G_{av,j,\zeta}(\vec u)$ is always a universal nonlinearity. In the general case, for given frequencies $\phi$, one obtains as $G_{av,j,\zeta}$ a weakly universal nonlinearity which might not be universal. Systems with universal nonlinearities have interesting properties, which we describe in the following proposition and remark.

Proposition 9. Let $\varrho = \beta$ and $\gamma_{2,j} = 0$. Then the evolution governed by the first order system with a universal nonlinearity (62) preserves simple wavepackets as defined by (12).

Proof. Let $\vec u(\tau)$ be a solution of (62) for $0 \le \tau \le \tau_*$. Using the property (68) we change variables

$u_{j,\zeta} = e^{i\zeta k_{*j}\cdot r}e^{-i\zeta\gamma_{0,j}\tau/\varrho}e^{-i\gamma_{0j,\zeta}'\tau/\beta}v_{j,\zeta}, \quad \gamma_{0j,\zeta}' = -\zeta\gamma_{1j}\cdot k_{*j}$, (71)

and obtain from (62)

$\partial_\tau v_{j,\zeta} = \frac{1}{\beta}\gamma_{1j}\cdot\nabla_r v_{j,\zeta} + F_{j,\zeta}(\vec v), \quad v_{j,\zeta}|_{\tau=0} = \Phi_{j,\zeta}(\beta r)$. (72)

Changing variables

$v_{j,\zeta}(r, \tau) = w_{j,\zeta}(\beta r, \tau), \quad \beta r = z$, (73)

we obtain from (72) that $w_{j,\zeta}$ is a solution of the following system of differential equations:

$\partial_\tau w_{j,\zeta} = \gamma_{1j}\cdot\nabla_z w_{j,\zeta} + F_{j,\zeta}(\vec w), \quad w_{j,\zeta}|_{\tau=0} = \Phi_{j,\zeta}(z)$, (74)

which does not depend on $\beta$. Then using (73) and (71) we observe that every component $u_{j,\zeta}$ of the solution to (62) has the form of a simple wavepacket for every $\tau \in [0, \tau_*]$, with an envelope $\hat w_j(\tau)$.


Remark 10. Equations (62) with universal nonlinearities allow special solutions of the form $u_{j,\zeta} = e^{ik_{*j}\cdot r}\, e^{-i\frac{\gamma_{0j}}{\beta}\tau}\, v_{j,\zeta}(\tau)$, where $v_{j,\zeta}(\tau)$ do not depend on $r$. If the initial data in (72) are constants, $\Psi_{j,\zeta}(\beta r) = \Psi_{j,\zeta}(0)$, then (72) turns into a system of ODEs. This implies that every linear subspace of pure modal functions with the basis $v_{j,+}e^{ik_{*j}\cdot r}$, $v_{j,-}e^{-ik_{*j}\cdot r}$, $j = 1, \ldots, N$, is invariant with respect to the nonlinear equations (62). Another class of special solutions of (62) are time-harmonic solutions of the form $u_{j,\zeta}(r,\tau) = e^{-i\zeta\omega_j\tau}v_{j,\zeta}(r)$, where $v_{j,\zeta}$ solve a nonlinear eigenvalue problem; for universal nonlinearities $\omega_j$ can be considered as an unknown nonlinear eigenvalue. The existence of such special solutions is a special property of universal and weakly universal nonlinearities. It is remarkable that the original nonlinear equations might not have time-harmonic solutions, whereas equations with universal nonlinearities, which approximate the evolution of wavepackets (see Theorem 7), admit such solutions.

2.5. Invariance of excited modes for finite-dimensional ODE's. Here we discuss the resonance invariance conditions imposed in Theorem 5 in the simpler case of finite-dimensional ODEs. In this case one can also see the rise of universal nonlinearities in the process of time averaging. As we already discussed in the Introduction, a PDE system (1), when restricted to constant functions, turns into the following system of ODEs:

$$\partial_\tau U = -\frac{i}{\varrho}L_0 U + F(U), \qquad U(\tau)|_{\tau=0} = h, \quad h \in \mathbb{C}^{2J},\ U \in \mathbb{C}^{2J}, \tag{75}$$

where $F(U)$ is a polynomial, $U = (U_{1,+}, U_{1,-}, \ldots, U_{J,+}, U_{J,-}) \in \mathbb{C}^{2J}$. We assume that the eigenvalues $\omega_{n,\zeta}(0) = \omega^0_{n,\zeta}$ of the Hermitian matrix $L_0 = L(k)|_{k=0}$ are distinct, $\omega^0_{j,+} \ne \omega^0_{i,+}$ for $j \ne i$, and the symmetry conditions (7) take the form $\omega^0_{n,-\zeta} = -\omega^0_{n,\zeta}$. We also assume that the eigenvectors of $L_0$ coincide with the coordinate orts in $\mathbb{C}^{2J}$. The following limit case of Theorem 5 with $\beta = 0$ shows that solutions to this system preserve the set of initially excited modes.

Theorem 11. Let the initial data $h = (h_{1,+}, h_{1,-}, \ldots, h_{J,+}, h_{J,-}) \in \mathbb{C}^{2J}$ in (75) have non-zero components $h_{j,\zeta}$ only for a subset $B$ of indices $j \in \{1, \ldots, J\}$, and let $B' = \{1, \ldots, J\} \setminus B$ be its complementary set. Assume that $B$ is resonance invariant in the sense that the resonance equation

$$\omega^0_{n',\zeta} - \sum_{j=1}^{m}\omega^0_{n_j,\zeta^{(j)}} = 0, \quad \text{where } n_j \in B,\ \zeta^{(j)} \in \{+,-\}, \tag{76}$$

does not have solutions if $n' \in B'$ (compare with Definition 18 in the special case when all $k_{*l} = 0$). Then under the nonlinear evolution of (75) modes with indices $n' \in B'$ remain essentially unexcited in the following sense:

$$\sup_{0\le\tau\le\tau_*}|U_{n'}(\tau)| \le C\varrho \quad \text{for all } n' \in B'. \tag{77}$$

Note that $F(U)$ provides a nonlinear coupling between modes $U_{n_j,\zeta^{(j)}}$ with $n_j \in B$ and $U_{n',\zeta}$ with $n' \in B'$, but the resulting interaction is not $O(1)$ on a fixed time interval $[0,\tau_*]$ as one might expect, but rather of order $O(\varrho)$, as (77) shows. One way to prove


Theorem 11 is to follow the proofs of Theorems 35 and 37 with obvious modifications and simplifications. In particular, instead of (15) one has to consider the following system with oscillatory coefficients:

$$\partial_\tau u = e^{\frac{i\tau}{\varrho}L_0}\,F\left(e^{-\frac{i\tau}{\varrho}L_0}u\right), \qquad u(\tau)|_{\tau=0} = h. \tag{78}$$

Alternatively, Theorem 11 can be derived directly from the classical time averaging principle. Indeed, the time averaging of (78) yields the following averaged system:

$$\partial_\tau v = F_{\mathrm{av}}(v), \qquad v(\tau)|_{\tau=0} = h,$$

where $F_{\mathrm{av}}$ is defined as in (69), (70) with the frequencies $\phi_j = \omega^0_{j,+}$. From the Krylov-Bogolyubov averaging theorem (see [11,37]) one obtains

$$|v(\tau) - u(\tau)| \le C\varrho, \qquad 0 \le \tau \le \tau_*.$$

A straightforward examination shows that if $B$ is resonance invariant and $j \in B'$, then the polynomial components $F_{\mathrm{av},j,\zeta}(v)$ factorize into $F_{\mathrm{av},j,\zeta}(v) = \sum_{j' \in B',\,\zeta'} F^1_{\mathrm{av},j',\zeta'}(v)\,v_{j',\zeta'}$, implying (77) since $v_{j,\zeta}(0) = 0$ for $j \in B'$. The stronger universal resonance invariance condition in Definition 18 also takes a simpler form in the ODE case. Indeed, let us collect the terms in (76) at different $\omega^0_{j,+}$ as in (101), namely

$$\omega^0_{n',\zeta} - \sum_{j=1}^{m}\omega^0_{n_j,\zeta^{(j)}} = \sum_{i=1}^{J}\delta_i\,\omega^0_{i,+}, \quad \text{where } \delta_i \text{ are integers.} \tag{79}$$

Similarly to Definition 18 we call $B$ universally resonance invariant if every solution to the resonance equation (76) must have $n' \in B$ and every coefficient $\delta_i$ in (79) for the solution is zero, i.e. $\delta_i = 0$, $i = 1, \ldots, J$. Obviously, if all $\omega^0_{n,+}$ are rationally independent then $B$ is universally resonance invariant. Now let us look at how universal nonlinearities arise under time averaging. Observe that if the entire set $\{1, \ldots, J\}$ is universally resonance invariant and $F_{j,\zeta}(v)$ are arbitrary polynomials, then the polynomials $F_{\mathrm{av},j,\zeta}(v)$ are obtained by discarding the “resonant” terms in $e^{\frac{i\tau}{\varrho}L_0}F\left(e^{-\frac{i\tau}{\varrho}L_0}u\right)$, yielding the universal form (65), (66). For example, if $F$ is an arbitrary cubic nonlinearity in $\mathbb{C}^{2N}$ then the time averaging yields an NLS-like nonlinearity $F_{\mathrm{av}}$ with components

$$F_{\mathrm{av},j,\zeta}\left(u_{1,+}, u_{1,-}, \ldots, u_{N,+}, u_{N,-}\right) = \sum_{l=1}^{N} b_{l,j,\zeta}\,u_{l,+}\,u_{l,-}\,u_{j,\zeta}.$$
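The way time averaging singles out the NLS-like terms can be illustrated numerically. The sketch below is not part of the original argument: the function name, the zero-based mode indices and the sample frequencies are illustrative assumptions. For a cubic monomial $u_{l_1,\zeta_1}u_{l_2,\zeta_2}u_{l_3,\zeta_3}$ in the component $(j,\zeta)$, the integrand in (69) oscillates with the frequency $\zeta_1\phi_{l_1} + \zeta_2\phi_{l_2} + \zeta_3\phi_{l_3} - \zeta\phi_j$, and the monomial survives the limit (70) only when this frequency vanishes.

```python
from itertools import product

def surviving_cubic_monomials(phi, j, zeta, tol=1e-9):
    """Cubic monomials that survive time averaging in the component (j, zeta).

    phi  : list of frequencies phi_l (zero-based index l, an assumption of this sketch)
    A monomial is encoded as a sorted triple of modes (l, sign); it survives
    only if the combined oscillation frequency is zero up to `tol`.
    """
    modes = [(l, z) for l in range(len(phi)) for z in (+1, -1)]
    survivors = set()
    for triple in product(modes, repeat=3):
        freq = sum(z * phi[l] for l, z in triple) - zeta * phi[j]
        if abs(freq) < tol:
            survivors.add(tuple(sorted(triple)))  # factors commute
    return survivors

# Two modes with rationally independent (hypothetical) frequencies.
phi = [1.0, 2 ** 0.5]
for monomial in sorted(surviving_cubic_monomials(phi, j=0, zeta=+1)):
    print(monomial)
```

With these frequencies only monomials of the form $u_{l,+}u_{l,-}u_{j,\zeta}$ remain, in agreement with the displayed formula for $F_{\mathrm{av},j,\zeta}$. Choosing frequencies that satisfy an additional resonance relation would add further monomials to the list.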

When $B$ is resonance invariant but not universally resonance invariant, the averaging produces a weakly universal nonlinearity. A nonlinearity which is weakly universal but not universal may include additional terms. For example, the cubic nonlinearity in the classical four-wave interaction system, where it is assumed that $\omega^0_{2,-} + \omega^0_{3,+} + \omega^0_{4,+} = \omega^0_{1,+}$ (see [46, p. 201]), involves in the equation for $u_{1,+}$ the product $u_{2,-}u_{3,+}u_{4,+}$ in addition to NLS-like terms.


3. Conditions and Definitions

In this section we formulate and discuss definitions and conditions under which we study the nonlinear evolutionary system (1) through its modal, Fourier form (3). Most of the conditions and definitions are naturally formulated for the modal form (3), and this is one of the reasons we use it as the basic form.

3.1. Linear part. The basic properties of the linear part $L(k)$ of the system (3), which is a $2J \times 2J$ Hermitian matrix with eigenvalues $\omega_{n,\zeta}(k)$, have already been discussed in the Introduction. To account for all needed properties of $L(k)$ we define the singular set of points $k$.

Definition 12 (Band-crossing points). We call $k_0$ a band-crossing point for $L(k)$ if $\omega_{n+1,\zeta}(k_0) = \omega_{n,\zeta}(k_0)$ for some $n$, $\zeta$, or if $L(k)$ is not continuous at $k_0$, or if $\omega_{1,\pm}(k_0) = 0$. We denote the set of such points by $\sigma_{bc}$.

In the next condition we collect all constraints imposed on the linear operator $L(k)$.

Condition 13 (Linear part). The linear part $L(k)$ of the system (3) is a $2J \times 2J$ Hermitian matrix with eigenvalues $\omega_{n,\zeta}(k)$ and corresponding eigenvectors $g_{n,\zeta}(k)$ satisfying for $k \notin \sigma_{bc}$ the basic relations (5)–(7). In addition to that we assume:
(i) the set of band-crossing points $\sigma_{bc}$ is a closed, nowhere dense set in $\mathbb{R}^d$ and has zero Lebesgue measure;
(ii) the entries of the Hermitian matrix $L(k)$ are infinitely differentiable in $k$ for all $k \notin \sigma_{bc}$, which readily implies via the spectral theory [35] infinite differentiability of all eigenvalues $\omega_n(k)$ in $k$ for all $k \notin \sigma_{bc}$;
(iii) $L(k)$ satisfies the polynomial bound

$$\|L(k)\| \le C\left(1 + |k|^p\right), \quad k \in \mathbb{R}^d, \text{ for some } C > 0 \text{ and } p > 0. \tag{80}$$

Remark 14 (Dispersion relations symmetry). The symmetry condition (7) on the dispersion relations naturally arises in many physical problems, for example Maxwell equations in periodic media, see [1–3,5], or when $L(k)$ originates from a Hamiltonian. We would like to stress that these symmetry conditions are not imposed to simplify studies but rather to take into account fundamental symmetries of physical media. In fact, the opposite case when (7) is assumed not to hold is much simpler. The symmetry creates resonant nonlinear interactions, which makes studies more intricate. Interestingly, many problems without symmetries can be put into the framework with symmetry by an extension of the relevant system (see Sect. 4).

Remark 15 (Band-crossing points). Band-crossing points are discussed in more detail in [1, Sect. 5.4], [2, Sects. 4.1, 4.2]. In particular, generically the set $\sigma_{bc}$ of band-crossing points is a manifold of dimension $d - 2$. Notice that there is a natural ambiguity in the definition of the normalized eigenvectors $g_{n,\zeta}(k)$ of $L(k)$, which are defined up to a complex number $\xi$ with $|\xi| = 1$. This ambiguity may not allow an eigenvector $g_{n,\zeta}(k)$, which can be a locally smooth function in $k$, to be a uniquely defined continuous function in $k$ globally for all $k \notin \sigma_{bc}$ because of a possibility of branching. But, importantly, the orthogonal projector $\Pi_{n,\zeta}(k)$ on $g_{n,\zeta}(k)$ as defined by (11) is uniquely defined and, consequently, infinitely differentiable in $k$ via the spectral theory [35] for all $k \notin \sigma_{bc}$. Since we consider $\hat U(k)$ as an element of the space $L^1$ and $\sigma_{bc}$ is of zero Lebesgue measure, considering $k \notin \sigma_{bc}$ is sufficient for us.


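We introduce for vectors $\hat{\mathbf u} \in \mathbb{C}^{2J}$ their expansion with respect to the orthonormal basis $\{g_{n,\zeta}(k)\}$:

$$\hat{\mathbf u}(k) = \sum_{n=1}^{J}\sum_{\zeta=\pm}\hat u_{n,\zeta}(k)\,g_{n,\zeta}(k) = \sum_{n=1}^{J}\sum_{\zeta=\pm}\hat{\mathbf u}_{n,\zeta}(k), \qquad \hat{\mathbf u}_{n,\zeta}(k) = \Pi_{n,\zeta}(k)\,\hat{\mathbf u}(k), \tag{81}$$

and we refer to it as the modal decomposition of $\hat{\mathbf u}(k)$ and to $\hat u_{n,\zeta}(k)$ as the modal coefficients of $\hat{\mathbf u}(k)$. Evidently

$$\sum_{n=1}^{J}\sum_{\zeta=\pm}\Pi_{n,\zeta}(k) = I_{2J}, \quad \text{where } I_{2J} \text{ is the } 2J \times 2J \text{ identity matrix.} \tag{82}$$

Notice that in view of the polynomial bound (80) we can define the action of the operator $L(-i\nabla_r)$ on any Schwartz function $Y(r)$ by the formula

$$\widehat{L(-i\nabla_r)Y}(k) = L(k)\,\hat Y(k), \quad \text{where the order of } L \text{ does not exceed } p. \tag{83}$$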

In the special case when all the entries of $L(k)$ are polynomials, (83) turns into the action of a differential operator with constant coefficients of order not exceeding $p$.

3.2. Nonlinear part. The nonlinear term $\hat F$ in (3) is assumed to be a general functional polynomial of the form

$$\hat F(\hat U) = \sum_{m \in M_F}\hat F^{(m)}(\hat U^m), \quad \text{where } \hat F^{(m)} \text{ is an } m\text{-homogeneous polylinear operator,} \tag{84}$$

$$M_F = \{m_1, \ldots, m_p\} \subset \{2, 3, \ldots\} \text{ is a finite set, and } m_F = \max\{m : m \in M_F\}. \tag{85}$$

The integer $m_F$ in (85) is called the degree of the functional polynomial $\hat F$. For instance, if $M_F = \{2\}$ or $M_F = \{3\}$ the polynomial $\hat F$ is respectively homogeneous quadratic or cubic. Every $m$-linear operator $\hat F^{(m)}$ in (84) is assumed to be of the form of a convolution

$$\hat F^{(m)}(\hat U_1, \ldots, \hat U_m)(k,\tau) = \int_{D_m}\chi^{(m)}(k,\vec k)\,\hat U_1(k') \cdots \hat U_m\left(k^{(m)}(k,\vec k)\right)\tilde d^{(m-1)d}\vec k, \tag{86}$$

where

$$D_m = \mathbb{R}^{(m-1)d}, \quad \tilde d^{(m-1)d}\vec k = \frac{dk' \cdots dk^{(m-1)}}{(2\pi)^{(m-1)d}}, \quad k^{(m)}(k,\vec k) = k - k' - \ldots - k^{(m-1)}, \quad \vec k = (k', \ldots, k^{(m)}), \tag{87}$$

indicating that the nonlinear operator $F^{(m)}(U_1, \ldots, U_m)$ is translation invariant (it may be local or non-local). The quantities $\chi^{(m)}$ in (86) are called susceptibilities. For numerous examples of nonlinearities of the form similar to (84), (86) see [1–7] and references therein. In what follows the nonlinear term $\hat F$ in (3) will satisfy the following conditions.


Condition 16 (Nonlinearity). The nonlinearity $\hat F(\hat U)$ is assumed to be of the form (84)–(86). The susceptibility $\chi^{(m)}(k, k', \ldots, k^{(m)})$ is infinitely differentiable for all $k$ and $k^{(j)}$ which are not band-crossing points, and is bounded, namely

$$\|\chi^{(m)}\| = (2\pi)^{-(m-1)d}\sup_{k,k',\ldots,k^{(m)} \in \mathbb{R}^d \setminus \sigma_{bc}}\left|\chi^{(m)}(k, k', \ldots, k^{(m)})\right| \le C_\chi, \quad m \in M_F, \tag{88}$$

where the norm $|\chi^{(m)}(k,\vec k)|$ of the $m$-linear tensor $\chi^{(m)} : (\mathbb{C}^{2J})^m \to \mathbb{C}^{2J}$ for fixed $k$, $\vec k$ is defined by

$$\left|\chi^{(m)}(k,\vec k)\right| = \sup_{|x_j| \le 1}\left|\chi^{(m)}(k,\vec k)(x_1, \ldots, x_m)\right|, \quad \text{where } |x| \text{ is the Euclidean norm.} \tag{89}$$

When $\chi^{(m)}(k,\vec k)$ depend on small $\varrho$ or, more generally, on $\varrho^q$, $q > 0$, we similarly have $\chi^{(m)}(k,\vec k,\varrho)$. Many results of this paper extend to this case; in particular, if $\left|\chi^{(m)}(k,\vec k,\varrho^q) - \chi^{(m)}(k,\vec k,0)\right| \le C_\chi\varrho^q$ for $\varrho \le 1$, then the conditions of Corollary 38 are fulfilled. Note that since the tensors $\chi^{(m)}(k,\vec k)$ are bounded, the dependence on $k$, $\vec k$ cannot be polynomial; therefore the original equation (1) does not include spatial derivatives but rather includes bounded “pseudodifferential” operators. Note that this type of susceptibilities with spatial dispersion is common in nonlinear optics, see [15,41,55].

3.3. Resonance invariant nk-spectrum. In this section, relying on given dispersion relations $\omega_n(k) \ge 0$, $n \in \{1, \ldots, J\}$, we consider resonance properties of nk-spectra $S$ and the corresponding k-spectra $K_S$ as defined in Definition 2, i.e.

$$S = \{(n_l, k_{*l}),\ l = 1, \ldots, N\} \subset \Sigma = \{1, \ldots, J\} \times \mathbb{R}^d, \qquad K_S = \{k_{*l_i},\ i = 1, \ldots, |K_S|\}. \tag{90}$$

We precede the formal description of the resonance invariance (see Definition 18) with the following guiding physical picture. Initially, at $\tau = 0$, the wave is a multi-wavepacket composed of modes from a small vicinity of the nk-spectrum $S$. As the wave evolves according to (3), the polynomial nonlinearity inevitably involves a larger set of modes $[S]_{out} \supseteq S$, but not all modes in $[S]_{out}$ are “equal” in developing significant amplitudes. The qualitative picture is that whenever a certain interaction phase function (see (134) below) is not zero, the fast time oscillations weaken the effective nonlinear mode interaction and the energy transfer from the original modes in $S$ to the relevant modes from $[S]_{out}$, keeping their magnitudes vanishingly small as $\beta, \varrho \to 0$. There is a smaller set of modes $[S]^{res}_{out}$ which can interact with modes from $S$ rather effectively and develop significant amplitudes. Now,

$$\text{if } [S]^{res}_{out} \subseteq S \text{ then } S \text{ is called resonance invariant.} \tag{91}$$

In simpler situations the resonance invariance conditions turn into the phase and frequency matching conditions well known in nonlinear optics. For instance, if $S$ contains $(n_0, k_{*l_0})$ and the dispersion relations allow for the second harmonic generation in


another band $n_1$ so that $2\omega_{n_0}(k_{*l_0}) = \omega_{n_1}(2k_{*l_0})$, then for $S$ to be resonance invariant it must contain $(n_1, 2k_{*l_0})$ too. Let us turn now to the rigorous constructions. First we introduce necessary notations. Let $m \ge 2$ be an integer, $\vec l = (l_1, \ldots, l_m)$, $l_j \in \{1, \ldots, N\}$, be an integer vector from $\{1, \ldots, N\}^m$ and $\vec\zeta = (\zeta^{(1)}, \ldots, \zeta^{(m)})$, $\zeta^{(j)} \in \{+1,-1\}$, be a binary vector from $\{+1,-1\}^m$. Note that a pair $(\vec\zeta, \vec l)$ naturally labels a sample string of length $m$ composed of elements $(\zeta^{(j)}, n_{l_j}, k_{*l_j})$ from the set $\{+1,-1\} \times S$. Let us introduce the sets

$$\Lambda = \{(\zeta, l) : l \in \{1, \ldots, N\},\ \zeta \in \{+1,-1\}\}, \qquad \Lambda^m = \{\vec\lambda = (\lambda_1, \ldots, \lambda_m),\ \lambda_j \in \Lambda,\ j = 1, \ldots, m\}. \tag{92}$$

There is a natural one-to-one correspondence between $\Lambda^m$ and $\{-1,1\}^m \times \{1, \ldots, N\}^m$, and we will write, exploiting this correspondence,

$$\vec\lambda = \left((\zeta^{(1)}, l_1), \ldots, (\zeta^{(m)}, l_m)\right) = (\vec\zeta, \vec l), \quad \vec\zeta \in \{-1,1\}^m,\ \vec l \in \{1, \ldots, N\}^m \text{ for } \vec\lambda \in \Lambda^m. \tag{93}$$

Let us introduce the following linear combination:

$$\kappa_m(\vec\lambda) = \kappa_m(\vec\zeta, \vec l) = \sum_{j=1}^{m}\zeta^{(j)}k_{*l_j} \quad \text{with } \zeta^{(j)} \in \{+1,-1\}, \tag{94}$$

and let $[S]_{K,out}$ be the set of all its values as $k_{*l_j} \in K_S$, $\vec\lambda \in \Lambda^m$, namely

$$[S]_{K,out} = \bigcup_{m \in M_F}\ \bigcup_{\vec\lambda \in \Lambda^m}\left\{\kappa_m(\vec\lambda)\right\}. \tag{95}$$

We call $[S]_{K,out}$ the output k-spectrum of $K_S$. Everywhere in this paper we consider nk-spectra $S$ which satisfy the following condition:

$$[S]_{K,out} \cap \sigma_{bc} = \emptyset. \tag{96}$$

We also define the output nk-spectrum of $S$ by

$$[S]_{out} = \left\{(n,k) \in \{1, \ldots, J\} \times \mathbb{R}^d : n \in \{1, \ldots, J\},\ k \in [S]_{K,out}\right\}. \tag{97}$$
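The definitions (94), (95) can be made concrete by a direct enumeration. The following is a minimal sketch under simplifying assumptions that are not in the text: the dimension is $d = 1$, so that wavevectors are plain numbers, and the function name is hypothetical.

```python
from itertools import product

def output_k_spectrum(K_S, M_F):
    """All values kappa_m = sum_j zeta^(j) * k_{*l_j}, cf. (94) and (95).

    K_S : iterable of scalar wavevectors (d = 1 is assumed in this sketch)
    M_F : iterable of degrees m of the homogeneous parts of the nonlinearity
    """
    out = set()
    for m in M_F:
        for ks in product(K_S, repeat=m):              # choice of l_1, ..., l_m
            for signs in product((+1, -1), repeat=m):  # choice of zeta^(1), ..., zeta^(m)
                out.add(sum(z * k for z, k in zip(signs, ks)))
    return out

# Single wavevector k_* = 1 (hypothetical value).
print(sorted(output_k_spectrum([1], [2])))  # quadratic nonlinearity
print(sorted(output_k_spectrum([1], [3])))  # cubic nonlinearity
```

For a single wavevector $k_*$ the enumeration gives $\{0, \pm 2k_*\}$ in the quadratic case and $\{\pm k_*, \pm 3k_*\}$ in the cubic case.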

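We introduce the following functions:

$$\Omega_{1,m}(\vec\lambda)(\vec k_*) = \sum_{j=1}^{m}\zeta^{(j)}\omega_{l_j}(k_{*l_j}), \quad \vec k_* = (k_{*1}, \ldots, k_{*|K_S|}), \text{ where } k_{*l_j} \in K_S, \tag{98}$$

$$\Omega(\zeta, n, \vec\lambda)(k_{**}, \vec k_*) = -\zeta\,\omega_n(k_{**}) + \Omega_{1,m}(\vec\lambda)(\vec k_*), \tag{99}$$

where $\zeta = \pm 1$, $m \in M_F$ as in (84). We introduce these functions to apply later to phase functions (134). Now we introduce the resonance equation

$$\Omega(\zeta, n, \vec\lambda)\left(\zeta\kappa_m(\vec\lambda), \vec k_*\right) = 0, \quad \vec l \in \{1, \ldots, N\}^m,\ \vec\zeta \in \{-1,1\}^m, \tag{100}$$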


denoting by $P(S)$ the set of its solutions $(m, \zeta, n, \vec\lambda)$. Such a solution is called $S$-internal if $(n, \zeta\kappa_m(\vec\lambda)) \in S$, that is $n = n_{l_0}$, $\zeta\kappa_m(\vec\lambda) = k_{*l_0}$, $l_0 \in \{1, \ldots, N\}$, and we denote the corresponding $l_0 = I(\vec\lambda)$. We also denote by $P_{int}(S) \subset P(S)$ the set of all $S$-internal solutions to (100). Now we consider the simplest solutions to (100), which play an important role. Keeping in mind that the string $\vec l$ can contain several copies of a single value $l$, we can recast the sum in (98) as follows:

$$\Omega_{1,m}(\vec\lambda) = \Omega_{1,m}(\vec\zeta, \vec l) = \sum_{l=1}^{N}\delta_l\,\omega_l(k_{*l}), \quad \text{where } \delta_l = \begin{cases}\sum_{j \in \vec l^{-1}(l)}\zeta^{(j)} & \text{if } \vec l^{-1}(l) \ne \emptyset,\\ 0 & \text{if } \vec l^{-1}(l) = \emptyset,\end{cases} \tag{101}$$

$$\vec l^{-1}(l) = \{j : l_j = l,\ 1 \le j \le m\}, \quad \vec l = (l_1, \ldots, l_m), \quad 1 \le l \le N.$$

Let us call a solution $(m, \zeta, n, \vec\lambda) \in P(S)$ of (100) universal if it has the following properties: (i) only a single coefficient out of all $\delta_l$ in (101) is nonzero, namely for some $I_0$ we have $\delta_{I_0} = \pm 1$ and $\delta_l = 0$ for $l \ne I_0$; (ii) $n = n_{I_0}$ and $\zeta = \delta_{I_0}$. A justification for calling such a solution universal comes from the fact that if it is a solution for one $\vec k_*$ it is a solution for any other $\vec k_* \in \mathbb{R}^d$. We denote the set of universal solutions to (100) by $P_{univ}(S)$, and note that a universal solution is an $S$-internal solution with $I(\vec\lambda) = I_0$, implying

$$P_{univ}(S) \subseteq P_{int}(S). \tag{102}$$

Indeed, observe that for $\delta_l$ as in (101),

$$\kappa_m(\vec\lambda) = \kappa_m(\vec\zeta, \vec l) = \sum_{j=1}^{m}\zeta^{(j)}k_{*l_j} = \sum_{l=1}^{N}\delta_l\,k_{*l}, \tag{103}$$

implying $\kappa_m(\vec\lambda) = \delta_{I_0}k_{*I_0}$ and $\zeta\kappa_m(\vec\lambda) = \delta_{I_0}^2 k_{*I_0} = k_{*I_0}$. Then Eq. (100) is obviously satisfied and $(n, \zeta\kappa_m(\vec\lambda)) = (n_{I_0}, k_{*I_0}) \in S$.

Example 17 (Universal solutions). Suppose there is just a single band, i.e. $J = 1$, a symmetric dispersion relation $\omega_1(-k) = \omega_1(k)$, and a cubic nonlinearity $F$ with $M_F = \{3\}$. First let us take the simplest nk-spectrum $S_1 = \{(1, k_*)\}$, that is $N = 1$. Then $\Omega_{1,3}(\vec\lambda)(k_*) = \delta_1\omega_1(k_*)$ and $\kappa_m(\vec\lambda) = \delta_1 k_*$, where we use notation (101). The universal solution set has the form $P_{univ}(S_1) = \{(3, \zeta, 1, \vec\lambda) : \vec\lambda \in \Lambda_\zeta,\ \zeta = \pm\}$, where $\Lambda_+$ consists of vectors $(\lambda_1, \lambda_2, \lambda_3)$ of the form $((-,1),(+,1),(+,1))$, $((+,1),(-,1),(+,1))$ and $((+,1),(+,1),(-,1))$. Obviously, $P_{univ}(S_1) = P_{int}(S_1)$.

In the next example we take the nk-spectrum $S = \{(1, k_*), (1, -k_*)\}$, that is $N = 2$ and $k_{*1} = k_*$, $k_{*2} = -k_*$. This example is typical for two counterpropagating waves. Then

$$\Omega_{1,3}(\vec\lambda)(\vec k_*) = \sum_{j=1}^{3}\zeta^{(j)}\omega_{l_j}(k_{*l_j}) = (\delta_1 + \delta_2)\,\omega_1(k_*) \quad \text{and} \quad \kappa_m(\vec\lambda) = \sum_{j=1}^{3}\zeta^{(j)}k_{*l_j} = \delta_1 k_{*1} + \delta_2 k_{*2} = (\delta_1 - \delta_2)\,k_*,$$

where we use notation (101). The universal solution set has the form $P_{univ}(S) = \{(3, \zeta, 1, \vec\lambda) : \vec\lambda \in \Lambda_\zeta,\ \zeta = \pm\}$, where $\Lambda_+$ consists of vectors $(\lambda_1, \lambda_2, \lambda_3)$ of the form $((+,1),(-,1),(+,1))$, $((+,1),(-,1),(+,2))$, $((+,2),(-,2),(+,1))$, $((+,2),(-,2),(+,2))$, and vectors obtained from the listed ones by permutations of the coordinates $\lambda_1, \lambda_2, \lambda_3$. The solutions from $P_{int}(S)$ have to satisfy $|\delta_1 - \delta_2| = 1$ and $|\delta_1 + \delta_2| = 1$, which is possible only if $\delta_1\delta_2 = 0$. Since $\zeta = \delta_1 + \delta_2$ we have $\zeta\kappa_m(\vec\lambda) = (\delta_1^2 - \delta_2^2)\,k_*$, and $\zeta\kappa_m(\vec\lambda) = k_{*1}$ if $|\delta_1| = 1$ or $\zeta\kappa_m(\vec\lambda) = k_{*2}$ if $|\delta_2| = 1$. Hence $P_{int}(S) = P_{univ}(S)$ in this case. Note that if we set $S_2 = \{(1, -k_*)\}$ then $S = S_1 \cup S_2$ but $P_{int}(S)$ is larger than $P_{int}(S_1) \cup P_{int}(S_2)$. This can be interpreted as follows. When only modes from $S_1$ are excited, the modes from $S_2$ remain non-excited. But when both $S_1$ and $S_2$ are excited, there is a resonance effect of $S_1$ onto $S_2$, represented, for example, by $\vec\lambda = ((+,1),(-,1),(+,2))$, which involves the mode $\zeta\kappa_m(\vec\lambda) = k_{*2}$.

Now we are ready to define resonance invariant spectra. First, we introduce a subset $[S]^{res}_{out}$ of $[S]_{out}$ by the formula

$$[S]^{res}_{out} = \left\{(n, k_{**}) \in [S]_{out} : k_{**} = \zeta\kappa_m(\vec\lambda),\ m \in M_F, \text{ where } (m, \zeta, n, \vec\lambda) \text{ is a solution of (100)}\right\}, \tag{104}$$

calling it the resonant output spectrum of $S$, and then we define the resonance selection operation

$$R(S) = S \cup [S]^{res}_{out}. \tag{105}$$

Definition 18 (Resonance invariant nk-spectrum). The nk-spectrum $S$ is called resonance invariant if $R(S) = S$ or, equivalently, $[S]^{res}_{out} \subseteq S$. The nk-spectrum $S$ is called universally resonance invariant if $R(S) = S$ and $P_{univ}(S) = P_{int}(S)$.

It is worth noticing that even when an nk-spectrum is not resonance invariant, often it can be easily extended to a resonance invariant one. Namely, if $R^j(S) \cap \sigma_{bc} = \emptyset$ for all $j$ then the set

$$R^\infty(S) = \bigcup_{j=1}^{\infty}R^j(S) \subset \Sigma = \{1, \ldots, J\} \times \mathbb{R}^d$$

is resonance invariant. In addition to that, $R^\infty(S)$ is always at most countable. Usually it is finite, i.e. $R^\infty(S) = R^p(S)$ for a finite $p$, see the examples below, and we also show below that $R^\infty(S) = S$ for generic $K_S$.

Example 19 (Resonance invariant nk-spectra for quadratic nonlinearity). Suppose there is a single band, i.e. $J = 1$, with a symmetric dispersion relation, and a quadratic nonlinearity $F$, that is $M_F = \{2\}$. Let us assume that $k_* \ne 0$ and that $k_*$, $2k_*$, $0$ are not band-crossing points, and look at two examples. First, suppose that $2\omega_1(k_*) \ne \omega_1(2k_*)$ (no second harmonic generation) and $\omega_1(0) \ne 0$. Let us set the nk-spectrum to be the set $S_1 = \{(1, k_*)\}$; then $S_1$ is resonance invariant. Indeed, $K_{S_1} = \{k_*\}$, $[S_1]_{K,out} = \{0, 2k_*, -2k_*\}$, $[S_1]_{out} = \{(1,0), (1,2k_*), (1,-2k_*)\}$, and an elementary examination shows that $[S_1]^{res}_{out} = \emptyset \subset S_1$, implying $R(S_1) = S_1$. For the second example let us assume $\omega_1(0) \ne 0$ and $2\omega_1(k_*) = \omega_1(2k_*)$, that is, the second harmonic generation is allowed. Here $[S_1]^{res}_{out} = \{(1, 2k_*)\}$ and $R(S_1) = \{(1, k_*), (1, 2k_*)\}$, implying $R(S_1) \ne S_1$ and, hence, $S_1$ is not resonance invariant. Suppose now that $4k_*, 3k_* \notin \sigma_{bc}$ and $\omega_1(0) \ne 0$, $\omega_1(4k_*) \ne 2\omega_1(2k_*)$, $\omega_1(3k_*) \ne \omega_1(k_*) + \omega_1(2k_*)$, and let us set $S_2 = \{(1, k_*), (1, 2k_*)\}$. An elementary examination shows that $S_2$ is resonance invariant. Note that $S_2$ can be obtained by iterating the resonance selection operator, namely $S_2 = R(R(S_1))$. Note also that $P_{univ}(S_2) \ne P_{int}(S_2)$. Notice that $\omega_1(0) = 0$ is a special case since $k = 0$ is then a band-crossing point, and it requires a special treatment.

Example 20 (Resonance invariant nk-spectra for cubic nonlinearity). Let us consider the one-band case with a symmetric dispersion relation and a cubic nonlinearity, that is $M_F = \{3\}$. First we take $S_1 = \{(1, k_*)\}$ and assume that $k_*$, $3k_*$ are not band-crossing points, implying $[S_1]_{K,out} = \{k_*, -k_*, 3k_*, -3k_*\}$. We have $\Omega_{1,3}(\vec\lambda)(k_*) = \sum_{j=1}^{3}\zeta^{(j)}\omega_1(k_*) = \delta_1\omega_1(k_*)$ and $\kappa_m(\vec\lambda) = \delta_1 k_*$, where we use notation (101) and $\delta_1$ takes the values $1, -1, 3, -3$. If $3\omega_1(k_*) \ne \omega_1(3k_*)$, then (100) has a solution only if $|\delta_1| = 1$ and $\delta_1 = \zeta$, hence $\zeta\kappa_m(\vec\lambda) = k_*$ and every solution is internal. Therefore, $[S_1]^{res}_{out} = \emptyset$ and $R(S_1) = S_1$. Now consider the case associated with the third harmonic generation, namely $3\omega_1(k_*) = \omega_1(3k_*)$, and assume that $\omega_1(3k_*) + 2\omega_1(k_*) \ne \omega_1(5k_*)$, $3\omega_1(3k_*) \ne \omega_1(9k_*)$, $2\omega_1(3k_*) + \omega_1(k_*) \ne \omega_1(7k_*)$, $2\omega_1(3k_*) - \omega_1(k_*) \ne \omega_1(5k_*)$. An elementary examination shows that the set $S_4 = \{(1, 3k_*), (1, k_*), (1, -k_*), (1, -3k_*)\}$ satisfies $R(S_4) = S_4$. Consequently, a multi-wavepacket having $S_4$ as its resonance invariant nk-spectrum involves the third harmonic generation and, according to Theorem 3, it is preserved under nonlinear evolution.
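The "elementary examination" in the quadratic example can be carried out mechanically by scanning the resonance equation (100) over all sign and index strings. The sketch below is only an illustration under stated assumptions: it takes $d = 1$, integer wavevectors, and a hypothetical toy dispersion relation $\omega_1(k) = k^2 + 2$, chosen so that $2\omega_1(1) = \omega_1(2)$; none of these choices, nor the function names, come from the text.

```python
from itertools import product

def resonance_selection(S, omega, bands, M_F, tol=1e-9):
    """One application of R(S) = S ∪ [S]^res_out, cf. (100), (104), (105).

    S     : set of pairs (n, k) with scalar k (d = 1 assumed in this sketch)
    omega : callable omega(n, k), the dispersion relation of band n
    bands : band indices n tested in the resonance equation
    M_F   : degrees m of the homogeneous parts of the nonlinearity
    """
    S = set(S)
    resonant = set()
    for m in M_F:
        for modes in product(sorted(S), repeat=m):
            for signs in product((+1, -1), repeat=m):
                kappa = sum(z * k for z, (_, k) in zip(signs, modes))             # (94)
                omega_1m = sum(z * omega(n, k) for z, (n, k) in zip(signs, modes))  # (98)
                for zeta in (+1, -1):
                    k_out = zeta * kappa
                    for n in bands:
                        # resonance equation (100)
                        if abs(-zeta * omega(n, k_out) + omega_1m) < tol:
                            resonant.add((n, k_out))
    return S | resonant

def omega_toy(n, k):
    # hypothetical symmetric dispersion relation with 2*omega(1) == omega(2)
    return k * k + 2

S1 = {(1, 1)}
S2 = resonance_selection(S1, omega_toy, bands=[1], M_F=[2])
print(sorted(S2))
print(resonance_selection(S2, omega_toy, bands=[1], M_F=[2]) == S2)
```

In this toy setting one application of $R$ adds the second harmonic $(1, 2k_*)$ to $S_1$, and a second application adds nothing, which mirrors the passage from $S_1$ to the resonance invariant spectrum $S_2$ in Example 19.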
The above examples indicate that in simple cases the conditions on $k_*$ which can make $S$ non-invariant with respect to $R$ have the form of several algebraic equations; therefore, for almost all $k_*$ such spectra $S$ are resonance invariant. The examples also show that if we fix $S$ and the dispersion relations, then we can include $S$ in a larger spectrum $S' = R^p(S)$ using repeated application of the operation $R$ to $S$, and often the resulting extended nk-spectrum $S'$ is resonance invariant. We show in the following section that an nk-spectrum $S$ with generic $K_S$ is universally resonance invariant. Note that the concept of resonance invariant nk-spectrum gives a mathematical description of such fundamental concepts of nonlinear optics as phase matching, frequency matching, four-wave interaction in cubic media and three-wave interaction in quadratic media. If a multi-wavepacket has a resonance invariant spectrum, all these phenomena may take place in the internal dynamics of the multi-wavepacket, but do not lead to resonant interactions with the continuum of all remaining modes.

3.4. Genericity of the nk-spectrum invariance condition. In simpler situations, when the number of bands $J$ and the number of wavepackets $N$ are not too large, the resonance invariance of an nk-spectrum can be easily verified as above in Examples 19, 20, but what can one say if $J$ or $N$ are large, or if the dispersion relations are not explicitly given? We show below that in properly defined non-degenerate cases a small variation of $K_S$ makes $S$ universally resonance invariant, i.e. the resonance invariance is a generic phenomenon. Assume that the dispersion relations $\omega_n(k) \ge 0$, $n \in \{1, \ldots, J\}$, are given. Observe then that $\Omega_m(\zeta, n, \vec\lambda) = \Omega_m(\zeta, n, \vec\lambda)(k_{*1}, \ldots, k_{*|K_S|})$ defined by (99) is a continuous function of $k_{*l} \notin \sigma_{bc}$ for every $m$, $\zeta$, $n$, $\vec\lambda$.


Definition 21 ($\omega$-degenerate dispersion relations). We call dispersion relations $\omega_n(k)$, $n = 1, \ldots, J$, $\omega$-degenerate if there exists a point $k_* \in \mathbb{R}^d \setminus \sigma_{bc}$ such that for all $k$ in a neighborhood of $k_*$ at least one of the following four conditions holds: (i) the relations are linearly dependent, namely $\sum_{n=1}^{J}C_n\omega_n(k) = c_0$, where all $C_n$ are integers, one of which is nonzero, and $c_0$ is a constant; (ii) at least one of $\omega_n(k)$ is a linear function; (iii) at least one of $\omega_n(k)$ satisfies the equation $C\omega_n(k) = \omega_n(Ck)$ with some $n$ and integer $C \ne \pm 1$; (iv) at least one of $\omega_n(k)$ satisfies the equation $\omega_n(k) = \omega_{n'}(-k)$, where $n' \ne n$.

Note that fulfillment of any of the four conditions in Definition 21 makes it impossible to turn some non-resonance invariant sets into resonance invariant ones by a variation of $k_{*l}$. For instance, if $M_F = \{2\}$ as in Example 19 and $2\omega_1(k) = \omega_1(2k)$ for all $k$ in an open set $G$, then the set $\{(1, k_*)\}$ with $k_* \in G$ cannot be made resonance invariant by a small variation of $k_*$. Below we show that if the dispersion relations are not $\omega$-degenerate, then a small variation of $k_{*l}$ turns non-resonance invariant sets into resonance invariant ones.

Theorem 22. If $\Omega_m(\zeta, n_0, \vec\lambda)(k_{*1}, \ldots, k_{*|K_S|}) = 0$ on a cylinder $G$ in $(\mathbb{R}^d \setminus \sigma_{bc})^{|K_S|}$ which is a product of small balls $G_i \subset \mathbb{R}^d \setminus \sigma_{bc}$, then either $(m, \zeta, n_0, \vec\lambda) \in P_{univ}(S)$ or the dispersion relations $\omega_n(k)$ are $\omega$-degenerate as in Definition 21.

Proof. Collecting similar terms in (100) we obtain the following equation for $k_i$ from $G_i$:

$$\sum_{n=1}^{J}\sum_{i=1}^{|K_S|}\delta_{in}\,\omega_n(k_i) = \zeta\,\omega_{n_0}\left(\sum_{i=1}^{|K_S|}\delta_i k_i\right), \quad \text{where } \delta_{in}, \delta_i \text{ are integers.} \tag{106}$$

Comparing (106) with (101) we see that $\delta_{in}$ may be non-zero only if $(n, k_i) \in S$, that is $(n, k_i) = (n_l, k_l)$ with $l \in \{1, \ldots, N\}$, where $l = l(i,n)$ is uniquely determined and $\delta_{in} = \delta_l$ with $\delta_l$ as in (101). If there are two nonzero coefficients $\delta_i$ in (106) we use the elementary Proposition 24 below, noticing that we are in case (ii) of Definition 21. If we do not have two nonzero $\delta_i$ then either all $\delta_i = 0$ or only one $\delta_i = \delta_{i_0} \ne 0$. If all $\delta_i = 0$ then the right-hand side of (106) turns into $\omega_{n_0}(0)$ and, since $G_i \subset \mathbb{R}^d \setminus \sigma_{bc}$, $\omega_{n_0}(0) \ne 0$. Hence, for every $i$ the sum $\sum_{n=1}^{J}\delta_{in}\omega_n(k_i)$ is constant, one of $\delta_{in}$ is non-zero and we are in case (i) of Definition 21. If only one $\delta_i \ne 0$ with $i = i_0$ we have

$$\sum_{n=1}^{J}\sum_{i=1}^{|K_S|}\delta_{in}\,\omega_n(k_i) = \zeta\,\omega_{n_0}\left(\delta_{i_0}k_{i_0}\right) \quad \text{for all } k_i \in G_i,\ k_{i_0} \in G_{i_0}, \tag{107}$$

implying linear dependence of the dispersion relations, namely

$$\sum_{n=1}^{J}\delta_{in}\,\omega_n(k_i) = C_i, \quad i \ne i_0, \text{ where } C_i \text{ are constant.}$$

The above equations would not imply linear dependence as in case (i) of Definition 21 only if

$$\delta_{in} = 0, \quad i \ne i_0,\ n = 1, \ldots, J, \tag{108}$$


and in this case the equality (107) takes the form

$$\sum_{n=1}^{J}\delta_{i_0 n}\,\omega_n(k_{i_0}) = \zeta\,\omega_{n_0}\left(\delta_{i_0}k_{i_0}\right) \quad \text{for all } k_{i_0} \in G_{i_0}. \tag{109}$$

Note that in this case we deduce from (94) and (98) that $\sum_{n=1}^{J}\delta_{i_0 n} = \delta_{i_0}$. If $\delta_{i_0} \ne \pm 1$ we are in case (iii) of Definition 21, whereas if $\delta_{i_0} = \pm 1$ and $n \ne n_0$ we are in case (iv) of Definition 21. If $\delta_{i_0} = \pm 1$ and $n = n_0$, (109) turns into $\delta_{i_0}\omega_{n_0}(k_{i_0}) = \zeta\omega_{n_0}(\delta_{i_0}k_{i_0})$. Since $\omega_{n_0} > 0$, it implies $\delta_{i_0} = \zeta$ and $\omega_{n_0}(k_{i_0}) = \omega_{n_0}(\zeta\delta_{i_0}k_{i_0})$. Hence, in this case $(m, \zeta, n_0, \vec\lambda) \in P_{univ}(S)$, and since all possibilities are exhausted the proof is complete.

Theorem 23 (Genericity of resonance invariance). Assume that the dispersion relations $\omega_n(k)$ are continuous and not $\omega$-degenerate as in Definition 21. Let $K_{rinv}$ be the set of points $(k_{*1}, \ldots, k_{*|K_S|})$ such that there exists a universally resonance invariant nk-spectrum $S$ for which its k-spectrum is $K_S = \{k_{*1}, \ldots, k_{*|K_S|}\}$. Then $K_{rinv}$ is an open and everywhere dense set in $(\mathbb{R}^d \setminus \sigma_{bc})^{|K_S|}$.

Proof. The fact that $K_{rinv}$ is open follows from Definition 18 and the continuity in $k$ of the dispersion relations $\omega_n(k)$. Let $G$ be a small open ball such that its closure $\bar G \subset (\mathbb{R}^d \setminus \sigma_{bc})^{|K_S|}$. It suffices to prove that $\bar G \cap K_{rinv}$ contains at least one point $(k_{*1}, \ldots, k_{*|K_S|})$. For a given finite set $M_F$ let us consider all possible

$$(m, \zeta, n_0, \vec\lambda) \in \bigcup_{m \in M_F}\{m\} \times \{-1,1\} \times \{1, \ldots, J\} \times \Lambda^m$$

which are not universal solutions to (100), and for a given $(m, \zeta, n_0, \vec\lambda)$ let $G_0(m, \zeta, n_0, \vec\lambda)$ be the set of solutions $(k_1, \ldots, k_{|K_S|})$ to (100) in $\bar G$, and notice that it is a closed set. Let now $G_0(S) \subset \bar G$ be the union of the sets $G_0(m, \zeta, n_0, \vec\lambda)$ over all $(m, \zeta, n_0, \vec\lambda) \in P(S) \setminus P_{univ}(S)$ and let us show that $G_0(S) \ne \bar G$. Indeed, suppose that $G_0(S) = \bar G$ and hence $\bar G$ is a finite union of closed sets. According to Baire's theorem one of the sets $G_0(m, \zeta, n_0, \vec\lambda)$ with $(m, \zeta, n_0, \vec\lambda) \in P(S) \setminus P_{univ}(S)$ must have a nonempty interior. Then, according to Theorem 22, the dispersion relations $\omega_n(k)$ are $\omega$-degenerate as in Definition 21, contradicting the conditions of the theorem. Hence, there is always a point $(k_{*1}, \ldots, k_{*|K_S|}) \in \bar G \setminus G_0(S)$, which completes the proof.

The proof of the next statement is elementary and we skip it.

Proposition 24. Let $f_1(k)$, $f_2(k)$, $f_3(k)$ be real-valued and continuous functions respectively in neighborhoods of $k_{*1}$, $k_{*2}$, $k_{*1} + k_{*2}$ in $\mathbb{R}^d$. Assume that the equation

$$f_1(k_1) + f_2(k_2) = f_3(\delta_1 k_1 + \delta_2 k_2) + C_0$$

holds in these neighborhoods, where $C_0$, $\delta_1$, $\delta_2$ are constants and $\delta_1\delta_2 \ne 0$. Then all three functions $f_1(k)$, $f_2(k)$, $f_3(k)$ are linear in neighborhoods of $k_{*1}$, $k_{*2}$, $k_{*1} + k_{*2}$ respectively.


4. Reduction to a Standard Framework Many well known nonlinear evolutionary equations and systems can be easily reduced to the framework of (1), (3) involving two small parameters and β and characterized by the following properties: (i) the linear part L has a large factor 1 before it; (ii) the nonlinearity F (U) is independent of , β or depends on regularly; (iii) the initial data depend on β so that they do not vanish as β → 0; (iv) the solutions are considered on the time interval 0 ≤ τ ≤ τ ∗ , where τ ∗ > 0 does not depend on , β. Notice that solutions to (1), (3) under the above conditions exhibit nonlinear effects uniformly with respect to small , β on the time interval 0 ≤ τ ≤ τ ∗ . There are important classes of problems which can be readily reduced to the framework of (1), (3) by a simple rescaling. Systems with a small factor before the nonlinearity. Consider a problem of the form ∂t v = −iLv + αf (v),

v|t=0 = h, 0 < α 1,

(110)

where initial data are bounded uniformly in α. Such problems are reduced to (1) by the time rescaling τ = tα. Note that now = α and the finite time interval 0 ≤ τ ≤ τ ∗ corresponds to the long time interval 0 ≤ t ≤ τ ∗ /α. Systems with small initial data on long time intervals. The equation here is ∂t v = −iLv + f0 (v), f0 (v) =

v|t=0 = α 0 h, 0 < α 0 (m) (m+1) f0 (v) + f0 (v) + . . . ,

1, where (111)

and $f_0^{(m)}(v)$ is a homogeneous polynomial of degree $m \ge 2$. After the rescaling $v = \alpha_0 V$, we obtain the following equation with a small nonlinearity:
\[ \partial_t V = -iLV + \alpha_0^{m-1}\bigl[f_0^{(m)}(V) + \alpha_0 f_0^{(m+1)}(V) + \ldots\bigr], \quad V|_{t=0} = h, \tag{112} \]
which is of the form of (110) with $\alpha = \alpha_0^{m-1}$. Note that nonlinearities $f$ in (110) which are obtained from problems with small initial data and regular nonlinearities $f_0(v)$ have a special form. Namely, they are almost homogeneous, $f(V) = f_0^{(m)}(V) + \alpha[\ldots]$ with leading term $f_0^{(m)}(V)$. Introducing the slow time variable $\tau = t\alpha_0^{m-1}$ we get from the above an equation of the form (1), namely
\[ \partial_\tau V = -\frac{i}{\alpha_0^{m-1}}LV + f^{(m)}(V) + \alpha_0 f^{(m+1)}(V) + \ldots, \quad V|_{\tau=0} = h, \tag{113} \]
where the nonlinearity does not vanish as $\alpha_0 \to 0$. In this case $\varrho = \alpha_0^{m-1}$ and the finite time interval $0 \le \tau \le \tau_*$ corresponds to the long time interval $0 \le t \le \tau_*/\alpha_0^{m-1}$ with small $\alpha_0 \ll 1$. Note that Corollary 38 for $\varrho$-dependent nonlinearities can be applied to this case. This allows, in particular, the results of this paper to be applied to the Sine-Gordon equation where $f_0(v) = \sin v$. Note that a different rescaling $\tau = t\alpha_0^m$ with $\varrho = \alpha_0^m$ would produce a large term $\varrho^{-1/m} f^{(m)}(V)$. If the term $f^{(m)}(V)$ is non-resonant for the initial data $h$, such a term still produces a small contribution to the solution on the interval $t \le \tau_*/\alpha_0^m$ with small $\tau_*$. The approach of this paper can be applied to this moderately singular case as well, but it would require more technical effort, and for the sake of simplicity we restrict ourselves to the regular case. The interaction of a quadratic ($m = 2$) nonlinearity with the cubic term of the 1D model equation of the form (111) was studied by Schneider [51].
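As a worked instance of the rescaling (111)–(113), one may take $m = 3$ with the illustrative polynomial $f_0(v) = -\tfrac{1}{6}v^3 + \tfrac{1}{120}v^5$. This particular polynomial (the first two nonlinear Taylor terms of $\sin v$) is an assumption made here only for the sake of the example and is not taken from the text. Under that assumption the two steps read:
\[
\begin{aligned}
v = \alpha_0 V: \quad & \partial_t V = -iLV + \alpha_0^{2}\Bigl[-\tfrac{1}{6}V^3 + \tfrac{1}{120}\alpha_0^{2}V^5\Bigr], \quad V|_{t=0} = h,\\
\tau = t\alpha_0^{2},\ \varrho = \alpha_0^{2}: \quad & \partial_\tau V = -\frac{i}{\varrho}LV - \tfrac{1}{6}V^3 + \tfrac{1}{120}\varrho V^5, \quad V|_{\tau=0} = h.
\end{aligned}
\]
The leading cubic term is independent of $\varrho$, the quintic correction depends on $\varrho$ regularly, and the interval $0 \le \tau \le \tau_*$ corresponds to $0 \le t \le \tau_*/\alpha_0^{2}$.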


High-frequency carrier waves. Sometimes a high spatial frequency of the carrier waves in the initial wavepackets creates, after a rescaling, a large parameter $1/\varrho$ at the linear part. For example, the nonlinear Schrödinger equation
\[ \partial_\tau U = -i\partial_x^2 U + i\alpha|U|^2U, \quad U|_{\tau=0} = h_1(\beta x)e^{iMk_{*1}x} + h_2(\beta x)e^{iMk_{*2}x} + \text{c.c.}, \tag{114} \]
where c.c. stands for the complex conjugate of the prior term and $M \gg 1$ is a large parameter, can be recast in the form (1). Indeed, changing variables $y = Mx$ in the above equation we obtain
\[ \partial_\tau U = -\frac{i}{\varrho}\partial_y^2 U + i\alpha|U|^2U, \quad U|_{\tau=0} = h_1(\beta_1 y)e^{ik_{*1}y} + h_2(\beta_1 y)e^{ik_{*2}y} + \text{c.c.}, \]
where $\beta_1 = \beta/M \ll 1$, $\varrho = 1/M^2 \ll 1$. Note that though the nonlinearity $|U|^2U$ in (114) is not complex homogeneous, it can be considered as a restriction of a system with a complex homogeneous nonlinearity, as (67) is a restriction of (62).
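The change of variables $y = Mx$ can be checked term by term. In the short verification below, $\tilde U(y,\tau) = U(y/M,\tau)$ is a notation introduced only for this check:
\[
\partial_x^2 U = M^2\,\partial_y^2\tilde U, \qquad h_j(\beta x)\,e^{iMk_{*j}x} = h_j\Bigl(\frac{\beta}{M}\,y\Bigr)e^{ik_{*j}y}, \quad j = 1, 2,
\]
so that the factor before the second derivative is $M^2 = 1/\varrho$ and the envelope scale is $\beta_1 = \beta/M$. For instance, the (hypothetical) values $M = 100$, $\beta = 0.1$ give $\varrho = 10^{-4}$ and $\beta_1 = 10^{-3}$.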

First order hyperbolic equations and systems. Consider now the system (45), (46) for which the symmetry (7) does not hold. The system can be put into the standard framework by formally adding two more equations
\[ \partial_\tau w_1 = \frac{c_1}{\varrho}\partial_x w_1 + F_1(w_1, w_2), \quad \partial_\tau w_2 = \frac{c_2}{\varrho}\partial_x w_2 + F_2(w_1, w_2), \quad w_1|_{\tau=0} = 0, \ w_2|_{\tau=0} = 0, \tag{115} \]
which have only the trivial solution $w_1 = w_2 = 0$, not affecting the solutions to the original system (45), (46). The extended system has the linear part with two-band dispersion relations $\omega_{1,\zeta}(k) = c_1\zeta|k|$, $\omega_{2,\zeta}(k) = c_2\zeta|k|$, $\zeta = \pm$, evidently satisfying (7).

5. Integrated Evolution Equation

Using the variation of constants formula we recast the modal evolution equation (3) into the following equivalent integral form:
\[ \hat U(k,\tau) = \int_0^\tau e^{-\frac{i(\tau-\tau')}{\varrho}L(k)}\,\hat F(\hat U)(k,\tau')\,d\tau' + e^{-\frac{i\tau}{\varrho}L(k)}\,\hat h(k), \quad \tau \ge 0. \tag{116} \]
Then we factor $\hat U(k,\tau)$ into the slow variable $\hat u(k,\tau)$ and the fast oscillatory term as in (14), namely
\[ \hat U(k,\tau) = e^{-\frac{i\tau}{\varrho}L(k)}\,\hat u(k,\tau), \quad \hat U_{n,\zeta}(k,\tau) = \hat u_{n,\zeta}(k,\tau)\,e^{-\frac{i\tau}{\varrho}\zeta\omega_n(k)}, \tag{117} \]

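where $\hat u_{n,\zeta}(k,\tau)$ are the modal coefficients of $\hat u(k,\tau)$ as in (81). Notice that $\hat u_{n,\zeta}(k,\tau)$ in (117) may depend on $\varrho$, and (117) is just a change of variables and not an assumption. Consequently we obtain the following integrated evolution equation for $\hat u = \hat u(k,\tau)$, $\tau \ge 0$,
\[ \hat u(k,\tau) = \mathcal F(\hat u)(k,\tau) + \hat h(k), \quad \mathcal F(\hat u) = \sum_{m \in M_F}\mathcal F^{(m)}(\hat u^m)(k,\tau), \tag{118} \]
\[ \mathcal F^{(m)}(\hat u^m)(k,\tau) = \int_0^\tau e^{\frac{i\tau'}{\varrho}L(k)}\,\hat F_m\Bigl(\bigl(e^{-\frac{i\tau'}{\varrho}L(\cdot)}\hat u\bigr)^m\Bigr)(k,\tau')\,d\tau', \tag{119} \]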


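where $\hat F_m$ are defined by (84) and (86) in terms of the susceptibilities $\chi^{(m)}$, and $\mathcal F^{(m)}$ are bounded as in the following lemma.

Lemma 25 (Boundedness of multilinear operators). $\mathcal F^{(m)}$ defined by (86), (119) is a bounded operator from $E = C([0,\tau_*], L^1)$ into $C^1([0,\tau_*], L^1)$ satisfying
\[ \bigl\|\mathcal F^{(m)}(\hat u_1 \ldots \hat u_m)\bigr\|_E \le \tau_*\,\|\chi^{(m)}\|\prod_{j=1}^m \|\hat u_j\|_E, \tag{120} \]
\[ \bigl\|\partial_\tau \mathcal F^{(m)}(\hat u_1 \ldots \hat u_m)\bigr\|_E \le \|\chi^{(m)}\|\prod_{j}\|\hat u_j\|_E. \tag{121} \]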

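Proof. Notice that since $L(k)$ is Hermitian, $\|\exp\{-iL(k)\tau_1/\varrho\}\| = 1$. Using the Young inequality,
\[ \|\hat u * \hat v\|_{L^1} \le \|\hat u\|_{L^1}\,\|\hat v\|_{L^1}, \tag{122} \]
together with (86), (119) we obtain
\[
\begin{aligned}
\bigl\|\mathcal F^{(m)}(\hat u_1 \ldots \hat u_m)(\cdot,\tau)\bigr\|_{L^1}
&\le \sup_{k,\vec k}\bigl|\chi^{(m)}(k,\vec k)\bigr| \int_{R^d}\int_0^\tau\int_{D_m} |\hat u_1(k')| \ldots |\hat u_m(k^{(m)}(k,\vec k))|\,dk' \ldots dk^{(m-1)}\,d\tau_1\,dk \\
&\le \|\chi^{(m)}\|\int_0^\tau \|\hat u_1(\tau_1)\|_{L^1} \ldots \|\hat u_m(\tau_1)\|_{L^1}\,d\tau_1 \le \tau_*\,\|\chi^{(m)}\|\,\|\hat u_1\|_E \ldots \|\hat u_m\|_E,
\end{aligned}
\]
proving (120). Similarly we prove (121) by
\[
\bigl\|\partial_\tau\mathcal F^{(m)}(\hat u_1 \ldots \hat u_m)(\cdot,\tau)\bigr\|_{L^1}
\le \|\chi^{(m)}\|\int_{R^d}\int_{D_m} |\hat u_1(k')| \ldots |\hat u_m(k^{(m)}(k,\vec k))|\,dk' \ldots dk^{(m-1)}\,dk
\le \|\chi^{(m)}\|\,\|\hat u_1\|_E \ldots \|\hat u_m\|_E.
\]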
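The Young inequality (122), and its iterated form for $m$ factors used in the proof above, has an exact discrete analogue for finite sequences. The sketch below is only an illustration of that analogue: the sequences `u`, `v`, `w` are arbitrary sample data, and the helper names `conv` and `l1` are introduced here and are not taken from the paper.

```python
def conv(u, v):
    """Full discrete convolution of two finite complex sequences."""
    out = [0j] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            out[i + j] += a * b
    return out


def l1(u):
    """Discrete l^1 norm: sum of the moduli of the entries."""
    return sum(abs(a) for a in u)


# Arbitrary sample data (hypothetical, for illustration only).
u = [1 + 2j, -0.5, 0.25j, 3.0]
v = [0.5 - 1j, 2.0, -1.5j]
w = [1.0, -1.0, 0.5]

# Two-factor inequality, the discrete counterpart of (122).
lhs2 = l1(conv(u, v))
rhs2 = l1(u) * l1(v)

# Three-factor version, as used for an m-linear term with m = 3.
lhs3 = l1(conv(conv(u, v), w))
rhs3 = l1(u) * l1(v) * l1(w)

print(lhs2 <= rhs2, lhs3 <= rhs3)
```

Because the bound involves only the norms of the factors, multiplying the integrand by a unimodular oscillatory factor does not change it, which is why the estimates (120), (121) are uniform in $\varrho$.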

Equation (118) can be recast as the following abstract equation in a Banach space:
\[ \hat u = \mathcal F(\hat u) + \hat h, \quad \hat u, \hat h \in E, \tag{123} \]
and it readily follows from Lemma 25 that $\mathcal F(\hat u)$ has the following properties.

Lemma 26. The operator $\mathcal F(\hat u)$ defined by (118)–(119) satisfies the Lipschitz condition
\[ \bigl\|\mathcal F(\hat u_1) - \mathcal F(\hat u_2)\bigr\|_E \le \tau_* C_F\,\|\hat u_1 - \hat u_2\|_E, \tag{124} \]
where $C_F \le C_\chi m_F^2 (4R)^{m_F - 1}$ if $\|\hat u_1\|_E, \|\hat u_2\|_E \le 2R$, with $C_\chi$ as in (88).

We also will use the following form of the contraction principle.


Lemma 27 (Contraction principle). Consider the equation
\[ x = F(x) + h, \quad x, h \in B, \tag{125} \]
where $B$ is a Banach space and $F$ is an operator in $B$. Suppose that for some constants $R_0 > 0$ and $0 < q < 1$ we have
\[ \|h\| \le R_0, \quad \|F(x)\| \le R_0 \ \text{ if } \|x\| \le 2R_0, \tag{126} \]
\[ \|F(x_1) - F(x_2)\| \le q\,\|x_1 - x_2\| \ \text{ if } \|x_1\|, \|x_2\| \le 2R_0. \tag{127} \]
Then there exists a unique solution $x$ to Eq. (125) such that $\|x\| \le 2R_0$. Let $\|h_1\|, \|h_2\| \le R_0$; then the two corresponding solutions $x_1, x_2$ satisfy
\[ \|x_1\|, \|x_2\| \le 2R_0, \quad \|x_1 - x_2\| \le (1-q)^{-1}\|h_1 - h_2\|. \tag{128} \]
Let $x_1, x_2$ be the solutions of two equations of the form (125) with $F_1, h_1$ and $F_2, h_2$, respectively. Assume that $F_1(u)$ satisfies (126), (127) with a Lipschitz constant $q < 1$ and that $\|F_1(x) - F_2(x)\| \le \delta$ for $\|x\| \le 2R_0$. Then
\[ \|x_1 - x_2\| \le (1-q)^{-1}\bigl(\delta + \|h_1 - h_2\|\bigr). \tag{129} \]
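The statement of Lemma 27 can be tried out on a scalar toy problem. The sketch below is only an illustration under stated assumptions: the Banach space is the real line, and the map $F(x) = 0.1x^2$, the radius $R_0 = 1$ and the data $h_1 = 0.5$, $h_2 = 0.6$ are hypothetical choices, for which (126), (127) hold with $q = 0.4$. It solves (125) by simple iteration and checks the stability estimate (128).

```python
import math


def solve_fixed_point(F, h, x0=0.0, tol=1e-14, max_iter=200):
    """Iterate x_{k+1} = F(x_k) + h until successive iterates agree to tol."""
    x = x0
    for _ in range(max_iter):
        x_new = F(x) + h
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


R0 = 1.0


def F(x):
    # Hypothetical nonlinearity: |F(x)| <= 0.4 <= R0 for |x| <= 2*R0.
    return 0.1 * x * x


# Lipschitz constant on |x| <= 2*R0: |F(x1) - F(x2)| = 0.1*|x1 + x2|*|x1 - x2| <= 0.4*|x1 - x2|.
q = 0.4

h1, h2 = 0.5, 0.6
x1 = solve_fixed_point(F, h1)
x2 = solve_fixed_point(F, h2)

# Right-hand side of the stability estimate (128).
stability_bound = abs(h1 - h2) / (1.0 - q)

print(x1, x2, abs(x1 - x2), stability_bound)
```

For this quadratic map the solution in the ball is also available in closed form, $x = (1 - \sqrt{1 - 0.4h})/0.2$, which gives an independent check of the iteration.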

Lemma 26 and the contraction principle as in Lemma 27 imply the following existence and uniqueness theorem.

Theorem 28. Let $\|\hat h\|_E \le R$, and let $\tau_* < 1/C_F$, where $C_F$ is a constant from Lemma 26. Then Eq. (118) has a solution $\hat u \in E = C([0,\tau_*], L^1)$ which satisfies $\|\hat u\|_E \le 2R$, and such a solution is unique.

The following existence and uniqueness theorem follows from Theorem 28.

Theorem 29. Let (3) satisfy (88) and $\hat h \in L^1(R^d)$, $\|\hat h\|_{L^1} \le R$. Then there exists a unique solution to the modal evolution equation (3) in the functional space $C^1([0,\tau_*], L^1)$. The number $\tau_*$ depends on $R$ and $C_\chi$.

Using the inequality (21) and applying the inverse Fourier transform we readily obtain the existence of an F-solution of (1) in $C^1([0,\tau_*], L^\infty(R^d))$ from the existence of the solution of Eq. (3) in $C^1([0,\tau_*], L^1)$. The existence of F-solutions in spaces of spatially smooth functions can be derived by replacing Lemma 25 with an estimate similar to the one in Lemma 50.

Let us recast now the system (118)–(119) into modal components using the projections $\Pi_{n,\zeta}(k)$ as in (11). The first step is to introduce elementary modal susceptibilities $\chi^{(m)}_{n,\zeta,\vec\xi}$ having one-dimensional range in $C^{2J}$ and vanishing if one of its arguments $\hat u_j$ belongs to a $(2J-1)$-dimensional linear subspace in $C^{2J}$ (the $j$th null-space of $\chi^{(m)}_{n,\zeta,\vec\xi}$). For example, in the linear case $m = 1$, when $\chi^{(1)}$ acts in $C^{2J}$ and is presented in the standard orthonormal basis $e_{n,\zeta}$ in $C^{2J}$ by a $2J \times 2J$ matrix with elements $a^{(1)}_{\xi,\xi'} = a^{(1)}_{n,\zeta,n',\zeta'}$, where the index $\xi = (n,\zeta)$ takes $2J$ values, the action of the elementary susceptibility $\chi^{(1)}_{n,\zeta,n',\zeta'}$ on a vector $v \in C^{2J}$ is given by the formula $\chi^{(1)}_{n,\zeta,n',\zeta'}v = a^{(1)}_{n,\zeta,n',\zeta'}\,(v \cdot e_{n',\zeta'})\,e_{n,\zeta}$. Obviously $\chi^{(1)}_{n,\zeta,n',\zeta'}v = \Pi_{n,\zeta}\chi^{(1)}\Pi_{n',\zeta'}v$ and $\chi^{(1)}v = \sum_{n,\zeta,n',\zeta'}\chi^{(1)}_{n,\zeta,n',\zeta'}v$. The general definition follows.


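Definition 30 (Elementary susceptibilities). Let
\[ \vec\xi = (\vec n, \vec\zeta) \in \{1,\ldots,J\}^m \times \{-1,1\}^m = \Xi^m, \quad (n,\zeta) \in \Xi, \tag{130} \]
and let $\chi^{(m)}(k,\vec k)[\hat u_1(k'),\ldots,\hat u_m(k^{(m)})]$ be the $m$-linear symmetric tensor (susceptibility) as in (86). We introduce elementary susceptibilities $\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k): (C^{2J})^m \to C^{2J}$ as $m$-linear tensors defined for almost all $k$, $\vec k$ by the following formula:
\[
\begin{aligned}
\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)\bigl[\hat u_1(k'),\ldots,\hat u_m(k^{(m)})\bigr]
&= \chi^{(m)}_{n,\zeta,\vec n,\vec\zeta}(k,\vec k)\bigl[\hat u_1(k'),\ldots,\hat u_m(k^{(m)})\bigr] \\
&= \Pi_{n,\zeta}(k)\,\chi^{(m)}(k,\vec k)\bigl[\Pi_{n_1,\zeta'}(k')\hat u_1(k'),\ldots,\Pi_{n_m,\zeta^{(m)}}(k^{(m)}(k,\vec k))\,\hat u_m(k^{(m)}(k,\vec k))\bigr].
\end{aligned} \tag{131}
\]
Then using (82) and the elementary susceptibilities (131) we get
\[ \chi^{(m)}(k,\vec k)\bigl[\hat u_1(k'),\ldots,\hat u_m(k^{(m)})\bigr] = \sum_{n,\zeta}\sum_{\vec\xi}\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)\bigl[\hat u_1(k'),\ldots,\hat u_m(k^{(m)})\bigr]. \tag{132} \]
Consequently the modal components $\mathcal F^{(m)}_{n,\zeta,\vec\xi}$ of the operators $\mathcal F^{(m)}$ in (119) are $m$-linear oscillatory integral operators defined in terms of the elementary susceptibilities (132) as follows.

Definition 31 (Interaction phase). Using notations from (86) we introduce for $\vec\xi = (\vec n,\vec\zeta) \in \Xi^m$ the operator
\[ \mathcal F^{(m)}_{n,\zeta,\vec\xi}(\tilde u_1 \ldots \tilde u_m)(k,\tau) = \int_0^\tau\int_{D_m}\exp\Bigl\{i\phi_{n,\zeta,\vec\xi}(k,\vec k)\frac{\tau_1}{\varrho}\Bigr\}\,\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)\bigl[\tilde u_1(k',\tau_1),\ldots,\tilde u_m(k^{(m)}(k,\vec k),\tau_1)\bigr]\,\tilde d^{(m-1)d}\vec k\,d\tau_1 \tag{133} \]
with the interaction phase function $\phi$ defined by
\[ \phi_{n,\zeta,\vec\xi}(k,\vec k) = \phi_{n,\zeta,\vec n,\vec\zeta}(k,\vec k) = \zeta\omega_n(\zeta k) - \zeta'\omega_{n_1}(\zeta' k') - \ldots - \zeta^{(m)}\omega_{n_m}(\zeta^{(m)}k^{(m)}), \quad k^{(m)} = k^{(m)}(k,\vec k). \tag{134} \]
Using $\mathcal F^{(m)}_{n,\zeta,\vec\xi}$ in (133) we recast $\mathcal F^{(m)}(\hat u^m)$ in the system (118)–(119) as
\[ \mathcal F^{(m)}\bigl[\hat u_1 \ldots \hat u_m\bigr](k,\tau) = \sum_{n,\zeta,\vec\xi}\mathcal F^{(m)}_{n,\zeta,\vec\xi}\bigl[\hat u_1 \ldots \hat u_m\bigr](k,\tau), \tag{135} \]
yielding the following system for the modal components $\hat u_{n,\zeta}(k,\tau)$ as in (11),
\[ \hat u_{n,\zeta}(k,\tau) = \sum_{m \in M_F}\sum_{\vec\xi \in \Xi^m}\mathcal F^{(m)}_{n,\zeta,\vec\xi}(\hat u^m)(k,\tau) + \hat h_{n,\zeta}(k), \quad (n,\zeta) \in \Xi. \tag{136} \]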


6. Wavepacket Interaction System

The wavepacket preservation property of the nonlinear evolutionary system in any of its forms (1), (3), (118), (123), (136) is not easy to see directly. It turns out though that the dynamics of wavepackets is well described by a system in a larger space $E^{2N}$ based on the original equation (118) in the space $E$. We call it a wavepacket interaction system, which is useful in three ways: (i) the wavepacket preservation is quite easy to see and verify; (ii) it can be used to prove the wavepacket preservation for the original nonlinear problem; (iii) it can be used to study more subtle properties of the original problem, such as the NLS approximation. We start with the system (118) where $\hat h(k)$ is a multiwavepacket with a given nk-spectrum $S = \{(k_{*l}, n_l),\ l = 1,\ldots,N\}$ as in (31) and k-spectrum $K_S = \{k_{*i},\ i = 1,\ldots,|K_S|\}$ as in (32). When constructing the wavepacket interaction system it is convenient to have the relevant functions explicitly localized about the k-spectrum $K_S$ of the initial data. We implement that by making up the following cutoff functions based on (25), (26),
\[ \Psi_{i,\vartheta}(k) = \Psi(k,\vartheta k_{*i}) = \Psi\bigl(\beta^{-(1-\epsilon)}(k - \vartheta k_{*i})\bigr), \quad k_{*i} \in K_S, \ i = 1,\ldots,|K_S|, \ \vartheta = \pm, \tag{137} \]
with $\epsilon$ as in Definition 1 and $\beta > 0$ small enough to satisfy
\[ \beta^{1/2} \le \pi_0, \quad \text{where } \pi_0 = \pi_0(S) < \frac{1}{2}\min_{k_{*i} \in K_S}\operatorname{dist}\{k_{*i}, \sigma_{bc}\}. \tag{138} \]

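In what follows we use notations from (92) and
\[ \vec l = (l_1,\ldots,l_m) \in \{1,\ldots,N\}^m, \quad \vec\vartheta = (\vartheta',\ldots,\vartheta^{(m)}) \in \{-1,1\}^m, \quad \vec\lambda = (\vec l,\vec\vartheta) \in \Lambda^m, \tag{139} \]
\[ \vec n = (n_1,\ldots,n_m) \in \{1,\ldots,J\}^m, \quad \vec\zeta \in \{-1,1\}^m, \quad \vec\xi = (\vec n,\vec\zeta) \in \Xi^m, \quad \vec k = (k',\ldots,k^{(m)}) \in R^m, \tag{140} \]
where $\Xi^m$ is as in (130). Based on the above we introduce now the wavepacket interaction system,
\[
\begin{aligned}
&\hat w_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})\,\Pi_{n_l,\vartheta}(\cdot)\,\mathcal F\Bigl(\sum_{(l',\vartheta') \in \Lambda}\hat w_{l',\vartheta'}\Bigr) + \Psi(\cdot,\vartheta k_{*i_l})\,\Pi_{n_l,\vartheta}(\cdot)\,\hat h, \quad (l,\vartheta) \in \Lambda, \\
&\vec w = (\hat w_{1,+}, \hat w_{1,-}, \ldots, \hat w_{N,+}, \hat w_{N,-}) \in E^{2N}, \quad \hat w_{l,\vartheta} \in E,
\end{aligned} \tag{141}
\]
with $\Psi(\cdot,\vartheta k_{*i})$, $\Pi_{n,\vartheta}$ as in (137), (11), $\mathcal F$ defined by (118), and the norm in $E^{2N}$ defined based on (17) by the formula
\[ \|\vec w\|_{E^{2N}} = \sum_{l,\vartheta}\|\hat w_{l,\vartheta}\|_E, \quad E = C([0,\tau_*], L^1). \]
The index $(l,\vartheta)$, which takes $2N$ values, labels equations and variables; the right-hand side of (141) is well-defined for all $\vec w \in E^{2N}$ and the equality (141) is understood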


as equality of elements of $E^{2N}$. We also use the following concise form of the wave interaction system (141):
\[ \vec w = \mathcal F_\Psi(\vec w) + \vec h_\Psi, \quad \text{where } \vec h_\Psi = \bigl(\Psi_{i_1,+}\Pi_{n_1,+}\hat h, \ \Psi_{i_1,-}\Pi_{n_1,-}\hat h, \ \ldots, \ \Psi_{i_N,+}\Pi_{n_N,+}\hat h, \ \Psi_{i_N,-}\Pi_{n_N,-}\hat h\bigr) \in E^{2N}. \tag{142} \]
The following lemma is analogous to Lemmas 25, 26.

Lemma 32. The polynomial operator $\mathcal F_\Psi(\vec w)$ is bounded in $E^{2N}$, $\mathcal F_\Psi(0) = 0$, and it satisfies the Lipschitz condition
\[ \bigl\|\mathcal F_\Psi(\vec w_1) - \mathcal F_\Psi(\vec w_2)\bigr\|_{E^{2N}} \le C\tau_*\,\|\vec w_1 - \vec w_2\|_{E^{2N}}, \tag{143} \]
where $C$ depends only on $C_\chi$ as in (88), on the degree of $\mathcal F$ and on $\|\vec w_1\|_{E^{2N}} + \|\vec w_2\|_{E^{2N}}$, and it does not depend on $\beta$ and $\varrho$.

Proof. We consider every operator $\mathcal F^{(m)}_{n,\zeta,\vec\xi}(\vec w)$ defined by (133) and prove its boundedness and the Lipschitz property as in Lemma 25 using the inequality $|\exp\{i\phi_{n,\zeta,\vec\xi}\,\tau_1/\varrho\}| \le 1$ and estimates (25), (88). Note that the integration in $\tau_1$ yields the factor $\tau_*$ and consequent summation with respect to $n$, $\zeta$, $\vec\xi$ yields (143).

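Lemma 32 and the contraction principle as in Lemma 27 yield the following statement.

Theorem 33. Let $\|\vec h_\Psi\|_{E^{2N}} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that Eq. (141) has a solution $\vec w \in E^{2N}$ which satisfies $\|\vec w\|_{E^{2N}} \le R_1$, and such a solution is unique.

Lemma 34. Every function $\hat w_{l,\zeta}(k,\tau)$ corresponding to the solution of (142) from $E^{2N}$ is a wavepacket with nk-pair $(k_{*l}, n_l)$ with the degree of regularity which can be any $s > 0$.

Proof. Note that according to (137) and (142) the function
\[ \hat w_{l,\vartheta}(k,\tau) = \Psi(k,\vartheta k_{*i_l})\,\Pi_{n_l,\vartheta}\hat F(k,\tau), \quad \|\hat F(\tau)\|_{L^1} \le C, \quad 0 \le \tau \le \tau_*, \]
involves the factor $\Psi_{l,\vartheta}(k) = \Psi\bigl(\beta^{-(1-\epsilon)}(k - \vartheta k_{*l})\bigr)$, where $\epsilon$ is as in Definition 1. Hence,
\[ \Pi_{n,\vartheta'}\hat w_{l,\vartheta}(k,\tau) = 0 \ \text{ if } n \ne n_l \text{ or } \vartheta' \ne \vartheta, \tag{144} \]
\[ \hat w_{l,\vartheta}(k,\tau) = \Psi(k,\vartheta k_{*i_l})\,\hat w_{l,\vartheta}(k,\tau), \quad \hat w_{l,\vartheta}(k,\tau) = 0 \ \text{ if } |k - \vartheta k_{*l}| \ge \beta^{1-\epsilon}, \tag{145} \]
and, consequently, Definition 1 for $\hat w_{l,\vartheta}$ is satisfied with $\hat D_h = 0$ for any $s > 0$ and $C = 0$ in (30).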


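Now we would like to show that if $\hat h$ is a multiwavepacket, then the function
\[ \hat w(k,\tau) = \sum_{(l,\vartheta) \in \Lambda}\hat w_{l,\vartheta}(k,\tau) = \sum_{\lambda \in \Lambda}\hat w_\lambda(k,\tau) \tag{146} \]
is an approximate solution of Eq. (123) (see notation (92)). To do that we introduce
\[ \Psi_\infty(k) = 1 - \sum_{\vartheta=\pm}\sum_{i=1}^{|K_S|}\Psi(k,\vartheta k_{*i}) = 1 - \sum_{\vartheta=\pm}\sum_{k_{*i} \in K_S}\Psi\Bigl(\frac{k - \vartheta k_{*i}}{\beta^{1-\epsilon}}\Bigr). \tag{147} \]
Expanding the $m$-linear operator $\mathcal F^{(m)}\bigl(\bigl(\sum_{l,\vartheta}\hat w_{l,\vartheta}\bigr)^m\bigr)$ and using notations (92), (93) we get
\[ \mathcal F^{(m)}\Bigl(\Bigl(\sum_{l,\vartheta}\hat w_{l,\vartheta}\Bigr)^m\Bigr) = \sum_{\vec\lambda \in \Lambda^m}\mathcal F^{(m)}(\vec w_{\vec\lambda}), \quad \text{where} \tag{148} \]
\[ \vec w_{\vec\lambda} = \hat w_{\lambda_1} \ldots \hat w_{\lambda_m}, \quad \vec\lambda = (\lambda_1,\ldots,\lambda_m) \in \Lambda^m. \tag{149} \]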

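The next statement shows that (146) defines an approximate solution to the integrated evolution equation (118).

Theorem 35. Let $\hat h$ be a multi-wavepacket with resonance invariant nk-spectrum $S$ with regularity degree $s$, let $\vec w$ be a solution of (142) and let $\hat w(k,\tau)$ be defined by (146). Let
\[ D(\hat w) = \hat w - \mathcal F(\hat w) - \hat h. \tag{150} \]
Then there exists $\beta_0 > 0$ such that we have the estimate
\[ \|D(\hat w)\|_E \le C\varrho + C\beta^s, \quad \text{if } 0 < \varrho \le 1, \ \beta \le \beta_0. \tag{151} \]
Proof. Let
\[ \mathcal F^-(\hat w) = \Bigl(1 - \sum_{l,\vartheta}\Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta}\Bigr)\mathcal F(\hat w), \quad \hat h^- = \hat h - \sum_{l,\vartheta}\Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta}\hat h. \tag{152} \]
Summation of (141) with respect to $l$, $\vartheta$ yields
\[ \hat w = \sum_{l,\vartheta}\Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta}\mathcal F(\hat w) + \sum_{l,\vartheta}\Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta}\hat h. \]
Hence, from (141) and (150) we obtain
\[ D(\hat w) = -\hat h^- - \mathcal F^-(\hat w). \tag{153} \]
Using (28) and (30) we consequently obtain
\[ \|\Pi_{n_l,\vartheta}\hat h_i\|_{L^1} \le C\beta^s \ \text{ if } n_l \ne n_i; \quad \|\Psi_{i_l,\vartheta}\hat h_i\|_{L^1} \le C\beta^s \ \text{ if } k_{*i_l} \ne k_{*i}, \qquad \|\hat h^-\|_E \le C_1\beta^s. \tag{154} \]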


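Now, to show (151) it is sufficient to prove that
\[ \|\mathcal F^-(\hat w)\|_E \le C_2\varrho. \tag{155} \]
Obviously,
\[ \mathcal F^-(\hat w) = \Bigl(1 - \sum_{l,\vartheta}\Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta}\Bigr)\sum_m \mathcal F^{(m)}(\hat w^m). \tag{156} \]
Note that
\[ \sum_{l,\vartheta}\Psi_{i_l,\vartheta}\Pi_{n_l,\vartheta} = \sum_{\vartheta=\pm}\ \sum_{(n,k_*) \in S}\Psi(\cdot,\vartheta k_*)\,\Pi_{n,\vartheta}. \tag{157} \]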

Using (82) and (147) we consequently obtain

(·, ϑk∗ ) n,ϑ + ∞ = 1,

(158)

ϑ=± (n,k∗ )∈

⎛ ⎝1 −

⎞

il ,ϑ nl ,ϑ ⎠ = ∞ +

(·, ϑk∗ ) n,ϑ ,

(159)

ϑ=± (n,k∗ )∈ \S

l,ϑ

m ˆ using (148). According to (156) with defined in (90). Let us expand now F (m) w and (159) to prove (155) it is sufficient to prove that for every string λ ∈ m the following inequalities hold: λ ≤ C3 for (n, ϑ) ∈ , and (160) ∞ n,ϑ F (m) w λ ≤ C3 , if (n, k∗ ) ∈ \ S. (161) (·, ϑk∗ ) n,ϑ F (m) w We will use (144) and (145) to obtain the above estimates. According to (135)

(m) λ (k, τ ) = ˆ λ1 . . . w ˆ λm (k, τ ). F (m) w F w n,ζ

n,ζ ,ξ

ξ

(162)

Note that according to (144) if λi = l, ϑ ˆ λi , if n = nl and ϑ = ϑ. ˆ λi = n,ϑ w w Let us introduce the notation n l = nl1 , . . . , nlm , ξ λ = n l , ϑ , Since

n ,ϑ n,ϑ = 0,

ϑ ∈ m . for λ = l,

if n = n or ϑ = ϑ,

(163)

(164)

(165)

then (163) implies ˆ λ1 . . . w ˆ λm = 0 if ξ = n, ζ = ξ λ , and, hence, w

(m) w λ (k, τ ) = ˆ λ1 . . . w ˆ λm (k, τ ), F F (m) w

(m) n,ζ ,ξ

F

n,ζ

n,ζ ,ξ λ

(166)


where we use notation (93), (164). Note also that n ,ϑ F

(m) n,ζ ,ξ

if n = n or ϑ = ζ ,

=0

(m) λ only if and, hence, we have nonzero n ,ϑ F w n,ζ ,ξ ξ = ξ λ , n = n, ϑ = ζ . By (133) (m) λ (k, τ ) F w n,ζ ,ξ λ (m) χ n,ζ ,ξ λ

τ

= 0

Dm

(168)

τ % 1 (169) k, k 1, k(m) k, k , τ 1 d˜ (m−1)d kdτ

$ exp iφ

ˆ λ1 k , τ 1 , . . . , w ˆ λm k, k w

(167)

n,ζ ,ξ λ

Now we use (145) and notice that according to the convolution identity in (86),

w ˆ λm k(m) k, k , τ 1 = 0 if k − ˆ λ1 k , τ 1 · . . . · w ϑ i k∗li ≥ mβ 1− . (170) i

Hence the integral (169) is nonzero only if k, k belongs to the set &

1−

1−

(i) k, k : k − ϑ i k∗li ≤ β , i = 1, . . . , m, k − Bβ = ϑ i k∗li ≤ mβ . i (171) We will prove now that if (n, k∗i ) ∈ / S, then for small β one of the following alternatives holds: (m) λ = 0, (172) either (·, ϑk∗i ) n ,ϑ F w n,ζ ,ξ (173) or (168) holds and φ n,ζ ,ξ k, k ≥ c > 0 for k, k ∈ Bβ . Note that since φ n,ζ ,ξ k, k is smooth, then using notation (94) we get (174) φ n,ζ ,ξ k, k − φ n ,ζ ,ξ k∗∗ , k∗ ≤ Cβ 1− for k, k ∈ Bβ ,

l . ϑ i k∗li = ζ κm ϑ, ϑ = (ϑ 1 , . . . , ϑ m ), k∗∗ = ζ i

Hence the alternative (173) holds if

φ n,ζ ,ξ k∗∗ , k∗ = 0,

(175)

and, consequently, it suffices to prove that either (172) or (175) holds. Combining (171) (m) w with (k, ϑk∗i ) = 0 for |k − ϑk∗i | ≥ β 1− we find that i,ϑ F can be non λ l ∈ [S] K ,out , and zero for small β only in a small neighborhood of a point ζ κm ϑ, that is possible only if

l = ϑk∗i , k∗i ∈ K S . k∗∗ = ζ κm ϑ,

(176)


Let us show that the equality
\[ \phi_{n,\zeta,\vec\xi}(k_{**},\vec k_*) = 0 \tag{177} \]
is impossible for $k_{**}$ as in (176) and $n' = n$ as in (167), keeping in mind that $(n,k_{*i}) \notin S$. It follows from (99) and (134) that Eq. (177) has the form of the resonance equation (100). Since the nk-spectrum $S$ is resonance invariant, in view of Definition 18 the resonance equation (177) may have a solution only if $k_{**} = k_{*i}$, $i = i_l$, $n = n_l$, with $(n_l, k_{*i_l}) \in S$. Since $(n,k_{*i}) \notin S$, this implies that (177) does not have a solution and, hence, (175) holds when $(n,k_{*i}) \notin S$. Notice that Theorem 33 and (121) yield the bounds $\|\hat w_{\lambda_i}\|_E \le R_1$, $\|\partial_\tau\hat w_{\lambda_i}\|_E \le C$. These bounds combined with Lemma 36, proven below, imply that if (175) holds then (161) holds.

Now let us turn to (160). According to (147) and (170) the term $\Psi_\infty\Pi_{n',\vartheta}\mathcal F^{(m)}(\vec w_{\vec\lambda})$ can be non-zero only if $\zeta\kappa_m(\vec\lambda) = k_{**} \notin K_S$. Since the nk-spectrum $S$ is resonance invariant we conclude as above that inequality (175) holds in this case as well. The fact that the set of all $\kappa_m(\vec\lambda)$ is finite, combined with inequality (175), implies (173) for sufficiently small $\beta$. Using Lemma 36 as above we derive (160). Hence, all terms in the expansion (156) are either zero or satisfy (160) or (161), implying consequently (155) and (151).

Here is the lemma used in the above proof.

Lemma 36. Assume that
\[
\begin{aligned}
&\Psi_{i,\vartheta}\Pi_{n',\zeta}\,\chi^{(m)}_{n,\zeta,\vec\xi}(k,\vec k)\bigl[\hat w_{\lambda_1}(k',\tau_1),\ldots,\hat w_{\lambda_m}(k^{(m)}(k,\vec k),\tau_1)\bigr] = 0 \ \text{ for } (k,\vec k) \notin B_\beta, \\
&\text{and } \bigl|\phi_{n,\zeta,\vec\xi}(k,\vec k)\bigr| \ge \omega_* > 0 \ \text{ for } (k,\vec k) \in B_\beta, \ \text{ with } B_\beta \text{ as in (171)}.
\end{aligned} \tag{178}
\]
Then
\[ \bigl\|\Psi(\cdot,\vartheta k_{*i})\,\Pi_{n',\zeta}\,\mathcal F^{(m)}_{n,\zeta,\vec\xi}(\vec w_{\vec\lambda})\bigr\|_E \le \frac{4\varrho}{\omega_*}\,\|\chi^{(m)}\|\prod_j\|\hat w_{\lambda_j}\|_E + \frac{2\varrho\tau_*}{\omega_*}\,\|\chi^{(m)}\|\sum_i\|\partial_\tau\hat w_{\lambda_i}\|_E\prod_{j \ne i}\|\hat w_{\lambda_j}\|_E. \tag{179} \]

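Proof. Notice that the oscillatory factor in (133) equals
\[ \exp\Bigl\{i\phi(k,\vec k)\frac{\tau_1}{\varrho}\Bigr\} = \frac{\varrho}{i\phi(k,\vec k)}\,\partial_{\tau_1}\exp\Bigl\{i\phi(k,\vec k)\frac{\tau_1}{\varrho}\Bigr\}. \]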
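The gain of a factor $\varrho$ produced by this identity can be seen on a scalar model integral $\int_0^\tau e^{i\phi\tau_1/\varrho}g(\tau_1)\,d\tau_1$. One integration by parts bounds it by $(\varrho/|\phi|)\bigl(|g(0)| + |g(\tau)| + \int_0^\tau|g'|\,d\tau_1\bigr)$. The sketch below is an illustration only: the amplitude $g(t) = 1 + t^2$, the constant phase $\phi = 1$ and the trapezoid quadrature are assumptions chosen for the example, not objects from the paper.

```python
import cmath


def oscillatory_integral(g, phi, rho, tau, n=20000):
    """Trapezoid approximation of the integral of exp(i*phi*t/rho) * g(t) over [0, tau]."""
    step = tau / n
    total = 0.5 * (g(0.0) + cmath.exp(1j * phi * tau / rho) * g(tau))
    for k in range(1, n):
        t = k * step
        total += cmath.exp(1j * phi * t / rho) * g(t)
    return total * step


def g(t):
    # Hypothetical slowly varying amplitude; on [0, 1] the integral of |g'| equals 1.
    return 1.0 + t * t


phi, tau = 1.0, 1.0
results = {}
for rho in (0.1, 0.01, 0.001):
    value = abs(oscillatory_integral(g, phi, rho, tau))
    # Integration-by-parts bound: two boundary terms plus the integral of |g'|.
    bound = (rho / phi) * (abs(g(0.0)) + abs(g(tau)) + 1.0)
    results[rho] = (value, bound)
    print(rho, value, bound)
```

The printed magnitudes shrink proportionally to $\varrho$, which is the scalar counterpart of the factors $\varrho/\omega_*$ in (179).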


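Denoting $\phi_{n,\zeta,\vec\xi} = \phi$, $\Psi_{i,\vartheta}\Pi_{n',\zeta}\chi^{(m)}_{n,\zeta,\vec\xi} = \chi^{(m)}_\eta$ and integrating (133) by parts with respect to $\tau_1$ we obtain
\[
\begin{aligned}
\Psi(k,\vartheta k_{*i})\,\Pi_{n',\zeta}\,\mathcal F^{(m)}_{n,\zeta,\vec\xi}(\vec w_{\vec\lambda})(k,\tau)
&= \int_B \Psi(k,\vartheta k_{*i})\,\frac{\varrho\,e^{i\phi(k,\vec k)\tau/\varrho}}{i\phi(k,\vec k)}\,\chi^{(m)}_\eta(k,\vec k)\,\hat w_{\lambda_1}(k',\tau) \ldots \hat w_{\lambda_m}(k^{(m)}(k,\vec k),\tau)\,\tilde d^{(m-1)d}\vec k \\
&\quad - \int_B \Psi(k,\vartheta k_{*i})\,\frac{\varrho}{i\phi(k,\vec k)}\,\chi^{(m)}_\eta(k,\vec k)\,\hat w_{\lambda_1}(k',0) \ldots \hat w_{\lambda_m}(k^{(m)}(k,\vec k),0)\,\tilde d^{(m-1)d}\vec k \\
&\quad - \int_0^\tau\int_B \Psi(k,\vartheta k_{*i})\,\frac{\varrho\,e^{i\phi(k,\vec k)\tau_1/\varrho}}{i\phi(k,\vec k)}\,\chi^{(m)}_\eta(k,\vec k)\,\partial_{\tau_1}\bigl[\hat w_{\lambda_1}(k',\tau_1) \ldots \hat w_{\lambda_m}(k^{(m)}(k,\vec k),\tau_1)\bigr]\,\tilde d^{(m-1)d}\vec k\,d\tau_1,
\end{aligned} \tag{180}
\]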

where $B$ is the set of $k^{(i)}$ for which (171) holds. The relations (88) and (25) imply $|\chi^{(m)}_\eta(k,\vec k)| \le \|\chi^{(m)}\|$. Using then (178), the Leibniz formula and (122) we obtain (179).

The main result of this subsection is the next theorem which, when combined with Lemma 34, implies the wavepacket preservation, namely that the solution $\hat u_{n,\vartheta}(k,\tau)$ of (136) is a multi-wavepacket for all $\tau \in [0,\tau_*]$.

Theorem 37. Assume that the conditions of Theorem 35 are fulfilled. Let $\hat u_{n,\vartheta}(k,\tau)$ for $n = n_l$ and $\hat w_{l,\vartheta}(k,\tau)$ be the solutions to the respective systems (136), (141), and let $\hat w$ be defined by (146). Then there exists $\beta_0 > 0$ such that
\[ \|\hat u_{n_l,\vartheta} - \Pi_{n_l,\vartheta}\hat w\|_E \le C\varrho + C'\beta^s \quad \text{for } 0 < \beta \le \beta_0. \tag{181} \]
Proof. Note that $\hat u_{n,\vartheta} = \Pi_{n,\vartheta}\hat u$, where $\hat u$ is a solution of (118) and, according to Theorem 28, $\|\hat u\|_E \le 2R$. Comparing Eqs. (118) and (150), which are $\hat u = \mathcal F(\hat u) + \hat h$ and $\hat w = \mathcal F(\hat w) + \hat h + D(\hat w)$, we find that Lemma 27 can be applied. Then we notice that by Lemma 26 $\mathcal F$ has the Lipschitz constant $C_F\tau_*$ for such $\hat u$. Taking $C_F\tau_* < 1$ as in Theorem 28 we obtain (181) from (128).

Notice that Theorem 5 is a direct corollary of Theorem 37 and Lemma 34. The following corollary shows that inequality (181) and, therefore, Theorems 5 and 3 on preservation of wavepackets hold in the case when the coefficients of the operator $\hat F(\hat U)$ in (3), (86) depend regularly on small $\varrho$, $\hat F(\hat U) = \hat F(\hat U,\varrho)$.

Corollary 38 (Parameter dependent nonlinearity). Assume that the conditions of Theorem 35 are fulfilled. Consider a perturbed Eq. (118), $\hat u(k,\tau) = \mathcal F(\hat u)(k,\tau) + \mathcal F_1(\hat u,\varrho)(k,\tau) + \hat h(k)$, where the operator $\mathcal F_1(\hat u,\varrho)$ satisfies the inequality $\|\mathcal F_1(\hat u,\varrho)\|_E \le C\varrho^q$ for $\|\hat u\|_E \le 2R$ with some $q$, $0 < q \le 1$. Let $\hat w_{l,\vartheta}(k,\tau)$ be the solution of (141). Then $\|\Pi_{n,\vartheta}\hat u - \hat w_{l,\vartheta}\|_E \le C\varrho^q + C'\beta^s$.

Proof. The statement follows from (181) and Lemma 27.

The following theorem shows that any multi-wavepacket solution to (118) yields a solution to the wavepacket interaction system (141). Theorem 39. Let uˆ (k, τ ) be a solution of (118) and assume that uˆ (k, τ ) and hˆ (k) are multiwavepackets with nk-spectrum S = {(nl , k∗l ), l = 1, . . . , N } and the regularity (k, τ ) = ˆ l,ϑ degree s. Let also il ,ϑ = il ,ϑ be defined by (137). Then functions w il ,ϑ nl ,ϑ uˆ (k, τ ) are a solution to the system (141) with hˆ (k) replaced by hˆ (k, τ ) satisfying ˆ (182) h (k) − hˆ (k, τ ) ≤ Cβ s , 0 ≤ τ ≤ τ ∗ . L1

Proof. Multiplying (118) by il ,ϑ nl ,ϑ we get ˆ l,ϑ ˆ l,ϑ w = ·, ϑk∗il nl ,ϑ F uˆ (k, τ ) + ·, ϑk∗il nl ,ϑ hˆ (k), w ˆ (183) = ·, ϑk∗il nl ,ϑ u. Since uˆ (k, τ ) is a multiwavepacket with regularity s we have

uˆ (·, τ ) − w ˆ (·, τ ) L 1 ≤ C β s where w ˆ (·, τ ) = ·, ϑk∗il uˆ (·, τ ).

(184)

l,ϑ

Let us recast (183) in the form ˆ l,ϑ ˆ (k, τ ) + ·, ϑk∗il nl ,ϑ hˆ (k) + hˆ (k, τ ) , w = ·, ϑk∗il nl ,ϑ F w ˆ (k, τ ). hˆ (k, τ ) = F uˆ − F w

(185)

Denoting hˆ (k) + hˆ (k, τ ) = hˆ (k, τ ) we observe that (185) has the form of (141) with hˆ (k) replaced by hˆ (k, τ ). Inequality (182) follows then from (184) and (124). 7. Reduction of Wavepacket Interaction System to a Minimal Interaction System Our goal in this section is to substitute the wavepacket interaction system (141) with a simpler (minimal) interaction system which describes the evolution of wavepackets with the same accuracy. We fix the nk-spectrum S = {(nl , k∗l ), l = 1, . . . , N } of the initial multiwavepacket and assume everywhere below that it is resonance invariant. ˆ and on S. We The minimal interaction system is built based on operators L and Fˆ U want the minimal interaction system to satisfy the following requirements. Firstly, the approximation of solutions of (141) by solutions of the minimal interaction system of the order (µ, ν) has to be of the order in suitable region of parameters (, β) (which is larger for larger µ, ν). Secondly, the minimal interaction system of the order (µ, ν)


should be defined by S and by the values of L (k) and its derivatives of the order up to (m) k, k and its derivatives of order up to ν at k∗l ∈ S K . µ and by the values χ The construction of the minimal interaction system consists of the following consecutive steps: (i) introduction of a time averaged wavepacket interaction system obtained by discarding non-resonant terms in the nonlinearity; (ii) reduction of the system for vector components vˆ l,ϑ to an equivalent one for scalar amplitudes vˆl,ϑ ; (iii) change of variables k = ϑk∗l + βη in the equation for vˆl,ϑ resulting in a regular dependence of coefficients on small βη; (iv) substitution of the general dependence on βη in the linear part with a certain polynomial one of the order µ, and the general dependence on βη of coefficients of the nonlinearity with a certain polynomial of the order ν; trigonometric (v) substitution of the cutoff functions ·, ϑk∗il from (141), which were preserved up to this step, with 1. As a result we obtain a minimal interaction system with weakly universal nonlinearity, which in the simplest case, where S is just a single element (k∗ , n), is equivalent to the classical NLS equation, and in the case when S consists of only two elements (k∗ , n), (−k∗ , n), is equivalent to the classical coupled modes system.

7.1. Time averaged wavepacket interaction system. Here we modify the wavepacket interaction system (141), substituting its nonlinearity with a certain universal or conditionally universal one obtained by the time averaging, and prove that this substitution produces a small error of order $\varrho$. As the first step we recast (141) in a slightly different form by using expansions (148), (162) together with (166) and (167) and writing the nonlinearity in Eq. (141) in the form

(m) ·, ϑk∗il nl ,ϑ F (·, τ ) = ·, ϑk∗il F

nl ,ϑ,ξ λ

m∈M F λ∈ m

F

(m)

w λ (k, τ ) = F (m)

nl ,ϑ,ξ λ

n,ζ , n ,ζ

ζ , λ , λ = l, w

ˆ λm (k, τ ), n = n l , (n, ζ ) = (nl , ϑ), ˆ λ1 . . . w w

(m) with F as in (133) and n l as in (164). Consequently, the wavepacket interaction n,ζ , n ,ζ system (141) can be written in an equivalent form ˆ l,ϑ = w

(m) ·, ϑk∗il F

nl ,ϑ,ξ λ

m∈M F λ∈ m

ˆ l=1, . . . N , ϑ=±. λ + ·, ϑk∗il nl ,ϑ h, w

(186) The construction of the above mentioned time averaged equation reduces to discarding certain terms in the original system (186). First we introduce the following sets of indices related to the resonance equation (100) and m defined by (99):

m m nl ,ϑ = λ = l, ζ ∈ : m ϑ, n l , λ = 0 ,

(187)

and then the time-averaged nonlinearity by = Fav,nl ,ϑ (w)

m∈MF

(m)

(m)

Fnl ,ϑ , Fnl ,ϑ =

λ ∈m n ,ϑ l

F

(m) nl ,ϑ,ξ λ

λ . w

(188)
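The selection of resonant index strings behind (187), (188) can be illustrated on a toy case. The sketch below assumes a single band, a single carrier wavenumber `k_star`, cubic interactions ($m = 3$) and the dispersion relation $\omega(k) = \sqrt{1 + k^2}$; all of these, as well as the function names, are hypothetical choices for illustration. It evaluates the interaction phase (134) at the wavepacket centers $k^{(j)} = \vartheta_j k_*$, with $\zeta^{(j)} = \vartheta_j$ and $k = \sum_j k^{(j)}$, and keeps the sign strings for which the phase vanishes.

```python
import itertools
import math


def omega(k):
    """Hypothetical one-band dispersion relation (Klein-Gordon type), for illustration only."""
    return math.sqrt(1.0 + k * k)


def interaction_phase(zeta, signs, k_star):
    """Phase (134) evaluated at the centers k^(j) = sign_j * k_star, with k the sum of the k^(j)."""
    k_in = [s * k_star for s in signs]
    k_out = sum(k_in)
    return zeta * omega(zeta * k_out) - sum(s * omega(s * kj) for s, kj in zip(signs, k_in))


def resonant_sign_strings(zeta, m, k_star, tol=1e-12):
    """Sign strings of length m whose interaction phase vanishes (up to tol)."""
    return [signs for signs in itertools.product((1, -1), repeat=m)
            if abs(interaction_phase(zeta, signs, k_star)) < tol]


resonant = resonant_sign_strings(zeta=1, m=3, k_star=0.7)
print(resonant)
```

In this toy setting only the strings with two plus signs and one minus sign survive for $\zeta = +$, which is the sign pattern of the cubic term $|U|^2U$ of an NLS-type equation; the remaining strings have a phase bounded away from zero and are the ones discarded by the time averaging.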

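Note that the nonlinearity $\mathcal F^{(m)}_{av,n_l,\vartheta}(\vec w)$ can be obtained from $\mathcal F^{(m)}_{n_l,\vartheta}$ by the averaging formula (70), where $A_T$ is defined by formula (69) with frequencies $\phi_j = \omega_{n_j}(k_{*i_j})$. Consequently, the desired equation with time-averaged nonlinearity is
\[ \hat v_{l,\vartheta} = \Psi(\cdot,\vartheta k_{*i_l})\,\mathcal F_{av,n_l,\vartheta}(\vec v) + \Psi(\cdot,\vartheta k_{*i_l})\,\Pi_{n_l,\vartheta}\hat h, \quad l = 1,\ldots,N, \ \vartheta = \pm, \tag{189} \]
which similarly to (142) we recast concisely as
\[ \vec v = \mathcal F_{av,\Psi}(\vec v) + \vec h_\Psi. \tag{190} \]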

The following lemma is analogous to Lemmas 32, 26.

Lemma 40. The operator $\mathcal F_{av,\Psi}(\vec v)$ is bounded for bounded $\vec v \in E^{2N}$, $\mathcal F_{av,\Psi}(0) = 0$. The polynomial operator $\mathcal F_{av,\Psi}(\vec v)$ satisfies the Lipschitz condition
\[ \bigl\|\mathcal F_{av,\Psi}(\vec v_1) - \mathcal F_{av,\Psi}(\vec v_2)\bigr\|_{E^{2N}} \le C\tau_*\,\|\vec v_1 - \vec v_2\|_{E^{2N}}, \tag{191} \]
where $C$ depends only on $C_\chi$ as in (88), on the power of $\mathcal F$ and on $\|\vec v_1\|_{E^{2N}} + \|\vec v_2\|_{E^{2N}}$, and, in particular, it does not depend on $\beta$.

From Lemma 40 and the contraction principle we obtain the following theorem similarly to Theorem 33.

Theorem 41. Let $\|\vec h_\Psi\|_{E^{2N}} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that Eq. (190) has a solution $\vec v \in E^{2N}$ satisfying $\|\vec v\|_{E^{2N}} \le R_1$, and such a solution is unique.

Theorem 42. Let $\hat v_{l,\vartheta}(k,\tau)$ be the solution of (189) and $\hat w_{l,\vartheta}(k,\tau)$ be the solution of (141). Then $\hat v_{l,\vartheta}(k,\tau)$ is a wavepacket satisfying (144), (145) with $\hat w$ replaced by $\hat v$. In addition to that, there exists $\beta_0 > 0$ such that
\[ \|\hat v_{l,\vartheta} - \hat w_{l,\vartheta}\|_E \le C\varrho, \quad l = 1,\ldots,N; \ \vartheta = \pm, \quad \text{for } 0 < \varrho \le 1, \ 0 < \beta \le \beta_0. \tag{192} \]
Proof. Formulas (144), (145) for $\hat v_{l,\vartheta}(k,\tau)$ follow from (189). We note that $\vec w$ is an approximate solution of (189), namely we have an estimate for $D_{av}(\vec w) = \vec w - \mathcal F_{av,\Psi}(\vec w) - \vec h_\Psi$ which is similar to (150), (151):
\[ \|D_{av}(\vec w)\|_{E^{2N}} = \bigl\|\vec w - \mathcal F_{av,\Psi}(\vec w) - \vec h_\Psi\bigr\|_{E^{2N}} \le C\varrho, \quad \text{if } 0 < \varrho \le 1, \ \beta \le \beta_0. \tag{193} \]
The proof of (193) is similar to the proof of (155) with minor simplifications thanks to the absence of terms with $\Psi_\infty$. Using (193) we apply Lemma 27 and obtain (192).


7.2. Averaged system for scalar amplitudes. Now we recast (189) in the form of an equivalent system of scalar equations for amplitudes vˆl,ϑ = vˆλ of solutions vˆ λl defined based on (11), namely vˆ λl (k) = k, ζ (l) k∗il nl ,ζ (l) (k) vˆ λl (k) = vˆl,ζ (l) (k) gnl ,ζ (l) (k). (194) Note that according to (145) support of vˆl,ζ (l) is localized near ζ k∗il , and we can assume that gnl ,ζ (l) (k) depend smoothly on k near this point. Multiplying (189) by gnl ,ζ l (k) (with the standard scalar product in C2 j ) and using (194) we obtain the following system of scalar amplitude equations: vˆl,ϑ = ·, ϑk∗il f av,nl ,ϑ ( v ) + ·, ϑk∗il hˆ nl ,ϑ , l = 1, . . . , N , ϑ = ±, where (195)

(m) ˆ v hˆ nl ,ϑ = gnl ,ϑ · nl ,ϑ h, f av,nl ,ϑ ( λ . (196) f v) = m∈MF λ ∈m n ,ϑ

nl ,ϑ,ξ λ

l

According to (169) the m-linear operators in the above equation are given by f

(m) n,ϑ,ξ

vλ (k, τ ) =

(m) Q n,ϑ,ξ

τ

0

Dm

e

τ iφ n,ϑ,ξ k,k 1

Q

(m) n,ϑ,ξ

m 1 , (197) k, k vˆλi d˜ (m−1)d kdτ i=1

(m) gλ1 k , . . . , gλm k(m) k, k k, k = gn,ϑ (k) · χ . (198) k, k n,ϑ,ξ

The concise form for the system (195) of scalar equations for amplitudes is 2N v = f ( , v ) + hˆ , v ∈ E sc

(199)

where the components vˆl,ϑ of v belong to the space E sc of scalar functions with the (m) norm defined by (17), (18) applied to scalar functions. Note that Q k, k can be n,ϑ,ξ

where (171) is not extended in an arbitrary way as bounded functions for arguments k, k, satisfied, for example the extension can be zero, the extension does not affect solutions of (195) because this equation involves factors ·, ϑk∗il and (145) holds. 2N and f (0) = 0. The Lemma 43. Operator f is bounded for bounded v ∈ E sc polynomial operator f ( v ) satisfies the Lipschitz condition

f ( v1 − v2 E sc2N , v1 ) − f ( v2 ) E sc2N ≤ Cτ ∗ where C depends only on Cχ as in (88), on the order of F as a polynomial and on v1 E 2N + v2 E 2N , and it does not depend on β. From Lemma 40 and the contraction principle we obtain the following theorem similarly to Theorem 33. Theorem 44. Let hˆ 2N ≤ R. Then there exists R1 > 0 and τ ∗ > 0 such that (199) E sc

2N satisfying v E sc2N ≤ R1 , and such a solution is unique. has a solution v ∈ E sc


7.3. Rescaled amplitude equations. According to (145) the amplitudes $\hat v_{l,\vartheta}(\zeta k_{*l} + \eta)$ are localized about the point $\eta = 0$, and to study their behavior in a vicinity of $\eta = 0$ we introduce a group of dilation operators
\[ (B_\beta\hat v)(\eta) = \beta^d\,\hat v(\beta\eta), \quad \beta > 0, \tag{200} \]
which preserve the $L^1$-norm and commute with the convolution, i.e.
\[ \|B_\beta\hat v\|_{L^1} = \|\hat v\|_{L^1}, \quad (B_\beta\hat v) * (B_\beta\hat w) = B_\beta(\hat v * \hat w). \tag{201} \]
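Both properties in (201) follow from the change of variables $k = \beta\eta$, $dk = \beta^d\,d\eta$. A short verification, assuming the convolution is the plain integral $(\hat v * \hat w)(k) = \int\hat v(k')\hat w(k - k')\,dk'$ up to a constant normalization factor:
\[
\begin{aligned}
\|B_\beta\hat v\|_{L^1} &= \int_{R^d}\beta^d\,|\hat v(\beta\eta)|\,d\eta = \int_{R^d}|\hat v(k)|\,dk = \|\hat v\|_{L^1}, \\
\bigl((B_\beta\hat v) * (B_\beta\hat w)\bigr)(\eta) &= \int_{R^d}\beta^d\,\hat v(\beta\eta')\,\beta^d\,\hat w(\beta(\eta - \eta'))\,d\eta' = \beta^d\int_{R^d}\hat v(k')\,\hat w(\beta\eta - k')\,dk' = \bigl(B_\beta(\hat v * \hat w)\bigr)(\eta).
\end{aligned}
\]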

We introduce then a rescaled and shifted version of initial data hˆ nl ,ϑ in (196) by the formula Hˆ nl ,ϑ (k) = Bβ hˆ nl ,ϑ (k + ϑk∗l ), hˆ nl ,ϑ (k) = β −d Hˆ nl ,ϑ β −1 (k − ϑk∗l ) , (202) where Bβ is defined by (200), |k − ϑk∗l | ≤ β 1− , and new variables ηl = β −1 (k − ϑk∗l ), l = 1, . . . , N , η = η1 , . . . , η N .

(203)

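In this and the following sections we assume that $\hat H_{n_l,\vartheta}(\beta, \eta)$ are defined for all $\eta \in \mathbb{R}^d$, including $|\eta| \ge \beta^{-\epsilon}$. Though (195) involves $\hat h_{n_l,\vartheta}$ with a cutoff factor, namely $\Psi(k, \vartheta k_{*i_l})\, \hat h_{n_l,\vartheta}(k) = \Psi(k, \vartheta k_{*i_l}, \beta^{1-\epsilon})\, \hat h_{n_l,\vartheta}(k)$ as in (26), we will later use $\hat H_{n_l,\vartheta}(\beta, \eta)$ defined for all $\eta$, and assume that

\[ \big\| \big(1 - \Psi(\beta^{\epsilon}\eta)\big) \hat H_{n_l,\vartheta}(\beta, \eta) \big\|_{L^1} \le C\beta^{s}, \tag{204} \]

where (i) $\Psi(\beta^{\epsilon}\eta) = \Psi(\eta, 0, \beta^{-\epsilon})$ is as in (25), (26); (ii) $\epsilon$ and $s$ are the same as in Definition 1; (iii) condition (204) is consistent with (29) and (30). For a solution $\hat v_{l,\vartheta}(k, \tau)$ of (195), using (145) we introduce the following functions:

\[ \hat z_{l,\vartheta}(\eta, \tau) = \beta^d\, \hat v_{l,\vartheta}(\vartheta k_{*l} + \beta\eta, \tau), \qquad \hat z_{l,\vartheta}(\eta, \tau) = \Psi(\beta^{\epsilon}\eta)\, \hat z_{l,\vartheta}(\eta, \tau), \quad \eta \in \mathbb{R}^d, \tag{205} \]

which satisfy a rescaled version of (195) provided below. Note that since $(\vec n, \vec\zeta) = \vec\lambda \in \Lambda^m_{n_l,\vartheta}$ and the nk-spectrum $S$ is resonance invariant we have $\kappa_m(\vec\lambda) = \sum_{i=1}^m \zeta^{(i)} k_{*l_i} = \zeta k_{*l} = \vartheta k_{*l}$. Since $k, \vec k$ satisfy the convolution identity (87), the variables $\eta, \vec\eta$ defined by (203) satisfy a similar identity as well, namely

\[ \eta = \sum_{i=1}^{m} \eta^{(i)}, \qquad \eta^{(m)}(k, \vec\eta) = \eta - \sum_{i=1}^{m-1} \eta^{(i)}. \tag{206} \]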

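Change of variables (203) in the integral operator $f_{av,n_l,\vartheta}$ defined by (197) yields the following amplitude system for $z_{l,\vartheta}$, which is equivalent to (195):

\[ \hat z_{l,\vartheta}(\eta) = \Psi(\beta^{\epsilon}\eta)\, f_{av,n_l,\vartheta,\beta}(\vec z)(\eta) + \Psi(\beta^{\epsilon}\eta)\, \hat H_{n_l,\vartheta}(\eta), \quad l = 1, \ldots, N, \ \vartheta = \pm. \tag{207} \]

According to (137), (196) and (197), $\Psi(k, \vartheta k_{*i_l}, \beta^{1-\epsilon}) = \Psi(\beta^{\epsilon}\eta)$,

\[ f_{av,n_l,\vartheta,\beta}(\vec z) = \sum_{m \in M_F} f^{(m)}_{av,n_l,\vartheta,\beta}(\vec z), \qquad f^{(m)}_{av,n_l,\vartheta,\beta}(\vec z) = \sum_{\vec\lambda \in \Lambda^m_{n_l,\vartheta}} f^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda),\beta}(\vec z_{\vec\lambda}), \tag{208} \]

\[ f^{(m)}_{n,\vartheta,\vec\xi(\vec\lambda),\beta}(\vec z_{\vec\lambda})(\eta, \tau) = \int_0^{\tau} \int_{\eta' + \cdots + \eta^{(m)} = \eta} \exp\Big\{ i\phi_{n,\vartheta,\vec\xi(\vec\lambda)}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) \frac{\tau_1}{\varrho} \Big\}\, Q^{(m)}_{n,\vartheta,\vec\xi(\vec\lambda)}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) \prod_{i=1}^{m} \hat z_{\lambda_i}\big(\eta^{(i)}\big)\, \tilde d^{(m-1)d}\vec\eta\, d\tau_1. \tag{209} \]

Note that the condition (171) on the domain of integration takes in the new variables the form

\[ \big|\eta^{(i)}\big| \le \beta^{-\epsilon}, \quad i = 1, \ldots, m, \quad \text{and} \quad |\eta| \le m\beta^{-\epsilon}. \tag{210} \]

Finally, we rewrite the amplitude system (207) in the concise form

\[ \vec z = \Psi_\beta \cdot f_{av,\beta}(\vec z) + \Psi_\beta \cdot \hat H_\beta, \quad \vec z \in E^{2N}_{sc}. \tag{211} \]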

Let us show now that (211) is of the form of (118) with the $2J$-component vector $\hat u$ substituted with the $2N$-component vector $\vec z$, and the matrix $L(k)$ substituted with a diagonal matrix $L$ with entries $\vartheta\omega_{n_l}(\vartheta k_{*l} + \beta\eta)$. For that we introduce the $S$-averaged tensor $Q^{(m)}_{av}$ defined on $\vec z \in (\mathbb{C}^{2N})^m$ by the formula

\[ Q^{(m)}_{av,n,\vartheta}(\beta\eta, \beta\vec\eta, \vec z) = \sum_{\vec\lambda \in \Lambda^m_{n,\vartheta}} Q^{(m)}_{n,\vartheta,\vec\xi(\vec\lambda)}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) \prod_{i=1}^{m} \hat z_{\lambda_i}, \tag{212} \]

which depends on $S$ through $\Lambda^m_{n,\vartheta}$ and acts from $(\mathbb{C}^{2N})^m$ into $\mathbb{C}^{2N}$. Note that $\hat z_{\lambda_i}$ and $Q^{(m)}_{n,\vartheta,\vec\xi}$ are scalar factors, and $\hat z_{\lambda_i}$ is a scalar projection in $\mathbb{C}^{2N}$ onto a line along the $\lambda_i$th eigenvector of $L$. Hence, the right-hand side of (212) is a sum of elementary susceptibilities obtained from $Q^{(m)}_{av}$ as in (132), and (207) has the form of (136). Note that non-zero terms in (212) contain products $\prod \hat z_{\lambda_i}$ which satisfy (100). Therefore, if $\beta = 0$ and $S$ is resonance invariant, $Q^{(m)}_{av}$ has the form of a weakly universal nonlinearity; if $S$ is universally resonance invariant then $Q^{(m)}_{av}$ has the form of a universal nonlinearity as in (65).

7.4. Amplitude system with polynomial dispersion relations. Now we introduce an amplitude system with polynomial dispersion which is similar to (207) and provides (i) a sufficiently accurate approximation to (207); (ii) standard polynomial dependence of coefficients on $\eta, \vec\eta$ in the sense clarified below. The amplitude system has the form

\[ \hat u_{l,\vartheta} = \Psi(\beta^{\epsilon}\eta)\, f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u) + \Psi(\beta^{\epsilon}\eta)\, \hat H_{n_l,\vartheta}, \quad l = 1, \ldots, N, \ \vartheta = \pm, \tag{213} \]

\[ f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u) = \sum_{m \in M_F} \sum_{\vec\lambda \in \Lambda^m_{n_l,\vartheta}} f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}(\vec u_{\vec\lambda}), \tag{214} \]

where $\Psi(\beta^{\epsilon}\eta)$ are the cutoff factors defined in (208), (137), and the approximations $f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}$ for $f^{(m)}_{n_l,\vartheta,\vec\xi(\vec\lambda)}$ are defined below. The indices $\mu = 1, 2$, $\nu = 0, 1$ determine the order of approximation: (i) $\mu$ determines the order of approximation of the dispersion relation by a polynomial of degree $\mu$; (ii) $\nu$ determines the order of approximation of the susceptibility coefficients (198) by a trigonometric polynomial of degree $\nu$. As before, we recast (213) in a concise form,

\[ \vec u = \Psi_\beta f^{(\mu,\nu)}(\vec u) + \Psi_\beta \hat H, \tag{215} \]

where $\Psi_\beta(\eta) = \Psi(\beta^{\epsilon}\eta)$. Finally, we eliminate in (213) the cutoff factor $\Psi(\beta^{\epsilon}\eta)$ by setting $\Psi(\beta^{\epsilon}\eta) = \Psi(0) = 1$, and introduce the amplitude system with weakly universal nonlinearity and polynomial dispersion without cutoff

\[ \hat u_{l,\vartheta}(\eta) = f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u)(\eta) + \hat H_{n_l,\vartheta}(\eta), \quad l = 1, \ldots, N, \ \vartheta = \pm, \tag{216} \]

which can be written in the form of (215) with $\Psi_\beta = 1$. Let us turn now to the construction of the approximations. For every nk-pair $(k_{*l}, n_l)$ we introduce the Taylor polynomials of order $\mu$ of the dispersion relation $\omega_{n_l}(k_{*l} + \beta\eta)$:

\[ \gamma_1(k_{*l}, n_l, \beta\eta) = \omega_{n_l}(k_{*l}) + \beta\,\omega'_{n_l}(k_{*l})\,\eta, \qquad \gamma_2(k_{*l}, n_l, \beta\eta) = \gamma_1(k_{*l}, n_l, \beta\eta) + \frac{\beta^2}{2} \big\langle \eta, \omega''_{n_l}(k_{*l})\,\eta \big\rangle, \]

and similarly $\gamma_3$ for $\mu = 3$. Obviously we have the inequality (see (171))

\[ \big| \omega_{n_l}(k_{*l} + \beta\eta) - \gamma_\mu(k_{*l}, n_l, \beta\eta) \big| \le C\beta^{(\mu+1)(1-\epsilon)}, \quad k, \vec k \in B_\beta. \tag{217} \]
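The scaling in (217) can be illustrated on a scalar example: the error of the order-$\mu$ Taylor polynomial behaves like the $(\mu+1)$-st power of the displacement $\beta\eta$. The sketch below assumes a hypothetical one-dimensional dispersion relation $\omega(k) = \sqrt{1 + k^2}$, which is not taken from the paper, and compares the errors at two displacements.

```python
import math

# Hypothetical scalar dispersion relation (an assumption for illustration only).
def omega(k):
    return math.sqrt(1.0 + k * k)

def d_omega(k):
    return k / math.sqrt(1.0 + k * k)

def d2_omega(k):
    return (1.0 + k * k) ** -1.5

def gamma(mu, k_star, x):
    """Taylor polynomial of order mu (1 or 2) of omega at k_star; x plays the role of beta * eta."""
    value = omega(k_star) + d_omega(k_star) * x
    if mu >= 2:
        value += 0.5 * d2_omega(k_star) * x * x
    return value

def taylor_error(mu, k_star, x):
    return abs(omega(k_star + x) - gamma(mu, k_star, x))

k_star = 1.0
# Halving the displacement should reduce the error by about 2**(mu + 1).
ratio_mu1 = taylor_error(1, k_star, 0.1) / taylor_error(1, k_star, 0.05)
ratio_mu2 = taylor_error(2, k_star, 0.1) / taylor_error(2, k_star, 0.05)
print(ratio_mu1, ratio_mu2)
```

The printed ratios should be close to 4 and 8, respectively, which is the one-dimensional counterpart of the exponent $\mu + 1$ in (217).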

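The phase function $\phi_{n,\zeta,\vec\xi}(k, \vec k)$, $\vec\xi = (\vec n, \vec\zeta)$, defined by (134), is approximated then by a polynomial phase function

\[ \phi^{(\mu)}_{n_l,\zeta,\vec\xi}\big(\zeta k_{*l}, \vec k_*, \beta\eta, \beta\vec\eta\big) = \zeta\gamma_\mu(k_{*l}, n_l, \beta\eta) - \zeta'\gamma_\mu\big(k_{*l_1}, n', \beta\eta'\big) - \ldots - \zeta^{(m)}\gamma_\mu\big(k_{*l_m}, n^{(m)}, \beta\eta^{(m)}\big). \tag{218} \]

Note that since $\vec\xi = \vec\xi(\vec\lambda)$ with $\vec\lambda \in \Lambda^m_{n_l,\vartheta}$ defined by (187), Eq. (100) is fulfilled. Hence, $\phi^{(\mu)}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, 0, 0\big) = 0$, the function $\phi^{1}_{n_l,\vartheta,\vec\xi}$ depends linearly on $\eta, \vec\eta$, and $\phi^{2}_{n_l,\vartheta,\vec\xi}$ is quadratic, namely

\[ \phi^{1}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \beta\eta, \beta\vec\eta\big) = \beta\,\phi^{1}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \eta, \vec\eta\big), \tag{219} \]

\[ \phi^{2}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \beta\eta, \beta\vec\eta\big) = \beta\,\phi^{1}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \eta, \vec\eta\big) + \beta^2 \phi^{2,2}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \eta, \vec\eta\big). \tag{220} \]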


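In the case $\mu = 2$ the polynomial phase function involves two parameters $\theta_1$, $\theta_2$:

\[ i\phi^{2}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \beta\eta, \beta\vec\eta\big) \frac{\tau_1}{\varrho} = i\phi^{1}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \eta, \vec\eta\big) \frac{\tau_1}{\theta_1} + i\phi^{2,2}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \eta, \vec\eta\big) \frac{\tau_1}{\theta_2}, \tag{221} \]

\[ \theta_1 = \frac{\varrho}{\beta}, \quad \theta_2 = \frac{\varrho}{\beta^2}; \qquad 0 < \theta_1 < \infty, \quad 0 < \theta_2 \le \infty, \tag{222} \]

where $\theta_1$ and $\theta_2$ may be large or small depending on the relation between $\varrho$ and $\beta$. Sometimes it is convenient to consider $\theta_1$ and $\theta_2$ as independent parameters. If $\mu = 1$ we formally set $\theta_2 = \infty$, $\tau_1/\theta_2 = 0$. If (171) holds we have the estimate

\[ \Big| \exp\Big\{ i\phi_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) \frac{\tau_1}{\varrho} \Big\} - \exp\Big\{ i\phi^{(\mu)}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \beta\eta, \beta\vec\eta\big) \frac{\tau_1}{\varrho} \Big\} \Big| \le C\tau_* \frac{\beta^{(\mu+1)(1-\epsilon)}}{\varrho}, \quad \mu = 1, 2. \tag{223} \]

To ensure that the approximation error is small for given $\mu$ we assume that $\varrho$ and $\beta$ satisfy

\[ \frac{\beta^{(\mu+1)(1-\epsilon)}}{\varrho} \to 0, \quad \beta \to 0, \quad \varrho \to 0. \tag{224} \]

Now we approximate the dependence of $Q^{(m)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big)$ on $\eta, \vec\eta$ given by (198) by trigonometric polynomials. The zero order approximation with $\nu = 0$ is given by

\[ Q^{(m,0)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) = Q^{(m)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*\big). \tag{225} \]

To define the first order approximation we modify the standard Taylor expansion, using trigonometric polynomials instead of algebraic ones. Taking the first derivative with respect to $\beta$ at $\beta = 0$,

\[ Q^{(m)\prime}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l}, \eta, \vec k_*, \vec\eta\big) = \frac{d}{d\beta} Q^{(m)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) \Big|_{\beta=0}, \]

which obviously is a linear function with respect to $\eta, \vec\eta$, we then express $\eta$ in terms of $\vec\eta$ using (206):

\[ Q^{(m)\prime}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l}, \eta, \vec k_*, \vec\eta\big) = \sum_{j=1}^{m} q^{(m),j}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*\big) \cdot \eta^{(j)}, \qquad \eta^{(j)} = \big(\eta^{(j)}_1, \ldots, \eta^{(j)}_d\big). \]

Then the first order approximation is

\[ Q^{(m,1)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) = Q^{(m)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*\big) + \sum_{j=1}^{m} q^{(m),j}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*\big) \cdot \sin\big(\beta\eta^{(j)}\big), \]

where $\sin\big(\eta^{(j)}\big) = \big(\sin\eta^{(j)}_1, \ldots, \sin\eta^{(j)}_d\big)$. An advantage of this approximation is that the multiplication by $\sin\eta^{(j)}_1$ is a bounded operator which equals the Fourier transform of a finite-difference operator, whereas the multiplication by $\eta^{(j)}_1$ corresponds to the partial derivative and is unbounded. Since the original nonlinearity does not involve unbounded operators, the use of bounded operators is natural and convenient. In fact, it is well known that the presence of derivatives in the nonlinearity of NLS-type equations causes technical difficulties, see [14]. In our approach the approximating equation provides the same accuracy and its nonlinearity involves only bounded finite-difference operators, bypassing those difficulties altogether. According to Condition 16 the susceptibility is smooth, and if (210) holds we have the following inequality:

\[ \Big| Q^{(m)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l} + \beta\eta, \vec k_* + \beta\vec\eta\big) - Q^{(m,\nu)}_{n,\zeta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \beta\eta, \beta\vec\eta\big) \Big| \le C\beta^{(\nu+1)(1-\epsilon)}. \tag{226} \]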
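The remark that multiplication by $\sin\eta$ is the Fourier-side image of a finite-difference operator can be checked on plane waves. The sketch below works on the integer lattice in one dimension and uses the central difference $u(r+1) - u(r-1)$; the normalization by $2i$ and the sign convention of the Fourier transform are assumptions made for this illustration and may differ from the paper's conventions.

```python
import cmath
import math

def plane_wave(eta):
    """u(r) = exp(i * eta * r), a single Fourier mode."""
    return lambda r: cmath.exp(1j * eta * r)

def central_difference(u):
    """(delta u)(r) = u(r + 1) - u(r - 1) on the integer lattice."""
    return lambda r: u(r + 1) - u(r - 1)

def symbol_gap(eta, r=3):
    """Distance between delta u and 2i * sin(eta) * u at the site r."""
    u = plane_wave(eta)
    return abs(central_difference(u)(r) - 2j * math.sin(eta) * u(r))

# The symbol 2i * sin(eta) stays bounded by 2 in modulus for every eta,
# whereas the symbol of the derivative, i * eta, grows without bound.
etas = [0.0, 0.3, 1.0, 2.5, 10.0, 50.0]
max_gap = max(symbol_gap(eta) for eta in etas)
print(max_gap)
```

On each plane wave the difference operator acts as multiplication by $2i\sin\eta$, so the reported gap should be at rounding level for every sampled $\eta$.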

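We introduce components $f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\lambda}$ of the weakly universal nonlinearity $f^{(\mu,\nu)}$ by the formula

\[ f^{(m,\mu,\nu)}_{n_l,\vartheta,\vec\lambda}(\vec z_{\vec\lambda})(\eta, \tau) = \int_0^{\tau} \int_{\eta' + \cdots + \eta^{(m)} = \eta} \exp\Big\{ i\phi^{1}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \eta, \vec\eta\big) \frac{\tau_1}{\theta_1} + i\phi^{2,2}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*, \eta, \vec\eta\big) \frac{\tau_1}{\theta_2} \Big\}\, Q^{(m,\nu)}_{n_l,\vartheta,\vec\xi}\big(\vartheta k_{*l}, \vec k_*\big) \prod_{i=1}^{m} \hat z_{\lambda_i}\big(\eta^{(i)}\big)\, \tilde d^{(m-1)d}\vec k\, d\tau_1. \tag{227} \]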

As before, we establish standard properties of the operator $f^{(\mu,\nu)}$ defined by the above formula.

Lemma 45. The operator $\Psi_\beta f^{(\mu,\nu)}$ is bounded for bounded $\vec u \in E^{2N}_{sc}$, $f^{(\mu,\nu)}(0) = 0$. The polynomial operator $\Psi_\beta f^{(\mu,\nu)}$ satisfies the Lipschitz condition

\[ \big\| \Psi_\beta f^{(\mu,\nu)}(\vec u_1) - \Psi_\beta f^{(\mu,\nu)}(\vec u_2) \big\|_{E^{2N}_{sc}} \le C\tau_* \|\vec u_1 - \vec u_2\|_{E^{2N}_{sc}}, \tag{228} \]

where $C$ depends only on $C_\chi$ as in (88), on the power of $F$ and on $\|\vec u_1\|_{E^{2N}_{sc}} + \|\vec u_2\|_{E^{2N}_{sc}}$. In particular, it does not depend on $\beta \ge 0$ and on $0 < \theta_1 < \infty$, $0 < \theta_2 \le \infty$.

From Lemma 40 and the contraction principle we obtain the following theorem, completely similar to Theorem 33.

Theorem 46. Let $\|\hat h\|_{E^{2N}_{sc}} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that Eq. (190) has a solution $\vec z \in E^{2N}_{sc}$ satisfying $\|\vec z\|_{E^{2N}_{sc}} \le R_1$. Such a solution is unique and $\hat z_{l,\vartheta}(k, \tau) = 0$ if $|k| \ge \beta^{-\epsilon}$.

Theorem 47. Let $\hat u_{l,\vartheta}(k, \tau)$ be a solution to (213) and $\hat z_{l,\vartheta}(k, \tau)$ be the solution of (211). Then the following inequality holds:

\[ \|\hat u_{l,\vartheta} - \hat z_{l,\vartheta}\|_{E_{sc}} \le C\beta^{(\mu+1)(1-\epsilon)} + C\varrho^{-1}\beta^{(\mu+1)(1-\epsilon)}, \quad l = 1, \ldots, N; \ \vartheta = \pm, \tag{229} \]

for all $0 < \varrho \le 1$ and $0 < \beta \le \beta_0$, where $\epsilon$ is the same as in Definition 1 and $\beta_0$ is sufficiently small.


Proof. To obtain (229) we note that $\vec u_{l,\vartheta}$ is an approximate solution of (211), namely $\vec u - \Psi_\beta f^{(\mu,\nu)}(\vec u) - \hat h = \hat D$, where $\hat D$ is small. To estimate $\hat D$, observe that integrals involving $\vec u$ have the integration domain as in (171). Hence, using (226) and (223) we obtain

\[ \|\hat D\|_{E^{2N}_{sc}} \le C\beta^{(\mu+1)(1-\epsilon)} + C\varrho^{-1}\beta^{(\mu+1)(1-\epsilon)}, \]

and applying Lemma 27 we get (229).

7.5. Decay of solutions and elimination of cutoff factors. In this subsection we show how to remove the cutoff function in (213) and to obtain the averaged interaction system with a weakly universal nonlinearity. If $\mu = 1$, $\nu = 0$ and the nk-spectrum $S$ is resonance-invariant, the amplitude system coincides with the system (62) with a weakly universal nonlinearity. For $\mu > 1$ or $\nu > 0$ the amplitude system involves additional terms. In particular, if $\mu = 2$, $\nu = 0$ and $S = \{(k_*, n)\}$ is just a single element, then the linear part has the second order and the nonlinearity is universal, and the amplitude system turns into the classical NLS system:

\[ \partial_\tau u_\zeta = \zeta \frac{1}{\varrho} \gamma_2(k_*, n, -i\zeta\beta\nabla_r)\, u_\zeta + b_\zeta u_\zeta^2 u_{-\zeta}, \quad u_\zeta(0) = \hat H_\zeta, \quad \zeta = \pm. \]

This system is equivalent to (51) when $\hat H_- = \hat H_+^*$, $b_- = b_+^*$, $u_- = u_+^*$. When $\nu > 0$ the nonlinearity involves additional terms with finite difference operators.

The possibility to remove cutoff functions is based on the fast decay of $\hat u(k)$ as $|k| \to \infty$, which is equivalent to high smoothness of $u(r)$. The factor $\Psi_\beta$ can be replaced by 1 with a small error when the data $\hat H(k)$ decay sufficiently fast. To describe the decay we introduce weighted Banach spaces of scalar functions $\hat H(k)$ as follows.

Definition 48 (Weight function). For $a \ge 0$ we call a positive function $\psi(r)$, $r \ge 0$, a weight function from class $W(a)$ if it satisfies the following conditions: (i) $\psi(0) > 0$, $\psi(r_1) \ge \psi(r_2)$ for $r_1 \ge r_2 \ge 0$; (ii) $\psi(r_1 + r_2) \le \psi(r_1) + \psi(r_2) + C$, where $C$ does not depend on $r_1, r_2$ ($\psi$ is sublinear); (iii) $\psi(r) - a\ln r \ge C > 0$ for all $r > 0$ ($\psi(r)$ is superlogarithmic).

We introduce $L^1(\psi)$ as the space of scalar functions $\hat H(k)$, $k \in \mathbb{R}^d$, with the norm

\[ \|\hat H\|_{L^1(\psi)} = \int_{\mathbb{R}^d} e^{\psi(|k|)}\, |\hat H(k)|\, dk. \tag{230} \]

For vector-functions we use the same formula with the Euclidean norm $|\cdot|$. In the simplest case of $\psi(r) = a\ln(1 + r)$ we have $\psi \in W(a)$ and obtain $L^1(\psi) = L^{1,a}$ with the norm (19). If the weight function belongs to $W(a)$ for all $a$, the space $L^1(\psi)$ consists of the Fourier transforms of infinitely smooth functions. The following lemma shows that $L^1(\psi)$ is closed with respect to the convolution.
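For the logarithmic weight $\psi(r) = a\ln(1 + r)$ mentioned above, monotonicity (i) and the subadditivity-type condition (ii) of Definition 48 can be verified directly, since $1 + r_1 + r_2 \le (1 + r_1)(1 + r_2)$. The sketch below checks both on a grid; the value $a = 2$ and the grid are arbitrary illustrative choices.

```python
import math

def psi(r, a=2.0):
    """Logarithmic weight psi(r) = a * ln(1 + r)."""
    return a * math.log1p(r)

def subadditivity_defect(r1, r2, a=2.0):
    """psi(r1 + r2) - psi(r1) - psi(r2); condition (ii) asks for this to be bounded above."""
    return psi(r1 + r2, a) - psi(r1, a) - psi(r2, a)

grid = [0.1 * j for j in range(2001)]   # sample radii in [0, 200]
coarse = grid[::20]

# Largest defect over the coarse grid; for this weight it should not exceed 0,
# so condition (ii) holds with the constant C = 0.
worst = max(subadditivity_defect(r1, r2) for r1 in coarse for r2 in coarse)

# Condition (i): psi is nondecreasing along the grid.
monotone = all(psi(s) <= psi(t) for s, t in zip(grid, grid[1:]))

print(worst, monotone)
```

The same inequality is what makes $e^{\psi(|k|)}$ submultiplicative up to a constant, which is the property used in the convolution estimate of the next lemma.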


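Lemma 49. Let $\hat H_1, \hat H_2 \in L^1(\psi)$ and

\[ \hat H_3(k) = \int_{\mathbb{R}^d} \hat H_1(k - k')\, \hat H_2(k')\, dk'. \]

Then

\[ \|\hat H_3\|_{L^1(\psi)} \le C\, \|\hat H_1\|_{L^1(\psi)}\, \|\hat H_2\|_{L^1(\psi)}. \tag{231} \]

Proof. Since $|k| \le |k - k'| + |k'|$, Definition 48 (i) and (ii) give $\psi(|k|) \le \psi(|k - k'|) + \psi(|k'|) + C$, and we obtain

\[ e^{\psi(|k|)} |\hat H_3(k)| \le \int_{\mathbb{R}^d} e^{\psi(|k|)} |\hat H_1(k - k')|\, |\hat H_2(k')|\, dk' \le e^{C} \int_{\mathbb{R}^d} e^{\psi(|k'|)} e^{\psi(|k - k'|)} |\hat H_1(k - k')|\, |\hat H_2(k')|\, dk'. \]

Applying Young's inequality (122) we obtain

\[ \int_{\mathbb{R}^d} e^{\psi(|k|)} |\hat H_3(k)|\, dk \le e^{C} \int_{\mathbb{R}^d} e^{\psi(|k|)} |\hat H_1(k)|\, dk \int_{\mathbb{R}^d} e^{\psi(|k|)} |\hat H_2(k)|\, dk, \]

implying (231).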

Let us introduce the norm in the space $E_{sc}(\psi)$ by the formula (17):

\[ \|\hat H(\cdot, \cdot)\|_{E(\psi)} = \|\hat H(\cdot, \cdot)\|_{C([0,\tau_*], L^1(\psi))} = \sup_{0 \le \tau \le \tau_*} \int_{\mathbb{R}^d} e^{\psi(|k|)} |\hat H(k, \tau)|\, dk. \tag{232} \]

Using (231) instead of (18) we obtain, as in Lemma 25, the following statement.

Lemma 50. The operator $\Psi_\beta f^{(s,\nu)}$ in (215) is bounded for bounded $\vec u \in E^{2N}_{sc}(\psi)$, $f(0) = 0$, and satisfies the Lipschitz condition

\[ \big\| \Psi_\beta f^{(s,\nu)}(\vec u_1) - \Psi_\beta f^{(s,\nu)}(\vec u_2) \big\|_{E^{2N}_{sc}(\psi)} \le C\tau_* \|\vec u_1 - \vec u_2\|_{E^{2N}_{sc}(\psi)}, \tag{233} \]

where $C$ depends only on $C_\chi$ as in (88), on the power of the polynomial $f^{(s,\nu)}$ and on $\|\vec u_1\|_{E^{2N}_{sc}(\psi)} + \|\vec u_2\|_{E^{2N}_{sc}(\psi)}$, and does not depend on $\beta \ge 0$ and on $0 < \theta_1 < \infty$, $0 < \theta_2 \le \infty$.

From Lemma 40 and the contraction principle we obtain the following theorem, completely similar to Theorem 33.

Theorem 51. Let $\|\hat H\|_{E^{2N}_{sc}(\psi)} \le R$. Then there exist $R_1 > 0$ and $\tau_* > 0$ such that Eq. (215) has a solution $\vec u \in E^{2N}_{sc}(\psi)$ which satisfies $\|\vec u\|_{E^{2N}_{sc}(\psi)} \le R_1$, and such a solution is unique.

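The following lemma shows that $\Psi_\beta$ can be replaced by one with a small error.

Lemma 52. Let $\|\hat H\|_{L^1(\psi)} \le C$, $\psi \in W(a)$, and let $\Psi$ be as in (25). If $s > 0$, $\epsilon > 0$ and $s/\epsilon < a$, then (204) holds.

Proof. We have

\[ \int \big| 1 - \Psi(\beta^{\epsilon}\eta) \big|\, |\hat H(\eta)|\, d\eta \le \int_{|\eta| \ge \beta^{-\epsilon}} |\hat H(\eta)|\, d\eta = \int_{|\eta| \ge \beta^{-\epsilon}} e^{-\psi(|\eta|)} e^{\psi(|\eta|)} |\hat H(\eta)|\, d\eta \le e^{-\psi(\beta^{-\epsilon})} \int_{|\eta| \ge \beta^{-\epsilon}} e^{\psi(|\eta|)} |\hat H(\eta)|\, d\eta \le \beta^{s} e^{\ln(\beta^{-\epsilon}) s/\epsilon - \psi(\beta^{-\epsilon})} \|\hat H\|_{L^1(\psi)}. \tag{234} \]

According to Definition 48 (iii), $\ln(\beta^{-\epsilon})\, s/\epsilon - \psi(\beta^{-\epsilon}) \le a\ln\beta^{-\epsilon} - \psi(\beta^{-\epsilon}) \le C$, and we obtain (204) from (234).

Theorem 53. Let $\|\hat H\|_{E^{2N}_{sc}(\psi)} \le R$, where the weight function $\psi$ belongs to $W(a)$, and let $s < a$. Let $\vec u$ and $\vec u_0$ be solutions to the minimal equation with cutoff factor and without cutoff factor, respectively. Then there exist $C_s$ and $\beta_0$ such that

\[ \|\vec u - \vec u_0\|_{E^{2N}_{sc}(\psi)} \le C_s \beta^{s}, \quad 0 < \beta \le \beta_0. \tag{235} \]

Proof. We show that $\vec u$ is an approximate solution to $\vec u_0 = f^{(\mu,\nu)}(\vec u_0) + \hat H$. Namely,

\[ \vec u = \Psi_\beta f^{(\mu,\nu)}(\vec u) + \Psi_\beta \hat H = f^{(\mu,\nu)}(\vec u) + \hat H + \hat D, \qquad \hat D = (\Psi_\beta - 1) f^{(\mu,\nu)}(\vec u) + (\Psi_\beta - 1) \hat H. \]

According to Lemma 49, if $\vec u \in E^{2N}_{sc}(\psi)$ then $f^{(\mu,\nu)}(\vec u) \in E^{2N}_{sc}(\psi)$. Applying Lemma 52 we obtain

\[ \|\hat D\|_{E^{2N}_{sc}(\psi)} \le C\beta^{s}, \quad 0 < \beta \le \beta_0. \tag{236} \]

Lemma 27 combined with (236) yields (235).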

Now we give the theorem on approximation by solutions of a minimal system without cutoff.

Theorem 54. Let $\hat H_{l,\zeta}(k)$, $l = 1, \ldots, N$, be functions bounded in $L^1(\psi)$, where $\psi$ belongs to $W(a)$, and let $s < a$. Let $\hat h_{l,\zeta}(k)$ be defined by (202) and $\hat h_{l,\zeta}(k) = \hat h_{l,\zeta}(k)\, g_{n_l,\zeta}(k)$. Let $\hat u(k, \tau)$ be a solution of Eq. (118) with multiwavepacket initial data of the form (33). Let $u_{l,\vartheta}(k, \tau)$ be a solution to the system with a weakly universal nonlinearity (216) with initial data $u_{l,\vartheta}(k, 0) = \hat H_{l,\vartheta}(k)$, and

\[ \hat u_{\min}(k, \tau) = \sum_{\vartheta} \sum_{l=1}^{N} \beta^{-d}\, u_{l,\vartheta}\big(\beta^{-1}(k - \zeta k_{*i_l}), \tau\big)\, g_{n_l,\vartheta}(k). \]

Then

\[ \|\hat u - \hat u_{\min}\|_{E} \le C_{\epsilon,s}\beta^{s} + C\beta^{(\nu+1)(1-\epsilon)} + C\varrho^{-1}\beta^{(\mu+1)(1-\epsilon)} + C\varrho. \tag{237} \]

Proof. We take $\hat u = \sum_{\vartheta} \sum_{l=1}^{N} u_{l,\vartheta}$ and estimate $\|\hat u(k, \tau) - \hat u_{\min}(k, \tau)\|_{E}$ applying subsequently Theorems 37, 42, formulas (194) and (205), Theorem 47 and finally Theorem 53 to obtain inequality (237).


Note that Theorem 7 is a direct corollary of Theorem 54.

Remark 55. Note that (216) is the Fourier integral version of the following system of equations, which is based on a weakly universal nonlinearity and is slightly more general than (62):

\[ \partial_\tau u_{l,\vartheta} = \frac{1}{\theta_1}\, \omega'_{n_l}(k_{*l}) \cdot \nabla_x u_{l,\vartheta} + \frac{i}{2\theta_2}\, \nabla_r \cdot \omega''_{n_l}(k_{*l}) \nabla_r u_{l,\vartheta} + f^{(\mu,\nu)}_{n_l,\vartheta}(\vec u, \delta\vec u), \qquad u_{l\vartheta}\big|_{\tau=0} = \hat H_{l\vartheta}, \tag{238} \]

where $\delta_i u_l(r) = u_j(r + e_i) - u_j(r - e_i)$, $\theta_1, \theta_2$ are as in (222), and $e_i$ is the $i$th standard basis vector in $\mathbb{R}^d$. In the case when (52) holds, $1/\theta_2$ is bounded or small, the dependence on the coefficient $1/\theta_2$ is regular for small $\varrho$ and $\beta$, and $u_{\vartheta,j}(k, \tau)$ may be looked at as a shape function. When $\theta_1 = \varrho$ and $1/\theta_2$ is substituted by zero we obtain an equation exactly of the form (62). When $\nu = 0$, $\mu = 1$ and the nk-spectrum $S$ is universally resonance invariant as in Definition 18, the nonlinearities $f^{(1,0)}_{n_l,\vartheta,0}$ are universal of the form (65). When the nk-spectrum $S$ is resonance invariant but not universally resonance invariant, the nonlinearities are weakly universal but may fail to be universal, which allows, in particular, for second and third harmonic generation.

Acknowledgement. Effort of A. Babin and A. Figotin is sponsored by the Air Force Office of Scientific Research, Air Force Materials Command, USAF, under grant number FA9550-04-1-0359. We also would like to express our deep gratitude to the reviewer for the thorough analysis of our work and valuable suggestions which helped to improve the presentation of our results.

References

1. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: I. Quadratic nonlinearity. Waves in Random Media 11, R31–R102 (2001)
2. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: II. Interaction classification for quadratic nonlinearities. Waves in Random Media 12, R25–R52 (2002)
3. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: III. Cubic Nonlinearity. Waves in Random Media 13, R41–R69 (2003)
4. Babin, A., Figotin, A.: Nonlinear Maxwell Equations in Inhomogeneous Media. Commun. Math. Phys. 241, 519–581 (2003)
5. Babin, A., Figotin, A.: Polylinear spectral decomposition for nonlinear Maxwell equations. In: Agranovich, M.S., Shubin, M.A. (eds.) Partial Differential Equations, Advances in Mathematical Sciences, American Mathematical Society Translations-Series 2, Vol. 206, Providence, RI: Amer. Math. Soc., 2002, pp. 1–28
6. Babin, A., Figotin, A.: Nonlinear Photonic Crystals: IV Nonlinear Schrödinger Equation Regime. Waves in Random and Complex Media 15(2), 145–228 (2005)
7. Babin, A., Figotin, A.: Linear Superposition In Nonlinear Wave Dynamics. Rev. Math. Phys. 18(9), 971–1053 (2006)
8. Babin, A., Mahalov, A., Nicolaenko, B.: Global regularity of 3D rotating Navier-Stokes equations for resonant domains. Indiana Univ. Math. J. 48(3), 1133–1176 (1999)
9. Babin, A., Mahalov, A., Nicolaenko, B.: Fast Singular Oscillating Limits and Global Regularity for the 3D Primitive Equations of Geophysics. M2AN 34(2), 201–222 (2000)
10. Ben Youssef, W., Lannes, D.: The long wave limit for a general class of 2D quasilinear hyperbolic problems. Comm. Par. Differ. Eqs. 27(5–6), 979–1020 (2002)
11. Bogoliubov, N.N., Mitropolsky, Y.A.: Asymptotic Methods In The Theory Of Non-Linear Oscillations. Delhi: Hindustan Pub. Corp., 1961
12. Boyd, R.: Nonlinear Optics. London: Academic Press, 1992
13. Bona, J.L., Colin, T., Lannes, D.: Long wave approximations for water waves. Arch. Rat. Mech. Anal. 178(3), 373–410 (2005)
14. Bourgain, J.: Global solutions of nonlinear Schrödinger equations. American Mathematical Society Colloquium Publications 46. Providence, RI: Amer. Math. Soc., 1999
15. Butcher, P., Cotter, D.: The Elements of Nonlinear Optics. Cambridge: Cambridge Univ. Press, 1993
16. Cazenave, T.: Semilinear Schrödinger equations. Courant Lecture Notes in Mathematics 10, New York: New York University, Courant Institute of Mathematical Sciences, Providence, RI: Amer. Math. Soc., 2003
17. Colin, T.: Rigorous derivation of the nonlinear Schrödinger equation and Davey-Stewartson systems from quadratic hyperbolic systems. Asymptot. Anal. 31(1), 69–91 (2002)
18. Colin, T., Lannes, D.: Justification of and long-wave correction to Davey-Stewartson systems from quadratic hyperbolic systems. Discrete Contin. Dyn. Syst. 11(1), 83–100 (2004)
19. Craig, W., Groves, M.D.: Normal forms for wave motion in fluid interfaces. Wave Motion 31(1), 21–41 (2000)
20. Craig, W., Sulem, C., Sulem, P.-L.: Nonlinear modulation of gravity waves: a rigorous approach. Nonlinearity 5(2), 497–522 (1992)
21. Dobrokhotov, S.Yu., Maslov, V.P., Omelyanov, G.A.: Multiwave interaction in weakly nonlinear media with dispersion. In: Mathematical mechanisms of turbulence, i, Kiev: Akad. Nauk Ukrain. SSR, Inst. Mat., 1986, pp. 25–45
22. Dineen, S.: Complex Analysis on Infinite Dimensional Spaces. Berlin-Heidelberg-New York: Springer, 1999
23. Giannoulis, J., Mielke, A.: The nonlinear Schrödinger equation as a macroscopic limit for an oscillator chain with cubic nonlinearities. Nonlinearity 17(2), 551–565 (2004)
24. Goodman, R.H., Weinstein, M.I., Holmes, P.J.: Nonlinear propagation of light in one-dimensional periodic structures. J. Nonlinear Sci. 11(2), 123–168 (2001)
25. Groves, M.D., Schneider, G.: Modulating pulse solutions for quasilinear wave equations. J. Differ. Eq. 219(1), 221–258 (2005)
26. Hayashi, N., Naumkin, P.: Asymptotics of small solutions to nonlinear Schrödinger equations with cubic nonlinearities. Int. J. Pure Appl. Math. 3(3), 255–273 (2002)
27. Hille, E., Phillips, R.S.: Functional Analysis and Semigroups. Providence, RI: AMS, 1991
28. Infeld, E., Rowlands, G.: Nonlinear Waves, Solitons, and Chaos. 2nd ed., Cambridge: Cambridge University Press, 2000
29. Joly, J.-L., Metivier, G., Rauch, J.: Diffractive nonlinear geometric optics with rectification. Indiana Univ. Math. J. 47(4), 1167–1241 (1998)
30. Kalyakin, L.A.: Long-wave asymptotics. Integrable equations as the asymptotic limit of nonlinear systems. Usp. Mat. Nauk 44(1)(265), 5–34, 247 (1989); translation in Russ. Math. Surv. 44(1), 3–42 (1989)
31. Kalyakin, L.A.: Asymptotic decay of a one-dimensional wave packet in a nonlinear dispersive medium. Math. USSR Sb. 60(2), 457–483 (1988)
32. Krieger, J., Schlag, W.: Stable manifolds for all monic supercritical focusing nonlinear Schrödinger equations in one dimension. J. Amer. Math. Soc. (Electronic) 19(4), 815–920 (2006)
33. Kuksin, S.B.: Fifteen years of KAM for PDE. Geometry, topology, and mathematical physics, Amer. Math. Soc. Transl. Ser. 2, 212, Providence, RI: Amer. Math. Soc., 2004, pp. 237–258
34. Kirrmann, P., Schneider, G., Mielke, A.: The validity of modulation equations for extended systems with cubic nonlinearities. Proc. Roy. Soc. Edinburgh Sect. A 122(1–2), 85–91 (1992)
35. Kato, T.: Perturbation Theory for Linear Operators. Berlin-Heidelberg-New York: Springer, 1980
36. Lax, P.D.: Integrals of nonlinear equations of evolution and solitary waves. Comm. Pure Appl. Math. 21, 467–490 (1968)
37. Mitropolskii, Yu.A., Nguyen, V.D.: Applied asymptotic methods in nonlinear oscillations. Solid Mechanics and its Applications 55. Dordrecht: Kluwer Academic Publishers Group, 1997
38. Maslov, V.P.: Non-standard characteristics in asymptotic problems. Usp. Mat. Nauk 38:6, 3–36 (1983); translation in Russ. Math. Surv. 38:6, 1–42 (1983)
39. Maslov, V.P.: Mathematical aspects of integral optics. Russ. J. Math. Phys. 8(1), 83–105 (2001)
40. Mielke, A., Schneider, G., Ziegra, A.: Comparison of inertial manifolds and application to modulated systems. Math. Nachr. 214, 53–69 (2000)
41. Moloney, J., Newell, A.: Nonlinear Optics. Advanced Book Program, Boulder, CO: Westview Press, 2004
42. Mills, D.: Nonlinear Optics. Berlin-Heidelberg-New York: Springer-Verlag, 1991
43. Nayfeh, A.H.: Perturbation Methods. New York: Wiley, 1973
44. Ostrovsky, L., Potapov, A.: Modulated Waves. Baltimore, MD: The Johns Hopkins Univ. Press, 1999
45. Pankov, A.: Travelling Waves And Periodic Oscillations In Fermi-Pasta-Ulam Lattices. London: Imperial College Press, 2005
46. Phillips, O.M.: Wave Interactions. In: Leibovich, S., Seebass, A.R. (eds.) Nonlinear Waves. Ithaca and London: Cornell Univ. Press, 1974
47. Pierce, R.D., Wayne, C.E.: On the validity of mean-field amplitude equations for counterpropagating wavetrains. Nonlinearity 8(5), 769–779 (1995)
48. Sauter, E.G.: Nonlinear Optics. New York: Wiley-Interscience, 1996
49. Schlag, W.: Spectral theory and nonlinear partial differential equations: a survey. Discrete Contin. Dyn. Syst. 15(3), 703–723 (2006)
50. Schneider, G.: Justification of modulation equations for hyperbolic systems via normal forms. NoDEA Nonlinear Differential Equations Appl. 5(1), 69–82 (1998)
51. Schneider, G.: Justification and failure of the nonlinear Schrödinger equation in case of non-trivial quadratic resonances. J. Differ. Eq. 216(2), 354–386 (2005)
52. Schneider, G., Uecker, H.: Nonlinear coupled mode dynamics in hyperbolic and parabolic periodically structured spatially extended systems. Asymptot. Anal. 28(2), 163–180 (2001)
53. Schneider, G., Uecker, H.: Existence and stability of modulating pulse solutions in Maxwell’s equations describing nonlinear optics. Z. Angew. Math. Phys. 54(4), 677–712 (2003)
54. Schneider, G., Wayne, C.E.: Estimates for the three-wave interaction of surface water waves. European J. Appl. Math. 14(5), 547–570 (2003)
55. Sipe, J.E., Bhat, N., Chak, P., Pereira, S.: Effective field theory for the nonlinear optical properties of photonic crystals. Phys. Rev. E 69, 016604 (2004)
56. Slusher, R.E., Eggleton, B.J.: Nonlinear Photonic Crystals. Berlin-Heidelberg-New York: Springer-Verlag, 2003
57. Sulem, C., Sulem, P.-L.: The Nonlinear Schrödinger Equation. Berlin-Heidelberg-New York: Springer, 1999
58. Volkov, S.N., Sipe, J.E.: Nonlinear optical interactions of wave packets in photonic crystals: Hamiltonian dynamics of effective fields. Phys. Rev. E 70, 066621 (2004)
59. Soffer, A., Weinstein, M.I.: Resonances, radiation damping and instability in Hamiltonian nonlinear wave equations. Invent. Math. 136(1), 9–74 (1999)
60. Weissert, T.P.: The Genesis of Simulation in Dynamics: pursuing the Fermi-Pasta-Ulam problem. New York: Springer-Verlag, 1997
61. Whitham, G.: Linear and Nonlinear Waves. New York: John Wiley & Sons, 1974

Communicated by P. Constantin

Commun. Math. Phys. 278, 385–431 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0410-4


Random Walk on the Incipient Infinite Cluster for Oriented Percolation in High Dimensions

Martin T. Barlow1, Antal A. Járai2, Takashi Kumagai3, Gordon Slade1

1 Department of Mathematics, University of British Columbia, Vancouver, BC V6T 1Z2, Canada. E-mail: [email protected]; [email protected]
2 Carleton University, School of Mathematics and Statistics, 1125 Colonel By Drive, Ottawa, ON K1S 5B6, Canada. E-mail: [email protected]
3 Department of Mathematics, Faculty of Science, Kyoto University, Kyoto 606-8502, Japan. E-mail: [email protected]

Received: 7 August 2006 / Accepted: 13 September 2007
Published online: 8 January 2008 – © Springer-Verlag 2008

Abstract: We consider simple random walk on the incipient infinite cluster for the spread-out model of oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$. In dimensions $d > 6$, we obtain bounds on exit times, transition probabilities, and the range of the random walk, which establish that the spectral dimension of the incipient infinite cluster is $\frac{4}{3}$, and thereby prove a version of the Alexander–Orbach conjecture in this setting. The proof divides into two parts. One part establishes general estimates for simple random walk on an arbitrary infinite random graph, given suitable bounds on volume and effective resistance for the random graph. A second part then provides these bounds on volume and effective resistance for the incipient infinite cluster in dimensions $d > 6$, by extending results about critical oriented percolation obtained previously via the lace expansion.

1. Introduction and Main Results

1.1. Introduction. The problem of random walk on a percolation cluster (the ‘ant in the labyrinth’ [17]) has received much attention both in the physics and the mathematics literature. Recently, several papers have considered random walk on a supercritical percolation cluster [5,9,34,35]. Roughly speaking, supercritical percolation clusters on $\mathbb{Z}^d$ are $d$-dimensional, and these papers prove, in various ways, that a random walk on a supercritical percolation cluster behaves in a diffusive fashion similar to a random walk on the entire lattice $\mathbb{Z}^d$.

Although a mathematically rigorous understanding of critical percolation clusters is restricted to examples in dimensions $d = 2$ and $d > 6$, or $d > 4$ in the case of oriented percolation, it is generally believed that critical percolation clusters in dimension $d$ have dimension less than $d$, and that random walk on a large critical cluster behaves subdiffusively. Critical percolation clusters are believed to be finite in all dimensions, and are known to be finite in the oriented setting [11]. To avoid finite-size issues associated with random walk on a finite cluster, it is convenient to consider random walk on the incipient infinite cluster (IIC), which can be understood as a critical percolation cluster conditioned to be infinite. The IIC has been constructed so far only when $d = 2$ [29], when $d > 6$ (in the spread-out case) [24], and when $d > 4$ for oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ (again in the spread-out case) [21]. See [36] for a summary of the high-dimensional results. Also, it is not difficult to construct the IIC on a tree [7,30]. Random walk on the IIC has been proved to be subdiffusive on $\mathbb{Z}^2$ [30] and on a tree [7,30]. See also [13,14] for related results in the continuum limit.

In this paper, we prove several estimates for random walk on the IIC for spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ in dimensions $d > 6$. These estimates, which show subdiffusive behaviour, establish that the spectral dimension of the IIC is $\frac{4}{3}$, thereby proving the Alexander–Orbach [3] conjecture in this setting. For random walk on ordinary (unoriented) percolation for $d < 6$ the Alexander–Orbach conjecture is generally believed to be false [27, Sect. 7.4].

The upper critical dimension for oriented percolation is 4. Because of this, we initially expected that the spectral dimension of the IIC would be equal to $\frac{4}{3}$ for oriented percolation in all dimensions $d > 4$, but not for $d < 4$. However, our methods require that we take $d > 6$. The random walk is allowed to travel backwards in ‘time’ (as measured by the oriented percolation process), and this allows the walk to move between vertices that are not connected to each other in the oriented sense. It may be that this effect raises the upper critical dimension for the random walk in the oriented setting to $d = 6$. Or it may be that our conclusions for the random walk remain true for all dimensions $d > 4$, despite the fact that our methods force us to assume $d > 6$. This leads to the open question: Do our results actually apply in all dimensions $d > 4$, or does different behaviour apply for $4 < d \le 6$?

1.2. Random walk on graphs and in random environments.
Our results on the IIC will be consequences of more general results on random walks on a family of random graphs. We now set up our notation for this. Let $\Gamma = (G, E)$ be an infinite graph, with vertex set $G$ and edge set $E$. The edges $e \in E$ are not oriented. We assume that $\Gamma$ is connected. We write $x \sim y$ if $\{x, y\} \in E$, and assume that $(G, E)$ is locally finite, i.e., $\mu_y < \infty$ for each $y \in G$, where $\mu_y$ is the number of bonds that contain $y$. We extend $\mu$ to a measure on $G$. Let $X = (X_n, n \in \mathbb{Z}_+, P^x, x \in G)$ be the discrete-time simple random walk on $\Gamma$, i.e., the Markov chain with transition probabilities

\[ P^x(X_1 = y) = \frac{1}{\mu_x}, \quad y \sim x. \tag{1.1} \]

We define the transition density (or discrete-time heat kernel) of $X$ by

\[ p_n(x, y) = \frac{P^x(X_n = y)}{\mu_y}; \tag{1.2} \]

we have $p_n(x, y) = p_n(y, x)$. The natural metric on $\Gamma$, obtained by counting the number of steps in the shortest path between points, is written $d(x, y)$ for $x, y \in G$. We write

\[ B(x, r) = \{y : d(x, y) < r\}, \qquad V(x, r) = \mu(B(x, r)), \quad r \in (0, \infty). \tag{1.3} \]

Following terminology used for manifolds, we call $V(x, r)$ the volume of the ball $B(x, r)$. We will assume $G$ contains a marked vertex, which we denote 0, and we write

\[ B(R) = B(0, R), \qquad V(R) = V(0, R). \tag{1.4} \]
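The definitions (1.1) to (1.3) can be made concrete on a small finite graph. The sketch below computes the heat kernel $p_n(x, y)$ with exact rational arithmetic and the volume $V(x, r)$ by breadth-first search; the four-vertex graph is a hypothetical example (the graphs considered in the paper are infinite), and the function names are illustrative.

```python
from collections import deque
from fractions import Fraction

# Hypothetical finite example graph, given as an adjacency list.
adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}

def degrees(adj):
    """mu_x = number of bonds containing x."""
    return {x: len(nbrs) for x, nbrs in adj.items()}

def heat_kernel(adj, n):
    """Return p with p[x][y] = P^x(X_n = y) / mu_y, as in (1.2)."""
    mu = degrees(adj)
    # dist[x][y] = P^x(X_k = y), starting from k = 0.
    dist = {x: {y: Fraction(int(x == y)) for y in adj} for x in adj}
    for _ in range(n):
        new = {}
        for x in adj:
            row = {y: Fraction(0) for y in adj}
            for z, prob in dist[x].items():
                if prob:
                    for y in adj[z]:
                        row[y] += prob / mu[z]   # one step of (1.1) from z
            new[x] = row
        dist = new
    return {x: {y: dist[x][y] / mu[y] for y in adj} for x in adj}

def ball_volume(adj, x, r):
    """V(x, r) = mu(B(x, r)) with B(x, r) = {y : d(x, y) < r}, as in (1.3)."""
    mu = degrees(adj)
    depth = {x: 0}
    queue = deque([x])
    while queue:
        z = queue.popleft()
        for y in adj[z]:
            if y not in depth:
                depth[y] = depth[z] + 1
                queue.append(y)
    return sum(mu[y] for y, d in depth.items() if d < r)

p5 = heat_kernel(adj, 5)
symmetric = all(p5[x][y] == p5[y][x] for x in adj for y in adj)
print(symmetric, heat_kernel(adj, 1)[0][1], ball_volume(adj, 0, 2))
```

The symmetry $p_n(x, y) = p_n(y, x)$ holds exactly here because the walk is reversible with respect to $\mu$; dividing by $\mu_y$ in (1.2) is what turns the transition probabilities into a symmetric kernel.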


For $A \subset G$, we write

\[ T_A = \inf\{n \ge 0 : X_n \in A\}, \qquad \tau_A = T_{A^c}, \tag{1.5} \]

and let

\[ \tau_R = \tau_{B(0,R)} = \min\{n \ge 0 : X_n \notin B(0, R)\}. \tag{1.6} \]

Let $W_n = \{X_0, X_1, \ldots, X_n\}$ be the set of vertices hit by $X$ up to time $n$, and let

\[ S_n = \mu(W_n) = \sum_{x \in W_n} \mu_x. \tag{1.7} \]

We write $R_{\mathrm{eff}}(0, B(R)^c)$ for the effective resistance between 0 and $B(R)^c$ in the electric network obtained by making each edge of $\Gamma$ a unit resistor; see [15]. A precise mathematical definition of $R_{\mathrm{eff}}(\cdot, \cdot)$ will be given in Sect. 2.

We now consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ carrying a family of random graphs $\Gamma(\omega) = (G(\omega), E(\omega))$, $\omega \in \Omega$. We assume that, for each $\omega \in \Omega$, the graph $\Gamma(\omega)$ is infinite, locally finite and connected, and contains a marked vertex $0 \in G$. We denote balls in $\Gamma(\omega)$ by $B_\omega(x, r)$, their volume by $V_\omega(x, r)$, and write

\[ B(R) = B_\omega(R) = B_\omega(0, R), \qquad V(R) = V_\omega(R) = V_\omega(0, R). \tag{1.8} \]

We write X = (X n , n ≥ 0, Pωx , x ∈ G(ω)) for the simple random walk on (ω), and denote by pnω (x, y) its transition density with respect to µ(ω). Formally, we introduce a second measure space (, F ), and define X on the product × . We write ω to denote elements of . The key ingredients in our analysis of the simple random walk are volume and resistance bounds. The following defines a set J (λ) of values of R for which we have ‘good’ volume and effective resistance estimates. The set J (λ) depends on the graph , and thus is a random set under P. Definition 1.1. Let = (G, E) be as above. For λ > 1, let J (λ) be the set of those R ∈ [1, ∞] such that the following all hold: (1) V (R) ≤ λR 2 , (2) V (R) ≥ λ−1 R 2 , (3) Reff (0, B(R)c ) ≥ λ−1 R. Note that Reff (0, B(R)c ) ≤ R (see Lemma 2.2(c) in Sect. 2.1 ), so there is no need for an upper bound complementary to Definition 1.1(3). We now make the following important assumption concerning the graphs ((ω)). This involves upper and lower bounds on the volume, as well as an estimate which says that R is likely to be in J (λ) for large enough λ. Assumption 1.2. There exists R ∗ ≥ 1 such that the following hold: (1) There exists p(λ) ≥ 0, with p(λ) ≤ c1 λ−q0 for some q0 , c1 > 0, such that for each R ≥ R∗, P(R ∈ J (λ)) ≥ 1 − p(λ), (1.9) (2) E[V (R)] ≤ c2 R 2 , for R ∈ [R ∗ , ∞), (3) E[1/V (R)] ≤ c3 R −2 for R ∈ [R ∗ , ∞).


Remark. Assumption 1.2(2,3), together with Markov's inequality, provides upper bounds of the form $c\lambda^{-1}$ for the probability of the complements of the events in Definition 1.1(1,2). This creates some redundancy in our formulation, but we state things this way because some of our conclusions for the random walk rely only on Assumption 1.2(1) and do not require the stronger volume bounds given by Assumption 1.2(2,3).

Note that Assumption 1.2 only involves statements about the volume and resistance from one point $0$ in the graph. In general, this kind of information would not be enough to give much control of the random walk. However, the graphs considered here have strong recurrence properties, and are therefore simpler to handle than general graphs. We use techniques developed in [6,7,37–39]. We will prove in Theorem 1.7 that Assumption 1.2 holds for the IIC for sufficiently spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ when $d > 6$. As the reader of Sects. 4–5 will see, obtaining volume and (especially) resistance bounds on the IIC from one base point is already difficult; it is fortunate that we do not need to assume more.

We have the following four consequences of Assumption 1.2 for random graphs. They give control, in different ways, of the quantities $E_\omega^0 \tau_R$, $p_{2n}^\omega(0, 0)$, $d(0, X_n)$, and $S_n$, which measure the rate of dispersion of the random walk $X$ from the base point $0$. Some statements in the first proposition involve the averaged law defined by the semi-direct product $P^* = \mathbb{P} \times P_\omega^0$.

Theorem 1.3. Suppose Assumption 1.2(1) holds. Then, uniformly with respect to $n \ge 1$ and $R \ge 1$,

\[ \mathbb{P}(\theta^{-1} \le R^{-3} E_\omega^0 \tau_R \le \theta) \to 1 \quad \text{as } \theta \to \infty, \tag{1.10} \]
\[ \mathbb{P}(\theta^{-1} \le n^{2/3} p_{2n}^\omega(0, 0) \le \theta) \to 1 \quad \text{as } \theta \to \infty, \tag{1.11} \]
\[ P^*(d(0, X_n) n^{-1/3} < \theta) \to 1 \quad \text{as } \theta \to \infty, \tag{1.12} \]
\[ P^*(\theta^{-1} < (1 + d(0, X_n)) n^{-1/3}) \to 1 \quad \text{as } \theta \to \infty. \tag{1.13} \]

Since $P_\omega^0(X_{2n} = 0) \approx n^{-2/3}$, we cannot replace $1 + d(0, X_n)$ by $d(0, X_n)$ in (1.13).

Theorem 1.4. Suppose Assumption 1.2(1,2,3) hold. Then there exists $n^* \ge 1$ (depending only on $R^*$ and the function $p(\cdot)$ in Assumption 1.2), and constants $c_i$ such that

\[ c_1 R^3 \le \mathbb{E}(E_\omega^0 \tau_R) \le c_2 R^3 \quad \text{for all } R \ge 1, \tag{1.14} \]
\[ c_3 n^{-2/3} \le \mathbb{E}(p_{2n}^\omega(0, 0)) \le c_4 n^{-2/3} \quad \text{for all } n \ge n^*, \tag{1.15} \]
\[ c_5 n^{1/3} \le \mathbb{E}(E_\omega^0 d(0, X_n)) \quad \text{for all } n \ge n^*. \tag{1.16} \]

We do not have an upper bound in (1.16); this is discussed further in Example 2.6 below.

Remark. The above two theorems in fact do not require the polynomial decay of $p(\lambda)$; it is enough to have $p(\lambda) \to 0$ as $\lambda \to \infty$.

Let $d_s(G)$ be the spectral dimension of $G$, defined by

\[ d_s(G) = -2 \lim_{n \to \infty} \frac{\log p_{2n}(x, x)}{\log n}, \tag{1.17} \]

if this limit exists. Here $x \in G$; it is easy to see that the limit is independent of the base point $x$. Note that $d_s(\mathbb{Z}^d) = d$. In (c) below, recall that $\overline{\Omega}$ is the second probability space, on which the random walk $X$ is defined.
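As a sanity check on definition (1.17), the statement $d_s(\mathbb{Z}^d) = d$ can be verified numerically for $d = 1$, where $P^0(X_{2n} = 0) = \binom{2n}{n} 4^{-n}$ and $\mu_0 = 2$. The sketch below is illustrative only: the function names are hypothetical, and the limit in (1.17) is replaced by the slope of $\log p_{2n}(0,0)$ against $\log n$ between $n$ and $2n$, which converges faster than the raw ratio.

```python
import math


def log_p2n_on_Z(n):
    """log p_{2n}(0, 0) for simple random walk on Z (mu_0 = 2)."""
    # log of binom(2n, n) / 4^n, computed via lgamma to avoid huge integers
    log_prob = math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1) - 2 * n * math.log(2)
    return log_prob - math.log(2)  # divide by mu_0 = 2 as in (1.2)


def local_spectral_dimension(n):
    """-2 times the log-log slope of p_{2n}(0, 0) between n and 2n."""
    slope = (log_p2n_on_Z(2 * n) - log_p2n_on_Z(n)) / math.log(2)
    return -2 * slope


estimate = local_spectral_dimension(10_000)  # close to d_s(Z) = 1
```

The same slope estimator applied to a graph with $p_{2n}(x, x)$ of order $n^{-2/3}$ would return a value near $4/3$, the exponent appearing in Theorem 1.5(a).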


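Theorem 1.5. Suppose Assumption 1.2(1) holds. Then there exist $\alpha_1, \alpha_2, \alpha_3, \alpha_4 < \infty$, and a subset $\Omega_0$ with $\mathbb{P}(\Omega_0) = 1$ such that the following statements hold:

(a) For each $\omega \in \Omega_0$ and $x \in G(\omega)$ there exists $N_x(\omega) < \infty$ such that

\[ (\log n)^{-\alpha_1} n^{-2/3} \le p_{2n}^\omega(x, x) \le (\log n)^{\alpha_1} n^{-2/3}, \qquad n \ge N_x(\omega). \tag{1.18} \]

In particular, $d_s(G) = \frac{4}{3}$, $\mathbb{P}$-a.s., and the random walk is recurrent.

(b) For each $\omega \in \Omega_0$ and $x \in G(\omega)$ there exists $R_x(\omega) < \infty$ such that

\[ (\log R)^{-\alpha_2} R^3 \le E_\omega^x \tau_R \le (\log R)^{\alpha_2} R^3, \qquad R \ge R_x(\omega). \tag{1.19} \]

Hence

\[ \lim_{R \to \infty} \frac{\log E_\omega^x \tau_R}{\log R} = 3. \]

(c) Let $Y_n = \max_{0 \le k \le n} d(0, X_k)$. For each $\omega \in \Omega_0$ and $x \in G(\omega)$ there exist $N_x(\omega, \overline{\omega})$, $R_x(\omega, \overline{\omega})$ such that $P_\omega^x(N_x < \infty) = P_\omega^x(R_x < \infty) = 1$, and such that

\[ (\log n)^{-\alpha_3} n^{1/3} \le Y_n(\omega, \overline{\omega}) \le (\log n)^{\alpha_3} n^{1/3}, \qquad n \ge N_x(\omega, \overline{\omega}), \tag{1.20} \]
\[ (\log R)^{-\alpha_4} R^3 \le \tau_R(\omega, \overline{\omega}) \le (\log R)^{\alpha_4} R^3, \qquad R \ge R_x(\omega, \overline{\omega}). \tag{1.21} \]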

Remark. One cannot expect (1.18) or (1.19) to hold with $\alpha_1 = 0$ or $\alpha_2 = 0$, since it is known that $\log \log$ fluctuations occur in the analogous limits for the IIC on regular trees [7]. (This example is discussed further in Example 1.8(i) below.)

Let $W_n = \{X_0, X_1, \dots, X_n\}$ as before and let $|W_n|$ denote its cardinality. For a sufficiently regular recurrent graph one expects that $|W_n| \approx n^{d_s/2}$. The original formulation of the Alexander–Orbach conjecture [3] was that, in all dimensions, for the IIC,

\[ |W_n| \approx n^{2/3}, \tag{1.22} \]

so that $d_s = \frac{4}{3}$ in all dimensions. As noted already above, the conjecture is now not believed to hold in low dimensions. The following theorem shows that a version of the Alexander–Orbach conjecture does hold for random graphs that satisfy Assumption 1.2(1). As we will see in Theorem 1.7, this is the case for the IIC for sufficiently spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ for $d > 6$.

Theorem 1.6. (a) Suppose Assumption 1.2(1) holds. Then there exists a subset $\Omega_0$ with $\mathbb{P}(\Omega_0) = 1$ such that for each $\omega \in \Omega_0$ and $x \in G(\omega)$,

\[ \lim_{n \to \infty} \frac{\log S_n}{\log n} = \frac{2}{3}, \qquad P_\omega^x\text{-a.s.} \tag{1.23} \]

(b) Suppose in addition there exists a constant $c_0$ such that all vertices in $G$ have degree less than $c_0$. Then

\[ \lim_{n \to \infty} \frac{\log |W_n|}{\log n} = \frac{2}{3}, \qquad P_\omega^x\text{-a.s.} \tag{1.24} \]

See Example 1.8 for a graph with unbounded degree which satisfies Assumption 1.2, but for which (1.24) fails.

Remark. See [32] for results which generalise the above theorems to the situation where there exist indices $\alpha < \beta$ such that $V(R)$ is comparable to $R^\alpha$ and $R_{\rm eff}(0, B(R)^c)$ is comparable to $R^{\beta - \alpha}$. Our case is $\alpha = 2$, $\beta = 3$.
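The exponents in Theorems 1.3 to 1.6 can be recovered from the pair $(\alpha, \beta) = (2, 3)$ by simple bookkeeping. The sketch below is a heuristic only, not the argument of [32]: it assumes that the mean exit time scales like volume times resistance, $R^\alpha \cdot R^{\beta - \alpha} = R^\beta$, and that $p_{2n}(0, 0)$ scales like the reciprocal of the volume of a ball of radius $n^{1/\beta}$. The function name `walk_exponents` is hypothetical.

```python
from fractions import Fraction


def walk_exponents(alpha, beta):
    """Heuristic scaling exponents when V(R) ~ R^alpha, R_eff ~ R^(beta - alpha)."""
    alpha, beta = Fraction(alpha), Fraction(beta)
    return {
        "exit_time": beta,                       # E tau_R ~ R^beta
        "heat_kernel_decay": alpha / beta,       # p_2n(0,0) ~ n^(-alpha/beta)
        "displacement": 1 / beta,                # d(0, X_n) ~ n^(1/beta)
        "spectral_dimension": 2 * alpha / beta,  # d_s as in (1.17)
    }


exponents = walk_exponents(2, 3)
```

For $\alpha = 2$, $\beta = 3$ this returns the exit-time exponent $3$ of (1.10), the decay exponent $2/3$ of (1.11), the displacement exponent $1/3$ of (1.12), and $d_s = 4/3$.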


1.3. The IIC. In this section, we define the oriented percolation model and recall the construction of the IIC for spread-out oriented percolation on $\mathbb{Z}^d \times \mathbb{Z}_+$ in dimensions $d > 4$ [21]. For simplicity, we will consider only the most basic example of a spread-out model. (In the physics literature, oriented percolation is usually called directed percolation; see [28].)

The spread-out oriented percolation model is defined as follows. Consider the graph with vertices $\mathbb{Z}^d \times \mathbb{Z}_+$ and directed bonds $((x, n), (y, n + 1))$, for $n \ge 0$ and $x, y \in \mathbb{Z}^d$ with $0 \le \|x - y\|_\infty \le L$. Here $L$ is a fixed positive integer and $\|x\|_\infty = \max_{i=1,\dots,d} |x_i|$ for $x = (x_1, \dots, x_d) \in \mathbb{Z}^d$. Let $p \in [0, 1]$. We associate to each directed bond $((x, n), (y, n + 1))$ an independent random variable taking the value 1 with probability $p$ and 0 with probability $1 - p$. We say a bond is occupied when the corresponding random variable is 1, and vacant when the random variable is 0. Given a configuration of occupied bonds, we say that $(x, n)$ is connected to $(y, m)$, and write $(x, n) \longrightarrow (y, m)$, if there is an oriented path from $(x, n)$ to $(y, m)$ consisting of occupied bonds, or if $(x, n) = (y, m)$. Let $C(x, n)$ denote the forward cluster of $(x, n)$, i.e., $C(x, n) = \{(y, m) : (x, n) \longrightarrow (y, m)\}$, and let $|C(x, n)|$ denote its cardinality.

The joint probability distribution of the bond variables will be denoted $\mathbb{P}$, with corresponding expectation denoted $\mathbb{E}$; these depend on $p$ and are defined on a probability space $(\Omega, \tilde{\mathcal{F}}, \mathbb{P})$. Let $\theta(p) = \mathbb{P}(|C(0, 0)| = \infty)$. For all dimensions $d \ge 1$ and for all $L \ge 1$, there is a critical value $p_c = p_c(d, L) \in (0, 1)$ such that $\theta(p) = 0$ for $p \le p_c$ and $\theta(p) > 0$ for $p > p_c$. In particular, there is no infinite cluster when $p = p_c$ [11,19]. For the remainder of this paper, we fix $p = p_c$, so that $\mathbb{P} = \mathbb{P}_{p_c}$. To define the IIC, some terminology is required. A cylinder event is an event that is determined by the occupation status of a finite set of bonds.
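The bond model just described is straightforward to sample. The sketch below is an illustration under stated assumptions, not part of the paper: it takes $d = 1$, uses a hypothetical function name, and grows the forward cluster $C(0, 0)$ level by level, sampling each directed bond out of an occupied site exactly once. It returns the number of cluster sites at each time level.

```python
import random


def forward_cluster_level_sizes(p, L, n_levels, seed=0):
    """Sizes of the time-slices of C(0, 0) for spread-out oriented percolation, d = 1."""
    rng = random.Random(seed)
    level = {0}          # sites x with (0, 0) --> (x, n), starting at n = 0
    sizes = [1]
    for _ in range(n_levels):
        nxt = set()
        for x in level:
            # bonds ((x, n), (y, n + 1)) with 0 <= |x - y| <= L
            for y in range(x - L, x + L + 1):
                if rng.random() < p:   # bond occupied with probability p
                    nxt.add(y)
        level = nxt
        sizes.append(len(level))
    return sizes


all_open = forward_cluster_level_sizes(p=1.0, L=2, n_levels=3)
all_closed = forward_cluster_level_sizes(p=0.0, L=2, n_levels=3)
```

With every bond occupied the slice at level $n$ is the full interval of $2nL + 1$ sites, and with every bond vacant the cluster is the single point $(0, 0)$; intermediate values of $p$ give random clusters whose survival is the subject of the text that follows.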
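We denote the algebra of cylinder events by $\mathcal{F}_0$, and define $\mathcal{F}$ to be the $\sigma$-algebra generated by $\mathcal{F}_0$. The most natural definition of the IIC is as follows. Let $\{(x, m) \longrightarrow n\}$ denote the event that there exists $(y, n)$ such that $(x, m) \longrightarrow (y, n)$. Let

\[ \mathbb{Q}_n(E) = \mathbb{P}(E \mid (0, 0) \longrightarrow n) \qquad (E \in \mathcal{F}_0), \tag{1.25} \]

and define the IIC by

\[ \mathbb{Q}_\infty(E) = \lim_{n \to \infty} \mathbb{Q}_n(E) \qquad (E \in \mathcal{F}_0), \tag{1.26} \]

assuming the limit exists. A possible alternate definition of the IIC is to define

\[ \mathbb{P}_n(E) = \frac{1}{\tau_n} \sum_{x \in \mathbb{Z}^d} \mathbb{P}(E \cap \{(0, 0) \longrightarrow (x, n)\}) \qquad (E \in \mathcal{F}_0) \tag{1.27} \]

with $\tau_n = \sum_{x \in \mathbb{Z}^d} \mathbb{P}((0, 0) \longrightarrow (x, n))$, and to let

\[ \mathbb{P}_\infty(E) = \lim_{n \to \infty} \mathbb{P}_n(E) \qquad (E \in \mathcal{F}_0), \tag{1.28} \]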

assuming the limit exists. Let d + 1 > 4 + 1 and p = pc . It was proved in [21] that there is an L 0 = L 0 (d) such that for L ≥ L 0 the limit (1.28) exists for every cylinder event E ∈ F0 . Moreover, P∞ extends to a probability measure on the σ -algebra F, and, writing C = C(0, 0), C is P∞ a.s. an infinite cluster. It was also proved in [21] that if the critical survival probability P((0, 0) −→ n) is asymptotic to a multiple of n −1 as n → ∞, then for L 0 = L 0 (d) the limit (1.26) exists and defines a probability measure on F, and moreover Q∞ = P∞ so both constructions yield the same measure. Subsequently, it was shown in [22,23] that


Fig. 1. Although the vertex (x, m) is not connected to (y, n), or vice versa, in the sense of oriented percolation (oriented upwards), it is nevertheless possible for a random walk to move from one of these vertices to the other.

the survival probability is indeed asymptotic to a multiple of $n^{-1}$ when $d + 1 > 4 + 1$ and $L \ge L_0(d)$. We will find both of the equivalent definitions (1.26) and (1.28) to be useful. We call $(C, \mathbb{Q}_\infty) = (C, \mathbb{P}_\infty)$ the IIC, and this provides the random environment for our random walk. We write $\mathbb{E}_\infty$ for expectation with respect to $\mathbb{Q}_\infty$. It will be convenient to remove a $\mathbb{Q}_\infty$-null set $N$ from the configuration space $\Omega$, so that for all $\omega \in \Omega_0 = \Omega - N$ the cluster $C(\omega)$ is infinite (and connected). The IIC $C(\omega)$, $\omega \in \Omega$, under the law $\mathbb{Q}_\infty$ gives a family of random graphs, with marked vertex $0 = (0, 0)$, so as in Sect. 1.2 we can define a random walk $X = (X_j, j \in \mathbb{Z}_+, P_\omega^{(x,n)}, (x, n) \in C(\omega))$. Note that although the orientation is used to construct the cluster $C$, once $C$ has been determined the random walk on $C$ can move in any direction; see Fig. 1.

Theorem 1.7. For $d > 6$, there is an $L_1 = L_1(d) \ge L_0(d)$ such that for all $L \ge L_1$, Assumption 1.2(1)–(3) hold with $q_0 = 1$ and constants $c_1, c_2, c_3$ independent of $d$ and $L$. Consequently, the conclusions of Theorems 1.3, 1.4, 1.5 and 1.6 all hold for the random walk on the IIC. In particular, the Alexander–Orbach conjecture holds in the form of (1.24).

As we will see later, the restriction to $d > 6$ is required only for our estimate of the effective resistance.

Remark. Since the constants in Assumption 1.2 are independent of $d$, $L$ for the IIC (provided $d > 6$ and $L \ge L_1(d)$), the constants $\alpha_1, \dots, \alpha_4$ in Theorem 1.5 are also independent of $d$ and $L$ when applied to the IIC.

The proofs of our main results are performed in two principal steps, corresponding to the results in Sect. 1.2 and Theorem 1.7 respectively. The results in Sect. 1.2 are proved in Sect. 2. The first step is to obtain estimates for a fixed (non-random) graph $\Gamma$. In Sect. 2.1, using arguments based on those in [6] and [7], we show that volume and resistance bounds on $\Gamma$ lead to bounds on transition probabilities and hitting times. Then, in Sect. 2.2 we translate these results into the random graph context, and prove Theorems 1.3–1.6. The second step is the proof of Theorem 1.7. Section 3 states three properties of the IIC for critical spread-out oriented percolation in dimensions $d > 6$, and shows that these imply Theorem 1.7. These properties are proved in Sects. 4–5, using an extension of results of [21,22,26] that were obtained using the lace expansion.


1.4. Further examples. We have some other examples of random graphs which satisfy Assumption 1.2. Example 1.8. (i) Assumption 1.2 holds for random walk on the IIC for the binomial tree; see [7, Corollary 2.12]. Therefore the conclusions of Theorems 1.3–1.6 hold for a random walk on this IIC. The results of [7] go beyond Theorem 1.5(a) and (b) in this context, but Theorem 1.5(c) and Theorem 1.6 here are new. (ii) It is shown in [4] that the invasion percolation cluster on a regular tree is stochastically dominated by the IIC for the binomial tree. Consequently, upper bounds on the volume and lower bounds on the effective resistance of the invasion percolation cluster follow from the corresponding bounds for the IIC (using Lemma 2.2(e) in Sect. 2.1). Assumption 1.2(1,2) for the invasion percolation cluster therefore follows from its counterpart for the IIC for the binomial tree. In addition, the lower bound on the volume in Assumption 1.2(3) is proved for the invasion percolation cluster in [4]. Therefore Assumption 1.2 holds for the invasion percolation cluster on a regular tree, and hence simple random walk on the invasion percolation cluster also obeys the conclusions of Theorems 1.3–1.6. See [4] for further details about this example. (iii) Consider the incipient infinite branching random walk (IIBRW), obtained as the limit as n → ∞ of critical branching random walk (say with binomial offspring distribution) conditioned to survive to at least n generations [20, Sect. 2]. We interpret the IIBRW as a random infinite subgraph of Zd × Z+ . There is the option of considering either one edge per particle jump, leading to the occurrence of multiple edges between vertices, or identifying any such multiple edges as a single edge; we believe both options will behave similarly in dimensions d > 4. Consider simple random walk on the IIBRW. Our volume estimates for the IIC for oriented percolation for d > 4 will adapt to give similar estimates for the IIBRW for d > 4. 
The effective resistance $R_{\rm eff}(0, B(R)^c)$ for the IIBRW is lower than it is for the IIC on a tree, due to cycles in the IIBRW. It is an interesting open problem to obtain a lower bound on $R_{\rm eff}(0, B(R)^c)$ for the IIBRW, to establish Assumption 1.2 and hence its consequences Theorems 1.3–1.6 for random walk on the IIBRW. Our main interest is the question: Does random walk on the IIBRW have the same behaviour in all dimensions $d > 4$, or is there different behaviour for $4 < d \le 6$ and $d > 6$? An answer would shed light on the question raised at the end of Sect. 1.1. It would also be of interest to consider this question in the continuum limit: Brownian motion on the canonical measure of super-Brownian motion conditioned to survive for all time (see [20]).

(iv) A non-random graph satisfies Assumption 1.2 if and only if there exists $\lambda$ such that $J(\lambda) = [1, \infty)$. If $\Gamma_i$, $1 \le i \le n$, are graphs satisfying Assumption 1.2 then the graph obtained by joining the $\Gamma_i$ at their marked vertices also satisfies Assumption 1.2.

(v) Consider the non-random graph consisting of $\mathbb{Z}_+$ with, for each $n$, a finite subgraph $G_n$ connected by one point in $G_n$ to the vertex $n$. If $\mu(G_n) \asymp n$ and the diameter of $G_n$ is $o(n)$ then Assumption 1.2 holds. In particular, if we take $G_n$ to be the complete graph with $r_n = n^{1/2}$ vertices, then while $V(R) \asymp R^2$, we have $|B(R)| \asymp R^{3/2}$. In this case (1.23) holds, whereas

\[ \lim_{n \to \infty} \frac{\log |W_n|}{\log n} = \frac{1}{2}, \qquad P_\omega^x\text{-a.s.} \tag{1.29} \]

The rough idea behind (1.29) is as follows. By (1.20), the distance travelled up to time $n$ is approximately $n^{1/3}$. The proof of Theorem 1.6 shows that the random walk will visit a positive fraction of the vertices within this distance, and there are of order $(n^{1/3})^{3/2} = n^{1/2}$ such vertices, leading to (1.29). This shows that some bound on vertex degree is necessary before one can pass from (1.23) to (1.24).


Throughout the paper, we use $c$, $c'$ to denote strictly positive finite constants whose values are not significant and may change from line to line. We write $c_i$ for positive constants whose values are fixed within theorems and lemmas.

2. Random Walk on a Random Graph

In this section we prove Theorems 1.3–1.6. First, in Sect. 2.1, we study the random walk on a fixed graph; then, in Sect. 2.2 we apply these results to a family of random graphs satisfying Assumption 1.2.

2.1. Random walk on a fixed graph. In this section, we fix an infinite locally-finite connected graph $\Gamma = (G, E)$, and will show that bounds on the quantities $V(R)$ and $R_{\rm eff}(0, B(R)^c)$ lead to control of $E^0 \tau_R$, $p_n(0, 0)$ and $E^0 d(0, X_n)$. The results in [6] (see [6, Theorem 1.3, Lemma 2.2]) cover the case where, for all $x \in G$ and $R \ge 1$,

\[ c_1 R^2 \le V(x, R) \le c_2 R^2, \qquad c_3 R \le R_{\rm eff}(x, B(x, R)^c) \le c_4 R. \tag{2.1} \]

Here, we treat the case where we only have information available on the volume and effective resistance from one fixed point $0$ in the graph, and only for certain values of $R$. Our methods are very close to those of [6], but the need to keep track of the values of $R$ for which we make use of the bounds makes the details of the proofs more complicated. The following proposition gives the majority of the bounds on $\tau_R$, $p_n(0, 0)$ and $d(0, X_n)$ that will be used in Sect. 2.2. Recall the definition of $J(\lambda)$ from Definition 1.1. In the following proposition, we will take $\lambda \ge 1$ and assume that $R$, and certain multiples of $R$, are in $J(\lambda)$. We then obtain (for example) bounds on $E^0 \tau_R$; these bounds will involve constants depending on $\lambda$. For the limit Theorems 1.5 and 1.6 we need to know that the dependence of these constants on $\lambda$ is polynomial in $\lambda$. To indicate this, we write $C_i(\lambda)$ to denote positive constants of the form $C_i(\lambda) = C_i \lambda^{\pm q_i}$, which will be fixed throughout this section. The sign accompanying $q_i > 0$ is such that statements become weaker as $\lambda$ increases.

Proposition 2.1. Let $\lambda \ge 1$. There exist $C_1(\lambda), \dots, C_9(\lambda)$ such that the following hold:

(a) Suppose that $R \in J(\lambda)$. Then

\[ E^x \tau_R \le 2\lambda R^3 \quad \text{for } x \in B(R). \tag{2.2} \]

Suppose that $R, R/(4\lambda) \in J(\lambda)$. Then

\[ E^x \tau_R \ge C_1(\lambda) R^3 \quad \text{for } x \in B(0, R/(4\lambda)). \tag{2.3} \]

Let $\varepsilon < 1/(4\lambda)$ and $R, \varepsilon R, \varepsilon R/(4\lambda) \in J(\lambda)$. Then

\[ P^y\bigl(\tau_R \le C_2(\lambda)(\varepsilon R)^3\bigr) \le C_3(\lambda)\varepsilon \quad \text{for } y \in B(\varepsilon R). \tag{2.4} \]

(b) Suppose that $R \in J(\lambda)$. Then

\[ p_n(0, y) + p_{n+1}(0, y) \le C_4(\lambda) n^{-2/3} \quad \text{for } y \in B(R) \text{ if } n = 2\lfloor R \rfloor^3. \tag{2.5} \]

Suppose that $R, R/(4\lambda) \in J(\lambda)$. Then

\[ p_{2n}(x, x) \ge C_5(\lambda) n^{-2/3} \quad \text{for } \tfrac{1}{4} C_1(\lambda) R^3 \le n \le \tfrac{1}{2} C_1(\lambda) R^3, \ x \in B(0, R/(4\lambda)). \tag{2.6} \]


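(c) Let $n \ge 1$, $M \ge 1$, and set $R = M n^{1/3}$. If $R, C_6(\lambda)R/M, C_6(\lambda)R/(4\lambda M) \in J(\lambda)$, then

\[ P^0\bigl(n^{-1/3} d(0, X_n) > M\bigr) \le \frac{C_7(\lambda)}{M}. \tag{2.7} \]

We have $C_7(\lambda) \le c\lambda^{22/3}$.

(d) Let $R = (n/2)^{1/3}$ and $M \ge 1$. If $R, R/M \in J(\lambda)$ then

\[ P^0(d(0, X_n) < R/M) \le \frac{\lambda C_4(\lambda)}{M^2}. \tag{2.8} \]

Also, if $R, C_8(\lambda)R \in J(\lambda)$ then

\[ E^0 d(0, X_n) \ge C_9(\lambda) n^{1/3}. \tag{2.9} \]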

The overall strategy for the proof of these various inequalities is as follows. We begin with obtaining bounds on the mean exit time $E^0 \tau_R$. Using the Green function (see (2.17) below for the definition) we can write

\[ E^z \tau_B = \sum_{y \in B} g_B(z, y) \mu_y. \tag{2.10} \]

Since $g_B(x, x) = R_{\rm eff}(x, B^c)$ (see (2.20)), this leads to the upper and lower bounds on $E^x \tau_R$ for $x$ sufficiently close to $0$ given in (2.2) and (2.3). The final inequality concerning $\tau_R$ is (2.4), which bounds from above the lower tail of $\tau_R$. (This is equivalent to bounding from above the speed at which $X$ can move from its starting point $0$.) The proof for this takes the bounds in (2.2) and (2.3) as its starting point, but also uses a simple inequality relating effective resistance and hitting probabilities; see Lemma 2.3 below. The next set of inequalities we prove are those for the heat kernel $p_n(x, y)$. In the continuous time setting these are proved using differential inequalities which relate the derivative of the heat kernel to its energy. Unfortunately in discrete time the differential inequalities are replaced by rather less intuitive difference equations, which in addition take a slightly more complicated form. The estimate (2.5) is proved from an inequality which bounds the heat kernel just in terms of the volume of balls; see (2.31). Adding information on $\tau_R$ then enables one to obtain the lower bound (2.6). The final bounds on $d(0, X_n)$ then follow easily from the bounds on $\tau_R$ and $p_n(0, x)$.

2.1.1. Bounds on $\tau_R$. We begin by giving a precise definition of effective resistance. Let $\mathcal{E}$ be the quadratic form given by

\[ \mathcal{E}(f, g) = \tfrac{1}{2} \sum_{\substack{x, y \in G \\ x \sim y}} (f(x) - f(y))(g(x) - g(y)), \tag{2.11} \]

where $x \sim y$ means $\{x, y\} \in E$. If we regard $\Gamma$ as an electrical network with a unit resistor on each edge in $E$, then $\mathcal{E}(f, f)$ is the energy dissipation when the vertices of $G$ are at a potential $f$. Set $H^2 = \{f \in \mathbb{R}^G : \mathcal{E}(f, f) < \infty\}$. Let $A$, $B$ be disjoint subsets of $G$. The effective resistance between $A$ and $B$ is defined by:

\[ R_{\rm eff}(A, B)^{-1} = \inf\{\mathcal{E}(f, f) : f \in H^2, \ f|_A = 1, \ f|_B = 0\}. \tag{2.12} \]

Let $R_{\rm eff}(x, y) = R_{\rm eff}(\{x\}, \{y\})$, and $R_{\rm eff}(x, x) = 0$. For general facts on effective resistance and its connection with random walks see [2,15,33]. We recall some basic properties of $R_{\rm eff}(\cdot, \cdot)$.


Lemma 2.2. Let $\Gamma = (G, E)$ be an infinite connected graph.

(a) $R_{\rm eff}$ is a metric on $G$.
(b) If $A' \subset A$, $B' \subset B$, then $R_{\rm eff}(A', B') \ge R_{\rm eff}(A, B)$.
(c) $R_{\rm eff}(x, y) \le d(x, y)$.
(d) If $x, y \in G \setminus A$, then $R_{\rm eff}(x, A) \le R_{\rm eff}(x, y) + R_{\rm eff}(y, A)$.
(e) If $\Gamma' = (G', E')$ is a subgraph of $\Gamma$, with effective resistance $R'_{\rm eff}$, and if $A' = A \cap G'$ and $B' = B \cap G'$, then $R'_{\rm eff}(A', B') \ge R_{\rm eff}(A, B)$.
(f) For all $f \in \mathbb{R}^G$ and $x, y \in G$,

\[ |f(x) - f(y)|^2 \le R_{\rm eff}(x, y)\, \mathcal{E}(f, f). \tag{2.13} \]

Proof. For (a) see [31, Sect. 2.3]. The monotonicity in (b) and (e) is immediate from the variational definition of Reff . (c) is easy, and there is a proof in [6, Lemma 2.1]. (d) follows from (a) by considering the graph in which all vertices in A are connected by short circuits, which reduces A to a single vertex a. (f) If f (x) = f (y) then (2.13) is immediate. If not, then set u(z) = ( f (z) − f (y))/ ( f (x) − f (y)), so that u(x) = 1 and u(y) = 0. Then by (2.12), Reff (x, y)−1 ≤ E(u, u) = E( f, f )| f (x) − f (y)|−2 , which gives (2.13).

The inequality (2.13) will play an important role in obtaining pointwise information on functions from resistance or energy estimates. Recall that $T_A$ was defined in (1.5) to be the hitting time of $A \subset G$. If $A$ and $B$ are disjoint subsets of $G$ and $x \notin A \cup B$, then (see [10, Fact 2, p. 226])

\[ P^x(T_A < T_B) \le \frac{R_{\rm eff}(x, B)}{R_{\rm eff}(x, A)}. \tag{2.14} \]

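Lemma 2.3. Let $\lambda \ge 1$ and suppose $R \in J(\lambda)$. Let $0 < \varepsilon \le 1/(2\lambda)$, and $y \in B(\varepsilon R)$. Then

\[ P^y(T_0 < \tau_R) \ge 1 - \frac{\lambda\varepsilon}{1 - \varepsilon\lambda} \ge 1 - 2\varepsilon\lambda, \tag{2.15} \]
\[ P^0(T_y < \tau_R) \ge 1 - \varepsilon\lambda. \tag{2.16} \]

Proof. By Lemma 2.2(c) $R_{\rm eff}(y, 0) \le d(y, 0)$, while by Lemma 2.2(d) and the definition of $J(\lambda)$,

\[ R_{\rm eff}(y, B(R)^c) \ge R_{\rm eff}(0, B(R)^c) - R_{\rm eff}(0, y) \ge \frac{R}{\lambda} - \varepsilon R. \]

So by (2.14),

\[ P^y(\tau_R < T_0) \le \frac{R_{\rm eff}(y, 0)}{R_{\rm eff}(y, B(R)^c)} \le \frac{\varepsilon\lambda}{1 - \varepsilon\lambda}. \]

Similarly, $P^0(\tau_R < T_y) \le R_{\rm eff}(0, y)/R_{\rm eff}(0, B(R)^c) \le \varepsilon\lambda$.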


The initial steps in bounding $\tau_R$ use the Green kernel for the random walk $X$, so we now recall its definition. (These facts about Green functions will only be used in this subsubsection.) Let $B \subset G$,

\[ L(y, n) = \sum_{k=0}^{n-1} 1_{(X_k = y)}, \]

and set

\[ g_B(x, y) = \mu_y^{-1} E^x L(y, \tau_B) = \mu_y^{-1} \sum_{k=0}^{\infty} P^x(X_k = y, \ k < \tau_B). \tag{2.17} \]

Then $g_B(x, y) = g_B(y, x)$ and $g_B(x, \cdot)$ is harmonic on $B \setminus \{x\}$, and zero outside $B$. Using the Markov property at $T_y$ gives

\[ g_B(x, y) = P^x(T_y < \tau_B)\, g_B(y, y). \tag{2.18} \]

Summing (2.17) over $y \in B$ gives

\[ E^z \tau_B = \sum_{y \in B} g_B(z, y) \mu_y. \tag{2.19} \]

The final property of $g_B(\cdot, \cdot)$ we will need is that

\[ R_{\rm eff}(x, B^c) = g_B(x, x). \tag{2.20} \]

One way to see this is to note that $g_B(x, \cdot)$ is the potential due to a unit current flow from $x$ to $B^c$, so that $g_B(x, x)$ is the effective resistance from $x$ to $B^c$. Alternatively, writing $p_B^x(y) = g_B(x, y)/g_B(x, x)$, one can verify that $p_B^x$ attains the minimum in (2.12), and that $\mathcal{E}(p_B^x, p_B^x) = g_B(x, x)^{-1}$.

Proof of Proposition 2.1(a), (2.2). It is easy to use (2.19) to obtain an upper bound for the exit time from a ball. By Lemma 2.2(d) we have $R_{\rm eff}(z, B^c) \le 2R$ for any $z \in B = B(R)$. So,

\[ E^z \tau_B = \sum_{y \in B} g_B(z, y) \mu_y \le \sum_{y \in B} g_B(z, z) \mu_y = R_{\rm eff}(z, B^c) V(R) \le 2\lambda R^3, \tag{2.21} \]

which gives (2.2).
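The identities (2.17), (2.19) and (2.20) can be checked numerically in the simplest case. The sketch below is an illustration only: it uses the line $\mathbb{Z}$ (which does not satisfy the volume growth assumed in this paper) with $B = B(0, R) = \{|y| < R\}$, where the classical answers are $E^0 \tau_B = R^2$ and $R_{\rm eff}(0, B^c) = R/2$ (two chains of $R$ unit resistors in parallel). The function name and the truncation tolerance are assumptions of the sketch.

```python
def green_function_interval(R, tol=1e-12):
    """g_B(0, y) for simple random walk on Z killed on leaving B = {|y| < R}."""
    sites = range(-(R - 1), R)
    mu = 2  # every vertex of Z has degree 2
    cur = {y: 0.0 for y in sites}   # cur[y] = P^0(X_k = y, k < tau_B)
    cur[0] = 1.0
    occ = dict(cur)                 # running sum over k, as in (2.17)
    while sum(cur.values()) > tol:
        nxt = {y: 0.0 for y in sites}
        for y, mass in cur.items():
            for z in (y - 1, y + 1):
                if z in nxt:        # mass that steps outside B is killed
                    nxt[z] += mass / 2
        cur = nxt
        for y in sites:
            occ[y] += cur[y]
    return {y: occ[y] / mu for y in sites}


R = 6
g = green_function_interval(R)
mean_exit_time = sum(mu_y * g_y for mu_y, g_y in ((2, v) for v in g.values()))  # (2.19)
resistance = g[0]                                                               # (2.20)
```

For $R = 6$ the sum in (2.19) returns a value close to $36 = R^2$ and $g_B(0, 0)$ is close to $3 = R/2$, in agreement with the two classical formulas.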

Proof of Proposition 2.1(a), (2.3). Write $B = B(R)$. To obtain a lower bound for $E^0 \tau_B$ we restrict the sum in (2.19) to a smaller ball $B' = B(R/(4\lambda))$, and use Lemma 2.3 to bound $g_B(0, y)$ from below on $B'$. If $y \in B'$ then Lemma 2.3 gives $P^y(T_0 < \tau_B) \ge \frac{1}{2}$, so by (2.18) and (2.20),

\[ g_B(0, y) = g_B(0, 0) P^y(T_0 < \tau_B) \ge \tfrac{1}{2} g_B(0, 0) = \tfrac{1}{2} R_{\rm eff}(0, B^c) \ge \tfrac{1}{2} R/\lambda. \]

As $R/(4\lambda) \in J(\lambda)$ we have $\mu(B') \ge \lambda^{-1}(R/(4\lambda))^2$, and therefore we obtain

\[ E^0 \tau_B \ge \sum_{y \in B'} g_B(0, y) \mu_y \ge \tfrac{1}{2} g_B(0, 0) \mu(B') \ge c\lambda^{-4} R^3. \tag{2.22} \]

Then for $x \in B'$ we have $E^x \tau_B \ge P^x(T_0 < \tau_B) E^0 \tau_B$, which gives (2.3).


The upper and lower bounds on $E^x \tau_R$ lead to a preliminary inequality on the distribution of $\tau_R$.

Lemma 2.4. Suppose that $R, R/(4\lambda) \in J(\lambda)$. Let $x \in B(0, R/4\lambda)$ and $n \ge 1$. Then

\[ P^x(\tau_R > n) \ge \frac{C_1(\lambda) R^3 - n}{2\lambda R^3} \quad \text{for } n \ge 0. \tag{2.23} \]

Proof. By the Markov property, (2.2) and (2.3),

\[ C_1(\lambda) R^3 \le E^x \tau_R \le n + E^x[1_{\{\tau_R > n\}} E^{X_n}(\tau_R)] \le n + 2\lambda R^3 P^x(\tau_R > n). \]

Rearranging this gives (2.23).

Setting $n = \delta R^3$ in (2.23) gives

\[ P^x(\tau_R \le \delta R^3) \le 1 - \frac{C_1(\lambda) - \delta}{2\lambda}. \tag{2.24} \]

This inequality has the defect that the right-hand side of (2.24) does not converge to 0 as $\delta \to 0$. We will need a better bound in order to control $d(0, X_n)$, and this is given in (2.4).

Proof of Proposition 2.1(a), (2.4). This proof takes a little more work; we obtain it by a kind of bootstrap from (2.23) and Lemma 2.3. The basic point is that, starting at $y \in B(\varepsilon R)$, $X$ is very likely to visit $0$ before escaping from $B(R)$. So $X$ will with high probability have made many excursions from $0$ to $\partial B(\varepsilon R)$ before time $\tau_B$. Thus $\tau_B$ is stochastically larger than a sum of independent random variables, each of which, by (2.23), has a probability at least $p > 0$ of being greater than $c R^3$. Rather than following this intuition directly and using stochastic inequalities, it is simpler to obtain a pair of inequalities (2.25) and (2.26) which contain the same information. Let $t_0 > 0$, and set

\[ q(y) = P^y(\tau_R \le T_0), \qquad a(y) = P^y(\tau_R \le t_0). \]

Then

\[ \begin{aligned} a(y) = P^y(\tau_R \le t_0) &= P^y(\tau_R \le t_0, \ \tau_R \le T_0) + P^y(\tau_R \le t_0, \ \tau_R > T_0) \\ &\le P^y(\tau_R \le T_0) + P^y(T_0 < \tau_R, \ \tau_R - T_0 \le t_0) \\ &\le q(y) + (1 - q(y)) a(0) \le q(y) + a(0), \end{aligned} \tag{2.25} \]

using the strong Markov property for the second inequality. Starting $X$ at $0$ we have

\[ a(0) = P^0(\tau_R \le t_0) \le E^0\bigl[1_{\{\tau_{\varepsilon R} \le t_0\}} P^{X_{\tau_{\varepsilon R}}}(\tau_R \le t_0)\bigr] \le P^0(\tau_{\varepsilon R} \le t_0) \max_{y \in \partial B(\varepsilon R)} a(y). \tag{2.26} \]

Combining (2.25) and (2.26) gives

\[ a(0) \le \frac{\max_{y \in \partial B(\varepsilon R)} q(y)}{P^0(\tau_{\varepsilon R} > t_0)}. \tag{2.27} \]


Note that as J (λ) is defined to be a subset of [1, ∞), the condition that ε R/(4λ) ∈ J (λ) implies that R ≥ 4λ/ε. Since ε < 1/(4λ), ε R+1 ≤ 2ε R t0 ) ≥

C1 (λ) ; 4λ

combining this with (2.28), (2.27) and (2.25) completes the proof of (2.4).

2.1.2. Heat kernel bounds. We now turn to the heat kernel bounds in Proposition 2.1(b). Our first result Proposition 2.5 follows from [6, Lemmas 1.1, 1.2 and 3.10], but as the proof is short we give it here. To deal with issues related to the possible bipartite structure of the graph it proves helpful to consider $p_n(x, y) + p_{n+1}(x, y)$. The main result of the proposition below is the inequality (2.31), which gives an upper bound for $p_n(x, x)$ just in terms of the volume. The proof of the analogous inequality in continuous time is a bit easier; see [7, Theorem 4.1].

Proposition 2.5. Let $x_0 \in G$ and $f_n(y) = p_n(x_0, y) + p_{n+1}(x_0, y)$.

(a) We have

\[ \mathcal{E}(f_n, f_n) \le \frac{2}{n} f_{2\lfloor n/2 \rfloor}(x_0). \tag{2.29} \]

(b) We have

\[ |f_n(y) - f_n(x_0)|^2 \le \frac{2}{n} d(x_0, y) f_{2\lfloor n/2 \rfloor}(x_0). \tag{2.30} \]

(c) Let $r \in [1, \infty)$ and $n = 2\lfloor r \rfloor^3$. Then

\[ f_n(x_0) \le c_1 n^{-2/3} \bigl(1 \vee (r^2 / V(x_0, r))\bigr). \tag{2.31} \]

Proof. (a) It is easy to check that $\mathcal{E}(f_n, f_n) = f_{2n}(x_0) - f_{2n+2}(x_0)$. The spectral decomposition (see for example, Chapter 3 (32) of [2]) gives that $k \mapsto f_{2k}(x_0) - f_{2k+2}(x_0)$ is non-increasing. Thus

\[ n\bigl(f_{2n}(x_0) - f_{2n+2}(x_0)\bigr) \le (2\lfloor n/2 \rfloor + 1)\bigl(f_{4\lfloor n/2 \rfloor}(x_0) - f_{4\lfloor n/2 \rfloor + 2}(x_0)\bigr) \le 2 \sum_{i = \lfloor n/2 \rfloor}^{2\lfloor n/2 \rfloor} \bigl(f_{2i}(x_0) - f_{2i+2}(x_0)\bigr) \le 2 f_{2\lfloor n/2 \rfloor}(x_0), \]

and (2.29) is obtained.


(b) Using Lemma 2.2(c),(f),

\[ |f_n(y) - f_n(x)|^2 \le R_{\rm eff}(x, y)\, \mathcal{E}(f_n, f_n) \le d(x, y)\, \mathcal{E}(f_n, f_n). \]

We then use (2.29) to bound $\mathcal{E}(f_n, f_n)$.

(c) Choose $x_* \in B(x_0, r)$ such that $f_n(x_*) = \min_{x \in B(x_0, r)} f_n(x)$. Then

\[ f_n(x_*) V(x_0, r) \le \sum_{x \in B(x_0, r)} f_n(x) \mu_x \le \sum_{x \in G} p_n(x_0, x) \mu_x + \sum_{x \in G} p_{n+1}(x_0, x) \mu_x \le 2, \]

so that $f_n(x_*) \le 2/V(x_0, r)$. Since $n$ is even, by (2.30) we have

\[ f_n(x_0)^2 \le 2 f_n(x_*)^2 + 2|f_n(x_0) - f_n(x_*)|^2 \le \frac{8}{V(x_0, r)^2} + \frac{c r f_n(x_0)}{n}. \]

Using $a + b \le 2(a \vee b)$, we see that $f_n(x_0) \le (c'/V(x_0, r)) \vee (c r/n)$.
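The identity $\mathcal{E}(f_n, f_n) = f_{2n}(x_0) - f_{2n+2}(x_0)$ used at the start of the proof of part (a) depends only on the reversibility of the walk, so it can be tested exactly on a small finite graph. The sketch below is illustrative: the four-vertex graph and the helper names are assumptions, not objects from the paper, and the infinite graph of the proposition is replaced by a finite one.

```python
from fractions import Fraction


def transition_rows(adj, x0, n_max):
    """Return [P^k(x0, .) for k = 0..n_max] as dicts of exact fractions."""
    row = {y: Fraction(int(y == x0)) for y in adj}
    rows = [row]
    for _ in range(n_max):
        nxt = {y: Fraction(0) for y in adj}
        for z, prob in row.items():
            for y in adj[z]:
                nxt[y] += prob / len(adj[z])
        row = nxt
        rows.append(row)
    return rows


def energy_identity_sides(adj, x0, n):
    """Return (E(f_n, f_n), f_2n(x0) - f_{2n+2}(x0)) for f_n = p_n + p_{n+1}."""
    rows = transition_rows(adj, x0, 2 * n + 3)

    def f(k, y):
        # heat kernel p_k(x0, y) = P^k(x0, y) / mu_y, with mu_y = degree of y
        return (rows[k][y] + rows[k + 1][y]) / len(adj[y])

    # The form (2.11): each undirected edge is counted once.
    energy = sum((f(n, x) - f(n, y)) ** 2 for x in adj for y in adj[x] if x < y)
    return energy, f(2 * n, x0) - f(2 * n + 2, x0)


adj = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2]}
lhs, rhs = energy_identity_sides(adj, x0=0, n=3)
```

The two returned values agree exactly, which is the content of the first line of the proof; the monotonicity of $k \mapsto f_{2k}(x_0) - f_{2k+2}(x_0)$ can be inspected the same way by varying `n`.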

Remark. In fact, (2.29) can be sharpened to give $\mathcal{E}(f_n, f_n) \le c_1 n^{-1} p_{2\lfloor n/2 \rfloor}(x_0, x_0)$ (see [6, Lemma 3.10]), but we do not need this.

Proof of Proposition 2.1(b). Let $f_n(y) = p_n(0, y) + p_{n+1}(0, y)$. As $R \in J(\lambda)$, $R^2/V(R) \le \lambda$, so by Proposition 2.5(c),

\[ f_n(0) \le c_1 \lambda n^{-2/3}. \tag{2.32} \]

By Proposition 2.5(b), if $n$ is even

\[ f_n(y) \le f_n(0) + |f_n(y) - f_n(0)| \le f_n(0) + \bigl(2 d(0, y) n^{-1} f_n(0)\bigr)^{1/2} \le c\lambda n^{-2/3}, \tag{2.33} \]

which proves (2.5). To prove the lower bound (2.6) we use Lemma 2.4. For sufficiently small $n$ this bounds from above the probability that $X$ has left $B$ by time $n$, and so bounds from below $P^0(X_n \in B)$. This leads easily to a lower bound on $p_{2n}(x, x)$. Here are the details. Let $n \le \frac{1}{2} C_1(\lambda) R^3$. Then using (2.23),

\[ P^x(X_n \in B) \ge P^x(\tau_B > n) \ge \tfrac{1}{4} \lambda^{-1} C_1(\lambda). \tag{2.34} \]

By Chapman–Kolmogorov and Cauchy–Schwarz

\[ P^x(X_n \in B)^2 = \Bigl(\sum_{y \in B} p_n(x, y) \mu_y\Bigr)^2 \le \mu(B) \sum_{y \in B} p_n(x, y)^2 \mu_y \le p_{2n}(x, x) \lambda R^2, \]

and using (2.34) gives (2.6).


2.1.3. Bounds on $d(0, X_n)$. The main work for these bounds has already been done in the proofs of Proposition 2.1(a) and (b), and in particular the proof of (2.4).

Proof of Proposition 2.1. (c) The proof of (2.7) follows from (2.4) after suitable checking, since

\[ P^0(d(0, X_n) n^{-1/3} > M) = P^0(d(0, X_n) > R) \le P^0(\tau_R \le n). \tag{2.35} \]

We now fill in the details. Define $\varepsilon$ by the relation $n = C_2(\lambda)(\varepsilon R)^3$; so that $\varepsilon = C_6(\lambda)/M$. Let $C_7(\lambda) = C_3(\lambda) C_6(\lambda)$. The desired inequality is trivial when $C_7(\lambda)/M \ge 1$, so assume that $C_7(\lambda)/M < 1$. This means $\varepsilon = C_6(\lambda)/M < C_3(\lambda)^{-1}$. Since we may take $C_3(\lambda) > 4\lambda$, we obtain $\varepsilon < (4\lambda)^{-1}$, so we can apply (2.4). Using (2.35) and (2.4),

\[ P^0(d(0, X_n) n^{-1/3} > M) \le P^0\bigl(\tau_R \le C_2(\lambda)(\varepsilon R)^3\bigr) \le C_3(\lambda)\varepsilon = \frac{C_7(\lambda)}{M}, \tag{2.36} \]

which proves (2.7). Tracking the powers of $\lambda$ gives that $C_7(\lambda) \le c\lambda^{22/3}$.

(d) We can bound the probability that $X$ is in a ball $B'$ by the volume of the ball and the maximum of the heat kernel on the ball. By (2.5), writing $B' = B(0, R/M) \subset B(0, R)$ and $f_n(0, y) = p_n(0, y) + p_{n+1}(0, y)$,

\[ P^0(X_n \in B') = \sum_{y \in B'} p_n(0, y) \mu_y \le \sum_{y \in B'} f_n(0, y) \mu_y \le V(R/M) C_4(\lambda) R^{-2} \le \lambda C_4(\lambda)/M^2, \tag{2.37} \]

proving (2.8). The final inequality in (d) now follows easily, since all we need is that $d(0, X_n)$ is greater than $c n^{1/3}$ with positive probability. Let $M = C_8(\lambda)$ satisfy $M^2 = 2\lambda C_4(\lambda)$. Then using (2.8), $P^0(d(0, X_n) < R/M) \le \frac{1}{2}$, so $E^0 d(0, X_n) \ge \frac{1}{2} R/M$.

We do not have an upper bound on $E^0 d(0, X_n)$ to complement the lower bound of Proposition 2.1(d), which uses volume and resistance bounds from a single base point, i.e., bounds on $V(0, R)$ and $R_{\rm eff}(0, B(R)^c)$. Suppose that $J(\lambda) = [1, \infty)$ for some $\lambda \ge 1$, and let $Z_n = n^{-1/3} d(0, X_n)$. Then we are able to bound $E^0 Z_n^p$ for $p < 1$, since (2.7) gives

\[ \begin{aligned} E^0[Z_n^p] &\le \sum_{m=1}^{\infty} (2^{m+1})^p P^0\bigl(2^m \le n^{-1/3} d(0, X_n) < 2^{m+1}\bigr) \\ &\le \sum_{m=1}^{\infty} (2^{m+1})^p P^0\bigl(n^{-1/3} d(0, X_n) \ge 2^m\bigr) \le c_1 \sum_{m=1}^{\infty} 2^{m(p-1)} = c_2 < \infty. \end{aligned} \]

On the other hand the following example indicates that, under our hypotheses, we cannot expect to have a uniform bound on $E^0(Z_n^p)$ when $p > 1$. We sketch this argument below.

Example 2.6. Let $\Gamma$ be the subgraph of $\mathbb{Z}^2$ with vertex set $G = G_0 \cup G_1$, where $G_0 = \{(n, 0), n \in \mathbb{Z}\}$, and $G_1 = \{(n, m) : 0 \le m \le n\}$. Let the edges be $\{(n, 0), (n + 1, 0)\}$, for $n \in \mathbb{Z}$, and $\{(n, m), (n, m + 1)\}$ if $n \ge 1$ and $0 \le m \le n - 1$. Thus $\Gamma$ consists of $\mathbb{Z}_-$ and a comb-type graph of vertical branches with base $\mathbb{Z}_+$. Write $0$ for $(0, 0)$. It is easily checked that $V(0, R) \asymp R^2$, and $R_{\mathrm{eff}}(0, B(0, R)^c) \ge R/4$. Thus there exists $\lambda_0 < \infty$ such that $J(\lambda_0) = [1, \infty)$. Let $H(a, b) = \{(n, m) \in G : a \le n \le b\}$.
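The quadratic volume growth claimed in Example 2.6 can be checked numerically on a truncated copy of the graph. The sketch below is an illustration under stated assumptions: the ball is taken to be $\{y : d(0, y) \le R\}$, the measure $\mu_x$ is taken to be the degree of $x$, and the helper names `comb_graph` and `ball_volume` are hypothetical.

```python
from collections import deque


def comb_graph(N):
    """Adjacency sets of the graph of Example 2.6, truncated to |n| <= N."""
    adj = {}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Horizontal edges {(n, 0), (n + 1, 0)} along the base line.
    for n in range(-N, N):
        add_edge((n, 0), (n + 1, 0))
    # Vertical tooth of height n above (n, 0), for n >= 1.
    for n in range(1, N + 1):
        for m in range(n):
            add_edge((n, m), (n, m + 1))
    return adj


def ball_volume(adj, root, R):
    """Sum of degrees over vertices within graph distance R of root (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == R:
            continue  # do not explore beyond radius R
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(len(adj[u]) for u in dist)


# Truncate at N = R_max + 2 so that degrees inside every ball are exact.
R_max = 10
graph = comb_graph(R_max + 2)
ratios = {R: ball_volume(graph, (0, 0), R) / R**2 for R in range(1, R_max + 1)}
```

For each radius the ratio `ratios[R]` stays between fixed positive constants, in line with $V(0, R) \asymp R^2$; the exact constants depend on the conventions assumed above.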

Random Walk on the Incipient Infinite Cluster for Oriented Percolation


Let $X_n$ be the simple random walk on $\Gamma$. If we time-change out the excursions of $X$ away from $\mathbb{Z}$ then we obtain a simple random walk $Y_n$ on $\mathbb{Z}$. Now let $R \ge 1$, and $r = R^{2/3} \in \mathbb{Z}$. Let $A = H(-r, r)$. Since $B(0, r/2) \subset A \subset B(0, 2r)$, Proposition 2.1(a) implies that $E^0 \tau_A \approx r^3 \approx R^2$. Since $X$ only moves horizontally when it is on the $x$-axis, $P^0(X_{\tau_A} = (-r, 0)) = 1/2$. If $X_{\tau_A} = (-r, 0)$ then the probability that $X$ reaches $H(-\infty, -R)$ before returning to $0$ is $r/R \approx R^{-1/3}$; also, if $X$ does this then the time taken to do so will be of order $R^2$. These arguments lead us to expect that if $n = R^2$ then
\[ P^0\bigl(X_n \in H(-\infty, -R/2)\bigr) \ge c R^{-1/3}. \tag{2.38} \]
Given (2.38), it follows from Markov's inequality that
\[ E^0 Z_n^p \ge n^{-p/3} (R/2)^p P^0\bigl(X_n \in H(-\infty, -R/2)\bigr) \ge c n^{(p-1)/6}, \]

and the lower bound diverges if $p > 1$. This concludes Example 2.6.

2.2. Results for random graphs. We now consider a family of random graphs, as described in Sect. 1.2, and prove Theorems 1.3–1.6. Most of the hard work has been done in the previous section, where we obtained bounds for a fixed graph $\Gamma$. We begin by obtaining tightness of the quantities $R^{-3} E^0 \tau_R$, $n^{2/3} p_{2n}(0, 0)$, and $n^{-1/3} d(0, X_n)$. We recall the definition of the function $p(\lambda)$ in Assumption 1.2(1), and that $p(\lambda) \le c_0 \lambda^{-q_0}$.

Proof of Theorem 1.3. The basic idea here is straightforward. For each of the quantities we are interested in, the estimates in Proposition 2.1 tell us that provided the environment is 'good' at the scale $R$ (that is, more precisely, that $c_i R \in J(\lambda)$ for suitable $c_i$) then the quantity takes the value we want. The bounds we get will only hold if $R$ or $n$ is large enough, but it is easy to handle the small values of $R$ or $n$.

We begin with (1.10). Let $\varepsilon > 0$. Choose $\lambda \ge 1$ such that $2 p(\lambda) < \varepsilon$. Let $R/(4\lambda) \ge R^*$, and set $F_1 = \{R, R/(4\lambda) \in J(\lambda)\}$. Then, by Assumption 1.2(1), $P(F_1) \ge 1 - 2 p(\lambda)$. For $\omega \in F_1$, by Proposition 2.1(a), there exist $c_1 < \infty$, $q_1 \ge 0$ such that
\[ (c_1 \lambda^{q_1})^{-1} \le R^{-3} E_\omega^x \tau_R \le c_1 \lambda^{q_1} \quad \text{for } x \in B(R/(4\lambda)). \tag{2.39} \]
So, if $\theta \ge c_1 \lambda^{q_1}$ then for $R \in [4 \lambda R^*, \infty)$,
\[ P\bigl(\theta^{-1} \le R^{-3} E_\omega^0 \tau_R \le \theta\bigr) \ge P(F_1) \ge 1 - 2 p(\lambda) \ge 1 - \varepsilon. \tag{2.40} \]
Let $R_0 \ge 1$. Since $0 < \sup_{1 \le r \le R_0} r^{-3} E_\omega^0 \tau_r < \infty$, we have
\[ \lim_{\theta \to \infty} P\bigl(\theta^{-1} \le r^{-3} E_\omega^0 \tau_r \le \theta\bigr) = 1 \quad \text{uniformly for } r \in [1, R_0]. \]

Combining this with (2.40) gives (1.10). A similar argument enables us to handle the cases of small $n$ in (1.11)–(1.13), and we do not provide further details on this point below.

For (1.11) let $n \ge 1$, $\lambda \ge 1$, and let $R_0$, $R_1$ be defined by $n = \tfrac{1}{2} C_1(\lambda) R_1^3 = 2 R_0^3$. Let $F_2 = \{R_0, R_1, R_1/(4\lambda) \in J(\lambda)\}$. Suppose that $R_0$ and $R_1/(4\lambda)$ are both greater than $R^*$; then $P(F_2) \ge 1 - 3 p(\lambda)$. If $\omega \in F_2$ then by Proposition 2.1(b),
\[ (c_2 \lambda^{q_2})^{-1} \le n^{2/3} p_{2n}^\omega(0, 0) \le c_2 \lambda^{q_2}. \]
So,
\[ P\bigl((c_2 \lambda^{q_2})^{-1} \le n^{2/3} p_{2n}^\omega(0, 0) \le c_2 \lambda^{q_2}\bigr) \ge P(F_2) \ge 1 - 3 p(\lambda), \tag{2.41} \]

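proving (1.11).

We now prove (1.12). Let $n \ge 1$ and $\lambda \ge 1$. Let $M = \lambda^8$ and set
\[ R_0 = M n^{1/3}, \qquad R_1 = C_6(\lambda) n^{1/3}, \qquad R_2 = C_6(\lambda) n^{1/3}/(4\lambda), \]
and $F_3 = \{R_0, R_1, R_2 \in J(\lambda)\}$. If $n$ is large enough so that $R_i \ge R^*$ for $0 \le i \le 2$, then by (2.7), if $\omega \in F_3$ then
\[ P_\omega^0\bigl(n^{-1/3} d(0, X_n) > \lambda^8\bigr) \le \frac{C_7(\lambda)}{\lambda^8} \le \frac{c \lambda^{22/3}}{\lambda^8} = \frac{c}{\lambda^{2/3}}. \]
Taking $\theta = \lambda^8$, we have
\[ P^*\bigl(n^{-1/3} d(0, X_n) > \theta\bigr) \le P(F_3^c) + E\bigl[P_\omega^0\bigl(n^{-1/3} d(0, X_n) > \lambda^8\bigr) 1_{F_3}\bigr] \le 3 p(\theta^{1/8}) + c_3 \theta^{-1/12}, \tag{2.42} \]
and (1.12) follows.

Finally, we prove (1.13). Let $R = (n/2)^{1/3}$, $M \ge 1$. If $R, R/M \in J(\lambda)$ then by (2.8),
\[ P_\omega^0\bigl(n^{-1/3} d(0, X_n) < 2^{-1/3} M^{-1}\bigr) \le \frac{\lambda C_4(\lambda)}{M^2}. \tag{2.43} \]
Given $\varepsilon > 0$ choose $\lambda$ so that $p(\lambda) < \varepsilon$ and $M$ so that $\lambda C_4(\lambda)/M^2 < \varepsilon$. Let $F_4 = \{R, R/M \in J(\lambda)\}$. Then (2.43) holds for $\omega \in F_4$, so taking expectations with respect to $P$,
\[ P^*\bigl(n^{-1/3} (1 + d(0, X_n)) < 2^{-1/3} M^{-1}\bigr) \le P^*\bigl(n^{-1/3} d(0, X_n) < 2^{-1/3} M^{-1}\bigr) = E\, P_\omega^0\bigl(n^{-1/3} d(0, X_n) < 2^{-1/3} M^{-1}\bigr) \le P(F_4^c) + \varepsilon < 3\varepsilon. \]
This deals with the case of large $n$; for small $n$ we just use $1 + d(0, X_n) \ge 1$.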

Proof of Theorem 1.4. We begin with the upper bounds in (1.14)–(1.15). Here all we need do is to use the bounds on $E V(R)$ and $E(1/V(R))$ given by Assumption 1.2(2), together with the bounds on $E^0 \tau_R$ and $p_{2n}(0, 0)$ obtained above. By (2.21) and Assumption 1.2(2),
\[ E(E_\omega^0 \tau_R) \le E(2 R V(R)) \le c R^3, \]
provided $R \ge R^*$. If $R \le R^*$ then since $\tau_R \le \tau_{R^*}$, we obtain the upper bound in (1.14) by adjusting the constant $c_2$. Also, by Proposition 2.5(c), if $r = (n/2)^{1/3}$ then using Assumption 1.2(3),
\[ E\, p_{2n}^\omega(0, 0) \le c n^{-2/3} E\bigl(1 + r^2/V(r)\bigr) \le c' n^{-2/3}, \]
again provided $r \ge R^*$.

For each of the lower bounds, it is sufficient to find a set $F \subset \Omega$ of 'good' graphs with $P(F) \ge c > 0$ such that, for all $\omega \in F$ we have suitable lower bounds on $E_\omega^0 \tau_R$, $p_{2n}^\omega(0, 0)$ or $E_\omega^0 d(0, X_n)$. We assume that $R \ge 1$ is large enough so that $R/(4\lambda_0) \ge R^*$, where $\lambda_0$ is chosen large enough that $p(\lambda_0) < 1/8$. Again, we obtain the lower bound in (1.14) for small $R$ using the fact that $E(E_\omega^0 \tau_R) \ge 1$ and adjusting the constant $c_1$. Let $F = \{R, R/(4\lambda_0) \in J(\lambda_0)\}$. Then $P(F) \ge \tfrac{3}{4}$, and for $\omega \in F$, by (2.3), $E_\omega^0 \tau_R \ge c_1(\lambda_0) R^3$. So,
\[ E(E_\omega^0 \tau_R) \ge E(E_\omega^0 \tau_R 1_F) \ge c_1(\lambda_0) R^3 P(F) \ge c_2(\lambda_0) R^3. \]
Given $n \in \mathbb{N}$, choose $R$ so that $n = \tfrac{1}{2} C_1(\lambda_0) R^3$. Then there exists $n^*$ (depending on $\lambda_0$ and $R^*$) such that $n \ge n^*$ implies that $R/(4\lambda_0) \ge R^*$. Let $F$ be as above. Then using (2.6) to bound $p_{2n}^\omega(0, 0)$ from below,
\[ E\, p_{2n}^\omega(0, 0) \ge P(F) c_3(\lambda_0) n^{-2/3} \ge c_4(\lambda_0) n^{-2/3}, \]
giving the lower bound in (1.15). A similar argument uses (2.9) to conclude (1.16).

Proof of Theorem 1.5. These results will follow from the bounds already obtained in Proposition 2.1 and in the proof of Theorem 1.3 by a straightforward Borel–Cantelli argument. We will take $\Omega_0 = \Omega_a \cap \Omega_b \cap \Omega_c$, where the sets $\Omega_*$ are defined in the proofs of (a), (b) and (c). Recall that by Assumption 1.2(1), $p(\lambda) = P(R \notin J(\lambda)) \le c_0 \lambda^{-q_0}$.

(a) We begin with the case $x = 0$, and write $w(n) = p_{2n}^\omega(0, 0)$. By (2.41) we have
\[ P\bigl((c_1 \lambda^{q_1})^{-1} < n^{2/3} w(n) \le c_1 \lambda^{q_1}\bigr) \ge 1 - 3 p(\lambda). \]
Let $n_k = e^k$ and $\lambda_k = k^{2/q_0}$. Then, since $\sum_k p(\lambda_k) < \infty$, by Borel–Cantelli there exists $K_0(\omega)$ with $P(K_0 < \infty) = 1$ such that
\[ c_1^{-1} k^{-2q_1/q_0} \le n_k^{2/3} w(n_k) \le c_1 k^{2q_1/q_0} \quad \text{for all } k \ge K_0(\omega). \]
Let $\Omega_a = \{K_0 < \infty\}$. For $k \ge K_0$ we therefore have
\[ c_2^{-1} (\log n_k)^{-2q_1/q_0} n_k^{-2/3} \le w(n_k) \le c_2 (\log n_k)^{2q_1/q_0} n_k^{-2/3}, \]
so that (1.18) holds for the subsequence $n_k$. The spectral decomposition (see for example [2]) gives that $p_{2n}^\omega(0, 0)$ is monotone decreasing in $n$. So, if $n > N_0 = e^{K_0} + 1$, let $k \ge K_0$ be such that $n_k \le n < n_{k+1}$. Then
\[ w(n) \le w(n_k) \le c_2 (\log n_k)^{2q_1/q_0} n_k^{-2/3} \le 2 e^{2/3} c_2 (\log n)^{2q_1/q_0} n^{-2/3}. \]
Similarly $w(n) \ge w(n_{k+1}) \ge c_3 n^{-2/3} (\log n)^{-2q_1/q_0}$. Taking $q_2 > 2q_1/q_0$, so that the constants $c_2$, $c_3$ can be absorbed into the $\log n$ term, we obtain
\[ (\log n)^{-q_2} n^{-2/3} \le p_{2n}^\omega(0, 0) \le (\log n)^{q_2} n^{-2/3} \quad \text{for all } n \ge N_0(\omega). \tag{2.44} \]
That $\lim_n \log p_{2n}^\omega(0, 0)/\log n = -2/3$, $P$-a.s., is then immediate. Since $\sum_n p_{2n}^\omega(0, 0) = \infty$, $X$ is recurrent. If $x, y \in C(\omega)$ and $k = d_\omega(x, y)$, then the Chapman–Kolmogorov equations give that
\[ p_{2n}^\omega(x, x) \bigl(p_k^\omega(x, y) \mu_x(\omega)\bigr)^2 \le p_{2n+2k}^\omega(y, y), \]
and using this it is easy to obtain (1.18) from (2.44).
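The interpolation from the subsequence $n_k$ to general $n$ in part (a) can be spelled out as follows. This is a sketch under the simplifying assumption that $n_k = e^k$ exactly, so that $n_{k+1} = e\, n_k$ and $\log n_k = k$; integer rounding is ignored, and the constants obtained are not claimed to match those in the text.

```latex
% Summability behind Borel--Cantelli: with \lambda_k = k^{2/q_0},
\[
  \sum_{k \ge 1} p(\lambda_k) \le c_0 \sum_{k \ge 1} \lambda_k^{-q_0}
  = c_0 \sum_{k \ge 1} k^{-2} < \infty .
\]
% Upper bound: for n_k \le n < n_{k+1} = e\, n_k we have
% \log n_k \le \log n and n_k^{-2/3} < e^{2/3} n^{-2/3}, so by monotonicity of w,
\[
  w(n) \le w(n_k) \le c_2 (\log n_k)^{2q_1/q_0} n_k^{-2/3}
  \le e^{2/3} c_2 (\log n)^{2q_1/q_0} n^{-2/3} .
\]
% Lower bound: n_{k+1} \le e\, n gives n_{k+1}^{-2/3} \ge e^{-2/3} n^{-2/3}, and
% \log n_{k+1} = 1 + \log n_k \le 2 \log n for n \ge e, so
\[
  w(n) \ge w(n_{k+1}) \ge c_2^{-1} (\log n_{k+1})^{-2q_1/q_0} n_{k+1}^{-2/3}
  \ge c_2^{-1} 2^{-2q_1/q_0} e^{-2/3} (\log n)^{-2q_1/q_0} n^{-2/3} .
\]
```

Any fixed multiplicative constant of this kind is dominated by an extra power of $\log n$ for large $n$, which is why the exponent is enlarged from $2q_1/q_0$ to $q_2$.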


(b) Let $R_n = e^n$ and $\lambda_n = n^{2/q_0}$. Let $F_n = \{R_n, R_n/(4\lambda_n) \in J(\lambda_n)\}$. Then (provided $R_n/(4\lambda_n) \ge 1$) we have $P(F_n^c) \le 2 p(\lambda_n) \le 2 n^{-2}$. So, by Borel–Cantelli, if $\Omega_b = \liminf F_n$, then $P(\Omega_b) = 1$. Hence there exists $M_0$ with $M_0(\omega) < \infty$ on $\Omega_b$, and such that $\omega \in F_n$ for all $n \ge M_0(\omega)$. Now fix $\omega \in \Omega_b$, and let $x \in C(\omega)$. Write $F(R) = E_\omega^x \tau_R$. By (2.39) there exist constants $c_4$, $q_4$ such that
\[ (c_4 \lambda_n^{q_4})^{-1} \le R_n^{-3} F(R_n) \le c_4 \lambda_n^{q_4}, \tag{2.45} \]
provided $n \ge M_0(\omega)$ and $n$ is also large enough so that $x \in B(R_n/(4\lambda_n))$. Writing $M_x(\omega)$ for the smallest such $n$,
\[ c_4^{-1} (\log R_n)^{-2q_4/q_0} R_n^3 \le F(R_n) \le c_4 (\log R_n)^{2q_4/q_0} R_n^3, \quad \text{for all } n \ge M_x(\omega). \]
As $F(R)$ is monotonic, the same argument as in (a) enables us to replace $F(R_n)$ by $F(R)$, for all $R \ge R_x = 1 + e^{M_x}$. Taking $\alpha_2 > 2q_4/q_0$ we obtain (1.19).

(c) Recall that $Y_n = \max_{0 \le k \le n} d(0, X_k)$. We begin by noting that
\[ \{Y_n \ge R\} = \{\tau_R \le n\}. \tag{2.46} \]
Using this, (1.20) follows easily from (1.21). It remains to prove (1.21). Since $\tau_R$ is monotone in $R$, as in (b) it is enough to prove the result for the subsequence $R_n = e^n$. The estimates in (b) give the upper bound. In fact, if $\omega \in \Omega_b$, and $n \ge M_x(\omega)$, then by (2.45),
\[ P_\omega^x\bigl(\tau_{R_n} \ge n^2 c_4 \lambda_n^{q_4} R_n^3\bigr) \le \frac{F(R_n)}{n^2 c_4 \lambda_n^{q_4} R_n^3} \le n^{-2}. \]
So, by Borel–Cantelli (with respect to the law $P_\omega^x$), there exists $N_x(\omega, \bar\omega)$ with $P_\omega^x(N_x < \infty) = P_\omega^x(\{\bar\omega : N_x(\omega, \bar\omega) < \infty\}) = 1$ such that $\tau_{R_n} \le c_5 (\log R_n)^{q_5} R_n^3$, for all $n \ge N_x$.

For the lower bound, write $C_2(\lambda) = c_6 \lambda^{-q_6}$, $C_3(\lambda) = c_7 \lambda^{q_7}$. Let $\lambda_n = n^{2/q_0}$, and $\varepsilon_n = n^{-2} \lambda_n^{-q_6 - q_7}$. Set $G_n = \{R_n, \varepsilon_n R_n, \varepsilon_n R_n/(4\lambda_n) \in J(\lambda_n)\}$. Then, for $n$ sufficiently large so that $\varepsilon_n R_n/(4\lambda_n) \ge 1$, we have $P(G_n^c) \le 3 p(\lambda_n) \le 3 c_0 n^{-2}$. Let $\Omega_c = \Omega_b \cap (\liminf G_n)$; then by Borel–Cantelli $P(\Omega_c) = 1$ and there exists $M_1$ with $M_1(\omega) < \infty$ for $\omega \in \Omega_c$ such that $\omega \in G_n$ whenever $n \ge M_1(\omega)$. By (2.4), if $n \ge M_1$ and $x \in B(\varepsilon_n R_n)$ then
\[ P_\omega^x\bigl(\tau_{R_n} \le c_6 \lambda_n^{-q_6} \varepsilon_n^3 R_n^3\bigr) \le c_7 \lambda_n^{q_7} \varepsilon_n \le c_7 n^{-2}. \]
So, using Borel–Cantelli, we deduce that (for some $q_8$)
\[ \tau_{R_n} \ge c_6 \lambda_n^{-q_6} \varepsilon_n^3 R_n^3 \ge n^{-q_8} R_n^3 = (\log R_n)^{-q_8} R_n^3, \quad \text{for all } n \ge N_x(\omega, \bar\omega). \tag{2.47} \]
This completes the proof of (1.21).


Proof of Theorem 1.6. (a) We first consider the case $x = 0$. The upper bound on $\log S_n/\log n$ follows easily from the bounds on $\tau_R$ and $V(R)$, as follows. A Borel–Cantelli argument similar to those above implies that
\[ V(R) \le R^2 (\log R)^c \tag{2.48} \]
for all sufficiently large $R$. Recall that $Y_n = \max_{0 \le k \le n} d(0, X_k)$. We have $W_n \subset B(Y_n)$, so $S_n \le V(Y_n)$. So, for sufficiently large $n$, using (1.20),
\[ S_n \le V\bigl((\log n)^{\alpha_3} n^{1/3}\bigr) \le n^{2/3} (\log n)^c, \tag{2.49} \]
proving the upper bound in (1.23).

For the lower bound, we need to show that a positive proportion of the points in $B(Y_n)$ have been hit by time $n$, and for this we use Lemma 2.3. Choose $q_1 \ge 1$, $q_2 \ge 1$ so that we can write $C_2(\lambda) = c_1 \lambda^{-q_1}$ and $C_3(\lambda) = c_2 \lambda^{q_2}$. Let $R_k = e^k$, and $\lambda_k = k^{q_3}$, where $q_3 \ge 2$ is chosen large enough so that $\sum_k p(\lambda_k) < \infty$. Let $\varepsilon_k = c_2^{-1} \lambda_k^{-q_2} k^{-q_3}$. Set $F_k = \{R_k, \varepsilon_k R_k, \varepsilon_k R_k/(4\lambda_k) \in J(\lambda_k)\}$. Write $\xi(x, R) = 1_{\{T_x > \tau_R\}}$. If $R \in J(\lambda)$ and $\varepsilon < 1/(2\lambda)$ then by Lemma 2.3,
\[ P_\omega^0(\xi(x, R) = 1) \le \varepsilon \lambda, \quad \text{for } x \in B(\varepsilon R). \]
Set
\[ Z_k = V(\varepsilon_k R_k)^{-1} \sum_{x \in B(\varepsilon_k R_k)} \xi(x, R_k) \mu_x; \]
this is the proportion of points in $B(\varepsilon_k R_k)$ which are not hit by time $\tau_{R_k}$. Then if $\omega \in F_k$,
\[ P_\omega^0(Z_k \ge \tfrac{1}{2}) \le 2 E_\omega^0 Z_k \le 2 \varepsilon_k \lambda_k \le k^{-q_3}. \]
Let $m(k) = k^{q_3} \lambda_k R_k^3$. Then if $\omega \in F_k$, by (2.2),
\[ P_\omega^0(\tau_{R_k} \ge m(k)) \le 2 \lambda_k R_k^3 m(k)^{-1} = 2 k^{-q_3}. \]
Thus
\[ P^*\bigl(F_k^c \cup \{Z_k \ge \tfrac{1}{2}\} \cup \{\tau_{R_k} \ge m(k)\}\bigr) \le 3 p(\lambda_k) + 3 k^{-q_3}, \]
so by Borel–Cantelli, $P^*$-a.s. there exists a $k_0(\omega, \bar\omega) < \infty$ such that, for all $k \ge k_0$, $F_k$ holds, $\tau_{R_k} \le m(k)$, and $Z_k \le 1/2$. So, for $k \ge k_0$,
\[ S_{m(k)} \ge S_{\tau_{R_k}} \ge \sum_{x \in B(\varepsilon_k R_k)} (1 - \xi(x, R_k)) \mu_x = V(\varepsilon_k R_k)(1 - Z_k) \ge \tfrac{1}{2} \lambda_k^{-1} (\varepsilon_k R_k)^2. \]
Let $n$ be large enough so that $m(k) \le n < m(k + 1)$ for some $k \ge k_0$. Then
\[ \frac{\log S_n}{\log n} \ge \frac{\log S_{m(k)}}{\log m(k + 1)} \ge \frac{2k - c \log k}{3(k + 1) + c' \log(k + 1)}, \]
and the lower bound in (1.23) follows. This proves (1.23) when $x = 0$.


Now let
\[ \Omega_0 = \bigl\{\omega : G(\omega) \text{ is recurrent and } P_\omega^0\bigl(\lim_n (\log S_n/\log n) = \tfrac{2}{3}\bigr) = 1\bigr\}. \]
We have $P(\Omega_0) = 1$. If $\omega \in \Omega_0$, and $x \in G(\omega)$ then $X$ hits $0$ with $P_\omega^x$-probability 1. Since the limit does not depend on the initial segment $X_0, \ldots, X_{T_0}$, we obtain (1.23).

(b) We have $|W_n| \le S_n \le c_0 |W_n|$, so (1.24) is immediate from (1.23).

Remark. Note that the constants $c_i$ in Theorem 1.4 and $\alpha_i$ in Theorem 1.5 depend only on the constants $c_1, c_2, c_3, q_0$ in Assumption 1.2.

3. Verification of Assumption 1.2 for the IIC

In Sect. 3.1, we state three propositions which give estimates for the volume and effective resistance for the IIC. Propositions 3.1–3.2, which pertain to the volume growth of $C$, are proved in Sect. 4. Proposition 3.3, which will be used to estimate the effective resistance, is proved in Sect. 5. In Sect. 3.2, we use the three propositions to verify Assumption 1.2 for the IIC, and complete the proof of our main result Theorem 1.7.

3.1. Three propositions. We will use the following notation for the IIC. Let $U(R) = \{(x, n) : n \ge R\}$, $B(R) = \{(x, n) \in C : 0 \le n < R\}$, and $\partial B(R) = \{(x, R) : (x, R) \in C\}$. We note that, using the graph distance $d$ on $C$, $B(R)$ is just the ball $B(0, R)$, and $\partial B(R)$ is its exterior boundary. Let
\[ Z_R = b_0 R^{-2} V(R), \tag{3.1} \]
where $b_0$ is a constant that will be specified in (4.25) below. The constant $b_0$ has limit $\tfrac{1}{2}$ as $L \to \infty$.

Proposition 3.1. Let $d > 4$ and $L \ge L_0$. Under the IIC measure, the random variables $Z_R$ converge in distribution to a strictly positive limit $Z$, whose distribution is independent of $d$ and $L$. Also, all moments converge, i.e., $E_\infty Z_R^l \to E Z^l$ for each $l \in \mathbb{N}$. In particular, $c_1(d) R^2 \le E_\infty V(R) \le c_2(d) R^2$, $R \ge 1$. Moreover, $c_1$ and $c_2$ do not depend on $d$, if we further require that $L \ge L_1$, for some $L_1 = L_1(d)$.

Remark. We do not need the full strength of Proposition 3.1 to establish Assumption 1.2 for the IIC. However, since the scaling limit of $V(R)$ is also of independent interest, we will prove the stronger result, and, moreover, identify the limiting random variable $Z$ in terms of super-Brownian motion.

Proposition 3.2. Let $d > 4$ and $L \ge L_0$. Then
\[ Q_\infty\bigl(V(R) R^{-2} < \lambda\bigr) \le c_1(d) \exp\{-c_2(d) \lambda^{-1/2}\}, \quad R \ge 1. \tag{3.2} \]
Moreover, $c_1$ and $c_2$ do not depend on $d$, if we further require that $L \ge L_1$, for some $L_1 = L_1(d)$.


The third proposition gives an estimate on the expected number of edges at level $n - 1$ that need to be cut in order to disconnect $0$ from level $R$. We say that $(x, n), (x', n') \in C$ are RW-connected, if there is a path, not necessarily oriented, in $C$ from $(x, n)$ to $(x', n')$. We reserve the term connected to mean oriented connection, that is $(x, n) \longrightarrow (x', n')$. Let

(x, n) is RW-connected to …, 0 < …

Proposition 3.3. Let $d > 6$. There exists $L_1 = L_1(d) \ge L_0(d)$ such that for $L \ge L_1$, $R \ge 1$ and $0 < a < 1$,
\[ E_\infty(|D(n)|) \le c_1(a), \quad 0 < n \le aR. \tag{3.4} \]

The constant $c_1(a)$ is independent of the dimension $d$ and also of $L$.

Remark. Proposition 3.3 is the only place where we need $d > 6$ rather than $d > 4$.

3.2. Verification of Assumption 1.2 for the IIC. We begin with a lemma that relates $|D(n)|$ and the effective resistance.

Lemma 3.4. For oriented percolation in any dimension $d \ge 1$,
\[ R_{\mathrm{eff}}(0, \partial B(R)) \ge \sum_{n=1}^{R} \frac{1}{|D(n)|}. \tag{3.5} \]

Proof. We have that $R_{\mathrm{eff}}(0, \partial B(R))$ is the minimum energy dissipation of a unit current from $0$ to $\partial B(R)$ (see [15, p. 63]). Let $I$ be such a unit current. Fix $1 \le n \le R$, let $k = |D(n)|$, and let $J_1, \ldots, J_k$ be the currents in the bonds in $D(n)$. Then since all the current must flow through the edges in $D(n)$, we have $\sum_{i=1}^{k} |J_i| \ge 1$. Hence the energy dissipation for $I$ in the bonds in $D(n)$, which is $\sum_{i=1}^{k} |J_i|^2$, is at least $1/k = |D(n)|^{-1}$. Summing then gives (3.5).

Now we combine Proposition 3.3 and Lemma 3.4 to show that it is unlikely that the effective resistance $R_{\mathrm{eff}}(0, \partial B(R))$ is less than a small multiple of $R$.

Proposition 3.5. There is a constant $c$ such that for $d > 6$, $L \ge L_1$, $R \ge 2$ and $\varepsilon > 0$,
\[ Q_\infty\bigl(R_{\mathrm{eff}}(0, \partial B(R)) \le \varepsilon R\bigr) \le c \varepsilon. \tag{3.6} \]

Proof. Let $R \ge 2$. Fix $\tfrac{1}{2} < a < 1$ and let $r = \lfloor aR \rfloor$; note that $r \ge 1$. By Lemma 3.4 and the Cauchy–Schwarz inequality,
\[ R_{\mathrm{eff}}(0, \partial B(R))^{-1} \le \Bigl( \sum_{n=1}^{r} |D(n)|^{-1} \Bigr)^{-1} \le r^{-2} \sum_{n=1}^{r} |D(n)|. \tag{3.7} \]


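Therefore, by Proposition 3.3, Markov's inequality and (3.7),
\[ Q_\infty\bigl(R_{\mathrm{eff}}(0, \partial B(R)) \le \varepsilon R\bigr) = Q_\infty\bigl(R_{\mathrm{eff}}(0, \partial B(R))^{-1} \ge \varepsilon^{-1} R^{-1}\bigr) \le \varepsilon R\, E_\infty R_{\mathrm{eff}}(0, \partial B(R))^{-1} \le \varepsilon R r^{-2} E_\infty \sum_{n=1}^{r} |D(n)| \le \varepsilon R r^{-1} c_1(a) \le 2 a^{-1} c_1(a) \varepsilon. \]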
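The bound (3.5) and the Cauchy–Schwarz step (3.7) can be illustrated on a toy network. The sketch below is not the IIC: it assumes a hypothetical layered network with a single node per level, in which levels $n - 1$ and $n$ are joined by $k_n$ parallel unit resistors, so that the cut sets play the role of $D(n)$ and equality holds in (3.5). The function names are illustrative only.

```python
from fractions import Fraction


def series_parallel_resistance(cut_sizes):
    """Effective resistance of the toy network: stage n contributes 1/k_n."""
    return sum(Fraction(1, k) for k in cut_sizes)


def cauchy_schwarz_bound(cut_sizes):
    """Right-hand side of (3.7): r^{-2} times the sum of the cut sizes."""
    r = len(cut_sizes)
    return Fraction(sum(cut_sizes), r * r)


cuts = [1, 2, 3, 2, 5]                       # hypothetical values of |D(n)|
resistance = series_parallel_resistance(cuts)
conductance = 1 / resistance

# (3.7): the conductance is at most r^{-2} * sum_n |D(n)|.
assert conductance <= cauchy_schwarz_bound(cuts)
```

The inequality is Cauchy–Schwarz applied to $r^2 = \bigl(\sum_n 1\bigr)^2 \le \sum_n |D(n)|^{-1} \sum_n |D(n)|$, and it is an equality exactly when all cut sets have the same size.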

Proof of Theorem 1.7. Let $W_R = V(R)/R^2$. By Proposition 3.1 we have (2) and
\[ Q_\infty(W_R \ge \lambda) \le \lambda^{-1} E_\infty W_R \le c \lambda^{-1}. \tag{3.8} \]
Also, Proposition 3.2 gives
\[ Q_\infty(W_R < \lambda^{-1}) \le c \exp(-c' \lambda^{1/2}), \tag{3.9} \]
and (3) is then immediate after integration. The combination of (3.8)–(3.9) and (3.6) (with $\varepsilon = \lambda^{-1}$), together with the fact that each of the bounds is less than $c \lambda^{-1}$ for large $\lambda$, gives (1) with $q_0 = 1$ and $R^* = 2$. The fact that all constants here are independent of $d$, $L$ implies that the constants in Assumption 1.2 share this independence.

4. IIC Volume Estimates: Proof of Propositions 3.1–3.2

In Sect. 4.2 we prove Proposition 3.1, and in Sect. 4.3 we prove Proposition 3.2. The proofs make use of results from several previous papers involving the lace expansion; these results are gathered together and slightly extended in Sect. 4.1. We assume throughout that $d > 4$ and that $L$ is large; these assumptions will often not be mentioned explicitly in the following. Throughout: $\beta = L^{-d}$, $K$ denotes a constant that only depends on $d$, and $\bar K$ denotes an absolute constant. The values of the constants $K$ and $\bar K$ may change from one occurrence to the next.

4.1. Preliminaries. In this section, we recall and slightly extend various results from [20,21,25,26]. These results isolate the necessary ingredients from other papers that will be used in the proof of Propositions 3.1–3.2.

4.1.1. Critical oriented percolation $r$-point functions. The critical oriented percolation two-point function $\tau_n(x)$ is defined by
\[ \tau_n(x) = P_{p_c}\bigl((0, 0) \longrightarrow (x, n)\bigr). \tag{4.1} \]
Let $\tau_n = \sum_{x \in \mathbb{Z}^d} \tau_n(x)$. By [26, Theorem 1.1],
\[ \sup_{x \in \mathbb{Z}^d} \tau_n(x) \le K \beta (n + 1)^{-d/2}, \quad n \ge 1, \tag{4.2} \]
\[ \tau_n = A\bigl(1 + O(n^{(4-d)/2})\bigr), \quad \text{as } n \to \infty, \tag{4.3} \]


where $|A - 1| \le K \beta$. The estimate [25, (4.2)] shows that the error term in (4.3) is bounded by $K \beta n^{(4-d)/2}$ (note that $f_n(0, z_c)$ of [25] corresponds to our $\tau_n$). Hence for $L \ge L_1 = L_1(d)$, we have
\[ \bar K^{-1} \le A \le \bar K, \qquad |\tau_n - A| \le \bar K n^{(4-d)/2},\ n \ge 1, \qquad \bar K^{-1} \le \tau_n \le \bar K,\ n \ge 0. \tag{4.4} \]
Also, noting that $\tau_1$ is called $p_c$ in [26], we see from [26, Eq. (1.12)] that $|\tau_1 - 1| \le K \beta \le \bar K$ for $L \ge L_1(d)$ sufficiently large.

For all $r \ge 2$, the critical oriented percolation $r$-point function $\tau^{(r)}$ is defined by
\[ \tau^{(r)}_{n_1, \ldots, n_{r-1}}(x_1, \ldots, x_{r-1}) = P_{p_c}\bigl((0, 0) \longrightarrow (x_i, n_i) \text{ for all } i = 1, \ldots, r - 1\bigr), \tag{4.5} \]
with $x_i \in \mathbb{Z}^d$, $n_i \in \mathbb{Z}_+$. The asymptotic behaviour of the Fourier transforms of the $r$-point functions is given in [26, Theorem 1.2]. A very special case of [26, Theorem 1.2] is that there is a $\delta > 0$ such that for $t_1, t_2 > 0$,
\[ \sum_{x_1, x_2 \in \mathbb{Z}^d} \tau^{(3)}_{\lfloor n t_1 \rfloor, \lfloor n t_2 \rfloor}(x_1, x_2) = n V^* A^3 \bigl[t_1 \wedge t_2 + O(n^{-\delta})\bigr] \tag{4.6} \]
as $n \to \infty$ (see [26, (1.22)]). The vertex factor $V^*$ is written $V$ in [26] but written $V^*$ here to avoid confusion with the volume. The vertex factor is a constant with $|V^* - 1| \le K \beta$, and we assume that $L_1$ has been chosen so that $\bar K^{-1} \le V^* \le \bar K$.

4.1.2. The IIC $r$-point functions. Let $\vec y = (y_1, \ldots, y_{r-1})$ and $\vec m = (m_1, \ldots, m_{r-1})$ with $y_i \in \mathbb{Z}^d$, $m_i \in \mathbb{Z}_+$. For $r \ge 2$, the IIC $r$-point function is defined by
\[ \rho^{(r)}_{\vec m}(\vec y) = Q_\infty\bigl((0, 0) \longrightarrow (y_i, m_i) \text{ for all } i = 1, \ldots, r - 1\bigr). \tag{4.7} \]
Let
\[ \hat\rho^{(r)}_{\vec m} = \sum_{y_1, \ldots, y_{r-1} \in \mathbb{Z}^d} \rho^{(r)}_{\vec m}(\vec y). \tag{4.8} \]
Let $A$ be the constant of (4.3), and let $V^*$ be the vertex factor of (4.6). Let $r \ge 2$, $\vec t = (t_1, \ldots, t_{r-1}) \in (0, 1]^{r-1}$, and for a positive integer $m$, let $m \vec t$ be the vector with components $\lfloor m t_i \rfloor$. It is immediate from [21, (5.15)] (with $k = 0$) that for $r \ge 2$,
\[ \lim_{m \to \infty} \frac{1}{(m A^2 V^*)^{r-1}} \hat\rho^{(r)}_{m \vec t} = \hat M^{(r)}_{1, \vec t}, \tag{4.9} \]
where the limit $\hat M^{(r)}_{1, \vec t}$ is defined recursively as follows (see [21, Sect. 4.2]). For $r = 1$, we have simply
\[ \hat M^{(1)}_s = 1. \tag{4.10} \]
For $r \ge 2$ and $\bar s = (s_1, \ldots, s_r)$ with each $s_i > 0$, the $\hat M^{(r)}_{\bar s}$ are given recursively by
\[ \hat M^{(r)}_{\bar s} = \int_0^{\underline{s}} ds\, \hat M^{(1)}_s \sum_{I \subset J_1 : |I| \ge 1} \hat M^{(i)}_{\bar s_I - s}\, \hat M^{(r-i)}_{\bar s_{J \setminus I} - s}, \tag{4.11} \]
where $i = |I|$, $J = \{1, \ldots, r\}$, $J_1 = J \setminus \{1\}$, $\underline{s} = \min_i s_i$, $\bar s_I$ denotes the vector consisting of the components $s_i$ of $\bar s$ with $i \in I$, and $\bar s_I - s$ denotes subtraction of $s$ from each component of $\bar s_I$. The explicit solution to the recursive formula (4.11) can be found, e.g.,


in [26, (1.25)]. In particular, $\hat M^{(2)}_{s_1, s_2} = s_1 \wedge s_2$. It is shown in [21, Lemma 4.2] that for $r \ge 1$ and $t > 0$,
\[ \hat M^{(r)}_{t, \ldots, t} = t^{r-1} 2^{-(r-1)} r!. \tag{4.12} \]
To this we add the following elementary fact.

Lemma 4.1. For $r \ge 1$, $\hat M^{(r)}_{s_1, \ldots, s_r}$ is nondecreasing in each $s_i$.

Proof. The proof is by induction on $r$. For $r = 1$, $\hat M^{(1)}_{s_1} = 1$ by (4.10), which is nondecreasing. Assume the result holds for all $j \le r$. Then it holds also for $r + 1$ by (4.11), since increasing an $s_i$ can only increase the integrand (by the induction hypothesis) or the domain of integration in (4.11).

4.1.3. Super-Brownian motion. As discussed in [21, Sect. 4], the quantity $\hat M^{(r)}_{\bar s}$ appearing in (4.9) is the $r$th moment of the canonical measure $N$ of super-Brownian motion $X_t$, namely
\[ \hat M^{(r)}_{s_1, \ldots, s_r} = N\bigl(X_{s_1}(\mathbb{R}^d) \cdots X_{s_r}(\mathbb{R}^d)\bigr). \tag{4.13} \]
For an introduction to the canonical measure, see [36, Chap. 17]. Let $Y_t$ denote the canonical measure of super-Brownian motion conditioned to survive for all time (see [20]). Let
\[ Z = \int_0^1 dt\, Y_t(\mathbb{R}^d), \tag{4.14} \]
so that $Z$ is a positive random variable. It is clear that the distribution of $Z$ does not depend on $L$. It also does not depend on $d$, since it is equal to the mass up to time 1 of the continuum random tree conditioned to survive forever. The moments of $Z$ are given, for integers $l \ge 1$, by
\[ E Z^l = \int_0^1 dt_1 \cdots \int_0^1 dt_l\, \hat M^{(l+1)}_{1, \vec t} \tag{4.15} \]
(see [20, Sect. 3.4]). We will use the fact that $Z$ has an exponential moment. This follows from
\[ E Z^l \le \int_0^1 dt_1 \cdots \int_0^1 dt_l\, \hat M^{(l+1)}_{1, 1, \ldots, 1} = 2^{-l} (l + 1)!, \tag{4.16} \]
where we have used (4.15), Lemma 4.1 and (4.12).

4.1.4. Rate of convergence to the IIC. For the proof of Proposition 3.2, we will need an estimate for the rate of convergence of $P_n$ to $P_\infty$ (recall the definitions from (1.27)–(1.28)). Let $\mathcal{E}_m$ denote the set of cylinder events measurable with respect to the set of edges up to level $m - 1$. In [21, Eq. (2.19)], the following representation was obtained for $P_n(E)$, $E \in \mathcal{E}_m$:
\[ P_n(E) = \frac{1}{\tau_n} \Bigl[ \sum_{l=m}^{n-1} \varphi_l(E) \tau_1 \tau_{n-l-1} + \varphi_n(E) \Bigr], \tag{4.17} \]
where $\varphi_l(E)$ is a function arising in the lace expansion. The factor $\tau_1$ was called $p_c$ in [21]. By [21, Lemma 2.2], $\varphi_l$ satisfies
\[ |\varphi_l(E)| \le K \beta m (l - m + 1)^{-d/2}, \quad l \ge m + 1. \tag{4.18} \]


However, a very slight modification of the proof of [21, Lemma 2.2] actually shows that
\[ |\varphi_l(E)| \le K \beta (l - m + 1)^{(2-d)/2}, \quad l \ge m \ge 1 \tag{4.19} \]
(replace the upper bound $K m (l - m + 1)^{-d/2}$ on $\sum_{a=0}^{m-1} (l - a)^{-d/2}$ used in [21, (2.33),(2.35)] by the more careful upper bound $K (l - m + 1)^{(2-d)/2}$), and we will use this variant. The IIC measure is given in [21, Eq. (2.29)] as
\[ P_\infty(E) = \sum_{l=m}^{\infty} \tau_1 \varphi_l(E), \quad E \in \mathcal{E}_m. \tag{4.20} \]
The following lemma bounds the rate at which the measure $P_{2m}$ converges to $P_\infty$.

Lemma 4.2. Let $d > 4$. For $E \in \mathcal{E}_m$,
\[ |P_{2m}(E) - P_\infty(E)| = O\bigl((m + 1)^{(4-d)/2}\bigr), \tag{4.21} \]
where the constant in the error term is uniform in $E$ and $L \ge L_0$. The error term can be guaranteed to be uniform in $d$ as well, by further requiring that $L \ge L_1$ for some $L_1 = L_1(d)$.

Proof. By the triangle inequality,
\[ |P_{2m}(E) - P_\infty(E)| \le \Bigl| P_{2m}(E) - \sum_{l=m}^{2m} \tau_1 \varphi_l(E) \Bigr| + \Bigl| P_\infty(E) - \sum_{l=m}^{2m} \tau_1 \varphi_l(E) \Bigr|. \tag{4.22} \]
For the second term on the right-hand side, we use (4.20) and (4.19) to obtain
\[ \Bigl| P_\infty(E) - \sum_{l=m}^{2m} \tau_1 \varphi_l(E) \Bigr| \le \tau_1 \sum_{l=2m+1}^{\infty} |\varphi_l(E)| \le K \beta \sum_{l=2m+1}^{\infty} (l - m + 1)^{(2-d)/2} \le K \beta m^{(4-d)/2}. \tag{4.23} \]
For the first term on the right-hand side of (4.22), we use (4.17) to obtain
\[ \Bigl| P_{2m}(E) - \sum_{l=m}^{2m} \tau_1 \varphi_l(E) \Bigr| \le \sum_{l=m}^{2m-1} \tau_1 |\varphi_l(E)| \Bigl| \frac{\tau_{2m-l-1}}{\tau_{2m}} - 1 \Bigr| + |\varphi_{2m}(E)| \Bigl| \frac{1}{\tau_{2m}} - \tau_1 \Bigr|. \tag{4.24} \]
By (4.19), the last term is bounded by $K \beta m^{(2-d)/2}$. To bound the sum, we split it into the cases $m \le l < 3m/2$ and $3m/2 \le l \le 2m - 1$. In the first case, we use (4.3) to obtain $|(\tau_{2m-l-1}/\tau_{2m}) - 1| \le K \beta m^{(4-d)/2}$. Then inserting the bound (4.19) and summing over $l$, we obtain a bound $K \beta m^{(4-d)/2}$ for the first case. In the second case, we bound $|\tau_{2m-l-1}/\tau_{2m} - 1| \le K$. Inserting the bound on $\varphi_l$, and summing over $l$, we obtain a bound $K \beta m^{(4-d)/2}$ for the second case. Thus, in either case, (4.24) is bounded by $K \beta m^{(4-d)/2}$. For $L \ge L_1$ this bound is at most $\bar K m^{(4-d)/2}$. With (4.22)–(4.23), this proves (4.21).
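As a sanity check on the preliminaries of Sect. 4.1, the closed form (4.12) can be recovered from the recursion (4.11) in the equal-time case. The sketch below rests on a stated assumption about how (4.11) reads: writing $\hat M^{(r)}_{t, \ldots, t} = m_r t^{r-1}$, there are $\binom{r-1}{i}$ subsets $I \subset J_1$ of size $i$, and $\int_0^t (t - s)^{r-2}\, ds = t^{r-1}/(r-1)$, which gives $m_r = \frac{1}{r-1} \sum_{i=1}^{r-1} \binom{r-1}{i} m_i m_{r-i}$ with $m_1 = 1$. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def equal_time_coefficients(r_max):
    """Coefficients m_r with M^(r)_{t,...,t} = m_r * t^(r-1), from the recursion."""
    m = {1: Fraction(1)}
    for r in range(2, r_max + 1):
        # Sum over the size i of the subset I of J_1; comb(r - 1, i) counts such subsets.
        total = sum(comb(r - 1, i) * m[i] * m[r - i] for i in range(1, r))
        # The factor 1/(r - 1) comes from integrating (t - s)^(r - 2) over [0, t].
        m[r] = total / (r - 1)
    return m


coefficients = equal_time_coefficients(12)

# Compare with the closed form r! / 2^(r-1) of (4.12).
for r, value in coefficients.items():
    assert value == Fraction(factorial(r), 2 ** (r - 1))
```

With $r = l + 1$ and $t = 1$ the same coefficients give the bound $2^{-l} (l + 1)!$ used in (4.16).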


4.2. Volume convergence: Proof of Proposition 3.1. In this section, we prove Proposition 3.1. We now choose $b_0 = (2 \tau_1 A^2 V^*)^{-1}$ in (3.1), so that $Z_R$ is defined by
\[ Z_R = (2 \tau_1 A^2 V^* R^2)^{-1} V(R). \tag{4.25} \]
As pointed out in Sect. 4.1, the constants $\tau_1$, $A$, $V^*$ all have limit 1 as $L \to \infty$. Let
\[ \tilde Z_R = (A^2 V^* R^2)^{-1} |B(R)|. \tag{4.26} \]
Thus $\tilde Z_R$ is defined in terms of the vertices in $B(R)$, whereas $Z_R$ is defined in terms of the edges. Recall the random variable $Z$ defined in (4.14). We use (4.9) to prove that $\lim_{R \to \infty} E \tilde Z_R^l = E Z^l$ for all $l \ge 1$, and then adapt this to $Z_R$.

Let $l \ge 1$. By definition,
\[ E \tilde Z_R^l = \frac{1}{(A^2 V^* R^2)^l} \sum_{n_1=0}^{R-1} \cdots \sum_{n_l=0}^{R-1} \sum_{x_1 \in \mathbb{Z}^d} \cdots \sum_{x_l \in \mathbb{Z}^d} \rho^{(l+1)}_{n_1, \ldots, n_l}(x_1, \ldots, x_l) = \frac{1}{R} \sum_{n_1=0}^{R-1} \cdots \frac{1}{R} \sum_{n_l=0}^{R-1} \frac{1}{(A^2 V^* R)^l} \hat\rho^{(l+1)}_{R \vec t}, \tag{4.27} \]
where $\vec t = (n_1 R^{-1}, \ldots, n_l R^{-1})$. The summand on the right-hand side is bounded by a constant, by standard tree-graph inequalities [1] (see [21, Sect. 5.1] for the details when $l = 1$). Therefore, by (4.9), the dominated convergence theorem, and (4.15),
\[ \lim_{R \to \infty} E \tilde Z_R^l = \int_0^1 dt_1 \cdots \int_0^1 dt_l\, \hat M^{(l+1)}_{1, \vec t} = E Z^l. \tag{4.28} \]

The next lemma implies that it is also the case that $\lim_{R \to \infty} E Z_R^l = E Z^l$ for all $l \ge 1$.

Lemma 4.3. For all $l \ge 1$ and $R \ge 3$,
\[ (1 - 2/R)^{2l} E \tilde Z_{R-2}^l \le E Z_R^l \le E \tilde Z_{R-1}^l + c(d, L, l) R^{-1}. \tag{4.29} \]
Since $Z$ was shown in (4.16) to have a moment generating function with radius of convergence at least 2, the convergence of moments established in Lemma 4.3 implies that $Z_R$ converges weakly to $Z$ (see [12, Theorem 30.2]). Note that for $L \ge L_1$, the constants $A$, $V^*$ and $\tau_1$ satisfy bounds independent of $d$, hence $c_1$ and $c_2$ in Proposition 3.1 do not depend on $d$. This completes the proof of Proposition 3.1, subject to Lemma 4.3.

Proof of Lemma 4.3. For $l \ge 1$, we define
\[ \sigma^{(l+1)}_{\vec m}(\vec x, \vec y) = Q_\infty\bigl((0, 0) \longrightarrow (x_i, m_i) \longrightarrow (y_i, m_i + 1) \text{ for all } i = 1, \ldots, l\bigr). \]
Note that
\[ 2\, |\text{edges in } B(R - 1)| \le \sum_{(x, m) \in B(R)} \mu_{(x, m)} = V(R) \le 2\, |\text{edges in } B(R)|, \tag{4.30} \]


since edges on the boundary of $B(R)$ are counted once in $V(R)$, while other edges are counted twice. Therefore
\[ E Z_R^l \ge \frac{1}{(\tau_1 A^2 V^* R^2)^l} \sum_{n_1=0}^{R-2} \cdots \sum_{n_l=0}^{R-2} \sum_{x_1, y_1 \in \mathbb{Z}^d} \cdots \sum_{x_l, y_l \in \mathbb{Z}^d} \sigma^{(l+1)}_{n_1, \ldots, n_l}(x_1, \ldots, x_l, y_1, \ldots, y_l), \tag{4.31} \]
with a corresponding upper bound if the summations over the $n_i$'s extend to $R - 1$.

Lower bound. The Harris–FKG inequality [16,18] implies that for increasing events $A$ and $B$ we have $Q_n(A \cap B) \ge Q_n(A) P(B)$. If $A$ and $B$ are cylinder events, then by passing to the limit, we have $Q_\infty(A \cap B) \ge Q_\infty(A) P(B)$. Hence
\[ \sigma^{(l+1)}_{\vec n}(\vec x, \vec y) \ge \rho^{(l+1)}_{\vec n}(\vec x) \prod_{i=1}^{l} \tau_1(y_i - x_i). \tag{4.32} \]
With (4.27), this gives $E Z_R^l \ge [(R - 2)/R]^{2l} E \tilde Z_{R-2}^l$.

Upper bound. Let
\[ A_{\vec m}(\vec x) = \{(0, 0) \longrightarrow \infty,\ (0, 0) \longrightarrow (x_i, m_i),\ i = 1, \ldots, l\}. \]
Let $F_{\vec m}(\vec x, \vec y)$ denote the event that the following $l + 1$ events occur on disjoint sets of edges:
\[ A_{\vec m}(\vec x),\ \{(x_1, m_1) \longrightarrow (y_1, m_1 + 1)\},\ \ldots,\ \{(x_l, m_l) \longrightarrow (y_l, m_l + 1)\}. \tag{4.33} \]
Then
\[ \sigma^{(l+1)}_{\vec m}(\vec x, \vec y) \le Q_\infty\bigl(F_{\vec m}(\vec x, \vec y)\bigr) + Q_\infty\Bigl(A_{\vec m}(\vec x) \cap \bigcap_{i=1}^{l} \{(x_i, m_i) \longrightarrow (y_i, m_i + 1)\} \setminus F_{\vec m}(\vec x, \vec y)\Bigr). \tag{4.34} \]
The BK inequality implies that for increasing events $A$ and $B$ that depend on only finitely many edges we have $P(A \circ B) \le P(A) P(B)$, where $A \circ B$ denotes disjoint occurrence [8,18]. We will bound the first term by passing to the limit in the BK inequality. Let
\[ A_{\vec m, n}(\vec x) = \{(0, 0) \longrightarrow n,\ (0, 0) \longrightarrow (x_i, m_i),\ i = 1, \ldots, l\}, \]
and define $F_{\vec m, n}(\vec x, \vec y)$ analogously, by replacing $A_{\vec m}(\vec x)$ in (4.33) by $A_{\vec m, n}(\vec x)$. Then each event in the definition of $F_{\vec m, n}(\vec x, \vec y)$ only depends on finitely many edges, hence by BK,
\[ P\bigl(F_{\vec m, n}(\vec x, \vec y)\bigr) \le P\bigl(A_{\vec m, n}(\vec x)\bigr) \prod_{i=1}^{l} \tau_1(y_i - x_i). \]
Dividing both sides by $P((0, 0) \longrightarrow n)$ and letting $n \to \infty$, we get
\[ Q_\infty\bigl(F_{\vec m}(\vec x, \vec y)\bigr) \le Q_\infty\bigl(A_{\vec m}(\vec x)\bigr) \prod_{i=1}^{l} \tau_1(y_i - x_i) = \rho^{(l+1)}_{\vec m}(\vec x) \prod_{i=1}^{l} \tau_1(y_i - x_i). \tag{4.35} \]
The sum of this bound over $\vec x$ and $\vec y$ is $\hat\rho^{(l+1)}_{\vec m} \tau_1^l$. With (4.27), this gives a contribution $E \tilde Z_{R-1}^l$ to the upper bound version of (4.31).

We claim that on the event $A_{\vec m}(\vec x) \cap \bigcap_{i=1}^{l} \{(x_i, m_i) \longrightarrow (y_i, m_i + 1)\} \setminus F_{\vec m}(\vec x, \vec y)$, there exists $1 \le i \le l$ such that either $(x_i, m_i) \longrightarrow (x_j, m_j)$ for some $j \ne i$, or $(x_i, m_i) \longrightarrow$


$\infty$. To see this, we may assume that all the $(x_i, m_i)$'s are different, otherwise there is nothing to prove. Under this assumption, the last $l$ events in (4.33) occur disjointly. As in a tree-graph bound [1], choose a set of disjoint paths showing that $A_{\vec m}(\vec x)$ occurs. Then at least one of the paths uses an edge $((x_i, m_i), (y_i, m_i + 1))$, otherwise $F_{\vec m}(\vec x, \vec y)$ would occur. This path includes a connection $(x_i, m_i) \longrightarrow (x_j, m_j)$ or $(x_i, m_i) \longrightarrow \infty$, proving the claim.

By the claim, the second term on the right-hand side of (4.34) is at most
\[ \sum_{1 \le i \le l} \Bigl[ \sum_{j \ne i} Q_\infty\bigl(A_{\vec m}(\vec x),\ (x_i, m_i) \longrightarrow (x_j, m_j)\bigr) + Q_\infty\bigl(A_{\vec m}(\vec x),\ (x_i, m_i) \longrightarrow \infty\bigr) \Bigr]. \tag{4.36} \]
Each term in (4.36) can be bounded using a tree-graph inequality where the number of internal vertices in the tree-graph bound is $l - 1$, one less than it would be for $\rho^{(l+1)}$. This implies that the sum of (4.36) over $\vec x$ and $\vec y$ inside $B(R)$ is bounded by $c(d, L, l) R^{l-1}$. It follows that $E Z_R^l \le E \tilde Z_{R-1}^l + c(d, L, l) R^{-1}$, which gives the desired upper bound and completes the proof of (4.29).

4.3. Volume estimate: Proof of Proposition 3.2. In this section, we prove Proposition 3.2. Recall the definitions of Pn and P∞ from (1.27)–(1.28). It is enough to show that we can find constants R0 (d), c1 (d), c2 (d), c3 (d) such that for R ≥ R0 and λ ≤ c3 we have P∞ (V (R)R −2 < λ) ≤ c1 exp{−c2 λ−1/2 }.

(4.37)

Indeed, the restrictions on λ and R can be removed by adjusting the constant c1 as follows. First, for λ > c3 , if c1 > exp{c2 (c3 )−1/2 }, the right-hand side of (4.37) is larger than 1. As for R < R0 , due to the (deterministic) inequality V (R) ≥ R, we have V (R)R −2 ≥ R R −2 > R0−1 . Therefore, if λ < R0−1 , the left-hand side of (4.37) is 0. 1/2 For λ ≥ R0−1 , it is enough to require that c1 > exp{c2 R0 }. Finally, note that if initially R0 , c1 , c2 , c3 are independent of d, then so is the adjusted c1 . We begin with a simple consequence of Proposition 3.1. Corollary 4.4. Given ε > 0, there exists λ0 = λ0 (ε, d), such that Q∞ (V (R)R −2 < λ0 ) < ε, R ≥ 1.

(4.38)

For L ≥ L 1 , λ0 can be chosen independent of d. Proof. This follows from Proposition 3.1 and the fact that Z is strictly positive.

Let $c = c(d) = \sup_{m \ge 1} \tau_m$. According to (4.38), there is a constant $c_3 = c_3(d)$ such that
\[
\mathbb{P}_\infty\big(V(R) < 4 c_3 (R+1)^2\big) < \frac{1}{3c}, \qquad R \ge 1. \tag{4.39}
\]
We fix $m_0 = m_0(d)$ such that for $m \ge m_0$ the error term on the right-hand side of (4.21) is at most $(3c)^{-1}$. Let $R_0 = 16 c_3 m_0^2$. Fix $\lambda \le c_3$ and $R \ge R_0$. We will prove that (4.37) holds for $\lambda$ and $R$ with the choice of $c_3$ made and with $c_1 = 1$ and $c_2 = \frac{1}{2} \log(3/2)\, c_3^{1/2}$.


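There is nothing to prove if $\lambda < R_0 / R^2$, since in this case
\[
\mathbb{P}_\infty\big(V(R) R^{-2} < \lambda\big) \le \mathbb{P}_\infty\big(V(R) < R_0\big) \le \mathbb{P}_\infty\big(V(R) < R\big) = 0 \tag{4.40}
\]
and (4.37) holds trivially. Hence, without loss of generality, we assume that
\[
\frac{16 c_3 m_0^2}{R^2} = \frac{R_0}{R^2} \le \lambda \le c_3. \tag{4.41}
\]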

To estimate $\mathbb{P}_\infty(V(R) < \lambda R^2)$, we subdivide the time interval $[0, R]$ into blocks that provide roughly independent contributions to the volume, and apply (4.39) in each block. The number of blocks is $S = \lfloor (c_3/\lambda)^{1/2} \rfloor$, which is at least 1 by (4.41). The length of a block is $2m$, with $m = \lfloor R/2S \rfloor$. Note that $m \ge m_0$, since
\[
\frac{R}{2S} \ge \frac{R}{2 (c_3/\lambda)^{1/2}} \ge \frac{R_0^{1/2}}{2 c_3^{1/2}} = 2 m_0 > 1, \tag{4.42}
\]
and hence
\[
m = \Big\lfloor \frac{R}{2S} \Big\rfloor \ge \frac{R}{4S} \ge \frac{R}{4 (c_3/\lambda)^{1/2}} \ge m_0. \tag{4.43}
\]
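The choice of block parameters in (4.41)–(4.43) can be checked numerically. The following is a minimal sketch, assuming that the lost brackets in the definitions of $S$ and $m$ are floor brackets (as the later use of $\lambda \le c_3/S^2$ suggests); the function name `block_parameters` and all numerical values are hypothetical, chosen only for illustration.

```python
import math


def block_parameters(R, lam, c3, m0):
    """Return (S, m): the number of blocks and the half-length of a block.

    Assumes S = floor((c3/lam)^(1/2)) and m = floor(R/(2S)), with
    R0 = 16*c3*m0^2 and the standing assumption R0/R^2 <= lam <= c3.
    """
    R0 = 16 * c3 * m0 ** 2
    if not (R >= R0 and R0 / R ** 2 <= lam <= c3):
        raise ValueError("need R >= R0 and R0/R^2 <= lam <= c3")
    S = math.floor(math.sqrt(c3 / lam))
    m = math.floor(R / (2 * S))
    return S, m


# Illustrative values only: c3 and m0 are not known explicitly.
S, m = block_parameters(R=1000.0, lam=0.0625, c3=0.25, m0=3)
print(S, m)  # prints: 2 250
```

With these values the $S$ blocks of length $2m$ fit inside $[0, R]$, $m \ge m_0$, and $\lambda \le c_3/S^2$, which are the three properties the proof relies on.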

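Set $n_i = i(2m)$, $i = 0, \dots, S$, so that the $i$th block starts at level $n_{i-1}$ and ends at level $n_i$. By (1.28),
\[
\mathbb{P}_\infty\big(V(R) < \lambda R^2\big) = \lim_{N \to \infty} \frac{1}{\tau_N} \sum_{x \in \mathbb{Z}^d} \mathbb{P}_{p_c}\big(V(R) < \lambda R^2,\ (0,0) \longrightarrow (x, N)\big). \tag{4.44}
\]
The path $(0,0) \longrightarrow (x, N)$ on the right-hand side passes through the levels $n_1, \dots, n_S$, and hence there exist $0 = x_0, x_1, \dots, x_S \in \mathbb{Z}^d$ such that $(0,0) \longrightarrow (x_1, n_1) \longrightarrow \cdots \longrightarrow (x_S, n_S) \longrightarrow (x, N)$. We write $\mathbf{x}_i = (x_i, n_i)$ for $i = 0, \dots, S$, and write $\mathbf{x} = (x, N)$. It follows that
\[
\begin{aligned}
&\mathbb{P}_{p_c}\big(V(R) < \lambda R^2,\ (0,0) \longrightarrow (x, N)\big) \\
&\quad = \mathbb{P}_{p_c}\Big( \bigcup_{x_1, \dots, x_S \in \mathbb{Z}^d} \{V(R) < \lambda R^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i,\ i = 1, \dots, S\} \cap \{\mathbf{x}_S \longrightarrow \mathbf{x}\} \Big) \\
&\quad \le \sum_{x_1, \dots, x_S \in \mathbb{Z}^d} \mathbb{P}_{p_c}\big(V(R) < \lambda R^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i,\ i = 1, \dots, S,\ \mathbf{x}_S \longrightarrow \mathbf{x}\big).
\end{aligned}
\tag{4.45}
\]
Let
\[
C(\mathbf{y}; n) = C(\mathbf{y}) \cap \big(\mathbb{Z}^d \times \{0, 1, \dots, n\}\big). \tag{4.46}
\]
On the event on the right-hand side of (4.45), $\mathbf{x}_{i-1}$ is contained in $B(R)$, and hence $C(\mathbf{x}_{i-1}; n_{i-1} + m) \subset B(R)$. Denote $V_i = \mu(C(\mathbf{x}_{i-1}; n_{i-1} + m))$. Then on the event on the right-hand side of (4.45), since $\lambda \le c_3/S^2$ by the choice of $S$, we have
\[
V_i \le V(R) < \lambda R^2 \le \frac{c_3}{S^2} R^2 = 4 c_3 \Big(\frac{R}{2S}\Big)^2 \le 4 c_3 (m+1)^2. \tag{4.47}
\]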


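Hence, the right-hand side of (4.45) is at most
\[
\sum_{x_1, \dots, x_S \in \mathbb{Z}^d} \mathbb{P}_{p_c}\Big( \bigcap_{i=1}^{S} \{V_i < 4 c_3 (m+1)^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i\} \cap \{\mathbf{x}_S \longrightarrow \mathbf{x}\} \Big). \tag{4.48}
\]
The $S + 1$ events in (4.48) depend on disjoint sets of bonds, so the probability factors as
\[
\sum_{x_1, \dots, x_S \in \mathbb{Z}^d} \mathbb{P}_{p_c}(\mathbf{x}_S \longrightarrow \mathbf{x}) \prod_{i=1}^{S} \mathbb{P}_{p_c}\big(V_i < 4 c_3 (m+1)^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i\big). \tag{4.49}
\]
We insert this into (4.45), and use (4.44), (4.3) and (1.27) to obtain
\[
\begin{aligned}
\mathbb{P}_\infty\big(V(R) < \lambda R^2\big)
&\le \limsup_{N \to \infty} \frac{\tau_{N - n_S}}{\tau_N} \prod_{i=1}^{S} \Big( \sum_{x_i \in \mathbb{Z}^d} \mathbb{P}_{p_c}\big(V_i < 4 c_3 (m+1)^2,\ \mathbf{x}_{i-1} \longrightarrow \mathbf{x}_i\big) \Big) \\
&= \Big[ \tau_{2m}\, \mathbb{P}_{2m}\big(V(m) < 4 c_3 (m+1)^2\big) \Big]^{S}.
\end{aligned}
\tag{4.50}
\]
By Lemma 4.2, the right-hand side equals
\[
\tau_{2m}^{S} \Big[ \mathbb{P}_\infty\big(V(m) < 4 c_3 (m+1)^2\big) + O\big((m+1)^{(4-d)/2}\big) \Big]^{S}. \tag{4.51}
\]
By the choice of $m_0$ and (4.39), both terms inside the square brackets are at most $(3c)^{-1}$. Since
\[
S = \big\lfloor (c_3/\lambda)^{1/2} \big\rfloor \ge \tfrac{1}{2} (c_3/\lambda)^{1/2},
\]
it follows from our choice of $c$ that
\[
\mathbb{P}_\infty\big(V(R) < \lambda R^2\big) \le \tau_{2m}^{S} \Big(\frac{2}{3c}\Big)^{S} \le \Big(\frac{2}{3}\Big)^{S} \le \exp\Big\{ -\tfrac{1}{2} \log(3/2)\, c_3^{1/2} \lambda^{-1/2} \Big\}. \tag{4.52}
\]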
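The last inequality in (4.52) only uses $S \ge \frac{1}{2}(c_3/\lambda)^{1/2}$, which holds for the integer part of any number that is at least 1. The sketch below checks this step numerically, assuming $S$ is the floor of $(c_3/\lambda)^{1/2}$; the helper name `tail_bound_holds` and the sample values of $c_3$ and $\lambda$ are hypothetical.

```python
import math


def tail_bound_holds(lam, c3):
    """Check (2/3)^S <= exp(-(1/2) log(3/2) c3^(1/2) lam^(-1/2)).

    Assumes S = floor((c3/lam)^(1/2)) and 0 < lam <= c3, so that S >= 1.
    """
    S = math.floor(math.sqrt(c3 / lam))
    c2 = 0.5 * math.log(1.5) * math.sqrt(c3)
    lhs = (2.0 / 3.0) ** S
    rhs = math.exp(-c2 / math.sqrt(lam))
    return lhs <= rhs


# Illustrative values only; c3 = 0.25 is not taken from the paper.
for lam in (0.25, 0.1, 0.0625, 0.01, 1e-4):
    assert tail_bound_holds(lam, c3=0.25)
```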

The choice $c_2 = \frac{1}{2} \log(3/2)\, c_3^{1/2}$ gives (4.37). Noting that for $L \ge L_1$, $c$, $c_3$ and $m_0$ (and hence all further constants chosen) are independent of $d$, this completes the proof of Proposition 3.2.

5. IIC Resistance Estimates: Proof of Proposition 3.3. In this section we prove Proposition 3.3. Throughout, we use $\mathbf{x}, \mathbf{y}, \dots$ to denote space-time vertices in $\mathbb{Z}^d \times \mathbb{Z}_+$, we denote the spatial component of a vertex $\mathbf{x}$ by $x$, and we write $|\mathbf{x}| = n$ when $\mathbf{x} = (x, n)$. According to (3.3),
\[
D(n) = \Big\{ e = (\mathbf{w}, \mathbf{x}) \subset C : |\mathbf{x}| = n,\ \mathbf{x} \text{ is RW-connected to level } R \text{ by a path in } C \cap U(n) \Big\}, \qquad 0 < n \le R. \tag{5.1}
\]
Our goal is to prove that for $d > 6$, $L$ sufficiently large and $0 < a < 1$,
\[
\mathbb{E}_\infty(|D(n)|) \le c_1(a), \qquad 0 < n \le aR. \tag{5.2}
\]

Fig. 2. The configuration bounded in (5.4). The vertices $\mathbf{w} = (w, n-1)$, $\mathbf{x} = (x, n)$, $\mathbf{y} = (y, N)$ are summed over $w, x, y \in \mathbb{Z}^d$, and the three unlabelled vertices are summed over space and time

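Writing $\mathbf{y} = (y, N)$, by (1.28) and (4.3) we have
\[
\mathbb{E}_\infty |D(n)| = \sum_{w, x \in \mathbb{Z}^d} \mathbb{P}_\infty\big[(\mathbf{w}, \mathbf{x}) \in D(n)\big] = \frac{1}{A} \lim_{N \to \infty} \sum_{w, x, y \in \mathbb{Z}^d} \mathbb{P}_{p_c}\big[(\mathbf{w}, \mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\big]. \tag{5.3}
\]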

Hence we will focus on the event $\{(\mathbf{w}, \mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$, for fixed $n$, $\mathbf{w} = (w, n-1)$, $\mathbf{x} = (x, n)$ and $\mathbf{y} = (y, N)$.

Remark. For a quick indication of why we need to assume $d > 6$, consider the configuration in Fig. 2, which contributes to the right-hand side of (5.3). Using the fact that $\tau_n$ is bounded by a constant by (4.4), and using (4.2) (see also (5.32) below), the configuration in Fig. 2 can be bounded above using the BK inequality by
\[
\begin{aligned}
c \sum_{l=n}^{\infty} \sum_{k=n}^{l} \sum_{j=0}^{n} (l - j + 1)^{-d/2}
&\le c \sum_{l=n}^{\infty} \sum_{k=n}^{l} (l - n + 1)^{(2-d)/2}
\le c \sum_{l=n}^{\infty} (l - n + 1)^{(4-d)/2} \\
&= c \sum_{m=1}^{\infty} m^{(4-d)/2},
\end{aligned}
\tag{5.4}
\]
where $j, k, l$ are the time coordinates of the unlabelled vertices, from bottom to top. Here, the connection from the lower unlabelled vertex to the upper unlabelled vertex via $\mathbf{w}$ and $\mathbf{x}$ contributes $K (l - j + 1)^{-d/2}$, and the other connections all contribute constants. The right-hand side is bounded only for $d > 6$. Our complete proof of (5.2) is more involved since we must estimate the contributions to (5.3) due also to more complex zigzag random walk paths.

In Sect. 5.1, we prove Lemma 5.1, which explores the geometry of the event $\{(\mathbf{w}, \mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$. Then, in Sect. 5.2, we apply Lemma 5.1 to construct events $A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})$, $J \ge 0$, such that
\[
\{(\mathbf{w}, \mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\} \subset \bigcup_{J=0}^{\infty} A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y}). \tag{5.5}
\]

Fig. 3. Illustration of the setup in Lemma 5.1
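The claim in the Remark that the right-hand side of (5.4) is finite only for $d > 6$ amounts to the convergence of $\sum_{m \ge 1} m^{(4-d)/2}$, which requires the exponent $(4-d)/2$ to be less than $-1$. A small numerical illustration follows; the function name `partial_sum` is hypothetical, and finite partial sums can only suggest, not prove, convergence or divergence.

```python
def partial_sum(d, M):
    """Partial sum of m^((4-d)/2) over m = 1, ..., M."""
    return sum(m ** ((4 - d) / 2) for m in range(1, M + 1))


# d = 6: the summand is 1/m, so partial sums keep growing like log M.
growth_d6 = partial_sum(6, 10 ** 4) - partial_sum(6, 10 ** 2)

# d = 7: the summand is m^(-3/2), so partial sums stabilise.
growth_d7 = partial_sum(7, 10 ** 4) - partial_sum(7, 10 ** 2)

print(growth_d6 > 4.0, growth_d7 < 0.2)
```

For $d = 6$ the increment between $M = 10^2$ and $M = 10^4$ is close to $\log 100$, while for $d = 7$ it is small, consistent with the threshold $d > 6$.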

In Sect. 5.3, the BK inequality [8] is used to obtain a diagrammatic bound for the probability of the event $A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})$. Finally, in Sect. 5.4, we estimate the diagrams in this diagrammatic bound, to prove (5.2) and hence Proposition 3.3. The need to restrict to $d > 6$, rather than $d > 4$, occurs only in our last lemma, Lemma 5.6.

5.1. An intersection lemma. We will need the existence of certain intersections within the cluster $C$ that are implied by the presence of a random walk path from $\mathbf{x}$ to $R$. These intersections are isolated in the following lemma. The following notation will be convenient:
\[
\tilde{C}^{(\mathbf{p}, \mathbf{q})} = \{\mathbf{v} : \mathbf{0} \longrightarrow \mathbf{v} \text{ disjointly from the edge } (\mathbf{p}, \mathbf{q})\}, \qquad (\mathbf{p}, \mathbf{q}) \subset C.
\]
Also, we write $\mathbf{y}_1 \mathbf{y}_2$ for an occupied oriented path $\mathbf{y}_1 \longrightarrow \mathbf{y}_2$. Such paths are in general not unique, but context will often identify a unique path for consideration.

We first describe informally the statement of the lemma, whose setup is illustrated in Fig. 3. Suppose that $(\mathbf{w}, \mathbf{x}) \in D(n)$, and $\mathbf{0} \longrightarrow \mathbf{y}$. Let $(\mathbf{p}, \mathbf{q})$ be an edge on an occupied path that starts at $\mathbf{0}$ and ends with the edge $(\mathbf{w}, \mathbf{x})$. Assume that $\mathbf{q} \not\longrightarrow R$. Then $C(\mathbf{q})$ must intersect $\tilde{C}^{(\mathbf{p}, \mathbf{q})}$, otherwise a RW-connection from $\mathbf{x}$ to $R$ in $C \cap U(n)$ could not occur. The lemma gives a more sophisticated version of the intersection requirements, which allows us to have some control over the way the intersection occurs.


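This is needed because we will use the lemma recursively to construct a set of paths realizing the intersections. Assume that we are given a subgraph $A \cup B$ of $\tilde{C}^{(\mathbf{p}, \mathbf{q})}$ that will represent a set of paths already constructed, where $A$ will be a certain 'preferred region.' Assume that $A \cup B$ is disjoint from $C(\mathbf{q})$, and $\mathbf{0} \in A \cup B$. Then there will be upwards occupied paths from some vertex $\mathbf{r} \in A \cup B$ and some vertex $\mathbf{p}' \in \mathbf{q}\mathbf{x}$ to an intersection point $\mathbf{z}$. It will be convenient if we can also conclude that $\mathbf{r}$ is in the preferred region $A$. For this reason, we will also assume that any occupied path from $B$ to $C(\mathbf{q})$ passes through $A$. Now we state the lemma precisely.

Lemma 5.1. Assume the event $\{(\mathbf{w}, \mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$. In addition, assume the following:
(i) $(\mathbf{p}, \mathbf{q}) \subset C$ and either $\mathbf{q} \longrightarrow \mathbf{w}$ or $(\mathbf{p}, \mathbf{q}) = (\mathbf{w}, \mathbf{x})$;
(ii) $\mathbf{q} \not\longrightarrow R$;
(iii) $A$ and $B$ are subgraphs of $\tilde{C}^{(\mathbf{p}, \mathbf{q})}$ with $\mathbf{0} \in A \cup B$, and such that $(A \cup B) \cap C(\mathbf{q}) = \emptyset$;
(iv) every occupied oriented path from $B$ to $C(\mathbf{q})$ passes through a vertex of $A$.
Then there exist $\mathbf{p}' \in \mathbf{q}\mathbf{x}$, $\mathbf{r} \in A$ and $\mathbf{z}$ with $|\mathbf{p}| < |\mathbf{z}| < R$, such that $\mathbf{p}' \longrightarrow \mathbf{z}$ and $\mathbf{r} \longrightarrow \mathbf{z}$ edge-disjointly, and edge-disjointly from $\mathbf{p}\mathbf{x} \cup A \cup B$. Here $\mathbf{z}$ may coincide with $\mathbf{p}'$ or $\mathbf{r}$.

Proof. We first show that $C(\mathbf{q})$ and $\tilde{C}^{(\mathbf{p}, \mathbf{q})}$ must have a common vertex $\mathbf{v}$. Fix a random walk path from $\mathbf{x}$ to $R$ in $U(n)$, showing that $(\mathbf{w}, \mathbf{x}) \in D(n)$. Note that $C$ (as a set of vertices) is the union $\tilde{C}^{(\mathbf{p}, \mathbf{q})} \cup C(\mathbf{q})$. Since this path starts at $\mathbf{x} \in C(\mathbf{q})$, but $\mathbf{q} \not\longrightarrow R$, it contains an edge $(\mathbf{v}, \mathbf{v}')$ such that $\mathbf{v} \in C(\mathbf{q})$ but $\mathbf{v}' \notin C(\mathbf{q})$, and therefore $\mathbf{v}' \in \tilde{C}^{(\mathbf{p}, \mathbf{q})}$. We need to have $|\mathbf{v}'| = |\mathbf{v}| - 1$ (otherwise $\mathbf{v}' \in C(\mathbf{q})$). We can rule out $(\mathbf{v}', \mathbf{v}) = (\mathbf{p}, \mathbf{q})$, since the path stays in $U(n)$, and $|\mathbf{p}| \le n - 1$. It follows that $\mathbf{v} \in \tilde{C}^{(\mathbf{p}, \mathbf{q})}$, and hence $\mathbf{v}$ is in the intersection $C(\mathbf{q}) \cap \tilde{C}^{(\mathbf{p}, \mathbf{q})}$.

Choose $\mathbf{z} \in C(\mathbf{q}) \cap \tilde{C}^{(\mathbf{p}, \mathbf{q})}$ with $|\mathbf{z}|$ minimal. Since $\mathbf{q} \not\longrightarrow R$, $|\mathbf{p}| < |\mathbf{q}| \le |\mathbf{z}| < R$. We can find occupied oriented paths $\mathbf{q}\mathbf{z} \subset C(\mathbf{q})$ and $\mathbf{0}\mathbf{z} \subset \tilde{C}^{(\mathbf{p}, \mathbf{q})}$. These two paths must be edge-disjoint by minimality of $|\mathbf{z}|$. Let $\mathbf{p}'$ be the last visit of $\mathbf{q}\mathbf{z}$ to $\mathbf{q}\mathbf{x}$, and let $\mathbf{r}$ be the last visit of $\mathbf{0}\mathbf{z}$ to $A \cup B$. Such a last visit exists, since we assumed $\mathbf{0} \in A \cup B$. Since $\mathbf{z} \notin A \cup B$, due to $(A \cup B) \cap C(\mathbf{q}) = \emptyset$, the last visit has to be in $A$ by assumption (iv). The path $\mathbf{p}'\mathbf{z}$ is edge-disjoint from $\mathbf{p}\mathbf{x}$, by the definition of $\mathbf{p}'$. It is also edge-disjoint from $A \cup B$, by minimality of $|\mathbf{z}|$. Likewise, the path $\mathbf{r}\mathbf{z}$ is edge-disjoint from $A \cup B$ by definition of $\mathbf{r}$. It is also edge-disjoint from $\mathbf{p}'\mathbf{z}$, by minimality of $|\mathbf{z}|$.

Remark. Note that in the proof, we have first found a vertex $\mathbf{r} \in A \cup B$, and assumption (iv) was only used to show that we must have $\mathbf{r} \in A$. In fact, without assumption (iv), we would get the statement of the lemma with $\mathbf{r} \in A \cup B$. The significance of being able to ensure that $\mathbf{r}$ is in the smaller set $A$, as well as the roles played by $A$ and $B$, will become apparent in Sect. 5.2.

5.2. The event $A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})$. In this section, we define the event $A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})$ and prove (5.5). The following lemma is key.

Lemma 5.2. Let $e = (\mathbf{w}, \mathbf{x})$, and assume the event $\{e \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$.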
Then there exists J ≥ 0, such that the following vertices and paths (all edge-disjoint) exist:

Fig. 4. The vertices and disjoint paths of $A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})$ for $J = 3$. Here $\mathbf{x} = \mathbf{v}_3$ and $\mathbf{w} = \mathbf{u}_3$

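(i) vertices $\mathbf{u}_0, \mathbf{u}_1, \dots, \mathbf{u}_J = \mathbf{w}$ such that $0 \le |\mathbf{u}_0| \le |\mathbf{u}_1| \le \cdots \le |\mathbf{u}_J| = n - 1$;
(ii) vertices $\mathbf{v}_0, \mathbf{v}_1, \dots, \mathbf{v}_J = \mathbf{x}$, and, if $J \ge 1$, vertices $\mathbf{z}_1, \dots, \mathbf{z}_J$ such that
\[
|\mathbf{u}_{i-1}| \le |\mathbf{v}_{i-1}| \le |\mathbf{z}_i|, \qquad 1 \le i \le J; \tag{5.6}
\]
\[
|\mathbf{u}_{i-1}| < |\mathbf{z}_i| < R, \qquad 1 \le i \le J; \tag{5.7}
\]
(iii) $\mathbf{0} \longrightarrow \mathbf{u}_0$ and $\mathbf{u}_{i-1} \longrightarrow \mathbf{u}_i$, $1 \le i \le J$;
(iv) $\mathbf{u}_{i-1} \longrightarrow \mathbf{z}_i$, $1 \le i \le J$;
(v) $\mathbf{v}_{i-1}$ lies either on $\mathbf{u}_{i-1}\mathbf{u}_i$ or $\mathbf{u}_{i-1}\mathbf{z}_i$, and $\mathbf{v}_i \longrightarrow \mathbf{z}_i$, $1 \le i \le J$.
In addition, at least one of the following holds:
Case (a) $\mathbf{v}_0 \longrightarrow \mathbf{y}$;
Case (b) $\mathbf{v}_0 \longrightarrow R$ and there exists $\mathbf{v}_*$ on $\mathbf{0}\mathbf{u}_0$ such that $\mathbf{v}_* \longrightarrow \mathbf{y}$.

Definition 5.3. We denote by $A_J = A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})$ the event that the vertices and disjoint paths listed in Lemma 5.2 exist, and $(\mathbf{w}, \mathbf{x})$ is occupied. See Fig. 4.

The inclusion (5.5) then follows immediately from Lemma 5.2.

Proof of Lemma 5.2. Throughout the proof, we assume the event $\{e = (\mathbf{w}, \mathbf{x}) \in D(n),\ \mathbf{0} \longrightarrow \mathbf{y}\}$. We first show that if $\mathbf{x} \longrightarrow R$ then the lemma holds with $J = 0$. Indeed, take $\mathbf{u}_0 = \mathbf{w}$ and $\mathbf{v}_0 = \mathbf{x}$. Then $\mathbf{0} \longrightarrow \mathbf{u}_0$, since $\mathbf{u}_0 \in C$. Hence it is left to show that at least one of Cases (a) and (b) holds. If $\mathbf{v}_0 = \mathbf{x} \longrightarrow \mathbf{y}$, then Case (a) holds. If not, then since $\mathbf{0} \longrightarrow \mathbf{y}$ we can find $\mathbf{v}_* \in \mathbf{0}\mathbf{u}_0$ such that $\mathbf{v}_* \longrightarrow \mathbf{y}$ edge-disjointly from $\mathbf{0}\mathbf{u}_0$. The connection $\mathbf{v}_*\mathbf{y}$ has to be edge-disjoint from $\mathbf{w}\mathbf{x}R$, otherwise we are in Case (a). Hence Case (b) holds.

For the rest of the proof, we assume $\mathbf{x} \not\longrightarrow R$. We construct the paths claimed in the lemma recursively. Hence our proof will be based on a recursion hypothesis whose statement involves an integer $I \ge 0$, and which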


says that a subset of the paths claimed in the lemma (depending on I ) have already been constructed. In order to advance the recursion, the hypothesis also specifies graphs A I and B I such that Lemma 5.1 can be applied with A = A I and B = B I . The outline of the proof is the following. Since the statement of the hypothesis for I = 0 is slightly different than for I ≥ 1, we state and verify the hypothesis for I = 0 separately. This will show that the recursion can be started. Since the general step of the recursion is complex, we explain the first two steps of the recursion (I = 1 and I = 2) in some detail, before formulating the recursion hypothesis precisely in the general case I ≥ 1. The recursion will lead to the proof of the lemma by the following steps. We prove that if the hypothesis holds for some value of I ≥ 0, then either the conclusion of Lemma 5.2 follows with J = I + 1, or else the hypothesis also holds for I + 1. If, for some i > 0, the hypothesis holds for I = 0, 1, . . . , i, then its statement will guarantee the existence of vertices p0 , p 1 , . . . , pi with |p0 | < |p1 | < · · · < |pi | < n.

(5.8)

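Consequently the hypothesis cannot hold for all $I = 0, 1, \dots, n$, and the implications just mentioned provide a proof of Lemma 5.2. We now carry out the details.

(R) Recursion hypothesis for $I = 0$. There exist $\mathbf{p}_0$, $\mathbf{q}_0$ such that
\[
\mathbf{0} \longrightarrow \mathbf{p}_0, \qquad \mathbf{p}_0 \longrightarrow R, \tag{5.9}
\]
\[
\mathbf{p}_0 \longrightarrow \mathbf{w} \longrightarrow \mathbf{x}, \qquad \mathbf{q}_0 \not\longrightarrow R, \tag{5.10}
\]
where $(\mathbf{p}_0, \mathbf{q}_0)$ is the first edge in the path $\mathbf{p}_0\mathbf{x}$. All paths stated are edge-disjoint. Letting $A_0 = \{\mathbf{0}\mathbf{p}_0, \mathbf{p}_0 R\} = \{\text{paths in (5.9)}\}$, $B_0 = \emptyset$, the hypotheses of Lemma 5.1 are satisfied with $\mathbf{p} = \mathbf{p}_0$, $\mathbf{q} = \mathbf{q}_0$, $A = A_0$ and $B = B_0$.

Verification of (R) for $I = 0$. Since $\mathbf{0} \longrightarrow \mathbf{w}$ and $\mathbf{0} \longrightarrow R$, there exists $\mathbf{p}_0$ such that $\mathbf{0} \longrightarrow \mathbf{p}_0$, $\mathbf{p}_0 \longrightarrow \mathbf{w}$ and $\mathbf{p}_0 \longrightarrow R$ disjointly. Fix the paths $\mathbf{0}\mathbf{p}_0$, $\mathbf{p}_0\mathbf{w}$ and $\mathbf{p}_0 R$, and let $(\mathbf{p}_0, \mathbf{q}_0)$ be the first step of the path $\mathbf{p}_0\mathbf{x}$. If we select $\mathbf{p}_0$ so that $|\mathbf{p}_0|$ is maximal, then we have $\mathbf{q}_0 \not\longrightarrow R$. We verify the hypotheses of Lemma 5.1 with these choices. First, (i), (ii) and $\mathbf{0} \in A_0 \cup B_0$ are immediate. Also, $C(\mathbf{q}_0) \cap (A_0 \cup B_0) = C(\mathbf{q}_0) \cap A_0 = \emptyset$, since otherwise $\mathbf{q}_0 \longrightarrow R$. Finally, (iv) is vacuous, since $B_0$ is empty.

Next, to illustrate the main idea of the proof, we explain the first two steps of the recursion. Since we have verified (R) in the case $I = 0$, we can apply Lemma 5.1 with $\mathbf{p} = \mathbf{p}_0$, $\mathbf{q} = \mathbf{q}_0$, $A = A_0$ and $B = B_0$. Lemma 5.1 shows that there exist $\mathbf{p}' \in \mathbf{q}_0\mathbf{x}$ and $\mathbf{r} \in A_0 = \mathbf{0}\mathbf{p}_0 \cup \mathbf{p}_0 R$ and a vertex $\mathbf{z}$ such that $\mathbf{p}' \longrightarrow \mathbf{z}$ and $\mathbf{r} \longrightarrow \mathbf{z}$. For reasons that will be explained in the third paragraph below, we select $\mathbf{p}'$ with $|\mathbf{p}'|$ maximal such that the conclusions of Lemma 5.1 hold. With this choice of $\mathbf{p}'$, we set $\mathbf{p}_1 = \mathbf{p}'$, $\mathbf{z}_1 = \mathbf{z}$ and $\mathbf{r}_0 = \mathbf{r}$. Note that $|\mathbf{p}_1| > |\mathbf{p}_0|$. We define the vertices $\mathbf{u}_0$ and $\mathbf{v}_0$ as follows. Note that $\mathbf{r}_0 \in A_0$, which is the union of the paths $\mathbf{0}\mathbf{p}_0$ and $\mathbf{p}_0 R$. If $\mathbf{r}_0 \in \mathbf{p}_0 R$ then we set $\mathbf{v}_0 = \mathbf{r}_0$ and $\mathbf{u}_0 = \mathbf{p}_0$, and if $\mathbf{r}_0 \in \mathbf{0}\mathbf{p}_0$ then we set $\mathbf{v}_0 = \mathbf{p}_0$, $\mathbf{u}_0 = \mathbf{r}_0$. In either case, we have $|\mathbf{u}_0| \le |\mathbf{p}_0| < |\mathbf{z}_1| < R$, and hence (5.7) holds for $i = 1$.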

Fig. 5. Assumptions of the recursion hypothesis for (a) $I = 1$; (b) $I = 2$. The thick solid lines indicate the sets (a) $B_1$ and (b) $B_2$, and the thick dashed lines the sets (a) $A_1$ and (b) $A_2$. The intersection lemma is used to produce paths that join the thick dashed lines to the thin solid lines

The paths constructed so far are depicted in Fig. 5 (a). For the moment, the reader should disregard q 1 , and the distinction between thin, thick and dashed paths in the figure. We either have |p 1 | < |x| = n, as depicted in Fig. 5(a), or p1 = x. We first argue that in the case p1 = x, Lemma 5.2 holds with J = 1. Indeed, if p1 = x, we set u1 = w and v 1 = x. Then apart from the claim regarding Cases (a) and (b), the vertices and paths required by Lemma 5.2 for J = 1 have been constructed. (Note that the conclusion of Lemma 5.1 guarantees that the newly constructed paths are edge-disjoint from the old ones.) It is not difficult to also show that either Case (a) or (b) holds, and we leave the details of this to when we deal with the general recursion step. Next we explain how to continue the construction if |p 1 | < |x| = n. Let q 1 denote the first vertex on the path p 1 x following p1 . Let B1 denote the union of the thick solid lines in Fig. 5(a), that is, B1 = 0p0 ∪ p0 R ∪ r 0 z1 = A0 ∪ r 0 z1 . Let A1 denote the union of the dashed lines in Fig. 5(a), that is, A1 = p 0 p1 ∪ p 1 z1 . We want to apply Lemma 5.1 with A = A1 , B = B1 , etc. It is easy to verify conditions (i)–(iii) of the lemma. The crucial condition here is (iv), which allows us to conclude that r ∈ A1 , and hence the two new paths produced by Lemma 5.1 will connect the dashed lines to the thin solid lines in Fig. 5(a). The reason condition (iv) is satisfied is that we chose |p1 | to be maximal. Indeed, a glance at Fig. 5(a) suggests that if we had paths from q 1 x and B1 \A1 to a vertex z that are edge-disjoint from A1 ∪ B1 , then that would contradict the maximality of |p1 |. (Recall the earlier application of Lemma 5.1 with A = A0 , B = B0 , etc., and the choice of p 1 .) We will verify the details of this when we deal with the general case I ≥ 1. 
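We can summarize the above discussion by saying that Hypothesis (R) for $I = 0$ should imply that in the case $\mathbf{p}_1 \ne \mathbf{x}$ the following statement holds:

(R) Recursion hypothesis for $I = 1$. Vertices and paths (all edge-disjoint) with the following properties exist:
(i) $\mathbf{p}_1$ and $\mathbf{q}_1$ such that
\[
\mathbf{p}_1 \longrightarrow \mathbf{w} \longrightarrow \mathbf{x}, \qquad \mathbf{q}_1 \not\longrightarrow R, \tag{5.11}
\]
where $(\mathbf{p}_1, \mathbf{q}_1)$ is the first edge of the path $\mathbf{p}_1\mathbf{x}$, and $|\mathbf{p}_1| > |\mathbf{p}_0|$;
(ii) $\mathbf{u}_0$, $\mathbf{v}_0$, $\mathbf{z}_1$ such that
\[
\mathbf{0} \longrightarrow \mathbf{u}_0, \qquad \mathbf{u}_0 \longrightarrow \mathbf{z}_1, \qquad \mathbf{v}_0 \longrightarrow R; \tag{5.12}
\]
(iii) $\mathbf{u}_0 \longrightarrow \mathbf{p}_1$;
(iv) $\mathbf{v}_0$ lies either on $\mathbf{u}_0\mathbf{p}_1$, in which case $\mathbf{p}_0 = \mathbf{v}_0$, or on $\mathbf{u}_0\mathbf{z}_1$, in which case $\mathbf{p}_0 = \mathbf{u}_0$;
(v) $\mathbf{p}_0 \longrightarrow \mathbf{p}_1 \longrightarrow \mathbf{z}_1$.
Letting $A_1 = \{\mathbf{p}_0\mathbf{p}_1, \mathbf{p}_1\mathbf{z}_1\}$, $B_1 = A_0 \cup \{\mathbf{r}_0\mathbf{z}_1\} = \{\text{paths in (5.12)}\} \cup \{\mathbf{u}_0\mathbf{p}_0\}$, the hypotheses of Lemma 5.1 are satisfied with $\mathbf{p} = \mathbf{p}_1$, $\mathbf{q} = \mathbf{q}_1$, $A = A_1$ and $B = B_1$.

The next step of the construction is carried out similarly. An application of Lemma 5.1 gives the paths shown in Fig. 5(b). Again, we choose $\mathbf{p}'$ so that $|\mathbf{p}'|$ is maximal, and set $\mathbf{p}_2 = \mathbf{p}'$, $\mathbf{z}_2 = \mathbf{z}$ and $\mathbf{r}_1 = \mathbf{r}$ for this choice of $\mathbf{p}'$. We define $\mathbf{u}_1$ and $\mathbf{v}_1$ depending on the location of $\mathbf{r}_1$, similarly to the previous step. If $\mathbf{p}_2 = \mathbf{x}$, we can conclude similarly to the previous step that the lemma holds with $J = 2$. If $\mathbf{p}_2 \ne \mathbf{x}$, as in Fig. 5(b), we advance the induction similarly to the previous step. This time, we use both the choice of $\mathbf{p}_1$ and $\mathbf{p}_2$ to conclude the necessary statement about $A_2$ and $B_2$.

Now we state the recursion hypothesis in general for $I \ge 1$.

(R) Recursion hypothesis for $I \ge 1$. Vertices and paths (all edge-disjoint) with the following properties exist:
(i) $\mathbf{p}_I$ and $\mathbf{q}_I$ such that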

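\[
\mathbf{p}_I \longrightarrow \mathbf{w} \longrightarrow \mathbf{x}, \qquad \mathbf{q}_I \not\longrightarrow R, \tag{5.13}
\]
where $(\mathbf{p}_I, \mathbf{q}_I)$ is the first edge of the path $\mathbf{p}_I\mathbf{x}$, and $|\mathbf{p}_I| > |\mathbf{p}_{I-1}|$;
(ii) $\mathbf{u}_i$, $0 \le i < I$; $\mathbf{v}_i$, $0 \le i < I$; $\mathbf{z}_i$, $1 \le i \le I$, such that
\[
\text{Lemma 5.2 (iii) holds with } i \text{ restricted to } 1 \le i < I, \tag{5.14}
\]
\[
\text{Lemma 5.2 (iv) holds with } i \text{ restricted to } 1 \le i \le I, \tag{5.15}
\]
\[
\text{Lemma 5.2 (v) holds with } i \text{ restricted to } 1 \le i < I, \tag{5.16}
\]
\[
\mathbf{v}_0 \longrightarrow R; \tag{5.17}
\]
(iii) $\mathbf{u}_{I-1} \longrightarrow \mathbf{p}_I$;
(iv) $\mathbf{v}_{I-1}$ lies either on $\mathbf{u}_{I-1}\mathbf{p}_I$, in which case $\mathbf{p}_{I-1} = \mathbf{v}_{I-1}$, or on $\mathbf{u}_{I-1}\mathbf{z}_I$, in which case $\mathbf{p}_{I-1} = \mathbf{u}_{I-1}$;
(v) $\mathbf{p}_{I-1} \longrightarrow \mathbf{p}_I \longrightarrow \mathbf{z}_I$.
Letting $A_I = \{\mathbf{p}_{I-1}\mathbf{p}_I, \mathbf{p}_I\mathbf{z}_I\}$, $B_I = B_{I-1} \cup A_{I-1} \cup \{\mathbf{r}_{I-1}\mathbf{z}_I\} = \{\text{paths in (5.14)–(5.17)}\} \cup \{\mathbf{u}_{I-1}\mathbf{p}_{I-1}\}$, the hypotheses of Lemma 5.1 are satisfied with $\mathbf{p} = \mathbf{p}_I$, $\mathbf{q} = \mathbf{q}_I$, $A = A_I$ and $B = B_I$.

Figure 5 illustrates those paths of Fig. 4 that have been constructed at the stages $I = 1$ and $I = 2$. Note that $\mathbf{p}_I$ receives either the label $\mathbf{u}_I$ or $\mathbf{v}_I$. Hence $\mathbf{p}_i$ will always equal either $\mathbf{u}_i$ or $\mathbf{v}_i$, depending on the location of $\mathbf{v}_i$ (by part (iv) of the hypothesis). Note also that (5.8) holds if (R) holds for all $I = 0, 1, \dots, i$.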


Consequence of (R): Definition of $\mathbf{p}_{I+1}$, $\mathbf{u}_I$, $\mathbf{v}_I$ and $\mathbf{z}_{I+1}$. We now assume that (R) holds for some $I \ge 0$. An application of Lemma 5.1 with the data given in the hypothesis shows the existence of vertices $\mathbf{p}'$, $\mathbf{r}$ and $\mathbf{z}$ with certain properties. We now choose $\mathbf{p}'$ so that $|\mathbf{p}'|$ is maximal, and such that the properties claimed in Lemma 5.1 hold. We set $\mathbf{p}_{I+1} = \mathbf{p}'$, $\mathbf{z}_{I+1} = \mathbf{z}$ and $\mathbf{r}_I = \mathbf{r}$ for this choice. Note that $\mathbf{r}_I \in A_I$, which is a union of two paths in both cases $I = 0$ and $I \ge 1$. In the case $I = 0$, if $\mathbf{r}_0 \in \mathbf{p}_0 R$ then we set $\mathbf{v}_0 = \mathbf{r}_0$ and $\mathbf{u}_0 = \mathbf{p}_0$, and if $\mathbf{r}_0 \in \mathbf{0}\mathbf{p}_0$ then we set $\mathbf{v}_0 = \mathbf{p}_0$, $\mathbf{u}_0 = \mathbf{r}_0$. Similarly, in the case $I \ge 1$, we set $\mathbf{v}_I = \mathbf{r}_I$ and $\mathbf{u}_I = \mathbf{p}_I$ if $\mathbf{r}_I \in \mathbf{p}_I\mathbf{z}_I$, and we set $\mathbf{v}_I = \mathbf{p}_I$, $\mathbf{u}_I = \mathbf{r}_I$ if $\mathbf{r}_I \in \mathbf{p}_{I-1}\mathbf{p}_I$. In both cases, it is clear that $|\mathbf{u}_I| \le |\mathbf{p}_I| < |\mathbf{z}_{I+1}| < R$, and hence (5.7) holds for $i = I + 1$. It follows immediately from these definitions, and from the disjointness properties ensured by Lemma 5.1, that assumptions (ii)–(v) of (R) now hold with $I$ replaced by $I + 1$.

Verification of Lemma 5.2 if $\mathbf{p}_{I+1} = \mathbf{x}$. We show that if $\mathbf{p}_{I+1} = \mathbf{x}$, then Lemma 5.2 holds with $J = I + 1$. For this, we define $\mathbf{u}_{I+1} = \mathbf{w}$ and $\mathbf{v}_{I+1} = \mathbf{x}$. It is immediate from these definitions, from the disjointness properties ensured by Lemma 5.1, and from the already established properties (ii)–(v) of hypothesis (R) for $I + 1 = J$, that (i)–(v) of Lemma 5.2 hold. It remains to show that either Case (a) or Case (b) holds. Since $\mathbf{0} \longrightarrow \mathbf{y}$, there exists $\mathbf{v}_* \in \mathbf{0}\mathbf{u}_0$ such that $\mathbf{v}_* \longrightarrow \mathbf{y}$ disjointly from $\mathbf{0}\mathbf{u}_0$. If $\mathbf{v}_*\mathbf{y}$ is not disjoint from $\mathbf{v}_0 R$, we are in Case (a), and we can ignore $\mathbf{v}_*$. If $\mathbf{v}_*\mathbf{y}$ intersects $\mathbf{u}_0\mathbf{p}_0$ or $\mathbf{u}_0\mathbf{z}_1$, let $\mathbf{v}_0'$ be the last such intersection. Note that $\mathbf{v}_*\mathbf{y}$ must be disjoint from all other paths constructed, since those are subsets of $C(\mathbf{q}_0)$, and $\mathbf{q}_0 \not\longrightarrow R$. Hence if the intersection $\mathbf{v}_0'$ exists, we can replace $\mathbf{v}_0$ by $\mathbf{v}_0'$ and we are in Case (a). If the intersection $\mathbf{v}_0'$ does not exist, we are in Case (b). This verifies the claims of Lemma 5.2.

We are left to show that if $\mathbf{p}_{I+1} \ne \mathbf{x}$, then (R) must hold for $I + 1$.

Advancing the recursion $I \Longrightarrow I + 1$ if $\mathbf{p}_{I+1} \ne \mathbf{x}$. Since $\mathbf{p}_{I+1} \in \mathbf{q}_I\mathbf{x}$, but $\mathbf{p}_{I+1} \ne \mathbf{x}$, we have $|\mathbf{p}_{I+1}| > |\mathbf{p}_I|$, and $\mathbf{p}_{I+1} \longrightarrow \mathbf{w}$, showing (i) of hypothesis (R). We have already seen that (ii)–(v) are guaranteed to hold. We are left to show that the hypotheses of Lemma 5.1 hold with the data given. (i), (ii) and $\mathbf{0} \in A_{I+1} \cup B_{I+1}$ are clear from the definitions. By the definition of $\mathbf{q}_{I+1}$, $A_{I+1} \cup B_{I+1}$ is a subgraph of $\tilde{C}^{(\mathbf{q}_{I+1})}$. Assume, for a contradiction, that we have $\mathbf{z}_* \in C(\mathbf{q}_{I+1}) \cap (A_{I+1} \cup B_{I+1})$. Without loss of generality, assume that $\mathbf{z}_*$ is the first visit of an occupied path $\mathbf{q}_{I+1}\mathbf{z}_*$ to $A_{I+1} \cup B_{I+1}$. In particular, $\mathbf{q}_{I+1}\mathbf{z}_*$ is edge-disjoint from $A_{I+1} \cup B_{I+1}$. Observe that $A_{I+1} \cup B_{I+1} = A_{I+1} \cup A_I \cup B_I \cup \{\mathbf{r}_I\mathbf{z}_{I+1}\}$. If we had $\mathbf{z}_* \in A_{I+1}$, then the disjoint paths $\mathbf{q}_{I+1}\mathbf{z}_*\mathbf{z}_{I+1}$ and $\mathbf{r}_I\mathbf{z}_{I+1}$ would satisfy the conclusions of Lemma 5.1 for $\mathbf{p} = \mathbf{p}_I$, $\mathbf{q} = \mathbf{q}_I$, etc. This contradicts the choice of $\mathbf{p}_{I+1}$ (the maximality of $|\mathbf{p}_{I+1}|$), since $|\mathbf{q}_{I+1}| > |\mathbf{p}_{I+1}|$. If we had $\mathbf{z}_* \in \mathbf{r}_I\mathbf{z}_{I+1}$, we get a similar contradiction due to the paths $\mathbf{q}_{I+1}\mathbf{z}_*$ and $\mathbf{r}_I\mathbf{z}_*$. Finally, we can rule out $\mathbf{z}_* \in A_I \cup B_I$, since $C(\mathbf{q}_{I+1}) \subset C(\mathbf{q}_I)$, and the latter is disjoint from $A_I \cup B_I$.

We are left to show that every occupied path from $B_{I+1}$ to $C(\mathbf{q}_{I+1})$ has to pass through $A_{I+1}$. Assume, for a contradiction, that there exist $\mathbf{z}_* \in C(\mathbf{q}_{I+1})$ and $\mathbf{z}_{**} \in B_{I+1}$ such that $\mathbf{z}_{**} \longrightarrow \mathbf{z}_*$ disjointly from $A_{I+1}$. By considering the last visit, we may also assume that $\mathbf{z}_{**}$ is the only vertex of $\mathbf{z}_{**}\mathbf{z}_*$ in $A_{I+1} \cup B_{I+1}$. We may also assume that $\mathbf{q}_{I+1}\mathbf{z}_*$ and $\mathbf{z}_{**}\mathbf{z}_*$ are edge-disjoint. We already saw $C(\mathbf{q}_{I+1}) \cap (A_{I+1} \cup B_{I+1}) = \emptyset$, in particular,

$\mathbf{q}_{I+1}\mathbf{z}_*$ is edge-disjoint from $A_{I+1} \cup B_{I+1}$. Observe that
\[
B_{I+1} = A_I \cup B_I \cup \{\mathbf{r}_I\mathbf{z}_{I+1}\} = \bigcup_{i=0}^{I} \big(A_i \cup \{\mathbf{r}_i\mathbf{z}_{i+1}\}\big). \tag{5.18}
\]
If we had $\mathbf{z}_{**} \in \mathbf{r}_i\mathbf{z}_{i+1}$, then the paths $\mathbf{q}_{I+1}\mathbf{z}_*$ and $\mathbf{r}_i\mathbf{z}_{**}\mathbf{z}_*$ would contradict the choice of $\mathbf{p}_{i+1}$. Finally, if we had $\mathbf{z}_{**} \in A_i$, then the paths $\mathbf{q}_{I+1}\mathbf{z}_*$ and $\mathbf{z}_{**}\mathbf{z}_*$ would contradict the choice of $\mathbf{p}_{i+1}$. This completes the verification of hypothesis (R) for $I + 1$. This completes the proof of Lemma 5.2.

5.3. A diagrammatic bound. In this section, we use Lemma 5.2 and the BK inequality [8] to bound $\mathbb{P}_{p_c}[A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})]$. For this, we need the following preliminaries. The critical survival probability is defined by
\[
\theta_N = \mathbb{P}_{p_c}(\mathbf{0} \longrightarrow N). \tag{5.19}
\]

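The two papers [22,23] show that for $d > 4$ and $L \ge L_0(d)$, we have $\theta_N \sim c N^{-1}$ as $N \to \infty$, for some $c = c(d, L) = 2 + O(L^{-d})$. Moreover,
\[
\theta_N \le \frac{K}{N}, \qquad N \ge 0,\ L \ge L_0, \tag{5.20}
\]
with the constant $K = 5$, which is of course independent of both $d$ and $L$ (see [22, Eq. (1.11)]). To abbreviate the notation, when $\mathbf{y}_1 = (y_1, m_1)$ and $\mathbf{y}_2 = (y_2, m_2)$ we write $\tau(\mathbf{y}_1, \mathbf{y}_2) = \tau_{m_2 - m_1}(y_2 - y_1)$. We also introduce
\[
\begin{aligned}
U_1(\mathbf{u}_0, \mathbf{v}_0, \mathbf{u}_1, \mathbf{v}_1, \mathbf{z}_1) &= \tau(\mathbf{v}_0, \mathbf{u}_1)\, \tau(\mathbf{u}_1, \mathbf{v}_1)\, \tau(\mathbf{v}_1, \mathbf{z}_1)\, \tau(\mathbf{u}_0, \mathbf{z}_1), \\
U_2(\mathbf{u}_0, \mathbf{v}_0, \mathbf{u}_1, \mathbf{v}_1, \mathbf{z}_1) &= \tau(\mathbf{u}_0, \mathbf{u}_1)\, \tau(\mathbf{u}_1, \mathbf{v}_1)\, \tau(\mathbf{v}_1, \mathbf{z}_1)\, \tau(\mathbf{v}_0, \mathbf{z}_1), \\
U &= U_1 + U_2.
\end{aligned}
\tag{5.21}
\]
For $0 \le |\mathbf{u}_0| < n$ and $|\mathbf{u}_0| \le |\mathbf{v}_0| < R$ and $\mathbf{y} = (y, N)$, let
\[
\begin{aligned}
\varphi(\mathbf{u}_0, \mathbf{v}_0) &= \sum_{y \in \mathbb{Z}^d} \tau(\mathbf{0}, \mathbf{u}_0)\, \tau(\mathbf{u}_0, \mathbf{v}_0)\, \tau(\mathbf{v}_0, \mathbf{y}), \\
\varphi_R(\mathbf{u}_0, \mathbf{v}_0) &= \sum_{y \in \mathbb{Z}^d} \sum_{\mathbf{v}_* \in \mathbb{Z}^d \times \mathbb{Z}_+} \tau(\mathbf{0}, \mathbf{v}_*)\, \tau(\mathbf{v}_*, \mathbf{u}_0)\, \tau(\mathbf{u}_0, \mathbf{v}_0)\, \theta_{R - |\mathbf{v}_0|}\, \tau(\mathbf{v}_*, \mathbf{y}), \\
\psi^{(0)}(\mathbf{u}_0, \mathbf{v}_0) &= \varphi(\mathbf{u}_0, \mathbf{v}_0) + \varphi_R(\mathbf{u}_0, \mathbf{v}_0).
\end{aligned}
\tag{5.22}
\]
For $I \ge 1$, $0 \le |\mathbf{u}_I| < n$ and $|\mathbf{u}_I| \le |\mathbf{v}_I| < R$, let
\[
\psi^{(I)}(\mathbf{u}_I, \mathbf{v}_I) = \sum_{\substack{\mathbf{u}_{I-1} \in \mathbb{Z}^d \times \mathbb{Z}_+ \\ 0 \le |\mathbf{u}_{I-1}| \le |\mathbf{u}_I|}} \ \sum_{\substack{\mathbf{z}_I \in \mathbb{Z}^d \times \mathbb{Z}_+ \\ |\mathbf{v}_I| < |\mathbf{z}_I|}} \ \sum_{\mathbf{v}_{I-1} \in \mathbb{Z}^d \times \mathbb{Z}_+} U(\mathbf{u}_{I-1}, \mathbf{v}_{I-1}, \mathbf{u}_I, \mathbf{v}_I, \mathbf{z}_I)\, \psi^{(I-1)}(\mathbf{u}_{I-1}, \mathbf{v}_{I-1}). \tag{5.23}
\]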

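Lemma 5.4. For $J \ge 0$,
\[
\sum_{y \in \mathbb{Z}^d} \mathbb{P}_{p_c}\big[A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})\big] \le \psi^{(J)}(\mathbf{w}, \mathbf{x}). \tag{5.24}
\]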

Proof. Definition 5.3 guarantees that on the event $A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})$ certain disjoint paths exist. If we fix the vertices $\mathbf{u}_0, \dots, \mathbf{u}_J$, $\mathbf{v}_0, \dots, \mathbf{v}_J$ and $\mathbf{z}_1, \dots, \mathbf{z}_J$, then the probability of the existence of the disjoint paths is bounded by the product of the probabilities of the existence of the individual paths, by the BK inequality [8]. An individual path $\mathbf{y}_1\mathbf{y}_2$ contributes a factor $\tau(\mathbf{y}_1, \mathbf{y}_2)$. Now summing the bound over all the vertices but $\mathbf{u}_J$ and $\mathbf{v}_J$ gives an upper bound on $\mathbb{P}_{p_c}[A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})]$. Further summing over $y \in \mathbb{Z}^d$ gives an upper bound for the left-hand side of (5.24). Now it is merely a matter of bookkeeping to check that we get the expressions $\psi^{(J)}$. The terms $\varphi$ and $\varphi_R$ correspond to Cases (a) and (b) of Lemma 5.2, respectively, and their sum $\psi^{(0)}$ bounds the contribution of the paths constructed when we initialized the recursion, together with the path leading to $\mathbf{y}$. When $J = 0$, and we take $\mathbf{u}_0 = \mathbf{w}$ and $\mathbf{v}_0 = \mathbf{x}$, we get the bound in (5.24), with $J = 0$. When $J \ge 1$, the recursive definition of $\psi^{(I)}$ reflects the recursion of Lemma 5.2. The factor $U = U_1 + U_2$ gives the contribution of the paths added in the $I$th step: for $U_1$ these are $\mathbf{v}_{I-1}\mathbf{u}_I$, $\mathbf{u}_I\mathbf{v}_I$, $\mathbf{v}_I\mathbf{z}_I$ and $\mathbf{u}_{I-1}\mathbf{z}_I$ (when $\mathbf{v}_{I-1}$ lies on $\mathbf{u}_{I-1}\mathbf{u}_I$), and for $U_2$ they are $\mathbf{u}_{I-1}\mathbf{u}_I$, $\mathbf{u}_I\mathbf{v}_I$, $\mathbf{v}_I\mathbf{z}_I$ and $\mathbf{v}_{I-1}\mathbf{z}_I$ (when $\mathbf{v}_{I-1}$ lies on $\mathbf{u}_{I-1}\mathbf{z}_I$). Note that the path $\mathbf{u}_{I-1}\mathbf{v}_{I-1}$ is not present in $U$, since it is taken care of inside $\psi^{(I-1)}$.

5.4. Estimation of diagrams. It follows from (5.3), (5.5) and Lemma 5.4 that
\[
\begin{aligned}
\mathbb{E}_\infty |D(n)|
&\le \frac{1}{A} \limsup_{N \to \infty} \sum_{w, x, y \in \mathbb{Z}^d} \sum_{J=0}^{\infty} \mathbb{P}_{p_c}\big[A_J(n, \mathbf{w}, \mathbf{x}, \mathbf{y})\big] \\
&\le \frac{1}{A} \limsup_{N \to \infty} \Big[ \sum_{J=0}^{\infty} \sum_{w, x \in \mathbb{Z}^d} \psi^{(J)}(\mathbf{w}, \mathbf{x}) \Big].
\end{aligned}
\tag{5.25}
\]

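Fix $d > 6$, $R \ge 1$, $0 < a < 1$ and $0 < n \le aR$. To prove (5.2) and hence Proposition 3.3, it suffices to show that there exist $c_2 = c_2(a)$ and a constant $0 < c_3 < \frac{1}{2}$ such that
\[
\limsup_{N \to \infty} \sum_{w, x \in \mathbb{Z}^d} \psi^{(J)}(\mathbf{w}, \mathbf{x}) \le c_2 c_3^{J}, \qquad J \ge 0, \tag{5.26}
\]
since (4.3) and (5.25)–(5.26) then imply that
\[
\mathbb{E}_\infty |D(n)| \le \bar{K} c_2 \sum_{J=0}^{\infty} c_3^{J} = \bar{K} \frac{c_2}{1 - c_3} \le 2 \bar{K} c_2 = c_1(a).
\]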
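The passage from (5.26) to the bound $c_1(a)$ is a geometric series estimate: since $c_3 < \frac{1}{2}$, the factor $1/(1 - c_3)$ is at most 2. A minimal sketch of this arithmetic follows; the function name `resistance_bound` and all numerical values are hypothetical, since $\bar{K}$, $c_2$ and $c_3$ are not given explicitly in the text.

```python
def resistance_bound(K_bar, c2, c3):
    """Return (series_value, cap) for K_bar * c2 * sum_{J>=0} c3^J.

    series_value = K_bar * c2 / (1 - c3) is the exact geometric sum;
    cap = 2 * K_bar * c2 is the simplified bound, valid when 0 < c3 < 1/2.
    """
    if not 0 < c3 < 0.5:
        raise ValueError("the estimate requires 0 < c3 < 1/2")
    series_value = K_bar * c2 / (1 - c3)
    cap = 2 * K_bar * c2
    return series_value, cap


# Illustrative constants only.
value, cap = resistance_bound(K_bar=2.0, c2=3.0, c3=0.25)
assert value <= cap
```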

We now state and prove two lemmas which imply (5.26). Their proofs use the bound
\[
\tau_n \le \bar{K}, \qquad n \ge 0 \tag{5.27}
\]
of (4.3), as well as (5.20). It is in Lemma 5.6, and only there, that we need to assume $d > 6$ rather than $d > 4$. The first lemma gives a bound on $\psi^{(0)}$.


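Lemma 5.5. Let $d > 4$, $R \ge 1$, $0 < a < 1$, $0 < n \le aR$, $\mathbf{w} = (w, n-1)$ and $\mathbf{x} = (x, n)$. Then
\[
\limsup_{N \to \infty} \sum_{w, x \in \mathbb{Z}^d} \psi^{(0)}(\mathbf{w}, \mathbf{x}) \le \bar{K}^3 + \bar{K}^4 K \frac{a}{1 - a}. \tag{5.28}
\]
Proof. By definition and (5.27),
\[
\sum_{w, x \in \mathbb{Z}^d} \varphi(\mathbf{w}, \mathbf{x}) = \tau_{n-1}\, \tau_1\, \tau_{N-n} \le \bar{K}^3.
\]
Similarly, writing $\mathbf{v}_* = (v_*, l_*)$,
\[
\sum_{w, x \in \mathbb{Z}^d} \varphi_R(\mathbf{w}, \mathbf{x}) = \sum_{l_* = 0}^{n-1} \tau_{l_*}\, \tau_{n - l_* - 1}\, \tau_1\, \theta_{R-n}\, \tau_{N - l_*} \le \bar{K}^4 K \frac{n}{R - n}.
\]
Since $n/(R - n) \le a/(1 - a)$ because $n \le aR$, this gives (5.28).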

For $J \ge 1$, we use a somewhat stronger formulation of the bound, in which $|\mathbf{u}_J|$ and $|\mathbf{v}_J|$ are not restricted to the values $n - 1$ and $n$. This will allow us to prove a bound on $\psi^{(J)}$ by induction.

Lemma 5.6. Let $d > 6$, $R \ge 1$, $0 < a < 1$, $0 < n \le aR$. Suppose that $0 \le k_J < n$, $k_J \le l_J < R$, $\mathbf{u}_J = (u_J, k_J)$ and $\mathbf{v}_J = (v_J, l_J)$. Then
\[
\limsup_{N \to \infty} \sum_{u_J, v_J \in \mathbb{Z}^d} \psi^{(J)}(\mathbf{u}_J, \mathbf{v}_J) \le (2 \bar{K}^3 K^3 \beta)^{J} \Big( \bar{K}^3 + 3 \bar{K}^5 K \frac{a}{1 - a} \Big), \qquad J \ge 1. \tag{5.29}
\]
Proof. We start by inserting the definition of $\psi^{(J)}$ into the left-hand side of (5.29). With $\mathbf{z}_J = (z_J, s_J)$, $\mathbf{u}_{J-1} = (u_{J-1}, k_{J-1})$ and $\mathbf{v}_{J-1} = (v_{J-1}, l_{J-1})$, the left-hand side of (5.29) equals
\[
\limsup_{N \to \infty} \sum_{u_J, v_J \in \mathbb{Z}^d} \ \sum_{z_J, u_{J-1}, v_{J-1} \in \mathbb{Z}^d} \ \sum_{k_{J-1} = 0}^{k_J} \ \sum_{s_J = l_J}^{R-1} \ \sum_{l_{J-1} = k_{J-1}}^{s_J} U(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}, \mathbf{u}_J, \mathbf{v}_J, \mathbf{z}_J)\, \psi^{(J-1)}(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}). \tag{5.30}
\]
The vertices $\mathbf{u}_J$, $\mathbf{v}_J$ and $\mathbf{z}_J$ only appear in the factor $U$. We claim that
\[
\sum_{u_J, v_J, z_J \in \mathbb{Z}^d} U(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}, \mathbf{u}_J, \mathbf{v}_J, \mathbf{z}_J) \le 2 \bar{K}^3 K \beta\, (s_J - k_{J-1} + 1)^{-d/2}. \tag{5.31}
\]
To see this, note that $s_J = |\mathbf{z}_J| > |\mathbf{u}_{J-1}| = k_{J-1}$, by (5.7). For the $U_1$ term, we use (4.2) to bound $\tau(\mathbf{u}_{J-1}, \mathbf{z}_J)$ by $K \beta (s_J - k_{J-1} + 1)^{-d/2}$. Then the sums over $z_J$, $v_J$ and $u_J$ contribute the factor $\bar{K}^3$, by using (5.27) for the other three factors in $U_1$. For the $U_2$ term, we apply (4.2) and $\tau_n \le \bar{K}$ to see that
\[
\sup_{x \in \mathbb{Z}^d} \sum_{y \in \mathbb{Z}^d} \tau_n(y)\, \tau_m(x - y) \le K \beta\, (n + m + 1)^{-d/2}, \qquad n + m \ge 1. \tag{5.32}
\]


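An application of (5.32) to the convolution of $\tau(\mathbf{u}_{J-1}, \mathbf{u}_J)$, $\tau(\mathbf{u}_J, \mathbf{v}_J)$ and $\tau(\mathbf{v}_J, \mathbf{z}_J)$, together with (5.27), yields an upper bound of the same form. This proves (5.31). Inserting (5.31) into (5.30) and rearranging, we get
\[
(5.30) \le 2 \bar{K}^3 K \beta \sum_{k_{J-1} = 0}^{k_J} \ \sum_{s_J = l_J}^{R-1} (s_J - k_{J-1} + 1)^{-d/2} \sum_{l_{J-1} = k_{J-1}}^{s_J} \limsup_{N \to \infty} \sum_{u_{J-1}, v_{J-1} \in \mathbb{Z}^d} \psi^{(J-1)}(\mathbf{u}_{J-1}, \mathbf{v}_{J-1}). \tag{5.33}
\]
Now we prove (5.29) by induction on $J$. To start the induction, we verify (5.29) for $J = 1$. This is most of the work; advancing the induction is easy. When $J = 1$, the lim sup in (5.33) consists of two terms, corresponding to $\varphi$ and $\varphi_R$. The $\varphi$-term is bounded by
\[
\limsup_{N \to \infty} \sum_{u_0, v_0 \in \mathbb{Z}^d} \sum_{y \in \mathbb{Z}^d} \tau(\mathbf{0}, \mathbf{u}_0)\, \tau(\mathbf{u}_0, \mathbf{v}_0)\, \tau(\mathbf{v}_0, \mathbf{y}) = \limsup_{N \to \infty} \tau_{k_0}\, \tau_{l_0 - k_0}\, \tau_{N - l_0} \le \bar{K}^3. \tag{5.34}
\]
Inserting this into (5.33), and assuming $d > 6$, we see that the $\varphi$ contribution to (5.33) is bounded by
\[
2 \bar{K}^3 K \beta\, \bar{K}^3 \sum_{k_0 = 0}^{k_1} \sum_{s_1 = l_1}^{R-1} (s_1 - k_0 + 1)^{(2-d)/2} \le (2 \bar{K}^3 K^2 \beta)(\bar{K}^3). \tag{5.35}
\]
The $\varphi_R$ term is bounded as follows. First, the lim sup is bounded by
\[
\begin{aligned}
&\limsup_{N \to \infty} \sum_{u_0, v_0 \in \mathbb{Z}^d} \sum_{l_* = 0}^{k_0} \sum_{y, v_* \in \mathbb{Z}^d} \tau(\mathbf{0}, \mathbf{v}_*)\, \tau(\mathbf{v}_*, \mathbf{u}_0)\, \tau(\mathbf{u}_0, \mathbf{v}_0)\, \theta_{R - l_0}\, \tau(\mathbf{v}_*, \mathbf{y}) \\
&\quad \le \bar{K}^4 \sum_{l_* = 0}^{k_0} \theta_{R - l_0} \le \bar{K}^4 (k_0 + 1) \frac{K}{R - l_0} \le \bar{K}^4 K n \frac{1}{R - l_0}.
\end{aligned}
\tag{5.36}
\]
We insert this bound into (5.33) to obtain
\[
(2 \bar{K}^3 K \beta)(\bar{K}^4 K)\, n \sum_{k_0 = 0}^{k_1} \sum_{s_1 = l_1}^{R-1} (s_1 - k_0 + 1)^{-d/2} \sum_{l_0 = k_0}^{s_1} \frac{1}{R - l_0}. \tag{5.37}
\]
We split the sum over $s_1$ into the cases: (1) $s_1 < n + (R - n)/2$; (2) $s_1 \ge n + (R - n)/2$. In case (1), we have
\[
\frac{1}{R - l_0} \le \frac{1}{R - s_1} \le \frac{2}{R - n}.
\]
Inserting this into (5.37), the contribution of case (1) to the expression in (5.37) is bounded by
\[
\begin{aligned}
&(2 \bar{K}^3 K \beta)(2 \bar{K}^4 K) \frac{n}{R - n} \sum_{k_0 = 0}^{k_1} \ \sum_{s_1 = l_1}^{n + (R-n)/2} (s_1 - k_0 + 1)^{(2-d)/2} \\
&\quad \le (2 \bar{K}^3 K^2 \beta)(2 \bar{K}^4 K) \frac{n}{R - n} \le (2 \bar{K}^3 K^2 \beta)(2 \bar{K}^4 K) \frac{a}{1 - a}.
\end{aligned}
\tag{5.38}
\]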


In case (2), since $n \ge k_1 \ge k_0$ we have $(s_1 - k_0 + 1)^{-d/2} \le K(R - k_0 + 1)^{-d/2}$, and the sum over $l_0$ in (5.37) is bounded by $\log(R - k_0 + 1) \le \bar K (R - k_0 + 1)^{\delta}$ for some fixed exponent $\delta$ (e.g., $\delta = 1/4$ suffices). Therefore the contribution of case (2) to the expression in (5.37) is bounded by
\[ (2\bar K^3 K^2\beta)(\bar K^5 K)\,n \sum_{k_0=0}^{k_1} \frac{R-n}{2}\,(R - k_0 + 1)^{(2\delta-d)/2} \le (2\bar K^3 K^3\beta)(\bar K^5 K)\,n\,\frac{R-n}{2}\,(R-n)^{(2\delta+2-d)/2} \le (2\bar K^3 K^3\beta)(\bar K^5 K)\,\frac{n}{R-n}\,(R-n)^{(2\delta+6-d)/2} \le (2\bar K^3 K^3\beta)(\bar K^5 K)\,\frac{a}{1-a}. \tag{5.39} \]

Putting (5.38) and (5.39) together, we get that (5.37) is bounded by $(2\bar K^3 K^3\beta)(3\bar K^5 K a/(1-a))$. Together with (5.35) this proves the $J = 1$ case of (5.29). To advance the induction, we assume now that (5.29) holds for an integer $J = M - 1 \ge 1$, and prove that it holds for $J = M$. Using $d > 6$, we insert the bound (5.29) into (5.33) to get that the right-hand side of (5.33) is bounded by
\[ (2\bar K^3 K\beta)(2\bar K^3 K^3\beta)^{M-1}\bigl(\bar K^3 + 3\bar K^5 K a/(1-a)\bigr) \sum_{k_{M-1}=0}^{k_M} \sum_{s_M=l_M}^{R-1} (s_M - k_{M-1} + 1)^{(2-d)/2} \le (2\bar K^3 K^3\beta)^{M}\bigl(\bar K^3 + 3\bar K^5 K a/(1-a)\bigr). \tag{5.40} \]
This completes the proof of (5.29).

Proof of (5.26). It follows immediately from Lemmas 5.5–5.6 that (5.26) holds with $c_2 = \bar K^3 + 3\bar K^5 K a/(1-a)$ and $c_3 = 2\bar K^3 K^3\beta$. Recall that the constant K = 5 of (5.20) is independent of $d$ and $L$. Choosing $\beta$ small ensures that $0 < c_3 < \frac{1}{2}$. This proves (5.26), and thus completes the proof of Proposition 3.3.

Acknowledgements. The work of MTB, AAJ and GS was supported in part by NSERC of Canada. The work of TK was supported in part by the Ministry of Education, Culture, Sports, Science and Technology of Japan, Grant-in-Aid 18654018 (Houga). We thank an anonymous referee for suggesting several improvements to the exposition.

References 1. Aizenman, M., Newman, C.M.: Tree graph inequalities and critical behavior in percolation models. J. Statist. Phys. 36, 107–143 (1984) 2. Aldous, D., Fill, J.: Reversible Markov Chains and Random Walks on Graphs. Book in preparation, available at http://www.stat.berkeley.edu/~aldous/RWG/book.html, 2003 3. Alexander, S., Orbach, R.: Density of states on fractals: “fractons”. J. Physique (Paris) Lett. 43, L625–L631 (1982) 4. Angel, O., Goodman, J., den Hollander, F., Slade, G.: Invasion percolation on regular trees. Ann. Probab., to appear


5. Barlow, M.T.: Random walks on supercritical percolation clusters. Ann. Probab. 32, 3024–3084 (2004) 6. Barlow, M.T., Coulhon, T., Kumagai, T.: Characterization of sub-Gaussian heat kernel estimates on strongly recurrent graphs. Comm. Pure Appl. Math. 58, 1642–1677 (2005) 7. Barlow, M.T., Kumagai, T.: Random walk on the incipient infinite cluster on trees. Illinois J. Math. 50, 33–65 (2006) 8. van den Berg, J., Kesten, H.: Inequalities with applications to percolation and reliability. J. Appl. Prob. 22, 556–569 (1985) 9. Berger, N., Biskup, M.: Quenched invariance principle for simple random walk on percolation clusters. Prob. Theory Related Fields 137, 83–120 (2007) 10. Berger, N., Gantert, N., Peres, Y.: The speed of biased random walk on percolation clusters. Probab. Theory Related Fields 126, 221–242 (2003) 11. Bezuidenhout, C., Grimmett, G.: The critical contact process dies out. Ann. Probab. 18, 1462–1482 (1990) 12. Billingsley, P.: Probability and Measure. 3rd edition, New York: John Wiley and Sons, 1995 13. Croydon, D.: Volume growth and heat kernel estimates for the continuum random tree. Probab. Theory Related Fields. 140(1–2), 207–238 (2008) 14. Croydon, D.: Convergence of simple random walks on random discrete trees to Brownian motion on the continuum random tree. Ann. Inst. H. Poincaré Probab. Statist., to appear 15. Doyle, P.G., Snell, J.L.: Random Walks and Electric Networks. Washington DC: Mathematical Association of America, 1984; avilable at http://arxiv.org/abs/math/0001057v1, 2000 16. Fortuin, G., Kastelyn, P., Ginibre, J.: Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22, 89–103 (1971) 17. de Gennes, P.G.: La percolation: un concept unificateur. La Recherche 7, 919–927 (1976) 18. Grimmett, G.: Percolation. 2nd ed., Berlin: Springer, 1999 19. Grimmett, G., Hiemer, P.: Directed percolation and random walk. In: V. Sidoravicius, editor, In and Out of Equilibrium, Boston: Birkhäuser, pp. 273–297, 2002 20. 
van der Hofstad, R.: Infinite canonical super-Brownian motion and scaling limits. Commun. Math. Phys. 265, 547–583 (2006) 21. van der Hofstad, R., den Hollander, F., Slade, G.: Construction of the incipient infinite cluster for spreadout oriented percolation above 4 + 1 dimensions. Commun. Math. Phys. 231, 435–461 (2002) 22. van der Hofstad, R., den Hollander, F., Slade, G.: The survival probability for critical spread-out oriented percolation above 4 + 1 dimensions. I. Induction. Probab. Theory Related Fields 138, 363–389 (2007) 23. van der Hofstad, R., den Hollander, F., Slade, G.: The survival probability for critical spread-out oriented percolation above 4 + 1 dimensions. II. Expansion. Ann. Inst. H. Poincaré Probab. Statist. 43, 509–570 (2007) 24. van der Hofstad, R., Járai, A.A.: The incipient infinite cluster for high-dimensional unoriented percolation. J. Statist. Phys. 114, 625–663 (2004) 25. van der Hofstad, R., Slade, G.: A generalised inductive approach to the lace expansion. Probab. Theory Related Fields 122, 389–430 (2002) 26. van der Hofstad, R., Slade, G.: Convergence of critical oriented percolation to super-Brownian motion above 4 + 1 dimensions. Ann. Inst. H. Poincaré Probab. Statist. 39, 415–485 (2003) 27. Hughes, B.D.: Random Walks and Random Environments. Volume 2: Random Environments. Oxford: Oxford University Press, 1996 28. Janssen, H.-K., Täuber, U.C.: The field theory approach to percolation processes. Ann. Phys. 315, 147–192 (2005) 29. Kesten, H.: The incipient infinite cluster in two-dimensional percolation. Probab. Theory Related Fields 73, 369–394 (1986) 30. Kesten, H.: Subdiffusive behavior of random walk on a random cluster. Ann. Inst. H. Poincaré Probab. Statist. 22, 425–487 (1986) 31. Kigami, J.: Analysis on Fractals. Cambridge: Cambridge University Press, 2001 32. Kumagai, T., Misumi, J.: Heat kernel estimates for strongly recurrent random walk on random media, preprint, 2007 33. 
Lyons, R., Peres, Y.: Probability on Trees and Networks. Book in preparation, available at http://mypage. iu.edu/~rdlyons/prbtree/prbtree.html 34. Mathieu, P., Piatnitski, A.: Quenched invariance principles for random walks on percolation clusters. Proc. Roy. Soc. A 463, 2287–2307 (2007) 35. Sidoravicius, V., Sznitman, A.-S.: Quenched invariance principles for walks on clusters of percolation or among random conductances. Probab. Theory Related Fields 129, 219–244 (2004) 36. Slade, G.: The Lace Expansion and its Applications. Lecture Notes in Mathematics Vol. 1879. Ecole d’Eté de Probabilités de Saint–Flour XXXIV–2004, Berlin: Springer, 2006 37. Telcs, A.: Volume and time doubling of graphs and random walks: the strongly recurrent case. Comm. Pure Appl. Math. 54, 975–1018 (2001)


38. Telcs, A.: Local sub-Gaussian estimates on graphs: the strongly recurrent case. Electron. J. Probab. 6, paper 22 (2001) 39. Telcs, A.: A note on rough isometry invariance of resistance. Combin. Probab. Comput. 11, 427–432 (2002) Communicated by M. Aizenman

Commun. Math. Phys. 278, 433–451 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0404-2

Communications in

Mathematical Physics

Exponential Decay Towards Equilibrium for the Inhomogeneous Aizenman-Bak Model J. A. Carrillo1 , L. Desvillettes2 , K. Fellner3 1 ICREA (Institució Catalana de Recerca i Estudis Avançats) and Departament de Matemàtiques,

Universitat Autònoma de Barcelona, E-08193 Bellaterra, Spain. E-mail: [email protected]

2 CMLA, ENS Cachan, CNRS, PRES UniverSud, 61 Av. du Pdt. Wilson,

94235 Cachan Cedex, France. E-mail: [email protected]

3 Faculty of Mathematics, University of Vienna, Nordbergstr. 15, 1090 Wien, Austria.

E-mail: [email protected] Received: 18 August 2006 / Accepted: 25 July 2007 Published online: 8 January 2008 – © Springer-Verlag 2007

Abstract: The Aizenman-Bak model for reacting polymers is considered for spatially inhomogeneous situations in which they diffuse in space with a non-degenerate size-dependent coefficient. Both the break-up and the coalescence of polymers are taken into account with fragmentation and coagulation constant kernels. We demonstrate that the entropy-entropy dissipation method applies directly in this inhomogeneous setting, giving not only the necessary basic a priori estimates to start the smoothness and size decay analysis in one dimension, but also the exponential convergence towards global equilibria for constant diffusion coefficient in any spatial dimension or for non-degenerate diffusion in dimension one. We finally conclude by showing that solutions in the one-dimensional case are immediately smooth in time and space, while in size distribution solutions decay faster than any polynomial. To our knowledge, this is the first result of explicit equilibration rates for spatially inhomogeneous coagulation-fragmentation models.

1. Introduction

We analyze the spatially inhomogeneous version of a size-continuous model for reacting polymers or clusters of aggregates:
\[ \partial_t f - a(y)\,\Delta_x f = Q(f,f). \tag{1.1} \]

Here, $f = f(t,x,y)$ is the concentration of polymers/clusters with length/size $y \ge 0$ at time $t \ge 0$ and point $x \in \Omega \subset \mathbb{R}^d$, $d \ge 1$. These polymers/clusters diffuse in the environment $\Omega$. This set is assumed to be a smooth bounded domain with normalized volume, i.e., $|\Omega| = 1$. In the one-dimensional case, we will set $\Omega = (0,1)$. Equation (1.1) is to be considered with the homogeneous Neumann boundary condition
\[ \nabla_x f(t,x,y)\cdot\nu(x) = 0 \quad \text{on } \partial\Omega \tag{1.2} \]


with $\nu$ the outward unit normal to $\Omega$, so that there is no polymer flux through the physical boundary. We assume the diffusion coefficient $a(y)$ to be non-degenerate in the sense that there exist $a_*, a^* \in \mathbb{R}_+$ such that
\[ 0 < a_* \le a(y) \le a^*. \tag{1.3} \]

On the other hand, the reaction term $Q(f,f)$ of (1.1) models chemical degradation (break-up or fragmentation) and polymerization (coalescence or coagulation) of polymers/clusters. More precisely, the full collision operator reads as
\[ Q(f,f) = Q_c(f,f) + Q_b(f,f) = Q^+(f,f) - Q^-(f,f) = Q_c^+(f,f) - Q_c^-(f,f) + Q_b^+(f,f) - Q_b^-(f,f) \tag{1.4} \]

with obvious definitions of the coagulation $Q_c(f,f)$, fragmentation or break-up $Q_b(f,f)$, loss $Q^-(f,f)$ and gain $Q^+(f,f)$ operators, which are determined from the four basic terms in (1.4):

1. Coalescence of clusters of size $y' \le y$ and $y - y'$ results in clusters of size $y$:
\[ Q_c^+(f,f) := \int_0^y f(t,x,y-y')\,f(t,x,y')\,dy'. \tag{1.5} \]

2. Polymerization of clusters of size $y$ with other clusters of size $y'$ produces a loss in its concentration:
\[ Q_c^-(f,f) := 2 f(t,x,y) \int_0^\infty f(t,x,y')\,dy'. \tag{1.6} \]

3. Break-up of clusters of size $y'$ larger than $y$ contributes to create clusters of size $y$:
\[ Q_b^+(f,f) := 2 \int_y^\infty f(t,x,y')\,dy'. \tag{1.7} \]

4. Break-up of polymers of size $y$ reduces its concentration:
\[ Q_b^-(f,f) := y\,f(t,x,y). \tag{1.8} \]
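As an informal sanity check of the four terms (1.5)–(1.8), the following sketch discretizes the collision operator on a uniform size grid and evaluates it on the exponential profile $e^{-y}$, for which the four contributions should cancel. The grid, the truncation at $y = 20$, the trapezoid rule and the helper names `trapz` and `collision_operator` are illustrative assumptions, not part of the original analysis.

```python
import math

def trapz(values, h):
    """Composite trapezoid rule on a uniform grid with spacing h."""
    if len(values) < 2:
        return 0.0
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def collision_operator(f, h):
    """Discretized Q = Q_c^+ - Q_c^- + Q_b^+ - Q_b^- on the grid y_i = i*h."""
    n = len(f)
    total = trapz(f, h)  # approximates the number density, int_0^infty f dy
    q = []
    for i in range(n):
        y = i * h
        gain_c = trapz([f[i - j] * f[j] for j in range(i + 1)], h)  # (1.5)
        loss_c = 2.0 * f[i] * total                                  # (1.6)
        gain_b = 2.0 * trapz(f[i:], h)                               # (1.7), truncated tail
        loss_b = y * f[i]                                            # (1.8)
        q.append(gain_c - loss_c + gain_b - loss_b)
    return q

h, n = 0.02, 1001                             # size grid on [0, 20]
f_eq = [math.exp(-i * h) for i in range(n)]   # exponential profile e^{-y}
residual = max(abs(v) for v in collision_operator(f_eq, h))
print(residual)
```

The printed residual is only limited by the truncation of the size variable; for a non-exponential profile the same routine returns a non-zero reaction term.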

This kind of model finds its application not only in polymers and cluster aggregation in aerosols [S16,S17,AB,Al,Dr] but also in cell physiology [PS], population dynamics [Ok] and astrophysics [Sa]. Here, the fragmentation and coagulation kernels are all set to constants as in the original Aizenman-Bak model [AB]. This will be of paramount importance in the basic a-priori estimates. The conservation of the total number of monomers at time $t \ge 0$, quantified by
\[ \int_\Omega N(t,x)\,dx, \quad \text{where } N(t,x) := \int_0^\infty y\,f(t,x,y)\,dy, \]
is the basic conservation law satisfied by Eq. (1.1), since the reaction term (1.4) satisfies
\[ \int_\Omega \int_0^\infty y\,Q(f,f)\,dy\,dx = 0, \]
and thus, assuming initially a positive total number of monomers, we formally conclude
\[ \int_\Omega \int_0^\infty y\,f(t,x,y)\,dy\,dx = \int_\Omega N(t,x)\,dx = \int_\Omega N_0(x)\,dx := N_\infty > 0. \tag{1.9} \]

Another macroscopic quantity of interest is the number density of polymers,
\[ M(t,x) := \int_0^\infty f(t,x,y)\,dy, \tag{1.10} \]
that together with the total number of monomers $N(t,x)$ satisfies the reaction-diffusion system
\[ \partial_t N - \Delta_x \int_0^\infty y\,a(y)\,f(t,x,y)\,dy = 0, \tag{1.11} \]
\[ \partial_t M - \Delta_x \int_0^\infty a(y)\,f(t,x,y)\,dy = N - M^2, \tag{1.12} \]
becoming a closed decoupled system in the constant diffusion case ($a(y) := a$):
\[ \partial_t N - a\,\Delta_x N = 0, \tag{1.13} \]
\[ \partial_t M - a\,\Delta_x M = N - M^2. \tag{1.14} \]
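To illustrate the reaction part of (1.13)–(1.14), the sketch below integrates the spatially homogeneous reduction $dM/dt = N - M^2$ with constant $N$ by explicit Euler steps. Dropping the diffusion term, the step size, the final time and the function name `relax_number_density` are simplifying assumptions made only for this illustration.

```python
import math

def relax_number_density(m0, n_total, dt=1e-3, t_end=10.0):
    """Explicit Euler for dM/dt = N - M^2 with constant N (no spatial dependence)."""
    m = m0
    for _ in range(int(round(t_end / dt))):
        m += dt * (n_total - m * m)
    return m

n_total = 4.0
m_final = relax_number_density(m0=0.5, n_total=n_total)
print(m_final, math.sqrt(n_total))
```

In this reduced setting the number density relaxes to $\sqrt{N}$, which matches the moment relation $M = \sqrt{N}$ of the equilibrium profiles discussed in the text.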

The definition of the full collision operator has to be understood in the weak sense as
\[ \langle Q(f,f), \varphi \rangle = \int_0^\infty \int_0^\infty \bigl[ f(y'') - f(y)\,f(y') \bigr] \bigl( \varphi(y) + \varphi(y') - \varphi(y'') \bigr)\,dy\,dy' \tag{1.15} \]
for any smooth function $\varphi(y)$, where $y'' = y + y'$ and the dependence on $(t,x)$ of the density function has been dropped for notational convenience. An alternative weak formulation that can be useful in several arguments below is obtained by integrating by parts in the $Q_b^+$ part, giving
\[ \langle Q(f,f), \varphi \rangle = -2 \int_0^\infty \varphi(y)\,f(y)\,dy \int_0^\infty f(y')\,dy' + \int_0^\infty \int_0^\infty f(y)\,f(y')\,\varphi(y'')\,dy\,dy' + 2 \int_0^\infty f(y)\,\Phi(y)\,dy - \int_0^\infty y\,f(y)\,\varphi(y)\,dy \tag{1.16} \]

for any smooth function $\varphi$, the function $\Phi$ being the primitive of $\varphi$ ($\partial_y \Phi = \varphi$) such that $\Phi(0) = 0$. Let us consider the (free-energy) entropy functional associated to any positive density $f$ as
\[ H(f)(t,x) = \int_0^\infty (f \ln f - f)\,dy, \tag{1.17} \]
and the relative entropy $H(f|g) = H(f) - H(g)$ of two states $f$ and $g$ not necessarily with the same $L^1_y$-norm. Then, the entropy formally dissipates as
\[ \frac{d}{dt} \int_\Omega H(f)\,dx = - \int_\Omega \int_0^\infty a(y)\,\frac{|\nabla_x f|^2}{f}\,dy\,dx - \int_\Omega \int_0^\infty \int_0^\infty (f'' - f f') \ln\frac{f''}{f f'}\,dy\,dy'\,dx := -D_H(f) \tag{1.18} \]
with the obvious notations $f = f(y)$, $f' = f(y')$ and $f'' = f(y + y')$.


Global existence and uniqueness of classical solutions has been studied in [Am,AW] for some particular cases, namely, for constant diffusion coefficient or dimension one with additional restrictions for the coagulation and fragmentation kernel not including the AB model. The initial boundary-value problem (1.1)–(1.2) was then analyzed in [LM02-1] for much more general coagulation and fragmentation kernels including the AB model (1.5)–(1.8), proving the global existence of weak solutions satisfying the entropy dissipation inequality
\[ \int_\Omega H(f(t))\,dx + \int_0^t D_H(f(s))\,ds \le \int_\Omega H(f_0)\,dx \]
for all $t \ge 0$. The equilibrium states for which the entropy dissipation vanishes are better understood after applying a remarkable inequality proven in [AB, Props. 4.2 and 4.3]. A modified version of this inequality (reviewed in Sect. 2) reads:
\[ \int_0^\infty \int_0^\infty (f'' - f f') \ln\frac{f''}{f f'}\,dy\,dy' \ge M\,H(f | f_{\sqrt N,N}) + 2\bigl(M - \sqrt N\bigr)^2. \tag{1.19} \]
Herein, $f_{\sqrt N,N}$ denotes a distinguished, exponential-in-size distribution whose moments are exactly $M = \sqrt N$ and $N$:
\[ f_{\sqrt N,N}(t,x,y) = e^{-y/\sqrt N}. \]
These distributions $f_{\sqrt N,N}$ appear as analogues to the so-called intermediate or local equilibria in the study of inhomogeneous kinetic equations (e.g. [DV01,CCG,FNS,DV05,FMS,NS]). Finally, the conservation of mass (1.9) identifies (at least formally) the global equilibrium $f_\infty$ with constant moments $M_\infty^2 = N = N_\infty$:
\[ f_\infty = e^{-y/\sqrt{N_\infty}}. \tag{1.20} \]

The analogy to intermediate equilibria carries over to the following additivity of relative entropies:
\[ H(f | f_\infty) = H(f | f_{\sqrt N,N}) + H(f_{\sqrt N,N} | f_\infty). \tag{1.21} \]
It is worth pointing out that even if $f_{\sqrt N,N}$ and $f_\infty$ do not have the same $L^1_y$-norm, the global relative entropy
\[ \int_\Omega H(f_{\sqrt N,N} | f_\infty)\,dx = 2\Bigl( \sqrt{\int_\Omega N\,dx} - \int_\Omega \sqrt N\,dx \Bigr) \ge 0 \]
is a nonnegative quantity, as is easily checked via Jensen's inequality. In [LM02-1], it is proved that $f_\infty$ attracts all global weak solutions in $L^1(\Omega \times (0,\infty))$ of (1.1)–(1.2), but no time decay rate is obtained. This result is the analogue of the convergence results along subsequences for the classical Boltzmann equation in [De]. Other existence and uniqueness results for inhomogeneous coagulation-fragmentation models were given in [CD] and the references therein. Finally, let us mention that the conservation law (1.9) is known not to hold for certain coagulation-fragmentation kernels, a phenomenon known as gelation [ELMP], and the convergence or not towards typical self-similar profiles for the pure coagulation models is a related issue; we refer


to [Le,LM05,MP]. We refer finally to [LM02-1,LM03,LM04] for an extensive list of related literature. Let us now discuss some works on the study of the long time asymptotics for related models. Qualitative results concerning a discrete version of a coagulation-fragmentation system, the Becker-Döring system, have been obtained in [CP,LW,LM02-2] and the references therein. We emphasize that global explicit decay estimates towards equilibrium were obtained for the Becker-Döring system without diffusion in [JN] by entropy-entropy dissipation methods. Other techniques have recently been developed for inhomogeneous kinetic equations. We refer to [MN] for a spectral approach and to [V06] for a general description of the concept of hypocoercivity. Note that the presence of diffusion instead of advection makes it possible in our present context, not to use the concept of hypocoercivity. In this work we prove exponential decay towards equilibrium with explicit rates and constants. Our key result, Lemma 2 in Sect. 2, establishes a functional inequality between entropy and entropy dissipation provided lower and upper bounds on the moment M and (1.3). We are able to apply this functional inequality to solutions of (1.1)-(1.2) in the next two situations. In the special case of size-independent diffusion coefficients a(y) = a, we show, as the first application of Lemma 2, the exponential decay towards equilibrium in all space dimensions d ≥ 1 by exploiting the closed system (1.13)–(1.14) for N and M in Sect. 2. In the case of general diffusion coefficients a(y) satisfying (1.3) we prove in Sect. 3 a-priori estimates in the one-dimensional case d = 1, which entail an entropy-entropy dissipation estimate with a constant sufficient to conclude exponential decay via a suitable Gronwall argument (see Sect. 4). These two cases are summarized in the following theorem: Theorem 1. 
Let $\Omega$ be a smooth bounded connected open set of $\mathbb{R}^d$, $d \ge 1$, and assume a constant diffusion coefficient $a(y) = a > 0$, or let $\Omega$ be the interval $(0,1)$ and consider a diffusion coefficient satisfying (1.3). Let us also assume that $f_0 \neq 0$ is a nonnegative initial datum such that $(1 + y + \ln f_0)\,f_0 \in L^1((0,1) \times (0,\infty))$. In the case $a(y) = a > 0$ assume further that the initial moments $M_0(x)$ and $N_0(x)$ are $L^\infty(\Omega)$-functions. Then, the global weak solutions $f(t,x,y)$ of (1.1)–(1.2) decay exponentially to the global equilibrium state (1.20) with explicitly computable constants $C_1$, $C_2$ and rate $\alpha$, both in global relative entropy:
\[ \int_\Omega H(f(t) | f_\infty)\,dx \le C_1 \int_\Omega H(f_0 | f_\infty)\,dx\; e^{-\alpha t}, \tag{1.22} \]
and in the $L^1_{x,y}$ sense:
\[ \| f(t,\cdot,\cdot) - f_\infty \|_{L^1_{x,y}} \le C_2 \int_\Omega H(f_0 | f_\infty)\,dx\; e^{-\frac{\alpha}{2} t} \tag{1.23} \]
for all $t \ge 0$, where $f_\infty$ is defined by (1.20) and $N_\infty > 0$ is determined by the conservation of mass (1.9).

In the one-dimensional case, it is further possible to interpolate the exponential decay in a “weak” norm like $L^1$ with polynomially growing bounds in “strong” norms like (weighted) $L^1_y(H^1_x)$ in order to get an exponential decay in a “medium” norm like $L^1_y(L^\infty_x)$. Thus, the decay toward equilibrium can be extended to these stronger norms. The following proposition is proved at the end of Sect. 4:


Proposition 1. Under the assumptions of Theorem 1 for the case $d = 1$, for all $t_* > 0$ and $q \ge 0$, there are explicitly computable constants $C_3, \alpha > 0$ such that whenever $t \ge t_*$,
\[ \int_0^\infty (1 + y)^q\, \| f(t,\cdot,y) - f_\infty(y) \|_{L^\infty_x}\,dy \le C_3\, e^{-\alpha t}. \tag{1.24} \]
A bootstrap argument in the spirit of the proof of Proposition 1 allows one to replace the $L^\infty_x$ norm by any Sobolev norm in (1.24).

2. Entropy-Entropy Dissipation Estimate

Please note that in this section we will systematically use the shortcuts
\[ \overline{M} = \int_\Omega \int_0^\infty f(x,y)\,dy\,dx, \qquad \overline{N} = \int_\Omega \int_0^\infty y\,f(x,y)\,dy\,dx. \]

We start by reminding the reader of the following functional inequality:

Lemma 1 ([AB, Prop. 4.3]). Let $g := g(y)$ be a function of $L^1_+((0,\infty))$ with finite entropy $g \ln g \in L^1((0,\infty))$. Then
\[ \int_0^\infty \int_0^\infty g(y)\,g(y') \ln g(y + y')\,dy\,dy' \le \int_0^\infty g(y)\,dy \int_0^\infty g(y') \ln g(y')\,dy' - \Bigl( \int_0^\infty g(y)\,dy \Bigr)^2. \tag{2.1} \]
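The inequality (2.1) can be probed numerically. The sketch below evaluates both sides with a midpoint rule for the sample density $g(y) = y e^{-y}$; the choice of $g$, the grid spacing, the truncation at $y = 30$ and the function name `lemma1_sides` are assumptions made for illustration only.

```python
import math

def lemma1_sides(g, h=0.05, y_max=30.0):
    """Midpoint-rule approximation of the two sides of (2.1) for a density g."""
    n = int(round(y_max / h))
    ys = [(i + 0.5) * h for i in range(n)]  # midpoints avoid y = 0
    gs = [g(y) for y in ys]
    mass = h * sum(gs)
    entropy_part = h * sum(v * math.log(v) for v in gs if v > 0.0)
    lhs = 0.0
    for yi, gi in zip(ys, gs):
        row = 0.0
        for yj, gj in zip(ys, gs):
            row += gj * math.log(g(yi + yj))
        lhs += gi * row
    lhs *= h * h
    rhs = mass * entropy_part - mass ** 2
    return lhs, rhs

lhs, rhs = lemma1_sides(lambda y: y * math.exp(-y))
print(lhs, rhs)
```

For pure exponentials $g(y) = c\,e^{-\lambda y}$ a direct computation gives equality in (2.1), so a non-exponential sample such as the one above is the more informative test.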

This inequality allows one to show the dissipation inequality (1.19). Following the original paper [AB] or the survey [LM04], one finds that
\[ \int_0^\infty \int_0^\infty (f'' - f f') \ln\frac{f''}{f f'}\,dy\,dy' \ge M\,H(f | f_{\sqrt N,N}) + \bigl(M - \sqrt N\bigr)^2 + M^2 \Bigl( \frac{N}{M^2} \ln\frac{N}{M^2} + 1 - \frac{N}{M^2} \Bigr). \tag{2.2} \]
In fact, after expanding the left-hand side of (2.2), one applies Lemma 1 to the term
\[ - \int_0^\infty \int_0^\infty f f' \ln f''\,dy\,dy', \]
while for the term
\[ \int_0^\infty \int_0^\infty f f'\,\frac{f''}{f f'} \ln\frac{f''}{f f'}\,dy\,dy' \]
one uses Jensen's inequality for the convex function $x \ln x$ and further that
\[ \int_0^\infty \int_0^\infty f''\,dy\,dy' = N. \]
Then, after directly calculating the remaining terms one obtains (2.2), as in [LM04], and moreover the inequality (1.19) when applying the elementary inequality $x \ln(x) + 1 - x \ge (1 - \sqrt{x})^2$ for $x \ge 0$ to the last term on the right-hand side of (2.2).
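The elementary inequality $x \ln x + 1 - x \ge (1 - \sqrt{x})^2$ used in the last step can be checked on a grid. The sketch below is only a numerical illustration; the grid on $[0,10]$ and the helper name `gap` are assumptions.

```python
import math

def gap(x):
    """Difference (x ln x + 1 - x) - (1 - sqrt(x))^2, with x ln x := 0 at x = 0."""
    xlogx = x * math.log(x) if x > 0.0 else 0.0
    return (xlogx + 1.0 - x) - (1.0 - math.sqrt(x)) ** 2

xs = [i * 0.01 for i in range(0, 1001)]  # grid on [0, 10]
worst = min(gap(x) for x in xs)
print(worst)
```

Substituting $x = s^2$ turns the gap into $2s(s \ln s - s + 1)$, which is nonnegative and vanishes only at $s = 0$ and $s = 1$.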


For the subsequent large-time analysis, we will rather study the relative entropy with respect to the global equilibrium, which dissipates according to (1.18) and (1.19) as
\[ \frac{d}{dt} \int_\Omega H(f | f_\infty)\,dx \le - \int_\Omega \int_0^\infty a(y)\,\frac{|\nabla_x f|^2}{f}\,dy\,dx - \int_\Omega \Bigl[ M\,H(f | f_{\sqrt N,N}) + 2\bigl(M - \sqrt N\bigr)^2 \Bigr] dx := -D(f). \tag{2.3} \]

We introduce a lemma that allows us to estimate the entropy of $f$ by means of its entropy dissipation. This is a functional estimate; that is, the function $f$ in this lemma does not depend on $t$ and need not be related to the solution of our equation.

Lemma 2. Assume (1.3). Let $f := f(x,y) \ge 0$ be a measurable function with moments satisfying $0 < M_* \le M(x) \le \|M\|_{L^\infty_x}$ and $0 < N_\infty = \overline{N}$. Then, the following entropy-entropy dissipation estimate holds:
\[ D(f) \ge \frac{C(M_*, N_\infty, a_*, P(\Omega))}{\|M\|_{L^\infty_x}} \int_\Omega H(f | f_\infty)\,dx, \tag{2.4} \]
with a constant $C(M_*, N_\infty, a_*, P(\Omega))$ depending only on $M_*$, $N_\infty$, $a_*$ and the Poincaré constant $P(\Omega)$.

Proof. Step 1. We start with the right-hand side of (2.4) by using the additivity (1.21) and calculating
\[ \int_\Omega H(f | f_\infty)\,dx = \int_\Omega H(f | f_{\sqrt N,N})\,dx + 2\Bigl( \sqrt{\overline{N}} - \overline{\sqrt N} \Bigr). \tag{2.5} \]

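Step 2. The second term of (2.5) is bounded as:
\[ \sqrt{\overline{N}} - \overline{\sqrt N} \le \frac{2}{\sqrt{N_\infty}} \Bigl( \| M - \sqrt N \|_{L^2_x}^2 + \| M - \overline{M} \|_{L^2_x}^2 \Bigr). \tag{2.6} \]
Indeed, since $\sqrt N - \overline{\sqrt N}$ is orthogonal to $\overline{\sqrt N} - \overline{M}$ in $L^2_x$, we have
\[ \sqrt{\overline{N}} - \overline{\sqrt N} = \frac{\| \sqrt N - \overline{\sqrt N} \|_{L^2_x}^2}{\sqrt{\overline{N}} + \overline{\sqrt N}} \le \frac{1}{\sqrt{N_\infty}} \| \sqrt N - \overline{\sqrt N} \|_{L^2_x}^2 \le \frac{1}{\sqrt{N_\infty}} \| \sqrt N - \overline{M} \|_{L^2_x}^2, \]
and further, we obtain (2.6) by expanding $\| \sqrt N - \overline{M} \|_{L^2_x}^2$ and Young's inequality
\[ \frac{1}{2} \| \sqrt N - \overline{M} \|_{L^2_x}^2 - \| M - \overline{M} \|_{L^2_x}^2 \le \| \sqrt N - \overline{M} - M + \overline{M} \|_{L^2_x}^2. \]
Thus, we obtain (using $0 < M_* < M$)
\[ \int_\Omega H(f | f_\infty)\,dx \le \max\bigl\{ M_*^{-1}, 2 N_\infty^{-1/2} \bigr\} \Bigl[ \int_\Omega M\,H(f | f_{\sqrt N,N})\,dx + 2 \| M - \sqrt N \|_{L^2_x}^2 \Bigr] + \frac{4}{\sqrt{N_\infty}} \| M - \overline{M} \|_{L^2_x}^2 \]
\[ \le \max\bigl\{ M_*^{-1}, 2 N_\infty^{-1/2} \bigr\} \int_\Omega \int_0^\infty \int_0^\infty (f'' - f f') \ln\frac{f''}{f f'}\,dy\,dy'\,dx + \frac{4}{\sqrt{N_\infty}} \| M - \overline{M} \|_{L^2_x}^2, \tag{2.7} \]
by the inequality (1.19).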
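The bound (2.6) only uses that $|\Omega| = 1$, so it can be tested on piecewise-constant data: cell values $M_i$, $N_i$ on a uniform partition of $\Omega$, with integrals replaced by arithmetic means. The sketch below does this for random samples; the sampling ranges, the fixed seed and the function names are assumptions for illustration.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def both_sides_of_2_6(m_vals, n_vals):
    """Return (lhs, rhs) of (2.6) for cell values M_i, N_i on a uniform partition."""
    n_bar = mean(n_vals)  # plays the role of N_infinity
    sqrt_n = [math.sqrt(v) for v in n_vals]
    m_bar = mean(m_vals)
    lhs = math.sqrt(n_bar) - mean(sqrt_n)
    dist_m_sqrt_n = mean([(m - s) ** 2 for m, s in zip(m_vals, sqrt_n)])
    var_m = mean([(m - m_bar) ** 2 for m in m_vals])
    rhs = 2.0 / math.sqrt(n_bar) * (dist_m_sqrt_n + var_m)
    return lhs, rhs

rng = random.Random(0)
results = []
for _ in range(200):
    n_vals = [rng.uniform(0.1, 5.0) for _ in range(50)]
    m_vals = [rng.uniform(0.1, 5.0) for _ in range(50)]
    results.append(both_sides_of_2_6(m_vals, n_vals))
all_hold = all(lhs <= rhs + 1e-12 for lhs, rhs in results)
print(all_hold)
```

The left-hand side is nonnegative by Jensen's inequality, and it vanishes when $N$ is constant in space, in which case the bound is trivially satisfied.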


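Step 3. Next, the variance of $M$, i.e. the last term on the right-hand side of (2.7), is controlled by the first, “Fisher”-type term of (2.3). Denoting by $P(\Omega)$ the constant of Poincaré's inequality, we estimate, using the Cauchy-Schwarz inequality,
\[ \int_\Omega \int_0^\infty a(y)\,\frac{|\nabla_x f|^2}{f}\,dy\,dx \ge \frac{a_*}{\|M\|_{L^\infty_x}} \int_\Omega \Bigl( \int_0^\infty \frac{|\nabla_x f|^2}{f}\,dy \Bigr) \Bigl( \int_0^\infty f\,dy \Bigr) dx \ge \frac{a_*}{\|M\|_{L^\infty_x}} \int_\Omega \Bigl| \int_0^\infty \nabla_x f\,dy \Bigr|^2 dx = \frac{a_*}{\|M\|_{L^\infty_x}} \int_\Omega |\nabla_x M|^2\,dx \ge \frac{a_*}{P(\Omega)\,\|M(t,\cdot)\|_{L^\infty_x}} \| M - \overline{M} \|_{L^2_x}^2. \tag{2.8} \]
We remark that the seemingly more natural estimate
\[ \int_\Omega \int_0^\infty a(y)\,\frac{|\nabla_x f|^2}{f}\,dy\,dx \ge \frac{4}{P(\Omega)} \bigl\| \sqrt M - \overline{\sqrt M} \bigr\|_{L^2_x}^2 \]
provides a bound which does not seem sufficient to conclude as in Step 2.

Step 4. Finally, combining (2.7) and (2.8), we have
\[ \int_\Omega H(f | f_\infty)\,dx \le \max\Bigl\{ M_*^{-1},\, 2 N_\infty^{-1/2},\, \frac{4 P(\Omega)\,\|M(t,\cdot)\|_{L^\infty_x}}{a_* \sqrt{N_\infty}} \Bigr\}\, D(f), \]
which yields the proof of Lemma 2.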

Now, let us directly apply this entropy-entropy dissipation estimate to prove the constant diffusion part of Theorem 1. In the constant diffusion case, the equations for the first two moments $M(t,x)$ and $N(t,x)$ become the closed system (1.13)–(1.14). The existence and uniqueness of global, classical solutions with global $L^\infty$ bounds from below and above are standard thanks to the maximum principle applied to the equations for $N$ and further for $M$. We refer, for instance, to [Ro,Ki] for details, to conclude with:

Lemma 3. Let $\Omega$ be a smooth bounded connected open set of $\mathbb{R}^d$, $d \ge 1$, and let us assume that the initial data $M_0(x)$ and $N_0(x) \neq 0$ are nonnegative $L^\infty(\Omega)$-functions. Then, there exist increasing functions $t \mapsto M_*(t), N_*(t)$ and decreasing functions $t \mapsto M^*(t), N^*(t)$ such that the unique global bounded solutions of the system (1.13)–(1.14) satisfy
\[ 0 < M_*(t) \le M(t,x) \le M^*(t) < \infty, \tag{2.9} \]
\[ 0 < N_*(t) \le N(t,x) \le N^*(t) < \infty, \tag{2.10} \]
for all $t > 0$.

Proof of Theorem 1. Case $a(y) = a$ constant, $d \ge 1$. Let us fix $t_* > 0$. From (2.9)–(2.10), we have $0 < M_* \le M(t,x) \le M^* < \infty$ and $0 < N_* \le N(t,x) \le N^* < \infty$ for all $t \ge t_*$, and thus
\[ D(f) \ge \frac{C(M_*, N_\infty, a_*, P(\Omega))}{M^*} \int_\Omega H(f | f_\infty)\,dx \]
for all $t \ge t_*$ due to (2.4) with the constant $C(M_*, N_\infty, a_*, P(\Omega))$ given in Lemma 2. As a direct consequence, we get
\[ \frac{d}{dt} \int_\Omega H(f | f_\infty)\,dx \le - \frac{C(M_*, N_\infty, a_*, P(\Omega))}{M^*} \int_\Omega H(f | f_\infty)\,dx \]
for all $t \ge t_*$. Gronwall's lemma implies estimate (1.22).


Next, convergence in $L^1$ as stated in Theorem 1 follows from the functional inequality of Csiszár-Kullback type [Cs,Ku]:
\[ \| f(t,\cdot,\cdot) - f_\infty \|_{L^1_{x,y}}^2 \le 2 \Bigl( \int_\Omega M(t,x)\,dx + \sqrt{N_\infty} \Bigr) \int_\Omega H(f(t) | f_\infty)\,dx. \tag{2.11} \]
The proof is standard, see [CCD] for related inequalities, and it is shown via a Taylor expansion of the function $\varphi(f) = f \ln(f) - f$ up to second order around $f_\infty$. Indeed, for a function $\zeta(x,y) \in (\inf\{f(x,y), f_\infty(y)\}, \sup\{f(x,y), f_\infty(y)\})$, we get
\[ \int_\Omega H(f | f_\infty)\,dx = \int_\Omega \int_0^\infty \Bigl[ -\frac{y}{\sqrt{N_\infty}} (f - f_\infty) + \frac{1}{2\zeta} (f - f_\infty)^2 \Bigr] dy\,dx, \]
and the first term vanishes due to the conservation law (1.9). For the second term, we apply Hölder's inequality
\[ \| f - f_\infty \|_{L^1_{x,y}}^2 \le \| \zeta \|_{L^1_{x,y}} \int_\Omega \int_0^\infty \frac{1}{\zeta} (f - f_\infty)^2\,dy\,dx \quad \text{with} \quad \| \zeta \|_{L^1_{x,y}} \le \int_\Omega M\,dx + \sqrt{N_\infty}. \]
Noticing that $t \in [0, t_*] \mapsto f(t,\cdot,\cdot) \in L^1_{x,y}$ is bounded, we finally get (1.23), which concludes the proof of Theorem 1.

Remark 1. As a consequence of the previous result, we also showed that the unique global bounded solutions of the system (1.13)–(1.14) satisfy $M(t,x) \to M_\infty = \sqrt{N_\infty}$ and $N(t,x) \to N_\infty$ as $t \to \infty$ in $L^1(\Omega)$ exponentially fast with explicit constants. In fact, we first remark that $H(f | f_\infty) = H(f | f_{M,N}) + H(f_{M,N} | f_\infty)$ with
\[ E_S := \int_\Omega H(f_{M,N} | f_\infty)\,dx = \int_\Omega \Bigl[ 2\sqrt N\,(\xi \ln \xi - \xi + 1) + \frac{1}{\sqrt{N_\infty}} \bigl( \sqrt N - \sqrt{N_\infty} \bigr)^2 \Bigr] dx, \]
where $\xi = M/\sqrt N$ and
\[ f_{M,N}(t,x,y) = \frac{M^2}{N}\, e^{-\frac{M}{N} y}. \]
It is obvious that $E_S$ is nonnegative since the minimum of $\xi \ln \xi - \xi + 1$ is zero, and it can be written as
\[ \int_\Omega H(f_{M,N} | f_\infty)\,dx = \int_\Omega \Bigl[ M \ln\frac{M^2}{N} - 2\bigl(M - \sqrt N\bigr) \Bigr] dx + 2 \Bigl( \sqrt{N_\infty} - \int_\Omega \sqrt N\,dx \Bigr), \]
by using the conservation of mass (1.9). Since $H(f | f_{M,N}) \ge 0$, (1.22) implies the exponential convergence to zero of $E_S$ by the above additivity property. Finally, a simple Taylor expansion shows that, for all $t \ge t_*$,
\[ \| \sqrt{N(t)} - M(t) \|_{L^2_x}^2 + \| \sqrt{N(t)} - \sqrt{N_\infty} \|_{L^2_x}^2 \le L \int_\Omega H(f_{M,N}(t) | f_\infty)\,dx, \]
with
\[ L = \max\Bigl\{ 1, \frac{M^*}{\sqrt{N_*}}, \sqrt{N^*} \Bigr\}, \]
which implies by trivial arguments the exponential convergence in $L^1(\Omega)$ towards equilibrium for $M$ and $N$. In fact, the system (1.13)–(1.14) might have been studied by a direct application of the techniques in [DF05,DF06].


3. A-priori Estimates

In the sequel, we shall discuss the case of a general diffusion coefficient, i.e., size-dependent and satisfying (1.3), but we restrict to the one-dimensional case $d = 1$ (we shall not recall this fact in the various lemmas). We begin the proof of Theorem 1.

Lemma 4. Assume that $f_0 \neq 0$ is a nonnegative initial datum such that $(1 + y)\,f_0 \in L^1((0,1) \times (0,\infty))$. Then, there exists $M_0^* > 0$ such that solutions of (1.1)–(1.8) satisfy
\[ \sup_{t \ge 0} \int_0^1 \int_0^\infty f(t,x,y)\,dy\,dx = \sup_{t \ge 0} \int_0^1 M(t,x)\,dx \le M_0^*. \tag{3.1} \]

Proof. We estimate the $L^1(\Omega)$-norm of $M(t,x)$ by integrating equality (1.14), obtaining
\[ \frac{d}{dt} \int_0^1 M(t,x)\,dx = \int_0^1 N(t,x)\,dx - \int_0^1 M(t,x)^2\,dx \le \int_0^1 N_0(x)\,dx - \Bigl( \int_0^1 M(t,x)\,dx \Bigr)^2 \]
by Hölder's inequality and the conservation of mass (1.9). Therefore, for all $t \ge 0$,
\[ \int_0^1 M(t,x)\,dx \le \max\Bigl\{ \int_0^1 M_0(x)\,dx,\ \Bigl( \int_0^1 N_0(x)\,dx \Bigr)^{1/2} \Bigr\} := M_0^*, \]
showing (3.1).

We now turn to a control of the $L^1_y(L^\infty_x)$-norm of $f$:

Lemma 5. Assume that the nonnegative initial datum $f_0 \neq 0$ satisfies $(1 + y + \ln f_0)\,f_0 \in L^1((0,1) \times (0,\infty))$. Then, the number density of polymers satisfies $M \in L^1 + L^\infty(0,\infty; L^\infty(0,1))$. More precisely, there exist $m_\infty > 0$ and an $L^1_+(0,\infty)$-function $m_1(t)$ such that the solution of (1.1)–(1.8) satisfies
\[ \int_0^\infty \sup_{0<x<1} f(t,x,y)\,dy \le m_\infty + m_1(t), \tag{3.2} \]
and as a consequence, $\| M(t,\cdot) \|_{L^\infty_x} \le m_\infty + m_1(t)$ for a.e. $t \ge 0$.

Proof. In order to estimate the $L^\infty_x$-norm of $M(t,x)$, we first use the entropy dissipation (2.3) of $H(f | f_\infty)$ to deduce that
\[ 2 \int_0^\infty \int_0^1 \int_0^\infty \bigl( \partial_x \sqrt f \bigr)^2 dy\,dx\,dt \le \int_0^\infty \int_0^1 \int_0^\infty \frac{a(y)}{2 a_*}\,\frac{(\partial_x f)^2}{f}\,dy\,dx\,dt \le \frac{H(f_0 | f_\infty)}{2 a_*} := \mu_1. \tag{3.3} \]


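Now, we integrate
\[ \sqrt{f(t,x,y)} - \sqrt{f(t,\tilde x,y)} = \int_{\tilde x}^{x} \partial_x \sqrt{f(t,\xi,y)}\,d\xi \]
with respect to $\tilde x \in (0,1)$ and estimate
\[ \sup_{0<x<1} \Bigl[ \sqrt{f(t,x,y)} - \int_0^1 \sqrt{f(t,\tilde x,y)}\,d\tilde x \Bigr]^2 \le \int_0^1 \bigl( \partial_x \sqrt{f(t,\xi,y)} \bigr)^2 d\xi. \]
Hence, after further integration with respect to $y \in (0,\infty)$, we apply Young's and Hölder's inequalities to show
\[ \int_0^\infty \sup_{0<x<1} f(t,x,y)\,dy \le \underbrace{2 \int_0^\infty \int_0^1 \bigl( \partial_x \sqrt{f(t,\xi,y)} \bigr)^2 d\xi\,dy}_{:= m_1(t)} + \underbrace{2 \int_0^\infty \Bigl( \int_0^1 \sqrt{f(t,\tilde x,y)}\,d\tilde x \Bigr)^2 dy}_{\le m_\infty}. \]
In particular, we have, due to (3.1) and (3.3), that
\[ \int_0^\infty m_1(t)\,dt \le \mu_1 = \frac{H(f_0 | f_\infty)}{2 a_*}, \qquad m_\infty = 2 M_0^*. \]
Finally,
\[ \| M(t,\cdot) \|_{L^\infty_x} \le \int_0^\infty \sup_{0<x<1} f(t,x,y)\,dy, \]
which completes the proof of Lemma 5.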
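The one-dimensional estimate behind Lemma 5, namely $\sup_x f \le 2\int_0^1 (\partial_x \sqrt f)^2\,dx + 2\bigl(\int_0^1 \sqrt f\,dx\bigr)^2$ for fixed $t$ and $y$, can be illustrated on an explicit profile. The sketch below uses $\sqrt f = 1 + \tfrac12 \cos(\pi x)$; this profile, the midpoint quadrature and the name `sup_and_bound` are assumptions chosen for illustration.

```python
import math

def sup_and_bound(sqrt_f, d_sqrt_f, n=1000):
    """Return (sup of f on the grid, 2*int (d sqrt f)^2 + 2*(int sqrt f)^2) on (0, 1)."""
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]  # midpoint grid on (0, 1)
    sup_f = max(sqrt_f(x) ** 2 for x in xs)
    dirichlet = h * sum(d_sqrt_f(x) ** 2 for x in xs)
    mean_sqrt = h * sum(sqrt_f(x) for x in xs)
    return sup_f, 2.0 * dirichlet + 2.0 * mean_sqrt ** 2

sup_f, bound = sup_and_bound(
    lambda x: 1.0 + 0.5 * math.cos(math.pi * x),       # sqrt(f)
    lambda x: -0.5 * math.pi * math.sin(math.pi * x),  # derivative of sqrt(f)
)
print(sup_f, bound)
```

For this profile the bound equals $\pi^2/4 + 2$, while the supremum of $f$ is $9/4$, so the estimate holds with room to spare.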

Note that the estimates (3.1) and (3.2) are somehow in duality, a fact that will become essential below. We now prove a lemma showing that the total number of clusters $\int_0^1 M(t,x)\,dx$ is bounded below by a strictly positive constant:

Lemma 6. Assume that $f_0 \neq 0$ is a nonnegative initial datum such that $(1 + y + \ln f_0)\,f_0 \in L^1((0,1) \times (0,\infty))$. Then, there exists a constant $M_{0*} > 0$ such that for all times $t \ge 0$, one has
\[ \int_0^1 M(t,x)\,dx \ge M_{0*}, \tag{3.4} \]
where $f$ is a solution of (1.1)–(1.8).

Proof. We recall that
\[ \frac{d}{dt} \int_0^1 M(t,x)\,dx = \int_0^1 N_0(x)\,dx - \int_0^1 M(t,x)^2\,dx, \]
so that
\[ \frac{d}{dt} \int_0^1 M(t,x)\,dx \ge \int_0^1 N_0(x)\,dx - \| M(t,\cdot) \|_{L^\infty_x} \int_0^1 M(t,x)\,dx \ge \int_0^1 N_0(x)\,dx - \bigl( m_\infty + m_1(t) \bigr) \int_0^1 M(t,x)\,dx. \]


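Then,
\[ \frac{d}{dt} \Bigl( e^{\int_0^t (m_\infty + m_1(s))\,ds} \int_0^1 M(t,x)\,dx \Bigr) \ge \int_0^1 N_0(x)\,dx\; e^{\int_0^t (m_\infty + m_1(s))\,ds}, \]
and, recalling $\mu_1 \ge \int_0^{+\infty} m_1(s)\,ds$, we deduce
\[ \int_0^1 M(t,x)\,dx \ge \int_0^1 M_0(x)\,dx\; e^{-\int_0^t (m_\infty + m_1(\sigma))\,d\sigma} + \int_0^1 N_0(x)\,dx \int_0^t e^{-\int_s^t (m_\infty + m_1(\sigma))\,d\sigma}\,ds \]
\[ \ge e^{-\mu_1 - m_\infty t} \int_0^1 M_0(x)\,dx + \int_0^1 N_0(x)\,dx \int_0^t e^{-(t-s) m_\infty - \mu_1}\,ds \]
\[ \ge e^{-\mu_1} \Bigl[ e^{-m_\infty t} \int_0^1 M_0(x)\,dx + \frac{1 - e^{-m_\infty t}}{m_\infty} \int_0^1 N_0(x)\,dx \Bigr]. \]
Distinguishing here between $t < 1$ and $t \ge 1$, for instance, we obtain
\[ M_{0*} := e^{-\mu_1} \inf\Bigl\{ e^{-m_\infty} \int_0^1 M_0(x)\,dx,\ \frac{1 - e^{-m_\infty}}{m_\infty} \int_0^1 N_0(x)\,dx \Bigr\}, \]
which concludes the proof of Lemma 6.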
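The final step of the proof, passing from the time-dependent lower bound to a constant by distinguishing $t < 1$ and $t \ge 1$, can be checked numerically. In the sketch below the common factor $e^{-\mu_1}$ is omitted, and the values of $m_\infty$, $\int_0^1 M_0\,dx$ and $\int_0^1 N_0\,dx$ are arbitrary assumptions chosen for illustration, as are the function names.

```python
import math

def time_dependent_bound(t, m_inf, mass0, n0):
    """e^{-m t} * int M_0 + (1 - e^{-m t}) / m * int N_0 (the factor e^{-mu_1} is omitted)."""
    decay = math.exp(-m_inf * t)
    return decay * mass0 + (1.0 - decay) / m_inf * n0

def uniform_floor(m_inf, mass0, n0):
    """The constant obtained by distinguishing t < 1 and t >= 1."""
    return min(math.exp(-m_inf) * mass0, (1.0 - math.exp(-m_inf)) / m_inf * n0)

m_inf, mass0, n0 = 2.0, 0.3, 1.5  # illustrative values only
floor = uniform_floor(m_inf, mass0, n0)
times = [0.01 * k for k in range(0, 5001)]  # t in [0, 50]
smallest = min(time_dependent_bound(t, m_inf, mass0, n0) for t in times)
print(floor, smallest)
```

For $t < 1$ the first term alone already exceeds $e^{-m_\infty}\int_0^1 M_0\,dx$, and for $t \ge 1$ the second term alone exceeds $\frac{1 - e^{-m_\infty}}{m_\infty}\int_0^1 N_0\,dx$, which is the case distinction used in the text.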

Next, we show the uniform control in time of all moments with respect to size $y$ of the solutions. Let us define the moment of order $p > 1$ by
\[ M_p(f)(t) := \int_0^1 \int_0^\infty y^p f(t,x,y)\,dy\,dx \]
for all $t \ge 0$.

Lemma 7. We assume that $f_0 \neq 0$ is a nonnegative initial datum such that $(1 + y + \ln f_0)\,f_0 \in L^1((0,1) \times (0,\infty))$. Then, the solution $f$ of (1.1)–(1.8) has moments $M_p(f)(t)$ uniformly bounded in time $t > t_* > 0$ and for any $p > 1$, i.e., there exist explicit constants $M_p^*(f_0, m_\infty, m_1, p)$ such that
\[ M_p(f)(t) \le M_p^*, \quad \text{for a.e. } t > t_* > 0. \tag{3.5} \]

Proof. We proceed in two steps:

Step 1. We first assume that $M_p(f)(t_*) < \infty$ for certain $p > 1$ and $t_* > 0$. Using the weak formulation (1.16), it is easy to check that
\[ \langle Q(f,f), y^p \rangle = -2 \int_0^\infty y^p f(y)\,dy\; M(t,x) + \int_0^\infty \int_0^\infty f(y)\,f(z)\,(y+z)^p\,dy\,dz - \frac{p-1}{p+1} \int_0^\infty f(y)\,y^{p+1}\,dy. \]
Taking into account Lemma 5 and $(y+z)^p \le C_p (y^p + z^p)$, we deduce
\[ \langle Q(f,f), y^p \rangle \le 2 (C_p - 1) \int_0^\infty y^p f(y)\,dy\; [m_\infty + m_1(t)] - \frac{p-1}{p+1} \int_0^\infty f(y)\,y^{p+1}\,dy \]

Exponential Equilibration Rate for the Inhomogeneous Aizenman-Bak Model

445

for all p > 1. Integrating in space, we find that the evolution of the moment of order p > 1 is given by d p−1 M p ( f )(t) ≤ 2(C p − 1) M p ( f )(t) [m ∞ + m 1 (t)] − M p+1 ( f )(t). dt p+1

(3.6)

Trivial interpolation of the p + 1-order moment with the moment of order one implies M p ( f )(t) ≤

1

p−1

1

N0 (t, x) d x + M p+1 ( f )(t)

0

for all > 0, and thus d p−11 M p ( f )(t) ≤ 2(C p − 1) M p ( f )(t) [m ∞ + m 1 (t)] − M p ( f )(t) + D

dt p+1

for certain constant D . Choosing > 0 such that 2(C p − 1)m ∞ −

1 p−11 ≤− , p+1

2

we obtain d 1 M p ( f )(t) ≤ − M p ( f )(t) + 2(C p − 1) m 1 (t) M p ( f )(t) + D

dt 2

for a.e. t > t∗ . According to Duhamel’s formula, t t − t∗ m 1 (s) ds − M p ( f )(t) ≤ M p ( f )(t∗ ) exp 2(C p − 1) 2

t∗ t t t −s ds, + D

exp 2(C p − 1) m 1 (τ ) dτ − 2

t∗ s

(3.7)

which shows that the moment M p ( f )(t) is bounded by a constant M∗p for a.e. t > t∗ since m 1 (t) ∈ L 1 ((0, ∞)) by Lemma 5. Moreover, it follows from (3.6) that the boundedness of M p (t∗ ) immediately implies that T M p+1 ( f )(t) dt < ∞ t∗

for all T > 0, and thus the finiteness of M p+1 ( f )(t) for a.e. t > t∗ and a simple induction argument enables then to conclude the bounds on all higher moments. Step 2. It remains to show that for given nontrivial initial data y f 0 ∈ L 1x,y and for a p > 1 and a time t∗ > 0 we have that M p (t∗ ) < ∞. We start with the following observation [MW, Appendix A]: For a nonnegative integrable function g(y) = 0 on (0, ∞), there exists a concave function (y), depending on g, smoothly increasing from (0) > 0 to (∞) = ∞ such that ∞ (y) g(y) dy < ∞. 0

Moreover, the function $\Phi$ can be constructed to satisfy

$$\Phi(y) - \Phi(y') \ge C\,\frac{y - y'}{y\,\ln^2(e + y)} \tag{3.8}$$

for $0 < y' < y$ with $C$ not depending on $g$. We refer to [MW, Appendix A] for all the details of this “by-now standard” construction.

To show now that $M_p(t_*) < \infty$ for a $p > 1$ and a time $t_* > 0$, we take functions $\Phi(x,y)$ constructed for nontrivial $y f_0(x,y) \in L^1_y(0,\infty)$ for a.e. $x \in (0,1)$ and calculate, similarly to Step 1, the moment

$$M_{1,\Phi}(f)(t) = \int_0^1\!\!\int_0^\infty y\,\Phi(x,y)\, f(x,y)\,dy\,dx.$$

For the fragmentation part, we use (3.8) for $0 < y' < y$ and estimate (in the weak sense, i.e. after integration in $y$)

$$\begin{aligned}
y\,\Phi(y)\, Q_f(f) &= 2\int_0^y y'\,\bigl(\Phi(y') - \Phi(y)\bigr)\,dy'\; f(y) \\
&\le -C\,\ln^{-2}(e + y)\, y^{-1}\int_0^y y'\,(y - y')\,dy'\; f(y) \\
&= -C\,\ln^{-2}(e + y)\,\frac{y^2}{6}\, f(y) \le -C_\delta\, y^{2-\delta} f(y),
\end{aligned}$$

for all $\delta > 0$ and a positive constant $C_\delta$, where the $(t,x)$-dependence has been dropped for notational convenience. Hence, by estimating the coagulation part similarly to Step 1, making use of the concavity of $\Phi$, we obtain that

$$\frac{d}{dt} M_{1,\Phi}(f)(t) \le 3\,(m_\infty + m_1(t))\, M_{1,\Phi}(f)(t) - C_\delta\, M_{2-\delta}(f)(t),$$

and boundedness of the moment $M_{1,\Phi}$ follows by interpolation, as well as the finiteness of $M_{2-\delta}(f)(t_*)$, analogously to Step 1.

Next, we show that $M$ and $N$ are bounded below uniformly (with respect to $t$ and $x$) for all $t \ge t_* > 0$.

Proposition 2. Under the assumptions of Theorem 1, let $t_* > 0$ be given. Then, there are strictly positive constants $M_*$ and $N_*$ such that for all $t \ge t_* > 0$,

$$M(t,x) \ge M_* \quad\text{and}\quad N(t,x) \ge N_*.$$

Proof. We write the equation satisfied by $f$ in this way:

$$\partial_t f - a(y)\,\partial_{xx} f = g_1 - y f - \|M(t,\cdot)\|_{L^\infty_x}\, f,$$

where $g_1$ is nonnegative. Then

$$\bigl(\partial_t - a(y)\,\partial_{xx}\bigr)\left( f\, e^{\,t y + \int_0^t \|M(s,\cdot)\|_{L^\infty_x}\,ds} \right) = g_2,$$

where $g_2$ is nonnegative. Now, we recall that the solution $h := h(t,x)$ of the heat equation

$$\partial_t h - a\,\partial_{xx} h = G,$$

with homogeneous Neumann boundary condition on the interval $(0,1)$, where $a > 0$ is a constant and $G := G(t,x) \in L^1$, is given by the formula

$$\begin{aligned}
h(t,x) ={}& \frac{1}{2\sqrt{\pi}}\int_{-1}^{1} \tilde h(0,z) \sum_{k=-\infty}^{\infty} \frac{1}{\sqrt{a t}}\, e^{-\frac{(2k + x - z)^2}{4 a t}}\,dz \\
&+ \frac{1}{2\sqrt{\pi}}\int_0^t\!\!\int_{-1}^{1} \tilde G(s,z) \sum_{k=-\infty}^{\infty} \frac{1}{\sqrt{a\,(t-s)}}\, e^{-\frac{(2k + x - z)^2}{4 a (t-s)}}\,dz\,ds,
\end{aligned}$$

with $\tilde h$ and $\tilde G$ denoting the “evenly mirrored around 0 in the $x$ variable” functions $h$ and $G$. Therefore, for all $t_1, t \ge 0$, $x \in (0,1)$ and $y \in \mathbb{R}_+$,

$$f(t_1 + t, x, y)\, e^{(t_1 + t)\,y + \int_0^{t_1 + t} \|M(s,\cdot)\|_{L^\infty_x}\,ds} \ge \frac{1}{2\sqrt{\pi}}\int_{-1}^{1} \tilde f(t_1, z, y)\,\frac{1}{\sqrt{a(y)\,t}}\, e^{-\frac{(x - z)^2}{4 a(y)\, t}}\, e^{\,t_1 y + \int_0^{t_1} \|M(s,\cdot)\|_{L^\infty_x}\,ds}\,dz,$$

so that when $t \in [t_*, 2t_*]$ (and since $|x - z| < 2$):

$$\begin{aligned}
f(t_1 + t, x, y) &\ge \frac{1}{\sqrt{2\pi\, a^*\, t_*}}\int_0^1 f(t_1, z, y)\, e^{-\frac{1}{a_* t_*}}\, e^{-2 t_* y - 2 t_* m_\infty - \mu_1}\,dz \\
&\ge C \int_0^1 f(t_1, z, y)\, e^{-2 t_* y}\,dz,
\end{aligned}$$

where $C > 0$ depends on the constants $a_*$, $a^*$, $m_\infty$, $\mu_1$ and $t_* > 0$. We recall that for all $t \ge t_*$,

$$\int_0^1\!\!\int_0^\infty y^2 f(t,x,y)\,dy\,dx \le M_2^*,$$

and thus, for any $A > 0$, we deduce

$$\begin{aligned}
N(t_1 + t, x) &\ge C\, e^{-2 t_* A}\int_0^1\!\!\int_0^A f(t_1, z, y)\, y\,dy\,dz \\
&\ge C\, e^{-2 t_* A}\left[ \int_0^1 N(t_1, x)\,dx - M_2^*/A \right] \\
&= C\, e^{-2 t_* A}\left[ \int_0^1 N_0(x)\,dx - M_2^*/A \right],
\end{aligned}$$

due to the conservation law (1.9) and

$$\int_0^1\!\!\int_A^\infty y\, f(t,x,y)\,dy\,dx \le M_2^*/A.$$

Choosing now $A$ sufficiently large, we get that $N(t_1 + t, x) \ge N_*$ for some $N_* > 0$ which does not depend on $t_1$. Using Lemma 6,

$$\begin{aligned}
M(t_1 + t, x) &\ge C \int_0^1\!\!\int_0^\infty f(t_1, z, y)\, e^{-2 t_* y}\,dy\,dz \\
&\ge C\, e^{-2 t_* A}\int_0^1\!\!\int_0^A f(t_1, z, y)\,dy\,dz \\
&\ge C\, e^{-2 t_* A}\left[ M_0^* - M_2^*/A^2 \right].
\end{aligned}$$

Once again choosing $A$ sufficiently large, we get that $M(t_1 + t, x) \ge M_*$. Since $M_*$ does not depend on $t_1$, we get Proposition 2.

4. Proofs of Theorem 1 and Proposition 1 if $\Omega = (0,1)$.

With Proposition 2 and Lemma 5 providing the moment bounds required by the entropy entropy-dissipation Lemma 2 in the one dimensional case $\Omega = (0,1)$, we turn now to the

Proof of Theorem 1. Case $\Omega = (0,1)$. According to Lemma 2,

$$\frac{d}{dt}\int_0^1 H(f|f_\infty)\,dx \le -D(f) \le -\frac{C}{\|M\|_{L^\infty_x}}\int_0^1 H(f|f_\infty)\,dx,$$

where $\|M\|_{L^\infty_x}(t) \le m_\infty + m_1(t)$ is in $L^1_t + L^\infty_t$ by Lemma 5. Hence, for $t_* > 0$,

$$\int_0^1 H(f(t)|f_\infty)\,dx \le \int_0^1 H(f(t_*)|f_\infty)\,dx\; \exp\left( -\int_{t_*}^t \frac{C}{\|M\|_{L^\infty_x}}\,ds \right).$$

Knowing that $m_1(t) \in L^1_t$ with $\int_0^\infty m_1(t)\,dt \le \mu_1$, we consider the sets $A := \{s > 0 : m_1(s) \ge 1\}$ and $B_t := \{s \in [0,t] : m_1(s) < 1\}$. We readily find that

$$|A| = \int_A ds \le \int_0^\infty m_1(t)\,dt \le \mu_1 \quad\text{and}\quad |B_t| = t - \int_{A \cap [0,t]} ds \ge t - \mu_1.$$

Moreover,

$$-\int_{t_*}^t \frac{C}{\|M\|_{L^\infty_x}}\,ds \le -\int_{B_t} \frac{C}{\|M\|_{L^\infty_x}}\,ds \le -\frac{C}{1 + m_\infty}\,(t - \mu_1),$$

finishing the proof of (1.22). The proof of the $L^1$-decay estimate (1.23) follows the same arguments as in the case of constant diffusion done in Sect. 2 using Csiszár-Kullback type inequalities.

Finally, we show Proposition 1. Let us denote by $C_T$ any constant of the form $C(t)\,(1 + T)^s$, where $s \in \mathbb{R}$ and $C(t)$ is bounded on any interval $[t_*, +\infty)$ with $t_* > 0$.
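The set-splitting step in the proof of Theorem 1 above only uses that $m_1 \ge 1$ on $A$, so that $|A| \le \int m_1 \le \mu_1$. The sketch below illustrates this on a grid with a hypothetical profile `m1` (an assumption chosen for illustration; it is not the $m_1$ of the paper), approximating the measures by Riemann sums.

```python
import math

def m1(s):
    """Hypothetical integrable profile, used only to illustrate the argument."""
    return 3.0 * math.exp(-s)

ds = 1e-3
grid = [k * ds for k in range(20000)]          # grid on [0, 20)

# Riemann sum standing in for mu_1 >= int_0^infty m_1(s) ds
mu1 = ds * sum(m1(s) for s in grid)

# |A| with A = {s : m_1(s) >= 1}
size_A = ds * sum(1 for s in grid if m1(s) >= 1.0)

# |B_t| with B_t = {s in [0, t] : m_1(s) < 1}
t = 10.0
size_Bt = ds * sum(1 for s in grid if s < t and m1(s) < 1.0)
```

On $B_t$ the bound $\|M\|_{L^\infty_x} \le m_\infty + m_1 < 1 + m_\infty$ holds, which is where the factor $1/(1+m_\infty)$ in the exponential rate comes from.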

Proof of Proposition 1. We observe using the bounds (3.5) and (3.1) that for all $q \ge 0$,

$$\begin{aligned}
\int_0^T\!\!\int_0^1\!\!\int_0^\infty (1 + y)^q\, Q^+(f,f)\,dy\,dx\,dt \le{}& \frac{2}{q+1}\int_0^T\!\!\int_0^1\!\!\int_0^\infty (1 + y)^{q+1} f(t,x,y)\,dy\,dx\,dt \\
&+ \int_0^T\!\!\int_0^1\!\!\int_0^\infty\!\!\int_0^\infty (1 + y + z)^q f(t,x,y)\, f(t,x,z)\,dz\,dy\,dx\,dt \\
\le{}& 2^{q+1}\,(M_0^* + M_{q+1}^*)\, T + 2^q \int_0^T \|M(t,\cdot)\|_{L^\infty_x}\,(M_0^* + M_q^*)\,dt \le C_T.
\end{aligned}$$

According to the properties of the heat kernel (cf. [DF06] for example), we know that for any $\varepsilon > 0$ and $t_* > 0$,

$$\|f(\cdot,\cdot,y)\|_{L^{3-\varepsilon}([t_*,T]\times\Omega)} \le C_T\left( \|f(0,\cdot,y)\|_{L^1_x} + \|Q^+(f,f)(\cdot,\cdot,y)\|_{L^1([0,T]\times\Omega)} \right).$$

As a consequence,

$$\int_0^\infty (1 + y)^q\, \|f(\cdot,\cdot,y)\|_{L^{3-\varepsilon}([t_*,T]\times\Omega)}\,dy \le C_T.$$

Then, for all $r \in [2,3[$,

$$\begin{aligned}
\int_0^\infty (1 + y)^q\, \|Q^+(f,f)(\cdot,\cdot,y)\|_{L^{r/2}([t_*,T]\times\Omega)}\,dy \le{}& \frac{2}{q+1}\int_0^\infty (1 + y)^{q+1}\, \|f(\cdot,\cdot,y)\|_{L^{r}([t_*,T]\times\Omega)}\,dy \\
&+ \int_0^\infty (1 + y)^q \left\| \int_0^y f(\cdot,\cdot,y')\, f(\cdot,\cdot,y - y')\,dy' \right\|_{L^{r/2}([t_*,T]\times\Omega)} dy \\
\le{}& C_T + \int_0^\infty\!\!\int_0^\infty (1 + y + z)^q\, \|f(\cdot,\cdot,y)\, f(\cdot,\cdot,z)\|_{L^{r/2}([t_*,T]\times\Omega)}\,dy\,dz \\
\le{}& C_T + 2^{q-1}\left( \int_0^\infty (1 + y)^q\, \|f(\cdot,\cdot,y)\|_{L^{r}([t_*,T]\times\Omega)}\,dy \right)^2 \le C_T.
\end{aligned}$$

Using again the properties of the heat kernel (still described in [DF06]), we see that for any $s \in [1,\infty)$ and $t_* > 0$,

$$\int_0^\infty (1 + y)^q\, \|f(\cdot,\cdot,y)\|_{L^{s}([t_*,T]\times\Omega)}\,dy \le C_T.$$

The above argument can now be used with $r = 4$ and shows that

$$\int_0^\infty (1 + y)^q\, \|Q^+(f,f)(\cdot,\cdot,y)\|_{L^{2}([t_*,T]\times\Omega)}\,dy \le C_T.$$

As a consequence, the standard energy estimate on the heat kernel implies that

$$\int_0^\infty (1 + y)^q\, \|f(T,\cdot,y)\|_{H^1_x}\,dy \le C_T.$$

Then, using a Gagliardo-Nirenberg type interpolation and Theorem 1, we obtain

$$\begin{aligned}
\int_0^\infty (1 + y)^q\, \|f(T,\cdot,y) - f_\infty(y)\|_{L^\infty_x}\,dy &\le \int_0^\infty (1 + y)^q\, \|f(T,\cdot,y) - f_\infty(y)\|_{H^1_x}^{3/4}\, \|f(T,\cdot,y) - f_\infty(y)\|_{L^1_x}^{1/4}\,dy \\
&\le \left( \int_0^\infty (1 + y)^{4q/3}\, \|f(T,\cdot,y) - f_\infty(y)\|_{H^1_x}\,dy \right)^{3/4} \left( \int_0^\infty \|f(T,\cdot,y) - f_\infty(y)\|_{L^1_x}\,dy \right)^{1/4} \\
&\le C_T^{3/4}\, \exp(-\mathrm{Cst}\; T) \le \mathrm{Cst}\,\exp(-\mathrm{Cst}\; T),
\end{aligned}$$

which concludes the proof of Proposition 1.
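The second inequality in the final chain of the proof above is Hölder's inequality in $y$ with exponents $4/3$ and $4$. A minimal numerical illustration follows; the arrays `a` and `b` are hypothetical nonnegative samples standing in for $\|f - f_\infty\|_{H^1_x}$ and $\|f - f_\infty\|_{L^1_x}$ as functions of $y$, and the integrals are replaced by Riemann sums.

```python
import math
import random

random.seed(0)
q, dy = 2.0, 0.01
ys = [k * dy for k in range(1000)]

# Hypothetical nonnegative profiles in y (assumptions for illustration only).
a = [random.random() * math.exp(-y) for y in ys]        # stands for the H^1_x norm
b = [random.random() * math.exp(-2.0 * y) for y in ys]  # stands for the L^1_x norm

# Left-hand side: int (1+y)^q a^{3/4} b^{1/4} dy
lhs = dy * sum((1 + y) ** q * ak ** 0.75 * bk ** 0.25
               for y, ak, bk in zip(ys, a, b))

# Right-hand side: (int (1+y)^{4q/3} a dy)^{3/4} * (int b dy)^{1/4}
rhs = (dy * sum((1 + y) ** (4 * q / 3) * ak for y, ak in zip(ys, a))) ** 0.75 \
      * (dy * sum(b)) ** 0.25
```

The weight $(1+y)^q$ is absorbed entirely into the first factor, which is why the exponent $4q/3$ appears there and no weight remains on the $L^1_x$ factor that carries the exponential decay from Theorem 1.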

Acknowledgements. JAC acknowledges the support from DGI-MEC (Spain) project MTM2005-08024. KF is partially supported by the WWTF (Vienna) project “How do cells move?” and the Wittgenstein Award 2000 of Peter A. Markowich. JAC and KF appreciate the kind hospitality of the ENS de Cachan. The authors want to express their gratitude to the reviewer who helped us to improve this work.

References

[AB] Aizenman, M., Bak, T.: Convergence to equilibrium in a system of reacting polymers. Commun. Math. Phys. 65, 203–230 (1979)
[Al] Aldous, D.J.: Deterministic and stochastic models for coalescence (aggregation, coagulation): a review of the mean-field theory for probabilists. Bernoulli 5, 3–48 (1999)
[Am] Amann, H.: Coagulation-fragmentation processes. Arch. Rat. Mech. Anal. 151, 339–366 (2000)
[AW] Amann, H., Walker, C.: Local and global strong solutions to continuous coagulation-fragmentation equations with diffusion. J. Differ. Eqs. 218, 159–186 (2005)
[CCD] Cáceres, M.J., Carrillo, J.A., Dolbeault, J.: Nonlinear stability in $L^p$ for solutions of the Vlasov-Poisson system for charged particles. SIAM J. Math. Anal. 34, 478–494 (2002)
[CCG] Cáceres, M.J., Carrillo, J.A., Goudon, T.: Equilibration rate for the linear inhomogeneous relaxation-time Boltzmann equation for charged particles. Comm. Partial Differ. Eqs. 28, 969–989 (2003)
[CD] Chae, D., Dubovskii, P.: Existence and uniqueness for spatially inhomogeneous coagulation-condensation equation with unbounded kernels. J. Integ. Eqs. Appl. 9, 219–236 (1997)
[Cs] Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar 2, 299–318 (1967)
[CP] Collet, J.F., Poupaud, F.: Asymptotic behaviour of solutions to the diffusive fragmentation-coagulation system. Phys. D 114, 123–146 (1998)
[De] Desvillettes, L.: Convergence to equilibrium in large time for Boltzmann and B.G.K. equations. Arch. Rat. Mech. Anal. 110, 73–91 (1990)
[DF05] Desvillettes, L., Fellner, K.: Exponential decay toward equilibrium via entropy methods for reaction-diffusion equations. J. Math. Anal. Appl. 319, 157–176 (2006)
[DF06] Desvillettes, L., Fellner, K.: Entropy methods for Reaction-Diffusion Equations: Degenerate Diffusion and Slowly Growing A-priori bounds. To appear in Rev. Matem. Iber.
[DV01] Desvillettes, L., Villani, C.: On the trend to global equilibrium in spatially inhomogeneous entropy-dissipating systems: the linear Fokker-Planck equation. Comm. Pure Appl. Math. 54, 1–42 (2001)
[DV05] Desvillettes, L., Villani, C.: On the trend to global equilibrium for spatially inhomogeneous kinetic systems: the Boltzmann equation. Invent. Math. 159, 245–316 (2005)
[Dr] Drake, R.L.: “A general mathematical survey of the coagulation equation”. Topics in Current Aerosol Research (part 2), International Reviews in Aerosol Physics and Chemistry, Oxford: Pergamon Press, 1972, pp. 203–376
[ELMP] Escobedo, M., Laurençot, Ph., Mischler, S., Perthame, B.: Gelation and mass conservation in coagulation-fragmentation models. J. Differ. Eqs. 195, 143–174 (2003)
[FMS] Fellner, K., Miljanovic, V., Schmeiser, C.: Convergence to equilibrium for the linearised cometary flow equation. Trans. Theory Stat. Phys. 35, 109–136 (2006)
[FNS] Fellner, K., Neumann, L., Schmeiser, C.: Convergence to global equilibrium for spatially inhomogeneous kinetic models of non-micro-reversible processes. Monatsh. Math. 141, 289–299 (2004)
[JN] Jabin, P.E., Niethammer, B.: On the rate of convergence to equilibrium in the Becker-Döring equations. J. Differ. Eqs. 191, 518–543 (2003)
[Ki] Kirane, M.: On stabilization of solutions of the system of parabolic differential equations describing the kinetics of an autocatalytic reversible chemical reaction. Bull. Inst. Mat. Acad. Sin. 18(4), 369–377 (1990)
[Ku] Kullback, S.: A lower bound for discrimination information in terms of variation. IEEE Trans. Information Theory 4, 126–127 (1967)
[LM02-1] Laurençot, Ph., Mischler, S.: The continuous coagulation-fragmentation equation with diffusion. Arch. Rat. Mech. Anal. 162, 45–99 (2002)
[LM02-2] Laurençot, Ph., Mischler, S.: From the discrete to the continuous coagulation-fragmentation equations. Proc. Roy. Soc. Edinburgh Sect. A 132, 1219–1248 (2002)
[LM03] Laurençot, Ph., Mischler, S.: Convergence to equilibrium for the continuous coagulation-fragmentation equation. Bull. Sci. Math. 127, 179–190 (2003)
[LM04] Laurençot, Ph., Mischler, S.: On coalescence equations and related models. Modeling and Computational Methods for Kinetic Equations, P. Degond, L. Pareschi, G. Russo eds., Boston: Birkhäuser, 2004, pp. 321–356
[LM05] Laurençot, Ph., Mischler, S.: Liapunov functionals for Smoluchowski's coagulation equation and convergence to self-similarity. Monatsh. Math. 146, 127–142 (2005)
[LW] Laurençot, Ph., Wrzosek, D.: The Becker-Döring model with diffusion. II. Long-time behaviour. J. Differ. Eqs. 148, 268–291 (1998)
[Le] Leyvraz, F.: Scaling theory and exactly solved models in the kinetics of irreversible aggregation. Phys. Rep. 383, 95–212 (2003)
[MP] Menon, G., Pego, R.L.: Approach to self-similarity in Smoluchowski's coagulation equations. Comm. Pure Appl. Math. 57, 1197–1232 (2004)
[MW] Mischler, S., Wennberg, B.: On the spatially homogeneous Boltzmann equation. Ann. Inst. H. Poincaré Anal. Non Linéaire 16(4), 467–501 (1999)
[MN] Mouhot, C., Neumann, L.: Quantitative study of convergence to equilibrium for linear collisional kinetic models in the torus. Nonlinearity 19, 969–998 (2006)
[NS] Neumann, L., Schmeiser, C.: Convergence to global equilibrium for a kinetic fermion model. SIAM J. Math. Anal. 36, 1652–1663 (2005)
[Ok] Okubo, A.: Dynamical aspects of animal grouping: swarms, schools, flocks and herds. Adv. Biophys. 22, 1–94 (1986)
[PS] Perelson, A.S., Samsel, R.W.: Kinetics of red blood cell aggregation: an example of geometric polymerization. Kinetics of aggregation and gelation, F. Family, D.P. Landau, eds., London: Elsevier, 1984
[Ro] Rothe, F.: Global Solutions of Reaction-Diffusion Systems. Lecture Notes in Mathematics, Berlin: Springer, 1984
[Sa] Safronov, V.S.: Evolution of the ProtoPlanetary cloud and Formation of the earth and the planets. Jerusalem: Israel Program for Scientific Translations Ltd., 1972
[S16] Smoluchowski, M.: Drei Vorträge über Diffusion, Brownsche Molekularbewegung und Koagulation von Kolloidteilchen. Physik Zeitschr 17, 557–599 (1916)
[S17] Smoluchowski, M.: Versuch einer mathematischen Theorie der Koagulationskinetik kolloider Lösungen. Z Phys Chem 92, 129–168 (1917)
[V06] Villani, C.: Hypocoercive diffusion operators. International Congress of Mathematicians, Vol. III, Zürich, Eur. Math. Soc., 2006, pp. 473–498

Communicated by A. Kupiainen

Commun. Math. Phys. 278, 453–486 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0405-1

Communications in

Mathematical Physics

Two-parameter Quantum Affine Algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$, Drinfel’d Realization and Quantum Affine Lyndon Basis

Naihong Hu1, Marc Rosso2, Honglian Zhang1,3

1 Department of Mathematics, East China Normal University, Min Hang Campus, Dong Chuan Road 500, Shanghai 200241, PR China. E-mail: [email protected]
2 Départment Mathématiques et Applications, Ecole Normale Superieure, 45 Rue de Ulm, 75230 Paris Cedex 05, France. E-mail: [email protected]
3 Department of Mathematics, Shanghai University, Shanghai 200444, PR China. E-mail: [email protected]

Received: 22 August 2006 / Accepted: 23 July 2007 Published online: 8 January 2008 – © Springer-Verlag 2008

Abstract: We further define two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ $(n > 2)$ after the work on the finite cases (see [BW1,BGH1,HS,BH]), which turns out to be a Drinfel’d double. Of importance for the quantum affine cases is that we can work out the compatible two-parameter version of the Drinfel’d realization as a quantum affinization of $U_{r,s}(\mathfrak{sl}_n)$ and establish the Drinfel’d Isomorphism Theorem in the two-parameter setting, via developing a new combinatorial approach (quantum calculation) to the quantum affine Lyndon basis we present (with an explicit valid algorithm based on the use of Drinfel’d generators).

N.H., supported in part by the NNSF (Grants 10431040, 10728102), the PCSIRT, the TRAPOYT and the FUDP from the MOE of China, the SRSTP from the STCSM, die Deutsche Forschungsgemeinschaft (DFG), as well as an ICTP long-term visiting scholarship. H.Z., supported by a Ph.D. Program Scholarship Fund of ECNU 2006.

1. Introduction

1.1. In 2001, Benkart-Witherspoon investigated the structures of two-parameter quantum groups $U_{r,s}(\mathfrak{g})$ for $\mathfrak{g} = \mathfrak{gl}_n$ or $\mathfrak{sl}_n$ in [BW1], originally obtained by Takeuchi [T], and the finite-dimensional weight representation theory in [BW2], and further obtained some new finite-dimensional pointed Hopf algebras in [BW3] when $rs^{-1}$ is a root of unity, which possess new ribbon elements under some conditions (and will yield new invariants of knots and links). These show that two-parameter quantum groups are well worth further studying.

1.2. In 2004, Bergeron-Gao-Hu [BGH1] gave the structures of two-parameter quantum groups $U_{r,s}(\mathfrak{g})$ for $\mathfrak{g} = \mathfrak{so}_{2n+1}, \mathfrak{sp}_{2n}, \mathfrak{so}_{2n}$, and developed in [BGH2] the highest weight representation theory when $rs^{-1}$ is not a root of unity. Especially, [BGH1] explored the environment condition upon which Lusztig’s symmetries exist for the classical simple Lie algebras $\mathfrak{g}$, namely, they exist as $\mathbb{Q}$-isomorphisms between $U_{r,s}(\mathfrak{g})$ and the associated object $U_{s^{-1},r^{-1}}(\mathfrak{g})$ only when rank$(\mathfrak{g}) = 2$, and in the case when rank$(\mathfrak{g}) > 2$, the sufficient and necessary condition for the existence of Lusztig’s symmetries between $U_{r,s}(\mathfrak{g})$ and its associated object forces $U_{r,s}(\mathfrak{g})$ to take the “one-parameter” form $U_{q,q^{-1}}(\mathfrak{g})$, where $r = s^{-1} = q$. In other words, when rank$(\mathfrak{g}) > 2$, the Lusztig symmetries exist only for the one-parameter quantum groups $U_{q,q^{-1}}(\mathfrak{g})$ as $\mathbb{Q}(q)$-automorphisms (rather than merely as $\mathbb{Q}$-isomorphisms). In this case, these symmetries give rise, modulo some identification of group-like elements, to the usual Lusztig symmetries on quantum groups $U_q(\mathfrak{g})$ of Drinfel’d-Jimbo type. The Lusztig symmetry property indicates that there do exist remarkable differences between the two-parameter quantum groups in question and the one-parameter quantum groups of Drinfel’d-Jimbo type. Afterwards, Hu-Shi [HS] and Bai-Hu [BH] studied the two-parameter quantum groups for type $G_2$ and $E$ cases. Through this work, we found that the treatments in two-parameter cases are frequently more subtle and have to follow combinatorial approaches only; for instance, the description of the convex PBW-type basis (cf. [BH]) has to appeal to the use of Lyndon words (see [R2] and references therein) because there is no braid group available in question. So far, it seems desirable to extend these two-parameter quantum groups in Benkart-Witherspoon’s sense from the finite cases to the affine cases. The present paper is aimed at this purpose for the affine type $A_n^{(1)}$ $(n > 1)$ case. To this end, we first give the defining structure of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ $(n > 2)$ (whereas $U_{r,s}(\widehat{\mathfrak{sl}_2})$ is essentially isomorphic to $U_{q,q^{-1}}(\widehat{\mathfrak{sl}_2})$ if we set $rs^{-1} = q^2$, which is not considered in the paper).

1.3. As is well-known, the importance of the Drinfel’d generators (in the Drinfel’d realization) for quantum affine algebras is just like that of the loop generators (in the loop realization) for affine Kac-Moody algebras (see [Ga,K]).
Early in 1987, Drinfel’d [Dr2] put forward his famous new (conjectural) realization of quantum affine algebras $U_q(\hat{\mathfrak{g}})$ with $\mathfrak{g}$ semisimple, because he recognized that the study of finite dimensional representations of $U_q(\hat{\mathfrak{g}})$ is made easier by the use of this realization on the set of Drinfel’d generators, which is called the Drinfel’d realization of $U_q(\hat{\mathfrak{g}})$ or the Drinfel’d quantum affinization of $U_q(\mathfrak{g})$. Besides this, the Drinfel’d realization also finds its main contribution to the construction of vertex representations for quantum affine algebras $U_q(\hat{\mathfrak{g}})$ (see [FJ,J1,DI2], etc.), as does the loop realization in the vertex representation theory of affine Kac-Moody algebras (see [K]). In 1993, Khoroshkin-Tolstoy [KT] constructed the Drinfel’d realization for the untwisted types using a Cartan-Weyl generators system with no proof. The first perfect proof of the Drinfel’d isomorphism only for the untwisted types was given by Beck [B2] in 1994, making use of his extended braid group actions, based on the work of Damiani [Da], Levendorskii-Soibel’man-Stukopin [LSS] for the case $U_q(\widehat{\mathfrak{sl}_2})$. In 1998, Jing [J2] basically adopted the inverse map suggested by Beck for the untwisted types (see the final remark in [B2, Sect. 4]) and gave a combinatorial proof for the Drinfel’d isomorphism for the untwisted types.

1.4. In order to further explore and enrich the structure and representation theory of the two-parameter quantum affine algebras later on, another main result of this paper is to give the Drinfel’d realization of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ $(n > 2)$. Its definition depends on the self-compatible defining system (Definition 3.1), which in the two-parameter setting varies dramatically in comparison with the one-parameter cases (see [Dr2], or [B2, Theorem 4.7]), and it is nontrivial to match up all the relations together. Indeed, to invent the two-parameter version of Drinfel’d realization needs some insights, e.g., from the antisymmetric point of view via the $\mathbb{Q}$-algebra antiautomorphism $\tau$, based on some information from the combinatorial description of the convex PBW-type basis via the Lyndon words (see [R2,BH], etc.), and also, the proof of the Drinfel’d isomorphism in our case depends completely on the combinatorial approach with specific techniques to design those defining relations in order to fit the compatibilities in the whole system. If the readers follow the details, they will find how our quantum calculations (somehow a bit tedious) work well and are necessary for exactly verifying the compatibilities of the defining system. The reason is that the method we expanded, to some extent, essentially follows an approach to a kind of description of the quantum “affine” Lyndon basis. Actually, we can construct explicitly all quantum real and imaginary root vectors using this method (see Lemmas 4.7 & 4.8, together with Definition 3.9).

1.5. The paper is organized as follows. We first give the structure of two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ $(n > 2)$ as a Hopf algebra in Sect. 2. We prove that the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ is characterized as a Drinfel’d double $\mathcal{D}(\mathcal{B}, \mathcal{B}')$ of Hopf subalgebras $\mathcal{B}$, $\mathcal{B}'$ with respect to a skew-dual pairing. In Sect. 3, we explicitly describe the two-parameter Drinfel’d quantum affinization of $U_{r,s}(\mathfrak{sl}_n)$ $(n > 2)$, that is, the Drinfel’d realization in the two-parameter case which is antisymmetric with respect to the $\mathbb{Q}$-algebra antiautomorphism $\tau$. In the case when $rs = 1$, i.e., $r = s^{-1} = q$, our result modulo some identification yields the usual Drinfel’d realization of a quantum affine algebra $U_q(\widehat{\mathfrak{sl}_n})$ of Drinfel’d-Jimbo type (see [Dr2,B2,DI1,J2],
Since Beck’s extended braid group actions approach is invalid for our case, we combine the Lyndon words description ([R2]) with the quantum Lie bracket operation ([J2]) to develop a combinatorial trick in the quantum affine case (we call it quantum calculations), which can be utilized in the construction of all the quantum root vectors (including real and imaginary ones), so that we can formulate and prove the quantum “affine” Lyndon basis for the first time (in a more explicit form than that of [B1]) for Ur,s ( n± ) based on the Drinfel’d realization in Sect. 3, and further prove the Drinfel’d isomorphism using our combinatorial algorithm in Sect. 4. In fact, our proof also provides a concrete process of how to construct the Drinfel’d generators using the Chevalley-KacLusztig generators. n ) and Drinfel’d Double 2. Quantum Affine Algebra Ur,s (sl 2.1. Let K = Q(r, s) denote a field of rational functions with two-parameters r , s (r = ±s). Assume is a finite root system of type An−1 with a base of simple roots. Regard as a subset of a Euclidean space E = Rn with an inner product ( , ). Set I = {1, · · · , n − 1}, I0 = {0} ∪ I . Let ε1 , ε2 , · · · , εn denote an orthonormal basis of E, then we can take = {αi = εi − εi+1 | i ∈ I } and = {εi − ε j | i = j ∈ I }. Let δ denote n . Take α0 = δ − (ε1 − εn ), then = {αi | i ∈ I0 } is the primitive imaginary root of sl n . a base of simple roots of affine Lie algebra sl Let A = (ai j ) (i, j ∈ I0 ) be a generalized Cartan matrix associated to affine Lie n . Let h be a vector space over K with a basis { h 0 , h 1 , · · · , h n−1 , d } and algebra sl define the linear action of αi (i ∈ I0 ) on h by αi (h j ) = a ji ,

αi (d) = δi,0 , for j ∈ I0 .

n . The standard nondegenerate Let Q = Zα0 +· · ·+Zαn−1 denote the root lattice of sl symmetric bilinear form (· , ·) on h∗ satisfies (αi , α j ) = ai j ,

(δ, αi ) = (δ, δ) = 0, ∀i, j ∈ I0 .
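As an illustration of the conventions just fixed, the following sketch builds the matrix $a_{ij} = (\alpha_i, \alpha_j)$ for $i, j \in I_0$ from the vectors $\varepsilon_i - \varepsilon_{i+1}$. The function name `affine_A_cartan` is hypothetical. Since $\delta$ pairs to zero with every $\alpha_j$, only the part $-(\varepsilon_1 - \varepsilon_n)$ of $\alpha_0$ contributes to the form, and that is how $\alpha_0$ is represented here.

```python
def affine_A_cartan(n):
    """Matrix a_ij = (alpha_i, alpha_j), i, j in I_0 = {0, ..., n-1}, for n > 2."""
    def alpha(i):
        v = [0] * n
        if i == 0:
            # alpha_0 = delta - (eps_1 - eps_n); delta contributes nothing to the
            # form, so alpha_0 is represented by eps_n - eps_1.
            v[0], v[n - 1] = -1, 1
        else:
            # alpha_i = eps_i - eps_{i+1} (1-based indices stored 0-based)
            v[i - 1], v[i] = 1, -1
        return v

    roots = [alpha(i) for i in range(n)]
    return [[sum(x * y for x, y in zip(a, b)) for b in roots] for a in roots]

A = affine_A_cartan(4)
row_sums = [sum(row) for row in A]
```

Every row of the resulting matrix sums to zero, which is the matrix form of $(\delta, \alpha_i) = 0$ with $\delta = \alpha_0 + \alpha_1 + \cdots + \alpha_{n-1}$.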

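Definition 2.1. Let $U = U_{r,s}(\widehat{\mathfrak{sl}_n})$ $(n > 2)$ be the unital associative algebra over $K$ generated by the elements $e_j$, $f_j$, $\omega_j^{\pm 1}$, $\omega_j'^{\,\pm 1}$ $(j \in I_0)$, $\gamma^{\pm\frac{1}{2}}$, $\gamma'^{\,\pm\frac{1}{2}}$, $D^{\pm 1}$, $D'^{\,\pm 1}$ (called the Chevalley-Kac-Lusztig generators), satisfying the following relations:

(A1) $\gamma^{\pm\frac{1}{2}}$, $\gamma'^{\,\pm\frac{1}{2}}$ are central with $\gamma = \omega_\delta$, $\gamma' = \omega'_\delta$, $\gamma\gamma' = rs$, such that $\omega_i\,\omega_i^{-1} = \omega_i'\,\omega_i'^{\,-1} = 1 = D D^{-1} = D' D'^{\,-1}$, and

$$[\,\omega_i^{\pm 1}, \omega_j^{\pm 1}\,] = [\,\omega_i^{\pm 1}, D^{\pm 1}\,] = [\,\omega_j'^{\,\pm 1}, D^{\pm 1}\,] = [\,\omega_i^{\pm 1}, D'^{\,\pm 1}\,] = 0 = [\,\omega_i^{\pm 1}, \omega_j'^{\,\pm 1}\,] = [\,\omega_j'^{\,\pm 1}, D'^{\,\pm 1}\,] = [\,D'^{\,\pm 1}, D^{\pm 1}\,] = [\,\omega_i'^{\,\pm 1}, \omega_j'^{\,\pm 1}\,].$$

(A2) For $i \in I_0$ and $j \in I$,

$$\begin{aligned}
D\, e_i\, D^{-1} &= r^{\delta_{0i}}\, e_i, & D\, f_i\, D^{-1} &= r^{-\delta_{0i}}\, f_i, \\
\omega_j\, e_i\, \omega_j^{-1} &= r^{(\varepsilon_j, \alpha_i)}\, s^{(\varepsilon_{j+1}, \alpha_i)}\, e_i, & \omega_j\, f_i\, \omega_j^{-1} &= r^{-(\varepsilon_j, \alpha_i)}\, s^{-(\varepsilon_{j+1}, \alpha_i)}\, f_i, \\
\omega_0\, e_i\, \omega_0^{-1} &= r^{-(\varepsilon_{i+1}, \alpha_0)}\, s^{(\varepsilon_1, \alpha_i)}\, e_i, & \omega_0\, f_i\, \omega_0^{-1} &= r^{(\varepsilon_{i+1}, \alpha_0)}\, s^{-(\varepsilon_1, \alpha_i)}\, f_i.
\end{aligned}$$

(A3) For $i \in I_0$ and $j \in I$,

$$\begin{aligned}
D'\, e_i\, D'^{\,-1} &= s^{\delta_{0i}}\, e_i, & D'\, f_i\, D'^{\,-1} &= s^{-\delta_{0i}}\, f_i, \\
\omega_j'\, e_i\, \omega_j'^{\,-1} &= s^{(\varepsilon_j, \alpha_i)}\, r^{(\varepsilon_{j+1}, \alpha_i)}\, e_i, & \omega_j'\, f_i\, \omega_j'^{\,-1} &= s^{-(\varepsilon_j, \alpha_i)}\, r^{-(\varepsilon_{j+1}, \alpha_i)}\, f_i, \\
\omega_0'\, e_i\, \omega_0'^{\,-1} &= s^{-(\varepsilon_{i+1}, \alpha_0)}\, r^{(\varepsilon_1, \alpha_i)}\, e_i, & \omega_0'\, f_i\, \omega_0'^{\,-1} &= s^{(\varepsilon_{i+1}, \alpha_0)}\, r^{-(\varepsilon_1, \alpha_i)}\, f_i.
\end{aligned}$$

(A4) For $i, j \in I_0$, we have

$$[\,e_i, f_j\,] = \frac{\delta_{ij}}{r - s}\,(\omega_i - \omega_i').$$

(A5) For $i, j \in I_0$, but $(i,j) \notin \{(0, n-1), (n-1, 0)\}$ with $a_{ij} = 0$, we have

$$[\,e_i, e_j\,] = 0 = [\,f_i, f_j\,].$$

(A6) For $i \in I_0$, we have the $(r,s)$-Serre relations:

$$\begin{aligned}
e_i^2\, e_{i+1} - (r + s)\, e_i\, e_{i+1}\, e_i + (rs)\, e_{i+1}\, e_i^2 &= 0, \\
e_i\, e_{i+1}^2 - (r + s)\, e_{i+1}\, e_i\, e_{i+1} + (rs)\, e_{i+1}^2\, e_i &= 0, \\
e_{n-1}^2\, e_0 - (r + s)\, e_{n-1}\, e_0\, e_{n-1} + (rs)\, e_0\, e_{n-1}^2 &= 0, \\
e_{n-1}\, e_0^2 - (r + s)\, e_0\, e_{n-1}\, e_0 + (rs)\, e_0^2\, e_{n-1} &= 0.
\end{aligned}$$

(A7) For $i \in I_0$, we have the $(r,s)$-Serre relations:

$$\begin{aligned}
f_i^2\, f_{i+1} - (r^{-1} + s^{-1})\, f_i\, f_{i+1}\, f_i + (r^{-1}s^{-1})\, f_{i+1}\, f_i^2 &= 0, \\
f_i\, f_{i+1}^2 - (r^{-1} + s^{-1})\, f_{i+1}\, f_i\, f_{i+1} + (r^{-1}s^{-1})\, f_{i+1}^2\, f_i &= 0, \\
f_{n-1}^2\, f_0 - (r^{-1} + s^{-1})\, f_{n-1}\, f_0\, f_{n-1} + (r^{-1}s^{-1})\, f_0\, f_{n-1}^2 &= 0, \\
f_{n-1}\, f_0^2 - (r^{-1} + s^{-1})\, f_0\, f_{n-1}\, f_0 + (r^{-1}s^{-1})\, f_0^2\, f_{n-1} &= 0.
\end{aligned}$$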
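The relations (A1) and (A2) can be cross-checked against each other: if $\gamma = \omega_\delta$ is read as the product $\omega_0\omega_1\cdots\omega_{n-1}$ (an assumption made here, corresponding to $\delta = \alpha_0 + \cdots + \alpha_{n-1}$), then the exponents of $r$ and $s$ in (A2) must cancel when summed over $j \in I_0$, so that $\gamma$ commutes with every $e_i$. The sketch below performs this bookkeeping; the helper names are hypothetical, and $(\varepsilon_j, \delta) = 0$ is assumed so that $\alpha_0$ can be replaced by $\varepsilon_n - \varepsilon_1$ inside the inner products.

```python
def eps_dot_alpha(j, i, n):
    """(eps_j, alpha_i) for j in 1..n and i in 0..n-1, assuming (eps_j, delta) = 0."""
    if i == 0:
        # alpha_0 is represented by eps_n - eps_1
        return (1 if j == n else 0) - (1 if j == 1 else 0)
    # alpha_i = eps_i - eps_{i+1}
    return (1 if j == i else 0) - (1 if j == i + 1 else 0)

def omega_on_e(j, i, n):
    """Exponents (a, b) with omega_j e_i omega_j^{-1} = r^a s^b e_i, read off from (A2)."""
    if j == 0:
        return (-eps_dot_alpha(i + 1, 0, n), eps_dot_alpha(1, i, n))
    return (eps_dot_alpha(j, i, n), eps_dot_alpha(j + 1, i, n))

def gamma_on_e(i, n):
    """Total exponents for the product omega_0 omega_1 ... omega_{n-1} acting on e_i."""
    pairs = [omega_on_e(j, i, n) for j in range(n)]
    return (sum(a for a, _ in pairs), sum(b for _, b in pairs))

# Exponents of omega_0 acting on e_0, and the centrality check for n = 3, ..., 7.
w00 = omega_on_e(0, 0, 5)
central = all(gamma_on_e(i, n) == (0, 0) for n in range(3, 8) for i in range(n))
```

The same bookkeeping applies to the $f_i$, whose exponents in (A2) are the negatives of those for $e_i$.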

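$U_{r,s}(\widehat{\mathfrak{sl}_n})$ is a Hopf algebra with the coproduct $\Delta$, the counit $\varepsilon$ and the antipode $S$ defined below: for $i \in I_0$, we have

$$\begin{aligned}
\Delta(\gamma^{\pm\frac{1}{2}}) &= \gamma^{\pm\frac{1}{2}} \otimes \gamma^{\pm\frac{1}{2}}, & \Delta(\gamma'^{\,\pm\frac{1}{2}}) &= \gamma'^{\,\pm\frac{1}{2}} \otimes \gamma'^{\,\pm\frac{1}{2}}, \\
\Delta(D^{\pm 1}) &= D^{\pm 1} \otimes D^{\pm 1}, & \Delta(D'^{\,\pm 1}) &= D'^{\,\pm 1} \otimes D'^{\,\pm 1}, \\
\Delta(\omega_i) &= \omega_i \otimes \omega_i, & \Delta(\omega_i') &= \omega_i' \otimes \omega_i', \\
\Delta(e_i) &= e_i \otimes 1 + \omega_i \otimes e_i, & \Delta(f_i) &= f_i \otimes \omega_i' + 1 \otimes f_i,
\end{aligned}$$

$$\varepsilon(e_i) = \varepsilon(f_i) = 0, \qquad \varepsilon(\gamma^{\pm\frac{1}{2}}) = \varepsilon(\gamma'^{\,\pm\frac{1}{2}}) = \varepsilon(D^{\pm 1}) = \varepsilon(D'^{\,\pm 1}) = \varepsilon(\omega_i) = \varepsilon(\omega_i') = 1,$$

$$\begin{aligned}
S(\gamma^{\pm\frac{1}{2}}) &= \gamma^{\mp\frac{1}{2}}, & S(\gamma'^{\,\pm\frac{1}{2}}) &= \gamma'^{\,\mp\frac{1}{2}}, \\
S(D^{\pm 1}) &= D^{\mp 1}, & S(D'^{\,\pm 1}) &= D'^{\,\mp 1}, \\
S(e_i) &= -\omega_i^{-1}\, e_i, & S(f_i) &= -f_i\, \omega_i'^{\,-1}, \\
S(\omega_i) &= \omega_i^{-1}, & S(\omega_i') &= \omega_i'^{\,-1}.
\end{aligned}$$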

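2.2. In what follows, we give the skew-pairing and the Drinfel’d double structure.

Definition 2.2. A bilinear form $\langle\,,\rangle : \mathcal{B} \times \mathcal{A} \longrightarrow K$ is called a skew-dual pairing of two Hopf algebras $\mathcal{A}$ and $\mathcal{B}$ (see [KS, 8.2.1]), if it satisfies

$$\begin{aligned}
\langle b, 1_{\mathcal{A}} \rangle &= \varepsilon_{\mathcal{B}}(b), & \langle 1_{\mathcal{B}}, a \rangle &= \varepsilon_{\mathcal{A}}(a), \\
\langle b, a_1 a_2 \rangle &= \langle \Delta_{\mathcal{B}}^{\mathrm{op}}(b), a_1 \otimes a_2 \rangle, & \langle b_1 b_2, a \rangle &= \langle b_1 \otimes b_2, \Delta_{\mathcal{A}}(a) \rangle,
\end{aligned}$$

for all $a, a_1, a_2 \in \mathcal{A}$ and $b, b_1, b_2 \in \mathcal{B}$, where $\varepsilon_{\mathcal{A}}$, $\varepsilon_{\mathcal{B}}$ denote the counits of $\mathcal{A}$, $\mathcal{B}$, respectively, and $\Delta_{\mathcal{A}}$, $\Delta_{\mathcal{B}}$ are the respective coproducts.

Definition 2.3. For any two Hopf algebras $\mathcal{A}$ and $\mathcal{B}$ skew-paired by $\langle\,,\rangle$, there exists a Drinfel’d quantum double $\mathcal{D}(\mathcal{A}, \mathcal{B})$ which is a Hopf algebra whose underlying coalgebra is $\mathcal{A} \otimes \mathcal{B}$ with the tensor product coalgebra structure, whose algebra structure is defined by

$$(a \otimes b)(a' \otimes b') = \sum \langle S_{\mathcal{B}}(b_{(1)}), a'_{(1)} \rangle\, \langle b_{(3)}, a'_{(3)} \rangle\; a\, a'_{(2)} \otimes b_{(2)}\, b',$$

for $a, a' \in \mathcal{A}$ and $b, b' \in \mathcal{B}$, and whose antipode $S$ is given by

$$S(a \otimes b) = (1 \otimes S_{\mathcal{B}}(b))\,(S_{\mathcal{A}}(a) \otimes 1).$$

Let $\mathcal{B}$ (resp. $\mathcal{B}'$) denote the Hopf (Borel-type) subalgebra of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ generated by $e_j$, $\omega_j^{\pm 1}$, $\gamma^{\pm\frac{1}{2}}$, $D^{\pm 1}$ (resp. $f_j$, $\omega_j'^{\,\pm 1}$, $\gamma'^{\,\pm\frac{1}{2}}$, $D'^{\,\pm 1}$) with $j \in I_0$.

Proposition 2.4. There exists a unique skew-dual pairing $\langle\,,\rangle : \mathcal{B}' \times \mathcal{B} \longrightarrow K$ of the Hopf subalgebras $\mathcal{B}$ and $\mathcal{B}'$ such that:

(1) $\displaystyle \langle f_i, e_j \rangle = \delta_{ij}\,\frac{1}{s - r}, \qquad (i, j \in I_0),$

(2) $\displaystyle \langle \omega_i', \omega_j \rangle = \begin{cases} r^{(\varepsilon_j, \alpha_i)}\, s^{(\varepsilon_{j+1}, \alpha_i)}, & (i \in I_0,\ j \in I), \\ r^{-(\varepsilon_{i+1}, \alpha_0)}\, s^{(\varepsilon_1, \alpha_i)}, & (i \in I_0,\ j = 0), \end{cases}$

(3) $\displaystyle \langle \omega_i'^{\,\pm 1}, \omega_j^{-1} \rangle = \langle \omega_i'^{\,\pm 1}, \omega_j \rangle^{-1} = \langle \omega_i', \omega_j \rangle^{\mp 1}, \qquad (i, j \in I_0),$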

(4) 1

1

1

γ ± 2 , γ = γ , γ ± 2 = γ , γ ± 2 = 1, (5) D , D ±1 = D ±1 , D = D , D ±1 = 1, (6) γ ± 2 , ωi±1 = 1 = ωi ±1 , γ ± 2 , (i ∈ I0 ), 1

1

(7) D ±1 , ωi = D , ωi±1 = s ∓δ0i , ωi±1 , D = ωi , D ±1 = r ±δ0i , (i ∈ I0 ), (8) 1

1

1

D , γ ± 2 = D ±1 , γ 2 = s ∓ 2 ,

1

1

1

γ ± 2 , D = γ 2 , D ±1 = r ± 2 ,

and all other pairs of generators are 0. Moreover, we have S(b ), S(b) = b , b for , b ∈ B. b ∈ B Proof. The uniqueness assertion is clear, as any skew-dual pairing of bialgebras is determined by the values on the generators. We proceed to prove the existence of the pairing. The pairing defined on generators as (1)—(8) may be extended to a bilinear form on × B in a way such that the defining properties in Definition 2.2 hold. We will verify B are preserved, ensuring that the form is well-defined and and B that the relations in B . and B is a skew-dual pairing of B First, it is straightforward to check that the bilinear form preserves all the relations 1 . Next, we observe that and the ω ±1 , γ ± 21 , D ±1 in B among the ωi±1 , γ ± 2 , D ±1 in B i the identities hold: for i, j ∈ I , (ε j , αi ) = −(εi+1 , α j ),

(ε j , α0 ) = −(ε1 , α j ),

(2.1)

which ensure the compatibility of the form defined above with the relations of (A2) and respectively. This fact is easily checked by definition (see (1)—(8)). So or B (A3) in B . and B we are left to verify that the form preserves the (r, s)-Serre relations in B For 1 ≤ i < n, (r, s)-Serre relations in B and B have been checked in [BW1]. Here . It suffices to and B we need only to verify the relations involving index i = 0 in B consider the following case (the remaining case is similar) X, e02 en−1 − (r −1 + s −1 )e0 en−1 e0 + (r s)−1 en−1 e02 , . By definition, this equals where X is any word in the generators of B (2) (X ), e0 ⊗ e0 ⊗ en−1 − (r −1 + s −1 )e0 ⊗ en−1 ⊗ e0 + (r s)−1 en−1 ⊗ e0 ⊗ e0 ,

(2.2)

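where $\Delta$ stands for $\Delta^{\mathrm{op}}_{\widehat{\mathcal B}'}$. In order for any one of these terms to be nonzero, $X$ must involve exactly two $f_0$ factors, one $f_{n-1}$ factor, and arbitrarily many $\omega_j'^{\pm1}$ ($j \in I_0$), $\gamma'^{\pm\frac12}$, or $D'^{\pm1}$ factors. For simplicity, we first consider three key cases:

(i) If $X = f_0^2 f_{n-1}$, then $\Delta^{(2)}(X)$ is equal to
\[ (\omega_0' \otimes \omega_0' \otimes f_0 + \omega_0' \otimes f_0 \otimes 1 + f_0 \otimes 1 \otimes 1)^2\, (\omega_{n-1}' \otimes \omega_{n-1}' \otimes f_{n-1} + \omega_{n-1}' \otimes f_{n-1} \otimes 1 + f_{n-1} \otimes 1 \otimes 1). \]
The relevant terms of $\Delta^{(2)}(X)$ are
\begin{align*}
& f_0\omega_0'\omega_{n-1}' \otimes f_0\omega_{n-1}' \otimes f_{n-1} + \omega_0' f_0 \omega_{n-1}' \otimes f_0\omega_{n-1}' \otimes f_{n-1} \\
&+ f_0\omega_0'\omega_{n-1}' \otimes \omega_0' f_{n-1} \otimes f_0 + \omega_0' f_0\omega_{n-1}' \otimes \omega_0' f_{n-1} \otimes f_0 \\
&+ \omega_0'^2 f_{n-1} \otimes f_0\omega_0' \otimes f_0 + \omega_0'^2 f_{n-1} \otimes \omega_0' f_0 \otimes f_0 .
\end{align*}
Therefore, (2.2) becomes
\begin{align*}
& \langle f_0\omega_0'\omega_{n-1}', e_0\rangle \langle f_0\omega_{n-1}', e_0\rangle \langle f_{n-1}, e_{n-1}\rangle + \langle \omega_0' f_0\omega_{n-1}', e_0\rangle \langle f_0\omega_{n-1}', e_0\rangle \langle f_{n-1}, e_{n-1}\rangle \\
&\quad - (r^{-1}+s^{-1}) \Big\{ \langle f_0\omega_0'\omega_{n-1}', e_0\rangle \langle \omega_0' f_{n-1}, e_{n-1}\rangle \langle f_0, e_0\rangle + \langle \omega_0' f_0\omega_{n-1}', e_0\rangle \langle \omega_0' f_{n-1}, e_{n-1}\rangle \langle f_0, e_0\rangle \Big\} \\
&\quad + (rs)^{-1} \Big\{ \langle \omega_0'^2 f_{n-1}, e_{n-1}\rangle \langle f_0\omega_0', e_0\rangle \langle f_0, e_0\rangle + \langle \omega_0'^2 f_{n-1}, e_{n-1}\rangle \langle \omega_0' f_0, e_0\rangle \langle f_0, e_0\rangle \Big\} \\
&= \frac{1}{(s-r)^3} \Big\{ 1 + \langle \omega_0', \omega_0\rangle - (r^{-1}+s^{-1}) \big( \langle \omega_0', \omega_{n-1}\rangle + \langle \omega_0', \omega_0\rangle \langle \omega_0', \omega_{n-1}\rangle \big) \\
&\qquad\qquad + (rs)^{-1} \big( \langle \omega_0', \omega_{n-1}\rangle^2 + \langle \omega_0', \omega_{n-1}\rangle^2 \langle \omega_0', \omega_0\rangle \big) \Big\} \\
&= \frac{1}{(s-r)^3} \Big\{ 1 + r s^{-1} - (r^{-1}+s^{-1})(s + r s^{-1} s) + (rs)^{-1} (s^2 + s^2 r s^{-1}) \Big\} = 0 .
\end{align*}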

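(ii) When $X = f_0 f_{n-1} f_0$, it is easy to get the relevant terms of $\Delta^{(2)}(X)$:
\begin{align*}
& \omega_0'\omega_{n-1}' f_0 \otimes f_0\omega_{n-1}' \otimes f_{n-1} + f_0\omega_{n-1}'\omega_0' \otimes \omega_{n-1}' f_0 \otimes f_{n-1} \\
&+ \omega_0'\omega_{n-1}' f_0 \otimes \omega_0' f_{n-1} \otimes f_0 + f_0\omega_{n-1}'\omega_0' \otimes f_{n-1}\omega_0' \otimes f_0 \\
&+ \omega_0' f_{n-1}\omega_0' \otimes \omega_0' f_0 \otimes f_0 + \omega_0' f_{n-1}\omega_0' \otimes f_0\omega_0' \otimes f_0 .
\end{align*}
Thus, (2.2) becomes
\begin{align*}
& \frac{1}{(s-r)^3} \Big\{ \langle \omega_0', \omega_0\rangle \langle \omega_{n-1}', \omega_0\rangle + \langle \omega_{n-1}', \omega_0\rangle - (r^{-1}+s^{-1}) \big( \langle \omega_0', \omega_0\rangle \langle \omega_{n-1}', \omega_0\rangle \langle \omega_0', \omega_{n-1}\rangle + 1 \big) \\
&\qquad\qquad + (rs)^{-1} \big( \langle \omega_0', \omega_{n-1}\rangle \langle \omega_0', \omega_0\rangle + \langle \omega_0', \omega_{n-1}\rangle \big) \Big\} \\
&= \frac{1}{(s-r)^3} \Big\{ r s^{-1} \cdot r^{-1} + r^{-1} - (r^{-1}+s^{-1})(r s^{-1} \cdot r^{-1} s + 1) + (rs)^{-1} (s \cdot r s^{-1} + s) \Big\} = 0 .
\end{align*}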
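The two scalar identities closing cases (i) and (ii) can be sanity-checked with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the pairing values $\langle \omega_0', \omega_0\rangle = rs^{-1}$, $\langle \omega_0', \omega_{n-1}\rangle = s$, $\langle \omega_{n-1}', \omega_0\rangle = r^{-1}$ are assumptions read off from the scalars substituted in the last line of each computation.

```python
from fractions import Fraction


def case_i(r, s):
    """Scalar obtained from (2.2) for X = f_0^2 f_{n-1} (case (i))."""
    w00 = r / s   # assumed value of <omega_0', omega_0>
    w0n = s       # assumed value of <omega_0', omega_{n-1}>
    inner = (1 + w00
             - (1 / r + 1 / s) * (w0n + w00 * w0n)
             + (w0n ** 2 + w0n ** 2 * w00) / (r * s))
    return inner / (s - r) ** 3


def case_ii(r, s):
    """Scalar obtained from (2.2) for X = f_0 f_{n-1} f_0 (case (ii))."""
    w00 = r / s   # assumed value of <omega_0', omega_0>
    w0n = s       # assumed value of <omega_0', omega_{n-1}>
    wn0 = 1 / r   # assumed value of <omega_{n-1}', omega_0>
    inner = (w00 * wn0 + wn0
             - (1 / r + 1 / s) * (w00 * wn0 * w0n + 1)
             + (w0n * w00 + w0n) / (r * s))
    return inner / (s - r) ** 3


if __name__ == "__main__":
    # Sample parameters with r != s; both scalars should vanish exactly.
    for r, s in [(Fraction(2), Fraction(3)), (Fraction(5, 7), Fraction(-4, 3))]:
        print(r, s, case_i(r, s), case_ii(r, s))
```

Exact fractions are used so that the check does not depend on floating-point rounding; it tests sample parameter values only and is not a substitute for the symbolic computation.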

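(iii) If $X = f_{n-1} f_0^2$, one can similarly get that (2.2) vanishes.

Finally, if $X$ is any word involving exactly two $f_0$ factors, one $f_{n-1}$ factor, and arbitrarily many factors $\omega_j'^{\pm1}$ ($j \in I_0$), $\gamma'^{\pm\frac12}$ and $D'^{\pm1}$, then (2.2) will just be a scalar multiple of one of the quantities we have already calculated, and hence will be 0. Analogous calculations show that the relations in $\widehat{\mathcal B}'$ are preserved.

Theorem 2.5. $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}')$ is isomorphic to $U_{r,s}(\widehat{\mathfrak{sl}_n})$ as Hopf algebras.

Proof. We denote the image $e_i \otimes 1$ of $e_i$ in $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}')$ by $\hat e_i$, and similarly for $\omega_i^{\pm1}$, $\gamma^{\pm\frac12}$, $D^{\pm1}$; we denote the image $1 \otimes f_i$ of $f_i$ in $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}')$ by $\hat f_i$, and similarly for $\omega_i'^{\pm1}$, $\gamma'^{\pm\frac12}$, $D'^{\pm1}$. Define a map $\varphi : \mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}') \longrightarrow U_{r,s}(\widehat{\mathfrak{sl}_n})$ by
\[ \varphi(\hat e_i) = e_i, \quad \varphi(\hat f_i) = f_i, \quad \varphi(\hat\omega_i^{\pm1}) = \omega_i^{\pm1}, \quad \varphi(\hat\omega_i'^{\pm1}) = \omega_i'^{\pm1}, \]
\[ \varphi(\hat\gamma^{\pm\frac12}) = \gamma^{\pm\frac12}, \quad \varphi(\hat\gamma'^{\pm\frac12}) = \gamma'^{\pm\frac12}, \quad \varphi(\hat D^{\pm1}) = D^{\pm1}, \quad \varphi(\hat D'^{\pm1}) = D'^{\pm1}. \]
The remaining argument is analogous to that of [BGH1, Theorem 2.5].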

Remark 2.6. (1) Up to now, we have completely solved the compatibility problem on the defining relations of our two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$). This is done in two steps: the proof of Theorem 2.5 indicates that the cross relations between $\widehat{\mathcal B}$ and $\widehat{\mathcal B}'$ are half of the relations (A1)–(A4), and the proof of Proposition 2.4 shows the remaining relations, including the remaining half of relations (A1)–(A4) and the $(r,s)$-Serre relations (A5)–(A7).

(2) When $r = s^{-1} = q$, the Hopf algebra $U_{q,q^{-1}}(\widehat{\mathfrak{sl}_n})$ modulo the Hopf ideal generated by the set $\{\, \omega_i' - \omega_i^{-1}\ (i \in I_0),\ \gamma'^{\frac12} - \gamma^{-\frac12},\ D' - D^{-1} \,\}$ is the usual quantum affine algebra $U_q(\widehat{\mathfrak{sl}_n})$ of Drinfel'd-Jimbo type.

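Let $U^0 = \mathbb K[\omega_0^{\pm1}, \cdots, \omega_n^{\pm1}, \omega_0'^{\pm1}, \cdots, \omega_n'^{\pm1}]$, $U_0 = \mathbb K[\omega_0^{\pm1}, \cdots, \omega_n^{\pm1}]$, and $U_0' = \mathbb K[\omega_0'^{\pm1}, \cdots, \omega_n'^{\pm1}]$ denote the Laurent polynomial subalgebras of $U_{r,s}(\widehat{\mathfrak{sl}_n})$, $\widehat{\mathcal B}$, and $\widehat{\mathcal B}'$ respectively. Clearly, $U^0 = U_0 U_0' = U_0' U_0$. Furthermore, let us denote by $U_{r,s}(\widehat{\mathfrak n})$ (resp. $U_{r,s}(\widehat{\mathfrak n}^-)$) the subalgebra of $\widehat{\mathcal B}$ (resp. $\widehat{\mathcal B}'$) generated by $e_i$ (resp. $f_i$) for all $i \in I_0$. Thus, by definition, we have $\widehat{\mathcal B} = U_{r,s}(\widehat{\mathfrak n}) \rtimes U_0$ and $\widehat{\mathcal B}' = U_0' \ltimes U_{r,s}(\widehat{\mathfrak n}^-)$, so that the double $\mathcal D(\widehat{\mathcal B}, \widehat{\mathcal B}') \cong U_{r,s}(\widehat{\mathfrak n}) \otimes U^0 \otimes U_{r,s}(\widehat{\mathfrak n}^-)$, as vector spaces. On the other hand, if we consider $\langle\,,\rangle^- : \widehat{\mathcal B}' \times \widehat{\mathcal B} \longrightarrow \mathbb K$ given by $\langle b', b\rangle^- := \langle S(b'), b\rangle$, the convolution inverse of the skew-dual pairing $\langle\,,\rangle$ in Proposition 2.4, then the composition with the flip mapping $\sigma$ gives rise to a new skew-dual pairing $\langle\,|\,\rangle := \langle\,,\rangle^- \circ \sigma : \widehat{\mathcal B} \times \widehat{\mathcal B}' \longrightarrow \mathbb K$, given by $\langle b \,|\, b'\rangle = \langle S(b'), b\rangle$. As a byproduct of Theorem 2.5, similar to [BGH1, Cor. 2.6], we get the standard triangular decomposition of $U_{r,s}(\widehat{\mathfrak{sl}_n})$.

Corollary 2.7. $U_{r,s}(\widehat{\mathfrak{sl}_n}) \cong U_{r,s}(\widehat{\mathfrak n}^-) \otimes U^0 \otimes U_{r,s}(\widehat{\mathfrak n})$, as vector spaces.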

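Corollary 2.8. For any $\zeta = \sum_{i=0}^{n} \zeta_i \alpha_i \in Q$ (the root lattice of $\widehat{\mathfrak{sl}_n}$), the defining relations (A2) and (A3) in $U_{r,s}(\widehat{\mathfrak{sl}_n})$ take the form:
\[ \omega_\zeta\, e_i\, \omega_\zeta^{-1} = \langle \omega_i', \omega_\zeta\rangle\, e_i, \qquad \omega_\zeta'\, e_i\, \omega_\zeta'^{-1} = \langle \omega_\zeta', \omega_i\rangle^{-1} e_i, \]
\[ \omega_\zeta\, f_i\, \omega_\zeta^{-1} = \langle \omega_i', \omega_\zeta\rangle^{-1} f_i, \qquad \omega_\zeta'\, f_i\, \omega_\zeta'^{-1} = \langle \omega_\zeta', \omega_i\rangle\, f_i. \]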

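$U_{r,s}(\widehat{\mathfrak n}^{\pm}) = \bigoplus_{\eta \in Q^+} U_{r,s}^{\pm\eta}(\widehat{\mathfrak n}^{\pm})$ is then $Q^{\pm}$-graded with
\[ U_{r,s}^{\eta}(\widehat{\mathfrak n}^{\pm}) = \big\{\, a \in U_{r,s}(\widehat{\mathfrak n}^{\pm}) \;\big|\; \omega_\zeta\, a\, \omega_\zeta^{-1} = \langle \omega_\eta', \omega_\zeta\rangle\, a,\ \ \omega_\zeta'\, a\, \omega_\zeta'^{-1} = \langle \omega_\zeta', \omega_\eta\rangle^{-1} a \,\big\}, \]
for $\eta \in Q^+ \cup Q^-$. Furthermore, $U = \bigoplus_{\eta \in Q} U_{r,s}^{\eta}(\widehat{\mathfrak{sl}_n})$ is $Q$-graded with
\[ U_{r,s}^{\eta}(\widehat{\mathfrak{sl}_n}) = \Big\{\, \sum F_\alpha \omega_\mu' \omega_\nu E_\beta \in U \;\Big|\; \begin{array}{l} \omega_\zeta (F_\alpha \omega_\mu' \omega_\nu E_\beta)\, \omega_\zeta^{-1} = \langle \omega_{\beta-\alpha}', \omega_\zeta\rangle\, F_\alpha \omega_\mu' \omega_\nu E_\beta, \\ \omega_\zeta' (F_\alpha \omega_\mu' \omega_\nu E_\beta)\, \omega_\zeta'^{-1} = \langle \omega_\zeta', \omega_{\beta-\alpha}\rangle^{-1} F_\alpha \omega_\mu' \omega_\nu E_\beta, \end{array} \text{ with } \beta - \alpha = \eta \,\Big\}, \]
where $F_\alpha$ (resp. $E_\beta$) runs over monomials $f_{i_1} \cdots f_{i_l}$ (resp. $e_{j_1} \cdots e_{j_m}$) such that $\alpha_{i_1} + \cdots + \alpha_{i_l} = \alpha$ (resp. $\alpha_{j_1} + \cdots + \alpha_{j_m} = \beta$).

Definition 2.9. Let $\tau$ be the $\mathbb Q$-algebra anti-automorphism of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ such that $\tau(r) = s$, $\tau(s) = r$, $\tau(\langle \omega_i', \omega_j\rangle^{\pm1}) = \langle \omega_j', \omega_i\rangle^{\mp1}$, and
\[ \tau(e_i) = f_i, \quad \tau(f_i) = e_i, \quad \tau(\omega_i) = \omega_i', \quad \tau(\omega_i') = \omega_i, \quad \tau(\gamma) = \gamma', \quad \tau(\gamma') = \gamma, \quad \tau(D) = D', \quad \tau(D') = D. \]
Then $\widehat{\mathcal B}' = \tau(\widehat{\mathcal B})$ with those induced defining relations from $\widehat{\mathcal B}$, and those cross relations in (A2)–(A4) are antisymmetric with respect to $\tau$.

3. Drinfel'd Realization of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ and Quantum Affine Lyndon Basis

3.1. For the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) we defined in Sect. 2, we give the following definition of its Drinfel'd realization. In the two-parameter case, the defining relations (D2), (D6), (D7) and (D8) below appear to vary dramatically in comparison with the one-parameter cases (see (d2), (d6), (d7) and (d8) in Remark 3.3), where the compatibilities for the whole system are based on some intrinsic considerations as indicated in the sequel. We briefly write $\langle i, j\rangle := \langle \omega_i', \omega_j\rangle$.

Definition 3.1. Let $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) be the unital associative algebra over $\mathbb K$ generated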

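by the elements $x_i^{\pm}(k)$, $a_i(\ell)$, $\omega_i^{\pm1}$, $\omega_i'^{\pm1}$, $\gamma^{\pm\frac12}$, $\gamma'^{\pm\frac12}$, $D^{\pm1}$, $D'^{\pm1}$ ($i \in I$, $k, k' \in \mathbb Z$, $\ell, \ell' \in \mathbb Z\setminus\{0\}$), subject to the following defining relations:

(D1) $\gamma^{\pm\frac12}$, $\gamma'^{\pm\frac12}$ are central with $\gamma\gamma' = rs$, $\omega_i \omega_i^{-1} = \omega_j' \omega_j'^{-1} = 1 = D D^{-1} = D' D'^{-1}$ ($i, j \in I$), and
\[ [\omega_i^{\pm1}, \omega_j^{\pm1}] = [\omega_i^{\pm1}, D^{\pm1}] = [\omega_j'^{\pm1}, D^{\pm1}] = [\omega_i^{\pm1}, D'^{\pm1}] = 0 = [\omega_i^{\pm1}, \omega_j'^{\pm1}] = [\omega_j'^{\pm1}, D'^{\pm1}] = [D'^{\pm1}, D^{\pm1}] = [\omega_i'^{\pm1}, \omega_j'^{\pm1}]. \]

(D2)
\[ [\, a_i(\ell), a_j(\ell') \,] = \delta_{\ell+\ell',0}\, \frac{(rs)^{\frac{|\ell|}{2}} \big( \langle i,i\rangle^{\frac{\ell a_{ij}}{2}} - \langle i,i\rangle^{-\frac{\ell a_{ij}}{2}} \big)}{|\ell|\,(r-s)} \cdot \frac{\gamma^{|\ell|} - \gamma'^{|\ell|}}{r-s}. \]

(D3)
\[ [\, a_i(\ell), \omega_j^{\pm1} \,] = [\, a_i(\ell), \omega_j'^{\pm1} \,] = 0. \]

(D4)
\[ D\, x_i^{\pm}(k)\, D^{-1} = r^k x_i^{\pm}(k), \qquad D'\, x_i^{\pm}(k)\, D'^{-1} = s^k x_i^{\pm}(k), \]
\[ D\, a_i(\ell)\, D^{-1} = r^{\ell} a_i(\ell), \qquad D'\, a_i(\ell)\, D'^{-1} = s^{\ell} a_i(\ell). \]

(D5)
\[ \omega_i\, x_j^{\pm}(k)\, \omega_i^{-1} = \langle j, i\rangle^{\pm1} x_j^{\pm}(k), \qquad \omega_i'\, x_j^{\pm}(k)\, \omega_i'^{-1} = \langle i, j\rangle^{\mp1} x_j^{\pm}(k). \]

(D6$_1$)
\[ [\, a_i(\ell), x_j^{\pm}(k) \,] = \pm \frac{(rs)^{\frac{|\ell|}{2}} \big( \langle i,i\rangle^{\frac{\ell a_{ij}}{2}} - \langle i,i\rangle^{-\frac{\ell a_{ij}}{2}} \big)}{\ell\,(r-s)}\, \gamma^{\pm\frac{\ell}{2}}\, x_j^{\pm}(\ell+k), \quad \text{for } \ell < 0, \]

(D6$_2$)
\[ [\, a_i(\ell), x_j^{\pm}(k) \,] = \pm \frac{(rs)^{\frac{|\ell|}{2}} \big( \langle i,i\rangle^{\frac{\ell a_{ij}}{2}} - \langle i,i\rangle^{-\frac{\ell a_{ij}}{2}} \big)}{\ell\,(r-s)}\, \gamma'^{\pm\frac{\ell}{2}}\, x_j^{\pm}(\ell+k), \quad \text{for } \ell > 0. \]

(D7)
\[ x_i^{\pm}(k+1)\, x_j^{\pm}(k') - \langle j, i\rangle^{\pm1} x_j^{\pm}(k')\, x_i^{\pm}(k+1) = -\big( \langle j, i\rangle \langle i, j\rangle^{-1} \big)^{\pm\frac12} \Big( x_j^{\pm}(k'+1)\, x_i^{\pm}(k) - \langle i, j\rangle^{\pm1} x_i^{\pm}(k)\, x_j^{\pm}(k'+1) \Big). \]

(D8)
\[ [\, x_i^{+}(k), x_j^{-}(k') \,] = \frac{\delta_{ij}}{r-s} \Big( \gamma'^{-k} \gamma^{-\frac{k+k'}{2}}\, \omega_i(k+k') - \gamma^{k'} \gamma'^{\frac{k+k'}{2}}\, \omega_i'(k+k') \Big), \]
where $\omega_i(m)$, $\omega_i'(-m)$ ($m \in \mathbb Z_{\ge 0}$) with $\omega_i(0) = \omega_i$ and $\omega_i'(0) = \omega_i'$ are defined by:
\[ \sum_{m=0}^{\infty} \omega_i(m)\, z^{-m} = \omega_i \exp\Big( (r-s) \sum_{\ell=1}^{\infty} a_i(\ell)\, z^{-\ell} \Big); \qquad \sum_{m=0}^{\infty} \omega_i'(-m)\, z^{m} = \omega_i' \exp\Big( -(r-s) \sum_{\ell=1}^{\infty} a_i(-\ell)\, z^{\ell} \Big), \]
with $\omega_i(-m) = 0$ and $\omega_i'(m) = 0$, $\forall\, m > 0$.

(D9$_1$)
\[ x_i^{\pm}(m)\, x_j^{\pm}(k) = x_j^{\pm}(k)\, x_i^{\pm}(m), \quad \text{for } a_{ij} = 0, \]

(D9$_2$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (r^{\pm1} + s^{\pm1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + (rs)^{\pm1} x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \quad \text{for } a_{ij} = -1,\ 1 \le i < j < n, \]

(D9$_3$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (r^{\mp1} + s^{\mp1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + (rs)^{\mp1} x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \quad \text{for } a_{ij} = -1,\ 1 \le j < i < n. \]

Proposition 3.2. There exists the $\mathbb Q$-algebra antiautomorphism $\tau$ of $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ such that $\tau(r) = s$, $\tau(s) = r$, $\tau(\langle \omega_i', \omega_j\rangle^{\pm1}) = \langle \omega_j', \omega_i\rangle^{\mp1}$ and
\[ \tau(\omega_i) = \omega_i', \quad \tau(\omega_i') = \omega_i, \quad \tau(\gamma) = \gamma', \quad \tau(\gamma') = \gamma, \quad \tau(D) = D', \quad \tau(D') = D, \]
\[ \tau(a_i(\ell)) = a_i(-\ell), \quad \tau(x_i^{\pm}(m)) = x_i^{\mp}(-m), \quad \tau(\omega_i(m)) = \omega_i'(-m), \quad \tau(\omega_i'(-m)) = \omega_i(m), \]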

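and $\tau$ preserves each defining relation (D$n$) in Definition 3.1 for $n = 1, \cdots, 9$.

Remark 3.3. (1) Note that the defining relations (D1)–(D5), (D7), (D8), and (D9$_1$)–(D9$_3$) are each self-compatible under the $\mathbb Q$-algebra antiautomorphism $\tau$, while the pair of defining relations ((D6$_1$), (D6$_2$)) is compatible with each other with respect to $\tau$. Using such a $\tau$, it is sufficient to consider the compatibility for half of the relations, e.g., those relations involving the $+$-parts for $x_i^{\pm}(m)$, or positive $\ell$'s for $a_i(\ell)$ (for instance, see (D6$_2$)).

(2) The constraint condition $\gamma\gamma' = rs$ in (D1) is required intrinsically by the compatibilities among (D1), (D3), (D5), (D6), (D7) & (D8). For instance, by (D7), we have
\[ [\, x_i^{-}(0), x_j^{-}(1) \,]_{\langle i,j\rangle} = \big( \langle j,i\rangle \langle i,j\rangle \big)^{\frac12}\, [\, x_i^{-}(1), x_j^{-}(0) \,]_{\langle j,i\rangle^{-1}}. \]
Thus, using the property (3.5) in Definition 3.4 below and (D8) & (D5), we get
\begin{align*}
[\, x_j^{+}(0), [\, x_i^{-}(0), x_j^{-}(1) \,]_{\langle i,j\rangle} \,] &= \big( \langle j,i\rangle \langle i,j\rangle \big)^{\frac12}\, [\, x_i^{-}(1), [\, x_j^{+}(0), x_j^{-}(0) \,] \,]_{\langle j,i\rangle^{-1}} \\
&= \big( \langle j,i\rangle \langle i,j\rangle \big)^{\frac12} \Big[\, x_i^{-}(1), \frac{\omega_j - \omega_j'}{r-s} \,\Big]_{\langle j,i\rangle^{-1}} \\
&= \frac{ \big( \langle j,i\rangle \langle i,j\rangle \big)^{\frac12} - \big( \langle j,i\rangle \langle i,j\rangle \big)^{-\frac12} }{r-s}\; x_i^{-}(1)\, \omega_j.
\end{align*}
However, using (3.5), (D8), (D3), (D5) & (D6$_2$), we can follow another way to expand $[\, x_j^{+}(0), [\, x_i^{-}(0), x_j^{-}(1) \,]_{\langle i,j\rangle} \,]$ directly as
\begin{align*}
[\, x_j^{+}(0), [\, x_i^{-}(0), x_j^{-}(1) \,]_{\langle i,j\rangle} \,] &= [\, x_i^{-}(0), [\, x_j^{+}(0), x_j^{-}(1) \,] \,]_{\langle i,j\rangle} \\
&= \gamma^{-\frac12}\, [\, x_i^{-}(0), a_j(1) \,]\, \omega_j \\
&= (rs)^{\frac12} (\gamma\gamma')^{-\frac12}\, \frac{ \langle j,j\rangle^{\frac{a_{ji}}{2}} - \langle j,j\rangle^{-\frac{a_{ji}}{2}} }{r-s}\; x_i^{-}(1)\, \omega_j.
\end{align*}
Therefore, we obtain that $\gamma\gamma' = rs$ and $\langle i,j\rangle \langle j,i\rangle = \langle i,i\rangle^{a_{ij}}$, for any $i, j \in I$.

(3) As a glimpse of the compatibility of (D2) with (D6$_1$), (D6$_2$) and (D8), we have the following: By (D8), we get $a_i(1) = \omega_i^{-1} \gamma^{\frac12}\, [\, x_i^{+}(0), x_i^{-}(1) \,]$ and $a_i(-1) = \omega_i'^{-1} \gamma'^{\frac12}\, [\, x_i^{+}(-1), x_i^{-}(0) \,]$. Then using one of these expressions of $a_i(\pm1)$ and using (D6$_1$) (or (D6$_2$)) and (D8) again, we may expand the Lie bracket $[a_i(1), a_j(-1)]$ in two ways and arrive at the same formula as (D2). One is to expand $a_i(1)$ first, and then to use (D6$_1$) & (D8) as follows:
\begin{align*}
[\, a_i(1), a_j(-1) \,] &= \omega_i^{-1} \gamma^{\frac12}\, [\, [x_i^{+}(0), x_i^{-}(1)], a_j(-1) \,] \\
&= \omega_i^{-1} \gamma^{\frac12} \Big( [\, [x_i^{+}(0), a_j(-1)], x_i^{-}(1) \,] + [\, x_i^{+}(0), [x_i^{-}(1), a_j(-1)] \,] \Big) \\
&= \omega_i^{-1} \gamma^{\frac12}\, \{-a_{ij}\}_{\langle i,i\rangle} \Big( \gamma^{-\frac12} [x_i^{+}(-1), x_i^{-}(1)] - \gamma^{\frac12} [x_i^{+}(0), x_i^{-}(0)] \Big) \\
&= \{-a_{ij}\}_{\langle i,i\rangle}\, \omega_i^{-1} \Big( \frac{\gamma' \omega_i - \gamma \omega_i'}{r-s} - \gamma\, \frac{\omega_i - \omega_i'}{r-s} \Big) \\
&= \{-a_{ij}\}_{\langle i,i\rangle}\, \frac{\gamma' - \gamma}{r-s} = \{a_{ij}\}_{\langle i,i\rangle}\, \frac{\gamma - \gamma'}{r-s},
\end{align*}
where
\[ \{\ell a_{ij}\}_{\langle i,i\rangle} := \frac{(rs)^{\frac{|\ell|}{2}} \big( \langle i,i\rangle^{\frac{\ell a_{ij}}{2}} - \langle i,i\rangle^{-\frac{\ell a_{ij}}{2}} \big)}{r-s} = \{\ell a_{ji}\}_{\langle j,j\rangle}, \qquad \{-\ell a_{ij}\}_{\langle i,i\rangle} = -\{\ell a_{ij}\}_{\langle i,i\rangle}. \]
Expanding $a_j(-1)$ instead and using (D6$_2$) & (D8), we get the same result. More compatibilities will become clearer in the proof of the Drinfel'd isomorphism theorem.

(4) Another observation is the following: When $r = s^{-1} = q$, the algebra $\mathcal U_{q,q^{-1}}(\widehat{\mathfrak{sl}_n})$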

modulo the ideal generated by the set $\{\, \omega_i' - \omega_i^{-1}\ (i \in I),\ \gamma'^{\frac12} - \gamma^{-\frac12},\ D' - D^{-1} \,\}$ is exactly the usual Drinfel'd realization $\mathcal U_q(\widehat{\mathfrak{sl}_n})$ defined below (cf. [B2]).

The unital associative algebra $\mathcal U_q(\widehat{\mathfrak{sl}_n})$ over $\mathbb Q(q)$ is generated by the elements $x_i^{\pm}(k)$, $a_i(\ell)$, $\omega_i^{\pm1}$, $\gamma^{\pm\frac12}$, $D^{\pm1}$ ($i \in I$, $k \in \mathbb Z$, $\ell \in \mathbb Z\setminus\{0\}$) subject to the following defining relations:

(d1) $\gamma^{\pm\frac12}$ are central, $\omega_i \omega_i^{-1} = 1 = D D^{-1}$ ($i \in I$), and for $i, j \in I$, one has
\[ [\, \omega_i^{\pm1}, \omega_j^{\pm1} \,] = [\, \omega_i^{\pm1}, D^{\pm1} \,] = 0. \]

(d2)
\[ [\, a_i(\ell), a_j(\ell') \,] = \delta_{\ell+\ell', 0}\, \frac{[\ell a_{ij}]}{|\ell|} \cdot \frac{\gamma^{|\ell|} - \gamma^{-|\ell|}}{q - q^{-1}}, \qquad [n] = \frac{q^n - q^{-n}}{q - q^{-1}}. \]

(d3)
\[ [\, a_i(\ell), \omega_j^{\pm1} \,] = 0. \]

(d4)
\[ D\, x_i^{\pm}(k)\, D^{-1} = q^k x_i^{\pm}(k), \qquad D\, a_i(\ell)\, D^{-1} = q^{\ell} a_i(\ell). \]

(d5)
\[ \omega_i\, x_j^{\pm}(k)\, \omega_i^{-1} = q^{\pm a_{ij}} x_j^{\pm}(k). \]

(d6)
\[ [\, a_i(\ell), x_j^{\pm}(k) \,] = \pm \frac{[\ell a_{ij}]}{\ell}\, \gamma^{\mp\frac{|\ell|}{2}}\, x_j^{\pm}(\ell+k). \]

(d7)
\[ x_i^{\pm}(k+1)\, x_j^{\pm}(k') - q^{\pm a_{ij}} x_j^{\pm}(k')\, x_i^{\pm}(k+1) = q^{\pm a_{ij}} x_i^{\pm}(k)\, x_j^{\pm}(k'+1) - x_j^{\pm}(k'+1)\, x_i^{\pm}(k). \]

(d8)
\[ [\, x_i^{+}(k), x_j^{-}(k') \,] = \frac{\delta_{ij}}{q - q^{-1}} \Big( \gamma^{\frac{k-k'}{2}}\, \omega_i(k+k') - \gamma^{\frac{k'-k}{2}}\, \omega_i^{-1}(k+k') \Big), \]
where $\omega_i(m)$ and $\omega_i^{-1}(-m)$ ($m \in \mathbb Z_{\ge 0}$) with $\omega_i(0) = \omega_i$ and $\omega_i^{-1}(0) = \omega_i^{-1}$ are defined by:
\[ \sum_{m=0}^{\infty} \omega_i(m)\, z^{-m} = \omega_i \exp\Big( (q - q^{-1}) \sum_{\ell=1}^{\infty} a_i(\ell)\, z^{-\ell} \Big), \quad \big( \omega_i(-m) = 0,\ \forall\, m > 0 \big); \]
\[ \sum_{m=0}^{\infty} \omega_i^{-1}(-m)\, z^{m} = \omega_i^{-1} \exp\Big( -(q - q^{-1}) \sum_{\ell=1}^{\infty} a_i(-\ell)\, z^{\ell} \Big), \quad \big( \omega_i^{-1}(m) = 0,\ \forall\, m > 0 \big). \]

(d9$_1$)
\[ x_i^{\pm}(m)\, x_j^{\pm}(k) = x_j^{\pm}(k)\, x_i^{\pm}(m), \quad \text{for } a_{ij} = 0, \]

(d9$_2$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (q^{\pm1} + q^{\mp1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \quad \text{for } a_{ij} = -1,\ 1 \le i < j < n, \]

(d9$_3$)
\[ \mathrm{Sym}_{m_1, m_2} \Big( x_i^{\pm}(m_1) x_i^{\pm}(m_2) x_j^{\pm}(k) - (q^{\mp1} + q^{\pm1})\, x_i^{\pm}(m_1) x_j^{\pm}(k) x_i^{\pm}(m_2) + x_j^{\pm}(k) x_i^{\pm}(m_1) x_i^{\pm}(m_2) \Big) = 0, \quad \text{for } a_{ij} = -1,\ 1 \le j < i < n. \]

3.2. To show that the $\mathbb Q(r,s)$-algebra $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) in Definition 3.1 is exactly the Drinfel'd realization of the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ ($n > 2$) defined in Definition 2.1, we need some preliminaries on Lyndon words, and we adapt a definition of quantum Lie bracket borrowed from [J2] to give our definition of the "affine" quantum Lie bracket (see Definition 3.6). This enables us to derive, for the first time in the quantum affine cases, a description of the quantum affine Lyndon basis. Note that the (affine) quantum Lie bracket has some advantages in calculations, such as depending less on the degrees of elements (see the properties (3.3) & (3.4) below). This generalized quantum Lie bracket, like the one used in the usual construction of the quantum Lyndon basis (for the definition, see [R2]), is consistent with the process of adding the bracketing on the corresponding Lyndon words. This is crucial to the quantum calculations we develop later on.
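The specialization $r = q$, $s = q^{-1}$ described in Remark 3.3(4) can be illustrated numerically: the two-parameter structure constant appearing in (D2) should reduce to $[\ell a_{ij}]/|\ell|$ from (d2). The sketch below is only a floating-point sanity check; the function names are hypothetical, and the value $\langle i,i\rangle = rs^{-1}$ is an assumption consistent with the factors $(rs^{-1})^{k'/2}$ used later in the paper.

```python
import math


def q_int(n, q):
    """Quantum integer [n] = (q^n - q^-n) / (q - q^-1)."""
    return (q ** n - q ** (-n)) / (q - 1 / q)


def two_param_coeff(l, a, r, s):
    """First factor on the right-hand side of (D2), with <i,i> taken to be r/s."""
    ii = r / s  # assumption: <i,i> = r s^{-1}
    return ((r * s) ** (abs(l) / 2)
            * (ii ** (l * a / 2) - ii ** (-l * a / 2))
            / (abs(l) * (r - s)))


def specializes(l, a, q):
    """Compare the two-parameter coefficient at r = q, s = 1/q with [l*a]/|l|."""
    lhs = two_param_coeff(l, a, q, 1 / q)
    rhs = q_int(l * a, q) / abs(l)
    return math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)


if __name__ == "__main__":
    # Cartan entries of type A are 2, -1, 0; l runs over a few nonzero modes.
    for l in (-3, -1, 1, 2, 5):
        for a in (2, -1, 0):
            print(l, a, specializes(l, a, 1.7))
```

The same comparison with $\gamma' = \gamma^{-1}$ turns the second factor of (D2) into $(\gamma^{|\ell|} - \gamma^{-|\ell|})/(q - q^{-1})$, which is the factor appearing in (d2).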

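Definition 3.4 ([J2]). The quantum Lie bracket $[\, a_1, a_2, \cdots, a_s \,]_{(q_1, q_2, \cdots, q_{s-1})}$ is defined inductively by
\[ [\, a_1, a_2 \,]_q = a_1 a_2 - q\, a_2 a_1, \quad \text{for } q \in \mathbb K\setminus\{0\}, \]
\[ [\, a_1, a_2, \cdots, a_s \,]_{(q_1, q_2, \cdots, q_{s-1})} = [\, a_1, [\, a_2, \cdots, a_s \,]_{(q_1, \cdots, q_{s-2})} \,]_{q_{s-1}}, \quad \text{for } q_i \in \mathbb K\setminus\{0\}. \]
The following identities follow from the definition:
\begin{align}
[\, a, bc \,]_v &= [\, a, b \,]_x\, c + x\, b\, [\, a, c \,]_{\frac{v}{x}}, \qquad x \ne 0, \tag{3.1} \\
[\, ab, c \,]_v &= a\, [\, b, c \,]_x + x\, [\, a, c \,]_{\frac{v}{x}}\, b, \qquad x \ne 0, \tag{3.2} \\
[\, a, [\, b, c \,]_u \,]_v &= [\, [\, a, b \,]_x, c \,]_{\frac{uv}{x}} + x\, [\, b, [\, a, c \,]_{\frac{v}{x}} \,]_{\frac{u}{x}}, \qquad x \ne 0, \tag{3.3} \\
[\, [\, a, b \,]_u, c \,]_v &= [\, a, [\, b, c \,]_x \,]_{\frac{uv}{x}} + x\, [\, [\, a, c \,]_{\frac{v}{x}}, b \,]_{\frac{u}{x}}, \qquad x \ne 0, \tag{3.4} \\
[\, a, [\, b_1, \cdots, b_s \,]_{(v_1, \cdots, v_{s-1})} \,] &= \sum_i\, [\, b_1, \cdots, [\, a, b_i \,], \cdots, b_s \,]_{(v_1, \cdots, v_{s-1})}, \tag{3.5} \\
[\, a, a, b \,]_{(u, v)} = [\, a, a, b \,]_{(v, u)} &= a^2 b - (u + v)\, aba + (uv)\, ba^2. \tag{3.6}
\end{align}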
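The identities (3.1)–(3.4) and (3.6) hold in any associative algebra, so they can be spot-checked in a free algebra on three letters. The following is a minimal sketch, not taken from the paper: elements are stored as dictionaries from words (tuples of letters) to exact rational coefficients, all helper names are hypothetical, and the scalars $u, v, x$ are sample values chosen here for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def add(p, q, c=1):
    """Return p + c*q for free-algebra elements stored as {word: coefficient}."""
    out = defaultdict(Fraction)
    for w, coeff in p.items():
        out[w] += coeff
    for w, coeff in q.items():
        out[w] += c * coeff
    return {w: coeff for w, coeff in out.items() if coeff != 0}


def mul(p, q):
    """Product in the free algebra: words multiply by concatenation."""
    out = defaultdict(Fraction)
    for w1, c1 in p.items():
        for w2, c2 in q.items():
            out[w1 + w2] += c1 * c2
    return {w: coeff for w, coeff in out.items() if coeff != 0}


def scale(p, c):
    """Scalar multiple c*p."""
    return add({}, p, c)


def bracket(p, q, v):
    """Quantum Lie bracket [p, q]_v = pq - v qp."""
    return add(mul(p, q), mul(q, p), -v)


def gen(name):
    """A single free generator."""
    return {(name,): Fraction(1)}


a, b, c = gen("a"), gen("b"), gen("c")
u, v, x = Fraction(2, 3), Fraction(-5, 7), Fraction(3, 4)  # sample nonzero scalars

# (3.1): [a, bc]_v = [a, b]_x c + x b [a, c]_{v/x}
id_31 = bracket(a, mul(b, c), v) == add(
    mul(bracket(a, b, x), c), scale(mul(b, bracket(a, c, v / x)), x))

# (3.3): [a, [b, c]_u]_v = [[a, b]_x, c]_{uv/x} + x [b, [a, c]_{v/x}]_{u/x}
id_33 = bracket(a, bracket(b, c, u), v) == add(
    bracket(bracket(a, b, x), c, u * v / x),
    scale(bracket(b, bracket(a, c, v / x), u / x), x))

# (3.4): [[a, b]_u, c]_v = [a, [b, c]_x]_{uv/x} + x [[a, c]_{v/x}, b]_{u/x}
id_34 = bracket(bracket(a, b, u), c, v) == add(
    bracket(a, bracket(b, c, x), u * v / x),
    scale(bracket(bracket(a, c, v / x), b, u / x), x))

# (3.6): [a, a, b]_{(u, v)} = a^2 b - (u + v) aba + uv ba^2
rhs_36 = add(add(mul(mul(a, a), b), mul(mul(a, b), a), -(u + v)),
             mul(mul(b, a), a), u * v)
id_36 = bracket(a, bracket(a, b, u), v) == rhs_36

if __name__ == "__main__":
    print(id_31, id_33, id_34, id_36)
```

Since the generators are free, an identity that holds here for the sample scalars is not contradicted by any relation; a symbolic proof still requires expanding both sides for general $u, v, x$.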

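Definition 3.5. For the generators system of the algebra $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$, we define the $\dot Q$-gradation (where $\dot Q$ is the root lattice of $\mathfrak{sl}_n$) as follows:
\[ \deg(\omega_i^{\pm1}) = \deg(\omega_i'^{\pm1}) = \deg(\gamma^{\pm\frac12}) = \deg(\gamma'^{\pm\frac12}) = \deg(D^{\pm1}) = \deg(D'^{\pm1}) = 0, \]
\[ \deg(a_i(\pm\ell)) = 0, \qquad \deg(x_i^{\pm}(k)) = \pm\alpha_i. \]
Hence, the defining relations (D1)–(D9) ensure that $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ has a triangular decomposition:
\[ \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n}) = \mathcal U_{r,s}(\widetilde{\mathfrak n}^-) \otimes \mathcal U_{r,s}^0(\widehat{\mathfrak{sl}_n}) \otimes \mathcal U_{r,s}(\widetilde{\mathfrak n}), \]
where $\mathcal U_{r,s}(\widetilde{\mathfrak n}^{\pm}) = \bigoplus_{\alpha \in \dot Q^{\pm}} \mathcal U_{r,s}(\widetilde{\mathfrak n}^{\pm})_\alpha$ is generated respectively by $x_i^{\pm}(k)$ ($i \in I$), and $\mathcal U_{r,s}^0(\widehat{\mathfrak{sl}_n})$ is the subalgebra generated by $\omega_i^{\pm1}$, $\omega_i'^{\pm1}$, $\gamma^{\pm\frac12}$, $\gamma'^{\pm\frac12}$, $D^{\pm1}$, $D'^{\pm1}$ and $a_i(\pm\ell)$ for $i \in I$, $\ell \in \mathbb N$. Namely, $\mathcal U_{r,s}^0(\widehat{\mathfrak{sl}_n})$ is generated by the toral subalgebra $\mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})^0$ and the quantum Heisenberg subalgebra $\mathcal H_{r,s}(\widehat{\mathfrak{sl}_n})$ generated by those quantum imaginary root vectors $a_i(\pm\ell)$ ($i \in I$, $\ell \in \mathbb N$).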

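Definition 3.6. For $\alpha, \beta \in \dot Q^+$ (a positive root lattice of $\mathfrak{sl}_n$), $x_\alpha^{\pm}(k), x_\beta^{\pm}(k') \in \mathcal U_{r,s}(\widetilde{\mathfrak n}^{\pm})$, we define their "affine" quantum Lie bracket as follows:
\[ [\, x_\alpha^{\pm}(k), x_\beta^{\pm}(k') \,]_{\langle \omega_\alpha', \omega_\beta\rangle^{\mp1}} := x_\alpha^{\pm}(k)\, x_\beta^{\pm}(k') - \langle \omega_\alpha', \omega_\beta\rangle^{\mp1}\, x_\beta^{\pm}(k')\, x_\alpha^{\pm}(k). \tag{3.7} \]
By Definition 3.6, the formula (D7) takes the convenient form
\[ [\, x_i^{\pm}(k), x_j^{\pm}(k'+1) \,]_{\langle i,j\rangle^{\mp1}} = -\big( \langle j,i\rangle \langle i,j\rangle^{-1} \big)^{\pm\frac12}\, [\, x_j^{\pm}(k'), x_i^{\pm}(k+1) \,]_{\langle j,i\rangle^{\mp1}}. \tag{3.8} \]
By (3.6), the $(r,s)$-Serre relations (D9$_2$) & (D9$_3$) for $m_1 = m_2$ in the case of $a_{ij} = -1$ can be reformulated as:
\begin{align}
[\, x_i^{\pm}(m), x_i^{\pm}(m), x_j^{\pm}(k) \,]_{(r^{\pm1},\, s^{\pm1})} &= 0, \quad \text{for } 1 \le i < j < n, \notag \\
[\, x_i^{\pm}(m), x_i^{\pm}(m), x_j^{\pm}(k) \,]_{(s^{\mp1},\, r^{\mp1})} &= 0, \quad \text{for } 1 \le j < i < n. \tag{3.9}
\end{align}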
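For the reader's convenience, the passage from (D7) to (3.8) can be sketched as a short computation. The abbreviations $c := \langle j,i\rangle^{\pm1}$, $d := \langle i,j\rangle^{\pm1}$, $X := x_i^{\pm}(k+1)$, $Y := x_j^{\pm}(k')$, $X' := x_i^{\pm}(k)$, $Y' := x_j^{\pm}(k'+1)$ are introduced here only as shorthand and are not notation from the paper.

```latex
% (D7) in the shorthand above:
%   X Y - c\, Y X = -(c d^{-1})^{1/2} \bigl( Y' X' - d\, X' Y' \bigr).
\begin{align*}
X Y - c\, Y X &= -c \bigl( Y X - c^{-1} X Y \bigr) = -c\, [\, Y, X \,]_{c^{-1}}, \\
-(c d^{-1})^{1/2} \bigl( Y' X' - d\, X' Y' \bigr)
  &= (c d^{-1})^{1/2}\, d\, \bigl( X' Y' - d^{-1}\, Y' X' \bigr)
   = (c d^{-1})^{1/2}\, d\, [\, X', Y' \,]_{d^{-1}} .
\end{align*}
% Equating the two sides and dividing by (c d^{-1})^{1/2} d gives
\[
[\, X', Y' \,]_{d^{-1}}
  = -\, c\, d^{-1} (c d^{-1})^{-1/2}\, [\, Y, X \,]_{c^{-1}}
  = -\, (c d^{-1})^{1/2}\, [\, Y, X \,]_{c^{-1}},
\]
% which is (3.8), since d^{-1} = <i,j>^{\mp 1}, c^{-1} = <j,i>^{\mp 1}
% and (c d^{-1})^{1/2} = ( <j,i> <i,j>^{-1} )^{\pm 1/2}.
```

The only ingredients are the definition $[a,b]_q = ab - q\,ba$ and the invertibility of the scalars $c$ and $d$.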

Remark 3.7. (1) For any nonsimple root $\alpha$ ($\ne \alpha_i$) ($i \in I$), the meaning of the notation $x_\alpha^{+}(k)$ (resp. $x_\alpha^{-}(k)$) in Definition 3.6 is somewhat ambiguous, as is well known even for the quantum "classical" root vectors $x_\alpha^{+}(0)$, which admit different linearly independent choices. However, the combinatorial approach to Lyndon words, together with the "affine" quantum Lie bracket, will give us a valid and specific choice for $x_\alpha^{+}(k)$, which leads to a construction of the quantum "affine" Lyndon basis for $\mathcal U_{r,s}(\widetilde{\mathfrak n})$; applying $\tau$ to it yields the corresponding construction of the quantum "affine" Lyndon basis for $\mathcal U_{r,s}(\widetilde{\mathfrak n}^-)$ (see Proposition 3.10 & Theorem 3.11 below).

(2) In fact, (3.8) describes a kind of consistency constraint on quantum affine root vectors defined by Lyndon words of different levels (say, $x_j^{\pm}(k)$ has level $k$), which obeys the defining rule of the Lyndon basis (see below) via Lyndon words as in the classical types, since from (3.8) we get the level-shifting formula
\[ [\, x_i^{\pm}(k), x_j^{\pm}(k'+1) \,]_{\langle i,j\rangle^{\mp1}} = \langle i,i\rangle^{\mp\frac{a_{ij}}{2}} \Big( [\, x_i^{\pm}(k+1), x_j^{\pm}(k') \,]_{\langle i,j\rangle^{\mp1}} + \big( \langle i,j\rangle^{\mp1} - \langle j,i\rangle^{\pm1} \big)\, x_j^{\pm}(k')\, x_i^{\pm}(k+1) \Big). \tag{3.10} \]
Based on this formula, it is reasonable to define the quantum affine root vector $x_\alpha^{\pm}(k)$ as in (3.14) & (3.15) below, so that the level $k$ is concentrated entirely on the component of lowest index among the ordered constituents of the Lyndon basis. This will be clear from the proof of Proposition 3.10.

(3) Let $\mathcal U_{r,s}(\mathfrak n)$ denote the subalgebra of $\mathcal U_{r,s}(\widetilde{\mathfrak n})$ generated by $x_i^{+}(0)$ ($i \in I$). By definition, it is clear that $\mathcal U_{r,s}(\mathfrak n) \cong U_{r,s}(\mathfrak n)$, the subalgebra of $U_{r,s}(\mathfrak{sl}_n)$ generated by $e_i$ ($i \in I$) (see [BGH1, Remarks (2), p. 391]).

Now let us recall the construction of a Lyndon basis. The natural ordering $<$ in $I$ gives a total ordering of the alphabet $A = \{ x_1^{+}(0), \cdots, x_{n-1}^{+}(0) \}$. Let $A^*$ be the set of all words in the alphabet $A$ (including the vacuum $1$) and let $u < v$ denote that the word $u$ is lexicographically smaller than the word $v$. Recall that a word $\ell \in A^*$ is a Lyndon word if it is lexicographically smaller than all its proper right factors (cf. [LR,R2,BH]). Let $\mathbb K[A^*]$ be the associative algebra of $\mathbb K$-linear combinations of words in $A^*$ whose product is juxtaposition, namely, a free $\mathbb K$-algebra. Let $J$ be the $(r,s)$-Serre ideal of $\mathbb K[A^*]$ generated by the elements $\{ (\mathrm{ad}\, x_i^{+}(0))^{1-a_{ij}} (x_j^{+}(0)) \mid 1 \le i \ne j \le n-1 \}$. Clearly, $\mathcal U_{r,s}(\mathfrak n) = \mathbb K[A^*]/J$. Now we introduce another ordering $\preceq$ in $A^*$, using the usual length function $|\cdot|$ on words $u \in A^*$: we say $u \preceq w$ if $|u| < |w|$, or $|u| = |w|$ and $u \ge w$. We call a (Lyndon) word good with respect to the $(r,s)$-Serre ideal $J$ if it cannot be written as a sum of strictly smaller words modulo $J$ with respect to the ordering $\preceq$. By [R2], the set of quantum Lie brackets (or say, $q$-bracketings) of all good Lyndon words forms a system of quantum root vectors of $\mathcal U_{r,s}(\mathfrak n)$. More precisely, we have the following construction for any quantum root vector $x_\alpha^{+}(0)$ with $\alpha \in \dot\Delta^+$ (a positive root system of $\mathfrak{sl}_n$).
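Take a corresponding ordering (compatible with the natural ordering $<$ on $I$) of $\dot\Delta^+ = \{ \alpha_{ij} := \alpha_i + \alpha_{i+1} + \cdots + \alpha_{j-1} = \varepsilon_i - \varepsilon_j \mid 1 \le i < j \le n \}$ with $\alpha_{i,i+1} = \alpha_i$ as follows (see [H, p. 533]):
\[ \alpha_{12},\ \alpha_{13},\ \alpha_{14},\ \cdots,\ \alpha_{1n},\ \alpha_{23},\ \alpha_{24},\ \cdots,\ \alpha_{2n},\ \cdots,\ \alpha_{n-1,n}, \tag{3.11} \]
which is a convex ordering on $\dot\Delta^+$ (for the definition, see [R2, Sect. 6]). Hence, for each $\alpha = \alpha_{ij} \in \dot\Delta^+$, by [R2], we can construct the quantum root vector $x_\alpha^{+}(0)$ as an $(r,s)$-bracketing of a good Lyndon word in the inductive fashion:
\begin{align}
x_{\alpha_{ij}}^{+}(0) :&= [\, x_{\alpha_{i,j-1}}^{+}(0), x_{j-1}^{+}(0) \,]_{\langle \omega_{\alpha_{i,j-1}}', \omega_{j-1}\rangle^{-1}} \notag \\
&= [\, \cdots [\, x_i^{+}(0), x_{i+1}^{+}(0) \,]_{\langle i, i+1\rangle^{-1}}, \cdots, x_{j-1}^{+}(0) \,]_{\langle \omega_{\alpha_{i,j-1}}', \omega_{j-1}\rangle^{-1}} \notag \\
&= [\, \cdots [\, x_i^{+}(0), x_{i+1}^{+}(0) \,]_r, \cdots, x_{j-1}^{+}(0) \,]_r. \tag{3.12}
\end{align}
Applying $\tau$ to (3.12), we obtain the definition of the quantum root vector $x_{\alpha_{ij}}^{-}(0)$ as below:
\[ x_{\alpha_{ij}}^{-}(0) = \tau\big( x_{\alpha_{ij}}^{+}(0) \big) = [\, x_{j-1}^{-}(0), \cdots, [\, x_{i+1}^{-}(0), x_i^{-}(0) \,]_s \cdots \,]_s. \tag{3.13} \]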

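Theorem 3.8. (i) The set
\[ \Big\{\, x_{\alpha_{n-1,n}}^{+}(0)^{\ell_{n-1,n}} \cdots x_{\alpha_{23}}^{+}(0)^{\ell_{23}}\, x_{\alpha_{1n}}^{+}(0)^{\ell_{1n}} \cdots x_{\alpha_{13}}^{+}(0)^{\ell_{13}}\, x_{\alpha_{12}}^{+}(0)^{\ell_{12}} \;\Big|\; \ell_{ij} \ge 0 \,\Big\} \]
is a Lyndon basis of $\mathcal U_{r,s}(\mathfrak n)$.
(ii) The set
\[ \Big\{\, x_{\alpha_{12}}^{-}(0)^{\ell_{12}}\, x_{\alpha_{13}}^{-}(0)^{\ell_{13}} \cdots x_{\alpha_{1n}}^{-}(0)^{\ell_{1n}}\, x_{\alpha_{23}}^{-}(0)^{\ell_{23}} \cdots x_{\alpha_{n-1,n}}^{-}(0)^{\ell_{n-1,n}} \;\Big|\; \ell_{ij} \ge 0 \,\Big\} \]
is a Lyndon basis of $\mathcal U_{r,s}(\mathfrak n^-)$.

Definition 3.9. For $\alpha_{ij} \in \dot\Delta^+$, we define the quantum affine root vectors $x_{\alpha_{ij}}^{\pm}(k)$ of nontrivial level $k$ by
\begin{align}
x_{\alpha_{ij}}^{+}(k) &:= [\, \cdots [\, x_i^{+}(k), x_{i+1}^{+}(0) \,]_r, \cdots, x_{j-1}^{+}(0) \,]_r, \tag{3.14} \\
x_{\alpha_{ij}}^{-}(k) &:= [\, x_{j-1}^{-}(0), \cdots, [\, x_{i+1}^{-}(0), x_i^{-}(k) \,]_s \cdots \,]_s, \tag{3.15}
\end{align}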

where τ xα±i j (±k) = xα∓i j (∓k). (k) For each fixed α ∈ Q˙ + , let us denote by Ur,s ( n ) the subspace of Ur,s ( n )α , consistα (k) ˙+ n )α = k∈Z Ur,s ( n )α . When α = αi ∈ ing of elements of level k. Hence, Ur,s ( (k) is a simple root, by definition, dim Ur,s ( n )αi = 1 for any level k. However, for any (k) nonsimple root α = αi (i ∈ I ), dim Ur,s ( n )α = ∞ for any level k. In this case, given a ˙ + , we call a tuple (β j1 , · · · , β jν ) (ν ≥ 1) a partition of root positive root α = αi j ∈ αi j if β j1 < · · · < β jν in the ordering given in (3.11) such that β j1 + · · · + β jν = αi j . If ν > 1, we say this partition is proper. Denote by P◦ (α) the set of all proper partitions of (k ) (k ) (k) root α. Obviously, we have Ur,sν ( n )β jν · · · Ur,s1 ( n )β j1 ⊆ Ur,s ( n )α if k1 + · · · + kν = k. Now we write (kν ) (k1 ) (k) (k) n ) := Ur,s ( n )β jν · · · Ur,s ( n )β j1 ⊆ Ur,s ( n )α α ( (β j ,··· ,β jν )∈P ◦ (α) 1 k1 +···kν =k


(k)

for the subspace of Ur,s ( n )α spanned by basis elements’ products of level k from those proper partitions pertaining to α. Using the Q-antiautomorphism τ on (−k) ( n ), we get α n− ) := τ (−k) ( n) . (k) α ( α (k) ± ˙ + , whose proof shows Then we have the following description on Ur,s ( n )α for α ∈ that Definition 3.9 makes sense.

˙ + (a positive root system of sln ), we Proposition 3.10. For 1 ≤ i < j ≤ n and αi j ∈ have (k) (k) (i) Ur,s ( n )αi j = Kxα+i j (k) αi j ( n ), (k) − (k) (ii) Ur,s ( n )αi j = Kxα−i j (k) αi j ( n− ). Proof. (i) We will use an induction on rank n, where n ≥ 2. Assume that i < j and k > 0, then by (3.10), we have a + + − 2i j + + x = i, i

(k+1), x (k −1) xi (k), x j (k ) i j i, j −1 i, j −1 + i, j −1 − j, i x +j (k −1) xi+ (k+1) , (3.16)

xi+ (k), x +j (−k )

ai j = i, i 2 xi+ (k−1), x +j (−k +1) i, j −1 + j, i − i, j −1 x +j (−k ) xi+ (k).

i, j −1

(3.17)

When n = 2, for any k ∈ N, repeatedly using (3.16) & (3.17), we get + k x1 (k), x2+ (k ) r = 1, 1 2 xα+13 (k+k )

+

k

1, 1

k −t+1 2

(r −s) x2+ (t−1) x1+ (k+k −t+1)

t=1 k

x1+ (k), x2+ (−k )

r

) ( n ), ≡ 1, 1 2 xα+13 (k+k ) mod α(k+k 13 k

= 1, 1 − 2 xα+13 (k−k )

+

k

1, 1 −

k −t 2

(s−r ) x2+ (−t) x1+ (k−k +t)

t=1 k

) ( n ), ≡ 1, 1 − 2 xα+13 (k−k ) mod (k−k α13

which means that in both cases, we have + k ) ( n ), for any k ∈ Z. x1 (k), x2+ (k ) r ≡ 1, 1 2 xα+13 (k+k ) mod α(k+k 13 Therefore, in rank 2 case, any elements for x2+(k )x1+ (k)) of degree α13 generated (except + + + by x1 (k) and x2 (k ) are of the form: x1 (k), x2+ (k ) a for any a ∈ K; however, + x1 (k), x2+ (k ) a = x1+ (k), x2+ (k ) r + (r − a) x2+ (k ) x1+ (k) k

) ( n ). ≡ (r s −1 ) 2 xα+13 (k+k ) mod α(k+k 13

(3.18)


This fact shows that (k) Ur,s ( n )α13 = Kxα+13 (k)

(k) n) α13 (

(k) − as vector spaces. Dually, we also have Ur,s ( n )α13 = Kxα−13 (k) (k) n− ) as vector α13 ( spaces. Now we assume that we have proved the results for rank < n, that is, for those αi j with 1 ≤ i < j < n. For the rank n case, owing to the ordering given in (3.11), we are (k) ± left to prove the remaining cases: Ur,s ( n )αin with 1 ≤ i < j = n. In view of the same observation as (3.18), we need only consider the following ele+ (k ) for 1 ≤ i < ments of degree αin and level k + k generated by xα+i,n−1 (k) and xn−1 + (k ) + (k ) . By definition (see = xα+i,n−1 (k), xn−1 n: xα+i,n−1 (k), xn−1 −1 ωα

i,n−1

, ωn−1

r

(3.14)) and using (3.4), (3.5) & (3.1), we have

+ + + xα+i,n−1 (k), xn−1 (k ) = xα+i,n−2 (k), xn−2 (0) , xn−1 (k ) (using (3.4)) r r r + + = xα+i,n−2 (k), xn−2 (0), xn−1 (k ) r r + + + xαi,n−2 (k), xn−1 (k ) , xn−2 (0) +r (2nd term = 0 by (3.5) & (D91 )) + + = xα+i,n−2 (k), xn−2 (0), xn−1 (k ) r (using (3.18): rank 2 case) r −1 k2 + + + xαi,n−2 (k), xn−2 (k ) , xn−1 (0) = (r s ) r r +

(using the inductive hypothesis) + + ∗t (r − s) xα+i,n−2 (k), xn−1 (t) xn−2 (k −t)

r

t

(using (3.1))

) + + mod α(k+k xα+i,n−1 (k+k ), xn−1 (0) ( n ), x (0) n−1 i,n−1 r r (by definition) + + ∗t (r − s) xn−1 (t) xα+i,n−2 (k), xn−2 (k −t) + r t (using the inductive hypothesis)

≡ (r s −1 )

k (n−1−i) 2

k (n−1−i)

) ( n) ≡ (r s −1 ) 2 xα+in (k+k ) mod α(k+k in −t) + + + ∗t (r − s) xn−1 (t) xα+i,n−1 (k+k −t) mod xn−1 (t) α(k+k ( n) i,n−1 t

≡ (r s −1 )

k (n−1−i) 2

) xα+in (k+k ) mod α(k+k ( n ), in


where in the 1st “≡”, we used the following fact: + + + + xα+i,n−2 (k), xn−1 (t) xn−2 (k −t) = xn−1 (t) xα+i,n−2 (k), xn−2 (k −t) r r + + + + xαi,n−2 (k), xn−1 (t) xn−2 (k −t) (2nd term = 0 by (3.5) & (D91 )) + + (k −t) ; = xn−1 (t) xα+i,n−2 (k), xn−2 r

while in the 2nd “≡”, we used the facts: ) ) + α(k+k ( n ), xn−1 (0) ⊆ α(k+k ( n ), i,n−1 in −t) + xn−1 (t) α(k+k ( n) i,n−1

r

) ⊆ α(k+k ( n ). in (k+k )

The latter is clear, due to the definition of αin ( n ). As for the first inclusion, we have the following argument provided that we notice the basis elements’ constituents of (k+k ) αi,n−1 ( n ). Indeed, for any basis element xα+ν ,n−1 (kν ) xα+

ν−1 ,ν

) (kν−1 ) · · · xα+i, (k1 ) ∈ (k+k n) αi,n−1 ( 1

of level k + k pertaining to a partition of αi,n−1 , using (3.2), we have + xα+ν ,n−1 (kν ) xα+ ,ν (kν−1 ) · · · xα+i, (k1 ), xn−1 (0) ν−1 1 r + + + + = xαν ,n−1 (kν ), xn−1 (0) xα ,ν (kν−1 ) · · · xαi, (k1 ) (by definition) ν−1 1 r + (0) + xα+ν ,n−1 (kν ) xα+ ,ν (kν−1 ) · · · xα+i, (k1 ), xn−1 1 ν−1 (2nd term = 0 by (3.5) & (D91 ) since 1 < · · · < ν < n−1) = xα+ν ,n (kν ) xα+

ν−1 ,ν

(kν−1 ) · · · xα+i, (k1 ) 1

) ( n ), ∈ α(k+k in

here k1 + · · · + kν = k + k . Up to now, we have finished the proof of (i). Using τ to (i), we can get the second statement (ii). The argument above (in fact used the so-called quantum calculations) implies the important conclusions about the quantum affine Lyndon basis we present below. Theorem 3.11. (i) The set (i) (i) (i) n−1,n + + 1n + 12 (i) xαn−1,n (i) xα1n (i) xα12 (i) ··· ··· st ≥ 0 i∈Z

i∈Z

i∈Z

(i)

is an “affine” Lyndon basis of Ur,s ( n ), where each index set Iαst = {i ∈ Z | st = 0} is finite.


(ii) The set

i∈Z

(i)

xα−12 (i)12

···

i∈Z

(i)

xα−1n (i)1n

···

i∈Z

(i)

xα−n−1,n (i)n−1,n

(i) st ≥ 0

is an "affine" Lyndon basis of $\mathcal U_{r,s}(\widetilde{\mathfrak n}^-)$, where each index set $I_{\alpha_{st}} = \{\, i \in \mathbb Z \mid \ell_{st}^{(i)} \ne 0 \,\}$ is finite.

3.3. The following main theorem establishes the Drinfel'd isomorphism between the two-parameter quantum affine algebra $U_{r,s}(\widehat{\mathfrak{sl}_n})$ (in Definition 2.1) and the $(r,s)$-analogue of Drinfel'd quantum affinization of $U_{r,s}(\mathfrak{sl}_n)$ (in Definition 3.1), which affords the two-parameter Drinfel'd realization of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ as required.

Theorem 3.12 (Drinfel'd Isomorphism). For the Lie algebra $\mathfrak{sl}_n$ with $n > 2$, let $\theta = \alpha_{1n}$ be the maximal positive root. Then there exists an algebra isomorphism $\Psi : U_{r,s}(\widehat{\mathfrak{sl}_n}) \longrightarrow \mathcal U_{r,s}(\widehat{\mathfrak{sl}_n})$ defined by: for each $i \in I$,
\begin{align*}
\omega_i &\longmapsto \omega_i, & \omega_i' &\longmapsto \omega_i', \\
\omega_0 &\longmapsto \gamma'^{-1} \omega_\theta^{-1}, & \omega_0' &\longmapsto \gamma^{-1} \omega_\theta'^{-1}, \\
\gamma^{\pm\frac12} &\longmapsto \gamma^{\pm\frac12}, & \gamma'^{\pm\frac12} &\longmapsto \gamma'^{\pm\frac12}, \\
D^{\pm1} &\longmapsto D^{\pm1}, & D'^{\pm1} &\longmapsto D'^{\pm1}, \\
e_i &\longmapsto x_i^{+}(0), & f_i &\longmapsto x_i^{-}(0), \\
e_0 &\longmapsto x_{\alpha_{1n}}^{-}(1) \cdot (\gamma'^{-1} \omega_\theta^{-1}) = x_\theta^{-}(1) \cdot (\gamma'^{-1} \omega_\theta^{-1}), \\
f_0 &\longmapsto (\gamma^{-1} \omega_\theta'^{-1}) \cdot x_{\alpha_{1n}}^{+}(-1) = \tau\big( x_{\alpha_{1n}}^{-}(1) \cdot (\gamma'^{-1} \omega_\theta^{-1}) \big),
\end{align*}
where $\omega_\theta = \omega_1 \cdots \omega_{n-1}$, $\omega_\theta' = \omega_1' \cdots \omega_{n-1}'$.

Since Lusztig's symmetry of the braid group is no longer available in the two-parameter cases when the rank of $\mathfrak g$ is bigger than 2 (see [BGH1, Sect. 3]), Beck's approach (using the extended braid group actions (see [B2]) to prove the Drinfel'd Isomorphism Theorem) is not valid for the two-parameter cases here. Our treatment in the next section develops a valid algorithm based on the quantum calculations, which, as the reader has seen, is also a successful application of the combinatorial approach to the quantum "affine" Lyndon basis (based on the Drinfel'd generators) introduced above. In some sense, our method also provides a new combinatorial proof via the quantum "affine" Lyndon basis even in the one-parameter setting.


4. Proof of the Drinfel’d Isomorphism Theorem 4.1. Let E i , Fi , ωi , ωi denote the images of ei , f i , ωi , ωi (i ∈ I0 ) in the algebra n ) under the mapping , respectively. Ur,s (sl n ) the subalgebra of Ur,s (sl n ) generated by E i , Fi , ω±1 , ω±1 Denote by U r,s (sl i i 1

1

(i ∈ I0 ), γ ± 2 , γ ± 2 , D ±1 and D ±1 , that is, " ! n ) := E i , Fi , ω±1 , ω ±1 , γ ± 21 , γ ± 21 , D ±1 , D ±1 i ∈ I0 . U r,s (sl i i

Thereby, to prove the Drinfel’d Isomorphism Theorem (Theorem 3.12) is equivalent to prove the following three theorems: n ) −→ U r,s (sl n ) is an epimorphism. Theorem 4.1. : Ur,s (sl n ) = Ur,s (sl n ). Theorem 4.2. U r,s (sl n ) −→ Ur,s (sl n ) is injective. Theorem 4.3. : Ur,s (sl 4.2. Proof of Theorem 4.1. We shall check that the elements E i , Fi , ωi , ωi (i ∈ 1 1 n ). I0 ), γ ± 2 , γ ± 2 , D ±1 , D ±1 satisfy the defining relations of (A1)–(A7) of Ur,s (sl First of all, the defining relations of Ur,s (sln ) imply that E i , Fi , ωi , ωi (i ∈ I ) generate n ), which is isomorphic to Ur,s (sln ). So we are left to a subalgebra Ur,s (sln ) of Ur,s (sl check the relations involving the index i = 0. n ). Obviously, the relations of (A1) hold, according to the defining relations of Ur,s (sl For (A2): we just check the following three relations involving i = 0, the remaining relations in (A2) are parallel to check. Using (D4), we get D E 0 D −1 = D xθ− (1) D −1 · (γ −1 ωθ−1 ) = r E 0 . For 0 ≤ j < n, noting that ωθ−1 , ω j = γ −1 ωθ−1 , ω j = ω0 , ω j (by Proposition 2.4), we have − −1 −1 ω j E 0 ω−1 ωθ ) ω−1 j = ω j x θ (1) (γ j = ωn−1 , ω j −1 · · · ω1 , ω j −1 E 0 = ω0 , ω j E 0 .

For 0 ≤ i < n, ω0 E i ω0−1 = (γ −1 ωθ−1 ) E i (γ ωθ ) = ωθ−1 E i ωθ , when i = 0, since ωi , ωθ −1 = ωi , ω0 (by Proposition 2.4), we obtain ωθ−1 E i ωθ = ωθ−1 xi+ (0) ωθ = ωi , ωθ −1 xi+ (0) = ωi , ω0 E i ; and when i = 0, since ωθ , ωθ−1 −1 = ω0 , ω0 (by Proposition 2.4), we have ωθ−1 E 0 ωθ = ωθ−1 xθ− (1) (γ −1 ωθ−1 ) ωθ = ω0 , ω0 E 0 . Similarly, one can verify the relations in (A3).


For (A4): first of all, when i = 0, we see that [ E 0 , Fi ] = xθ− (1) · (γ −1 ωθ−1 ), xi− (0) = − xi− (0), xθ− (1) ω , ω (γ −1 ωθ−1 ). θ

i

n ), we claim the folAccording to the corresponding cross relations held in Ur,s (sl lowing crucial lemma, whose proof using the typical quantum calculations is technical. Lemma 4.4. xi− (0), xθ− (1) ω , ω = 0, for i ∈ I . i

θ

Proof. (I) When i = 1, ω1 , ω0 = ω2 , ω1 = s, and ω1 , ωθ = s −1 . By (3.8) & (3.9), we have − − − − − (by (3.8)) x1 (0), xα13 (1) s −1 = x1 (0), x2 (0), x1 (1) 2,1

s −1 1 = −( 1, 2 −1 2, 1 ) 2 x1− (0), x1− (0), x2− (1) 1,2 −1 s 1 − − − 2 x1 (0), x1 (0), x2 (1) (r −1 ,s −1 ) = 0. = −(r s) (by (3.9)) Hence, repeatedly using (3.3), we have − − − − − x1 (0), xα1n (1) s −1 = x (0), xn−1 (0) , xα1,n−1 (1) (= 0 by (D91 )) 1 − (0), x1− (0), xα−1,n−1 (1) −1 (by (3.3)) + xn−1 s s − = xn−1 (0), x1− (0), xα−1,n−1 (1) −1 s s = ··· =

(inductively using (3.3) & (D91 ))

− xn−1 (0),

− xn−2 (0), · · ·

,

x1− (0),

xα−13 (1) s −1

··· (s,··· ,s)

= 0. , ω0 = r −1 , that is, ωn−1 , ωθ = r . By (3.3), (3.9) & (II) When i = n − 1, ωn−1 (D91 ), we have − xn−1 (0), xα−1n (1) r (by definition) ⎤ ⎡ − − − ⎦ (0), xn−1 (0), xn−2 (0), xα−1,n−2 (1) (using (3.3)) = ⎣ xn−1 s s r − − − (this term using (3.3)) = xn−1 (0), xn−1 (0), xn−2 (0) s , xα−1,n−2 (1) s r − − − − + s xn−1 (0), xn−2 (0), xn−1 (0), xα1,n−2 (1) (= 0 by (3.5), (D91 )) r


=

− − − xn−1 (0), xn−1 (0), xn−2 (0) (s,r ) ,

+r

− − xn−1 (0), xn−2 (0)

475

xα−1,n−2 (1)

(this term= 0 by (3.9)) s

− , xn−1 (0), xα−1,n−2 (1)

s

= 0.

(= 0 by (3.5), (D91 )) r −1 s
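The regrouping steps in (I) and (II) all have the same shape. As a sanity check, the expansion below verifies a general bracket identity of that shape, with $[a,b]_u := ab - u\,ba$. That this is exactly the identity labelled (3.3) in the text is an assumption here; the symbols $a, b, c$ and the free parameter $q$ are introduced for illustration only.

```latex
% Assumed form of the regrouping identity (q is a free nonzero scalar):
\[
[a,[b,c]_u]_v \;=\; [[a,b]_q,\,c]_{uv/q} \;+\; q\,[b,[a,c]_{v/q}]_{u/q}.
\]
% Left-hand side, expanded with [x,y]_t = xy - t yx:
\[
[a,[b,c]_u]_v = abc - u\,acb - v\,bca + uv\,cba .
\]
% Right-hand side, term by term:
\begin{align*}
[[a,b]_q,\,c]_{uv/q} &= abc - q\,bac - \tfrac{uv}{q}\,cab + uv\,cba,\\
q\,[b,[a,c]_{v/q}]_{u/q} &= q\,bac - v\,bca - u\,acb + \tfrac{uv}{q}\,cab .
\end{align*}
% The q-dependent terms cancel in the sum, which gives back the left-hand side.
```

With $u = s$, $v = s^{-1}$, $q = 1$ this is the splitting used in (I); with $u = v = q = s$ it is the first splitting used in (II), and with $u = s$, $v = q = r$ the second.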

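(III) When $1 < i < n-1$, $\langle \omega_i', \omega_0\rangle = 1$, that is, $\langle \omega_i', \omega_\theta\rangle = 1$. In order to derive the required result, we first need to make two claims below:

Claim (A). $\bigl[x_i^-(0), x_{\alpha_{1,i+1}}^-(1)\bigr]_{\langle \omega_i', \omega_{\alpha_{1,i+1}}\rangle} = \bigl[x_i^-(0), x_{\alpha_{1,i+1}}^-(1)\bigr]_r = 0$, for $i \ge 2$.

In fact, by (3.3), (3.9) & (D9$_1$), we have
\[
\begin{aligned}
\bigl[x_i^-(0), x_{\alpha_{1,i+1}}^-(1)\bigr]_r
&= \bigl[x_i^-(0), [x_i^-(0), [x_{i-1}^-(0), x_{\alpha_{1,i-1}}^-(1)]_s]_s\bigr]_r && \text{(by definition)}\\
&= \bigl[x_i^-(0), [[x_i^-(0), x_{i-1}^-(0)]_s,\, x_{\alpha_{1,i-1}}^-(1)]_s\bigr]_r && \text{(using (3.3))}\\
&\quad + s\,\bigl[x_i^-(0), [x_{i-1}^-(0), [x_i^-(0), x_{\alpha_{1,i-1}}^-(1)]]\bigr]_r && (= 0 \text{ by (3.5), (D9}_1\text{))}\\
&= \bigl[[x_i^-(0), x_i^-(0), x_{i-1}^-(0)]_{(s,\,r)},\, x_{\alpha_{1,i-1}}^-(1)\bigr]_s && (= 0 \text{ by (3.9))}\\
&\quad + r\,\bigl[[x_i^-(0), x_{i-1}^-(0)]_s,\, [x_i^-(0), x_{\alpha_{1,i-1}}^-(1)]\bigr]_{r^{-1}s} && (= 0 \text{ by (3.5), (D9}_1\text{))}\\
&= 0 .
\end{aligned}
\]
Claim (B). $\bigl[x_i^-(0), x_{\alpha_{1,i+2}}^-(1)\bigr]_{\langle \omega_i', \omega_{\alpha_{1,i+2}}\rangle} = \bigl[x_i^-(0), x_{\alpha_{1,i+2}}^-(1)\bigr] = 0$ ($i \ge 2$), if $r \ne -s$.

By definition, we note that $[\,b, a\,]_u = -u\,[\,a, b\,]_{u^{-1}}$. So, we get
\[
\begin{aligned}
\bigl[x_i^-(0), x_{i+1}^-(0), x_i^-(0)\bigr]_{(s,\,r^{-1})}
&= -s\,\bigl[x_i^-(0), x_i^-(0), x_{i+1}^-(0)\bigr]_{(s^{-1},\,r^{-1})} && \text{(by (3.6))}\\
&= -s\,\bigl[x_i^-(0), x_i^-(0), x_{i+1}^-(0)\bigr]_{(r^{-1},\,s^{-1})} && \text{(by (3.9))}\\
&= 0 .
\end{aligned}
\]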
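The two bracket manipulations just used can be checked by direct expansion. The sketch below assumes the conventions $[a,b]_u := ab - u\,ba$ and $[a,a,b]_{(u,v)} := [a,[a,b]_u]_v$; the letters $a, b, u, v$ are placeholders for illustration.

```latex
% Swapping the two entries of a u-bracket:
\[
[b,a]_u = ba - u\,ab = -u\bigl(ab - u^{-1}ba\bigr) = -u\,[a,b]_{u^{-1}} .
\]
% The doubly nested bracket is symmetric in its two parameters:
\[
[a,[a,b]_u]_v = a^2 b - (u+v)\,aba + uv\,ba^2 = [a,[a,b]_v]_u ,
\]
% so (s^{-1}, r^{-1}) may be replaced by (r^{-1}, s^{-1}), which puts the
% expression in the form of the (r,s)-Serre relation for x_i^-(0), x_{i+1}^-(0).
```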


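We then consider the following deduction:
\[
\begin{aligned}
\bigl[x_i^-(0), x_{\alpha_{1,i+2}}^-(1)\bigr]_{r^{-1}s}
&= \bigl[x_i^-(0), [x_{i+1}^-(0), [x_i^-(0), x_{\alpha_{1i}}^-(1)]_s]_s\bigr]_{r^{-1}s} && \text{(by (3.3))}\\
&= \bigl[x_i^-(0), [[x_{i+1}^-(0), x_i^-(0)]_s,\, x_{\alpha_{1i}}^-(1)]_s\bigr]_{r^{-1}s} && \text{(using (3.3))}\\
&\quad + s\,\bigl[x_i^-(0), [x_i^-(0), [x_{i+1}^-(0), x_{\alpha_{1i}}^-(1)]]\bigr]_{r^{-1}s} && (= 0 \text{ by (3.5), (D9}_1\text{))}\\
&= \bigl[[x_i^-(0), x_{i+1}^-(0), x_i^-(0)]_{(s,\,r^{-1})},\, x_{\alpha_{1i}}^-(1)\bigr]_{s^2} && (= 0 \text{ by the above})\\
&\quad + r^{-1}\bigl[[x_{i+1}^-(0), x_i^-(0)]_s,\, x_{\alpha_{1,i+1}}^-(1)\bigr]_{rs} && \text{(using (3.4))}\\
&= r^{-1}\Bigl(\bigl[x_{i+1}^-(0), [x_i^-(0), x_{\alpha_{1,i+1}}^-(1)]_r\bigr]_{s^2} + r\,\bigl[x_{\alpha_{1,i+2}}^-(1), x_i^-(0)\bigr]_{r^{-1}s}\Bigr)\\
&= \bigl[x_{\alpha_{1,i+2}}^-(1), x_i^-(0)\bigr]_{r^{-1}s} && \text{(1st term} = 0 \text{ by Claim (A))}.
\end{aligned}
\]
Expanding both sides of the above equation according to definition, we easily get
\[
(1 + r^{-1}s)\,\bigl[x_i^-(0), x_{\alpha_{1,i+2}}^-(1)\bigr] = 0 .
\]
Thus the required result is obtained under the assumption. Now applying (3.5), we can get
\[
[\,x_i^-(0), x_\theta^-(1)\,] = \bigl[x_{n-1}^-(0), \cdots, x_{i+2}^-(0), [x_i^-(0), x_{\alpha_{1,i+2}}^-(1)]\bigr]_{(s,\cdots,s)} = 0 \qquad \text{(by Claim (B))}.
\]
This completes the proof of Lemma 4.4.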
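The last step of Claim (B) is a short expansion, spelled out below as a sketch. The abbreviations $x := x_i^-(0)$, $X := x_{\alpha_{1,i+2}}^-(1)$ and $t := r^{-1}s$ are introduced here for readability only, and the general identity in the first display is an assumed form of the regrouping rule cited as (3.4).

```latex
% Assumed shape of the second regrouping identity (q free, nonzero):
\[
[[a,b]_u,\,c]_v = [a,[b,c]_q]_{uv/q} + q\,[[a,c]_{v/q},\,b]_{u/q},
\]
% both sides expanding to  abc - u\,bac - v\,cab + uv\,cba .
%
% The deduction ends with [x,X]_t = [X,x]_t.  Expanding both sides:
\begin{align*}
xX - t\,Xx &= Xx - t\,xX \\
\Longrightarrow\quad (1+t)\,(xX - Xx) &= 0 .
\end{align*}
% Since 1 + t = r^{-1}(r+s), the hypothesis r \neq -s gives [x, X] = 0.
```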

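Next, we turn to check the relation below, whose argument (using the quantum calculations) is crucial to our verification of the compatibility of the system of defining relations of $\mathcal{U}_{r,s}(\widehat{\mathfrak{sl}_n})$ mentioned in Remark 3.3.

Proposition 4.5. $[\,E_0, F_0\,] = \dfrac{\omega_0 - \omega_0'}{r - s}$.

Proof. Using (D1) & (D5), we have
\[
[\,E_0, F_0\,] = \bigl[\,x_{\alpha_{1n}}^-(1)\,\gamma'^{-1}\omega_\theta^{-1},\; \gamma^{-1}\omega_\theta'^{-1}\,x_{\alpha_{1n}}^+(-1)\,\bigr]
= \bigl[\,x_{\alpha_{1n}}^-(1),\, x_{\alpha_{1n}}^+(-1)\,\bigr]\cdot(\gamma^{-1}\gamma'^{-1}\omega_\theta^{-1}\omega_\theta'^{-1}). \tag{4.1}
\]
Note that for $j \ge 1$, we have
\[
\begin{aligned}
&\bigl[x_{j+1}^-(0), \omega_j'\bigr]_s = (r-s)\,\omega_j'\,x_{j+1}^-(0), \qquad \bigl[x_{j+1}^-(0), \omega_j\bigr]_s = 0, && (4.2)\\
&\bigl[\omega_j' X,\, x_{j+1}^+(k)\bigr]_r = \omega_j'\,\bigl[X,\, x_{j+1}^+(k)\bigr], && (4.3)\\
&\bigl[x_j^+(k),\, Y\omega_{j+1}\bigr]_r = \bigl[x_j^+(k),\, Y\bigr]\,\omega_{j+1}. && (4.4)
\end{aligned}
\]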
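To see what (4.2) is used for, here is a short worked computation leading to the identity numbered (4.5) in the text. It is a sketch under one stated assumption: that the level-zero case of (D8) reads $[x_j^-(0), x_j^+(0)] = (\omega_j' - \omega_j)/(r-s)$. It also uses that $[a, \cdot\,]_s$ is linear in its second argument.

```latex
% Assumption (level-zero case of (D8)):
%   [x_j^-(0), x_j^+(0)] = (\omega_j' - \omega_j)/(r-s).
\begin{align*}
\bigl[x_{j+1}^-(0),\,[x_j^-(0), x_j^+(0)]\bigr]_s
 &= \frac{1}{r-s}\Bigl(\bigl[x_{j+1}^-(0), \omega_j'\bigr]_s - \bigl[x_{j+1}^-(0), \omega_j\bigr]_s\Bigr)\\
 &= \frac{1}{r-s}\Bigl((r-s)\,\omega_j'\,x_{j+1}^-(0) - 0\Bigr) && \text{(both parts of (4.2))}\\
 &= \omega_j'\,x_{j+1}^-(0).
\end{align*}
```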


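So (4.2) implies that there hold
\[
\begin{aligned}
&\bigl[x_{j+1}^-(0), [x_j^-(0), x_j^+(0)]\bigr]_s = \omega_j'\,x_{j+1}^-(0), && (4.5)\\
&\bigl[x_2^-(0), [x_1^-(1), x_1^+(-1)]\bigr]_s = \gamma\,\omega_1'\,x_2^-(0), && (4.6)\\
&\bigl[[x_{j+1}^-(0), x_{j+1}^+(0)],\, x_j^-(0)\bigr]_s = -x_j^-(0)\,\omega_{j+1}. && (4.7)
\end{aligned}
\]
Now let us write briefly
\[
\bigl[x_1^+(-1), x_2^+(0), \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}
:= \bigl[\cdots\bigl[[x_1^+(-1), x_2^+(0)]_r, \cdots\bigr]_r,\, x_{i-1}^+(0)\bigr]_r .
\]
Thus, by (3.5), we have
\[
\begin{aligned}
\bigl[x_{\alpha_{1i}}^-(1), x_{\alpha_{1i}}^+(-1)\bigr]
&= \bigl[x_{\alpha_{1i}}^-(1),\, [x_1^+(-1), x_2^+(0), \cdots, x_{i-1}^+(0)]_{\langle r,\cdots,r\rangle}\bigr]\\
&= \bigl[[x_{\alpha_{1i}}^-(1), x_1^+(-1)],\, x_2^+(0), \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}\\
&\quad + \sum_{j=2}^{i-1}\bigl[x_1^+(-1), x_2^+(0), \cdots, [x_{\alpha_{1i}}^-(1), x_j^+(0)], \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}. && (4.8)
\end{aligned}
\]
(i) For $j = 1$, by (3.5), (D8) & (4.6), we have
\[
\bigl[x_{\alpha_{1i}}^-(1), x_1^+(-1)\bigr] = \bigl[x_{i-1}^-(0), \cdots, x_2^-(0), [x_1^-(1), x_1^+(-1)]\bigr]_{(s,\cdots,s)} = \gamma\,\omega_1'\,x_{\alpha_{2i}}^-(0) \qquad \text{(by (4.6))}, \quad (i > 2),
\]
so that
\[
\begin{aligned}
M(i) :&= \bigl[[x_{\alpha_{1i}}^-(1), x_1^+(-1)],\, x_2^+(0), \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}\\
&= \gamma\,\bigl[[\omega_1'\,x_{\alpha_{2i}}^-(0), x_2^+(0)]_r, \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}\\
&= \gamma\,\omega_1'\,\bigl[[x_{\alpha_{2i}}^-(0), x_2^+(0)], \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}\\
&= \gamma\,\omega_1'\,\bigl[[\omega_2'\,x_{\alpha_{3i}}^-(0), x_3^+(0)]_r, \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}\\
&= \gamma\,\omega_1'\omega_2'\,\bigl[[x_{\alpha_{3i}}^-(0), x_3^+(0)], \cdots, x_{i-1}^+(0)\bigr]_{\langle r,\cdots,r\rangle}\\
&= \cdots\\
&= \gamma\,\omega_1'\cdots\omega_{i-2}'\,\bigl[x_{i-1}^-(0), x_{i-1}^+(0)\bigr]
= \gamma\,\omega_1'\cdots\omega_{i-2}'\,\frac{\omega_{i-1}' - \omega_{i-1}}{r-s}, \qquad (i > 2),
\end{aligned}
\]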


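where we used the following identities, respectively:
\[
\begin{aligned}
\bigl[\omega_{j-1}'\,x_{\alpha_{ji}}^-(0),\, x_j^+(0)\bigr]_r &= \omega_{j-1}'\,\bigl[x_{\alpha_{ji}}^-(0),\, x_j^+(0)\bigr] && \text{(by (4.3))},\\
\bigl[x_{\alpha_{ji}}^-(0),\, x_j^+(0)\bigr] &= \omega_j'\,x_{\alpha_{j+1,i}}^-(0) && \text{(by (3.13) \& (4.5))}.
\end{aligned}
\]
(ii) For $j = i-1$, again by (3.5), (3.3) & (4.7), we get
\[
\begin{aligned}
\bigl[x_{\alpha_{1i}}^-(1), x_{i-1}^+(0)\bigr]
&= \bigl[[x_{i-1}^-(0), x_{i-1}^+(0)],\, [x_{i-2}^-(0), x_{\alpha_{1,i-2}}^-(1)]_s\bigr]_s && \text{(by (3.3))}\\
&= \bigl[[[x_{i-1}^-(0), x_{i-1}^+(0)],\, x_{i-2}^-(0)]_s,\, x_{\alpha_{1,i-2}}^-(1)\bigr]_s && \text{(by (4.7))}\\
&\quad + s\,\bigl[x_{i-2}^-(0),\, [[x_{i-1}^-(0), x_{i-1}^+(0)],\, x_{\alpha_{1,i-2}}^-(1)]\bigr] && (= 0)\\
&= -\bigl[x_{i-2}^-(0)\,\omega_{i-1},\, x_{\alpha_{1,i-2}}^-(1)\bigr]_s\\
&= -\bigl[x_{i-2}^-(0),\, x_{\alpha_{1,i-2}}^-(1)\bigr]_s\,\omega_{i-1}\\
&= -x_{\alpha_{1,i-1}}^-(1)\,\omega_{i-1},
\end{aligned}
\]
where we notice that (*): $\bigl[[x_{i-1}^-(0), x_{i-1}^+(0)],\, x_{\alpha_{1,i-2}}^-(1)\bigr] = 0$.

Thereby, we further obtain
\[
\begin{aligned}
N(i) :&= \bigl[x_1^+(-1), x_2^+(0), \cdots, [x_{\alpha_{1i}}^-(1), x_{i-1}^+(0)]\bigr]_{\langle r,\cdots,r\rangle}\\
&= -\bigl[x_{\alpha_{1,i-1}}^+(-1),\, x_{\alpha_{1,i-1}}^-(1)\,\omega_{i-1}\bigr]_r && \text{(by (4.4))}\\
&= -\bigl[x_{\alpha_{1,i-1}}^+(-1),\, x_{\alpha_{1,i-1}}^-(1)\bigr]\,\omega_{i-1}\\
&= \bigl[x_{\alpha_{1,i-1}}^-(1),\, x_{\alpha_{1,i-1}}^+(-1)\bigr]\,\omega_{i-1}.
\end{aligned}
\]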

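(iii) For $1 < j < i-1$, by (3.5), (3.3), (4.7) & (D9$_1$), we obtain
\[
\begin{aligned}
\bigl[x_{\alpha_{1i}}^-(1), x_j^+(0)\bigr]
&= \bigl[x_{i-1}^-(0), \cdots, [[x_j^-(0), x_j^+(0)],\, [x_{j-1}^-(0), x_{\alpha_{1,j-1}}^-(1)]_s]_s\bigr]_{(s,\cdots,s)} && \text{(by (3.3))}\\
&= \bigl[x_{i-1}^-(0), \cdots, [[[x_j^-(0), x_j^+(0)],\, x_{j-1}^-(0)]_s,\, x_{\alpha_{1,j-1}}^-(1)]_s\bigr]_{(s,\cdots,s)} && \text{(by (4.7))}\\
&\quad + s\,\bigl[x_{i-1}^-(0), \cdots, [x_{j-1}^-(0),\, [[x_j^-(0), x_j^+(0)],\, x_{\alpha_{1,j-1}}^-(1)]]\bigr]_{(s,\cdots,s)} && (= 0 \text{ by (*)})\\
&= -\bigl[x_{i-1}^-(0), \cdots, x_{j+1}^-(0),\, [x_{j-1}^-(0)\,\omega_j,\, x_{\alpha_{1,j-1}}^-(1)]_s\bigr]_{(s,\cdots,s)}\\
&= -\bigl[x_{i-1}^-(0), \cdots, x_{j+1}^-(0),\, x_{\alpha_{1j}}^-(1)\,\omega_j\bigr]_{(s,\cdots,s)}\\
&= -\bigl[x_{i-1}^-(0), \cdots, [x_{j+1}^-(0), x_{\alpha_{1j}}^-(1)]\,\omega_j\bigr]_{(s,\cdots,s)} && \text{(by (3.5), (D9}_1\text{))}\\
&= 0,
\end{aligned}
\]
where in the fourth and fifth equalities we used the following identities, respectively:
\[
\begin{aligned}
\bigl[x_{j-1}^-(0)\,\omega_j,\, x_{\alpha_{1,j-1}}^-(1)\bigr]_s &= \bigl[x_{j-1}^-(0),\, x_{\alpha_{1,j-1}}^-(1)\bigr]_s\,\omega_j = x_{\alpha_{1j}}^-(1)\,\omega_j,\\
\bigl[x_{j+1}^-(0),\, x_{\alpha_{1j}}^-(1)\,\omega_j\bigr]_s &= \bigl[x_{j+1}^-(0),\, x_{\alpha_{1j}}^-(1)\bigr]\,\omega_j .
\end{aligned}
\]
Here the outermost entry of each nested bracket is $x_{i-1}^-(0)$, in agreement with $x_{\alpha_{1i}}^-(1) = [x_{i-1}^-(0), \cdots, x_2^-(0), x_1^-(1)]_{(s,\cdots,s)}$.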

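As a result of (i), (ii) & (iii), (4.8) becomes
\[
\begin{aligned}
\bigl[x_{\alpha_{1i}}^-(1), x_{\alpha_{1i}}^+(-1)\bigr]
&= M(i) + N(i)\\
&= M(i) + \bigl[x_{\alpha_{1,i-1}}^-(1), x_{\alpha_{1,i-1}}^+(-1)\bigr]\,\omega_{i-1}\\
&= M(i) + M(i-1)\,\omega_{i-1} + \bigl[x_{\alpha_{1,i-2}}^-(1), x_{\alpha_{1,i-2}}^+(-1)\bigr]\,\omega_{i-2}\,\omega_{i-1}\\
&= \cdots\\
&= M(i) + M(i-1)\,\omega_{i-1} + M(i-2)\,\omega_{i-2}\,\omega_{i-1} + \cdots + M(3)\,\omega_3\cdots\omega_{i-1}\\
&\quad + \bigl[x_{\alpha_{12}}^-(1), x_{\alpha_{12}}^+(-1)\bigr]\,\omega_2\cdots\omega_{i-1}\\
&= \frac{\gamma\,\omega_{\alpha_{1i}}' - \gamma'\,\omega_{\alpha_{1i}}}{r-s}, \qquad (i > 1), && (4.9)
\end{aligned}
\]
where we used (D8) to get
\[
\bigl[x_{\alpha_{12}}^-(1), x_{\alpha_{12}}^+(-1)\bigr] = \frac{\gamma\,\omega_1' - \gamma'\,\omega_1}{r-s}.
\]
Therefore, by (4.9), (4.1) takes the required formula:
\[
[\,E_0, F_0\,] = \frac{\gamma'^{-1}\omega_\theta^{-1} - \gamma^{-1}\omega_\theta'^{-1}}{r-s}.
\]
The proof of Proposition 4.5 is complete.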
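The closed form in (4.9) comes from a telescoping sum, which can be made explicit as follows. This is a sketch: the abbreviations $P_k$ and $W_k$ are introduced here for illustration, and the computation assumes that the elements $\omega_a, \omega_b'$ commute with one another and that $\omega_{\alpha_{1i}} = \omega_1\cdots\omega_{i-1}$, $\omega_{\alpha_{1i}}' = \omega_1'\cdots\omega_{i-1}'$.

```latex
% Abbreviations (hypothetical, for this sketch only):
%   P_k := \omega_1' \cdots \omega_k'  (P_0 = 1),
%   W_k := \omega_k \omega_{k+1} \cdots \omega_{i-1}  (W_i = 1).
% From case (i):  M(k) = \gamma P_{k-2} (\omega_{k-1}' - \omega_{k-1}) / (r-s),  3 \le k \le i.
\begin{align*}
M(k)\,\omega_k\cdots\omega_{i-1}
  &= \frac{\gamma\,\bigl(P_{k-1}W_k - P_{k-2}W_{k-1}\bigr)}{r-s},\\[2pt]
\sum_{k=3}^{i} M(k)\,\omega_k\cdots\omega_{i-1}
  &= \frac{\gamma\,\bigl(P_{i-1} - P_1 W_2\bigr)}{r-s}
   = \frac{\gamma\,\omega_{\alpha_{1i}}' - \gamma\,\omega_1'\,\omega_2\cdots\omega_{i-1}}{r-s},\\[2pt]
\bigl[x_{\alpha_{12}}^-(1), x_{\alpha_{12}}^+(-1)\bigr]\,\omega_2\cdots\omega_{i-1}
  &= \frac{\gamma\,\omega_1'\,\omega_2\cdots\omega_{i-1} - \gamma'\,\omega_{\alpha_{1i}}}{r-s}.
\end{align*}
% Adding the last two lines, the mixed term cancels and (4.9) follows.
```

Multiplying the result for $i = n$ by $\gamma^{-1}\gamma'^{-1}\omega_\theta^{-1}\omega_\theta'^{-1}$, as in (4.1), then gives the stated value of $[E_0, F_0]$.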


For (A5): We need only to verify that $[\,E_0, E_j\,] = 0$ and $[\,F_0, F_j\,] = 0$ for $1 < j < n-1$. Actually, in the proof of Proposition 4.5, the fact that $[\,x_{\alpha_{1i}}^-(1), x_j^+(0)\,] = 0$ for $1 < j < i-1$ implies the first identity (taking $i = n$) since $[\,E_0, E_j\,] = [\,x_\theta^-(1), x_j^+(0)\,]\cdot\gamma'^{-1}\omega_\theta^{-1}$. The second can be obtained by applying $\tau$ to the first one.

For (A6): When $i\cdot j \ne 0$, (D9$_n$) implies that the corresponding generators satisfy exactly those $(r,s)$-Serre relations in $U_{r,s}(\mathfrak{sl}_n)$. So, it is enough to check the $(r,s)$-Serre relations involving the indices with $i\cdot j = 0$.

Lemma 4.6.
(1) $E_0E_1^2 - (r+s)\,E_1E_0E_1 + (rs)\,E_1^2E_0 = 0$,
(2) $E_0^2E_1 - (r+s)\,E_0E_1E_0 + (rs)\,E_1E_0^2 = 0$,
(3) $E_{n-1}^2E_0 - (r+s)\,E_{n-1}E_0E_{n-1} + (rs)\,E_0E_{n-1}^2 = 0$,
(4) $E_{n-1}E_0^2 - (r+s)\,E_0E_{n-1}E_0 + (rs)\,E_0^2E_{n-1} = 0$,
(5) $F_1^2F_0 - (r+s)\,F_1F_0F_1 + (rs)\,F_0F_1^2 = 0$,
(6) $F_1F_0^2 - (r+s)\,F_0F_1F_0 + (rs)\,F_0^2F_1 = 0$,
(7) $F_0F_{n-1}^2 - (r+s)\,F_{n-1}F_0F_{n-1} + (rs)\,F_{n-1}^2F_0 = 0$,
(8) $F_0^2F_{n-1} - (r+s)\,F_0F_{n-1}F_0 + (rs)\,F_{n-1}F_0^2 = 0$.

Proof. Relations (5)–(8) follow by applying $\tau$ to the first four relations (1)–(4). We shall demonstrate the first two $(r,s)$-Serre relations; the third and fourth are similar to (1) & (2) and are left to the reader.

(1) Observing

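\[
\begin{aligned}
\bigl[E_1, x_\theta^-(1)\bigr]
&= \bigl[x_{n-1}^-(0), \cdots, x_2^-(0), [x_1^+(0), x_1^-(1)]\bigr]_{(s,\cdots,s)} && \text{(using (D8))},\\
&= \gamma^{-\frac12}\,\bigl[x_{n-1}^-(0), \cdots, [x_2^-(0), \omega_1 a_1(1)]_s\bigr]_{(s,\cdots,s)} && \text{(using (D5))},\\
&= \gamma^{-\frac12}\,s\,\omega_1\,\bigl[x_{n-1}^-(0), \cdots, [x_2^-(0), a_1(1)]\bigr]_{(s,\cdots,s)} && \text{(using (D6))},\\
&= -(rs)^{-\frac12}\,x_{\alpha_{2n}}^-(1)\,\omega_1,
\end{aligned}
\]
we have
\[
\begin{aligned}
E_0E_1^2 - (r+s)\,E_1E_0E_1 + (rs)\,E_1^2E_0
&= (rs)\Bigl(E_1^2\,x_\theta^-(1) - (1 + r^{-1}s)\,E_1\,x_\theta^-(1)\,E_1 + (r^{-1}s)\,x_\theta^-(1)\,E_1^2\Bigr)(\gamma'^{-1}\omega_\theta^{-1})\\
&= (rs)\,\bigl[E_1, [E_1, x_\theta^-(1)]\bigr]_{r^{-1}s}\,(\gamma'^{-1}\omega_\theta^{-1}) && \text{(by (3.6))}\\
&= -(rs)^{\frac12}\,\bigl[x_1^+(0),\, x_{\alpha_{2n}}^-(1)\,\omega_1\bigr]_{r^{-1}s}\,(\gamma'^{-1}\omega_\theta^{-1})\\
&= -(rs)^{\frac12}\,\bigl[x_1^+(0),\, x_{\alpha_{2n}}^-(1)\bigr]\,\omega_1\,(\gamma'^{-1}\omega_\theta^{-1})\\
&= 0 && \text{(by (3.5), (D8))}.
\end{aligned}
\]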
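The first equality in the computation for relation (1) can be unpacked as follows. This is a sketch under stated assumptions: write $x := x_\theta^-(1)$ and $\omega_0 := \gamma'^{-1}\omega_\theta^{-1}$, so that $E_0 = x\,\omega_0$, and use the (A2)-type relation $\omega_0E_1 = \langle\omega_1',\omega_0\rangle\,E_1\omega_0 = s\,E_1\omega_0$ with the value $\langle\omega_1',\omega_0\rangle = s$ quoted in part (I) of Lemma 4.4.

```latex
% Move \omega_0 to the right in each monomial, using \omega_0 E_1 = s E_1 \omega_0:
\begin{align*}
E_0E_1^2 &= s^2\,x\,E_1^2\,\omega_0, &
E_1E_0E_1 &= s\,E_1\,x\,E_1\,\omega_0, &
E_1^2E_0 &= E_1^2\,x\,\omega_0 .
\end{align*}
% Hence
\begin{align*}
E_0E_1^2 - (r+s)E_1E_0E_1 + rs\,E_1^2E_0
 &= \bigl(rs\,E_1^2x - s(r+s)\,E_1xE_1 + s^2\,xE_1^2\bigr)\,\omega_0\\
 &= rs\,\bigl(E_1^2x - (1+r^{-1}s)\,E_1xE_1 + r^{-1}s\,xE_1^2\bigr)\,\omega_0 .
\end{align*}
% The bracket in parentheses is the expansion of [E_1,[E_1,x]]_{r^{-1}s}:
\[
[E_1,[E_1,x]]_{r^{-1}s} = E_1^2x - (1+r^{-1}s)\,E_1xE_1 + r^{-1}s\,xE_1^2 .
\]
```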

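(2) Using the formula for $[\,E_1, x_\theta^-(1)\,]$ derived in (1) above, we have
\[
\begin{aligned}
E_0^2E_1 - (r+s)\,E_0E_1E_0 + (rs)\,E_1E_0^2
&= (rs)\,\bigl[x_\theta^-(1), x_\theta^-(1), E_1\bigr]_{(1,\,rs^{-1})}\,(\gamma'^{-2}\omega_\theta^{-2})\\
&= (rs)^{\frac12}\,\bigl[x_\theta^-(1),\, x_{\alpha_{2n}}^-(1)\,\omega_1\bigr]_{rs^{-1}}\,(\gamma'^{-2}\omega_\theta^{-2})\\
&= (rs)^{\frac12}\,\bigl[x_\theta^-(1),\, x_{\alpha_{2n}}^-(1)\bigr]_{s^{-1}}\,\omega_1\,(\gamma'^{-2}\omega_\theta^{-2})\\
&= -(rs^{-1})^{\frac12}\,\bigl[x_{\alpha_{2n}}^-(1),\, x_\theta^-(1)\bigr]_s\,\omega_1\,(\gamma'^{-2}\omega_\theta^{-2})\\
&= 0 && \text{(by Claim (C) below)},
\end{aligned}
\]
where we used the following claim:

Claim (C). $\bigl[x_{\alpha_{2n}}^-(1),\, x_{\alpha_{1n}}^-(1)\bigr]_s = 0$, for $n > 2$ and $r \ne -s$.

The argument for Claim (C) is technical. Indeed, by induction on $n$, we have: when $n = 3$, by (3.8), one gets
\[
\begin{aligned}
\bigl[x_2^-(1), x_{\alpha_{13}}^-(1)\bigr]_s
&= \bigl[x_2^-(1), [x_2^-(0), x_1^-(1)]_s\bigr]_s && \text{(by (3.8))}\\
&= -(rs)^{\frac12}\,\bigl[x_2^-(1), [x_1^-(0), x_2^-(1)]_{r^{-1}}\bigr]_s\\
&= (rs^{-1})^{-\frac12}\,\bigl[x_2^-(1), x_2^-(1), x_1^-(0)\bigr]_{(r,\,s)}\\
&= (rs^{-1})^{-\frac12}\,\bigl[x_2^-(1), x_2^-(1), x_1^-(0)\bigr]_{(s,\,r)} && \text{(by (3.6))}\\
&= 0 && \text{(by (3.9))},
\end{aligned}
\]
which is exactly the $(r,s)$-Serre relation (see (3.9)). For $n > 3$, we first notice the fact:
\[
\bigl[x_{n-1}^-(0), x_{\alpha_{2n}}^-(1)\bigr]_{\langle\omega_{n-1}',\,\omega_{\alpha_{2n}}\rangle} = \bigl[x_{n-1}^-(0), x_{\alpha_{2n}}^-(1)\bigr]_r = 0, \qquad \text{for } n > 3, \tag{4.10}
\]
which can be proved using the same method as the proof of (II) in Lemma 4.4. We thus have
\[
\begin{aligned}
\bigl[x_{\alpha_{2n}}^-(1), x_\theta^-(1)\bigr]_r
&= \bigl[[x_{n-1}^-(0), x_{\alpha_{2,n-1}}^-(1)]_s,\, x_\theta^-(1)\bigr]_r && \text{(by (3.4))}\\
&= \bigl[x_{n-1}^-(0), [x_{\alpha_{2,n-1}}^-(1), x_\theta^-(1)]_1\bigr]_{rs}\\
&\quad + \bigl[[x_{n-1}^-(0), x_\theta^-(1)]_r,\, x_{\alpha_{2,n-1}}^-(1)\bigr]_s && (= 0 \text{ by Claim (A)})\\
&= \bigl[x_{n-1}^-(0), [x_{\alpha_{2,n-1}}^-(1), [x_{n-1}^-(0), x_{\alpha_{1,n-1}}^-(1)]_s]_1\bigr]_{rs} && \text{(by (3.3))}\\
&= \bigl[x_{n-1}^-(0), [[x_{\alpha_{2,n-1}}^-(1), x_{n-1}^-(0)]_{s^{-1}},\, x_{\alpha_{1,n-1}}^-(1)]_{s^2}\bigr]_{rs}\\
&\quad + s^{-1}\bigl[x_{n-1}^-(0), [x_{n-1}^-(0), [x_{\alpha_{2,n-1}}^-(1), x_{\alpha_{1,n-1}}^-(1)]_s]_{s^2}\bigr]_{rs} && \text{(2nd summand} = 0 \text{ by the induction hypothesis)}\\
&= -s^{-1}\bigl[x_{n-1}^-(0), [x_{\alpha_{2n}}^-(1), x_{\alpha_{1,n-1}}^-(1)]_{s^2}\bigr]_{rs} && \text{(by (3.3))}\\
&= -s^{-1}\bigl[[x_{n-1}^-(0), x_{\alpha_{2n}}^-(1)]_r,\, x_{\alpha_{1,n-1}}^-(1)\bigr]_{s^3} && (= 0 \text{ by (4.10)})\\
&\quad - rs^{-1}\bigl[x_{\alpha_{2n}}^-(1), [x_{n-1}^-(0), x_{\alpha_{1,n-1}}^-(1)]_s\bigr]_{r^{-1}s^2} && \text{(by definition)}\\
&= -rs^{-1}\bigl[x_{\alpha_{2n}}^-(1), x_\theta^-(1)\bigr]_{r^{-1}s^2}.
\end{aligned}
\]
By definition, expanding both sides of the above identity gives us
\[
(1 + rs^{-1})\,x_{\alpha_{2n}}^-(1)\,x_{\alpha_{1n}}^-(1) = (r+s)\,x_{\alpha_{1n}}^-(1)\,x_{\alpha_{2n}}^-(1),
\]
which means $\bigl[x_{\alpha_{2n}}^-(1), x_{\alpha_{1n}}^-(1)\bigr]_s = 0$, under the assumption $r \ne -s$.

For (A7): The verification is analogous to that of (A6).

4.3. Proof of Theorem 4.2. We shall show that the algebra $\mathcal{U}_{r,s}(\widehat{\mathfrak{sl}_n})$ is generated by $E_i, F_i, \omega_i^{\pm1}, \omega_i'^{\pm1}, \gamma^{\pm\frac12}, \gamma'^{\pm\frac12}, D^{\pm1}, D'^{\pm1}$ ($i \in I_0$). To this end, we need to prove the following results.

Lemma 4.7. (1) $x_1^-(1) = [\,E_2, E_3, \cdots, E_{n-1}, E_0\,]_{(r,\cdots,r)}\,\gamma'\omega_1 \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$, then for any $i \in I$, $x_i^-(1) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$.
(2) $x_1^+(-1) = \tau\bigl([\,E_2, E_3, \cdots, E_{n-1}, E_0\,]_{(r,\cdots,r)}\,\gamma'\omega_1\bigr) = \gamma\,\omega_1'\,[\,F_0, F_{n-1}, \cdots, F_3, F_2\,]_{\langle s,\cdots,s\rangle} \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$, then for any $i \in I$, $x_i^+(-1) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$.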

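Proof. (1) Set $E(i) = x_{\alpha_{1,i+1}}^-(1)\,\omega_{i+1}\cdots\omega_{n-1}\,\gamma'^{-1}\omega_\theta^{-1}$ for $i \ge 1$, where $E(n-1) = E_0$. Observing that $[\,x_i^+(0), x_{\alpha_{1,i+1}}^-(1)\,] = x_{\alpha_{1i}}^-(1)\,\omega_i$ in the proof (see case (ii)) of Proposition 4.5, we get an important recursive relation:
\[
\begin{aligned}
\bigl[E_i, E(i)\bigr]_r
&= \bigl[x_i^+(0),\, x_{\alpha_{1,i+1}}^-(1)\,\omega_{i+1}\cdots\omega_{n-1}\,\gamma'^{-1}\omega_\theta^{-1}\bigr]_r\\
&= \bigl[x_i^+(0),\, x_{\alpha_{1,i+1}}^-(1)\bigr]\,\omega_{i+1}\cdots\omega_{n-1}\,\gamma'^{-1}\omega_\theta^{-1} = E(i-1). && (4.11)
\end{aligned}
\]
Recursively using the above relations, we obtain
\[
\begin{aligned}
x_1^-(1) &= E(1)\,\gamma'\omega_1 = \bigl[E_2, E(2)\bigr]_r\,\gamma'\omega_1 = \cdots\\
&= \bigl[E_2, \cdots, E_{n-1}, E(n-1)\bigr]_{(r,\cdots,r)}\,\gamma'\omega_1\\
&= \bigl[E_2, \cdots, E_{n-1}, E_0\bigr]_{(r,\cdots,r)}\,\gamma'\omega_1 \;\in\; \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n}). && (4.12)
\end{aligned}
\]
Now suppose that we have already obtained $x_i^-(1) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$ for $i \ge 1$. Notice that
\[
\begin{aligned}
x_{i+1}^-(1) &= (rs)\,\bigl[[x_i^+(0), x_i^-(0)],\, x_{i+1}^-(1)\bigr]_{r^{-1}}\,\omega_i^{-1}\\
&= (rs)\,\bigl[x_i^+(0), x_i^-(0), x_{i+1}^-(1)\bigr]_{(r^{-1},\,1)}\,\omega_i^{-1} && \text{(by (3.4))},\\
&= -(rs)^{\frac12}\,\bigl[x_i^+(0), x_{i+1}^-(0), x_i^-(1)\bigr]_{(s,\,1)}\,\omega_i^{-1} && \text{(by (3.8))},\\
&= (rs)^{\frac12}\,\bigl[[F_{i+1}, x_i^-(1)]_s,\, E_i\bigr]\,\omega_i^{-1}\\
&\in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n}), && (4.13)
\end{aligned}
\]
which gives rise to the recursive construction of some basic quantum real root vectors of level 1. Hence, we obtain the required result.

(2) Set $F(i) = \tau(E(i)) = \gamma^{-1}\omega_\theta'^{-1}\,\omega_{n-1}'\cdots\omega_{i+1}'\,x_{\alpha_{1,i+1}}^+(-1)$ for $i \ge 1$, where $F(n-1) = F_0$. Applying $\tau$ to (4.11), we see that $[\,F(i), F_i\,]_s = F(i-1)$ and $F(1) = \gamma^{-1}\omega_1'^{-1}\,x_1^+(-1)$, which implies the first claim. The remaining claim follows from
\[
x_{i+1}^+(-1) = \tau\bigl(x_{i+1}^-(1)\bigr) = (rs)^{\frac12}\,\omega_i'^{-1}\,\bigl[F_i, [x_i^+(-1), E_{i+1}]_r\bigr] \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n}). \tag{4.14}
\]
This completes the proof of Lemma 4.7.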

We observe that Lemma 4.7, together with (4.12), (4.13) & (4.14), gives the construction of the Drinfel’d generators of level 1. Furthermore, the first conclusion of the following lemma gives the construction of the quantum imaginary root vectors of any level ($\ne 0$), while the second gives the construction of some basic quantum real root vectors of any level. Actually, as a result of Definition 3.9 and Lemma 4.8 below, this approach also gives the construction of all quantum real root vectors of any level.

Lemma 4.8. (1) $a_i(\ell) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$, for $\ell \in \mathbb{Z}\setminus\{0\}$.
(2) $x_i^\pm(k) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$, for $k \in \mathbb{Z}$.

Proof. (1) At first, it follows from (D8) that
\[
\begin{aligned}
a_i(1) &= \omega_i^{-1}\gamma^{1/2}\,\bigl[x_i^+(0), x_i^-(1)\bigr] \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n}), && (4.15)\\
a_i(-1) &= \omega_i'^{-1}\gamma'^{1/2}\,\bigl[x_i^+(-1), x_i^-(0)\bigr] = \tau(a_i(1)) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n}). && (4.16)
\end{aligned}
\]
Suppose that we have already obtained $a_i(\pm\ell') \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$ for all $\ell' \le \ell$ and some $\ell \ge 1$. Now using (D6$_n$) & (D8), we have the following expansion (in fact, the expansions of both sides are the same, which also shows the compatibility between (D6$_n$) and (D8) for $n = 1, 2$):
\[
\begin{aligned}
\mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n}) \ni \bigl[x_i^+(0), [a_i(\ell), x_i^-(1)]\bigr]
&= \bigl[[x_i^+(0), a_i(\ell)],\, x_i^-(1)\bigr] + \bigl[a_i(\ell), [x_i^+(0), x_i^-(1)]\bigr]\\
&= *\,\gamma'^{\frac{\ell}{2}}\,\bigl[x_i^+(\ell), x_i^-(1)\bigr] + \bigl[a_i(\ell),\, \gamma^{-\frac12}\omega_i a_i(1)\bigr] && \text{(this term} = 0 \text{ by (D2))}\\
&= *\,(\gamma\gamma')^{-\frac{\ell}{2}}\gamma^{-\frac12}\,\omega_i\Bigl[a_i(\ell+1) + \sum_{1 < p \le \ell+1}\;\sum_{\sum_k j_k = \ell+1} *'\,(r-s)^{p-1}\,a_i(j_1)\cdots a_i(j_p)\Bigr], && (4.17)
\end{aligned}
\]
where scalars $*, *' \in \mathbb{K}\setminus\{0\}$. So $a_i(\ell+1) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$.

Applying $\tau$ to the above formula, we can get $a_i(-(\ell+1)) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$. Thereby, $a_i(\ell) \in \mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n})$, for any $\ell \in \mathbb{Z}\setminus\{0\}$.

(2) follows from (D6) (setting $i = j$ and $k = 0$), together with (1).

Therefore, we have proved $\mathcal{U}'_{r,s}(\widehat{\mathfrak{sl}_n}) = \mathcal{U}_{r,s}(\widehat{\mathfrak{sl}_n})$, that is to say, the latter is indeed generated by $E_i, F_i, \omega_i^{\pm1}, \omega_i'^{\pm1}, \gamma^{\pm\frac12}, \gamma'^{\pm\frac12}, D^{\pm1}, D'^{\pm1}$ ($i \in I_0$).

4.4. Proof of Theorem 4.3. From Sects. 4.2 & 4.3, we actually get an algebra epimorphism $\Psi: U_{r,s}(\widehat{\mathfrak{sl}_n}) \longrightarrow \mathcal{U}_{r,s}(\widehat{\mathfrak{sl}_n})$, since both algebras have essentially the same system of generators satisfying the defining relations of the former.

Notice that both algebras $U_{r,s}(\widehat{\mathfrak{sl}_n})$ and $\mathcal{U}_{r,s}(\widehat{\mathfrak{sl}_n})$ have in common a natural $Q$-gradation structure (see Corollary 2.8), which is by definition evidently preserved under $\Psi$. On the other hand, both toral subalgebras $U_{r,s}(\widehat{\mathfrak{sl}_n})^0$ and $\mathcal{U}_{r,s}(\widehat{\mathfrak{sl}_n})^0$ generated by the same system of group-like elements

$\omega_i^{\pm1}, \omega_i'^{\pm1}$ ($i \in I_0$), $\gamma^{\pm\frac12}, \gamma'^{\pm\frac12}, D^{\pm1}, D'^{\pm1}$ are obviously isomorphic with respect to $\Psi^0 := \Psi|_{U_{r,s}(\widehat{\mathfrak{sl}_n})^0}$.

Assigned to the positive or negative nilpotent Lie subalgebra $\widehat{\mathfrak{n}}^\pm$ of $\widehat{\mathfrak{sl}_n}$ are two subalgebras $U_{r,s}(\widehat{\mathfrak{n}}^\pm)$ and $\mathcal{U}_{r,s}(\widehat{\mathfrak{n}}^\pm)$. Both are generated by $\widehat{\mathfrak{n}}^\pm$ in $U_{r,s}(\widehat{\mathfrak{sl}_n})$ and $\mathcal{U}_{r,s}(\widehat{\mathfrak{sl}_n})$ respectively. Denote $\Psi^\pm := \Psi|_{U_{r,s}(\widehat{\mathfrak{n}}^\pm)}$. By Corollary 2.7, the double structure of $U_{r,s}(\widehat{\mathfrak{sl}_n})$ in Theorem 2.5 implies its triangular decomposition structure $U_{r,s}(\widehat{\mathfrak{n}}^-) \otimes U_{r,s}(\widehat{\mathfrak{sl}_n})^0 \otimes U_{r,s}(\widehat{\mathfrak{n}}^+)$. This fact likewise indicates that $\Psi$ has a corresponding decomposition $\Psi^- \otimes \Psi^0 \otimes \Psi^+$. So, we are left to show that $\Psi^\pm$ are isomorphisms.

It suffices to consider the epimorphism $\Psi^+: U_{r,s}(\widehat{\mathfrak{n}}^+) \longrightarrow \mathcal{U}_{r,s}(\widehat{\mathfrak{n}}^+)$. Observe that $U_{r,s}(\widehat{\mathfrak{n}}^+)$ (resp. $\mathcal{U}_{r,s}(\widehat{\mathfrak{n}}^+)$) is generated by elements $e_i$ (resp. $E_i$) for $i \in I_0$ and subject to $(r,s)$-Serre relations (A5) & (A6). To check that $\Psi^+$ is an isomorphism, now we fix $r = q$ and specialize $s$ at $q^{-1}$ as follows.

Note that $U_{r,s}(\widehat{\mathfrak{n}}^+)$ can be viewed as defined over the Laurent polynomial ring $\mathbb{Q}[r^{\pm1}, s^{\pm1}]$. Let $\mathcal{A} \subset \mathbb{Q}(r,s)$ be the localization of the ring $\mathbb{Q}[r^{\pm1}, s^{\pm1}]$ at the maximal ideal $(rs-1)$. Let $U_{\mathcal{A}}^+$ be the $\mathcal{A}$-subalgebra of $U_{r,s}(\widehat{\mathfrak{n}}^+)$ generated by $e_i$ ($i \in I_0$). Let $(rs-1)\,U_{\mathcal{A}}^+$ be the ideal generated by $(rs-1)$ in $U_{\mathcal{A}}^+$. Define the algebra $U_q^+$, the specialization of $U_{r,s}(\widehat{\mathfrak{n}}^+)$ at $s = q^{-1}$, by $U_q^+ = U_{\mathcal{A}}^+/(rs-1)\,U_{\mathcal{A}}^+$. Obviously, $U_q^+ \cong U_q(\widehat{\mathfrak{n}}^+)$, the usual one-parameter quantum subalgebra of $U_q(\widehat{\mathfrak{sl}_n})$. However, in this case, $\Psi^+$ induces the isomorphism $\Psi^+: U_q(\widehat{\mathfrak{n}}^+) \longrightarrow \mathcal{U}_q(\widehat{\mathfrak{n}}^+)$ given by the Drinfel’d isomorphism in the one-parameter case (see [B2] or [J2]). Since specialization doesn’t change the root multiplicities, $\Psi^+: U_{r,s}(\widehat{\mathfrak{n}}^+) \longrightarrow \mathcal{U}_{r,s}(\widehat{\mathfrak{n}}^+)$ is an isomorphism.

Up to now, from subsections 4.2–4.4, we have finally established the Drinfel’d isomorphism in the two-parameter case.

Acknowledgement. Part of this work was done when Hu visited l’DMA, l’Ecole Normale Supérieure de Paris from October to November, 2004, the Fachbereich Mathematik der Universität Hamburg from November 2004 to February 2005, as well as the ICTP (Trieste, Italy) from March to August, 2006. He would like to express his deep thanks to ENS de Paris, H. Strade and ICTP for their hospitality and for the support from ENS, DFG and ICTP. The authors are indebted to the referee for useful comments.

References

[B1] Beck, J.: Convex bases of PBW type for quantum affine algebras. Commun. Math. Phys. 165, 193–199 (1994)
[B2] Beck, J.: Braid group action and quantum affine algebras. Commun. Math. Phys. 165, 555–568 (1994)
[BGH1] Bergeron, N., Gao, Y., Hu, N.: Drinfel’d doubles and Lusztig’s symmetries of two-parameter quantum groups. J. Algebra 301, 378–405 (2006)
[BGH2] Bergeron, N., Gao, Y., Hu, N.: Representations of two-parameter quantum orthogonal and symplectic groups. In: “Proceedings of the International Conference on Complex Geometry and Related Fields”, AMS/IP Studies in Adv. Math. 39, Providence, RI: Amer. Math. Soc., 2007, pp. 1–21
[BH] Bai, X., Hu, N.: Two-parameter quantum groups of exceptional type E-series and convex PBW-type basis. Algebra Colloquium, to appear, available at http://arXiv.org/list/Math.QA/0605179, 2006
[BW1] Benkart, G., Witherspoon, S.: Two-parameter quantum groups and Drinfel’d doubles. Alg. Rep. Theory 7, 261–286 (2004)
[BW2] Benkart, G., Witherspoon, S.: Representations of two-parameter quantum groups and Schur-Weyl duality. In: Hopf algebras, Lecture Notes in Pure and Appl. Math., 237, New York: Dekker, 2004, pp. 62–92
[BW3] Benkart, G., Witherspoon, S.: Restricted two-parameter quantum groups. In: Fields Institute Communications, “Representations of Finite Dimensional Algebras and Related Topics in Lie Theory and Geometry”, Vol. 40, Providence, RI: Amer. Math. Soc., 2004, pp. 293–318
[Da] Damiani, I.: A basis of type Poincaré-Birkhoff-Witt for the quantum algebra of sl(2). J. Algebra 161, 291–310 (1993)
[DI1] Ding, J.T., Iohara, K.: Generalization of Drinfel’d quantum affine algebras. Lett. Math. Phys. 41(2), 181–193 (1997)
[DI2] Ding, J.T., Iohara, K.: Drinfel’d comultiplication and vertex operators. J. Geom. Phys. 23, 1–13 (1997)
[Dr1] Drinfel’d, V.G.: Quantum groups. ICM Proceedings (New York, Berkeley, 1986), Providence, RI: Amer. Math. Soc., pp. 798–820, 1987
[Dr2] Drinfel’d, V.G.: A new realization of Yangians and quantized affine algebras. Soviet Math. Dokl. 36, 212–216 (1988)
[FJ] Frenkel, I., Jing, N.: Vertex representations of quantum affine algebras. Proc. Nat’l. Acad. Sci. USA. 85, 9373–9377 (1998)
[G] Grossé, P.: On quantum shuffle and quantum affine algebras. J. Algebra 318, 495–519 (2007)
[Ga] Garland, H.: The arithmetic theory of loop algebras. J. Algebra 53, 480–551 (1978)
[H] Hu, N.: Quantum divided power algebra, q-derivatives, and some new quantum groups. J. Algebra 232, 507–540 (2000)
[HS] Hu, N., Shi, Q.: The two-parameter quantum group of exceptional type $G_2$ and Lusztig’s symmetries. Pacific J. Math. 230, 327–345 (2007)
[J1] Jing, N.: Twisted vertex representations of quantum affine algebras. Invent. Math. 102, 663–690 (1990)
[J2] Jing, N.: On Drinfel’d realization of quantum affine algebras. Ohio State Univ. Math. Res. Inst. Publ. 7, Berlin: de Gruyter, pp. 195–206, 1998
[K] Kac, V.: Infinite Dimensional Lie Algebras. 3rd edition, Cambridge: Cambridge Univ. Press, 1990
[KS] Klimyk, A., Schmüdgen, K.: Quantum Groups and Their Representations. Berlin: Springer, 1997
[KT] Khoroshkin, S.M., Tolstoy, V.N.: On Drinfel’d realization of quantum affine algebras. J. Geom. Phys. 11, 445–452 (1993)


[LR] Lalonde, M., Ram, A.: Standard Lyndon bases of Lie algebras and enveloping algebras. Trans. Amer. Math. Soc. 347(5), 1821–1830 (1995)
[LSS] Levendorskii, S., Soibel’man, Y., Stukopin, V.: Quantum Weyl group and universal quantum R-matrix for affine Lie algebra $A_1^{(1)}$. Lett. Math. Phys. 27(4), 253–264 (1993)
[R1] Rosso, M.: Quantum groups and quantum shuffles. Invent. Math. 133(2), 399–416 (1998)
[R2] Rosso, M.: Lyndon bases and the multiplicative formula for R-matrices. Preprint, 2002
[T] Takeuchi, M.: A two-parameter quantization of GL(n). Proc. Japan Acad. 66(Ser. A), 112–114 (1990)

Communicated by A. Connes

Commun. Math. Phys. 278, 487–548 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0403-3

Communications in

Mathematical Physics

Lagrangian Approach to Sheaves of Vertex Algebras

Fyodor Malikov

Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA. E-mail: [email protected]

Received: 29 September 2006 / Accepted: 8 June 2007
Published online: 8 January 2008 – © Springer-Verlag 2007

Abstract: We explain how sheaves of vertex algebras are related to mathematical structures encoded by a class of Lagrangians. The exposition is focused on two examples: the WZW model and the (1,1)-supersymmetric σ-model. We conclude by showing how to construct a family of vertex algebras with base the Barannikov-Kontsevich moduli space, thus furnishing the B-model moduli for Witten’s half-twisted model.

Partially supported by the National Science Foundation.

Contents

Introduction
1. Diffieties and Functional Pre-Symplectic Structures
   1.1 The jets
   1.2 Local formulas
   1.3 De Rham complex
   1.4 Differential equations
   1.5 Functional pre-symplectic structure
   1.6 Calculus of variations and integrals of motion. Bosonic σ-model
2. Vertex Poisson Algebras
   2.1 Definition
   2.2 Lemma
   2.3 Tensor products
   2.4 From vertex Poisson algebras to Courant algebroids
   2.5 Symbols of vertex differential operators
   2.6 A sheaf-theoretic version
   2.7 A natural sheaf of SVDOs
   2.8 The Lagrangian interpretation
   2.9 An example: WZW model
3. Supersymmetric Analogues
   3.1 Bits of supergeometry
   3.2 Functional pre-symplectic structure
   3.3 Calculus of variations
   3.4 An example: (1,1)-supersymmetric σ-model
   3.5 Vertex Poisson algebra interpretation. Witten’s models
   3.6 Quantization. B-model moduli

Introduction

More than anything else, the present notes are a report on what we have been able to make of the recent papers by Kapustin and Witten [Kap,W4]. Even after the tremendous effort [QFS], much of the mathematical literature treating various aspects of string theory and related topics is conspicuously lacking any mention of the Lagrangian,$^1$ an object that is at the heart of a physical theory. We would like to make precise the relation of sheaves of vertex algebras [MSV] to a Lagrangian field theory. What vertex algebras and Lagrangians have in common is that both produce infinite dimensional Lie algebras. If $V$ is a vertex algebra, then, in particular, it is a vector space with multiplications ${}_{(n)}$, $n \in \mathbb{Z}$, and a derivation $T$. The corresponding Lie algebra is
\[
\mathrm{Lie}(V) = \bigl(V/T(V),\; {}_{(0)}\bigr). \tag{0.1}
\]
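Why (0.1) defines a Lie algebra can be sketched from standard vertex algebra identities. The translation, skew-symmetry and commutator formulas quoted below are recalled as background rather than taken from this introduction, and $a, b, c$ denote arbitrary elements of $V$.

```latex
% Translation: T kills the left slot of the 0-th product and commutes with it on the right.
\[
(Ta)_{(0)}b = 0, \qquad a_{(0)}(Tb) = T\bigl(a_{(0)}b\bigr).
\]
% Hence {}_{(0)} descends to a well-defined bilinear operation on V/T(V).
% Skew-symmetry holds modulo T(V):
\[
a_{(0)}b = -\,b_{(0)}a + \sum_{j\ge 1}\frac{(-1)^{j+1}}{j!}\,T^{j}\bigl(b_{(j)}a\bigr)
\;\equiv\; -\,b_{(0)}a \pmod{T(V)} .
\]
% Jacobi identity (the 0-th product acts by derivations of the 0-th product):
\[
a_{(0)}\bigl(b_{(0)}c\bigr) - b_{(0)}\bigl(a_{(0)}c\bigr) = \bigl(a_{(0)}b\bigr)_{(0)}c .
\]
```

The same three identities hold for the $\lambda$-bracket products of a vertex Poisson algebra, which is the case mostly considered in the sequel.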

We will be concerned with the class of vertex algebras, or rather of sheaves thereof, introduced in [MSV]. To make our life easier, we will, first, mostly consider their quasiclassical limits, i.e., the corresponding vertex Poisson algebras and, second, work in the C ∞ -setting. This class comprises vertex algebra analogues of sheaves of symbols of differential operators, and their natural habitat is different versions of ∞-jet spaces. To give an example, let be a 1-dimensional real manifold and consider J ∞ (T ∗ M ), the space of ∞-jets of sections of the trivial bundle T ∗ M = T ∗ M × → . We will find it convenient to work with families of such jet-spaces, to be denoted J ∞ (T ∗ M/ ), that are are naturally and similarly attached to a “time” fibration τ: → with fiber . The push-forward of the structure sheaf O J ∞ (T ∗ M/ ) on M := M × carries a structure of a sheaf of vertex Poisson algebras, a vertex analogue of the Poisson algebra of functions on T ∗ M. This sheaf is natural in that the assignment M → O J ∞ (T ∗ M/ ) is functorial in M. Such sheaves of symbols of vertex differential operators, SVDO, can be defined axiomatically and classified [GMS1]. A simple quasiclassical version of the classification obtained in [GMS1] shows that locally (on M) all SVDOs are isomorphic and the set of isomorphism classes is identified with H 3 (M, R). In particular, an SVDO . O J ∞ (T ∗ M/ ) + H is defined for each closed 3-form H . To each such SVDO construction (0.1) attaches a sheaf of Lie algebras on M, .

.

.

Lie(O J ∞ (T ∗ M/ ) + H ) = (O J ∞ (T ∗ M/ ) + H )/T (O J ∞ (T ∗ M/ ) + H ). (0.2) 1 [AKSZ,FL] are notable exceptions.


One of the more interesting examples arises when $M$ is a compact simple Lie group. In this case the set of isomorphism classes of SVDOs is a 1-dimensional vector space, and we denote the representatives by $SD_{G,k}$, $k \in \mathbb{C}$. If we let $\mathfrak{g}$ be the corresponding Lie algebra and $V(\mathfrak{g})_k$ the corresponding vertex Poisson algebra, then there arise two Poisson-commuting subalgebras [F,FP,AG,GMS2]
\[
V(\mathfrak{g})_k \xrightarrow{\;j_l\;} \Gamma(G, SD_{G,k}) \xleftarrow{\;j_r\;} V(\mathfrak{g})_{-k} \tag{0.3}
\]

engendered by the left/right translations of G by itself. If is a circle, this implies the existence of 2 copies of the affine Lie algebra jl

jr

gˆ k → (G, Lie(S DG,k )) ← gˆ −k . The case of k = 0 is exceptional; in this case the gˆ k × gˆ −k -module structure of the SVDO has a form reminiscent of the WZW (G, S DG,k ) = Vλ,k ⊗ Vλ∗ ,−k , (0.4) λ

where Vλ,k is the gˆ k -module induced from the finite dimensional simple g-module with highest weight λ; λ∗ is the dual weight. On the Lagrangian side one deals with a somewhat different jet-space, J ∞ (M ), where, recall, is 2-dimensional. Defined on it there is a sheaf of variational bi-complexes, ( •,• J ∞ (M ) , δ, d). An action, S, is defined to be 1,0 S ∈ (J ∞ (M ), 2,0 J ∞ (M ) /d J ∞ (M ) ),

(0.5)

and can be represented by a collection {L j ∈ 2,0 J ∞ (M ) }

(0.6)

of locally defined Lagrangians equal to each other modulo d-exact terms on double intersections. Action S also produces a Lie algebra I S of integrals of motion by virtue of the Noether theorem. Let us relate this algebra to (0.2). Each action (0.5) defines a space, Sol S , often referred to as the solution space, which for our purposes had better be chosen to be the infinite dimensional submanifold of J ∞ (M ) defined by the Euler-Lagrange equations. Sol S and the jet-spaces considered are examples of a diffiety, a notion introduced by A.M.Vinogradov [V]. Attached to each action there is what is usually called a variational 2-form on Sol S . Generally speaking, it is not a form but a global section 0,2 /d ω S ∈ Sol S , 1,2 Sol S Sol S

(0.7)

annihilated by δ. Special diffiety properties allow to attach to each such form, not necessarily coming from a Lagrangian, a Lie algebra structure on a certain subsheaf of “functions” on Sol S .


We call it a functional pre-symplectic structure and denote the corresponding sheaf of Lie algebras by HωSolS S . One has I S ⊂ (Sol S , HωSolS S ).

(0.8)

We have come to the point. For a class of Lagrangians, comprising those that are convex and of order 1, Sol S is diffeomorphic to J ∞ (T M/ ) and there is a version of the Legendre transform g : J ∞ (T M/ ) → J ∞ (T ∗ M/ ) that gives a Lie algebra sheaf isomorphism ∼

g # : HωSolS S −→ g −1 Lie(O J ∞ (T ∗ M/ ) )

(0.9)

in the case of a single globally defined Lagrangian, cf. (0.1–0.2). Classification of SVDOs is also reflected in the Lagrangian world: given a globally defined Lagrangian L and a closed 3-form H , known as an H -flux, one defines, following e.g. [GHR,W1], a collection of Lagrangians L H , such as in (0.6), so that there is an isomorphism ∼

.

g # : HωSolS S −→ g −1 Lie(O J ∞ (T ∗ M/ ) + H ),

(0.10)

hence an embedding I S ⊂ (M, Lie(O J ∞ (T ∗ M/ ) ).

(0.11)

As an illustration, let us relate the WZW model [W1,GW] to $SD_{G,k}$. It is conformally invariant, meaning that 2 copies, left and right moving, of the Virasoro algebra are among the integrals of motion $I_{WZW}$. Both are embedded into $\mathrm{Lie}(SD_{G,k})$ by virtue of (0.11). Precisely when the level, $k$, is non-zero, they coincide with the Virasoro algebra defined via the Sugawara construction inside $j_l(V(\mathfrak{g})_k)$ or $j_r(V(\mathfrak{g})_{-k})$ resp., see (0.3). The existence of the left/right moving Virasoro algebra allows one to define the left/right moving subalgebra of $\mathrm{Lie}(SD_{G,k})$, or indeed, the left/right moving subalgebra of $SD_{G,k}$. Again, precisely when $k \ne 0$, the spaces of global sections of these equal $j_l(V(\mathfrak{g})_k)$ or $j_r(V(\mathfrak{g})_{-k})$ resp. This is an easy consequence of decomposition (0.4).

A disclaimer is in order: no attempt at originality has been made. But it is also true that we failed to find an exposition of this material suitable for our purposes. Our main source of fact and inspiration was [Di], Ch. 19; see also [DF,S,V,Z]. Needless to say, much of what has just been discussed is contained in one form or another in [BD], e.g. Sects. 2.3.20, 3.9, but note that the meaning is somewhat different: we work in the $C^\infty$ setting and our constructions are not necessarily chiral. In fact, one of our wishes was to understand “left and right movers”, whose ubiquity in physics literature bedevils some of us in the mathematics community.

All of the above has a more or less straightforward superanalogue. As an example, we analyze the (1,1)-supersymmetric σ-model arising on a Riemannian manifold $M$. It is similarly governed by a super SVDO, $\Omega_M^{poiss}$, which is a quasiclassical limit of the $C^\infty$-version of the chiral de Rham complex [MSV]. The Lie superalgebra of integrals of motion contains two copies of the $N=1$ superconformal algebra, and we write explicit formulas for their embeddings into $\Gamma(M, \mathrm{Lie}(\Omega_M^{poiss}))$.
If, in addition, M is a Kähler manifold, then this symmetry algebra is enlarged to include 2 copies of the N = 2 superconformal algebra, a remarkable fact known since [Z,A-GF]. If so, a quadruple of operators, $Q^{\bullet,\bullet}$, $\bullet = \pm$, arises, appropriate combinations of which are differentials on $\Omega^{poiss}_M$. The 3 cohomology sheaves,

$$H_{Q^{--}+Q^{++}}(\Omega^{poiss}_M),\quad H_{Q^{--}+Q^{+-}}(\Omega^{poiss}_M),\quad H_{Q^{--}}(\Omega^{poiss}_M) \tag{0.12}$$

are versions of the quasiclassical limit of Witten’s A-, B- and half-twisted models [W2]. Their cohomology can be computed using versions of the de Rham complex and $\bar\partial$-resolution; the result is $H^*(M,\mathbb{C})$, $H^*(M,\Lambda^*T_M)$, and $H^*(M,\Omega^{poiss,an}_M)$ resp., where the latter is a purely holomorphic version of $\Omega^{poiss}_M$, the quasiclassical limit of the chiral de Rham complex [MSV]. Note that in the present situation the left/right moving subalgebra can also be defined by analogy with the WZW case considered above. Of the 3 cohomology vertex Poisson algebras, $H^*(M,\mathbb{C})$, $H^*(M,\Lambda^*T_M)$, and $H^*(M,\Omega^{poiss,an}_M)$, the last one is an infinite dimensional subquotient of the left moving algebra and is often referred to as Witten’s chiral algebra [W3,W4]. The first two are also appropriate subquotients but are finite dimensional and of topological nature.

The construction of the sheaf complexes used, such as $(\Omega^{poiss}_M, Q^{--}+Q^{++})$, $(\Omega^{poiss}_M, Q^{--}+Q^{+-})$, $(\Omega^{poiss}_M, Q^{--})$, is easily quantized to produce the complexes of vertex algebra sheaves, $(\Omega^{vert}_M, Q^{--}+Q^{++})$, $(\Omega^{vert}_M, Q^{--}+Q^{+-})$, $(\Omega^{vert}_M, Q^{--})$. The sheaf cohomology of the first two remains the same, that of the 3rd is quite different and equals the cohomology of the chiral de Rham complex. When formulated in the physics language, the relation of this quantization to the genuine supersymmetric model is that the latter equals the former “perturbatively”; this is the main result of Kapustin [Kap]. It seems, however, that the emphasis made in [Kap] on the “infinite metric limit” is somewhat disingenuous: the above supersymmetries depend on a genuine metric and its Kähler form. By skillfully applying the techniques of SUSY vertex algebras [HK], Ben-Zvi, Heluani, and Szczesny have recently solved [B-ZHS] a harder problem of finding quantum versions of the above mentioned embeddings of the various superconformal algebras in the chiral de Rham complex.
Our discussion seems to indicate that their quantization is related to physics (apart from non-perturbative effects) in about the same way the known quantization of (0.3) [FP,F,AG,GMS2] is related to the quantum WZW: it mixes the chiral and anti-chiral sectors. At this point we can only ask, following [FP], if there is a physical model of interest whose chiral algebra is as in [B-ZHS].

The differential $Q^{--}$ of the (quantum) complex $(\Omega^{vert}_M, Q^{--})$ can be deformed. One of the ways to think about the complex $(\Omega^{vert}_M, Q^{--})$ is that it is a vertex algebra version of the $\bar\partial$-resolution of the algebra of polyvector fields. We conclude by showing that the Barannikov-Kontsevich construction [BK] has a vertex algebra analogue: we define a family of vertex algebras with base the Barannikov-Kontsevich moduli space $M_{BK}$ by assigning to each $t \in M_{BK}$ an element $Q^{--}_t \in \Gamma(M, \Omega^{vert}_M)$ and a vertex algebra $H_{Q^{--}_t}(\Gamma(M, \Omega^{vert}_M))$.

The conformal weight zero subspace of the family $\{t \mapsto H_{Q^{--}_t}(\Gamma(M, \Omega^{vert}_M))\}$ encodes precisely the Frobenius manifold introduced in [BK]. Our construction amounts to defining a morphism of the deformation functor of the Lie algebra of polyvector fields to that of $\Gamma(M, \mathrm{Lie}(\Omega^{vert}_M))$, which is yet another application of construction (0.1). This furnishes the B-model moduli for Witten’s half-twisted model. The instanton effects seem to be out of reach, to us; see, however, an intriguing sentence in [W4], p.1, and [FL] for an interesting new approach based on marginal operators.


F. Malikov

1. Diffieties and Functional Pre-Symplectic Structures The geometry of jet-spaces is a huge and familiar topic, see sources such as [BD,Di,Ol, V]. The purpose of this section, see Lemma 1.5.4.1, is to introduce a “universal” sheaf of Lie algebras Hcan , which contains the algebra of symmetries of a class of Lagrangians, see Lemma 1.6.8.1. 1.1. The jets. Assume given a d-dimensional C ∞ -manifold , the “world sheet”, and a smooth fiber bundle τ : →

(1.1.1)

with base , an open subset of R, and fiber , a (d − 1)-dimensional manifold. This is the minimal requirement; most of the time, we will have the Cartesian product ×

(1.1.2)

with a fixed coordinate (“intrinsic time”)

on

τ = σ 0 : → R,

(1.1.3)

σ = (σ 1 , . . . , σ d−1 ) : Rd−1 →

(1.1.4)

and étale coordinates

,

on so that (1.1.4) is a universal cover – étale because we would like to include the case of a torus; furthermore, we will be mostly interested in d = 1. Let M, “space-time”, be an n-dimensional C ∞ -manifold and M = M × . There arises a fiber bundle M → , and we denote by J k (M ) the space of k-jets of its sections. Each J k (M ) is a finite dimensional C ∞ -manifold, and the natural projections πk,l : J k (M ) → J l (M ), k ≥ l organize the collection {J k (M ), k ≥ 0} in a projective system. The space of ∞-jets, J ∞ (M ), is the projective limit, lim← J k (M ). Its sheaf of smooth functions is the direct limit of those on J k (M ), ≥ 0. Let O J k (M ) be the sheaf of all smooth functions on J k (M ) that are polynomials in positive order jets and define O J ∞ (M ) = lim O J k (M ) . →

J ∞ (M

The fiber bundle ) → carries a well-known flat connection, or equivalently, O J ∞ (M ) is a sheaf of D -algebras, where D is the sheaf of differential operators on . Denote by ρ : T → T J ∞ (M )

(1.1.5)

the corresponding morphism of the sheaves of vector fields. We will often refer to this situation by calling J ∞ (M ) → a D -manifold, thus mimicking [BD]. In particular, attached to any tangent vector ξ ∈ Tt , there is a tangent vector ρ(ξ ) ∈ T(x,t) (J ∞ (M )). Hence there arises an integrable d-dimensional distribution M (x, t) → span{ξˆ , ξ ∈ Tt } ⊂ T(x,t) J ∞ (M ) known as the Cartan distribution.


J ∞ (M ) is a simple example of what A.M.Vinogradov calls a diffiety. Since the Cartan distribution is an important structure ingredient, by an infinitesimal automorphism of J ∞ (M ) one means a contact vector field [V], i.e., a vector field that preserves the Cartan distribution. Locally defined contact vector fields form a Lie algebra subsheaf C J ∞ (M ) ⊂ T J ∞ (M ) . Call a contact vector field evolutionary if it is tangent to the fibers of the projection J ∞ (M ) → and let Evol(J ∞ (M )) denote the sheaf of all evolutionary vector fields. Of course, Evol(J ∞ (M )) ⊂ C J ∞ (M ) ⊂ T J ∞ (M ) are embeddings of Lie algebra sheaves. All of this admits a relative version: if one defines J ∞ (M/ ) to be the space of jets of sections → M in the direction of fibers of the bundle → , then the definitions of the connection ρ/ : T/ → T J ∞ (M/ ) ,

(1.1.6)

(where T/ is the sheaf of vector fields on tangent to the fibers of the projection → ), Cartan distribution, sheaf of evolutionary vector fields Evol(J ∞ (M/ )), etc., are immediate. From now, unless otherwise stated, we will be working over a base S, S being either or a point. If (1.1.2-4) are valid, then the fibers of the projection J ∞ (M ) → are canonically identified, and ∼

∼

∞ J ∞ (M ) −→ J∞ (M) × , J ∞ (M/ ) −→ J/ (M) ×

(1.1.7)

∞ for some infinite dimensional manifolds, J∞ (M) and J/ (M), whose definition is easy to reconstruct from 1.2 below.

1.2. Local formulas. As an illustration, and for future use, let us show what all of this means in terms of local coordinates. Let $x^1, \dots, x^n$ be local coordinates on M, those on the world sheet being defined by (1.1.2–4). For a multi-index $(m) = (m_0, \dots, m_{d-1})$, let

$$x^j_{(m)} = \partial_{\sigma^0}^{m_0}\cdots\partial_{\sigma^{d-1}}^{m_{d-1}}x^j,$$

where $\partial_{\sigma^m} = \partial/\partial\sigma^m$ and $x^j$ is regarded, formally, as a function of $\sigma^0, \dots, \sigma^{d-1}$. Then

$$\{\sigma^i,\ x^j_{(m)} : 0 \le i \le d-1,\ 1 \le j \le n,\ (m) \in \mathbb{Z}^d_+\}$$

are local coordinates on J ∞ (M ),

$$\{\sigma^i,\ x^j_{(m)} : 0 \le i \le d-1,\ 1 \le j \le n,\ (m) \in \mathbb{Z}^d_+,\ m_0 = 0\}$$

are local coordinates on J ∞ (M/ ), and sections of O J ∞ (M/S ) are smooth functions in $\sigma^i$, $x^j$ and polynomials in $x^j_{(m)}$, $(m) \neq 0$, with $m_0 = 0$ if S is not a point.

Let $\delta/\delta x^j_{(m)}$ denote the vertical vector field $\partial/\partial x^j_{(m)}$ ∈ T J ∞ (M ) . Morphism (1.1.5) is defined by

$$\rho(\partial_{\sigma^i}) = \partial_{\sigma^i} + \sum_{j=1}^{n}\ \sum_{(m)\in\mathbb{Z}^d_+} x^j_{(m+e_i)}\,\frac{\delta}{\delta x^j_{(m)}}, \tag{1.2.1}$$

where $e_i = (0, \dots, 0, 1, 0, \dots, 0)$, 1 appearing at the i-th position.
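To make (1.2.1) concrete, here is a minimal worked instance, assuming d = 1 and n = 1 (a single world-sheet coordinate σ, a single target coordinate x, jets written $x_{(m)}$ with m ≥ 0). The test function F is chosen purely for illustration and does not appear elsewhere in the text.

```latex
\begin{align*}
\rho(\partial_\sigma) &= \partial_\sigma + \sum_{m\ge 0} x_{(m+1)}\,\frac{\delta}{\delta x_{(m)}}
  && \text{(1.2.1) for } d = 1,\ n = 1 \\
F &= \sigma\,x_{(0)}^{2} + x_{(1)}
  && \text{illustrative test function} \\
\rho(\partial_\sigma)F &= x_{(0)}^{2} + 2\sigma\,x_{(0)}\,x_{(1)} + x_{(2)}
  && \text{terms from } \partial_\sigma,\ \delta/\delta x_{(0)},\ \delta/\delta x_{(1)}
\end{align*}
```

The result is what one obtains by substituting a function x(σ) into $\sigma x^2 + x'$ and differentiating with the chain rule, which is the sense in which ρ(∂σ) acts as a total derivative.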


Evolutionary vector fields are in 1–1 correspondence with n-tuples of functions $F^1, \dots, F^n$ ∈ O J ∞ (M ) (called the characteristic of a vector field) and are defined via the following prolongation formula:

$$\xi = \sum_{j=1}^{n}\Bigl(F^j\,\frac{\delta}{\delta x^j} + \sum_{(m)\neq 0}\rho(\partial_{\sigma^0})^{m_0}\cdots\rho(\partial_{\sigma^{d-1}})^{m_{d-1}}F^j\,\frac{\delta}{\delta x^j_{(m)}}\Bigr). \tag{1.2.2}$$
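As a sketch of how the prolongation formula (1.2.2) works, again assuming d = 1 and n = 1 and writing D for ρ(∂σ) (a shorthand introduced only here), two simple characteristics can be prolonged by hand. The last line checks, on the jet coordinates, that the resulting field commutes with D.

```latex
\begin{align*}
D &:= \rho(\partial_\sigma), \qquad
  \xi_F = \sum_{m\ge 0}\bigl(D^{m}F\bigr)\,\frac{\delta}{\delta x_{(m)}} \\
F = x_{(0)} &:\quad \xi_F = \sum_{m\ge 0} x_{(m)}\,\frac{\delta}{\delta x_{(m)}} \\
F = x_{(1)} &:\quad \xi_F = \sum_{m\ge 0} x_{(m+1)}\,\frac{\delta}{\delta x_{(m)}} = D - \partial_\sigma \\
\xi_F\,D\,x_{(m)} &= \xi_F\,x_{(m+1)} = D^{m+1}F = D\,\xi_F\,x_{(m)}
\end{align*}
```

Since $\xi_F\sigma = 0$ and $D\sigma = 1$, the commutator $[\xi_F, D]$ also vanishes on σ, so it vanishes on all coordinates. The second characteristic reproduces the vertical part of the total derivative in (1.2.1).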

The relative analogs of (1.2.1,2) are obviously obtained by demanding that $m_0 = 0$.

1.3. De Rham complex. Reflecting the product structure of M , the de Rham complex on J ∞ (M/S ) is bi-graded:

$$\Omega^{p}_{J^\infty(M/S)} = \bigoplus_{i+j=p}\Omega^{i,j}_{J^\infty(M/S)}.$$

It carries 2 anti-commuting differentials:

$$\delta : \Omega^{i,j}_{J^\infty(M)} \to \Omega^{i,j+1}_{J^\infty(M)},\qquad d_{\rho/S} : \Omega^{i,j}_{J^\infty(M/S)} \to \Omega^{i+1,j}_{J^\infty(M/S)},\qquad [\delta, d_{\rho/S}] = 0,$$

defined as follows. The space, or rather the sheaf of spaces, $\Omega^{*,*}_{J^\infty(M/S)}$ is naturally an O -module, and δ is the vertical de Rham differential, i.e., the one that is O -linear. The flat connection ρ/S gives rise to a differential, $d_{\rho/S}$, on $\Omega^{*,0}_{J^\infty(M/S)}$ in the standard manner. For example, in terms of local coordinates,

$$d_{\rho/S}\,x^j_{(m)} = \sum_{i} x^j_{(m)+e_i}\,d\sigma^i, \tag{1.3.1}$$

where i runs from 0 to d − 1 if S = ∅ and from 1 to d − 1 otherwise. Then the condition $[\delta, d_{\rho/S}] = 0$ allows one to extend $d_{\rho/S}$ to the entire $\Omega^{*,*}_{J^\infty(M/S)}$ unambiguously. Thus

$$\delta F(\sigma, x_{(m)}) = \frac{\partial F(\sigma, x_{(m)})}{\partial x^j_{(m)}}\,\delta x^j_{(m)};\qquad \delta\,d\sigma = 0, \tag{1.3.2}$$

$$d_{\rho/S}\,\delta x^j_{(m)} = -\delta\,d_{\rho/S}\,x^j_{(m)} = -\sum_{i}\delta x^j_{(m)+e_i}\wedge d\sigma^i, \tag{1.3.3}$$

with the same range of summation as in (1.3.1). There is a mapping of bi-complexes

$$\Omega^{*,*}_{J^\infty(M\,)} \to \Omega^{*,*}_{J^\infty(M/\,)}, \tag{1.3.4}$$

that sends a form to its restriction to the fibers of the composite projection

$$\text{J ∞ (M ) onto the world sheet, followed by } \tau. \tag{1.3.5}$$

As a practical matter, (1.3.4) amounts to

$$d\tau \mapsto 0. \tag{1.3.6}$$

Let ιξ be the operator of contraction with a vector field ξ . A straightforward computation proves the following.


Lemma 1.3.1. A vertical vector field ξ on J ∞ (M/S ) is evolutionary iff [ιξ , dρ/S ] = 0.

(1.3.7)

Corollary 1.3.2. If ξ is evolutionary, then [dρ/S , Lieξ ] = 0.

(1.3.8)

Indeed, [dρ/S , Lieξ ] = [dρ/S , [δ, ιξ ]] = −[δ, [dρ/S , ιξ ]] = 0. 1.4. Differential equations. Let J ⊂ O J ∞ (M ) be a sheaf of ideals preserved by the connection ρ. Let Sol ⊂ J ∞ (M ) be the zero locus of this ideal. If some regularity conditions hold, then this submanifold delivers another example of a diffiety. For example, one can, and we will, assume that J is locally pseudo-Cauchy-Kovalevskaya, i.e., there is a distinguished coordinate on , to be denoted τ , and for any point in M there is a coordinate system x 1 , . . . , x n s.t. the ideal is generated, around the pre-image of this point on the jet space, by the functions E 1 , . . . , E n satisfying E j = ρ(∂τ )l j x j + · · · ,

(1.4.1)

where the dots stand for the terms that do not involve jets of x i of degree ≥ li , 0 ≤ i ≤ d − 1 in the direction of ∂τ . Here are some of the structure properties Sol shares with the ambient jet space: τ

• Sol is fibered over , hence over via → ; • the algebra of functions O Sol is a D -algebra (because the flat connection preserves Sol), hence a D/ -algebra, where D/ is the subalgebra of D that commutes with τ −1 O ; we will write Sol if we wish to emphasize the D/ -algebra structure; • the de Rham complex ∗Sol/S is bi-graded and carries two commuting differentials, δ, the vertical differential, and dρ/S , the D/S -module differential. If (1.4.1) is valid, then solving the equation E j = 0 for ρ(∂τ )l j x j , one sees that Sol τ looks like ∞-jets in the direction of the fiber of the bundle → to something finite dimensional. In particular, if l1 = l2 = · · · = ln = 2, then ∼

Sol −→ J ∞ (T M/ ),

(1.4.2)

as D/ -manifolds. Any evolutionary vector field on the ambient jet space that preserves the ideal J descends to a vector field on Sol, which still satisfies (1.3.7). We emulate this situation by making the following definition. Definition 1.4.1. A vertical vector field ξ on Sol is called evolutionary (relative to S) if [ιξ , dρ/S ] = 0.

(1.4.3)


Lemma 1.4.2. If ξ is evolutionary, then [dρ/S , Lieξ ] = 0.

(1.4.4)

1.4.3. Let Evol(Sol) S denote the sheaf of all evolutionary vector fields on Sol relative to S. The identity [Lieξ , ιη ] = ι[ξ,η] combined with (1.4.3,4) implies that Evol(Sol) S is a Lie algebra. It is an obvious consequence of Definition 1.4.1 that Evol(Sol) ⊂ Evol(Sol) .

(1.4.5)

1.5. Functional pre-symplectic structure. A symplectic structure, that is, a nondegenerate closed 2-form gives rise to a Poisson algebra structure on the structure sheaf of a manifold. A pre-symplectic structure, i.e., a degenerate closed 2-form similarly gives rise to a Poisson algebra structure on a certain, admissible, subalgebra of the structure sheaf. This subalgebra consists of functions constant along the leaves of the foliation tangent to the kernel of the form [Fad]. In the case of a diffiety, such as Sol, an analogue of the structure sheaf is supplied by the sheaf of local functionals, and we would like to explain, in the spirit of [DF,Di], that in this case a pre-symplectic structure gives rise to a Poisson bracket on the sheaf of local functionals, which may be just as good for all practical purposes as the symplectic one. Arising in this way the notion of a functional pre-symplectic structure is a rather straightforward geometric version of Dorfman’s symplectic operator [D].

1.5.1. The following is a list of standard symplectic geometry notions adjusted to the case where vector fields are replaced with evolutionary vector fields and equalities are valid up to $d_{\rho/S}$-exact terms. From now on M is a diffiety, such as Sol ⊂ J ∞ (M ) or J ∞ (M/S ). The relation $[\delta, d_{\rho/S}] = 0$, see Sect. 1.3, implies that δ descends to a differential

$$\bar\delta : \bar\Omega^{\bullet,\bullet}_{M/S} \to \bar\Omega^{\bullet,\bullet+1}_{M/S},\quad\text{where}\quad \bar\Omega^{\bullet,\bullet}_{M/S} = \Omega^{\bullet,\bullet}_{M/S}\big/ d_{\rho/S}\,\Omega^{\bullet-1,\bullet}_{M/S}. \tag{1.5.1}$$

Elements of the quotient complex $\bar\Omega^{\bullet,\bullet}_{M/S}$ are often referred to as functional forms [Ol]. We will call $\bar\omega \in H^0(M, \bar\Omega^{d-1,2}_{M/S})$ a functional pre-symplectic form if

$$\bar\delta\bar\omega = 0. \tag{1.5.2}$$

Note that if ξ is an evolutionary vector field, then $\mathrm{Lie}_\xi$ and $\iota_\xi$ are well-defined operators acting on the quotient complex $\bar\Omega^{\bullet,\bullet}_{M/S}$ thanks to (1.4.3–4). An evolutionary vector field ξ ∈ Evol(M)/S is called Hamiltonian if

$$\mathrm{Lie}_\xi\,\bar\omega = 0. \tag{1.5.3}$$

Call a functional form $\bar F \in \bar\Omega^{d-1,0}_{M/S}$ admissible relative to $\bar\omega$ if there is an evolutionary vector field $\xi_{\bar F}$ such that

$$\bar\delta\bar F = \iota_{\xi_{\bar F}}\bar\omega. \tag{1.5.4}$$

Note that (1.5.4) implies that $\xi_{\bar F}$ is Hamiltonian, which prompts one to think of $\bar F$ as a Hamiltonian associated to $\xi_{\bar F}$. Hence the following bit of notation:


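Definition 1.5.1.1. Let $H^{\bar\omega}_{M/S}$ be the sheaf of all functional (d − 1, 0)-forms that are admissible relative to $\bar\omega$.

Note that although an admissible $\bar F$ does not determine $\xi_{\bar F}$ uniquely, it does so up to the kernel of $\bar\omega$: any two such vector fields $\xi_{\bar F}$, $\eta_{\bar F}$ satisfy

$$\iota_{\xi_{\bar F}-\eta_{\bar F}}\,\bar\omega = 0. \tag{1.5.5}$$

For any admissible $\bar F$, $\bar G$ define the bracket

$$\{\bar F, \bar G\} = \xi_{\bar F}\,\bar G. \tag{1.5.6}$$

Equivalently,

$$\{\bar F, \bar G\} = \iota_{\xi_{\bar F}}\bar\delta\bar G = \iota_{\xi_{\bar F}}\iota_{\xi_{\bar G}}\bar\omega, \tag{1.5.7}$$

which shows, by virtue of (1.5.5), that (1.5.6) is independent of the choice of $\xi_{\bar F}$. Next,

$$\{\bar F, \bar G\} \in H^{\bar\omega}_{M/S}. \tag{1.5.8}$$

In fact,

$$\bar\delta\{\bar F, \bar G\} = \iota_{[\xi_{\bar F},\xi_{\bar G}]}\,\bar\omega, \tag{1.5.9}$$

as the following computation (based on a repeated use of (1.5.2–3)) shows:

$$\bar\delta\{\bar F, \bar G\} = \bar\delta(\xi_{\bar F}\bar G) = \bar\delta\,\iota_{\xi_{\bar F}}\iota_{\xi_{\bar G}}\bar\omega = \mathrm{Lie}_{\xi_{\bar F}}\iota_{\xi_{\bar G}}\bar\omega - \iota_{\xi_{\bar F}}\mathrm{Lie}_{\xi_{\bar G}}\bar\omega + \iota_{\xi_{\bar F}}\iota_{\xi_{\bar G}}\bar\delta\bar\omega = \iota_{[\xi_{\bar F},\xi_{\bar G}]}\bar\omega + \iota_{\xi_{\bar G}}\mathrm{Lie}_{\xi_{\bar F}}\bar\omega = \iota_{[\xi_{\bar F},\xi_{\bar G}]}\bar\omega.$$

Therefore, we obtain the map

$$\{.,.\} : H^{\bar\omega}_{M/S}\times H^{\bar\omega}_{M/S} \to H^{\bar\omega}_{M/S},\qquad (\bar F, \bar G) \mapsto \{\bar F, \bar G\} = \xi_{\bar F}\,\bar G. \tag{1.5.10}$$

Proposition 1.5.2. Map (1.5.10) makes $H^{\bar\omega}_{M/S}$ into a sheaf of Lie algebras.

Proof. The antisymmetry of (1.5.10) is an immediate consequence of (1.5.7). The Jacobi identity is proved as follows, with $\xi = \xi_F$ and $\eta = \xi_G$: {{F, G}, H} = [ξ, η]H = ξηH − ηξH = {F, {G, H}} − {G, {F, H}}, where the first equality is a consequence of (1.5.9).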
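Two steps used above are stated without computation: that (1.5.4) makes $\xi_{\bar F}$ Hamiltonian, and that (1.5.7) gives antisymmetry. The following short derivation spells them out; it assumes the Cartan formula $\mathrm{Lie}_\xi = \bar\delta\iota_\xi + \iota_\xi\bar\delta$ on the quotient complex (the formula $\mathrm{Lie}_\xi = [\delta, \iota_\xi]$ is the one used after Corollary 1.3.2) and the fact that contractions with vector fields anticommute.

```latex
\begin{align*}
\mathrm{Lie}_{\xi_{\bar F}}\bar\omega
  &= \bar\delta\,\iota_{\xi_{\bar F}}\bar\omega + \iota_{\xi_{\bar F}}\,\bar\delta\bar\omega
   = \bar\delta\,\bar\delta\bar F + 0 = 0
  && \text{by (1.5.4) and (1.5.2)} \\
\{\bar F,\bar G\}
  &= \iota_{\xi_{\bar F}}\iota_{\xi_{\bar G}}\bar\omega
   = -\,\iota_{\xi_{\bar G}}\iota_{\xi_{\bar F}}\bar\omega
   = -\{\bar G,\bar F\}
  && \text{by (1.5.7)}
\end{align*}
```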

1.5.3. The absolute and relative versions of this construction can be compared. Indeed, by virtue of (1.4.5), morphism of bi-complexes (1.3.4) induces a morphism of the Lie algebra sheaves ω¯ ω¯ HM → HM / ,

(1.5.11)


which in terms of local coordinates amounts to dτ → 0,

(1.5.12)

cf. (1.3.6).

1.5.4. Example 1. Canonical commutation relations. Replace M as the target space with $T^*M$ and consider J ∞ (T ∗ M/ ) with d = 2, the dimension of the world sheet. There arises the projection

$$\pi : J^\infty(T^*M/\ ) \to T^*M/\ . \tag{1.5.13}$$

Let $\omega_o$ be the canonical symplectic form on $T^*M$ and $\omega = \omega_o\wedge d\sigma$; the latter is a (1, 2)-form on T ∗ M/ (we are taking advantage of coordinates (1.1.2–4)). There arises then $\pi^*\omega$, the pull-back of ω to J ∞ (T ∗ M/ ) under (1.5.13). Let us now compute $H^{\pi^*\omega}_{J^\infty(T^*M/\ )}$ defined in 1.5.1.1.

Lemma 1.5.4.1.

$$H^{\pi^*\omega}_{J^\infty(T^*M/\ )} = \bar\Omega^{1,0}_{J^\infty(T^*M/\ )}. \tag{1.5.14}$$

Informally speaking, this lemma says that in this case any function is admissible, hence π ∗ ω is as good as symplectic, the fact that we alluded to in the beginning of Sect. 1.5. Proof. Note that, ωo being non-degenerate, any section of 1,1 T ∗ M/ can be written as ιξ o ω for some vector field ξ o on T ∗ M . Pulling back on J ∞ (T ∗ M/ ), one sees that likewise any section of π ∗ 1,1 can be written as ιξ o π ∗ ω, where ξ o is now a T∗M /

vector field on J ∞ (T ∗ M/ ), which is locally a linear combination of δ/δx i , x i being local coordinates on T ∗ M. Thinking of ξ o as a characteristic, one can prolong it to an evolutionary vector field ξ , as in (1.2.2), and thus obtain π ∗ 1,1 T∗M

/

= {ιξ π ∗ ω, ξ ∈ Evol(J ∞ (T ∗ M/ ))}.

Now observe that 1,1 J ∞ (T ∗ M

/ )

is generated, as a D/ -module, by π ∗ 1,1 T∗M

Hence (1.5.15) holds true for the entire 1,1 J ∞ (T ∗ M

/ )

(1.5.15)

1,1 J ∞ (T ∗ M/ )

/

.

modulo dρ/ -exact terms, i.e.,

= {ιξ π ∗ ω + dρ/ β, ξ ∈ Evol(J ∞ (T ∗ M/ )), β ∈ 0,1 J ∞ (T ∗ M

/ )

}

(1.5.16) and (1.5.14) follows.

Because of its importance, the sheaf of Lie algebras arising in this way will be denoted thus ∗

Hcan = HπJ ∞ω(T ∗ M def

/ )

¯ 1,0 = J ∞ (T ∗ M

/ )

.

(1.5.17)

Computationally, the gist of our discussion is as follows. The algebra of functions on the cotangent bundle with the canonical Poisson bracket is a Lie subalgebra of $H^{can}$:

$$\pi^{\#} : \bigl(\pi^{-1}O_{T^*M/\ }\,d\sigma,\ \{.,.\}_{T^*M}\bigr) \to H^{can}, \tag{1.5.18}$$

and the rest of the Lie algebra structure is determined by

$$\{F\,d\sigma,\ GH\,d\sigma\} = \{F\,d\sigma, G\,d\sigma\}H + G\{F\,d\sigma, H\,d\sigma\}, \tag{1.5.19a}$$

$$\{F\,d\sigma,\ G\rho(\partial_\sigma)H\,d\sigma\} = \{F\,d\sigma, G\,d\sigma\}\rho(\partial_\sigma)H + G\rho(\partial_\sigma)\{F\,d\sigma, H\,d\sigma\}, \tag{1.5.19b}$$

because an evolutionary vector field is a derivation commuting with ρ(∂σ), see e.g. (1.4.4). To see what all of this means, let us compute some brackets. Let F, G be functions on M/ , ξ, η vector fields on M vertical w.r.t. M → , which we regard as fiberwise linear functions on T ∗ M . Then,

$$\{F\,d\sigma, G\,d\sigma\} = 0, \tag{1.5.20a}$$

$$\{\xi\,d\sigma, G\,d\sigma\} = \xi G\,d\sigma, \tag{1.5.20b}$$

$$\{\xi\,d\sigma, \eta\,d\sigma\} = [\xi, \eta]\,d\sigma. \tag{1.5.20c}$$
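As an illustration of (1.5.20b,c) and of how they fit with the Leibniz-type rule (1.5.19a), consider the simplest case M = ℝ with coordinate x. The functions a, b, G below are arbitrary smooth functions of x introduced only for this example.

```latex
\begin{align*}
&\xi = a(x)\,\partial_x,\qquad \eta = b(x)\,\partial_x,\qquad G = G(x), \\
&\{\xi\,d\sigma,\ G\,d\sigma\} = a\,G'\,d\sigma
  && \text{by (1.5.20b)} \\
&\{\xi\,d\sigma,\ \eta\,d\sigma\} = [\xi,\eta]\,d\sigma = (a\,b' - a'\,b)\,\partial_x\,d\sigma
  && \text{by (1.5.20c)} \\
&\{\xi\,d\sigma,\ b\,d\sigma\}\,\partial_x + b\,\{\xi\,d\sigma,\ \partial_x\,d\sigma\}
  = \bigl(a\,b'\,\partial_x - b\,a'\,\partial_x\bigr)\,d\sigma
  && \text{by (1.5.19a) with } \eta = b\cdot\partial_x
\end{align*}
```

The last line agrees with the third, so in this example the bracket of two fiberwise linear functions can be recovered from (1.5.19a) and the brackets with the generators b and $\partial_x$.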

The first instance of the bracket jet nature manifesting itself is as follows. If $F_i\,dx^i$ is a 1-form on M, then $\alpha = F_i(x)\rho(\partial_\sigma)x^i$ is a well-defined (0, 0)-form on J ∞ (T ∗ M ). Having thus embedded $\Omega^1_M$ into $\Omega^{0,0}_{J^\infty(T^*M/\ )}$, one uses (1.5.19a–b) to obtain

$$\{\xi\,d\sigma, \alpha\,d\sigma\} = (\mathrm{Lie}_\xi\alpha)\,d\sigma, \tag{1.5.20d}$$

if ξ does not depend on σ explicitly, and

$$\{\xi\,d\sigma, \alpha\,d\sigma\} = (\mathrm{Lie}_\xi\alpha + \iota_{\partial_\sigma\xi}\alpha)\,d\sigma \tag{1.5.20e}$$

in general, where $\mathrm{Lie}_\xi\alpha$ is the Lie derivative of α along ξ. Formulas (1.5.20a–d), without functions explicitly depending on τ, σ, are a familiar definition of the Lie algebra associated with the Courant algebroid on $TM \oplus T^*M$. The idea that the Courant algebroid has infinite dimensional nature apparently goes back to I. Dorfman [Dor]. It was revived recently, in a slightly different context, by P. Bressler [Bre]. Note that identities (1.5.19a,b) seem to incorporate the Leibniz identity, which they do not, because $H^{can}$ is not an associative algebra. It is, however, a quotient of $\Omega^{1,0}_{J^\infty(T^*M/\ )}$, and the latter is. In fact, $\Omega^{1,0}_{J^\infty(T^*M/\ )}$ is a sheaf of vertex Poisson algebras, and its quotient $H^{can}$ is canonically associated to its sheaf of Lie algebras, see Proposition 2.7.3 below.

1.5.5. Example 2. The solution space of an order 2 system in the pseudo-Cauchy-Kovalevskaya form. Let us place ourselves in the situation of 1.4 and let Sol satisfy (1.4.2), i.e.,

$$Sol \xrightarrow{\ \sim\ } J^\infty(TM/\ ).$$

The latter does not carry any canonical 2-form, but let us fix a diffeomorphism

$$g : TM \xrightarrow{\ \sim\ } T^*M,$$

which in practice is most often defined by a metric on M. It is lifted, uniquely, to a diffeomorphism of D/S -manifolds (for any base S)

$$g : J^\infty(TM/S) \xrightarrow{\ \sim\ } J^\infty(T^*M/S), \tag{1.5.21}$$


hence a diffeomorphism ∼

g : Sol −→ J ∞ (T ∗ M/ ), and a sheaf isomorphism ∼

−1 •,• g # : •,• Sol/ −→ g J ∞ (T ∗ M )/ ,

(1.5.22)

where g −1 stands for the inverse image in the category of sheaves of vector spaces. Let g ∗ ω be the symplectic form on T M obtained by pulling back the canonical symplectic form ω on T ∗ M. We have arrived at Lemma 1.5.5.1. The mapping (1.5.22) descends to an isomorphism of Lie algebra sheaves g∗ ω

∼

g # : H Sol/ −→ g −1 Hcan . 1.6. Calculus of variations and integrals of motion. Bosonic σ -model. Calculus of variations is the principal source of the brackets discussed in 1.4–5. 1.6.1. An action A is a global section, cf. (1.5.1), ¯ d,0 A ∈ J ∞ (M ), ∞

J (M )

.

(1.6.0)

It can be represented by a Lagrangian which is a collection of sections ( j) L = {L (i) ∈ Ui , d,0 − L (i) ∈ Imdρ on U j ∩ Ui }, (1.6.1a) ∞ J (M ) s.t.L determined up to a transformation L (i) → L (i) + dρ β (i) ,

(1.6.1b)

where {Ui } is an open covering of J ∞ (M ). Choosing local coordinates one observes that (i)

δL (i) = −dρ γ (i) + E j δx j ,

(1.6.2)

d,0 for some γ (i) ∈ d−1,1 J ∞ (M ) , known as a variational 1-form, and some E j ∈ J ∞ (M ) .

Since representation (1.6.2) is unique [Di,T], and transformation (1.6.1b) leaves E (i) j unaffected (because [δ, dρ ] = 0, see 1.3), associated to the action A there arises the sheaf of Euler-Lagrange ideals J L =< D E 1 , D E 2 , . . . , D E n >⊂ O J ∞ (M ) .

(1.6.3)

Let Sol L ⊂ J ∞ (M ) be the corresponding zero locus. We will assume that J L is of pseudo Cauchy-Kovalevskaya type, and usually (1.4.2) will hold.


The variational 1-form γ (i) is not quite uniquely defined, but due to the well-known acyclicity theorem [T,Di], locally it is determined up to a dρ -exact term. Therefore, the variational 2-form ω(i) = δγ (i) ¯ d−1,2 unambiguously defines a section of the quotient sheaf J ∞ (M ) over Ui . Since transformation (1.6.1b) leaves it invariant, there arises def ¯ d−1,2 ω¯ L = {Ui → ω(i) } ∈ J ∞ (M ), (1.6.4) ∞ J (M ) . By construction, ω¯ L satisfies (1.5.2); hence on Sol L there arises the sheaf of Lie algebras HωSolL , see Proposition 1.5.2. Our task now is to detect inside it a subalgebra of integrals of motion. As we have seen already, the nature of the argument tends to be purely local, and until further notice it will be assumed that L ∈ (J ∞ (M ), d,0 J ∞ (M ) ). 1.6.2. A symmetry of L is an evolutionary vector field ξ s.t. Lieξ L = dρ αξ ,

(1.6.5)

for some αξ ∈ d−1,0 J ∞ (M ) . Denote by Sym L the set of all symmetries of L; it is naturally a Lie algebra. It is easy to derive from (1.6.5) that any ξ ∈ Sym L preserves J L , see [Di], hence defines a vector field on Sol L , to be denoted ξ¯ . Let Sym L be the Lie algebra of all such vector fields. ¯ d−1,0 An integral of motion of L is an F¯ ∈ (J ∞ (M ), J ∞ (M ) ) s.t. ¯ Sol L = 0. dρ F|

(1.6.6)

¯ ¯ d−1,0 I L = { F¯ ∈ (J ∞ (M ), J ∞ (M ) ) s.t. dρ F| Sol L = 0}

(1.6.7a)

¯ d−1,0 ). I˜ L = I L | Sol L ⊂ (Sol L , Sol L

(1.6.7b)

Let

and

If ξ is a symmetry of L with characteristic {Q j }, see (1.2.2), then the computation dρ αξ = ιξ δL

(1.6.2)

= −ιξ dρ γ + E j Q j

(1.3.7)

= dρ ιξ γ + E j Q j

(1.6.8)

shows that αξ − ιξ γ is an integral of motion. The form αξ being determined by ξ up to a dρ -exact term, (1.6.8) defines maps Sym L → I L , ξ → αξ − ιξ γ

(1.6.9)

Sym L → I˜ L .

(1.6.10)

and, by restriction,

Noether’s Theorem 1.6.3. ([Di,Ol]). Map (1.6.9) is a surjection, map (1.6.10) is an isomorphism.


Therefore, I˜ L inherits a Lie algebra structure from Sym L . Let us now show that this Lie algebra structure is consistent with that on the sheaf Hω¯ L . ¯L Lemma 1.6.4. I˜ L is a Lie subalgebra of (Sol L , HωSol ) such that (1.6.10) is a Lie L algebra isomorphism.

It is this Lie algebra that is often referred to as the algebra of integrals of motion or current algebra. Proof. It is known, see e.g. [Di] 19.6.17 or [DF] Proposition 2.76, that if ξ is a symmetry of L such that (1.6.5) holds, then, upon restricting to Sol L , Lieξ γ = δαξ + dρ β for some β. An application of δ to both sides of this equality shows that ξ is Hamiltonian, see (1.5.2). The corresponding integral of motion Fξ = αξ − ιξ γ is admissible because δ Fξ = δαξ − διξ γ = δαξ − Lieξ γ + ιξ δγ = −dρ β + ιξ ω L . ¯L ). Furthermore, the line above shows that modulo dρ , Hence I˜ L ⊂ (Sol L , HωSol L

δ¯ F¯ξ = ιξ ω¯ L . Hence the bracket of two integrals of motion induced by the Lie algebra structure on ¯L HωSol , see (1.5.10), is as follows: L { F¯ξ , F¯η } = ξ F¯η , which is also an integral of motion, because due to (1.4.4), dρ ξ G = ξ dρ G = 0. The corresponding symmetry of L is, of course, [ξ, η], which completes the proof.

1.6.5. Let us now drop the requirement that L be globally defined. The exposition above has to be altered a little. An evolutionary vector field is a symmetry of L, see (1.6.1), if it is of each L (i) : (i)

Lieξ L (i) = dρ αξ . ( j)

There may arise discrepancies αξ(i) − αξ on double intersections Ui ∩ U j , but (1.6.1a) and (1.4.4) ensure that they are dρ -exact. Therefore, while the collection (i)

(i)

{Fξ = αξ − ιξ γ (i) } does not define a global section of d−1,0 J ∞ (M ) , taken modulo dρ it defines a global section d−1,0 ¯ ∞ of J (M ) . The rest of the discussion in 1.6.2–4 goes through unchanged, and we obtain


Corollary 1.6.6. Lemma 1.6.4 holds true for any Lagrangian (1.6.1a,b).

Along with $H^{\bar\omega_L}_{Sol}$, there is its relative version, $H^{\bar\omega_L}_{Sol/\ }$, and the Lie algebra sheaf morphism

$$H^{\bar\omega_L}_{Sol} \to H^{\bar\omega_L}_{Sol/\ }$$

defined in (1.5.11), which seems to be neither surjection nor injection, generally speaking.

Lemma 1.6.7. If $Sol_L$ satisfies (1.4.1), then the composition

$$\tilde I_L \to \Gamma(Sol, H^{\bar\omega_L}_{Sol}) \to \Gamma(Sol, H^{\bar\omega_L}_{Sol/\ })$$

is an injection.

Proof. Assume that $\tilde F \in \tilde I_L$ is annihilated by the composite map. This means that if $F \in \Omega^{d-1,0}_{Sol_L}$ is a representative of $\tilde F$, then $F = F^o\wedge d\tau$, and $d_{\rho/\ }F^o = 0$. Due to the Takens acyclicity theorem [T] (applicable thanks to (1.4.1)), $F^o = d_{\rho/\ }G$ for some G. Therefore, $F = \pm d_\rho(G\wedge d\tau)$ and $\tilde F = 0$, as desired.

Now we would like to explain that for an important class of Lagrangians, the sheaf $H^{\bar\omega_L}_{Sol/\ }$ is isomorphic to the canonical $H^{can}$ defined in (1.5.17) and exhibit some concrete Lie algebras of integrals of motion.

1.6.8. Order 1 Lagrangians and the Legendre transform. Let us assume that the Lagrangian L depends only on 1-jets of the coordinates $x^j$. If we let

$$L = \tilde L\,d\sigma^0\wedge\cdots\wedge d\sigma^{d-1}, \tag{1.6.11}$$

then (1.6.2) becomes

∂ L˜ j 0 d−1 p δL = − dρ (−1) dσ ∧ · · · ∧ dσ δx ∧ dσ ∧ · · · ∧ ∂(∂σ p x j ) (1.6.12) ∂ L˜ ∂ L˜ − ∂σ p + δx j ∧ dσ 0 ∧ · · · ∧ dσ d−1 , ∂x j ∂(∂σ p x j ) p+1

where the hat means that the term is omitted and summation w.r.t. repeated indices is assumed. Assume now that on the world sheet there is a distinguished coordinate, say $\tau = \sigma^0$, such that L is a convex function of jets of coordinates in the τ-direction. It follows then that $Sol_L$ satisfies (1.4.2). Applying (1.3.4) to γ we obtain

$$\gamma := \gamma|_{d\tau=0} = \frac{\partial\tilde L}{\partial(\partial_\tau x^j)}\,\delta x^j\wedge d\sigma^1\wedge\cdots\wedge d\sigma^{d-1}.$$

Note that, as a function of $\partial_\tau x^j$, $\tilde L$ is canonically a function on the tangent space TM. It follows that γ is unambiguously a 1-form on TM. The convexity of L implies that the Legendre transform

$$d_{TM}\tilde L : TM \to T^*M \tag{1.6.13}$$

is a diffeomorphism. A moment’s thought shows that γ is the pull-back of the canonical 1-form on $T^*M$ w.r.t. $d_{TM}\tilde L$, which places us in the situation of Lemma 1.5.5.1. In a coordinate form, we have: if $x^j$ are coordinates on M, $x_j = \partial/\partial x^j$ are fiberwise linear functions on $T^*M$, then

$$(d_{TM}\tilde L)^{\#}(x^j) = x^j,\qquad (d_{TM}\tilde L)^{\#}(x_j) = \frac{\partial\tilde L}{\partial(\partial_\tau x^j)},$$

and

˜ # x j δx j ∧ dσ 1 ∧ · · · ∧ dσ d−1 , γ = (dT M L) (1.6.14)

˜ # δγ = (dT M L) ˜ # δx j ∧ δx j ∧ dσ 1 ∧ · · · ∧ dσ d−1 , ωL = (dT M L)

are the pull-backs of the canonical degenerate symplectic form. Hence Lemmas 1.5.5.1 and 1.6.7 specialized to the present situation read as follows. Lemma 1.6.8.1. If L depends only on the 1-jets of coordinates and is convex, then in the case where d = 2, there are the following Lie algebra (sheaf) morphisms: ∼ ∼ ¯L ˜ −1 Hcan , I˜ L → (M, Hω¯ L ) −→ (M, Hcan ). HωSol/ −→ (dT M L) Sol/

1.6.8.2. This lemma explains the universality of $H^{can}$. One can argue, therefore, that the Lie algebra content of the “theory” is independent of the Lagrangian. What captures the properties of an individual Lagrangian is the subalgebra of integrals of motion. For example, if L is independent of τ, the intrinsic time, then ρ(∂τ) is a symmetry of L, and (1.6.9) produces the corresponding integral of motion as follows: since

$$\rho(\partial_\tau)L = d_\rho\bigl(\tilde L\,d\sigma^1\wedge\cdots\wedge d\sigma^{d-1}\bigr),$$

the corresponding integral of motion, upon restriction to the fibers of the projection of $Sol_L$ onto the base of τ, becomes

$$H_{\rho(\partial_\tau)} = \alpha_{\rho(\partial_\tau)} - \iota_{\rho(\partial_\tau)}\gamma = \Bigl(\tilde L - \frac{\partial\tilde L}{\partial(\partial_\tau x^j)}\,\partial_\tau x^j\Bigr)\,d\sigma^1\wedge\cdots\wedge d\sigma^{d-1}, \tag{1.6.15}$$

which is the familiar energy function, of course.

1.6.9. Bosonic string, left/right movers, and a rudiment of generalized geometry. Let M be a Riemannian manifold with metric (., .), and let the world sheet be 2-dimensional with coordinates τ and σ. By definition, a point in J 1 (M ) is a triple (t, x, ∂x), where t is a point of the world sheet, x ∈ M, and ∂x is a linear map from the tangent space of the world sheet at t to $T_xM$, $\xi \mapsto \partial_\xi x$. This makes sense out of the symbol $(\partial_\xi x, \partial_\eta x)$ as a function on J 1 (M ). The following

$$L = \frac12\bigl((\partial_\sigma - \partial_\tau)x,\ (\partial_\sigma + \partial_\tau)x\bigr)\,d\sigma\wedge d\tau \tag{1.6.16}$$

is then a well-defined Lagrangian, the celebrated σ-model Lagrangian. In terms of local coordinates $x^1, \dots, x^n$ s.t. $(.,.) = g_{ij}\,dx^i dx^j$, it looks as follows:

$$L = \frac12\bigl(g_{ij}\,\partial_\sigma x^i\,\partial_\sigma x^j - g_{ij}\,\partial_\tau x^i\,\partial_\tau x^j\bigr)\,d\sigma\wedge d\tau.$$

A direct computation shows that δL = −dρ ((∂τ x, δx) dσ − (∂σ x, δx) dτ )

+ ∇∂τ x ∂τ x − ∇∂σ x ∂σ x dσ ∧ dτ,

(1.6.17)

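where $\nabla_{\partial_\bullet x}$ is the value of the Levi-Civita connection on $\partial_\bullet x$. It is clear that L satisfies all the conditions of Lemma 1.6.8.1. The Lagrangian being independent of τ and σ, associated to ρ(∂τ) and ρ(∂σ) there arise two integrals of motion, energy and momentum, and any linear combination thereof. But much more is true. In fact, any vector field of the type either

$$\xi^- = \frac12 f(\sigma-\tau)\,\rho(\partial_\sigma - \partial_\tau)\quad\text{or}\quad \xi^+ = \frac12 f(\sigma+\tau)\,\rho(\partial_\sigma + \partial_\tau) \tag{1.6.18}$$

is a symmetry of L. Indeed, precisely because $(\partial_\sigma \pm \partial_\tau)(\sigma \mp \tau) = 0$, one has

$$\xi^-L = d_\rho\Bigl(\frac14 f(\sigma-\tau)\bigl((\partial_\sigma-\partial_\tau)x,\ (\partial_\sigma+\partial_\tau)x\bigr)(d\sigma + d\tau)\Bigr),$$

$$\xi^+L = -d_\rho\Bigl(\frac14 f(\sigma+\tau)\bigl((\partial_\sigma-\partial_\tau)x,\ (\partial_\sigma+\partial_\tau)x\bigr)(d\sigma - d\tau)\Bigr). \tag{1.6.19}$$

Using (1.6.9) and Lemma 1.6.7 one obtains the corresponding integrals of motion, inside $H^{\bar\omega_L}_{Sol_L/\ }$,

$$F_{\xi^-} = \frac14 f(\sigma-\tau)\bigl((\partial_\sigma-\partial_\tau)x,\ (\partial_\sigma-\partial_\tau)x\bigr)\,d\sigma,$$

$$F_{\xi^+} = -\frac14 f(\sigma+\tau)\bigl((\partial_\sigma+\partial_\tau)x,\ (\partial_\sigma+\partial_\tau)x\bigr)\,d\sigma. \tag{1.6.20}$$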
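To see why the fields (1.6.18) deserve the names left and right movers, it may help to specialize to a flat target. The sketch below assumes $M = \mathbb{R}^n$ with $g_{ij} = \delta_{ij}$, so that the Euler-Lagrange equations read off from (1.6.17) reduce to the wave equation; the functions a and b are arbitrary smooth $\mathbb{R}^n$-valued functions of one variable, introduced only for this illustration.

```latex
\begin{align*}
&\partial_\tau^2 x - \partial_\sigma^2 x = 0,\qquad
  x(\tau,\sigma) = a(\sigma-\tau) + b(\sigma+\tau), \\
&(\partial_\sigma-\partial_\tau)x = 2\,a'(\sigma-\tau),\qquad
  (\partial_\sigma+\partial_\tau)x = 2\,b'(\sigma+\tau), \\
&F_{\xi^-} = f(\sigma-\tau)\,\bigl(a'(\sigma-\tau),\,a'(\sigma-\tau)\bigr)\,d\sigma, \\
&F_{\xi^+} = -\,f(\sigma+\tau)\,\bigl(b'(\sigma+\tau),\,b'(\sigma+\tau)\bigr)\,d\sigma .
\end{align*}
```

On such solutions each density in (1.6.20) depends on $\sigma-\tau$ alone or on $\sigma+\tau$ alone, so its τ-derivative is a total σ-derivative, as expected of an integral of motion.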

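Upon applying Legendre transform (1.6.13), which in terms of local coordinates is given by $x_i = g_{i\alpha}\partial_\tau x^\alpha$, $\partial_\tau x^i = g^{i\alpha}x_\alpha$, formulas (1.6.20) become
\begin{align*}
F_{\xi^-} &= f(\sigma-\tau)\Big(\tfrac14 g^{ij}x_ix_j + \tfrac14 g_{ij}\,\partial_\sigma x^i\partial_\sigma x^j - \tfrac12 x_j\partial_\sigma x^j\Big)\,d\sigma,\\
F_{\xi^+} &= f(\sigma+\tau)\Big(-\tfrac14 g^{ij}x_ix_j - \tfrac14 g_{ij}\,\partial_\sigma x^i\partial_\sigma x^j - \tfrac12 x_j\partial_\sigma x^j\Big)\,d\sigma, \tag{1.6.21}
\end{align*}
and this computes the image of $F_{\xi^\pm}$ under the composite map of Lemma 1.6.8.1. Let
\[ Vir^\pm = \mathrm{span}\{F_{\xi^\pm}\} \subset \Gamma(M_\Sigma, \mathcal{H}_{can}). \tag{1.6.22} \]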
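The passage from (1.6.20) to (1.6.21) can be checked term by term. The following is a sketch of that expansion; it uses only the Legendre relations quoted above and the symmetry of the metric, and it assumes nothing beyond what (1.6.20) and (1.6.21) already state.

```latex
% Legendre relations x_i = g_{i\alpha}\partial_\tau x^\alpha give
%   (\partial_\tau x,\partial_\tau x) = g^{ij}x_i x_j, \qquad
%   (\partial_\sigma x,\partial_\tau x) = x_j\,\partial_\sigma x^j .
\begin{align*}
\big((\partial_\sigma\mp\partial_\tau)x,(\partial_\sigma\mp\partial_\tau)x\big)
 &= (\partial_\sigma x,\partial_\sigma x) \mp 2(\partial_\sigma x,\partial_\tau x)
    + (\partial_\tau x,\partial_\tau x)\\
 &= g_{ij}\,\partial_\sigma x^i\partial_\sigma x^j \mp 2\,x_j\partial_\sigma x^j
    + g^{ij}x_i x_j .
\end{align*}
% Upper sign times +\tfrac14 f(\sigma-\tau) gives the first line of (1.6.21);
% lower sign times -\tfrac14 f(\sigma+\tau) gives the second line.
% At \tau = 0 the quadratic terms cancel in the sum:
\[
 \big(F_{\xi^+}+F_{\xi^-}\big)\big|_{\tau=0} = -f(\sigma)\,x_j\,\partial_\sigma x^j\,d\sigma .
\]
```

Since f is arbitrary, the overall sign in the last line does not affect the span of these elements.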


All of this means that the space of global sections of the sheaf of Lie algebras $\mathcal{H}_{can}$ contains 2 commuting copies of the Lie algebra of vector fields on $\mathbb{R}$. In the case where $\Sigma = S^1\times\mathbb{R}$, each is the centerless Virasoro algebra, hence the notation. In view of canonical commutation relations discussed in 1.5.5, formulas (1.6.21) are 2 bosonizations of the Virasoro algebra – in the quasiclassical limit. This prompts the following definitions:

Definition 1.6.9.1. (i) Denote by $\mathcal{H}^{\bar\omega_L,+}_{Sol_L/\Sigma}$ the centralizer of $Vir^-$ in $\mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}$ and call it the right moving algebra.
(ii) Denote by $\mathcal{H}^{\bar\omega_L,-}_{Sol_L/\Sigma}$ the centralizer of $Vir^+$ in $\mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}$ and call it the left moving algebra.

We will present a computation of left/right moving algebra in the context of the WZW model in Sect. 2.9.2. Let us also note that $\mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}$ contains yet another Virasoro algebra, the sum of the first two, which upon restricting to {τ = 0} becomes
\[ Vir^o = \mathrm{span}\{F_{\xi^+}+F_{\xi^-}\} = \mathrm{span}\{f(\sigma)\,x_j\partial_\sigma x^j\,d\sigma\}. \tag{1.6.23} \]

Bosonization (1.6.23) is much simpler than (1.6.21) and was thoroughly investigated in [MSV,GMS1], but the corresponding Virasoro algebra is neither right nor left moving.

1.6.9.2. Generalized geometry interpretation. Formulas (1.6.18) admit a nice, Lagrangian free, interpretation in the spirit of Hitchin’s “generalized geometry”, [G]. The idea of generalized geometry is that the tangent bundle of a manifold must be consistently replaced with the direct sum of the tangent and cotangent bundles. From this point of view, a metric on M is a reduction of the structure group of $TM\oplus T^*M$ from $SO(n,n)$ to $SO(n,0)\times SO(0,n)$. Letting $\{e_i\}$, $\{e^j\}$ be a pair of relatively dual bases of the $SO(n,0)$-subbundle and letting $\{f_i\}$, $\{f^j\}$ be the same for the $SO(0,n)$-subbundle, one can form 2 invariantly defined tensors, $e_ie^i$ and $f_if^i$. Noticing that $x_i$, in (1.6.18), is naturally identified with $\partial_{x^i}$, and $\partial_\sigma x^j$ with $dx^j$, one concludes that $Vir^+$ is generated by $e_ie^i$ and $Vir^-$ by $-f_if^i$. To talk about these and other issues coherently, one must change gears and introduce vertex Poisson algebras.

2. Vertex Poisson Algebras

Our presentation of this well-known topic, see e.g. [FB-Z], will be a little different in the following respects. First of all, we will fix an associative commutative $\mathbb{C}$-algebra B to be the ground ring for all linear algebra constructions of this section. Second of all, we will let g = Der B and demand that all the structures be g-equivariant. These assumptions are intended to handle functions of τ and σ should they appear. Therefore, two examples to be kept in mind are these:
\[ B = C^\infty(\Sigma),\ g = \mathcal{T}(\Sigma) \quad\text{or}\quad B = \mathbb{C},\ g = 0. \tag{2.1} \]
The case at hand, where $M_\Sigma = M\times\Sigma$, is rather special, and we could have avoided including B and g as part of data (which is customary in works on vertex algebras), but we decided against it. That the natural setting for what follows is equivariant was pointed out by Beilinson and Drinfeld [BD, 3.9].


Definition 2.1. A g-equivariant vertex Poisson algebra is a collection $(V, T, {}_{(n)}, g;\ n\ge -1)$, where V is a B-module, $T: V\to V$ is a B-linear map, and
\[ {}_{(n)} : V\otimes V\to V, \qquad a_{(n)}b = 0 \ \text{ if } n\gg 0 \]
is a family of B-bilinear multiplications, such that the following axioms hold:
I. The triple $(V, T, {}_{(-1)})$ is a commutative associative algebra with derivation T.
II. The collection $(V, T, {}_{(n)};\ n\ge 0)$ is a vertex Lie algebra, i.e., the following holds:
II.1. skew-commutativity
\[ a_{(n)}b = (-1)^{n+1}\sum_{j=0}^{\infty}\frac{(-1)^j}{j!}\,T^j\big(b_{(n+j)}a\big), \]
II.2. Jacobi identity
\[ a_{(m)}b_{(n)}c - b_{(n)}a_{(m)}c = \sum_{j=0}^{\infty}\binom{m}{j}\big(a_{(j)}b\big)_{(n+m-j)}c, \]
II.3. properties of T: $(Ta)_{(n)} = [T, a_{(n)}] = -n\,a_{(n-1)}$ if n ≥ 0.
III. Leibniz identity: for any n ≥ 0, $a_{(n)}$ is a derivation of ${}_{(-1)}$.
IV. g-equivariance: V is a g-module, and the maps ${}_{(n)}$ and T are g-module morphisms.

In addition, we will always be assuming that a vertex Poisson algebra $(V, T, {}_{(n)};\ n\ge -1)$ is $\mathbb{Z}_+$-graded, i.e.,
\[ V = \bigoplus_{n=0}^{\infty}V_n, \quad T(V_n)\subset V_{n+1}, \quad g(V_n)\subset V_n, \quad (V_m)_{(j)}V_n\subset V_{m+n-j-1}. \tag{2.1.1} \]
We will unburden the notation by letting V stand for $(V, T, {}_{(n)}, g;\ n\ge -1)$ when this does not lead to confusion and by suppressing ${}_{(-1)}$ so that ab stands for $a_{(-1)}b$. We will also tend to drop the adjective “equivariant” whenever doing so seems appropriate. Note that if m = n = 0, then II.2 becomes
\[ a_{(0)}b_{(0)}c - b_{(0)}a_{(0)}c = \big(a_{(0)}b\big)_{(0)}c, \tag{2.1.2} \]
which is the usual Jacobi identity for $(V, {}_{(0)})$. Anticommutativity fails, but II.1 ensures that it holds up to T(...). This proves the following important and well-known

Lemma 2.2. If V is a vertex Poisson algebra, then $T(V)\subset V$ is a 2-sided ideal w.r.t. ${}_{(0)}$, and $(V/T(V), {}_{(0)})$ is a Lie algebra.
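For the reader's convenience, the two facts behind Lemma 2.2 can be spelled out as follows. This is only a sketch of the standard argument, specializing axioms II.1 and II.3 to n = 0; the sums are finite because $a_{(n)}b = 0$ for n large.

```latex
% Skew-commutativity II.1 at n = 0:
\[
 a_{(0)}b = -\sum_{j\ge 0}\frac{(-1)^j}{j!}\,T^j\big(b_{(j)}a\big)
          = -\,b_{(0)}a + T\Big(b_{(1)}a - \tfrac12\,T\big(b_{(2)}a\big) + \cdots\Big),
\]
% hence a_{(0)}b + b_{(0)}a lies in T(V): anticommutativity holds modulo T(V).
% Axiom II.3 at n = 0 reads (Ta)_{(0)} = [T, a_{(0)}] = 0, i.e.
\[
 (Ta)_{(0)}b = 0, \qquad a_{(0)}(Tb) = T\big(a_{(0)}b\big),
\]
% so T(V) is a two-sided ideal for _{(0)}, and (2.1.2) descends to V/T(V).
```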


2.3. Tensor products. The simplest example of a vertex Poisson algebra is a commutative associative algebra V with derivation T. Defining $a_{(-1)}$ to be the multiplication by a and letting $a_{(n)} = 0$ if n ≥ 0 makes V into a vertex Poisson algebra. If $(V_1, {}_{(n)}, T_1)$ and $(V_2, {}_{(n)}, T_2)$ are two vertex Poisson algebras, then $V_1\otimes V_2$ carries at least two vertex Poisson algebra structures. First of all, one can simply regard $V_1\otimes V_2$ as an extension of scalars whereby $V_1\otimes V_2$ becomes a vertex Poisson algebra over $V_1$ with derivation $T_2$ and multiplications coming from $V_2$. Second of all, one can define $T = T_1 + T_2$ and
\[ (a\otimes b)_{(n)} = \begin{cases} a_{(-1)}b_{(-1)} & \text{if } n = -1,\\[2pt] \displaystyle\sum_{i=0}^{\infty}\frac{1}{i!}\Big(\big(T_1^i a\big)_{(-1)}b_{(n+i)} + a_{(n+i)}\big(T_2^i b\big)_{(-1)}\Big) & \text{if } n\ge 0. \end{cases} \tag{2.3.1} \]
If, in addition, $V_1$ is of the type we started with, i.e., if $(V_1)_{(n)}(V_1) = 0$ for all n ≥ 0, then (2.3.1) is simplified as follows:
\[ (a\otimes b)_{(n)} = \begin{cases} a_{(-1)}b_{(-1)} & \text{if } n = -1,\\[2pt] \displaystyle\sum_{i=0}^{\infty}\frac{1}{i!}\big(T_1^i a\big)_{(-1)}b_{(n+i)} & \text{if } n\ge 0. \end{cases} \tag{2.3.2} \]
In a sense, the second version is a twist of the first by derivation $T_1\in \mathrm{Der}(V_1)$. In the context of equivariant vertex Poisson algebras this can be generalized as follows. If $(V, {}_{(n)}, T)$ is an equivariant vertex Poisson algebra over B and ξ ∈ g, then letting
\[ a_{(n)\xi} = \sum_{i=0}^{\infty}\frac{1}{i!}\big(\xi^i a\big)_{(n+i)} \tag{2.3.3} \]

defines a vertex Poisson algebra (V,(n)ξ , T + ξ ). We will refer to this construction as the ξ -twist. Note that the ξ -twist reduces the constants from B to the algebra of ξ -invariants, Bξ .
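The last remark, that the twist reduces the constants to $B^\xi$, can be made explicit by a short computation. The sketch below assumes that ξ acts on V as a derivation over its action on B, that is, $\xi(fa) = (\xi f)a + f(\xi a)$ for f ∈ B and a ∈ V; this compatibility is an assumption of the sketch and is not spelled out in the text above.

```latex
% For f in B and a in V, using the Leibniz rule for \xi^i and the
% B-linearity of the untwisted products _{(n)}:
\begin{align*}
(fa)_{(n)\xi}
 &= \sum_{i\ge 0}\frac{1}{i!}\big(\xi^i(fa)\big)_{(n+i)}
  = \sum_{i\ge 0}\frac{1}{i!}\sum_{k=0}^{i}\binom{i}{k}
     (\xi^k f)\,\big(\xi^{i-k}a\big)_{(n+i)}\\
 &= \sum_{k\ge 0}\frac{1}{k!}(\xi^k f)\sum_{m\ge 0}\frac{1}{m!}
     \big(\xi^{m}a\big)_{(n+k+m)}
  = \sum_{k\ge 0}\frac{1}{k!}\,(\xi^k f)\,a_{(n+k)\xi}.
\end{align*}
% The k = 0 term is f a_{(n)\xi}; the correction terms vanish for all a
% as soon as \xi f = 0, i.e. f \in B^\xi.
```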

2.4. From vertex Poisson algebras to Courant algebroids. The Poisson vertex algebra structure on $V = \bigoplus_{n=0}^{\infty}V_n$ defines on the subspace $V_0 + V_1$ the following operations:
\begin{align*}
{}_{(-1)} &: V_0\otimes V_0\to V_0, \tag{2.4.1a}\\
{}_{(-1)} &: V_0\otimes V_1\to V_1,\quad V_1\otimes V_0\to V_1, \tag{2.4.1b}\\
{}_{(0)} &: V_1\otimes V_0\to V_0,\quad V_0\otimes V_1\to V_0, \tag{2.4.1c}\\
{}_{(0)} &: V_1\otimes V_1\to V_1, \tag{2.4.1d}\\
{}_{(1)} &: V_1\otimes V_1\to V_0, \tag{2.4.1e}\\
T &: V_0\to V_1, \tag{2.4.1f}
\end{align*}
all the other operations either not preserving the subspace $V_0 + V_1$ or being zero due to condition (2.1.1). Vertex Poisson algebra axioms imply that (2.4.1a–f) satisfy certain conditions; e.g., (2.4.1a) is such that $(V_0, {}_{(-1)})$ is an associative commutative B-algebra, and (2.4.1b) is


such that $V_1$ is a $V_0$-module. In [GMS1], these conditions were written down explicitly and made into an axiomatic definition of a vertex algebroid – in a more complicated, quantum, situation. It is a nice observation due to Bressler [Bre] that under some nondegeneracy assumptions a quasiclassical limit of a vertex algebroid is an exact Courant $V_0$-algebroid; e.g. (2.4.1d) is the Dorfman bracket [Dor,G] on $V_1$. Therefore, the assignment $V\mapsto (V_0\oplus V_1, T, {}_{(-1)}, {}_{(0)}, {}_{(1)})$ defines a functor from a subcategory of vertex Poisson algebras to the category of exact Courant $V_0$-algebroids. This functor is actually an equivalence of categories, and a classification of exact Courant algebroids furnishes that of a subclass of vertex Poisson algebras. For future use, and for the reader’s convenience – after all the present situation is somewhat different – let us now reproduce the essence of this argument.

2.4.1. We have seen already that the pair $(V_0, {}_{(-1)})$ is an associative commutative B-algebra. Let $A = V_0$. The entire V, hence $V_1$, is an A-module and $A_{(n)}A = 0$ if n ≥ 0. By virtue of Axiom I, the map
\[ T : A\to AT(A)\subset V_1 \ \text{ is a } B\text{-derivation}, \tag{2.4.2} \]
i.e., T(ab) = aT(b) + bT(a) and T(B) = 0. Therefore, AT(A) is a quotient of the module of relative Kähler differentials, $\Omega_{A/B}$.

Assumption 1. Let (A; T : A → AT(A)) be isomorphic to $(A;\ d : A\to\Omega_{A/B})$.

There arises an exact sequence of A-modules
\[ 0\to\Omega_{A/B}\to V_1\to V_1/\Omega_{A/B}\to 0. \tag{2.4.3} \]

Let $\mathcal{T} = V_1/\Omega_{A/B}$. It is an A-module and a Lie algebra w.r.t. the operation ${}_{(0)}$, by virtue of Lemma 2.2. Furthermore, the map
\[ {}_{(0)} : \mathcal{T}\otimes A\to A \tag{2.4.4} \]
is well defined because
\[ (AT(A))_{(n)}A = (AT(A))_{(n)}(AT(A)) = 0, \quad n\ge 0. \tag{2.4.5} \]
This map gives A a $\mathcal{T}$-module structure compatible with the A-module structure in that $(a\xi)_{(0)}b = a(\xi_{(0)}b)$. For each $\tau\in\mathcal{T}$, $\tau_{(0)}\in \mathrm{End}(A)$ is actually a B-derivation of A, and this defines a Lie algebra homomorphism over A,
\[ \mathcal{T}\to \mathrm{Der}_B(A). \tag{2.4.6} \]
All of this can be summarized by saying that $\mathcal{T}$ is an A-Lie algebroid.

Assumption 2. Morphism (2.4.6) is an isomorphism.


The map
\[ {}_{(0)} : \mathcal{T}\otimes\Omega_{A/B}\to\Omega_{A/B}, \tag{2.4.7} \]
also arising by virtue of (2.4.5), equals the Lie derivative:
\[ \xi_{(0)}\omega = \mathrm{Lie}_\xi\,\omega, \tag{2.4.8} \]
cf. (1.5.20d). (Indeed, $\xi_{(0)}(aTb) = (\xi_{(0)}a)Tb + a(T\xi_{(0)}b)$.) Next, again thanks to (2.4.5), there arises the map
\[ {}_{(1)} : \mathcal{T}\otimes\Omega_{A/B}\to A. \tag{2.4.9} \]
It is the natural pairing of vector fields and forms:
\[ \xi_{(1)}\omega = \iota_\xi\omega. \tag{2.4.10} \]
(Indeed, $\xi_{(1)}(aTb) = (\xi_{(1)}a)Tb + a(\xi_{(1)}Tb) = a(\xi_{(0)}b)$, where axioms II.3 and III are used.) This determines all of (2.4.1a–f) that makes sense on the graded object $A\oplus(\mathcal{T}\oplus\Omega_{A/B})$. To continue our analysis we need to make the following

Assumption 3. Let sequence (2.4.3) be split.

Let us fix a splitting
\[ s : \mathcal{T}\to V_1. \tag{2.4.11} \]
Then there arise the following two maps:
\begin{align*}
{}_{(1)s} &: \mathcal{T}\otimes\mathcal{T}\to A, \tag{2.4.12}\\
{}_{(0)s} &: \mathcal{T}\otimes\mathcal{T}\to\Omega_{A/B}, \tag{2.4.13}
\end{align*}
where (2.4.12) is the restriction of ${}_{(1)}$ to $s(\mathcal{T})$, and (2.4.13) is the composition of the restriction of ${}_{(0)}$ to $s(\mathcal{T})$ with the projection $V_1\to\Omega_{A/B} = V_1/s(\mathcal{T})$. These two maps determine all of (2.4.1a–f). The map ${}_{(1)s}$ is, in fact, a symmetric A-bilinear form on $\mathcal{T}$. By varying the splitting s it can be killed. Indeed, letting $h(.,.) = {}_{(1)s}$, we obtain, for any $\xi\in\mathcal{T}$, an A-linear form $h(\xi,.)\in\Omega_{A/B}$. Replacing s with $s_h$ defined to be
\[ s_h(\xi) = s(\xi) - \tfrac12 h(\xi,.) \]
we get ${}_{(1)s_h} = 0$. Therefore, we can, and usually will, assume that
\[ V_1 = \mathcal{T}\oplus\Omega_{A/B} \tag{2.4.14} \]


and
\[ {}_{(1)} : (\mathcal{T}\oplus\Omega_{A/B})\otimes(\mathcal{T}\oplus\Omega_{A/B})\to A \tag{2.4.15} \]
is the canonical pairing $(\xi+\omega)_{(1)}(\xi'+\omega') = \iota_\xi\omega' + \iota_{\xi'}\omega$, cf. (2.4.10).

2.4.2. Therefore, all moduli, if any, come from ${}_{(0)s}$. A short computation shows that it is A-linear. Furthermore, Axiom IV implies that
\[ {}_{(0)s}\in \mathrm{Hom}_g\big(\mathcal{T}\otimes\mathcal{T}, \Omega_{A/B}\big). \tag{2.4.16} \]
Hence ${}_{(0)s}$ can be considered as an A-trilinear g-invariant functional on $\mathcal{T}$, and as such it will be denoted by H:
\[ {}_{(0)s}\approx H\in\big(\Omega_{A/B}^{\otimes 3}\big)^g. \tag{2.4.17} \]
Skew-commutativity II.1 implies that it is anti-commutative in the first 2 variables: H(ξ, η, .) = −H(η, ξ, .). Jacobi identity II.2 applied to $[\xi_{(1)},\eta_{(0)}](\zeta)$, $\xi,\eta,\zeta\in s(\mathcal{T}_A)$, shows that, in fact, H(., ., .) is totally anti-commutative, hence belongs to $\big(\Omega^3_{A/B}\big)^g$. Jacobi identity II.2 applied to $[\xi_{(0)},\eta_{(0)}](\zeta)$, $\xi,\eta,\zeta\in s(\mathcal{T}_A)$, shows that H is closed, i.e.,
\[ H\in\big(\Omega^{3,cl}_{A/B}\big)^g. \tag{2.4.18} \]
Conditions (2.4.14-15 or 15h) do not determine the splitting s; they are respected by the shearing transformation
\[ \mathcal{T}\ni\xi\mapsto\xi+\iota_\xi\alpha \ \text{ for a fixed } \alpha\in\big(\Omega^2_{A/B}\big)^g. \tag{2.4.19} \]
The effect of this transformation on H is
\[ H\mapsto H + d_{DR}\,\alpha. \tag{2.4.20} \]

2.4.3. Checking the various properties of maps (2.4.1a–f) derived in 2.4.1 against the definition of an exact Courant A-algebroid [LWX] (especially in the form proposed in [Bre]) shows the following. If Assumptions 1–3 hold, then the equivariant Poisson vertex algebra structure on V defines an equivariant exact Courant A-algebroid structure on $\mathcal{T}\oplus\Omega_A$ such that
\[ {}_{(0)} : (\mathcal{T}\oplus\Omega_A)\otimes(\mathcal{T}\oplus\Omega_A)\to\mathcal{T}\oplus\Omega_A \]
is the Dorfman [Dor,G] bracket,
\[ {}_{(1)} : (\mathcal{T}\oplus\Omega_A)\otimes(\mathcal{T}\oplus\Omega_A)\to A \]
is the symmetric pairing, and (2.4.6) is the anchor. The discussion in 2.4.2 practically proves (see [Bre,GMS1] for a complete analysis) that the category of exact equivariant Courant A-algebroids is an $\big(\Omega^{3,cl}_{A/B}\big)^g$-space.


Indeed, if C is one such algebroid and $H\in\big(\Omega^{3,cl}_{A/B}\big)^g$, then the H-twisted Courant algebroid
\[ C\dotplus H \ \text{ is defined by replacing } {}_{(0)s} \text{ with } {}_{(0)s} + H. \tag{2.4.21} \]
A “canonical” Courant algebroid $C_0$ can be chosen by letting the only “unknown” operation ${}_{(0)s}$ be zero:
\[ \text{define } C_0 \text{ to be s.t. (2.4.14–15) hold and } {}_{(0)s} = 0. \tag{2.4.22} \]
This identifies the category of equivariant exact Courant A-algebroids with $\big(\Omega^{3,cl}_{A/B}\big)^g$ s.t.
\[ \big(\Omega^{3,cl}_{A/B}\big)^g\ni H\mapsto C_H = C_0\dotplus H. \tag{2.4.23} \]
The effect of shear (2.4.19) on H recorded in (2.4.20) implies the following description of morphisms:
\[ \mathrm{Mor}(C, C\dotplus H) = \big\{\alpha\in\big(\Omega^2_{A/B}\big)^g \ \text{s.t.}\ d_{DR}\,\alpha = H\big\}, \tag{2.4.24} \]
and automorphisms
\[ \mathrm{Aut}(C) = \big(\Omega^{2,cl}_{A/B}\big)^g. \tag{2.4.25} \]
In particular, the set of isomorphism classes of exact Courant A-algebroids is identified with the g-invariant de Rham cohomology group,
\[ \big(\Omega^{3,cl}_{A/B}\big)^g\big/\,d_{DR}\big(\Omega^2_{A/B}\big)^g. \tag{2.4.26} \]

2.5. Symbols of vertex differential operators. Let Σ be an open subset of $\mathbb{R}^d$, U of $\mathbb{R}^n$, and $U_\Sigma = U\times\Sigma$. Define $B = C^\infty(\Sigma)$, $g = \mathcal{T}_{\mathbb{R}^d}(\Sigma)$. Identify g with the subalgebra of horizontal vector fields on $U_\Sigma$, thereby making $C^\infty(U_\Sigma)$ into a g-module. These are the prerequisites to the definition of a $\mathcal{T}_{\mathbb{R}^d}(\Sigma)$-equivariant vertex Poisson algebra over B.

Definition 2.5.1. Call V an algebra of symbols of vertex differential operators, SVDO for short, if
(i) $V_0 = C^\infty(U_\Sigma)$, $V_1$ is a $\mathcal{T}_{\mathbb{R}^d}(\Sigma)$-equivariant exact Courant $C^\infty(U_\Sigma)$-algebroid over $B = C^\infty(\Sigma)$,
(ii) V is generated as an associative commutative algebra with derivation T by $V_0\oplus V_1$.

The discussion in 2.4.3 means that we have obtained a functor, say F, from the category of SVDOs to the category of equivariant exact Courant $C^\infty(U_\Sigma)$-algebroids:
\[ F : \{\text{SVDOs}\}\to\{\text{Courant algebroids}\}. \tag{2.5.1} \]

Theorem 2.5.2. ([GMS1,Bre]). This functor is an equivalence of categories. To be precise, [GMS1,Bre] only construct F ∗ , the left adjoint to F, but a simple representation-theoretic argument shows that the “vertex envelope”, F ∗ (C), is simple. (Indeed, by construction any element of F ∗ (C) can be moved to A = F ∗ (C)0 by a sequence of operations a(n) , τ(n) , where a ∈ A and τ ∈ T , and thus generate the entire F ∗ (C).)


2.6. A sheaf-theoretic version. All of this can be spread over manifolds. The geometric prerequisite is a fiber bundle
\[ \pi : M_\Sigma\to\Sigma \tag{2.6.1a} \]
with a flat connection
\[ \nabla : \mathcal{T}_\Sigma\to\mathcal{T}_{M_\Sigma}. \tag{2.6.1b} \]
A sheaf of SVDOs, $\mathcal{V}$, over $M_\Sigma$ is a sheaf of vector spaces s.t. the space of sections $\mathcal{V}(U)$ is an SVDO for each open $U\subset M_\Sigma$ with
\[ \mathcal{V}(U)_0 = \mathcal{O}_{M_\Sigma}(U), \quad B(U) = \pi^*\mathcal{O}_\Sigma(\pi U), \quad g = \mathcal{T}_\Sigma(\pi U), \tag{2.6.2} \]
and equivariant structure determined by ∇. The condition that $\mathcal{V}(U)_0 = \mathcal{O}_{M_\Sigma}(U)$ implies that $\mathcal{V}$ is automatically a sheaf of $\mathcal{O}_{M_\Sigma}$-modules. It follows from (2.4.6) that the next homogeneous component, $\mathcal{V}_1$, is an extension of vertical vector fields by relative 1-forms:
\[ 0\to\Omega_{M_\Sigma/\Sigma}\to\mathcal{V}_1\to\mathcal{T}_{M_\Sigma/\Sigma}\to 0. \tag{2.6.3} \]
As to the existence of such sheaves, they are plentiful locally: for any sufficiently small open $U\subset M_\Sigma$, the category of such sheaves over U is an $\Omega^{3,cl}_{M_\Sigma/\Sigma}(U)^\nabla$-space, as follows from (2.4.23). If $\mathcal{V}_U$ is one such sheaf and $H\in\Omega^{3,cl}_{M_\Sigma/\Sigma}(U)^\nabla$, then
\[ \mathrm{Mor}(\mathcal{V}_U, \mathcal{V}_U\dotplus H) = \big\{\alpha\in\Omega^2_{M_\Sigma/\Sigma}(U)^\nabla \ \text{s.t.}\ d_{DR}\,\alpha = H\big\}, \tag{2.6.4} \]
cf. (2.4.24). Technically, (2.6.4) means that there is a gerbe, in particular, a sheaf of categories, of SVDOs bound by the sheaf complex
\[ 0\to\Omega^2_{M_\Sigma/\Sigma}(U)^\nabla\xrightarrow{\ d_{DR}\ }\Omega^{3,cl}_{M_\Sigma/\Sigma}(U)^\nabla\to 0, \]
so that the categories over sufficiently small U are equivalent to that of SVDOs with $V_0 = \mathcal{O}_{M_\Sigma}(U)$. A priori there may be no single sheaf of SVDOs on the entire $M_\Sigma$; an obstruction to its existence is a certain canonical characteristic class lying in
\[ H^2\big(M_\Sigma,\ (\Omega^2_{M_\Sigma/\Sigma})^\nabla\to(\Omega^{3,cl}_{M_\Sigma/\Sigma})^\nabla\big). \]
At this point let us return to the concrete situation of interest to us, where $M_\Sigma = M\times\Sigma$ and ∇ is the horizontal connection. If so, the above discussion is simplified in that the sheaves $(\Omega^\bullet_{M_\Sigma/\Sigma})^\nabla$ can be replaced with $\Omega^\bullet_M$. For example, the obstruction becomes a class lying in $H^2(M, \Omega^2_M\to\Omega^{3,cl}_M)$.


This class vanishes; the obstruction (equal to the 1st Pontryagin class) computed in [GMS1], see also [Bre], is a purely quantum phenomenon, and in any case, an example of such sheaf will be exhibited shortly. Furthermore, (2.4.24–26) imply that the set of isomorphism classes of such sheaves is an $H^1(M, \Omega^2_M\to\Omega^{3,cl}_M)$-torsor, and the group of automorphisms of any such sheaf is isomorphic to $H^0(M, \Omega^2_M\to\Omega^{3,cl}_M)\xrightarrow{\sim}H^0(M, \Omega^{2,cl}_M)$. Note that since the sequence
\[ 0\to\Omega^{2,cl}_M\to\Omega^2_M\xrightarrow{\ d_{DR}\ }\Omega^{3,cl}_M\to 0 \tag{2.6.5} \]
is exact, we obtain isomorphisms
\begin{align*}
H^1(M, \Omega^2_M\to\Omega^{3,cl}_M)&\xrightarrow{\sim}H^1(M, \Omega^{2,cl}_M),\\
H^0(M, \Omega^2_M\to\Omega^{3,cl}_M)&\xrightarrow{\sim}H^0(M, \Omega^{2,cl}_M). \tag{2.6.6}
\end{align*}
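The mechanism behind (2.6.6) is a standard one; the following sketch records it, under the reading that the two-term complex is placed in degrees 0 and 1 and that the cohomology groups in (2.6.6) are hypercohomology groups of that complex.

```latex
% Exactness of (2.6.5) says that d_{DR}: \Omega^2_M \to \Omega^{3,cl}_M
% has kernel \Omega^{2,cl}_M and, by the Poincare lemma, is surjective
% as a map of sheaves. Hence the inclusion
\[
 \big[\,\Omega^{2,cl}_M \to 0\,\big]
 \;\hookrightarrow\;
 \big[\,\Omega^{2}_M \xrightarrow{\ d_{DR}\ } \Omega^{3,cl}_M\,\big]
\]
% is a quasi-isomorphism of complexes of sheaves, and therefore
\[
 H^i\big(M,\ \Omega^{2}_M \to \Omega^{3,cl}_M\big)
 \;\cong\; H^i\big(M,\ \Omega^{2,cl}_M\big), \qquad i \ge 0 .
\]
```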

The long exact cohomology sequence associated with (2.6.5) implies, in addition, that
\[ H^1(M, \Omega^{2,cl}_M)\xrightarrow{\sim}H^0(M, \Omega^{3,cl}_M)\big/dH^0(M, \Omega^2_M)\xrightarrow{\sim}H^3(M, \mathbb{R}), \tag{2.6.7} \]
where the last isomorphism is the de Rham theorem. This proves

Proposition 2.6.1. a) The set of isomorphism classes of sheaves of SVDOs on $M_\Sigma = M\times\Sigma$ with horizontal connection is identified with either of the isomorphic groups $H^1(M, \Omega^{2,cl}_M)$ and $H^3(M, \mathbb{R})$.
b) If $\mathcal{V}$ is a sheaf of SVDOs over $M_\Sigma$, then
\[ \mathrm{Aut}\,\mathcal{V}\xrightarrow{\sim}H^0(M, \Omega^{2,cl}_M). \]

2.6.2. Here is an explicit construction of identifications a) and b) of Proposition 2.6.1. The presentation of the set of isomorphism classes as $H^1(M, \Omega^{2,cl}_M)$ emphasizes the fact that locally all such sheaves are isomorphic (this is an immediate consequence of (2.4.26)). Indeed, let $\{U_i\}$ be a covering by balls. Let $\mathcal{V}_i$ be the restriction of $\mathcal{V}$ to $U_i$. Then there arise canonical identifications,
\[ \phi_{ij} : \mathcal{V}_i|_{U_i\cap U_j}\xrightarrow{\sim}\mathcal{V}_j|_{U_i\cap U_j}, \tag{2.6.8} \]
to be thought of as gluing functions. Let now $\alpha_{ij}\in\Omega^{2,cl}_M(U_i\cap U_j)$ be a Čech cocycle representing $\alpha\in H^1(M, \Omega^{2,cl}_M)$. Regarding $\alpha_{ij}$ as an automorphism of $\mathcal{V}_j|_{U_i\cap U_j}$, define
\[ \hat\phi_{ij}\overset{\mathrm{def}}{=}\phi_{ij}\dotplus\alpha_{ij} : \mathcal{V}_i|_{U_i\cap U_j}\xrightarrow{\sim}\mathcal{V}_j|_{U_i\cap U_j}, \tag{2.6.9} \]
to be the composition of $\phi_{ij}$ and the shear by $\alpha_{ij}$ defined in (2.4.19). The Čech cocycle condition satisfied by $\{\alpha_{ij}\}$ implies that $\hat\phi_{ik}\circ\hat\phi_{kj}\circ\hat\phi_{ji} = \mathrm{id}$ on the triple intersection $U_i\cap U_j\cap U_k$ for any i, j, k. Thus $\hat\phi_{ij}$ are gluing functions of a new sheaf of SVDOs, to be denoted $\mathcal{V}\dotplus\alpha$.


Contrary to this, the presentation of the set of isomorphism classes as $H^3(M, \mathbb{R})$ has nothing to do with gluing functions or even the $\mathcal{O}_M$-module structure. Indeed, for an element of $H^3(M, \mathbb{R})$, pick a global closed 3-form H representing it. By definition (2.4.21), the sheaf $\mathcal{V}\dotplus H$ is different from $\mathcal{V}$ only in that the operation
\[ {}_{(0)} : \mathcal{V}_1\otimes\mathcal{V}_1\to\mathcal{V}_1 \]
is replaced with ${}_{(0)} + H$ (and the sheaf $\mathcal{V}\dotplus d\beta$, β a global 2-form, is canonically isomorphic to $\mathcal{V}$).

The relation of one point of view to another is as follows. For example, given $\mathcal{V}\dotplus H$, find a collection $\beta = \{\beta_i\in\Omega^2_M(U_i)\}$ so that $d\beta_i = H|_{U_i}$. Then $d_{\check C}(\beta)$ is de Rham-closed and hence is a Čech 1-cocycle with coefficients in $\Omega^{2,cl}_M$. The map
\[ H^0\big(M, \Omega^{3,cl}_M\big)\ni H\mapsto\beta\mapsto\text{class of } d_{\check C}(\beta)\in H^1\big(M, \Omega^{2,cl}_M\big) \tag{2.6.10} \]
descends to the inverse of (2.6.7).

Now, $(\mathcal{V}\dotplus H)|_{U_i} = \mathcal{V}_i$ as vector spaces but not as SVDOs; to obtain an SVDO isomorphism, the shear by $\beta_i$ is needed:
\[ \beta_i : \mathcal{V}_i\to(\mathcal{V}\dotplus H)|_{U_i}. \tag{2.6.11} \]
The effect of this transformation on the gluing functions is as follows:
\[ \phi_{ij}\mapsto\phi_{ij}\dotplus d_{\check C}\beta, \tag{2.6.12} \]
cf. (2.4.20), and this delivers the desired isomorphism
\[ \mathcal{V}\dotplus H\xrightarrow{\sim}\mathcal{V}\dotplus(\text{class of } d_{\check C}\beta). \tag{2.6.13} \]

2.7. A natural sheaf of SVDOs. Let us attach to any smooth M a sheaf of SVDOs which depends on M functorially. In order to do so, let us place ourselves in the situation where $T^*M_\Sigma = T^*M\times\Sigma$ satisfies (1.1.2-4) and carries, in particular, a distinguished coordinate system, σ and τ. Taking advantage of (1.1.7), we note that the operator of the jet connection, (1.1.5), splits in the vertical and horizontal components, e.g.,
\[ \rho(\partial_\sigma) = \partial_\sigma^v + \partial_\sigma^h, \tag{2.7.1} \]
where the latter stands for the operator of differentiation w.r.t. σ “appearing explicitly”. Let
\[ \pi : J^\infty(T^*M_\Sigma/\Sigma)\to M_\Sigma \tag{2.7.2} \]
be the natural projection. There arises the direct image of the structure sheaf $\pi_*\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ which we will take the liberty to denote also by $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ because this is unlikely to cause confusion. Thus, for example, if $U\subset M_\Sigma$ is open, then $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}(U)$ will stand for the space of functions on the jet-space regular over $\pi^{-1}(U)$.


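Being a structure sheaf, $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ carries a canonical multiplication. Let us define a grading
\[ \mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)} = \bigoplus_{i=0}^{\infty}\mathcal{O}^i_{J^\infty(T^*M_\Sigma/\Sigma)} \quad\text{s.t.}\quad \mathcal{O}^i_{J^\infty(T^*M_\Sigma/\Sigma)}\cdot\mathcal{O}^j_{J^\infty(T^*M_\Sigma/\Sigma)}\subset\mathcal{O}^{i+j}_{J^\infty(T^*M_\Sigma/\Sigma)} \tag{2.7.3} \]
by requiring that the pull-back of functions on $M_\Sigma$ have degree 0, the pull-back of fiberwise linear functions on $T^*M$ have degree 1, and the operator $\partial_\sigma^v$, defined in (2.7.1), have degree 1, i.e., that $\partial_\sigma^v\big(\mathcal{O}^i_{J^\infty(T^*M_\Sigma/\Sigma)}\big)\subset\mathcal{O}^{i+1}_{J^\infty(T^*M_\Sigma/\Sigma)}$. Thus, for example,
\[ \mathcal{O}^0_{J^\infty(T^*M_\Sigma/\Sigma)} = \mathcal{O}_{M_\Sigma}, \quad \mathcal{O}^1_{J^\infty(T^*M_\Sigma/\Sigma)} = \mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma}, \tag{2.7.4} \]
cf. (2.6.3), where $\mathcal{T}_{M_\Sigma/\Sigma}$ is realized inside $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ as the pull-back of fiberwise linear functions on $T^*M$, and $\Omega_{M_\Sigma/\Sigma}$ is realized as $\mathcal{O}_{M_\Sigma}\,\partial_\sigma^v\,\mathcal{O}_{M_\Sigma}$, cf. Sect. 2.4.1, Assumption 1.

Proposition 2.7.1. The sheaf $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ carries a unique structure of a sheaf of SVDOs over $B = \mathcal{O}_\Sigma$ such that ${}_{(-1)}$ is the canonical multiplication, $T = \partial_\sigma^v$ (which furnishes (2.4.1a,b,f) in this case), and (2.4.1c–e) take the following form: if $\xi,\xi'\in\mathcal{T}_{M_\Sigma/\Sigma}$, $\omega,\omega'\in\Omega_{M_\Sigma/\Sigma}$, then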

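\begin{align*}
&{}_{(0)} : (\mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma})\otimes\mathcal{O}_{M_\Sigma}\to\mathcal{O}_{M_\Sigma},\quad \mathcal{O}_{M_\Sigma}\otimes(\mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma})\to\mathcal{O}_{M_\Sigma},\\
&\quad(\xi+\omega)_{(0)}F = -F_{(0)}(\xi+\omega) = \xi F, \tag{2.7.5}\\
&{}_{(0)} : (\mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma})\otimes(\mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma})\to\mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma},\\
&\quad(\xi+\omega)_{(0)}(\xi'+\omega') = [\xi,\xi'] + \mathrm{Lie}_\xi\,\omega' - \mathrm{Lie}_{\xi'}\,\omega + \partial_\sigma^v\big(\iota_{\xi'}\omega\big), \tag{2.7.6}\\
&{}_{(1)} : (\mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma})\otimes(\mathcal{T}_{M_\Sigma/\Sigma}\oplus\Omega_{M_\Sigma/\Sigma})\to\mathcal{O}_{M_\Sigma},\\
&\quad(\xi+\omega)_{(1)}(\xi'+\omega') = \iota_\xi\omega' + \iota_{\xi'}\omega. \tag{2.7.7}
\end{align*}
Note that (2.7.5–7) restricted to some $U\subset M_\Sigma$ are nothing but the definition of the canonical Courant $C^\infty(U)$-algebroid $C_0$ of (2.4.22); therefore $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}(U)$ is nothing but $F^*(C_0)$, where F is equivalence of categories (2.5.1). The vertex Poisson algebra structure of Proposition 2.7.1 is not quite what we need. Being $\mathcal{T}_\Sigma$-equivariant, it is subject to the ξ-twist, see (2.3.3), for any $\xi\in H^0(\Sigma, \mathcal{T}_\Sigma)$.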
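It may help to see why (2.7.6) is the Dorfman bracket and why it is compatible with skew-commutativity II.1. The sketch below writes d for the relative de Rham differential, identified with $T = \partial_\sigma^v$ on functions as in Sect. 2.4.1, Assumption 1; this identification, and the use of primes for the second pair of arguments, are conventions of the sketch.

```latex
% Cartan's formula Lie_{\xi'} = \iota_{\xi'} d + d\,\iota_{\xi'} gives
\[
 [\xi,\xi'] + \mathrm{Lie}_\xi\,\omega' - \iota_{\xi'}\,d\omega
 = [\xi,\xi'] + \mathrm{Lie}_\xi\,\omega' - \mathrm{Lie}_{\xi'}\,\omega
   + d\big(\iota_{\xi'}\omega\big),
\]
% i.e. the usual form of the Dorfman bracket agrees with (2.7.6).
% Symmetrizing (2.7.6), the bracket and Lie-derivative terms cancel:
\[
 (\xi+\omega)_{(0)}(\xi'+\omega') + (\xi'+\omega')_{(0)}(\xi+\omega)
 = T\big(\iota_{\xi'}\omega + \iota_{\xi}\omega'\big)
 = T\big((\xi'+\omega')_{(1)}(\xi+\omega)\big),
\]
% which is axiom II.1 at n = 0 on V_1, where b_{(j)}a = 0 for j >= 2
% by the grading (2.1.1).
```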

J ∞ (T ∗ M ) denote the sheaf O J ∞ (T ∗ M ) with the vertex Definition 2.7.2. Let O / / Poisson algebra structure defined in Proposition 2.7.1 and let O J ∞ (T ∗ M/ ) denote the latter’s ∂σh -twist, see (2.7.1). Note that in the case of O J ∞ (T ∗ M/ ) , the derivation T becomes T = ρ(∂σ ).

(2.7.8)


In particular, (2.7.6) is changed as follows:

(0) : T M / ⊕ M / ⊗ T M / ⊕ M / → T M / ⊕ M / ,

(2.7.9) (ξ + ω)(0) ξ + ω = [ξ, ξ ] + Lieξ ω − Lieξ ω + ρ(∂σ ) ιξ ω , and the operations on O J ∞ (T ∗ M/ ) are no longer linear over O , only over O . Let us now relate O J ∞ (T ∗ M/ ) to the canonical Lie algebra sheaf Hcan defined in (1.5.17). Lemma 2.2 associates with O J ∞ (T ∗ M/ ) the sheaf of Lie algebras Lie(O J ∞ (T ∗ M/ ) ) = O J ∞ (T ∗ M/ ) /ρ(∂σ )O J ∞ (T ∗ M/ ) . Proposition 2.7.3. The Lie algebra sheaves Hcan and Lie(O J ∞ (T ∗ M/ ) ) are canonically isomorphic. Proof. The sheaf isomorphism ∼

O J ∞ (T ∗ M/ ) −→ 1,0 J ∞ (T ∗ M

/ )

, F → Fdσ

descends to ∼

O J ∞ (T ∗ M/ ) /ρ(∂σ )O J ∞ (T ∗ M/ ) −→ 1,0 J ∞ (T ∗ M

/ )

/dρ/ 0,0 J ∞ (T ∗ M

/ )

.

Lemma 1.5.4.1 (and (1.5.17)) identifies the range of this map with Hcan , and thanks to (2.7.8), the domain of this map is Lie(O J ∞ (T ∗ M/ ) ) – it is at this point that we need the ∂σh -twist; hence a sheaf isomorphism ∼

Lie(O J ∞ (T ∗ M/ ) ) −→ Hcan .

(2.7.10)

Map (2.7.10) respects all defining relations (1.5.17–18a,b): (1.5.18) is (part of) (2.7.5,6), (1.5.19a) is Sect. 2.1, Axiom III, and (1.5.19b) is Sect. 2.1, Axiom II.3 (another point where the ∂σh -twist is necessary). Hence (2.7.10) is a Lie algebra sheaf isomorphism. Terminology 2.7.4. We have obtained two families of sheaves of vertex Poisson algebras. First, those provided by the combination of Propositions 2.6.1a) and 2.7.1. They .

J ∞ (T ∗ M ) + H , where H ∈ H 0 (M, 3,cl ) represents a can be realized as either O /

M

.

J ∞ (T ∗ M ) + {αi j } , where {αi j } is a cocycle 3-dimensional cohomology class, or O representing an element of H 1 (M, 2,cl M ).

/

.

Second, their ∂σh -twisted versions, to be denoted by O J ∞ (T ∗ M/ ) + H and

. O J ∞ (T ∗ M/ ) + {αi j } . As Proposition 2.7.3 indicates, it is the latter that will be of importance. Note, however, that these choices have arisen only

because we have included

. .

J ∞ (T ∗ M ) + {αi j } and O J ∞ (T ∗ M ) + {αi j } functions of τ and σ . In fact, both O / / induce the same vertex Poisson algebra structure on the fiber at any

point

(σ, τ ) ∈ . . For this reason sheaves such as O J ∞ (T ∗ M/ ) + {αi j } , where {αi j } will also be referred to as sheaves of SVDOs.


2.8. The Lagrangian interpretation. Let us place ourselves in the situation of 1.6.8 and assume that the Lagrangian $L\in H^0\big(J^\infty(M_\Sigma), \Omega^{2,0}_{J^\infty(M_\Sigma)}\big)$ is of order 1, globally defined, and convex. A combination of Proposition 2.7.3 and Lemma 1.6.8.1 gives
\[ \mathcal{H}^{\bar\omega_L}_{Sol_L/\Sigma}\xrightarrow{\sim}\mathrm{Lie}\big(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}\big), \quad \tilde I_L\subset\Gamma\big(M_\Sigma, \mathrm{Lie}(\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)})\big). \tag{2.8.1} \]
In this sense the universal sheaf of SVDO’s $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$ governs the theory associated to L. In order to interpret similarly all the other, twisted, sheaves of SVDO’s provided by Proposition 2.6.1a), one needs to consider Lagrangians (1.6.1a,b) that do not glue in a global section of $\Omega^{2,0}_{J^\infty(M_\Sigma)}$. One possibility to construct such a Lagrangian is to add what a physicist might call a Wess-Zumino term or an H-flux, cf. [GHR,W1]. Fix a global closed 3-form H on M and let $\{U_i\}$ be an open covering of M fine enough to ensure the existence of a collection of 2-forms
\[ \{\beta^{(i)}\in\Omega^2_M(U_i) \ \text{s.t.}\ d\beta^{(i)} = H \text{ on } U_i\}. \tag{2.8.2} \]
Define
\[ L^H = \{L^{(i)} = L + \beta^{(i)}(\rho(\partial_\tau), \rho(\partial_\sigma))\,d\tau\wedge d\sigma\}. \tag{2.8.3} \]
It follows from (2.8.2) that on double intersections $\beta^{(i)} - \beta^{(j)}$ are closed and, provided $\{U_i\}$ is fine enough, are exact, i.e., there is a collection of 1-forms $\{\alpha^{(ij)}\}$ such that $\beta^{(j)} - \beta^{(i)} = d\alpha^{(ij)}$. Then a quick computation shows that
\[ L^{(j)} - L^{(i)} = d_\rho\big(\iota_{\rho(\partial_\tau)}\alpha^{(ij)}\,d\tau + \iota_{\rho(\partial_\sigma)}\alpha^{(ij)}\,d\sigma\big). \]
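The quick computation can be sketched as follows. Two assumptions are made here that the text above does not state explicitly: that ρ(∂τ) and ρ(∂σ) commute (flatness of the jet connection), and that $d_\rho$ acts on functions as $d\tau\,\rho(\partial_\tau) + d\sigma\,\rho(\partial_\sigma)$. The abbreviations α, A, B are introduced only for this sketch.

```latex
% Abbreviations (this sketch only):
%   \alpha = \alpha^{(ij)}, \quad
%   A = \iota_{\rho(\partial_\tau)}\alpha, \quad
%   B = \iota_{\rho(\partial_\sigma)}\alpha .
\begin{align*}
L^{(j)} - L^{(i)}
 &= \big(\beta^{(j)}-\beta^{(i)}\big)\big(\rho(\partial_\tau),\rho(\partial_\sigma)\big)\,
    d\tau\wedge d\sigma
  = d\alpha\big(\rho(\partial_\tau),\rho(\partial_\sigma)\big)\,d\tau\wedge d\sigma\\
 &= \big(\rho(\partial_\tau)B - \rho(\partial_\sigma)A\big)\,d\tau\wedge d\sigma
    \qquad\text{since } [\rho(\partial_\tau),\rho(\partial_\sigma)] = 0\\
 &= d_\rho\big(A\,d\tau + B\,d\sigma\big),
\end{align*}
% where the last step uses
%   d_\rho(A\,d\tau) = \rho(\partial_\sigma)A\;d\sigma\wedge d\tau, \quad
%   d_\rho(B\,d\sigma) = \rho(\partial_\tau)B\;d\tau\wedge d\sigma .
```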

Therefore, collection (2.8.3) is a new Lagrangian in the sense of (1.6.1a,b). The L H is a collection of locally defined Lagrangians, which are still order 1 and convex, hence Sol L H can still be identified with the universal J ∞ (T ∗ M/ ). One way to define such identification is to use L, as in (2.8.1): ∼

dT M L : (Sol L H ) −→ J ∞ (T ∗ M/ ),

(2.8.4)

but the obvious counterpart of (2.8.1) fails in this case. Instead, (2.8.4) gives an isomorphism of the twisted sheaf, see (2.4.21), .

∼

ω¯

H

L Lie(O J ∞ (T ∗ M/ ) + H ) −→ H/ .

(2.8.5)

.

This attaches the twisted sheaf O J ∞ (T ∗ M/ ) + H to the Lagrangian L H . To see how the twist comes about note that the Legendre transform dT M L used in (2.8.4) does not respect the canonical variational 2-form ω L H , see (1.6.4). This can be straightened out locally. According to (1.6.13), one way to proceed is to choose, over Ui , the mapping to be dT M L (i) . Since L (i) = L + β (i) , 1 dT M L (i) (ξ ) = dT M L(ξ ) + ιξ β (i) , 2

(2.8.6)


as follows, e.g., from local formulas (1.6.14). But mappings (2.8.6) are incompatible on ˇ double intersections Ui ∩ U j , the obstruction being the Cech cocycle 1 dCˇ {β (i) } = {β ( j) − β (i) } ∈ Z Cech (M, 2,cl ˇ M ).

(2.8.7) .

In order to restore the compatibility, let us introduce the twisted sheaf O J ∞ (T ∗ M/ ) +

dCˇ {β (i) } obtained by twisting the gluing functions of O J ∞ (T ∗ M/ ) over Ui ∩ U j by the 2-form β ( j) − β (i) , as we did in (2.6.9). Then the collection of mappings ∼

{(dT M L (i) )∗ : O J ∞ (T ∗ M/ ) (Ui ) −→ O Sol L H (Ui ) delivers a map of the twisted sheaf . O J ∞ (T ∗ M/ ) + dCˇ {β (i) } → O Sol L H ,

(2.8.8a)

(2.8.8b)

so that the arising map ∼ . ω¯ L H Lie O J ∞ (T ∗ M/ ) + dCˇ {β (i) } −→ H/

(2.8.9)

is a Lie algebra sheaf isomorphism. It is explained in some detail in 2.6.2 that this sheaf . is the same as O J ∞ (T ∗ M/ ) + H , see (2.6.13); hence (2.8.9) is equivalent to (2.8.5). Incidentally, the classification of automorphisms of SVDO’s, Proposition 2.6.1b) is also accurately reflected in the Lagrangian approach. Given a globally defined Lagrangian and a closed 2-form β, a B-field, let L β = L + β(ρ(∂τ ), ρ(∂σ ))dτ ∧ dσ , cf. (2.8.3). This does nothing to either the corresponding equations of motion or the corresponding variational 2-form. Hence Sol L = Sol L β , literally, as pre-symplectic manifolds, but there arise two competing Legendre transforms, dT M L and dT M L β . A moment’s thought shows that the latter is the composition of the former with the B-field transform, ξ → ξ + ιξ β, and this provides the Lagrangian realization of the automorphism of the SVDO O J ∞ (T ∗ M/ ) associated to β in Proposition 2.6.1b. The subalgebras of integrals of motion I˜ L → (Sol L , Hω¯ L ), arising by virtue of /

Lemma 1.6.8.1, also tend to come from vertex Poisson subalgebras of $\mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)}$. For example, the three Virasoro algebras, left, right, and “half-twisted”, see (1.6.22,23), are the Lie-functor evaluated on the three subalgebras of $\Gamma(M_\Sigma, \mathcal{O}_{J^\infty(T^*M_\Sigma/\Sigma)})$ generated by
\begin{align*}
&\tfrac14 g^{ij}x_ix_j + \tfrac14 g_{ij}\,\partial_\sigma x^i\partial_\sigma x^j - \tfrac12 x_j\partial_\sigma x^j,\\
&-\tfrac14 g^{ij}x_ix_j - \tfrac14 g_{ij}\,\partial_\sigma x^i\partial_\sigma x^j - \tfrac12 x_j\partial_\sigma x^j, \tag{2.8.10}\\
&-x_j\partial_\sigma x^j,
\end{align*}
respectively. The global nature of these local formulas was unraveled in 1.6.9.2.

2.9. An example: WZW model. Let us see how all of this plays out in the case where the target manifold is a real Lie group G, either compact and simple or GL(n, R).


2.9.1. Classification. Let g = LieG be the corresponding Lie algebra. Fix an invariant bilinear form g ∈ S 2 (g∗ )g and an invariant trilinear form H (x, y, z) = g([x, y], z).

(2.9.1)

The left translates of these generate the invariant metric and 3-form (resp.) on G, which we will take the liberty of denoting by the same letters
\[ g\in H^0(G, \mathcal{T}_G^{\otimes 2}), \quad H\in H^0(G, \Omega^3_G). \tag{2.9.2} \]
Note that the latter is closed:
\[ H\in H^0(G, \Omega^{3,cl}_G). \tag{2.9.3} \]
It is well known that
\[ H^3(G, \mathbb{R}) = \mathbb{R}\cdot(\text{class of } H). \tag{2.9.4} \]

Therefore, Proposition 2.6.1a) implies that the set of isomorphism classes of SVDO’s on G form a 1-parameter family: def

.

S DG,k = O J ∞ (G / ) +

−k H. 2

(2.9.5)

As it was explained in Sect. 2.6.2, the structure of S DG,k is determined by the following: there is a fixed splitting

S DG,k

1

= TG / ⊕ G / ,

(2.9.6)

poiss and the vertex Poisson algebra structure makes DG,k into the Courant algebroid 1 that satisfies k (2.4.14, 15)hold true, and(0)s = − H, 2

(2.9.7)

cf. (2.4.21–23). Induced by the action on the left and on the right, there are the corresponding Lie algebra g = Lie G embeddings in the space of global vector fields jl0 : g → H 0 (G , TG / ), jr0 : g → H 0 (G , TG / ) s.t. [ jl0 (g), jr0 (g)] = 0. (2.9.8)

These embeddings respect the SVDO structure on S DG,0 1 in that jl0 ([x, y]) = jl0 (x)(0) jl0 (y) , jl0 (x)(n) jl0 (y) if n > 0,

(2.9.9a)

jr0 ([x, y]) = jr0 (x)(0) jr0 (y) , jr0 (x)(n) jr0 (y) if n > 0,

(2.9.9b)


and

jl0 (x)

(n)

jl0 (x) = 0 if n ≥ 0,

(2.9.9c)

as it follows from either (2.7.6) or (2.4.6). Technically, (2.9.9a–c) mean the following. Associated to g there is a $\mathbb{Z}_+$-graded vertex Poisson algebra, $V(g)_k$, see e.g. [FB-Z]. It is the universal vertex Poisson algebra generated by
\[ (V(g)_k)_0 = \mathbb{R}, \quad (V(g)_k)_1 = g, \tag{2.9.10} \]
such that for any x, y ∈ g,
\[ x_{(n)}y = \begin{cases} k\,g(x,y) & \text{if } n = 1,\\ [x,y] & \text{if } n = 0,\\ 0 & \text{if } n > 1. \end{cases} \tag{2.9.11} \]
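As a consistency check, relations (2.9.11) can be tested against the axioms of Definition 2.1. The sketch below uses only two standard facts not stated above: T annihilates constants, and constants act by zero under ${}_{(n)}$, n ≥ 0. It shows that the axioms force the form g to be symmetric and invariant, as assumed in 2.9.1.

```latex
% II.1 at n = 1 (all terms with j >= 1 vanish since y_{(n)}x = 0 for n > 1):
\[
 x_{(1)}y = y_{(1)}x \quad\Longleftrightarrow\quad k\,g(x,y) = k\,g(y,x).
\]
% II.1 at n = 0, using T(k\,g(y,x)) = 0:
\[
 x_{(0)}y = -\,y_{(0)}x + T\big(y_{(1)}x\big) = -[y,x] = [x,y].
\]
% II.2 with m = 1, n = 0, applied to z in g:
\[
 x_{(1)}\big(y_{(0)}z\big) - y_{(0)}\big(x_{(1)}z\big)
 = \big(x_{(0)}y\big)_{(1)}z + \big(x_{(1)}y\big)_{(0)}z
 \quad\Longleftrightarrow\quad
 k\,g(x,[y,z]) = k\,g([x,y],z).
\]
```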

By definition, (2.9.9a–c) imply that maps (2.9.8) can be extended to vertex Poisson algebra maps jl0 : V (g)0 → H 0 (G , S DG,0 ), jr0 : V (g)0 → H 0 (G , S DG,0 )

(2.9.12)

such that

jl0 (V (g)0 )

(n)

jr0 (V (g)0 ) = 0 if n ≥ 0.

(2.9.13)

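In order to carry this over to $k \neq 0$, the maps $j^0_{l/r}$ must be deformed. Let

\[ j_l^k : \mathfrak{g} \to (D^{poiss}_{G,k})_1, \quad j_l^k(x) = j_l^0(x) + \frac{k}{2}\, g(j_l^0(x), \cdot\,), \tag{2.9.14} \]

\[ j_r^k : \mathfrak{g} \to (D^{poiss}_{G,k})_1, \quad j_r^k(x) = j_r^0(x) - \frac{k}{2}\, g(j_r^0(x), \cdot\,). \tag{2.9.15} \]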

Theorem 2.9.1.1. [FP,F,AG,GMS2]. Maps (2.9.14,15) extend to vertex Poisson algebra embeddings jlk

jrk

V (g)k → H 0 (G , S DG,k ) ← V (g)−k

(2.9.16)

such that

jlk (V (g)k )

(n)

jrk (V (g)−k ) = 0 if n ≥ 0.

(2.9.17)

Remark 2.9.1.2. This appealing result has a long and somewhat unhappy history. A version of it first appeared in [FP] (in a more complicated, quantum, situation) but apparently had been known even earlier to E. Frenkel, [F] – all of this before the introduction of sheaves of vertex algebras – and then was thoroughly forgotten. Arkhipov and Gaitsgory [AG] gave a proof in the language of chiral algebras. Our presentation is close to [GMS2].


The algebra V (g)k has a well-known family of modules, Vλ,k , induced from Vλ , the simple finite dimensional g-module with highest weight λ, see e.g. [FBZ]. According to Theorem 2.9.1.1, H 0 (G , S DG,k ) is a V (g)k ⊗ V (g)−k -module, see Sect. 2.3 for the definition of the tensor product of vertex Poisson algebras.

Proposition 2.9.1.3. If k ≠ 0, then there is an isomorphism of V (g)k ⊗ V (g)−k -modules

∼ ˆ ⊕λ Vλ,k ⊗ Vλ∗ ,−k , H 0 (G , S DG,k ) −→ C ∞ ()⊗

(2.9.18)

where λ∗ stands for the highest weight of the g-module dual to Vλ .

Sketch of Proof. The validity of decomposition (2.9.18) for the subspace H 0 (G , (S DG,k )0 ) is the content of the Peter-Weyl theorem. It is not hard to deduce from (2.9.14,15) that any R-basis, B, of $j_l^k(\mathfrak{g}) \oplus j_r^k(\mathfrak{g})$ is a basis of H 0 (G , (S DG,k )1 ) over functions if and only if k ≠ 0. Hence, the entire H 0 (G , $D^{poiss}_{G,k}$ ) is the space of differential polynomials in B over functions. Decomposition (2.9.18) follows at once from the induced nature of modules Vλ,k .

Remark 2.9.1.4. A proof, in the quantum case, of (2.9.18) for a generic k first appeared in [FS]. Our proof goes through in the quantum case as well, again for a generic k. A recent preprint [Zh] shows what may happen at special values of k. Decomposition (2.9.18) is tantalizingly similar to the space of states of the WZW model, to which $SD_{G,k}$ is indeed intimately related.

2.9.2. WZW. Consider the standard σ-model Lagrangian with target G:

\[ L_\kappa = \frac{\kappa}{2}\, g\bigl((\partial_\tau - \partial_\sigma)x, (\partial_\tau + \partial_\sigma)x\bigr)\, d\tau \wedge d\sigma, \tag{2.9.19} \]

cf. (1.6.16), where g(., .) is the invariant metric (2.9.2) and κ is an arbitrary constant. Next use the 3-form H of (2.9.2) to obtain $L_\kappa^{-k/2H}$ as explained in (2.8.4–5). The WZW Lagrangian [W1] is

\[ L_{WZW} = L_{k/2}^{-\frac{k}{2}H}. \tag{2.9.20} \]

As follows from (2.8.5) and normalization (2.9.5), the sheaf $SD_{G,k}$ governs the theory associated to $L_\kappa^{-k/2H}$ for any κ. It is clear why the H-twist of (2.9.19) is needed: the pleasing decomposition (2.9.18) is valid only if k ≠ 0. Let us now explain the choice of κ made in (2.9.20). Recall that Lagrangian (2.9.19) is conformally invariant, i.e., the corresponding algebra of integrals of motion contains two Virasoro subalgebras Vir ± , see (1.6.17). It is easy to see that the twisted version, $L_\kappa^{-k/2H}$, is also conformally invariant, and Vir ± are still the corresponding integrals of motion. By virtue of (2.8.5) the Legendre transform delivers the embeddings

Vir ± → G, Lie S DG,k .

(2.9.21)


On the other hand, each V (g)k carries its own Virasoro element – a well-known fact. By virtue of Theorem 2.9.1.1, there arise then two more Virasoro subalgebras

Vir l → G, Lie S DG,k ← Vir r . (2.9.22) Lemma 2.9.2.1. Upon taking the images of (2.9.21–22) Vir + = Vir l , Vir − = Vir r

(2.9.23)

if and only if κ = k/2. This allows to compute the left/right moving subalgebra, see Definition 1.6.9.1. Corollary 2.9.2.2. The right moving subalgebra of WZW is Lie(C ∞ () ⊗ V (g)k ) and the left moving is Lie(C ∞ () ⊗ V (g)−k ). The Lie-functor appearing in 2.9.2.1–2 only obscures the matter, of course. Armed with the notion of a vertex Poisson algebra we can easily refine both Definition 1.6.9.1 and 2.9.2.1–2. The Lie algebra Vir itself is the Lie-functor applied to a certain vertex Poisson algebra, Vir. Embeddings (2.9.21–22) are engendered by vertex Poisson algebra embeddings of 4 copies of Vir: Vir ± → (G, S DG,k ),

(2.9.24)

Vir → (G, S DG,k ) ← Vir .

(2.9.25)

l

r

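Lemma 2.9.2.1 can be refined as follows: upon taking the images of (2.9.24,25)

\[ \mathrm{Vir}^+ = \mathrm{Vir}^l, \quad \mathrm{Vir}^- = \mathrm{Vir}^r \quad \text{iff} \quad \kappa = \frac{k}{2}. \tag{2.9.26} \]

Definition 1.6.9.1 can be similarly refined:

Definition 2.9.2.3. Let the left/right moving subalgebras of $SD_{G,k}$ be

\[ SD^+_{G,k} = \{ v \in SD_{G,k} \ \text{s.t.}\ v_{(n)} \mathrm{Vir}^- = 0 \ \text{if } n \ge 0 \}, \tag{2.9.27} \]

\[ SD^-_{G,k} = \{ v \in SD_{G,k} \ \text{s.t.}\ v_{(n)} \mathrm{Vir}^+ = 0 \ \text{if } n \ge 0 \}. \tag{2.9.28} \]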

The refined form of Corollary 2.9.2.2 is this: − + S DG,k = C ∞ () ⊗ V (g)k , S DG,k = C ∞ () ⊗ V (g)−k .

(2.9.29)

Proof 2.9.2.4. We will prove (2.9.26) and (2.9.29) from which Lemma 2.9.2.1 and Corollary 2.9.2.2 follow immediately. Proving (2.9.26) amounts to painstakingly translating from Sect. 2.9.1 to Sect. 2.9.2, the Legendre transform being the main tool. To facilitate bookkeeping, we will assume that G = GL(n, R); an extension via a faithful representation to compact Lie groups is immediate. Let then $x^{ij}$ be coordinates, $\partial_{ij} = \partial/\partial x^{ij}$, and $\{E_{ij}\}$ the standard basis of gl(n, R). The invariant metric is

\[ g = x_{t\alpha}\, dx^{\alpha j}\, x_{j\beta}\, dx^{\beta t}, \tag{2.9.30} \]

where $x_{t\alpha}$ are defined so that $x_{t\alpha} x^{\alpha j} = \delta_t^j$, and the summation w.r.t. repeated indices is always assumed.


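Embeddings (2.9.8) take the form

\[ j_l^0(E_{ij}) = x^{\alpha i}\, \partial_{\alpha j}, \tag{2.9.31} \]

\[ j_r^0(E_{ij}) = -x^{j\alpha}\, \partial_{i\alpha}. \tag{2.9.32} \]

By virtue of (2.9.30), definitions (2.9.14,15) read

\[ j_l^k(E_{ij}) = x^{\alpha i}\, \partial_{\alpha j} + \frac{k}{2}\, x_{j\gamma}\, \partial_\sigma x^{\gamma i}, \tag{2.9.33} \]

\[ j_r^k(E_{ij}) = -x^{j\alpha}\, \partial_{i\alpha} + \frac{k}{2}\, x_{\gamma i}\, \partial_\sigma x^{j\gamma}. \tag{2.9.34} \]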

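Finally, the elements that generate the two corresponding Virasoro vertex Poisson algebras inside $SD_{G,k}$, cf. (2.9.25), are

\[ \mathrm{Vir}^l = \Bigl\langle \frac{1}{k}\, j_l^k(E_{ij})\, j_l^k(E_{ji}) \Bigr\rangle, \quad \mathrm{Vir}^r = \Bigl\langle \frac{1}{k}\, j_r^k(E_{ij})\, j_r^k(E_{ji}) \Bigr\rangle. \tag{2.9.35} \]

To recapitulate all of this in terms intrinsic to the Lagrangian $L_\kappa^{-k/2H}$, one needs to use the twisted version of the Legendre transform, see (2.8.4), i.e., apply (1.6.13–14) not to $L_\kappa^{-k/2H}$ but to $L_\kappa^0$. This amounts to letting

\[ \partial_{ij} = \frac{\partial L_\kappa^0}{\partial(\partial_\tau x^{ij})}; \quad \text{thus} \quad \partial_{ij} = \kappa\, x_{\alpha i}\, \partial_\tau x^{\beta\alpha}\, x_{j\beta}. \tag{2.9.36} \]

Plugging this in (2.9.33–34) gives

\[ j_l^k(E_{ij}) = \Bigl( \kappa\, \partial_\tau x^{\alpha i} + \frac{k}{2}\, \partial_\sigma x^{\alpha i} \Bigr) x_{j\alpha}, \tag{2.9.37} \]

\[ j_r^k(E_{ij}) = \Bigl( -\kappa\, \partial_\tau x^{j\alpha} + \frac{k}{2}\, \partial_\sigma x^{j\alpha} \Bigr) x_{\alpha i}. \tag{2.9.38} \]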

It is pleasing to notice that precisely when κ = k/2, the latter formulas become the WZW currents, see [W1], (15) or [GW], (2.3),

\[ j_l^k(E_{ij}) = k\, \partial_+ x^{\alpha i}\, x_{j\alpha}, \tag{2.9.39} \]

\[ j_r^k(E_{ij}) = k\, \partial_- x^{j\alpha}\, x_{\alpha i}, \tag{2.9.40} \]

where $\partial_\pm = (\partial_\sigma \pm \partial_\tau)/2$. Now to the Virasoro subalgebras. Plugging (2.9.37,38) in (2.9.35) one finds similarly that precisely when κ = k/2 the corresponding Virasoro elements are

\[ \mathrm{Vir}^l = \langle k\, g(\partial_+ x, \partial_+ x) \rangle, \tag{2.9.41} \]

\[ \mathrm{Vir}^r = \langle k\, g(\partial_- x, \partial_- x) \rangle, \tag{2.9.42} \]

i.e., defined by the familiar, see (1.6.20), formulas for Vir ± . This concludes our proof of (2.9.26).
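
As a quick consistency check (a worked substitution added for illustration, not part of the original proof), one can set κ = k/2 in (2.9.37) and (2.9.38) and use $\partial_\pm = (\partial_\sigma \pm \partial_\tau)/2$ to recover (2.9.39) and (2.9.40):

```latex
\begin{align*}
j_l^k(E_{ij})\big|_{\kappa = k/2}
  &= \Bigl(\tfrac{k}{2}\,\partial_\tau x^{\alpha i}
         + \tfrac{k}{2}\,\partial_\sigma x^{\alpha i}\Bigr) x_{j\alpha}
   = k\,\tfrac{1}{2}(\partial_\sigma + \partial_\tau) x^{\alpha i}\, x_{j\alpha}
   = k\,\partial_+ x^{\alpha i}\, x_{j\alpha}, \\
j_r^k(E_{ij})\big|_{\kappa = k/2}
  &= \Bigl(-\tfrac{k}{2}\,\partial_\tau x^{j\alpha}
         + \tfrac{k}{2}\,\partial_\sigma x^{j\alpha}\Bigr) x_{\alpha i}
   = k\,\tfrac{1}{2}(\partial_\sigma - \partial_\tau) x^{j\alpha}\, x_{\alpha i}
   = k\,\partial_- x^{j\alpha}\, x_{\alpha i}.
\end{align*}
```

For any other value of κ the τ- and σ-derivatives enter with different weights, so neither current is proportional to a single light-cone derivative.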


Now to (2.9.29). Having at our disposal (2.9.26), we infer from Theorem 2.9.1.1 that − + C ∞ () ⊗ V (g)k ⊂ S DG,k , C ∞ () ⊗ V (g)−k ⊂ S DG,k .

(2.9.43)

To prove the reverse inclusions, let 1 1 L l = jlk (E i j ) jlk (E ji ), L r = jrk (E i j ) jrk (E ji ). k k It follows easily from the definition of the modules Vλ,k that def

def

KerL l(0) = V0,−k = V (g)−k , Ker L r(0) = V0,k = V (g)k .

(2.9.44)

By definition then − + C ∞ () ⊗ V (g)k ⊃ S DG,k , C ∞ () ⊗ V (g)−k ⊃ S DG,k ,

(2.9.45)

which concludes the proof of (2.9.29). 3. Supersymmetric Analogues 3.1. Bits of supergeometry. All of the geometric background of Sect. 1 allows more or less straightforward super-generalization. We will explain this very briefly, and in less generality, because our exposition will be more example-oriented. Such sources as [DM,L,M1] provide an introduction to supermathematics. 3.1.1. Super world-sheet. The world-sheet is now a 2|2-dimensional real C ∞ -manifold either with a fixed coordinate system ˆ → R2|2 (u, v, θ + , θ − ) :

(3.1.1a)

or a fixed étale coordinate system ˆ (u, v, θ + , θ − ) : R2|2 → ,

(3.1.1b)

θ±

where (u, v) are even and are odd. We have the underlying even manifold ˆ = {θ + = θ − = 0} →

(3.1.2)

ˆ (u,v) → .

(3.1.3)

and the bundle

The time-fibration will be defined to be the composition τ ˆ (u,v) → → ⊂ R

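(3.1.4)

for some fibration τ, where $\Sigma$ is an even manifold underlying $\hat\Sigma$. The Lie algebra of vector fields on $\hat\Sigma$ contains two remarkable elements

\[ D_+ = \frac{\partial}{\partial\theta^+} - \theta^+ \frac{\partial}{\partial u}, \quad D_- = \frac{\partial}{\partial\theta^-} - \theta^- \frac{\partial}{\partial v}. \tag{3.1.5} \]

The following relations hold true:

\[ [D_+, D_+] = -2\frac{\partial}{\partial u}, \quad [D_-, D_-] = -2\frac{\partial}{\partial v}, \quad [D_+, D_-] = 0, \quad \Bigl[\frac{\partial}{\partial v}, D_\pm\Bigr] = \Bigl[\frac{\partial}{\partial u}, D_\pm\Bigr] = 0. \tag{3.1.6} \]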
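
The first relation in (3.1.6) can be checked by hand. The sketch below assumes the usual conventions (not spelled out in this section): the bracket of two odd vector fields is their anticommutator, $\partial_{\theta}$ is a left derivative, and $\theta^+\theta^+ = 0$. Writing $\theta = \theta^+$ for brevity:

```latex
\begin{align*}
[D_+, D_+] &= 2 D_+^2, \\
D_+^2 &= (\partial_\theta - \theta\,\partial_u)(\partial_\theta - \theta\,\partial_u)
       = \partial_\theta^2 - \partial_\theta\,\theta\,\partial_u
         - \theta\,\partial_u\,\partial_\theta + \theta\,\theta\,\partial_u^2 \\
      &= 0 - (\partial_u - \theta\,\partial_\theta\,\partial_u)
         - \theta\,\partial_\theta\,\partial_u + 0
       = -\partial_u ,
\end{align*}
% hence [D_+, D_+] = -2 \partial_u; the computation for D_- is identical
% with (\theta^+, u) replaced by (\theta^-, v).
```

The mixed bracket $[D_+, D_-]$ vanishes for a similar reason: every cross term contains either $\partial_{\theta^+}\partial_{\theta^-} + \partial_{\theta^-}\partial_{\theta^+}$ or $\theta^+\theta^- + \theta^-\theta^+$, both of which are zero.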


3.1.2. Super-jets. Let M be a C ∞ -supermanifold with underlying even manifold M even . Define ˆ Mˆ = M × .

(3.1.7)

ˆ Mˆ → .

(3.1.8)

ˆ It is fibered over : The manifold of ∞-jets of sections of this bundle, J ∞ (Mˆ ), is defined in a straightforward manner as follows (cf. [BD, p.80]). Definition 3.1.2.1. J ∞ (Mˆ ) is a supermanifold with underlying even manifold J ∞ (Meven ) and the structure sheaf O J ∞ (Mˆ ) defined to be the symmetric algebra on Dˆ ⊗Oˆ O M modulo the relations 1 ⊗ f · 1 ⊗ g = 1 ⊗ f g, 1 ⊗ 1 = 1, ˜ ˜

ξ ⊗ f g = (ξ ⊗ f ) · (1 ⊗ g) + (−1)ξ f (1 ⊗ f ) · (ξ ⊗ g)

(3.1.9)

for any ξ ∈ Tˆ , f, g ∈ O M , where˜stands for the parity. There arises a fiber bundle

ˆ J ∞ Mˆ →

(3.1.10)

ρ : Tˆ → T J ∞ (Mˆ ) s.t. ρ(η)(ξ ⊗ f ) = (ηξ ) ⊗ f

(3.1.11)

with connection

in complete analogy with (1.1.5). The relative versions, such as J ∞ (M/ ), are immediate. ˆ Note that connection (3.1.11) is constant in the direction of (θ + , θ − ), i.e., if we let ∞ J (Mˆ )o = {θ + = θ − = 0} → J ∞ (Mˆ ), then there is a diffeomorphism

∞ (3.1.12) J (Mˆ ), ρ(∂θ ± ) → J ∞ (Mˆ )o × R0|2 , ρ o (∂θ ± ) = ∂θ ± of R0|2 -manifolds with connection. Indeed, given a local coordinate system X i on M, the collection i {X (m),() , u, v, θ + , θ − ; (m) ∈ Z2+ , () ∈ Z22 }

(3.1.13a)

constitutes a local coordinate system on J ∞ (Mˆ ), where i X (m = (∂um 1 ∂vm 2 ∂θ+1 ∂θ−2 ) ⊗ X. 1 ,m 2 ),(1 ,2 )

(3.1.13b)

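Letting

\[ \tilde F^i = (\partial_{\theta^-} \partial_{\theta^+}) \otimes X^i, \quad \psi_+^i = (\partial_{\theta^+}) \otimes X^i - \theta^- \tilde F^i, \quad \psi_-^i = (\partial_{\theta^-}) \otimes X^i + \theta^+ \tilde F^i, \]
\[ x^i = X^i - \theta^+ \psi_+^i - \theta^- \psi_-^i - \theta^+ \theta^- \tilde F^i, \tag{3.1.14} \]

we obtain another local coordinate system

\[ x^i_{(m)},\ \psi^i_{\pm,(m)},\ \tilde F^i_{(m)};\ u, v, \theta^+, \theta^-;\ (m) \in \mathbb{Z}_+^2,\ (\epsilon) \in \mathbb{Z}_2^2 \tag{3.1.15} \]

such that

\[ \partial_{\theta^\pm} x^i_{(m)} = \partial_{\theta^\pm} \psi^i_{\pm,(m)} = \partial_{\theta^\pm} \tilde F^i_{(m)} = 0, \tag{3.1.16} \]

and (3.1.12) follows. Note that change of variables (3.1.14) is nothing but the formal Taylor series expansion at $J^\infty(M_{\hat\Sigma})^o$:

\[ X^i = x^i + \theta^+ \psi_+^i + \theta^- \psi_-^i + \theta^+ \theta^- \tilde F^i. \tag{3.1.17} \]

Along M, $\{x^i\}$ are coordinates and

\[ \psi_\pm^i \ \text{transform as (even or odd)}\ dx^i. \tag{3.1.18} \]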
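
To see that (3.1.14) and (3.1.17) are indeed inverse to each other, one can differentiate the expansion (3.1.17) directly. The sketch below assumes left derivatives in $\theta^\pm$ and uses (3.1.16), i.e. that $x^i$, $\psi_\pm^i$, $\tilde F^i$ do not depend on $\theta^\pm$:

```latex
\begin{align*}
\partial_{\theta^+} X^i &= \psi_+^i + \theta^- \tilde F^i
  &&\Longrightarrow\quad \psi_+^i = \partial_{\theta^+} X^i - \theta^- \tilde F^i, \\
\partial_{\theta^-} X^i &= \psi_-^i - \theta^+ \tilde F^i
  &&\Longrightarrow\quad \psi_-^i = \partial_{\theta^-} X^i + \theta^+ \tilde F^i, \\
\partial_{\theta^-}\partial_{\theta^+} X^i &= \partial_{\theta^-}\bigl(\psi_+^i + \theta^- \tilde F^i\bigr)
  = \tilde F^i .
\end{align*}
% The sign in the second line comes from moving \partial_{\theta^-}
% past \theta^+ in the term \theta^+ \theta^- \tilde F^i.
```

These are exactly the formulas of (3.1.14), with $\partial_{\theta^\pm} X^i$ standing for $(\partial_{\theta^\pm}) \otimes X^i$.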

3.1.3. Differential equations. The definition and discussion of a submanifold $Sol \subset J^\infty(M_{\hat\Sigma})$ as the zero locus of a differential ideal J is quite parallel to Sect. 1.4. Since our exposition is strongly focused on one particular example, that of the (2,2)-supersymmetric σ-model, we will restrict ourselves to the case where J is locally generated by 4n functions, $E_\alpha^i$, 1 ≤ i ≤ n, 1 ≤ α ≤ 4, such that (cf. (1.4.1))

\[ E_1^i = \tilde F^i + \cdots, \quad E_2^i = \partial_\tau \psi_-^i + \cdots, \quad E_3^i = \partial_\tau \psi_+^i + \cdots, \quad E_4^i = \partial_\tau^2 x^i + \cdots, \tag{3.1.19} \]

where the omitted terms are independent of $\tilde F^{\cdot}$, of non-zero order jets of $\psi_\pm^{\cdot}$ in the direction of τ, and of order > 1 jets of $x^{\cdot}$ also in the direction of τ. (Here τ is the time-function (3.1.4), tacitly assumed to have been included in a coordinate system.) Letting

Sol o = {θ + = θ − = 0} → Sol,

(3.1.20)

one obtains a diffeomorphism of R0|2 -manifolds with connection ∼

(Sol, ρ(∂θ ± )) −→ (Sol o × R0|2 , ρ o (∂θ ± ) = ∂θ ± ),

(3.1.21)

by restricting (3.1.12). Note that Sol o is a D −, hence a D/ −, supermanifold; to emphasize the latter o . structure we will often write Sol If (3.1.19) holds, then (3.1.18) implies a diffeomorphism of D/ -manifolds ∼

o ∞ Sol (T (T M) ) , −→ J

where is the familiar parity change functor. Similarly,

∼ Sol −→ J ∞ T (T M)/ × R0|2 as D/ -manifolds. Both (3.1.22,23) are analogous to (1.4.2). ˆ

(3.1.22)

(3.1.23)


3.2. Functional pre-symplectic structure. The right framework for super-generalization ˆ [L,M1,DM]. of 1.5 is provided by integral, rather than differential, forms on 3.2.1. Recall that the sheaf of integral forms is defined to be I∗ˆ =

4 i=−∞

I iˆ s.t. I 4−i = i Tˆ ⊗Oˆ Ber( ˆ ), ˆ

(3.2.1)

where Ber( ˆ ) is the Berezinian of ˆ . By definition, I ∗ˆ is a locally free ∗ Tˆ -module defined by

Tˆ → EndOˆ (I∗ˆ ), ξ → ιξ , where ιξ β = ξ ∧ β. def

(3.2.2)

Next, I ∗ˆ carries a unique structure of a module over the Clifford algebra, Cl(Tˆ ⊕ ˆ ), such that j

i+ j

iˆ ⊗Oˆ I ˆ → I ˆ , α ⊗ β → α ∧ β, [ιξ , α∧] = α(ξ ).

(3.2.3)

The Berezinian, Ber( ˆ ), carries the Lie derivative operation Tˆ ⊗R Ber( ˆ ) → Ber( ˆ ), ξ ⊗ β → Lieξ β,

(3.2.4)

which is naturally extended to Tˆ ⊗R I iˆ → I iˆ , ξ ⊗ β → Lieξ β.

(3.2.5)

The sheaf of integral forms is a complex with differential d : I iˆ → I i+1 ˆ

(3.2.6a)

[d, ιξ ] = Lieξ , ξ ∈ Tˆ .

(3.2.6b)

determined by

Many other differential-geometric identities, such as [Lieξ , ιη ] = ι[ξ,η] ,

[Lieξ , β∧] = (Lieξ β)∧, ξ, η ∈ Tˆ , β ∈ ˆ ,

(3.2.7)

keep on holding true. ˆ carries a fixed (étale) coordinate system (u, v, θ + , θ − ), there Since our particular is an integral form [dθ + dθ − ] such that du ∧ dv ∧ [dθ + dθ − ] trivializes the Berezinian Ber( ˆ ). Letting [dθ ± ] = ι∂θ ∓ [dθ + dθ − ] = [dθ ± ], one discovers a part of I ∗ˆ pleasingly – and deceptively – similar to the de Rham complex; e.g., Lie∂θ ± [dθ + dθ − ] = 0, d([dθ + dθ − ]) = d([dθ ± ]) = 0, d(θ ± [dθ ∓ ] = [dθ + dθ − ]. (3.2.8)


Once a projection ˆ → is given, integration over fibers delivers a morphism I4ˆ → 2 , α → α o

(3.2.9)

which, in the case where the projection is (3.1.3), means that f (u, v, θ + , θ − )du ∧ dv ∧ [dθ + dθ − ] → ∂θ − ∂θ + f (u, v, θ + , θ − )du ∧ dv, (3.2.10) cf. (3.2.8). This is often referred to as integrating out θ + and θ − . 3.2.2. Back to super-presymplectic forms. Let M be either Sol or any version of an ˆ Let ∞-jet space considered in 3.1.2 that is fibered over . ˜ ∗,∗ = ∗ ⊗Oˆ I∗ˆ . M M/ˆ

(3.2.11)

If we wish to work in a relative situation determined by τ , see (3.1.4), then we write ∗ ˜ ∗,∗ = ∗ ⊗Oˆ I/ . ˆ M/ M/ˆ

(3.2.12)

In any case, we get a bi-complex with an obvious vertical differential ˜ ∗,i+1 ˜ ∗,i → δ: M/S M/S

(3.2.13)

˜ i,∗ → ˜ i+1,∗ , dρ/S : M/S M/S

(3.2.14)

and a horizontal differential

which owes its existence to connection (3.1.11) and is defined in exactly the same way as its counterpart in Sect. 3.1; here and elsewhere S is either or a point. ˜ ∗,∗ taken as a replacement of ∗,∗ , the discussion of Sect. 1.5.1–3 With M/ M/ carries over to the super-case practically word for word. For example, cf. (1.5.2), a 3,2 2,2 0 ˜ ˜ functional pre-symplectic form is ω ∈ H M, M/S /dρ M/S such that, ˜ 2,3 δω ∈ H 0 M, dρ/S . M/S

(3.2.15)

The outcome is the Lie superalgebra sheaf over M, ω HM /S .

(3.2.16)

Here is an operation that does not have an adequate purely even analogue. In all our examples, ∼

M −→ Mo × R0|2 ,


˜ 3,2 / in a way respecting the connection, cf. (3.1.12, 21, 23). Given ω ∈ H 0 M, M/S ˜ 2,2 , operation (3.2.9) produces ωo ∈ H 0 Mo , ˜ 2,2o . Integrat˜ 3,2o /dρ o dρ M/S M /S M /S ing over fibers one obtains a Lie algebra sheaf morphism, an isomorphism in fact, ∼

ω ω HM /S −→ HMo /S . o

(3.2.17)

As a practical matter, (3.2.17) amounts to carrying out a Taylor expansion as in (3.1.17) and then extracting the coefficient of θ + θ − [dθ + dθ − ] as in (3.2.10). Similar in spirit is a morphism ω ω → HM HM /

(3.2.18)

that relates the relative and absolute versions and amounts to letting dτ = 0, cf. (1.5.11). 3.2.3. Example: canonical commutation relations. Let M be an n-dimensional purely even C ∞ -manifold. The 2n|2n-dimensional supermanifold T ∗ (T M) carries a wellknown closed 2-form ωo . If we let {x i } be coordinates on M, then {x i , xi = ∂x i } along with their superpartners {φ i , φi } form a system of local coordinates on T ∗ (T M), and ωo = δxi ∧ δx i + δφi ∧ δφ i .

(3.2.19)

Now use the projection π : J ∞ (T ∗ (T M)/ ) → T ∗ (T M) to introduce π ∗ ωo , a closed 2-form on J ∞ (T ∗ (T M)/ ). A suitable analogue of Hcan , see 1.5.4, is provided by fixing a suitable σ so that σ , τ is a coordinate system on , see (3.1.4), letting ω = π ∗ ωo ∧ dσ,

(3.2.20)

and defining H˜ can = HωJ ∞ (T ∗ (T M)

/ )

.

(3.2.21)

The rest of the discussion in 1.5.4 carries over to the present situation practically word for word; we will not dwell upon this any longer. Note that in this example integral forms do not appear. The reader interested in an ω example of a full-fledged Lie superalgebra sheaf HM /S will have to wait for the discussion of the calculus of variations in Sect. 3.3. 3.2.4. Legendre transform? In practice, the manifold J ∞ (T (T M)/ ) may be more important than J ∞ (T ∗ (T M)/ ) because of (3.1.22). The possibility to apply H˜ can then rests on the existence of the diffeomorphism, cf. (1.5.21), g : J ∞ (T (T M)/ ) → J ∞ (T ∗ (T M)/ ),

(3.2.22)

because given (3.2.21) there arises at once a Lie algebra sheaf isomorphism, cf. Lemma 1.5.5.1, g # : Hg

∗ω

∼

−→ g −1 H˜ can .

(3.2.23)


Isomorphism (3.2.22), however, is a more subtle matter in the present situation than the usual Legendre transform. While the purely even manifolds underlying both the manifolds in (3.2.22) are the familiar J ∞ (T M/ ) and J ∞ (T ∗ M/ ), and they are easy to identify via a metric, the structure sheaves are more substantially different. The essence of this difference is that while M → OT (T M)

(3.2.24)

as a direct summand, its T ∗ (T M)-counterpart, T M , appears via the extension 0 → End M → A M → T M → 0,

(3.2.25)

where A M is the Atiyah algebra, i.e., the algebra of order 1 differential operators acting on the sections of M . One way to construct (3.2.22) seems to be this: split (3.2.25) by means of a connection ∇

←

0 → End M → A M → T M → 0,

(3.2.26)

and then identify ∼

M −→ T M

(3.2.27)

by means of a metric. This is exactly what the Lagrangian of a (1,1)-supersymmetric σ -model allows to do, see Sect. 3.4.3–4. 3.3. Calculus of variations. 3.3.1. The discussion of Sect. 1.6 carries over in a straightforward manner. Here are a few highlights. An action is

4,0 ˜ 3,0 ˜ ∞ /dρ

. A ∈ J ∞ Mˆ , (3.3.1) J M J∞ M ˆ

It is represented by a collection of Lagrangians ˜ 4,0 L = L ( j) ∈ ∞ J

ˆ

Mˆ

(U j )

(3.3.2)

determined up to dρ -exact terms and equal to each other on intersections Ui ∩ U j up to dρ -exact terms, cf. (1.6.0–1ab). An analogue of (1.6.2) is immediate, the outcome is a Dˆ -supermanifold Sol L with variational 1-form γ L and 2-form ω L = δγ L , cf. (1.6.4). The definition of a symmetry of L is also an obvious modification of (1.6.5). Noether’s Theorem 1.6.3 establishes a bijection between symmetries and integrals of motion as follows: ˜

ξ ↔ αξ + (−1)ξ +1 ιξ γ L ; the change of sign occurs when swapping ιξ and dρ as in (1.6.8).

(3.3.3)


Thus there arise the Lie algebra sheaf HωSolL L , containing the algebra of integrals of motion I˜L , its relative version, HωSolL L / , and morphisms HωSolL L → HωSolL L / , I˜L → (Sol L , HωSolL L ) → (Sol L , HωSolL L / ),

(3.3.4)

whose composition is an injection provided (3.1.19) holds. A familiar novelty is that in all of this θ ± can be integrated out. The result is this: the action

o 2,0

o , A ∈ J ∞ Mˆ (3.3.5) , ∞

o /dρ o 1,0 ∞ J

Mˆ

J

Mˆ

the Lagrangian L

( j)

∈

o (U j ) 2,0 J ∞ Mˆ

,

(3.3.6)

and, since nothing is gained or lost, the integrated version of (3.3.4) as follows: ωo ωo ωo ωo H SolL o → H SolL o / , I˜L → Sol Lo , H SolL o → (Sol Lo , H SolL o / ), (3.3.7) L

L

L

L

where Sol Lo is defined in (3.1.20) In view of (3.1.22), this means J ∞ (T (T M)/ ) equipped with ω L and an embedding L I˜L → (M, HωJ ∞ (T (T M)

/ )

).

(3.3.8)

We will now exhibit an example where L HωJ ∞ (T (T M)

/ )

∼

−→ H˜ can .

(3.3.9)

3.4. An example: (1,1)-supersymmetric σ -model. 3.4.1. Let M be an n-dimensional purely even Riemannian manifold with metric (., .). Analogously to Sect. 1.6.9, we observe that a point in J 1 (Mˆ ) is a triple (tˆ, X, ∂ X ), a ˆ a point in M, and a map point in , ˆ → TX M, ξ → ∂ξ X, ∂ X : Ttˆ

(3.4.1)

cf. (3.1.13a,b). Hence for fixed vector fields ξ , η, (ξ X, ηX ) is a global section of O J 1 (M ˆ ) . (Of course, to be precise, we should have used B-points.) We will unburden the notation by letting ξ X stand for ∂ξ X . Here is a coordinate expression for this function gi j (X )ξ X i ηX j .

(3.4.2)

The (1,1)-supersymmetric σ -model Lagrangian is defined to be L = (D+ X, D− X )du ∧ dv ∧ [dθ + dθ − ],

(3.4.3)


where the vector fields D± are from (3.1.5), cf. (1.6.13). Integrating out θ + and θ − gives (an exercise in differential geometry, see e.g. [QFS, p. 666]) L 11 = (−(∂u x, ∂v x) + (∇∂v x ψ+ , ψ+ ) + (∇∂u x ψ− , ψ− ) + (R(ψ+ , ψ− )ψ+ , ψ− ) − (F, F))du ∧ dv.

(3.4.4)

In this formula ∂u x = ∂u X |θ + =θ − =0 , ∂v x = ∂v X |θ + =θ − =0 , ψ± = ∂θ ± X |θ + =θ − =0

(3.4.5a)

and coincide with their namesakes from (3.1.14), ∇ is the Levi-Civita connection associated to the metric (., .), R is the curvature tensor, and F = ∇ D+ X D− X |θ + =θ − =0 ,

(3.4.5b)

which is somewhat different from its counterpart F˜ of (3.1.14). In fact, with a little extra effort the entire Taylor series expansion of L in θ + , θ − , cf. (3.1.17), can be computed to the effect that L = ((ψ+ , ψ− ) −θ + ((∂u x, ψ− ) + (ψ+ , F)) + θ− ((∂v x, ψ+ ) −(ψ− , F)) + θ + θ − L 11 )du ∧ dv ∧ [dθ + dθ − ].

(3.4.6)

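To see better what all of this means, let us write down the first three terms of (3.4.4) in local coordinates (3.1.14,17); the result is

\[ L_{11} = \bigl( -g_{ij}(x)\, \partial_u x^i\, \partial_v x^j + \partial_v x^\alpha\, \psi_+^i \psi_+^j\, \Gamma^s_{\alpha i}\, g_{sj}(x) + \partial_u x^\alpha\, \psi_-^i \psi_-^j\, \Gamma^s_{\alpha i}\, g_{sj}(x) + g_{ij}(x)\, \partial_v \psi_+^i\, \psi_+^j + g_{ij}(x)\, \partial_u \psi_-^i\, \psi_-^j \cdots \bigr)\, du \wedge dv. \tag{3.4.7} \]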
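
The fermionic terms of (3.4.7) are the coordinate form of the covariant-derivative terms of (3.4.4). A short sketch of the expansion, assuming the standard convention $(\nabla_\xi \psi)^s = \xi(\psi^s) + \Gamma^s_{\alpha i}\, \xi^\alpha \psi^i$ for the Levi-Civita connection (the convention is an assumption of this note, chosen to match the index placement in (3.4.7)):

```latex
\begin{align*}
(\nabla_{\partial_v x}\psi_+, \psi_+)
  &= g_{sj}(x)\,\bigl(\partial_v \psi_+^s
       + \Gamma^s_{\alpha i}\,\partial_v x^\alpha\,\psi_+^i\bigr)\,\psi_+^j \\
  &= g_{ij}(x)\,\partial_v \psi_+^i\,\psi_+^j
     + \partial_v x^\alpha\,\psi_+^i \psi_+^j\,\Gamma^s_{\alpha i}\, g_{sj}(x).
\end{align*}
% The even factors g, \Gamma, \partial_v x commute with everything,
% so only the relative order \psi_+^i \psi_+^j of the odd variables matters.
```

The term $(\nabla_{\partial_u x}\psi_-, \psi_-)$ expands in the same way with $(v, \psi_+)$ replaced by $(u, \psi_-)$.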

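Computation of $\delta L_{11}$, cf. (1.6.2), yields the Euler-Lagrange equations and a variational 1-form. The former are as follows, see also [QFS, p. 666]:

\[ \begin{aligned} F &= 0, \\ \nabla_{\partial_u x} \psi_- &= -R(\psi_+, \psi_-)\psi_+, \\ \nabla_{\partial_v x} \psi_+ &= -R(\psi_+, \psi_-)\psi_-, \\ \nabla_{\partial_u x} \partial_v x &= \frac{1}{2}\bigl( R(\psi_-, \psi_-)\partial_v x + R(\psi_+, \psi_+)\partial_u x \bigr) - (\nabla_{\psi_+} R)(\psi_-, \psi_-)\psi_+ . \end{aligned} \tag{3.4.8} \]

The latter is

\[ \begin{aligned} \gamma_L^o ={}& \bigl( -g_{ij}(x)\, \partial_v x^j\, \delta x^i + \psi_-^i \psi_-^j\, \Gamma^s_{\alpha i}\, g_{sj}(x)\, \delta x^\alpha \bigr)\, dv + \bigl( g_{ij}(x)\, \partial_u x^i\, \delta x^j - \psi_+^i \psi_+^j\, \Gamma^s_{\alpha i}\, g_{sj}(x)\, \delta x^\alpha \bigr)\, du \\ & - g_{ij}(x)\, \psi_+^j\, \delta\psi_+^i\, du + g_{ij}(x)\, \psi_-^j\, \delta\psi_-^i\, dv. \end{aligned} \tag{3.4.9} \]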

This, unlike more challenging (3.4.8), is a straightforward consequence of (3.4.7). Note that we have computed after projection (3.2.9), i.e., with θ ± integrated out, (3.2.10); nothing is gained or lost because γ L matters only modulo dρ -exact terms.


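3.4.2. (1,1)-supersymmetry. In addition to $D_\pm$, (3.1.5), $\hat\Sigma$ carries another pair of distinguished vector fields,

\[ \Xi_+ = \frac{\partial}{\partial\theta^+} + \theta^+ \frac{\partial}{\partial u}, \quad \Xi_- = \frac{\partial}{\partial\theta^-} + \theta^- \frac{\partial}{\partial v}. \tag{3.4.10} \]

They enjoy similar properties

\[ [\Xi_+, \Xi_+] = 2\frac{\partial}{\partial u}, \quad [\Xi_-, \Xi_-] = 2\frac{\partial}{\partial v}, \quad [\Xi_+, \Xi_-] = 0, \quad \Bigl[\frac{\partial}{\partial v}, \Xi_\pm\Bigr] = \Bigl[\frac{\partial}{\partial u}, \Xi_\pm\Bigr] = 0, \tag{3.4.11} \]

and

\[ [\Xi_\bullet, D_\bullet] = 0. \tag{3.4.12} \]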
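
Relation (3.4.12) can be verified in the only non-trivial case, equal signs, by a direct computation. The sketch assumes that the bracket of two odd vector fields is their anticommutator and that $\partial_\theta$ is a left derivative; $\theta$ stands for $\theta^+$, and the fields are $D_+ = \partial_\theta - \theta\,\partial_u$ and the supersymmetry generator $\partial_\theta + \theta\,\partial_u$ of (3.4.10), denoted $\Xi_+$ here:

```latex
\begin{align*}
\Xi_+ D_+ &= (\partial_\theta + \theta\,\partial_u)(\partial_\theta - \theta\,\partial_u)
           = -\partial_\theta\,\theta\,\partial_u + \theta\,\partial_u\,\partial_\theta
           = -\partial_u + 2\,\theta\,\partial_\theta\,\partial_u, \\
D_+ \Xi_+ &= (\partial_\theta - \theta\,\partial_u)(\partial_\theta + \theta\,\partial_u)
           = \partial_\theta\,\theta\,\partial_u - \theta\,\partial_u\,\partial_\theta
           = \partial_u - 2\,\theta\,\partial_\theta\,\partial_u, \\
[\Xi_+, D_+] &= \Xi_+ D_+ + D_+ \Xi_+ = 0 .
\end{align*}
% Used: \partial_\theta^2 = 0, \theta\theta = 0, and
% \partial_\theta (\theta f) = f - \theta\,\partial_\theta f.
```

The opposite-sign sign choice between the two families is what makes them commute: the $\pm\partial_u$ contributions cancel instead of adding up as they do in $[D_+, D_+]$.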

Relations (3.4.11) imply that def ˆ N 1+ = span{ f (u)+ } ⊂ Tˆ (),

(3.4.13)

def ˆ N 1− = span{ f (v)− } ⊂ Tˆ ()

are two commuting copies of the N=1- supersymmetric superalgebra Lie realized in ˆ note that each contains a copy, Vir ± , of the algebra of vector fields vector fields on ; on . In fact, both are subalgebras of the algebra of symmetries of L: N 1± → I˜L .

(3.4.14)

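Indeed, using (3.4.6), one computes easily that

\[ \mathrm{Lie}_{\rho f(u)\Xi_+} L_{11} = d_\rho\bigl( f(u) L_{01} \bigr)\, dv, \quad \mathrm{Lie}_{\rho f(v)\Xi_-} L_{11} = d_\rho\bigl( f(v) L_{10} \bigr)\, du. \tag{3.4.15} \]

It is then rather straightforward, and pleasing, to use (3.3.3, 3.4.9) in order to compute the corresponding integrals of motion

\[ \begin{aligned} Q_f^+ &\overset{\mathrm{def}}{=} Q_{\rho f(u)\Xi_+} = 2 f(u)\, g_{ij}(x)\, \psi_+^i\, \partial_u x^j\, du - g_{ij}(x)\, F^i \psi_-^j\, dv, \\ Q_f^- &\overset{\mathrm{def}}{=} Q_{\rho f(v)\Xi_-} = -2 f(v)\, g_{ij}(x)\, \psi_-^i\, \partial_v x^j\, dv + g_{ij}(x)\, F^i \psi_+^j, \end{aligned} \tag{3.4.16} \]

which, upon imposing the Euler-Lagrange equation F = 0, become

\[ Q_f^+ = 2 f(u)\, g_{ij}(x)\, \psi_+^i\, \partial_u x^j\, du, \quad Q_f^- = -2 f(v)\, g_{ij}(x)\, \psi_-^i\, \partial_v x^j\, dv. \tag{3.4.17} \]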

This furnishes the embeddings Vir ± → N 1± → (Sol Lo , HωSolL o ),

(3.4.18a)

L

and the definitions (cf. 1.6.11.1) of right/left moving subalgebras ωo ,±

H SolL o = {F ∈ HωSolL 0 : [F, Vir ∓ ] = 0}. L

(3.4.18b)

L

Next, we will see that all of this unfolds within the canonical Lie algebra sheaf H˜ can of Sect. 3.2.3.


Proposition 3.4.3. There is a diffeomorphism ∼

gˆ : Sol Lo −→ J ∞ (T ∗ (T M)/ ) of D/ -manifolds, which delivers the Lie algebra sheaf isomorphism ∼ g # : HωSolL ,± ˆ −1 H˜ can , o / −→ g L

cf. Lemmas 1.5.5.1 and 1.6.8.1.

3.4.4. Proof. Super-Legendre transform. In order to proceed, we need to make sure that $Sol_L$ as defined by (3.4.8) satisfies the Cauchy-Kovalevskaya condition (3.1.19). Apparently neither u nor v can play the role of time, but the following change of variables:

\[ u = \sigma + \tau, \quad v = \sigma - \tau \tag{3.4.19} \]

so that

\[ \partial_u = \frac{1}{2}(\partial_\sigma + \partial_\tau), \quad \partial_v = \frac{1}{2}(\partial_\sigma - \partial_\tau) \tag{3.4.20} \]

does the job. Therefore, cf. (3.1.22), ∼

Sol Lo −→ J ∞ (T (T M)/ ),

(3.4.21)

and our task is to find ∼

gˆ : J ∞ (T (T M)/ ) −→ J ∞ (T ∗ (T M)/ )

(3.4.22)

that identifies, modulo dρ/ -exact terms, ω L on the L.H.S. with the pull-back of the canonical ωo , (3.2.19), on the R.H.S. (g) ˆ ∗ ω|dτ =0 = ω L |dτ =0 + dρ/ (· · · ).

(3.4.23)
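
For the record, formulas (3.4.20) follow from (3.4.19) by the chain rule, and they also explain why τ can serve as time. The lines below are a routine check added for illustration:

```latex
\begin{align*}
\partial_\sigma &= \frac{\partial u}{\partial\sigma}\,\partial_u
                 + \frac{\partial v}{\partial\sigma}\,\partial_v
                = \partial_u + \partial_v, &
\partial_\tau   &= \frac{\partial u}{\partial\tau}\,\partial_u
                 + \frac{\partial v}{\partial\tau}\,\partial_v
                = \partial_u - \partial_v, \\
\partial_u &= \tfrac{1}{2}(\partial_\sigma + \partial_\tau), &
\partial_v &= \tfrac{1}{2}(\partial_\sigma - \partial_\tau), \\
\partial_u \partial_v &= \tfrac{1}{4}\bigl(\partial_\sigma^2 - \partial_\tau^2\bigr).
\end{align*}
```

In particular the leading term $\nabla_{\partial_u x}\partial_v x$ of the last equation in (3.4.8) contains $-\tfrac{1}{4}\partial_\tau^2 x$, which is the shape required of $E_4^i$ in (3.1.19), whereas in the variables u, v the same equation contains no second derivative in a single variable.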

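The variational 1-form $\gamma_L^o$, computed in (3.4.9), is not well suited for this purpose. In addition to (3.4.19), let us introduce variables

\[ \rho^j = \frac{1}{2}(\psi_-^j + \psi_+^j), \quad \phi^j = \frac{1}{2}(\psi_-^j - \psi_+^j). \tag{3.4.24} \]

(Since $\psi_\pm^j$ are sections of 2 copies of the bundle of 1-forms, see (3.4.5a) and (3.1.17,18), this change of variables makes sense globally.) Plugging these variables in Lagrangian (3.4.4) gives

\[ L_{11} = \Bigl( \frac{1}{2}(\partial_\tau x, \partial_\tau x) + 2(\nabla_{\partial_\tau x}\rho, \phi) + 2(\nabla_{\partial_\tau x}\phi, \rho) + \cdots \Bigr)\, d\tau \wedge d\sigma, \]

where · · · stand for the terms not containing $\partial_\tau$. Since

\[ (\nabla_{\partial_\tau x}\phi, \rho) = -(\phi, \nabla_{\partial_\tau x}\rho) + \partial_\tau(\phi, \rho) = (\nabla_{\partial_\tau x}\rho, \phi) + \partial_\tau(\phi, \rho), \tag{3.4.25} \]

we obtain

\[ \tilde L_{11} = \Bigl( \frac{1}{2}(\partial_\tau x, \partial_\tau x) + 4(\nabla_{\partial_\tau x}\rho, \phi) + \cdots \Bigr)\, d\tau \wedge d\sigma = L_{11} \ \mathrm{mod}\ d_\rho(\ldots). \tag{3.4.26} \]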
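
The numerical coefficients in the expression for $L_{11}$ preceding (3.4.25) can be traced as follows. This is a bookkeeping sketch, keeping only terms that contain $\partial_\tau$ and writing $\nabla_\tau$ for $\nabla_{\partial_\tau x}$; it uses $\psi_- = \rho + \phi$, $\psi_+ = \rho - \phi$ from (3.4.24), the substitution $\partial_u \to \tfrac12\partial_\tau$, $\partial_v \to -\tfrac12\partial_\tau$ from (3.4.20), and $du \wedge dv = 2\, d\tau \wedge d\sigma$ from (3.4.19):

```latex
\begin{align*}
-(\partial_u x, \partial_v x)
  &\;\longrightarrow\; \tfrac{1}{4}(\partial_\tau x, \partial_\tau x), \\
(\nabla_{\partial_v x}\psi_+, \psi_+) + (\nabla_{\partial_u x}\psi_-, \psi_-)
  &\;\longrightarrow\; \tfrac{1}{2}\bigl[(\nabla_\tau\psi_-, \psi_-) - (\nabla_\tau\psi_+, \psi_+)\bigr] \\
  &\;=\; \tfrac{1}{2}\bigl[(\nabla_\tau(\rho+\phi), \rho+\phi)
        - (\nabla_\tau(\rho-\phi), \rho-\phi)\bigr] \\
  &\;=\; (\nabla_\tau\rho, \phi) + (\nabla_\tau\phi, \rho).
\end{align*}
% Multiplying by du \wedge dv = 2\, d\tau \wedge d\sigma gives
% \tfrac{1}{2}(\partial_\tau x, \partial_\tau x)
%   + 2(\nabla_\tau\rho, \phi) + 2(\nabla_\tau\phi, \rho),
% and (3.4.25) then merges the two fermionic terms into 4(\nabla_\tau\rho, \phi)
% up to the total derivative 2\,\partial_\tau(\phi, \rho).
```

The sign flip $-(\phi, \nabla_\tau\rho) = (\nabla_\tau\rho, \phi)$ used in (3.4.25) reflects the fact that the symmetric pairing becomes antisymmetric on odd arguments.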


It is immediate to derive from (3.4.26) that the corresponding variational 1-form s α j γ L = (gi j ∂τ x j + 4iα ρ φ gs j (x))δx i ∧ dσ + 4gi j (x)δρ i φ j ∧ dσ,

(3.4.27)

equals γ L |dτ =0 modulo dρ -exact terms. If we let ρi =

1 gi j (x)φ j , 4

(3.4.28)

then s α ρ ρs )δx i ∧ dσ + δρ i ρi ∧ dσ. γ L = (gi j ∂τ x j + iα

(3.4.29)

s α ρ ρs , φ i → ρ i , φi → ρi xi → gi j ∂τ x j + iα

(3.4.30)

The substitution

makes sense as a globally defined map dT M L˜ 11 : T (T M) → T ∗ (T M).

(3.4.31)

It is a super-analogue of the Legendre transform, (1.6.13–14), which was envisaged in Sect. 3.2.4; indeed, if xi = ∂x i , then the first of assignments (3.4.30) is exactly splitting (3.2.26). The D/ -manifold property allows to extend this map unambiguously to the jet-spaces, and it is clear that such map identifies δγ L with ωo from (3.2.19). 3.4.5. Therefore, H˜ can is to the (1,1)-supersymmetric σ -model what Hcan is to the ordinary σ -model. In particular, writing integrals of motion (3.4.16) in terms of the new variables introduced in Sect. 3.4.4 provides a free field realization of N 1± . The result, which we will discuss in the context of the Kähler geometry, see the next section, is presumably the quasiclassical limit of the formulas obtained in [B-ZHS]. 3.4.6. The Kähler case: (2,2)-supersymmetry and the Witten Lie algebra. It is an exciting discovery going back to [Zu,A-GF] that in the Kähler case the supersymmetry algebra becomes twice as large. 3.4.6.1. Let then M be a complex manifold and (., .) a Kähler metric on it. To handle this case, we will change the notation somewhat: the natural vector bundles, such as T M, will be assumed to be complexified, and decompositions, such as T M = T 10 M ⊕T 01 M, will arise. What has been treated as a vector field, e.g. ∂τ x, ∂τ ψ+ , will become a section of T 10 M, and ∂τ x, ¯ ∂τ ψ¯ + will stand for the complex conjugate sections. We will also let, sloppily but customarily, ¯

j

j¯

∂τ x j = ∂τ x j , ψ± = ψ± .

(3.4.32)

The defining property of the Kähler metric ∇(T 10 ) ⊂ T 10 , ∇(T 01 ) ⊂ T 01 is crucial for what follows.

(3.4.33)


Computing as in 3.4.4 (and using (3.4.33)) one obtains L 11 = (−(∂u x, ∂u x) ¯ − (∂v x, ∂u x) ¯ + (∇∂v ψ+ , ψ¯ + ) +(∇∂u ψ− , ψ¯ − ) + · · · )du ∧ dv mod dρ , (3.4.34) where the terms not containing ∂u , ∂v are omitted. Property (3.4.33) implies (and (3.4.34) supports) that w.r.t. the grading on O J ∞ (Mˆ )o defined by ψ± → 1, ψ¯ ± → −1,

(3.4.35)

L 11 is homogeneous of degree 0. Therefore, any homogeneous component of a symmetry of L 11 is also a symmetry. Integrals of motion (3.4.16) afford decomposition +− Q +f = Q ++ f + Qf ,

−− Q −f = Q −+ f + Qf

(3.4.36)

into the sum of degree ±1 components, which implies that the entire quadruple +− −+ −− ˜ {Q ++ f , Q f , Q f , Q f } ⊂ IL ,

(3.4.37)

and this extends (3.4.18a) to an embedding of a pair of N=2-superconformal Lie algebras Vir ± → N 1± → N 2± → I˜ L → H˜ can .

(3.4.38)

In particular, (and this follows from the consideration of the degree) −− ++ −− [Q ++ f , Q g ] = [Q f , Q g ] = 0.

(3.4.39)

Witten has used these relations [W2,W3] to define what in the present context becomes Witten Lie algebra sheaves: {X ∈ H˜ can : [Q 1∓,∓ , X ] = 0} . {[Q ∓,∓ , X ] all X ∈ H˜ can }

W± =

def

(3.4.40)

1

(There are, of course, two more versions of these sheaves.) 3.4.6.2. Some formulas. For the purpose of writing embeddings such as (3.4.38) explicitly, rewrite (3.4.34) using σ and τ which were defined in (3.4.19), ¯

j¯

t L 11 = (gi j¯ ∂τ x i ∂τ x j − 2∂τ x i isj gs t¯ψ+ ψ+t¯ + 2∂τ x i i¯s¯ j¯ gs¯t¯ψ− ψ− )dτ ∧ dσ (3.4.41) ¯j j i¯ )dτ ∧ dσ · · · , + (2gi j¯ ψ+ ∂τ ψ+i − 2gi¯ j ψ− ∂τ ψ− j

where the terms not containing ∂τ are omitted. It follows that, cf. (3.4.27), ¯

¯

¯

t i γ L = (gi j¯ ∂τ x j − 2isj gs t¯ψ+ ψ+t¯ )δx i ∧ dσ + (gi j¯ ∂τ x i + 2 s¯j¯i¯ gs¯t ψ− ψ− )δx j ∧ dσ (3.4.42) j j¯ i¯ ) ∧ dσ. × (−2gi j¯ ψ+ δψ+i + 2gi¯ j ψ− δψ− j


If we let ψ = ψ+ , ψ¯ = ψ¯ − ,

(3.4.43a)

and j¯

j

ψi = −2gi j¯ ψ+ , ψi¯ = 2gi¯ j ψ− ,

(3.4.43b)

then ¯

¯

¯

γ L = (gi j¯ ∂τ x j + isj ψ j ψs )δx i ∧ dσ + (gi j¯ ∂τ x i + s¯j¯i¯ ψ i ψs¯ )δx j ∧ dσ ¯

× (ψi δψ i + ψi¯ δψ i ) ∧ dσ.

(3.4.44)

Therefore, the coordinate form of the super-Legendre transform (3.4.31) is ¯

xi → gi j¯ ∂τ x j + isj ψ j ψs , x • → x • ,

(3.4.45)

¯

x j¯ → gi j¯ ∂τ x i + s¯j¯i¯ ψ i ψs¯ , φ • → ψ • , φ• → ψ• .

Plugging these in (3.4.17) and extracting homogeneous components as in (3.4.36) one obtains, upon letting dτ = 0, ¯

¯

i j j Q −− f = f (σ − τ )(−x j¯ φ + gi j¯ ∂σ x φ )dσ, ¯

¯

¯

j ji ji s α Q −+ f = 2 f (σ − τ )(∂σ x φ j¯ − g x i φ j¯ + g iα φ φs φ j¯ )dσ,

(3.4.46− )

¯

j i j Q ++ ¯ ∂σ x φ )dσ, f = f (σ + τ )(x j φ + g ji ¯

¯

i ij i j s¯ α¯ Q +− f = −2 f (σ + τ )(∂σ x φi + g x j¯ φi − g j¯α¯ φ φs¯ φi )dσ.

(3.4.46+ )

One may wish at this point to use these formulas to compute Witten’s Lie algebra sheaf (3.4.40). Two things transpire immediately: first, the role played by f in all of this is rather superficial and, second, if one removes from the first of (3.4.45− ) the annoying gi j¯ ∂σ x i φ j (and gi j¯ ∂σ x i φ j from the first of (3.4.45+ ) resp.), then it becomes exactly ¯ (∂- resp.) differential; and so, perhaps, W ± should be of completely holomorphic the ∂(antiholomorphic resp.) nature. This is all true, but the language suited to analysis of such issues is that of vertex Poisson algebras. 3.5. Vertex Poisson algebra interpretation. Witten’s models. The sheaf H˜ can is the tip of an iceberg. It is, just as its purely even counterpart Hcan was, Sect. 1.5.4, a Lie algebra sheaf attached to a certain sheaf of vertex Poisson superalgebras.


3.5.1. The notion of a super-SVDO is quite analogous to the one we discussed in Sect. 2. It is a $\mathbb{Z}_+$-graded vertex Poisson superalgebra $V = V_0 \oplus V_1 \oplus \cdots$ such that

V0 = C ∞ (T U ), U ⊂ Rn , (3.5.1)

and, non-canonically,

V1 = TU / (U ) + U / (U ) + TU / (U ) + U / (U ) . (3.5.2)

Classification of such algebras [GMS3], under some obvious non-degeneracy assumptions, is obtained in a way similar to Sect. 2.4.3, 2.5. They form an $\Omega^{3,cl}(U)$-torsor, i.e., given a super-SVDO $V$ and a closed 3-form $H \in \Omega^{3,cl}(U)$, an operation

$$(V, H) \mapsto V \dotplus H \tag{3.5.3}$$

is defined, where $V \dotplus H$ is a super-SVDO different from $V$ only in that the operation ${}_{(0)}$

: TU / ⊗ TU / → V1

is replaced with

$${}_{(0)_H} = {}_{(0)} + H, \tag{3.5.4}$$

cf. (2.4.21). (This involves only even components of $V_1$.) One has, cf. (2.4.24),

$$\mathrm{Mor}(V, V \dotplus H) = \{\alpha \in \Omega^{2}(U)\ \text{s.t.}\ d\alpha = H\}. \tag{3.5.5}$$

In particular, cf. (2.4.25),

$$\Omega^{2,cl}(U) \xrightarrow{\ \sim\ } \mathrm{Aut}(V), \tag{3.5.6a}$$

where the automorphism corresponding to α is the one determined by the shear, cf. (2.4.19), TU / (U ) ξ → ξ + ιξ α.

(3.5.6b)

All of this can be defined over manifolds. There is a distinguished such sheaf of super-SVDOs, the vertex Poisson de Rham complex [MSV], $\Omega^{\mathrm{poiss}}_M$. As an $\mathcal{O}_M$-module, $\Omega^{\mathrm{poiss}}_M$

= π∗ O J ∞ (T ∗ (T M)/ ) ,

(3.5.7)

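where π is the projection J ∞ (T ∗ (T M)/ ) → M . The operations are determined by the requirement that they all be of classical origin, as in Proposition 2.7.1. Here are some examples written down in local coordinates:

$$(x_i)_{(0)} f(x) = \partial_{x^{i}} f(x), \quad (x_{\bar i})_{(0)} f(x) = \partial_{x^{\bar i}} f(x), \quad (\phi_i)_{(0)}\phi^{j} = \delta_i^{j}, \quad (\phi_{\bar i})_{(0)}\phi^{\bar j} = \delta_{\bar i}^{\bar j}, \quad \xi_{(0)}\eta = [\xi, \eta], \quad \xi_{(0)}\alpha = \mathrm{Lie}_\xi\,\alpha, \tag{3.5.8a}$$

where $\xi = f^{i}(x)x_i + f^{\bar i}(x)x_{\bar i}$, $\eta = g^{i}(x)x_i + g^{\bar i}(x)x_{\bar i}$, $\alpha = h_i(x)\rho(\partial_\sigma)x^{i} + h_{\bar i}(x)\rho(\partial_\sigma)x^{\bar i}$, the vertex algebra derivation being

$$T = \rho(\partial_\sigma). \tag{3.5.8b}$$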


(The twist that takes care of functions explicitly depending on $\sigma$, and can be imposed as in the even case, see Definition 2.7.2, has been tacitly assumed throughout.) One has, analogously to Proposition 2.6.1,

Proposition 3.5.1.1. a) The set of isomorphism classes of sheaves of super-SVDOs on $M$ is identified with $H^{3}(M, \mathbb{R})$.
b) If $V$ is a sheaf of super-SVDOs, then

$$\mathrm{Aut}\,V \xrightarrow{\ \sim\ } H^{0}(M, \Omega^{2,cl}_M). \tag{3.5.9}$$

Let $\mathrm{Lie}(V) = V/T(V)$. Operation ${}_{(0)}$ makes $\mathrm{Lie}\,V$ into a sheaf of Lie superalgebras. One has, cf. Proposition 2.7.3,

Proposition 3.5.2. The algebra sheaves $\tilde H_{can}$ and $\mathrm{Lie}(\Omega^{\mathrm{poiss}}_M)$ are isomorphic.

3.5.3. Some of the constructions above are simplified when performed in the framework of vertex Poisson superalgebras because some of the Lie algebras considered are the value of the Lie-functor. For example, there are N=1,2 supersymmetric vertex Poisson algebras [K], N 1 and N 2, such that the N=1,2 supersymmetric Lie superalgebras, which appeared in (3.4.13), are N 1 = Lie(C ∞ () ⊗ N 1), N 2 = Lie(C ∞ () ⊗ N 2).

(3.5.10)

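The elements, see (3.4.46),

$$Q^{--} = -x_{\bar j}\phi^{\bar j} + g_{i\bar j}\,\partial_\sigma x^{i}\phi^{\bar j}, \qquad Q^{-+} = 2\bigl(\partial_\sigma x^{\bar j}\phi_{\bar j} - g^{\bar j i}x_i\phi_{\bar j} + g^{\bar j i}\,\Gamma^{s}_{i\alpha}\,\phi^{\alpha}\phi_s\phi_{\bar j}\bigr), \tag{3.5.11$-$}$$

$$Q^{++} = x_j\phi^{j} + g_{j\bar i}\,\partial_\sigma x^{\bar i}\phi^{j}, \qquad Q^{+-} = -2\bigl(\partial_\sigma x^{i}\phi_i + g^{i\bar j}x_{\bar j}\phi_i - g^{i\bar j}\,\Gamma^{\bar s}_{\bar j\bar\alpha}\,\phi^{\bar\alpha}\phi_{\bar s}\phi_i\bigr), \tag{3.5.11$+$}$$

define global sections of $\Omega^{\mathrm{poiss}}_M$. By definition, the following analogue of (2.8.12) and (2.9.25) holds true.

Lemma 3.5.3.1. The two pairs of global sections $(Q^{--}, Q^{-+})$ and $(Q^{++}, Q^{+-})$ generate, inside $H^{0}(M, \Omega^{\mathrm{poiss}}_M)$, two pairwise Poisson-commuting copies of the vertex Poisson N=2 superalgebra:

$$N2^{+} \hookrightarrow H^{0}(M, \Omega^{\mathrm{poiss}}_M) \hookleftarrow N2^{-}, \qquad (N2^{+})_{(n)}(N2^{-}) = 0 \ \text{ if } n \ge 0. \tag{3.5.12}$$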


A streamlined version of Witten's Lie algebra sheaf (3.4.40) is Witten's vertex Poisson algebra sheaf defined as follows. Relations (3.4.39) in the vertex algebra context imply that each element of the quadruple $\{Q^{\bullet,\bullet}_{(0)},\ \bullet = \pm\}$ and various linear combinations thereof are differentials of the sheaf $\Omega^{\mathrm{poiss}}_M$. Letting $Q_{(0)}$ be one such differential, we obtain a cohomology sheaf

$$H_Q(\Omega^{\mathrm{poiss}}_M) \stackrel{\mathrm{def}}{=} \frac{\mathrm{Ker}\,Q_{(0)}}{\mathrm{Im}\,Q_{(0)}}. \tag{3.5.13}$$

It is a vertex Poisson algebra sheaf, a well-known fact and an immediate consequence of ${}_{(0)}$ being a derivation of all ${}_{(n)}$-products (super-analogue of Jacobi identity, Sect. 2.1, II.2). Of sheaves (3.5.13) the following 3 will be of interest to us:

Definition 3.5.3.2 (cf. [W2]).

$$\text{A-model sheaf}: \quad W_A = H_{Q^{--}+Q^{++}}(\Omega^{\mathrm{poiss}}_M), \tag{3.5.14a}$$
$$\text{B-model sheaf}: \quad W_B = H_{Q^{--}+Q^{+-}}(\Omega^{\mathrm{poiss}}_M), \tag{3.5.14b}$$
$$\text{half-twisted model sheaf}: \quad W_{1/2} = H_{Q^{--}}(\Omega^{\mathrm{poiss}}_M). \tag{3.5.14c}$$

The relation of (3.5.13–14) to (3.4.40) is that

$$W^{-} = \mathrm{Lie}(W_{1/2}), \tag{3.5.15}$$

to give but one example. The cohomology, $H^{*}(M, V)$, of a sheaf of vertex Poisson algebras $V$ is a vertex Poisson algebra, of course. We are led then, following [W2], to

Definition 3.5.3.3.

$$\text{A-model vertex Poisson algebra}: \quad H^{*}(M, W_A), \tag{3.5.16a}$$
$$\text{B-model vertex Poisson algebra}: \quad H^{*}(M, W_B), \tag{3.5.16b}$$
$$\text{half-twisted model vertex Poisson algebra}: \quad H^{*}(M, W_{1/2}). \tag{3.5.16c}$$

Theorem 3.5.4. Let $M$ be Kähler. Then
1) the following isomorphisms are valid:

$$H^{*}(M, W_A) \xrightarrow{\ \sim\ } H^{*}(M, \mathbb{C}), \tag{3.5.17a}$$
$$H^{*}(M, W_B) \xrightarrow{\ \sim\ } H^{*}(M, \Lambda^{*}T_M), \tag{3.5.17b}$$
$$H^{*}(M, W_{1/2}) \xrightarrow{\ \sim\ } H^{*}(M, \Omega^{\mathrm{poiss,an}}_M), \tag{3.5.17c}$$

where $\Omega^{\mathrm{poiss,an}}_M$ is a purely holomorphic version of the sheaf $\Omega^{\mathrm{poiss}}_M$ [MSV];
2) embedding $N2^{+} \to \Omega^{\mathrm{poiss}}_M$, (3.5.12), descends to an embedding $N2^{+} \to \Omega^{\mathrm{poiss,an}}_M$ whose image coincides with N=2 superconformal structure introduced on $\Omega^{\mathrm{poiss,an}}_M$.


Remark 3.5.4.1. 1) Of these, the first two are finite dimensional supercommutative algebras and as such are trivial examples of a vertex Poisson algebra with zero derivation T as noted in Sect. 2.3. Contrary to this, the last one is a full-fledged infinite dimensional vertex Poisson algebra. Being infinite dimensional it is characterized by its character (q-dimension), which is closely related to the elliptic genus of M. The algebra can be quantized, and the character of the quantum version has provided some insights into the elliptic genus [BL,MS,GM1,GM2]. 2) This theorem, especially (3.5.17c) is a refined version of [Kap]. In fact, Kapustin deals with the quantum version of this result; we will discuss quantization in the next section. poiss

3.5.4.2. Sketch of proof. Apply to M automorphism (3.5.6a-b) determined by the ¯ Kähler 2-form gi j¯ d x i ∧ d x j . As a result, (Q −− )(0) will be replaced with a vertex ana¯ logue of the ∂-differential: ¯ ∂¯ver t = (x j¯ φ j )(0) .

(3.5.18)

Essentially by definition, poiss,an

( M

poiss

, 0) → ( M

, ∂¯ver t ), ¯

¯

is a quasiisomorphism [MSV]. Indeed, a glance at (3.5.8a) convinces one that x j φ j are not ∂¯ver t -cocycles, and x j¯ φ j¯ are ∂¯ver t -cohomologous to 0. Therefore ∂¯ver t effectively kills all antiholomorphic variables, leaving holomorphic ones intact. This defines poiss poiss,an a purely holomorphic analogue of M , that is, M . Hence a quasiisomorphism poiss,an

( M

poiss

, 0) → ( M

, (Q −− )(0) ),

which proves (3.5.17c). In (3.5.17a-b) one more differential is turned on. Definition (3.5.11) implies that upon the same shear by the Kähler form,

$$Q^{++} = x_j\phi^{j}. \tag{3.5.19}$$

Therefore, $(Q^{--})_{(0)} + (Q^{++})_{(0)}$ is a vertex analogue of the total de Rham differential, and (3.5.17a) becomes essentially [MSV], Theorem 2.4. Similarly, in the $(Q^{--})_{(0)}$-cohomology,

$$Q^{+-} = -4\,\partial_\sigma x^{i}\phi_i, \tag{3.5.20}$$

and a simple analysis along the lines of [MSV], Sect. 2.3–2.4, shows that

$$H_{\partial_\sigma x^{j}\phi_j}(\Omega^{\mathrm{poiss,an}}_M) \xrightarrow{\ \sim\ } H^{*}(M, \Lambda^{*}T_M),$$

as desired. Item 2) is a result of checking (3.5.19, 20) against [MSV], (2.3b).

Next, we establish concrete complexes which compute vertex Poisson algebras of the A-, B-, and half-twisted models.


Corollary 3.5.5.

$$H^{*}(M, W_A) \xrightarrow{\ \sim\ } H_{Q^{--}+Q^{++}}(\Gamma(M, \Omega^{\mathrm{poiss}}_M)), \tag{3.5.21a}$$
$$H^{*}(M, W_B) \xrightarrow{\ \sim\ } H_{Q^{--}+Q^{+-}}(\Gamma(M, \Omega^{\mathrm{poiss}}_M)), \tag{3.5.21b}$$
$$H^{*}(M, W_{1/2}) \xrightarrow{\ \sim\ } H_{Q^{--}}(\Gamma(M, \Omega^{\mathrm{poiss}}_M)). \tag{3.5.21c}$$

Proof. The sheaf $\Omega^{\mathrm{poiss}}_M$ is a complex w.r.t. the 3 differentials used above. Hence there arise 3 different hypercohomology groups, $H_A(\Omega^{\mathrm{poiss}}_M)$, $H_B(\Omega^{\mathrm{poiss}}_M)$, $H_{1/2}(\Omega^{\mathrm{poiss}}_M)$. Each can be computed by either of the two spectral sequences. The computation using one of them is the content of Theorem 3.5.4. It says that the result is the vertex Poisson algebra of A-, B-, and half-twisted models resp. The computation using the other will then prove the corollary because, the sheaf $\Omega^{\mathrm{poiss}}_M$ being flabby, $H^{j}(M, \Omega^{\mathrm{poiss}}_M) = 0$ if $j > 0$.

Remark. In view of Theorem 3.5.4, isomorphisms (3.5.21a,b) are vertex Poisson algebra versions of the de Rham complex and the $\bar\partial$-resolution of the algebra of polyvector fields resp., while (3.5.21c) is the $\bar\partial$-resolution of the vertex Poisson de Rham complex.

3.5.6. H-flux. Let us now give, along the lines of Sect. 2.8, a Lagrangian interpretation of twisted sheaves of super-SVDOs which arise via (3.5.3) and are parametrized by $H^{3}(M, \mathbb{R})$, see Proposition 3.5.1.1. Fix $H \in \Gamma(M, \Omega^{3,cl}_M)$, a closed 3-form; a cover $\{U_i\}$ of $M$; and a collection of 2-forms $\beta^{(i)} \in \Gamma(U_i, \Omega^{2}_M)$ s.t. $d\beta^{(i)} = H|_{U_i}$. Having noticed that $\beta^{(j)}(D_+X, D_-X)$ is naturally a section of the structure sheaf of the jet space over $U_j$, introduce the $H$-twist of Lagrangian (3.4.3) as follows, cf. (2.8.5):

$$L_H = \{L + \beta^{(j)}(D_+X, D_-X)\,du\wedge dv\wedge[d\theta^{+}d\theta^{-}]\}. \tag{3.5.22}$$

The argument parallel to that leading to (2.8.5) proves the following. Lemma 3.5.6.1. ωo

H H˜ SolL o

LH

poiss .

∼

−→ Lie( M

+ H ),

poiss .

where M

+ H is defined as in (3.5.3).

Therefore, all the constructions originating in [GHR] and further explored in papers such as [BLPZ,KL] translate into different vertex Poisson subalgebras of $\Omega^{\mathrm{poiss}}_M \dotplus H$, which depend on a choice of a generalized Kähler structure.

3.6. Quantization. B-model moduli.

This section is an announcement. It will be assumed throughout that the automorphism by the Kähler form has been performed so that

$$Q^{--} = \phi^{\bar j}x_{\bar j}, \qquad Q^{++} = \phi^{i}x_i, \quad \text{cf. Sect. 3.5.4.2.} \tag{3.6.0}$$

3.6.1. The differential graded sheaves of vertex Poisson algebras, $(\Gamma(M, \Omega^{\mathrm{poiss}}_M), Q_{(0)})$, where $Q$ is any of the differentials appearing in (3.5.21a,b,c), can be quantized. What we mean by this is that, first, there is a sheaf of vertex algebras $\Omega^{\mathrm{vert}}_M$ [MSV] whose quasiclassical limit is $\Omega^{\mathrm{poiss}}_M$ and, second, this sheaf carries quantum analogues of each of the 3 differentials. In fact, quantum versions of $(Q^{--})_{(0)}$ and $(Q^{++})_{(0)}$ are in [MSV], and $(Q^{+-})_{(0)}$ has been recently proposed in [B-ZHS]; in what follows the use of the latter is easy to avoid. Thus there arise 3 vertex algebra versions of A-, B-, and half-twisted models resp.:

$$H^{*}(M, W_A^{\mathrm{quant}}) \xrightarrow{\ \sim\ } H_{Q^{--}+Q^{++}}(\Gamma(M, \Omega^{\mathrm{vert}}_M)), \tag{3.6.1a}$$
$$H^{*}(M, W_B^{\mathrm{quant}}) \xrightarrow{\ \sim\ } H_{Q^{--}+Q^{+-}}(\Gamma(M, \Omega^{\mathrm{vert}}_M)), \tag{3.6.1b}$$
$$H^{*}(M, W_{1/2}^{\mathrm{quant}}) \xrightarrow{\ \sim\ } H_{Q^{--}}(\Gamma(M, \Omega^{\mathrm{vert}}_M)). \tag{3.6.1c}$$

The first two, $H^{*}(M, W_A^{\mathrm{quant}})$ and $H^{*}(M, W_B^{\mathrm{quant}})$, coincide with their quasiclassical limits (3.5.21a,b). The 3rd is quite different from its quasiclassical limit and equals the cohomology of the chiral de Rham complex, $H^{*}(M, \Omega^{\mathrm{ch}}_M)$ [MSV]. Relation of this naive quantization to the genuine quantum string theory is expressed by saying, in physics language, that the latter equals the former "perturbatively", [Kap]. But let us show that both (3.6.1b,c) can be further deformed along the Barannikov-Kontsevich moduli space [BK]. We will focus on the half-twisted model (3.6.1c).

3.6.2. Recall that associated (by Deligne, see [GoM,Kon,BK]) to any differential Lie superalgebra $(g = g^{0} \oplus g^{1}, d)$ there is a deformation functor, $\mathrm{Def}_g$, with domain the category of Artin algebras and range the category of sets. In order to define it, introduce the space of solutions to the Maurer-Cartan equation with values in an Artin algebra $A$:

$$MC_g(A) = \Bigl\{\gamma : d\gamma + \tfrac{1}{2}[\gamma, \gamma] = 0,\ \gamma \in (g \otimes A)^{1}\Bigr\}. \tag{3.6.2}$$

The operation

$$(g \otimes A)^{1} \ni \gamma \mapsto d\beta + [\gamma, \beta] \quad \text{if } \beta \in (g \otimes A)^{0} \tag{3.6.3}$$

does not preserve the set $MC_g(A)$, but it does so infinitesimally, see a lucid explanation in [M2], Ch. 2, Sect. 9. Exponentiating (3.6.3) gives a group action

$$G(A)^{0} \times MC_g(A) \to MC_g(A). \tag{3.6.4}$$

Define

$$\mathrm{Def}_g(A) = MC_g(A)/G(A)^{0}. \tag{3.6.5}$$

The motivation behind this ([M2], Ch. 2, Sect. 9) is that (i) if $\gamma$ is a solution of the Maurer-Cartan equation, then $d + [\gamma, .]$ is also a differential, and (ii) the adjoint action of $g^{0}$ results in the action on solutions of the Maurer-Cartan equation defined in (3.6.3).


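Barannikov and Kontsevich apply this functor in the case where

$$g_{BK} = \Gamma(M, \Omega^{0,*}_M \otimes T^{*,0}_M), \quad d = \bar\partial, \quad [.,.] \text{ is the Schouten-Nijenhuis bracket.} \tag{3.6.6}$$

Our task is similar but somewhat different. We need, see (3.6.1c), to deform $(Q^{--})_{(0)}$ within the class of differentials on the vertex algebra $\Gamma(M, \Omega^{\mathrm{vert}}_M)$. Even though the latter is not a Lie algebra, this deformation problem is governed by the differential Lie superalgebra

$$(\hat g, d, [.,.]) \stackrel{\mathrm{def}}{=} \bigl(\Gamma(M, \mathrm{Lie}(\Omega^{\mathrm{vert}}_M)),\ (Q^{--})_{(0)},\ {}_{(0)}\bigr), \tag{3.6.7a}$$

where

$$\Gamma(M, \mathrm{Lie}(\Omega^{\mathrm{vert}}_M)) = \Gamma(M, \Omega^{\mathrm{vert}}_M / T(\Omega^{\mathrm{vert}}_M)). \tag{3.6.7b}$$

Indeed,

$${}_{(0)} : \hat g \otimes \Gamma(M, \Omega^{\mathrm{vert}}_M) \to \Gamma(M, \Omega^{\mathrm{vert}}_M) \tag{3.6.8}$$

makes $\Gamma(M, \Omega^{\mathrm{vert}}_M)$ a $\hat g$-module, on which $\hat g$ operates by derivations. Furthermore,

$$\bigl((Q^{--})_{(0)} + \gamma_{(0)}\bigr)^{2} = (Q^{--}_{(0)}\gamma)_{(0)} + \tfrac{1}{2}(\gamma_{(0)}\gamma)_{(0)}. \tag{3.6.9}$$

Hence, if $\gamma$ satisfies the Maurer-Cartan equation, then $(Q^{--})_{(0)} + \gamma_{(0)}$ is a differential. Let us define then

$$\mathrm{Def}_{\Gamma(M, \Omega^{\mathrm{vert}}_M)} = \mathrm{Def}_{\hat g}. \tag{3.6.10}$$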
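The mechanism behind (3.6.9) can be checked in a toy model. The sketch below is an illustration only and is not the vertex algebra setting: the odd operators $(Q^{--})_{(0)}$ and $\gamma_{(0)}$ are replaced by 2×2 integer matrices `Q` and `gamma` with `Q` squaring to zero, so the supercommutator of two odd elements becomes the anticommutator and $\tfrac12[\gamma,\gamma]$ becomes the matrix square. All names (`matmul`, `maurer_cartan`, `deformed_square`) are hypothetical helpers introduced here.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(*ms):
    """Entrywise sum of square matrices of equal size."""
    n = len(ms[0])
    return [[sum(m[i][j] for m in ms) for j in range(n)] for i in range(n)]


def maurer_cartan(q, gamma):
    """Toy Maurer-Cartan expression d(gamma) + (1/2)[gamma, gamma].

    Assumption of the toy model: d = [q, .] with q and gamma both odd,
    so d(gamma) = q*gamma + gamma*q and (1/2)[gamma, gamma] = gamma*gamma.
    """
    return matadd(matmul(q, gamma), matmul(gamma, q), matmul(gamma, gamma))


def deformed_square(q, gamma):
    """Square of the deformed operator q + gamma."""
    d = matadd(q, gamma)
    return matmul(d, d)


ZERO = [[0, 0], [0, 0]]
Q = [[0, 1], [0, 0]]          # Q*Q = 0, a toy differential
GOOD = [[1, -2], [1, -1]]     # chosen so that Q + GOOD squares to zero
BAD = [[0, 0], [1, 0]]        # Q + BAD squares to the identity

for gamma in (GOOD, BAD):
    # Because Q*Q = 0, the square of Q + gamma equals the Maurer-Cartan expression.
    print(maurer_cartan(Q, gamma), deformed_square(Q, gamma))
```

In this toy model the deformed operator is again a differential exactly when the Maurer-Cartan expression vanishes, which is the statement used in the text; `GOOD` satisfies the equation and `BAD` does not.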

By definition,

$$g_{BK} \subset \Gamma(M, \Omega^{\mathrm{vert}}_M), \tag{3.6.11}$$

which, by virtue of (3.6.7b), gives a map, an injection in fact,

$$\iota : g_{BK} \to \hat g. \tag{3.6.12}$$

It is not a differential Lie algebra homomorphism, but its twisted version

$$\iota_{Q^{++}} : g_{BK} \to \hat g, \quad a \mapsto Q^{++}_{(0)}\iota(a) \tag{3.6.13}$$

is; here $Q^{++}$ is a vertex analogue of the $\partial$-differential; it has appeared in (3.6.1) and is defined by the same formula as its quasiclassical limit (3.6.0). Indeed, it is a pleasing exercise to check that the Schouten-Nijenhuis bracket can be expressed in purely vertex algebra terms, cf. Proposition 1.1 in [Get],

$$\iota([a, b]) = \iota(a)_{(0)}\bigl(Q^{++}_{(0)}\iota(b)\bigr). \tag{3.6.14}$$

Therefore

$$\iota_{Q^{++}}([a, b]) = Q^{++}_{(0)}\bigl(\iota(a)_{(0)}Q^{++}_{(0)}\iota(b)\bigr) = \bigl(Q^{++}_{(0)}\iota(a)\bigr)_{(0)}Q^{++}_{(0)}\iota(b) = \iota_{Q^{++}}(a)_{(0)}\,\iota_{Q^{++}}(b).$$

Note that morphism (3.6.13) changes the parity, as it should, because $g_{BK}$ is an odd Lie superalgebra. This proves


Lemma 3.6.2.1. Map (3.6.13) defines a morphism of functors

$$\mathrm{Def}_{g_{BK}} \to \mathrm{Def}_{\Gamma(M, \Omega^{\mathrm{vert}}_M)}. \tag{3.6.15}$$

If $M$ is a Calabi-Yau manifold, then $\mathrm{Def}_{g_{BK}}$ is represented by a formal scheme that is the formal neighborhood of 0 of the superspace $H^{*}(M, \Lambda^{*}T_M)$ [BK]. In particular, there exists a generic formal solution of the Maurer-Cartan equation in variables chosen to be any basis of the dual space $(H^{*}(M, \Lambda^{*}T_M))^{*}$. Therefore,

Corollary 3.6.2.2. If $M$ is a Calabi-Yau manifold, then there is a family of vertex algebras

$$H^{*}(M, W_{1/2}^{\mathrm{quant}})_t \xrightarrow{\ \sim\ } H_{Q^{--}_t}(\Gamma(M, \Omega^{\mathrm{vert}}_M)), \tag{3.6.16}$$

with base the formal neighborhood of 0 in the superspace $H^{*}(M, \Lambda^{*}T_M)$.

Some of these deformations are not so formal; for example, $(Q^{--})_{(0)}$ itself depends quite explicitly on the choice of a complex structure, see (3.6.0); this can be extended by including generalized complex structures [G]; and considerable work has been done in order to interpret other points of the Barannikov-Kontsevich moduli space.

3.6.3. Vertex Frobenius manifolds? It appears that there is more than just that to this story. The events unfolding in the conformal weight zero component of $H_{Q^{--}_t}(\Gamma(M, \Omega^{\mathrm{vert}}_M))$ are precisely the Barannikov-Kontsevich construction of the Frobenius manifold structure on $\mathrm{Def}_{g_{BK}}$. Furthermore, it is plausible that each line of [BK] has a vertex algebra analogue valid up to homotopy. For example, operation ${}_{(-1)}$ makes each vertex algebra into a homotopy associative commutative algebra [LZ]. Furthermore, the order 2 differential operator defined on $g$, which is essential for [BK], has a vertex analogue; this analogue is $(Q^{++})_{(1)}$, which is well defined precisely when $M$ is a Calabi-Yau manifold [MSV]. It is also an order 2 differential operator of sorts in that

$$[(Q^{++})_{(1)}, a_{(-1)}] - (Q^{++}_{(1)}a)_{(-1)} = (Q^{++}_{(0)}a)_{(0)}, \tag{3.6.17}$$

which is a derivation of all ${}_{(n)}$-multiplications, a remark of Lian and Zuckerman, [LZ], Lemma 2.1. What all of this seems to indicate is that there is a reasonable definition of a vertex Frobenius manifold of which $H_{Q^{--}_t}(\Gamma(M, \Omega^{\mathrm{vert}}_M))$ is an important example.

Acknowledgements. The author thanks V.Gorbounov, A.Kapustin, and B.Khesin for illuminating discussions. Parts of this work were done while the author was visiting the Fields Institute, IHES, Max-Planck-Institut in Bonn, and Erwin Schrödinger Institut in Vienna. It is a pleasure to acknowledge the support, hospitality, and stimulating atmosphere of these institutions.

References

[AKSZ] Aleksandrov, M., Kontsevich, M., Schwarz, A., Zaboronsky, O.: The geometry of the master equation and topological quantum field theory. Internat. J. Mod. Phys. A12, 1405–1430 (1997)
[AG] Arkhipov, S., Gaitsgory, D.: Differential operators on the loop group via chiral algebras. Int. Math. Res. Not. 2002(4), 165–210 (2002)
[A-GF] Alvarez-Gaumé, L., Freedman, D.Z.: Geometrical structure and ultraviolet finiteness in the supersymmetric σ-model. Commun. Math. Phys. 80(3), 443–451 (1981)
[BK] Barannikov, S., Kontsevich, M.: Frobenius manifolds and formality of Lie algebras of polyvector fields. Int. Math. Res. Not. 1998(4), 201–215 (1998)
[BD] Beilinson, A., Drinfeld, V.: Chiral algebras. American Mathematical Society Colloquium Publications 51, Providence, RI: Amer. Math. Soc., 2004
[B-ZHS] Ben-Zvi, D., Heluani, R., Szczesny, M.: Supersymmetry of the chiral de Rham complex. http://arXiv.org/list/math.QA/0601532, 2006
[BL] Borisov, L., Libgober, A.: Elliptic genera of toric varieties and applications to mirror symmetry. Inv. Math. 140(2), 453–485 (2000)
[BLPZ] Bredthauer, A., Lindström, U., Persson, J., Zabzine, M.: Generalized Kähler geometry from supersymmetric sigma models. Lett. Math. Phys. 77, 291–308 (2006)
[Bre] Bressler, P.: The first Pontryagin class. http://arxiv.org/math.AT/0509563, 2005
[DF] Deligne, P., Freed, D.: Classical field theory. In: Quantum fields and strings: A course for mathematicians, v.1, P. Deligne et al., editors, Providence, RI: Amer. Math. Soc., 2000
[DM] Deligne, P., Morgan, J.: Notes on supersymmetry (following J. Bernstein). In: Quantum fields and strings: A course for mathematicians, v.1, P. Deligne et al., editors, Providence, RI: Amer. Math. Soc., 2000
[Di] Dickey, L.A.: Soliton equations and Hamiltonian systems, Second edition. Advanced Series in Mathematical Physics 26, River Edge, NJ: World Scientific Publishing Co., Inc., 2003
[Dor] Dorfman, I.Ya.: Dirac structures of integrable evolution equations. Phys. Lett. A 125(5), 240–246 (1987)
[Fad] Faddeev, L.D.: The Feynman integral for singular Lagrangians (in Russian). Teoret. Mat. Fiz. 1(1), 3–18 (1969)
[FP] Feigin, B., Parkhomenko, S.: Regular representation of affine Kac-Moody algebras. In: Algebraic and geometric methods in mathematical physics (Kaciveli, 1993), Math. Phys. Stud. 19, Dordrecht: Kluwer Acad. Publ., 1996, pp. 415–424
[F] Frenkel, E.: Private communication
[FL] Frenkel, E., Losev, A.: Mirror symmetry in two steps: A-I-B. Commun. Math. Phys. 269, 39–86 (2007)
[FB-Z] Frenkel, E., Ben-Zvi, D.: Vertex algebras and algebraic curves. Mathematical Surveys and Monographs 88, Providence, RI: Amer. Math. Soc., 2001
[FS] Frenkel, I., Styrkas, K.: Modified regular representations of affine and Virasoro algebras, VOA structure and semi-infinite cohomology. http://arXiv.org/math.QA/0409117, 2004
[GHR] Gates, S.J. Jr., Hull, C.M., Roček, M.: Twisted multiplets and new supersymmetric nonlinear σ-models. Nucl. Phys. B 248(1), 157–186 (1984)
[GW] Gepner, D., Witten, E.: String theory on group manifolds. Nucl. Phys. B 278(3), 493–549 (1986)
[Get] Getzler, E.: A Darboux theorem for Hamiltonian operators in the formal calculus of variations. Duke Math. J. 111(3), 535–560 (2002)
[GoM] Goldman, W.M., Millson, J.J.: The deformation theory of representations of fundamental groups of compact Kähler manifolds. Inst. Hautes Études Sci. Publ. Math. 67, 43–96 (1988)
[GM1] Gorbounov, V., Malikov, F.: Vertex algebras and the Landau-Ginzburg/Calabi-Yau correspondence. Moscow Math. J. 4(3), 729–779 (2004)
[GM2] Gorbounov, V., Malikov, F.: The chiral de Rham complex and the positivity of the equivariant signature of the loop space. http://arXiv.org/math.AT/0205132, 2002
[GMS1] Gorbounov, V., Malikov, F., Schechtman, V.: Gerbes of chiral differential operators. II. Vertex algebroids. Inv. Math. 155, 605–680 (2004)
[GMS2] Gorbounov, V., Malikov, F., Schechtman, V.: On chiral differential operators over homogeneous spaces. Int. J. Math. Math. Sci. 26(2), 83–106 (2001)
[GMS3] Gorbounov, V., Malikov, F., Schechtman, V.: Gerbes of chiral differential operators. III. http://arXiv.org/list/math.AG/0005201, 2000
[G] Gualtieri, M.: Generalized complex geometry. http://arXiv.org/list/math.DG/0401221, 2004
[HK] Heluani, R., Kac, V.G.: Supersymmetric vertex algebras. Commun. Math. Phys. 271, 103–178 (2007)
[K] Kac, V.: Vertex algebras for beginners. 2nd edition, Providence, RI: Amer. Math. Soc., 1998
[Kap] Kapustin, A.: Chiral de Rham complex and the half-twisted sigma-model. http://arXiv.org/list/hep-th/0504074, 2005
[KL] Kapustin, A., Li, Yi.: Topological sigma-models with H-flux and twisted generalized complex manifolds. http://arXiv.org/list/hep-th/0407249, 2004
[Kon] Kontsevich, M.: Deformation quantization of Poisson manifolds. Lett. Math. Phys. 66(3), 157–216 (2003)
[L] Leites, D.: Introduction to the theory of supermanifolds. Russ. Math. Surv. 35(1), 1–64 (1980)
[LZ] Lian, B.H., Zuckerman, G.J.: New perspectives on the BRST-algebraic structure of string theory. Commun. Math. Phys. 154(3), 613–646 (1993)
[LWX] Liu, Z.-J., Weinstein, A., Xu, P.: Manin triples for Lie bialgebroids. J. Diff. Geom. 45, 547–574 (1997)
[M1] Manin, Yu.I.: Gauge field theory and complex geometry. Grundlehren 289, Berlin-Heidelberg-New York: Springer-Verlag, 1988
[M2] Manin, Yu.I.: Frobenius manifolds, quantum cohomology, and moduli spaces. Colloquium Publications 47, Providence, RI: Amer. Math. Soc., 1999
[MS] Malikov, F., Schechtman, V.: Deformations of vertex algebras, quantum cohomology of toric varieties, and elliptic genus. Commun. Math. Phys. 234(1), 77–100 (2003)
[MSV] Malikov, F., Schechtman, V., Vaintrob, A.: Chiral de Rham complex. Commun. Math. Phys. 204, 439–473 (1999)
[Ol] Olver, P.J.: Applications of Lie groups to differential equations. Graduate Texts in Mathematics 107, New York: Springer-Verlag, 1986
[QFS] Quantum fields and strings: A course for mathematicians, v.1, 2, P. Deligne et al., eds., Providence, RI: Amer. Math. Soc., 2000
[S] Schwarz, A.: Symplectic formalism in conformal field theory. In: Symétries Quantiques, Les Houches, Session LXIV, 1995, A. Connes, K. Gawedzki, Zinn-Justin, eds., Elsevier Science B.V., 1998
[T] Takens, F.: A global version of the inverse problem of the calculus of variations. J. Differ. Geom. 14(4), 543–562 (1979)
[V] Vinogradov, A.M.: Cohomological analysis of partial differential equations and secondary calculus. Translations of Mathematical Monographs 204, Providence, RI: Amer. Math. Soc., 2001
[W1] Witten, E.: Nonabelian bosonization in two dimensions. Commun. Math. Phys. 92(4), 455–472 (1984)
[W2] Witten, E.: Mirror manifolds and topological field theories. In: Essays on mirror symmetry, S.T. Yau, ed., Hong Kong: International Press, 1992
[W3] Witten, E.: On the Landau-Ginzburg description of N = 2 minimal models. Int. J. Mod. Phys. A9, 4783–4800 (1994)
[W4] Witten, E.: Two-Dimensional Models With (0,2) Supersymmetry: Perturbative Aspects. http://arXiv.org/list/hep-th/0504078, 2005
[Z] Zuckerman, G.J.: Action principles and global geometry. In: Mathematical aspects of string theory (San Diego, Calif., 1986), Adv. Ser. Math. Phys. 1, Singapore: World Sci. Publishing, 1987, pp. 259–284
[Zh] Zhu, M.: Vertex operator algebras associated to modified regular representations of affine Lie algebras. http://arXiv.org/list/math/0611517, 2006
[Zu] Zumino, B.: Supersymmetry and Kähler manifolds. Phys. Lett. 27B, 203 (1979)

Communicated by L. Takhtajan

Commun. Math. Phys. 278, 549–566 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0397-x


The Ground State Energy of Heavy Atoms: Relativistic Lowering of the Leading Energy Correction

Rupert L. Frank1,*, Heinz Siedentop2, Simone Warzel3

1 Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden. E-mail: [email protected]
2 Mathematisches Institut, Ludwig-Maximilians-Universität München, Theresienstraße 39, 80333 München, Germany. E-mail: [email protected]
3 Department of Mathematics, Princeton University, Princeton, NJ 08544-1000, USA. E-mail: [email protected]

Received: 16 February 2007 / Accepted: 2 May 2007
Published online: 21 December 2007 – © R.L. Frank, H. Siedentop and S. Warzel 2007

© 2007 The authors. Reproduction of this article for non-commercial purposes by any means is permitted.
* Current address: Department of Mathematics, Princeton University, Princeton, NJ 08544-1000, USA

Abstract: We describe atoms by a pseudo-relativistic model that has its origin in the work of Chandrasekhar. We prove that the leading energy correction for heavy atoms, the Scott correction, exists. It turns out to be lower than in the non-relativistic description of atoms. Our proof is valid up to and including the critical coupling constant. It is based on a renormalization of the energy whose zero level we adjust to be the ground-state energy of the corresponding non-relativistic problem. This allows us to roll the proof back to results for the Schrödinger operator.

1. Introduction

The energy of heavy atoms has attracted considerable interest that dates back to the advent of quantum mechanics. As in classical mechanics it soon became clear that the exact solution of problems involving more than two particles interacting through Coulomb forces is not possible. Thomas [61] and Fermi [22,23] introduced their description of such atoms by the particle density, and Lenz [31], who wrote down the corresponding energy functional which we will use here (see (7)), addressed this question and derived that the ground state energy of atoms should decrease with the atomic number $Z$ as $Z^{7/3}$. Scott predicted that this could be refined by an additive $Z^{2}$-correction. Considerably later Schwinger [47] also argued for Scott's prediction. Schwinger [48] and Englert and Schwinger [10–12] even refined these considerations by adding more lower order terms (see also Englert [9]).

The challenging conjecture that the formulae predicted by Thomas and Fermi yield asymptotically correct results in leading order when compared with the $N$-particle Schrödinger theory was settled by Lieb and Simon in their seminal paper [37]. Alternative proofs were given by Thirring [60] (lower bound), Lieb [34], and Balodis and Solovej [41]. The Scott correction was established by Hughes [26,27] (lower bound), and Siedentop and Weikard [49–53] (lower and upper bound). In fact, even the existence of the $Z^{5/3}$-correction conjectured by Schwinger was proven (Fefferman and Seco [18–20,13,21,16,14,15,17]). Later these results were extended in various ways, e.g., the Scott correction to ions (Bach [1,2]), to molecules (Ivrii and Sigal [29], Solovej and Spitzer [59,58], Balodis [4]), and to molecules in the presence of magnetic fields (Sobolev [56] and Ivrii [30]). Ivrii [28] extended the validity of Schwinger's correction to the molecular case.

Nevertheless, from a physical point of view, these considerations are questionable, since large atoms force the bulk of the electrons onto orbits that are close to the nucleus (of order $Z^{-1/3}$), where the electrons move with high speed, which requires a relativistic treatment. Schwinger [48] has estimated these effects, concluding that they should contribute to the Scott correction whereas the leading term should be unaffected by the change of model. Sørensen [45] was the first to prove that the Thomas-Fermi term is indeed left unaffected when the non-relativistic Hamiltonian is replaced by the Chandrasekhar operator in the limit of large $Z$ and large velocity of light $c$ with $\kappa := Z/c$ fixed. Cassanas and Siedentop [5] showed that, similarly to the Chandrasekhar case, the leading energy is not affected for the Brown-Ravenhall operator. Recently, Solovej, Sørensen, and Spitzer [57] announced a proof that a correction is at most of the order $Z^{2}$, although no claim on the actual value of the coefficient was made. (See also Sørensen [44] for the non-interacting case.)

In the present paper, we give an alternate proof of the Scott correction of the Chandrasekhar operator, which we present, for simplicity, in the atomic case. Our proof relies heavily on a semi-classical approximation for electrons that are far enough from the nucleus. However, we use it only indirectly, relying on known results about the non-relativistic Scott correction.
In addition we use only relatively standard technical means such as Lieb-Thirring and Hardy inequalities. Our basic strategy is a renormalization of the energy, setting the energy of the Schrödinger atom as zero. Moreover, we are able to extend the result of [57] to the case of the critical coupling constant. However, the question of whether the Schwinger correction, which lives on the scale $Z^{-2/3}$, also exists in this relativistic model cannot be answered with our techniques and is, therefore, left open.

The energy of a heavy atom is described by a quadratic form

$$\mathcal{E}^{\#} : Q_N \to \mathbb{R}, \qquad \psi \mapsto \Bigl\langle \psi, \Bigl[\sum_{\nu=1}^{N}\bigl(T^{\#}_{\nu} - Z|x_\nu|^{-1}\bigr) + \sum_{1\le\mu<\nu\le N}|x_\mu - x_\nu|^{-1}\Bigr]\psi\Bigr\rangle \tag{1}$$

with

$$Q_N := \bigwedge_{\nu=1}^{N}\bigl(C_0^{\infty}(\mathbb{R}^{3}) \otimes \mathbb{C}^{q}\bigr). \tag{2}$$

The superscript # refers to the following two operators, which are self-adjointly realized in $L^{2}(\mathbb{R}^{3}) \otimes \mathbb{C}^{q}$:

Chandrasekhar operator: $T^{C} := \sqrt{c^{2}p^{2} + c^{4}} - c^{2}$,
Schrödinger operator: $T^{S} := \tfrac{1}{2}p^{2}$.

The parameter $q \in \mathbb{N}$ represents the possible number of spin states per electron, which physically has the value 2; $Z$ is the atomic number, $c$ is the velocity of light, and $N$ is the electron number. We use units in which $m = e^{2} = \hbar = 1$.
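As a quick numerical sanity check on the two kinetic energies (a sketch only, not part of the paper's argument), the snippet below evaluates $T^{C}$ and $T^{S}$ as functions of the momentum magnitude $p$. The function names and the cancellation-free rewriting of $T^{C}$ are choices made here for illustration; the value `C_LIGHT` is the physical speed of light in the units $m = e^{2} = \hbar = 1$, taken as an assumption.

```python
import math

# Assumption: physical speed of light in units m = e^2 = hbar = 1
# (the inverse fine-structure constant).
C_LIGHT = 137.035999


def chandrasekhar_kinetic(p, c=C_LIGHT):
    """T^C(p) = sqrt(c^2 p^2 + c^4) - c^2.

    Evaluated in the algebraically equivalent form
    c^2 p^2 / (sqrt(c^2 p^2 + c^4) + c^2), which avoids the
    subtraction of two nearly equal numbers at small p.
    """
    cp2 = (c * p) ** 2
    return cp2 / (math.sqrt(cp2 + c ** 4) + c ** 2)


def schroedinger_kinetic(p):
    """T^S(p) = p^2 / 2."""
    return 0.5 * p ** 2


for p in (0.01, 1.0, 100.0, 1.0e4):
    print(p, chandrasekhar_kinetic(p), schroedinger_kinetic(p))
```

The rewritten form makes the pointwise bound $T^{C} \le T^{S}$ visible, since the denominator is at least $2c^{2}$. For $p \ll c$ the two symbols agree to leading order, while for $p \gg c$ the Chandrasekhar energy grows only linearly, staying below $cp$.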


A word about names: we address operators of the form $T^{C} + V$ with a potential $V$ as Chandrasekhar operators, since the use of this kinetic energy can be traced back at least to Chandrasekhar's semiclassical treatment of the stability of stars [6], where $T^{C}$ can be viewed as the underlying operator. Later the use of $T^{C}$ has been investigated by Weder [62] and by Herbst [25]. In the literature the operator is sometimes addressed as a pseudo-relativistic operator or Herbst operator.

In the following we assume that the system is neutral, i.e., $Z = N$, an assumption that we make mainly because of notational convenience. It follows from Kato's inequality (with sharp constant), $(2/\pi)|x|^{-1} \le |p|$, that the Chandrasekhar form $\mathcal{E}^{C}$ is bounded from below if and only if

$$\kappa := Z/c \le 2/\pi. \tag{3}$$

Henceforth we assume this condition. The ground state energy of a heavy atom with atomic number $Z$ is given by

$$E^{\#}_{(\kappa)}(Z) := \inf\bigl\{\mathcal{E}^{\#}(\psi) \ \big|\ \psi \in Q_N,\ \|\psi\| = 1\bigr\}, \tag{4}$$

where # refers, as above, either to the Chandrasekhar or the Schrödinger operator, the former being dependent additionally on $\kappa$. We are interested in $E^{C}_{\kappa}(Z)$. However, $E^{S}(Z)$ will also play an essential role, namely in regularizing the energy. Note that $\mathcal{E}^{C} \le \mathcal{E}^{S}$, which implies that $E^{C}_{\kappa}(Z) \le E^{S}(Z)$. Our main result strengthens a result by Solovej, Sørensen, and Spitzer [57] in the atomic case to the critical value of the coupling constant.

Theorem 1. Let $\kappa \in (0, 2/\pi]$ and $q \in \mathbb{N}$. In the limit $Z \to \infty$ with $\kappa = Z/c$ fixed and $N = Z$,

$$E^{C}_{\kappa}(Z) = E^{S}(Z) - q\,s(\kappa)\,Z^{2} + o(Z^{2}), \tag{5}$$

where

$$s(\kappa) := \kappa^{-2}\,\mathrm{tr}\Bigl[\bigl(\sqrt{p^{2}+1} - 1 - \kappa|x|^{-1}\bigr)_{-} - \bigl(\tfrac{1}{2}p^{2} - \kappa|x|^{-1}\bigr)_{-}\Bigr]. \tag{6}$$

In (6) we used the notation $A_- := -A\chi_{(-\infty,0)}(A)$ for the negative part of a self-adjoint operator $A$. Several remarks apply:

1. As already mentioned in the introduction, the asymptotics of the ground-state energy $E^S(Z)$ of the Schrödinger atom up to $o(Z^2)$ is given by the Thomas-Fermi energy and the Scott correction. To state this result precisely we introduce the Thomas-Fermi functional (Lenz [31])

$\mathcal{E}_{TF}(\rho) := \int_{\mathbb{R}^3} \Big( \tfrac35 \gamma_{TF}\,\rho(x)^{5/3} - \frac{Z}{|x|}\rho(x) \Big)\,dx + D(\rho,\rho)$, (7)

where, in our units, $\gamma_{TF} = (6\pi^2/q)^{2/3}/2$, and where

$D(\rho,\sigma) := \frac12 \int_{\mathbb{R}^3\times\mathbb{R}^3} \frac{\rho(x)\sigma(y)}{|x-y|}\,dx\,dy$

is the Coulomb scalar product. We define

$E^{TF}(Z) := \inf\{ \mathcal{E}_{TF}(\rho) \mid \rho \in L^1(\mathbb{R}^3) \cap L^{5/3}(\mathbb{R}^3),\ \rho \ge 0 \}$ (8)


R. L. Frank, H. Siedentop, S. Warzel

to be the minimal Thomas-Fermi energy. By scaling one finds that $E^{TF}(Z) = E^{TF}(1) Z^{7/3}$. The asymptotic formula

$E^S(Z) = E^{TF}(Z) + \tfrac14 q Z^2 + O(Z^{47/24})$ (9)

was proven in [49,50]; for a lower bound only, see Hughes [26,27]. Inserting this into (5) one finds that

$E^C_\kappa(Z) = E^{TF}(Z) + \big(\tfrac14 - s(\kappa)\big) q Z^2 + o(Z^2)$. (10)

2. The spectral shift $s(\kappa)$ is monotone increasing with respect to $\kappa$ and strictly positive for $\kappa > 0$. Indeed, by scaling $x \to x/\kappa$,

$s(\kappa) = \operatorname{tr}\Big( \big[\sqrt{\kappa^{-2}p^2 + \kappa^{-4}} - \kappa^{-2} - |x|^{-1}\big]_- - \big[\tfrac12 p^2 - |x|^{-1}\big]_- \Big)$,

and $\sqrt{\kappa^{-2}p^2 + \kappa^{-4}} - \kappa^{-2}$ is monotone decreasing with respect to $\kappa$.

3. It is part of our assertion that the operator in brackets in (6) belongs to the trace class. In the subcritical case $\kappa < 2/\pi$ this was already proved by Sørensen [44].

Since neither the Schrödinger nor the Chandrasekhar operator depends explicitly on spin, we shall assume henceforth $q = 1$; the general case follows along the same lines. We prove Theorem 1 in Sect. 3 after having established a precise bound on the spectral shift for one-particle operators in the next section.

2. Bound on the Spectral Shift

For any real-valued potential $v$ for which the following operators can be defined according to Friedrichs, we set

$S(v) := \tfrac12 p^2 - v$, (11)
$C(v) := \sqrt{p^2+1} - 1 - v$, (12)

the Schrödinger, respectively Chandrasekhar, operator in $L^2(\mathbb{R}^3)$. We assume $c = 1$ throughout this section. If the potential $v$ is radially symmetric, both the Schrödinger and the Chandrasekhar operator commute with the angular momentum operators, allowing for a decomposition into the corresponding invariant subspaces. For each $l \in \mathbb{N}_0$ the subspace $H_l$ spanned by the spherical harmonics $Y_{l,m}$ with $m = -l, \ldots, l$ is an invariant subspace of $S(v)$ and $C(v)$, and $\bigoplus_{l=0}^\infty H_l = L^2(\mathbb{R}^3)$. We write $\Lambda_l$ for the orthogonal projection onto $H_l$ and $\operatorname{tr}_l(A) := \operatorname{tr}(\Lambda_l A)$ for the corresponding reduced trace. Our main result in this section concerns the decay of the spectral shift

$\operatorname{tr}_l\big( [C(v)]_- - [S(v)]_- \big)$ (13)

as the angular momentum $l$ increases. We shall prove


Theorem 2. There exists a constant $M$ such that for all $\mu \ge 0$, for all $l \in \mathbb{N}_0$, and for all $v : [0,\infty) \to [0,\infty)$ satisfying

$v(r) \le \tfrac{2}{\pi} r^{-1}$ (14)

the sum of eigenvalue differences for angular momentum $l$ is bounded according to

$0 \le \operatorname{tr}_l\big( [C(v)+\mu]_- - [S(v)+\mu]_- \big) \le M (l+1)^{-2}$. (15)

This theorem shows that there is an effective cancelation in the difference in (15). Indeed, if $v(r) = \kappa r^{-1}$, then

$\operatorname{tr}_l \big[S(\kappa r^{-1})\big]_- = (2l+1) \sum_{n=1}^\infty \frac{\kappa^2}{2(n+l)^2}$,

and this does not decay at all as $l \to \infty$. We note also that (15) implies that the operator $\big[\sqrt{p^2+1} - 1 - \kappa|x|^{-1}\big]_- - \big[\tfrac12 p^2 - \kappa|x|^{-1}\big]_-$ appearing in Theorem 1 is trace class for any $\kappa \in (0, \tfrac{2}{\pi}]$.
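As a numerical illustration of why the Schrödinger partial trace alone does not decay in $l$, the following minimal sketch evaluates the hydrogen-type sum $(2l+1)\sum_{n\ge 1}\kappa^2/(2(n+l)^2)$. The function name, the truncation length and the midpoint-type tail estimate are illustrative assumptions, not part of the paper.

```python
import math


def schrodinger_partial_trace(kappa: float, l: int, terms: int = 200_000) -> float:
    """Approximate (2l+1) * sum_{n>=1} kappa^2 / (2 (n+l)^2).

    The infinite sum is truncated after `terms` summands; the neglected
    tail is estimated by 1/(terms + l + 1/2) (an assumption of this sketch).
    """
    head = sum(1.0 / (n + l) ** 2 for n in range(1, terms + 1))
    tail = 1.0 / (terms + l + 0.5)
    return (2 * l + 1) * 0.5 * kappa ** 2 * (head + tail)


if __name__ == "__main__":
    kappa = 2 / math.pi  # critical coupling
    for l in (0, 1, 10, 100, 1000):
        # The values approach kappa^2 instead of tending to zero.
        print(l, schrodinger_partial_trace(kappa, l), kappa ** 2)
```

The printed values stay of order $\kappa^2$ for large $l$, so the $(l+1)^{-2}$ decay in (15) can only come from the cancelation between the two negative parts.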

2.1. Reminder on Lieb-Thirring estimates. In the proof of Theorem 2 we use the following relativistic Lieb-Thirring inequalities due to Daubechies [7].

Proposition 1. For any $\gamma > \tfrac12$ there exists a constant $L_\gamma$ such that for all $l \ge 0$,

$\operatorname{tr}_l [C(v)]_-^\gamma \le L_\gamma (2l+1) \int_0^\infty \Big( [v(r)]_+^{1+\gamma} + [v(r)]_+^{\frac12+\gamma} \Big)\,dr$. (16)

Proposition 1 is also valid for $\gamma = \tfrac12$, but we will not need this fact.

Proof. Since $\operatorname{tr}_l [C(v)]_-^\gamma \le (2l+1) \operatorname{tr}_0 [C(v)]_-^\gamma$, it suffices to verify the claim for $l = 0$. If we extend $v$ to an even function $\tilde v$ on $\mathbb{R}$, then $C(v)$ is unitarily equivalent to the part of the whole-line operator $\sqrt{p^2+1} - 1 - \tilde v$ on antisymmetric functions. In the whole-line case, the result follows by evaluating the integral in [7, Eq. (2.14)].

Our treatment of the critical case $\kappa = \tfrac{2}{\pi}$ is based on the following inequality [38, Theorem 11] of Lieb and Yau.

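Proposition 2. Let $I$ be a function with support in $\{x \in \mathbb{R}^3 : |x| \le 1\}$. Then for all $\mu > 0$,

$\operatorname{tr}\big[ I \big( |p| - \tfrac{2}{\pi}|x|^{-1} - \mu \big) I \big]_- \le \mathrm{const}\ \mu^4 \int |I(x)|^2\,dx$.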


2.2. Finiteness of partial traces. In (15) the trace of the difference of the operators $[C(v)+\mu]_-$ and $[S(v)+\mu]_-$ appears. We begin by proving that both operators separately have finite traces. Since $S(v) \ge C(v)$ (see also (25) below) it suffices to prove this in the relativistic case.

Lemma 1. For all $\mu \ge 0$ and all $l \in \mathbb{N}_0$ one has $\operatorname{tr}_l \big[ C(\tfrac{2}{\pi}|x|^{-1}) + \mu \big]_- < \infty$.

Proof. Obviously it suffices to prove the lemma for $\mu = 0$. Pick a Lipschitz function $\varphi : \mathbb{R}_+ \to [0, \pi/2]$ with Lipschitz constant $\phi_0$ which vanishes for $r \le 1/2$ and which is $\pi/2$ for $r \ge 1$. Then $I := \cos(\varphi)$ has compact support around the origin and, furthermore, it constitutes together with $A := \sin(\varphi)$ a quadratic partition of unity, i.e., $I^2 + A^2 = 1$. According to Lieb and Yau [38, Theorem 9] we have the localization formula

$\langle\psi, (p^2+1)^{1/2}\psi\rangle = \langle I\psi, (p^2+1)^{1/2} I\psi\rangle + \langle A\psi, (p^2+1)^{1/2} A\psi\rangle - \langle\psi, L\psi\rangle$ (17)

for $\psi \in L^2(\mathbb{R}^3)$. Here $L$ is the bounded integral operator on $L^2(\mathbb{R}^3)$ with non-negative kernel given in terms of a Bessel function,

$L(x,y) := K_2(|x-y|)\, \frac{\sin^2[(\varphi(|x|) - \varphi(|y|))/2]}{\pi^2 |x-y|^2}$. (18)

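We shall estimate this localization error by a multiplication operator. More precisely, we shall show that there exists a constant $M > 0$ such that

$\langle\psi, L\psi\rangle \le M \langle\psi, e^{-|x|}\psi\rangle$. (19)

To prove this, we note that by the Schwarz inequality we have

$\langle\psi, L\psi\rangle \le \int_{\mathbb{R}^3} dx\, |\psi(x)|^2 \int_{\mathbb{R}^3} dy\, K_2(|x-y|)\, \frac{\sin^2((\varphi(|x|)-\varphi(|y|))/2)}{\pi^2|x-y|^2}$
$\le \Big(\frac{\phi_0}{2\pi}\Big)^2 \int_{|x|<1} dx\, |\psi(x)|^2 \int_{\mathbb{R}^3} dy\, K_2(|x-y|) + \Big(\frac{\phi_0}{2\pi}\Big)^2 \int_{|x|\ge 1} dx\, |\psi(x)|^2 \int_{|y|<1} dy\, K_2(|x-y|)$
$=: \langle\psi, v_I\psi\rangle + \langle\psi, v_A\psi\rangle$. (20)

Since $\int_0^\infty dr\, r^2 K_2(r) = 3\pi/2$ [39, Formula 11.4.22] we have

$v_I(x) := \Big(\frac{\phi_0}{2\pi}\Big)^2 \chi_{\{|x|<1\}}(x) \int_{\mathbb{R}^3} dy\, K_2(|x-y|) = \frac{3\phi_0^2}{2}\, \chi_{\{|x|<1\}}(x)$.

Moreover, since $K_2(r) = 2/r^2 + O(1)$ as $r \downarrow 0$ and $K_2(r) \sim \sqrt{\pi/(2r)}\, \exp(-r)$ as $r \to \infty$ [43], the function

$v_A(x) := \Big(\frac{\phi_0}{2\pi}\Big)^2 \chi_{\{|x|\ge 1\}}(x) \int_{|x+y|<1} dy\, K_2(|y|)$

is well-defined and satisfies $v_A(x) \le \mathrm{const}\ e^{-|x|}$. This proves (19).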
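The constant in the inner localization error can be cross-checked with a few lines of code. The sketch below is an illustration only: it uses the standard Mellin-transform formula $\int_0^\infty r^{\mu-1} K_\nu(r)\,dr = 2^{\mu-2}\,\Gamma((\mu-\nu)/2)\,\Gamma((\mu+\nu)/2)$ for $\mu > |\nu|$ (a closed form swapped in here instead of numerical quadrature), and the function names are hypothetical.

```python
import math


def mellin_bessel_k(mu: float, nu: float) -> float:
    """Closed form of int_0^inf r^(mu-1) K_nu(r) dr, valid for mu > |nu|."""
    return 2 ** (mu - 2) * math.gamma((mu - nu) / 2) * math.gamma((mu + nu) / 2)


# Radial integral used in the proof: int_0^inf r^2 K_2(r) dr.
radial_integral = mellin_bessel_k(3, 2)

# Integral of K_2(|y|) over all of R^3 (angular factor 4*pi).
full_space_integral = 4 * math.pi * radial_integral


def v_inner(phi0: float) -> float:
    """Value of the inner error potential on {|x| < 1} for Lipschitz constant phi0."""
    return (phi0 / (2 * math.pi)) ** 2 * full_space_integral


if __name__ == "__main__":
    print(radial_integral, 3 * math.pi / 2)   # both equal 3*pi/2
    print(v_inner(1.0))                       # equals 3/2 for phi0 = 1
```

Under these assumptions the prefactor $(\phi_0/2\pi)^2 \cdot 4\pi \cdot 3\pi/2$ indeed reduces to $3\phi_0^2/2$.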


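Combining (17) and (19) we find that

$\operatorname{tr}_l \big[C(\tfrac2\pi|x|^{-1})\big]_- \le \operatorname{tr}_l \big[ I\, C(\tfrac2\pi|x|^{-1} + Me^{-|x|})\, I \big]_- + \operatorname{tr}_l \big[ A\, C(\tfrac2\pi|x|^{-1} + Me^{-|x|})\, A \big]_-$. (21)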

To estimate the inner part we use that I C π2 |x|−1 + Me−|x| I ≥ I | p| − π2 |x|−1 − 1 − Me−1 I. It follows therefore from Proposition 2 that the corresponding trace is finite (even when summed over all l). For the outer part we use −1 −|x| −1 −|x| 2 2 A C π |x| + Me . A≥C χ + Me 1 π |x| {|x|≥ 2 }

The corresponding trace is finite by Proposition 1.

2.3. Angular momentum barrier inequalities. A straightforward consequence of Hardy’s inequality, which we will frequently exploit, is

Lemma 2. Let $l \in \mathbb{N}_0$. Then, as operators in $H_l$,

$p^2 \ge (l+\tfrac12)^2 r^{-2}$. (22)

Proof. Writing the Laplacian in spherical coordinates we find that $p^2$ in $H_l$ is unitarily equivalent to $p_r^2 + l(l+1) r^{-2}$ in $L^2(\mathbb{R}_+)$ with Dirichlet boundary conditions. The claim follows hence from Hardy’s inequality, $p_r^2 \ge (2r)^{-2}$.

By operator monotonicity of the square root, (22) implies the (non-sharp) inequality $|p| \ge (l+\tfrac12) r^{-1}$ in $H_l$. We shall need an analogue of this inequality for the operator $C(0)$ instead of $|p|$. Note that $\sqrt{p^2+1} - 1$ behaves as $\tfrac12 p^2$ for small $p$. Since ‘small $p$’ corresponds intuitively to ‘large $r$’, we cannot expect that $C(0)$ controls an $r^{-1}$ decay. But it does control an $r^{-1}$ singularity. This is the content of

Lemma 3. Let $l \in \mathbb{N}_0$, $R > 0$ and $M_l(R) := (l+\tfrac12)^2 \big/ \big( R + \sqrt{R^2 + (l+\tfrac12)^2} \big)$. Then, as operators in $H_l$,

$C(0) \ge M_l(R)\, \chi_{\{r \le R\}}(r)\, r^{-1}$. (23)

Proof. The inequality (22) and operator monotonicity of the square root imply in $H_l$, $\sqrt{p^2+1} - 1 \ge \sqrt{(l+\tfrac12)^2 r^{-2} + 1} - 1$; the claim follows by determining the solution of the equation $\sqrt{(l+\tfrac12)^2 r^{-2} + 1} = 1 + M r^{-1}$.
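The last step of this proof is elementary algebra, and it can be checked numerically. The following sketch (function names are illustrative assumptions) verifies on a grid that $\sqrt{(l+\tfrac12)^2 r^{-2}+1} - 1 \ge M_l(R)\, r^{-1}$ for $r \le R$, and that $M_l(l^2/4) \ge 5/4$ for $l \ge 1$, the value used later with $R = l^2/4$.

```python
import math


def M_l(l: int, R: float) -> float:
    """Constant of Lemma 3: (l+1/2)^2 / (R + sqrt(R^2 + (l+1/2)^2))."""
    a2 = (l + 0.5) ** 2
    return a2 / (R + math.sqrt(R * R + a2))


def barrier(l: int, r: float) -> float:
    """sqrt((l+1/2)^2 / r^2 + 1) - 1, written in a cancellation-free form."""
    x = (l + 0.5) ** 2 / (r * r)
    return x / (math.sqrt(1.0 + x) + 1.0)


def lemma3_holds(l: int, R: float, samples: int = 1000) -> bool:
    """Check barrier(l, r) >= M_l(l, R) / r on a uniform grid of r in (0, R]."""
    m = M_l(l, R)
    for k in range(1, samples + 1):
        r = R * k / samples
        # Small relative slack absorbs rounding at r = R, where equality holds.
        if barrier(l, r) < (m / r) * (1.0 - 1e-12):
            return False
    return True


if __name__ == "__main__":
    print(all(lemma3_holds(l, R) for l in range(6) for R in (0.1, 1.0, 25.0)))
    print(min(M_l(l, l * l / 4) for l in range(1, 1001)))  # attained at l = 1
```

The check relies on the fact that $r \mapsto \sqrt{r^2 + a^2} - r$ is decreasing, so the constant obtained at $r = R$ works on the whole ball $\{r \le R\}$.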


The core of Theorem 2 is contained in the following

Lemma 4. There exists a constant such that for all $v : [0,\infty) \to [0,\infty)$ satisfying (14), for all $\mu \ge 0$ and for all $l \in \mathbb{N}$, one has

$0 \le \operatorname{tr}_l\big( [C(v)+\mu]_- - [S(v)+\mu]_- \big) \le \mathrm{const}\, \Big( \operatorname{tr}_l [C(w_l)]_-^2 + (l+\tfrac12)^{-2} \operatorname{tr}_l [C(w_l)]_- \Big)$, (24)

where $w_l(r) := 10\, r^{-1} \chi_{\{r \ge l^2/4\}}(r)$.

Proof. The identity

$\tfrac12 p^2 = C(0) + \tfrac12 C(0)^2$ (25)

implies the non-negativity asserted in (24). To prove the second inequality in (24) we shall first assume (in addition to (14)) that $v$ is a bounded function and that $\mu > 0$. Once the inequality is proved in this case (with a constant independent of $\mu$ and the supremum of $v$), we can apply it to the cut-off potential $v_M := \min\{v, M\}$. By monotone convergence $C(v_M)$ and $S(v_M)$ converge to $C(v)$ and $S(v)$ in strong resolvent sense [8, Thm. 1.2.3], and therefore [46, Thm. VIII.20], [54, Thm. 2.7] for any $\mu > 0$, $\liminf_{M\to\infty} \operatorname{tr}_l [C(v_M)+\mu]_- \ge \operatorname{tr}_l [C(v)+\mu]_-$ and similarly for $S(v_M)$. But the reverse inequalities are also true, since we have $C(v_M) \ge C(v)$ and $S(v_M) \ge S(v)$. Hence we conclude that $\operatorname{tr}_l\big([C(v_M)+\mu]_- - [S(v_M)+\mu]_-\big)$ converges to the corresponding quantity with $v_M$ replaced by $v$. Finally, we can use Lemma 1 to extend the result to $\mu \to 0$.

Thus we may assume $v$ to be bounded and $\mu > 0$, and denote by $\gamma_l$ the orthogonal projection onto the eigenspace of $C(v)$ corresponding to angular momentum $l$ and eigenvalues less than or equal to $-\mu$. Since $v$ is bounded, any eigenfunction of $C(v)$ lies in the form domain of $S(v)$. Hence the variational principle together with (25) yields

$2 \operatorname{tr}_l\big( [C(v)+\mu]_- - [S(v)+\mu]_- \big) \le \operatorname{tr}_l C(0)^2 \gamma_l$. (26)

Again the boundedness of $v$ and the finite rank of $\gamma_l$ imply that $\operatorname{tr}_l C(0)^2 \gamma_l$ is finite. Using the eigenvalue equation and the bound (14) on the potential we estimate this term further as follows:

$\operatorname{tr}_l C(0)^2 \gamma_l \le \operatorname{tr}_l [C(v)]_-^2 + \operatorname{tr}_l v^2 \gamma_l \le \operatorname{tr}_l \big[C(\tfrac2\pi|x|^{-1})\big]_-^2 + (\tfrac2\pi)^2 \operatorname{tr}_l |x|^{-2} \gamma_l$. (27)

The last term in the above inequality is bounded using (22) and (25),

$\big\| |x|^{-1}\psi_l \big\|^2 \le (l+\tfrac12)^{-2} \big\| |p|\psi_l \big\|^2 = (l+\tfrac12)^{-2} \big( \|C(0)\psi_l\|^2 + 2\langle\psi_l, C(0)\psi_l\rangle \big)$, (28)

valid for $\psi_l \in H_l$. Since $l \ge 1$ we have $(\tfrac2\pi)^2 (l+\tfrac12)^{-2} \le \tfrac12$ and thus the last two estimates may be summarized as

$\operatorname{tr}_l C(0)^2 \gamma_l \le 2 \operatorname{tr}_l \big[C(\tfrac2\pi|x|^{-1})\big]_-^2 + 4 (\tfrac2\pi)^2 (l+\tfrac12)^{-2} \operatorname{tr}_l C(0)\gamma_l$. (29)


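In view of (26) the assertion will follow if we can prove

$\operatorname{tr}_l \big[C(\tfrac2\pi|x|^{-1})\big]_-^2 \le \operatorname{tr}_l [C(w_l)]_-^2$, $\operatorname{tr}_l C(0)\gamma_l \le \operatorname{tr}_l [C(w_l)]_-$. (30)

We begin with the (more difficult) second inequality. We have

$\operatorname{tr}_l C(0)\gamma_l \le \operatorname{tr}_l v\gamma_l \le \tfrac2\pi \operatorname{tr}_l |x|^{-1}\gamma_l \le \operatorname{tr}_l |x|^{-1}\gamma_l$. (31)

We apply Lemma 3 with $R = l^2/4$ to bound the last term. Since $M_l(l^2/4) \ge 5/4$ for $l \ge 1$ we obtain

$\langle\psi_l, |x|^{-1}\psi_l\rangle \le \tfrac45 \langle\psi_l, C(0)\psi_l\rangle + \langle\psi_l, \chi_{\{r \ge l^2/4\}}(|x|)\, |x|^{-1}\psi_l\rangle$. (32)

The last two estimates can be summarized as

$\operatorname{tr}_l C(0)\gamma_l \le -\operatorname{tr}_l C(w_l)\gamma_l \le \operatorname{tr}_l [C(w_l)]_-$, (33)

which proves the second inequality in (30). We proceed similarly to prove the first one. Indeed, by (32),

$C(\tfrac2\pi|x|^{-1}) \ge \tfrac15 C(0) + \chi_{\{r \le l^2/4\}}(|x|)\, |x|^{-1} - \tfrac2\pi|x|^{-1} \ge \tfrac15 C(w_l)$, (34)

and hence $\operatorname{tr}_l \big[C(\tfrac2\pi|x|^{-1})\big]_-^2 \le \tfrac1{25} \operatorname{tr}_l [C(w_l)]_-^2$. This completes the proof of the lemma.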
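Two elementary ingredients of this proof can be sanity-checked numerically: the identity (25), which in momentum space is a pointwise identity between symbols, and the coefficient bound $(2/\pi)^2 (l+\tfrac12)^{-2} \le \tfrac12$, which needs $l \ge 1$. The sketch below is illustrative only; the function names are hypothetical.

```python
import math


def c0_symbol(p: float) -> float:
    """Symbol of C(0) = sqrt(p^2 + 1) - 1, in a cancellation-free form."""
    p2 = p * p
    return p2 / (math.sqrt(p2 + 1.0) + 1.0)


def identity_25_residual(p: float) -> float:
    """Residual of (1/2) p^2 = C(0) + (1/2) C(0)^2 at momentum p."""
    c = c0_symbol(p)
    return 0.5 * p * p - (c + 0.5 * c * c)


def hardy_coefficient(l: int) -> float:
    """(2/pi)^2 (l + 1/2)^(-2), the factor multiplying the right-hand side of (28)."""
    return (2 / math.pi) ** 2 / (l + 0.5) ** 2


if __name__ == "__main__":
    for p in (0.0, 1e-3, 0.5, 1.0, 10.0, 1e3):
        print(p, identity_25_residual(p))      # zero up to rounding
    print(hardy_coefficient(0), hardy_coefficient(1))  # only the second is <= 1/2
```

The residuals vanish up to rounding, and the coefficient exceeds $\tfrac12$ at $l = 0$, which is consistent with the lemma being stated for $l \in \mathbb{N}$ only.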

Proof (of Theorem 2). The boundedness of the trace in (15) for $l = 0$ is implied by Lemma 1, and its non-negativity follows from (25). For $l \ge 1$ we use Lemma 4 and note that

$\operatorname{tr}_l [C(w_l)]_-^2 \le \mathrm{const}\ l^{-2}$, $\operatorname{tr}_l [C(w_l)]_- \le \mathrm{const}$ (35)

by Proposition 1.

3. Proof of the Main Results: Renormalization of the Relativistic Operator

The strategy for our proof of the main results is to use the Schrödinger operator as a regularization for the relativistic problem, i.e., we will use it to eliminate the main contribution to the energy (the Thomas-Fermi energy) and to focus only on the energy shift of the low lying states, where the electron-electron interaction plays no role and the unscreened problem remains. Recall that Theorem 1 for $q = 1$ reads

$\lim_{Z\to\infty} \frac{E^S(Z) - E^C_\kappa(Z)}{Z^2} = s(\kappa)$. (36)

We will show this claim in two steps, namely that the upper and the lower limits exist and are given by the same expression, namely the coefficient of the $Z^2$-correction claimed in the theorem. That this coefficient is finite was already remarked after Theorem 2.


3.1. Upper bound on the energy difference – Lower bound on the relativistic energy. Lieb and Simon [37] showed that the Thomas-Fermi minimization problem (8) has a unique minimizer $\rho_Z$, the Thomas-Fermi density. It fulfills the scaling relation

$\rho_Z(x) := Z^2 \rho_1(Z^{1/3}x)$. (37)

We define the radius of the Thomas-Fermi exchange hole at point $x \in \mathbb{R}^3$ as the unique minimal radius $R_Z(x)$ for which

$\int_{|x-y|\le R_Z(x)} \rho_Z(y)\,dy = \frac12$. (38)

We denote the exchange-hole-reduced Thomas-Fermi screening potential by

$\chi_{TF}(x) := \int_{|x-y|>R_Z(x)} \frac{\rho_Z(y)}{|x-y|}\,dy$, (39)

and the corresponding one-particle operators by

$S_{TF} = S(Z|x|^{-1} - \chi_{TF})$, (40)
$C_{TF} = C_c(Z|x|^{-1} - \chi_{TF})$, (41)

both self-adjointly realized in $L^2(\mathbb{R}^3)$. Here we use a notation similar to that in Sect. 2,

$C_c(v) := \sqrt{p^2c^2 + c^4} - c^2 - v$. (42)

We remark that we slightly deviate from the usual choice $Z|x|^{-1} - (\rho_Z * |\cdot|^{-1})(x)$ for the screened potential. This is motivated by the correlation inequality (44) below. The concept of an exchange hole can be traced back to Slater [55]. It also has been used to estimate the exchange-correlation energy (Lieb [32], Lieb and Oxford [36]). We shall express the many-particle ground-state energy in terms of quantities involving the above one-particle operators. In the relativistic case we use the correlation inequality of [40] to obtain a lower bound on the many-particle ground-state energy.

Lemma 5. For all $L \in \mathbb{N}$,

$E^C_\kappa(Z) \ge -\sum_{l=0}^{L-1} \operatorname{tr}_l \big[C_c(Z|x|^{-1})\big]_- - \sum_{l=L}^\infty \operatorname{tr}_l [C_{TF}]_- - D(\rho_Z,\rho_Z)$. (43)

Proof. We use the correlation inequality [40, Eq. (14)]

$\sum_{1\le\mu<\nu\le N} |x_\mu - x_\nu|^{-1} \ge \sum_{\nu=1}^N \chi_{TF}(x_\nu) - D(\rho_Z,\rho_Z)$ (44)

to bound $E^C_\kappa(Z)$ from below by the ground-state energy of

$\sum_{\nu=1}^N (C_{TF})_\nu - D(\rho_Z,\rho_Z)$.

This yields

$E^C_\kappa(Z) \ge -\operatorname{tr}[C_{TF}]_- - D(\rho_Z,\rho_Z)$. (45)

We split the trace according to angular momentum and use the operator inequality $C_{TF} \ge C_c(Z|x|^{-1})$ for all $l \le L-1$ to obtain the assertion.


In the non-relativistic case, we recall

Proposition 3. Let $L := Z^{1/9}$. Then, as $Z \to \infty$,

$E^S(Z) = -\sum_{l=0}^{L-1} \operatorname{tr}_l \big[S(Z|x|^{-1})\big]_- - \sum_{l=L}^\infty \operatorname{tr}_l [S_{TF}]_- - D(\rho_Z,\rho_Z) + O(Z^{47/24})$. (46)

Proof. The same argument as in Lemma 5 yields the lower bound

$E^S(Z) \ge -\sum_{l=0}^{L-1} \operatorname{tr}_l \big[S(Z|x|^{-1})\big]_- - \sum_{l=L}^\infty \operatorname{tr}_l [S_{TF}]_- - D(\rho_Z,\rho_Z)$. (47)

Note that $\chi_{TF} \le \rho_Z * |\cdot|^{-1}$. Hence [52, Theorem 1] and the proof of this theorem (in particular, [52, Lemma 2], see also [53]) show that one can further estimate

$-\sum_{l=0}^{L-1} \operatorname{tr}_l \big[S(Z|x|^{-1})\big]_- - \sum_{l=L}^\infty \operatorname{tr}_l [S_{TF}]_- - D(\rho_Z,\rho_Z) \ge E^{TF}(Z) + \tfrac14 Z^2 - \mathrm{const}\ Z^{17/9} \log Z$. (48)

On the other hand, one has the upper bound [49, Lemmas 3.1 and 4.1]

$E^S(Z) \le E^{TF}(Z) + \tfrac14 Z^2 + \mathrm{const}\ Z^{47/24}$. (49)

Combining this with (47) and (48) we obtain the assertion.

Proof (of Theorem 1 – first part). Choosing $L = Z^{1/9}$ and combining Lemma 5 and Proposition 3 we obtain

$E^S(Z) - E^C_\kappa(Z) \le -\sum_{l=0}^{L-1} \operatorname{tr}_l\big( [S(Z|x|^{-1})]_- - [C_c(Z|x|^{-1})]_- \big) - \sum_{l=L}^\infty \operatorname{tr}_l\big( [S_{TF}]_- - [C_{TF}]_- \big) + \mathrm{const}\ Z^{47/24}$. (50)

We note that by scaling $x \to x/c$, the operators $S_{TF}$ and $C_{TF}$ are unitarily equivalent to the operators $\kappa^{-2}Z^2 S(\kappa|x|^{-1} - \chi_Z)$ and $\kappa^{-2}Z^2 C_1(\kappa|x|^{-1} - \chi_Z)$, both acting in $L^2(\mathbb{R}^3)$, where

$\chi_Z(x) := \kappa^2 Z^{-2} \chi_{TF}(\kappa x/Z)$. (51)

This implies

$\limsup_{Z\to\infty} \frac{E^S(Z) - E^C_\kappa(Z)}{Z^2} \le \kappa^{-2} \limsup_{Z\to\infty} \big( \Sigma_1(Z) + \Sigma_2(Z) \big)$,

where

$\Sigma_1(Z) := \sum_{l=0}^{L-1} \operatorname{tr}_l\big( [C_1(\kappa|x|^{-1})]_- - [S(\kappa|x|^{-1})]_- \big)$,
$\Sigma_2(Z) := \sum_{l=L}^\infty \operatorname{tr}_l\big( [C_1(\kappa|x|^{-1} - \chi_Z)]_- - [S(\kappa|x|^{-1} - \chi_Z)]_- \big)$. (52)

Theorem 2 implies that the summands in both sums on the right-hand side are non-negative and bounded by $\mathrm{const}\ (l+1)^{-2}$ independently of $Z$. Therefore the first sum actually converges,

$\limsup_{Z\to\infty} \Sigma_1(Z) = \sum_{l=0}^\infty \operatorname{tr}_l\big( [C_1(\kappa|x|^{-1})]_- - [S(\kappa|x|^{-1})]_- \big)$. (53)

Moreover, the second sum converges to zero,

$\limsup_{Z\to\infty} \Sigma_2(Z) \le \mathrm{const}\ \limsup_{Z\to\infty} \sum_{l=L}^\infty (l+1)^{-2} = 0$. (54)

This concludes the proof of the upper bound on the energy difference.

We remark that $\Sigma_2(Z) \le \mathrm{const}\ Z^{-1/9}$, hence we have actually shown that

$E^S(Z) - E^C_\kappa(Z) \le s(\kappa) Z^2 + \mathrm{const}\ Z^{47/24}$. (55)
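The remark leading to (55) rests on two small facts: the tail $\sum_{l \ge L} (l+1)^{-2}$ is at most $1/L$, and the resulting error $Z^{2} \cdot Z^{-1/9} = Z^{17/9}$ is dominated by $Z^{47/24}$. A minimal sketch checking both, with illustrative names and using $\zeta(2) = \pi^2/6$ for the full sum:

```python
import math
from fractions import Fraction


def tail_sum(L: int) -> float:
    """sum_{l >= L} (l+1)^(-2), computed as zeta(2) minus the first L terms."""
    return math.pi ** 2 / 6 - sum(1.0 / k ** 2 for k in range(1, L + 1))


# Exponents of Z: the high angular momentum sum contributes Z^2 * Z^(-1/9)
# when L = Z^(1/9); the Schrödinger remainder in (49) is Z^(47/24).
exponent_high_l = Fraction(2) - Fraction(1, 9)
exponent_remainder = Fraction(47, 24)

if __name__ == "__main__":
    for L in (1, 10, 100):
        print(L, tail_sum(L), 1 / L)
    print(exponent_high_l, exponent_remainder, exponent_high_l < exponent_remainder)
```

Since $17/9 < 47/24$, the contribution of the large angular momenta is absorbed into the $Z^{47/24}$ term.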

3.2. Lower bound on the energy difference – Upper bound on the relativistic energy. Following [49] we define two one-particle density matrices $d^S$ and $d^C$ as sums

$d^\# = \sum_{l=0}^\infty d_l^\#$. (56)

As above, we use the convention that # refers either to the Schrödinger case or to the Chandrasekhar case. The operators $d_l^\#$ are defined in $L^2(\mathbb{R}^3)$ through their integral kernels

$d_l^\#(x,y) := \sum_{n=1}^\infty w_{n,l} \sum_{m=-l}^{l} \psi^\#_{n,l,m}(x)\, \overline{\psi^\#_{n,l,m}(y)}$. (57)

The weights $w_{n,l}$ and the functions $\psi^\#_{n,l,m}$ are defined separately for angular momentum $l < L$ and $l \ge L$, where $L$ will be chosen later in a $Z$-dependent way.

Case $l < L$. We define $\psi^\#_{n,l,m}$ as the $n$-th eigenfunction of $S(Z|x|^{-1})$ restricted to angular momentum $(l,m)$, respectively of $C_c(Z|x|^{-1})$ restricted to angular momentum $(l,m)$, with the normalization $\|\psi^\#_{n,l,m}\|_2 = 1$. Note that this function is of the form $\psi^\#_{n,l,m}(x) = \varphi_{n,l}(|x|)\, Y_{l,m}(x/|x|)$ with a radial function $\varphi_{n,l}$. The weights $w_{n,l}$ are defined independently of $m$ by

$w_{n,l} := \begin{cases} 1 & n \le K - l, \\ 0 & n > K - l, \end{cases}$ (58)

where $K := [d Z^{1/3}]$ with $d$ some positive constant independent of $Z$.

Case $l \ge L$. We choose $\psi^\#_{n,l,m}(x) = \varphi_{n,l}(|x|)\, Y_{l,m}(x/|x|)$, where the functions $\varphi_{n,l}$, as well as the weights $w_{n,l}$, are defined exactly as in [49, Sect. 2] independently of #. (The exact form of the functions and the values of the weights for $l \ge L$ are irrelevant in our context.)


Note that the above construction guarantees $d^\#$ to be density matrices, i.e., $0 \le d^\# \le 1$. Moreover, by the choice of $L$, $K$, and $w_{n,l}$ one can assure that $\operatorname{tr} d^\# \le Z$. (For # = S this is proved in [49, Corollary 4.1], and hence follows also for # = C.) Since $d_l^\#$ is independent of # for $l \ge L$ we drop the superscript in this case. Moreover, we shall use the notations

$d^\#_< := \sum_{l=0}^{L-1} d_l^\#$, $d_> := \sum_{l=L}^\infty d_l$,

and

$\rho_l^\#(x) := d_l^\#(x,x)$, $\rho^\#_<(x) := d^\#_<(x,x)$, $\rho_>(x) := d_>(x,x)$.

We recall now that the density matrix $d^S$ gives an energy which is correct up to the order we are interested in. More precisely, one has

Proposition 4. Let $L := Z^{1/12}$. Then, for sufficiently large $Z$,

$E^S(Z) = \operatorname{tr}[S(Z|x|^{-1}) d^S] + D(\rho^S,\rho^S) + O(Z^{47/24})$. (59)

Proof. It is shown in [49] that for sufficiently large $Z$,

$E^S(Z) \le \operatorname{tr}[S(Z|x|^{-1}) d^S] + D(\rho^S,\rho^S) \le E^{TF}(Z) + \tfrac14 Z^2 + \mathrm{const}\ Z^{47/24}$.

Combining this with the lower bound on $E^S(Z)$ which was recalled in (47) and (48), we obtain the assertion.

We decrease the ground state energy further by dropping a part of the Coulomb energy,

$E^S(Z) \ge \operatorname{tr}[S(Z|x|^{-1}) d^S_<] + \operatorname{tr}[S(Z|x|^{-1}) d_>] + D(\rho_>,\rho_>) - \mathrm{const}\ Z^{47/24}$. (60)

For an upper bound in the relativistic case we employ a variational principle to obtain

Lemma 6. For sufficiently large $Z$,

$E^C_\kappa(Z) \le \operatorname{tr}[C_c(Z/|x|) d^C_<] + \operatorname{tr}[S(Z/|x|) d_>] + D(\rho_>,\rho_>) + 2D(\rho^C_<,\rho_>) + D(\rho^C_<,\rho^C_<)$.

Proof. As noted above, $d^C$ satisfies $0 \le d^C \le 1$ and $\operatorname{tr} d^C \le Z$ for sufficiently large $Z$ [49, Corollary 4.1]. Using that the Hartree-Fock functional bounds the ground state energy from above, even if non-idempotent density matrices are inserted (a fact that was proven by Lieb [33], see also Bach [3]), and estimating the indirect part of the Coulomb energy by zero we obtain

$E^C_\kappa(Z) \le \operatorname{tr}[C_c(Z|x|^{-1}) d^C] + D(\rho^C,\rho^C)$. (61)

Both terms on the right-hand side are split according to $d^C = d^C_< + d_>$. To obtain the desired upper bound we use the inequality $\tfrac12 p^2 \ge \sqrt{c^2p^2 + c^4} - c^2$ for large angular momenta.


The following lemma shows the irrelevance of the interaction energy of the low lying states with all other electrons (including themselves). The proof follows the strategy pursued in [49], namely to estimate it by the lowest Coulomb energy of a particle in the field of an external point charge $Z$, and then simply multiplying by the particle number. There is, however, one important change in the channel $l = 0$. Because of the singular nature of the lowest eigenfunctions in the critical case, their expectations in potentials with Coulomb singularities do not exist. To circumvent this problem we use the Hardy-Littlewood-Sobolev inequality followed by a recent Sobolev-type inequality [24].

Lemma 7. One has $D(\rho^C_<, \rho^C) \le \mathrm{const}\ Z^{11/6} \log Z$.

Proof. We treat the terms $D(\rho^C_<,\rho^C_<)$ and $D(\rho^C_<,\rho_>)$ separately. For the latter one we recall that

$\int \rho_l^C(x)\,dx = (2l+1)(K-l)$, $0 \le l < L$, (62)

where $K = O(Z^{1/3})$, and that by Proposition 3.4 in [49],

$\sum_{l=L}^\infty \int \frac{\rho_l(x)}{|x|}\,dx \le \int \frac{\rho^S(x)}{|x|}\,dx \le \mathrm{const}\ Z^{4/3}$. (63)

The densities $\rho_l^\#$ are spherically symmetric because of the addition formula for the spherical harmonics. Hence, using Newton’s theorem [42], we have

$D(\rho^C_<,\rho_>) \le \frac12 \int \rho^C_<(x)\,dx \int \frac{\rho_>(y)}{|y|}\,dy \le \mathrm{const} \sum_{l=0}^{L-1} (2l+1)(K-l)\, Z^{4/3} \le O(L^2 K Z^{4/3}) = O(Z^{11/6})$. (64)

We set $\rho^C_{<>} := \rho^C_< - \rho^C_0$ and estimate

$D(\rho^C_<,\rho^C_<) \le 2D(\rho^C_0,\rho^C_0) + 2D(\rho^C_{<>},\rho^C_{<>})$. (65)

This allows us to treat the contributions from $l = 0$ and $1 \le l < L$ separately. Using a scaled version of Lemma 3 with $R_l := \big( (l+\tfrac12)^2 - 4\kappa^2 \big)/4\kappa$ we obtain for $1 \le l < L$,

$\operatorname{tr}\big(|x|^{-1} d_l^C\big) \le \frac{1}{2Z} \operatorname{tr}\big(C_c(0)\, d_l^C\big) + \operatorname{tr}\big(\chi_{\{|x|>R_l/c\}} |x|^{-1} d_l^C\big) \le \frac12 \operatorname{tr}\big(|x|^{-1} d_l^C\big) + \frac{c}{R_l} \operatorname{tr} d_l^C$,

where the last inequality used the fact that eigenfunctions of $d_l^C$ are also eigenfunctions of $C_c(Z|x|^{-1})$ with negative eigenvalue. Hence, summing over $l$ and noting that $R_l^{-1} \le \mathrm{const}\ l^{-2}$,

$\int \frac{\rho^C_{<>}(y)}{|y|}\,dy = \sum_{l=1}^{L-1} \operatorname{tr}\big(|x|^{-1} d_l^C\big) \le \mathrm{const}\ Z \sum_{l=1}^{L-1} l^{-2} \int \rho_l^C(x)\,dx$.


Thus by (62) and again by Newton’s theorem

$D(\rho^C_{<>},\rho^C_{<>}) \le \frac12 \int \rho^C_{<>}(x)\,dx \int \frac{\rho^C_{<>}(y)}{|y|}\,dy \le \mathrm{const}\ K L^2\, K Z \log L \le \mathrm{const}\ Z^{11/6} \log Z$.

Finally, we treat the term corresponding to $l = 0$. By the Hardy-Littlewood-Sobolev inequality (cf. [35]) and by Hölder’s inequality

$D(\rho_0^C,\rho_0^C) \le \mathrm{const}\ \|\rho_0^C\|_{6/5}^2 = \mathrm{const} \Big( \int \Big( \sum_{n=1}^K |\psi^C_{n,0,0}(x)|^2 \Big)^{6/5} dx \Big)^{5/3} \le \mathrm{const}\ K^{1/3} \Big( \sum_{n=1}^K \int |\psi^C_{n,0,0}(x)|^{12/5}\,dx \Big)^{5/3}$.

Now we use the Sobolev-type inequality [24, Eq. (2.8)]

$\|u\|_{12/5}^2 \le \mathrm{const}\ \langle u, (|p| - \tfrac2\pi|x|^{-1}) u\rangle^{1/2}\, \|u\|$, (66)

where the first factor on the right-hand side is to be understood in form sense. Using that $|p| - \tfrac2\pi|x|^{-1} \le c^{-1} C_c(Z|x|^{-1}) + c$ and that $\psi^C_{n,0,0}$ is a normalized eigenfunction of $C_c(Z|x|^{-1})$, we deduce

$\|\psi^C_{n,0,0}\|_{12/5} \le \mathrm{const}\ c^{1/4}$. (67)

Combining the previous relations we arrive at

$D(\rho_0^C,\rho_0^C) \le \mathrm{const}\ K^{1/3} (K c^{3/5})^{5/3} \le \mathrm{const}\ Z^{5/3}$. (68)

This completes the proof of the lemma.
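The powers of $Z$ appearing in this proof can be tracked with exact rational arithmetic. The sketch below is a bookkeeping aid only; it assumes the scalings $K \sim Z^{1/3}$, $L \sim Z^{1/12}$ and $c = Z/\kappa \sim Z$ stated in the text, ignores logarithms and constants, and uses hypothetical variable names.

```python
from fractions import Fraction

# Exponents of Z carried by the basic quantities.
K = Fraction(1, 3)    # K = [d Z^(1/3)]
L = Fraction(1, 12)   # L = Z^(1/12), as in Proposition 4
c = Fraction(1)       # c = Z / kappa with kappa fixed

# (64): D(rho_<, rho_>) = O(L^2 K Z^(4/3)).
cross_term = 2 * L + K + Fraction(4, 3)

# 1 <= l < L: D(rho_<>, rho_<>) <= const (K L^2) (K Z log L).
middle_term = (K + 2 * L) + (K + 1)

# (68): D(rho_0, rho_0) <= const K^(1/3) (K c^(3/5))^(5/3).
s_wave_term = K / 3 + Fraction(5, 3) * (K + Fraction(3, 5) * c)

if __name__ == "__main__":
    print(cross_term, middle_term, s_wave_term)
```

All three exponents lie strictly below 2, which is why these Coulomb terms do not affect the $Z^2$ coefficient.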

Proof (of Theorem 1 – second part). It follows from Lemma 7 that

$2D(\rho^C_<,\rho_>) + D(\rho^C_<,\rho^C_<) = O(Z^{11/6} \log Z)$.

Hence Lemma 6 together with (60) implies

$\liminf_{Z\to\infty} Z^{-2} \big( E^S(Z) - E^C_\kappa(Z) \big) \ge \liminf_{Z\to\infty} Z^{-2} \big( \operatorname{tr}[S(Z|x|^{-1})\, d^S_<] - \operatorname{tr}[C_c(Z|x|^{-1})\, d^C_<] \big)$
$= \liminf_{Z\to\infty} \sum_{l=0}^{L-1} (2l+1) \sum_{n=1}^{K-l} Z^{-2} \big( \langle\psi^S_{n,l,m}, S(Z|x|^{-1})\, \psi^S_{n,l,m}\rangle - \langle\psi^C_{n,l,m}, C_c(Z|x|^{-1})\, \psi^C_{n,l,m}\rangle \big)$.

The claim now follows from the scaling $x \to x/c$ and Fatou’s lemma.


In order to get an explicit remainder estimate one could bound the sum

$\sum_{l=0}^{L-1} (2l+1) \sum_{n=K-l+1}^\infty \big( \langle\psi^S_{n,l,m}, S(Z|x|^{-1})\, \psi^S_{n,l,m}\rangle - \langle\psi^C_{n,l,m}, C_c(Z|x|^{-1})\, \psi^C_{n,l,m}\rangle \big)$ (69)

from above. This is certainly not difficult but for brevity we refrain from doing so. The sum corresponding to $l \ge L$ can be bounded using Theorem 2.

Acknowledgement. We thank Elliott Lieb and Robert Seiringer for various supportive discussions. R.F. and H.S. thank the Departments of Mathematics and Physics of Princeton University for hospitality while this work was done. The work has been partially supported by the Swedish Foundation for International Cooperation in Research and Higher Education (STINT) (R.F.), the U.S. National Science Foundation, grant PHY 01 39984 (H.S.), and the Deutsche Forschungsgemeinschaft, grant SI 348/13-1 (H.S.).

References 1. Bach, V.: Ein Beweis der Scottschen Vermutung für Ionen. Master’s thesis, Institut für Mathematische Physik, Technische Universität Braunschweig, Braunschweig, June 1989 2. Bach, V.: A proof of Scott’s conjecture for ions. Rep. Math. Phys. 28(2), 213–248 (1989) 3. Bach, V.: Error bound for the Hartree-Fock energy of atoms and molecules. Commun. Math. Phys. 147, 527–548 (1992) 4. Balodis, P.: A proof of Scott’s correction for matter. Commun. Math. Phys. 249(1), 79–132 (2004) 5. Cassanas, R., Siedentop, H.: The ground-state energy of heavy atoms according to Brown and Ravenhall: absence of relativistic effects in leading order. J. Phys. A 39(33), 10405–10414 (2006) 6. Chandrasekhar, S.: The maximum mass of ideal white dwarfs. Astrophys. J. 74, 81–82 (1931) 7. Daubechies, I.: An uncertainty principle for Fermions with generalized kinetic energy. Commun. Math. Phys. 90, 511–520 (1983) 8. Davies, E.B.: Heat kernels and spectral theory, Volume 92 of Cambridge Tracts in Mathematics. Cambridge: Cambridge University Press, 1990 9. Englert, B.-G.: Semiclassical Theory of Atoms. Number 300 in Lecture Notes in Physics, 1 edition, Berlin: Springer-Verlag, 1988 10. Englert, B.-G., Schwinger, J.: Statistical atom: Handling the strongly bound electrons. Phys. Rev. A 29(5), 2331–2338 (1984) 11. Englert, B.-G., Schwinger, J.: Statistical atom: Some quantum improvements. Phys. Rev. A 29(5), 2339– 2352 (1984) 12. Englert, B.-G., Schwinger, J.: Atomic-binding-energy oscillations. Phys. Rev. A 32, 47–63 (1985) 13. Fefferman, C., Seco, L.: Eigenfunctions and eigenvalues of ordinary differential operators. Adv. Math. 95(2), 145–305 (1992) 14. Fefferman, C., Seco, L.: The density of a one-dimensional potential. Adv. Math. 107(2), 187–364 (1994) 15. Fefferman, C., Seco, L.: The eigenvalue sum of a one-dimensional potential. Adv. Math. 108(2), 263– 335 (1994) 16. Fefferman, C., Seco, L.: On the Dirac and Schwinger corrections to the ground-state energy of an atom. Adv. Math. 
107(1), 1–188 (1994) 17. Fefferman, C., Seco, L.: The density in a three-dimensional radial potential. Adv. Math. 111(1), 88–161 (1995) 18. Fefferman, C.L., Seco, L.A.: An upper bound for the number of electrons in a large ion. Proc. Nat. Acad. Sci. USA 86, 3464–3465 (1989) 19. Fefferman, C.L., Seco, L.A.: Asymptotic neutrality of large ions. Commun. Math. Phys. 128, 109–130 (1990) 20. Fefferman, C.L., Seco, L.A.: On the energy of a large atom. Bull. AMS 23(2), 525–530 (1990) 21. Fefferman, C.L., Seco, L.A.: Aperiodicity of the Hamiltonian flow in the Thomas-Fermi potential. Rev. Math. Iberoamer. 9(3), 409–551 (1993) 22. Fermi, E.: Un metodo statistico per la determinazione di alcune proprietá dell’atomo. Atti Della Reale Accademia Nazionale dei Lincei, Rendiconti, Classe di Scienze Fisiche, Matematiche e Naturali 6(12), 602–607 (1927) 23. Fermi, E.: Eine statistische Begründung zur Bestimmung einiger Eigenschaften des Atoms und ihre Anwendungen auf die Theorie des periodischen Systems der Elemente. Z. Phys. 48, 73–79 (1928)


24. Frank, R.L., Lieb, E.H., Seiringer, R.: Hardy-Lieb-Thirring inequalities for fractional Schrödinger operators. J. AMS (in press). DOI 10.1090/s0894-0347-07-00582-6 25. Herbst, I.W.: Spectral theory of the operator ( p 2 + m 2 )1/2 − Z e2 /r . Commun. Math. Phys. 53, 285– 294 (1977) 26. Hughes, W.: An Atomic Energy Lower Bound that Gives Scott’s Correction. PhD thesis, Princeton, Department of Mathematics, 1986 27. Hughes, W.: An atomic lower bound that agrees with Scott’s correction. Adv. in Math. 79, 213–270 (1990) 28. Ivrii, V.: Semiclassical asymptotics for exchange energy. In: Séminaire sur les Équations aux Dérivées Partielles, 1993–1994, Palaiseau: École Polytech., 1994, pp. Exp. No. XX, 12 29. Ivrii, V.Ja., Sigal, I.M.: Asymptotics of the ground state energies of large Coulomb systems. Ann. of Math. 138(2), 243–335 (1993) 30. Ivrii, V.: Heavy atoms in a superstrong magnetic field. In: Differential operators and spectral theory, Providence, RI: Amer. Math. Soc., 1999, pp.125–137 31. Lenz, W.: Über die Awendbarkeit der statistischen Methode auf Ionengitter. Z. Phys. 77, 713–721 (1932) 32. Lieb, E.H.: A lower bound for Coulomb energies. Phys. Lett. 70, 444–446 (1979) 33. Lieb, E.H.: Variational principle for Many-Fermion systems. Phys. Rev. Lett. 46, 457–459 (1981), Erratum: Phys. Rev. Lett. 47, 69 (1981) 34. Lieb, E.H.: Thomas-Fermi and related theories of atoms and molecules. Rev. Mod. Phys. 53(4), 603–641 (1981) 35. Lieb, E.H., Loss, M.: Analysis, Volume 14 of Graduate Studies in Mathematics. 2nd edition, Providence, RI: American Mathematical Society, 2001 36. Lieb, E.H., Oxford, S.: Improved lower bound on the indirect Coulomb energy. Intern. J. Quantum Chem. 19, 427–439 (1981) 37. Lieb, E.H., Simon, B.: The Thomas-Fermi theory of atoms, molecules and solids. Adv. Math. 23, 22–116 (1977) 38. Lieb, E.H., Yau, H.-T.: The stability and instability of relativistic matter. Commun. Math. Phys. 118, 177–213 (1988) 39. Luke, Y.F.: Integrals of Bessel functions. 
In: M. Abramowitz, I.A. Stegun, ed. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 5th edition, Chapter 9, New York: Dover Publications, 1968, pp. 479–494 40. Mancas, P., Klaus, A., Müller, M., Siedentop, H.: The optimal size of the exchange hole and reduction to one-particle Hamiltonians. In: Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta) 111(1), 49–53, (2004) 41. Matesanz, P.B., Solovej, J.P.: On the asymptotic exactness of Thomas-Fermi theory in the thermodynamic limit. Ann. Henri Poincaré 1(2), 281–306 (2000) 42. Newton, I.: Philosophiae naturalis principia mathematica. Vol. I. Cambridge, MA: Harvard Univ. Press, 1972. Reprinting of the third edition (1726) with variant readings, Assembled and edited by A. Koyré, I.B. Cohen with the assistance of A. Whitman 43. Olver, F.W.J.: Bessel functions of integer order. In: M. Abramowitz, I.A. Stegun, eds. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 5th edition, chapter 9, New York: Dover Publications, 1968, pp. 355–433 44. Sørensen, T.Ø.: Towards a Relativistic Scott Correction. PhD thesis, Aarhus Universitet, http://www.imf. au.dk/cgi-bin/viewers/viewpublications.cgi?id=79, October 1998 45. Sørensen, T.Ø.: The large-Z behavior of pseudorelativistic atoms. J. Math. Phys. 46(5), 052307, 24 (2005) 46. Reed, M., Simon, B.: Methods of modern mathematical physics. I. Functional analysis. New York: Academic Press, 1972 47. Schwinger, J.: Thomas-Fermi model: The leading correction. Phys. Rev. A 22(5), 1827–1832 (1980) 48. Schwinger, J.: Thomas-Fermi model: The second correction. Phys. Rev. A 24(5), 2353–2361 (1981) 49. Siedentop, H., Weikard, R.: On the leading energy correction for the statistical model of the atom: Interacting case. Commun. Math. Phys. 112, 471–490 (1987) 50. Siedentop, H., Weikard, R.: Upper bound on the ground state energy of atoms that proves Scott’s conjecture. Phys. Lett. 
A 120, 341–342 (1987) 51. Siedentop, H., Weikard, R.: On the leading energy correction of the statistical atom: Lower bound. Europhys. Lett. 6, 189–192 (1988) 52. Siedentop, H., Weikard, R.: On the leading correction of the Thomas-Fermi model: Lower bound – with an appendix by A.M.K. Müller. Invent. Math. 97, 159–193 (1989) 53. Siedentop, H., Weikard, R.: A new phase space localization technique with application to the sum of negative eigenvalues of Schrödinger operators. Ann. Sci. de l’École Normale Supér. 24(2), 215–225 (1991) 54. Simon, B.: Trace ideals and their applications, Volume 120 of Mathematical Surveys and Monographs, 2nd ed. Providence, RI: Amer. Math. Soc., 2005 55. Slater, J.C.: A simplification of the Hartree-Fock method. Phys. Rev. 81(3), 385–390 (1951)


56. Sobolev, A.V.: The quasi-classical asymptotics of local Riesz means for the Schrödinger operator in a strong homogeneous magnetic field. Duke Math. J. 74(2), 319–429 (1994) 57. Solovej, J.P.: The relativistic Scott correction. In: M.J. Esteban, C. Le Bris, G. Scuseria, eds. Mathematical and Numerical Aspects of Quantum Chemistry Problems, Volume 47/2006 of Oberwolfach Report, Mathematisches Forschungsinstitut Oberwolfach, Berlin: European Mathematical Society, 2006, pp. 52–53 58. Solovej, J.P., Spitzer, W.L.: New coherent states and a new proof of the Scott correction. In: Advances in differential equations and mathematical physics (Birmingham, AL, 2002), Volume 327 of Contemp. Math., Providence, RI: Amer. Math. Soc., 2003, pp. 305–319 59. Solovej, J.P., Spitzer, W.L.: A new coherent states approach to semiclassics which gives Scott’s correction. Commun. Math. Phys. 241(2–3), 383–420 (2003) 60. Thirring, W.: A lower bound with the best possible constant for Coulomb Hamiltonians. Commun. Math. Phys. 79, 1–7 (1981) 61. Thomas, L.H.: The calculation of atomic fields. Proc. Camb. Phil. Soc. 23, 542–548 (1927) 62. Weder, R.: Spectral analysis of pseudodifferential operators. J. Funct. Anal. 20, 319–337 (1975) Communicated by B. Simon

Commun. Math. Phys. 278, 567–625 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0400-6


Modulating Pulse Solutions to Quadratic Quasilinear Wave Equations over Exponentially Long Length Scales

M. D. Groves1, G. Schneider2

1 Department of Mathematical Sciences, Loughborough University, Loughborough, LE11 3TU, UK. E-mail: [email protected]
2 Mathematisches Institut A, Universität Stuttgart, Pfaffenwaldring 57, 70569 Stuttgart, Germany

Received: 4 May 2006 / Accepted: 2 September 2007 Published online: 17 January 2008 – © Springer-Verlag 2008

Abstract: This paper presents an existence proof for modulating pulse solutions to a wide class of quadratic quasilinear Klein-Gordon equations of the form

\[ \partial_t^2 u = \partial_x^2 u - u + f_1(u, \partial_x u, \partial_t u)\,\partial_x^2 u + f_2(u, \partial_x u, \partial_t u). \]

Modulating pulse solutions consist of a pulse-like envelope advancing in the laboratory frame and modulating an underlying wave-train; they are also referred to as ‘moving breathers’ since they are time-periodic in a moving frame of reference. The problem is formulated as an infinite-dimensional dynamical system with three stable, three unstable and infinitely many neutral directions. By transforming part of the equation into a normal form with an exponentially small remainder term and using a generalisation of local invariant-manifold theory to the quasilinear setting, we prove the existence of small-amplitude modulating pulses on domains in space whose length is exponentially large compared to the magnitude of the pulse.

1. Introduction

1.1. Breathers and modulating pulses. A breather solution of the nonlinear wave equation

\[ \partial_t^2 u = \partial_x^2 u - u + g(u), \qquad x, t \in \mathbb{R}, \]

in which $g : \mathbb{R} \to \mathbb{R}$ is a smooth function, is a solution $u(x, t) \in \mathbb{R}$ which is periodic in $t$ and decays to zero as $x \to \pm\infty$; in particular the completely integrable sine-Gordon equation ($g(u) = u - \sin u$) admits explicit breather solutions. It seems natural to ask whether wave equations obtained by perturbing the sine-Gordon nonlinearity also have breather solutions; a negative answer to this question was given by Denzler [2] and Birnir, McKean & Weinstein [1], who showed that the only perturbations of the sine-Gordon equation admitting breather solutions are those which can be transformed back into the sine-Gordon equation by a rescaling.

The ‘non-persistence of breathers’ result is remarkable since the existence of small-amplitude breather solutions is predicted by multiple scaling analysis. Making the Ansatz

\[ u(x, t) = \varepsilon A(\varepsilon(x - c_{\mathrm g} t), \varepsilon^2 t)\, e^{\mathrm{i} k_0 x - \mathrm{i} \omega_0 t} + \mathrm{c.c.}, \]

in which $0 < \varepsilon \ll 1$ is a small perturbation parameter, $k_0$ is the basic spatial wavenumber, $\omega_0 = \omega_0(k_0) = \sqrt{1 + k_0^2}$ is the basic frequency and $c_{\mathrm p} = c_{\mathrm p}(k_0) = \omega_0 / k_0$, $c_{\mathrm g} = c_{\mathrm g}(k_0) = k_0 / \omega_0$ are the linear phase and group velocities, one finds that at leading order $A(X, T) \in \mathbb{C}$ satisfies the nonlinear Schrödinger equation

\[ \partial_T A = \mathrm{i} \nu_1 \partial_X^2 A + \mathrm{i} \nu_2 A |A|^2 \tag{1} \]

with coefficients $\nu_j = \nu_j(k_0) \in \mathbb{R}$. Equation (1) is said to be focusing if $\nu_1 \nu_2 > 0$; in this case it possesses a family of breather solutions

\[ A(X, T) = B(X) e^{\mathrm{i} \gamma T}, \qquad \gamma \in \mathbb{R}, \]

where $B(X) \in \mathbb{R}$ satisfies $\lim_{X \to \pm\infty} B(X) = 0$, and it is well known that these solutions correctly approximate solutions of the nonlinear wave equation on length- and time-scales of $O(1/\varepsilon^2)$ (see e.g. Kalyakin [8] and Schneider [12]). The breathers considered by Denzler and Birnir, McKean & Weinstein have a basic wavenumber $k_0 = 0$; for $k_0 \neq 0$, so that $c_{\mathrm g}(k_0) \neq 0$, the solutions are called moving breathers (because they are time-periodic in a moving frame of reference) or modulating pulses (because they consist of a pulse-like envelope advancing in the laboratory frame and modulating an underlying periodic wave train). The ‘non-persistence of breathers’ result shows that the above breathers are destroyed in the full equation by higher order terms which are neglected in the derivation of the nonlinear Schrödinger equation.

The ‘non-persistence’ phenomenon is easily understood in terms of the geometrical theory of differential equations. Let us examine the equation

\[ \partial_t^2 u = \partial_x^2 u - u + u f(u^2), \tag{2} \]

where $f : \mathbb{R} \to \mathbb{R}$ is another smooth function, and seek moving breather solutions $u(x, t) = v(x - \tilde c_{\mathrm g} t, k_0(x - \tilde c_{\mathrm p} t))$, where $v$ is $2\pi$-periodic and odd in its second variable and $\tilde c_{\mathrm g} = c_{\mathrm g} + O(\varepsilon^2)$, $\tilde c_{\mathrm p} = c_{\mathrm p} + O(\varepsilon^2)$. Writing the equation for $v$ as an evolutionary equation in which the unbounded spatial variable $\xi = x - \tilde c_{\mathrm g} t$ plays the role of the time-like variable (‘spatial dynamics’), one obtains a system of equations of the form

\[ \partial_\xi z = \mathcal{L} z + F(z, w), \tag{3} \]
\[ \partial_\xi w = \mathcal{L} w + G(z, w) + H(z) \tag{4} \]

(see Groves & Schneider [4]), in which $\mathcal{L}$ is a linear operator whose spectrum consists of a pair of real eigenvalues of opposite signs and infinitely many purely imaginary eigenvalues, $z$ and $w$ are the projections of $v$ onto the hyperbolic and centre subspaces of $\mathcal{L}$ and $F$, $G$, $H$ are nonlinearities with $G(z, 0) = 0$. A moving breather corresponds to a homoclinic solution of (3), (4), that is a solution $(z, w)$ such that $(z(\xi), w(\xi)) \to (0, 0)$ as $\xi \to \pm\infty$, and arises as the intersection of the stable and unstable manifolds in the phase space of $2\pi$-periodic functions. Equations (3), (4) have the property that $\{w = 0\}$ is a two-dimensional invariant subspace when the higher-order terms $H(z)$ are neglected; the stable and unstable manifolds are contained in this two-dimensional subspace, and in fact intersect whenever $f'(0) > 0$, giving rise to homoclinic solutions of the two-dimensional dynamical system $\partial_\xi z = \mathcal{L} z + F(z, 0)$. On the other hand $\{w = 0\}$ is no longer invariant when $H(z)$ is included, and the intersection of the one-dimensional stable and unstable manifolds in the infinite-dimensional phase space is a rare phenomenon; the homoclinic solution for $H(z) = 0$ does not persist for $H(z) \neq 0$. Any further analysis of the situation clearly has to take the infinite-dimensional centre space and hence the variable $w$ into account.

Groves & Schneider [4] proved the existence of modulating pulse solutions to (2) which remain $O(\varepsilon^n)$-close to the approximate solutions of amplitude $O(\varepsilon)$ obtained by setting $H(z) = 0$ but do not decay to zero as $\xi \to \pm\infty$, so that their ‘tails’ are $O(\varepsilon^n)$; here $n \in \mathbb{N}$ is arbitrary but fixed. The proof involves using a sequence of normal-form transformations which eliminate successive terms in the Taylor expansion of $H(z)$, so that it can be made $O(\varepsilon^N)$, where $N$ is arbitrary but fixed (and determined by the choice of $n$). A standard construction for semilinear evolutionary equations yields a family of solutions on $\xi \in [0, \infty)$ whose hyperbolic parts are $O(\varepsilon^n)$ and whose centre part may experience secular growth; the initial values of these solutions form the centre-stable manifold. The $O(\varepsilon^n)$-boundedness of their centre parts follows using an auxiliary argument: a solution $v(\xi)$ with initial data on the centre-stable manifold converges to a solution on an appropriately defined centre manifold, a graph in phase space upon which all solutions remain so long as they are $O(\varepsilon^n)$; the existence of a Lyapunov function (the Hamiltonian function for the wave equation) shows that the centre manifold is actually globally invariant; and the rate of convergence of $v(\xi)$ to the centre manifold is shown to be faster than the rate of secular growth of its centre part.
Finally, the reversibility of (2) is exploited to extend the above solutions to symmetric solutions on $\xi \in (-\infty, \infty)$. Our result identifies a sense in which the modulating pulses for $H(z) = 0$ persist for $H(z) \neq 0$ and we generalise our definition of ‘modulating pulses’ and ‘moving breathers’ accordingly. An alternative approach to persistence, which is based upon scattering theory, is given by McLaughlin & Shatah [10].

Further complications arise when studying quasilinear wave equations. Our technique in reference [4] relies heavily upon semilinearity, in particular that global existence theory is available for globally Lipschitz nonlinearities with small Lipschitz constant; this method is therefore not applicable to quasilinear problems. Progress was however made in our study of the prototype quasilinear equation $\partial_t^2 u = \partial_x^2 u - u + \partial_x^2(u^3)$ (Groves & Schneider [5]). The theory is analogous to that for semilinear equations: a normal-form transformation eliminates terms up to $O(\varepsilon^N)$ in the Taylor expansion of $H(z)$; an iteration scheme and energy estimates are used to construct solutions which exist on $\xi \in [0, \varepsilon^{-n}]$ and are $O(\varepsilon^n)$-close to the approximate solutions of amplitude $O(\varepsilon)$ obtained by setting $H(z) = 0$; and reversibility is used to extend these solutions to symmetric solutions on $\xi \in [-\varepsilon^{-n}, \varepsilon^{-n}]$. The definition of ‘modulating pulses’ and ‘moving breathers’ is thus generalised further, namely to include pulses which exist on large but finite spatial intervals in a frame of reference moving with the pulse.

Fig. 1. A modulating pulse solution guaranteed by Theorem 1

In the present paper we present an existence theory for modulating pulses which greatly improves and generalises the result given in reference [5]. We show that a large class of quasilinear wave equations with analytic nonlinearities and whose quadratic terms do not necessarily vanish admit modulating pulse solutions which exist on $\xi \in [-e^{c^\ast/2\sqrt{\varepsilon}}, e^{c^\ast/2\sqrt{\varepsilon}}]$ and are $O(e^{-c^\ast/2\sqrt{\varepsilon}})$-close to the $O(\varepsilon)$ approximate modulating pulses. Our solutions are therefore exponentially close to the approximate pulses and exist on exponentially long length scales in a frame of reference moving with the pulse. The main tool is a normal-form transformation which makes $H(z)$ exponentially rather than algebraically small.
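The linear quantities entering the Ansatz of Sect. 1.1, and the gain obtained by passing from algebraically to exponentially long intervals, can be illustrated numerically. The following Python sketch is an illustration only: the function names are ours, and `c_star = 1.0` is a hypothetical stand-in for the constant $c^\ast$, of which only the existence (not the value) is asserted.

```python
import math

def linear_velocities(k0):
    """Return (omega0, c_p, c_g) for omega0 = sqrt(1 + k0^2), c_p = omega0/k0, c_g = k0/omega0."""
    omega0 = math.sqrt(1.0 + k0 ** 2)
    return omega0, omega0 / k0, k0 / omega0

def algebraic_length(eps, n):
    """Half-length eps^(-n) of the interval of existence obtained in reference [5]."""
    return eps ** (-n)

def exponential_length(eps, c_star=1.0):
    """Half-length exp(c*/(2 sqrt(eps))); c_star is a hypothetical value of c*."""
    return math.exp(c_star / (2.0 * math.sqrt(eps)))

if __name__ == "__main__":
    omega0, cp, cg = linear_velocities(1.0)
    print(f"omega0 = {omega0:.4f}, c_p = {cp:.4f}, c_g = {cg:.4f}, c_p*c_g = {cp * cg:.4f}")
    # Compare the two length scales for a fixed n as eps decreases.
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, algebraic_length(eps, 3), exponential_length(eps))
```

For any fixed $n$ the exponential length eventually dominates $\varepsilon^{-n}$ as $\varepsilon \to 0$; the value of $\varepsilon$ at which this happens depends on the assumed `c_star`.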

1.2. The result. We seek modulating pulse solutions to the quasilinear wave equation

\[ \partial_t^2 u = \partial_x^2 u - u + f_1(u, \partial_x u, \partial_t u)\,\partial_x^2 u + f_2(u, \partial_x u, \partial_t u), \tag{5} \]

in which $f_1, f_2 : \mathbb{R}^3 \to \mathbb{R}$ are analytic functions which satisfy $f_1(0) = 0$, $f_2(0) = 0$, $\mathrm{d}f_2[0] = 0$ and

\[ f_i(a, -b, -c) = f_i(a, b, c), \qquad i = 1, 2; \tag{6} \]

this hypothesis ensures that the spatial dynamics formulation of (5) is reversible, and is an essential requirement for the construction of symmetric modulating pulses. Quasilinear wave equations of this type arise for example in approximations to Maxwell’s equations (Schneider [13]). Our result is stated in Theorem 1 and illustrated in Fig. 1.

Theorem 1. Fix a positive real number $k_0$. There exist positive constants $\varepsilon_0$ and $c^\ast$ with the property that for each $\varepsilon \in (0, \varepsilon_0]$ Eq. (5) admits an infinite-dimensional, continuous family of modulating pulse solutions of the form

\[ u(x, t) = v_1(x - \tilde c_{\mathrm g} t, k_0(x - \tilde c_{\mathrm p} t)), \]

where $v_1$ is $2\pi$-periodic in its second argument and

\[ \tilde c_{\mathrm p} = c_{\mathrm p} - \varepsilon^2, \qquad \tilde c_{\mathrm g} = \frac{1}{\tilde c_{\mathrm p}}. \]

These solutions satisfy

\[ v_1(\xi, \eta) = v_1(-\xi, -\eta), \qquad |v_1(\xi, \eta) - h^\varepsilon(\xi, \eta)| \le e^{-c^\ast/2\sqrt{\varepsilon}} \]

for all $\eta \in \mathbb{R}$ and $\xi \in [-e^{c^\ast/2\sqrt{\varepsilon}}, e^{c^\ast/2\sqrt{\varepsilon}}]$, in which

\[ h^\varepsilon(\xi, \eta) = \pm \varepsilon \left( \frac{2 \check C_1}{\pi \check C_2} \right)^{1/2} \operatorname{sech}(\check C_1^{1/2} \varepsilon \xi) \cos \eta + O(\varepsilon^{3/2} e^{-\varepsilon \theta |\xi|}), \qquad 0 < \theta < \check C_1^{1/2} \]

(so that $\lim_{\xi \to \pm\infty} h^\varepsilon(\xi, \eta) = 0$ uniformly in $\eta \in \mathbb{R}$). Here $\check C_1 = 2 k_0 (1 + k_0^2)^{3/2}$ and $\check C_2$ is a normal-form coefficient which is defined in Eq. (30) and required to be positive.

Example 1. The quasilinear wave equation

\[ \partial_t^2 u = \partial_x^2 u - u - \alpha u\, \partial_x^2 u - \beta (\partial_x u)^2 \partial_x^2 u, \]

in which $\alpha$, $\beta$ are real parameters, satisfies the hypotheses of Theorem 1 with

\[ \check C_2 = -\frac{\alpha^2 k_0^2}{4\pi} (2 + 7 k_0^2 + 5 k_0^4) + \frac{k_0^4 (1 + k_0^2) \beta}{4\pi}; \]

this coefficient is positive for sufficiently large values of $k_0$ whenever $\beta > 5\alpha^2$.

Remark 1. The coefficient $\check C_2$ is positive whenever the associated nonlinear Schrödinger equation (1) is focusing.

The proof of Theorem 1 has five main steps.

I. Spatial dynamics formulation (Sect. 2). The equation for $v_1$ is formulated as an evolutionary system for $v = (v_1, v_2)$, $v_2 = \partial_\xi v_1$, in which the unbounded spatial variable $\xi$ is the time-like variable. The linear operator in this evolutionary system has two geometrically double real eigenvalues of $O(\varepsilon)$, two simple strongly hyperbolic eigenvalues of $O(1)$ and an infinite number of geometrically double purely imaginary eigenvalues. In terms of the projections $z = P_{\mathrm{wh}} v$ and $q = P_{\mathrm{sh,c}} v$ of $v$ onto the weakly hyperbolic and strongly hyperbolic/centre subspaces, we write the system as the coupled fourth-order dynamical system

\[ \partial_\xi z = K z + F^\varepsilon(z, q), \qquad K = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{7} \]

for $z = (z_{1,\mathrm o}, z_{1,\mathrm e}, z_{2,\mathrm o}, z_{2,\mathrm e})$ and quasilinear wave equation

\[ \partial_\xi q_1 = q_2, \tag{8} \]
\[ \partial_\xi q_2 = -c_3^\varepsilon k_0^2 \partial_\eta^2 q_1 - c_4^\varepsilon q_1 + P_{\mathrm{sh,c}}(d_1(z_1) q_1) + P_{\mathrm{sh,c}}(d_2(z_1) \partial_\eta q_1) + P_{\mathrm{sh,c}}(d_3(z_1) q_2) + P_{\mathrm{sh,c}}(g_3^\varepsilon(z, q) \partial_\eta^2 q_1) + g_4^\varepsilon(z, q) + P_{\mathrm{sh,c}}(g_5^\varepsilon(z, q) \partial_\eta q_2) + h^\varepsilon(z), \tag{9} \]

where the notation for the nonlinearities has been designed to help with the careful book-keeping which is needed later. The nonlinearities $F^\varepsilon$, $g_j^\varepsilon$ and $h^\varepsilon$ are analytic functions of their arguments and the parameter $\varepsilon$, where $g_4^\varepsilon(z, q)$ contains no $\varepsilon$-independent terms which are linear in $z_1$. The functions $d_j$ are linear and independent of $\varepsilon$, so that $P_{\mathrm{sh,c}}(d_1(z_1) q_1)$, $P_{\mathrm{sh,c}}(d_2(z_1) \partial_\eta q_1)$ and $P_{\mathrm{sh,c}}(d_3(z_1) q_2)$ represent the terms ‘missing’ in the Taylor expansion of $g_4^\varepsilon(z, q)$. Hypothesis (6) implies that

\[ g_j^\varepsilon(S_{\mathrm{wh}}(z_1, z_2), q_1, -\partial_\eta q_1, -q_2) = g_j^\varepsilon(z_1, z_2, q_1, \partial_\eta q_1, q_2), \qquad j = 3, 4, 5, \]
\[ h^\varepsilon(S_{\mathrm{wh}}(z_1, z_2)) = h^\varepsilon(z_1, z_2) \]

and

\[ d_1(-z_{1,\mathrm o}, z_{1,\mathrm e}) = d_1(z_{1,\mathrm o}, z_{1,\mathrm e}), \qquad d_j(-z_{1,\mathrm o}, z_{1,\mathrm e}) = -d_j(z_{1,\mathrm o}, z_{1,\mathrm e}), \quad j = 2, 3, \]

where $S_{\mathrm{wh}}(z_{1,\mathrm o}, z_{1,\mathrm e}, z_{2,\mathrm o}, z_{2,\mathrm e}) = (-z_{1,\mathrm o}, z_{1,\mathrm e}, z_{2,\mathrm o}, -z_{2,\mathrm e})$, and this restriction is an essential requirement, its purpose being to guarantee the reversibility of Eqs. (7)–(9), that is their invariance under the transformation $\xi \to -\xi$, $(v_1, v_2) \to S(v_1, v_2)$, where $S(v_1(\eta), v_2(\eta)) = (v_1(-\eta), -v_2(-\eta))$.

Theorem 1 evidently requires an ‘almost global-wellposedness’ result for (8), (9). It is well known that the presence of quadratic terms in wave equations causes difficulties in constructing existence theories of this kind, and the usual approach is to construct a normal-form transformation which eliminates them (e.g. see Shatah [14]). Unfortunately elimination of the quasilinear quadratic terms in Eq. (9) in this fashion would cause a loss of regularity and complicate our analysis. In fact we do not require a complete theory for the initial-value problem for Eqs. (7)–(9) since we are only interested in solutions of a certain type, and it is actually not necessary to eliminate the quadratic terms to solve the initial-value problem for such solutions (see Step IV below).

II. Identification of approximate modulating pulses (Sect. 3). According to the discussion in Sect. 1.1, approximate modulating pulses exist as homoclinic solutions of the equation

\[ \partial_\xi z = K z + F^\varepsilon(z, 0), \tag{10} \]

and the approximations increase in accuracy as a sequence of transformations is constructed to remove terms of order 2, 3, … from $h^\varepsilon$. The transformation eliminating the quadratic part of $h^\varepsilon$ affects the cubic part of $F^\varepsilon$, which in turn controls homoclinic bifurcation in Eq. (10). It is therefore necessary to carry out this preliminary transformation separately, after which dynamical systems arguments show that (10) admits a pair of homoclinic solutions provided that a coefficient $\check C_2$ in the cubic part of $F^\varepsilon$ is positive.

III. Normal-form theory (Sect. 4).
We proceed by using a sequence of normal-form transformations to eliminate terms of order 3, 4, … in the Taylor expansion of $h^\varepsilon$. One cannot expect to eliminate the whole of $h^\varepsilon$ in this fashion, because our equations would then admit homoclinic solutions whose existence would contradict the ‘non-persistence of breathers’ result. By restricting attention to a neighbourhood of the origin (which is large enough to contain the approximate homoclinic solutions), one can however optimise over the order of the eliminated terms so that the remainder is exponentially small.

The necessary transformation theory (Sect. 4) is a generalisation of a theory for finite-dimensional dynamical analytic vector fields given by Iooss & Lombardi [7], and here we adopt their notation and make frequent reference to their paper for needed results of a combinatorial nature. A central requirement of Iooss & Lombardi’s result is that the linearised vector field should be diagonalisable (this condition ensures that certain estimates hold uniformly in the order of terms eliminated from the vector field). In the present context the corresponding requirement is that the matrix $K$ should be diagonalisable, a condition which is clearly not met. This difficulty is overcome by writing $\varepsilon = \mu^2$ and introducing scaled parameters which convert the equation for $z$ into $\partial_\xi z = F^\mu(z, q)$, in which the linear part of the vector field is the (trivially diagonalisable) zero matrix. A similar device was used by Iooss & Lombardi [6] in an application of their normal-form theory to the $0^2 \mathrm{i}\omega$ resonance.

The transformation theory in Sect. 4 amounts to a partial normal form since only certain higher-order terms (the $q$-independent terms in the equations for $q$) are eliminated. A complete normal form would involve eliminating all ‘non-resonant’ terms in the vector field, and this task is known to be impossible because of a small-divisor problem arising from asymptotic resonances among the frequencies, that is the magnitudes of the purely imaginary eigenvalues (e.g. see Pöschel [11]). By contrast, the frequencies interact in a helpful way in our partial normal form: they guarantee that the transformation itself is smoothing of degree one (see Proposition 1), and this property in turn ensures that the transformed equation for $q$ is again a quasilinear wave equation.

IV. Existence theory (Sect. 5). The next step is to construct an existence theory for solutions of (7)–(9) which remain exponentially close to one of the approximate modulating pulses identified in Step II over an exponentially long time scale. For this purpose we use an iteration scheme for quasilinear systems of the type suggested by Kato [9], and here the main task is to prove that the iterative sequence $\{w^{(m)}\}_{m \in \mathbb{N}_0}$ for the central part $w = P_{\mathrm c} q$ of $q$ converges; in particular we show that

\[ \|w^{(m)}(\xi)\| \le e^{-c^\ast/2\mu}, \qquad \xi \in [0, e^{c^\ast/2\mu}], \tag{11} \]
\[ \|w^{(m+1)}(\xi) - w^{(m)}(\xi)\| \le \tfrac{1}{2} \|w^{(m)}(\xi) - w^{(m-1)}(\xi)\|, \qquad \xi \in [0, e^{c^\ast/2\mu}] \tag{12} \]

for each $m \in \mathbb{N}_0$, so that $w = \lim_{m \to \infty} w^{(m)}$ exists and satisfies $\|w(\xi)\| \le e^{-c^\ast/2\mu}$ for $\xi \in [0, e^{c^\ast/2\mu}]$.

The analysis of the sequence $\{w^{(m)}\}_{m \in \mathbb{N}_0}$ is complicated by the presence of quadratic terms in our nonlinearities. In proving (11) one arrives at the differential inequality

\[ \partial_\xi \|w^{(m)}\|^2 \le c\mu \big( e^{-c^\ast/\mu} + \mu^2 e^{-c^\ast/2\mu} e^{-\mu^2 \theta \xi} \big) \|w^{(m)}\| + c\mu \big( e^{-c^\ast/2\mu} + \mu e^{-\mu^2 \theta \xi} \big) \|w^{(m)}\|^2 \tag{13} \]

for $w^{(m)}$, and it is necessary to deduce that $\|w^{(m)}(\xi)\| \le e^{-c^\ast/2\mu}$ for $\xi \in [0, e^{c^\ast/2\mu}]$. A better inequality is obtained for equations with cubic nonlinearities, namely

\[ \partial_\xi \|w^{(m)}\|^2 \le c\mu \big( e^{-c^\ast/\mu} + \mu^2 e^{-c^\ast/2\mu} e^{-\mu^2 \theta \xi} \big) \|w^{(m)}\| + c\mu \big( e^{-c^\ast/2\mu} + \mu^2 e^{-\mu^2 \theta \xi} \big) \|w^{(m)}\|^2; \]

integrating and using the means inequality, one finds that

\[ \sup_{\xi \in [0, e^{c^\ast/2\mu}]} \|w^{(m)}(\xi)\|^2 \le \|w^{(m)}(0)\|^2 + c\mu e^{-c^\ast/\mu} + c\mu \sup_{\xi \in [0, e^{c^\ast/2\mu}]} \|w^{(m)}(\xi)\|^2, \tag{14} \]

so that $\|w^{(m)}(\xi)\| \le c\mu^{1/2} e^{-c^\ast/2\mu}$, $\xi \in [0, e^{c^\ast/2\mu}]$ whenever $\|w^{(m)}(0)\| \le \mu^{1/2} e^{-c^\ast/2\mu}$. Notice however that this calculation does not yield the required result for inequality (13) (the $O(\mu)$ coefficient of the third term on the right-hand side of (14) is replaced by an $O(1)$ coefficient).

In fact the required result does follow from inequality (13), but a more careful two-step estimation technique is required. In the first step we define $\xi^\star$ so that $e^{-\mu^2 \theta \xi^\star} = \mu^\alpha$, where $\alpha$ is an appropriately chosen positive constant; a straightforward application of Gronwall’s inequality shows that $\|w^{(m)}(\xi)\|^2 \le c\mu |\log \mu| e^{-c^\ast/\mu}$ for $\xi \in [0, \xi^\star]$ whenever $\|w^{(m)}(0)\| \le \mu e^{-c^\ast/2\mu}$. In the second step we integrate (13) over $[0, e^{c^\ast/2\mu}]$ and split the range of integration into $[0, \xi^\star]$ and $[\xi^\star, e^{c^\ast/2\mu}]$. Satisfactory estimates for the integrals over $[\xi^\star, e^{c^\ast/2\mu}]$ are obtained by an optimal choice of $\alpha$ (and hence $\xi^\star$), while the integrals over $[0, \xi^\star]$ are handled using the result from the first step; the final result is that $\|w^{(m)}(\xi)\| \le c\mu^{1/2} |\log \mu| e^{-c^\ast/2\mu}$ for $\xi \in [0, e^{c^\ast/2\mu}]$. Full details of this estimate are given in the final part of Lemma 5, and the corresponding calculation needed for (12) is discussed in Lemma 6.

V. Extension to symmetric modulating pulses (Sect. 6). The final step is based upon the following observations: (i) solutions $v(\xi) = (z(\xi), q(\xi))$ of (7)–(9) on $\xi \in [0, \xi_0]$ with the property that $v(0)$ lies on the symmetric section $\Sigma = \operatorname{Fix} S$ can be extended to symmetric solutions on $\xi \in [-\xi_0, \xi_0]$; (ii) the initial values $v(0)$ of the solutions identified in Step IV define a manifold in phase space (the centre-stable manifold) which is parametrised by the projections $w^0$ of $v(0)$ onto the infinite-dimensional centre subspace and $(Z^0, r_1^0, r_2^0)$ of $v(0)$ onto an appropriately defined three-dimensional stable subspace. An intersection of the centre-stable manifold with the symmetric section therefore guarantees the existence of symmetric modulating pulse solutions on $\xi \in [-e^{c^\ast/2\mu}, e^{c^\ast/2\mu}]$. Denote the spectral projections of the phase space onto the centre and hyperbolic subspaces by respectively $P_{\mathrm c}$ and $P_{\mathrm h}$.
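Because $P_{\mathrm c} v(0) = w^0$ we have that $P_{\mathrm c} v(0) \in \Sigma_{\mathrm c} := P_{\mathrm c} \Sigma$ whenever $w^0 \in \Sigma_{\mathrm c}$ (here $\Sigma = \operatorname{Fix} S$ is the symmetric section), and fixed-point arguments are used to prove that there is a unique value of $(Z^0, r_1^0, r_2^0)$ such that $P_{\mathrm h} v(0) \in \Sigma_{\mathrm h} := P_{\mathrm h} \Sigma$ (and additional regularity requirements on $w^0$ beyond those used in Step IV are necessary here). In this fashion we obtain the result announced in Theorem 1, namely the existence of an infinite-dimensional, continuous family of modulating pulse solutions parametrised by $w^0 \in \Sigma_{\mathrm c}$.

2. Spatial Dynamics Formulation

We look for modulating pulse solutions of the nonlinear wave equation (5) of the form

\[ u(x, t) = v_1(\xi, \eta), \qquad \xi = x - \tilde c_{\mathrm g} t, \quad \eta = k_0(x - \tilde c_{\mathrm p} t), \]

where $v_1$ is periodic in $\eta$ with period $2\pi$ and $k_0$ is a fixed positive number. Making this Ansatz, one arrives at the equation

\[ (1 - \tilde c_{\mathrm g}^2) \partial_\xi^2 v_1 + 2(1 - \tilde c_{\mathrm g} \tilde c_{\mathrm p}) k_0 \partial_\xi \partial_\eta v_1 + (1 - \tilde c_{\mathrm p}^2) k_0^2 \partial_\eta^2 v_1 - v_1 + f_3(v_1, \partial_\eta v_1, \partial_\xi v_1)(\partial_\xi^2 v_1 + 2 k_0 \partial_\eta \partial_\xi v_1 + k_0^2 \partial_\eta^2 v_1) + f_4(v_1, \partial_\eta v_1, \partial_\xi v_1) = 0, \]

where

\[ f_3(v_1, \partial_\eta v_1, \partial_\xi v_1) = f_1(v_1, \partial_\xi v_1 + k_0 \partial_\eta v_1, -\tilde c_{\mathrm g} \partial_\xi v_1 - k_0 \tilde c_{\mathrm p} \partial_\eta v_1), \]
\[ f_4(v_1, \partial_\eta v_1, \partial_\xi v_1) = f_2(v_1, \partial_\xi v_1 + k_0 \partial_\eta v_1, -\tilde c_{\mathrm g} \partial_\xi v_1 - k_0 \tilde c_{\mathrm p} \partial_\eta v_1); \]

notice that $f_3$, $f_4$ are analytic real-valued functions of their arguments with the property that $f_i(a, -b, -c) = f_i(a, b, c)$, $i = 3, 4$. We write

\[ \tilde c_{\mathrm p} = c_{\mathrm p} - \varepsilon^2, \qquad \tilde c_{\mathrm g} = 1/\tilde c_{\mathrm p}, \]

where $\varepsilon \in [0, \varepsilon_0]$ and $\varepsilon_0$ is a positive number which is chosen as small as necessary in the following analysis, so that $\tilde c_{\mathrm p}$ is a small perturbation of the linear phase velocity $c_{\mathrm p}$ and the equation simplifies to

\[ \partial_\xi^2 v_1 + \frac{1 - \tilde c_{\mathrm p}^2 + f_3(v_1, \partial_\eta v_1, \partial_\xi v_1)}{1 - \tilde c_{\mathrm g}^2 + f_3(v_1, \partial_\eta v_1, \partial_\xi v_1)} k_0^2 \partial_\eta^2 v_1 + \frac{2 f_3(v_1, \partial_\eta v_1, \partial_\xi v_1)}{1 - \tilde c_{\mathrm g}^2 + f_3(v_1, \partial_\eta v_1, \partial_\xi v_1)} k_0 \partial_\eta \partial_\xi v_1 + \frac{f_4(v_1, \partial_\eta v_1, \partial_\xi v_1) - v_1}{1 - \tilde c_{\mathrm g}^2 + f_3(v_1, \partial_\eta v_1, \partial_\xi v_1)} = 0. \]

Introducing the new variable $v_2 = \partial_\xi v_1$, we can write the above equation as the evolutionary system

\[ \partial_\xi v_1 = v_2, \tag{15} \]
\[ \partial_\xi v_2 = -c_3^\varepsilon k_0^2 \partial_\eta^2 v_1 - c_4^\varepsilon v_1 + (c_0 v_1 + g_0^\varepsilon(v)) \partial_\eta^2 v_1 + (c_2 v_1 + g_2^\varepsilon(v)) \partial_\eta v_2 + c_{1,1} v_1^2 + c_{1,2} (\partial_\eta v_1)^2 + c_{1,3} v_2^2 + c_{1,4} v_2 \partial_\eta v_1 + g_1^\varepsilon(v), \tag{16} \]

which we study in the phase space

\[ X^s = \{ v = (v_1, v_2) \in H_{\mathrm{per}}^{s+1}(0, 2\pi) \times H_{\mathrm{per}}^{s}(0, 2\pi) \}, \qquad s \ge 0, \]

the domain of the vector field on the right-hand side of (15), (16) being $D^s = X^{s+1}$. Here

\[ c_3^\varepsilon = \frac{1 - \tilde c_{\mathrm p}^2}{1 - \tilde c_{\mathrm g}^2}, \qquad c_4^\varepsilon = \frac{-1}{1 - \tilde c_{\mathrm g}^2} \]

are negative constants and the analytic functions $g_0^\varepsilon, g_1^\varepsilon, g_2^\varepsilon : X^{s+1} \to H_{\mathrm{per}}^{s+1}(\mathbb{R}^2)$ and constants $c_0$, $c_{1,1}$, …, $c_{1,4}$, $c_2$ are defined by

\[ -c_3^\varepsilon k_0^2 + c_0 v_1 + g_0^\varepsilon(v) = -k_0^2\, \frac{1 - \tilde c_{\mathrm p}^2 + f_3(v_1, \partial_\eta v_1, v_2)}{1 - \tilde c_{\mathrm g}^2 + f_3(v_1, \partial_\eta v_1, v_2)}, \]
\[ -c_4^\varepsilon v_1 + c_{1,1} v_1^2 + c_{1,2} (\partial_\eta v_1)^2 + c_{1,3} v_2^2 + c_{1,4} v_2 \partial_\eta v_1 + g_1^\varepsilon(v) = -\frac{f_4(v_1, \partial_\eta v_1, v_2) - v_1}{1 - \tilde c_{\mathrm g}^2 + f_3(v_1, \partial_\eta v_1, v_2)}, \]
\[ c_2 v_1 + g_2^\varepsilon(v) = -\frac{2 k_0 f_3(v_1, \partial_\eta v_1, v_2)}{1 - \tilde c_{\mathrm g}^2 + f_3(v_1, \partial_\eta v_1, v_2)}, \]

so that $g_j^\varepsilon(0) = 0$, $\mathrm{d}g_j^\varepsilon[0] = 0$, $j = 0, 1, 2$ and $\mathrm{d}^2 g_1^0[0] = 0$. (We denote parameter dependence, which is always analytic, of functions and constants by the superscript $\varepsilon$; functions and constants without the superscript do not depend upon $\varepsilon$.)

The evolutionary system (15), (16) has a discrete symmetry which plays an important role in the following theory. It is reversible, that is invariant with respect to the transformation $\xi \to -\xi$, $(v_1, v_2) \to S(v_1, v_2)$, where the reverser $S$ is defined by the formula $S(v_1(\eta), v_2(\eta)) = (v_1(-\eta), -v_2(-\eta))$. This symmetry has the consequence that $(v_1(-\xi, -\eta), -v_2(-\xi, -\eta))$ solves the equation whenever $(v_1(\xi, \eta), v_2(\xi, \eta))$ is a solution.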
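Two of the structural facts used above lend themselves to a quick numerical check: the constants $c_3^\varepsilon$, $c_4^\varepsilon$ are negative, and the transformed nonlinearity $f_3$ inherits the symmetry $f_i(a, -b, -c) = f_i(a, b, c)$ from $f_1$. The following Python sketch is an illustration under stated assumptions: the function names are ours, $f_1$ is taken from Example 1 ($f_1(u, \partial_x u, \partial_t u) = -\alpha u - \beta (\partial_x u)^2$), and the default values of `alpha` and `beta` are hypothetical.

```python
import math

def wave_speeds(k0, eps):
    """Perturbed phase velocity c_p - eps^2 and the group velocity defined as its reciprocal."""
    cp = math.sqrt(1.0 + k0 ** 2) / k0
    cp_tilde = cp - eps ** 2
    return cp_tilde, 1.0 / cp_tilde

def linear_coefficients(k0, eps):
    """The constants c_3^eps and c_4^eps of the linear part of the evolutionary system."""
    cp_t, cg_t = wave_speeds(k0, eps)
    return (1.0 - cp_t ** 2) / (1.0 - cg_t ** 2), -1.0 / (1.0 - cg_t ** 2)

def f1_example(a, b, c, alpha=1.0, beta=6.0):
    """f_1 of Example 1; alpha and beta are hypothetical parameter values."""
    return -alpha * a - beta * b ** 2

def f3(a, b, c, k0, eps, f1=f1_example):
    """f_3(v1, d_eta v1, d_xi v1) = f_1(v1, d_xi v1 + k0 d_eta v1, -cg d_xi v1 - k0 cp d_eta v1)."""
    cp_t, cg_t = wave_speeds(k0, eps)
    return f1(a, c + k0 * b, -cg_t * c - k0 * cp_t * b)

if __name__ == "__main__":
    k0, eps = 1.0, 0.1
    print("c3, c4 =", linear_coefficients(k0, eps))
    a, b, c = 0.3, -0.7, 1.1
    print("f3(a, b, c)   =", f3(a, b, c, k0, eps))
    print("f3(a, -b, -c) =", f3(a, -b, -c, k0, eps))
```

The symmetry of $f_3$ holds because reversing the signs of $\partial_\eta v_1$ and $\partial_\xi v_1$ reverses the signs of both the second and third arguments of $f_1$.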

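We may express an element of $H_{\mathrm{per}}^{s}(0, 2\pi)$ as a Fourier series

\[ v_1(\eta) = \frac{1}{\sqrt{2\pi}} v_{1,0} + \frac{1}{\sqrt{\pi}} \sum_{m=1}^{\infty} \{ v_{1,m,\mathrm o} \sin(m\eta) + v_{1,m,\mathrm e} \cos(m\eta) \} \]

and define projections $P_0, P_{m,\mathrm o}, P_{m,\mathrm e}, P_m : H_{\mathrm{per}}^{s}(0, 2\pi) \to H_{\mathrm{per}}^{s}(0, 2\pi)$ by the formulae

\[ P_0 \left( \frac{1}{\sqrt{2\pi}} v_{1,0} + \frac{1}{\sqrt{\pi}} \sum_{j=1}^{\infty} (v_{1,j,\mathrm o} \sin(j\eta) + v_{1,j,\mathrm e} \cos(j\eta)) \right) = \frac{1}{\sqrt{2\pi}} v_{1,0} \]

and

\[ P_{m,\mathrm o} \left( \frac{1}{\sqrt{2\pi}} v_{1,0} + \frac{1}{\sqrt{\pi}} \sum_{j=1}^{\infty} (v_{1,j,\mathrm o} \sin(j\eta) + v_{1,j,\mathrm e} \cos(j\eta)) \right) = \frac{1}{\sqrt{\pi}} v_{1,m,\mathrm o} \sin(m\eta), \]
\[ P_{m,\mathrm e} \left( \frac{1}{\sqrt{2\pi}} v_{1,0} + \frac{1}{\sqrt{\pi}} \sum_{j=1}^{\infty} (v_{1,j,\mathrm o} \sin(j\eta) + v_{1,j,\mathrm e} \cos(j\eta)) \right) = \frac{1}{\sqrt{\pi}} v_{1,m,\mathrm e} \cos(m\eta) \]

with $P_m = P_{m,\mathrm o} + P_{m,\mathrm e}$ for $m = 1, 2, \ldots$. By extending the Fourier series coordinatewise to vector-valued functions we find that $X^s$ decomposes into a direct sum $\oplus_{m \in \mathbb{N}_0} E_m$ of subspaces, where

\[ E_m = E_{m,\mathrm o} \oplus E_{m,\mathrm e}, \qquad E_{m,\mathrm o} = \{ (v_{1,m,\mathrm o}, v_{2,m,\mathrm o}) \}, \qquad E_{m,\mathrm e} = \{ (v_{1,m,\mathrm e}, v_{2,m,\mathrm e}) \}. \]

We may therefore write $X^s = \ell_{s+1} \times \ell_s$, in which

\[ \ell_t = \Big\{ u \;\Big|\; \|u\|_t^2 := |u_0|^2 + \sum_{m=1}^{\infty} m^{2t} (|u_{m,\mathrm o}|^2 + |u_{m,\mathrm e}|^2) < \infty \Big\}, \]

and $P_{m,\mathrm o}$, $P_{m,\mathrm e}$, $P_m$ also extend naturally to projections $X^s \to X^s$ which are denoted by the same symbols. Notice that $P_m$ is infinitely smoothing due to its finite-dimensional range, so that $\|P_m v_1\|_{t_1} \le C_{m,t_1,t_2} \|v_1\|_{t_2}$, $t_1 \ge t_2$; the same smoothing property is enjoyed by $P_{m,\mathrm o}$ and $P_{m,\mathrm e}$.

The action of the reverser $S$ in the new coordinate system is readily confirmed to be

\[ S(v_{1,\mathrm o}, v_{1,\mathrm e}, v_{2,\mathrm o}, v_{2,\mathrm e}) = (-v_{1,\mathrm o}, v_{1,\mathrm e}, v_{2,\mathrm o}, -v_{2,\mathrm e}), \]

where $(v_{1,\mathrm o}, v_{2,\mathrm o}) = \{(v_{1,m,\mathrm o}, v_{2,m,\mathrm o})\}$, $(v_{1,\mathrm e}, v_{2,\mathrm e}) = \{(v_{1,m,\mathrm e}, v_{2,m,\mathrm e})\}$. Note also that the periodicity in $\eta$ combines with the translation invariance in this variable to give an $O(2)$ symmetry represented in the new coordinates by

\[ \{ (v_{1,m,\mathrm o}, v_{1,m,\mathrm e}, v_{2,m,\mathrm o}, v_{2,m,\mathrm e}) \} \to \{ (R_{ma}(v_{1,m,\mathrm o}, v_{1,m,\mathrm e}), R_{ma}(v_{2,m,\mathrm o}, v_{2,m,\mathrm e})) \} \]

for $a \in \mathbb{R}$, where $R_\theta$ is the $2 \times 2$ matrix representing a rotation through the angle $\theta$.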
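The coordinate description of the reverser can be checked on a trigonometric polynomial. The following Python sketch is an illustration only: it assumes the orthonormal basis $1/\sqrt{2\pi}$, $\sin(m\eta)/\sqrt{\pi}$, $\cos(m\eta)/\sqrt{\pi}$, computes the coefficients by the rectangle rule on a uniform grid, and uses function names of our own choosing.

```python
import math

def fourier_coefficients(v, m_max, n=512):
    """Coefficients (v_0, [v_{m,o}], [v_{m,e}]) of a 2*pi-periodic function v, m = 1..m_max."""
    h = 2.0 * math.pi / n
    etas = [j * h for j in range(n)]
    v0 = sum(v(e) for e in etas) * h / math.sqrt(2.0 * math.pi)
    odd = [sum(v(e) * math.sin(m * e) for e in etas) * h / math.sqrt(math.pi)
           for m in range(1, m_max + 1)]
    even = [sum(v(e) * math.cos(m * e) for e in etas) * h / math.sqrt(math.pi)
            for m in range(1, m_max + 1)]
    return v0, odd, even

def reverser(v1, v2):
    """The reverser S: (v1(eta), v2(eta)) -> (v1(-eta), -v2(-eta))."""
    return (lambda e: v1(-e)), (lambda e: -v2(-e))

def weighted_norm(coeffs, t):
    """Norm with weights m^(2t) on the m-th odd and even coefficients."""
    v0, odd, even = coeffs
    total = v0 ** 2
    for m, (o, e) in enumerate(zip(odd, even), start=1):
        total += m ** (2 * t) * (o ** 2 + e ** 2)
    return math.sqrt(total)

if __name__ == "__main__":
    v1 = lambda e: 0.3 + 2.0 * math.sin(e) - math.cos(3.0 * e)
    v2 = lambda e: math.sin(2.0 * e) + 0.5 * math.cos(e)
    s1, s2 = reverser(v1, v2)
    print("v1 before:", fourier_coefficients(v1, 3))
    print("v1 after :", fourier_coefficients(s1, 3))
    print("v2 before:", fourier_coefficients(v2, 3))
    print("v2 after :", fourier_coefficients(s2, 3))
```

In the output the odd coefficients of the first component and the even coefficients of the second component change sign, while the remaining coefficients are unchanged.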

Modulating Pulses for Quasilinear Wave Equations

577

The spectrum of the linearised system

v2 ε ε v1 = ∂ξ v = L v, L v2 −c3ε k02 ∂η2 v1 − c4ε v1 associated with (15), (16) was calculated by Groves & Schneider [4]; we recall the complete result since extensive use is made of it in the following analysis. The m th Fourier component satisfies the ordinary differential equations ∂ξ v1,m = v2,m , ∂ξ v2,m =

m 2 k02 (1 − cp2 ) + 1 (1 − cg2 )

v1,m ,

and the associated eigenvalues λm,ε of this system of equations are given by λ2m,ε =

m 2 k02 (1 − cp2 ) + 1 (1 − cg2 )

= (k02 + 1)(1 − m 2 ) + 2k0 (1 + k02 )1/2 (k02 + m 2 )ε2 + O((1 + m 2 )ε4 ). m = 0: We have two simple, real eigenvalues ±λ0,ε = ±(1 + k02 )1/2 + O(ε2 ). The corresponding eigenvectors are given by

1 v1 = . ±λ0,ε v2 m = 1: For ε = 0 we have a geometrically simple and algebraically double zero eigenvalue in E 1,o . The eigenvector and associated generalised eigenvector are given by v1 1 0 v1 = sin η, = sin η. v2 v2 0 1 For ε > 0 we have two simple real eigenvalues ±λ1,ε which satisfy the equation (λ1,ε )2 = 2k0 ε2 (1 + k02 )3/2 + O(ε4 ). The eigenvectors are

v1 1 = sin η. ±λ1,ε v2 The same result holds in E 1,e with sin η replaced by cos η. m > 1: We have two simple purely imaginary eigenvalues in E m,o given by ±iωm,ε , where ωm,ε = ±i(m 2 − 1)1/2 (k02 + 1)1/2 + O((1 + m)ε2 ). The eigenvectors are

v1 1 = sin(mη). v2 ±λm,ε The same result holds in E m,e with sin(mη) replaced by cos(mη). The eigenvalue picture is summarised in Fig. 2; for ε > 0 we have a twos = E , a four-dimensional weakly hyperbolic dimensional strongly hyperbolic part Xsh 0 s part Xwh = E 1 and an infinite-dimensional central part Xcs = ⊕∞ m=2 E m of phase space. s along the Notice that P0 is the projection onto the strongly hyperbolic subspace Xsh s ∪ X s , while P is the projection onto the central and weakly hyperbolic subspace Xwh 1 c s along the central and strongly hyperbolic subspace weakly hyperbolic subspace Xwh

578

M. D. Groves, G. Schneider

Fig. 2. The spectrum of the linearised problem consists of infinitely many semisimple purely imaginary eigenvalues and two real semisimple real eigenvalues together with two Jordan blocks of length two at the origin for ε = 0 or two additional semisimple real eigenvalues for ε > 0; with the exception of the geometrically simple real eigenvalues of largest magnitude all eigenvalues have geometric multiplicity two

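$X^s_{sh} \cup X^s_{c}$. In the theory below we therefore write $P_{sh}$ for $P_0$, $P_{wh}$ for $P_1$, $P_c$ for $I - P_{sh} - P_{wh}$ and also define $P_{sh,c} = P_{sh} + P_c$; we use the notation $(Z_1, Z_2) = P_{sh}(v_1, v_2)$, $(z_1, z_2) = P_{wh}(v_1, v_2)$, $(w_1, w_2) = P_c(v_1, v_2)$ and $(q_1, q_2) = P_{sh,c}(v_1, v_2)$, so that $L^\varepsilon(Z_1, Z_2) = (Z_2, \lambda_{0,\varepsilon}^2 Z_1)$, $L^\varepsilon(z_1, z_2) = (z_2, \lambda_{1,\varepsilon}^2 z_1)$ and $L^\varepsilon\{(q_{1,m}, q_{2,m})\}_{m \ge 2} = \{(q_{2,m}, -\omega_{m,\varepsilon}^2 q_{1,m})\}_{m \ge 2}$. One may formulate Eqs. (15), (16) as the coupled four-dimensional dynamical system

\[ \partial_\xi z = K z + F^\varepsilon(z, q), \tag{17} \]

where

\[ K = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad F^\varepsilon(z, q) = \begin{pmatrix} 0 \\ \lambda_{1,\varepsilon}^2 z_1 + f^\varepsilon(z, q) \end{pmatrix}, \]

\[ f^\varepsilon(z, q) = P_{wh}\big[(c_0 (z_1 + q_1) + g_0^\varepsilon(z + q))\partial_\eta^2 (z_1 + q_1) + c_{1,1}(z_1 + q_1)^2 + c_{1,2}(\partial_\eta(z_1 + q_1))^2 + c_{1,3}(z_2 + q_2)^2 + c_{1,4}(z_2 + q_2)\partial_\eta(z_1 + q_1) + g_1^\varepsilon(z + q) + (c_2 (z_1 + q_1) + g_2^\varepsilon(z + q))\partial_\eta(z_2 + q_2)\big], \]

and quasilinear wave equation

\[ \partial_\xi q_1 = q_2, \tag{18} \]

\[ \partial_\xi q_2 = -c_3^\varepsilon k_0^2 \partial_\eta^2 q_1 - c_4^\varepsilon q_1 + P_{sh,c}(d_1(z_1) q_1) + P_{sh,c}(d_2(z_1)\partial_\eta q_1) + P_{sh,c}(d_3(z_1) q_2) + P_{sh,c}(g_3^\varepsilon(z, q)\partial_\eta^2 q_1) + g_4^\varepsilon(z, q) + P_{sh,c}(g_5^\varepsilon(z, q)\partial_\eta q_2) + h^\varepsilon(z), \tag{19} \]

where

\[ d_1(z_1) = c_0 \partial_\eta^2 z_1 + 2 c_{1,1} z_1, \qquad d_2(z_1) = 2 c_{1,2}\partial_\eta z_1, \qquad d_3(z_1) = c_{1,4}\partial_\eta z_1, \]

\[ g_3^\varepsilon(z, q) = c_0 (q_1 + z_1) + g_0^\varepsilon(z + q), \]

\[ g_4^\varepsilon(z, q) = P_{sh,c}\big[(g_0^\varepsilon(z + q) - g_0^\varepsilon(z))\partial_\eta^2 z_1 + (g_2^\varepsilon(z + q) - g_2^\varepsilon(z))\partial_\eta z_2 + 2 c_{1,3} z_2 q_2 + c_{1,4} z_2 \partial_\eta q_1 + c_2 \partial_\eta z_2\, q_1 + c_{1,1} q_1^2 + c_{1,2}(\partial_\eta q_1)^2 + c_{1,3} q_2^2 + c_{1,4} q_2 \partial_\eta q_1 + g_1^\varepsilon(z + q) - g_1^\varepsilon(z)\big], \]

\[ g_5^\varepsilon(z, q) = c_2 (q_1 + z_1) + g_2^\varepsilon(z + q), \]

\[ h^\varepsilon(z) = P_{sh,c}\big[(c_0 z_1 + g_0^\varepsilon(z))\partial_\eta^2 z_1 + (c_2 z_1 + g_2^\varepsilon(z))\partial_\eta z_2 + c_{1,1} z_1^2 + c_{1,2}(\partial_\eta z_1)^2 + c_{1,3} z_2^2 + c_{1,4} z_2 \partial_\eta z_1 + g_1^\varepsilon(z)\big]; \]

the linear functions $d_j$, $j = 1, 2, 3$ and nonlinearities in Eqs. (17)–(19) satisfy the estimates

\[ \|d_j(z_1)\|_{s+1} = O(|z_1|), \qquad j = 1, 2, 3, \]

\[ |f^\varepsilon(z, q)| = O\big(\|q\|_{X^{s+1}_{sh,c}} \|(z, q)\|_{X^{s+1}} + \|(z, q)\|^3_{X^{s+1}}\big), \]

\[ |F^\varepsilon(z, q)| = O\big(\varepsilon^2 |z| + \|q\|_{X^{s+1}_{sh,c}} \|(z, q)\|_{X^{s+1}} + \|(z, q)\|^3_{X^{s+1}}\big), \]

\[ \|g_j^\varepsilon(z, q)\|_{s+1} = O(\|(z, q)\|_{X^{s+1}}), \qquad j = 3, 5, \]

\[ \|g_4^\varepsilon(z, q)\|_{s+1} = O\big(\|q\|^2_{X^{s+1}_{sh,c}} + \|q\|_{X^{s+1}_{sh,c}} |z||(z, \varepsilon)| + \|q\|_{X^{s+1}_{sh,c}} |z_2|\big), \]

\[ \|h^\varepsilon(z)\|_{s+1} = O(|z|^2). \]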

We use this formulation in the remainder of the article.
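The structure of the linear part of (17) can be checked directly. The following Python sketch, offered only as an illustration, takes $K$ as read from (17) with the coordinate ordering $z = (z_{1,o}, z_{1,e}, z_{2,o}, z_{2,e})$ (an assumption consistent with Eqs. (24), (25)) and verifies that $K^2 = 0$ and that its kernel is two-dimensional, which is the statement that $K$ consists of two Jordan blocks of length two at the origin and is therefore not diagonalisable. The helper names are ours and are not part of the article.

```python
from fractions import Fraction

# Matrix K from Eq. (17); coordinates ordered as (z_1o, z_1e, z_2o, z_2e).
K = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(mat):
    """Rank computed by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


K_SQUARED = matmul(K, K)
IS_NILPOTENT = all(entry == 0 for row in K_SQUARED for entry in row)
KERNEL_DIMENSION = len(K) - rank(K)

print("K^2 = 0:", IS_NILPOTENT)
print("dim ker K:", KERNEL_DIMENSION)
```

A nonzero matrix with $K^2 = 0$ has only the eigenvalue zero and cannot be diagonalisable; a two-dimensional kernel in dimension four then forces exactly two Jordan blocks of length two.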

3. Approximate Modulating Pulses

In this section we construct a pair of approximate modulating pulse solutions to Eqs. (17)–(19) from which we later obtain genuine modulating pulses by perturbation arguments. Consider the approximate problem obtained by removing the term $h^\varepsilon(z)$ from Eq. (19). This approximate problem has the property that $E_1 = \{q = 0\}$ is an invariant subspace, the flow in which is given by the equation

\[ \partial_\xi z = K z + F^\varepsilon(z, 0), \]

where $|F^\varepsilon(z, 0)| = O(|z|^3 + \varepsilon^2 |z|)$. Under a sign condition on the cubic part of the nonlinearity, the above equation admits a pair of small-amplitude homoclinic orbits $p^{\varepsilon\pm}$ for small, positive values of $\varepsilon$ (see below), and each of these orbits serves as an approximate modulating pulse. Our strategy is therefore to use a sequence of changes of variable which systematically removes the term $h^\varepsilon_j(z)$ that is homogeneous of degree $j$ in $(z, \varepsilon)$ from $h^\varepsilon$ while preserving the overall structure of the equations. This procedure is carried out in Sect. 4 below, where it is shown that the remaining terms in $h^\varepsilon$ can be made exponentially small in comparison to $\varepsilon$, so that $p^{\varepsilon\pm}$ become very good approximations to genuine modulating pulses. The transformation which eliminates the term $h^0_2(z)$ (the term that is homogeneous of degree two in $z$ and does not depend upon $\varepsilon$) affects the coefficient in $F^\varepsilon(z, 0)$ whose sign determines whether homoclinic bifurcation takes place; we therefore consider this transformation as a separate preliminary step.


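Lemma 1. There is a near-identity, finite-dimensional change of coordinates which transforms the coupled systems (17) and (18), (19) into

\[ \partial_\xi z = K z + \tilde F^\varepsilon(z, q), \tag{20} \]

\[ \partial_\xi q_1 = q_2 + g_6^\varepsilon(z, q) + \tilde h_1^\varepsilon(z), \tag{21} \]

\[ \partial_\xi q_2 = -c_3^\varepsilon k_0^2 \partial_\eta^2 q_1 - c_4^\varepsilon q_1 + P_{sh,c}(d_1(z_1) q_1) + P_{sh,c}(d_2(z_1)\partial_\eta q_1) + P_{sh,c}(d_3(z_1) q_2) + P_{sh,c}(\tilde g_3^\varepsilon(z, q)\partial_\eta^2 q_1) + \tilde g_4^\varepsilon(z, q) + P_{sh,c}(\tilde g_5^\varepsilon(z, q)\partial_\eta q_2) + \tilde h_2^\varepsilon(z), \tag{22} \]

and preserves the reversibility. The nonlinearities $\tilde F^\varepsilon$, $\tilde g_3^\varepsilon$, $\tilde g_4^\varepsilon$, $\tilde g_5^\varepsilon$ satisfy the same estimates as respectively $F^\varepsilon$, $g_3^\varepsilon$, $g_4^\varepsilon$, $g_5^\varepsilon$, while

\[ \|g_6^\varepsilon(z, \tilde q)\|_{s+2} = O\big(|z|\, \|\tilde q\|_{X^{s+1}_{sh,c}} \|(z, \tilde q)\|_{X^{s+1}}\big), \qquad \|\tilde h^\varepsilon(z)\|_{X^{s+1}_{sh,c}} = O(|z|^2 |(z, \varepsilon)|). \]

Proof. Write $h^\varepsilon(z) = h_2^0(z) + \hat h^\varepsilon(z)$, so that $\|\hat h^\varepsilon(z)\|_{X^{s+1}_{sh,c}} = O(|z|^2 |(z, \varepsilon)|)$. Observe that $h_2^0$ is a mapping from $E_1$ to $E_0 \oplus E_2$, and this fact suggests using a finite-dimensional change of coordinates of the form $\tilde q = q + \Phi(z)$, where $P_m \Phi(z) = 0$ for $m \neq 0, 2$. Substituting $q = \tilde q - \Phi(z)$ into (17) and (18), (19), one finds that they are transformed into respectively (20) and (21), (22) with

\[ \tilde F^\varepsilon(z, \tilde q) = \begin{pmatrix} 0 \\ \lambda_{1,\varepsilon}^2 z_1 + \tilde f^\varepsilon(z, \tilde q) \end{pmatrix}, \qquad \tilde f^\varepsilon(z, \tilde q) = f^\varepsilon(z, \tilde q - \Phi(z)), \]

\[ \tilde g_j^\varepsilon(z, \tilde q) = g_j^\varepsilon(z, \tilde q - \Phi(z)), \qquad j = 3, 5, \]

\[ \tilde g_4^\varepsilon(z, \tilde q) = g_4^\varepsilon(z, \tilde q - \Phi(z)) - g_4^\varepsilon(z, -\Phi(z)) - P_{sh,c}\big(g_3^\varepsilon(z, \tilde q - \Phi(z)) - g_3^\varepsilon(z, -\Phi(z))\big)\partial_\eta^2 \Phi_1(z) - P_{sh,c}\big(g_5^\varepsilon(z, \tilde q - \Phi(z)) - g_5^\varepsilon(z, -\Phi(z))\big)\partial_\eta \Phi_2(z) + d\Phi_2[z]\big(F^\varepsilon(z, \tilde q - \Phi(z)) - F^\varepsilon(z, -\Phi(z))\big), \]

\[ g_6^\varepsilon(z, \tilde q) = d\Phi_1[z]\big(F^\varepsilon(z, \tilde q - \Phi(z)) - F^\varepsilon(z, -\Phi(z))\big), \]

and

\[ \tilde h^\varepsilon(z) = -L^\varepsilon \Phi(z) + d\Phi[z](K z) + d\Phi[z]\big(F^\varepsilon(z, -\Phi(z))\big) + \begin{pmatrix} 0 \\ -P_{sh,c}(d_1(z_1)\Phi_1(z)) - P_{sh,c}(d_2(z_1)\partial_\eta \Phi_1(z)) - P_{sh,c}(d_3(z_1)\Phi_2(z)) \end{pmatrix} + \begin{pmatrix} 0 \\ g_4^\varepsilon(z, -\Phi(z)) - P_{sh,c}\big[g_3^\varepsilon(z, -\Phi(z))\partial_\eta^2 \Phi_1(z) + g_5^\varepsilon(z, -\Phi(z))\partial_\eta \Phi_2(z)\big] + h^\varepsilon(z) \end{pmatrix}; \]

our objective is therefore achieved by choosing $\Phi(z)$ to be a polynomial which is homogeneous of degree two and satisfies

\[ L^0 \Phi(z) - d\Phi[z](K z) = \begin{pmatrix} 0 \\ h_2^0(z) \end{pmatrix}. \tag{23} \]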
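The solvability of (23) is a finite-dimensional linear-algebra statement and can be checked mechanically. The Python sketch below assembles the matrix of the map $\Phi \mapsto L^0\Phi - d\Phi[z](Kz)$ on $\mathbb{R}^2$-valued quadratic polynomials in $(z_{1,o}, z_{1,e}, z_{2,o}, z_{2,e})$, using $L^0(Z_1, Z_2) = (Z_2, \lambda_{0,0}^2 Z_1)$ and $Kz = (z_{2,o}, z_{2,e}, 0, 0)$, and confirms that it has full rank. The value `LAMBDA_SQ = 2` is an assumed placeholder for $\lambda_{0,0}^2$ (the check only needs it to be nonzero), and all function names are illustrative rather than taken from the article.

```python
from fractions import Fraction
from itertools import product

# Assumed sample value standing in for lambda_{0,0}^2; any nonzero value works.
LAMBDA_SQ = Fraction(2)

# Exponents (i, j, k, l) of z_1o^i z_1e^j z_2o^k z_2e^l with i + j + k + l = 2.
MONOMIALS = [a for a in product(range(3), repeat=4) if sum(a) == 2]
# Basis element (c, a): the monomial a placed in component c of an R^2-valued polynomial.
BASIS = [(c, a) for c in (0, 1) for a in MONOMIALS]
INDEX = {b: n for n, b in enumerate(BASIS)}


def directional_derivative(alpha):
    """Return d(z^alpha)[z](Kz) as a dict {exponent: coefficient}."""
    i, j, k, l = alpha
    terms = {}
    if i > 0:  # derivative in z_1o, multiplied by z_2o
        terms[(i - 1, j, k + 1, l)] = Fraction(i)
    if j > 0:  # derivative in z_1e, multiplied by z_2e
        terms[(i, j - 1, k, l + 1)] = Fraction(j)
    return terms


def apply_operator(c, alpha):
    """Image of a basis element under Phi -> L^0 Phi - dPhi[z](Kz)."""
    image = {}
    if c == 0:
        image[(1, alpha)] = LAMBDA_SQ    # L^0 (m, 0) = (0, lambda^2 m)
    else:
        image[(0, alpha)] = Fraction(1)  # L^0 (0, m) = (m, 0)
    for beta, coeff in directional_derivative(alpha).items():
        image[(c, beta)] = image.get((c, beta), Fraction(0)) - coeff
    return image


def operator_matrix():
    """Matrix of the operator with respect to BASIS (columns are images)."""
    n = len(BASIS)
    mat = [[Fraction(0)] * n for _ in range(n)]
    for col, (c, alpha) in enumerate(BASIS):
        for b, coeff in apply_operator(c, alpha).items():
            mat[INDEX[b]][col] = coeff
    return mat


def rank(mat):
    """Rank computed by exact Gaussian elimination over the rationals."""
    rows = [row[:] for row in mat]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


MATRIX = operator_matrix()
print("dimension:", len(BASIS), "rank:", rank(MATRIX))
```

The reason the rank is full is structural: the map is $A \otimes I - I \otimes D$, where $A$ is the invertible $2 \times 2$ matrix of $L^0$ and $D$ (differentiation along $Kz$) is nilpotent on polynomials and commutes with $A \otimes I$.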


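Notice that (23) decomposes into component equations for $P_0\Phi(z)$, $P_{2,o}\Phi(z)$ and $P_{2,e}\Phi(z)$ in respectively $E_0$, $E_{2,o}$ and $E_{2,e}$. Let $\mathcal{R}^2$ denote the space of $\mathbb{R}^2$-valued polynomials of degree two in the variables $z_{1,o}$, $z_{1,e}$, $z_{2,o}$, $z_{2,e}$, equip $\mathcal{R}^2$ with the basis

\[ B = \big\{(1, 0)^T z_{1,o}^i z_{1,e}^j z_{2,o}^k z_{2,e}^\ell,\ (0, 1)^T z_{1,o}^i z_{1,e}^j z_{2,o}^k z_{2,e}^\ell\big\}_{i+j+k+\ell=2}, \]

and consider the linear operator $\mathcal{L} : \mathcal{R}^2 \to \mathcal{R}^2$ defined by $(\mathcal{L}\Phi)(z) = L^0\Phi(z) - d\Phi[z](K z)$. Using the calculations

\[ \mathcal{L}\big((1, 0)^T z_{1,o}^i z_{1,e}^j z_{2,o}^k z_{2,e}^\ell\big) = \lambda_{0,0}^2 (0, 1)^T z_{1,o}^i z_{1,e}^j z_{2,o}^k z_{2,e}^\ell - (1, 0)^T\big(i\, z_{1,o}^{i-1} z_{1,e}^j z_{2,o}^{k+1} z_{2,e}^\ell + j\, z_{1,o}^i z_{1,e}^{j-1} z_{2,o}^k z_{2,e}^{\ell+1}\big), \]

\[ \mathcal{L}\big((0, 1)^T z_{1,o}^i z_{1,e}^j z_{2,o}^k z_{2,e}^\ell\big) = (1, 0)^T z_{1,o}^i z_{1,e}^j z_{2,o}^k z_{2,e}^\ell - (0, 1)^T\big(i\, z_{1,o}^{i-1} z_{1,e}^j z_{2,o}^{k+1} z_{2,e}^\ell + j\, z_{1,o}^i z_{1,e}^{j-1} z_{2,o}^k z_{2,e}^{\ell+1}\big), \]

to compute the matrix of $\mathcal{L}$ with respect to $B$, we find that this matrix is invertible, so that the component equation of (23) for $P_0\Phi(z)$ has a unique solution. A similar argument yields $P_{2,o}\Phi(z)$ and $P_{2,e}\Phi(z)$, and one concludes that (23) admits a unique solution for $\Phi(z)$.

Let us now examine the system of ordinary differential equations

\[ \partial_\xi z_{1,o} = z_{2,o}, \qquad \partial_\xi z_{1,e} = z_{2,e}, \tag{24} \]

\[ \partial_\xi z_{2,o} = \lambda_{1,\varepsilon}^2 z_{1,o} + \tilde f_o^\varepsilon(z_o, z_e), \qquad \partial_\xi z_{2,e} = \lambda_{1,\varepsilon}^2 z_{1,e} + \tilde f_e^\varepsilon(z_o, z_e), \tag{25} \]

where $\tilde f_o^\varepsilon = P_{1,o}\tilde f^\varepsilon|_{q=0}$ is $O(|z_o||(z_o, z_e)||(z_o, z_e, \varepsilon^2)|)$ and $\tilde f_e^\varepsilon = P_{1,e}\tilde f^\varepsilon|_{q=0}$ is $O(|z_e||(z_o, z_e)||(z_o, z_e, \varepsilon^2)|)$, which is obtained by neglecting $\tilde h^\varepsilon$ and setting $q = 0$ in Eq. (20). This system inherits the reversibility and O(2) symmetry of Eqs. (17)–(19): it is invariant under the transformation $\xi \mapsto -\xi$, $(z_{1,o}, z_{1,e}, z_{2,o}, z_{2,e}) \mapsto S_{wh}(z_{1,o}, z_{1,e}, z_{2,o}, z_{2,e})$, where the reverser $S_{wh}$ is defined by

\[ S_{wh}(z_{1,o}, z_{1,e}, z_{2,o}, z_{2,e}) = (-z_{1,o}, z_{1,e}, z_{2,o}, -z_{2,e}), \]

and under the transformation

\[ \begin{pmatrix} z_{1,o} \\ z_{1,e} \end{pmatrix} \mapsto R_a \begin{pmatrix} z_{1,o} \\ z_{1,e} \end{pmatrix}, \qquad \begin{pmatrix} z_{2,o} \\ z_{2,e} \end{pmatrix} \mapsto R_a \begin{pmatrix} z_{2,o} \\ z_{2,e} \end{pmatrix} \]

for each $a \in [0, 2\pi)$. Introducing the scaled variables

\[ \check\xi = \varepsilon \xi, \qquad z_1(\xi) = \varepsilon \check z_1(\check\xi), \qquad z_2(\xi) = \varepsilon^2 \check z_2(\check\xi), \]

one finds from (24)–(25) that

\[ \partial_{\check\xi} \check z_{1,o} = \check z_{2,o}, \tag{26} \]

\[ \partial_{\check\xi} \check z_{2,o} = \check C_1 \check z_{1,o} - \check C_2 \check z_{1,o}(\check z_{1,o}^2 + \check z_{1,e}^2) + R_o^\varepsilon(\check z_{1,o}, \check z_{1,e}, \check z_{2,o}, \check z_{2,e}), \tag{27} \]

\[ \partial_{\check\xi} \check z_{1,e} = \check z_{2,e}, \tag{28} \]

\[ \partial_{\check\xi} \check z_{2,e} = \check C_1 \check z_{1,e} - \check C_2 \check z_{1,e}(\check z_{1,o}^2 + \check z_{1,e}^2) + R_e^\varepsilon(\check z_{1,o}, \check z_{1,e}, \check z_{2,o}, \check z_{2,e}), \tag{29} \]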


in which $\check C_1 = 2 k_0 (1 + k_0^2)^{3/2}$ and the $O(\varepsilon^2)$ remainder terms $R_o^\varepsilon$ and $R_e^\varepsilon$ are respectively odd and even in $(\check z_{1,o}, \check z_{2,e})$ and satisfy $R_o^\varepsilon(0, \check z_{1,e}, 0, \check z_{2,e}) = 0$, $R_e^\varepsilon(\check z_{1,o}, 0, \check z_{2,o}, 0) = 0$. The remaining coefficient $\check C_2$ is given by the formulae

\[ \check C_2 = P_{1,o}\Big(\frac{1}{\pi^{3/2}} f^{0;3,0}[\{\sin\eta\}^{(3)}, 0] + \frac{1}{\pi^{3/2}} f^{0;1,1}[\sin\eta, -\Phi_{2000}]\Big) = P_{1,e}\Big(\frac{1}{\pi^{3/2}} f^{0;3,0}[\{\cos\eta\}^{(3)}, 0] + \frac{1}{\pi^{3/2}} f^{0;1,1}[\cos\eta, -\Phi_{0200}]\Big), \tag{30} \]

and is required to be positive. Here we have adopted the notation

\[ f^{0;n_1,n_2} = \frac{1}{(n_1 + n_2)!}\, d_{1,2}^{n_1,n_2} f^0[0, 0], \qquad \Phi(z) = \sum_{i+j+k+\ell=2} \Phi_{ijk\ell}\, z_{1o}^i z_{1e}^j z_{2o}^k z_{2e}^\ell, \]

and $\{u\}^{(n)}$ is an abbreviation for the $n$-tuple $(u, \ldots, u)$.

Fig. 3. Dynamics in the $(\check z_{1,e}, \check z_{2,e})$ coordinate plane

The system (26)–(29) has the property that the $(\check z_{1,e}, \check z_{2,e})$ coordinate plane is invariant; in fact each orbit in its four-dimensional phase space is obtained from an orbit in the $(\check z_{1,e}, \check z_{2,e})$ coordinate plane by a rotation $R_a$ for some $a \in (0, 2\pi)$ (so that each subspace $(R_a(0, \check z_{1,e}), R_a(0, \check z_{2,e}))$, $a \in (0, 2\pi)$ is invariant). The phase portrait of the $(\check z_{1,e}, \check z_{2,e})$ coordinate plane in the limit $\varepsilon \to 0$ is shown in Fig. 3. Notice in particular the two homoclinic orbits $\check p^\pm$ given by the explicit formulae

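\[ \check p_{1,e}^\pm(\check\xi) = \pm\Big(\frac{2\check C_1}{\check C_2}\Big)^{1/2} \operatorname{sech}(\check C_1^{1/2}\check\xi), \qquad \check p_{2,e}^\pm(\check\xi) = \mp\Big(\frac{2\check C_1^2}{\check C_2}\Big)^{1/2} \operatorname{sech}(\check C_1^{1/2}\check\xi)\tanh(\check C_1^{1/2}\check\xi); \]

these orbits are reversible, that is they satisfy $S_{wh}\check p^\pm(-\check\xi) = \check p^\pm(\check\xi)$. A familiar reversibility argument shows that the phase portrait of the $(\check z_{1,e}, \check z_{2,e})$ coordinate plane is qualitatively the same as that shown in Fig. 3 for small non-zero values of $\varepsilon$ and yields Lemma 2 below.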
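As a sanity check on the explicit formulae for $\check p^\pm$, the following Python sketch verifies numerically that they solve the $\varepsilon \to 0$ limit of (28), (29) restricted to the invariant plane $\check z_{1,o} = \check z_{2,o} = 0$, namely $\check z_1' = \check z_2$, $\check z_2' = \check C_1 \check z_1 - \check C_2 \check z_1^3$, and that they have the parity required by reversibility. The numerical values of `C1` and `C2` are arbitrary positive placeholders (they are not computed from $k_0$ or from (30)), and the function names are ours.

```python
import math

# Arbitrary positive sample values standing in for \check C_1 and \check C_2.
C1, C2 = 1.7, 0.9


def p1(x, sign=1.0):
    """First component: +/- (2 C1 / C2)^(1/2) sech(C1^(1/2) x)."""
    return sign * math.sqrt(2.0 * C1 / C2) / math.cosh(math.sqrt(C1) * x)


def p2(x, sign=1.0):
    """Second component: -/+ (2 C1^2 / C2)^(1/2) sech(.) tanh(.)."""
    b = math.sqrt(C1)
    return -sign * math.sqrt(2.0 * C1 ** 2 / C2) * math.tanh(b * x) / math.cosh(b * x)


def residuals(x, sign=1.0, h=1e-4):
    """Residuals of z1' = z2 and z2' = C1 z1 - C2 z1^3 via central differences."""
    d_p1 = (p1(x + h, sign) - p1(x - h, sign)) / (2.0 * h)
    d_p2 = (p2(x + h, sign) - p2(x - h, sign)) / (2.0 * h)
    r1 = d_p1 - p2(x, sign)
    r2 = d_p2 - (C1 * p1(x, sign) - C2 * p1(x, sign) ** 3)
    return r1, r2


GRID = [0.25 * k for k in range(-20, 21)]
MAX_RESIDUAL = max(
    abs(r) for sign in (1.0, -1.0) for x in GRID for r in residuals(x, sign)
)
print("largest residual on the grid:", MAX_RESIDUAL)
```

The residual is limited only by the finite-difference error; the first component is even and the second odd in $\check\xi$, which is the reversibility property restricted to the $(\check z_{1,e}, \check z_{2,e})$ plane.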


Lemma 2. Equations (24), (25) admit a pair $p^{\varepsilon\pm}$ of reversible homoclinic orbits of the form

\[ \begin{pmatrix} p_1^{\varepsilon\pm}(\xi) \\ p_2^{\varepsilon\pm}(\xi) \end{pmatrix} = \begin{pmatrix} \varepsilon\, \tilde p_1^{\varepsilon\pm}(\varepsilon\xi) \\ \varepsilon^2\, \tilde p_2^{\varepsilon\pm}(\varepsilon\xi) \end{pmatrix}, \]

where $\tilde p_1^{\varepsilon\pm}$, $\tilde p_2^{\varepsilon\pm}$ are smooth functions with bounded derivatives. These homoclinic orbits satisfy

\[ |p_1^{\varepsilon\pm}(\xi)| \le c_h\, \varepsilon\, e^{-\theta\varepsilon|\xi|}, \qquad |p_2^{\varepsilon\pm}(\xi)| \le c_h\, \varepsilon^2 e^{-\theta\varepsilon|\xi|}, \qquad \xi \in \mathbb{R} \]

for any $\theta \in (0, (2k_0)^{1/2}(1 + k_0^2)^{3/4})$.

The next step is a partial normal-form theory which eliminates sufficiently many terms in the Taylor expansion of $h^\varepsilon(z)$ that the remaining terms are exponentially small in comparison with $\varepsilon$. A central requirement of the relevant transformation theory, which is based upon a theory for finite-dimensional dynamical systems given by Iooss & Lombardi [7], is that the linearised vector field in the dynamical system for $z$ should be diagonalisable; this condition ensures that certain estimates hold uniformly in the order of the terms eliminated from the vector field (see Lemma 1). The matrix $K$ is clearly not diagonalisable, and this difficulty is overcome using the following scaled variables. We replace $\varepsilon$ with an alternative small parameter $\mu$ by writing $\varepsilon = \mu^2$ for $\mu \in [0, \mu_0]$, $\mu_0 = \sqrt{\varepsilon_0}$ and choosing $\mu_0$, and hence $\varepsilon_0$, as small as necessary in the following analysis. Defining

\[ z_1' = \mu^{-1} z_1, \qquad z_2' = \check C_1^{-1/2}\mu^{-3} z_2, \qquad (q_1', q_2') = \mu^{-2}(q_1, q_2), \]

one finds that (20)–(22) are transformed into

\[ \partial_\xi z = F_1^\mu(z) + F_2^\mu(z, q), \tag{31} \]

\[ \partial_\xi q_1 = q_2 + g_6^\mu(z, q) + h_1^\mu(z), \tag{32} \]

\[ \partial_\xi q_2 = -c_3^\mu k_0^2 \partial_\eta^2 q_1 - c_4^\mu q_1 + \mu P_{sh,c}(d_1(z_1) q_1) + \mu P_{sh,c}(d_2(z_1)\partial_\eta q_1) + \mu P_{sh,c}(d_3(z_1) q_2) + P_{sh,c}(g_3^\mu(z, q)\partial_\eta^2 q_1) + g_4^\mu(z, q) + P_{sh,c}(g_5^\mu(z, q)\partial_\eta q_2) + h_2^\mu(z), \tag{33} \]

in which

\[ F_1^\mu(z) = F^\mu(z, 0), \qquad F_2^\mu(z, q) = F^\mu(z, q) - F^\mu(z, 0), \]

\[ F^\mu(z, q) = \begin{pmatrix} \check C_1^{1/2}\mu^2 z_2 \\ \check C_1^{1/2}\mu^2 z_1 + \check C_1^{-1/2}\big(\mu^{-2}(\lambda_{1,\mu}^2 - \check C_1\mu^4) z_1 + \mu^{-3}\tilde f^\varepsilon(\mu z_1, \check C_1^{1/2}\mu^3 z_2, \mu^2 q)\big) \end{pmatrix}, \]

\[ g_j^\mu(z, q) = \tilde g_j^\varepsilon(\mu z_1, \check C_1^{1/2}\mu^3 z_2, \mu^2 q), \qquad j = 3, 5, \]

\[ g_j^\mu(z, q) = \mu^{-2}\tilde g_j^\varepsilon(\mu z_1, \check C_1^{1/2}\mu^3 z_2, \mu^2 q), \qquad j = 4, 6, \]

\[ h^\mu(z) = \mu^{-2}\tilde h^\varepsilon(\mu z_1, \check C_1^{1/2}\mu^3 z_2) \]


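and, with a slight abuse of notation, we have abbreviated $\lambda_{m,\varepsilon}|_{\varepsilon=\mu^2}$, $\omega_{m,\varepsilon}|_{\varepsilon=\mu^2}$, $c_3^\varepsilon|_{\varepsilon=\mu^2}$ and $c_4^\varepsilon|_{\varepsilon=\mu^2}$ to respectively $\lambda_{m,\mu}$, $\omega_{m,\mu}$, $c_3^\mu$ and $c_4^\mu$ (the primes have been dropped for notational simplicity). The linear part of the vector field on the right-hand side of (37) at $\mu = 0$ is the zero matrix, which is trivially diagonalisable, while the nonlinearities in (31)–(33) satisfy the estimates

\[ \begin{aligned} |F_1^\mu(z)| &= O(\mu^2|z| + |z|^3), \\ |F_2^\mu(z, q)| &= O\big(\|q\|_{X^{s+1}_{sh,c}}\|(z, q)\|_{X^{s+1}}\big), \\ \|g_j^\mu(z, q)\|_{s+1} &= O(\mu\|(z, q)\|_{X^{s+1}}), \qquad j = 3, 5, \\ \|g_4^\mu(z, q)\|_{s+1} &= O\big(\mu^2\|q\|^2_{X^{s+1}_{sh,c}} + \mu^2\|q\|_{X^{s+1}_{sh,c}}|z||(z, \mu)|\big), \\ \|g_6^\mu(z, q)\|_{s+2} &= O\big(\mu^2|z|\,\|q\|_{X^{s+1}_{sh,c}}\|(z, q)\|_{X^{s+1}}\big), \\ \|h^\mu(z)\|_{X^{s+1}_{sh,c}} &= O(\mu|z|^2|(z, \mu)|). \end{aligned} \tag{34–36} \]

The homoclinic orbits detected in Lemma 2 are denoted in the new variables by $p^{\mu\pm}$ and satisfy the estimate

\[ |p^{\mu\pm}(\xi)| \le c_h\, \mu\, e^{-\mu^2\theta|\xi|}, \qquad \xi \in \mathbb{R}; \]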

for notational simplicity we henceforth use the symbol $p^\mu$ to denote either of the functions $p^{\mu+}$, $p^{\mu-}$.

A final preliminary step is necessary, namely the elimination of terms whose order is less than six and which are linear in $q$ from the Taylor expansion of $F_2^\mu(z, q)$. Our existence theory for modulating pulses is based upon perturbation arguments around an approximate modulating pulse $p^\mu$, and the presence of these terms leads to difficulties in obtaining the estimates necessary to construct an iteration scheme for this purpose (see the remarks above Proposition 11).

Lemma 3. There is a near-identity, finite-dimensional change of variables which transforms the coupled systems (31) and (32), (33) into

\[ \partial_\xi z = \bar F_1^\mu(z) + \bar F_2^\mu(z, q), \tag{37} \]

\[ \partial_\xi q_1 = q_2 + \bar g_6^\mu(z, q) + \bar h_1^\mu(z), \tag{38} \]

\[ \partial_\xi q_2 = -c_3^\mu k_0^2\partial_\eta^2 q_1 - c_4^\mu q_1 + \mu P_{sh,c}(d_1(z_1) q_1) + \mu P_{sh,c}(d_2(z_1)\partial_\eta q_1) + \mu P_{sh,c}(d_3(z_1) q_2) + P_{sh,c}(\bar g_3^\mu(z, q)\partial_\eta^2 q_1) + \bar g_4^\mu(z, q) + P_{sh,c}(\bar g_5^\mu(z, q)\partial_\eta q_2) + \bar h_2^\mu(z) \tag{39} \]

and preserves the reversibility. The nonlinearities $\bar F_1^\mu$, $\bar h_j^\mu$ and $\bar g_j^\mu$ satisfy the same estimates as respectively $F_1^\mu$, $h_j^\mu$ and $g_j^\mu$, and

\[ |\bar F_2^\mu(z, q)| = O\big(|z||(z, \mu)|^4\|q\|_{X^{s+1}_{sh,c}} + \|q\|^2_{X^{s+1}_{sh,c}}\big). \tag{40} \]

Proof. We write

\[ F_2^\mu(z, q) = \hat F_2^\mu(z, q) + \sum_{i=1}^{5}\sum_{j=0}^{5-i} \mu^j b_{i,j}(z, q), \tag{41} \]

where

\[ |\hat F_2^\mu(z, q)| = O\big(|z||(z, \mu)|^4\|q\|_{X^{s+1}_{sh,c}} + \|q\|^2_{X^{s+1}_{sh,c}}\big) \]

and $b_{i,j}$ is a $\mu$-independent operator which is homogeneous of degree $i$ in its first argument and linear in its second. Because $P_{sh,c}F^\mu(z, q) = 0$, one finds that in fact

\[ b_{2i+1,j}(z, q) = \sum_{k=0}^{i+1} b^k_{2i+1,j}(z, q_{2k}), \quad i = 0, 1, 2, \qquad b_{2i,j}(z, q) = \sum_{k=1}^{i} b^k_{2i,j}(z, q_{2k+1}), \quad i = 1, 2, \]

where $b^k_{i,j}$ is again homogeneous of degree $i$ in its first argument and linear in its second. The change of variable

\[ \tilde z = z + d^0_{1,0}(z, Z), \]

in which $d^0_{1,0}$ is linear in both its arguments, may be employed to eliminate the term $b^0_{1,0}(z, Z)$ (at the expense of modifying $\hat F_2^\mu$, $g_j^\mu$, $h^\mu$ and the higher-order terms in the sum on the right-hand side of (41)). Proceeding as in Lemma 1, we find that it is necessary to choose $d^0_{1,0}$ so that

\[ d^0_{1,0}(z, L^0 Z) = -b^0_{1,0}(z, Z), \]

and the invertibility of $L^0 : X^s_{sh} \to X^s_{sh}$ implies that this equation can always be solved. Similar arguments show that the change of variables

\[ \tilde z = z + \mu^j d^k_{2i+1,j}(z, q_{2k}), \]

in which $d^k_{i,j}$ is homogeneous of degree $i$ in its first argument and linear in its second, can be used to eliminate the term $\mu^j b^k_{2i+1,j}(z, q_{2k})$ (at the expense of modifying higher-order terms). Our objective is therefore achieved by using the composition of the above changes of variables in the correct order.

In Sect. 4 we use a sequence of changes of variable of the form

\[ \tilde q = q + \Phi^j(\mu, z), \qquad j = 4, \ldots, p, \tag{42} \]

where $\Phi^j(\mu, z)$ is homogeneous of degree $j$ in $(z, \mu)$, to remove the terms $h_j^\mu(z)$, $j = 4, \ldots, p$ that are homogeneous of degree $j$ in $(z, \mu)$ from the Taylor expansion of $h^\mu(z)$ (at the expense of modifying higher-order terms). Observe that $\tilde F_2^\mu(z, \tilde q) = F_2^\mu(z, \tilde q - \Phi^4(\mu, z))$ may contain terms of the kind identified in the sum on the right-hand side of (41). This difficulty can be avoided by using the transformation (42) for $j = 4$ before the change of variables given in Lemma 3; in this fashion we may assume without loss of generality that the normal-form transformation constructed in Sect. 4 does not affect the estimate (40).


4. Normal-Form Theory

4.1. Construction of the normal-form transformation. Our normal-form theory consists of a sequence of changes of variable which systematically removes the terms $h_j^\mu(z)$, $j = 4, \ldots, p$ that are homogeneous of degree $j$ in $(z, \mu)$ from $h^\mu(z)$ while preserving the overall structure of Eqs. (37)–(39). It is possible to make an optimal choice of $p$ so that the remaining terms are exponentially small in comparison to $\mu$; the functions $p^{\mu\pm}$ found in Sect. 3 therefore become very good approximations to genuine modulating pulse solutions and can be used as the starting point for a perturbation argument to find genuine modulating pulses. Our analysis is based upon a theory for finite-dimensional dynamical systems given by Iooss & Lombardi [7], and we use their notation and refer to several of their combinatorial results here.

The dependence of our equations upon $\mu$ is accommodated by introducing the new variable $y = (z, \mu)$ and attaching the additional equation $\partial_\xi \mu = 0$ to Eq. (37); in this notation Eqs. (37)–(39) are written as

\[ \partial_\xi y = F_1(y) + F_2(y, q), \tag{43} \]

\[ \partial_\xi q_1 = q_2 + g_6(y, q) + h_1(y), \tag{44} \]

\[ \partial_\xi q_2 = -c_3^\mu k_0^2\partial_\eta^2 q_1 - c_4^\mu q_1 + P_{sh,c}(d_4(y) q_1) + P_{sh,c}(d_5(y)\partial_\eta q_1) + P_{sh,c}(d_6(y) q_2) + P_{sh,c}(g_3(y, q)\partial_\eta^2 q_1) + g_4(y, q) + P_{sh,c}(g_5(y, q)\partial_\eta q_2) + h_2(y), \tag{45} \]

where

\[ F_1(y) = \begin{pmatrix} \bar F_1^\mu(z) \\ 0 \end{pmatrix}, \qquad F_2(y, q) = \begin{pmatrix} \bar F_2^\mu(z, q) \\ 0 \end{pmatrix}, \]

\[ g_j(y, q) = \bar g_j^\mu(z, q), \quad j = 3, \ldots, 6, \qquad h_j(y) = \bar h_j^\mu(z), \quad j = 1, 2, \qquad d_{j+3}(y) = \mu\, d_j(z_1), \quad j = 1, 2, 3. \]

We use a change of variable of the form

\[ \tilde q = q + \Phi(y), \]

where $\Phi : \mathbb{R}^5 \to X^{s+2}_{sh,c}$, which transforms Eqs. (43)–(45) into

\[ \partial_\xi y = \tilde F_1(y) + \tilde F_2(y, \tilde q), \tag{46} \]

\[ \partial_\xi \tilde q_1 = \tilde q_2 + \tilde g_6(y, \tilde q) + \tilde h_1(y), \tag{47} \]

\[ \partial_\xi \tilde q_2 = -c_3^\mu k_0^2\partial_\eta^2 \tilde q_1 - c_4^\mu \tilde q_1 + P_{sh,c}(d_4(y)\tilde q_1) + P_{sh,c}(d_5(y)\partial_\eta \tilde q_1) + P_{sh,c}(d_6(y)\tilde q_2) + P_{sh,c}(\tilde g_3(y, \tilde q)\partial_\eta^2 \tilde q_1) + \tilde g_4(y, \tilde q) + P_{sh,c}(\tilde g_5(y, \tilde q)\partial_\eta \tilde q_2) + \tilde h_2(y). \tag{48} \]


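Here

\[ \tilde F_1(y) = F_1(y), \qquad \tilde F_2(y, \tilde q) = F_2(y, \tilde q - \Phi(y)), \tag{49} \]

\[ \tilde g_j(y, \tilde q) = g_j(y, \tilde q - \Phi(y)), \qquad j = 3, 5, \tag{50} \]

\[ \tilde g_4(y, \tilde q) = g_4(y, \tilde q - \Phi(y)) - g_4(y, -\Phi(y)) + d\Phi_2[y]\big(F(y, \tilde q - \Phi(y)) - F(y, -\Phi(y))\big) - P_{sh,c}\big(g_3(y, \tilde q - \Phi(y)) - g_3(y, -\Phi(y))\big)\partial_\eta^2\Phi_1(y) - P_{sh,c}\big(g_5(y, \tilde q - \Phi(y)) - g_5(y, -\Phi(y))\big)\partial_\eta\Phi_2(y), \tag{51} \]

\[ \tilde g_6(y, \tilde q) = g_6(y, \tilde q - \Phi(y)) - g_6(y, -\Phi(y)) + d\Phi_1[y]\big(F(y, \tilde q - \Phi(y)) - F(y, -\Phi(y))\big), \tag{52} \]

in which $F(y, q) = F_1(y) + F_2(y, q)$ and

\[ \tilde h(y) = -L^0\Phi(y) + N(y), \tag{53} \]

in which $N : \mathbb{R}^5 \to X^{s+1}_{sh,c}$ is defined by the formula

\[ N(y) = -(L^\mu - L^0)\Phi(y) + d\Phi[y]\big(F(y, -\Phi(y))\big) + h(y) + \begin{pmatrix} g_6(y, -\Phi(y)) \\ -P_{sh,c}(d_4(y)\Phi_1(y)) - P_{sh,c}(d_5(y)\partial_\eta\Phi_1(y)) - P_{sh,c}(d_6(y)\Phi_2(y)) \end{pmatrix} + \begin{pmatrix} 0 \\ g_4(y, -\Phi(y)) - P_{sh,c}\big[g_3(y, -\Phi(y))\partial_\eta^2\Phi_1(y) + g_5(y, -\Phi(y))\partial_\eta\Phi_2(y)\big] \end{pmatrix} \]

and $L^\mu$ is an abbreviation for $L^\varepsilon|_{\varepsilon=\mu^2}$. Let us write

\[ \Phi(y) = \sum_{k=2}^{p} \Phi^k(y), \]

where $\Phi^k(y)$ is a polynomial which is homogeneous of degree $k$ in $y$ and takes values in $X^{s+2}_{sh,c}$. We denote the space of such polynomials by $\mathcal{P}^k_{s+2}$ and equip it with the inner product and norm

\[ \langle P, Q\rangle_{\mathcal{P}^k_{s+2}} = P(\partial_y).Q(y)|_{y=0}, \qquad |P|^{s+2}_2 = \langle P, P\rangle^{1/2}_{\mathcal{P}^k_{s+2}}, \]

in which the period denotes the $X^{s+2}_{sh,c}$ inner product, and for later use we also introduce the symbol $\mathcal{Q}^k$ for the space of polynomials $\mathbb{R}^5 \to \mathbb{R}^5$ that are homogeneous of degree $k$ in $y$. We proceed by choosing $\Phi^k$ so that

\[ \mathcal{L}\Phi^k = N^k, \qquad k = 2, \ldots, p, \tag{54} \]

where

\[ (\mathcal{L}\Phi^k)(y) = L^0\Phi^k(y) \tag{55} \]

and $N^k(y)$ is the part of $N(y)$ which is homogeneous of degree $k$ in $y$; this choice of $\Phi$ ensures that the Taylor expansion of $\tilde h(y)$ does not contain any terms of order less than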


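$p$ (see Eq. (53)). Without loss of generality we can assume that $h^j(y) = 0$, $j = 2, \ldots, 4$ (see the remarks below Eq. (42)), so that $\Phi^j$ and $N^j$ also vanish for $j = 2, \ldots, 4$; certain combinatorial aspects of the following theory are however simplified by including these zero terms in the expansions. It is also important to note that $\Phi^k$ affects $N^{k+1}, \ldots, N^p$, so that $\Phi^k$ must be chosen systematically for $k = 2, \ldots, p$.

The following result shows that Eq. (54) admits a unique solution for any value of $k$ and yields an estimate for $\Phi^k$ in terms of $N^k$ which is independent of $k$. This estimate, which plays a crucial role in the following analysis, follows from the simple formula for the operator $\mathcal{L}$, which is in turn a consequence of the fact that the linearisation of the vector field on the right-hand side of Eq. (46) is the zero matrix.

Proposition 1. The operator $\mathcal{L} : \mathcal{P}^k_{s+2} \to \mathcal{P}^k_{s+1}$ defined by (55) is invertible and its operator norm

\[ |||\mathcal{L}^{-1}||| = \sup_{|\Phi^k|^{s+1}_2 = 1} |\mathcal{L}^{-1}\Phi^k|^{s+2}_2 \]

is bounded by a universal constant $\gamma$ (and in particular is independent of $k$).

Proof. Let $\{e_{0,1}, e_{0,2}\}$, $\{e_{m,1,o}, e_{m,2,o}\}$ and $\{e_{m,1,e}, e_{m,2,e}\}$ be the usual bases for respectively $E_0$, $E_{m,o}$, $m \ge 2$ and $E_{m,e}$, $m \ge 2$ and consider the orthonormal basis

\[ B^k = \{P^\alpha e_{0,1}, P^\alpha e_{0,2}, P^\alpha e_{m,1,o}, P^\alpha e_{m,2,o}, P^\alpha e_{m,1,e}, P^\alpha e_{m,2,e} : |\alpha| = k,\ m \ge 2\} \]

for $\mathcal{P}^k_s$, where $\alpha \in \mathbb{N}_0^5$ is a multi-index and

\[ P^\alpha = \frac{1}{\alpha_1!\alpha_2!\alpha_3!\alpha_4!\alpha_5!}\, \mu^{\alpha_1} z_{1,o}^{\alpha_2} z_{1,e}^{\alpha_3} z_{2,o}^{\alpha_4} z_{2,e}^{\alpha_5}. \]

Observe that

\[ \mathcal{P}^k_s = \bigoplus_{|\alpha|=k} \mathcal{P}_0^{\alpha,k} \oplus \bigoplus_{|\alpha|=k,\ m\ge 2} \mathcal{P}_{m,o}^{\alpha,k} \oplus \bigoplus_{|\alpha|=k,\ m\ge 2} \mathcal{P}_{m,e}^{\alpha,k}, \tag{56} \]

in which $\mathcal{P}_0^{\alpha,k} = \mathrm{span}\{P^\alpha e_{0,1}, P^\alpha e_{0,2}\}$, $\mathcal{P}_{m,o}^{\alpha,k} = \mathrm{span}\{P^\alpha e_{m,1,o}, P^\alpha e_{m,2,o}\}$ and $\mathcal{P}_{m,e}^{\alpha,k} = \mathrm{span}\{P^\alpha e_{m,1,e}, P^\alpha e_{m,2,e}\}$ are invariant under $\mathcal{L}$; furthermore $\mathcal{L}_0^{\alpha,k} = \mathcal{L}|_{\mathcal{P}_0^{\alpha,k}}$ admits an inverse whose matrix with respect to the basis $\{P^\alpha e_{0,1}, P^\alpha e_{0,2}\}$ for $\mathcal{P}_0^{\alpha,k}$ is

\[ (\mathcal{L}_0^{\alpha,k})^{-1} = \begin{pmatrix} 0 & 1/\lambda_{0,0}^2 \\ 1 & 0 \end{pmatrix} \]

and $\mathcal{L}_{m,o}^{\alpha,k} = \mathcal{L}|_{\mathcal{P}_{m,o}^{\alpha,k}}$, $\mathcal{L}_{m,e}^{\alpha,k} = \mathcal{L}|_{\mathcal{P}_{m,e}^{\alpha,k}}$ admit inverses whose matrices with respect to the bases $\{P^\alpha e_{m,1,o}, P^\alpha e_{m,2,o}\}$ for $\mathcal{P}_{m,o}^{\alpha,k}$ and $\{P^\alpha e_{m,1,e}, P^\alpha e_{m,2,e}\}$ for $\mathcal{P}_{m,e}^{\alpha,k}$ are

\[ (\mathcal{L}_{m,o}^{\alpha,k})^{-1} = \begin{pmatrix} 0 & -1/\omega_{m,0}^2 \\ 1 & 0 \end{pmatrix}, \qquad (\mathcal{L}_{m,e}^{\alpha,k})^{-1} = \begin{pmatrix} 0 & -1/\omega_{m,0}^2 \\ 1 & 0 \end{pmatrix}. \]

Let us write

\[ \Phi^k = \sum_{|\alpha|=k} \Phi_0^{\alpha,k} + \sum_{|\alpha|=k,\ m\ge 2} \Phi_{m,o}^{\alpha,k} + \sum_{|\alpha|=k,\ m\ge 2} \Phi_{m,e}^{\alpha,k} \]

in accordance with the orthogonal decomposition (56), so that

\[ (|\Phi^k|^s_2)^2 = \sum_{|\alpha|=k} |\Phi_0^{\alpha,k}|^2 + \sum_{|\alpha|=k,\ m\ge 2} m^{2(s+1)}\big(|(\Phi_{m,o}^{\alpha,k})_1|^2 + |(\Phi_{m,e}^{\alpha,k})_1|^2\big) + \sum_{|\alpha|=k,\ m\ge 2} m^{2s}\big(|(\Phi_{m,o}^{\alpha,k})_2|^2 + |(\Phi_{m,e}^{\alpha,k})_2|^2\big), \]

where the symbols $(v)_1$ and $(v)_2$ denote the first and second components of a vector $v \in X^s$. Since

\[ \mathcal{L}^{-1}\Phi^k = \sum_{|\alpha|=k} (\mathcal{L}_0^{\alpha,k})^{-1}\Phi_0^{\alpha,k} + \sum_{|\alpha|=k,\ m\ge 2} (\mathcal{L}_{m,o}^{\alpha,k})^{-1}\Phi_{m,o}^{\alpha,k} + \sum_{|\alpha|=k,\ m\ge 2} (\mathcal{L}_{m,e}^{\alpha,k})^{-1}\Phi_{m,e}^{\alpha,k}, \]

it follows that

\[ \begin{aligned} (|\mathcal{L}^{-1}\Phi^k|^{s+2}_2)^2 &= \sum_{|\alpha|=k} |(\mathcal{L}_0^{\alpha,k})^{-1}\Phi_0^{\alpha,k}|^2 + \sum_{|\alpha|=k,\ m\ge 2} m^{2(s+3)}\big\{|((\mathcal{L}_{m,o}^{\alpha,k})^{-1}\Phi_{m,o}^{\alpha,k})_1|^2 + |((\mathcal{L}_{m,e}^{\alpha,k})^{-1}\Phi_{m,e}^{\alpha,k})_1|^2\big\} \\ &\quad + \sum_{|\alpha|=k,\ m\ge 2} m^{2(s+2)}\big\{|((\mathcal{L}_{m,o}^{\alpha,k})^{-1}\Phi_{m,o}^{\alpha,k})_2|^2 + |((\mathcal{L}_{m,e}^{\alpha,k})^{-1}\Phi_{m,e}^{\alpha,k})_2|^2\big\} \\ &= \sum_{|\alpha|=k} \Big(\frac{1}{\lambda_{0,0}^4}|(\Phi_0^{\alpha,k})_2|^2 + |(\Phi_0^{\alpha,k})_1|^2\Big) + \sum_{|\alpha|=k,\ m\ge 2} \frac{m^{2(s+3)}}{\omega_{m,0}^4}\big(|(\Phi_{m,o}^{\alpha,k})_2|^2 + |(\Phi_{m,e}^{\alpha,k})_2|^2\big) \\ &\quad + \sum_{|\alpha|=k,\ m\ge 2} m^{2(s+2)}\big(|(\Phi_{m,o}^{\alpha,k})_1|^2 + |(\Phi_{m,e}^{\alpha,k})_1|^2\big) \\ &\le \gamma^2\Big\{\sum_{|\alpha|=k} |\Phi_0^{\alpha,k}|^2 + \sum_{|\alpha|=k,\ m\ge 2} m^{2(s+1)}\big(|(\Phi_{m,o}^{\alpha,k})_2|^2 + |(\Phi_{m,e}^{\alpha,k})_2|^2\big) + \sum_{|\alpha|=k,\ m\ge 2} m^{2(s+2)}\big(|(\Phi_{m,o}^{\alpha,k})_1|^2 + |(\Phi_{m,e}^{\alpha,k})_1|^2\big)\Big\} \\ &= \gamma^2(|\Phi^k|^{s+1}_2)^2, \end{aligned} \]

in which $\gamma$ is a constant chosen so that $\lambda_{0,0}^2 \ge 1/\gamma$ and $\omega_{m,0}^2 \ge m^2/\gamma$ for $m \ge 2$.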
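The two ingredients of this proof, the explicit $2 \times 2$ inverses and the mode-by-mode gain of one derivative, can be illustrated with exact arithmetic. In the Python sketch below the function `omega_sq` and the constant `GAMMA` are hypothetical stand-ins (the article's dispersion relation for $\omega_{m,0}$ is not reproduced here); they are chosen only so that $\omega_{m,0}^2 \ge m^2/\gamma$ holds for $m \ge 2$ with $\gamma \ge 1$. The weights in `weighted_norm_sq` follow the formula for $(|\Phi^k|^s_2)^2$ above, restricted to a single mode $m$.

```python
from fractions import Fraction

# Assumed constant with omega_sq(m) >= m**2 / GAMMA for all m >= 2.
GAMMA = Fraction(4, 3)


def omega_sq(m):
    """Hypothetical stand-in for omega_{m,0}^2 (not taken from the article)."""
    return Fraction(m * m - 1)


def block(w2):
    """Matrix of (q1, q2) -> (q2, -w2 * q1) on a single mode."""
    return [[Fraction(0), Fraction(1)], [-w2, Fraction(0)]]


def block_inverse(w2):
    """Claimed inverse, as in the proof of Proposition 1."""
    return [[Fraction(0), -1 / w2], [Fraction(1), Fraction(0)]]


def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply2(mat, v):
    return [mat[0][0] * v[0] + mat[0][1] * v[1],
            mat[1][0] * v[0] + mat[1][1] * v[1]]


def weighted_norm_sq(v, m, s):
    """Single-mode contribution m^{2(s+1)} |v_1|^2 + m^{2s} |v_2|^2."""
    return m ** (2 * (s + 1)) * v[0] ** 2 + m ** (2 * s) * v[1] ** 2


def bound_holds(m, s, v):
    """Check |L^{-1} v|_{s+2} <= GAMMA * |v|_{s+1} on mode m (squared form)."""
    w = apply2(block_inverse(omega_sq(m)), v)
    return weighted_norm_sq(w, m, s + 2) <= GAMMA ** 2 * weighted_norm_sq(v, m, s + 1)


IDENTITY = [[1, 0], [0, 1]]
INVERSES_OK = all(
    matmul2(block(omega_sq(m)), block_inverse(omega_sq(m))) == IDENTITY
    for m in range(2, 20)
)
BOUNDS_OK = all(
    bound_holds(m, s, v)
    for m in range(2, 40)
    for s in (0, 1, 3)
    for v in ([1, 0], [0, 1], [3, -2])
)
print(INVERSES_OK, BOUNDS_OK)
```

The inverse maps the second component into the first with a factor $1/\omega_{m,0}^2 \sim m^{-2}$, which is exactly what compensates the extra power of $m^2$ carried by the first component in the norm and makes the bound independent of both $m$ and $k$.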


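In Sects. 4.2–4.4 below we derive precise estimates for $\Phi$ and the transformed nonlinearities defined in Eqs. (49)–(52). The main issue here is careful book-keeping; we therefore calculate the constants in order of magnitude estimates explicitly and reserve the symbol $c$ to denote a constant which does not depend upon the order $p$ of the normal form or the maximum size $\delta$ of $|y|$.

4.2. Estimates for the transformation. The next task is to estimate the size of $\Phi$, and for this purpose we use the norms $|\Phi^m|^{s+2}_2$, $\|\Phi^m(y)\|_{X^{s+2}_{sh,c}}$ and

\[ |\Phi^m|^{s+2}_0 = \sup_{y \in \mathbb{R}^5} \frac{\|\Phi^m(y)\|_{X^{s+2}_{sh,c}}}{|y|^m} \]

for $m = 2, \ldots, p$. Our estimates for these quantities are used in Sect. 4.3 below to show that $\tilde h(y)$ is exponentially small with respect to $y$.

We begin by estimating $|N^m|^{s+1}_2$, from which an estimate for $|\Phi^m|^{s+2}_2$ is obtained using Proposition 1. A straightforward calculation shows that

\[ \begin{aligned} N_1^m(y) &= h_1^m[\{y\}^{(m)}] + \sum_{2\le q\le m}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m}}^{q} g_6^{i,q-i}[\{y\}^{(i)}, -\Phi^{p_1}, \ldots, -\Phi^{p_{q-i}}] \\ &\quad + \sum_{2\le k\le m-1} d\Phi_1^k[y]\Bigg(\sum_{2\le q\le m-k+1}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m-k+1}}^{q} F^{i,q-i}[\{y\}^{(i)}, -\Phi^{p_1}, \ldots, -\Phi^{p_{q-i}}]\Bigg), \end{aligned} \tag{57} \]

\[ \begin{aligned} N_2^m(y) &= h_2^m[\{y\}^{(m)}] + \sum_{2\le k\le m-1} d\Phi_2^k[y]\Bigg(\sum_{2\le q\le m-k+1}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m-k+1}}^{q} F^{i,q-i}[\{y\}^{(i)}, -\Phi^{p_1}, \ldots, -\Phi^{p_{q-i}}]\Bigg) \\ &\quad - \sum_{1\le q\le m-1}\ \sum_{\substack{i=0 \\ p_1+\ldots+p_{q-i}+i+r=m \\ r>0}}^{q} P_{sh,c}\big(g_3^{i,q-i}[\{y\}^{(i)}, -\Phi^{p_1}, \ldots, -\Phi^{p_{q-i}}]\partial_\eta^2\Phi_1^r\big) \\ &\quad + \sum_{2\le q\le m}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m}}^{q} g_4^{i,q-i}[\{y\}^{(i)}, -\Phi^{p_1}, \ldots, -\Phi^{p_{q-i}}] \\ &\quad - \sum_{1\le q\le m-1}\ \sum_{\substack{i=0 \\ p_1+\ldots+p_{q-i}+i+r=m \\ r>0}}^{q} P_{sh,c}\big(g_5^{i,q-i}[\{y\}^{(i)}, -\Phi^{p_1}, \ldots, -\Phi^{p_{q-i}}]\partial_\eta\Phi_2^r\big) \\ &\quad + \sum_{2\le k\le m-1} c_3^{m-k}\mu^{m-k}k_0^2\partial_\eta^2\Phi_1^k + \sum_{2\le k\le m-1} c_4^{m-k}\mu^{m-k}\Phi_1^k \\ &\quad - P_{sh,c}(d_4(y)\Phi_1^{m-2}) - P_{sh,c}(d_5(y)\partial_\eta\Phi_1^{m-2}) - P_{sh,c}(d_6(y)\Phi_2^{m-2}), \end{aligned} \tag{58} \]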


in which we use the notation

\[ f^n[y_1, \ldots, y_n] = \frac{1}{n!}\, d^n f[0](y_1, \ldots, y_n), \]

\[ f^{n_1,n_2}[y_1, \ldots, y_{n_1}, w_1, \ldots, w_{n_2}] = \frac{1}{(n_1 + n_2)!}\, d_{1,2}^{n_1,n_2} f[0](y_1, \ldots, y_{n_1}, w_1, \ldots, w_{n_2}) \]

for derivatives of functions of one and two variables and write

\[ c_3^\mu = \sum_{i=0}^{\infty} c_3^i\mu^i, \qquad c_4^\mu = \sum_{i=0}^{\infty} c_4^i\mu^i. \]

An estimate for $|N^m|^{s+1}_2$ can be obtained from formulae (57), (58) using the following lemma, whose proof is readily deduced from that of Lemmata 2.10 and 2.11 of Iooss & Lombardi [7].

Lemma 4. Define

\[ |P|^s_{2,m} = \frac{1}{\sqrt{m!}}|P|^s_2, \quad P \in \mathcal{P}^m_s, \qquad |P|_{2,m} = \frac{1}{\sqrt{m!}}|P|_2, \quad P \in \mathcal{Q}^m. \]

(i) The estimates

\[ |\Phi^k|^s_0 \le |\Phi^k|^s_{2,k} \le \sqrt{5}^{\,k}\, |\Phi^k|^s_0 \]

hold for each $\Phi^k \in \mathcal{P}^k_s$.

(ii) Suppose that $q \in \mathbb{N}$, $i \in \{0, \ldots, q\}$, $\{p_\ell\}_{1\le\ell\le q-i} \subset \mathbb{N}$ and that $R_q$ is a bounded, $q$-linear operator $(\mathbb{R}^5 \times X^s_{sh,c})^q \to X^s_{sh,c}$ with operator norm $|||R_q|||$. For each choice of $\Phi^{p_\ell} \in \mathcal{P}^{p_\ell}_s$, $\ell = 1, \ldots, q-i$ the polynomial $R_q[\{y\}^i, \Phi^{p_1}, \ldots, \Phi^{p_{q-i}}]$ lies in $\mathcal{P}^n_s$ with $n = p_1 + \ldots + p_{q-i} + i$ and satisfies the estimate

\[ |R_q[\{y\}^i, \Phi^{p_1}, \ldots, \Phi^{p_{q-i}}]|^s_{2,n} \le |||R_q|||\, 5^i\, |\Phi^{p_1}|^s_{2,p_1} \cdots |\Phi^{p_{q-i}}|^s_{2,p_{q-i}}. \]

The analogous result holds when $\mathcal{P}^m_s$ is replaced by $\mathcal{Q}^m$.

(iii) Suppose that $p \in \mathbb{N}$, $k \in \mathbb{N}_0$ and $\Phi^k \in \mathcal{P}^k_s$, $N^p \in \mathcal{Q}^p$. The polynomial $d\Phi^k[y](N^p)$ lies in $\mathcal{P}^n_s$ with $n = k + p - 1$ and satisfies the estimate

\[ |d\Phi^k[y](N^p)|^s_{2,n} \le \sqrt{k^2 + 4k}\, |\Phi^k|^s_{2,k}\, |N^p|_{2,p}. \]

Because $g_6 : \mathbb{R}^5 \times X^{s+1}_{sh,c} \to H^{s+2}_{per}(\mathbb{R}^2)$ is analytic, it satisfies the inequality

\[ \|g_6^{n_1,n_2}[y_1, \ldots, y_{n_1}, q_1, \ldots, q_{n_2}]\|_{s+2} \le \frac{a}{\rho^{n_1+n_2}}\, |y_1| \cdots |y_{n_1}|\, \|q_1\|_{X^{s+1}_{sh,c}} \cdots \|q_{n_2}\|_{X^{s+1}_{sh,c}} \]

for each $n_1, n_2 \in \mathbb{N}_0$, where $a > 1$ and $\rho < 1$ are universal constants, and the analogous estimates apply to $F : \mathbb{R}^5 \times X^{s+1}_{sh,c} \to \mathbb{R}^5$ and $h : \mathbb{R}^5 \to X^{s+1}_{sh,c}$; for notational simplicity later we estimate

\[ \|g_j^{n_1,n_2}[y_1, \ldots, y_{n_1}, q_1, \ldots, q_{n_2}]\|_{s+1} \le \frac{a}{3\rho^{n_1+n_2}}\, |y_1| \cdots |y_{n_1}|\, \|q_1\|_{X^{s+1}_{sh,c}} \cdots \|q_{n_2}\|_{X^{s+1}_{sh,c}}, \qquad j = 3, 4, 5 \]

and

\[ |c_j^n| \le \frac{a}{2\rho^n}, \quad j = 3, 4, \qquad \|d_j(y)\|_{s+1} \le \frac{a}{3\rho^2}|y|^2, \quad j = 4, 5, 6. \]

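Using these estimates together with Proposition 1 and Lemma 4, we find from Eqs. (57), (58) that

\[ \begin{aligned} \varphi_m &\le \gamma\Bigg[\frac{a}{\rho^m}\sqrt{5}^{\,m} + \sum_{2\le k\le m-1} (k^2 + 4k)^{1/2}\varphi_k \sum_{2\le q\le m-k+1}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m-k+1}}^{q} \frac{a}{\rho^q}\, 5^i\varphi_{p_1}\cdots\varphi_{p_{q-i}} \\ &\qquad + \frac{2}{3}\sum_{2\le q\le m-1}\ \sum_{\substack{i=0 \\ p_1+\ldots+p_{q-i}+i+r=m \\ r>0}}^{q} \frac{a}{\rho^q}\, 5^i\varphi_{p_1}\cdots\varphi_{p_{q-i}}\varphi_r + \frac{1}{3}\sum_{2\le q\le m}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m}}^{q} \frac{a}{\rho^q}\, 5^i\varphi_{p_1}\cdots\varphi_{p_{q-i}} \\ &\qquad + \sum_{2\le k\le m-1} \frac{a}{\rho^{m-k}}\sqrt{5}^{\,m-k}\varphi_k + \frac{5a}{\rho^2}\varphi_{m-2}\Bigg] \\ &\le \frac{a\gamma}{\rho^m}\sqrt{5}^{\,m} + \sum_{2\le k\le m-1} (k^2 + 4k)^{1/2}\varphi_k \sum_{2\le q\le m-k+1}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m-k+1}}^{q} \frac{a\gamma}{\rho^q}\, 5^i\varphi_{p_1}\cdots\varphi_{p_{q-i}} \\ &\qquad + 3\sum_{2\le q\le m}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m}}^{q} \frac{a\gamma}{\rho^q}\, 5^i\varphi_{p_1}\cdots\varphi_{p_{q-i}}, \end{aligned} \tag{59} \]

where $\varphi_m = |\Phi^m|^{s+2}_{2,m}$. The above inequalities are converted into an estimate for $\varphi_m$ in Propositions 2 and 3 below, the first of which is proved by straightforward mathematical induction; we note that

\[ \beta_2 = 4, \qquad \varphi_2 \le 4\sqrt{5}\,\frac{\sqrt{5}a\gamma}{\rho^2} = \sqrt{5}\,\frac{\sqrt{5}a\gamma}{\rho^2}\,\beta_2, \]

so that the result holds for $m = 2$, and proceed inductively using (59).

Proposition 2. Consider the sequence $\{\beta_n\}$ defined recursively by the formulae $\beta_1 = 1$,

\[ \beta_m = \Big(\frac{\rho}{a\gamma}\Big)^{m-2} + 3\sum_{2\le q\le m}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m}}^{q} \Big(\frac{\rho}{a\gamma}\Big)^{q-2}\beta_{p_1}\cdots\beta_{p_{q-i}} + 5\sum_{2\le k\le m-1} k\beta_k \sum_{2\le q\le m-k+1}\ \sum_{\substack{i=0 \\ i+p_1+\ldots+p_{q-i}=m-k+1}}^{q} \Big(\frac{\rho}{a\gamma}\Big)^{q-2}\beta_{p_1}\cdots\beta_{p_{q-i}} \]

for $m \ge 2$. The quantity $\varphi_m$ satisfies the estimate

\[ \varphi_m \le \sqrt{5}\Big(\frac{\sqrt{5}a\gamma}{\rho^2}\Big)^{m-1}\beta_m, \qquad m \ge 2. \]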

Proposition 3. Consider the sequence {αn } defined by the formulae α1 = 1, αm = m−2 (m − 2)!, m ≥ 2, where ≥ 1 is a constant. The estimate βm ≤ 2m αm , m ≥ 1 holds for ≥ 12 +

√

(60)

269 + 13ρ/(aγ ).

Proof. This result is also established using mathematical induction. We note that β1 = 1 < 2 = 2α1 , β2 = 4 = 22 α2 , so that (60) holds for m = 1 and m = 2, and proceed inductively by choosing m ≥ 3 and supposing that βk ≤ 2k αk for 1 ≤ k < m. Observe that q

β p1 . . . β pq−i =

β p1 . . . β pq

p1 +...+ pq =m p j ≥1

i=0 i+ p1 +...+ pq−i =m p j ≥2

because β1 = 1; the recursion relation for βm can therefore be rewritten as βm =

ρ aγ

+5

m−2

q

+3

2≤q≤m i=0 p1 +...+ pq =m p j ≥1

kβk

2≤k≤m−1

q

ρ aγ

2≤q≤m−k+1 i=0 p1 +...+ pq =m−k+1 p j ≥1

q−2 β p1 . . . β pq

ρ aγ

q−2 β p1 . . . β pq

for m ≥ 2, and it follows that βm ≤ 2m [1m + 2m + 3m + 4m ] +

ρ aγ

m−2 ,


where 1m = 3

3≤q≤m p1 +... pq =m p j ≥1

2m = 5

3m = 3

α p1 . . . α pq ,

3≤q≤m−k+1 p1 +...+ pq =m−k+1 p j ≥1

ρ aγ

q−2 α p1 . . . α pq ,

αk αm−k ,

1≤k≤m−1

4m = 5

q−2

kαk

2≤k≤m−1

ρ aγ

⎛ kαk ⎝

2≤k≤m−1

⎞

α j αm−k+1− j ⎠.

1≤ j≤m−k

To estimate the quantities 1m , …, 4m we use the inequalities

αm , q−2

kαk αm−k+1 ≤

5 αm , m ≥ 3, 2

(62)

2 αm , m ≥ 3,

(63)

p1 +... pq =m p j ≥1

2

α p1 . . . α pq ≤

2≤k≤m−1

αk αm−k ≤

1≤k≤m−1

3 ≤ q ≤ m,

(61)

which were established by Iooss & Lombardi [7, Lemma 2.13]. Inequality (61) shows that ρ ρ q−2 1 aγ 1 m ≤ 6 αm ≤ 6 αm ≤ αm ρ aγ 1 − aγ 2 3≤q≤m

whenever ρ/(aγ ) ≤ 1/13, inequalities (61)–(63) similarly yield 2m ≤

5 2

kαk αm−k+1 ≤

2≤k≤m−1

25 αm , 42

3m ≤

and it follows from (62), (63) that 4m ≤

10

kαk αm−k+1 ≤

2≤k≤m−1

25 αm . 2

Finally, note that

ρ aγ

m−2

=

ρ aγ

m−2 m−2 ≤

1 m 2 αm , 4

6 αm


and choosing ≥ 12 +


√

269, one concludes that

125 3 6 2m αm ≤ 2m αm . + + βm ≤ 4 42

√ In keeping with Proposition 3 we fix = 12 + 269 + 13ρ/(aγ ) for the remainder of the article. The proposition implies that 20aγ φm ≤ ρ2

√ m−2 2 5aγ (m − 2)!, m = 2, 3, . . . , ρ2

(64)

and by imposing a mutual constraint on the order p of the normal form and the maximum size δ of |y|, we can use (64) to obtain another estimate for (cf. Iooss & Lombardi [7, Lemma 2.14]). Proposition 4. Suppose that δ > 0 and p ≥ 2 satisfy ρ2 δp ≤ √ . 4 5aγ The estimates

! ! ! ! ! k ! ! (y)! ! ! ! !2≤k≤ p

s+2 Xsh,c

√ 5δ , ≤

(65)

""" """ """ """ 10 k """ d [y]"""""" ≤ """

hold for every y ∈ R5 such that |y| ≤ δ. Here ||| # s+2 . norm of 2≤k≤ p dk [y] : R5 → Xsh,c

2≤k≤ p

# 2≤k≤ p

dk [y]||| denotes the operator

4.3. Estimates for the transformed nonlinearities. In this section we use the above estimates for $\Phi = \sum_{k=2}^{p}\Phi^k$ with an optimal choice of $p$ to derive estimates for the terms $\tilde F_2$, $\tilde g_j$, $j = 3, \ldots, 6$ and $\tilde h$ appearing in Eqs. (47), (48); in particular we show that $\|\tilde h(y)\|_{X^{s+1}_{sh,c}}$ is exponentially small with respect to $y$. It follows from the fact that $\tilde h^m$ is identically zero for $m \le p$ that

h˜ 2 (y) =

q c4 µq k1 +

h 2 [{y}(q) ] + q

p≤q

p+1≤q

⎛

+

q c3 µq k02 ∂η2 ⎝ k1 ⎠ + p≤q

−

1≤q≤ p−1 i=0 i+ p1 +...+ pq−i +r ≥ p+1 r >0

q

c3 µq k02 ∂η2 r1

1≤q≤ p−1 r ≥ p+1−q

2≤k≤ p q

q

c4 µq r1

1≤q≤ p−1 r ≥ p+1−q

2≤k≤ p

⎞

i,q−i

g3

[{y}(i) , − p1 , . . . , − pq−i ]∂η2 r1


q

−

i,q−i

g5

[{y}(i) , − p1 , . . . , − pq−i ]∂η r2

1≤q≤ p−1 i=0 i+ p1 +...+ pq−i +r ≥ p+1 r >0 q i,q−i + g4 [{y}(i) , − p1 , . . . , − pq−i ] 2≤q≤ p i=0 i+ p1 +... pq−i ≥ p+1

⎡ ⎧ ⎫(q−i) ⎤ ⎛ ⎞ q ⎨ ⎬ ⎥ 2⎝ i,q−i ⎢ (i) g3 k k1 ⎠ − ⎣{y} , − ⎦ ∂η ⎩ ⎭ p≤q i=0

−

q i,q−i ⎢ (i) g5 ⎣{y} , − p≤q i=0

+

2≤k≤ p

⎡

⎧ ⎨ ⎩

2≤k≤ p

⎡

q i,q−i ⎢ (i) g4 ⎣{y} , − p+1≤q i=0

⎛ ⎜ ⎜ dk2 [y] ⎜ ⎝

+

⎧ ⎨ ⎩

2≤k≤ p p+1≤n≤ p+k−1

2≤k≤ p

⎞

+

q

⎡

⎢ F i,q−i ⎣{y}(i) , −

p+1≤q i=0

q

Psh,c (d4 (y)1 ) +

p−1≤q≤p

⎟ ⎟ F i,q−i [{y}(i) , − p1 , . . . , − pq−i ]⎟ ⎠

F i,q−i [{y}(i) , − p1 , . . . , − pq−i ]

2≤q≤ p i=0 i+ p1 +... pq−i ≥ p+1

k=2

2≤k≤ p

2≤q≤n−k+1 i=0 i+ p1 +... pq−i =n−k+1

⎛ ⎞ p q ⎜ ⎜ +⎝ dk2 [y]⎠⎜ ⎝

+

⎫(q−i) ⎤ ⎬ ⎥ k ⎦ ⎭

q

⎛

+

2≤k≤ p

⎫(q−i) ⎤ ⎛ ⎞ ⎬ ⎥ ⎝ k k 2 ⎠ ⎦ ∂η ⎭

⎧ ⎨ ⎩

2≤k≤ p

⎫(q−i) ⎤⎞ ⎬ ⎥⎟ k ⎦⎠ ⎭

q

Psh,c (d5 (y)∂η 1 )

p−1≤q≤ p q Psh,c (d6 (y)2 ),

(66)

p−1≤q≤ p

together with a similar expression for $\tilde h_1(y)$. Suppose that $|y| \le \delta$ and $p$, $\delta$ satisfy the constraint (65). Using Proposition 4 and the rule

\[ \|\Phi^k(y)\|_{X^{s+2}_{sh,c}} \le |\Phi^k|^{s+2}_0\, |y|^k \le \varphi_k |y|^k, \]

one finds from the above formula that h˜ 2 (y)s+1 ≤

p+1≤q

a √ q a √ q ( 5δ) + ( 5δ) ρq ρq p≤q

√ 5δ +

1≤q≤ p−1 r ≥ p+1−q

a √ q 5 φr δ q+r ρq


+

+

2 3

1 10 + 3


q

aδ n i 5 φ p1 . . . φ pq−i ρq

2≤q≤ p i=0 n=i+ p1 +...+ pq−i ≥ p+1 q n

1≤q≤ p−1 i=0 n=i+ p1 +...+ pq−i +r ≥ p+1 r >0

aδ ρq

5i φ p1 . . . φ pq−i φr

√ q−i

q a √ i 5δ ( 5δ) ρq p+1≤q i=0 √ q−i √ q 5δ 5δ 2 a √ i + ( 5δ) q 3 p≤q ρ

1 10 + + 3

i=0

+

√

5kφk

2≤k≤ p p+1≤n≤ p+k−1

q

2≤q≤n−k+1 i=0 i+ p1 +...+ pq−i =n−k+1

aδ n i 5 φ p1 . . . φ pq−i ρq

5a q φ ρ2 1 p−1≤q≤ p

10 10 1 p + 3 + 2p + 3p , ≤ 3+ +

where p

1 =

p+1≤q

p 2

=

√ q q a (q + 1) 5 δ , ρq

2≤q≤ p n= p1 +...+ pq ≥ p+1 p j ≥1 p

3 =

(67)

2≤k≤ p p+1≤n≤ p+k−1

aδ n φ p . . . φ pq , ρq 1

√ 5kφk

2≤q≤n−k+1 i+ p1 +...+ pq =n−k+1 p j ≥1

aδ n i 5 φ p1 . . . φ pq , ρq

√ 5. A similar calculation shows that

10 10 1p + 1 + 2p + 3p . (68) h˜ 1 (y)s+2 ≤ 2 + Proposition 5. Suppose that δ > 0 and p ≥ 2 satisfy the stronger mutual constraint

and we have defined φ1 =

ρ2 . δp ≤ √ 4 5eaγ The estimate

˜ h(y)X s+1 ≤ c (Cδ) p+1 p! + sh,c

holds for every y ∈ R5 such that |y| ≤ δ.

1 e p+1 p 2

√ 4 5aγ , C= ρ2


M. D. Groves, G. Schneider p

p

p

Proof. We proceed by estimating the quantities 1 , 2 , 3 , making use of the inequalities 4ar 2 1 1 rq (Cδ)n ( p1 − 2)! . . . ( pq − 2)! ≤ p+1 . 2 . , (69) e p 1−r p +...+ p ≤n 2≤q≤ p

n≥ p+1

q

1

1≤ p j ≤ p

where r = ρ/(aγ ), and k(k − 2)!(Cδ) p+1 2≤k≤ p

(Cδ)n− p−1 (n − k − 1)! ≤ 2(Cδ) p+1 p!

(70)

p+1≤n≤ p+k−1

which were established by Iooss & Lombardi [7, pp. 30–32]. Observe that q + 1 ≤ 2q and √ 2 5δ r r ≤ ≤ ρ 2p 4 for p ≥ 2, so that √ p+1 r 2 5δ 1p ≤ a ρ 4

4a = 4−r

q

q≥0

√ p+1 √ p+1 5δ 8a 2 5δ ≤ ρ 7 ρ

for r ≤ 1/2. The quantity 2p is estimated using the calculation 2p

≤

2≤q≤ p n≥ p+1 p1 +...+ pq ≤n 1≤ p j ≤ p

aδ n ρq

aδ n √ ≤ (2 5)q ρq 2≤q≤ p n≥ p+1 ≤a rq

2≤q≤ p

20aγ ρ2

n−2q

q √ 2 5aγ α p1 . . . α pq ρ2

√ n−q 2 5aγ ( p1 − 2)! . . . ( pq − 2)! ρ2 (Cδ)n ( p1 − 2)! . . . ( pq − 2)!

n≥ p+1 p1 +...+ pq ≤n 1≤ p j ≤ p

4ar 2

1 1 . . e p+1 p 2 1 − r 2a ≤ 2 p+1 p e ≤

for r ≤ 1/2, where the fourth line follows from the third by (69). Finally, we find from (61) that q−2 ρ α1 . . . α pq aγ 2≤q≤n−k+1 i+ p1 +...+ pq =n−k+1 p j ≥1

≤

2≤q≤n−k+1

ρ 2 aγ

q−2 αn−k+1 ≤

2 αn−k+1 ≤ 4αn−k+1 1−r


for r ≤ 1/2, and combining this estimate with (70) yields p 3

≤

√ 20aγ 5k ρ2

2≤k≤ p p+1≤n≤ p+k−1

×

√ k−2 2 5aγ αk ρ2

2≤q≤n−k+1 i+ p1 +...+ pq =n−k+1 p j ≥1

1 = γ

aδ n ρq

≤

=

40ρ 2 aγ 2 3

n

2≤k≤ p p+1≤n≤ p+k−1

≤

aγ 2 3

ρ aγ

q−2 α1 . . . α pq

√ n−1 2 5aγ δ kαk αn−k+1 ρ2

2≤k≤ p p+1≤n≤ p+k−1

2≤k≤ p

n−k+1−2q

q √ 2 5aγ α1 . . . α pq ρ2

2≤q≤n−k+1 i+ p1 +...+ pq =n−k+1 p j ≥q

n

(Cδ)n k(k − 2)!(n − k − 1)!

40 k(k − 2)!(Cδ) p+1 aγ 2 3 80ρ 2

20aγ ρ2

√ k−2 2 5aγ 20 5δ kαk ρ2 √

2≤k≤ p p+1≤n≤ p+k−1

√ 80 5 ≤ γ

(Cδ)n− p−1 (n − k − 1)!

p+1≤n≤ p+k−1

(Cδ) p+1 p!.

The result follows from inequalities (68), (67) and the above estimates for 1p , 2p ,

3p .

Remark 2. Inspecting the proof of the above proposition, one finds that it can be proved under the weaker hypothesis $\delta p \le \rho^2/(2\sqrt{5}\,ea\gamma)$ and with $C$ replaced by $2\sqrt{5}\,a\gamma/\rho^2$. The stronger constraint and larger value of $C$ are however required later (see Proposition 9) and we introduce them from the outset for notational simplicity.

The final step in the derivation of our estimate for $\|\tilde h\|_{X^{s+1}_{sh,c}}$ is given by the following proposition, which was proved by Iooss & Lombardi [7, Lemma 2.18].

Proposition 6. Choose $\vartheta > 0$. The function $f_\vartheta : \mathbb{N} \to \mathbb{R}$ defined by $f_\vartheta(p) = \vartheta^{p+1} p!$


satisfies fϑ

1 ϑe

=m

ϑ −2/ϑe e2 p! e , m = sup p+1/2 − p , e e p∈N p

where [·] denotes the integer part of a real number. Define popt ∈ [2, (eCδ)−1 ] by popt = [(eCδ)−1 ]. Applying Proposition 5, we find that 1 popt +1 ˜ popt ! + p +1 2 h(y) s+1 ≤ c (Cδ) Xsh,c e opt popt

Cδ −2/eCδ 2 −1/eCδ e + (2eCδ) e ≤c m e

27 + 4e2 δ 2 e−1/eCδ , (71) ≤ cC 2 m 8e in which the second line follows from the first by Proposition 6 with ϑ = Cδ and the inequalities 1 ≤ popt + 1, eCδ

1 ≤ 2eCδ. popt
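The effect of evaluating the bound of Proposition 5 at the index $p_{opt}$ can be checked numerically. The following Python sketch is an illustration only and is not part of the original argument: the function names are hypothetical, the value $\vartheta = 0.01$ used in the usage note is an arbitrary choice standing in for $C\delta$, and logarithms are used throughout to avoid underflow.

```python
import math


def log_f(theta: float, p: int) -> float:
    """Return log(theta**(p+1) * p!), computed with lgamma to avoid overflow."""
    return (p + 1) * math.log(theta) + math.lgamma(p + 1)


def p_opt(theta: float) -> int:
    """Integer part of 1/(e*theta), the index chosen after Proposition 6."""
    return math.floor(1.0 / (math.e * theta))


def log_remainder_bracket(theta: float, p: int) -> float:
    """Return log( theta**(p+1) p! + 1/(e**(p+1) p**2) ) for p >= 1.

    This is the logarithm of the bracket in Proposition 5 with C*delta = theta.
    """
    a = log_f(theta, p)
    b = -(p + 1) - 2.0 * math.log(p)
    hi, lo = max(a, b), min(a, b)
    # log(exp(a) + exp(b)) evaluated stably
    return hi + math.log1p(math.exp(lo - hi))
```

For instance, with `theta = 0.01` one can compare `log_remainder_bracket(theta, p_opt(theta))` with `-1.0 / (math.e * theta)`, the exponent appearing in (71); under these assumptions the bracket is smaller than $e^{-1/e\vartheta}$.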

Let us now return to the original notation by writing Eqs. (46)–(48) as
\[ \partial_\xi z = \tilde F_1^\mu(z) + \tilde F_2^\mu(z,\tilde q), \tag{72} \]
\[ \partial_\xi \tilde q_1 = \tilde q_2 + \tilde g_6^\mu(z,\tilde q) + \tilde h_1^\mu(z), \tag{73} \]
\[ \begin{aligned} \partial_\xi \tilde q_2 = {} & -c_3^\mu k_0^2 \partial_\eta^2 \tilde q_1 - c_4^\mu \tilde q_1 + \mu P_{sh,c}(d_1(z_1)\tilde q_1) + \mu P_{sh,c}(d_2(z_1)\partial_\eta \tilde q_1) \\ & + \mu P_{sh,c}(d_3(z_1)\tilde q_2) + P_{sh,c}(\tilde g_3^\mu(z,\tilde q)\partial_\eta^2 \tilde q_1) + \tilde g_4^\mu(z,\tilde q) \\ & + P_{sh,c}(\tilde g_5^\mu(z,\tilde q)\partial_\eta \tilde q_2) + \tilde h_2^\mu(z), \end{aligned} \tag{74} \]
where $y = (z,\mu)$; these equations are valid for $|(z,\mu)| \le \delta$. Recall that the approximate modulating pulse $p^\mu$ found in Sect. 3 satisfies
\[ |p^\mu(\xi)| \le c_h \mu e^{-\mu^2\theta\xi}, \quad \xi \in [0,\infty), \]
and it is therefore necessary to choose $\delta$ so that $|(p^\mu(\xi),\mu)| \le \delta$ for $\xi \in [0,\infty)$. This task is accomplished by defining $\delta = (3c_h + 1)\mu_0$ and restricting $(z,\mu)$ to $\{|z| \le 2c_h\mu_0,\ 0 \le \mu \le \mu_0\}$; without loss of generality we henceforth suppose that $\mu = \mu_0$. It follows from inequality (71) that
\[ \|\tilde h^\mu(z)\|_{X^{s+1}_{sh,c}} \le c\mu^2 e^{-c^\star/\mu}, \tag{75} \]
where $c^\star = (eC(3c_h + 1))^{-1}$, and inequality (75) is the requisite estimate showing that $\tilde h^\mu$ is exponentially small with respect to $\mu$ for $p = p_{opt}$.
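The passage from (71) to (75) is a substitution: with $\delta = (3c_h+1)\mu$ the exponent $1/(eC\delta)$ in (71) equals the constant defined after (75) divided by $\mu$. The short Python sketch below checks this arithmetic and the inclusion of the restricted region in the ball of radius $\delta$. It is an illustration only: the function names are hypothetical and the numerical values of $C$, $c_h$ and $\mu$ in the usage note are arbitrary assumptions, not values from the paper.

```python
import math


def delta_radius(mu: float, c_h: float) -> float:
    """delta = (3*c_h + 1)*mu, the radius chosen in the text."""
    return (3.0 * c_h + 1.0) * mu


def exponent_in_71(C: float, delta: float) -> float:
    """The exponent 1/(e*C*delta) appearing in inequality (71)."""
    return 1.0 / (math.e * C * delta)


def exponent_constant(C: float, c_h: float) -> float:
    """The constant (e*C*(3*c_h + 1))**(-1) defined after (75)."""
    return 1.0 / (math.e * C * (3.0 * c_h + 1.0))


def region_inside_ball(mu: float, c_h: float) -> bool:
    """Check that |z| <= 2*c_h*mu implies |(z, mu)| <= delta (worst case |z| = 2*c_h*mu)."""
    return math.hypot(2.0 * c_h * mu, mu) <= delta_radius(mu, c_h)
```

For example, `exponent_in_71(2.0, delta_radius(0.05, 1.5))` and `exponent_constant(2.0, 1.5) / 0.05` agree up to rounding.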


We conclude this section by stating estimates for the remaining nonlinearities $\tilde F_2^\mu$ and $\tilde g_j^\mu$, $j = 3, \dots, 6$ appearing in the transformed Eqs. (46)–(48). These estimates are obtained as a corollary of the following proposition, which refines the results of Proposition 4, together with the definitions (49)–(52) of $\tilde F_2^\mu$ and $\tilde g_j^\mu$, $j = 3, \dots, 6$ and the estimates (34)–(40) for the original nonlinearities.

Proposition 7. Suppose that δ > 0 and p ≥ 2 satisfy (65). The estimates ! ! ! ! """ """ ! ! """ """ k 2 3 k " ! ! " " (y)! ≤ c|z| |y| , d [y]"""""" ≤ c|y|4 ! """ !2≤k≤ p ! s+2 2≤k≤ p Xsh,c

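hold for every $y \in \mathbb{R}^5$ such that $|y| \le \delta$.

Corollary 1. The transformed nonlinearities $\tilde F_2^\mu$ and $\tilde g_j^\mu$, $j = 3, \dots, 6$ satisfy the inequalities
\[ |\tilde F_2^\mu(z,q)| \le c\big(|z||(z,\mu)|^4 \|(z,q)\|_{X^{s+1}} + \|q\|^2_{X^{s+1}_{sh,c}}\big), \]
\[ \|\tilde g_j^\mu(z,\tilde q)\|_{s+1} \le c\mu \|(z,\tilde q)\|_{X^{s+1}}, \quad j = 3, 5, \]
\[ \|\tilde g_4^\mu(z,\tilde q)\|_{s+1} \le c\mu^2 \|\tilde q\|_{X^{s+1}_{sh,c}} \|(z,\tilde q)\|_{X^{s+1}}, \]
\[ \|\tilde g_6^\mu(z,\tilde q)\|_{s+2} \le c\mu^2 \|\tilde q\|_{X^{s+1}_{sh,c}} \|(z,\tilde q)\|_{X^{s+1}}. \]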

4.4. Estimates for the derivatives of the transformed nonlinearities. The existence theory for modulating pulses presented in Sects. 5 and 6 below is based upon perturbation arguments around an approximate modulating pulse p µ . In order to use perturbation theory of this kind we require additional estimates upon the derivatives of the nonlinearities in the transformed Eqs. (46)–(48), and the appropriate estimates are derived in this section. We begin by estimating the derivative ∂(y) of (y) with respect to y = (z, µ). It follows from Eq. (54) that L(∂m ) = ∂ N m , m = 2, . . . , p, and differentiating (57), (58), one finds that ∂ N1m (y) (m−1) , 1] = mh m 1 [{y}

+

q

⎛

⎝ig i,q−i [{y}(i−1) , 1, − p1 , . . . , − pq−i ] 6

2≤q≤m i=0 i+ p1 +... pq−i =m

+

q−i j=1

⎞ i,q−i g6 [{y}(i) , − p1 , . . . , −∂ p j , . . . − pq−i ]⎠


⎛ +

⎜ dk1 [y] ⎜ ⎝

2≤k≤m−1

q

2≤q≤m−k+1 i=0 i+ p1 +... pq−i =m−k+1

⎛

⎜ i,q−i ⎜i F [{y}(i−1) , 1, − p1 , . . . , − pq−i ] ⎝ ⎞⎞ +

q−i j=1

⎟⎟ ⎟ F i,q−i [{y}(i) , − p1 , . . . , −∂ p j , . . . − pq−i ]⎟ ⎠⎠,

together with a similar expression for ∂ N2m (y). The methods used to establish Proposition 2 show that ψm−1 = |∂m |s+2 2,m−1 satisfies ψm−1 so that ψm−1

20aγ ≤ ρ2

√ ≤ 5m

√ m−1 5aγ βm , ρ2

√ m−2 2 5aγ m(m − 2)!, m = 2, 3, . . . , ρ2

(76)

and the arguments in the proof of Proposition 4 yield another estimate for ∂. Proposition 8. Suppose that δ > 0 and p ≥ 2 satisfy ρ2 δp ≤ √ . 4 5aγ The estimates ! ! ! ! ! ! k ! ∂ (y)! ! ! !2≤k≤ p !

s+2 Xsh,c

√ 5 , ≤

""" """ """ """ 10 k """ d(∂ )[y]"""""" ≤ """ δ 2≤k≤ p

hold for every y ∈ R5 such that |y| ≤ δ. ˜ The next step is to derive an estimate for ∂ h(y) s+1 . Differentiating (66) and the Xsh,c ˜ corresponding formula for h 1 (y) and proceeding as in Sect. 4.3, we find that

20 ˆ 1p + 1 + ∂ h˜ 1 (y)s+2 ≤ 2 +

20 ˆ 1p + 3 + ∂ h˜ 2 (y)s+1 ≤ 3 +

20 ˆ 2p + ˆ 3p ,

20 ˆ 2p + ˆ 3p ,


where ˆp= 1

p+1≤q

ˆp= 2

√ q q−1 a q(q + 1) 5 δ , ρq

2≤q≤ p n= p1 +...+ pq ≥ p+1 p j ≥1

ˆp= 3


anδ n−1 φ p1 . . . φ pq , ρq

√ 5kφk

2≤k≤ p p+1≤n≤ p+k−1

2≤q≤n−k+1 i+ p1 +...+ pq =n−k+1 p j ≥1

anδ n−1 i 5 φ p1 . . . φ pq . ρq

(In deriving these estimates we have replaced $\psi_{m-1}$ by $m\phi_m$; this procedure is permissible in view of inequalities (64) and (76), which are used to estimate $\psi_{m-1}$ and $\phi_m$ in the subsequent analysis.) An estimate for $\|\partial\tilde h(y)\|_{X^{s+1}_{sh,c}}$ is obtained from the above calculation by estimating $q(q+1)\sqrt{5}^{\,q} \le (2\sqrt{5})^q$ and repeating the proof of Proposition 5.

Proposition 9. Suppose that $\delta > 0$ and $p \ge 2$ satisfy the stronger mutual constraint
\[ \delta p \le \frac{\rho^2}{4\sqrt{5}\,ea\gamma}. \]
The estimate
\[ \|\partial\tilde h(y)\|_{X^{s+1}_{sh,c}} \le \frac{c}{\delta}\left( (C\delta)^{p+1} p! + \frac{1}{e^{p+1}p^2} \right), \quad C = \frac{4\sqrt{5}\,a\gamma}{\rho^2}, \tag{77} \]
holds for every $y \in \mathbb{R}^5$ such that $|y| \le \delta$.

Proposition 9 and the calculation above inequality (71) yield the inequality
\[ \|\partial\tilde h(y)\|_{X^{s+1}_{sh,c}} \le cC^2\left( \frac{27m}{8e} + 4e^2 \right)\delta e^{-1/eC\delta} \]
for $p = p_{opt}$, from which it follows that
\[ \|\partial\tilde h^\mu(z)\|_{X^{s+1}_{sh,c}} \le c\mu e^{-c^\star/\mu}. \tag{78} \]
It remains to derive the corresponding estimates for the derivatives of $\tilde g_j^\mu$, $j = 3, \dots, 6$, and this task is accomplished by repeating the arguments used in Proposition 7 and Corollary 1.

Proposition 10. The transformed nonlinearities $\tilde F_2^\mu$ and $\tilde g_j^\mu$, $j = 3, \dots, 6$ satisfy the inequalities
\[ |\partial_i \tilde F_2^\mu(z,\tilde q)| \le c\big(|(z,\mu)|^4 \|(z,\tilde q)\|_{X^{s+1}} + \|q\|_{X^{s+1}_{sh,c}}\big), \]
\[ \|\partial_i \tilde g_j^\mu(z,\tilde q)\|_{s+1} \le c\mu, \quad j = 3, 5, \]
\[ \|\partial_i \tilde g_4^\mu(z,\tilde q)\|_{s+1} \le c\mu^2 \|(z,\tilde q)\|_{X^{s+1}}, \]
\[ \|\partial_i \tilde g_6^\mu(z,\tilde q)\|_{s+2} \le c\mu^2 \|(z,\tilde q)\|_{X^{s+1}} \]
for $i = 1, 2$.

Fig. 4. Solutions with initial data on the local centre-stable manifold $W^{cs}_{loc}$ remain within a distance of $e^{-c^\star/2\mu}$ of $p^\mu$ on the timescale $[0, e^{c^\star/2\mu}]$

The existence theory presented in Sects. 5 and 6 below makes frequent use of the fact that the nonlinearities $\tilde g_j^\mu$, $j = 3, \dots, 6$ and $\tilde h^\mu$ are Lipschitz functions of their arguments whose Lipschitz constants are estimated by inequality (78) and Proposition 10. The Lipschitz continuity of their derivatives is also required, but here the size of the Lipschitz constants is not important. In these circumstances we use estimates of the form
\[ \|\partial_i \tilde g_6^\mu(z_1,\tilde q_1) - \partial_i \tilde g_6^\mu(z_2,\tilde q_2)\|_{s+2} \le c_\mu \|(z_1 - z_2, \tilde q_1 - \tilde q_2)\|_{X^{s+1}}, \quad i = 1, 2, \]
in which the Lipschitz constant $c_\mu$ depends upon $\mu$; these estimates follow from the analyticity of the nonlinearities and the restriction (77).

5. The Local Centre-Stable Manifold

In this section we construct solutions of Eqs. (46)–(48) whose pointwise distance from an approximate pulse $p^\mu$ identified in Sect. 3 does not exceed $e^{-c^\star/2\mu}$ for $\xi \in [0, e^{c^\star/2\mu}]$ (see Fig. 4). The local centre-stable manifold is the set of initial data for such solutions, and we use it in the next section to extend these solutions to symmetric modulating pulses which exist for $\xi \in [-e^{c^\star/2\mu}, e^{c^\star/2\mu}]$. The centre-stable manifold is a generalisation of a concept familiar in dynamical-systems theory, although the methods used to find the solutions whose initial data define the manifold are rather different here due to the quasilinear nature of our problem.

We begin by writing $z = p^\mu + r$, so that $r$ is a perturbation around an approximate modulating pulse, and decomposing equations (47), (48) into equations for the strongly hyperbolic part $Z = P_{sh}q$ and central part $w = P_c q$ of $q$. Recall that the $m$th Fourier components of $d_1(z_1)$ vanish for $m = 1$, so that
\[ P_{sh}(d_1(z_1)q_1) = 0, \quad P_c(d_1(z_1)q_1) = P_c(d_1(z_1)w_1). \]
Using this calculation and the corresponding results for $d_2$ and $d_3$, one finds that
\[ \partial_\xi Z = L_0^\mu Z + G_1^\mu(Z, p^\mu + r, w) + G_2^\mu(p^\mu + r), \tag{79} \]
\[ \partial_\xi r = \mathcal{L}_1^\mu r + F_3^\mu(r) + F_4^\mu(Z, r, w), \tag{80} \]

where
\[ L_0^\mu = L^\mu|_{X^s_{sh}}, \]
\[ G_1^\mu(Z,z,w) = \begin{pmatrix} P_{sh}(\tilde g_6^\mu(z, Z+w)) \\ P_{sh}[\tilde g_3^\mu(z, Z+w)\partial_\eta^2(Z_1 + w_1) + \tilde g_4^\mu(z, Z+w) + \tilde g_5^\mu(z, Z+w)\partial_\eta(Z_2 + w_2)] \end{pmatrix}, \]
\[ G_2^\mu(z) = \begin{pmatrix} P_{sh}(\tilde h_1^\mu(z)) \\ P_{sh}(\tilde h_2^\mu(z)) \end{pmatrix}, \]
\[ \mathcal{L}_1^\mu r = d_1\tilde F_1^\mu[p^\mu, 0](r), \]
\[ F_3^\mu(r) = \tilde F_1^\mu(p^\mu + r) - \tilde F_1^\mu(p^\mu) - d_1\tilde F_1^\mu[p^\mu, 0](r), \]
\[ F_4^\mu(Z,r,w) = \tilde F_2^\mu(p^\mu + r, Z + w) - \tilde F_2^\mu(p^\mu, 0), \]
and
\[ \partial_\xi w_1 = w_2 + \hat g_6^\mu(Z, p^\mu + r, w) + \hat h_1^\mu(p^\mu + r), \tag{81} \]
\[ \begin{aligned} \partial_\xi w_2 = {} & -c_3^\mu k_0^2 \partial_\eta^2 w_1 - c_4^\mu w_1 + \mu P_c(d_1(p_1^\mu + r_1)w_1) + \mu P_c(d_2(p_1^\mu + r_1)\partial_\eta w_1) \\ & + \mu P_c(d_3(p_1^\mu + r_1)w_2) + P_c(\hat g_3^\mu(Z, p^\mu + r, w)\partial_\eta^2 w_1) \\ & + \hat g_4^\mu(Z, p^\mu + r, w) + P_c(\hat g_5^\mu(Z, p^\mu + r, w)\partial_\eta w_2) + \hat h_2^\mu(p^\mu + r), \end{aligned} \tag{82} \]
where
\[ \hat g_j^\mu(Z,z,w) = \tilde g_j^\mu(z, Z+w), \quad j = 3, 5, \]
\[ \hat g_4^\mu(Z,z,w) = P_c[\tilde g_4^\mu(z, Z+w) + \tilde g_3^\mu(z, Z+w)\partial_\eta^2 Z_1 + \tilde g_5^\mu(z, Z+w)\partial_\eta Z_2], \]
\[ \hat g_6^\mu(Z,z,w) = P_c\tilde g_6^\mu(z, Z+w), \quad \hat h_j^\mu(z) = P_c\tilde h_j^\mu(z), \quad j = 1, 2. \]

The main difficulty in the existence theory below is the fact that the ‘nonlinearities’ in Eqs. (79)–(82) are only ‘asymptotically nonlinear’ in $(r, Z, w)$: they contain linear terms with $\xi$-dependent coefficients which decay to zero as $\xi \to \infty$. These terms are in fact unproblematic in Eq. (79), while a delicate argument using Gronwall’s inequality and energy estimates is required to deal with them in (81), (82). Turning to (80), we use the $\xi$-dependent linear operator $\mathcal{L}_1^\mu$ rather than the constant operator $L_1^\mu = L^\mu|_{X^s_{wh}}$ in Eq. (80) in order to avoid the presence of linear terms in $F_3^\mu$, and although $F_4^\mu$ does contain linear terms, Lemma 3 ensures that they are of higher order with respect to the parameter $\mu$.

Proposition 11. Suppose that $|r| \le c_h\mu$ (see the comments above Eq. (75)) and $\|q\|_{X^{s+1}_{sh,c}} \le \mu$.

(i) The nonlinearities in Eq. (79) satisfy the estimates
\[ |G_1^\mu(Z,z,w)| \le c\big(\mu\|(Z,w)\|^2_{X^{s+1}_{sh,c}} + (\mu|z|^2 + \mu^2|z|)\|(Z,w)\|_{X^{s+1}_{sh,c}}\big), \]
\[ |\partial_j G_1^\mu(Z,z,w)| \le c\big(\mu\|(Z,w)\|_{X^{s+1}_{sh,c}} + \mu|z|^2 + \mu^2|z|\big), \quad j = 1, 2, 3, \]
and
\[ |G_2^\mu(z)| \le c\mu^2 e^{-c^\star/\mu}, \quad |\partial G_2^\mu(z)| \le c\mu e^{-c^\star/\mu}. \]

(ii) The nonlinearities in Eq. (80) satisfy the estimates
\[ |F_3^\mu(r)| \le c|r|^2, \quad |\partial F_3^\mu(r)| \le c|r| \]
and
\[ |F_4^\mu(Z,r,w)| \le c\mu^5\|(Z,r,w)\|_{X^{s+1}}, \quad |\partial_j F_4^\mu(Z,r,w)| \le c\big(\mu^5 + \|(Z,w)\|_{X^{s+1}}\big), \quad j = 1, 2, 3. \]

(iii) The nonlinearities $\hat g_3^\mu(Z,z,w)$, $\hat g_5^\mu(Z,z,w)$ and $\hat g_6^\mu(Z,z,w)$ in Eqs. (81), (82) satisfy the same estimates as respectively $\tilde g_3^\mu(z,q)$, $\tilde g_5^\mu(z,q)$ and $\tilde g_6^\mu(z,q)$ with $q = Z + w$, while
\[ \|\hat g_4^\mu(Z,z,w)\|_{s+1} \le c\big(\mu\|(Z,w)\|^2_{X^{s+1}_{sh,c}} + (\mu|z|^2 + \mu^2|z|)\|(Z,w)\|_{X^{s+1}_{sh,c}}\big), \]
\[ \|\partial_j \hat g_4^\mu(Z,z,w)\|_{s+1} \le c\big(\mu\|(Z,w)\|_{X^{s+1}_{sh,c}} + \mu|z|^2 + \mu^2|z|\big), \quad j = 1, 2, 3. \]

Proof. (i) Observe that
\[ |P_{sh}(g_3(z,q)\partial_\eta^2 q_1)|,\ |P_{sh}(g_5(z,q)\partial_\eta q_2)| \le c\big(\mu|z|^2\|q\|_{X^{s+1}_{sh,c}} + \mu\|q\|^2_{X^{s+1}_{sh,c}}\big), \]
because $P_{sh}(g_3(z,q)\partial_\eta^2 q_1)$, $P_{sh}(g_5(z,q)\partial_\eta q_2)$ do not contain any terms which are linear in both $z$ and $q$, and proceeding as in the proof of Corollary 1, one finds that
\[ |P_{sh}(\tilde g_3^\mu(z,q)\partial_\eta^2 q_1)|,\ |P_{sh}(\tilde g_5^\mu(z,q)\partial_\eta q_2)| \le c\big(\mu|z|^2\|q\|_{X^{s+1}_{sh,c}} + \mu\|q\|^2_{X^{s+1}_{sh,c}}\big). \]
The estimate for $G_1^\mu$ follows from this result and the estimate for $\tilde g_4^\mu$ given in Corollary 1, and those for its derivatives are obtained by a similar argument. The estimates for $G_2^\mu$ follow directly from (75) and Proposition 8.

(ii) The results for $F_3^\mu$ are an immediate consequence of the fact that $F_3^\mu$ is an analytic function of $r$ which contains no constant or linear terms in its Taylor expansion. The results for $F_4^\mu$ follow from the inequalities
\[ |F_4^\mu(Z,r,w)| = |\tilde F_2^\mu(p^\mu + r, q) - \tilde F_2^\mu(p^\mu, 0)| \le \sup |d\tilde F_2^\mu[z,q]|\, \|(r,q)\|_{X^{s+1}} \le c\mu^5\|(r,q)\|_{X^{s+1}}, \]
in which the supremum is taken over $|z| \le 2c_h\mu$, $\|q\|_{X^{s+1}_{sh,c}} \le \mu$, and
\[ |\partial_j F_4^\mu(Z,r,w)| \le |\partial_j \tilde F_2^\mu(p^\mu + r, Z + w)| + |\partial_j \tilde F_2^\mu(p^\mu, 0)|, \quad j = 1, 2, 3, \]
together with Proposition 10.

(iii) The results for $\hat g_4^\mu$ are derived by noting that $P_c(g_3(z,q)\partial_\eta^2 Z_1)$, $P_c(g_5(z,q)\partial_\eta Z_2)$ do not contain any terms which are linear in both $z$ and $q$ and arguing as in part (i).

Turning to the linear part of the vector fields on the right-hand sides of (79), (80), recall that $L_0^\mu$ has a pair $\pm\lambda_{0,\mu} = \pm(1 + k_0^2)^{1/2} + O(\mu^4)$ of simple eigenvalues with corresponding eigenvectors $u_0 = (1, \lambda_{0,\mu})$ and $s_0 = (1, -\lambda_{0,\mu})$ which define the stable and unstable directions associated with this matrix. The projections onto these directions are constructed in the usual fashion using the dual basis $\{s_0^*, u_0^*\}$ to $\{s_0, u_0\}$ in $X^s_{sh}$, where
\[ s_0^* = \frac{1}{2\lambda_{0,\mu}}(\lambda_{0,\mu}, 1), \quad u_0^* = \frac{1}{2\lambda_{0,\mu}}(\lambda_{0,\mu}, -1). \]
Observe that $u_0 = S_{sh}s_0$ and $u_0^* = S_{sh}s_0^*$ (a consequence of the relation $L_0^\mu S_{sh} = -S_{sh}L_0^\mu$ and the self-adjointness of the operator $S_{sh}$). The stable and unstable directions associated with the $\xi$-dependent linear operator $\mathcal{L}_1^\mu$ are described by the following result; it is proved by noting that
\[ \|\mathcal{L}_1^\mu - L^\mu|_{X^s_{wh}}\|_{X^s_{wh} \to X^s_{wh}} \le c\mu e^{-\theta\mu^2|\xi|}, \quad \xi \in \mathbb{R}, \]
and using the method explained by Groves & Mielke [3, §4.3].

Proposition 12. The equation
\[ \partial_\xi r = \mathcal{L}_1^\mu r \]
has solutions $s_{1,1}(\xi)$, $s_{1,2}(\xi)$, $u_{1,1}(\xi)$, $u_{1,2}(\xi)$ on $[0,\infty)$ such that
\[ |s_{1,j}(\xi)| \le ce^{-\lambda_{1,\mu}\xi}, \quad |u_{1,j}(\xi)| \le ce^{\lambda_{1,\mu}\xi}, \quad \xi \in [0,\infty). \]
The dual basis $\{s_{1,1}^*(\xi), s_{1,2}^*(\xi), u_{1,1}^*(\xi), u_{1,2}^*(\xi)\}$ to $\{s_{1,1}(\xi), s_{1,2}(\xi), u_{1,1}(\xi), u_{1,2}(\xi)\}$ in $X^s_{wh}$ satisfies
\[ |s_{1,j}^*(\xi)| \le \frac{c}{\lambda_{1,\mu}}e^{\lambda_{1,\mu}\xi}, \quad |u_{1,j}^*(\xi)| \le \frac{c}{\lambda_{1,\mu}}e^{-\lambda_{1,\mu}\xi}. \]
Taking $s_{1,1}(\xi) = \mu^{-3}p^\mu(\xi)$, $s_{1,2} = \mu^{-1}(R_{\pi/2}p_1^\mu(\xi), R_{\pi/2}p_2^\mu(\xi))$ (a choice made possible by the translation invariance in $\xi$ and the O(2) symmetry of (24)–(25)), so that $S_{wh}s_{1,j}(0) = -s_{1,j}(0)$, one can choose $u_{1,j}$ so that $S_{wh}u_{1,j}(0) = u_{1,j}(0)$.

Our task is to find solutions $(Z, r, w)$ of (79)–(82) for which $|Z(\xi)|$, $|r(\xi)|$ and $\|w(\xi)\|_{X_c^{s+1}}$ do not exceed $e^{-c^\star/2\mu}$ for $\xi \in [0, e^{c^\star/2\mu}]$. The requisite solutions are constructed using the following iteration scheme. Choose real numbers $Z^0$, $r_1^0$, $r_2^0$ whose magnitude is at most $\mu e^{-c^\star/2\mu}$ and $w^0 \in X_c^{s+1}$ such that $\|w^0\|_{X_c^{s+2}} \le \mu e^{-c^\star/2\mu}$. Set $Z_{(0)} = 0$, $r_{(0)} = 0$, $w_{(0)} = 0$ and for $m = 0, 1, 2, \dots$ define $Z_{(m+1)} \in C([0, e^{c^\star/2\mu}], \mathbb{R}^2)$, $r_{(m+1)} \in C([0, e^{c^\star/2\mu}], \mathbb{R}^4)$ by the formulae
\[ \begin{aligned} Z_{(m+1)}(\xi) = {} & Z^0 s_0 e^{-\lambda_{0,\mu}\xi} + \int_0^\xi \langle (G_{1(m)}^\mu + G_{2(m)}^\mu)(\tau), s_0^* \rangle e^{\lambda_{0,\mu}\tau}\, d\tau\, s_0 e^{-\lambda_{0,\mu}\xi} \\ & - \int_\xi^{e^{c^\star/2\mu}} \langle (G_{1(m)}^\mu + G_{2(m)}^\mu)(\tau), u_0^* \rangle e^{-\lambda_{0,\mu}\tau}\, d\tau\, u_0 e^{\lambda_{0,\mu}\xi}, \end{aligned} \tag{83} \]
\[ \begin{aligned} r_{(m+1)}(\xi) = {} & r_1^0 s_{1,1}(\xi) + r_2^0 s_{1,2}(\xi) + \sum_{j=1}^2 \int_0^\xi \langle (F_{3(m)}^\mu + F_{4(m)}^\mu)(\tau), s_{1,j}^*(\tau) \rangle\, d\tau\, s_{1,j}(\xi) \\ & - \sum_{j=1}^2 \int_\xi^{e^{c^\star/2\mu}} \langle (F_{3(m)}^\mu + F_{4(m)}^\mu)(\tau), u_{1,j}^*(\tau) \rangle\, d\tau\, u_{1,j}(\xi), \end{aligned} \tag{84} \]

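and let $w_{(m+1)} \in C([0, e^{c^\star/2\mu}], X_c^{s+1})$ be the solution of the equations
\[ \partial_\xi w_{1(m+1)} = w_{2(m+1)} + \hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu, \tag{85} \]
\[ \begin{aligned} \partial_\xi w_{2(m+1)} = {} & -c_3^\mu k_0^2 \partial_\eta^2 w_{1(m+1)} - c_4^\mu w_{1(m+1)} + \mu P_c(d_{1(m)} w_{1(m+1)}) \\ & + \mu P_c(d_{2(m)} \partial_\eta w_{1(m+1)}) + \mu P_c(d_{3(m)} w_{2(m+1)}) \\ & + P_c(\hat g_{3(m)}^\mu \partial_\eta^2 w_{1(m+1)}) + \hat g_{4(m)}^\mu + P_c(\hat g_{5(m)}^\mu \partial_\eta w_{2(m+1)}) + \hat h_{2(m)}^\mu, \end{aligned} \tag{86} \]
with initial data $w_{(m+1)}|_{\xi=0} = w^0$; here $G_{1(m)}^\mu$ is an abbreviation for $G_1^\mu(Z_{(m)}, p^\mu + r_{(m)}, w_{(m)})$ and similar abbreviations are used for the other nonlinearities.

Lemma 5. Suppose that $|Z^0|$, $|r_1^0|$, $|r_2^0|$, $\|w^0\|_{X_c^{s+1}} \le \mu e^{-c^\star/2\mu}$. The estimates
\[ \sup_{m \in \mathbb{N}} \sup_{\xi \in [0, e^{c^\star/2\mu}]} |Z_{(m)}(\xi)| \le ce^{-c^\star/2\mu}, \quad \sup_{m \in \mathbb{N}} \sup_{\xi \in [0, e^{c^\star/2\mu}]} |r_{(m)}(\xi)| \le ce^{-c^\star/2\mu} \]
and
\[ \sup_{m \in \mathbb{N}} \sup_{\xi \in [0, e^{c^\star/2\mu}]} \|w_{(m)}(\xi)\|_{X_c^s} \le ce^{-c^\star/2\mu} \]
hold for all sufficiently small values of $\mu$.

Proof. We demonstrate that
\[ \sup_{\xi \in [0, e^{c^\star/2\mu}]} |Z_{(m+1)}(\xi)| \le c\mu e^{-c^\star/2\mu}, \quad \sup_{\xi \in [0, e^{c^\star/2\mu}]} |r_{(m+1)}(\xi)| \le c\mu e^{-c^\star/2\mu} \]
and
\[ \sup_{\xi \in [0, e^{c^\star/2\mu}]} \|w_{(m+1)}(\xi)\|_{X_c^s} \le c\mu^{1/2}|\log\mu| e^{-c^\star/2\mu} \]
whenever
\[ \sup_{\xi \in [0, e^{c^\star/2\mu}]} \big\{ |Z_{(j)}(\xi)| + |r_{(j)}(\xi)| + \|w_{(j)}(\xi)\|_{X_c^s} \big\} \le e^{-c^\star/2\mu} \]
for $j = 0, \dots, m$; the lemma follows inductively from this result. Observe that
\[ \begin{aligned} |Z_{(m+1)}(\xi)| & \le c\left( |Z^0| + \mu^3 e^{-c^\star/2\mu} \int_0^\xi e^{\lambda_{0,\mu}\tau}\, d\tau\, e^{-\lambda_{0,\mu}\xi} + \mu^3 e^{-c^\star/2\mu} \int_\xi^\infty e^{-\lambda_{0,\mu}\tau}\, d\tau\, e^{\lambda_{0,\mu}\xi} \right) \\ & \le c(|Z^0| + \mu^3 e^{-c^\star/2\mu}) \\ & \le c\mu e^{-c^\star/2\mu} \end{aligned} \tag{87} \]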

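and
\[ \begin{aligned} |r_{(m+1)}(\xi)| & \le c\left( |r_1^0| + |r_2^0| + \frac{\mu^5 e^{-c^\star/2\mu}}{\lambda_{1,\mu}} \int_0^\xi e^{\lambda_{1,\mu}\tau}\, d\tau\, e^{-\lambda_{1,\mu}\xi} + \frac{\mu^5 e^{-c^\star/2\mu}}{\lambda_{1,\mu}} \int_\xi^\infty e^{-\lambda_{1,\mu}\tau}\, d\tau\, e^{\lambda_{1,\mu}\xi} \right) \\ & \le c\left( |r_1^0| + |r_2^0| + \frac{\mu^5 e^{-c^\star/2\mu}}{\lambda_{1,\mu}^2} \right) \\ & \le c\mu e^{-c^\star/2\mu} \end{aligned} \tag{88} \]
for $\xi \in [0, e^{c^\star/2\mu}]$; here we have used the estimates
\[ |G_{1(m)}^\mu| \le c\mu^3 e^{-c^\star/2\mu}, \quad |G_{2(m)}^\mu| \le c\mu^2 e^{-c^\star/\mu}, \]
\[ |F_{3(m)}^\mu| \le ce^{-c^\star/\mu}, \quad |F_{4(m)}^\mu| \le c\mu^5 e^{-c^\star/2\mu}, \]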

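which are obtained from Proposition 11. The corresponding result for $w_{(m+1)}$ is obtained by applying energy estimates to Eqs. (85), (86). Define the energy $E_{s+1}$ by
\[ \begin{aligned} E_{s+1}(w) & = \int \big\{ (\partial_\eta^{s+1}w_2)^2 - c_3^\mu k_0^2 (\partial_\eta^{s+2}w_1)^2 + c_4^\mu (\partial_\eta^{s+1}w_1)^2 \big\}\, d\eta \\ & = \int (\partial_\eta^{s+1}w_2)^2\, d\eta + \sum_{j=1}^\infty (-c_3^\mu k_0^2 j^2 + c_4^\mu) j^{2s+2} |w_{j,1}|^2, \end{aligned} \]
and note that $E_{s+1}$ is equivalent to the usual norm on $X_c^{s+1}$ because $w$ has zero mean and $c_3^\mu < 0$. Applying the operator $\partial_\eta^{s+1}w_{2(m+1)}\,\partial_\eta^{s+1}$ to (86), integrating with respect to $\eta$ over one period and using the calculation
\[ \begin{aligned} & \int \big\{ \partial_\eta^{s+1}w_{2(m+1)} \partial_\xi \partial_\eta^{s+1}w_{2(m+1)} + c_3^\mu k_0^2 \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+3}w_{1(m+1)} + c_4^\mu \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}w_{1(m+1)} \big\}\, d\eta \\ & \quad = \frac{1}{2}\partial_\xi E_{s+1}(w_{(m+1)}) - c_3^\mu k_0^2 \int \partial_\eta^{s+3}w_{1(m+1)} \partial_\eta^{s+1}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu)\, d\eta - c_4^\mu \int \partial_\eta^{s+1}w_{1(m+1)} \partial_\eta^{s+1}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu)\, d\eta, \end{aligned} \]
which is obtained by integrating by parts and using Eq. (85), one finds that
\[ \begin{aligned} \frac{1}{2}\partial_\xi E_{s+1}(w_{(m+1)}) = {} & c_3^\mu k_0^2 \int \partial_\eta^{s+3}w_{1(m+1)} \partial_\eta^{s+1}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu)\, d\eta + c_4^\mu \int \partial_\eta^{s+1}w_{1(m+1)} \partial_\eta^{s+1}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu)\, d\eta \\ & + \mu \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(d_{1(m)}w_{1(m+1)})\, d\eta + \mu \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(d_{2(m)}\partial_\eta w_{1(m+1)})\, d\eta \\ & + \mu \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(d_{3(m)}w_{2(m+1)})\, d\eta + \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}(\hat g_{4(m)}^\mu + \hat h_{2(m)}^\mu)\, d\eta \\ & + \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(\hat g_{3(m)}^\mu \partial_\eta^2 w_{1(m+1)})\, d\eta + \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(\hat g_{5(m)}^\mu \partial_\eta w_{2(m+1)})\, d\eta. \end{aligned} \]
An estimate for $E_{s+1}(w_{(m+1)})$ can be derived from this identity with the help of the estimates
\[ \|d_{j(m)}\|_{s+1} \le c(e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}), \quad j = 1, 2, 3, \]
\[ \|\hat g_{j(m)}^\mu\|_{s+1} \le c\mu(e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}), \quad j = 3, 5, \]
\[ \|\hat g_{4(m)}^\mu\|_{s+1} \le c\mu(e^{-c^\star/\mu} + \mu^2 e^{-c^\star/2\mu}e^{-\mu^2\theta\xi}), \]
\[ \|\hat g_{6(m)}^\mu\|_{s+2} \le c\mu(e^{-c^\star/\mu} + \mu^2 e^{-c^\star/2\mu}e^{-\mu^2\theta\xi}), \]
\[ \|\hat h_{(m)}^\mu\|_{X_c^{s+1}} \le c\mu^2 e^{-c^\star/\mu}, \]
which are obtained from Proposition 11(iii). Notice that
\[ \begin{aligned} \left| \int \partial_\eta^{s+3}w_{1(m+1)} \partial_\eta^{s+1}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu)\, d\eta \right| & = \left| \int \partial_\eta^{s+2}w_{1(m+1)} \partial_\eta^{s+2}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu)\, d\eta \right| \\ & \le \|w_{1(m+1)}\|_{s+2} (\|\hat g_{6(m)}^\mu\|_{s+2} + \|\hat h_{1(m)}^\mu\|_{s+2}) \\ & \le c\mu E_{s+1}(w_{(m+1)})^{1/2} (e^{-c^\star/\mu} + \mu^2 e^{-c^\star/2\mu}e^{-\mu^2\theta\xi}), \end{aligned} \]
and similarly
\[ \left| \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(d_{1(m)}w_{1(m+1)})\, d\eta \right| \le c(e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}) E_{s+1}(w_{(m+1)}), \]
\[ \left| \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(d_{2(m)}\partial_\eta w_{1(m+1)})\, d\eta \right| \le c(e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}) E_{s+1}(w_{(m+1)}), \]
\[ \left| \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(d_{3(m)}w_{2(m+1)})\, d\eta \right| \le c(e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}) E_{s+1}(w_{(m+1)}), \]
\[ \left| \int \partial_\eta^{s+1}w_{1(m+1)} \partial_\eta^{s+1}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu)\, d\eta \right| \le c\mu E_{s+1}(w_{(m+1)})^{1/2} (e^{-c^\star/\mu} + \mu^2 e^{-c^\star/2\mu}e^{-\mu^2\theta\xi}), \]
\[ \left| \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}(\hat g_{4(m)}^\mu + \hat h_{2(m)}^\mu)\, d\eta \right| \le c\mu E_{s+1}(w_{(m+1)})^{1/2} (e^{-c^\star/\mu} + \mu^2 e^{-c^\star/2\mu}e^{-\mu^2\theta\xi}). \]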

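Straightforward calculations show that
\[ \begin{aligned} & \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(\hat g_{3(m)}^\mu \partial_\eta^2 w_{1(m+1)})\, d\eta \\ & \quad = \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}(\hat g_{3(m)}^\mu \partial_\eta^2 w_{1(m+1)})\, d\eta + s_1 \\ & \quad = \int \partial_\eta^{s+1}w_{2(m+1)}\, \hat g_{3(m)}^\mu\, \partial_\eta^{s+3}w_{1(m+1)}\, d\eta + s_1 + s_2 \\ & \quad = -\int \partial_\eta^{s+2}w_{2(m+1)} \partial_\eta^{s+2}w_{1(m+1)}\, \hat g_{3(m)}^\mu\, d\eta + s_1 + s_2 + s_3 \\ & \quad = -\int \partial_\eta^{s+2}(\partial_\xi w_{1(m+1)} - \hat g_{6(m)}^\mu - \hat h_{1(m)}^\mu) \partial_\eta^{s+2}w_{1(m+1)}\, \hat g_{3(m)}^\mu\, d\eta + s_1 + s_2 + s_3 \\ & \quad = -\frac{1}{2}\partial_\xi \int (\partial_\eta^{s+2}w_{1(m+1)})^2 \hat g_{3(m)}^\mu\, d\eta + s_1 + s_2 + s_3 + s_4 + s_5, \end{aligned} \]
where
\[ s_1 = -\int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}(P_h(\hat g_{3(m)}^\mu \partial_\eta^2 w_{1(m+1)}))\, d\eta, \]
\[ s_2 = \int \partial_\eta^{s+1}w_{2(m+1)} \left( \sum_{j=0}^{s} \binom{s+1}{j} \partial_\eta^{s+1-j}\hat g_{3(m)}^\mu\, \partial_\eta^{j+2}w_{1(m+1)} \right) d\eta, \]
\[ s_3 = -\int \partial_\eta^{s+1}w_{2(m+1)}\, \partial_\eta \hat g_{3(m)}^\mu\, \partial_\eta^{s+2}w_{1(m+1)}\, d\eta, \]
\[ s_4 = \int \partial_\eta^{s+2}(\hat g_{6(m)}^\mu + \hat h_{1(m)}^\mu) \partial_\eta^{s+2}w_{1(m+1)}\, \hat g_{3(m)}^\mu\, d\eta, \]
\[ s_5 = \frac{1}{2}\int (\partial_\eta^{s+2}w_{1(m+1)})^2 \partial_\xi \hat g_{3(m)}^\mu\, d\eta, \]
and
\[ \begin{aligned} & \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_c(\hat g_{5(m)}^\mu \partial_\eta w_{2(m+1)})\, d\eta \\ & \quad = \int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}(\hat g_{5(m)}^\mu \partial_\eta w_{2(m+1)})\, d\eta + s_6 \\ & \quad = \int \partial_\eta^{s+1}w_{2(m+1)}\, \hat g_{5(m)}^\mu\, \partial_\eta^{s+2}w_{2(m+1)}\, d\eta + s_6 + s_7 \\ & \quad = \frac{1}{2}\int \partial_\eta((\partial_\eta^{s+1}w_{2(m+1)})^2)\, \hat g_{5(m)}^\mu\, d\eta + s_6 + s_7 \\ & \quad = s_6 + s_7 + s_8, \end{aligned} \]
where
\[ s_6 = -\int \partial_\eta^{s+1}w_{2(m+1)} \partial_\eta^{s+1}P_h(\hat g_{5(m)}^\mu \partial_\eta w_{2(m+1)})\, d\eta, \]
\[ s_7 = \int \partial_\eta^{s+1}w_{2(m+1)} \left( \sum_{j=0}^{s} \binom{s+1}{j} \partial_\eta^{s+1-j}\hat g_{5(m)}^\mu\, \partial_\eta^{j+1}w_{2(m+1)} \right) d\eta, \]
\[ s_8 = \frac{1}{2}\int \partial_\eta((\partial_\eta^{s+1}w_{2(m+1)})^2)\, \hat g_{5(m)}^\mu\, d\eta; \]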

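these quantities satisfy the estimates
\[ |s_j| \le c\mu^3 E_{s+1}(w_{(m+1)})^{1/2} (e^{-c^\star/\mu} + \mu^2 e^{-c^\star/2\mu}e^{-\mu^2\theta\xi}), \quad j = 1, 2, 3, \]
\[ |s_j| \le \mu E_{s+1}(w_{(m+1)}) (e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}), \quad j = 4, 6, 7, 8, \]
and
\[ \begin{aligned} |s_5| & \le cE_{s+1}(w_{(m+1)}) \|\partial_\xi \hat g_{3(m)}^\mu\|_s \\ & \le cE_{s+1}(w_{(m+1)}) \|\partial \hat g_{3(m)}^\mu\|_s (|\partial_\xi Z_{(m)}| + |\partial_\xi r_{(m)}| + |\partial_\xi p^\mu| + \|\partial_\xi w_{(m)}\|_{X_c^s}) \\ & \le c\mu E_{s+1}(w_{(m+1)}) (e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}) \end{aligned} \]
(the inequalities
\[ |\partial_\xi Z_{(m)}| \le ce^{-c^\star/\mu}, \quad |\partial_\xi r_{(m)}| \le ce^{-c^\star/\mu}, \quad \|\partial_\xi w_{(m)}\|_{X_c^s} \le ce^{-c^\star/\mu} \]
follow directly from the inductive hypothesis by means of Eqs. (83)–(86) with $m$ replaced by $m - 1$). Finally, let us define
\[ E^e_{s+1}(w) = E_{s+1}(w) - \frac{1}{2}\int (\partial_\eta^{s+2}w_1)^2 \hat g_{3(m)}^\mu\, d\eta \]
and note that
\[ \frac{1}{c}E_{s+1}(w) \le E^e_{s+1}(w) \le cE_{s+1}(w) \tag{89} \]
since
\[ \left| \int (\partial_\eta^{s+2}w_1)^2 \hat g_{3(m)}^\mu\, d\eta \right| \le E_{s+1}(w) \|\hat g_{3(m)}^\mu\|_{s+1} \le \mu^2 E_{s+1}(w). \]
Altogether, we have that
\[ \begin{aligned} \partial_\xi E^e_{s+1}(w_{(m+1)}) \le {} & c\mu(e^{-c^\star/\mu} + \mu^2 e^{-c^\star/2\mu}e^{-\mu^2\theta\xi}) E_{s+1}(w_{(m+1)})^{1/2} \\ & + c\mu(e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}) E_{s+1}(w_{(m+1)}). \end{aligned} \tag{90} \]
We proceed by establishing an estimate for $E^e_{s+1}(w_{(m+1)})$ on the short interval $\xi \in [0, \xi^\star]$, where
\[ \xi^\star = \frac{\alpha|\log\mu|}{\theta\mu^2}, \]
so that $e^{-\mu^2\theta\xi^\star} = \mu^\alpha$, and $\alpha$ is an appropriately chosen positive constant. It follows from inequality (90) that
\[ \partial_\xi E^e_{s+1}(w_{(m+1)}) \le c_1\mu^2 E^e_{s+1}(w_{(m+1)}) + c_2\mu^4 e^{-c^\star/\mu}, \]
and an application of Gronwall’s inequality yields
\[ E^e_{s+1}(w_{(m+1)})(\xi) \le (E^e_{s+1}(w_{(m+1)})(0) + c_2\mu^4 e^{-c^\star/\mu}\xi^\star) e^{c_1\mu^2\xi^\star}. \]
Choosing $\alpha = \theta/c_1$, one finds that
\[ e^{c_1\mu^2\xi^\star} = \frac{1}{\mu}, \]
whereby
\[ \begin{aligned} E^e_{s+1}(w_{(m+1)})(\xi) & \le \frac{1}{\mu}(E^e_{s+1}(w_{(m+1)})(0) + c_2\mu^4 e^{-c^\star/\mu}\xi^\star) \\ & \le \frac{1}{\mu}(E^e_{s+1}(w_{(m+1)})(0) + c\mu^2 e^{-c^\star/\mu}|\log\mu|) \\ & \le c\mu e^{-c^\star/\mu}|\log\mu| \end{aligned} \tag{91} \]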

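for $\xi \in [0, \xi^\star]$. This intermediate step may now be used to deduce the desired estimate for $E^e_{s+1}(w_{(m+1)})$ on the long interval $\xi \in [0, e^{c^\star/2\mu}]$.

Integrating inequality (90), we find that
\[ \begin{aligned} E^e_{s+1}(w_{(m+1)})(\xi) \le {} & E^e_{s+1}(w_{(m+1)})(0) + c\mu \sup_{\tau \in [0, e^{c^\star/2\mu}]} E^e_{s+1}(w_{(m+1)})(\tau) \\ & + c\mu e^{-c^\star/2\mu} \sup_{\tau \in [0, e^{c^\star/2\mu}]} E^e_{s+1}(w_{(m+1)})^{1/2}(\tau) \\ & + c\mu^3 e^{-c^\star/2\mu} \int_0^{e^{c^\star/2\mu}} e^{-\theta\mu^2\tau}\, d\tau \sup_{\tau \in [0, e^{c^\star/2\mu}]} E^e_{s+1}(w_{(m+1)})^{1/2}(\tau) \\ & + c\mu^2 \int_0^{e^{c^\star/2\mu}} e^{-\theta\mu^2\tau} E^e_{s+1}(w_{(m+1)})(\tau)\, d\tau \end{aligned} \tag{92} \]
for $\xi \in [0, e^{c^\star/2\mu}]$. Observe that
\[ \begin{aligned} \mu^2 \int_0^{e^{c^\star/2\mu}} e^{-\mu^2\theta\tau} E^e_{s+1}(w_{(m+1)})(\tau)\, d\tau & = \mu^2 \int_0^{\xi^\star} e^{-\mu^2\theta\tau} E^e_{s+1}(w_{(m+1)})(\tau)\, d\tau + \mu^2 \int_{\xi^\star}^{e^{c^\star/2\mu}} e^{-\mu^2\theta\tau} E^e_{s+1}(w_{(m+1)})(\tau)\, d\tau \\ & \le \mu^2\xi^\star \sup_{\tau \in [0, \xi^\star]} E^e_{s+1}(w_{(m+1)})(\tau) + \mu^2 \int_{\xi^\star}^\infty e^{-\mu^2\theta\tau}\, d\tau \sup_{\tau \in [0, e^{c^\star/2\mu}]} E^e_{s+1}(w_{(m+1)})(\tau) \\ & = \mu^2\xi^\star \sup_{\tau \in [0, \xi^\star]} E^e_{s+1}(w_{(m+1)})(\tau) + \frac{e^{-\theta\mu^2\xi^\star}}{\theta} \sup_{\tau \in [0, e^{c^\star/2\mu}]} E^e_{s+1}(w_{(m+1)})(\tau) \\ & \le c\mu e^{-c^\star/\mu}|\log\mu|^2 + c\mu^\alpha \sup_{\tau \in [0, e^{c^\star/2\mu}]} E^e_{s+1}(w_{(m+1)})(\tau), \end{aligned} \]
in which the last step follows by (91) and the definition of $\xi^\star$. Inserting this estimate and the inequality
\[ \int_0^{e^{c^\star/2\mu}} e^{-\theta\mu^2\tau}\, d\tau \le \int_0^\infty e^{-\theta\mu^2\tau}\, d\tau = \frac{1}{\theta\mu^2} \]
into (92), one concludes that
\[ E^e_{s+1}(w_{(m+1)})(\xi) \le c\mu|\log\mu|^2 e^{-c^\star/\mu} + c(\mu + \mu^\alpha) \sup_{\xi \in [0, e^{c^\star/2\mu}]} E^e_{s+1}(w_{(m+1)})(\xi) \]
for $\xi \in [0, e^{c^\star/2\mu}]$, so that
\[ E^e_{s+1}(w_{(m+1)})(\xi) \le c\mu|\log\mu|^2 e^{-c^\star/\mu}, \quad \xi \in [0, e^{c^\star/2\mu}]. \]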
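The short-interval step in the proof above rests on two elementary identities: the decaying coefficient $e^{-\mu^2\theta\xi}$ has dropped to $\mu^\alpha$ at the end of the short interval, and with $\alpha = \theta/c_1$ the Gronwall factor $e^{c_1\mu^2\xi}$ has grown only to $1/\mu$ there. The following Python sketch checks this arithmetic numerically. It is an illustration under stated assumptions: the function name is hypothetical, and the values of $\mu$, $\theta$ and $c_1$ in the usage note are arbitrary, not constants from the paper.

```python
import math


def short_interval(mu: float, theta: float, c1: float):
    """Return (alpha, xi_end, decay, growth, tail) for the short-interval argument.

    alpha  = theta / c1
    xi_end = alpha*|log mu| / (theta*mu**2), the endpoint of the short interval
    decay  = exp(-mu**2 * theta * xi_end), expected to equal mu**alpha
    growth = exp(c1 * mu**2 * xi_end), expected to equal 1/mu
    tail   = mu**2 * integral of exp(-mu**2*theta*t) over [xi_end, infinity)
    """
    alpha = theta / c1
    xi_end = alpha * abs(math.log(mu)) / (theta * mu ** 2)
    decay = math.exp(-(mu ** 2) * theta * xi_end)
    growth = math.exp(c1 * mu ** 2 * xi_end)
    tail = decay / theta  # closed form of the tail integral times mu**2
    return alpha, xi_end, decay, growth, tail
```

For example, `short_interval(0.1, 2.0, 4.0)` gives `alpha = 0.5`, a decay factor close to `0.1 ** 0.5` and a growth factor close to `10.0`.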

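Lemma 6. Suppose that $|Z^0|$, $|r_1^0|$, $|r_2^0|$, $\|w^0\|_{X_c^{s+2}} \le \mu e^{-c^\star/2\mu}$. The iterates $Z_{(m)}$, $r_{(m)}$ and $w_{(m)}$ satisfy
\[ \sup_{\xi \in [0, e^{c^\star/2\mu}]} \big\{ |\tilde Z_{(m+1)}(\xi)| + |\tilde r_{(m+1)}(\xi)| + \|\tilde w_{(m+1)}(\xi)\|_{X_c^{s+1}} \big\} \le \frac{1}{2} \sup_{\xi \in [0, e^{c^\star/2\mu}]} \big\{ |\tilde Z_{(m)}(\xi)| + |\tilde r_{(m)}(\xi)| + \|\tilde w_{(m)}(\xi)\|_{X_c^{s+1}} \big\} \]
for each $m \in \mathbb{N}_0$, where $\tilde Z_{(m+1)} = Z_{(m+1)} - Z_{(m)}$, $\tilde r_{(m+1)} = r_{(m+1)} - r_{(m)}$, $\tilde w_{(m+1)} = w_{(m+1)} - w_{(m)}$.

Proof. Examining the equations
\[ \begin{aligned} \tilde Z_{(m+1)}(\xi) = {} & \int_0^\xi \langle (G_{1(m)}^\mu + G_{2(m)}^\mu - G_{1(m-1)}^\mu - G_{2(m-1)}^\mu)(\tau), s_0^* \rangle e^{\lambda_{0,\mu}\tau}\, d\tau\, s_0 e^{-\lambda_{0,\mu}\xi} \\ & - \int_\xi^{e^{c^\star/2\mu}} \langle (G_{1(m)}^\mu + G_{2(m)}^\mu - G_{1(m-1)}^\mu - G_{2(m-1)}^\mu)(\tau), u_0^* \rangle e^{-\lambda_{0,\mu}\tau}\, d\tau\, u_0 e^{\lambda_{0,\mu}\xi}, \end{aligned} \]
\[ \begin{aligned} \tilde r_{(m+1)}(\xi) = {} & \sum_{j=1}^2 \int_0^\xi \langle (F_{3(m)}^\mu + F_{4(m)}^\mu - F_{3(m-1)}^\mu - F_{4(m-1)}^\mu)(\tau), s_{1,j}^*(\tau) \rangle\, d\tau\, s_{1,j}(\xi) \\ & - \sum_{j=1}^2 \int_\xi^{e^{c^\star/2\mu}} \langle (F_{3(m)}^\mu + F_{4(m)}^\mu - F_{3(m-1)}^\mu - F_{4(m-1)}^\mu)(\tau), u_{1,j}^*(\tau) \rangle\, d\tau\, u_{1,j}(\xi), \end{aligned} \]
one finds that
\[ \begin{aligned} |\tilde Z_{(m+1)}(\xi)| & \le c\left( \mu^3 \int_0^\xi (|\tilde Z_{(m)}(\tau)| + |\tilde r_{(m)}(\tau)| + \|\tilde w_{(m)}(\tau)\|_{X_c^{s+1}}) e^{\lambda_{0,\mu}\tau}\, d\tau\, e^{-\lambda_{0,\mu}\xi} \right. \\ & \qquad \left. + \mu^3 \int_\xi^{e^{c^\star/2\mu}} (|\tilde Z_{(m)}(\tau)| + |\tilde r_{(m)}(\tau)| + \|\tilde w_{(m)}(\tau)\|_{X_c^{s+1}}) e^{-\lambda_{0,\mu}\tau}\, d\tau\, e^{\lambda_{0,\mu}\xi} \right) \\ & \le c\mu^3 \sup_{\xi \in [0, e^{c^\star/2\mu}]} \big\{ |\tilde Z_{(m)}(\xi)| + |\tilde r_{(m)}(\xi)| + \|\tilde w_{(m)}(\xi)\|_{X_c^{s+1}} \big\}, \end{aligned} \]
\[ \begin{aligned} |\tilde r_{(m+1)}(\xi)| & \le c\left( \frac{\mu^5}{\lambda_{1,\mu}} \int_0^\xi (|\tilde Z_{(m)}(\tau)| + |\tilde r_{(m)}(\tau)| + \|\tilde w_{(m)}(\tau)\|_{X_c^{s+1}}) e^{\lambda_{1,\mu}\tau}\, d\tau\, e^{-\lambda_{1,\mu}\xi} \right. \\ & \qquad \left. + \frac{\mu^5}{\lambda_{1,\mu}} \int_\xi^{e^{c^\star/2\mu}} (|\tilde Z_{(m)}(\tau)| + |\tilde r_{(m)}(\tau)| + \|\tilde w_{(m)}(\tau)\|_{X_c^{s+1}}) e^{-\lambda_{1,\mu}\tau}\, d\tau\, e^{\lambda_{1,\mu}\xi} \right) \\ & \le c\frac{\mu^5}{\lambda_{1,\mu}^2} \sup_{\xi \in [0, e^{c^\star/2\mu}]} \big\{ |\tilde Z_{(m)}(\xi)| + |\tilde r_{(m)}(\xi)| + \|\tilde w_{(m)}(\xi)\|_{X_c^{s+1}} \big\} \end{aligned} \]
for $\xi \in [0, e^{c^\star/2\mu}]$, in which the estimates
\[ |G_{1(m)}^\mu - G_{1(m-1)}^\mu| \le c\mu^3 (E_{s+1}(\tilde w_{(m)})^{1/2} + |\tilde Z_{(m)}| + |\tilde r_{(m)}|), \]
\[ |G_{2(m)}^\mu - G_{2(m-1)}^\mu| \le c\mu e^{-c^\star/\mu} (E_{s+1}(\tilde w_{(m)})^{1/2} + |\tilde Z_{(m)}| + |\tilde r_{(m)}|), \]
\[ |F_{3(m)}^\mu - F_{3(m-1)}^\mu| \le e^{-c^\star/2\mu} |\tilde r_{(m)}|, \]
\[ |F_{4(m)}^\mu - F_{4(m-1)}^\mu| \le \mu^5 (E_{s+1}(\tilde w_{(m)})^{1/2} + |\tilde Z_{(m)}| + |\tilde r_{(m)}|) \]
have been used (see Proposition 11). Similarly, the estimate for $\tilde w_{(m+1)}$ is obtained by studying the equations
\[ \partial_\xi \tilde w_{1(m+1)} = \tilde w_{2(m+1)} + \hat g_{6(m)}^\mu - \hat g_{6(m-1)}^\mu + \hat h_{1(m)}^\mu - \hat h_{1(m-1)}^\mu, \tag{93} \]
\[ \begin{aligned} \partial_\xi \tilde w_{2(m+1)} = {} & -c_3^\mu k_0^2 \partial_\eta^2 \tilde w_{1(m+1)} - c_4^\mu \tilde w_{1(m+1)} + \mu P_c(d_{1(m)} \tilde w_{1(m+1)}) \\ & + \mu P_c(d_{2(m)} \partial_\eta \tilde w_{1(m+1)}) + \mu P_c(d_{3(m)} \tilde w_{2(m+1)}) + P_c(\hat g_{3(m)}^\mu \partial_\eta^2 \tilde w_{1(m+1)}) \\ & + P_c(\hat g_{5(m)}^\mu \partial_\eta \tilde w_{2(m+1)}) + \mu P_c((d_{1(m)} - d_{1(m-1)}) w_{1(m)}) \\ & + \mu P_c((d_{2(m)} - d_{2(m-1)}) \partial_\eta w_{1(m)}) + \mu P_c((d_{3(m)} - d_{3(m-1)}) w_{2(m)}) \\ & + P_c((\hat g_{3(m)}^\mu - \hat g_{3(m-1)}^\mu) \partial_\eta^2 w_{1(m)}) + P_c((\hat g_{5(m)}^\mu - \hat g_{5(m-1)}^\mu) \partial_\eta w_{2(m)}) \\ & + \hat g_{4(m)}^\mu - \hat g_{4(m-1)}^\mu + \hat h_{2(m)}^\mu - \hat h_{2(m-1)}^\mu, \end{aligned} \tag{94} \]
and using the additional estimates
\[ \|d_{j(m)} - d_{j(m-1)}\|_{s+1} \le c|\tilde r_{(m)}|, \quad j = 1, 2, 3, \]
\[ \|\hat g_{4(m)}^\mu - \hat g_{4(m-1)}^\mu\|_{s+1} \le c\mu(e^{-c^\star/2\mu} + \mu^2 e^{-\mu^2\theta\xi}) (E_{s+1}(\tilde w_{(m)})^{1/2} + |\tilde Z_{(m)}| + |\tilde r_{(m)}|), \]
\[ \|\hat g_{6(m)}^\mu - \hat g_{6(m-1)}^\mu\|_{s+2} \le c\mu(e^{-c^\star/2\mu} + \mu^2 e^{-\mu^2\theta\xi}) (E_{s+1}(\tilde w_{(m)})^{1/2} + |\tilde Z_{(m)}| + |\tilde r_{(m)}|), \]
\[ \|\hat g_{j(m)}^\mu - \hat g_{j(m-1)}^\mu\|_{s+1} \le c\mu (E_{s+1}(\tilde w_{(m)})^{1/2} + |\tilde Z_{(m)}| + |\tilde r_{(m)}|), \quad j = 3, 5, \]
\[ \|\hat h_{(m)}^\mu - \hat h_{(m-1)}^\mu\|_{X_c^{s+1}} \le c\mu e^{-c^\star/\mu} |\tilde r_{(m)}|, \]
which are obtained from Proposition 11(iii). We apply the operator $\partial_\eta^{s+1}\tilde w_{2(m+1)}\,\partial_\eta^{s+1}$ to (94), integrate with respect to $\eta$ over one period and use the estimation techniques developed in the previous lemma; the result is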


e ∂ξ Es+1 (w˜ (m+1) )

≤ cµ(e−c

/2µ

+ cµ(e

+ µe−µ

2 θξ

−c /2µ

e )Es+1 (w˜ (m+1) )

+ µ2 e−µ

2 θξ

e )Es+1 (w˜ (m+1) )1/2 (Es+1 (w˜ (m) )1/2 + | Z˜ (m) |2 + |˜r(m) |), (95)

where we have used the further calculations "% " " " " ∂ s+1 w˜ 2(m+1) ∂ s+1 Pc ((d1(m) − d1(m−1) )w1(m) ) dη" η η " " ≤ Es+1 (w˜ (m+1) )1/2 Es+1 (w(m) )1/2 d1(m) − d1(m−1) s+1

≤ ce−c /2µ Es+1 (w˜ (m+1) )|˜r(m) |, "% " " " 2 " ∂ s+1 w˜ 2(m+1) ∂ s+1 Pc ((gˆ µ − gˆ µ " η η 3(m) 3(m−1) )∂η w1(m) ) dη" " µ

µ

≤ Es+1 (w˜ (m+1) )1/2 Es+2 (w(m) )1/2 gˆ 3(m) − gˆ 3(m−1) s+1 ≤ cµe−c

/2µ

Es+1 (w˜ (m+1) )(Es+1 (w˜ (m) )1/2 + | Z˜ (m) | + |˜r(m) |).

(The estimate Es+2 (w(m) ) ≤ e−c /µ is obtained by repeating Lemma 5 with s replaced by s + 1 and requires the stronger condition w0 Xcs+2 ≤ µec /2µ ). Using the two-step method in the proof of the previous lemma, one finds from (95) that sup τ ∈[0,ec

/2µ

Es+1 (w˜ (m+1) (τ ))

]

≤ cµ| log µ|2

sup

τ ∈[0,ec

/2µ

]

Es+1 (w˜ (m) (τ )) + | Z˜ (m) | + |˜r(m) (τ )|2 .

The following convergence result is a direct consequence of the above lemmata.

Theorem 2. For each $Z^0$, $r_1^0$, $r_2^0$ and $w^0$ with
\[ |Z^0| \le \mu e^{-c^\star/2\mu}, \quad |r_1^0| \le \mu e^{-c^\star/2\mu}, \quad |r_2^0| \le \mu e^{-c^\star/2\mu}, \quad \|w^0\|_{X_c^{s+2}} \le \mu e^{-c^\star/2\mu} \]
the sequence $(Z_{(m)}, r_{(m)}, w_{(m)})_{m \in \mathbb{N}_0}$ converges in $C([0, e^{c^\star/2\mu}], X^{s+1})$ to a limit $(Z^\star, r^\star, w^\star)$ which satisfies the estimate
\[ \sup_{\xi \in [0, e^{c^\star/2\mu}]} \|(Z^\star(\xi), r^\star(\xi), w^\star(\xi))\|_{X^{s+1}} \le e^{-c^\star/2\mu} \]
and solves Eqs. (79)–(82).

We now use the above results to define a local centre-stable manifold at time $\xi = 0$ for the nonautonomous equations (79)–(82). According to Lemmata 5 and 6 the solutions defining this manifold are available under the hypothesis that $\|w^0\|_{X_c^{s+2}} \le \mu e^{-c^\star/2\mu}$; to ensure its differentiability one however requires the stronger hypothesis that $\|w^0\|_{X_c^{s+4}} \le \mu e^{-c^\star/2\mu}$ (see Sect. 6 below), and we therefore make this hypothesis from the outset.
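The mechanism behind this convergence result is the standard one for contractions: if successive differences of the iterates shrink by a factor of at most $1/2$, as in Lemma 6, the sequence is Cauchy and its limit is a fixed point of the iteration map. The Python sketch below illustrates this on a scalar toy map and is not the iteration scheme (83)–(86) itself: the map `T` and the helper names are hypothetical, chosen only because `T` has Lipschitz constant $1/2$.

```python
import math


def T(x: float) -> float:
    """Toy iteration map with Lipschitz constant 1/2 (|T(a) - T(b)| <= |a - b|/2)."""
    return 0.5 * math.cos(x)


def iterate(step, x0: float, n: int) -> list:
    """Return the iterates x_0, x_1, ..., x_n of x_{m+1} = step(x_m)."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1]))
    return xs


def successive_differences(xs: list) -> list:
    """Return |x_{m+1} - x_m| for consecutive iterates."""
    return [abs(b - a) for a, b in zip(xs, xs[1:])]
```

Starting from `x0 = 0.0` (the analogue of setting the zeroth iterate to zero), `successive_differences(iterate(T, 0.0, 40))` decreases at least geometrically with ratio $1/2$, and the final iterate satisfies `x == T(x)` up to rounding.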

Definition 1. The set of points
\[ W^{cs}_{loc} = \bigcup \{(Z^\star(0), r^\star(0), w^\star(0))\}, \]
in which the union is taken over the set of $Z^0$, $r_1^0$, $r_2^0$ and $w^0$ such that
\[ |Z^0| \le \mu e^{-c^\star/2\mu}, \quad |r_1^0| \le \mu e^{-c^\star/2\mu}, \quad |r_2^0| \le \mu e^{-c^\star/2\mu}, \quad \|w^0\|_{X_c^{s+4}} \le \mu e^{-c^\star/2\mu}, \]
is called the local centre-stable manifold for solutions to (79)–(82) at time $\xi = 0$.

6. Existence Theory for Symmetric Modulating Pulses

In this section we identify solutions $(Z^\star, r^\star, w^\star)$ to Eqs. (79)–(82) on the interval $[0, e^{c^\star/2\mu}]$ whose initial data $(Z^\star(0), r^\star(0), w^\star(0))$ lies on $W^{cs}_{\mathrm{loc}}$ and which can be extended to solutions that remain $O(e^{-c^\star/2\mu})$ on $[-e^{c^\star/2\mu}, e^{c^\star/2\mu}]$. The idea is to exploit the reversibility of Eqs. (79)–(82) (see Sect. 2); in particular, solutions with the property that $(Z^\star(0), r^\star(0), w^\star(0))$ lies on the symmetric section
\[
\Sigma := \mathrm{Fix}\, S = X^{s+1} \cap \{(v_{1,o}, v_{2,e}) = (0,0)\}
\]
can be extended to symmetric solutions on $[-e^{c^\star/2\mu}, e^{c^\star/2\mu}]$. Because $w^\star(0) = w^0$, $s_{1,j}(0) \in \mathrm{Fix}\, S$, $u_{1,j}(0) \in \mathrm{Fix}(-S)$ we find that $w^\star(0) \in \Sigma_c := P_c\Sigma$ whenever $w^0 \in \Sigma_c$ and $r^\star(0) \in \Sigma_{wh} := P_{wh}\Sigma$ whenever $r_1^0, r_2^0 = 0$. Our task therefore reduces to that of finding a criterion on $Z^0$ which guarantees that $Z^\star(0) \in \Sigma_{sh} := P_{sh}\Sigma$.

Our first step is to introduce an artificial parameter $\rho$ by replacing $\hat h^\mu$ in Eqs. (81), (82) with $\rho\hat h^\mu$ and replacing Eqs. (79), (80) with
\[
\begin{aligned}
\partial_\xi Z &= L_0^\mu Z + \tilde G_1^\mu(Z, p^\mu + r, w) + \rho\, dG_1^\mu[0, p^\mu, 0](Z, r, w) + \rho\, G_2^\mu(p^\mu + r), \\
\partial_\xi r &= L_1^\mu r + F_3^\mu(r) + \tilde F_4^\mu(Z, r, w) + \rho\, dF_4^\mu[0,0,0](Z, r, w),
\end{aligned}
\]
where
\[
\begin{aligned}
\tilde G_1^\mu(Z, p^\mu + r, w) &= G_1^\mu(Z, p^\mu + r, w) - dG_1^\mu[0, p^\mu, 0](Z, r, w), \\
\tilde F_4^\mu(Z, r, w) &= F_4^\mu(Z, r, w) - dF_4^\mu[0,0,0](Z, r, w).
\end{aligned} \tag{96}
\]
The construction of $W^{cs}_{\mathrm{loc}}$ undertaken in Sect. 5 above clearly remains valid for all values of $\rho \in [0,1]$, the choice $\rho = 1$ corresponding to the original equations. On the other hand $\rho = 0$ yields a system for which $\{(Z, w) = (0,0)\}$ is an invariant subspace, the flow in which coincides with that studied in Sect. 3; in particular it admits the homoclinic solution $p^\mu$ (generated by the solution $(Z, r, w) = (0,0,0)$). It also has the advantage of being linearly 'diagonal' in the sense that $\tilde G_1^\mu(Z, p^\mu + r, w)$ and $\tilde F_4^\mu(Z, r, w)$ contain no terms which are linear in $(Z, r, w)$.

We consider a solution $(Z^\star, r^\star, w^\star)$ with $(Z^\star(0), r^\star(0), w^\star(0)) \in W^{cs}_{\mathrm{loc}}$ as a function of $Z^0$, $r_1^0$, $r_2^0$ which depends upon $\rho \in \mathbb{R}$ and $w^0 \in \Sigma_c$ as parameters (with $\rho \in [0,1]$, $\|w^0\|_{X_c^{s+4}} \le \mu e^{-c^\star/2\mu}$) and write $(Z^\star, r^\star, w^\star)$ as $(Z_{\rho,w^0}, r_{\rho,w^0}, w_{\rho,w^0})(Z^0, r_1^0, r_2^0)$ in the following analysis. Notice that $(Z_{\rho,w^0}, r_{\rho,w^0}, w_{\rho,w^0})(Z^0, 0, 0)|_{\xi=0} \in \Sigma$ whenever $Z^0$ is a solution of the equation
\[
J_{\rho,w^0}(Z^0) = 0, \tag{97}
\]


where $J_{\rho,w^0} : \bar B_{\mu e^{-c^\star/2\mu}}(0) \subset \mathbb{R} \to (I - S_{sh})X_{sh}^{s+1}$ is defined by
\[
J_{\rho,w^0}(Z^0) = (I - S_{sh}) Z_{\rho,w^0}(Z^0, 0, 0)|_{\xi=0}.
\]
Equation (97) has the solution $Z^0 = 0$ at $(\rho, w^0) = (0,0)$ since the unique solution of (79)–(82) with $(\rho, w^0) = (0,0)$ is $(Z, r, w) = (0,0,0)$. We therefore seek a solution of (79)–(82) near this known solution for parameter values $(\rho, w^0)$ near $(0,0)$, and it seems natural to apply the implicit-function theorem; notice, however, that we are forced to work from first principles (by applying the contraction mapping principle) since we require precise information concerning the parameter-dependence of the solutions, in particular that the solution exists for values of $\rho$ up to one.

In order to carry out the above programme it is necessary to show that $J_{\rho,w^0}$ is differentiable with respect to $Z^0$ and obtain some estimates on its derivative. We therefore need to show that the solutions $(Z_{\rho,w^0}, r_{\rho,w^0}, w_{\rho,w^0})$ described above are differentiable with respect to $Z^0$ and obtain some estimates on their derivatives. To this end we formally differentiate Eqs. (83)–(86) with respect to $Z^0$ and use a dot to denote $\partial_{Z^0}$; we treat the resulting linear equations for $\dot Z$, $\dot r$, $\dot w$ with the iteration scheme

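\[
\begin{aligned}
\dot Z_{(m+1)}(\xi) ={}& \tilde Z^0 s_0 e^{-\lambda_{0,\mu}\xi} \\
&+ \int_0^\xi \bigl\langle \bigl( d\tilde G_1^\mu[Z, p^\mu + r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) + \rho\, dG_1^\mu[0, p^\mu + r, 0](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) \\
&\qquad\qquad + \rho\, dG_2^\mu[p^\mu + r](\dot r_{(m)}) \bigr)(\tau),\, s_0^* \bigr\rangle e^{\lambda_{0,\mu}\tau}\, d\tau \; s_0 e^{-\lambda_{0,\mu}\xi} \\
&- \int_\xi^{e^{c^\star/2\mu}} \bigl\langle \bigl( d\tilde G_1^\mu[Z, p^\mu + r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) + \rho\, dG_1^\mu[0, p^\mu + r, 0](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) \\
&\qquad\qquad + \rho\, dG_2^\mu[p^\mu + r](\dot r_{(m)}) \bigr)(\tau),\, u_0^* \bigr\rangle e^{-\lambda_{0,\mu}\tau}\, d\tau \; u_0 e^{\lambda_{0,\mu}\xi},
\end{aligned} \tag{98}
\]
\[
\begin{aligned}
\dot r_{(m+1)}(\xi) ={}& \sum_{j=1}^{2} \int_0^\xi \bigl\langle \bigl( dF_3^\mu[r](\dot r_{(m)}) + d\tilde F_4^\mu[Z, r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) \\
&\qquad\qquad + \rho\, dF_4^\mu[0,0,0](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) \bigr)(\tau),\, s_{1,j}^*(\tau) \bigr\rangle\, d\tau \; s_{1,j}(\xi) \\
&- \sum_{j=1}^{2} \int_\xi^{e^{c^\star/2\mu}} \bigl\langle \bigl( dF_3^\mu[r](\dot r_{(m)}) + d\tilde F_4^\mu[Z, r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) \\
&\qquad\qquad + \rho\, dF_4^\mu[0,0,0](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) \bigr)(\tau),\, u_{1,j}^*(\tau) \bigr\rangle\, d\tau \; u_{1,j}(\xi),
\end{aligned} \tag{99}
\]
\[
\partial_\xi \dot w_{1(m+1)} = \dot w_{2(m+1)} + d\hat g_6^\mu[Z, p^\mu + r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) + \rho\, d\hat h_1^\mu[p^\mu + r](\dot r_{(m)}), \tag{100}
\]
\[
\begin{aligned}
\partial_\xi \dot w_{2(m+1)} ={}& -c_3 k_0^2 \partial_\eta^2 \dot w_{1(m+1)} - c_4 \dot w_{1(m+1)} + \mu P_c\bigl(d_1(p_1^\mu + r_1)\dot w_{1(m+1)}\bigr) \\
&+ \mu P_c\bigl(d_2(p_1^\mu + r_1)\partial_\eta \dot w_{1(m+1)}\bigr) + \mu P_c\bigl(d_3(p_1^\mu + r_1)\dot w_{2(m+1)}\bigr) \\
&+ \mu P_c\bigl(d_1(\dot r_{1(m)})w_1\bigr) + \mu P_c\bigl(d_2(\dot r_{1(m)})\partial_\eta w_1\bigr) + \mu P_c\bigl(d_3(\dot r_{1(m)})w_2\bigr) \\
&+ P_c\bigl(\hat g_3^\mu(Z, p^\mu + r, w)\partial_\eta^2 \dot w_{1(m+1)}\bigr) \\
&+ P_c\bigl(d\hat g_3^\mu[Z, p^\mu + r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\partial_\eta^2 w_1\bigr) \\
&+ P_c\bigl(\hat g_5^\mu(Z, p^\mu + r, w)\partial_\eta \dot w_{2(m+1)}\bigr) \\
&+ P_c\bigl(d\hat g_5^\mu[Z, p^\mu + r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\partial_\eta w_2\bigr) \\
&+ d\hat g_4^\mu[Z, p^\mu + r, w](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) + \rho\, d\hat h_2^\mu[p^\mu + r](\dot r_{(m)}).
\end{aligned} \tag{101}
\]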

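Let us now choose $Z$, $r$, $w$ which satisfy $|Z(\xi)|, |r(\xi)|, \|w(\xi)\|_{X_c^{s+3}} \le e^{-c^\star/2\mu}$ for $\xi \in [0, e^{c^\star/2\mu}]$, take $\dot Z_{(0)} = 0$, $\dot r_{(0)} = 0$, $\dot w_{(0)} = 0$, and for $m = 0, 1, 2, \ldots$ define $\dot Z_{(m+1)} \in C([0, e^{c^\star/2\mu}], \mathbb{R}^2)$, $\dot r_{(m+1)} \in C([0, e^{c^\star/2\mu}], \mathbb{R}^4)$ by formulae (98), (99) and let $\dot w_{(m+1)} \in C([0, e^{c^\star/2\mu}], X_c^{s+1})$ be the solution of (100), (101) with initial data $(\dot w_1, \dot w_2)|_{\xi=0} = (0,0)$.

Lemma 7. (i) The estimate
\[
\sup_{\xi\in[0,e^{c^\star/2\mu}]} \bigl( |\bar Z_{(m+1)}(\xi)| + |\bar r_{(m+1)}(\xi)| + \|\bar w_{(m+1)}(\xi)\|_{X_c^{s+1}} \bigr)
\le \frac12 \sup_{\xi\in[0,e^{c^\star/2\mu}]} \bigl( |\bar Z_{(m)}(\xi)| + |\bar r_{(m)}(\xi)| + \|\bar w_{(m)}(\xi)\|_{X_{sh,c}^{s+1}} \bigr), \quad m \in \mathbb{N},
\]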

holds uniformly over the set of $(Z, r, w)$ which satisfy $|Z(\xi)|, |r(\xi)|, \|w(\xi)\|_{X_c^{s+2}} \le e^{-c^\star/2\mu}$ for $\xi \in [0, e^{c^\star/2\mu}]$, where $\bar Z_{(m+1)} = \dot Z_{(m+1)} - \dot Z_{(m)}$, $\bar r_{(m+1)} = \dot r_{(m+1)} - \dot r_{(m)}$, $\bar w_{(m+1)} = \dot w_{(m+1)} - \dot w_{(m)}$. Under these hypotheses the sequence $\{(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\}$ is bounded independently of $(Z, r, w)$ in $C([0, e^{c^\star/2\mu}], X^{s+1})$.

(ii) Suppose additionally that
\[
\|w(\xi)\|_{X_c^{s+3}} \le e^{-c^\star/2\mu}, \quad \xi \in [0, e^{c^\star/2\mu}].
\]
For each fixed value of $(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})$ the iterate $(\dot Z_{(m+1)}, \dot r_{(m+1)}, \dot w_{(m+1)}) \in C([0, e^{c^\star/2\mu}], X^{s+1})$ depends in a Lipschitz-continuous fashion upon $(Z, r, w) \in C([0, e^{c^\star/2\mu}], X^{s+2})$; the Lipschitz constant is an affine function of the norm $\|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\|_{C([0,e^{c^\star/2\mu}],X^{s+1})}$.

Proof. Equations for the difference $(\bar Z_{(m)}, \bar r_{(m)}, \bar w_{(m)})$ are obtained by replacing $(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})$ by $(\bar Z_{(m)}, \bar r_{(m)}, \bar w_{(m)})$ and $\tilde Z^0$ by zero in Eqs. (98)–(101). Observe that the equations for $(\bar Z_{(m)}, \bar r_{(m)}, \bar w_{(m)})$ are transformed into those for $(\tilde Z_{(m)}, \tilde r_{(m)}, \tilde w_{(m)})$ examined in Lemma 6 by replacing derivatives such as $d\tilde G_1^\mu[Z, p^\mu + r, w](\bar Z_{(m)}, \bar r_{(m)}, \bar w_{(m)})$ with differences such as $|\tilde G^\mu_{1(m)} - \tilde G^\mu_{1(m-1)}|$, where the derivatives obey the same estimates as the differences. The first assertion in part (i) therefore follows from the conclusion of Lemma 6; the second assertion is a consequence of the first together with the linearity of the right-hand sides of Eqs. (98)–(101) in $(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})$.


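Turning to part (ii), note that
\[
\begin{aligned}
\hat Z(\xi) ={}& \int_0^\xi \bigl\langle (d\tilde G^\mu_{1,1} - d\tilde G^\mu_{1,2} + \rho\, dG^\mu_{2,1} - \rho\, dG^\mu_{2,2})(\tau),\, s_0^* \bigr\rangle e^{\lambda_{0,\mu}\tau}\, d\tau \; s_0 e^{-\lambda_{0,\mu}\xi} \\
&- \int_\xi^{e^{c^\star/2\mu}} \bigl\langle (d\tilde G^\mu_{1,1} - d\tilde G^\mu_{1,2} + \rho\, dG^\mu_{2,1} - \rho\, dG^\mu_{2,2})(\tau),\, u_0^* \bigr\rangle e^{-\lambda_{0,\mu}\tau}\, d\tau \; u_0 e^{\lambda_{0,\mu}\xi},
\end{aligned}
\]
\[
\begin{aligned}
\hat r(\xi) = \sum_{j=1}^{2} \biggl( & \int_0^\xi \bigl\langle (dF^\mu_{3,1} - dF^\mu_{3,2} + d\tilde F^\mu_{4,1} - d\tilde F^\mu_{4,2})(\tau),\, s_{1,j}^*(\tau) \bigr\rangle\, d\tau \; s_{1,j}(\xi) \\
& - \int_\xi^{e^{c^\star/2\mu}} \bigl\langle (dF^\mu_{3,1} - dF^\mu_{3,2} + d\tilde F^\mu_{4,1} - d\tilde F^\mu_{4,2})(\tau),\, u_{1,j}^*(\tau) \bigr\rangle\, d\tau \; u_{1,j}(\xi) \biggr),
\end{aligned}
\]
\[
\partial_\xi \hat w_{1(m+1)} = \hat w_{2(m+1)} + d\hat g^\mu_{6,1} - d\hat g^\mu_{6,2} + \rho\, d\hat h^\mu_{1,1} - \rho\, d\hat h^\mu_{1,2},
\]
\[
\begin{aligned}
\partial_\xi \hat w_{2(m+1)} ={}& -c_3 k_0^2 \partial_\eta^2 \hat w_{1(m+1)} - c_4 \hat w_{1(m+1)} + \mu P_c(d_{1,1} \hat w_{1(m+1)}) + \mu P_c(d_{2,1} \partial_\eta \hat w_{1(m+1)}) + \mu P_c(d_{3,1} \hat w_{2(m+1)}) \\
&+ \mu P_c\bigl((d_{1,1} - d_{1,2}) \dot w^2_{1(m+1)}\bigr) + \mu P_c\bigl((d_{2,1} - d_{2,2}) \partial_\eta \dot w^2_{1(m+1)}\bigr) + \mu P_c\bigl((d_{3,1} - d_{3,2}) \dot w^2_{2(m+1)}\bigr) \\
&+ \mu P_c\bigl(d_1(\dot r_{1(m)}) \tilde w_1\bigr) + \mu P_c\bigl(d_2(\dot r_{1(m)}) \partial_\eta \tilde w_1\bigr) + \mu P_c\bigl(d_3(\dot r_{1(m)}) \tilde w_2\bigr) \\
&+ P_c(\hat g^\mu_{3,1} \partial_\eta^2 \hat w_{1(m+1)}) + P_c(\hat g^\mu_{5,1} \partial_\eta \hat w_{2(m+1)}) \\
&+ P_c\bigl((\hat g^\mu_{3,1} - \hat g^\mu_{3,2}) \partial_\eta^2 \dot w^2_{1(m+1)}\bigr) + P_c\bigl((\hat g^\mu_{5,1} - \hat g^\mu_{5,2}) \partial_\eta \dot w^2_{2(m+1)}\bigr) \\
&+ P_c\bigl((d\hat g^\mu_{3,1} - d\hat g^\mu_{3,2}) \partial_\eta^2 w^2_1\bigr) + P_c\bigl((d\hat g^\mu_{5,1} - d\hat g^\mu_{5,2}) \partial_\eta w^2_2\bigr) \\
&+ P_c(d\hat g^\mu_{3,1} \partial_\eta^2 \tilde w_1) + P_c(d\hat g^\mu_{5,1} \partial_\eta \tilde w_2) + d\hat g^\mu_{4,1} - d\hat g^\mu_{4,2} + \rho\, d\hat h^\mu_{2,1} - \rho\, d\hat h^\mu_{2,2},
\end{aligned}
\]
in which $(\hat Z, \hat r, \hat w)$ denotes the difference between the values $(\dot Z^1_{(m+1)}, \dot r^1_{(m+1)}, \dot w^1_{(m+1)})$ and $(\dot Z^2_{(m+1)}, \dot r^2_{(m+1)}, \dot w^2_{(m+1)})$ of the iterate $(\dot Z_{(m+1)}, \dot r_{(m+1)}, \dot w_{(m+1)})$ for $(Z, r, w) = (Z^1, r^1, w^1)$ and $(Z, r, w) = (Z^2, r^2, w^2)$, $d\tilde G^\mu_{1,j}$ is employed as an abbreviation for $d\tilde G_1^\mu[Z^j, r^j, w^j](\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})$, $j = 1, 2$ (similar abbreviations are used for the other functions) and $(\tilde Z, \tilde r, \tilde w) = (Z^1, r^1, w^1) - (Z^2, r^2, w^2)$. Employing the symbol $c_\mu$ to denote a constant which depends upon $\mu$ and estimating
\[
\|d\hat g^\mu_{3,1} - d\hat g^\mu_{3,2}\|_{s+1} \le c_\mu \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\|_{X^{s+1}} \|(\tilde Z, \tilde r, \tilde w)\|_{X^{s+1}}
\]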

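together with similar estimates for the other terms involving differences of derivatives (see the remarks at the end of Sect. 4.4), we find that
\[
\begin{aligned}
|\hat Z(\xi)| \le c_\mu \biggl[ & \frac{1}{\lambda_{0,\mu}} \int_0^\xi \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})(\tau)\|_{X^{s+1}} \|(\tilde Z, \tilde r, \tilde w)(\tau)\|_{X^{s+1}}\, e^{\lambda_{0,\mu}\tau}\, d\tau \; e^{-\lambda_{0,\mu}\xi} \\
& + \frac{1}{\lambda_{0,\mu}} \int_\xi^{e^{c^\star/2\mu}} \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})(\tau)\|_{X^{s+1}} \|(\tilde Z, \tilde r, \tilde w)(\tau)\|_{X^{s+1}}\, e^{-\lambda_{0,\mu}\tau}\, d\tau \; e^{\lambda_{0,\mu}\xi} \biggr]
\end{aligned}
\]
for $\xi \le e^{c^\star/2\mu}$, whence
\[
\sup_{\xi\in[0,e^{c^\star/2\mu}]} |\hat Z(\xi)| \le c_\mu \sup_{\xi\in[0,e^{c^\star/2\mu}]} \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})(\xi)\|_{X^{s+1}} \sup_{\xi\in[0,e^{c^\star/2\mu}]} \|(\tilde Z, \tilde r, \tilde w)(\xi)\|_{X^{s+1}}
\]
and similarly
\[
\sup_{\xi\in[0,e^{c^\star/2\mu}]} |\hat r(\xi)| \le c_\mu \sup_{\xi\in[0,e^{c^\star/2\mu}]} \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})(\xi)\|_{X^{s+1}} \sup_{\xi\in[0,e^{c^\star/2\mu}]} \|(\tilde Z, \tilde r, \tilde w)(\xi)\|_{X^{s+1}}.
\]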

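Furthermore, the usual energy estimates show that
\[
\begin{aligned}
\partial_\xi E^e_{s+1}(\hat w_{(m+1)}) \le{}& c\mu\bigl(e^{-c^\star/2\mu} + \mu e^{-\mu^2\theta\xi}\bigr) E^e_{s+1}(\hat w_{(m+1)}) \\
&+ c_\mu E^e_{s+1}(\hat w_{(m+1)})^{1/2} E_{s+2}(\dot w^2_{(m+1)})^{1/2} \bigl(E_{s+1}(\tilde w)^{1/2} + |\tilde Z| + |\tilde r|\bigr) \\
&+ c_\mu E^e_{s+1}(\hat w_{(m+1)})^{1/2} E_{s+2}(w^2)^{1/2} \\
&\qquad \times \bigl(E_{s+1}(\dot w_{(m)})^{1/2} + |\dot Z_{(m)}| + |\dot r_{(m)}|\bigr)\bigl(E_{s+1}(\tilde w)^{1/2} + |\tilde Z| + |\tilde r|\bigr) \\
&+ c_\mu E^e_{s+1}(\hat w_{(m+1)})^{1/2} E_{s+2}(\tilde w)^{1/2} \bigl(E_{s+1}(\dot w_{(m)})^{1/2} + |\dot Z_{(m)}| + |\dot r_{(m)}|\bigr) \\
&+ c_\mu E^e_{s+1}(\hat w_{(m+1)})^{1/2} \bigl(E_{s+1}(\tilde w)^{1/2} + |\tilde Z| + |\tilde r|\bigr)\bigl(E_{s+1}(\dot w_{(m)})^{1/2} + |\dot Z_{(m)}| + |\dot r_{(m)}|\bigr),
\end{aligned}
\]
which in turn yields the estimate
\[
\sup_{\xi\in[0,e^{c^\star/2\mu}]} \|(\hat Z_{(m)}, \hat r_{(m)}, \hat w_{(m)})(\xi)\|_{X^{s+1}}
\le c_\mu \Bigl( \sup_{\xi\in[0,e^{c^\star/2\mu}]} \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})(\xi)\|_{X^{s+1}} + 1 \Bigr) \sup_{\xi\in[0,e^{c^\star/2\mu}]} \|(\tilde Z, \tilde r, \tilde w)(\xi)\|_{X^{s+2}}
\]
because
\[
E_{s+2}(\dot w^2_{(m+1)}) \le c, \qquad E_{s+2}(w^2) \le c e^{-c^\star/2\mu}
\]
for $\xi \le e^{c^\star/2\mu}$. (The first of the above inequalities follows from part (i) of this lemma and Lemma 5 with $s$ replaced by $s+1$.)

Corollary 2. Any solution $(Z^\star, r^\star, w^\star)$ to Eqs. (79)–(82) whose initial data lies on $W^{cs}_{\mathrm{loc}}$ is differentiable in the topology of $X^{s+1}$ with respect to $Z^0$.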

Proof. Let $T$ be the operator mapping $(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})$ to $(\dot Z_{(m+1)}, \dot r_{(m+1)}, \dot w_{(m+1)})$ in the iteration scheme (98)–(101), which may therefore be written as
\[
(\dot Z_{(m+1)}, \dot r_{(m+1)}, \dot w_{(m+1)}) = T\bigl((Z, r, w), (\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\bigr).
\]
Consider the new iteration scheme
\[
(\dot Z_{(m+1)}, \dot r_{(m+1)}, \dot w_{(m+1)}) = T\bigl((Z_{(m)}, r_{(m)}, w_{(m)}), (\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\bigr)
\]
with initial data $\dot Z_{(0)} = 0$, $\dot r_{(0)} = 0$, $\dot w_{(0)} = 0$, which is obtained by differentiating (83)–(86) with respect to $Z^0$ and using the dot to denote $\partial_{Z^0}$. Let us write this iteration scheme as
\[
(\dot Z_{(m+1)}, \dot r_{(m+1)}, \dot w_{(m+1)}) = T\bigl((Z^\star, r^\star, w^\star), (\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\bigr) + \alpha_{(m)},
\]
where
\[
\alpha_{(m)} = T\bigl((Z_{(m)}, r_{(m)}, w_{(m)}), (\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\bigr) - T\bigl((Z^\star, r^\star, w^\star), (\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\bigr).
\]


Lemma 7(i) implies that
\[
T\bigl((Z^\star, r^\star, w^\star), \cdot\bigr) : C([0, e^{c^\star/2\mu}], X^{s+1}) \to C([0, e^{c^\star/2\mu}], X^{s+1})
\]
is a contraction whenever $\|w^\star(\xi)\|_{X_c^{s+2}} \le e^{-c^\star/2\mu}$ for $\xi \in [0, e^{c^\star/2\mu}]$, and it follows from Lemma 7(ii) and Theorem 2 with $s$ replaced by $s+1$ that
\[
\begin{aligned}
\|\alpha_{(m)}\|_{C([0,e^{c^\star/2\mu}],X^{s+1})}
\le{}& c_\mu \|(Z_{(m)} - Z^\star, r_{(m)} - r^\star, w_{(m)} - w^\star)\|_{C([0,e^{c^\star/2\mu}],X^{s+2})} \, \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\|_{C([0,e^{c^\star/2\mu}],X^{s+1})} \\
&+ c_\mu \|(Z_{(m)} - Z^\star, r_{(m)} - r^\star, w_{(m)} - w^\star)\|_{C([0,e^{c^\star/2\mu}],X^{s+2})} \\
={}& o(1)\, \|(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})\|_{C([0,e^{c^\star/2\mu}],X^{s+1})} + o(1)
\end{aligned}
\]
whenever $\|w_{(m)}(\xi)\|_{X_c^{s+3}}, \|w^\star(\xi)\|_{X_c^{s+3}} \le e^{-c^\star/2\mu}$ for $\xi \in [0, e^{c^\star/2\mu}]$; according to Theorem 2 with $s$ replaced by $s+3$ the hypothesis $\|w^0\|_{X_c^{s+4}} \le \mu e^{-c^\star/2\mu}$ in the definition of $W^{cs}_{\mathrm{loc}}$ guarantees that these conditions are met. Elementary arguments show that $(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)})$ converges in $C([0, e^{c^\star/2\mu}], X^{s+1})$ to $(\dot Z^\star, \dot r^\star, \dot w^\star)$. By construction, one has that $(\dot Z_{(m)}, \dot r_{(m)}, \dot w_{(m)}) = (\partial_{Z^0} Z_{(m)}, \partial_{Z^0} r_{(m)}, \partial_{Z^0} w_{(m)})$ for each $m \in \mathbb{N}_0$, and a familiar uniform continuity argument asserts that $(\dot Z^\star, \dot r^\star, \dot w^\star) = (\partial_{Z^0} Z^\star, \partial_{Z^0} r^\star, \partial_{Z^0} w^\star)$.

We now turn to the requisite estimates on the derivative of $J_{\rho,w^0}$.

Proposition 13. Consider the function $J_{\rho,w^0} : \bar B_{\mu e^{-c^\star/2\mu}}(0) \subset \mathbb{R} \to (I - S_{sh})X_{sh}^{s+1}$, where $\rho \in [0,1]$ and $\|w^0\|_{X_c^{s+4}} \le \mu e^{-c^\star/2\mu}$.

(i) The operator $dJ_{0,0}[0] : \mathbb{R} \to (I - S_{sh})X_{sh}^{s+1}$ is a bijection and
\[
|dJ_{0,0}[0]^{-1}| \le c. \tag{102}
\]
(ii) The operator $dJ_{\rho,w^0}[Z^0] : \mathbb{R} \to (I - S_{sh})X_{sh}^{s+1}$ satisfies the estimate
\[
|dJ_{\rho,w^0}[Z^0] - dJ_{0,0}[0]| \le c\mu^3. \tag{103}
\]
Proof. We begin by equipping the one-dimensional subspace $(I - S_{sh})X_{sh}^{s+1}$ of $X_{sh}^{s+1}$ with a convenient basis. Recall that $\{s_0, S_{sh}s_0\}$ is a basis for $X_{sh}^{s+1}$; it follows that
\[
\{(I + S_{sh})s_0, (I - S_{sh})s_0\}
\]
is also a basis for $X_{sh}^{s+1}$, whose dual is
\[
\bigl\{\tfrac12 (I + S_{sh})s_0^*, \tfrac12 (I - S_{sh})s_0^*\bigr\}.
\]
A suitable basis for $(I - S_{sh})X_{sh}^{s+1}$ is therefore $\{(I - S_{sh})s_0\}$.


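By definition we have that
\[
\begin{aligned}
\dot Z ={}& \tilde Z^0 s_0 e^{-\lambda_{0,\mu}\xi} + \int_0^\xi \bigl\langle d\tilde G_1^\mu[0, p^\mu, 0](\dot Z, \dot r, \dot w),\, s_0^* \bigr\rangle e^{\lambda_{0,\mu}\tau}\, d\tau \; s_0 e^{-\lambda_{0,\mu}\xi} \\
&- \int_\xi^{e^{c^\star/2\mu}} \bigl\langle d\tilde G_1^\mu[0, p^\mu, 0](\dot Z, \dot r, \dot w),\, u_0^* \bigr\rangle e^{-\lambda_{0,\mu}\tau}\, d\tau \; u_0 e^{\lambda_{0,\mu}\xi} \\
={}& \tilde Z^0 s_0 e^{-\lambda_{0,\mu}\xi},
\end{aligned}
\]
in which $\dot Z$, $\dot r$ and $\dot w$ are abbreviations for respectively $\partial_{Z^0} Z_{0,0}(0,0,0)\tilde Z^0$, $\partial_{Z^0} r_{0,0}(0,0,0)\tilde Z^0$ and $\partial_{Z^0} w_{0,0}(0,0,0)\tilde Z^0$ (cf. Eqs. (98)–(100)) and we have used the fact that $\tilde G_1^\mu(0, p^\mu, 0)$ and $d\tilde G_1^\mu[0, p^\mu, 0]$ both vanish (see Eq. (96)). It follows that
\[
(I - S_{sh})\partial_{Z^0} Z_{0,0}(0,0,0)\tilde Z^0|_{\xi=0} = \tilde Z^0 (I - S_{sh})s_0
\]
and hence that
\[
\tilde Z^0 = \bigl\langle (I - S_{sh})\partial_{Z^0} Z_{0,0}(0,0,0)\tilde Z^0|_{\xi=0},\, \tfrac12 (I - S_{sh})s_0^* \bigr\rangle
= \bigl\langle dJ_{0,0}[0](\tilde Z^0),\, \tfrac12 (I - S_{sh})s_0^* \bigr\rangle, \tag{104}
\]
in which $\langle \cdot\,, \cdot \rangle$ is the inner product on $X_{sh}^{s+1} \cong \mathbb{R}^2$. The first assertion is a direct consequence of (104).

Define $Z^1 = Z_{\rho,w^0}(Z^0, r_1^0, r_2^0)$, $Z^2 = Z_{0,0}(0,0,0)$, $r^1 = r_{\rho,w^0}(Z^0, r_1^0, r_2^0)$, $r^2 = r_{0,0}(0,0,0)$, $\dot Z^1 = dZ_{\rho,w^0}[Z^0, r_1^0, r_2^0]$, $\dot Z^2 = dZ_{0,0}[0,0,0]$ and $\dot r^1 = dr_{\rho,w^0}[Z^0, r_1^0, r_2^0]$, $\dot r^2 = dr_{0,0}[0,0,0]$. By construction we have that
\[
\begin{aligned}
(\dot Z^1 - \dot Z^2)(\xi) ={}& \int_0^\xi \bigl\langle d\tilde G_1^\mu(Z^1, p^\mu + r^1, w^1)(\dot Z^1, \dot r^1, \dot w^1) - d\tilde G_1^\mu(0, p^\mu + r^2, 0)(\dot Z^2, \dot r^2, \dot w^2) \\
&\qquad + \rho\, dG_1^\mu[0, p^\mu, 0](\dot Z^1, \dot r^1, \dot w^1) + \rho\, dG_2^\mu[p^\mu + r^1](\dot r^1),\, s_0^* \bigr\rangle e^{\lambda_{0,\mu}\tau}\, d\tau \; s_0 e^{-\lambda_{0,\mu}\xi} \\
&- \int_\xi^{e^{c^\star/2\mu}} \bigl\langle d\tilde G_1^\mu(Z^1, p^\mu + r^1, w^1)(\dot Z^1, \dot r^1, \dot w^1) - d\tilde G_1^\mu(0, p^\mu + r^2, 0)(\dot Z^2, \dot r^2, \dot w^2) \\
&\qquad + \rho\, dG_1^\mu[0, p^\mu, 0](\dot Z^1, \dot r^1, \dot w^1) + \rho\, dG_2^\mu[p^\mu + r^1](\dot r^1),\, u_0^* \bigr\rangle e^{-\lambda_{0,\mu}\tau}\, d\tau \; u_0 e^{\lambda_{0,\mu}\xi},
\end{aligned}
\]
whence
\[
\sup_{\xi\in[0,e^{c^\star/2\mu}]} |(\dot Z^1 - \dot Z^2)(\xi)| \le c\mu^3 \sup_{\xi\in[0,e^{c^\star/2\mu}]} \bigl(|\dot Z^1(\xi)| + |\dot r^1(\xi)| + \|\dot w^1(\xi)\|_{X_c^{s+1}}\bigr) \le c\mu^3.
\]
We now study the solution set of the equation $J_{\rho,w^0}(Z^0) = 0$ near the known solution $Z^0 = 0$ at $(\rho, w^0) = (0,0)$ by writing it as
\[
Z^0 = Z^0 - dJ_{0,0}[0]^{-1} J_{\rho,w^0}(Z^0) \tag{105}
\]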


and examining this fixed-point problem. According to a standard argument in nonlinear analysis the fixed-point problem (105) has a unique solution $Z^0 = Z^0(\rho, w^0)$ in $\bar B_\eta(0) \subset (I - S_{sh})X_{sh}^{s+1}$ whenever
\[
|dJ_{0,0}[0]^{-1}|\,|J_{\rho,w^0}(0)| \le \frac{\eta}{2}, \qquad
|dJ_{0,0}[0]^{-1}|\,|dJ_{\rho,w^0}[Z^0] - dJ_{0,0}[0]| \le \frac12, \quad Z^0 \in \bar B_\eta(0).
\]
The estimates (102), (103) and
\[
|J_{\rho,w^0}(0)| \le c\,\|Z_{\rho,w^0}(0)\| \le c\mu^3 e^{-c^\star/2\mu}
\]
(see formula (87)) show that we can take $\eta = \mu e^{-c^\star/2\mu}$.

We have therefore constructed a family of symmetric solutions $(Z_{w^0}, r_{w^0}, w_{w^0})$ to (79)–(82) on $[-e^{c^\star/2\mu}, e^{c^\star/2\mu}]$ which are parametrised by $w^0 \in \Sigma_c$ with $\|w^0\|_{X_c^{s+4}} \le \mu e^{-c^\star/2\mu}$ and satisfy $\|(Z_{w^0}(\xi), r_{w^0}(\xi), w_{w^0}(\xi))\|_{X^{s+1}} \le e^{-c^\star/2\mu}$ for each $\xi \in [-e^{c^\star/2\mu}, e^{c^\star/2\mu}]$. The formula
\[
z_{w^0}(\xi) = p^\mu(\xi) + r_{w^0}(\xi), \quad \xi \in [-e^{c^\star/2\mu}, e^{c^\star/2\mu}]
\]
defines a family of modulating pulse solutions to the coupled system (72)–(74) which was obtained from the original spatial dynamics formulation of the problem by the normal-form theory in Sect. 4; these solutions are parametrised by $w^0 \in \Sigma_c$, that is by $w^0_{1,e}$ and $w^0_{2,o}$. Notice that $p^\mu(0), r_{w^0}(0) \in \Sigma_{wh}$, so that $z_{w^0}(0) \in \Sigma_{wh}$, and by construction $Z_{w^0}(0) \in \Sigma_{sh}$, $w_{w^0}(0) \in \Sigma_c$. The existence result for modulating pulses stated in Sect. 1 (Theorem 1) follows by tracing the coordinate transformations back to the original variable $v(\xi, \eta)$ and replacing $\mu$ with $\sqrt{\varepsilon}$.

Acknowledgements. The work of Guido Schneider is partially supported by the Deutsche Forschungsgemeinschaft DFG under the grant Schn 520/3-1/2. We would like to thank the referee for many helpful comments and suggestions.

References

1. Birnir, B., McKean, H., Weinstein, A.: The rigidity of sine-Gordon breathers. Commun. Pure Appl. Math. 47, 1043–1051 (1994)
2. Denzler, J.: Nonpersistence of breather families for the perturbed sine Gordon equation. Commun. Math. Phys. 158, 397–430 (1993)
3. Groves, M.D., Mielke, A.: A spatial dynamics approach to three-dimensional gravity-capillary steady water waves. Proc. Roy. Soc. Edin. A 131, 83–136 (2001)
4. Groves, M.D., Schneider, G.: Modulating pulse solutions for a class of nonlinear wave equations. Commun. Math. Phys. 219, 489–522 (2001)
5. Groves, M.D., Schneider, G.: Modulating pulse solutions for quasilinear wave equations. J. Diff. Eqs. 219, 221–258 (2005)
6. Iooss, G., Lombardi, E.: Normal forms with exponentially small remainder: application to homoclinic connections for the reversible $0^2 i\omega$ resonance. C. R. Acad. Sci. Paris 339, 831–838 (2004)
7. Iooss, G., Lombardi, E.: Polynomial normal forms with exponentially small remainder for analytic vector fields. J. Diff. Eqs. 212, 1–61 (2005)
8. Kalyakin, L.A.: Asymptotic decay of a one-dimensional wave packet in a nonlinear dispersive medium. Mat. Sb. (N.S.) 132(174), 470–495 (1988) (English translation: Math. USSR-Sb. 60, 457–483 (1988))
9. Kato, T.: Quasi-linear equations of evolution, with applications to partial differential equations. In: Lecture Notes in Mathematics 448: Spectral Theory and Differential Equations, Dundee 1974. Berlin: Springer-Verlag, 1975, pp. 25–70
10. McLaughlin, D.W., Shatah, J.: Homoclinic orbits for PDEs. AMS Proc. Symp. Pure Math. 54, 281–299 (1998)
11. Pöschel, J.: Nonlinear partial differential equations, Birkhoff normal forms and KAM theory. In: European Congress of Mathematics, Vol. II (Budapest 1996). Prog. Math. 169. Basel: Birkhäuser, 1998, pp. 167–186
12. Schneider, G.: Justification of modulation equations for hyperbolic systems via normal forms. Nonlinear Differential Equations and Applications (NODEA) 5, 69–82 (1998)
13. Schneider, G.: Lecture Notes for Analysis of Maxwell's Equations. Lecture notes: Universität Karlsruhe, 2007
14. Shatah, J.: Normal forms and quadratic nonlinear Klein-Gordon equations. Commun. Pure Appl. Math. 38, 685–696 (1985)

Communicated by A. Kupiainen

Commun. Math. Phys. 278, 627–641 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0408-y


Sheaves on Fibered Threefolds and Quiver Sheaves

Balázs Szendrői

Department of Mathematics, Utrecht University, P.O. Box 80010, NL-3508 TA Utrecht, The Netherlands
Current address: Mathematical Institute, University of Oxford, 24–29 St Giles', Oxford OX1 3LB, UK. E-mail: [email protected]

Received: 11 August 2006 / Accepted: 23 August 2007
Published online: 8 January 2008 – © Springer-Verlag 2007

Abstract: This paper classifies a class of holomorphic D-branes, closely related to framed torsion-free sheaves, on threefolds fibered in resolved ADE surfaces over a general curve $C$, in terms of representations with relations of a twisted Kronheimer–Nakajima-type quiver in the category $\mathrm{Coh}(C)$ of coherent sheaves on $C$. For the local Calabi–Yau case $C \cong \mathbb{A}^1$ and special choice of framing, one recovers the $N = 1$ ADE quiver studied by Cachazo–Katz–Vafa.

Introduction

The purpose of this paper is to study, via dimensional reduction, certain holomorphic D-branes, closely related to torsion-free sheaves, on threefolds $X \to C$ fibered in resolved ADE surfaces over a curve. Fibered local Calabi–Yau threefolds $X \to \mathbb{A}^1$ of this type, as well as their deformations $X_s \to \mathbb{A}^1$ and extremal transitions, were thoroughly analyzed in [6,7] from the point of view of supersymmetric gauge theory. The paper [6] contains an assertion, made explicit in [14] and studied in [21], that exceptional components of a natural threefold contraction $X_s \to \bar X_s$ are classified by irreducible representations of a certain quiver with loop edges, the $N = 1$ ADE quiver (see Fig. 3.2 for an example), satisfying a specific set of relations. This statement is in the spirit of Gabriel's theorem classifying exceptional (not necessarily irreducible) rational curves in resolved ADE surfaces in terms of irreducible representations of the corresponding Dynkin quiver.

In this paper we generalize the work of [6,14,21] in two directions: we consider holomorphic D-branes, objects in the derived category of coherent sheaves, instead of exceptional components, and we study the semi-local case: the neighbourhood of a deformed ADE fibration $X_s \to C$ over a general curve $C$. The main result is Theorem 3.1, which shows that certain holomorphic D-branes on the fibered threefold $X_s$ are classified by representations with relations of a Kronheimer–Nakajima-type quiver in the category $\mathrm{Coh}(C)$ of coherent sheaves on the curve $C$. In particular, moduli spaces of such holomorphic D-branes are quiver bundle varieties over $C$. If $C \cong \mathbb{A}^1$, a further dimensional reduction leads to Theorem 3.4, relating sheaves on the threefold to the zero-dimensional problem of ordinary matrix representations of the $N = 1$ ADE quiver of [6,14,21]. The loops in the $N = 1$ ADE quiver arise as the action by multiplication of a parameter $t \in H^0(\mathcal{O}_{\mathbb{A}^1})$ on spaces of sections of sheaves on the base $\mathbb{A}^1$.

The geometry considered in this paper is non-monodromic, meaning that there is neither global [19] nor local [6] monodromy in the fibration of ADE surfaces over the curve $C$. It appears to be an interesting question to extend the results proved here to these more general cases involving monodromy.

In recent work [9], the moduli space of certain very special holomorphic D-branes on resolved $A_1$-fibered geometries $X \to C$ has been connected, via imposing a superpotential and going through a large $N$ transition, to the Hitchin system on $C$. The branes studied in [9] are not of the type classified by our results; they should rather correspond to a complex of quiver representations. Understanding the precise connection between [9] and the present paper is left for future work.

After introducing basic notation in Sect. 1, Sect. 2 describes the threefolds we study, and defines some auxiliary sheaves of non-commutative algebras over the curve $C$. Section 3 contains our results, in particular the general statement Theorem 3.1 connecting quiver bundles to holomorphic D-branes on ADE fibrations, as well as the statement for the affine case. Proofs are discussed in Sect. 4.

1. Finite Groups of Type ADE and Surfaces

Let $\Gamma < SL(2, \mathbb{C})$ be a finite subgroup of type A, D or E. Let $\mathfrak{h}_0$ be the Cartan subalgebra of the finite dimensional Lie algebra of the same type. Fix a set of simple roots $\{\eta_a : a \in \Delta_0\}$ indexed by nodes of the Dynkin diagram $\Delta_0$, and let $R_+$ be the set of positive roots.
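Let $\mathfrak{h}$ be the corresponding affine Cartan with simple roots indexed by nodes of the Dynkin diagram $\Delta \supset \Delta_0$. The group ring $\mathbb{C}\Gamma$ has center $Z(\mathbb{C}\Gamma) \cong \mathbb{C}^\Delta$; explicitly, for $\lambda \in Z(\mathbb{C}\Gamma)$, the isomorphism is obtained by taking the trace of $\lambda$ on a set of irreps, indexed by the nodes of $\Delta$ according to the McKay correspondence. There is also a natural identification
\[
\mathfrak{h}_0 = \{\lambda \in \mathbb{C}^\Delta \mid \lambda \cdot \delta = 0\} \subset \mathfrak{h} \cong \mathbb{C}^\Delta,
\]
where $\delta = (\delta_a)$ are the dimensions of the irreps of $\Gamma$.

Lemma 1.1. The centralizer $C_{GL(2,\mathbb{C})}(\Gamma)$ of $\Gamma$ in $GL(2, \mathbb{C})$ is
(1) the full group $GL(2, \mathbb{C})$ for type $A_1$;
(2) a torus $(\mathbb{C}^*)^2$ in $GL(2, \mathbb{C})$ for type $A_n$ with $n > 1$;
(3) the center $\mathbb{C}^*$ of $GL(2, \mathbb{C})$ for types D and E.

Let $\bar Y = \mathbb{A}^2/\Gamma$ be the singular affine quotient, $Y \to \bar Y$ its minimal resolution. Exceptional curves in the resolution are in one-to-one correspondence with the nodes of $\Delta_0$, and thus with a set of simple roots of $\mathfrak{h}_0$; the positive roots $\eta \in R_+$ correspond to connected, possibly reducible exceptional rational curves. The universal deformations $\mathcal{Y} \to \mathfrak{h}_0$ and $\bar{\mathcal{Y}} \to \mathfrak{h}_0/W$ of $Y$ and $\bar Y$, where $W$ denotes the Weyl group, are connected by the well known commutative diagram
\[
\begin{array}{ccccc}
\mathcal{Y} & \longrightarrow & p^*\bar{\mathcal{Y}} & \longrightarrow & \bar{\mathcal{Y}} \\
 & & \downarrow & & \downarrow \\
 & & \mathfrak{h}_0 & \stackrel{p}{\longrightarrow} & \mathfrak{h}_0/W.
\end{array}
\]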
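As an illustration of the diagram relating the two universal deformations, the simplest case $A_1$ can be written out by hand. The following is only a sketch: the coordinates $u, v, w, t, \tau$ and the sign convention for $\tau$ are our own choices and do not appear in the text.

```latex
% Illustrative A_1 example; u, v, w, t, tau are labels chosen for this sketch.
\[
\Gamma = \{\pm 1\}, \qquad
\bar Y = \mathbb{A}^2/\Gamma = \{\, uv = w^2 \,\}, \qquad
u = x_1^2,\quad v = x_2^2,\quad w = x_1 x_2 .
\]
% Here h_0 is one-dimensional with coordinate t, W = Z/2 acts by t -> -t,
% so h_0/W has coordinate tau = t^2 and p(t) = t^2.
\[
\bar{\mathcal{Y}} = \{\, uv = w^2 - \tau \,\} \longrightarrow \mathfrak{h}_0/W \ni \tau,
\qquad
p^*\bar{\mathcal{Y}} = \{\, uv = (w - t)(w + t) \,\} \longrightarrow \mathfrak{h}_0 \ni t .
\]
% The pulled-back family has a single threefold node at the origin.
% Blowing up the plane {u = w - t = 0} gives a small resolution
% Y -> p^* \bar Y whose fibre over t = 0 is the minimal resolution of
% uv = w^2, with one exceptional rational curve.
```

The base change $\tau = t^2$ is what makes the simultaneous resolution possible: before the base change the total space $\{uv = w^2 - \tau\}$ is smooth, but its central fibre cannot be resolved within the family.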


2. Threefolds: Definitions

2.1. The geometry. Let $C$ be a curve, and let $Q$ be a rank-two vector bundle on $C$ whose structure group reduces from $GL(2, \mathbb{C})$ to the centralizer $C_{GL(2,\mathbb{C})}(\Gamma)$. Thus, by Lemma 1.1,
• for type $A_1$, $Q$ is an arbitrary rank-two vector bundle;
• for type $A_n$ with $n > 1$, $Q \cong Q_1 \oplus Q_2$ is the direct sum of two line bundles;
• for types D, E, $Q \cong Q_0^{\oplus 2}$ for some line bundle $Q_0$.

There is a fiberwise $\Gamma$-action on the total space of the vector bundle $Q$, and the quotient $\bar X = Q/\Gamma$ is a threefold with a curve of compound Du Val singularities along the image of the zero section. Let $f : X \to \bar X$ be the crepant resolution, with a map $\pi : X \to C$ whose fibres are minimal resolutions of the corresponding surface singularity, with trivial monodromy in the fibres. The canonical bundle of $X$ is
\[
\omega_X \cong \pi^*(\omega_C \otimes \det Q^\vee).
\]
In particular, $X$ is Calabi–Yau if and only if $Q$ has canonical determinant on $C$.

Part of the deformation theory of the threefold $X$ was described in [19]. Let $H_0 = \det Q \otimes \mathfrak{h}_0$, a vector bundle over $C$, and let $S = H^0(C, H_0)$ be its space of sections. Then there is a smooth family of threefolds $\mathcal{X} \to S$, with injective Kodaira–Spencer map and central fibre $X_0 \cong X$, together with a fibration $\mathcal{X} \to C \times S$ and a contraction $\mathcal{X} \to \bar{\mathcal{X}}$ over $S$. Thus, for every $s \in S$, the threefold fibre $X_s$ possesses a fibration $\pi_s : X_s \to C$ in surfaces and a contraction $f_s : X_s \to \bar X_s$ to a singular threefold with compound Du Val singularities. More precisely, for every positive root $\eta \in R_+$ of $\mathfrak{h}_0$, there is a map $p_\eta : H_0 \to \det Q$, whose vanishing locus is a family of root hyperplanes in the $\mathfrak{h}_0$ fibers, and we have

Lemma 2.1. Let $s \in S = H^0(C, H_0)$ be a section of $H_0$, and let $\eta \in R_+$ be a positive root of $\mathfrak{h}_0$. The contraction $f_s : X_s \to \bar X_s$ contracts a (possibly reducible) rational curve corresponding to the root $\eta$ over a point $P \in C$, if and only if the projected section $p_\eta(s) \in H^0(C, \det Q)$ vanishes at $P \in C$.
Thus if the projected section $p_\eta(s)$ is not identically zero for any root $\eta$, then $f_s$ is a small contraction, contracting rational curves to isolated singularities in certain configurations. If for different roots $\eta$, the sections $p_\eta(s)$ have different simple zeros, then $f_s$ contracts a set of isolated $(-1,-1)$-curves to simple nodes. If the linear system $\det Q$ has no base points on $C$, then this holds for generic $s \in S$.

In the special case $C \cong \mathbb{A}^1$, the central fiber $X_0 = \mathbb{A}^1 \times Y$ is Calabi–Yau, and its deformations are parameterized by an $\mathfrak{h}_0$-valued polynomial $s \in \mathfrak{h}_0[t]$. Under the isomorphism $\mathfrak{h}_0 \cong \{\lambda \mid \lambda \cdot \delta = 0\} \subset \mathbb{C}^\Delta$, we can also parameterize deformations by a set of ordinary polynomials $s_a \in \mathbb{C}[t]$ indexed by nodes of the affine Dynkin diagram $\Delta$, satisfying $\sum_a \delta_a s_a = 0$. The exceptional fibres of $f_s : X_s \to \bar X_s$ lie over roots of the various polynomials $s_{\eta_a} = s_a$, corresponding to simple roots $\eta_a$, as well as over roots of their linear combinations $s_\eta = \sum_a \mu_a s_a$, corresponding to other positive roots $\eta = \sum_a \mu_a \eta_a \in R_+$. For generic choice of parameter $s \in S$, equivalently for generic choice of $\{s_a\}$, the polynomials $\{s_\eta : \eta \in R_+\}$ have distinct simple roots, and the exceptional set of $f_s : X_s \to \bar X_s$ consists of isolated $(-1,-1)$-curves.
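A small worked instance of the genericity condition over $C \cong \mathbb{A}^1$ may help. The polynomials below are hypothetical data chosen for illustration, and the symbols $s_0, s_1, s_2, s_\eta$ are our own labels for the polynomials attached to the nodes and roots.

```latex
% Hypothetical A_2 data over C = A^1; s_1, s_2 are our labels for the
% polynomials attached to the simple roots eta_1, eta_2.
\[
s_{\eta_1}(t) = s_1(t) = t, \qquad
s_{\eta_2}(t) = s_2(t) = t - 1, \qquad
s_{\eta_1 + \eta_2}(t) = s_1(t) + s_2(t) = 2t - 1 .
\]
% For type A_2 all irreps of Gamma = Z/3 are one-dimensional, so
% delta = (1,1,1) and the affine-node polynomial is determined by
% s_0 + s_1 + s_2 = 0:
\[
s_0(t) = 1 - 2t .
\]
% The three positive roots give three polynomials with distinct simple
% roots t = 0, t = 1, t = 1/2.
```

Under these assumptions the exceptional set of $f_s$ would consist of three isolated $(-1,-1)$-curves, one over each of the points $t = 0$, $t = 1$ and $t = \tfrac12$. Choosing instead $s_2(t) = t$ would make $s_{\eta_1}$ and $s_{\eta_2}$ share a root, which is the non-generic situation excluded above.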


2.2. Sheaves of non-commutative algebras and their sheaves of modules. Given $(C, Q)$, let $H = \det Q \otimes \mathfrak{h}$, a vector bundle on the curve $C$ containing $H_0$ as a subbundle. Given a section $s \in H^0(C, H)$, consider the natural composition
\[
\sigma_s : Q^\vee \otimes Q^\vee \xrightarrow{\ \wedge^2\ } \det Q^\vee \xrightarrow{\ \cdot s\ } \mathfrak{h} \otimes \mathcal{O}_C \xrightarrow{\ \sim\ } Z(\mathbb{C}\Gamma) \otimes \mathcal{O}_C,
\]
a family of $Z(\mathbb{C}\Gamma)$-valued symplectic forms in the fibres of the vector bundle $Q^\vee$. Also fix, once and for all, a trivializing section $z \in H^0(\mathcal{O}_C)$.

Definition 2.2. Let $\mathcal{A}_s$ be the sheaf of non-commutative algebras on $C$ whose sections on an open set $U \subset C$ are
\[
\mathcal{A}_s(U) = T Q^\vee(U) * \mathbb{C}\Gamma \,\big/\, \langle [x_1, x_2] + \sigma_s(x_1, x_2) \rangle,
\]
where $T Q^\vee(U)$ is the full tensor algebra of $Q^\vee(U)$, $x_i \in Q^\vee(U)$ are local sections, and $\langle \ldots \rangle$ denotes the two-sided ideal generated by all given expressions. Define also
\[
\mathcal{P}_s(U) = T(Q^\vee \oplus \mathcal{O}_C)(U) * \mathbb{C}\Gamma \,\big/\, \langle [x_1, x_2] + \sigma_s(x_1, x_2) z^2,\ [x_i, z] \rangle,
\]
where the fixed section $z \in H^0(\mathcal{O}_C)$ commutes with elements of $\mathbb{C}\Gamma$. The sheaf $\mathcal{P}_s$ becomes a sheaf of graded algebras by assigning degree 1 to local sections $x_i \in Q^\vee(U)$ as well as to $z \in H^0(\mathcal{O}_C)$; thus its degree-zero piece is $\mathcal{P}_{s,0} \cong \mathcal{O}_C \otimes \mathbb{C}\Gamma$.

Remark 2.3. The sheaf of algebras $\mathcal{A}_s$ is a relative version of the following non-commutative deformation of the skew group algebra, introduced by Crawley-Boevey and Holland in [8], depending on a deformation parameter $\lambda \in \mathfrak{h} \cong Z(\mathbb{C}\Gamma)$:
\[
A_\lambda = \mathbb{C}\langle x_1, x_2 \rangle * \Gamma \,\big/\, \langle [x_1, x_2] + \lambda \rangle.
\]
The graded version is
\[
P_\lambda = \mathbb{C}\langle y_0, y_1, y_2 \rangle * \Gamma \,\big/\, \langle [y_0, y_i],\ [y_1, y_2] + \lambda y_0^2 \rangle.
\]
For $\Gamma = \{1\}$, $\lambda$ is just a complex number; if $\lambda \neq 0$, $A_\lambda$ is isomorphic to the first Weyl algebra, whereas $P_\lambda$ is a degenerate Sklyanin algebra deforming the algebra of functions on the commutative projective plane $\mathbb{P}^2$. As proved in [8], for general $\Gamma$ and $\lambda \in \mathfrak{h}_0 \subset Z(\mathbb{C}\Gamma)$ the algebra $A_\lambda$ is finite over its center $Z(A_\lambda) \cong \mathbb{C}[\bar Y_\lambda]$. The latter is the coordinate ring of the affine variety $\bar Y_\lambda$ corresponding to the deformation parameter $\lambda \in \mathfrak{h}_0$, a deformation of the invariant ring $\mathbb{C}[x_1, x_2]^\Gamma \cong \mathbb{C}[\bar Y]$. For $\lambda \in \mathfrak{h} \setminus \mathfrak{h}_0$, $A_\lambda$ is "genuinely" non-commutative.

By abuse of notation, we will refer to $\mathbb{P}_s = \mathrm{Proj}_C\, \mathcal{P}_s$ as the non-commutative projective bundle corresponding to $s \in S$, with fibration $\pi_s : \mathbb{P}_s \to C$. Setting $z = 0$, we have its divisor at infinity $i_s : D_s \to \mathbb{P}_s$.


The divisor $D_s$ has the structure of an ordinary (commutative) $\mathbb{P}^1$-bundle $\pi_s|_{D_s} = \tau_s : D_s \to C$ equipped with a $\Gamma$-action on the fibres. Its complement $\mathbb{A}_s = \mathbb{P}_s \setminus D_s = \mathrm{Spec}_C\, \mathcal{A}_s$ is a non-commutative affine bundle.

The sheaf $\mathcal{P}_s$ is a sheaf of regular graded algebras in the sense of [1]; sheaf theory on $\mathbb{P}_s$ works in complete analogy with the absolute case discussed in [3]. The category of coherent sheaves $\mathrm{Coh}(\mathbb{P}_s)$ is by definition the quotient of the category of sheaves of finitely generated graded right $\mathcal{P}_s$-modules by the subcategory of sheaves of torsion $\mathcal{P}_s$-modules; we will sometimes refer to objects in this category as $\mathcal{P}_s$-modules. The trivial module, graded in degree $n$, defines the object $\mathcal{O}_{\mathbb{P}_s}(n) \in \mathrm{Coh}(\mathbb{P}_s)$; given a sheaf $E$, its twists $E(n)$ are obtained by shifting the grading. We have Ext groups as the derived functors of Hom, and also functors $\mathcal{E}xt^i(-, \mathcal{O}_{\mathbb{P}_s})$; the latter take values in the category of left $\mathcal{P}_s$-modules (compare [3]).

Pushforward $\pi_{s*} : \mathrm{Coh}(\mathbb{P}_s) \to \mathrm{Coh}_\Gamma(C)$ along the morphism $\pi_s : \mathbb{P}_s \to C$ is defined in the usual way, as the coherent $\Gamma$-sheaf on $C$ defined by sections over preimages of open sets of $C$, the section spaces being (right) $\mathbb{C}\Gamma$-modules; the action of $\Gamma$ on $C$ is taken to be trivial. The higher pushforwards $R^p\pi_{s*}(-)$ are the derived functors of $\pi_{s*}$. Given a $\mathcal{P}_s$-module $E$, we will also use the relative Hom-functor $\mathcal{H}om_C(E, -) : \mathrm{Coh}(\mathbb{P}_s) \to \mathrm{Coh}_\Gamma(C)$, defined by homomorphisms on preimages of open sets in $C$, as well as its derived functors $\mathcal{E}xt^i_C(E, -) : \mathrm{Coh}(\mathbb{P}_s) \to \mathrm{Coh}_\Gamma(C)$. We also have a pullback functor $\pi_s^* : \mathrm{Coh}_\Gamma(C) \to \mathrm{Coh}(\mathbb{P}_s)$ taking a sheaf of (right) $\mathbb{C}\Gamma$-modules $F$ to the sheaf $F \otimes_{\mathbb{C}\Gamma} \mathcal{P}_s$ of (right) $\mathcal{P}_s$-modules. The pair $(\pi_s^*, \pi_{s*})$ forms an adjoint pair as in the commutative case.

Similarly, for the inclusion $i_s : D_s \to \mathbb{P}_s$, we have a pullback (restriction) functor $i_s^* : \mathrm{Coh}(\mathbb{P}_s) \to \mathrm{Coh}_\Gamma(D_s)$, defined by factoring modules of local sections by the ideal $\langle z \rangle$ (recall that $z$ is central), as well as a pushforward $i_{s*} : \mathrm{Coh}_\Gamma(D_s) \to \mathrm{Coh}(\mathbb{P}_s)$, with $z$ acting on local sections by zero. There is also a restriction functor to the finite part $\mathbb{A}_s$, defined by factoring the ideal $\langle z - 1 \rangle$.


B. Szendrői

Definition 2.4. A $\pi_s$-free sheaf on $P_s$ is an object $E \in \mathrm{Coh}(P_s)$ which admits an embedding $E \hookrightarrow \pi_s^*(U)(n)$ for some $U \in \mathrm{Coh}_\Gamma(C)$ and $n \in \mathbb{Z}$. A framed $\pi_s$-free sheaf $(E, \varphi)$ on $(P_s, D_s)$ is a $\pi_s$-free sheaf $E$ on $P_s$, together with a fixed isomorphism

\[ \varphi : i_s^* E \xrightarrow{\ \sim\ } \tau_s^* W \]

on the divisor $D_s$ at infinity, for some $W \in \mathrm{Coh}_\Gamma(C)$.

Remark 2.5. If $\pi : P \to \{*\}$ is a (non-commutative) projective space over a point, the $\pi$-free sheaves are exactly the torsion free ones (compare [3, Sect. 2]). To see this, note that a $\pi$-free sheaf is certainly torsion free, since it embeds into a locally free sheaf. Conversely, a torsion free sheaf embeds into some locally free sheaf, which in turn embeds into some $\mathcal{O}_P^{m}(n)$.

Lemma 2.6. If $E$ is $\pi_s$-free, then $L^j i_s^* E = 0$ for $j > 0$.

Proof. As in the commutative case, the structure sheaf $i_{s*}\mathcal{O}_{D_s}$ has a resolution

\[ 0 \to \mathcal{O}_{P_s}(-1) \xrightarrow{\ z\ } \mathcal{O}_{P_s} \to i_{s*}\mathcal{O}_{D_s} \to 0, \]

which implies that $L^j i_s^* E = 0$ for $j > 1$ for any $E \in \mathrm{Coh}(P_s)$, and also that $L^1 i_s^*$ is left exact. If $E$ is $\pi_s$-free, applying the latter to an embedding $E \hookrightarrow \pi_s^*(U)(n)$ gives the vanishing of $L^1$ also.

3. Threefolds: The Results

3.1. Twisted quiver representations and quiver sheaves. Recall that, given a quiver with arrows $a \to b$ marked by objects $O_{ab} \in \mathcal{C}$ of an abelian tensor category $\mathcal{C}$, a representation of the marked quiver in $\mathcal{C}$ consists of a set of objects $O_a$ of $\mathcal{C}$ associated to nodes, and a set of morphisms $\varphi_{ab} \in \mathrm{Hom}_{\mathcal{C}}(O_a \otimes O_{ab}, O_b)$ associated to the arrows $a \to b$. Representations of a marked quiver in the category $\mathrm{Coh}(X)$ of an algebraic variety $X$ are also called quiver sheaves [12] on $X$.

In the specific context of classifying holomorphic D-branes on the threefold $X$ and its deformations, the following quiver marked in $\mathrm{Coh}(C)$ will arise naturally. The quiver is the standard extended McKay quiver of [16], obtained from the original one by adding an extra leaf at each node with arrows in both directions. Using the data of the vector bundle $Q$ on $C$, we mark this quiver in $\mathrm{Coh}(C)$ as follows:

• The marked $A_n$ quiver for $n > 1$ is illustrated in Fig. 3.1; recall that in this case, there is a decomposition $Q = Q_1 \oplus Q_2$ into a sum of line bundles, since the structure group of $Q$ reduces to the diagonal torus.
• The marked $A_1$ quiver consists of only two nodes 0 and 1 and two arrows $0 \to 1$, $1 \to 0$ marked by the rank-two bundle $Q^\vee$, as well as leaves marked as in the higher $A_n$ case.
• For types $D$ and $E$, arrows between nodes are all marked by the line bundle $Q_0^\vee$, where $Q = Q_0^{\oplus 2}$; leaves are marked as before.

Sheaves on Fibered Threefolds and Quiver Sheaves


Fig. 3.1. The marked extended McKay quiver for A2

3.2. The main classification result.

Theorem 3.1. Given $s \in H^0(C, \mathcal{H})$, there is a 1-to-1 correspondence between the following sets of data:

(1) Isomorphism classes of framed $\pi_s$-free sheaves $(E, \varphi)$ on $(P_s, D_s)$.

(2) Quintuples $(V, W, B, I, J)$, where $W, V$ are coherent $\Gamma$-sheaves on $C$, and

\[ B \in \mathrm{Hom}^\Gamma_C(V \otimes Q^\vee, V), \qquad I \in \mathrm{Hom}^\Gamma_C(W, V), \qquad J \in \mathrm{Hom}^\Gamma_C(V \otimes \det Q^\vee, W), \]

satisfying the following two conditions:

(a) the ADHM relation

\[ B \wedge B + I \circ J + s = 0 \in \mathrm{Hom}^\Gamma_C(V \otimes \det Q^\vee, V), \]

where

\[ H^0(C, Z(\mathbb{C}\Gamma) \otimes \det Q) \hookrightarrow \mathrm{Hom}^\Gamma_C(V \otimes \det Q^\vee, V) \]

is the natural embedding as the central subspace;

(b) non-degeneracy: if $V' \subset V$ is a $\Gamma$-subsheaf such that $B(V' \otimes Q^\vee) \subset V'$ and $I W \subset V'$, then $V' = V$.

Sets of quintuples are identified under the action of invertible elements of $\mathrm{Hom}^\Gamma_C(V, V)$.


(3) Representations ({Va }, {Wa }, {Bab }, {Ia }, {Ja }) in Coh(C) of the marked McKaytype quiver introduced in 3.1, satisfying (a) the ADHM relations ab Bba ◦ Bab + Ia ◦ Ja + sa = 0 ∈ HomC (Va ⊗ det Q∨ , Va ) b

at each node a, where ab ∈ {±1} is a standard assignment of signs to arrows with ab = −ba , and sa = Pηa (s) is the projected section corresponding to the simple root ηa , and (b) non-degeneracy: if {Va } is a B-invariant set of subsheaves containing the images of Ia ’s, then Va = Va at all nodes. Two representations are identified under invertible elements of a HomC (Va , Va ). If s ∈ S = H 0 (C, H0 ) is a deformation parameter of the threefold X = X 0 , then the same data also parametrizes (4) Certain objects in D(Coh X s ), the derived category of coherent sheaves on X s . Proof. The equivalence (1) ⇐⇒ (2) follows from a version of the relative Beilinson resolution for the non-commutative projective bundle Ps → C; details are given in Sect. 4.1. McKay’s definition of the quiver describing the representation theory of implies (2) ⇐⇒ (3) in the standard way. Finally the mapping (1) =⇒ (4) in the geometric case s ∈ S = H 0 (C, H0 ) is given by a derived equivalence to be discussed in Sect. 4.2. Remark 3.2. As X = X 0 and its deformations X s for s ∈ S are not projective, one needs to rigidify before holomorphic D-branes, in other words objects in Db (X s ) have a sensible moduli space. For the central fibre X = X 0 , a crepant resolution of the singular threefold Q/ , one has a derived equivalence [5] D(X 0 ) ∼ = D (Q) between the derived categories of coherent sheaves on X 0 and that of -equivariant sheaves on the total space of the bundle Q → C. One can easily rigidify on the latter by considering -sheaves on the projective bundle P0 = P(Q ⊕ OC ) → C, framed on the divisor at infinity D0 = P(Q) → P0 . Theorem 3.1 is the appropriate generalization of this approach which also works for deformations: for the analogous problem on X s , we consider framed sheaves on the non-commutative projective bundle Ps → C. In the surface case, this approach was used earlier in [3]. To quote the result, let λ ∈ Z (C). 
Then for -modules V, W , Nakajima’s non-singular quiver variety MV,W,λ parametrizes torsion free sheaves on the non-commutative space P2λ = Proj Pλ , framed on the commutative -line at ∞. This statement generalizes earlier work of [10,15,17, 18,13] and others. The origin of all such results is of course the ADHM classification [2] of finite-action SU(dim(W ))-instantons on R4 of charge dim(V ). 3.3. Some holomorphic D-branes on AD E fibrations over A1 . If C ∼ = A1 , Theorem 3.1 can in some cases be re-written in terms of classical quiver representations: representations of a quiver in vector spaces. This will give an interpretation of an assertion of [6,14,21].


Fig. 3.2. A representation of the affine N = 1 A2 quiver

∼ A1 , a deformation parameter s ∈ S of the central fibre X 0 = Recall that for C = × Y can be specified by a set of polynomials { a ∈ C[t] : a ∈ } indexed by the vertices of the affine quiver, subject to a δa · a = 0. The following definition is due to Cachazo–Katz–Vafa [6,14]. A1

Definition 3.3. The affine N = 1 AD E quiver is the McKay quiver extended by a loop a → a at each vertex. For a (finite-dimensional) representation ({Va }, {Bab }, {a }) of this quiver, the ADHM-type relations are ab Bba Bab + a (a ) = 0 ∈ Hom(Va , Va ) (3.1) b

at each vertex a ∈ of the quiver, where a (a ) is to be interpreted as the evaluation of a polynomial on an endomorphism of Va , as well as a Bba = Bba b ∈ Hom(Va , Vb )

(3.2)

along each arrow a → b of the quiver . Consider quadruples ({Va }, {Bab }, {a }, v0 ), where ({Va }, {Bab }, {a }) is a representation of the affine N = 1 AD E quiver satisfying the ADHM-type relations, and v0 ∈ V0 is a fixed vector in the vector space attached to the affine node. Call a quadruple non-degenerate if there is no (B, )-invariant collection of subspaces {Va ⊂ Va } with v0 ∈ V0 . Theorem 3.4. Equivalence classes of non-degenerate quadruples ({Va }, {Bab },{a },v0 ) satisfying the ADHM relations, identified under the action of a GL(Va ), parametrize certain objects in D(Coh X s ), holomorphic D-branes on the threefold X s .


Proof. Quiver sheaf data on C parametrize certain branes on X s by Theorem 3.1. The correspondence between representations of the N = 1 AD E quiver and a special class of quiver sheaf data will be discussed in Sect. 4.3. Remark 3.5. As explained in [6], the quiver relations (3.1)-(3.2) come from the natural superpotential of the quiver gauge theory on , involving adjoint fields a as well as bifundamental fields Bab . Remark 3.6. Let the finite N = 1 AD E quiver be obtained from the affine one by deleting the affine node. Representations of the finite N = 1 AD E quiver, satisfying the ADHM-type relations (3.1)-(3.2), parametrize holomorphic D-branes supported on exceptional fibres of f s : X s → X¯ s . This follows from the statement that the vanishing of the affine component of V forces all other Va to be supported on points P ∈ C at which some projected section pη (s) vanishes for some positive root η ∈ R+ , in other words on points of the base curve over which the surface fiber πs−1 (P) contains exceptional curves. Observing that the section s ∈ H 0 (C, Z(C) ⊗ det Q) is central in (V ⊗ det Q∨ , V), so commutes with all components of B, the latter statement is HomC essentially proved in [6, 4.1–4.2]. This establishes a direct link to [14,21], according to which (in the generic case) irreducible representations of the finite N = 1 quiver with the given relations parametrize exceptional components of the contraction f s : X s → X¯ s . 4. Proofs 4.1. The Beilinson argument. The aim of this section is to prove of the equivalence (1) ⇐⇒ (2) of the classification result Theorem 3.1 via an analysis of framed πs -free sheaves on Ps . Given s ∈ H 0 (C, H), recall the sheaf of algebras Ps on the curve C, and the associated non-commutative bundle πs : Ps → C. Define Ps -modules Ti by T0 = OPs , 0 −→ OPs −→ πs∗ (Q ⊕ OC )(1) −→ T1 −→ 0,

(4.3)

$T_2 = \pi_s^*(\det Q)(3)$.

Proposition 4.1. A $\pi_s$-free sheaf $E$ on $P_s$, framed on the divisor $D_s$, is the cohomology of a monad

\[ \pi_s^*\,\mathrm{Ext}^1_C(T_2(-1), E)(-1) \to \pi_s^*\,\mathrm{Ext}^1_C(T_1, E) \to \pi_s^*\,\mathrm{Ext}^1_C(T_0(1), E)(1) \]

of $P_s$-modules.

Proof. Given a $P_s$-module $F$, a Koszul duality argument, analogous to the absolute case in [3, Sect. 7] following [4, Thm. 2.6.1], leads to a Beilinson-type spectral sequence with $E_1$ term

\[ E_1^{p,q} = \pi_s^*\,\mathrm{Ext}^q_C\bigl(T_{-p}(p), F\bigr)(p), \]

nonzero only for $-2 \le p \le 0$, $0 \le q \le 2$, converging to $F$ in the limit. The vanishing results

\[ \mathrm{Ext}^q_C\bigl(T_{-p}(p), E(-1)\bigr) = 0 \quad \text{for } q = 0, 2,\ p = -1, -2, \]

which follow from the existence of the framing of $E$ on the divisor $D_s$ (compare [13, Lemma 6.2], [3, Lemma 4.2.12]), reduce the spectral sequence for $F = E(-1)$ to the monad given in the statement. Details are left to the reader.

We also record an auxiliary result.

Lemma 4.2. There are natural isomorphisms

\[ \mathrm{Hom}_{P_s}\bigl(\pi_s^* \det Q^\vee, \pi_s^* Q^\vee(1)\bigr) \cong \mathrm{Hom}_{P_s}\bigl(\pi_s^* Q^\vee, \mathcal{O}_{P_s}(1)\bigr) \cong \mathrm{Hom}_C\bigl(Q^\vee, Q^\vee \oplus \mathcal{O}_C\bigr). \]

Proof. The first isomorphism follows from Lemma 4.3 below. The second one follows from adjunction for the pair $(\pi_s^*, \pi_{s*})$, together with $\pi_{s*}\mathcal{O}_{P_s}(1) \cong \mathcal{P}_{s,1} \cong (Q^\vee \oplus \mathcal{O}_C) * \mathbb{C}\Gamma \in \mathrm{Coh}_\Gamma(C)$, an identity well known from the commutative context.

Lemma 4.3. Let $Q$ be a rank-two bundle on a (commutative) space. Then there is a natural isomorphism $Q \otimes \det Q^\vee \cong Q^\vee$.

Proof. The embedding $\iota : \det Q^\vee \to Q^\vee \otimes Q^\vee$ induces a natural map

\[ \mathrm{Hom}\bigl(\det Q^\vee, \det Q^\vee\bigr) \to \mathrm{Hom}\bigl(\det Q^\vee, (Q^\vee)^{\otimes 2}\bigr) \cong \mathrm{Hom}\bigl(Q \otimes \det Q^\vee, Q^\vee\bigr). \]

The image of the identity of the first Hom-group gives a natural morphism as in the statement, which can be checked on a local basis to be an isomorphism.

Now return to the context of the classification result Theorem 3.1, and consider a quintuple $(V, W, B, I, J)$ as in Theorem 3.1(2); recall that $W, V \in \mathrm{Coh}_\Gamma(C)$, and

\[ B \in \mathrm{Hom}^\Gamma_C(V \otimes Q^\vee, V), \qquad I \in \mathrm{Hom}^\Gamma_C(W, V), \qquad J \in \mathrm{Hom}^\Gamma_C(V \otimes \det Q^\vee, W). \]

Let

\[ c \in \mathrm{Hom}_{P_s}\bigl(\pi_s^* \det Q^\vee, \pi_s^* Q^\vee(1)\bigr), \qquad d \in \mathrm{Hom}_{P_s}\bigl(\pi_s^* Q^\vee, \mathcal{O}_{P_s}(1)\bigr) \]

denote the images, under the isomorphisms of Lemma 4.2, of the canonical element

\[ \mathrm{Id} \in \mathrm{Hom}_C(Q^\vee, Q^\vee) \subset \mathrm{Hom}_C\bigl(Q^\vee, Q^\vee \oplus \mathcal{O}_C\bigr). \]

Note also that we have a fixed section $z \in \mathrm{Hom}_{P_s}(\mathcal{O}_{P_s}, \mathcal{O}_{P_s}(1))$.


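Define

\[ a = \begin{pmatrix} \pi_s^*\bigl(B \circ (\mathrm{Id}_V \otimes \iota)\bigr) \otimes z - \pi_s^*(\mathrm{Id}_V) \otimes c \\ \pi_s^*(J) \otimes z \end{pmatrix}(-1) \;:\; \pi_s^*(V \otimes \det Q^\vee)(-1) \longrightarrow \pi_s^*(V \otimes Q^\vee \oplus W), \]

where $\iota : \det Q^\vee \to (Q^\vee)^{\otimes 2}$ is the natural map. Define similarly

\[ b = \begin{pmatrix} \pi_s^*(B) \otimes z + \pi_s^*(\mathrm{Id}_V) \otimes d & \pi_s^*(I) \otimes z \end{pmatrix} \;:\; \pi_s^*(V \otimes Q^\vee \oplus W) \longrightarrow \pi_s^*(V)(1), \]

to obtain the chain of morphisms

\[ \pi_s^*(V \otimes \det Q^\vee)(-1) \xrightarrow{\ a\ } \pi_s^*(V \otimes Q^\vee \oplus W) \xrightarrow{\ b\ } \pi_s^*(V)(1). \tag{4.4} \]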

The following result completes the proof of the equivalence (1) ⇐⇒ (2) of the classification result Theorem 3.1. Proposition 4.4. If the quintuple satisfies the ADHM relation, then (4.4) is a complex of Ps -modules. Furthermore, it is a monad defining a framed πs -free sheaf E if and only if the quintuple (V, W, B, I, J ) is non-degenerate. Conversely, every πs -free Ps -module E, framed on Ds , arises from this construction. Proof. The standard direct computation shows that b ◦a = 0 is equivalent to the ADHM relation. The proof of the equivalence of the monad property and non-degeneracy is analogous to the absolute case [3, Sect. 4.1]. For the converse, given a framed sheaf (E, ϕ), 1 (O (1), E). Then by Proposition 4.1, E is the middle cohomology of the let V = ExtC Ps monad 1 πs∗ (V ⊗ det Q∨ )(−1) → πs∗ ExtC (T1 , E) → πs∗ V(1).

The usual arguments [13, Theorem 6.7] show that, since E is framed on Ds , this monad is isomorphic to a monad of the form (4.4) for some quintuple (V, W, B, I, J ). 4.2. A derived equivalence. In this section we complete the proof of Theorem 3.1 by establishing the missing link (1) =⇒ (4). Proposition 4.5. Let s ∈ S be a deformation parameter of the central fibre X = X 0 . There is a distinguished equivalence of triangulated categories D(Coh X s ) ∼ = D(Mod As ), where Mod As is the category of sheaves of finitely generated right As -modules, and D(−) denotes the bounded derived category on both sides. Proof. This assertion is a fibered version of the analogous two-dimensional equivalence proved in [11], and the proof carries over verbatim. A deformation argument starting from the central fibre X = X 0 shows that a certain specific component Ms of a fine moduli space of torsion sheaves on As maps by a semi-small birational map to the singular variety X¯ s . By [20], generalizing an argument of [5], this implies that Ms is a crepant resolution of X¯ s , and one has a derived equivalence D(Coh Ms ) ∼ = D(Mod As ) defined by the universal sheaf. But since X s is the unique crepant resolution of X¯ s , necessarily Ms ∼ = X s and the proposition follows. Details are left to the reader.


This equivalence gives the mapping (1) =⇒ (4) of Theorem 3.1 from framed πs free sheaves on Ps to objects in D(Coh X s ). Indeed, a right Ps -module can be restricted to the affine part As to give a right As -module, and then mapped using the derived equivalence to an object in D(Coh X s ), in other words a holomorphic D-brane on X s . 4.3. Fibrations over the affine line. In this section, we take a fibration X s → C ∼ = A1 and discuss the proof of Theorem 3.4. From Theorem 3.1, we know that certain holomorphic D-branes on X s are classified by non-degenerate quintuples (V, W, B, i, j) satisfying the ADHM equation. Consider the subclass of representations in Coh(A1 ) with the simplest possible framing W ∼ = OA1 and V a torsion -sheaf on A1 . It follows that J = 0 and I ∈ H 0 (A1 , V ). Decompose V and the map B into -components to obtain torsion sheaves Va and sheaf homomorphisms Bab : Va → Vb indexed by nodes and edges of the McKay quiver. Set Va = H 0 (A1 , Va ), and let Bab = H 0 (Bab ) : Va → Vb be the map on global sections induced by Bab . Let v0 ∈ V0 be the section corresponding to I. Let also a : Va → Va be the map induced by multiplication by the section t ∈ H 0 (A1 , OA1 ) ∼ = C[t]. Theorem 3.4 follows from Theorem 3.1, together with Proposition 4.6. The map (V, OC , B, 0, 0) → ({Va }, {Bab }, {a }, v0 ∈ V0 ) sets up a one-to-one correspondence from this restricted set of quiver ADHM data to representations of the affine N = 1 ADE quiver satisfying the relations (3.1)-(3.2). Proof. Given (V, B), the edge relations (3.2) a Bba = Bba b for the data ({Va }, {Bab }, {a }) hold by definition. Further, the ADHM equation for (V, B) is B ∧ B + s = 0 ∈ Hom(V, V ⊗ det Q), which in -components says that ab Bba ◦ Bab + sa = 0 ∈ Hom(Va , Va ). b

Replacing sa by the polynomial a , and remembering that the effect of t ∈ H 0 (OA1 ) on H 0 (V) is exactly a , for global sections we obtain ab Bba ◦ Bab + a (a ) = 0 ∈ Hom(Va , Va ) b

which is exactly relation (3.1) for the node a. Conversely, given a representation ({Va }, {Bab }, {a }, v0 ∈ V0 ) of the N = 1 ADE quiver, define torsion sheaves attached to the nodes by

1⊗t−a ⊗1 Va = coker Va ⊗ OA1 −−−−−−→ Va ⊗ OA1 . Using Lemma 4.7 below, for adjacent nodes a, b we have a diagram 1⊗t−a ⊗1

0 −−−−→ Va ⊗ OA1 −−−−−−→ Va ⊗ OA1 −−−−→ Va −−−−→ 0 ⏐ ⏐ ⏐ ⏐ Bab ⊗1 Bab ⊗1 1⊗t−b ⊗1

0 −−−−→ Vb ⊗ OA1 −−−−−−→ Vb ⊗ OA1 −−−−→ Vb −−−−→ 0


which, by commutativity a Bba = Bba b , induces a map Bab : Va → Vb . The converse of the above argument shows that the ADHM relation follows from the relations (3.1). By Lemma 4.7, the two constructions are inverses to each other. The proof used the elementary Lemma 4.7. Given a torsion sheaf V on A1 = Spec C[t], let V = H 0 (A1 , V) and let : V → V be the map given by multiplication by t ∈ H 0 (OA1 ). Then the sequence of sheaves 1⊗t−⊗1

c

0 −−−−→ V ⊗ OA1 −−−−−−→ V ⊗ OA1 −−−−→ V −−−−→ 0 is exact on A1 , where c : H 0 (V) ⊗ OA1 → V is the canonical map. Conversely, given a vector space with an endomorphism (V, ), the exact sequence defines a torsion sheaf V on A1 , and the two constructions are mutual inverses. Remark 4.8. In this lemma, V ∼ = O Z is a structure sheaf of a 0-dimensional subscheme Z ⊂ A1 if and only if is a regular endomorphism. Their moduli space is Mat(n, C)//GL(n, C) ∼ = (A1 )[n] , = {regular endomorphisms}/GL(n, C) ∼ = An ∼ where the map is given by taking the coefficients of the characteristic polynomial of , which is also the equation of the corresponding subscheme. Acknowledgements. Thanks to Sheldon Katz, Eduard Looijenga, Tom Nevins and Tony Pantev for helpful remarks and correspondence. Special thanks to Ian Grojnowski for many conversations on subjects related to this paper. Support by a European Union Marie Curie Individual Fellowship and by OTKA grant #046878 is also gratefully acknowledged.

References 1. Artin, M., Zhang, J.: Non-commutative projective schemes. Adv. Math. 109, 228–287 (1994) 2. Atiyah, M.F., Drinfeld, V., Hitchin, N., Manin, Yu.: Geometric construction of instantons. Phys. Lett. A 65, 185–187 (1978) 3. Baranovski, V., Ginzburg, V., Kuznetsov, A.: Quiver varieties and a non-commutative P2 . Compositio Math. 134, 283–318 (2002) 4. Beilinson, A., Ginzburg, V., Soergel, C.: Koszul duality patterns in representation theory. J. Am. Math. Soc. 9, 473–527 (1996) 5. Bridgeland, T., King, A., Reid, M.: The McKay correspondence as an equivalence of derived categories. J. Am. Math. Soc. 14, 535–554 (2001) 6. Cachazo, F., Katz, S., Vafa, C.: Geometric transitions and N = 1 quiver theories. http://arxiv.org/list/ hep-th/0108120, 2001 7. Cachazo, F., Fiol, B., Intriligator, K., Katz, S., Vafa, C.: A geometric unification of dualities. Nucl. Phys. B628, 3–78 (2002) 8. Crawley-Boevey, W., Holland, M.P.: Non-commutative deformations of Kleinian surface singularities. Duke Math. J. 92, 605–635 (1998) 9. Diaconescu, D.-E., Dijkgraaf, R., Donagi, R., Hofman, C., Pantev, T.: Geometric transitions and integrable systems. Nucl. Phys. B752, 329–390 (2006) 10. Donaldson, S.: Instantons and geometric invariant theory. Commun. Math. Phys. 93, 453–460 (1984) 11. Gordon, I., Smith, S.P.: Representations of symplectic reflection algebras and resolutions of deformations of symplectic quotient singularities. Math. Ann. 330, 185–200 (2004) 12. Gothen, P., King, A.: Homological algebra of twisted quiver bundles. J. London Math. Soc. 71, 85–99 (2005) 13. Kapustin, A., Kuznetsov, A., Orlov, D.: Non-commutative instantons and twistor transform. Commun. Math. Phys. 221, 385–432 (2001)


14. Katz, S.: AD E geometry and dualities. Minicourse, Workshop on Algebraic Geometry and Physics, Lisbon, September 2004 15. Kronheimer, P., Nakajima, H.: Yang–Mills instantons on ALE gravitational instantons. Math. Ann. 288, 263–307 (1990) 16. Nakajima, H.: Instantons on ALE spaces, quiver varieties and Kac–Moody algebras. Duke M. J. 76, 365–416 (1994) 17. Nakajima, H.: Heisenberg algebra and Hilbert scheme of surfaces. Ann. of Math. 145, 379–388 (1997) 18. Nekrasov, N., Schwarz, J.: Instantons on noncommutative R4 , and (2, 0) superconformal six dimensional theory. Commun. Math. Phys. 198, 689–703 (1998) 19. Szendr˝oi, B.: Artin group actions on derived categories of coherent sheaves. J. Reine Angew. Math. 572, 139–166 (2004) 20. Van den Bergh, M.: Non-commutative crepant resolutions. The legacy of Niels Henrik Abel, Berlin: Springer, 2004, pp. 749–770 21. Zhu, X.: Representations of N = 1 AD E quivers via reflection functors. Mich. Math. J. 54, 671–686 (2006) Communicated by M.R. Douglas

Commun. Math. Phys. 278, 643–678 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0409-x

Communications in

Mathematical Physics

Asymptotics of the Airy-Kernel Determinant P. Deift1 , A. Its2 , I. Krasovsky2,3 1 Courant Institute of Mathematical Sciences, New York, NY 10003, USA 2 Department of Mathematical Sciences, Indiana University – Purdue University Indianapolis,

Indianapolis, IN 46202-3216, USA. E-mail: [email protected]

3 Department of Mathematical Sciences, Brunel University, Uxbridge UB83PH, UK

Received: 15 September 2006 / Accepted: 19 July 2007 Published online: 11 January 2008 – © Springer-Verlag 2008

Abstract: The authors use Riemann-Hilbert methods to compute the constant that arises in the asymptotic behavior of the Airy-kernel determinant of random matrix theory.

1. Introduction

Let $K_s$ be the trace-class operator with kernel

\[ K_s(t,u) = \frac{\mathrm{Ai}(t)\,\mathrm{Ai}'(u) - \mathrm{Ai}(u)\,\mathrm{Ai}'(t)}{t-u} \tag{1} \]

(see [31]) acting on $L^2(-s,\infty)$. Here $\mathrm{Ai}(x)$ is the Airy function (see, e.g., [1]). In this paper we are concerned with the behavior of $\det(I - K_s)$ as $s \to +\infty$. Our main result is the following.

Theorem 1. The large-$s$ asymptotic behavior of the Fredholm determinant $\det(I - K_s)$ is given by the formula

\[ \ln\det(I - K_s) = -\frac{s^3}{12} - \frac{1}{8}\ln s + \chi + O(s^{-3/2}), \tag{2} \]

where

\[ \chi = \frac{1}{24}\ln 2 + \zeta'(-1), \tag{3} \]

and $\zeta(s)$ is the Riemann zeta-function.

The Airy-kernel determinant $\det(I - K_s)$ is the edge scaling limit for the largest eigenvalue of a random $n \times n$ Hermitian matrix $H$ from the Gaussian Unitary Ensemble (GUE) (see [29,31]) as $n \to \infty$. More precisely, if $\lambda_1(H) \ge \lambda_2(H) \ge \cdots \ge \lambda_n(H)$ denote the eigenvalues of $H$, then

\[ \det(I - K_s) = \lim_{n\to\infty} \mathrm{Prob}\,\bigl\{H \in \mathrm{GUE} : (\lambda_1(H) - \sqrt{2n})\,2^{1/2} n^{1/6} \le -s\bigr\} \tag{4} \]

(see [21,31], and also [8] for some history of (4)). This determinant also describes the distribution of the longest increasing subsequence of random permutations [3,25]. Namely, let $\pi = i_1 i_2 \cdots i_n$ be a permutation in the group $S_n$ of permutations of $1, 2, \ldots, n$. Then a subsequence $i_{k_1}, i_{k_2}, \ldots, i_{k_r}$, $k_1 < k_2 < \cdots < k_r$, of $\pi$ is called an increasing subsequence of length $r$ if $i_{k_1} < i_{k_2} < \cdots < i_{k_r}$. Let $l_n(\pi)$ denote the length of a longest increasing subsequence of $\pi$ and let $S_n$ have the uniform probability distribution. Then $l_n(\pi)$ is a random variable, and [3]

\[ \det(I - K_s) = \lim_{n\to\infty} \mathrm{Prob}\,\bigl\{\pi \in S_n : (l_n(\pi) - 2\sqrt{n})\,n^{-1/6} \le -s\bigr\}. \tag{5} \]
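The statistic in (5) is easy to sample, which gives a rough feel for the limit law. The following is a minimal sketch, not part of the original argument: the function names, the seed and the sample sizes are illustrative assumptions, and the length $l_n(\pi)$ is computed by the standard patience-sorting technique.

```python
import bisect
import random


def longest_increasing_subsequence_length(perm):
    """Length l_n(pi) of a longest strictly increasing subsequence (patience sorting)."""
    # tails[r] is the smallest possible last entry of an increasing
    # subsequence of length r + 1 found so far.
    tails = []
    for value in perm:
        pos = bisect.bisect_left(tails, value)
        if pos == len(tails):
            tails.append(value)
        else:
            tails[pos] = value
    return len(tails)


def scaled_lis_samples(n, trials, seed=0):
    """Samples of (l_n(pi) - 2 sqrt(n)) n^(-1/6) for uniformly random pi in S_n."""
    rng = random.Random(seed)
    perm = list(range(1, n + 1))
    samples = []
    for _ in range(trials):
        rng.shuffle(perm)
        l_n = longest_increasing_subsequence_length(perm)
        samples.append((l_n - 2 * n ** 0.5) * n ** (-1 / 6))
    return samples


def empirical_cdf(samples, x):
    """Fraction of samples that are <= x; a finite-n stand-in for det(I - K_{-x})."""
    return sum(1 for v in samples if v <= x) / len(samples)
```

For example, `empirical_cdf(scaled_lis_samples(400, 200), -1.0)` estimates the right-hand side of (5) at $s = 1$ for $n = 400$. Convergence in $n$ is slow, so the numbers are only indicative.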

The distribution $F_{TW}(x) \equiv \det(I - K_{-x})$, known as the Tracy-Widom distribution, admits the following integral representation [31]:

\[ F_{TW}(x) = \exp\Bigl\{ -\int_x^\infty (y - x)\,u^2(y)\,dy \Bigr\}, \tag{6} \]

where $u(y)$ is the (global) Hastings-McLeod solution of the Painlevé II equation

\[ u''(y) = y\,u(y) + 2u^3(y), \tag{7} \]

specified by the following asymptotic condition:

\[ u(y) \sim \mathrm{Ai}(y) \quad \text{as } y \to +\infty. \tag{8} \]

The behavior of $u(y)$ as $y \to -\infty$ is given by the relation [22]:

\[ u(y) = \sqrt{-\frac{y}{2}}\,\Bigl(1 + \frac{1}{8y^3} + O\bigl(y^{-6}\bigr)\Bigr), \quad y \to -\infty, \tag{9} \]

from which one learns that as $s \to +\infty$,

\[ -\int_{-s}^{\infty} (s + y)\,u^2(y)\,dy + \frac{s^3}{12} + \frac{1}{8}\ln s = as + b + o(1) \tag{10} \]

for some constants $a$, $b$. The content of Theorem 1 is that $a = 0$ and $b = \chi$ as in (3).

The value (3) of the constant $\chi$ was conjectured by Tracy and Widom in [31] on the basis of the numerical evaluation of the l.h.s. of (10) as $s \to +\infty$ and by taking into account the Dyson formula for a similar constant in the asymptotics of the so-called sine-kernel determinant [31]. The sine-kernel determinant describes the gap probability for GUE in the bulk scaling limit as $n \to \infty$ [29]. Dyson's conjecture for the constant in the asymptotics of the sine-kernel determinant was proved rigorously in independent work by Ehrhardt [19] and one of the authors [26], and a third proof was given later in [17]. The two latter works use a Riemann-Hilbert-problem approach. The proof in [26] relies on a priori information from [33], whereas the proof in [17] is self-contained. The proof of Theorem 1 in this paper follows the method in [17].
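The step from (9) to (10) can be made explicit. The following sketch is an added illustration rather than part of the original text; the cut-off $M > 0$ is an auxiliary constant introduced here, and the remainder terms are treated only heuristically.

```latex
% Squaring (9) gives, as y -> -infinity,
\[
  u^2(y) = -\frac{y}{2} - \frac{1}{8y^2} + O\bigl(|y|^{-5}\bigr).
\]
% For a fixed M > 0 and s > M, the two leading terms integrate to
\[
  \int_{-s}^{-M} (s+y)\Bigl(-\frac{y}{2}\Bigr)\,dy
    = \frac{s^3}{12} - \frac{sM^2}{4} + \frac{M^3}{6},
\]
\[
  \int_{-s}^{-M} (s+y)\Bigl(-\frac{1}{8y^2}\Bigr)\,dy
    = \frac{1}{8}\ln s - \frac{s}{8M} + \frac{1}{8} - \frac{1}{8}\ln M .
\]
% The O(|y|^{-5}) remainder on (-s,-M) and the integral over (-M,\infty)
% (where u decays like Ai by (8)) each contribute a term of the form
% (const) s + (const) + o(1). Hence
\[
  \int_{-s}^{\infty} (s+y)\,u^2(y)\,dy
    = \frac{s^3}{12} + \frac{1}{8}\ln s - (as + b) + o(1),
\]
% which is (10) after moving the cubic and logarithmic terms to the left.
```

The cubic and logarithmic terms are thus fixed by (9) alone, while the constants $a$ and $b$ are not. Determining them is the content of Theorem 1.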


As discussed in [17], the key difficulty in evaluating constants such as $\chi$ in (2) in the asymptotic expansion of the determinants is that in the course of the analysis one most naturally obtains expressions only for the logarithmic derivative with respect to some auxiliary parameter, say $\alpha$, in the problem, and not the determinant itself. After evaluation of these expressions asymptotically, the constant of integration remains undetermined. In [17] and [26], this difficulty is overcome by utilizing a scaling limit of finite-$n$ random matrices together with universality in the sense of random matrix theory (see, e.g., [15]), in a way that is inspired by, but different from, Dyson [18].

We proceed as follows. Consider the scaled Laguerre polynomials $p_k(x)$ defined for some integer $n$ by the orthogonality relation

\[ \int_0^\infty e^{-4nx}\,p_k(x)\,p_m(x)\,dx = \delta_{k,m}, \qquad k, m = 0, 1, \ldots. \tag{11} \]

The polynomial $p_k(x) = \varkappa_k x^k + \cdots$ is of degree $k$ and is related to the standard Laguerre polynomial $L_k^{(0)}(x)$ (see [30]) as follows:

\[ p_k(x) = 2\sqrt{n}\,L_k^{(0)}(4nx), \quad \text{with leading coefficient} \quad \varkappa_k = (-1)^k\,\frac{2\sqrt{n}\,(4n)^k}{k!}. \tag{12} \]
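The normalization in (11) and (12) can be checked exactly with rational arithmetic. The sketch below is an added illustration with hypothetical helper names. It uses only the explicit monomial expansion of $L_k^{(0)}$ and the elementary moments $\int_0^\infty x^r e^{-4nx}\,dx = r!/(4n)^{r+1}$. The irrational factor $2\sqrt{n}$ enters only through its square $4n$.

```python
from fractions import Fraction
from math import comb, factorial


def scaled_laguerre_coeffs(k, n):
    """Coefficients c_0..c_k of L_k^{(0)}(4 n x) = p_k(x) / (2 sqrt(n)) in powers of x."""
    return [Fraction((-1) ** j * comb(k, j) * (4 * n) ** j, factorial(j))
            for j in range(k + 1)]


def moment(r, n):
    """Exact integral of x^r e^{-4 n x} over (0, infinity), i.e. r! / (4n)^(r+1)."""
    return Fraction(factorial(r), (4 * n) ** (r + 1))


def inner_product(k, m, n):
    """Exact value of the left-hand side of (11) for p_k and p_m."""
    ck = scaled_laguerre_coeffs(k, n)
    cm = scaled_laguerre_coeffs(m, n)
    total = sum(a * b * moment(i + j, n)
                for i, a in enumerate(ck)
                for j, b in enumerate(cm))
    return 4 * n * total  # (2 sqrt(n))^2 = 4 n


def leading_coeff_squared(k, n):
    """Square of the leading coefficient in (12): 4 n (4 n)^(2k) / (k!)^2."""
    return Fraction(4 * n * (4 * n) ** (2 * k), factorial(k) ** 2)
```

With these helpers, `inner_product(k, m, n)` returns exactly 1 when `k == m` and 0 otherwise, for any positive integer `n`.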

The scaling here is chosen so that the asymptotic density of zeros of the polynomial $p_n(x)$ (with index $n$) is supported on the interval $(0, 1)$ (as opposed to $(0, 4n)$ for $L_n^{(0)}(x)$). See [30,16] and below.

In the unitary random matrix ensemble defined by the Laguerre weight, the distribution function of the eigenvalues is given by the expression:

\[ dP(x_0, \ldots, x_{n-1}) = \frac{1}{C_n\,n!} \prod_{0 \le i < j \le n-1} (x_i - x_j)^2 \prod_{j=0}^{n-1} e^{-4x_j n}\,dx_j, \tag{13} \]

where the normalization constant

\[ C_n = \frac{1}{n!} \int_0^\infty \cdots \int_0^\infty \prod_{0 \le i < j \le n-1} (x_i - x_j)^2 \prod_{j=0}^{n-1} e^{-4x_j n}\,dx_j. \tag{14} \]

By a well known identity (e.g. [30,7]), the r.h.s. of the above expression gives

\[ C_n = \prod_{k=0}^{n-1} \varkappa_k^{-2} = (4n)^{-n^2} \prod_{k=0}^{n-1} k!^2, \tag{15} \]

where (12) was used. For $\alpha \ge 0$, the probability $D_n(\alpha)$ that the interval $(\alpha, \infty)$ has no eigenvalues is given by

\[ D_n(\alpha) = \frac{1}{C_n\,n!} \int_0^\alpha \cdots \int_0^\alpha \prod_{0 \le i < j \le n-1} (x_i - x_j)^2 \prod_{j=0}^{n-1} e^{-4x_j n}\,dx_j. \tag{16} \]


By standard arguments (cf. [7,29]), this quantity can be written as the Fredholm determinant of an integral operator on $L^2(0, \infty)$ in the following way:

\[ D_n(\alpha) = \det(I - K_n \chi_{(\alpha,\infty)}), \qquad K_n(x, y) = \frac{1}{4}\,\frac{\omega_n(x)\,\omega_{n-1}(y) - \omega_n(y)\,\omega_{n-1}(x)}{y - x}, \tag{17} \]

where

\[ \omega_k(x) = e^{-2nx} p_k(x), \qquad k = 0, 1, \ldots, \tag{18} \]

and $\chi_{(\alpha,\infty)}$ is the characteristic function of the interval $(\alpha, \infty)$. If $x = 1 + 1/(2n) + u/(2n)^{2/3}$ with $u$ fixed, then as $n \to \infty$, we obtain from classical results on the asymptotics of the Laguerre polynomials (see [32,30]):

\[
\begin{aligned}
\omega_n\Bigl(1 + \frac{1}{2n} + \frac{u}{(2n)^{2/3}}\Bigr)
 &= \omega_n\Bigl(\frac{1}{4n}\bigl(4n + 2 + 2(2n)^{1/3} u\bigr)\Bigr)
  = (-1)^n\,\frac{2\sqrt{n}}{(2n)^{1/3}}\,\bigl\{\mathrm{Ai}(u) + O(n^{-2/3})\bigr\}; \\
\omega_{n-1}\Bigl(1 + \frac{1}{2n} + \frac{u}{(2n)^{2/3}}\Bigr)
 &= \omega_{n-1}\Bigl(\frac{1}{4n}\Bigl(4(n-1) + 2 + 2(2[n-1])^{1/3}\Bigl[u + \frac{2}{(2n)^{1/3}} + O(n^{-2/3})\Bigr]\Bigr)\Bigr) \\
 &= (-1)^{n-1}\,\frac{2\sqrt{n}}{(2[n-1])^{1/3}}\,\Bigl\{\mathrm{Ai}\Bigl(u + \frac{2}{(2n)^{1/3}}\Bigr) + O(n^{-2/3})\Bigr\},
\end{aligned}
\tag{19}
\]

where $\mathrm{Ai}(x)$ is the standard Airy function. Let

\[ K_{airy}(u, v) = \frac{\mathrm{Ai}(u)\,\mathrm{Ai}'(v) - \mathrm{Ai}(v)\,\mathrm{Ai}'(u)}{u - v}. \]

Set

\[ u^{(n)} = 1 + \frac{1}{2n} + \frac{u}{(2n)^{2/3}}, \qquad v^{(n)} = 1 + \frac{1}{2n} + \frac{v}{(2n)^{2/3}}. \]

It follows from (19) that for any fixed $u$, $v$ we have

\[ \lim_{n \to \infty} \frac{1}{(2n)^{2/3}}\,K_n\bigl(u^{(n)}, v^{(n)}\bigr) = K_{airy}(u, v). \]

In fact, this asymptotics is uniform for $u, v \ge L_0$, where $L_0$ is an arbitrary constant. Indeed, for any $L_0$ there exist $C = C(L_0) > 0$, $c = c(L_0) > 0$ such that

\[ \Bigl| \partial_u^j \partial_v^k \Bigl[ \frac{1}{(2n)^{2/3}}\,K_n\bigl(u^{(n)}, v^{(n)}\bigr) - K_{airy}(u, v) \Bigr] \Bigr| \le C\,\frac{e^{-cu} e^{-cv}}{n^{2/3}}, \qquad u, v \ge L_0, \quad j, k = 0, 1. \tag{20} \]

This estimate can be proved in the same manner as estimate (3.8) in [8]. In [8] the authors use global estimates for orthogonal polynomials on $\mathbb{R}$ taken from [16]; here the relevant global estimates can be obtained from [32].
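The closed form of $K_n$ in (17) is the Christoffel-Darboux formula for the sum $\sum_{k=0}^{n-1} \omega_k(x)\omega_k(y)$, with the ratio $\varkappa_{n-1}/\varkappa_n = -1/4$ read off from (12). The following sketch checks this identity exactly for rational arguments. It is an added illustration: the function names are hypothetical, and the common factor $e^{-2n(x+y)}$ is dropped from both sides, so the quantities compared are $\sum_{k<n} p_k(x)p_k(y)$ and the corresponding polynomial quotient.

```python
from fractions import Fraction


def laguerre_values(n_max, t):
    """[L_0(t), ..., L_{n_max}(t)] from (k+1) L_{k+1} = (2k+1-t) L_k - k L_{k-1}."""
    vals = [Fraction(1), 1 - t]
    for k in range(1, n_max):
        vals.append(((2 * k + 1 - t) * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals[: n_max + 1]


def kernel_sum(n, x, y):
    """sum_{k<n} p_k(x) p_k(y), using p_k(x) p_k(y) = 4 n L_k(4nx) L_k(4ny)."""
    a = laguerre_values(n, 4 * n * x)
    b = laguerre_values(n, 4 * n * y)
    return 4 * n * sum(a[k] * b[k] for k in range(n))


def kernel_closed_form(n, x, y):
    """(1/4) (p_n(x) p_{n-1}(y) - p_n(y) p_{n-1}(x)) / (y - x), as in (17)."""
    a = laguerre_values(n, 4 * n * x)
    b = laguerre_values(n, 4 * n * y)
    numerator = 4 * n * (a[n] * b[n - 1] - b[n] * a[n - 1])
    return Fraction(1, 4) * numerator / (y - x)
```

Both functions expect `Fraction` arguments with `x != y` and `n >= 1`. Multiplying either result by `math.exp(-2 * n * (x + y))` recovers $K_n(x, y)$ itself.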


As in [8], estimate (20) immediately implies that for any fixed $s \in \mathbb{R}$,

\[ \lim_{n \to \infty} D_n\Bigl(1 - \frac{s}{(2n)^{2/3}}\Bigr) = \det\bigl(I - K_{airy}\,\chi_{(-s,\infty)}\bigr). \tag{21} \]

Below we obtain the asymptotics of the determinant $\det(I - K_{airy}\,\chi_{(-s,\infty)}) \equiv \det(I - K_s)$ as $s \to +\infty$. In order to do this, we analyze the asymptotics of (17) for all $\alpha$ from $\alpha$ close to zero to $\alpha = 1 - s/(2n)^{2/3}$. Note that the determinant (17) has the structure of so-called integrable determinants [23]. Therefore, it is not surprising that there exists a differential identity for $\frac{d}{d\alpha} \ln D_n(\alpha)$ in terms of the solution of a related Riemann-Hilbert problem. Solving the Riemann-Hilbert problem asymptotically as $n \to \infty$, we find the asymptotics of this logarithmic derivative uniform for $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$, $(2n)^{2/3} > s_0$, for some (large) $s_0 > 0$. Integrating these asymptotics from $\alpha$ close to zero to $\alpha = 1 - s/(2n)^{2/3}$, $s_0 < s < (2n)^{2/3}$, we obtain the asymptotics of $D_n(1 - s/(2n)^{2/3})$ provided we know the asymptotics of $D_n(\alpha)$ for $\alpha$ close to zero. The latter, however, is readily obtained from the series expansion of the multiple integral formula for $D_n(\alpha)$ (see (22, 27) below).

More precisely, the "inner workings" of the method in this paper (cf. also (133) in [17]) can be seen from formula (161) below, which is obtained by integrating the derivative $(d/d\alpha') \ln D_n(\alpha')$ from $\alpha' = \alpha_0$ to $\alpha' = \alpha$. The key fact is that the estimate on the derivative is uniform for $0 \le \alpha \le 1 - s/(2n)^{2/3}$, $s > s_0$ (see (152, 153)). This leads to the error estimate $O(1/(n(1 - \alpha)^{3/2}))$ in (161). Using (27), we can then let $\alpha_0 \to 0$: the singularities on the l.h.s. and the r.h.s. of (161) cancel out, and we are left with (162). Using (162), we immediately obtain Theorem 1. Note that in our calculations formula (3) for $\chi$ does not arise from an evaluation of $D_n(\alpha_0)$ as $n \to \infty$ for some fixed $\alpha_0$. Rather it arises, somewhat paradoxically, from the behavior of $D_n(\alpha_0)$ as $\alpha_0 \to 0$ with $n$ fixed as given in (27).

In Sect. 2 the series expansion of $D_n(\alpha)$ for $n$ fixed and $\alpha \to 0$ is derived, as indicated above. In Sect. 3 we obtain an asymptotic ($n \to \infty$) solution of the Riemann-Hilbert problem related to (17). Moreover, in Sect. 3, a differential identity for $\frac{d}{d\alpha} \ln D_n(\alpha)$ is obtained in terms of the matrix elements (and their first derivatives) of the solution to the Riemann-Hilbert problem at the point $\alpha$. An alternative derivation of this identity, which is closer to the spirit of integrable systems and $\tau$-functions (see, e.g., [5,9,23]), is given in the Appendix. The identity is then evaluated asymptotically in Sect. 4 using asymptotics found in Sect. 3. In Sect. 5 the identity is integrated, and the results of Sect. 2 are then used to complete the proof.

Remark. Universality allows for considerable freedom in the choice of the approximating ensemble in the above method. We choose to consider the Laguerre ensemble, although we could have considered, for example, GUE itself: for GUE, however, the analysis turns out to be algebraically more complicated. (For example, in the GUE case there will be two endpoints instead of one endpoint at $x = 1$, see (12) et seq.) In choosing the approximating ensemble, it is essential that the various constants that arise can be evaluated explicitly as in (27) and also in formula (17) in [17]. In both cases we see that ultimately the formula for the desired constant arises from classical formulae for the Legendre polynomials.

In physics, and also in mathematical physics, universality is often viewed as a passive statement that certain systems "behave in a similar fashion". The thrust of this paper, going back to Dyson [18], is that universality can be used as an active analytical tool to obtain estimates for asymptotic problems of mathematical and physical interest.


Addendum. We draw the attention of the reader to the work [2] of Baik, Buckingham, and DiFranco, in which the authors give a different proof of (3) together with related results for GOE and GSE. The paper [2] appeared after our paper was written and refereed.

2. Expansion of $D_n(\alpha)$ as $\alpha \to 0$

In this section we derive a series expansion for $D_n(\alpha)$ as $\alpha \to 0$. Changing the variables $x_j = (\alpha/2)(t_j + 1)$ and expanding the exponent in (16), we obtain for fixed $n$:
\[
D_n(\alpha) = \frac{1}{C_n}\,\frac{1}{n!}\left(\frac{\alpha}{2}\right)^{n(n-1)+n}\int_{-1}^{1}\cdots\int_{-1}^{1}\prod_{0\le i<j\le n-1}(t_i - t_j)^2\prod_{j=0}^{n-1}\bigl(1 - 2\alpha n(t_j+1) + O(\alpha^2)\bigr)\,dt_j
= \frac{1}{C_n}\left(\frac{\alpha}{2}\right)^{n^2}A_n\,(1 + O_n(\alpha)), \tag{22}
\]

where
\[
A_n = \frac{1}{n!}\int_{-1}^{1}\cdots\int_{-1}^{1}\prod_{0\le i<j\le n-1}(t_i - t_j)^2\prod_{j=0}^{n-1}dt_j \tag{23}
\]
can be expressed in terms of the product of the leading coefficients (cf. (14, 15)) of the Legendre polynomials:
\[
A_n = \prod_{k=0}^{n-1}\frac{2^{2k}(k!)^4}{[(2k)!]^2}\,\frac{2}{2k+1}. \tag{24}
\]

The asymptotics of $A_n$ as $n \to \infty$ (used first by Widom in [33], and then in [17]) are given by the expression
\[
\ln A_n = -n^2\ln 2 + n\ln(2\pi) - \frac14\ln n + \frac{1}{12}\ln 2 + 3\zeta'(-1) + \tilde\delta_n, \qquad n \to \infty, \tag{25}
\]
where $\zeta'(x)$ is the derivative of Riemann's zeta-function, and $\tilde\delta_n \to 0$ as $n \to \infty$. The zeta-function originates from the expansion of the product of factorials. The asymptotics of $C_n$ (15) have a similar form,
\[
\ln C_n = -\left(\frac32 + \ln 4\right)n^2 + n\ln(2\pi) - \frac16\ln n + 2\zeta'(-1) + \hat\delta_n, \qquad \hat\delta_n \to 0,\quad n \to \infty. \tag{26}
\]

Substituting the asymptotics (25, 26) into (22), we obtain for $\alpha > 0$:
\[
\ln D_n(\alpha) = \left(\frac32 + \ln\alpha\right)n^2 - \frac{1}{12}\ln\frac{n}{2} + \zeta'(-1) + \delta_n + O_n(\alpha), \tag{27}
\]
where $\delta_n$ depends on $n$ only and $\delta_n \to 0$ as $n \to \infty$. Note for later application (see proof of Lemma 2) that the error term $O_n(\alpha)$ is analytic in $\alpha$, in particular, $(d/d\alpha)O_n(\alpha) = O_n(1)$. We shall use formula (27) in the last section. Caveat. $O_n(\alpha) \to 0$ as $\alpha \to 0$, $n$ fixed: no claim is made about $O_n(\alpha)$ as $n \to \infty$.
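As a sanity check on (23)-(25), the following Python sketch compares the product formula (24) with the Hankel determinant of the moments of $dt$ on $[-1,1]$ (which equals the multiple integral (23) by the classical Heine identity), and then compares $\ln A_n$ with the leading terms of (25). The function names and the hard-coded numerical value of $\zeta'(-1)$ are assumptions made for this illustration; they are not part of the paper.

```python
from fractions import Fraction
from math import factorial, lgamma, log, pi

# Standard numerical value of zeta'(-1); assumed here, not computed.
ZETA_PRIME_AT_MINUS_ONE = -0.16542114370045092


def a_n_product(n):
    """A_n from the product formula (24), as an exact rational number."""
    result = Fraction(1)
    for k in range(n):
        result *= Fraction(2 ** (2 * k) * factorial(k) ** 4, factorial(2 * k) ** 2)
        result *= Fraction(2, 2 * k + 1)
    return result


def a_n_hankel(n):
    """A_n as det[ int_{-1}^{1} t^(i+j) dt ], i, j = 0..n-1 (Heine identity for (23))."""
    rows = [[Fraction(2, i + j + 1) if (i + j) % 2 == 0 else Fraction(0)
             for j in range(n)] for i in range(n)]
    det = Fraction(1)
    for c in range(n):
        # The moment matrix is positive definite, so every pivot is nonzero.
        det *= rows[c][c]
        for r in range(c + 1, n):
            factor = rows[r][c] / rows[c][c]
            for k in range(c, n):
                rows[r][k] -= factor * rows[c][k]
    return det


def log_a_n(n):
    """ln A_n from (24), summed in floating point via lgamma."""
    return sum(2 * k * log(2) + 4 * lgamma(k + 1) - 2 * lgamma(2 * k + 1)
               + log(2) - log(2 * k + 1) for k in range(n))


def log_a_n_leading(n):
    """Right-hand side of (25) without the o(1) remainder."""
    return (-n * n * log(2) + n * log(2 * pi) - log(n) / 4
            + log(2) / 12 + 3 * ZETA_PRIME_AT_MINUS_ONE)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        assert a_n_product(n) == a_n_hankel(n)
    for n in (10, 50, 200):
        # The difference plays the role of the remainder in (25).
        print(n, log_a_n(n) - log_a_n_leading(n))
```

The exact rational arithmetic makes the comparison of (23) and (24) free of rounding error for small $n$; the floating-point part only illustrates that the remainder in (25) becomes small as $n$ grows.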


3. Differential Identity and the Riemann-Hilbert Problem

3.1. Initial transformations. In what follows, unless explicitly stated otherwise, we will always assume $0 < \alpha < 1$. At certain points in the text, however, we will also consider $\alpha$ in a small neighborhood $D_{\varepsilon_0}(0)$ of $\alpha = 0$ (see the discussion at the end of Sect. 3.1). The multiple integral (16) can be written as (cf. (14, 15)):
\[
D_n(\alpha) = \frac{1}{C_n}\prod_{j=0}^{n-1}\theta_j^{-2}, \tag{28}
\]
where $\theta_j$ are the leading coefficients of the polynomials $q_j(x) = \theta_j x^j + \cdots$ satisfying
\[
\int_0^\alpha q_k(x)q_m(x)e^{-4nx}\,dx = \delta_{km}, \qquad k, m = 0, 1, \dots. \tag{29}
\]
It is convenient to write this orthogonality relation in the form
\[
\int_0^\alpha q_j(x)x^k e^{-4nx}\,dx = \frac{\delta_{jk}}{\theta_j}, \qquad k = 0, 1, \dots, j, \quad j = 0, 1, 2, \dots. \tag{30}
\]

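Note, in particular, that
\[
\int_0^\alpha q_j(x)\,\frac{\partial}{\partial\alpha}q_j(x)\,e^{-4nx}\,dx
= \int_0^\alpha q_j(x)\left(\frac{d\theta_j}{d\alpha}x^j + \text{polynomial of degree less than } j\right)e^{-4nx}\,dx
= \frac{1}{\theta_j}\frac{d\theta_j}{d\alpha}. \tag{31}
\]
Using relation (31), we obtain
\[
\begin{aligned}
\frac{d}{d\alpha}\ln D_n(\alpha) &= \frac{d}{d\alpha}\ln\prod_{j=0}^{n-1}\theta_j^{-2}
= -2\sum_{j=0}^{n-1}\frac{1}{\theta_j}\frac{d\theta_j}{d\alpha}
= -2\sum_{j=0}^{n-1}\int_0^\alpha q_j(x)\,\frac{\partial}{\partial\alpha}q_j(x)\,e^{-4nx}\,dx \\
&= -\int_0^\alpha\frac{\partial}{\partial\alpha}\Biggl(\sum_{j=0}^{n-1}q_j^2(x)\Biggr)e^{-4nx}\,dx
= -\frac{d}{d\alpha}\Biggl(\int_0^\alpha\sum_{j=0}^{n-1}q_j^2(x)e^{-4nx}\,dx\Biggr) + \sum_{j=0}^{n-1}q_j^2(\alpha)e^{-4n\alpha}.
\end{aligned} \tag{32}
\]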

By (29) with $k = m = j$, the last integral (inside the brackets) in (32) equals $n$ and hence vanishes upon differentiation. Applying the Christoffel-Darboux formula,
\[
\sum_{j=0}^{n-1}q_j^2(x) = \frac{\theta_{n-1}}{\theta_n}\bigl(q_n'(x)q_{n-1}(x) - q_n(x)q_{n-1}'(x)\bigr), \tag{33}
\]
to the last sum in (32), we obtain
\[
\frac{d}{d\alpha}\ln D_n(\alpha) = \frac{\theta_{n-1}}{\theta_n}\,e^{-4n\alpha}\bigl(q_n'(\alpha)q_{n-1}(\alpha) - q_n(\alpha)q_{n-1}'(\alpha)\bigr). \tag{34}
\]


Here and below the prime denotes differentiation w.r.t. the argument $x$. Formula (34) shows that $\frac{d}{d\alpha}\ln D_n(\alpha)$ depends only on $q_n$, $q_{n-1}$. This property is crucial for the analysis below.

As noted in [20], orthogonal polynomials can be represented in terms of a solution to an associated Riemann-Hilbert problem. In the present case, the relevant Riemann-Hilbert problem is formulated as follows: Find a $2 \times 2$ matrix-valued function $V(z)$ satisfying the conditions:
(a) $V(z)$ is analytic for $z \in \mathbb{C}\setminus[0, \alpha]$.
(b) Let $x \in (0, \alpha)$. $V(z)$ has $L^2$ boundary values $V_+(x)$ as $z$ approaches $x$ from above, and $V_-(x)$, from below. They are related by the jump condition
\[
V_+(x) = V_-(x)\begin{pmatrix} 1 & e^{-4nx} \\ 0 & 1 \end{pmatrix}, \qquad x \in (0, \alpha). \tag{35}
\]
(c) $V(z)$ has the following asymptotic behavior as $z \to \infty$:
\[
V(z) = \left(I + O\!\left(\frac{1}{z}\right)\right)z^{n\sigma_3}, \qquad\text{where } \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{36}
\]
This Riemann-Hilbert problem (RHP) has a unique solution for any $n$, $\alpha > 0$, and, in particular, $V_{11}(z) = q_n(z)/\theta_n$ and $V_{21}(z) = -2\pi i\,\theta_{n-1}q_{n-1}(z)$. Therefore we can rewrite the differential identity (34) in terms of $V(z)$ in the form:¹
\[
\frac{d}{d\alpha}\ln D_n(\alpha) = \frac{e^{-4n\alpha}}{2\pi i}\bigl(V_{11}(\alpha)V_{21}'(\alpha) - V_{11}'(\alpha)V_{21}(\alpha)\bigr). \tag{37}
\]
In this section our task is to solve the RHP for $V(z)$ asymptotically (in other words, to find asymptotics of the polynomials $q_k(z)$) as $n \to \infty$. The results will then be used in Sect. 4.2 to evaluate the r.h.s. of (37).

Following the steepest descent method for RH problems as described in [16,7], we first of all need to find a so-called $g$-function: in the present situation this reduces to finding a function analytic outside the interval $(-\infty, \alpha)$ and continuous up to the boundary with the properties:
(a) $g(z) = \ln(z) + O(1/z)$ as $z \to \infty$;
(b) there exists a constant $l$ such that the boundary values $g_\pm(x) = \lim_{\varepsilon\downarrow0}g(x \pm i\varepsilon)$ of $g(z)$ are related as follows:
\[
g_+(x) + g_-(x) - 4x - l = 0, \qquad x \in (0, \alpha); \tag{38}
\]
(c) on $(0, \alpha)$, $g_+(x) - g_-(x)$ is purely imaginary, and $i(d/dx)(g_+(x) - g_-(x)) > 0$;
(d) $e^{g_+(x) - g_-(x)} = 1$ on $(-\infty, 0)$.
A standard computation shows that if such a function $g(z)$ exists then it is unique.² Formally, the derivative $g'(z)$ of $g(z)$ must have the properties:
(a') $g'(z) = 1/z + O(1/z^2)$ as $z \to \infty$;
(b') $g'_+(x) + g'_-(x) = 4$ for $x \in (0, \alpha)$.

¹ An alternative derivation of this identity is presented in the Appendix.
² Note that as the contour for the RHP is $(0, \alpha)$, the extra condition (4.14) for $g(z)$ in [16] is redundant in the present situation.


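It is easy to verify that the following function satisfies these conditions:
\[
g'(z) = 2 + \frac{1 + \alpha - 2z}{\sqrt{z(z - \alpha)}}. \tag{39}
\]
(In fact, $g'(z)$ is the unique function with $L^p$ boundary values $g'_\pm$ satisfying (a') and (b') for any $1 < p < 2$.) Here the branch of the square root is chosen so that $\sqrt{z(z-\alpha)}$ is analytic in $\mathbb{C}\setminus[0,\alpha]$ and positive for $z > \alpha$. Therefore,
\[
g(z) = \int_\alpha^z g'(t)\,dt + C,
\]
where the constant $C$ is determined from the condition that $g(z) - \ln(z) = O(1/z)$ as $z \to \infty$. This gives
\[
g(z) = 2z - \alpha + \ln\frac{\alpha}{4} + \int_\alpha^z\frac{1 + \alpha - 2t}{\sqrt{t(t - \alpha)}}\,dt, \tag{40}
\]
and it is easy to verify that $g(z)$ indeed satisfies (a)-(d). From (38, 40) we now see that
\[
l = -2\alpha + 2\ln\frac{\alpha}{4}. \tag{41}
\]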
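The two defining properties (a') and (b') of $g'(z)$ in (39) can be checked numerically. The sketch below is only an illustration under stated assumptions: the branch of $\sqrt{z(z-\alpha)}$ is realized as a product of two principal square roots (which has its cut on $[0,\alpha]$ and behaves like $z$ at infinity), and the sample values of $\alpha$, $x$ and the offsets are arbitrary choices, not taken from the paper.

```python
import cmath


def g_prime(z, alpha):
    """g'(z) from (39); sqrt(z) * sqrt(z - alpha) has its cut on [0, alpha]."""
    root = cmath.sqrt(z) * cmath.sqrt(z - alpha)
    return 2 + (1 + alpha - 2 * z) / root


def check_g_prime(alpha=0.4, x=0.15, eps=1e-9, big=1e6):
    """Return (g'_+(x) + g'_-(x), z * g'(z) at a large real z)."""
    plus = g_prime(complex(x, eps), alpha)    # boundary value from above
    minus = g_prime(complex(x, -eps), alpha)  # boundary value from below
    decay = big * g_prime(complex(big, 0.0), alpha)
    return plus + minus, decay


if __name__ == "__main__":
    jump_sum, decay = check_g_prime()
    print(jump_sum)  # close to 4, property (b')
    print(decay)     # close to 1, property (a')
```

Both printed values are only approximate: the first carries an error of the order of the offset `eps`, the second an error of order $1/z$.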

We need to analyze the RHP for $V(z)$ asymptotically as $n \to \infty$ uniformly for $0 < \alpha < 1 - s_0/(2n)^{2/3}$, where $s_0$ is a fixed (large) number. The steepest descent method continues with the following steps (see [16,7]):
1) the RHP for $V$ is conjugated by $e^{ng(z)\sigma_3}$;
2) the contour $(0, \alpha)$ is split into lenses;
3) matching parametrices for the solution to the RHP are constructed (i) away from the end-points $0$ and $\alpha$, (ii) in neighborhoods of $0$ and $\alpha$, respectively.
By means of these steps, the RHP reduces as $n \to \infty$ to a small norm problem which can be solved by a Neumann series. All these steps go through in the standard way except for the construction of the parametrix in a neighborhood of $\alpha$. As we see from [16,7] the method requires that in a neighborhood $|z - \alpha| \le \varepsilon$, $\varepsilon$ small and fixed,
\[
(g_+ - g_-)(z) = (z - \alpha)^\beta\,(c + O(z - \alpha)), \tag{42}
\]

for some $c \ne 0$ and some exponent $\beta > 0$. (In [16], $\beta = 3/2$.) In our case for $0 < z < \alpha$,
\[
(g_+ - g_-)(z) = 2\int_\alpha^z\frac{1 + \alpha - 2t}{\sqrt{t(t - \alpha)}}\,dt = \frac{4}{\sqrt{\alpha}}(z - \alpha)^{1/2}\,(1 - \alpha + O(z - \alpha)). \tag{43}
\]
For any fixed $0 < \alpha < 1$ we see that $(g_+ - g_-)(z)$ satisfies (42). As $\alpha \to 1$, we have to make the neighborhood $|z - \alpha| < \varepsilon$ smaller and smaller. The constant $c$ in (42) depends then on $\alpha$, but that, in itself, is not an insurmountable problem. The real problem is that, unlike the situation in [26], the parametrix away from the points $0$, $\alpha$ (see [16,7]) contains certain terms of the form $(z/(z - \alpha))^{1/4}$ evaluated on $\{z : |z - \alpha| = \varepsilon\}$, and as a result is not uniformly bounded when $1 - \alpha$, and hence $\varepsilon$, approach zero. At the same time, there is not enough decay in the other relevant quantities to compensate for this.

Fig. 1. Conformal mapping

The problem can be circumvented, however, by introducing a transformation of the $z$-plane that "regularizes" the RHP in a neighborhood of $z = \alpha$. Namely, set
\[
\lambda = \frac{1 - \alpha}{\alpha}\,\frac{z}{1 - z}, \qquad z \ne 1. \tag{44}
\]
This fractional-linear transformation maps the interval $[0, \alpha]$ onto $[0, 1]$, the point $z = 1$ is mapped to infinity, and infinity is mapped to $\lambda = -(1 - \alpha)/\alpha$. The inverse transform is
\[
z = \frac{\alpha\lambda}{1 - \alpha + \alpha\lambda}, \qquad \lambda \ne -\frac{1 - \alpha}{\alpha}. \tag{45}
\]
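The elementary properties of the fractional-linear map (44) and its inverse (45) can be confirmed with exact rational arithmetic. The following minimal sketch uses hypothetical helper names `to_lambda` and `to_z`, introduced only for this illustration.

```python
from fractions import Fraction


def to_lambda(z, alpha):
    """The map (44): lambda = ((1 - alpha) / alpha) * z / (1 - z), z != 1."""
    return (1 - alpha) / alpha * z / (1 - z)


def to_z(lam, alpha):
    """The inverse map (45): z = alpha * lambda / (1 - alpha + alpha * lambda)."""
    return alpha * lam / (1 - alpha + alpha * lam)


if __name__ == "__main__":
    alpha = Fraction(3, 4)
    # Endpoints of [0, alpha] go to the endpoints of [0, 1].
    print(to_lambda(Fraction(0), alpha), to_lambda(alpha, alpha))
    # The point (1 + alpha) / 2 is mapped to 1 + 1 / alpha.
    print(to_lambda((1 + alpha) / 2, alpha), 1 + 1 / alpha)
    # The denominator of (45) vanishes at lambda = -(1 - alpha) / alpha,
    # which is the image of z = infinity.
    lam_inf = -(1 - alpha) / alpha
    print(1 - alpha + alpha * lam_inf)
```

The image of $(1+\alpha)/2$ stays at distance $1/\alpha > 1$ to the right of $\lambda = 1$, whereas the image $-(1-\alpha)/\alpha$ of $z = \infty$ moves toward $\lambda = 0$ as $\alpha \uparrow 1$.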

Thus $z(\lambda)$ is analytic from $\mathbb{C}\setminus\{-(1 - \alpha)/\alpha\}$ into $\mathbb{C}$, taking the complement of $[0, 1]$ onto $\mathbb{C}\setminus[0, \alpha]$. The fact that in our case we could not obtain an estimate of the form (42) uniformly as $\alpha \uparrow 1$ originates in the vanishing of the numerator in the integral for $g_+ - g_-$ in (43) at the point $t = (1 + \alpha)/2 \in (\alpha, 1)$. Under the transformation $z \mapsto \lambda$ the point $(1 + \alpha)/2$ is mapped to $\lambda = 1 + \alpha^{-1}$. This point is at a positive distance from the contour $0 < \lambda < 1$ for $\alpha \in (0, 1)$. This means that we will be able to construct a parametrix for the solution of the RHP in the $\lambda$ variable in a fixed neighborhood about $\lambda = 1$. On the other hand, the point $\lambda = -(1 - \alpha)/\alpha$ (the image of $z$-infinity) now approaches the contour as $\alpha \uparrow 1$, and we will need to contract the neighborhood of $\lambda = 0$ so that this point remains outside. We shall see, however, that this neighborhood presents no problem, as the relevant terms of the jump matrix for the final $R$-RHP (see (70) and the argument after (103) below) decay sufficiently fast on the boundary of the neighborhood.

For any $\lambda \in \mathbb{C}\setminus([0, 1] \cup \{-(1 - \alpha)/\alpha\})$ set
\[
U(\lambda) \equiv V(z(\lambda)), \tag{46}
\]
where $z(\lambda) = \alpha\lambda/(1 - \alpha + \alpha\lambda)$ as in (45). Then we obtain the following Riemann-Hilbert problem for $U(\lambda)$:
(a) $U(\lambda)$ is analytic for $\lambda \in \mathbb{C}\setminus([0, 1] \cup \{-(1 - \alpha)/\alpha\})$.
(b) Let $\lambda \in (0, 1)$. $U$ has $L^2$ boundary values $U_+(\lambda)$ as $\lambda$ approaches the real axis from above, and $U_-(\lambda)$, from below. They are related by the jump condition
\[
U_+(\lambda) = U_-(\lambda)\begin{pmatrix} 1 & e^{-4nz(\lambda)} \\ 0 & 1 \end{pmatrix}, \qquad \lambda \in (0, 1). \tag{47}
\]
(c) $U(\lambda)$ has the following asymptotic behavior as $\lambda \to -\frac{1-\alpha}{\alpha}$ ($z \to \infty$):
\[
U(\lambda) = \left(I + O\!\left(\frac{1}{z(\lambda)}\right)\right)z(\lambda)^{n\sigma_3}. \tag{48}
\]

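We transfer $g(z)$ to the $\lambda$-plane by defining
\[
\hat g(\lambda) \equiv g(z(\lambda)), \qquad\text{for } \lambda \in \mathbb{C}\setminus\left[-\frac{1 - \alpha}{\alpha}, 1\right]. \tag{49}
\]
Necessarily, $\hat g(\lambda)$ is analytic on its domain. We obtain
\[
\hat g(\lambda) = 2\alpha(\alpha - 1)\frac{1 - \lambda}{1 - \alpha + \alpha\lambda} + \alpha + \ln\frac{\alpha}{4} + (1 - \alpha)^{3/2}\int_1^\lambda\frac{1 + \alpha(1 - t)}{(1 - \alpha(1 - t))^2}\,\frac{dt}{\sqrt{t(t - 1)}}. \tag{50}
\]
Note that $\hat g(\lambda + i0) - \hat g(\lambda - i0) = 2\pi i$ on $(-(\alpha^{-1} - 1), 0)$ as this interval is the image of the half-axis $(-\infty, 0)$ in the $z$-variable, where it is easy to conclude (cf. (52) below) that $g_+(z) - g_-(z) = 2\pi i$. This jump in the $\lambda$-variable is also easy to obtain directly from (50). Let
\[
h(\lambda) = \frac{2(1 - \alpha)^{3/2}}{e^{i\pi/2}}\int_1^\lambda\frac{1 + \alpha(1 - t)}{(1 - \alpha(1 - t))^2}\,\frac{dt}{\sqrt{t(1 - t)}}, \tag{51}
\]
which is analytic in $\mathbb{C}\setminus((-\infty, 0) \cup (1, \infty))$. Here we choose the branch so that $\sqrt{t(1 - t)}$ is analytic in $\mathbb{C}\setminus((-\infty, 0) \cup (1, \infty))$ and positive for $t \in (0, 1)$. The function $h(\lambda)$ is the analytic continuation of $\hat g(\lambda + i0) - \hat g(\lambda - i0)$ off the interval $(0, 1)$. Note that
\[
h(0) = \frac{2}{e^{i\pi/2}}\int_1^0\frac{1 + \alpha - 2\alpha x}{\sqrt{x(1 - x)}}\,dx = 2\pi i. \tag{52}
\]
Now transform the RHP for $U$ as follows:
\[
\widetilde T(\lambda) = e^{-nl\sigma_3/2}\,U(\lambda)\,e^{-n(\hat g(\lambda) - l/2)\sigma_3}, \qquad \lambda \in \mathbb{C}\setminus\left([0, 1] \cup \left\{-\frac{1 - \alpha}{\alpha}\right\}\right). \tag{53}
\]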
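The normalization $h(0) = 2\pi i$ can also be checked directly from the integral representation (51), for any $0 \le \alpha < 1$. The sketch below is an illustration only: it removes the endpoint singularities with the substitution $t = \sin^2\theta$ (so that $dt/\sqrt{t(1-t)} = 2\,d\theta$) and applies a midpoint rule; the function name and the number of panels are arbitrary assumptions.

```python
import math


def h_at_zero(alpha, panels=4000):
    """Evaluate h(0) from (51) using t = sin(theta)^2 and a midpoint rule."""
    step = (math.pi / 2) / panels
    total = 0.0
    for k in range(panels):
        theta = (k + 0.5) * step
        t = math.sin(theta) ** 2
        total += (1 + alpha * (1 - t)) / (1 - alpha * (1 - t)) ** 2
    integral_0_to_1 = 2 * step * total  # integral over t in (0, 1)
    # (2 / e^{i pi / 2}) * int_1^0 (...) = 2i * int_0^1 (...)
    return 2j * (1 - alpha) ** 1.5 * integral_0_to_1


if __name__ == "__main__":
    for alpha in (0.0, 0.3, 0.9):
        print(alpha, h_at_zero(alpha), 2j * math.pi)
```

The agreement for several values of $\alpha$ reflects the identity $(1-\alpha)^{3/2}\int_0^1 \frac{1+\alpha(1-t)}{(1-\alpha(1-t))^2}\frac{dt}{\sqrt{t(1-t)}} = \pi$, which is what makes the jump of $\hat g$ on the image of the negative half-axis equal to $2\pi i$.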

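We easily obtain then that $\widetilde T(\lambda)$ satisfies:
(a) $\widetilde T(\lambda)$ is analytic for $\lambda \in \mathbb{C}\setminus[0, 1]$.
(b) For $\lambda \in (0, 1)$ the boundary values of $\widetilde T(\lambda)$ are related by the jump condition
\[
\widetilde T_+(\lambda) = \widetilde T_-(\lambda)\begin{pmatrix} e^{-nh(\lambda)} & 1 \\ 0 & e^{nh(\lambda)} \end{pmatrix}, \qquad \lambda \in (0, 1). \tag{54}
\]
(c) $\widetilde T(\lambda)$ has the following asymptotic behavior as $\lambda \to -\frac{1-\alpha}{\alpha}$:
\[
\widetilde T(\lambda) = I + O\!\left(\lambda + \frac{1 - \alpha}{\alpha}\right). \tag{55}
\]
Note that the problem is now normalized to $I$ at $\lambda = -\frac{1-\alpha}{\alpha}$. Since $\det\widetilde T(\lambda) = 1$ and $\widetilde T(\lambda)$ is analytic at infinity, it follows that $\widetilde T(\infty)$ is invertible. The function $T(\lambda)$ defined by
\[
T(\lambda) = \widetilde T(\infty)^{-1}\,\widetilde T(\lambda) \tag{56}
\]
is the solution to the same Riemann-Hilbert problem as $\widetilde T(\lambda)$, with the (c) condition replaced by
\[
T(\lambda) = I + O(1/\lambda), \qquad \lambda \to \infty. \tag{57}
\]
Clearly,
\[
\widetilde T(\lambda) = T^{-1}\!\left(-\frac{1 - \alpha}{\alpha}\right)T(\lambda). \tag{58}
\]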

We now show that the RHP for $T$ is solvable for all $0 \le \alpha < 1$. For $0 < \alpha < 1$ the existence of such a $T(\lambda)$ follows simply by pushing forward $V(z)$, the solution of the RHP (35, 36) for the polynomials orthogonal on $(0, \alpha)$ with the weight $e^{-4nx}$: the existence of $V(z)$ itself follows from the basic results of [20,12]. So we are reduced to showing that $T(\lambda)$ exists in the case $\alpha = 0$ when the mapping $V(z) \to T(\lambda)$ breaks down. For $\alpha = 0$, $h(\lambda) = 4\ln(\sqrt{\lambda} + i\sqrt{1 - \lambda})$, $0 < \lambda < 1$. If $(\lambda - 1)^{1/2}$ (resp., $\lambda^{1/2}$) denotes the branch which is analytic in $\mathbb{C}\setminus(-\infty, 1]$ (resp., $\mathbb{C}\setminus(-\infty, 0]$), then in particular $(\lambda - 1)^{1/2}_+ = -(\lambda - 1)^{1/2}_- = i\sqrt{1 - \lambda}$, $0 < \lambda < 1$, and we find
\[
e^{nh(\lambda)} = \left(\frac{(\lambda - 1)^{1/2}_+ + \lambda^{1/2}}{(\lambda - 1)^{1/2}_- + \lambda^{1/2}}\right)^{2n}. \tag{59}
\]
Thus if $r(\lambda) = ((\lambda - 1)^{1/2} + \lambda^{1/2})/2$, then
\[
\begin{pmatrix} e^{-nh(\lambda)} & 1 \\ 0 & e^{nh(\lambda)} \end{pmatrix} = \begin{pmatrix} (r_-/r_+)^{2n} & 1 \\ 0 & (r_+/r_-)^{2n} \end{pmatrix}, \qquad 0 < \lambda < 1. \tag{60}
\]
Setting $Z(\lambda) = T(\lambda)\,r(\lambda)^{2n\sigma_3}$, we see that $Z(\lambda)$ solves the RHP:
(a) $Z(\lambda)$ is analytic for $\lambda \in \mathbb{C}\setminus[0, 1]$.
(b) For $\lambda \in (0, 1)$ the boundary values of $Z(\lambda)$ are related by the jump condition
\[
Z_+(\lambda) = Z_-(\lambda)\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad \lambda \in (0, 1). \tag{61}
\]
(c) $Z(\lambda)$ has the following asymptotic behavior as $\lambda \to \infty$:
\[
Z(\lambda) = (I + O(1/\lambda))\,\lambda^{n\sigma_3}. \tag{62}
\]
This is the standard RHP for polynomials orthogonal on $(0, 1)$ with the unit weight. Therefore the desired solution $T(\lambda)$ exists for $\alpha = 0$ as well. This completes the proof of solvability of the RHP for $T(\lambda)$ for all $0 \le \alpha < 1$. The above proof of solvability for all $n$ is included only for completeness (cf. the last remark at the end of Sect. 3.5).

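Fig. 2. Contour for the $S$-Riemann-Hilbert problem and the circular neighborhoods $U_1$, $U_0$ of the points 1, 0. These neighborhoods will be introduced below in connection with the construction of parametrices

As is standard in applications of the steepest descent method, we now deform the RHP as follows. Let $\Sigma = \cup_{j=1}^3\Sigma_j$ be the oriented contour as in Fig. 2. Define a matrix-valued function $S(\lambda)$ on $\mathbb{C}\setminus\Sigma$ by the expressions:
\[
S(\lambda) = \begin{cases}
T(\lambda), & \text{for } \lambda \text{ outside the lens,} \\[4pt]
T(\lambda)\begin{pmatrix} 1 & 0 \\ -e^{-nh(\lambda)} & 1 \end{pmatrix}, & \text{for } \lambda \text{ in the upper part of the lens,} \\[8pt]
T(\lambda)\begin{pmatrix} 1 & 0 \\ e^{nh(\lambda)} & 1 \end{pmatrix}, & \text{for } \lambda \text{ in the lower part of the lens.}
\end{cases} \tag{63}
\]
It is easy to verify that $S(\lambda)$ solves the following RHP:
(a) $S(\lambda)$ is analytic for $\lambda \in \mathbb{C}\setminus\Sigma$, where $\Sigma = \cup_{j=1}^3\Sigma_j$.
(b) The boundary values of $S(\lambda)$ are related by the jump condition
\[
S_+(\lambda) = S_-(\lambda)\begin{pmatrix} 1 & 0 \\ e^{\mp nh(\lambda)} & 1 \end{pmatrix}, \qquad \lambda \in (\Sigma_1 \cup \Sigma_3)\setminus\{0, 1\},
\]
where the plus sign in the exponent is on $\Sigma_3$, and the minus sign, on $\Sigma_1$,
\[
S_+(\lambda) = S_-(\lambda)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \lambda \in \Sigma_2 \equiv (0, 1). \tag{64}
\]
(c) $S(\lambda) = I + O(1/\lambda)$ as $\lambda \to \infty$.

For a fixed $0 < \varepsilon < 1/4$, consider the circular neighborhood $U_1$ of radius $\varepsilon$ at the point $\lambda = 1$. Consider also the neighborhood $U_0$ of $\lambda = 0$ of radius $\varepsilon_3(1 - \alpha)$ for a fixed $1/2 > \varepsilon_3 > 0$. Note that $U_0$ contracts with growing $n$ for $\alpha = 1 - s_0/(2n)^{2/3}$. The point $-(1 - \alpha)/\alpha$ lies outside $U_0$ for all $\alpha \in (0, 1)$. In $U_0$, we can expand the integrand in (51) in powers of $t$ and $t/(1 - \alpha)$:
\[
\begin{aligned}
h(\lambda) &= h(0) + \frac{2}{e^{i\pi/2}\sqrt{1 - \alpha}}\int_0^\lambda(1 + \alpha - \alpha t)\left(1 - \frac{2\alpha t}{1 - \alpha} + O\!\left(\frac{t^2}{(1 - \alpha)^2}\right)\right)(1 + t/2 + O(t^2))\,\frac{dt}{\sqrt{t}} \\
&= h(0) + \frac{4\sqrt{\lambda}}{e^{i\pi/2}\sqrt{1 - \alpha}}\left(1 + \alpha + \frac{1 - 6\alpha - 3\alpha^2}{6(1 - \alpha)}\lambda + O\!\left(\frac{\lambda^2}{(1 - \alpha)^2}\right)\right), \qquad |\lambda| \le \varepsilon_3(1 - \alpha),
\end{aligned} \tag{65}
\]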

uniformly in $\alpha$, and where $h(0) = 2\pi i$ (see (52)). It is the presence of $\sqrt{1 - \alpha}$ in the denominator that will allow us to construct a solution to the RHP using a contracting neighborhood $U_0$ as $\alpha$ approaches 1.

Fig. 3. Form of the contour for the $S$-Riemann-Hilbert problem

We shall now show that the jump matrices for $S(\lambda)$ on $(\Sigma_1 \cup \Sigma_3)\setminus(U_1 \cup U_0)$ are uniformly exponentially close to the identity (see (69) below) as $n(1 - \alpha)^{3/2} \to \infty$. To estimate the real part of $h(\lambda)$ outside of the neighborhoods $U_0$ and $U_1$, we now describe the form of the lens more precisely. First, we assume that the contour $\Sigma_3$ is the mirror image of $\Sigma_1$, i.e. $\Sigma_3 = \overline{\Sigma_1}$. Therefore, we only need to describe the structure of the contour $\Sigma_1$. We assume that for $0 \le \Re\lambda \le 1/2$ the contour $\Sigma_1$ lies above the straight line originating at zero, and making a positive angle $\gamma_0$ with the real axis (see Fig. 3). The value of the angle $\gamma_0$ will be specified later on. Similarly, the part of the contour between the vertical line $\Re\lambda = 1/2$ and the boundary of the neighborhood $U_1$ lies above the line $\Im\lambda = (1 - \Re\lambda)\tan\gamma_1$ where, again, the value of the angle $\gamma_1 < \gamma_0$ will be specified later on. Note that the contour has a well-defined limit as $\alpha \downarrow 0$. Let $\lambda_0$ (resp., $\lambda_1$) be the point of intersection of the contour $\Sigma_1$ and the boundary of the disc $U_0$ (resp., $U_1$) (see again Fig. 3). Let $\Re\lambda = \mu$, $\Im\lambda = u$. Thus, $\lambda = \mu + iu$, and on $\Sigma_1$, $\Re\lambda_0 \le \mu \le \Re\lambda_1$. Fix some small $\varepsilon_2 > 0$. Suppose that $\Sigma_1$ and $\Sigma_3$ are so close to the real axis that
\[
\frac{|u|}{\mu} < \varepsilon_2, \qquad \frac{|u|}{1 - \mu} < \varepsilon_2, \qquad \Re\lambda_0 \le \mu \le \Re\lambda_1. \tag{66}
\]
In particular, this implies that $\tan\gamma_0 < \varepsilon_2$ and $\tan\gamma_1 < \varepsilon_2$. Furthermore, as $1 + \alpha - \alpha\mu > 1$, we have
\[
\frac{|u|}{1 + \alpha - \alpha\mu} < |u|,
\]
and, as $1 - \alpha + \alpha\mu > \alpha\mu$,
\[
\frac{\alpha|u|}{1 - \alpha + \alpha\mu} < \frac{|u|}{\mu}.
\]


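The above inequalities allow us to perform the following estimate on $h(\lambda)$ for $\lambda = \mu + iu$ in $(\Sigma_1 \cup \Sigma_3)\setminus(U_1 \cup U_0)$. Using (51), we obtain
\[
\begin{aligned}
h(\lambda) &= h(\mu) + \frac{2(1 - \alpha)^{3/2}}{e^{i\pi/2}}\int_\mu^{\mu + iu}\frac{1 + \alpha - \alpha\mu - \alpha iv}{(1 - \alpha + \alpha\mu + \alpha iv)^2}\,\frac{d(\mu + iv)}{\sqrt{(\mu + iv)(1 - \mu - iv)}} \\
&= h(\mu) + \frac{2(1 - \alpha)^{3/2}}{\sqrt{\mu(1 - \mu)}}\,\frac{1 + \alpha - \alpha\mu}{(1 - \alpha + \alpha\mu)^2}\int_0^u\left(1 - \frac{i\alpha v}{1 + \alpha - \alpha\mu}\right)\left(1 + \frac{i\alpha v}{1 - \alpha + \alpha\mu}\right)^{-2}\left(1 + \frac{iv}{\mu}\right)^{-1/2}\left(1 - \frac{iv}{1 - \mu}\right)^{-1/2}dv \\
&= h(\mu) + \frac{2(1 - \alpha)^{3/2}}{\sqrt{\mu(1 - \mu)}}\,\frac{1 + \alpha - \alpha\mu}{(1 - \alpha + \alpha\mu)^2}\,u\,[1 + O(\varepsilon_2)],
\end{aligned} \tag{67}
\]
where the constant in the error term is uniform for $0 \le \alpha < 1$. The fraction $u/(1 - \alpha + \alpha\mu)^2$ in the last equation of (67) can be estimated for some $\varepsilon_4 > 0$ as
\[
\begin{aligned}
\frac{|u|}{(1 - \alpha + \alpha\mu)^2} &> \varepsilon\sin\gamma_1 > \varepsilon_4, \qquad\text{for } \frac12 \le \mu \le \Re\lambda_1, \\
\frac{|u|}{(1 - \alpha + \alpha\mu)^2} &> \frac{\mu\tan\gamma_0}{(1 - \alpha + \alpha\mu)^2} = \frac{\tan\gamma_0}{\mu(\alpha + (1 - \alpha)/\mu)^2} > \frac{\tan\gamma_0}{(1 + \varepsilon_2/(\varepsilon_3\sin\gamma_0))^2} > \varepsilon_4, \qquad\text{for } \Re\lambda_0 \le \mu \le \frac12,
\end{aligned} \tag{68}
\]
where $\varepsilon_4$ depends only on $\varepsilon$ and $\varepsilon_i$, $i = 2, 3$, $\gamma_0$, $\gamma_1$, which in turn depend only on $\varepsilon$, $\varepsilon_2$, $\varepsilon_3$. Since $\Re h(\mu) = 0$, we obtain from (68) as $n \to \infty$ for sufficiently small $\varepsilon_2 > 0$:
\[
|e^{-nh(\lambda)}| = O(e^{-\rho c}), \quad \lambda \in \Sigma_1\setminus(U_0 \cup U_1), \qquad
|e^{nh(\lambda)}| = O(e^{-\rho c}), \quad \lambda \in \Sigma_3\setminus(U_0 \cup U_1), \tag{69}
\]
uniformly for $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$ for some (large) $s_0 > 0$ and all $n > s_0^{3/2}/2$, for some $c = c(\varepsilon, \varepsilon_2, \varepsilon_3) > 0$, where $\rho = n|1 - \alpha|^{3/2}$. So except for the jump on the interval $(0, 1)$ and the jumps inside $U_1$, $U_0$, the jumps of $S(\lambda)$ are indeed exponentially close to the identity as $\rho \to \infty$.

For later purposes, we shall need the series expansion of $h(\lambda)$ at $\lambda = 0, 1$. We have:
\[
h(\lambda) = 2\pi i + \frac{4\sqrt{\lambda}}{e^{i\pi/2}\sqrt{1 - \alpha}}\left(1 + \alpha + \frac{1 - 6\alpha - 3\alpha^2}{6(1 - \alpha)}\lambda + O\!\left(\frac{\lambda^2}{(1 - \alpha)^2}\right)\right), \qquad \lambda \to 0; \tag{70}
\]
\[
h(\lambda) = 4(1 - \alpha)^{3/2}\sqrt{u}\,\bigl(1 - (\alpha + 1/6)u + (\alpha^2 + 3\alpha/10 + 3/40)u^2 + O(u^3)\bigr), \qquad \lambda = 1 + u, \quad u \to 0. \tag{71}
\]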
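The coefficients of the bracket in (71) follow from expanding the integrand of (51) at $t = 1 + v$, namely $(1 - \alpha v)(1 + \alpha v)^{-2}(1 + v)^{-1/2}$, and integrating term by term against $v^{-1/2}$; squaring that bracket gives the series that appears in the conformal map $\zeta = n^2\varphi(\lambda)^2$ near $\lambda = 1$ used in Sect. 3.3. The sketch below carries out this bookkeeping with exact rational arithmetic for a sample value of $\alpha$. The helper names are hypothetical, and the overall prefactor and branch of $\sqrt{u}$ are deliberately left out.

```python
from fractions import Fraction


def mul_series(p, q, order):
    """Product of two truncated power series given as coefficient lists."""
    out = [Fraction(0)] * (order + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= order:
                out[i + j] += a * b
    return out


def h_bracket(alpha, order=2):
    """Coefficients in u of the bracket in (71), up to the given order."""
    a = Fraction(alpha)
    f1 = [Fraction(1), -a] + [Fraction(0)] * (order - 1)      # 1 - a v
    f2 = [(k + 1) * (-a) ** k for k in range(order + 1)]      # (1 + a v)^(-2)
    f3 = [Fraction(1)]                                        # (1 + v)^(-1/2)
    for k in range(1, order + 1):
        f3.append(f3[-1] * (Fraction(-1, 2) - (k - 1)) / k)
    c = mul_series(mul_series(f1, f2, order), f3, order)
    # Integrating c_k v^(k - 1/2) from 0 to u gives 2 sqrt(u) * c_k u^k / (2k + 1).
    return [c[k] / (2 * k + 1) for k in range(order + 1)]


def zeta_bracket(alpha, order=2):
    """Square of the bracket in (71): the series multiplying u in zeta near lambda = 1."""
    b = h_bracket(alpha, order)
    return mul_series(b, b, order)


if __name__ == "__main__":
    alpha = Fraction(1, 3)
    print(h_bracket(alpha))     # compare with 1, -(alpha + 1/6), alpha^2 + 3 alpha/10 + 3/40
    print(zeta_bracket(alpha))  # compare with 1, -(2 alpha + 1/3), 3 alpha^2 + 14 alpha/15 + 8/45
```

Because the computation is exact, agreement for a few rational values of $\alpha$ is a reasonably strong consistency check on the quadratic polynomials in $\alpha$ quoted in the text.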


In (70) the cut of the root lies to the left of $\lambda = 0$, and $-\pi < \arg\lambda < \pi$, whereas in (71) the cut lies to the right of $\lambda = 1$, and $0 < \arg u < 2\pi$.

Note the crucial fact that, as follows from (70), (71), the quantity $n|h(\lambda)|$ (resp., $n|h(\lambda) - 2\pi i|$) is uniformly large on the boundary $\partial U_1$ (resp., $\partial U_0$) for some (large) $s_0 > 0$ for all $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$, if $(2n)^{2/3} > s_0$. Indeed, it is of order $s_0^{3/2}$ for $\lambda$ on $\partial U_1$ (resp., of order $n$ for $\lambda$ on $\partial U_0$). This will allow us to obtain the desired asymptotic solution of the Riemann-Hilbert problem.

For technical reasons (see the end of Sect. 4.2 below and also the proof of Corollary 2 in [17]), we need to control the solution of the RHP for all $\alpha \in D_{\varepsilon_0}(0) \cup [\varepsilon_0, 1 - s_0/(2n)^{2/3}]$, where $D_{\varepsilon_0}$ denotes the disc of radius $\varepsilon_0$ about zero in the complex $\alpha$-plane with $\varepsilon_0$ small. For all $\alpha \in D_{\varepsilon_0}(0)$ we use the fixed contour $\Sigma = \Sigma^{\alpha=0}$ in Fig. 3 corresponding to $\alpha = 0$. By the preceding calculation we see that $|\Re h(\lambda; \alpha = 0)| \ge c_0 > 0$ for all $\lambda \in (\Sigma_1 \cup \Sigma_3)\setminus(U_0 \cup U_1)$. Thus
\[
|e^{-nh(\lambda;\alpha=0)}| \le e^{-nc_0}, \qquad \lambda \in (\Sigma_1 \cup \Sigma_3)\setminus(U_0 \cup U_1). \tag{72}
\]
Hence, by continuity, we must have
\[
|e^{-nh(\lambda,\alpha)}| \le e^{-nc_0'} \tag{73}
\]

for all $\lambda \in (\Sigma_1 \cup \Sigma_3)\setminus(U_0 \cup U_1)$ and all $\alpha \in D_{\varepsilon_0}(0)$, $0 < c_0' < c_0$, $\varepsilon_0$ sufficiently small.

We now begin the construction of parametrices which give, in their respective regions, the leading contribution to the asymptotics for the RHP.

3.2. Parametrix in $\mathbb{C}\setminus(U_1 \cup U_0)$. First, because of the exponential convergence described above, we expect the following model problem to play a role in constructing a parametrix for the solution of the RHP as $n \to \infty$:
(a) $N(\lambda)$ is analytic for $\lambda \in \mathbb{C}\setminus[0, 1]$,
(b)
\[
N_+(\lambda) = N_-(\lambda)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \lambda \in (0, 1), \tag{74}
\]
(c)
\[
N(\lambda) = I + O\!\left(\frac{1}{\lambda}\right), \qquad\text{as } \lambda \to \infty. \tag{75}
\]
The solution $N(\lambda)$ can be found in the standard way by first transforming $N(\lambda)$ with a $2 \times 2$ unitary transformation to the form for which the jump matrix is diagonal and then solving the two resulting scalar Riemann-Hilbert problems (cf. [7]). We obtain
\[
N(\lambda) = \frac12\begin{pmatrix} m + m^{-1} & -i(m - m^{-1}) \\ i(m - m^{-1}) & m + m^{-1} \end{pmatrix}, \qquad m(\lambda) = \left(\frac{\lambda - 1}{\lambda}\right)^{1/4}, \tag{76}
\]
where $m(\lambda)$ is analytic outside $[0, 1]$ and $m(\lambda) \to +1$ as $\lambda \to \infty$. Note that $\det N(\lambda) = 1$ and that $N(\lambda)$ is the unique $L^p$ solution of the RHP for any $1 < p < 4$.
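The three properties of the model solution (76), namely the jump (74), the normalization (75) and $\det N = 1$, are easy to confirm numerically. In the sketch below the fourth root is the principal branch supplied by Python's complex power, whose cut in $\lambda$ is exactly $[0,1]$; the sample point and the small offset used for the boundary values are assumptions of this illustration.

```python
def m(lam):
    """m(lambda) = ((lambda - 1) / lambda)^(1/4), principal branch, cut on [0, 1]."""
    return ((lam - 1) / lam) ** 0.25


def N(lam):
    """The model solution (76) as a nested list."""
    mm = m(lam)
    inv = 1 / mm
    return [[(mm + inv) / 2, -1j * (mm - inv) / 2],
            [1j * (mm - inv) / 2, (mm + inv) / 2]]


def matmul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def jump_error(x=0.3, eps=1e-12):
    """Largest entry of N_+(x) - N_-(x) [[0, 1], [-1, 0]] for x in (0, 1)."""
    n_plus = N(complex(x, eps))
    n_minus = N(complex(x, -eps))
    rhs = matmul(n_minus, [[0, 1], [-1, 0]])
    return max(abs(n_plus[i][j] - rhs[i][j]) for i in range(2) for j in range(2))


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


if __name__ == "__main__":
    print(jump_error())                 # close to 0: jump condition (74)
    print(det(N(complex(2.0, 1.0))))    # close to 1
    print(N(complex(1e8, 0.0)))         # close to the identity: condition (75)
```

The jump check relies on $m_+ = i\,m_-$ on $(0,1)$, which is what turns the upper-left entry of $N_+$ into minus the upper-right entry of $N_-$.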


3.3. Parametrix at $\lambda = 1$. Now let us construct a parametrix in $U_1$. We look for an analytic matrix-valued function $P_1(\lambda)$ in $U_1$ which has the same jump relation as $S(\lambda)$ on $\Sigma \cap U_1$ and instead of a condition at infinity satisfies the matching condition on the boundary
\[
P_1(\lambda)N^{-1}(\lambda) = I + O(1/\rho), \qquad \lambda \in \partial U_1, \qquad \rho = n|1 - \alpha|^{3/2}, \tag{77}
\]
uniformly in $\lambda$ and $\alpha$ as $\rho \to \infty$. Define:
\[
\varphi(\lambda) = \begin{cases} e^{i\pi}h(\lambda)/2, & \text{for } \Im\lambda > 0, \\ h(\lambda)/2, & \text{for } \Im\lambda < 0. \end{cases} \tag{78}
\]
This function is analytic in $U_1$ outside $(1 - \varepsilon, 1]$. We look for $P_1(\lambda)$ in the form:
\[
P_1(\lambda) = E_n(\lambda)\hat P(\lambda)e^{n\varphi(\lambda)\sigma_3}, \tag{79}
\]
where $E_n(\lambda)$ is analytic and invertible ($\det E_n \ne 0$) in a neighborhood of $U_1$, and therefore does not affect the jump and analyticity conditions for $\hat P(\lambda)e^{n\varphi(\lambda)\sigma_3}$. As $P_1(\lambda)$ is required to satisfy the jump relations (64) for $S$, it is easy to verify that $\hat P(\lambda) = E_n(\lambda)^{-1}P_1(\lambda)e^{-n\varphi(\lambda)\sigma_3}$ satisfies jump conditions with constant jump matrices:
\[
\begin{aligned}
\hat P_+(\lambda) &= \hat P_-(\lambda)\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad \lambda \in ((\Sigma_1 \cup \Sigma_3) \cap U_1)\setminus\{1\}, \\
\hat P_+(\lambda) &= \hat P_-(\lambda)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \lambda \in \Sigma_2 \cap U_1.
\end{aligned} \tag{80}
\]
Now introduce a mapping of $U_1$ onto a new $\zeta$-plane
\[
\zeta = n^2\varphi(\lambda)^2 = 4n^2(1 - \alpha)^3u\,\bigl(1 - (2\alpha + 1/3)u + (3\alpha^2 + 14\alpha/15 + 8/45)u^2 + O(u^3)\bigr), \qquad \lambda = 1 + u, \tag{81}
\]
where we used (71). The expansion at $\lambda = 1$ is uniform for $\alpha$ in a bounded set. Choosing a sufficiently small $\varepsilon > 0$, we see that $\zeta(\lambda)$ is analytic and one-to-one in the neighborhood $U_1$. Note that if $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$ then $|\zeta| = O(\rho^2)$ is uniformly large, if $s_0$ is large, on the boundary $\partial U_1$ and in $\alpha$. This is a crucial fact in the present work. When $\alpha = 1 - s_0/(2n)^{2/3}$, we have $\rho = s_0^{3/2}/2$.

Let us now choose the exact form of the contours in $U_1$ so that their images under the mapping $\zeta(\lambda)$ are straight lines (see Fig. 4). Set
\[
\hat P(\lambda) = \Psi(\zeta). \tag{82}
\]
So the jump matrices for $\Psi(\zeta)$ are the same as for $\hat P(\lambda)$ (they are shown in Fig. 4). A matrix $\Psi(\zeta)$ satisfying these jump conditions was constructed in [27] in terms of Bessel functions, namely:

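Fig. 4. Contour of the Riemann-Hilbert problem for $\Psi(\zeta)$ (the case of $U_1$)

1) Region I
\[
\Psi(\zeta) = \frac12\begin{pmatrix} H_0^{(1)}(e^{-i\pi/2}\zeta^{1/2}) & H_0^{(2)}(e^{-i\pi/2}\zeta^{1/2}) \\ \pi\zeta^{1/2}\bigl(H_0^{(1)}\bigr)'(e^{-i\pi/2}\zeta^{1/2}) & \pi\zeta^{1/2}\bigl(H_0^{(2)}\bigr)'(e^{-i\pi/2}\zeta^{1/2}) \end{pmatrix}, \tag{83}
\]
2) Region II
\[
\Psi(\zeta) = \frac12\begin{pmatrix} H_0^{(2)}(e^{i\pi/2}\zeta^{1/2}) & -H_0^{(1)}(e^{i\pi/2}\zeta^{1/2}) \\ -\pi\zeta^{1/2}\bigl(H_0^{(2)}\bigr)'(e^{i\pi/2}\zeta^{1/2}) & \pi\zeta^{1/2}\bigl(H_0^{(1)}\bigr)'(e^{i\pi/2}\zeta^{1/2}) \end{pmatrix}, \tag{84}
\]
3) Region III
\[
\Psi(\zeta) = \begin{pmatrix} I_0(\zeta^{1/2}) & \frac{i}{\pi}K_0(\zeta^{1/2}) \\ \pi i\zeta^{1/2}I_0'(\zeta^{1/2}) & -\zeta^{1/2}K_0'(\zeta^{1/2}) \end{pmatrix}, \tag{85}
\]
where $-\pi < \arg(\zeta) < \pi$.

Here the square root $\sqrt{\zeta}$ has the cut on $(-\infty, 0)$. Hence, $\sqrt{\zeta} = -n\varphi(\lambda)$ for $-\pi < \arg(\zeta) < \pi$. The large-$\zeta$ asymptotics of Bessel functions give (here we choose $s_0$, depending only on $\varepsilon$, sufficiently large):
\[
\Psi(\zeta) = \frac{1}{\sqrt2}(\pi\sqrt{\zeta})^{-\sigma_3/2}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\left[I + \frac{1}{8\sqrt{\zeta}}\begin{pmatrix} -1 & -2i \\ -2i & 1 \end{pmatrix} - \frac{3}{2^7\zeta}\begin{pmatrix} 1 & -4i \\ 4i & 1 \end{pmatrix} + O(\zeta^{-3/2})\right]e^{\sqrt{\zeta}\sigma_3} \tag{86}
\]
uniformly on the boundary $\partial U_1$.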


Thus
\[
P_1(\lambda) = E_n(\lambda)\Psi(\zeta(\lambda))e^{n\varphi(\lambda)\sigma_3}, \tag{87}
\]
where the function $E_n(\lambda)$ is found from the matching condition to be
\[
E_n(\lambda) = \frac{1}{\sqrt2}N(\lambda)\begin{pmatrix} 1 & -i \\ -i & 1 \end{pmatrix}(\pi\sqrt{\zeta})^{\sigma_3/2}. \tag{88}
\]
Now to complete the construction of the parametrix it only remains to show that $E_n(\lambda)$ is an analytic function in $U_1$ (clearly, $\det E_n(\lambda) \ne 0$). First, we show that it has no jump on the real $\zeta$-axis. This is easy to verify using the jump condition for $N(\lambda)$ and the identity $\zeta_- = \zeta_+e^{-2\pi i}$ on the negative half axis. Moreover, a simple calculation shows that $E_n(\lambda)$ has no pole at $\lambda = 1$. Thus, $E_n(\lambda)$ is analytic in $U_1$, and the parametrix in $U_1$ is given by Eqs. (79, 82–85, 88) for $\alpha \in [0, 1)$.

Below we shall need the first three terms in the matching condition for $P_1$. Using (86), we obtain
\[
P_1(\lambda)N^{-1}(\lambda) = I + \Delta_1(\lambda) + \Delta_2(\lambda) + O\!\left(\frac{1}{\rho^3}\right), \qquad \lambda \in \partial U_1. \tag{89}
\]
Here
\[
\begin{aligned}
\Delta_1(\lambda) &= \frac{1}{8\sqrt{\zeta}}N(\lambda)\begin{pmatrix} -1 & -2i \\ -2i & 1 \end{pmatrix}N(\lambda)^{-1} = \frac{1}{16\sqrt{\zeta}}\begin{pmatrix} -3m^2 + m^{-2} & -i(3m^2 + m^{-2}) \\ -i(3m^2 + m^{-2}) & 3m^2 - m^{-2} \end{pmatrix}, \\
\Delta_2(\lambda) &= \frac{3}{2^7\zeta}N(\lambda)\begin{pmatrix} -1 & 4i \\ -4i & -1 \end{pmatrix}N(\lambda)^{-1} = \frac{3}{2^7\zeta}\begin{pmatrix} -1 & 4i \\ -4i & -1 \end{pmatrix},
\end{aligned} \tag{90}
\]
where $m(\lambda)$ is defined in (76). Note that both $\Delta_1(\lambda)$ and $\Delta_2(\lambda)$ are meromorphic functions in $U_1$ with a simple pole at $\lambda = 1$.

Recall that we use the contour $\Sigma = \Sigma^{\alpha=0}$ for all $\alpha \in D_{\varepsilon_0}(0)$, $\varepsilon_0$ small. For such $\alpha$, the map $\lambda \to \zeta$ maps $U_1$ (consisting of the three regions separated by $\Sigma$) onto a set, region to region, where the lines separating each region are now no longer straight but lie in small cones about the original ones. The opening angles of the cones are proportional to $|\alpha|$. Using the same definition for $\Psi$ as in (83–85) for each of the new Regions I, II, III, we find again that (86) is valid, and that $P_1(\lambda)N^{-1}(\lambda)$ has the same expansion (89) as in the case $0 \le \alpha < 1$. Note that the values of $\varepsilon_0$ and $s_0$ can be changed (now and below) if necessary.

3.4. Parametrix at $\lambda = 0$. The construction of the parametrix in $U_0$ is similar. Recall, however, that the radius of $U_0$ is $\varepsilon_3(1 - \alpha)$, so it decreases as $\alpha \to 1$, i.e. as the pole of $h(\lambda)$ approaches the point $\lambda = 0$. We shall see that this neighborhood produces asymptotics for the RHP in inverse powers of $n$. We look for an analytic matrix-valued function $P_0(\lambda)$ in the neighborhood $U_0$ which satisfies the same jump conditions as $S(\lambda)$ on $\Sigma \cap U_0$, and satisfies the matching condition
\[
P_0(\lambda)N^{-1}(\lambda) = I + (1 - \alpha)^{-1/2}O(1/n) \tag{91}
\]
uniformly in $\lambda$ on the boundary $\partial U_0$ as $n \to \infty$.

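Below we define functions in $U_0$ which play the same role as $\varphi$, $E_n$, and $\hat P$ in $U_1$. We use the same notation for these quantities as before. Namely, let
\[
\varphi(\lambda) = \begin{cases} e^{i\pi}(h(\lambda) - 2\pi i)/2, & \text{for } \Im\lambda > 0, \\ (h(\lambda) - 2\pi i)/2, & \text{for } \Im\lambda < 0. \end{cases} \tag{92}
\]
This function is analytic in $U_0$ outside $[0, \varepsilon_3(1 - \alpha)]$. As above, we look for the parametrix $P_0(\lambda)$ in the form:
\[
P_0(\lambda) = E_n(\lambda)\hat P(\lambda)e^{n\varphi(\lambda)\sigma_3}. \tag{93}
\]
We obtain that
\[
\begin{aligned}
\hat P_+(\lambda) &= \hat P_-(\lambda)\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad \lambda \in ((\Sigma_1 \cup \Sigma_3) \cap U_0)\setminus\{0\}, \\
\hat P_+(\lambda) &= \hat P_-(\lambda)\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \lambda \in \Sigma_2 \cap U_0.
\end{aligned} \tag{94}
\]
We choose the following mapping of $U_0$ onto a $\zeta$-plane
\[
\zeta = n^2\varphi(\lambda)^2 = e^{-i\pi}4n^2(1 + \alpha)^2\frac{\lambda}{1 - \alpha}\left(1 + \frac{1 - 6\alpha - 3\alpha^2}{3(1 - \alpha^2)}\lambda + O\!\left(\frac{\lambda^2}{(1 - \alpha)^2}\right)\right), \tag{95}
\]
where we used (70). Choosing a sufficiently small $\varepsilon_3 > 0$, we see that $\zeta(\lambda)$ is analytic and one-to-one in the neighborhood $U_0$.

Let us also choose the exact form of the contours in $U_0$ so that their images under the mapping $\zeta(\lambda)$ are straight lines. In the $\zeta$-plane the contour and the jump matrices for $\hat P(\lambda)$ are the same as in Fig. 4 with the only difference that all directions are reversed (pointing away from $\zeta = 0$). It is easily seen that the function
\[
\hat P(\lambda) = \sigma_3\Psi(\zeta)\sigma_3, \tag{96}
\]
where $\Psi(\zeta)$ is given by (83–85), satisfies the jump conditions in this case. Finally, we calculate $E_n$ and obtain
\[
P_0(\lambda) = E_n(\lambda)\sigma_3\Psi(\zeta(\lambda))\sigma_3e^{n\varphi(\lambda)\sigma_3}, \tag{97}
\]
where
\[
E_n(\lambda) = \frac{1}{\sqrt2}N(\lambda)\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}(\pi\sqrt{\zeta})^{\sigma_3/2} \tag{98}
\]
(the analyticity of $E_n(\lambda)$ in $U_0$ is verified as above). Then we see immediately from (86, 95) that
\[
P_0(\lambda)N^{-1}(\lambda) = E_n(\lambda)\sigma_3\Psi(\zeta)\sigma_3e^{n\varphi(\lambda)\sigma_3}N^{-1}(\lambda) = I + \frac{1}{\sqrt{\lambda}}\,O\!\left(\frac{1}{\sqrt{\zeta}}\right) = I + \frac{1}{\sqrt{1 - \alpha}}\,O\!\left(\frac{1}{n}\right) \tag{99}
\]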

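uniformly in $\lambda \in \partial U_0$ and $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$. Of course, the bound in (99) blows up if $\alpha \to 1$ too rapidly: for $0 \le \alpha < 1 - s_0/(2n)^{2/3}$, we see that the error term is $O(n^{-2/3})$. Thus the construction of the parametrix in $U_0$ is now complete. Using the expansion of $\Psi(\zeta)$, we can extend (99) to a full asymptotic series in inverse powers of $n$. Substituting (86) into (99), we obtain in particular:
\[
P_0(\lambda)N^{-1}(\lambda) = I + \Delta_1(\lambda) + \Delta_2(\lambda) + \frac{1}{\sqrt{1 - \alpha}}\,O\!\left(\frac{1}{n^3}\right), \tag{100}
\]
where
\[
\begin{aligned}
\Delta_1(\lambda) &= \frac{1}{8\sqrt{\zeta}}N(\lambda)\begin{pmatrix} -1 & 2i \\ 2i & 1 \end{pmatrix}N(\lambda)^{-1} = \frac{1}{16\sqrt{\zeta}}\begin{pmatrix} m^2 - 3m^{-2} & i(m^2 + 3m^{-2}) \\ i(m^2 + 3m^{-2}) & -m^2 + 3m^{-2} \end{pmatrix}, \\
\Delta_2(\lambda) &= \frac{3}{2^7\zeta}N(\lambda)\begin{pmatrix} -1 & -4i \\ 4i & -1 \end{pmatrix}N(\lambda)^{-1} = \frac{3}{2^7\zeta}\begin{pmatrix} -1 & -4i \\ 4i & -1 \end{pmatrix}.
\end{aligned} \tag{101}
\]
As above, note that $\Delta_1(\lambda)$ and $\Delta_2(\lambda)$ are meromorphic functions in $U_0$ with a simple pole at $\lambda = 0$.

For sufficiently small $\varepsilon_0$, the estimate (100) extends uniformly for $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ for all $n > s_0^{3/2}/2$, and $\lambda \in \partial U_0$ as in Sect. 3.3.

3.5. Final transformation of the problem. Now construction of the parametrices is complete, and we are ready for the last transformation of the Riemann-Hilbert problem. Let
\[
R(\lambda) = \begin{cases} S(\lambda)N^{-1}(\lambda), & \lambda \in \mathbb{C}\setminus(U_0 \cup U_1 \cup \Sigma), \\ S(\lambda)P_0^{-1}(\lambda), & \lambda \in U_0\setminus\Sigma, \\ S(\lambda)P_1^{-1}(\lambda), & \lambda \in U_1\setminus\Sigma. \end{cases} \tag{102}
\]
It is easy to see that this function has jumps only on $\partial U_1$, $\partial U_0$, and parts of $\Sigma_1$, $\Sigma_3$ lying outside the neighborhoods $U_1$, $U_0$ (we denote these parts $\Sigma^{out}$). The contour is shown in Fig. 5. Outside this contour, $R(\lambda)$ is analytic. Besides, $R(\lambda) = I + O(1/\lambda)$ as $\lambda \to \infty$.

Fig. 5. Contour for the $R$-Riemann-Hilbert problem

The jumps are as follows:
\[
\begin{aligned}
R_+(\lambda) &= R_-(\lambda)N(\lambda)\begin{pmatrix} 1 & 0 \\ e^{\mp nh(\lambda)} & 1 \end{pmatrix}N(\lambda)^{-1}, \qquad \lambda \in \Sigma_1^{out} \cup \Sigma_3^{out}, \\
&\quad\text{where the ``$-$'' sign in the exponent is taken on } \Sigma_1^{out}, \text{ and ``$+$'', on } \Sigma_3^{out}, \\
R_+(\lambda) &= R_-(\lambda)P_0(\lambda)N(\lambda)^{-1}, \qquad \lambda \in \partial U_0\setminus\{\text{intersection points}\}, \\
R_+(\lambda) &= R_-(\lambda)P_1(\lambda)N(\lambda)^{-1}, \qquad \lambda \in \partial U_1\setminus\{\text{intersection points}\}.
\end{aligned} \tag{103}
\]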

The jump matrix on $\Sigma^{out}$ can be uniformly estimated (both in $\lambda$ and $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$) as $I + O(\exp(-c\rho))$, where $c$ is a positive constant. In view of the estimates (69), this is obviously true outside a fixed neighborhood of $\lambda = 0$, say when $|\lambda| \ge 1/2$. However, since the parametrix $N(\lambda)$ is of order $1/\lambda^{1/4}$ for $\lambda$ close to zero, and the contour approaches $\lambda = 0$ as $\alpha \to 1$, we need a more detailed analysis for $|\lambda| \le 1/2$. In that case, we use (67) to write for all $\alpha \in [0, 1)$ (in what follows the same symbols $C$ and $c$ stand for various positive constants independent of $\alpha$, $\lambda$, and $n$):
\[
\left|\frac{1}{\sqrt{\lambda}}e^{-nh(\lambda)}\right| < \frac{C}{\sqrt{\mu}}\exp\left[-cn\frac{\sqrt{\mu/(1 - \alpha)}}{(1 + \alpha\mu/(1 - \alpha))^2}\right] = \frac{C}{\sqrt{1 - \alpha}\,t}\exp\left[-cn\frac{t}{(1 + \alpha t^2)^2}\right] \equiv f(t), \tag{104}
\]
where $t = \sqrt{\mu/(1 - \alpha)}$, $\mu = \Re\lambda$. We need to find the maximum value of $f(t)$ in the interval
\[
t_1 \equiv (\varepsilon_3/\varepsilon_2)\sin\gamma_0 \le t \le \frac{1}{\sqrt{2(1 - \alpha)}} \equiv t_2
\]
for all $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$. For this purpose, it is convenient to consider the following two cases separately.

1) $\alpha t^2 \le 1$. Then $1 + \alpha t^2 \le 2$, and we have
\[
f(t) < \frac{C}{t\sqrt{1 - \alpha}}\exp[-cnt] \equiv f_1(t). \tag{105}
\]
The derivative $f_1'(t) < 0$ for $t > 0$, which implies
\[
\max_{t \in [t_1, 1/\sqrt{\alpha}]}f(t) < f_1(t_1) < C\frac{n^{1/3}}{s_0^{1/2}}e^{-cn} < Ce^{-cn}. \tag{106}
\]
If $1/\sqrt{\alpha} > t_2$ this is all we need. Otherwise consider

2) $\alpha t^2 > 1$. Then $1 + \alpha t^2 < 2\alpha t^2$, and we have
\[
f(t) < \frac{C}{t\sqrt{1 - \alpha}}\exp[-cn/t^3] \equiv f_2(t). \tag{107}
\]
The only maximum of $f_2(t)$ is at the point $t_c = (3cn)^{1/3}$. Now choose sufficiently large $s_0 > 0$ (depending on $\varepsilon_2$, $\varepsilon_3$). Then
\[
t_2 = \frac{1}{\sqrt{2(1 - \alpha)}} < cn^{1/3}/s_0^{1/2} < t_c.
\]
Therefore
\[
\max_{t \in [1/\sqrt{\alpha}, t_2]}f(t) < f_2(t_2) < Ce^{-cn(1 - \alpha)^{3/2}} = Ce^{-c\rho}. \tag{108}
\]
Combining (106, 108), we finally obtain that the jump matrix on $(\Sigma_1 \cup \Sigma_3)\setminus(U_0 \cup U_1)$ is the identity up to an error of order
\[
\left|\frac{1}{\sqrt{\lambda}}e^{-nh(\lambda)}\right| < Ce^{-c\rho} \tag{109}
\]
for all $\alpha \in [0, 1 - s_0/(2n)^{2/3}]$, $2n > s_0^{3/2}$.

Asymptotics of the Airy-Kernel Determinant

665

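This estimate can be readily extended to complex $\alpha \in D_{\varepsilon_0}(0)$. The jump matrices on $\partial U_{0,1}$ admit the uniform expansions given by (100, 89). A consequence of the above considerations is the following result:

Lemma 1. Let $\rho = n|1-\alpha|^{3/2}$, $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$, $U = U_0 \cup U_1$, $\tilde\Sigma = \Sigma^{\mathrm{out}} \cup \partial U$. Also let $\tilde U_1$ be the circle centered at $\lambda = 1$ of radius $\varepsilon/2$. Then, for sufficiently small $\varepsilon$, $\varepsilon_j$, $j = 0, 2, 3$ (the $\varepsilon_j$, $j = 0, 2, 3$, are the $\varepsilon$-parameters introduced above in the definition of the contour $\tilde\Sigma$), there exists $s_0 > 0$ such that for all $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ and $n > s_0^{3/2}/2$, a (unique) solution $R(\lambda)$ of the R-RH problem exists. Moreover, the function $R(\lambda)$ admits the following asymptotic expansion, which (and the derivative of which) is uniform for $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ and all $\lambda \in \tilde U_1$, as $\rho \to \infty$:

$$R(\lambda) = I + R_1(\lambda) + R_2(\lambda) + \cdots + R_k(\lambda) + R_r^{(k+1)}(\lambda),$$
$$R_r^{(k+1)}(\lambda) = O(\rho^{-k-1}), \qquad \frac{d}{d\lambda}R_r^{(k+1)}(\lambda) = O(\rho^{-k-1}), \qquad k = 1, 2, \ldots. \tag{110}$$

The functions $R_j(\lambda) = O(\rho^{-j})$ are constructed by induction as follows:

$$R_1(\lambda) = \frac{1}{2\pi i}\int_{\partial U}\Delta_1(s)\,\frac{ds}{s-\lambda}, \qquad R_2(\lambda) = \frac{1}{2\pi i}\int_{\partial U}\bigl(R_{1-}(s)\Delta_1(s) + \Delta_2(s)\bigr)\frac{ds}{s-\lambda}, \tag{111}$$
$$\ldots, \qquad R_k(\lambda) = \frac{1}{2\pi i}\int_{\partial U}\sum_{j=1}^{k} R_{k-j,-}(s)\Delta_j(s)\,\frac{ds}{s-\lambda}, \qquad R_0 \equiv I. \tag{112}$$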

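Remark. The uniformity means that for sufficiently small $\varepsilon$, $\varepsilon_j$, $j = 0, 2, 3$, there exist positive constants $s_0$, $c_1$, and $c_2$ independent of $\alpha$, $n$, $\lambda$ such that

$$|R_r^{(k+1)}| \le \frac{c_1}{\rho^{k+1}}, \qquad \left|\frac{d}{d\lambda}R_r^{(k+1)}\right| \le \frac{c_2}{\rho^{k+1}},$$
$$\forall\lambda \in \tilde U_1, \quad \forall\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}], \quad \forall n > s_0^{3/2}/2. \tag{113}$$

We also note that $\rho > s_0^{3/2}/2$ for all $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$, $n > s_0^{3/2}/2$.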

Proof of Lemma 1. We shall follow a line of argument similar to the one used to prove a similar statement in [17] (Lemma 1). For simplicity, as in [17], we will only prove the expansion (110) in the case $k = 2$, which is all that is needed for the problem at hand. We shall also adopt the notation $R_r^{(3)}(\lambda) \equiv R_r(\lambda)$. Besides, as before, the symbol $c$ will stand for various positive constants independent of $\alpha$, $\lambda$, and $n$. Write the jump condition for $R(\lambda)$ in the form

$$R_{0+} + R_{1+} + R_{2+} + R_{r+} = (R_{0-} + R_{1-} + R_{2-} + R_{r-})(I + \Delta_1 + \Delta_2 + \Delta_r). \tag{114}$$

Here $\Delta_1$ and $\Delta_2$ are given by (101, 90) on $\partial U_0$, $\partial U_1$, respectively, and we set $\Delta_1 = \Delta_2 = 0$ on the rest of the contour. A direct analysis of the expressions (101, 90) shows that $\Delta_k = O(n^{-k}|1-\alpha|^{-1/2})$ on $\partial U_0$, and $\Delta_k = O(\rho^{-k})$ on $\partial U_1$. Similarly, $\Delta_r = O(1/\rho^3)$ on $\partial U_1$ (this error term arises from the Bessel asymptotics), $\Delta_r = O(|1-\alpha|^4/\rho^3)$ on $\partial U_0$, and, by (109), $\Delta_r = O(e^{-c\rho})$ on $\tilde\Sigma\setminus\partial U$.

We now show that we can define $R_1$ and $R_2$ so that they are of order $1/\rho$ and $1/\rho^2$, respectively. We then show that the remainder $R_r$ is of order $1/\rho^3$. Set $R_0 = I$. We define $R_j$ by collecting in (114) the terms that we want to be of the same order. First,

$$R_{1+}(\lambda) = R_{1-}(\lambda) + \Delta_1(\lambda), \qquad \lambda \in \tilde\Sigma. \tag{115}$$

We are looking for a function $R_1(\lambda)$, which is analytic outside $\tilde\Sigma$, satisfying $R_1(\lambda) = O(1/\lambda)$, $\lambda \to \infty$, and the above jump condition. The solution to this RH-problem is given by the Sokhotsky-Plemelj formula, $R_1(\lambda) = C(\Delta_1)$, where

$$C(f) = \frac{1}{2\pi i}\int_{\tilde\Sigma} f(s)\,\frac{ds}{s-\lambda} \tag{116}$$

is the Cauchy operator on $\tilde\Sigma$. The condition $\Delta_1(\lambda) = O(1/\rho)$, $\lambda \in \tilde\Sigma$, $\rho \to \infty$ (uniform in $\alpha$), implies that there exist $c, \delta, s_0 > 0$ such that

$$|R_1(\lambda)| \le c/\rho, \qquad n \ge \frac{s_0^{3/2}}{2} \tag{117}$$

uniformly in $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ and $\lambda$ satisfying $\mathrm{dist}(\lambda, \tilde\Sigma) \ge \delta$. Actually, this estimate is uniform for all $\lambda \in \mathbb{C}\setminus\tilde\Sigma$ up to $\tilde\Sigma$. Indeed, since

$$R_1(\lambda) = \frac{1}{2\pi i}\int_{\partial U}\Delta_1(s)\,\frac{ds}{s-\lambda}, \tag{118}$$

for $\lambda$ outside a fixed neighborhood of zero, this is seen by shifting the contour to a fixed distance from the point $\lambda$. Inside that neighborhood, the distance of the shift will depend on $\alpha$. Namely, the distance is $\varepsilon|1-\alpha|$ for a fixed (sufficiently small) $\varepsilon > 0$. Then

$$|C(\Delta_1)| \le \max|\Delta_1|\,\frac{1}{c|1-\alpha|} + \frac{c}{\rho} \le \frac{c}{n|1-\alpha|^{3/2}} + \frac{c}{\rho} = \frac{c}{\rho}, \tag{119}$$

on and close to $\partial U_0$. Here we used the estimate $\Delta_1 = O(n^{-1}\lambda^{-1/2})$, so that in the neighborhood of the circle $\partial U_0$ the inequality

$$\max|\Delta_1| \le \frac{c}{n|\sqrt{1-\alpha}|}$$

holds. It should be observed that, by the same deformation of the contour of integration in (118), one obtains the analytic continuations of both the functions $R_{1+}(\lambda)$ and $R_{1-}(\lambda)$ in the neighborhood of the contour $\partial U$ and hence in the neighborhood of $\tilde\Sigma$ (we note that on the part $\Sigma^{\mathrm{out}}$ of the contour $\tilde\Sigma$, $R_1(\lambda)$ has no jump). Moreover, the estimate (117) is preserved under this analytic continuation.


Now define $R_2(\lambda)$ by the jump condition

$$R_{2+}(\lambda) = R_{2-}(\lambda) + R_{1-}(\lambda)\Delta_1(\lambda) + \Delta_2(\lambda), \qquad \lambda \in \tilde\Sigma, \tag{120}$$

together with the requirement of analyticity for $\lambda \in \mathbb{C}\setminus\tilde\Sigma$, and the condition $R_2(\lambda) = O(1/\lambda)$ for $\lambda \to \infty$. The solution to this RHP is

$$R_2(\lambda) = C(R_{1-}\Delta_1 + \Delta_2), \qquad \lambda \in \mathbb{C}\setminus\tilde\Sigma. \tag{121}$$

Using (117) and the estimates for $\Delta_2$, we obtain in the same way as for $R_1$,

$$|R_2(\lambda)| \le c/\rho^2, \qquad \lambda \in \mathbb{C}\setminus\tilde\Sigma, \qquad n \ge \frac{s_0^{3/2}}{2} \tag{122}$$

with the same uniformity and analyticity properties in $\alpha$ and $\lambda$. Now from (114, 115, 120) we obtain

$$R_{r+}(\lambda) = R_{r-}(\lambda) + M(\lambda) + R_{r-}(\lambda)\Delta(\lambda), \qquad \lambda \in \tilde\Sigma, \tag{123}$$

where

$$M \equiv R_{2-}\Delta_1 + (R_{1-} + R_{2-})\Delta_2 + (I + R_{1-} + R_{2-})\Delta_r, \qquad \Delta \equiv \Delta_1 + \Delta_2 + \Delta_r.$$

Remark. In the terminology of [13], Eq. (123) is an inhomogeneous RH-problem of type 2.

Since $R_r = R - I - R_1 - R_2$, the matrix function $R_r(\lambda)$ is analytic outside $\tilde\Sigma$ and satisfies the condition $R_r(\lambda) = O(1/\lambda)$ as $\lambda \to \infty$. Therefore

$$R_r(\lambda) = C(M) + C(R_{r-}\Delta), \qquad \lambda \in \mathbb{C}\setminus\tilde\Sigma. \tag{124}$$

Hence

$$R_{r-}(\lambda) = C_-(M) + C_-(R_{r-}\Delta), \qquad \lambda \in \tilde\Sigma, \tag{125}$$

where $C_-(f) = \lim_{\lambda'\to\lambda} C(f)(\lambda')$, as $\lambda'$ approaches a point $\lambda \in \tilde\Sigma$ from the $-$ side of $\tilde\Sigma$. Now defining the operator $C_\Delta(f) \equiv C_-(f\Delta)$, we represent (125) in the form

$$(I - C_\Delta)(R_{r-}) = C_-(M). \tag{126}$$

By virtue of the estimates (100), (89), and (109) we have that

$$\|\Delta\|_{L^2(\tilde\Sigma)\cap L^\infty(\tilde\Sigma)} \le \frac{c}{\rho}, \tag{127}$$

for all $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ and $n > s_0^{3/2}/2$. The Cauchy operator $C_-$ is bounded in the space $L^2(\tilde\Sigma)$ (see, e.g., [28]), and by a standard scaling argument (the Cauchy operator is homogeneous of degree 0), its norm is bounded by a constant independent of $\alpha$. This together with the $L^\infty$ part of the estimate (127) implies that the operator norm $\|C_\Delta\|_{L^2} = O(1/\rho)$, and hence $I - C_\Delta$ is invertible by a Neumann series for $s_0$ (and, therefore, $\rho$) sufficiently large. Thus (126) gives

$$R_{r-} = (I - C_\Delta)^{-1}(C_-(M)), \tag{128}$$

and this proves the solvability of the R-RH problem for all $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ and $n > s_0^{3/2}/2$. Moreover, using the $L^2$ part of the estimate (127), we conclude that $\|C_-(M)\|_{L^2(\tilde\Sigma)} = O(\rho^{-3})$. Together with (128) this yields the uniform estimate

$$\|R_{r-}\|_{L^2(\tilde\Sigma)} \le \frac{c}{\rho^3}, \tag{129}$$

$\forall\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$, $n > s_0^{3/2}/2$. The solution $R(\lambda)$ of the R-RH problem is given by the integral representation

$$R(\lambda) = I + R_1(\lambda) + R_2(\lambda) + C(M)(\lambda) + C(R_{r-}\Delta)(\lambda), \qquad \lambda \in \mathbb{C}\setminus\tilde\Sigma. \tag{130}$$
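The inversion used in (128) rests on the Neumann series: when the operator norm of $C_\Delta$ is below 1, $(I - C_\Delta)^{-1} = \sum_{k\ge 0} C_\Delta^k$. The sketch below is only a finite-dimensional illustration of that mechanism. A small matrix stands in for the singular integral operator, and the entries, the value of `rho`, and the helper names are all made up for the example.

```python
def mat_vec(a, x):
    """Matrix-vector product for a square matrix given as a list of rows."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def neumann_solve(a, b, terms=60):
    """Approximate the solution x of (I - a) x = b by the partial sum
    x = b + a b + a^2 b + ...; this converges when the operator norm of a is < 1."""
    x = list(b)
    term = list(b)
    for _ in range(terms):
        term = mat_vec(a, term)
        x = [xi + ti for xi, ti in zip(x, term)]
    return x

def residual(a, b, x):
    """Max-norm of (I - a) x - b."""
    ax = mat_vec(a, x)
    return max(abs(xi - axi - bi) for xi, axi, bi in zip(x, ax, b))

if __name__ == "__main__":
    rho = 20.0
    a = [[0.0, 1.0 / rho], [-1.0 / rho, 0.5 / rho]]  # stand-in operator of size O(1/rho)
    b = [1.0 / rho**3, -2.0 / rho**3]                # stand-in right-hand side of size O(rho^-3)
    x = neumann_solve(a, b)
    print(x, residual(a, b, x))
```

In this toy setting the solution inherits the size of the right-hand side up to a factor close to 1, which mirrors how (128) and the bound on $C_-(M)$ combine to give (129).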

Remark. Let $\Omega_k$, $k = 1, 2, 3, 4$ denote the connected components of the set $\mathbb{C}\setminus\tilde\Sigma$. Then, using again the possibility of the contour deformation when solving the integral equation (126), and taking into account the triviality of the jump matrix monodromy at each node point of the contour $\tilde\Sigma$, we conclude that the restriction $R|_{\Omega_k}(\lambda)$ is continuous in $\overline{\Omega_k}$ for each $k$ (see e.g. [4]). This means that Eq. (130) defines the solution of the R-RH problem in the classical, point-wise continuous, sense.

Combining the inequality (129) with Eq. (130), we can complete the proof of the lemma. Indeed, assuming that $\lambda \in \tilde U_1$, we immediately obtain the estimate

$$|C(M)(\lambda)| \le \frac{c}{\rho^3}, \qquad n > s_0^{3/2}/2, \tag{131}$$

for the fourth term in the r.h.s. of (130), and the estimate

$$|C(R_{r-}\Delta)(\lambda)| \le c\,\|R_{r-}\|_{L^2(\tilde\Sigma)}\,\|\Delta\|_{L^2(\tilde\Sigma)} \le \frac{c}{\rho^3}, \qquad n > s_0^{3/2}/2, \tag{132}$$

for the fifth term. Both the estimates are uniform in $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$. Together they yield the estimate

$$|R_r(\lambda)| \le \frac{c}{\rho^3}, \qquad n > s_0^{3/2}/2, \tag{133}$$

uniformly in $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ and $\lambda$ lying in $\tilde U_1$. This establishes part of the estimate (110) for the error term. The estimate for the derivative follows immediately from (124). This completes the proof of the lemma (in the case $k = 2$).

Remark. (cf. Remark 2 in [17]). Part of the assertion of Lemma 1 is that the solution of the R-RH problem, and hence of the original T-RH problem, exists and is unique for all $\alpha \in D_{\varepsilon_0}(0) \cup [0, 1 - s_0/(2n)^{2/3}]$ and $n > s_0^{3/2}/2$ with $s_0$ sufficiently large. This is all we need in the analysis that follows; however, the solution of the R-RH problem actually exists and is unique for all $\alpha \in D_{\varepsilon_0''}(0) \cup [0, 1)$ and all $n > 0$ for some (possibly smaller) $\varepsilon_0'' > 0$. Indeed, by the discussion following (58), the T-RH problem, and hence the R-RH problem, is solvable for all $\alpha \in [0, 1)$, $n > 0$. Since, by the previous remark, the solution of the R-RH problem is continuous up to the contour, the problem is easily seen to be solvable for $\alpha \in D_{\varepsilon_0'}(0)$, $0 < n \le s_0^{3/2}/2$ for some $\varepsilon_0' > 0$ by continuity of the jump matrix at $\alpha = 0$. By Lemma 1, the R-RH problem is solvable for all $\alpha \in D_{\varepsilon_0}(0)$, $n > s_0^{3/2}/2$. Thus the R-RH problem, and hence the T-RH problem, is solvable for all $n > 0$ on $D_{\varepsilon_0''}(0) \cup [0, 1)$, where $\varepsilon_0'' = \min\{\varepsilon_0, \varepsilon_0'\}$.


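4. Evaluation of the Differential Identity

4.1. Exact transformations. We start with the differential identity (37). Note that since $V(z)$ is related to $U(\lambda)$ by the expression (46, 45) $U(\lambda) = V(z(\lambda))$, we have

$$z = \frac{\alpha\lambda}{1 - \alpha + \alpha\lambda}, \qquad \left.\frac{d\lambda}{dz}\right|_{z=\alpha} = \frac{1}{\alpha(1-\alpha)}, \tag{134}$$

and (37) can be rewritten in terms of $U(\lambda)$ as follows:

$$\frac{d}{d\alpha}\ln D_n(\alpha) = \frac{e^{-4n\alpha}}{2\pi i\,\alpha(1-\alpha)}\bigl(U_{11}(1)U_{21}'(1) - U_{11}'(1)U_{21}(1)\bigr). \tag{135}$$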
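The Jacobian factor in (134) can be sanity-checked numerically. In this minimal sketch the inverse map `lambda_of_z` is not quoted from the text; it is obtained here by solving $z = \alpha\lambda/(1-\alpha+\alpha\lambda)$ for $\lambda$, and the function names and the sample value of $\alpha$ are assumptions for illustration.

```python
def z_of_lambda(lam: float, alpha: float) -> float:
    """Change of variable from (134): z = alpha*lambda / (1 - alpha + alpha*lambda)."""
    return alpha * lam / (1.0 - alpha + alpha * lam)

def lambda_of_z(z: float, alpha: float) -> float:
    """Inverse map obtained by solving (134) for lambda (derived here, not quoted)."""
    return z * (1.0 - alpha) / (alpha * (1.0 - z))

def dlambda_dz_at_alpha(alpha: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of d(lambda)/dz at z = alpha."""
    return (lambda_of_z(alpha + h, alpha) - lambda_of_z(alpha - h, alpha)) / (2.0 * h)

if __name__ == "__main__":
    alpha = 0.3  # illustrative value (assumption)
    print(z_of_lambda(1.0, alpha))  # the point lambda = 1 corresponds to z = alpha
    print(dlambda_dz_at_alpha(alpha), 1.0 / (alpha * (1.0 - alpha)))
```

The point $\lambda = 1$ corresponds to $z = \alpha$, which is why the derivatives in (135) are evaluated at $\lambda = 1$.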

Note that the derivatives in (135) are taken w.r.t. $\lambda$. By (53, 58), the matrix elements of $U(\lambda)$ can be expressed in terms of $T(\lambda)$ as follows:

$$U_{11}(\lambda) = \left(T^{-1}\!\left(-\frac{1-\alpha}{\alpha}\right)T(\lambda)\right)_{11} e^{n\hat g(\lambda)}, \qquad U_{21}(\lambda) = \left(T^{-1}\!\left(-\frac{1-\alpha}{\alpha}\right)T(\lambda)\right)_{21} e^{-nl}\,e^{n\hat g(\lambda)}. \tag{136}$$

Furthermore, for $\lambda$ outside the lens in $U_1$,

$$T(\lambda) = S(\lambda), \qquad S(\lambda) = R(\lambda)P_1(\lambda). \tag{137}$$

Note also that by (87),

$$S_{j1} = \bigl(R(\lambda)E_n(\lambda)\Psi(\zeta)\bigr)_{j1}\,e^{n\phi(\lambda)}, \qquad j = 1, 2,$$

and, as follows from the definitions of the functions $\phi$, $h$, and the properties of $g(z)$,

$$\phi(\lambda) + \hat g(\lambda) = \mp\frac{1}{2}h + \hat g = \mp\frac{\hat g_+ - \hat g_-}{2} + \hat g_\pm = \frac{\hat g_+ + \hat g_-}{2} = 2z(\lambda) + l/2,$$

where $\hat g_\pm(\lambda)$ stand for the analytic continuation of these functions. Here the upper sign corresponds to $\operatorname{Im}\lambda > 0$, and the lower, to $\operatorname{Im}\lambda < 0$. Hence, (135) finally gives

$$\frac{d}{d\alpha}\ln D_n(\alpha) = \frac{1}{2\pi i\,\alpha(1-\alpha)}\bigl((RE\Psi)_{11}(1)(RE\Psi)_{21}'(1) - (RE\Psi)_{11}'(1)(RE\Psi)_{21}(1)\bigr), \tag{138}$$

where we used the fact that $\det T^{-1}(-(1-\alpha)/\alpha) = 1$. In (138), the derivative at $\lambda = 1$ is taken along a path in $U_1$ outside the lens. In the next subsection we use the solution of the Riemann-Hilbert problem for $R(\lambda)$ (found in Sect. 3) to construct the asymptotics of the r.h.s. of (138).


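4.2. Asymptotics. Consecutive asymptotic terms in the expansion of the logarithmic derivative (138) are generated by consecutive terms in (110): $R(\lambda) = I + R_1(\lambda) + R_2(\lambda) + \cdots$. Thus, setting $R = I$ in (138) gives the main asymptotic term of $\frac{d}{d\alpha}\ln D_n(\alpha)$:

$$\frac{1}{2\pi i\,\alpha(1-\alpha)}\bigl((E\Psi)_{11}(1)(E\Psi)_{21}'(1) - (E\Psi)_{11}'(1)(E\Psi)_{21}(1)\bigr). \tag{139}$$

Using (88) and (85), we obtain

$$(E\Psi)_{11}(\zeta) = \mu_+(\lambda), \qquad (E\Psi)_{21}(\zeta) = -i\mu_-(\lambda), \tag{140}$$

where

$$\mu_\pm(\lambda) = \sqrt{\frac{\pi}{2}}\,\zeta^{1/4}\bigl(m^{-1}(\lambda)I_0(\sqrt{\zeta}) \pm m(\lambda)I_0'(\sqrt{\zeta})\bigr). \tag{141}$$

Using the expansion of Bessel functions as $\zeta \to 0$ (i.e. $\lambda \to 1$), we obtain

$$\mu_\pm(1) \equiv M = \sqrt{\pi n}\,(1-\alpha)^{3/4}, \qquad \mu_\pm'(1) \equiv a \pm b,$$
$$a = M\left(n^2(1-\alpha)^3 - \frac{\alpha}{2} + \frac{1}{6}\right), \qquad b = Mn(1-\alpha)^{3/2}. \tag{142}$$

Substituting these values into (139), we find the main asymptotic term

$$\frac{d}{d\alpha}\ln D_n(\alpha) \sim \frac{n^2}{\alpha}(1-\alpha)^2. \tag{143}$$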
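The step from (139) to (143) can be checked with a few lines of arithmetic. The sketch below is illustrative only: it substitutes the values $\mu_\pm(1) = M$ and $\mu_\pm'(1) = a \pm b$ into the first-column entries $\mu_+$ and $-i\mu_-$ of (140), and it treats $a$ as a free parameter (the hypothetical argument `a_over_M`) to show that $a$ cancels and only $b$ contributes.

```python
import math

def main_term(n: float, alpha: float, a_over_M: float = 0.0) -> complex:
    """Evaluate (139) with first-column entries mu_+ and -i*mu_- as in (140),
    using mu_pm(1) = M and mu_pm'(1) = a +/- b from (142).
    a_over_M is arbitrary: the quantity a drops out of the final expression."""
    M = math.sqrt(math.pi * n) * (1.0 - alpha) ** 0.75
    b = M * n * (1.0 - alpha) ** 1.5
    a = M * a_over_M
    e11, e21 = M, -1j * M                    # values at lambda = 1
    de11, de21 = a + b, -1j * (a - b)        # derivatives at lambda = 1
    return (e11 * de21 - de11 * e21) / (2j * math.pi * alpha * (1.0 - alpha))

if __name__ == "__main__":
    n, alpha = 50.0, 0.4  # illustrative values (assumption)
    print(main_term(n, alpha, a_over_M=123.0), n**2 * (1.0 - alpha) ** 2 / alpha)
```

Algebraically the bracket in (139) reduces to $2iMb$, so the term equals $Mb/(\pi\alpha(1-\alpha)) = n^2(1-\alpha)^2/\alpha$.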

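To obtain the next term, we need to compute first

$$R_1(1) = \frac{1}{2\pi i}\int_{\partial U}\frac{\Delta_1(\lambda)}{\lambda - 1}\,d\lambda, \qquad R_1'(1) = \frac{1}{2\pi i}\int_{\partial U}\frac{\Delta_1(\lambda)}{(\lambda-1)^2}\,d\lambda. \tag{144}$$

We now examine $\Delta_{1,2}$ in the neighborhoods of the points $\lambda = 0, 1$. Using (95) and expanding the matrix elements of $N(\lambda)$, we obtain from (101):

$$\Delta_1(\lambda) = \frac{C_1}{\lambda} + \frac{\sqrt{1-\alpha}}{32n(1+\alpha)}\begin{pmatrix} F_0(\alpha) - 5/2 & i(F_0(\alpha) + 7/2)\\ i(F_0(\alpha) + 7/2) & -F_0(\alpha) + 5/2\end{pmatrix} + O(\lambda),$$
$$F_0(\alpha) = \frac{1 - 6\alpha - 3\alpha^2}{6(1-\alpha^2)}, \qquad C_1 = \frac{\sqrt{1-\alpha}}{32n(1+\alpha)}\begin{pmatrix} -1 & -i\\ -i & 1\end{pmatrix}, \qquad \lambda \in U_0. \tag{145}$$

For $\Delta_2(\lambda)$, we obtain similarly:

$$\Delta_2(\lambda) = \frac{3(1-\alpha)(1 + O(\lambda))}{2^9 n^2(1+\alpha)^2\lambda}\begin{pmatrix} 1 & 4i\\ -4i & 1\end{pmatrix}, \qquad \lambda \in U_0. \tag{146}$$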


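In $U_1$, a similar calculation based on (90) and (81) gives ($\lambda = 1 + u$),

$$\Delta_1(\lambda) = \frac{A_1}{u} + \frac{1}{32n(1-\alpha)^{3/2}}\begin{pmatrix} -5/2 + \alpha + 1/6 & -i(7/2 + \alpha + 1/6)\\ -i(7/2 + \alpha + 1/6) & -(-5/2 + \alpha + 1/6)\end{pmatrix}$$
$$+ \frac{u}{32n(1-\alpha)^{3/2}}\begin{pmatrix} 3/2 - (5/2)(\alpha + 1/6) + F_1(\alpha) & -i(-3/2 + (7/2)(\alpha + 1/6) + F_1(\alpha))\\ -i(-3/2 + (7/2)(\alpha + 1/6) + F_1(\alpha)) & -(3/2 - (5/2)(\alpha + 1/6) + F_1(\alpha))\end{pmatrix} + O(u^2),$$
$$A_1 = \frac{1}{32n(1-\alpha)^{3/2}}\begin{pmatrix} 1 & -i\\ -i & -1\end{pmatrix}, \qquad F_1(\alpha) = \frac{\alpha}{6} - \frac{31}{4\cdot 45}, \qquad 1 + u = \lambda, \quad \lambda \in U_1, \tag{147}$$

$$\Delta_2(\lambda) = \frac{3(1 + (2\alpha + 1/3)u + O(u^2))}{2^9 n^2(1-\alpha)^3 u}\begin{pmatrix} -1 & 4i\\ -4i & -1\end{pmatrix}, \qquad \lambda \in U_1. \tag{148}$$

Now the expressions for $R_1(1)$ and $R_1'(1)$ are obtained from the above results and (144) by a straightforward residue calculation:

$$R_1(1) = \begin{pmatrix}\delta & \eta\\ \eta & -\delta\end{pmatrix}, \qquad \delta = \frac{1}{32n(1-\alpha)^{3/2}}\left(5/2 - \alpha - 1/6 - \frac{(1-\alpha)^2}{1+\alpha}\right),$$
$$\eta = \frac{i}{32n(1-\alpha)^{3/2}}\left(7/2 + \alpha + 1/6 - \frac{(1-\alpha)^2}{1+\alpha}\right),$$
$$R_1'(1) = \begin{pmatrix}\sigma & \tau\\ \tau & -\sigma\end{pmatrix}, \qquad \sigma = \frac{1}{32n(1-\alpha)^{3/2}}\left(-3/2 + (5/2)(\alpha + 1/6) - F_1(\alpha) + \frac{(1-\alpha)^2}{1+\alpha}\right),$$
$$\tau = \frac{i}{32n(1-\alpha)^{3/2}}\left(-3/2 + (7/2)(\alpha + 1/6) + F_1(\alpha) + \frac{(1-\alpha)^2}{1+\alpha}\right). \tag{149}$$

Note that the contours $\partial U_{0,1}$ are traversed in the negative direction.

We shall be using the following notation for the expansion terms of the logarithmic derivative (138). We denote by $R_k \cdot R_m$ ($R_0 \equiv I$) the term given by

$$\frac{1}{2\pi i\,\alpha(1-\alpha)(1 + \delta_{k,m})}\bigl((R_kE\Psi)_{11}(1)(R_mE\Psi)_{21}'(1) + (R_mE\Psi)_{11}(1)(R_kE\Psi)_{21}'(1)$$
$$- (R_kE\Psi)_{11}'(1)(R_mE\Psi)_{21}(1) - (R_mE\Psi)_{11}'(1)(R_kE\Psi)_{21}(1)\bigr). \tag{150}$$

For example, the main term (139) is $I \cdot I$. We can now evaluate the next ($R_1 \cdot I$) term in the expansion. It is written as follows:

$$\frac{d}{d\alpha}\ln D_n(\alpha) - \frac{n^2}{\alpha}(1-\alpha)^2 \sim \frac{1}{2\pi i\,\alpha(1-\alpha)}\Biggl\{\left(R_1(1)\begin{pmatrix}1\\ -i\end{pmatrix}\right)_1 M(-i\mu_-'(1)) + M\left(R_1(\lambda)\begin{pmatrix}\mu_+(\lambda)\\ -i\mu_-(\lambda)\end{pmatrix}\right)_2'(1)$$
$$- \left(R_1(\lambda)\begin{pmatrix}\mu_+(\lambda)\\ -i\mu_-(\lambda)\end{pmatrix}\right)_1'(1)(-iM) - \mu_+'(1)\left(R_1(1)\begin{pmatrix}1\\ -i\end{pmatrix}\right)_2 M\Biggr\} = \frac{M^2(\tau + i\sigma)}{\pi i\,\alpha(1-\alpha)} = \frac{\alpha}{4(1-\alpha^2)}, \tag{151}$$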


where we first simplified the expression substituting the above symbolic representation of $R_1$ in terms of $\delta$, $\eta$, $\sigma$, $\tau$, and used their numerical values only at the last step. It turns out that the two terms in the asymptotics just obtained are all we need (up to the error term). The following lemma is the main result of this section:

Lemma 2. There exists $s_0 > 0$ such that the expansion

$$\frac{d}{d\alpha}\ln D_n(\alpha) = \frac{n^2}{\alpha}(1-\alpha)^2 + \frac{\alpha}{4(1-\alpha^2)} + r(n, \alpha), \tag{152}$$
$$r(n, \alpha) = \frac{1}{1-\alpha}\,O\!\left(\frac{1}{\rho}\right), \qquad \rho = n|1-\alpha|^{3/2}, \tag{153}$$

holds uniformly in $\alpha \in (0, 1 - s_0/(2n)^{2/3}]$ for all $n > s_0^{3/2}/2$.

Proof. It only remains to prove the expression for the error term. We consider the expansion of $R(\lambda)$ up to the third term: $R = I + R_1 + R_2 + R_r$. Since $R_k = O(\rho^{-k})$ and, according to (142), $\mu_\pm(1)\mu_\pm'(1) = O(\rho^3)$, it is not difficult to deduce from (138) (cf. (151)) that the contributions of the terms $R_r \cdot R_1$, $R_2 \cdot R_2$ and higher are of order $(\alpha(1-\alpha))^{-1}O(\rho^{-1})$. Thus we shall need to consider in detail only the following 4 terms: $R_1 \cdot R_1$, $R_2 \cdot I$, $R_2 \cdot R_1$, $R_r \cdot I$. For the $R_1 \cdot R_1$ term, which we denote $L_{11}$, we obtain after a calculation similar to (151):

$$L_{11} = -\frac{n^2}{\alpha}(1-\alpha)^2(\delta^2 + \eta^2) = \frac{1}{2^8\alpha(1-\alpha^2)}(\alpha + 2/3)(2 + 5\alpha - \alpha^2). \tag{154}$$

For further analysis, we need to calculate $R_2(1)$. It is given by the formula:

$$R_2(1) = \frac{1}{2\pi i}\int_{\partial U}\frac{R_{1-}(\lambda)\Delta_1(\lambda) + \Delta_2(\lambda)}{\lambda - 1}\,d\lambda. \tag{155}$$
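The closed forms in (151) and (154) follow from the entries listed in (149) by elementary algebra, which can be verified with exact rational arithmetic. The sketch below is only a check under stated conventions: the common prefactor $1/(32n(1-\alpha)^{3/2})$ and the factor $i$ in $\eta$, $\tau$ are stripped off by hand, $M^2 = \pi n(1-\alpha)^{3/2}$ is used from (142), and the function names are invented for the example.

```python
from fractions import Fraction as F

def brackets(alpha: F):
    """Bracketed factors of delta, eta, sigma, tau in (149), with the common
    prefactor 1/(32 n (1-alpha)^{3/2}) (and the factor i in eta, tau) removed."""
    q = (1 - alpha) ** 2 / (1 + alpha)
    f1 = alpha / 6 - F(31, 4 * 45)        # F_1(alpha) from (147)
    shift = alpha + F(1, 6)
    d = F(5, 2) - shift - q
    e = F(7, 2) + shift - q
    s = F(-3, 2) + F(5, 2) * shift - f1 + q
    t = F(-3, 2) + F(7, 2) * shift + f1 + q
    return d, e, s, t

def r1_dot_identity(alpha: F) -> F:
    """M^2 (tau + i sigma) / (pi i alpha (1-alpha)) with M^2 = pi n (1-alpha)^{3/2};
    all n-dependence cancels, leaving (s + t) / (32 alpha (1-alpha))."""
    _, _, s, t = brackets(alpha)
    return (s + t) / (32 * alpha * (1 - alpha))

def l11(alpha: F) -> F:
    """-(n^2/alpha)(1-alpha)^2 (delta^2 + eta^2); since eta carries a factor i,
    delta^2 + eta^2 is proportional to d^2 - e^2."""
    d, e, _, _ = brackets(alpha)
    return -(d * d - e * e) / (1024 * alpha * (1 - alpha))

if __name__ == "__main__":
    a = F(1, 3)
    print(r1_dot_identity(a), a / (4 * (1 - a**2)))
    print(l11(a), (a + F(2, 3)) * (2 + 5 * a - a**2) / (256 * a * (1 - a**2)))
```

Note that $F_1(\alpha)$ cancels in $\sigma + \tau/i$, so the $R_1 \cdot I$ term does not depend on it.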

The solution of the Riemann-Hilbert problem for $R_1$ inside $U_{1,0}$ is given by the expression (which we write on the boundary)

$$R_{1-}(\lambda) = \frac{A_1}{\lambda - 1} + \frac{C_1}{\lambda} - \Delta_1(\lambda), \qquad \lambda \in \partial U, \tag{156}$$

where $A_1$, $C_1$ are defined in (147, 145). Note that outside $U_{1,0}$ the solution is

$$R_1(\lambda) = \frac{A_1}{\lambda - 1} + \frac{C_1}{\lambda}.$$

It is easily seen that the jump, analyticity conditions, and the condition at infinity of the Riemann-Hilbert problem for $R_1(\lambda)$ are satisfied, and therefore, by uniqueness, this is the solution. The expansions for $\Delta_{1,2}$ obtained above and the formulas (156, 155) give, by a residue calculation, the final expression for $R_2(1)$:

$$R_2(1) = \begin{pmatrix}\gamma & -\beta\\ \beta & \gamma\end{pmatrix}, \qquad \gamma = \frac{-1}{2^9 n^2(1-\alpha)^3}\left\{(3\alpha - 1)\left(1 - \frac{(1-\alpha)^2}{3(1+\alpha)}\right) + 3 - \frac{(1-\alpha)^2}{1+\alpha}\right\}, \tag{157}$$

where the expression for $\beta$ is omitted as it is not needed below.
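The expression for $\gamma$ in (157) can be cross-checked against (154): with $b$ and $M$ from (142), the combination $2bM\gamma/(\pi\alpha(1-\alpha))$ should equal $-L_{11}$ exactly. The sketch below performs that check in rational arithmetic. It is an illustration rather than the residue computation itself, and the function names are assumptions.

```python
from fractions import Fraction as F

def gamma_bracket(alpha: F) -> F:
    """Braced factor of gamma in (157); gamma = -bracket / (2^9 n^2 (1-alpha)^3)."""
    q = (1 - alpha) ** 2 / (1 + alpha)
    return (3 * alpha - 1) * (1 - q / 3) + 3 - q

def two_b_m_gamma_term(alpha: F) -> F:
    """2 b M gamma / (pi alpha (1-alpha)) with M^2 = pi n (1-alpha)^{3/2} and
    b = M n (1-alpha)^{3/2}: 2 b M = 2 pi n^2 (1-alpha)^3, so n cancels."""
    return -gamma_bracket(alpha) / (256 * alpha * (1 - alpha))

def l11_closed_form(alpha: F) -> F:
    """Right-hand side of (154)."""
    return (alpha + F(2, 3)) * (2 + 5 * alpha - alpha**2) / (256 * alpha * (1 - alpha**2))

if __name__ == "__main__":
    a = F(2, 5)
    print(two_b_m_gamma_term(a), -l11_closed_form(a))
```

Equivalently, the braced factor in (157) equals $(\alpha + 2/3)(2 + 5\alpha - \alpha^2)/(1+\alpha)$.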


To compute the “$R_2 \cdot I$” term (which we denote $L_{20}$) note first that the contribution of the terms in that expression involving $R_2'(1)$ is of order $(\alpha(1-\alpha))^{-1}O(\rho^{-1})$ and we need not calculate them. The remainder gives a nontrivial contribution, and we obtain:

$$L_{20} = \frac{2bM\gamma}{\pi\alpha(1-\alpha)} + \frac{1}{\alpha(1-\alpha)}\,O\!\left(\frac{1}{n(1-\alpha)^{3/2}}\right). \tag{158}$$

The expression for $\gamma$ tells us that this is equal to $-L_{11}$ (154) up to the error term. Thus, we conclude that the contributions of the $R_2 \cdot I$ and $R_1 \cdot R_1$ terms cancel each other. The analysis of the $R_2 \cdot R_1$ term is now easy to carry out, and we find that this term is of order $(\alpha(1-\alpha))^{-1}O(\rho^{-2})$. For any matrix elements of $R_r(1)$ (we only know they are of order $O(\rho^{-3})$), we obtain that the $R_r \cdot I$ term is of order $(\alpha(1-\alpha))^{-1}O(\rho^{-1})$. Thus, in view of uniformity of the error term in the expansion of $R(\lambda)$, the lemma is proven but with the remainder

$$r(n, \alpha) = \frac{1}{\alpha(1-\alpha)}\,O\!\left(\frac{1}{\rho}\right). \tag{159}$$

We now show that $\alpha$ in the denominator here can be omitted. First, we notice that $r(n, \alpha) = O_n(1)$ as $\alpha \to 0$ and $n$ is fixed: this follows immediately after substitution of the expansion (27) into the l.h.s. of (152). However, we need an estimate which is uniform in $n$. To obtain such an estimate, we use the extensions of our expressions for complex $\alpha$ discussed above. As follows from (152, 27), $r(n, \alpha)$ is an analytic function of $\alpha$ in $D_{\varepsilon_0}(0)$. Thus

$$r(n, \alpha) = \frac{1}{2\pi i}\int_{\partial D_{\varepsilon_0/2}(0)}\frac{r(n, \alpha')}{\alpha' - \alpha}\,d\alpha', \qquad |\alpha| < \varepsilon_0/4. \tag{160}$$

Since by (159), $r(n, \alpha')$ is uniformly bounded on $\partial D_{\varepsilon_0/2}(0)$, it follows that $r(n, \alpha)$ is uniformly bounded by $O(1/\rho)$ for all $\alpha \in D_{\varepsilon_0/4}(0)$, and all $n > s_0^{3/2}/2$. Lemma 2 is proven.

5. Proof of Theorem 1

Integrating the differential identity (152) from $\alpha_0$ (close to zero from above) to any $\alpha_0 < \alpha \le 1 - s_0/(2n)^{2/3}$, we obtain:

$$\ln D_n(\alpha) - \ln D_n(\alpha_0) = n^2\left(\ln\frac{\alpha}{\alpha_0} - 2(\alpha - \alpha_0) + \frac{\alpha^2 - \alpha_0^2}{2}\right) - \frac{1}{8}\ln\frac{1-\alpha^2}{1-\alpha_0^2} + O\!\left(\frac{1}{n(1-\alpha)^{3/2}}\right) \tag{161}$$

for all $n > s_0^{3/2}/2$. Note from (153) that the term $O(1/(n(1-\alpha)^{3/2}))$ does not depend on $\alpha_0$. Substituting for $\ln D_n(\alpha_0)$ the expansion (27) and taking the limit $\alpha_0 \to 0$, we obtain for any $0 < \alpha \le 1 - s_0/(2n)^{2/3}$,

$$\ln D_n(\alpha) = n^2\left(\frac{3}{2} + \ln\alpha - 2\alpha + \frac{\alpha^2}{2}\right) - \frac{1}{12}\ln n - \frac{1}{8}\ln(1-\alpha^2) + \frac{1}{12}\ln 2 + \zeta'(-1) + O\!\left(\frac{1}{n(1-\alpha)^{3/2}}\right) + \delta_n. \tag{162}$$
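The explicit terms in (161) come from integrating the two leading terms of (152) in $\alpha$. As a quick consistency check, the following sketch differentiates the candidate antiderivative numerically and compares it with the integrand. The function names, the step size, and the sample values of $n$ and $\alpha$ are assumptions for illustration.

```python
import math

def integrand(alpha: float, n: float) -> float:
    """The two explicit terms on the right-hand side of (152)."""
    return n**2 * (1.0 - alpha) ** 2 / alpha + alpha / (4.0 * (1.0 - alpha**2))

def antiderivative(alpha: float, n: float) -> float:
    """Antiderivative whose increments give the explicit terms in (161)."""
    return n**2 * (math.log(alpha) - 2.0 * alpha + alpha**2 / 2.0) - math.log(1.0 - alpha**2) / 8.0

def central_difference(alpha: float, n: float, h: float = 1e-5) -> float:
    """Numerical derivative of the antiderivative with respect to alpha."""
    return (antiderivative(alpha + h, n) - antiderivative(alpha - h, n)) / (2.0 * h)

if __name__ == "__main__":
    n, alpha = 3.0, 0.4  # illustrative values (assumption)
    print(central_difference(alpha, n), integrand(alpha, n))
```

The error term integrates in the same way: $\int (1-\alpha)^{-5/2}\,d\alpha/n$ is again of order $1/(n(1-\alpha)^{3/2})$.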

Fig. 6. Contour for the Y-Riemann-Hilbert problem

Fix any $s > s_0$ and, for $n$ sufficiently large, set $\alpha = 1 - s/(2n)^{2/3}$. Now take the limit $n \to \infty$. As $n \to \infty$, the r.h.s. of (162) becomes

$$-\frac{s^3}{12} - \frac{1}{8}\ln s + \frac{1}{24}\ln 2 + \zeta'(-1) + O(s^{-3/2}). \tag{163}$$
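The passage from (162) to (163) can be illustrated numerically. The sketch below evaluates the explicit part of the right-hand side of (162) at $\alpha = 1 - s/(2n)^{2/3}$ for large $n$ and compares it with the explicit part of (163). The constant $\zeta'(-1)$, the $O(\cdot)$ term and $\delta_n$ are deliberately left out on both sides, and the function names and sample values are assumptions.

```python
import math

def rhs_162_explicit(n: float, s: float) -> float:
    """Explicit part of the r.h.s. of (162) at alpha = 1 - s/(2n)^{2/3},
    omitting zeta'(-1), the O(.) term and delta_n."""
    x = s / (2.0 * n) ** (2.0 / 3.0)  # x = 1 - alpha
    # 3/2 + ln(alpha) - 2*alpha + alpha^2/2 equals ln(1 - x) + x + x^2/2 identically;
    # the second form avoids catastrophic cancellation for small x.
    bracket = math.log1p(-x) + x + 0.5 * x * x
    one_minus_alpha_sq = x * (2.0 - x)
    return (n**2 * bracket - math.log(n) / 12.0
            - math.log(one_minus_alpha_sq) / 8.0 + math.log(2.0) / 12.0)

def limit_163_explicit(s: float) -> float:
    """Explicit part of (163), omitting zeta'(-1) and the O(s^{-3/2}) term."""
    return -s**3 / 12.0 - math.log(s) / 8.0 + math.log(2.0) / 24.0

if __name__ == "__main__":
    s = 2.0  # illustrative value (assumption)
    for n in (1e4, 1e6, 1e8):
        print(n, rhs_162_explicit(n, s) - limit_163_explicit(s))
```

The cancellation of the $\ln n$ terms and the appearance of $-s^3/12$ both come from the expansion $\ln(1-x) + x + x^2/2 = -x^3/3 + O(x^4)$ with $x = s/(2n)^{2/3}$.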

On the other hand, as $s$ is any fixed number $s > s_0$, the l.h.s. of (162) converges to $\ln\det(I - K_s)$ by (21).

Acknowledgements. Percy Deift was supported in part by NSF grant # DMS 0500923 and also by a Friends of the Institute Visiting Membership at the Institute for Advanced Study in Princeton, Spring 2006. Alexander Its was supported in part by NSF grant # DMS-0401009. The authors thank V. Tarasov for a useful discussion which took place after one of us (A.I.) gave a talk on our preceding work [17]. In fact, it was during this discussion that the idea to use the Airy-limit of the classical polynomials in order to prove the Tracy-Widom conjecture in the framework of the approach of [17] was born.

6. Appendix

Here we present an alternative derivation of the identity (37). Let

$$\phi(x) = \frac{1}{2}\omega_{n-1}(x), \qquad \psi(x) = \frac{1}{2}\omega_n(x). \tag{164}$$

The determinant (17) is written then as follows:

$$D_n(\alpha) = \det\left(I - \frac{\phi(x)\psi(y) - \phi(y)\psi(x)}{x - y}\,\chi_{(\alpha,\infty)}\right). \tag{165}$$

The operator $K(x, y) = (\phi(x)\psi(y) - \phi(y)\psi(x))/(x - y)$ is of integrable type, and hence (see, e.g., [23,6,9]) $D_n(\alpha)$ is related to the following Riemann-Hilbert problem for a $2 \times 2$ matrix-valued function $Y(z)$ (Fig. 6):

(a) $Y(z)$ is analytic for $z \in \mathbb{C}\setminus[\alpha, \infty)$.

(b) Let $x \in (\alpha, \infty)$. $Y(z)$ has $L^2$ boundary values $Y_+(x)$ as $z$ approaches $x$ from above, and $Y_-(x)$, from below. They are related by the jump condition

$$Y_+(x) = Y_-(x)v_Y(x), \qquad v_Y(x) = \begin{pmatrix} 1 + 2\pi i\phi(x)\psi(x) & -2\pi i\psi(x)^2\\ 2\pi i\phi(x)^2 & 1 - 2\pi i\phi(x)\psi(x)\end{pmatrix}, \qquad x \in (\alpha, \infty). \tag{166}$$

(c) $Y(z)$ has the following asymptotic behavior at infinity:

$$Y(z) = I + O\!\left(\frac{1}{z}\right), \qquad \text{as } z \to \infty. \tag{167}$$


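As in [5,24], it is possible to reduce the RHP for $Y$ to an equivalent RHP with an “elementary”, in fact constant, jump matrix (see (174) below). Note first that for any functions $\tilde\psi(x)$, $\tilde\phi(x)$ such that $\psi(x)\tilde\phi(x) - \phi(x)\tilde\psi(x) = 1$, we have

$$v_Y(x) = A(x)\begin{pmatrix} 1 & -2\pi i\\ 0 & 1\end{pmatrix}A^{-1}(x), \qquad A(x) = \begin{pmatrix}\psi(x) & \tilde\psi(x)\\ \phi(x) & \tilde\phi(x)\end{pmatrix}. \tag{168}$$

Note that the condition on $\tilde\psi(x)$, $\tilde\phi(x)$ is equivalent to the following one: $\det A(x) = 1$. Let

$$\Psi(z) = \begin{pmatrix}\psi(z) & e^{2nz}\displaystyle\int_0^\infty \frac{\psi(\xi)e^{-2n\xi}}{\xi - z}\,d\xi\\[2mm] \phi(z) & e^{2nz}\displaystyle\int_0^\infty \frac{\phi(\xi)e^{-2n\xi}}{\xi - z}\,d\xi\end{pmatrix}. \tag{169}$$

The function $\Psi(z)$ is analytic in $\mathbb{C}\setminus\mathbb{R}_+$. Using the orthogonality property of the polynomials $p_n(x)$, $p_{n-1}(x)$ with respect to the weight $e^{-4nx}$, we see that $\Psi(z)$ solves the following RHP on $\mathbb{R}_+$:

(a) $\Psi(z)$ is analytic for $z \in \mathbb{C}\setminus[0, \infty)$.

(b) For $x \in (0, \infty)$ the $L^2$ boundary values $\Psi_+(x)$ and $\Psi_-(x)$ are related by the jump condition

$$\Psi_+(x) = \Psi_-(x)\begin{pmatrix} 1 & 2\pi i\\ 0 & 1\end{pmatrix}, \qquad x \in (0, \infty). \tag{170}$$

(c) $\Psi(z)$ has the following asymptotic behavior as $z \to \infty$:

$$\Psi(z) = \left(I + O\!\left(\frac{1}{z}\right)\right)\left(\frac{n}{2}\,e^{-2nz}z^{n}\right)^{\sigma_3}. \tag{171}$$

By standard arguments, see [7], $\det\Psi(z) = 1$. Hence, we see that for $x > 0$, we can take in (168)

$$A(x) = \Psi_+(x). \tag{172}$$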
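The factorization (168) of the jump matrix (166) is a purely algebraic identity and can be checked numerically. In the sketch below the sample values of $\phi$, $\psi$, $\tilde\psi$ are arbitrary, $\tilde\phi$ is chosen so that $\det A = 1$, and the helper names are assumptions made for the example.

```python
import math

def v_y(phi: complex, psi: complex):
    """Jump matrix v_Y from (166)."""
    c = 2j * math.pi
    return [[1 + c * phi * psi, -c * psi**2],
            [c * phi**2, 1 - c * phi * psi]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def factorised(phi: complex, psi: complex, psi_t: complex):
    """A * [[1, -2 pi i], [0, 1]] * A^{-1} as in (168), with phi_t fixed by det A = 1."""
    phi_t = (1 + phi * psi_t) / psi          # enforces psi*phi_t - phi*psi_t = 1
    a = [[psi, psi_t], [phi, phi_t]]
    a_inv = [[phi_t, -psi_t], [-phi, psi]]   # inverse of a unimodular 2x2 matrix
    jump = [[1, -2j * math.pi], [0, 1]]
    return matmul(matmul(a, jump), a_inv)

def max_entry_difference(m1, m2) -> float:
    return max(abs(m1[i][j] - m2[i][j]) for i in range(2) for j in range(2))

if __name__ == "__main__":
    phi, psi, psi_t = 0.3, -1.2, 0.7  # arbitrary sample values (assumption)
    print(max_entry_difference(v_y(phi, psi), factorised(phi, psi, psi_t)))
```

The result does not depend on the choice of $\tilde\psi$, which is what allows taking $A(x) = \Psi_+(x)$ in (172).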

The decomposition (168) suggests the following transformation of the Riemann-Hilbert problem. Let

$$X(z) = Y(z)\Psi(z). \tag{173}$$

It is easy to verify that $X(z)$ satisfies the following problem:

(a) $X(z)$ is analytic for $z \in \mathbb{C}\setminus[0, \alpha]$.

(b) For $x \in (0, \alpha)$ the $L^2$ boundary values $X_+(x)$ and $X_-(x)$ are related by the jump condition

$$X_+(x) = X_-(x)\begin{pmatrix} 1 & 2\pi i\\ 0 & 1\end{pmatrix}, \qquad x \in (0, \alpha). \tag{174}$$

(c) $X(z)$ has the following asymptotic behavior as $z \to \infty$:

$$X(z) = \left(I + O\!\left(\frac{1}{z}\right)\right)\left(\frac{n}{2}\,e^{-2nz}z^{n}\right)^{\sigma_3}. \tag{175}$$


Thus $X(z)$ satisfies the same RHP as $\Psi(z)$, but now on the interval $(0, \alpha)$. The transformation

$$V(z) = \left(\sqrt{2\pi i}\,\frac{n}{2}\right)^{-\sigma_3} X(z)\,e^{2nz\sigma_3}(2\pi i)^{\sigma_3/2} \tag{176}$$

converts the RHP to the RHP for $V(z)$ of Sect. 3.

We now turn to the derivation of the identity for $D_n(\alpha)$. Write the determinant (165) in the form $D_n(\alpha) = \det(I - K)$, where $K$ is an integral operator acting on functions $f(x)$ from $L^2(\alpha, \infty)$ as follows:

$$(Kf)(x) = \int_\alpha^\infty K(x, y)f(y)\,dy, \qquad K(x, y) = \frac{\phi(x)\psi(y) - \phi(y)\psi(x)}{x - y}.$$

The logarithmic derivative of $D_n(\alpha)$ w.r.t. $\alpha$ has the form

$$\frac{d}{d\alpha}\ln D_n(\alpha) = -\mathrm{tr}\left((I - K)^{-1}\frac{dK}{d\alpha}\right) = ((I - K)^{-1}K)(\alpha, \alpha) = ((I - K)^{-1}(K - I + I))(\alpha, \alpha) = R(\alpha, \alpha), \tag{177}$$

where $R(x, y)$ is the kernel of the operator $(I - K)^{-1} - I$. As noted above, the kernel $K(x, y)$ has the structure of an “integrable” kernel. A consequence of this fact is the identity

$$R(x, y) = \frac{-F_1(x)F_2(y) + F_2(x)F_1(y)}{x - y}, \tag{178}$$

where the $F_j(z)$ are expressed in terms of the solution of the Riemann-Hilbert problem for $Y(z)$ as follows:

$$F_j(z) = Y_{+,j1}\psi + Y_{+,j2}\phi, \qquad j = 1, 2. \tag{179}$$

Comparing this with the definition (173) of $X(z)$ we see that

$$F_j(z) = X_{j1}(z), \qquad j = 1, 2. \tag{180}$$

Substituting then $R(\alpha, \alpha) = \lim_{x\to\alpha} R(x, \alpha)$ into (177), we obtain:

$$\frac{d}{d\alpha}\ln D_n(\alpha) = X_{11}(\alpha)X_{21}'(\alpha) - X_{11}'(\alpha)X_{21}(\alpha), \tag{181}$$

which expresses the logarithmic derivative of $D_n(\alpha)$ in terms of the solution of the Riemann-Hilbert problem for $X(z)$. Now the function $X(z)$ is related to $V(z)$ by the expression (176). In particular,

$$X_{11}(z) = \frac{n}{2}\,e^{-2nz}V_{11}(z), \qquad X_{21}(z) = \frac{1}{\pi i n}\,e^{-2nz}V_{21}(z).$$

Calculating the derivatives of these quantities at $z = \alpha$ and substituting into (181), we finally obtain (37).


References

1. Abramowitz, M., Stegun, I.: Handbook of Mathematical Functions. NY: Dover, 1965
2. Baik, J., Buckingham, R., DiFranco, J.: Asymptotics of Tracy-Widom distributions and the total integral of a Painlevé II function. http://arXiv.org/list/math/0704.3636, 2007
3. Baik, J., Deift, P., Johansson, K.: On the distribution of the length of the longest increasing subsequence of random permutations. J. Amer. Math. Soc. 12(4), 1119–1178 (1999)
4. Beals, R., Deift, P., Tomei, C.: Direct and inverse scattering on the line. Mathematical Surveys and Monographs, 28. Providence, RI: Amer. Math. Soc., 1988
5. Borodin, A., Deift, P.: Fredholm determinants, Jimbo-Miwa-Ueno τ-functions, and representation theory. Comm. Pure Appl. Math. 55(9), 1160–1230 (2002)
6. Deift, P.: Integrable operators. Amer. Math. Soc. Transl. (2) 189, 69–84 (1999)
7. Deift, P.: Orthogonal polynomials and random matrices: a Riemann-Hilbert approach. Courant Lecture Notes in Math, 1998
8. Deift, P., Gioev, D.: Universality at the edge of the spectrum for unitary, orthogonal and symplectic ensembles of random matrices. Comm. Pure Appl. Math. 60(6), 867–910 (2007)
9. Deift, P., Its, A., Zhou, X.: A Riemann-Hilbert approach to asymptotic problems arising in the theory of random matrix models, and also in the theory of integrable statistical mechanics. Ann. Math. 146, 149–235 (1997)
10. Deift, P., Zhou, X.: A steepest descent method for oscillatory Riemann-Hilbert problem. Ann. Math. 137, 295–368 (1993)
11. Deift, P., Zhou, X.: Asymptotics for the Painlevé II equation. Comm. Pure Appl. Math. 48, 277–337 (1995)
12. Deift, P., Zhou, X.: A priori L^p estimates for solutions of Riemann-Hilbert problems. Int. Math. Res. Notes 40, 2121–2154 (2002)
13. Deift, P., Zhou, X.: Long-time asymptotics for solutions of the NLS equation with initial data in a weighted Sobolev space. Comm. Pure Appl. Math. 56, 1029–1077 (2003)
14. Deift, P., Venakides, S., Zhou, X.: New results in small dispersion KdV by an extension of the steepest descent method for Riemann-Hilbert problems. Int. Math. Res. Not. 1997, 286–299 (1997)
15. Deift, P., Kriecherbauer, T., McLaughlin, K.T.-R., Venakides, S., Zhou, X.: Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Comm. Pure Appl. Math. 52, 1335–1425 (1999)
16. Deift, P., Kriecherbauer, T., McLaughlin, K.T.-R., Venakides, S., Zhou, X.: Strong asymptotics for orthogonal polynomials with respect to exponential weights. Commun. Pure Appl. Math. 52, 1491–1552 (1999)
17. Deift, P., Its, A., Krasovsky, I., Zhou, X.: The Widom-Dyson constant for the gap probability in random matrix theory. J. Comput. Appl. Math. 202, 26–47 (2007)
18. Dyson, F.: Fredholm determinants and inverse scattering problems. Commun. Math. Phys. 47, 171–183 (1976)
19. Ehrhardt, T.: Dyson’s constant in the asymptotics of the Fredholm determinant of the sine kernel. Commun. Math. Phys. 262, 317–341 (2006)
20. Fokas, A.S., Its, A.R., Kitaev, A.V.: The isomonodromy approach to matrix models in 2D quantum gravity. Commun. Math. Phys. 147, 395–430 (1992)
21. Forrester, P.J.: The spectrum edge of random matrix ensembles. Nucl. Phys. B 402, 709–728 (1993)
22. Hastings, S.P., McLeod, J.B.: A boundary value problem associated with the second Painlevé transcendent and the Korteweg de Vries equation. Arch. Rat. Mech. Anal. 73, 31–51 (1980)
23. Its, A.R., Izergin, A.G., Korepin, V.E., Slavnov, N.A.: Differential equations for quantum correlation functions. Int. J. Mod. Phys. B4, 1003–1037 (1990)
24. Its, A.: A Riemann-Hilbert Approach to the Distribution Functions of Random Matrix Theory. Lecture notes, MSRI 1999, Kent 2000 (unpublished)
25. Johansson, K.: Random permutations and the discrete Bessel kernel. In: Random matrix models and their applications, Math. Sci. Res. Inst. Publ., 40, Cambridge: Cambridge Univ. Press, 2001, pp. 259–269
26. Krasovsky, I.V.: Gap probability in the spectrum of random matrices and asymptotics of polynomials orthogonal on an arc of the unit circle. Int. Math. Res. Not. 2004, 1249–1272 (2004)
27. Kuijlaars, A.B.J., McLaughlin, K.T.-R., Van Assche, W., Vanlessen, M.: The Riemann-Hilbert approach to strong asymptotics for orthogonal polynomials on [−1, 1]. Adv. Math. 188, 337–398 (2004)
28. Litvinchuk, G.S., Spitkovskii, I.M.: Factorization of measurable matrix functions. Basel-Boston: Birkhäuser, 1987
29. Mehta, M.L.: Random matrices. San Diego: Academic, 1990
30. Szegő, G.: Orthogonal polynomials. AMS Colloquium Publ. 23. New York: Amer. Math. Soc., 1959
31. Tracy, C., Widom, H.: Level-spacing distributions and the Airy kernel. Commun. Math. Phys. 159, 151–174 (1994)
32. Vanlessen, M.: Strong asymptotics of Laguerre-type orthogonal polynomials and applications in random matrix theory. Constr. Approx. 25, 125–175 (2007)
33. Widom, H.: The strong Szegő limit theorem for circular arcs. Indiana Univ. Math. J. 21, 277–283 (1971)
34. Zhou, X.: The Riemann-Hilbert problem and inverse scattering. SIAM J. Math. Anal. 20(4), 966–986 (1989)

Communicated by M. Aizenman

Commun. Math. Phys. 278, 679–713 (2008) Digital Object Identifier (DOI) 10.1007/s00220-008-0412-x


Injectivity Radius of Lorentzian Manifolds

Bing-Long Chen1, Philippe G. LeFloch2

1 Department of Mathematics, Sun Yat-Sen University, 510275 Guang-Zhou, People’s Republic of China. E-mail: [email protected]
2 Laboratoire Jacques-Louis Lions & Centre National de la Recherche Scientifique, Université de Paris VI, 4 Place Jussieu, 75252 Paris, France. E-mail: [email protected]

Received: 8 January 2007 / Accepted: 23 July 2007
Published online: 26 January 2008 – © Springer-Verlag 2008

Abstract: Motivated by the application to general relativity we study the geometry and regularity of Lorentzian manifolds under natural curvature and volume bounds, and we establish several injectivity radius estimates at a point or on the past null cone of a point. Our estimates are entirely local and geometric, and are formulated via a reference Riemannian metric that we canonically associate with a given observer $(p, T)$, where $p$ is a point of the manifold and $T$ is a future-oriented time-like unit vector prescribed at $p$ only. The proofs are based on a generalization of arguments from Riemannian geometry. We first establish estimates on the reference Riemannian metric, and then express them in terms of the Lorentzian metric. In the context of general relativity, our estimate on the injectivity radius of an observer should be useful to investigate the regularity of spacetimes satisfying the Einstein field equations.

1. Introduction

Aims of this paper. The regularity and compactness of Riemannian manifolds under a priori bounds on geometric quantities such as curvature, volume, or diameter represent a central theme in Riemannian geometry. In particular, the derivation of lower bounds on the injectivity radius of a Riemannian manifold, and the construction of local coordinate charts in which the metric has optimal regularity are now well-understood. Moreover, Cheeger-Gromov’s theory provides geometric conditions for the strong compactness of sequences of manifolds and has become a central tool in Riemannian geometry. See, for instance, [1,4,5,7,8,15,20,21]. Our objective in this paper is to present an extension of these classical techniques and results to Lorentzian manifolds. Recall that a Lorentzian metric is not positive definite, but has signature $(-, +, \ldots, +)$. Motivated by recent work by Anderson [2] and Klainerman and Rodnianski [18], we derive here several injectivity radius estimates for Lorentzian manifolds satisfying certain curvature and volume bounds.
That is, we provide lower bounds on the size of the (geodesic) ball around one point within which the exponential map is a global diffeomorphism and, therefore, we obtain a sharp control of the manifold geometry. Our proofs rely on arguments that are known to be flexible and efficient in Riemannian geometry, and are here extended to the Lorentzian setting: we analyze the properties of Jacobi fields and rely on volume comparison and homotopy arguments. In our presentation (see for instance our main result stated in Theorem 1.1 at the end of this introduction) we emphasize the importance of having assumptions and estimates that are stated locally and geometrically, and avoid direct use of coordinates. When necessary, coordinates can be constructed a posteriori, once uniform bounds on the injectivity radius have been established.

Our motivation comes from general relativity, where one of the most challenging problems is the formation and the structure of singularities in solutions to the Einstein field equations. Relating curvature and volume bounds to the regularity of the manifold, as we do in this paper, is necessary before tackling an investigation of the geometric properties of singular spacetimes satisfying Einstein equations. (See, for instance, [2,3] for some background on this subject.)

Two preliminary observations should be made. First, since the Lorentzian norm of a non-zero tensor may vanish it is clear that only limited information would be gained from an assumption on the Lorentzian norm of the curvature tensor. This justifies that we endow the Lorentzian manifold with a “reference” Riemannian metric (denoted below by $g_T$); this metric is defined at a point $p$ once we prescribe a future-oriented time-like unit vector $T$ in the tangent space at $p$. We refer to the pair $(p, T)$ as an observer located at the point $p$. This reference vector is necessary in order to define appropriate notions of conjugate and injectivity radii. (See Sect. 2, below, for details.)
Secondly, we rely here on the elementary but essential observation that, in the flat Riemannian and Lorentzian spaces, geodesics (are straight lines and therefore) coincide. More generally, under our assumptions, we will see that geodesics associated with the given Lorentzian metric are comparable to geodesics associated with the reference Riemannian metric. On the other hand, it must be emphasized that the curvature bound assumed on the Lorentzian metric implies, in general, no information on the curvature of the reference metric. In fact, as we show below, one of the main technical difficulties is to construct a sufficiently regular foliation of the manifold by spacelike hypersurfaces.

Earlier work. Let us briefly review some classical results from Riemannian geometry. Let (M, g) be a differentiable n-manifold (possibly with boundary) endowed with a Riemannian metric g. (Throughout the present paper, the manifolds and metrics under consideration are always assumed to be smooth.) Denote by $B(p, r)$ the corresponding geodesic ball centered at $p \in M$ and with radius $r > 0$. Suppose that at some point $p \in M$ the unit ball $B(p, 1)$ is compactly contained in M and that the Riemann curvature bound and the lower volume bound

$$\|\mathrm{Rm}_g\|_{L^\infty(B(p,1))} \le K, \qquad \mathrm{Vol}_g(B(p,1)) \ge v_0 \tag{1.1}$$

hold for some constants $K, v_0 > 0$. (We use the standard notation $L^m$, $1 \le m \le \infty$, for the spaces of Lebesgue measurable functions.) Then, according to Cheeger, Gromov, and Taylor [9] the injectivity radius $\mathrm{Inj}_g(M, p)$ at the point p is bounded below by a positive constant $i_1 = i_1(K, v_0, n)$,

$$\mathrm{Inj}_g(M, p) \ge i_1. \tag{1.2}$$

Injectivity Radius of Lorentzian Manifolds

It should be noticed that this is a local statement; for earlier (global) results on the injectivity radius see [5,10,14]. Moreover, Jost and Karcher in [15] rely on the regularity theory for elliptic operators and establish the existence of coordinates in which the metric has optimal regularity and which are defined in a ball with radius $i_2 = i_2(K, v_0, n)$. Precisely, given $\varepsilon > 0$ and $0 < \gamma < 1$ there exist a positive constant $C(\varepsilon, \gamma)$ (depending also upon $K, v_0, n$) and a system of harmonic coordinates defined in the geodesic ball $B(p, i_2)$ in which the metric g is close to the Euclidean metric $g_E$ and has optimal regularity, in the following sense:

$$e^{-\varepsilon} g_E \le g \le e^{\varepsilon} g_E, \qquad r^{1+\gamma}\, \|\partial g\|_{C^\gamma(B(p,r))} \le C(\varepsilon, \gamma), \qquad r \in (0, i_2]. \tag{1.3}$$

Here, $C^0$ and $C^\gamma = C^{0,\gamma}$ are the spaces of continuous and Hölder continuous functions, respectively. Harmonic coordinates are optimal [11] in the sense that if the metric is of class $C^{k,\gamma}$ in certain coordinates then it has at least the same regularity in harmonic coordinates. The above results were later generalized by Anderson [1] and Petersen [21] who replaced the $L^\infty$ curvature bound by an $L^m$ curvature bound with $m > n/2$. For instance, one can take m = 2 in dimension n = 3 which is the situation met in the application to general relativity (since time-slices of Lorentzian 4-manifolds are Riemannian 3-manifolds).

It is only more recently that the same questions were tackled for Lorentzian (n + 1)-manifolds (M, g). Anderson [2,3] studied the long-time evolution of solutions to the Einstein field equations and formulated several conjectures. In particular, assuming the Riemann curvature bound in some domain Ω,

$$\|\mathrm{Rm}_g\|_{L^\infty(\Omega)} \le K, \tag{1.4}$$

and other regularity conditions, he investigated the existence of coordinates that are harmonic in each spacelike slice of a time foliation of M. Anderson's pioneering work motivated us throughout the present paper. On the other hand, for applications to general relativity and nonlinear wave equations using harmonic analysis tools, Klainerman and Rodnianski [18] considered asymptotically flat spacetimes endowed with a time foliation and satisfying the $L^2$ curvature bound

$$\|\mathrm{Rm}_g\|_{L^2(\Sigma)} \le K \tag{1.5}$$

for every spacelike hypersurface Σ. They established an injectivity radius estimate for past null cones, by relying on their earlier work [16,17] on the conjugate radius of null cones in terms of the Bel-Robinson energy and energy flux, and by deriving in [18] a new estimate on the null cut locus of such manifolds. We refer to these papers for further details and references on the Einstein equations.

Outline of this paper. We establish here four estimates on the radius of injectivity of Lorentzian manifolds, which hold either in a neighborhood of a point or in the past null cone of a point. Our assumptions are formulated within a geodesic ball (or within a null cone) and possibly apply to a large ball with arbitrary size, as long as our curvature and volume assumptions hold. All assumptions and statements are local and geometric.

An outline of the paper follows. In Sect. 2, we begin with basic material from Lorentzian geometry and we introduce the notions of reference metric and injectivity radius associated with Lorentzian manifolds. In Sect. 3, we state our first estimate (Theorem 3.1 below) for a class of manifolds that have bounded curvature and admit a time foliation by slices with bounded extrinsic curvature. In Sect. 4, we provide a proof of this first estimate and we introduce a technique that will be used (in variants) throughout this paper; we combine two main ingredients: sharp estimates for Jacobi fields along geodesics, and a homotopy argument based on contracting a possible loop to two linear segments. In Sect. 5, our second main result (Theorem 5.1) shows, under the same assumptions, the existence of convex functions (distance functions) and convex neighborhoods; this result leads us to a lower bound for the convexity radius. In Sect. 6, our third estimate (Theorem 6.1) covers the generalization to null cones, and we show that weaker assumptions are sufficient to control the geometry of null cones. This result is directly relevant for the application to general relativity. Next, in Sect. 7, we establish our principal and fourth result (stated in Theorem 1.1 below) which provides an injectivity radius bound under the mild assumption that the exponential map exp p is defined in some ball and the curvature Rm is bounded. Most importantly, this is a general result that does not require a time foliation of the manifold but solely a single reference (future-oriented time-like unit) vector T at the base point p. This is very natural in the context of general relativity and ( p, T ) can be interpreted as an observer at the point p. For the convenience of the reader we state here our main result and refer to Sect. 7 for further details. 
Given such an observer $(p, T)$, we consider the geodesic ball $B_T(0, r) \subset T_pM$ with radius r, determined by the reference Riemannian inner product at p, and we can also define the geodesic ball $B_T(p, r) := \exp_p(B_T(0, r))$. In turn, the radius of injectivity $\mathrm{Inj}_g(M, p, T)$ is defined as the largest radius r such that the exponential map is a diffeomorphism from $B_T(0, r)$ onto $B_T(p, r)$. Let us then consider an arbitrary geodesic $\gamma = \gamma(s)$ initiating at p and let us g-parallel transport the vector T along this geodesic, defining therefore a vector field $T_\gamma$ along this geodesic, only. At every point of γ we introduce a reference inner product $g_{T_\gamma}$ and compute the curvature norm $|\mathrm{Rm}_g|_{T_\gamma}$. This construction allows us to express our curvature assumption below.

Theorem 1.1. (Injectivity radius of Lorentzian manifolds). Let M be a time-orientable, Lorentzian, differentiable (n + 1)-manifold. Consider an observer $(p, T)$ consisting of a point $p \in M$ and a reference (future-oriented time-like unit) vector $T \in T_pM$. Assume that the exponential map $\exp_p$ is defined in a ball $B_T(0, r) \subset T_pM$ and the Riemann curvature satisfies

$$\sup_\gamma |\mathrm{Rm}_g|_{T_\gamma} \le \frac{1}{r^2}, \tag{1.6}$$

where the supremum is over the domain of definition of γ and over every g-geodesic γ initiating from a vector in the Riemannian ball $B_T(0, r) \subset T_pM$. Then, there exists a constant $c(n)$ depending only on the dimension of the manifold such that

$$\frac{\mathrm{Inj}_g(M, p, T)}{r} \ge c(n)\, \frac{\mathrm{Vol}_g(B_T(p, c(n)\, r))}{r^{n+1}}. \tag{1.7}$$

Observe that the curvature assumption (1.6) is not a genuine restriction since it can always be satisfied by suitably rescaling the metric. This result should be compared

with the injectivity radius estimate established by Cheeger, Gromov, and Taylor [9] in Riemannian geometry. We also point out that the importance of analyzing the geometry of radial geodesics, as we will do in the proof of Theorem 1.1, was emphasized by Anderson [3] in his pioneering work on the optimal regularity of Lorentzian metrics. It would be interesting to refine our arguments and replace the volume term in the right-hand side of (1.7) by $\mathrm{Vol}_g(B(p, r))$. In a related direction, in Sect. 8, we establish a volume comparison theorem for future cones which allows us to generalize our main theorem and use the volume of a future cone in the right-hand side of (1.7). Finally, in Sect. 9, we briefly discuss the regularity of Lorentzian metrics in harmonic-like coordinates, and we provide a direct generalization to pseudo-Riemannian manifolds.

2. Preliminaries on Lorentzian Geometry

Basic definitions. It is useful to discuss first some basic definitions from Lorentzian geometry, for which we can refer to the textbook by Penrose [19]. Throughout this paper, (M, g) is a connected and differentiable (n + 1)-manifold, endowed with a Lorentzian metric g with signature (−, +, . . . , +). To emphasize the role of the metric g or the point p we use any of the following notations:

$$g_p(X, Y) = \langle X, Y \rangle_{g_p} = \langle X, Y \rangle_g = \langle X, Y \rangle_p$$

for the inner product of two vectors X, Y at a point $p \in M$; we sometimes also write $|X|^2_{g_p}$ instead of $g_p(X, X)$. Recall that the tangent vectors $X \in T_pM$ are called time-like, null, or spacelike depending whether the norm $g_p(X, X)$ is negative, zero, or positive, respectively. Vectors that are time-like or null are called causal. Time-like vectors form a cone with two connected components. The manifold (M, g) is said to be time-orientable if we can select in a continuous way a half-cone of time-like vectors at every point p.
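The classification of tangent vectors recalled above can be illustrated by a minimal sketch in flat Minkowski space with signature (−, +, . . . , +). The function names `minkowski_norm`, `causal_type` and `is_causal`, as well as the numerical tolerance `tol`, are assumptions introduced for this illustration only; vectors are assumed to be non-zero.

```python
def minkowski_norm(X):
    """Lorentzian 'norm' g(X, X) in flat space, signature (-, +, ..., +)."""
    return -X[0] ** 2 + sum(x * x for x in X[1:])


def causal_type(X, tol=1e-12):
    """Classify a non-zero vector by the sign of g(X, X).

    tol is a hypothetical floating-point tolerance, not part of the definition.
    """
    q = minkowski_norm(X)
    if q < -tol:
        return "time-like"
    if q > tol:
        return "spacelike"
    return "null"


def is_causal(X, tol=1e-12):
    """Causal vectors are those that are time-like or null."""
    return causal_type(X, tol) in ("time-like", "null")
```

For instance, `causal_type([1.0, 1.0, 0.0, 0.0])` classifies a non-zero vector whose Lorentzian norm vanishes, which is the phenomenon that motivates the reference Riemannian metric.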
The choice of a specific orientation allows us to decompose the cone of time-like vectors into future-oriented and past-oriented ones. The set of all future-oriented, time-like vectors at p and the corresponding bundle on M are denoted by $T_p^+M$ and $T^+M$, respectively. We also introduce the bundle $T_1^+M$ consisting of elements of $T^+M$ with unit length. By definition, a trip is a continuous time-like curve $\gamma : (a, b) \to M$. We write $p \ll q$ if there exists a trip from p to q. A causal trip is defined similarly except that the curve may be causal instead of time-like, and we write $p < q$ if there exists a causal trip from p to q. The set $I^+(p) := \{q \in M \,/\, p \ll q\}$ is called the chronological future of the point p, and $I^-(p) := \{q \in M \,/\, q \ll p\}$ is called the chronological past. The causal future and past $J^\pm(p)$ are defined similarly by replacing $\ll$ by $<$. The future or past sets of a set $S \subset M$ are defined by

$$I^\pm(S) := \bigcup_{p \in S} I^\pm(p), \qquad J^\pm(S) := \bigcup_{p \in S} J^\pm(p),$$

and one easily checks that $I^\pm(S)$ are open, but that $J^\pm(S)$ need not be closed in general. A future set $F \subset M$ by definition has the form $F = I^+(S)$ for some set $S \subset M$. Similarly, a past set satisfies $F = I^-(S)$ for some S. A set is called achronal if no two points are connected by a time-like trip. Observe that a set can be spacelike at every point without being achronal and that an achronal set can be null at some (or even at

every) point. A set $B \subset M$ is called an achronal boundary if it is the boundary of a future set, that is, $B = \partial I^+(S) = \overline{I^+(S)} \setminus I^+(S)$ for some $S \subset M$. One can check that given a non-empty achronal boundary the manifold can be partitioned as $M = P \cup B \cup F$, where B is the boundary of both F and P and, moreover, any trip from $p \in P$ to $q \in F$ meets B at a unique point. Observe also that any achronal boundary is a Lipschitz continuous n-manifold. For instance, in Sect. 6 below, we will be interested in the geometry of past null cones, that is, the sets $\partial J^-(p)$ for $p \in M$. Given an arbitrary achronal and closed set $S \subset M$, we define the (future or past) domains of dependence of S in M by

$$D^\pm(S) := \{ p \in M \,/\, \text{every future (resp. past) endless trip containing } p \text{ meets } S \}, \qquad D(S) := D^-(S) \cup D^+(S).$$

Observe that domains of dependence are closed sets. Next, define the (future or past) Cauchy horizons

$$H^\pm(S) := \{ p \in D^\pm(S) \,/\, I^\pm(p) \cap D^\pm(S) = \emptyset \} = D^\pm(S) \setminus I^\mp(D^\pm(S)), \qquad H(S) := H^-(S) \cup H^+(S).$$

For instance, the future Cauchy horizon is the future boundary of the future domain of dependence of S. One can check that the Cauchy horizons are closed and achronal sets, with $\partial D^+(S) = H^+(S) \cup S$ and $\partial D(S) = H(S)$. Finally, a (future) Cauchy hypersurface for M is defined as an achronal (but not necessarily closed) set S satisfying $D^+(S) = M$. For instance, it is sufficient for S to be smooth, achronal, spacelike and such that every endless null geodesic meets S.

Reference metric. As explained in the introduction one should not use the Lorentzian metric to compute the norm of a tensor since its Lorentzian norm may vanish even when the tensor does not. This motivates the introduction of a "reference" Riemannian metric associated with a time-like vector field, as follows. Let T be a future-oriented, time-like, unit vector field, satisfying therefore $g_p(T, T) = -1$ at every point p (in an open subset of M, at least). We refer to T as the reference vector field prescribed on M (or on an open subset). Introduce a moving frame $E_\alpha$ ($\alpha = 0, 1, \ldots, n$) defined in M, that is, $E_\alpha$ is an orthonormal basis of the tangent space at every point and consists of the vector $E_0 = T$ supplemented with n spacelike unit vectors $E_j$ ($j = 1, \ldots, n$). Denoting by $E^\alpha$ the corresponding dual frame, the Lorentzian metric takes the form $g = \eta_{\alpha\beta}\, E^\alpha \otimes E^\beta$, where $\eta_{\alpha\beta}$ is the Minkowski "metric". This decomposition suggests considering the Riemannian version obtained by switching the minus sign in $\eta_{00} = -1$ into a plus sign, that is, $g_T := \delta_{\alpha\beta}\, E^\alpha \otimes E^\beta$, where $\delta_{\alpha\beta}$ is the Euclidean "metric". Clearly, $g_T$ is a positive definite metric; it is referred to as the reference Riemannian metric associated with the frame $E_\alpha$.
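As an illustration of this construction, the following sketch works in flat Minkowski space and uses the frame-free identity $g_T(X, Y) = g(X, Y) + 2\, g(T, X)\, g(T, Y)$, which follows from the definition above since $E^0 = -g(T, \cdot)$ for a unit time-like vector T. The names `lorentz_inner`, `reference_inner` and `boosted_observer`, and the choice of a boost along the first spatial axis, are assumptions made for this example only.

```python
import math


def lorentz_inner(X, Y):
    """Minkowski inner product g(X, Y), signature (-, +, ..., +)."""
    return -X[0] * Y[0] + sum(x * y for x, y in zip(X[1:], Y[1:]))


def boosted_observer(v):
    """Future-oriented unit time-like vector in R^{1+3} with speed |v| < 1 along x^1."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [gamma, gamma * v, 0.0, 0.0]


def reference_inner(T, X, Y):
    """Reference Riemannian inner product g_T(X, Y) = g(X, Y) + 2 g(T, X) g(T, Y).

    T is assumed to satisfy g(T, T) = -1.
    """
    return lorentz_inner(X, Y) + 2.0 * lorentz_inner(T, X) * lorentz_inner(T, Y)


# A null vector has vanishing Lorentzian norm but positive reference norm.
T = boosted_observer(0.6)
X_null = [1.0, 1.0, 0.0, 0.0]
lorentz_value = lorentz_inner(X_null, X_null)
reference_value = reference_inner(T, X_null, X_null)
```

With the rest observer $T = (1, 0, 0, 0)$ the reference inner product reduces to the Euclidean one, and a different observer yields a different, but still positive definite, inner product.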

For every $p \in M$, since $T_p$ is time-like, the restriction of the metric $g_p$ to the orthogonal complement $\{T_p\}^\perp \subset T_pM$ is positive definite, and the reference metric can be computed as follows: if $V = a\, T_p + V'$ and $W = b\, T_p + W'$ with $V', W' \in \{T_p\}^\perp$, then

$$g_{T,p}(V, W) = a\, b + g_p(V', W').$$

In the following, we use the notation

$$\langle V, W \rangle_{T,p} := g_{T,p}(V, W), \qquad |V|^2_{T,p} := g_{T,p}(V, V)$$

for vectors; the norm of tensors is defined and denoted similarly. In contrast with the Lorentzian norm, the Riemannian norm $|A|_{T,p}$ of a tensor A at a point $p \in M$ vanishes if and only if the tensor vanishes at p. Moreover, as long as T remains in a compact subset of the bundle of half-cones $T^+M$, the norms associated with different reference vectors are equivalent. The reference Riemannian metric also allows one to define the functional norms associated with Lebesgue or Sobolev spaces of tensors defined on M (as well as on submanifolds of M). For instance, $L^2(M, g_T)$ can now be viewed as a Banach space. In particular, we will use later the $L^2$ norm of a tensor field A defined on M and restricted to a hypersurface Σ:

$$\|A\|^2_{L^2(\Sigma, g_T)} := \int_\Sigma |A|^2_T \, dV_{\Sigma, g_T},$$

where $dV_{\Sigma, g_T}$ is the volume form induced on Σ by the reference Riemannian metric. The functional norm above depends upon the choice of the vector field T, but another choice of T would give rise to an equivalent norm (provided T remains in a fixed compact subset). Observe in passing that the volume forms associated with the metrics g and $g_T$ coincide, so that the spacetime integrals of functions in (M, g) or (M, $g_T$) coincide; for instance, the volumes $\mathrm{Vol}_g(A)$ and $\mathrm{Vol}_{g_T}(A)$ of a set $A \subset M$ coincide. Finally, we observe that in order to define the reference inner product $g_T$ at a single point p, it suffices to prescribe a future-oriented time-like unit vector T at that point p only; it is not necessary to prescribe a vector field. In the situation where the reference metric need only be defined at the base point p, we refer to T as the reference vector (rather than vector field) and we refer to $(p, T) \in T_1^+M$ as the observer at the point p. This will be the standpoint adopted for our main result in Sect. 7 below.

Exponential map. On a complete Riemannian manifold the exponential map $\exp_p : T_pM \to M$ at some point $p \in M$ is defined on the whole tangent space $T_pM$ and is smooth. For sufficiently small radius r the restriction of $\exp_p$ to the ball $B_{g_p}(0, r) \subset T_pM$ (determined by the metric $g_p$ at the point p) is a diffeomorphism on its image. The radius of injectivity at the point p is defined as the largest value r such that the restriction $\exp_p|_{B_{g_p}(0,r)}$ is a global diffeomorphism. In the Lorentzian case, the exponential map is defined similarly but some care is needed in defining the notion of radius of injectivity. First of all, if the manifold is not geodesically complete (which is a rather generic situation, as illustrated by Penrose and Hawking's incompleteness theorems [13]), the map $\exp_p$ need not be defined on the whole tangent space $T_pM$ but only on a neighborhood of the origin in $T_pM$.
More importantly, the Lorentzian norm of a non-zero vector may well vanish; consequently, the radius of injectivity should not be defined directly from the Lorentzian metric g. The definition given now depends on the prescribed Riemannian metric gT, p at the point p, only.

Definition 2.1. The conjugate radius $\mathrm{Conj}_g(M, p, T)$ of an observer $(p, T) \in T_1^+M$ is the largest radius r such that the exponential map $\exp_p$ is a local diffeomorphism from the Riemannian ball $B_T(0, r) = B_{g_{T,p}}(0, r) \subset T_pM$ to a neighborhood of p in the manifold M. Similarly, the injectivity radius $\mathrm{Inj}_g(M, p, T)$ of an observer $(p, T) \in T_1^+M$ is the largest radius r such that the exponential map is a global diffeomorphism in the ball $B_T(0, r)$.

When a vector field T is prescribed on the manifold (rather than a vector at the point p), we use the notation $\mathrm{Inj}_g(M, p, T)$ instead of $\mathrm{Inj}_g(M, p, T_p)$. Note that the radii $\mathrm{Conj}_g(M, p, T)$ and $\mathrm{Inj}_g(M, p, T)$ are essentially independent of the choice of the reference vector, as long as it remains in a fixed compact subset of $T_p^+M$. We will also need the notion of injectivity radius for null cones. Given a point $p \in M$ and a reference vector $T \in T_pM$, we consider the past null cone at p,

$$N_p^- := \{ X \in T_pM \,/\, g_p(X, X) = 0, \; g_p(T, X) \ge 0 \},$$

which defines a subset of the tangent space at p. Denote by

$$B_T^N(0, r) = B^N_{g_{T,p}}(0, r) := B_{g_{T,p}}(0, r) \cap N_p^-$$

the intersection of the Riemannian $g_{T,p}$-ball with radius r and the past null cone, and by $N^-(p) := \partial I^-(p)$ the past null cone at p. Consider now the restriction of $\exp_p$ to the past null cone, denoted by

$$\exp_p^N : B_T^N(0, r) \subset N_p^- \to N^-(p) \subset M,$$

which we refer to as the null exponential map.

Definition 2.2. The (past) null conjugate radius $\mathrm{NullConj}_g(M, p, T)$ of an observer $(p, T) \in T_1^+M$ is the largest radius r such that the null exponential map $\exp_p^N$ is a local diffeomorphism from the punctured Riemannian ball $B_T^N(0, r) \setminus \{0\} \subset T_pM$ to a neighborhood of p in the past null cone. The null injectivity radius $\mathrm{NullInj}_g(M, p, T)$ of an observer $(p, T) \in T_1^+M$ is defined similarly by requiring the map $\exp_p^N$ to be a global diffeomorphism.

3. Lorentzian Manifold Endowed with a Reference Vector Field

A first injectivity radius estimate. From now on, we fix a reference vector field T which allows us to define the Riemannian metric $g_T$ and compute the norms of tensors. We begin with a set of assumptions encompassing a large class of foliated Lorentzian manifolds with $L^\infty$ bounded curvature and we state our first injectivity estimate, in Theorem 3.1 below. The forthcoming sections will be devoted to further generalizations and variants of this result. We fix a point $p \in M$ and assume that a domain $\Omega \subset M$ containing p is foliated by spacelike hypersurfaces $\Sigma_t$ with future-oriented time-like unit normal T,

$$\Omega = \bigcup_{t \in [-1, 1]} \Sigma_t. \tag{3.1}$$

A positive scalar n is defined by the relation $\frac{\partial}{\partial t} = n\, T$, or

$$n^2 := -g\Big( \frac{\partial}{\partial t}, \frac{\partial}{\partial t} \Big).$$

In the context of general relativity, n is the proper time of an observer moving orthogonally to the hypersurfaces, and is called the lapse function. The geometry of the foliation is determined by this function n together with the Lie derivative $\mathcal{L}_T g$. The latter is nothing but the second fundamental form, or extrinsic curvature, of the slices $\Sigma_t$ embedded in the manifold M. We always assume that the geodesic ball $B_{\Sigma_0}(p, 1) \subset \Sigma_0$ (determined by the induced metric $g|_{\Sigma_0}$) is compactly contained in $\Sigma_0$. We introduce the following assumptions:

$$e^{-K_0} \le n \le e^{K_0} \quad \text{in } \Omega, \tag{A1}$$
$$|\mathcal{L}_T g|_T \le K_1 \quad \text{in } \Omega, \tag{A2}$$
$$|\mathrm{Rm}_g|_T \le K_2 \quad \text{in } \Omega, \tag{A3}$$
$$\mathrm{Vol}_{g|_{\Sigma_0}}(B_{\Sigma_0}(p, 1)) \ge v_0, \tag{A4}$$

where $K_0$, $K_1$, $K_2$ and $v_0$ are positive constants. Observe that Assumption (A4) is a condition on the initial slice only; together with the other assumptions it actually implies a lower volume bound for every slice of the foliation. We will prove:

Theorem 3.1. (Injectivity radius of foliated manifolds). Let M be a differentiable manifold endowed with a Lorentzian metric g satisfying the regularity assumptions (A1)–(A4) at some point p and for some foliation (3.1). Then, there exists a positive constant $i_0$ depending only upon the foliation bounds $K_0$, $K_1$, the curvature bound $K_2$, the volume bound $v_0$, and the dimension of the manifold such that the injectivity radius at p satisfies $\mathrm{Inj}_g(M, p, T) \ge i_0$.

The following section is devoted to the proof of this theorem. Observe that conditions (A1)–(A4) are local about one point of the manifold and are stated in purely geometric terms, requiring no particular choice of coordinates. Of course, the conclusion of Theorem 3.1 holds globally in M if the assumptions (A1)–(A4) hold at every point of the manifold. Our assumptions do depend on the choice of the time-like vector field T, but the dependence of the constants arising in (A1)–(A4) should not be essential in the applications. It is conceivable that a quantitatively sharper estimate would be obtained with a choice of an "almost Killing" field, that is, a field T corresponding to a "small" Lie derivative $\mathcal{L}_T g$. Later in Sect. 7, a more general approach is presented in which the vector field T is constructed from a single vector prescribed at the point p.

Basic estimates on the reference metric. To establish Theorem 3.1 it is convenient to introduce coordinates on Ω, chosen as follows. Fix arbitrarily some coordinates $(x^i)$ on the initial slice $\Sigma_0$. Then, transport these coordinates to the whole of Ω along the integral curves of the vector field T.
This construction generates coordinates $(x^\alpha)$ on Ω such that $x^0 = t$ and the vector $\partial/\partial t$ is orthogonal to each vector $\partial/\partial x^j$ ($1 \le j \le n$), so that the Lorentzian metric takes the form

$$g = -n^2\, dt^2 + g_{ij}\, dx^i dx^j, \tag{3.2}$$

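where n is the lapse function and $g_{ij}$ is the Riemannian metric induced on the slices $\Sigma_t$. The reference Riemannian metric in the domain Ω then takes the form

$$g_T = n^2\, dt^2 + g_{ij}\, dx^i dx^j, \tag{3.3}$$

and the Riemannian norm of a vector X equals $g_T(X, X) := n^2 X^0 X^0 + X_j X^j$. We want to control the discrepancy between the reference Riemannian metric $g_T$ and the original Lorentzian metric g, as measured in the connections $\nabla$ and $\nabla_{g_T}$ and the curvature tensors $\mathrm{Rm}$ and $\mathrm{Rm}_{g_T}$. Clearly, these estimates should involve the constants arising in (A1)–(A4). Consider the general class of metrics

$$\widetilde{g} := f\, dt^2 + g_{ij}\, dx^i dx^j, \tag{3.4}$$

which allows us to recover both the Lorentzian ($f = -n^2$) and the Riemannian ($f = n^2$) metrics. In view of the general expressions of the Christoffel and Riemann curvature coefficients

$$\widetilde{\Gamma}^\gamma_{\alpha\beta} := \frac{1}{2}\, \widetilde{g}^{\gamma\delta} \Big( \frac{\partial \widetilde{g}_{\delta\alpha}}{\partial x^\beta} + \frac{\partial \widetilde{g}_{\delta\beta}}{\partial x^\alpha} - \frac{\partial \widetilde{g}_{\alpha\beta}}{\partial x^\delta} \Big),$$
$$\widetilde{R}^\zeta_{\alpha\beta\delta} := \frac{\partial \widetilde{\Gamma}^\zeta_{\beta\delta}}{\partial x^\alpha} - \frac{\partial \widetilde{\Gamma}^\zeta_{\alpha\delta}}{\partial x^\beta} + \widetilde{\Gamma}^\zeta_{\alpha\eta} \widetilde{\Gamma}^\eta_{\beta\delta} - \widetilde{\Gamma}^\zeta_{\beta\eta} \widetilde{\Gamma}^\eta_{\alpha\delta},$$
$$\widetilde{R}_{\alpha\beta\gamma\delta} := \widetilde{g}_{\gamma\zeta}\, \widetilde{R}^\zeta_{\alpha\beta\delta}, \qquad \widetilde{R}_{\alpha\beta} := \widetilde{g}^{\gamma\delta}\, \widetilde{R}_{\alpha\gamma\beta\delta},$$

we can compute explicitly the Christoffel symbols associated with the metric $\widetilde{g}$,

$$\widetilde{\Gamma}^0_{00} = \frac{1}{2f} \frac{\partial f}{\partial t}, \qquad \widetilde{\Gamma}^0_{0i} = \frac{1}{2f} \frac{\partial f}{\partial x^i}, \qquad \widetilde{\Gamma}^0_{ij} = -\frac{1}{2f} \frac{\partial g_{ij}}{\partial t},$$
$$\widetilde{\Gamma}^k_{00} = -\frac{1}{2}\, g^{kl} \frac{\partial f}{\partial x^l}, \qquad \widetilde{\Gamma}^k_{ij} = \Gamma^k_{ij}, \qquad \widetilde{\Gamma}^k_{i0} = \frac{1}{2}\, g^{kl} \frac{\partial g_{li}}{\partial t}, \tag{3.5}$$

as well as the (non-trivial) curvature terms

$$\widetilde{R}_{ijkl} = R_{ijkl} - \frac{1}{4f} \Big( \frac{\partial g_{ik}}{\partial t} \frac{\partial g_{jl}}{\partial t} - \frac{\partial g_{il}}{\partial t} \frac{\partial g_{jk}}{\partial t} \Big),$$

$$\widetilde{R}^p_{0jl} = \frac{\partial \Gamma^p_{jl}}{\partial t} - \frac{\partial}{\partial x^j} \Big( \frac{1}{2}\, g^{pq} \frac{\partial g_{ql}}{\partial t} \Big) + \frac{1}{2}\, g^{pq} \frac{\partial g_{qk}}{\partial t}\, \Gamma^k_{jl} - \Gamma^p_{jk} \Big( \frac{1}{2}\, g^{kq} \frac{\partial g_{lq}}{\partial t} \Big) + \frac{1}{4f}\, g^{pq} \frac{\partial f}{\partial x^q} \frac{\partial g_{jl}}{\partial t} - \frac{1}{2}\, g^{pq} \frac{\partial g_{qj}}{\partial t}\, \frac{1}{2f} \frac{\partial f}{\partial x^l},$$

$$\widetilde{R}_{0jil} = \frac{1}{2} \Big( \nabla_l \frac{\partial}{\partial t} g_{ij} - \nabla_i \frac{\partial}{\partial t} g_{lj} \Big) + \frac{1}{4f} \Big( \frac{\partial f}{\partial x^i} \frac{\partial g_{jl}}{\partial t} - \frac{\partial f}{\partial x^l} \frac{\partial g_{ij}}{\partial t} \Big),$$

$$\widetilde{R}^p_{i00} = \frac{\partial}{\partial x^i} \Big( -\frac{1}{2}\, g^{pq} \frac{\partial f}{\partial x^q} \Big) - \frac{\partial}{\partial t} \Big( \frac{1}{2}\, g^{pq} \frac{\partial g_{qi}}{\partial t} \Big) + \Gamma^p_{il} \Big( -\frac{1}{2}\, g^{lq} \frac{\partial f}{\partial x^q} \Big) - \frac{1}{2}\, g^{pq} \frac{\partial g_{ql}}{\partial t}\, \frac{1}{2}\, g^{lr} \frac{\partial g_{ri}}{\partial t} + \frac{1}{2}\, g^{pq} \frac{\partial g_{qi}}{\partial t}\, \frac{1}{2f} \frac{\partial f}{\partial t} + \frac{1}{2}\, g^{pq} \frac{\partial f}{\partial x^q}\, \frac{1}{2f} \frac{\partial f}{\partial x^i},$$

and

$$\widetilde{R}_{i0j0} = -\frac{1}{2} \nabla_i \nabla_j f - \frac{1}{2} \frac{\partial^2 g_{ij}}{\partial t^2} + \frac{1}{4}\, g^{pq} \frac{\partial g_{ip}}{\partial t} \frac{\partial g_{jq}}{\partial t} + \frac{1}{4f} \frac{\partial f}{\partial t} \frac{\partial g_{ij}}{\partial t} + \frac{1}{4f} \frac{\partial f}{\partial x^i} \frac{\partial f}{\partial x^j}.$$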

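By applying the formulas above to the metrics g and $g_T$ we can estimate the Christoffel symbols, as follows. Recall that the difference $\Gamma^\gamma_{\alpha\beta} - \Gamma^\gamma_{g_T, \alpha\beta}$ can be regarded as a tensor field on M, so that the following (Riemannian) norm squared is a scalar field on the manifold M:

$$|\nabla_{g_T} - \nabla|^2_T := |\Gamma_{g_T} - \Gamma|^2_T = \big( \Gamma^\alpha_{g_T, \beta\gamma} - \Gamma^\alpha_{\beta\gamma} \big) \big( \Gamma^{\alpha'}_{g_T, \beta'\gamma'} - \Gamma^{\alpha'}_{\beta'\gamma'} \big)\, g_{T, \alpha\alpha'}\, g_T^{\beta\beta'}\, g_T^{\gamma\gamma'}.$$

Lemma 3.2. (Levi-Civita connection of the reference metric). Suppose that g satisfies Assumptions (A1)–(A2). Then, the covariant derivatives of the Lorentzian and Riemannian metrics are comparable; precisely, at every point of Ω,

$$|\nabla_{g_T} - \nabla|_T = n^2\, |\mathcal{L}_T g|^2_T \le e^{2K_0} K_1^2 =: K_3.$$

Proof. In view of (3.5) the difference $\Gamma_{g_T} - \Gamma$ depends essentially upon the terms $\frac{\partial n}{\partial x^i}$ and $\frac{\partial g_{ij}}{\partial t}$ which precisely appear in the expression of the Lie derivative of g along the vector field T (as follows by a direct computation from (3.2))

$$(\mathcal{L}_T g)_{00} = 0, \qquad (\mathcal{L}_T g)_{0i} = \frac{1}{n} \frac{\partial n}{\partial x^i}, \qquad (\mathcal{L}_T g)_{ij} = \frac{1}{n} \frac{\partial g_{ij}}{\partial t}. \tag{3.6}$$

We omit the details.

Next, observe that the difference between the curvature tensors cannot be similarly estimated, and that this is one of the main difficulties to deal with in the present work. For future reference we provide here the expressions of certain curvature coefficients of g and $g_T$ in terms of (first-order derivatives of) the lapse function n and the induced metric $g_{jk}$:

$$R_{ijkl} = R^\Sigma_{ijkl} + \frac{1}{4n^2} \Big( \frac{\partial g_{ik}}{\partial t} \frac{\partial g_{jl}}{\partial t} - \frac{\partial g_{il}}{\partial t} \frac{\partial g_{jk}}{\partial t} \Big),$$
$$R_{0jil} = \frac{1}{2} \Big( \nabla_l \frac{\partial}{\partial t} g_{ij} - \nabla_i \frac{\partial}{\partial t} g_{lj} \Big) + \frac{1}{2n} \Big( \frac{\partial n}{\partial x^i} \frac{\partial g_{jl}}{\partial t} - \frac{\partial n}{\partial x^l} \frac{\partial g_{ij}}{\partial t} \Big),$$
$$R_{i0j0} = \frac{1}{2} \nabla_i \nabla_j (n^2) - \frac{1}{2} \frac{\partial^2 g_{ij}}{\partial t^2} + \frac{1}{4}\, g^{pq} \frac{\partial g_{ip}}{\partial t} \frac{\partial g_{jq}}{\partial t} + \frac{1}{2n} \frac{\partial n}{\partial t} \frac{\partial g_{ij}}{\partial t} - \frac{\partial n}{\partial x^i} \frac{\partial n}{\partial x^j},$$

and

$$R_{T, ijkl} = R^\Sigma_{ijkl} - \frac{1}{4n^2} \Big( \frac{\partial g_{ik}}{\partial t} \frac{\partial g_{jl}}{\partial t} - \frac{\partial g_{il}}{\partial t} \frac{\partial g_{jk}}{\partial t} \Big),$$
$$R_{T, 0jil} = \frac{1}{2} \Big( \nabla_l \frac{\partial}{\partial t} g_{ij} - \nabla_i \frac{\partial}{\partial t} g_{lj} \Big) + \frac{1}{2n} \Big( \frac{\partial n}{\partial x^i} \frac{\partial g_{jl}}{\partial t} - \frac{\partial n}{\partial x^l} \frac{\partial g_{ij}}{\partial t} \Big),$$
$$R_{T, i0j0} = \frac{1}{2} \nabla_i \nabla_j (-n^2) - \frac{1}{2} \frac{\partial^2 g_{ij}}{\partial t^2} + \frac{1}{4}\, g^{pq} \frac{\partial g_{ip}}{\partial t} \frac{\partial g_{jq}}{\partial t} + \frac{1}{2n} \frac{\partial n}{\partial t} \frac{\partial g_{ij}}{\partial t} + \frac{\partial n}{\partial x^i} \frac{\partial n}{\partial x^j},$$

where $R^\Sigma_{ijkl}$ denotes the induced curvature tensor on the time slices $\Sigma = \Sigma_t$.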

4. Derivation of the First Injectivity Radius Estimate

In this section we provide a proof of Theorem 3.1.

Step 1. Radius of definition of the exponential map. First of all, the injectivity radius of the Riemannian metric $g|_{\Sigma_0}$ induced on the initial hypersurface $\Sigma_0 = t^{-1}(0)$ is controlled as follows. In view of Assumptions (A3) and (A4), the Riemann curvature of the metric $g|_{\Sigma_0}$ is bounded and the volume of the unit geodesic ball $\mathrm{Vol}_{g|_{\Sigma_0}}(B_{\Sigma_0}(p, 1))$ is bounded below. Therefore, according to [9], there exists a constant $i_1 = i_1(K_2, v_0)$ such that the injectivity radius of $g|_{\Sigma_0}$ at the point p is $i_1$ at least:

$$\mathrm{Inj}_{g|_{\Sigma_0}}(\Sigma_0, p) \ge i_1.$$

Moreover, according to [15] we can also assume that $i_1$ is sufficiently small so that, given any $\varepsilon > 0$ there exist coordinates $(x^\alpha)$ defined in a ball with definite size near p, with $x^\alpha(p) = 0$, such that the metric $g|_{\Sigma_0}$ is close to the n-dimensional Euclidean metric $g_E = \delta_{ij}$ (in these coordinates). More precisely, on the initial slice $\Sigma_0$ we have

$$e^{-\varepsilon} \delta_{ij} \le g_{ij}(0, x^1, \ldots, x^n) \le e^{\varepsilon} \delta_{ij}, \qquad (x^1, \ldots, x^n) \in B_E(0, i_1),$$

where we have set $B_E(0, r) := \{ (x^1)^2 + \ldots + (x^n)^2 < r^2 \} \subset \mathbb{R}^n$. The latter can be regarded as a subset of $\Sigma_0$ by identifying a point with its coordinates; we will also use the notation $B_E(p, r)$ for this Euclidean ball. We can next introduce some coordinates $(x^\alpha) = (t, x^j)$ on the manifold, by propagating the coordinates $(x^j)$ chosen on $\Sigma_0$ along the integral curves of the vector field T. This construction allows us to cover the domain Ω. From Assumption (A2) (together with (A1) and (3.6)) we deduce that the induced metric on each slice of the foliation is comparable with the n-dimensional Euclidean metric in some time interval $[-i_2, i_2]$, that is,

$$(e^{-\varepsilon} - K i_2)\, \delta_{ij} \le g_{ij}(x) \le (e^{\varepsilon} + K i_2)\, \delta_{ij}, \qquad x = (t, x^1, \ldots, x^n) \in [-i_2, i_2] \times B_E(0, i_1),$$

for some $K > 0$ depending only on $K_0, K_1, K_2$. We then restrict attention to a smaller radius $i_2 = i_2(K_0, K_1, K_2) \le i_1$ chosen such that $e^{-\varepsilon} - K i_2 > 0$, and we pick $c_1 \ge 0$ sufficiently large so that $e^{-c_1} \le e^{-\varepsilon} - K i_2 \le e^{\varepsilon} + K i_2 \le e^{c_1}$. In turn, in view of Assumption (A1) on the lapse function n and of the expression (3.3) of the reference Riemannian metric $g_T$, the above inequalities imply that $g_T$ is comparable to the (n + 1)-dimensional Euclidean metric:

$$e^{-c_2} \delta_{\alpha\beta} \le g_{T, \alpha\beta} \le e^{c_2} \delta_{\alpha\beta}, \qquad x = (t, x^1, \ldots, x^n) \in [-i_2, i_2] \times B_E(0, i_2)$$

for some constant $c_2 \ge c_1$ depending upon $c_1$ and $K_0$.
Introducing on the manifold the (n + 1)-dimensional Euclidean metric $g_E$ (which we define in the constructed coordinates $(x^\alpha)$ and is, of course, independent of the point on the manifold) and introducing also the corresponding Euclidean metric ball $B_E(p, i_2)$, we have established

$$e^{-c_2} g_E \le g_{T, q} \le e^{c_2} g_E, \qquad q \in B_E(p, i_2). \tag{4.1}$$

In the following we use the notation $|X|_E$ for the Euclidean norm of a vector X. Our first task is to determine the radius of a ball on which the exponential map is well-defined. This radius depends upon the reference vector field T. Let $\gamma : [0, s_0] \to M$ be a geodesic associated with the Lorentzian metric g and satisfying $\gamma(0) = p$. Assume that this geodesic is included in the Euclidean ball $B_E(p, i_2)$ (in which we already control the metric $g_T$). Obviously, the Lorentzian norm

$$\langle \gamma'(s), \gamma'(s) \rangle_g = \langle \gamma'(0), \gamma'(0) \rangle_g, \qquad s \in [0, s_0],$$

is constant. On the other hand, to determine the length of $\gamma'(s)$ with respect to the reference metric $g_T$, we proceed as follows:

$$\frac{d}{ds} \langle \gamma'(s), \gamma'(s) \rangle_T = \nabla_{g_T, \gamma'(s)} \big( g_T(\gamma'(s), \gamma'(s)) \big) = 2\, \langle \nabla_{g_T, \gamma'(s)} \gamma'(s), \gamma'(s) \rangle_T = 2\, \langle (\nabla_{g_T} - \nabla_g)_{\gamma'(s)} \gamma'(s), \gamma'(s) \rangle_T \le 2\, |\nabla_{g_T} - \nabla_g|_T\, |\gamma'(s)|^3_T.$$

So, by Lemma 3.2,

$$\Big| \frac{d}{ds} |\gamma'(s)|^2_T \Big| \le 2 K_3\, |\gamma'(s)|^3_T,$$

and, in consequence,

$$\Big| \frac{d}{ds} |\gamma'(s)|^{-1}_T \Big| \le K_3.$$

By integration of the above inequality and provided s is small enough so that $2 s K_3 |\gamma'(0)|_T < 1$, we see that

$$\frac{1}{2}\, |\gamma'(0)|_T \le |\gamma'(s)|_T \le 2\, |\gamma'(0)|_T. \tag{4.2}$$

In view of (4.1) this implies

$$\frac{e^{-c_2}}{2}\, |\gamma'(0)|_E \le |\gamma'(s)|_E \le 2\, e^{c_2}\, |\gamma'(0)|_E. \tag{4.3}$$
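The integration step leading to (4.2) can be checked on the two extremal cases of the differential inequality. The following sketch assumes that $y(s) = |\gamma'(s)|_T$ satisfies $y' = \pm K_3 y^2$ (equivalently, that $1/y$ has slope $\mp K_3$), for which the solution is explicit; the names `extremal_speed` and `check_speed_bounds`, the sampling parameter and the rounding tolerance are hypothetical choices for this illustration.

```python
def extremal_speed(y0, K3, s, sign):
    """Closed-form solution of y' = sign * K3 * y**2 with y(0) = y0.

    Since (1/y)' = -sign * K3, one has 1/y(s) = 1/y0 - sign * K3 * s.
    """
    return y0 / (1.0 - sign * K3 * y0 * s)


def check_speed_bounds(y0, K3, samples=200, tol=1e-9):
    """Check y0/2 <= y(s) <= 2*y0 for both extremal solutions on 2*s*K3*y0 <= 1."""
    s_max = 1.0 / (2.0 * K3 * y0)
    for i in range(samples + 1):
        s = s_max * i / samples
        for sign in (+1, -1):
            y = extremal_speed(y0, K3, s, sign)
            # tol only absorbs floating-point rounding at the endpoint s = s_max.
            if not (0.5 * y0 - tol <= y <= 2.0 * y0 * (1.0 + tol)):
                return False
    return True
```

In this extremal model the upper bound $2 |\gamma'(0)|_T$ is attained exactly at the endpoint $2 s K_3 |\gamma'(0)|_T = 1$, while the lower bound in (4.2) is not saturated.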

These inequalities hold for all $s \in [0, 1/(2 K_3 |\gamma'(0)|_T)]$ as long as $\gamma(s) \in B_E(p, i_2)$. In particular, by restricting attention to geodesics whose initial vector has unit Euclidean length, $|\gamma'(0)|_E = 1$, we see that $\gamma([0, r_2]) \subset B_E(p, i_2)$ where $r_2 := i_2\, e^{-c_2}/2$. Hence, for such curves the geodesic equation is well-defined, and this establishes that the exponential map at the point p is well-defined on the ball $B_E(0, r_2)$ with a range included in the geodesic ball $B_E(p, i_2)$.

Step 2. Conjugate radius estimate. Our second task is to determine a ball on which the exponential map is a local diffeomorphism and we, therefore, need to control the length of Jacobi fields along a geodesic. Let $\gamma : [0, r_2] \to M$ be a g-geodesic satisfying $\gamma(0) = p$ and $|\gamma'(0)|_E = 1$. By the discussion in Step 1 we already know that the curve γ lies in $B_E(p, i_2)$ and that $\max_{s \in [0, r_2]} |\gamma'(s)|_T \le 2\, e^{2 c_2}$. Given an arbitrary Jacobi field along γ, $J = J(s)$, satisfying

$$J''(s) = -\mathrm{Rm}(J(s), \gamma'(s))\, \gamma'(s), \qquad J(0) = 0, \qquad |J'(0)|_T = 1,$$

we need to control its Riemannian length $F(s) := |J|_T(s)$ (as stated in (4.7), below).


B.-L. Chen, P. G. LeFloch

Let $[0, s_0]$ be the largest subinterval of $[0, r_2/4]$ in which the inequality $|J|_T \le 1$ holds. Using the equation satisfied by the Jacobi field and taking into account the curvature bound (A3), we deduce that, in the interval $[0, s_0]$,
\[ \frac{d}{ds} \langle \nabla_{\gamma'} J, \nabla_{\gamma'} J \rangle_T = 2 \langle \nabla_{g_T,\gamma'} \nabla_{\gamma'} J, \nabla_{\gamma'} J \rangle_T \le 2\, |\nabla_{g_T} - \nabla_g|_T \, |\gamma'|_T \, |\nabla_{\gamma'} J|_T^2 + 2 K_2 |\gamma'|_T^2 \, |J|_T \, |\nabla_{\gamma'} J|_T. \]
With (4.2) and the covariant derivative estimate in Lemma 3.2, we obtain
\[ \Big| \frac{d}{ds} |\nabla_{\gamma'} J|_T \Big| \le 4 K_3 |\nabla_{\gamma'} J|_T + 8 K_2. \tag{4.4} \]
Next, we integrate (4.4) over an arbitrary interval $[0, s] \subset [0, s_0]$, use the initial condition on the Jacobi field, and obtain
\[ 1 + \frac{2K_2}{K_3} \big(1 - e^{-4K_3 s}\big) \le |\nabla_{\gamma'} J|_T \le 1 + \frac{2K_2}{K_3} \big(e^{4K_3 s} - 1\big). \]
Assuming that $r_2$ is small enough so that $2K_2 (1 - e^{-4K_3 s})/K_3 \ge -1/2$ and $2K_2 (e^{4K_3 s} - 1)/K_3 \le 1$, we infer that
\[ \frac{1}{2} \le |\nabla_{\gamma'} J|_T \le 2. \tag{4.5} \]
Hence, using this inequality and Lemma 3.2 (and recalling that $|J|_T \le 1$ on $[0, s_0]$) we find that $\frac{d}{ds} |J|_T \le 2 + 2K_3$ and
\[ F(s) = |J|_T(s) \le (2 + 2K_3)\, s \le (2 + 2K_3)\, r_2. \tag{4.6} \]

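Further assuming that $(2 + 2K_3)\, r_2 \le 1$, we conclude that $s_0 = r_2$. Next, we want to improve the rough estimate (4.6). Since
\[ \frac{d}{ds} \langle \nabla_{\gamma'} J, J \rangle_T = \langle \nabla_{g_T,\gamma'} \nabla_{\gamma'} J, J \rangle_T + \langle \nabla_{g_T,\gamma'} J, \nabla_{\gamma'} J \rangle_T, \]
then by substituting the previous estimates of $|J|_T(s)$ and $|\nabla_{\gamma'} J|_T(s)$ and performing similar calculations as above, we get
\[ e^{-c_3} \le \frac{d}{ds} \langle \nabla_{\gamma'} J, J \rangle_T \le e^{c_3} \]
for some constant $c_3 > 0$. By integration this implies $e^{-c_3} s \le \langle \nabla_{\gamma'} J, J \rangle_T \le e^{c_3} s$ and, for some $c_4 > 0$, we arrive at the following lower bound for the norm of the Jacobi field:
\[ F(s) \ge \frac{\langle \nabla_{\gamma'} J, J \rangle_T}{|\nabla_{\gamma'} J|_T} \ge \frac{e^{-c_3} s}{2} \ge e^{-c_4} s. \]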
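The lower bound on $F(s)$ just stated is an application of the Cauchy-Schwarz inequality for the Riemannian metric $g_T$ combined with (4.5). A minimal sketch follows; the explicit value $c_4 = c_3 + \ln 2$ is one admissible choice made here for illustration, since the text only asserts that some $c_4 > 0$ exists.

```latex
\begin{align*}
\langle \nabla_{\gamma'} J, J\rangle_T
  &\le |\nabla_{\gamma'} J|_T\,|J|_T = |\nabla_{\gamma'} J|_T\,F(s)
  && \text{(Cauchy--Schwarz for } g_T\text{)},\\
F(s) &\ge \frac{\langle \nabla_{\gamma'} J, J\rangle_T}{|\nabla_{\gamma'} J|_T}
  \ge \frac{e^{-c_3}\,s}{2}
  && \text{(numerator} \ge e^{-c_3}s,\ \text{denominator} \le 2 \text{ by (4.5))},\\
\frac{e^{-c_3}}{2} &= e^{-(c_3 + \ln 2)}
  && \text{so } c_4 := c_3 + \ln 2 \text{ works.}
\end{align*}
```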


On the other hand, using again the above estimates we have
\[ \frac{d}{ds} F \le \frac{1}{F} \big( \langle \nabla_{g_T,\gamma'} J, J \rangle_T + K_3 F^2 \big) \le \frac{e^{c_4}}{s} \big( e^{c_3} s + K_3 (2 + 2K_3)^2 s^2 \big) \le e^{c_5} \]
for some constant $c_5 > 0$, which now yields the upper bound $F(s) \le e^{c_5} s$. In summary, we have established that the norm of the Jacobi field is comparable with $s$:
\[ e^{-c_4} s \le F(s) \le e^{c_5} s, \qquad s \in [0, r_2]. \tag{4.7} \]

By the definition of Jacobi fields these inequalities are equivalent to a uniform control of the differential of the exponential map, that is, for $s \in [0, r_2]$,
\[ e^{-c_4} |W|_T \le |D\exp_{p, s\gamma'(0)}(W)|_T \le e^{c_5} |W|_T. \]
By the inverse mapping theorem, the exponential map is a local diffeomorphism near $s\gamma'(0)$. We also conclude that the pull back of the reference metric to the tangent space at $p$ satisfies
\[ e^{-c_4} g_{T,p} \le \exp_p^* g_T \le e^{c_5} g_{T,p} \tag{4.8} \]
in the ball $B_T(0, r_2) \subset T_p M$. Since the conjugate radius of the Lorentzian metric is precisely defined from the reference Riemannian metric, these inequalities show that the conjugate radius of the exponential map is $r_2$, at least.

Step 3. Injectivity radius estimate. We are now in a position to establish that $\mathrm{Inj}_g(M, p, T) \ge r_3 := r_2 e^{-3c_2}/4$. We argue by contradiction and assume that $\gamma_1 : [0, s_1] \to M$ and $\gamma_2 : [0, s_2] \to M$ are two distinct $g$-geodesics satisfying $\max(s_1, s_2) \le r_3$ and
\[ \gamma_1(0) = \gamma_2(0) = p, \qquad |\gamma_1'(0)|_T = |\gamma_2'(0)|_T = 1, \qquad \gamma_1(s_1) = \gamma_2(s_2) =: q. \]
We will reach a contradiction and this will establish that the injectivity radius is greater than or equal to $r_3$ (as can be checked by using the fact that the exponential map is at least a local diffeomorphism). By Step 1 (since $r_3 \le r_2 \le i_2$) we know that $\gamma_1, \gamma_2 \subset B_E(p, 2e^{2c_2} r_3)$. By concatenating these two curves, we construct a geodesic loop containing $p$,
\[ \gamma = \gamma_2^{-1} \cup \gamma_1 : [0, s_1 + s_2] \to B_E(p, 2e^{2c_2} r_3), \]
which need not be smooth at $p$ or $q$. Since $\gamma$ is contained in the image of the ball $B_T(p, r_2)$ under the exponential map, we can define a homotopy of $\gamma$ with the origin ($x = 0$), by setting (in the coordinates constructed earlier)
\[ \Gamma_\varepsilon(s) = \varepsilon \gamma(s), \qquad \varepsilon \in [0, 1]. \]
The curves $\Gamma_\varepsilon : [0, s_1 + s_2] \to B_E(p, 2e^{2c_2} r_3)$ satisfy
\[ \Gamma_\varepsilon(0) = \Gamma_\varepsilon(s_1 + s_2) = p, \qquad \Gamma_0([0, 1]) = p, \qquad \Gamma_1 = \gamma. \]


Moreover, we have $|\Gamma_\varepsilon'(s)|_E \le \varepsilon\, 2e^{2c_2} \le 2e^{2c_2}$ and thus (in view of (4.1)) $|\Gamma_\varepsilon'(s)|_T \le 2e^{3c_2}$. In particular, the $g_T$-lengths (computed with the reference metric) of the loops $\Gamma_\varepsilon$ satisfy
\[ L(\Gamma_\varepsilon, g_T) \le 2e^{3c_2} r_3 = \frac{r_2}{2}, \]
due to the choice made for $r_3$. Since the exponential map is a local diffeomorphism from the ball $B_T(0, r_2) \subset T_p M$ to the manifold, and in view of the estimate (4.8) on the exponential map, it follows that all the loops $\Gamma_\varepsilon$ can be lifted to the ball $B_T(0, r_2)$ in the tangent space with the same origin $0$. Consequently, we obtain a continuous family of curves $\widetilde{\Gamma}_\varepsilon : [0, s_1 + s_2] \to T_p M$ satisfying $\widetilde{\Gamma}_\varepsilon(0) = 0$, $\varepsilon \in [0, 1]$. At this juncture we observe that, since the end points $\widetilde{\Gamma}_\varepsilon(s_1 + s_2)$ (for $\varepsilon \in [0, 1]$) all cover the same point $p$ and since the curve $\widetilde{\Gamma}_0$ is trivial and the family is continuous,
\[ \widetilde{\Gamma}_\varepsilon(s_1 + s_2) = 0, \qquad \varepsilon \in [0, 1]. \]
It remains to consider the lift of the original geodesic loop $\gamma$: under the lifting the geodesics $\gamma_1, \gamma_2$ are sent to two distinct line segments (with respect to the vector space structure) originating at the origin $0$ which obviously do not intersect. This is a contradiction and we conclude that, in fact, $\mathrm{Inj}_g(M, p, T) \ge r_3$ as announced. This completes the proof of Theorem 3.1.

5. Convex Functions and Convex Neighborhoods

We establish now the existence of convex functions and convex neighborhoods in $M$. Let us recall first some basic definitions. A function $u$ is said to be geodesically convex if the composition of $u$ with any geodesic is a convex function (of one variable). A set $\Omega' \subset \Omega$ is said to be relatively geodesically convex in $\Omega$ if, given any points $p, q \in \Omega'$ and any geodesic (segment) $\gamma$ from $p$ to $q$ contained in $\Omega$, one has $\gamma \subset \Omega'$. A set $\Omega'$ is said to be geodesically convex in $\Omega$ if $\Omega'$ is relatively geodesically convex in $\Omega$ and, in addition, for any $p, q \in \Omega'$ there exists a unique geodesic $\gamma$ connecting $p$ and $q$ and lying in $\Omega$. We denote by $d_T$ the distance function associated with the reference Riemannian metric $g_T$.

Theorem 5.1. (Existence of geodesically convex functions). Let $(M, g)$ be a differentiable $(n + 1)$-manifold endowed with a Lorentzian metric $g$, satisfying the regularity assumptions (A1)–(A4) for some point $p \in M$ and some future-oriented, unit, time-like vector field $T$, and let $g_T$ be the associated Riemannian metric. Then, for any $\varepsilon \in (0, 1)$ there exists a positive constant $r_0$ depending only upon $\varepsilon$, the foliation bounds $K_0, K_1$, the curvature bound $K_2$, the volume bound $v_0$, and the dimension of the manifold and there exists a smooth function $u$ defined on $B_T(p, r_0)$ such that
\[ (1 - \varepsilon)\, d_T(p, \cdot)^2 \le u \le (1 + \varepsilon)\, d_T(p, \cdot)^2, \qquad (2 - \varepsilon)\, g_T \le \nabla^2 u \le (2 + \varepsilon)\, g_T. \]
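To see why the Hessian bound in Theorem 5.1 yields geodesic convexity, one can differentiate $u$ twice along a $g$-geodesic. The following is a minimal sketch, assuming that $\nabla^2 u$ denotes the Hessian taken with the Levi-Civita connection $\nabla$ of the Lorentzian metric $g$, and writing $\sigma$ for an arbitrary $g$-geodesic in the domain of $u$ (a symbol introduced here for illustration).

```latex
\begin{align*}
\frac{d^2}{ds^2}\,u(\sigma(s))
  &= \nabla^2 u(\sigma', \sigma') + du\big(\nabla_{\sigma'}\sigma'\big)
   = \nabla^2 u(\sigma', \sigma')
  && (\nabla_{\sigma'}\sigma' = 0 \text{ for a } g\text{-geodesic})\\
  &\ge (2-\varepsilon)\, g_T(\sigma', \sigma')
   = (2-\varepsilon)\,|\sigma'|_T^2 \;\ge\; 0
  && (\varepsilon \in (0,1),\ g_T \text{ Riemannian}).
\end{align*}
```

Since the second derivative is nonnegative along every $g$-geodesic, $u \circ \sigma$ is a convex function of one variable, which is the definition of geodesic convexity recalled above.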


Hence, the function $u$ above is equivalent to the (squared) Riemannian distance function from $p$ and is geodesically convex for the Lorentzian metric. In the proof given below, the function $u$ is the (squared) Riemannian distance function associated with a new time-like vector field (denoted by $N$ in the proof below). The following corollary is immediate and provides us with a control of the radius of convexity, which generalizes the Whitehead theorem from Riemannian geometry [22,6].

Corollary 5.2. (Existence of geodesically convex neighborhoods). Under the assumptions of Theorem 5.1, for any $0 < r < r_0$ there exists a set $\Omega_r \subset \Omega$ which is geodesically convex in $B_T(p, 2r_0)$ and satisfies $\exp_p(B_T(0, r)) \subset \Omega_r \subset \exp_p(B_T(0, (1 + \delta) r))$. Moreover, one can choose $\Omega_r$ so that $B_T(p, r) \subset \Omega_r \subset B_T(p, (1 + \delta) r)$ also holds, where $B_T(p, r)$ is the geodesic ball determined by the reference Riemannian metric.

Proof of Theorem 5.1. Step 1. Synchronous coordinate system. Given $\varepsilon > 0$, by applying the injectivity radius estimate in Theorem 3.1 to all points near $p$, we can find a constant $r_0$ depending on $K_0, K_1, K_2, v_0, \varepsilon, n$ such that for any $q \in B_T(p, 2r_0)$ the injectivity radius at $q$ is $2r_0$ at least, and we can assume that
\[ e^{-\varepsilon} g_{T,q} \le (\exp_q)^* g_T \le e^{\varepsilon} g_{T,q} \qquad \text{on } B_T(0, r_0) \subset T_q M, \quad q \in B_T(p, 2r_0). \]
Let $\gamma = \gamma(s)$ be the past time-like geodesic from $p$ and satisfying $\gamma(0) = p$ and $\gamma'(0) = -T_p$, and consider the (past) point $q := \gamma(r_0/2)$. The future null cone at $q$ with radius $r_0$ (the orientation being determined by the vector field $T$) is defined by
\[ C_q(r_0) := \big\{ V \in T_q M \,:\, |V|_{g_{T,q}} < r_0, \ |V|^2_{g_q} < 0, \ \langle V, T \rangle > 0 \big\}. \]
Observe that the $g_T$-length of $\gamma$ between $p$ and $q$ is approximately $r_0/2$ and that the norm $|\gamma'|_T$ is almost $1$, while $|\gamma'(q)|^2_{g_q} = 1$ and $-\langle \gamma', T \rangle_g > 0$. By the injectivity radius estimate in Theorem 3.1 the exponential map at $q$ is a diffeomorphism from $C_q(r_0)$ onto its image which, moreover, contains the original point $p$. Next, introduce the set of vectors that are “almost” parallel to $T$:
\[ C_q(r_0, \varepsilon) := \Big\{ V \in T_q M \,:\, |V|_{T,q} < r_0, \ \langle V, T \rangle_{g_q} > 0, \ \frac{\langle V, V \rangle_{g_q}}{\langle V, V \rangle_{T,q}} > 1 - \varepsilon \Big\}. \]
The notation $c(\varepsilon) > 0$ is used for constants that depend only on $K_0, K_1, K_2, v_0, n, \varepsilon$ and satisfy $\lim_{\varepsilon \to 0} c(\varepsilon) = 0$. We claim that there is a constant $c(\varepsilon) > 0$ such that $B_T(p, c(\varepsilon) r_0) \subset \exp_q(C_q(r_0, \varepsilon))$. Actually, we have $B_T(p, c(\varepsilon) r_0) \subset B_T(q, (\tfrac{1}{2} + c(\varepsilon)) r_0)$, hence
\[ B_T(p, c(\varepsilon) r_0) \subset \exp_q \Big( B_T\big(0, (\tfrac{1}{2} + c(\varepsilon)) r_0\big) \Big). \tag{5.1} \]


Since the metrics $g_{T,0}$ and $g_{T,q}$ are comparable (under the exponential map at $q$) we see that geodesics $\sigma$ connecting $q$ and points of $B_T(p, c(\varepsilon) r_0)$ make an angle $\le c(\varepsilon)$ with $-\gamma'(q)$ at the point $q$ (as measured by the metric $g_{T,q}$). By reducing the constant $c(\varepsilon)$ if necessary, the claim is proved. Let $\tau$ be the Lorentzian distance from $q$: it is defined on $\exp_q(C_q(r_0))$ and is a smooth function on $\exp_q(C_q(r_0)) \setminus \{p\}$. From (5.1) we deduce that $\tau$ is smooth and satisfies
\[ \Big( \frac{1}{2} - c(\varepsilon) \Big) r_0 < \tau < \Big( \frac{1}{2} + c(\varepsilon) \Big) r_0 \qquad \text{in the ball } B_T(p, c(\varepsilon) r_0). \tag{5.2} \]
It is clear also that
\[ |\nabla \tau|^2_g = -1, \qquad \nabla^2 \tau(\nabla \tau, \cdot) = 0. \]
We now introduce a new foliation based on this Lorentzian distance function. Let $(z^j)$ be coordinates on the level set hypersurface $\tau = \tau(p)$. By following the integral curves of the (unit, time-like) vector field $N := \nabla \tau$ we construct coordinates $(z^\alpha)$ in which $z^0 := \tau$ and the Lorentzian metric takes the form
\[ g = -(dz^0)^2 + g_{ij}\, dz^i dz^j. \]
Let $g_N$ be the reference Riemannian metric based on this (new) vector field $N$. In view of Lemma 3.2 and the geodesic equation we see that all (future) $g$-geodesics $\sigma$ satisfy the uniform bound
\[ \frac{d}{d\tau} \log |\sigma'(\tau)| \le K_3 r_0. \]
(Recall that we allow $r_0$ to depend upon $\varepsilon$.) This inequality shows that the vector field $N$ makes an angle $\le c(\varepsilon)$ with $T$, everywhere on $\exp_q(C_q(r_0, \varepsilon))$. From this, we conclude that the two metrics are comparable:
\[ (1 - c(\varepsilon))\, g_T \le g_N \le (1 + c(\varepsilon))\, g_T \qquad \text{in the cone } \exp_q(C_q(r_0, \varepsilon)). \]

Step 2. Hessian comparison theorem and curvature bound for the reference metric $g_N$. Since $p \in \exp_q(C_q(r_0))$, let $\sigma : [0, \tau(p)] \to M$ be the future time-like geodesic connecting $q$ to $p$, and let $J$ be the Jacobi field defined along $\sigma$ such that
\[ J(0) = 0, \qquad J(\tau(p)) = V, \]
where $V \in T_p M$ is given and satisfies the orthogonality condition $\langle \nabla \tau, V \rangle = 0$. Then, we have
\[ -\nabla^2 \tau(V, V) = -\langle J, \nabla_{\nabla \tau} J \rangle = \langle J, \nabla_{\frac{\partial}{\partial \tau}} J \rangle = \int_0^{\tau(p)} \Big( \langle \nabla_{\frac{\partial}{\partial \tau}} J, \nabla_{\frac{\partial}{\partial \tau}} J \rangle_g - \mathrm{Rm}(\sigma', J, \sigma', J) \Big)\, d\tau =: I(J, J). \]
Recall that, in the absence of conjugate points along the geodesic, Jacobi fields minimize the index form $I(V, V)$ among all vector fields with fixed boundary values. By applying a standard comparison technique from Riemannian geometry on the orthogonal space


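$(\nabla \tau)^\perp$ (on which the Lorentzian metric induces a Riemannian metric) we can control the Hessian of $\tau$ in terms of the curvature bound $K_2$, as follows:
\[ \frac{\sqrt{K_2 (1 + c(\varepsilon))}}{\tan\big(\sqrt{K_2 (1 + c(\varepsilon))}\, \tau\big)}\; g|_{(\nabla \tau)^\perp} \le (-\nabla^2 \tau)|_{(\nabla \tau)^\perp} \le \frac{\sqrt{K_2 (1 + c(\varepsilon))}}{\tanh\big(\sqrt{K_2 (1 + c(\varepsilon))}\, \tau\big)}\; g|_{(\nabla \tau)^\perp}. \]
Next, since $-\nabla^2_{ij} \tau = \frac{1}{2} \frac{\partial g_{ij}}{\partial \tau}$, we deduce from the above inequalities that
\[ \frac{g_{ij}}{\tau} \le \frac{\partial g_{ij}}{\partial \tau} \le \frac{3 g_{ij}}{\tau} \qquad \text{in the cone } \exp_q(C_q(r_0)). \tag{5.3} \]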
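The passage from the Hessian comparison to (5.3) rests on two elementary bounds for $x \cot x$ and $x \coth x$. The following sketch writes $a$ for the constant multiplying $\tau$ inside $\tan$ and $\tanh$ above and assumes $a\tau \le 1$; this smallness condition (which can be arranged by shrinking $r_0$) is an assumption of the sketch, not a statement taken from the text.

```latex
% Assumed shorthand: a = constant multiplying \tau in tan/tanh above, x := a\tau \in (0,1].
\begin{align*}
&\text{Using } -\nabla^2_{ij}\tau = \tfrac12\,\partial_\tau g_{ij}: \qquad
\frac{2\,x\cot x}{\tau}\; g_{ij} \;\le\; \frac{\partial g_{ij}}{\partial \tau}
  \;\le\; \frac{2\,x\coth x}{\tau}\; g_{ij},\\
&x\cot x \text{ is decreasing on } (0,1], \text{ so } x\cot x \ge \cot 1 > \tfrac12
  \;\Longrightarrow\; \frac{\partial g_{ij}}{\partial\tau} \ge \frac{g_{ij}}{\tau},\\
&x\coth x \text{ is increasing on } (0,1], \text{ so } x\coth x \le \coth 1 < \tfrac32
  \;\Longrightarrow\; \frac{\partial g_{ij}}{\partial\tau} \le \frac{3\,g_{ij}}{\tau}.
\end{align*}
```

All inequalities between tensors are understood as inequalities between quadratic forms on the orthogonal space $(\nabla\tau)^\perp$.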

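In view of the curvature expressions given at the end of Sect. 3, i.e. since the lapse function is now constant,
\[ R_{ijkl} = \overline{R}_{ijkl} + \frac{1}{4} \Big( \frac{\partial g_{ik}}{\partial \tau} \frac{\partial g_{jl}}{\partial \tau} - \frac{\partial g_{il}}{\partial \tau} \frac{\partial g_{jk}}{\partial \tau} \Big), \]
\[ R_{0jil} = \frac{1}{2} \Big( \nabla_l \Big( \frac{\partial}{\partial \tau} g_{ij} \Big) - \nabla_i \Big( \frac{\partial}{\partial \tau} g_{lj} \Big) \Big), \]
\[ R_{i0j0} = -\frac{1}{2} \frac{\partial^2 g_{ij}}{\partial \tau^2} + \frac{1}{4} g^{pq} \frac{\partial g_{ip}}{\partial \tau} \frac{\partial g_{jq}}{\partial \tau}, \]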

we conclude that
\[ \Big| \frac{\partial^2 g_{ij}}{\partial \tau^2} \Big| \le \frac{C}{\tau^2} \qquad \text{on } \exp_q(C_q(r_0)). \tag{5.4} \]
Finally, relying on the formulas for the curvature of the reference Riemannian metric $g_N$, we obtain
\[ |\mathrm{Rm}_{g_N}|_N \le \frac{C}{\tau^2} \qquad \text{on } \exp_q(C_q(r_0)). \tag{5.5} \]
Observe that, as could have been expected, the upper bounds in (5.4) and (5.5) blow up as one approaches the point $q$ which is the base point in our definition of the distance. In particular, (5.5) shows the desired curvature estimate near the point $p$:
\[ |\mathrm{Rm}_{g_N}|_N \le C r_0^{-2} \qquad \text{on the ball } B_T(p, c(\varepsilon) r_0). \]

Step 3. Constructing geodesically convex functions. Since the metrics $g_T$ and $g_N$ are comparable, the volume ratio $\mathrm{Vol}_{g_N}(B_N(p, c(\varepsilon) r_0))/r_0^{n+1}$ is uniformly bounded (above and) below. Thanks to the theory for Riemannian metrics [9], the injectivity radius of the metric $g_N$ is bounded from below by $c(\varepsilon) r_0$. Let $u(x) := d_{g_N}(p, x)^2$ be the square of the distance function associated with the Riemannian metric $g_N$, which is a smooth function defined on the geodesic ball $B_N(p, c(\varepsilon) r_0)$. By the standard Hessian comparison theorem for Riemannian manifolds we have
\[ (2 - \varepsilon)\, g_{N,\alpha\beta} \le \nabla_{g_N,\alpha} \nabla_{g_N,\beta} u \le (2 + \varepsilon)\, g_{N,\alpha\beta} \qquad \text{on the ball } B_N(p, c(\varepsilon) r_0). \]


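Next, in terms of the original Lorentzian metric $g$, the Hessian of the function $u$ is
\[ \nabla_\alpha \nabla_\beta u = \nabla_{g_N,\alpha} \nabla_{g_N,\beta} u + \big( \Gamma^\gamma_{g_N,\alpha\beta} - \Gamma^\gamma_{\alpha\beta} \big) \frac{\partial u}{\partial x^\gamma}. \]
Since $|\Gamma_{g_N} - \Gamma|_N \le C \sup \big| \frac{\partial g_{ij}}{\partial \tau} \big| \le C$ by the estimate (5.3) and since also $|\nabla u|_N \le 2\, d_{g_N}$ on $B_N(p, r_0)$, we conclude that
\[ (2 - \varepsilon)\, g_{N,\alpha\beta} \le \nabla_\alpha \nabla_\beta u \le (2 + \varepsilon)\, g_{N,\alpha\beta} \qquad \text{in the ball } B_N(p, c(\varepsilon) r_0). \]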

This completes the proof of Theorem 5.1.

6. Injectivity Radius of Null Cones

We now turn our attention to null cones within foliated Lorentzian manifolds. Our main result (Theorem 6.1 below) provides a lower bound for the null injectivity radius under the main assumption that the exponential map is defined in some ball and the null conjugate radius is already controlled. Hence, contrary to the presentation in Sect. 3 our main assumption (see (A3′) below) is not directly stated as a curvature bound. However, under additional assumptions, it is known that the conjugate radius estimate can be deduced from an $L^p$ curvature bound, so that our result is entirely relevant for the applications. Indeed, in a series of fundamental papers [16–18], Klainerman and Rodnianski assumed an $L^2$ curvature bound and estimated the null conjugate and injectivity radii for Ricci-flat Lorentzian $(3 + 1)$-manifolds. Our result in the present section is a continuation of the recent work [18] and covers a general class of Lorentzian manifolds with arbitrary dimension, while our proof is local and geometric and so conceptually simple.

We use the terminology and notation introduced in Sect. 2. In particular, a point $p \in M$ and a reference vector field $T$ are given, and $N_p^-$ denotes the past null cone in the tangent space at $p$. Recall from Sect. 2 that we defined the null exponential map $\exp_p^N : B_T^N(0, r) \to M$ over a subset of this cone, $B_T^N(0, r) := B_T(0, r) \cap N_p^-$, and we introduced the (past) null injectivity radius $\mathrm{Null\,Inj}_g(M, p, T)$. We also set $B_T^N(p, r) := \exp_p^N(B_T^N(0, r))$. We consider a domain $\Omega \subset M$ containing the point $p$ on a final slice $\Sigma_0$ and foliated as
\[ \Omega = \bigcup_{t \in [-1, 0]} \Sigma_t, \qquad p \in \Sigma_0. \tag{6.1} \]
We assume that there exist positive constants $K_0, K_1, K_2$ such that
\[ e^{-K_0} \le n \le e^{K_0} \quad \text{in } \Omega, \tag{A1} \]
\[ |\mathcal{L}_T g|_T \le K_1 \quad \text{in } \Omega, \tag{A2} \]
the null conjugate radius at $p$ is $r_0$ (at least) and the null exponential map satisfies
\[ e^{-K_2}\, g_{T,p}|_{B_T^N(0, r_0)} \le (\exp_p^N)^* \big( g_T|_{B_T^N(0, r_0)} \big) \le e^{K_2}\, g_{T,p}|_{B_T^N(0, r_0)} \tag{A3′} \]


and, finally, there exists a coordinate system on the initial slice $\Sigma_{-1}$ such that the metric $g|_{\Sigma_{-1}}$ is comparable to the $n$-dimensional Euclidean metric $g_E$ in these coordinates:
\[ e^{-K_0} g_E \le g|_{\Sigma_{-1}} \le e^{K_0} g_E \qquad \text{in } B_{\Sigma_{-1},E}(p, r_0). \tag{A4′} \]
We refer to $K_2$ as the effective conjugate radius constant.

Theorem 6.1. (Injectivity radius of null cones). Let $M$ be a differentiable $(n + 1)$-manifold, endowed with a Lorentzian metric $g$ satisfying the regularity assumptions (A1), (A2), (A3′), and (A4′) at some point $p$ and for some foliation (6.1). Then, there exists a positive constant $i_0$ depending only upon the foliation bounds $K_0, K_1$, the null conjugate radius $r_0$, the effective conjugate radius constant $K_2$, and the dimension $n$ such that the null injectivity radius of the metric $g$ at $p$ satisfies $\mathrm{Null\,Inj}_g(M, p, T) \ge i_0$.

It is interesting to compare the assumptions above with the ones made in Sect. 3. Assumptions (A1) and (A2) are concerned with the property of the foliation and were already required in Sect. 3. Assumption (A3′) should be viewed as a weaker version of the $L^\infty$ curvature condition (A3). Recall that, under the assumptions of Theorem 3.1 which included a curvature bound, an analogue of (A3′) valid in the whole of $\Omega$ was already established in (4.8). It is expected that (A3′) is still valid when the curvature in every spacelike slice is solely bounded in some $L^m$ space. Indeed, at least when the spatial dimension is $n = 3$ and the manifold is Ricci-flat, Assumption (A3′) is a consequence of the following $L^2$ curvature bound (for some constant $K_2 > 0$)
\[ \| \mathrm{Rm}_g \|_{L^2(\Sigma_{-1}, g_T)} \le K_2, \tag{6.2} \]
as was established by Klainerman and Rodnianski [16,17]. Assumption (A4′) concerns the metric on the initial hypersurface and is only slightly stronger than the volume bound (A4). Furthermore, according to Anderson [1] and Petersen [21] the property (A4′) is a consequence of the curvature bound (for $m > n/2$ and some constant $K_2 > 0$)
\[ \| \mathrm{Rm}_g \|_{L^m(\Sigma_{-1}, g_T)} \le K_2, \tag{6.3} \]
and a volume lower bound at every scale
\[ r^{-n}\, \mathrm{Vol}_{g|_{\Sigma_0}}(B_{\Sigma_0}(p, r)) \ge v_0, \qquad r \in (0, r_0]. \tag{6.4} \]

Proof of Theorem 6.1. Step 1. Localization of the past null cone $\mathcal{N}^-(p)$ between two flat null cones. Assumption (A3′) provides us with a bound on the null conjugate radius; we need to control the injectivity radius. We proceed as in Sect. 4 and introduce coordinates near the point $p$ such that $x^\alpha(p) = 0$. Precisely, relying on Assumptions (A1), (A2), and (A4′), we determine the coordinates $x = (x^\alpha)$ by setting $x^0 = t$ and transporting (via the gradient of the function $t$) spatial coordinates $(x^j)$ initially given on $\Sigma_{-1}$. The Lorentzian metric reads $g = -n^2 dt^2 + g_{ij}\, dx^i dx^j$ and satisfies for some $C_0, C_1 > 0$,
\[ \frac{1}{C_0} \le n^2 \le C_0, \qquad \frac{1}{C_1} \delta_{ij} \le g_{ij} \le C_1 \delta_{ij}, \tag{6.5} \]


for all $-r_0 < t \le 0$ and $(x^1)^2 + \ldots + (x^n)^2 \le (r_0)^2$, and in these coordinates the reference Riemannian metric $g_T$ is comparable to the $(n + 1)$-dimensional Euclidean metric $g_E := dt^2 + (dx^1)^2 + \ldots + (dx^n)^2$:
\[ \frac{1}{C_1} g_E \le g_T \le C_1 g_E. \tag{6.6} \]
Denote by $B_E(q, r)$ the Euclidean ball with center $q$ and radius $r$. Note that these inequalities hold within a neighborhood of $p$ in $\Omega$. The forthcoming bounds will hold in a neighborhood of the past null cone, only. To simplify the notation, we set
\[ c_0 := \frac{1}{C_0}, \qquad c_1 := \frac{1}{C_1}. \]
In each time slice of parameter value t = a we introduce the n-dimensional Euclidean ball with radius b, Aab , Aa[c,d] , . . . in a similar way. For any point $q$ in a slice $\Sigma_{t_0}$ satisfying $-r_0 \le t_0 < 0$ and $x^1(q)^2 + \cdots + x^n(q)^2 < c_1^2 t_0^2$ we consider the line (for the Euclidean metric) connecting $q$ to $p$:
\[ \gamma(\tau) = \Big( \tau, \frac{\tau}{t_0} x^1(q), \cdots, \frac{\tau}{t_0} x^n(q) \Big), \qquad \tau \in [t_0, 0]. \]
This is a time-like curve for the Lorentzian metric $g$, since
\[ |\gamma'(\tau)|^2 = -n^2 + g_{ij} \frac{x^i(q)}{t_0} \frac{x^j(q)}{t_0} < -c_0 + c_1 < 0, \]

which shows that At
t ∈ (−r0 , 0).
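The time-like character of the straight line $\gamma$ claimed above can be verified term by term. This is a sketch based on (6.5) and on the condition $x^1(q)^2 + \cdots + x^n(q)^2 < c_1^2 t_0^2$; the final sign uses $c_1 < c_0$, which appears to be assumed implicitly in the text.

```latex
% Sketch: bound each term of |\gamma'(\tau)|^2 using (6.5) and the choice of q.
\begin{align*}
g_{ij}\,\frac{x^i(q)}{t_0}\,\frac{x^j(q)}{t_0}
  &\le C_1\,\frac{x^1(q)^2 + \cdots + x^n(q)^2}{t_0^2}
   < C_1\,c_1^2 = c_1
  && (g_{ij} \le C_1\delta_{ij},\ c_1 = 1/C_1),\\
-n^2 &\le -\frac{1}{C_0} = -c_0
  && (n^2 \ge 1/C_0),\\
|\gamma'(\tau)|^2 &< -c_0 + c_1,
  && \text{which is negative provided } c_1 < c_0 .
\end{align*}
```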

On the other hand, we claim that the larger Euclidean cone At
t ∈ (−c1 r0 , 0).

Indeed, arguing by contradiction we suppose there exist a time $t_0 \in (-c_1 r_0, 0)$ and a point $q \in A^{t_0}_{\ge C_1 t_0}$ connected to $p$ by a causal curve $\gamma = \gamma(s)$ with $\gamma(0) = p$. After reparametrizing (in time) the curve if necessary, we can assume that $\gamma(\tau) = (\tau, x^j(\tau))$ for some $t_0' \le \tau \le 0$, as long as the point $\gamma(\tau)$ lies in the coordinate system under consideration. For this part of the curve at least we have
\[ 0 \ge |\gamma'|^2 = -n^2 + g_{ij} \frac{dx^i}{d\tau} \frac{dx^j}{d\tau}, \]
which by (6.5) implies that $\big( \frac{dx^1}{d\tau} \big)^2 + \ldots + \big( \frac{dx^n}{d\tau} \big)^2 < C_1 C_0$. Therefore, after integration we find
\[ \big( x^1(q)^2 + \cdots + x^n(q)^2 \big)^{1/2}(t_0') \le \sqrt{C_0 C_1}\, |t_0'| \le \sqrt{C_0 c_1}\, r_0 < r_0. \]


Hence, we can choose $t_0' = t_0$, the whole curve lies in our coordinate system, and is parametrized in the form $\gamma(\tau) = (\tau, x(\tau))$, ($\tau \in [t_0, 0]$). Moreover, we have $|x(t_0)| \le \sqrt{C_1 C_0}\, |t_0| < C_1 |t_0|$, which contradicts our assumption $q \in A^{t_0}_{\ge C_1 t_0}$. In conclusion, we have localized the slices of the past null cone within “annulus” regions:
\[ \mathcal{N}^-(p) \cap \Sigma_t \subset A^t_{[c_1 |t|, C_1 |t|]}, \qquad t \in [-c_1 r_0, 0]. \]

Step 2. The past null cone $\mathcal{N}^-(p)$ can be viewed as a graph with bounded slope. We now obtain a Lipschitz continuous parametrization of the null cone. For any fixed $q \in A^{-c_1 r_0}_{\le c_1^2 r_0}$ we consider the vertical curve passing through $q$:
\[ \gamma_q(\tau) = (\tau, x^1(q), \cdots, x^n(q)), \qquad \tau \in [-c_1 r_0, 0]. \]
By Step 1 we know that there exists $\tau_q$ such that $\gamma_q(\tau_q) \in \mathcal{N}^-(p)$. Moreover, $\tau_q$ is unique since $\mathcal{N}^-(p)$ is achronal, and this defines a map
\[ F : A^{-c_1 r_0}_{\le c_1^2 r_0} \to \mathcal{N}^-(p) \]
such that $F(q) = \gamma_q(\tau_q)$. It is obvious that $F(-c_1 r_0, 0) = p$. We claim that the map $F$ is Lipschitz continuous with Lipschitz constant less than $C_1$, as computed with the Euclidean metric $E$. Namely, by contradiction, suppose that $|F(q_1) - F(q_2)|_E > C_1 |q_1 - q_2|_E$ for some $q_1, q_2 \in A^{-c_1 r_0}_{\le c_1^2 r_0}$, then by (6.6) in Step 1, $F(q_1)$ would be chronologically related to $F(q_2)$ and this would contradict the fact that $\mathcal{N}^-(p)$ is achronal. Moreover, from Step 1 it follows that
\[ F\big( A^{-c_1 r_0}_{\le c_1^2 r_0} \big) \supset \mathcal{N}^-(p) \cap B_E(p, c_1^3 r_0). \]

Step 3. Constructing a homotopy of curves on the null cone. Suppose that $\gamma_1, \gamma_2$ are two (past) null geodesics from $p$ satisfying
\[ |\gamma_1'(0)|_T = |\gamma_2'(0)|_T = 1, \qquad \gamma_1'(0) \ne \gamma_2'(0), \qquad \gamma_1(s_1) = \gamma_2(s_2). \]
We claim that $\max(s_1, s_2) > c_1^6 r_0$, which will establish the desired injectivity bound by setting $i_0 = c_1^6 r_0$. We argue by contradiction and assume that $\max(s_1, s_2) < c_1^6 r_0$. Taking into account Assumption (A2) and applying exactly the same arguments as in Step 1 of Sect. 4 we see that the $g_T$-lengths of the curves $\gamma_1, \gamma_2$ satisfy
\[ L(\gamma_j, g_T) \le s_j e^{C C_1 s_j} \le c_1^{5 + 3/4} r_0 \qquad (j = 1, 2). \]
By Step 1 of the present proof we know that the Euclidean lengths of $\gamma_1, \gamma_2$ satisfy
\[ L(\gamma_j, g_E) \le c_1^{5 + 1/4} r_0 \qquad (j = 1, 2). \]
In particular, $\gamma_1, \gamma_2 \subset \mathcal{N}^-(p) \cap B_E(p, c_1^5 r_0)$ and we can thus concatenate the curves $\gamma_1, \gamma_2$ and obtain
\[ \gamma := \gamma_2^{-1} \cup \gamma_1 : [0, s_1 + s_2] \to \mathcal{N}^-(p) \cap B_E(p, c_1^5 r_0). \]

Since $F\big( A^{-c_1 r_0}_{\le c_1^2 r_0} \big) \supset \mathcal{N}^-(p) \cap B_E(p, c_1^3 r_0)$, there exists a smooth family of curves $\sigma_\varepsilon : [0, s_1 + s_2] \to \mathcal{N}^-(p)$ such that
\[ \sigma_1 = \gamma, \qquad \sigma_0 = p, \qquad \sigma_\varepsilon(0) = \sigma_\varepsilon(s_1 + s_2) = p, \qquad \varepsilon \in [0, 1]. \]
Specifically, we choose
\[ \sigma_\varepsilon(s) := F\big( \varepsilon F^{-1} \gamma(s) \big), \]
where the multiplication by $\varepsilon$ is defined by relying on the linear structure of $A^{-c_1 r_0}_{\le c_1^2 r_0} \approx B_{\mathbb{R}^n}(0, c_1^2 r_0)$. Equivalently, by setting $x^i(s) = x^i(\gamma(s))$ we have the explicit formula
\[ \sigma_\varepsilon(s) = F\big( -c_1 r_0, \varepsilon x^1(s), \cdots, \varepsilon x^n(s) \big). \]
It is clear that the Euclidean and $g_T$-lengths of $\sigma_\varepsilon$ satisfy
\[ L(\sigma_\varepsilon, g_E) \le \varepsilon (1 + C_1)\, L(\gamma, g_E) \le c_1^{4 + 1/8} r_0, \qquad L(\sigma_\varepsilon, g_T) \le c_1^{3 + 5/8} r_0. \]
By Assumption (A3′) on the null conjugate radius, we can lift to the null cone of the tangent space $T_p M$ the continuous family of loops $\sigma_\varepsilon$, and we obtain a continuous family of curves $\widetilde{\sigma}_\varepsilon$ defined on the interval $[0, s_1 + s_2]$ and such that
\[ \widetilde{\sigma}_\varepsilon(0) = 0, \qquad L(\widetilde{\sigma}_\varepsilon, g_{T,p}) \le c_1^3 r_0. \]
Observe that the property $L(\widetilde{\sigma}_\varepsilon, g_{T,p}) \le c_1^3 r_0 < r_0$ guarantees the existence of this continuous lift. By continuity, all of the curves $\widetilde{\sigma}_\varepsilon$ are loops containing $0$. As observed earlier (in the proof for the case of bounded curvature), $\widetilde{\sigma}_1$ consists of two distinct segments which, clearly, cannot form a closed loop. We have reached a contradiction and the proof of Theorem 6.1 is completed.

7. Injectivity Radius of an Observer in a Lorentzian Manifold

Main result. We are now in a position to discuss and prove Theorem 1.1 stated in the introduction. As we have seen in the proof given in the previous section, once the injectivity radius is controlled, one can construct a foliation satisfying certain “good” properties. On the other hand, the concept of injectivity radius is clearly independent of any prescribed foliation. As this is more natural, we will now present a general result which avoids assuming a priori the existence of a foliation. This will be achieved by relying on purely geometric and intrinsic quantities and constructing coordinates adapted to the geometry. Such a result is conceptually very important in the applications. The result and proof in this section should be viewed as a Lorentzian generalization of Cheeger, Gromov, and Taylor’s technique [9], originally developed for Riemannian manifolds.

Let $(M, g)$ be a differentiable $(n + 1)$-manifold endowed with a Lorentzian metric $g$, and consider a point $p \in M$ and a vector $T \in T_p M$ with $g_p(T, T) = -1$. That is, we now fix a single observer located at the point $p$. As explained in Sect. 2 the vector $T$ induces an inner product $g_T = \langle \cdot, \cdot \rangle_T$ on the tangent space $T_p M$. We assume that the exponential map $\exp_p$ is defined in some ball $B_T(0, r) \subset T_p M$ determined by this


inner product, which is of course always true in a sufficiently small ball. Controlling the geometry at the point $p$ precisely amounts to estimating the size of this radius $r$ where the exponential map is defined and has some good property. We restrict attention to the geodesic ball $B_T(p, r) := \exp_p(B_T(0, r))$; recall that these sets depend upon the vector $T$ given at $p$. As explained in the introduction, by $g$-parallel translating the vector $T$ at $p$ along a geodesic $\gamma$ from $p$, we can get a future-oriented unit time-like vector field $T_\gamma$ defined along this geodesic. To this vector field and the Lorentzian metric $g$ we can associate a reference Riemannian metric $g_{T_\gamma}$ along the geodesic. In turn, this allows us to compute the norm $|\mathrm{Rm}_g|_{T_\gamma}$ of the Riemann tensor along the geodesic. Of course, whenever two such geodesics $\gamma, \gamma'$ meet away from $p$, the corresponding vectors $T_\gamma$ and $T_{\gamma'}$ are generally distinct. If we consider the family of all such geodesics we therefore obtain a (generally) multi-valued vector field defined in the geodesic ball $B_T(p, r)$. We use the same letter $T$ to denote this vector field. In turn, we can compute the Riemann curvature norm $|\mathrm{Rm}_g|_T$ by taking into account every value of $T$.

The key objective of the present section is the study of the geometry of the local covering map $\exp_p : B_T(0, r) \to B_T(p, r)$ by comparing the Lorentzian metric $g$ defined on the manifold $M$ with the reference Riemannian metrics $g_T$. As we will see in the proof below, it will be convenient to pull the metric “upstairs” on the tangent space at $p$, using the exponential map. This will be possible once we have estimated the conjugate radius (in Step 1 of the proof below) and know that the exponential map is non-degenerate on $B_T(0, r)$. By pulling back the Lorentzian metric $g$ on $M$ by the exponential map we get a Lorentzian metric $g = \exp_p^* g$ defined in the tangent space, on the ball $B_T(0, r)$. We use the same letter $g$ to denote this metric.
The geometry in the tangent space is particularly simple, since the $g$-geodesics on $M$ passing through $p$ are radial straight lines in $B_T(0, r)$. Note that another (equivalent) standpoint could be adopted here by restricting attention to the domain within the cut-locus from the point $p$, and by imposing the curvature assumption within the cut-locus only. We now prove our main result stated in Theorem 1.1.

Proof of Theorem 1.1. After scaling we may assume that $r = 1$, and so we need to show
\[ \mathrm{Inj}_g(M, p, T) \ge c(n)\, \mathrm{Vol}_g(B_T(p, c(n))). \tag{7.1} \]

Step 1. Estimates for the metric $g_T$ and its covariant derivative. Let $E_0 = T, E_1, \cdots, E_n$ be an orthonormal frame in $T_p M$ for the Lorentzian inner product $g_p$, where $E_j$ are spacelike vectors. By $g$-parallel transporting this basis along a radial geodesic $\gamma = \gamma(r)$, satisfying $\gamma(0) = 0$, $|\gamma'(0)|_T = 1$, we get an orthonormal frame defined along the geodesic. We use the same letters $E_\alpha$ to denote these vector fields. Since $\frac{d}{dr} \langle E_\alpha, E_\beta \rangle_g = 0$, we infer that $|E_i|^2_T = |E_i|^2_g = 1$ along the geodesic. The same argument also implies
\[ |\gamma'(r)|^2_T = |\gamma'(0)|^2_T = 1, \tag{7.2} \]
and $\gamma'(r) = c^\alpha E_\alpha(r)$ with constant (in $r$) scalars $c^\alpha$ and $\sum |c^\alpha|^2 = |\gamma'(0)|_T = 1$. We used here that, by definition, $\gamma'$ is $g$-parallel transported along $\gamma$.


Let $V = a^\alpha(r) E_\alpha(r)$ be a Jacobi field along a radial geodesic $\gamma = \gamma(r)$, with $V(0) = 0$ and $|V'(0)|_T = 1$. Then, the Jacobi equation takes the form
\[ a_\alpha''(r) = -\langle E_\alpha, R(E_\beta, E_\gamma) E_\delta \rangle_T\, c^\beta c^\delta a^\gamma(r), \]
and since, by our curvature assumption (1.6),
\[ -2 \sum_\alpha \big( a_\alpha'^2 + a_\alpha^2 \big) \le \frac{d}{dr} \sum_\alpha \big( a_\alpha'^2 + a_\alpha^2 \big) \le 2 \sum_\alpha \big( a_\alpha'^2 + a_\alpha^2 \big), \]
we obtain $|V'(r)|_T \le e^r$ and thus $|V(r)|_T \le (e^r - 1)$. By substituting this result into the above formulas, the estimate can be improved again. Indeed, by computing and estimating the second-order derivative $\frac{d}{dr} \sum_\alpha a_\alpha' a_\alpha$ as we did for the Jacobi field estimate of Sect. 4, we can check that
\[ r - C(n)\, r^2 \le \Big( \sum_\alpha |a_\alpha(r)|^2 \Big)^{1/2} \le (e^r - 1) \qquad \text{along the geodesic}, \]
where $C(n)$ depends only on the dimension. Denote by $g_0$ and $g_{T,0}$ the Lorentzian and the Riemannian metrics at the origin $0$ (which are nothing but the metrics at the point $p$), and let $y^0, \ldots, y^n$ be Cartesian coordinates on $B_T(0, 1)$, with $\langle \frac{\partial}{\partial y^\alpha}, \frac{\partial}{\partial y^\beta} \rangle_{g_0}(0) = \eta_{\alpha\beta}$ (where $\eta_{\alpha\beta}$ is the Minkowski metric). Assuming that the radius under consideration is sufficiently small so that $(1 - C(n) |y|) < 1$, we conclude from the Jacobi field estimate that the exponential map is non-degenerate and that the inner products along the geodesic are comparable. In turn, since this is true for every radial geodesic, we can define the pull back of the metric to the tangent space and the conclusion holds in the whole ball $B_T(0, 1)$, that is,
\[ (1 - C(n) |y|)\, g_{T,0} \le g_{T,y} \le (1 + C(n) |y|)\, g_{T,0}, \qquad y \in B_T(0, 1). \tag{7.3} \]
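The first Jacobi field bounds used in this step, $|V'(r)|_T \le e^r$ and $|V(r)|_T \le e^r - 1$, follow from a Gronwall-type integration. The following is a sketch only; the shorthand $y(r)$ is introduced here for illustration, and the identification $|V'|_T^2 = \sum_\alpha a_\alpha'^2$ relies on the frame $E_\alpha$ being parallel and $g_T$-orthonormal along the geodesic, as described above.

```latex
% Assumed shorthand: y(r) = \sum_\alpha ( a_\alpha'(r)^2 + a_\alpha(r)^2 ).
\begin{align*}
y(0) &= \sum_\alpha a_\alpha'(0)^2 = |V'(0)|_T^2 = 1
  && (V(0) = 0),\\
y'(r) \le 2\,y(r) &\;\Longrightarrow\; y(r) \le e^{2r}
  \;\Longrightarrow\; |V'(r)|_T \le \sqrt{y(r)} \le e^{r},\\
|V(r)|_T &\le \int_0^r |V'(\rho)|_T\,d\rho \le \int_0^r e^{\rho}\,d\rho = e^{r} - 1 .
\end{align*}
```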

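We next try to compare the covariant derivative operators. By construction of the metric $g_T$ we have $\nabla_{g_T} - \nabla = \nabla T * T$ (schematically) with $\nabla T(0) = 0$, and so we need to control $\nabla T$. We write the radial vector field as
\[ \frac{\partial}{\partial r} = \frac{y^\alpha}{r} \frac{\partial}{\partial y^\alpha}, \qquad r^2 := \sum_\alpha |y^\alpha|^2, \]
with $\big| \frac{\partial}{\partial r} \big|^2_T \equiv 1$ (as stated already in (7.2)). Using that $|\nabla T|^2_T = \nabla_\alpha T^\xi\, \nabla_\beta T^\eta\, g_{T,\xi\eta}\, g_T^{\alpha\beta}$ and computing the derivative of $|\nabla T|^2_T$ along any radial geodesic, we find
\[ \frac{d}{dr} |\nabla T|^2_T \le C(n)\, |\nabla T|^3_T + 2 \langle \nabla_{\frac{\partial}{\partial r}} \nabla T, \nabla T \rangle_T. \]
By using that
\[ \nabla_{\frac{\partial}{\partial r}} T = 0, \qquad \Big[ \frac{\partial}{\partial r}, \frac{\partial}{\partial y^\alpha} \Big] = -\frac{1}{r} \frac{\partial}{\partial y^\alpha} + \frac{y^\alpha}{r^2} \frac{\partial}{\partial r}, \]
we obtain
\[ \nabla_{\frac{\partial}{\partial r}} \nabla_{\frac{\partial}{\partial y^\alpha}} T^\gamma = -\frac{1}{r} \nabla_{\frac{\partial}{\partial y^\alpha}} T^\gamma + R\Big( \frac{\partial}{\partial r}, \frac{\partial}{\partial y^\alpha} \Big) T^\gamma \]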


and, therefore, thanks to the curvature assumption we find
\[ \frac{d}{dr} |\nabla T|^2_T \le -\frac{2}{r} |\nabla T|^2_T + C(n)\, |\nabla T|^3_T + C(n)\, |\nabla T|_T. \]
This implies a uniform bound for the covariant derivative of $T$,
\[ |\nabla T|_T(y) \le C(n)\, |y|, \qquad |y| \le 1/C(n), \tag{7.4} \]
which also provides a bound for the difference $\nabla_{g_T} - \nabla$.

Step 2. Estimate of the injectivity radius of $g$ on $B_T(0, c(n))$. Since the curvature on $B_T(0, 1)$ is bounded and $|\nabla_{g_T} - \nabla|^2_T \le C(n) = 1/c(n)$ on the ball $B_T(0, c(n))$ we can follow the argument in Sect. 4 and obtain a uniform lower bound on the conjugate radius at any point of the ball $B_T(0, 3c(n)/4)$. Next, given any point $y \in B_T(0, c(n)/2)$, let $\gamma_1$ and $\gamma_2$ be two geodesics which meet at their end points and have “short” length with respect to the metric $g_T$ (or, equivalently, $g_{T,0}$). By using the linear structure on $B_T(0, 1)$ (as a subset of the vector space $T_p M$) we can construct a homotopy of the loop $\gamma_1 \cup \gamma_2^{-1}$ to the origin, such that each curve has also “short” length for the metric $g_T$. By lifting the homotopy to the tangent space $T_y B_T(0, 1)$ and by relying on the conjugate radius bound, we reach a contradiction as was done in Sect. 4. In summary, there exists a universal constant $C(n) = 1/c(n)$ (depending only on the manifold dimension) such that the injectivity radius at each point $y$ of $B_T(0, c(n))$ is bounded from below by $4c(n)$. Moreover, using again a Jacobi field estimate we can check that the ball $B_{T,p}(0, c(n)) \subset T_p M$ defined by the Euclidean metric $g_{T,p}$ is covered by $\exp_y(B_{T,y}(0, 3c(n)))$, where $B_{T,y}(0, 3c(n)) \subset T_y T_p M$ is a ball of radius $3c(n)$ defined by the metric $g_{T,y}$ and, moreover, any two points in $B_{T,p}(0, c(n))$ can be connected by a $g$-geodesic totally contained in $B_{T,p}(0, 2c(n))$. Further arguments are now required to arrive at the desired bound (7.1).

Step 3. New Riemannian metric $g_N$ induced on $B_T(0, 2c(n))$. Consider a geodesic $\gamma$ satisfying $\gamma(0) = 0$ and $\gamma'(0) = -T$, and define
\[ \gamma(c(n)/2) =: q, \qquad \tau := d_g(\cdot, q) - d_g(q, 0). \]
Then, by following exactly the same arguments as in the main proof of Sect. 5, we construct a normal coordinate system (of definite size) such that
\[ g = -d\tau^2 + g_{ij}\, dx^i dx^j, \qquad g_N = d\tau^2 + g_{ij}\, dx^i dx^j, \]
and the Riemannian metric satisfies the following properties: (i) $(1 - c(n))\, g_N \le g_T \le (1 + c(n))\, g_N$, (ii) $g_N$ has bounded curvature ($\le C(n)$) (see (5.4)), and (iii) for any fixed $y_0 \in B_T(0, c(n))$ the distance function $d_{g_N}(y_0, \cdot)^2$ is strictly $g$-convex on the ball $B_T(0, 2c(n))$ and, more precisely, for any $y_0 \in B_T(0, c(n))$,
\[ (2 + c(n))\, g_N \ge \nabla_g^2\, d^2_{g_N}(y_0, \cdot) \ge (2 - c(n))\, g_N \qquad \text{on } B_T(0, 2c(n)). \]
Recall that the Hessian of the distance function (defined by the Riemannian metric $g_N$) is computed with the covariant derivative operator associated with the Lorentzian metric.


Step 4. Suppose that $p_1, \cdots, p_N$ are distinct pre-images of $p$ in the ball $B_T(0, c(n))$. We claim that any $p' \in B_T(p, c(n))$ has at least $N$ distinct pre-images in $B_T(0, 1)$, and refer to this property as a “lower semi-continuity” property. Generalizing the terminology in [9], we use the notation $a \underset{(g_{T,0},A)}{\sim} b$ when two curves $a, b$ defined on $M$ and with the same endpoints are homotopic through a family of curves whose lifts have $g_{T,0}$-lengths $\le A$. Relying on the lift and the linear structure, we see that, for any curve $\xi$ starting from $p$ with (after lifting through 0) $g_{T,0}$-length $A \le 1$, there exists a unique $g$-geodesic $\gamma_\xi$ (with the same end points as $\xi$) defined on $M$ such that $\xi \underset{(g_{T,0},A)}{\sim} \gamma_\xi$. This fact establishes a one-to-one correspondence between the following three concepts: (i) equivalence class of curves through $p$ with $g_{T,0}$-lengths $\le 3c(n)$, (ii) radial geodesic segments of $g_{T,0}$-lengths $\le 3c(n)$, and (iii) points in the ball $B_T(0, 3c(n)) \subset T_pM$.

Let $\sigma$ be a $g$-geodesic connecting $p$ to $p'$ in $B_T(p, c(n))$. Observe that the images of the lines $Op_i$ by the exponential map, $\sigma_i = \exp_p(Op_i)$, are distinct geodesic loops through $p$. Denote by $\widetilde\sigma_i$ the lift of $\sigma_i \cup \sigma$ through 0, and denote by $p'_i$ the end point of $\widetilde\sigma_i$. Then, it is clear that all the points $p'_i$ ($i = 1, \cdots, N$) are pre-images of $p'$ in $B_T(0, 1/2)$. We claim that they are distinct. Indeed, assuming that $p'_i = p'_j$ for some $i \ne j$, we would find

\[
\sigma \cup \sigma_i \underset{g_{T,0},\,2c(n)}{\sim} \sigma \cup \sigma_j,
\]

which gives

\[
\sigma_i \underset{g_{T,0},\,3c(n)}{\sim} \sigma^{-1} \cup \sigma \cup \sigma_i \underset{g_{T,0},\,3c(n)}{\sim} \sigma^{-1} \cup \sigma \cup \sigma_j \underset{g_{T,0},\,3c(n)}{\sim} \sigma_j .
\]

This would imply $\sigma_i \underset{g_{T,0},\,3c(n)}{\sim} \sigma_j$ and, therefore, $p_i = p_j$, which is a contradiction. In short, this argument shows that the “cancellation law” holds for the homotopy class of “not too long” curves.

Step 5. Suppose that there exist two distinct $g$-geodesics $\gamma_1 : [0, l_1] \to M$ and $\gamma_2 : [0, l_2] \to M$ satisfying

\[
\gamma_1(0) = \gamma_2(0) = p, \qquad |\gamma_1'(0)|_T^2 = |\gamma_2'(0)|_T^2 = 1,
\]

and meeting at their endpoints, that is, $\gamma_1(l_1) = \gamma_2(l_2)$. Then, let $l := l_1 + l_2$ and $\gamma := \gamma_2^{-1} \cup \gamma_1 : [0, l] \to M$. Our aim is to prove that $l \ge c(n)\, \mathrm{Vol}_g(B_T(p, c(n)))$, which will give us the desired injectivity radius. From the loop $\gamma$ we define a map $\pi_\gamma : B_T(0, c(n)) \to B_T(0, 2c(n))$ as follows: for any $y \in B_T(0, c(n))$, the point $\pi_\gamma(y)$ is the end point of the lift of $\exp_p(Oy) \cup \gamma$ (through the origin). If one had $\pi_\gamma(y) = y$ then, by the cancellation law established in Step 4, we would have $\gamma \underset{g_{T,0},\,2c(n)}{\sim} 0$, which is a contradiction. So, the map $\pi_\gamma$ has no fixed point. Without loss of generality, we assume that $l \le c(n)^5$. Let $N = [c(n)^3/l]$ be the largest integer less than $c(n)^3/l$, and let us use the notation $2\gamma = \gamma \circ \gamma$, etc.

Claim. The classes $[\gamma], [2\gamma], \cdots, [N\gamma]$ are distinct homotopy classes for the relation $\underset{g_{T,0},\,c(n)^2}{\sim}$.


If this were not true, then by the cancellation law we would have $[j\gamma] \underset{g_{T,0},\,c(n)^2}{\sim} 0$ for some $1 \le j \le N$. We already know that all $\pi_\gamma^i$ are defined from $B_T(0, c(n)^2)$ to $B_T(0, c(n))$ for $i \le j$. Since for any $y \in B_T(0, c(n)^2)$ we have

\[
\exp_p(Oy) \cup j\gamma \underset{g_{T,0},\,c(n)}{\sim} \exp_p(Oy),
\]

it follows that $\pi_\gamma^j = \mathrm{id}$. We use here the notation $\pi_\gamma^2 := \pi_\gamma \circ \pi_\gamma$, etc. Then, we define a function $u : B_T(0, c(n)) \to \mathbb{R}$ by

\[
u(y) = d_{g_N}^2(0, y) + d_{g_N}^2(0, \pi_\gamma y) + \cdots + d_{g_N}^2(0, \pi_\gamma^{j-1} y).
\]

Since $\pi_\gamma^j = \mathrm{id}$, it is easy to see that $u(\pi_\gamma y) = u(y)$ for any $y \in B_T(0, c(n))$. That is to say, $u$ is $\pi_\gamma$-invariant. By Step 3, $u$ is strictly $g$-geodesically convex on $B_T(0, c(n))$. More precisely, for any $g$-geodesic $\xi : [0, s_0] \to B_T(0, c(n))$, the curves $\pi_\gamma^i \xi$ are still $g$-geodesics in $B_T(0, c(n))$, and

\[
\frac{d^2}{ds^2} u(\xi(s)) = \nabla^2 d_{g_N}^2(0, \cdot)\big(\xi'(s), \xi'(s)\big) + \ldots + \nabla^2 d_{g_N}^2(0, \cdot)\big(d\pi_\gamma^{j-1}(\xi'(s)), d\pi_\gamma^{j-1}(\xi'(s))\big)\Big|_{\pi_\gamma^{j-1}\xi(s)} \ge g_N(\xi'(s), \xi'(s)) > 0.
\]

Observe that

\[
u\big|_{B_T(0, c(n))^c} \;\ge\; j\left((1 - c(n))\, c(n) - \frac{c(n)^3}{l}\, 2l\right)^2 \;\ge\; \frac{j\, c(n)^2}{2},
\]

and

\[
u(0) \le j\,(jl)^2 \le j\, c(n)^5 < \frac{j\, c(n)^2}{2},
\]

so the minimum of the function $u$ over $B_T(0, c(n))$ is only achieved at an interior point, say $y_0 \in B_T(0, c(n))$. Then by $\pi_\gamma$-invariance of $u$, we have $u(\pi_\gamma y_0) = u(y_0) < j\,c(n)^2/2$, and this implies $\pi_\gamma(y_0) \in B_T(0, c(n))$. By the injectivity radius estimate at $y_0 \in (T_pM, g)$, there exists a $g$-geodesic connecting $y_0$ to $\pi_\gamma(y_0)$, which is contained in $B_{T,p}(0, 2c(n))$. By using the strong $g$-geodesic convexity of $u$, we conclude that $\pi_\gamma y_0 = y_0$. This contradicts the fact that $\pi_\gamma$ has no fixed point, and the claim is proved.

Step 6. The pull back of the volume element of $g$ is the same as the one of $g_T$. By combining this observation with our results in Steps 4 and 5 we find

\[
\mathrm{Vol}_{g_T}(B_T(0, 1)) \ge \frac{c(n)^3}{l}\, \mathrm{Vol}_g(B_T(p, c(n))),
\]

which implies

\[
l \ge c(n)\, \frac{\mathrm{Vol}_g(B_T(p, c(n)))}{\mathrm{Vol}_{g_T}(B_T(0, 1))} \ge c(n)\, \mathrm{Vol}_g(B_T(p, c(n)))
\]

and completes the proof of Theorem 1.1.


8. Volume Comparison for Future or Past Cones

In Riemannian geometry, Bishop-Gromov’s volume comparison theorem assumes a lower bound on the Ricci curvature and compares the volume of small and large balls in a sharp and qualitative manner. Our aim in this section is to provide an extension to cones in a Lorentzian manifold, and to use this result to refine our main injectivity radius estimate. For definiteness we state the result for future cones.

Theorem 8.1. (Volume comparison theorem for cones). Let $(M, g)$ be a globally hyperbolic, Lorentzian $(n + 1)$-manifold. Fix $p \in M$ and a vector $T \in T_pM$ with $g_p(T, T) = -1$, and suppose that the exponential map $\exp_p$ is defined on the ball $B_T(0, r_0) \subset T_pM$ (determined by the reference inner product $g_T$ at $p$). Suppose also that the Ricci curvature on $B_T(p, r_0)$ satisfies for some $K$ the inequality

\[
\mathrm{Ric}_g(V, V) \ge -nK\, |g(V, V)| \qquad \text{for all time-like vector fields } V.
\]

Then, by setting $FC(p, r) := \exp_p(\widetilde{FC}(p, r))$ and

\[
\widetilde{FC}(p, r) := \big\{ V \in T_pM \,:\, 0 < g_{T,p}(V, V) < r^2,\ g_p(V, V) < 0,\ g_p(T, V) < 0 \big\},
\]

for any $0 < r < s < r_0$ the following inequality holds

\[
\frac{\mathrm{Vol}_g(FC(p, r))}{\mathrm{Vol}_g(FC(p, s))} \ge \frac{\mathrm{Vol}_K(B_K(r))}{\mathrm{Vol}_K(B_K(s))},
\]

where $\mathrm{Vol}_K(B_K(r))$ is the volume of the ball $B_K(r) \subset M_K$ in the simply-connected Lorentzian $(n + 1)$-manifold $(M_K, g_K)$ with constant curvature $K$ (that is, with curvature tensor $R_{K,\alpha\beta\gamma\delta} = -K\,(g_{K,\alpha\gamma}\, g_{K,\beta\delta} - g_{K,\alpha\delta}\, g_{K,\beta\gamma})$).

For the proof of this result we return to the technique in Step 2 of Sect. 5, where we introduced the index form associated with a synchronous coordinate system based on time-like geodesics. By observing that the index form is symmetric and that Jacobi fields “minimize” the index form, we can extend the method of proof of the index comparison theorem. However, in a general Lorentzian manifold, since the index form we needed (without imposing a restriction on the geodesics) is non-symmetric, we need to adapt the method of the index comparison theorem.

More generally, the same proof allows us to establish that, if $\Sigma$ is a subset of the unit sphere $S^n$ such that $g(V, V) < 0$ and $g(T, V) < 0$ for all $V \in \Sigma$, then by setting $FC_\Sigma(p, r) := \exp_p(\widetilde{FC}_\Sigma(p, r))$ and

\[
\widetilde{FC}_\Sigma(p, r) := \Big\{ V \in \widetilde{FC}(p, r) \,:\, \frac{V}{|V|_{g_T}} \in \Sigma \Big\},
\]

the following inequality holds

\[
\frac{\mathrm{Vol}_g(FC_\Sigma(p, r))}{\mathrm{Vol}_g(FC_\Sigma(p, s))} \ge \frac{\mathrm{Vol}_K(B_K(r))}{\mathrm{Vol}_K(B_K(s))}.
\]

From Theorem 8.1 it follows:


Corollary 8.2. (Injectivity radius based on the volume of a future cone). Let $M$ be a manifold satisfying the assumptions in Theorem 1.1 and assumed to be globally hyperbolic, and let $T \in T_pM$ be a reference vector. Let $\Sigma$ be a subset in the unit sphere $S^n$ included in the future cone $N_p^+$. If $\mathrm{Vol}_g(FC_\Sigma(p, r)) \ge v_0 > 0$, then the inequality

\[
\frac{\mathrm{Inj}_g(M, p, T)}{r} \ge c(\Sigma)\, \frac{v_0}{r^{n+1}}
\]

holds, where $FC_\Sigma(p, r) := \exp_p(\widetilde{FC}_{\Sigma,p}(r))$ with

\[
\widetilde{FC}_{\Sigma,p}(r) := \Big\{ 0 < |V|_T < r,\ \langle T, V\rangle_T < 0,\ |V|_g^2 < 0,\ \frac{V}{|V|_T} \in \Sigma \Big\},
\]

and the constant $c(\Sigma)$ depends only on the distance (measured by $T$) of $\Sigma$ to the null cone.

Proof of Theorem 8.1. Given a future-oriented time-like geodesic $\gamma : [0, s_0] \to M$ satisfying $\gamma(0) = p$ and $|\gamma'(0)|_g^2 = -1$, let us compute the rate of change of the volume element along $\gamma$. For simplicity in the derivation, we fix $s_1 \in (0, s_0)$ sufficiently small so that every point in the interval $(0, s_1]$ is neither a conjugate point nor a cut point with respect to the base point $p$. Let $v_0 = \gamma'(s_1), v_1, v_2, \cdots, v_n$ be an arbitrary orthonormal basis prescribed at the final point $\gamma(s_1)$ (with respect to the Lorentzian metric $g_{\gamma(s_1)}$). Let $J_\alpha$ be the Jacobi field defined on the interval $[0, s_1]$ and satisfying the two conditions $J_\alpha(0) = 0$ and $J_\alpha(s_1) = v_\alpha$ at the point $\gamma(s_1)$. Clearly, the vector field $J_0$ coincides (up to rescaling) with the tangent vector along the geodesic:

\[
J_0(s) = \frac{s}{s_1}\, \gamma'(s),
\]

while the vectors $J_i$ and $\nabla_{\gamma'} J_i$ are orthogonal to the geodesic. Recall that the Jacobian of the exponential map $\varphi(s) := \mathrm{Jac}(D\exp_{\gamma(s)})$ along the geodesic is given by the formula

\[
\varphi(s)^2 = \frac{|\gamma'(s) \wedge J_1(s) \wedge \cdots \wedge J_n(s)|_g^2}{s^{2n}\, |\gamma'(0) \wedge J_1'(0) \wedge \cdots \wedge J_n'(0)|_g^2}.
\]

We will also use below the function $\varphi_K(s)$ defined as the corresponding quantity in the simply connected Lorentzian $(n + 1)$-manifold with constant curvature $-K$. Along the geodesic we can consider the index form

\[
I_s(X, Y) := \int_0^s \Big( \langle \nabla_{\gamma'} X, \nabla_{\gamma'} Y \rangle_g - \mathrm{Rm}_g(\gamma', X, \gamma', Y) \Big)\, ds,
\]

where $X, Y$ are arbitrary vector fields and $\mathrm{Rm}_g(\gamma', X, \gamma', Y) := -\langle \mathrm{Rm}_g(\gamma', X)\gamma', Y \rangle_g$. Observe that the index form is symmetric in its arguments $X, Y$; moreover, using the fact that $J_i$ and $\nabla_{\gamma'} J_i$ are orthogonal to the geodesic we can express the derivative of the Jacobian of the exponential map in terms of the index form evaluated on Jacobi fields, as follows:

\[
\frac{d}{ds} \log \varphi^2(s_1) = \sum_i \langle J_i'(s_1), J_i(s_1) \rangle_g - \frac{2n}{s_1} = \sum_i I_{s_1}(J_i, J_i) - \frac{2n}{s_1}.
\]


Recall also that since, by assumption, there are no conjugate points along $\gamma$, the Jacobi field minimizes the index form among all vector fields with fixed boundary values. This is the same property as in Riemannian geometry, which follows from the fact that a time-like geodesic without conjugate points has a locally maximizing length among all nearby time-like curves with the same end points. Finally, let $E_i(s)$ be the vector field obtained by parallel transporting $v_i$ (prescribed at the end-point $\gamma(s_1)$) along $\gamma$. Setting

\[
\widetilde J_i(s) = \frac{\sinh s}{\sinh s_1}\, E_i(s),
\]

for which we already know that $I_{s_1}(J_i, J_i) \le I_{s_1}(\widetilde J_i, \widetilde J_i)$, we can compute that

\[
\frac{d}{ds} \log \frac{\varphi^2}{\varphi_K^2}(s_1) \le -\int_0^{s_1} \frac{(\sinh s)^2}{(\sinh s_1)^2} \sum_i \Big( \mathrm{Rm}_g(\gamma', E_i, \gamma', E_i) - K \Big)\, ds = -\int_0^{s_1} \frac{(\sinh s)^2}{(\sinh s_1)^2} \Big( \mathrm{Ric}_g(\gamma', \gamma') - nK \Big)\, ds, \tag{8.1}
\]

which is non-positive. Hence, the function $\varphi(s)/\varphi_K(s)$ is non-increasing.

To conclude we need an observation due to Gromov in the Riemannian setting, which we now extend to globally hyperbolic Lorentzian manifolds. Let $A$ be the star-shaped domain (with respect to 0) in $T_pM$, such that $\exp_p$ restricted to $A \cap B_T(0, r_0)$ is a diffeomorphism onto its image and the image of $\partial A \cap B_T(0, r_0)$ is the cut locus (in $B_T(p, r_0)$). Let $\chi_A$ be the characteristic function of the set $A$. Since $\varphi/\varphi_K$ is non-increasing, we see that $\chi_A \varphi/\varphi_K$ is also non-increasing. Hence, we have two functions defined on the ball $B_T(0, r_0)$, whose quotient is non-increasing along any radial geodesics. Since, by assumption, the manifold $M$ is globally hyperbolic, any point in the set $FC(p, r_0)$ can be connected to $p$ by a maximizing time-like geodesic. This implies that the integration of the function $\chi_A \varphi$ over the ball $B_T(0, s)$ coincides with the volume $\mathrm{Vol}_g(B_T(p, s))$ of the geodesic ball. In conclusion, by integrating $\chi_A \varphi$ and $\varphi_K$ over the ball $B_T(0, s)$ and after a simple calculation we obtain that $\mathrm{Vol}_g(FC(p, s))/\mathrm{Vol}_K(B_K(s))$ is non-increasing in $s$. The proof of the theorem is completed.

Proof of Corollary 8.2. Observe that there is a constant $C(\Sigma)$ depending only on the distance of $\Sigma$ to the null cone such that $\mathrm{Ric}(\gamma', \gamma') \ge -C(\Sigma)\, g(\gamma', \gamma')$ for any timelike geodesic $\gamma$ satisfying $\gamma'(0) \in \Sigma$. From the volume comparison theorem for future cones (Theorem 8.1) we deduce

\[
\frac{\mathrm{Vol}_g(FC_\Sigma(p, c(n) r))}{\mathrm{Vol}_g(FC_\Sigma(p, r))} \ge C(\Sigma),
\]

and, by combining this result with Theorem 1.1, the corollary follows.

9. Final Remarks

Regularity of Lorentzian metrics. Following the strategy proposed in the present paper, we can also transfer to the Lorentzian metric the regularity available on any reference Riemannian metric. Clearly, the regularity obtained in this manner depends on the choice of the reference metric. The interest of the approach we now describe is to provide a


simple proof of a regularity result: we use harmonic-like coordinates for the Riemannian metric that we constructed in the proof of our main theorem and we see immediately that the Lorentzian metric has uniformly bounded first-order derivatives. For a discussion of the optimal regularity achievable with Lorentzian metrics we refer to Anderson [3].

Proposition 9.1. (Regularity in harmonic-like coordinates). Under the assumptions and notation of Theorem 1.1, define

\[
r_1 := c(n)\, \frac{\mathrm{Vol}_g(B_T(p, c(n)\, r))}{r^{n+1}}\, r,
\]

where $c(n)$ is the constant determined therein. Then, for any $\varepsilon > 0$ there exist a constant $c_1(n, \varepsilon)$ with $\lim_{\varepsilon \to 0} c_1(n, \varepsilon) = 0$ and a coordinate system $(x^\alpha)$ satisfying $x^\alpha(p) = 0$ and defined for all $(x^0)^2 + (x^1)^2 + \ldots + (x^n)^2 < (1 - \varepsilon)^2 r_1^2$, such that in the $L^\infty$ sup-norm

\[
|g_{\alpha\beta} - \eta_{\alpha\beta}| + r_1\, |\partial g_{\alpha\beta}| \le c_1(n, \varepsilon), \tag{9.1}
\]

where ηαβ is the Minkowski metric in these coordinates. Proof. By scaling we may assume r1 = 1. By Step 1 in the proof of Theorem 1.1, we know that the Riemannian metric gT is equivalent to the Riemannian metric gT, p on the ball BT (0, 4c1 (n)) for some c1 (n) > 0. By considering a lift and using again the results in Step 1 this implies BT ( p, c1 (n)) ⊂ BT (q, 3c1 (n)) q ∈ BT ( p, c1 (n)). Applying the same argument as in Theorem 1.1, we deduce that the injectivity radius of any point in BT ( p, c1 (n)) is bounded from below by c1 (n). As in Step 3 in the proof of Theorem 1.1 (or in Step 2 of Sect. 5), we see that there exists a synchronous coordinate system (y α ) = (τ, y j ) of definite size around p such that the metrics g = −dτ 2 + gi j dy i dy j and g N = dτ 2 + gi j dy i dy j (the Riemannian metric constructed therein) satisfy the following properties on the geodesic ball BT ( p, c1 (n)): (a) (1 − c1 (n)) g N ≤ gT ≤ (1 + c1 (n)) g N , (b) g N has bounded curvature (≤ 1/c1 (n)), (c) |τ | + |τ1| + |∇ 2 τ | N ≤ 1/c1 (n). (In particular, this implies |∇g N g| N < 1/c1 (n).) Since the volume Volg N (BT ( p, c1 (n))) is bounded from below, it follows from [9] that the injectivity radius of g N at p is bounded from below by c1 (n). By the theorem in [15] on the existence of harmonic coordinates, for any small ε > 0 there exists a harmonic coordinate system (x α ) with respect to the Riemannian metric g N such that α |x α |2 < (1 − ε)2 and for every 0 < γ < 1, |g N ,αβ − δαβ | < c1 (n, ε),

|∂g N | < 1/c1 (n),

|∂g N |Cγ < 1/c(n, ε, γ ).

∂ |gT, p < In the construction of harmonic coordinates, we may also assume that | ∂∂y0 − ∂τ c1 (n, ε). Since |∇g N g| N < 1/c1 (n) and that, in these coordinates, |∇g N | ≤ 1/c1 (n), we have |∂g| < 1/c1 (n). Finally, to estimate the metric we write |gαβ − ηαβ | p < c1 (n, ε) and 1 ε + c1 (n, ε). The proof is |∂g| < 1/c1 (n) and we conclude that |gαβ − ηαβ | < C(n) completed.


Pseudo-Riemannian manifolds. Finally, we would like to discuss the more general situation of pseudo-Riemannian manifolds (M, g) (also referred to as semi-Riemannian manifolds). Consider a differentiable manifold M endowed with a symmetric, nondegenerate covariant 2-tensor g. We assume that the signature of g is (n 1 , n 2 ), that is, n 1 negative signs and n 2 positive signs. Riemannian and Lorentzian manifolds are obviously special cases of pseudo-Riemannian manifolds. Fix p ∈ M and an orthonormal family T consisting of n 1 vectors E 1 , E 2 , · · · , E n 1 ∈ T p M such that E i , E j g = −δi j . Based on this family, we can define a reference inner product gT on T p M by generalizing our construction in the Lorentzian case, and by using this inner product we can then define the ball BT (0, s) ⊂ T p M. By parallel translating E 1 , E 2 , · · · , E n 1 along radial geodesics from the origin in T p M, we obtain vector fields E 1 , E 2 , · · · , E n 1 defined in the tangent space (or multi-valued vector fields on the manifold). This also induces a (multi-valued) Riemannian metric gT as was explained before. The following corollary immediately follows by repeating the proof of Theorem 1.1. We note that the curvature covariant derivative bound imposed below is in fact superfluous and could be removed by introducing a foliation based on certain synchronous-type coordinates, as we did in Sect. 5 for Lorentzian manifolds. On the other hand, to the best of our knowledge this is the first injectivity radius estimate for pseudo-Riemannian manifolds. Corollary 9.2. (Injectivity radius of pseudo-Riemannian manifolds). Let (M, g) be a differentiable pseudo-Riemannian n-manifold with signature (n 1 , n 2 ), and let p ∈ M and T = (E 1 , · · · , E n 1 ) be a family of vectors in T p M satisfying g(E i , E j ) = −δi j . Suppose that the exponential map exp p is defined on BT (0, r ) ⊂ T p M and that |Rmg |T ≤ r −2 , |∇Rmg |T ≤ r −3

on BT (0, r ).

Then, there exists a positive constant $c(n)$ such that

\[
\frac{\mathrm{Inj}_g(M, p, T)}{r} \ge c(n)\, \frac{\mathrm{Vol}_g(B_T(p, c(n)\, r))}{r^n},
\]

where $B_T(p, r) = \exp_p(B_T(0, r))$ is the geodesic ball at $p$ with radius $r$.

Proof. Without loss of generality we assume $r = 1$. In a local coordinate system $y^\alpha$, let

\[
E_i =: E_i^\beta\, \frac{\partial}{\partial y^\beta}, \qquad E_{i\alpha} = E_i^\beta\, g_{\alpha\beta}, \qquad i = 1, \ldots, n_1,
\]

then $g_{T,\alpha\beta} = g_{\alpha\beta} + 2 \sum_{i=1}^{n_1} E_{i\alpha} E_{i\beta}$. By the same computations as in the proof of Theorem 1.1 we obtain

\[
|\nabla E_i|_T \le \frac{1}{c(n)}, \qquad |g_T - g_{T,p}| + |g - \eta| < c(n) \qquad \text{on the ball } B_T(0, c(n)),
\]

where $\eta_{\alpha\beta} := \mp\delta_{\alpha\beta}$ (a minus sign for $\alpha \le n_1$, and a plus sign for $\alpha > n_1$). In view of the computations in [12] (Theorem 4.11 and Corollary 4.12) we deduce that $|\partial g| < r/c(n)$, where $r^2 = (y^1)^2 + \cdots + (y^n)^2$. Since $d_{g_{T,p}}^2(y_0, y) = |y - y_0|^2$, we have for any point $y_0 \in B_T(0, c(n))$,

\[
\nabla^2_{\alpha\beta}\, d_{g_{T,p}}^2(y_0, \cdot) \ge \delta_{\alpha\beta} = g_{T,p} \qquad \text{on the ball } B_T(0, c(n)).
\]

Since the metric $g_{T,p}$ plays the same role as $g_N$ (cf. the proof of Theorem 1.1), all arguments can be carried out and this completes the proof of the corollary.
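The reference metric $g_{T,\alpha\beta} = g_{\alpha\beta} + 2\sum_i E_{i\alpha}E_{i\beta}$ used in the proof above can be checked numerically at a single point. The following is only a minimal sketch under stated assumptions: the metric is taken to be flat with signature $(n_1, n_2)$, and the helper names `flat_metric` and `reference_metric` are hypothetical, introduced here for illustration. With the standard frame the result is the Euclidean metric; with a boosted time-like vector in signature $(1,1)$ the result is still positive definite but depends on the choice of $T$.

```python
import math

def flat_metric(n1, n2):
    """Diagonal metric with n1 entries -1 followed by n2 entries +1 (illustrative flat model)."""
    n = n1 + n2
    return [[(-1.0 if a < n1 else 1.0) if a == b else 0.0 for b in range(n)]
            for a in range(n)]

def reference_metric(g, frame):
    """Return g_T = g + 2 * sum_i E_i^flat (x) E_i^flat, where (E_i^flat)_a = g_ab E_i^b."""
    n = len(g)
    g_t = [row[:] for row in g]
    for e in frame:
        lowered = [sum(g[a][b] * e[b] for b in range(n)) for a in range(n)]
        for a in range(n):
            for b in range(n):
                g_t[a][b] += 2.0 * lowered[a] * lowered[b]
    return g_t

# Signature (2, 3) with the standard negative-definite frame: g_T is the identity.
eta = flat_metric(2, 3)
standard_frame = [[1.0 if b == i else 0.0 for b in range(5)] for i in range(2)]
g_t_standard = reference_metric(eta, standard_frame)

# Signature (1, 1) with a boosted unit time-like vector T = (cosh b, sinh b).
boost = 0.7
g_t_boosted = reference_metric(flat_metric(1, 1),
                               [[math.cosh(boost), math.sinh(boost)]])
det_boosted = (g_t_boosted[0][0] * g_t_boosted[1][1]
               - g_t_boosted[0][1] * g_t_boosted[1][0])
```

In the boosted case the entries are $\cosh 2b$ on the diagonal and $-\sinh 2b$ off it, so the determinant stays equal to 1 while the metric itself changes with $b$.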


Acknowledgments. The first author (BLC) was partially supported by Sun Yat-Sen University via a research grant between China, France, and Russia (No. 34000-3275100), the Ecole Normale Supérieure de Paris, the French Ministry of Foreign Affairs, and the Institut des Hautes Études Scientifiques (IHÉS, Bures-sur-Yvette). The second author (PLF) was partially supported by the A.N.R. (Agence Nationale de la Recherche) through the grant 06-2-134423 entitled “Mathematical Methods in General Relativity” (MATH-GR), and by the Centre National de la Recherche Scientifique (CNRS).

References

1. Anderson, M.T.: Convergence and rigidity of manifolds under Ricci curvature bounds. Invent. Math. 102, 429–445 (1990)
2. Anderson, M.T.: On long-time evolution in general relativity and geometrization of 3-manifolds. Commun. Math. Phys. 222, 533–567 (2001)
3. Anderson, M.T.: Regularity for Lorentz metrics under curvature bounds. J. Math. Phys. 44, 2994–3012 (2003)
4. Besse, A.: Einstein manifolds. Ergebnisse Math. Series 3, Berlin-Heidelberg-New York: Springer Verlag, 1987
5. Cheeger, J.: Finiteness theorems for Riemannian manifolds. Amer. J. Math. 92, 61–94 (1970)
6. Cheeger, J., Ebin, D.: Comparison theorems in Riemannian geometry. Amsterdam-Oxford: North-Holland, New York: American Elsevier Pub., 1975
7. Cheeger, J., Gromov, M.: Collapsing Riemannian manifolds while keeping their curvature bounded. I. J. Diff. Geom. 23, 309–346 (1986); and II, J. Diff. Geom. 32, 269–298 (1990)
8. Cheeger, J., Fukaya, K., Gromov, M.: Nilpotent structures and invariant metrics on collapsed manifolds. J. Amer. Math. Soc. 5, 327–372 (1992)
9. Cheeger, J., Gromov, M., Taylor, M.: Finite propagation speed, kernel estimates for functions of the Laplace operator, and the geometry of complete Riemannian manifolds. J. Differ. Geom. 17, 15–53 (1982)
10. Cheng, S.Y., Li, P., Yau, S.T.: Heat equations on minimal submanifolds and their applications. Amer. J. Math. 106, 1033–1065 (1984)
11. DeTurck, D.M., Kazdan, J.L.: Some regularity theorems in Riemannian geometry. Ann. Sci. École Norm. Sup. 14, 249–260 (1981)
12. Hamilton, R.S.: A compactness property for solution of the Ricci flow. Amer. J. Math. 117, 545–572 (1995)
13. Hawking, S., Ellis, G.F.: The large scale structure of space-time. Cambridge: Cambridge Univ. Press, 1973
14. Heintze, E., Karcher, H.: A general comparison theorem with applications to volume estimates for submanifolds. Ann. Sci. École Norm. Sup. 11, 451–470 (1978)
15. Jost, J., Karcher, H.: Geometrische Methoden zur Gewinnung von a-priori-Schranken für harmonische Abbildungen. Manuscripta Math. 40, 27–77 (1982)
16. Klainerman, S., Rodnianski, I.: Ricci defects of microlocalized Einstein metrics. J. Hyperbolic Differ. Eq. 1, 85–113 (2004)
17. Klainerman, S., Rodnianski, I.: Rough solutions of the Einstein-vacuum equations. Ann. of Math. 161, 1143–1193 (2005)
18. Klainerman, S., Rodnianski, I.: On the radius of injectivity of null hypersurfaces. J. Amer. Math. Soc., to appear
19. Penrose, R.: Techniques of differential topology in relativity. CBMS-NSF Region. Conf. Series Appl. Math., Vol. 7, Philadelphia, PA: SIAM, 1972
20. Peters, S.: Convergence of Riemannian manifolds. Compositio Math. 62, 3–16 (1987)
21. Petersen, P.: Convergence theorems in Riemannian geometry. In: “Comparison Geometry” (Berkeley, CA, 1992–93), MSRI Publ. 30, Cambridge: Cambridge Univ. Press, 1997, pp. 167–202
22. Whitehead, J.H.C.: Convex regions in the geometry of paths. Quart. J. Math. Oxford 3, 33–42 (1932)

Communicated by A. Connes

Commun. Math. Phys. 278, 715–751 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0389-x

Communications in Mathematical Physics

The Spectrum of Heavy Tailed Random Matrices

Gérard Ben Arous (1,2), Alice Guionnet (3)

1 Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
2 Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA. E-mail: [email protected]
3 Ecole Normale Supérieure de Lyon, Unité de Mathématiques pures et appliquées, UMR 5669, 46 Allée d’Italie, 69364 Lyon Cedex 07, France. E-mail: [email protected]

Received: 27 February 2007 / Accepted: 25 May 2007
Published online: 4 December 2007 – © Springer-Verlag 2007

Abstract: Let $X_N$ be an $N \times N$ random symmetric matrix with independent equidistributed entries. If the law $P$ of the entries has a finite second moment, it was shown by Wigner [14] that the empirical distribution of the eigenvalues of $X_N$, once renormalized by $\sqrt{N}$, converges almost surely and in expectation to the so-called semicircular distribution as $N$ goes to infinity. In this paper we study the same question when $P$ is in the domain of attraction of an $\alpha$-stable law. We prove that if we renormalize the eigenvalues by a constant $a_N$ of order $N^{1/\alpha}$, the corresponding spectral distribution converges in expectation towards a law $\mu_\alpha$ which only depends on $\alpha$. We characterize $\mu_\alpha$ and study some of its properties; it is a heavy-tailed probability measure which is absolutely continuous with respect to Lebesgue measure except possibly on a compact set of capacity zero.

This work was partially supported by Miller institute for Basic Research in Science, University of California Berkeley.

1. Introduction

We study the asymptotic behavior of the spectral measure of large random real symmetric matrices with independent identically distributed heavy tailed entries. Let $(x_{ij},\ 1 \le i \le j < \infty)$ be an infinite array of i.i.d. real variables with common distribution $P$ living in a probability space $(\Omega, \mathbb{P})$. Denote by $X_N$ the $N \times N$ symmetric matrix given by:

\[
X_N(i, j) = x_{ij} \ \text{ if } i \le j, \qquad x_{ji} \ \text{ otherwise.}
\]

If the entries have a finite second moment $\sigma^2 = E[x_{ij}^2] = \int x^2\, dP(x)$, and if $(\lambda_1, \cdots, \lambda_N)$ are the eigenvalues of $X_N/\sqrt{N}$, then Wigner’s theorem (see [14] and generalizations in [1,9]) asserts that the empirical spectral measure $\frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i}$ of the matrix $X_N/\sqrt{N}$ converges weakly almost surely to the semi-circle distribution

\[
\sigma(dx) = \frac{1}{2\pi\sigma^2}\, \sqrt{4\sigma^2 - x^2}\; dx.
\]
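As a quick sanity check on the normalization of the semi-circle density, the following sketch integrates it numerically with a midpoint rule. It assumes the density is supported on $[-2\sigma, 2\sigma]$ (where the square root is real); the helper names are illustrative and not taken from the paper. The total mass should be 1 and the second moment should be $\sigma^2$.

```python
import math

def semicircle_density(x, sigma=1.0):
    """Semi-circle density with variance sigma**2; taken to be zero outside [-2*sigma, 2*sigma]."""
    if abs(x) >= 2.0 * sigma:
        return 0.0
    return math.sqrt(4.0 * sigma ** 2 - x ** 2) / (2.0 * math.pi * sigma ** 2)

def integrate(f, a, b, steps=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

sigma = 1.5
mass = integrate(lambda x: semicircle_density(x, sigma), -2 * sigma, 2 * sigma)
second_moment = integrate(lambda x: x * x * semicircle_density(x, sigma),
                          -2 * sigma, 2 * sigma)
```

The second moment equal to $\sigma^2$ is exactly what is lost in the heavy-tailed regime studied below, where $\sigma^2$ is infinite and a different normalization is needed.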

We will consider here the case of heavy tailed entries, when the second moment $\sigma^2$ is infinite. We will assume that the common distribution of the absolute values of the $x_{ij}$’s is in the domain of attraction of an $\alpha$-stable law, for $\alpha \in\, ]0, 2[$, i.e. that there exists a slowly varying function $L$ such that

\[
P(|x_{ij}| \ge u) = \frac{L(u)}{u^\alpha}. \tag{1}
\]

We introduce the normalizing constant $a_N$ by:

\[
a_N = \inf\Big\{ u :\ P[|x_{ij}| \ge u] \le \frac{1}{N} \Big\}. \tag{2}
\]

It is clear that $a_N$ is roughly of order $N^{1/\alpha}$; indeed there exists another slowly varying function $L_0$ such that

\[
a_N = L_0(N)\, N^{1/\alpha}. \tag{3}
\]
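To make definitions (1)–(3) concrete, here is a minimal sketch for one specific tail, chosen purely for illustration: a Pareto-type law with $P(|x| \ge u) = u^{-\alpha}$ for $u \ge 1$, so that $L \equiv 1$ and one expects $a_N = N^{1/\alpha}$ with $L_0 \equiv 1$. The function names are hypothetical; the infimum in (2) is located by bisection, using only the monotonicity of the tail.

```python
def pareto_tail(u, alpha):
    """P(|x| >= u) for the illustrative Pareto-type law: 1 for u <= 1, u**(-alpha) otherwise."""
    return 1.0 if u <= 1.0 else u ** (-alpha)

def normalizing_constant(n, alpha, rel_tol=1e-12):
    """a_N = inf{u : P(|x| >= u) <= 1/N}, found by bisection on the non-increasing tail."""
    lo, hi = 1.0, 2.0
    while pareto_tail(hi, alpha) > 1.0 / n:
        hi *= 2.0  # grow the bracket until the tail drops below 1/N
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if pareto_tail(mid, alpha) <= 1.0 / n:
            hi = mid
        else:
            lo = mid
    return hi

a_1000 = normalizing_constant(1000, 1.5)  # expected close to 1000 ** (1 / 1.5) = 100
```

For a general slowly varying $L$ the same bisection applies, but the closed form $N^{1/\alpha}$ is then only correct up to the slowly varying factor $L_0(N)$.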

We then consider the matrix $A_N := a_N^{-1} X_N$, its eigenvalues $(\lambda_1, \ldots, \lambda_N)$, and its spectral measure $\hat\mu_{A_N} := \frac{1}{N}\sum_{i=1}^N \delta_{\lambda_i}$. Our main result is

Theorem 1.1. Let $\alpha \in\, ]0, 2[$ and assume (1).
1. There exists a probability measure $\mu_\alpha$ on $\mathbb{R}$ such that the mean spectral measure $E[\hat\mu_{A_N}]$ converges weakly to $\mu_\alpha$.
2. $\hat\mu_{A_N}$ converges weakly in probability to $\mu_\alpha$. More precisely, for any bounded continuous function $f$, $\int f(x)\, d\hat\mu_{A_N}(x)$ converges in probability to $\int f(x)\, d\mu_\alpha(x)$.
3. Let $(N_k)_{k \ge 1}$ be an increasing sequence of integers such that $\sum_{k=1}^\infty N_k^{-\varepsilon} < \infty$ for some $\varepsilon < 1$, then the subsequence $\hat\mu_{A_{N_k}}$ converges almost surely weakly to $\mu_\alpha$.

Remark 1.2. We note that the hypothesis (1) concerns only the tail behavior of the distribution of the absolute values of the entries. We make no assumption about the skewness of the distribution of the entries, i.e. about their right or left tails.

Remark 1.3. It would be useful to control better the fluctuations in Theorem 1.1 and establish almost sure convergence for the whole sequence $\hat\mu_{A_N}$.

Our approach is classical. It consists in proving the convergence of the resolvent, i.e. of the mean of the Stieltjes transform of the spectral measure, by proving tightness and characterizing uniquely the possible limit points. We first prove, in Sect. 2, that it is possible, for all later purposes, to truncate the large values of the entries at appropriate levels. We then proceed, in Sect. 3, to show tightness for the spectral measures of the truncated and original matrices $A_N$. We then introduce, in Sect. 4, the following important quantity: for $z \in \mathbb{C}\setminus\mathbb{R}$, we define the probability measure $L_N^z$ on $\mathbb{C}$ by

\[
L_N^z = \frac{1}{N} \sum_{k=1}^N \delta_{(z - A_N)^{-1}_{kk}},
\]


i.e. the empirical measure of the diagonal elements of the resolvent of $A_N$ at $z \in \mathbb{C}\setminus\mathbb{R}$. The classical Schur complement formula is our basic linear algebraic tool to study $L_N^z$ recursively on the dimension, as is usual when the resolvent method is used (see e.g. [9] or [1]). In Sect. 5, using an argument of concentration of measure and borrowing classical techniques from the theory of triangular arrays of i.i.d. random variables, we show that the limit points $\mu^z$ of $L_N^z$ satisfy a fixed point equation in the space of probability measures on $\mathbb{C}$. Even though we cannot prove uniqueness of the solution to this equation, we manage in Sect. 6 to prove the uniqueness of the solution to the resulting equation for $\int x^{\alpha/2}\, d\mu^z(x)$, which in turn gives the uniqueness of $\int x\, d\mu^z(x)$. This is enough to characterize uniquely the limit points of $E[\hat\mu_{A_N}]$ and thus the convergence of $E[\hat\mu_{A_N}]$ to $\mu_\alpha$.

Once the question of convergence is settled by Theorem 1.1, the next question is to describe the limiting measure $\mu_\alpha$. We will discuss in this article three different characterizations of $\mu_\alpha$. Our approach leads directly to the following first characterization of $\mu_\alpha$ through its Stieltjes transform, defined for $z \in \mathbb{C}\setminus\mathbb{R}$ by:

\[
G_\alpha(z) = \int (z - x)^{-1}\, d\mu_\alpha(x). \tag{4}
\]

Define the entire function $g_\alpha$ on $\mathbb{C}$ by

\[
g_\alpha(y) = \frac{2}{\alpha} \int_0^\infty e^{-v^{2/\alpha}}\, e^{-vy}\, dv.
\]

We will also need the constants C(α) =

πα

e−i 2 ( α2 )

(5)

and c(α) = (1 − α2 ).

Theorem 1.4.
1. There exists a unique function $Y_z$, analytic on the half plane $\mathbb{C}^+ = \{z \in \mathbb{C},\ \mathrm{Im}\, z > 0\}$, tending to zero at infinity, and such that $C(\alpha)\, g_\alpha(c(\alpha) Y_z) = Y_z\, (-z)^\alpha$.
2. The probability measure $\mu_\alpha$ of Theorem 1.1 is uniquely described by its Stieltjes transform given, for $z \in \mathbb{C}^+$, by

\[
G_\alpha(z) = -\frac{1}{z} \int_0^\infty e^{-t}\, e^{-c(\alpha)\, t^{\alpha/2}\, Y_z}\, dt. \tag{6}
\]

Remark 1.5. Note that $\mu_\alpha$ depends continuously on $\alpha \in (0, 2)$ since $Y_z$ as described above is continuous in $\alpha$, at least for sufficiently large $z$, a remark which ensures the continuity of $G_\alpha(z)$ at least for sufficiently large $z$ and therefore the continuity of $\mu_\alpha$.

Using the characterization given in Theorem 1.4, we prove in Sect. 7 the following properties of $\mu_\alpha$.

Theorem 1.6. The probability measure $\mu_\alpha$ of Theorem 1.1 satisfies
1. $\mu_\alpha$ is symmetric.
2. $\mu_\alpha$ has unbounded support.
3. There exists a (possibly empty) compact subset of the real line $K_\alpha$ of capacity zero, such that the measure $\mu_\alpha$ has a smooth density $\rho_\alpha$ on the open complement $U_\alpha = \mathbb{R}\setminus K_\alpha$.


4. $\mu_\alpha$ has heavy tails. There exists a constant $L_\alpha = \frac{\alpha}{2} > 0$ such that, when $|x| \to \infty$,

\[
\rho_\alpha(x) \sim \frac{L_\alpha}{|x|^{\alpha+1}}.
\]
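A short formal computation shows why the value $L_\alpha = \alpha/2$ is natural. It is a sketch under the assumptions that $t$ is large enough for $\rho_\alpha$ to be defined beyond $t$ (the set $K_\alpha$ is compact) and that the symmetry of item 1 is used to fold the two tails together:

\[
\mu_\alpha\big(\{|x| > t\}\big) \;=\; 2\int_t^\infty \rho_\alpha(x)\, dx \;\sim\; 2\int_t^\infty \frac{\alpha}{2}\, x^{-\alpha-1}\, dx \;=\; t^{-\alpha}, \qquad t \to \infty .
\]

This matches the scaling built into (1) and (2): by regular variation, $N\, P(|x_{ij}| \ge t\, a_N) \to t^{-\alpha}$, so the tail of the limiting spectral measure has the same power-law decay as the rescaled tail of a single entry.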

A second and different characterization of $\mu_\alpha$ is proposed in the physics literature by Cizeau-Bouchaud [3]. This description has been controversial (see [4] for a discussion and numerical simulations). The strategy used in [3] is also based on the convergence of the resolvent, but on the real axis as opposed to our proof of convergence away from the real axis. We unfortunately cannot make sense of the strategy used in [3]. We discuss in Sect. 8 the link between our characterization given in Theorem 1.4 and the Bouchaud-Cizeau characterization (after correction of a small typographical error in [3] already noted by [4]).

Remark 1.7. In a forthcoming work with A. Dembo we show that the exceptional set $K_\alpha$ of Theorem 1.6 is empty, or reduced to zero. This implies that $\mu_\alpha$ has a smooth density everywhere (except maybe at zero) as suggested by numerical simulations and accepted by the physics literature. This question is discussed further in Sect. 7.

We also describe below (in Sect. 9) a third characterization of $\mu_\alpha$, more combinatorial in nature. It is based on an extension (due to I. Zakharevich [15]) of the classical moment method rather than the resolvent approach used both by [3] and us. Obviously because of the heavy tails and thus of the absence of moments, one would have to do it first for truncated matrices and then try to lift the truncation. More precisely, if one truncates the entries at the level $B a_N$, for a fixed $B > 0$, and defines $x_{ij}^B = x_{ij}\, 1_{|x_{ij}| \le B a_N}$, one can compute the moments of the empirical measures $\hat\mu_{A_N^B}$ of the truncated matrix $A_N^B(ij) = a_N^{-1} x_{ij}^B$,

\[
\int x^k\, d\hat\mu_{A_N^B}(x) = \frac{1}{N}\, \mathrm{tr}\big((A_N^B)^k\big),
\]

and study their convergence when $N$ tends to infinity. We establish in Sect. 9 that

Theorem 1.8. With the above notations, and under the hypothesis of Theorem 1.1 and the additional hypothesis:

\[
\lim_{u \to \infty} \frac{P(x(ij) > u)}{P(|x(ij)| > u)} = \theta \in [0, 1]. \tag{7}
\]

1. $E[\hat\mu_{A_N^B}]$ converges weakly to a probability measure $\mu_\alpha^B$ uniquely determined by its moments and independent of the parameter $\theta$. This measure $\mu_\alpha^B$ has unbounded support and is symmetric.
2. $\mu_\alpha^B$ converges weakly to $\mu_\alpha$ as $B$ tends to infinity.

The moments of $\mu_\alpha^B$ are described combinatorially in Sect. 9. Thus Theorem 1.8 gives a third, independent, description of the limiting measure $\mu_\alpha$. As we will see in Sect. 9, the first part of Theorem 1.8 is a direct consequence of a general combinatorial result of I. Zakharevich and its proof is essentially given in [15]. The convergence of these Zakharevich measures to our $\mu_\alpha$ establishes a link between this combinatorial description and the one we have given in terms of Stieltjes transforms in Theorem 1.4. This link is far from transparent.

Spectrum of Heavy Tailed Random Matrices


Remark 1.9. We note that the limiting measure $\mu_\alpha^B$ is in fact independent of the skewness parameter $\theta$. Thus it is insensitive to the hypothesis (7) about the upper and lower tails of the distribution of the entries. This is coherent with Remark 1.2.

Remark 1.10. The case $\alpha = 2$ is covered neither by the classical Wigner theorem (which asks for a second moment) nor by our results so far. In fact it is easy to see, using the combinatorial approach of Theorem 1.8, that the limit law is then the semi-circle, even though the normalization differs from the usual one.

Finally, let us mention that the behavior of the edge of the spectrum of heavy tailed matrices (when $\alpha \in ]0,2[$) has been established by Soshnikov [13]. The largest eigenvalues are asymptotically, in the scale $a_N^2$, distributed as a Poisson point process with intensity $\alpha^{-1} x^{-\alpha-1}\,dx$. This is in sharp contrast with the Airy determinantal process description of top eigenvalues for the case of light tailed entries [12] but in perfect agreement with our result about the tail of $\mu_\alpha$ given in Theorem 1.6.

2. Truncating the Entries

Since the entries of our random matrices have very few moments, it will be of importance later to truncate them. We introduce the appropriate truncated matrices in this section and show how their spectral measure approximates the spectral measure of the original matrices. Let us consider $X_N^B$ (resp. $X_N^\kappa$) the Wigner matrix with entries $x_{ij} 1_{|x_{ij}| \le B a_N}$ for $B > 0$, respectively $x_{ij} 1_{|x_{ij}| \le N^\kappa a_N}$ for $\kappa > 0$. Also define

$$A_N = a_N^{-1} X_N, \qquad A_N^B = a_N^{-1} X_N^B, \qquad A_N^\kappa = a_N^{-1} X_N^\kappa.$$

Let us remark here that the threshold $a_N$ is precisely the scale of the largest entry in a row (or a column) of the random matrix $X_N$, while the scale of the largest entry (or of the largest eigenvalue) of the whole matrix is $a_N^2$, i.e. roughly $N^{\frac{2}{\alpha}}$.

We want to state that the spectral measures of the matrices $A_N$, $A_N^B$ and $A_N^\kappa$ are very close in a well chosen distance, compatible with the weak topology. The standard Dudley distance $d$ is defined on $P(\mathbb{R})$ by

$$d(\mu,\nu) = \sup_{\|f\|_L \le 1} \left| \int f\,d\nu - \int f\,d\mu \right|,$$

where the supremum is taken over all Lipschitz functions $f$ on $\mathbb{R}$ such that $\|f\|_L \le 1$, where the norm $\|f\|_L$ is defined by

$$\|f\|_L := \sup_{x \ne y} \frac{|f(x)-f(y)|}{|x-y|} + \sup_x |f(x)|.$$

We will use the following variant $d_1$ of the Dudley distance:

$$d_1(\mu,\nu) = \sup_{\|f\|_L \le 1,\ f\uparrow} \left| \int f\,d\nu - \int f\,d\mu \right|,$$

where the supremum is taken over non-decreasing Lipschitz functions such that $\|f\|_L \le 1$. The Dudley distance $d$ is well known to be a metric compatible with the weak topology and the following lemma shows that so is the variant $d_1$.


G. Ben Arous, A. Guionnet

Lemma 2.1. $d_1$ is compatible with the weak topology on $P(\mathbb{R})$, i.e. if $\mu$ is a positive measure on $\mathbb{R}$ such that there exists $\mu_n \in P(\mathbb{R})$ so that

$$\lim_{n\to\infty} d_1(\mu_n,\mu) = 0,$$

then $\mu_n$ converges weakly to $\mu$ and $\mu \in P(\mathbb{R})$. Reciprocally, if $\mu_n$ converges to $\mu$ weakly, $d_1(\mu_n,\mu)$ goes to zero. If a sequence $\mu_n \in P(\mathbb{R})$ is Cauchy for $d_1$, it converges weakly.

Proof. A compactly supported Lipschitz function $f$ can be written as

$$f(x) = f(0) + \int_0^x g(y)\,dy,$$

where $g$ is a Borel function bounded by the Lipschitz norm of $f$. Writing

$$f(x) - f(0) = \int_0^x 1_{g(y)\ge 0}\, g(y)\,dy - \int_0^x |g(y)|\, 1_{g(y)<0}\,dy,$$

we see that $f$ can be written as the difference of two non-decreasing Lipschitz functions. Hence, if $d_1(\mu_n,\mu)$ goes to zero as $n$ goes to infinity, $\int f\,d\mu_n$ converges to $\int f\,d\mu$ for all Lipschitz compactly supported functions. Hence, $\mu_n$ converges to $\mu$ for the vague topology. On the other hand, if $\mu_n$ converges to $\mu$ for $d_1$, we must have, taking $f = 1$,

$$\mu(1) = \lim_{n\to\infty} \mu_n(1) = 1,$$

which is enough to guarantee also the weak convergence. Indeed, if we now take $f \in C_b(\mathbb{R})$, and $g$ compactly supported with values in $[0,1]$,

$$|\mu_n(f) - \mu(f)| \le \|f\|_\infty \big(\mu(1) + \mu_n(1) - \mu(g) - \mu_n(g)\big) + |\mu_n(fg) - \mu(fg)|.$$

Letting first $n$ go to infinity and then taking $g$ approximating the unit, we obtain the result. The second statement is clear since $d_1 \le d$ with $d$ the standard Dudley distance (obtained by taking the supremum over all Lipschitz functions with norm bounded by one) and the result is well known to hold for $d$. Finally, if a sequence $\mu_n$ is Cauchy for $d_1$, it converges for the vague topology (as it is tight for the vague topology, and the property of being Cauchy uniquely prescribes the limit) and then for the weak topology by the mass property.

We next show that truncation does not much affect the spectral measures in the $d_1$ distance.

Theorem 2.2. 1. For every $\epsilon > 0$ there exist $B(\epsilon) < \infty$ and $\delta(\epsilon,B) > 0$ when $B > B(\epsilon)$ such that, for $N$ large enough,

$$P\big(d_1(\hat\mu_{A_N}, \hat\mu_{A_N^B}) > \epsilon\big) \le e^{-\delta(\epsilon,B)N}.$$

2. For $\kappa > 0$ and $a \in ]1-\alpha\kappa, 1[$, there exists a finite constant $C(\alpha,\kappa,a)$ such that for all $N \in \mathbb{N}$,

$$P\big(d_1(\hat\mu_{A_N}, \hat\mu_{A_N^\kappa}) > N^{a-1}\big) \le e^{-C N^a \log N}.$$


Remark 2.3. This result depends crucially on the proper choice of the truncation level. Had we truncated the entries at a lower level, say $N^\kappa a_N$ with $\kappa < 0$, then the limit law would be the semi-circle. Thus the effect of the heavy tails would have been completely canceled by the truncation.

Proof. Let $X$ and $Y$ be two $N \times N$ Hermitian matrices, and $\hat\mu_X$ and $\hat\mu_Y$ be their spectral measures. Then Lidskii's theorem implies (see e.g. [8] p. 500) that, if $d$ is the rank of $X - Y$, then

$$d_1(\hat\mu_X, \hat\mu_Y) \le \frac{2d}{N}. \tag{8}$$

Consequently, the following lemma implies Theorem 2.2.

Lemma 2.4. 1. For every $\epsilon > 0$, there exist $B(\epsilon) > 0$ and $\delta(\epsilon,B) > 0$ when $B > B(\epsilon)$ such that

$$P\big(\mathrm{rank}(X_N - X_N^B) \ge \epsilon N\big) \le e^{-\delta(\epsilon,B)N}.$$

2. For $\kappa > 0$ and $a \in ]1-\alpha\kappa, 1[$ there exists a finite constant $C(\alpha,\kappa,a)$ such that for all $N \in \mathbb{N}$,

$$P\big(\mathrm{rank}(X_N - X_N^\kappa) \ge N^a\big) \le e^{-C N^a \log N}. \tag{9}$$

Proof of Lemma 2.4. Let $M_i^- = 1$ (resp. $M_i^+ = 1$) if there exists a $j \le i$ (resp. $j > i$) such that $|x_{ij}| > B a_N$, and $M_i^- = 0$ (resp. $M_i^+ = 0$) otherwise. Define

$$M^- = \sum_{i=1}^N M_i^- \quad\text{and}\quad M^+ = \sum_{i=1}^N M_i^+.$$

Now let $M$ be the number of non-zero rows of the matrix $X_N - X_N^B$. Obviously

$$\mathrm{rank}(X_N - X_N^B) \le M \le M^- + M^+, \tag{10}$$

so that

$$P\big(\mathrm{rank}(X_N - X_N^B) \ge \epsilon N\big) \le P\Big(M^- \ge \frac{\epsilon N}{2}\Big) + P\Big(M^+ \ge \frac{\epsilon N}{2}\Big) \le 2P\Big(M^- \ge \frac{\epsilon N}{2}\Big),$$

where we observed that $M^+$ is stochastically dominated by $M^-$ (which contains the diagonal terms). But if we denote by $p_i = P(M_i^- = 1)$, we have

$$p_i = P(\exists j \le i,\ |x_{ij}| > B a_N) = 1 - \Big(1 - \frac{L(Ba_N)}{(Ba_N)^\alpha}\Big)^i \le 1 - \Big(1 - \frac{c}{N B^\alpha}\Big)^i,$$

where the latter inequality holds for $c > 1$ when $N$ is large enough since

$$\lim_{N\to\infty} \frac{N L(Ba_N)}{a_N^\alpha} = 1. \tag{11}$$


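As a consequence we can estimate the sum

$$\sum_{i=1}^N p_i \le N - \frac{1 - \big(1 - \frac{c}{N B^\alpha}\big)^{N+1}}{1 - \big(1 - \frac{c}{N B^\alpha}\big)} \sim N C(B), \tag{12}$$

where we denoted $A_N \sim B_N$ if $A_N / B_N$ goes to one as $N$ goes to infinity and

$$C(B) = 1 - \frac{B^\alpha}{c}\big(1 - e^{-\frac{c}{B^\alpha}}\big). \tag{13}$$

For any $\lambda > 0$, the independence of the $M_i^-$'s gives

$$E(\exp \lambda M^-) = \prod_{i=1}^N \big(1 + p_i(e^\lambda - 1)\big) \le \exp\Big((e^\lambda - 1)\sum_{i=1}^N p_i\Big),$$

so that we get the exponential upper bound, for $N$ large enough,

$$P\Big(M^- \ge \frac{\epsilon N}{2}\Big) \le e^{-\lambda \frac{\epsilon N}{2}} E(\exp \lambda M^-) \le \exp[-N \phi_-(\lambda,\epsilon,B)],$$

with

$$\phi_-(\lambda,\epsilon,B) = \frac{\lambda\epsilon}{2} - (e^\lambda - 1) C(B).$$

Obviously, since $\lim_{B\to\infty} C(B) = 0$, for any $\epsilon > 0$, there exists a $B(\epsilon) > 0$ (of order $\epsilon^{-\frac{1}{\alpha}}$) such that when $B > B(\epsilon)$,

$$\delta_-(\epsilon,B) := \sup_{\lambda > 0} \phi_-(\lambda,\epsilon,B) > 0$$

and

$$P\Big(M^- \ge \frac{\epsilon N}{2}\Big) \le \exp[-N\delta_-(\epsilon,B)].$$

Using the crude rank estimate (10) proves the first claim of Lemma 2.4. In order to prove the second claim of Lemma 2.4, we simply replace $B$ by $B(N) = N^\kappa$ and $\epsilon$ by $\epsilon(N) = N^{a-1}$ in the proof above. We get then that

$$\delta_-(\epsilon(N), B(N)) \sim \frac{1}{2}(a - 1 + \alpha\kappa)\big(N^{a-1}\log N\big)$$

and similarly for $\delta_+(\epsilon(N), B(N))$, which proves our second claim.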
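The quantities in this proof are explicit enough to be evaluated numerically. The following Python sketch is only an illustration, not part of the argument: it sums the upper bounds on the $p_i$ exactly, compares the result with $N C(B)$ from (13), and evaluates $\delta_-(\epsilon,B)$ in closed form by maximizing $\phi_-$ over $\lambda$. The function names and the value $c = 1.1$ used below are assumptions made for the example.

```python
import math

def sum_p(N, B, alpha, c=1.1):
    """Sum over i = 1..N of the upper bounds 1 - (1 - c/(N B^alpha))^i on p_i."""
    q = c / (N * B ** alpha)
    return sum(1.0 - (1.0 - q) ** i for i in range(1, N + 1))

def limit_constant(B, alpha, c=1.1):
    """The constant C(B) of (13)."""
    x = c / B ** alpha
    return 1.0 - (1.0 - math.exp(-x)) / x

def delta_minus(eps, B, alpha, c=1.1):
    """sup over lambda > 0 of phi_-(lambda, eps, B) = lambda*eps/2 - (e^lambda - 1)*C(B)."""
    cb = limit_constant(B, alpha, c)
    if eps / 2.0 <= cb:
        # phi_- is then non-increasing on lambda > 0, so the supremum is 0 (B is too small)
        return 0.0
    lam = math.log(eps / (2.0 * cb))  # stationary point of phi_- in lambda
    return lam * eps / 2.0 - (math.exp(lam) - 1.0) * cb
```

For instance, `sum_p(10000, 2.0, 1.5) / 10000` should be close to `limit_constant(2.0, 1.5)`, and `delta_minus(eps, B, alpha)` becomes positive once $B$ is large enough that $C(B) < \epsilon/2$.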

Remark 2.5. We now let $A_N^\kappa = a_N^{-1} X_N^\kappa$. We note that centering the entries of the matrix $A_N^\kappa$ defines a perturbation of rank one. Hence, Lidskii's theorem (see (8)) shows that

$$d_1\big(\hat\mu_{A_N^\kappa}, \hat\mu_{A_N^\kappa - E[A_N^\kappa]}\big) \le \frac{2}{N}.$$

Thus we may assume that $A_N^\kappa$ is centered without changing its limiting spectral distribution.


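3. Tightness

We prove in this section that the means of the spectral measures of the random matrices $A_N$ and of their truncated versions $A_N^B$ or $A_N^\kappa$ are tight.

Lemma 3.1. 1. The sequence $(E[\hat\mu_{A_N}]; N \in \mathbb{N})$ is tight for the weak topology on $P(\mathbb{R})$.
2. For every $B > 0$ and $\kappa > 0$, the sequences $(E[\hat\mu_{A_N^B}]; N \in \mathbb{N})$ and $(E[\hat\mu_{A_N^\kappa}]; N \in \mathbb{N})$ are tight for the weak topology on $P(\mathbb{R})$.

Proof. We will use the following classical result about truncated moments (Theorem VIII.9.2 of [6]): for any $\zeta > \alpha$,

$$\lim_{t\to\infty} \frac{E\big[|x_{ij}|^\zeta 1_{|x_{ij}| < t}\big]}{t^\zeta P(|x_{ij}| > t)} = \frac{\alpha}{\zeta - \alpha}. \tag{14}$$

Therefore, using (11), we have

$$E\big[|x_{ij}|^\zeta 1_{|x_{ij}| < B a_N}\big] \sim \frac{\alpha}{\zeta-\alpha} B^{\zeta-\alpha} \frac{a_N^\zeta}{N}, \tag{15}$$

or equivalently:

$$E\big[|A_N^B(ij)|^\zeta\big] \sim \frac{\alpha}{\zeta-\alpha} B^{\zeta-\alpha} \frac{1}{N}. \tag{16}$$

The version for the truncated matrix $A_N^\kappa$ will also be useful:

$$E\big[|A_N^\kappa(ij)|^\zeta\big] \sim \frac{\alpha}{\zeta-\alpha} N^{\kappa(\zeta-\alpha)-1}. \tag{17}$$

Using these estimates with $\zeta = 2$, one sees that

$$\sup_{N\in\mathbb{N}} E\Big[\frac{1}{N}\mathrm{tr}\big((A_N^B)^2\big)\Big] \sim \frac{\alpha}{2-\alpha} B^{2-\alpha} \tag{18}$$

and that

$$\sup_{N\in\mathbb{N}} E\Big[\frac{1}{N}\mathrm{tr}\big((A_N^\kappa)^2\big)\Big] \sim \frac{\alpha}{2-\alpha} N^{\kappa(2-\alpha)}. \tag{19}$$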
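The truncated moment asymptotics (14) can be checked numerically on a concrete law. The sketch below assumes, purely for illustration, a pure Pareto tail $P(|x| > u) = u^{-\alpha}$ for $u \ge 1$ (so that $L \equiv 1$); this particular distribution and the function names are choices made for the example and are not taken from the text.

```python
import math

def truncated_moment(alpha, zeta, t, steps=200_000):
    """E[|x|^zeta 1_{|x|<t}] for the density alpha*u^(-alpha-1) on [1, inf),
    computed with the midpoint rule in the variable s = log u."""
    h = math.log(t) / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        # u^zeta * alpha * u^(-alpha-1) du, with u = e^s and du = e^s ds
        total += alpha * math.exp((zeta - alpha) * s)
    return total * h

def ratio(alpha, zeta, t):
    """The quotient in (14) before taking the limit t -> infinity."""
    tail = t ** (-alpha)  # P(|x| > t) for the assumed Pareto law
    return truncated_moment(alpha, zeta, t) / (t ** zeta * tail)
```

For this law the quotient equals $\frac{\alpha}{\zeta-\alpha}\big(1 - t^{-(\zeta-\alpha)}\big)$ exactly, so `ratio(1.5, 2.0, t)` approaches $\alpha/(\zeta-\alpha) = 3$ as `t` grows.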

Equation (18) shows that $E[\hat\mu_{A_N^B}]$ belongs to the compact set $K_C := \{\mu \in P(\mathbb{R});\ \mu(x^2) \le C\}$ for any $C > \frac{\alpha}{2-\alpha} B^{2-\alpha}$ and $N$ large enough. Hence, the sequence $(E[\hat\mu_{A_N^B}]; N \in \mathbb{N})$ is tight, and thus any subsequence of $E[\hat\mu_{A_N^B}]$ has converging subsequences. We denote by $\mu^B$ a limit point, i.e. the limit of a converging subsequence. By a diagonal procedure, we can ensure that this subsequence is the same for all $B \in \mathbb{N}$, and in particular, since $d_1$ is compatible with the weak topology, we can find an increasing function $\phi$ so that for any $\delta > 0$, $B_0 < \infty$, there exists $N_0 < \infty$ so that for $N \ge N_0$ and all $B \le B_0$,

$$d_1\big(E[\hat\mu_{A^B_{\phi(N)}}], \mu^B\big) \le \delta.$$


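By Lemma 2.4, and Lidskii's estimate (8), we have for all $\epsilon > 0$,

$$d_1\big(E[\hat\mu_{A_{\phi(N)}}], E[\hat\mu_{A^B_{\phi(N)}}]\big) \le E\big[d_1(\hat\mu_{A_{\phi(N)}}, \hat\mu_{A^B_{\phi(N)}})\big] \le 2\epsilon + e^{-\delta(\epsilon,B)\phi(N)} \tag{20}$$

with $\delta(\epsilon,B) > 0$ if $B > B(\epsilon)$. These two inequalities imply that $(\mu^B, B \in \mathbb{N})$ is a Cauchy sequence for the modified Dudley metric $d_1$ and thus converges when $B$ tends to $\infty$. Indeed, if we choose $\epsilon, \epsilon', \delta > 0$ and an integer number $B_0 > B(\epsilon) \vee B(\epsilon')$, we find that for $B, B' \in [B(\epsilon) \vee B(\epsilon'), B_0]$ and $N > N_0$,

$$d_1\big(E[\hat\mu_{A_{\phi(N)}}], \mu^B\big) \le \delta + 2\epsilon + e^{-\delta(\epsilon,B)\phi(N)} \quad\text{and}\quad d_1\big(E[\hat\mu_{A_{\phi(N)}}], \mu^{B'}\big) \le \delta + 2\epsilon' + e^{-\delta(\epsilon',B')\phi(N)}, \tag{21}$$

and therefore

$$d_1(\mu^B, \mu^{B'}) \le 2\delta + 2\epsilon + 2\epsilon' + e^{-\delta(\epsilon,B)\phi(N)} + e^{-\delta(\epsilon',B')\phi(N)}. \tag{22}$$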

Letting $N$ go to infinity, and then $\delta$ to zero and $B_0$ to infinity, we finally deduce that $d_1(\mu^B, \mu^{B'}) \le 2\epsilon + 2\epsilon'$, provided that $B$ and $B'$ are greater than $B(\epsilon) \vee B(\epsilon')$. Hence, $\mu^B$ is a Cauchy sequence for $d_1$ and thus converges weakly by Lemma 2.1 as $B$ goes to infinity. As a consequence of (21) we also find that $E[\hat\mu_{A_{\phi(N)}}]$ converges to this limit as $N$ goes to infinity. The same holds for the truncated versions $E[\hat\mu_{A^\kappa_{\phi(N)}}]$. Thus, we have proved that $(E[\hat\mu_{A_N}], E[\hat\mu_{A_N^\kappa}])_{N\in\mathbb{N}}$ are tight.

This lemma (3.1) can be strengthened into a partial almost-sure tightness result. Consider an increasing function $\phi : \mathbb{N} \to \mathbb{N}$ such that $\sum_{N \ge 0} \frac{1}{\phi(N)} < \infty$; then

Lemma 3.2. The sequences $(\hat\mu_{A^B_{\phi(N)}})_{N\in\mathbb{N}}$, $(\hat\mu_{A_{\phi(N)}})_{N\in\mathbb{N}}$, $(\hat\mu_{A^\kappa_{\phi(N)}})_{N\in\mathbb{N}}$ are almost surely tight.

Proof. We note that the truncated moments bound given in (16) can be strengthened into a bound in probability as follows. Let $M > 0$ and $C > \frac{\alpha}{2-\alpha} B^{2-\alpha}$. Chebychev's inequality reads

$$
\begin{aligned}
P\Big(\frac{1}{N}\mathrm{tr}\big((A_N^B)^2\big) \ge M + C\Big)
&\le \frac{1}{M^2} E\Big[\Big(\frac{1}{N}\mathrm{tr}\big((A_N^B)^2\big) - E\Big[\frac{1}{N}\mathrm{tr}\big((A_N^B)^2\big)\Big]\Big)^2\Big] \\
&= \frac{1}{M^2} E\Big[\Big(\frac{1}{N}\sum_{i,j=1}^N \big(A_N^B(i,j)^2 - E[A_N^B(i,j)^2]\big)\Big)^2\Big] \\
&\le \frac{4}{M^2 N^2} \sum_{i \le j} E\Big[\big(A_N^B(i,j)^2 - E[A_N^B(i,j)^2]\big)^2\Big] \\
&\le \frac{2}{M^2} \max_{i \le j} E\big[A_N^B(ij)^4\big] \\
&\sim \frac{2\alpha B^{4-\alpha}}{4-\alpha} \frac{1}{M^2 N},
\end{aligned} \tag{23}
$$

where we used the independence of the entries at the third step and the truncated moments estimate (16) for $\zeta = 4$ at the last step. Then the Borel-Cantelli lemma implies that for any $C > \frac{\alpha}{2-\alpha} B^{2-\alpha}$,

$$\limsup_{N\to\infty} \frac{1}{\phi(N)} \mathrm{tr}\big((A^B_{\phi(N)})^2\big) \le C \quad \text{a.s.},$$

which ensures the almost sure tightness of $(\hat\mu_{A^B_{\phi(N)}})_{N\in\mathbb{N}}$. From this point, all the above arguments apply to show the almost sure tightness of $(\hat\mu_{A_{\phi(N)}})_{N\in\mathbb{N}}$ and $(\hat\mu_{A^\kappa_{\phi(N)}})_{N\in\mathbb{N}}$.

4. Induction Over the Dimension of the Matrices

We borrow the following idea from [3]: in order to prove the vague convergence of $(E[\hat\mu_{A_N}])_{N\in\mathbb{N}}$ we study the asymptotic behavior, for $z$ a complex number, of the probability measure $L_N^z$ on $\mathbb{C}$ given, for $f \in C_b(\mathbb{C})$, by

$$L_N^z(f) = E\Big[\frac{1}{N}\sum_{k=1}^N f\big(((z - A_N)^{-1})_{kk}\big)\Big].$$

Here and below, $z$ denotes in short $z$ times the identity in the set of matrices under consideration. $L_N^z$ is thus the empirical measure of the diagonal entries of the resolvent of $A_N$. In contrast to [3], we will only consider these measures when $z \in \mathbb{C}\setminus\mathbb{R}$, where everything is well defined since $z - A_N$ is invertible. Note that for $z \in \mathbb{C}^+ = \{z \in \mathbb{C} : \Im z > 0\}$, and for $k \in \{1,\dots,N\}$, the diagonal term $((z - A_N)^{-1})_{kk}$ belongs to the set $D := \mathbb{C}^- \cap \{x \in \mathbb{C} : |x| \le |\Im(z)|^{-1}\}$. $L_N^z$ is thus a probability measure on the compact subset $D$ of $\mathbb{C}$. If we choose the function $f(x) = x$ then

$$L_N^z(f) = E\Big[\frac{1}{N}\mathrm{tr}\big((z - A_N)^{-1}\big)\Big]$$

is the Stieltjes transform of $E[\hat\mu_{A_N}]$. Thus, the weak convergence of $L_N^z$ for all $z \in \mathbb{C}^+$ (or even for all $z$ in a set with accumulation points) is enough to prove the vague convergence of $E[\hat\mu_{A_N}]$. Indeed the latter is a consequence of the convergence of its Stieltjes transform, which, as an analytic function on $\mathbb{C}^+$, is uniquely determined by its values on a set with accumulation points. In the following, given a $z \in \mathbb{C}^+$, we will prove an equation on the limit points of $L_N^z$ (more precisely of its analogue where $A_N$ is replaced by its truncation $A_N^\kappa$ for some well chosen $\kappa > 0$). Our main tool will be a recursion on the dimension $N$, and the Schur complement formula.

We first investigate how these measures depend on the dimension. We let $\bar A_{N+1}$ be the $(N+1) \times (N+1)$ matrix obtained by adding to $A_N$ a first row and a first column $A_N(0,k) = A_N(k,0) = a_N^{-1} x_{0k}$. Hence, $\bar A_{N+1}$ has the same law as $\frac{a_{N+1}}{a_N} A_{N+1}$. We then let $\hat A_N$ be the $(N+1) \times (N+1)$ matrix obtained by adding as first row and column the zero vector.


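We also define for $z \in \mathbb{C}\setminus\mathbb{R}$,

$$\bar G_{N+1}(z) := (z - \bar A_{N+1})^{-1}, \qquad G_N(z) = (z - A_N)^{-1}, \qquad \hat G_N(z) = (z - \hat A_N)^{-1}.$$

We finally denote by $\cdot^\kappa$ all quantities where $A_N$ has been replaced by its truncated version $A_N^\kappa$. Thus for $z \in \mathbb{C}\setminus\mathbb{R}$ we define

$$L_N^{z,\kappa} = \frac{1}{N}\sum_{k=1}^N \delta_{G_N^\kappa(z)_{kk}}, \qquad \hat L_N^{z,\kappa} = \frac{1}{N+1}\sum_{k=0}^N \delta_{\hat G_N^\kappa(z)_{kk}}, \qquad \bar L_{N+1}^{z,\kappa} = \frac{1}{N}\sum_{k=1}^N \delta_{\bar G_{N+1}^\kappa(z)_{kk}}.$$

Lemma 4.1. 1. $\hat G_N^\kappa(z)_{kk}$ is equal to $G_N^\kappa(z)_{kk}$ for $k \ge 1$ and to $z^{-1}$ for $k = 0$.
2.
$$\lim_{N\to\infty} \frac{1}{N}\sum_{k=1}^N E\big[\big|\bar G_{N+1}^\kappa(z)_{kk} - \hat G_N^\kappa(z)_{kk}\big|\big] = 0.$$
3. For $\kappa \in ]0, \frac{1}{2-\alpha}[$ and $0 < \eta < \frac{1}{2}(1 - \kappa(2-\alpha))$,
$$\lim_{N\to\infty} P\big(d(L_N^{z,\kappa}, \bar L_{N+1}^{z,\kappa}) > N^{-\eta}\big) = 0.$$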

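Here, as above, $d$ is the Dudley distance on $P(\mathbb{C})$.

Proof. We note that

$$(z - \hat A_N^\kappa) = \begin{pmatrix} z & 0 \\ 0 & z - A_N^\kappa \end{pmatrix} \ \Rightarrow\ \hat G_N^\kappa(z) = \begin{pmatrix} z^{-1} & 0 \\ 0 & (z - A_N^\kappa)^{-1} \end{pmatrix}, \tag{24}$$

which immediately yields the first point. For the second, let us write

$$
\begin{aligned}
\bar G_{N+1}^\kappa(z)_{kk} - \hat G_N^\kappa(z)_{kk}
&= \big[\bar G_{N+1}^\kappa(z)\big(\bar A_{N+1}^\kappa - \hat A_N^\kappa\big)\hat G_N^\kappa(z)\big]_{kk} \\
&= \sum_{l=0}^N \bar G_{N+1}^\kappa(z)_{kl}\big(\bar A_{N+1}^\kappa - \hat A_N^\kappa\big)_{l0}\,\hat G_N^\kappa(z)_{0k} + \sum_{l=0}^N \bar G_{N+1}^\kappa(z)_{k0}\big(\bar A_{N+1}^\kappa - \hat A_N^\kappa\big)_{0l}\,\hat G_N^\kappa(z)_{lk} \\
&= \bar G_{N+1}^\kappa(z)_{k0} \sum_{l=0}^N A_N^\kappa(0l)\,\hat G_N^\kappa(z)_{lk},
\end{aligned}
$$

where we noticed above that $\hat G_N^\kappa(z)_{0k}$ is null for $k \ne 0$ by (24). Therefore, we find that

$$E\big[\big|\bar G_{N+1}^\kappa(z)_{kk} - \hat G_N^\kappa(z)_{kk}\big|\big]^2 \le E\big[\big|\bar G_{N+1}^\kappa(z)_{k0}\big|^2\big]\, E\Big[\Big|\sum_{l=0}^N A_N^\kappa(0l)\,\hat G_N^\kappa(z)_{lk}\Big|^2\Big]$$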

by the Cauchy-Schwarz inequality. We recall that we have seen in Remark 2.5 that we can assume that the entries of the matrix $A_N^\kappa$ are centered. Using then the independence of $A_N^\kappa(0l)$ and $\hat G_N(z)$, summing over $k \in \{1,\dots,N\}$ and with a further use of the Cauchy-Schwarz inequality, we find that

$$\frac{1}{N}\sum_{k=1}^N E\big[\big|\bar G_{N+1}^\kappa(z)_{kk} - \hat G_N^\kappa(z)_{kk}\big|\big] \le \max_j E\big[(A_N^\kappa(0j))^2\big]^{\frac12} \Big(\frac{1}{N}\sum_{k=1}^N E\big[|\bar G_{N+1}^\kappa(z)_{k0}|^2\big]\Big)^{\frac12} E\Big[\frac{1}{N}\sum_{l,k=1}^N |\hat G_N^\kappa(z)_{lk}|^2\Big]^{\frac12}.$$

We now note that the entries of the resolvent $\hat G_N(z)$ are uniformly bounded in modulus. Indeed observe that, if $U$ is a basis of eigenvectors of $\hat A_N^\kappa$, with associated eigenvalues $(\lambda_i, 1 \le i \le N) \in \mathbb{R}^N$, for any $k, l \in \{0,\dots,N\}$,

$$\big|\hat G_N^\kappa(z)_{kl}\big| = \Big|\sum_r u_{kr}(z - \lambda_r)^{-1} u_{rl}\Big| \le \frac{1}{|\Im(z)|}\Big(\sum_r |u_{kr}|^2\Big)^{\frac12}\Big(\sum_r |u_{rl}|^2\Big)^{\frac12} \le \frac{1}{|\Im(z)|}, \tag{25}$$

and the same holds for $\bar G_{N+1}^\kappa(z)$. Moreover, since the spectral radius of $\hat G_N(z)$ is bounded above by $1/|\Im(z)|$, we also have

$$\frac{1}{N}\sum_{l,k=0}^N \big|\hat G_N^\kappa(z)_{lk}\big|^2 = \frac{1}{N}\mathrm{tr}\big(\hat G_N^\kappa(z)\hat G_N^\kappa(z)^*\big) \le \frac{N+1}{N}\frac{1}{|\Im(z)|^2}.$$

Hence, we deduce

$$\frac{1}{N}\sum_{k=1}^N E\big[\big|\bar G_{N+1}^\kappa(z)_{kk} - \hat G_N^\kappa(z)_{kk}\big|\big] \le \frac{N+1}{N}\frac{1}{|\Im(z)|^2}\max_l E\big[(A_N^\kappa(0l))^2\big]^{\frac12}.$$

But we know how to control the truncated moments $E[(A_N^\kappa(0l))^2]$. Indeed by the estimate (17) we see that there exists a finite constant $c$ such that for all $N \in \mathbb{N}$,

$$\max_{1 \le i \le j \le N} E\big[|A_N^\kappa(ij)|^2\big] \le c N^{-\epsilon} \tag{26}$$

with $\epsilon = 1 - \kappa(2-\alpha) > 0$. The proof of the second point is complete. We finally deduce the last result simply by

$$E\big[d(\bar L_{N+1}^{z,\kappa}, \hat L_N^{z,\kappa})\big] \le \frac{1}{N+1}\sum_{k=0}^N E\big[\big|\bar G_{N+1}^\kappa(z)_{kk} - \hat G_N^\kappa(z)_{kk}\big| \wedge 1\big] \le \frac{1}{N+1} + \frac{c(N+1)}{N}\frac{1}{|\Im(z)|^2} N^{-\frac{\epsilon}{2}}, \tag{27}$$

and since $G_N(z)$ and $\hat G_N(z)$ differ at most by a rank one perturbation,

$$\hat L_N^{z,\kappa} = \frac{N}{N+1} L_N^{z,\kappa} + \frac{1}{N+1}\delta_{z^{-1}}$$

implies that

$$d\big(L_N^{z,\kappa}, \hat L_N^{z,\kappa}\big) \le \frac{2}{N+1}.$$

This shows by Chebychev's inequality that for all $\eta < \frac{\epsilon}{2}$,

$$\lim_{N\to\infty} P\big(d(L_N^{z,\kappa}, \bar L_{N+1}^{z,\kappa}) > N^{-\eta}\big) = 0.$$

To derive an equation for $L_N^{z,\kappa}$, our tool will be the Schur complement formula, which we now recall. Let $\bar A_{N+1}$ and $A_N$ be as above.

Lemma 4.2. For any $z \in \mathbb{C}$,

$$\big((\bar A_{N+1} - zI)^{-1}\big)_{00} = \Big(A_N(00) - z - \sum_{k,l=1}^N A_N(0k) A_N(l0) \big((A_N - zI)^{-1}\big)_{kl}\Big)^{-1}.$$

Proof. The proof is a direct consequence of Cramer's inversion formula:

$$\big((\bar A_{N+1} - zI)^{-1}\big)_{00} = \frac{\det(A_N - zI)}{\det(\bar A_{N+1} - zI)}.$$

To get a more explicit formula for this ratio, write

$$\bar A_{N+1} - zI = \begin{pmatrix} A_N(00) - z & a_0 \\ a_0^T & A_N - z \end{pmatrix}$$

with $a_0 = (A(01),\dots,A(0N))$, and use the representation

$$\begin{pmatrix} I & -BD^{-1} \\ 0 & I \end{pmatrix} \cdot \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} A - BD^{-1}C & 0 \\ C & D \end{pmatrix}$$

with $A = A(00) - z$, $B = a_0$, $C = a_0^T$ and $D = A_N - z$. Therefore, as $\det(AB) = \det(A)\det(B)$, we conclude that

$$\det(\bar A_{N+1} - zI) = \det(A_N - zI)\,\det\big(A(00) - z - \langle a_0, (A_N - zI)^{-1} a_0\rangle\big).$$

This proves the lemma.
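Since the Schur complement formula of Lemma 4.2 is a purely algebraic identity, it can be tested on any small symmetric matrix. The following Python sketch does so with a hand-written Gauss-Jordan inversion; the Gaussian entries, the matrix size and the function names are illustrative assumptions (the identity does not depend on the law of the entries).

```python
import random

def inverse(M):
    """Inverse of a square complex matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [[complex(v) for v in row] + [1 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def random_symmetric(n, rng):
    """Real symmetric matrix with independent standard Gaussian entries (illustration only)."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            A[i][j] = A[j][i] = rng.gauss(0.0, 1.0)
    return A

def shifted(A, z):
    """The matrix A - z I."""
    return [[A[i][j] - (z if i == j else 0) for j in range(len(A))] for i in range(len(A))]

def schur_check(n, z, seed=0):
    """Return both sides of the identity of Lemma 4.2 for a random (n+1) x (n+1) matrix."""
    rng = random.Random(seed)
    A_bar = random_symmetric(n + 1, rng)      # indices 0..n, plays the role of \bar A_{N+1}
    A = [row[1:] for row in A_bar[1:]]        # A_N: delete row and column 0
    lhs = inverse(shifted(A_bar, z))[0][0]
    R = inverse(shifted(A, z))
    quad = sum(A_bar[0][k + 1] * A_bar[l + 1][0] * R[k][l]
               for k in range(n) for l in range(n))
    rhs = 1.0 / (A_bar[0][0] - z - quad)
    return lhs, rhs
```

Calling `schur_check(6, 0.3 + 1j)` returns two complex numbers that should agree up to rounding error; their modulus is at most $1/|\Im(z)|$, in line with the bound on the diagonal resolvent entries used throughout this section.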

We now show that, in the Schur complement formula above, the off-diagonal terms in the sum in the right-hand side are negligible.


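Lemma 4.3. For any $\delta > 0$, any $z$ with $|\Im(z)| \ge \delta$, any $0 < \kappa < \frac{1}{2(2-\alpha)}$, and $R > 0$,

$$P\Big(\Big|\sum_{k \ne l} A_N^\kappa(0k) A_N^\kappa(0l) \big((A_N^\kappa - z)^{-1}\big)_{kl}\Big| > R\Big) \le \frac{2}{R^2 N^{2\epsilon-1}\delta^2},$$

with $\epsilon = 1 - \kappa(2-\alpha) > \frac12$.

Proof. Following Remark 2.5, we can always assume that the entries of $A_N^\kappa$ are centered. By independence of $A_N^\kappa(0k)$ and $A_N$, we find that the first moment of the off-diagonal term vanishes:

$$E\Big[\sum_{\substack{1 \le k,l \le N \\ k \ne l}} A_N^\kappa(0k) A_N^\kappa(l0) \big((A_N^\kappa - zI)^{-1}\big)_{kl}\Big] = 0,$$

and that the second moment is small:

$$
\begin{aligned}
E\Big[\Big|\sum_{k \ne l} A_N^\kappa(0k) A_N^\kappa(l0) \big((A_N^\kappa - z)^{-1}\big)_{kl}\Big|^2\Big]
&\le 2 \max_{i,j} E\big[A_N^\kappa(ij)^2\big]^2 \times E\Big[\sum_{k,l} \big|\big((A_N^\kappa - z)^{-1}\big)_{kl}\big|^2\Big] \\
&\le 2N^{-2\epsilon} E\big[\mathrm{tr}\big((A_N^\kappa - z)^{-1}(A_N^\kappa - \bar z)^{-1}\big)\big] \le \frac{2N^{-2\epsilon+1}}{|\Im(z)|^2}.
\end{aligned}
$$

Chebychev's inequality concludes the proof.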

We finally derive from the previous considerations a first approximation result for $L_N^{z,\kappa}$. This will be our first step to obtain a closed equation for the limit points of the spectral measure (such an equation will be derived in the next section).

Lemma 4.4. For $0 < \kappa < \frac{1}{2(2-\alpha)}$, let $\epsilon = 1 - \kappa(2-\alpha) > \frac12$. Let $z \in \mathbb{C}^+$. For any bounded Lipschitz function $f$,

$$\lim_{N\to\infty} \Big| E\big[L_N^{z,\kappa}(f)\big] - E\Big[f\Big(\Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk}\Big)^{-1}\Big)\Big] \Big| = 0.$$

Proof. It is clear, by Lemma 4.1, that it is sufficient to prove that, for a constant $c$, and every Lipschitz function $f$,

$$\Big| E\big[\bar L_{N+1}^{z,\kappa}(f)\big] - E\Big[f\Big(\Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk}\Big)^{-1}\Big)\Big] \Big| \le \frac{c\|f\|_L}{|\Im(z)|^2 N^{\frac{2\epsilon-1}{3}}}. \tag{28}$$

We have proved above that, for $z \in \mathbb{C}\setminus\mathbb{R}$, there exists a random variable $\varepsilon_N(z)$, the sum of the off-diagonal terms and $A_N(00)$, satisfying

$$P(|\varepsilon_N(z)| \ge R) \le \frac{8}{R^2 N^{2\epsilon-1}|\Im(z)|^2} + \frac{4\alpha}{R^2(2-\alpha)N^{\epsilon}},$$

such that

$$\bar G_{N+1}^\kappa(z)_{00} = \Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk} + \varepsilon_N(z)\Big)^{-1}.$$

In particular we have for any Lipschitz function $f$,

$$E\big[f(\bar G_{N+1}^\kappa(z)_{00})\big] = E\Big[f\Big(\Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk} + \varepsilon_N(z)\Big)^{-1}\Big)\Big]. \tag{29}$$

Observe that with $A_N^\kappa = U\,\mathrm{diag}(\lambda)\,U^*$,

$$G_N^\kappa(z)_{kk} = \sum_{i=1}^N |u_{ki}|^2 (z - \lambda_i)^{-1}$$

is such that $\Im(z)\,\Im\big(G_N^\kappa(z)_{kk}\big) \le 0$, $|G_N^\kappa(z)_{kk}| \le |\Im(z)|^{-1}$. In particular, we always have

$$\frac{\Im\big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk}\big)}{\Im(z)} \ge 1.$$

Thus, on $|\varepsilon_N(z)| \le |\Im(z)|/2$, we obtain the control

$$\Big|\Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk} + \varepsilon_N(z)\Big)^{-1} - \Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk}\Big)^{-1}\Big| \le \frac{2|\varepsilon_N(z)|}{|\Im(z)|^2}.$$

Hence, if $f$ is Lipschitz,

$$E\big[f\big(\bar G_{N+1}^\kappa(z)_{00}\big)\big] = E\Big[f\Big(\Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk}\Big)^{-1}\Big)\Big] + O(\|f\|_L)\Big(E\Big[\frac{|\varepsilon_N(z)|}{|\Im(z)|^2} \wedge 1\Big] + \frac{1}{N^{2\epsilon-1}|\Im(z)|^4}\Big),$$

where the last error comes from the weight of putting and removing the cutoff $|\varepsilon_N(z)| \le |\Im(z)|/2$, due to the fact that $\|f\|_L$ also bounds the uniform bound on $f$. Now, the right-hand side does not depend on the choice of the indices and so we have the same estimate

for all $E[f(\bar G_{N+1}^\kappa(z)_{kk})]$, for $k \in \{0,1,\dots,N\}$. Summing the resulting equalities we find that

$$E\big[\bar L_{N+1}^{z,\kappa}(f)\big] = E\Big[f\Big(\Big(z - \sum_{k=1}^N A_N^\kappa(0k)^2 G_N^\kappa(z)_{kk}\Big)^{-1}\Big)\Big] + O(\|f\|_L)\Big(E\Big[\frac{|\varepsilon_N(z)|}{|\Im(z)|^2} \wedge 1\Big] + \frac{1}{N^{2\epsilon-1}|\Im(z)|^4}\Big).$$

This proves the estimate (28) and thus the lemma.

5. The Limiting Equation

We prove in this section that the limit points of the sequence of measures $E[L_N^{z,\kappa}]$ satisfy an implicit equation. This section will rely heavily on a result about the convergence of sums of triangular arrays to complex stable laws. We have deferred to Appendix 10 the statements and proofs of these convergence results. We also refer to the same appendix for notations and references about complex stable laws.

Hereafter $z \in \mathbb{C}^+$ will be fixed. We have seen that $E[L_N^{z,\kappa}]$ is a compactly supported probability measure on $\mathbb{C}$ (since its support lies in the ball with radius $1/|\Im(z)|$). Therefore, $(E[L_N^{z,\kappa}])_{N\in\mathbb{N}}$ is tight, and we denote by $\mu^z$ a limit point. Recall that for $z \in \mathbb{C}^+$, $\mu^z$ is a probability measure on $\mathbb{C}^- \cap \{|y| \le 1/|\Im(z)|\}$.

In order to state the main result of this section we will need the following notations. For $t, z \in \mathbb{C}$, we denote by $\langle t, z\rangle$ the scalar product of $t$ and $z$ seen as vectors in $\mathbb{R}^2$, i.e. $\langle t, z\rangle = \Re(t)\Re(z) + \Im(t)\Im(z)$. For a probability measure $\mu$ on $\mathbb{C}$, and $t \in \mathbb{C}$, we define the numbers $\sigma_{\mu,\alpha}(t)$ and $\beta_{\mu,\alpha}(t)$ by:

$$\sigma_{\mu,\alpha}(t) = \Big(\frac{1}{C_\alpha}\int |\langle t, z\rangle|^\alpha \, d\mu(z)\Big)^{\frac{1}{\alpha}} \tag{30}$$

and

$$\beta_{\mu,\alpha}(t) = \frac{\int |\langle t, z\rangle|^\alpha\, \mathrm{sign}\langle t, z\rangle \, d\mu(z)}{\int |\langle t, z\rangle|^\alpha \, d\mu(z)}, \tag{31}$$

where

$$C_\alpha^{-1} = \int_0^\infty \frac{\sin x}{x^\alpha}\,dx = \frac{\Gamma(2-\alpha)\cos(\frac{\pi\alpha}{2})}{1-\alpha}. \tag{32}$$

Definition 5.1. For a probability measure $\mu$ on $\mathbb{C}$, we define the probability measure $P^\mu$ on $\mathbb{C}$ by its Fourier transform

$$\int e^{i\langle t, x\rangle}\, dP^\mu(x) = \exp\Big(-\sigma_{\mu,\frac{\alpha}{2}}(t)^{\frac{\alpha}{2}}\Big(1 - i\beta_{\mu,\frac{\alpha}{2}}(t)\tan\frac{\pi\alpha}{4}\Big)\Big).$$

$P^\mu$ is well defined by this Fourier transform; indeed $P^\mu$ is a complex stable distribution. For this description of $P^\mu$ see Appendix 10. We can now state the main result of this section.


Theorem 5.2. For $0 < \kappa < \frac{1}{2(2-\alpha)}$, the limit points $\mu^z$ of $E[L_N^{z,\kappa}]$ satisfy the equation

$$\int f \, d\mu^z = \int f\Big(\frac{1}{z - x}\Big)\, dP^{\mu^z}(x)$$

for every bounded continuous function $f$.

Proof. We consider a subsequence of $(E[L_N^{z,\kappa}])$ converging to $\mu^z$, i.e. an increasing function $\phi(N)$ such that $(E[L_{\phi(N)}^{z,\kappa}])$ converges weakly to $\mu^z$. We denote by $P_N^z$ the law of $\sum_{k=1}^N (A_N^\kappa(0k))^2 G_N^\kappa(z)_{kk}$. For $z \in \mathbb{C}^+$, $P_N^z$ is a probability measure on $\mathbb{C}^-$ since then $G_N(z)_{kk} \in \mathbb{C}^-$ for all $k$. If $f$ is Lipschitz, Theorem 5.2 is a direct consequence of the main result of the preceding section, i.e. Lemma 4.4, and of the next crucial Lemma 5.3.

Lemma 5.3. If $E[L_{\phi(N)}^{z,\kappa}]$ converges weakly to $\mu^z$ as $N$ goes to infinity, then $P_{\phi(N)}^z$ converges weakly to $P^{\mu^z}$ as $N$ goes to infinity.

It is then easy to see that the statement of Theorem 5.2 extends to any bounded continuous function. We now have to prove Lemma 5.3.

Proof. We apply first the following concentration result for $L_N^{z,\kappa}$.

Lemma 5.4. For $\kappa \in (0, \frac{1}{2-\alpha})$, let $\epsilon = 1 - \kappa(2-\alpha) > 0$. There exists a finite constant $c$ so that for $z \in \mathbb{C}\setminus\mathbb{R}$ and any Lipschitz function $f$ on $\mathbb{C}$,

$$P\big(\big|L_N^{z,\kappa}(f) - E[L_N^{z,\kappa}(f)]\big| \ge \delta\big) \le \frac{c\|f\|_L^2}{|\Im(z)|^4\delta^2} N^{-\epsilon}.$$

This lemma shows that since $E[L_{\phi(N)}^{z,\kappa}]$ converges weakly to $\mu^z$, then $L_{\phi(N)}^{z,\kappa}$ also converges almost surely to the non-random probability $\mu^z$. From there, one can apply Theorem 10.3 of Appendix 10 or more precisely its extension, Theorem 10.4, which has been built to fit exactly our needs here, when applied to the variables $X_k = A(0,k)^2$ and therefore with the normalizing sequence $a_N^2$. One must simply notice that the exponent $\alpha$ in Theorem 10.4 must be replaced here by $\frac{\alpha}{2}$. This concludes the proof of Lemma 5.3.

Proof of Lemma 5.4. We prove this concentration lemma using a standard martingale decomposition. We assume that $f$ is continuously differentiable, the generalization to any Lipschitz function being deduced by density. We put

$$F_N(A_N^\kappa(kl), k \le l) := L_N^{z,\kappa}(f) = \frac{1}{N}\sum_{k=1}^N f(G_N(z)_{kk}).$$

Let $n = N(N-1)/2 + N$ and index the set $(A_N^\kappa(kl), k \le l)$ by $(A_i^\kappa, 1 \le i \le N(N-1)/2 + N)$ for some lexicographic order. Then, if we let $\mathcal{F}_i = \sigma(A_j^\kappa, 1 \le j \le i)$, the independence and identical distribution of the $A_i^\kappa$'s shows that, if $P_N$ denotes the law of $A_i^\kappa$ (i.e. the properly truncated and normalized version of $P$),

$$
\begin{aligned}
E\big[(F_N - E[F_N])^2\big]
&= \sum_{i=0}^{n-1} E\big[(E[F_N|\mathcal{F}_{i+1}] - E[F_N|\mathcal{F}_i])^2\big] \\
&= \sum_{i=0}^{n-1} \int \Big(\int F_N(x_1,\cdot,x_{i+1},y_{i+2},\cdot,y_n)\,dP_N^{\otimes n}(y) - \int F_N(x_1,\cdot,x_i,y_{i+1},\cdot,y_n)\,dP_N^{\otimes n}(y)\Big)^2 dP_N^{\otimes i+1}(x) \\
&\le \sum_{i=0}^{n-1} \int \Big(F_N(x_1,\dots,x_{i+1},\dots,x_n) - \int F_N(x_1,\dots,x_i,y,x_{i+2},\dots,x_n)\,dP_N(y)\Big)^2 dP_N^{\otimes n}(x) \\
&\le \sum_{i=0}^{n-1} \|\partial_{x_{i+1}} F_N\|_\infty^2 \int (x - y)^2\, dP_N^{\otimes 2}(x,y).
\end{aligned} \tag{33}
$$

In our case, for all $k \in \{1,\dots,N\}$, all $m, l \in \{1,\dots,N\}$,

$$\partial_{A_{ml}} f(G_N(z)_{kk}) = f'(G_N(z)_{kk})\big(G_N(z)_{kl}G_N(z)_{mk} + G_N(z)_{km}G_N(z)_{lk}\big),$$

which yields

$$\partial_{A_{ml}} F_N(A) = \frac{1}{N}\sum_{k=1}^N f'(G_N(z)_{kk})\big(G_N(z)_{kl}G_N(z)_{mk} + G_N(z)_{km}G_N(z)_{lk}\big) = \frac{1}{N}\Big([G_N(z)D(f)G_N(z)]_{ml} + [G_N(z)D(f)G_N(z)]_{lm}\Big)$$

with $D(f)$ the diagonal matrix with entries $(f'(G_N(z)_{kk}))_{1 \le k \le N}$. Note that the spectral radius of $G_N(z)D(f)G_N(z)$ is bounded by $\|f'\|_\infty/|\Im(z)|^2$ and so since for all $l, m \in \{1,\dots,N\}$,

$$\big|[G_N(z)D(f)G_N(z)]_{lm}\big| \le \|G_N(z)D(f)G_N(z)\|_\infty \le \|f'\|_\infty/|\Im(z)|^2,$$

we conclude that for all $l, m \in \{1,\dots,N\}$,

$$|\partial_{A_{ml}} F(A)| \le \frac{2\|f'\|_\infty}{N|\Im(z)|^2}.$$

Thus, (33) shows that

$$E\big[(F_N - E[F_N])^2\big] \le \frac{4\|f'\|_\infty^2}{N^2|\Im(z)|^4}\frac{N^2}{2}\max_{k,l} E\big[(A_{kl}^\kappa - E[A_{kl}^\kappa])^2\big] \le \frac{2\|f'\|_\infty^2}{|\Im(z)|^4} N^{-\epsilon},$$

where we used the truncated moment estimate (17). Chebychev's inequality then provides the announced bound.

We now apply Theorem 5.2 for a particular choice of the function $f$. To this end, we need to define, for any $\alpha > 0$, the usual branch of the power function $x \to x^\alpha$, which is the analytic function on $\mathbb{C}\setminus\mathbb{R}^-$ such that $(i)^\alpha = e^{i\frac{\pi\alpha}{2}}$. This amounts to choosing, if $x = re^{i\theta}$ with $\theta \in ]-\pi,\pi[$, $x^\alpha = r^\alpha e^{i\alpha\theta}$. This function is analytic on $\mathbb{C}\setminus\mathbb{R}^-$ and extends by continuity to $x = re^{i\theta}$ with $\theta$ decreasing to $-\pi$,

$$\lim_{\theta\downarrow-\pi} (re^{i\theta})^\alpha = r^\alpha e^{-i\pi\alpha}.$$

When $x = re^{i\theta}$ is on the other side of the cut $\mathbb{R}^-$, i.e. when $\theta$ is slightly smaller than $\pi$, the function jumps by a multiplicative factor $e^{2i\alpha\pi}$. We want to choose in Theorem 5.2 the analytic function on $\mathbb{C}\setminus\mathbb{R}^-$, $f(x) = x^{\frac{\alpha}{2}}$.

Theorem 5.5. For $0 < \kappa < \frac{1}{2(2-\alpha)}$, let $\mu^z$ be a limit point of $E[L_N^{z,\kappa}]$ and define $X_{\mu^z} := \int x^{\frac{\alpha}{2}}\,d\mu^z(x)$. Then

1. $X_{\mu^z}$ is analytic in $\mathbb{C}^+$ and $|X_{\mu^z}| \le \frac{1}{|\Im(z)|^{\frac{\alpha}{2}}}$,
2. $X_{\mu^z}$ is a solution of the following equation:

$$X_{\mu^z} = iC(\alpha)\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} \exp\{-c(\alpha)(it)^{\frac{\alpha}{2}} X_{\mu^z}\}\,dt, \tag{34}$$

with $C(\alpha) = \frac{e^{-i\frac{\pi\alpha}{2}}}{\Gamma(\frac{\alpha}{2})}$ and $c(\alpha) = \Gamma(1 - \frac{\alpha}{2})$.

Proof. The first point is obvious. Indeed, for some increasing function $\phi$,

$$X_{\mu^z} = \lim_{N\to\infty} X_{\phi(N)}^z, \qquad X_N^z := E\Big[\frac{1}{N}\sum_{k=1}^N \big((z - A_N^\kappa)^{-1}_{kk}\big)^{\frac{\alpha}{2}}\Big].$$

For each $N$, $X_N^z$ is an analytic function on $\mathbb{C}^+$. Moreover, $|X_N^z| \le \frac{1}{|\Im(z)|^{\frac{\alpha}{2}}}$ for all $N$. This entails that any limit point $X_{\mu^z}$ must also be analytic in $\mathbb{C}^+$. In order to prove the second point and obtain the closed equation (34) we will need the following classical identity:

Lemma 5.6. For all $z \in \mathbb{C}^+$,

$$\Big(\frac{1}{z}\Big)^{\frac{\alpha}{2}} = iC(\alpha)\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz}\,dt$$

with $C(\alpha) = \frac{e^{-i\frac{\pi\alpha}{2}}}{\Gamma(\frac{\alpha}{2})}$.

This lemma is proven by a simple contour integration; it is also a consequence of Lemma 6.2, proven in the next section (plug $y = 0$ in the statement of Lemma 6.2).

By Theorem 5.2, and since $\mu^z$ and $P^{\mu^z}$ are supported in $\mathbb{C}^-$, we can write

$$X_{\mu^z} = \int \Big(\frac{1}{z - x}\Big)^{\frac{\alpha}{2}} dP^{\mu^z}(x).$$

Applying Lemma 5.6 to $z \to z - x \in \mathbb{C}^+$ for $P^{\mu^z}$ almost all $x$, and integrating over the $x$'s we have, by Fubini's theorem,

$$X_{\mu^z} = iC(\alpha)\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} \int e^{-itx}\,dP^{\mu^z}(x)\,dt. \tag{35}$$

We now use Theorem 10.5 in the Appendix, with $\nu = \mu^z$ here, and replacing $\alpha$ in Theorem 10.5 by $\frac{\alpha}{2}$. We see that:

$$\int e^{-itx}\,dP^{\mu^z}(x) = \exp\Big\{-c(\alpha)(it)^{\frac{\alpha}{2}}\int x^{\frac{\alpha}{2}}\,d\mu^z(x)\Big\}. \tag{36}$$

Plugging this equality into (35) yields

$$X_{\mu^z} = iC(\alpha)\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} \exp\Big\{-c(\alpha)(it)^{\frac{\alpha}{2}}\int x^{\frac{\alpha}{2}}\,d\mu^z(x)\Big\}\,dt. \tag{37}$$

We have obtained the announced closed equation

$$X_{\mu^z} = iC(\alpha)\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} \exp\{-c(\alpha)(it)^{\frac{\alpha}{2}} X_{\mu^z}\}\,dt. \tag{38}$$
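The classical identity of Lemma 5.6 lends itself to a direct numerical check. The Python sketch below is an illustration under stated assumptions: it uses the substitution $t = v^{2/\alpha}$ to remove the integrable singularity at $t = 0$, a composite Simpson rule on a truncated interval $[0, V]$, and Python's principal-branch complex powers, which coincide with the branch fixed above. The function name and the default values of `V` and `n` are choices made for the example.

```python
import cmath
import math

def lemma_5_6_rhs(z, alpha, V=12.0, n=20_000):
    """i*C(alpha) * int_0^inf (it)^(alpha/2 - 1) e^{itz} dt for Im z > 0.

    With s = alpha/2 and t = v^(1/s) one has t^(s-1) dt = dv / s, so the
    integral becomes (1/s) * int_0^inf exp(i z v^(1/s)) dv.
    """
    s = alpha / 2.0
    C = cmath.exp(-1j * math.pi * alpha / 2.0) / math.gamma(s)
    phase = cmath.exp(1j * math.pi * (s - 1.0) / 2.0)  # (it)^(s-1) = phase * t^(s-1)

    def g(v):
        return cmath.exp(1j * z * v ** (1.0 / s))

    h = V / n                                          # n must be even for Simpson's rule
    total = g(0.0) + g(V)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(k * h)
    integral = total * h / 3.0 / s
    return 1j * C * phase * integral
```

For $\alpha = 1$ and $z = 0.3 + i$ the value returned by `lemma_5_6_rhs(z, 1.0)` should agree with `(1 / z) ** 0.5` to high accuracy. For $\alpha > 1$ the transformed integrand is less smooth at $v = 0$, so the agreement is expected to be coarser with the same grid.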

6. Proofs of Theorem 1.1 and of Theorem 1.4

In this section we gather the preceding arguments and prove Theorem 1.1 and Theorem 1.4. This proof will be based on the following uniqueness result for the closed equation (34). We recall the notation

$$g_\alpha(y) := \frac{2}{\alpha}\int_0^\infty e^{-v^{\frac{2}{\alpha}}} e^{-vy}\,dv = \int_0^\infty t^{\frac{\alpha}{2}-1} e^{-t} \exp\{-t^{\frac{\alpha}{2}} y\}\,dt.$$

Theorem 6.1. 1. There exists a unique analytic function $X_z$ of $z \in \mathbb{C}^+$, such that $|X_z| = O(|\Im(z)|^{-\frac{\alpha}{2}})$ at infinity, satisfying the equation

$$X_z = iC(\alpha)\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} \exp\{-c(\alpha)(it)^{\frac{\alpha}{2}} X_z\}\,dt. \tag{39}$$

2. This solution in fact also satisfies: $|X_z| = O(|z|^{-\frac{\alpha}{2}})$.
3. If one defines $Y_z := (-\frac{1}{z})^{\frac{\alpha}{2}} X_z$, then $Y_z$ is the unique solution of the equation $(-z)^\alpha Y_z = C(\alpha) g_\alpha(c(\alpha) Y_z)$, analytic on $\mathbb{C}^+$ and tending to zero at infinity. In fact $|Y_z| = O(|z|^{-\alpha})$.
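Writing $y = c(\alpha) Y_z$ and $u = c(\alpha) C(\alpha) (-z)^{-\alpha}$, the equation in item 3 takes the form $u\, g_\alpha(y) = y$, and for large $|z|$ the parameter $u$ is small. The following Python sketch solves this equation for a small complex $u$ by plain fixed-point iteration. This numerical scheme is an assumption made for illustration and is not the argument of the text; the quadrature parameters `V` and `n`, and the function names, are likewise choices made for the example (for $\alpha$ close to 2 a larger `V` would be needed).

```python
import cmath
import math

def g_alpha(y, alpha, V=20.0, n=4_000):
    """g_alpha(y) = (2/alpha) * int_0^inf exp(-v^(2/alpha)) exp(-v y) dv,
    approximated by the composite Simpson rule on [0, V] (n even)."""
    p = 2.0 / alpha

    def f(v):
        return cmath.exp(-(v ** p) - v * y)

    h = V / n
    total = f(0.0) + f(V)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(k * h)
    return (2.0 / alpha) * total * h / 3.0

def solve_fixed_point(u, alpha, iterations=40):
    """Iterate y <- u * g_alpha(y) from y = 0; a contraction when |u| is small."""
    y = 0j
    for _ in range(iterations):
        y = u * g_alpha(y, alpha)
    return y
```

As a consistency check, `g_alpha(0, alpha)` should be close to $\Gamma(\frac{\alpha}{2})$, which follows from the second expression for $g_\alpha$ at $y = 0$. For $u = 0.05\,e^{0.7i}$ and $\alpha = 1$, the returned $y$ satisfies $u\,g_\alpha(y) = y$ up to rounding and has modulus of order $|u|$.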


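Proof. We already know that there exists such an analytic solution $X_z$. Indeed we have seen in the preceding section that, if $\mu^z$ is a limit point, then $X_{\mu^z}$ is such a solution. In order to prove uniqueness, we will use the following.

Lemma 6.2. For all $z \in \mathbb{C}^+$ and any $y \in \mathbb{C}$,

$$\Big(-\frac{1}{z}\Big)^{\frac{\alpha}{2}} g_\alpha(y) = i\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} \exp\big[-(-z)^{\frac{\alpha}{2}}(it)^{\frac{\alpha}{2}} y\big]\,dt.$$

Proof. We write $z = re^{i\theta}$ with some $\theta \in ]0,\pi[$. Assume first that $\theta \in ]0,\frac{\pi}{2}[$. Fix $\beta > 0$. Since $f(u) = u^{\frac{\alpha}{2}-1} e^{uz} e^{-u^{\frac{\beta}{2}}[(-z)^{\frac{\alpha}{2}} y]}$ is analytic in $\mathbb{C}\setminus\mathbb{R}^-$, for all $0 < \epsilon < R$ finite, its integral over the contour

$$\Gamma = \{it,\ \epsilon \le t \le R\} \cup \Big\{e^{i\eta}R,\ \eta \in \Big[\frac{\pi}{2}, \pi - \theta\Big]\Big\} \cup \{e^{i\pi - i\theta}t,\ \epsilon \le t \le R\} \cup \Big\{e^{i\eta}\epsilon,\ \eta \in \Big[\frac{\pi}{2}, \pi - \theta\Big]\Big\}$$

vanishes. Note that $\eta + \theta \in [\frac{\pi}{2} + \theta, \pi]$ so that $\Re(Re^{i\eta}z) = Rr\cos(\eta + \theta) < 0$ for all $\eta \in [\frac{\pi}{2}, \pi - \theta]$ and $\theta \in ]0,\frac{\pi}{2}[$. This shows that

$$\lim_{R\to\infty} R f(e^{i\eta}R) = 0 \ \ \forall \eta \in \Big[\frac{\pi}{2}, \pi - \theta\Big] \ \Rightarrow\ \lim_{R\to\infty} R\int_{\eta \in [\frac{\pi}{2}, \pi - \theta]} f(e^{i\eta}R)\,d\eta = 0.$$

Similarly,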

iη iη < ∞ ⇒ lim f (e )dη lim sup !

" ! " f (e )dη = 0.

→0 →0 η∈ π ,π −θ η∈ π ,π −θ 2

2

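Hence, letting $R \to \infty$ and $\epsilon \to 0$, we find

$$i\int_0^\infty f(it)\,dt + \int_{+\infty}^0 f(e^{i(\pi-\theta)}t)\,e^{i(\pi-\theta)}\,dt = 0.$$

In other words,

$$
\begin{aligned}
i\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} e^{-(it)^{\frac{\beta}{2}}[(-z)^{\frac{\alpha}{2}} y]}\,dt
&= -\int_0^\infty (-e^{-i\theta}t)^{\frac{\alpha}{2}-1} e^{-t|z|} e^{-(-e^{-i\theta}t)^{\frac{\beta}{2}}[(-z)^{\frac{\alpha}{2}} y]} e^{-i\theta}\,dt \\
&= -z^{-1}\int_0^\infty (-z^{-1}t)^{\frac{\alpha}{2}-1} e^{-t} e^{-(-z^{-1}t)^{\frac{\beta}{2}}[(-z)^{\frac{\alpha}{2}} y]}\,dt,
\end{aligned}
$$

where we finally did the change of variable $t' = |z|t$. Noting the obvious facts $(-z^{-1}t)^{\frac{\alpha}{2}-1} = (-z^{-1})^{\frac{\alpha}{2}-1} t^{\frac{\alpha}{2}-1}$ and $(-z^{-1})^{\frac{\alpha}{2}-1}(-z)^{\frac{\alpha}{2}-1} = 1$, we thus have proved that

$$i\int_0^\infty (it)^{\frac{\alpha}{2}-1} e^{itz} \exp\{-(-z)^{\frac{\beta}{2}}(it)^{\frac{\beta}{2}} y\}\,dt = (-z^{-1})^{\frac{\alpha}{2}}\int_0^\infty t^{\frac{\alpha}{2}-1} e^{-t} e^{-y t^{\frac{\beta}{2}}}\,dt, \tag{40}$$

which upon taking $\beta = \alpha$ proves the claim when $\theta \in ]0,\frac{\pi}{2}[$. The case $\theta \in [\frac{\pi}{2},\pi[$ is identical after an immediate modification of the definition of the contour. It can also be obtained by a trivial analytic extension argument.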


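By Lemma 6.2 we remark that, if $X_z$ is a solution of Eq. (39) and if $z = |z|e^{i\theta}$,

$$X_z = -e^{-i\theta} C(\alpha)\int_0^\infty (-e^{-i\theta}t)^{\frac{\alpha}{2}-1} e^{-t|z|} \exp\{-c(\alpha)(-e^{-i\theta}t)^{\frac{\alpha}{2}} X_z\}\,dt \qquad (41)$$
$$= -\frac{1}{z} C(\alpha)\int_0^\infty \left(-\frac{t}{z}\right)^{\frac{\alpha}{2}-1} e^{-t} \exp\left\{-c(\alpha)\left(-\frac{t}{z}\right)^{\frac{\alpha}{2}} X_z\right\} dt$$
$$= \left(-\frac{1}{z}\right)^{\frac{\alpha}{2}} C(\alpha)\int_0^\infty t^{\frac{\alpha}{2}-1} e^{-t} \exp\left\{-c(\alpha)\, t^{\frac{\alpha}{2}} \left(-\frac{1}{z}\right)^{\frac{\alpha}{2}} X_z\right\} dt. \qquad (42)$$

Hence, if $Y_z := (-\frac{1}{z})^{\frac{\alpha}{2}} X_z$, we obtain

$$(-z)^\alpha Y_z = C(\alpha)\int_0^\infty t^{\frac{\alpha}{2}-1} e^{-t} \exp\{-c(\alpha)\, t^{\frac{\alpha}{2}} Y_z\}\,dt. \qquad (43)$$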

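This equation for $Y_z$ can be written simply as $(-z)^\alpha Y_z = C(\alpha) g_\alpha(c(\alpha) Y_z)$. We recall that we have assumed that there exists a constant $C_1$ such that $|X_z| \le C_1 \Im(z)^{-\frac{\alpha}{2}}$. Now, consider the function of two complex variables $F(u, y) = u g_\alpha(y) - y$. Obviously $F(0, 0) = 0$ and $\partial_y F(0, 0) = -1$. By the local implicit function theorem, there exist $\epsilon_1 > 0$ and $\epsilon_2 > 0$, such that for every $u \in \mathbb{C}$ with $|u| < \epsilon_1$ there exists a unique $y(u) \in \mathbb{C}$ with $|y(u)| < \epsilon_2$ satisfying the equation $F(u, y(u)) = 0$, i.e. $u g_\alpha(y(u)) = y(u)$. Moreover

$$|y(u)| \le C|u| \qquad (44)$$

and $y$ is an analytic function on $|u| < \epsilon_1$. For any $z \in \mathbb{C}^+$ such that $\Im(z) > L$, with

$$L^\alpha > \frac{c(\alpha)C(\alpha)}{\epsilon_1} \vee \frac{c(\alpha)C_1}{\epsilon_2},$$

we have $|X_z| \le C_1 L^{-\frac{\alpha}{2}}$, so that $|Y_z| \le \frac{C_1}{L^\alpha} \le \frac{\epsilon_2}{c(\alpha)}$. Thus for $z \in \mathbb{C}^+$ such that $\Im(z) > L$ we have that

$$\left|\frac{c(\alpha)C(\alpha)}{(-z)^\alpha}\right| \le \epsilon_1, \qquad |c(\alpha) Y_z| \le \epsilon_2.$$

Therefore, the uniqueness in the local implicit function theorem, applied with $u = c(\alpha)C(\alpha)(-z)^{-\alpha}$ and $y = c(\alpha)Y_z$, shows that $Y_z$ is given by $Y_z = \frac{1}{c(\alpha)}\, y\!\left(\frac{c(\alpha)C(\alpha)}{(-z)^\alpha}\right)$ and thus that $X_z = (-\frac{1}{z})^{-\frac{\alpha}{2}} Y_z$ is uniquely determined there. Since $X_z$ is analytic on $z \in \mathbb{C}^+$ and uniquely determined on the set of $z \in \mathbb{C}^+$ such that $\Im(z) > L$, it is uniquely determined. This proves the claim of uniqueness for $X_z$. Using the bound (44) now proves the improved bound at infinity, i.e. $|X_z| = O(|z|^{-\frac{\alpha}{2}})$. These arguments prove the second and third statements of the theorem.

We can now deduce from this last uniqueness result the convergence of the mean of the normalized trace of the resolvent.

Theorem 6.3. For any $\kappa \in ]0, \frac{1}{2(2-\alpha)}[$, any $z \in \mathbb{C}^+$, $E[\frac{1}{N}\sum_{k=1}^N G_N^\kappa(z)_{kk}]$ converges as $N$ goes to infinity to

$$G_\alpha(z) := i\int_0^\infty e^{itz} e^{-c(\alpha)(it)^{\frac{\alpha}{2}} X_z}\,dt = -\frac{1}{z}\int_0^\infty e^{-t} e^{-c(\alpha) t^{\frac{\alpha}{2}} Y_z}\,dt. \qquad (45)$$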
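The uniqueness step in the proof of Theorem 6.1 rests on solving $u\,g_\alpha(y) = y$ for small $|u|$. The sketch below illustrates this numerically with a plain fixed-point iteration $y \mapsto u\,g_\alpha(y)$, which contracts when $|u|$ is small. The iteration scheme, the function names, and the quadrature parameters are assumptions made for illustration; the paper itself argues through the implicit function theorem, not through this iteration.

```python
import cmath
import math


def g_alpha(y, alpha, v_max=20.0, n=5_000):
    """Trapezoid-rule value of (2/alpha) * int_0^inf exp(-v^(2/alpha)) exp(-v*y) dv, y complex."""
    h = v_max / n
    total = 0j
    for k in range(n + 1):
        v = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(-v ** (2.0 / alpha)) * cmath.exp(-v * y)
    return (2.0 / alpha) * total * h


def solve_small_u(u, alpha, iters=40):
    """Iterate y <- u * g_alpha(y) from y = 0 (hypothetical helper, valid for small |u| only)."""
    y = 0j
    for _ in range(iters):
        y = u * g_alpha(y, alpha)
    return y


if __name__ == "__main__":
    u = 0.05 * cmath.exp(0.3j)
    y = solve_small_u(u, 1.0)
    # To first order y(u) is close to u * Gamma(alpha/2), in line with the bound (44).
    print(abs(y), abs(u) * math.gamma(0.5))
```

In the setting of the proof, $u$ plays the role of a small parameter proportional to $(-z)^{-\alpha}$, so small $|u|$ corresponds to large $\Im(z)$.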


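Proof. For any $z \in \mathbb{C}^+$ and any limit point $\mu^z$,

$$\int x\,d\mu^z(x) = \int \frac{1}{z-x}\,dP^{\mu^z}(x) = i\int_0^\infty \int e^{it(z-x)}\,dP^{\mu^z}(x)\,dt = i\int_0^\infty e^{itz} e^{-c(\alpha)(it)^{\frac{\alpha}{2}} X_z}\,dt.$$

The uniqueness of $X_z$ implies that the mean of the resolvent $E[N^{-1}\mathrm{tr}(z - A_N^\kappa)^{-1}]$ has a unique limit point which is given by

$$G_\alpha(z) = i\int_0^\infty e^{itz} e^{-c(\alpha)(it)^{\frac{\alpha}{2}} X_z}\,dt.$$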

This shows that $E[N^{-1}\mathrm{tr}(z - A_N^\kappa)^{-1}]$ converges to $G_\alpha(z)$. In order to finish the proof, observe that for $z \in \mathbb{C}^+$, we can use the same arguments as in the proof of Lemma 6.2 to see that

$$G_\alpha(z) = i\int_0^\infty e^{itz} e^{-c(\alpha)(it)^{\frac{\alpha}{2}} X_z}\,dt = -\frac{1}{z}\int_0^\infty e^{-t} e^{-c(\alpha)(-tz^{-1})^{\frac{\alpha}{2}} X_z}\,dt = -\frac{1}{z}\int_0^\infty e^{-t} e^{-c(\alpha) t^{\frac{\alpha}{2}} Y_z}\,dt. \qquad (46)$$

This last result enables us to conclude the proof of Theorem 1.1 and Theorem 1.4.

Proof of Theorem 1.1 and Theorem 1.4. By Lemma 3.1, $E[\hat\mu_{A_N^\gamma}]$ is tight for the weak topology. Taking any subsequence, we see that any limit point $\mu$ is such that its Stieltjes transform must be equal to $G_\alpha(z)$ for all $z \in \mathbb{C}^+$. This prescribes uniquely the limit point $\mu$ and thus ensures the convergence of $E[\hat\mu_{A_N^\gamma}]$ towards $\mu \in \mathcal{P}(\mathbb{R})$ so that

$$\int (z-x)^{-1}\,d\mu(x) = G_\alpha(z), \qquad z \in \mathbb{C}^+.$$

By Corollary 2.2, and the fact that $d_1(E[\hat\mu_{A_N^\kappa}], E[\hat\mu_{A_N}]) \le E[d_1(\hat\mu_{A_N^\kappa}, \hat\mu_{A_N})]$, we also conclude that $E[\hat\mu_{A_N}]$ converges weakly towards $\mu$. By Lemma 5.4, for any $z \in \mathbb{C}\setminus\mathbb{R}$, $L_N^{z,\kappa}(x) = \int (z-x)^{-1}\,d\hat\mu_{A_N^\kappa}(x)$ converges in probability towards $G_\alpha(z)$. This convergence holds as well for finite dimensional vectors $(\int (z_i - x)^{-1}\,d\hat\mu_{A_N^\kappa}(x),\ 1 \le i \le n)$. Since $\{(z-x)^{-1},\ z \in \mathbb{C}\setminus\mathbb{R}\}$ is dense in the set $C_0(\mathbb{R})$ of functions on $\mathbb{R}$ going to zero at infinity, we conclude that $\int f(x)\,d\hat\mu_{A_N^\kappa}(x)$ converges in probability towards $\int f(x)\,d\mu(x)$ for all $f \in C_0(\mathbb{R})$. But also $\hat\mu_{A_N^\kappa}(1) = \mu(1) = 1$ and so this vague convergence can be strengthened in a weak convergence (see the proof of Lemma 2.1). We


finally can remove the truncation by κ by using Corollary 2.2. Again by Lemma 5.4, L z,κ (x) = (z − x)−1 d µˆ A N (x) converges almost surely along subsequences φ(N ) so N that φ(N )− < ∞ by the Borel-Cantelli Lemma. As = α2 − 2−α κ is as close to one as wished, for any sequence φ(N ) so that φ(N )−ε < ∞ for some ε < 1, we can z,κ choose κ close enough to one so that L φ(N ) (x) converges almost surely to G(z). This entails the almost sure weak convergence of µˆ Aφ(N ) by the same arguments as above. Remark 6.4. If we could prove that the equation given in Theorem 5.2 admits a unique solution µz , at least for z in a set large enough, the convergence of E[L z,κ N ] to this solution would be assured. We cannot prove this uniqueness result. But as we have seen we do not really need such a strong uniqueness statement either. We rather have proved a z (x), which already entails the uniqueness weaker statement, i.e the uniqueness of xdµ z,κ of the limit points for E[ xd L N (x)], i.e the mean Stieltjes transform of the spectral measure of AκN . This is sufficient for our needs but the question of the uniqueness of solutions to the equation given in Theorem (5.2) remains intriguing. 7. Study of the Limiting Measure. Proof of Theorem 1.6 In this section, we prove Theorem 1.6. First, the fact that the limit measure µα is symmetric is obvious. It suffices to consider the case where the entries have symmetric distributions. To prove the other statements, we need to consider the limit of G α (z) as z tends to a positive real number x. We first remark that the analytic function Yz defined on C+ is univalent (i.e one-to-one). Indeed this is an obvious consequence of the equation, valid for z ∈ C+ : (−z)α Yz = C(α)gα (c(α)Yz ). In order to study the boundary behavior of G α (z), we thus have to study the boundary behavior of the univalent function Yz . For x ∈ R, the cluster set Cl(x) is defined as the set of limit points of Yz when z tends to x (see [5] or [10]). 
It is easy to see that for any non-zero x ∈ R the cluster set Cl(x) is reduced to one point in C ∪ {∞}. Indeed, assuming w.l.o.g that x > 0 we have, for any finite v ∈ Cl(x), the equality gα (v) = (−x)α v. If Cl(x) contains two points it is a continuum, i.e a compact connected set with more than one point (see [5]). By analytic continuation we would then get the equality gα (v) = (−x)α v for every v ∈ C which is false. The only remaining possibility for Cl(x) is to be reduced to one finite point or to the point at infinity. We define K α = {x ∈ R, Cl(x) = {∞}}. We first prove that K α is bounded. The proof of Theorem 6.1, using the local implicit function theorem at infinity, shows that Yz admits an analytic extension to the set{z ∈ C, |z| > L} for L large enough, and that this extension satisfies |Yz | = O(|z|−α ). This obviously proves that, when |x| > L, the cluster set Cl(x) is reduced to one finite point and thus that K α is bounded. We consider the complement Uα of K α . Let x ∈ Uα and Yx be the unique point in the cluster set Cl(x). By continuity, for x > 0, Yx satisfies the equation eiπ α x α Yx = C(α)g(c(α)Yx ).


The local implicit function theorem can be applied to this equation at $(x, Y_x)$, except for the subset, say $F$, of $\mathbb{R}$ where the derivative vanishes. The exceptional set $F$ must be bounded, since the derivative does not vanish at infinity, and its points must all be isolated. Thus $F$ is finite. For any $x \in U_\alpha\setminus F$, the implicit function theorem shows that $Y_z$ can be extended analytically on a complex neighborhood of $x$. Hence $U'_\alpha := U_\alpha\setminus F$ is open and its complement $K'_\alpha = K_\alpha \cup F \cup \{0\}$ is closed. $K'_\alpha$ is also bounded and thus compact. Finally we use Beurling's Theorem which states that the set $K_\alpha$ has capacity zero, and thus also the set $K'_\alpha$ (see [5] or [10]). For any point $x$ in the open set $U'_\alpha$ the function $Y_z$ admits an analytic extension to a complex neighborhood of $x$, and thus the Stieltjes transform $G_\alpha(z)$ admits a smooth extension, which proves that $\mu_\alpha$ has a smooth density $\rho_\alpha$ on the open set $U'_\alpha$. Indeed, for $x \in U'_\alpha$,

$$\lim_{z\to x} G_\alpha(z) = H\mu_\alpha(x) - i\pi\rho_\alpha(x) = -\frac{1}{x}\int_0^\infty e^{-t} e^{-c(\alpha) t^{\frac{\alpha}{2}} Y_x}\,dt.$$

In particular the density of the measure $\mu_\alpha$ is given, if $Y_x = r_x e^{i\phi_x}$, by

$$\rho_\alpha(x) = -\frac{1}{\pi x}\int_0^\infty e^{-t} e^{-c(\alpha) t^{\frac{\alpha}{2}} [r_x\cos(\phi_x)]} \sin[c(\alpha)\, t^{\frac{\alpha}{2}} r_x \sin(\phi_x)]\,dt. \qquad (47)$$

Note that we now know that Yx is well defined and smooth for x large enough. We also have seen that Yx = O(|x|−α ) and thus that Yx ∼ eiπ α C(α)gα (0)x −α . Hence, when x → ∞, the following asymptotic behavior holds for G α (x) = lim z→x G α (z) : α 1 ∞ −t G α (x) ∼ e (1 − c(α)t 2 Yx (1 + o(1)))dt x 0 ∞ α 1 e−t t 2 dtYx (1 + o(1))). ≈ (1 − c(α) x 0 Identifying the imaginary parts of both sides we get: α (Yx ) ρα (x) ∼ π −1 c(α) 1 + , 2 x which proves the last statement of Theorem 1.6. 8. Cizeau and Bouchaud’s Characterization In [3], the authors propose the following argument; they look at G N (z)00 for z on the real line. By arguments similar to those we used (but with no a priori bounds on the G N (z)kk ) they argue that G N (z)00 converges in law as N goes to infinity. The limit law, that we will denote PG to follow their notations (but which is µz in ours) is then given by the implicit equation (11) in [3], 1 1 1 d PS (y) = . f (y)d PG (y) = z − f f (y)d P S z−y y2 y C(z),β(z)

PS = L α is now a real-valued stable law with parameters C(z) and β(z) given self-consistently (see (12a) and (12b) in [3]) by


α

C(z) =


|y| 2 d PG (y) =

α 1 , |y| 2 −2 d PS x − y

α

β(z) =

|y| 2 sign(y)d PG (y),

where there was a typographical error in the definition of β in [3]. Equation 12b was already noticed in [4]. We in fact have that for any real t, α πα −1 2 e−it y d PS (y) = e−Cα t (C(z)−i tan( 4 )β(z))

= e− (α−1)(it) α 2

α 2

α

(x) 2 d PG (x)

,

(48)

− iπ4α

where we used that K z := (x) d PG (x) = e [cos( π4α )C(z) − i sin( π4α )β(z)]. So, we see that the description of the limit law is very similar to ours, except that z is supposed to belong to R. Let us assume (as seems to be the case in [3]) that C(z) and β(z) are finite. Then, also K z is finite and we see that for non-negative real z’s, α K z = (z − y)− 2 d PS (y) ∞ α α 2 = −C(α) t 2 −1 e−t z e− (α−1)(it) K z dt. (49) 0

Hence, K z and the X z introduced in Sect. 7 satisfy formally the same equation, except that X z satisfies it for z ∈ C+ and K z for real z’s. Moreover, we have seen that X z can be extended continuously to z real in (K α )c and then this extension X z satisfies the same equation that K z . This indicates that we expect K z and X z to be equal, at least on (K α )c . In fact, X z is the unique solution of this equation with an analytic extension to C+ and going to zero at infinity. In [3], under (12a–12b), it is claimed that the equations defining C(z), β(z) have a unique solution, and so that K z is also determined uniquely by (49). We could not prove the uniqueness of the solutions to this equation on the real line. In any case, if we believe either that K z extends analytically on C+ and goes to zero at infinity or that the above equation has a unique solution for z ∈ R, we must have X z = K z at least for z ∈ (K α )c . The second claim of [3] is that the density of the limiting spectral measure ρ(z)dz = dµ(z) is given, see [3] (14), by d PS ρ(z) = (z). dz Note that by Fourier inversion, if K z = X z , for z > 0, since PS is a probability measure on R with Fourier transform given by (48), α d PS 1 2 (z) = e−it z e− (α−1)(it) X z dt dz 2π R ∞ α 1 −it z − (α−1)(it) 2 X z e e dt = π 0 ∞ α 1 1 −t − (α−1)(t) 2 Yz e e dt , =− π z 0 and therefore we recover our result (47). Hence, at least for z ∈ (K α )c , the prediction of [3] coincides with our result if we believe that (49) has a unique solution.


9. The Moment Method. Proof of Theorem 1.8

We prove here Theorem 1.8 using the moment method developed by I. Zakharevich [15]. For any $B > 0$, we consider the matrix $X_N^B$ with truncated entries $x_{ij}^B = x_{ij} 1_{|x_{ij}| \le B a_N}$ and the normalized matrix $A_N^B = a_N^{-1} X_N^B$. Recall that we work here under the additional hypothesis (7):

$$\lim_{u\to\infty} \frac{P(x(ij) > u)}{P(|x(ij)| > u)} = \theta \in [0, 1].$$

We begin by the following estimate on moments of the entries of $A_N^B$.

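Lemma 9.1. For any integer $m \ge 1$, the following limit exists:

$$C_m = \lim_{N\to\infty} \frac{E[A_N^B(ij)^m]}{N^{\frac{m}{2}-1}\, E[A_N^B(ij)^2]^{\frac{m}{2}}}.$$

Moreover, if $m = 2k$ is even,

$$C_m = \frac{2-\alpha}{m-\alpha}\left(\frac{2-\alpha}{\alpha} B^\alpha\right)^{\frac{m}{2}-1}.$$

If $m = 2k-1$ is odd,

$$C_m = (2\theta - 1)\,\frac{2-\alpha}{m-\alpha}\left(\frac{2-\alpha}{\alpha} B^\alpha\right)^{\frac{m}{2}-1}.$$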
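To see where the even-moment constant in Lemma 9.1 comes from, one can work out a case in which everything is explicit. The sketch below assumes that $|x(ij)|$ is exactly Pareto, $P(|x| > u) = u^{-\alpha}$ for $u \ge 1$, so that $a_N = N^{1/\alpha}$ and the truncated moments have a closed form. This distributional choice and all function names are illustrative assumptions; the lemma itself only requires a regularly varying tail.

```python
def truncated_moment(zeta, alpha, T):
    """Exact E[|x|^zeta 1_{|x| < T}] when P(|x| > u) = u^(-alpha) for u >= 1, zeta > alpha."""
    return alpha * (T ** (zeta - alpha) - 1.0) / (zeta - alpha)


def C_m_finite_N(m, alpha, B, N):
    """Ratio E[A^m] / (N^(m/2-1) E[A^2]^(m/2)) at finite N, for even m."""
    a_N = N ** (1.0 / alpha)          # normalizing constant of the Pareto tail
    T = B * a_N                       # truncation level
    moment_m = truncated_moment(m, alpha, T) / a_N ** m
    moment_2 = truncated_moment(2, alpha, T) / a_N ** 2
    return moment_m / (N ** (m / 2 - 1) * moment_2 ** (m / 2))


def C_m_limit(m, alpha, B):
    """Limit stated in Lemma 9.1 for even m."""
    return (2 - alpha) / (m - alpha) * ((2 - alpha) * B ** alpha / alpha) ** (m / 2 - 1)


if __name__ == "__main__":
    for N in (1e4, 1e6, 1e8):
        print(N, C_m_finite_N(4, 1.5, 2.0, N), C_m_limit(4, 1.5, 2.0))
```

The finite-$N$ ratio approaches the stated limit at rate $O((Ba_N)^{-(2-\alpha)})$ in this example, which is the correction coming from the lower cutoff of the Pareto law.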

Proof. It is a simple application of the classical result about truncated moments (Theorem VIII.9.2 of [6]) already used in Sect. 3.1, (15): For any $\zeta > \alpha$,

$$E[|x(ij)|^\zeta 1_{|x(ij)| < B a_N}] \sim \frac{\alpha}{\zeta - \alpha}\, B^{\zeta-\alpha}\, \frac{a_N^\zeta}{N}.$$

The first item of the lemma is a direct consequence of this estimate for $\zeta = 2$ and $\zeta = 2k$. The second is also a consequence of this estimate, used for $x(ij)^+$ and $x(ij)^-$, and of the additional skewness hypothesis (7).

This lemma enables us to get the main result of this section, i.e. the convergence of the moments of the spectral measure of the matrix $A_N^B$. We will need some more notations that we take verbatim from Zakharevich. For any integer $k \ge 1$, we define $V_k$ as the set of all $(e_1, \ldots, e_l)$ such that $\sum_{i=1}^l e_i = k$ and $e_1 \ge e_2 \ge \ldots \ge e_l > 0$. For any $(e_1, \ldots, e_l) \in V_k$ define $T(e_1, \ldots, e_l)$ as the number of colored rooted trees with $k+1$ vertices and $l+1$ distinct colors, say $(c_0, c_1, \ldots, c_l)$, satisfying the following conditions:

1. There are exactly $e_i$ nodes of color $c_i$. The root node is the only node colored $c_0$.
2. If nodes $a$ and $b$ are the same color then the distance from $a$ to the root is the same as the distance from $b$ to the root.
3. If nodes $a$ and $b$ have the same color then their parents also have the same color.

With these notations we have the following convergence result, directly implied by Zakharevich’s results.

Spectrum of Heavy Tailed Random Matrices

743

Lemma 9.2.
1. For every integer $k \ge 1$, the following limit exists:

$$\lim_{N\to\infty} E\left[\int x^k\,d\hat\mu_{A_N^B}(x)\right] =: m_k^B. \qquad (50)$$

2. $m_k^B = 0$ if $k$ is odd, and $m_{2k}^B = \sum_{(e_1,\ldots,e_l)\in V_k} T(e_1, \ldots, e_l)\prod_{i=1}^l C_{2e_i}$.
3. There exists a probability measure $\mu_\alpha^B$ uniquely determined by its moments $m_k^B$. $\mu_\alpha^B$ is independent of the skewness parameter $\theta$.
4. $\mu_\alpha^B$ has unbounded support and is symmetric.
5. The mean spectral measure $E[\hat\mu_{A_N^B}]$ converges weakly to $\mu_\alpha^B$.

Proof. In order to prove the first and second items, it is enough to use the preceding lemma, Corollary 6 and Theorem 2 in [15], plus the fact that

$$\lim_{N\to\infty} N E[A_N^B(ij)^2] = \frac{\alpha}{2-\alpha} B^{2-\alpha}.$$

The third item is a consequence of the estimate $C(m) \le C\rho^m$ with $\rho = (\frac{2-\alpha}{\alpha} B^\alpha)^{\frac{1}{2}}$ and of Proposition 10 in [15]. The fact that $\mu_\alpha^B$ is independent of the skewness parameter $\theta$ is obvious since its moments only depend on the $C_m$ for even $m$'s, which are insensitive to the parameter $\theta$. The fourth item is a consequence of Proposition 9 and Proposition 12 of [15]. The fifth one is a consequence of Theorem 1 of [15].

This lemma proves the first part of Theorem 1.8. In order to prove the second part we simply remark that we have already done so, since we have seen, in the proof of Lemma 3.1, that $\mu_\alpha^B$ converges and that its limit is the weak limit of $E[\hat\mu_{A_N}]$.

10. Appendix: Convergence to Stable Distributions for Triangular Arrays

We begin here by recalling the notations for stable distributions, see for instance [11]. A real random variable $Y$ has a stable distribution with exponent $\alpha \in (0, 2)$, $\alpha \ne 1$, scale parameter $\sigma > 0$, skewness parameter $\beta \in [-1, 1]$, and shift parameter $\mu \in \mathbb{R}$ (in short $\mathrm{Stable}_\alpha(\sigma, \beta, \mu)$) iff its characteristic function is given by:

$$E[\exp(itY)] = \exp\left\{-\sigma^\alpha |t|^\alpha\left(1 - i\beta\,\mathrm{sign}(t)\tan\frac{\pi\alpha}{2}\right) + i\mu t\right\}.$$

We will consider here only the case where $\alpha < 1$. A complex random variable $Y$ has an $\alpha$-stable distribution with spectral representation $(\Gamma, \mu)$ if $\Gamma$ is a finite measure on the unit circle $S^1$, and $\mu$ is a complex number such that the characteristic function of $Y$ is given by:

$$E[\exp(i\langle t, Y\rangle)] = \exp\left\{-\int_{S^1} |\langle t, s\rangle|^\alpha\left(1 - i\,\mathrm{sign}(\langle t, s\rangle)\tan\frac{\pi\alpha}{2}\right)\Gamma(ds) + i\langle\mu, t\rangle\right\}.$$

We will need the constant

$$C_\alpha^{-1} = \int_0^\infty \frac{\sin x}{x^\alpha}\,dx = \frac{\Gamma(2-\alpha)\cos(\frac{\pi\alpha}{2})}{1-\alpha}.$$


Throughout this section, we consider a sequence of i.i.d. non-negative random variables $(X_k)_{k\ge 1}$ and assume that their common distribution is in the domain of attraction of an $\alpha$-stable distribution, with $\alpha \in (0, 1)$, i.e. that the tail is regularly varying:

$$P[X \ge u] = \frac{L(u)}{u^\alpha}.$$

We introduce the normalizing constant $\tilde a_N$ by:

$$\tilde a_N = \inf\left\{u,\ P[X \ge u] \le \frac{1}{N}\right\}. \qquad (51)$$

We consider a triangular array of real or complex numbers $(G_{N,k}, 1 \le k \le N)$ and give sufficient conditions for the normalized sum

$$S_N = \frac{1}{\tilde a_N}\sum_{k=1}^N G_{N,k} X_k$$

to converge in distribution to a (real or complex) stable distribution. We will always assume that the triangular array is bounded, i.e. that $M := \sup(|G_{N,k}|,\ N \ge 1,\ 1 \le k \le N) < \infty$. We begin with the case where the numbers $G_{N,k}$ are real.

Theorem 10.1. Assume that the triangular array of real numbers $(G_{N,k}, N \ge 1, 1 \le k \le N)$ is bounded. Furthermore assume that the empirical measure

$$\nu_N = \frac{1}{N}\sum_{k=1}^N \delta_{G_{N,k}}$$

converges weakly to a (compactly supported) probability measure $\nu$ on the real line. Then the distribution of the normalized sum $S_N = \frac{1}{\tilde a_N}\sum_{k=1}^N G_{N,k} X_k$ converges to a $\mathrm{Stable}_\alpha(\sigma, \beta, 0)$ distribution, with

$$\sigma^\alpha = \frac{1}{C_\alpha}\int |x|^\alpha\,d\nu(x), \qquad \beta = \frac{\int |x|^\alpha\,\mathrm{sign}(x)\,\nu(dx)}{\int |x|^\alpha\,\nu(dx)}.$$

If $\sigma^\alpha = 0$, i.e. if $\nu = \delta_0$, the above statement should of course be understood as: $S_N = \frac{1}{\tilde a_N}\sum_{k=1}^N G_{N,k} X_k$ converges in distribution to zero.

Proof of Theorem 10.1. We begin with the particular case where the numbers $G_{N,k}$ are positive and bounded below. We assume that there exists a $\delta > 0$ such that for any $N \ge 1$ and $1 \le k \le N$,

$$\delta \le G_{N,k} \le M. \qquad (52)$$
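The limiting parameters in the statement of Theorem 10.1 are plain functionals of the weights. As an illustration, the sketch below evaluates $C_\alpha$ from its closed form and computes $\sigma^\alpha$ and $\beta$ for the empirical measure of a finite list of real weights. The function names are hypothetical and the code only restates the formulas of the theorem; it does not simulate the sums $S_N$.

```python
import math


def C_alpha(alpha):
    """C_alpha, from C_alpha^{-1} = Gamma(2 - alpha) cos(pi alpha / 2) / (1 - alpha), 0 < alpha < 1."""
    return (1.0 - alpha) / (math.gamma(2.0 - alpha) * math.cos(math.pi * alpha / 2.0))


def stable_parameters(weights, alpha):
    """Return (sigma^alpha, beta) for the empirical measure of the real numbers in `weights`."""
    n = len(weights)
    abs_moment = sum(abs(g) ** alpha for g in weights) / n
    signed_moment = sum(math.copysign(abs(g) ** alpha, g) for g in weights) / n
    sigma_alpha = abs_moment / C_alpha(alpha)
    # When all weights vanish the limit is the point mass at zero; beta is then irrelevant.
    beta = signed_moment / abs_moment if abs_moment > 0 else 0.0
    return sigma_alpha, beta


if __name__ == "__main__":
    print(stable_parameters([1.0, 0.5, -0.25, 2.0], 0.5))
```

Positive weights give $\beta = 1$ (a totally skewed limit), and a sign-symmetric set of weights gives $\beta = 0$, as the formula for $\beta$ makes plain.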


In this context we will be able to apply directly classical theorems to the array of non-negative independent random variables

$$U_{N,k} = \frac{1}{\tilde a_N} G_{N,k} X_k.$$

For instance, we could apply the theorem in Sect. XVII.7 of [6]. We rather choose to apply Theorem 8, Chap. 5 of [7]. According to this last result, Theorem 10.1 will be proved in this restricted case if we can check the following three conditions. First the Uniform Asymptotic Negligibility (UAN) condition, for every $\epsilon > 0$,

$$\lim_{N\to\infty}\max_{1\le k\le N} P(U_{N,k} > \epsilon) = 0. \qquad (53)$$

Second we must check that:

$$\lim_{\epsilon\to 0}\lim_{N\to\infty}\sum_{1\le k\le N} \mathrm{Var}[U_{N,k} 1_{(U_{N,k} < \epsilon)}] = 0, \qquad (54)$$

and finally we must check that, for $x > 0$,

$$\lim_{N\to\infty} P\left(\max_{1\le k\le N} U_{N,k} \le x\right) = \exp\left(-\frac{C_\alpha\sigma^\alpha}{x^\alpha}\right), \qquad (55)$$

and that

$$\lim_{N\to\infty} P\left(\min_{1\le k\le N} U_{N,k} \le x\right) = 1. \qquad (56)$$

We first note that

$$P(U_{N,k} > \epsilon) = P\left(X_k > \frac{\epsilon\tilde a_N}{G_{N,k}}\right) \le \frac{L\left(\frac{\epsilon\tilde a_N}{G_{N,k}}\right)}{\left(\frac{\epsilon\tilde a_N}{G_{N,k}}\right)^\alpha},$$

which shows that (53) is thus a direct consequence of our assumption (52) and of the following lemma.

Lemma 10.2. Let $L$ be a slowly varying function and define $\tilde a_N$ as in (51):

$$\tilde a_N = \inf\left\{u,\ P[|X| \ge u] \le \frac{1}{N}\right\}. \qquad (57)$$

Then, for any $0 < a < b$ and any $a < y < b$,

$$\frac{L(y\tilde a_N)}{(y\tilde a_N)^\alpha} = \frac{1}{N}\frac{1}{y^\alpha}(1 + \epsilon(y, N)) \qquad (58)$$

with

$$\lim_{N\to\infty}\sup_{a<y<b} |\epsilon(y, N)| = 0. \qquad (59)$$


Proof of Lemma 10.2. Writing

$$\frac{L(y\tilde a_N)}{(y\tilde a_N)^\alpha} = \frac{L(y\tilde a_N)}{L(\tilde a_N)}\cdot\frac{N L(\tilde a_N)}{\tilde a_N^\alpha}\cdot\frac{1}{N y^\alpha}, \qquad (60)$$

this lemma is clearly a direct consequence of the classical fact:

$$\lim_{N\to\infty}\frac{N L(\tilde a_N)}{(\tilde a_N)^\alpha} = 1 \qquad (61)$$

and of the uniform convergence theorem for slowly varying functions ([2], Theorem 1.2.1), which asserts that the convergence

$$\lim_{t\to\infty}\frac{L(tx)}{L(t)} = 1 \qquad (62)$$

is uniform for $x$'s in a compact subset of $(0, \infty)$.
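The factorization (60) can be observed numerically. The sketch below picks one concrete slowly varying function, $L(u) = \log(e + u)$, with $\alpha = 1/2$, finds $\tilde a_N$ by bisection on the tail, and evaluates $N y^\alpha L(y\tilde a_N)/(y\tilde a_N)^\alpha$, which by (58) should be close to 1 for $y$ in a compact subset of $(0, \infty)$. The choice of $L$, the value of $\alpha$, and all names are assumptions of this example only.

```python
import math

ALPHA = 0.5


def slowly_varying(u):
    """Hypothetical slowly varying function L(u) = log(e + u)."""
    return math.log(math.e + u)


def tail(u):
    """Model tail P[X >= u] = min(1, L(u) / u^alpha); it is non-increasing for u >= 1."""
    return min(1.0, slowly_varying(u) / u ** ALPHA)


def normalizing_constant(N, lo=1.0, hi=1e200):
    """Bisection (on a log scale) for inf{u : P[X >= u] <= 1/N}."""
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if tail(mid) <= 1.0 / N:
            hi = mid
        else:
            lo = mid
    return hi


def rescaled_tail(y, N):
    """N * y^alpha * P[X >= y * a_N], i.e. 1 + epsilon(y, N) in (58)."""
    a_N = normalizing_constant(N)
    return N * y ** ALPHA * tail(y * a_N)


if __name__ == "__main__":
    for y in (0.5, 1.0, 2.0):
        print(y, rescaled_tail(y, 10 ** 12))
```

With a logarithmic $L$ the error term decays only like $1/\log \tilde a_N$, which illustrates why the lemma asserts convergence without a rate.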

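Next, in order to control the variance $\mathrm{Var}[U_{N,k} 1_{(U_{N,k} < \epsilon)}]$ and prove the validity of (54), we must use Karamata's theorem, or more directly Theorem VIII.9.2 of [6] which shows that

$$\lim_{t\to\infty}\frac{t^{\zeta-\alpha} L(t)}{E[X^\zeta 1_{X<t}]} = \frac{\zeta-\alpha}{\alpha}. \qquad (63)$$

Using this for $\zeta = 1, 2$, we see that

$$\mathrm{Var}[U_{N,k} 1_{(U_{N,k} < \epsilon)}] \sim \frac{\alpha}{2-\alpha}\,\epsilon^2\left[\frac{L\left(\frac{\epsilon\tilde a_N}{G_{N,k}}\right)}{\left(\frac{\epsilon\tilde a_N}{G_{N,k}}\right)^\alpha}\right]. \qquad (64)$$

Lemma 10.2 then shows that $\mathrm{Var}[U_{N,k} 1_{(U_{N,k} < \epsilon)}]$ is of order $\frac{\epsilon^{2-\alpha}}{N}$, and thus that

$$\lim_{\epsilon\to 0}\lim_{N\to\infty}\sum_{1\le k\le N}\mathrm{Var}[U_{N,k} 1_{(U_{N,k} < \epsilon)}] = 0. \qquad (65)$$

In order to complete the proof of Theorem 10.1 in the particular case where the numbers $G_{N,k}$ are positive and bounded below, we now only have to check (55) since (56) is obvious, the variables $U_{N,k}$ being non-negative. For $x > 0$,

$$\log P\left(\max_{1\le k\le N} U_{N,k} \le x\right) = \sum_{k=1}^N \log\left[1 - \frac{L\left(\frac{x\tilde a_N}{G_{N,k}}\right)}{\left(\frac{x\tilde a_N}{G_{N,k}}\right)^\alpha}\right].$$

Using again Lemma 10.2 we see that

$$\lim_{N\to\infty}\log P\left(\max_{1\le k\le N} U_{N,k} \le x\right) = -\frac{C_\alpha\sigma^\alpha}{x^\alpha}$$

since

$$\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^N G_{N,k}^\alpha = \int |x|^\alpha\,\nu(dx) = C_\alpha\sigma^\alpha.$$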


This checks the condition (55) and finishes the proof in the particular case where the numbers G N ,k are positive and bounded below. Now it is easy to prove Theorem 10.1 in full generality. It is enough to split the sum into the three independent summands SN =

N N 1 G N ,k X k = U N ,k = S N+, − S N−, + S N , # aN k=1

k=1

with S N+, =

N

U N ,k 1
k=1

S N−, = −

N

U N ,k 1G N ,k <− ,

k=1

S N , =

N

U N ,k 1|G N ,k |≤ .

k=1

We now know that, if and − are not atoms of ν, then S N+, (resp. S N−, ) converges in + , 1, 0) (resp. Stable (σ − , 1, 0)) with distribution to a Stableα (σα, α α, ∞ + Cα σα, = |x|α ν(d x), − − Cα σα, = |x|α ν(d x), −∞

so that the sum S N+, + S N−, converges in distribution, when N tends to ∞ to a Stableα (σα, , βα, , 0) with Cα σα, = |x|α ν(d x), |x|> α |x|> |x| sign(x)ν(d x) . βα, = α |x|> |x| ν(d x) It is clear that, since lim→0 σα, = σα and that lim→0 βα, = βα , the distribution Stableα (σα, , βα, , 0) converge to Stableα (σα , βα , 0), when tends to zero. Thus there exists a sequence N tending to zero such that the sum S N+, N + S N−, N converges in distribution to a Stableα (σα , βα , 0) variable. But S N , N converges to zero in probability when N → ∞. Indeed, for any x > 0, N 1 x Xk > P(|S N , N | > x) ≤ P # aN N k=1

so that lim P(|S N , N | > x) = 0.

N →∞

(66)


These two facts show that $S_N = \frac{1}{\tilde a_N}\sum_{k=1}^N G_{N,k} X_k$ converges in distribution to a $\mathrm{Stable}_\alpha(\sigma_\alpha, \beta_\alpha, 0)$ variable as announced in Theorem 10.1.

This result implies easily the following analogous result in the complex case.

Theorem 10.3. Assume that the triangular array of complex numbers $(G_{N,k}, N \ge 1, 1 \le k \le N)$ is bounded. Furthermore assume that the empirical measure

$$\nu_N = \frac{1}{N}\sum_{k=1}^N \delta_{G_{N,k}}$$

converges weakly to a (compactly supported) probability measure $\nu$ on the complex plane. Then $S_N = \frac{1}{\tilde a_N}\sum_{k=1}^N G_{N,k} X_k$ converges in distribution to a complex stable distribution with spectral representation $(\Gamma_\nu, 0)$, where $\Gamma_\nu$ is the measure on $S^1$ obtained as the image of the measure $\frac{1}{C_\alpha}|z|^\alpha\,\nu(dz)$ on the complex plane by the map $z \to \frac{z}{|z|}$. Again if $\nu = \delta_0$ the above statement should be understood as: $S_N$ converges in distribution to zero.

Proof. For any fixed $t \in \mathbb{C}$, a direct application of Theorem 10.1 to the array of real numbers $(\langle t, G_{N,k}\rangle)$ shows that $\langle t, S_N\rangle$ converges in distribution to a $\mathrm{Stable}_\alpha(\sigma(t), \beta(t), 0)$ variable, where

$$\sigma(t)^\alpha = \frac{1}{C_\alpha}\int |\langle t, z\rangle|^\alpha\,d\nu(z)$$

and

$$\beta(t) = \frac{\int |\langle t, z\rangle|^\alpha\,\mathrm{sign}\langle t, z\rangle\,d\nu(z)}{\int |\langle t, z\rangle|^\alpha\,d\nu(z)}.$$

As a consequence, we obtain that

$$\lim_{N\to\infty} E[\exp(i\langle t, S_N\rangle)] = \exp\left\{-\sigma(t)^\alpha\left(1 - i\beta(t)\tan\frac{\pi\alpha}{2}\right)\right\}.$$

Note that, by definition of ν : π α π α α = ν (ds). σ (t) 1 − iβ(t) tan |t, s|α 1 − isign(t, s) tan 2 2 S1 These two last facts prove that the distribution of S N converges to a complex α-stable distribution with spectral representation ( ν , 0). In Sect. 5 we need a slight variation of Theorem 10.3. We want to extend it to the case where the random variables X k are truncated at a high enough level. More precisely, keeping the notations and hypothesis of Theorem 10.3, we define, for any : N → R+ going to infinity at infinity, the truncated variables Z k = X k 1 X k ≤(N )#a N . We then consider the normalized sum S N =

N 1 G N ,k Z k . # aN k=1


Theorem 10.4. Assume that the triangular array of complex numbers $(G_{N,k}, N \ge 1, 1 \le k \le N)$ is bounded. Furthermore assume that the empirical measure

$$\nu_N = \frac{1}{N}\sum_{k=1}^N \delta_{G_{N,k}} \qquad (67)$$

converges weakly to a (compactly supported) probability measure $\nu$ on the complex plane. Then $S_N$ converges in distribution to a complex stable distribution with spectral representation $(\Gamma_\nu, 0)$, where $\Gamma_\nu$ is the measure on $S^1$ obtained as the image of the measure $\frac{1}{C_\alpha}|z|^\alpha\,\nu(dz)$ on the complex plane by the map $z \to \frac{z}{|z|}$. Again if $\nu = \delta_0$, the above statement should be understood as: $S_N$ converges in distribution to zero.

The proof of this variant is identical verbatim to the proof of Theorem 10.3, and we omit it.

Finally we also need in Sect. 5 information about the Fourier-Laplace transform of certain complex stable distributions. Consider a probability measure $\nu$ on $\mathbb{C}$ and define as above the measure $\Gamma_\nu$. Let us denote by $P^\nu$ the complex $\mathrm{Stable}_\alpha(\Gamma_\nu, 0)$ distribution.

Theorem 10.5. Assume that the measure $\nu$ is compactly supported in the closure of $\mathbb{C}^-$. Then, for any $t > 0$:

$$\int e^{-itx}\,dP^\nu(x) = \exp\left\{-\Gamma(1-\alpha)(it)^\alpha\int x^\alpha\,d\nu(x)\right\}. \qquad (68)$$

Proof. This is a simple consequence of the analogous result for real $\mathrm{Stable}_\alpha(\sigma, 1)$ distributions. If $X$ is a random variable with $\mathrm{Stable}_\alpha(\sigma, 1)$ distribution, and if $\gamma \in \mathbb{C}$ with $\Re(\gamma) > 0$, then

$$E(e^{-\gamma X}) = \exp\left(-\frac{\sigma^\alpha}{\cos(\frac{\pi\alpha}{2})}\gamma^\alpha\right). \qquad (69)$$

This result is classical when $\gamma$ is real positive (see Proposition 1.2.12 of [11] for instance). The statement (69) is obtained by an easy analytic extension from the real case. Consider now a sequence of i.i.d. random variables $(X_k)_{k\ge 1}$, with common distribution $\mathrm{Stable}_\alpha(\sigma, 1)$. Furthermore consider a bounded array of complex numbers $(G_{N,k}) \in \mathbb{C}^-$, such that the empirical measure $\frac{1}{N}\sum_{k=1}^N \delta_{G_{N,k}}$ converges to $\nu$ when $N \to \infty$. As above define the normalized sum

$$S_N = \frac{1}{\tilde a_N}\sum_{k=1}^N G_{N,k} X_k.$$

Then, if $\gamma_{N,k} = it\,\frac{G_{N,k}}{\tilde a_N}$, one has obviously

$$E(e^{-itS_N}) = \prod_{k=1}^N E(\exp(-\gamma_{N,k} X_k)).$$

Noting that $\Re(\gamma_{N,k}) > 0$, it is then possible to use (69):

$$E(e^{-itS_N}) = \exp\left(-\frac{\sigma^\alpha}{\cos(\frac{\pi\alpha}{2})}\sum_{k=1}^N \gamma_{N,k}^\alpha\right).$$


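Using the classical tail estimate for real $\mathrm{Stable}_\alpha(\sigma, 1)$ distributions, when $u$ tends to $\infty$:

$$P(X \ge u) \sim \frac{C_\alpha\sigma^\alpha}{u^\alpha},$$

one sees that $\tilde a_N \sim C_\alpha^{\frac{1}{\alpha}} N^{\frac{1}{\alpha}}\sigma$. Thus, we get the estimate

$$E(e^{-itS_N}) \sim \exp\left(-\frac{(it)^\alpha}{C_\alpha\cos(\frac{\pi\alpha}{2})}\cdot\frac{1}{N}\sum_{k=1}^N G_{N,k}^\alpha\right).$$

But $\frac{1}{N}\sum_{k=1}^N G_{N,k}^\alpha$ converges to $\int x^\alpha\,d\nu(x)$. Using now the convergence of Theorem 10.3 we see that

$$\int e^{-itx}\,dP^\nu(x) = \lim_{N\to\infty} E(e^{-itS_N}) = \exp\left(-\frac{1}{C_\alpha\cos(\frac{\pi\alpha}{2})}(it)^\alpha\int x^\alpha\,d\nu(x)\right).$$

Noting that

$$C_\alpha\cos\frac{\pi\alpha}{2} = \frac{1-\alpha}{\Gamma(2-\alpha)} = \frac{1}{\Gamma(1-\alpha)}$$

proves Theorem 10.5.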

Acknowledgements. The authors wish to thank A. Soshnikov for pointing out this problem to them during a conference at Banff in 2004. They are very grateful to S. Belinschi and A. Dembo for very useful comments.

References

1. Bai, Z.D.: Methodologies in spectral analysis of large-dimensional random matrices, a review. Statist. Sinica 9, 3, 611–677 (1999) (with comments by G. J. Rodgers, Jack W. Silverstein; and a rejoinder by the author)
2. Bingham, N.H., Goldie, C.M., Teugels, J.L.: Regular variation, Vol. 27 of Encyclopedia of Mathematics and its Applications. Cambridge: Cambridge University Press, 1989
3. Bouchaud, J., Cizeau, P.: Theory of Lévy matrices. Phys. Rev. E 50, 1810–1822 (1994)
4. Burda, Z., Jurkiewicz, J., Nowak, M., Zahed, I.: Random Lévy matrices revisited. http://arxiv.org/list/cond-mat/0602087, 2006
5. Collingwood, E.F., Lohwater, A.J.: The theory of cluster sets. Cambridge Tracts in Mathematics and Mathematical Physics, No. 56. Cambridge: Cambridge University Press, 1966
6. Feller, W.: An introduction to probability theory and its applications. Vol. II. Second edition. New York: John Wiley & Sons Inc., 1971
7. Galambos, J.: Advanced probability theory. Second ed., Vol. 10 of Probability: Pure and Applied. New York: Marcel Dekker Inc., 1995
8. Guionnet, A., Zeitouni, O.: Large deviations asymptotics for spherical integrals. J. Funct. Anal. 188(2), 461–515 (2002)
9. Khorunzhy, A.M., Khoruzhenko, B.A., Pastur, L.A.: Asymptotic properties of large random matrices with independent entries. J. Math. Phys. 37(10), 5033–5060 (1996)
10. Pommerenke, C.: Boundary behaviour of conformal maps. Vol. 299 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Berlin: Springer-Verlag, 1992
11. Samorodnitsky, G., Taqqu, M.S.: Stable non-Gaussian random processes. In: Stochastic Modeling. New York: Chapman & Hall, 1994
12. Soshnikov, A.: Universality at the edge of the spectrum in Wigner random matrices. Commun. Math. Phys. 207(3), 697–733 (1999)
13. Soshnikov, A.: Poisson statistics for the largest eigenvalues in random matrix ensembles. In: Mathematical physics of quantum mechanics, Vol. 690 of Lecture Notes in Phys. Berlin: Springer, 2006, pp. 351–364
14. Wigner, E.P.: On the distribution of the roots of certain symmetric matrices. Ann. of Math. (2) 67, 325–327 (1958)
15. Zakharevich, I.: A generalization of Wigner's law. Commun. Math. Phys. 268(2), 403–414 (2006)

Communicated by B. Simon

Commun. Math. Phys. 278, 753–773 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0388-y


Expanding Domain Limit for Incompressible Fluids in the Plane James P. Kelliher Department of Mathematics, Brown University, Box 1917, Providence, RI 02912, USA. E-mail: [email protected] Received: 1 March 2007 / Accepted: 6 April 2007 Published online: 24 November 2007 – © Springer-Verlag 2007

Abstract: The general class of problems we consider is the following: Let $\Omega_1$ be a bounded domain in $\mathbb{R}^d$ for $d \ge 2$ and let $u_0$ be a velocity field on all of $\mathbb{R}^d$. Suppose that for all $R \ge 1$ we have an operator $T_R$ that projects $u_0$ restricted to $R\Omega_1$ ($\Omega_1$ scaled by $R$) into a function space on $R\Omega_1$ for which the solution to some initial value problem is well-posed with $T_R u_0$ as the initial velocity. Can we show that as $R \to \infty$ the solution to the initial value problem on $R\Omega_1$ converges to a solution in the whole space? We answer this question when $d = 2$ for weak solutions to the Navier-Stokes and Euler equations. For the Navier-Stokes equations we assume the lowest regularity of $u_0$ for which one can obtain adequate control on the pressure. For the Euler equations we assume the lowest feasible regularity of $u_0$ for which uniqueness of solutions to the Euler equations is known (thus, we allow "slightly unbounded" vorticity). In both cases, we obtain strong convergence of the velocity and the vorticity as $R \to \infty$ and, for the Euler equations, the flow. Our approach yields, in principle, a bound on the rates of convergence.

Contents
1. Introduction
2. Yudovich Vorticity
3. Function Spaces
4. Truncation of the Initial Velocity
5. Weak Solutions
6. Properties of the Velocity and Pressure
7. Tail of the Velocity
8. Main Result: Convergence of Solutions
Appendix A. Various Lemmas
References


1. Introduction

The properties of the solutions to the Navier-Stokes equations (which we refer to as (NS)) and to the Euler equations (which we refer to as (E)) are reasonably well understood in two dimensions in the setting of a bounded domain and in the whole space (as well as for periodic domains). It is a natural question to ask whether the solution to (NS) or (E) in a bounded domain approaches the solution to (NS) or (E) in the entire space as we let the size of the bounded domain increase to infinity. More precisely, let $\Omega_1$ be a bounded domain with a $C^2$-boundary $\Gamma_1$. For simplicity, we assume that $\Omega_1$ is connected and simply connected. Define

$$\Omega_R := R\Omega_1 \quad\text{and}\quad \Gamma_R := R\Gamma_1 = \partial\Omega_R \quad\text{for } R \text{ in } [1, \infty). \qquad (1.1)$$

We assume that the origin lies in the interior of $\Omega_1$, so that $\Omega_R$ fills the whole space as $R \to \infty$. For $R = \infty$, we define $\Omega_R$ to be $\mathbb{R}^2$ and $\Gamma_R$ to be empty. Let $X(\Omega_R)$ be a function space for which (NS) or (E) is well-posed on $\Omega_R$. Let $u^0$ lie in $X(\mathbb{R}^2)$ and suppose that $T_R$ is a "truncation" operator that maps $X(\mathbb{R}^2)$ to $X(\Omega_R)$ in such a way that $\|u^0|_{\Omega_R} - T_R u^0\|_{X(\Omega_R)} \to 0$ as $R \to \infty$. The question we address is the following: If $u_R$ is the solution (velocity) to (NS) or (E) on $\Omega_R$ with initial velocity $T_R u^0$ and $u$ is the solution to (NS) or (E) on $\mathbb{R}^2$, can we show that $\|u|_{\Omega_R} - u_R\|_{L^2([0,T]; X(\Omega_R))} \to 0$ as $R \to \infty$?

We show in Theorem 8.1 that, in fact, such convergence does occur in $X(\Omega_R) = H^1(\Omega_R)$. For solutions to (NS) we need only assume that $u^0$ lies in $H^1(\mathbb{R}^2)$. For solutions to (E), though, we need a stronger assumption on $u^0$ to have well-posedness. We will assume that the initial velocity has Yudovich vorticity, described in Sect. 2. This is a class of vorticities introduced by Yudovich in [19] for which he showed uniqueness of solutions to (E) in a bounded domain in $\mathbb{R}^d$, $d \ge 2$. This class is slightly broader than initial vorticities lying in $L^\infty$, for which Yudovich established the same uniqueness result in [18]. It is the natural class of initial vorticities for us to use because it is ideally suited to the use of energy methods, and is the largest such class for which existence and uniqueness of solutions to (E) has been established. (For the larger class of initial vorticities defined by Misha Vishik in [15] existence is not known. Also, this class is not as readily amenable to the use of energy methods; see, however, [2].)

We will restrict ourselves to solutions in the whole space that have finite energy, though this is a stronger condition than required. For instance, the spaces $E_m$ of [1], which allow infinite energy, or spaces that allow even slower decay of the velocity at infinity can be dealt with using our techniques. The assumption of finite energy simplifies the analysis considerably, however, in large part because it does not require us to make significant adaptations to the standard existence and uniqueness results for the Navier-Stokes and Euler equations, and because it simplifies considerably the definition of the truncation operator $T_R$.

Our results seem to be most closely related to those of [5] and [6], in which the authors consider the limit as $\varepsilon \to 0$ of solutions of (E) and (NS) on the domain external to $\Omega_\varepsilon = \varepsilon\Omega_1$, where $\Omega_1$ is a fixed simply connected domain. In a sense, this is the opposite limit to what we consider. They start with a smooth initial vorticity $\omega^0$ whose support is compact and does not contain the origin. For $\varepsilon > 0$, they use as an initial velocity the unique divergence-free vector field in $\Omega_\varepsilon^C$ that is tangent to $\partial\Omega_\varepsilon$, has a curl equal to $\omega^0$ in $\Omega_\varepsilon^C$, and has a given fixed circulation $\gamma$. Using a weak vorticity formulation of (E), they find, roughly speaking, that a subsequence of solutions to (E) converges in the limit as $\varepsilon \to 0$ to a solution to (E) with an additional forcing term of $\gamma\delta$. (Here, $\delta$ is the Dirac

Expanding Domain Limit in 2D


delta function.) In contrast, for (NS) they find that a subsequence converges to a solution to (NS) whose initial vorticity is $\omega^0 + \gamma\delta$. (The smoothness of the initial vorticity is not the critical point; their convergence argument for (E) would apply for initial vorticities in $L^p$ for $p > 2$ and even less smoothness is required for (NS), as they note.) The limits considered here and in [5] and [6] can be viewed as falling into the broad class of limits of singularly perturbed domains, as considered in detail for elliptic problems in [12].

This paper is organized as follows: In Sect. 2 we define Yudovich vorticity and in Sect. 3 we define the function spaces we will use. In Sect. 4 we describe how we adjust the initial velocity to satisfy the boundary conditions. We define a weak solution to (NS) and (E) in Sect. 5 and give the basic existence, uniqueness, and regularity results for the velocity and pressure in Sect. 6. We also require a uniform-in-time bound on how fast solutions to (NS) and (E) in all of $\mathbb{R}^2$ vanish at infinity, which we discuss in Sect. 7. Our main result, in which we establish convergence of solutions to (NS) and (E) as $R \to \infty$, is given in Sect. 8. We include in the Appendix various lemmas we use in the body of the paper.

A few words on notation: We define the vorticity of a vector field $u$ on $\mathbb{R}^2$ by $\omega(u) := \partial_1 u^2 - \partial_2 u^1$. By $T$, we always mean an arbitrary, but fixed, positive real number representing time. The symbol $C$ stands for a positive constant that can hold different values on either side of an inequality, though it always has the same value on each side of an equality. The constant may depend on certain parameters, such as viscosity, but never on our scaling factor, $R$. We use the notation $\int f g$ when we sometimes should more properly write $(f, g)$, the pairing of $f$ in a function space $X$ with an element $g$ in the dual space of $X$.

2. Yudovich Vorticity

Definition 2.1. Let $\theta : [p_0, \infty) \to \mathbb{R}^+$ for some $p_0$ in $(1, 2)$. We say that $\theta$ is admissible if the function $\beta_M : (0, \infty) \to [0, \infty)$ defined, for some $M > 0$, by^1

$$\beta_M(x) := 2C_0 \inf\left\{ \left(M^{\varepsilon} x^{1-\varepsilon}/\varepsilon\right)\theta(1/\varepsilon) : \varepsilon \text{ in } (0, (2+\varepsilon_0)^{-1}] \right\}, \tag{2.1}$$

where $C_0$ is a fixed absolute constant and $\varepsilon_0 > 0$ is fixed as in Lemma A.5, satisfies

$$\int_0^1 \frac{dx}{\beta_M(x)} = \infty. \tag{2.2}$$

Because

$$\beta_M(x) = 2C_0 M \frac{x}{M} \inf\left\{ \left((x/M)^{-\varepsilon}/\varepsilon\right)\theta(1/\varepsilon) : \varepsilon \text{ in } (0, (2+\varepsilon_0)^{-1}] \right\} = M\beta_1(x/M),$$

this definition is independent of the value of $M$. Also, $\beta_M$ is a monotonically increasing continuous function, with $\lim_{x\to 0^+}\beta_M(x) = 0$.

Yudovich proves in [19] that for a bounded domain in $\mathbb{R}^n$, if $\|\omega^0\|_{L^p} \le \theta(p)$ for some admissible function $\theta$, then at most one solution to the Euler equations exists. Because of this, we call such a vorticity Yudovich vorticity.

^1 The definition of $\beta_M$ in Eq. (2.1) differs from that in [7] in that it directly incorporates the factor of $p$ that appears in the Calderón-Zygmund inequality; in [7] this factor is included in the equivalent of Eq. (2.2).


Definition 2.2. We say that a vector field $v$ has Yudovich vorticity if for some admissible function $\theta : [p_0, \infty) \to \mathbb{R}^+$ with $p_0$ in $(1, 2)$, $\|\omega(v)\|_{L^p} \le \theta(p)$ for all $p$ in $[p_0, \infty)$.

Examples of admissible bounds on vorticity are

$$\theta_0(p) = 1, \quad \theta_1(p) = \log p, \quad \ldots, \quad \theta_m(p) = \log p \cdot \log\log p \cdots \log^m p, \tag{2.3}$$

where $\log^m$ is $\log$ composed with itself $m$ times. These admissible bounds are described in [19] (see also [7]). Roughly speaking, the $L^p$-norm of a Yudovich vorticity can grow in $p$ only slightly faster than $\log p$ and still be admissible. Such growth in the $L^p$-norm arises, for example, from a point singularity of the type $\log\log(1/|x|)$.

3. Function Spaces

We will use the following function spaces:

$$\begin{aligned}
H(\Omega_R) &= \left\{ v \in (L^2(\Omega_R))^2 : \operatorname{div} v = 0 \text{ in } \Omega_R \text{ and } v\cdot n = 0 \text{ on } \Gamma_R \right\}, \\
V^{(E)}(\Omega_R) &= \left\{ v \in (H^1(\Omega_R))^2 : \operatorname{div} v = 0 \text{ in } \Omega_R \text{ and } v\cdot n = 0 \text{ on } \Gamma_R \right\}, \\
V^{(NS)}(\Omega_R) &= \left\{ v \in (H^1(\Omega_R))^2 : \operatorname{div} v = 0 \text{ in } \Omega_R \text{ and } v = 0 \text{ on } \Gamma_R \right\}.
\end{aligned} \tag{3.1}$$

We equip $H(\Omega_R)$ with the $L^2(\Omega_R)$-norm and $V^{(E)}(\Omega_R)$ and $V^{(NS)}(\Omega_R)$ with the $H^1(\Omega_R)$-norm. Our solutions to (E) at time $t$ will lie in $V^{(E)}(\Omega_R)$, solutions to (NS) in $V^{(NS)}(\Omega_R)$. In general, $V^{(NS)}(\Omega_R) \subsetneq V^{(E)}(\Omega_R) \subsetneq H(\Omega_R)$; however, when $\Omega_R = \mathbb{R}^2$, the first two spaces coincide, and we simply write $V(\mathbb{R}^2)$.

Given a function $\theta : [p_0, \infty) \to \mathbb{R}^+$ admissible in the sense of Definition 2.1 for some $p_0$ in $(1, 2)$, we define the subspace

$$Y_\theta(\Omega_R) = \left\{ v \in V^{(E)}(\Omega_R) : \|\omega(v)\|_{L^p} \le C\theta(p) \text{ for all } p \text{ in } [p_0, \infty) \right\}$$

for some constant $C$. We define a norm on $Y_\theta$ by

$$\|v\|_{Y_\theta(\Omega_R)} = \|v\|_{L^2(\Omega_R)} + \sup_{p\in[p_0,\infty)} \|\omega(v)\|_{L^p(\Omega_R)}/\theta(p). \tag{3.2}$$

Finally, we define the space

$$Y(\Omega_R) = \left\{ v \in Y_\theta(\Omega_R) \text{ for some admissible } \theta \right\},$$

but place no norm on this space.

4. Truncation of the Initial Velocity

Definition 4.1. ("Truncation" operator). Let

$$\Sigma_1 = \left\{ x \in \Omega_1 : \operatorname{dist}(x, \Gamma_1) < 1/2\kappa \right\},$$

where $\kappa$ is the maximum curvature of $\Gamma_1$. Let $\varphi_1$ in $C^\infty(\Omega_1)$ taking values in $[0, 1]$ be defined so that $\varphi_1 = 1$ on $\Omega_1 \setminus \Sigma_1$ and $\varphi_1 = 0$ on $\Gamma_1$, and let $\varphi_R(\cdot) = \varphi_1(\cdot/R)$ and $\Sigma_R = R\Sigma_1$. Let $\psi$ be a stream function for $u \in H(\mathbb{R}^2)$; that is, $u = \nabla^\perp\psi$ ($\psi$ is unique up to the addition of a constant). Finally, define $T_R : H(\mathbb{R}^2) \to H(\Omega_R)$ by

$$T_R u := \nabla^\perp(\varphi_R\psi_R), \tag{4.1}$$

where $\psi_R = \psi - |\Sigma_R|^{-1}\int_{\Sigma_R}\psi$, so that $\int_{\Sigma_R}\psi_R = 0$ and $u = \nabla^\perp\psi_R$ on all of $\mathbb{R}^2$.

Lemma 4.2. $T_R : H(\mathbb{R}^2) \to H(\Omega_R)$ with an operator norm that is independent of $R$. For any $u$ in $H(\mathbb{R}^2)$,

$$\|u - T_R u\|_{H(\Omega_R)} \to 0 \text{ as } R \to \infty. \tag{4.2}$$

$T_R : V(\mathbb{R}^2) \to V^{(E)}(\Omega_R)$ with an operator norm that is independent of $R$. For any $u$ in $V(\mathbb{R}^2)$,

$$\|u - T_R u\|_{H^1(\Omega_R)} \to 0 \text{ as } R \to \infty. \tag{4.3}$$

$T_R : Y_\theta(\mathbb{R}^2) \to Y_\theta(\Omega_R)$ with an operator norm that is independent of $R$. For any $u$ in $Y_\theta(\mathbb{R}^2)$,

$$\|\omega(u) - \omega(T_R u)\|_{L^p(\Omega_R)} \to 0 \text{ as } R \to \infty \tag{4.4}$$

uniformly over all $p$ in $[p_0, \infty)$, $p_0$ being as in Definition 2.2. If in Definition 4.1 we impose the extra condition on the cutoff function $\varphi_1$ that $\nabla\varphi_1 = 0$ on $\Gamma_1$ then also

$$T_R : V(\mathbb{R}^2) \to V^{(NS)}(\Omega_R) \tag{4.5}$$

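with an operator norm that is independent of $R$, and Eq. (4.2) and Eq. (4.3) continue to hold.

Proof. Define $\Sigma_R$, $\varphi_R$, and $\psi_R$ as in Definition 4.1. Observe that $\|\nabla\varphi_R\|_{L^\infty(\Omega_R)} \le C/R$, $\|\nabla\nabla\varphi_R\|_{L^\infty(\Omega_R)} \le C/R^2$, and by Lemma A.3,

$$\|\psi_R\|_{L^p(\Sigma_R)} \le C_p R\|\nabla\psi_R\|_{L^p(\Sigma_R)} = C_p R\|u\|_{L^p(\Sigma_R)}$$

for all $p$ in $[1, \infty]$ for some constant $C_p$. Thus,

$$\begin{aligned}
\|u - T_R u\|_{H(\Omega_R)} &= \|u - \nabla^\perp(\varphi_R\psi_R)\|_{L^2(\Omega_R)} = \|u - \varphi_R\nabla^\perp\psi_R - \psi_R\nabla^\perp\varphi_R\|_{L^2(\Omega_R)} \\
&\le \|1 - \varphi_R\|_{L^\infty(\Omega_R)}\|u\|_{L^2(\Sigma_R)} + \|\nabla\varphi_R\|_{L^\infty(\Omega_R)}\|\psi_R\|_{L^2(\Sigma_R)} \\
&\le \|u\|_{L^2(\Sigma_R)} + \frac{C}{R}\,C_2 R\|u\|_{L^2(\Sigma_R)} \le C\|u\|_{L^2(\Sigma_R)}.
\end{aligned}$$

This converges to 0 as $R \to \infty$ since $u$ is in $L^2(\mathbb{R}^2)$, giving Eq. (4.2). The same calculation with the first term dropped gives

$$\|T_R u\|_{H(\Omega_R)} \le \|u\|_{L^2(\Omega_R)} + C_2\|u\|_{L^2(\Sigma_R)} \le C\|u\|_{L^2(\Omega_R)}, \tag{4.6}$$

which bounds the operator norm of $T_R : H(\mathbb{R}^2) \to H(\Omega_R)$ independently of $R$.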
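The scaling bounds $\|\nabla\varphi_R\|_{L^\infty} \le C/R$ and $\|\nabla\nabla\varphi_R\|_{L^\infty} \le C/R^2$ used in this proof can be checked numerically in a one-dimensional model. The Python sketch below is only an illustration: the interval $(-1, 1)$ standing in for $\Omega_1$, the quintic `smoothstep` profile (which is $C^2$ rather than $C^\infty$), the collar width $1/2$, and all function names are assumptions, not part of the paper.

```python
def smoothstep(s):
    """Quintic profile rising from 0 at s = 0 to 1 at s = 1 (C^2, assumed model)."""
    s = min(max(s, 0.0), 1.0)
    return s * s * s * (s * (6.0 * s - 15.0) + 10.0)


def phi1(x):
    """Model cutoff on (-1, 1): 0 on the boundary, 1 where dist(x, boundary) >= 1/2."""
    return smoothstep(2.0 * (1.0 - abs(x)))


def phi(R, x):
    """Rescaled cutoff phi_R(x) = phi_1(x / R) on (-R, R)."""
    return phi1(x / R)


def max_derivatives(R, n=4001):
    """Finite-difference estimates of sup|phi_R'| and sup|phi_R''| on (-R, R)."""
    h = 2.0 * R / (n - 1)
    d1 = d2 = 0.0
    for i in range(1, n - 1):
        x = -R + i * h
        left, mid, right = phi(R, x - h), phi(R, x), phi(R, x + h)
        d1 = max(d1, abs(right - left) / (2.0 * h))
        d2 = max(d2, abs(right - 2.0 * mid + left) / (h * h))
    return d1, d2


if __name__ == "__main__":
    for R in (1.0, 4.0, 16.0):
        d1, d2 = max_derivatives(R)
        # R * d1 and R**2 * d2 should be (nearly) independent of R.
        print(R, R * d1, R * R * d2)
```

The products `R * d1` and `R**2 * d2` play the role of the constant $C$ in the two bounds; they depend on the shape of the cutoff but not on $R$.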


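Similarly,

$$\begin{aligned}
\|\nabla u - \nabla T_R u\|_{L^2(\Omega_R)} &= \|\nabla u - \nabla\nabla^\perp(\varphi_R\psi_R)\|_{L^2(\Omega_R)} \\
&= \|\nabla u - \nabla(\varphi_R\nabla^\perp\psi_R) - \nabla(\psi_R\nabla^\perp\varphi_R)\|_{L^2(\Omega_R)} \\
&= \|\nabla u - \varphi_R\nabla\nabla^\perp\psi_R - \nabla\varphi_R\otimes\nabla^\perp\psi_R - \nabla\psi_R\otimes\nabla^\perp\varphi_R - \psi_R\nabla\nabla^\perp\varphi_R\|_{L^2(\Omega_R)} \\
&= \|(1 - \varphi_R)\nabla u - \nabla\varphi_R\otimes\nabla^\perp\psi_R - \nabla\psi_R\otimes\nabla^\perp\varphi_R - \psi_R\nabla\nabla^\perp\varphi_R\|_{L^2(\Omega_R)} \\
&\le \|\nabla u\|_{L^2(\Sigma_R)} + 2\|\nabla\varphi_R\|_{L^\infty(\Omega_R)}\|u\|_{L^2(\Sigma_R)} + \|\nabla\nabla^\perp\varphi_R\|_{L^\infty(\Omega_R)}\|\psi_R\|_{L^2(\Sigma_R)} \\
&\le \|\nabla u\|_{L^2(\Sigma_R)} + \frac{C}{R}\|u\|_{L^2(\Sigma_R)} + \frac{C_2}{R^2}R\|u\|_{L^2(\Sigma_R)} \le C\|u\|_{H^1(\Sigma_R)},
\end{aligned}$$

which converges to zero because $u$ is in $H^1(\mathbb{R}^2)$. This gives Eq. (4.3). The same calculation with the first term dropped gives

$$\|\nabla T_R u\|_{L^2(\Omega_R)} \le \|\nabla u\|_{L^2(\Omega_R)} + (C/R)\|u\|_{L^2(\Sigma_R)} \le C\|u\|_{H^1(\Omega_R)}.$$

Together with Eq. (4.6), this bounds the operator norm of $T_R : V(\mathbb{R}^2) \to V^{(E)}(\Omega_R)$ independently of $R$.

Requiring that $\nabla\varphi_1 = 0$ on $\Gamma_1$ (so $\nabla\varphi_R = 0$ on $\Gamma_R$) affects none of the calculations above while ensuring that $T_R u$ lies in $V^{(NS)}(\Omega_R)$, since then $T_R u = \varphi_R\nabla^\perp\psi_R + \psi_R\nabla^\perp\varphi_R = 0$ on $\Gamma_R$, giving Eq. (4.5) and the independence of the operator norm on $R$.

Now assume that $u$ lies in $Y_\theta(\mathbb{R}^2)$. Then for all $p$ in the interval $[p_0, \infty)$,

$$\begin{aligned}
\|\omega(u) - \omega(T_R u)\|_{L^p(\Omega_R)} &= \|\omega(u) - \omega(\varphi_R\nabla^\perp\psi_R) - \omega(\psi_R\nabla^\perp\varphi_R)\|_{L^p(\Omega_R)} \\
&= \|\omega(u) - \varphi_R\,\omega(\nabla^\perp\psi_R) + \nabla\varphi_R\cdot(\nabla^\perp\psi_R)^\perp - \psi_R\,\omega(\nabla^\perp\varphi_R) + \nabla\psi_R\cdot(\nabla^\perp\varphi_R)^\perp\|_{L^p(\Omega_R)} \\
&= \|(1 - \varphi_R)\omega(u) - 2\nabla\varphi_R\cdot\nabla\psi_R - \psi_R\,\omega(\nabla^\perp\varphi_R)\|_{L^p(\Omega_R)} \\
&\le \|\omega(u)\|_{L^p(\Sigma_R)} + \frac{C}{R}\|\nabla\psi_R\|_{L^p(\Sigma_R)} + \frac{C}{R^2}\|\psi_R\|_{L^p(\Sigma_R)}.
\end{aligned} \tag{4.7}$$

We wish to obtain a bound on the last term that is independent of $p$. When $p \ge 2$,

$$\frac{C}{R^2}\|\psi_R\|_{L^p(\Sigma_R)} \le \frac{C}{R^2}\|\psi_R\|_{L^2\cap L^\infty(\Sigma_R)} \le \max\{C_2, C_\infty\}\frac{C}{R^2}R\|\nabla\psi_R\|_{L^2\cap L^\infty(\Sigma_R)} \le \frac{C}{R}\|u\|_{L^2\cap L^\infty(\Sigma_R)},$$

which converges to 0 because $u$ is in $L^2(\mathbb{R}^2)$ by assumption and is in $L^\infty(\mathbb{R}^2)$ by Lemma A.4. For $p$ in $[p_0, 2)$, let $q$ and $b$ be such that $1/p = 1/2 + 1/q$ and $1/p_0 = 1/2 + 1/b$. Then

$$\frac{C}{R^2}\|\psi_R\|_{L^p(\Sigma_R)} \le \frac{C}{R^2}\|\psi_R\|_{L^2(\Sigma_R)}\|1\|_{L^q(\Sigma_R)} \le C R^{2/q-2}\,C_2 R\|u\|_{L^2(\Sigma_R)} = C R^{2/q-1}\|u\|_{L^2(\Sigma_R)}.$$

Since $q \ge b > 2$, we have

$$\frac{C}{R^2}\|\psi_R\|_{L^p(\Sigma_R)} \le C R^{2/b-1}\|u\|_{L^2\cap L^\infty(\Sigma_R)} \le C R^{2/b-1}\|u\|_{L^2\cap L^\infty(\mathbb{R}^2)},$$

an inequality that, in fact, holds for all $p$ in $[p_0, \infty)$. Similarly,

$$\frac{C}{R}\|\nabla\psi_R\|_{L^p(\Sigma_R)} \le C R^{2/b-1}\|u\|_{L^2\cap L^\infty(\mathbb{R}^2)}.$$

Then from Eq. (4.7), we have

$$\|\omega(u) - \omega(T_R u)\|_{L^p(\Omega_R)} \le \|\omega(u)\|_{L^p(\Sigma_R)} + C R^{2/b-1}\|u\|_{L^2\cap L^\infty(\mathbb{R}^2)}.$$

This converges to 0 as $R \to \infty$ because $\omega(u)$ is in $L^p(\mathbb{R}^2)$, $u$ is in $L^2\cap L^\infty(\mathbb{R}^2)$, and $2/b - 1 < 0$, giving Eq. (4.4). A similar argument gives

$$\|\omega(T_R u)\|_{L^p(\Omega_R)} \le \|\omega(u)\|_{L^p(\mathbb{R}^2)} + C R^{2/b-1}\|u\|_{L^2\cap L^\infty(\mathbb{R}^2)}.$$

From interpolation of Lebesgue spaces and Lemma A.4,

$$\|u\|_{L^2\cap L^\infty(\mathbb{R}^2)} \le \max\left\{\|u\|_{L^2(\mathbb{R}^2)}, \|u\|_{L^\infty(\mathbb{R}^2)}\right\} \le C\left(\|u\|_{L^2(\mathbb{R}^2)} + \|\omega(u)\|_{L^4(\mathbb{R}^2)}\right) \le C\|u\|_{Y_\theta(\mathbb{R}^2)}.$$

Thus by Eq. (3.2),

$$\|T_R u\|_{Y_\theta(\Omega_R)} \le \|u\|_{L^2(\mathbb{R}^2)} + \sup_{p\in[p_0,\infty)}\frac{\|\omega(u)\|_{L^p(\mathbb{R}^2)} + C R^{2/b-1}\|u\|_{Y_\theta(\mathbb{R}^2)}}{\theta(p)} \le C\|u\|_{Y_\theta(\mathbb{R}^2)},$$

showing that $T_R : Y_\theta(\mathbb{R}^2) \to Y_\theta(\Omega_R)$ with an operator norm that is independent of $R$.

5. Weak Solutions

Definition 5.1. (Weak Navier-Stokes Solutions) Given viscosity $\nu > 0$ and initial velocity $u^0$ in $H(\Omega_R)$, $u$ in $L^2([0, T]; V^{(NS)})$ with $\partial_t u$ in $L^2([0, T]; (V^{(NS)})')$ is a weak solution to the Navier-Stokes equations (without forcing) if $u(0) = u^0$ and

$$\text{(NS)}\qquad \int_{\Omega_R}\partial_t u\cdot v + \int_{\Omega_R}(u\cdot\nabla u)\cdot v + \nu\int_{\Omega_R}\nabla u\cdot\nabla v = 0$$

for almost all $t$ in $[0, T]$ and for all $v$ in $V^{(NS)}(\Omega_R)$.

For the Euler equations, existence is only known if the $L^p$-norm of the initial vorticity is finite for some $p$ in $(1, \infty]$, and uniqueness is known only under even stronger assumptions, such as the initial velocity lying in $Y$ (see also [15]). This is reflected in the following definition of a weak solution to the Euler equations.

Definition 5.2. (Weak Euler Solutions) Given an initial velocity $u^0$ in $Y(\Omega_R)$, $u$ in $L^\infty([0, T]; V^{(E)})$ with $\partial_t u$ in $L^2([0, T]; (V^{(NS)})')$ is a weak solution to the Euler equations (without forcing) if $u(0) = u^0$ and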


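$$\text{(E)}\qquad \int_{\Omega_R}\partial_t u\cdot v + \int_{\Omega_R}(u\cdot\nabla u)\cdot v = 0$$

for almost all $t$ in $[0, T]$ and for all $v$ in $V^{(E)}(\Omega_R)$.

Given a solution to (NS), there exists a distribution $p$ (tempered, if $R = \infty$) such that

$$\partial_t u + u\cdot\nabla u + \nabla p = \nu\Delta u, \tag{5.1}$$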

equality holding in the sense of distributions. This follows from a result of Poincaré and de Rham that any distribution that is a curl-free vector is the gradient of some scalar distribution. Given a solution to (E), there exists a pressure $p$ such that

$$\partial_t u + u\cdot\nabla u + \nabla p = 0, \tag{5.2}$$

but we can only interpret $p$ as a distribution when $R = \infty$. Otherwise, we must view $\partial_t u + u\cdot\nabla u$ as lying in $H^{-1}(\Omega_R)$ and $p$ as lying in $L^2(\Omega_R)$. (Equation (5.2) follows, for instance, from Remark I.1.9, p. 14 of [14].) In both Eq. (5.1) and Eq. (5.2) the pressure is unique up to the addition of a function of time. We resolve this ambiguity for $R < \infty$ by requiring that $\int_{\Omega_R} p(t) = 0$ and for $R = \infty$ by requiring that $p(t)$ lie in $L^2(\Omega_R)$ for almost all $t$ in $[0, T]$.

6. Properties of the Velocity and Pressure

Theorem 6.1. (1) Assume that $u^0$ is in $V(\mathbb{R}^2)$. Then there exists a unique weak solution $(u, p)$ to (NS) with initial velocity $u^0$ for $R = \infty$ and initial velocity $T_R u^0$ (see Definition 4.1) for $R$ in $[1, \infty)$, with

$$\begin{aligned}
u &\in L^\infty([0, T]; H(\Omega_R)), & \nabla u &\in L^\infty([0, T]; L^2(\Omega_R)), \\
u &\in L^4([0, T]; L^\infty(\Omega_R)), & \Delta u &\in L^2([0, T]; L^2(\Omega_R)), \\
\partial_t u &\in L^2([0, T]; H(\Omega_R)), & \nabla p &\in L^2([0, T]; L^2(\Omega_R)), \\
u &\in L^\infty([0, T]; H^1(\Omega_R)), & u &\in L^2([0, T]; H^2(\Omega_R)),
\end{aligned}$$

and the norms in these spaces can be bounded independently of $R$ in $[1, \infty]$. If $R < \infty$ then $p$ is in $L^2([0, T]; L^2(\Omega_R))$ and if $R = \infty$ then $p$ is in $L^\infty([0, T]; L^2(\mathbb{R}^2))$ and $\nabla p$ is in $L^4([0, T]; L^2(\mathbb{R}^2))$.

(2) Assume that $u^0$ is in $Y_\theta(\mathbb{R}^2)$. Then there exists a unique weak solution $(u, p)$ to (E) in the sense of Definition 5.2 with initial velocity $u^0$ for $R = \infty$ and initial velocity $T_R u^0$ for $R$ in $[1, \infty)$. The velocity $u$ lies in $L^\infty([0, T]; Y_\theta)$ and is unique in that class. We have,

$$\begin{aligned}
u &\in L^\infty([0, T]; H(\Omega_R)), & \nabla u &\in L^\infty([0, T]; L^2(\Omega_R)), \\
u &\in L^\infty([0, T]\times\Omega_R), & u &\in C([0, T]\times\Omega_R), \\
\partial_t u &\in L^\infty([0, T]; H(\Omega_R)), & \nabla p &\in L^\infty([0, T]; L^2(\Omega_R)),
\end{aligned}$$

and the norms in these spaces and of $u$ in $L^\infty([0, T]; Y_\theta)$ can be bounded independently of $R$ in $[1, \infty]$. The pressure $p$ is in $L^\infty([0, T]; H^1(\mathbb{R}^2))$. Also,

$$\|\omega(t)\|_{L^q(\Omega_R)} = \|\omega^0\|_{L^q(\Omega_R)} \tag{6.1}$$


for all $q$ in $[p_0, \infty)$ (and for $q = \infty$ if $\omega^0$ is in $L^\infty(\Omega_R)$) and almost all $t \ge 0$, where $p_0$ is as in Definition 2.2. Furthermore, there is a bound on the modulus of continuity of $u(t, x)$ in $t$ that is independent of $x$ and a bound on the modulus of continuity of $u(t, x)$ in $x$ that is independent of $t$, and both of these bounds are independent of $R$ in $[1, \infty]$. There exists a unique flow $X$ associated with $u$ with bounds on the moduli of continuity in time and in space with the same properties just described for $u$. Finally, the bound, $\mu$, on the modulus of continuity of $u(t, x)$ in $x$ satisfies $\int_0^1 ds/\mu(s) = \infty$.

Proof. The facts regarding solutions to (NS) in (1) are entirely classical except perhaps for the independence of the norms on $R$. In that regard, we note that no domain-dependent constants enter into the bounds on $u$ in $L^\infty([0, T]; H(\Omega_R))$ or $\nabla u$ in $L^2([0, T]; L^2(\Omega_R))$, as these bounds follow from the most basic energy equality derived by multiplying Eq. (5.1) by $u$ and integrating over $\Omega_R$. (This is true even with forcing, though then the domain-independent bounds grow with $T$.) Only the norms of $u^0$ and $\nabla u^0$ in $L^2(\Omega_R)$ enter into these bounds, and by Lemma 4.2 the truncation operator $T_R$ is bounded in $L^2$ and $H^1$; hence, the bounds can be made independent of $R$.

In the bounds on $\nabla u$ in $L^\infty([0, T]; L^2(\Omega_R))$ and $\Delta u$ in $L^2([0, T]; L^2(\Omega_R))$, domain-dependent constants do enter. These bounds follow by an energy inequality derived (formally) by multiplying Eq. (5.1) by $Au$ and integrating over $\Omega_R$ (see, for instance, the proof of Theorem III.3.10, p. 213–214 of [14] for details). Here, $A$ is the Stokes operator. The proof of this energy inequality relies on two key inequalities, the first being

$$C\|\Delta u\|_{L^2(\Omega_R)} \le \|Au\|_{L^2(\Omega_R)} \le \|\Delta u\|_{L^2(\Omega_R)}. \tag{6.2}$$

The constant $C$ is independent of $R$ because $Au$ and $\Delta u$ scale the same way with $R$. The second key inequality is Eq. (A.3) applied to $\nabla u$ instead of $u$, giving

$$\|\nabla u\|^2_{L^4(\Omega_R)} \le C\|\nabla u\|_{L^2(\Omega_R)}\left(\|\nabla\nabla u\|_{L^2(\Omega_R)} + (1/R)\|\nabla u\|_{L^2(\Omega_R)}\right).$$

But it follows from basic elliptic regularity theory (see, for instance, Theorem 8.12 p. 176 of [4]) that

$$\|\nabla\nabla u\|_{L^2(\Omega_R)} \le C\left(\|\Delta u\|_{L^2(\Omega_R)} + (1/R)\|\nabla u\|_{L^2(\Omega_R)}\right), \tag{6.3}$$

with a scaling argument to give the factor of $1/R$ and the independence of $C$ on $R$. Other than the additional term of $(1/R)\|\nabla u\|_{L^2(\Omega_R)}$, which is easy to accommodate, the derivation of the energy inequality proceeds as usual, giving bounds on $\nabla u$ in $L^\infty([0, T]; H^1(\Omega_R))$, on $u$ in $L^\infty([0, T]; L^2(\Omega_R))$, and on $\Delta u$ in $L^2([0, T]; L^2(\Omega_R))$ that are independent of $R$ (though not of the shape of the domain).

Because $u$, $\nabla u$, and $\Delta u$ are each in $L^2([0, T]; L^2(\Omega_R))$ with bounds on their norms that are independent of $R$, it follows from Eq. (6.3) that $u$ is in $L^2([0, T]; H^2(\Omega_R))$ with a bound on its norm that is independent of $R$. The remaining bounds on $u$, $\partial_t u$, and $\nabla p$ follow from these basic bounds, and in that way we obtain independence of all the stated norms on $R$.

By Lemma 4.2, the operator norm of $T_R : Y_\theta(\mathbb{R}^2) \to Y_\theta(\Omega_R)$ is independent of $R$. So too then are the bounds on the norms in (2), which derive from the energy inequality and the transport of vorticity along the flow lines and so involve no domain-dependent constants.


For solutions to (E) in (2), the existence, uniqueness, and regularity of $u$ for $R < \infty$ were proved in the special case of bounded initial vorticity by Yudovich in [18]. He extended uniqueness to the case of Yudovich initial vorticity in [19] for $R < \infty$; uniqueness for $R = \infty$ is essentially the same (see [7]). For $R$ in $[1, \infty]$, existence in the class $Y(\Omega_R)$ follows from Theorem 4.1, p. 126 and the comment immediately preceding Remark 4.4, p. 132 of [10], the comment being that the $L^p$-norm of vorticity is independent of time for any $p$ for which $\omega^0$ is in $L^p$. For $R < \infty$, existence can also be established as in [17,18] (see comment in the introduction to [19]). Uniqueness in the class $Y(\Omega_R)$ for $R < \infty$ is established by Yudovich in [19], and his argument extends with little change to $R = \infty$.

To establish the facts concerning the moduli of continuity of the velocity and flow in the last paragraph of (2), however, it is much easier to adapt the approach in Majda's proof of existence and uniqueness of solutions to (E) as elucidated on p. 311-319 of [11]. (The proof is worked out in all of $\mathbb{R}^2$ but can be adapted to a bounded domain without difficulty.) The only significant change we need make for the unbounded initial vorticities in $Y_\theta(\Omega_R)$ is to substitute the potential theory arguments in Lemma 6.2 for those in [11].

Lemma 6.2. Let $u$ lie in the space $L^\infty([0, T]; Y_\theta(\Omega_R))$ for $R$ in $[1, \infty]$ and assume that $u$ is locally integrable in $[0, T]\times\Omega_R$. Then there exists a unique associated flow $X : [0, T]\times\Omega_R \to \Omega_R$. The moduli of continuity of $u(t, \cdot)$ and $X(t, \cdot)$ are each bounded by a function that depends only upon the norm of $u$ in $L^\infty([0, T]; Y_\theta(\Omega_R))$ and upon the function $\theta$ itself (in particular, the bound is independent of $t$ in $[0, T]$). Furthermore, if $\mu$ is the bound on the modulus of continuity of $u$ in space, then $\int_0^1 ds/\mu(s) = \infty$.

Proof. For $R = \infty$ this result follows from Theorem 3.1 of [15] (or see Chapter 5 of [8]). For $R < \infty$ it follows from Lemma 4.2 and Theorem 2 of [19] except for the independence of the moduli of continuity on $R$, but this follows from a scaling argument. In both cases, the bound depends only upon the function $\theta$ (via the function $\mu$). As noted in [19], there is the somewhat surprising relationship between $\mu$ and the function $\beta_1$ of Eq. (2.1) that $\mu(r) = (C/r)\beta_1(r^2/4)$.

7. Tail of the Velocity

For our solutions to (E) and (NS) in all of $\mathbb{R}^2$, at any time $t > 0$ the velocity $u(t)$ and its gradient $\nabla u(t)$ lie in $L^2(\mathbb{R}^2)$ and hence vanish at infinity, though at no specific a priori rate. In the proof of Theorem 8.1, however, we will need the stronger property that $u(t)$ vanishes at infinity in the $L^2$-norm at a rate that is bounded in $L^\infty([0, T])$ and, for (NS), that $\nabla u(t)$ vanishes in the $L^2$-norm at a rate that is bounded in $L^2([0, T])$. The rate itself, while unimportant to obtain convergence, will be determined by the rate at which $u^0$ vanishes at infinity, though it will never be faster than $C/R$.

Lemma 7.1. Let $(u, p)$ be a solution to (E) in all of $\mathbb{R}^2$ with initial velocity in $Y(\mathbb{R}^2)$. Then

$$\|u\|_{L^\infty([0,T]; L^2(\Omega_R^C))} \to 0 \text{ as } R \to \infty. \tag{7.1}$$

Let $(u, p)$ be a solution to (NS) in all of $\mathbb{R}^2$ with initial velocity in $H(\mathbb{R}^2)$. Then Eq. (7.1) holds and also

$$\|\nabla u\|_{L^2([0,T]; L^2(\Omega_R^C))} \to 0 \text{ as } R \to \infty. \tag{7.2}$$


Proof. The lemma follows by a standard energy argument that involves scaling by $R$ a cutoff function defined to be 0 on $\Omega_{1/2}$ and 1 on $\Omega_1^C$.

8. Main Result: Convergence of Solutions

Theorem 8.1. Let $u^0$ be in $V(\mathbb{R}^2)$ and let $(u_R, p_R)$ be the solution to (NS) of Definition 5.1 for $R$ in $[1, \infty)$ with initial velocity $T_R u^0$ in $V^{(NS)}(\Omega_R)$. ($T_R$ is defined in Definition 4.1.) Let $(u, p)$ be the solution to (NS) in all of $\mathbb{R}^2$ with initial velocity $u^0$. Then

$$\|u_R - u\|_{L^\infty([0,T]; L^2(\Omega_R))} \to 0 \text{ as } R \to \infty \tag{8.1}$$

and

$$\|\nabla u_R - \nabla u\|_{L^2([0,T]; L^2(\Omega_R))} \to 0 \text{ as } R \to \infty. \tag{8.2}$$

Let $u^0$ be in $Y(\mathbb{R}^2)$ and let $(u_R, p_R)$ be the unique solution to (E) of Definition 5.2 for $R$ in $[1, \infty)$ with initial velocity $T_R u^0$ in $Y(\Omega_R)$. Let $(u, p)$ be the solution to (E) in all of $\mathbb{R}^2$ with initial velocity $u^0$. Then

$$\|u_R - u\|_{L^\infty([0,T]; L^2\cap L^\infty(\Omega_R))} \to 0 \text{ as } R \to \infty \tag{8.3}$$

and

$$\|\nabla u_R - \nabla u\|_{L^\infty([0,T]; L^p(\Omega_R))} \to 0 \text{ as } R \to \infty \tag{8.4}$$

for all $p$ in $[p_0, \infty)$, where $p_0$ is as in Definition 2.2. Also, if $X_R$ and $X$ are the flows associated to $u_R$ and $u$, as given by Theorem 6.1, then

$$\|X_R - X\|_{L^\infty([0,T]\times\Omega_R)} \to 0 \text{ as } R \to \infty. \tag{8.5}$$

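Proof. Basic energy inequality. For the first part of the proof we will treat (NS) and (E) in a unified manner, since, formally, (E) is simply (NS) with $\nu = 0$. We start with a basic energy argument. Let $w = u_R - u$ and observe that $\|w(0)\|_{H^1(\Omega_R)} = \|u^0 - T_R u^0\|_{H^1(\Omega_R)} \to 0$ as $R \to \infty$ by Lemma 4.2. Subtracting Eq. (5.1) for $(u, p)$ from Eq. (5.1) for $(u_R, p_R)$, we have, on $\Omega_R$,

$$\partial_t w + u_R\cdot\nabla u_R - u_R\cdot\nabla u + u_R\cdot\nabla u - u\cdot\nabla u + \nabla p_R - \nabla p = \nu\Delta w$$

or

$$\partial_t w + u_R\cdot\nabla w + w\cdot\nabla u + \nabla p_R - \nabla p = \nu\Delta w.$$

Multiplying by $w$ and integrating over space, we obtain

$$\begin{aligned}
\frac{1}{2}\frac{d}{dt}\|w(t)\|^2_{L^2(\Omega_R)} &+ \int_{\Omega_R}(u_R\cdot\nabla w)\cdot w + \int_{\Omega_R}(w\cdot\nabla u)\cdot w + \int_{\Omega_R}\nabla(p_R - p)\cdot w = \nu\int_{\Omega_R}\Delta w\cdot w \\
&= -\nu\int_{\Omega_R}\nabla w\cdot\nabla w + \nu\int_{\Gamma_R}(\nabla w\cdot n)\cdot w \\
&= -\nu\int_{\Omega_R}|\nabla w|^2 - \nu\int_{\Gamma_R}(\nabla w\cdot n)\cdot u.
\end{aligned}$$

In the last equality we used $\nu = 0$ for (E) and $u_R = 0$ on $\Gamma_R$ for (NS). But,

$$\begin{aligned}
\int_{\Omega_R}(u_R\cdot\nabla w)\cdot w &= \int_{\Omega_R}u_R^j\,\partial_j w^i\, w^i = \frac{1}{2}\int_{\Omega_R}u_R^j\,\partial_j|w|^2 = \frac{1}{2}\int_{\Omega_R}u_R\cdot\nabla|w|^2 \\
&= -\frac{1}{2}\int_{\Omega_R}(\operatorname{div} u_R)|w|^2 + \frac{1}{2}\int_{\Gamma_R}(u_R\cdot n)\,|w|^2 = 0,
\end{aligned}$$

since $\operatorname{div} u_R = 0$ and $u_R\cdot n = 0$ on $\Gamma_R$ (in fact, $u_R = 0$ on $\Gamma_R$ for (NS)). Thus, we have,

$$\frac{d}{dt}\|w(t)\|^2_{L^2(\Omega_R)} + 2\nu\|\nabla w\|^2_{L^2(\Omega_R)} = -2\int_{\Omega_R}\nabla(p_R - p)\cdot w - 2\nu\int_{\Gamma_R}(\nabla w\cdot n)\cdot u - 2\int_{\Omega_R}(w\cdot\nabla u)\cdot w.$$

Integrating in time gives

$$\begin{aligned}
\|w(t)\|^2_{L^2(\Omega_R)} + 2\nu\int_0^t\|\nabla w\|^2_{L^2(\Omega_R)} = \|w(0)\|^2_{L^2(\Omega_R)} &- 2\int_0^t\int_{\Omega_R}\nabla(p_R - p)\cdot w \\
&- 2\nu\int_0^t\int_{\Gamma_R}(\nabla w\cdot n)\cdot u - 2\int_0^t\int_{\Omega_R}(w\cdot\nabla u)\cdot w.
\end{aligned} \tag{8.6}$$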

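Letting $E$ be the extension operator of Lemma A.1, we have

$$\int_{\Omega_R}\nabla(p_R - p)\cdot w = -\int_{\Omega_R}\nabla(p_R - p)\cdot u = \int_{\Omega_R^C}\nabla(E p_R - p)\cdot u.$$

The first equality follows from $\int_{\Omega_R}\nabla(p_R - p)\cdot u_R = 0$ and the second from $\int_{\mathbb{R}^2}\nabla(E p_R - p)\cdot u = 0$. Then,

$$\begin{aligned}
\left|\int_0^t\int_{\Omega_R^C}\nabla p\cdot u\right| &\le \|\nabla p\|_{L^2([0,T]; L^2(\mathbb{R}^2))}\|u\|_{L^2([0,T]; L^2(\Omega_R^C))}, \\
\left|\int_0^t\int_{\Omega_R^C}\nabla E p_R\cdot u\right| &\le \|\nabla E p_R\|_{L^2([0,T]; L^2(\mathbb{R}^2))}\|u\|_{L^2([0,T]; L^2(\Omega_R^C))}.
\end{aligned} \tag{8.7}$$

The first integral in Eq. (8.7) converges to 0 as $R \to \infty$ by Theorem 6.1 and Eq. (7.1). Because

$$\|\nabla E p_R\|_{L^2(\mathbb{R}^2)} \le C\left(\|\nabla p_R\|_{L^2(\Omega_R)} + \frac{1}{R}\|p_R\|_{L^2(\Omega_R)}\right) \le C\|\nabla p_R\|_{L^2(\Omega_R)}$$

by Lemma A.1 and Lemma A.3 (recall that $\int_{\Omega_R} p_R = 0$), the second integral in Eq. (8.7) converges to 0 as well.

For solutions to (NS), we extend $w$ to all of $\mathbb{R}^2$ as $w = E u_R - u$ (we do not need a divergence-free extension). Then

$$\int_{\Gamma_R}(\nabla w\cdot n)\cdot u = -\int_{\Omega_R^C}\nabla w\cdot\nabla u - \int_{\Omega_R^C}\Delta w\cdot u$$

so

$$\left|\int_0^t\int_{\Gamma_R}(\nabla w\cdot n)\cdot u\right| \le \|\nabla w\|_{L^2([0,T]; L^2(\mathbb{R}^2))}\|\nabla u\|_{L^2([0,T]; L^2(\Omega_R^C))} + \|\Delta w\|_{L^2([0,T]; L^2(\mathbb{R}^2))}\|u\|_{L^2([0,T]; L^2(\Omega_R^C))}.$$

By Theorem 6.1, $\|\nabla u\|_{L^2([0,T]; L^2(\mathbb{R}^2))} \le C$. Also, $\|\nabla E u_R\|_{L^2([0,T]; L^2(\mathbb{R}^2))} \le C\|u_R\|_{L^2([0,T]; H^1(\Omega_R))} \le C$ by Lemma A.1 and Theorem 6.1 so $\|\nabla w\|_{L^2([0,T]; L^2(\mathbb{R}^2))} \le C$. Similar reasoning gives $\|\Delta w\|_{L^2([0,T]; L^2(\mathbb{R}^2))} \le C$. Therefore,

$$\int_0^t\int_{\Gamma_R}(\nabla w\cdot n)\cdot u \to 0$$

as $R \to \infty$ by Eq. (7.1) and Eq. (7.2). (It is only in this bound that we require that $u^0$ lie in $V(\mathbb{R}^2)$. For the other bounds, $u^0$ in $H(\mathbb{R}^2)$ would have sufficed.)

From Eq. (8.6) and the estimates above, we have that

$$\|w(t)\|^2_{L^2(\Omega_R)} + 2\nu\int_0^t\|\nabla w\|^2_{L^2(\Omega_R)} \le K + 2\int_0^t\int_{\Omega_R}|\nabla u|\,|w|^2, \tag{8.8}$$

where $K \to 0$ as $R \to \infty$.

Solutions to (NS) with $u^0$ in $V$. Assume that $(u_R, p_R)$ and $(u, p)$ are solutions to (NS) with $u^0$ in $V^{(NS)}(\mathbb{R}^2)$. Applying Lemma A.2, Young's inequality, and the inequality $(A + B)^2 \le 2(A^2 + B^2)$ to Eq. (8.8), we have

$$\begin{aligned}
\|w(t)\|^2_{L^2(\Omega_R)} + 2\nu\int_0^t\|\nabla w\|^2_{L^2(\Omega_R)} &\le K + 2\int_0^t\|\nabla u\|_{L^2(\Omega_R)}\|w\|^2_{L^4(\Omega_R)} \\
&\le K + 2^{3/2}\int_0^t\|\nabla u\|_{L^2(\Omega_R)}\|w\|_{L^2(\Omega_R)}\left(\|\nabla w\|_{L^2(\Omega_R)} + \frac{1}{R}\|w\|_{L^2(\Omega_R)}\right) \\
&\le K + \nu\int_0^t\left(\|\nabla w\|^2_{L^2(\Omega_R)} + \frac{1}{R^2}\|w\|^2_{L^2(\Omega_R)}\right) + C\int_0^t\|\nabla u\|^2_{L^2(\Omega_R)}\|w\|^2_{L^2(\Omega_R)},
\end{aligned}$$

or,

$$\|w(t)\|^2_{L^2(\Omega_R)} + \nu\int_0^t\|\nabla w\|^2_{L^2(\Omega_R)} \le K + \int_0^t\left(C\|\nabla u\|^2_{L^2(\Omega_R)} + \frac{\nu}{R^2}\right)\|w\|^2_{L^2(\Omega_R)} \le K + C\int_0^t\|w\|^2_{L^2(\Omega_R)},$$

where we used Theorem 6.1 in the last inequality. Applying Gronwall's lemma gives Eq. (8.1) and Eq. (8.2).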


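Solutions to (E). By Lemma 4.2 and Theorem 6.1, there exists a unique solution $(u_R, p_R)$ to (E) for all $R$ in $[1, \infty)$ and both $u_R$ and $u$ lie in $L^\infty(\mathbb{R}\times\Omega_R)$ with a norm that is independent of $R$. Thus,

$$M = \sup_{R\ge 1}\left\||w|^2\right\|_{L^\infty([0,T]\times\Omega_R)} \tag{8.9}$$

is finite and independent of $R$ in $[1, \infty]$.

We now proceed as in [19] or [7]. Let $s$ be in $[0, T]$, and let

$$A = |w(s, x)|^2, \qquad B = |\nabla u(s, x)|, \qquad L(s) = \|w(s)\|^2_{L^2}.$$

Then for all $1/\varepsilon$ in $[2 + \varepsilon_0, \infty)$,

$$\begin{aligned}
\int_{\Omega_R}|\nabla u(s, x)|\,|w(s, x)|^2\,dx &= \int_{\Omega_R}AB = \int_{\Omega_R}A^{\varepsilon}A^{1-\varepsilon}B \le M^{\varepsilon}\int_{\Omega_R}A^{1-\varepsilon}B \\
&\le M^{\varepsilon}\|A^{1-\varepsilon}\|_{L^{1/(1-\varepsilon)}}\|B\|_{L^{1/\varepsilon}} = M^{\varepsilon}\|A\|^{1-\varepsilon}_{L^1}\|B\|_{L^{1/\varepsilon}} \\
&= M^{\varepsilon}L(s)^{1-\varepsilon}\|\nabla u(s)\|_{L^{1/\varepsilon}} \le C M^{\varepsilon}L(s)^{1-\varepsilon}\frac{1}{\varepsilon}\|\omega^0\|_{L^{1/\varepsilon}} \\
&\le C M^{\varepsilon}L(s)^{1-\varepsilon}\frac{1}{\varepsilon}\theta(1/\varepsilon),
\end{aligned}$$

where $\theta$ is as in Definition 2.1. Here we used Lemma A.5 and the bounds on the $L^p$-norms of the vorticity given by Eq. (6.1). Since this inequality holds for all $\varepsilon$ in $(0, (2+\varepsilon_0)^{-1}]$ it follows that

$$2\int_{\Omega_R}|\nabla u(s, x)|\,|w(s, x)|^2\,dx \le C\beta_M(L(s)),$$

with $\beta_M$ as in Eq. (2.1). From Eq. (8.8), then, we have

$$L(t) \le K + C\int_0^t\beta_M(L(r))\,dr. \tag{8.10}$$

By Lemma A.6,

$$\int_K^{L(t)}\frac{ds}{C\beta_M(s)} \le \int_0^t ds = t. \tag{8.11}$$

It follows that for all $t$ in $(0, T]$,

$$\int_K^1\frac{ds}{\beta_M(s)} \le CT + \int_{L(t)}^1\frac{ds}{\beta_M(s)}.$$

Since Eq. (2.2) holds and $K \to 0$ as $R \to \infty$, the left side becomes infinite as $R \to \infty$; hence, so must the right side. But this implies that $L(t) \to 0$ as $R \to \infty$, and that the convergence is uniform over $[0, T]$: this is Eq. (8.1). It also follows from Eq. (8.11) that

$$\int_K^{L(t)}\frac{dr}{\beta_M(r)} \le Ct,$$

which can be used, in principle, to bound the rate of convergence. Also, Eq. (8.3) follows by an application of Corollary 8.4 to $u_R$ and $u|_{\Omega_R}$.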


Vorticity for solutions to (E). We have,

$$\begin{aligned}
\|\omega_R(t) - \omega(t)\|_{L^p(\Omega_R)} &= \|\omega^0(T_R u^0)\circ X_R^{-1}(t) - \omega^0\circ X^{-1}(t)\|_{L^p(\Omega_R)} \\
&\le \|\omega^0(T_R u^0)\circ X_R^{-1}(t) - \omega^0\circ X_R^{-1}(t)\|_{L^p(\Omega_R)} + \|\omega^0\circ X_R^{-1}(t) - \omega^0\circ X^{-1}(t)\|_{L^p(\Omega_R)} \\
&= \|\omega^0(T_R u^0) - \omega^0\|_{L^p(\Omega_R)} + \|\omega^0\circ X_R^{-1}(t) - \omega^0\circ X^{-1}(t)\|_{L^p(\Omega_R)},
\end{aligned} \tag{8.12}$$

using, in the last step, that $X_R^{-1}(t)$ is measure-preserving and maps $\Omega_R$ to itself. The first term on the right-hand side of Eq. (8.12) converges to zero as $R \to \infty$ by Lemma 4.2. This leaves the second term on the right-hand side of Eq. (8.12), which converges to zero by Lemma 8.2 if $X_R^{-1} \to X^{-1}$ in $L^\infty([0, T]\times\Omega_R)$, which we now show.

The inverse flow $X^{-1}$ is given by

$$X^{-1}(t, x) = x - \int_0^t u(s, X^{-1}(s, x))\,ds,$$

and similarly for $X_R^{-1}$. Then,

$$\begin{aligned}
\left|X_R^{-1}(t, x) - X^{-1}(t, x)\right| &= \left|\int_0^t\left(u_R(s, X_R^{-1}(s, x)) - u(s, X^{-1}(s, x))\right)ds\right| \\
&\le \int_0^t\left|u_R(s, X_R^{-1}(s, x)) - u(s, X_R^{-1}(s, x))\right|ds + \int_0^t\left|u(s, X_R^{-1}(s, x)) - u(s, X^{-1}(s, x))\right|ds.
\end{aligned}$$

But,

$$\left|u(s, X_R^{-1}(s, x)) - u(s, X^{-1}(s, x))\right| \le \mu\left(\left|X_R^{-1}(s, x) - X^{-1}(s, x)\right|\right),$$

where $\mu$ is the bound on the modulus of continuity in space of $u$ given by Theorem 6.1. Also,

$$\int_0^t\left|u_R(s, X_R^{-1}(s, x)) - u(s, X_R^{-1}(s, x))\right|ds \le A(R)T,$$

where $A(R) = \|u_R - u\|_{L^\infty([0,T]\times\Omega_R)}$; this converges to zero as $R \to \infty$ by Eq. (8.3). Thus,

$$\left|X_R^{-1}(t, x) - X^{-1}(t, x)\right| \le A(R)T + \int_0^t\mu\left(\left|X_R^{-1}(s, x) - X^{-1}(s, x)\right|\right)ds.$$

Letting $L_R(t) = \left|X_R^{-1}(t, x) - X^{-1}(t, x)\right|$, we have

$$L_R(t) \le A(R)T + \int_0^t\mu(L_R(s))\,ds.$$

Applying Lemma A.6 gives

$$\int_{A(R)T}^{L_R(t)}\frac{ds}{\mu(s)} \le t.$$

Because $\int_0^1 ds/\mu(s) = \infty$, we conclude that $X_R^{-1} \to X^{-1}$ in $L^\infty([0, T]\times\Omega_R)$, thus completing the demonstration of Eq. (8.5).

Applying Lemma A.5 for $p \ge 2 + \varepsilon_0$ and standard elliptic regularity bounds along with Eq. (8.3) for $p$ in $[p_0, 2 + \varepsilon_0)$ gives Eq. (8.4).

We can obtain an upper bound on the rate of convergence of solutions to (NS) in Eq. (8.1) and Eq. (8.2) by examining the bounds in the proof above, in the proof of Lemma 7.1, and in the proof of Lemma 4.2. Similarly, we can obtain a bound on the rate of convergence of solutions to (E) in Eq. (8.3). For (NS), the convergence rate is controlled by the rate of decay with $R$ of $\|u^0\|_{L^2(\Omega_R^C)}$ and $\|\nabla u^0\|_{L^2(\Omega_R^C)}$. For solutions to (E), the convergence rate is controlled by the rate of decay with $R$ of $\|u^0\|_{L^2(\Omega_R^C)}$ and by the function $\beta_M$ of Definition 2.1. (The function $\beta_M$ enters into these bounds much as in [7] or [9].) We can also obtain a bound on the rate of convergence in Eq. (8.4), but this ultimately relies on measure-theoretic properties of $\omega^0$ that are hard to characterize usefully, let alone quantify. The rate of convergence of the flow, however, can be determined much as for the convergence in Eq. (8.3).

We used the following lemmas in the proof of Theorem 8.1:

Lemma 8.2. Let $f$ be in $L^p(\mathbb{R}^d)$, $1 \le p < \infty$, $d \ge 1$, and let $(X_n)$ and $(Y_n)$ be sequences of measure-preserving homeomorphisms from a domain $\Omega_R$ of $\mathbb{R}^d$ to all of $\mathbb{R}^d$ with $\|X_n - Y_n\|_{L^\infty(\Omega_R)} \le M(n)$, where $M(n) \to 0$ as $n \to \infty$. Then there exists a nondecreasing function $N : (0, \infty) \to \mathbb{Z}^+$ such that for all $\epsilon > 0$, if $n \ge N(\epsilon)$ then
$$\|f \circ X_n - f \circ Y_n\|_{L^p(\Omega_R)} \le \epsilon.$$
Furthermore, the function $N$ depends only upon the functions $f$ and $M$.

Proof. Our proof is an adaptation of the proof that translation is continuous in $L^p(\mathbb{R}^d)$ (see, for instance, Theorem 8.19, pp. 134–135 of [16]). Approximate $f$ in $L^p(\mathbb{R}^d)$ by a sequence of functions $(f_k)$ that are finite linear combinations of characteristic functions of cubes in $\mathbb{R}^d$. It is easy to see that if $g_1$ is the characteristic function of a cube, then
$$\|g_1 \circ X_n - g_1 \circ Y_n\|_{L^p(\Omega_R)} \le \|g_1(\cdot + M(n)e_j) - g_1(\cdot)\|_{L^p(\Omega_R)},$$
and that $\|g_1(\cdot + M(n)e_j) - g_1(\cdot)\|_{L^p(\Omega_R)} \to 0$ as $n \to \infty$. Here, $e_j$ is any of the coordinate basis vectors. If $g_2$ is also the characteristic function of a cube, then
$$\begin{aligned}
\|(g_1 + g_2) \circ X_n - (g_1 + g_2) \circ Y_n\|_{L^p(\Omega_R)}
&= \|g_1 \circ X_n - g_1 \circ Y_n + g_2 \circ X_n - g_2 \circ Y_n\|_{L^p(\Omega_R)} \\
&\le \|g_1 \circ X_n - g_1 \circ Y_n\|_{L^p(\Omega_R)} + \|g_2 \circ X_n - g_2 \circ Y_n\|_{L^p(\Omega_R)} \\
&\le \|g_1(\cdot + M(n)e_j) - g_1(\cdot)\|_{L^p(\Omega_R)} + \|g_2(\cdot + M(n)e_j) - g_2(\cdot)\|_{L^p(\Omega_R)},
\end{aligned}$$

Expanding Domain Limit in 2D


so $\|(g_1 + g_2) \circ X_n - (g_1 + g_2) \circ Y_n\|_{L^p(\Omega_R)} \to 0$ as $n \to \infty$ at a rate that is bounded in terms of $M(n)$. We conclude then that each $f_k$ has the property that $\|f_k \circ X_n - f_k \circ Y_n\|_{L^p(\Omega_R)} \to 0$ as $n \to \infty$ at a rate that is bounded in terms of $M(n)$.

Now let $\epsilon > 0$ and choose $k$ large enough that $\|f_k - f\|_{L^p(\mathbb{R}^d)} < \epsilon/4$. Then
$$\begin{aligned}
\|f \circ X_n - f \circ Y_n\|_{L^p(\Omega_R)}
&\le \|f \circ X_n - f_k \circ X_n\|_{L^p(\Omega_R)} + \|f_k \circ X_n - f_k \circ Y_n\|_{L^p(\Omega_R)} + \|f_k \circ Y_n - f \circ Y_n\|_{L^p(\Omega_R)} \\
&= \|f_k \circ X_n - f_k \circ Y_n\|_{L^p(\Omega_R)} + \|f_k - f\|_{L^p(X_n^{-1}(\Omega_R))} + \|f_k - f\|_{L^p(Y_n^{-1}(\Omega_R))} \\
&\le \|f_k \circ X_n - f_k \circ Y_n\|_{L^p(\Omega_R)} + 2\|f_k - f\|_{L^p(\mathbb{R}^d)}.
\end{aligned}$$
If we choose $N$ large enough that $\|f_k \circ X_n - f_k \circ Y_n\|_{L^p(\Omega_R)} < \epsilon/2$ for all $n \ge N$, it follows that $\|f \circ X_n - f \circ Y_n\|_{L^p(\Omega_R)} < \epsilon$ for all $n \ge N$. What we have constructed is the desired map $N = N(\epsilon)$, using only the properties of $M$ and $f$.

Lemma 8.3. Let

$$\mathcal{F}(\Omega_R) = \left\{ u \in (C(\Omega_R))^2 : |u(x) - u(y)| \le \rho(|x - y|) \right\},$$

where $\rho$ is a nondecreasing continuous function with $\rho(0) = 0$. (That is, $\mathcal{F}(\Omega_R)$ consists of all continuous functions on $\Omega_R$ with a given common bound on their modulus of continuity.) Then there exists a continuous function $F : [0, \infty) \to [0, \infty)$ with $F(0) = 0$ such that for all $u_1$, $u_2$ in $\mathcal{F}(\Omega_R)$,
$$\|u_1 - u_2\|_{L^\infty(\Omega_R)} \le F(\|u_1 - u_2\|_{L^2(\Omega_R)}).$$
Moreover, a choice of $F$ can be made that is independent of $R$ in $[1, \infty]$.

Proof. Assume first that $R = \infty$ and let $u_1$, $u_2$ be in $\mathcal{F}(\Omega_R)$. Fix $x$ in $\Omega_R$ and let $\delta = |u_1(x) - u_2(x)|$. Now suppose that $y$ is in the ball $B$ of radius $a$ about $x$, where $a = \rho^{-1}(\delta/4)$. Then $|u_1(x) - u_1(y)| \le \rho(|x - y|) \le \rho(a) = \delta/4$ and also $|u_2(x) - u_2(y)| \le \delta/4$. It follows that $|u_1(y) - u_2(y)| \ge \delta/2$ for all $y$ in $B$, and thus that
$$\|u_1 - u_2\|_{L^2(\mathbb{R}^2)} \ge \|u_1 - u_2\|_{L^2(B)} \ge \left( \int_B (\delta/2)^2 \right)^{1/2} = \frac{\sqrt{\pi}}{2}\, a \delta.$$
Hence,
$$h(\delta) := \frac{\sqrt{\pi}}{2}\, \delta \rho^{-1}(\delta/4) \le \|u_1 - u_2\|_{L^2(\mathbb{R}^2)},$$
so
$$|u_1(x) - u_2(x)| = \delta \le h^{-1}(\|u_1 - u_2\|_{L^2(\mathbb{R}^2)}). \tag{8.13}$$


J. P. Kelliher

Since this is true for all $x$ in $\Omega_R$,
$$\|u_1 - u_2\|_{L^\infty(\mathbb{R}^2)} \le F(\|u_1 - u_2\|_{L^2(\mathbb{R}^2)}), \tag{8.14}$$
where $F = h^{-1}$, and where we note that $F(0) = 0$.

The only modification required for $R$ in $[1, \infty)$ is that we must replace the ball $B$ with $B \cap \Omega_R$. If $B$ has radius $r < 1/(2\kappa_R) = R/(2\kappa_1) = CR$, where $\kappa_R$ is the maximum curvature of the boundary of $\Omega_R$ (which is necessarily positive), then it is easy to see that $\mathrm{Area}(B \cap \Omega_R) \ge (1/4)\, \mathrm{Area}\, B$. This has the effect of changing the constant $\sqrt{\pi}/2$ in Eq. (8.13) to $\sqrt{\pi}/8$ and gives $F(x) = h^{-1}(x)$ for $x$ in the interval $[0, CR]$. For $x > CR$, the constant in Eq. (8.13) decreases below $\sqrt{\pi}/8$, resulting in an $F$ that increases more rapidly than $h^{-1}$. In any case, it follows that the function $F$ that results for $R = 1$ serves as an upper bound on $F$ for all $R$ in $[1, \infty]$.

Corollary 8.4. Let $u_j : [0, T] \times \Omega_R \to \mathbb{R}^2$, $j = 1, 2$, with $u_j(t)$ in $\mathcal{F}(\Omega_R)$ for almost all $t$ in $[0, T]$, where $\mathcal{F}(\Omega_R)$ is as in Lemma 8.3. Then there exists a continuous function $F : [0, \infty) \to [0, \infty)$ with $F(0) = 0$ such that
$$\|u_1 - u_2\|_{L^\infty([0,T] \times \Omega_R)} \le F(\|u_1 - u_2\|_{L^\infty([0,T]; L^2(\Omega_R))}).$$

Proof. Apply Lemma 8.3 to $u_1(t)$ and $u_2(t)$ for all $t$ in $[0, T]$.
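As an illustration of Lemma 8.3 in the whole-plane case, the following sketch works out $h$ and $F = h^{-1}$ explicitly for a Lipschitz modulus of continuity $\rho(r) = Kr$, for which $h(\delta) = \sqrt{\pi}\,\delta^2/(8K)$. The modulus, the constant `K`, the test function (a cone of height $\delta$ and slope $K$ compared with the zero function) and all function names are assumptions made for this example only; they do not come from the text.

```python
import math

def h(delta, K):
    """h(delta) = (sqrt(pi)/2) * delta * rho^{-1}(delta/4) for rho(r) = K*r."""
    return (math.sqrt(math.pi) / 2.0) * delta * (delta / (4.0 * K))

def F(x, K):
    """Inverse of h on [0, inf): the delta >= 0 solving h(delta) = x."""
    return math.sqrt(8.0 * K * x / math.sqrt(math.pi))

def cone_l2_norm(delta, K):
    """Closed-form L^2(R^2) norm of the cone u(x) = max(0, delta - K|x|)."""
    return delta ** 2 * math.sqrt(math.pi / 6.0) / K

K = 3.0  # hypothetical Lipschitz constant
for delta in (0.1, 1.0, 5.0):
    sup_distance = delta                   # sup norm of (cone - 0) is the cone height
    bound = F(cone_l2_norm(delta, K), K)   # F applied to the L^2 distance
    print(delta, sup_distance, bound, sup_distance <= bound)
```

For this particular pair the bound is not sharp: the sup distance is $\delta$ while $F$ of the $L^2$ distance is a fixed multiple of $\delta$ larger than one, consistent with the inequality (8.14).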

Appendix A. Various Lemmas

Lemma A.1. For any $R$ in $[1, \infty)$ there exists a single bounded linear extension operator $\mathcal{E} = \mathcal{E}_R$, $\mathcal{E} : H^{n,p}(\Omega_R) \to H^{n,p}(\mathbb{R}^2)$ for all $n = 0, 1, \ldots$ and all $p$ in $[1, \infty]$, with
$$\|\mathcal{E} f\|_{H^{n,p}(\mathbb{R}^2)} \le C_n \|f\|_{H^{n,p}(\Omega_R)}, \tag{A.1}$$
where the constant $C_n$ is independent of $p$ and $R$ in $[1, \infty]$. If $f$ is in $H^{1,p}(\Omega_R)$ then
$$\|\nabla \mathcal{E} f\|_{L^p(\mathbb{R}^2)} \le C \left( \|\nabla f\|_{L^p(\Omega_R)} + \frac{1}{R} \|f\|_{L^p(\Omega_R)} \right) \tag{A.2}$$
with a constant $C$ that is independent of $p$ and $R$ in $[1, \infty]$.

Proof. First define the extension operator $\mathcal{E}_1$ on $\Omega_1$. We can use, for instance, a partition of unity and the extension operator of Theorem 5', p. 181 of [13], since we have sufficient smoothness of the boundary. This gives Eq. (A.1) for $R = 1$ with independence of $C_n$ on $p$. (The extension operator of Theorem 5, p. 181 of [13] would suffice, except for the independence of $C_n$ on $p$.) Now let $R$ be in $[1, \infty)$ with $f$ in $H^{n,p}(\Omega_R)$, and define $f_1$ in $H^{n,p}(\Omega_1)$ by $f_1(x) = f(Rx)$. Then define $\mathcal{E}_R$ by $\mathcal{E}_R f(x) = (\mathcal{E}_1 f_1)(x/R)$. The factor of $1/R$ in Eq. (A.2) and the independence of $C_n$ on $R$ in $[1, \infty)$ follow by scaling.

The following is Ladyzhenskaya's inequality and a simple consequence of it.


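Lemma A.2. For $u$ in $H_0^1(\Omega_R)$ with $R$ in $[1, \infty]$,
$$\|u\|_{L^4(\Omega_R)}^2 \le 2^{1/2} \|u\|_{L^2(\Omega_R)} \|\nabla u\|_{L^2(\Omega_R)}.$$
For $u$ in $H^1(\Omega_R)$ with $R$ in $[1, \infty)$,
$$\|u\|_{L^4(\Omega_R)}^2 \le C \|u\|_{L^2(\Omega_R)} \left( \|\nabla u\|_{L^2(\Omega_R)} + \frac{1}{R} \|u\|_{L^2(\Omega_R)} \right), \tag{A.3}$$
where $C$ is independent of $R$ in $[1, \infty]$.

Proof. The first inequality is Ladyzhenskaya's inequality (see, for instance, Lemma III.3.3, p. 197 of [14]). The second inequality follows from the first, since $H_0^1(\mathbb{R}^2) = H^1(\mathbb{R}^2)$, and from Lemma A.1:
$$\begin{aligned}
\|u\|_{L^4(\Omega_R)}^2 &\le \|\mathcal{E} u\|_{L^4(\mathbb{R}^2)}^2 \le 2^{1/2} \|\mathcal{E} u\|_{L^2(\mathbb{R}^2)} \|\nabla \mathcal{E} u\|_{L^2(\mathbb{R}^2)} \\
&\le C \|u\|_{L^2(\Omega_R)} \left( \|\nabla u\|_{L^2(\Omega_R)} + \frac{1}{R} \|u\|_{L^2(\Omega_R)} \right).
\end{aligned}$$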

Lemma A.3. (Poincaré's inequality) Let $U$ be an open bounded connected subset of $\mathbb{R}^2$ with a $C^1$-boundary, and let $U_R = RU$. Then for all $f$ in $H^{1,p}(U_R)$ with $\int_{U_R} f = 0$,
$$\|f\|_{L^p(U_R)} \le C_p R \|\nabla f\|_{L^p(U_R)}$$
for all $p$ in $[1, \infty]$, where $C_p$ is independent of $R$.

Proof. This is classical; see, for instance, Theorem 1, p. 275 of [3]. To verify that the scaling factor is $R$, assume that
$$\|f\|_{L^p(U_R)} \le C_p(R) \|\nabla f\|_{L^p(U_R)}. \tag{A.4}$$
Let $f$ be in $L^p(U_R)$ and define $f_1$ in $L^p(U_1)$ by $f_1(x) = f(Rx)$. Then the chain rule and a change of variables give $\|f_1\|_{L^p(U_1)} = R^{-2/p} \|f\|_{L^p(U_R)}$, while $\|\nabla f_1\|_{L^p(U_1)} = R^{1-2/p} \|\nabla f\|_{L^p(U_R)}$. Multiplying both sides of Eq. (A.4) by $R^{-2/p}$ gives $\|f_1\|_{L^p(U_1)} \le C_p(R) R^{-1} \|\nabla f_1\|_{L^p(U_1)}$. Since this is true for all $f$ in $L^p(U_R)$, it follows that $C_p(1) \le C_p(R) R^{-1}$. Interchanging the roles of $U_R$ and $U_1$, it follows that $C_p(R) = C_p(1) R$.

Lemma A.4. Let $f$ be a scalar- or vector-valued function in $L^2(\mathbb{R}^2)$ with $\nabla f$ in $L^a(\mathbb{R}^2)$ for some $a$ in $(2, \infty)$. Then $f$ is in $L^2(\mathbb{R}^2) \cap L^\infty(\mathbb{R}^2)$, and for all $b$ in $(a, \infty]$,
$$\|f\|_{L^b(\mathbb{R}^2)} \le C \|f\|_{L^2(\mathbb{R}^2)} + C \|\nabla f\|_{L^a(\mathbb{R}^2)}, \tag{A.5}$$
where the constant $C$ depends on $a$ and on $b$.


Let $v$ be a divergence-free vector field in $L^2(\mathbb{R}^2)$ with vorticity $\omega$ lying in $L^a(\mathbb{R}^2)$ for some $a$ in $(2, \infty)$. Then $v$ is in $L^2(\mathbb{R}^2) \cap L^\infty(\mathbb{R}^2)$, and for all $b$ in $(a, \infty]$,
$$\|v\|_{L^b(\mathbb{R}^2)} \le C \left( \|v\|_{L^2(\mathbb{R}^2)} + \frac{a^2}{a - 1} \|\omega\|_{L^a(\mathbb{R}^2)} \right), \tag{A.6}$$
where the constant $C$ depends on $a$ and on $b$.

Proof. This can be proven by decomposing $v$ into low and high frequencies using Littlewood-Paley operators. See, for instance, Lemma 2B.1, pp. 23–24 of [8].

The following is a result of Yudovich's:

Lemma A.5. Fixing $\epsilon_0 > 0$, for any $p$ in $[2 + \epsilon_0, \infty)$ and any $u$ in $V^{(E)}(\Omega_R)$ (recall that $\Omega_R$ is simply connected),
$$\|\nabla u\|_{L^p(\Omega_R)} \le C p \|\omega(u)\|_{L^p(\Omega_R)},$$
with a constant $C$ that is independent of $p$ and of $R$ in $[1, \infty]$.

Proof. Let $u$ be in $V^{(E)}(\Omega_R)$. Then $\psi$, the stream function for $u$, can be assumed to vanish on $\partial\Omega_R$ since $\Omega_R$ is simply connected. Applying Corollary 1 of [17] with the operator $L = \Delta$ and $r = 0$ gives
$$\|\nabla u\|_{L^p(\Omega_R)} \le \|\psi\|_{H^{2,p}(\Omega_R)} \le C(\Omega_R)\, p\, \|\Delta \psi\|_{L^p(\Omega_R)} = C(\Omega_R)\, p\, \|\omega(u)\|_{L^p(\Omega_R)}.$$
To demonstrate the independence of $C(\Omega_R)$ on $R$, let $u$ be an arbitrary element of $V^{(E)}(\Omega_R)$. Then $u(\cdot) = u_1(\cdot/R)$ for some $u_1$ in $V^{(E)}(\Omega_1)$. But $\|\nabla u\|_{L^p(\Omega_R)} = R^{2/p - 1} \|\nabla u_1\|_{L^p(\Omega_1)}$ and $\|\omega(u)\|_{L^p(\Omega_R)} = R^{2/p - 1} \|\omega(u_1)\|_{L^p(\Omega_1)}$, so $C(\Omega_R) \le C(\Omega_1)$; the argument in reverse shows equality of the two constants.

The following is Osgood's lemma (see, for instance, p. 92 of [1]). The succinct proof is due to M. Tehranchi.

Lemma A.6. (Osgood's lemma) Let $L$ be a measurable nonnegative function and $\gamma$ a nonnegative locally integrable function, each defined on the domain $[t_0, t_1]$. Let $\mu : [0, \infty) \to [0, \infty)$ be a continuous nondecreasing function, with $\mu(0) = 0$. Let $a \ge 0$, and assume that for all $t$ in $[t_0, t_1]$,
$$L(t) \le a + \int_{t_0}^t \gamma(s) \mu(L(s))\, ds. \tag{A.7}$$
If $a > 0$, then
$$\int_a^{L(t)} \frac{ds}{\mu(s)} \le \int_{t_0}^t \gamma(s)\, ds.$$
If $a = 0$ and $\int_0^\infty ds/\mu(s) = \infty$, then $L \equiv 0$.

Proof. We have,
$$\begin{aligned}
\int_a^{L(t)} \frac{dx}{\mu(x)} &\le \int_a^{a + \int_{t_0}^t \gamma(u) \mu(L(u))\, du} \frac{dx}{\mu(x)} \\
&\le \int_{t_0}^t \frac{\gamma(s) \mu(L(s))\, ds}{\mu\left(a + \int_{t_0}^s \gamma(u) \mu(L(u))\, du\right)} \le \int_{t_0}^t \gamma(s)\, ds.
\end{aligned}$$
The last inequality follows from Eq. (A.7), since $\mu$ is nondecreasing.
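A quick numerical sanity check of the first conclusion of Lemma A.6 can be sketched as follows. The choices $\mu(s) = s$, $\gamma \equiv 1$ and $L(t) = a e^{t - t_0}$ are assumptions made for this example (they give the Gronwall case, in which (A.7) holds with equality), and the helper names and the midpoint quadrature are illustrative only.

```python
import math

def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule for the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

def osgood_sides(L, gamma, mu, a, t0, t):
    """Return both sides of the a > 0 conclusion of Osgood's lemma at time t."""
    lhs = integrate(lambda s: 1.0 / mu(s), a, L(t))  # integral of ds/mu(s) from a to L(t)
    rhs = integrate(gamma, t0, t)                    # integral of gamma from t0 to t
    return lhs, rhs

a, t0 = 0.5, 0.0
mu = lambda s: s                      # assumed modulus: the Gronwall case
gamma = lambda s: 1.0                 # assumed weight
L = lambda t: a * math.exp(t - t0)    # satisfies (A.7) with equality for these choices

lhs, rhs = osgood_sides(L, gamma, mu, a, t0, 2.0)
print(lhs, rhs)  # the two sides agree up to quadrature error in this extremal case
```

With $\mu(s) = s$ the conclusion reads $\log(L(t)/a) \le \int_{t_0}^t \gamma$, which is Gronwall's inequality; the example sits exactly on the boundary of that bound.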


References

1. Chemin, J.-Y.: Perfect incompressible fluids. Volume 14 of Oxford Lecture Series in Mathematics and its Applications. New York: The Clarendon Press Oxford University Press, 1998 (translated from the 1995 French original by Isabelle Gallagher and Dragos Iftimie)
2. Cozzi, E., Kelliher, J.P.: Vanishing viscosity in the plane for vorticity in borderline spaces of Besov type. J. Differ. Eqs. 235(2), 647–657 (2007)
3. Evans, L.C.: Partial differential equations. Volume 19 of Graduate Studies in Mathematics. Providence, RI: Amer. Math. Soc., 1998
4. Gilbarg, D., Trudinger, N.S.: Elliptic partial differential equations of second order. Grundlehren der Mathematischen Wissenschaften, Vol. 224. Berlin: Springer-Verlag, 1977
5. Iftimie, D., Lopes Filho, M.C., Nussenzveig Lopes, H.J.: Two dimensional incompressible ideal flow around a small obstacle. Comm. Part. Differ. Eqs. 28(1-2), 349–379 (2003)
6. Iftimie, D., Lopes Filho, M.C., Nussenzveig Lopes, H.J.: Two-dimensional incompressible viscous flow around a small obstacle. Math. Ann. 336(2), 449–489 (2006)
7. Kelliher, J.P.: The inviscid limit for two-dimensional incompressible fluids with unbounded vorticity. Math. Res. Lett. 11(4), 519–528 (2004)
8. Kelliher, J.P.: The vanishing viscosity limit for incompressible fluids in two dimensions. PhD Thesis, University of Texas at Austin, Austin, TX, 2005
9. Kelliher, J.P.: Navier-Stokes equations with Navier boundary conditions for a bounded domain in the plane. SIAM J. Math. Anal. 38(1), 210–232 (2006)
10. Lions, P.-L.: Mathematical topics in fluid mechanics. Vol. 1, Volume 3 of Oxford Lecture Series in Mathematics and its Applications. New York: The Clarendon Press Oxford University Press, 1996
11. Majda, A.J., Bertozzi, A.L.: Vorticity and incompressible flow. Volume 27 of Cambridge Texts in Applied Mathematics. Cambridge: Cambridge University Press, 2002
12. Maz'ya, V., Nazarov, S., Plamenevskij, B.: Asymptotic theory of elliptic boundary value problems in singularly perturbed domains. Vols. I and II, Vol. 111 of Operator Theory: Advances and Applications. Basel: Birkhäuser Verlag, 2000 (translated from the German by Georg Heinig and Christian Posthoff)
13. Stein, E.M.: Singular integrals and differentiability properties of functions. Princeton Mathematical Series, No. 30. Princeton, N.J.: Princeton University Press, 1970
14. Temam, R.: Navier-Stokes equations. Providence, RI: AMS Chelsea Publishing, 2001
15. Vishik, M.: Incompressible flows of an ideal fluid with vorticity in borderline spaces of Besov type. Ann. Sci. École Norm. Sup. (4), 32(6), 769–812 (1999)
16. Wheeden, R.L., Zygmund, A.: Measure and integral. An introduction to real analysis. Pure and Applied Mathematics, Vol. 43. New York: Marcel Dekker Inc., 1977
17. Yudovich, V.I.: Some bounds for solutions of elliptic equations. Mat. Sb. (N.S.), 59 no. 101, 229–244 (1962) (Russian); Eng. trans. in Amer. Math. Soc. Transl. (2) 56, 1–18 (1962)
18. Yudovich, V.I.: Non-stationary flows of an ideal incompressible fluid. Ž. Vyčisl. Mat. i Mat. Fiz. 3, 1032–1066 (1963) (Russian)
19. Yudovich, V.I.: Uniqueness theorem for the basic nonstationary problem in the dynamics of an ideal incompressible fluid. Math. Res. Lett. 2(1), 27–38 (1995)

Communicated by P. Constantin

Commun. Math. Phys. 278, 775–801 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0393-1

Communications in

Mathematical Physics

Supersymmetric Quantum Mechanics and Super-Lichnerowicz Algebras K. Hallowell, A. Waldron Department of Mathematics, University of California, Davis, CA 95616, USA. E-mail: [email protected]; [email protected] Received: 2 March 2007 / Accepted: 14 May 2007 Published online: 6 December 2007 – © Springer-Verlag 2007

Dedicated to the memory of Tom Branson

Abstract: We present supersymmetric, curved space, quantum mechanical models based on deformations of a parabolic subalgebra of osp(2p+2|Q). The dynamics are governed by a spinning particle action whose internal coordinates are Lorentz vectors labeled by the fundamental representation of osp(2p|Q). The states of the theory are tensors or spinor-tensors on the curved background while conserved charges correspond to the various differential geometry operators acting on these. The Hamiltonian generalizes Lichnerowicz's wave/Laplace operator. It is central, and the models are supersymmetric whenever the background is a symmetric space, although there is an osp(2p|Q) superalgebra for any curved background. The lowest purely bosonic example (2p, Q) = (2, 0) corresponds to a deformed Jacobi group and describes Lichnerowicz's original algebra of constant curvature, differential geometric operators acting on symmetric tensors. The case (2p, Q) = (0, 1) is simply the N = 1 superparticle whose supercharge amounts to the Dirac operator acting on spinors. The (2p, Q) = (0, 2) model is the N = 2 supersymmetric quantum mechanics corresponding to differential forms. (This latter pair of models are supersymmetric on any Riemannian background.) When Q is odd, the models apply to spinor-tensors. The (2p, Q) = (2, 1) model is distinguished by admitting a central Lichnerowicz-Dirac operator when the background is constant curvature. The new supersymmetric models are novel in that the Hamiltonian is not just a square of supercharges, but rather a sum of commutators of supercharges and commutators of bosonic charges. These models and superalgebras are a very useful tool for any study involving high rank tensors and spinors on manifolds.

Contents
1. Introduction
2. The Model
   2.1 Orthosymplectic superalgebras
   2.2 Orthosymplectic spinning particle
3. Quantization
   3.1 Canonical analysis
   3.2 States
   3.3 Conserved charges
4. Geometry
   4.1 Dirac operators and N = 1 supersymmetry
   4.2 Differential forms and N = 2 supersymmetry
   4.3 Symmetric algebras and quantum mechanics
   4.4 Lichnerowicz Dirac operator
   4.5 Multisymmetric tensors
   4.6 Multiforms
   4.7 Super-Lichnerowicz algebras
   4.8 Parabolic orthosymplectic algebra
5. Conclusions

1. Introduction

Since the early 1980's it has been clear that supersymmetric quantum mechanical models are deeply related to geometry. In particular, Alvarez-Gaumé and Witten computed gravitational anomalies and Pontryagin classes by identifying the Dirac operator on a Riemannian manifold with the supercharge of an N = 1 supersymmetric quantum mechanics [1]. In an application to Morse theory, Witten realized that the supercharges of N = 2 supersymmetric quantum mechanics corresponded to the exterior derivative d and codifferential δ [2]. In these models the Hamiltonian corresponds to the Laplace operator, and states are spinors or differential forms, respectively. Increasing the number of supersymmetries to N = 4 requires that the background manifold be Kähler [3] and the supersymmetry charges are now the Dolbeault operators. In particular the sl(2, R) Hodge-Lefschetz algebra of the de Rham cohomology of a Kähler manifold amounts to the algebra of conserved charges of the N = 4 supersymmetric quantum mechanics [4,5].

If, instead of differential forms on a manifold M, one studies symmetric tensors, there is also an extremely useful algebra of the gradient and divergence operations. In particular, Lichnerowicz observed that for constant curvature backgrounds one could introduce a wave operator that generalized the Laplacian and (in some sense) commuted with the action of the divergence and gradient operators on symmetric tensors [6]. (His original motivation was a computation of the spin 2 massive propagator on these spaces.) In a recent study of higher spins in constant curvature manifolds we found the algebra of the following operators on symmetric tensors$^1$ [8]:

Operator     Symbol   Action
Index        N        Counts indices.
Trace        tr       Traces a pair of indices.
Metric       g        Multiplies by the metric and symmetrizes.
Divergence   div      The covariant divergence.
Gradient     grad     Symmetrized covariant derivative.

The result was a deformation of the Jacobi group (a central extension of $sl(2, \mathbb{R}) \ltimes \mathbb{R}^2$) and in this paper we extend it to an arbitrary symmetric space. In particular, unlike differential forms for which the anticommutator of the exterior derivative and codifferential (morally, gradient and divergence) yields the form Laplace operator
$$\{d, \delta\} = \Delta, \tag{1}$$
for symmetric tensors, it is the commutator that does so
$$[\mathrm{div}, \mathrm{grad}] = \Delta - g\, \mathrm{tr} + N(N + \dim M - 2). \tag{2}$$
Here $g\, \mathrm{tr} - N(N + \dim M - 2)$ is the Casimir of the $sl(2, \mathbb{R})$ algebra generated by $(g, N, \mathrm{tr})$ and the central Lichnerowicz wave operator equals
$$\Box = \Delta + g\, \mathrm{tr} - N(N + \dim M - 2). \tag{3}$$
The natural question, posed and answered here, is whether there exists a Lichnerowicz/Lefschetz type algebra applicable to both differential forms and symmetric tensors, or indeed, for the most general tensors and spinor-tensors on a Riemannian manifold. We answer this question in the affirmative and find that there exists a central Lichnerowicz wave operator whenever the manifold M is a locally symmetric space, namely its curvature is covariantly constant
$$D_\kappa R_{\mu\nu\rho\sigma} = 0. \tag{4}$$

$^1$ The flat version of this algebra has been studied in the context of conformally flat manifolds in [7].

The operator also exists for general Riemannian manifolds and always commutes with an osp(2p|Q) subalgebra of operators generalizing {g, N, tr}. We construct these "super-Lichnerowicz" algebras from the conserved charges of a supersymmetric quantum mechanical model whose states are tensors and tensor-spinors on M. The model itself is a simple generalization of the N = 2 supersymmetric quantum mechanics. The action is just
$$S = \frac{1}{2} \int dt \left\{ \dot{x}^\mu g_{\mu\nu} \dot{x}^\nu - i X_\mu \cdot \frac{D X^\mu}{dt} + \frac{1}{4} R_{\mu\nu\rho\sigma}\, X^\mu \cdot X^\nu\, X^\rho \cdot X^\sigma \right\}, \tag{5}$$
or in other words, a generalized spinning particle model [9–11]. The fields $x^\mu(t)$ are the worldline imbedding coordinates, while $X^\mu(t)$ are a collection of both bose and fermi fields transforming under the fundamental representation of the superalgebra osp(2p|Q) for which the dot "·" denotes the invariant inner product. Supersymmetry requires the background to be a locally symmetric space. Gauged models of this type where p = 0 and the fields $X^\mu$ are purely fermionic are known to describe massless higher spin particles$^2$ [12]. (When (2p, Q) = (2, 0), the fields $X^\mu$ carry an sp(2) index corresponding to the spinor and conformal vector oscillators first introduced by Labastida and Vasiliev to describe higher spin fields [14–16].) The conserved charges of the model correspond to both commuting and anticommuting generalizations of the gradient and divergence (alias exterior derivative and codifferential) operators, along with a Lichnerowicz wave operator (the Hamiltonian), and operators that trace over indices, count indices of a given type (form or totally symmetric), multiply by metric tensors, and change indices from one type to another. Taking the parameters (2p, Q) completely general lets us describe arbitrary tensors and spinor-tensors. The interpretations for some specific values are listed below:

(0,1): N = 1 supersymmetric quantum mechanics: describes the Dirac operator acting on spinors.

(0,2): N = 2 supersymmetric quantum mechanics: describes the exterior derivative acting on differential forms.

(2,0): This model is purely bosonic and yields Lichnerowicz's original construction; states are symmetric tensors. The conserved charge algebra is the "symmetric algebra" of [8].

(2,1): A model of totally symmetric spinor-tensors (i.e., spinors with an arbitrary number of totally symmetric vector indices), which means there are charges corresponding to gamma-traces (contracting vector and spinor indices using Dirac matrices). This algebra was first encountered in a study of partially massless fermionic fields in [17] and systematized in [8]. (It has also appeared in the computation of massive and partially massless fermionic actions [18].) The theory is distinguished by possessing a "Lichnerowicz–Dirac" operator, a modification of the Dirac operator that commutes with all other charges when the background is constant curvature (but not in general symmetric spaces).

(2p,0): A purely bosonic model, but states are now "multi-symmetric tensors". There are p different gradient and divergence operators along with operators that remove or add pairs of indices or change index types.

(0,2q): States are multi-forms (indices are grouped into antisymmetric subsets, e.g., the Riemann tensor is a "bi-form"). There are now q distinct exterior derivatives and codifferentials. Algebras of this type have been studied by Hull and Medeiros in the context of "exotic" higher spin gauge theories [19] and a mathematical analysis was given in [20,21].

$^2$ Recently, the one-loop quantization of these models has also been studied [13].

For (2p, Q) = (0, 1) or (0, 2), namely the N = 1, 2 supersymmetric models, the algebra of conserved charges is a finite dimensional superalgebra for any background Riemannian manifold. The same is true whenever the target manifold M is flat but (2p, Q) are arbitrary. In general backgrounds, we obtain a deformation of the finite dimensional Lie superalgebra appearing in the flat case. The simplest example is the (2, 0) model in a constant curvature background subject to the algebra (2). The Lichnerowicz wave operator is central but the $sl(2, \mathbb{R})$ Casimir does not commute with the doublet (grad, div). Instead, further commutators yield an infinite series of higher operators (involving increasing powers of the "oscillators" $X^\mu$). In this case there exists a reformulation of the resulting infinite dimensional Lie algebra as a rather simple associative algebra, obtained by including a certain square root of the Casimir operator [8]. This is not a direction we pursue in this paper, but there is a simple characterization of the Lie superalgebra being deformed as a parabolic subalgebra $\mathfrak{p}$ of osp(2p+2|Q). This parabolic superalgebra can be viewed as a (supersymmetric) generalization of the Jacobi group $G^J = (Sl(2, \mathbb{R}) \ltimes \mathbb{R}^2) \times \mathbb{R}^*$.

Our results are organized as follows: In Sect. 2, we briefly review osp(2p|Q) superalgebras and write down our supersymmetric quantum mechanical model. Then we quantize it and determine its Hilbert space, conserved charges, and their algebra in Sect. 3. Section 4 is concerned with the application of the model to geometry. There we discuss each of the special examples itemized above, followed by the most general case and also the deformed parabolic superalgebra of differential geometric operators on M. (A hardy reader can skip the examples and study Sects. 4.7 and 4.8 directly.)
Our conclusions list various physical and mathematical applications of our theory as well as speculations on its generalizations, especially to manifolds with additional structures, such as a Kähler one, a possible associative algebraic reformulation of our algebra, the rôle of the Jacobi group and novel Casimir operators, applications to higher spins and gauged versions of the model, extensions of the algebra to osp(2 p + 2|Q) by including


the inverse Lichnerowicz wave operator, and finally the model’s quantum mechanical spectrum. Tom Branson knew of Lichnerowicz’s work. He may have liked this paper, so it’s dedicated to him.

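2. The Model

2.1. Orthosymplectic superalgebras. The Lie superalgebra osp(2p|Q) is defined by even supermatrices
$$\lambda = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \tag{6}$$
subject to the orthosymplectic condition
$$\lambda J = -J \lambda^{st}, \tag{7}$$
where the supertranspose is defined by
$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{st} \equiv \begin{pmatrix} A^t & C^t \\ -B^t & D^t \end{pmatrix}. \tag{8}$$
The invariant tensor/metric J is given by
$$J = \begin{cases}
\begin{pmatrix} & -1_{p \times p} & & \\ 1_{p \times p} & & & \\ & & & 1_{q \times q} \\ & & 1_{q \times q} & \end{pmatrix} & Q = 2q \text{ even}, \\[2ex]
\begin{pmatrix} & -1_{p \times p} & & & \\ 1_{p \times p} & & & & \\ & & & 1_{q \times q} & \\ & & 1_{q \times q} & & \\ & & & & 1 \end{pmatrix} & Q = 2q + 1 \text{ odd}.
\end{cases} \tag{9}$$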
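The block structure of J can be checked mechanically. The sketch below assumes the layout of Eq. (9) with the $-1_{p \times p}$ block in the upper right of the symplectic part and an off-diagonal orthogonal part (plus a trailing 1 when Q is odd); the function names are hypothetical. It verifies that the square of J is the diagonal matrix with 2p entries equal to −1 followed by Q entries equal to +1, which is the diagonal matrix quoted after condition (7).

```python
def build_J(p, Q):
    """Invariant tensor J: symplectic block of size 2p, then orthogonal block of size Q."""
    q, odd = divmod(Q, 2)
    n = 2 * p + Q
    J = [[0] * n for _ in range(n)]
    for i in range(p):              # symplectic part (0 -1; 1 0)
        J[i][p + i] = -1
        J[p + i][i] = 1
    base = 2 * p
    for a in range(q):              # orthogonal part (0 1; 1 0)
        J[base + a][base + q + a] = 1
        J[base + q + a][base + a] = 1
    if odd:                         # extra unpaired direction when Q is odd
        J[n - 1][n - 1] = 1
    return J

def matmul(A, B):
    """Plain square-matrix product."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expected_square(p, Q):
    """diag(-1, ..., -1, 1, ..., 1) with 2p minus signs and Q plus signs."""
    n = 2 * p + Q
    return [[(-1 if i < 2 * p else 1) if i == j else 0 for j in range(n)]
            for i in range(n)]

for p, Q in [(1, 0), (0, 1), (0, 2), (1, 1), (2, 3)]:
    J = build_J(p, Q)
    print((p, Q), matmul(J, J) == expected_square(p, Q))
```

The sign pattern of the square is what distinguishes the symplectic (bosonic) directions from the orthogonal (fermionic) ones.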

The condition (7) is more simply formulated in terms of ≡ λJ , 2 p times

= −J , st

2

Q times

J = diag −1, . . . , −1, 1, . . . , 1 , 2

whose solution is αβ

≡ ( ) =

λS

t λ A

(10)

,

(11)

with λ S and λ A being 2 × 2 symmetric and antisymmetric bosonic matrices while is a 2 × 2 fermionic matrix. The superindices α, β, . . . run over 2 p bosonic and Q fermionic values. Note that transformations δ X α = Jβα γβ X γ ,

(12)


leave invariant the orthosymplectic inner product
$$X \cdot Y \equiv -X_\alpha J^{\beta\alpha} Y_\beta. \tag{13}$$
In addition to time translation invariance, our model will enjoy Q(2p + 1) supersymmetries as well as bosonic $sp(2p) \ltimes \mathbb{R}^{2p}$ and so([Q/2], [Q/2 + 1/2]) symmetries. These will correspond to osp(2p|Q) and its fundamental representation $\mathbb{R}^{2p|Q}$. The osp(2p|Q) generators obey the Lie superalgebra
$$[f_{\alpha\beta}, f_{\gamma\delta}\} = 4 J_{(\beta(\gamma} f_{\alpha]\delta]}, \tag{14}$$
where the superbracket $[\cdot, \cdot\}$ is a commutator unless both entries are fermionic, in which case it equals the anticommutator. We will also often need the quadratic osp(2p|Q) Casimir
$$c = \frac{1}{2} J^{\beta\alpha} f_{\alpha\gamma} J^{\delta\gamma} f_{\delta\beta}. \tag{15}$$

It commutes with all the osp(2p|Q) generators $f_{\alpha\beta}$.

2.2. Orthosymplectic spinning particle. The field content of our quantum mechanical theory consists of the embedding coordinates $x^\mu$,
$$x^\mu : \mathbb{R} \longrightarrow M, \tag{16}$$
taking values in a (dim M = d)-dimensional Riemannian target manifold $(M, g_{\mu\nu})$, along with 2p + Q additional fields $X^\mu$ describing spinning degrees of freedom
$$X^\mu_\alpha = \begin{cases}
\left( (\varphi^\mu_{1,\ldots,p})^*,\ \varphi^\mu_{1,\ldots,p},\ (\psi^\mu_{1,\ldots,q})^*,\ \psi^\mu_{1,\ldots,q} \right), & Q = 2q \text{ even}, \\
\left( (\varphi^\mu_{1,\ldots,p})^*,\ \varphi^\mu_{1,\ldots,p},\ (\psi^\mu_{1,\ldots,q})^*,\ \psi^\mu_{1,\ldots,q},\ \psi^\mu \right), & Q = 2q + 1 \text{ odd}.
\end{cases} \tag{17}$$
The $\varphi$'s are bosonic and $\psi$'s fermionic and together $X_\alpha$ transforms as the fundamental representation of osp(2p|Q) (as in (12)). The action is the natural generalization of the N = 2 real supermultiplet
$$S = \frac{1}{2} \int dt \left\{ \dot{x}^\mu g_{\mu\nu} \dot{x}^\nu + i X^\mu_\alpha J^{\beta\alpha} \frac{D X_{\beta\mu}}{dt} + \frac{1}{4} R_{\mu\nu\rho\sigma}\, X^\mu_\alpha J^{\beta\alpha} X^\nu_\beta\, X^\rho_\gamma J^{\delta\gamma} X^\sigma_\delta \right\}. \tag{18}$$
As already mentioned, there are clearly $p(2p + 1) + \frac{1}{2} Q(Q - 1)$ bosonic and $2pQ$ fermionic internal symmetries obtained by $X^\mu_\alpha$ transformations
$$\delta X^\mu_\alpha = J_{\beta\alpha} \Lambda^{\gamma\beta} X^\mu_\gamma. \tag{19}$$
These hold for any background metric $g_{\mu\nu}$ and we discuss their interpretation later. Less trivial are Q further supersymmetries plus their 2p bosonic "partner" symmetries given by
$$\delta x^\mu = i X^\mu_\alpha \varepsilon_\alpha, \qquad \mathcal{D} X^\mu_\alpha = \dot{x}^\mu \varepsilon_\alpha. \tag{20}$$
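The symmetry counting in this subsection can be tabulated with a few lines of code. The sketch below simply evaluates the formulas quoted in the text, $p(2p+1) + \frac{1}{2}Q(Q-1)$ bosonic and $2pQ$ fermionic internal symmetries, plus the Q supersymmetries and 2p bosonic partners of (20); the function names are hypothetical.

```python
def internal_symmetries(p, Q):
    """(bosonic, fermionic) dimensions of the osp(2p|Q) internal symmetries (19)."""
    bosonic = p * (2 * p + 1) + Q * (Q - 1) // 2   # sp(2p) plus so(Q)
    fermionic = 2 * p * Q
    return bosonic, fermionic

def all_symmetries(p, Q):
    """Add the Q supersymmetries and their 2p bosonic partners from (20)."""
    bosonic, fermionic = internal_symmetries(p, Q)
    return bosonic + 2 * p, fermionic + Q

for two_p, Q in [(0, 1), (0, 2), (2, 0), (2, 1)]:
    p = two_p // 2
    print((two_p, Q), internal_symmetries(p, Q), all_symmetries(p, Q))
```

For example, (2p, Q) = (2, 0) gives three internal bosonic charges and two partner charges, matching the five operators (g, N, tr, div, grad) of the symmetric tensor algebra, and the total fermionic count always equals Q(2p + 1), as stated in Sect. 2.1.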


In the above, $\mathcal{D}$ and $D/dt$ are the covariant variation and worldline derivatives defined on vectors $v^\mu$ by
$$\mathcal{D} v^\mu \equiv \delta v^\mu + \Gamma^\mu_{\rho\sigma} \delta x^\rho v^\sigma = \delta x^\rho D_\rho v^\mu, \qquad \frac{D v^\mu}{dt} \equiv \dot{v}^\mu + \Gamma^\mu_{\rho\sigma} \dot{x}^\rho v^\sigma = \dot{x}^\rho D_\rho v^\mu. \tag{21}$$
In particular note that their commutator is
$$\left[ \mathcal{D}, \frac{D}{dt} \right] v^\mu = \delta x^\rho \dot{x}^\sigma R_{\rho\sigma}{}^\mu{}_\nu v^\nu. \tag{22}$$
To determine whether the variations (20) are symmetries is straightforward upon noting that $\delta(v_\mu w^\mu) = v_\mu \mathcal{D} w^\mu + (\mathcal{D} v^\mu) w_\mu$. For the simplest N = 1 version of the model with (2p, Q) = (0, 1), the four point coupling to the Riemann tensor in the action (18) is absent by virtue of the first Bianchi identity
$$R_{[\mu\nu\rho]\sigma} = 0, \tag{23}$$
and terms proportional to three X's (fermions) cancel for the same reason. Therefore the (0, 1) model is invariant in any background M. For higher cases the terms in the action varying into three X's cancel by a conspiracy between the four and two-point couplings of these fields. There remain variations proportional to five X's. For the N = 2 model with (2p, Q) = (0, 2), these cancel by the second Bianchi identity
$$D_{[\kappa} R_{\mu\nu]\rho\sigma} = 0, \tag{24}$$
so this model is also invariant in any background. So far we have just recovered existing results. For general values of (2p, Q) the obstruction to terms quintic in $X^\mu_\alpha$ is
$$\delta S = \frac{i}{8} \int dt\, X^\mu \cdot X^\nu\, X^\rho \cdot X^\sigma\, D_\kappa R_{\mu\nu\rho\sigma}\, X^\kappa \cdot \varepsilon. \tag{25}$$
Hence, this theory is supersymmetric on manifolds of covariantly constant curvature
$$D_\kappa R_{\mu\nu\rho\sigma} = 0, \tag{26}$$

or in other words, locally symmetric spaces.

3. Quantization

3.1. Canonical analysis. To analyze the canonical structure of the theory it is best to work in Darboux coordinates which are found by introducing the vielbein $e_\mu{}^m$ and writing the action in terms of $X^m_\alpha \equiv e_\mu{}^m X^\mu_\alpha$:
$$S = \frac{1}{2} \int dt \left\{ \dot{x}^\mu g_{\mu\nu} \dot{x}^\nu - i X_m \cdot \frac{D X^m}{dt} + \frac{1}{4} R_{mnrs}\, X^m \cdot X^n\, X^r \cdot X^s \right\}, \tag{27}$$
where $D V_m/dt \equiv \dot{V}_m + \dot{x}^\mu \omega_{\mu m}{}^n V_n$. Then we cast this action into first order form
$$S^{(1)} = \int dt \left\{ p_\mu \dot{x}^\mu - \frac{i}{2} X_m \cdot \dot{X}^m - \frac{1}{2} \pi_\mu g^{\mu\nu} \pi_\nu + \frac{1}{8} R_{mnrs}\, X^m \cdot X^n\, X^r \cdot X^s \right\}, \tag{28}$$
where the covariant canonical momentum is defined by
$$\pi_\mu \equiv p_\mu + \frac{i}{2} \omega_{\mu mn} X^m \cdot X^n. \tag{29}$$
Since the symplectic form is canonical we immediately quantize by demanding the operator relations$^3$
$$[p_\mu, x^\nu] = -i \delta_\mu^\nu, \qquad [X^m_\alpha, X^n_\beta\} = J_{\alpha\beta} \eta^{mn}. \tag{30}$$
The quantum Hamiltonian is
$$H = \frac{1}{2} \pi_m \pi^m - \frac{1}{8} R_{mnrs}\, X^m \cdot X^n\, X^r \cdot X^s + \frac{i}{2} \omega_{nm}{}^n \pi^m - \frac{1}{16} \delta_{2p+Q,1} R, \tag{31}$$
where we have made a definite choice of operator ordering reflected by the terms proportional to a naked spin connection and the scalar curvature. These ensure that the operator H is central$^4$ and can be geometrically interpreted as a generalized Laplace operator$^5$. Note that we employ the operator ordering $\pi_m \equiv e^\mu{}_m \pi_\mu$.

3.2. States. To study the Hilbert space, we view the operators $X^\mu_\alpha$ as bosonic and fermionic oscillators
$$X^m_\alpha = \begin{cases}
\left( a^{m\dagger}_{1,\ldots,p},\ a^m_{1,\ldots,p},\ b^{m\dagger}_{1,\ldots,q},\ b^m_{1,\ldots,q} \right), & Q = 2q \text{ even}, \\
\left( a^{m\dagger}_{1,\ldots,p},\ a^m_{1,\ldots,p},\ b^{m\dagger}_{1,\ldots,q},\ b^m_{1,\ldots,q},\ b^m \right), & Q = 2q + 1 \text{ odd},
\end{cases} \tag{32}$$
with
$$[a^m_i, a^\dagger_{jn}] = \delta_{ij} \delta^m_n, \qquad \{b^m_a, b^\dagger_{bn}\} = \delta_{ab} \delta^m_n, \qquad \{b^m, b^n\} = \eta^{mn}. \tag{33}$$
The definition of the vacuum state depends on whether Q is even or odd. In the even case, Q = 2q, we introduce the Fock vacuum
$$a^\mu_i |0\rangle = 0 = b^\mu_a |0\rangle. \tag{34}$$

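Excited states then correspond to "multi-symmetric tensor–multi forms"
$$\begin{aligned}
&|\Phi_{\mu^1_1 \ldots \mu^1_{s_1}, \cdots, \mu^p_1 \ldots \mu^p_{s_p};\ \nu^1_1 \ldots \nu^1_{k_1}, \cdots, \nu^q_1 \ldots \nu^q_{k_q}}\rangle
= \Phi_{\mu^1_1 \ldots \mu^1_{s_1}, \cdots, \mu^p_1 \ldots \mu^p_{s_p};\ \nu^1_1 \ldots \nu^1_{k_1}, \cdots, \nu^q_1 \ldots \nu^q_{k_q}}(x) \\
&\qquad \times\ a_1^{\mu^1_1 \dagger} \cdots a_1^{\mu^1_{s_1} \dagger} \cdots a_p^{\mu^p_1 \dagger} \cdots a_p^{\mu^p_{s_p} \dagger}\
b_1^{\nu^1_1 \dagger} \cdots b_1^{\nu^1_{k_1} \dagger} \cdots b_q^{\nu^q_1 \dagger} \cdots b_q^{\nu^q_{k_q} \dagger}\, |0\rangle.
\end{aligned} \tag{35}$$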

$^3$ Unitarity of our quantum mechanical model requires signature $(\eta_{mn}) = \mathrm{diag}(+1, \ldots, +1)$, but for the study of algebras of differential-geometric operators on semi-Riemannian manifolds this requirement can be relaxed. Needless to say, therefore, all our results are germane to an arbitrary choice of signature.

$^4$ To be sure, throughout this paper we refer to an operator as being central when it commutes with all the conserved charges of the underlying quantum mechanical system, the prototypical example being the Hamiltonian.

$^5$ The term proportional to the scalar curvature vanishes for all models save (2p, Q) = (0, 1). Whenever the Riemann tensor is covariantly constant, multiplication by the scalar curvature is obviously a central operation. However, for the osp(0|1), N = 1 supersymmetric model this term ensures that H is central in arbitrary Riemannian backgrounds.


When $p = 0$ and $q = 1$ these are simply differential forms, while for $p = 1$, $q = 0$ they are just totally symmetric tensors. Moreover, although tensors of the above symmetry type are not irreducible $so(d)$ representations, any tensor field on the target space manifold can be represented this way. Although these formulæ look complicated, we will rarely need them, and can instead rely on our supersymmetric quantum mechanical system to provide dynamics and a simple operator algebra on spinning space.

When $Q = 2q + 1$ is odd, we need to consider vacua for the algebra $\{b^m, b^n\} = \delta^{mn}$, which are now degenerate. As the bilinear operators $b^{[m}b^{n]}$ obey the Lorentz/rotation algebra, vacuum states $|R\rangle$ are labeled by a representation $R$ of $so(d)$. Moreover, since $\gamma^m \equiv \sqrt{2}\, b^m$ obey the Dirac gamma matrix algebra, $R$ must be a spinor representation $|\alpha\rangle$, i.e.,

\[ b^m|\alpha\rangle = \frac{1}{\sqrt{2}}\,\gamma^{m}{}_{\alpha}{}^{\beta}\,|\beta\rangle . \tag{36} \]
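The identification $\gamma^m \equiv \sqrt{2}\, b^m$ can be checked numerically in a small case. The sketch below is only an illustration under stated assumptions: it takes $d = 3$ with Euclidean signature and uses the Pauli matrices as one concrete choice of Dirac matrices. All helper names are hypothetical and do not come from the paper.

```python
import math

# Pauli matrices: one concrete (assumed) choice of Dirac matrices for d = 3.
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def matmul(a, b):
    """Plain nested-list matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] + ba[i][j] for j in range(n)] for i in range(n)]


def oscillators():
    """Fermionic oscillators b^m = gamma^m / sqrt(2)."""
    s = 1 / math.sqrt(2)
    return [[[s * x for x in row] for row in gamma] for gamma in SIGMA]


def satisfies_oscillator_algebra(bs, tol=1e-12):
    """Check {b^m, b^n} = delta^{mn} times the identity matrix."""
    for m, bm in enumerate(bs):
        for n, bn in enumerate(bs):
            ac = anticommutator(bm, bn)
            for i in range(2):
                for j in range(2):
                    expected = 1.0 if (m == n and i == j) else 0.0
                    if abs(ac[i][j] - expected) > tol:
                        return False
    return True
```

With this normalization the rescaled matrices satisfy the oscillator algebra, while the unrescaled Pauli matrices (which anticommute to $2\delta^{mn}$) do not.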

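The precise choice of representation labeled by the spinor index $\alpha$ (not to be confused with the orthosymplectic indices carrying the same name) depends on the dimensionality of the target space. Requiring

\[ a_i^{\mu}|\alpha\rangle = 0 = b_a^{\mu}|\alpha\rangle , \tag{37} \]

excited states are then “multi-symmetric tensor-spinor–multi forms”

\[
\begin{aligned}
|\Phi^{\alpha}_{\mu^1_1\ldots\mu^1_{s_1},\cdots,\mu^p_1\ldots\mu^p_{s_p};\,\nu^1_1\ldots\nu^1_{k_1},\cdots,\nu^q_1\ldots\nu^q_{k_q}}(x)\rangle
={}& \Phi^{\alpha}_{\mu^1_1\ldots\mu^1_{s_1},\cdots,\mu^p_1\ldots\mu^p_{s_p};\,\nu^1_1\ldots\nu^1_{k_1},\cdots,\nu^q_1\ldots\nu^q_{k_q}} \\
&\times a_1^{\mu^1_1\dagger}\cdots a_1^{\mu^1_{s_1}\dagger}\,\cdots\, a_p^{\mu^p_1\dagger}\cdots a_p^{\mu^p_{s_p}\dagger}\;
b_1^{\nu^1_1\dagger}\cdots b_1^{\nu^1_{k_1}\dagger}\,\cdots\, b_q^{\nu^q_1\dagger}\cdots b_q^{\nu^q_{k_q}\dagger}\,|\alpha\rangle .
\end{aligned}
\tag{38}
\]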

Let us denote states such as (35) and (38) simply by $|\Phi\rangle$. In both the $Q$ even and odd cases, by demanding the vacuum to be translation invariant

\[ p_\mu|0\rangle = 0 = p_\mu|\alpha\rangle , \tag{39} \]

we find that the operator $i\pi_\mu$ corresponds to the covariant derivative

\[ i\pi_\mu|\Phi\rangle = D_\mu|\Phi\rangle . \tag{40} \]

Similarly, the first two terms of the Hamiltonian are proportional to the Laplacian $\Delta = D_\mu D^\mu$ so that

\[ H|\Phi\rangle = \Big[ -\frac12\Delta - \frac18 R_{mnrs}X^m\cdot X^n\, X^r\cdot X^s - \frac{1}{16}\delta_{2p+Q,1}R \Big]|\Phi\rangle . \tag{41} \]

Note that in this notation it is important to distinguish between states $D_\mu|\Phi\rangle$ and $|D_\mu\Phi\rangle$. For example, when $(2p, Q) = (0, 2)$, the state $|\omega_{\mu\nu}\rangle = b_1^{\mu\dagger}b_1^{\nu\dagger}\omega_{\mu\nu}|0\rangle$ is a two-form, while $|D_\rho\omega_{\mu\nu}\rangle = b_1^{\mu\dagger}b_1^{\nu\dagger}b_1^{\rho\dagger}\partial_\mu\omega_{\nu\rho}|0\rangle$ is its three-form exterior derivative, which does not equal $D_\mu|\omega_{\nu\rho}\rangle = (D_\mu\omega_{\nu\rho})\,b_1^{\nu\dagger}b_1^{\rho\dagger}|0\rangle$. In fact, it is best to think of $|\cdot\rangle$ as a machine which takes as input any tensor from the target space manifold and outputs states such as (35) or (38).


3.3. Conserved charges. To analyze the algebra of conserved charges corresponding to the symmetries$^6$ (19) and (20), we note the identities

\[
\begin{aligned}
[\pi_m, \pi_n] &= \tfrac12 R_{mnrs}X^r\cdot X^s + 2i\,\omega_{[mn]}{}^{r}\pi_r , \\
[\pi_m, X^n_\alpha] &= i\,\omega_m{}^{n}{}_{r}X^r_\alpha , \\
[\pi_m, x^\mu] &= -i\,e^\mu{}_m .
\end{aligned}
\tag{42}
\]

Also notice that the operators

\[ M^{mn} \equiv -X^{[m}\cdot X^{n]} , \tag{43} \]

generate the rotation algebra of the tangent space

\[ [M^{mn}, M^{rs}] = M^{ms}\eta^{nr} - M^{ns}\eta^{mr} + M^{nr}\eta^{ms} - M^{mr}\eta^{ns} . \tag{44} \]
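The index structure of (44) can be sanity-checked in a simple matrix realization. The sketch below is not the oscillator realization (43). It assumes Euclidean signature and uses the vector-representation matrices $e_{mn} - e_{nm}$ (with $e_{mn}$ the matrix units), a hypothetical choice made only to test the signs on the right-hand side of (44).

```python
from itertools import product

D = 4  # tangent-space dimension; eta^{mn} = delta^{mn} is assumed


def zeros():
    return [[0] * D for _ in range(D)]


def rotation_generator(m, n):
    """Vector-representation matrix e_mn - e_nm (illustrative normalization)."""
    mat = zeros()
    mat[m][n] += 1
    mat[n][m] -= 1
    return mat


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(D)) for j in range(D)]
            for i in range(D)]


def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(D)] for i in range(D)]


def right_hand_side(m, n, r, s):
    """M^{ms} eta^{nr} - M^{ns} eta^{mr} + M^{nr} eta^{ms} - M^{mr} eta^{ns}."""
    def eta(a, b):
        return 1 if a == b else 0

    terms = [
        (eta(n, r), rotation_generator(m, s)),
        (-eta(m, r), rotation_generator(n, s)),
        (eta(m, s), rotation_generator(n, r)),
        (-eta(n, s), rotation_generator(m, r)),
    ]
    out = zeros()
    for coeff, mat in terms:
        for i in range(D):
            for j in range(D):
                out[i][j] += coeff * mat[i][j]
    return out


def rotation_algebra_holds():
    """Compare both sides of (44) for every index combination."""
    return all(
        commutator(rotation_generator(m, n), rotation_generator(r, s))
        == right_hand_side(m, n, r, s)
        for m, n, r, s in product(range(D), repeat=4)
    )
```

All entries are integers, so the comparison is exact.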

The conserved charges for time translations, the generalized “supersymmetries” (20), and internal symmetries (19) are readily computed and respectively given by

\[
\begin{aligned}
H &= \frac12\pi_m\pi^m - \frac18 R_{mnrs}X^m\cdot X^n\, X^r\cdot X^s + \frac{i}{2}\,\omega_{nm}{}^{n}\pi^m - \frac{1}{16}\delta_{2p+Q,1}R , \\
v_\alpha &= i X^m_\alpha\pi_m , \\
f_{\alpha\beta} &= X^m_{(\alpha}\,\eta_{mn}X^n_{\beta]} .
\end{aligned}
\tag{45}
\]

Again, these are quantum results whose orderings are important (in particular the last two terms of the Hamiltonian $H$ are higher order in $\hbar$ and are not needed for classical Poisson brackets). Checking that these charges commute with the Hamiltonian is tedious but straightforward using the above identities. When the osp(2p|Q) fundamental superindex $\alpha$ is odd, the operators $v_\alpha$ are “standard” supersymmetry generators rotating bosons $x^\mu$ into fermions, and fermions $X^m_\alpha$ into momenta times bosons. Otherwise, they generate novel bosonic symmetries, which geometrically correspond to gradient and divergence-like operations. To compute their algebra, it is useful to note the further identities

\[
\begin{aligned}
[\pi_m, \varphi^{\#}] &= -i D_m\varphi^{\#} , \\
[X^m_\alpha, \varphi^{\#}] &= -2\varphi^{m}{}_{s}X^s_\alpha , \\
[v_\alpha, \varphi^{\#}] &= X^m_\alpha D_m\varphi^{\#} + 2i\varphi_{rs}X^r_\alpha\pi^s ,
\end{aligned}
\tag{46}
\]

where $\varphi_{rs}(x) = -\varphi_{sr}(x)$ and is otherwise arbitrary. Also, we denote the contraction of $X^m\cdot X^n$ on an antisymmetric tensor by $\#$, so

\[ \varphi^{\#} \equiv \varphi_{rs}X^r\cdot X^s . \tag{47} \]

It also helps to note that

\[
\begin{aligned}
[v_\alpha, \pi_n] &= \frac{i}{2}X^m_\alpha R^{\#}_{mn} - X^m_\alpha\,\omega_{mnr}\pi^r , \\
[v_\alpha, X^n_\beta\} &= i J_{\alpha\beta}\pi^n - \omega_m{}^{n}{}_{r}X^m_\alpha X^r_\beta .
\end{aligned}
\tag{48}
\]

6 We do not consider possible additional symmetries that could arise for backgrounds with special geometries. An excellent starting point for this important investigation is provided by the spinning particle studies [10,11].


Detailed yet standard computations yield the commutators between the Hamiltonian and the generators $v_\alpha$, $f_{\alpha\beta}$, which on any Riemannian manifold obey the superalgebra:

\[
\begin{aligned}
[v_\alpha, v_\beta\} &= J_{\alpha\beta}\Delta - \tfrac12 X^m_\alpha X^n_\beta R^{\#}_{mn} , \\
[f_{\alpha\beta}, v_\gamma\} &= 2 v_{(\alpha}J_{\beta]\gamma} , \\
[f_{\alpha\beta}, f_{\gamma\delta}\} &= 4 J_{(\beta(\gamma}f_{\alpha]\delta]} , \\
[\Delta, f_{\alpha\beta}] &= 0 .
\end{aligned}
\tag{49}
\]

In anticipation of its rôle in geometry as the Laplacian, we have defined the operator

\[ \Delta \equiv -2H - \frac14 R^{\#\#} - \frac18\delta_{2p+Q,1}R . \tag{50} \]

Importantly the “Lichnerowicz wave operator”

\[ \Box \equiv -2H \tag{51} \]

is central, i.e.,

\[ [\Box, v_\alpha] = 0 , \tag{52} \]

in symmetric space backgrounds. Moreover, we denote “supersymmetrization” with unit weight over a pair of superindices by $(\cdot\,\cdot]$. Since the right-hand side of the $[v, v\}$-supercommutator is $J_{\alpha\beta}\Delta$ + (curvatures), the above relations constitute a finite dimensional super Lie algebra in flat backgrounds. This algebra is the central result of the paper, because it implies a beautiful operator superalgebra on arbitrary tensors on any manifold $M$. Its analysis and geometric interpretation is the subject of the next section.

4. Geometry

4.1. Dirac operators and N = 1 supersymmetry. In general, for $Q$ odd, the fermionic operator

\[ v_{2p+Q} = i\,b^m\pi_m = \frac{1}{\sqrt2}\,D\!\!\!\!/ \tag{53} \]

acts on states as the Dirac operator. Moreover, at $(2p, Q) = (0, 1)$, there are no bosonic oscillators and a single fermionic one, $b^m = \frac{1}{\sqrt2}\gamma^m$, corresponding to the Dirac matrices, so we have conserved charges $H$ and $v_1 = \frac{1}{\sqrt2}D\!\!\!\!/$, and no $f_{\alpha\beta}$'s as the internal symmetry group osp(0|1) is empty. Their algebra is just the N = 1 supersymmetry algebra

\[ D\!\!\!\!/^{\,2} = -2H = \Delta + \frac14 R \equiv \Box , \tag{54} \]

or in terms of geometry, the Weitzenböck identity for the square of the Dirac operator in curved space. Since the Laplace-like operator $\Box$ commutes with the Dirac operator $D\!\!\!\!/$, we may view it as a Lichnerowicz wave operator.


4.2. Differential forms and N = 2 supersymmetry. At $(2p, Q) = (0, 2)$, we have a pair of fermionic oscillators $(b^{m\dagger}, b_m)$. Identifying $X^\mu_1 = b^{\mu\dagger} = dx^\mu$ with anticommuting coordinate differentials $dx^\mu$, the operator

\[ v_1 = d \tag{55} \]

is the exterior derivative, while states are differential forms. Then we identify

\[ v_2 = \delta \tag{56} \]

with the codifferential and the algebra (49) gives

\[ \{d, \delta\} = -2H = \Delta + \frac14 R^{\#\#} \equiv \Box . \tag{57} \]

The right hand side, $\Box = D_\mu D^\mu + \frac14 R^{\#\#}$, equals the form Laplacian, which is the central modification of the usual Laplace operator. Again, we may view it as a Lichnerowicz wave operator acting on differential forms. The internal $osp(0|2) \cong so(1,1)$ algebra has a single generator

\[ f_{12} = b^{m\dagger}b_m - \frac12\dim M . \tag{58} \]

The operator

\[ N \equiv b^{m\dagger}b_m \tag{59} \]

acts on states (= forms) by counting their degree (or number of indices). Hence, we have a u(1|1) superalgebra

\[
\begin{aligned}
\{d, \delta\} &= \Box , & [\Box, d] &= 0 = [\Box, \delta] , \\
[N, d] &= d , & [N, \delta] &= -\delta , \qquad [N, \Box] = 0 .
\end{aligned}
\tag{60}
\]
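The relations (60) can be illustrated in the simplest possible setting. The sketch below makes several assumptions that are not in the text: a flat two-dimensional target space, a single plane wave so that $D_\mu \to i k_\mu$, and a Jordan–Wigner matrix realization of the fermionic oscillators. Under these assumptions $d = i k_\mu b^{\mu\dagger}$, $\delta = i k_\mu b^\mu$, and $\Box$ reduces to $-k^2$. The function names are illustrative only.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def combine(a, b, ca=1, cb=1):
    """Entrywise linear combination ca*a + cb*b."""
    n = len(a)
    return [[ca * a[i][j] + cb * b[i][j] for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two nested-list matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def transpose(a):
    return [list(col) for col in zip(*a)]


I2 = [[1, 0], [0, 1]]
SZ = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]  # single-mode annihilation operator

# Jordan-Wigner realization of two fermionic modes (real matrices,
# so the adjoint is just the transpose).
ANNIHILATORS = [kron(LOWER, I2), kron(SZ, LOWER)]
CREATORS = [transpose(b) for b in ANNIHILATORS]


def build(k):
    """Return (d, delta, N) for plane-wave momentum k = (k_1, k_2)."""
    zero = [[0] * 4 for _ in range(4)]
    d, delta, number = zero, zero, zero
    for k_m, b, b_dag in zip(k, ANNIHILATORS, CREATORS):
        d = combine(d, b_dag, 1, 1j * k_m)          # d     = i k_m b^{m dagger}
        delta = combine(delta, b, 1, 1j * k_m)      # delta = i k_m b^m
        number = combine(number, matmul(b_dag, b))  # N     = b^{m dagger} b_m
    return d, delta, number


def anticommutator(a, b):
    return combine(matmul(a, b), matmul(b, a))


def commutator(a, b):
    return combine(matmul(a, b), matmul(b, a), 1, -1)
```

For example, with `d, delta, number = build((3, 4))` the anticommutator of `d` and `delta` is $-25$ times the identity, while `commutator(number, d)` reproduces `d` and `commutator(number, delta)` reproduces minus `delta`, in line with (60).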

Alternately, the u(1|1) superalgebra of the N = 2 real multiplet model can be reinterpreted as a central extension by $\Box$ of the semidirect product of superalgebras $osp(0|2) \ltimes \mathbb{R}^{0|2}$, where $f_{\alpha\beta} \equiv J_{\alpha\beta}N$ generates the osp(0|2) factor.

4.3. Symmetric algebras and quantum mechanics. The above discussion relating the osp(0|1) and osp(0|2) models to geometry reviewed known results. We now turn to the purely bosonic$^7$ osp(2|0) model, and begin our presentation of new material. Lichnerowicz introduced a modified Laplacian $\Box^{(n)}$ acting on $n$-index totally symmetric tensors [6] in order to facilitate a computation of the massive spin 2 propagator on spaces of constant curvature$^8$

\[ R_{\mu\nu}{}^{\rho\sigma} = -\frac{2R}{d(d-1)}\,\delta^{\rho}_{[\mu}\delta^{\sigma}_{\nu]} . \tag{61} \]

(We shall employ units $R = -d(d-1)$ whenever working with constant curvature spaces. Factors of $R$ can be reinstated by inserting appropriate powers of $1 = -\frac{R}{d(d-1)}$ on the grounds of dimensionality. All formulæ obtained this way are valid for spaces of positive or negative scalar curvature. The physical cosmological constant is $\Lambda = -R/d$.) Low lying examples of this Lichnerowicz wave operator are

\[
\begin{aligned}
\Box^{(0)}\varphi &= \Delta\varphi , \\
\Box^{(1)}\varphi_\mu &= (\Delta - d + 1)\varphi_\mu , \\
\Box^{(2)}\varphi_{\mu\nu} &= (\Delta - 2d)\varphi_{\mu\nu} + 2g_{\mu\nu}\varphi_\rho{}^{\rho} , \\
\Box^{(3)}\varphi_{\mu\nu\rho} &= (\Delta - 3d - 3)\varphi_{\mu\nu\rho} + 6g_{(\mu\nu}\varphi_{\rho)\sigma}{}^{\sigma} , \\
&\;\;\vdots
\end{aligned}
\tag{62}
\]

7 Spinning particle models with “commuting-spinor” coordinates have appeared before in the literature [22].
8 Our Ricci curvature convention is $R_{\mu\nu} = R_{\rho\mu\nu}{}^{\rho}$.

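Formulæ for general $n$ are known (see [8] and also [23,24]) but are also a direct consequence of the models given here; in particular the generalization to an arbitrary symmetric space is

\[ \Box^{(n)}\varphi_{\mu_1\ldots\mu_n} = \Delta\varphi_{\mu_1\ldots\mu_n} + n(n-1)\,R_{(\mu_1}{}^{\rho}{}_{\mu_2}{}^{\sigma}\varphi_{\mu_3\ldots\mu_n)\rho\sigma} + n\,R_{\rho(\mu_1}\varphi_{\mu_2\ldots\mu_n)}{}^{\rho} . \tag{63} \]

Here, $\Delta = D_\mu D^\mu$ is the usual Laplacian, the key point being that $\Box^{(n)}$ “commutes” with symmetrized trace, metric, divergence and gradient operations

\[
\begin{aligned}
g^{\mu\nu}\,\Box^{(n)}\varphi_{\mu\nu\rho_3\ldots\rho_n} &= \Box^{(n-2)}\varphi^{\mu}{}_{\mu\rho_3\ldots\rho_n} , \\
\Box^{(n+2)}g_{(\rho_1\rho_2}\varphi_{\rho_3\ldots\rho_{n+2})} &= g_{(\rho_1\rho_2}\Box^{(n)}\varphi_{\rho_3\ldots\rho_{n+2})} , \\
D^{\mu}\,\Box^{(n)}\varphi_{\mu\rho_2\ldots\rho_n} &= \Box^{(n-1)}D^{\mu}\varphi_{\mu\rho_2\ldots\rho_n} , \\
\Box^{(n+1)}D_{(\rho_1}\varphi_{\rho_2\ldots\rho_{n+1})} &= D_{(\rho_1}\Box^{(n)}\varphi_{\rho_2\ldots\rho_{n+1})} .
\end{aligned}
\tag{64}
\]

Indeed, the Lichnerowicz wave operator $\Box^{(n)}$ corresponds to $-2H$, the Hamiltonian of our quantum mechanical system at $(2p, Q) = (2, 0)$. To see this relation we recall the symmetric algebra formalism of [8] where the operators $(N, \mathrm{tr}, g, \mathrm{div}, \mathrm{grad}, \Box)$ acting on symmetric tensors were defined: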

Index        $N = dx^\mu\,\partial_\mu$                          Counts indices.
Trace        $\mathrm{tr} = g^{\mu\nu}\partial_\mu\partial_\nu$   Traces a pair of indices.
Metric       $g = g_{\mu\nu}\,dx^\mu dx^\nu$                     Multiplies by the metric and symmetrizes.
Divergence   $\mathrm{div} = g^{\mu\nu}\partial_\mu D_\nu$        The covariant divergence.
Gradient     $\mathrm{grad} = dx^\mu D_\mu$                      Symmetrized covariant derivative.

In that work, commuting symbols $dx^\mu$ and $\partial_\mu \equiv \partial/\partial(dx^\mu)$ with algebra$^9$

\[ [\partial_\mu, dx^\nu] = \delta_\mu^\nu , \tag{65} \]

were employed for index bookkeeping and the algebra operated on sums of symmetric tensors

\[ \Phi = \sum_n \varphi_{\mu_1\ldots\mu_n}\,dx^{(\mu_1}\cdots dx^{\mu_n)} . \tag{66} \]

9 We remind the reader that the dual differential $\partial_\mu$ corresponds to the quantum mechanical oscillator $a_\mu$, so does not act on functions of coordinates such as the metric.


Moreover the operators $(g, N, \mathrm{tr})$ formed an sp(2) Lie algebra with (grad, div) transforming as its fundamental doublet representation. The Lichnerowicz wave operator was given by$^{10}$

\[ \Box = \Delta + c + \frac{d(d-4)}{4} = [\mathrm{div}, \mathrm{grad}] + 2c + \frac{d(d-4)}{2} , \tag{67} \]

where the sp(2) Casimir of (15) equals

\[ c = \frac12 J^{\beta\alpha}f_{\alpha\gamma}J^{\delta\gamma}f_{\delta\beta} = g\,\mathrm{tr} - N(N + d - 2) - \frac{d(d-4)}{4} . \tag{68} \]
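Combining (67) and (68) gives $\Box = \Delta + g\,\mathrm{tr} - N(N + d - 2)$ in constant curvature units $R = -d(d-1)$. On an $n$-index symmetric tensor the non-derivative shift is therefore $-n(n + d - 2)$, and since $\mathrm{tr}$ removes a pair of indices in $n(n-1)$ ways, the trace term carries that coefficient. The small check below, with illustrative function names, confirms that this reproduces the low lying examples (62).

```python
def lichnerowicz_shift(n, d):
    """Eigenvalue of -N(N + d - 2) on an n-index symmetric tensor."""
    return -n * (n + d - 2)


def trace_coefficient(n):
    """Coefficient multiplying the symmetrized g_(..) phi^rho_rho... term."""
    return n * (n - 1)


def matches_low_lying_examples(d):
    """Compare with the explicit n = 0, 1, 2, 3 entries of (62)."""
    expected_shifts = {0: 0, 1: -d + 1, 2: -2 * d, 3: -3 * d - 3}
    expected_traces = {0: 0, 1: 0, 2: 2, 3: 6}
    return all(
        lichnerowicz_shift(n, d) == expected_shifts[n]
        and trace_coefficient(n) == expected_traces[n]
        for n in range(4)
    )
```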

The operator $\Box$ acts on an $n$-index symmetric tensor by the operator $\Box^{(n)}$ given above. Importantly, $\Box$ is central – commuting with $(N, \mathrm{tr}, g, \mathrm{div}, \mathrm{grad})$ – which explains Eqs. (64) above. Identifying the Fock space oscillators of the previous section as

\[ dx^\mu \leftrightarrow a_1^{\mu\dagger} , \qquad \frac{\partial}{\partial(dx^\mu)} \leftrightarrow a_{1\mu} \tag{69} \]

establishes an isomorphism between the symmetric space algebra and our quantum mechanical system at $(2p, Q) = (2, 0)$. States $|\Phi\rangle$ correspond to symmetric tensors $\Phi = \sum_s \phi_{\mu_1\ldots\mu_s}dx^{\mu_1}\cdots dx^{\mu_s}$ and the dictionary for conserved charges of the model reads

\[
f_{\alpha\beta} \leftrightarrow \begin{pmatrix} g & N + \frac d2 \\ N + \frac d2 & \mathrm{tr} \end{pmatrix} , \qquad
v_\alpha \leftrightarrow \begin{pmatrix} \mathrm{grad} \\ \mathrm{div} \end{pmatrix} , \qquad
-2H \leftrightarrow \Box .
\tag{70}
\]

Also, the inner product and adjoint operations introduced in [8] correspond to the inner product of quantum mechanical states. In fact this dictionary even provides a generalization of the symmetric algebra formalism to any symmetric space, rather than simply spaces of constant curvature. For completeness, we write out explicitly the algebra (49)

\[
\begin{aligned}
[\mathrm{tr}, g] &= 4N + 2d , \\
[\mathrm{tr}, \mathrm{grad}] &= 2\,\mathrm{div} , \qquad [\mathrm{div}, g] = 2\,\mathrm{grad} , \\
[\mathrm{div}, \mathrm{grad}] &= \Delta - 2R_{\mu\nu\rho\sigma}\,dx^\mu\partial^\nu dx^\rho\partial^\sigma , \\
[N, \mathcal{O}] &= \mathrm{wt}_{\mathcal{O}}\,\mathcal{O} ,
\end{aligned}
\tag{71}
\]

where the weights of the index operator $N$ provide a five-grading:

\[
\begin{array}{c|cccccc}
\mathcal{O} & \mathrm{tr} & \mathrm{div} & N & \Box & \mathrm{grad} & g \\ \hline
\mathrm{wt}_{\mathcal{O}} & -2 & -1 & 0 & 0 & 1 & 2
\end{array}
\tag{72}
\]

The operators $\{g, N, \mathrm{tr}\}$ generate an $sl(2, \mathbb{R}) = sp(2) = osp(2|0)$ Lie algebra and (grad, div) transform as its fundamental representation. When the curvature vanishes we obtain the Lie algebra of the Jacobi group (a very useful reference is [25]). Otherwise the algebra is a deformation thereof.

10 Note that the equivalent relation in Eq. (30) of [8] is quoted with the wrong sign convention for the cosmological constant relative to the remainder of that work. Also, we have shifted the definition of $c$ by the constant $d(d-4)/4$ to conform with the uniform definition (15).
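The sp(2) part of this algebra can be verified directly on polynomials. The sketch below assumes flat space with Euclidean metric, represents each commuting symbol $dx^\mu$ by a polynomial variable $y_\mu$, and stores a polynomial as a dictionary from exponent tuples to coefficients. These representation choices and all names are illustrative, not taken from [8]. With $g = \sum_\mu y_\mu^2$, $N = \sum_\mu y_\mu\partial_{y_\mu}$ and $\mathrm{tr} = \sum_\mu\partial_{y_\mu}^2$, it checks $[\mathrm{tr}, g] = 4N + 2d$, the weights of (72), and that the Casimir (68) commutes with all three generators.

```python
from fractions import Fraction

D = 3  # number of variables y_mu, i.e. the (assumed flat) dimension d


def combine(*terms):
    """Linear combination of polynomials given as (coefficient, poly) pairs."""
    out = {}
    for coeff, poly in terms:
        for exps, c in poly.items():
            out[exps] = out.get(exps, 0) + coeff * c
    return {exps: c for exps, c in out.items() if c != 0}


def shift(exps, mu, k):
    """Raise the exponent of y_mu by k."""
    return exps[:mu] + (exps[mu] + k,) + exps[mu + 1:]


def g(p):
    """Multiply by the metric: sum over mu of y_mu^2."""
    return combine(*[(1, {shift(e, mu, 2): c})
                     for e, c in p.items() for mu in range(D)])


def N(p):
    """Index (degree) counting operator."""
    return combine((1, {e: c * sum(e) for e, c in p.items()}))


def tr(p):
    """Trace: sum over mu of the second derivative in y_mu."""
    return combine(*[(1, {shift(e, mu, -2): c * e[mu] * (e[mu] - 1)})
                     for e, c in p.items() for mu in range(D) if e[mu] >= 2])


def casimir(p):
    """c = g tr - N(N + d - 2) - d(d - 4)/4, as in (68)."""
    Np = N(p)
    return combine((1, g(tr(p))), (-1, N(Np)), (-(D - 2), Np),
                   (-Fraction(D * (D - 4), 4), p))


def commutator(A, B, p):
    return combine((1, A(B(p))), (-1, B(A(p))))
```

A usage example: for `p = {(2, 1, 0): 3, (0, 0, 0): 5}`, `commutator(tr, g, p)` equals `combine((4, N(p)), (2 * D, p))`, and `commutator(casimir, g, p)` is the empty dictionary, that is, the zero polynomial.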


4.4. Lichnerowicz Dirac operator. Our next example is the osp(2|1) model. It is the first model with both bosonic and fermionic oscillators and is distinguished among all models by possessing a central, Lichnerowicz–Dirac operator. This operator was first uncovered in a study of partially massless higher spin fermi fields [17]. It displays an extremely interesting “statistical-transmutation” property: although it is an odd operator (proportional to odd numbers of Dirac matrices), it commutes with every other operator in the algebra (fermionic or otherwise). The osp(2|1) model has a pair of bosonic oscillators, interpreted as commuting coordinate differentials and their duals

\[ (a^{m\dagger}, a_m) \cong (dx^m, \partial_m) , \tag{73} \]

with algebra (65), just as for the osp(2|0) model. In addition there is a single vector of fermionic oscillators corresponding to the Dirac matrices

\[ b^m \cong \frac{1}{\sqrt2}\gamma^m , \tag{74} \]

as for the osp(0|1), N = 1 supersymmetric model. States are totally symmetric spinor-valued tensors. In addition to the Dirac operator $D\!\!\!\!/$ of the osp(0|1) model, and operators $(N, \mathrm{tr}, g, \mathrm{div}, \mathrm{grad})$ of the osp(2|0) model and their mutual Lichnerowicz operator $\Box$, there is an additional pair of operators $(\gamma, \gamma^*)$, which either add a symmetric tensor index carried by a Dirac matrix, or take a gamma-trace$^{11}$:

\[
\begin{aligned}
\gamma &: \psi_{\mu_2\ldots\mu_s} \longmapsto \gamma_{(\mu_1}\psi_{\mu_2\ldots\mu_s)} , \\
\gamma^* &: \psi_{\mu_1\ldots\mu_s} \longmapsto s\,\gamma^\mu\psi_{\mu\mu_2\ldots\mu_s} .
\end{aligned}
\tag{75}
\]

Together, $(g, \gamma, N, \gamma^*, \mathrm{tr})$ generate osp(2|1)

\[
f_{\alpha\beta} = \begin{pmatrix}
g & N + \frac d2 & \frac{1}{\sqrt2}\gamma \\
N + \frac d2 & \mathrm{tr} & \frac{1}{\sqrt2}\gamma^* \\
\frac{1}{\sqrt2}\gamma & \frac{1}{\sqrt2}\gamma^* & 0
\end{pmatrix} , \qquad
[f_{\alpha\beta}, f_{\gamma\delta}\} = 4J_{(\beta(\gamma}f_{\alpha]\delta]} .
\tag{76}
\]

This superalgebra mostly replicates the one given above for osp(2|0), but in addition $(\gamma, \gamma^*)$ transform as an sp(2) doublet:

\[
\begin{aligned}
[g, \gamma] &= 0 = [\gamma^*, \mathrm{tr}] , \\
[\gamma^*, g] &= 2\gamma , & [\mathrm{tr}, \gamma] &= 2\gamma^* , \\
[N, \gamma] &= \gamma , & [N, \gamma^*] &= -\gamma^* , \\
\{\gamma, \gamma\} &= 2g , & \{\gamma^*, \gamma^*\} &= 2\,\mathrm{tr} , \\
\{\gamma, \gamma^*\} &= 2N + d .
\end{aligned}
\tag{77}
\]

Writing out the second $[f, v\}$ line of the algebra (49) yields (for those commutation relations not given in the previous osp(2|0) section)

\[
\begin{aligned}
[\gamma^*, \mathrm{grad}] &= D\!\!\!\!/ = [\mathrm{div}, \gamma] , \\
[D\!\!\!\!/, g] &= 0 = [D\!\!\!\!/, \mathrm{tr}] , \\
\{D\!\!\!\!/, \gamma\} &= 2\,\mathrm{grad} , \qquad \{D\!\!\!\!/, \gamma^*\} = 2\,\mathrm{div} .
\end{aligned}
\tag{78}
\]

11 Strictly we should write, for example, either $\gamma|\psi_{\mu_2\ldots\mu_s}\rangle = |\gamma_{\mu_1}\psi_{\mu_2\ldots\mu_s}\rangle$ in a quantum mechanical language, or $\gamma\,\psi_{\mu_1\ldots\mu_{s-1}}dx^{\mu_1}\cdots dx^{\mu_{s-1}} = \gamma_{\mu_1}\psi_{\mu_2\ldots\mu_s}dx^{\mu_1}\cdots dx^{\mu_s}$, in a geometry notation.


Finally it remains to explicate the $[v, v\}$ relations. Since these are simplest (and most utilizable) for the case of constant curvature, we specialize to

\[ R_{mnrs} = 2\eta_{m[r}\eta_{s]n} . \tag{79} \]

Then we find

\[
\begin{aligned}
[D\!\!\!\!/, \mathrm{grad}] &= \gamma\Big(N + \frac{d-1}{2}\Big) - g\,\gamma^* , \\
D\!\!\!\!/^{\,2} &= \Box - c - N + \gamma\gamma^* - \frac18 d(3d-5) , \\
[\mathrm{div}, \mathrm{grad}] &= \Box - 2c - \frac12\big[N - \gamma\gamma^*\big] - \frac18 d(3d-7) , \\
[\mathrm{div}, D\!\!\!\!/] &= \Big(N + \frac{d-1}{2}\Big)\gamma^* - \gamma\,\mathrm{tr} .
\end{aligned}
\tag{80}
\]

Here

\[ c = g\,\mathrm{tr} - N(N + d - 1) + \gamma\gamma^* - \frac14 d(d-2) , \tag{81} \]

is the osp(2|1) Casimir and the central Lichnerowicz wave operator is

\[ \Box = \Delta + c + \frac18 d(d-3) . \tag{82} \]

Incidentally, introducing the supergravity-inspired operators$^{12}$

\[ \mathrm{Grad} = \mathrm{grad} + \frac{i}{2}\gamma , \qquad \mathrm{Div} = \mathrm{div} - \frac{i}{2}\gamma^* , \tag{83} \]

the above algebra simplifies to

\[ [\mathrm{Div}, \mathrm{Grad}] = \Box - 2c - \frac38 d(d-3) . \tag{84} \]

Armed with the above algebra, we can now search for further central operators, and find a generalized Dirac operator

\[ \mathbf{D} = \gamma D\!\!\!\!/\,\gamma^* - \gamma^* D\!\!\!\!/\,\gamma . \tag{85} \]

This is the operator introduced in [17]. Although it is clearly of odd Grassmann parity, it commutes with all other operators $(g, \gamma, \mathrm{grad}, N, \Box, D\!\!\!\!/, \mathrm{div}, \gamma^*, \mathrm{tr})$ in constant curvature backgrounds. In particular, it fails to commute with $D\!\!\!\!/$ in general symmetric spaces (in which $\Box$, being the Hamiltonian of our spinning particle model, is still central). It commutes with the osp(2|1) generators in any background and seems to have no generalization to higher osp(2p|Q) models with $Q$ odd. So to the best of our (current) knowledge its existence is a peculiarity of symmetric spinor-tensors.

12 Recall that N = 1 cosmological supergravity is most simply defined in terms of a modified covariant derivative $\mathcal{D}_\mu = D_\mu + \frac12\sqrt{-\Lambda/3}\,\gamma_\mu$ [26,27]. Note also that the algebra presented in Sect. 7 of [8] suffers from some typographical errors amounting to interchanging (grad, div) ↔ (Grad, Div). The correct algebra is presented above.


4.5. Multisymmetric tensors. When the oscillator algebra is $osp(2p|0) = sp(2p)$, states are tensors whose indices are grouped into totally symmetric subsets

\[ \varphi_{\mu^1_1\ldots\mu^1_{s_1},\ldots,\mu^p_1\ldots\mu^p_{s_p}} = \varphi_{(\mu^1_1\ldots\mu^1_{s_1}),\ldots,(\mu^p_1\ldots\mu^p_{s_p})} . \tag{86} \]

The oscillators can be viewed as commuting coordinate differentials and their duals

\[ X^\mu_\alpha = \{dx^\mu_1, \ldots, dx^\mu_p, \partial^\mu_1, \ldots, \partial^\mu_p\} , \tag{87} \]

with algebra

\[ [\partial_{i\mu}, dx^\nu_j] = \delta_{ij}\,\delta^\nu_\mu . \tag{88} \]

In Young diagram notation we could depict the tensor in (86) as

\[ (s_1\ \text{boxes}) \otimes (s_2\ \text{boxes}) \otimes \cdots \otimes (s_p\ \text{boxes}) , \tag{89} \]

where each factor stands for a single row of boxes. The operations $g$, $N$ and $\mathrm{tr}$ of the above sections correspond to adding a pair of boxes, counting the number of boxes, or removing a pair of boxes in the single row case, respectively. In addition we would now like to count the number of indices in a given row, move boxes from one row to another, and add or remove pairs of boxes from distinct rows. All these operations are achieved by promoting $g$, $N$ and $\mathrm{tr}$ to $p \times p$ matrices of operators:

\[
\begin{aligned}
g &\equiv (g_{ij}) = (dx^\mu_i\, g_{\mu\nu}\, dx^\nu_j) , \\
N &\equiv (N_{ij}) = (dx^\mu_i\, \partial_{j\mu}) , \\
\mathrm{tr} &\equiv (\mathrm{tr}_{ij}) = (\partial_{i\mu}\, g^{\mu\nu}\, \partial_{j\nu}) .
\end{aligned}
\tag{90}
\]

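These operators correspond precisely to the $p(2p+1)$ conserved osp(2p|0) charges of the underlying quantum mechanical model

\[ (f_{\alpha\beta}) = \begin{pmatrix} g & N + \frac d2\mathbf{1} \\ N^t + \frac d2\mathbf{1} & \mathrm{tr} \end{pmatrix} , \tag{91} \]

where $\mathbf{1}$ is the $p \times p$ identity matrix and the matrices $g$ and $\mathrm{tr}$ are symmetric. These obey the sp(2p) Lie algebra following from (49),

\[
\begin{aligned}
[N_{ij}, g_{kl}] &= 2\delta_{j(k}g_{l)i} , \\
[\mathrm{tr}_{ij}, g_{kl}] &= 4\delta_{(k(i}N_{l)j)} + 2d\,\delta_{i(k}\delta_{l)j} , \\
[N_{ij}, \mathrm{tr}_{kl}] &= -2\delta_{i(k}\mathrm{tr}_{l)j} , \\
[N_{ij}, N_{kl}] &= \delta_{jk}N_{il} - \delta_{il}N_{kj}
\end{aligned}
\tag{92}
\]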


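whose quadratic Casimir (15) now reads

\[
\begin{aligned}
c &= \mathrm{Tr}\Big[\frac{g\,\mathrm{tr} + \mathrm{tr}\,g}{2} - \Big(N + \frac d2\mathbf{1}\Big)^2\Big] \\
&= g_{ij}\mathrm{tr}_{ij} - N_{ij}N_{ji} - (d - p - 1)N_{ii} - \frac14 dp(d - 2p - 2) ,
\end{aligned}
\tag{93}
\]

where $\mathrm{Tr}$ denotes the trace over the $p \times p$ matrix indices. There are now gradient and divergence operators acting on each row

\[ (v_\alpha) = \begin{pmatrix} \mathrm{grad}_i \\ \mathrm{div}_i \end{pmatrix} , \tag{94} \]

which transform as the fundamental representation of sp(2p)

\[
\begin{aligned}
[N_{ij}, \mathrm{grad}_k] &= \delta_{jk}\,\mathrm{grad}_i , & [N_{ij}, \mathrm{div}_k] &= -\delta_{ik}\,\mathrm{div}_j , \\
[\mathrm{tr}_{ij}, \mathrm{grad}_k] &= 2\delta_{k(i}\,\mathrm{div}_{j)} , & [g_{ij}, \mathrm{div}_k] &= -2\delta_{k(i}\,\mathrm{grad}_{j)} .
\end{aligned}
\tag{95}
\]

They obey the algebra

\[
\begin{aligned}
[\mathrm{grad}_i, \mathrm{grad}_j] &= R_{\mu\nu\rho\sigma}\,dx^\mu_i dx^\nu_j dx^\rho_k\partial^\sigma_k , \\
[\mathrm{div}_i, \mathrm{grad}_j] &= \delta_{ij}\Delta - R_{\mu\nu\rho\sigma}\,dx^\mu_j\partial^\nu_i dx^\rho_k\partial^\sigma_k , \\
[\mathrm{div}_i, \mathrm{div}_j] &= R_{\mu\nu\rho\sigma}\,\partial^\mu_i\partial^\nu_j dx^\rho_k\partial^\sigma_k .
\end{aligned}
\tag{96}
\]

Moreover, the Lichnerowicz wave operator

\[ \Box = \Delta + R_{\mu\nu\rho\sigma}\,dx^\mu_i\partial^\nu_i dx^\rho_k\partial^\sigma_k , \tag{97} \]

commutes with $g$, $N$ and $\mathrm{tr}$ on any manifold, and in the case of symmetric spaces also commutes with div and grad. Finally, for constant curvature manifolds (61), the relations (96) and (97) simplify to

\[
\begin{aligned}
[\mathrm{grad}_i, \mathrm{grad}_j] &= 2g_{k[i}N_{j]k} , \\
[\mathrm{div}_i, \mathrm{grad}_j] &= \delta_{ij}\Big(\Box - c - \frac14 dp[d - 2p - 2]\Big) - g_{jk}\mathrm{tr}_{ki} + N_{jk}N_{ki} + (d - p - 1)N_{ji} , \\
[\mathrm{div}_i, \mathrm{div}_j] &= -2N_{[j|k|}\mathrm{tr}_{i]k} ,
\end{aligned}
\tag{98}
\]

and

\[ \Box = \Delta + c + \frac14 dp(d - 2p - 2) . \tag{99} \]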


4.6. Multiforms. Models with osp(0|2q) oscillators have been studied extensively in the literature. The oscillator modes are all fermionic and their bilinears represent the so(2q) Lie algebra. In terms of geometry, they correspond to $q$ sets of anticommuting coordinate differentials and their duals $\{(dx^\mu_1, \partial^\nu_1), \ldots, (dx^\mu_q, \partial^\nu_q)\}$ subject to

\[ \{dx^\mu_a, \partial_{b\nu}\} = \delta^\mu_\nu\,\delta_{ab} , \tag{100} \]

with indices $a, b, c, \ldots = 1, \ldots, q$. States are therefore multiforms, i.e., tensors with groups of totally antisymmetric indices

\[ \phi_{\nu^1_1\ldots\nu^1_{k_1},\ldots,\nu^q_1\ldots\nu^q_{k_q}} = \phi_{[\nu^1_1\ldots\nu^1_{k_1}],\ldots,[\nu^q_1\ldots\nu^q_{k_q}]} . \tag{101} \]

Pictorially, the analog of (89) would be products of columns. Geometric operations on beasts of this type have been studied in detail [19–21] in the context of mixed symmetry higher spin fields. Gauged versions of the underlying SO(2q) spinning particle models and their relation to higher spins appeared in [28,31,32]. Once again the conserved charges $f_{\alpha\beta}$, $v_\alpha$ and $H$ correspond to geometric operators. The so(2q) generators,

\[ f_{\alpha\beta} = \begin{pmatrix} g & N - \frac d2\mathbf{1} \\ -N^t + \frac d2\mathbf{1} & \mathrm{tr} \end{pmatrix} , \tag{102} \]

either count indices from a set or move an index from one set to another via

\[ N = (b^{m\dagger}_a\,\eta_{mn}\,b^n_b) , \tag{103} \]

or add or remove a pair of indices from a pair of antisymmetric index sets using the $q \times q$ antisymmetric matrices of operators

\[ g = (b^{m\dagger}_a\,\eta_{mn}\,b^{n\dagger}_b) = -g^t \quad\text{or}\quad \mathrm{tr} = (b^m_a\,\eta_{mn}\,b^n_b) = -\mathrm{tr}^t . \tag{104} \]

Naturally, it is not possible to trace or add a pair of boxes using the metric on a single column. As an example of these operations, the Riemann tensor $R_{\mu\nu\rho\sigma} = R_{[\mu\nu]_1[\rho\sigma]_2}$ is a biform and

\[
f_{\alpha\beta}R_{\mu\nu\rho\sigma} =
\begin{pmatrix}
0 & g_{[\kappa[\eta}R_{\mu\nu]_1\rho\sigma]_2} & [2 - \frac d2]R_{\mu\nu\rho\sigma} & 2R_{[\mu\nu\rho]_1\sigma} \\
-g_{[\kappa[\eta}R_{\mu\nu]_1\rho\sigma]_2} & 0 & -2R_{\mu[\nu\rho\sigma]_2} & [2 - \frac d2]R_{\mu\nu\rho\sigma} \\
-[2 - \frac d2]R_{\mu\nu\rho\sigma} & 2R_{\mu[\nu\rho\sigma]_2} & 0 & -4R_{\mu\rho} \\
-2R_{[\mu\nu\rho]_1\sigma} & -[2 - \frac d2]R_{\mu\nu\rho\sigma} & 4R_{\mu\rho} & 0
\end{pmatrix} ,
\tag{105}
\]

(for clarity we have labeled index sets by subscripts 1, 2 and have not imposed the first Bianchi identity). Clearly, an index notation rapidly becomes cumbersome, and it is best just to think of the operators acting on states, but it should at least make the interpretation of these operators clear. Their so(2q) algebra follows from (49)

\[ [f_{\alpha\beta}, f_{\gamma\delta}] = 4J_{[\beta[\gamma}f_{\alpha]\delta]} , \tag{106} \]


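or spelled out

\[
\begin{aligned}
[N_{ab}, g_{cd}] &= 2g_{a[d}\delta_{c]b} , \\
[\mathrm{tr}_{ab}, g_{cd}] &= 4\delta_{[c[a}N_{d]b]} + 2d\,\delta_{a[d}\delta_{c]b} , \\
[N_{ab}, \mathrm{tr}_{cd}] &= -2\,\mathrm{tr}_{b[d}\delta_{c]a} , \\
[N_{ab}, N_{cd}] &= \delta_{bc}N_{ad} - \delta_{ad}N_{cb} .
\end{aligned}
\tag{107}
\]

Their Casimir (15) is

\[ c = g_{ab}\mathrm{tr}_{ba} + N_{ab}N_{ba} - (d + q - 1)N_{aa} + \frac14 dq(d + 2q - 2) . \tag{108} \]

The operators $v_\alpha$ generalize the exterior derivative and codifferential

\[ (v_\alpha) = \begin{pmatrix} d_1 \\ \vdots \\ d_q \\ \boldsymbol{\delta}_1 \\ \vdots \\ \boldsymbol{\delta}_q \end{pmatrix} . \tag{109} \]

These act on each antisymmetric set of indices much like the usual $d$ and $\delta$ operators, but it is important to note that even the operators $d_a \cong dx^\mu_a D_\mu$ are not metric independent, since they employ the covariant derivative. The $v_\alpha$ transform as a vector under$^{13}$ so(2q)

\[
\begin{aligned}
[N_{ab}, d_c] &= \delta_{bc}\,d_a , & [N_{ab}, \boldsymbol{\delta}_c] &= -\delta_{ac}\,\boldsymbol{\delta}_b , \\
[\mathrm{tr}_{ab}, d_c] &= 2\delta_{c[b}\,\boldsymbol{\delta}_{a]} , & [g_{ab}, \boldsymbol{\delta}_c] &= 2\delta_{c[b}\,d_{a]} .
\end{aligned}
\tag{110}
\]

They are supercharges from the spinning particle model viewpoint and obey the superalgebra (49). In detail, these read

\[
\begin{aligned}
\{d_a, d_b\} &= R_{\mu\nu\rho\sigma}\,dx^\mu_a dx^\nu_b dx^\rho_c\partial^\sigma_c , \\
\{d_a, \boldsymbol{\delta}_b\} &= \delta_{ab}\Delta + R_{\mu\nu\rho\sigma}\,dx^\mu_a\partial^\nu_b dx^\rho_c\partial^\sigma_c , \\
\{\boldsymbol{\delta}_a, \boldsymbol{\delta}_b\} &= R_{\mu\nu\rho\sigma}\,\partial^\mu_a\partial^\nu_b dx^\rho_c\partial^\sigma_c .
\end{aligned}
\tag{111}
\]

Let us spell those relations out explicitly for the constant curvature case (79):

\[
\begin{aligned}
\{d_a, d_b\} &= 2g_{c(a}N_{b)c} , \\
\{d_a, \boldsymbol{\delta}_b\} &= \delta_{ab}\Big(\Box - c + \frac14 dq(d + 2q - 2)\Big) + g_{ac}\mathrm{tr}_{cb} + N_{ac}N_{cb} - (d + q - 1)N_{ab} , \\
\{\boldsymbol{\delta}_a, \boldsymbol{\delta}_b\} &= 2N_{c(a}\mathrm{tr}_{b)c} ,
\end{aligned}
\tag{112}
\]

where the Laplacian is related to the Lichnerowicz wave operator by

\[ \Box = \Delta + c - \frac14 qd(d + 2q - 2) . \tag{113} \]

13 We apologize for ambiguous notations such as: (i) the Kronecker delta $\delta_{ab}$ versus codifferentials $\boldsymbol{\delta}_a$, (ii) the spacetime dimension $d = \dim M$ versus exterior derivatives $d_a$, and (iii) the use of $\alpha$ as both an osp(2p|Q) superindex and as a spacetime spinor index.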


4.7. Super-Lichnerowicz algebras. It is now clear that our supersymmetric quantum mechanical system provides a vast generalization of Lichnerowicz's original construction. The most general extension acts on tensors with $p$ sets of symmetrized indices, $[Q/2]$ sets of anti-symmetrized indices and a single spinor index present only for $Q$ odd

\[ \Phi^{\alpha}_{(\mu^1_1\ldots\mu^1_{s_1}),\ldots,(\mu^p_1\ldots\mu^p_{s_p});\,[\nu^1_1\ldots\nu^1_{k_1}],\ldots,[\nu^{[Q/2]}_1\ldots\nu^{[Q/2]}_{k_{[Q/2]}}]} . \tag{114} \]

In Young diagram notation, where rows are totally symmetric and columns antisymmetric, we could write

\[ \alpha \otimes \underbrace{(\text{row}) \otimes (\text{row}) \otimes \cdots \otimes (\text{row})}_{p\ \text{times}} \otimes \underbrace{(\text{column}) \otimes (\text{column}) \otimes \cdots \otimes (\text{column})}_{[Q/2]\ \text{times}} . \tag{115} \]

Clearly, although this is not an irreducible basis for tensors and spinors on a manifold, we can generate all such objects this way. Indeed, the study of irreducible tensors amounts to an examination of irreducible representations of the non-compact orthosymplectic algebra obeyed by the conserved charges. A detailed study is reserved for future work, however. Firstly, let us discuss the osp(2p|Q) generators. For any $Q$, these can (i) add a pair of boxes, (ii) count or move boxes and (iii) remove a pair of boxes using $g$, $N$ and $\mathrm{tr}$. These are now operator-valued supermatrices. Their fermionic entries act on pairs of boxes of which one lives in an (antisymmetric) column, and the other in a (symmetric) row. Otherwise, they act exactly as in the above multisymmetric tensor and multiform examples. When $Q$ is odd, states are also labeled by a spinor index, so a single box can be added or removed using a Dirac matrix as described in Sect. 4.4. All these operators fit into the conserved charges $f_{\alpha\beta}$ according to

\[
f_{\alpha\beta} = \begin{pmatrix}
g_{ij} & N_{ij} + \frac d2\delta_{ij} & g_{ib} & N_{ib} & \frac{1}{\sqrt2}\gamma_i \\
N_{ji} + \frac d2\delta_{ij} & \mathrm{tr}_{ij} & N_{bi} & \mathrm{tr}_{ib} & \frac{1}{\sqrt2}\gamma^*_i \\
g_{aj} & N_{aj} & g_{ab} & N_{ab} - \frac d2\delta_{ab} & \frac{1}{\sqrt2}\gamma_a \\
N_{ja} & \mathrm{tr}_{aj} & -N_{ba} + \frac d2\delta_{ab} & \mathrm{tr}_{ab} & \frac{1}{\sqrt2}\gamma^*_a \\
\frac{1}{\sqrt2}\gamma_j & \frac{1}{\sqrt2}\gamma^*_j & -\frac{1}{\sqrt2}\gamma_b & -\frac{1}{\sqrt2}\gamma^*_b & 0
\end{pmatrix} .
\tag{116}
\]

The final column and row should be omitted for $Q$ even. Here $i, j, \ldots$ and $a, b, \ldots$ take values $1, \ldots, p$ and $1, \ldots, [Q/2]$. The operator matrix entries are defined by

\[
g_{\bullet\circ} \equiv dx^\mu_\bullet\, g_{\mu\nu}\, dx^\nu_\circ , \quad
N_{\bullet\circ} \equiv dx^\mu_\bullet\, g_{\mu\nu}\, \partial^\nu_\circ , \quad
\mathrm{tr}_{\bullet\circ} \equiv \partial^\mu_\bullet\, g_{\mu\nu}\, \partial^\nu_\circ , \quad
\gamma_\bullet \equiv dx^\mu_\bullet\gamma_\mu , \quad
\gamma^*_\bullet \equiv \partial^\mu_\bullet\gamma_\mu ,
\tag{117}
\]

where $\bullet$ and $\circ$ stand for indices of either type $i, j, \ldots$ or $a, b, \ldots$. The differentials and their duals are either commuting or anticommuting with non-vanishing brackets

\[ [\partial^\mu_i, dx^\nu_j] = \delta_{ij}\,g^{\mu\nu} , \qquad \{\partial^\mu_a, dx^\nu_b\} = \delta_{ab}\,g^{\mu\nu} . \tag{118} \]

In terms of the oscillators the correspondence is $X^\mu_\alpha = (dx^\mu_i, \partial^\mu_i, dx^\mu_a, \partial^\mu_a, \frac{1}{\sqrt2}\gamma^\mu)$. The superalgebra of the osp(2p|Q) generators $f_{\alpha\beta}$ is given in (49) and their quadratic Casimir by (15). Our discussion so far holds for any Riemannian manifold. It remains to discuss the charges $v_\alpha = iX^\mu_\alpha\pi_\mu$ and the Lichnerowicz wave operator $\Box = -2H$. These operators can distinguish between general backgrounds, symmetric spaces, and constant curvature ones. The operator $i\pi_\mu$ corresponds to the covariant derivative operator on states (see Eq. (40)). For $\alpha = a$ odd, these are “standard” supersymmetry operators. From a geometric viewpoint, this means that they act on columns (i.e., form indices) as exterior derivative and codifferential operators. It is important to realize that the covariant derivative $D_\mu$ acting on a tensor-valued differential form $\varphi_{[\mu_1\ldots\mu_k]\nu_1\ldots\nu_s}$ (say) also mixes the tensor indices $\nu_i$ through the Christoffel symbols. When $Q$ is odd, the final generator $v_{2p+Q}$ is the Dirac operator, as discussed in Sect. 4.4. Finally, for values of the superindex $\alpha = i$ even, the $v_\alpha$ are gradient and divergence operators acting on symmetric rows. In an equation, $v_\alpha$ is a column vector

\[ (v_\alpha) = \begin{pmatrix} \mathrm{grad} \\ \mathrm{div} \\ d \\ \boldsymbol{\delta} \\ D\!\!\!\!/ \end{pmatrix} , \tag{119} \]

which transforms as the fundamental representation of osp(2p|Q) as given in (49). Here, omit the final entry from $(v_\alpha)$ when $Q$ is even. The Lichnerowicz wave operator$^{14}$ is again a modification of the Laplacian according to

\[ \Box = \Delta + \frac14 R^{\#\#} + \frac18\delta_{2p+Q,1}R . \tag{120} \]

It commutes with $f_{\alpha\beta}$ in any background and with $v_\alpha$ in symmetric spaces. The algebra of all the above geometric operations is given in (49) and is valid for any Riemannian manifold. We propose to call this a “super-Lichnerowicz algebra”. The additional relation (52), required for $\Box$ to be central, always holds for $(2p|Q) = (0|1, 2)$ (N = 1, 2 supersymmetry) and is otherwise valid in symmetric spaces (26). We end this section with its specialization to constant curvature manifolds (61). In that case the operator

\[ R^{\#\#} = 4c - \frac12 d(Q - 2p)(Q - 2p + d - 2) , \tag{121} \]

i.e., the osp(2p|Q) quadratic Casimir. The supercommutators of the $v_\alpha$ then simplify to

\[ [v_\alpha, v_\beta\} = J_{\alpha\beta}\Delta + f_{[\alpha|\gamma}J^{\delta\gamma}f_{\delta|\beta)} - \frac14 d(Q - 2p + d - 2)J_{\alpha\beta} . \tag{122} \]

The Lichnerowicz wave operator is simply

\[ \Box = \Delta + c - \frac18 d(Q - 2p)(Q - 2p + d - 2) + \frac18\delta_{2p+Q,1}R . \tag{123} \]

14 It is interesting to note that $\Box$ can always be expressed in terms of the $v_\alpha$ by contracting the $[v_\alpha, v_\beta\}$ bracket with $J^{\alpha\beta}$. Moreover, when $Q \neq 2p$, an even simpler formula
\[ \Box = \frac{1}{p - Q/2}\,v\cdot v + \frac{p - Q/2 + 1}{p - Q/2}\,\frac14 R^{\#\#} + \frac18\delta_{2p+Q,1}R \]
holds upon contracting with $J^{\beta\alpha}$.

4.8. Parabolic orthosymplectic algebra. The superalgebra (49) is certainly consistent since it was defined by the charges of a quantum mechanical system acting on an explicit representation by quantum states or equivalently tensors on a manifold. However, the presence of terms quartic in the oscillators $X^m_\alpha$ on the right-hand side of the $[v, v\}$-supercommutator implies that this is not a closed algebra. (The reader may also easily convince themselves of this fact by studying the constant curvature, $(2p, Q) = (2, 0)$ case studied in depth in [8].) However, when the curvature vanishes, we do obtain a finite dimensional Lie superalgebra. As we now show, it is a parabolic subalgebra of osp(2p + 2|Q). The curved models are deformations of this parabolic algebra. The first step is to introduce a new operator ord which counts derivatives:

\[ [\mathrm{ord}, H] = 2H , \qquad [\mathrm{ord}, v_\alpha] = v_\alpha , \qquad [\mathrm{ord}, f_{\alpha\beta}] = 0 . \tag{124} \]

Although ord is not a conserved charge, it can be interpreted as a dilation generator, and will play an extremely important rôle. In particular, it provides a 3-grading of the Lie superalgebra

\[ \mathfrak{p} = \mathfrak{g}_2 \oplus \mathfrak{g}_1 \oplus \mathfrak{g}_0 , \tag{125} \]

where $\mathfrak{g}_2 = \{H\}$, $\mathfrak{g}_1 = \{v_\alpha\}$ and $\mathfrak{g}_0 = \{f_{\alpha\beta}, \mathrm{ord}\}$. Moreover, $\{H, \mathfrak{g}_1, \mathrm{ord}\}$ form a Heisenberg Lie superalgebra

\[ [v_\alpha, v_\beta\} = -2J_{\alpha\beta}H . \tag{126} \]

It is natural to wonder, therefore, whether the algebra $\mathfrak{p}$ is the parabolic subalgebra of some larger Lie superalgebra, and indeed this is the case. Namely, the Lie superalgebra osp(2p + 2|Q) has a five-grading by the Cartan generator, which we are here calling ord, corresponding to the longest root of its bosonic sp(2p + 2) subalgebra

\[ osp(2p + 2|Q) = \mathfrak{g}_2 \oplus \mathfrak{g}_1 \oplus \mathfrak{g}_0 \oplus \mathfrak{g}_{-1} \oplus \mathfrak{g}_{-2} . \tag{127} \]

The non-negatively graded subspaces are isomorphic to the parabolic algebra $\mathfrak{p}$. We have as yet found no physical interpretation of the negatively graded subspaces, but cannot help remarking that the corresponding quantum mechanical generators ought to be computable in terms of inverse powers of $H$. In summary, the algebra of conserved charges corresponds to all osp(2p + 2|Q) generators that commute with the generator $H$ labeled by the longest root of the bosonic sp(2p + 2) subalgebra. To verify this claim explicitly, it suffices to display an upper-triangular matrix representation of the super Lie algebra $\mathfrak{p}$. This is achieved by the osp(2p + 2|Q) supermatrix

\[
P \equiv \begin{pmatrix}
\mathrm{ord} & v_\alpha & 4H \\
0 & 2J^{\beta\gamma}f_{\gamma\alpha} & -J^{\beta\gamma}v_\gamma \\
0 & 0 & -\mathrm{ord}
\end{pmatrix} ,
\tag{128}
\]

(it is not difficult to permute the rows and columns of $P$ to obtain the usual osp-valued block bosonic/fermionic supermatrix form).


5. Conclusions

In this article, we have presented curved space spinning particle models whose spin degrees of freedom are described by both fermionic and bosonic oscillators. These transform as spacetime vectors and under the superalgebra osp(2p|Q). States of these models describe the most general spinors and tensors on any Riemannian manifold. The conserved charges describe the various geometric operators on tensors and spinors. For manifolds that are symmetric spaces, our models are supersymmetric with $Q$ supercharges corresponding to exterior derivatives, codifferentials, and (when $Q$ is odd) the Dirac operator. Moreover, the underlying superalgebra is a deformation of a parabolic subalgebra of osp(2p + 2|Q) with $2p$ “commuting” supercharges which act as gradient and divergence operators. The Hamiltonian of the model is a modified Laplacian that generalizes Lichnerowicz's constant curvature wave operator to tensors and spinors on arbitrary Riemannian manifolds. The remaining charges amount to all possible operations on tensors and spinors using the metric and Dirac matrices. There are many applications, generalizations, and open directions suggested by our work. We close by giving a (partial) list:

1. In special geometries it is possible to write down quantum mechanical systems with higher supersymmetries [33,34], the first example being Kähler backgrounds [3]. It is possible to motivate the super-Lichnerowicz algebra presented here by analogy with the Hodge–Lefschetz algebra of Dolbeault cohomology, which is also described in terms of supersymmetric quantum mechanics [5]. It would be interesting to consider Kähler backgrounds for our models to try and marry the two algebras (an investigation of this direction may be found in [35]).

2. We have said nothing here about the spectrum of the model or its detailed dynamics. Eigenmodes of Lichnerowicz operators have been studied before [23,36], but it would be interesting to have statements applicable to our more general models. Clearly this is an important problem deserving further attention.

3. An important relation of supersymmetric quantum mechanics to geometry is through index theorems. The Lichnerowicz operators presented here are natural regulators for index computations in an extremely broad class of spaces; in particular, from a geometric standpoint the quantum mechanical trace over states runs over infinitely many fields. Whether novel invariants can be constructed this way is an open problem.

4. Second quantization of these models introduces fields of arbitrary spin, thanks to the introduction of bosonic oscillators [14,15]. Already the Lichnerowicz algebra for the osp(2|0) model implies generating functions for totally massless, symmetric, higher spin field actions. Indeed, calling

\[
\begin{aligned}
G ={}& \Box + 2(N - 1)(N + d - 3) - \mathrm{grad}\,\mathrm{div} + \frac12\big[\mathrm{grad}^2\,\mathrm{tr} + g\,\mathrm{div}^2\big] \\
& - \frac12\,g\Big[\Box + 2N(N + d - 1) + d - 1 + \frac12\,\mathrm{grad}\,\mathrm{div}\Big]\mathrm{tr} ,
\end{aligned}
\tag{129}
\]

the (Bianchi) identity

\[ \mathrm{div}\,G = g\,X = 0 \mod g , \tag{130} \]

implies that the action

\[ S = \int \Phi^{*}\,G\,\Phi = \langle\Phi|G|\Phi\rangle , \tag{131} \]


is gauge invariant under δ = grad ξ when tr ξ = 0. This is the invariance of a doubly trace free, tr2 = 0, field. Since need not be an eigenstate of the index operator N, the action (131) generates the actions of all totally symmetric, massless fields in constant curvature backgrounds. Similarly, the osp(2|1) model pertains to totally symmetric tensor-spinors. Indeed the operator / + (d − 4) γ Dγ / ∗ − Grad γ ∗ − γ Div G = D − 2N D 1 + (d − 2) γ Grad tr + g Div γ ∗ − i(2N + d − 4) 2 i 1 / − (2N + d) tr − 2iγ (2N + d − 2)γ ∗ , (132) − (d − 2) g D 4 2 obeys the identity DivG = γ X = 0 mod γ ,

(133)

so that the action S=

5.

6.

G = |G|,

(134)

enjoys the gauge invariance δ = Grad ξ with γ ξ = 0, of a traceless, gammatraceless, γ ∗ tr = 0, fermionic higher spin field. An obvious approach for obtaining the above actions is to gauge the rigid osp symmetries [9] of the models. Spinning particle studies of this type have been conducted in [29,30,13,37,38]. Interactions would amount to second quantization of these spinning particle models. Needless to say, this is an ambitious program, see [39] for a review of interacting higher spin theories. The super-Lichnerowicz algebra is a deformation of a parabolic osp(2 p + 2|Q) algebra. It would be most interesting to find geometric operators corresponding to the remaining osp(2 p + 2|Q) operators. In particular, it seems likely that introducing the Green’s function −1 may yield a solution to this problem. The additional charges would not be expected to yield quantum mechanical symmetries, but instead ought to be spectrum generating. The Lichnerowicz algebra for the osp(2|0) model is a deformation of the Jacobi group G J , an object extensively studied in the mathematical literature (see the excellent book [25]). In particular it is known that this group enjoys a cubic Casimir operator, in our notations d(d − 4) − grad2 tr − g div2 + grad (2N + d − 2) div. c3 = c + N + 4 (135)

7.

This operator is central in flat backgrounds. We know of no generalization to curved backgrounds however. Instead of studying the model via the Lie algebra obeyed by its conserved charges, for the osp(2|0) theory it is also helpful to work in an extended universal enveloping algebra. As shown in [8], in this case there is a rather elegant associative algebra.


The key step is to enlarge the constant curvature algebra by a certain square root of the Casimir √ T ≡− 1− c (136) and in addition we define N ≡ N + d2 − 1. This allows us to form the operator N + T whose eigenstates are k-fold trace-free tensors, namely trk ϕ = 0 = trk−1 ϕ =⇒ (N + T )ϕ = 2kϕ .

(137)

≡ (N − T )div − grad tr, div

(138)

Then introducing

the constant curvature algebra is presented and similarly for the formal adjoint grad, by the six relations = grad tr N = (N + 2)tr , tr grad

N −T +4 tr , N −T +2

g tr = N 2 − T 2 = tr g − 4N − 4 , = (T − 1)div , div N = (N + 1)div , divT 2 grad (N − T + 2)T = grad div div (N − T )(T 2 − 1) −2

( −

(d−2)2 2

+ 2T 2 )(N − T + 2)T 2 , N −T

(139)

and their formal adjoints where all other products are commutative. In particular, grad) is diagonal. These relations observe that the sp(2, R) action on the pair (div, provide a calculus for constant curvature algebra computations in terms of rational functions of (N , T ). Needless to say, it would be extremely fruitful to generalize this associative algebra to the general osp(2 p|Q) models. Acknowledgements. It is a pleasure to thank Boris Pioline, Sergey Prokushkin, and Andrew Hodge for discussions.

References

1. Alvarez-Gaume, L., Witten, E.: Nucl. Phys. B 234, 269 (1984) 2. Witten, E.: J. Diff. Geom. 17, 661 (1982) 3. Zumino, B.: Phys. Lett. B 87, 203 (1979) 4. Witten, E.: Nucl. Phys. B 202, 253 (1982) 5. Figueroa-O’Farrill, J.M., Kohl, C., Spence, B.J.: Nucl. Phys. B 503, 614 (1997) 6. Lichnerowicz, A.: Institut des Hautes Études Scientifiques, 10, 293 (1961); Bull. Soc. Math. France, 92, 11 (1964) 7. Duval, C., Lecomte, P., Ovsienko, V.: Ann. Inst. Fourier, 49, 1999 (1999); Duval, C., Ovsienko, V.: Selecta Math. (N.S.), 7, 291 (2001) 8. Hallowell, K., Waldron, A.: Nucl. Phys. B 724, 453 (2005) 9. Brink, L., Deser, S., Zumino, B., Di Vecchia, P., Howe, P.S.: Phys. Lett. B 64, 435 (1976) 10. Rietdijk, R.H., van Holten, J.W.: Class. Quant. Grav. 7, 247 (1990) 11. Gibbons, G.W., Rietdijk, R.H., van Holten, J.W.: Nucl. Phys. B 404, 42 (1993) 12. Gershun, V.D., Tkach, V.I.: Pisma Zh. Eksp. Teor. Fiz. 29, 320 (1979) [Sov. Phys. JETP 29, 288 (1979)]; Howe, P.S., Penati, S., Pernici, M., Townsend, P.K.: Phys. Lett. B 215, 555 (1988); Class. Quant. Grav. 6, 1125 (1989)


13. Bastianelli, F., Corradini, O., Latini, E.: Higher spin fields from a worldline perspective. JHEP 0702, 072 (2007) 14. Labastida, J.M.F.: Nucl. Phys. B 322, 185 (1989) 15. Vasiliev, M.A.: Phys. Lett. B 209, 491 (1988) 16. Vasiliev, M.A.: Phys. Lett. B 243, 378 (1990), Phys. Lett. B 567, 139 (2003); see also Sagnotti, A., Sezgin, E., Sundell, P.: On higher spins with a strong Sp(2,R) condition. In: Proc. First Solvay Workshop on Higher Spin Gauge Theorey. (Brussels, May 2004), available at http://www.solvayinstitutes.be/ Activities/Higher%20spin/solvay1proc.pdf 17. Deser, S., Waldron, A.: Phys. Rev. Lett. 87, 031601 (2001); Nucl. Phys. B 607, 577 (2001) 18. Metsaev, R.R.: Phys. Lett. B 643, 205 (2006) 19. de Medeiros, P., Hull, C.: JHEP 0305, 019 (2003); Commun. Math. Phys. 235, 255 (2003) 20. Dubois-Violette, M., Henneaux, M.: Lett. Math. Phys. 49, 245 (1999); Edgara, S.B., Senovilla, J.M.M.: J. Geom. Phys. 56, 2153(2006); Olver, P.J.: Differential hyperforms I. Univ. of Minnesota report 82–101; Invariant theory and differential equations. In: Koh, S. Invariant theory, Berlin-Heidelberg-New York, Springer-Verlag, 1987, p. 62; Dubois-Violette, M., Henneaux, M.: Lett. Math. Phys. 49, 245 (1999); Commun. Math. Phys. 226, 393 (2002) 21. Bekaert, X., Boulanger, N.: Tensor gauge fields in arbitrary representations of GL(D,R). II: Quadratic actions. Commun. Math. Phys. 271, no. 3, 723–773 (2007); Phys. Lett. B 561, 183 (2003) 22. Deriglazov, A.A., Gitman, D.M.: Mod. Phys. Lett. A 14, 709 (1999) 23. Christensen, S.M., Duff, M.J.: Nucl. Phys. B 154, 301 (1979) 24. Warner, N.P.: Proc. Roy. Soc. Lond. A 383, 217 (1982) 25. Berndt, R., Schmidt, R.: Elements of the representation theory of the Jacobi group. Rolf Berndt, Ralf Schmidt (eds.) Baston: Birkhäuser Verlag, 1998 26. Deser, S., Zumino, B.: Phys. Rev. Lett. 38, 1433 (1977) 27. Townsend, P.K.: Phys. Rev. D 15, 2802 (1977) 28. Kuzenko, S.M., Yarevskaya, Z.V.: Mod. Phys. Lett. A 11, 1653 (1996) 29. 
Howe, P.S., Penati, S., Pernici, M., Townsend, P.K.: Phys. Lett. B 215, 555 (1988) 30. Howe, P.S., Penati, S., Pernici, M., Townsend, P.K.: Class. Quant. Grav. 6, 1125 (1989) 31. Bastianelli, F., Benincasa, P., Giombi, S.: JHEP 0504, 010 (2005) 32. Bastianelli, F., Benincasa, P., Giombi, S.: JHEP 0510, 114 (2005) 33. Coles, R.A., Papadopoulos, G.: Class. Quant. Grav. 7, 427 (1990) 34. Hull C.M., http://arxiv.org/list/hep-th/9910028, 1999 35. Marcus, N.: Nucl. Phys. B 439, 583 (1995) 36. Warner, N.P.: Proc. Roy. Soc. Lond. A 383, 207 (1982) 37. Hull, C.M., Vazquez-Bello, J.L.: Nucl. Phys. B 416, 173 (1994) 38. Halling, R., Lindeberg, A.: Class. Quant. Grav. 7, 2341 (1990) 39. Bekaert, X., Cnockaert, S., Iazeolla, C., Vasiliev, M.A.: http://arxiv.org/list/hep-th/0503128, 2005 Communicated by G.W. Gibbons

Commun. Math. Phys. 278, 803–819 (2008) Digital Object Identifier (DOI) 10.1007/s00220-007-0391-3

Communications in Mathematical Physics

A Lower Bound for Nodal Count on Discrete and Metric Graphs

Gregory Berkolaiko

Department of Mathematics, Texas A&M University, College Station, TX 77843-3368, USA. E-mail: [email protected]

Received: 1 April 2007 / Accepted: 20 April 2007
Published online: 4 December 2007 – © Springer-Verlag 2007

Abstract: We study the number of nodal domains (maximal connected regions on which a function has constant sign) of the eigenfunctions of Schrödinger operators on graphs. Under a certain genericity condition, we show that the number of nodal domains of the $n$th eigenfunction is bounded below by $n - \ell$, where $\ell$ is the number of links that distinguish the graph from a tree. Our results apply to operators on both discrete (combinatorial) and metric (quantum) graphs. They complement already known analogues of a result by Courant, who proved the upper bound $n$ for the number of nodal domains. To illustrate that the genericity condition is essential, we show that if it is dropped, the nodal count can fall arbitrarily far below the number of the corresponding eigenfunction. In the Appendix we review the proof of the case $\ell = 0$ on metric trees, which has been obtained by other authors.

This research was partially supported by NSF award number 0604859.

1. Introduction

According to a well-known theorem by Sturm, the zeros of the $n$th eigenfunction of a vibrating string divide the string into $n$ “nodal intervals”. The Courant nodal line theorem carries over one half of Sturm’s theorem to the theory of membranes: Courant proved that the $n$th eigenfunction cannot have more than $n$ nodal domains. He also provided an example showing that no non-trivial lower bound for the number of nodal domains can be hoped for in $\mathbb{R}^d$, $d \ge 2$. But what can be said about the number of nodal domains on graphs?

The earliest research on graphs concentrated on Laplace and Schrödinger operators on discrete (combinatorial) graphs. The functions on discrete graphs take values on the vertices of the graph, and the Schrödinger operator is defined by

$$(H\psi)_u = -\sum_{v \sim u} \psi_v + q_u \psi_u,$$

where the sum is taken over all vertices $v$ adjacent to the vertex $u$. Gantmacher and Krein [11] proved that on a chain graph (a tree with no branching, which can be thought of as a discretization of the interval) an analogue of Sturm’s result holds: the $n$th eigenvector changes sign exactly $n - 1$ times. But for non-trivial graphs the situation departs dramatically from its $\mathbb{R}^d$ analogue. First of all, Courant’s upper bound does not always hold. There is a correction due to the multiplicity of the $n$th eigenvalue, and the upper bound becomes¹ [7] $n + m - 1$, where $m$ is the multiplicity.

In this paper we discuss another striking difference. If the number of cycles of a graph is not large, the graph behaves “almost” like a string: for a typical eigenvector, there is a lower bound on the number of nodal domains. To be more precise, let $\ell$ be the minimal number of edges of the graph that distinguish it from a tree (a graph with no loops). In terms of the number of vertices $V$ and the number of edges $E$, the number $\ell$ can be expressed as $\ell = E - V + 1$. We show that, for a typical eigenvector, the number of nodal domains is greater than or equal to $n - \ell$. In particular, on trees ($\ell = 0$) the nodal counting is exact: the $n$th eigenfunction has exactly $n$ domains. Here by a “typical” eigenvector we mean an eigenvector which corresponds to a simple eigenvalue² and which is not zero on any of the vertices. This property is stable with respect to small perturbations of the potential $\{q_u\}$.

Another graph model on which the question of nodal domains is well-defined is that of the so-called quantum or metric graphs. These are graphs with edges parameterized by the distance to a pre-selected start vertex. The functions now live on the edges of the graph and are required to satisfy matching conditions on the vertices of the graph. The Laplacian in this case is the standard 1-dimensional Laplacian. A good review of the history of quantum graphs and some of their applications can be found in [17].
The idea that the zeros of the eigenfunctions on metric trees behave similarly to the 1-dimensional case has been around for some time. Al-Obeid, Pokornyi and Pryadiev [1,21,20] showed that for a metric tree in a “general position” (which is roughly equivalent to our genericity Assumption 2, see Sect. 3) the number of nodal domains of the $n$th eigenfunction is equal to $n$. This result was rediscovered by Schapotschnikow [22], who was motivated by the recent interest in nodal domains in the physics community [3,13,12]. Our result on the lower bound extends to quantum graphs as well. Similarly to the discrete case, we prove that even for graphs with $\ell > 0$, $n - \ell$ is a lower bound on the number of nodal domains of the $n$th eigenfunction.

The article is structured as follows. In Sect. 2 we explain the models we are considering, formulate our result and review the previous results on nodal counting on graphs. The case of metric trees has been treated before in [20,22]. In the three remaining cases (metric graphs with $\ell > 0$, discrete trees, and discrete graphs with $\ell > 0$) we believe our results to be previously unknown, and in Sect. 3 we provide complete proofs. For completeness, we also include a sketch of the general idea behind the proofs of [20,22] in the Appendix. Finally, in the last subsection of Sect. 3 we show that when a graph does not satisfy our genericity conditions, the nodal count can fall arbitrarily far below the number of the corresponding eigenfunction.

¹ We are talking here about the so-called “strong nodal domains”: maximal connected components on which the eigenfunction has a constant well-defined (i.e. not zero) sign.

² Thus for a “typical” eigenvector the notions of “strong” and “weak” nodal domains (see [7]) coincide.

Fig. 1. An example of a graph with $\ell = 2$. For example, one can cut edges (2, 3) and (4, 5) to make it a tree. If $\varphi$ is positive on shaded vertices and negative on white vertices, the nodal domain count on the graph is 3. On the tree obtained by deleting (2, 3) and (4, 5) the nodal count would be 5

2. The Main Result

2.1. Basic definitions from graph theory. Let $G$ be a finite graph. We will denote by $\mathcal{V}$ the set of its vertices and by $\mathcal{E}$ the set of undirected edges of the graph. If there exists an edge connecting two vertices $v_1$ and $v_2$, we say that the vertices are adjacent and denote it by $v_1 \sim v_2$. We will assume that $G$ is connected.

Definition 2.1. A graph $G$ is connected if for any $v_1, v_2 \in \mathcal{V}$ there is a sequence of distinct vertices $u_1, \dots, u_n$ leading from $v_1$ to $v_2$ ($u_1 = v_1$, $u_n = v_2$ and $u_j \sim u_{j+1}$ for $j = 1, \dots, n-1$). A graph $G$ is a tree if for any $v_1$ and $v_2$ the sequence of $u_j$ connecting them is unique.

The number of edges emanating from a vertex $v$ is called the degree of $v$. Because we only consider connected graphs, there are no vertices of degree 0. If a vertex $v$ has degree 1, we call it a boundary vertex; otherwise we call it internal.

It will sometimes be convenient to talk about directed edges of the graph. Each non-directed edge produces two directed edges going in opposite directions. These directed edges are reversals of each other. The notation for the reversal of $d$ is $\bar d$; the operation of reversal is reflexive: $\bar{\bar d} = d$. Directed edges always come in pairs; in other words, there are no edges that go in one direction only. The set of all directed edges will be denoted by $\mathbf{E}$. If an edge $d$ emanates from a vertex $v$, we express it by writing $v \prec d$. The number of vertices is denoted by $|\mathcal{V}|$ and the number of non-directed edges is $|\mathcal{E}|$. Correspondingly, the number of directed edges is $|\mathbf{E}| = 2|\mathcal{E}|$.

Another key definition we will need is the dimension of the cycle space of $G$.

Definition 2.2. The dimension $\ell$ of the cycle space of $G$ is the number of edges that have to be removed from $\mathcal{E}$ (leaving $\mathcal{V}$ as it is) to turn $G$ into a connected tree.

Remark 2.3. An alternative characterization of $\ell$ would be the rank of the fundamental group of $G$.
There is also an explicit expression for $\ell$ in terms of the number of edges and the number of vertices of the graph,

$$\ell = |\mathcal{E}| - |\mathcal{V}| + 1. \tag{2.1}$$

Obviously, $\ell = 0$ if and only if $G$ is a tree.
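As an illustration of formula (2.1), the following minimal sketch computes $\ell$ for a connected graph and cross-checks it by counting the edges that would have to be removed to leave a spanning tree. The function name `cycle_rank` and the example edge lists are hypothetical choices made for this illustration; they do not come from the paper, and the third example is not the graph of Fig. 1.

```python
def cycle_rank(num_vertices, edges):
    """Return l = |E| - |V| + 1 for a connected graph on vertices 0..num_vertices-1.

    The value is cross-checked by a union-find pass: every edge that joins two
    vertices already connected closes a cycle and would have to be removed
    to turn the graph into a tree.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    extra = 0
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            extra += 1          # this edge closes a cycle
        else:
            parent[ru] = rv     # this edge belongs to the spanning tree

    if len({find(v) for v in range(num_vertices)}) != 1:
        raise ValueError("graph is not connected")
    assert extra == len(edges) - num_vertices + 1   # formula (2.1)
    return extra


print(cycle_rank(4, [(0, 1), (1, 2), (2, 3)]))           # a chain (tree)
print(cycle_rank(3, [(0, 1), (1, 2), (2, 0)]))           # a triangle
print(cycle_rank(6, [(0, 1), (1, 2), (2, 3), (3, 4),
                     (4, 5), (1, 4), (2, 5)]))           # two independent cycles
```

The union-find pass is only one possible way to exhibit a spanning tree; any traversal (BFS or DFS) would serve equally well.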


2.2. Functions on discrete graphs. The functions on $G$ are the functions from the vertex set $\mathcal{V}$ to the set of reals, $\psi : \mathcal{V} \to \mathbb{R}$. We only consider finite graphs, therefore the set of all functions $\psi$ can be associated with $\mathbb{R}^{|\mathcal{V}|}$, where $|\mathcal{V}|$ is the number of vertices of the graph. Given a function $\psi$ on $G$, we define a positive domain on $G$ with respect to $\psi$ to be a maximal connected subgraph $S$ of $G$ such that $\psi$ is positive on the vertices of $S$. Similarly we define the negative domains. Then the nodal domain count $\nu_G(\psi)$ is the total number of positive and negative domains on $G$ with respect to $\psi$. When the choice of the graph is obvious, we will drop the subscript $G$.

Our interest lies with the nodal domain counts of the eigenvectors of (discrete) Schrödinger operators on graphs. We define the Schrödinger operator with the potential $q : \mathcal{V} \to \mathbb{R}$ by

$$(H\psi)_u = -\sum_{v \sim u} \psi_v + q_u \psi_u. \tag{2.2}$$
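The definitions of this subsection can be sketched numerically. The block below assembles the matrix of (2.2), diagonalizes it, and counts nodal domains of each eigenvector taken in increasing order of eigenvalue. It is a rough illustration only: the helper names (`schrodinger_matrix`, `jacobi_eigh`, `nodal_count`) are hypothetical, and the cyclic Jacobi rotation method is a stand-in eigen-solver chosen to keep the sketch dependency-free, not anything used in the paper.

```python
import math


def schrodinger_matrix(n, edges, q):
    """Dense matrix of (H psi)_u = -sum_{v~u} psi_v + q_u psi_u on vertices 0..n-1."""
    H = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        H[u][v] -= 1.0
        H[v][u] -= 1.0
    for u in range(n):
        H[u][u] = q[u]
    return H


def jacobi_eigh(A, sweeps=100, tol=1e-22):
    """Eigenpairs of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues in increasing order, matching list of eigenvectors).
    """
    n = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n):
            for r in range(p + 1, n):
                if abs(A[p][r]) < 1e-300:
                    continue
                diff = A[r][r] - A[p][p]
                # Rotation angle that zeroes the (p, r) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * A[p][r] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):            # right-multiply A and V by the rotation
                    for k in range(n):
                        mp, mr = M[k][p], M[k][r]
                        M[k][p], M[k][r] = c * mp - s * mr, s * mp + c * mr
                for k in range(n):          # left-multiply A by the transposed rotation
                    ap, ar = A[p][k], A[r][k]
                    A[p][k], A[r][k] = c * ap - s * ar, s * ap + c * ar
    order = sorted(range(n), key=lambda i: A[i][i])
    return [A[i][i] for i in order], [[V[k][i] for k in range(n)] for i in order]


def nodal_count(n, edges, psi):
    """Number of maximal connected sign-constant subgraphs (psi non-zero everywhere)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, count = [False] * n, 0
    for start in range(n):
        if seen[start]:
            continue
        count += 1
        seen[start] = True
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if not seen[v] and psi[u] * psi[v] > 0:
                    seen[v] = True
                    stack.append(v)
    return count


# Chain graph on 4 vertices with zero potential (an illustrative choice).
edges = [(0, 1), (1, 2), (2, 3)]
vals, vecs = jacobi_eigh(schrodinger_matrix(4, edges, [0.0] * 4))
counts = [nodal_count(4, edges, v) for v in vecs]
print(vals)
print(counts)
```

For a chain, the Gantmacher and Krein result quoted in the Introduction predicts that the $n$th eigenvector changes sign $n - 1$ times, so the printed counts can be compared against $1, 2, 3, 4$. The same three functions can be applied to a graph with cycles and a generic potential to probe the bounds of Theorem 2.6 stated below.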

The eigenproblem for the operator $H$ is $H\psi = \lambda\psi$. The operator $H$ has $|\mathcal{V}|$ eigenvalues, which we number in increasing order, $\lambda_1 \le \lambda_2 \le \dots \le \lambda_{|\mathcal{V}|}$. This induces a numbering of the eigenvectors: $H\psi^{(n)} = \lambda_n \psi^{(n)}$. This numbering is well-defined if there are no degeneracies in the spectrum, i.e. $\lambda_j \ne \lambda_k$ whenever $j \ne k$. By $\nu_H(\lambda_n)$ we denote the nodal domain count of the $n$th eigenvector $\psi^{(n)}$ of an operator $H$.

2.3. Functions on metric graphs. A metric graph is a pair $(G, \{L_e\})$, where $L_e$ is the length of the edge $e \in \mathcal{E}$. The lengths of the two directed edges corresponding to $e$ are also equal to $L_e$. In particular, $L_d = L_{\bar d}$. We would like to consider functions living on the edges of the graph. To do so, we identify each directed edge $d$ with the interval $[0, L_d]$. This gives us a local variable $x_d$ on the edge, which can be interpreted geometrically as the distance from the initial vertex. Note that if the edge $\bar d$ is the reverse of the edge $d$, then $x_{\bar d}$ and $L_d - x_d$ refer to the same point. Now one can define a function on an edge and, therefore, define a function $\psi$ on the whole graph as a collection of functions $\{\psi_d\}_{d \in \mathbf{E}}$ on all edges of the graph. To ensure that the function is well defined we impose the condition $\psi_d(x_d) = \psi_{\bar d}(L_d - x_d)$ for all $d \in \mathbf{E}$. The scalar product of two square integrable functions $\psi$ and $\varphi$ is defined as

$$\langle \psi, \varphi \rangle \overset{\text{def}}{=} \sum_{e \in \mathcal{E}} \int_0^{L_e} \psi_e(x_e)\, \varphi_e(x_e)\, dx_e. \tag{2.3}$$

This scalar product defines the space $L^2(G)$. To introduce the main object of our study, the nodal domains, on metric graphs we need to define the notion of a metric subgraph of $(G, \{L_e\})$.

Definition 2.4. A metric subgraph of $(G, \{L_e\})$ is a metric graph obtainable from $G$ by (a) cutting some of the edges of $G$ and thus introducing new boundary vertices, (b) removing some of the edges and (c) removing all vertices of degree 0.

An example of a metric subgraph is shown in Fig. 2. Now, similarly to the discrete case, we can define the nodal count for a real-valued function $\varphi$.

Fig. 2. An example of a graph and its metric subgraph. The shaded vertices are the new ones which appeared due to cuts

A positive (negative) domain with respect to a real-valued function $\varphi$ is a maximal connected metric subgraph on whose edges and internal vertices $\varphi$ is positive (corresp. negative). The total number of positive and negative domains will be called the nodal count of $\varphi$ and denoted by $\nu(\varphi)$.

We are interested in the nodal counts of the eigenfunctions of the Laplacian $\Delta = -\frac{d^2}{dx^2}$. As its domain we take the set of continuous³ functions that belong to the Sobolev space $H^2(e)$ on each edge $e$ and satisfy the Kirchhoff condition

$$\sum_{d \,:\, v \prec d} \frac{d}{dx}\psi_d(0) = 0 \qquad \text{for all } v \in \mathcal{V}. \tag{2.4}$$

Note that the sum is taken over all directed edges that originate from the vertex $v$ and the derivative (which depends on the direction of the edge) is taken in the outward direction. The Laplacian can also be defined via the quadratic form

$$Q_\Delta[\psi] = \sum_{e \in \mathcal{E}} \int_0^{L_e} |\psi_e'(x_e)|^2\, dx_e. \tag{2.5}$$

The domain of this form is the Sobolev space $H^1(G)$. For boundary vertices, condition (2.4) reduces to the Neumann condition $\psi_d'(0) = 0$. We also consider other homogeneous conditions on a boundary vertex $v$, of the general form

$$\psi_d'(0) \cos\alpha_v = \psi_d(0) \sin\alpha_v, \tag{2.6}$$

where the Neumann condition corresponds to the choice $\alpha_v = 0$. The corresponding quadratic form will then change⁴ to

$$Q_\Delta[\psi] = \sum_{e \in \mathcal{E}} \int_0^{L_e} |\psi_e'(x_e)|^2\, dx_e + \sum_{v \,:\, \deg(v) = 1} \psi^2(v) \tan\alpha_v, \tag{2.7}$$

where the second sum is over the boundary vertices and $\psi(v)$ is the value of the function at the vertex $v$. Our results will also apply to Schrödinger operators $H = \Delta + q(x)$ with a potential $q(x)$ which is continuous⁵ on every edge of the graph.

³ In particular, the functions must be continuous across the vertices.

⁴ If $\cos\alpha_v = 0$ (the Dirichlet case), the condition $\psi(v) = 0$ should instead be introduced directly into the domain of $Q_\Delta$.

⁵ Or a potential with finitely many jumps: the jumps can be thought of as “dummy” vertices of degree 2.


The Schrödinger operator $H$, defined in the above fashion, has an infinite discrete spectrum with no accumulation points. As in the discrete case, we number the eigenvalues in increasing order. We will denote by $\psi^{(n)}$ the eigenvector corresponding to the eigenvalue $\lambda_n$.

2.4. Our assumptions and results. Let $\lambda_n$ be the $n$th eigenvalue of the Schrödinger operator $H$ on either a discrete or a metric graph. Let $\psi^{(n)}$ be the corresponding eigenfunction. We shall make the following assumptions.

Assumption 1. The eigenvalue $\lambda_n$ is simple and the corresponding eigenvector $\psi^{(n)}$ is non-zero on each vertex.

Remark 2.5. The properties described in the Assumption are generic and stable with respect to a perturbation. Relevant perturbations include changing the potential $\{q_v\}$ in the discrete case and changing the lengths $\{L_e\}$ in the metric case. More precisely, in the finite-dimensional space of all potentials (corresp. lengths) the set $A_n$ on which $(\lambda_n, \psi^{(n)})$ satisfy the Assumption is open and dense unless the graph is a circle (see [9], where this question is discussed for metric graphs). We also mention that on each connected component of the set $A_n$ the nodal count of $\psi^{(n)}$ remains the same. Indeed, on discrete graphs the sign of the eigenvector on each vertex must remain unchanged. On metric graphs the zeros cannot pass through the vertices. Moreover, zeros cannot undergo a bifurcation (i.e. appear or disappear), since otherwise at the bifurcation point the eigenfunction and its derivative would both be zero. By the uniqueness theorem for $H\psi = \lambda\psi$, this would mean that $\psi$ is identically zero on the whole edge, contradicting the Assumption.

Now we are ready to state the main theorem, which applies to both discrete and metric graphs.

Theorem 2.6. Let $\lambda_n$ and $\psi^{(n)}$ be the $n$th eigenvalue and the corresponding eigenvector of the Schrödinger operator $H$ on either a discrete or a metric graph $G$. If $(\lambda_n, \psi^{(n)})$ satisfy Assumption 1, then the nodal domain count of $\psi^{(n)}$ is bounded by

$$n - \ell \le \nu(\psi^{(n)}) \le n, \tag{2.8}$$

where $\ell = |\mathcal{E}| - |\mathcal{V}| + 1$ is the dimension of the cycle space of $G$. In particular, when $G$ is a tree, $\nu(\psi^{(n)}) = n$.

While we state the theorem in the most complete form, we will prove only those parts of it that we believe to be new. The upper bound on the number of nodal domains is a result with a long history going back to Courant [5,6]. The original proof for domains in $\mathbb{R}^d$ was adapted to metric graphs by Gnutzmann, Weber and Smilansky [13], who used the $\mathbb{R}^d$ proof from Pleijel [19] who, in turn, cites Herrmann [15], who simplified the original proof of Courant [5].

The history of the discrete version of Courant’s upper bound is more complicated. The question was considered by Colin de Verdière [4], Friedman [10], Duval and Reiner [8], and Davies, Gladwell, Leydold and Stadler [7]. The latter paper contains a good overview of the history of the result and points out various shortcomings in the preceding papers. The point of difficulty was counting the nodal domains if an eigenvalue is degenerate (and therefore there is an eigenvector which is zero on some vertices). As shown in [7], the upper bound is $n + m - 1$, where $m$ is the multiplicity of the eigenvalue. In our case, Assumption 1, which is essential for the lower bound (see Sect. 3.4), also simplifies the upper bound.

The lower bound for the nodal domains on metric trees (i.e. the $\ell = 0$ case) was shown by Al-Obeid, Pokornyi and Pryadiev [1,21,20] and by Schapotschnikow [22]. For completeness, we give a sketch of the proof of this case in the Appendix. Finally, the results on the lower bound for discrete graphs (both the $\ell = 0$ and $\ell > 0$ cases) and for metric graphs with $\ell > 0$ are new and will be proved in this paper.

Note added in proof: It has been brought to the author’s attention by J. Leydold that the lower bound for discrete trees has also been obtained by Bıyıkoğlu in [23] as a corollary of a result of Fiedler [24].

3. Proofs

We will apply induction on $\ell$ to deduce the statement for metric graphs. The proofs for the discrete case follow the same ideas but differ in some significant detail. First, however, we discuss an important consequence of Remark 2.5: it is sufficient to prove statements on nodal counts under the following stronger Assumption.

Assumption 2. Assumption 1 is satisfied for all eigenpairs $(\lambda_k, \psi^{(k)})$ with $k \le n$.

Indeed, if only Assumption 1 is satisfied but Assumption 2 is not, we can perturb the problem so that (a) the nodal count of the $n$th eigenfunction $\psi^{(n)}$ does not change and (b) Assumption 1 becomes satisfied for all $k \le n$. Then anything proved about the nodal domains of $\psi^{(n)}$ in the perturbed problem (which satisfies Assumption 2) will still be valid for the unperturbed one.

In our proofs we use the classical ideas of the mini-max characterization of the eigenvalues. Let $H$ be a self-adjoint operator with domain $D$. Assume the spectrum of $H$ is discrete and bounded from below. Let $Q_H[\psi] = (\psi, H\psi)$ be the corresponding quadratic form. Then the eigenvalues of $H$ can be obtained as

$$\lambda_{k+1} = \max_{f_1, \dots, f_k \in D'} \; \min_{\substack{\psi \in D \\ f_j(\psi) = 0}} \frac{Q_H[\psi]}{(\psi, \psi)}, \tag{3.1}$$

where the maximum is taken over all linear functionals over $D$. We will need the following classical theorem (see, e.g., [6, Chap. VI] or [14, Chap. II]).

Theorem 3.1 (Rayleigh’s Theorem of Constraint). Let $H$ be a self-adjoint operator defined on $D$. If $H$ is restricted to a subdomain $D_R = \{\psi \in D : g(\psi) = 0\}$, where $g \in D'$, then the eigenvalues $\mu_n$ of the restricted operator satisfy $\lambda_n \le \mu_n \le \lambda_{n+1}$, where $\lambda_n$ are the eigenvalues of the unrestricted operator.

3.1. Proof of Theorem 2.6 for metric graphs ($\ell > 0$). We will derive the lower bound for graphs with cycles by cutting the cycles and using the lower bound for trees.


We are given an eigenpair $(\lambda_n, \psi^{(n)})$. Assume that cutting the edges $e_1, \dots, e_\ell$ turns the graph $G$ into a tree. We cut each of these edges at a point $x_j \in e_j$ such that $\psi^{(n)}(x_j) \ne 0$. We thus obtain a tree with $|\mathcal{E}(G)| + \ell$ edges and $|\mathcal{V}(G)| + 2\ell$ vertices. Denote this tree by $T$. There is a natural mapping from the functions on the graph $G$ to the functions on the tree $T$. In particular, we can think of $\psi^{(n)}$ as living on the tree.

We would like to consider the same eigenproblem $H\psi = \mu\psi$ on the tree now. The vertex conditions on the vertices common to $T$ and $G$ will be inherited from the eigenproblem on $G$. But we need to choose the boundary conditions at the $2\ell$ new vertices. Each cut-point $x_j$ gives rise to two vertices, which we will denote by $u_{j+}$ and $u_{j-}$. Define

$$a_{j+} = \frac{\frac{d}{dx}\psi^{(n)}(u_{j+})}{\psi^{(n)}(u_{j+})}, \qquad a_{j-} = \frac{\frac{d}{dx}\psi^{(n)}(u_{j-})}{\psi^{(n)}(u_{j-})}, \qquad j = 1, \dots, \ell,$$

where the derivatives are taken in the inward direction on the corresponding edges of $T$. Since $\psi^{(n)}$, as an eigenfunction, was continuously differentiable and $\psi^{(n)}(u_{j+}) = \psi^{(n)}(u_{j-})$, we have $a_{j+} = -a_{j-}$. Now we set the boundary conditions on the new vertices of $T$ to be

$$\frac{d}{dx}\psi(u_{j+}) = a_{j+}\psi(u_{j+}), \qquad \frac{d}{dx}\psi(u_{j-}) = a_{j-}\psi(u_{j-}), \qquad j = 1, \dots, \ell,$$

where the derivatives, as before, are taken inwards. By definition of the coefficients $a_{j\pm}$, the function $\psi^{(n)}$ satisfies the above boundary conditions. It also satisfies the equation $H\psi = \mu\psi$ and the vertex conditions throughout the rest of the tree. Thus, $\psi^{(n)}$ is also an eigenfunction on $T$ and $\lambda_n$ is the corresponding eigenvalue. If we denote the ordered eigenvalues of $T$ by $\mu_k$, then $\lambda_n = \mu_m$ for some $m$. It is important to note that $m$ is in general different from $n$. We will now show that $m \ge n$.

Denote by $Q_G[\psi]$ the quadratic form corresponding to the eigenvalue problem on $G$; its domain we denote by $H_G$. Similarly we define $Q_T[\psi]$ and $H_T$. As we mentioned earlier, there is a natural embedding of $H_G$ into $H_T$. Moreover, we can say that

$$H_G = \left\{ \psi \in H_T : \psi(u_{j+}) = \psi(u_{j-}),\ j = 1, \dots, \ell \right\}.$$

We also note that, formally,

$$Q_T[\psi] = Q_G[\psi] + \sum_{j=1}^{\ell} \left( a_{j+}\psi^2(u_{j+}) + a_{j-}\psi^2(u_{j-}) \right).$$

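If $\psi \in H_G$, then $\psi(u_{j+}) = \psi(u_{j-})$ and $a_{j+} = -a_{j-}$ result in the cancellation of the sum on the right-hand side. This means that on $H_G$, $Q_T[\psi] = Q_G[\psi]$. Now we employ the minimax formulation for the eigenvalues $\lambda_k$ on $G$,

$$\lambda_{k+1} = \max_{\varphi_1, \dots, \varphi_k \in H_G} \; \min_{\substack{\psi \in H_G \\ \|\psi\| = 1,\ \psi \perp \varphi_i}} Q_G[\psi] = \max_{\varphi_1, \dots, \varphi_k \in H_G} \; \min_{\substack{\psi \in H_G \\ \|\psi\| = 1,\ \psi \perp \varphi_i}} Q_T[\psi].$$

Comparing it with the corresponding formula for the eigenvalues on $T$,

$$\mu_{k+1} = \max_{\varphi_1, \dots, \varphi_k \in H_T} \; \min_{\substack{\psi \in H_T \\ \|\psi\| = 1,\ \psi \perp \varphi_i}} Q_T[\psi],$$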


we see that the eigenvalues $\lambda_k$ correspond to the same minimax problem as $\mu_k$ but with the additional constraints $\psi(u_{j+}) = \psi(u_{j-})$. By Rayleigh’s theorem we conclude that $\mu_k \le \lambda_k$ for any $k$. Therefore, if $\lambda_n = \mu_m$ for some $n$ and $m$, they must satisfy $m \ge n$.

To finish the proof we need to count the number of nodal domains on $G$ and on $T$ with respect to $\psi^{(n)}$. When we cut an edge of $G$, we increase the number of nodal domains by at most one.⁶ Therefore, $\nu_T(\psi) \le \nu_G(\psi) + \ell$. On the other hand, we know that the nodal counting on the tree is exact and, since $\psi^{(n)}$ is the $m$th eigenvector on $T$, $\nu_T(\psi^{(n)}) = m \ge n$. Combining the above inequalities we obtain the desired bound $\nu_G(\psi^{(n)}) \ge n - \ell$.

To conclude the proof we acknowledge that we implicitly assumed that the tree $T$ satisfies Assumption 1, more precisely, that the eigenvalue $\mu_m$ is simple. To justify it, we observe that, if this is not the case, a small perturbation in the lengths of the edges will force $T$ to become generic but will not affect the properties of the eigenvectors of $G$.

3.2. Proof of Theorem 2.6 for discrete trees ($\ell = 0$). Take an arbitrary vertex of $T$ and designate it as the root, denoted $r$. The tree with a root induces a partial ordering on the vertices $\mathcal{V}$: we say that $v_1 < v_2$ if the unique path connecting $v_1$ with $r$ passes through $v_2$ (see Definition 2.1). We denote by $v_1 \prec v_2$ the situation when $v_1 < v_2$ and $v_1 \sim v_2$. In the above ordering the root is higher than any other vertex. Since $T$ is a tree, for each vertex $v$ other than the root there is a unique $u$ such that $v \prec u$. Given a non-vanishing $\psi$ we introduce the new variables $R_v = \psi_u / \psi_v$, where $v \prec u$. The variables $R_v$ are sometimes called Riccati variables [18]. The eigenvalue condition $H\psi = \lambda\psi$ can now be written as

$$-\psi_u - \sum_{w \prec v} \psi_w + q_v \psi_v = \lambda \psi_v, \tag{3.2}$$

and, after dividing by $\psi_v$,

$$R_v = q_v - \lambda - \sum_{w \prec v} \frac{1}{R_w}. \tag{3.3}$$

If $v$ is the root, condition (3.2) takes the form

$$-\sum_{w \prec r} \psi_w + q_r \psi_r = \lambda \psi_r.$$

Therefore, if we define

$$R_r \equiv q_r - \lambda - \sum_{w \sim r} \frac{1}{R_w},$$

⁶ The number of nodal domains might not increase at all if a nodal domain entirely covers a loop of $G$.

Fig. 3. An example of a tree graph with 5 being the root vertex

then the zeros of $R_r$ in terms of $\lambda$ are the eigenvalues of $H$. Whenever $R_r(\lambda) = 0$, the values of $R_v$, $v \ne r$, uniquely specify the corresponding eigenvector $\psi$ of $H$, and vice versa. Equation (3.3) provides a recursive algorithm for calculating $R_v$, in order of increasing $v$. Thus one gets a closed formula for $R_v$ in terms of $q_u$, $u \le v$, and $\lambda$. This is best illustrated by an example.

Example 3.2. For the tree shown in Fig. 3 the eigenvalue condition in terms of Riccati variables reads

$$R_1 = q_1 - \lambda, \tag{3.4}$$
$$R_2 = q_2 - \lambda, \tag{3.5}$$
$$R_3 = q_3 - \lambda, \tag{3.6}$$
$$R_4 = q_4 - \lambda - \frac{1}{R_1} - \frac{1}{R_2}, \tag{3.7}$$
$$0 = q_5 - \lambda - \frac{1}{R_3} - \frac{1}{R_4} \equiv R_5. \tag{3.8}$$

By substituting Eqs. (3.4) and (3.5) into Eq. (3.7), and then Eqs. (3.6) and (3.7) into Eq. (3.8), one obtains an eigenvalue condition for $H$.

Denote by $P_v$ the set of all poles of $R_v$ with respect to $\lambda$ and by $Z_v$ the set of all zeros of $R_v$; these sets are finite. We define $N_v^{<}$ to be the number of negative values among $R_u$ with $u < v$; we similarly define $N_v^{\le}$:
$$N_v^{<} = |\{u < v : R_u < 0\}|, \qquad N_v^{\le} = |\{u \le v : R_u < 0\}|. \tag{3.9}$$
The above numbers are not defined whenever one of the $R_u$ has a zero or a pole. The following lemma, listing properties of the Riccati variables, their poles and zeros, amounts to the proof of Theorem 2.6 when $G$ is a tree and $q$ is generic.

Lemma 3.3. Assume that, for each $v$, the sets $Z_w$ with $w \prec v$ are pairwise disjoint. Then
(1) $P_v = \bigcup_{w \prec v} Z_w$.
(2) For every $p \in P_v$, $\lim_{\lambda \to p-0} R_v = -\infty$ and $\lim_{\lambda \to p+0} R_v = +\infty$. Also, $\lim_{\lambda \to -\infty} R_v = +\infty$ and $\lim_{\lambda \to \infty} R_v = -\infty$. Outside the poles, $R_v$ is continuous and monotonically decreasing as a function of $\lambda$.

A Lower Bound for Nodal Count on Discrete and Metric Graphs


(3) There is exactly one zero of $R_v$ strictly between each pair of consecutive points from the set $\{-\infty\} \cup \{\infty\} \cup P_v$.
(4) Between each pair of consecutive points from $\{-\infty\} \cup \{\infty\} \cup Z_v$, the number $N_v^{\le}$ (where defined) remains constant. When a zero of $R_v$ is crossed, $N_v^{\le}$ increases by one.
(5) Between each pair of consecutive points from $\{-\infty\} \cup \{\infty\} \cup P_v$, the number $N_v^{<}$ (where defined) remains constant. When a pole of $R_v$ is crossed, $N_v^{<}$ increases by one.
(6) When $\lambda = \lambda_n$ is an eigenvalue of $H$, the number of the nodal domains of $\psi^{(n)}$ is given by
$$\nu(\lambda_n) = N_r^{<} + 1. \tag{3.10}$$

Proof. Part 1 follows directly from Eq. (3.3). Part 2 follows from (3.3) by induction over increasing $v$. Part 3 follows from part 2: between each pair of consecutive points from $\{-\infty\} \cup \{\infty\} \cup P_v$, the function $R_v$ decreases from $+\infty$ to $-\infty$.

Parts 4 and 5 are linked together in an induction over increasing $v$. The induction is initialized by $N_v^{\le}$ for minimal $v$ (i.e. there is no $w$ with $w < v$). In this case, $R_v = q_v - \lambda$, therefore $N_v^{\le} = 0$ to the left of $\lambda = q_v$ and $N_v^{\le} = 1$ to the right of $\lambda = q_v$.

The inductive step starts with part 5. For a vertex $v$, let both statements be verified for all $w$, $w < v$. The statement for $N_v^{<}$ is obtained immediately from the duality between the zeros and the poles (part 1). Note that the assumption of the lemma implies that only one of $N_w^{\le}$ with $w \prec v$ can increase when $\lambda$ crosses a pole of $R_v$.

To obtain the statement for $N_v^{\le}$ consider two consecutive poles and two consecutive zeros of $R_v$, interlacing as follows: $p_1 < z_1 < p_2 < z_2$. Then $R_v$ is positive for $\lambda \in (p_1, z_1)$ (by part 2), therefore, on this interval $N_v^{\le} = N_v^{<}$. When $z_1$ is crossed, $N_v^{\le}$ increases by one since $R_v$ becomes negative: $N_v^{\le} = N_v^{<} + 1 \equiv C$. On the other hand, when $p_2$ is crossed, $N_v^{\le}$ and $N_v^{<}$ become equal again since $R_v > 0$. However, $N_v^{<}$ has increased by one (by the induction hypothesis) and therefore $N_v^{\le}$ is still equal to $C$. The above is obviously valid even if $p_1 = -\infty$ and/or $z_2 = +\infty$.
Finally, to show part 6 we observe that $N_r^{<}$ is the number of negative Riccati variables throughout the tree. If $R_v < 0$ then the signs of $\psi_v$ and $\psi_u$ (where $u$ is the unique vertex satisfying $v \prec u$) are different, i.e. the edge $(u, v)$ is a boundary between a positive and a negative domain. Removing all boundary edges separates the tree into subtrees corresponding to the positive/negative domains. But removing $N_r^{<}$ edges from a tree breaks it into $N_r^{<} + 1$ disconnected components, therefore the number of domains on a tree is equal to $N_r^{<} + 1$.

Proof of Theorem 2.6 for discrete trees ($\ell = 0$). The condition of Lemma 3.3 is satisfied due to the genericity assumption. Indeed, if there are $v$, $w_1$ and $w_2$ such that $w_1 \prec v$, $w_2 \prec v$ and $\lambda \in Z_{w_1} \cap Z_{w_2}$, then one can construct an eigenvector with eigenvalue $\lambda$ and with $\psi_v = 0$. Since the sets $Z_v$ are finite, $N_r^{<}$ must become zero when $\lambda \to -\infty$. Consequently, $N_r^{<}$ is zero between $-\infty$ and the first pole of $R_r$. Denote by $p_n$ the $n$th pole of $R_r$. By part 3 of the lemma, the first eigenvalue $\lambda_1$ of $H$ lies in the interval $(-\infty, p_1)$, on which $N_r^{<}$ is zero. By (3.10) we thus have $\nu(\lambda_1) = 1$. Further, $\lambda_2$ lies in the interval $(p_1, p_2)$. By part 5 of the lemma, $N_r^{<} = 1$ on this interval, giving $\nu(\lambda_2) = 2$. Equality for other $\lambda_n$ follows similarly.
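The recursion (3.3) and the counting argument of Lemma 3.3 can be tried out numerically on the tree of Fig. 3. The following is only a minimal sketch under stated assumptions: the potential values in `Q` are hypothetical (chosen to be generic), and the helper names are ours, not the paper's. It locates the $n$th eigenvalue by bisecting on the number of negative Riccati variables (which, by part 4 of the lemma, increases by one exactly when a zero of $R_r$ is crossed), rebuilds $\psi$ from $\psi_v = \psi_u / R_v$, and counts nodal domains as sign-changing edges plus one.

```python
# Tree of Fig. 3: vertex 5 is the root; PARENT[v] is the unique u with v ≺ u.
PARENT = {1: 4, 2: 4, 3: 5, 4: 5}
ROOT = 5
ORDER = [1, 2, 3, 4, 5]  # every vertex appears after all of its children
# Hypothetical generic potential (an assumption, not taken from the paper).
Q = {1: 0.3, 2: 1.1, 3: 2.3, 4: 0.7, 5: 1.6}


def riccati(lam):
    """Return {v: R_v(lam)} using R_v = q_v - lam - sum_{w ≺ v} 1/R_w, Eq. (3.3)."""
    R = {}
    for v in ORDER:
        children = [w for w, u in PARENT.items() if u == v]
        R[v] = Q[v] - lam - sum(1.0 / R[w] for w in children)
    return R


def count_negative(lam):
    """Number of negative R_v over all vertices (N_r^<= in the lemma's notation)."""
    return sum(1 for r in riccati(lam).values() if r < 0)


def nth_eigenvalue(n, lo=-10.0, hi=10.0, iters=200):
    """Bisect for the value of lam at which count_negative jumps from n-1 to n."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if count_negative(mid) >= n:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def eigenvector(lam):
    """Rebuild psi from the Riccati variables: psi_v = psi_u / R_v for v ≺ u."""
    R = riccati(lam)
    psi = {ROOT: 1.0}
    for v in reversed(ORDER[:-1]):  # visit parents before their children
        psi[v] = psi[PARENT[v]] / R[v]
    return psi


def residual(lam, psi):
    """Max-norm of H psi - lam psi, with (H psi)_v = -sum_{u ~ v} psi_u + q_v psi_v."""
    worst = 0.0
    for v in ORDER:
        nbrs = [w for w, u in PARENT.items() if u == v]
        if v != ROOT:
            nbrs.append(PARENT[v])
        hv = -sum(psi[u] for u in nbrs) + Q[v] * psi[v]
        worst = max(worst, abs(hv - lam * psi[v]))
    return worst


def nodal_domains(psi):
    """On a tree: (number of edges across which psi changes sign) + 1."""
    return 1 + sum(1 for v, u in PARENT.items() if psi[v] * psi[u] < 0)


if __name__ == "__main__":
    for n in range(1, 6):
        lam = nth_eigenvalue(n)
        psi = eigenvector(lam)
        print(n, round(lam, 6), nodal_domains(psi), residual(lam, psi))
```

Bisecting on the sign count rather than on $R_r$ itself avoids having to bracket the poles of $R_r$ in advance; the search interval $[-10, 10]$ is simply a generous bound for this particular choice of `Q`.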


3.3. Proof of Theorem 2.6 for discrete graphs ($\ell > 0$). In this case $H$ is a matrix and the quadratic form is
$$Q_G[\psi] = Q_H[\psi] = \sum_{j,k=1}^{|V|} H_{jk}\, \psi_j \psi_k, \tag{3.11}$$
where
$$H_{jk} = \begin{cases} -1, & j \sim k,\\ q_j, & j = k,\\ 0, & \text{otherwise.} \end{cases}$$

Proof of Theorem 2.6 for discrete graphs ($\ell > 0$). We will prove the result by induction on $\ell$. The base case $\ell = 0$ is already proven in Sect. 3.2. Assume, without loss of generality, that we can delete the edge $(1, 2)$ of the graph $G$ without disconnecting it. We denote the graph thus obtained by $\Gamma$. Note that $V(G) = V(\Gamma)$. Let $\phi$ be an eigenvector of $H_G$ with eigenvalue $\lambda_n$. We would like to prove that $\nu_G(\phi) \ge n - \ell$. Set $\alpha = \phi_2/\phi_1$ and define the potential $p$ on $\Gamma$ by
$$p_j = \begin{cases} q_1 - \alpha, & j = 1,\\ q_2 - 1/\alpha, & j = 2,\\ q_j, & j \ne 1, 2. \end{cases}$$
With the aid of the potential $p$ we define the operator $H_\Gamma$ in the usual way, see Eq. (2.2). It is easy to see that, due to our choice of the potential $p$, the vector $\phi$ is an eigenvector of $H_\Gamma$. For example,
$$(H_\Gamma \phi)_1 = -\sum_{j \sim 1} \phi_j + (q_1 - \alpha)\phi_1 = -\sum_{j \sim 1} \phi_j - \phi_2 + q_1 \phi_1 = (H_G \phi)_1 = \lambda_n \phi_1,$$
where the adjacency is taken with respect to the graph $\Gamma$. The eigenvalue corresponding to $\phi$ remains unchanged. However, in the spectrum $\{\mu_j\}_{j=1}^{|V|}$ of $H_\Gamma$, this eigenvalue may occupy a position other than the $n$th. We denote by $m$ the new position of $\lambda_n$: $\mu_m = \lambda_n$. Now consider the quadratic form associated with $H_\Gamma$. Consulting (3.11) we conclude
$$Q_\Gamma[\psi] = Q_G[\psi] + 2\psi_1\psi_2 - \alpha\psi_1^2 - \alpha^{-1}\psi_2^2. \tag{3.12}$$

Consider first the case $\alpha > 0$. We write $Q_\Gamma[\psi]$ in the form
$$Q_\Gamma[\psi] = Q_G[\psi] - \left(\alpha^{1/2}\psi_1 - \alpha^{-1/2}\psi_2\right)^2 \le Q_G[\psi].$$
From here and Eq. (3.1) we immediately conclude that $\mu_j \le \lambda_j$. Therefore, $\mu_m = \lambda_n$ implies $m \ge n$. From the inductive hypothesis we know that $\nu_\Gamma[\phi] \ge m - (\ell - 1)$. But the number of nodal domains of $\phi$ with respect to $\Gamma$ is either the same or one more than the number with respect to $G$: $\alpha > 0$, therefore $\phi_1$ and $\phi_2$ are of the same sign and we may have cut one domain in two by deleting the edge $(1, 2)$. In particular, $\nu_\Gamma[\phi] \le \nu_G[\phi] + 1$. Eliminating $\nu_\Gamma[\phi]$, we obtain $\nu_G[\phi] + 1 \ge m - (\ell - 1)$, that is, $\nu_G[\phi] \ge m - \ell \ge n - \ell$, which is the sought conclusion.
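The edge-deletion step can be checked on a small example. The sketch below is illustrative only: the graph, the vector `phi` and the value `lam` are hypothetical, and instead of diagonalizing a matrix we pick the potential so that a chosen non-vanishing `phi` is an eigenvector of the original graph by construction. It then verifies that `phi` remains an eigenvector after the edge $(1, 2)$ is deleted and the potential is shifted by $-\alpha$ and $-1/\alpha$, and that the two quadratic forms differ as in (3.12).

```python
# Hypothetical graph G with one cycle; edges are unordered pairs of vertices.
EDGES_G = [(1, 2), (2, 3), (1, 3), (3, 4)]
EDGES_CUT = [e for e in EDGES_G if e != (1, 2)]  # the graph with edge (1, 2) deleted


def apply_H(edges, q, psi):
    """(H psi)_j = -sum_{k ~ j} psi_k + q_j psi_j."""
    out = {j: q[j] * psi[j] for j in psi}
    for j, k in edges:
        out[j] -= psi[k]
        out[k] -= psi[j]
    return out


def quad_form(edges, q, psi):
    """Q[psi] = sum_{j,k} H_jk psi_j psi_k, as in Eq. (3.11)."""
    h = apply_H(edges, q, psi)
    return sum(psi[j] * h[j] for j in psi)


def potential_for(edges, phi, lam):
    """Choose q so that H phi = lam * phi holds for the given non-vanishing phi."""
    q = {j: lam for j in phi}
    for j, k in edges:
        q[j] += phi[k] / phi[j]
        q[k] += phi[j] / phi[k]
    return q


def modified_potential(q, phi):
    """p_1 = q_1 - alpha, p_2 = q_2 - 1/alpha, where alpha = phi_2 / phi_1."""
    alpha = phi[2] / phi[1]
    p = dict(q)
    p[1] -= alpha
    p[2] -= 1.0 / alpha
    return p, alpha


# Illustrative data (assumptions): phi_1 and phi_2 have the same sign, so alpha > 0.
phi = {1: 1.0, 2: 2.0, 3: -1.5, 4: 0.5}
lam = 0.8
q = potential_for(EDGES_G, phi, lam)
p, alpha = modified_potential(q, phi)

# phi is still an eigenvector, with the same eigenvalue, on the cut graph.
h_cut = apply_H(EDGES_CUT, p, phi)
max_residual = max(abs(h_cut[j] - lam * phi[j]) for j in phi)

# Difference of the quadratic forms on an arbitrary test vector psi, cf. Eq. (3.12).
psi = {1: 0.3, 2: -1.2, 3: 0.7, 4: 2.0}
lhs = quad_form(EDGES_CUT, p, psi)
rhs = quad_form(EDGES_G, q, psi) + 2 * psi[1] * psi[2] - alpha * psi[1] ** 2 - psi[2] ** 2 / alpha
print(max_residual, lhs - rhs)
```

Fixing `phi` first and solving for the potential keeps the example free of any eigenvalue solver; it does not tell us the position of `lam` in either spectrum, which is what the min-max argument in the text supplies.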


In the case $\alpha < 0$ the quadratic form on $\Gamma$ can be written as
$$Q_\Gamma[\psi] = Q_G[\psi] + \left(\beta^{1/2}\psi_1 + \beta^{-1/2}\psi_2\right)^2, \tag{3.13}$$
where $\beta = -\alpha$. Consider the subspace $D_R = \{\psi \in \mathbb{R}^{|V|} : \beta^{1/2}\psi_1 + \beta^{-1/2}\psi_2 = 0\}$. The restrictions of $H_G$ and $H_\Gamma$ to this subspace coincide, as can be seen from (3.13). Therefore we can apply Theorem 3.1 twice, obtaining
$$\lambda_{j-1} \le \rho_{j-1} \le \lambda_j, \qquad \mu_{j-1} \le \rho_{j-1} \le \mu_j,$$

where $\rho_j$ are the eigenvalues of the restricted operator. In particular, we conclude that $\mu_{j-1} \le \lambda_j$. Since the $\lambda$-spectrum is non-degenerate, $\lambda_j < \lambda_{j+1}$ and hence $\mu_{j-1} < \lambda_{j+1}$; therefore $\mu_m = \lambda_n$ implies $m \ge n - 1$. On the other hand, the number of nodal domains with respect to $\Gamma$ is the same as with respect to $G$: since $\alpha = \phi_2/\phi_1 < 0$, we have cut an edge between two domains. Using the inductive hypothesis we conclude that
$$\nu_G(\phi) = \nu_\Gamma(\phi) \ge m - (\ell - 1) \ge n - 1 - (\ell - 1) = n - \ell.$$
We finish the proof with a remark similar to the final statement of the proof for metric graphs. If the new graph $\Gamma$ happens not to satisfy Assumption 1, a small perturbation in $q$ will force $\Gamma$ to become generic but will not affect the properties of the eigenvectors of $G$.

3.4. Low nodal count in a non-generic case. In this section we show that the genericity assumption (Assumption 1) is essential for the existence of the lower bound. We shall construct an example in which the assumption is violated and the nodal count becomes very low. The construction is based on the fact that an eigenfunction of a graph (as opposed to a connected domain in $\mathbb{R}^d$) may be identically zero on a large set. We consider a metric star graph, which is a tree with $N$ edges all connected to a single vertex. For Dirichlet boundary conditions one can show [16] that $k^2$ is an eigenvalue of the graph if
$$\sum_{j=1}^{N} \cot k L_j = 0. \tag{3.14}$$

To obtain all eigenvalues of the star graph, one needs to add to the solutions of (3.14) the points which are "multiple" poles of the left-hand side of (3.14). More precisely, if a given $k$ is a pole for $m$ cotangents at the same time, then $k^2$ is an eigenvalue of multiplicity $m - 1$. Those eigenvalues that are not poles (but zeros) of the left-hand side of (3.14) interlace the poles: between each pair of consecutive poles (coming from different cotangents) there is exactly one zero. Now we choose the lengths $L_j$ to exploit the above features. Let $L_1 = 1$, $L_2 = 1/m$ for some $m \in \mathbb{N}$, and the remaining lengths be irrational pairwise incommensurate numbers slightly greater than 1. By construction, $k = m\pi$ is a pole for $\cot(k L_1)$ and $\cot(k L_2)$. The corresponding eigenfunction is a sine-wave on the edges 1 and 2 and is zero on the other edges. It is easy to see that it has $m + 1$ nodal domains. On the other hand, counting the poles of (3.14), one can deduce that there are $(m - 1)(N - 1) + 1$

Fig. 4. An example of a tree with root r. If v is vertex 6 then it has two incoming subtrees, one consisting of edges (1, 3), (2, 3) and (3, 6) (highlighted in thicker lines) and the other consisting of only one edge (4, 6). The outgoing edge of v is the edge (6, 7).

eigenvalues preceding $(m\pi)^2$. Thus, we have constructed an eigenfunction which is very high in the spectrum but has a low number of nodal domains. A similar construction is possible for discrete graphs as well.

Acknowledgement. The result of the present article came about because of two factors. The first was the request by Uzy Smilansky that the author give a talk on the results of [22] at the workshop “Nodal Week 2006” at the Weizmann Institute of Science. The second was the discussion the author had with Rami Band on his proof that the nodal count resolves isospectrality of two graphs, one with $\ell = 0$ and the other with $\ell = 1$ (now a part of [2]). Rami showed that the nodal count of the latter graph is $\nu(\psi^{(n)}) = n - 1$ or $n$ with equal frequency. His result led the author to conjecture that for graphs close to trees the nodal count of the $n$th eigenstate does not stray far from $n$. The author is also grateful to Uzy Smilansky and Rami Band for patiently listening to the reports on the progress made in the proof of the conjecture and carefully checking the draft of the manuscript. The author is indebted to Leonid Friedlander for his explanations of the results and techniques of [9]. The author is also grateful to Tsvi Tlusty for pointing out reference [7], to Vsevolod Chernyshev for pointing out [21,20], to Vladimir Pryadiev for pointing out [1] and to Philipp Schapotschnikow for several useful comments. Most of the work was done during the author’s visit to the Department of Physics of Complex Systems, Weizmann Institute of Science, Israel.

Appendix A. Ideas Behind the Proof for Metric Trees ($\ell = 0$)

In this section we give an informal overview of the proof of (2.8) on a metric tree ($\ell = 0$). For detailed and rigorous proofs we refer the reader to [20–22]. Let $(\lambda_n, \psi^{(n)})$ be an eigenpair for a tree $T$ satisfying Assumption 2. Choose an arbitrary boundary vertex of the tree $T$ and call it the root $r$. We can now orient all edges of the tree towards the root (this is well-defined because $T$ is a tree) and will be taking derivatives in this direction. For each non-root vertex $v$ there is only one adjacent edge that is directed away from it. We call it the outgoing edge of the vertex $v$. The other adjacent edges are correspondingly incoming. An incoming subtree of vertex $v$ is defined recursively as the union of an incoming edge $(u, v)$ with all incoming subtrees of the vertex $u$, see Fig. 4.

If we drop the boundary condition at the root, then for any $\lambda \le \lambda_n$ there is a solution $\phi(\lambda, x)$ which solves the equation $H\phi = \lambda\phi$ and satisfies all remaining vertex conditions. This solution is unique up to a multiplicative constant. The function $\phi$ can be constructed recursively. We fix $\lambda$ and initialize the recursion by solving the equation $H\phi = \lambda\phi$ on the outgoing edge of each non-root boundary vertex and imposing the boundary condition corresponding to this vertex. Now let $v$ be a vertex such that the equation is solved on each incoming subtree $T_v^j$. We denote these solutions (which are defined up to a multiplicative constant) by


$C_j \phi_j(x)$. We would like to match these solutions and to extend them to the outgoing edge of $v$. Denoting the solution on the outgoing edge by $\phi_v(x)$ we write out the matching conditions at the vertex $v$,
$$\phi_v(v) = C_1\phi_1(v) = C_2\phi_2(v) = \dots, \qquad \phi_v'(v) = C_1\phi_1'(v) + C_2\phi_2'(v) + \dots.$$
Suppose that all of the functions $\phi_j(x)$ assume non-zero values on the vertex $v$. Then the condition on $\phi_v(x)$ takes the form
$$\phi_v'(v) = \phi_v(v)\left[\frac{\phi_1'(v)}{\phi_1(v)} + \frac{\phi_2'(v)}{\phi_2(v)} + \dots\right].$$
It is now clear that $\phi_v$, as a solution of $H\phi = \lambda\phi$ satisfying this condition, is also defined up to a multiplicative constant, $C_v$. The continuity condition now fixes the constants $C_j$ to be $C_v\phi_v(v)/\phi_j(v)$. Thus we obtain the solution on the union of the subtrees $T_v^j$ and the outgoing edge of $v$. This union is in turn an incoming subtree for another vertex (or the root).

In the case when one of the $\phi_j(x)$ is zero on the vertex $v$ (without loss of generality we take $\phi_1(v) = 0$), the condition on $\phi_v$ takes the form $\phi_v(v) = 0$. The solution $\phi_v$ is again defined up to a multiplicative constant $C_v$. The values of the other constants are now given by $C_1 = C_v\phi_v'(v)/\phi_1'(v)$ and $C_j = 0$ when $j > 1$. Again the solution on the union of the subtrees $T_v^j$ and the outgoing edge of $v$ is obtained up to a constant.

Finally, if more than one of the $\phi_j(x)$ is zero on the vertex $v$ (without loss of generality, $\phi_1(v) = \phi_2(v) = 0$), one can take $C_j = 0$ for all $j > 2$, find non-zero $C_1$ and $C_2$ such that $C_1\phi_1'(v) + C_2\phi_2'(v) = 0$ and extend the function by zero on the rest of the tree. This function will satisfy the Kirchhoff condition at $v$ and also all other vertex conditions. Thus it is an eigenfunction and, moreover, it is equal to zero at an inner vertex. This contradicts our assumptions.

We have now constructed a function $\phi(\lambda, x)$ which coincides with the eigenfunction of the tree whenever it satisfies the boundary condition at the root. To count the nodal domains we need to understand the behavior of the zeros of $\phi$ as we change $\lambda$. In order to do that we consider the function$^7$ $R(\lambda, x) = \phi'(\lambda, x)/\phi(\lambda, x)$, where the derivative is taken with respect to $x$ in the direction towards the root. If $x$ is a zero of $\phi$, it becomes a pole of $R(\lambda, x)$. From the definition of $R(\lambda, x)$ we see that $R(\lambda, x - 0) = -\infty$ and $R(\lambda, x + 0) = \infty$.
Differentiating $R(\lambda, x)$ with respect to $x$ and using the equation $-\phi'' + q(x)\phi = \lambda\phi$, we see that $R(\lambda, x)$ satisfies
$$\frac{dR}{dx} = q(x) - \lambda - R^2,$$
a Riccati-type equation. Conditions (2.6) on the boundary vertices in terms of $R(\lambda, x)$ take the form $R(\lambda, v) = \tan(\alpha_v) \in \mathbb{R} \cup \{\infty\}$. The matching conditions on the internal vertices imply that the value of $R(\lambda, v)$ on the outgoing edge is equal to the sum of the values of $R(\lambda, v)$ on the incoming edges (in general, $R$ is not continuous on internal vertices). Now let $\lambda_2 > \lambda_1$ and $R(\lambda_2, x_0) = R(\lambda_1, x_0)$. Then, by the Riccati equation, $R'(\lambda_2, x_0) < R'(\lambda_1, x_0)$ and therefore, on some interval $(x_0, x_0 + \epsilon)$, we have $R(\lambda_2, x) < R(\lambda_1, x)$. Moreover, once
7 Sometimes called the Weyl-Titchmarsh function or Dirichlet-to-Neumann map.


R(λ2 , x) ≤ R(λ1 , x), we have R(λ2 , y) ≤ R(λ1 , y) for all y > x provided both functions do not have poles on [x, y]. This can be seen by assuming the contrary and considering the point z ∈ [x, y], where R(λ2 , z) = R(λ1 , z). Using these properties one can conclude that for each fixed x0 , the value R(λ, x0 ) is decreasing as a function of λ between the pairs of consecutive poles. A direct consequence of this is that the poles of R move in the “negative” direction as the parameter λ is increased. The zeros of φ, therefore, move in the direction from the root to the leaves. Since q(x) is continuous, zeros of φ cannot bifurcate on the edges, see Remark 2.5 in Sect. 2.4. To see that the zeros of φ do not split when passing through the vertices, assume the contrary and consider the reverse picture: λ is decreasing. There are at least two subtrees with zeros of φ approaching the same vertex v as λ approaches some critical value from above. At this critical value we thus have two subtrees on which φ has zero at v. But earlier we concluded that this situation contradicts our genericity assumption. To summarize, as λ is increased, new zeros appear at the root and move towards the leaves of the tree. The zeros already in the tree do not disappear or increase in number. Now suppose λk is an eigenvalue and thus R(λk , r ) = tan(αr ). As we increase λ the value of R(λ, r ) decreases to −∞, jumps to +∞ (when a new zero enters the tree) and then increases to tan(αr ) again. Thus between each pair of eigenvalues exactly one new zero enters the tree. And, on a tree, the number of nodal domains is equal to the number of internal zeros plus one.
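The picture above can be illustrated on the simplest possible case, a single edge. The sketch below rests on assumptions that are ours rather than the paper's: $q \equiv 0$, the leaf sits at $x = 0$ with a Dirichlet condition, the root is at $x = 1$, and $\lambda > 0$, so that $\phi(\lambda, x) = \sin(\sqrt{\lambda}\,x)$ and $R = \sqrt{\lambda}\cot(\sqrt{\lambda}\,x)$. With these choices one can check the Riccati equation by finite differences, observe that $R$ decreases in $\lambda$ at a fixed point, and watch zeros enter at the root and drift towards the leaf.

```python
import math


def R(lam, x):
    """R = phi'/phi for phi(x) = sin(sqrt(lam) * x), i.e. q = 0 and phi(0) = 0."""
    k = math.sqrt(lam)
    return k / math.tan(k * x)


def interior_zeros(lam, length=1.0):
    """Zeros of sin(sqrt(lam) * x) strictly inside the edge (0, length)."""
    k = math.sqrt(lam)
    zeros = []
    n = 1
    while n * math.pi / k < length:
        zeros.append(n * math.pi / k)
        n += 1
    return zeros


def riccati_defect(lam, x, h=1e-6):
    """Finite-difference check of dR/dx = q - lam - R^2 with q = 0."""
    derivative = (R(lam, x + h) - R(lam, x - h)) / (2 * h)
    return abs(derivative - (-lam - R(lam, x) ** 2))


if __name__ == "__main__":
    print(riccati_defect(5.0, 0.4))
    for lam in (5.0, 20.0, 50.0):
        print(lam, interior_zeros(lam))
```

On a single edge the number of nodal domains is the number of interior zeros plus one, which matches the closing statement of the appendix in this special case.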

References
1. Al-Obeid, O.: On the number of the constant sign zones of the eigenfunctions of a Dirichlet problem on a network (graph). Tech. report, Voronezh: Voronezh State University, 1992, in Russian, deposited in VINITI 13.04.93, N 938 – B 93. – 8 p
2. Band, R., Shapira, T., Smilansky, U.: Nodal domains on isospectral quantum graphs: the resolution of isospectrality? J. Phys. A 39(45), 13999–14014 (2006)
3. Blum, G., Gnutzmann, S., Smilansky, U.: Nodal domains statistics: A criterion for quantum chaos. Phys. Rev. Lett. 88(11), 114101 (2002)
4. Colin de Verdière, Y.: Multiplicités des valeurs propres. Laplaciens discrets et laplaciens continus. Rend. Mat. Appl. (7) 13(3), 433–460 (1993)
5. Courant, R.: Ein allgemeiner Satz zur Theorie der Eigenfunktionen selbstadjungierter Differentialausdrücke. Nach. Ges. Wiss. Göttingen Math.-Phys. Kl. 81–84 (1923)
6. Courant, R., Hilbert, D.: Methods of Mathematical Physics. Vol. I. New York, NY: Interscience Publishers, Inc., 1953
7. Davies, E.B., Gladwell, G.M.L., Leydold, J., Stadler, P.F.: Discrete nodal domain theorems. Linear Algebra Appl. 336, 51–60 (2001)
8. Duval, A.M., Reiner, V.: Perron-Frobenius type results and discrete versions of nodal domain theorems. Linear Algebra Appl. 294(1–3), 259–268 (1999)
9. Friedlander, L.: Genericity of simple eigenvalues for a metric graph. Israel J. Math. 146, 149–156 (2005)
10. Friedman, J.: Some geometric aspects of graphs and their eigenfunctions. Duke Math. J. 69(3), 487–525 (1993)
11. Gantmacher, F.P., Krein, M.G.: Oscillation matrices and kernels and small vibrations of mechanical systems. Revised ed., Providence, RI: AMS Chelsea Publishing, 2002, Translation based on the 1941 Russian original, Edited and with a preface by Alex Eremenko
12. Gnutzmann, S., Smilansky, U., Sondergaard, N.: Resolving isospectral “drums” by counting nodal domains. J. Phys. A 38(41), 8921–8933 (2005)
13. Gnutzmann, S., Smilansky, U., Weber, J.: Nodal counting on quantum graphs. Waves Random Media 14(1), S61–S73 (2004)
14. Gould, S.H.: Variational Methods for Eigenvalue Problems: an Introduction to the Methods of Rayleigh, Ritz, Weinstein, and Aronszajn. New York: Dover Publications Inc., 1995
15. Herrmann, H.: Beziehungen zwischen den Eigenwerten und Eigenfunktionen verschiedener Eigenwertprobleme. Math. Z. 40, 221–241 (1935)
16. Kottos, T., Smilansky, U.: Periodic orbit theory and spectral statistics for quantum graphs. Ann. Phys. 274, 76–124 (1999)
17. Kuchment, P.: Graph models for waves in thin structures. Waves Random Media 12(4), R1–R24 (2002)
18. Miller, J., Derrida, B.: Weak-disorder expansion for the Anderson model on a tree. J. Stat. Phys. 75(3–4), 357–388 (1994)
19. Pleijel, Å.: Remarks on Courant’s nodal line theorem. Comm. Pure Appl. Math. 9, 543–550 (1956)
20. Pokornyi, Y.V., Pryadiev, V.L.: Some problems in the qualitative Sturm-Liouville theory on a spatial network. Usp. Mat. Nauk 59, no. 3(357), 115–150 (2004), translated in Russ. Math. Surveys 59, 515–552 (2004)
21. Pokornyi, Y.V., Pryadiev, V.L., Al-Obeid, A.: On the oscillation of the spectrum of a boundary value problem on a graph. Mat. Zametki 60, no. 3, 468–470 (1996), translated in Math. Notes 60, 351–353 (1996)
22. Schapotschnikow, P.: Eigenvalue and nodal properties on quantum graph trees. Waves in Random and Complex Media 16(3), 167–178 (2006)
23. Bıyıkoğlu, T.: A discrete nodal domain theorem for trees. Linear Algebra Appl. 360, 197–205 (2003)
24. Fiedler, M.: Eigenvectors of acyclic matrices. Czechoslovak Math. J. 25(100), 607–618 (1975)

Communicated by P. Sarnak
