Communications in Mathematical Physics - Volume 209


Author: A. Jaffe (Chief Editor)


Commun. Math. Phys. 209, 1 – 12 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Quantum Invariants*

Arthur Jaffe

Harvard University, Cambridge, MA 02138, USA

Received: 12 January 1998 / Accepted: 1 May 1999

Abstract: In earlier work, we derived an expression for a partition function $Z(\lambda)$, and gave a set of analytic hypotheses under which $Z(\lambda)$ does not depend on a parameter $\lambda$. The proof that $Z(\lambda)$ is invariant involved entire cyclic cohomology and K-theory. Here we give a direct proof that $\frac{d}{d\lambda} Z(\lambda) = 0$. The considerations apply to non-commutative geometry, to super-symmetric quantum theory, to string theory, and to generalizations of these theories to underlying quantum spaces.

1. Introduction

In [QHA] we studied a class of geometric index invariants in (non-commutative) differential geometry [C1, C2, JLO]. These invariants arise from pairing a cocycle $\tau^{\lambda,g}$ in (equivariant) entire cyclic cohomology with an operator square-root of unity $a$. In [JLO] we discovered a representation $\tau^{JLO}$ of a cocycle defined in terms of a given self-adjoint operator $Q(\lambda)$, yielding $Z^{\lambda}(a,g) = Z^{Q(\lambda)}(a,g) = \langle \tau^{\lambda,g}, a \rangle$. In [QHA] we study a modification of this pairing, and extend the class of perturbations of $Q(\lambda)$ under which the invariants are stable, formulating a sufficient condition of fractional differentiability. The index studied in [QHA] has the numerical value

$$ Z^{Q(\lambda)}(a,g) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \mathrm{Tr}_{\mathcal H}\left( \gamma\, U(g)\, a\, e^{-Q(\lambda)^2 + i t\, d_\lambda a} \right) e^{-t^2}\, dt, \tag{1.1} $$

where $Q(\lambda)$ acts on a Hilbert space $\mathcal H$, and the differential $da = d_\lambda a$ is defined by $da = [Q(\lambda), a]$. We assume that $Q$ is odd with respect to the $\mathbb{Z}_2$-grading $\gamma$, while $a$ is even. We also assume that $g$ is an element of a group $G$ of symmetries of $Q$ and of $a$. The invariant $Z^{Q}(a,g)$ does not necessarily take integer values, but it is an integer in case $g$ equals the identity.

* Work supported in part by the Department of Energy under Grant DE-FG02-94ER-25228 and by the National Science Foundation under Grant DMS-94-24344.

In this note we present an elementary analysis of the invariance of (1.1). Rather than relating $Z$ to cohomology or K-theory, we study the end result of that analysis. We ask: can one show directly that $Z^{Q(\lambda)}(a,g)$ is constant? We answer this question affirmatively, by introducing an auxiliary Hilbert space $\hat{\mathcal H}$ that is a skew tensor-product of $\mathcal H$ with a finite dimensional space. We obtain a representation for $Z$ as an expectation $J(\lambda,a,g)$ on $\hat{\mathcal H}$. Therefore we replace the study of $Q(\lambda)$ acting on $\mathcal H$ by the study of $q(\lambda,a) = Q(\lambda) + \eta a$ acting on $\hat{\mathcal H}$. Using the identity $Z^{Q(\lambda)}(a,g) = J(\lambda,a,g)$, it becomes an elementary calculation to establish that $\frac{d}{d\lambda} Z^{Q(\lambda)}(a,g) = 0$.

The Hilbert space $\hat{\mathcal H}$ differs from $\mathcal H$ by also containing the additional independent fermionic coordinate $\eta$ chosen so that $\eta^2 = I$, $\eta a = a\eta$, and $\eta Q(\lambda) + Q(\lambda)\eta = 0$. We interpret $\eta a$ as a connection associated with the translation in the auxiliary direction $t$, paired with the fermionic coordinate $\eta$.

We present the algebraic aspects of the proof, without the analytic details. These analytic details remain absolutely crucial, and without them we would be in the position to show that all invariants for a given $a$ agree! For example, if any two odd $Q$ and $\tilde Q$ both commute with the symmetry group $U(g)$, then we could take $Q(\lambda) = \lambda Q + (1-\lambda)\tilde Q$ and attempt to interpolate between them for $0 \le \lambda \le 1$. However the analytic assumptions we require differ little from those in [QHA], and in fact they comprise a major portion of that work. First we define the regularity of $Q(\lambda)$ with respect to $\lambda$, and secondly we investigate the regularity of $a$ with respect to commutation with $Q(\lambda)$. We call the latter the fractional differentiability properties of $a$. The conditions in [QHA] are convenient in many examples, and we summarize our analytic hypotheses in Sect. 11. Under these regularity conditions, $Z^{Q(\lambda)}(a,g)$ is once-differentiable in $\lambda$.
Furthermore, the resulting $\lambda$-derivative of $Z$ equals the expression that we would obtain by interchanging the order of differentiating and the order of taking traces or integrals in the definition of $Z$.

In Sect. 10 we consider a different but related case with two independent differentials $Q_1$ and $Q_2$. While we assume that $Q_1$ is invariant under the symmetry group $G$, we do not assume the invariance of $Q_2$. We replace this assumption by two assumptions, namely that $Q_2^2$ is invariant, and that $Q_1^2 - Q_2^2$ commutes with all observables. We show in this case that an expectation (10.4) has a representation similar to (1.1) and also is an invariant with respect to $\lambda$.

2. The Supercharge

Our basic framework involves an odd, self-adjoint operator $Q$ on a $\mathbb{Z}_2$-graded Hilbert space $\mathcal H$. This means that we have a self-adjoint operator $\gamma$ on $\mathcal H$ for which $\gamma^2 = I$. Thus $\mathcal H$ splits into the direct sum $\mathcal H = \mathcal H_+ \oplus \mathcal H_-$ of eigenspaces of $\gamma$. The statement that $Q$ is odd means $Q\gamma + \gamma Q = 0$. In terms of the direct sum decomposition,

$$ Q = \begin{pmatrix} 0 & Q_+^* \\ Q_+ & 0 \end{pmatrix} \quad\text{and}\quad \gamma = \begin{pmatrix} I & 0 \\ 0 & -I \end{pmatrix}. \tag{2.1} $$

The operator $Q$ is the supercharge¹ and its square

$$ H = Q^2 = \begin{pmatrix} Q_+^* Q_+ & 0 \\ 0 & Q_+ Q_+^* \end{pmatrix} \tag{2.2} $$

will be referred to as the Hamiltonian. We let $x^\gamma = \gamma x \gamma$ denote the action of $\gamma$ on operators. We say that the operator $x$ is even (bosonic) if $x = x^\gamma$ and odd (fermionic) if $x^\gamma = -x$. We define the graded differential

$$ dx = Qx - x^\gamma Q. \tag{2.3} $$

¹ We are not concerned with the basic structure of $\mathcal H$ or $Q$, aside from the possibility to perform the construction in Sect. 5.
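As a consistency check on (2.3), the following worked computation (an illustrative sketch, not part of the original argument) suggests that applying the graded differential twice gives the commutator with the Hamiltonian $H = Q^2$. It assumes only $Q\gamma + \gamma Q = 0$ and $\gamma^2 = I$, and it ignores all domain questions.

```latex
% Illustrative check; uses (Qx)^\gamma = -Q x^\gamma and (x^\gamma Q)^\gamma = -xQ.
\begin{aligned}
(dx)^{\gamma} &= \gamma\,(Qx - x^{\gamma}Q)\,\gamma = -Q x^{\gamma} + xQ, \\
d(dx) &= Q\,(Qx - x^{\gamma}Q) - (dx)^{\gamma} Q \\
      &= Q^{2}x - Q x^{\gamma} Q + Q x^{\gamma} Q - x Q^{2} \\
      &= [Q^{2}, x] = [H, x].
\end{aligned}
```

In particular, under these assumptions $d^2$ vanishes on operators that commute with $H$.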

We suppose that there is a compact Lie group $G$ with a continuous unitary representation $U(g)$ on $\mathcal H$ such that

$$ U(g)\gamma = \gamma U(g), \quad\text{and}\quad U(g)Q = QU(g). \tag{2.4} $$

Denote the action of $U(g)$ on the operator $x$ by

$$ x \to x^g = U(g)\, x\, U(g)^{-1}. \tag{2.5} $$

3. The Observables

Consider an algebra of bounded operators $A$ on $\mathcal H$ such that each $a \in A$ is even and $g$-invariant. In other words, each $a \in A$ commutes with $\gamma$ and with $U(g)$ for all $g \in G$. Also consider $\mathrm{Mat}_n(A)$, the set of $n \times n$ matrices with matrix elements in $A$. If $a \in \mathrm{Mat}_n(A)$, then $a = \{a_{ij}\}$, where $a_{ij} \in A$. Use the shorthand $ab \in \mathrm{Mat}_n(A)$ to denote the matrix with entries $(ab)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} \in A$. Define the differential of an element $a \in A$ by

$$ da = Qa - aQ = [Q, a]. \tag{3.1} $$

This is always densely defined as a quadratic form on $\mathcal H \times \mathcal H$. We make precise the boundedness properties of this quadratic form in Sect. 11. We use $a$ to denote an element of the algebra $A$ and $x$ to denote a linear operator or a bilinear form acting on $\mathcal H$. In the latter case, we assume that the domain of $x$ includes $C^\infty(Q(\lambda)) \times C^\infty(Q(\lambda))$.

4. The Invariant $Z^{Q(\lambda)}(a,g)$

In [QHA] we gave a simple formula for an invariant. Let $Q(\lambda)$ depend on a real parameter $\lambda$. We denote the graded commutator (2.3) of $Q(\lambda)$ with $x$ by

$$ d_\lambda x = Q(\lambda)x - x^\gamma Q(\lambda), \tag{4.1} $$

which reduces to $d_\lambda a = [Q(\lambda), a]$ for $a \in A$. For $a \in A$, the invariant is (1.1). More generally, we let $a \in \mathrm{Mat}_n(A)$. In this case

$$ Z^{Q(\lambda)}(a,g) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \mathrm{Tr}_{\mathcal H \otimes \mathbb{C}^{n^2}}\left( \gamma\, U(g)\, a\, e^{-Q(\lambda)^2 + i t\, d_\lambda a} \right) e^{-t^2}\, dt, \tag{4.2} $$

where $\gamma$, $U(g)$, $Q(\lambda)^2$ act in $\mathrm{Mat}_n(A)$ as diagonal $n \times n$ matrices of the form $\gamma \otimes I$, $U(g) \otimes I$, etc.

Theorem 1. For $a \in \mathrm{Mat}_n(A)$, assume $a^2 = I$. Furthermore assume that $Q = Q(\lambda)$ and $d_\lambda a = [Q(\lambda), a]$ satisfy the regularity hypotheses given in Sect. 11. Then $Z^{Q(\lambda)}(a,g)$ is independent of $\lambda$.

The main point of this paper is to present an elementary proof of Theorem 1.

5. The Extended Supercharge $q$

In order to exhibit our proof, we introduce a new Hilbert space $\hat{\mathcal H}$ on which the operators $Q$, $\gamma$, $A$, and $U(g)$ also act. In addition, on $\hat{\mathcal H}$ there are two additional self-adjoint operators $\eta$ and $J$, both of which have square one,

$$ \eta^2 = J^2 = I. \tag{5.1} $$

Furthermore, we assume that $\eta$ commutes with $\gamma$, with all elements of $A$, and with the representation $U(g)$. In other words,

$$ [\eta, x] = [J, x] = 0 \quad\text{for } x = \gamma,\ a \in A,\ \text{or } U(g). \tag{5.2} $$

Also we assume that $J$ commutes with $Q$, but that $\eta$ anticommutes with $J$ and with $Q$,

$$ \eta J + J\eta = \eta Q + Q\eta = [J, Q] = 0. \tag{5.3} $$

Let $\Gamma = \gamma J$ denote a $\mathbb{Z}_2$-grading on $\hat{\mathcal H}$. We now let $x$ denote a linear operator or bilinear form acting on $\hat{\mathcal H}$ (in the latter case, with domain including $C^\infty(Q(\lambda)) \times C^\infty(Q(\lambda))$), and set

$$ x^\Gamma = \Gamma x \Gamma. \tag{5.4} $$

The operator $\eta$ is our auxiliary fermionic coordinate, and $J = (-I)^{N_\eta}$ is the corresponding $\mathbb{Z}_2$-grading.² Given $a \in A$, define the extended supercharge $q = q(\lambda, a)$ by

$$ q = q(\lambda, a) = Q(\lambda) + \eta a, \tag{5.5} $$

and also let

$$ h = h(\lambda, a) = q(\lambda, a)^2 = Q(\lambda)^2 + a^2 - \eta\, d_\lambda a. \tag{5.6} $$

Note that

$$ q^\Gamma = -q, \quad\text{and}\quad h^\Gamma = h. \tag{5.7} $$

We use the notation $d_q$ to denote the $\Gamma$-graded commutator on $\hat{\mathcal H}$,

$$ d_q x = qx - x^\Gamma q. \tag{5.8} $$

If we need to emphasize the dependence of $q$ on $\lambda$ or $a$, then we write $d_{q(\lambda,a)} x$. We continue to reserve $d$ or $d_\lambda$ to denote the $\gamma$-graded commutator (4.1).

² Suppose that $\mathcal H = \mathcal H_b \otimes \mathcal H_f$ is a tensor product of bosonic and fermionic Fock spaces, that $Q$ is linear in fermionic creation or annihilation operators, and that $\gamma = (-I)^{N_f}$. This would be standard in the physics of supersymmetry. Suppose in addition that $\eta = b + b^*$ denotes one fermionic degree of freedom independent of those in $\mathcal H_f$ and acting on the two-dimensional space $\mathcal H_\eta$. Then take $\hat{\mathcal H} = \mathcal H_b \otimes (\mathcal H_f \wedge \mathcal H_\eta)$ and $J = (-I)^{b^* b}$, with $Q$, $\gamma$, $a$, and $U(g)$ acting on $\hat{\mathcal H}$ in the natural way. This gives a realization of (5.1)–(5.3) on $\hat{\mathcal H}$.
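The identities (5.6) and (5.7) follow from the relations (5.1)–(5.3) by direct expansion. The computation below is an illustrative sketch added for the reader; it uses only $\eta^2 = J^2 = I$, $\eta Q = -Q\eta$, $\eta J = -J\eta$, and the fact that $\eta$ and $J$ commute with $a$ and $\gamma$.

```latex
% Illustrative expansion of q^2 and of the Gamma-action on q.
\begin{aligned}
q^{2} &= (Q + \eta a)(Q + \eta a)
       = Q^{2} + Q\eta a + \eta a Q + \eta a \eta a \\
      &= Q^{2} - \eta\,(Qa - aQ) + \eta^{2} a^{2}
       = Q^{2} + a^{2} - \eta\, d_\lambda a, \\
q^{\Gamma} &= \gamma J\,(Q + \eta a)\,J\gamma
       = \gamma Q \gamma + \gamma\,(J\eta J)\,a\,\gamma
       = -Q - \eta a = -q, \\
h^{\Gamma} &= (q^{\Gamma})^{2} = q^{2} = h .
\end{aligned}
```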

6. Heat Kernel Regularization on $\hat{\mathcal H}$

Let us introduce the heat kernel regularizations $\hat X_n$ of $X_n$ on $\hat{\mathcal H}$. Let $X_n = \{x_0, \ldots, x_n\}$ denote an ordered set of $(n+1)$ linear operators $x_j$ acting on $\hat{\mathcal H}$. We call the $x_j$ vertices and $X_n$ a set of vertices. Choose $a \in A$ and let $q(\lambda, a) = Q(\lambda) + \eta a$, and $h = h(\lambda, a) = q(\lambda, a)^2$. Define the heat kernel regularization $\hat X_n(\lambda, a) = \{x_0, \ldots, x_n\}^\wedge(\lambda, a)$ by

$$ \hat X_n(\lambda, a) = \int_{s_j > 0} x_0 e^{-s_0 h} x_1 e^{-s_1 h} \cdots x_n e^{-s_n h}\, \delta(1 - s_0 - \cdots - s_n)\, ds_0 \cdots ds_n. \tag{6.1} $$

Note that if $T$ is any operator on $\hat{\mathcal H}$ that commutes with $h = q^2$, then

$$ \{x_0, \ldots, x_j T, x_{j+1}, \ldots, x_n\}^\wedge(\lambda, a) = \{x_0, \ldots, x_j, T x_{j+1}, \ldots, x_n\}^\wedge(\lambda, a). \tag{6.2} $$

Furthermore $T = J\eta$ anti-commutes with $q(\lambda, a)$ and commutes with $h(\lambda, a)$ for all $a$.

Proposition 2 (Vertex Insertion). Let $X_n = \{x_0, \ldots, x_n\}$ denote a set of vertices possibly depending on $\lambda$. Then with the notation $\dot Q = \partial Q(\lambda)/\partial\lambda$, we have

$$ \frac{\partial}{\partial\lambda} \{x_0, \ldots, x_n\}^\wedge(\lambda, a) = -\sum_{j=0}^{n} \{x_0, \ldots, x_j, d_q \dot Q, x_{j+1}, \ldots, x_n\}^\wedge(\lambda, a) + \sum_{j=0}^{n} \Big\{x_0, \ldots, \frac{\partial x_j}{\partial\lambda}, \ldots, x_n\Big\}^\wedge(\lambda, a). \tag{6.3} $$

Here

$$ d_q \dot Q = d_{q(\lambda,a)} \dot Q = d_\lambda \dot Q + \eta\,[a, \dot Q]. \tag{6.4} $$

Proof. By differentiating $\hat X_n$ defined in (6.1), we obtain two types of terms. Differentiating the $x_j$'s gives the second sum in (6.3). (This sum is absent if the $x_j$'s are $\lambda$-independent.) The other terms arise from differentiating the heat kernels. We use the identity

$$ \frac{\partial}{\partial\lambda} e^{-sh} = -\int_0^s e^{-uh}\, \frac{\partial h}{\partial\lambda}\, e^{-(s-u)h}\, du, \tag{6.5} $$

that holds under suitable regularity hypotheses, see for example Proposition VII.10 of [QHA]. Here

$$ \frac{\partial h}{\partial\lambda} = \frac{\partial}{\partial\lambda}(q^2) = q\,\frac{\partial q}{\partial\lambda} + \frac{\partial q}{\partial\lambda}\, q = d_q \frac{\partial q}{\partial\lambda} = d_q \dot Q. $$

Explicitly

$$ d_q \dot Q = (Q + \eta a)\dot Q + \dot Q (Q + \eta a) = d_\lambda \dot Q + \eta\,[a, \dot Q]. $$

Inserted back into the definition of $\hat X_n$, we observe that the differentiation of the heat kernel between vertex $j$ and vertex $j+1$ produces one new $-d_q \dot Q$ vertex at position $j+1$. This completes the proof of (6.3). □
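To see the vertex insertion formula (6.3) at work, consider the simplest case $n = 0$ with a $\lambda$-independent vertex $x_0$. The following sketch is an added illustration under the same regularity assumptions needed for (6.5); it uses only (6.1) and (6.5) with $s = 1$.

```latex
% Illustrative n = 0 case of (6.3), with x_0 independent of lambda.
\begin{aligned}
\{x_0\}^{\wedge}(\lambda,a) &= x_0\, e^{-h}, \\
\frac{\partial}{\partial\lambda}\{x_0\}^{\wedge}(\lambda,a)
  &= -\int_0^1 x_0\, e^{-uh}\,(d_q\dot Q)\, e^{-(1-u)h}\, du \\
  &= -\int_{s_0,s_1>0} x_0\, e^{-s_0 h}\,(d_q\dot Q)\, e^{-s_1 h}\,
       \delta(1-s_0-s_1)\, ds_0\, ds_1 \\
  &= -\{x_0,\, d_q\dot Q\}^{\wedge}(\lambda,a).
\end{aligned}
```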

Define the action of the grading $\Gamma$ on sets of vertices $X_n$ by

$$ X_n \to X_n^\Gamma = \{x_0^\Gamma, x_1^\Gamma, \ldots, x_n^\Gamma\}. \tag{6.6} $$

Since $q^2 = (q^\Gamma)^2$, the regularization $X_n \to \hat X_n$ commutes with the action of $\Gamma$, namely

$$ \big(\hat X_n(\lambda, a)\big)^\Gamma = \big(X_n^\Gamma\big)^\wedge(\lambda, a). \tag{6.7} $$

It is also convenient to write explicitly the expression for the differential $d_q \hat X_n$,

$$ d_q \hat X_n(\lambda, a) = q \hat X_n(\lambda, a) - \hat X_n(\lambda, a)^\Gamma q = \{q x_0, x_1, \ldots, x_n\}^\wedge(\lambda, a) - \{x_0^\Gamma, \ldots, x_n^\Gamma q\}^\wedge(\lambda, a). \tag{6.8} $$

In particular, we infer that

$$ d_q \hat X_n(\lambda, a) = \sum_{j=0}^{n} \{x_0^\Gamma, x_1^\Gamma, \ldots, x_{j-1}^\Gamma, d_q x_j, \ldots, x_n\}^\wedge(\lambda, a). \tag{6.9} $$

One other identity we mention is

Proposition 3 (Combination Identity). The heat kernel regularizations satisfy

$$ \{x_0, x_1, \ldots, x_n\}^\wedge(\lambda, a) = \sum_{j=0}^{n} \{x_0, x_1, \ldots, x_j, I, x_{j+1}, \ldots, x_n\}^\wedge(\lambda, a). \tag{6.10} $$

Proof. The $j$th term on the right side of (6.10) is

$$ \{x_0, \ldots, x_j, I, x_{j+1}, \ldots, x_n\}^\wedge(\lambda, a) = \int_{s_j > 0} x_0 e^{-s_0 h} \cdots x_j e^{-(s_j + s_{j+1})h} \cdots x_n e^{-s_{n+1} h}\, \delta(1 - s_0 - \cdots - s_{n+1})\, ds_0 \cdots ds_{n+1}. \tag{6.11} $$

Change the $s$-integration variables to $s'_0 = s_0$, $s'_1 = s_1$, ..., $s'_j = s_j + s_{j+1}$, $s'_{j+1} = s_{j+2}$, ..., $s'_n = s_{n+1}$, and $s'_{n+1} = s_j$. This change has Jacobian 1, and the resulting integrand has the form of the integrand for $\{x_0, \ldots, x_n\}^\wedge$ with variables $s'_0, \ldots, s'_n$, namely

$$ \int_{s'_0, s'_1, \ldots, s'_n > 0} ds'_0 \cdots ds'_n\; x_0 e^{-s'_0 h} \cdots x_n e^{-s'_n h}\, \delta(1 - s'_0 - \cdots - s'_n) \int ds'_{n+1}, \tag{6.12} $$

with the integrand depending on the variable $s'_{n+1}$ only through the restriction of the range of the $s'_{n+1}$ integral. The original domain of integration restricts $s'_{n+1}$ to the range $0 \le s'_{n+1} \le s'_j$, so the dependence of the integrand on $s'_{n+1}$ is the characteristic function of the interval $[0, s'_j]$. Thus performing the $s'_{n+1}$ integration produces a factor $s'_j$ in the $s'_0, \ldots, s'_n$-integrand. Add the similar results for $0 \le j \le n$ to give the factor $s'_0 + s'_1 + \cdots + s'_n$. But the delta function in (6.12) restricts this sum to be 1, so the integral of the sum is exactly $\{x_0, \ldots, x_n\}^\wedge(\lambda, a)$. □
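The combination identity (6.10) is easiest to see for a single vertex, where the sum on the right has only the term $j = 0$. The short computation below is an added illustration of that case, using only the definition (6.1).

```latex
% Illustrative n = 0 case of (6.10): inserting the identity vertex.
\begin{aligned}
\{x_0, I\}^{\wedge}(\lambda,a)
  &= \int_{s_0,s_1>0} x_0\, e^{-s_0 h}\, I\, e^{-s_1 h}\,
       \delta(1-s_0-s_1)\, ds_0\, ds_1 \\
  &= \int_0^1 x_0\, e^{-(s_0 + (1-s_0))h}\, ds_0
   = x_0\, e^{-h} \int_0^1 ds_0
   = \{x_0\}^{\wedge}(\lambda,a).
\end{aligned}
```

For general $n$ the same mechanism produces the factor $s'_j$ for the $j$th insertion, and the delta function makes these factors sum to 1.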

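7. Expectations on $\hat{\mathcal H}$

Let $a \in A$ satisfy $a^2 = I$, and let $\hat X_n = \hat X_n(\lambda, a)$ denote the heat kernel regularization of $X_n$. We define the expectation

$$ \big\langle\!\big\langle \hat X_n \big\rangle\!\big\rangle_{\lambda,a,g} = \frac{1}{\sqrt{4\pi}} \int_{-\infty}^{\infty} \mathrm{Tr}_{\hat{\mathcal H}}\left( \Gamma\, U(g)\, \hat X_n(\lambda, ta) \right) dt. \tag{7.1} $$

Here we choose $a^2 = I$ to ensure that the $t^2$ term in $h$ provides a gaussian convergence factor to the $t$-integral. This integral represents averaging over $a$'s whose squares are multiples of the identity. These expectations can be considered as $(n+1)$-multilinear expectations on sets $X_n$ of vertices. We sometimes suppress the $\lambda$- or $a$- or $g$-dependence of the expectations, or the $n$-dependence of sets of vertices. Furthermore, where confusion does not occur we omit the $\wedge$ that we use to distinguish a set of vertices $X$ from the heat kernel regularization of the set. Thus at various times we denote $\langle\!\langle \hat X_n \rangle\!\rangle_{a,g}$ by $\langle\!\langle X \rangle\!\rangle$, or when we wish to clarify the dependence on $n$, $a$, or $g$ with some subset of these indices, or even as one of the following:

$$ \big\langle\!\big\langle \hat X_n \big\rangle\!\big\rangle_{\lambda,a,g} = \langle\!\langle X \rangle\!\rangle = \langle\!\langle X \rangle\!\rangle_n = \langle\!\langle X \rangle\!\rangle_{n,a} = \langle\!\langle X \rangle\!\rangle_{n,a,g}, \tag{7.2} $$

etc.

Proposition 4. With the above notation, we have the identities

($\Gamma$-invariance)

$$ \langle\!\langle X \rangle\!\rangle_n = \langle\!\langle X^\Gamma \rangle\!\rangle_n, \tag{7.3} $$

(differential)

$$ \langle\!\langle d_q X \rangle\!\rangle_n = \sum_{j=0}^{n} \big\langle\!\big\langle \{x_0^\Gamma, x_1^\Gamma, \ldots, x_{j-1}^\Gamma, d_q x_j, \ldots, x_n\} \big\rangle\!\big\rangle, \tag{7.4} $$

(cyclic symmetry)

$$ \langle\!\langle \{x_0, x_1, \ldots, x_n\} \rangle\!\rangle = \big\langle\!\big\langle \{x_n^{g^{-1}\Gamma}, x_0, x_1, \ldots, x_{n-1}\} \big\rangle\!\big\rangle, \tag{7.5} $$

and (combination identity)

$$ \langle\!\langle \{x_0, x_1, \ldots, x_n\} \rangle\!\rangle_n = \sum_{j=0}^{n} \big\langle\!\big\langle \{x_0, x_1, \ldots, x_j, I, x_{j+1}, \ldots, x_n\} \big\rangle\!\big\rangle_{n+1}. \tag{7.6} $$

Also, in case $Q = Q^g$ and $a = a^g$, then $q = q^g$ and we have

(infinitesimal invariance)

$$ \big\langle\!\big\langle d_{q(\lambda,ta)} X \big\rangle\!\big\rangle = 0. \tag{7.7} $$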

Proof. The symmetry (7.3) is a consequence of the fact that $\Gamma^2 = I$, and $\Gamma$ commutes with $U(g)$ and with $q^2$. The expectation of (6.9) completes the proof of (7.4). The proof of (7.5) involves cyclicity of the trace. The identity (7.6) is the expectation of (6.10). To establish (7.7), note that every $\hat X_n$ can be decomposed uniquely as $\hat X_n = \hat X_n^+ + \hat X_n^-$, where $(\hat X_n^\pm)^\Gamma = \pm \hat X_n^\pm$. The symmetry (7.3) ensures that $\langle\!\langle d_{q(\lambda,ta)} X_n^+ \rangle\!\rangle = 0$. On the other hand, $q^\Gamma = -q$, together with cyclicity of the trace and $q^g = q$, ensures that

$$ \big\langle\!\big\langle d_{q(\lambda,ta)} X_n^- \big\rangle\!\big\rangle = \big\langle\!\big\langle q(\lambda,ta) X_n^- + X_n^- q(\lambda,ta) \big\rangle\!\big\rangle = \big\langle\!\big\langle q(\lambda,ta) X_n^- + q(\lambda,ta)^{g^{-1}\Gamma} X_n^- \big\rangle\!\big\rangle = 0. $$

□

Except in (7.7), we have implicitly assumed that the vertices $x_j$ in $X_n$ are $t$-independent. In case that $X_n$ has one factor linear in $t$, the heat kernel regularizations of the following agree:

$$ \{t x_0, x_1, \ldots, x_n\}^\wedge(\lambda, ta) = \{x_0, x_1, \ldots, t x_j, \ldots, x_n\}^\wedge(\lambda, ta), \tag{7.8} $$

for any $j = 0, 1, \ldots, n$. We then obtain an interesting relation for expectations,

Proposition 5 (Integration by parts). Let $a^2 = I$. Then

$$ \langle\!\langle \{t x_0, x_1, \ldots, x_n\} \rangle\!\rangle_n = \sum_{j=0}^{n} \frac{1}{2} \big\langle\!\big\langle \{x_0, \ldots, x_j, \eta\, d_\lambda a, x_{j+1}, \ldots, x_n\} \big\rangle\!\big\rangle_{n+1}. \tag{7.9} $$

Proof. In order to establish (7.9), we collect together the terms $\exp(-s_j t^2)$ that occur in $\{x_0, \ldots, x_n\}^\wedge(ta)$. Since the integrand for the heat kernel regularization has a $\delta$-function restricting the variables $s_j$ to satisfy $s_0 + \cdots + s_n = 1$, we obtain the factor $\exp(-t^2)$. Write

$$ t e^{-t^2} = -\frac{1}{2} \frac{d}{dt} e^{-t^2} $$

and integrate by parts in $t$. The resulting derivative involves the $t$-derivative of each heat kernel $\exp(-s_j q(\lambda, ta)^2)$ with the quadratic term in $t$ removed from $q^2$. Note that

$$ e^{-st^2} \frac{d}{dt} e^{-s(q^2 - t^2)} = -\int_0^s e^{-uq^2} \left( \frac{d}{dt}(q^2 - t^2) \right) e^{-(s-u)q^2}\, du = \int_0^s e^{-uq^2}\, \eta\, d_\lambda a\, e^{-(s-u)q^2}\, du. $$

Here we use (5.6) with $ta$ replacing $a$ and with $a^2 = I$ in order to evaluate the $t$ derivative of $q^2 - t^2$. Thus each derivative introduces a new vertex equal to $\frac{1}{2}\eta\, d_\lambda a$, and the proof of (7.9) is complete. □
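The two ingredients of this proof can be written out explicitly. The sketch below is an added illustration: the first two lines apply (5.6) with $ta$ in place of $a$, and the last line is the scalar integration by parts, where $F(t)$ is a hypothetical placeholder for the trace of the remaining $t$-dependent operator product, assumed differentiable with vanishing boundary terms.

```latex
% Illustrative steps behind (7.9); F(t) is a placeholder, not notation from the paper.
\begin{aligned}
q(\lambda,ta)^{2} &= Q(\lambda)^{2} + t^{2}a^{2} - t\,\eta\, d_\lambda a
   = Q(\lambda)^{2} + t^{2} - t\,\eta\, d_\lambda a \qquad (a^{2} = I), \\
\frac{d}{dt}\Bigl(q(\lambda,ta)^{2} - t^{2}\Bigr) &= -\,\eta\, d_\lambda a, \\
\int_{-\infty}^{\infty} t\, e^{-t^{2}} F(t)\, dt
  &= -\frac12 \int_{-\infty}^{\infty} \Bigl(\frac{d}{dt} e^{-t^{2}}\Bigr) F(t)\, dt
   = \frac12 \int_{-\infty}^{\infty} e^{-t^{2}}\, F'(t)\, dt .
\end{aligned}
```

The factor $\frac12$ in (7.9) is the one appearing in the last line; the sum over $j$ arises because $F'(t)$ differentiates each of the $n+1$ heat kernels in turn.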

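8. The Functional $J(\lambda, a, g)$

Let us consider a single vertex $X_0 = x_0 = Ja$, where $a \in A$, and its expectation

$$ J(\lambda, a, g) = \langle\!\langle Ja \rangle\!\rangle. \tag{8.1} $$

Explicitly

$$ J(\lambda, a, g) = \frac{1}{\sqrt{4\pi}} \int_{-\infty}^{\infty} \mathrm{Tr}_{\hat{\mathcal H}}\left( \gamma\, U(g)\, a\, e^{-q(\lambda, ta)^2} \right) dt. \tag{8.2} $$

This functional allows us to recover the functional $Z$.

Theorem 6. Let $a$ satisfy $a^2 = I$. Then

$$ J(\lambda, a, g) = Z^{Q(\lambda)}(a, g). \tag{8.3} $$

Proof. Let $h = h_0 - t\eta\, da$, where $h_0 = Q(\lambda)^2 + t^2$. The Hille–Phillips perturbation theory for semi-groups, see Theorem 13.4.1 of [HP], can be written

$$ e^{-q(\lambda,ta)^2} = e^{-h_0 + t\eta\, da} = e^{-h_0} + \sum_{n=1}^{\infty} t^n \int_{s_j > 0} e^{-s_0 h_0}\, \eta\, d_\lambda a\, e^{-s_1 h_0}\, \eta\, d_\lambda a \cdots \eta\, d_\lambda a\, e^{-s_n h_0}\, \delta(1 - s_0 - s_1 - \cdots - s_n)\, ds_0\, ds_1 \cdots ds_n. \tag{8.4} $$

In the $n$th term we collect all factors of $\eta$ on the left. Note that $\eta$ commutes with $a$ and $h_0$, and it anti-commutes with $d_\lambda a$. Therefore the result of collecting the factors of $\eta$ on the left is $\eta^n (-1)^{n(n-1)/2}$. If $n$ is odd, then $\eta^n = \eta$ and $\mathrm{Tr}_{\mathcal H_\eta}(\eta) = 0$. Thus only even $n$ terms contribute to (8.2). For even $n$, $\eta^n (-1)^{n(n-1)/2} = (-1)^{n/2} I$ and $\mathrm{Tr}_{\mathcal H_\eta}(I) = 2$. Thus (8.2) becomes

$$ J(\lambda, a, g) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} dt \sum_{n=0}^{\infty} (-t^2)^n \big\langle \{a, d_\lambda a, \ldots, d_\lambda a\} \big\rangle_{2n}\, e^{-t^2}, \tag{8.5} $$

where we use expectations $\langle\ \rangle_n$ on $\mathcal H$ similar to $\langle\!\langle\ \rangle\!\rangle_n$ on $\hat{\mathcal H}$ (but without the $t$-integration) and defined by

$$ \langle \{x_0, \ldots, x_n\} \rangle_n = \int_{s_j > 0} \mathrm{Tr}_{\mathcal H}\left( \gamma\, U(g)\, x_0 e^{-s_0 Q(\lambda)^2} \cdots x_n e^{-s_n Q(\lambda)^2} \right) \delta(1 - s_0 - s_1 - \cdots - s_n)\, ds_0\, ds_1 \cdots ds_n. \tag{8.6} $$

But using the Hille–Phillips formula once again, (8.5) is just

$$ \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} dt\, \mathrm{Tr}_{\mathcal H}\left( \gamma\, U(g)\, a\, e^{-Q(\lambda)^2 + i t\, d_\lambda a} \right) e^{-t^2} = Z^{Q(\lambda)}(a, g). \tag{8.7} $$

(Here we use the symmetry of (8.7) under $\gamma$ to justify vanishing of terms involving odd powers of $d_\lambda a$.) Thus we can prove that $Z^{Q(\lambda)}(a, g)$ is independent of $\lambda$ by showing that $J(\lambda, a, g)$ is constant in $\lambda$. □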
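The sign $(-1)^{n(n-1)/2}$ obtained when collecting the factors of $\eta$ can be checked for small $n$. In the sketch below, an added illustration, $D$ abbreviates $d_\lambda a$, and the heat kernels $e^{-s_j h_0}$ between the factors are suppressed because they commute with $\eta$; the only relations used are $\eta D = -D\eta$ and $\eta^2 = I$.

```latex
% Illustrative sign count; D := d_\lambda a, heat kernels suppressed.
\begin{aligned}
(\eta D)^{2} &= \eta D\,\eta D = -\,\eta^{2} D^{2} = -D^{2}, \\
(\eta D)^{3} &= (-1)^{0+1+2}\,\eta^{3} D^{3} = -\,\eta\, D^{3}, \\
(\eta D)^{n} &= (-1)^{0+1+\cdots+(n-1)}\,\eta^{n} D^{n}
              = (-1)^{n(n-1)/2}\,\eta^{n} D^{n}, \\
n = 2m:\qquad (-1)^{m(2m-1)}\,\eta^{2m} &= (-1)^{m}\, I .
\end{aligned}
```

Together with $t^{2m}$ this gives the factor $(-t^2)^m$ in (8.5), and the trace over $\mathcal H_\eta$ converts $\frac{1}{\sqrt{4\pi}}$ into $\frac{1}{\sqrt{\pi}}$.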

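9. $J(\lambda, a, g)$ Does Not Depend on $\lambda$

We now prove Theorem 1. Calculate $\partial J/\partial\lambda$ using (6.3), in the simple case of one vertex independent of $\lambda$. Thus

$$ \frac{\partial}{\partial\lambda} J(\lambda, a, g) = \frac{\partial}{\partial\lambda} \langle\!\langle Ja \rangle\!\rangle = -\big\langle\!\big\langle Ja,\, d_{q(\lambda,ta)} \dot Q \big\rangle\!\big\rangle. \tag{9.1} $$

Using the identity (7.7) in the form

$$ 0 = \big\langle\!\big\langle d_{q(\lambda,ta)} \{Ja, \dot Q\} \big\rangle\!\big\rangle = \big\langle\!\big\langle d_{q(\lambda,ta)}(Ja),\, \dot Q \big\rangle\!\big\rangle + \big\langle\!\big\langle Ja,\, d_{q(\lambda,ta)} \dot Q \big\rangle\!\big\rangle, \tag{9.2} $$

we have

$$ \frac{\partial}{\partial\lambda} J(\lambda, a, g) = \big\langle\!\big\langle d_{q(\lambda,ta)}(Ja),\, \dot Q \big\rangle\!\big\rangle. \tag{9.3} $$

It is at this point that we have used $q^g = q$, namely the invariance of both $Q$ and $a$ under $U(g)$. To evaluate (9.3), note that

$$ d_{q(\lambda,ta)}(Ja) = [q(\lambda, ta), Ja] = J\, d_\lambda a - 2tJ\eta. \tag{9.4} $$

Here we use the assumption $a^2 = I$. From Proposition 5 we therefore infer

$$ \frac{\partial}{\partial\lambda} J(\lambda, a, g) = \big\langle\!\big\langle J\, d_\lambda a,\, \dot Q \big\rangle\!\big\rangle - 2 \big\langle\!\big\langle tJ\eta,\, \dot Q \big\rangle\!\big\rangle = \big\langle\!\big\langle J\, d_\lambda a,\, \dot Q \big\rangle\!\big\rangle - \big\langle\!\big\langle J\eta,\, \eta\, d_\lambda a,\, \dot Q \big\rangle\!\big\rangle - \big\langle\!\big\langle J\eta,\, \dot Q,\, \eta\, d_\lambda a \big\rangle\!\big\rangle. \tag{9.5} $$

Since $J\eta$ commutes with $h = q^2$, and since $J\eta \dot Q = -\dot Q J\eta$, use (6.2) to establish

$$ \big\langle\!\big\langle J\eta,\, \eta\, d_\lambda a,\, \dot Q \big\rangle\!\big\rangle + \big\langle\!\big\langle J\eta,\, \dot Q,\, \eta\, d_\lambda a \big\rangle\!\big\rangle = \big\langle\!\big\langle I,\, J\, d_\lambda a,\, \dot Q \big\rangle\!\big\rangle - \big\langle\!\big\langle I,\, \dot Q,\, J\, d_\lambda a \big\rangle\!\big\rangle = \big\langle\!\big\langle J\, d_\lambda a,\, \dot Q,\, I \big\rangle\!\big\rangle + \big\langle\!\big\langle J\, d_\lambda a,\, I,\, \dot Q \big\rangle\!\big\rangle. \tag{9.6} $$

In the last step we also use $\dot Q^\Gamma = -\dot Q$ and the cyclic symmetry (7.5). Hence we can simplify (9.6) to $\langle\!\langle J\, d_\lambda a,\, \dot Q \rangle\!\rangle$, by applying the combination identity (7.6). Substituting this back into (9.5), we end up with

$$ \frac{\partial}{\partial\lambda} J(\lambda, a, g) = \big\langle\!\big\langle J\, d_\lambda a,\, \dot Q \big\rangle\!\big\rangle - \big\langle\!\big\langle J\, d_\lambda a,\, \dot Q \big\rangle\!\big\rangle = 0. \tag{9.7} $$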
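The commutator (9.4), on which the cancellation in (9.7) rests, can be verified term by term. The computation below is an added illustration using only $(Ja)^\Gamma = Ja$, $[J, Q] = [J, a] = [\eta, a] = 0$, $\eta J = -J\eta$, and $a^2 = I$.

```latex
% Illustrative derivation of (9.4), with q = Q + t \eta a.
\begin{aligned}
d_{q(\lambda,ta)}(Ja) &= q\,Ja - (Ja)^{\Gamma}\, q
   = [\,Q + t\,\eta a,\; Ja\,], \qquad (Ja)^{\Gamma} = Ja, \\
[\,Q,\, Ja\,] &= J\,[Q, a] = J\, d_\lambda a, \\
[\,t\,\eta a,\, Ja\,] &= t\,\bigl(\eta J a^{2} - J\eta\, a^{2}\bigr)
   = -2t\, J\eta .
\end{aligned}
```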

Thus $J(\lambda, a, g)$ is invariant under change of $\lambda$, and the demonstration is complete. □

10. Independent Supercharges $Q_j(\lambda)$

Let us generalize our consideration to the case that there are two self-adjoint operators $Q_1 = Q_1(\lambda)$ and $Q_2 = Q_2(\lambda)$ on $\mathcal H$ such that

$$ Q_1\gamma + \gamma Q_1 = Q_2\gamma + \gamma Q_2 = Q_1 Q_2 + Q_2 Q_1 = 0. \tag{10.1} $$

Thus we have two derivatives $d_j a = [Q_j, a]$. We assume that the energy operator on $\mathcal H$ is defined by

$$ H = H(\lambda) = \frac{1}{2}(Q_1 + Q_2)^2 = \frac{1}{2}\left(Q_1^2 + Q_2^2\right) \tag{10.2} $$

and that the operator

$$ P = \frac{1}{2}\left(Q_1^2 - Q_2^2\right) \tag{10.3} $$

has the properties:

i) $P$ does not depend on $\lambda$.
ii) $P$ commutes with $Q_1$, $Q_2$ and with each $a \in A$.
iii) $U(g)$ commutes with $Q_1$ and with $H(\lambda)$.

Assumption (i) corresponds to a common situation where $P$ can be interpreted as a "momentum" operator. Then the energy, but not the momentum, is assumed to depend on $\lambda$. Assumption (ii) says that $Q_1$, $Q_2$ are translation invariant, and that $A$ is a "zero-momentum" or translation-invariant subalgebra. According to assumption (iii), $U(g)$ commutes with $Q_2^2$, but $U(g)$ may not commute with $Q_2$. Under these hypotheses, and with appropriate regularity assumptions, we proved in [QHA] that for $a = a^g$ and $a^2 = I$,

$$ Z^{\{Q_j(\lambda)\}}(a, g) = \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} \mathrm{Tr}\left( \gamma\, U(g)\, a\, e^{-H + i t\, d_1 a - t^2} \right) dt \tag{10.4} $$

is independent of $\lambda$. In this section we give an alternate proof that (10.4) is constant.

Introduce on $\hat{\mathcal H}$ two extended supercharges $q_1 = q_1(\lambda, a) = Q_1 + \eta a$ and $q_2 = Q_2$. With $\eta$ as before, $\eta Q_1 + Q_1 \eta = \eta Q_2 + Q_2 \eta = 0$. Define

$$ h = h(\lambda, ta) = H(\lambda) + t^2 a^2 - t\eta\, d_1 a. \tag{10.5} $$

Note that

$$ h = q_1(\lambda, ta)^2 - P = Q_1(\lambda)^2 + t^2 a^2 - t\eta\, d_1 a - P, \tag{10.6} $$

so we can eliminate $Q_2(\lambda)$ from $h$ by introducing the operator $P$, that commutes with $a$, $\gamma$, $J$, $U(g)$, $\eta$, and $Q_j(\lambda)$.

Thus $P$ commutes with all operators that we consider on $\hat{\mathcal H}$, so we repeat the constructions of Sects. 5–9. However, we replace $q(\lambda, ta)^2$ in the previous construction with $h(\lambda, ta)$ defined by (10.5). Also we replace $d_q x$ with $d_{q_j} x = q_j x - x^\Gamma q_j$. We use the heat kernel $\exp(-sh)$ to define the heat kernel regularization. Then define the expectation $\langle\!\langle \cdot \rangle\!\rangle$ by the formula (7.1) with this new $h(\lambda, ta)$. As $q_1^g = q_1$, we have

$$ \big\langle\!\big\langle d_{q_1(\lambda,ta)} X \big\rangle\!\big\rangle = 0. \tag{10.7} $$

However it may not be true that $q_2^g = q_2$, so it may not be true that $\langle\!\langle d_{q_2(\lambda,ta)} X \rangle\!\rangle$ vanishes. As before, with $a^2 = I$, define

$$ J(\lambda, a, g) = \langle\!\langle Ja \rangle\!\rangle. \tag{10.8} $$

In this case, we establish as in the proof of Theorem 6 that

$$ J(\lambda, a, g) = Z^{\{Q_j(\lambda)\}}(a, g). \tag{10.9} $$

Thus the proof of Theorem 1 shows:

Theorem 7. Let $a \in A$, assume $a^2 = I$, and also assume the regularity hypotheses on $Q_j(\lambda)$ and $d_1 a = [Q_1(\lambda), a]$, stated in Sect. 11. Then the expectation $Z^{\{Q_j(\lambda)\}}(a, g)$ is independent of $\lambda$.
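The relations among $H$, $P$, and $h$ used in this section follow from the anticommutation $Q_1 Q_2 + Q_2 Q_1 = 0$ in (10.1) and from (5.6) applied to $q_1$. The lines below are an added illustration of that algebra.

```latex
% Illustrative check of (10.2) and (10.6).
\begin{aligned}
\tfrac12 (Q_1 + Q_2)^{2}
  &= \tfrac12\bigl(Q_1^{2} + Q_2^{2} + Q_1 Q_2 + Q_2 Q_1\bigr)
   = \tfrac12\bigl(Q_1^{2} + Q_2^{2}\bigr) = H, \\
H + P &= Q_1^{2}, \qquad H - P = Q_2^{2}, \\
q_1(\lambda,ta)^{2} - P
  &= Q_1^{2} + t^{2}a^{2} - t\,\eta\, d_1 a - P
   = H + t^{2}a^{2} - t\,\eta\, d_1 a = h(\lambda,ta).
\end{aligned}
```

Since $P$ commutes with every operator in the construction and is independent of $\lambda$, replacing $q_1^2$ by $h = q_1^2 - P$ leaves the algebraic steps of Sects. 5–9 unchanged.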

11. Regularity Hypotheses

As explained in the introduction, our results depend crucially on some regularity hypotheses. In order for $Z$ to exist, we assume $e^{-H(\lambda)} = e^{-Q(\lambda)^2}$ exists and is trace class on $\mathcal H$. We give sufficient conditions to ensure this, as well as to ensure the validity of the results claimed in Sects. 1–9. The content of Sect. 10 requires only minor modification of these hypotheses. We have explored the consequences of these hypotheses in [QHA].

1. The operator $Q$ is self-adjoint on $\mathcal H$, odd with respect to $\gamma$, and $e^{-\beta Q^2}$ is trace class for all $\beta > 0$.

2. For $\lambda \in \Lambda$, where $\Lambda$ is an open interval on the real line, the operator $Q(\lambda)$ can be expressed as a perturbation of $Q$ in the form

$$ Q(\lambda) = Q + W(\lambda). \tag{11.1} $$

Each $W(\lambda)$ is a symmetric operator on the domain $D = C^\infty(Q)$.

3. Let $\lambda$ lie in any compact subinterval $\Lambda' \subset \Lambda$. The inequality

$$ W(\lambda)^2 \le aQ^2 + bI, \tag{11.2} $$

holds as an inequality for forms on $D \times D$. The constants $a < 1$ and $b < \infty$ are independent of $\lambda$ in the compact set $\Lambda' \subset \Lambda$.

4. Let $R = (Q^2 + I)^{-1/2}$. The operator $Z(\lambda) = RW(\lambda)R$ is bounded uniformly for $\lambda \in \Lambda'$, and the difference quotient

$$ \frac{Z(\lambda) - Z(\lambda')}{\lambda - \lambda'} \tag{11.3} $$

converges in norm to a limit as $\lambda' \to \lambda \in \Lambda' \subset \Lambda$.

5. The bilinear form $d_\lambda a$ satisfies the bound

$$ \| R^\alpha\, d_\lambda a\, R^\beta \| < M, \tag{11.4} $$

with a constant $M$ independent of $\lambda$ for $\lambda \in \Lambda'$. Here $\alpha$, $\beta$ are non-negative constants and $\alpha + \beta < 1$.

In certain examples we are interested in the behavior of $J(\lambda, a, g)$ as $\lambda$ tends to the boundary of $\Lambda$. In this case, we may establish the constancy of $J$ with estimates that are weaker than (1–5) at the endpoint of $\Lambda$, by directly proving the existence and continuity of $J$ at the endpoint. We study one such example in [HE], though other types of endpoint singularities are also of interest (often involving a $\lambda \to \infty$ limit).

References

[C1] Connes, Alain: Non-Commutative Geometry. London–New York: Academic Press, 1994
[C2] Connes, Alain: Entire cyclic cohomology of Banach algebras and characters of θ-summable Fredholm modules. K-theory 1, 519–548 (1988)
[HP] Hille, E. and Phillips, R.: Functional Analysis and Semi-groups, Colloquium Publications Vol. 31, Revised Edition, American Mathematical Society, 1957
[QHA] Jaffe, A.: Quantum harmonic analysis and geometric invariants. Adv. in Math. 143, 1–110 (1999)
[HE] Jaffe, A.: The holonomy expansion, index theory, and approximate supersymmetry. Ann. Phys. to appear
[JLO] Jaffe, A., Lesniewski, A. and Osterwalder, K.: Quantum K-theory, I. The Chern character. Commun. Math. Phys. 118, 1–14 (1988)

Communicated by A. Jaffe

Commun. Math. Phys. 209, 13 – 27 (2000)

The Extended Moduli Space of Special Lagrangian Submanifolds

S. A. Merkulov*

Department of Mathematics, University of Glasgow, 15 University Gardens, Glasgow G12 8QW, UK. E-mail: [email protected]

Received: 29 June 1998 / Accepted: 7 June 1999

* Present address: Max-Planck-Institut für Mathematik, Postfach 7280, 53072 Bonn, Germany. E-mail: [email protected]

Abstract: It is well known that the moduli space of all deformations of a compact special Lagrangian submanifold $X$ in a Calabi–Yau manifold $Y$ within the class of special Lagrangian submanifolds is isomorphic to the first de Rham cohomology group of $X$. Reinterpreting the embedding data $X \subset Y$ within the mathematical framework of the Batalin–Vilkovisky quantization, we find a natural deformation problem which extends the above moduli space to the full de Rham cohomology group of $X$.

1. Introduction

Let $Y$ be a Calabi–Yau manifold of complex dimension $m$ with Kähler form $\omega$ and a nowhere vanishing holomorphic $m$-form $\Omega$. A compact real $m$-dimensional submanifold $X \hookrightarrow Y$ is called special Lagrangian if $\omega|_X = 0$ and $\mathrm{Im}\,\Omega|_X = 0$. According to McLean [6], the moduli space of all deformations of $X$ inside $Y$ within the class of special Lagrangian submanifolds is a smooth manifold whose tangent space at $X$ is isomorphic to the first de Rham cohomology group $H^1(X, \mathbb{R})$.

Moduli spaces of special Lagrangian submanifolds are playing an increasingly important role in quantum cohomology and related topics. On physical grounds, Strominger, Yau and Zaslow argued [11] that whenever a Calabi–Yau manifold $Y$ has a mirror partner $\hat Y$, then $Y$ admits a foliation $\rho : Y^{2m} \to B^m$ by special Lagrangian tori $T^m$ and $\hat Y$ is the compactification of the family of dual tori $\hat T^m$ along the fibres of the projection $\rho$ (for the mathematical account of this construction see [8]).

The homological mirror symmetry conjecture, as it was formulated by Kontsevich in [5], states that the bounded derived category $D^b(\hat Y)$ of coherent sheaves on a Calabi–Yau manifold $\hat Y$ is equivalent to the derived category constructed out of the Fukaya category of all Lagrangian submanifolds in the dual Calabi–Yau manifold $Y$. A natural subcategory of $D^b(\hat Y)$ is the category of holomorphic vector bundles, $W \to \hat Y$. Recently Vafa [14] argued, on physical grounds, that the mirror partner of a pair $(\hat Y, W)$, $W$ being a stable holomorphic vector bundle on a Calabi–Yau manifold $\hat Y$, must be a triple $(Y, X, L)$ consisting of a dual Calabi–Yau manifold $Y$, a compact special Lagrangian submanifold $X \hookrightarrow Y$ and a flat unitary line bundle $L$ on $X$. Moreover, this correspondence

$$ (\hat Y, W) \Longleftrightarrow (Y, X, L) $$

comes together with an isomorphism between the moduli space (with typical tangent space $H^1(\hat Y, \mathrm{End}\, W)$) of all stable deformations of the holomorphic vector bundle $W \to \hat Y$, and the moduli space (with typical tangent space $H^1(X, \mathbb{R}) \otimes \mathbb{C}$) associated with McLean's deformations of the embedding $X \hookrightarrow Y$ and deformations of the flat unitary line bundle $L$ on $X$. Actually, Vafa conjectures that much more must be true:

$$ H^*(\hat Y, \mathrm{End}\, W) = H^*(X, \mathbb{R}) \otimes \mathbb{C}. $$

This raises a problem of finding a geometric interpretation of the full de Rham cohomology group of a special Lagrangian submanifold $X \hookrightarrow Y$. Its solution is the main theme of the present paper. By moving into the mathematical realm of Batalin–Vilkovisky quantization, we devise, out of the same data $X \hookrightarrow Y$, a deformation problem whose moduli space has the typical tangent space isomorphic to $H^*(X, \mathbb{R})$, thereby extending McLean's moduli space to the full de Rham group.

The idea is very simple. First, out of $Y$ we construct a real $(2m|2m)$-dimensional supermanifold $\mathcal Y := \Pi\Omega^1 Y$, where $\Omega^1 Y$ is the real cotangent bundle and $\Pi$ denotes the parity change functor. The supermanifold $\mathcal Y$ comes equipped with a complex structure and a nowhere vanishing holomorphic section, $\hat\Omega \in \Gamma(\mathcal Y, \mathrm{Ber}_{holo}(\mathcal Y))$, of the holomorphic Berezinian bundle induced by the holomorphic $m$-form $\Omega$ on $Y$. We note that if $\mathcal X \subset \mathcal Y$ is an $(m|m)$-dimensional real slice, then $\hat\Omega$ restricts to $\mathcal X$ as a global nowhere vanishing section of the bundle $\mathbb{C} \otimes \mathrm{Ber}(\mathcal X)$.

Second, we construct a real $(2m|2m+1)$-dimensional supermanifold $\hat{\mathcal Y} := \mathcal Y \times \mathbb{R}^{0|1}$ and note that it comes canonically equipped with an odd exact contact structure represented by a 1-form $\hat\theta$. We also note that the Kähler form $\omega$ on $Y$ gives rise to an even smooth function $\hat\omega$ on $\hat{\mathcal Y}$ and call an $(m|m)$-dimensional sub-supermanifold $\mathcal X \hookrightarrow \hat{\mathcal Y}$ special Legendrian if the following conditions hold:

$$ \hat\theta|_{\mathcal X} = 0, \qquad \hat\omega|_{\mathcal X} = 0, \qquad \mathrm{Im}\big(\hat\Omega|_{p(\mathcal X)}\big) = 0, $$

where $p$ denotes the natural projection $\hat{\mathcal Y} \to \mathcal Y$.

If $\mathcal X \hookrightarrow \hat{\mathcal Y}$ is special Legendrian, then $\mathcal X_{red} \hookrightarrow Y$ is special Lagrangian. If $X \hookrightarrow Y$ is special Lagrangian (with normal bundle denoted by $N$), then the associated supermanifold $\mathcal X := \Pi N^*$ is a special Legendrian sub-supermanifold of $\hat{\mathcal Y}$. However, the correspondence between the special Lagrangian submanifolds $X \hookrightarrow Y$ and special Legendrian sub-supermanifolds $\mathcal X \hookrightarrow \hat{\mathcal Y}$ is not one-to-one: passing from the Calabi–Yau manifold $Y$ to the associated contact supermanifold $\hat{\mathcal Y}$ brings precisely the right amount of new degrees of freedom to extend McLean's moduli space $H^1(X, \mathbb{R})$ to the full de Rham group $H^*(X, \mathbb{R})$.

Main Theorem. Let $X \hookrightarrow Y$ be a compact special Lagrangian submanifold of a Calabi–Yau manifold and let $\mathcal X = \Pi N^* \hookrightarrow \hat{\mathcal Y}$ be the associated special Legendrian sub-supermanifold of the contact supermanifold. The maximal moduli space $\mathcal M$ of deformations of $\mathcal X$ inside $\hat{\mathcal Y}$ within the class of special Legendrian sub-supermanifolds is a smooth supermanifold whose tangent superspace at $\mathcal X$ is canonically isomorphic to $\Pi H^*(X, \mathbb{R})$.

In view of Vafa's conjectures, it is important to study geometric structures induced on $\mathcal M$ from the original data $X \hookrightarrow Y$ (cf. [3,12]).

The paper is organised as follows. In Sects. 2 and 3 we study extended moduli spaces of general compact submanifolds $X$ in a manifold $Y$. In Sect. 4 we specialize to the case when the ambient manifold $Y$ has a symplectic structure and the submanifolds $X \hookrightarrow Y$ are Lagrangian. In Sect. 5 we consider another special case when $Y$ is a complex manifold equipped with a nowhere-vanishing holomorphic volume form $\Omega$ and $X \hookrightarrow Y$ is a real slice of $Y$ satisfying $\mathrm{Im}\,\Omega|_X = 0$. Finally, in Sect. 6 we combine all the previous results to prove the Main Theorem.

2. Extended Kodaira Moduli Spaces

2.1. Families of compact submanifolds. Let $Y$ and $M$ be smooth manifolds and let $\pi_1 : Y \times M \to Y$ and $\pi_2 : Y \times M \to M$ be the natural projections. A family of compact submanifolds of the manifold $Y$ with the moduli space $M$ is a submanifold $F \hookrightarrow Y \times M$ such that the restriction, $\nu$, of the projection $\pi_2$ on $F$ is a proper regular map. Thus the family $F$ has the structure of a double fibration

$$Y \xleftarrow{\ \mu\ } F \xrightarrow{\ \nu\ } M,$$

where $\mu \equiv \pi_1|_F$. For each $t \in M$, there is an associated compact submanifold $X_t$ in $Y$ which is said to belong to the family $F$. Sometimes we use a more explicit notation $\{X_t \hookrightarrow Y \mid t \in M\}$ to denote the family $F$ of compact submanifolds.

The family $F$ is called maximal if for any other family $\tilde F \hookrightarrow Y \times \tilde M$ such that $\mu \circ \nu^{-1}(t) = \tilde\mu \circ \tilde\nu^{-1}(\tilde t)$ for some points $t \in M$ and $\tilde t \in \tilde M$, there is a neighbourhood $\tilde U \subset \tilde M$ of the point $\tilde t$ and a smooth map $f : \tilde U \to M$ such that $\tilde\mu \circ \tilde\nu^{-1}(\tilde t') = \mu \circ \nu^{-1}(f(\tilde t'))$ for every $\tilde t' \in \tilde U$.

Similar definitions can be made in the category of complex manifolds, category of (complex) supermanifolds and category of analytic (super)spaces.

2.2. The Kodaira map. Consider a 1-parameter family, $F \hookrightarrow Y \times M$, of compact sub(super)manifolds in a (super)manifold $Y$, where $M = \mathbb{R}^{1|0}$ or $\mathbb{R}^{0|1}$ with the natural coordinate denoted by $t$ (such a family is often called a 1-parameter deformation of the sub(super)manifold $X = \mu \circ \nu^{-1}(0)$). There is a finite covering $\{U_i\}$ of $F$ such that the restriction to each $U_i$ of the ideal sheaf $J_F$ of $F \hookrightarrow Y \times M$ is finitely generated, say $J_F|_{U_i} = \langle f_i^\alpha \rangle$, $\alpha = 1, \ldots, \mathrm{codim}\, F$. It is easy to see that the family $\{\partial f_i^\alpha/\partial t \bmod J_F\}$ defines a global section of the normal bundle, $N_F$, of the embedding $F \hookrightarrow Y \times M$ and hence gives rise to a morphism of sheaves,

$$k : TM \longrightarrow \nu^0_* N_F, \qquad \frac{\partial}{\partial t} \longmapsto \Bigl\{ \frac{\partial f_i^\alpha}{\partial t} \bmod J_F \Bigr\}.$$


S. A. Merkulov

This morphism, or rather its restriction kt : Tt M −→ H0 (Xt , Nt ), where Nt ' NF |ν −1 (t) is the normal bundle of Xt ,→ Y , is called the Kodaira map. If M is the 1-tuple point R[t]/t 2 , then the family F is called an infinitesimal deformation of X = µ ◦ ν −1 (0) in Y . The Kodaira map establishes a one-to-one correspondence between all possible infinitesimal deformations of X inside Y and the vector superspace H0 (X, N). Often in this paper we shall be interested in deformations of X inside Y within a class of special (say, complex, Lagrangian, Legendrian, etc.) submanifolds. The associated set of all possible infinitesimal deformations of X is a vector subspace of H0 (X, N) called the Zariski tangent space at X to the moduli space of (special) compact submanifolds. Note that we do not require that any element of the Zariski tangent space at X necessarily exponentiates to a genuine 1-parameter deformation of X. Put another way, the Zariski tangent space to the moduli space M makes sense even when M does not exist as a smooth manifold!
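As an illustration of the Kodaira map, here is a minimal worked example. The ambient manifold, the family and the function $g$ are hypothetical choices made only for this sketch and do not come from the paper: take $Y = S^1 \times \mathbb{R}$ with coordinates $(x, y)$, $M = \mathbb{R}^{1|0}$, a fixed smooth periodic function $g$, and let $F$ be the zero set of $f = y - t\,g(x)$.

```latex
% Hypothetical family of graphs X_t = { y = t g(x) } in Y = S^1 x R.
\[
  J_F = \langle f \rangle, \qquad f(x,y,t) = y - t\,g(x), \qquad
  X_t = \{\, y = t\,g(x) \,\}.
\]
% TF is spanned by  d/dx + t g'(x) d/dy  and  d/dt + g(x) d/dy,
% so modulo TF the vector d/dt is equivalent to -g(x) d/dy.
\[
  k\Bigl(\frac{\partial}{\partial t}\Bigr)
  = \frac{\partial f}{\partial t} \bmod J_F = -\,g(x),
  \qquad
  \frac{\partial}{\partial t} \equiv -\,g(x)\,\frac{\partial}{\partial y}
  \pmod{TF}.
\]
```

Under these assumptions the Kodaira map at $t = 0$ sends $\partial/\partial t$ to the section of the normal bundle of the circle $X_0 = \{y = 0\}$ represented by $g$, up to the sign fixed by the chosen identification of $N_F$ with the trivial bundle. Every smooth periodic $g$ arises this way, which is consistent with the identification of infinitesimal deformations with $H^0(X, N)$.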

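2.3. From manifolds to supermanifolds. Let $X \hookrightarrow Y$ be a compact submanifold of a smooth manifold $Y$. The associated exact sequence

$$0 \longrightarrow TX \longrightarrow TY|_X \longrightarrow N \longrightarrow 0$$

implies the canonical map

$$\Lambda^* TY|_X \longrightarrow \Lambda^* N \longrightarrow 0$$

which in turn implies the canonical embedding, $\mathcal{X} \hookrightarrow \mathcal{Y}$, of the associated supermanifolds $\mathcal{X} := (X, \Lambda^* N)$ and $\mathcal{Y} := (Y, \Lambda^* TY)$ (cf. [9,10]). In more geometrical terms, $\mathcal{X} \simeq \Pi N^*$, $\mathcal{Y} \simeq \Pi\Omega^1 Y$ and $\mathcal{X} \hookrightarrow \mathcal{Y}$ corresponds just to the natural inclusion, $\Pi N^* \subset \Pi\Omega^1 Y|_X$.

The supermanifold $\mathcal{Y}$ comes equipped canonically with an even Liouville 1-form $\theta$ defined, in a natural local coordinate system $(x^a, \psi_a \simeq \Pi\partial/\partial x^a)$ on $\mathcal{Y}$, as follows

$$\theta = \sum_{a=1}^{n} dx^a\, \psi_a, \qquad n = \dim Y.$$

The odd two-form

$$\eta := d\theta = -\sum_{a=1}^{n} dx^a \wedge d\psi_a$$

is non-degenerate and hence equips $\mathcal{Y}$ with an odd symplectic structure. A $(p|n-p)$-dimensional sub-supermanifold $\mathcal{X} \hookrightarrow \mathcal{Y}$ is called Lagrangian if $\eta|_{\mathcal{X}} = 0$ (this implies, in particular, that $\theta|_{\mathcal{X}}$ is closed). It is called exact Lagrangian if $\theta|_{\mathcal{X}}$ is an exact 1-form on $\mathcal{X}$.

Lemma 2.3.1. For any submanifold $X \hookrightarrow Y$, the associated sub-supermanifold $\mathcal{X} \hookrightarrow \mathcal{Y}$ is exact Lagrangian.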
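The sign in $\eta = d\theta$ and the vanishing of $\theta$ on $\Pi N^*$ can both be checked in coordinates. The sketch below assumes that $d$ acts as a graded derivation picking up the sign $(-1)^{\deg}$ of the form it passes, and that adapted coordinates are chosen with $p = \dim X$, so that $\Pi N^*$ is cut out by $x^a = 0$ for $a > p$ and $\psi_b = 0$ for $b \le p$.

```latex
% Step 1: exterior derivative of the Liouville form.
\[
  d\theta
  = \sum_{a=1}^{n} d\bigl(dx^a\,\psi_a\bigr)
  = \sum_{a=1}^{n}\Bigl( d(dx^a)\,\psi_a - dx^a\wedge d\psi_a \Bigr)
  = -\sum_{a=1}^{n} dx^a\wedge d\psi_a .
\]
% Step 2: restriction to \Pi N^*, where x^a = 0 (a > p) and \psi_b = 0 (b <= p).
\[
  \theta\big|_{\Pi N^*}
  = \sum_{b\le p} dx^b\,\underbrace{\psi_b}_{=0}
  + \sum_{a>p} \underbrace{dx^a}_{=0}\,\psi_a
  = 0 .
\]
```

Since $\theta$ restricts to zero, it is in particular exact on $\Pi N^*$, and $\eta = d\theta$ restricts to zero as well.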


Proof. Assume $\dim X = p$. We can always choose a local coordinate system $(U, x^a)$ in a tubular neighbourhood $U$ of (a part of) $X$ inside $Y$ in such a way that $X \cap U = \{x^a = 0,\ a = p+1, \ldots, n\}$. Then the normal bundle $N$ of $X \hookrightarrow Y$ is locally generated by $\partial/\partial x^a$ with $a = p+1, \ldots, n$. Hence $\mathcal{X} \hookrightarrow \mathcal{Y}$ is locally given by the equations $x^a = 0$, $\psi_b = 0$, where $a = p+1, \ldots, n$ and $b = 1, \ldots, p$. It is now obvious that $\theta|_{\mathcal{X}} = 0$. Finally, $\dim \mathcal{X} = (p, n-p)$. $\Box$

Remark 2.3.2. It also follows from the above proof that, for any submanifold $X \hookrightarrow Y$ with the normal bundle $N_X$, $\theta|_{\Pi N_X^*} = 0$. It is not hard to check that the converse is also true: if $\mathcal{X} \hookrightarrow \mathcal{Y}$ is a $(p|n-p)$-dimensional sub-supermanifold such that $\theta|_{\mathcal{X}} = 0$, then $\mathcal{X} = \Pi N_X^*$ for some submanifold $X \hookrightarrow Y$.

2.4. The extended Kodaira map. Let $\mathcal{X}$ be a Lagrangian sub-supermanifold of a supermanifold $\mathcal{Y}$ equipped with an odd symplectic structure $\eta$. Then, as usual, one gets an odd isomorphism

$$j : \Omega^1\mathcal{X} \xrightarrow{\ \eta^{-1}\ } \mathcal{N},$$

where $\mathcal{N}$ is the normal bundle of $\mathcal{X} \hookrightarrow \mathcal{Y}$. In particular, there is a monomorphism of sheaves,

$$i : \mathcal{O}_{\mathcal{X}}/\mathbb{R} \xrightarrow{\ j \circ d\ } \mathcal{N},$$

where $d$ is the exterior derivative.

Consider now a one (even or odd) parameter family of compact exact Lagrangian sub-supermanifolds of the supermanifold $\mathcal{Y} = \Pi\Omega^1 Y$, i.e. a double fibration

$$\mathcal{Y} \xleftarrow{\ \mu\ } F \xrightarrow{\ \nu\ } M,$$

with $\nu$ being a proper submersion and $\mathcal{X}_t := \mu \circ \nu^{-1}(t)$ being a compact exact Lagrangian sub-supermanifold of $(\mathcal{Y}, \eta)$ for every $t \in M \subset \mathbb{R}^{1|0}$ or $\mathbb{R}^{0|1}$.

Lemma 2.4.1. For the family $\{\mathcal{X}_t \hookrightarrow \mathcal{Y} \mid t \in M\}$ as above the Kodaira map $k_t : T_t M \to H^0(\mathcal{X}_t, \mathcal{N}_t)$ factors as follows

$$k_t : T_t M \xrightarrow{\ k'\ } H^0(\mathcal{X}_t, \mathcal{O}_{\mathcal{X}_t})/\mathbb{R} \xrightarrow{\ i\ } H^0(\mathcal{X}_t, \mathcal{N}_t).$$

Proof. Since $\mu^*(\eta)|_{\mathcal{X}_t} = 0$, we have $\mu^*(\eta) = A \wedge dt$ for some 1-form $A$ on $F$ whose restriction to $\nu^{-1}(t)$ represents, under the isomorphism $j : \Omega^1\mathcal{X}_t \xrightarrow{\eta^{-1}} \mathcal{N}_t$, the normal vector field $k_t(\partial/\partial t)$. On the other hand, $\mu^*(\theta) = \Psi\, dt + dB$, for some smooth functions $\Psi$ and $B$ on $F$ with parities $\tilde\Psi = \tilde t + 1$ and $\tilde B = 1$ (see footnote 1). Thus $A = d\Psi$, completing the proof. $\Box$

Corollary 2.4.2. Let $\mathcal{X}$ be a compact exact Lagrangian sub-supermanifold of an odd symplectic supermanifold $\mathcal{Y}$. Then the Zariski tangent space at $\mathcal{X}$ to the moduli space of all deformations of $\mathcal{X}$ within the class of exact Lagrangian sub-supermanifolds is (odd) isomorphic to $H^0(\mathcal{X}, \mathcal{O}_{\mathcal{X}})/\mathbb{R}$.

Footnote 1: Here and elsewhere $\tilde{\ }$ stands for the parity of the kernel symbol.
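The step "Thus $A = d\Psi$" in the proof of Lemma 2.4.1 can be spelled out as follows. This is a sketch using only $\eta = d\theta$, $d^2 = 0$, and the fact that pullback commutes with $d$.

```latex
\[
  \mu^*(\eta) = \mu^*(d\theta) = d\,\mu^*(\theta)
  = d\bigl(\Psi\,dt + dB\bigr)
  = d\Psi\wedge dt + d^2B
  = d\Psi\wedge dt .
\]
% Comparing with \mu^*(\eta) = A \wedge dt:
\[
  (A - d\Psi)\wedge dt = 0 ,
\]
% so A and d\Psi differ at most by a multiple of dt,
% which restricts to zero on each fibre \nu^{-1}(t).
```

Hence the restrictions of $A$ and $d\Psi$ to $\nu^{-1}(t)$ agree, and the normal vector field $k_t(\partial/\partial t)$ is the image under $i = j \circ d$ of the class of $\Psi|_{\nu^{-1}(t)}$ modulo constants.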


Definition 2.4.3. If $\mathcal{Y} = \Pi\Omega^1 Y$ and $\mathcal{X}_t \simeq \Pi N_t^*$, where $N_t$ is the normal bundle of some submanifold $X_t \hookrightarrow Y$, then $H^0(\mathcal{X}_t, \mathcal{O}_{\mathcal{X}_t})/\mathbb{R} \simeq H^0(X_t, \Lambda^* N_t)/\mathbb{R}$. The associated even map

$$\Pi k' : T_t M \longrightarrow \Pi\bigl(H^0(X_t, \Lambda^* N_t)/\mathbb{R}\bigr) \equiv \Pi H^0(X_t, \Lambda^* N_t)/\mathbb{R}^{0|1}$$

is called the extended Kodaira map.

2.5. Extended Kodaira moduli space. Kodaira [4] proved that if $X \hookrightarrow Y$ is a compact complex submanifold of a complex manifold with $H^1(X, N) = 0$, then there exists a maximal moduli space $M$ parametrizing all possible deformations of $X$ inside $Y$ whose tangent space at the point $X$ is isomorphic to $H^0(X, N)$. With the same data $X \hookrightarrow Y$ one associates a pair $\mathcal{X} = \Pi N^* \hookrightarrow \mathcal{Y} = \Pi\Omega^1 Y$ and asks for all possible holomorphic deformations of $\mathcal{X}$ inside $(\mathcal{Y}, \eta)$ within the class of complex exact Lagrangian sub-supermanifolds.

Theorem 2.5.1. Let $X \hookrightarrow Y$ be a compact complex submanifold of a complex manifold and $\mathcal{X} \hookrightarrow \mathcal{Y}$ the associated compact complex exact Lagrangian sub-supermanifold. If $H^1(X, \Lambda^k N) = 0$ for all $k \ge 1$, then there exists a maximal moduli space $\mathcal{M}$, called the extended Kodaira moduli space, which parametrizes all possible deformations of $\mathcal{X}$ inside $(\mathcal{Y}, \eta)$ within the class of complex exact Lagrangian sub-supermanifolds. Its tangent space at the point $\mathcal{X}$ is canonically isomorphic to $\sum_{k \ge 1} H^0(X, \Lambda^k N)$, with the following $\mathbb{Z}_2$-grading: $[T_{\mathcal{X}}\mathcal{M}]_0 = \sum_{k \in 2\mathbb{Z}+1} H^0(X, \Lambda^k N)$ and $[T_{\mathcal{X}}\mathcal{M}]_1 = \sum_{k \in 2\mathbb{Z}} H^0(X, \Lambda^k N)$.

Proof. The proof is routine, cf. [4,7].

Example 2.5.2. Let $X$ be a projective line $\mathbb{CP}^1$ embedded into a complex 3-fold $Y$ with normal bundle $N = \mathcal{O}(1) \oplus \mathcal{O}(1)$. In this case the Kodaira moduli space is a complex 4-fold $M$ canonically equipped, according to Penrose, with a self-dual conformal structure, while the extended Kodaira moduli space $\mathcal{M}$ is a $(4|3)$-dimensional supermanifold isomorphic to $\Pi\Omega^2_+ M$, where $\Omega^2_+ M$ is the bundle of self-dual 2-forms on $M$.

3. Restoring the Lost Constants

3.1. Odd contact structure. Let $X$ be a compact submanifold of a manifold $Y$. It is shown in Sect. 2 that the Zariski tangent space to the extended moduli space of Lagrangian deformations of $\mathcal{X} = \Pi N^*$ inside $\mathcal{Y} = \Pi\Omega^1 Y$ is $(\Pi H^0(X, \Lambda^* N))/\mathbb{R}^{0|1}$. One can easily restore the lost constants $\mathbb{R}^{0|1}$ by extending $\mathcal{Y}$ to an odd contact supermanifold $\hat{\mathcal{Y}}$ and studying Legendrian families of compact sub-supermanifolds in $\hat{\mathcal{Y}}$.

Consider $\hat{\mathcal{Y}} := \mathcal{Y} \times \mathbb{R}^{0|1}$ and define a 1-form on $\hat{\mathcal{Y}}$,

$$\hat\theta = d\varepsilon + p^*(\theta),$$

where $p : \hat{\mathcal{Y}} \to \mathcal{Y}$ is the natural projection and $\varepsilon$ is the standard coordinate on $\mathbb{R}^{0|1}$. The form $\hat\theta$ defines an odd contact structure on $\hat{\mathcal{Y}}$.

Lemma 3.1.1. For any submanifold $X \hookrightarrow Y$, the associated sub-supermanifold $\mathcal{X} = \Pi N^* \hookrightarrow \hat{\mathcal{Y}}$ is Legendrian with respect to the odd contact structure $\hat\theta$.

Proof. $\hat\theta|_{\mathcal{X}} = d\varepsilon|_{\mathcal{X}} + p^*(\theta)|_{\mathcal{X}} = 0 + 0 = 0$. $\Box$
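The role of the lost constants can be seen in the dimension count behind Example 2.5.2. The following is a sketch in the holomorphic setting of that example, counting complex dimensions with the grading of Theorem 2.5.1; the only inputs are the standard cohomology of line bundles on $\mathbb{CP}^1$.

```latex
% N = O(1) + O(1) on CP^1; standard facts: h^0(O(n)) = n+1, h^1(O(n)) = 0 for n >= 0.
\[
  \Lambda^1 N = \mathcal{O}(1)\oplus\mathcal{O}(1), \qquad
  \Lambda^2 N = \mathcal{O}(2), \qquad
  H^1(\mathbb{CP}^1,\Lambda^k N) = 0 \quad (k = 1, 2).
\]
% Even part: odd k.  Odd part: even k >= 1.
\[
  \dim\,[T_{\mathcal X}\mathcal M]_0 = h^0(\Lambda^1 N) = 2 + 2 = 4, \qquad
  \dim\,[T_{\mathcal X}\mathcal M]_1 = h^0(\Lambda^2 N) = 3 .
\]
% The summand omitted from the sum over k >= 1:
\[
  h^0(\Lambda^0 N) = h^0(\mathcal{O}) = 1 .
\]
```

This reproduces the $(4|3)$ of Example 2.5.2. The omitted $k = 0$ summand consists of the constants and is one-dimensional; it is the counterpart of the quotient by $\mathbb{R}^{0|1}$ that the passage to the contact supermanifold is designed to undo.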


Thus one can associate with data $X \hookrightarrow Y$ the moduli space $\mathcal{M}$ of all possible deformations of $\mathcal{X}$ inside $\hat{\mathcal{Y}}$ within the class of Legendrian sub-supermanifolds.

Proposition 3.1.2. The Zariski tangent space to $\mathcal{M}$ at $\mathcal{X}$ is $\Pi H^0(X, \Lambda^* N)$.

Proof. If

$$\hat{\mathcal{Y}} \xleftarrow{\ \hat\mu\ } F \xrightarrow{\ \nu\ } M \subset \mathbb{R}^{1|0} \text{ or } \mathbb{R}^{0|1}$$

is a 1-parameter family of compact Legendrian sub-supermanifolds, then

$$\hat\mu^*(\hat\theta) = \Psi\, dt$$

for some $\Psi \in \Gamma(F, \mathcal{O}_F)$ with $\tilde\Psi = \tilde t + 1$ (compare this with $\mu^*(\eta) = d\Psi \wedge dt$ in 2.4.1). The restriction of $\Psi$ to $\nu^{-1}(t)$ represents the image of $\partial/\partial t$ under the extended Kodaira map. $\Box$

Remark 3.1.3. If $\{\mathcal{X}_t \hookrightarrow \hat{\mathcal{Y}} \mid t \in M\}$ is a family of compact Legendrian sub-supermanifolds, then $\{p(\mathcal{X}_t) \hookrightarrow \mathcal{Y} \mid t \in M\}$ is a family of exact Lagrangian sub-supermanifolds.

3.2. An important observation. If $(Y, \omega)$ is a symplectic manifold and $X \hookrightarrow Y$ a Lagrangian submanifold with respect to $\omega$, then the normal bundle $N$ is canonically isomorphic to $\Omega^1 X$. Thus the associated extended Zariski tangent space is isomorphic to $\Omega^* X$.

4. Even + Odd Symplectic Structures

4.1. Isotropic Lagrangian sub-supermanifolds. In this section we assume that $Y$ is an even $2m$-dimensional symplectic manifold. The symplectic 2-form $\omega$ on $Y$ gives rise to a global even function $\hat\omega$ on the associated odd symplectic supermanifold $\mathcal{Y} = \Pi\Omega^1 Y$ (and hence on $\hat{\mathcal{Y}} = \mathcal{Y} \times \mathbb{R}^{0|1}$) defined, in a natural local coordinate system $(x^a, \psi_a = \Pi\partial/\partial x^a)$, as follows:

$$\hat\omega = \sum_{a,b=1}^{2m} \omega^{ab}(x)\,\psi_a \psi_b,$$

where $\omega^{ab}(x)$ is the matrix inverse to the matrix $\omega_{ab}(x)$ of components of $\omega$ in the basis $dx^a$. The latter function gives rise to an odd Hamiltonian vector field $Q$ on $\mathcal{Y}$ (or a contact vector field $Q$ on $\hat{\mathcal{Y}}$) defined by

$$Q \,\lrcorner\, \eta = d\hat\omega,$$

and is given, in a natural local coordinate system, by

Q=

X a,b

ωab ψb

X ∂ ∂ωbc ce ∂ + w ad ω ψe . a a ∂x ∂x ∂ψb a,b,c,d,e

Differential forms on $Y$ can be identified with smooth functions on the supermanifold $\Pi TY$, $\Gamma(Y, \Omega^* Y) = \Gamma(\Pi TY, \mathcal{O}_{\Pi TY})$. Under this identification the de Rham differential $d : \Omega^* Y \to \Omega^* Y$ corresponds to an odd vector field $d$ on $\Pi TY$ satisfying $d^2 = 0$. The even symplectic form $\omega$ establishes an isomorphism $\phi : \Pi TY \to \mathcal{Y}$ and hence maps $d$ into an odd vector field $\phi_* d$ on $\mathcal{Y}$ which, as it is not hard to check [1], coincides precisely with $Q$. This observation implies, in particular, that $Q^2 = 0$ and $Q\hat\omega = 0$.
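The two consequences $Q^2 = 0$ and $Q\hat\omega = 0$ can be checked by hand in the simplest situation. The sketch below assumes Darboux coordinates on $Y$, so that $\omega^{ab}$ is constant, and assumes that in this case $Q$ reduces to the first term of its coordinate expression, $Q = \sum_{a,b}\omega^{ab}\psi_b\,\partial/\partial x^a$, the remaining term being proportional to derivatives of $\omega$.

```latex
% Assumption: \omega^{ab} constant (Darboux coordinates),
%             Q = \sum_{a,b} \omega^{ab} \psi_b \, \partial/\partial x^a .
\[
  Q\hat\omega
  = \sum_{a,b}\omega^{ab}\psi_b\,\frac{\partial}{\partial x^a}
    \Bigl(\sum_{c,d}\omega^{cd}\,\psi_c\psi_d\Bigr) = 0 ,
\]
% because \hat\omega does not depend on x when \omega^{cd} is constant.
\[
  Q^2
  = \sum_{a,b,c,d}\omega^{ab}\omega^{cd}\,\psi_b\psi_d\,
    \frac{\partial^2}{\partial x^a\,\partial x^c} = 0 ,
\]
% because exchanging (a,b) <-> (c,d) leaves \omega^{ab}\omega^{cd} and the
% second derivative unchanged but flips the sign of \psi_b\psi_d.
```

This is the coordinate shadow of $d^2 = 0$ on $\Pi TY$; the general case is the statement quoted from [1].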


Definition 4.1.1. A Lagrangian (resp. Legendrian) sub-supermanifold of (Y, η) (resp. ˆ b ω|p(X ) = 0). (Y, θ )) is called ω-isotropic if b ω|X = 0 (resp. b If X ,→ Y is ω-isotropic, then Q|X ∈ 0(X , T X ). Lemma 4.1.2. Let (Y, ω) be a symplectic manifold and let Y := 51 Y . (i) If X is a compact (m|m)-dimensional ω-isotropic Lagrangian submanifold (Y, η), then Xred is a compact Lagrangian submanifold of (Y, ω). (ii) Let X be a compact Lagrangian submanifold of (Y, ω). Then the associated compact (m|m)-dimensional sub-supermanifold X := 5NX∗ ,→ Y is ω-isotropic. Moreover, under the isomorphism 0(X , OX ) = 0(X, ∗ X) the vector field Q|X goes into the usual de Rham differential d on X. Proof. is very straightforward when one uses Darboux coordinates. 4.2. Normal exponential map. Let X be an r-dimensional compact manifold of an ndimensional manifold Y and let X = 5N ∗ be the associated Lagrangian sub-supermanifold of the odd symplectic supermanifold (Y = 51 Y, η). Lemma 4.2.1. There exist • a tubular neighbourhood U of X in Y, • a tubular neighbourhood V of 0X in 51 X , where 0X ' X is the zero section of the bundle 51 X → X , • a diffeomorphism exp : V → U, such that (i) exp |0X : X → X is the identity map, and (ii) exp∗ (θ) − θ0 = dF for some F ∈ 0(X , OX ), where θ is the Liouville form on 51 Y and θ0 is the Liouville form on 51 X . In particular, exp∗ (η) = η0 , where η0 is the natural odd symplectic structure on 51 X . Proof. There is a tubular neighbourhood U of Xred in Y which can be identified via the normal exponential map with a tubular neighbourhood V ⊂ N of the zero section of the normal bundle N of X in Y . These neighbourhoods and the exponential map have a canonical extension to the map exp : U → V which has the property (i). We only have to check the validity of (ii). Let (x α , x α˙ ), α = 1, . . . , r, α˙ = r + 1, . . . , n, be a local trivialisation of N, where x α are local coordinates on the base of N and x α˙ , are the fibre coordinates. 
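In the associated local coordinate system $(x^\alpha, x^{\dot\alpha}, \psi_\alpha := \Pi\partial/\partial x^\alpha, \psi_{\dot\alpha} := \Pi\partial/\partial x^{\dot\alpha})$ on $\mathcal{V} \subset \Pi\Omega^1\mathcal{X}$ the zero section $0_{\mathcal{X}}$ is given by the equations $x^{\dot\alpha} = \psi_\alpha = 0$. We have

$$\exp^*(\theta) = dx^\alpha\, \Pi\frac{\partial}{\partial x^\alpha} + dx^{\dot\alpha}\, \Pi\frac{\partial}{\partial x^{\dot\alpha}} = dx^\alpha \psi_\alpha + dx^{\dot\alpha} \psi_{\dot\alpha},$$

and

$$\theta_0 = dx^\alpha\, \Pi\frac{\partial}{\partial x^\alpha} + d\psi_{\dot\alpha}\, \Pi\frac{\partial}{\partial \psi_{\dot\alpha}} = dx^\alpha \psi_\alpha - d\psi_{\dot\alpha}\, x^{\dot\alpha}.$$

Hence $\exp^*(\theta) - \theta_0 = d(\psi_{\dot\alpha} x^{\dot\alpha})$. Since $\psi_{\dot\alpha} x^{\dot\alpha}$ is an invariant, the statement follows. $\Box$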


Remark 4.2.2. The above lemma establishes a one-to-one correspondence between exact Lagrangian sub-supermanifolds nearby to $\mathcal{X}$ and global exact differential forms on $\mathcal{X}$. Note, however, that this correspondence is not canonical but depends on the choice of the normal exponential map $\exp : \mathcal{V} \to \mathcal{U}$. If $f$ is a global odd section of $\mathcal{O}_{\mathcal{X}}$ (such that $df \in \mathcal{V} \subset \Pi\Omega^1\mathcal{X}$) and $\mathcal{X}_{df} \hookrightarrow \mathcal{Y}$ is the associated Lagrangian sub-supermanifold, then we have a diffeomorphism

$$\exp_{df} : \mathcal{X} \xrightarrow{\ df\ } \mathcal{V} \xrightarrow{\ \exp\ } \mathcal{X}_{df}.$$

Consider now a particular case when $Y$ is a $2m$-dimensional symplectic manifold $(Y, \omega)$ and $X \hookrightarrow Y$ is a compact Lagrangian submanifold with respect to $\omega$. In this case the normal bundle $N$ of $X$ in $Y$ is isomorphic to $\Omega^1 X$ and hence the total space of $N$ is naturally a symplectic manifold implying that $\Pi\Omega^1\mathcal{X}$ (with $\mathcal{X} = \Pi N^*$) comes canonically equipped with an odd vector field $Q_0$ such that $Q_0^2 = 0$ (cf. Subsect. 4.1).

Since the normal exponential map N ⊃ V −→ U ⊂ Y can be chosen to be a symplecexp tomorphism, the associated extended exponential map 51 X ⊃ V −→ U ⊂ 51 Y can be chosen to satisfy the additional property exp∗ (Q0 ) = Q. Note also that the isomorphism N = 1 X implies X = 5T X which in turn implies 0(X , OX ) = ∗ X. Then we have Lemma 4.2.3. For any (U, V, exp) as above and any exact Lagrangian sub-supermaniω|Xdf ) ∈ 0(X , OX ) = ∗ X defines a closed fold Xdf ,→ U, the function exp∗df (b (non-homogeneous, in general) differential form on X. Proof. Let φdf : 51 X → 51 X be a translation by df along the fibres of the projection 51 X → X . In the natural coordinates on 51 X we have    2 X ∂f X ∂ f  ∂  ω) =  + ψβ˙ ˙ Q0 − (φdf )∗ Q0 exp∗ (b α β ∂ψ ∂x ∂x α˙ ∂x α˙ α β˙  2 X ∂ f ∂ X − ψβ˙ α β ψγ˙ ψγ ∂x ∂x ∂ψα γ α,β˙

=

X α, ˙ β˙

ψα˙ ψβ˙

∂ 2f ∂x α ∂x β

= 0, and hence

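$$Q_0|_{\mathcal{X}}\, \exp_{df}^*(\hat\omega|_{\mathcal{X}_{df}}) = Q_0|_{\mathcal{X}} \bigl[\phi_{df}^* \circ \exp^*(\hat\omega)\bigr]\big|_{\mathcal{X}} = \bigl[\phi_{df}^* \circ Q_0 \exp^*(\hat\omega)\bigr]\big|_{\mathcal{X}} = \bigl[\phi_{df}^* \circ \exp^*(Q\hat\omega)\bigr]\big|_{\mathcal{X}} = 0.$$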


Then the statement follows from the fact that under the isomorphism 0(X , OX ) = ∗ X the vector field Q0 |X ∈ 0(X , T X ) goes into the de Rham differential on ∗ X (cf. Lemma 4.1.2(ii)). u t 4.3. Moduli space of isotropic sub-supermanifolds. Given a compact Lagrangian submanifold X of a symplectic manifold (Y, ω). With these data one may associate the extended moduli space M of all possible deformations of X = 5N ∗ inside Yˆ within the class of Legendrian, ω-isotropic sub-supermanifolds. Theorem 4.3.1. The Zariski tangent space to M is 5H0 (X, ∗ Xclosed ), where ∗ Xclosed is the sheaf of closed differential forms on X. b ∈ M} be a 1-parameter family of ω-isotropic Legendrian subProof. Let {Xt ,→ Y|t manifolds, and let µ

ν

Y ←− F −→ M, be an associated 1-parameter family of Lagrangian, ω-isotropic sub-supermanifolds. The vector field Vf on Y gives rise to a vector field on Y × M (denoted by the same letter) which is tangent to F ,→ Y × M. We have Vf |F y µ∗ (η) = (Vf |F y d9) ∧ dt, implying

µ∗ (df ) = (Vf |F 9)dt.

Since $\mu^*(df) = d(f|_F) = 0$, we get $V_f|_F \Psi = 0$. Finally, the required statement follows from Lemma 4.1.2(ii), which says that $V_f|_F$ is essentially the de Rham differential. $\Box$

5. Moduli Spaces of Special Real Slices

5.1. Batalin–Vilkovisky structures. Let $\mathcal{Y}$ be an $(n|n)$-dimensional compact supermanifold equipped with an odd symplectic form $\eta$ and an even nowhere-vanishing section $\mu$ of the Berezinian bundle $\mathrm{Ber}(\mathcal{Y})$. Such data have been extensively studied by A. S. Schwarz in [9,10] in the context of Batalin–Vilkovisky quantization.

The volume form $\mu$ induces the Berezin integral, $\int f \mu$, on smooth functions $f$ on $\mathcal{Y}$. In particular, $\mu$ gives rise to a divergence operator $\mathrm{div}\, V$ on smooth vector fields $V$ on $\mathcal{Y}$ which can be characterized by the formula [2]

$$\int (\mathrm{div}\, V)\, f \mu = -\int V(f)\, \mu.$$

If $x^a$ is a local coordinate system on $\mathcal{Y}$ and $D^*(dx^a)$ the associated local basis of $\mathrm{Ber}(\mathcal{Y})$, then $\mu = \rho(x) D^*(dx^a)$ for some even nowhere-vanishing function $\rho(x)$ and

$$\mathrm{div}\, V = \frac{1}{\rho}\, (-1)^{\tilde a (1 + \tilde V)}\, \frac{\partial (V^a \rho)}{\partial x^a},$$

where $\tilde a$ is the parity of $x^a$ and $V^a$ are the components of $V$ in the basis $\partial/\partial x^a$. Another possible definition, which also works in the holomorphic category, is

$$\mathrm{div}\, V = \frac{L_V \mu}{\mu},$$
where $L_V$ stands for the Lie derivative along the vector field $V$. In particular, if $V_f$ is the hamiltonian vector field on $\mathcal{Y}$ associated to a smooth function $f \in \Gamma(\mathcal{Y}, \mathcal{O}_{\mathcal{Y}})$, then one defines a second order operator,

$$\Delta f := \frac{1}{2}\, \mathrm{div}\, V_f.$$
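For orientation, here is what this operator looks like in the simplest case. The sketch assumes Darboux coordinates $(x^a, \psi_a)$ for $\eta$ in which the density of the volume form is constant, $\rho \equiv 1$; the overall sign depends on conventions for Hamiltonian vector fields and is left open.

```latex
% Assumption: Darboux coordinates for \eta and \rho = 1 (flat Berezinian).
\[
  \Delta = \pm \sum_{a=1}^{n} \frac{\partial^2}{\partial x^a\,\partial\psi_a},
\]
\[
  \Delta^2
  = \sum_{a,b}
    \frac{\partial^2}{\partial x^a\,\partial x^b}\,
    \frac{\partial^2}{\partial\psi_a\,\partial\psi_b}
  = 0 ,
\]
% since the even derivatives are symmetric in (a,b) while the odd
% derivatives \partial/\partial\psi_a anticommute.
```

In this flat model the condition $\Delta^2 = 0$ therefore holds automatically; for a general density $\rho$ it is a genuine restriction on the pair $(\eta, \mu)$.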

Note that this operator depends solely on µ and η. The situations when 12 = 0 are of special interest in the context of Batalin–Vilkovisky quantization. The data (Y, η, µ) with property 12 = 0 are sometimes called SP -manifolds [9,10] or Batalin–Vilkovisky supermanifolds [2]. The structure (Y, η, µ) which arises in the context of Calabi–Yau manifolds does actually satisfy the requirement 12 = 0, see Sect. 6. 5.2. Integration on Lagrangian sub-supermanifolds. Let Y again be an (n|n)-dimensional compact oriented2 supermanifold equipped with an odd symplectic form η and an even nowhere-vanishing section of the Berezinian bundle Ber (Y), and let X ,→ Y be a compact (r|n − r)-dimensional Lagrangian sub-supermanifold. Then the extension 0 −→ T X −→ T Y|X −→ N −→ 0 and the isomorphism N ' 51 X imply Ber (Y)|X = Ber (X )⊗2 . b on Y induces a volume form on X which we denote by b1/2 . Thus the volume form A possible problem with taking the square root is overcome with the assumption that Y b1/2 is given by A.S. Schwarz in [9]. is oriented; a clear and explicit construction of As an example, let us consider the case when Y = 51 Y , where Y is an ndimensional compact manifold equipped with a nowhere-vanishing n-form . The latter b on Y. If x a gives rise, via the isomorphism Ber (Y) ' Det (Y )⊗2 , to a volume form 1 n is a local coordinate system on Y in which = α(x)dx ∧ . . . ∧ dx , then, in the associated local coordinate system (x a , ψa := 5∂/∂x a ) on Y, b = α 2 (x)D ∗ (dx a , dψa ). In particular, if X ,→ Y is a Lagrangian sub-supermanifold given locally by the equab1/2 = α(x)|Xred tions x b = 0, ψe = 0, b = r + 1, . . . , n, e = 1, . . . , r, then ∗ e D (dx , dψb ). There is a natural morphism of sheaves, F :P

−→ P OY ∗ Y α(x)−1 wa1 ...ak εa1 ...ak ak+1 ...an ψak+1 . . . ψan , wa1 ...ak dx a1 ∧ . . . ∧ dx ak −→

where ε a1 ...ak an is the antisymmetric tensor with ε1...n = 1. One has [13,9], F (dw) = 1F (w), b, η). where 1 is the Batalin–Vilkovisky operator on (Y, 2 To avoid a possible confusion let us recall that a supermanifold Y is called oriented if the underlying manifold Yred is oriented.


Lemma 5.2.1. Let $Y$ be a manifold equipped with a nowhere vanishing volume form $\Omega$ and let $\mathcal{Y} = \Pi\Omega^1 Y$. If the function $\Phi \in \Gamma(\mathcal{Y}, \mathcal{O}_{\mathcal{Y}})$ is such that, for any compact submanifold $X \hookrightarrow Y$,

$$\int_{\Pi N_X^*} \Phi\, \hat\Omega^{1/2} = 0,$$

then $\Phi = \Delta\Psi$ for some $\Psi \in \Gamma(\mathcal{Y}, \mathcal{O}_{\mathcal{Y}})$.

Proof. We may assume for simplicity that $\Phi$ is homogeneous in odd coordinates $\psi_a$, i.e. that $\Phi = F(w)$ for some $k$-form $w$ on $Y$. According to A. S. Schwarz [9],

$$\int_{\Pi N_X^*} \Phi\, \hat\Omega^{1/2} = \int_X w.$$

Since this vanishes for any compact submanifold $X \hookrightarrow Y$, $w = ds$ for some $(k-1)$-form $s$ on $Y$. Then $\Phi = F(w) = F(ds) = \Delta F(s)$. $\Box$

5.3. Holomorphic volume forms. Let $Y$ be an $m$-dimensional complex manifold equipped with a nowhere vanishing holomorphic $m$-form $\Omega$. Then the associated $(m|m)$-dimensional complex supermanifold (see footnote 3) $\mathcal{Y} = \Pi\Omega^1_c Y$ comes equipped with two natural odd symplectic structures. The first one is holomorphic and is represented, in a natural local coordinate system $(z^\alpha, \zeta_\alpha := i\Pi\partial/\partial z^\alpha)$, $\alpha = 1, \ldots, m$, by the odd holomorphic 2-form

$$\eta_c = d\Bigl(\sum_\alpha dz^\alpha\, \zeta_\alpha\Bigr) = -\sum_\alpha d(x^\alpha + i x^{\dot\alpha}) \wedge d(\psi_{\dot\alpha} + i\psi_\alpha),$$

where $z^\alpha = x^\alpha + i x^{\dot\alpha}$, $\psi_\alpha = \Pi\partial/\partial x^\alpha$ and $\psi_{\dot\alpha} = \Pi\partial/\partial x^{\dot\alpha}$. The second one is real and comes from the identification of the real $(2m|2m)$-dimensional supermanifold underlying $\mathcal{Y}$ (which we denote by the same letter $\mathcal{Y}$) with the real cotangent bundle $\Pi\Omega^1 Y$. It is given by

$$\eta = d \sum_\alpha \bigl(dx^\alpha \psi_\alpha + dx^{\dot\alpha} \psi_{\dot\alpha}\bigr).$$

Clearly, $\eta = \mathrm{Im}\, \eta_c$. The holomorphic $m$-form $\Omega$ induces, via the isomorphism

$$\mathrm{Ber}_c(\mathcal{Y}) = [\Omega^m_c Y]^{\otimes 2},$$

a holomorphic volume form $\hat\Omega$ on $\mathcal{Y}$.

A compact real $(m|m)$-dimensional sub-supermanifold $\mathcal{X} \hookrightarrow \mathcal{Y}$ is called a real slice if the sheaf $\mathbb{C} \otimes T\mathcal{X}$ is isomorphic to the sheaf of smooth sections of $T_c\mathcal{Y}|_{\mathcal{X}}$. In this case $\hat\Omega$ induces [1] a smooth section, $\hat\Omega|_{\mathcal{X}}$, of the complexified Berezinian bundle $\mathbb{C} \otimes \mathrm{Ber}(\mathcal{X})$.

A real slice $\mathcal{X} \hookrightarrow \mathcal{Y}$ is called special if $\mathrm{Im}\,(\hat\Omega|_{\mathcal{X}}) = 0$. In this case $\mathrm{Re}\,(\hat\Omega|_{\mathcal{X}})$ is a nowhere-vanishing real volume form on $\mathcal{X}$. If $\mathcal{X} \hookrightarrow \mathcal{Y}$ is also Lagrangian with respect to the real odd symplectic structure $\eta$, then $\eta_c|_{\mathcal{X}}$ is non-degenerate and hence makes $\mathcal{X}$ into an odd symplectic manifold. According to 5.1, the data $(\mathrm{Re}\,(\hat\Omega|_{\mathcal{X}}), \eta_c|_{\mathcal{X}})$ induces on the structure sheaf of $\mathcal{X}$ a second-order differential operator $\Delta$.

Note that if $X \hookrightarrow Y$ is a real slice of the manifold $Y$ such that $\mathrm{Im}\, \Omega|_X = 0$, then the associated sub-supermanifold $\mathcal{X} := \Pi N^* \hookrightarrow \mathcal{Y}$ is a special Lagrangian real slice.

Footnote 3: In this and the next sections the subscript $c$ is used to distinguish holomorphic objects from the real ones. In particular, $\Omega^1_c Y$ denotes the bundle of holomorphic 1-forms on $Y$ as opposed to $\Omega^1 Y$, which denotes the bundle of real smooth 1-forms on the real manifold underlying $Y$.
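The relation $\eta = \mathrm{Im}\,\eta_c$ stated in 5.3 can be verified by expanding the holomorphic form into real and imaginary parts. The sketch below reads the two displayed forms as $\eta_c = -\sum_\alpha d(x^\alpha + i x^{\dot\alpha}) \wedge d(\psi_{\dot\alpha} + i\psi_\alpha)$ and $\eta = d\sum_\alpha (dx^\alpha\psi_\alpha + dx^{\dot\alpha}\psi_{\dot\alpha})$, and uses the sign convention $d(dx\,\psi) = -dx \wedge d\psi$ from Sect. 2.3.

```latex
\[
  \eta_c
  = -\sum_\alpha \bigl(dx^\alpha + i\,dx^{\dot\alpha}\bigr)\wedge
                 \bigl(d\psi_{\dot\alpha} + i\,d\psi_\alpha\bigr)
\]
\[
  \phantom{\eta_c}
  = -\sum_\alpha \bigl(dx^\alpha\wedge d\psi_{\dot\alpha}
                      - dx^{\dot\alpha}\wedge d\psi_\alpha\bigr)
    \;-\; i\sum_\alpha \bigl(dx^\alpha\wedge d\psi_\alpha
                      + dx^{\dot\alpha}\wedge d\psi_{\dot\alpha}\bigr),
\]
\[
  \operatorname{Im}\eta_c
  = -\sum_\alpha \bigl(dx^\alpha\wedge d\psi_\alpha
                      + dx^{\dot\alpha}\wedge d\psi_{\dot\alpha}\bigr)
  = d\sum_\alpha \bigl(dx^\alpha\,\psi_\alpha
                      + dx^{\dot\alpha}\,\psi_{\dot\alpha}\bigr)
  = \eta .
\]
```

The real part of $\eta_c$ is a second real odd symplectic form, which plays no role in the definition of a special Lagrangian real slice.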


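Theorem 5.3.1. Let $Y$ be a complex manifold equipped with a nowhere-vanishing holomorphic $m$-form $\Omega$, $X$ a compact real slice of $Y$ such that $\mathrm{Im}\, \Omega|_X = 0$, and $\mathcal{X} = \Pi N^*$ the associated special Lagrangian real slice in $\mathcal{Y} = \Pi\Omega^1 Y$. Then the Zariski tangent space to the moduli space of all possible deformations of $\mathcal{X}$ inside $\mathcal{Y}$ within the class of special Lagrangian real slices is isomorphic to the kernel of the operator $\Delta : \Gamma(\mathcal{X}, \mathcal{O}_{\mathcal{X}})/\mathbb{R} \to \Gamma(\mathcal{X}, \mathcal{O}_{\mathcal{X}})$.

Proof. Consider a 1-parameter family, $\{\mathcal{X}_t \hookrightarrow \mathcal{Y} \mid t \in \mathbb{R}^{1|0} \text{ or } \mathbb{R}^{0|1}\}$, of special Lagrangian real slices in $\mathcal{Y}$ such that $\mathcal{X}_{t=0} = \mathcal{X}$. Let $z^\alpha = x^\alpha + i x^{\dot\alpha}$ be a local coordinate system on $Y$ in which $X$ is given by $x^{\dot\alpha} = 0$. Then in the associated local coordinate system $(z^\alpha, \zeta_\alpha := i\Pi\partial/\partial z^\alpha = \psi_{\dot\alpha} + i\psi_\alpha)$ on $\mathcal{Y}$, the equations of $\mathcal{X}_t \hookrightarrow \mathcal{Y}$ are

$$x^{\dot\alpha} = \frac{\partial\Phi}{\partial\psi_{\dot\alpha}}, \qquad \psi_\alpha = -\frac{\partial\Phi}{\partial x^\alpha}$$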

for some 1-parameter family of smooth functions 8 = 8(x α , ψα˙ , t) satisfying the boundary condition 8(x α , ψα˙ , 0) = 0. The image of ∂/∂t under the extended Kodaira map is represented by the function (see Lemma 2.4.1) ∂8 ∂ = ≡ 9. kt=0 ∂t ∂t t=0 b = ρ(zα , ζα )D ∗ (dzα , dζα ) for some holomorphic function ρ(zα , ζα ), then If ∂8 ∂8 ∂8 ∂8 α ∗ α b , ψα˙ − i ), d(ψα˙ − i ) D d(x + i |Xt = ρ x + i ∂ψα˙ ∂xα ∂ψα˙ ∂xα   28 ∂28 δβα + i ∂x∂β ∂ψ ∂8 ∂8 α ∂x β ∂x α α ˙ , ψα˙ − i Ber  = ρ x +i 28 28  ∂ α ∂ψα˙ ∂xα δβ − i ∂x∂α ∂ψ α˙ β˙ ˙ ∗

α

∂ψ ∂ψ

β

D (dx , dψα˙ ). Hence, b |X t ) dIm ( 1 X ∂9 ∂ρ0 ∂9 ∂ρ0 ∂ 29 = − + 2ρ ρ0 D ∗ (dx α , dψα˙ ) 0 α α ∂ψ α ∂ψ dt ρ ∂ψ ∂x ∂x ∂x 0 α ˙ α ˙ α ˙ t=0 α b|X = (div H9 ) Re b = (19) Re |X ,

t where ρ0 = Re ρ(xα , ψα˙ ). Hence 19 = 0. u 6. Existence of the Extended Moduli Space of Special Lagrangian Submanifolds 6.1. Initial data. Let X be a compact special Lagrangian submanifold of a Calabi– Yau manifold Y equipped with the Kähler form ω and a holomorphic volume form , b be the associated special Legendrian sub-supermanifold of and let X = 5N ∗ ,→ Y b (see Sect. 1). With these data one naturally associates the the contact supermanifold Y b within the class of special moduli superspace M of all deformations of X inside Y Legendrian sub-supermanifolds.


Proposition 6.1.1. The Zariski tangent superspace to M at X is canonically isomorphic to 5H∗ (X, R). Proof. It is not hard to check that under the isomorphism OX = ∗ X the Batalin– Vilkovisky operator 1 : OX → OX goes into 2∗d∗, where d is the de Rham differential and ∗ is the Hodge duality operator. Then Theorems 4.3.1 and 5.3.1 imply that the Zariski tangent superspace is isomorphic to 50(X, ∗ Xclosed ) ∩ 50(X, ∗ Xcoclosed ) = 5H∗ (X, R).

t u
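The last identification in the proof of Proposition 6.1.1 is the standard Hodge-theoretic one. As a reminder, here is a sketch assuming $X$ is a compact oriented Riemannian manifold without boundary, with $d^* = \pm {*}d{*}$ the formal adjoint of $d$ and $\langle\cdot,\cdot\rangle$ the $L^2$ inner product on forms.

```latex
\[
  \bigl\langle (dd^* + d^*d) f,\, f \bigr\rangle
  = \|d^* f\|^2 + \|d f\|^2 ,
\]
\[
  df = 0 \ \text{and}\ {*}d{*}f = 0
  \iff (dd^* + d^*d) f = 0 ,
  \qquad
  \ker\,(dd^* + d^*d) \;\cong\; H^*(X,\mathbb{R}) .
\]
```

Thus forms that are simultaneously closed and coclosed are exactly the harmonic forms, and by the Hodge theorem each de Rham class has a unique such representative.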

Theorem 6.1.2. M is a smooth supermanifold. Proof (after McLean [6]). Let V be a tubular neighbourhood of the zero section in b and exp : V → U the normal exponential 51 X , U a tubular neighbourhood of X in Y map constructed as in Sect. 4.2. This map identifies nearby (to X ) special Legendrian b with global odd sections f of 0(X , OX ) and induces a sub-supermanifolds Xf of Y diffeomorphism expf : X → Xf . Let V 0 be an open subset in 0(X , OX ) lying in the preimage of V under the map d : OX → 1 X . We define a non-linear map M φ : V 0 ⊂ 0(X , OX ) −→ ∗ X ∗ X 

as follows:

ω), Im φ(f ) = exp∗f (b

b|p(Xf ) (p ◦ expf )∗ ( b|p(X ) ) Re (

!1/2  ,

b → Y = 51 Y is the natural projection. The square root in the above where p : Y formula always exists (cf. Sect. 5.2). Note that φ −1 (0, 0) = M. ω) ∈ ∗ X is a closed differential form. It follows from Lemma 4.2.3 that exp∗f (b b is homotopic to the inclusion Replacing f with tf , we see that the map expf : X → Y b X → Y. Therefore, denoting by [ ] the cohomology class, we get [exp∗f (b ω)] = [b ω|X ] = ω) is an exact differential form on X. 0 and conclude that exp∗f (b R b b is holomorphic, the integral Since p(Xf ) |p(Xf ) depends only on the homology (r|m−r)-dimensional Lagrangian4 class of Xred in Y [1]. Analogously, for any R compact b1/2 (and hence its real and imaginary parts) sub-supermanifold Z ⊂ Xf , the integral Z depends only on the homology class of Zred in Y . Since Zred is homologous to an rR b1/2 b|1/2 ) vanishes, we conclude that dimensional cycle in X and Im ( Z Im ( ) = 0 X for any such Z. Thus, for any smooth cycle Z ,→ X ⊂ Y , we have Z b1/2 ) Im ( 0= Zf

Z =

5NZ∗

 

b|1/2 ) (p ◦ expf )∗ Im ( Zf

Z =

5NZ∗

Im

b|5N ∗ )1/2 Re ( Z

b|Zf ) (p ◦ expf )∗ ( b|5N ∗ ) Re ( Z

  Re ( b|5N ∗ )1/2 Z

!1/2 b|5N ∗ )1/2 , Re ( Z

4 with respect to the odd symplectic structure induced on p(X ) from the holomorphic odd symplectic f

structure on Y, see Sect. 5.2


where Zf := p ◦ expf (5NZ∗ ) and we used the fact that Z and (Zf )red are homologous in Y . By Lemma 5.2.1 and the fact that in our case 1 = ∗d∗, the integrand of the last integral is a coexact differential form in ∗ X. Thus we proved that φ maps V 0 ⊂ ∗ X into the subset M M ∗ X. ∗ Xexact ∗ Xcoexact ⊂ ∗ X Put another way, as a map from C 1,α differential forms on X to exact and coexact C 0,α differential forms, φ is surjective. Then, by the Banach space implicit function theorem and elliptic regularity, the extended moduli space M = φ −1 (0, 0) is smooth with tangent space at 0 canonically isomorphic to the kernel of the following operator (see the proofs of Theorems 4.3.1 and 5.3.1), M d φ(tf ) = (d, ∗d∗) : ∗ X −→ ∗ X ∗ X, dt t=0 t which is precisely 5H∗ (X, R). u Acknowledgement. It is a pleasure to thank A. N. Tyurin for valuable discussions.

References

1. Alexandrov, M., Kontsevich, M., Schwarz, A. and Zaboronsky, O.: The geometry of the master equation and topological quantum field theory. Int. J. Mod. Phys. A12, 1405–1430 (1997); hep-th/9502010
2. Getzler, E.: Batalin–Vilkovisky algebras and two-dimensional topological field theories. Commun. Math. Phys. 159, 265–285 (1994); hep-th/9212043
3. Hitchin, N.J.: The moduli space of special Lagrangian submanifolds. dg-ga/9711002
4. Kodaira, K.: A theorem of completeness of characteristic systems for analytic families of compact submanifolds of complex manifolds. Ann. Math. 75, 146–162 (1962)
5. Kontsevich, M.: Homological algebra of Mirror Symmetry. In: Proceedings of the International Congress of Mathematicians I (1994), Zürich: Birkhäuser, pp. 120–139
6. McLean, R.C.: Deformations of calibrated submanifolds. Duke University preprint, January 1996
7. Merkulov, S.A.: Existence and geometry of Legendre moduli spaces. Math. Z. 226, 211–265 (1997)
8. Morrison, D.R.: The geometry underlying mirror symmetry. In: Proc. European Algebraic Geometry Conf. (Warwick, 1996); alg-geom/9608006
9. Schwarz, A.: Geometry of Batalin–Vilkovisky quantization. Commun. Math. Phys. 155, 249–260 (1993); hep-th/9205088
10. Schwarz, A.: Semiclassical approximation in Batalin–Vilkovisky formalism. Commun. Math. Phys. 158, 265–285 (1994); hep-th/9210115
11. Strominger, A., Yau, S.-T. and Zaslow, E.: Mirror symmetry is T-duality. Nucl. Phys. B 479, 243–259 (1996)
12. Tyurin, A.N.: Special Lagrangian geometry and slightly deformed algebraic geometry. math.AG/9806006
13. Witten, E.: A note on the antibracket formalism. Mod. Phys. Lett. A5, 487 (1990)
14. Vafa, C.: Extending mirror conjecture to Calabi–Yau with bundles. hep-th/9804131

Communicated by A. Connes

Commun. Math. Phys. 209, 29 – 49 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Computation of Lickorish’s Three Manifold Invariant Using Chern–Simons Theory

P. Ramadevi¹, Swatee Naik²,³

¹ Mehta Research Institute of Mathematics and Mathematical Physics, Allahabad 211 019, India.

E-mail: [email protected]; [email protected]

2 Department of Mathematics, University of Nevada, Reno, NV 89557, USA.

E-mail: [email protected]

3 School of Mathematics, Tata Institute of Fundamental Research, Mumbai - 400 005, India.

E-mail: [email protected] Received: 5 March 1999 / Accepted: 15 June 1999

Abstract: It is well known that any three-manifold can be obtained by surgery on a framed link in S 3 . Lickorish gave an elementary proof for the existence of the threemanifold invariant of Witten using a framed link description of the manifold and the formalisation of the bracket polynomial as the Temperley–Lieb Algebra. Kaul determined a three-manifold invariant from link polynomials in SU (2) Chern–Simons theory. Lickorish’s formula for the invariant involves computation of bracket polynomials of several cables of the link. We describe an easier way of obtaining the bracket polynomial of a cable using representation theory of composite braiding in SU (2) Chern–Simons theory. We prove that the cabling corresponds to taking tensor products of fundamental representations of SU (2). This enables us to verify that the two apparently distinct three-manifold invariants are equivalent for a specific relation of the polynomial variables. 1. Introduction Classification of three dimensional manifolds has been a long standing problem. Witten [23] succeeded in giving an intrinsically three-dimensional definition for Jones’ type polynomial invariants [2, 3] of links using a topological quantum field theory known as Chern–Simons theory. Witten’s approach gives rise to a three-manifold invariant Z(M), also called the partition function for the manifold M, via surgery on framed links. The existing definitions of three-manifold invariants rely on two results: I. The fundamental theorem of Lickorish [12] and Wallace [22] that any connected, closed, orientable three-manifold can be obtained by surgery on a framed link in S 3 . II. The theorem due to Kirby [8] that two framed links determine the same threemanifold if and only if they are related by a sequence of diagram moves which are referred to as Kirby moves. It follows from Kirby’s theorem that invariants of framed links which are unchanged under Kirby moves give an invariant of three-manifolds.


P. Ramadevi, S. Naik

Computations of the Witten invariant are achieved by exploiting the connection between Chern–Simons field theory and a two-dimensional conformal field theory known as Wess–Zumino conformal field theory. Though there are questions about the measure in the functional integral formulation of Chern–Simons theory, the computed values of these invariants agree with the ones obtained from other mathematically rigorous approaches, viz., exactly solvable two-dimensional statistical mechanical models [21], quantum groups [9,10,20], and Temperley–Lieb algebra [4,14]. The interconnections between Chern–Simons theory, Wess–Zumino conformal field theory, solvable models and quantum groups have been summarised in Refs. [5,16].

Given a primitive $4r$-th root of unity, Lickorish [14,15] defines an invariant $F_l(M)$ as a linear combination of bracket polynomials of cables of a framed link in $S^3$ which under surgery gives the three-manifold $M$. The cabling is necessary for the preservation of the invariant under Kirby moves. However, it introduces a large number of crossings in the diagram, and determining the bracket polynomials of link diagrams with many crossings is extremely cumbersome by the recursive method. Hence, the computation of $F_l(M)$ is not easy. In this paper we find an elegant and easier method of determining bracket polynomials of cables of link diagrams (see Theorem 2) using the techniques developed in Ref. [18], viz., representation theory of composite braids. This simplifies the computation of Lickorish’s invariant to a great extent. Note that Kauffman and Lins [4] have given another formulation of the three-manifold invariant using Temperley–Lieb recoupling theory. The recoupling theory computes the invariant in terms of certain theta and tetrahedral nets.

Witten showed that the Jones’ and HOMFLY polynomials of links in $S^3$ correspond to expectation values of Wilson loops carrying the defining representation of the $SU(2)$ and $SU(N)$ gauge groups respectively [23].
This method has been generalised to arbitrary higher dimensional representations of any compact semi-simple group, resulting in a large number of new invariants of (framed) links in $S^3$ [5,7,16,17]. We refer to these field theoretic invariants as generalised invariants. Unlike Jones’, HOMFLY, and bracket polynomials, the generalised invariants cannot be solved completely by the recursive method. Hence, a direct method of evaluating these was developed in Refs. [5–7, 16, 17]. By construction these generalised invariants depend on the framing chosen for the link. However, by fixing the framing to be standard, i.e., one in which the linking number of the link with its frame is zero, ambient isotopy invariants of links are obtained. In Refs. [5–7, 16, 17], the emphasis was on obtaining ambient isotopy invariants of links and hence computations were done in standard framing.

In the present problem, we require a field theoretic presentation for the bracket polynomial of a link diagram. Bracket polynomials are regular isotopy invariants. So we choose the vertical or the blackboard framing. We show how the Chern–Simons invariant $P_{1,1,\ldots,1}[D_L](q)$ in vertical framing for the defining representation placed on any $n$-component link $L$ is related to the bracket polynomial $\langle D_L \rangle(A)$ provided the polynomial variables satisfy

$$ q^{1/4} = -A. \tag{1.1} $$

The relation is proved by first establishing a connection between the field theoretic invariant $P_{1,1,\ldots,1}[D_L](q)$ and the Jones’ polynomial, and then using the well-known relationship between the Jones’ polynomial and the bracket polynomial (see Sect. 3 and Theorem 1). Using the generalised regular isotopy invariants in $SU(2)$ Chern–Simons theory, Kaul [5] has derived a three-manifold invariant $F_k[M]$. We use the relationship between

Lickorish’s Three Manifold Invariant Using Chern–Simons Theory


the bracket and the generalised invariants (Theorem 2) and prove in Theorem 4 that Lickorish’s and Kaul’s invariants are equivalent with the polynomial variables obeying (1.1). It is mentioned in [5] that wherever computed, Kaul’s three-manifold invariant has agreed with the Chern–Simons partition function $Z[M]$ except for the normalisation. That is, $F_k(M) = Z[M]/Z[S^3]$. We expect Kaul’s invariant to be equivalent to Witten’s invariant for an arbitrary $M$. Our work proves this in an indirect way by the following sequence of steps: (i) We prove that Lickorish’s and Kaul’s invariants are equivalent with the polynomial variables obeying (1.1). (ii) In [15] the equivalence between Lickorish’s invariant and the Reshetikhin–Turaev invariant was established. The Reshetikhin–Turaev invariant is considered a reformulation of Witten’s invariant using quantum groups (see [9, 20]). It follows that Kaul’s invariant is a reformulation of the Reshetikhin–Turaev invariant in terms of the Chern–Simons generalised framed link invariants.

The plan of the paper is as follows. In Sect. 2, we describe Lickorish’s three-manifold invariant obtained using bracket polynomials. In Sect. 3, we present the techniques used in evaluating Chern–Simons regular isotopy invariants. We show detailed computations for the Hopf link and then generalise the method to prove Theorem 1. In (3.26) and (3.27) we define the three-manifold invariant derived by Kaul from these generalised link invariants. In Sect. 4, we study the representation theory of parallel copies of braids. This is essential to compute directly the bracket polynomial for cables of link diagrams without going through the extremely tedious process of recursive evaluation. We show the details of our computation for the (2, 3)-cable of the Hopf link and generalise the techniques to arbitrary links.
In the concluding section, we show that the three-manifold invariants obtained by Lickorish’s approach and the field theory approach are equivalent and thereby provide an easier method for computing Lickorish’s invariant.

2. Lickorish’s Three Manifold Invariant

We briefly present the salient features of Lickorish’s three-manifold invariant obtained from bracket polynomials. An $n$-component link $L$ in $S^3$ is a subset of $S^3$ homeomorphic to the union of $n$ disjoint circles. A framing $\mathbf{f} = (f_1, f_2, \ldots, f_n)$ on $L$ is an assignment of an integer to each component of $L$. A regular projection of a link in the plane is one with transverse double points as the only self-intersections. These double points are referred to as crossings. A link diagram is obtained from a regular projection by marking the under-crossing arc at a crossing with a break to indicate that part of the curve dips below the plane. Regular isotopy is an equivalence relation on the set of link diagrams. It is generated by Reidemeister moves II and III [19]. A framed link $[L, \mathbf{f}]$ can be represented by a diagram $D_L$ in the plane such that the framing on each component of $L$ equals the sum of crossing signs in the part of the diagram that represents that component. Given a link diagram, the framing it represents will be called the blackboard framing or the vertical framing. The bracket polynomial as normalised in [14] is a function

$$ \langle\ \rangle : \{\text{link diagrams in } (\mathbb{R}^2 \cup \infty) \text{ of unoriented links}\} \to \mathbb{Z}[A^{\pm 1}] \tag{2.1} $$

defined by the following three properties.


(i) $\langle \phi \rangle = 1$;
(ii) $\langle D_L \cup U \rangle = (-A^2 - A^{-2}) \langle D_L \rangle$, where $U$ is a component with no crossings;
(iii) $\langle \times \rangle = A \langle \asymp \rangle + A^{-1} \langle\, )( \,\rangle$, where this refers to three diagrams identical except at one crossing, where they look as shown (the crossing and its two smoothings).

This is a regular isotopy invariant of link diagrams. In order to define a three-manifold invariant using the bracket, Lickorish obtains an expression which is invariant under Kirby moves on link diagrams. Before we can state Lickorish’s result we need the following definition.

Definition. For a diagram $D_L$ representing an $n$-component framed link $[L, \mathbf{f}]$, and a given $n$-tuple of nonnegative integers $c = [c(1), c(2), \ldots, c(n)]$, a $c$-cable $c * D_L$ is defined as the diagram obtained by replacing the $i$-th component of $L$ in $D_L$ by $c(i)$ copies, all parallel in the plane.

As in [15], let $r$ be a fixed integer, $r \ge 3$, and let $C(n, r)$ denote the set of all functions $c : \{1, 2, \ldots, n\} \to \{0, 1, \ldots, r-2\}$. Let $A$ be a primitive $4r$-th root of unity. Let $G = G(A)$ be the Gauss sum $\sum_{l=1}^{4r} A^{l^2}$, and let $\bar{G}$ denote the complex conjugate of $G$.

Proposition ([14, 15]). Let $M$ be a three-manifold obtained from $S^3$ by surgery on an $n$-component framed link represented by a diagram $D_L$, and let $\sigma$ and $\nu$ be the signature and the nullity of the linking matrix, respectively. Then

$$ F_l(M) = \kappa^{\frac{\sigma+\nu-n}{2}} \sum_{c \in C(n,r)} \lambda_{c(1)} \lambda_{c(2)} \cdots \lambda_{c(n)} \langle c * D_L \rangle \tag{2.2} $$

is an invariant of the three-manifold with $\kappa$ and $\lambda_c$ given by

$$ \kappa = (-1)^{r+1} A^6 (G/\bar{G}), \tag{2.3} $$

$$ \lambda_c = 2 G^{-1} A^{r^2+3} \sum_{\substack{j \\ 0 \le 2j \le r-2-c}} (-1)^{c+j} \binom{c+j}{j} \left( A^{2(c+2j+1)} - A^{-2(c+2j+1)} \right). \tag{2.4} $$
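The constants in (2.3) and (2.4) are easy to evaluate numerically. The following is a minimal sketch, not part of the original construction: the function name `lickorish_constants` is hypothetical, and the specific root $A = -e^{\pi i/2r}$ is an assumed choice of primitive $4r$-th root of unity.

```python
import cmath
from math import comb


def lickorish_constants(r):
    """Numerically evaluate G, kappa (2.3) and lambda_c (2.4) for an integer r >= 3.

    Assumption: A = -exp(i*pi/(2r)) is used as the primitive 4r-th root of unity.
    """
    A = -cmath.exp(1j * cmath.pi / (2 * r))
    # Gauss sum G = sum_{l=1}^{4r} A^(l^2)
    G = sum(A ** (l * l) for l in range(1, 4 * r + 1))
    kappa = (-1) ** (r + 1) * A ** 6 * (G / G.conjugate())
    lambdas = []
    for c in range(r - 1):  # c = 0, 1, ..., r - 2
        total = 0
        j = 0
        while 2 * j <= r - 2 - c:
            e = 2 * (c + 2 * j + 1)
            total += (-1) ** (c + j) * comb(c + j, j) * (A ** e - A ** (-e))
            j += 1
        lambdas.append(2 * A ** (r * r + 3) / G * total)
    return A, G, kappa, lambdas


A, G, kappa, lambdas = lickorish_constants(5)
print(abs(G) ** 2, abs(kappa), len(lambdas))
```

Since $\kappa$ is a root of unity times $G/\bar{G}$, it has unit modulus; only the bracket polynomials $\langle c * D_L \rangle$ in (2.2) remain as the expensive input.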

In [14] only the existence and uniqueness of the $\lambda_c$ was shown without giving any method of computation. The formulas (2.3) and (2.4) for $\kappa$ and $\lambda_c$, $0 \le c \le n$, were obtained in [15]. The equivalence of the Lickorish invariant to the Reshetikhin–Turaev invariant as in the Kirby–Melvin [9] formulation is established in [15] for $A = -e^{\pi i/2r}$. In spite of having these formulas, the computation of the invariant $F_l$ is quite difficult, as one has to compute bracket polynomials of the $c$-cables, which is very cumbersome. As mentioned in [14], if $r = 6$ and $D_L$ is a standard diagram of the trefoil with only 3 crossings, the computation of $\langle 4 * D_L \rangle$ using the definition of the bracket as given above would involve $2^{48}$ operations. In the next two sections, we will concentrate on the link invariants from Chern–Simons field theory with the motivation of finding an easier method of computing the bracket polynomials of cables.


3. $SU(2)$ Chern–Simons Theory and Link Invariants

Now we briefly present the methods employed in the direct evaluation of Chern–Simons link invariants [23,6,7,16,5,17]. The metric independent action $S$ of the $SU(2)$ Chern–Simons theory on any three-manifold $M^3$ is given by

$$ S = \frac{k}{4\pi} \int_{M^3} \left( A \wedge dA + \frac{2}{3} A \wedge A \wedge A \right), \tag{3.1} $$

where $A$ is a one-form (matrix-valued in the Lie algebra of the compact semi-simple Lie group $SU(2)$). Explicitly, $A = A^a_\mu T^a dx^\mu$, where $T^a$ are the $SU(2)$ generators and $k$ is the coupling constant. The Wilson loop operator of any link $L$ embedded in $M^3$ is given by

$$ W_{R_1 R_2 \ldots R_n}(L) = \prod_i W_{R_i}(C_i) = \prod_i \left\{ \mathrm{Tr}_{R_i}\, P \exp \oint_{C_i} A_\mu\, dx^\mu \right\}, \tag{3.2} $$

where the $C_i$’s are the component knots of the link $L$ and $\mathrm{Tr}_{R_i}$ refers to the trace over the $SU(2)$ representation $R_i$ placed on the component $C_i$. The link invariants are given by the expectation value of the Wilson loop operator:

$$ P_{2R_1,2R_2,\ldots,2R_n}[D_L] = \langle W_{R_1,R_2,\ldots,R_n}(L) \rangle = \frac{\int [DA_\mu] \prod_i W_{R_i}(C_i)\, e^{iS}}{\int [DA_\mu]\, e^{iS}}, \tag{3.3} $$

where $D_L$ denotes a diagram representing the framed link $L$ in vertical or blackboard frame. This functional integral over the space of matrix-valued one-forms $A$ is evaluated by exploiting the connection between the Chern–Simons theory in three-dimensional space with boundary and the $SU(2)_k$ Wess–Zumino conformal field theory on the two-dimensional boundary [23]. The computation of these invariants has been considered in detail in Refs. [5–7, 16, 17]. We summarize this below.

We represent a link as the closure of a braid. The braid group $B_m$ consisting of $m$-strand braids is generated by $b_i$, $1 \le i \le m-1$, where $b_i$ represents a right-handed half-twist between the $i$-th strand and the $(i+1)$-st strand. The inverse $b_i^{-1}$ corresponds to a left-handed half-twist. For a standard reference on braid groups see [1].

In order to illustrate the technique of direct evaluation of the link invariant (3.3), we will take the example of the Hopf link. The details we present in this example are general enough for deriving invariants of links obtained from a four-strand braid either by platting or capping. In Theorem 1 we generalise the technique to arbitrary links.

Consider the Hopf link $H$, obtained from a four-strand braid (drawn as a closure of a two-strand braid), embedded in $S^3$ as shown in Fig. 1. Let $j_1$ and $j_2$ denote the representations placed on the two component knots of $H$. Let us slice the three-manifold $S^3$ into two three-dimensional balls as shown in Fig. 2(a) and (b). The two-dimensional $S^2$ boundaries of the three-balls are oppositely oriented and have four points of intersection with the braid, which we refer to as four punctures. Now, exploiting the connection between Chern–Simons theory and Wess–Zumino conformal field theory, the functional integrals of these three-balls correspond to states in the space of four-point correlator conformal blocks of the Wess–Zumino conformal field theory [23]. The dimensionality of this space is dependent on the representation of $SU(2)$ placed on the strands and the number of punctures on the boundary. In the present example, the dimension of the space is

$$ \min(2j_1 + 1,\ 2j_2 + 1,\ k - 2j_1 + 1,\ k - 2j_2 + 1). \tag{3.4} $$

[Fig. 1. Hopf link in $S^3$, with representations $j_1$ and $j_2$ placed on its two components.]

[Fig. 2. (a), (b): the two three-balls, with oppositely oriented $S^2$ boundaries, corresponding to the states $\psi_1$ and $\psi_2$.]

These states can be written in a suitable basis. Two such choices of bases, $|\phi_s^{\mathrm{side}}\rangle$ and $|\phi_t^{\mathrm{cent}}\rangle$, are pictorially depicted in Fig. 3(a),(b). Here $s \in j_1 \otimes j_2$ and $t \in \min(j_1 \otimes j_1, j_2 \otimes j_2)$, where $\otimes$ (also called tensor product notation) is defined as:

$$ j_1 \otimes j_2 = |j_1 - j_2| \oplus (|j_1 - j_2| + 1) \oplus \cdots \oplus \min(k - j_1 - j_2,\ j_1 + j_2), \tag{3.5} $$

with $\oplus$ usually referred to as direct sum.
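The truncated tensor product (3.5) is simple to enumerate. The sketch below is illustrative only; the helper name `fuse` is hypothetical, and spins are represented as exact fractions to avoid rounding of half-integers.

```python
from fractions import Fraction


def fuse(j1, j2, k):
    """List the spins appearing in j1 (x) j2 at level k, following (3.5)."""
    j1, j2 = Fraction(j1), Fraction(j2)
    low = abs(j1 - j2)
    high = min(j1 + j2, k - j1 - j2)
    spins = []
    s = low
    while s <= high:  # spins increase in integer steps
        spins.append(s)
        s += 1
    return spins


half = Fraction(1, 2)
print(fuse(half, half, 3))            # spins 0 and 1
print(fuse(1, Fraction(3, 2), 3))     # truncated by the level k
```

For the Hopf link example, the number of allowed values of $s$ returned by `fuse(j1, j2, k)` agrees with the dimension (3.4) of the space of four-point conformal blocks.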

[Fig. 3. The two bases for the four-punctured boundary with strands carrying $j_1$ and $j_2$: (a) $|\phi_s^{\mathrm{side}}\rangle$, (b) $|\phi_t^{\mathrm{cent}}\rangle$.]

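The basis $|\phi_s^{\mathrm{side}}\rangle$ is chosen when the braiding is done in the side two parallel strands. In other words, it is the eigenbasis corresponding to the generators $b_1$ and $b_3$:

$$ b_1^2 |\phi_s^{\mathrm{side}}\rangle = b_3^2 |\phi_s^{\mathrm{side}}\rangle = \left( \lambda^{(+)}_{s,R}(j_1, j_2) \right)^2 |\phi_s^{\mathrm{side}}\rangle \tag{3.6} $$

with the eigenvalues $\lambda^{(+)}_s(j_1, j_2)$ for the left-handed and right-handed half-twists in parallel strands (vertical framing) being:

$$ \lambda^{(+)}_{s,R}(j_1, j_2) = \left( \lambda^{(+)}_{s,L}(j_1, j_2) \right)^{-1} = (-1)^{j_1 + j_2 - s}\, q^{[j_1(j_1+1) + j_2(j_2+1) - s(s+1)]/2}. \tag{3.7} $$

Similarly, for braiding in the middle two anti-parallel strands $b_2$, we choose the basis $|\phi_t^{\mathrm{cent}}\rangle$:

$$ b_2^2 |\phi_t^{\mathrm{cent}}\rangle = \left( \lambda^{(-)}_{t,R}(j_2, j_2) \right)^2 |\phi_t^{\mathrm{cent}}\rangle \tag{3.8} $$

with the eigenvalues in anti-parallel strands being:

$$ \lambda^{(-)}_{t,R}(j_1, j_2) = \left( \lambda^{(-)}_{t,L}(j_1, j_2) \right)^{-1} = (-1)^{|j_1 - j_2| - t}\, q^{[j_1(j_1+1) + j_2(j_2+1) - t(t+1)]/2}. \tag{3.9} $$

These two bases are related by a duality matrix $a_{st}\begin{bmatrix} j_1 & j_2 \\ j_2 & j_1 \end{bmatrix}$, $s \in j_1 \otimes j_2$, $t \in \min(j_1 \otimes j_1, j_2 \otimes j_2)$, defined as:

$$ |\phi_s^{\mathrm{side}}\rangle = a_{st}\begin{bmatrix} j_1 & j_2 \\ j_2 & j_1 \end{bmatrix} |\phi_t^{\mathrm{cent}}\rangle. \tag{3.10} $$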

The matrix elements of the duality matrix are the $SU(2)$ quantum Racah coefficients, which are known [7,10]. For example, the duality matrix for $j_1 = j_2 = \frac{1}{2}$ (defining/fundamental representation of $SU(2)$) is a $2 \times 2$ matrix:

$$ \frac{1}{[2]} \begin{pmatrix} -1 & \sqrt{[3]} \\ \sqrt{[3]} & 1 \end{pmatrix}, \tag{3.11} $$


where the number in square brackets refers to the quantum number defined as

$$ [n] = \frac{q^{n/2} - q^{-n/2}}{q^{1/2} - q^{-1/2}} \tag{3.12} $$

with $q$ (also called the deformation parameter in the quantum algebra $SU(2)_q$) related to the coupling constant $k$ as $q = \exp\left(\frac{2i\pi}{k+2}\right)$. We will see that the invariants are polynomials in $q$.

Now, let us determine the states corresponding to Figs. 2(a) and (b). Since the braiding is in the side two parallel strands, it is preferable to use $|\phi_s^{\mathrm{side}}\rangle$ as basis states. Let $|\Psi_1\rangle$ be the state corresponding to Fig. 2(a). Clearly, we can write the state for Fig. 2(b) as

$$ |\Psi_2\rangle = b_1^2 |\Psi_1\rangle. \tag{3.13} $$

This state should be in the dual space, as its $S^2$ boundary is oppositely oriented compared to the boundary in Fig. 2(a). Then the link invariant is given by

$$ P_{2j_1,2j_2}[D_H] = \langle \Psi_1 | b_1^2 | \Psi_1 \rangle. \tag{3.14} $$

For determining the polynomial, we will have to express the states as a linear combination of the basis states $|\phi_s^{\mathrm{side}}\rangle$. The coefficients in the linear combination are chosen such that $\langle \Psi_1 | \Psi_1 \rangle = P_{2j_1,2j_2}[D_{U^2}] = [2j_1 + 1][2j_2 + 1]$, where $P_{2j_1,2j_2}[D_{U^2}]$ gives the polynomial of the unlink with 2 components (see footnote 1). The above mentioned restrictions determine the state $|\Psi_1\rangle$ (see [6]) as:

$$ |\Psi_1\rangle = \sum_{s=|j_1-j_2|}^{\min(j_1+j_2,\ k-j_1-j_2)} \sqrt{[2s+1]}\ |\phi_s^{\mathrm{side}}\rangle. \tag{3.15} $$

Substituting it in Eq. (3.14) and using the braiding eigenvalue (3.7), we obtain

$$ P_{2j_1,2j_2}[D_H] = \sum_{s=|j_1-j_2|}^{\min(j_1+j_2,\ k-j_1-j_2)} [2s+1] \left( \lambda^{(+)}_{s,R}(j_1, j_2) \right)^2. \tag{3.16} $$

For $j_1 = j_2 = 1/2$, we get the following polynomial:

$$ P_{1,1}[D_H] = q^{3/2} + (q + 1 + q^{-1})\, q^{-1/2} = q^{3/2} + q^{1/2} + q^{-1/2} + q^{-3/2}. \tag{3.17} $$

The bracket polynomial for the Hopf link, represented by diagram $D_H$, obtained by the recursive method is

$$ \langle D_H \rangle = A^6 + A^2 + A^{-2} + A^{-6} = P_{1,1}[D_H]\,\big|_{q^{1/4} = -A}. \tag{3.18} $$
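The Hopf link computation (3.16)–(3.18) can be reproduced with a few lines of exact arithmetic. The sketch below is illustrative: the function names are hypothetical, polynomials are stored as dictionaries mapping the exponent of $q^{1/2}$ to its coefficient, and $k$ is assumed large enough that the upper limit in (3.16) is $j_1 + j_2$.

```python
from collections import Counter


def quantum_number(n):
    """[n] of (3.12) as {exponent of q^(1/2): coefficient}."""
    return Counter({n - 1 - 2 * i: 1 for i in range(n)})


def hopf_invariant(a, b):
    """P_{a,b}[D_H] from (3.16), with a = 2*j1, b = 2*j2.

    Assumption: k is large, so s runs from |j1 - j2| to j1 + j2.
    """
    total = Counter()
    for e in range(abs(a - b), a + b + 1, 2):  # e = 2s
        # (lambda_s)^2 = q^(j1(j1+1) + j2(j2+1) - s(s+1)), in units of q^(1/2)
        shift = (a * (a + 2) + b * (b + 2) - e * (e + 2)) // 2
        for exponent, coeff in quantum_number(e + 1).items():
            total[exponent + shift] += coeff
    return total


def to_bracket_variable(poly):
    """Substitute q^(1/4) = -A, i.e. q^(1/2) = A^2; returns {power of A: coeff}."""
    return {2 * exponent: coeff for exponent, coeff in poly.items()}


print(dict(hopf_invariant(1, 1)))
print(to_bracket_variable(hopf_invariant(1, 1)))
```

The second dictionary has entries at $A^{\pm 6}$ and $A^{\pm 2}$, each with coefficient 1, in agreement with (3.18).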

The bracket polynomial $\langle D_L \rangle(A)$ and Jones’ polynomial $V[L](q)$ for any $n$-component link are related as given below (see, for instance, [15], which uses a different normalisation; see footnote 2):

$$ (-1)^n\, V[L](q)\,\big|_{q^{1/4} = -A} = (-A)^{3\omega} \langle D_L \rangle(A), \tag{3.19} $$

1 We work in the unknot polynomial normalisation $P_{2j}[D_U] = [2j + 1]$ with the representation $j$ placed on the unknot. The square bracket denotes the quantum number (3.12).

2 We work in the unknot normalisation: $V[U] = (q^{1/2} + q^{-1/2})$, $\langle D_U \rangle = -(A^2 + A^{-2})$.


where $\omega$ is the writhe or the sum of crossing signs (see footnote 3) in the diagram $D_L$. The Jones’ polynomial is obtained by placing the defining representation $j_1 = j_2 = \frac{1}{2}$ on the component knots in standard framing. The standard framing braiding eigenvalue for a right-handed half-twist in parallel strands, $\tilde{\lambda}^{(+)}_{r,R}(j_1, j_2)$, is related to the corresponding vertical framing eigenvalue as

$$ \tilde{\lambda}^{(+)}_{r,R}\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = q^{3/4}\, \lambda^{(+)}_{r,R}\left(\tfrac{1}{2}, \tfrac{1}{2}\right). \tag{3.20} $$

This equation determines the frame correction factor between the ambient isotopy invariant and the regular isotopy invariant. Using (3.16), it is clear that for the Hopf link the Jones’ polynomial $V[H]$ is related to the invariant in vertical framing as

$$ V[H] = \sum_{s=0}^{\min(1,\ k-1)} [2s+1] \left( \tilde{\lambda}^{(+)}_{s,R}(j_1, j_2) \right)^2 = q^{3/2} P_{1,1}[D_H]. \tag{3.21} $$

For the diagram in Fig. 1, $\omega = 2$, and the number of components $n = 2$. Combining (3.19) and (3.21) we get $P_{1,1}[D_H]\,\big|_{q^{1/4} = -A} = \langle D_H \rangle$, which confirms Eq. (3.18).

The method elaborated above for a specific example can be generalised for any $n$-component link obtained as a closure of an $m$-strand braid ($n \le m$). We will briefly outline the steps below.

Theorem 1. For a diagram $D_L$ of an $n$-component link the bracket polynomial and the invariant in vertical framing are related as

$$ \langle D_L \rangle\,\big|_{A = -q^{1/4}} = (-1)^n P_{1,1,\ldots,1}[D_L]. \tag{3.22} $$

Proof. Let us take an arbitrary braid word in the braid group $B_{2m}$ denoted by the black box $P$ as shown in Fig. 4. Here, the dotted lines denote the closure and the dashed line represents slicing of $S^3$ into two pieces with oppositely oriented $S^2$ boundaries. Note that the last $m$ strands are trivial, so this can also be considered the closure of an $m$-braid. The states on the $2m$-punctured surface corresponding to the two three-balls can be expanded in a suitable basis (see Fig. 5):

$$ |\Psi_1\rangle = \sum_{t_0,t_1,\ldots,t_{2m-4}} A_{j_1,\ldots,j_{2m},t_0,t_1,\ldots,t_{2m-4}}[P]\ |\phi_{t_0,t_1,\ldots,t_{2m-4}}\rangle, $$

$$ \langle \Psi_2 | = \sum_{t_0,t_1,\ldots,t_{2m-4}} B_{j_1,j_2,\ldots,j_{2m},t_0,t_1,\ldots,t_{2m-4}}\ \langle \phi_{t_0,t_1,\ldots,t_{2m-4}} |, \tag{3.23} $$

where the summation variables $t_i \in t_{i-1} \otimes j_{i+2}$ (3.5) with $t_{-1} = j_1$ and $t_{2m-3} = j_{2m}$. The element $A$ in the summation depends on the braid word $P$, and $B$ is chosen such that $\langle \Psi_2 | \Psi_2 \rangle$ gives $\prod_{i=1}^{m} [2j_i + 1]$. There are various ways to obtain a link by closing a braid. Some such ways are discussed in [7]. The closure as shown in Fig. 4 will demand that

$$ (j_{m+1}, j_{m+2}, \ldots, j_{2m}) = (j_m, j_{m-1}, \ldots, j_1). $$

3 In standard literature $\omega$ in the exponent may appear with a negative sign. This is a matter of replacing the variable $q$ with $q^{-1}$.

[Fig. 4. An arbitrary braid word $P$ on strands $1, 2, \ldots, 2m$, together with its closure.]

[Fig. 5. The basis $|\phi_{t_0,t_1,\ldots,t_{2m-4}}\rangle$ for strands carrying $j_1, j_2, \ldots, j_{2m}$.]

Additional restrictions on the $j_i$ are dictated by the braid word $P$ and the $n$-component link. The link invariant is well-defined only if all strands which correspond to the same component of the link are marked with the same $j_i$, i.e., at most $n$ of the $j_i$’s can be different. With these inputs, any $n$-component link invariant will be

$$ P_{2j_1,2j_2,\ldots,2j_n} = \langle \Psi_2 | \Psi_1 \rangle. \tag{3.24} $$

In order to compare this invariant with the Jones’ polynomial, we choose $j_1 = j_2 = \ldots = j_n = 1/2$, and as a consequence of the frame correction factor (3.20) between ambient isotopy (standard framing) and regular isotopy (vertical framing) we have

$$ V[L] = (q^{3/4})^{\omega} P_{1,1,\ldots,1}[D_L]. \tag{3.25} $$

Hence from Eq. (3.19) we have the result (3.22). □

The generalised link polynomials (3.24) of some of the knots up to eight crossings and two-component links have been tabulated in Appendix II of [7]. We have to use the vertical framing braiding eigenvalues (3.7), (3.9) instead of the standard framing eigenvalues. With these regular isotopy polynomial invariants, a three-manifold invariant which respects Kirby’s theorem has been constructed in [5]; the formula being

$$ F_k(M) = \alpha^{-\sigma[L]} \sum_{c \in C(n,k+2)} \mu_{c(1)} \mu_{c(2)} \cdots \mu_{c(n)}\, P_{c(1),c(2),\ldots,c(n)}[D_L], \tag{3.26} $$

where

$$ \mu_c = \frac{1}{2i} \sqrt{\frac{2}{k+2}} \left( q^{\frac{c+1}{2}} - q^{-\frac{c+1}{2}} \right), \qquad \alpha = (q^{1/4})^{\frac{3k}{2}}, \tag{3.27} $$

and $\sigma$ denotes the signature of the linking matrix in a framed link representation of the manifold $M$. It is not a priori clear whether this formula gives the same invariant as the Lickorish invariant (2.2) (see footnote 4) with the polynomial variables related as in Eq. (1.1). In order to verify the equality, we need to find a method of determining brackets of cables in terms of the invariant $P_{c(1),c(2),\ldots,c(n)}$ (3.24). In the next section, we will present the representation theory of composite braiding, which will prove useful to directly compute the invariants of cables of link diagrams.
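As a quick numerical illustration of (3.27), the sketch below evaluates the weights $\mu_c$ and the phase $\alpha$ for a given level. It assumes the reading $\mu_c = \sqrt{2/(k+2)}\,\sin\!\big(\pi(c+1)/(k+2)\big)$ of (3.27), which follows from $q = \exp(2i\pi/(k+2))$; the function name `kaul_weights` is hypothetical.

```python
import cmath
import math


def kaul_weights(k):
    """Return ([mu_0, ..., mu_k], alpha) of (3.27) for level k."""
    q = cmath.exp(2j * cmath.pi / (k + 2))
    mus = []
    for c in range(k + 1):  # c ranges over {0, 1, ..., k}, as in C(n, k+2)
        half_power = (c + 1) / 2
        mu = math.sqrt(2 / (k + 2)) * (q ** half_power - q ** (-half_power)) / 2j
        mus.append(mu)
    alpha = q ** (3 * k / 8)  # (q^(1/4))^(3k/2)
    return mus, alpha


mus, alpha = kaul_weights(3)
print([round(mu.real, 6) for mu in mus], abs(alpha))
```

With this reading, the $\mu_c$ are real and their squares sum to one, while $\alpha$ is a pure phase.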

4. Composite Braiding and $c$-Cable Link Invariants

The representation theory of composite braiding involves determination of braiding eigenvalues, eigenbasis and the duality matrix. One such representation in standard framing was presented in [17] in an attempt to distinguish a class of knots called mutants. In this paper, we have a slightly different composite braiding. The braiding eigenvalues and eigenbasis are derived in a similar fashion as in Ref. [17].

4 The presence or absence of the nullity $\nu$ of the linking matrix for the framed link appears to be a matter of normalisation, as $\nu$ is unchanged under both the Kirby moves.


Definition. A $c$-composite of a given braid is obtained by replacing every strand by $c$ strands and the generator $b_i$ by a composite braiding $B_i^{(c)}$,

$$ B_i^{(c)} = b(ci,\ ci + c - 1)\ b(ci - 1,\ ci + c - 2) \cdots b(ci - c + 1,\ ci), \tag{4.1} $$

where $b(i, j) = b_i b_{i+1} \cdots b_j$. For convenience we will call the original braid an elementary braid and the new one the composite braid.

When we are dealing with a link it is possible to replace different components by a different number of parallel copies. In order to handle that case we have to consider mixed composite braids, which we describe below.

Definition. Let $\mathbf{c} = (c_1, c_2, \ldots, c_n)$. A $\mathbf{c}$-composite braid of an elementary braid is obtained by replacing the $i$-th strand by $c_i$ strands and the generator $b_i$ by $B_i^{(\mathbf{c})}$,

$$ B_i^{(\mathbf{c})} = b\Big(\sum_{j=1}^{i} c_j,\ \sum_{j=1}^{i+1} c_j - 1\Big)\ b\Big(\sum_{j=1}^{i} c_j - 1,\ \sum_{j=1}^{i+1} c_j - 2\Big) \cdots b\Big(\sum_{j=1}^{i} c_j - c_i + 1,\ \sum_{j=1}^{i+1} c_j - c_i\Big). \tag{4.2} $$

Clearly, for $c_1 = c_2 = \cdots = c_n = c$, $B_i^{(\mathbf{c})}$ is the same as $B_i^{(c)}$ (4.1).
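The composite generator of (4.2) can be spelled out as an explicit word in the elementary generators. The following sketch is illustrative only: the function names are hypothetical, a word is stored as a list of generator indices read left to right, and the check is done at the level of the underlying permutation, which is insensitive to the crossing signs.

```python
def b(i, j):
    """The word b(i, j) = b_i b_{i+1} ... b_j as a list of generator indices."""
    return list(range(i, j + 1))


def composite_generator(c, i):
    """Word for B_i^(c) of (4.2); c = (c_1, ..., c_n), i is 1-based."""
    low = sum(c[:i])        # c_1 + ... + c_i
    high = sum(c[:i + 1])   # c_1 + ... + c_{i+1}
    word = []
    for t in range(c[i - 1]):  # one factor b(., .) per strand of the i-th bunch
        word.extend(b(low - t, high - 1 - t))
    return word


def strand_permutation(word, n_strands):
    """Apply the transpositions underlying the word, left to right."""
    order = list(range(1, n_strands + 1))
    for g in word:
        order[g - 1], order[g] = order[g], order[g - 1]
    return order


word = composite_generator((2, 3), 1)
print(word)                          # b_2 b_3 b_4 b_1 b_2 b_3
print(strand_permutation(word, 5))   # the bunch of 3 now precedes the bunch of 2
```

The word has $c_i\, c_{i+1}$ letters, one for each crossing between a strand of the $i$-th bunch and a strand of the $(i+1)$-st bunch, and its permutation exchanges the two bunches as blocks.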

We shall again take the Hopf link from Fig. 1 as an example and work out the details for the (2, 3)-cable. Let us denote the resulting diagram as $(2, 3) * D_H$ and the corresponding field theory invariant in vertical framing as $P_{\{2j_1\},\{2j_2\}}[(2, 3) * D_H]$. This notation implies that $j_1$ is the representation of $SU(2)$ placed on all the elementary strands constituting the $r_1 = 2$ bunch of strands and $j_2$ on all the elementary strands in the $r_2 = 3$ bunch. We are interested in determining the invariant $P_{\{1\},\{1\}}[(2, 3) * D_H]$ so that, using (3.22), we get the bracket polynomial $\langle (2, 3) * D_H \rangle$.

We present the steps, analogous to the ones in Sect. 3, to evaluate $P_{\{1\},\{1\}}[(2, 3) * D_H]$, which is obtained by gluing the two three-balls as shown in Fig. 6(a) and (b). Clearly, the boundary is a ten-punctured surface, as against the earlier elementary case considered in Sect. 3. So, the functional integrals on these three-balls correspond to states in the space of ten-point correlator conformal blocks in the Wess–Zumino conformal field theory. The basis is so chosen that it is the eigenbasis of the composite braiding operator $B_1^{(2,3)}$. Using the elementary braiding eigenvalues (3.7), the duality matrix (3.10), and some of the properties of the duality matrix which are given in Appendix I of [7], it can be shown that the eigenbasis for (2, 3)-mixed braiding in the side strands is $|\phi^{\mathrm{side}}_{(l_1,(n_1,l_2),m),((n_2,l_3),l_4,m)}\rangle$ with eigenvalue $\lambda^{(+)}_{m,R}(l_1, l_2)$ (3.7), i.e.,

$$ B_1^{(3,2)} \cdot B_1^{(2,3)}\ |\phi^{\mathrm{side}}_{(l_1,(n_1,l_2),m),((n_2,l_3),l_4,m)}\rangle = \left[ \hat{\lambda}^{(+)}_{m,R}(l_1, l_2) \right]^2 |\phi^{\mathrm{side}}_{(l_1,(n_1,l_2),m),((n_2,l_3),l_4,m)}\rangle. \tag{4.3} $$

The derivation of composite basis states and eigenvalues is along a similar direction as elaborated in the Appendix of [18] for a different 2-composite braiding. Similarly, for composite braiding (3, 3) in the middle two strands, we choose the basis $|\phi^{\mathrm{cent}}_{l_1,((n_1,l_2),(n_2,l_3),n),l_4}\rangle$. These basis states are pictorially depicted in Fig. 7(a) and (b).

[Fig. 6. (a), (b): the two three-balls for the (2, 3)-cable, with oppositely oriented $S^2$ boundaries, corresponding to the states $\psi_1^{(2,3)}$ and $\psi_2^{(2,3)}$.]

[Fig. 7. The two bases for the ten-punctured boundary with strands carrying $j_1$ and $j_2$: (a) $|\phi^{\mathrm{side}}_{(l_1,(n_1,l_2),m),((n_2,l_3),l_4,m)}\rangle$, (b) $|\phi^{\mathrm{cent}}_{l_1,((n_1,l_2),(n_2,l_3),n),l_4}\rangle$.]

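For $j_1 = j_2 = \frac{1}{2}$,

$$ l_1, n_1, n_2, l_4 \in \left(\tfrac{1}{2} \otimes \tfrac{1}{2}\right); \quad l_2 \in \left(n_1 \otimes \tfrac{1}{2}\right),\ l_3 \in \left(n_2 \otimes \tfrac{1}{2}\right); \quad m \in \min(l_1 \otimes l_1,\ l_2 \otimes l_2); \quad n \in (l_1 \otimes l_2). $$

The two bases in Fig. 7(a) and (b) are related by the duality matrix

$$ |\phi^{\mathrm{side}}_{(l_1,(n_1,l_2),m),((n_2,l_3),l_4,m)}\rangle = a_{mn}\begin{bmatrix} l_1 & l_2 \\ l_3 & l_4 \end{bmatrix} |\phi^{\mathrm{cent}}_{l_1,((n_1,l_2),(n_2,l_3),n),l_4}\rangle. \tag{4.4} $$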


With these ingredients, it is clear that the state for Fig. 6(a) is:

$$ |\Psi_1^{(2,3)}\rangle = \sum_{l_1,n_1}\ \sum_{l_2 \in n_1 \otimes \frac{1}{2}}\ \sum_{m=|l_1-l_2|}^{\min(l_1+l_2,\ k-l_1-l_2)} \sqrt{[2m+1]}\ |\phi^{\mathrm{side}}_{(l_1,(n_1,l_2),m),((n_1,l_2),l_1,m)}\rangle. \tag{4.5} $$

The restriction $l_1 = l_4$, $n_1 = n_2$, $l_2 = l_3$ in the basis states is obtained as a consequence of closure of the braid to obtain the link. The coefficients in the linear combination are obtained from the fact that $\langle \Psi_1^{(2,3)} | \Psi_1^{(2,3)} \rangle = P_{1,1,1,1,1}[D_{U^5}] = [2]^5$, where $D_{U^5}$ is the unlink with five components. Using (4.5), the state corresponding to Fig. 6(b) will be:

$$ |\Psi_2^{(2,3)}\rangle = B_1^{(3,2)} \cdot B_1^{(2,3)}\ |\Psi_1^{(2,3)}\rangle = \sum_{l_1,l_2,n_1}\ \sum_{m=|l_1-l_2|}^{\min(l_1+l_2,\ k-l_1-l_2)} \sqrt{[2m+1]} \left[ \hat{\lambda}^{(+)}_{m,R}(l_1, l_2) \right]^2 |\phi^{\mathrm{side}}_{(l_1,(n_1,l_2),m),((n_1,l_2),l_1,m)}\rangle. \tag{4.6} $$

The summation over $n_1$ can be suppressed, but we should remember that $l_2 \in (\frac{1}{2} \otimes \frac{1}{2} \otimes \frac{1}{2})$. Now that we have determined the states for Fig. 6(a) and (b), the invariant for the (2, 3)-cable of the Hopf link can be rewritten in terms of elementary Hopf link invariants (3.16):

$$ P_{\{1\},\{1\}}[(2, 3) * D_H] = \langle \Psi_1^{(2,3)} | \Psi_2^{(2,3)} \rangle = \sum_{l_1,l_2} P_{2l_1,2l_2}[H], \tag{4.7} $$

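where $l_1 = 0, 1$ and $l_2 = 1/2, 1/2, 3/2$ for $k \ge 3$. The explicit form of the polynomial is:

$$ \begin{aligned} P_{\{1\},\{1\}}[(2, 3) * D_H] &= 2P_{0,1}[D_H] + P_{0,3}[D_H] + 2P_{2,1}[D_H] + P_{2,3}[D_H] \\ &= 2\left(q^{1/2} + q^{-1/2}\right) + \left(q^{3/2} + q^{1/2} + q^{-1/2} + q^{-3/2}\right) \\ &\quad + 2\left[ q^{2}\left(q^{1/2} + q^{-1/2}\right) + q^{-1}\left(q^{3/2} + q^{1/2} + q^{-1/2} + q^{-3/2}\right) \right] \\ &\quad + q^{5}\left(q^{1/2} + q^{-1/2}\right) + q^{2}\left(q^{3/2} + q^{1/2} + q^{-1/2} + q^{-3/2}\right) \\ &\quad + q^{-3}\left(q^{5/2} + q^{3/2} + q^{1/2} + q^{-1/2} + q^{-3/2} + q^{-5/2}\right) \\ &= q^{11/2} + q^{9/2} + q^{7/2} + 3q^{5/2} + 4q^{3/2} + 6q^{1/2} \\ &\quad + 6q^{-1/2} + 4q^{-3/2} + 3q^{-5/2} + q^{-7/2} + q^{-9/2} + q^{-11/2}. \end{aligned} \tag{4.8} $$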
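The expansion (4.8) can be checked mechanically from (3.16) and (4.7). The sketch below is illustrative, with hypothetical function names; polynomials are dictionaries from the exponent of $q^{1/2}$ to the coefficient, and $k$ is assumed large enough that none of the tensor products involved is truncated.

```python
from collections import Counter


def quantum_number(n):
    """[n] of (3.12) as {exponent of q^(1/2): coefficient}."""
    return Counter({n - 1 - 2 * i: 1 for i in range(n)})


def hopf_invariant(a, b):
    """P_{a,b}[D_H] of (3.16) with a = 2*j1, b = 2*j2 (k assumed large)."""
    total = Counter()
    for e in range(abs(a - b), a + b + 1, 2):  # e = 2s
        shift = (a * (a + 2) + b * (b + 2) - e * (e + 2)) // 2
        for exponent, coeff in quantum_number(e + 1).items():
            total[exponent + shift] += coeff
    return total


def spin_content(c):
    """{2l: multiplicity} of spin l in the c-fold tensor power of spin 1/2, c >= 1."""
    content = Counter({1: 1})
    for _ in range(c - 1):
        nxt = Counter()
        for d, mult in content.items():
            nxt[d + 1] += mult
            if d >= 1:
                nxt[d - 1] += mult
        content = nxt
    return content


def cabled_hopf(c1, c2):
    """Sum of P_{2l1,2l2}[D_H] over the spin content of both bunches, as in (4.7)."""
    total = Counter()
    for a, mult_a in spin_content(c1).items():
        for b, mult_b in spin_content(c2).items():
            for exponent, coeff in hopf_invariant(a, b).items():
                total[exponent] += mult_a * mult_b * coeff
    return total


print(sorted(cabled_hopf(2, 3).items(), reverse=True))
```

At $q = 1$ the coefficients add up to $2^5$, the value for five unlinked circles, which is a convenient sanity check on the bookkeeping.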

The following relation is easy to check by computing the bracket polynomial using the recursive method:

$$ P_{\{1\},\{1\}}[(2, 3) * D_H]\,\big|_{q^{1/4} = -A} = -\langle (2, 3) * D_H \rangle. \tag{4.9} $$

This is expected from Theorem 1 for the five-component link. Now we generalise the technique used here for cables of arbitrary link diagrams. A boldface lower case letter will indicate an $n$-tuple of numbers, e.g., $\mathbf{c} = (c_1, c_2, \ldots, c_n)$. Using Theorem 1 we obtain:

[Fig. 8. The basis $|\tilde{\phi}_{l_1,l_2,t_0,l_3,t_1,\ldots,t_{2m-4},l_{2m-1},l_{2m}}\rangle$ for the mixed composite braid with bunches of $c'_1, c'_2, \ldots, c'_{2m}$ strands.]

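Theorem 2. The bracket polynomial of a $\mathbf{c}$-cable of the diagram of an $n$-component link can be expressed in terms of elementary link invariants in vertical framing (3.24); the exact relation being:

$$ \langle \mathbf{c} * D_{L_n} \rangle = (-1)^{c_1 + c_2 + \ldots + c_n} \left( \sum_{l_1,l_2,\ldots,l_n} P_{2l_1,2l_2,\ldots,2l_n}[D_{L_n}] \right)\Bigg|_{q^{1/4} = -A}, \tag{4.10} $$

where $l_i$ takes values in

$$ l_i \in \underbrace{\left\{ \left( \tfrac{1}{2} \otimes \tfrac{1}{2} \right) \otimes \tfrac{1}{2} \right\} \ldots \tfrac{1}{2}}_{c_i}. \tag{4.11} $$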

Proof. Consider a $\mathbf{c}$-cable of an $n$-component link $L$. Suppose that $L$ is obtained from closure of an elementary $m$-braid. Let us replace each strand corresponding to the $i$-th component of the link by $c_i$ parallel strands, $1 \le i \le n$. This gives a mixed composite braid, say $(c'_1, c'_2, \ldots, c'_{2m})$, where as a set $\{c'_1, c'_2, \ldots, c'_{2m}\}$ is the same as $\{c_1, c_2, \ldots, c_n\}$. We will place the defining representation $j = 1/2$ on all the elementary strands. Closure of composite braids forces

$$ (c'_{m+1}, \ldots, c'_{2m}) = (c'_m, c'_{m-1}, \ldots, c'_1). $$

Again writing the states for the mixed composite braiding in a suitable basis (see Fig. 8):

$$ |\Psi_1^{(c_1,c_2,\ldots,c_n)}\rangle = \sum_{l_1,l_2,\ldots,l_{2m}}\ \sum_{t_0,t_1,\ldots,t_{2m-4}} A_{l_1 \ldots l_{2m},t_0,\ldots,t_{2m-4}}[P]\ |\tilde{\phi}_{l_1,l_2,t_0,l_3,t_1,\ldots,t_{2m-4},l_{2m-1},l_{2m}}\rangle, $$

$$ \langle \Psi_2^{(c_1,c_2,\ldots,c_n)} | = \sum_{l_1,l_2,\ldots,l_{2m}}\ \sum_{t_0,t_1,\ldots,t_{2m-4}} B_{l_1,\ldots,l_{2m},t_0,\ldots,t_{2m-4}}\ \langle \tilde{\phi}_{l_1,l_2,t_0,l_3,t_1,\ldots,t_{2m-4},l_{2m-1},l_{2m}} |, \tag{4.12} $$

where $t_i \in (t_{i-1} \otimes l_{i+2})$ with $t_{-1} = l_1$, $t_{2m-3} = l_{2m}$ (as in (3.23) and Fig. 8), and the $l_i$’s are as above. The closure of the braid demands: $(l_{m+1}, \ldots, l_{2m}) = P(l_1, l_2, \ldots, l_m)$. The constraint of closing the braid to give an $n$-component link, $n \le m$, requires that at most $n$ of the $l_i$’s are distinct, and the rest are determined by the link under consideration. Incorporating the above mentioned conditions and also using (3.24), we get the composite invariant to be:

$$ P_{\{1\},\{1\},\ldots,\{1\}}[\mathbf{c} * D_{L_n}] = \langle \Psi_2^{(c_1,c_2,\ldots,c_n)} | \Psi_1^{(c_1,c_2,\ldots,c_n)} \rangle = \sum_{l_1,l_2,\ldots,l_n} P_{2l_1,2l_2,\ldots,2l_n}[D_{L_n}]. \tag{4.13} $$

It is evident that the number of components in the $\mathbf{c}$-cable of the $n$-component link is $c_1 + c_2 + \ldots + c_n$. Hence Theorem 1 gives the result (4.10). □

We expand all the tensor products in (4.11) using (3.5) to get some useful relations for proving the equality of the two apparently distinct three-manifold invariants defined by Lickorish and Kaul. The closed form expression for the tensor product for $c_i \le k$ turns out to be

$$ l_i \in \bigoplus_{\substack{s \\ 0 \le 2s \le c_i}} a_s\, j_s\ -\ \bigoplus_{\substack{t \\ c_i + 2 \le 2t \le 2(c_i - 1)}} a_t\, \hat{j}_t, \tag{4.14} $$

where the representations $j_s$ and $\hat{j}_t$ are given by $2j_s = c_i - 2s$ and $2\hat{j}_t = 2t - c_i - 2$, and $a_s = \binom{c_i - 1}{s}$ are the constants. We use this tensor product expansion to rewrite the bracket of a $\mathbf{c}$-cable in the following corollary.

Corollary 1. With $q^{1/4} = -A$, the bracket polynomial of the $\mathbf{c}$-cable of an $n$-component link diagram $D$, where $\mathbf{c} = (c_1, c_2, \ldots, c_n)$, is given by

$$ \langle \mathbf{c} * D \rangle = (-1)^{c_1 + c_2 + \ldots + c_n} \cdot \left\{ \sum_{(s_1,s_2,\ldots,s_n)} \prod_{i=1}^{n} A_{c_i,s_i}\, P_{s_1,s_2,\ldots,s_n}[D] \right\}, \tag{4.15} $$

where the $\{s_i\}$ are subjected to $c_i - s_i$ even, $0 \le s_i \le c_i$, and

$$ A_{c_i,s_i} = \begin{cases} \dbinom{c_i - 1}{\frac{c_i - s_i}{2}} - \dbinom{c_i - 1}{\frac{c_i - s_i}{2} - 2}, & \text{if } 0 \le s_i \le c_i - 4, \\[2ex] \dbinom{c_i - 1}{\frac{c_i - s_i}{2}}, & \text{if } c_i - 3 \le s_i \le c_i. \end{cases} \tag{4.16} $$
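The coefficients $A_{c,s}$ of (4.16) are the multiplicities of the spin-$s/2$ representation in the $c$-fold tensor power of the defining representation, so they can be cross-checked against a direct iteration of (3.5). The sketch below uses hypothetical function names and assumes $c \le k$, so that no truncation occurs.

```python
from collections import Counter
from math import comb


def cable_multiplicity(c, s):
    """A_{c,s} of (4.16) for c >= 1; zero unless c - s is even and 0 <= s <= c."""
    if (c - s) % 2 or not 0 <= s <= c:
        return 0
    p = (c - s) // 2
    if s <= c - 4:
        return comb(c - 1, p) - comb(c - 1, p - 2)
    return comb(c - 1, p)


def tensor_power_content(c):
    """{s: multiplicity of spin s/2} in the c-fold tensor power of spin 1/2, c >= 1."""
    content = Counter({1: 1})
    for _ in range(c - 1):
        nxt = Counter()
        for d, mult in content.items():
            nxt[d + 1] += mult      # spin goes up by 1/2
            if d >= 1:
                nxt[d - 1] += mult  # spin goes down by 1/2
        content = nxt
    return content


for c in range(1, 7):
    print(c, [cable_multiplicity(c, s) for s in range(c + 1)])
```

For $c = 3$ this gives multiplicity 2 for $s = 1$ and 1 for $s = 3$, matching the values $l_2 = 1/2, 1/2, 3/2$ used in the (2, 3)-cable example.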

The inverse of Corollary 1, expressing elementary invariants $P_{l_1,l_2,\ldots,l_n}[D]$ in terms of the composite invariants $\langle \mathbf{c} * D \rangle$, will be useful in proving the equivalence between Lickorish’s and Kaul’s three-manifold invariants. Hence, inverting (4.15), we obtain the following:

Theorem 3. For a diagram $D$ of an $n$-component link, the invariant $P_{l_1,l_2,\ldots,l_n}[D]$ can be expressed in terms of bracket polynomials as follows:

$$ P_{l_1,l_2,\ldots,l_n}[D] = \sum_{\substack{\mathbf{j} \\ 0 \le 2j_i \le l_i}} \prod_{i=1}^{n} (-1)^{l_i - j_i} \binom{l_i - j_i}{j_i} \langle (\mathbf{l} - 2\mathbf{j}) * D \rangle. \tag{4.17} $$

Proof. We use Corollary 1 to rewrite the RHS of the above equation. This changes (4.17) to:

\[
P_{l_1,l_2,\ldots,l_n}[D] = \sum_{\substack{j \\ 0 \le 2j_i \le l_i}} \sum_{s} \Big( \prod_{i=1}^{n} (-1)^{j_i} \binom{l_i-j_i}{j_i} A_{l_i-2j_i,\,s_i} \Big) P_{s_1,s_2,\ldots,s_n}[D],
\tag{4.18}
\]

where $s = (s_1, s_2, \ldots, s_n)$ is such that $l_i - s_i$ is even and $0 \le s_i \le l_i - 2j_i$. Let us make a change of variable $l_i - s_i = 2s_i'$, write each sum as a multiple sum, and interchange the summations to rewrite the statement we need to prove as:

\[
P_{l_1,l_2,\ldots,l_n}[D] =
\sum_{\substack{s_n',\,j_n \\ 0 \le 2s_n' \le l_n,\ 0 \le j_n \le s_n'}} \cdots
\sum_{\substack{s_1',\,j_1 \\ 0 \le 2s_1' \le l_1,\ 0 \le j_1 \le s_1'}}
\Big( \prod_{i=1}^{n} (-1)^{j_i} \binom{l_i-j_i}{j_i} A_{l_i-2j_i,\,l_i-2s_i'} \Big)
P_{l_1-2s_1',\,l_2-2s_2',\ldots,\,l_n-2s_n'}[D] .
\tag{4.19}
\]

We will work with these sums one at a time. Let

\[
S(s_1') = \sum_{j_1=0}^{s_1'} (-1)^{j_1} \binom{l_1-j_1}{j_1} A_{l_1-2j_1,\,l_1-2s_1'}\, P_{l_1-2s_1',\ldots,\,l_n-2s_n'}[D] .
\]

Then the first sum in (4.19) equals

\[
\sum_{s_1'=0}^{\lfloor l_1/2 \rfloor} S(s_1'),
\]

where $\lfloor l_1/2 \rfloor$ denotes the greatest integer less than or equal to $l_1/2$. We split this into two sums, $\sum_{s_1'=0}^{\min\{\lfloor l_1/2\rfloor,\,1\}}$ and $\sum_{s_1'=2}^{\lfloor l_1/2\rfloor}$, and use Eq. (4.16) to substitute for the $A_{l_i-2j_i,\,s_i}$. It is easy to see that:

\[
\sum_{s_1'=0}^{\min\{\lfloor l_1/2\rfloor,\,1\}} S(s_1') = P_{l_1,\,l_2-2s_2',\ldots,\,l_n-2s_n'}[D] .
\tag{4.20}
\]

For $2 \le s_1' \le l_1/2$, using (4.16) we have:

\[
\begin{aligned}
S(s_1') ={}& \sum_{j_1=s_1'-1}^{s_1'} (-1)^{j_1} \binom{l_1-j_1}{j_1} \binom{l_1-2j_1-1}{s_1'-j_1}\, P_{l_1-2s_1',\ldots,\,l_n-2s_n'}[D] \\
&+ (l_1-2s_1'+1) \times \sum_{j_1=0}^{s_1'-2} (-1)^{j_1} \frac{(l_1-j_1)(l_1-j_1-1)\cdots(l_1-j_1-s_1'+2)}{j_1!\,(s_1'-j_1)!}\, P_{l_1-2s_1',\ldots,\,l_n-2s_n'}[D] .
\end{aligned}
\tag{4.21}
\]

We claim that this equals zero. The theorem will follow by treating the rest of the sums in Eq. (4.19) similarly. In order to prove the claim first note that

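\[
\sum_{j_1=0}^{s_1'} \frac{(-1)^{j_1} x^{j_1}}{j_1!\,(s_1'-j_1)!} = \frac{(1-x)^{s_1'}}{s_1'!} .
\]

Using this we see:

\[
\begin{aligned}
\sum_{j_1=0}^{s_1'} (-1)^{j_1} \frac{(l_1-j_1)(l_1-j_1-1)\cdots(l_1-j_1-s_1'+2)}{j_1!\,(s_1'-j_1)!}
&= \lim_{x\to 1}\, (-1)^{s_1'-1} \frac{d^{\,s_1'-1}}{dx^{\,s_1'-1}} \Big\{ x^{s_1'-l_1-2} \sum_{j_1=0}^{s_1'} \frac{(-1)^{j_1} x^{j_1}}{j_1!\,(s_1'-j_1)!} \Big\} \\
&= \lim_{x\to 1}\, (-1)^{s_1'-1} \frac{d^{\,s_1'-1}}{dx^{\,s_1'-1}} \Big\{ \frac{(1-x)^{s_1'}\, x^{s_1'-l_1-2}}{s_1'!} \Big\} = 0 .
\end{aligned}
\tag{4.22}
\]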

It follows that

\[
S(s_1') = \sum_{j_1=s_1'-1}^{s_1'} (-1)^{j_1} \Big[ \binom{l_1-j_1}{j_1} \binom{l_1-2j_1-1}{s_1'-j_1} - (l_1-2s_1'+1) \times \frac{(l_1-j_1)(l_1-j_1-1)\cdots(l_1-j_1-s_1'+2)}{j_1!\,(s_1'-j_1)!} \Big] P_{l_1-2s_1',\ldots,\,l_n-2s_n'}[D] .
\tag{4.23}
\]

A simple arithmetic shows that the expression on the RHS of (4.23) is 0. □
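For a single component, the content of Theorem 3 is that the coefficient of $P_{l_1-2s_1'}$ in (4.19) is 1 for $s_1' = 0$ and 0 otherwise. The sketch below checks this for small $l_1$. It is an illustration only: the helper names `binom`, `A` and `S` are ours, and we assume the convention $\binom{-1}{0} = 1$, which is needed for the empty cable $c_i = 0$.

```python
from math import comb


def binom(n, k):
    """Binomial coefficient with binom(n, 0) = 1 and 0 outside 0 <= k <= n (assumed convention)."""
    if k == 0:
        return 1
    if k < 0 or k > n:
        return 0
    return comb(n, k)


def A(c, s):
    """Coefficient A_{c,s} of Eq. (4.16); expects c - s even and 0 <= s <= c."""
    half = (c - s) // 2
    if s <= c - 4:
        return binom(c - 1, half) - binom(c - 1, half - 2)
    return binom(c - 1, half)


def S(l, s_prime):
    """Coefficient of P_{l - 2 s'} in the one-component version of (4.19)."""
    return sum((-1) ** j * binom(l - j, j) * A(l - 2 * j, l - 2 * s_prime)
               for j in range(s_prime + 1))


# Expect [1, 0, 0, ...] for every l: only the s' = 0 term survives.
rows = {l: [S(l, sp) for sp in range(l // 2 + 1)] for l in range(13)}
```

For example, `rows[6]` collects the coefficients of $P_6, P_4, P_2, P_0$, and the theorem predicts that only the first is nonzero.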


Now that we have given a direct method of determining the bracket polynomials of cables of link diagrams, we will show, in the next section, that the three-manifold invariants obtained from regular isotopy field theoretic invariants (3.26) are the same as (2.2) for the polynomial variables satisfying Eq. (1.1).

5. Conclusions

We have given a field theoretic presentation for bracket polynomials in terms of framed link invariants in $SU(2)$ Chern–Simons theory with the polynomial variable obeying (1.1). Then, using representation theory of composite braids, we obtained a direct method of evaluating bracket polynomials of cables of link diagrams. This enables us to show that the three-manifold invariant obtained by Lickorish using the formalisation of the bracket polynomial as the Temperley–Lieb algebra and the invariant obtained by Kaul using generalised link invariants from Chern–Simons theory are equal up to normalisation. However, the normalising factor depends on the choice of $A$, the $4r$-th root of unity.

In the discussion below, $\left(\frac{a}{b}\right)$ denotes the quadratic symbol for relatively prime integers $a$ and $b$, defined as [11]:

\[
\left(\frac{a}{b}\right) =
\begin{cases}
+1, & \text{if } a \equiv x^2 \bmod b \text{ for some integer } x, \\
-1, & \text{otherwise.}
\end{cases}
\tag{5.1}
\]

Theorem 4. Let $A = e^{n\pi i/2r}$, where $n$ is a positive integer relatively prime to $4r$, with $r$ related to the coupling constant in field theory as $k = r - 2$. Lickorish's invariant $F_l$ obtained from the formalisation of the bracket polynomial as the Temperley–Lieb algebra and Kaul's invariant $F_k$ obtained using generalised link invariants from Chern–Simons theory, for the polynomial variables obeying (1.1), are related as:

\[
F_k(M) = \epsilon\, \kappa^{-\nu/2} F_l(M),
\tag{5.2}
\]

where

\[
\epsilon = \left(\frac{r}{n}\right) e^{(n-1)(r+1)\pi i/2} = \pm 1 .
\tag{5.3}
\]
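The quadratic symbol of (5.1) is easy to evaluate by brute force. The sketch below follows the definition literally. The function name `quadratic_symbol` is ours, and it assumes $b$ is a positive integer, which the text leaves implicit.

```python
from math import gcd


def quadratic_symbol(a, b):
    """+1 if a is congruent to a square modulo b, else -1, as in Eq. (5.1).

    Assumes b is a positive integer and gcd(a, b) = 1.
    """
    if b < 1 or gcd(a, b) != 1:
        raise ValueError("b must be positive and relatively prime to a")
    # It suffices to test one representative x of each residue class mod b.
    return 1 if any((x * x - a) % b == 0 for x in range(b)) else -1


# Squares modulo 7 are 0, 1, 2, 4, so 2 is a square and 3 is not.
examples = [quadratic_symbol(2, 7), quadratic_symbol(3, 7)]
```

A linear scan over residues is adequate for the small moduli that occur when experimenting with Theorem 4; faster methods exist but are not needed here.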

Proof. It is an easy exercise in algebraic number theory (see [11]) to show that the Gauss (quadratic) sum $G = G(e^{n\pi i/2r})$ used in the definitions (2.3) and (2.4) is as given below:

\[
G = 2\sqrt{2r} \left(\frac{r}{n}\right) e^{n\pi i/4} .
\tag{5.4}
\]

Clearly, $\bar{G}/G = e^{-n\pi i/2}$. Simplifying (2.3) and using (3.27) it is easy to see that

\[
\kappa = \alpha^{-2} .
\tag{5.5}
\]

Similarly, using (2.4) and (3.27) we see that

\[
\lambda_l = \left(\frac{r}{n}\right) e^{(n-1)(r+1)\pi i/2}\, \alpha^{-1} \sum_{j:\ 0 \le 2j \le r-2-l} (-1)^{l+j} \binom{l+j}{j} \mu_{l+2j} .
\tag{5.6}
\]

We use Theorem 3 to write $F_k$ in terms of brackets of cables of the diagram $D$ which represents the framed link associated with the manifold, and compare the coefficients of $\langle c * D\rangle$. We see that $F_k(M) = \epsilon^{n} \kappa^{-\nu/2} F_l(M)$. Note that $\epsilon = \pm 1$, and $n$ is odd. The result follows. □

It was shown [15] that with $A = -e^{i\pi/2r}$ Lickorish's invariant equals the Reshetikhin–Turaev invariant up to normalisation and a change of variable. So from Theorem 4, it follows that Kaul's invariant defined using the generalised link invariants in vertical framing is a reformulation of the Reshetikhin–Turaev invariant. The Kauffman–Lins invariant defined in Chapter 12 of [4] gives another normalisation of the Witten–Reshetikhin–Turaev invariant following Lickorish's Temperley–Lieb algebra approach. We shall rewrite the inferred result in the following corollary:

Corollary 2. The relationship between the Kauffman–Lins invariant $Z(M)$, which is the Witten–Reshetikhin–Turaev invariant up to a normalisation, and Kaul's invariant $F_k$ is

\[
F_k(M) = Z(M)/Z(S^3), \quad \text{where } Z(S^3) = \mu_0 = \sqrt{\frac{2}{r}}\, \sin\frac{\pi}{r} .
\tag{5.7}
\]

We have shown by an indirect procedure that Kaul's three-manifold invariant equals Witten's partition function. It would be very interesting to see whether there is a direct method of deducing the above result.

Acknowledgements. We would like to thank Ashoke Sen for his valuable suggestions. We are also grateful to T. R. Govindarajan, R. K. Kaul, C. Livingston, and J. Prajapat for their comments.

References

1. Birman, J.: Braids, Links, and Mapping Class Groups. Annals of Mathematics Studies, No. 82, Princeton, NJ: Princeton University Press and Tokyo: University of Tokyo Press, 1974
2. Jones, V.F.R.: A polynomial invariant for knots via von Neumann algebras. Bull. Am. Math. Soc. 12, 103–111 (1985)
3. Jones, V.F.R.: Hecke algebra representations of braid groups and link polynomials. Ann. of Math. 126, 335–388 (1987)
4. Kauffman, L.H., Lins, S.L.: Temperley–Lieb Recoupling Theory and Invariants of 3-Manifolds. Annals of Mathematics Studies, No. 134, Princeton, NJ: Princeton University Press, 1994
5. Kaul, R.K.: Chern–Simons Theory, Knot Invariants, Vertex models and Three-manifold Invariants. hep-th/9804122
6. Kaul, R.K., Govindarajan, T.R.: Three dimensional Chern–Simons theory as a theory of knots and links. Nucl. Phys. B380, 293–333 (1992); Three dimensional Chern–Simons theory as a theory of knots and links II: Multicoloured links. Nucl. Phys. B393, 392–412 (1993)
7. Kaul, R.K.: Complete solution of SU(2) Chern–Simons theory: Chern–Simons theory, coloured-oriented braids and link invariants. Commun. Math. Phys. 162, 289–319 (1994)
8. Kirby, R.: Calculus of Framed Links in $S^3$. Invent. Math. 45, 35–56 (1978)
9. Kirby, R., Melvin, P.: The 3-manifold invariants of Witten and Reshetikhin–Turaev for sl(2, C). Invent. Math. 105, 473–545 (1991)
10. Kirillov, A.N., Reshetikhin, N.Yu.: Representation algebra $U_q(sl_2)$, q-orthogonal polynomials and invariants of links. LOMI preprint E-9-88; see also in: New Developments in the Theory of Knots. ed. T. Kohno, Singapore: World Scientific, 1989
11. Lang, S.: Algebraic Number Theory. New York: Springer, 1986
12. Lickorish, W.B.R.: A representation of combinatorial 3-manifolds. Annals of Math. 76, 531–538 (1962)
13. Lickorish, W.B.R.: Invariants of 3-manifolds from the combinatorics of the Jones polynomial. Pacific J. Math. 149, 337–347 (1991)
14. Lickorish, W.B.R.: Three Manifolds and Temperley Lieb Algebra. Math. Ann. 290, 657–670 (1991)
15. Lickorish, W.B.R.: Calculations with the Temperley–Lieb algebra. Comment. Math. Helvetici 67, 571–591 (1992)
16. Ramadevi, P.: Chern–Simons theory as a theory of knots and links. Ph.D. Thesis
17. Ramadevi, P., Govindarajan, T.R., Kaul, R.K.: Three dimensional Chern–Simons theory as a theory of knots and links III: Compact semi-simple group. Nucl. Phys. B402, 548–566 (1993)
18. Ramadevi, P., Govindarajan, T.R., Kaul, R.K.: Representations of Composite braids and invariants for mutant knots and links in Chern–Simons field theories. Mod. Phys. Lett. A10, 1635–1658 (1995)
19. Reidemeister, K.: Knot Theory. Moscow, Idaho, USA: BCS Associates (translated from Knotentheorie, Ergebnisse der Mathematik und ihrer Grenzgebiete (Alte Folge), Band 1, Heft 1, Berlin–Heidelberg–New York: Springer, 1932)
20. Reshetikhin, N.Yu., Turaev, V.G.: Invariants of 3-manifolds via link polynomials and quantum groups. Invent. Math. 103, 547–597 (1991)
21. Wadati, M., Deguchi, T., Akutsu, Y.: Exactly solvable models and knot theory. Phys. Rep. 180, 247–332 (1989)
22. Wallace, A.D.: Modifications of cobounding manifolds. Canad. J. Math. 12, 503–528 (1960)
23. Witten, E.: Quantum field theory and Jones polynomials. Commun. Math. Phys. 121, 351–399 (1989)

Communicated by H. Araki

Commun. Math. Phys. 209, 51 – 76 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

A Master Identity for Homotopy Gerstenhaber Algebras

Füsun Akman*

Department of Mathematics, Coastal Carolina University, PO Box 261954, Conway, SC 29528-6054, USA. E-mail: [email protected]

Received: 2 March 1998 / Accepted: 16 July 1999

Abstract: We produce a master identity $\{\tilde m\}\{\tilde m, \tilde m, \ldots\} = 0$ for a certain type of homotopy Gerstenhaber algebras, in particular suitable for the prototype, namely the Hochschild complex of an associative algebra. This algebraic master identity was inspired by the work of Getzler–Jones and Kimura–Voronov–Zuckerman in the context of topological conformal field theories. To this end, we introduce the notion of a “partitioned multilinear map” and explain the mechanics of composing such maps. In addition, many new examples of pre-Lie algebras and homotopy Gerstenhaber algebras are given.

Contents

1. Introduction
2. Partitioned Maps
   2.1 Multibraces
   2.2 The algebra of ordered partitions
       2.2.1 Regular partitions
       2.2.2 Partitions involving zeros
   2.3 Partitioned multilinear maps
       2.3.1 Regular partitioned maps
       2.3.2 Higher products of regular partitions
       2.3.3 Higher compositions of regular partitioned maps
       2.3.4 Partitioned Hochschild space
       2.3.5 Partitioned maps involving zeros
3. The Master Identity for Homotopy Gerstenhaber Algebras
4. Substructures and Examples
   4.1 Substructures
   4.2 Hochschild complex revisited
   4.3 Topological vertex operator algebras

* Preliminary version was completed at Utah State University, Logan, UT.

1. Introduction The proliferation of algebraic identities in string theory, such as the lower identities for BRST complexes and topological vertex operator algebras (TVOA) in Lian and Zuckerman’s leading work [18] (see also [24]) and those given by Kimura, Voronov, and Zuckerman for homotopy Gerstenhaber algebras – specifically for TVOA’s – in [15], has led the author to study a common language for multilinear maps and their compositions. Gerstenhaber’s braces in [10] x◦y (which we will write as {x}{y}, following the notation of [15]) have been in existence for more than thirty years and denote a certain rule of composition of two multilinear maps x, y on a vector space A with values in A. The extension of the definition to {x}{y1 , . . . , yn }, namely, to the composition of several multilinear maps with one, is first seen in a limited way in Gerstenhaber [10] again, followed by its explicit utilization by May in the definition of operads (see, for example, [22]), and finally in Getzler’s work [11]. These composition rules were essential in redefining, and exploring new properties and examples of, commonly studied algebras like associative, Lie, strongly homotopy associative (A∞ ), strongly homotopy Lie (L∞ ), Batalin-Vilkovisky, Gerstenhaber, and homotopy Gerstenhaber (G∞ ) algebras. For example, many fundamental (sets of) identities, most notably associativity, can be grouped under m ◦ m = 0,

(1)

where $m$ might be a homogeneous multilinear map or a formal infinite sum of such maps. An overview of multibraces will be given in Sect. 2.1. In turn, the iterated composition operations on multilinear maps have been shown to satisfy many algebraic identities (see [26]) resembling those which arise in the context of operads and homotopy algebras. Curiously, the multibraces

\[
\{v_1, \ldots, v_m\} \cdots \{w_1, \ldots, w_n\}
\]

of [15], defined on a TVOA or any $G_\infty$ algebra, satisfy similar (but more numerous) identities. As a first step in understanding the similarities, the Gerstenhaber–May–Getzler braces were extended by the author in [1] to multiple substitutions/compositions of the form

\[
\{x\}\{y_1, \ldots, y_m\} \cdots \{z_1, \ldots, z_n\}
\]

on the “Hochschild space”

\[
C^\bullet(A) = \mathrm{Hom}(TA; A)
\]

of multilinear maps on $A$, and eventually to

\[
\{x_1, \ldots, x_m\} \cdots \{y_1, \ldots, y_n\}
\]

on the “extended Hochschild space”

\[
C^{\bullet,\bullet}(A) = \mathrm{Hom}(TA; TA)
\]

(examples can be found in Sect. 2.1). Although many new results and identities were unveiled, the unadorned multilinear maps were apparently not enough to describe the rich structure in [15] in a nutshell.

Terminology. For clarity a multigraded vector space will be referred to as a “space”, a space equipped with multilinear maps will be an “algebra”, and an algebra with a distinguished differential will be a “complex”; hence our reference above to the Hochschild space rather than the Hochschild complex (if $A$ is not associative, the “differential” is not square-zero).

In this paper we introduce “partitioned multilinear maps” $x(\pi)$ which are no different from ordinary multilinear maps except in composition. The ordered partitions

\[
\pi = (i_1 | \cdots | i_r)
\]

of nonnegative integers give us a grouping of the arguments of the map $x$, and the algebra generated by the “products” of such partitions governs the composition rule for two maps: if

\[
\pi * \pi' = \pi_1 + \cdots + \pi_n ,
\]

then

\[
\{x(\pi)\}\{y(\pi')\} = \sum_{i=1}^{n} z_i(\pi_i),
\]

where the resulting partitioned maps $z_i(\pi_i)$ are rigorously defined in Sects. 2.3.1 and 2.3.5. The product $*$ is reduced to the familiar rule

\[
(i) * (j) = (i + j - 1)
\]

for singletons, which says that

\[
\{x(i)\}\{y(j)\} = z(i + j - 1),
\]

or the composition of $i$-linear and $j$-linear maps is an $(i + j - 1)$-linear map. As a major change from identities of type (1), we will be composing more than one map on the right with the one on the left, thus we introduce on partitions some higher products $N(1|\cdots)$ of which $N(1|1)$ is the same as $*$. With this notation, it is possible to write the algebraic master equation for certain homotopy Gerstenhaber (or $G_\infty$) algebras as

\[
\{\tilde m\}\{\tilde m, \tilde m, \ldots\} = 0,
\]

where

\[
m = \sum_{\pi} m(\pi)
\]

is a formal sum of all the partitioned multilinear maps involved, and $\tilde m$ is a term-by-term modification of $m$ (only in $\pm$ signs). This identity is to be interpreted as follows: the finite sum of all multilinear maps of partition type $\pi'$, which result from the compositions of all $m(\pi)$ with all $m(\pi_1), m(\pi_2), \ldots, m(\pi_n)$ such that $\pi'$ is a summand of the higher product of the underlying partitions $\pi$, $\pi_k$, is identically zero for every partition $\pi'$. This is in the spirit of $A_\infty$ algebras where only one-slot partitions and pairwise multiplications are allowed.


The prototype of homotopy Gerstenhaber algebras is the Hochschild complex (to be referred to as the “H-complex” locally) with the differential as M(1), the dot (cup) product as M(2), the Gerstenhaber product – composition – as M(1|1), and the Gerstenhaber– May–Getzler higher compositions as M(1|k) (note the use of capital M versus m, to distinguish between multilinear maps on the space A and products defined on such maps). The only partitions which correspond to nonzero maps are (1), (2), and (1|k) (as far as have been verified), and (1|0) may be added to write commutativity relations. The second important example that emerged several decades after the first one is a TVOA, definitely with a different choice of the defining set of partitions. The corresponding maps have not been explicitly enumerated (however see Sect. 4.3) but obvious candidates, such as the BRST operator for M(1) and the Wick product for M(2), have been noticed. There are minor differences between the two examples. For example, M(2) is associative on the nose on the H-complex but pre-Lie on a TVOA; the Gerstenhaber bracket on the H-complex is the antisymmetrization of M(1|1) – hence graded Lie on the nose – but is obtained from a so-called BV-operator on a TVOA. Among the similarities, we note that the commutativity of M(2) is obstructed by the bracket of M(1) and M(1|1). An extensive study of the G∞ maps on a TVOA is certainly in order. To create a third generation of G∞ algebras – as yet nonexistent – we may consider enlarging algebraic structures that involve bilinear pre-Lie products and higher products, of which examples will be provided throughout the paper (the algebra of regular partitions is one). Another path with promise seems to be the use of certain operators – which were introduced in [2] by the author to define higher order differential operators on general algebras – to generate full-blown G∞ algebras beginning from a very small data set. 
We will study interesting substructures of the G∞ algebra in Sect. 4.1. In addition to the master identity, we will introduce a variety of new examples of homotopy Gerstenhaber algebras and pre-Lie algebras. For example, the algebra P of regular partitions with the product ∗ is a right pre-Lie algebra, any vertex operator algebra with the Wick product is a left pre-Lie algebra, and it may be possible to build up a G∞ algebra from scratch starting with a square-zero Batalin–Vilkovisky type operator and the Φ-operators mentioned above (an older example is provided by Frölicher and Nijenhuis in [8]). The multibraces notation of [15] will be replaced by the partitioned map notation m(i1| · · · |ir) so that braces can be used to denote composition.

2. Partitioned Maps

2.1. Multibraces. This is a short review of the multibraces notion introduced by Gerstenhaber [10] and further developed by May [22], Getzler [11], and the author [1]. For more details and historical references the reader is referred to [1]. For the time being it suffices to consider compositions of multilinear maps

\[
x : A^{\otimes n} \to A,
\]

where $A$ is a (super) graded vector space $A = \oplus_{n\in\mathbb{Z}} A^n$. These maps live in the Hochschild space

\[
C^\bullet(A) = \oplus_{n\ge 0}\, \mathrm{Hom}(A^{\otimes n}; A),
\]

which may be replaced by its “completion”

\[
C^\bullet(A) = \mathrm{Hom}(TA; A)
\]

to accommodate formal infinite sums ($TA$ is the tensor space on $A$). Elements of $A$, or maps from the (characteristic zero) field into $A$, are treated on the same footing as higher multilinear maps. If $x$ is an $n$-linear map ($n \ge 0$) which changes the total super degree of its arguments by the integer amount $s$, it is assigned the gradings

\[
d(x) = n - 1 \quad \text{and} \quad |x| = s .
\]

The compact expression

\[
\{x\}\{y_1, \ldots, y_k\} \cdots \{z_1, \ldots, z_l\}\{a_1, \ldots, a_n\},
\tag{2}
\]

where $x$, $y_i$, $z_j$ are maps and $a_1, \ldots, a_n \in A$, indicates that:

• Every symbol except $x$ is to be substituted into one on the left, in every possible way. But:
• Elements of $A$ and symbols within the same pair of braces cannot be substituted into one another, and the internal order within each pair must be retained. Then:
• All such expressions are to be added up, each term accompanied by a product of signs $(-1)^{d(u)d(v)+|u||v|}$ whenever two symbols $u$, $v$ are interchanged with respect to (2).

Super and $d$-degrees are preserved under such multiple composition-substitutions.

Example 1. Let $d(x) = 2$, $d(y) = 1$, and $d(z) = 2$. Then

\[
\begin{aligned}
\{x\}\{y, z\}\{a, b, c, d, e, f\}
={}& (-1)^{d(z)(d(a)+d(b))+|z|(|a|+|b|)}\, x(y(a, b), z(c, d, e), f) \\
&+ (-1)^{d(z)(d(a)+d(b)+d(c))+|z|(|a|+|b|+|c|)}\, x(y(a, b), c, z(d, e, f)) \\
&+ (-1)^{d(y)d(a)+|y||a|+d(z)(d(a)+d(b)+d(c))+|z|(|a|+|b|+|c|)}\, x(a, y(b, c), z(d, e, f)) \\
={}& (-1)^{|z|(|a|+|b|)}\, x(y(a, b), z(c, d, e), f) \\
&+ (-1)^{|z|(|a|+|b|+|c|)}\, x(y(a, b), c, z(d, e, f)) \\
&- (-1)^{|y||a|+|z|(|a|+|b|+|c|)}\, x(a, y(b, c), z(d, e, f)).
\end{aligned}
\]

But note that

\[
\begin{aligned}
\{x\}\{y\}\{z\}\{a, b, c, d, e, f\}
={}& \pm x(y(a, b), z(c, d, e), f) \pm x(y(a, b), c, z(d, e, f)) \pm x(a, y(b, c), z(d, e, f)) \\
& \pm x(z(a, b, c), y(d, e), f) \pm x(z(a, b, c), d, y(e, f)) \pm x(a, z(b, c, d), y(e, f)) \\
& \pm x(y(z(a, b, c), d), e, f) \pm x(y(a, z(b, c, d)), e, f) \pm x(a, y(z(b, c, d), e), f) \\
& \pm x(a, y(b, z(c, d, e)), f) \pm x(a, b, y(z(c, d, e), f)) \pm x(a, b, y(c, z(d, e, f))).
\end{aligned}
\]

Example 2. By definition, if $x$ is an $n$-linear map, then

\[
\{x\}\{a_1\} \cdots \{a_n\} = \sum_{\sigma \in S_n} (-1)^{\mathrm{sgn}(\sigma)}\, x(a_{\sigma(1)}, \ldots, a_{\sigma(n)})
\]

(super degrees ignored for simplicity).
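The substitution rules behind Example 1 can be mimicked with a small enumerator. The following is only an illustrative sketch: it ignores all signs, treats maps as formal symbols printed as strings, and the helper names `op`, `nest` and `substitute` are ours. It reproduces the 3 terms of $\{x\}\{y, z\}\{a, \ldots, f\}$ and the 12 terms of $\{x\}\{y\}\{z\}\{a, \ldots, f\}$, the latter assembled as $\{x\}\{y, z\} + \{x\}\{z, y\} + \{x\}\{\{y\}\{z\}\}$.

```python
from itertools import combinations


def op(name, arity):
    """A formal `arity`-linear map, represented by how it prints its arguments."""
    return arity, lambda args: f"{name}({','.join(args)})"


def nest(outer, inner, position):
    """Composite map obtained by substituting `inner` into slot `position` of `outer`."""
    (n, show_outer), (k, show_inner) = outer, inner

    def show(args):
        filled = (list(args[:position])
                  + [show_inner(args[position:position + k])]
                  + list(args[position + k:]))
        return show_outer(filled)

    return n + k - 1, show


def substitute(x, ops, elems):
    """Unsigned terms of {x}{ops}{elems}: ops keep their order, each eats consecutive elements."""
    n, show_x = x
    if n - len(ops) + sum(arity for arity, _ in ops) != len(elems):
        return []
    terms = []
    for slots in combinations(range(n), len(ops)):
        pending, pos, args = iter(ops), 0, []
        for slot in range(n):
            # A slot of x holds either the next map or a single element.
            arity, show = next(pending) if slot in slots else (1, lambda e: e[0])
            args.append(show(elems[pos:pos + arity]))
            pos += arity
        terms.append(show_x(args))
    return terms


x, y, z = op("x", 3), op("y", 2), op("z", 3)
elems = list("abcdef")
same_brace = substitute(x, [y, z], elems)
separate = (same_brace
            + substitute(x, [z, y], elems)
            + [t for p in range(2) for t in substitute(x, [nest(y, z, p)], elems)])
```

Here `same_brace` lists the terms in the order they appear in Example 1. Restoring the signs would require tracking each interchange of symbols relative to (2), which this sketch deliberately leaves out.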


Example 3. If $m$ is an even bilinear map, then the condition

\[
\{m\}\{m\} = 0
\]

is equivalent to associativity:

\[
\{m\}\{m\}\{a, b, c\} = m(m(a, b), c) + (-1)^{d(m)d(a)+|m||a|}\, m(a, m(b, c)) = m(m(a, b), c) - m(a, m(b, c)) = 0 .
\]

Example 4. For an even bilinear associative map $m$, the bracket defined by

\[
[a, b]_m = \{m\}\{a\}\{b\} = m(a, b) - (-1)^{|a||b|}\, m(b, a)
\]

is a (graded) Lie bracket, because

\[
\{m\}\{m\}\{a, b, c\} = 0 \quad \forall a, b, c
\]

implies that

\[
\{m\}\{m\}\{a\}\{b\}\{c\} = 0 \quad \forall a, b, c,
\]

which is equivalent to the Jacobi identity for $[\ ,\ ]_m$.

Example 5. A vector space $A$ with a bilinear multiplication map $m(a, b) = a \star b$ is a right pre-Lie algebra if the identity

\[
(a \star b) \star c - a \star (b \star c) = (-1)^{|b||c|} \big( (a \star c) \star b - a \star (c \star b) \big)
\tag{3}
\]

holds (there may be any number of gradings on $A$). Gerstenhaber [10] showed that his bigraded composition product

\[
x \circ y = \{x\}\{y\}
\]

on (truncated) $C^\bullet(A)$ is a right pre-Lie product, and therefore the bigraded Gerstenhaber bracket

\[
[x, y] \stackrel{\mathrm{def}}{=} \{x\}\{y\} - (-1)^{d(x)d(y)+|x||y|} \{y\}\{x\}
\]

leads to a bigraded Lie algebra. This last statement is easy to see in general for $(A, m)$ because (3) can be expressed as

\[
\{m\}\{m\}\{a, \{b\}\{c\}\} = 0 \quad \forall a, b, c,
\]

which implies

\[
\{m\}\{m\}\{a\}\{b\}\{c\} = 0 \quad \forall a, b, c .
\]

Additional results about $C^\bullet(A)$ can be found in Sects. 2.3.4 and 4.2.
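To see identity (3) at work in a concrete, ungraded setting, one may take polynomials in one variable with the product $a \star b = a'\,b$ (derivative of $a$ times $b$). This example is our own and does not appear in the text; it is chosen because its associator is $a''bc$, which is visibly symmetric in $b$ and $c$ without vanishing. The sketch below represents polynomials as integer coefficient lists; all helper names are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients (the zero polynomial becomes [])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def deriv(p):
    return trim([k * c for k, c in enumerate(p)][1:])


def mul(p, q):
    out = [0] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def sub(p, q):
    size = max(len(p), len(q))
    p = list(p) + [0] * (size - len(p))
    q = list(q) + [0] * (size - len(q))
    return trim([a - b for a, b in zip(p, q)])


def add(p, q):
    return sub(p, sub([], q))


def star(a, b):
    """Illustrative right pre-Lie product a * b = a'(x) b(x)."""
    return mul(deriv(a), b)


def associator(a, b, c):
    return sub(star(star(a, b), c), star(a, star(b, c)))


def bracket(a, b):
    return sub(star(a, b), star(b, a))


def jacobiator(a, b, c):
    return add(add(bracket(bracket(a, b), c), bracket(bracket(b, c), a)),
               bracket(bracket(c, a), b))


a, b, c = [0, 0, 0, 1], [1, 1], [0, 0, 1]   # x^3, 1 + x, x^2
```

With these inputs the associator is $6x^3 + 6x^4$ in either order of the last two arguments, so the product is right pre-Lie but not associative, and the commutator bracket satisfies the Jacobi identity as the general argument predicts.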


Example 6. In order to write master identities for strongly homotopy associative and strongly homotopy Lie ($A_\infty$ and $L_\infty$) algebras, we are forced to introduce the total degree

\[
||x|| = d(x) + |x|
\]

on $C^\bullet(A)$, and to define

\[
\tilde x(a_1, \ldots, a_n) = (-1)^{(n-1)||a_1|| + (n-2)||a_2|| + \cdots + ||a_{n-1}||}\, \{x\}\{a_1, \ldots, a_n\}
\]

for $d(x) = n - 1$. Then, for example, an $A_\infty$ algebra is nothing but a graded vector space $A$ with a formal infinite sum of maps

\[
m = m(1) + m(2) + \cdots
\]

satisfying

\[
\{\tilde m\}\{\tilde m\} = 0 .
\]

In this setting we have $d(m(i)) = i - 1$ and $|m(i)| \equiv i \pmod 2$, so that $||m(i)||$ is always odd. Moreover, an interchange of $u$ and $v$ results in the sign $(-1)^{||u||\,||v||}$. The master identity above reduces to the associativity condition on $m(2)$ when $m = m(2)$.

Example 7. It is well-known [17] that graded antisymmetrization of the products in an $A_\infty$ algebra leads to an $L_\infty$ algebra. The proof in [1] uses the following simple fact: the usual $L_\infty$ identities are equivalent to

\[
\{\tilde m\}\{\tilde m\}\{a_1\} \cdots \{a_n\} = 0 \quad \forall a_1, \ldots, a_n
\]

in this case, which is an immediate consequence of

\[
\{\tilde m\}\{\tilde m\}\{a_1, \ldots, a_n\} = 0 \quad \forall a_1, \ldots, a_n .
\]

Example 8. In [1], the Hochschild space was extended to

\[
C^{\bullet,\bullet}(A) = \mathrm{Hom}(TA; TA)
\]

by allowing multilinear maps with values in $TA$, including elements of $TA$. The rules remain the same, but many formerly inadmissible expressions have a meaning in the new space. For example,

\[
\{a_1, \ldots, a_n\} \stackrel{\mathrm{def}}{=} a_1 \otimes \cdots \otimes a_n ,
\]

and

\[
\{a\}\{b\} = \{a, b\} - (-1)^{|a||b|} \{b, a\} = a \otimes b - (-1)^{|a||b|}\, b \otimes a .
\]


2.2. The algebra of ordered partitions.

2.2.1. Regular partitions. We will define a binary multiplication $*$ on the free abelian group $\mathcal{P}$ spanned by the (regular) ordered partitions

\[
\pi = (i_1 | \cdots | i_r), \quad r \ge 1, \quad i_1, \ldots, i_r \ge 1
\]

of all positive integers. These partitions have been studied under the name “compositions” in the literature (see [3] for an introduction), which we feel would be easy to confuse with compositions of maps in this paper. It is known [3] that there are

\[
c(m, n) \stackrel{\mathrm{def}}{=} \binom{n-1}{m-1} = \frac{(n-1)!}{(m-1)!\,(n-m)!}
\]

ordered partitions of a positive integer $n$ with $m$ “parts” (which we will call slots), and the generating function for $c(m, n)$ is given by

\[
\sum_{n=0}^{\infty} c(m, n)\, q^n = \frac{q^m}{(1-q)^m} .
\]
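The count $c(m, n)$ is easy to confirm by enumeration. The following sketch is our own illustration; the generator name `compositions` is hypothetical. It lists the ordered partitions of $n$ with $m$ slots and compares their number with $\binom{n-1}{m-1}$.

```python
from math import comb


def compositions(n, m):
    """Yield the ordered partitions of n into m slots, every slot at least 1."""
    if m < 1:
        return
    if m == 1:
        if n >= 1:
            yield (n,)
        return
    # The first slot must leave at least 1 for each of the remaining m - 1 slots.
    for first in range(1, n - m + 2):
        for rest in compositions(n - first, m - 1):
            yield (first,) + rest


counts_agree = all(
    sum(1 for _ in compositions(n, m)) == comb(n - 1, m - 1)
    for n in range(1, 9)
    for m in range(1, n + 1)
)
```

For instance, `list(compositions(4, 2))` gives the three partitions (1|3), (2|2), (3|1), in agreement with $c(2, 4) = 3$.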

Ordered partitions are to be thought of as grouping the arguments of multilinear maps

\[
x : A^{\otimes(i_1+\cdots+i_r)} \to A
\]

of type $\pi$. We grade the basis elements $\pi$ of $\mathcal{P}$ by

\[
d(\pi) = i_1 + \cdots + i_r - 1
\]

(total number of arguments minus one) and

\[
\bar d(\pi) = r - 2
\]

(total number of slots minus two), and remark that of the two only $d$ will be preserved under the multiplication $*$ and the higher products to be introduced. Defining $\bar d$ to be $r - 1$ would have been more useful for the limited purpose of preserving it under $*$, but our present definition will have significance later on, in the definition of the master identity.

The product $\pi * \pi'$ of two ordered partitions will be a sum $\pi_1 + \cdots + \pi_n$ of basis elements, where the multiplicity of each $\pi_l$ is chosen to be 1 for technical reasons. We will later interpret the outcome as the collection of types of multilinear maps that arise from the composition of two maps of types $\pi$ and $\pi'$. An arbitrary element of the algebra $\mathcal{P}$ will be denoted by $p$, $\hat p$, etc.

Notation: If $\pi$ appears with a nonzero (integral) coefficient in $p$, we will write $\pi \to p$. If $\bar d(\pi) = \bar d(\pi')$ and each slot of $\pi$ is at least as large as the corresponding slot of $\pi'$, we will denote this by $\pi \ge \pi'$.

We first describe the product $p = (i) * \pi'$ by

\[
p = \sum_{\pi'' \ge \pi',\ d(\pi'') = d(\pi') + d(i)} \pi'' .
\tag{4}
\]

In short, we write the partition $\pi'$ and add to its slots nonnegative integers totaling $i - 1$ in all possible ways (the “1” in $i - 1$ refers to the number of factors to the right of $(i)$, a notion that will come up again during the introduction of higher products). Then we add up all the outcomes in the space $\mathcal{P}$. The definition of

\[
p = \pi * \pi' = (i_1 | \cdots | i_r) * \pi'
\]

is first approximated by

\[
\hat p = \sum_{l=1}^{r}\ \sum_{\tilde\pi \to (i_l) * \pi'} (i_1 | \cdots | i_{l-1} | \tilde\pi | i_{l+1} | \cdots | i_r),
\tag{5}
\]

after which we combine like terms under coefficient 1 and write

\[
p = \sum_{\pi'' \to \hat p} \pi'' ,
\tag{6}
\]

that is, we multiply one slot of $\pi$ with $\pi'$ at a time and insert the result into the slot in question, finally adding up, and erasing duplicate terms.

Example 9. In the following computation, we reduce the coefficient of $(1|1|3|4)$ to 1 although it shows up twice in $\hat p$:

\[
(1|1|4) * (1|3) = (1|3|1|4) + (1|1|3|4) + (1|1|4|3) + (1|1|2|5) + (1|1|1|6).
\]

Example 10. The special case

\[
(i) * (j) = (i + j - 1) = (j) * (i)
\]

reflects the fact that the composition of an $i$-linear map with a $j$-linear map in the sense of Gerstenhaber is an $(i + j - 1)$-linear map.

The partition $(1)$ acts as a two-sided identity in the noncommutative algebra $\mathcal{P}$. The product $*$ is neither associative nor left pre-Lie, but it does turn out to be right pre-Lie. These statements are false if we count multiplicities.

Proposition 1. The algebra $\mathcal{P}$ of ordered partitions is a right pre-Lie algebra under the product $*$, namely, the identity

\[
(p_1 * p_2) * p_3 - p_1 * (p_2 * p_3) = (p_1 * p_3) * p_2 - p_1 * (p_3 * p_2)
\]

holds. As a result, the (ungraded) Gerstenhaber bracket

\[
[p_1, p_2] \stackrel{\mathrm{def}}{=} p_1 * p_2 - p_2 * p_1
\]

on $\mathcal{P}$ is a Lie bracket.
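Rules (4)–(6) translate directly into a short program for basis elements. The sketch below is one possible reading of those rules, not code from the paper: partitions are tuples, the product of two basis elements is returned as a set (which automatically erases duplicate terms, as (6) prescribes), and the names `weak_compositions`, `star_single` and `star` are ours.

```python
def weak_compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers that sum to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in weak_compositions(total - first, parts - 1):
            yield (first,) + rest


def star_single(i, pi2):
    """(i) * pi2 as in rule (4): add nonnegative integers totaling i - 1 to the slots of pi2."""
    return {tuple(slot + u for slot, u in zip(pi2, us))
            for us in weak_compositions(i - 1, len(pi2))}


def star(pi1, pi2):
    """pi1 * pi2 as in rules (5)-(6): multiply one slot at a time, insert, drop duplicates."""
    result = set()
    for l, slot in enumerate(pi1):
        for middle in star_single(slot, pi2):
            result.add(pi1[:l] + middle + pi1[l + 1:])
    return result


example_9 = star((1, 1, 4), (1, 3))
```

The set `example_9` contains the five partitions of Example 9, and `star((3,), (4,))` returns the single partition (6) of Example 10. The sketch handles only basis elements; extending it linearly to sums is a separate step.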


Proof. It suffices to give a proof for basis elements

\[
\pi_1 = (i_1 | \cdots | i_r), \quad \pi_2 = (j_1 | \cdots | j_s), \quad \pi_3 = (k_1 | \cdots | k_t).
\]

(A) Special case: we will show that when $\pi_1 = (i)$ associativity holds on the nose, i.e.

\[
((i) * \pi_2) * \pi_3 = (i) * (\pi_2 * \pi_3).
\]

Let us compare the two sides (up to repetition of terms, hence the symbol $\approx$ instead of $=$) from the definition. We have

\[
(i) * \pi_2 = \sum_{u_1+\cdots+u_s = i-1} (j_1 + u_1 | \cdots | j_s + u_s)
\]

and

\[
((i) * \pi_2) * \pi_3 \approx \sum_{\alpha=1}^{s} \sum_{\beta=1}^{t}\ \sum_{u_1+\cdots+u_s = i-1}\ \sum_{v_1+\cdots+v_t = j_\alpha+u_\alpha-1}
(j_1 + u_1 | \cdots | j_{\alpha-1} + u_{\alpha-1} | k_1 + v_1 | \cdots | k_t + v_t | j_{\alpha+1} + u_{\alpha+1} | \cdots | j_s + u_s).
\tag{7}
\]

On the other hand, we have

\[
\pi_2 * \pi_3 \approx \sum_{\alpha=1}^{s} (j_1 | \cdots | (j_\alpha) * \pi_3 | \cdots | j_s)
\approx \sum_{\alpha=1}^{s} \sum_{\beta=1}^{t}\ \sum_{w_1+\cdots+w_t = j_\alpha-1} (j_1 | \cdots | j_{\alpha-1} | k_1 + w_1 | \cdots | k_t + w_t | j_{\alpha+1} | \cdots | j_s)
\]

and

\[
(i) * (\pi_2 * \pi_3) \approx \sum_{\alpha=1}^{s} \sum_{\beta=1}^{t}\ \sum_{w_1+\cdots+w_t = j_\alpha-1}\ \sum_{z_1+\cdots+z_{s+t-1} = i-1}
(j_1 + z_1 | \cdots | j_{\alpha-1} + z_{\alpha-1} | k_1 + w_1 + z_\alpha | \cdots | k_t + w_t + z_{\alpha+t-1} | j_{\alpha+1} + z_{\alpha+t} | \cdots | j_s + z_{s+t-1}).
\tag{8}
\]

After making a (not necessarily invertible) change of variables

\[
u_1 = z_1, \ \ldots, \ u_{\alpha-1} = z_{\alpha-1}, \quad u_\alpha = z_\alpha + \cdots + z_{\alpha+t-1}, \quad u_{\alpha+1} = z_{\alpha+t}, \ \ldots, \ u_s = z_{s+t-1},
\]
\[
v_1 = w_1 + z_\alpha, \ \ldots, \ v_t = w_t + z_{\alpha+t-1},
\]

we obtain (7) from (8). Note that any excess terms are duplicates and can be discarded.

(B) General case: we have

\[
\pi_1 * \pi_2 \approx \sum_{m=1}^{r} (i_1 | \cdots | (i_m) * \pi_2 | \cdots | i_r)
\]

and

\[
\begin{aligned}
(\pi_1 * \pi_2) * \pi_3 \approx{}& \sum_{m=1}^{r} \sum_{\alpha=1}^{m-1} (i_1 | \cdots | (i_\alpha) * \pi_3 | \cdots | (i_m) * \pi_2 | \cdots | i_r) \\
&+ \sum_{m=1}^{r} (i_1 | \cdots | ((i_m) * \pi_2) * \pi_3 | \cdots | i_r) \\
&+ \sum_{m=1}^{r} \sum_{\alpha=m+1}^{r} (i_1 | \cdots | (i_m) * \pi_2 | \cdots | (i_\alpha) * \pi_3 | \cdots | i_r).
\end{aligned}
\]

But then $\pi_1 * (\pi_2 * \pi_3)$ is just

\[
\pi_1 * (\pi_2 * \pi_3) = \sum_{m=1}^{r} (i_1 | \cdots | (i_m) * (\pi_2 * \pi_3) | \cdots | i_r),
\]

and

\[
(\pi_1 * \pi_2) * \pi_3 - \pi_1 * (\pi_2 * \pi_3)
\tag{9}
\]

consists of all terms in which $\pi_2$ and $\pi_3$ (in any order) are multiplied on the right by two distinct slots in $\pi_1$, thanks to the special case. Obviously,

\[
(\pi_1 * \pi_3) * \pi_2 - \pi_1 * (\pi_3 * \pi_2)
\tag{10}
\]

consists of the exact same terms by symmetry, and the difference of (9) and (10) is zero. □

Remark 1. Many right pre-Lie proofs follow the same pattern: one shows $(a \star b) \star c - a \star (b \star c)$ is (possibly graded) symmetric in $b$ and $c$.

2.2.2. Partitions involving zeros. In order to write down the $G_\infty$ identities in a uniform fashion, we will need to deal with partitions

\[
\pi = (i_1 | \cdots | i_r) \quad \text{with} \quad i_1, \ldots, i_r \ge 0 .
\]

Let us denote the basis of $\mathcal{P}$ consisting of the regular partitions by $B$, the set of partitions with at least one zero slot by $B_0$, their union $B \cup B_0$ by $\bar B$, and the free abelian group spanned by $\bar B$ by $\bar{\mathcal{P}}$. We extend the multiplication $*$ to $\bar{\mathcal{P}}$ by the same rules (4)–(6). In effect, everything is as before except when there is a zero slot on the left: we have

\[
(0) * \pi' = 0
\]

(not $(0)$) since $d(0) = -1$ (an empty sum is zero).
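The effect of zero slots can be explored with a linear extension of the product. The sketch below is our own illustration, under the assumption that rules (4)–(6) are applied verbatim to partitions with zero slots and then extended bilinearly with integer coefficients; the names `star_basis`, `star`, `minus` and `associator` are hypothetical. It evaluates the two associators $((i) * (0)) * (j) - (i) * ((0) * (j))$ and $((i) * (j)) * (0) - (i) * ((j) * (0))$ for $i = 3$, $j = 2$.

```python
from collections import Counter


def weak_compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers that sum to `total`."""
    if parts == 0:
        if total == 0:
            yield ()
        return
    for first in range(total + 1):
        for rest in weak_compositions(total - first, parts - 1):
            yield (first,) + rest


def star_basis(pi1, pi2):
    """Product of two basis partitions (zero slots allowed), duplicates erased."""
    result = set()
    for l, slot in enumerate(pi1):
        # For slot == 0 the required total is -1, so nothing is produced: (0) * pi2 = 0.
        for us in weak_compositions(slot - 1, len(pi2)):
            middle = tuple(s + u for s, u in zip(pi2, us))
            result.add(pi1[:l] + middle + pi1[l + 1:])
    return result


def star(p, q):
    """Bilinear extension to formal integer combinations stored as Counters."""
    out = Counter()
    for a, coeff_a in p.items():
        for b, coeff_b in q.items():
            for c in star_basis(a, b):
                out[c] += coeff_a * coeff_b
    return out


def minus(p, q):
    out = Counter(p)
    out.subtract(q)
    return {k: v for k, v in out.items() if v != 0}


def associator(p, q, r):
    return minus(star(star(p, q), r), star(p, star(q, r)))


i, zero, j = Counter({(3,): 1}), Counter({(0,): 1}), Counter({(2,): 1})
first = associator(i, zero, j)
second = associator(i, j, zero)
```

The two results differ, `first` being the single partition $(i + j - 2) = (3)$ and `second` being zero, so the associator is no longer symmetric in its last two arguments once zero slots are admitted.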


Example 11. (1|0|3) ∗ (2) = (2|0|3) + (1|0|4), (2) ∗ (1|0|3) = (1|0|4) + (1|1|3) + (2|0|3). Remark 2. The partition (1) still acts as the left multiplicative identity, but the algebra ¯ ∗) is not right pre-Lie any more. For example, (P, ( (i) ∗ (0) ) ∗ (j ) − (i) ∗ ( (0) ∗ (j ) ) = (i + j − 2), whereas ( (i) ∗ (j ) ) ∗ (0) − (i) ∗ ( (j ) ∗ (0) ) = (i + j − 2) − (i + j − 2) = 0 for i ≥ 2, j ≥ 1. 2.3. Partitioned multilinear maps. 2.3.1. Regular partitioned maps. Let A be a super-graded vector space. We introduce the notion of a (regular) partitioned multilinear map x(π) = x(i1 | · · · |ir ) : A⊗i1 ⊗ · · · ⊗ A⊗ir → A, π ∈ B as an (i1 + · · · + ir )-linear map which is labeled by a regular ordered partition π and is distinguished from ordinary maps only by the way we compose it (Getzler and Jones mention in [13] multilinear maps mk,l which are similar to our m(k|l) but do not elaborate on their composition properties). For substitution into x(π ), we will use the notation {x(i1 | · · · |ir )}{a1 , . . . , ai1 |ai1 +1 , . . . , ai1 +i2 | · · · |ai1 +···ir−1 +1 , . . . , ai1 +···+ir } = {x(i1 | · · · |ir )}{a (1) | · · · |a (r) } rather than {x(i1 | · · · |ir )}{a1 , . . . , ai1 +···+ir }. When x = y1 (π1 ) + · · · + yk (πk ) (with d(π1 ) = · · · = d(πk ) = n − 1) is a sum of maps of different types, we will still write {x}{a1 , . . . , an } to mean {y1 (π1 )}{a (1) | · · · |a (r) } + · · · (say for π1 = (i1 | · · · |ir ) etc.), where each map yl (πl ) is followed by the suitable barred braces. In particular, we may use ordinary braces after the composition of two maps x(π ) and y(π 0 ) to denote the sum of all cases. Recalling from the previous section that {x(π )}{y(π 0 )} is designed to be of the form X zπ 00 (π 00 ), {x(π)}{y(π 0 )} = π 00 →π ∗π 0


we proceed to define each component $z_{\pi''}(\pi'')$ rigorously. In the following account, the ± signs preceding each term are computed as in Sect. 2.1, depending solely on the super degrees and the d-degrees of the maps and elements involved; we will in general omit the full expressions to save space. Note that $d(x(\pi)) = d(\pi)$ and $\bar d(x(\pi)) = \bar d(\pi)$ by definition. As with partitions, we start with the simpler case $\{x(i)\}\{y(\pi')\}$, $\pi' = (j_1|\cdots|j_s)$. Let $\pi'' = (k_1|\cdots|k_s) \to (i) * (\pi')$. To define $\{z_{\pi''}(\pi'')\}\{a^{(1)}|\cdots|a^{(s)}\}$, we consider all possible subdivisions $S$
\[ \{a^{(1)}\} = \{b^{(1)}, c^{(1)}, d^{(1)}\}, \quad \dots, \quad \{a^{(s)}\} = \{b^{(s)}, c^{(s)}, d^{(s)}\} \tag{11} \]
of the elements $\{a_1, \dots, a_{k_1+\cdots+k_s}\} = \{a^{(1)}|\cdots|a^{(s)}\}$ such that $c^{(1)}$ contains $j_1$ consecutive elements in $a^{(1)}$, $c^{(2)}$ has $j_2$ in $a^{(2)}$, and so on. Then we have
\[ \{z_{\pi''}(\pi'')\}\{a^{(1)}|\cdots|a^{(s)}\} \overset{\mathrm{def}}{=} \sum_{S} \pm \{x(i)\}\{\, \{b^{(1)}\}\cdots\{b^{(s)}\},\ \{y(j_1|\cdots|j_s)\}\{c^{(1)}|\cdots|c^{(s)}\},\ \{d^{(1)}\}\cdots\{d^{(s)}\} \,\}. \]

The notation {b(1) } · · · {b(s) }, for example, indicates that we sum over all possible permutations of the remaining elements that come before the chosen substrings c(l) , while retaining the order within each b(l) . The strings b(l) and d (l) may be empty. Example 12. Since (3) ∗ (2|4) = (2|6) + (3|5) + (4|4), the composition {x(3)}{y(2|4)} is a sum of three partitioned maps. The first one is given by {x(3)}{y(2|4)}{a, b|c, d, e, f, g, h} = ±{x(3)}{ {y(2|4)}{a, b|c, d, e, f }, g, h} ±{x(3)}{c, {y(2|4)}{a, b|d, e, f, g}, h} ±{x(3)}{c, d, {y(2|4)}{a, b|e, f, g, h} }, the second one by {x(3)}{y(2|4)}{a, b, c|d, e, f, g, h} = ±{x(3)}{ {y(2|4)}{a, b|d, e, f, g}, c, h} ±{x(3)}{ {y(2|4)}{a, b|d, e, f, g}, h, c} ±{x(3)}{d, {y(2|4)}{a, b|e, f, g, h}, c} ±{x(3)}{a, {y(2|4)}{b, c|d, e, f, g}, h} ±{x(3)}{a, d, {y(2|4)}{b, c|e, f, g, h} } ±{x(3)}{d, a, {y(2|4)}{b, c|e, f, g, h} },


and the third one by
\[ \{x(3)\}\{y(2|4)\}\{a, b, c, d|e, f, g, h\} = \pm\{x(3)\}\{\{y(2|4)\}\{a, b|e, f, g, h\}, c, d\} \pm \{x(3)\}\{a, \{y(2|4)\}\{b, c|e, f, g, h\}, d\} \pm \{x(3)\}\{a, b, \{y(2|4)\}\{c, d|e, f, g, h\}\}. \]

Remark 3. Note that unlike the ordinary composition for non-partitioned maps, the order of the elements $\{a^{(1)}|\cdots|a^{(s)}\}$ does change in the final substitution. However, the order within each $a^{(l)}$ remains the same.

In order to extend the definition to $\{x(\pi)\}\{y(\pi')\} = \{x(i_1|\cdots|i_r)\}\{y(j_1|\cdots|j_s)\}$, we first identify the slot(s) in $(i_1|\cdots|i_r)$ which give rise to the partition $\pi''$ in the product $\pi * \pi'$ (when $r \ge 2$, it is possible to have the outcome $\pi''$ repeated in the multiplication process; we then reduce the coefficient of $\pi''$ in $\pi * \pi'$ to 1 and add all the multilinear maps of type $\pi''$ in $\{x(\pi)\}\{y(\pi')\}$). Without loss of generality, assume that the generating slot is the first one and that $\pi''$ is not repeated. Then $\pi'' = (j_1 + k_1|\cdots|j_s + k_s|i_2|\cdots|i_r)$, where $k_1 + \cdots + k_s = i_1 - 1$, and
\[ \{z_{\pi''}(\pi'')\}\{a^{(1)}|\cdots|a^{(s+r-1)}\} \overset{\mathrm{def}}{=} \pm\{\, \{x(i_1|\cdots|i_r)\}\{-|a^{(s+1)}|\cdots|a^{(s+r-1)}\} \,\}\{y(j_1|\cdots|j_s)\}\{a^{(1)}|\cdots|a^{(s)}\}. \]

In other words, we fix the arguments of the remaining r − 1 slots of x(π ), and compose the two maps as if the first has only one slot, according to the recipe in the previous paragraph. Example 13. We verify that (1|2) ∗ (2|3) = (2|3|2) + (1|2|4) + (1|3|3), and compute the first partitioned map which arises from multiplying the first slot in (1|2) by (2|3): {x(1|2)}{y(2|3)}{a, b|c, d, e|f, g} = ±{ {x(1|2)}{−|f, g} }{y(2|3)}{a, b|c, d, e}. The last two maps come from multiplication with the second slot, hence we have {x(1|2)}{y(2|3)}{a|b, c|d, e, f, g} = ±{ {x(1|2)}{a|−} }{ {y(2|3)}{b, c|d, e, f }, g} ±{ {x(1|2)}{a|−} }{d, {y(2|3)}{b, c|e, f, g} }, and {x(1|2)}{y(2|3)}{a|b, c, d|e, f, g} = ±{ {x(1|2)}{a|−} }{ {y(2|3)}{b, c|e, f, g}, d} ±{ {x(1|2)}{a|−} }{b, {y(2|3)}{c, d|e, f, g} }.
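The subdivision recipe for a single-slot map $x(i)$ can be sketched in code. The Python fragment below is an illustration only, not the author's construction: it assumes regular partitions (all $j_l \ge 1$), ignores the ± signs exactly as the text does, and represents each unsigned term as the tuple of arguments fed into $x(i)$. The function names `shuffles` and `terms` are hypothetical.

```python
from itertools import product


def shuffles(strings):
    """All interleavings of the strings that keep each string's own order."""
    strings = [s for s in strings if s]
    if not strings:
        return [()]
    out = []
    for n, s in enumerate(strings):
        rest = strings[:n] + [s[1:]] + strings[n + 1:]
        out += [(s[0],) + tail for tail in shuffles(rest)]
    return out


def terms(slots, js):
    """Unsigned terms of {x(i)}{y(j_1|...|j_s)} on the slots a^(1)|...|a^(s)."""
    choices = []
    for a, j in zip(slots, js):
        # every way of picking j consecutive elements c out of the slot a,
        # leaving a left leftover b and a right leftover d
        choices.append([(a[:p], a[p:p + j], a[p + j:])
                        for p in range(len(a) - j + 1)])
    result = []
    for subdivision in product(*choices):
        bs = [b for b, _, _ in subdivision]
        cs = [c for _, c, _ in subdivision]
        ds = [d for _, _, d in subdivision]
        y = "y{" + "|".join(",".join(c) for c in cs) + "}"
        for left in shuffles(bs):
            for right in shuffles(ds):
                result.append(left + (y,) + right)
    return result


first = terms((("a", "b"), ("c", "d", "e", "f", "g", "h")), (2, 4))
second = terms((("a", "b", "c"), ("d", "e", "f", "g", "h")), (2, 4))
third = terms((("a", "b", "c", "d"), ("e", "f", "g", "h")), (2, 4))
```

With these inputs the three lists have 3, 6 and 3 entries, matching the three partitioned maps written out in Example 12; each entry has length 3, the arity of $x(3)$.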


Remark 4. Suppose we are trying to compose $x(\pi)$ and $y(\pi')$, with $\pi = (i_1|\cdots|i_r)$ and $\pi' = (j_1|\cdots|j_s)$. Once we compute the product $\pi * \pi'$ and decide which summand $\pi'' = (k_1|\cdots|k_t) \to \pi * \pi'$ to fix for the time being, we proceed to write down the $\pi''$-summand $z(\pi'')$ of $\{x(\pi)\}\{y(\pi')\}$ as follows. We examine the list $\{a^{(1)}|\cdots|a^{(t)}\}$ of variable arguments of $z(\pi'')$, partitioned according to $\pi''$, and choose $s$ consecutive slots each of which is at least as large as the corresponding slot in $y(\pi')$ (we will eventually take a sum over such choices). Our next choice is a substring of consecutive elements in each of the selected host slots, to be fed into $y(\pi')$ (we will sum over these choices as well). We then start writing out $\{x(\pi)\}$ followed by a pair of braces enclosing all the variables from the list $\{a^{(1)}|\cdots|a^{(t)}\}$ as well as $y(\pi')$ applied to the chosen substrings inside the $a^{(i)}$ (denoted above by $c^{(i)}$). Any slot in $\{a^{(1)}|\cdots|a^{(t)}\}$ untouched by $y(\pi')$ will go unchanged into $x(\pi)$, and the rest of the symbols will be squeezed into one designated slot in $x(\pi)$. The details inside this slot are as follows: we permute all “left leftovers” (denoted by $b^{(i)}$) and write them first (do not change the internal order of any $b^{(i)}$), then we write the element $\{y(\pi')\}\{c^{(1)}|\cdots|c^{(t)}\}$, and finally finish the list by writing all permutations of the “right leftovers” (denoted by $d^{(i)}$).

Example 14. In [2], we defined higher order differential operators $\Delta : A \to A$ for a noncommutative, nonassociative algebra $(A, m)$ with a bilinear map $m$, so as to coincide with the commutative and associative case given by Koszul in [16]. A follow-up on the simplification of notation due to multibraces, and a generalization, appeared in [1]. A linear operator $\Delta$ is called a differential operator of order $\le r$ if and only if a certain $(r + 1)$-linear map
\[ \Phi^{r+1}_\Delta(a_1, \dots, a_{r+1}) \tag{12} \]
is identically zero; we now want to think of (12) as a partitioned map $\Phi^{r+1}_\Delta(r|1)$, because the last slot is distinguished and the original inductive definition
\begin{align*}
\Phi^1_\Delta(a) &= \Delta(a), \\
\Phi^2_\Delta(a, b) &= \Delta(ab) - \Delta(a)b - (-1)^{|\Delta||a|} a\Delta(b), \\
\Phi^{r+2}_\Delta(a_1, \dots, a_r, b, c) &= \Phi^{r+1}_\Delta(a_1, \dots, a_r, bc) - \Phi^{r+1}_\Delta(a_1, \dots, a_r, b)c \\
&\quad - (-1)^{|b|(|\Delta| + |a_1| + \cdots + |a_r|)}\, b\,\Phi^{r+1}_\Delta(a_1, \dots, a_r, c), \qquad r \ge 1,
\end{align*}
in [2] can be conveniently rewritten as
\begin{align*}
\Phi^1_\Delta(a) &= \{\Delta\}\{a\}, \\
\Phi^2_\Delta(a, b) &= [\Phi^1_\Delta(1), m(2)]\{a, b\}, \\
\Phi^{r+2}_\Delta(a_1, \dots, a_r, b, c) &= [\Phi^{r+1}_\Delta(r|1), m(2)]\{a_1, \dots, a_r|b, c\}, \qquad r \ge 1,
\end{align*}
in terms of Gerstenhaber brackets and partitioned maps. Note that $(r|1) * (2) = (2) * (r|1) = (r + 1|1) + (r|2)$. Here we may designate the same multilinear map $\Phi^{r+2}_\Delta$ to be of type $(r + 2)$, $(r + 1|1)$, or $(r|2)$, depending on what we want to describe.
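The inductive definition of Example 14 is easy to test in the simplest setting. The sketch below is illustrative only and makes several assumptions that are not in the text: $A$ is the commutative algebra of integer polynomials in one variable (stored as coefficient tuples, constant term first), every degree is even so all signs are $+1$, and the operators tested are $d/dx$ and $d^2/dx^2$. All function names are hypothetical.

```python
from itertools import zip_longest


def trim(p):
    """Drop trailing zero coefficients; the zero polynomial is ()."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return tuple(p)


def add(p, q):
    return trim(a + b for a, b in zip_longest(p, q, fillvalue=0))


def neg(p):
    return tuple(-c for c in p)


def mul(p, q):
    if not p or not q:
        return ()
    out = [0] * (len(p) + len(q) - 1)
    for a, ca in enumerate(p):
        for b, cb in enumerate(q):
            out[a + b] += ca * cb
    return trim(out)


def deriv(p):
    """d/dx on coefficient tuples."""
    return trim([k * c for k, c in enumerate(p)][1:])


def second(p):
    """d^2/dx^2."""
    return deriv(deriv(p))


def phi(delta, args):
    """Phi^n_Delta(a_1, ..., a_n) by the inductive definition, even degrees only."""
    if len(args) == 1:
        return delta(args[0])
    *head, b, c = args
    return add(add(phi(delta, head + [mul(b, c)]),
                   neg(mul(phi(delta, head + [b]), c))),
               neg(mul(b, phi(delta, head + [c]))))


x = (0, 1)        # the polynomial x
p = (1, 2, 3)     # 1 + 2x + 3x^2
q = (0, 0, 1)     # x^2
```

In this toy setting `phi(deriv, [p, q])` is the zero polynomial (the Leibniz rule, so $d/dx$ has order $\le 1$), `phi(second, [x, x])` is the constant 2 (so $d^2/dx^2$ is not of order $\le 1$), and `phi(second, [p, q, x])` vanishes again, consistent with order $\le 2$.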


2.3.2. Higher products of regular partitions. The algebra $P$ enjoys higher products
\[ N(1|\lambda_1|\cdots|\lambda_t) : P \otimes P^{\otimes\lambda_1} \otimes \cdots \otimes P^{\otimes\lambda_t} \to P, \tag{13} \]
where $(\lambda_1|\cdots|\lambda_t)$ itself is any regular partition, and
\[ N(1|1)(\pi, \pi') \overset{\mathrm{def}}{=} \pi * \pi'. \]
In some calculations we will switch to the notation $\pi * [\pi_1, \dots, \pi_{\lambda_1}] * [\pi_{\lambda_1+1}, \dots, \pi_{\lambda_2}] * \cdots$, but for the time being we want to emphasize the underlying partition $(1|\lambda_1|\cdots|\lambda_t)$. Such products will be used to model higher compositions of partitioned maps in the next section. For $(i)$, $\pi_1 = (j_1|\cdots|j_s)$, and $\pi_2 = (k_1|\cdots|k_t)$ with $i, s, t \ge 2$, we define
\[ \{N(1|2)\}\{(i)|\pi_1, \pi_2\} = (i) * [\pi_1, \pi_2] \overset{\mathrm{def}}{=} (i - 1) * (\pi_1 \mathbin{+\!+} \pi_2), \]
where $\pi_1 \mathbin{+\!+} \pi_2$ is given by
\[ \pi_1 \mathbin{+\!+} \pi_2 \overset{\mathrm{def}}{=} (j_1|\cdots|j_{s-2}|j_{s-1} + k_1|j_s + k_2|k_3|\cdots|k_t) + (k_1|\cdots|k_{t-2}|k_{t-1} + j_1|k_t + j_2|j_3|\cdots|j_s), \]
with the understanding that one of the terms on the right-hand side is to be dropped if they turn out to be equal. The procedure amounts to overlapping the two partitions in two slots and copying down the free slots on either end while adding up the entries in the overlapping slots. This can be done in two ways if the partitions are long enough. For example, $(1|2|3) \mathbin{+\!+} (4|5|6|7) = (1|6|8|6|7) + (4|5|7|9|3)$. Finally, note that nonnegative integers totaling $i - 2$ (“2” referring to the number of partitions on the right, being multiplied by $(i)$) are added to the slots in every possible way to complete the product. Similarly, for $i \ge 3$ and $\pi_1, \pi_2, \pi_3$, we define
\[ \{N(1|3)\}\{(i)|\pi_1, \pi_2, \pi_3\} = (i) * [\pi_1, \pi_2, \pi_3] \overset{\mathrm{def}}{=} (i - 2) * (\pi_1 \mathbin{+\!+\!+} \pi_2 \mathbin{+\!+\!+} \pi_3), \]
where $\pi_1 \mathbin{+\!+\!+} \pi_2 \mathbin{+\!+\!+} \pi_3$ is defined according to a similar “overlay and add” rule in which we imagine the partitions physically lying on top of each other, like rulers, and the shadowed slots (those which lie under or above at least one other) number exactly four. This number increases by two each time we add another factor, so the generalization to $\{N(1|k)\}\{(i)|\pi_1, \dots, \pi_k\} = (i) * [\pi_1, \dots, \pi_k]$ is clear (there must be exactly $2k - 2$ shadowed slots). We see that $d$ is preserved but $\bar d$ is increased by one regardless of the number of factors in the higher product, and this property is exactly what dictates the amount of overlap allowed. In the next section we will see the definition of the composition $\{x\}\{y, z\}$ which motivates the above formulas. It is possible, but not necessary for our purposes, to give a precise algebraic definition of products involving more than two slots in $N$.
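The two-factor “overlay and add” rule can be written out directly. The following Python sketch is a hedged illustration of the formula for $\pi_1 \mathbin{+\!+} \pi_2$ and of $(i) * [\pi_1, \pi_2]$ only; it covers the stated case of two partitions with at least two slots each, and the names `overlap`, `compositions` and `higher_product` are hypothetical.

```python
def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def overlap(p1, p2):
    """The set of partitions occurring in p1 ++ p2 (two-slot overlap)."""
    if not (len(p1) >= 2 and len(p2) >= 2):
        raise ValueError("this sketch only covers partitions with two or more slots")

    def one_way(left, right):
        # the last two slots of `left` are added to the first two slots of `right`
        return (left[:-2]
                + (left[-2] + right[0], left[-1] + right[1])
                + right[2:])

    # a set drops one of the two terms automatically when they coincide
    return {one_way(p1, p2), one_way(p2, p1)}


def higher_product(i, p1, p2):
    """(i) * [p1, p2] = (i - 1) * (p1 ++ p2), returned as a list of partitions."""
    out = []
    for base in sorted(overlap(p1, p2)):
        # integers totalling i - 2 are added to the slots in every possible way
        for ks in compositions(i - 2, len(base)):
            out.append(tuple(b + k for b, k in zip(base, ks)))
    return out
```

For instance, `overlap((1, 2, 3), (4, 5, 6, 7))` gives the two partitions (1|6|8|6|7) and (4|5|7|9|3) from the text, while `overlap((1, 2), (3, 4))` collapses to the single partition (4|6) because both overlays coincide.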


To complete the discussion of higher products, we replace $(i)$ by $\pi = (i_1|\cdots|i_r)$: then by definition,
\[ \{N(1|\lambda_1|\cdots|\lambda_t)\}\{\pi|\cdots\} \approx \sum_{\alpha=1}^{r} (i_1|\cdots|i_{\alpha-1}|\{N(1|\lambda_1|\cdots|\lambda_t)\}\{(i_\alpha)|\cdots\}|i_{\alpha+1}|\cdots|i_r). \]

Example 15.
\begin{align}
(2|3) * [(2), (3|4)] &\approx ((2) * [(2), (3|4)]\,|\,3) + (2\,|\,(3) * [(2), (3|4)]) \notag \\
&= ((1) * (5|4)|3) + ((1) * (3|6)|3) + (2|(2) * (5|4)) + (2|(2) * (3|6)) \notag \\
&= (5|4|3) + (3|6|3) + (2|6|4) + (2|5|5) + (2|4|6) + (2|3|7) \tag{14}
\end{align}

(no duplicate terms). 2.3.3. Higher compositions of regular partitioned maps. By analogy with the nonpartitioned case, we would like to define {x(π )}{y1 (π1 ), . . . , yk (πk )}, and ultimately {x(π)}{y1 (π1 ), . . . , yk (πk )} · · · {z1 (π10 ), . . . , zl (πl0 )} for π, πi , ..., πj0 ∈ B. The types of summands in the composite map will be governed by the enriched algebra of partitions discussed in the previous section. As the simplest example, we consider {x(i)}{y(π1 ), z(π2 )}, with π1 = (j1 | · · · |js ), π2 = (k1 | · · · |kt ) ∈ B, with i, s, t ≥ 2. For simplicity assume s, t > 2 and choose π 00 → {N (1|2)}{(i)|π1 , π2 }, then set up variables {a (1) | · · · |a (s+t−2) } accordingly. Then {x(i)}{y(π1 ), z(π2 )}{a (1) | · · · |a (s+t−2) }

(15)

will be defined as a sum over subdivisions $S$ of these variables. Without loss of generality assume that $(j_1|\cdots|j_s)$ is to the left of $(k_1|\cdots|k_t)$ in the particular configuration leading to the chosen term, so that each $S$ looks like
\begin{align*}
a^{(1)} &= \{b^{(1)}, c^{(1)}, d^{(1)}\} \\
&\ \ \vdots \\
a^{(s-2)} &= \{b^{(s-2)}, c^{(s-2)}, d^{(s-2)}\} \\
a^{(s-1)} &= \{b^{(s-1)}, c^{(s-1)}, d^{(s-1)} = e^{(1)}, f^{(1)}, g^{(1)}\} \\
a^{(s)} &= \{b^{(s)}, c^{(s)}, d^{(s)} = e^{(2)}, f^{(2)}, g^{(2)}\} \\
a^{(s+1)} &= \{e^{(3)}, f^{(3)}, g^{(3)}\} \\
&\ \ \vdots \\
a^{(s+t-2)} &= \{e^{(t)}, f^{(t)}, g^{(t)}\},
\end{align*}


where $c^{(l)}$ has $j_l$ consecutive elements, $f^{(l)}$ has $k_l$ consecutive elements, and the strings $b^{(l)}, d^{(l)}, e^{(l)}, g^{(l)}$ may be empty. Then the contribution of $S$ to (15) will be
\[ \pm\{x(i)\}\{\, \{b^{(1)}\}\cdots\{b^{(s)}\},\ \{y(\pi_1)\}\{c^{(1)}|\cdots|c^{(s)}\},\ \{d^{(1)}\}\cdots\{d^{(s-2)}\}\{e^{(1)}\}\cdots\{e^{(t)}\},\ \{z(\pi_2)\}\{f^{(1)}|\cdots|f^{(t)}\},\ \{g^{(1)}\}\cdots\{g^{(t)}\} \,\}. \]
Any multiple compositions involving more than two pairs of braces can be handled as sums and iterations of two-pair compositions: for example, $\{x(\pi)\}\{y(\pi_1)\}\{z(\pi_2)\}$ means
\[ \{x(\pi)\}\{y(\pi_1), z(\pi_2)\} \pm \{x(\pi)\}\{z(\pi_2), y(\pi_1)\} \pm \{x(\pi)\}\{\{y(\pi_1)\}\{z(\pi_2)\}\}. \]

2.3.4. Partitioned Hochschild space. Let us denote the vector space of $n$-linear maps on the graded vector space $A$ ($n \ge 0$) with values in $A$ by $\mathrm{Hom}(A^{\otimes n}; A)$ and the vector space of multilinear maps on $A$ of type $\pi = (i_1|\cdots|i_r) \in B \cup \{(0)\}$ by $\mathrm{Hom}(A^{\pi}; A) = \mathrm{Hom}(A^{\otimes i_1} \otimes \cdots \otimes A^{\otimes i_r}; A)$ (the two spaces are clearly isomorphic if $d(\pi) = n - 1$). Depending on whether we want infinite sums or not, we defined the regular Hochschild space $C^\bullet(A)$ to be either of the spaces
\[ \oplus_{n \ge 0} \mathrm{Hom}(A^{\otimes n}; A) \subset \mathrm{Hom}(\oplus_{n \ge 0} A^{\otimes n}; A) = \mathrm{Hom}(TA; A). \]
In a similar manner, we may define the partitioned Hochschild space $C^\bullet_P(A)$ by either of
\[ \oplus_{\pi \in B \cup \{(0)\}} \mathrm{Hom}(A^{\pi}; A) \subset \mathrm{Hom}(\oplus_{\pi \in B \cup \{(0)\}} A^{\pi}; A) = \mathrm{Hom}(T(TA); A). \]

Proposition 2 (Gerstenhaber). The truncated Hochschild space $\bar C^\bullet(A) = \oplus_{n \ge 1} \mathrm{Hom}(A^{\otimes n}; A)$ is a right pre-Lie algebra under the multiplication $x \circ y = \{x\}\{y\}$.

Proof. See [10,1], and Proposition 3 below. □

The following proposition answers a natural question:

Proposition 3. The truncated partitioned Hochschild space $\bar C^\bullet_P(A) = \oplus_{\pi \in B} \mathrm{Hom}(A^{\pi}; A)$ is a right pre-Lie algebra under the composition of partitioned maps.


Proof. The triple composition $\{\{x(\pi)\}\{y(\pi')\}\}\{z(\pi'')\} - \{x(\pi)\}\{\{y(\pi')\}\{z(\pi'')\}\}$ consists of partitioned maps $w(\tilde\pi)$ obtained by substituting $y$ and $z$ separately into $x$ (we subtract the terms where $z$ goes into $y$). Then by symmetry $\{\{x(\pi)\}\{z(\pi'')\}\}\{y(\pi')\} - \{x(\pi)\}\{\{z(\pi'')\}\{y(\pi')\}\}$ has the same summands up to an overall sign. □

Remark 5. The result holds for the full space in both cases, keeping in mind that $\{a\}\{b_1, \dots, b_n\} = 0$ for $a, b_i \in A$.

Proposition 4. The full extended Hochschild space $C^{\bullet,\bullet}(A) = \mathrm{Hom}(\oplus_{i \ge 0} A^{\otimes i}; \oplus_{j \ge 0} A^{\otimes j}) \subset \mathrm{Hom}(TA; TA)$ defined in [1] is a right pre-Lie algebra under the multiplication $\{x\}\{y\}$.

Proof. Same as above. The difference is that now all compositions $\{x\}\{y\}$ have a natural definition in the new space, even when $x$ and $y$ are elements of $TA$ (see Example 8 in Sect. 2.1). □

2.3.5. Partitioned maps involving zeros. We now need a theory of compositions of maps of type $\pi$ for $\pi \in \bar B$, consistent with our earlier definitions. It is clear that $x(0)$ (with $d(x) = -1$) has to be an element of $A$.

Example 16. For $n \ge 2$, we have $(n) * (0) = (n - 1)$, and not surprisingly, $\{x(n)\}\{y(0)\}\{a_1, \dots, a_{n-1}\} = \{x(n)\}\{b\}\{a_1, \dots, a_{n-1}\}$ if $y(0) = b \in A$. On the other hand,

Example 17. The expression $\{x(0)\}\{y(0)\}$ must be interpreted as zero in $C^\bullet_P(A)$, as in $C^\bullet(A)$ (after all, $(0) * (0) = 0$). It can be thought of as an element
\[ a \otimes b - (-1)^{|a||b|}\, b \otimes a \in A^{\otimes 2} \]
inside $C^{\bullet,\bullet}(A) = \mathrm{Hom}(TA; TA)$ (or its suitably defined counterpart $C^{\bullet,\bullet}_P(A)$) only if we enlarge the Hochschild space. Similarly, the composition $\{y(0)\}\{x(n)\}$ for $n \ge 1$ is zero in $C^\bullet_P(A)$.


We will not have occasion to use any partition involving zeros other than $(1|0)$, and that only to blend the (homotopy) commutativity relations into the master identity. The choice of $(1|0)$ over $(0|1)$ is also arbitrary. We may think of $m(1|0)$ as a map that chooses none of the elements in two consecutive slots, so that the subdivision $S$ of $\{a^{(1)}|a^{(2)}\}$ is simply $\{a^{(1)}\} = \{a^{(1)}, \emptyset\}$, $\{a^{(2)}\} = \{a^{(2)}, \emptyset\}$. The permutation $\{a^{(1)}\}\{a^{(2)}\}$ of the “left leftovers” corresponds to the twisting of two adjacent strands in the pictures of [15] and [25]. The simplest example is
\[ \{m(2)\}\{m(1|0)\}\{a|b\} = \{m(2)\}\{a\}\{b\} = m(2)(a, b) - (-1)^{|a||b|}\, m(2)(b, a), \tag{16} \]
where $m(1|0)$ antisymmetrizes $m(2)$. The use of such partitioned maps helps us formalize symmetries, which would otherwise have to be put in “by hand”. For example, if we choose the generating map to be $m = m(2) + m(1|0)$ instead of $m(2)$ for an associative algebra, the identity $m \circ m = 0$ gives us a commutative associative algebra. It is possible to improve the notation and usage for more complicated tasks.

3. The Master Identity for Homotopy Gerstenhaber Algebras

The present work on the master identity for $G_\infty$ algebras originated from a joint paper of Kimura, Voronov, and Zuckerman [15], where a nontrivial algebra over a certain operad $K_\bullet M$ (first defined by Getzler and Jones [12], based on ideas of Fox-Neuwirth) is described. Recently an error was detected by Tamarkin in the work of Getzler and Jones, and corrected by Voronov in [25], where relevant changes to [15] are also discussed. Following Voronov, we will call the corrected operad in [25] the weak $G_\infty$-operad.

Definition 1. A homotopy Gerstenhaber algebra ($G_\infty$ algebra) is an algebra over the weak $G_\infty$-operad.

Our new definition will differ from the one above in certain aspects involving symmetries, and these are detected only in higher identities which do not occur in the Hochschild complex. We want to stress the fact that all the work leading to Definition 2 was inspired by the configurations pictured in [15] and [25], and eventually the algebra emerged as a consistent structure on its own.

Definition 2. A homotopy Gerstenhaber algebra ($G_\infty$ algebra) is a super graded vector space $A = \oplus_{n \in \mathbb{Z}} A^n$ together with a collection
\[ m(\pi), \qquad |m(\pi)| \equiv d(\pi) + \bar d(\pi) \pmod 2 \]
of partitioned multilinear maps, where the formal sum
\[ m = \sum_{\pi \in B \cup \{(1|0)\}} m(\pi) \]


satisfies
\[ \{\tilde m\}\{\tilde m, \tilde m, \dots\} = 0. \tag{17} \]
In this definition $\tilde m$ is the formal sum which is modified from $m$ using the total degree (described below) as in Sect. 2.1. The identity (17) is to be interpreted as follows: for every partition $\pi''$, the finitely many terms $\{\tilde m(\pi)\}\{\tilde m(\pi_1), \dots, \tilde m(\pi_n)\}(\pi'')$ for which $\pi'' \to \pi * [\pi_1, \dots, \pi_n]$ add up to zero. We define the total degree $\|m(\pi)\|$ to be
\[ \|m(\pi)\| \overset{\mathrm{def}}{=} d(\pi) + \bar d(\pi) + |m| + 1, \]

which is always odd in a $G_\infty$ algebra. The actual super degree $|m|$ comes from dimensions of chains in the $G_\infty$ operad. We reiterate that the elegant notation of [15] is changed in this paper in order to reserve the older notation of braces for compositions of maps. We would use the symbol $m(i_1|\cdots|i_r)$ to denote the multilinear map in [15] shown by $r$ pairs of braces with $i_1, \dots, i_r$ arguments respectively. The prototype of $G_\infty$ algebras, namely the Hochschild complex of an associative algebra, will be studied again in Sect. 4.2.

4. Substructures and Examples

4.1. Substructures. Let $(A, m)$ be a $G_\infty$ algebra as in Definition 2. Note that if desired, we may modify this algebraic definition to make any subset of $\{m(\pi)\}_{\pi \in \bar B}$, especially of $\{m(\pi)\}_{\pi \in B_0}$, vanish. Some substructures are worth studying on their own.

Lemma 1. The infinite sum $m_A = m(1) + m(2) + \cdots$ satisfies the relation
\[ \{\tilde m_A\}\{\tilde m_A\} = 0. \tag{18} \]
Therefore, $(A, m_A)$ is an $A_\infty$ algebra and $(A, l_A)$ is an $L_\infty$ algebra, where $l_A$ is the term-by-term graded antisymmetrization of $m_A$.

Proof. The identity (18) is equivalent to $\{\{\tilde m\}\{\tilde m\}\}(i) = 0$ for all $i \ge 1$, hence is valid, for the following reasons: the span of $\{(i)\}_{i \ge 1}$ is a subalgebra of $(P, *)$. Moreover, if $(i) \to \pi * \pi'$, then we must have
\[ (\bar d(\pi) + 1) + (\bar d(\pi') + 1) = \bar d(i) + 1 = 0, \]
or $\bar d(\pi) = \bar d(\pi') = -1$. Note that the identity (18) is the same as $\{\tilde m_A\}\{\tilde m_A, \tilde m_A, \dots\} = 0$, because the restriction on $\bar d$-degrees forces us to compose only two maps of type $m(i)$ at a time. □


The infinite sum $m_B = m(1) + m(1|1) + m(1|1|1) + \cdots$ barely misses the identity $\{\tilde m_B\}\{\tilde m_B\} = 0$, due to the single exceptional product $(2) * (1|0) = (2|0) + (1|1)$. Unlike $m_A$, the partitions involved do not all arise from products of partitions of similar type. But then the unwanted first term in the defining equation
\[ (\{m(2)\}\{m(1|0)\} \pm \{m(1)\}\{m(1|1)\} \pm \{m(1|1)\}\{m(1)\})(1|1) = 0 \]
can be eliminated by the following trick. Generalizing the (right) pre-Lie identity of Example 5 in Sect. 2.1, we may define a (right) pre-$L_\infty$ algebra to be a formal sum such as $m_B$ satisfying
\[ \{\tilde m_B\}\{\tilde m_B\}\{a_1, \dots, a_{n-2}, \{a_{n-1}\}\{a_n\}\} = 0 \tag{19} \]
for all $n$. The abovementioned symmetry term drops out for $n = 2$ and argument $\{a_1\}\{a_2\}$, and $m_B$ is easily seen to satisfy Eq. (19), from which it follows that
\[ \{\tilde m_B\}\{\tilde m_B\}\{a_1\}\cdots\{a_n\} = 0 \tag{20} \]
for all $n$. This is exactly the defining identity of the $L_\infty$ algebra $\ell_B$ obtained by antisymmetrizing $m_B$ term by term (see Example 7 in Sect. 2.1). Note that $m_B$ also satisfies the left pre-$L_\infty$ algebra identity
\[ \{\tilde m_B\}\{\tilde m_B\}\{\{a_1\}\{a_2\}, a_3, \dots, a_n\} = 0. \]
The definition of a pre-$L_\infty$ algebra can be altered according to need, as long as it leads to Eq. (20).

4.2. Hochschild complex revisited. Recall that the truncated Hochschild space $\bar C^\bullet(A)$ is a right pre-Lie algebra under the multiplication
\[ M(1|1)(x, y) \overset{\mathrm{def}}{=} \{x\}\{y\}, \]
regardless of any structure on the vector space $A$ (Proposition 2). Moreover, if $(A, m)$ is an associative algebra, then the differential
\[ M(1)(x) \overset{\mathrm{def}}{=} [m, x], \]
dot product
\[ M(2)(x, y) \overset{\mathrm{def}}{=} \pm\{m\}\{x, y\}, \]
and partitioned maps
\[ M(1|n)(x, y_1, \dots, y_n) \overset{\mathrm{def}}{=} \pm\{x\}\{y_1, \dots, y_n\}, \qquad n \ge 1, \]
are known to satisfy $G_\infty$ identities under Definitions 1 and 2 ([13,15,25]); this was first noticed by Gerstenhaber and Voronov, and Getzler and Jones. We provide a brief algebraic proof, in the light of the new terminology.


Proposition 5 (Gerstenhaber–Voronov–Getzler–Jones). The truncated Hochschild complex $(\bar C^\bullet(A), M(1), M(2), \{M(1|n)\}_{n \ge 0})$ is a $G_\infty$ algebra (Definition 2).

Proof. We have
\begin{align*}
&(1) * (1) = (1), \\
&(1) * (2) = (2) * (1) = (2), \\
&(2) * (2) = (3), \\
&(1) * (1|n) = (1|n) * (1) = (1|n), \\
&(2) * (1|n) = (1|n) * (2) = (1|n + 1) + (2|n) \quad (n \ge 1), \\
&(2) * (1|0) = (1|1) + (2|0), \\
&(1|n) * (1|m) \approx (1|m|n) + (1|n|m) + (1|n - 1|m + 1) + \cdots + (1|1|m + n - 1) \quad (n \ge 1,\ m \ge 0), \\
&(2) * [(1|i), (1|n - i)] = (2|n) \quad (1 \le i \le n - 1),
\end{align*}
as the only products involving $(1)$, $(2)$, or $(1|n)$ on either side and leading to a nontrivial identity (recall that the $\bar d$-degrees have to add up to one less than that of the partition on the right-hand side). We have to check

(i) $M(1)^2 = 0$,
(ii) $(\{M(1)\}\{M(2)\} \pm \{M(2)\}\{M(1)\})(2) = 0$,
(iii) $(\{M(2)\}\{M(2)\})(3) = 0$,
(iv) $(\{M(1)\}\{M(1|n + 1)\} \pm \{M(1|n + 1)\}\{M(1)\} \pm \{M(2)\}\{M(1|n)\} \pm \{M(1|n)\}\{M(2)\})(1|n + 1) = 0$ $(n \ge 1)$,
(v) $\big(\{M(2)\}\{M(1|n)\} \pm \{M(1|n)\}\{M(2)\} + \sum_{1 \le i \le n-1} \pm\{M(2)\}\{M(1|i), M(1|n - i)\}\big)(2|n) = 0$,
(vi) $(\{M(1)\}\{M(1|1)\} \pm \{M(1|1)\}\{M(1)\} \pm \{M(2)\}\{M(1|0)\})(1|1) = 0$.

The first two tell us that the differential is square-zero and is a derivation of the dot product; these well-known facts are in [10]. The third equation is the associativity of the dot product, which does hold in the complex. The fourth is Eq. (4) of [15] and Eq. (8) of [26] in disguise (the distributivity of the dot product over the multi-composition up to homotopy). Equation (v) is exactly Eq. (3) in [15]; this was a stray identity which did not fit the mold because of the higher compositions before the corrections in [25]. Finally, Eq. (vi) is the commutativity of the dot product up to homotopy. Short proofs of some of these statements are also in [1]. □

Remark 6. Once again, the result holds for the full complex with all $\{a\}\{b_1, \dots, b_n\} = 0$ for $a, b_i \in A$. It would also be interesting to study the cases where $A$ is an $A_\infty$ algebra ([11,1]) and/or $\bar C^\bullet(A)$ is replaced by $C^{\bullet,\bullet}(A)$ with the additional composition maps $M(\lambda)$ [2]. The overall algebraic structure of the algebra of partitions together with the higher products, possibly with the introduction of a differential and a dot product, is also worth comparing with the $G_\infty$ structure of the Hochschild complex (note that the $(1|1)$-map is pre-Lie in both cases).


4.3. Topological vertex operator algebras. Topological vertex operator algebras (TVOA) have always fueled the subject of $G_\infty$ algebras; see [18] and [15]. The article [15] indicates the existence of a $G_\infty$ structure on a TVOA but does not produce the actual products; Huang and Zhao’s recent proof in [14] of the existence is not explicit either. We will assume some acquaintance with vertex operator super algebras (VOSA) and TVOA’s, which are VOSA’s with extra structure ([7,6,4,9], [19]–[21]). A VOSA is a $\mathbb{Z}$-bigraded vector space $V$ (one $L_0$ and one super, or fermionic, grading) in which we associate to every element $u$ a unique vertex operator $u(z) = \sum u_n z^{-n-1}$, with $u_n \in \mathrm{End}(V)$. There is an action of the Virasoro algebra by $L(z) = \sum L_n z^{-n-2}$, where the eigenvalues of $L_0$ are bounded from below and $L_{-1}$ is formal differentiation. The vacuum element $\mathbf{1}$, represented by $1 \cdot z^0$, is the identity element with respect to the distinguished Wick product given by $u_{-1}v$. Although the overall identities satisfied by all the bilinear products $u_n v$ are summed up by the formal relation ([4])
\[ [u(z_1), v(z_2)](z_1 - z_2)^t = 0 \]
for sufficiently large $t = t(u, v)$, the specialized identities
\[ (u_m v)_n = \sum_{i \ge 0} (-1)^i \binom{m}{i} \left( u_{m-i} v_{n+i} - (-1)^{m + |u||v|}\, v_{m+n-i} u_i \right) \]
and
\[ [u_m, v_n] = \sum_{i \ge 0} \binom{m}{i} (u_i v)_{m+n-i} \]
(for $m, n \in \mathbb{Z}$) are very useful. For the record, we list a number of products already shown to satisfy $G_\infty$ identities:

• The differential (“BRST operator”), usually denoted by $Q$, can be taken to be the odd linear operator $m(1)$.
• The Wick product (“normal ordered product”) plays the role of the even bilinear operator $m(2)$ in any VOSA.
• The odd trilinear operation $n(u, v, t)$ of [18] (Eq. (2.16)) looks like $m(3)$.
• The odd bilinear product $m(u, v)$ of [18] (Eq. (2.14)) is like $m(1|1)$.

There are additional products in [18] and [2] on a TVOA which may eventually be linked to the $G_\infty$ structure (most of these are related to homotopy Batalin-Vilkovisky algebras). In addition, we establish a surprising property of the Wick product on a VOSA:

Proposition 6. Any VOSA $V$ with the Wick product is a left pre-Lie algebra. Therefore, the Wick bracket defined by
\[ [u, v]_W \overset{\mathrm{def}}{=} u_{-1}v - (-1)^{|u||v|}\, v_{-1}u \]
is a graded Lie bracket on $V$.

Proof. Dropping the super degrees, we have
\[ (u_{-1}v)_{-1} = \sum_{i \ge 0} (u_{-1-i} v_{-1+i} + v_{-2-i} u_i) = u_{-1}v_{-1} + \sum_{i \ge 0} (u_{-2-i} v_i + v_{-2-i} u_i), \]


so that for the left (Wick) multiplication operators $L_u$ we have the identity
\begin{align*}
L_u L_v - L_v L_u - L_{[u,v]_W} &= u_{-1}v_{-1} - v_{-1}u_{-1} - (u_{-1}v)_{-1} + (v_{-1}u)_{-1} \\
&= u_{-1}v_{-1} - v_{-1}u_{-1} - u_{-1}v_{-1} - \sum_{i \ge 0} (u_{-2-i} v_i + v_{-2-i} u_i) + v_{-1}u_{-1} + \sum_{i \ge 0} (v_{-2-i} u_i + u_{-2-i} v_i) \\
&= 0.
\end{align*}
□

It was pointed out to the author by Haisheng Li that the Jacobi identity for $[u, v]_W$ was independently observed by Dong, Li, and Mason in [5]. Another (this time, right) pre-Lie product $\bar\wedge$, which is defined on the space of “vector-forms” on an arbitrary vector space, is described by Frölicher and Nijenhuis in [8], Eq. (2.9). The pre-Lie identity is also part of the definition of a Novikov algebra (see e.g. Osborn’s [23]) which has been studied in some detail in terms of the representation theory and also in the context of vertex operator algebras, variational calculus, etc. Finally, since the modes $u_n$ of vertex operators for $n \ge 0$ have been identified as differential operators of order $n + 1$ with respect to the Wick product, and the $\Phi$ operators of Example 14 have been shown to be an invaluable tool in proving identities on VOSA’s (such as those related to the generalized Batalin-Vilkovisky structure) ([2, 1]), we expect the $\Phi$ operators to play some role in making explicit the $G_\infty$ structure on a TVOA.

Acknowledgements. I am very much indebted to Sasha Voronov, who explained the pictures in [15] and later in [25] to me in great detail, and to Jim Stasheff, for his continuing guidance and support, as well as for his helpful comments during the final stages of this work. Many thanks are due to Don Schack for comments on simplification of notation and for pointing to an example of pre-Lie algebras I didn’t know about; also to Haisheng Li and Chongying Dong for discussions on pre-Lie algebras and the Wick bracket. I would like to thank the referee for the encouraging remarks and for suggesting certain changes in the light of new developments in the subject.

References

1. Akman, F.: Multibraces on the Hochschild space. Preprint q-alg/9702010
2. Akman, F.: On some generalizations of Batalin-Vilkovisky algebras. J. Pure Appl. Alg. 120, 105–141 (1997)
3. Andrews, G.E.: The theory of partitions. Cambridge: Cambridge University Press, 1998
4. Dong, C., Lepowsky, J.: Generalized vertex algebras and relative vertex operators. Progress in Mathematics v. 112, Boston: Birkhäuser, 1993
5. Dong, C., Li, H., Mason, G.: Vertex Lie algebra, vertex Poisson algebra, and vertex operator algebras. Unpublished preprint
6. Frenkel, I.B., Huang, Y.-Z., Lepowsky, J.: On axiomatic approaches to vertex operator algebras and modules. Memoirs AMS 1992
7. Frenkel, I.B., Lepowsky, J., Meurman, A.: Vertex operator algebras and the Monster. New York: Academic Press, 1988
8. Frölicher, A., Nijenhuis, A.: Theory of vector-valued differential forms, Part I: Derivations in the graded ring of differential forms. Koninklijke Nederlandse Akademie van Wetenschappen (later Indag. Math.), Series A, Proceedings 59, 338–350 (1956)
9. Gebert, R.W.: Introduction to vertex algebras, Borcherds algebras, and the Monster Lie algebra. Preprint hep-th/9308151 and DESY 93–120
10. Gerstenhaber, M.: The cohomology structure of an associative ring. Ann. Math. 78, 267–288 (1963)


11. Getzler, E.: Cartan homotopy formulas and the Gauss-Manin connection in cyclic homology. In: Joseph, A., Shnider, S. (eds.) Quantum deformations of algebras and their representations, Israel Mathematical Conference Proceedings, Vol. 7, 1993
12. Getzler, E., Jones, J.D.S.: A∞-algebras and the cyclic bar complex. Illinois J. Math. 34, 256–283 (1989)
13. Getzler, E., Jones, J.D.S.: Operads, homotopy algebra and iterated integrals for double loop spaces. Preprint hep-th/9403055
14. Huang, Y.-Z., Zhao, W.: Semi-infinite forms and topological vertex operator algebras. Preprint math.QA/9903014
15. Kimura, T., Voronov, A.A., Zuckerman, G.J.: Homotopy Gerstenhaber algebras and topological field theory. In: Loday, J.-L., Stasheff, J., Voronov, A.A. (eds.) Operads: Proceedings of Renaissance Conferences. Contemp. Math. Vol. 202, Providence, RI: AMS, 1996
16. Koszul, J.-L.: Crochet de Schouten-Nijenhuis et cohomologie. Astérisque, 257–271 (1985)
17. Lada, T., Stasheff, J.D.: Introduction to sh Lie algebras for physicists. Preprint hep-th/9209099, UNCMATH-92/2
18. Lian, B.H., Zuckerman, G.J.: New perspectives on the BRST-algebraic structure of string theory. Commun. Math. Phys. 154, 613–646 (1993)
19. Lian, B.H., Zuckerman, G.J.: Some classical and quantum algebras. In: Brylinski, J.-L. et al. (eds.) Lie theory and Geometry, Progress in Mathematics Vol. 123, Boston: Birkhäuser, 1994
20. Lian, B.H., Zuckerman, G.J.: Commutative quantum operator algebras. J. Pure Appl. Alg. 100, 1995
21. Lian, B.H., Zuckerman, G.J.: Moonshine cohomology. Preprint q-alg/9501015
22. May, J.P.: The geometry of iterated loop spaces. Lecture Notes in Mathematics Vol. 271, Springer-Verlag, 1972
23. Osborn, J.M.: Novikov algebras. Nova J. Algebra Geom. 1, 1–14 (1992)
24. Penkava, M., Schwarz, A.: On some algebraic structures arising in string theory. In: Perspectives in mathematical physics. Conf. Proc. Lecture Notes Math. Phys., III. Cambridge, MA: International Press, 1994
25. Voronov, A.A.: Homotopy Gerstenhaber algebras. Unpublished preprint
26. Voronov, A.A., Gerstenhaber, M.: Higher order operations on the Hochschild complex. Functional Anal. Appl. 29, no. 1, 1–6 (1995)
27. Zhu, Y.: Modular invariance of characters of vertex operator algebras. J. Amer. Math. Soc. 9, 237–302 (1996)

Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 209, 77 – 95 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

D-Particle Bound States and Generalized Instantons

Gregory Moore^1, Nikita Nekrasov^{2,3}, Samson Shatashvili^{1,4,⋆}

1 Department of Physics, Yale University, Box 208120, New Haven, CT 06520, USA
2 Institute of Theoretical and Experimental Physics, 117259, Moscow, Russia
3 Lyman Laboratory of Physics, Harvard University, Cambridge, MA 02138, USA
4 Theory Division, CERN, 1211 Geneve 23, Switzerland

Received: 12 March 1999 / Accepted: 16 July 1999

Abstract: We compute the principal contribution to the index in the supersymmetric quantum mechanical systems which are obtained by reduction to 0 + 1 dimensions of $\mathcal{N} = 1$, $D = 4, 6, 10$ super-Yang–Mills theories with gauge group $SU(N)$. The results are: $\frac{1}{N^2}$ for $D = 4, 6$, and $\sum_{d|N} \frac{1}{d^2}$ for $D = 10$. We also discuss the $D = 3$ case.

1. Introduction

The existence of M-theory depends crucially on the existence within type-IIA string theory of a tower of massive BPS particles electrically charged with respect to the RR 1-form. These particles, originally described as black holes in IIA supergravity [1], can be interpreted as Kaluza–Klein particles of eleven-dimensional M-theory compactified on a circle [2, 3]. Later, these particles were identified with “D0-branes” [4]. In the D-brane formulation it becomes clear that in certain energy regimes the dynamics of $N$ such particles can be described by the supersymmetric quantum mechanics of $N \times N$ Hermitian matrices obtained from dimensional reduction of $\mathcal{N} = 1$, $D = 10$ super-Yang–Mills theory [5] (the quantum mechanical model was originally studied in [6]). The existence of the M-theoretic Kaluza–Klein tower of states is equivalent to the statement that this quantum mechanics has exactly one bound state for each $N$. Consequently, proving the existence of these bound states has been the focus of several recent papers of which [7–9] are the most relevant to the present work. In particular, we note that the existence of the bound state in the case of $N = 2$ was proven in [7], but the case $N > 2$ remains open. The results of the present paper will help complete the proof for all $N$. The existence of bound states in susy quantum mechanics can be detected by computing the Witten index:
\[ \lim_{\beta \to \infty} \mathrm{Tr}_{\mathcal{H}} (-1)^F e^{-\beta H} = N_B - N_F, \tag{1.1} \]

* On leave of absence from St. Petersburg Branch of Steklov Mathematical Institute, Fontanka, St. Petersburg, Russia.


G. Moore, N. Nekrasov, S. Shatashvili

where $N_{B,F}$ are the numbers of bosonic and fermionic zero eigenstates of the Hamiltonian H, respectively. The expression $\mathrm{Tr}_{\mathcal{H}}\,(-1)^F e^{-\beta H}$ is β-independent in theories with a discrete spectrum, but may be rather complicated if the spectrum is continuous. In fact, the densities of fermionic and bosonic eigenstates may differ, leading to nontrivial β-dependence. Nevertheless, supersymmetry allows us to relate the index of interest to the easier-to-access quantity:
$$\lim_{\beta\to 0} \mathrm{Tr}_{\mathcal{H}}\,(-1)^F e^{-\beta H}. \tag{1.2}$$
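To illustrate the β-independence for a discrete spectrum, here is a minimal sketch. The spectrum below is a made-up toy example (it is not taken from the matrix models studied here); the only assumption built in is that every nonzero level occurs once as a boson and once as a fermion, as supersymmetry requires.

```python
import math

def witten_trace(bosonic, fermionic, beta):
    """Tr (-1)^F exp(-beta H) for finite lists of bosonic/fermionic energies."""
    return (sum(math.exp(-beta * e) for e in bosonic)
            - sum(math.exp(-beta * e) for e in fermionic))

# Hypothetical toy spectrum: two bosonic zero modes, one fermionic zero mode,
# and nonzero levels that always come in boson/fermion pairs.
paired = [0.7, 1.3, 2.9]
bosonic = [0.0, 0.0] + paired
fermionic = [0.0] + paired

for beta in (0.01, 1.0, 50.0):
    # Up to rounding, each value equals N_B - N_F = 2 - 1 = 1.
    print(beta, witten_trace(bosonic, fermionic, beta))
```

With a continuous spectrum the paired cancellation can fail because the bosonic and fermionic densities of states may differ, which is why (1.1) and (1.2) need not coincide.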

In the case of the quantum mechanics of N D0-branes, (1.2) can be expressed very explicitly as a matrix integral,
$$\frac{1}{\mathrm{Vol}(G)} \int d^{10}X\, d^{16}\Psi\, e^{-S}, \tag{1.3}$$
where S is the reduction to zero dimensions of the action of the $\mathcal{N} = 1$, D = 10 super-Yang–Mills theory with the gauge group $G = SU(N)/\mathbb{Z}_N$. More generally, we are aiming at computing the integral
$$I_D(N) \equiv \frac{1}{(\pi g)^{\frac{(N^2-1)(D-3)}{2}}\, \mathrm{Vol}(G)} \int d^{D}X\, d^{2^{D/2-1}}\Psi\, e^{-S} \tag{1.4}$$
for D = 3 + 1, 5 + 1, 9 + 1 respectively, where
$$S = \frac{1}{g}\left( \frac{1}{4} \sum_{\mu,\nu=1,\ldots,D} \mathrm{Tr}\,[X_\mu, X_\nu]^2 + \frac{i}{2} \sum_{\mu=1}^{D} \mathrm{Tr}\,(\bar\Psi \Gamma^\mu [X_\mu, \Psi]) \right), \tag{1.5}$$

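and $\Gamma^\mu$ are the Clifford matrices for Spin(D). The integrals (1.3), (1.4) are not the full contribution to the Witten index (indeed, as we will see, they are not integers). The difference (also called the boundary term)
$$\lim_{\beta\to\infty} \mathrm{Tr}_{\mathcal{H}}\,(-1)^F e^{-\beta H} - \lim_{\beta\to 0} \mathrm{Tr}_{\mathcal{H}}\,(-1)^F e^{-\beta H} = \int_0^\infty d\beta\, \frac{d}{d\beta} \mathrm{Tr}_{\mathcal{H}}\,(-1)^F e^{-\beta H} \tag{1.6}$$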

may be analysed separately and is beyond the scope of this paper. See [7–9] for further discussion.

The paper is organized as follows. In Sect. 2 we reinterpret the integrals (1.3), (1.4) as those appearing in the CohFT approach to the studies of the moduli space of susy gauge configurations, reduced to 0 dimensions.$^1$ The susy gauge configurations obey flatness, instanton and complexified (or octonionic) instanton equations in the 3 + 1, 5 + 1 and 9 + 1 cases respectively. (Quantum mechanics on the moduli spaces of such susy gauge configurations on compact manifolds was studied recently in [10].) In Sect. 3 we deform the integral using the global symmetries of the equations. The symmetry groups are Spin(2), Spin(4) and Spin(6) (or Spin(7)) in D = 4, 6, 10, respectively. We simplify the deformed integrals by the method of “integrating out BRST quartets” and get contour integrals over the eigenvalues of one of the matrices, denoted φ below. This brings the integrals to the form given in Eqs. (3.6), (3.7), and (3.8) below

$^1$ “CohFT” = “Cohomological field theory”.

D-Particle Bound States and Generalized Instantons


for the cases D = 10, 6, 4. The method used to arrive at these expressions is a direct extension of methods we used to integrate over Higgs branches in [11].

The expressions (3.6), (3.7), (3.8) are among the main results of this paper. Nevertheless, we must note at the outset that the result is incomplete. As Lebesgue integrals these expressions do not make sense. Rather, they should be regarded as contour integrals, which do make sense once a prescription is adopted for picking up the poles of the integrand. We are confident that a more careful implementation of the quartet mechanism will lead to a definite pole prescription. In this paper we will take the pragmatic route and simply find a pole prescription which gives the desired answer.

In particular, in Sect. 4 we perform an explicit evaluation of (1.4) for G = SU(2), SU(3). In Sects. 5, 6, 7 we evaluate the integrals for the general case G = SU(N). Each case, D = 4, 6, 10, requires a different trick in order to carry out the intricate sum over poles. In the D = 3 + 1 case we use an identity familiar from bosonization in two dimensions. In the D = 5 + 1 case we use fixed-point techniques for a certain torus action on the Hilbert scheme of N points on $\mathbb{C}^2$. In Sect. 7 we deform the octonionic instanton equations and reduce the D = 9 + 1 case to a sum over answers for D = 3 + 1, with the sum running over all possible unbroken gauge groups of $\mathcal{N} = 4$ super-Yang–Mills theory broken down to $\mathcal{N} = 1$ by mass terms. This final reduction leads to an answer for the index computation predicted by M. Green and M. Gutperle in [8], building on the work of [7]. Finally, in Sect. 8 we relate our computations to the partition functions of the SYM theories on $T^4$, K3, $T^3$ and discuss some subtleties of the latter case.

As our paper was nearing completion a related paper appeared [12]. That paper describes a complementary (numerical) approach to the evaluation of the integrals $I_D(N)$ and in particular evaluates the integral for the SU(3) case. Also, the paper [9] studied the mass deformations of the quantum mechanical problems we consider here, for the case when N is prime. It would be interesting to understand better the relation to these works. It was brought to our attention that the CohFT reformulation of the IKKT model has also been considered in [13].

2. CohFT Reinterpretation

To map to the CohFT formalism we choose two matrices, say $X_D$ and $X_{D-1}$, and arrange them into a complex matrix φ: $\phi = X_{D-1} + iX_D$. The rest of the matrices can be written as $B_j = X_{2j-1} + iX_{2j}$ for j = 1, ..., D/2 − 1. Sometimes we simply denote them as $X = \{X_a,\ a = 1, \ldots, D-2\}$. We also rearrange the fermions: $\Psi \to \Psi_a = (\psi_j, \psi_j^\dagger),\ \vec\chi,\ \eta$, and add bosonic auxiliary fields $\vec H$. Then we rewrite the bosonic part of the action as:
$$S = \frac{1}{16g} \mathrm{Tr}\,[\phi, \bar\phi]^2 - i\,\mathrm{Tr}\, \vec{\mathcal{E}}(X)\cdot\vec H + g\,\mathrm{Tr}\, \vec H^2 - \frac{1}{4g} \sum_{a=1}^{D-2} \mathrm{Tr}\,|[X_a, \phi]|^2, \tag{2.1}$$


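where the “equations” $\vec{\mathcal{E}}$ are:
$$D = 4:\quad \vec{\mathcal{E}} = [B_1, B_1^\dagger], \tag{2.2a}$$
$$D = 6:\quad \vec{\mathcal{E}} = \left( [B_1, B_1^\dagger] + [B_2, B_2^\dagger],\ [B_1, B_2],\ [B_2^\dagger, B_1^\dagger] \right), \tag{2.2b}$$
$$D = 10:\quad \vec{\mathcal{E}} = \left( [B_i, B_j] + \frac{1}{2}\epsilon_{ijkl} [B_k^\dagger, B_l^\dagger],\ i < j,\ \ \sum_i [B_i, B_i^\dagger] \right). \tag{2.2c}$$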

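It is worth noting that one can also write Eqs. (2.2b) as a three-vector $\mathcal{E}_A = [X_A, X_4] + \frac{1}{2}\varepsilon_{ABC} [X_B, X_C]$, A = 1, 2, 3. Similarly, we can also write Eqs. (2.2c) as a seven-vector $\mathcal{E}_A = [X_A, X_8] + \frac{1}{2} c_{ABC} [X_B, X_C]$, A = 1, ..., 7, using the octonionic structure constants $c_{ABC}$.

The integral (1.4) has the following important nilpotent symmetry:
$$\begin{aligned}
Q X_a &= \Psi_a, & Q \Psi_a &= [\phi, X_a], \\
Q \vec\chi &= \vec H, & Q \vec H &= [\phi, \vec\chi], \\
Q \bar\phi &= \eta, & Q \eta &= [\phi, \bar\phi], \\
Q \phi &= 0.
\end{aligned} \tag{2.3}$$
In fact, the action (2.1) together with fermions can be represented as:
$$S = Q\left( \frac{1}{16g} \mathrm{Tr}\, \eta[\phi, \bar\phi] - i\,\mathrm{Tr}\, \vec\chi\cdot\vec{\mathcal{E}} + g\,\mathrm{Tr}\, \vec\chi\cdot\vec H + \frac{1}{4g} \sum_{a=1}^{D-2} \mathrm{Tr}\, \Psi_a [X_a, \bar\phi] \right). \tag{2.4}$$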

As usual, there is a ghost charge. It is equal to +2 for φ, +1 for $\Psi_a$, 0 for $\vec H$, $X_a$, −1 for $\vec\chi$, η and −2 for $\bar\phi$.

All the bosonic fields except φ are paired with the fermions. Therefore, in order to fix the normalization of the integral one need only fix the measure Dφ on the Lie algebra of G. Since Lie G is a simple Lie algebra, there is a unique Killing form up to a constant multiple. This form determines the measure both on the Lie algebra and on the group G. The measure $\frac{D\phi}{\mathrm{Vol}(G)}$ is thus independent of the choice of the Killing form. However, the measure depends on whether the gauge group contains the center or not. We ought to use the measure normalized against $G = SU(N)/\mathbb{Z}_N$, since it is G which is the actual gauge group of the problem. When we reduce the computation to an integral over the Lie algebra of the maximal torus T ⊂ SU(N) the measure Dφ will be normalized in such a way that the measure on T obtained by the exponential map integrates to one. Therefore there is an extra factor #Z in front of the integral, since in passing to the measure on $\mathfrak{t}$ we get as a factor a volume of the generic adjoint orbit:
$$\frac{\mathrm{Vol}(G/(T/Z))}{\mathrm{Vol}(G)} = \frac{\#Z}{\mathrm{Vol}(T)}. \tag{2.5}$$
Finally, upon eliminating the auxiliary fields $\vec H$ by taking the Gaussian integral the extra factor $(\pi/g)^{-(D-3)(N^2-1)/2}$ appears. It is related to the β-dependent factor appearing in the index computation [7].


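3. Global Symmetries and Deformation

The global symmetries are

D = 4: K = Spin(2), $\vec{\mathcal{E}} \in \mathbf{1}$;
D = 6: K = Spin(4), $\vec{\mathcal{E}} \in \mathbf{3}_L$;
D = 10: K = Spin(6), $\vec{\mathcal{E}} \in \mathbf{6} \oplus \bar{\mathbf{6}}_r \oplus \mathbf{1}$.

Alternatively, in the last case we can use the octonionic representation with K = Spin(7), $\vec{\mathcal{E}} \in \mathbf{7}$. We will simplify our integrals by deforming the BRST operator. The deformation will involve a choice of a generic element in the Cartan subalgebra of the global symmetry group K. We therefore choose elements ε ∈ Lie(Spin(2)), Lie(Spin(4)), and Lie(Spin(6)) for D = 4, 6, 10 respectively. Explicitly we will write these elements as:
$$\epsilon_{D=4} = \begin{pmatrix} 0 & E \\ -E & 0 \end{pmatrix}, \qquad
\epsilon_{D=6} = \begin{pmatrix} 0 & E_1 & & \\ -E_1 & 0 & & \\ & & 0 & E_2 \\ & & -E_2 & 0 \end{pmatrix},$$
$$\epsilon_{D=10} = \begin{pmatrix} 0 & E_1+E_2 & & & & \\ -E_1-E_2 & 0 & & & & \\ & & 0 & E_2+E_3 & & \\ & & -E_2-E_3 & 0 & & \\ & & & & 0 & E_1+E_3 \\ & & & & -E_1-E_3 & 0 \end{pmatrix} \tag{3.1}$$
for sufficiently generic real constants $E_i$. Using the global symmetry one may deform the nilpotent charge (2.3) to the differential of K-equivariant cohomology:
$$\begin{aligned}
Q_\epsilon X_a &= \Psi_a, & Q_\epsilon \Psi_a &= [\phi, X_a] + X_b\, T_v(\epsilon)_{ba}, \\
Q_\epsilon \vec\chi &= \vec H, & Q_\epsilon \vec H &= [\phi, \vec\chi] + T_s(\epsilon)\cdot\vec\chi, \\
Q_\epsilon \bar\phi &= \eta, & Q_\epsilon \eta &= [\phi, \bar\phi], \\
Q_\epsilon \phi &= 0,
\end{aligned} \tag{3.2}$$
where we denote by $T_v$ the action of Lie(K) on the X's and by $T_s$ the action of Lie(K) on the equations. Now deform the action (2.4) to
$$S_\epsilon = Q_\epsilon\left( \frac{1}{16\tilde g} \mathrm{Tr}\, \eta[\phi, \bar\phi] - i\,\mathrm{Tr}\, \vec\chi\cdot\vec{\mathcal{E}} + g\,\mathrm{Tr}\, \vec\chi\cdot\vec H + \frac{1}{4\hat g} \sum_{a=1}^{D-2} \mathrm{Tr}\, \Psi_a [X_a, \bar\phi] \right). \tag{3.3}$$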

At this point the couplings $\hat g$, $\tilde g$ and g are all equal, but in the sequel we shall treat them separately. In particular we will first take $\tilde g \to \infty$. The new integral
$$\int \ldots\, e^{-S_\epsilon}$$
is convergent if the original integral (1.4) is convergent. In fact the added piece $S_\epsilon - S_0$ is equal to
$$g\,\mathrm{Tr}\,(\vec\chi\cdot T_s(\epsilon)\vec\chi) + \frac{1}{4\hat g}\, T_v(\epsilon)_{ab}\, \mathrm{Tr}\, \bar\phi [X_a, X_b],$$
which has ghost charge −2 (if we temporarily assign a charge zero to ε). This means that the value of the integral (whose measure has net ghost charge zero) is not changed.

Now a closer look at the eigenvalues of $T_s$ reveals that there is always one zero eigenvalue for the mass matrix of χ but the rest are non-vanishing for generic ε. We denote this massless mode by $\chi_0$, and consider adding to the action a $Q_\epsilon$-exact term
$$s\, Q_\epsilon \mathrm{Tr}\,(\chi_0 \bar\phi) \tag{3.4}$$
with a large coefficient s. It has ghost charge −2. This term together with $g \vec H^2$ produces masses for all the fermions of negative ghost charge. Integrating them out (by taking the limit s → ∞, g → ∞) would produce a very simple action but without a “kinetic” term for the $\Psi_a$'s. To cure this problem we add a positive ghost charge operator
$$t\, Q_\epsilon \sum_{i=1}^{D/2-1} \frac{1}{2}\left( B_i \Psi_i^\dagger - B_i^\dagger \Psi_i \right).$$

If we assign the standard ghost charge +2 to ε, then the insertions of the coupling t must be compensated by the insertions of the coupling s, so the answer may only depend on the combination st. On the other hand, it is easy to repeat the derivation of [14] by first taking the limit s → ∞ with g much smaller than s. In this way one gets an effective action which is schematically of the form:
$$S_{\rm eff} \sim \frac{1}{s}\, \{ Q_\epsilon, \mathrm{Tr}\, \Psi_a [X, \mathcal{E}] \} \tag{3.5}$$
and which has ghost charge two.$^2$ As we shall see momentarily, in the limit s, t → ∞ the dependence on either variable actually vanishes, therefore the value of the integral which we get is equal to the original integral (1.4).

As discussed in [14, 15, 11], one can now proceed to do the integrals in the semiclassical approximation for large s, t, g. We first do the Gaussian integrals to eliminate the BRST quartet $(\eta, \bar\phi, \vec\chi, \vec H)$. This results in a determinant in the numerator of the measure of the form Det(ε + ad(φ)), where the determinant is evaluated in the representation space of the equations. Proceeding with the Gaussian integrals on $(B_i, \Psi_i)$ produces determinants of the form Det(ε + ad(φ)) in the denominator. Finally, taking into account the Vandermonde factor in reducing the integral on φ from Lie(G) to $\mathfrak{t}$ = Lie(T) we obtain the integral:
$$I_{D=10}(N) = \frac{N}{N!} \left( \frac{(E_1+E_2)(E_2+E_3)(E_3+E_1)}{E_1 E_2 E_3 E_4} \right)^{N-1} \int_{\mathfrak{t}} D\phi \prod_{i \neq j} \frac{P(\phi_{ij})}{Q(\phi_{ij})}, \tag{3.6}$$
$$P(x) = x (x + E_1 + E_2)(x + E_3 + E_2)(x + E_1 + E_3), \qquad Q(x) = \prod_{\alpha=1}^{4} (x + E_\alpha + i0)$$

$^2$ One might worry that the original integrals and the ones we are getting at this point differ by exponentially small terms, as in [14]. The difference with the situation of [14] is that due to the absence of topologically non-trivial solutions to the equations on finite-dimensional matrices there are no extra contributions to the integral coming from infinity.


for the case D = 10. Here $\sum_\alpha E_\alpha = 0$ and the integral is taken along the real line. Similarly, the same procedure gives the integral:
$$I_{D=6}(N) = \frac{N}{N!} \left( \frac{E_1+E_2}{E_1 E_2} \right)^{N-1} \int_{\mathfrak{t}} D\phi \prod_{i \neq j} \frac{\phi_{ij} (\phi_{ij} + E_1 + E_2)}{\prod_{\alpha=1}^{2} (\phi_{ij} + E_\alpha + i0)} \tag{3.7}$$
for D = 6, and can be obtained from (3.6) by taking a formal limit $E_3 \to \infty$. Finally, for D = 4 the integral is:
$$I_{D=4}(N) = \frac{N}{N!\, E_1^{N-1}} \int_{\mathfrak{t}} D\phi \prod_{i \neq j} \frac{\phi_{ij}}{(\phi_{ij} + E_1 + i0)} \tag{3.8}$$
and can be obtained from (3.7) by taking a formal limit $E_2 \to \infty$.

The factor $\frac{N}{N!}$ has the following origin. The denominator is the order of the Weyl group of SU(N), which enters in passing to the integral over the conjugacy classes of φ. We then rewrite this integral as an integral over $\mathfrak{t}$, divided by |W(G)| = N!. The numerator N is the order of the center $\mathbb{Z}_N$, which appears in comparing the volumes of SU(N) and G.

The measure Dφ is defined as follows. The maximal Cartan subalgebra of SU(N) can be identified with $\mathbb{R}^{N-1}$ by means of the imbedding $(\phi_1, \ldots, \phi_{N-1}) \mapsto \mathrm{diag}\,(\phi_1, \ldots, \phi_{N-1}, -\phi_1 - \ldots - \phi_{N-1})$ into the space of traceless hermitian matrices. The measure Dφ is simply the normalized Euclidean measure on $\mathbb{R}^{N-1}$:
$$D\phi = \prod_{k=1}^{N-1} \frac{d\phi_k}{2\pi i}. \tag{3.9}$$

Finally, as mentioned in the introduction, it might appear that the integrals (3.8), (3.7), (3.6) are ill-defined since they are integrals along $\mathbb{R}^{N-1}$ with a measure that generically approaches 1 at ∞. This is an illusion. They should be regarded as contour integrals and become convergent once a contour deformation prescription is adopted. We will find such a prescription in every case. The prescription E → E + i0 is required for the validity of the Gaussian integrations, but we still must give a prescription for closing the contours. We expect that the contour prescriptions found below will follow from a more careful implementation of the technique of integrating out BRST quartets than we have yet performed.

4. Detailed Evaluation for Low Values of N

4.1. Two-body problem. We begin by evaluating the integral (3.6) for the D = 10 case:
$$I = \frac{P'(0)}{Q(0)} \int_{\mathbb{R}} \frac{d\phi}{2\pi i}\, \frac{P(2\phi) P(-2\phi)}{Q(2\phi) Q(-2\phi)}, \tag{4.1}$$
$$P(x) = x (x + E_1 + E_2)(x + E_3 + E_2)(x + E_1 + E_3), \qquad Q(x) = \prod_{\alpha=1}^{4} (x + E_\alpha + i0).$$


In order to evaluate it we close the contour in the upper half plane (this is an example of the “prescription” alluded to above) and pick up the contribution of four poles, at $\phi = \frac{1}{2} E_\alpha + i0$. The residue at $E_\alpha$ turns out to be
$$\mathrm{Res}_{\frac{1}{2}E_\alpha + i0} = \frac{1}{12}\, \frac{R(-2E_\alpha)}{E_\alpha R'(E_\alpha)}, \tag{4.2}$$
where
$$R(x) = \prod_{\alpha=1}^{4} (x - E_\alpha),$$
and the sum over the residues can be evaluated using an auxiliary contour integral:
$$\sum_{\alpha=1}^{4} \mathrm{Res}_{\frac{1}{2}E_\alpha} = \frac{1}{12} \left[ \oint \frac{dx}{2\pi i}\, \frac{R(-2x)}{x R(x)} - 1 \right] = 5/4. \tag{4.3}$$
For lower D's the same formula (4.2) holds, and Eq. (4.3) gives
$$\frac{1}{12} \left( 2^{D/2-1} + (-1)^{D/2} \right), \tag{4.4}$$
i.e. the famous 5/4, 1/4, 1/4 for D = 10, 6, 4 respectively, originally computed in [10, 7].
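The residue sum in (4.2), (4.3) is easy to check with exact rational arithmetic. The sketch below is only a numerical sanity check, not part of the derivation: the function name and the sample values of $E_\alpha$ are arbitrary choices made here, and the sign conventions of the contour are not tracked, so only the D = 10 and D = 6 values are compared.

```python
from fractions import Fraction

def residue_sum(E):
    """(1/12) * sum_a R(-2 E_a) / (E_a R'(E_a)), with R(x) = prod_b (x - E_b).

    E is a list of distinct, nonzero parameters E_alpha.
    """
    total = Fraction(0)
    for a, Ea in enumerate(E):
        num = Fraction(1)
        for Eb in E:
            num *= (-2 * Ea - Eb)      # builds R(-2 E_a)
        den = Fraction(Ea)
        for b, Eb in enumerate(E):
            if b != a:
                den *= (Ea - Eb)       # builds E_a * R'(E_a)
        total += num / den
    return total / 12

# D = 10: four parameters with E_1 + E_2 + E_3 + E_4 = 0 (sample values).
print(residue_sum([1, 2, 3, -6]))   # 5/4
# D = 6: two parameters (sample values).
print(residue_sum([1, 2]))          # 1/4
```

The result does not depend on the sample values, because the sum of the residues of $R(-2x)/(xR(x))$ at the points $E_\alpha$ is fixed by the residues at x = 0 and at infinity.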

4.2. Three-body problem. The formalism we have developed so far is rather powerful. In fact, it is still possible to evaluate the integral for N = 3 directly. Let $x = \phi_1 - \phi_2$, $y = \phi_2 - \phi_3 = 2\phi_2 + \phi_1$. The measure can be rewritten as:
$$d\phi_1 \wedge d\phi_2 = \frac{1}{3}\, dx \wedge dy.$$
Specializing (3.6), (3.7) to this case we find 3 sets of possible poles. The first set is given by:
$$x \in \{E_1 + i0, E_2 + i0, E_3 + i0, E_4 + i0\}, \ \text{and}\ y \in \{E_1 + i0, E_2 + i0, E_3 + i0, E_4 + i0\}. \tag{4.5}$$
The second set is
$$x \in \{E_1 + i0, E_2 + i0, E_3 + i0, E_4 + i0\}, \ \text{and}\ x + y \in \{E_1 + i0, E_2 + i0, E_3 + i0, E_4 + i0\}. \tag{4.6}$$
The third set is
$$x + y \in \{E_1 + i0, E_2 + i0, E_3 + i0, E_4 + i0\}, \ \text{and}\ y \in \{E_1 + i0, E_2 + i0, E_3 + i0, E_4 + i0\}. \tag{4.7}$$

We order the +i0’s appropriately so that Im(Eα − Eβ ) > 0 for α > β. In the D = 6 case we have similar sets of poles but with E1 , E2 present without E3 , E4 .


In evaluating the integral we choose poles from the first set but only take the second or third set (but not both). It is straightforward to evaluate the residues. For example, for the 5+1 case $x = E_1$, $y = E_1$ gives:
$$\left( \frac{E_1 + E_2}{E_1 E_2} \right)^2 \frac{E_1^2 E_2^2 (2E_1 + E_2)(3E_1 + E_2)}{3 (E_1 + E_2)^2 (E_1 - E_2)(2E_1 - E_2)} \tag{4.8}$$
while the residue vanishes for $x = E_1$, $y = E_2$, with a similar contribution with 1 ↔ 2. Thus the sum of the first set of poles gives:
$$\left( \frac{E_1 + E_2}{E_1 E_2} \right)^2 \frac{2 E_1^2 E_2^2 (4E_1^2 + 5E_1 E_2 + 4E_2^2)}{3 (E_1 + E_2)^2 (E_1 - 2E_2)(2E_1 - E_2)}. \tag{4.9}$$
Choosing (4.6), and not (4.7), the contribution $x + y = E_2$, $x = E_1$ gives:
$$-\left( \frac{E_1 + E_2}{E_1 E_2} \right)^2 \frac{E_1^2 E_2^2 (2E_1 + E_2)(E_1 + 2E_2)}{(E_1 + E_2)^2 (E_1 - 2E_2)(2E_1 - E_2)}. \tag{4.10}$$
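The two contributions (4.9) and (4.10) are rational functions of $E_1$, $E_2$ whose sum should not depend on either parameter. As a quick check, the following sketch evaluates both expressions exactly at a few arbitrarily chosen sample points (the function names and sample values are choices made here for illustration).

```python
from fractions import Fraction

def first_set(E1, E2):
    """Right-hand side of (4.9): total residue from the first set of poles."""
    E1, E2 = Fraction(E1), Fraction(E2)
    pref = ((E1 + E2) / (E1 * E2)) ** 2
    num = 2 * E1**2 * E2**2 * (4 * E1**2 + 5 * E1 * E2 + 4 * E2**2)
    den = 3 * (E1 + E2) ** 2 * (E1 - 2 * E2) * (2 * E1 - E2)
    return pref * num / den

def second_set(E1, E2):
    """Right-hand side of (4.10): the pole x + y = E2, x = E1."""
    E1, E2 = Fraction(E1), Fraction(E2)
    pref = ((E1 + E2) / (E1 * E2)) ** 2
    num = E1**2 * E2**2 * (2 * E1 + E2) * (E1 + 2 * E2)
    den = (E1 + E2) ** 2 * (E1 - 2 * E2) * (2 * E1 - E2)
    return -pref * num / den

# Sample points must avoid E1 = 2 E2, E2 = 2 E1, E1 = -E2 and E1 E2 = 0.
for E1, E2 in [(1, 3), (5, 7), (Fraction(2, 3), -4)]:
    print(first_set(E1, E2) + second_set(E1, E2))   # 1/3 each time
```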

The sum of (4.9) and (4.10) is 1/3, which leads to $1/3^2$ for the net answer. With a little more work one can check that in the 9+1 case we obtain $Z = \frac{1}{3}(3 + 1/3)$ (again, with 1/3 coming from the factor $\frac{d\phi_1 \wedge d\phi_2}{dx \wedge dy}$).

5. SU(N), D = 4

For the D = 4 case we may use the Bose–Cauchy identity:
$$\frac{1}{E_1^N} \prod_{i \neq j} \frac{\phi_{ij}}{(\phi_{ij} + E_1 + i0)} = \sum_{\sigma \in S_N} (-1)^\sigma \prod_{i=1}^{N} \frac{1}{\phi_i - \phi_{\sigma(i)} + E_1 + i0}. \tag{5.1}$$
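The identity (5.1) is a Cauchy determinant in disguise, and it can be verified by brute force for small N. The sketch below compares both sides with exact rational arithmetic; the helper names and the sample values of $\phi_i$ and $E_1$ are arbitrary choices made here, and the i0 shifts are dropped since they play no role away from the poles.

```python
from fractions import Fraction
from itertools import permutations

def sign(perm):
    """Sign of a permutation given as a tuple of images of 0..n-1."""
    s, seen = 1, [False] * len(perm)
    for i in range(len(perm)):
        length, j = 0, i
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length and length % 2 == 0:   # an even-length cycle is odd
            s = -s
    return s

def lhs(phi, E):
    """Left-hand side of (5.1): E^{-N} prod_{i != j} phi_ij / (phi_ij + E)."""
    out = Fraction(1) / Fraction(E) ** len(phi)
    for i in range(len(phi)):
        for j in range(len(phi)):
            if i != j:
                d = Fraction(phi[i] - phi[j])
                out *= d / (d + E)
    return out

def rhs(phi, E):
    """Right-hand side of (5.1): signed sum over all permutations."""
    total = Fraction(0)
    for perm in permutations(range(len(phi))):
        term = Fraction(sign(perm))
        for i, si in enumerate(perm):
            term /= (phi[i] - phi[si] + E)
        total += term
    return total

phi = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]
E = Fraction(1, 2)
print(all(lhs(phi[:n], E) == rhs(phi[:n], E) for n in (2, 3, 4)))   # True
```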

Of all the terms in (5.1) only the cycles of maximal length N can contribute to the residue evaluation (and there are (N − 1)! of those). The integral (3.8) will pick up a residue for all i except one (let us denote it by j) provided that $\phi_{\sigma(i)} = \phi_i + E_1 + i0$ for all i ≠ j. By relabelling the indices with the help of the Weyl group we can assume that j = N and the permutation σ is a long cycle σ(i) = i + 1. The pole is at
$$\phi_i = \frac{1}{2} (2i - N - 1) E_1 \tag{5.2}$$
and the residue is equal to $\frac{1}{N^2}$.$^3$ We prove this fact by taking the integral over the variables φ in the following order: $\phi_{N-1}, \phi_{N-2}, \ldots, \phi_1$. In the sequel $E_1$ should read as $E_1 + i0$. Given the fact that σ = (123 ... N) we need to evaluate:
$$\frac{E_1 (-1)^{N-1} N}{N (2\pi i)^{N-1}} \oint \prod_{i=1}^{N-2} \frac{d\phi_i}{\phi_i - \phi_{i+1} + E_1} \cdot \frac{d\phi_{N-1}}{(2\phi_{N-1} + \phi_1 + \ldots + \phi_{N-2} + E_1)(-2\phi_1 - \phi_2 - \ldots - \phi_{N-1} + E_1)} \tag{5.3}$$

$^3$ Notice that φ ∈ $\mathfrak{t}$ can be expressed as $\phi = \rho \cdot E_1$, where ρ is half the sum of the positive roots.


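(the factor N in the denominator is the order of the stabilizer of σ in the Weyl group: N!/(N − 1)!, and the sign $(-1)^{N-1}$ is $(-1)^\sigma$ for the long cycle). Let us prove by induction that the integral (5.3) reduces to
$$\frac{k E_1 (-1)^{N-k}}{(k+1)^2 (2\pi i)^{N-k}} \oint \prod_{i=1}^{N-k-1} \frac{d\phi_i}{\phi_i - \phi_{i+1} + E_1} \cdot \frac{d\phi_{N-k}}{\left( \phi_{N-k} + \frac{1}{k+1}(\phi_1 + \ldots + \phi_{N-k-1}) + \frac{k}{2} E_1 \right) \left( -\phi_1 - \frac{1}{k+1}(\phi_2 + \ldots + \phi_{N-k}) + \frac{k}{2} E_1 \right)} \tag{5.4}$$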

For k = 1 this expression is identical to (5.3). Now let us take the $\phi_{N-k}$ integral. By closing the contour in either the upper or the lower half plane (it doesn't matter) we pick up either one or two residues. For simplicity we always close the integral in the lower half-plane, meaning that
$$\phi_{N-k} = -\frac{k}{2} E_1 - \frac{1}{k+1} (\phi_1 + \ldots + \phi_{N-k-1}). \tag{5.5}$$
By evaluating the residue we immediately see that the declared form of the integral is reproduced with the replacement k → k + 1. Finally, for k = N − 1 we get
$$\frac{E_1 (N-1)(-1)}{N^2} \oint \frac{d\phi_1}{2\pi i}\, \frac{1}{\left( \phi_1 + \frac{N-1}{2} E_1 \right)\left( -\phi_1 + \frac{N-1}{2} E_1 \right)} = \frac{1}{N^2}. \tag{5.6}$$
Hence, the D = 4 integral is equal to
$$I_{D=4}(N) = \frac{1}{N^2}. \tag{5.7}$$
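The successive pole positions (5.5) can be back-substituted to recover the closed form (5.2). The sketch below does this with exact arithmetic; it only checks the location of the pole (not the value of the residue), and the function names and the sample value of $E_1$ are choices made here.

```python
from fractions import Fraction

def poles_by_back_substitution(N, E1):
    """Solve (5.5) for k = N-1, N-2, ..., 1, i.e. for phi_1, ..., phi_{N-1}."""
    phi = []
    for k in range(N - 1, 0, -1):
        partial = sum(phi, Fraction(0))            # phi_1 + ... + phi_{N-k-1}
        phi.append(-Fraction(k, 2) * E1 - partial / (k + 1))
    phi.append(-sum(phi, Fraction(0)))             # phi_N from tracelessness
    return phi

def poles_closed_form(N, E1):
    """Eq. (5.2): phi_i = (2i - N - 1) E1 / 2 for i = 1, ..., N."""
    return [Fraction(2 * i - N - 1, 2) * E1 for i in range(1, N + 1)]

E1 = Fraction(3, 7)   # arbitrary sample value
for N in range(2, 8):
    phi = poles_by_back_substitution(N, E1)
    assert phi == poles_closed_form(N, E1)
    # Consecutive eigenvalues differ by E1, as required by sigma(i) = i + 1.
    assert all(phi[i + 1] - phi[i] == E1 for i in range(N - 1))
print("pole positions agree with (5.2)")
```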

Note that the integral has been localized to the fixed point of the $\mathbb{C}^*$ action on the quotient of the space of regular traceless matrices B by the adjoint action of the group $SL_N(\mathbb{C})$.$^4$ Indeed, the φ from (5.2) solves the equation
$$[B, \phi] = E_1 B \tag{5.8}$$
for $B_{ij} = \delta_{i,j-1}$. On general principles we expect the integral to localize to the $Q_\epsilon$ fixed points. Of course, Eq. (5.8) has other, more non-trivial, solutions. In fact, for every Jordan cell decomposition
$$B = \begin{pmatrix} J_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & J_k \end{pmatrix} \tag{5.9}$$
with $J_l$ a Jordan block of length $n_l$, $\sum_l n_l = N$, we get a solution to (5.8) of the form:
$$\phi = \begin{pmatrix} \varphi_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \varphi_k \end{pmatrix}, \tag{5.10}$$

$^4$ A group element is regular if its centralizer in $SL_N(\mathbb{C})$ has dimension N − 1.


where $\varphi_l = f_l\, \mathrm{Id}_{n_l} + \mathrm{diag}\left( \frac{1}{2}(2i - n_l - 1) E_1 \right)$, i = 1, ..., $n_l$. The parameters $f_l$ are only constrained by the requirement that Tr φ = 0, which leaves k − 1 free zero modes. But the presence of extra zero modes is equivalent to the statement that the integrand in (3.8) can't pick up sufficiently many residues. Before eliminating the redundant fields every mode of φ came together with a bunch of superpartners, fermionic modes among them. By supersymmetry the unlifted modes, the $f_l$'s, correspond to the extra fermionic modes which make the integral vanish. We thus obtain the following important principle: The fixed points with extra U(1)'s left unbroken don't contribute to the index. It is interesting to compare this principle with the one derived in [17] in a seemingly different context.

6. SU(N), D = 6

In this case we rewrite the integral (3.7) as:
$$\frac{1}{(N-1)!} \left( \frac{E_1 + E_2}{E_1 E_2} \right)^{N-1} \frac{1}{(2\pi i)^N} \oint \frac{d\phi_1 \wedge \ldots \wedge d\phi_N}{\phi_1 + \ldots + \phi_N} \cdot \prod_{i \neq j} \frac{\phi_{ij} (\phi_{ij} + E_1 + E_2)}{\prod_{\alpha=1}^{2} (\phi_{ij} + E_\alpha + i0)}. \tag{6.1}$$

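We next perform the change of variables:
$$\phi_i \mapsto \tilde\phi_i = \phi_i + \sum_{j=1}^{N-1} \phi_j, \qquad i = 1, \ldots, N. \tag{6.2}$$
The measure gets an extra factor $\frac{1}{N}$:
$$\frac{d\phi_1 \wedge \ldots \wedge d\phi_N}{\phi_1 + \ldots + \phi_N} = \frac{1}{N}\, \frac{d\tilde\phi_1 \wedge \ldots \wedge d\tilde\phi_N}{\tilde\phi_N}, \tag{6.3}$$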

and we may rewrite (6.1) as:
$$\begin{aligned}
&\frac{(E_1 + E_2)^{N-1}}{N (2\pi i)^N (E_1 E_2)^{N-1}} \oint \frac{d\tilde\phi_1 \wedge \ldots \wedge d\tilde\phi_N}{\tilde\phi_N} \prod_{i \neq j} \frac{\phi_{ij} (\phi_{ij} + E_1 + E_2)}{\prod_{\alpha=1}^{2} (\phi_{ij} + E_\alpha + i0)} \\
&\quad = \frac{E_1 E_2}{N (2\pi i)^N (E_1 + E_2)} \oint \frac{d\phi_1 \wedge \ldots \wedge d\phi_N}{\prod_i (-\phi_i)(\phi_i + E_1 + E_2)} \prod_{i < N} (-\phi_i) \prod_i (\phi_i + E_1 + E_2) \\
&\qquad \times \prod_{i \neq j} \phi_{ij} \prod_{i,j} \frac{(\phi_{ij} + E_1 + E_2)}{(\phi_{ij} + E_1)(\phi_{ij} + E_2)},
\end{aligned} \tag{6.4}$$
where in the second line we made a substitution $\tilde\phi \to \phi$, and in the denominators $E_\alpha \to E_\alpha + i0$. The factor (N − 1)! disappears for the following reason. The choice of $\phi_N$ breaks the permutation group to $S_{N-1}$. We can fix the latter symmetry by ordering the eigenvalues $\phi_i$. As we shall see later, in assigning the poles of the integral (6.1) to Young tableaux each tableau yields a definite way of ordering the eigenvalues which takes up the whole of $S_{N-1}$.


Despite the seemingly senseless manipulation, we have arrived at an integral we can make sense of and in fact evaluate. In order to explain its meaning we recall that the solutions to the equation $[B_1, B_2] = 0$ modulo conjugation describe the symmetric product of $\mathbb{C}^2$ away from singularities and in fact provide a certain resolution of singularities, once appropriate stability conditions are imposed. These stability conditions can be formulated by introducing an auxiliary vector $I \in \mathbb{C}^N$. Then the stable data consists of a triple $Z = (B_1, B_2, I)$, such that $[B_1, B_2] = 0$ and there is no proper $B_1, B_2$-invariant subspace of $\mathbb{C}^N$ which contains I. The triples $(B_1, B_2, I)$ and $(g^{-1} B_1 g, g^{-1} B_2 g, g^{-1} I)$ are considered equivalent for any $g \in GL_N(\mathbb{C})$. It can be shown that the equivalence classes of such data Z are in one-to-one correspondence with codimension N ideals $I_Z$ in $\mathbb{C}[z_1, z_2]$.$^5$ The set of all codimension N ideals in the polynomial ring $\mathbb{C}[z_1, z_2]$ forms what is called the “Hilbert scheme of N points on $\mathbb{C}^2$”, and is denoted by $H_N = (\mathbb{C}^2)^{[N]}$. The quotients $V_Z = \mathbb{C}[z_1, z_2]/I_Z$ are the fibers of a rank N vector bundle E over $H_N$. The Chern roots of E are nothing but the $-\phi_i$'s. The space $H_N$ is acted on by the complex torus $\mathbf{T} = \mathbb{C}^* \times \mathbb{C}^*$ by rotation of the coordinates $(z_1, z_2)$:
$$(z_1, z_2) \mapsto (e^{E_1} z_1, e^{E_2} z_2). \tag{6.5}$$
This action lifts to the action on the data $(B_1, B_2, I)$ as follows:
$$(B_1, B_2, I) \mapsto (e^{E_1} B_1, e^{E_2} B_2, I). \tag{6.6}$$
The action of $\mathbf{T}$ on E is defined through the identification of the fiber $E_{(B_1, B_2, I)}$ with the vector space $\mathbb{C}[B_1, B_2] I$. Let Q be the topologically trivial $\mathbf{T}$-equivariant rank 2 vector bundle over $H_N$ whose isotypical decomposition coincides with that of the space $\mathbb{C}^2$ with coordinates $z_1, z_2$. The integral (6.4) computes the Euler character of a certain $\mathbf{T}$-equivariant bundle $F_N$ over $H_N$. To be more precise, we need the virtual bundle given by:
$$F_N = \left( Q \oplus E \oplus E^* \otimes \wedge^2 Q \right) \ominus \left( \det E \oplus \wedge^2 Q \right). \tag{6.7}$$
This bundle has virtual dimension 2N. The Euler classes of the various factors can be recognized in the integrand of (6.4). For example, the Euler class of $E^* \otimes \wedge^2 Q$ is the product $\prod_i (\phi_i + E_1 + E_2)$, while the incomplete product $\prod_i$ [...]

would have in general ±1 signs for unstable critical points. One famous example of such an integral is the Duistermaat–Heckman formula:
$$\int_{M^{2m}} \frac{\omega^m}{m!}\, e^{-tH} = \sum_{p:\, dH(p) = 0} \frac{e^{-tH(p)}}{\prod_{i=1}^{m} t\, m_i(p)}, \tag{6.8}$$

where (M, ω) is a symplectic manifold with Hamiltonian U(1) action generated by H, the p's are the fixed points of the U(1) action (assuming they are isolated) and the $m_i(p)$ are the weights of the U(1) action in the tangent space to M at the fixed point p. Of course, there exist generalizations of this formula for other manifolds, groups other than U(1), non-isolated fixed points and so on.

In the problem of present interest it turns out that the fixed points are enumerated by Young tableaux D with #D = N boxes.$^6$ In other words, consider the partition $N = \nu_1 + \ldots + \nu_{\nu'_1} = \nu'_1 + \ldots + \nu'_{\nu_1}$. Let (α, β) denote the position of a box in the Young tableau. There is a one-to-one correspondence between the labels i ∈ {1, ..., N} and the allowed pairs (α, β): $1 \le \alpha \le \nu_\beta$, $1 \le \beta \le \nu'_\alpha$, given by the lexicographic order ((α, β) > (α′, β′) if α < α′, or β < β′ for α = α′). In particular, (1, 1) ↔ N. The corresponding eigenvalues $\phi_i$ are given by
$$\phi_{(\alpha,\beta)} = (\alpha - 1) E_1 + (\beta - 1) E_2. \tag{6.9}$$

One can evaluate the residue at (6.9) using the results of [18]. Namely, in [18] it is proven that for a Young tableau D and the set $\phi_i$ given by (6.9) the following sum:
$$\sum_{i,j \in D} \left( e^{\phi_{ij}} + e^{\phi_{ij} + E_1 + E_2} - e^{\phi_{ij} + E_1} - e^{\phi_{ij} + E_2} \right) - \sum_{i \in D} \left( e^{-\phi_i} + e^{\phi_i + E_1 + E_2} \right) \tag{6.10}$$
is equal to
$$-\sum_{(\alpha,\beta) \in D} \left( e^{(\nu_\beta - \alpha + 1) E_1 + (\beta - \nu'_\alpha) E_2} + e^{(\alpha - \nu_\beta) E_1 + (\nu'_\alpha - \beta + 1) E_2} \right). \tag{6.11}$$
In fact, in [18] the weight decomposition of the tangent space to $(\mathbb{C}^2)^{[N]}$ at the fixed point corresponding to D is computed. It is encoded in the formula (6.11). We simply have to take the product of those weights, which will go into the denominator. In addition we need to take into account the decomposition of the bundle $F_N$ into weight subspaces and compute the product of those weights, which will go into the numerator. We simply use the fact that the weights of E are given by the $\phi_i$'s. Combining these two products we arrive at:
$$Y_D \equiv \text{contribution of } D = (-)^{N-1} E_1 E_2 \cdot \frac{\prod_{(\alpha,\beta) \neq (1,1)} \left( (\alpha - 1) E_1 + (\beta - 1) E_2 \right) \left( \alpha E_1 + \beta E_2 \right)}{\prod_{(\alpha,\beta)} \left( (\nu_\beta - \alpha + 1) E_1 + (\beta - \nu'_\alpha) E_2 \right) \left( (\alpha - \nu_\beta) E_1 + (\nu'_\alpha - \beta + 1) E_2 \right)}. \tag{6.12}$$
Now what remains is to sum over all Young tableaux D. One can check that the explicit pole prescriptions found above for the SU(2), SU(3) cases are reproduced by the poles associated to Young diagrams. Moreover, as a further illustration (and to have a look at the case with N non-prime) we write out all the residues

$^6$ We thank V. Ginzburg for his very clear explanation of this fact. In the language of ideals $I_Z$ the fixed points are the ideals which are spanned by $z_1^a z_2^b$ with $a \ge \nu_b$, $b \ge \nu'_a$ [21]. It explains the formula for the weights $\phi_{(\alpha,\beta)}$ below.


for the SU(4) case: there are five Young tableaux, a column (4), a hook (3, 1), a box (2, 2), the mirror hook (2, 1, 1) and a row (1, 1, 1, 1) (in the brackets we listed the values of the $\nu_\alpha$'s). Let $x = E_2/E_1$. The contributions are:
$$\begin{array}{ll}
\text{column } (4) & -\dfrac{1}{4}\, \dfrac{(1 + 2x)(1 + 3x)(1 + 4x)}{(1 - x)(1 - 2x)(1 - 3x)}, \\[2ex]
\text{hook } (3, 1) & -\dfrac{1}{2}\, \dfrac{(1 + 2x)(1 + 3x)(x + 2)}{(1 - x)^2 (-1 + 3x)}, \\[2ex]
\text{box } (2, 2) & -\dfrac{1}{2}\, \dfrac{(1 + 2x)(x + 2)(1 + x)^2}{(1 - 2x)(2 - x)(1 - x)^2}, \\[2ex]
\text{hook } (2, 1, 1) & -\dfrac{1}{2}\, \dfrac{(1 + 2x)(x + 3)(x + 2)}{(1 - x)^2 (-x + 3)}, \\[2ex]
\text{row } (1, 1, 1, 1) & -\dfrac{1}{4}\, \dfrac{(x + 2)(x + 3)(x + 4)}{(x - 1)(x - 2)(x - 3)}, \\[2ex]
\text{the sum} & \dfrac{1}{4},
\end{array} \tag{6.13}$$
which together with the 1/4 factor from the measure gives 1/16 as the answer.

The general answer is also expected to be $E_1$, $E_2$ independent. Looking at (6.12) we see that the factors which contain single $E_1$'s cancel out. Indeed, in the numerator these come from β = 1 in the factors $(\alpha - 1) E_1 + (\beta - 1) E_2$, producing
$$E_1^{\nu_1} (\nu_1 - 1)! \tag{6.14}$$
In the denominator the single $E_1$'s come from $\beta = \nu'_\alpha$ in the factors $(\nu_\beta - \alpha + 1) E_1 + (\beta - \nu'_\alpha) E_2$, giving rise to the product:
$$E_1^{\nu_1} \prod_{\alpha=1}^{\nu_1} \left( \nu_{\nu'_\alpha} - \alpha + 1 \right) = E_1^{\nu_1} (\nu_1 - 1)!. \tag{6.15}$$
Hence, single $E_1$'s cancel out and the limit $E_1 \to 0$ is well-defined. It is easy to see that all other factors cancel out except for the overall sign $(-)^{\nu_1 - N}$, coming from comparing the products $\prod_{\beta < \nu'_\alpha} (\beta - \nu'_\alpha)$ and $\prod_{\nu'_\alpha \ge \beta > 1} (\beta - 1)$. Thus we are left with:
$$Y_D = \frac{(-)^{\nu_1 - 1}}{N}\, \frac{(\nu_1 - 1)!}{\prod_\alpha \left( \nu_{\nu'_\alpha} - \alpha + 1 \right)}. \tag{6.16}$$
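The $E_1$, $E_2$ independence of the sum over Young tableaux can be probed numerically straight from (6.12). The sketch below is a hedged illustration: it enumerates partitions of N, evaluates (6.12) as written (that is, without the extra 1/N from the measure) with exact rational arithmetic, and sums the contributions. The function names and sample values of $E_1$, $E_2$ are choices made here; the values must be generic enough that no denominator factor vanishes.

```python
from fractions import Fraction

def partitions(n, largest=None):
    """Yield partitions of n as non-increasing tuples (nu_1, nu_2, ...)."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def contribution(nu, E1, E2):
    """Y_D of (6.12); nu[beta - 1] is the number of boxes (alpha, beta) in row beta."""
    N = sum(nu)
    # nu_dual[alpha - 1] = nu'_alpha = number of rows containing column alpha.
    nu_dual = [sum(1 for row in nu if row >= a) for a in range(1, nu[0] + 1)]
    num = Fraction((-1) ** (N - 1)) * E1 * E2
    den = Fraction(1)
    for beta, row in enumerate(nu, start=1):
        for alpha in range(1, row + 1):
            if (alpha, beta) != (1, 1):
                num *= ((alpha - 1) * E1 + (beta - 1) * E2) * (alpha * E1 + beta * E2)
            col = nu_dual[alpha - 1]
            den *= (row - alpha + 1) * E1 + (beta - col) * E2
            den *= (alpha - row) * E1 + (col - beta + 1) * E2
    return num / den

def total(N, E1, E2):
    """Sum of Y_D over all Young tableaux with N boxes."""
    return sum((contribution(nu, E1, E2) for nu in partitions(N)), Fraction(0))

for N in range(1, 5):
    print(N, total(N, Fraction(1), Fraction(7, 3)), total(N, Fraction(2), Fraction(-5, 7)))
```

For N = 1, ..., 4 both sample points give 1/N, which combined with the 1/N factor from the measure reproduces $1/N^2$; for N = 3 and N = 4 this matches the sums quoted after (4.10) and in (6.13).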

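Scary as it seems, the expression (6.16) can be represented in a very simple form. The way to do it is to combine the factors in the denominator into groups with constant $\nu'_\alpha$. A little mental exercise shows that the result can be represented as follows:
$$Y(q) = \sum_D q^{\#D}\, Y_D = \sum_{\substack{\{\ell_\gamma\},\ \gamma = 1, 2, \ldots \\ \ell_\gamma \ge 0,\ \sum_\gamma \ell_\gamma > 0}} \frac{q^{\sum_\gamma \gamma \ell_\gamma}}{\sum_\gamma \gamma \ell_\gamma}\, (-)^{\sum_\gamma \ell_\gamma - 1}\, \frac{\left( \sum_\gamma \ell_\gamma - 1 \right)!}{\prod_\gamma \ell_\gamma!}. \tag{6.17}$$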


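Here the $\ell_\gamma$ represent yet another way of partitioning N into the sum of positive integers:
$$N = \sum_{\gamma=1}^{\infty} \gamma \ell_\gamma$$
and $\ell_\gamma = \#\{\alpha \,|\, \nu'_\alpha = \gamma\}$, in particular $\nu_1 = \sum_\gamma \ell_\gamma$. The rest is easy: represent the factorial in the numerator of (6.17) and $\sum_\gamma \gamma \ell_\gamma$ in the denominator with the help of integrals:
$$\begin{aligned}
Y(q) &= -\int_0^\infty \! ds \int_0^\infty \frac{dt}{t}\, e^{-t} \sum_{\{\ell_\gamma\}} \prod_{\gamma=1}^{\infty} \frac{\left( -t q^\gamma e^{-s\gamma} \right)^{\ell_\gamma}}{\ell_\gamma!} \\
&= -\int_0^\infty \!\! \int_0^\infty \frac{ds\, dt}{t}\, e^{-t} \left( e^{-t \frac{q e^{-s}}{1 - q e^{-s}}} - 1 \right) \\
&= -\int_0^\infty \!\! \int_0^\infty \frac{ds\, dt}{t} \left( e^{-\frac{t}{1 - q e^{-s}}} - e^{-t} \right) \\
&= -\int_0^\infty ds\, \log(1 - q e^{-s}) = \mathrm{Li}_2(q) = \sum_{N=1}^{\infty} \frac{q^N}{N^2}.
\end{aligned} \tag{6.18}$$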
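The dilogarithm coefficients can also be checked term by term, without the integral representation. The sketch below enumerates all multiplicity vectors $\{\ell_\gamma\}$ with $\sum_\gamma \gamma \ell_\gamma = N$ and sums the summand of (6.17), with the sign $(-)^{\sum_\gamma \ell_\gamma - 1}$ of (6.16); the function names are choices made here for illustration.

```python
from fractions import Fraction
from math import factorial

def multiplicity_vectors(N, gamma=1):
    """Yield dicts {gamma: ell_gamma > 0} with sum(gamma * ell_gamma) == N."""
    if N == 0:
        yield {}
        return
    if gamma > N:
        return
    for ell in range(N // gamma, -1, -1):
        for rest in multiplicity_vectors(N - gamma * ell, gamma + 1):
            yield {gamma: ell, **rest} if ell else rest

def coefficient(N):
    """Coefficient of q^N in Y(q), summing the terms of (6.17)."""
    total = Fraction(0)
    for ells in multiplicity_vectors(N):
        L = sum(ells.values())                 # L = nu_1 = sum of ell_gamma
        weight = Fraction(factorial(L - 1), N)
        for ell in ells.values():
            weight /= factorial(ell)
        total += (-1) ** (L - 1) * weight
    return total

print([coefficient(N) for N in range(1, 7)])   # 1, 1/4, 1/9, 1/16, 1/25, 1/36
```

The check works because the multinomial expansion of $\log(1 + \sum_\gamma q^\gamma) = -\log(1 - q)$ produces exactly these terms, up to the overall factor 1/N.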

So we get:
$$I_{D=6}(N) = \frac{1}{N^2} \tag{6.19}$$
just as in the 3 + 1 case. It is probably worth pointing out that the last stage of computations is very similar to those performed in [22] in the course of proving that the contribution to a prepotential of an isolated rational curve sitting in the Calabi–Yau manifold equals $\mathrm{Li}_3(q)$. Another important remark is that a faster way of getting the equality $I_{D=6} = I_{D=4}$ is by taking the limit $E_2 \to \infty$. One might also attempt to take the limit $E_3 \to \infty$ in the D = 10 integral. This needn't (and in fact doesn't) work because the sum rule $\sum_\alpha E_\alpha = 0$ then forces $E_4 \sim -E_3 \to \infty$ too, and the contour integration is “pinched” between the poles. Pinching poles in a contour integral is a well-known source of discontinuity.

7. SU(N), D = 10

This section concludes our tour of the matrix integrals. In principle the integral (3.6) may be computed by summing over a set of generalized Young tableaux (as we did above for N = 2, 3). It turns out, however, that there is a shorter route to the answer, which avoids working with any new integrals. The strategy is to reduce the number of matrices by enforcing deformed octonionic instanton equations. As opposed to Sect. 2 where we were basically taking strong coupling limits, here we are taking mixed weak and strong coupling limits, imposing the weak coupling limit to enforce some of the equations.

Let us take $E_\alpha = 0$ for all α. Introduce the formal variable m. Consider the expression
$$\Phi_{ij} = [B_i, B_j] - m\, \epsilon_{ijk4} B_k, \qquad 1 \le i, j \le 4. \tag{7.1}$$


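The instanton equations may now be deformed to
$$\mathcal{E}_{ij} = \Phi_{ij} - \frac{1}{2} \epsilon_{ijkl} \Phi^\dagger_{kl}. \tag{7.2}$$
Note that
$$\frac{1}{2} \sum_{1 \le i,j \le 4} \mathrm{Tr}\, \mathcal{E}_{ij} \mathcal{E}^\dagger_{ij} = \sum_{1 \le i,j \le 4} \mathrm{Tr}\, \Phi_{ij} \Phi^\dagger_{ij}. \tag{7.3}$$
Hence the equations $\mathcal{E}_{ij} = 0$ imply:
$$[B_i, B_j] = m\, \epsilon_{ijk4} B_k, \tag{7.4}$$
$$[B_4, B_k] = 0. \tag{7.5}$$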

Equations (7.4) are formally the equations for the vacua of $\mathcal{N} = 4$ broken down to $\mathcal{N} = 1$ (see [23]). Equation (7.5) implies that $B_4$ generates the gauge transformations in the complexified unbroken group. Now let us take separate couplings g′, g″ for the equations $\mathcal{E}_{ij}$ and for the equation $\sum_{i=1}^{4} [B_i, B_i^\dagger]$ respectively (we can do this without spoiling Q-symmetry). Take the limit g′ → 0. This limit enforces Eqs. (7.4), (7.5). We also split the coupling $\frac{1}{\hat g} \mathrm{Tr}\,[X_a, \phi]^2$ as follows:
$$\frac{1}{\hat g} \sum_{a=1}^{4} \mathrm{Tr}\,|[X_a, \phi]|^2 \to \frac{1}{\hat g'} \sum_{i=1}^{3} \mathrm{Tr}\,|[B_i, \phi]|^2 + \frac{1}{\hat g''}\, \mathrm{Tr}\,|[B_4, \phi]|^2. \tag{7.6}$$

Upon taking the limit → 0 we enforce the equations [Bi , φ] = 0, i = 1, 2, 3. Adopting the argument that extra U (1)’s kill the contributions to the partition function we only have to count the vacua where the adjoint gauge group is broken down to SU (d)/Zd , for N = ad. For these vacua: Bα = kLα ka×a ⊗ Idd×d

(7.7)

for α = 1, 2, 3, Lα being SU (2) generators in the a-dimensional irreducible representation of SU (2). Also, we have (B4 )N×N = Ida×a ⊗ (B4 )d×d , (φ)N ×N = Ida×a ⊗ (φ)d×d .

(7.8)

In the limit we are taking we can integrate out the $B_\alpha$, α = 1, 2, 3 degrees of freedom, leaving behind $B_4$, φ. Accordingly, we recognize that we have exactly the degrees of freedom present in the integral $I_{D=4}(d)$ for gauge group $SU(d)/\mathbb{Z}_d$. Moreover, due to supersymmetry, not only the degrees of freedom but also the measure is appropriate to interpret the integral as $I_{D=4}(d)$. Now, we showed above that $I_{D=4}(d) = 1/d^2$. Thus, we conclude that the answer is:
\[ I_{D=10}(N) = \sum_{d|N} \frac{1}{d^2} \tag{7.9} \]
and in particular is equal to $1 + 1/N^2$ only for N prime. The term with d = 1 comes from the vacuum with completely broken gauge group.
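For concreteness, the divisor sum (7.9) can be tabulated exactly. This is a minimal sketch using exact rational arithmetic; the function names `divisors` and `i_d10` are illustrative choices, not notation from the paper.

```python
from fractions import Fraction

def divisors(n):
    """All positive divisors of n, in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def i_d10(n):
    """Sum over divisors d of n of 1/d^2, the right-hand side of (7.9)."""
    return sum(Fraction(1, d * d) for d in divisors(n))

if __name__ == "__main__":
    for n in range(1, 9):
        print(n, i_d10(n))
```

For a prime such as N = 5 only d = 1 and d = 5 contribute, giving 1 + 1/25, whereas a composite value such as N = 4 picks up the extra term 1/4 from d = 2.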


8. Comparison with Partition Functions of susy Gauge Theory on $T^4$ and K3

There are some interesting relations of the integrals $I_D(N)$ with other well-studied partition functions. First there is a relation with 5-branes. It is worth noting that the $q^0$ term in the partition function of N fivebranes wrapped on K3 proposed in [17] reproduces the answer (7.9) for all N. One must divide by 24, which is the Euler characteristic of the moduli space of a center of mass of D0 branes moving on K3. The partition function is computed by wrapping the worldvolume of the fivebranes on $K3 \times T^2$, which by a series of T- and S-dualities can be mapped to the problem of N D4-branes wrapped on K3 and N D0 branes bound to it. The $q^0$ term counts the zero D0-brane charge sector in the effective gauge theory. Presumably, by a Fourier–Mukai–Nahm duality of the K3 surface, one can map this problem to the problem of N D0 branes in ten dimensions, by taking the limit of a very large K3 surface on which N D0 branes propagate.

A more direct connection is that between $I_D(N)$ and partition functions of SYM on tori. Consider $SU(N)/\mathbb{Z}_N$ $\mathcal{N} = 4$ SYM on $T^4$, viewed as the theory of N D3-instantons wrapped on $T^4$ with the center of mass motion factored out (otherwise the partition function vanishes). Again, the mass perturbation breaks the theory to $\mathcal{N} = 1$ with unbroken groups without U(1)'s being$^7$
\[ SU(d)/\mathbb{Z}_d, \qquad ad = N. \tag{8.1} \]
The SU(d) $\mathcal{N} = 1$ theory has d vacua, each contributing 1 to the partition function, and their total contribution is d. The partition function of the $\mathcal{N} = 1$ $SU(d)/\mathbb{Z}_d$ theory is $d^3$ times smaller, since the partition function of SU(d) contained as a factor the number of $\mathbb{Z}_d$ flat connections ($d^4$) and the volume of $SU(d)/\mathbb{Z}_d$ is d times smaller, see [23] for more detailed explanations. Hence the partition function of the $\mathcal{N} = 1$ $SU(d)/\mathbb{Z}_d$ gauge theory on the four-torus$^8$ is equal to
\[ \frac{1}{d^2}, \tag{8.2} \]
and the partition function of the $SU(N)/\mathbb{Z}_N$ $\mathcal{N} = 4$ theory is given by
\[ Z^{\mathcal{N}=4}_{SU(N)/\mathbb{Z}_N}(T^4) = \sum_{d|N} \frac{1}{d^2}. \tag{8.3} \]
Then the T-duality relates the partition function of N D3 branes wrapped on $T^4$ to that of D(−1) instantons in ten dimensions. This concludes the proof of the conjecture of [8].

For lower numbers of supersymmetries the partition functions of the $SU(N)/\mathbb{Z}_N$ theories are easy to obtain. For $\mathcal{N} = 1$, as we argued, we get $\frac{1}{N^2} = N/N^3$, where N in the numerator is Witten's index [24] and the factor $N^3$ is the effect of the center $\mathbb{Z}_N \subset SU(N)$. For $\mathcal{N} = 2$, standard lore says that by the mass perturbation the theory reduces to $\mathcal{N} = 1$ and this perturbation does not affect the value of the partition function [25]. So, we get
\[ Z^{\mathcal{N}=1,2}_{SU(N)/\mathbb{Z}_N}(T^4) = \frac{1}{N^2}. \tag{8.4} \]

7 We thank C. Vafa for the clarifying discussion on this point.
8 In the zero 't Hooft magnetic flux sector.


For the minimal supersymmetric three-dimensional gauge theory with the gauge group SU(N) Witten's index is equal to 1. The effect of flat $\mathbb{Z}_N$ connections is now $N^{3-1} = N^2$, thus leading to the same answer,
\[ Z^{\mathcal{N}=1,2}_{SU(N)/\mathbb{Z}_N}(T^3) = \frac{1}{N^2}. \tag{8.5} \]
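The bookkeeping behind the torus answers can be written out in a few lines. This is only a sketch, and it rests on an assumption that goes slightly beyond the text: the two quoted reductions (Witten index N divided by $N^3$ on $T^4$, Witten index 1 divided by $N^{3-1}$ on $T^3$) are summarized here by a single hypothetical rule, index divided by $N^{D-1}$ on $T^D$. The function name `z_torus` is illustrative.

```python
from fractions import Fraction

def z_torus(n, torus_dim, witten_index):
    """Witten index divided by n**(torus_dim - 1).

    Assumption: the reduction factor for SU(n)/Z_n on a torus of dimension
    torus_dim is n**(torus_dim - 1), matching the two cases quoted in the text.
    """
    return Fraction(witten_index, n ** (torus_dim - 1))

if __name__ == "__main__":
    n = 5
    print(z_torus(n, 4, n))   # four-torus, Witten index n
    print(z_torus(n, 3, 1))   # three-torus, Witten index 1
```

Both calls return the same value $1/N^2$, which is the coincidence between the two answers noted in the text.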

One could also get this answer by adding a Chern–Simons term to the SYM Lagrangian (suitably accompanied by the fermions so as to preserve some susy, see [26, 10]) and then analytically continuing in k, the coefficient in front of the CS term. It would be interesting to see whether the above answer could be reproduced by the finite dimensional integral of the sort we have considered in the paper. As has been pointed out in [7] for even N and subsequently argued in [12] for all N, the D = 3 integral should vanish. The reason (at least for even N) is that the fermionic Pfaffian is odd under the parity reversal X → −X. On the other hand, by adding the Chern–Simons-like term $k\,\mathrm{Tr}\,X[\phi, \bar\phi] + \psi\eta$ and integrating out all massive modes we arrive at the integral of the same form as the one for D = 4, which should be equal to $1/N^2$, thus providing an agreement with the field theory computation. Clearly, this CS-like term violates parity. On the other hand, the original integral is not obviously absolutely convergent, therefore the parity arguments may be invalid. It would be interesting to resolve this puzzle.

Another interesting question, but one which is beyond the scope of this paper, is the application to the IKKT model [27]. In fact, our technique allows for the derivation of regularized correlation functions of the operators $\mathrm{Tr}\,\phi^{n_1} \ldots \mathrm{Tr}\,\phi^{n_k}$.

Acknowledgements. We would like to thank T. Banks, L. Baulieu, M. Green, S. Sethi, I. Singer, M. Staudacher and C. Vafa for useful remarks and discussions. G. M. and N. N. are grateful to the ITP at Santa Barbara and especially to D. Gross for hospitality and to the organizers and participants of the Workshops on Geometry and String Duality for providing a stimulating atmosphere. S. Sh. is grateful to the Theory Group at CERN for hospitality. The research of G. Moore is supported by DOE grant DE-FG02-92ER40704, that of S. Shatashvili, by DOE grant DE-FG02-92ER40704, by an NSF CAREER award and by an OJI award from DOE and by the Alfred P. Sloan Foundation. The research of N. Nekrasov was supported by the Harvard Society of Fellows, partially by NSF under grant PHY-92-18167, partially by RFFI under grant 96-02-18046 and partially by grant 96-15-96455 for scientific schools. In addition, this research was supported in part by NSF under Grant No. PHY-94-07194.

References

1. Horowitz, G.T., Strominger, A.: Black strings and p-branes. Nucl. Phys. B360, 197 (1991)
2. Townsend, P.: The eleven dimensional supermembrane revisited. hep-th/9501068
3. Witten, E.: String theory dynamics in various dimensions. hep-th/9503124; Nucl. Phys. B443, 85–126 (1995)
4. Chaudhuri, S., Johnson, C., Polchinski, J.: Notes on D-branes. hep-th/9602052; Polchinski, J.: TASI Lectures on D-branes. hep-th/9611050
5. Witten, E.: Bound States Of Strings And p-Branes. hep-th/9510135; Nucl. Phys. B460, 335–350 (1996)
6. Claudson, M., Halpern, M.B.: Supersymmetric ground state wave functions. Nucl. Phys. B250, 689 (1985)
7. Sethi, S., Stern, M.: D-brane bound states redux. hep-th/9705046
8. Green, M., Gutperle, M.: D-particle bound states and the D-instanton measure. hep-th/9711107
9. Porrati, M., Rozenberg, A.: Bound States at Threshold in Supersymmetric Quantum Mechanics. hep-th/9708119
10. Baulieu, L., Losev, A., Nekrasov, N.: Chern–Simons and Twisted Supersymmetry in Higher Dimensions. hep-th/9707174, to appear in Nucl. Phys. B
11. Moore, G., Nekrasov, N., Shatashvili, S.: Integrating over Higgs Branches. hep-th/9712241
12. Krauth, W., Nicolai, H., Staudacher, M.: Monte Carlo Approach to M-theory. hep-th/9803117
13. Hirano, S., Kato, M.: Topological Matrix Model. hep-th/9708039; Prog. Theor. Phys. 98, 1371 (1997)
14. Witten, E.: Two dimensional gauge theories revisited. hep-th/9204083; J. Geom. Phys. 9, 303–368 (1992)
15. Jeffrey, L.C., Kirwan, F.C.: Localization for nonabelian group actions. alg-geom/9307001
16. Yi, P.: Witten Index and Threshold Bound States of D-Branes. hep-th/9704098; Nucl. Phys. B505, 307–318 (1997)
17. Minahan, J.A., Nemeschansky, D., Vafa, C., Warner, N.P.: E-Strings and N = 4 Topological Yang–Mills Theories. hep-th/9802168
18. Nakajima, H.: Lectures on Hilbert schemes of points on surfaces. H. Nakajima's homepage
19. Kirwan, F.: Cohomology of quotients in symplectic and algebraic geometry. Math. Notes, Princeton, NJ: Princeton University Press, 1985
20. Atiyah, M., Bott, R.: The Moment Map And Equivariant Cohomology. Topology 23, 1–28 (1984)
21. Ginzburg, V., Besrukavnikov, R.: To appear
22. Manin, Yu.: Generating functions in algebraic geometry and sums over trees. alg-geom/9407005
23. Vafa, C., Witten, E.: A strong coupling test of S-duality. hep-th/9408074; Nucl. Phys. B431, 3–77 (1994)
24. Witten, E.: Constraints on supersymmetry breaking. Nucl. Phys. B202, 253 (1982)
25. Witten, E.: Supersymmetric Yang-Mills Theory On A Four-Manifold. hep-th/9403195; J. Math. Phys. 35, 5101 (1994)
26. Nekrasov, N.: Five Dimensional Gauge Theories and Relativistic Integrable Systems. hep-th/9609219
27. Ishibashi, N., Kawai, H., Kitazawa, Y., Tsuchiya, A.: A large N reduced model as superstring. hep-th/9612115; Nucl. Phys. B498, 467 (1997)

Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 209, 97 – 121 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Integrating over Higgs Branches Gregory Moore1 , Nikita Nekrasov2,3 , Samson Shatashvili1,? 1 Department of Physics, Yale University, Box 208120, New Haven, CT 06520, USA.

E-mail: [email protected]; [email protected]

2 Institute of Theoretical and Experimental Physics, 117259, Moscow, Russia 3 Lyman Laboratory of Physics, Harvard University, Cambridge, MA 02138 USA.

E-mail: [email protected] Received: 2 May 1999/ Accepted: 16 July 1999

? On leave of absence from St. Petersburg Branch of Steklov Mathematical Institute, Fontanka, St. Petersburg, Russia.

Abstract: We develop some useful techniques for integrating over Higgs branches in supersymmetric theories with 4 and 8 supercharges. In particular, we define a regularized volume for hyperkähler quotients. We evaluate this volume for certain ALE and ALF spaces in terms of the hyperkähler periods. We also reduce these volumes for a large class of hyperkähler quotients to simpler integrals. These quotients include complex coadjoint orbits, instanton moduli spaces on $\mathbb{R}^4$ and ALE manifolds, Hitchin spaces, and moduli spaces of (parabolic) Higgs bundles on Riemann surfaces. In the case of Hitchin spaces the evaluation of the volume reduces to a summation over solutions of Bethe Ansatz equations for the non-linear Schrödinger system. We discuss some applications of our results.

1. Introduction

In this note we study integrals over Higgs branches. They are Kähler and hyperkähler quotients, depending on the amount of supersymmetry in the theory. Our original motivation was inspired by the works [1] on S-duality and [2] on the action of affine algebras on the moduli spaces of instantons which have been subsequently developed in many ways [3–5]. In particular, the series of papers [4–7] arose out of attempts to give a field-theoretic interpretation of the results of [2]. Subsequently, string duality ideas turned out to be more adequate for addressing these questions [8, 9] although in all cases ([4, 8, 9] and more recently [10]) quantum mechanics on the moduli space of instantons plays a crucial rôle.

The four-dimensional version of the WZW theory described in [4] depends on a choice of a 2-form ω. The path-integral of the gauged $WZW_4$ is related to the four-dimensional Verlinde number [4]. If we study the theory in the ω → ∞ limit then the Verlinde number reduces, formally, to the symplectic volume of the moduli space of instantons on a four-manifold. In some cases this moduli space is a hyperkähler quotient by the ADHM construction.

If a type IIA 5brane wraps a K3 surface then the resulting string may be identified with the heterotic string [11, 12]. The natural question then arises as to whether one can wrap IIA 5branes on ALE spaces to produce other types of strings.$^1$ A first objection to this idea is that the infinite volume of the ALE space leads to an infinite string tension. However, it is possible that with a regularized version of the volume, e.g., that described in this paper, other well-defined string theories can be obtained from wrapped IIA 5branes.

The natural hyperkähler quotients to study in the context of string duality are Hitchin spaces. They occur as spaces of collective coordinates, describing the bound states of D2-branes, wrapping a curve Σ in K3. It is clear that a thorough study of these spaces is important in various aspects of supersymmetric gauge theories [14, 15]. In all cases we define the regularized volumes of these spaces and compute them using localization techniques. The localization with respect to different symmetries produces different formulas, thereby establishing curious identities. On this route one may get certain sum rules for solutions of Bethe Ansatz equations [16, 17].

The last motivation stems from the recent studies of the Matrix theory of IIA fivebranes. The Higgs branch of such a theory is described by a two dimensional sigma model with target space being the moduli space of instantons. The volume of the target space is a characteristic of the ground state wave function. The integrals of some cohomology classes over the moduli space of instantons like those which we present below can be translated to give explicit results for correlation functions of chiral fields in (2, 0) six dimensional theories in the light-cone description [10].
The paper is organized as follows. In Sect. 2 we review very briefly the localization approach and describe the quotients we are going to study. In Sect. 3 we present our definitions of regularized integrals. Sects. 4–7 are devoted to explicit examples.

2. Quotients and Localization

2.1. Kähler and hyperkähler quotients. A Kähler manifold X is by definition a complex variety with a hermitian metric g, such that the corresponding two-form $\omega(\cdot,\cdot) = g(I\cdot,\cdot)$ is closed. The complex structure I, viewed as a section of $\mathrm{End}(T^{\mathbb{C}}X)$, is covariantly constant, $\nabla I = 0$, for the Levi–Civita connection ∇. A hyperkähler manifold X has three covariantly constant complex structures I, J, K which obey the quaternionic algebra $I^2 = J^2 = K^2 = IJK = -1$ and which are such that in any of the complex structures the manifold X is complex. It follows that $\omega_r = g(I\cdot,\cdot)$ is the Kähler form w.r.t. the complex structure I while $\omega_c = g(J\cdot,\cdot) + \sqrt{-1}\,g(K\cdot,\cdot)$ is a closed (2, 0)-form, i.e., a holomorphic symplectic structure. See [18] for a detailed introduction to the subject.

Whenever a compact group K acts on a symplectic manifold (X, ω) preserving its symplectic structure one may attempt to define a reduced space. To this end the existence of an equivariant moment map $\mu : X \to \mathfrak{k}^*$ is helpful. The latter is defined as follows: $d\langle\mu, \xi\rangle = -\iota_{V_\xi}\omega$, where $\xi \in \mathfrak{k}$ and $V_\xi \in \mathrm{Vect}(X)$ is the corresponding vector field. In other words, $\langle\mu, \xi\rangle$ is the hamiltonian generating the action of ξ. The equivariance condition means that $\mu(g \cdot x) = \mathrm{Ad}^*_g\,\mu(x)$ for any g ∈ K. The existence of μ is guaranteed in the situation where X is simply-connected while the equivariance depends on the triviality of a certain cocycle of the Lie algebra of Hamiltonian vector fields on X.

1 Such objects have recently appeared in [13].
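The quaternionic relations quoted above for I, J, K can be illustrated with an explicit matrix model. The sketch below is an assumption-laden toy: it picks one particular representation by 2×2 complex matrices (the choice of `I`, `J` and the helper names `mul`, `close` are ours, not the paper's) and checks $I^2 = J^2 = K^2 = IJK = -1$.

```python
def mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# One possible choice of generators (an assumption made for illustration).
I = ((1j, 0), (0, -1j))
J = ((0, 1), (-1, 0))
K = mul(I, J)

MINUS_ONE = ((-1, 0), (0, -1))

if __name__ == "__main__":
    for name, m in (("I", I), ("J", J), ("K", K)):
        print(name, "squares to -1:", close(mul(m, m), MINUS_ONE))
    print("IJK = -1:", close(mul(mul(I, J), K), MINUS_ONE))
```

Any other triple obtained from this one by an SU(2) rotation satisfies the same relations, which is the freedom in choosing a preferred complex structure on a hyperkähler manifold.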


The symplectic quotient is the space $X//K = \mu^{-1}(0)/K$. In the case where K acts on a Kähler manifold preserving both its metric and complex structure the space X//K is Kähler as well. If the group K acts on a hyperkähler manifold preserving all its structures then the moment map is extended to the hyperkähler moment map $\vec\mu : X \to \mathfrak{k}^* \otimes \mathbb{R}^3$, defined by
\[ d\langle\vec\mu, \xi\rangle = -\iota_{V_\xi}\vec\omega, \tag{2.1} \]
where $\vec\omega \in \Omega^2(X) \otimes \mathbb{R}^3$ is the triplet of symplectic forms $g(I\cdot,\cdot)$, $g(J\cdot,\cdot)$, $g(K\cdot,\cdot)$. It is a theorem of [18] that the hyperkähler quotient $X////K = \vec\mu^{-1}(0)/K$ is itself a hyperkähler manifold.

In gauge theories with gauge group K, the quotient X/K arises as (a Higgs branch of) the moduli-space-of-vacua limit of bosonic gauge theory (whenever spontaneous symmetry breaking takes place), the Kähler quotient X//K occurs in the theory with 4 supercharges, and the hyperkähler quotients appear in the theories with 8 supercharges. One is tempted to define "octonionic quotients" for the purposes of the theories with sixteen supercharges, but both the notation (8 slashes) and the lack of time make us leave it for future investigations.

The original space X in gauge theories is often linear. So it is instructive to play with linear quotients first. The 0 in $\mu^{-1}(0)$ can be replaced by the central elements ζ, $\vec\zeta$ in $\mathfrak{g}^*$, $\mathfrak{g}^* \otimes \mathbb{R}^3$. In gauge theories these are called Fayet–Iliopoulos terms. The quotients $\mu^{-1}(\zeta)/K$ and $\vec\mu^{-1}(\vec\zeta)/K$ will be denoted as $M(\zeta)$ and $M(\vec\zeta)$ when this will not lead to confusion.

2.2. Preliminaries: Hyperkähler representations. Let K be a compact group of dim K = k, unitarily represented on a hermitian vector space V of complex dimension ℓ. K acts via antihermitian matrices $T_A$. In a natural way $V \oplus V^*$ is hyperkähler and the K-action
\[ \delta_A (z^\alpha; w_\alpha) = \left( (T_A)^\alpha{}_\beta z^\beta;\; -w_\beta (T_A)^\beta{}_\alpha \right) \tag{2.2} \]
is $\mathbb{H}$-linear. This motivates the introduction of quaternionic coordinates:
\[ X^\alpha = z^\alpha + J w_\alpha = \begin{pmatrix} z^\alpha & \bar w^\alpha \\ -w_\alpha & \bar z_\alpha \end{pmatrix}, \qquad \delta_A X^\alpha = (\tau_A)^\alpha{}_\beta X^\beta, \tag{2.3} \]
where
\[ (\tau_A)^\alpha{}_\beta = \begin{pmatrix} (T_A)^\alpha{}_\beta & 0 \\ 0 & (T_A^*)_\alpha{}^\beta \end{pmatrix}. \tag{2.4} \]
On the linear space we have the Kähler form and holomorphic symplectic form:
\[ \omega_r = \frac{i}{2} \sum_\alpha \left[ dz^\alpha\, d\bar z_\alpha + dw_\alpha\, d\bar w^\alpha \right], \qquad \omega_c = \sum_\alpha dz^\alpha \wedge dw_\alpha. \tag{2.5} \]


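We regard exterior derivatives as fermions:
\[ dX^\alpha \equiv \Psi^\alpha = \begin{pmatrix} \psi^\alpha & \bar\chi^\alpha \\ -\chi_\alpha & \bar\psi_\alpha \end{pmatrix}. \tag{2.6} \]
We can also define exterior differentials transforming as a triplet under the right action of SU(2). Together the four derivatives are:
\[ \delta_n X = \Psi n, \qquad \delta_n X^\dagger = \bar n \Psi^\dagger, \tag{2.7} \]
where n is any quaternion. Now let ρ be antihermitian of norm 1, so $\rho = u^\dagger i\sigma_3 u$ for some unitary u. Then the holomorphic exterior differentials in the direction ρ are:
\[ \partial^{(\rho)} = \frac{1}{2}(d + \delta_\rho), \qquad \bar\partial^{(\rho)} = \frac{1}{2}(d - \delta_\rho). \tag{2.8} \]
Finally, the (hyper)-Kähler potential is
\[ \kappa = \frac{1}{2}\,\mathrm{Tr}\,X^\dagger X \tag{2.9} \]
and satisfies:
\[ \partial^{(\rho)} \bar\partial^{(\rho)} \kappa = \omega(\rho) = -\frac{1}{2}\,\mathrm{Tr}\,\rho \Psi^\dagger \Psi, \tag{2.10} \]
giving the Kähler form associated to the complex structure ρ. The moment map for the K-action:
\[ d\vec\mu_A + \iota(V_A)\vec\omega = 0 \tag{2.11} \]
is given explicitly by:
\[ X^\dagger_\alpha (\tau_A)^\alpha{}_\beta X^\beta = 2 \begin{pmatrix} \mu^r_A & -\bar\mu^c_A \\ \mu^c_A & -\mu^r_A \end{pmatrix}, \]
\[ 2\mu^r_A = \bar z_\alpha (T_A)^\alpha{}_\beta z^\beta - w_\alpha (T_A)^\alpha{}_\beta \bar w^\beta \in \sqrt{-1}\,\mathbb{R}, \tag{2.12} \]
\[ \mu^c_A = w_\alpha (T_A)^\alpha{}_\beta z^\beta. \]
We finally construct the hyperkähler quotient. Let $\vec\mu^{-1}(\vec\zeta) = \mu_r^{-1}(\zeta^r) \cap \mu_c^{-1}(\zeta^c)$, where $\vec\zeta_A$ is nonzero only for central generators $T_A$. Then
\[ M(\vec\zeta) = \vec\mu^{-1}(\vec\zeta)/K \tag{2.13} \]
is of real dimension $4\ell - 4\dim K = 4(\ell - k) = 4m$. As complex manifolds we may write [18]:
\[ M(\zeta^c) = \mu_c^{-1}(\zeta^c)/K_{\mathbb{C}} \tag{2.14} \]
after deleting the unstable points.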
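The simplest instance of this construction can be made concrete. The sketch below assumes K = U(1) acting on $V = \mathbb{C}^\ell$ with the single antihermitian generator T = i, so that (2.2) integrates to $z \to e^{i\theta}z$, $w \to w e^{-i\theta}$ and (2.12) gives $2\mu^r = i(|z|^2 - |w|^2)$, $\mu^c = i\,w\cdot z$. The function names are hypothetical. It checks numerically that both moment maps are invariant under the group action and evaluates the dimension count $4(\ell - k)$.

```python
import cmath
import random

def sample_point(rng, l):
    """A random point (z, w) of V + V* with V = C^l."""
    z = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(l)]
    w = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(l)]
    return z, w

def moment_maps(z, w):
    """Return (2*mu_r, mu_c) for U(1) with generator T = i, following (2.12)."""
    two_mu_r = 1j * (sum(abs(a) ** 2 for a in z) - sum(abs(b) ** 2 for b in w))
    mu_c = 1j * sum(b * a for a, b in zip(z, w))
    return two_mu_r, mu_c

def act(theta, z, w):
    """Finite form of (2.2) for T = i: z -> e^{i theta} z, w -> w e^{-i theta}."""
    phase = cmath.exp(1j * theta)
    return [phase * a for a in z], [b / phase for b in w]

def quotient_real_dim(l, k):
    """Real dimension 4(l - k) of the hyperkaehler quotient, as in (2.13)."""
    return 4 * (l - k)

if __name__ == "__main__":
    rng = random.Random(1)
    z, w = sample_point(rng, 3)
    print(moment_maps(z, w))
    print(moment_maps(*act(0.83, z, w)))
    print(quotient_real_dim(3, 1))
```

With ℓ = 2 and k = 1 the count gives a four-dimensional quotient, the smallest nontrivial case of a U(1) hyperkähler reduction of a linear space.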


2.3. Review of CohFT approach and localization. This section is devoted to a review of mathematical and physical realizations of integration over the quotients in various (real, complex, quaternionic and octonionic) cases. Consider a manifold X with an action of a group K. Suppose that a submanifold N ⊂ X is K-invariant and is acted on by K freely. The problem we address is how to present the integration over the quotient M = N/K in terms of the one over X. Let $x^\mu$ denote the coordinates on X. Their exterior derivatives are denoted as $\psi^\mu$. Differential forms $a \in \Omega^*(X)$ on X are regarded as functions of (x, ψ). Equivariant forms are equivariant maps
\[ \alpha : \mathfrak{k} \to \Omega^*(X) \tag{2.15} \]
of the Lie algebra to the differential forms:
\[ \alpha(\mathrm{Ad}_g \phi) = g^* \alpha(\phi), \tag{2.16} \]
where in the right-hand side g represents the action of an element g ∈ K on X and correspondingly on $\Omega^*(X)$. The equivariant derivative
\[ D \equiv d + \iota_{V_\phi} \tag{2.17} \]
squares to zero on the space of equivariant forms, which we denote as $\Omega^*_K(X)$.

The answer to our problem is positive in the important case where the K-invariant submanifold N is realized as a set of zeroes of a K-equivariant section of a K-equivariant bundle E → X: $N = s^{-1}(0)$ for s : X → E. In this case it is possible to represent the integration over M in terms of integrals over X and some auxiliary supermanifolds. See [19–22]. Endow E with a K-invariant metric ( , ) and let A be the connection which is compatible with it. One needs a multiplet (χ, H) of "fields" of opposite statistics taking values in the sections of E. The equivariant derivative D acts as follows:
\[ D\chi = H - \psi^\mu A_\mu \cdot \chi, \qquad DH = T_\phi \cdot \chi - \psi^\mu A_\mu \cdot H - \psi^\mu \psi^\nu F_{\mu\nu} \cdot \chi, \tag{2.18} \]
where $T_\phi$ represents the action of $\phi \in \mathfrak{k}$ on the fibers of E and $F_{\mu\nu}$ is the curvature of A. One also needs the "projection multiplet" $(\bar\phi, \eta)$ with values in $\mathfrak{k}$ with the action of D as: $D\bar\phi = \eta$, $D\eta = [\phi, \bar\phi]$.

In the situation we are in, namely Kähler and hyperkähler quotients, the bundle E is the topologically trivial bundle $E = X \times \mathfrak{g}^*$ or $E = X \times \mathfrak{g}^* \otimes \mathbb{R}^3$ respectively. The section s is simply $\mu - \zeta$ and $\vec\mu - \vec\zeta$ respectively. Let $g_{\mu\nu}$ be any K-invariant metric on X. Define a $\mathfrak{k}^*$-valued one-form β as follows: $\beta(\xi) = \frac{1}{2} g(V_\xi, \cdot)$ for any $\xi \in \mathfrak{k}$. In coordinates (x, ψ) it is simply: $\beta(\xi) = \frac{1}{2} g_{\mu\nu} \psi^\mu V^\nu_\xi$.

Equivariant cohomology of X maps to the ordinary cohomology of M. Indeed, there is an equivariant inclusion map i : N → X, hence the map $i^* : H^*_K(X) \hookrightarrow H^*_K(N)$, and by the assumption that the action of K on N is free we get an isomorphism $I : H^*_K(N) \approx H^*(M)$. The cohomology classes of M among others contain the characteristic classes of the K bundle N → M which are in one-to-one correspondence with the invariant polynomials P(φ) on $\mathfrak{k}$. These correspond to the equivariant forms $P(\phi) \in \Omega^0(X)$. So let us compute the integrals over M of the cohomology classes which are in the image of the map $I \circ i^*$. Let $\alpha_1, \ldots, \alpha_l$ be the representatives of the classes in $H^*_K(X)$, so that


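$\alpha_p(\phi) \in \Omega^*(X)$. Let $\Theta_p = I \circ i^* \alpha_p$. Putting all things together we may represent the integrals of such classes over M as:
\[ \int_{N/K} \Theta_1 \wedge \ldots \wedge \Theta_l = \int_X \frac{D\chi\, DH\, Dx\, D\psi\, D\phi\, D\bar\phi\, D\eta}{\mathrm{Vol}(K)}\; e^{iD(\chi \cdot s + \beta(\bar\phi))}\, \alpha_1(\phi) \wedge \ldots \wedge \alpha_l(\phi). \tag{2.19} \]
In the case where X is a symplectic manifold the form $\alpha = \omega + \langle\mu, \phi\rangle \in \Omega^*_K(X)$ is equivariantly closed and gives rise to a symplectic form $\varpi$ on the reduced space X//K. Similarly, in the hyperkähler case the triplet of equivariantly closed forms $\vec\alpha = \vec\omega + \langle\vec\mu, \phi\rangle$ produces a triplet of Kähler forms $\vec\varpi$ on the quotient X////K. So the symplectic volume of the quotient is$^2$
\[ \int_{X//K} e^{\varpi} = \int_{\mathfrak{k}} \frac{D\phi}{\mathrm{Vol}(K)} \int_{\mathfrak{k} \oplus \Pi\mathfrak{k} \oplus \mathfrak{k} \oplus \Pi\mathfrak{k}} D\chi\, DH\, D\eta\, D\bar\phi \cdot \int_X e^{\omega + i\langle\mu, \phi\rangle}\, e^{i\langle H, \mu\rangle + i\langle\chi, d\mu\rangle - D(\beta(\bar\phi))}. \tag{2.20} \]
The specifics of this case is the fact that the auxiliary fields η, χ, H, $\bar\phi$ come in a quartet and sometimes can be integrated out by introducing the D-exact term
\[ e^{itD(\langle\chi, \bar\phi\rangle)} \]
into the exponent. In the limit t → ∞ this term dominates over $D(\langle\chi, \mu\rangle - \beta(\bar\phi))$ and allows us to get rid of the auxiliary multiplet with the result:
\[ \int_{X//K} e^{\varpi}\, \Theta_1 \ldots \Theta_l \sim \int_{\mathfrak{k}} \frac{D\phi}{\mathrm{Vol}(K)} \int_X e^{\omega + i\langle\mu, \phi\rangle}\, \alpha_1(\phi) \wedge \ldots \wedge \alpha_l(\phi), \tag{2.21} \]
where now the integral over φ should be understood as a contour integral, and the choice of contour is the subtle memory of the eliminated quartet. See [21, 22] for a thorough discussion of these matters.

2 For later convenience we put an i in front of φ, a kind of Wick rotation.

Another instance where auxiliary fields come in quartets is in hyperkähler reduction. Indeed, the multiplets $(\eta, \bar\phi) \oplus (\chi, H)$ take values in $\mathfrak{k} \oplus \mathfrak{k} \otimes \mathbb{R}^3$ which makes it possible to introduce the "mass term" of the form:
\[ e^{it_1 D(\langle\chi^r, \bar\phi\rangle) + it_2 D(\langle\chi^c, \bar H^c\rangle + \langle\bar\chi^c, H^c\rangle)}, \qquad t_1, t_2 \to \infty. \]
By taking $t_1 \to \infty$ and integrating out $H^c$ first we arrive at the analogue of (2.21):
\[ \int_{X////K} e^{\varpi^r}\, \Theta_1 \wedge \ldots \Theta_l = \int_{\mathfrak{k}} \frac{D\phi}{\mathrm{Vol}(K)} \int \frac{D^2\chi^c}{t_2^k} \int_X e^{Z}\, \alpha_1(\phi) \wedge \ldots \wedge \alpha_l(\phi), \tag{2.22} \]
\[ Z = \omega + \langle\mu, \phi\rangle - \frac{|\mu^c|^2}{t_2} + \langle\bar\chi^c, d\mu^c\rangle + \langle\chi^c, d\bar\mu^c\rangle + \langle\phi, [\bar\chi^c, \chi^c]\rangle. \]
Had we naively integrated out $\chi^c$ by dropping the $\langle\chi^c, d\bar\mu^c\rangle$ term we would get an extra determinant Det(ad φ) which vanishes due to the zero modes of ad(φ). In fact the mass term does not allow us to eliminate the whole of $\chi^c$, $\bar\chi^c$, etc. but rather only its $\mathfrak{k}/\mathfrak{t}$ part,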


where $\mathfrak{t}$ is the Lie algebra of the maximal torus, corresponding to φ.$^3$ So we conclude that the resulting integral without the octet of auxiliary fields may be ill-defined. In fact, it might have been ill-defined from the very beginning. We haven't discussed so far the issue of compactness of the quotient space. As soon as it is compact we expect the manipulations we performed to be reasonable and leading to the correct answer. But what if it is not? We shall discuss it in the next section but here let us proceed assuming that the non-compactness can be cured in the D-invariant manner.

In this case the other side of the story is the possibility to express the integrals (2.19), (2.20) in terms of the fixed points of the K action on X. This feature comes about due to the presence of the term $D(g_{\mu\nu} \psi^\mu V^\nu_{\bar\phi}) = g(V_\phi, V_{\bar\phi}) + \ldots$ in the exponential which we can scale up by any amount we wish due to its D-exactness. Therefore the integrand is peaked near the zeroes of V and can be evaluated using the semi-classical approximation in the directions transverse to the fixed loci. We present the resulting formulae below.

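2.4. Normal form theorems. Here we consider the hyperkähler integral (2.20) and treat it a bit differently. Namely we integrate out $\chi^r$, $\vec H$, $\bar\phi$, η and get the representation for the measure using $(x, \psi, \phi, \lambda \equiv \chi^c)$ only. This exercise is a good warm-up example yet it is an illustration of the principle of "killing the quartets." Consider the setup:
\[ \begin{array}{ccc} \mu_c^{-1}(\zeta^c) & \hookrightarrow & X = V \oplus V^* \\ \downarrow p & & \\ M_\zeta & & \end{array} \tag{2.23} \]
A $K_{\mathbb{C}}$-equivariant tubular neighborhood of the level set will be modeled on a neighborhood of $(\mu^c)^{-1}(\zeta^c) \times \mathfrak{k}_{\mathbb{C}}$ which in turn is modeled on a neighborhood of:
\[ M_\zeta \times T^* K_{\mathbb{C}}. \tag{2.24} \]
Let $\varpi_r$, $\varpi_c$, $\bar\varpi_c$ be the three Kähler forms on $M_\zeta$. Then in a $K_{\mathbb{C}}$-equivariant neighborhood of $\mu_c^{-1}(\zeta^c)$ we have
\[ \omega_c = p^*(\varpi_c) + \sum_{A=1}^{\dim K} d\langle\Theta^A_c, \mu^c_A\rangle, \tag{2.25} \]
where $\Theta^A_c$ is a (1, 0) connection form for the holomorphic principal $K_{\mathbb{C}}$ bundle $\mu_c^{-1}(\zeta^c) \to M_\zeta$. We claim:
\[ p^*(\varpi_c \wedge \bar\varpi_c)^m\, d\mu_{\mathrm{Haar}}(K) = (\omega_c \wedge \bar\omega_c)^\ell \prod_{A=1}^{\dim K} \delta^{(2)}(\mu^c_A - \zeta^c_A)\, \delta(\mu^r_A - \zeta^r_A) \int \prod_{A=1}^{k} d\lambda_A\, d\bar\lambda_A\, \exp\left( \bar\lambda^A \lambda^B \left[ \bar z_\alpha \{T_B, T_A\}^\alpha{}_\beta z^\beta + w_\alpha \{T_B, T_A\}^\alpha{}_\beta \bar w^\beta \right] \right). \tag{2.26} \]

3 There is also a subtlety in rotating φ to the torus but it is irrelevant for us here.

Indeed, the complexified Lie algebra may be decomposed as
\[ \mathfrak{k}_{\mathbb{C}} = \mathfrak{k} \otimes_{\mathbb{R}} \mathbb{C} = \mathfrak{k} \oplus i\mathfrak{k} \]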


according to the decomposition of $K_{\mathbb{C}}$ into K · H, where K is compact and H is from the coset $K \backslash K_{\mathbb{C}}$. Decompose the measure as
\[ (\omega_c \wedge \bar\omega_c)^\ell = p^*(\varpi_c \wedge \bar\varpi_c)^m \prod_A |d\mu^c_A|^2 \prod_A |d\xi^A_c|^2. \tag{2.27} \]
The action of the complex transformations on z, w is
\[ \delta_{iA} z^\alpha = i(T_A)^\alpha{}_\beta z^\beta, \qquad \delta_{iA} \bar z_\alpha = i \bar z_\beta (T_A)^\beta{}_\alpha, \]
\[ \delta_{iA} w_\alpha = -i w_\beta (T_A)^\beta{}_\alpha, \qquad \delta_{iA} \bar w^\alpha = -i (T_A)^\alpha{}_\beta \bar w^\beta. \tag{2.28} \]
Evaluating the delta functions we get the ghost determinant:
\[ \det_{AB} \frac{\partial}{\partial \xi^B}\, \mu^R_A\left( \exp(i\xi^A T_A) \cdot (z, w) \right), \tag{2.29} \]
and using (2.28) to compute the variation of $\mu^r_A$ in the nilpotent directions gives the formulae. □

As an alternative proof we could restrict $\omega_r$ to $\mu_c = 0$ and then do ordinary Kähler reduction. The formula (2.26) is equivalent to the case $t_2 = 0$ of the formula (2.22) which is easily seen by shifting the ψ-variables$^4$, see the analogous manipulations in the section devoted to instanton moduli.

3. Definitions of the Volume

In this section we are going to deal with the non-compactness which didn't allow us to perform a large $t_2$ evaluation of the formula (2.22). Indeed, $M(\vec\zeta)$ is in general noncompact, and of infinite volume in the sense of ordinary Riemannian geometry. Suppose that the space M is acted on by a group H in a Hamiltonian way such that for some $\varepsilon \in \mathfrak{h}$ the Hamiltonian $H_\varepsilon = \langle\mu_{\mathfrak{h}}, \varepsilon\rangle$ is sufficiently positive at infinity. Then the definition is

a) The Hamiltonian regularized volume is:
\[ \mathrm{Vol}_\varepsilon(M) = \int_M e^{\varpi - H_\varepsilon}. \tag{3.1} \]
Consider the case where X is a linear space. This space always has a U(1) symmetry which commutes with the action of the group K, namely $(z, w) \mapsto (e^{i\theta} z, e^{i\theta} w)$. This action does not preserve the hyperkähler structure but preserves the Kähler form $\omega_r$. If $\zeta^c = 0$ then this action descends to the quotient space M(ζ). In fact, the corresponding Hamiltonian coincides with the Kähler potential κ and it descends to the quotient even if $\zeta^c \neq 0$ since it is K-invariant. In this case the Hamiltonian regularized volume is called the Kähler regularized volume.

4 It has been remarked to us by A. Losev in 1995.
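The effect of the regulator in (3.1) is easiest to see on a flat toy model. The sketch below is an illustration under assumptions of our own, not a computation from the paper: it takes the plane $\mathbb{R}^2 \cong \mathbb{C}$ with area form $dx \wedge dy$ and the rotation Hamiltonian normalized as $H_\varepsilon = \frac{\varepsilon}{2}(x^2 + y^2)$, for which the regularized area is $2\pi/\varepsilon$. The function name and the cutoff choice are hypothetical.

```python
import math

def regularized_area(eps, steps=200_000):
    """Midpoint-rule estimate of the integral of exp(-eps*r^2/2) over the plane.

    Assumed normalization: H_eps = (eps/2) * r^2 with area element 2*pi*r dr.
    The radial cutoff is chosen so that the neglected tail is of order e^{-72}.
    """
    r_max = 12.0 / math.sqrt(eps)
    h = r_max / steps
    total = 0.0
    for n in range(steps):
        r = (n + 0.5) * h
        total += 2.0 * math.pi * r * math.exp(-0.5 * eps * r * r) * h
    return total

if __name__ == "__main__":
    for eps in (1.0, 0.5, 0.1):
        print(eps, regularized_area(eps), 2.0 * math.pi / eps)
```

The result grows like $1/\varepsilon$ as the regulator is removed, which is the toy version of the statement that the unregularized volume of a non-compact quotient does not exist.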


Another natural Hamiltonian is $H = \mathrm{Tr}\{\mu^r_A, \mu^c_B\}_c$, where $\{\,,\}_c$ is the Poisson bracket in the holomorphic symplectic structure. This regularization is called ghost regularization. It will in general differ from the Kähler regularization if K is not simple.

We have for the hyperkähler metric: $\mathrm{vol}_g = (\varpi^r)^{2m} = (\varpi^c \wedge \bar\varpi^c)^m$ so the ε → 0 limit of (3.1) would give the Riemannian volume of M if only it existed. Quite analogously we can define regularized intersection numbers, taking the integrals (2.22) and inserting $e^{-H_\varepsilon}$. Notice that in order to define the Kähler or ghost regularized volume we don't actually need $\zeta^c$ to vanish.

3.1. Doing integrals straightforwardly. Before addressing the issue of existence of Hamiltonians $H_\varepsilon$ which generate some symmetries of the quotients $M(\vec\zeta)$, we can express the Kähler and ghost regulated volumes as integrals over the corresponding linear space X using the normal form theorems. Applying (2.26) we have (we changed the notation $(\phi, H^c, \bar H^c) \to \vec H$):
\[ \mathrm{Vol}_\varepsilon(M) = \int_M (\varpi^c \wedge \bar\varpi^c)^m e^{-H_\varepsilon} = \frac{1}{\mathrm{Vol}(K)} \int_{\mathfrak{k} \otimes \mathbb{R}^3} d^3\vec H \int \prod_A d\chi^c_A\, d\bar\chi^c_A \int_{\hat V \oplus \hat V^*} \prod_{\alpha=1}^{\ell} d^4 X^\alpha\, d^4 \Psi^\alpha\, \exp\left( I_1 + I_{gh} + I_{reg} \right), \tag{3.2} \]
where the action is given by
\[ I_1 = \frac{i}{2}\,\mathrm{Tr}\,H^A X^\dagger \tau_A X + \frac{1}{2}\,\mathrm{Tr}\,\rho \Psi^\dagger \Psi + i\,\mathrm{Tr}\,H^A \zeta_A, \]
\[ I_{gh} = \bar\chi^A \chi^B \left[ \bar z_\alpha \{T_B, T_A\}^\alpha{}_\beta z^\beta + w_\alpha \{T_B, T_A\}^\alpha{}_\beta \bar w^\beta \right], \tag{3.3} \]
and
\[ I_{reg} = \begin{cases} -\mathrm{Tr}\,X^\dagger X, & \text{Kähler}, \\ -\mathrm{Tr}\,X^\dagger \tau_A \tau_A X, & \text{ghost}, \end{cases} \]
and $H^A$, ρ, $\zeta_A$ are antihermitian matrices
\[ H^A = \begin{pmatrix} H^A_r & \bar H^A_c \\ -H^A_c & -H^A_r \end{pmatrix}, \qquad \rho = \begin{pmatrix} \rho_r & \bar\rho_c \\ -\rho_c & -\rho_r \end{pmatrix} \tag{3.4} \]
(so $\phi_r$, $\rho_r$ are imaginary). Now, the above expressions are useful because they are Gaussian in X, Ψ, hence we can do the integrals. Doing the Gaussian integrals we arrive at the formula:

Theorem. The regularized symplectic volumes of hyperkähler quotients are given by the formula
\[ \int_{\mathfrak{k} \otimes \mathbb{R}^3} \prod_{A=1}^{\dim K} d^3\vec H_A\, d\chi^c_A\, d\bar\chi^c_A\; (2\pi)^{2\ell}\, \frac{\exp\left( i\,\mathrm{Tr}\,\vec H^A \vec\zeta_A \right)}{\det N}, \tag{3.5} \]
\[ N = R + i\phi^A \otimes T_A + \kappa\, \bar\eta^A \eta^B\, 1 \otimes \{T_A, T_B\}, \tag{3.6} \]
where R is the regulator matrix. It is 1 for Kähler regularization and $\tau_A^2$ for ghost regularization.


3.2. Doing integrals using localization. It turns out that one can considerably simplify the formula (3.5) in the case ζ c = 0. What is needed is the existence of a linear action of H such that µc is transformed non-trivially under its action. For the action of H to descend down to M it is sufficient for the equation µc = 0 to be invariant under the action of H . By going to the maximal torus we may assume that H is the torus itself. In fact, we only need a one-dimensional subalgebra, generated by . With respect to this subalgebra the trivial bundle kC × X as well as its sub-bundle tC × X split as sums of line bundles. We require that the latter has no zero weight components. In this case there is a Theorem. Suppose that ζ c = 0, and that the induced action 3k () of Vh () on kC (which makes µc H -equivariant) is such that the operator A = 3k () + ad(φ) is non-degenerate for generic φ. Then Z Dφ r iφ r ·ζ r Detk 3k () + iadφ r r . e (3.7) Vol (M(ζ ) = rT DetV ⊕V ∗ 3X () + iφA k Vol(K) A Here we denoted by 3X () the linear operator on X which is the derivative of Vh () at x = 0. The measure in (3.7) is invariant under the adjoint action of K. It implies that the integral in (3.7) can be reduced to the integral over the maximal Cartan subalgebra t ⊂ k: Z (3.8) Vol (ζ r ) = DtZ M(ζ r ) (t)). t

The measure $Z_\epsilon$ is the measure in (3.7) evaluated at the element $t \in \mathfrak{k}$ times the Vandermonde determinant, and $Dt$ is the standard Euclidean measure. The proof of (3.7) involves some use of equivariant cohomology, and all the essential ingredients are already explained in the previous section, except for the remark concerning the non-compactness of the quotient space. We said there that in order for the manipulations with “mass terms” to make sense some sort of compactness is required. Now the regularization we choose provides such a compactness, in the sense that the forms we integrate fall off exponentially at infinity. The formalism with equivariant cohomology must be slightly modified to take into account the action of the regulating group $H$. The derivative $D$ gets promoted to $D_\epsilon = D + \iota_{V_h(\epsilon)}$. The troublesome term $D\langle\chi^c, \bar H^c\rangle$ becomes $D_\epsilon\langle\chi^c, \bar H^c\rangle$, which is non-degenerate on the $\chi$'s by the assumptions of the theorem.

Notice that (3.7) also suggests the interpretation of the equivariant volume of $M(\vec\zeta)$ as the generating function of the intersection numbers of Chern classes of certain bundles over the “half” $M(\zeta)$ of $M(\vec\zeta)$ defined by the symplectic quotient of $V$ by the action of $K$ at the same level $\zeta^r$.⁵

3.3. Hyperkähler regularization. The last definition is the most symmetric in the sense that it preserves the hyperkähler structure and is the direct analogue of the “equivariant volumes” of [25]. Unfortunately, this definition is rarely useful, because it requires a triholomorphic action on $M$ of a torus of dimension $\frac{1}{4}\dim M$, which is rather non-generic. Nevertheless, suppose that $M(\vec\zeta)$ is acted on by the torus $T$ of the stated dimension and let $\vec\epsilon \in \mathfrak{t} \otimes \mathbb{R}^3$ (all examples studied in [26] obey this condition).

⁵ Integrals which are similar to those in this paper have been studied in [24].

Integrating over Higgs Branches


b) Hyperkähler regularized volume:

$$\mathrm{Vol}_{\vec\epsilon}\, M(\vec\zeta) = \int_{M(\vec\zeta)} \mathrm{vol}_g\; e^{\langle\vec\epsilon,\,\vec\mu_t\rangle}. \tag{3.9}$$

It essentially reduces to the Hamiltonian regularization in the case where $\vec\epsilon \in \mathfrak{t} \otimes (\mathbb{R} \subset \mathbb{R}^3)$.

3.4. Example: Volume of a point. The volume of a point computed with the help of Hamiltonian regularization is instructive. We take $X = \mathbb{H} = \mathbb{C}^2 = \{(b_+, b_-)\}$ and $K = U(1)$. The standard $K$ action is $(b_+, b_-) \mapsto (e^{i\theta} b_+, e^{-i\theta} b_-)$, and the hyperkähler moment map is given by

$$\vec M = (|b_+|^2 - |b_-|^2)\,\vec e_3 + b_+ b_-\,\vec e_- + \bar b_+ \bar b_-\,\vec e_+ ; \tag{3.10}$$

the relevant integral involves first:

$$\int d\eta\, d\bar\eta \int d^2 b_+\, d^2 b_-\, \exp\!\big[\, i\vec\phi\cdot\vec M - \epsilon(|b_+|^2 + |b_-|^2) + 2\eta\bar\eta\,(|b_+|^2 + |b_-|^2)\big] = \int d\eta\, d\bar\eta\, \frac{1}{(\epsilon - 2\eta\bar\eta - i|\vec\phi|)(\epsilon - 2\eta\bar\eta + i|\vec\phi|)} = \frac{4\epsilon}{(\epsilon^2 + \vec\phi^2)^2}. \tag{3.11}$$

(To do the Gaussian integral it is convenient to use rotational invariance to put $\vec\phi$ into the 3 direction.) Finally, one must do the integral:

$$\int d^3\phi\, \frac{4\epsilon}{(\epsilon^2 + \vec\phi^2)^2}\, e^{-i\vec\phi\cdot\vec\zeta} = e^{-\epsilon|\vec\zeta|}. \tag{3.12}$$

In this case all regularizations agree and produce the same result, since the reduced space is a point. There is not much freedom here!

4. Examples: Four Dimensional Hyperkähler Manifolds

4.1. Asymptotically locally euclidean spaces. Consider the famous ALE gravitational instantons. In the $A_n$ case the metric on the space $X_n(\vec\zeta)$ is given by the following explicit formula, where the space is represented as an $S^1$ fibration over $\mathbb{R}^3$ which has singular fibers over the $n+1$ points $\vec r_i \in \mathbb{R}^3$:

$$ds^2 = V\, d\vec r^{\,2} + V^{-1}(d\tau + \omega)^2, \tag{4.1}$$

where $\vec r \in \mathbb{R}^3$, $V = V(\vec r)$, $\omega \in \Omega^1(\mathbb{R}^3)$, $d\omega = \star_3\, dV$, and the potential $V$ is given by:

$$V = \sum_{i=1}^{n+1} \frac{1}{|\vec r - \vec r_i|}. \tag{4.2}$$


The moduli of the space are $\vec\zeta_i = \vec r_{i+1} - \vec r_i$. We now represent the $A_n$ ALE spaces $X_n(\vec\zeta)$ as a hyperkähler quotient, following [27]. We take

$$X = \prod_{i=0}^{n} \mathbb{C}^2 = \{(b_{i,i+1}, b_{i+1,i}) \mid i = 0,\dots,n;\ n+1 \equiv 0\},$$

$$K = \prod_{i=0}^{n} U(1)_i / U(1)_d \cong \prod_{i=1}^{n} U(1)_i = \{e^{i\theta_l} \mid l = 0,\dots,n\}/U(1)_d, \tag{4.3}$$

$$H = U(1) = \{e^{i\alpha}\},$$

and the action of the groups is as follows:

$$b_{i,i+1} \to e^{i(\alpha + \theta_i - \theta_{i+1})}\, b_{i,i+1}, \qquad b_{i+1,i} \to e^{i(\alpha + \theta_{i+1} - \theta_i)}\, b_{i+1,i}, \tag{4.4}$$

so the Kähler regulated volume becomes: Z=

Z n−1 Y

3 i

d φ

i=1

n Y

i

dη d η¯

i=1

i

Z n−1 Y

d 2 bi+1,i d 2 bi,i+1

i=0

X n−1 Ei − M E i−1 − ζEi ) + Ireg φEi (M exp i i=1

+ η¯ i (ηi+1 − ηi )(| bi,i+1 | + | bi+1,i | ) 2

=

Z n−1 Y

2

Z Y n−1 n−1 n−1 X X X φEi )δ( d 3 φ i dχ i d χ¯ i δ( χ i )δ( χ¯ i ) d 2b

i=0

i=0

i=0

i=0

X n−1 n−1 X X E i · φEi ) − φEi · xEi , (χi χ¯ i + )(|bi,i+1 |2 + |bi+1,i |2 ) − M exp i ( i=0

i=0

i

(4.5) where xE0 = ζE0 , xEi − xEi−1 = ζEi . (So xEn = 0.) Now the next trick is to use δ(

n−1 X i=0

Z n−1 n−1 Pn−1 X X X E i i χ i χ¯ i . φ )δ( χ )δ( χ¯ ) = d 3 xEei xE·( i=0 φi ) i

i=0

(4.6)

i=0

Now we have decoupled integrals of exactly the kind we met in the hyperkähler volume of a point. In particular we use

$$\int d^4 b\, \exp\!\big[\, i\vec\phi\cdot\vec M - \epsilon(|b_+|^2 + |b_-|^2)\big] = \frac{\pi^2}{\epsilon^2 + \vec\phi^2}, \tag{4.7}$$

and hence the $\phi$ integral is just the three-dimensional propagator:

$$\int d^3\phi\, \frac{e^{i\vec\phi\cdot\vec x}}{\epsilon^2 + \vec\phi^2} = \frac{4\pi^2}{|x|}\, e^{-\epsilon|x|}. \tag{4.8}$$
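The functional form of the propagator integral can be checked numerically. The sketch below is an illustration, not part of the original derivation; the function names and cutoff parameters are assumptions. It reduces the three-dimensional Fourier integral to its radial form, $\frac{4\pi}{r}\int_0^\infty \frac{k\sin(kr)}{k^2+\epsilon^2}\,dk$, and splits off the Dirichlet integral so that the remainder converges absolutely. With the plain measure $d^3\phi$ the result is $2\pi^2 e^{-\epsilon r}/r$; overall constants such as the $4\pi^2$ quoted in (4.8) depend on how the measures are normalized, which the sketch does not try to track.

```python
import math


def radial_integral(eps, r, k_max=200.0, steps=200_000):
    """Approximate int_0^inf k sin(k r) / (k^2 + eps^2) dk for r > 0.

    Uses k/(k^2+eps^2) = 1/k - eps^2/(k (k^2+eps^2)).  The first piece is the
    Dirichlet integral (equal to pi/2 for r > 0); the second decays like k^-3
    and is integrated with composite Simpson on [0, k_max].
    """
    if steps % 2:
        steps += 1
    h = k_max / steps

    def g(k):
        if k == 0.0:
            return r  # limit of eps^2 sin(k r) / (k (k^2 + eps^2)) as k -> 0
        return eps**2 * math.sin(k * r) / (k * (k * k + eps**2))

    total = g(0.0) + g(k_max)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * g(m * h)
    return math.pi / 2 - total * h / 3.0


def propagator_3d(eps, r):
    """int d^3phi exp(i phi.x) / (eps^2 + phi^2) with |x| = r, plain measure."""
    return 4 * math.pi / r * radial_integral(eps, r)


def yukawa_closed_form(eps, r):
    """Closed form 2 pi^2 exp(-eps r) / r for the same integral."""
    return 2 * math.pi**2 * math.exp(-eps * r) / r
```

Calling `propagator_3d(1.0, 1.5)` and `yukawa_closed_form(1.0, 1.5)` should agree to several digits, confirming the $e^{-\epsilon|x|}/|x|$ behaviour used in the next step.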


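Thus, we get

$$(4\pi^2)^n \int d^3x\, \sum_{i=0}^{n-1} \frac{1}{|\vec x_i|}\, e^{-\epsilon\sum_{j=0}^{n-1}|\vec x_j|}, \tag{4.9}$$

where $\vec x_i - \vec x_{i-1} = \vec\zeta_i$, $\vec x_i = \vec x + \vec r_i$. Now we would like to extract the moduli dependence of

$$\mathrm{Vol}_\epsilon(X_n) = \int_{X_n(\vec\zeta)} e^{-\epsilon\kappa}\, \mathrm{vol}_g. \tag{4.10}$$

Note that we have

$$(\nabla_i^2 + \epsilon^2)\,\mathrm{Vol}_\epsilon(X_n) = e^{-\epsilon\sum_{j\neq i}|\vec r_i - \vec r_j|}. \tag{4.11}$$

Solving recursively in $\epsilon$ we obtain the moduli dependence at order $\epsilon^0$. Thus, the part of the Hamiltonian-regulated volume of the $A_n$ ALE space which is finite as $\epsilon \to 0$ is given by:

$$\text{“vol”}(X_n(\vec\zeta)) = \sum_{i,j=1}^{n-1} \vec\zeta_i\, C^{ij}\, \vec\zeta_j. \tag{4.12}$$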

Remarks. 1. It is worth noting that the answer depends on $\zeta^r$. From the QFT perspective, our integral is a model for a gauge theory with complexified gauge group $K_{\mathbb{C}}$. Thus, the $\zeta^r$ dependence of the volumes is an anomaly in the noncompact part of the complexified gauge group $K_{\mathbb{C}}$. If we simply added ghosts we would expect slice-independence: that is the whole point of the Faddeev–Popov gauge fixing. In fact, the slice parameter $\zeta^r$ for the principal $K_{\mathbb{C}}$ bundle does matter. In order to see this dependence we must regularize with $\epsilon$. In the $\epsilon \to 0$ limit we discover, contrary to naive expectations, $\zeta^r$ dependence.

2. When $\zeta_i^c = 0$ and the spheres are lined up we can evaluate the integral exactly:

$$Z_\epsilon = \int dz\, r\, dr\, \sum_i \frac{1}{\sqrt{(z - z_i)^2 + r^2}}\; e^{-\epsilon\sum_i\sqrt{(z - z_i)^2 + r^2}} = \frac{1}{\epsilon}\int dz\, e^{-\epsilon\sum_i |z - z_i|} = -\frac{2}{\epsilon^2}\sum_i \frac{e^{-\epsilon\sum_j |z_i - z_j|}}{(n - 2i + 2)(n - 2i)}. \tag{4.13}$$

Note that although we did the integral explicitly, in exactly this case the Kähler potential can be viewed as a Hamiltonian, generating a flow on $X_n(\zeta^r)$. So, we can apply the Duistermaat–Heckman theorem. The fixed points are the intersection points of the spheres, where $b_{i,i+1} = b_{i+1,i} = 0$.
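The last equality in (4.13) follows by integrating $e^{-\epsilon\sum_i|z - z_i|}$ piecewise between the ordered points $z_1 < \dots < z_n$, on each of which the exponent is linear in $z$. The sketch below compares the closed form with a direct piecewise Simpson integration. It is a minimal illustration with assumed function names, and it is restricted to odd $n$: for even $n$ the terms with $n - 2i = 0$ or $n - 2i + 2 = 0$ have to be understood as a limit.

```python
import math


def closed_form(z, eps):
    """-(2/eps^2) * sum_i exp(-eps * sum_j |z_i - z_j|) / ((n-2i+2)(n-2i)), i = 1..n."""
    z = sorted(z)
    n = len(z)
    if n % 2 == 0:
        raise ValueError("sketch handles odd n only; even n needs a limiting procedure")
    total = 0.0
    for i, zi in enumerate(z, start=1):
        s = sum(abs(zi - zj) for zj in z)
        total += math.exp(-eps * s) / ((n - 2 * i + 2) * (n - 2 * i))
    return -2.0 / eps**2 * total


def _simpson(f, a, b, panels=2000):
    """Composite Simpson rule on [a, b] with an even number of panels."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for m in range(1, panels):
        total += (4 if m % 2 else 2) * f(a + m * h)
    return total * h / 3.0


def direct_integral(z, eps, tail=40.0):
    """(1/eps) * int dz exp(-eps * sum_i |z - z_i|), integrated between the kinks."""
    z = sorted(z)
    pts = [z[0] - tail / eps] + z + [z[-1] + tail / eps]

    def f(x):
        return math.exp(-eps * sum(abs(x - zi) for zi in z))

    return sum(_simpson(f, a, b) for a, b in zip(pts, pts[1:]) if b > a) / eps
```

For a single point, `closed_form([0.0], eps)` reduces to $2/\epsilon^2$, which is the elementary integral $\frac{1}{\epsilon}\int dz\, e^{-\epsilon|z|}$.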


3. Naive evaluation. It is worthwhile noting that (4.12) could also be obtained via the following naive calculation:

$$\int_{X_n} \sqrt{g}\, d^4x = 2\pi \int_{\mathbb{R}^3} d^3r\, \sum_j \frac{1}{|\vec r - \vec r_j|} = \sum_{i=0}^{n-1} \vec r_i^{\,2} - \frac{1}{n}\Big(\sum_{i=0}^{n-1} \vec r_i\Big)^2, \tag{4.14}$$

where we use the regularization:

$$\int_{|r - b| < R} d^3r\, \frac{1}{|r - a|} = 2\pi R^2 - \frac{2\pi}{3}|b - a|^2. \tag{4.15}$$

Subtract the quadratic divergence and choose $a$ for translation invariance. Thus, we can take $\sum_{i=0}^{n-1} r_i = 0$. Then $r = (r_0, r_1, \dots, r_{n-1})$ defines an element in the Cartan algebra of $A_{n-1}$. Taking the dot product with the simple roots $\alpha_i = e_i - e_{i-1}$, $i = 0, \dots, n-1$, we see that $\alpha_i \cdot r = \zeta_i$. But that means $r = \sum_{i=1}^{n-1} \zeta_i \lambda_i$, where $\lambda_i$ are the fundamental weights. But $\lambda_i \cdot \lambda_j = C^{ij}$, hence we recover

$$\sum_{i,j=1}^{n-1} \vec\zeta_i\, C^{ij}\, \vec\zeta_j.$$
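The algebra behind this remark can be verified exactly. The sketch below is a hedged illustration with assumed function names, written for one component of $\vec r$ (the vector case follows component by component). It uses the standard closed form for the inverse Cartan matrix of $A_m$, $(C^{-1})_{ij} = \min(i,j)\,(m + 1 - \max(i,j))/(m+1)$, and checks that $\sum_i r_i^2 - \frac{1}{n}(\sum_i r_i)^2$ equals $\sum_{i,j}\zeta_i C^{ij}\zeta_j$ with $\zeta_i = r_i - r_{i-1}$.

```python
from fractions import Fraction


def cartan_matrix(m):
    """Cartan matrix of A_m: 2 on the diagonal, -1 on the adjacent off-diagonals."""
    return [[2 if i == j else (-1 if abs(i - j) == 1 else 0) for j in range(m)]
            for i in range(m)]


def inverse_cartan(m):
    """Inverse Cartan matrix of A_m, i.e. the Gram matrix of fundamental weights."""
    return [[Fraction(min(i, j) * (m + 1 - max(i, j)), m + 1)
             for j in range(1, m + 1)] for i in range(1, m + 1)]


def weight_quadratic_form(r):
    """sum_{i,j} zeta_i C^{ij} zeta_j with zeta_i = r_i - r_{i-1}, i = 1..n-1."""
    n = len(r)
    zeta = [r[i] - r[i - 1] for i in range(1, n)]
    c_inv = inverse_cartan(n - 1)
    return sum(zeta[i] * c_inv[i][j] * zeta[j]
               for i in range(n - 1) for j in range(n - 1))


def centered_square_sum(r):
    """sum_i r_i^2 - (1/n) (sum_i r_i)^2, the right-hand side of (4.14)."""
    n = len(r)
    return sum(x * x for x in r) - Fraction(sum(r) ** 2, n)
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than with a tolerance.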

4. The hyperkähler regularization is the simplest in this case:

$$\int_{X_n(\zeta)} e^{i\vec\epsilon\cdot\vec r}\, \mathrm{vol}_g = \sum_i \frac{e^{i\vec\epsilon\cdot\vec r_i}}{\vec\epsilon^{\,2}}. \tag{4.16}$$

This should be compared to the equivariant volume of a two-sphere:

$$\int_{S^2} e^{-\epsilon H}\, \mathrm{vol}_g = \frac{e^{-\epsilon H_-} - e^{-\epsilon H_+}}{\epsilon}, \tag{4.17}$$

where $H_\pm$ are the maximum/minimum of $H$. In order for the singular terms to be independent of the orientation of $\vec\epsilon$ we must choose $\sum_i \vec r_i = 0$. In this case all regularizations produce the same answer in the $\epsilon^0$ term.

5. The appearance of three dimensional propagators is not surprising in view of the relation between the Higgs branches of $d = 3$, $N = 4$ theories and Coulomb branches of their mirrors [18].

6. One can view ALE spaces as degenerate K3 manifolds. More precisely, near a point where K3 has a singularity of A, D, E type the relevant part of K3 looks like the corresponding ALE space. The volume of K3 is finite as it is a compact manifold. Its variation near the point of degeneration may be studied using our simple integrals without knowledge of the exact metric on K3. Of course, in this case the answer is essentially given by the intersection form, so there is no real simplification. But in the subsequent cases the intersection theory of the quotient is itself unknown or cumbersome.


4.2. Taub–Nut spaces. ALE spaces are special examples of gravitational instantons – they solve the four dimensional Einstein equations in Euclidean signature. Close relatives of ALE spaces are ALF spaces, or Taub–Nut manifolds. The metric on Taub–Nut space is of the same form (4.1) with $V = 1 + \sum_i \frac{1}{|\vec r - \vec r_i|}$. Therefore the naive regularization of the volume will lead to the same result as in the ALE case. Let us check whether this is true for the Kähler or similar regularization. Taub–Nut spaces can also be realized as hyperkähler quotients of a flat hyperkähler space [26]: one starts with $\mathbb{C}^{2n} \oplus \mathbb{R}^3 \times S^1 = \{(z_l, w_l) \mid l = 1, \dots, n\} \oplus \{(t, \vec x)\}$, which is acted on by $U(1)^n$ as follows:

$$\theta_l : (z_l, w_l) \mapsto (e^{i\theta_l} z_l,\, e^{-i\theta_l} w_l); \qquad (t, \vec x) \mapsto \Big(t + \sum_l \theta_l,\, \vec x\Big) \tag{4.18}$$

with the moment maps:

$$\vec\mu_l = \big(|z_l|^2 - |w_l|^2 - x^r,\; z_l w_l - x^c,\; \bar z_l \bar w_l - \bar x^c\big), \tag{4.19}$$

where $\vec x = (x^r, x^c, \bar x^c)$, $x^r \in \mathbb{R}$, $x^c \in \mathbb{C}$. The generic Taub–Nut space $Y_n(\vec\zeta)$ is obtained by imposing the constraints:

$$\vec\mu_l = \vec\zeta_l \tag{4.20}$$

and solving them modulo (4.18). In the case $\zeta_l^c = 0$ the regularizing $U(1)$ action can be chosen to act as follows:

$$e^{i\alpha} : (z_l, w_l) \mapsto (e^{i\alpha} z_l,\, e^{i\alpha} w_l),\ l > 1; \qquad (t, \vec x) \mapsto (t,\, x^r,\, e^{i\alpha} x^c) \tag{4.21}$$

with the Hamiltonian $H$:

$$H = |x^c|^2 + \sum_{l=1}^{n} \big(|z_l|^2 + |w_l|^2\big). \tag{4.22}$$

We see that $H$ does not constrain the $t$, $x^r$ directions. Actually, $t$ is gauged away by (4.18), while $x^r$ is constrained by (4.20). Thus we expect to get a finite answer for the regularized volume. The calculations are similar to (4.5), (4.6) and the result is identical to (4.13).

5. Hitchin Spaces and Bethe Ansatz

5.1. The setup. Consider a $G = U(N)$ gauge theory on a Riemann surface $\Sigma$ of genus $g$. The gauge field $A = A_z\, dz + A_{\bar z}\, d\bar z$ can be thought of (locally) as a $\mathfrak{g}$-valued one-form. The topological sectors in the gauge theory (choices of a bundle $E$) are classified by $c_1(E) = \frac{1}{2\pi i}\int_\Sigma \mathrm{Tr}\, F \in \mathbb{Z}$, a magnetic flux of the $U(1)$ part of the gauge group. If we project the abelian part out in order to study the $SU(N)/\mathbb{Z}_N$ theory, then there is $w_2(E) \in \mathbb{Z}_N$, a discrete magnetic flux, corresponding to $\pi_1(G/U(1))$. Denote the space of all gauge fields in a given topological sector as $\mathcal{A}_p$, $p = \langle w_2(E), [\Sigma]\rangle$. The gauge group $\mathcal{G}$ acts in $\mathcal{A}_p$ in a standard fashion: $A \mapsto A^g = g^{-1} A g + g^{-1} dg$. The cotangent space $T^*\mathcal{A}_p$ is the set of pairs $(A, \Phi)$, where $\Phi$, a Higgs field, is a $\mathfrak{g}$-valued one-form, more precisely a section of the bundle $\mathrm{ad}(E) \otimes \Omega^1(\Sigma)$, $\Phi = \phi_z\, dz + \phi^\dagger_{\bar z}\, d\bar z$.


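The Hitchin equations [29, 30] can be thought of as the hyperkähler moment map equations for the natural action of $\mathcal{G}$ on $T^*\mathcal{A}_p$:

$$\mu^r = F_{z\bar z} + [\phi_z, \phi^\dagger_{\bar z}], \qquad \mu^c = \bar\partial\phi_z + [A_{\bar z}, \phi_z]. \tag{5.1}$$

The quotient $M = \vec\mu^{-1}(0)/\mathcal{G}$ is called a Hitchin space. It is a hyperkähler manifold. By construction, it has a natural $U(1)$-action [30]:

$$e^{i\theta} : (A, \phi_z, \phi^\dagger_{\bar z}) \mapsto (A,\, e^{i\theta}\phi_z,\, e^{-i\theta}\phi^\dagger_{\bar z}). \tag{5.2}$$

This action preserves the form $\omega^r$, descending from

$$\omega^r = \int_\Sigma \mathrm{Tr}\, \big(\delta A \wedge \delta A + \delta\Phi \wedge \delta\Phi\big), \tag{5.3}$$

and is generated by the Hamiltonian $H$, descending from

$$H = \|\Phi\|^2 = \int_\Sigma \mathrm{Tr}\, \phi\phi^\dagger\, d^2z. \tag{5.4}$$

We define the regularized volume of $M$ to be:

$$V(\epsilon) = \int_M \exp(\varpi^r - \epsilon H), \qquad \epsilon > 0. \tag{5.5}$$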

Since $M$ is a quotient by the gauge group, one can rewrite the integral (5.5) as a partition function of a certain two dimensional gauge theory, generalizing the one studied in [21]. In the following we repeat the analysis of Sects. 2 and 3 in a gauge theory language. Recall the field content of the gauge theory computing the volume of the moduli space of flat connections [21]:

$$A,\ \psi_A,\ \phi \tag{5.6}$$

with the nilpotent supercharge $Q_0$:

$$Q_0 A = \psi_A, \qquad Q_0\psi_A = d_A\phi, \qquad Q_0\phi = 0, \tag{5.7}$$

which squares to a gauge transformation $Q_0^2 = L_{V(\phi)}$ generated by $\phi$. The charge $Q_0$ is a scalar, so $\psi_A$ is a fermionic one-form with values in the adjoint, and $\phi$ is a scalar with values in the adjoint (not to be confused with the Higgs one-form!). The action of the theory is

$$S_0 = \int_\Sigma \mathrm{Tr}\, \Big(\phi F + \frac{1}{2}\psi_A\psi_A\Big).$$

In our problem, the original set of fields is to be enlarged, as we start with $T^*\mathcal{A}_p$ rather than $\mathcal{A}_p$. We add

$$\Phi,\ \psi_\Phi \tag{5.8}$$

to take care of that. Now we also impose three conditions rather than the single condition $F = 0$ of [21], so we need more Lagrange multipliers, or actually two more $Q_0$ multiplets:

$$H^c,\ \chi^c;\qquad \bar H^c,\ \bar\chi^c. \tag{5.9}$$


The action of $Q_0$ gets promoted to

$$Q_0\Phi = \psi_\Phi, \quad Q_0\psi_\Phi = [\phi, \Phi], \quad Q_0\chi^c = H^c, \quad Q_0 H^c = [\phi, H^c], \tag{5.10}$$

and $S_0$ generalizes to

$$\tilde S_0 = S_0 + Q(R), \qquad R = \int_\Sigma \mathrm{Tr}\, \big(\Phi\psi_\Phi + \bar\chi^c\mu^c + \text{h.c.}\big). \tag{5.11}$$

The last modification has to do with the fact that we have an extra symmetry in the problem, namely the $U(1)$ of (5.2). Since $Q_0$ is the equivariant derivative it is easy to modify it to take care of (5.2): $Q_\epsilon = Q_0 + \epsilon Q$, where $Q$ acts as follows:

$$QA = Q\psi_A = Q\phi = 0, \quad Q\Phi = 0, \quad Q\chi^c = Q\bar\chi^c = 0,$$
$$Q(\psi_\Phi)_z = \phi_z, \quad Q(\bar\psi_\Phi) = -\phi^\dagger_{\bar z}, \quad QH^c = \chi^c, \quad Q\bar H^c = -\bar\chi^c. \tag{5.12}$$

Finally, the action of our gauge theory assumes the form:

$$S = S_0 + Q_\epsilon(R) \tag{5.13}$$

with $R$ still given by (5.11). There are three ways of evaluating the partition function

$$Z(\epsilon) = \frac{1}{\mathrm{Vol}(\mathcal{G})} \int D\phi\, DA\, D\Phi\, D\psi_A\, D\psi_\Phi\, D^2\vec H\, D^2\chi\; e^{-S}. \tag{5.14}$$

The first and the most direct one consists of integrating out the Lagrange multipliers $\vec H$, thus enforcing the moment map equations and thereby reducing (5.14) to (5.5). The second approach, similar in spirit to [21, 31–33], uses the localization with respect to the action of $\mathcal{G}$, thereby reducing (5.14) to the sum over the fixed points of the action of $\mathcal{T}$, the $T$-valued gauge transformations. This reduces the theory to the abelian one and will lead, as we will see presently, to the Bethe Ansatz equations. The third approach uses the localization of (5.5) onto the fixed points of the action (5.2), studied (for $G = SU(2)$) by Hitchin in [30]. The comparison of the second and the third approaches leads to an interesting set of identities, which we discuss only in one simple situation.


5.2. Localization by the gauge group. Using the $Q_\epsilon$ invariance of our theory we modify the action in such a way that the integral becomes more tractable. We add our favorite mass term and throw away the terms $Q_\epsilon\chi^c\bar\mu^c$. In this way we get a new action

$$\tilde S = S_0 + Q_\epsilon\Big(\int_\Sigma \mathrm{Tr}\, \big(\Phi\psi_\Phi + \bar\chi^c\lambda^c + \text{h.c.}\big)\Big),$$

which is quadratic in $\chi^c$-$\lambda^c$, $\Phi$-$\psi_\Phi$ and essentially quadratic in $A$-$\psi_A$. The evaluation of the partition function

$$\int D(\dots)\, e^{-\tilde S}$$

proceeds as follows. Fix a gauge

$$\phi = \mathrm{diag}(l_1, \dots, l_N), \qquad \sum_k l_k = 0, \tag{5.15}$$

which corresponds to a decomposition of the bundle $E$ into a sum of line bundles

$$E = \bigoplus_k L_k \tag{5.16}$$

with the Chern numbers

$$c_1(L_k), \qquad n_k = \int_\Sigma c_1(L_k), \tag{5.17}$$

which should obey the following selection rule:

$$\sum_k n_k = w_2(E) \bmod N. \tag{5.18}$$

Then the gauge field $A$ decomposes as $A = A^{ab} + A^\perp$, $A^{ab}$ being the $\mathfrak{t} = \mathrm{Lie}(T)$, $T = U(1)^N$ gauge field, and $A^\perp$ the component of $A$ in the orthogonal complement to $\mathfrak{t}$. The action is quadratic in $A^\perp$-$\psi_{A^\perp}$. Integrating out all these multiplets together with the Faddeev–Popov determinant for (5.15) we get the ratio of determinants:

$$\frac{\mathrm{Det}_{\Omega^0\otimes\mathfrak{g}/\mathfrak{t}}(\mathrm{ad}\,\phi)\;\mathrm{Det}_{\Omega^0\otimes\mathfrak{g}}(\epsilon + i\,\mathrm{ad}\,\phi)}{\mathrm{Det}_{\Omega^1\otimes\mathfrak{g}/\mathfrak{t}}(\mathrm{ad}\,\phi)\;\mathrm{Det}_{\Omega^1\otimes\mathfrak{g}}(\epsilon + i\,\mathrm{ad}\,\phi)}. \tag{5.19}$$

Here $\mathfrak{g}$ denotes the bundle $E \otimes E^* - I \equiv \oplus_{i,j} L_i \otimes L_j^{-1} - I$, and $\mathfrak{g}/\mathfrak{t}$ denotes $\oplus_{i\neq j} L_i \otimes L_j^{-1}$. As usual, the determinants naively cancel, but the infinite-dimensionality of the situation makes them cancel only up to the determinants acting on the spaces of zero modes, whose dimension is given by the Riemann–Roch formula:

$$\dim \Omega^0 \otimes L - \dim \Omega^1 \otimes L = c_1(L) + 1 - g \tag{5.20}$$

(we skip the passage to the space of cohomology of the $\bar\partial$ operator as the argument is quite standard). Before we use this fact let us take the integral over the rest of the fields, namely, we have to integrate out $A^{ab}$ as well as the $l_k$'s and $\psi_{A^{ab}}$. The rest of the action is given by

$$\sum_k \int_\Sigma \Big[\, l_k F_k + \frac{1}{2}\psi_k\psi_k \Big],$$


where $\psi_{A^{ab}} = \mathrm{diag}(\psi_1, \dots, \psi_N)$ and $F_k$ is the curvature corresponding to the $k$'th entry of $A^{ab}$. We have (5.17):

$$\int_\Sigma F_k = 2\pi i\, n_k.$$

Although the connection $A_k$ in the bundle with nonvanishing $n_k$ is not a one-form on $\Sigma$, the difference of two such connections $A'_k - A_k$ is a one-form $\alpha_k$. Fix a metric on $\Sigma$ and let $h_k$ be a harmonic representative of the cohomology class of $F_k$. Then, for any connection in this topological sector we may write $F_k = h_k + d\alpha_k$ for some one-form $\alpha_k$, defined up to a gauge transformation $\alpha_k \mapsto \alpha_k + d\beta_k$. Integrating $\alpha_k$, $\psi_k$ out together with the gauge fixing forces $l_k$ to obey $dl_k = 0$ and leaves no determinant. Therefore, our integral reduces to the sum over topological sectors $n_k$ and the integral over the constant diagonal matrices $\phi$ with the measure

$$Z_p(\epsilon) = \sum_{\vec n} \int_{\vec l} \exp\big(2\pi i\, \vec l\cdot\vec n\big)\, \nu_{\vec l}(\epsilon),$$
$$\nu_{\vec l} = \prod_{i\neq j} (l_i - l_j)^{n_i - n_j + 1 - g} \prod_{i,j} \big(\epsilon + i(l_i - l_j)\big)^{n_i - n_j + 1 - g}, \tag{5.21}$$
$$\vec l = (l_1, \dots, l_N), \qquad \vec n = (n_1, \dots, n_N).$$

Both the sum and the integral are restricted: $\sum l = 0$, $\sum n \equiv p \bmod N$. We can relax the condition on the $n$'s by introducing a factor $e^{-2\pi i p \sum l}$ in the integral and dropping the requirement $\sum l = 0$. Finally, we have two options. Either we first take the sum over $\vec n$ or we first take the integral over $\vec l$. The summation over $\vec n$ is easily performed if we notice that (5.21) can be rewritten as

$$Z(\epsilon) = \int D\vec l\; e^{-2\pi i p \sum_k l_k}\, \nu_{\vec l}^{\,2-2g} \sum_{\vec n} \prod_{k=1}^{N} e^{2\pi i n_k \chi_k},$$
$$\chi_k = l_k + \frac{N+1}{2} + \frac{1}{2\pi i} \sum_{j\neq k} \log\frac{\epsilon + i(l_k - l_j)}{\epsilon - i(l_k - l_j)},$$
$$\nu_{\vec l} = \prod_{i<j} l_{ij}^2\,(\epsilon^2 + l_{ij}^2), \qquad l_{ij} = l_i - l_j. \tag{5.22}$$

The sum over $n_k$ leads to the delta function on $\chi_k$ supported at integers, or, in other words, the integral in (5.22) localizes onto the solutions of the equations

$$-e^{2\pi i l_k} = \prod_{j\neq k} \frac{l_k - l_j - i\epsilon}{l_k - l_j + i\epsilon}. \tag{5.23}$$

A look at [16], Eq. (228) and the formula below Eq. (291) there, suggests that this equation is nothing but the Bethe Ansatz Equation for the Non-linear Schrödinger model (NLS)! Under this identification $\epsilon$ maps to the inverse coupling $g^{-1}$ of the NLS Hamiltonian:

$$H^{NLS} = \int dx\, \big(|\partial_x\psi|^2 + g\,(\psi^*)^2(\psi)^2\big). \tag{5.24}$$
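Equation (5.23) is easy to probe numerically. The sketch below is a minimal illustration; the function name and the trial values are assumptions, and it takes (5.23) in the form $-e^{2\pi i l_k} = \prod_{j\neq k}(l_k - l_j - i\epsilon)/(l_k - l_j + i\epsilon)$. It returns the residual of each equation for a trial configuration $\vec l$. For very large $\epsilon$ each factor of the product tends to $-1$, so configurations with $l_k \in \mathbb{Z} + \frac{N}{2}$ should give residuals close to zero.

```python
import cmath
import math


def bethe_residuals(l, eps):
    """Residuals of (5.23): -exp(2 pi i l_k) - prod_{j != k} (l_k-l_j - i eps)/(l_k-l_j + i eps)."""
    residuals = []
    for k, lk in enumerate(l):
        prod = 1.0 + 0.0j
        for j, lj in enumerate(l):
            if j != k:
                d = lk - lj
                prod *= (d - 1j * eps) / (d + 1j * eps)
        residuals.append(-cmath.exp(2j * math.pi * lk) - prod)
    return residuals


def max_residual(l, eps):
    """Largest absolute residual over all k; zero for an exact solution."""
    return max(abs(res) for res in bethe_residuals(l, eps))
```

For example, with $N = 3$ the configuration `[-0.5, 0.5, 1.5]` lies in $\mathbb{Z} + \frac{3}{2}$, and `max_residual([-0.5, 0.5, 1.5], 1e6)` is of order $10^{-5}$, while at $\epsilon$ of order one the same configuration does not solve the equations.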


In the weak coupling limit $g \to 0$, $\epsilon \to \infty$ and (5.23) turns into $l_k \in \mathbb{Z} + \frac{N}{2}$. The second option is to integrate $\vec l$ out keeping $\vec n$ fixed and then to sum over $\vec n$. This approach seems to be equivalent to

5.3. Localization via U(1) action. The fixed points of the $U(1)$ action (5.2) on $M$ necessarily have $\mathrm{Tr}\,\Phi^r = 0$, i.e. they form a submanifold of the nilpotent cone $\mathcal{N} \subset M$. Let $S_\alpha$ be a connected component of the set of fixed points. As explained in [30] the bundle $E$ splits into a sum of line bundles: $E = \oplus L_k$. The contribution of $S_\alpha$ is the standard integral

$$Z_\alpha = e^{-\epsilon d_\alpha} \int_{S_\alpha} \frac{e^{\varpi^r}}{\mathrm{Eu}_\epsilon(N_\alpha)},$$

where $d_\alpha$ is the value of $\|\Phi\|^2$ on $S_\alpha$ (it is easy to express it in terms of $c_1(L_k)$), $N_\alpha$ is the normal bundle to $S_\alpha$ and $\mathrm{Eu}_\epsilon$ is its equivariant Euler class. By summing up the $Z_\alpha$ we get another expression for (5.22), thus establishing a curious identity obeyed by solutions of the Bethe Ansatz equations (5.23). We hope to elaborate on this point in future publications.

6. Volumes of the Moduli Spaces of Instantons

6.1. Gauge theory approach. Let $X$ be a hyperkähler manifold of dimension $4l$ and let $\mathcal{A}$ denote the space of gauge fields in some principal $G$-bundle $E$ over $X$. Let $\vec\omega$ be a triple of Kähler forms on $X$ and $\vec\pi$ be the corresponding triple of Poisson structures. Then $\mathcal{A}$ carries the following (formal) Kähler forms:

$$\vec\Omega = \int_X \langle\vec\pi, \mathrm{Tr}\,\delta A \wedge \delta A\rangle\, d\mathrm{vol}_g, \tag{6.1}$$

where $d\mathrm{vol}_g$ is the metric volume form and $\langle\,,\rangle$ is the usual contraction. They are invariant under the gauge group action:

$$A \to A^g = g^{-1} dg + g^{-1} A g, \tag{6.2}$$

and the hyperkähler moment map is

$$\vec\mu = \langle\vec\pi, F\rangle. \tag{6.3}$$

The hyperkähler reduction of the space $\mathcal{A}$ by the action of the gauge group $\mathcal{K}$ produces a manifold $M$ which is infinite-dimensional unless $l = 1$. Suppose $X$ is a compact four-dimensional hyperkähler manifold, i.e. K3 or $T^4$. Then the hyperkähler reduction of the space of gauge fields produces a finite-dimensional hyperkähler space, which is nothing but the moduli space $M_k$ of instantons on $X$ of a given instanton charge $k$. This space possesses a natural compactification and has finite volume. It depends on the volume of $X$. In fact, one can deduce from [34] that

$$\mathrm{Vol}(M_k) = \frac{1}{N!}\,\mathrm{Vol}(X)^N, \tag{6.4}$$


where $\mathrm{Vol}(X)$ is computed with the help of the Kähler metric and $N$ is the dimension of $M_k$. As mentioned above, one of the motivations for this work is the desire to understand the QFT formula for the number of “four-dimensional conformal blocks” [4]:

$$\sum_i (-1)^i \dim H^i\big(M^+(c_1, c_2); L_\omega\big) = \int_{M^+(c_1, c_2)} \mathrm{ch}(L_\omega)\, \mathrm{Td}(T^{1,0}M^+). \tag{6.5}$$

This is related to the gauged $WZW_4$ theory. Let us rescale the form $\omega = k\omega_0$ and take the limit $k \to \infty$. In that case (6.5) becomes $k^{\dim M^+/2}\, \mathrm{Vol}_{\omega_0}(M^+)$. In the $WZW_2$ theory this leads to expressions for the symplectic volume of the moduli space of flat connections in terms of $\zeta$-functions [21]. In the $WZW_4$ theory from the path integral expression we obtain:

$$Z_\omega^{GWZW_4} \to k^{\dim M^+/2} \int \frac{[dA\, d\psi\, d\chi\, dH\, d\varphi]}{\mathrm{vol}\,\mathcal{G}} \exp\Big\{ -\frac{i}{4\pi} \int_{X_4} \mathrm{Tr}\, \big[ \bar H^{0,2} F^{2,0} + \varphi\,\omega_0 \wedge F^{1,1} + H^{2,0} F^{0,2} + \omega_0\, \psi \wedge \bar\psi + \bar\chi^{0,2} D_A\psi + \chi^{2,0} \bar D_A\bar\psi \big] \Big\}. \tag{6.6}$$

This theory has been considered in [35, 36] (where it was called holomorphic Yang–Mills theory). On a hyperkähler manifold we can identify the $(2, 0)$, $(0, 2)$ Donaldson fermions with $\eta$:

$$\chi^{2,0} = \omega^c \eta, \qquad \chi^{0,2} = \bar\omega^c \bar\eta. \tag{6.7}$$

Integrating out $\psi$, $\bar\psi$ from the HYM Lagrangian we produce all parts of the action for computing volumes described above. If the manifold $X$ is non-compact it is necessary to regularize the volume to get a well-defined result. One natural regularization uses the Kähler potential [37]:

$$\epsilon\kappa \to \epsilon \int_{X_4} \kappa(x^\mu)\, \mathrm{Tr}\, F^2, \tag{6.8}$$

where $\kappa(x^\mu)$ is the Kähler potential of $X$, equal to $\sum_i |\vec r - \vec r_i|$ for an ALE space. Now suppose that $X$ is $\mathbb{R}^4$ or an ALE manifold. The theory with (6.8) added to the action reduces, upon localization to the moduli space of instantons, to the “Kähler regularization” used in this paper. In the case where $X$ is an ALE manifold there is another natural term to add to the Lagrangian:

$$\int \kappa_0(x)\, \mathrm{Tr}\, F \wedge F + \lim_{r\to\infty} \int_{S^3_r} \iota\Big(\frac{\partial}{\partial r}\Big)\big[\omega\, \mathrm{Tr}(\epsilon F)\big],$$

where $\kappa_0(x)$ is the Kähler potential of $X$ and $\epsilon$ is a generic diagonal matrix. As shown by Nakajima, this generates a torus action on the moduli space of instantons on $X_n(\vec\zeta)$. The fixed points are the fully reducible connections, i.e., the theory can be abelianized. On the other hand, the moduli spaces of instantons on $\mathbb{R}^4$ and ALE manifolds can be described using the ADHM construction. The latter realizes the moduli space as a finite-dimensional hyperkähler quotient. We may therefore apply our techniques directly in the finite-dimensional case. We shall do so in the next section.


6.2. Volumes of ADHM moduli spaces. In the case where the space $X$ is non-compact the moduli space of instantons on $X$ might have a nice finite-dimensional description. For example, the instantons on $\mathbb{R}^4$ with the gauge group $U(k)$ are described by the hyperkähler quotient with respect to the group $K = U(N)$ of the space $V \oplus V^*$, where $V = \mathfrak{k} \oplus \mathbb{C}^N \otimes \mathbb{C}^k$, with $\mathbb{C}^N$ being the fundamental representation of $K$. The FI term $\zeta$ vanishes in this case. We can consider a slightly deformed space with $\zeta^r \neq 0$. Applying (3.7) we arrive at the formula for the regularized volume:

$$\mathrm{Vol}_{N,k}(\zeta^r, \epsilon) = \frac{1}{\epsilon^{2N}} \int \prod_{l=1}^{N} \frac{e^{i\zeta^r\phi_l}\, d\phi_l}{\big(\phi_l(\phi_l + 2i\epsilon)\big)^k} \prod_{i<j} \frac{\phi_{ij}^2\,(\phi_{ij}^2 + 4\epsilon^2)}{(\epsilon^2 + \phi_{ij}^2)^2}. \tag{6.9}$$

It is interesting to study this integral in the large $N$ limit. We can rewrite it (after a shift $\phi_l \to \phi_l + i\epsilon$) as a partition function of a gas of $N$ particles on a line with the pair-wise repulsive potential

$$V^{rep}(x) = \log\Big[\Big(1 + \frac{\epsilon^2}{x^2}\Big)\Big(1 - \frac{3\epsilon^2}{x^2 + 4\epsilon^2}\Big)\Big] \tag{6.10}$$

in the external potential, attracting to the origin in the limit $\zeta^r = 0$:

$$V^{ext}(x) = k\log(\epsilon^2 + x^2). \tag{6.11}$$

Assuming the existence of the density of the eigenvalues in the large $N$ limit we get the following equation for the critical point:

$$\sum_{i<j} V^{rep}(\phi_{ij})' = -V^{ext}(\phi_i)' \ \Rightarrow\ \mathrm{v.p.}\!\int \rho(x)\, dx\, V^{rep}(x - y)' = \frac{2k}{N}\, \frac{y}{y^2 + \epsilon^2}. \tag{6.12}$$

This equation can be solved in a standard way using the Fourier transform. It seems that the non-trivial limit exists only if $N$ and $k$ are taken to infinity with $k/N$ fixed.

The moduli space of instantons on $\mathbb{R}^4$ is acted on by the rotation group $Spin(4) = SU(2)_L \times SU(2)_R$. The Cartan subalgebra of its Lie algebra is two dimensional and is spanned by two elements $(\epsilon_1, \epsilon_2)$ which generate rotations in two orthogonal planes in $\mathbb{R}^4$. The corresponding equivariant volume is a slight modification of (6.9):

$$\mathrm{Vol}_{\epsilon_1,\epsilon_2}(\zeta^r) = \frac{(\epsilon_1 + \epsilon_2)^N}{\epsilon_1^N \epsilon_2^N} \int \prod_{l=1}^{N} \frac{e^{i\zeta^r\phi_l}\, d\phi_l}{\big(\phi_l(\phi_l + \epsilon_1 + \epsilon_2)\big)^k} \prod_{i\neq j} \frac{\phi_{ij}\,(\phi_{ij} + \epsilon_1 + \epsilon_2)}{(\phi_{ij} + \epsilon_1)(\phi_{ij} + \epsilon_2)}. \tag{6.13}$$

Remark. In the case $k = 1$, $\zeta \neq 0$ the hyperkähler quotient coincides with the Hilbert scheme $(\mathbb{C}^2)^{[N]}$ of points on $\mathbb{C}^2$. The integral (6.13) can be further evaluated by residues, which turn out to be enumerated by all Young tableaux $Y$ of length $N$, and the contribution of a given Young tableau is computed as follows. The tableau encodes the partition $N = \nu_1 + \dots + \nu_l$ with the condition that $\nu_1 \geq \nu_2 \geq \dots \geq \nu_l$. Let us enumerate the boxes in the tableau by the pairs of integers $(m, n)$, where $1 \leq m \leq l$, $1 \leq n \leq \nu_m$. Then the integrand in (6.13) has a pole at $\phi_{m,n} = -\epsilon_1(m - 1) - \epsilon_2(n - 1)$. Moreover the residue can be rather easily evaluated using the results of [5].
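The combinatorial labelling of the poles can be made concrete. The sketch below is an illustration only: the function names are assumptions and no residues are computed. It enumerates the partitions $\nu_1 \geq \nu_2 \geq \dots \geq \nu_l$ of $N$ and lists, for each box $(m, n)$ of the corresponding tableau, the pole position $\phi_{m,n} = -\epsilon_1(m-1) - \epsilon_2(n-1)$ stated in the remark.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples (nu_1, ..., nu_l)."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def pole_positions(partition, eps1, eps2):
    """Pole phi_{m,n} = -eps1 (m-1) - eps2 (n-1) for every box (m, n) of the tableau.

    Boxes are ordered row by row: m = 1..l and, within row m, n = 1..nu_m.
    """
    return [-(eps1 * (m - 1)) - eps2 * (n - 1)
            for m, nu in enumerate(partition, start=1)
            for n in range(1, nu + 1)]
```

For $N = 4$ this yields five tableaux, and each tableau contributes exactly $N$ pole positions, one per integration variable $\phi_l$.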


Another example where the moduli space of instantons is given by the hyperkähler quotient is the instantons on ALE spaces. For concreteness we consider the $U(k)$ instantons on the $A_{n-1}$ ALE space. The topology of the gauge bundle is specified by the vectors $\vec w$ and $\vec v$ which enter the Kronheimer–Nakajima construction [38]. We present the result.

$$\mathrm{Vol}_{n;k,\vec w,\vec v}(\vec\zeta, \epsilon) = e^{-\epsilon\sum_i \zeta_i v_i} \int \prod_{i=1}^{n} \prod_{\alpha=1}^{v_i} \left[ \frac{d\phi_i^\alpha\, e^{i\zeta_i\phi_i^\alpha}}{\big(\epsilon^2 + (\phi_i^\alpha)^2\big)^{w_i}}\; \frac{\prod_{\alpha<\beta} (\phi_i^{\alpha\beta})^2 \big((\phi_i^{\alpha\beta})^2 + 4\epsilon^2\big)}{\prod_{\beta=1}^{v_{i+1}} \big((\phi_i^\alpha - \phi_{i+1}^\beta)^2 + \epsilon^2\big)} \right], \tag{6.14}$$

where $\phi_i^{\alpha\beta} = \phi_i^\alpha - \phi_i^\beta$.

7. Regularized Volumes of Quiver Varieties

The formulae (6.14), (6.9) can be readily generalized to cover the case of the general quiver variety (again with the restriction $\zeta^c = 0$) [38, 2]. So assume that a quiver $\Gamma$ is given. Let $\Omega$ denote the set of its oriented edges. There is an involution $\iota : \Omega \to \Omega$ which sends an oriented edge to the same edge of opposite orientation. Let $Vert$ be the set of its vertices $v$. There are two maps $s, t : \Omega \to Vert$, assigning to an edge its beginning (source) and its end (target). To each vertex a hermitian vector space $L_v$, and to every element $\omega$ of $\Omega$ a vector space $H_\omega = \mathrm{Hom}(L_{s(\omega)}, L_{t(\omega)})$, are assigned. Let $l_v = \dim L_v$. The space

$$V_\Gamma = \bigoplus_{\omega\in\Omega} H_\omega \tag{7.1}$$

is naturally acted on by the group

$$G_\Gamma = \times_v\, U(L_v) \tag{7.2}$$

and this action preserves the hyperkähler structure on $V_\Gamma$. The quiver variety $X_\Gamma(V, \vec\zeta)$ is defined for $V \subset Vert$, $\vec\zeta : V \to \mathbb{R}^3$. It is the hyperkähler quotient of $V_\Gamma$ with respect to the subgroup $G_{V,\Gamma}$ of (7.2):

$$G_{V,\Gamma} = \times_{v\in V}\, U(L_v). \tag{7.3}$$

Let

$$l_v^\vee = \sum_{\omega\,:\, s(\omega) = v;\ t(\omega)\in Vert\setminus V} l_{t(\omega)}. \tag{7.4}$$

Applying (3.7) we arrive immediately at: P

r

Vol (X0 (V , ζE )) = e− v ζv lv  Z YY lv  eiζvr φvα dφvα  ·  ( 2 + (φ α )2 )lv∨ Q v v∈V α=1

Q

αβ 2 lv ≥β>α (φv )

ω∈:s(ω)=v,t (ω)∈V

Qlt (ω)

β=1

2

αβ + (φv )2



β

2 + (φvα − φt (ω) )2

  1  . 2

(7.5) This example covers a lot of interesting hyperkähler manifolds, complex coadjoint orbits of GLN (C) among them, for example T ∗ CPN with Calabi metric and deformed cotangent bundles to generalized flag varieties (see for example [2]).


Acknowledgements. We would like to thank A. Losev for many useful remarks and discussions. We also thank N. Hitchin and V. Guillemin for an interesting discussion. The research of G. Moore is supported by DOE grant DE-FG02-92ER40704, and by a Presidential Young Investigator Award; that of S. Shatashvili, by DOE grant DE-FG02-92ER40704, by an NSF CAREER award and by an OJI award from DOE and by the Alfred P. Sloan Foundation. The research of N. Nekrasov was supported by the Harvard Society of Fellows, partially by the NSF under grant PHY-92-18167, partially by RFFI under grant 96-02-18046 and partially by grant 96-15-96455 for scientific schools.

References

1. Vafa, C., Witten, E.: A strong coupling test of S-duality. Nucl. Phys. B 431, 3–77 (1994)
2. Nakajima, H.: Homology of moduli spaces of instantons on ALE Spaces. I. J. Diff. Geom. 40, (1990) 105; Instantons on ALE spaces, quiver varieties, and Kac-Moody algebras. Duke. Math. J. 76, (1994) 365–416; Gauge theory on resolutions of simple singularities and affine Lie algebras. Inter. Math. Res. Notices 61–74 (1994)
3. Grojnowski, I.: Instantons and affine algebras I: The Hilbert scheme and vertex operators. alg-geom/9506020
4. Losev, A., Moore, G., Nekrasov, N., Shatashvili, S.: Four-Dimensional Avatars of 2D RCFT. hep-th/9509151, Nucl. Phys. Proc. Suppl. 46, 130–145 (1996)
5. Nakajima, H.: Heisenberg algebra and Hilbert schemes of points on projective surfaces. alg-geom/9507012; Lectures on Hilbert schemes of points on surfaces. H. Nakajima’s homepage
6. Losev, A., Moore, G., Nekrasov, N., Shatashvili, S.: Central Extensions of Gauge Groups Revisited. hep-th/9511185
7. Losev, A., Moore, G., Nekrasov, N., Shatashvili, S.: Chiral Lagrangians, Anomalies, Supersymmetry, and Holomorphy. Nucl. Phys. B 484, 196–222 (1997), hep-th/9606082
8. Vafa, C.: Instantons on D-branes. hep-th/9512078, Nucl. Phys. B 463, 435–442 (1996)
9. Harvey, J.A., Moore, G.: On the algebras of BPS states. hep-th/9609017
10. Aharony, O., Berkooz, M., Seiberg, N.: Light-cone description of (2, 0) superconformal theories in six dimensions. hep-th/9712117
11. Harvey, J.A., Strominger, A.: The heterotic string is a soliton. hep-th/9504047
12. Sen, A.: String-String Duality Conjecture In Six Dimensions And Charged Solitonic Strings. hep-th/9504027
13. Douglas, M.R.: Enhanced Gauge Symmetry in M(atrix) Theory. hep-th/9612126
14. Donagi, R., Witten, E.: Supersymmetric Yang–Mills Theory and Integrable Systems. hep-th/9510101, Nucl. Phys. B 460, 299–334 (1996)
15. Donagi, R.Y.: Seiberg-Witten integrable systems. alg-geom/9705010
16. Faddeev, L.D.: How algebraic Bethe ansatz works for integrable model. hep-th/9605187
17. Faddeev, L.D.: Algebraic aspects of Bethe ansatz. Int. J. Mod. Phys. A 10, 1845–1878 (1995), hep-th/9404013; The Bethe ansatz. SFB-288-70, Jun 1993. Andrejewski lectures
18. Hitchin, N., Karlhede, A., Lindstrom, U., Rocek, M.: Hyperkähler metrics and supersymmetry. Commun. Math. Phys. 108, 535 (1987)
19. Atiyah, M., Bott, R.: The Moment Map And Equivariant Cohomology. Topology 23, 1–28 (1984)
20. Witten, E.: Introduction to Cohomological Field Theories. In: Lectures at Workshop on Topological Methods in Physics, Trieste, Italy, Jun 11–25, 1990, Int. J. Mod. Phys. A 6, 2775 (1991)
21. Witten, E.: On Quantum gauge theories in two dimensions. Commun. Math. Phys. 141, 153 (1991)
22. For a review, see S. Cordes, G. Moore, and S. Ramgoolam, Lectures on 2D Yang Mills theory, Equivariant Cohomology, and Topological String Theory. Lectures presented at the 1994 Les Houches Summer School Fluctuating Geometries in Statistical Mechanics and Field Theory, and at the Trieste 1994 Spring school on superstrings. hep-th/9411210, or see http://xxx.lanl.gov/lh94
23. Kirwan, F.: Cohomology of quotients in symplectic and algebraic geometry. Math. Notes, Princeton, NJ: Princeton University Press, 1985
24. Park, J.-S.: Monads and D-instantons. hep-th/9612096
25. Givental, A.B.: Equivariant Gromov–Witten Invariants. alg-geom/9603021
26. Gibbons, G., Rychenkova, P.: Hyperkähler quotient construction of BPS Monopole moduli space. hep-th/9608085
27. Kronheimer, P.: The construction of ALE spaces as hyper-kähler quotients. J. Diff. Geom. 28, 665 (1989)
28. Intriligator, K., Seiberg, N.: Mirror Symmetry in Three Dimensional Gauge Theories. hep-th/9607207, Phys. Lett. B 387, 513 (1996)
29. Hitchin, N.: Stable bundles and integrable systems. Duke Math 54, 91–114 (1987)
30. Hitchin, N.: The self-duality equations on a Riemann surface. Proc. London Math. Soc. 55, 59–126 (1987)
31. Blau, M., Thompson, G.: Lectures on 2d Gauge Theories: Topological Aspects and Path Integral Techniques. Presented at the Summer School in High Energy Physics and Cosmology. Trieste, Italy, 14 Jun–30 Jul, 1993, hep-th/9310144
32. Blau, M., Thompson, G.: Derivation of the Verlinde Formula from Chern–Simons Theory and the G/G model. Nucl. Phys. B 408, 345–390 (1993)
33. Gerasimov, A.: Localization in GWZW and Verlinde formula. hep-th/9305090
34. Witten, E.: Supersymmetric Yang–Mills Theory On A Four-Manifold. J. Math. Phys. 35, 5101 (1994)
35. Park, J.-S.: Holomorphic Yang–Mills theory on compact Kähler manifolds. hep-th/9305095; Nucl. Phys. B 423, 559 (1994); J.-S. Park: N = 2 Topological Yang-Mills Theory on Compact Kähler Surfaces. Commun. Math. Phys. 163, 113 (1994); S. Hyun and J.-S. Park: N = 2 Topological Yang-Mills Theories and Donaldson Polynomials. hep-th/9404009
36. Hyun, S., Park, J.-S.: Holomorphic Yang–Mills Theory and Variation of the Donaldson Invariants. hep-th/9503036
37. Maciocia, A.: Metrics on the moduli spaces of instantons over Euclidean 4-Space. Commun. Math. Phys. 135, 467 (1991)
38. Kronheimer, P., Nakajima, H.: Yang–Mills instantons on ALE gravitational instantons. Math. Ann. 288, 263 (1990)

Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 209, 123 – 178 (2000)


On Entropy and Monotonicity for Real Cubic Maps
(with an Appendix by Adrien Douady and Pierrette Sentenac)

John Milnor¹, Charles Tresser²,*

¹ Institute for Mathematical Sciences, SUNY, Stony Brook, NY 11794-3651, USA. E-mail: [email protected]

² I.B.M., P.O. Box 218, Yorktown Heights, NY 10598, USA. E-mail: [email protected]

Received: 16 November 1998 / Accepted: 2 August 1999

Abstract: Consider real cubic maps of the interval onto itself, either with positive or with negative leading coefficient. This paper completes the proof of the “monotonicity conjecture”, which asserts that each locus of constant topological entropy in parameter space is a connected set. The proof makes essential use of the thesis of Christopher Heckman, and is based on the study of “bones” in the parameter triangle as defined by Tresser and R. MacKay.

Contents

1. Introduction
2. Piecewise Monotone Maps and Kneading Theory
3. Parametrization of Polynomials
4. Topological Entropy and Periodic Orbits
5. The Stunted Sawtooth Family
6. Contractibility of Isentropes for the Stunted Sawtooth Family
7. Bones in $P^2$
8. Monotonicity, Intersections of Bones, and the n-Skeleton
9. From Connected Bones to Connected Isentropes
Appendix A: Characterization of a Polynomial by its Critical Values
Appendix B: Tight Symbol Sequences and Thurston’s Theorem
Appendix C: Monotonicity vs Antimonotonicity
References

* Partially supported by NSF grant DMS-97-04867.


1. Introduction

Consider a continuous map $f$ from a closed interval $I$ to itself. In the simplest cases, $I$ is the union (in a not necessarily unique way) of sub-intervals on which $f$ is monotone. The minimal number of sub-intervals required (called the lap number) is a rough measure of the complexity of the map $f$. If we now think of $f$ as generating a dynamical system, a quantitative description of the dynamic complexity of $f$ is obtained by measuring the rate of exponential growth of the lap numbers of the successive iterates $f^{\circ 1} = f$, $f^{\circ 2} = f \circ f$, $\ldots$ of the map $f$. This growth rate is clearly invariant under a continuous change of coordinate, and turns out to be equal to a measure of dynamic complexity used in a much more general context and known as the topological entropy $h(f)$.

Maps on the interval provide the simplest examples to illustrate the problem of understanding how dynamic complexity evolves under deformations of a dynamical system. However, even the basic problem of comparing the topological entropies of two smooth interval maps which are close to each other is only partly understood. The best results so far have been obtained in the more specific case of polynomial maps, where one can use complex analysis methods in the study of the associated complex polynomial maps.

The theory is most complete in the particular case of the family of quadratic maps $Q_v(x) = 4vx(1-x)$, parametrized by the critical value $0 \le v \le 1$. It has been known for some time that the topological entropy $h(Q_v)$ is a (non-strictly) increasing continuous function of the parameter $v$, with $h(Q_0) = 0$ and $h(Q_1) = \log 2$. All known proofs of the monotonicity of topological entropy for quadratic maps use complex analytic methods. (Compare [D, DH2, MvS2, MTh].¹) The particular proof which we will generalize is based on the following ideas.
These maps $Q_v$ have a single critical point $c = \tfrac{1}{2}$ for $v \ne 0$, and there are a countable infinity of values of $v \in (0, 1]$ such that $c$ belongs to a periodic orbit. The restriction of $Q_v$ to any periodic orbit $x_1 < \cdots < x_p$ can be described combinatorially by the cyclic permutation $o$ of $\{1, \ldots, p\}$ which we call its order type, defined by the property that $Q_v(x_i) = x_{o(i)}$. Thurston showed that each order type for a periodic critical orbit which occurs in this way for some $Q_v$ occurs for a single value of $v$ only. (Compare Sect. 8 and Appendix B.) This fact is used in [MTh] to prove monotonicity.

It has been recognized for a long time that for each quadratic map with a periodic critical orbit of period $p$ there exists a countable infinity of quadratic maps with the same topological entropy and with a critical orbit whose period is some multiple of $p$. Thus the monotonicity of the correspondence $v \mapsto h(Q_v)$ cannot be strict. On the other hand, it has been proved more recently that any value of the topological entropy in $[0, \log 2]$ which cannot be achieved by a map having a periodic critical orbit is realized for a single value of $v$ ([GS, L2]). This result is a special case of the Generic Hyperbolicity Conjecture, which states that every rational map can be approximated arbitrarily closely by a rational map (real if the original map is real, and a polynomial if the original is a polynomial) such that the orbits of all critical points converge to attracting periodic orbits. (For an early version of this conjecture, see Fatou [F].)

One would like to understand as much as possible of the global bifurcation theory for polynomials of higher degree. But there, even the relevant concepts are harder to isolate because families of degree $d$ polynomials depend naturally on $d - 1$ parameters. (In fact we will usually work with the integer $m = d - 1$.)
For each fixed degree $d$ and for each fixed sign $\pm 1$ we consider the family of all degree $d$ polynomial maps from the interval $I = [a, b]$ to itself, with all $d - 1$ critical points in $I$, which send the boundary $\partial I = \{a, b\}$ into itself, and with leading coefficient of specified sign, compactified for $d$ even by the constant map with the same boundary behavior. (Compare Figs. 5, 6.) For these families it has been proposed to generalize the monotonicity property of the topological entropy when $d = 2$ to the connectedness of the topological entropy level sets or isentropes. Numerical computations in the case when $d = 3$ have suggested the Connected Isentrope Conjecture according to which all isentropes are connected [M1]. (Compare the earlier report [DGMT]. In fact, for any $d$, one can ask the sharper question as to whether isentropes are contractible, or cellular.)

¹ In contrast with these older proofs, a recent proof by M. Tsujii does not depend on holomorphic dynamics, but does depend on complex analysis [Ts]. Notice that Tsujii indicates that his proof can be understood as a local version of the argument in [MTh].

In the degree 3 case, the “bones”, or sets of parameters such that one critical point is periodic with a specified order type, are no longer points, as when $d = 2$, but are smooth curves. We conjectured [DGMT] that these bones cannot have any connected component which is a simple closed curve or “bone-loop”. (It follows that every bone is a connected simple arc, except in a few exceptional cases with very low periods.) Furthermore, we sketched a proof of the following implications:

\[ \text{Generic Hyperbolicity for Real Cubic Maps} \;\Longrightarrow\; \text{No Bone-Loops} \;\Longrightarrow\; \text{Isentropes are Connected.} \]

The generic hyperbolicity question remains open. However, C. Heckman [He] has been able to prove a weaker version which is enough to show that there are no bone-loops. The present paper will assume Heckman’s result, and provide a more detailed exposition for the last implication. Thus, assuming that there are no bone-loops, we will prove that isentropes, in either of the two families of real cubic maps, are indeed connected. (In fact we will derive the slightly more precise statement that isentropes are cellular sets. However, even assuming generic hyperbolicity, we do not know whether isentropes are contractible.
Compare [FT] which shows, for the family of analytic circle maps $\theta \mapsto \theta + a + b\sin(\theta) \pmod{2\pi}$, that the zero isentrope is not locally connected, with a “comb” structure.)

The paper is organized as follows. Section 2 contains a short discussion of kneading theory, generalized so as to allow maps with “plateaus” or intervals of constancy. Section 3 describes the parametrization of families of polynomial maps of the interval by their critical values. An alternate approach, due to Douady and Sentenac, is given in Appendix A. We discuss in Sect. 4 the main aspects of topological entropy that we need, in particular continuity properties and relations between topological entropy and kneading information. In Sect. 5, we describe families of continuous maps of the interval closely related to kneading theory, that we call stunted sawtooth maps: corresponding to the family of degree $d$ polynomial maps with leading coefficient of specified sign, there is an essentially unique family of stunted sawtooth maps which mimic their behavior. Our basic hope is that most of the essential features of a family of polynomial maps of the interval are more or less faithfully mirrored in the corresponding stunted sawtooth family, where they are much easier to verify. As an example, the Generic Hyperbolicity Conjecture is easily verified for stunted sawtooth families. In Sect. 6, for each family of stunted sawtooth maps, we show that all isentropes are contractible, and therefore connected. This might be a step toward the Connected Isentrope Conjecture, or some more precise form of monotonicity property, for polynomial families of any degree.

The rest of the paper is devoted to the proof that isentropes are connected for the two families of cubic maps. In order to use what we know for sawtooth maps in the study of cubic maps, we recall the definition of bones in Sect. 7, discuss their elementary properties in Sect. 8, and finish the proof in Sect. 9.
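As a rough numerical illustration of the lap-number growth rate described at the start of this introduction, the following Python sketch counts the laps of an iterate by sampling on a grid and estimates the entropy as $\frac{1}{n}\log \ell(f^{\circ n})$. The function names, the grid size, and the choice of the full tent map $x \mapsto 2\min(x, 1-x)$ as test case are assumptions made for this example only. The tent map is used because its iterates are computed exactly in binary floating point on a dyadic grid, and its $n$-th iterate has $2^n$ laps.

```python
import math


def lap_number(f, n, grid_size=1 << 14):
    """Count the laps of the n-fold iterate of f on [0, 1] by grid sampling.

    Assumption: every lap of the iterate is much wider than 1 / grid_size.
    """
    values = []
    for k in range(grid_size + 1):
        x = k / grid_size
        for _ in range(n):
            x = f(x)
        values.append(x)

    laps, previous_sign = 1, 0
    for y0, y1 in zip(values, values[1:]):
        sign = (y1 > y0) - (y1 < y0)
        if sign == 0:
            continue  # flat step: no change of direction recorded
        if previous_sign != 0 and sign != previous_sign:
            laps += 1  # the direction of monotonicity has changed
        previous_sign = sign
    return laps


def entropy_estimate(f, n, grid_size=1 << 14):
    """Finite-n estimate (1/n) * log(lap number of the n-th iterate)."""
    return math.log(lap_number(f, n, grid_size)) / n


def tent(x):
    """Full tent map on [0, 1] (hypothetical test case for this sketch)."""
    return 2.0 * min(x, 1.0 - x)


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        print(n, lap_number(tent, n), entropy_estimate(tent, n))
```

On a finite grid the count is reliable only while the laps of $f^{\circ n}$ are much wider than the grid spacing, so this is a sanity check rather than a method for computing $h(f)$.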


2. Piecewise Monotone Maps and Kneading Theory

(Compare [My, MSS, MTh].) Let $I = [a, b]$ be a closed interval of real numbers. A map $f : I \to I$ will be called piecewise monotone if $I$ can be covered by finitely many closed intervals on which $f$ is monotone (but not necessarily strictly monotone²). Let $\sigma = (\sigma_0, \ldots, \sigma_m)$ be an alternating sequence of $m + 1$ signs. That is, we assume that $\sigma_j = (-1)^j \sigma_0$ with $\sigma_0 = \pm 1$. By an $m$-modal map of shape $\sigma$ will be meant a piecewise monotone map $f : I \to I$ as above, together with a sequence of points

\[ a = c_0 < c_1 < \cdots < c_m < c_{m+1} = b \tag{1} \]

in the interval $I = [a, b]$ satisfying the following condition.³ For each $0 \le j \le m$ the restriction of $f$ to the interval $c_j \le x \le c_{j+1}$ should be either monotone increasing or monotone decreasing according as $\sigma_j$ equals $+1$ or $-1$. Thus the $c_j$ are uniquely determined by $f$ in the special case that $f$ is piecewise strictly monotone, but not in the general case when $c_j$ may belong to an interval of constancy. One also uses terms such as unimodal for an $m$-modal map with $m = 1$, bimodal when $m = 2$, and so on.

We will use the notation $(f, c)$ with $c = (c_1, \ldots, c_m) \in I^m$ for an $m$-modal map when it is important to specify the precise choice of the $c_i$. We will call $c_1, \ldots, c_m$ the folding points of $f$, and their images $v_i = f(c_i)$ the folding values. In the case of a $C^1$-smooth function, note that the first derivative necessarily vanishes at our folding points, but may vanish at other points also. In the important case of $C^2$-smooth maps without critical inflection points, the $c_i$ are precisely the critical points of $f$, and their images $v_i$ are the critical values. However, we introduce a new name for these points in the general case in order to avoid confusion. In that special case, note that $\sigma_j$ can be identified with the sign of the first derivative $f'(x)$ for $c_j < x < c_{j+1}$. (If the second derivative $f''(c_j)$ is non-zero, then $\sigma_j$ can also be identified with the sign of $f''(c_j)$ for $j = 1, \ldots, m$.) When it is necessary to emphasize that some choice is involved (that is, when some of the $c_i$ lie in intervals of constancy) we may refer to the $c_i$ more explicitly as the designated folding points of the $m$-modal map $(f, c)$.

The $m$-tuple $v = (v_1, \ldots, v_m) \in I^m$ will be called the folding value vector for $(f, c)$. Note that

\[ v_j \le v_{j+1} \ \text{ when } \sigma_j = +1, \qquad v_j \ge v_{j+1} \ \text{ when } \sigma_j = -1 \tag{2} \]

for $0 < j < m$. In fact, if we set $v_0 = f(a)$, $v_{m+1} = f(b)$, then this same inequality (2) will be true for $0 \le j \le m$. If strict inequalities hold in (2) for $0 \le j \le m$, or equivalently if $m$ is minimal, then $f$ will be called a strictly $m$-modal map, and $m + 1$ will be called the lap number of $f$, denoted by $\ell(f)$. Thus $\ell(f)$ is the minimum number of intervals of monotonicity needed to cover the interval $I$.

² We will use the term piecewise strictly monotone in the special case where $f$ has no intervals of constancy. However, it will be important to allow limiting cases where $f$ may have intervals of constancy. Because of this there is some choice of appropriate definitions for kneading theory (compare [BMT]), but the following will be convenient for our purposes.

³ We will sometimes need to consider the more general case where

\[ a = c_0 \le c_1 \le \cdots \le c_m \le c_{m+1} = b. \tag{1'} \]

(Compare Sect. 3.) However, this is awkward for kneading theory, and we avoid it in this section.

Fig. 1. Graph of a 3-modal map of shape $(+-+-)$, with $v_1 = v_2 < v_3$. The horizontal axis is marked with the points $a, c_1, c_2, c_3, b$ and the intervals $I_0, I_1, I_2, I_3$ between them. (Depending on the choice of designated folding points within the intervals of constancy, this same map could equally well be considered as strictly unimodal, or as $m$-modal for any odd value of $m$.)
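The bookkeeping behind the shape $\sigma$ and inequality (2) can be made concrete with a short Python sketch. The helper names and the sample folding values are hypothetical (chosen only to mimic the situation $v_1 = v_2 < v_3$ of Fig. 1); the list `v` is assumed to include the boundary values $v_0 = f(a)$ and $v_{m+1} = f(b)$.

```python
def alternating_shape(sigma0, m):
    """Return (sigma_0, ..., sigma_m) with sigma_j = (-1)**j * sigma_0."""
    return [sigma0 * (-1) ** j for j in range(m + 1)]


def satisfies_inequality_2(sigma, v, strict=False):
    """Check inequality (2) for 0 <= j <= m.

    v is the extended vector (v_0, v_1, ..., v_m, v_{m+1}).
    With strict=True, equality between consecutive values is rejected.
    """
    if len(v) != len(sigma) + 1:
        raise ValueError("expected len(v) == len(sigma) + 1")
    for j, s in enumerate(sigma):
        step = s * (v[j + 1] - v[j])  # must be >= 0 on the lap [c_j, c_{j+1}]
        if step < 0 or (strict and step == 0):
            return False
    return True


if __name__ == "__main__":
    sigma = alternating_shape(+1, 3)          # shape (+ - + -)
    v = [0.1, 0.6, 0.6, 0.9, 0.3]             # hypothetical values, v_1 = v_2 < v_3
    print(satisfies_inequality_2(sigma, v))               # weak form of (2)
    print(satisfies_inequality_2(sigma, v, strict=True))  # strict form fails
```

Here the weak inequalities hold but the strict ones do not, which matches the remark in the caption that such a map is not strictly 3-modal.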

Let $f : I \to I$ be an $m$-modal map of shape $\sigma$. Partition the interval $I$ into disjoint subsets

\[ I = I_0 \cup C_1 \cup I_1 \cup C_2 \cup \cdots \cup C_m \cup I_m, \]

where $C_j$ is the set $\{c_j\}$ consisting of a single folding point of $f$, and where the open sets

\[ I_0 = [c_0, c_1), \quad I_1 = (c_1, c_2), \quad \ldots, \quad I_m = (c_m, c_{m+1}] \]

are the connected components of the complement $I \smallsetminus (C_1 \cup \cdots \cup C_m)$. We will also think of the $C_j$ and $I_j$ as abstract symbols, forming the letters of an alphabet

\[ \mathcal{A} = \mathcal{A}(m) = \{I_0, C_1, I_1, C_2, \ldots, C_m, I_m\}. \]

We order the $2m + 1$ symbols in $\mathcal{A}(m)$ as they lie along the real line, so that $I_0 < C_1 < I_1 < C_2 < \cdots < C_m < I_m$. Each point $x \in I$ has a unique address $A(x) \in \mathcal{A}$ defined by the condition that $x \in A(x)$. Note that $A(x) \le A(y)$ whenever $x < y$. Let $\mathcal{A}^{\mathbb{N}}$ be the set of all infinite sequences $(A_0, A_1, \ldots)$ of symbols $A_i \in \mathcal{A}$. Each point $x \in I$ has a well defined itinerary,

\[ \mathcal{I}(x) = \bigl(A(x), A(f(x)), A(f^{\circ 2}(x)), \ldots\bigr) \in \mathcal{A}^{\mathbb{N}}. \]

Note that the map $f$ on the interval $I$ corresponds to the shift map

\[ \mathrm{shift}(A_0, A_1, \ldots) = (A_1, A_2, \ldots) \]

on the space $\mathcal{A}^{\mathbb{N}}$ of symbol sequences, in the sense that the following diagram is commutative:

\[
\begin{array}{ccc}
I & \xrightarrow{\ \mathcal{I}\ } & \mathcal{A}^{\mathbb{N}} \\
{\scriptstyle f}\,\big\downarrow & & \big\downarrow\,{\scriptstyle \mathrm{shift}} \\
I & \xrightarrow{\ \mathcal{I}\ } & \mathcal{A}^{\mathbb{N}}
\end{array}
\tag{3}
\]
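The address and itinerary maps, and the commutativity of diagram (3), can be tried out on a concrete example. The Python sketch below uses exact rational arithmetic and, as an assumed test case, the bimodal cubic $f(x) = (4x-1)^2(1-x)$ on $[0, 1]$ with folding points $1/4$ and $3/4$. The function names and the string encoding of the symbols (`"I0"`, `"C1"`, ...) are choices made for this illustration, and only finite truncations of itineraries are computed.

```python
from fractions import Fraction


def address(x, folding_points):
    """Return the symbol C_j or I_j containing x, encoded as a string.

    folding_points is the increasing list (c_1, ..., c_m).
    """
    below = 0
    for j, c in enumerate(folding_points, start=1):
        if x == c:
            return f"C{j}"
        if x > c:
            below = j  # c_j lies strictly to the left of x
    return f"I{below}"


def itinerary(f, x, folding_points, length):
    """First `length` symbols of the itinerary of x under f."""
    symbols = []
    for _ in range(length):
        symbols.append(address(x, folding_points))
        x = f(x)
    return symbols


def cubic(x):
    """Assumed example map of shape (- + -) on [0, 1]."""
    return (4 * x - 1) ** 2 * (1 - x)


if __name__ == "__main__":
    c = [Fraction(1, 4), Fraction(3, 4)]
    print(itinerary(cubic, Fraction(1, 4), c, 5))
    x = Fraction(1, 3)
    # Diagram (3): the itinerary of f(x) is the shift of the itinerary of x.
    print(itinerary(cubic, cubic(x), c, 4) == itinerary(cubic, x, c, 5)[1:])
```

Exact fractions are used so that the test `x == c` for a folding point is meaningful; with floating point numbers an orbit would almost never be detected as landing on a folding point.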


Some aspects of the structure of the orbit of $x$ reflect in an obvious way in the itinerary. For instance, if $x$ belongs to a periodic orbit, its itinerary must be periodic, and since the only periodic orbits of monotone maps have period 1 or 2, the period of the itinerary of $x$ is either the period of the orbit or half of it. This periodic itinerary (or any shift of it) will be called the kneading type of the periodic orbit.

For each fixed shape $\sigma$, we introduce a partial ordering of $\mathcal{A}^{\mathbb{N}}$ as follows. Define the sign function $\epsilon : \mathcal{A} \to \{-1, 0, 1\}$ by the formula $\epsilon(I_j) = \sigma_j$ and $\epsilon(C_j) = 0$. (In the special case of a $C^1$-map whose derivative vanishes only at the folding points, note that $\epsilon(A(x))$ can be identified with the sign of the derivative $f'(x)$.) By definition, $(A_0, A_1, \ldots) < (B_0, B_1, \ldots)$ if and only if there is an index $k \ge 0$ so that $A_i = B_i$ for $i < k$, but

\[ A_k < B_k \ \text{ if the product } \epsilon(A_0)\,\epsilon(A_1) \cdots \epsilon(A_{k-1}) \text{ is equal to } +1, \qquad A_k > B_k \ \text{ if this product equals } -1. \]

(If the product is zero, or in other words if some $A_i = B_i$ with $i < k$ is equal to one of the folding point sets $C_1, \ldots, C_m$, then the ordering is not defined. However, when comparing two different itineraries for the same map, this case occurs only when the two symbol sequences are identically equal.) It is not difficult to check that $\mathcal{I}(x) \le \mathcal{I}(y)$ whenever $x \le y$. Note that there is a unique smallest element $\mathcal{I}_{\min}$ and a unique largest element $\mathcal{I}_{\max}$ in the space $\mathcal{A}^{\mathbb{N}}$ with this ordering, so that $\mathcal{I}_{\min} \le \mathcal{I} \le \mathcal{I}_{\max}$ for all $\mathcal{I} \in \mathcal{A}^{\mathbb{N}}$. For example:

\[
\mathcal{I}_{\min} =
\begin{cases}
(I_0, I_0, I_0, I_0, \ldots) & \text{if } \sigma = (+ \cdots \pm) \\
(I_0, I_m, I_m, I_m, \ldots) & \text{if } \sigma = (- \cdots +) \\
(I_0, I_m, I_0, I_m, \ldots) & \text{if } \sigma = (- \cdots -).
\end{cases}
\]
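The signed ordering just defined is easy to experiment with on finite truncations. The following Python sketch implements the comparison rule and recovers the three cases of $\mathcal{I}_{\min}$ by brute force over all acritical sequences of a fixed length. The encoding of symbols as pairs `("I", j)` and `("C", j)`, and all function names, are assumptions made for this example.

```python
from functools import cmp_to_key
from itertools import product


def position(symbol):
    """Place of a symbol in the order I_0 < C_1 < I_1 < ... < C_m < I_m."""
    kind, j = symbol
    return 2 * j if kind == "I" else 2 * j - 1


def epsilon(symbol, sigma):
    """Sign function: epsilon(I_j) = sigma_j and epsilon(C_j) = 0."""
    kind, j = symbol
    return sigma[j] if kind == "I" else 0


def compare(seq_a, seq_b, sigma):
    """Return -1, 0 or +1 for the signed order; None when it is undefined."""
    sign = 1  # product epsilon(A_0) ... epsilon(A_{k-1}) over the common prefix
    for a, b in zip(seq_a, seq_b):
        if a != b:
            if sign == 0:
                return None  # a folding point symbol occurred in the prefix
            natural = -1 if position(a) < position(b) else 1
            return sign * natural
        sign *= epsilon(a, sigma)
    return 0


def smallest_acritical(sigma, length):
    """Brute-force minimum over all sequences of symbols I_j of given length."""
    letters = [("I", j) for j in range(len(sigma))]
    return min(product(letters, repeat=length),
               key=cmp_to_key(lambda a, b: compare(a, b, sigma)))


if __name__ == "__main__":
    for sigma in ([1, -1, 1], [-1, 1], [-1, 1, -1]):
        print(sigma, smallest_acritical(sigma, 4))
```

For acritical sequences the order is always defined, so `min` with a comparison key is legitimate here; for sequences containing folding point symbols, `compare` may return `None` and a total ordering should not be expected.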

Sometimes we will need to truncate the infinite sequences in $\mathcal{A}^{\mathbb{N}}$ and consider the space $\mathcal{A}^k$ of finite sequences of some fixed length $k$. There is a completely analogous ordering of such finite sequences.

The itineraries $K_j = \mathcal{I}(f(c_j)) \in \mathcal{A}^{\mathbb{N}}$ of the folding values $f(c_j)$ will play a special role, and will be called the kneading sequences of $f$. To each $m$-modal map $f$ there is associated a vector $K(f) = (K_1, \ldots, K_m) \in (\mathcal{A}^{\mathbb{N}})^m$ of kneading sequences. This vector of kneading sequences, together with the shape $\sigma$, will be called the kneading data for the piecewise monotone map $f$.

The choice of kneading data imposes sharp restrictions on which itineraries can actually occur. Suppose that $\mathcal{I}(x) = (A_0, A_1, \ldots)$. Evidently:

Compatibility Condition 1. If some symbol $A_k$ is equal to $C_j$, then the sequence $(A_{k+1}, A_{k+2}, \ldots)$ which follows it must be equal to the kneading sequence $K_j$.

In view of this condition, it is often convenient to terminate the sequence $(A_0, A_1, \ldots)$ at the first $C_j$ which occurs in it, since the subsequent symbols give no further information.


Compatibility Condition 2. If $A_k = I_{j-1}$ or $A_k = I_j$, then the sequence which follows must satisfy either

\[ (A_{k+1}, A_{k+2}, \ldots) \le K_j \quad \text{or} \quad (A_{k+1}, A_{k+2}, \ldots) \ge K_j \]

according as $\sigma_j = -1$ (so that $c_j$ is a local maximum point) or $\sigma_j = +1$ (so that $c_j$ is a local minimum point).

Definition. The symbol sequence $(A_0, A_1, \ldots) \in \mathcal{A}^{\mathbb{N}}$ will be called admissible for the kneading data $(K_1, \ldots, K_m)$ if it satisfies these two compatibility conditions. (This terminology will be justified in 5.2.)

Remark. Evidently the orbit of the folding point $c_j$ is periodic if and only if the itinerary $\mathcal{I}(c_j) = (C_j, K_j)$ is periodic. Using Condition 1, a completely equivalent condition is that the kneading sequence $K_j$ contains the symbol $C_j$ as one of its entries.

Example 2.1. The map $f(x) = (4x - 1)^2 (1 - x)$ of shape $(-+-)$ on the unit interval has periodic kneading sequences $K_1 = \overline{I_0 I_2}$ and $K_2 = \overline{I_2 I_0}$ (where the overline indicates a sequence which is to be repeated infinitely often), yet the critical points themselves are not periodic. In fact $f$ maps both critical points $1/4$ and $3/4$ to the boundary period two orbit $\{0, 1\}$. On the other hand, for the map $f(x) = 1 - 3x^2 + 2x^3$ of shape $(+-+)$ on the interval bounded by $(1 \pm \sqrt{5})/2$, the critical points $0 \leftrightarrow 1$ are periodic, and hence their itineraries $\mathcal{I}(0) = \overline{C_1 C_2}$ and $\mathcal{I}(1) = \overline{C_2 C_1}$ are also periodic.

In general, one cannot hope that the set of all itineraries for a given map will be completely determined by its kneading data. The first problem is that the kneading data does not usually determine the itineraries of the two endpoints $a, b \in \partial I$. Yet every itinerary $\mathcal{I}(x)$ which actually occurs must certainly satisfy $\mathcal{I}(a) \le \mathcal{I}(x) \le \mathcal{I}(b)$. If these two boundary itineraries have been specified, then setting $K_0 = \mathcal{I}(f(a))$ and $K_{m+1} = \mathcal{I}(f(b))$ we can introduce the following slightly sharper version of Condition 2 for an itinerary $\mathcal{I}(x) = (A_0, A_1, \ldots)$:

Compatibility Condition $2^\sharp$. If $A_k = I_j$, then

\[ K_j \le (A_{k+1}, A_{k+2}, \ldots) \le K_{j+1} \ \text{ whenever } \sigma_j = +1, \]
\[ K_j \ge (A_{k+1}, A_{k+2}, \ldots) \ge K_{j+1} \ \text{ whenever } \sigma_j = -1. \]

However, it is still not true that every sequence satisfying Conditions 1 and $2^\sharp$ necessarily occurs as itinerary for the given $m$-modal map. The following helps to indicate the more serious difficulties.

Example 2.2. Consider the following three families of unimodal maps of shape $(+-)$ on the unit interval, with folding point $1/2$, all parametrized by the folding value $v$:

the tent family: $T_v(x) = 2v\,\mathrm{Min}(x, 1 - x)$,
the quadratic family: $Q_v(x) = 4vx(1 - x)$, and
the family: $S_v(x) = \mathrm{Min}(2x, v, 2 - 2x)$ (compare Sect. 5).
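As a numerical sanity check on these three families, the Python sketch below iterates the folding point $1/2$ three times and measures how far it is from returning. It uses the parameter values $(1+\sqrt{5})/4$ for $T_v$ and $7/8$ for $S_v$, and locates the value near $0.95796$ for $Q_v$ by bisection; these are the values this example goes on to quote for a period 3 folding point orbit. The function names, the bisection bracket $[0.95, 0.97]$ and the tolerances are assumptions of this illustration.

```python
import math


def tent(v, x):
    return 2.0 * v * min(x, 1.0 - x)


def quadratic(v, x):
    return 4.0 * v * x * (1.0 - x)


def stunted(v, x):
    return min(2.0 * x, v, 2.0 - 2.0 * x)


def third_iterate_gap(family, v):
    """Return f_v^{o3}(1/2) - 1/2, which vanishes when 1/2 has period 3 (or 1)."""
    x = 0.5
    for _ in range(3):
        x = family(v, x)
    return x - 0.5


def bisect_root(func, lo, hi, steps=60):
    """Plain bisection, assuming func changes sign between lo and hi."""
    f_lo = func(lo)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(third_iterate_gap(tent, (1.0 + math.sqrt(5.0)) / 4.0))
    print(third_iterate_gap(stunted, 7.0 / 8.0))
    print(bisect_root(lambda v: third_iterate_gap(quadratic, v), 0.95, 0.97))
```

The check only confirms that the folding point returns after three steps; it says nothing about which other itineraries occur, which is the point of the example.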


In each case there is a unique parameter value $v$ so that the orbit of the folding point has period 3, necessarily with kneading sequence $K_1 = \overline{I_1 I_0 C_1}$. (The corresponding parameter values are respectively $v = (1 + \sqrt{5})/4 = 0.80901\cdots$, $v' = 0.95796\cdots$, and $v'' = 7/8 = 0.875$ in the three cases.) However, the itineraries which can actually occur for these three maps are different. For the tent case, the orbit of the folding point is the only period 3 orbit. (The graph of $T_v^{\circ 3}$ touches the diagonal without crossing it at the three points of this orbit.) In the other two cases the corresponding graph definitely crosses the diagonal and must cross back, so there is a period 3 point close to the folding point with itinerary $\overline{I_1 I_1 I_0}$. In the third case, the corresponding graph is horizontal near the folding point, so that there are also infinitely many nearby points with an itinerary $\mathcal{I}(x)$ equal to $I_0 K_1 = I_0\,\overline{I_1 I_0 C_1}$ or $I_1 K_1 = I_1\,\overline{I_1 I_0 C_1}$. Evidently such itineraries can occur only if the map is constant on the entire interval between $x$ and the folding point. Hence they can never occur for maps like $T_v$ or $Q_{v'}$ which are piecewise strictly monotone.

Thus the compatibility conditions are not sufficient to guarantee the existence of itineraries, or even of finite truncations of itineraries which contain folding point symbols. However, the next lemma provides a useful existence statement by working only with finite sequences containing no folding point symbols. Note that the analogues of Conditions 1, 2, and $2^\sharp$ for finite sequences in $\mathcal{A}^k$ make perfect sense. In particular, the concept of “admissibility” for finite sequences also makes sense. A finite or infinite symbol sequence $\{A_i\}$ will be called acritical if the $A_i$ all belong to the smaller alphabet

\[ \mathcal{A}_0 = \mathcal{A}_0(m) = \{I_0, I_1, \ldots, I_m\} \subset \mathcal{A}(m), \]

with no folding point symbols.

Lemma 2.3. Let $f$ be an $m$-modal map, and let $(I_{\alpha_0}, I_{\alpha_1}, \ldots, I_{\alpha_k})$ be a finite sequence of intervals $I_{\alpha_j} \in \mathcal{A}_0(m)$.
There exists an orbit $x_0 \mapsto x_1 \mapsto \cdots \mapsto x_k \mapsto \cdots$ with $x_i \in I_{\alpha_i}$ for all $i \le k$ if and only if this sequence satisfies Condition $2^\sharp$, modified so as to apply to sequences of finite length.

Proof. If there exists such an orbit, then clearly the sequence $(I_{\alpha_0}, \ldots, I_{\alpha_k})$ must satisfy the suitably modified Condition $2^\sharp$. Conversely, we will prove by induction on $k$ that every sequence $(I_{\alpha_0}, I_{\alpha_1}, \ldots, I_{\alpha_k})$ which satisfies this condition can be realized by an orbit of $f$. This statement is certainly true when $k = 0$. Suppose then that it is known for all sequences of shorter length. Suppose also, to fix our ideas, that $\sigma_{\alpha_0} = +1$, so that

\[ K_{\alpha_0}^{(k)} \le (I_{\alpha_1}, I_{\alpha_2}, \ldots, I_{\alpha_k}) \le K_{\alpha_0 + 1}^{(k)} \tag{4} \]

by Condition $2^\sharp$. (Here the superscript $(k)$ indicates truncation to length $k$.) By the induction hypothesis, there exists a point $x_1 \in I$ with itinerary $(I_{\alpha_1}, I_{\alpha_2}, \ldots, I_{\alpha_k}, \ldots)$.

Fig. 2. Boundary anchored maps: graphs illustrating the four possible cases, with shapes $(+ \cdots +)$, $(+ \cdots -)$, $(- \cdots +)$ and $(- \cdots -)$. These maps are $m$-modal with $m$ odd for the middle pictures, and with $m$ even for the end pictures

The proof will now be divided into two cases according as strict inequalities do or do not hold in (4). If strict inequalities hold, then it follows that $v_{\alpha_0} < x_1 < v_{\alpha_0 + 1}$. It then follows from the intermediate value theorem that there exists at least one point $x_0 \in I_{\alpha_0}$ with $f(x_0) = x_1$, so that $x_0$ has the required itinerary. On the other hand, if for example

\[ K_{\alpha_0}^{(k)} = (I_{\alpha_1}, I_{\alpha_2}, \ldots, I_{\alpha_k}), \]

then the point $x_0 = c_{\alpha_0} + \epsilon$ with $\epsilon > 0$ sufficiently small will have the required itinerary, since the $I_{\alpha_j}$ are open subsets of $I$. The remaining cases are completely analogous. □

It is often practical to first consider maps whose boundary behavior is specified in the simplest way. Suppose that $f$ maps the interval $I = [a, b]$ into itself. Again choose some fixed shape $\sigma$.

Definition. The $m$-modal map $f : I \to I$ will be called boundary anchored for the shape $\sigma$ if $f$ maps the boundary $\partial I = \{a, b\}$ into itself by the rule

\[
f(a) = \begin{cases} a & \text{if } \sigma_0 = + \\ b & \text{if } \sigma_0 = -, \end{cases}
\qquad
f(b) = \begin{cases} b & \text{if } \sigma_m = + \\ a & \text{if } \sigma_m = -. \end{cases}
\tag{5}
\]

In the sequel, we will nearly always work with boundary anchored maps, so that the distinction between Conditions 2 and $2^\sharp$ will disappear. For a boundary anchored map, note that the itinerary $\mathcal{I}(a)$ is precisely equal to $\mathcal{I}_{\min}$ and that $\mathcal{I}(b) = \mathcal{I}_{\max}$. It follows that the sequences $K_0 = \mathcal{I}(f(a))$ and $K_{m+1} = \mathcal{I}(f(b))$ are determined by the shape, and that each one is equal to either $\mathcal{I}_{\min}$ or $\mathcal{I}_{\max}$. Lemma 2.3 has the following consequence. (Compare 4.5 below.)

Corollary 2.4. Let $f$ be a boundary anchored $m$-modal map. Then a finite acritical sequence $(I_{\alpha_0}, \ldots, I_{\alpha_k})$ is actually realized by some orbit $x_0 \mapsto \cdots \mapsto x_k$ if and only if it is admissible. Hence, for any $k$, the set of all such sequences in $\mathcal{A}_0^k$ is determined by the kneading data for $f$.

The kneading sequences $K_j$ are themselves itineraries, and hence must satisfy these two compatibility conditions. Evidently they must also satisfy the following.

Compatibility Condition 3 (for kneading data). $K_j \le K_{j+1}$ if $\sigma_j = +1$, and $K_j \ge K_{j+1}$ if $\sigma_j = -1$.
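Rule (5) can be checked directly on the two cubic maps of Example 2.1. The Python sketch below is only an illustration: the helper names and the floating point tolerance are assumptions, and the shape is passed as a list of signs $\pm 1$.

```python
import math


def anchored_boundary_values(sigma, a, b):
    """Formula (5): the values (f(a), f(b)) required for the shape sigma."""
    f_a = a if sigma[0] > 0 else b
    f_b = b if sigma[-1] > 0 else a
    return f_a, f_b


def is_boundary_anchored(f, sigma, a, b, tol=1e-12):
    """Compare f(a), f(b) with the values prescribed by formula (5)."""
    want_a, want_b = anchored_boundary_values(sigma, a, b)
    return abs(f(a) - want_a) <= tol and abs(f(b) - want_b) <= tol


def first_map(x):
    """Example 2.1, shape (- + -) on [0, 1]."""
    return (4.0 * x - 1.0) ** 2 * (1.0 - x)


def second_map(x):
    """Example 2.1, shape (+ - +) on the interval bounded by (1 +- sqrt 5)/2."""
    return 1.0 - 3.0 * x ** 2 + 2.0 * x ** 3


if __name__ == "__main__":
    print(is_boundary_anchored(first_map, [-1, 1, -1], 0.0, 1.0))
    lo, hi = (1.0 - math.sqrt(5.0)) / 2.0, (1.0 + math.sqrt(5.0)) / 2.0
    print(is_boundary_anchored(second_map, [1, -1, 1], lo, hi))
```

For the second map both endpoints are fixed points, since $x = (1 \pm \sqrt{5})/2$ satisfies $x^2 = x + 1$ and hence $2x^3 - 3x^2 - x + 1 = 0$.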


Definition. The kneading data $K = (K_1, \ldots, K_m)$ will be called admissible for the specified shape $\sigma$ if it satisfies these three compatibility conditions.

As we shall see in 5.2, the three compatibility conditions characterize the $m$-tuples of symbolic sequences in $\mathcal{A}(m)^{\mathbb{N}}$ that can actually occur as kneading data for some $m$-modal map. However, if we restrict to polynomial (or analytic) maps, then further restrictions are needed. (Compare [MaT1, p. 179], as well as 2.2 and Appendix B, Example 4.)

3. Parametrization of Polynomials

First consider a polynomial map $f : \mathbb{R} \to \mathbb{R}$ of degree $m + 1$ with distinct real critical points $c_1 < c_2 < \cdots < c_m$. As in the previous section, we can form the critical value vector (= folding value vector) $v = (v_1, \ldots, v_m) \in \mathbb{R}^m$, where $v_i = f(c_i)$. Setting $\sigma_m$ equal to the sign of the leading coefficient, or equivalently the sign of the $(m + 1)$st derivative, and setting $\sigma_i = (-1)^{m-i} \sigma_m$, note the following strict form of (2):

\[ \sigma_i (v_{i+1} - v_i) > 0 \quad \text{for } 1 \le i < m. \tag{2'} \]

Again we will refer to $\sigma = (\sigma_0, \ldots, \sigma_m)$ as the shape. We will first prove the following.

Lemma 3.1. Given any $m$-tuple $(v_1, \ldots, v_m) \in \mathbb{R}^m$ satisfying the inequalities (2'), there exists a polynomial $f$ of degree $m + 1$ with these critical values, listed in order of the corresponding critical points as above. Furthermore $f$ is unique up to precomposition with a positive affine transformation $f(x) \mapsto f(px + q)$ with $p > 0$.

Proof of 3.1. We will give a proof based on complex analysis. Alternatively, a purely real proof may be extracted from [MvS2, p. 120], and a different real proof, due to Douady and Sentenac, is provided in Appendix A.

To begin the construction, start with $m + 1$ copies of the Riemann sphere $\hat{\mathbb{C}}$, labeled as $\{k\} \times \hat{\mathbb{C}}$ for $0 \le k \le m$. Slit the first two copies $\{0\} \times \hat{\mathbb{C}}$ and $\{1\} \times \hat{\mathbb{C}}$ along the real axis from $v_1$ to $\pm\infty$, taking the plus sign or the minus sign according as the sign $\sigma_0$ is $+1$ or $-1$. That is, if $\sigma_0 = +1$ so that $v_1$ is a local maximum, we remove the open interval consisting of real $z$ with $v_1 < z < +\infty$, while if $\sigma_0 = -1$ so that $v_1$ is a local minimum we remove the open interval $-\infty < z < v_1$. Similarly, for each $1 \le k \le m$ we slit both $\{k - 1\} \times \hat{\mathbb{C}}$ and $\{k\} \times \hat{\mathbb{C}}$ from $v_k$ to $\pm\infty$, now choosing the sign according as $v_k$ is a local maximum or minimum. The hypothesis (2') guarantees that these various slits can never meet.

Now, for each $1 \le k \le m$, sew together the pair of corresponding slits in $\{k - 1\} \times \hat{\mathbb{C}}$ and $\{k\} \times \hat{\mathbb{C}}$ so that the top edge of either one is matched with the bottom edge of its mate. The result will be a compact simply connected Riemann surface $S$. (Note that there is a canonical way of assigning a conformal structure to this surface $S$, even at the $m + 1$ ramification points.) Furthermore, the natural projection maps $(k, z) \mapsto z$ from $\{k\} \times \hat{\mathbb{C}}$ to $\hat{\mathbb{C}}$ fit together to yield a holomorphic map $\eta : S \to \hat{\mathbb{C}}$ of degree $m + 1$, with the $v_k$ and the point at infinity as critical values.
By the Uniformization Theorem, $S$ must be conformally isomorphic to the standard Riemann sphere, under some isomorphism $u : \hat{\mathbb{C}} \to S$. (See for example [FK] or [Be].) If we choose the isomorphism $u : \hat{\mathbb{C}} \to S$ in such a way that the points at infinity correspond, then the composition $\eta \circ u : \hat{\mathbb{C}} \to \hat{\mathbb{C}}$ is clearly a polynomial map with the required critical values.

Fig. 3. Three copies of the complex numbers, slit for the construction of a bimodal map of shape $(+-+)$, with critical values $v_1 > v_2$

To recover the real mapping, note that the complex conjugation operations on the various copies $\{k\} \times \hat{\mathbb{C}}$ fit together to yield an anti-holomorphic involution of $S$ whose fixed point set $F$ is the union of the $m + 1$ real axes, with the various slits removed. There is a preferred ordering of $F \smallsetminus \{\infty\}$ so that its intersections with the various $\{k\} \times \hat{\mathbb{C}}$ occur in the order of increasing $k$. Now choose the conformal isomorphism $u : \hat{\mathbb{C}} \to S$ so that $\mathbb{R} \subset \hat{\mathbb{C}}$ corresponds to $F \smallsetminus \{\infty\}$, preserving orientation. In this way, we obtain a real polynomial map $\eta \circ u|_{\mathbb{R}}$, with the specified critical values occurring in the specified order. Evidently the conformal isomorphism $u$, with these restrictions, is unique up to an affine change of coordinates $u(z) \mapsto u(pz + q)$ with $p, q \in \mathbb{R}$ and $p > 0$. □

In practice, we are interested in polynomial maps which carry some closed interval $I = [a, b] \subset \mathbb{R}$ into itself, and which are boundary anchored, so that $\partial I = \{a, b\}$ maps into itself. Let us fix the shape $\sigma$. It is sometimes convenient to set

vm+1 = vm+1 (σ ) = f (b),

where now f (a) and f (b) are to be defined by formula (5). Thus a map g is boundary anchored for the shape σ if and only if it satisfies g(a) = v0 and g(b) = vm+1 . Note that the inequality (20 ) can be sharpened to include v0 and vm+1 : σi (vi+1 − vi ) > 0 for 0 ≤ i ≤ m.

(200 )
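For cubic maps ($m = 2$) the normalization can be carried out explicitly. The Python sketch below treats only the shape $(+-+)$ on $I = [0, 1]$, where (5) gives $v_0 = 0$ and $v_3 = 1$, so that (2'') reads $0 < v_1$, $v_2 < v_1$, $v_2 < 1$. It starts from the cubic $f(x) = A(x^3 - 3x) + B$, whose critical points $\mp 1$ have critical values $v_1 = 2A + B$ and $v_2 = -2A + B$, and then rescales by an affine map $x \mapsto px + q$ found by bisection, in the spirit of the proof of Theorem 3.2 which follows. The function names, the starting cubic and the bisection details are assumptions of this illustration, not the authors' construction.

```python
def solve_increasing(func, target, lo, hi, steps=200):
    """Bisection for func(x) = target, assuming func is increasing on [lo, hi]."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def anchored_cubic(v1, v2):
    """Boundary anchored cubic of shape (+ - +) on [0, 1] with critical values v1 > v2.

    Returns the map g and its critical points (c1, c2).
    """
    if not (0.0 < v1 <= 1.0 and 0.0 <= v2 < 1.0 and v2 < v1):
        raise ValueError("need 0 <= v2 < v1 <= 1 with v1 > 0 and v2 < 1")
    A = (v1 - v2) / 4.0
    B = (v1 + v2) / 2.0

    def f(x):
        # Critical points -1 (local maximum, value v1) and +1 (local minimum, value v2).
        return A * (x ** 3 - 3.0 * x) + B

    lo = -2.0
    while f(lo) > 0.0:          # f is increasing on (-inf, -1]; bracket f(q) = 0
        lo *= 2.0
    q = solve_increasing(f, 0.0, lo, -1.0)

    hi = 2.0
    while f(hi) < 1.0:          # f is increasing on [1, +inf); bracket f(r) = 1
        hi *= 2.0
    r = solve_increasing(f, 1.0, 1.0, hi)

    p = r - q

    def g(x):
        return f(p * x + q)

    return g, ((-1.0 - q) / p, (1.0 - q) / p)


if __name__ == "__main__":
    g, (c1, c2) = anchored_cubic(0.9, 0.2)
    print(g(0.0), g(1.0), c1, c2, g(c1), g(c2))
```

The other cubic shape $(-+-)$ could be handled in the same way after replacing $A$ by $-A$ and exchanging the target values at the two ends.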


Theorem 3.2. Given an m-tuple $(v_1, \dots, v_m) \in I^m$ satisfying the inequalities (2″), there exists one and only one boundary anchored polynomial map g : I → I of degree m + 1 which has distinct critical points a < c1 < · · · < cm < b, with g(ck) = vk.

Proof of 3.2. Let f : R → R be the polynomial whose existence is promised by 3.1. Then we claim that there are unique real numbers p > 0 and q so that the polynomial g(x) = f(px + q) satisfies the additional condition g(∂I) ⊂ ∂I. It is then easy to check that g(I) ⊂ I. Without loss of generality, we may assume that I is the unit interval [0, 1]. Suppose first that σ has the form (+ · · · −). Then f(c1) = v1 is a local maximum value. Evidently f maps the closed interval [−∞, c1] homeomorphically onto the closed interval [−∞, v1]. Since v1 > a = 0, there is a unique point −∞ < q < c1 with f(q) = 0. Similarly, f maps the closed interval [cm, +∞] homeomorphically onto [−∞, vm], reversing orientation. Since vm > a = 0, there is a unique p + q with cm < p + q < +∞ and f(p + q) = 0. The map g(x) = f(px + q) will then have the required properties. The proof for the other three possible shapes is similar. Details will be left to the reader. □

Thus far, we have assumed that our polynomials have distinct critical points; however, it is often convenient to relax this condition in order to obtain a compact parameter space. First consider polynomials f(x) with derivative

$$f'(x) = \kappa (x - c_1) \cdots (x - c_m),$$

where a ≤ c1 ≤ · · · ≤ cm ≤ b, for some real constant κ ≠ 0. The corresponding critical value vector (v1, . . . , vm) then satisfies the weaker inequalities (2), where $\sigma_i = (-1)^{m-i}\,\mathrm{sgn}(\kappa)$. Just as in Lemma 3.1, the polynomial f is uniquely determined, up to precomposition with a positive affine transformation, by its critical value vector, which is required to satisfy only (2). Either the proof above or the proof in Appendix A can be adapted to show this. (Compare the Addendum to Appendix A.) Details will be omitted.

There is a corresponding generalization of 3.2. However, when m is odd, we must now allow for the case of a constant function with f(x) identically equal to a or b. In this limiting case, note that the critical value vector v1 = · · · = vm is still well defined, although we can no longer distinguish m critical points. With this understanding, we have the following:

Theorem 3.3. Given a specified shape σ, and given $(v_1, \dots, v_m) \in I^m$ satisfying (2), there is one and only one boundary anchored polynomial map f : I → I of degree m + 1 having (v1, . . . , vm) as critical value vector.

The proof is quite similar to the proof of 3.2. □

It would be quite natural to parametrize the set of all boundary anchored polynomial maps f : I → I of shape σ by the polyhedron consisting of all vectors $v = (v_1, \dots, v_m) \in I^m$ satisfying the inequalities (2) for the shape σ. However, we will rather work with a slightly different but affinely equivalent polyhedron which is independent of the shape σ, and is invariant under affine reparametrization of the interval I (as used in renormalization theory). As a first attempt in this direction, note that


Fig. 4. The parameters p1 and p2 for a cubic map of shape + − + on the left, and − + − on the right, on an interval of length 1

our interval I is the union of non-overlapping subintervals $I_j$, which map onto intervals $f(I_j) \subset I$. Let

$$\Delta_j = \frac{\mathrm{length}(f(I_j))}{\mathrm{length}(I)} \in [0, 1]$$

be the relative length of this image interval. Since our maps are boundary anchored, these invariants $\Delta_0, \dots, \Delta_m$ are not independent. (Their alternating sum is constant, and various inequalities are needed to guarantee that f(I) ⊂ I.) However, we can obtain a set of invariants which are independent and more manageable by setting

$$\Delta_j = p_j + p_{j+1} - 1 \quad \text{for } 0 < j < m, \qquad \text{with} \quad \Delta_0 = p_1, \quad \Delta_m = p_m.$$

(Note that the "normalized total variation" $\sum \Delta_i$ of our m-modal map is equal to a constant plus $2(p_1 + \cdots + p_m)$.) If I is the interval [a, b], then we can write

$$p_i = \begin{cases} (v_i - a)/(b - a) & \text{if } \sigma_i = +1, \\ (b - v_i)/(b - a) & \text{if } \sigma_i = -1. \end{cases} \tag{6}$$

(Compare Figs. 4, 9.) In particular, in the special case of the unit interval I = [0, 1] this formula simplifies to

$$p_i = \begin{cases} v_i & \text{if } \sigma_i = +1, \\ 1 - v_i & \text{if } \sigma_i = -1. \end{cases}$$

It is easy to check that the parameters p1, . . . , pm ∈ [0, 1] satisfy only the relations

$$p_i + p_{i+1} \ge 1. \tag{7}$$
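The bookkeeping between the critical values, the relative lap lengths and the parameters $p_i$ can be checked on a small example. The following is only a sketch under stated assumptions: the function name `lap_parameters` is hypothetical, the boundary values are taken to be f(a) = a when the first lap is increasing and f(a) = b otherwise (similarly at b), and $v_i$ is treated as a local maximum exactly when the lap to its left is increasing, as in the proof of Lemma 3.1.

```python
from fractions import Fraction

def lap_parameters(v, shape, a=0, b=1):
    """Return (p, delta) for critical values v = [v_1..v_m] and shape = [s_0..s_m].

    Assumptions (labelled, not taken from the text verbatim):
      * boundary anchoring: f(a) = a if s_0 = +1 else b; f(b) = b if s_m = +1 else a;
      * v_i is a local maximum exactly when the lap to its left is increasing.
    """
    m = len(v)
    if len(shape) != m + 1:
        raise ValueError("shape must have one more entry than v")
    a, b = Fraction(a), Fraction(b)
    length = b - a
    v0 = a if shape[0] == 1 else b
    v_last = b if shape[m] == 1 else a
    vals = [v0] + [Fraction(x) for x in v] + [v_last]

    p = []
    for i in range(1, m + 1):
        if shape[i - 1] == 1:              # v_i is a local maximum
            p.append((vals[i] - a) / length)
        else:                              # v_i is a local minimum
            p.append((b - vals[i]) / length)

    # relative length of the image of each lap I_0, ..., I_m
    delta = [abs(vals[j + 1] - vals[j]) / length for j in range(m + 1)]
    return p, delta
```

For a cubic of shape (+ − +) on [0, 1] with v1 = 3/4 and v2 = 1/3 this gives p = (3/4, 2/3), and the lap lengths satisfy Δ0 = p1, Δ1 = p1 + p2 − 1 and Δ2 = p2.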

It may seem that an increase in any pi should lead to more complex dynamical behavior, but this is not quite true for the cubic family. (Compare Fig. 8.) However, in 5.6 we will see that the analogous statement for the family of “stunted sawtooth” maps is true. As parameter space for boundary anchored polynomial maps of degree m + 1 and shape σ (or indeed for any family of maps which can be parametrized by their folding value vectors) we take the polyhedron P m consisting of all vectors p = (p1 , . . . , pm ) ∈


Fig. 5. Picture of the polyhedron $P^1$ on the left, followed by the graphs of the quadratic maps of shape +− corresponding to three representative points of $P^1$

Fig. 6. The triangle $P^2$, showing cubic maps of shape + − + on the left, and cubic maps of shape − + − on the right, for nine representative points of $P^2$. Arrows point in the direction of the p1 and p2 axes

Fig. 7. The polyhedron $P^3$ together with graphs of the 4th degree maps of shape + − +− corresponding to its vertices. Arrows point in the direction of increasing entropy, with heavy arrows parallel to the $p_j$ axes

$[0, 1]^m$ which satisfy the inequalities (7). This is a convex polyhedral subset of the m-dimensional unit cube. More exactly, $P^m$ can be described as the convex hull of the set of vectors (p1, p2, . . . , pm) with pi ∈ {0, 1}, such that there is no consecutive pair of 0's. From this characterization, we get:

Proposition 3.4. The polyhedron $P^m$ has $F_{m+2}$ vertices, where $F_i$ stands for the i-th Fibonacci number.

Proof of 3.4. Recall that the sequence of Fibonacci numbers is defined by $F_{n+2} = F_n + F_{n+1}$ for n > 0, with F1 = F2 = 1. Notice first that $P^1$ is a segment with F1 = 1 vertex labeled 0 and F2 = 1 vertex labeled 1. Next, if $P^m$ has $F_m$ vertices whose label terminates with 0 and $F_{m+1}$ vertices whose label terminates with 1, the "not two consecutive zeros" rule implies that $P^{m+1}$ has $F_{m+1}$ vertices whose label terminates with 0 and $F_m + F_{m+1} = F_{m+2}$ vertices whose label terminates with 1. Hence $P^m$ has $F_m + F_{m+1} = F_{m+2}$ vertices in all. □

While the abstract polyhedron $P^m$ is completely determined by m, the way it is partitioned according to dynamical properties depends of course on the particular family of maps it parametrizes. Consider some family of m-modal maps parametrized by $P^m$: we will write p = p(v) (which can be inverted to v = v(p)) so that, for any chosen shape, $f_p$ is the boundary anchored map on I in that family, which corresponds to the parameter vector p and to the folding value vector v. For example, in the polynomial case, the map $f_{(1,\dots,1)}$ corresponding to the vector p = (1, 1, . . . , 1) is, up to sign, a Chebyshev polynomial: it is the unique boundary anchored polynomial map of shape σ with maximal topological entropy.

Note. When m is odd, the two possible shapes for an m-modal map are not really different, since a map of shape (+ − · · · + −) with parameters (p1, . . . , pm) is topologically conjugate, under an orientation reversing reflection of its interval, to a map of shape (− + · · · − +) with parameters (pm, . . . , p1). However, when m is even the two shapes are dynamically essentially different. (Even the dynamical behavior of f on the boundary of I is enough to distinguish the two.)
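The vertex description of $P^m$ is easy to test by brute force for small m. The sketch below (the helper names `vertices` and `fibonacci` are illustrative, not from the paper) enumerates all 0/1 vectors with no two consecutive zeros and compares the count with the Fibonacci numbers under the convention F1 = F2 = 1; with that convention the count for $P^m$ comes out as $F_{m+2}$, giving 2, 3, 5, 8, . . . for m = 1, 2, 3, 4, . . .

```python
from itertools import product

def vertices(m):
    """All vectors in {0,1}^m with no two consecutive zeros, i.e. p_i + p_{i+1} >= 1."""
    return [p for p in product((0, 1), repeat=m)
            if all(p[i] + p[i + 1] >= 1 for i in range(m - 1))]

def fibonacci(n):
    """n-th Fibonacci number with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

counts = [len(vertices(m)) for m in range(1, 9)]
```

For m = 3 the five vertices are 010, 011, 101, 110 and 111, matching the five vertices of the polyhedron shown in Fig. 7.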

4. Topological Entropy and Periodic Orbits

A particularly useful measure of the dynamic "complexity" of a continuous self-map is provided by the topological entropy h, which was defined in [AKM] as an invariant of topological conjugacy. The definition requires some notation. Let X be a non-vacuous compact space. By a cover $\mathcal{C}$ we mean simply a collection of subsets with union X. The "covering number" $n(\mathcal{C})$ is defined to be the smallest cardinality of a subcollection of $\mathcal{C}$ with union X. Given a continuous map f : X → X and k ≥ 1, define $\mathcal{C}_f^k$ to be the cover consisting of all intersections of the form

$$C_0 \cap f^{-1} C_1 \cap \cdots \cap f^{1-k} C_{k-1},$$

where each $C_i$ is a set belonging to the collection $\mathcal{C}$. (Here $f^{-i}(C)$ is the set of all x ∈ X with $f^{\circ i}(x) \in C$.) Using the inequality $n(\mathcal{C}_f^{k+\ell}) \le n(\mathcal{C}_f^k) \cdot n(\mathcal{C}_f^\ell)$, one can check that the limit

$$h(f, \mathcal{C}) = \lim_{k \to \infty} \frac{1}{k} \log n(\mathcal{C}_f^k)$$

exists, with $0 \le h(f, \mathcal{C}) \le \infty$. Furthermore, this limit is equal to the infimum

$$\inf_{k > 0} \frac{1}{k} \log n(\mathcal{C}_f^k).$$

In particular, taking k = 1, we have $h(f, \mathcal{C}) \le \log n(\mathcal{C})$, so if $n(\mathcal{C})$ is finite (for example if $\mathcal{C}$ is a covering by finitely many sets, or if $\mathcal{C}$ is a covering of the compact space X by open subsets), then it follows that $h(f, \mathcal{C})$ must also be finite.


The topological entropy of f is defined to be the supremum of $h(f, \mathcal{C})$ over all open covers of X. (This may be infinite, even if each $h(f, \mathcal{C})$ is finite, since covers by smaller sets may yield larger values of $h(f, \mathcal{C})$.) In the case of maps of the interval, combining theorems of Misiurewicz and Yomdin, we have the following. Let $C^\infty(I, I)$ be the space of $C^\infty$ maps from a closed interval I to itself, with the $C^\infty$ topology.

Theorem 4.1. The topological entropy function $h : C^\infty(I, I) \to [0, \infty)$ is continuous.

Proof. Lower semi-continuity of the entropy, for interval maps, has been proved by Misiurewicz [Mis2]. (See also [ALM].) In fact he even proved lower semi-continuity for $C^0$-maps with the $C^0$-topology. Upper semi-continuity, for $C^\infty$-maps in any dimension, has been proved by Yomdin [Y]. Finally, the statement that h(f) < ∞ is true for any $C^1$ map of a compact manifold, and follows from an easily verified bound which takes the form $h(f) \le \log^+ \max_x |f'(x)|$ in the 1-dimensional case. (Here $\log^+(s)$ is defined to be the maximum of log(s) and zero.) □

The following immediate consequence of this theorem is essential for our purpose:

Corollary 4.2. For any d, the topological entropy function is continuous on the finite dimensional compact space consisting of all polynomial maps of the interval with degree ≤ d.

Remarks. It is essential for Yomdin's Theorem that we work with the $C^\infty$ topology. In fact Misiurewicz and Szlenk have shown that $h : C^r(I, I) \to [0, \infty)$ is not upper semicontinuous for any r < ∞. However, their example involves a sequence of m-modal maps with m → ∞, converging to a bimodal limit. Misiurewicz [Mis4] has recently proved the following statement, which is much sharper than 4.2: If $\mathcal{M}^1_\ell$ is the space of $C^1$-smooth maps of the interval I which are m-modal for some m ≤ ℓ − 1, with the $C^1$-topology, then $h : \mathcal{M}^1_\ell \to [0, \log(\ell)]$ is continuous.

Note that Misiurewicz's lower semi-continuity result for entropy is true only in dimension one. As an example ([Mis1]), consider the family of maps $M_t(x, y) = (4xy(1 - x), ty)$ of the unit square. Here $h(M_1) = \log(2)$, but $h(M_t) = 0$ for t < 1. In the case of diffeomorphisms of class $C^{1+\epsilon}$, Katok [Ka] has proved an analogous lower semi-continuity theorem in dimension 2, but again there are easy counterexamples in higher dimensions. For further information about continuity properties of the topological entropy function, see also [Mis3] and [N], as well as [D].

Consider a family of maps $f_p$ parameterized by some compact space P, and suppose that the topological entropy function $p \mapsto h(f_p)$ is continuous, with values in some interval [α, β].


Definition. For each fixed h0 ∈ [α, β], the h0-isentrope for this family is defined to be the set consisting of all parameter values p ∈ P for which the topological entropy $h(f_p)$ is equal to h0.

Evidently the various isentropes are disjoint compact subsets, with union equal to P. Our goal is to show that all isentropes in the cubic family are connected. (Compare Fig. 8. In fact countably many of these isentropes are connected regions with interior, while one, with h0 = log(3), is a single point. It seems possible that all of the rest are simple arcs.)

The rest of this section will outline basic results about the topological entropy of multimodal maps, which we will need in order to describe the structure of isentropes. First a fundamental formula proved by Misiurewicz and Szlenk and also by Rothschild: For any piecewise monotone map, we have

$$h(f) = \lim_{k \to \infty} \frac{\log \ell(f^{\circ k})}{k} = \inf_{k > 0} \frac{\log \ell(f^{\circ k})}{k}, \tag{8}$$

where ℓ is the lap number, as defined in Sect. 1 or Sect. 2. (Compare [MSz, Ro, ALM].) It follows that 0 ≤ h(f) ≤ log ℓ(f). In particular, h(f) ≤ log(m + 1) if f is m-modal. In practice, it is often more convenient to work with the quantity

$$\gamma = \exp(h) = \lim_{n \to \infty} \sqrt[n]{\ell(f^{\circ n})},$$

known as the growth number of f. For a polynomial map of degree d > 0, note that this number γ lies in the closed interval [1, d]. In the special case of a piecewise linear map with |slope| = constant ≥ 1, the growth number γ is precisely equal to this constant |slope|. (Compare [MSz].$^4$)

The proof of (8) actually yields a slightly sharper statement as follows. Let us say that a cover $\mathcal{C}$ of the interval I is f-mono if $\mathcal{C}$ consists of finitely many (possibly degenerate) subintervals, and if f is monotone on each of these subintervals. Evidently such a cover exists if and only if f is piecewise monotone. According to [MSz] (or [ALM], Prop. 4.2.3), we have

$$h(f) = h(f, \mathcal{C}) \quad \text{whenever the cover } \mathcal{C} \text{ is } f\text{-mono}. \tag{9}$$
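Formula (8) can be illustrated numerically in the simplest case. The sketch below is not taken from the paper: it uses the full tent map x ↦ 1 − |2x − 1| on [0, 1] as a stand-in example, and counts laps of the k-th iterate by sampling on a dyadic grid (an assumption that makes the floating-point arithmetic exact and places every turning point on a grid point). The helper names are hypothetical.

```python
import math

def tent(x):
    """Full tent map on [0, 1], piecewise linear with slope +2 and -2."""
    return 1 - abs(2 * x - 1)

def lap_number(f, k, grid=4096):
    """Count maximal monotone pieces of the k-th iterate of f, sampled on a uniform grid.

    Only reliable when every lap of the iterate contains several grid cells.
    """
    ys = [i / grid for i in range(grid + 1)]
    for _ in range(k):
        ys = [f(y) for y in ys]
    laps, previous = 1, None
    for y0, y1 in zip(ys, ys[1:]):
        if y1 == y0:
            continue                      # flat cell: no information about direction
        increasing = y1 > y0
        if previous is not None and increasing != previous:
            laps += 1                     # direction changed: a new lap begins
        previous = increasing
    return laps

def entropy_estimate(f, k):
    """The quantity log(lap number of the k-th iterate) / k from formula (8)."""
    return math.log(lap_number(f, k)) / k
```

For this map the k-th iterate has $2^k$ laps, so the estimate equals log 2 for every k, in agreement with the statement that the growth number of a map of constant |slope| is that slope.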

In particular, interpreting the alphabet $\mathcal{A}$ of Sect. 2 as an f-mono cover of the interval I, we have $h(f) = h(f, \mathcal{A})$. (Compare 4.4 below.)

One well known consequence of (9) is an easy computation of γ, or of h = log(γ), in the case of a map where the orbits of all of the folding points and end points are finite. For such maps, the forward orbits of these points cut the interval I into finitely many subintervals. Numbering these subintervals in their natural order as J1, . . . , Jn, the associated n × n Markov transition matrix M is defined by setting

$$M_{ij} = \begin{cases} +1 & \text{if } f(J_i) \supset J_j, \\ 0 & \text{otherwise.} \end{cases}$$

$^4$ Here it is essential that f be piecewise monotone. The map $F(x) = \inf\{\, 3\,|x - 1/2^n| \;;\; n \ge 0 \,\}$ on the unit interval has |slope| = 3 almost everywhere, and yet has h(F) = 0 since F(x) ≤ x for all x.
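The construction of M, and the resulting growth number (see Lemma 4.3 below), can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the map is described only by where it sends the marked points x_0 < · · · < x_n (as indices), it is assumed monotone between consecutive marked points, and the largest eigenvalue is estimated by the ratio of successive sum norms of powers of M, which is assumed to converge (it does when M is primitive). All function names are hypothetical.

```python
import math

def markov_matrix(images):
    """images[i] is the index of f(x_i) among the marked points x_0 < ... < x_n.

    Assuming f is monotone on each [x_i, x_{i+1}], the image of that interval
    is the union of the intervals between f(x_i) and f(x_{i+1}).
    """
    n = len(images) - 1
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        lo, hi = sorted((images[i], images[i + 1]))
        for j in range(lo, hi):
            M[i][j] = 1
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def growth_number(M, k=60):
    """Estimate the largest eigenvalue as ||M^(k+1)|| / ||M^k|| (sum norm, exact integers)."""
    P = M
    for _ in range(k - 1):
        P = matmul(P, M)
    Q = matmul(P, M)
    norm = lambda A: sum(sum(row) for row in A)   # entries are non-negative
    return norm(Q) / norm(P)

# A unimodal map whose folding point has period 3: x_1 -> x_2 -> x_0 -> x_1.
M = markov_matrix([1, 2, 0])
gamma = growth_number(M)
h = math.log(gamma)
```

In this example M = [[0, 1], [1, 1]] and the estimate agrees with the golden ratio (1 + √5)/2, the value that appears in Fig. 8 in connection with a period 3 orbit.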


Fig. 8. Isentropes h = log(γ) = constant in the parameter triangle $P^2$ for real cubic maps of shape (+ − +) above and (− + −) below. Here ρ stands for the golden ratio $(1 + \sqrt{5})/2$, associated with an attracting period 3 orbit with both critical points in its immediate basin. Similarly, the value γ = 2 is associated with a "capture" component, with say c1 and $f^{\circ 2}(c_2)$ in the immediate basin of an attracting fixed point. Compare [M1, App. B]


Recall that a real or complex number γ is called an algebraic integer if it is a root of a monic polynomial equation with integer coefficients, and an algebraic unit if both γ and 1/γ are algebraic integers.

Lemma 4.3. If f is a piecewise monotone map such that the orbits of the folding points are eventually periodic, then the topological entropy of f has the form h = log(γ), where γ is the largest real eigenvalue of the associated Markov transition matrix. In particular, γ is an algebraic integer. Furthermore

$$h \le \limsup_{k \to \infty} \frac{\log \#\mathrm{fix}(f^{\circ k})}{k},$$

where $\#\mathrm{fix}(f^{\circ k})$ denotes the number of fixed points of $f^{\circ k}$. If all of the folding points are actually periodic, then γ is an algebraic unit.

(Compare 4.8 and 4.10.) See [BST] as well as 5.10 for methods of computing h in more general cases, based on this formula h = log γ.

Outline Proof of 4.3. It is convenient to use the sum of absolute values norm $\|A\| = \sum |A_{ij}|$ for any n × n real or complex matrix. If λ1, . . . , λn are the eigenvalues of A, and if γ is the maximum of |λi|, then careful matrix estimates show that

$$\log \gamma = \lim_{k \to \infty} \log \|A^k\| / k = \limsup_{k \to \infty} \log |\mathrm{trace}(A^k)| / k.$$

We will apply these equations to the Markov matrix M. In this case, it follows from the Perron-Frobenius Theorem that the number γ = max |λj| is itself an eigenvalue, and hence can be described as the largest real eigenvalue of M. Let us say that a finite or infinite sequence $(J_{i_0}, J_{i_1}, \dots)$ of subintervals $J_{i_k}$ is M-admissible if $f(J_{i_k}) \supset J_{i_{k+1}}$ for all appropriate k. It is easy to check that the number of sequences $(J_{i_0}, J_{i_1}, \dots, J_{i_\ell})$ of length ℓ + 1 which are M-admissible is precisely equal to the norm $\|M^\ell\|$. These facts, together with Eq. (9) for the covering {Ji}, imply that h = log γ. Similarly, the number of infinite M-admissible sequences which are periodic of period dividing k is equal to the trace of $M^k$. Since every M-admissible sequence of Ji with period dividing k corresponds to at least one periodic point $x = f^{\circ k}(x)$, and since this sequence is usually uniquely determined by x, the required inequality for periodic points follows.

Now suppose that every folding point of f belongs to a periodic orbit. List the points $x_1 < x_2 < \cdots < x_{n+1}$ which belong to the orbits of folding points. The action of f on these points is described by an (n + 1) × (n + 1) permutation matrix A, where $A_{ij} = 1$ if and only if $f(x_i) = x_j$. The associated Markov transition matrix M between the intervals $[x_i, x_{i+1}]$ can be constructed out of A in four steps, as follows. Let A′ be the matrix obtained from A by adding ones to the left of every one. (For a rough bar-graph of the map, rotate this matrix 90° counterclockwise.) Let A″ be obtained from A′ by replacing every row $R_i$ except the first by $R_i - R_{i-1}$, and let A‴ be the smaller matrix which is obtained from A″ by deleting its first row and first column. Then it is not difficult to check that all four of these matrices have the same determinant ±1. Note that the i-th row of A‴ is negative if and only if f is decreasing on the interval $[x_i, x_{i+1}]$. The required Markov matrix M, with $M_{ij} = 1$ if and only if the image of the interval $[x_i, x_{i+1}]$ contains $[x_j, x_{j+1}]$, can now be obtained from A‴ by changing the sign of every negative row, so that $M_{ij} = |A'''_{ij}|$. Since M is a matrix of integers with determinant ±1, it follows that every eigenvalue is an algebraic unit. In particular, the largest eigenvalue γ must be


an algebraic unit. This computes the topological entropy of f restricted to the interval $[x_1, x_{n+1}]$. The actual interval of definition for f may be strictly larger than $[x_1, x_{n+1}]$. However, it is not difficult to check that the portion of this interval to the left of x1 or to the right of $x_{n+1}$ makes no contribution to h. (Compare the proof of 4.4 below.) □

Remark. Let (L1, . . . , Ln) be an eigenvector for M, so that $\sum_j M_{ij} L_j = \gamma L_i$. If the Li are strictly positive, then we can construct a piecewise linear "model" for f, having slope ±γ everywhere, as follows. Replace each Ji by an interval of length Li, so that f(Ji) will be replaced by an interval of length γLi. Interpolating linearly we obtain the required piecewise linear map, which has the same kneading data as f.

As another application of (9), we will prove the following.

Lemma 4.4. The topological entropy of an m-modal map (f, c) is determined by its kneading data.

Proof. Assume first that f : I → I is boundary anchored. Recall that an itinerary I(x) = (A0, A1, . . . ) is called acritical if each Ai belongs to the subalphabet $\mathcal{A}_0 = \{I_0, \dots, I_m\}$.

Definition. For each k > 0, let Adm(f, k), or more precisely Adm(f, c, k), be the number of acritical sequences of length k which are admissible for f.

According to Corollary 2.4, these numbers Adm(f, k) are determined by the kneading data for f. On the other hand, it is not difficult to check that

$$\ell(f^{\circ k}) \le \mathrm{Adm}(f, k) \le \mathrm{Card}(\mathcal{A}_f^k).$$

(In fact $\ell(f^{\circ k}) = \mathrm{Adm}(f, k)$ in the piecewise strictly monotone case.) Taking the logarithm of these quantities, dividing by k, and then letting k → ∞, we see that

$$h(f) \le \lim_{k \to \infty} \frac{1}{k} \log \mathrm{Adm}(f, k) \le h(f)$$

by (8) and (9), hence

$$h(f) = \lim_{k \to \infty} \frac{1}{k} \log \mathrm{Adm}(f, k). \tag{10}$$

Thus h(f) is determined by kneading data when f is boundary anchored. Now consider the more general case where f : I → I is not boundary anchored. It is not difficult to extend f to a boundary anchored map g : J → J on a strictly larger interval J ⊃ I, where g has the same shape and the same kneading data. Let L and R be the two connected components of $J \setminus I$. (One of these two may be empty.) Since g(I) = f(I) ⊂ I, we see that any orbit for g either lies completely in I, or lies completely in L ∪ R, or else consists of a finite initial segment in L ∪ R followed by a terminal segment in I. Note also that neither g(L) nor g(R) can intersect both L and R. Hence there are only two possible sequences of any given length in L ∪ R. It follows that

$$\mathrm{Card}(\mathcal{A}_g^k) \le 2 \sum_{i=0}^{k} \mathrm{Card}(\mathcal{A}_f^i).$$


Now for any constant log(c) > h(f) we have $(\log \mathrm{Card}(\mathcal{A}_f^k))/k < \log(c)$, or equivalently $\mathrm{Card}(\mathcal{A}_f^k) < c^k$, for large k. Hence there is a constant a so that $\mathrm{Card}(\mathcal{A}_f^k) < a\,c^k$ for all k. It then follows that $\mathrm{Card}(\mathcal{A}_g^k) < 2a(1 + c + \cdots + c^k) < a' c^k$ for some constant a′. Therefore h(g) < log(c) for all such c. This implies that h(g) ≤ h(f), hence h(g) = h(f), as required. □

Using these results, we can define a useful partial order on the possible kneading data K(f) for m-modal maps of a given shape. Let us say that K(f) ≫ K(g) if and only if

$$K_i(f) \ge K_i(g) \text{ when } \sigma_i = -1, \qquad K_i(f) \le K_i(g) \text{ when } \sigma_i = +1,$$

for 1 ≤ i ≤ m.

Corollary 4.5. If K(f) ≫ K(g), then h(f) ≥ h(g).

Proof. If K(f) ≫ K(g), then it is easy to check that every admissible sequence for g is also admissible for f. Therefore Adm(f, k) ≥ Adm(g, k) (with notation as in the proof of 4.4). Making use of formula (10), the conclusion follows. □

This corollary will help us to reduce questions about isentropes to questions about kneading data. More precisely, our strategy will develop as follows: we will first verify the Connected Isentrope Conjecture for the special family of "stunted sawtooth" maps, and then transfer as much as we can of parameter space information from that family to the cubic family.

We will see in 5.8 that topological entropy depends continuously on the kneading data. Here is a preliminary result in that direction. We give the space $\mathcal{A}^{\mathbb{N}}$ of itineraries the usual infinite product topology.

Lemma 4.6. Topological entropy is upper semi-continuous as a function of kneading data.

Proof. Let f : I → I be a map with given kneading data, and let $g_k : I \to I$ be any sequence of maps with the same shape such that the truncations of corresponding kneading sequences of f and $g_k$ agree up to length k. It follows that $\mathrm{Adm}(f, k') = \mathrm{Adm}(g_k, k')$ for k′ ≤ k. Given ε > 0, it follows from (10) that we can choose k0 large enough so that $(\log \mathrm{Adm}(f, k_0))/k_0 < h(f) + \epsilon$. Then for k ≥ k0 we have

$$h(g_k) \le (\log \mathrm{Adm}(g_k, k_0))/k_0 = (\log \mathrm{Adm}(f, k_0))/k_0 < h(f) + \epsilon,$$

as required. □

Corollary 4.7. Let f be a piecewise monotone map whose kneading sequences are all acritical, so that the orbit of a folding value never meets a folding point. Then the topological entropy is continuous at f under $C^0$-deformations which move the folding points continuously, keeping these points separated and keeping their number fixed.

Proof. Upper semi-continuity follows easily from Lemma 4.6. Lower semi-continuity holds true at all continuous interval maps by [Mis2]. (In fact here, the same result for piecewise monotone maps, as proved in [MSz], would suffice.) □


For one-dimensional maps, there is a close relationship between topological entropy and the existence of periodic orbits. Let us define

$$h_{per}(f) = \limsup_{k \to \infty} \frac{\log \#\mathrm{fix}(f^{\circ k})}{k},$$

where #fix is the number of fixed points. (Compare 4.3.)

Caution. It is essential to take the lim sup since the limit may well not exist. As an example, consider the tent map $T_s(x) = s \min(x, 1 - x)$, with slope $s = \sqrt{2}$ chosen so that the two subintervals $[\sqrt{2} - 1,\, 2 - \sqrt{2}]$ and $[2 - \sqrt{2},\, \sqrt{2}/2]$ map to each other. Then it is not difficult to check that

$$\#\mathrm{fix}(T_s^{\circ k}) = \begin{cases} 2 & \text{for } k \text{ odd,} \\ 2^{k/2+1} & \text{for } k \text{ even.} \end{cases}$$

Hence $(\log \#\mathrm{fix}(T_s^{\circ k}))/k$ tends to zero as k tends to infinity through odd integers, but tends to the limit $h_{per}(T_s) = \log \sqrt{2}$ as k tends to infinity through even integers.

Lemma 4.8 (Misiurewicz and Szlenk). The inequality $h_{per}(f) \ge h(f)$ is valid for any piecewise monotone map.

Compare [MSz]. Misiurewicz later showed that this inequality is valid for any continuous interval map (compare [Mis2] or [ALM, 4.3.14]). However, the piecewise monotone case will suffice for our purposes. (Compare 4.12.) In fact, equality holds in many important cases. See 4.10 below.

We will need to subdivide periodic orbits into three classes. Recall that every periodic point, $f^{\circ p}(x) = x$, has an itinerary I(x) = (A0, A1, . . . ) which is also periodic, with $A_i = A_{i+p}$. Define the sign of this fixed point of $f^{\circ p}$ to be the product $\epsilon(A_0)\epsilon(A_1) \cdots \epsilon(A_{p-1})$, where $\epsilon(I_j) = \sigma_j \in \{\pm 1\}$ and $\epsilon(C_j) = 0$, as in Sect. 2. (In the special case where f is piecewise strictly monotone, this sign is either +1, −1, or zero according as $f^{\circ p}$ is increasing, decreasing, or has a folding point at x.) A periodic orbit will be called of positive, negative, or critical type according as its sign is +1, −1, or 0. Similarly, a periodic sequence in $\mathcal{A}^{\mathbb{N}}$ has either positive, negative or folding type. We will be particularly interested in periodic points of negative type, since they are quite stable under perturbation of the m-modal map (f, c). In particular, we have the following, with notation as in 4.5.

Lemma 4.9. Given any admissible symbol sequence {Ai} which is periodic of negative type, with $A_i = A_{i+p}$, there is one and only one fixed point of $f^{\circ p}$ which has this symbol sequence as itinerary. Hence the number $\mathrm{Neg}(f^{\circ p})$ of fixed points of negative type for each iterate $f^{\circ p}$ is completely determined by the kneading data for (f, c), and satisfies $\mathrm{Neg}(f^{\circ p}) \le \mathrm{Adm}(f, p)$. Furthermore, if K(f) ≫ K(g), then $\mathrm{Neg}(f^{\circ p}) \ge \mathrm{Neg}(g^{\circ p})$ for every p ≥ 1.

(Note that a fixed point of negative type for $f^{\circ p}$ counts also as a fixed point of negative type for the odd iterates $f^{\circ 3p}, f^{\circ 5p}, \dots$, but as a fixed point of positive type for $f^{\circ 2p}, f^{\circ 4p}, \dots$.)
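Returning to the tent map in the Caution above, the stated fixed point counts can be checked numerically. The following is a rough sketch, not part of the original argument: it takes the tent map to be s·min(x, 1 − x) with s = √2 and counts zeros of the k-th iterate minus the identity by looking for sign changes on a uniform grid. The grid size is an assumption chosen so that distinct crossings are well separated for small k; the function names are illustrative.

```python
import math

SLOPE = math.sqrt(2)

def tent(x):
    """Tent map with slope sqrt(2) on [0, 1]."""
    return SLOPE * min(x, 1 - x)

def count_fixed_points(k, grid=20000):
    """Count solutions of T^k(x) = x on [0, 1] via exact zeros and sign changes."""
    count, previous = 0, None
    for i in range(grid + 1):
        x = i / grid
        y = x
        for _ in range(k):
            y = tent(y)
        g = y - x
        if g == 0:
            count += 1            # the grid point itself is fixed (for example x = 0)
            previous = None
        else:
            positive = g > 0
            if previous is not None and positive != previous:
                count += 1        # the graph crossed the diagonal inside this cell
            previous = positive
    return count

counts = [count_fixed_points(k) for k in range(1, 7)]
```

The odd iterates should show only the two fixed points 0 and 2 − √2, while the even iterates pick up the periodic points inside the two intervals that are interchanged by the map.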


Proof of 4.9. Let J = (α, β) ⊂ I be the subinterval consisting of all x ∈ I with $A(f^{\circ i}(x)) = A_i$ for 0 ≤ i < p. Then the restriction $f^{\circ p}|_J$ is monotone decreasing with $f^{\circ p}(J) \cap J \ne \emptyset$. This implies that $f^{\circ p}(\alpha) > \alpha$ and $f^{\circ p}(\beta) < \beta$. Hence by the Intermediate Value Theorem, $f^{\circ p}|_J$ has a fixed point, which is unique since this restriction is monotone decreasing. Thus $\mathrm{Neg}(f^{\circ p})$ is equal to the number of admissible sequences which are fixed points of negative type for the p-fold iterate of the shift. It follows that $\mathrm{Neg}(f^{\circ p}) \le \mathrm{Adm}(f, p)$. In the boundary anchored case, the statement that the numbers $\mathrm{Neg}(f^{\circ p})$ are completely determined by the kneading data follows immediately from Corollary 2.4. The general case follows since, extending f to a boundary anchored map as in the proof of 4.4, it is easy to see that no periodic orbit of negative type can involve either of the two intervals which are adjoined to the ends of I. □

Lemma 4.10. If f is piecewise monotone, with at most finitely many non-repelling periodic orbits, then

$$h(f) \ge h_{per}(f) = \limsup_{k \to \infty} \frac{\log^+ \mathrm{Neg}(f^{\circ k})}{k}.$$

Combining this inequality with 4.8, it follows of course that

$$h(f) = h_{per}(f) = \limsup_{k \to \infty} \frac{\log^+ \mathrm{Neg}(f^{\circ k})}{k} \tag{11}$$

whenever f has at most finitely many non-repelling periodic orbits. This hypothesis is satisfied in many important cases. For example, for a polynomial map of degree d > 1, the classical theory of Fatou and Julia shows that the number of such orbits is at most d − 1. For a smooth$^5$ m-modal map with negative Schwarzian derivative, the number of such orbits is at most m + 2. For an m-modal stunted sawtooth map with m > 1, as studied in Sect. 5, the number of such orbits is at most m.

Remark. This close relationship between topological entropy and periodic orbits exists only in low dimensions. Katok [Ka] has proved the inequality $h_{per} \ge h$ for 2-dimensional diffeomorphisms which are $C^{1+\alpha}$-smooth. However, Kaloshin [K] has shown that $h_{per}$ is infinite (and hence strictly greater than h) for $C^r$-generic maps inside a "Newhouse region" in parameter space. As soon as we go to higher dimensions or allow non-smooth maps, there can be maps of positive entropy with no periodic orbits at all. As an example, the cartesian product of an irrational rotation of the circle with an arbitrary dynamical system for which h > 0 will have h > 0, but no periodic orbits. For the case of nonsmooth surface homeomorphisms, see Rees [Re].

Outline Proof of 4.10. First suppose that all of the periodic orbits of f are non-folding and strictly repelling. Then evidently the fixed points of $f^{\circ p}$, that is the places where the graph of $f^{\circ p}$ crosses the diagonal, must be alternately of positive and negative type, and it follows easily that

$$|\#\mathrm{fix}(f^{\circ p}) - 2\,\mathrm{Neg}(f^{\circ p})| \le 1.$$

If there are ℓ periodic points which are either of folding type or are non-repelling, then a similar argument shows that

$$|\#\mathrm{fix}(f^{\circ p}) - 2\,\mathrm{Neg}(f^{\circ p})| \le 2\ell + 1.$$

$^5$ Note also the following theorem of Martens, de Melo and van Strien [MMS]: For any $C^1$-smooth interval map with non-flat critical points, every orbit of sufficiently high period is repelling.


Whenever this number ℓ is finite, it follows immediately that

$$h_{per}(f) = \limsup_{k \to \infty} \log^+ \mathrm{Neg}(f^{\circ k}) / k,$$

and since $\mathrm{Neg}(f^{\circ p}) \le \mathrm{Adm}(f, p)$ by 4.9, it follows by (10) that $h_{per} \le h$. □

Main Theorem 4.11. For any m-modal map, we have

$$h(f) = \limsup_{k \to \infty} \frac{1}{k} \log^+ (\mathrm{Neg}(f^{\circ k})).$$

This is proved in [MTh], and also in [Pr]. (Either one of these references describes explicitly how to compute these numbers $\mathrm{Neg}(f^{\circ k})$ from the kneading data.)

Remark 4.12. These results are closely related. Thus 4.8 is an immediate corollary of 4.11, since $\#\mathrm{fix}(f^{\circ p}) \ge \mathrm{Neg}(f^{\circ p})$. On the other hand, a proof that 4.11 follows from 4.8 can be sketched as follows. If f has only finitely many non-repelling periodic orbits, then 4.11 follows immediately from 4.8 and 4.10. However, we know by 4.4 that the topological entropy h(f) is completely determined by the kneading data for f, and we know by 4.9 that the numbers $\mathrm{Neg}(f^{\circ p})$ are completely determined by the kneading data for f. Hence, if we can find just one example of a map g which has the same kneading data as f, and which has only finitely many non-repelling periodic orbits, then 4.11 will follow. In fact such an example will be provided in the next section (5.3 together with 5.4). Another proof of both 4.8 and 4.11 will be described at the end of Sect. 5.

Define the negative orbit complexity of f to be the sequence

$$N(f) = \big(\mathrm{Neg}(f^{\circ 1}),\, \mathrm{Neg}(f^{\circ 2}),\, \mathrm{Neg}(f^{\circ 3}),\, \dots\big)$$

of non-negative integers, and define the relation N(f) ≫ N(g) to mean that $\mathrm{Neg}(f^{\circ p}) \ge \mathrm{Neg}(g^{\circ p})$ for all p. Then we can summarize 4.5, the last statement of 4.9, and 4.11 as follows.

Corollary 4.13. The kneading data K(f) determines the negative orbit complexity N(f), which in turn determines the topological entropy h(f), with

$$K(f) \gg K(g) \;\Rightarrow\; N(f) \gg N(g) \;\Rightarrow\; h(f) \ge h(g). \tag{12}$$

5. The Stunted Sawtooth Family

Closely related to kneading theory is a special family of piecewise monotone maps which is rich enough to encompass in a canonical way all possible kneading data and all possible itineraries. (Compare [Gu, BCMM], as well as Fig. 9.) In order to introduce this family, we first introduce the sawtooth map of specified shape σ = (σ0, . . . , σm). This map, on an interval J, can be characterized as the unique piecewise linear map S : J → R which is boundary anchored of shape σ, with slope ±s everywhere, where s > m + 1 is some specified constant,$^6$ and with folding points in arithmetic progression.

On Entropy and Monotonicity for Real Cubic Maps


Fig. 9. The bimodal sawtooth map of shape (− + −), and a representative stunted sawtooth map of the same shape. (Labels in the figure: $v_1$, $v_2$, $2\alpha p_1$, $2\alpha p_2$, and the endpoints $-\alpha$, $\alpha$.)

Caution. This map $S$ does not carry the interval $J$ into itself. In fact the folding values of $S$ all lie outside of $J$.

The precise choice of the constant slope $s$ doesn't matter, but to fix our ideas, let us always take $s = m + 3/2$. One choice of the interval $J$ is particularly convenient for kneading theory: As folding points $\hat C_1, \ldots, \hat C_m$ choose the numbers $-m+1, -m+3, \ldots, m-1$, with $\hat C_j = 2j - m - 1$, and with $S(\hat C_j) = -\sigma_j s$ as the corresponding folding values. Now choose a base point $\hat I_j = 2j - m$ in the $j$-th lap, so that
$$\hat I_0 < \hat C_1 < \hat I_1 < \cdots < \hat C_m < \hat I_m$$
are consecutive integers. Then $S$ is given by the formula
$$S(x) = (x - \hat I_j)\,\sigma_j\, s \quad \text{for } x \in I_j. \tag{13}$$
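Formula (13) is easy to evaluate exactly. The following is a minimal sketch, not taken from the paper: the function names `alpha` and `sawtooth` are hypothetical, the shape is passed as a tuple of signs $(\sigma_0, \ldots, \sigma_m)$, and the lap index is recovered from the assumption that lap $j$ lies between $\hat C_j = 2j-m-1$ and $\hat C_{j+1} = 2j-m+1$. Exact rationals are used so that the checks on folding values and boundary anchoring are not blurred by rounding.

```python
import math
from fractions import Fraction


def alpha(m, s):
    """Half-length of J = [-alpha, alpha], using alpha = m*s/(s-1) as in the text."""
    return Fraction(m) * s / (s - 1)


def sawtooth(x, sigma, s=None):
    """Evaluate S(x) = (x - I_hat_j) * sigma_j * s on the lap containing x.

    sigma is the shape (sigma_0, ..., sigma_m) as a tuple of +1/-1.
    Assumption for this sketch: lap j is [2j-m-1, 2j-m+1] intersected with J.
    """
    m = len(sigma) - 1
    if s is None:
        s = Fraction(2 * m + 3, 2)  # the text's standing choice s = m + 3/2
    # Lap index, clamped so that points near the ends of J fall in laps 0 and m.
    j = min(max(math.floor((x + m + 1) / 2), 0), m)
    i_hat = 2 * j - m  # base point of the j-th lap
    return (x - i_hat) * sigma[j] * s


# Bimodal shape (- + -), as in Fig. 9.
shape = (-1, 1, -1)
slope = Fraction(7, 2)
a = alpha(2, slope)
```

With these definitions, `sawtooth(Fraction(-1), shape)` returns the folding value $-\sigma_1 s$ and `sawtooth(a, shape)` returns $\sigma_m \alpha$, in agreement with the boundary-anchoring requirement $S(\partial J) \subset \partial J$.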

We must choose a domain of definition $J$ for $S$ so that $S(\partial J) \subset \partial J$. The appropriate choice is the interval $J = [-\alpha, \alpha]$, where
$$\alpha = \frac{ms}{s-1} < m + 1 < s.$$
Since $\alpha = s(\alpha - m)$, it follows that $S(\pm\alpha) \in \{\pm\alpha\}$, as required.

As in Sect. 2, we work with the ordered alphabet $\mathcal{A} = \{I_0, C_1, I_1, \ldots, C_m, I_m\}$. The correspondence $I_j \mapsto \hat I_j$, $C_j \mapsto \hat C_j$, or briefly $A \mapsto \hat A$ for $A \in \mathcal{A}$, defines an order preserving embedding of this alphabet into the integers. We can extend to a map $\Phi : \mathcal{A}^{\mathbb{N}} \to \mathbb{R}$ by setting
$$\Phi(A_0, A_1, \ldots) = \hat A_0 + \epsilon_0 \hat A_1/s + \epsilon_0\epsilon_1 \hat A_2/s^2 + \cdots = \sum_{k=0}^{\infty} \epsilon_0 \cdots \epsilon_{k-1}\, \hat A_k/s^k. \tag{14}$$
(Recall that $\epsilon_i = \epsilon(A_i)$ with $\epsilon(I_j) = \sigma_j$ and $\epsilon(C_j) = 0$.) Order the space of sequences $\mathcal{A}^{\mathbb{N}}$ as in Sect. 2.

Footnote 6. In the preliminary publication [DGMT] (and also in 2.2) we took $s = m + 1$. However, here we will need $s > m + 1$ in order to get an actual embedding of the set of all $K$-admissible sequences in $\mathcal{A}^{\mathbb{N}}$ into the reals.
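The series (14) can be evaluated on a finite prefix of a symbol sequence. The sketch below is illustrative only: the encoding of symbols as pairs `("I", j)` and `("C", j)` and the names `hat`, `eps`, `phi`, `sawtooth` are assumptions made for this example. It works with exact rationals and with truncated sums, so it checks the recursive form of (14) and the relation to the sawtooth map on a prefix rather than on an infinite sequence.

```python
import math
from fractions import Fraction


def hat(symbol, m):
    """Integer attached to a symbol: I_hat_j = 2j - m, C_hat_j = 2j - m - 1."""
    kind, j = symbol
    return 2 * j - m if kind == "I" else 2 * j - m - 1


def eps(symbol, sigma):
    """Sign attached to a symbol: sigma_j for a lap I_j, zero for a folding point C_j."""
    kind, j = symbol
    return sigma[j] if kind == "I" else 0


def phi(seq, sigma, s):
    """Partial sum of (14) over the finite prefix seq."""
    m = len(sigma) - 1
    total, sign, scale = Fraction(0), 1, Fraction(1)
    for a in seq:
        total += sign * hat(a, m) * scale
        sign *= eps(a, sigma)   # becomes 0 after the first folding point symbol
        scale /= s
    return total


def sawtooth(x, sigma, s):
    """The map S of formula (13), with lap j taken to be [2j-m-1, 2j-m+1]."""
    m = len(sigma) - 1
    j = min(max(math.floor((x + m + 1) / 2), 0), m)
    return (x - (2 * j - m)) * sigma[j] * s


shape = (-1, 1, -1)            # bimodal shape (- + -)
slope = Fraction(7, 2)         # s = m + 3/2 with m = 2
prefix = [("I", 0), ("I", 2), ("I", 1), ("C", 1), ("I", 0)]
value = phi(prefix, shape, slope)
shifted = phi(prefix[1:], shape, slope)
```

For this prefix the first symbol is a lap symbol, so applying `sawtooth` to `value` reproduces `shifted`, which is the semiconjugacy with the shift map in the truncated setting. Everything after the symbol `("C", 1)` is ignored, because the running product of signs vanishes there.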


J. Milnor, C. Tresser

Lemma 5.1. The image $\Phi(\mathcal{A}^{\mathbb{N}})$ is contained in the interval $J = [-\alpha, \alpha]$, and $\Phi$ is related to the sawtooth map on $J$ by the identity
$$\Phi(A_1, A_2, A_3, \ldots) = S\big(\Phi(A_0, A_1, A_2, \ldots)\big)$$
whenever $\epsilon_0 \ne 0$, or in other words whenever $A_0 \in \{I_0, I_1, \ldots, I_m\}$. (Thus $\Phi$ semiconjugates the shift map on $\mathcal{A}^{\mathbb{N}}$ to the sawtooth map on $J$, except when $A_0$ is a folding point symbol.) This map $\Phi : \mathcal{A}^{\mathbb{N}} \to J$ is strictly order preserving, provided that we consider only sequences which satisfy the first compatibility condition of Sect. 2 with some fixed kneading data.

Proof of 5.1. Since $|\hat A_j| \le m$, we have
$$|\Phi(A_0, A_1, \ldots)| \le m\,(1 + 1/s + 1/s^2 + \cdots) = \alpha,$$
hence $\Phi(\mathcal{A}^{\mathbb{N}}) \subset J$. We can write Eq. (14) as
$$\Phi(A_0, A_1, \ldots) = \hat A_0 + \epsilon(A_0)\,\Phi(A_1, A_2, \ldots)/s. \tag{14'}$$
If $\epsilon(A_0) = \pm 1$, then Eq. (14') can be solved for
$$\Phi(A_1, A_2, \ldots) = s\,\epsilon(A_0)\big(\Phi(A_0, A_1, \ldots) - \hat A_0\big).$$
Comparing (13), we can write the right-hand side of this equation as $S\big(\Phi(A_0, A_1, \ldots)\big)$, as required. Finally, we must show that the map $\Phi : \mathcal{A}^{\mathbb{N}} \to J$ is strictly monotone, in the sense that
$$(A_0, A_1, \ldots) < (B_0, B_1, \ldots) \iff \Phi(A_0, A_1, \ldots) < \Phi(B_0, B_1, \ldots),$$
provided that both sequences satisfy the First Compatibility Condition of Sect. 2 with the same kneading data. Let $k \ge 0$ be the smallest index with $A_k \ne B_k$. It follows from (14') that
$$|\Phi(A_0, \ldots) - \hat A_0| \le \alpha/s < 1,$$
with $\Phi(A_0, \ldots) = \hat A_0$ when $A_0$ is a folding point symbol. If $k = 0$ so that $A_0 < B_0$, then it follows immediately from this inequality that $\Phi(A_0, A_1, \ldots) < \Phi(B_0, B_1, \ldots)$. Now if $k \ge 1$, then $\epsilon(A_0)$ must be non-zero, and an easy induction on $k$, using (14'), proves the corresponding inequality. □

Now suppose that we are given an $m$-modal map $f : I \to I$ of shape $\sigma$. For any non-folding point $x \in I$ the itinerary $\mathcal{I}(x) \in \mathcal{A}^{\mathbb{N}}$ maps to a real number
$$\theta(x) = \Phi(\mathcal{I}(x)) \in J,$$
which provides an invariantly defined coordinate for the point $x$ with respect to the map $f$. It follows from 5.1 together with Sect. 2 that $\theta : I \to J$ is a monotone function, in the sense that
$$x < y \;\Rightarrow\; \mathcal{I}(x) \le \mathcal{I}(y) \;\Rightarrow\; \theta(x) \le \theta(y).$$


(However, $\theta$ is certainly not continuous.) Note that $\theta(c_j) = \hat C_j$, and that $\theta(x) = \theta(y)$ if and only if $x$ and $y$ (and hence all points between $x$ and $y$) have the same itinerary. For any $x \in I_j$ it follows from 5.1 that
$$\theta(f(x)) = s\,\sigma_j\big(\theta(x) - \hat I_j\big) = S(\theta(x)). \tag{15}$$
Thus $\theta$ is a monotone but discontinuous semiconjugacy from the map $f$ on $I$ to the sawtooth map $S$ on $J$, provided that we exclude folding points. However, this identity $\theta(f(x)) = S(\theta(x))$ definitely breaks down when $x = c_j$, since $\theta(f(c_j)) \in J$ but $S(\theta(c_j)) = S(\hat C_j) \notin J$. To correct this problem, we truncate the map $S$ as follows. Let us specify some folding value vector $v = (v_1, \ldots, v_m) \in J^m$, subject to the usual inequalities (2): $v_j \le v_{j+1}$ when $\sigma_j = +1$, and $v_j \ge v_{j+1}$ when $\sigma_j = -1$, for $0 < j < m$.

Definition. The stunted sawtooth map $S_p : J \to J$ is obtained by chopping off the successive peaks and pits of the map $S$ at the heights $v_1, \ldots, v_m$, as shown in Fig. 9. Thus $S_p$ is a continuous function which takes the constant value $v_j$ throughout the largest connected neighborhood of $\hat C_j = 2j - m - 1$ on which
$$S(y) \ge v_j \ \text{ if } S(\hat C_j) = +s, \qquad S(y) \le v_j \ \text{ if } S(\hat C_j) = -s,$$
but $S_p(y)$ is equal to $S(y)$ otherwise.

Note. Here, as in Sect. 3, we parametrize the admissible folding value vectors $v \in J^m$ by vectors $p$ belonging to a standard polyhedron $P^m \subset [0, 1]^m$. Since $J = [-\alpha, \alpha]$, the transformation $p \leftrightarrow v$ of (6) is given by
$$v_j = (2p_j - 1)\,\sigma_j\,\alpha \in [-\alpha, \alpha] \quad\text{or}\quad p_j = \frac{1}{2} + \frac{\sigma_j v_j}{2\alpha} \in [0, 1]. \tag{6'}$$
It is not difficult to write down a more explicit formula as follows:
$$S_p(y) = \begin{cases} v_j & \text{if } |y - \hat C_j| \le 1 + \sigma_j v_j/s \text{ for some } j, \\ S(y) & \text{otherwise.} \end{cases} \tag{16}$$
The interval of constancy
$$\{\, y \in J \;;\; |y - \hat C_j| \le 1 + \sigma_j v_j/s \,\}$$
will be called the $j$-th plateau of $S_p$. Since $s > \alpha \ge |v_j|$, these plateaus always have strictly positive length. Note that the heights of these plateaus are just the corresponding folding values.

Caution. If $v_j = v_{j+1}$ (or equivalently if $p_j + p_{j+1} = 1$) then the $j$-th and $(j+1)$-st plateaus have a common endpoint, and together form a longer interval of constancy. More generally, it follows from (16) that the distance between the $j$-th and $(j+1)$-st plateaus is equal to $|v_{j+1} - v_j|/s \ge 0$.
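The explicit formula (16) translates directly into code. The following sketch is not from the paper: it works directly with a folding value vector $v$ (assumed to satisfy the inequalities (2)), the names `sawtooth`, `plateau` and `stunted_sawtooth` are hypothetical, and exact rationals are used so that the plateau endpoints can be compared without rounding.

```python
import math
from fractions import Fraction


def sawtooth(y, sigma, s):
    """The map S of formula (13), with lap j taken to be [2j-m-1, 2j-m+1]."""
    m = len(sigma) - 1
    j = min(max(math.floor((y + m + 1) / 2), 0), m)
    return (y - (2 * j - m)) * sigma[j] * s


def plateau(j, v, sigma, s):
    """Endpoints of the j-th plateau: |y - C_hat_j| <= 1 + sigma_j * v_j / s."""
    m = len(sigma) - 1
    c_hat = 2 * j - m - 1
    half = 1 + sigma[j] * v[j - 1] / s
    return c_hat - half, c_hat + half


def stunted_sawtooth(y, v, sigma, s):
    """Formula (16); v = (v_1, ..., v_m) is assumed to satisfy the inequalities (2)."""
    m = len(sigma) - 1
    for j in range(1, m + 1):
        left, right = plateau(j, v, sigma, s)
        if left <= y <= right:
            return v[j - 1]
    return sawtooth(y, sigma, s)


shape = (-1, 1, -1)                       # bimodal shape (- + -)
slope = Fraction(7, 2)
heights = (Fraction(-1), Fraction(2))     # v_1 <= v_2 since sigma_1 = +1
left1, right1 = plateau(1, heights, shape, slope)
left2, right2 = plateau(2, heights, shape, slope)
gap = left2 - right1
```

In this example the gap between the two plateaus equals $|v_2 - v_1|/s$, and at each plateau endpoint the untruncated map $S$ already takes the value $v_j$, which is what makes the truncated map continuous.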


By definition, the $j$-th designated folding point of $S_p$ is just the point $\hat C_j$, which can also be described as the midpoint of the $j$-th plateau. Note that every $S_p$ is boundary anchored, mapping $\partial J = \{\pm\alpha\}$ into itself by the fixed map
$$S_p(-\alpha) = -\sigma_0\,\alpha, \qquad S_p(+\alpha) = +\sigma_m\,\alpha.$$
In order to relate this construction to kneading theory, we choose the folding value vector $v$ by setting $v_j = \Phi(K_j)$, with $\Phi$ as in 5.1. In this way, we prove the following.

Theorem 5.2. To any shape $\sigma$ and any $m$-tuple $K$ of symbol sequences satisfying the Compatibility Conditions 1, 2, 3 of Sect. 2, there is associated a canonical stunted sawtooth map $S_p$ which has exactly these kneading data. Furthermore a symbol sequence in $\mathcal{A}^{\mathbb{N}}$ actually occurs as the itinerary of some point under $S_p$ if and only if it is admissible (i.e., satisfies Compatibility Conditions 1 and 2).

Proof. Let $S$ be the $m$-modal sawtooth map with shape $\sigma$, and let $S_p$ be the associated truncated map with folding values $v_j = \Phi(K_j)$. The Third Compatibility Condition guarantees that these folding values satisfy the required inequalities (2). Notice that the orbits under $S$ and $S_p$ are identical as long as they do not enter a plateau of $S_p$. Next, using Compatibility Condition 2, it is not difficult to show that the orbit of a folding value can enter the interior of a plateau of $S_p$ only at the folding point in this plateau. It follows that the folding values $v_j$ have the same itineraries, up to their first folding point if any occurs on the itinerary, under $S$ and under $S_p$. We then use the First Compatibility Condition to show that the full itinerary of $v_j$ under $S_p$ is equal to the given $K_j$. Further details are straightforward, and will be left to the reader. □

Remark. The map $S_{\mathbf{1}}$ corresponding to the vector $p = \mathbf{1} = (1, 1, \ldots, 1)$ is the unique stunted sawtooth map of shape $\sigma$ with maximal numbers of periodic orbits and maximal topological entropy.
If $X_{\mathbf{1}}$ is the set of points in $J$ whose orbit under $S_{\mathbf{1}}$ meets the various plateaus of $S_{\mathbf{1}}$ only at endpoints or midpoint, then we see that each point of $X_{\mathbf{1}}$ is uniquely characterized by its itinerary, which can be either a completely arbitrary infinite sequence in $\{I_0, I_1, \ldots, I_m\}^{\mathbb{N}}$, or any finite sequence of symbols from $\{I_0, I_1, \ldots, I_m\}$ followed by a folding point symbol $C_j$ (with $1 \le j \le m$), and then followed by either $I_{\min}$ or $I_{\max}$ according as $\sigma_j$ is $+1$ or $-1$. In particular, given any vector $K = (K_1, \ldots, K_m)$ of symbol sequences, there are unique points $v_j \in X_{\mathbf{1}}$ such that the itinerary of each $v_j$ under $S_{\mathbf{1}}$ coincides with the given $K_j$ up to the first folding point symbol (if any) in this sequence. If the three Compatibility Conditions are satisfied, then taking these points $v_j \in X_{\mathbf{1}}$ to be the folding values for a stunted sawtooth map $S_{p(v)}$, we easily obtain another proof of Theorem 5.2. (Compare [DGMT].)

Corollary 5.3. To any $m$-modal map $f : I \to I$ there is associated a canonical stunted sawtooth map $S_p : J \to J$ which has exactly the same kneading data. Furthermore, the (monotone but discontinuous) correspondence $\theta : I \to J$ semiconjugates $f$ to $S_p$. That is, $\theta(f(x)) = S_p(\theta(x))$ for all $x$.

The proof is straightforward. □

To complete the discussion in Remark 4.12, we also need the following observation.


Lemma 5.4. An $m$-modal stunted sawtooth map can have at most $m$ non-repelling periodic orbits.

For at most one periodic orbit can intersect any plateau. But any period $q$ orbit which does not meet any of the plateaus must be strictly repelling, with multiplier $\pm s^q$. □

Remark 5.5. Theorem 5.2 can be reformulated by saying that for any shape $\sigma$, the set of possible kneading data injects canonically into the convex polyhedron $P^m$. We will write $K \mapsto p(K)$. This allows us to replace the sometimes cumbersome comparisons of symbol sequences by comparisons of numbers. Evidently
$$K \gg K' \iff p_i(K) \ge p_i(K') \ \text{ for all } i,$$
with notation as in 4.5.

Remark 5.6. Conversely, suppose that $p, p' \in P^m$ satisfy $p_i \ge p'_i$ for all $i$. Then it is not hard to show that $K(S_p) \gg K(S_{p'})$, and it follows (or can be shown directly) that $\mathrm{Adm}(S_p, k) \ge \mathrm{Adm}(S_{p'}, k)$ for all $k$, that $N(S_p) \gg N(S_{p'})$, and that $h(S_p) \ge h(S_{p'})$. (Compare 4.4 and 4.13.) Similarly, an easy argument shows that an increase in $p_i$ can only increase the number of period $k$ orbits, or the number of fixed points of $S_p^{\circ k}$. (Compare the proof of 6.1.)

Lemma 5.7. The topological entropy of a stunted sawtooth map depends continuously on its parameters, or in other words on its vector of folding values.

Proof. Lower semi-continuity follows immediately from [Mis2], as noted in Sect. 4. To prove upper semi-continuity, we note that the topological entropy $h(f)$ depends only on the mapping $f$, and not on which particular points within the various plateaus are designated as folding points. However, we can choose these folding points to be disjoint from the forward orbits of all the folding values. Upper semi-continuity then follows from 4.6. □

Recall from 4.4 that topological entropy is uniquely determined by kneading data. Combining 5.2 and 5.7 we have the following sharper form of 4.6.

Corollary 5.8. Topological entropy depends continuously on kneading data.

For the canonical model of 5.2 certainly depends continuously on kneading data. (Alternatively, a direct proof of 5.8 could be based on the methods used in [MTh, Lemma 12.3].) □

Definition. Recall that a polynomial map is called “hyperbolic” if every critical point lies in the basin of some periodic attractor. To simplify the analogous discussion for a stunted sawtooth map $S_p$, we consider only the case where $S_p$ is strictly $m$-modal; that is, we assume that consecutive folding values are distinct.
It is not hard to see that a periodic orbit for such a map Sp is attracting, and remains attracting under perturbation of the map, if and only if it contains an interior point of some plateau, and hence actually absorbs all orbits in a neighborhood. Let us call a strictly m-modal Sp hyperbolic if the forward orbit of each folding value eventually lands in the interior of some plateau. Clearly this is an open condition. The generic hyperbolicity property of stunted sawtooth maps can be stated as follows. Lemma 5.9. The stunted sawtooth maps which are strictly m-modal and hyperbolic form a dense open set in the space of all stunted sawtooth maps of specified shape.


Proof of 5.9. Openness is clear. Furthermore, it is clear that the strictly $m$-modal maps are dense. Suppose inductively that every $S_p$ can be approximated by a strictly $m$-modal $S_q$ for which the orbits of the first $k - 1$ folding values eventually hit the interior of some plateau. Then we can choose some $\epsilon > 0$ so that these conditions will remain true as we change the $k$-th folding value throughout an $\epsilon$-neighborhood. Now choose an integer $n$ so that the product $s^n \epsilon$ is greater than the length of the entire interval $J$. As we move the folding value $v_k$ with unit speed through its $\epsilon$-neighborhood, its forward image under $S_q^{\circ n}$ will move with speed $\pm s^n$, so long as its orbit does not pass through any plateau. But we have chosen $n$ with $s^n$ large enough so that this is impossible. Hence some forward image must pass through a plateau, which completes the inductive construction. □

Combining 4.3 with the proof of 5.9, we have the following basic result.

Theorem 5.10. The topological entropy of a piecewise monotone map can be effectively computed to any required degree of accuracy from its kneading data.

Proof. We must produce computable upper and lower bounds for $h(f)$, arbitrarily close to each other. The proof will be by induction on the number $m$ of folding points. To construct a lower bound, first suppose that the kneading data $K = K(f)$ satisfies $K_i \ne K_{i+1}$ for every $i$, so that the associated stunted sawtooth map $S_p$, with $p = p(K)$ as in 5.5, has distinct adjacent folding values. Then using the argument above we see easily that there exists a hyperbolic map $S_{p'}$ so that $p'$ is arbitrarily close to $p$, and with $p'_i \le p_i$ for all $i$. Here $h(S_{p'})$ is effectively computable by 4.3, and can be chosen arbitrarily close to $h(f) = h(S_p)$ by 5.7. On the other hand, if $S_p$ has two adjacent plateaus at the same height, then by ignoring the two corresponding critical points it can be considered as an $(m - 2)$-modal map, and the conclusion follows by induction.
A completely analogous argument produces a computable upper bound $h(f) \le h(S_{p''})$ which is arbitrarily close to $h(f)$. One simply chooses $p''$ close to $p$, with $p''_i \ge p_i$ for all $i$, so that the orbit of each critical value is eventually periodic. In fact if $p_i = 1$, then the associated orbit is already eventually periodic, while if $p_i < 1$ then arguing as in 5.9 we can choose $p''_i$ so that its orbit eventually hits a plateau. □

Remarks. A closely related algorithm, more explicitly worked out, is described in [BST]. For the special case of a bimodal map, Block and Keesling [BK] have given a fast algorithm, based on comparison with maps of |slope| = constant. (This was used for the plots in Fig. 8.) Note that the question of effective computability of entropy for more general dynamical systems remains open: compare [HKC].

As a corollary, we can give another proof of two basic results from Sect. 4.

Proof of 4.11 and 4.8. We first show that
$$h(f) = \limsup_{k \to \infty}\ \log^+\!\big(\mathrm{Neg}(f^{\circ k})\big)/k$$
for every piecewise monotone map. This statement is certainly true for the approximating maps $S_{p'}$ and $S_{p''}$ of 5.10, by Lemma 4.3, together with 4.10 and 5.4. Since $N(S_{p'}) \ll N(S_p) = N(f) \ll N(S_{p''})$ by 4.9 and 5.6, the corresponding statement for $f$ follows. The inequality $h_{\mathrm{per}}(f) \ge h(f)$ is an immediate corollary. □


6. Contractibility of Isentropes for the Stunted Sawtooth Family

We consider the family of stunted sawtooth maps of some specified shape $\sigma = (\sigma_0, \ldots, \sigma_m)$, which remains fixed throughout this section. Parameterize this family by the standard polyhedron $P^m$, consisting of all vectors $p = (p_1, \ldots, p_m) \in [0, 1]^m$ satisfying
$$p_i + p_{i+1} \ge 1 \quad\text{for } 1 \le i < m,$$
as described in Sect. 3. Let $h^{\mathrm{saw}}_\sigma : P^m \to [0, \log(m + 1)]$ be the topological entropy function $h^{\mathrm{saw}}_\sigma(p) = h_{\mathrm{top}}(S_p)$. These functions are continuous by Lemma 5.7.

Caution. This function $h^{\mathrm{saw}}_\sigma : P^m \to [0, \log(m + 1)]$ definitely depends on the choice of shape $\sigma$, and also on the fact that we are working with the family of stunted sawtooth maps with specified slope $s > m + 1$, rather than some other family. For $m$ odd, the two choices of $\sigma$ are related by a canonical involution of $P^m$. However for $m$ even it is important to realize that the two possible choices of shape yield families which are essentially different, and have quite different topological entropy functions.

We will use the notation $\{p \in P^m \,;\, h^{\mathrm{saw}}_\sigma(p) = h_0\}$, or briefly $\{h^{\mathrm{saw}}_\sigma = h_0\}$, for the isentrope consisting of all $p \in P^m$ with topological entropy $h^{\mathrm{saw}}_\sigma(p)$ equal to $h_0$. Similarly we sometimes write $\{h^{\mathrm{saw}}_\sigma \le h_0\}$ for the compact subset $\{p \in P^m \,;\, h^{\mathrm{saw}}_\sigma(p) \le h_0\}$. The object of this section is to prove the following.

Theorem 6.1. For each $h_0 \in [0, \log(m + 1)]$ the isentrope $\{p \in P^m \,;\, h^{\mathrm{saw}}_\sigma(p) = h_0\}$ is contractible.

The proof will be based on two lemmas. We first construct a partial ordering $\gg$ of the polyhedron $P^m$ so that if $p \gg q$ then the corresponding stunted sawtooth map $S_p$ has at least as many periodic orbits as does $S_q$. By 4.10, this will imply that $h^{\mathrm{saw}}_\sigma(p) \ge h^{\mathrm{saw}}_\sigma(q)$. Within the interior of $P^m$ we can simply say that $p \gg q$ if and only if $p_j \ge q_j$ for all $j$. (As any coordinate $p_j$ increases, any periodic orbit which intersects the $j$-th plateau deforms continuously, while any other periodic orbit remains unchanged.)
However, to take care of implications on the boundary we must give a more complicated definition. It will be convenient to say that $p$ contains a level block of length $\ell$ if there are indices $1 \le i_0 \le j_0 \le m$ with $\ell = j_0 - i_0 + 1$ so that
$$p_i + p_{i+1} = 1 \quad\text{for } i_0 \le i < j_0.$$
This means that the corresponding folding value vector $v = v(p)$ satisfies $v_{i_0} = v_{i_0+1} = \cdots = v_{j_0}$, so that the corresponding map $S_p$ has $\ell$ consecutive plateaus at the same level. (Here it is convenient to allow the uninteresting case $\ell = 1$.) We now define $\gg$ to be the smallest transitive relation on $P^m$ which satisfies the following condition: If $p$ coincides with $q$ except that $p$ contains a level block $p, 1 - p, p, \ldots, 1 - p, p$ of odd length $\ell \ge 1$ which is replaced in $q$ by a corresponding block $q, 1 - q, q, \ldots, 1 - q, q$ of the same length, where $p \ge q$, then $p \gg q$.
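The notions of the polyhedron $P^m$ and of level blocks are purely combinatorial, so they can be illustrated with a short sketch. The function names `in_polyhedron` and `maximal_level_blocks` below are hypothetical, indices in the returned pairs are 1-based to match the text, and exact rationals are used because the definition of a level block rests on the exact equality $p_i + p_{i+1} = 1$.

```python
from fractions import Fraction


def in_polyhedron(p):
    """Membership in P^m: each p_i in [0, 1] and p_i + p_{i+1} >= 1."""
    return all(0 <= x <= 1 for x in p) and all(
        a + b >= 1 for a, b in zip(p, p[1:])
    )


def maximal_level_blocks(p):
    """Split the indices 1..m into maximal runs (i0, j0) with p_i + p_{i+1} == 1
    for i0 <= i < j0.  Blocks of length 1 are allowed, as in the text."""
    blocks, start = [], 1
    for i in range(1, len(p)):          # i is the 1-based index of the pair (p_i, p_{i+1})
        if p[i - 1] + p[i] != 1:
            blocks.append((start, i))   # the run ends at index i
            start = i + 1
    blocks.append((start, len(p)))
    return blocks


example = (Fraction(3, 4), Fraction(1, 4), Fraction(3, 4), Fraction(1), Fraction(1, 2))
blocks = maximal_level_blocks(example)
lengths = [j0 - i0 + 1 for i0, j0 in blocks]
```

Here the first three coordinates form a level block $p, 1-p, p$ of odd length 3 with $p = 3/4$, which is exactly the kind of block that the defining condition for the relation on $P^m$ allows one to lower.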


Lemma 6.2. For each $0 \le h_0 \le \log(m + 1)$, the isentrope $\{h^{\mathrm{saw}}_\sigma = h_0\}$ is a deformation retract of the region $\{h \le h_0\}$.

Proof of 6.2. Starting with any point $\hat p \in P^m$ we construct a topological entropy increasing path $t \mapsto p(t) \in P^m$ for $0 \le t \le 1$ with $p(0) = \hat p$ and $h^{\mathrm{saw}}_\sigma(p(1)) = \log(m + 1)$. In fact, let
$$p_j(t) = \min(\hat p_j + t,\ 1).$$
Alternatively, this deformation can be described by the differential equation
$$\frac{dp_j}{dt} = \begin{cases} +1 & \text{if } p_j < 1, \\ 0 & \text{if } p_j = 1. \end{cases}$$
Clearly the resulting path depends continuously on $\hat p$ and $t$, takes values in $P^m$, and satisfies $p(t) \ll p(t')$ whenever $t \le t'$. In particular, it follows that $h^{\mathrm{saw}}_\sigma(p(t)) \le h^{\mathrm{saw}}_\sigma(p(t'))$ whenever $t \le t'$. It will be convenient to use the norm
$$\|p - q\| = \max_j |p_j - q_j|$$
for $p, q \in P^m$. If $\|\hat p - \hat q\| < \epsilon$, then clearly $\|p(t) - q(t)\| < \epsilon$ for $t \in [0, 1]$, and furthermore
$$p(t + \epsilon) \gg q(t), \qquad q(t + \epsilon) \gg p(t).$$
Now suppose that $h^{\mathrm{saw}}_\sigma(\hat p) \le h_0$. Let $t(\hat p)$ be the smallest value of $t \in [0, 1]$ with $h^{\mathrm{saw}}_\sigma(p(t)) = h_0$. If $\|\hat p - \hat q\| < \epsilon$, then it follows that $|t(\hat p) - t(\hat q)| < \epsilon$. Therefore, the homotopy which is defined by
$$h_t(\hat p) = \begin{cases} p(t) & \text{if } t \le t(\hat p), \\ p(t(\hat p)) & \text{if } t \ge t(\hat p) \end{cases}$$
yields a deformation retraction from $\{h^{\mathrm{saw}}_\sigma \le h_0\}$ onto the isentrope $\{h^{\mathrm{saw}}_\sigma = h_0\}$. □

Lemma 6.3. The region $\{h^{\mathrm{saw}}_\sigma \le h_0\} \subset P^m$ is contractible.
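The path $p_j(t) = \min(\hat p_j + t, 1)$ from the proof of Lemma 6.2 can be checked numerically on examples. The sketch below is only an illustration of the elementary properties used in that proof (the path stays in $P^m$, is coordinatewise non-decreasing, ends at $(1, \ldots, 1)$, and does not increase distances in the max norm); it does not compute entropy, and the names `path`, `in_polyhedron`, `sup_norm` and `dominates` are hypothetical.

```python
from fractions import Fraction


def path(p_hat, t):
    """The deformation p_j(t) = min(p_hat_j + t, 1) used in the proof of Lemma 6.2."""
    return tuple(min(x + t, 1) for x in p_hat)


def in_polyhedron(p):
    """Membership in P^m: each p_i in [0, 1] and p_i + p_{i+1} >= 1."""
    return all(0 <= x <= 1 for x in p) and all(
        a + b >= 1 for a, b in zip(p, p[1:])
    )


def sup_norm(p, q):
    """The norm ||p - q|| = max_j |p_j - q_j|."""
    return max(abs(a - b) for a, b in zip(p, q))


def dominates(p, q):
    """Coordinatewise comparison p_j >= q_j for all j."""
    return all(a >= b for a, b in zip(p, q))


p_hat = (Fraction(1, 2), Fraction(1, 2), Fraction(3, 4))
q_hat = (Fraction(5, 8), Fraction(1, 2), Fraction(3, 4))
times = [Fraction(k, 8) for k in range(9)]          # t = 0, 1/8, ..., 1
eps = Fraction(1, 4)                                # any eps > ||p_hat - q_hat||
```

For these two starting points, shifting one path forward by `eps` makes it dominate the other coordinatewise, which is the comparison $p(t + \epsilon) \gg q(t)$ used in the proof to control $|t(\hat p) - t(\hat q)|$ (at least in the interior of $P^m$, where the relation reduces to the coordinatewise one).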

Proof of 6.3. The first step is to construct a topological entropy decreasing deformation $(\hat p, t) \mapsto p(t)$ which continuously “flattens out the bumps” on the corresponding stunted sawtooth maps $S_{p(t)}$. This deformation is defined by a differential equation as follows. Let $p, 1 - p, p, \ldots$ be any maximal level block contained in $p$, and let $\ell \ge 1$ be its length. We set
$$\frac{dp}{dt} = \begin{cases} -1 & \text{if } \ell \text{ is odd and } p > 0, \text{ and} \\ 0 & \text{if } \ell \text{ is even or } p = 0. \end{cases}$$
(Thus the majority of the entries in a block of odd length move down, but a minority of the entries move up.) Evidently topological entropy decreases monotonically along each path. Furthermore the quantity $p_1 + \cdots + p_m$, which is linearly related to the total variation of the map $S_p$, decreases with derivative $\le -1$ until we reach a stationary state, corresponding to a monotone map with topological entropy equal to zero. This proves that the sub-polyhedron consisting of such stationary states is a deformation retract of the region $\{h^{\mathrm{saw}}_\sigma \le h_0\}$. In the case $m$ odd, this sub-polyhedron consists of a single point, corresponding to the constant map. This completes the proof of 6.3 for $m$ odd.


For $m = 2k$, this sub-polyhedron is a $k$-simplex, consisting of all
$$(p_1,\ 1 - p_1,\ p_3,\ 1 - p_3,\ \ldots,\ p_{m-1},\ 1 - p_{m-1})$$
with $p_1 \le p_3 \le \cdots \le p_{m-1}$. Since this simplex is itself contractible, this completes the proof of 6.3. □

Theorem 6.1 now follows, since a retract of a contractible space is clearly contractible (if $F_\lambda$ is a contraction of the total space, $r$ a retraction, and $e$ the embedding of the retract in the total space, then $r \circ F_\lambda \circ e$ is a contraction of the retract). □

7. Bones in $P^2$

The stunted sawtooth families are well understood, but the polynomial families of the same shape are poorly understood beyond the unimodal case. Our analysis of the relationship between these two families in the cubic case will rely on the study of parameter points for which at least one of the two critical orbits is periodic. We will need a precise terminology for describing periodic orbits. As in Sect. 1, let $o$ be a cyclic permutation of the integers $\{1, 2, \ldots, q\}$. By definition, a periodic orbit $O = \{x_1, \ldots, x_q\}$ of an interval map $f$ is said to have order type $o$ if $f$ maps each $x_i$ to $x_{o(i)}$, where $x_1 < x_2 < \cdots < x_q$. We will sometimes use the notation $o = (i_1\, i_2 \ldots i_q)$ for the permutation which satisfies $o(i_j) = i_{j+1}$, so that $f(x_{i_j}) = x_{i_{j+1}}$. Here the subscripts $j$ are to be taken modulo $q$.

Definition. A cyclic permutation $o$ will be called $m$-modal of shape $\sigma$ if there exists an $m$-modal map of shape $\sigma$ which has a periodic orbit with order type $o$. It will be called strictly $m$-modal of shape $\sigma$ if there exists such a map with all $m$ of its critical points on this orbit. Equivalently, this means that there are integers
$$1 \le \gamma_1 < \gamma_2 < \cdots < \gamma_m \le q,$$
where $x_{\gamma_1} < \cdots < x_{\gamma_m}$ are to be the critical points, with the following property. Setting $\gamma_0 = 1$, $\gamma_{m+1} = q$, the restriction of the permutation $o$ to the integers in each interval $[\gamma_i, \gamma_{i+1}]$ must be either monotone increasing or monotone decreasing according as $\sigma_i$ equals $+1$ or $-1$.
Of course this condition implies that the period $q$ must satisfy $q \ge m$. Given any periodic orbit of this order type, for a map of shape $\sigma$, note that the address of the orbit point $x_i$ is necessarily given by
$$A(x_i) = \begin{cases} I_0 & \text{if } i < \gamma_1, \\ I_j & \text{if } \gamma_j < i < \gamma_{j+1}, \\ I_m & \text{if } \gamma_m < i. \end{cases}$$
However, when $i$ is precisely equal to $\gamma_j$ the address $A(x_i)$ is not uniquely determined: it can be $C_j$ but it can also be either of the adjacent intervals $I_{j-1}$ or $I_j$. Thus altogether there are $3^m$ distinct possibilities. It follows that there are $3^m$ different possibilities for the kneading type of the periodic orbit (that is, the itinerary of a representative point, which is well defined up to a shift). Compare the proof of 7.1 below.
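The combinatorial criterion for a cyclic permutation to be strictly $m$-modal of shape $\sigma$ can be tested by brute force over the possible choices of $\gamma_1 < \cdots < \gamma_m$. The sketch below is an illustration under the stated criterion only; the names `from_cycle` and `is_strictly_modal` are hypothetical, a permutation is stored as the tuple $(o(1), \ldots, o(q))$, and a shape is a tuple of signs $(\sigma_0, \ldots, \sigma_m)$.

```python
from itertools import combinations


def from_cycle(cycle):
    """Convert cycle notation (i_1 i_2 ... i_q), with o(i_j) = i_{j+1},
    into the tuple (o(1), ..., o(q))."""
    q = len(cycle)
    perm = [0] * q
    for k, i in enumerate(cycle):
        perm[i - 1] = cycle[(k + 1) % q]
    return tuple(perm)


def is_strictly_modal(perm, sigma):
    """Test the criterion from the text: there exist 1 <= g_1 < ... < g_m <= q
    such that, with g_0 = 1 and g_{m+1} = q, the permutation restricted to each
    [g_i, g_{i+1}] is increasing if sigma_i = +1 and decreasing if sigma_i = -1."""
    q, m = len(perm), len(sigma) - 1

    def monotone(lo, hi, sign):
        vals = perm[lo - 1:hi]          # values o(lo), ..., o(hi)
        return all(sign * (b - a) > 0 for a, b in zip(vals, vals[1:]))

    for gammas in combinations(range(1, q + 1), m):
        cuts = (1,) + gammas + (q,)
        if all(monotone(cuts[i], cuts[i + 1], sigma[i]) for i in range(m + 1)):
            return True
    return False


PLUS = (1, -1, 1)     # shape (+ - +)
MINUS = (-1, 1, -1)   # shape (- + -)
period2 = from_cycle((1, 2))
period3 = from_cycle((1, 2, 3))
period4 = from_cycle((1, 2, 4, 3))
```

Applied to the order types of Fig. 10, this test accepts $(123)$ for both bimodal shapes, while the period 2 order type and $(1243)$ are accepted only for shape $(+ - +)$. A period 1 orbit is rejected for both shapes, since no two distinct values $\gamma_1 < \gamma_2$ fit into $\{1\}$.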


Fig. 10. The period 3 order type o = (123) is strictly bimodal of shape either (+ − +) or (− + −), but the period 2 order type is strictly bimodal only of shape (+ − +), and the period 4 order type o = (1243) is bimodal only of shape (+ − +).

It is not difficult to check that a bimodal order type of shape (+ − +) is strictly (+ − +)-bimodal if and only if its period satisfies q ≥ 2. On the other hand, a bimodal order type of shape (− + −) is strictly (− + −)-bimodal if and only if q ≥ 3. In order to relate two families of the same shape in the bimodal case, we introduce some terminology from MacKay and Tresser [MaT1]. As in Sect. 3, consider a family of bimodal maps of some fixed shape σ , parameterized by the triangle P 2 . Definition. By a bone in the parameter space P 2 we mean the compact set consisting of all parameter values for which a specified critical point has periodic orbit with specified order type. More precisely, the left bone B− (o) is the set of parameter values for which the left hand critical point is periodic with order type o. The dual right bone B+ (o) is the set of parameter values for which the right critical point is periodic with this same order type. Note that two left bones, or two right bones, are disjoint, almost by definition. These definitions make sense either for the stunted sawtooth family or for the cubic family. We will insert the superscript “saw” respectively “cub” in order to distinguish these two cases. Similarly, we will use the notation P saw or P cub for the parameter triangle P 2 when we want to emphasize that it is being considered as the parameter space for stunted sawtooth maps or for cubic maps. In discussing the parameter triangle P 2 , we will refer to the vertex p = (1, 1), corresponding to a map of entropy log 3, as the top vertex. The opposite edge, corresponding to monotone maps, with entropy zero, will be called the bottom edge of P 2 . In either the stunted sawtooth or the cubic family of shape σ , it is not difficult to check that a bone B± (o) is non-vacuous if and only if its order type o is bimodal of shape σ . We will concentrate on order types which are strictly σ -bimodal. 
In the three exceptional cases where $o$ is not strictly bimodal of this shape (that is for period two with shape (− + −) or for period one with either shape), the bones behave rather differently. For example, in these cases only, the corresponding bones intersect the bottom edge of the triangle $P^2$ (see Fig. 14). These exceptional cases will play only a minor role in our argument. (Compare Sect. 8.) Suppose then that $o$ is strictly bimodal of shape $\sigma$. We consider first the stunted sawtooth case, which is much easier to analyze.

Lemma 7.1. For either bimodal shape $\sigma$ and for each strictly $\sigma$-bimodal order type $o$:

(i) Each bone $B^{\mathrm{saw}}_\pm(o)$ is a simple arc with both endpoints on a common vertical or horizontal edge of the triangle $P^2 = P^{\mathrm{saw}}$, and is made up out of three straight line segments which are alternately horizontal and vertical.

(ii) The dual bones $B^{\mathrm{saw}}_-(o)$ and $B^{\mathrm{saw}}_+(o)$ intersect transversally in exactly two points. The intersection at the midpoints of their middle line segments will be called their primary intersection point. It corresponds to the unique map $S_p$ for which both critical points lie on a common periodic orbit of order type $o$. There is also a secondary intersection point, higher in $P^{\mathrm{saw}}$, corresponding to the unique map $S_p$ which has two disjoint critical orbits, both periodic with the same order type $o$.

(iii) This pair of dual bones cuts the triangle $P^{\mathrm{saw}}$ into five regions. There is a large region which contains the entire bottom edge of $P^{\mathrm{saw}}$ and contains the primary intersection point in its boundary. Adjoining this are two side regions, each touching just one edge of $P^{\mathrm{saw}}$. Next there is a central region, which is disjoint from all other bones so that the topological entropy takes a constant value (equal to the logarithm of an algebraic unit) throughout. Finally, there is a top region whose boundary contains both the secondary intersection point and the top vertex of $P^{\mathrm{saw}}$.

(iv) For $o \ne o'$, the bones $B^{\mathrm{saw}}_-(o)$ and $B^{\mathrm{saw}}_+(o')$ intersect transversally in either 0, 2, or 4 points. Each of these intersection points corresponds to a map with two disjoint periodic critical orbits. Again the entropy is the logarithm of an algebraic unit.

Fig. 11. The period 2 bones for the stunted sawtooth family of shape (+ − +) on the left, and for the corresponding cubic family on the right, with the primary intersection points emphasized. Arrows point in the direction of increasing dynamic complexity. (The pictures for either shape $\sigma$ and for any strictly bimodal order type of this shape would look qualitatively the same.) (Labels in the figure: $B_-$, $B_+$.)

Proof of (i), (ii) and (iv). Since the order type $o$ is strictly bimodal of shape $\sigma$, it follows easily from 5.3 that there exists a point $p^0 = (p_1^0, p_2^0) \in P^{\mathrm{saw}}$ so that both critical points of the map $S_{p^0}$ lie in a common period $q$ orbit with order type $o$. (The argument will show that this point $p^0$ is unique. Compare 8.1 below.) Evidently there is a unique $0 < n < q$ so that the iterate $S_{p^0}^{\circ n}$ maps the critical point $\hat C_1$ to $\hat C_2$ while $S_{p^0}^{\circ (q-n)}$ maps $\hat C_2$ to $\hat C_1$. The kneading type $I^0$ of this periodic orbit, that is the itinerary of a representative point (well defined up to a shift), is periodic of period $q$, and contains each of the critical point symbols $C_1$ and $C_2$ exactly once in each period. For an arbitrary periodic orbit of the same order type, as noted above, there are $3^m = 9$ different possibilities for the kneading type: If we replace $(p_1^0, p_2^0)$ by some nearby point $(p_1, p_2)$, then there will still be a unique periodic orbit which intersects both plateaus. However, depending on the sign of $p_2 - p_2^0$ and of $p_1 - p_1^0$, we can replace the symbol $C_1$ in the kneading type $I^0$ by either of the two adjacent symbols $I_0$ or $I_1$, and we can replace $C_2$ by either $I_1$ or $I_2$. No other replacements are possible. In particular, for any orbit of order type $o$ for a bimodal map of shape $\sigma$, the remaining $q - 2$ symbols must remain unchanged.

To fix our ideas, consider a left bone $B_-(o)$. For $p \in B_-(o)$ the left critical point $\hat C_1$ is periodic. Evidently each plateau of a stunted sawtooth map can contain at most one periodic point, and for $p \in B_-(o)$ it is easy to check that only the point $S_p^{\circ n}(\hat C_1)$ in the periodic orbit of $\hat C_1$ can lie in the right hand plateau. We divide the discussion into three cases, corresponding to the three line segments which make up the bone $B_-(o)$.

Case 1. If $S_p^{\circ n}(\hat C_1) = S_p^{\circ (n-1)}(v_1)$ does lie in the right hand plateau, then the iterate $S_p^{\circ (q-n-1)}$ must map the critical value $v_2 = S_p(\hat C_2)$ to $\hat C_1$. Since the partial itinerary from $v_2$ to $\hat C_1$ is uniquely determined by the order type, this yields a linear equation which we can solve for $v_2$, and hence for $p_2 = p_2^0$.

Cases 2, 3. On the other hand, if $S_p^{\circ n}(\hat C_1)$ lies either to the left or to the right of the right hand plateau, then the entire partial itinerary from $v_1$ back to $\hat C_1$ is uniquely determined. In either of these two cases, this yields a linear equation which we can solve for $v_1$.

Alternatively, we can describe these three cases as follows. Suppose that we fix $p_2 = p_2^0$ and vary $p_1$ throughout a neighborhood of $p_1^0$. Then the size of the right hand plateau remains fixed, and there will be a largest interval $[p_1^-, p_1^+]$, symmetric about $p_1^0$, so that, for $p_1$ in this interval, the image $S_p^{\circ (n-1)}(v_1)$ lies in the right-hand plateau. For the extreme values $p_1 = p_1^\pm$, this image will lie at either end of the plateau. Now fix $p_1 = p_1^\pm$ and let $p_2$ vary over the interval $[p_2^0, 1]$.
As p2 increases, the right-hand b1 plateau will move up or down (depending on σ ), shrinking in length, but the orbit of C will remain fixed, missing the right-hand plateau completely when p2 > p20 . It follows that B − (o) is the union of three line segments:

$$B_-(o) \;=\; p_{1-} \times [p_{20}, 1] \;\cup\; [p_{1-}, p_{1+}] \times p_{20} \;\cup\; p_{1+} \times [p_{20}, 1].$$

The discussion of right bones is completely analogous. This proves Part (i); and Parts (ii), (iv) follow easily. □

Proof of (iii). It is easy to see that a pair of dual bones partitions $P^{\mathrm{saw}}$ into five regions. To prove that the central region is disjoint from all other bones, let us follow the left bone $B_-(o)$ from the primary intersection point $p_0$ to the secondary intersection point. For each point $p$ in this path, the image $C'(p) = S_p^{\circ n}(\widehat{C}_1)$ is periodic, belonging to the orbit of $\widehat{C}_1$. Let $J_p$ be the interval with endpoints $\widehat{C}_2$ and $C'(p)$. For $p = p_0$ this interval degenerates to a point, and for $p$ in a large neighborhood of $p_0$ the iterate $S_p^{\circ q}$ maps all of $J_p$ to the endpoint $C'(p)$. However, as $p$ moves towards the secondary intersection point, eventually its image under $S_p^{\circ q}$ will become bigger. Nevertheless, the restriction of $S_p^{\circ q}$ to $J_p$ will remain monotone, and its image will remain a proper subset of $J_p$ until $p$ reaches the secondary intersection point, at which time $S_p^{\circ q}(\widehat{C}_2)$ will equal $\widehat{C}_2$, so that $J_p$ maps onto itself with both endpoints fixed. Until $p$ reaches this point, clearly $\widehat{C}_2$ will not be periodic, so no other bone can cross through to the central region. The statement that topological entropy is constant in any region without bones follows easily from 4.11. (Compare 9.1 below.) Since the boundary of the central region contains parameters for which all turning points are periodic, this constant value must be the logarithm of an algebraic unit by 4.3. This proves (iii), and completes the proof of 7.1. □

The statement corresponding to Lemma 7.1 for the cubic family is much more difficult. Here is some preliminary information. Again let $o$ be strictly bimodal of shape $\sigma$.

Lemma 7.2. Both bones $B_\pm^{\mathrm{cub}}(o)$ are smooth 1-dimensional manifolds with boundary. Further, the boundary of $B_-^{\mathrm{cub}}(o)$ is equal to the transversal intersection of this bone with the horizontal (top) edge of the triangle $P^{\mathrm{cub}}$, while the boundary of $B_+^{\mathrm{cub}}(o)$ is equal to the transversal intersection of this bone with the vertical (right-hand) edge of $P^{\mathrm{cub}}$. Any intersection $B_-^{\mathrm{cub}}(o) \cap B_+^{\mathrm{cub}}(o')$ between two such bones is also transverse.

(If we make use of Heckman’s Theorem that there are no bone-loops, then we have the much sharper statement that each bone is a connected simple arc.)

The proof of 7.2 begins as follows. First consider the corresponding statement for the family of complex maps $z \mapsto z^3 - 3a^2 z + b$, with critical points $\pm a$. It is proved in [M3] that the locus $S_\pm(p)$ of points for which $\pm a$ has period $p$ is a smooth complex curve. Furthermore, for each $p$ and $q$ the curves $S_+(p)$ and $S_-(q)$ intersect transversally. (See [M2].) In fact, $S_+(p)$ has transverse intersection with any curve consisting of points for which the other critical point $-a$ is preperiodic. The proofs make essential use of quasi-conformal surgery. Compare [St], where analogous results for quadratic rational maps are proved by similar methods. Alternatively, these statements have been proved by quite different methods by Epstein [E].

Restricting to the real $(a, b)$-plane, we obtain a corresponding statement for real cubic maps of shape $(+-+)$. In the family of real maps

$$x \mapsto x^3 - 3a^2 x + b, \tag{17$_+$}$$

the locus of pairs $(a, b)$ for which $a$ (or $-a$) is periodic of period $p$ forms a smooth 1-dimensional manifold without boundary. A complex linear change of coordinates yields the corresponding statement for the family of maps

$$x \mapsto -(x^3 - 3a^2 x + b), \tag{17$_-$}$$

of shape $(-+-)$. In order to relate this to the family of cubic maps $f_p$, we note that each $f_p$ is positively affinely conjugate to a unique map in the normal form $(17_\pm)$ with $a \ge 0$. That is, for each $p$ there is a unique affine map $L(x) = cx + d$ with $c > 0$ so that $L \circ f_p \circ L^{-1}$ has the required form. It follows that there is a well defined continuous mapping $\varphi : P^{\mathrm{cub}} \to \mathbb{R}^2$ which associates to each $p \in P^{\mathrm{cub}}$ the pair $\varphi(p) = (a, b)$ in the half-plane $a \ge 0$. Evidently the pre-image of the curve $S_\pm(p)$ under $\varphi$ is the union of all bones $B_\pm^{\mathrm{cub}}(o)$ of period $p$. If $\varphi$ were a diffeomorphism, then it would follow immediately that each bone is a smooth 1-manifold, with boundary points precisely at the intersections with the horizontal or vertical part of the boundary of $P^{\mathrm{cub}}$. In fact, the situation is somewhat more complicated, and can be described as follows. (Compare [DGMT, Figs. 8, 9].)

Lemma 7.3. Let $P^{\mathrm{cub}}$ be the parameter triangle for cubic maps of specified shape $\sigma$, and let $U \subset P^{\mathrm{cub}}$ be the open subset consisting of those parameter values $p$ such that the boundary $\partial I$ is strictly repelling for the associated cubic map $f_p : I \to I$. Then every bone of strictly bimodal order type in $P^{\mathrm{cub}}$ is contained in this open set $U$. Furthermore, the map $\varphi : P^{\mathrm{cub}} \to \mathbb{R}^2$ embeds $U$ diffeomorphically into $\mathbb{R}^2$.
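The normalization used to define the pair (a, b) can be made concrete. The following is only a sketch under stated assumptions: the cubic is given directly by its four real coefficients (the parametrization of the family f_p is not used), and the function name `cubic_normal_form` is hypothetical. It computes the affine map L(x) = cx + d with c > 0 for which L ∘ f ∘ L⁻¹ has the form ±(x³ − 3a²x + b), by matching the leading coefficient and killing the quadratic term.

```python
import math


def cubic_normal_form(c3, c2, c1, c0):
    """Normalize f(x) = c3*x^3 + c2*x^2 + c1*x + c0 by L(x) = c*x + d, c > 0.

    Returns (a, b, c, d, sign) such that
        L(f(L^{-1}(y))) = sign * (y^3 - 3*a^2*y + b),  with a >= 0.
    Hypothetical helper for illustration; it assumes f has two real
    critical points (possibly coinciding).
    """
    if c3 == 0:
        raise ValueError("leading coefficient must be non-zero")
    sign = 1.0 if c3 > 0 else -1.0
    c = math.sqrt(abs(c3))      # makes the leading coefficient equal to sign
    t = -c2 / (3.0 * c3)        # inflection point of f; it must map to 0 under L
    d = -c * t                  # so that L(t) = 0, which removes the quadratic term
    slope = 3.0 * c3 * t * t + 2.0 * c2 * t + c1   # f'(t): linear coefficient of the conjugate
    value = ((c3 * t + c2) * t + c1) * t + c0      # f(t)
    a_squared = -sign * slope / 3.0
    if a_squared < 0:
        raise ValueError("f has no real critical points")
    a = math.sqrt(a_squared)
    b = sign * (c * value + d)  # constant term of the conjugate is c*f(t) + d
    return a, b, c, d, sign


if __name__ == "__main__":
    # f is the conjugate of y^3 - 12*y + 0.5 (a = 2, b = 0.5) by L(x) = 2*x + 1.
    print(cubic_normal_form(4.0, 6.0, -9.0, -5.75))
```

Since c and d are determined by the two leading coefficients alone, the uniqueness of L asserted in the text is visible in the computation: once the sign of c is fixed, nothing is left to choose.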

Fig. 12. Graphs of cubic maps of shape (+ − +) representing successive points along the left bone of period two. Illustrated are the “lower” endpoint of this bone, the primary intersection point where dynamic complexity is minimized, the secondary intersection point, and the “upper” endpoint. In the third graph there are three different period two orbits, all contained in the emphasized intervals, and these period two orbits all persist to the right-hand graph, which has the largest negative orbit complexity

Clearly Lemma 7.2 will follow from 7.3, together with the discussion above.

Proof of 7.3. If $p$ belongs to the complement $P^{\mathrm{cub}} \setminus U$, then some boundary orbit of period one or two for $f_p$ has multiplier in the interval $[0, 1]$. Since $f_p$ has negative Schwarzian derivative, this implies that there is at least one critical point in the immediate basin, and this clearly implies that the interval between the two critical points is also contained in this immediate basin. Thus there cannot be any periodic critical orbit, unless one and hence both critical points are actually contained in the boundary $\partial I$. This occurs only if $p$ is one of the two lower corner points of the parameter triangle. Evidently the corresponding order type $o$ is not strictly bimodal. (These zero entropy corner points are quite anomalous, since each one represents an isolated point of the associated bone of period one or two.)

For any $p \in U$ the boundary periodic orbit (in the $(-+-)$ case) or orbits (in the $(+-+)$ case) are strictly repelling, and hence vary smoothly under smooth deformation of the polynomial. It follows that there is a smooth local inverse function from the image $\varphi(U) \subset \mathbb{R}^2$ back to $U$. This inverse function is single valued, since otherwise we would have two different intervals, say $I$ and $I'$, both containing the two critical points, so that the same polynomial function $f_p$ maps each of these intervals into itself with strictly repelling boundary. Then any component of $I \setminus I'$ would map diffeomorphically into itself under $f_p$ or $f_p \circ f_p$, with both boundary points repelling, which is impossible for a map of negative Schwarzian. This completes the proof of 7.3 and 7.2. □

Remark 7.4. There is a fundamental relationship between bones and negative orbit complexity. (Compare 9.1.) In fact, as we traverse a path in the parameter triangle, the number of fixed points of negative type for the $n$th iterate can clearly change only as we cross a bone of period dividing $n$.
More precisely, as we cross a period $p$ bone in the ‘positive’ direction, as indicated in Fig. 11, it turns out that exactly one new periodic orbit of negative type is created, with period equal to $p$. This fact can be used to count the number of period $p$ bones. In the $(+-+)$-case, note that the number of fixed points of negative type for the $n$th iterate increases from zero, along the bottom edge of the parameter triangle, to a maximum of $(3^n - 1)/2$ for the map, say $S_{(1,1)}$, corresponding to the top (high entropy) vertex of the triangle. It follows easily that #Bones($p$), the number of period $p$ bones in the parameter triangle (suitably interpreted when $p = 1$), is precisely equal to the number of period $p$ orbits of negative type for the map $S_{(1,1)}$.

From this, we derive the following formula, which can be used to compute #Bones($p$):

$$\frac{3^n - 1}{2} \;=\; \sum_{\substack{p \,\mid\, n \\ n/p \ \mathrm{odd}}} p \cdot \#\mathrm{Bones}(p).$$

For example:

    p              1   2    3    4     5     6      7      8      9
    (3^p − 1)/2    1   4   13   40   121   364   1093   3280   9841
    #Bones(p)      1   2    4   10    24    60    156    410   1092
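The tabulated values can be recomputed directly from the summation formula. The sketch below solves the formula recursively for #Bones(n), and also cross-checks the count (3^n − 1)/2 by brute force. The brute-force part rests on an assumption that is standard for a map with three full monotone branches of shape (+ − +) but is not spelled out in the text: each word of length n in the three branch labels gives one fixed point of the nth iterate, of negative type exactly when the word contains an odd number of decreasing branches. The function names are hypothetical.

```python
from itertools import product


def negative_fixed_points(n, branch_signs=(+1, -1, +1)):
    """Count length-n branch itineraries with an odd number of decreasing branches."""
    count = 0
    for word in product(range(len(branch_signs)), repeat=n):
        sign = 1
        for letter in word:
            sign *= branch_signs[letter]
        if sign < 0:
            count += 1
    return count


def bones_counts(max_period):
    """Solve (3^n - 1)/2 = sum over p | n with n/p odd of p * Bones(p), for n = 1..max_period."""
    bones = {}
    for n in range(1, max_period + 1):
        total = (3 ** n - 1) // 2
        # Contribution of the proper divisors p of n with n/p odd, already known.
        known = sum(p * bones[p] for p in range(1, n)
                    if n % p == 0 and (n // p) % 2 == 1)
        quotient, remainder = divmod(total - known, n)
        if remainder != 0:
            raise ArithmeticError("formula did not yield an integer for n = %d" % n)
        bones[n] = quotient
    return bones


if __name__ == "__main__":
    for period, count in bones_counts(9).items():
        print(period, (3 ** period - 1) // 2, count)
```

The recursion works because the term p = n always appears in the sum (n/n = 1 is odd), so each new value of n introduces exactly one unknown.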

Note that #Bones($p$) is even for $p > 1$, since the bones occur in dual pairs. The discussion for the $(-+-)$-case is slightly different, but the number #Bones($p$) remains the same.

Remark 7.5. Evidently the intersection points of bones are precisely those points in parameter space for which both critical points are periodic. We will see in Sect. 8 that there is a one-to-one correspondence, preserving kneading data, between these intersection points for the stunted sawtooth case and for the cubic case. Alternatively, this statement would follow easily from a form of Thurston’s Theorem, as stated in B.3 and B.4 of Appendix B. In general, for the cubic map corresponding to any intersection of bones, the two critical points will belong to disjoint periodic orbits. The only exception is for the primary intersection point between two dual bones. In this exceptional case, the two critical points belong to a common orbit. (See Fig. 11 for the picture in $P^2$, and see Fig. 10 for graphs of some cubic maps corresponding to primary intersections between dual bones.) Similarly, the endpoints of strictly $\sigma$-bimodal bones correspond to maps for which one critical point is periodic while the other is preperiodic, mapping directly to a boundary fixed point or period two orbit. Again there is a canonical one-to-one correspondence between such endpoints for the stunted sawtooth and cubic families. In particular, it follows that each cubic bone has exactly two endpoints. (See Fig. 12 for graphs of several maps corresponding to parameter values along a left bone, including the two endpoints.)

8. Monotonicity, Intersections of Bones, and the n-Skeleton

In the quadratic case, it is known that the number of period $q$ points for the interval map $Q_v(x) = 4vx(1 - x)$ increases monotonically as the folding value parameter $v \in [0, 1]$ increases. Similarly the kneading sequence $K_1(Q_v)$ increases monotonically with $v$.
Proofs of this result have been given by Sullivan (see footnote 7), Douady and Hubbard, and by Milnor and Thurston. Compare [DH2:noVI, MTh, D], as well as [MvS2]. We can adapt similar techniques to study special one parameter families of cubic maps. (Compare [NN].) However, instead of working with the full kneading data, it will suffice for our purposes to work with the “negative orbit complexity”

$$N(f) = \bigl(\mathrm{Neg}(f^{\circ 1}),\ \mathrm{Neg}(f^{\circ 2}),\ \ldots\bigr)$$

of 4.13. In fact, we will establish monotonicity of $N(f)$ in a collection of one parameter families large enough to give us a good picture of the full two parameter family, making use of Heckman’s Theorem that there are no bone-loops.

Footnote 7. As far as we know, Sullivan did not actually publish a proof. However he communicated one orally at an early date, and conversations with Sullivan led to the Milnor–Thurston proof.

By definition, a polynomial map is post-critically finite if the orbit of every critical point is periodic or eventually periodic. Similarly, we will say that a piecewise monotone map has finite folding point orbits if the orbit of every folding point is periodic or eventually periodic. (If $f : I \to I$ is piecewise strictly monotone and boundary anchored, then a completely equivalent condition is that $f$ be a “Markov map” in the sense that there exist a subdivision of $I$ into finitely many subintervals, each of which maps homeomorphically onto some union of these subintervals.)

Thurston Uniqueness Theorem for Real Polynomial Maps. A post-critically finite real polynomial map of degree $m + 1$ with $m$ distinct real critical points is uniquely determined, up to positive affine conjugation, by its kneading data.

See Appendix B, which augments this to a full existence and uniqueness statement, and gives some indication of the proof. There is an analogous statement for the stunted sawtooth family, as follows. As usual, we assume that the slope $s > m + 1$ has been fixed.

Lemma 8.1. Given any piecewise strictly monotone $m$-modal map such that each folding point has finite forward orbit, there exists one and only one stunted sawtooth map of the same shape with the same kneading data.
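The monotonicity statement for the quadratic family at the start of this section can be observed numerically. The sketch below estimates Neg of the nth iterate of Q_v(x) = 4vx(1 − x) on a grid. It assumes the reading of Neg as the number of fixed points of the iterate lying on decreasing laps, that is, places where the graph passes from above the diagonal to below it while the iterate is decreasing. The function name `neg_count` and the grid size are choices made for this illustration only, and the method is a numerical estimate, not a proof.

```python
def quadratic_iterate(v, n, x):
    """Apply Q_v(x) = 4*v*x*(1 - x) to x, n times."""
    for _ in range(n):
        x = 4.0 * v * x * (1.0 - x)
    return x


def neg_count(v, n, grid=20001):
    """Estimate the number of fixed points of the n-th iterate of Q_v on decreasing laps."""
    count = 0
    prev_x = 0.0
    prev_y = quadratic_iterate(v, n, prev_x)
    for i in range(1, grid):
        x = i / (grid - 1)
        y = quadratic_iterate(v, n, x)
        # Graph moves from strictly above the diagonal to on or below it,
        # and the iterate itself decreases across this grid cell.
        if prev_y > prev_x and y <= x and y < prev_y:
            count += 1
        prev_x, prev_y = x, y
    return count


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, [neg_count(v, n) for v in (0.2, 0.4, 0.6, 0.8, 1.0)])
```

For v = 1 the nth iterate has 2^(n−1) decreasing laps, each mapping onto the whole interval, so the estimate should return 2^(n−1) there; for small v it returns zero, and in between the counts are non-decreasing in v for the sampled values.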

Caution. Here it is essential that the kneading data comes from a piecewise strictly monotone map. The figure above shows an example to illustrate this. Recall that the “folding points” $\widehat{C}_i$ of a stunted sawtooth map are defined to be the midpoints of the plateaus. Let $S_p$ be a stunted sawtooth map as illustrated, of shape $(+-+)$ with folding values $v_1 = \widehat{C}_2 + \varepsilon > v_2 = \widehat{C}_2$. If $\varepsilon > 0$ is sufficiently small, then this map is post-critically finite, with kneading data

$$K_1 = (I_2, \overline{C_2}), \qquad K_2 = \overline{C_2},$$

where the overline stands for infinite repetition. Thus there is an entire 1-parameter family of stunted sawtooth maps with the same post-critically finite kneading data. Yet no polynomial map, and more generally no piecewise strictly monotone map, can have this kneading data, which requires that an entire interval in $I_2$ must map to the second folding point. (Compare Appendix B.)

In fact there is a simple criterion which guarantees that the kneading data for some given post-critically finite map $f$ can be realized by a map which is piecewise strictly monotone. Let $\{y_1, \ldots, y_n\}$ be the union of the forward orbits of the folding points, where $y_1 < \cdots < y_n$. If $f(y_i) \ne f(y_{i+1})$ for $1 \le i < n$, then interpolating linearly between the $y_i$, and extending suitably over the rest of the interval $[a, b]$, we easily obtain a piecewise strictly monotone map with the same kneading data.

Proof of 8.1. Existence follows from 5.3. To prove uniqueness of the resulting stunted sawtooth map $S_p$, note that the orbit of a folding point of $S_p$ can never land in a plateau, except at its midpoint $\widehat{C}_j$. For otherwise, any map with the same kneading data would have an interval of points eventually mapping to the same point, which would contradict the hypothesis. Therefore, orbit points are uniquely determined by their itineraries, hence $S_p$ itself is uniquely determined. □

We now proceed to apply these results to the study of monotonicity along special curves. As a first example, consider the top edge $p_2 = 1$ of the parameter triangle for either cubic or stunted sawtooth maps of fixed shape $\sigma$. This edge corresponds to the set of maps $f : I \to I$ (in either family) such that the second critical value $v_2$ is an endpoint of the interval $I$, and hence is either a fixed point, in the $(+-+)$ case, or a period two point, in the $(-+-)$ case. Thus the kneading sequence $K_2$ remains constant as we traverse this edge, and only $K_1$ varies.

Lemma 8.2. In either the cubic or the stunted sawtooth family of specified shape $\sigma$, as we traverse the top edge $p_2 = 1$ of the parameter triangle in the direction of increasing $p_1$, the negative orbit complexity $N$ increases monotonically. More precisely, $N$ increases whenever we cross the endpoint of a left bone, and remains constant otherwise. Furthermore these crossing points occur in the same order in the two families: To each crossing point in one family, there corresponds one and only one crossing point in the other family with the same kneading data; where the crossing point $f$ (in either family) occurs before $f'$ (in the same family) if and only if $K(f)$ $K(f')$. (Compare 4.5 for this last notation.)
It follows that topological entropy also increases monotonically as we traverse this edge. Of course there is a completely analogous statement as we traverse the right hand edge $p_1 = 1$ in the direction of increasing $p_2$. It must be emphasized that this lemma compares cubic and stunted sawtooth maps for some fixed shape $\sigma$. If we switch shape, then the combinatorics becomes completely different.

Proof of 8.2. At the left endpoint $p = (0, 1)$ of this edge, either in the cubic or the stunted sawtooth family, the associated map is monotone, with no periodic orbit of negative type other than a single fixed point of negative type in the $(-+-)$ case. On the other hand, at the right endpoint $p = (1, 1)$, we have a bimodal map of maximal entropy, with three non-overlapping subintervals each of which maps onto the entire interval $I$. For such a maximal entropy map $f$ there are $3^n$ fixed points of $f^{\circ n}$ with distinct itineraries, and approximately half of these have negative type. Thus, as $p_1$ increases, many periodic orbits of negative type must be created. However, an orbit of negative type is a relatively robust object which can only be created or destroyed at a parameter value for which one of the orbit points becomes a folding point. Here only the left folding point $c_1$ can be the one in question, since the condition $p_2 = 1$ guarantees that the orbit of the right folding point $c_2$ never returns to $c_2$. More precisely, we claim the following: In either family, as $p_1$ increases through a value such that the orbit of $c_1$ is periodic, a periodic orbit of negative type is created. This orbit persists until we reach the right hand endpoint with $p = (1, 1)$, and all periodic orbits of negative type (with the exception of a single fixed point of negative type in the $(-+-)$ case) arise in this way.

There is only one step we have to be careful about: We must be sure that an orbit of negative type, once created, cannot disappear again as $p_1$ increases. But this would imply that there were two different parameter values along this edge for which the left folding point is periodic, with the same itinerary in both cases. (It is not difficult to check that there is only one point of such an orbit which can merge with the left critical point.) This is impossible by Thurston’s Theorem in the cubic case (see footnote 8), or by 8.1 in the stunted sawtooth case. Thus the invariant $N$ increases monotonically as we follow the edge, and increases as we cross each bone. But according to 4.9, $N$ is determined by the kneading data. Hence bone-ends with the same kneading data, in the cubic and stunted sawtooth families, must occur in the same order, as asserted. □

Fig. 13. Bones of period 3 and 4 for the stunted sawtooth family of shape (− + −) above and for the cubic family of shape (− + −) below. Periods are indicated near the primary intersection points

Now let us restrict attention to bones which are strictly $\sigma$-bimodal. Thus we exclude only a few low period bones which are atypical (as illustrated in Fig. 14).

Lemma 8.3. In the cubic family, just as in the stunted sawtooth family, each strictly $\sigma$-bimodal left bone $B_-(o)$ intersects the corresponding right bone $B_+(o)$ in exactly two points, and just one of these (the “primary intersection point”) has the property that both folding points belong to a common periodic orbit.

Proof. If there were no intersection point, then we could find a path from the maximal entropy vertex $p = (1, 1)$ of the parameter triangle which avoids both $B_-(o)$ and $B_+(o)$ and yet leads to the opposite edge. This is impossible, since any negative orbit with strictly bimodal order type $o$ must disappear before we reach this lower edge, which parametrizes only monotone maps. Hence there must be at least one crossing point.

Alternative Proof. Compare Fig. 12. As we follow the bone $B_-(o)$ from one endpoint to the other, the kneading invariant $K_1$ remains periodic, of order type $o$. However, this kneading invariant cannot be the same at the two endpoints, since this would contradict Thurston’s Theorem, and the only way for it to change is to pass a point, where both critical points belong to this same periodic orbit.
Since all crossings are transverse, it follows from simple plane topology, using the Jordan curve theorem, that the number of crossing points must be even, so there are at least two. Finally, there cannot be more than two crossing points, since each crossing point corresponds to a postcritically finite cubic map, which is uniquely determined by its kneading data $K$. There are only two crossing points in the stunted sawtooth case, hence there are only two possible choices for $K$, and the conclusion follows. □

Definition. It will be convenient to divide each strictly $\sigma$-bimodal bone $B_\pm(o)$ into two halves by cutting at the primary intersection point. Each of these two will be called a half-bone.

Theorem 8.4. In either the cubic or the stunted sawtooth family, as we follow any left half-bone from its primary intersection endpoint to the edge $p_2 = 1$, the negative orbit complexity $N$ increases monotonically, with an actual increase every time we cross a right bone. Again there is a one-to-one correspondence between crossing points in the two families which preserves the order along the half-bone, and which preserves kneading data.

Footnote 8. As in the proof of 7.2, we must be careful since the map from parameter triangle to positive affine conjugacy classes of cubic polynomials is not one-to-one, but rather folds over two corners of the triangle. However by 7.3, there are no periodic critical orbits for parameters in the folded over region.

Fig. 14. The bones of period ≤ 2 for the shape (− + −) are not strictly bimodal, and hence are quite atypical. In particular, their structure in the stunted sawtooth case (shown on the left) is qualitatively different from that in the cubic case (on the right). The period one bones for the (+ − +) case are even simpler, with no intersection point at all. In order to get a homeomorphism from one parameter triangle to the other preserving these low period bones, we would have to first remove the bottom edge of the triangle or collapse it to a point

The proof is completely analogous to the proof of 8.2. As we follow the half-bone, the kneading sequence $K_1$ remains constant. Hence an orbit of negative type can appear or disappear only by passing through a parameter value for which the right folding point is periodic. Again there can be only one such parameter value for each periodic itinerary of negative type. Details will be left to the reader. □

It follows that topological entropy increases monotonically as we traverse either left half-bone, starting at the primary intersection point. Again there is a completely analogous statement for right half-bones.

Definition. By the n-skeleton $S_n^{\mathrm{saw}}$ for the stunted sawtooth family of some specified shape, either $\sigma = (+-+)$ or $\sigma = (-+-)$, we will mean the union of all bones $B_\pm^{\mathrm{saw}}(o) \subset P^{\mathrm{saw}}$ of period $p \le n$, together with the boundary $\partial P^{\mathrm{saw}}$. Similarly, the n-skeleton $S_n^{\mathrm{cub}}$ for the cubic family of the same shape is the union of all cubic bones $B_\pm^{\mathrm{cub}}(o)$ of period $p \le n$, together with $\partial P^{\mathrm{cub}}$. For elements of the analysis of the structure of these skeletons, see [RS, RT], [MaT1, MaT2] as well as [Mu].

It will be convenient to exclude the low period bones (those whose orbit types are not strictly bimodal), since they behave rather awkwardly. (Compare Fig. 14.) Thus, in either family, we define the essential n-skeleton $E_n^{\mathrm{saw}}$ or $E_n^{\mathrm{cub}}$ to be the boundary of the parameter triangle together with the union of all bones associated with strictly $\sigma$-bimodal order types $o$ of period $p \le n$. In other words, we require that $2 \le p \le n$ in the $(+-+)$ case, and that $3 \le p \le n$ in the $(-+-)$ case. By a vertex of this essential skeleton, we mean either an endpoint of one of its bones or an intersection point of two of its bones. As an example, Fig. 13 shows the essential 4-skeletons for the stunted sawtooth and cubic families of shape $(-+-)$.
In comparing the pictures for these two families, it is important that we consider only the essential skeletons, since low period bones do not behave in quite the same way.

Theorem 8.5. For either bimodal shape $\sigma$, and for any $n > 2$ there is a homeomorphism from the parameter triangle $P^{\mathrm{saw}}$ to the parameter triangle $P^{\mathrm{cub}}$ which maps the essential skeleton $E_n^{\mathrm{saw}}$ homeomorphically onto the essential skeleton $E_n^{\mathrm{cub}}$, carrying each bone of order type $o$ to a bone of the same order type $o$, carrying each vertex to a vertex with the same kneading data, and carrying each edge of the parameter triangle to the corresponding edge.

The proof, making use of 8.2 and 8.4 and the fact that all intersections are transverse, is a straightforward exercise in plane topology. For example one can proceed inductively, starting with the identity map of $P^2$ and then adjusting it to behave correctly on one bone at a time. Details will be left to the reader. □

9. From Connected Bones to Connected Isentropes

As usual, we fix one of the two possible shapes $\sigma$ for bimodal maps. Working either in the cubic family or the stunted sawtooth family, we have the following.

Lemma 9.1. Let $p$ and $p'$ be two points in the parameter triangle $P^2$ such that the associated maps have topological entropy $h(f_p) \ne h(f_{p'})$. Then any path from $p$ to $p'$ in the parameter triangle $P^{\mathrm{cub}}$ must cross infinitely many bones.

Proof of 9.1. According to Theorem 4.11 the difference $|\mathrm{Neg}(f_p^{\circ k}) - \mathrm{Neg}(f_{p'}^{\circ k})|$ must be unbounded as $k \to \infty$. As we deform $p$ along some path in $P^2$, the number $\mathrm{Neg}(f_p^{\circ k})$, which measures the number of decreasing laps of $f_p^{\circ k}$ whose graph crosses the diagonal, will remain constant except as we pass through a map which has a folding point with periodic orbit of period $q$ dividing $k$, with $k/q$ odd, in which case this number jumps by $\pm q$. (Compare 7.4.) Thus, $\mathrm{Neg}(f^{\circ k})$ remains constant unless we pass through a bone of period dividing $k$. Further details are easily supplied. □

Associated with the essential skeleton $E_n^{\mathrm{cub}}$ or $E_n^{\mathrm{saw}} \subset P^2$ is a topological cell structure on $P^2$.
That is, we can partition $P^2$ into subsets, each of which is homeomorphic either to a point, an open interval, or a 2-dimensional open unit disk. Furthermore, these subsets fit together nicely so that the closure of each one is topologically a point, a closed interval, or a closed 2-disk. By definition, the open 2-cells in this cell structure are the connected components of the complement $P^2 \setminus E_n$, the 0-cells are the vertices as described in Theorem 8.5, and the open 1-cells are the connected components of $E_n \setminus \{\text{vertices}\}$. The resulting cell complex will be denoted by $P_n^2$, or more precisely by $P_n^{2,\mathrm{cub}}$ or $P_n^{2,\mathrm{saw}}$. These two cell complexes $P_n^{2,\mathrm{cub}}$ and $P_n^{2,\mathrm{saw}}$ are homeomorphic as a consequence of Theorem 8.5, in a homeomorphism $\eta_n$ which takes each vertex to a vertex of the same topological entropy and each edge to an edge with the same interval of entropies.

Lemma 9.2. For each $\varepsilon > 0$ there exists an integer $n$ such that if two points $q$ and $q'$ belong to a common closed cell of the complex $P_n^{2,\mathrm{cub}}$ or $P_n^{2,\mathrm{saw}}$, then the corresponding maps $f_q$ and $f_{q'}$ satisfy $|h(f_q) - h(f_{q'})| < \varepsilon$.

Proof in either family. Otherwise we could find $\varepsilon > 0$ so that for each $n$ there existed points $q_n$ and $q_n'$ in a common cell of $P_n^2$ with $|h(f_{q_n'}) - h(f_{q_n})| \ge \varepsilon$. After passing to infinite subsequences, we could assume that both sequences converge, say $q_k \to q$ and $q_k' \to q'$, and furthermore that all of the $q_k$ and $q_k'$ belong to a common closed cell of $P_n^2$ whenever $k \ge n$. Thus the limit points $q$ and $q'$ would belong to a common closed cell of $P_n^2$ for every $n$, but by continuity the associated entropies would differ by at least $\varepsilon$. This is impossible by Lemma 9.1. □

Lemma 9.3. The topological entropy function $q \mapsto h(f_q)$, either for the cubic family or the stunted sawtooth family, restricted to any closed cell of the cell complex $P_n^2$, takes its maximum and minimum values on the boundary (and in fact on the set of boundary vertices). Hence the interval of entropy values which are realized by any cell in the cubic family is the same as the interval of entropy values for the corresponding cell in the stunted sawtooth family.

Proof. For the stunted sawtooth family, the first statement follows easily from the fact that entropy is a monotone function of either coordinate $p_1$ or $p_2$. To prove the analogous statement for the cubic family, suppose for example that for some point $p_0$ of a closed cell $C$, the value $h(f_{p_0})$ were strictly larger than the maximum value $h_{\max}$ on the boundary of $C$. Let $2\varepsilon = h(f_{p_0}) - h_{\max}$. According to Lemma 9.2 we can choose $m > n$ so that $h$ varies by less than $\varepsilon$ on each cell of $P_m^{2,\mathrm{cub}}$. Let $C' \subset C$ be a cell of $P_m^{2,\mathrm{cub}}$ which contains this point $p_0$, and let $p'$ be any vertex of $C'$. Then it follows that $h(f_{p'}) > h_{\max}$. Since the homeomorphism $\eta_m$ carries vertices to vertices with the same topological entropy, this would yield a vertex in the complex $P_m^{2,\mathrm{saw}}$ satisfying a corresponding inequality. But this is impossible, since we already proved Lemma 9.3 to be true for the stunted sawtooth family. The second statement follows immediately. □

Main Theorem 9.4. Every isentrope $\{p \in P_\pm^{2,\mathrm{cub}} \,;\ h(f_p) = h_0\}$ for either cubic family is connected.

Proof. For each $n$, the union of all closed cells of $P_n^{2,\mathrm{saw}}$ which touch the $h_0$-isentrope forms a compact set, which is connected by Theorem 6.1.
The union of corresponding cells in $P_n^{2,\mathrm{cub}}$ forms a compact connected set by Theorem 8.5, and contains the $h_0$-isentrope for the cubic family by Lemma 9.3. The intersection of these sets, as $n \to \infty$, will be precisely equal to the required isentrope by Lemma 9.2. Since an intersection of compact connected sets is compact and connected, the conclusion follows. □

Recall that a compact subset of $\mathbb{R}^n$ is called cellular if it is the intersection of a nested sequence of closed topological $n$-cells, each contained in the interior of its predecessor. (Compare [Br].)

Corollary 9.5. Each isentrope $\{h = h_0\}$ for either cubic family is a cellular set. Further, if $0 < h_0 < \log(3)$ then this isentrope separates the parameter triangle into two connected subsets.

Proof. Since we are in dimension two, it is only necessary to check that a set is compact and connected, with connected complement in $\mathbb{R}^2$, in order to prove that it is cellular. But it follows from 9.4 that no isentrope can separate the plane. For otherwise, if $p_0$ belonged to the bounded component of the complement of some isentrope $\{h = h_0\} \subset P^2$ within $\mathbb{R}^2$, then choosing a point on the boundary of the parameter triangle with the same topological entropy as $p_0$, we would see that the isentrope $\{h = h(p_0)\}$ could not be connected, contradicting 9.4. Since we know by 8.2 that each isentrope intersects the two top edges of the parameter triangle in connected sets, it follows easily that the complement $P^2 \setminus \{h = h_0\}$ has at most two connected components. □

Appendix A: Characterization of a Polynomial by its Critical Values

A letter from Adrien Douady (slightly edited)

Tuesday, 13 July 1993

Dear Milnor,

Putting order in my papers, I found some notes I had written with Pierrette Sentenac after a talk by Arnold in ENS, where he asked for a proof in $\mathbb{R}$. Let:

$$P_n = \{\text{monic centered real polynomials of degree } n\} = \{x^n + c_{n-2}x^{n-2} + \cdots + c_0\},$$
$$\Omega = \{f \in P_n \mid f \text{ has distinct real roots } a_1, \ldots, a_n\}.$$

Then clearly $f \mapsto (a_1, \ldots, a_n)$ is a diffeo of $\Omega$ onto

$$\Bigl\{(a_1, \ldots, a_n) \Bigm| a_1 < \cdots < a_n,\ \sum a_i = 0\Bigr\},$$

or equivalently $f \mapsto (\ell_1, \ldots, \ell_{n-1})$, where $\ell_i = a_{i+1} - a_i$, is a diffeo of $\Omega$ onto $(\mathbb{R}_+)^{n-1}$. Let $\Phi(f) = (s_1, \ldots, s_{n-1})$, where

$$s_i = \int_{a_i}^{a_{i+1}} |f(x)|\,dx = (-1)^{n-i} \int_{a_i}^{a_{i+1}} f(x)\,dx.$$

Proposition. $\Phi : f \mapsto (s_1, \ldots, s_{n-1})$ is a diffeo of $\Omega$ onto $(\mathbb{R}_+)^{n-1}$.

Proof. a) $\Phi$ is a local diffeo. In fact, for $g$ of degree $n - 2$, we have

$$(-1)^{n-i}\,\frac{d\Phi_i(f + \varepsilon g)}{d\varepsilon} = \int_{a_i}^{a_{i+1}} g(x)\,dx + f(a_{i+1})\frac{da_{i+1}}{d\varepsilon} - f(a_i)\frac{da_i}{d\varepsilon}$$

at $\varepsilon = 0$, where the last two summands vanish since the $a_i$ are zeros of $f$. Thus $d\Phi/d\varepsilon|_{\varepsilon=0}$ could vanish only if $g$ had zero average on each of the $(n - 1)$ intervals $[a_i, a_{i+1}]$, and hence had at least $n - 1$ roots; which is impossible for $g \ne 0$ of degree $n - 2$.

b) $\Phi$ is proper. In fact $s_i \to \infty$ as the length $\ell_i = a_{i+1} - a_i$ tends to $\infty$, and $s_i \to 0$ as $\ell_i$ tends to $0$ with $\sum \ell_j$ bounded. (Use the fact that $f(x) = \prod (x - a_i)$.) □

Corollary. Let $\Gamma \subset P_{n+1}$ be the set of monic centered polynomials $F$ of degree $n + 1$ with $n$ distinct real critical points. Taking $(n+1)f$ to be the derivative of $F$, it follows that the map $F \mapsto (v_1, \ldots, v_n)$ from $\Gamma$ to $\mathbb{R}^n$, where $v_i$ are the critical values (in the order of the critical points) is a diffeo of $\Gamma$ onto the set of $(v_1, \ldots, v_n)$ with $\mathrm{sgn}(v_i - v_{i-1}) = (-1)^{n-i}$.

Best wishes,
Adrien
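The map from roots to the areas s_i in Douady's letter is easy to evaluate exactly for polynomials, which gives a quick way to experiment with the Proposition. The following is a minimal sketch, assuming the polynomial is specified by its distinct real roots; the function names are hypothetical. It uses the fact that f has constant sign between consecutive roots, so the integral of |f| over such an interval is the absolute value of the integral of f.

```python
def poly_from_roots(roots):
    """Coefficients, constant term first, of the monic polynomial with the given roots."""
    coeffs = [1.0]
    for r in roots:
        new = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k + 1] += c       # term from multiplying c*x^k by x
            new[k] -= r * c       # term from multiplying c*x^k by -r
        coeffs = new
    return coeffs


def antiderivative_at(coeffs, x):
    """Value at x of the antiderivative with zero constant term."""
    return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))


def areas_between_roots(roots):
    """Return [s_1, ..., s_{n-1}], where s_i is the integral of |f| between consecutive roots."""
    ordered = sorted(roots)
    coeffs = poly_from_roots(ordered)
    areas = []
    for left, right in zip(ordered, ordered[1:]):
        signed = antiderivative_at(coeffs, right) - antiderivative_at(coeffs, left)
        areas.append(abs(signed))  # f does not change sign on (left, right)
    return areas


if __name__ == "__main__":
    print(areas_between_roots([-1.0, 1.0]))        # f(x) = x^2 - 1
    print(areas_between_roots([-1.0, 0.0, 1.0]))   # f(x) = x^3 - x
```

In the simplest case n = 2 the roots are ±a, the single gap is ℓ = 2a, and the area is 4a³/3 = ℓ³/6, which is visibly a diffeomorphism of the positive half-line onto itself, in line with the Proposition.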


Addendum. This argument can easily be extended to the case where the roots $\{a_j\}$ of f (alias the critical points of F) need not be distinct, but must only satisfy $a_1 \le a_2 \le \cdots \le a_n$ with $\sum a_i = 0$. Let $\hat a_1 < \cdots < \hat a_r$ be the distinct roots of f, with multiplicities $m_i \ge 1$ so that
$$f(x) = (x - \hat a_1)^{m_1} \cdots (x - \hat a_r)^{m_r},$$
with $m_1 + \cdots + m_r = n$ and $m_1 \hat a_1 + \cdots + m_r \hat a_r = 0$. To deform f within polynomials of this same form, choose direction parameters $w_i \in \mathbb{R}$ with $m_1 w_1 + \cdots + m_r w_r = 0$ and set
$$f_t(x) = \bigl(x - \hat a_1(t)\bigr)^{m_1} \cdots \bigl(x - \hat a_r(t)\bigr)^{m_r}, \quad\text{where } \hat a_i(t) = \hat a_i + t w_i.$$
As before, let
$$s_i(t) = \int_{\hat a_i(t)}^{\hat a_{i+1}(t)} |f_t(x)|\,dx = \pm\int_{\hat a_i(t)}^{\hat a_{i+1}(t)} f_t(x)\,dx.$$
A brief computation shows that the derivative $ds_i/dt$ at t = 0 can be expressed as the integral from $\hat a_i$ to $\hat a_{i+1}$ of the product of a fixed polynomial $\mp(x - \hat a_1)^{m_1 - 1} \cdots (x - \hat a_r)^{m_r - 1}$ which has no zeros within this interval, and a polynomial
$$g(x) = \sum_i m_i w_i \prod_{\{j\,;\,j \ne i\}} (x - \hat a_j)$$
of degree ≤ r − 2. This polynomial g(x) is non-zero unless all of the $w_i$ are zero, since
$$g(\hat a_i) = m_i w_i \prod_{\{j\,;\,j \ne i\}} (\hat a_i - \hat a_j),$$
which is non-zero unless $w_i = 0$. Hence g has at most r − 2 zeros. Arguing as above, it follows that the linear first derivative map $(w_1, \dots, w_r) \mapsto (ds_1/dt, \dots, ds_{r-1}/dt)|_{t=0}$ is bijective, and it follows as before that F is uniquely determined by its critical value vector $(v_1, \dots, v_n)$, where now the case $v_i = v_{i+1}$ is allowed.

Appendix B: Tight Symbol Sequences and Thurston’s Theorem

We first make a rather formal definition, to be justified by the properties which follow. Let $I(x) = (A_0, A_1, A_2, \dots) \in \mathcal{A}^{\mathbb{N}}$ be the itinerary of a point under some m-modal map of shape σ with kneading data $K = (K_1, \dots, K_m)$.


Definition. We will say that this symbol sequence I(x) = (A0 , A1 , . . . ) is flabby (flasque) if some terminal segment shift◦k I(x) = (Ak , Ak+1 , . . . ) has either the form (Ij , Kj ) or the form (Ij −1 , Kj ), for any 1 ≤ j ≤ m. In other words, some point of the associated orbit which is not a folding point must have the same itinerary as an immediately adjacent folding point. This symbol sequence will be called tight if it is not flabby. Similarly, we will say that the kneading data K is tight if each of the sequences K1 , . . . , Km is tight. As a first example, consider the family of stunted sawtooth maps with fixed shape σ and fixed slope s > m + 1. Lemma B.1. Let Sp : J → J be a stunted sawtooth map in this family with kneading data K, and let I 0 be a symbol sequence in AN . There exists one and only one x ∈ J with itinerary I 0 if and only if this symbol sequence is tight and K-admissible. Similarly, for any kneading data K, there exists one and only one map Sp in the family with this kneading data if and only if K is tight and σ -admissible. (Existence was shown in Sect. 5, so only the uniqueness part is new and requires tightness.) Proofs will be given at the end of this appendix. We can also give a more geometric criterion for tightness in the stunted sawtooth case. Lemma B.2. The itinerary of x under Sp is tight if and only if the orbit of x never hits any plateau except at its central folding point. Similarly the kneading data for Sp is tight if and only if the orbit of each folding point satisfies this condition. Recall from Sect. 5 that any admissible kneading data is represented by a canonical stunted sawtooth map Sp , which has the property that the orbit of a folding point cannot hit the interior of a plateau except at a folding point. Thus, if we use this canonical map, the only extra requirement for tightness of K is that the folding point orbits cannot even hit the endpoints of the plateaus. 
The case of polynomial maps is much more difficult, and not well understood. However, for the postcritically finite case (where every critical orbit is periodic or eventually periodic) we have the following result,9 which is an adaptation of a basic theorem of Thurston in a form due to Poirier. Theorem B.3. Suppose that the m-modal kneading data K is σ -admissible, with Ki 6 = Ki+1 for all i. There exists a postcritically finite polynomial map of degree m + 1 and shape σ with kneading data K if and only if each Ki is periodic or eventually periodic, and also tight. This polynomial is always unique when it exists, up to a positive affine change of coordinates, or as a boundary anchored map of the interval. To illustrate this result, here are four examples of non-tight data. The notation A will stand for an infinite sequence (A, A , A, . . . ) of identical copies of the symbol A ∈ A. 9 Erratum. This result was stated incorrectly, without the hypothesis of tightness, in our preliminary manuscript [DGMT, Sect. 5]. A counter-example to the version stated there is provided in Example 3 below (Fig. 15c).


Fig. 15. Three maps with eventually periodic but non-tight kneading data. (See Examples 1, 2, 3.) The folding points are marked

√ Example 1. If 2 < 4v < 1 + 5, the unimodal map Qv (x) = 4vx(1 − x) has periodic kneading sequence K1 = I 1 = (I1 , K1 ) . This kneading sequence is not tight. It is realised by an entire one-parameter family of distinct quadratic maps (Fig. 15a), but not by any postcritically finite quadratic map. Example 2. The (+ − +)-bimodal kneading data K1 = (I2 , C 2 ), K2 = C 2 is admissible, but cannot be realised by any piecewise strictly monotone map (Fig. 15b. Compare the discussion of 8.1). Example 3. The (+ − +)-bimodal data K1 = (I1 , I 0 ) , K2 = I 0 can be realised by a postcritically finite 4th degree polynomial (namely f (x) = 8ax 2 (1 − x)2 with a 3 − a 2 − a − 1 = 0), but not by any cubic polynomial (Fig. 15c). Example 4 (See [MaT1, p. 179]). More generally, whenever the kneading data for f satisfies K1 = (I1 , K2 ), it follows that all points in the open interval (v1 , c2 ) ⊂ I1 have the same itinerary. If K2 is not eventually periodic, then (v1 , c2 ) is a wandering interval, hence f cannot be real analytic.10 There are uncountably many distinct choices for K2 . In fact, using the stunted sawtooth model with folding points ±1, we can take v1 = 1 − and let v2 range from −α to v1 . The corresponding topological entropy ranges between log 2 and zero. (For example, the right picture in Fig. 15 illustrates the log 2 case, with K2 periodic.) Thus we can certainly choose infinitely many K2 which are not eventually periodic. For the application, we need a special case where the hypothesis of tightness is automatically satisfied: Corollary B.4. Let K be admissible kneading data such that every kneading sequence Kj either 1) contains the symbol Cj , so that the folding point itinerary (Cj , Kj ) is periodic, or 2) satisfies the condition that Kj equals either Imin or Imax according as the sign σj is +1 or −1. 10 Compare [L1, BL, MvS1, MMS]. 
It would be interesting to know whether any such kneading data can be realized by a map which is C ∞ and piecewise strictly monotone. Examples of C ∞ maps with wandering intervals have been given in [Ha] and [SI].


Fig. 16. The periodic kneading data K1 = C 1 , K2 = I 0 is tight, yet is represented by an entire one-parameter family of cubic maps. However, only one of these is postcritically finite.

Then this data is tight, and hence is realized by one and only one postcritically finite polynomial in the parametrized family of Sect. 3, and by one and only one stunted sawtooth map in the family of Sect. 5. (However, compare Fig. 16.) Here the various periodic critical orbits need not be disjoint. It follows that there is a canonical one-to-one correspondence between intersection of bones in the cubic family and in the corresponding stunted sawtooth family, and also between endpoints of bones in these two families. This would yield an alternate proof of 8.5. The proofs follow. Proof of B.1 and B.2. First note that the two statements B.1 and B.2 are equivalent to each other: The itinerary of any non folding point on the j th plateau is either (Ij −1 , Kj ) or (Ij , Kj ). If the orbit of x hits each plateau at most at its center point, then as in Sect. 5 one can compute x from its itinerary, and it is not difficult to see that this itinerary cannot contain any (Ij −1 , Kj ) or (Ij , Kj ). If the orbit of x hits some plateau off-center, then clearly we can move x slightly without changing its itinerary. Similarly, if the i th folding point orbit hits the interior of some plateau at a non folding point, then varying the i th folding value but keeping all other folding values fixed we obtain a one-parameter family of stunted sawtooth maps with the same kneading data. If, for one or more values of i, the k(i)th forward image of the folding value vi lies at an endpoint of the j (i)th plateau, then the argument is slightly more delicate: Vary all such folding values at unit speed in such a way that this k(i)th forward image moves with speed s k(i) ≥ 1 towards the interior of the corresponding plateau. Of course the j (i)th folding value may itself be varying at unit speed, but in this case the endpoints of the plateau will then move at speed 1/s < 1, so that the image will still move into the interior of the plateau, and the kneading data will not change. 
Thus Sp is not tight. Conversely, if the orbit of x hits every plateau at most at its center point, then it follows from the remarks above that x is tight, and the corresponding statement for kneading data follows easily. u t Proof of B.3. We will make use of a very special form of Hubbard tree, as described by Douady and Hubbard [DH1], and in a more precise form by Poirier [Po1, Theorem A]. Since we are interested only in real maps with non-degenerate critical points, our Hubbard tree T can be identified simply with an interval [1, n] of real numbers, with the integer points 1, 2, . . . , n as vertices. Here m of these vertices are to be designated as


critical points (hence n ≥ m). As a final element of structure, there is to be a mapping F from this finite set {1, 2, . . . , n} to itself which satisfies three conditions: (i) F must take consecutive vertices to distinct points. (ii) Extending linearly over the intermediate intervals [i, i + 1], we require that two consecutive intervals should map with opposite orientations (so that their images overlap) if and only if their common vertex is a critical point. The third condition can be stated in several equivalent forms (compare [Po1, II.3.11]), but the following will be convenient for our purposes. (iii) Poirier Expansion Condition. For any interval [i, i + 1] ⊂ T there must be an integer k ≥ 0 so that the k th forward image of this interval contains a critical point. Assertion. These conditions are satisfied if and only if there is a polynomial map f of degree m+1, necessarily unique up to positive affine conjugacy, and an order preserving embedding of the vertices {1, 2, . . . , n} of T into the real line which sends the designated critical points in T to the critical points of f , and which conjugates F (on this set of vertices) to f . Poirier’s proof (of a more general theorem which implies this assertion) is based on Thurston’s characterization of postcritically finite rational maps (see [DH2]), and relies on earlier work by Douady, Hubbard, Bielefeld, Y. Fisher, and others (see [Po2] and references therein). To apply this result to the proof of B.3, we start with the finite subset 6 ⊂ AN consisting of the critical orbits (Cj , Kj ) together with all of their terminal segments. Number the elements of 6 in increasing order from 1 to n, and let the shift map on 6 correspond to the mapping F on {1, . . . , n}. Then Condition (i) above follows from tightness, together with the hypothesis that Ki 6 = Ki+1 , while Condition (ii) follows from admissibility, and Condition (iii) follows trivially from the definition of kneading sequences as itineraries. 
Thus the Assertion above yields a polynomial map f : R → R. Let I ⊂ R be the set of points with bounded orbit. Since f is postcritically finite, all critical orbits are bounded. It follows easily that I is a closed interval, and that f |I is boundary anchored. There cannot be any smaller closed interval J ⊂ I which contains all critical points and is f -invariant and boundary anchored. For then a component of I r J would contain an attracting point (of period ≤ 2), which is impossible in the postcritically finite case. u t Closely related is an alternative characterization of tightness, in the eventually periodic case. Corollary B.5. Let K be admissible eventually periodic kneading data with Ki 6= Ki+1 for all i. Then K is tight if and only if the collection of all terminal segments of the folding point itineraries (Ci , Ki ) has the property that consecutive elements (in the ordering of Sect. 2) map to distinct points under the shift map. Proof. This follows immediately from the discussion above. u t Proof of B.4. Although it is not difficult to prove B.4 directly from the definition of tightness, the argument would be completely formal. It is more intuitive to make use of B.5. In other words, we look at the associated Hubbard tree, whose set of vertices can be identified with the union of the critical orbits for a representative map, and check directly that consecutive vertices map to distinct points. For the union of the


periodic critical orbits, there is nothing to prove since distinct points clearly map to distinct points. Any two critical points which map to the right endpoint, corresponding to $I_{\max}$, have the same parity, and hence are separated by another critical point. We must also check that the rightmost critical point and the $I_{\max}$ point have distinct images, $K_m \ne \operatorname{shift}(I_{\max})$. But if $\sigma_m = +1$ then $\operatorname{shift}(I_{\max}) = I_{\max} \ne K_m$, while if $\sigma_m = -1$ then $\operatorname{shift}(I_{\max}) = I_{\min} \ne K_m$, using the hypothesis of B.4 in both cases. The discussion of critical points mapping to $I_{\min}$ is completely analogous. □
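Example 1 of this appendix can be checked numerically. The following Python sketch iterates the unimodal map $Q_v(x) = 4vx(1-x)$ starting from its folding value and records on which side of the folding point 1/2 each iterate lies. The function name and the string labels "I0", "C1", "I1" are illustrative choices, and a finite floating-point prefix is of course only evidence for, not a proof of, the constant kneading sequence.

```python
def kneading_prefix(v, length):
    """First `length` symbols of the itinerary of the folding value of Q_v.

    Symbols: "I0" for x < 1/2, "C1" for x == 1/2, "I1" for x > 1/2.
    """
    x = 4.0 * v * 0.5 * (1.0 - 0.5)   # folding value Q_v(1/2) = v
    symbols = []
    for _ in range(length):
        if x < 0.5:
            symbols.append("I0")
        elif x > 0.5:
            symbols.append("I1")
        else:
            symbols.append("C1")
        x = 4.0 * v * x * (1.0 - x)
    return symbols


if __name__ == "__main__":
    # Parameters with 2 < 4v < 1 + sqrt(5): the prefix should consist of "I1" only.
    for four_v in (2.5, 3.0, 3.2):
        print(four_v, set(kneading_prefix(four_v / 4.0, 60)))
    # A parameter outside that range, for contrast.
    print(3.5, kneading_prefix(3.5 / 4.0, 6))
```

For the three parameters inside the range the orbit of the folding value stays to the right of the folding point, which is the numerical counterpart of the non-tight periodic kneading sequence described in Example 1.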

Appendix C: Monotonicity vs Antimonotonicity

As we mentioned in Sect. 8, one nice monotonicity property in parametrized families would be to have a line through each point in the parameter space such that the dynamical complexity increases monotonically along the line. This augmentation of complexity could be precisely associated to the fact that no periodic orbit disappears as we follow the line: in one dimension this implies (but is not required for) monotonicity of the topological entropy. For the quadratic family, it is true that no periodic orbit disappears. (If one does not like to follow deformation of orbits from their birth or to their destruction, one can think of the equivalent statement that the number of orbits of each order type does not decrease.)

Now think of the quadratic maps as singular maps of $\mathbb{R}^2$, writing $\tilde Q_{v,0}(x, y) = (4vx(1-x) + y,\ 0)$. Such maps are limits of diffeomorphisms of the plane, the Hénon maps, obtained for each $b \ne 0$ by writing $\tilde Q_{v,b}(x, y) = (4vx(1-x) + y,\ bx)$. Numerical computations have suggested for a long time that periodic orbits both appear and disappear along typical one-parameter families extracted from the two-parameter Hénon family. The deep reason for these observations was finally given by the following result (compare [KKY]).

Antimonotonicity Theorem. In any neighborhood of a nondegenerate, homoclinic-tangency parameter value of a one-parameter $C^3$-smooth family of dissipative diffeomorphisms of the plane, there must be both infinitely many orbit-creation and infinitely many orbit-annihilation parameter values.

It has been conjectured ([DG, DGKKY], see also [DGK]) that a similar statement should work for m-modal maps as soon as m > 1. More precisely, calling antimonotone a parameter value approached both by infinitely many orbit-creation and infinitely many orbit-annihilation parameter values, the following can be extracted from [DGKKY]:

Antimonotonicity Conjecture.
A smooth one-dimensional map depending on one parameter has antimonotone parameter values whenever two critical points have disjoint orbits and are contained in the interior of a chaotic attractor. This conjecture would not overtly contradict 9.4. If we follow a (not necessarily smooth) path within some isentrope, note that any significant amount of periodic orbit creation must be offset by a roughly equivalent amount of orbit annihilation, so that topological entropy will remain constant. Even if we could find a smooth curve which is transverse to the family of isentropes, so that topological entropy increases monotonically along it, it seems possible that orbit annihilations could be dense, but outweighed by even more orbit creations. Similarly, the Antimonotonicity Theorem above does not contradict the possibility that isentropes for the family of real Hénon maps may be connected, although we have no reason to conjecture this.


The analogue of the Antimonotonicity Conjecture for the stunted sawtooth families is certainly false, since by 5.8, it is very easy to find smooth curves along which there are only orbit creations. Thus, if the conjecture is true for the cubic family, then any complexity preserving correspondence between the stunted sawtooth and cubic parameter triangles must be very wild indeed.

References

[AKM] Adler, R.L., Konheim, A.G. and McAndrew, M.H.: Topological entropy. Trans. Am. Math. Soc. 114, 309–319 (1965)
[ALM] Alsedà, L., Llibre, J. and Misiurewicz, M.: Combinatorial Dynamics and Entropy in Dimension One. Singapore: World Scientific, 1993
[BCMM] Bernhardt, C., Coven, E., Misiurewicz, M. and Mulvey, I.: Comparing periodic orbits of maps of the interval. Trans. Am. Math. Soc. 333, 701–707 (1992)
[Be] Beardon, A.F.: A Primer on Riemann Surfaces. London Mathematical Society Lecture Notes Series 78, Cambridge: Cambridge University Press, 1984
[BK] Block, L. and Keesling, J.: Computing topological entropy of maps of the interval with three monotone pieces. J. Stat. Phys. 66, 755–774 (1992)
[BL] Blokh, A.M. and Lyubich, M.: Non-existence of wandering intervals and structure of topological attractors of one-dimensional dynamical systems II, The smooth case. Erg. Th. and Dynam. Sys. 9, 751–758 (1989)
[BMT] Brucks, K., Misiurewicz, M. and Tresser, C.: Monotonicity properties of the family of trapezoidal maps. Commun. Math. Phys. 137, 1–12 (1991)
[Bo] Bowen, R.: On Axiom A Diffeomorphisms. Proc. Reg. Conf. Math. 35, (1978)
[Br] Brown, M.: A proof of the generalized Schoenflies theorem. Bull. Am. Math. Soc. 66, 74–76 (1960) (See also: Brown, M.: The monotone union of open n-cells is an open n-cell. Proc. Am. Math. Soc. 12, 812–814 (1961))
[BR] Baladi, V. and Ruelle, D.: An extension of the theorem of Milnor and Thurston on the zeta functions of interval maps. Ergodic Theory and Dynam. Sys. 14, 621–632 (1994)
[BST] Balmforth, N.J., Spiegel, E.A. and Tresser, C.: The topological entropy of one-dimensional maps: Approximation and bounds. Phys. Rev. Lett. 80, 80–83 (1994)
[D] Douady, A.: Topological entropy of unimodal maps: Monotonicity for quadratic polynomials. In: Real and Complex Dynamical Systems, B. Branner and P. Hjorth, Eds. Dordrecht: Kluwer, 1995, pp. 65–87
[DG] Dawson, S.P. and Grebogi, C.: Cubic maps as models of two-dimensional antimonotonicity. Chaos, Solitons & Fractals 1, 137–144 (1991)
[DGK] Dawson, S.P., Grebogi, C. and Koçak, H.: A geometric mechanism for antimonotonicity in scalar maps with two critical points. Phys. Rev. E 48, 1676–1682 (1993)
[DGKKY] Dawson, S.P., Grebogi, C., Kan, I., Koçak, H. and Yorke, J.A.: Antimonotonicity: Inevitable reversals of period doubling cascades. Phys. Lett. A 162, 249–254 (1992)
[DGMT] Dawson, S.P., Galeeva, R., Milnor, J. and Tresser, C.: A monotonicity conjecture for real cubic maps. In: Real and Complex Dynamical Systems, B. Branner and P. Hjorth, Eds. Dordrecht: Kluwer, 1995, pp. 165–183
[DH1] Douady, A. and Hubbard, J.: A proof of Thurston’s topological characterization of rational maps. Acta Math. 171, 263–297 (1993)
[DH2] Douady, A. and Hubbard, J.H.: Etude dynamique des polynômes quadratiques complexes, I, (1984) & II, (1985), Publ. Mat. d’Orsay
[E] Epstein, A.: Algebraic dynamics: Contraction, finiteness, and transversality principles. Manuscript in preparation. (See also Towers of Finite Type Complex Analytic Maps, Thesis, CUNY, 1993.)
[F] Fatou, P.: Sur les équations fonctionnelles, II. Bull. Soc. Math. France 48, 33–94 (1920)
[FK] Farkas, H.M. and Kra, I.: Riemann Surfaces. Second Edition, Graduate Texts in Mathematics 71, New York: Springer Verlag, 1992
[FT] Friedman, B. and Tresser, C.: Comb structure in hairy boundaries: Some transition problems for circle maps. Phys. Lett. 117A, 15–22 (1986)
[GS] Graczyk, J. and Świątek, G.: Generic hyperbolicity in the logistic family. Ann. Math. 146, 1–52 (1997)
[Gu] Guckenheimer, J.: In: Dynamical Systems. C.I.M.E. Lectures J. Guckenheimer, J. Moser and S. Newhouse, Progress in Mathematics 8, New York: Birkhäuser, 1980
[Ha] Hall, G.R.: A $C^\infty$ Denjoy counterexample. Ergodic Theory and Dynamical Systems 1, 261–272 (1981)

[He] Heckman, C.: Monotonicity and the Construction of Quasiconformal Conjugacies in the Real Cubic Family. Thesis, Stony Brook, 1996
[HKC] Hurd, L.P., Kari, J. and Culik, K.: The topological entropy of cellular automata is uncomputable. Erg. Th. and Dynam. Sys. 12, 255–265 (1992)
[K] Kaloshin, V.: Generic Diffeomorphisms with Superexponential Growth of Number of Periodic Orbits. Stony Brook I.M.S. Preprint 1999#2
[Ka] Katok, A.: Lyapunov exponents, entropy and periodic orbits of diffeomorphisms. Pub. Math. IHES 51, 137–173 (1980)
[KKY] Kan, I., Koçak, H. and Yorke, J.A.: Antimonotonicity: Concurrent creation and annihilation of periodic orbits. Ann. Math. 136, 219–252 (1992)
[L1] Lyubich, M.: Non-existence of wandering intervals and structure of topological attractors of one-dimensional dynamical systems I, The case of negative Schwarzian derivative. Erg. Th. and Dynam. Sys. 9, 737–750 (1989)
[L2] Lyubich, M.: Dynamics of quadratic polynomials, I and II. Acta Math. 178, 185–247, 247–297 (1997)
[M1] Milnor, J.: Remarks on iterated cubic maps. Experimental Math. 1, 5–24 (1992)
[M2] Milnor, J.: Hyperbolic components in Spaces of Polynomial Maps (with an appendix by A. Poirier). Stony Brook I.M.S. Preprint 1992#3
[M3] Milnor, J.: On cubic polynomials with periodic critical point. In preparation
[MaT1] MacKay, R.S. and Tresser, C.: Boundary of topological chaos for bimodal maps of the interval. J. London Math. Soc. 37, 164–181 (1988)
[MaT2] MacKay, R.S. and Tresser, C.: Some flesh on the skeleton: The bifurcation structure of bimodal maps. Physica 27D, 412–422 (1987)
[Mc1] McMullen, C.: Automorphisms of rational maps. In: Holomorphic Functions and Moduli I, ed. Drasin, Earle, Gehring, Kra & Marden; MSRI Publ. 10, New York: Springer, 1988, pp. 31–60
[Mc2] McMullen, C.: Complex Dynamics and Renormalization. Ann. Math. Studies 135, Princeton: Princeton University Press, 1994
[Mis1] Misiurewicz, M.: On non-continuity of topological entropy. Bull. Ac Pol. Sci., Ser. Sci. Math. Astr. Phys. 19, 319–320 (1971)
[Mis2] Misiurewicz, M.: Horseshoes for mappings of an interval. Bull. Ac Pol. Sci., Ser. Sci. Math. Astr. Phys. 27, 167–169 (1979)
[Mis3] Misiurewicz, M.: Jumps of entropy in one dimension. Fund. Math. 132, 215–226 (1989)
[Mis4] Misiurewicz, M.: Continuity of entropy revisited. Dynamical systems and applications, World Sci. Ser. Appl. Anal. 4, 495–503 (1995)
[MMS] Martens, M., de Melo, W. and van Strien, S.: Julia-Fatou-Sullivan theory for real one-dimensional dynamics. Acta Math. 168, 273–318 (1992)
[MN] Misiurewicz, M. and Nitecki, Z.: Combinatorial Patterns for Maps of the Interval. Memoirs of the A.M.S. 456 (1991)
[MSS] Metropolis, N., Stein, M.L. and Stein, P.R.: On finite limit sets for transformations on the unit interval. J. Comb. Theory 15, 25–44 (1973)
[MSz] Misiurewicz, M. and Szlenk, W.: Entropy of piecewise monotone mappings. Studia Math. 67, 45–63 (1980) (Short version: Astérisque 50, 299–310 (1977))
[MTh] Milnor, J. and Thurston, W.: On iterated maps of the interval. Springer Lecture Notes 1342, 1988, pp. 465–563
[Mu] Mumbrú, P.: Estructura Periòdica i Entropia Topològica de les Aplicacions Bimodals. Thesis, Universitat Autònoma de Barcelona, 1987
[MvS1] de Melo, W. and van Strien, S.: A structure theorem in one-dimensional dynamics. Ann. Math. 129, 519–546 (1989)
[MvS2] de Melo, W. and van Strien, S.: One Dimensional Dynamics. Berlin: Springer Verlag, 1993
[My] Myrberg, P.J.: Iteration der reellen Polynome zweiten Grades. Ann. Acad. Sci. Fennic. 256A, 1–10 (1958); 268A, 1–13 (1959) and 336A, 1–18 (1963)
[N] Newhouse, S.: Continuity properties of entropy. Ann. Math. 129, 215–235 (1989) and 131, 409–410 (1990)
[NN] Nishizawa, K. and Nojiri, A.: Center curves in the moduli space of the real cubic maps. Proc. Japan Acad. Ser. A Math. Sci. 69, 179–184 (1993) (See also: Algebraic geometry of center curves in the moduli space of the cubic maps. Proc. Japan Acad. Ser. A Math. Sci. 70, 99–103 (1994))
[Po1] Poirier, A.: On post critically finite polynomials, Part II, Hubbard Trees. Stony Brook I.M.S. Preprint 1993#7. (Thesis, Stony Brook 1993)
[Po2] Poirier, A.: Realizing reduced schemata. Appendix to J. Milnor, Hyperbolic components in Spaces of Polynomial Maps, Stony Brook I.M.S. Preprint 1992#3
[Pr] Preston, C.: What you need to know to knead. Advances Math. 78, 192–252 (1989)
[Re] Rees, M.: A minimal positive entropy homeomorphism of the 2-torus. J. London Math. Soc. (2) 23, 537–550 (1981)
[Ro] Rothschild, J.: On the Computation of Topological Entropy. Thesis, CUNY, 1971
[RS] Ringland, J. and Schell, M.: Genealogy and bifurcation skeleton for cycles of the iterated two-extremum map of the interval. SIAM J. Math. Anal. 22, 1354–1371 (1991)
[RT] Ringland, J. and Tresser, C.: A genealogy for finite kneading sequences of bimodal maps of the interval. Trans. Am. Math. Soc. 347, 4599–4624 (1995)
[SI] Sharkovskii, A.N. and Ivanov, A.F.: $C^\infty$-mappings of an interval with attracting cycles of arbitrary large periods. Ukrain. Mat. Zh. 35, 455–458 (1983)
[St] Stimson, J.: Degree Two Rational Maps with a Periodic Critical Point. Thesis, Univ. Liverpool, 1993
[Ts] Tsujii, M.: A simple proof for monotonicity of entropy in the quadratic family. Preprint Hokkaido University, 1998
[Y] Yomdin, Y.: Volume growth and entropy. Isr. J. Math. 57, 285–300 (1987) (See also $C^k$-resolution of semialgebraic mappings, ibid. pp. 301–317)

Communicated by Ya. G. Sinai

Commun. Math. Phys. 209, 179 – 205 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Factorized Combinations of Virasoro Characters Andrei G. Bytsko1,2 , Andreas Fring1 1 Institut für Theoretische Physik, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany.

E-mail: [email protected]

2 Steklov Mathematical Institute, Fontanka 27, St. Petersburg 191011, Russia.

E-mail: [email protected] Received: 29 October 1998 / Accepted: 3 August 1999

Abstract: We investigate linear combinations of characters for minimal Virasoro models which are representable as a product of several basic blocks. Our analysis is based on consideration of asymptotic behaviour of the characters in the quasi-classical limit. In particular, we introduce a notion of the secondary effective central charge. We find all possible cases for which factorization occurs on the base of the Gauß-Jacobi or the Watson identities. Exploiting these results, we establish various types of identities between different characters. In particular, we present several identities generalizing the Rogers–Ramanujan identities. Applications to quasi-particle representations, modular invariant partition functions, super-conformal theories and conformal models with boundaries are briefly discussed.

Introduction

It is a well known fact that the characters of irreducible representations of the Virasoro algebra for the M(3, 4) minimal model possess the peculiar property to be representable as infinite products
$$\chi^{3,4}_{1,2}(q) = q^{\frac{1}{24}} \prod_{n=0}^{\infty} (1 + q^{n+1}) = q^{\frac{1}{24}} \prod_{n=0}^{\infty} \frac{1}{1 - q^{2n+1}}, \tag{0.1}$$
$$\chi^{3,4}_{1,1}(q) \pm \chi^{3,4}_{1,3}(q) = q^{-\frac{1}{48}} \prod_{n=0}^{\infty} (1 \pm q^{n+1/2}). \tag{0.2}$$
As was observed in [1], some characters and linear combinations of characters for the M(4, 5) minimal model admit similar forms. The question towards a generalization and classification of these identities arises naturally. Surprisingly, it turned out [2] that the only factorizable single characters for minimal models are of type $\chi^{2n,t}_{n,m}(q)$ and $\chi^{3n,t}_{n,m}(q)$. In [3,2,4–6] it was discussed that


the factorization of characters in these series is based on the Gauß-Jacobi and Watson identities. On the other hand, a factorization of linear combinations of Virasoro characters has not been studied so far. In the present paper we show that factorization of combinations $\chi^{s,t}_{n,m}(q) \pm \chi^{s,t}_{n,t-m}(q)$ occurring due to the Gauß-Jacobi and Watson identities is possible (up to the symmetries of the characters) only for s = 3n, 4n, 6n. Moreover, we will prove that there are no other factorizable differences of this type which admit the inverse product form similar to the r.h.s. of (0.1). We present a systematic analysis based on considerations of the asymptotic behaviour of (combinations of) characters in the so-called quasi-classical limit, $q \to 1^-$. We will demonstrate that for linear combinations of the above mentioned type we need, besides the effective central charge $c_{\mathrm{eff}}$, the notion of the “secondary” effective central charge $\tilde c$.

The advantage of having the characters (or combinations) in the form of infinite products rather than infinite sums is manifold. First of all the problem of finding the dimension of a particular level in the Verma module of the irreducible representation has been reduced to a simple problem of partitions. As a consequence one may state the possible monomials of Virasoro generators at a specific level. Also the associated quasi-particle states may be constructed from this form without any effort, whereas it is virtually impossible to find them from the infinite sum representation. The quasi-particle form is also related to a classification of Rogers–Ramanujan type of identities [7]. In the present paper this subject is discussed rather briefly in 3.4 and Appendix E. However, this point is followed up further in [31], where the obtained factorized forms of characters were exploited in the derivation of Rogers–Ramanujan type identities.
In addition, the factorized characters (or combinations) allow to derive various new identities between different combinations of characters far easier than employing the infinite sum representation. Some of these identities relate different sectors of the same models, whereas others relate different models altogether. Factorized combinations of characters appear naturally in the context of coset models, super-conformal extensions of the Virasoro algebra and boundary conformal field theories. They may even shed some light on massive models, since it was conjectured in [2] that they allow to identify the space of form factors of descendant operators.
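The second equality in (0.1) is Euler's classical identity: the number of partitions of an integer into distinct parts equals the number of its partitions into odd parts. As a minimal sketch of how the product form turns level counting into a partition problem, the following Python code expands both products as truncated power series in q and compares the coefficients. The common prefactor $q^{1/24}$ is ignored, and the function names are illustrative rather than taken from the paper.

```python
def distinct_parts_series(order):
    """Coefficients of prod_{n>=1} (1 + q^n) up to and including q^order."""
    coeffs = [0] * (order + 1)
    coeffs[0] = 1
    for part in range(1, order + 1):
        # Multiply by (1 + q^part); iterate downwards so each part is used at most once.
        for deg in range(order, part - 1, -1):
            coeffs[deg] += coeffs[deg - part]
    return coeffs


def odd_parts_series(order):
    """Coefficients of prod_{n>=0} 1 / (1 - q^(2n+1)) up to and including q^order."""
    coeffs = [0] * (order + 1)
    coeffs[0] = 1
    for part in range(1, order + 1, 2):
        # Multiply by 1 / (1 - q^part); iterate upwards so each part may repeat.
        for deg in range(part, order + 1):
            coeffs[deg] += coeffs[deg - part]
    return coeffs


if __name__ == "__main__":
    print(distinct_parts_series(10))
    print(odd_parts_series(10))
```

Each coefficient is the multiplicity of the corresponding level, read off here by pure partition counting.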

1. Preliminaries We use the notation hn, mi = 1 if n and m are co-prime numbers and we employ also m Q (1 − q k ) with (q)0 = 1. the standard abbreviation for Euler’s function (q)m = k=1

1.1. Characters of minimal models. The Virasoro algebra is generated by operator valued Fourier coefficients of the energy-momentum tensor $T(z) = \sum_n z^{-n-2} L_n$ and a central charge c. For an irreducible highest weight representation $V_{c,h}$ of the Virasoro algebra with central charge c and weight h one defines the character

$$\chi(q) = \mathrm{tr}_{V_{c,h}}\, q^{L_0 - \frac{c}{24}} = q^{h - \frac{c}{24}} \sum_{n=0}^{\infty} \mu_n q^n, \tag{1.1}$$

Factorized Combinations of Virasoro Characters


with $\mu_n$ being the multiplicity of the level n. The corresponding states at a particular level k are spanned by the vectors

$$L_{-k_1} \cdots L_{-k_n} |h\rangle, \qquad k_1 \le k_2 \le \ldots \le k_n, \qquad k = \sum_{i=1}^{n} k_i. \tag{1.2}$$

Minimal models are the distinguished conformal theories in which the set of highest weights is finite [8]. These models are labeled by two integer numbers s and t such that

$$s, t \ge 2 \qquad \text{and} \qquad \langle s, t\rangle = 1. \tag{1.3}$$

The minimal models for which |s − t| = 1 are unitary [9,10]. The minimal model M(s, t) has the central charge

$$c(s,t) = 1 - \frac{6(s-t)^2}{st}. \tag{1.4}$$

The corresponding irreducible highest weight representations of the Virasoro algebra are representations with the weights

$$h^{s,t}_{n,m} = \frac{(nt - ms)^2 - (s-t)^2}{4st}, \tag{1.5}$$

where the labels run through the following set of integers:

$$1 \le n \le s-1, \qquad 1 \le m \le t-1. \tag{1.6}$$

The corresponding character is given by [11,1]

$$\chi^{s,t}_{n,m}(q) = \frac{q^{h^{s,t}_{n,m} - \frac{c(s,t)}{24}}}{(q)_\infty} \sum_{k=-\infty}^{\infty} q^{stk^2}\left(q^{k(nt-ms)} - q^{k(nt+ms)+nm}\right) = \frac{q^{h^{s,t}_{n,m} - \frac{c(s,t)}{24}}}{(q)_\infty}\, \hat\chi^{s,t}_{n,m}(q) \tag{1.7}$$

(the second equality defines $\hat\chi(q)$, which we refer to as the “incomplete character”). The characters possess the following symmetries:

$$\chi^{s,t}_{n,m}(q) = \chi^{t,s}_{m,n}(q) = \chi^{s,t}_{s-n,t-m}(q) = \chi^{t,s}_{t-m,s-n}(q). \tag{1.8}$$

It follows from (1.6) and these symmetries that the minimal model M(s, t) has D = (s − 1)(t − 1)/2 different sectors (inequivalent irreducible representations). In addition, (1.7) allows one to relate some characters of different models,

$$\chi^{\alpha s,t}_{\alpha n,m}(q) = \chi^{s,\alpha t}_{n,\alpha m}(q), \tag{1.9}$$

where α is a positive integer such that $\langle s, \alpha t\rangle = \langle t, \alpha s\rangle = 1$. For instance, $\chi^{6,5}_{2,m}(q) = \chi^{3,10}_{1,2m}(q)$.
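The definitions (1.4)–(1.9) can be checked on truncated q-series. The following is a minimal sketch, not part of the original derivation: the helper name `character` and the truncation parameter `order` are assumptions introduced here for illustration. It expands (1.7) using the fact that the coefficients of $1/(q)_\infty$ are the partition numbers, and uses $h^{s,t}_{n,m} - c(s,t)/24 = (nt-ms)^2/(4st) - 1/24$, which follows from (1.4) and (1.5).

```python
from fractions import Fraction

def character(s, t, n, m, order):
    """Truncated q-series of chi^{s,t}_{n,m}(q) from Eq. (1.7).

    Hypothetical helper: returns {exponent: coefficient}, keeping the first
    `order` powers above the leading power q^(h - c/24).
    """
    # Leading exponent h - c/24 = (nt - ms)^2 / (4st) - 1/24, from (1.4)-(1.5).
    shift = Fraction((n * t - m * s) ** 2, 4 * s * t) - Fraction(1, 24)
    # Incomplete character: sum_k q^(stk^2) (q^(k(nt-ms)) - q^(k(nt+ms)+nm)).
    hat = [0] * order
    for k in range(-order, order + 1):
        e_plus = s * t * k * k + k * (n * t - m * s)
        e_minus = s * t * k * k + k * (n * t + m * s) + n * m
        if 0 <= e_plus < order:
            hat[e_plus] += 1
        if 0 <= e_minus < order:
            hat[e_minus] -= 1
    # Coefficients of 1/(q)_infinity are the partition numbers.
    part = [1] + [0] * (order - 1)
    for j in range(1, order):
        for i in range(j, order):
            part[i] += part[i - j]
    coeffs = [sum(hat[i] * part[d - i] for i in range(d + 1)) for d in range(order)]
    return {shift + d: c for d, c in enumerate(coeffs) if c != 0}

# Symmetries (1.8), checked in M(3, 4) up to the truncation order.
ising = character(3, 4, 1, 1, 30)
print(ising == character(4, 3, 1, 1, 30) == character(3, 4, 2, 3, 30))
# Relation (1.9) with alpha = 2: chi^{6,5}_{2,m} = chi^{3,10}_{1,2m}.
print(all(character(6, 5, 2, m, 30) == character(3, 10, 1, 2 * m, 30) for m in (1, 2, 3, 4)))
```

Exact rational exponents are used so that characters with different leading powers can be compared without rounding.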


A. G. Bytsko, A. Fring

1.2. Quantum dilogarithm. In our analysis of factorized characters we will be exploiting the properties of the quantum dilogarithm, whose defining relations are

$$\ln_q(\theta) := \prod_{k=0}^{\infty}\left(1 - e^{2\pi i\theta} q^k\right) = \exp \sum_{k=1}^{\infty} \frac{1}{k}\, \frac{e^{2\pi i\theta k}}{q^k - 1}. \tag{1.10}$$

Taking $q = e^{2\pi i\tau}$, we assume that Im(τ) > 0 and Im(θ) > 0 in order to guarantee the convergence of (1.10). We see from (1.10) that $\ln_q(\theta)$ is a pseudo-double-periodic function,

$$\ln_q(\theta + 1) = \ln_q(\theta) \qquad \text{and} \qquad \ln_q(\theta + \tau) = \frac{1}{1 - e^{2\pi i\theta}}\, \ln_q(\theta). \tag{1.11}$$

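It follows easily from this that

$$\ln_q(\theta) = \sum_{k=0}^{\infty} \frac{(-1)^k q^{\frac{k(k-1)}{2}} e^{2\pi i\theta k}}{(q)_k} \qquad \text{and} \qquad \frac{1}{\ln_q(\theta)} = \sum_{k=0}^{\infty} \frac{e^{2\pi i\theta k}}{(q)_k}. \tag{1.12}$$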

For explicit calculations it will further turn out to be convenient to employ the notations (in which we will omit the explicit q-dependence as long as q is not varying)

$$\{x\}^-_y := \ln_{q^y}(x\tau) \qquad \text{and} \qquad \{x\}^+_y := \ln_{q^y}(x\tau + 1/2), \qquad 0 < x \le y. \tag{1.13}$$

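These blocks have the following obvious properties:

$$\{x\}^+_y \{x\}^-_y = \{2x\}^-_{2y}, \qquad \{x\}^\pm_y = \prod_{k=0}^{n-1} \{x + ky\}^\pm_{ny}, \tag{1.14}$$

$$\ln_{-q^y}(x\tau) = \{x\}^-_{2y}\{x+y\}^+_{2y}, \qquad \ln_{-q^y}(x\tau + 1/2) = \{x\}^+_{2y}\{x+y\}^-_{2y}, \tag{1.15}$$

$$\{x\}^-_{2x} = \frac{1}{\{x\}^+_x}, \qquad \{x\}^+_{2x} = \frac{1}{\{x\}^-_{2x}\{2x\}^+_{2x}}. \tag{1.16}$$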

The last line is Euler's identity which, in fact, can be derived from (1.14).$^1$

1.3. Gauß-Jacobi and Watson identities. It will be the principal aim of our manuscript to seek factorizations of some single characters and some linear combinations of characters in the following form:

$$\frac{q^{\rm const}}{(q)_\infty} \prod_{i=1}^{N} \{x_i\}^-_y \prod_{j=1}^{M} \{\tilde x_j\}^+_y. \tag{1.17}$$

We will encounter the cases $N \neq 0$, $M = 0$ and $N \neq 0$, $M \neq 0$. The explicit formulae of this type, which we will obtain, are based on the Gauß-Jacobi identity (see e.g. [12])

$$\sum_{k=-\infty}^{\infty} (-1)^k v^{\frac{k(k+1)}{2}} w^{\frac{k(k-1)}{2}} = \prod_{k=1}^{\infty} (1 - v^k w^{k-1})(1 - v^{k-1} w^k)(1 - v^k w^k), \tag{1.18}$$

$^1$ Indeed, using consecutively the first and second relation in (1.14) for y = x, we obtain $\{2x\}^-_{2x} = \{x\}^+_x\{x\}^-_x = \{x\}^+_x\{x\}^-_{2x}\{2x\}^-_{2x}$, thus deriving the identity $\{x\}^+_x\{x\}^-_{2x} = 1$.


and the Watson identity [13]

$$\sum_{k=-\infty}^{\infty} v^{\frac{3k^2+k}{2}} w^{3k^2}\left(w^{-2k} - w^{4k+1}\right) = \prod_{k=1}^{\infty} (1 - v^{k-1}w^{2k-1})(1 - v^k w^{2k-1})(1 - v^k w^{2k})(1 - v^{2k-1}w^{4k-4})(1 - v^{2k-1}w^{4k}). \tag{1.19}$$

Substituting $v = q^a$, $w = q^b$, we can rewrite the identities in terms of the blocks (1.13):

$$\sum_{k=-\infty}^{\infty} (-1)^k q^{(a+b)\frac{k^2}{2} + \frac{k}{2}(a-b)} = \{a\}^-_{a+b}\{b\}^-_{a+b}\{a+b\}^-_{a+b}, \tag{1.20}$$

$$\sum_{k=-\infty}^{\infty} q^{\frac{3k^2}{2}(a+2b)}\left(q^{k(a/2-2b)} - q^{k(a/2+4b)+b}\right) = \{b\}^-_{a+2b}\{a+b\}^-_{a+2b}\{a+2b\}^-_{a+2b} \times \{a\}^-_{2a+4b}\{a+4b\}^-_{2a+4b}. \tag{1.21}$$
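As a sanity check on the block form of the two identities, both sides of (1.20) and (1.21) can be expanded to a finite order for sample integers a and b. The sketch below is illustrative only; the function names and the chosen sample values are assumptions made here, and a series is stored as a list whose entry i is the coefficient of $q^i$.

```python
def block_minus(x, y, order):
    """Truncated {x}^-_y = prod_{k>=0} (1 - q^(x + k*y)) for positive integers x, y."""
    series = [1] + [0] * (order - 1)
    e = x
    while e < order:
        # Multiply in place by (1 - q^e); descending i reads untouched entries.
        for i in range(order - 1, e - 1, -1):
            series[i] -= series[i - e]
        e += y
    return series

def multiply(factors, order):
    """Product of truncated series, truncated again at `order`."""
    result = [1] + [0] * (order - 1)
    for f in factors:
        result = [sum(result[i] * f[d - i] for i in range(d + 1)) for d in range(order)]
    return result

def gauss_jacobi_lhs(a, b, order):
    """Left-hand side of (1.20); the exponent is a*k(k+1)/2 + b*k(k-1)/2."""
    series = [0] * order
    for k in range(-order, order + 1):
        e = a * k * (k + 1) // 2 + b * k * (k - 1) // 2
        if e < order:
            series[e] += 1 if k % 2 == 0 else -1
    return series

def watson_lhs(a, b, order):
    """Left-hand side of (1.21), with the exponents written as integers."""
    series = [0] * order
    for k in range(-order, order + 1):
        base = a * k * (3 * k + 1) // 2 + 3 * b * k * k
        e1, e2 = base - 2 * b * k, base + 4 * b * k + b
        if e1 < order:
            series[e1] += 1
        if e2 < order:
            series[e2] -= 1
    return series

def gauss_jacobi_rhs(a, b, order):
    y = a + b
    return multiply([block_minus(x, y, order) for x in (a, b, a + b)], order)

def watson_rhs(a, b, order):
    y = a + 2 * b
    blocks = [block_minus(x, y, order) for x in (b, a + b, a + 2 * b)]
    blocks += [block_minus(x, 2 * y, order) for x in (a, a + 4 * b)]
    return multiply(blocks, order)

for a, b in [(1, 1), (2, 3), (5, 2)]:
    print(a, b, gauss_jacobi_lhs(a, b, 60) == gauss_jacobi_rhs(a, b, 60),
          watson_lhs(a, b, 60) == watson_rhs(a, b, 60))
```

Agreement up to a finite order is of course not a proof; it only guards against sign or index slips when the identities are transcribed.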

Other useful substitutions are $v = q^a$, $w = -q^b$ and $v = -q^a$, $w = q^b$ (for (1.18) it suffices to consider only the first of them, because of the symmetry v ↔ w), which yield

$$\sum_{k=-\infty}^{\infty} (-1)^{\frac{k(k+1)}{2}} q^{(a+b)\frac{k^2}{2} + \frac{k}{2}(a-b)} = \{a\}^-_{2(a+b)}\{b\}^+_{2(a+b)}\{a+b\}^+_{2(a+b)}\{a+2b\}^-_{2(a+b)}\{2a+b\}^+_{2(a+b)}\{2a+2b\}^-_{2(a+b)}, \tag{1.22}$$

$$\sum_{k=-\infty}^{\infty} (-1)^{3k^2} q^{\frac{3k^2}{2}(a+2b)}\left(q^{k(a/2-2b)} + q^{k(a/2+4b)+b}\right) = \{b\}^+_{a+2b}\{a+b\}^+_{a+2b}\{a+2b\}^-_{a+2b}\{a\}^-_{2a+4b}\{a+4b\}^-_{2a+4b}, \tag{1.23}$$

$$\sum_{k=-\infty}^{\infty} (-1)^{\frac{k(k-1)}{2}} q^{\frac{3k^2}{2}(a+2b)}\left(q^{k(a/2-2b)} - q^{k(a/2+4b)+b}\right) = \{a\}^+_{2a+4b}\{b\}^-_{2a+4b}\{a+b\}^+_{2a+4b} \times \{a+2b\}^+_{2a+4b}\{a+3b\}^+_{2a+4b}\{a+4b\}^+_{2a+4b}\{2a+3b\}^-_{2a+4b}\{2a+4b\}^-_{2a+4b}. \tag{1.24}$$

Here we used (1.15) in order to obtain the r.h.s. in the desired form. Now one can try to find factorizable linear combinations of characters simply by matching the l.h.s. of (1.20)–(1.24) with appropriate combinations of (1.7). However, this is a cumbersome task. Below we will develop a more systematic and more elegant approach exploiting the quasi-classical asymptotics of characters.

1.4. Quasi-classical asymptotics of characters. As can be seen from (1.10), the limit τ → 0 of $\ln_q(\theta)$ (since we require Im(τ) > 0, this is the limit $q \to 1^-$) is singular. The asymptotics is given by

$$\lim_{\tau \to 0} \ln_q(\theta) = \exp\left\{\frac{1}{2\pi i\tau}\, \mathrm{Li}_2\!\left(e^{2\pi i\theta}\right) + \frac{1}{2}\ln\!\left(1 - e^{2\pi i\theta}\right) + O(\tau)\right\}, \tag{1.25}$$

where $\mathrm{Li}_2(x) = \sum_{n=1}^{\infty} \frac{x^n}{n^2}$ is the Euler dilogarithm (see e.g. [15]).$^2$ Introducing $\hat q = \exp\{-2\pi i/\tau\}$, we derive from (1.13) and (1.25) the following asymptotics for the limit $q \to 1^-$:

$$\{x\}^-_y \sim \hat q^{\frac{\mathrm{Li}_2(1)}{4\pi^2 y}} = \hat q^{\frac{1}{24y}}, \qquad \{x\}^+_y \sim \hat q^{\frac{\mathrm{Li}_2(-1)}{4\pi^2 y}} = \hat q^{-\frac{1}{48y}}. \tag{1.26}$$

Here we used the fact that $\mathrm{Li}_2(1) = -2\,\mathrm{Li}_2(-1) = \pi^2/6$ holds.$^3$ Notice that $q \to 1^-$ implies that $\hat q \to 0^+$, so that $\{x\}^-_y$ and $\{x\}^+_y$ tend to zero and infinity, respectively.

From a physical point of view, say if we regard χ(q) as a partition function, the limit τ → 0 can be interpreted as a high-temperature limit (with temperature T ∼ 1/τ) which is singular and known to be ruled by the effective central charge only (i.e. it is sector-independent) [16]. Indeed, in order to carry out this limit, one may exploit the behaviour of Virasoro characters under the modular transformation. It is well known [17, 3] that the S-modular transformation ($q \leftrightarrow \hat q$) of a character has the following form:

$$\chi^{s,t}_{n,m}(q) = \sum_{n',m'} S^{n'm'}_{nm}\, \chi^{s,t}_{n',m'}(\hat q), \tag{1.27}$$

where $S^{n'm'}_{nm}$ are explicitly known constants (see (2.17)). Now it is obvious from (1.7) and (1.27) that

$$\chi^{s,t}_{n,m}(q) \sim S^{\bar n \bar m}_{nm}\, \hat q^{-\frac{c_{\rm eff}(s,t)}{24}} \qquad (q \to 1^-). \tag{1.28}$$

Here we have introduced the so-called effective central charge $c_{\rm eff}(s,t) = c(s,t) - 24 h^{s,t}_{\bar n,\bar m} = 1 - \frac{6}{st}(\bar n t - \bar m s)^2$, where $h^{s,t}_{\bar n,\bar m}$ denotes the lowest of all conformal weights in the model. Let us remark that the conditions (1.3) and (1.6) allow us to invoke the well-known theorem of the greatest common divisor and show that $|\bar n t - \bar m s| = 1$. Hence

$$c_{\rm eff}(s,t) = 1 - \frac{6}{st} \tag{1.29}$$

holds for any minimal model.

Comparison of (1.28) with (1.26) imposes a constraint on the possible structure of characters factorized in form (1.17). Namely, each factor of the type $(\{x\}^-_y)^{\pm 1}$ and $(\{x\}^+_y)^{\pm 1}$ contributes $\mp\frac{1}{y}$ and $\pm\frac{1}{2y}$ to the effective central charge, respectively. Notice that this is an x independent property. These contributions must sum up to the value given by (1.29).

$^2$ This motivated the authors of [14] to coin $\ln_q(\theta)$ a quantum dilogarithm.

$^3$ Equations (1.26) can also be obtained by a saddle point analysis of the identities (1.12) for $\ln_{q^y}(\theta)$ if we put θ = xτ + 1/2 and θ = xτ, respectively [6]. In this approach one finds: $\{x\}^-_y \sim \hat q^{\frac{L(1)}{4\pi^2 y}}$, $\{x\}^+_y \sim \hat q^{-\frac{L(1/2)}{4\pi^2 y}}$ as τ → 0. Here $L(z) = \mathrm{Li}_2(z) + \frac{1}{2}\ln z \ln(1-z)$ denotes the Rogers dilogarithm [15]. These results coincide with (1.26) since $L(1) = 2L(1/2) = \pi^2/6$.
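The value (1.29) can be confirmed by brute force for a few models, by minimizing the weight (1.5) over the label set (1.6). This is a small illustrative check; the function names and the list of sample models are assumptions made here, not part of the source.

```python
from fractions import Fraction

def central_charge(s, t):
    """c(s, t) of Eq. (1.4)."""
    return 1 - Fraction(6 * (s - t) ** 2, s * t)

def weight(s, t, n, m):
    """h^{s,t}_{n,m} of Eq. (1.5)."""
    return Fraction((n * t - m * s) ** 2 - (s - t) ** 2, 4 * s * t)

def effective_central_charge(s, t):
    """c - 24 * (lowest weight), minimizing over the labels allowed by (1.6)."""
    h_min = min(weight(s, t, n, m) for n in range(1, s) for m in range(1, t))
    return central_charge(s, t) - 24 * h_min

for s, t in [(2, 5), (3, 4), (4, 5), (5, 6), (3, 10), (4, 15)]:
    print((s, t), effective_central_charge(s, t), 1 - Fraction(6, s * t))
```

For each sample pair the two printed values should coincide, reflecting that the minimal value of |nt − ms| is 1 whenever s and t are co-prime.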


2. Factorization of Characters

Below it will be useful to refer to the following simple statement:

$$\zeta\, nm + \zeta^{-1} st = nt + ms \quad \Longleftrightarrow \quad t = \zeta m \ \ \text{or} \ \ s = \zeta n. \tag{2.1}$$
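The equivalence (2.1) can be seen from a one-line factorization (a worked step added for convenience, assuming ζ ≠ 0):

$$\zeta\, nm + \zeta^{-1} st - nt - ms = \zeta^{-1}(\zeta n - s)(\zeta m - t),$$

so the left-hand equation in (2.1) holds exactly when one of the two factors on the right vanishes.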

Equations of this form will arise as necessary conditions for factorization of (combinations of) characters. Clearly, for s, t, n, m obeying (1.3) and (1.6) the parameter ζ may assume only some rational values greater than unity.

2.1. Factorization of single characters. The factorization of some Virasoro characters in the M(3, 4) and M(4, 5) models was already observed in [1], whereas the factorization of all characters of type $\chi^{2n,t}_{n,m}(q)$ and $\chi^{3n,t}_{n,m}(q)$ was discovered in [2]. It was already discussed in [3,2,4–6] that the factorization of characters in these series may be obtained by exploiting the Gauß-Jacobi and Watson identities. Nevertheless, we wish to present here a systematic derivation of these results based on alternative arguments which will also be applicable in a more general situation. It is straightforward to see from (1.7) that the first three terms in the expansion of the incomplete character are

$$\hat\chi^{s,t}_{n,m}(q) = 1 - q^{nm} - q^{(s-n)(t-m)} + \ldots, \tag{2.2}$$

and that further terms are of higher powers in q. Let us assume that the incomplete character in question is a particular case of the l.h.s. of the Gauß-Jacobi identity (1.20) for some a and b. Noticing that the series on the l.h.s. of (1.20) is $1 - q^a - q^b +$ higher order terms, we conclude that a = nm, b = (s − n)(t − m) or vice versa. Furthermore, the r.h.s. of (1.20) allows one to calculate the effective central charge for the character in question. As was explained in Subsect. 1.4, each of the three blocks contributes $-\frac{1}{a+b}$ to $c_{\rm eff}$. Therefore, $c_{\rm eff} = 1 - \frac{3}{a+b}$ (the 1 is a contribution of $(q)_\infty = \ln_q(\tau) = \{1\}^-_1$, which appears in (1.7) and whose limit is also ruled by (1.26)). Comparison of this result with (1.29) yields the equation

$$2\, nm + \frac{1}{2}\, st = nt + ms, \tag{2.3}$$

which is a particular case of (2.1) with ζ = 2 and, hence, either s = 2n or t = 2m. This implies that $\chi^{2n,t}_{n,m}(q)$ is the only (up to the symmetries (1.8)) possible type of characters factorizable with the help of the Gauß-Jacobi identity and that its factorization has to be of the following form:

$$\chi^{2n,t}_{n,m}(q) = \frac{q^{h^{2n,t}_{n,m} - \frac{c(2n,t)}{24}}}{(q)_\infty}\, \{nm\}^-_{nt}\{nt - nm\}^-_{nt}\{nt\}^-_{nt}, \tag{2.4}$$

where t is an odd number according to (1.3). One can verify that Eq. (2.4) is indeed valid by a direct matching of the l.h.s. of (1.20) for the specified a and b with the formula (1.7) for characters (see e.g. [6]).

The same type of consideration applies if we seek characters which are factorizable with the help of the Watson identity. Namely, since the series on the l.h.s. of (1.21) is again $1 - q^a - q^b +$ higher order terms, we conclude that a = nm, b = (s − n)(t − m) or vice versa (in contrast to the previous case, these two possibilities lead to different equations).


The r.h.s. of (1.21) allows one to calculate the effective central charge: $c_{\rm eff} = 1 - \frac{4}{y}$, where y = a + 2b. Comparison of this value for $c_{\rm eff}$ for the two choices of a and b with (1.29) yields the following equations:

$$\frac{3}{2}\, nm + \frac{2}{3}\, st = nt + ms \qquad \text{and} \qquad 3\, nm + \frac{1}{3}\, st = nt + ms, \tag{2.5}$$

respectively. According to (2.1) this implies: n = 2s/3 or m = 2t/3 in the first case and n = s/3 or m = t/3 in the second. Notice that these cases are related via the symmetries (1.8). Thus, we conclude that the only possible type of characters factorizable on the base of the Watson identity is $\chi^{3n,t}_{n,m}(q)$ (again up to the symmetries (1.8)) and that its factorization has to be of the following form:

$$\chi^{3n,t}_{n,m}(q) = \frac{q^{h^{3n,t}_{n,m} - \frac{c(3n,t)}{24}}}{(q)_\infty}\, \{nm\}^-_{2nt}\{2nt - nm\}^-_{2nt}\{2nt\}^-_{2nt} \times \{2nt - 2nm\}^-_{4nt}\{2nt + 2nm\}^-_{4nt}, \tag{2.6}$$

where $\langle 3, t\rangle = 1$. Again, one verifies this formula by directly matching it with (1.7) (see [6]).

Thus, we have found all types of characters which are factorizable on the base of the Gauß-Jacobi and the Watson identities. In fact, it was shown in [2] that this exhausts the list of characters of minimal Virasoro models which admit the form (1.17) with M = 0 and $x_i \neq x_k$. This implies that for the purpose of factorizing a single character in such a form one does not have to invoke the higher Macdonald identities [18] (also known as the Weyl-Macdonald denominator identities).

As a last remark in this subsection, we notice that in the case $\langle 2, m\rangle = \langle 3, n\rangle = \langle n, m\rangle = 1$ the combination of (2.4) and (2.6) yields

$$\chi^{2n,3m}_{n,m}(q) = \frac{q^{\frac{nm-1}{24}}}{(q)_\infty}\, \{nm\}^-_{nm}. \tag{2.7}$$

The first non-trivial example of this kind is $\chi^{3,4}_{1,2}(q) = q^{\frac{1}{24}}/\{1\}^-_2 = q^{\frac{1}{24}}\{1\}^+_1$ (the second equality is due to the Euler identity). Furthermore, noticing the symmetry n ↔ m of the r.h.s. of Eq. (2.7), we derive an identity relating different models (it can also be obtained employing (1.9) twice),

$$\chi^{2n,3m}_{n,m}(q) = \chi^{2m,3n}_{m,n}(q), \tag{2.8}$$

where $\langle 6, n\rangle = \langle 6, m\rangle = \langle n, m\rangle = 1$. The first non-trivial example is $\chi^{2,15}_{1,5}(q) = \chi^{3,10}_{1,5}(q)$.
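The factorizations (2.4), (2.6) and (2.7) can be tested order by order by comparing the incomplete character of (1.7) with the corresponding product of blocks. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, only $\{\ \}^-$ blocks with integer arguments are needed here, and the common prefactor $q^{h - c/24}/(q)_\infty$ is dropped from both sides.

```python
def incomplete_character(s, t, n, m, order):
    """Coefficients of the incomplete character in (1.7), up to q^(order - 1)."""
    series = [0] * order
    for k in range(-order, order + 1):
        e1 = s * t * k * k + k * (n * t - m * s)
        e2 = s * t * k * k + k * (n * t + m * s) + n * m
        if 0 <= e1 < order:
            series[e1] += 1
        if 0 <= e2 < order:
            series[e2] -= 1
    return series

def product_of_blocks(blocks, order):
    """Product of blocks {x}^-_y = prod_{k>=0} (1 - q^(x + k*y)), given as (x, y) pairs."""
    series = [1] + [0] * (order - 1)
    for x, y in blocks:
        e = x
        while e < order:
            for i in range(order - 1, e - 1, -1):
                series[i] -= series[i - e]
            e += y
    return series

def blocks_2_4(n, t, m):
    """Blocks on the r.h.s. of (2.4) for chi^{2n,t}_{n,m}."""
    return [(n * m, n * t), (n * t - n * m, n * t), (n * t, n * t)]

def blocks_2_6(n, t, m):
    """Blocks on the r.h.s. of (2.6) for chi^{3n,t}_{n,m}."""
    return [(n * m, 2 * n * t), (2 * n * t - n * m, 2 * n * t), (2 * n * t, 2 * n * t),
            (2 * n * t - 2 * n * m, 4 * n * t), (2 * n * t + 2 * n * m, 4 * n * t)]

ORDER = 80
for n, t, m in [(1, 3, 1), (1, 5, 2), (2, 5, 1), (3, 5, 2)]:
    print("(2.4)", (n, t, m),
          incomplete_character(2 * n, t, n, m, ORDER) == product_of_blocks(blocks_2_4(n, t, m), ORDER))
for n, t, m in [(1, 4, 1), (1, 4, 3), (1, 5, 2), (2, 5, 1)]:
    print("(2.6)", (n, t, m),
          incomplete_character(3 * n, t, n, m, ORDER) == product_of_blocks(blocks_2_6(n, t, m), ORDER))
for n, m in [(2, 1), (1, 5), (5, 1)]:
    print("(2.7)", (n, m),
          incomplete_character(2 * n, 3 * m, n, m, ORDER) == product_of_blocks([(n * m, n * m)], ORDER))
```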

2.2. Factorization of linear combinations. Preliminary ideas. We commence the investigation of factorized linear combinations, $\chi^{s,t}_{n,m}(q) \pm \chi^{s,t}_{n',m'}(q)$, by introducing the quantity

$$\Delta h^{n',m'}_{n,m}(s,t) := h^{s,t}_{n',m'} - h^{s,t}_{n,m} = \frac{\big((m+m')s - (n+n')t\big)\big((n-n')t - (m-m')s\big)}{4st}, \tag{2.9}$$


where we will often omit the labels s and t. Then

$$\chi^{s,t}_{n,m}(q) \pm \chi^{s,t}_{n',m'}(q) = \frac{q^{h^{s,t}_{n,m} - \frac{c(s,t)}{24}}}{(q)_\infty}\left(\hat\chi^{s,t}_{n,m}(q) \pm q^{\Delta h^{n',m'}_{n,m}}\, \hat\chi^{s,t}_{n',m'}(q)\right). \tag{2.10}$$

The combination $\hat\chi^{s,t}_{n,m}(q) \pm q^{\Delta h^{n',m'}_{n,m}} \hat\chi^{s,t}_{n',m'}(q)$ can be represented as a product of few blocks $\{\ \}^\pm$ only if $\Delta h^{n',m'}_{n,m}$ is an integer or a fraction with sufficiently small denominator (otherwise the product will generate terms with powers of $q^{\Delta h^{n',m'}_{n,m}}$ which are not present in the combination). On the other hand, the numerator in (2.9) is, in general, not divisible by st because of the conditions (1.3) and (1.6). The only possibility to make this fraction reducible by st is to put n = n′ and m + m′ = t or, alternatively, m = m′ and n + n′ = s. Thus, we are led to consider the combinations

$$\chi^{s,t}_{n,m}(q) \pm \chi^{s,t}_{n,t-m}(q). \tag{2.11}$$

Let us denote $\Delta h^{n,t-m}_{n,m}(s,t)$ for such pairs by $\Delta h^{s,t}_{n,m}$; its explicit value is

$$\Delta h^{s,t}_{n,m} = \frac{1}{4}(t - 2m)(s - 2n). \tag{2.12}$$

If s or t is even (in particular, this includes all unitary minimal models), then $\Delta h^{s,t}_{n,m}$ is integer or semi-integer. Let, for definiteness, s be even. Then, taking into account the symmetries (1.8), we see that each character in the minimal model M(s, t) is either of the form $\chi^{s,t}_{s/2,m}(q)$ (i.e., a “single” character, factorizable per se) or there exists exactly one more character such that they form a pair of type (2.11). It follows from this and Eq. (1.6) that the model M(s, t) has $D_0 = \frac{t-1}{2}$ “single” characters. Consequently, there are $D_1 = (D - D_0)/2 = \frac{(s-2)(t-1)}{4}$ pairs. If both s and t are odd, then apparently $D_0 = 0$ and $D_1 = D/2 = \frac{(s-1)(t-1)}{4}$.

Consider (2.11) for n and m in the range

$$n < s/2, \qquad m < t/2. \tag{2.13}$$

For this range all involved characters are different (see e.g. [3]), and it is easy to see that we cover all possible $D_1$ combinations. Moreover, conditions (2.13) ensure that $\Delta h^{s,t}_{n,m} > 0$. This in turn implies that (2.11) contains only non-negative powers of q. Thus, from now on we will assume that n and m in (2.11) are restricted as in (2.13).

As we have seen in the previous subsection, the knowledge of the asymptotic behaviour of the characters in the $q \to 1^-$ limit proves to be very useful in the search of factorized characters. It turns out that in the case of linear combinations we have to take into account also the next to leading term in (1.28),

$$\chi^{s,t}_{n,m}(q) \sim S^{\bar n \bar m}_{nm}\, \hat q^{-\frac{c_{\rm eff}(s,t)}{24}} + S^{\tilde n \tilde m}_{nm}\, \hat q^{-\frac{\tilde c(s,t)}{24}} + \ldots \qquad (q \to 1^-), \tag{2.14}$$

where we denoted

$$c_{\rm eff}(s,t) = c(s,t) - 24\, h^{s,t}_{\bar n,\bar m}, \qquad \tilde c(s,t) = c(s,t) - 24\, h^{s,t}_{\tilde n,\tilde m}. \tag{2.15}$$

Here $h^{s,t}_{\bar n,\bar m}$ and $h^{s,t}_{\tilde n,\tilde m}$ are the smallest and the second smallest conformal weights in the model corresponding to the minimal and the next to minimal value of |nt − ms|, respectively. We will refer to $\tilde c(s,t)$ as the secondary effective central charge.


As we mentioned above, the theorem of the greatest common divisor ensures that $c_{\rm eff}(s,t) = 1 - 6/st$. Furthermore, one can show in the same way that $|\tilde n t - \tilde m s| = 2$,$^4$ so that $h^{s,t}_{\tilde n,\tilde m} = \frac{4 - (s-t)^2}{4st}$ holds. Thus, employing (1.4) and (2.15), we obtain

$$\tilde c(s,t) = 1 - \frac{24}{st}. \tag{2.16}$$

The only case where this argument fails is M(2, t). Here $|\tilde n t - \tilde m s| = 3$ (unless t = 3, in which case $h^{s,t}_{\tilde n,\tilde m}$ does not exist). But, as we demonstrated, in this case all the characters are factorizable per se.

Now, using the explicit form of the matrix S [17,3] involved in the S-modular transformation (1.27),

$$S^{n',m'}_{n,m} = \sqrt{\frac{8}{st}}\, (-1)^{nm' + mn' + 1} \sin\frac{\pi n n' t}{s}\, \sin\frac{\pi m m' s}{t}, \tag{2.17}$$

we observe that $S^{n',m'}_{n,t-m} = -S^{n',m'}_{n,m}(-1)^{n't - m's}$. Taking into account that $|\bar n t - \bar m s| = 1$ and $|\tilde n t - \tilde m s| = 2$, we conclude that $S^{\bar n,\bar m}_{n,t-m} = S^{\bar n,\bar m}_{n,m}$ and $S^{\tilde n,\tilde m}_{n,t-m} = -S^{\tilde n,\tilde m}_{n,m}$. Therefore, for the combination $\chi^{s,t}_{n,m}(q) - \chi^{s,t}_{n,t-m}(q)$ the leading terms in (2.14) corresponding to $c_{\rm eff}(s,t)$ cancel but those corresponding to $\tilde c(s,t)$ survive,$^5$ and for the combination $\chi^{s,t}_{n,m}(q) + \chi^{s,t}_{n,t-m}(q)$ the leading terms corresponding to $c_{\rm eff}(s,t)$ do not cancel. Thus, we obtain the following asymptotics of the combinations in question:

$$\chi^{s,t}_{n,m}(q) + \chi^{s,t}_{n,t-m}(q) \sim \hat q^{-\frac{c_{\rm eff}(s,t)}{24}} \qquad (q \to 1^-), \tag{2.18}$$

$$\chi^{s,t}_{n,m}(q) - \chi^{s,t}_{n,t-m}(q) \sim \hat q^{-\frac{\tilde c(s,t)}{24}} \qquad (q \to 1^-). \tag{2.19}$$
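The reflection property of the S-matrix used above, $S^{n',m'}_{n,t-m} = -S^{n',m'}_{n,m}(-1)^{n't-m's}$, is easy to confirm numerically from (2.17). The following sketch is illustrative; the function names and the tolerance are choices made here, not taken from the source.

```python
import math

def s_matrix(s, t, n, m, n1, m1):
    """Entry S^{n1,m1}_{n,m} of the modular S-matrix as quoted in (2.17)."""
    sign = -1 if (n * m1 + m * n1 + 1) % 2 else 1
    return (math.sqrt(8 / (s * t)) * sign
            * math.sin(math.pi * n * n1 * t / s)
            * math.sin(math.pi * m * m1 * s / t))

def reflection_holds(s, t, tol=1e-9):
    """Check S_{n,t-m} = -S_{n,m} * (-1)^(n1*t - m1*s) over all labels (1.6)."""
    for n in range(1, s):
        for m in range(1, t):
            for n1 in range(1, s):
                for m1 in range(1, t):
                    parity = -1 if (n1 * t - m1 * s) % 2 else 1
                    lhs = s_matrix(s, t, n, t - m, n1, m1)
                    rhs = -s_matrix(s, t, n, m, n1, m1) * parity
                    if not math.isclose(lhs, rhs, abs_tol=tol):
                        return False
    return True

for s, t in [(3, 4), (4, 5), (5, 6), (3, 10)]:
    print((s, t), reflection_holds(s, t))
```

With $|\bar n t - \bar m s| = 1$ the parity factor is −1 and the two entries agree, whereas for $|\tilde n t - \tilde m s| = 2$ it is +1 and they differ by a sign, which is what separates (2.18) from (2.19).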

2.3. Factorization of linear combinations. Exact formulae. Now we are in the position to find all combinations of type (2.11) which are factorizable on the base of the Gauß-Jacobi and Watson identities. First, it follows from (1.7) that

$$\hat\chi^{s,t}_{n,m}(q) \pm q^{\Delta h^{s,t}_{n,m}}\, \hat\chi^{s,t}_{n,t-m}(q) = 1 - q^{nm} \pm q^{\Delta h^{s,t}_{n,m}} + \cdots, \tag{2.20}$$

and the further terms are of higher powers in q. Here $\Delta h^{s,t}_{n,m}$ is given by (2.12) and we assume n < s/2, m < t/2; notice that then $nm \neq \Delta h^{s,t}_{n,m}$.

We will consider the sum of characters first. Let us assume that it is factorizable on the base of the Gauß-Jacobi identity (1.22), whose expansion on the l.h.s. is $1 - q^a + q^b +$ higher order terms. Then we infer from (2.20) that a = nm and $b = \Delta h^{s,t}_{n,m}$. The r.h.s. of (1.22) gives the following effective central charge of the combination in question: $c_{\rm eff} = 1 - \frac{3}{4(a+b)}$. Comparing it with (1.29), we obtain the equation 8(a + b) = st, or more explicitly

$$4\, nm + \frac{1}{4}\, st = nt + ms. \tag{2.21}$$

$^4$ In fact, a more general statement is valid: for s and t obeying (1.3) and positive integer k such that k < min(s, t) there exists always a solution of the equation |nt − ms| = k obeying (1.6). It is given by $n = k\bar n - ps$ and $m = k\bar m - pt$, where p is some integer depending on k.

$^5$ For M(2, t) these terms cancel since $|\tilde n t - \tilde m s| = 3$. This is not surprising because in this case $\chi^{2,t}_{1,m}(q) - \chi^{2,t}_{1,t-m}(q)$ vanishes due to (1.8).


According to (2.1), this implies 4n = s or 4m = t.

If we assume that the difference of characters in (2.20) is factorizable on the base of the Gauß-Jacobi identity (1.20), we have to put a = nm, $b = \Delta h^{s,t}_{n,m}$ or vice versa. According to (2.19), the asymptotics $q \to 1^-$ defines the secondary effective central charge, and comparison with the r.h.s. of (1.20) yields $\tilde c = 1 - \frac{3}{a+b}$. Together with (2.16), we obtain the equation 8(a + b) = st which leads to the same condition (2.21) found for the sum of characters.

Thus, we have shown that the only possible (up to the symmetries (1.8)) combination of characters factorizable on the base of the Gauß-Jacobi identity is $\chi^{4n,t}_{n,m}(q) \pm \chi^{4n,t}_{n,t-m}(q)$ and that its factorization has to be of the following form:

$$\chi^{4n,t}_{n,m}(q) + \chi^{4n,t}_{n,t-m}(q) = \frac{q^{h^{4n,t}_{n,m} - \frac{c(4n,t)}{24}}}{(q)_\infty}\, \{nm\}^-_{nt}\{nt - nm\}^-_{nt}\{nt\}^-_{nt} \times \{nt/2 - nm\}^+_{nt}\{nt/2\}^+_{nt}\{nt/2 + nm\}^+_{nt}, \tag{2.22}$$

$$\chi^{4n,t}_{n,m}(q) - \chi^{4n,t}_{n,t-m}(q) = \frac{q^{h^{4n,t}_{n,m} - \frac{c(4n,t)}{24}}}{(q)_\infty}\, \{nm\}^-_{\frac{nt}{2}}\{nt/2 - nm\}^-_{\frac{nt}{2}}\{nt/2\}^-_{\frac{nt}{2}}. \tag{2.23}$$

Here $\langle t, 2\rangle = \langle t, n\rangle = 1$. The direct proof of these relations is performed again by matching them with (1.7) (see Appendix B). Notice that in the case of odd n it suffices to prove only one of the relations, say (2.23). Indeed, in this case $\Delta h^{4n,t}_{n,m} = n(t - 2m)/2$ is semi-integer, so that changing the signs of all semi-integer powers in the series on the l.h.s. of (2.22), we obtain the series on the l.h.s. of (2.23). Therefore, the r.h.s. of (2.22) is derived from the r.h.s. of (2.23) with the help of (1.15).

Now we apply the same technique as above in order to find the differences of type (2.11) which are factorizable on the base of the Watson identity. We assume they have the form of Eq. (1.21), whose expansion on the l.h.s. is $1 - q^a - q^b +$ higher order terms. Then we infer from (2.20) that a = nm and $b = \Delta h^{s,t}_{n,m}$ or $a = \Delta h^{s,t}_{n,m}$ and b = nm. The r.h.s. of (1.21) gives the following secondary effective central charge of the combination in question: $\tilde c = 1 - \frac{4}{a+2b}$. Comparing it with (2.16), we obtain the equation 6(a + 2b) = st, which gives for the two possible choices of a and b,

$$3\, nm + \frac{1}{3}\, st = nt + ms, \qquad \text{and} \qquad 6\, nm + \frac{1}{6}\, st = nt + ms, \tag{2.24}$$

respectively. According to (2.1), this implies 3n = s or 3m = t in the first case and 6n = s or 6m = t in the second.

Assuming that the sum of characters in (2.20) is factorized on the base of the Watson identity (1.23), we have to put a = nm, $b = \Delta h^{s,t}_{n,m}$. Then the r.h.s. of (1.23) gives $c_{\rm eff} = 1 - \frac{1}{a+2b}$. Comparing it with (1.29), we obtain the equation 6(a + 2b) = st and, thus, we recover the first equation in (2.24). So, this is once more the case n = s/3 or m = t/3.

It turns out that the sum of characters in (2.20) cannot be factorized on the base of the Watson identity (1.24). Indeed, its l.h.s. is the following series: $1 + q^a - q^b + q^{2a+b} - q^{a+5b} +$ higher order terms. On the other hand, for n < s/2, m < t/2 we have

$$\hat\chi^{s,t}_{n,m}(q) \pm q^{\Delta h^{s,t}_{n,m}}\, \hat\chi^{s,t}_{n,t-m}(q) = 1 - q^{nm} \pm q^{\Delta h^{s,t}_{n,m}} \mp q^{\Delta h^{s,t}_{n,m} + n(t-m)} \mp q^{\Delta h^{s,t}_{n,m} + m(s-n)} + \ldots, \tag{2.25}$$


where further terms are of higher powers in q. Evidently, these two series cannot match because of the wrong sign of the $q^{2a+b}$ term.

Thus, the only possible (up to the symmetries (1.8)) combinations of characters factorizable on the base of the Watson identity are $\chi^{3n,t}_{n,m}(q) \pm \chi^{3n,t}_{n,t-m}(q)$ and $\chi^{6n,t}_{n,m}(q) - \chi^{6n,t}_{n,t-m}(q)$, and their factorizations have to be of the following form:

$$\chi^{3n,t}_{n,m}(q) \pm \chi^{3n,t}_{n,t-m}(q) = \frac{q^{h^{3n,t}_{n,m} - \frac{c(3n,t)}{24}}}{(q)_\infty}\, \{nm\}^-_{nt}\{nt - nm\}^-_{nt} \times \left\{\frac{nt}{2}\right\}^-_{\frac{nt}{2}} \left\{\frac{nt - 2nm}{4}\right\}^\pm_{\frac{nt}{2}} \left\{\frac{nt + 2nm}{4}\right\}^\pm_{\frac{nt}{2}}, \tag{2.26}$$

$$\chi^{6n,t}_{n,m}(q) - \chi^{6n,t}_{n,t-m}(q) = \frac{q^{h^{6n,t}_{n,m} - \frac{c(6n,t)}{24}}}{(q)_\infty}\, \{nm\}^-_{nt}\{nt - nm\}^-_{nt}\{nt\}^-_{nt} \times \{nt - 2nm\}^-_{2nt}\{nt + 2nm\}^-_{2nt}. \tag{2.27}$$

Here $\langle t, 3\rangle = \langle t, n\rangle = 1$ in (2.26) and $\langle t, 6\rangle = \langle t, n\rangle = 1$ in (2.27). The direct proof of these relations is performed again by matching them with (1.7) (see Appendix B).

Combining (2.22)–(2.23) and (2.26), we also obtain

$$\chi^{4n,3m}_{n,m}(q) + \chi^{4n,3m}_{n,2m}(q) = \frac{q^{\frac{nm-2}{48}}}{(q)_\infty}\, \{nm\}^-_{nm}\{nm/2\}^+_{nm}, \tag{2.28}$$

$$\chi^{4n,3m}_{n,m}(q) - \chi^{4n,3m}_{n,2m}(q) = \frac{q^{\frac{nm-2}{48}}}{(q)_\infty}\, \{nm/2\}^-_{\frac{nm}{2}}, \tag{2.29}$$

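where $\langle n, 3\rangle = \langle m, 2\rangle = \langle n, m\rangle = 1$.

To conclude this subsection we mention an interesting byproduct, which follows from (2.6) and (2.26),

$$\{nm\}^-_{nt}\{nt - nm\}^-_{nt} \left\{\frac{nt}{2}\right\}^-_{\frac{nt}{2}} \left\{\frac{nt - 2nm}{4}\right\}^\pm_{\frac{nt}{2}} \left\{\frac{nt + 2nm}{4}\right\}^\pm_{\frac{nt}{2}} = \{2nt\}^-_{2nt}\{nm\}^-_{2nt}\{2nt - nm\}^-_{2nt}\{2nt - 2nm\}^-_{4nt}\{2nt + 2nm\}^-_{4nt} \pm q^{\frac{nt - 2nm}{4}}\, \{2nt\}^-_{2nt}\{nt - nm\}^-_{2nt}\{nt + nm\}^-_{2nt}\{2nm\}^-_{4nt}\{4nt - 2nm\}^-_{4nt}, \tag{2.30}$$

which may also be rewritten as

$$\left\{\frac{nt}{2}\right\}^-_{nt} \left\{\frac{nt - 2nm}{4}\right\}^\pm_{\frac{nt}{2}} \left\{\frac{nt + 2nm}{4}\right\}^\pm_{\frac{nt}{2}} = \{nt\}^+_{nt}\{nt - nm\}^+_{2nt}\{nt + nm\}^+_{2nt} \pm q^{\frac{nt - 2nm}{4}}\, \{nt\}^+_{nt}\{nm\}^+_{2nt}\{2nt - nm\}^+_{2nt}. \tag{2.31}$$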

This identity resembles particular formulae in [19] ((A5) and (A6) therein), which were useful to derive a different type of identities between characters. Analogous identities following from (2.28)–(2.29) and (2.26) are

$$\{3nm\}^+_{8nm}\{5nm\}^+_{8nm} \pm q^{\frac{nm}{2}}\, \{nm\}^+_{8nm}\{7nm\}^+_{8nm} = \{nm/2\}^\pm_{nm}\{2nm\}^-_{4nm}\{4nm\}^-_{8nm}. \tag{2.32}$$
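The factorized combinations (2.22), (2.23), (2.26) and (2.27) involve blocks with half- and quarter-integer arguments, so a finite-order check is most conveniently done in the variable $x = q^{1/4}$. The sketch below works under that assumption; all helper names are hypothetical, exponents are stored in units of 1/4, and the common prefactor $q^{h - c/24}/(q)_\infty$ is again dropped.

```python
def hat(s, t, n, m, order):
    """Incomplete character of (1.7) as a series in x = q^(1/4): entry i <-> q^(i/4)."""
    series = [0] * order
    for k in range(-order, order + 1):
        e1 = 4 * (s * t * k * k + k * (n * t - m * s))
        e2 = 4 * (s * t * k * k + k * (n * t + m * s) + n * m)
        if 0 <= e1 < order:
            series[e1] += 1
        if 0 <= e2 < order:
            series[e2] -= 1
    return series

def combination(s, t, n, m, sign, order):
    """L.h.s. of (2.20): hat_{n,m} + sign * q^(Delta h) * hat_{n,t-m}, for n < s/2, m < t/2."""
    shift = (t - 2 * m) * (s - 2 * n)  # equals 4 * Delta h, see (2.12)
    first, second = hat(s, t, n, m, order), hat(s, t, n, t - m, order)
    return [first[i] + (sign * second[i - shift] if i >= shift else 0) for i in range(order)]

def product(blocks, order):
    """Product of blocks (x4, y4, sign) = prod_k (1 + sign * q^((x4 + k*y4)/4))."""
    series = [1] + [0] * (order - 1)
    for x4, y4, sign in blocks:
        e = x4
        while e < order:
            for i in range(order - 1, e - 1, -1):
                series[i] += sign * series[i - e]
            e += y4
    return series

def blocks_2_22(n, t, m):
    y = 4 * n * t
    return [(4 * n * m, y, -1), (4 * n * (t - m), y, -1), (y, y, -1),
            (2 * n * t - 4 * n * m, y, 1), (2 * n * t, y, 1), (2 * n * t + 4 * n * m, y, 1)]

def blocks_2_23(n, t, m):
    y = 2 * n * t
    return [(4 * n * m, y, -1), (y - 4 * n * m, y, -1), (y, y, -1)]

def blocks_2_26(n, t, m, sign):
    return [(4 * n * m, 4 * n * t, -1), (4 * n * (t - m), 4 * n * t, -1), (2 * n * t, 2 * n * t, -1),
            (n * t - 2 * n * m, 2 * n * t, sign), (n * t + 2 * n * m, 2 * n * t, sign)]

def blocks_2_27(n, t, m):
    y = 4 * n * t
    return [(4 * n * m, y, -1), (4 * n * (t - m), y, -1), (y, y, -1),
            (y - 8 * n * m, 2 * y, -1), (y + 8 * n * m, 2 * y, -1)]

ORDER = 200  # i.e. up to q^(199/4)
for n, t, m in [(1, 3, 1), (1, 5, 2), (2, 3, 1)]:
    print("(2.22)", combination(4 * n, t, n, m, 1, ORDER) == product(blocks_2_22(n, t, m), ORDER),
          "(2.23)", combination(4 * n, t, n, m, -1, ORDER) == product(blocks_2_23(n, t, m), ORDER))
for n, t, m in [(1, 4, 1), (1, 5, 1), (1, 5, 2)]:
    print("(2.26)", [combination(3 * n, t, n, m, sg, ORDER) == product(blocks_2_26(n, t, m, sg), ORDER)
                     for sg in (1, -1)])
for n, t, m in [(1, 5, 1), (1, 5, 2)]:
    print("(2.27)", combination(6 * n, t, n, m, -1, ORDER) == product(blocks_2_27(n, t, m), ORDER))
```

Working in $q^{1/4}$ keeps every exponent an integer, so the comparison is exact rather than floating-point.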


2.4. Remarks on the factorized combinations. The factorized characters given by (2.4) and (2.6) can be rewritten in the “inverse product” form (examples of such representation are given in Appendix A)$^6$

$$\frac{q^{\rm const}}{\prod_{i=1}^{N} \{x_i\}^-_{y_i}}. \tag{2.33}$$

In order to achieve this, one rewrites $(q)_\infty = \{1\}^-_1$ with the help of (1.14) as a product of some number of blocks and then cancels all blocks in the numerator with some of those in the denominator. The only problem here is to verify that all blocks in (2.4) and (2.6) are different. Equation (2.4) could have coinciding blocks only if t = 2m. This is however excluded by the condition $\langle t, 2\rangle = 1$ which must hold because of (1.3). Equation (2.6) could have coinciding blocks if t = 3m, t = 3m/2 or t = 2m. The first two possibilities are excluded by the condition $\langle t, 3\rangle = 1$. The last one is allowed, but this case is described by the reduced formula (2.7), which is obviously representable in the form (2.33).

The inverse product form (2.33) (it is rather common for characters of Kac-Moody algebras [12]) can be interpreted as a character of a module with states created by bosonic type operators. Having the characters in the form (2.33) implies that the dimension of the level k in the Verma module of the irreducible representation is simply the number of partitions $k = x_1 + \ldots + x_N + \sum_{i=1}^{N} n_i y_i$ with $n_i$ being an arbitrary non-negative integer. This suggests that the states at this level are simply monomials of the form (1.2). If any power of a generator having a given grading k is allowed, the character acquires a factor $(1 - q^k)$ in the denominator. It is guaranteed that any monomial by itself (apart from $L_{-1}|h = 0\rangle$) can never constitute a null-vector, as follows from the following simple argument. A null-vector has by definition zero norm or equivalently it is annihilated by $L_n$ for all n > 0. Hence to prove our statement it is sufficient to show for one n that $L_n$ acting on (1.2) is non-vanishing. It is easy to verify for $k_1 \neq k$ that $L_k$ acting on (1.2) vanishes only for h = 0. In case $k_1 = k \neq 1$, the action of $L_{k-1}$ is always non-vanishing. However, one may not guarantee that all these monomials are linearly independent.
It turns out that all of the factorized combinations of characters (2.22)–(2.27) and (2.28)–(2.29) can be rewritten in the inverse product form generalizing (2.33), namely as

$$\frac{q^{\rm const}}{\prod_{i=1}^{N} \{x_i\}^-_{y_i}\, \prod_{j=1}^{M} \{\tilde x_j\}^+_{y_j}}. \tag{2.34}$$

In particular, (2.23) for even n, the lower sign in (2.26) for integer nt/4 and nm/2, and (2.27) can be analyzed easily in the way we presented above and correspond to (2.34) with M = 0. The analysis of other cases is slightly more involved (since we encounter $\{\ \}^+$ blocks and blocks with non-integer arguments) but goes essentially along the same lines. Consider, for instance, (2.22). Using (1.14) and (1.16), we can rewrite its r.h.s. as follows (we use here the notation $\{x_1; \ldots; x_n\}^\pm_y := \{x_1\}^\pm_y \ldots \{x_n\}^\pm_y$):

$$q^{\rm const}\, \frac{\{nm;\ n(t-2m);\ n(t-m);\ nt;\ n(t+m);\ n(2t-m);\ n(t+2m);\ 2nt\}^-_{2nt}}{\{1\}^-_1\, \{nt/2 - nm;\ nt/2;\ nt/2 + nm\}^-_{nt}\, \{nt\}^+_{nt}}.$$

For n and m in the range (2.13) the numerator could have coinciding blocks only if t = 3m. However, in this case we have the reduced formula (2.28) which is readily seen to be representable in form (2.34) if we take (1.16) into account. Analysis of (2.23), (2.26) and (2.29) is performed analogously (notice only that for (2.26) one has to distinguish the cases nt/2 = 0, 1 mod 2).

$^6$ Exactly this form was an aim in [2].

Thus, all the factorizable combinations of characters of type (2.11) admit the form (2.34). Examples of such representation are given in Appendix A. Moreover, we prove (see Appendix C) that there are no other factorizable differences of this type which admit the inverse product form (2.33). This is a rather surprising fact because the Gauß-Jacobi and Watson identities are the specific Macdonald identities [18] for the $A_1^{(1)}$ and $A_2^{(2)}$ algebras and one could expect that the higher Macdonald identities also lead to similar factorizations.

It is worth noticing that some of the factorizable combinations discussed above admit the following form:

$$q^{\rm const}\, \frac{\prod_{j=1}^{M} \{\tilde x_j\}^+_{y_j}}{\prod_{i=1}^{N} \{x_i\}^-_{y_i}}. \tag{2.35}$$

This is the most natural form if we consider such an expression as a character (e.g. in the context of the super-conformal models, see Subsect. 3.4) of a module with states created not only by bosonic type operators but also by fermionic type operators, which produce the blocks in the numerator. Also, the form (2.35) gives particularly simple formulae for quasi-particle momenta (see Subsect. 3.3).

3. Applications

In the rest of the paper we will present some corollaries and applications of the obtained results both in a mathematical and physical context.

3.1. Identities between characters. We commence by matching the product sides of the formulae for the factorized linear combinations of characters with those for the factorized single characters. For (2.23) this yields

$$\chi^{8n,t}_{2n,m}(q) - \chi^{8n,t}_{2n,t-m}(q) = \chi^{2n,t}_{n,2m}(q), \tag{3.1}$$

where $\langle t, 2\rangle = \langle t, n\rangle = 1$. Notice that this identity is exact in the sense that it does not need an extra factor of type $q^{\rm const}$ on the r.h.s. because $h^{2n,t}_{n,2m} - c(2n,t)/24 = h^{8n,t}_{2n,m} - c(8n,t)/24$.$^7$ Since $\chi^{2,3}_{1,1}(q) = 1$, we obtain, as a particular case, the identity (which was also presented in [3] in a different context)

$$\chi^{3,8}_{1,2}(q) - \chi^{3,8}_{1,6}(q) = 1.$$

This is the only possible identity of the type $\chi^{s,t}_{n,m}(q) - \chi^{s,t}_{n,t-m}(q) = q^{\rm const}$ because it requires $\tilde c(s,t) = 0$. According to (2.16), this implies st = 24. The latter equation is solved uniquely (up to a permutation of s and t) due to (1.3).

For (2.27) and (2.26) we obtain analogously

$$\chi^{12n,t}_{4n,m}(q) - \chi^{12n,t}_{8n,m}(q) = \chi^{3n,t}_{2n,2m}(q), \qquad \chi^{12n,t}_{2n,m}(q) - \chi^{12n,t}_{10n,m}(q) = \chi^{3n,t}_{n,2m}(q), \tag{3.2}$$

$^7$ This property, which actually holds for all identities in this subsection, hints at specific modular properties of the combinations of type (2.11).


where $\langle t, 6\rangle = \langle n, t\rangle = 1$. These identities are also exact. The first nontrivial examples of this kind are $\chi^{12,5}_{4,m}(q) - \chi^{12,5}_{8,m}(q) = \chi^{3,5}_{2,2m}(q)$ and $\chi^{12,5}_{2,m}(q) - \chi^{12,5}_{10,m}(q) = \chi^{3,5}_{1,2m}(q)$, m = 1, 2. Furthermore, the characters on the r.h.s. of (3.2) form a pair of the type (2.11), and applying (2.26), we obtain (assuming m < t/4 for definiteness)

$$\chi^{12n,t}_{2n,m}(q) - \chi^{12n,t}_{10n,m}(q) \pm \chi^{12n,t}_{4n,m}(q) \mp \chi^{12n,t}_{8n,m}(q) = \frac{q^{h^{3n,t}_{n,2m} - \frac{c(3n,t)}{24}}}{(q)_\infty}\, \{2nm\}^-_{nt}\{nt - 2nm\}^-_{nt}\{nt/2\}^-_{\frac{nt}{2}}\{nt/4 - nm\}^\pm_{\frac{nt}{2}}\{nt/4 + nm\}^\pm_{\frac{nt}{2}}. \tag{3.3}$$

Finally, matching the r.h.s. of (3.3) for the lower sign with (2.6), we obtain

$$\chi^{48n,t}_{8n,m}(q) - \chi^{48n,t}_{16n,m}(q) + \chi^{48n,t}_{32n,m}(q) - \chi^{48n,t}_{40n,m}(q) = \chi^{3n,t}_{2n,4m}(q). \tag{3.4}$$

The first nontrivial example is $\chi^{48,5}_{8,1}(q) - \chi^{48,5}_{16,1}(q) + \chi^{48,5}_{32,1}(q) - \chi^{48,5}_{40,1}(q) = \chi^{3,5}_{1,1}(q)$.

Another way to derive some new identities is to match the product sides of different factorized linear combinations. In particular, one easily recovers the property (1.9) for combinations

$$\chi^{s,\alpha t}_{n,\alpha m}(q) \pm \chi^{s,\alpha t}_{n,\alpha(t-m)}(q) = \chi^{\alpha s,t}_{\alpha n,m}(q) \pm \chi^{\alpha s,t}_{\alpha n,t-m}(q), \tag{3.5}$$

if α is a positive integer such that $\langle t, \alpha\rangle = \langle s, \alpha\rangle = 1$. For instance, $\chi^{3,10}_{1,2m}(q) \pm \chi^{3,10}_{2,2m}(q) = \chi^{5,6}_{m,2}(q) \pm \chi^{5,6}_{m,4}(q)$, m = 1, 2.

Less obvious identities between characters of different models having the same $c_{\rm eff}$ follow if we compare the r.h.s. of (2.26) and (2.27):

$$\chi^{3n,2t}_{n,t-2m}(q) - \chi^{3n,2t}_{n,t+2m}(q) = \chi^{6n,t}_{n,m}(q) - \chi^{6n,t}_{5n,m}(q), \tag{3.6}$$

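where $\langle t, 6\rangle = \langle n, 2\rangle = \langle t, n\rangle = 1$, m < t/2. For instance, $\chi^{3,10}_{1,1}(q) - \chi^{3,10}_{2,1}(q) = \chi^{5,6}_{2,1}(q) - \chi^{5,6}_{2,5}(q)$.

Employing the factorized form of (combinations of) characters, we can derive identities involving their bilinear combinations. For instance, it is straightforward to verify the following relations (see Appendix D for a sample proof):

$$\chi^{3n,2m}_{n,m}\, \chi^{4n,5m}_{2n,m} = \left(\chi^{6n,5m}_{n,2m} - \chi^{6n,5m}_{n,3m}\right) \chi^{3n,4m}_{n,2m}, \tag{3.7}$$

$$\chi^{3n,2m}_{n,m}\, \chi^{4n,5m}_{2n,2m} = \left(\chi^{6n,5m}_{n,m} - \chi^{6n,5m}_{n,4m}\right) \chi^{3n,4m}_{n,2m}, \tag{3.8}$$

$$\left(\chi^{4n,5m}_{n,m} \pm \chi^{4n,5m}_{n,4m}\right) \chi^{3n,2m}_{n,m} = \left(\chi^{6n,5m}_{2n,2m} \mp \chi^{6n,5m}_{2n,3m}\right)\left(\chi^{3n,4m}_{n,m} \pm \chi^{3n,4m}_{n,3m}\right), \tag{3.9}$$

$$\left(\chi^{4n,5m}_{n,2m} \pm \chi^{4n,5m}_{n,3m}\right) \chi^{3n,2m}_{n,m} = \left(\chi^{6n,5m}_{2n,m} \mp \chi^{6n,5m}_{2n,4m}\right)\left(\chi^{3n,4m}_{n,m} \pm \chi^{3n,4m}_{n,3m}\right), \tag{3.10}$$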

which in turn lead to the identities 4n,5m 6n,5m 4n,5m 6n,5m 6n,5m 6n,5m − χn,4m − χn,3m χn,m = χ2n,2m χn,2m , χ2n,m 4n,5m 4n,5m 6n,5m 6n,5m ± χn,3m ∓ χ2n,2m χ2n,2m χn,2m 4n,5m 6n,5m 6n,5m 4n,5m ± χn,4m ∓ χ2n,4m χ2n,m . = χn,m

(3.11)

(3.12)


A. G. Bytsko, A. Fring

We have omitted the q-dependence for compactness of the formulae. Once more we would like to point out that these relations are exact (see (D.1)). A particular case of (3.11) and (3.12) for $n = m = 1$ was found in [19]. Further interesting identities are for instance

$$\left(\chi^{3n,4m}_{n,m}(q) + \chi^{3n,4m}_{n,3m}(q)\right) \left(\chi^{3n,4m}_{n,m}(q) - \chi^{3n,4m}_{n,3m}(q)\right) = q^{-\frac{nm}{24}} \left(\chi^{3n,2m}_{n,m}(q)\right)^2 \{nm\}^-_{2nm}, \tag{3.13}$$

$$\chi^{5,6}_{1,2}(q)\, \chi^{5,6}_{2,2}(q) - \chi^{5,6}_{1,4}(q)\, \chi^{5,6}_{2,4}(q) = \left(\chi^{3,4}_{1,2}(q)\right)^2, \tag{3.14}$$

$$\left(\chi^{4,5}_{1,2}(q) \pm \chi^{4,5}_{1,3}(q)\right) \left(\chi^{5,6}_{2,2}(q) \mp \chi^{5,6}_{2,4}(q)\right) = \chi^{4,15}_{1,5}(q) \pm \chi^{4,15}_{3,5}(q). \tag{3.15}$$

Equation (3.13) for $n = m = 1$ yields the well-known relation

$$\left( (\chi^{3,4}_{1,1}(q))^2 - (\chi^{3,4}_{1,3}(q))^2 \right) \chi^{3,4}_{1,2}(q) = 1.$$

It is of a certain interest to search for relations between (combinations of) characters with rescaled q. The rescaling $q \to q^r$ or, equivalently, $\tau \to r\tau$ can be regarded as a transformation relating theories on two different tori. In statistical mechanics, where $\tau$ is considered as a physical parameter (e.g. inverse temperature or width of a strip), this transformation relates two models at different values of this parameter. In order to match the factorized (combinations of) characters involving those with rescaled q it is useful to take into account that such rescaling, $q \to q^r$, also leads to the rescaling $c_{\rm eff} \to c_{\rm eff}/r$ (and $\tilde c \to \tilde c/r$). We present here only several examples relating characters of some models with interesting physical content under the transformation $q \to q^2$:

$$\chi^{3,4}_{1,1}(q^2) - \chi^{3,4}_{1,3}(q^2) = \left(\chi^{3,4}_{1,2}(q)\right)^{-1}, \tag{3.16}$$

$$\chi^{5,6}_{1,2}(q^2) + \chi^{5,6}_{1,4}(q^2) = \chi^{2,5}_{1,1}(q), \qquad \chi^{5,6}_{2,2}(q^2) + \chi^{5,6}_{2,4}(q^2) = \chi^{2,5}_{1,2}(q), \tag{3.17}$$

$$\chi^{5,6}_{1,1}(q) - \chi^{5,6}_{1,5}(q) = \chi^{2,5}_{1,2}(q^2), \qquad \chi^{5,6}_{2,1}(q) - \chi^{5,6}_{2,5}(q) = \chi^{2,5}_{1,1}(q^2), \tag{3.18}$$

$$\chi^{6,7}_{2,1}(q^2) + \chi^{6,7}_{2,6}(q^2) = \chi^{6,7}_{1,3}(q) - \chi^{6,7}_{1,4}(q), \tag{3.19a}$$

$$\chi^{6,7}_{2,2}(q^2) + \chi^{6,7}_{2,5}(q^2) = \chi^{6,7}_{1,1}(q) - \chi^{6,7}_{1,6}(q), \tag{3.19b}$$

$$\chi^{6,7}_{2,3}(q^2) + \chi^{6,7}_{2,4}(q^2) = \chi^{6,7}_{1,2}(q) - \chi^{6,7}_{1,5}(q). \tag{3.20}$$

Finally, it may be of some interest to consider relations between (combinations of) incomplete characters with rescaled q. For instance, we have

$$\hat\chi^{4n,t}_{n,m}(q^2) - \hat\chi^{4n,t}_{3n,m}(q^2) = \hat\chi^{2n,t}_{n,2m}(q), \qquad \hat\chi^{6n,t}_{n,m}(q^2) - \hat\chi^{6n,t}_{5n,m}(q^2) = \hat\chi^{3n,t}_{n,2m}(q). \tag{3.21}$$

Identities between the corresponding full characters are then obtained by multiplication of the r.h.s. with $\left(q^{\frac{1}{24}} \{1\}^+_1\right)^{-1}$.
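The rescaled-q relations above can be checked order by order with a few lines of code. The following is a minimal sketch, not part of the original derivation: it assumes the standard sum representation of a minimal-model character (the same one that appears inside (B.1)), the standard formulas for $h^{s,t}_{n,m}$ and $c(s,t)$, and it only tests the first relation in (3.17) up to a finite truncation order `N`. All function names are illustrative.

```python
from fractions import Fraction

N = 60  # truncation order of all series in q


def weight(s, t, n, m):
    """h^{s,t}_{n,m}, standard minimal-model formula (assumed here)."""
    return Fraction((n * t - m * s) ** 2 - (t - s) ** 2, 4 * s * t)


def central_charge(s, t):
    """c(s,t) = 1 - 6 (s-t)^2 / (s t), standard formula (assumed here)."""
    return 1 - Fraction(6 * (s - t) ** 2, s * t)


def incomplete_character(s, t, n, m, step=1):
    """Coefficients of the incomplete character at q**step, up to q**N."""
    coeffs = [0] * (N + 1)
    for k in range(-N, N + 1):
        plus = (s * t * k * k + k * (n * t - m * s)) * step
        minus = (s * t * k * k + k * (n * t + m * s) + n * m) * step
        if 0 <= plus <= N:
            coeffs[plus] += 1
        if 0 <= minus <= N:
            coeffs[minus] -= 1
    return coeffs


def inverse_euler(step=1):
    """Coefficients of 1/(q^step; q^step)_infinity up to q**N."""
    coeffs = [1] + [0] * N
    for part in range(step, N + 1, step):
        for i in range(part, N + 1):
            coeffs[i] += coeffs[i - part]
    return coeffs


def multiply(a, b):
    """Truncated product of two series."""
    out = [0] * (N + 1)
    for i, ai in enumerate(a):
        if ai:
            for j in range(N + 1 - i):
                out[i + j] += ai * b[j]
    return out


def shifted_sum(a, b, shift):
    """a(q) + q**shift * b(q), truncated at q**N."""
    out = list(a)
    for j in range(N + 1 - shift):
        out[j + shift] += b[j]
    return out


def check_rescaled_identity():
    # Relative power of q between the two (5,6) characters after q -> q**2.
    gap = 2 * (weight(5, 6, 1, 4) - weight(5, 6, 1, 2))
    assert gap.denominator == 1
    lhs = multiply(
        shifted_sum(incomplete_character(5, 6, 1, 2, step=2),
                    incomplete_character(5, 6, 1, 4, step=2),
                    int(gap)),
        inverse_euler(step=2),
    )
    rhs = multiply(incomplete_character(2, 5, 1, 1), inverse_euler())
    # Overall fractional powers of q on both sides must agree as well.
    lhs_power = 2 * (weight(5, 6, 1, 2) - central_charge(5, 6) / 24)
    rhs_power = weight(2, 5, 1, 1) - central_charge(2, 5) / 24
    return lhs == rhs and lhs_power == rhs_power
```

Calling `check_rescaled_identity()` compares both the overall power of q and every series coefficient up to order `N`; the other relations in (3.16)-(3.20) can be tested the same way by changing the labels.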

Factorized Combinations of Virasoro Characters


3.2. Rogers-Ramanujan type identities. Once we have achieved factorization of (combinations of) characters in the form (2.35), we can employ (1.12) in order to re-express the product as a sum of type distinct from (1.7). More precisely, combining (1.12) with (1.13) and substituting into (2.35), we obtain

$$q^{\rm const}\, \frac{\prod_{j=1}^{M} \{\tilde x_j\}^+_y}{\prod_{i=1}^{N} \{x_i\}^-_y} = q^{\rm const} \sum_{\vec l} \frac{q^{(l_1^2+\dots+l_M^2-l_1-\dots-l_M)y/2 + \vec B\cdot\vec l}}{(q^y)_{l_1} \cdots (q^y)_{l_{M+N}}}, \tag{3.22}$$

where $\vec B = \{\tilde x_1, \dots, \tilde x_M, x_1, \dots, x_N\}$ and $\vec l$ has $(M+N)$ components running through non-negative integers. The structure of this identity resembles the famous Rogers–Ramanujan identities (which are in fact just two ways of writing down the characters $\chi^{2,5}_{1,1}$ and $\chi^{2,5}_{1,2}$, see Appendix A)

$$\sum_{l=0}^{\infty} \frac{q^{l^2}}{(q)_l} = \frac{1}{\{1;4\}^-_5}, \qquad \sum_{l=0}^{\infty} \frac{q^{l^2+l}}{(q)_l} = \frac{1}{\{2;3\}^-_5}. \tag{3.23}$$

However, whereas Eq. (3.22) may be decomposed into a product of identities (1.12), such simplifications are not possible in the proof of the Rogers–Ramanujan identities (see e.g. [23]). Thus, in order to obtain more interesting generalizations of the Rogers–Ramanujan identities involving our factorized form of (combinations of) characters as a product side, we need another expression for the sum on the r.h.s. of (3.22). For this purpose we make use of the results of [21] where it was observed that some Virasoro characters admit the following form:

$$q^{\rm const} \sum_{\vec l} \frac{q^{\vec l^{\,t} A \vec l + \vec B\cdot\vec l}}{(q)_{l_1} \cdots (q)_{l_n}}, \tag{3.24}$$

where A is a real $n \times n$ symmetric matrix (sometimes coinciding with the inverse Cartan matrix of a simply-laced Lie algebra), $\vec B$ is an n-component vector, and the summation may be restricted by a condition of the type $\vec\gamma\cdot\vec l = Q \pmod{\alpha}$ with some integer valued $\vec\gamma$ and positive Q and $\alpha$. It turns out that some of the characters of minimal models admitting the form (3.24) are either factorizable per se or can be combined into the factorizable combinations considered above. This circumstance allows us to apply the results of Sect. 2 and derive a set of Rogers–Ramanujan type identities. For instance

$$q^{-\frac{1}{40}}\, \chi^{3,5}_{1,1}(q) = \sum_{\substack{l=0 \\ \text{even}}}^{\infty} \frac{q^{(l^2+2l)/4}}{(q)_l} = \frac{\{4\}^+_{10}\, \{6\}^+_{10}}{\{2\}^-_{10}\, \{3\}^-_{10}\, \{5\}^-_{10}\, \{7\}^-_{10}\, \{8\}^-_{10}}, \tag{3.25}$$

$$q^{-\frac{1}{40}}\, \chi^{3,5}_{1,4}(q) = \sum_{\substack{l=1 \\ \text{odd}}}^{\infty} \frac{q^{(l^2+2l)/4}}{(q)_l} = q^{\frac{3}{4}}\, \frac{\{1\}^+_{10}\, \{9\}^+_{10}}{\{2\}^-_{10}\, \{3\}^-_{10}\, \{5\}^-_{10}\, \{7\}^-_{10}\, \{8\}^-_{10}}. \tag{3.26}$$

Furthermore, we can apply (2.26) to combinations of the l.h.s. which yields

$$q^{-\frac{1}{40}} \left(\chi^{3,5}_{1,1}(q) \pm \chi^{3,5}_{1,4}(q)\right) = \sum_{l=0}^{\infty} \frac{(\pm)^l\, q^{(l^2+2l)/4}}{(q)_l} = \frac{\{3/4\}^\pm_{5/2}\, \{7/4\}^\pm_{5/2}}{\{2\}^-_5\, \{3\}^-_5\, \{5/2\}^+_{5/2}}. \tag{3.27}$$
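Identity (3.27) lends itself to a direct series check. The sketch below is only an illustration under stated assumptions: it takes $\{x\}^\pm_y = \prod_{k\ge 0}(1 \pm q^{x+ky})$ as the meaning of the bracket notation (the definition itself lies outside this section), works in the variable $x = q^{1/4}$ so that all exponents are integers, and compares the sum side and the product side of (3.27) up to a finite order `N`. The helper names are hypothetical.

```python
N = 120  # truncation order in x = q**(1/4)


def times_one_plus(series, a, sign):
    """Multiply the series in place by (1 + sign * x**a)."""
    for i in range(N, a - 1, -1):
        series[i] += sign * series[i - a]


def divide_one_plus(series, a, sign):
    """Divide the series in place by (1 + sign * x**a)."""
    for i in range(a, N + 1):
        series[i] -= sign * series[i - a]


def sum_side(sign):
    """sum_l sign**l * q**((l*l + 2*l)/4) / (q)_l as a series in x."""
    total = [0] * (N + 1)
    l = 0
    while l * l + 2 * l <= N:
        term = [0] * (N + 1)
        term[l * l + 2 * l] = sign ** l
        for j in range(1, l + 1):
            divide_one_plus(term, 4 * j, -1)  # factor 1/(1 - q**j)
        total = [u + v for u, v in zip(total, term)]
        l += 1
    return total


def product_side(sign):
    """{3/4}{7/4} (step 5/2) over {2}^- {3}^- (step 5) and {5/2}^+ (step 5/2)."""
    series = [1] + [0] * N
    for k in range(N // 10 + 1):
        for a in (3 + 10 * k, 7 + 10 * k):
            times_one_plus(series, a, sign)
        for a in (8 + 20 * k, 12 + 20 * k):
            divide_one_plus(series, a, -1)
        divide_one_plus(series, 10 + 10 * k, +1)
    return series
```

Comparing `sum_side(sign)` with `product_side(sign)` for `sign = +1` and `sign = -1` tests both signs of (3.27); restricting the loop in `sum_side` to even or odd `l` would give the sum sides of (3.25) and (3.26).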

We present a set of further Rogers–Ramanujan type identities derived in a similar way in Appendix E. The product sides of these identities are not unique in the sense that one


may use the techniques discussed in Subsect. 2.4 and bring them, if possible, to the form (2.33), (2.34) or (2.35) (compare (3.27) and the corresponding formula in Appendix A). It is also worth noticing that, combining these identities further, we again obtain identities of the Rogers–Ramanujan type. For instance, multiplying (3.25) and (3.26), we find

$$\sum_{l_1,l_2=0}^{\infty} \frac{q^{l_1^2+l_2^2+l_1+2l_2}}{(q)_{2l_1}\, (q)_{2l_2+1}} = \frac{\{1\}^+_5\, \{4\}^+_5 \left(\{5\}^+_5\right)^2}{\left(\{2\}^-_5\, \{3\}^-_5\right)^2}. \tag{3.28}$$

It should be mentioned that there exists a more general type of formulae than (3.24) (involving a q-deformed binomial factor) [20] which covers the whole range of characters in all minimal models. Therefore when our factorization technique applies we also have Rogers–Ramanujan identities for these more general types.

3.3. Quasi-particle representation. Once a character admits a factorizable form, it is easy to obtain a quasi-particle spectrum following the prescription of [21,22,6]. Let $P(n,m)$ be the number of partitions of a positive integer n into m distinct non-negative integers and $Q(n,m)$ be the number of partitions of a positive integer n into positive integers smaller than or equal to m. In the theory of numbers the following formulae are well-known (e.g. [23]):

$$\sum_{n=0}^{\infty} P(n,m)\, q^n = \frac{q^{m(m-1)/2}}{(q)_m}, \qquad \sum_{n=0}^{\infty} Q(n,m)\, q^n = \frac{1}{(q)_m}. \tag{3.29}$$
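The two generating functions in (3.29) can be confirmed by brute-force counting. The following is a small sketch under the reading of $P(n,m)$ and $Q(n,m)$ given in the text (with $n = 0$ and $m = 0$ included for convenience, which is an assumption of this example); the function names are illustrative.

```python
from functools import lru_cache
from itertools import combinations

N_MAX = 14  # highest power of q that is compared


def count_distinct(n, m):
    """P(n, m): ways to write n as a sum of m distinct non-negative integers."""
    return sum(1 for parts in combinations(range(n + 1), m) if sum(parts) == n)


@lru_cache(maxsize=None)
def count_bounded(n, m):
    """Q(n, m): partitions of n into positive parts, each at most m."""
    if n == 0:
        return 1
    if m == 0:
        return 0
    without_m = count_bounded(n, m - 1)
    with_m = count_bounded(n - m, m) if n >= m else 0
    return without_m + with_m


def inverse_q_factorial(m):
    """Coefficients of 1/(q)_m = 1/((1-q)...(1-q^m)) up to q**N_MAX."""
    coeffs = [1] + [0] * N_MAX
    for part in range(1, m + 1):
        for i in range(part, N_MAX + 1):
            coeffs[i] += coeffs[i - part]
    return coeffs


def check_generating_functions(m):
    """Compare brute-force counts with the right-hand sides of (3.29)."""
    base = inverse_q_factorial(m)
    shift = m * (m - 1) // 2
    for n in range(N_MAX + 1):
        expected_p = base[n - shift] if n >= shift else 0
        if count_distinct(n, m) != expected_p:
            return False
        if count_bounded(n, m) != base[n]:
            return False
    return True
```

For example, `count_distinct(3, 2)` counts the two decompositions 0+3 and 1+2, matching the coefficient of $q^3$ in $q/(q)_2$.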

Combining them with (1.12) and (1.13), we obtain

$$\{x\}^+_y = \sum_{n,m=0}^{\infty} P(n,m)\, q^{ny+mx} = \sum_{n,m=0}^{\infty} Q(n,m)\, q^{(n+m(m-1)/2)y+mx}, \tag{3.30}$$

$$\frac{1}{\{x\}^-_y} = \sum_{n,m=0}^{\infty} Q(n,m)\, q^{ny+mx} = \sum_{n,m=0}^{\infty} P(n,m)\, q^{(n-m(m-1)/2)y+mx}. \tag{3.31}$$

We assume now the character to be of the form (3.22), and proceed in the usual way in order to derive the quasi-particle states. For this one interprets the characters as a partition function with $\chi(q = e^{-2\pi v/LkT}) = \sum_{l=0}^{\infty} P(E_l)\, e^{-E_l/kT}$, k being Boltzmann's constant, T the temperature, L the size of the quantizing system, v the speed of sound, $E_l$ the energy of a particular level and $P(E_l)$ its degeneracy. The contribution of a single particle of type a and momentum $p_a^{i_a}$ ($i_a$ being an additional internal quantum number) to the energy is assumed to be of the form $E_l = v \sum_{a=1}^{N+M} \sum_{i_a=1}^{l_a} p_a^{i_a}$. One has now the option to construct either a purely fermionic (in units of $2\pi/L$)

$$p_a^i(\vec l\,) = B_a + \frac{y}{2} \left(1 - \sum_{k=M+1}^{N+M} l_k\right) + y N_a^i \tag{3.32}$$

or purely bosonic spectrum (in units of $2\pi/L$)

$$p_a^i(\vec l\,) = B_a + \frac{y}{2} \left(1 - \sum_{k=1}^{M} l_k\right) + y M_a^i. \tag{3.33}$$

Table 1. Bosonic and fermionic spectrum for $\chi^{3,4}_{1,2}(q) = q^{1/24}/\{1\}^-_2$. k denotes the level and $\mu_k$ its degeneracy

k   µ_k   p^i = 1 + 2M_i                                             p^i(l) = (2 − l) + 2N_i
1   1     |1⟩                                                         |1⟩
2   1     |1,1⟩                                                       |0,2⟩
3   2     |1,1,1⟩, |3⟩                                                |−1,1,3⟩, |3⟩
4   2     |1,1,1,1⟩, |1,3⟩                                            |−2,0,2,4⟩, |0,4⟩
5   3     |1,1,1,1,1⟩, |1,1,3⟩, |5⟩                                   |−3,−1,1,3,5⟩, |−1,1,5⟩, |5⟩
6   4     |1,1,1,1,1,1⟩, |3,3⟩, |1,5⟩, |1,1,1,3⟩                      |−2,0,2,6⟩, |0,6⟩, |−4,−2,0,2,4,6⟩, |2,4⟩
7   5     |1,1,1,1,1,1,1⟩, |1,3,3⟩, |1,1,5⟩, |1,1,1,1,3⟩, |7⟩         |−5,−3,−1,1,3,5,7⟩, |−1,1,7⟩, |−1,3,5⟩, |−3,−1,1,3,7⟩, |7⟩
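The entries of Table 1 can be regenerated mechanically. The sketch below is an illustration rather than the authors' procedure: it enumerates bosonic states as multisets of momenta $1 + 2M_i$ with $M_i \ge 0$, and fermionic states with $l$ particles as sets of momenta $(2-l) + 2N_i$ with distinct $N_i \ge 0$ (the starting value 0 for $N_i$ is an assumption read off from the table entries), and collects those whose momenta add up to the level k.

```python
from itertools import combinations


def bosonic_states(level, smallest=1):
    """Non-decreasing tuples of odd momenta 1 + 2*M (M >= 0) summing to `level`."""
    if level == 0:
        return [()]
    states = []
    for p in range(smallest, level + 1, 2):
        states += [(p,) + rest for rest in bosonic_states(level - p, p)]
    return states


def fermionic_states(level):
    """Tuples of momenta (2 - l) + 2*N with distinct N >= 0 summing to `level`."""
    states = []
    for l in range(1, level + 1):
        twice_sum = level - l * (2 - l)  # equals 2 * (N_1 + ... + N_l)
        if twice_sum < 0 or twice_sum % 2:
            continue
        target = twice_sum // 2
        for ns in combinations(range(target + 1), l):
            if sum(ns) == target:
                states.append(tuple(2 - l + 2 * n for n in ns))
    return states


bosonic_degeneracies = [len(bosonic_states(k)) for k in range(1, 8)]
fermionic_degeneracies = [len(fermionic_states(k)) for k in range(1, 8)]
# Both lists are [1, 1, 2, 2, 3, 4, 5], the column of degeneracies in Table 1.
```

Both enumerations count the same numbers because they are two expansions of the same product $1/\{1\}^-_2$, as expressed by (3.31).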

Here $N_a^i$ are distinct positive integers and $M_a^i$ are some arbitrary integers. The fermionic nature of this spectrum is here expressed through the fact that the integers $N_a^i$ are all distinct, such that we have a Pauli principle. An example for such spectra is presented in Table 1. A particularly interesting spectrum arises when we allow bosons and fermions

$$p_a^i = B_a + y N_a^i, \qquad p_b^i = B_b + y M_b^i \tag{3.34}$$

with $a \in \{1, M\}$ and $b \in \{M+1, N+M\}$. Notice that now the dependence on $\vec l$ has vanished. When $N = M$ this may be interpreted in a supersymmetric way. Following the procedure of this subsection, the answer to the question [24]: “How many fermionic representations are there for the characters of each model M(s, t)?” would be infinite for factorizable characters due to the second relation in (1.14). One could also change the approach and start with a given spectrum and search for the related character [25] which shifts the problem to finding all possible integrable lattice models. A possible selection mechanism is given by using information from the massive models which in the conformal limit lead to certain models M(s, t). In this spirit for instance the choice $A_1$ and $E_8$ for the algebras of the related Cartan matrices in (3.24) appears quite natural.

3.4. Super-conformal characters. Linear combinations of characters may be found in various contexts as for instance when considering superconformal theories. The N=1 unitary minimal superconformal extensions of the Virasoro algebra are characterized by an integer l and a label s = R, NS, which refers to the Ramond or Neveu-Schwarz sector. The Virasoro central charge was found [26] to be

$$c(l) = \frac{3}{2} \left(1 - \frac{8}{l(l+2)}\right), \qquad l = 3, 4, \dots. \tag{3.35}$$

The corresponding irreducible representations are highest weight representations with weights

$$H^{l,s}_{n,m} = \frac{((l+2)n - ml)^2 - 4}{8l(l+2)} + \frac{1}{16}\, \delta_{s,R}, \tag{3.36}$$


where the labels are restricted as 1 ≤ n ≤ l − 1, 1 ≤ m ≤ l + 1 together with n − m = ˆ (2)l−2 ⊗ even, odd when s = NS, s = R, respectively. Realizing these models as SU l,s ˆ ˆ SU (2)2 /SU (2)l+2 -cosets the corresponding characters n,m (q) were constructed in [10]. One notices from (1.4) and (3.35) that c(3) = c(4, 5) and indeed, applying twice the GKO-sumrules one may identify supersymmetric characters with linear combinations of some non-supersymmetric Virasoro characters ˜

S 4,5 4,5 3,N 1,1 (q) = χ1,1 (q) + χ1,4 (q),

NS 4,5 4,5 3, 1,1 (q) = χ1,1 (q) − χ1,4 (q),

S 4,5 4,5 3,N 1,3 (q) = χ1,2 (q) + χ1,3 (q),

NS 4,5 4,5 3, 1,3 (q), = χ1,2 (q) − χ1,3 (q),

4,5 3,R 2,1 (q), = χ2,1 (q),

˜

4,5 3,R 2,3 (q), = χ2,2 (q).

(3.37) (3.38) (3.39)

Notice that all these characters factorize (see Appendix A for the explicit formulae). Moreover, they admit the form (2.35) (which is due to Rocha-Caridi [1]) as well as the form (2.33) (see Appendix A). It is interesting that the latter does not appear to be manifestly supersymmetric. We observe easily the property for these expressions under the T-modular transformation (assuming y to be an integer, the effect of this ± ± ∓ transformation is that {x}± y → {x}y when x is an integer and {x}y → {x}y when x ˜

S l,N S l,R is a semi-integer) which relates l,N n,m (q) and n,m (q) and leaves n,m (q) invariant. l,s Fermionic representations for all characters n,m (q) were found in [27] and we leave it for future investigations to settle the question whether they also factorize or not. As in the non-supersymmetric case the modular properties of these characters [28] will certainly turn out to be useful.
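The identification of the $l = 3$ superconformal model with the Virasoro model M(4, 5) used in (3.37)-(3.39) can be cross-checked at the level of central charges and highest weights. This is a minimal sketch with exact rational arithmetic; it assumes the standard minimal-model formulas $c(s,t) = 1 - 6(s-t)^2/(st)$ and $h^{s,t}_{n,m} = ((nt-ms)^2 - (t-s)^2)/(4st)$ (the paper's own definitions lie outside this section) together with (3.35) and (3.36), the latter read with the Ramond shift $\delta_{s,R}/16$. The function names are illustrative.

```python
from fractions import Fraction


def c_virasoro(s, t):
    """Central charge of the minimal model M(s, t) (standard formula, assumed)."""
    return 1 - Fraction(6 * (s - t) ** 2, s * t)


def h_virasoro(s, t, n, m):
    """Highest weight h^{s,t}_{n,m} (standard formula, assumed)."""
    return Fraction((n * t - m * s) ** 2 - (t - s) ** 2, 4 * s * t)


def c_super(l):
    """Central charge (3.35) of the N=1 superconformal series."""
    return Fraction(3, 2) * (1 - Fraction(8, l * (l + 2)))


def h_super(l, n, m, ramond):
    """Weight (3.36); `ramond` selects the Ramond sector shift of 1/16."""
    base = Fraction(((l + 2) * n - m * l) ** 2 - 4, 8 * l * (l + 2))
    return base + (Fraction(1, 16) if ramond else 0)


# (l, n, m, Ramond?) on the left, Virasoro labels (n, m) of M(4,5) on the right,
# following the leading characters on the r.h.s. of (3.37)-(3.39).
pairs = [
    ((3, 1, 1, False), (1, 1)),
    ((3, 1, 3, False), (1, 2)),
    ((3, 2, 1, True), (2, 1)),
    ((3, 2, 3, True), (2, 2)),
]
weights_match = all(
    h_super(*sup) == h_virasoro(4, 5, *vir) for sup, vir in pairs
)
```

Here `c_super(3)` and `c_virasoro(4, 5)` both evaluate to 7/10, and `weights_match` confirms that the lowest weight in each superconformal character agrees with the first Virasoro character on the corresponding right-hand side.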

3.5. Modular invariant partition functions. Modular invariant partition functions for minimal models are given by (up to an overall coefficient)

$$Z^{s,t}(q,\bar q) = \sum_{n,n',m,m'} Z^{m,m'}_{n,n'}\, \chi^{s,t}_{n,m}(q)\, \chi^{s,t}_{n',m'}(\bar q). \tag{3.40}$$

For the so-called main sequence (in the terminology of [17]), or $(A_{s-1}, A_{t-1})$ type, we have $Z^{m,m'}_{n,n'} = \delta_{n,n'}\, \delta_{m,m'}$. Bearing in mind factorizability of all characters in the M(2, t) and M(3, t) models, one can rewrite the corresponding partition functions as a sum of products of the type (2.33). This allows, in particular, to apply the technique of Subsect. 3.3 and obtain quasi-particle representations for these partition functions. Besides the main sequence some minimal models possess other modular invariants (complementary sequences) [3,17,29] of the type (3.40) with more general $Z^{m,m'}_{n,n'}$. In particular, for M(4k, t) and M(4k + 2, t) ($(D_{2k+1}, A_{t-1})$ and $(D_{2k+2}, A_{t-1})$ type, respectively) the non-diagonal part of $Z^{m,m'}_{n,n'}$ is $z_{nm}\, \delta_{n,n'}\, \delta_{m,t-m'}$. In this case (3.40) involves not only squares of modules of single characters but also those of sums of characters of the type (2.11). For t = 3 all of these sums are factorizable and we can represent the corresponding partition functions as a sum of products (of the type (2.35) in general). Thus, for such partition functions we also can obtain quasi-particle representations.


3.6. Partition functions in boundary CFT. A partition function of a conformal theory on a manifold with boundaries, say on a cylinder, is expressed as a sum of characters of a single copy of the Virasoro algebra [30]

$$Z_{\alpha,\beta}(q) = \sum_h N^h_{\alpha\beta}\, \chi_h(q), \tag{3.41}$$

where $(\alpha, \beta)$ is a pair of boundary conditions, $\chi_h(q)$ denotes a character of given weight h, and $N^h_{\alpha\beta}$ are multiplicities (expressible in terms of (2.17) and also related to the fusion rules). It is interesting that in some cases $Z_{\alpha,\beta}(q)$ is just a factorizable sum (or several such sums) of type (2.11), so we can rewrite it in the product form. For instance, for the critical 3-state Potts model (corresponding to M(5, 6)) there are three microscopic states A, B and C, and for some of the possible partition functions we find

$$Z_{A,F}(q) = \chi^{5,6}_{1,2}(q) + \chi^{5,6}_{4,2}(q) = q^{\frac{11}{120}}\, \frac{1}{\{1\}^-_{5/2}\, \{3/2\}^-_{5/2}}, \tag{3.42}$$

$$Z_{BC,F}(q) = \chi^{5,6}_{2,2}(q) + \chi^{5,6}_{3,2}(q) = q^{-\frac{1}{120}}\, \frac{1}{\{1/2\}^-_{5/2}\, \{2\}^-_{5/2}}, \tag{3.43}$$

where F stands for the free boundary condition. As we mentioned in Subsect. 2.3, such an expression may be interpreted as a character of a module generated by bosonic operators (in fact, (3.17) shows that (3.42) and (3.43) coincide with the characters of M(2, 5) of an argument $q^{1/2}$). Also, this form of a partition function allows for a direct extraction of a quasi-particle spectrum (in the spirit of Subsect. 3.3) which, in particular, can be used to study connections between theories with distinct boundary conditions.

Conclusion

We have shown how to obtain the factorized form of a single Virasoro character on the basis of the Gauß-Jacobi and Watson identities by exploiting the quasi-classical asymptotics of the usual sum representation. We have applied this method also to the factorization of a linear combination of two Virasoro characters and found the explicit formulae (2.23), (2.26) and (2.27). We presented a rigorous proof that besides the obtained expressions no other differences of two Virasoro characters of the type (2.11) are factorizable in the form (2.33). It is a remarkable fact, which certainly needs some deeper understanding, that just like for the single characters none of the Macdonald identities, other than the ones corresponding to the $A_1^{(1)}$ and $A_2^{(2)}$ algebras, need to be invoked.

We employed the obtained factorized versions of the characters in order to derive a set of new identities, e.g. (3.7)–(3.10), in a very economical way. Some particular cases of these identities coincide with formulae derived originally in [19]; however, the proof is now considerably simpler. As was already pointed out in [19], these identities belong to a class which is closely related to, but not derivable from, a repeated use of the GKO-sumrules [10]. It is therefore suggestive to assume that the new identities are related to some higher sumrules.
A systematic classification of identities obtainable from factorised combinations of Virasoro characters will be presented elsewhere. It is also conceivable that the presented method will be applicable to non-minimal models like parafermionic models, i.e. the $\widehat{SU}(2)_k/\hat U(1)_k$-coset, or general N=1,2,4 supersymmetric models. Concerning the quasi-particle representation of the Virasoro characters with their relation to lattice models, the factorized versions constitute a suitable starting point for a more detailed analysis, as for instance in [22].

Acknowledgements. We would like to thank W. Eholzer, K.P. Kokhas, C. Korff, B.M. McCoy, V. Schomerus, M.A. Semenov-Tian-Shanski and O. Verhoeven for useful discussions and remarks. A.B. is grateful to the members of the Institut für Theoretische Physik, FU-Berlin for their hospitality and to the Volkswagen Stiftung for partial financial support. A.F. is grateful to the members of the Steklov Mathematical Institute (St. Petersburg) for their hospitality and to the Deutsche Forschungsgemeinschaft (Sfb288) for partial financial support.

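Appendix A

Here we will present some examples of the inverse product representation for characters and linear combinations of characters in some unitary and non-unitary models. For shortness we omit the argument q on the l.h.s. and use the notation $\{x_1; \dots; x_n\}^\pm_y := \{x_1\}^\pm_y \cdots \{x_n\}^\pm_y$.

$$\chi^{3,4}_{1,1} \pm \chi^{3,4}_{1,3} = q^{-\frac{1}{48}}\, \frac{1}{\{1/2\}^\mp_1\, \{1\}^+_1}, \qquad \chi^{3,4}_{1,2} = q^{\frac{1}{24}}\, \frac{1}{\{1\}^-_2},$$

$$\chi^{4,5}_{2,1} = q^{\frac{49}{120}}\, \frac{1}{\{1;4\}^-_5\, \{3;5;7\}^-_{10}}, \qquad \chi^{4,5}_{2,2} = q^{\frac{1}{120}}\, \frac{1}{\{2;3\}^-_5\, \{1;5;9\}^-_{10}},$$

$$\chi^{4,5}_{1,1} \pm \chi^{4,5}_{1,4} = q^{-\frac{7}{240}}\, \frac{1}{\{3/2;5/2;7/2\}^\mp_5\, \{5\}^+_5\, \{2;8\}^-_{10}},$$

$$\chi^{4,5}_{1,2} \pm \chi^{4,5}_{1,3} = q^{\frac{17}{240}}\, \frac{1}{\{1/2;5/2;9/2\}^\mp_5\, \{5\}^+_5\, \{4;6\}^-_{10}},$$

$$\chi^{5,6}_{1,1} - \chi^{5,6}_{1,5} = q^{-\frac{1}{30}}\, \frac{1}{\{2;8\}^-_{10}}, \qquad \chi^{5,6}_{2,1} - \chi^{5,6}_{2,5} = q^{\frac{11}{30}}\, \frac{1}{\{4;6\}^-_{10}},$$

$$\chi^{5,6}_{1,2} \pm \chi^{5,6}_{1,4} = \frac{q^{\frac{11}{120}}}{\{1;4\}^-_5\, \{3/2;7/2\}^\mp_5}, \qquad \chi^{5,6}_{2,2} \pm \chi^{5,6}_{2,4} = \frac{q^{-\frac{1}{120}}}{\{2;3\}^-_5\, \{1/2;9/2\}^\mp_5},$$

$$\chi^{6,7}_{1,1} - \chi^{6,7}_{1,6} = \frac{q^{-\frac{1}{28}}}{\{3;4\}^-_7\, \{2;12\}^-_{14}}, \qquad \chi^{6,7}_{1,2} - \chi^{6,7}_{1,5} = \frac{q^{\frac{3}{28}}}{\{1;6\}^-_7\, \{4;10\}^-_{14}},$$

$$\chi^{6,7}_{1,3} - \chi^{6,7}_{1,4} = \frac{q^{\frac{19}{28}}}{\{2;5\}^-_7\, \{6;8\}^-_{14}}, \qquad \chi^{6,7}_{2,1} \pm \chi^{6,7}_{2,6} = \frac{q^{\frac{19}{56}}}{\{1;3;4;6\}^-_7\, \{5/2;9/2\}^\mp_7},$$

$$\chi^{6,7}_{2,2} \pm \chi^{6,7}_{2,5} = \frac{q^{-\frac{1}{56}}}{\{1;2;5;6\}^-_7\, \{3/2;11/2\}^\mp_7}, \qquad \chi^{6,7}_{2,3} \pm \chi^{6,7}_{2,4} = \frac{q^{\frac{3}{56}}}{\{2;3;4;5\}^-_7\, \{1/2;13/2\}^\mp_7},$$

$$\chi^{2,5}_{1,1} = q^{\frac{11}{60}}\, \frac{1}{\{2;3\}^-_5}, \qquad \chi^{2,5}_{1,2} = q^{-\frac{1}{60}}\, \frac{1}{\{1;4\}^-_5},$$

$$\chi^{3,5}_{1,1} \pm \chi^{3,5}_{1,4} = q^{\frac{1}{40}}\, \frac{1}{\{2;8\}^-_{10}\, \{3/4;7/4\}^\mp_{5/2}\, \{3/2;7/2\}^+_5\, \{5/2\}^+_{5/2}},$$

$$\chi^{3,5}_{1,2} \pm \chi^{3,5}_{1,3} = q^{-\frac{1}{40}}\, \frac{1}{\{4;6\}^-_{10}\, \{1/4;9/4\}^\mp_{5/2}\, \{1/2;9/2\}^+_5\, \{5/2\}^+_{5/2}}.$$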

Appendix B

In this appendix we present a sample proof for the identities of the type (2.22)–(2.23) and (2.26)–(2.27), that is for the factorization of the sum or difference of two Virasoro characters related to minimal models. The proof is based on a systematic exploitation of the Gauß-Jacobi and Watson identities (1.20)–(1.23). We have to compare the l.h.s. of these expressions with the sum or difference of characters given by (1.7),

$$\chi^{s,t}_{n,m}(q) \pm \chi^{s,t}_{n,t-m}(q) = \frac{q^{h^{s,t}_{n,m} - \frac{c(s,t)}{24}}}{(q)_\infty} \sum_{k=-\infty}^{\infty} q^{stk^2} \left( q^{k(nt-ms)} - q^{k(nt+ms)+nm} \pm q^{k(nt+ms-st)+\Delta h^{s,t}_{n,m}} \mp q^{k(nt-ms+st)+n(t-m)+\Delta h^{s,t}_{n,m}} \right). \tag{B.1}$$

Here the quantity $\Delta h^{s,t}_{n,m}$ is defined by (2.12) and we assume $n < s/2$, $m < t/2$, so that $\Delta h^{s,t}_{n,m} > 0$. We outline the proof for the identity (2.27). All other proofs work along the same lines. Recall that (2.27) has been conjectured to be a particular case of (1.21) for $a = \Delta h^{s,t}_{n,m}$ and $b = nm$ provided that the condition $s = 6n$ holds. Notice that substitution of the latter relation into (2.12) yields $a = nt - 2nm$. In order to produce the right number of terms for a possible comparison with (B.1), we have to split the sum in the l.h.s. of (1.21) into two new sums, over even and odd k. Then the l.h.s. of (1.21) acquires the form

$$\sum_{k=-\infty}^{\infty} q^{k^2(6a+12b)} \left( q^{k(a-4b)} + q^{k(7a+8b)+2a+b} - q^{k(a+8b)+b} - q^{k(7a+20b)+2a+8b} \right),$$

which, upon substitution of the explicit values for a and b and the relation $s = 6n$, becomes

$$\sum_{k=-\infty}^{\infty} q^{stk^2} \left( q^{k(nt-ms)} + q^{k(nt-ms+st)+2nt-3nm} - q^{k(nt+ms)+nm} - q^{k(nt+ms+st)+2nt+4nm} \right).$$

We see that the first, second and third terms here exactly match the first, fourth and second terms on the r.h.s. in (B.1), respectively. Making the shift $k \to k - 1$ in the last term, we achieve that it coincides with the third term in (B.1). This completes the proof.
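The term-by-term matching described in this proof is pure exponent bookkeeping and can be spot-checked on integers. The sketch below is illustrative only: it assumes the standard weight formula $h^{s,t}_{n,m} = ((nt-ms)^2-(t-s)^2)/(4st)$ and takes $\Delta h^{s,t}_{n,m} = h^{s,t}_{n,t-m} - h^{s,t}_{n,m}$ (the defining equation (2.12) lies outside this section), then compares the exponents of (B.1) with those of the split sum for $s = 6n$ over a small range of labels. Coprimality of s and t plays no role in this algebra and is not imposed.

```python
from fractions import Fraction


def weight(s, t, n, m):
    """h^{s,t}_{n,m}, standard minimal-model formula (assumed here)."""
    return Fraction((n * t - m * s) ** 2 - (t - s) ** 2, 4 * s * t)


def exponents_b1(s, t, n, m, k):
    """Exponents of the four terms in (B.1), including the factor q^{s t k^2}."""
    dh = weight(s, t, n, t - m) - weight(s, t, n, m)
    base = s * t * k * k
    return [
        base + k * (n * t - m * s),
        base + k * (n * t + m * s) + n * m,
        base + k * (n * t + m * s - s * t) + dh,
        base + k * (n * t - m * s + s * t) + n * (t - m) + dh,
    ]


def exponents_split(a, b, k):
    """Exponents of the four terms of the split l.h.s. of (1.21)."""
    base = k * k * (6 * a + 12 * b)
    return [
        base + k * (a - 4 * b),
        base + k * (7 * a + 8 * b) + 2 * a + b,
        base + k * (a + 8 * b) + b,
        base + k * (7 * a + 20 * b) + 2 * a + 8 * b,
    ]


def check_matching(max_n=3, max_t=11, max_k=3):
    for n in range(1, max_n + 1):
        s = 6 * n
        for t in range(2, max_t + 1):
            for m in range(1, (t + 1) // 2):  # enforces m < t/2
                a = weight(s, t, n, t - m) - weight(s, t, n, m)
                b = n * m
                if a != n * t - 2 * n * m:
                    return False
                for k in range(-max_k, max_k + 1):
                    b1 = exponents_b1(s, t, n, m, k)
                    sp = exponents_split(a, b, k)
                    shifted_last = exponents_split(a, b, k - 1)[3]
                    if (sp[0], sp[1], sp[2]) != (b1[0], b1[3], b1[1]):
                        return False
                    if shifted_last != b1[2]:
                        return False
    return True
```

`check_matching()` returning `True` confirms, for the sampled labels, that the first, second and third split terms coincide with the first, fourth and second terms of (B.1), and that the last split term lands on the third term after the shift $k \to k-1$.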


Appendix C

Here we will prove the following statement: the factorization of the difference of two minimal Virasoro characters in the form

$$\chi^{s,t}_{n,m}(q) - \chi^{s,t}_{s-n,m}(q) = \frac{q^{h^{s,t}_{n,m} - \frac{c(s,t)}{24}}}{(q)_\infty} \left( \hat\chi^{s,t}_{n,m}(q) - q^{\Delta h}\, \hat\chi^{s,t}_{s-n,m}(q) \right) = \frac{q^{h^{s,t}_{n,m} - \frac{c(s,t)}{24}}}{\prod_{i=1}^{N} \{x_i\}^-_b}, \tag{C.1}$$

where $0 < x_1 < \dots < x_N \le b$, is up to the symmetries (1.8) only possible for $s = 3n, 4n, 6n$. Here $\Delta h$ stands for $\Delta h^{s,t}_{n,m}$ defined in (2.12), and we assume $n < s/2$, $m < t/2$, so that $\Delta h^{s,t}_{n,m} > 0$. Our argumentation goes along the lines of the proof for the factorization of single characters given in [2]. Surprisingly it is enough to investigate the first five terms in the sum, which for the incomplete character may be identified uniquely

$$\hat\chi^{s,t}_{n,m}(q) = 1 - q^{nm} - q^{(s-n)(t-m)} + q^{ts+sm-tn} + q^{ts+tn-sm} + \dots. \tag{C.2}$$

For the difference of the two characters they read

$$\hat\chi^{s,t}_{n,m}(q) - q^{\Delta h}\, \hat\chi^{s,t}_{s-n,m}(q) = 1 - q^{nm} - q^{\Delta h} + q^{\Delta h + m(s-n)} + q^{\Delta h + n(t-m)} + \dots. \tag{C.3}$$

For definiteness we choose $sm < nt$ (so that $\Delta h + m(s-n) < \Delta h + n(t-m)$), since the other case may be obtained from the symmetry properties. The negative terms in (C.3) allow us to write down the first two factors in the product

$$\hat\chi^{s,t}_{n,m}(q) - q^{\Delta h}\, \hat\chi^{s,t}_{s-n,m}(q) = (1 - q^{nm})(1 - q^{\Delta h}) \dots, \tag{C.4}$$

which means that after expanding we will generate a term $q^{nm+\Delta h}$. Since $nm + \Delta h < \Delta h + m(s-n)$, we have to include a factor $(1 - q^{nm+\Delta h})$ on the r.h.s. of (C.4) in order to cancel this term. Expanding once more we will generate new terms, which in turn have to be cancelled by additional factors on the right hand side of (C.4) until we obtain the matching condition $\alpha nm + \beta\Delta h = \Delta h + m(s-n)$ with positive integers $\alpha$ and $\beta$. At first sight it seems a formidable task to bring some systematics into this analysis. However, it was observed in [2] that this procedure will terminate when $\alpha + \beta = 5$. Actually also one case from level 6 might be possible. Performing this analysis up to that level one obtains

$$1 - q^{nm} - q^{\Delta h} + q^{\alpha nm+\beta\Delta h} + \dots = (1-q^{nm})(1-q^{\Delta h})(1-q^{nm+\Delta h})(1-q^{2nm+\Delta h}) \times (1-q^{nm+2\Delta h})(1-q^{3nm+\Delta h})(1-q^{nm+3\Delta h})(1-q^{2nm+2\Delta h}) \times (1-q^{4nm+\Delta h})(1-q^{nm+4\Delta h})(1-q^{3nm+2\Delta h})^2(1-q^{2nm+3\Delta h})^2 \dots,$$

where in this expression $\alpha + \beta > 5$. It is the occurrence of the quadratic terms $(1-q^{3nm+2\Delta h})^2$ and $(1-q^{2nm+3\Delta h})^2$ which allows us to stop at this point, since they may never be cancelled against factors within $(q)_\infty$ and we can therefore restrict the investigation to the cases $2 \le \alpha + \beta \le 5$. Commencing with the case $\alpha + \beta = 5$ we obtain two matching conditions, that is for the two smallest powers of the positive terms

$$\frac{st}{4} + \frac{ms}{2} - \frac{nt}{2} = 3nm + 2\Delta h, \qquad \frac{st}{4} - \frac{ms}{2} + \frac{nt}{2} = 2nm + 3\Delta h \quad\Rightarrow\quad s = 6n - 2m\,\frac{s}{t}.$$

Since n is positive, m is strictly smaller than t, $\langle s,t\rangle = 1$ and $t = 2m$ produces zero on the left hand side of (C.1), the case $\alpha + \beta = 5$ will never produce any solution. We may also encounter the situation

$$\frac{st}{4} + \frac{ms}{2} - \frac{nt}{2} = 4nm + \Delta h, \qquad \frac{st}{4} - \frac{ms}{2} + \frac{nt}{2} = 2nm + 2\Delta h \quad\Rightarrow\quad s = 5n \ \text{and}\ t = 6m.$$

In the remaining possibilities we only obtain one matching condition, that is for the smallest power. The case $\alpha = \beta = 2$ leads to the condition $2nm + 2\Delta h = sm - nm + \Delta h$ which amounts to

$$m = t\, \frac{(s-2n)}{(6s-16n)}.$$

However, substitution of this relation into the condition $sm < nt$ leads to $s(s-2n) < n(6s-16n)$, or equivalently, $(s-4n)^2 < 0$, which is impossible. The other cases yield

$$\begin{aligned}
nm + \Delta h &= sm - nm + \Delta h &&\Rightarrow\ s = 2n, \\
nm + 2\Delta h &= sm - nm + \Delta h &&\Rightarrow\ s = 2n \ \text{or}\ t = 6m, \\
2nm + \Delta h &= sm - nm + \Delta h &&\Rightarrow\ s = 3n, \\
nm + 3\Delta h &= sm - nm + \Delta h &&\Rightarrow\ s = 2n \ \text{or}\ t = 4m, \\
3nm + \Delta h &= sm - nm + \Delta h &&\Rightarrow\ s = 4n, \\
5nm + \Delta h &= sm - nm + \Delta h &&\Rightarrow\ s = 6n.
\end{aligned}$$

We observe that we recover the cases we claimed to factorize in the form (C.4), which concludes the proof.

Appendix D

We will now provide a sample proof for the identities (3.7)–(3.10). For $n = m = 1$ some very involved proof which employs identities of theta functions may be found in [19]. With the help of the product representations (2.22)–(2.27) such identities may be derived without any effort. We demonstrate this just for Eq. (3.9) with the upper sign; the remaining equations may be derived in a similar way. First of all we notice that

$$h^{3n,2m}_{n,m} - \frac{c(3n,2m)}{24} + h^{4n,5m}_{n,m} - \frac{c(4n,5m)}{24} = h^{6n,5m}_{2n,2m} - \frac{c(6n,5m)}{24} + h^{3n,4m}_{n,m} - \frac{c(3n,4m)}{24}. \tag{D.1}$$

After cancelling $(q)^2_\infty$ on both sides of (3.9) for the upper sign we obtain for the left hand side upon using (1.14) (we omit here the labels nm in order to avoid lengthy formulae and imagine just for now that $\{x\}^-_y$ should always be understood as $\{x\,nm\}^-_{y\,nm}$),

$$\begin{aligned}
\left( \hat\chi^{4n,5m}_{n,m} + \hat\chi^{4n,5m}_{n,4m} \right) \hat\chi^{3n,2m}_{n,m}
&= \{1\}^-_1 \left( \{1\}^-_5 \{4\}^-_5 \{5\}^-_5\, \{3/2\}^+_5 \{5/2\}^+_5 \{7/2\}^+_5 \right) \\
&= \{1\}^-_4 \{3\}^-_4 \{2\}^-_2\, \{1\}^-_{10} \{6\}^-_{10} \{4\}^-_{10} \{9\}^-_{10} \{5\}^-_5\, \{3/2\}^+_5 \{5/2\}^+_5 \{7/2\}^+_5 \\
&= \left( \{1\}^-_4 \{3\}^-_4 \{2\}^-_2\, \{1/2\}^+_2 \{3/2\}^+_2 \right) \left( \{4\}^-_{10} \{6\}^-_{10} \{5\}^-_5\, \{1/2\}^-_5 \{9/2\}^-_5 \right) \\
&= \left( \hat\chi^{3n,4m}_{n,m} + \hat\chi^{3n,4m}_{n,3m} \right) \left( \hat\chi^{6n,5m}_{2n,2m} - \hat\chi^{6n,5m}_{2n,3m} \right).
\end{aligned}$$

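Here we have used several times the identities (1.14).

Appendix E

We complement the list started in Subsect. 3.2 of the Rogers–Ramanujan type identities obtained by combining our product formulae for (combinations of) characters with the results of [21]. We adopt the notations explained in Appendix A:

$$q^{1/48}\, \chi^{3,4}_{1,1} = \sum_{\substack{l=0 \\ \text{even}}}^{\infty} \frac{q^{l^2/2}}{(q)_l} = \frac{1}{\{2;14\}^-_{16}\, \{3;4;5\}^-_8}, \tag{E.1}$$

$$q^{1/48}\, \chi^{3,4}_{1,3} = \sum_{\substack{l=1 \\ \text{odd}}}^{\infty} \frac{q^{l^2/2}}{(q)_l} = q^{1/2}\, \frac{1}{\{6;10\}^-_{16}\, \{1;4;7\}^-_8}, \tag{E.2}$$

$$q^{1/48} \left( \chi^{3,4}_{1,1} \pm \chi^{3,4}_{1,3} \right) = \sum_{l=0}^{\infty} \frac{(\pm)^l\, q^{l^2/2}}{(q)_l} = \{1/2\}^\pm_1, \tag{E.3}$$

$$q^{-1/24}\, \chi^{3,4}_{1,2} = \sum_{\substack{l=0 \\ \text{even}}}^{\infty} \frac{q^{(l^2-l)/2}}{(q)_l} = \sum_{\substack{l=1 \\ \text{odd}}}^{\infty} \frac{q^{(l^2-l)/2}}{(q)_l} = \frac{1}{\{1\}^-_2}, \tag{E.4}$$

$$q^{1/30}\, \chi^{5,6}_{1,3} = \sum_{\substack{l_1,l_2=0 \\ l_1+2l_2 = \pm 1 \,(\mathrm{mod}\ 3)}}^{\infty} \frac{q^{2(l_1^2+l_2^2+l_1 l_2)/3}}{(q)_{l_1}\, (q)_{l_2}} = \frac{q^{2/3}}{\{1;2\}^-_3\, \{6;9\}^-_{15}}, \tag{E.5}$$

$$q^{\frac{1}{40}}\, \chi^{3,5}_{1,2} = \sum_{\substack{l=0 \\ \text{even}}}^{\infty} \frac{q^{l^2/4}}{(q)_l} = \frac{\{3;7\}^+_{10}}{\{1;4;5;6;9\}^-_{10}}, \tag{E.6}$$

$$q^{\frac{1}{40}}\, \chi^{3,5}_{1,3} = \sum_{\substack{l=1 \\ \text{odd}}}^{\infty} \frac{q^{l^2/4}}{(q)_l} = q^{1/4}\, \frac{\{2;8\}^+_{10}}{\{1;4;5;6;9\}^-_{10}}, \tag{E.7}$$

$$q^{\frac{1}{40}} \left( \chi^{3,5}_{1,2} \pm \chi^{3,5}_{1,3} \right) = \sum_{l=0}^{\infty} \frac{(\pm)^l\, q^{l^2/4}}{(q)_l} = \frac{\{1/4\}^\pm_{5/2}\, \{9/4\}^\pm_{5/2}}{\{1;4\}^-_5\, \{5/2\}^+_{5/2}}. \tag{E.8}$$

More identities will be given elsewhere [31].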


References

1. Rocha-Caridi, A.: Vertex Operators in Mathematics and Physics. ed. J. Lepowsky et al, Berlin: Springer, 1985
2. Christe, P.: Int. J. Mod. Phys. 29, 5271 (1991)
3. Cappelli, A., Itzykson, C., and Zuber, J.B.: Nucl. Phys. B280, 445 (1987); Commun. Math. Phys. 113, 1 (1987)
4. Kellendonk, J., Rösgen, M., and Varnhagen, R.: Int. J. Mod. Phys. A9, 1009 (1994)
5. Eholzer, W., and Skoruppa, N.-P.: Phys. Lett. B388, 82 (1996)
6. Bytsko, A.G., and Fring, A.: Nucl. Phys. B521, 573 (1998)
7. Rogers, L.J.: Proc. London Math. Soc. 25, 318 (1894); Ramanujan, S.: J. Indian Math. Soc. 6, 199 (1914); Schur, I.: Berliner Sitzungsberichte 23, 301 (1917)
8. Belavin, A.A., Polyakov, A.M., and Zamolodchikov, A.B.: Nucl. Phys. B241, 333 (1984)
9. Friedan, D., Qiu, Z., and Shenker, S.: Phys. Rev. Lett. 52, 1575 (1984); Commun. Math. Phys. 107, 535 (1986)
10. Goddard, P., Kent, A., and Olive, D.: Commun. Math. Phys. 103, 105 (1986)
11. Feigin, B.L., and Fuchs, D.B.: Funct. Anal. Appl. 17, 241 (1983)
12. Kac, V.G.: Infinite dimensional Lie algebras. Cambridge: Cambridge U. Press, 1990
13. Watson, G.N.: J. London Math. Soc. 4, 39 (1929)
14. Faddeev, L.D., and Kashaev, R.M.: Mod. Phys. Lett. A9, 427 (1994)
15. Lewin, L.: Polylogarithms and associated functions. Amsterdam: North-Holland, 1981
16. Blöte, H., Nightingale, M.P., and Cardy, J.L.: Phys. Rev. Lett. 56, 742 (1986); Affleck, I.: Phys. Rev. Lett. 56, 746 (1986); Itzykson, C., Saleur, H., and Zuber, J.B.: Europhys. Lett. 2, 91 (1986)
17. Itzykson, C., and Zuber, J.B.: Nucl. Phys. B275, 580 (1986)
18. Macdonald, I.G.: Invent. Math. 15, 91 (1972)
19. Taormina, A.: Commun. Math. Phys. 165, 69 (1994)
20. Berkovich, A., McCoy, B.M., and Schilling, A.: Commun. Math. Phys. 191, 325 (1998)
21. Kedem, R., Klassen, T.R., McCoy, B.M., and Melzer, E.: Phys. Lett. B304, 263 (1993); B307, 68 (1993)
22. Belavin, A.A., and Fring, A.: Phys. Lett. B409, 199 (1997)
23. Hardy, G.H., and Wright, E.M.: An introduction to the theory of numbers. Oxford: Clarendon Press, 1965
24. Berkovich, A., and McCoy, B.M.: “Proceedings of the ICM 1998, Vol.III.” Berlin: DMV, 1998, p. 163
25. McCoy, B.M.: Private communication
26. Friedan, D., Qiu, Z., and Shenker, S.: Phys. Lett. B151, 37 (1985)
27. Baver, E., and Gepner, D.: Phys. Lett. B372, 231 (1996)
28. Cappelli, A.: Phys. Lett. B185, 82 (1987); Minces, P., Namazie, M.A., and Nunez, C.: Phys. Lett. B422, 117 (1998)
29. Gepner, D.: Nucl. Phys. B287, 111 (1987); Kato, A.: Mod. Phys. Lett. A2, 585 (1987)
30. Cardy, J.L.: Nucl. Phys. B324, 581 (1989)
31. Bytsko, A.G.: J. Phys. A32, 8045 (1999)

Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 209, 207 – 261 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Renormalization Group Approach to Interacting Polymerised Manifolds

P. K. Mitter^1, B. Scoppola^{2,*}

1 Laboratoire de Physique Mathématique, Laboratoire Associé au CNRS, UMR 5825, Université Montpellier 2, Place E. Bataillon, Case 070, 34095 Montpellier Cedex 05, France
2 Dipartimento di Matematica, Universitá “La Sapienza” di Roma, Piazzale A. Moro 2, 00185 Roma, Italy

Received: 7 January 1999 / Accepted: 20 August 1999

Abstract: We propose to study the infrared behaviour of polymerised (or tethered) random manifolds of dimension D interacting via an exclusion condition with a fixed impurity in d-dimensional Euclidean space in which the manifold is embedded. In this paper we take D = 1, but modify the underlying free Gaussian covariance (thereby changing the canonical scaling dimension of the Gaussian random field) so as to simulate a polymerised manifold with fractional dimension $\bar D$: $1 < \bar D < 2$. The canonical dimension of the coupling constant is $\varepsilon = 1 - \beta\frac{d}{2}$, where $-\beta/2$ is the canonical scaling dimension of the Gaussian embedding field. $\beta$ is held strictly positive and sufficiently small. For $\varepsilon > 0$, sufficiently small, we prove for this model that the iterations of Wilson’s renormalisation group transformations converge to a non-Gaussian fixed point. Although $\varepsilon$ is small, our analysis is non-perturbative in $\varepsilon$. A similar model was studied earlier [CM] in the hierarchical approximation.

0. Introduction

Consider the Euclidean action

$$S(\phi) = \frac{1}{2} \int d^D x\, (\phi(x), -\Delta\phi(x)) + g \int d^D x\, \delta^{(d)}(\phi(x)), \tag{0.1}$$

where $\phi$ has values in $\mathbb{R}^d$, $(\ ,\ )$ is the scalar product in $\mathbb{R}^d$, and the corresponding formal partition function:

$$Z = \int d\mu_C(\phi)\, e^{-g \int d^D x\, \delta^{(d)}(\phi(x))}. \tag{0.2}$$

$\mu_C$ is a Gaussian measure with covariance $C = (-\Delta)^{-1}$. $\phi$ can be considered as the embedding function of a D dimensional tethered or polymerised manifold (for tethered manifolds see [NPW]) in d-dimensional Euclidean space, and for $g > 0$, we have a repulsive interaction with a fixed impurity in the embedding space. It is easy to see that the coupling constant g has canonical (engineering) dimension $[g] = \varepsilon = D - (2-D)\frac{d}{2}$, $\phi$ has canonical scaling dimension $[\phi] = -(2-D)/2$ and the upper critical dimension of the embedding space is $d_c = \frac{2D}{2-D}$.

Such a model was studied in [DDG1,2] (the model was first considered in [D2]). It was shown in [DDG1,2] for $1 < D < 2$ and $\varepsilon > 0$ sufficiently small, that there exists an $\varepsilon$-expansion in renormalised perturbation series, and that the infrared behaviour is governed by a non-Gaussian fixed point. The model with an ultraviolet cutoff was reconsidered in [CM] in the hierarchical approximation to Wilson’s Renormalization group (henceforth called RG). It was shown, under the same conditions, that the iterations of the hierarchical RG transformations converge to a non-Gaussian fixed point independent of the $\varepsilon$-expansion.

In the present paper we consider a version of the above model and study the iterations of the exact RG with an ultraviolet cutoff in a finite but large volume eventually tending to infinity. The precise definition of the model and the RG iterations will be found in Sect. 1.1. We shall simulate a fractional dimension $\bar D$, $1 < \bar D < 2$, by choosing the underlying space dimension D = 1 and modifying the covariance C to be

$$C = (-\Delta)^{(\delta-1)} F(-\Delta) \tag{0.3}$$

* Partially supported by CNR, G.N.F.M. and Short Term Mobility Program.

with 0 < δ < 1/2. Here F is an ultraviolet cutoff function which is positive, and F (p2 ) is of fast decrease. The covariance C is so chosen (see Sect. 1.1) that C(x − y) is smooth and of compact support. In finite volume, the zero momentum mode p = 0 is automatically taken care of (for details see later). The delta function interaction is replaced (see [CM]) by its regularised version in finite volume: Z dx v(φ(x)) (0.6) V (φ) = 3

with

v(φ(x)) =

λ 2π

d

2

λ

e− 2 |φ(x)| , 2

(0.7)

where | · | is the norm in Rd and λ > 0 is a constant fixed below. It is easy to see that the canonical scaling dimension of the field is [φ] = −β/2, where β = 1 − 2δ > 0 since 0 < δ < 1/2 by choice. The coupling constant g has dimension [g] = ε = 1 − β d2 , and the upper critical dimension is dc = β2 . Note that the canonical scaling of the field is such that it simulates a Gaussian random field with covariance (−1)−1 in dimension D¯ = 1 + 2δ. We have 1 < D¯ < 2, and for δ close to 1/2, D¯ is close to 2. The idea of simulating fractional dimensions by changing the covariance is not new; it figures for example in [GK] and in the recent paper of [BDH3]. We shall hold the parameter L (see later) sufficiently large, β = 1−2δ > 0 sufficiently −30 . Moreover the small and independent of L, and ε = 1 − dβ 2 such that 0 < ε < L 3/2 initial coupling g0 is held within a ball of radius ε , centered at g¯ = O(ε), the approximate fixed point of the second order computations of Sect. 4 (see below). Clearly in this configuration the critical embedding dimension dc is very large and d close to dc from below. Our main result is that the exact RG iteractions converge in an appropriate

Renormalization Group Approach to Interacting Polymerised Manifolds

209

norm to a non-Gaussian fixed point close to the unstable Gaussian fixed point. The precise statement is to be found in Theorem 7.5 of Sect. 7.

We will sketch here the various steps of the proof. We show that the second order flow of the RG is under control, and it gives an approximate non-trivial fixed point. We then prove that the remainder is also under control in an appropriate norm and that it gives a negligible correction to the second order RG flow. Next we prove that there exists an invariant small neighborhood of the approximate fixed point. The renormalisation group transformations are contractive in this domain and this permits us to prove that there exists a true attractive non-trivial fixed point of the exact RG. More precisely the partition function density with respect to the Gaussian measure after $n$ steps of the RG is completely parametrised by the triple $(g_n, I_n, r_n)$, where $g_n$ is an effective coupling constant, $I_n$ is a second order irrelevant polymer activity, computed exactly, and the polymer activity $r_n$ is the remainder, and this triple converges in an appropriate norm to a non-trivial fixed point.

The formulation of RG iterations in terms of a polymer gas representation, as well as the method of analysis employed in this paper, have been much influenced by the original paper of Brydges-Yau [BY], the Lausanne lectures of Brydges [B, Laus.], together with developments due to Brydges, Dimock and Hurd [BDH1,2,3]. This technique has the advantage, with respect to the naive perturbative expansion, that the polymer activities as functions of the coupling are not expanded in series when the expansion is unnecessary. In this paper, however, there are definite simplifications and differences with respect to [BY, BDH1,2,3]. The simplifications stem from the use of covariances with compact support, an idea suggested to us by David Brydges. This trivializes the cluster expansion.
All polymer activities appearing in this paper are based on connected polymers. We can dispense with analyticity norms. As to the differences, they stem from the special form of the interaction. As a consequence we have that the growth of polymer activities is measured by a norm which employs a large fields regulator quite specific to this problem. Relevant terms are also extracted in a special way appropriate to this model. These matters are explained in detail in the subsequent sections.

Some further remarks are in order. The reader may wonder why we interpolated in the covariance starting with $D = 1$ instead of $D = 2$. The reason is technical and stems from the scaling properties of the fields which for $D = 1$ permit us to exploit with advantage the simple large field regulator that we have devised for the construction of norms in which convergence is proved. For the case $D = 2$ the large field behaviour is not yet under control. This problem deserves more attention. Needless to say, this problem is not seen in perturbation theory.

Finally, we note that much progress has been made in the study of self-avoiding polymerised manifolds via perturbative expansions, see [DDG3,4, DW1,2,3,4] and for earlier work [NPW, KN, D1, DHK, H].

1. The Model, Polymer Gas Representation and RG Transformations

1.1. The model. Let $\Lambda_N$ be the closed interval $\left[-\frac{L^{N+1}}{2}, \frac{L^{N+1}}{2}\right]$ of length $L^{N+1}$. The model is described by the partition function
\[ Z_0(\Lambda_N) = \int d\mu_{C_N}\, z_0(\Lambda_N, \phi), \tag{1.1} \]
where
\[ z_0(\Lambda_N, \phi) = e^{-V_0(\Lambda_N, \phi)}, \tag{1.2} \]


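and
\[ V_0(\Lambda_N, \phi) = g_0 V_*(\Lambda_N, \phi) = g_0\int_{\Lambda_N} dx\, v_*(\phi(x)) = g_0\int_{\Lambda_N} dx \left(\frac{\lambda_*}{2\pi}\right)^{d/2} e^{-\frac{\lambda_*}{2}|\phi(x)|^2}, \tag{1.3} \]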

where each $\phi \in \mathbf{R}^d$ and $\lambda_* > 0$ will be fixed later (see below). $\mu_{C_N}$ is a gaussian measure with mean 0 and covariance $C_N$. The components $\phi_j$, $1 \le j \le d$, are independent gaussian random variables, and each component has covariance $C_N$: $d\mu_{C_N}(\phi) = \otimes_{j=1}^{d} d\mu_{C_N}(\phi_j)$.

We now construct the measure $\mu_{C_N}$. Let $g(x)$ be a $C^\infty$ function of compact support: $g(x) = 0$ $\forall\, |x| \ge \frac{1}{2}$. We may choose for definiteness
\[ g(x) = \begin{cases} e^{-\frac{1}{1-4x^2}} & \text{for } |x| \le 1/2, \\ 0 & \text{elsewhere.} \end{cases} \tag{1.4} \]
Define $u(x) = (g * g)(x)$. Then $u(x)$ is $C^\infty$ and of compact support: $u(x) = 0$ $\forall\, |x| \ge 1$. Define
\[ \Gamma_L(x) = \int_1^L \frac{dl}{l}\, l^{\beta}\, u\!\left(\frac{x}{l}\right). \tag{1.5} \]
Note that $\Gamma_L(x)$ is $C^\infty$ and of compact support: $\Gamma_L(x) = 0$ $\forall\, |x| \ge L$. Finally we define the covariance $C_N$ by a truncated multiscale decomposition:
\[ C_N(x) = \sum_{j=0}^{N} L^{j\beta}\, \Gamma_L(x/L^j) \tag{1.6} \]
with $\beta = 1 - 2\delta > 0$ as in the introduction. From (1.6) $C_N$ is $C^\infty$ and of compact support: $C_N(x) = 0$ $\forall\, |x| \ge L^{N+1}$. From (1.5) we have
\[ \Gamma_L(x) = \int \frac{dp}{2\pi}\, e^{ipx} \int_1^L dl\, l^{\beta}\, \hat u(lp). \tag{1.7} \]
$\hat u$ is of fast decrease, since $u$ is $C^\infty$ with compact support. Moreover, since $u = g * g$, $\hat u = |\hat g|^2$, so that $\hat\Gamma_L(p) \ge 0$. In fact $\hat\Gamma_L(p) > 0$ a.e. since $\hat\Gamma_L$, being the Fourier transform of a function of compact support, is entire analytic and thus has at most a finite number of zeroes in any compact set.

Let $\Gamma_L$ act on functions by convolution. Then it is immediate to see from the above that $\Gamma_L$ is a strictly positive self-adjoint operator of trace class on $L^2(\Lambda_N)$, and from the definition in (1.6) it follows that the same is true for $C_N$. Define $S = C_N^{1/2}$ and the Hilbert space $\mathcal{H}$ by completing $S L^2(\Lambda_N)$ in the inner product $(u, v)_{\mathcal{H}} = (S^{-1}u, S^{-1}v)_{L^2(\Lambda_N)}$. It is easy to verify using the smoothness of the integral kernel of $S$ that $\mathcal{H}$ is contained in the Sobolev space $H_s(\Lambda_N)$ for every integer $s \ge 0$. Moreover, using the smoothness of the integral kernel of $C_N$, we can verify that the injection map of $\mathcal{H}$ into $H_s(\Lambda_N)$ is Hilbert–Schmidt for every $s \ge 0$. Now from a standard construction (see e.g. Lemma 2 in Sect. 3.2, chapter 4 of [GV]) it follows that for every $s \ge 0$ there exists a Gaussian Borel probability measure on $H_s(\Lambda_N)$ of mean zero and covariance $C_N$ which is the desired measure $\mu_{C_N}$. Take $s > 1/2 + \sigma$ for any positive integer $\sigma$. The Sobolev embedding theorem implies that the sample fields $\phi$ are $\sigma$-times differentiable. For our purposes it is enough to fix $\sigma = 2$.
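The positivity argument above rests on $\hat u = |\hat g|^2$ for $u = g * g$. The following is a minimal numerical sketch of that relation for the choice (1.4), not part of the original construction. The grid size `n`, the sample momenta and all function names are illustrative assumptions; the convolution and the Fourier transforms are replaced by Riemann sums on a uniform grid.

```python
import math

def g(x):
    """Bump function of (1.4): exp(-1/(1-4x^2)) inside (-1/2, 1/2), zero elsewhere."""
    s = 1.0 - 4.0 * x * x
    return math.exp(-1.0 / s) if s > 0.0 else 0.0

def sample_g(n):
    """Samples of g on the grid x_j = j/(2n) - 1/2, j = 0..2n (step h = 1/(2n))."""
    return [g(j / (2.0 * n) - 0.5) for j in range(2 * n + 1)]

def convolve(gs, h):
    """Discrete version of u = g*g on the grid x_k = k*h - 1, k = 0..4n."""
    us = [0.0] * (2 * len(gs) - 1)
    for i, gi in enumerate(gs):
        if gi == 0.0:
            continue
        for j, gj in enumerate(gs):
            us[i + j] += h * gi * gj
    return us

def cosine_transform(samples, x0, h, p):
    """Riemann sum for the Fourier transform of an even function at momentum p."""
    return h * sum(f * math.cos(p * (x0 + k * h)) for k, f in enumerate(samples))

n = 100                       # illustrative grid parameter (assumption)
h = 1.0 / (2 * n)
gs = sample_g(n)
us = convolve(gs, h)          # u vanishes at the end points x = -1 and x = 1
momenta = [0.0, 1.0, 3.0, 7.5, 12.0, 20.0]
g_hat = [cosine_transform(gs, -0.5, h, p) for p in momenta]
u_hat = [cosine_transform(us, -1.0, h, p) for p in momenta]
for p, gh, uh in zip(momenta, g_hat, u_hat):
    print(p, uh, gh * gh)     # the last two columns should agree
```

Because the discrete convolution theorem holds exactly on a common grid, the two columns agree up to rounding, which mirrors the statement $\hat u = |\hat g|^2 \ge 0$ used in the text.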


There exists another formula for $C_N$, derived from its definition, which is useful because it makes contact with the cutoff function $F_N$: from (1.5) it follows that
\[ L^{j\beta}\, \Gamma_L(x/L^j) = \int_{L^j}^{L^{j+1}} \frac{dl}{l}\, l^{\beta}\, u\!\left(\frac{x}{l}\right). \tag{1.8} \]
Introducing this in (1.6) we get
\[ C_N(x) = \int_1^{L^{N+1}} \frac{dl}{l}\, l^{\beta}\, u\!\left(\frac{x}{l}\right), \tag{1.9} \]
which we can rewrite as
\[ C_N(x) = \int \frac{dp}{2\pi}\, e^{ipx}\, \frac{F_N(p^2)}{p^{2(1-\delta)}}, \tag{1.10} \]
where the UV cutoff function in finite volume $F_N$ has now the following form
\[ F_N(p^2) = \int_{|p|}^{|p| L^{N+1}} dl\, l^{\beta}\, \hat u(l) = p^{2(1-\delta)} \int_1^{L^{N+1}} dl\, l^{\beta}\, \hat u(l|p|). \tag{1.11} \]

By our choice of $g$, $\hat g$ is an even function, and hence so is $\hat u = |\hat g|^2$. This justifies the replacement of $p$ by $|p|$ in the above formula. $F_N(p^2)$ is of fast decrease because $\hat u$ is of fast decrease. We also see that in our finite volume covariance the zero mode at $p = 0$ is automatically regularized.

To complete the definition of the model we specify the constant $\lambda_*$ as
\[ \lambda_* = \frac{\beta}{u(0)}. \tag{1.12} \]
Note that, defining
\[ \gamma = \Gamma_L(0) = \frac{u(0)}{\beta}\,(L^{\beta} - 1), \tag{1.13} \]
we have
\[ \lambda_* = \frac{L^{\beta} - 1}{\gamma}. \tag{1.14} \]

Remark. If we had started with an arbitrary $\lambda$ in the interaction (1.3), and we had performed a Renormalization Group transformation (defined below), then we would have obtained in the absence of quantum corrections: $\lambda' = \frac{L^{\beta}\lambda}{1 + \gamma\lambda}$. Since $\beta > 0$, $\lambda_*$ is an attractive fixed point. Choosing $\lambda = \lambda_*$ from the beginning is an updating of the Renormalization Group trajectory which simplifies the subsequent analysis.

From now on, a bound of the form $O(1)$ will mean a bound independent of $L$, $\varepsilon$ and $\beta$. We have the following bound on the derivatives of $\Gamma_L(x)$:

Lemma 1.1.1. For $0 \le \beta \le 1/4$ and all $k \ge 1$ we have
\[ \sup_x |\partial^k \Gamma_L(x)| \le O(1), \tag{1.15} \]
\[ \int_{\mathbf{R}} dx\, |\partial^k \Gamma_L(x)|^2 \le O(1). \tag{1.16} \]
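The Remark preceding Lemma 1.1.1 can be illustrated by iterating the map $\lambda \mapsto L^{\beta}\lambda/(1+\gamma\lambda)$ directly. The sketch below is only an illustration: the values of `L`, `beta`, the stand-in `u0` for $u(0)$ and the starting point are assumptions chosen for the example, not values fixed in the text.

```python
def rg_lambda(lam, L, beta, gamma):
    """One step of the map lambda -> L^beta * lambda / (1 + gamma * lambda)."""
    return L**beta * lam / (1.0 + gamma * lam)

L, beta, u0 = 4.0, 0.2, 1.0            # illustrative values; u0 stands in for u(0)
gamma = u0 * (L**beta - 1.0) / beta     # (1.13)
lam_star = beta / u0                    # (1.12)

lam = 5.0                               # arbitrary positive starting point
trajectory = [lam]
for _ in range(200):
    lam = rg_lambda(lam, L, beta, gamma)
    trajectory.append(lam)

print(trajectory[-1], lam_star, (L**beta - 1.0) / gamma)
```

The three printed numbers should coincide: the iteration settles on $\lambda_*$, and (1.12) agrees with (1.14). The linearised contraction rate at the fixed point is $L^{-\beta} < 1$, which is why $\beta > 0$ makes $\lambda_*$ attractive.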


Proof. The proof of (1.15) follows directly from (1.5) and from the fact that $\sup_x |\partial^k u(x)| \le O(1)$. One has, taking $k$ derivatives,
\[ \sup_x |\partial^k \Gamma_L(x)| \le O(1)\, \frac{1 - L^{\beta-k}}{k - \beta} \le O(1). \]
To prove (1.16) we have
\[ \int_{\mathbf{R}} dx\, |\partial^k \Gamma_L(x)|^2 = 2\int_1^L \frac{dl_1}{l_1}\, l_1^{\beta-k} \int_1^{l_1} \frac{dl_2}{l_2}\, l_2^{\beta-k} \int_{-l_2}^{l_2} dx\, (\partial^k u)(x/l_1)\,(\partial^k u)(x/l_2). \]
Using (1.15) the last integral can be bounded by $O(1)\, l_2$. Therefore we have
\[ \int_{\mathbf{R}} dx\, |\partial^k \Gamma_L(x)|^2 \le O(1)\int_1^L \frac{dl_1}{l_1}\, l_1^{\beta-k} \int_1^{l_1} \frac{dl_2}{l_2}\, l_2^{\beta-k}\, l_2, \]
and it is easy to see by direct computation that the integrals are bounded again by $O(1)$. $\square$

From now on we drop for simplicity the suffix $L$ from $\Gamma$. It is also natural to define the rescaled propagator
\[ R\Gamma(y) = L^{-\beta}\, \Gamma(yL). \tag{1.17} \]
We will also use sometimes the following notation:
\[ \bar\Gamma(y) = \lambda_* L^{-\beta}\, \Gamma(yL) = \lambda_*\, R\Gamma(y). \tag{1.18} \]

Note that $\bar\Gamma(0) = 1 - L^{-\beta}$.

1.2. Renormalization Group transformation. It is easy to see from (1.6) that
\[ C_N(x) = \Gamma(x) + L^{\beta}\, C_{N-1}(x/L). \tag{1.19} \]
Define the rescaled field $R\phi(x)$:
\[ R\phi(x) = L^{\beta/2}\, \phi(x/L). \tag{1.20} \]
From (1.19) and (1.20) we see that we can write
\[ \int d\mu_{C_N}(\phi)\, z_0(\Lambda_N, \phi) = \int d\mu_{C_{N-1}}(\phi)\, z_1(\Lambda_{N-1}, \phi), \tag{1.21} \]
where
\[ z_1(\Lambda_{N-1}, \phi) = \int d\mu_{\Gamma}(\zeta)\, z_0(\Lambda_N, \zeta + R\phi). \tag{1.22} \]
This constitutes our Renormalization Group (RG) transformation, which can be iterated. After $n$ steps ($0 \le n \le N$) we get
\[ \int d\mu_{C_N}(\phi)\, z_0(\Lambda_N, \phi) = \int d\mu_{C_{N-n}}(\phi)\, z_n(\Lambda_{N-n}, \phi), \tag{1.23} \]
where
\[ z_n(\Lambda_{N-n}, \phi) = \int d\mu_{\Gamma_L}(\zeta)\, z_{n-1}(\Lambda_{N-n+1}, \zeta + R\phi). \tag{1.24} \]
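The scale decomposition behind the RG step, namely the agreement of the multiscale sum (1.6) with the single integral (1.9) and the recursion (1.19), can be checked numerically. The sketch below is illustrative only: it uses a stand-in bump `u` supported on $|x| < 1$ instead of the paper's $u = g * g$ (the identities hold for any such $u$), and the values of `L`, `beta`, `N` and the quadrature resolution `m` are assumptions.

```python
import math

def u(x):
    """Stand-in even bump supported on |x| < 1 (an assumption; not the paper's g*g)."""
    s = 1.0 - x * x
    return math.exp(-1.0 / s) if s > 0.0 else 0.0

def scale_integral(x, t0, t1, beta, steps):
    """Midpoint rule for int (dl/l) l^beta u(x/l) over e^t0 <= l <= e^t1, in t = log l."""
    h = (t1 - t0) / steps
    total = 0.0
    for i in range(steps):
        t = t0 + (i + 0.5) * h
        total += math.exp(beta * t) * u(x * math.exp(-t))
    return h * total

L, beta, N, m = 3.0, 0.2, 3, 400   # illustrative values; m quadrature points per scale
logL = math.log(L)

def gamma(x):
    """Fluctuation covariance (1.5)."""
    return scale_integral(x, 0.0, logL, beta, m)

def C_sum(n, x):
    """Multiscale sum (1.6)."""
    return sum(L**(j * beta) * gamma(x / L**j) for j in range(n + 1))

def C_int(n, x):
    """Single integral (1.9)."""
    return scale_integral(x, 0.0, (n + 1) * logL, beta, m * (n + 1))

points = [0.0, 0.7, 2.5, 10.0, 40.0, 81.0, 100.0]
checks = [(x, C_sum(N, x), C_int(N, x), gamma(x) + L**beta * C_sum(N - 1, x / L))
          for x in points]
for row in checks:
    print(row)   # columns 2-4 should agree; they vanish for |x| >= L^(N+1) = 81
```

Working in the variable $t = \log l$ makes each scale $[L^j, L^{j+1}]$ an interval of equal length $\log L$, which is the content of (1.8).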


After $N$ steps we get $z_N(\Lambda_0, \phi)$ and measure $\mu_{C_0}(\phi)$. Note that $\Lambda_0$ is the closed interval $[-L/2, L/2]$ and $C_0 = \Gamma$. Then we want to pass to the $N \to \infty$ limit. We would like to study the convergence of the iteration (1.24). This is awkward because the volume is changing with the iterations (see (1.23)). We can take $N$, $n$, $N - n$ very large. Then the RG iterations can be viewed as those of a fixed map. The precise sense of the convergence of these iterations will be explained later.

1.3. The master formula for fluctuation integrals. By a simple gaussian integration it is easy to see that, choosing suitably the constants $\beta$ and $d$, the direction defined by the initial interaction (1.3) is relevant under the action of RG: with our definition of $\lambda_*$ we have that
\[ \int d\mu_{\Gamma}(\zeta)\, v_*(\zeta(x) + R\phi(x)) = L^{-\alpha}\, v_*(L^{-\beta/2} R\phi(x)) \]
with $\alpha = \beta d/2$ chosen in such a way that $\alpha < 1$. Changing variables $x' = x/L$ we get
\[ \int dx\, L^{-\alpha}\, v_*(L^{-\beta/2} R\phi(x)) = L^{\varepsilon} \int dx\, v_*(\phi(x)) \]
with $\varepsilon = 1 - \alpha > 0$. We will see later that this direction is actually the only relevant one. In order to prove this it is often useful to define a modified fluctuation integration such that the result of this integration is already multiplied by a factor $L^{-\alpha}\, v_*(L^{-\beta/2} R\phi(x))$. This is given by the following equality.

Lemma 1.3.1 (Master formula).
\[ \int d\mu_{\Gamma}(\zeta)\, v_*(\zeta(x) + R\phi(x))\, F(\zeta + R\phi) = L^{-\alpha}\, v_*(L^{-\beta/2} R\phi(x)) \int d\mu_{\Sigma^x}(\zeta)\, F(\zeta + L^{-\beta}\, T^x R\phi), \]
where
\[ \Sigma^x(y - z) = \Gamma(y - z) - \lambda_*\, \Gamma(x - z)\, \Gamma(x - y), \tag{1.25} \]
\[ T^x R\phi(y) = L^{\beta} R\phi(y) - \lambda_*\, \Gamma(y - x)\, R\phi(x). \tag{1.26} \]
Note that by trivial algebraic manipulation, and using (1.14), we obtain
\[ L^{-\beta}\, T^x R\phi(y) = [R\phi(y) - R\phi(x)] + L^{-\beta}\big(1 + \lambda_*(\Gamma(0) - \Gamma(x - y))\big) R\phi(x), \tag{1.27} \]
\[ L^{-\beta}\, T^x R\phi(x) = L^{-\beta} R\phi(x). \tag{1.28} \]
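The single-point Gaussian integration that opens Sect. 1.3 can be checked numerically for one field component ($d = 1$, so $\alpha = \beta/2$). In the sketch below the number `a` plays the role of $R\phi(x)$ and $\zeta(x)$ is a centred Gaussian variable of variance $\gamma = \Gamma(0)$. The parameter values, the stand-in `u0` for $u(0)$ and the quadrature settings are assumptions made for the illustration.

```python
import math

def v_star(phi, lam):
    """One-component (d = 1) version of the regularised delta function v_*."""
    return math.sqrt(lam / (2.0 * math.pi)) * math.exp(-0.5 * lam * phi * phi)

def fluctuation_average(a, lam, gam, steps=4000, cutoff=12.0):
    """Trapezoidal estimate of E[v_*(zeta + a)] for zeta ~ N(0, gam)."""
    sd = math.sqrt(gam)
    h = 2.0 * cutoff * sd / steps
    total = 0.0
    for i in range(steps + 1):
        z = -cutoff * sd + i * h
        w = 0.5 if i in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z / gam) / math.sqrt(2.0 * math.pi * gam)
        total += w * density * v_star(z + a, lam)
    return h * total

L, beta, u0 = 4.0, 0.2, 1.0            # illustrative values; u0 stands in for u(0)
lam_star = beta / u0                    # (1.12)
gam = u0 * (L**beta - 1.0) / beta       # (1.13)
alpha = beta / 2.0                      # alpha = beta*d/2 with d = 1

results = [(a,
            fluctuation_average(a, lam_star, gam),
            L**(-alpha) * v_star(L**(-beta / 2.0) * a, lam_star))
           for a in (0.0, 0.5, 2.0, 5.0)]
for row in results:
    print(row)   # second and third entries should agree
```

The agreement reflects $1 + \lambda_*\gamma = L^{\beta}$, which is (1.14) rewritten: convolving the two Gaussians adds the variances $\gamma$ and $1/\lambda_*$, giving $L^{\beta}/\lambda_*$.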

The proof of Lemma 1.3.1 is obtained simply by gaussian integration.

1.4. Polymer expansion [BY]. We shall express the density of the partition function (1.1) in the polymer gas representation of Brydges and Yau [BY]. This is obtained by expressing the volume $\Lambda$ as a union of closed blocks $\Delta$ of size 1; since $\Lambda$ is one-dimensional, the blocks are actually closed unit intervals. We define first of all a polymer $X$ as a union of blocks $\Delta$. A cell may be the interior of a block, i.e. an open block, or a point of its boundary. Then we define a commutative product, denoted $\circ$, on functions of sets containing polymers and cells, in the following way:
\[ (F_1 \circ F_2)(X) = \sum_{Y, Z:\ Y \overset{\circ}{\cup} Z = X} F_1(Y)\, F_2(Z), \]


where $X = Y \overset{\circ}{\cup} Z$ iff $X = Y \cup Z$ and $Y \cap Z = \emptyset$. The $\circ$ identity $I$ is defined by
\[ I(X) = \begin{cases} 1 & \text{if } X = \emptyset, \\ 0 & \text{otherwise.} \end{cases} \]
The Exponential is defined by $\mathrm{Exp}(K) = I + K + K \circ K/2! + \cdots$. This is the usual series for an exponential except that the product has been replaced by the $\circ$ product. The Exponential with the $\circ$ product satisfies the usual properties of an exponential. Moreover we define a space filling function as
\[ \Box(X) = \begin{cases} 1 & \text{if } X \text{ is a cell,} \\ 0 & \text{otherwise.} \end{cases} \]
Finally we denote, for $X$ polymer or cell,
\[ V_0(X, \phi) = g_0 \int_X d^D x\, v_*(\phi(x)). \tag{1.29} \]
With these notations it is clear that, since $\mathrm{Exp}(\Box)(X) = 1$, one has
\[ Z_0(\Lambda_N) = \int d\mu_{C_N}\, e^{-V_0(\phi, \Lambda_N)}\, \mathrm{Exp}(\Box)(\Lambda_N). \tag{1.30} \]
In what follows it will be understood that in an expression of the form $e^{-V(\Lambda)}\, \mathrm{Exp}(\Box + K)(\Lambda)$ or $\mathrm{Exp}(\Box + \hat K)(\Lambda)$ the functions $K$, $\hat K$, called activities, are supported only on polymers and vanish on cells. In particular the expression $\mathrm{Exp}(\Box + \hat K)(\Lambda)$ is often called polymer gas, because it can be written in the form
\[ \mathrm{Exp}(\Box + \hat K)(\Lambda) = \sum_{N=0}^{\infty} (1/N!) \sum_{X_1, \ldots, X_N} \hat K(X_1) \cdots \hat K(X_N), \tag{1.31} \]
where $X_1, \ldots, X_N$ are all disjoint polymers in $\Lambda$. Since these are closed, they are separated by a distance of at least one. We will consider often activities defined on connected polymers. For such activities the decomposition (1.31) is on connected subsets of $X$. Such activities will be called connected activities. The polymer expansion given above is borrowed from [BY].

Let us conclude this subsection stating two useful lemmas about manipulations on Exp. The (easy) proofs can be found in [BY] and [B, Laus.].

Lemma 1.4.1. For any pair of polymer activities $A$, $B$,
\[ \mathrm{Exp}(\Box + A)\, \mathrm{Exp}(\Box + B) = \mathrm{Exp}(\Box + A + B + A \vee B), \tag{1.32} \]
where the polymer activity $(A \vee B)$ is defined by
\[ (A \vee B)(X) = \sum_{\{X_i\}, \{Y_j\} \to X}\ \prod_i A(X_i) \prod_j B(Y_j) \tag{1.33} \]
with $\{X_i\}, \{Y_j\} \to X$ meaning that the sum is over the families of polymers $\{X_i\}$, $\{Y_j\}$ such that $(\cup_i X_i) \cup (\cup_j Y_j) = X$, the $X$'s are disjoint, the $Y$'s are disjoint, but the two families are overlap connected.
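The $\circ$ product and the polymer gas formula (1.31) can be made concrete on a very small volume. The sketch below is a toy illustration under stated assumptions: the volume is $\Lambda = [0, 3]$, cells are modelled as labelled tuples, and the activity `K` (weight $0.5$ per block) is a hypothetical choice. It evaluates $\mathrm{Exp}(F)(X)$ as a sum over partitions of $X$ into parts, which is what the series $I + F + F \circ F/2! + \cdots$ reduces to when $F(\emptyset) = 0$.

```python
from itertools import combinations

def closed_block(i):
    """Cells of the closed unit interval [i, i+1]: two end points and the open interior."""
    return frozenset({("pt", i), ("open", i), ("pt", i + 1)})

def polymers(n):
    """All non-empty unions of closed blocks in [0, n], as frozensets of cells."""
    out = set()
    for r in range(1, n + 1):
        for blocks in combinations(range(n), r):
            out.add(frozenset().union(*(closed_block(i) for i in blocks)))
    return out

def Exp(F, X):
    """Exp(F)(X): sum over partitions of X into non-empty parts of the product of F."""
    if not X:
        return 1.0
    first = min(X)
    rest = sorted(X - {first})
    total = 0.0
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            part = frozenset((first,) + extra)
            weight = F(part)
            if weight != 0.0:
                total += weight * Exp(F, X - part)
    return total

def polymer_gas(K, n):
    """Right-hand side of (1.31): sum over families of pairwise disjoint polymers."""
    polys = list(polymers(n))
    total = 0.0
    for r in range(len(polys) + 1):
        for family in combinations(polys, r):
            if all(a.isdisjoint(b) for a, b in combinations(family, 2)):
                prod = 1.0
                for X in family:
                    prod *= K(X)
                total += prod
    return total

n = 3                                   # toy volume [0, 3] (assumption)
Lambda = frozenset().union(*(closed_block(i) for i in range(n)))
POLYMERS = polymers(n)

def box(X):
    """Space filling function: 1 on single cells, 0 otherwise."""
    return 1.0 if len(X) == 1 else 0.0

def K(X):
    """Hypothetical activity: 0.5 per block on polymers, zero on everything else."""
    if X not in POLYMERS:
        return 0.0
    return 0.5 ** sum(1 for kind, _ in X if kind == "open")

exp_box = Exp(box, Lambda)
lhs = Exp(lambda X: box(X) + K(X), Lambda)
rhs = polymer_gas(K, n)
print(exp_box, lhs, rhs)
```

In this model the closed blocks $[0,1]$ and $[1,2]$ share the boundary point $1$ and are therefore not disjoint, whereas $[0,1]$ and $[2,3]$ are; this is the sense in which disjoint closed polymers are separated by a distance of at least one.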


Lemma 1.4.2. Let us define, for a polymer activity $A$, the quantity
\[ A^+(X) = \sum_{N \ge 1} \frac{1}{N!} \sum_{X_1 \ldots X_N \to X}\ \prod_{j=1}^{N} A(X_j), \tag{1.34} \]
where $X_1 \ldots X_N \to X$ means that $X_1 \ldots X_N$ are distinct, overlap connected and such that $\cup_i X_i = X$. Then we have
\[ \prod_{X \subset Z} e^{A(X)} = \mathrm{Exp}(\Box + (e^A - 1)^+)(Z). \tag{1.35} \]

1.5. RG strategy. A single RG step has for us four parts. We describe them briefly here and we will fill in the details in later sections (3–5). We begin with an expression of the form
\[ e^{-V(\phi, \Lambda)}\, \mathrm{Exp}(\Box + K)(\phi, \Lambda). \tag{1.36} \]
Here $V$ is of the form (1.29), with $g_0$ replaced by $g$. Although $K$ is absent initially, it is necessarily generated in RG operations. The structure of the activity $K$, together with bounds, will be exhibited in later sections. Suffice it to say at this stage that $K$ consists of an exact second order perturbation theory contribution plus a remainder. Before we proceed further, let us rewrite (1.36) in the form
\[ e^{-V(\phi, \Lambda)}\, \mathrm{Exp}(\Box + K)(\phi, \Lambda) = \mathrm{Exp}(\Box + \hat K)(\phi, \Lambda). \tag{1.37} \]
$\hat K$ is a functional of $K$ and $V$, given by a standard formula, given later (see Sect. 3).

Step 1. Reblocking. We reblock (1.37) using the reblocking operator $B$. The reblocking operator was introduced in [BY]. So far $\Lambda$ has been paved with closed 1-blocks. Introduce a compatible paving of $\Lambda$ on the next scale by closed $L$-blocks. Each closed $L$-block is a union of closed 1-blocks. For any 1-polymer $X$, let $\bar X$ be the smallest $L$-polymer containing $X$. If $Z$ is a 1-polymer, $LZ$ denotes a $L$-polymer. Then
\[ (B\hat K)(LZ) = \sum_{N \ge 1} \frac{1}{N!} \sum_{\substack{X_1 \ldots X_N\ \mathrm{disjoint} \\ \{\bar X_j\}\ \mathrm{overlap\ connected} \\ \cup \bar X_j = LZ}}\ \prod_{j=1}^{N} \hat K(X_j). \tag{1.38} \]
We then have
\[ \mathrm{Exp}(\Box + \hat K)(\Lambda, \phi) = \mathrm{Exp}_L(\Box_L + B\hat K)(\Lambda, \phi), \tag{1.39} \]
where all the operations on the r.h.s. of (1.39) are on scale $L$. We shall now let the RG act on (1.39). The RG action, as outlined before (see (1.22)), consists of a convolution with respect to the measure $\mu_{\Gamma_L}$ (called the fluctuation integration) followed by rescaling.


Step 2. Fluctuation integration. This is
\[ (\mu_{\Gamma} * \mathrm{Exp}_L(\Box_L + B\hat K))(\Lambda, R\phi) \]
with $R\phi$ defined by (1.20). The expansion of $\mathrm{Exp}_L(\Box_L + B\hat K)$ gives a sum over products of $L$-polymer activities, where the $L$-polymers are closed and disjoint. They are thus separated from each other by a distance greater or equal to $L$. The fluctuation covariance $\Gamma$ is of compact support by construction: $\Gamma(x - y) = 0$ $\forall\, |x - y| \ge L$. As a consequence
\[ (\mu_{\Gamma} * \mathrm{Exp}_L(\Box_L + B\hat K))(\Lambda, R\phi) = \mathrm{Exp}_L(\Box_L + \mu_{\Gamma} * B\hat K)(\Lambda, R\phi). \tag{1.40} \]
This leads to a considerable simplification in the RG analysis. Our next step is

Step 3. Rescaling. Recalling the definition of the rescaling operator $R$ acting on the field $\phi$ by $R\phi(x) = L^{\beta/2}\phi(x/L)$ and on the fluctuation covariance by $R\Gamma(y) = L^{-\beta}\Gamma(yL)$, we define for a polymer activity $K$,
\[ RK(L^{-1}X, \phi) = K(X, R\phi). \tag{1.41} \]
Note that
\[ (\mu_{\Gamma} * K)(X, R\phi) = (\mu_{R\Gamma} * RK)(L^{-1}X, \phi). \tag{1.42} \]
Define now $S = RB$. Then we have from (1.40),
\[ \mathrm{Exp}_L(\Box_L + \mu_{\Gamma} * B\hat K)(\Lambda, R\phi) = \mathrm{Exp}(\Box + (S\hat K)^{\natural})(L^{-1}\Lambda, \phi). \tag{1.43} \]
Here $\natural$ denotes the convolution operation with respect to $\mu_{R\Gamma}$. On the r.h.s. of (1.43), $L^{-1}\Lambda$ stands for $\Lambda$ shrunk by $L^{-1}$ and paved by closed 1-blocks. Our final step is the extraction:

Step 4. Extraction. This consists of picking up relevant parts $F$ from $(S\hat K)^{\natural}$ and exponentiating them, in such a way that
\[ \mathrm{Exp}(\Box + (S\hat K)^{\natural})(L^{-1}\Lambda, \phi) = e^{-V'(L^{-1}\Lambda, \phi)}\, \mathrm{Exp}(\Box + K')(L^{-1}\Lambda, \phi). \tag{1.44} \]
We will find that
\[ V'(F)(X) = g' \int_X dx\, v_*(\phi(x)) \tag{1.45} \]
with a new coupling constant $g'$. $K'$ is a functional of $\hat K$ and of the relevant part $F$: $K' = \mathcal{E}(\hat K, F)$. An explicit formula for the extraction operator $\mathcal{E}$ is given in [B, Laus.] and we will put it to good use. The aim of the RG analysis is to control the discrete flows obtained by a large number of iterations $(V, K) \to (V', K')$.

2. Polymer Activity Norms and Basic Lemmas

We want to define in this section the basic properties that we need on the activities $K$, and the appropriate norms to control them.


2.1. Decay in $X$: The large set regulator $\Gamma$. Let $K(X)$ be a connected polymer activity (with possible $\phi$ dependence suppressed). The decay of $K$ in the "size" of $X$ is controlled by a norm of the following type:
\[ \|K\|_{\Gamma_n} = \sup_{\Delta} \sum_{\substack{X \supset \Delta \\ X\ \mathrm{connected}}} |K(X)|\, \Gamma_n(X), \tag{2.1} \]
where the large set regulators are defined by
\[ \Gamma_n(X) = 2^{n|X|}\, \Gamma(X); \qquad \Gamma(X) = L^{(D+2)|X|} \tag{2.2} \]
and $|X|$ denotes the number of blocks in $X$. Because our fluctuation covariance is compactly supported it is sufficient to define the norm of the $K$ only for connected $X$. This simplifies the definition of $\Gamma$ with respect to [BY]. We define a small set as follows: a connected polymer $X$ is a small set if $|X| \le 2^D$. Recall that the $L$-closure $\bar X$ of a polymer $X$ is defined to be the smallest union of $L$-blocks containing $X$. The main result about $\Gamma$ that we need in the next sections is the following statement, borrowed from [BDH2]:

Lemma 2.1.1. For each $p = 0, 1, 2, \ldots$, there is an $O(1)$ constant $c_p$ such that for $L$ sufficiently large and for any polymer $X$,
\[ \Gamma_p(L^{-1}\bar X) \le c_p\, \Gamma(X). \tag{2.3} \]
For any large set $X$ a stronger bound is valid:
\[ \Gamma_p(L^{-1}\bar X) \le c_p\, L^{-D-1}\, \Gamma(X). \tag{2.4} \]
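For $D = 1$ the content of Lemma 2.1.1 can be explored by brute force on connected polymers, which are intervals of consecutive unit blocks. The sketch below is an illustration rather than a proof: the values `L = 16` and `p = 2`, the trial constant `c_p = 4**p` and the restriction to connected polymers are assumptions of the example. It scans the interval lengths and offsets and records the worst ratios appearing in (2.3) and (2.4).

```python
def regulator(num_blocks, L, D=1, n=0):
    """Gamma_n(X) = 2^{n|X|} L^{(D+2)|X|} for a polymer with |X| = num_blocks."""
    return 2 ** (n * num_blocks) * L ** ((D + 2) * num_blocks)

def closure_blocks(start, length, L):
    """Number of L-blocks in the L-closure of the interval [start, start + length]."""
    first = start // L
    last = (start + length - 1) // L
    return last - first + 1

L, p, D = 16, 2, 1        # illustrative values (assumptions)
c_p = 4 ** p              # trial constant for this sketch, not taken from the text
worst_all, worst_large = 0.0, 0.0
for length in range(1, 4 * L + 1):
    for start in range(L):
        m = closure_blocks(start, length, L)       # |L^{-1} X-bar|
        ratio = regulator(m, L, D, p) / regulator(length, L, D)
        worst_all = max(worst_all, ratio)
        if length > 2 ** D:                        # large set
            worst_large = max(worst_large, ratio * L ** (D + 1))

print(worst_all, worst_large, c_p)
```

In this scan the worst case for (2.3) comes from a small set of two blocks straddling the boundary between two $L$-blocks, while for large sets the extra factor $L^{-D-1}$ is absorbed because the closure of $X$ contains far fewer $L$-blocks than $X$ contains unit blocks.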

2.2. Smoothness in the fields. Functionals of $\phi$ are defined on the Banach space $C^r(\Lambda)$ of $r$ times continuously differentiable fields with the norm
\[ \|f\|_{C^r} = \sum_{l=0}^{r} \sup_x |\partial^l f(x)|. \tag{2.5} \]
A derivative of a functional with respect to $\phi$ in the direction $f$ is a linear functional $f \to DK(X, \phi; f)$ on this Banach space defined by
\[ \left.\frac{\partial}{\partial s} K(X, \phi + sf)\right|_{s=0} = DK(X, \phi; f). \]
The size of a functional derivative is naturally measured by the norm
\[ \|DK(X, \phi)\| = \sup\big[\, |DK(X, \phi; f)| : f \in C^r(X),\ \|f\|_{C^r(X)} \le 1 \,\big] \]
and $\|K(X, \phi)\| = |K(X, \phi)|$. In the proof of the main theorem we will need to introduce the norm
\[ \|K(X)\|_1 = \|K(X, \phi)\| + \|DK(X, \phi)\|. \tag{2.6} \]
We have the obvious property $\|K_1(X_1) K_2(X_2)\|_1 \le \|K_1(X_1)\|_1\, \|K_2(X_2)\|_1$ for any polymers $X_1$, $X_2$ and for any activities $K_1$, $K_2$.


2.3. Growth in the fields: The large fields regulator $G$. The growth of $K(X, \phi)$ as a function of $\phi$ and derivatives of $\phi$ is controlled by a large fields regulator $G(X, \phi)$. The natural norm defined employing $G$ has the form
\[ \|K(X)\|_G = \sup_{\phi \in C^r} \|K(X, \phi)\|\, G^{-1}(X, \phi). \tag{2.7} \]
The functional $G(X, \phi)$ is chosen so as to satisfy the following inequality:
\[ G(X \cup Y, \phi) \ge G(X, \phi)\, G(Y, \phi) \quad \text{if } X \cap Y = \emptyset. \tag{2.8} \]
The form of our interaction suggests the use of the following regulator:
\[ G_{\rho,\kappa}(X, \phi) = \frac{1}{|X|} \int_X dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2}\, e^{\kappa \|\phi\|^2_{X,1,\sigma}} \tag{2.9} \]
with $0 < \rho < 1$, $\kappa > 0$ and
\[ \|\phi\|^2_{X,1,\sigma} = \sum_{1 \le \alpha \le \sigma} \|\partial^{\alpha}\phi\|^2_X, \tag{2.10} \]
where $\|\phi\|_X$ is the $L^2$ norm. We take $\sigma$ large enough so that this norm can be used in Sobolev inequalities to control $\partial\phi$ pointwise. Let us show that (2.8) is true for this choice: it is enough to show
\[ \frac{1}{|X \cup Y|} \int_{X \cup Y} dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2} \ge \frac{1}{|X|} \int_X dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2}\ \frac{1}{|Y|} \int_Y dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2}. \]
Define
\[ a = \frac{1}{|X|} \int_X dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2}, \qquad b = \frac{1}{|Y|} \int_Y dy\, e^{-(\lambda_*/2)(1-\rho)|\phi(y)|^2}, \]
\[ p = \frac{|X \cup Y|}{|X|}, \qquad q = \frac{|X \cup Y|}{|Y|}. \]
Using that $0 \le a, b \le 1$ and $\frac{1}{p} + \frac{1}{q} = 1$ we have:
\[ \frac{1}{|X \cup Y|} \int_{X \cup Y} dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2} = \frac{1}{p}\, a + \frac{1}{q}\, b = \frac{1}{p}\big(a^{1/p}\big)^p + \frac{1}{q}\big(b^{1/q}\big)^q \ge a^{1/p}\, b^{1/q} \ge ab, \]
which shows (2.8). For the norm (2.7) to be useful, we will need further properties for the regulator $G$. In particular to control the fluctuation step we will need that $G$ is stable in the sense of the following lemma:


Lemma 2.3.1. (Stability of the large field regulator) Let $0 < \rho < 1/8$, $\rho = O(1)$, and $\kappa > 0$, $\kappa = O(1)$, be both sufficiently small. Recall that $O(1)$ means that they are independent of $L$, $\beta$ and $\varepsilon$. Let $\kappa/\rho < 1$ and $L$ be sufficiently large. Then
\[ (\mu_{\Gamma} * G_{\rho,\kappa})(X, R\phi) \le G^{\sharp}_{\rho,\kappa}(X, R\phi) \tag{2.11} \]
with
\[ G^{\sharp}_{\rho,\kappa}(X, R\phi) = O(1)\, 2^{|X|}\, \frac{L^{-\alpha}}{|X|} \int_X dx\, e^{-(\lambda_*/2)(1 - \rho/L^{\beta/2})\, |L^{-\beta/2} R\phi(x)|^2}\, e^{4\kappa \|R\phi\|^2_{X,1,\sigma}}. \tag{2.12} \]
The proof of this lemma, which is straightforward but rather long, is presented in the Appendix. It is useful to note that from the scaling property of the field $\phi$ and the definition of $\beta$ we have $\|R\phi\|^2_{X,1,\sigma} \le L^{\beta-1} \|\phi\|^2_{L^{-1}X,1,\sigma}$, and for $L$ sufficiently large, $4L^{\beta-1} < 1$ since $\beta > 0$ but very small. Note also that $L^{D-\alpha} = L^{\varepsilon} = O(1)$ for $\varepsilon$ sufficiently small (depending on $L$). Using these two facts it is easy to see that
\[ G^{\sharp}_{\rho,\kappa}(X, R\phi) \le O(1)\, 2^{|X|}\, L^{-D}\, G(L^{-1}X, \phi), \]
which is the original form of the regulator up to the contractive factor $L^{-D}$ and a vacuum energy contribution depending on the size of $X$. Finally we remark that the stability of the large fields regulator can be stated equivalently in the following corollary:

Corollary 2.3.2. For $0 < \rho < 1/32$ and $\kappa/\rho < 1$ and $L$ sufficiently large,
\[ \int d\mu_{\Sigma^{\bar x}}(\zeta)\, e^{(\lambda_*/2)\, 4\rho\, |\zeta(\bar x) + L^{-\beta} R\phi(\bar x)|^2}\, e^{4\kappa \|\zeta + L^{-\beta} T^{\bar x} R\phi\|^2_{X,1,\sigma}} \le e^{(\lambda_*/2)(4\rho/L^{\beta/2})\, L^{-\beta} |R\phi(\bar x)|^2}\, e^{8\kappa \|R\phi\|^2_{L^{-1}X,1,\sigma}}. \tag{2.13} \]
Equation (2.13) can be derived from (2.11), (2.12) and using the master formula. Such form of the stability of the regulator will be used in Sect. 5.

One particular point which we have to take account of in this work is the fact that due to our expression of the large field regulator the usual relation (see e.g. BDH) $G(X, \phi) \ge 1$ is not true in our case. Therefore also the useful relation $G(X, \phi) \ge G(Y, \phi)$ if $X \supset Y$ is in general false. This implies that in many cases the reblocking step has to be evaluated in some detail. For the contributions due to small sets (see below, Sect. 5) the following lemma is often used.

Lemma 2.3.3.
\[ \sum_{\substack{X\ \mathrm{s.s.} \\ \bar X = LZ}} G(L^{-1}X, \phi) \le O(1)\, L^D\, G(Z, \phi). \tag{2.14} \]


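Proof.
\[ \sum_{\substack{X\ \mathrm{s.s.} \\ \bar X = LZ}} G(L^{-1}X, \phi) = \sum_{\substack{X\ \mathrm{s.s.} \\ \bar X = LZ}} \frac{1}{|L^{-1}X|} \int_{L^{-1}X} dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2}\, e^{\kappa \|\phi\|^2_{L^{-1}X,1,\sigma}}. \tag{2.15} \]
First we observe that since $X$ is a small set, so is $Z$. Then it is easy to see that
\[ \sum_{\substack{X\ \mathrm{s.s.} \\ \bar X = LZ}} G(L^{-1}X, \phi) \le O(1)\, L^D \Bigg[ \sum_{\Delta:\ \bar\Delta = LZ} \int_{L^{-1}\Delta} dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2} + \sum_{\substack{\Delta_1, \Delta_2\ \mathrm{conn.} \\ \overline{\Delta_1 \cup \Delta_2} = LZ}} \bigg( \int_{L^{-1}\Delta_1} dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2} + \int_{L^{-1}\Delta_2} dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2} \bigg) \Bigg]\, e^{\kappa \|\phi\|^2_{Z,1,\sigma}}. \]
We now use $\sum_{\Delta_2:\ \Delta_1 \cap \Delta_2 \ne \emptyset} 1 \le 2$ and the fact that $|Z| \le 2$ and we obtain the upper bound
\[ \sum_{\substack{X\ \mathrm{s.s.} \\ \bar X = LZ}} G(L^{-1}X, \phi) \le O(1)\, L^D \int_Z dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2}\, e^{\kappa \|\phi\|^2_{Z,1,\sigma}} \le O(1)\, L^D\, G(Z, \phi). \qquad \square \tag{2.16} \]

2.4. Norms. Now we have all the ingredients to construct norms on $K$. We define the norms
\[ \|K(X)\|_{G,1} = \|K(X)\|_G + \|DK(X)\|_G, \tag{2.17} \]
\[ \|K\|_{G,1,\Gamma} = \big\|\, \|K(X)\|_{G,1} \big\|_{\Gamma}. \tag{2.18} \]
However sometimes it will be useful to define $L^{\infty}$ norms on certain activities in the following way:
\[ \|K(X)\|_{\infty,1} = \|K(X)\|_{\infty} + \|DK(X)\|_{\infty}, \tag{2.19} \]
where
\[ \|K(X)\|_{\infty} = \sup_{\phi \in C^r} |K(X, \phi)|. \tag{2.20} \]

2.5. Basic estimates on generic integrated activities. In this subsection we state some bounds valid for integrated activities with initial norm small enough. These bounds will be used often in the next sections.

Lemma 2.5.1. Let $K$ be an activity such that $\|K\|_{G,1,\Gamma} = O(\varepsilon^q)$, with $q \ge 1/10$, and let us define $S_{\ge k} K$ by restricting the sum on $N$ in (1.38) to $N \ge k$. Then we have
\[ S_{\ge k} K(Z, \phi) = \sum_{X:\ \bar X = LZ} (R\bar K_k)(L^{-1}X, \phi) \tag{2.21} \]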


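with $\bar K_k$ defined by
\[ \bar K_k(X, \phi) = \sum_{N \ge k} \frac{1}{N!} \sum_{\substack{X_1, \ldots, X_N\ \mathrm{disj.} \\ \cup_j X_j = X \\ \{\bar X_j\}\ \mathrm{ov.\ conn.}}}\ \prod_{j=1}^{N} K(X_j). \tag{2.22} \]
Then one has for any integer $p \ge 0$ and $\varepsilon \ge 0$ sufficiently small,
\[ \|(S_{\ge k} K)^{\natural}\|_{G,1,\Gamma_p} \le O(1)^k\, L^{(\beta/2 + kD)}\, \|K\|^k_{G,1,\Gamma} \tag{2.23} \]
with $O(1)$ depending on $p$.

Proof. The action of the fluctuation operator is controlled by the stability of the large fields regulator:
\[ |(\mu_{\Gamma} * \bar K)(X, R\phi)| \le L^{-D}\, O(1)\, 2^{|X|}\, G(L^{-1}X, \phi)\, \|\bar K(X)\|_G, \]
then
\[ |(S_{\ge k} K)^{\natural}(Z, \phi)| \le \sum_{X:\ \bar X = LZ} L^{-D}\, O(1)\, 2^{|X|}\, G(L^{-1}X, \phi)\, \|\bar K_k(X)\|_G \tag{2.24} \]
\[ \le L^{-D}\, O(1)\, e^{\kappa \|\phi\|^2_{Z,1,\sigma}}\, |Z| \cdot \frac{1}{|Z|} \int_Z dx\, e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2} \sum_{X:\ \bar X = LZ} \frac{2^{|X|}}{|L^{-1}X|}\, \chi_{L^{-1}X}(x)\, \|\bar K_k(X)\|_G, \tag{2.25} \]
where $\chi_{L^{-1}X}(x)$ is the characteristic function of the set $L^{-1}X$. Using the trivial bounds $\frac{1}{|L^{-1}X|}\, \chi_{L^{-1}X}(x) \le L^D$, $|Z| \le 2^{|Z|}$ we have
\[ \|(S_{\ge k} K)^{\natural}(Z)\|_G \le O(1)\, 2^{|Z|} \sum_{X:\ \bar X = LZ} 2^{|X|}\, \|\bar K_k(X)\|_G. \tag{2.26} \]
After performing a similar estimate for the functional derivative which produces an additional $L^{\beta/2}$ factor, and then multiplying throughout by $\Gamma_p(Z)$ we obtain
\[ \|(S_{\ge k} K)^{\natural}\|_{G,1,\Gamma_p} \le O(1)\, L^{\beta/2} \sup_{\Delta} \sum_{X:\ L^{-1}\bar X \cap \Delta \ne \emptyset} \|\bar K_k(X)\|_{G,1}\, \Gamma_{p+2}(L^{-1}\bar X). \tag{2.27} \]
Now using the spanning tree argument of Lemma 7.1 [BY] as well as Lemma 2.1.1, $\Gamma_{p+3}(L^{-1}\bar X) \le O(1)\, \Gamma(X)$, we obtain
\[ \|(S_{\ge k} K)^{\natural}\|_{G,1,\Gamma_p} \le O(1)\, L^{\beta/2} \sum_{N \ge k} O(1)^N\, (L^D)^N\, (\|K\|_{G,1,\Gamma})^N. \tag{2.28} \]
Using the bound $\|K(X)\|_{G,1,\Gamma} \le O(1)\varepsilon^q$ to control the sum over $N$ we have the lemma. $\square$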


Lemma 2.5.1 obviously implies the following:

Corollary 2.5.2. For the linearized scaling operator $S_1$ the following bound holds:
\[ \|(S_1 K)^{\natural}\|_{G,1,\Gamma_p} \le O(1)\, L^{(\beta/2 + D)}\, \|K\|_{G,1,\Gamma} \tag{2.29} \]
for any integer $p \ge 1$, $O(1)$ depends on $p$.

We now define the linearized scaling operator restricted to contributions from large sets by
\[ S_1^{(l.s.)} K(Z, \phi) = \sum_{\substack{X\ \mathrm{conn.\ large\ set} \\ \bar X = LZ}} (RK)(L^{-1}X, \phi). \tag{2.30} \]
We have the following result.

Lemma 2.5.3.
\[ \|(S_1^{(l.s.)} K)^{\natural}\|_{G,1,\Gamma_p} \le O(1)\, L^{-(1-\beta/2)}\, \|K\|_{G,1,\Gamma} \tag{2.31} \]
for any integer $p \ge 0$, $O(1)$ depends on $p$.

Proof. Repeat the proof of Lemma 2.5.1 using
\[ \Gamma_{p+3}(L^{-1}\bar X) \le O(1)\, L^{-D-1}\, \Gamma(X) \quad \text{for } X \text{ large set,} \tag{2.32} \]
which comes from (2.4) of Lemma 2.1.1. $\square$

Let now $F$ be a polymer activity supported on small sets, such that for every integer $p \ge 0$ and for $q \ge 1/10$,
\[ \|F\|_{G,1,\Gamma_p} \le O(\varepsilon^q), \qquad \|F\|_{\infty,1,\Gamma_p} \le O(\varepsilon^q) \tag{2.33} \]
with $O(1)$ depending on $p$. We then have for $\varepsilon \ge 0$ sufficiently small

Lemma 2.5.4.
\[ \|e^{-F} - 1\|_{G,1,\Gamma_p} \le O(1)\, \|F\|_{G,1,\Gamma_p}, \tag{2.34} \]
\[ \|e^{-F} - 1 + F\|_{G,1,\Gamma_p} \le O(1)\, \|F\|_{G,1,\Gamma_p}\, \|F\|_{\infty,1,\Gamma_p}. \tag{2.35} \]
Remark. Lemma 2.5.4 remains true if the $G$-norm is replaced by the $L^{\infty}$ norm, by the same proof.

Proof. We can easily bound
\[ \Big\| e^{-F(X)} - \sum_{l=0}^{k} \frac{1}{l!}(-F(X))^l \Big\|_{G,1} \le \sum_{N \ge k+1} \frac{1}{N!}\, \|F(X)\|_{G,1}\, \big(\|F(X)\|_{\infty,1}\, \Gamma(X)\big)^{N-1}. \tag{2.36} \]
Then by an easy argument $\|F(X)\|_{\infty,1}\, \Gamma(X) \le \|F\|_{\infty,1,\Gamma}$, so that we obtain
\[ \Big\| e^{-F} - \sum_{l=0}^{k} \frac{1}{l!}(-F)^l \Big\|_{G,1,\Gamma} \le \|F\|_{G,1,\Gamma}\, \|F\|^k_{\infty,1,\Gamma} \sum_{N \ge k+1} \frac{1}{N!}\, \|F\|^{N-1-k}_{\infty,1,\Gamma}. \tag{2.37} \]
Because of the smallness of $\|F\|_{\infty,1,\Gamma}$ the series is bounded by $O(1)$. Hence setting $k = 0, 1$ in (2.37) the lemma is proved. $\square$


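Lemma 2.5.5. For any integers $k \ge 1$ and $p \ge 0$, and with $O(1)$ dependent on $p$,
\[ \|(e^{-F} - 1)^+_{\ge k}\|_{G,1,\Gamma_p} \le O(1)^k\, \|F\|_{G,1,\Gamma_{p+3}}\, \|F\|^{k-1}_{\infty,1,\Gamma_{p+3}}, \tag{2.38} \]
\[ \|(e^{-F} - 1)^+_{\ge k}\|_{\infty,1,\Gamma_p} \le O(1)^k\, \|F\|^k_{\infty,1,\Gamma_{p+1}}. \tag{2.39} \]
Proof. From the definition of $(e^{-F} - 1)^+_{\ge k}(X)$ it is easy to obtain
\[ \|(e^{-F} - 1)^+_{\ge k}(X)\|_1 \le \sum_{N \ge k} \frac{1}{N!} \sum_{\substack{X_1, \ldots, X_N:\ \cup_j X_j = X \\ \{X_j\}\ \mathrm{overlap\ conn.}}} G(X_1)\, \|(e^{-F} - 1)(X_1)\|_{G,1} \prod_{j=2}^{N} \|(e^{-F} - 1)(X_j)\|_{\infty,1}. \tag{2.40} \]
We estimate for $X_1 \subset X$, $G(X_1) \le \sum_{Y \subset X} G(Y)$ and then
\[ G(Y) \le \int_Y dx\, e^{-\frac{\lambda_*}{2}(1-\rho)|\phi(x)|^2}\, e^{\kappa \|\phi\|^2_{X,1,\sigma}} \le \int_X dx\, e^{-\frac{\lambda_*}{2}(1-\rho)|\phi(x)|^2}\, e^{\kappa \|\phi\|^2_{X,1,\sigma}}, \tag{2.41} \]
so that we obtain $G(Y) \le 2^{|X|} G(X)$ and, from $\sum_{Y \subset X} 1 \le 2^{|X|}$, we get $G(X_1) \le 2^{2|X|} G(X) \le G(X) \prod_{j=1}^{N} 2^{2|X_j|}$. Hence (2.40) is estimated by
\[ \|(e^{-F} - 1)^+_{\ge k}(X)\|_{G,1} \le \sum_{N \ge k} \frac{1}{N!} \sum_{\substack{X_1, \ldots, X_N:\ \cup_j X_j = X \\ \{X_j\}\ \mathrm{overlap\ conn.}}} 2^{2|X_1|}\, \|(e^{-F} - 1)(X_1)\|_{G,1} \cdot \prod_{j=2}^{N} 2^{2|X_j|}\, \|(e^{-F} - 1)(X_j)\|_{\infty,1}. \tag{2.42} \]
Since the $\{X_j\}$ are overlap connected, $\Gamma_p(X) \le \prod_{j=1}^{N} \Gamma_p(X_j)$, and hence
\[ \|(e^{-F} - 1)^+_{\ge k}\|_{G,1,\Gamma_p} \le \sum_{N \ge k} \frac{1}{N!} \sup_{\Delta} \sum_{X:\ X \cap \Delta \ne \emptyset}\ \sum_{\substack{X_1, \ldots, X_N:\ \cup_j X_j = X \\ \{X_j\}\ \mathrm{overlap\ conn.}}} \|(e^{-F} - 1)(X_1)\|_{G,1}\, \Gamma_{p+2}(X_1) \cdot \prod_{j=2}^{N} \|(e^{-F} - 1)(X_j)\|_{\infty,1}\, \Gamma_{p+2}(X_j). \tag{2.43} \]
We now estimate the r.h.s. of (2.43) by the spanning tree argument in the proof of Lemma 5.1 of [BY] so that it is majorised by
\[ O(1)^k\, \|(e^{-F} - 1)\|_{G,1,\Gamma_{p+3}}\, \|(e^{-F} - 1)\|^{k-1}_{\infty,1,\Gamma_{p+3}} \cdot \sum_{N \ge k} O(1)^{N-k}\, \|(e^{-F} - 1)\|^{N-k}_{\infty,1,\Gamma_{p+3}}. \tag{2.44} \]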


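Now use Lemma 2.5.4 and the remark following it:
\[ \|(e^{-F} - 1)^+_{\ge k}\|_{G,1,\Gamma_p} \le O(1)^k\, \|F\|_{G,1,\Gamma_{p+3}}\, \|F\|^{k-1}_{\infty,1,\Gamma_{p+3}} \cdot \sum_{N \ge k} O(1)^{N-k}\, \|F\|^{N-k}_{\infty,1,\Gamma_{p+3}}. \tag{2.45} \]
By assumption (2.33) the series converges to $O(1)$ and thus (2.38) is proved. The proof of (2.39) is the same, except that we do not need the estimate $G(X_1) \le G(X) \prod_{j=1}^{N} 2^{2|X_j|}$, so that on the r.h.s. of (2.39) we have the norm with $\Gamma_{p+1}$ instead of $\Gamma_{p+3}$. $\square$

2.6. Lemmas on increments. Let $K$, $K'$ be two polymer activities satisfying the hypothesis of Lemma 2.5.1, namely
\[ \|K\|_{G,1,\Gamma} \le O(\varepsilon^q), \qquad \|K'\|_{G,1,\Gamma} \le O(\varepsilon^q) \tag{2.46} \]
for some $q \ge \frac{1}{10}$ and $\varepsilon \ge 0$ sufficiently small. Define the increments
\[ \Delta K = K' - K, \qquad \Delta(S_{\ge k} K) = S_{\ge k} K' - S_{\ge k} K. \tag{2.47} \]
Then we have

Lemma 2.6.1. For any integer $p \ge 0$,
\[ \|\Delta(S_{\ge k} K)^{\natural}\|_{G,1,\Gamma_p} \le O(1)^k\, L^{(\beta/2 + kD)}\, \varepsilon^{(k-1)q}\, \|\Delta K\|_{G,1,\Gamma} \tag{2.48} \]
with $O(1)$ depending on $p$.

Let now $F$, $F'$ be two polymer activities supported on small sets, such that for every integer $p \ge 0$,
\[ \|F\|_{G,1,\Gamma_p} \le O(\varepsilon^q), \qquad \|F'\|_{G,1,\Gamma_p} \le O(\varepsilon^q), \tag{2.49} \]
\[ \|F\|_{\infty,1,\Gamma_p} \le O(\varepsilon^q), \qquad \|F'\|_{\infty,1,\Gamma_p} \le O(\varepsilon^q), \tag{2.50} \]
for some $q \ge \frac{1}{10}$ and $\varepsilon \ge 0$ sufficiently small. Define increments as before. We then have

Lemma 2.6.2.
\[ \|\Delta(e^{-F} - 1)\|_{G,1,\Gamma_p} \le O(1)\, \|\Delta F\|_{G,1,\Gamma_p}, \tag{2.51} \]
\[ \|\Delta(e^{-F} - 1 + F)\|_{G,1,\Gamma_p} \le O(1)\, \varepsilon^q\, \|\Delta F\|_{G,1,\Gamma_p}. \tag{2.52} \]
Remark. Lemma 2.6.2 remains true if the $G$-norm is replaced by the $L^{\infty}$ norm, by the same proof.

Lemma 2.6.3. Given two activities satisfying (2.49), (2.50), for any integers $k \ge 1$ and $p \ge 0$, and with $O(1)$ dependent on $p$,
\[ \|\Delta(e^{-F} - 1)^+_{\ge k}\|_{G,1,\Gamma_p} \le O(1)^k\, \varepsilon^{(k-1)q}\, \|\Delta F\|_{G,1,\Gamma_{p+3}}, \tag{2.53} \]
\[ \|\Delta(e^{-F} - 1)^+_{\ge k}\|_{\infty,1,\Gamma_p} \le O(1)^k\, \varepsilon^{(k-1)q}\, \|\Delta F\|_{\infty,1,\Gamma_{p+1}}. \tag{2.54} \]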

Renormalization Group Approach to Interacting Polymerised Manifolds


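We shall only prove Lemma 2.6.1, the proofs of Lemmas 2.6.2, 2.6.3 being similar.

Proof of Lemma 2.6.1.

$$\Delta(S_{\ge k} K)^\natural = (S_{\ge k}(K + \Delta K))^\natural - (S_{\ge k} K)^\natural = \int_0^1 dt\, \frac{\partial}{\partial t} (S_{\ge k}(K + t\Delta K))^\natural = \int_0^1 dt \left( \frac{\partial}{\partial t} S_{\ge k}(K + t\Delta K) \right)^{\natural}. \tag{2.55}$$

From the proof of Lemma 2.5.1 we have that $K \longrightarrow S_{\ge k}(K)$ is an analytic map between the Banach spaces with norms $\|\cdot\|_{G,1,\Gamma}$ and $\|\cdot\|_{G,1,\Gamma_p}$ respectively. Define $K(t) = K + t\Delta K$. Then $S_{\ge k}(K(t))$ is analytic in $t$. By the Cauchy integral formula

$$\left( \frac{\partial}{\partial t} S_{\ge k}(K(t)) \right)^{\natural} = \frac{1}{2\pi i} \oint_\gamma dz\, \frac{(S_{\ge k}(K(z)))^\natural}{(z - t)^2}, \tag{2.56}$$

where we choose the closed contour $\gamma$ in $\mathbb{C}$ as follows: $\gamma:\ z - t = R e^{i\theta}$, $0 \le \theta \le 2\pi$, $R = \dfrac{\varepsilon^q}{\|\Delta K\|_{G,1,\Gamma}}$. With this choice of $\gamma$, and $0 \le t \le 1$, for $z \in \gamma$,

$$K(z) = K + \left( t + \frac{\varepsilon^q}{\|\Delta K\|_{G,1,\Gamma}}\, e^{i\theta} \right) \Delta K. \tag{2.57}$$

Clearly $K(z)$ satisfies the hypothesis of Lemma 2.5.1 under the hypothesis (2.47), $\|K(z)\|_{G,1,\Gamma} \le O(\varepsilon^q)$. Hence from (2.56) we have the Cauchy estimate

$$\left\| \left( \frac{\partial}{\partial t} S_{\ge k}(K(t)) \right)^{\natural} \right\|_{G,1,\Gamma_p} \le \|\Delta K\|_{G,1,\Gamma}\, \varepsilon^{-q} \sup_{z \in \gamma} \left\| (S_{\ge k}(K(z)))^\natural \right\|_{G,1,\Gamma_p} \le O(1)^k\, L^{\beta/2 + kD}\, \varepsilon^{(k-1)q}\, \|\Delta K\|_{G,1,\Gamma}, \tag{2.58}$$

where to proceed to the last line we used Lemma 2.5.1 and $\|K(z)\|_{G,1,\Gamma} \le O(\varepsilon^q)$. We use the estimate (2.58) in (2.55) to finish the proof. $\Box$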
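The step from the Cauchy formula (2.56) to the estimate (2.58) is the standard Cauchy bound. The following sketch spells it out for a generic Banach-space norm $\|\cdot\|$ and a generic analytic function $f$, which here stand in for $\|\cdot\|_{G,1,\Gamma_p}$ and $(S_{\ge k}(K(z)))^\natural$; the precise form of the bound supplied by Lemma 2.5.1 is an assumption read off from the last line of (2.58). On $\gamma$ one has $|z - t| = R$ and the contour has length $2\pi R$, so

$$\left\| \frac{1}{2\pi i} \oint_\gamma dz\, \frac{f(z)}{(z-t)^2} \right\| \le \frac{1}{2\pi} \cdot 2\pi R \cdot \frac{1}{R^2}\, \sup_{z \in \gamma} \|f(z)\| = \frac{1}{R}\, \sup_{z \in \gamma} \|f(z)\|.$$

With $R = \varepsilon^q / \|\Delta K\|_{G,1,\Gamma}$ the prefactor $1/R$ equals $\|\Delta K\|_{G,1,\Gamma}\, \varepsilon^{-q}$. If the supremum is bounded by $O(1)^k L^{\beta/2 + kD} \varepsilon^{kq}$, which is what a $k$-th order bound in $\|K(z)\|_{G,1,\Gamma} \le O(\varepsilon^q)$ would give, then

$$\|\Delta K\|_{G,1,\Gamma}\, \varepsilon^{-q} \cdot O(1)^k L^{\beta/2 + kD} \varepsilon^{kq} = O(1)^k L^{\beta/2 + kD}\, \varepsilon^{(k-1)q}\, \|\Delta K\|_{G,1,\Gamma},$$

in agreement with the right-hand side of (2.58).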

3. Estimates for a Generic RG Step. Remainder Estimates

In this section we present the detailed structure of the initial form of the partition functional and the form of the activities produced by a RG step.

3.1. Some manipulation on the starting partition functional. The starting partition functional $z_0(\Lambda, \phi)$ has the initial expression

$$z_0(\Lambda, \phi) = e^{-V_0}\, \mathrm{Exp}(\Box), \tag{3.1}$$

where the initial activity $V_0$ is

$$V_0(X) = g_0 V_*(X) = g_0 \int_X dx\, v_*(\phi(x)) = g_0 \int_X dx \left( \frac{\lambda_*}{2\pi} \right)^{d/2} e^{-\frac{\lambda_*}{2} |\phi(x)|^2} \tag{3.2}$$

with $\lambda_* = \dfrac{\beta}{2u(0)}$. However it is convenient to write (3.1) in a form suitable for iteration as $e^{-V_0}\, \mathrm{Exp}(\Box + K_0)$, where initially $K_0 \equiv 0$. We want to show that the structure above is reproduced also after the generic step of RG. We start therefore from an expression of the form

$$e^{-V(\phi)}\, \mathrm{Exp}(\Box + K), \tag{3.3}$$

where in (3.3) the activity $V$ is $V = gV_*$, with $g = O(\varepsilon)$, and

$$0 < \varepsilon < L^{-10(D+2)} \tag{3.4}$$

with $L$ sufficiently large. The bounds

$$\|V(\Delta)\|_{G,1} = O(\varepsilon), \qquad \|V(\Delta)\|_{\infty,1} = O(\varepsilon) \tag{3.5}$$

obviously hold. We assume for the connected activity $K$ the following structure:

$$K = I + r, \tag{3.6}$$

where $I$ is an activity exactly computed in second order perturbation theory, supported only on small sets and satisfying the following bounds:

$$\|I\|_{G,1,\Gamma_6} \le \varepsilon^{7/4}, \qquad \|(S_1 I)^\natural\|_{G,1,\Gamma} \le L^{-\beta/4} \varepsilon^{7/4}. \tag{3.7}$$

The structure of $I$ and the above bounds will be given in Sect. 4, devoted to second order perturbation theory. $r$ satisfies the following inductive bound:

$$\|r\|_{G,1,\Gamma_6} \le \varepsilon^{5/2+\eta}, \qquad 0 < \eta \le 1/20. \tag{3.8}$$

As outlined above (see (1.37)) it is convenient first of all to rewrite the partition functional

$$e^{-gV_*}\, \mathrm{Exp}(\Box + K) = \mathrm{Exp}(\Box + \hat K). \tag{3.9}$$

A suitable explicit expression of the connected activity $\hat K$ is provided by the following lemma.

Lemma 3.1.1.

$$\hat K = P^+ + K + P^+ \vee K, \tag{3.10}$$

where

$$P(X) = \begin{cases} e^{-gV_*(\Delta)} - 1 & \text{for } X = \Delta, \\ 0 & \text{otherwise} \end{cases} \tag{3.11}$$

and, following (1.34),

$$P^+(X) = \sum_{N \ge 1} \frac{1}{N!} \sum_{\substack{\Delta_1 ... \Delta_N:\ \text{conn.} \\ \cup_j \Delta_j = X}}\ \prod_{j=1}^N P(\Delta_j). \tag{3.12}$$

Proof. It is obtained by simply writing $e^{-gV_*} = \prod_{\Delta \in X} \big( (e^{-gV_*(\Delta)} - 1) + 1 \big) = \mathrm{Exp}(\Box + P^+)(X)$ and then applying Lemma 1.4.1. $\Box$
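The first equality in the proof is the elementary expansion $\prod_{\Delta}(1 + P(\Delta)) = \sum_{S} \prod_{\Delta \in S} P(\Delta)$, where $S$ runs over all subsets of the blocks of $X$. A minimal Python sketch of that identity follows; the numbers in `p_values` are hypothetical stand-ins for $P(\Delta) = e^{-gV_*(\Delta)} - 1$, and the sketch deliberately ignores the connectedness constraint in (3.12) and the polymer $\mathrm{Exp}$ operation itself.

```python
from itertools import combinations
from math import prod, isclose


def expand_product(p):
    """Sum over all subsets S of block indices of prod_{i in S} p[i].

    The empty subset contributes 1 (math.prod of an empty iterable is 1).
    """
    n = len(p)
    total = 0.0
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total += prod(p[i] for i in subset)
    return total


def direct_product(p):
    """Left-hand side: prod_i (1 + p[i])."""
    return prod(1.0 + x for x in p)


# Hypothetical values of P(Delta) on four blocks (small and negative,
# as e^{-gV} - 1 would be for small positive gV).
p_values = [-0.03, -0.05, -0.02, -0.04]
lhs = direct_product(p_values)
rhs = expand_product(p_values)
print(lhs, rhs)
```

Grouping the subsets $S$ into overlap-connected families is what turns this flat sum into the $\mathrm{Exp}$ form used in the lemma; that regrouping is not attempted here.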


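3.2. Extraction of the second order activities. It is convenient now to separate out the second order contributions from $\hat K$. We write

$$\hat K = \hat Q + \hat r \tag{3.13}$$

with

$$\hat Q(X) = \begin{cases} -gV_*(\Delta) + \dfrac{g^2}{2} (V_*(\Delta))^2 + I(\Delta) & \text{for } |X| = 1,\ X \equiv \Delta, \\[2mm] \dfrac{g^2}{2} \displaystyle\sum_{\substack{\Delta_1, \Delta_2 \\ \Delta_1 \cup \Delta_2 = X}} V_*(\Delta_1) V_*(\Delta_2) + I(X) & \text{for } |X| = 2,\ X \text{ connected}, \\[2mm] 0 & \text{otherwise} \end{cases} \tag{3.14}$$

and

$$\hat r = \Pi + r + P^+ \vee K, \tag{3.15}$$

where $\Pi$ is given by

$$\Pi(X) = \begin{cases} -\dfrac{g^3}{2} (V_*(\Delta))^3 \displaystyle\int_0^1 ds\, (1-s)^2 \exp(-sgV_*(\Delta)) & \text{for } |X| = 1,\ X \equiv \Delta, \\[2mm] \dfrac{1}{2} \displaystyle\sum_{\substack{\Delta_1, \Delta_2 \\ \Delta_1 \cup \Delta_2 = X}} \sigma(\Delta_1, \Delta_2) & \text{for } |X| = 2,\ X \text{ connected}, \\[2mm] P^+_{\ge 3}(X) & \text{otherwise} \end{cases} \tag{3.16}$$

with

$$\sigma(\Delta_1, \Delta_2) = -g^3 V_*(\Delta_1) (V_*(\Delta_2))^2 \int_0^1 ds\, (1-s) \exp(-sgV_*(\Delta_2)) + (\Delta_1 \leftrightarrow \Delta_2) + g^4 \prod_{j=1}^2 (V_*(\Delta_j))^2 \int_0^1 ds\, (1-s) \exp(-sgV_*(\Delta_j)), \tag{3.17}$$

$$P^+_{\ge 3}(X) = \sum_{N \ge 3} \frac{1}{N!} \sum_{\substack{\Delta_1 ... \Delta_N:\ \text{conn.} \\ \cup_j \Delta_j = X}}\ \prod_{j=1}^N P(\Delta_j). \tag{3.18}$$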
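The single-block entry of (3.16) is the integral form of the third-order Taylor remainder of $e^{-x}$ at $x = gV_*(\Delta)$, namely $e^{-x} - 1 + x - \tfrac{x^2}{2} = -\tfrac{x^3}{2} \int_0^1 ds\, (1-s)^2 e^{-sx}$. The following Python sketch checks this scalar identity numerically; treating $x$ as a plain real number (rather than a functional of $\phi$) and the midpoint quadrature are simplifying assumptions made only for illustration.

```python
import math


def remainder_exact(x):
    """e^{-x} minus its Taylor polynomial of degree 2 at 0."""
    return math.exp(-x) - 1.0 + x - 0.5 * x * x


def remainder_integral(x, steps=20000):
    """-(x^3 / 2) * int_0^1 (1 - s)^2 exp(-s x) ds, by the midpoint rule."""
    h = 1.0 / steps
    acc = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        acc += (1.0 - s) ** 2 * math.exp(-s * x)
    return -0.5 * x ** 3 * acc * h


for x in (0.01, 0.1, 0.5):
    print(x, remainder_exact(x), remainder_integral(x))
```

Since the remainder is of order $x^3$, a block activity with $gV_*(\Delta) = O(\varepsilon)$ contributes $O(\varepsilon^3)$, which is the size quoted for $\Pi$ on single blocks in the next subsection.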

3.3. Bound on the remainder $\hat r$. We prove now the following result.

Lemma 3.3.1.

$$\|\hat r\|_{G,1,\Gamma} \le O(1)\, \varepsilon^{5/2+\eta}. \tag{3.19}$$

Proof. Expanding the exponential and taking derivatives we get, using the Banach algebra property stated after (2.6) and the smallness of $\varepsilon$, $\|P(\Delta)\|_{\infty,1} \le O(1) \|V(\Delta)\|_{\infty,1} \le O(1)\varepsilon$ and $\|P(\Delta)\|_{G,1} \le O(1) \|V(\Delta)\|_{G,1} \le O(1)\varepsilon$, so that

$$\|\Pi(X)\|_{G,1} \le O(1)\, \varepsilon^3 \quad \text{for } |X| \le 2. \tag{3.20}$$

To give a bound on $P^+$ and on $P^+_{\ge 3}$ we apply Lemmas 2.5.4 and 2.5.5 and we obtain $\|P^+\|_{\infty,1,\Gamma} \le O(1)\varepsilon^{9/10}$ and $\|P^+_{\ge 3}\|_{G,1,\Gamma} \le O(1)\varepsilon^{27/10}$. Therefore we have, using (3.20), and again the smallness of $\varepsilon$ (3.4),

$$\|\Pi\|_{G,1,\Gamma} \le O(1)\, \varepsilon^{27/10} \le \frac{1}{8} L^{-1} \varepsilon^{5/2+\eta}. \tag{3.21}$$

Finally we consider $P^+ \vee K$. By definition of $\vee$ and using the [BY] spanning tree argument we obtain

$$\|P^+ \vee K\|_{G,1,\Gamma} \le \sum_{N,M \ge 1} (O(1))^{N+M}\, \|P^+\|^N_{\infty,1,\Gamma_3}\, \|K\|^M_{G,1,\Gamma_3} \le \varepsilon^{5/2+\eta}, \tag{3.22}$$

where in the last inequality we used the fact that $\|K\|_{G,1,\Gamma_3} \le \varepsilon^{7/4}$, as can be seen from (3.6), (3.7) and (3.8), and we estimated the series using again the smallness of $\varepsilon$. Putting together (3.8), (3.22), and (3.21) we obtain (3.19). $\Box$

Finally it is useful to note

Lemma 3.3.2.

$$\|\hat K\|_{G,1,\Gamma} \le O(1)\, \varepsilon^{9/10}, \tag{3.23}$$

$$\|\hat K + gV_*\|_{G,1,\Gamma} \le O(1)\, \varepsilon^{7/4}. \tag{3.24}$$

The proof is simply obtained by the definition of $\hat K$ (3.13), by the estimate (3.5) with the smallness of $\varepsilon$, by (3.7) and by Lemma 3.3.1 above.

3.4. The action of RG. Reblocking–rescaling. The connected activity $\hat K$ is given by (3.13), (3.14), (3.15). It is convenient to separate out the second order term $I$ from $\hat Q$. So we define $\hat Q^{(0)}(X)$, supported on connected sets $|X| \le 2$, by:

$$\hat Q^{(0)}(X) = \begin{cases} -gV_*(\Delta) + \dfrac{g^2}{2} (V_*(\Delta))^2 & \text{for } |X| = 1,\ X \equiv \Delta, \\[2mm] \dfrac{g^2}{2} \displaystyle\sum_{\substack{\Delta_1, \Delta_2 \\ \Delta_1 \cup \Delta_2 = X}} V_*(\Delta_1) V_*(\Delta_2) & \text{for } |X| = 2,\ X \text{ connected}, \\[2mm] 0 & \text{otherwise.} \end{cases} \tag{3.25}$$

Now it is easy to see that we can express the reblocked-rescaled activity $S\hat K$ as:

$$S\hat K = S_1 \hat Q^{(0)} + g^2 S_2(V_*) + S_1 I + S_1 \hat r + \tilde r, \tag{3.26}$$

where

$$\tilde r = S_{\ge 3} \hat K + S_2(\hat K + gV_*) + S_2(gV_*, \hat K + gV_*). \tag{3.27}$$

Remarks.

1) In (3.26), (3.27) $V_*$ is supported on single blocks.

2) $S_1$ is the linearized reblocking-rescaling, $S_2$ is the quadratic part of $S$ and $S_{\ge 3}$ stands for $S - S_1 - S_2$. In the last term of (3.27), the quadratic reblocking sum has for each term one factor $gV_*$ and the other $\hat K + gV_*$.

3) Note that $\tilde r$ is formally of $O(\varepsilon^3)$. We shall estimate it after performing the fluctuation integration.


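Fluctuation integration. We have

$$\mu_{R\Gamma} * \mathrm{Exp}(\Box + S\hat K) = \mathrm{Exp}(\Box + (S\hat K)^\natural). \tag{3.28}$$

Then

$$(S\hat K)^\natural = -gL^\varepsilon V_* + \frac{g^2}{2} (R\tilde Q)^\natural + (S_1 I)^\natural + (S_1 \hat r)^\natural + \tilde r^\natural, \tag{3.29}$$

where $V_*$ above is again supported on single blocks and the connected activity $\tilde Q$ is supported on $L$-polymers $LZ$ such that $|Z| \le 2$ and has the following expression:

$$\tilde Q(L\Delta) = (V_*(L\Delta))^2, \qquad \tilde Q(L\Delta \cup L\Delta') = V_*(L\Delta) V_*(L\Delta') + (\Delta \leftrightarrow \Delta'). \tag{3.30}$$

The first three terms of (3.29) are contributions up to second order in perturbation theory. They will be treated in more detail in the next section.

Preliminary extraction. It is convenient to extract the first order term and to rewrite our partition functional in the following way:

$$\mathrm{Exp}(\Box + (S\hat K)^\natural)(L^{-1}\Lambda) = e^{-L^\varepsilon g V_*(L^{-1}\Lambda)}\, \mathrm{Exp}(\Box + \tilde K)(L^{-1}\Lambda), \tag{3.31}$$

where the activity $\tilde K$ is

$$\tilde K = (S\hat K)^\natural + \tilde P^+ + \tilde P^+ \vee (S\hat K)^\natural, \tag{3.32}$$

where

$$\tilde P = \begin{cases} e^{L^\varepsilon g V_*(\Delta)} - 1 & \text{for } X = \Delta, \\ 0 & \text{otherwise.} \end{cases} \tag{3.33}$$

Equation (3.32) may be proved along the lines of the proof of Lemma 3.1.1. We now isolate the terms proportional to $g$ and $g^2$ in (3.32). Introduce the notation $\tilde P^+_{(\le 2)}$ (to be distinguished from $\tilde P^+_{\le 2}$) to represent the sum of contributions proportional to $g$ and $g^2$ in $\tilde P^+$. Clearly

$$\tilde P^+_{(\le 2)}(X) = \begin{cases} L^\varepsilon g V_*(\Delta) + \dfrac{1}{2} L^{2\varepsilon} g^2 (V_*(\Delta))^2 & \text{for } |X| = 1,\ X \equiv \Delta, \\[2mm] \dfrac{1}{2} L^{2\varepsilon} g^2 \displaystyle\sum_{\substack{\Delta_1, \Delta_2 \\ \Delta_1 \cup \Delta_2 = X}} V_*(\Delta_1) V_*(\Delta_2) & \text{for } |X| = 2,\ X \text{ connected}, \\[2mm] 0 & \text{otherwise.} \end{cases} \tag{3.34}$$

Note also

$$(\tilde P^+ \vee (S\hat K)^\natural)_{(\le 2)}(X) = \begin{cases} -L^{2\varepsilon} g^2 (V_*(\Delta))^2 & \text{for } |X| = 1,\ X \equiv \Delta, \\[2mm] -L^{2\varepsilon} g^2 \displaystyle\sum_{\substack{\Delta_1, \Delta_2 \\ \Delta_1 \cup \Delta_2 = X}} V_*(\Delta_1) V_*(\Delta_2) & \text{for } |X| = 2,\ X \text{ connected}, \\[2mm] 0 & \text{otherwise.} \end{cases} \tag{3.35}$$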


Adding (3.34) and (3.35) we get

$$(\tilde P^+ + \tilde P^+ \vee (S\hat K)^\natural)_{(\le 2)} = L^\varepsilon g V_* - \frac{1}{2} L^{2\varepsilon} g^2 \tilde Q, \tag{3.36}$$

with $V_*$ supported on single blocks and $\tilde Q$ defined in (3.30). Hence returning to (3.32) we obtain

$$\tilde K = \frac{g^2}{2} \int_0^1 ds\, \frac{\partial}{\partial s} (R\tilde Q)^\natural_s + (S_1 I)^\natural + (S_1 \hat r)^\natural + \tilde r^\natural + \bar r, \tag{3.37}$$

where the new remainder $\bar r$ is

$$\bar r = (\tilde P^+ - \tilde P^+_{(\le 2)}) + \big( \tilde P^+ \vee (S\hat K)^\natural - (\tilde P^+ \vee (S\hat K)^\natural)_{\le 2} \big). \tag{3.38}$$

Remarks.

1) $\bar r$ is formally $O(\varepsilon^3)$ and together with $\tilde r^\natural$ needs no further extraction. Their norms will be estimated in the following Subsect. 3.5.

2) $\tilde K$ needs further extractions, namely from the first and third terms in (3.37). The first extracted term, denoted $F_{\tilde Q}$, is the perturbative relevant part of the first term of (3.37). It will be computed explicitly in the following Sect. 4, and its irrelevant part will be $I$. Note that in Sect. 4 it will also be proved that $(S_1 I)^\natural$ needs no extraction, since its norm, by exact computations, goes down by a contracting factor. The second extracted term, denoted $F_{\hat r}$, is the relevant part of $(S_1 \hat r)^\natural$. Although $\hat r$ is $O(\varepsilon^{5/2+\eta})$, an extraction has to be performed because the linear reblocking for small sets produces a factor $L^D$, and therefore a contractive factor has to be obtained. This extraction, and the control of the obtained remainder, will be the subject of Sect. 5.

3.5. Bounds on irrelevant remainders. We prove now the following results.

Lemma 3.5.1.

$$\|\bar r\|_{G,1,\Gamma_p} \le L^{-\beta/2} \varepsilon^{5/2+\eta}. \tag{3.39}$$

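Proof. The first addend of $\bar r$, namely the term $(\tilde P^+ - \tilde P^+_{(\le 2)})$, has the same form as $\Pi$ in (3.16) with $V_*$ substituted by $-L^\varepsilon V_*$. Therefore, since $L^\varepsilon = O(1)$, one can obtain the bound

$$\|\tilde P^+ - \tilde P^+_{(\le 2)}\|_{G,1,\Gamma_p} \le L^{-1} \varepsilon^{5/2+\eta} \tag{3.40}$$

along the same lines as for $\Pi$. To control the term $\tilde P^+ \vee (S\hat K)^\natural - (\tilde P^+ \vee (S\hat K)^\natural)_{\le 2}$ it is enough to estimate $\tilde P^+_{\ge 2} \vee (S\hat K)^\natural$ and $\tilde P \vee (S(\hat K + gV_*))^\natural$. We have

$$\|\tilde P^+_{\ge 2} \vee (S\hat K)^\natural\|_{G,1,\Gamma_p} \le \sum_{N,M \ge 1} (O(1))^{N+M}\, \|\tilde P^+_{\ge 2}\|^N_{\infty,G,1,\Gamma_{p+3}}\, \|(S\hat K)^\natural\|^M_{G,1,\Gamma_{p+3}}. \tag{3.41}$$

By Lemma 2.5.1, and using the fact that the smallness condition (3.4) on $\varepsilon$ with $D = 1$ gives $L \le \varepsilon^{-1/30}$, we have

$$\|(S\hat K)^\natural\|^M_{G,1,\Gamma_{p+3}} \le O(1)\, L^{\beta/2} L^D\, \|\hat K\|^M_{G,1,\Gamma} \le L^2 \varepsilon^{9/10} \le \varepsilon^{9/10 - 2/30} \le \varepsilon^{8/10}. \tag{3.42}$$

It is easy to see, as in the proof of Lemma 2.5.5, that $\|\tilde P^+_{\ge 2}\|^N_{G,1,\Gamma_{p+3}} \le \varepsilon^{18/10}$. This gives

$$\|\tilde P^+_{\ge 2} \vee (S\hat K)^\natural\|_{G,1,\Gamma_p} \le \varepsilon^{26/10} \le L^{-1} \varepsilon^{5/2+\eta}. \tag{3.43}$$

We obtain in the same way

$$\|\tilde P \vee (S(\hat K + gV_*))^\natural\|_{G,1,\Gamma_p} \le O(1)\, \varepsilon^{9/10} \varepsilon^{7/4} L^{D + \beta/2} \le L^2 \varepsilon^{53/20} \le L^{-1} \varepsilon^{5/2+\eta}. \tag{3.44}$$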
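The chains of inequalities (3.42) to (3.44) all trade powers of $L$ for powers of $\varepsilon$ through $L \le \varepsilon^{-1/30}$, which follows from (3.4) with $D = 1$. The Python sketch below redoes that exponent bookkeeping with exact fractions; the helper name `eps_exponent` is hypothetical, the $O(1)$ constants are ignored, and $\eta$ is set to its largest allowed value $1/20$, which is the worst case because $0 < \varepsilon < 1$.

```python
from fractions import Fraction

# From (3.4) with D = 1: eps < L**(-30), hence L <= eps**(-1/30).
L_POWER = Fraction(1, 30)


def eps_exponent(l_power, eps_power):
    """Return e with L**l_power * eps**eps_power <= eps**e.

    Valid for l_power >= 0 and 0 < eps < 1, using L <= eps**(-1/30).
    """
    return Fraction(eps_power) - Fraction(l_power) * L_POWER


ETA_MAX = Fraction(1, 20)
TARGET = Fraction(5, 2) + ETA_MAX  # exponent 5/2 + eta at eta = 1/20

checks = {
    # (3.42): L^2 eps^(9/10) <= eps^(8/10)
    "(3.42)": eps_exponent(2, Fraction(9, 10)) >= Fraction(8, 10),
    # (3.43): eps^(26/10) <= L^(-1) eps^(5/2+eta), i.e. L^1 eps^(26/10) <= eps^(5/2+eta)
    "(3.43)": eps_exponent(1, Fraction(26, 10)) >= TARGET,
    # (3.44): L^2 eps^(53/20) <= L^(-1) eps^(5/2+eta), i.e. L^3 eps^(53/20) <= eps^(5/2+eta)
    "(3.44)": eps_exponent(3, Fraction(53, 20)) >= TARGET,
}
print(checks)
```

The last check holds with equality at $\eta = 1/20$, which suggests why the inductive bound (3.8) restricts $\eta$ to $0 < \eta \le 1/20$.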

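From (3.40), (3.43) and (3.44) we obtain the lemma. $\Box$

Lemma 3.5.2.

$$\|\tilde r^\natural\|_{G,1,\Gamma_p} \le L^{-\beta/2} \varepsilon^{5/2+\eta}. \tag{3.45}$$

Proof. $\tilde r$ is given in (3.27). There are three terms. By Lemma 2.5.1 the contribution of the first is bounded by

$$\|(S_{\ge 3} \hat K)^\natural\|_{G,1,\Gamma_p} \le O(1)\, L^{\beta/2 + 3D}\, \|\hat K\|^3_{G,1,\Gamma} \le \frac{L^{4 - \beta/2}}{3}\, \varepsilon^{27/10} \le \frac{1}{3} L^{-\beta/2} \varepsilon^{5/2+\eta}. \tag{3.46}$$

From the second, by Lemma 3.3.2, we get

$$\|(S_2(\hat K + gV_*))^\natural\|_{G,1,\Gamma_p} \le O(1)\, L^{\beta/2 + 2D}\, \|\hat K + gV_*\|^2_{G,1,\Gamma} \le \frac{L^{3 - \beta/2}}{3}\, \varepsilon^{14/4} \le \frac{1}{3} L^{-\beta/2} \varepsilon^{5/2+\eta}, \tag{3.47}$$

whilst from the third we obtain

$$\|(S_2(gV_*, \hat K + gV_*))^\natural\|_{G,1,\Gamma_p} \le O(1)\, L^{\beta/2 + 2D}\, \|\hat K + gV_*\|_{G,1,\Gamma}\, \|\hat K\|_{G,1,\Gamma} \le \frac{L^{3 - \beta/2}}{3}\, \varepsilon^{7/4} \varepsilon^{9/10} \le \frac{1}{3} L^{-\beta/2} \varepsilon^{5/2+\eta}. \tag{3.48}$$

Summing (3.46), (3.47) and (3.48) we obtain the proof. $\Box$

4. RG Step to Second Order. Relevant and Irrelevant Terms. Estimates

4.1. The starting second order activity. In this section we consider the contribution to the partition functional up to the second order

$$\mathrm{Exp}(\Box + \hat K_{\le 2}) \equiv \mathrm{Exp}(\Box + \hat Q), \tag{4.1}$$

where the second order activity $\hat Q$ is given by (3.14). It is convenient in the following computations to use the obvious representation:

$$V_*(\Delta) = \left( \frac{\lambda_*}{2\pi} \right)^{d/2} \int_\Delta d^D x \int \frac{d^d k}{(2\pi\lambda_*)^{d/2}}\, e^{-\frac{|k|^2}{2\lambda_*}}\, e^{ik \cdot \phi(x)}. \tag{4.2}$$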
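The representation (4.2) is the Gaussian Fourier identity $\int \frac{d^d k}{(2\pi\lambda_*)^{d/2}}\, e^{-|k|^2/(2\lambda_*)}\, e^{ik\cdot\phi} = e^{-\frac{\lambda_*}{2}|\phi|^2}$, which reproduces the integrand $v_*$ of (3.2). A minimal numerical check in Python follows; it is restricted to $d = 1$, and the values of `lam` and `phi`, the cutoff `k_max` and the midpoint quadrature are illustrative assumptions rather than anything fixed by the text.

```python
import math


def gaussian_fourier(phi, lam, k_max=40.0, steps=40000):
    """Midpoint-rule value of int dk (2 pi lam)^(-1/2) exp(-k^2/(2 lam)) exp(i k phi).

    The imaginary part vanishes by symmetry, so only cos(k * phi) is integrated.
    """
    h = 2.0 * k_max / steps
    acc = 0.0
    for i in range(steps):
        k = -k_max + (i + 0.5) * h
        acc += math.exp(-k * k / (2.0 * lam)) * math.cos(k * phi)
    return acc * h / math.sqrt(2.0 * math.pi * lam)


lam, phi = 2.0, 0.7  # hypothetical values of lambda_* and of the field
numeric = gaussian_fourier(phi, lam)
closed_form = math.exp(-0.5 * lam * phi * phi)
print(numeric, closed_form)
```

In $d$ dimensions the integral factorizes into $d$ copies of this one-dimensional computation, one per component of $k$.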


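The irrelevant second order activity $I(X) = I_k(X)$ depends actually on the number $k$ of iterations of RG so far performed. We assume here inductively that $I_0(X) = 0$ and for $k \ge 1$,

$$I_k(X) = \sum_{l=1}^k g^2_{k-l}\, \bar I_l(X). \tag{4.3}$$

In the above $g = g_k$ and we assume inductively that $g_j = O(\varepsilon)$ for all $0 \le j \le k$. We will see in a moment that the action of RG will give us an irrelevant second order activity of the form $I_{k+1}(X)$. The activities $\bar I_l(X)$ are supported on the polymers $X$ such that $|X| \le 2$, and are defined by the following expressions:

$$\bar I_l(\Delta) = L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_\Delta d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 dt\, \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2\lambda_*} (k, I_l(s,t) k) + i\big( k_1 \cdot \phi(x_1) + t k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right], \tag{4.4}$$

$$\bar I_l(\Delta \cup \Delta') = L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int_{\Delta'} d^D x_2 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 dt\, \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2\lambda_*} (k, I_l(s,t) k) + i\big( k_1 \cdot \phi(x_1) + t k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right] + (\Delta \leftrightarrow \Delta'), \tag{4.5}$$

where $k \equiv (k_1, k_2)$ and

$$I_l(s,t) = \begin{pmatrix} 1 & -t C_l(s) \\ -t C_l(s) & 2\big( C_l(s) + (1 - t^2) D_1\bar\Gamma(x_2 - x_1) \big) \end{pmatrix} \tag{4.6}$$

with

$$C_l(s) = \sum_{p=0}^{l-2} \frac{\bar\Gamma(0) - \bar\Gamma(L^p (x_2 - x_1))}{L^{p\beta}} + \frac{1 - s\bar\Gamma(L^{(l-1)} (x_2 - x_1))}{L^{\beta(l-1)}},$$

where the sums over $p$ are void if $l = 1$, and with

$$D_1\bar\Gamma(x_2 - x_1) = \sum_{j=1}^\infty L^{j\beta} \left[ \bar\Gamma(0) - \bar\Gamma((x_2 - x_1)/L^j) \right]. \tag{4.7}$$

Note that $D_1\bar\Gamma(x) \ge 0$ and that the series converges by virtue of the estimate

$$0 \le L^{j\beta} \big( \bar\Gamma(0) - \bar\Gamma(x/L^j) \big) \le O(1)\, L^{-(j-1)(2-\beta)} |x|^2.$$

We will compute explicitly in the rest of this section the evolution of these terms under RG transformation.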


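4.2. Reblocking. First we consider the reblocking of $\hat Q$ up to the second order: it is easy to see that

$$(B\hat Q)_{(\le 2)}(LZ) = \sum_{X:\ \bar X = LZ} \hat Q(X) + \frac{1}{2} \sum_{\substack{X_1, X_2\ \text{disj} \\ \bar X_1, \bar X_2\ \text{overlap conn.} \\ \bar X_1 \cup \bar X_2 = LZ}} \hat Q(X_1) \hat Q(X_2) = \sum_{X:\ \bar X = LZ} \hat Q(X) + \frac{1}{2} \sum_{\substack{\Delta_1, \Delta_2\ \text{disj} \\ \bar\Delta_1, \bar\Delta_2\ \text{overlap conn.} \\ \bar\Delta_1 \cup \bar\Delta_2 = LZ}} V(\Delta_1) V(\Delta_2). \tag{4.8}$$

This gives, in the case $|Z| = 1$, i.e. $Z \equiv \Delta$,

$$(B\hat Q)_{(\le 2)}(L\Delta) = -V(L\Delta) + \frac{V(L\Delta)^2}{2} + \sum_{X:\ \bar X = L\Delta} I_k(X), \tag{4.9}$$

while, for $|Z| = 2$, i.e. $Z \equiv (\Delta \cup \Delta')$,

$$(B\hat Q)_{(\le 2)}(L(\Delta \cup \Delta')) = + \frac{V(L\Delta) V(L\Delta') + (\Delta \leftrightarrow \Delta')}{2} + I_k(\Delta_1 \cup \Delta_2), \tag{4.10}$$

where in (2.7), since $D = 1$, $\Delta_1 \subset L\Delta$ and $\Delta_2 \subset L\Delta'$ are uniquely defined by the fact that they have to be overlap connected, and therefore the relation $\Delta_1 \cap \Delta_2 = L\Delta \cap L\Delta'$ has to be fulfilled. No symmetrization is necessary in $\Delta_1$, $\Delta_2$ because $I_k(\Delta_1 \cup \Delta_2)$ is already symmetrized. Note that, exploiting the compactness of the propagator $\bar\Gamma$, the reblocking for the irrelevant activities $I_k$ can be written in the following form:

$$\sum_{X:\ \bar X = L\Delta} I_k(X) = B I_k(L\Delta) = \sum_{l=1}^k g^2_{k-l}\, B\bar I_l(L\Delta), \tag{4.11}$$

$$I_k(\Delta_1 \cup \Delta_2) = B I_k(L(\Delta \cup \Delta')) = \sum_{l=1}^k g^2_{k-l}\, B\bar I_l(L(\Delta \cup \Delta')), \tag{4.12}$$

with

$$B\bar I_l(L\Delta) = L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_{L\Delta} d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_{L\Delta} d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 dt\, \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2\lambda_*} (k, I_l(s,t) k) + i\big( k_1 \cdot \phi(x_1) + t k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right], \tag{4.13}$$

$$B\bar I_l(L(\Delta \cup \Delta')) = L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_{L\Delta} d^D x_1 \int_{L\Delta'} d^D x_2 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 dt\, \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2\lambda_*} (k, I_l(s,t) k) + i\big( k_1 \cdot \phi(x_1) + t k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right] + (\Delta \leftrightarrow \Delta'). \tag{4.14}$$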


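4.3. Rescaling, integration and preliminary extraction. Now we rescale and we integrate the fluctuating field and we obtain immediately (1.46) up to the second order in $g$:

$$(S\hat Q)^\natural_{(\le 2)}(\Delta) = -gL^\varepsilon V_* + \frac{g^2}{2} (R\tilde Q)^\natural + (S_1 I_k)^\natural, \tag{4.15}$$

where $V_*$ is supported only on single blocks and $\tilde Q$ is defined by

$$\tilde Q(L\Delta) = (V_*(L\Delta))^2, \qquad \tilde Q(L\Delta \cup L\Delta') = V_*(L\Delta) V_*(L\Delta') + (\Delta \leftrightarrow \Delta'). \tag{4.16}$$

Following again Sect. 3 we obtain for the contribution up to the second order $\tilde K_{(2)}$ of the activity $\tilde K$ defined in (3.37) the following expression

$$\tilde K_{(2)} = \frac{g^2}{2} \int_0^1 ds\, \frac{\partial}{\partial s} (R\tilde Q)^\natural_s + (S_1 I)^\natural \equiv \tilde K_{\tilde Q} + (S_1 I_k)^\natural, \tag{4.17}$$

where, using (1.42) and the representation (4.2), we have

$$\tilde K_{\tilde Q}(\Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_\Delta d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \int d\mu_{sR\Gamma}(\zeta)\, \exp\left[ -\frac{|k_1|^2 + |k_2|^2}{2\lambda_*} + i\big( k_1 \cdot (\zeta(x_1) + \phi(x_1)) + k_2 \cdot (\zeta(x_2) + \phi(x_2)) \big) \right], \tag{4.18}$$

$$\tilde K_{\tilde Q}(\Delta \cup \Delta') = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_{\Delta'} d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \int d\mu_{sR\Gamma}(\zeta)\, \exp\left[ -\frac{|k_1|^2 + |k_2|^2}{2\lambda_*} + i\big( k_1 \cdot (\zeta(x_1) + \phi(x_1)) + k_2 \cdot (\zeta(x_2) + \phi(x_2)) \big) \right] + (\Delta \leftrightarrow \Delta'). \tag{4.19}$$

Equations (4.18), (4.19) are obtained by writing $\tilde Q$ in terms of the representation (4.2) and performing the change of variables $k \to L^{-\beta/2} k$, $x \to Lx$. The Gaussian integral with respect to the measure $\mu_{sR\Gamma}$ appearing in (4.18), (4.19) is easily done:

$$\tilde K_{\tilde Q}(\Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_\Delta d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}}\, \exp\left[ -\frac{1}{2\lambda_*} (k, \sigma_1 k) + i\big( k_1 \cdot \phi(x_1) + k_2 \cdot \phi(x_2) \big) \right], \tag{4.20}$$

$$\tilde K_{\tilde Q}(\Delta \cup \Delta') = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_{\Delta'} d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}}\, \exp\left[ -\frac{1}{2\lambda_*} (k, \sigma_1 k) + i\big( k_1 \cdot \phi(x_1) + k_2 \cdot \phi(x_2) \big) \right] + (\Delta \leftrightarrow \Delta'), \tag{4.21}$$

where the matrix $\sigma_1$ is given by

$$\sigma_1 = \begin{pmatrix} 1 & s\bar\Gamma(x_2 - x_1) \\ s\bar\Gamma(x_2 - x_1) & 1 \end{pmatrix}. \tag{4.22}$$

In order to extract the relevant part from (4.20), (4.21) it is useful to perform the following change of variables $k_1 \to k_1 - k_2$, $k_2 \to k_2$, obtaining

$$\tilde K_{\tilde Q}(\Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_\Delta d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}}\, \exp\left[ -\frac{1}{2\lambda_*} (k, T\sigma_1 k) + i\big( k_1 \cdot \phi(x_1) + k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right], \tag{4.23}$$

$$\tilde K_{\tilde Q}(\Delta \cup \Delta') = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_{\Delta'} d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}}\, \exp\left[ -\frac{1}{2\lambda_*} (k, T\sigma_1 k) + i\big( k_1 \cdot \phi(x_1) + k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right] + (\Delta \leftrightarrow \Delta'), \tag{4.24}$$

where the matrix $T\sigma_1$ is given by

$$T\sigma_1 = \begin{pmatrix} 1 & -\big( 1 - s\bar\Gamma(x_2 - x_1) \big) \\ -\big( 1 - s\bar\Gamma(x_2 - x_1) \big) & 2\big( 1 - s\bar\Gamma(x_2 - x_1) \big) \end{pmatrix}. \tag{4.25}$$

In order to give the explicit expression of $(S_1 I_k)^\natural$ and to prove the iterative form of $I_k$ (4.3) the following lemma is useful:

Lemma 4.3.1.

$$(S_1 \bar I_l)^\natural = \bar I_{l+1}. \tag{4.26}$$

Proof. Let us write the proof for the single block contribution only. The proof for the couple of adjacent blocks is identical. Integrating and rescaling (4.13) we obtain

$$(S_1 \bar I_l)^\natural(\Delta, \phi) = \big( (B_1 \bar I_l)^\sharp \big)(L\Delta, R\phi) = L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_{L\Delta} d^D x_1 \int_{L\Delta} d^D x_2 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \int_0^1 dt\, \frac{\partial}{\partial t}\, e^{-\frac{1}{2\lambda_*} (k, I_l(s,t) k)} \cdot e^{i\left( k_1 \cdot L^{\beta/2} \phi(x_1/L) + t k_2 \cdot L^{\beta/2} (\phi(x_2/L) - \phi(x_1/L)) \right)}\, e^{-\frac{1}{2} (k, \Gamma(t) k)}$$

with the matrix $\Gamma(t)$ given by

$$\Gamma(t) = \begin{pmatrix} \Gamma(0) & -t\big( \Gamma(0) - \Gamma(x_2 - x_1) \big) \\ -t\big( \Gamma(0) - \Gamma(x_2 - x_1) \big) & 2t^2 \big( \Gamma(0) - \Gamma(x_2 - x_1) \big) \end{pmatrix}.$$

Adding $\lambda_*^{-1} I_l(s,t)$ and $\Gamma(t)$, performing the change of variables $k \to L^{-\beta/2} k$, $x \to Lx$ and some elementary manipulations, we obtain the proof. $\Box$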


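4.4. Extraction. Now we define the relevant part $F_{\tilde Q}(Z, \phi) = \mathcal{L}\tilde K_{\tilde Q}(Z, \phi)$ in the following way:

$$\mathcal{L}\tilde K_{\tilde Q}(\Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_\Delta d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 ds\, \frac{\partial}{\partial s} \exp\left[ -\frac{1}{2\lambda_*} (k, R_1 k) + i\, k_1 \cdot \phi(x_1) \right], \tag{4.27}$$

$$\mathcal{L}\tilde K_{\tilde Q}(\Delta \cup \Delta', \phi) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_{\Delta'} d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 ds\, \frac{\partial}{\partial s} \exp\left[ -\frac{1}{2\lambda_*} (k, R_1 k) + i\, k_1 \cdot \phi(x_1) \right] + (\Delta \leftrightarrow \Delta'), \tag{4.28}$$

where the matrix $R_1$ is given by

$$R_1 = \begin{pmatrix} 1 & 0 \\ 0 & 2\big[ 1 - s\bar\Gamma(x_2 - x_1) + D_1\bar\Gamma(x_2 - x_1) \big] \end{pmatrix}. \tag{4.29}$$

It is convenient to perform the Gaussian integral in the $k$ variables, obtaining for (4.27) and (4.28) the following expressions:

$$F_{\tilde Q}(\Delta, \phi) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_\Delta d^D x_1 \int_\Delta d^D x_2 \int_0^1 ds\, \frac{\partial}{\partial s} \big[ 2\big( 1 - s\bar\Gamma(x_2 - x_1) + D_1\bar\Gamma(x_2 - x_1) \big) \big]^{-d/2}\, e^{-\frac{\lambda_*}{2} |\phi(x_1)|^2}, \tag{4.30}$$

$$F_{\tilde Q}(\Delta \cup \Delta', \phi) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_\Delta d^D x_1 \int_{\Delta'} d^D x_2 \int_0^1 ds\, \frac{\partial}{\partial s} \big[ 2\big( 1 - s\bar\Gamma(x_2 - x_1) + D_1\bar\Gamma(x_2 - x_1) \big) \big]^{-d/2}\, e^{-\frac{\lambda_*}{2} |\phi(x_1)|^2} + (\Delta \leftrightarrow \Delta'). \tag{4.31}$$

Following [BDH2], we want to write $F_{\tilde Q}(Z, \phi)$ in terms of the sets where the dependence on the field $\phi$ is localized. In other words, we want to write the decomposition

$$F_{\tilde Q}(Z, \phi) = \sum_{\Delta \subset Z} F_{\tilde Q}(Z, \Delta, \phi), \tag{4.32}$$

where in $F_{\tilde Q}(Z, \Delta, \phi)$ only fields defined in $\Delta$ appear. The explicit expression for the relevant contribution $F_{\tilde Q}(Z, \Delta, \phi)$ is therefore

$$F_{\tilde Q}(Z, \Delta, \phi) = \int_\Delta dx_1\, v_*(\phi(x_1))\, f_{\tilde Q}(Z, \Delta), \tag{4.33}$$

where

$$f_{\tilde Q}(\Delta, \Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{4\pi} \right)^{d/2} \int_\Delta dx_2 \int_0^1 ds\, \frac{\partial}{\partial s} \big[ 1 - s\bar\Gamma(x_2 - x_1) + D_1\bar\Gamma(x_2 - x_1) \big]^{-d/2}, \tag{4.34}$$

$$f_{\tilde Q}(\Delta \cup \Delta', \Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{4\pi} \right)^{d/2} \int_{\Delta'} dx_2 \int_0^1 ds\, \frac{\partial}{\partial s} \big[ 1 - s\bar\Gamma(x_2 - x_1) + D_1\bar\Gamma(x_2 - x_1) \big]^{-d/2}. \tag{4.35}$$

By definition

$$V'_{F_{\tilde Q}}(\Delta, \phi) = -\sum_{\substack{Z \supset \Delta \\ |Z| \le 2,\ \text{conn}}} F_{\tilde Q}(Z, \Delta, \phi) = -\int_\Delta dx_1\, v_*(\phi(x_1)) \sum_{\substack{Z \supset \Delta \\ |Z| \le 2}} f_{\tilde Q}(Z, \Delta) = -\int_\Delta dx_1\, v_*(\phi(x_1))\, f_{\tilde Q}(\Lambda, \Delta), \tag{4.36}$$

where

$$f_{\tilde Q}(\Lambda, \Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{4\pi} \right)^{d/2} \int_\Lambda dx_2 \int_0^1 ds\, \frac{\partial}{\partial s} \big[ 1 - s\bar\Gamma(x_2 - x_1) + D_1\bar\Gamma(x_2 - x_1) \big]^{-d/2}, \tag{4.37}$$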

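and we have used the fact that $\bar\Gamma(y)$ vanishes for $|y| \ge 1$. Now we can use the translation invariance for $f_{\tilde Q}(\Lambda, \Delta)$ and we obtain

$$V'_{F_{\tilde Q}}(\Delta, \phi) = -V_*(\Delta, \phi)\, L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{4\pi} \right)^{d/2} \int dy \int_0^1 ds\, \frac{\partial}{\partial s} \big[ 1 - s\bar\Gamma(y) + D_1\bar\Gamma(y) \big]^{-d/2}, \tag{4.38}$$

and therefore

$$V'(\Delta, \phi) = L^\varepsilon g V_*(\Delta, \phi) + V'_{F_{\tilde Q}}(\Delta, \phi) = (L^\varepsilon g - b_1 g^2)\, V_*(\Delta, \phi) \tag{4.39}$$

with

$$b_1 = L^{2\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{4\pi} \right)^{d/2} \int dy \int_0^1 ds\, \frac{\partial}{\partial s} \big[ 1 - s\bar\Gamma(y) + D_1\bar\Gamma(y) \big]^{-d/2}. \tag{4.40}$$

$b_1$ can be controlled explicitly: using the behaviour for short distances of the covariance $\bar\Gamma$, which can be easily proved to be

$$O(1)\, y^\beta \le |\bar\Gamma(0) - \bar\Gamma(y)| \le O(1)\, y^\beta \tag{4.41}$$

for $1/L \le y \le 1/2$, one can obtain the following result.

Lemma 4.4.1.

$$b_1 = O(\ln L), \qquad b_1 > 0. \tag{4.42}$$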
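Equation (4.39) says that, to second order, one RG step maps the coupling as $g \mapsto L^\varepsilon g - b_1 g^2$. The Python sketch below iterates this truncated map and compares the result with its nontrivial fixed point $(L^\varepsilon - 1)/b_1$. It is only an illustration: the remainder contributions to the flow discussed elsewhere in the paper are ignored, the values of `L` and `eps` are hypothetical, and `b1` is simply set to $\ln L$ as a stand-in for a constant of the size given by Lemma 4.4.1.

```python
import math


def rg_step(g, L, eps, b1):
    """Second order truncation of the coupling flow read off from (4.39)."""
    return L ** eps * g - b1 * g * g


def fixed_point(L, eps, b1):
    """Nontrivial solution of g = L**eps * g - b1 * g**2."""
    return (L ** eps - 1.0) / b1


L, eps = 4.0, 0.01          # hypothetical block size and expansion parameter
b1 = math.log(L)            # hypothetical value, chosen to be O(ln L) and positive
g_star = fixed_point(L, eps, b1)

g = 0.5 * g_star            # start below the fixed point
for _ in range(2000):
    g = rg_step(g, L, eps, b1)
print(g, g_star)
```

With these numbers $g^* = (L^\varepsilon - 1)/b_1 \approx \varepsilon$, consistent with the standing assumption $g = O(\varepsilon)$, and the linearization of the map at $g^*$ has slope $2 - L^\varepsilon < 1$, so the truncated flow is attracted to $g^*$.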

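Proof. Let us first consider the quantity

$$\tilde a(L, \varepsilon) = \int_0^1 dy\, \big( 1 - \bar\Gamma(y) + D_1\bar\Gamma(y) \big)^{-d/2}. \tag{4.43}$$

We split the integral in two regions and we will call $\tilde a_<(L, \varepsilon)$ the contribution to $\tilde a(L, \varepsilon)$ coming from $0 \le y \le 1/L$ and $\tilde a_>(L, \varepsilon)$ the contribution to $\tilde a(L, \varepsilon)$ coming from $1/L \le y \le 1$. Now we can estimate $\tilde a_<(L, \varepsilon)$ simply observing that $1 - \bar\Gamma(y) \ge L^{-\beta}$ and $D_1\bar\Gamma(y) \ge 0$, obtaining $\tilde a_<(L, \varepsilon) \le L^{\beta d/2} (1/L) = L^{-\varepsilon} \le 1$. $\tilde a_>(L, \varepsilon)$ is estimated using (4.41), the dominant contribution coming from the region $1/L \le y \le 1/2$, and using $0 \le D_1\bar\Gamma(y) \le O(1) |y|^2$ to show that both for the upper bound and for the lower bound the contribution coming from $D_1\bar\Gamma(y)$ can be neglected with respect to $1 - \bar\Gamma(y) = L^{-\beta} + \bar\Gamma(0) - \bar\Gamma(y)$. Then by a straightforward integration we can show that $\tilde a_>(L, \varepsilon) = O(\ln L) > 0$. It is now trivial to see, computing explicitly $\int_0^1 ds\, \frac{\partial}{\partial s}$ in (4.40), that $b_1 = O(1)\, \tilde a_>(L, \varepsilon) - O(1)$ and we have the lemma. $\Box$

From Lemma 4.4.1 and the definitions (4.30), (4.31) it is immediate to show that

$$\|F_{\tilde Q}\|_{G,1,\Gamma_p} \le O(1)\, \varepsilon^{7/4}, \qquad \|F_{\tilde Q}\|_{\infty,1,\Gamma_p} \le O(1)\, \varepsilon^{7/4} \tag{4.44}$$

for any integer $p \ge 1$, with $O(1)$ depending on $p$.

4.5. Second order irrelevant part. From the explicit expression of $\tilde K_{\tilde Q}$ given in (4.23) and (4.24) and of $F_{\tilde Q}$ given in (4.30) and (4.31) we obtain

$$(\tilde K_{\tilde Q} - F_{\tilde Q})(\Delta) = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_\Delta d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 dt\, \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2\lambda_*} (k, I_1(s,t) k) + i\big( k_1 \cdot \phi(x_1) + t k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right], \tag{4.45}$$

$$(\tilde K_{\tilde Q} - F_{\tilde Q})(\Delta \cup \Delta') = L^{2\varepsilon}\, \frac{g^2}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_\Delta d^D x_1 \int \frac{d^d k_1}{(2\pi\lambda_*)^{d/2}} \int_{\Delta'} d^D x_2 \int \frac{d^d k_2}{(2\pi\lambda_*)^{d/2}} \cdot \int_0^1 dt\, \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2\lambda_*} (k, I_1(s,t) k) + i\big( k_1 \cdot \phi(x_1) + t k_2 \cdot (\phi(x_2) - \phi(x_1)) \big) \right] + (\Delta \leftrightarrow \Delta'). \tag{4.46}$$

Therefore $(\tilde K_{\tilde Q} - F_{\tilde Q}) = g^2 \bar I_1$, and by Lemma 4.3.1 the induction on (4.3) is proved. The bounds (3.7) are now easy consequences of the following Lemma 4.5.1, Lemma 4.3.1 and the smallness of the coupling constants.

Lemma 4.5.1. For any positive integer $p \ge 1$ and with $O(1)$ $p$-dependent,

$$\|\bar I_l\|_{G,1,\Gamma_p} \le O(1)\, L^{3\beta - l\beta/2}\, L^{2(D+2)}, \tag{4.47}$$

$$\|\bar I_l\|_{\infty,1,\Gamma_p} \le O(1)\, L^{3\beta - l\beta/2}\, L^{2(D+2)}. \tag{4.48}$$

Proof. We perform first of all in (4.4), (4.5) the Gaussian integral in the $k$ variables and we obtain

$$\bar I_l(\Delta) = L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_0^1 dt\, \frac{\partial}{\partial t} \int_\Delta d^D x_1 \int_\Delta d^D x_2\, \exp\left[ -\frac{1}{2} \lambda_* (\phi, I_l(s,t)^{-1} \phi) \right] (\det I_l(s,t))^{-d/2}, \tag{4.49}$$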


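$$\bar I_l(\Delta \cup \Delta') = L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_0^1 ds\, \frac{\partial}{\partial s} \int_0^1 dt\, \frac{\partial}{\partial t} \int_\Delta d^D x_1 \int_{\Delta'} d^D x_2\, \exp\left[ -\frac{1}{2} \lambda_* (\phi, I_l(s,t)^{-1} \phi) \right] (\det I_l(s,t))^{-d/2} + (\Delta \leftrightarrow \Delta'), \tag{4.50}$$

where $\phi = (\phi(x_1), \phi(x_2) - \phi(x_1))$.

Then we observe that the derivative with respect to $s$ produces a factor $\bar\Gamma(L^{(l-1)} (x_2 - x_1))$, and due to the compact support of the covariance this implies that

$$|\bar I_l(X)| \le L^{2l\varepsilon}\, \frac{1}{2} \left( \frac{\lambda_*}{2\pi} \right)^{d} \int_\Delta d^D x_1 \int_{\delta_l} d^D x_2\ 2 \sup_{0 \le s \le 1} \left| \int_0^1 dt\, \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2} \lambda_* (\phi, I_l(s,t)^{-1} \phi) \right] (\det I_l(s,t))^{-d/2} \right|, \tag{4.51}$$

where $\delta_l \equiv \{ x_2 : |x_2 - x_1| \le L^{-(l-1)} \}$. Performing the derivative with respect to $t$ we obtain that the integrand of the integral in $t$ in (4.51) is

$$\frac{e^{-\frac{\lambda_*}{2} (\phi, I_l(s,t)^{-1} \phi)}}{(\det I_l(s,t))^{d/2+1}} \left[ I_t \left( \frac{\lambda_*}{2} \frac{(\phi, J_l(s,t) \phi)}{\det I_l(s,t)} - (d/2) \right) - \frac{\lambda_*}{2} \frac{\partial}{\partial t} (\phi, J_l(s,t) \phi) \right], \tag{4.52}$$

where $I_t = \frac{\partial}{\partial t} \det I_l(s,t)$ and

$$J_l(s,t) = \begin{pmatrix} 2\big( C_l(s) + (1 - t^2) D_1\bar\Gamma(x_2 - x_1) \big) & t C_l(s) \\ t C_l(s) & 1 \end{pmatrix}. \tag{4.53}$$

Notice that $I_l(s,t)^{-1} = (\det I_l(s,t))^{-1} J_l(s,t)$ and that

$$\det I_l(s,t) = 2\big[ C_l(s) + (1 - t^2) D_1\bar\Gamma(x_2 - x_1) \big] - t^2 (C_l(s))^2.$$

By writing $I_l(s,t)^{-1} = E_1 + \tilde I_l(s,t)^{-1}$, where

$$E_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad \tilde I_l(s,t)^{-1} = \frac{1}{\det I_l(s,t)} \begin{pmatrix} t^2 C_l(s)^2 & t C_l(s) \\ t C_l(s) & 1 \end{pmatrix}, \tag{4.54}$$

and from the fact that the displayed matrix is positive definite and that $\det I_l(s,t)$ is positive, as shown below, we obtain trivially that

$$\left| \exp\left[ -\frac{1}{2} \lambda_* (\phi, I_l(s,t)^{-1} \phi) \right] \right| \le e^{-\frac{1}{2} \lambda_* |\phi(x_1)|^2}. \tag{4.55}$$

The following bounds are easy to obtain:

$$C_l(s) \ge \begin{cases} O(1)\, L^{-\beta l} & \text{for } |x_2 - x_1| < L^{-l}, \\ O(1)\, |x_2 - x_1|^\beta & \text{for } L^{-l} \le |x_2 - x_1| \le L^{-l+1}, \end{cases} \tag{4.56}$$

$$0 \le C_l(s) \le O(1)\, L^{-\beta(l-1)} \quad \text{for } |x_2 - x_1| \le L^{-l+1}, \tag{4.57}$$

where $O(1) = 1$ for $l = 1$,

$$\det I_l(s,t) \ge \begin{cases} O(1)\, L^{-\beta l} & \text{for } |x_2 - x_1| < L^{-l}, \\ O(1)\, |x_2 - x_1|^\beta & \text{for } L^{-l} \le |x_2 - x_1| \le L^{-l+1}, \end{cases} \tag{4.58}$$

$$|I_t| \le O(1)\, L^{-2\beta(l-1)} \quad \text{for } |x_2 - x_1| \le L^{-l+1}. \tag{4.59}$$

To prove (4.56) and (4.57) we have to use the definition of $C_l(s)$ given after (4.6) and the elementary inequalities (4.41), $0 \le \bar\Gamma(0) - \bar\Gamma(x) \le O(1) |x|^2 L^{(2-\beta)}$ and $\bar\Gamma(x) \le \bar\Gamma(0) = 1 - L^{-\beta}$. To prove (4.58) we use (4.54), observe that $D_1\bar\Gamma(x_2 - x_1) \ge 0$, so that $\det I_l(s,t) \ge C_l(s)(2 - C_l(s))$, and then use (4.56) and (4.57). To prove (4.59) we use (4.57) together with $0 \le D_1\bar\Gamma(x_2 - x_1) \le O(1) |x_2 - x_1|^2$. We now list two further inequalities for the quadratic form $(\phi, J_l(s,t) \phi)$,

$$|(\phi, J_l(s,t) \phi)| \le O(1)\, L^{-\beta(l-1)}\, e^{(\kappa/2) \|\phi\|^2_{1,\sigma,X}}\, e^{\frac{1}{2} \lambda_* (\rho/2) |\phi(x_1)|^2} \quad \text{for } |x_2 - x_1| \le L^{-l+1}, \tag{4.60}$$

$$\left| \frac{\partial}{\partial t} (\phi, J_l(s,t) \phi) \right| \le O(1)\, L^{-(\beta+1)(l-1)}\, e^{(\kappa/2) \|\phi\|^2_{1,\sigma,X}}\, e^{\frac{\lambda_*}{2} (\rho/2) |\phi(x_1)|^2}, \tag{4.61}$$

where (4.61) is true for $0 \le |x_2 - x_1| \le L^{-l+1}$. To prove (4.60) we find from the explicit expression of $J_l(s,t)$ the following bound:

$$|(\phi, J_l(s,t) \phi)| \le O(1)\, |\phi(x_1)|^2 \big( C_l(s) + D_1\bar\Gamma \big) + O(1) \Big[ |\phi(x_1)|\, |\phi(x_2) - \phi(x_1)|\, C_l(s) + |\phi(x_2) - \phi(x_1)|^2 \Big].$$

Then from the mean value theorem and Sobolev embedding we bound $|\phi(x_2) - \phi(x_1)| \le |x_2 - x_1|\, \|\phi\|_{1,\sigma,X}$. Finally, since $\kappa$ and $\rho$ are small but $O(1)$, we have $\|\phi\|_{1,\sigma,X} \le O(1)\, e^{(\kappa/4) \|\phi\|^2_{1,\sigma,X}}$, $|\phi(x_1)| \le O(1)\, e^{\frac{1}{2} \lambda_* (\rho/4) |\phi(x_1)|^2}$ and we obtain

$$|(\phi, J_l(s,t) \phi)| \le O(1) \big( (C_l(s) + D_1\bar\Gamma) + |x_2 - x_1| \big)\, e^{(\kappa/2) \|\phi\|^2_{1,\sigma,X}}\, e^{\frac{1}{2} \lambda_* (\rho/2) |\phi(x_1)|^2},$$

which gives the proof of (4.60) by (4.57) and the estimate on $D_1\bar\Gamma$ given in the proof of (4.59). To prove (4.61) we start from the bound

$$\left| \frac{\partial}{\partial t} (\phi, J_l(s,t) \phi) \right| \le O(1) \big( C_l(s)\, |\phi(x_1)|\, |\phi(x_2) - \phi(x_1)| + D_1\bar\Gamma(x_2 - x_1)\, |\phi(x_1)|^2 \big),$$

then we proceed as in (4.60) and we obtain the proof. Using the bounds (4.55)–(4.61) it is now an easy task to bound the explicit expression (4.52). We have, defining $g_{\rho,\kappa}(X, \phi)$ by $G_{\rho,\kappa}(X, \phi) = \frac{1}{|X|} \int_X dx_1\, g_{\rho,\kappa}(x_1, X, \phi)$, the following bound:

$$\left| \frac{\partial}{\partial t} \exp\left[ -\frac{1}{2} \lambda_* (\phi, I_l(s,t)^{-1} \phi) \right] (\det I_l(s,t))^{-d/2} \right| \le \begin{cases} O(1)\, L^{\beta l (d/2+2)} L^{-3\beta(l-1)}\, g_{\rho,\kappa}(x_1, X, \phi) & \text{for } |x_2 - x_1| < L^{-l}, \\ O(1)\, |x_2 - x_1|^{-\beta(d/2+2)} L^{-3\beta(l-1)}\, g_{\rho,\kappa}(x_1, X, \phi) & \text{for } L^{-l} \le |x_2 - x_1| \le L^{-l+1}. \end{cases} \tag{4.62}$$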


Now we come back to (4.51) and obtain

$$|\bar I_l(X)| \le O(1) \left[ \int_0^{L^{-l}} dy\, L^{\beta l (d/2+2)} L^{-3\beta(l-1)}\, G_{\rho,\kappa}(X, \phi) + \int_{L^{-l}}^{L^{-l+1}} dy\, |y|^{-\beta(d/2+2)} L^{-3\beta(l-1)}\, G_{\rho,\kappa}(X, \phi) \right] \le O(1) \left[ L^{l(\beta d/2 - 1)} L^{-\beta(l-3)} + \int_{L^{-l}}^{L^{-l+1}} dy\, |y|^{-\beta(d/2)} L^{-\beta(l-3)} \right] G_{\rho,\kappa}(X, \phi). \tag{4.63}$$

Observe that $\beta(d/2) = 1 - \varepsilon$. We obtain for the integral in (4.63), for $\varepsilon$ sufficiently small,

$$\int_{L^{-l}}^{L^{-l+1}} dy\, |y|^{-\beta(d/2)} = \frac{1}{\varepsilon} L^{-l\varepsilon} (L^\varepsilon - 1) = O(\ln L). \tag{4.64}$$

Hence we have $\|\bar I_l(X)\|_G \le O(1) L^{3\beta - l\beta/2}$. Evaluating the functional derivative $D\bar I_l(X)$, we obtain the same bound, so that we have

$$\|\bar I_l(X)\|_{G,1} \le O(1)\, L^{3\beta - l\beta/2}, \tag{4.65}$$

and we obtain finally the lemma by the smallness of the set $X$. $\Box$

5. RG Action on the Remainder. Relevant and Irrelevant Terms. Estimates.

This section is devoted to the activity $(S_1 \hat r)^\natural$, which we encountered earlier in (3.37) of Sect. 3. We need to extract relevant terms from the contributions from small sets. We define the relevant and irrelevant terms and give suitable bounds. We also control the remainder contribution to the flow of the effective coupling constant. The contributions from large sets of course need no subtractions since we easily obtain a contractive bound for them.

5.1. Linear reblocking, small set contributions. Let $X$ be a small set. Define

$$\hat r_*(X, \phi(\bar x), \phi) = e^{(\lambda_*/2) |\phi(\bar x)|^2}\, \hat r(X, \phi), \tag{5.1}$$

where $\bar x$ is the midpoint of the polymer $X$. We have

$$\hat r(X, \phi) = \frac{1}{|X|} \int_X dx_1\, e^{-(\lambda_*/2) |\phi(\bar x)|^2}\, \hat r_*(X, \phi(\bar x), \phi). \tag{5.2}$$

Now we consider the contribution to the linear reblocking of the activity $\hat r$ restricted to small sets, denoted by $\hat r^{(s.s.)}$. We have

$$B_1 \hat r^{(s.s.)}(LZ, \phi) = \sum_{\substack{X\ \text{small set} \\ \bar X = LZ}} \frac{1}{|X|} \int_X dx_1\, e^{-(\lambda_*/2) |\phi(\bar x)|^2}\, \hat r_*(X, \phi(\bar x), \phi), \tag{5.3}$$


and after rescaling and convolution integration, using the master formula in Lemma 1.3.1, \ X (Rˆr )\ (L−1 X, φ) S1 rˆ (s.s.) (Z, φ) = X s. s. ¯ X=LZ

X L−α Z −β ¯ 2 dx1 e−(λ∗ /2)L |Rφ(x)| = |X| X X s. s.

(5.4)

¯ X=LZ

· rˆ∗#6x¯ (X, L−β Rφ(x), ¯ L−β T x¯ Rφ), \ where we used the notations S1 rˆ (s.s.) = µ0¯ ∗ (S1 rˆ (s.s.) ) and rˆ∗#6x¯ = µ6 x¯ ∗ rˆ∗ . Then we define the relevant part in the following way: \ X L(Rˆr )\ (L−1 X, φ) L S1 rˆ (s.s.) (Z, φ) = X s. s. ¯ X=LZ

=

(5.5) X L−α Z −β 2 dx1 e−(λ∗ /2)L |Rφ(x1 )| rˆ∗#6x¯ (X, 0, 0), |X| X X s. s.

¯ X=LZ

so that the irrelevant term is \ X (1 − L)(Rˆr )\ (L−1 X, φ) (1 − L) S1 rˆ (s.s.) (Z, φ) =

(5.6)

X s. s. ¯ X=LZ

with (1 − L)(Rˆr )\ (L−1 X, φ) Z h L−α −β 2 dx1 e−(λ∗ /2)L |Rφ(x1 )| −ˆr∗#6x¯ (X, 0, 0) = |X| X

i −β ¯ 2 −|Rφ(x1 )|2 #6x¯ rˆ∗ (X, L−β Rφ(x), ¯ L−β T x¯ Rφ) + e−(λ∗ /2)L |Rφ(x)| Z Z 1 ∂ L−α −β 2 dx1 e−(λ∗ /2)L |Rφ(x1 )| dt = |X| X ∂t 0 i h 2 −t (λ∗ /2)L−β |Rφ(x)| ¯ 2 −|Rφ(x1 )|2 #6x¯ rˆ∗ (X, tL−β Rφ(x), ¯ tL−β T x¯ Rφ) . · e

(5.7)

Then we perform the derivative with respect to t obtaining (1 − L)(Rˆr )\ (L−1 X, φ) Z Z 1 L−α − λ2∗ L−β |Rφ(x1 )|2 dx1 e dt = |X| X 0 h ∂ λ∗ 2 −β ¯ 2 −|Rφ(x1 )|2 e− 2 t L |Rφ(x)| ¯ tL−β T x¯ Rφ) rˆ∗#6x¯ (X, tL−β Rφ(x), · ∂t i λ∗ 2 −β ∂ ¯ 2 −|Rφ(x1 )|2 ¯ tL−β T x¯ Rφ) rˆ∗#6x¯ (X, tL−β Rφ(x), + e− 2 t L |Rφ(x)| ∂t Z Z 1 λ∗ 2 −β ∂ L−α −β 2 ¯ 2 −|Rφ(x1 )|2 dx1 e−(λ∗ /2)L |Rφ(x1 )| dt e− 2 t L |Rφ(x)| = |X| X ∂t 0

Renormalization Group Approach to Interacting Polymerised Manifolds

243

λ∗ 2 −β ¯ 2 −|Rφ(x1 )|2 · rˆ∗#6x¯ (X, tL−β Rφ(x), ¯ tL−β T x¯ Rφ) + e− 2 t L |Rφ(x)| h ¯ tL−β T x¯ Rφ; (L−β Rφ(x), ¯ 0)) · (D rˆ∗#6x¯ )(X, tL−β Rφ(x), i #6x¯ −β −β x¯ −β x¯ ¯ tL T Rφ; (0, L T Rφ)) , + (D rˆ∗ )(X, tL Rφ(x),

(5.8)

where the derivatives with respect to the field φ(x) ¯ are ordinary partial derivatives, and we will denote them hereafter with ∂, while variational derivatives with respect to the field φ have norms computed in C 1 (X) topology. We denote with D(2) K the first variational derivative in the direction of φ in the activity K(X, φ(x), ¯ φ). We can write the following bound for the activity (1 − L)(Rˆr )\ (L−1 X, φ): (1 − L)(Rˆr )\ (L−1 X, φ) Z L−α −β 2 dx1 e−(λ∗ /2)L |Rφ(x1 )| ≤ |X| X Z 1 1/2−β · |Rφ(x)| ¯ 2 −|Rφ(x1 )|2 · dt L−1/2 e(λ∗ /2)L 0 #6x¯ · (ˆr∗ )(X, tL−β Rφ(x), ¯ tL−β T x¯ Rφ) (5.9) h λ∗ 2 −β ¯ 2 −|Rφ(x1 )|2 ¯ |L−β Rφ(x)| + e− 2 t L |Rφ(x)| #6x¯ )(X, tL−β Rφ(x), ¯ tL−β T x¯ Rφ) · (∂ rˆ∗R

i

¯ tL−β T x¯ Rφ) , + L−β T x¯ Rφ 1 (D(2) rˆ∗#6x¯ )(X, tL−β Rφ(x), C (X)

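where in the second line we used the trivial inequality $|x e^{x}| \le \frac{1}{\alpha} e^{\alpha |x|}$ for all $\alpha > 2$. We can give a suitable estimate of (5.9) by means of the following lemmas.

Lemma 5.1.1.
\[
\big|\, |R\phi(\bar x)|^2 - |R\phi(x_1)|^2 \,\big| \le O(1) L^{-(1-\beta)} \|\phi\|^2_{L^{-1}X,1,\sigma} + L^{-\beta} |R\phi(x_1)|^2 .
\]

Proof.
\[
\big|\, |R\phi(\bar x)|^2 - |R\phi(x_1)|^2 \,\big| \le \big|\, |R\phi(\bar x)| - |R\phi(x_1)| \,\big| \; \big|\, |R\phi(\bar x)| + |R\phi(x_1)| \,\big| .
\]
Using the mean value theorem, a Sobolev inequality and the assumption that $X$ is a small set,
\[
\begin{aligned}
\big|\, |R\phi(\bar x)|^2 - |R\phi(x_1)|^2 \,\big|
&\le \Big( |\bar x - x_1| \, L^{-(1-\beta/2)} \sup_{x \in L^{-1}X} |\nabla\phi(x)| \Big) \cdot \Big( 2|R\phi(x_1)| + 2\|\phi\|_{L^{-1}X,1,\sigma} \Big) \\
&\le |X| \, L^{-(1-\beta)} \|\phi\|_{L^{-1}X,1,\sigma} \cdot \Big( 2L^{-\beta/2} |R\phi(x_1)| + 2\|\phi\|_{L^{-1}X,1,\sigma} \Big),
\end{aligned}
\]
and the lemma easily follows. $\square$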
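The last step of the proof of Lemma 5.1.1 is left to the reader. One plausible route (an assumption, not stated in the text) is the inequality $2uv \le u^2 + v^2$ with $u = |X| L^{-(1-\beta)}\|\phi\|$ and $v = L^{-\beta/2}|R\phi(x_1)|$, which yields the lemma with the explicit constant $O(1) = |X|^2 + 2|X|$ when $L \ge 1$ and $0 < \beta < 1$. The sketch below checks this numerically on random inputs; the names `proof_bound` and `lemma_bound` and the sampling ranges are illustrative only.

```python
import random

def proof_bound(c, L, beta, a, b):
    # Last display of the proof: |X| L^{-(1-beta)} a (2 L^{-beta/2} b + 2 a),
    # with c = |X|, a = ||phi||_{L^{-1}X,1,sigma}, b = |R phi(x_1)|.
    return c * L ** (-(1 - beta)) * a * (2 * L ** (-beta / 2) * b + 2 * a)

def lemma_bound(c, L, beta, a, b):
    # Right-hand side of Lemma 5.1.1 with the assumed constant O(1) = c^2 + 2c.
    return (c * c + 2 * c) * L ** (-(1 - beta)) * a * a + L ** (-beta) * b * b

random.seed(0)
worst_gap = float("-inf")
for _ in range(100_000):
    c = random.uniform(1.0, 4.0)       # size of a small set (illustrative range)
    L = random.uniform(1.0, 100.0)     # scale factor, L >= 1
    beta = random.uniform(0.0, 1.0)
    a = random.uniform(0.0, 10.0)
    b = random.uniform(0.0, 10.0)
    gap = proof_bound(c, L, beta, a, b) - lemma_bound(c, L, beta, a, b)
    worst_gap = max(worst_gap, gap)

# A non-positive worst gap (up to rounding) means the bound held on every sample.
print(worst_gap <= 1e-9)
```

A random test of this kind is only a sanity check of the algebra; the inequality itself follows from $(u - v)^2 \ge 0$ and $L^{-2(1-\beta)} \le L^{-(1-\beta)}$.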


P. K. Mitter, B. Scoppola

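Lemma 5.1.2.
\[
\big\| L^{-\beta} T^{\bar x} R\phi \big\|_{C^1(X)} \le O(1) L^{-\beta/2} \frac{1}{\sqrt{1-t^2}} \Big( \frac{1}{\sqrt{\rho}} + \frac{1}{\sqrt{\kappa}} \Big)
e^{(\lambda_*/2)(\rho/4)(1-t^2) L^{-\beta} |R\phi(x_1)|^2} \, e^{(\kappa/4)(1-t^2) \|\phi\|^2_{L^{-1}X,1,\sigma}} . \tag{5.10}
\]

Proof. Recall that $T^{\bar x} R\phi(x) = L^{\beta} R\phi(x) - \lambda_* \Gamma(x - \bar x) R\phi(\bar x)$. By trivial algebraic manipulation, and the property $L^{\beta} - 1 = \lambda_* \Gamma(0)$, we obtain
\[
L^{-\beta} T^{\bar x} R\phi(x) = [R\phi(x) - R\phi(\bar x)] + L^{-\beta} \big( 1 + \lambda_* (\Gamma(0) - \Gamma(x - \bar x)) \big) R\phi(\bar x).
\]
Observe that $|\Gamma(0) - \Gamma(x - \bar x)| \le |x - \bar x| \sup_x |\nabla\Gamma(x)| \le O(1)|X|$, where we used the mean value theorem and Lemma 1.1.1, as well as
\[
|R\phi(x) - R\phi(\bar x)| \le L^{-(1-\beta/2)} |x - \bar x| \sup_x |\nabla\phi(x/L)| \le |X| L^{-(1-\beta/2)} \sup_{x \in L^{-1}X} |\nabla\phi(x)|,
\]
and therefore
\[
\sup_{x \in X} |L^{-\beta} T^{\bar x} R\phi(x)| \le |X| \Big[ L^{-(1-\beta/2)} \sup_{x \in L^{-1}X} |\nabla\phi(x)| + O(1) L^{-\beta} |R\phi(\bar x)| \Big].
\]
Analogously $L^{-\beta} \nabla_x T^{\bar x} R\phi(x) = L^{-(1-\beta/2)} \nabla\phi(x/L) - \lambda_* L^{-\beta} \nabla\Gamma(x - \bar x) R\phi(\bar x)$ and therefore
\[
\sup_{x \in X} |L^{-\beta} \nabla_x T^{\bar x} R\phi(x)| \le L^{-(1-\beta/2)} \sup_{x \in L^{-1}X} |\nabla\phi(x)| + O(1) L^{-\beta} |R\phi(\bar x)|.
\]
Combining the two relations above we obtain
\[
\big\| L^{-\beta} T^{\bar x} R\phi \big\|_{C^1(X)} \le |X| \Big[ L^{-(1-\beta/2)} \sup_{x \in L^{-1}X} |\nabla\phi(x)| + O(1) L^{-\beta} |R\phi(\bar x)| \Big].
\]
Moreover,
\[
|R\phi(\bar x)| \le |R\phi(x_1)| + |X| \Big[ L^{-(1-\beta/2)} \sup_{x \in L^{-1}X} |\nabla\phi(x)| \Big].
\]
Now we exploit the estimates
\[
\sup_{x \in L^{-1}X} |\nabla\phi(x)| \le \frac{1}{\sqrt{1-t^2}} \frac{1}{\sqrt{\kappa}} \, e^{(\kappa/4)(1-t^2) \|\phi\|^2_{L^{-1}X,1,\sigma}}
\]
and
\[
L^{-\beta/2} |R\phi(x_1)| \le O(1) \frac{1}{\sqrt{\rho}} \frac{1}{\sqrt{1-t^2}} \, e^{(\lambda_*/2)(\rho/4)(1-t^2) L^{-\beta} |R\phi(x_1)|^2}
\]
and the fact that $X$ is a small set. The lemma has been proved. $\square$

Lemma 5.1.3.
\[
|L^{-\beta} R\phi(\bar x)| \le O(1) \frac{1}{\sqrt{\rho}} \frac{1}{\sqrt{\kappa}} \frac{L^{-\beta/2}}{\sqrt{1-t^2}}
\, e^{(\lambda_*/2)(\rho/4)(1-t^2) L^{-\beta} |R\phi(x_1)|^2} \, e^{(\kappa/4)(1-t^2) \|\phi\|^2_{L^{-1}X,1,\sigma}} . \tag{5.11}
\]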


The proof follows the above lines. ρ, κ are chosen sufficiently small but are of O(1) in L. Then we obtain from (5.9) using the above lemmas, (1 − L)(Rˆr )\ (L−1 X, φ) Z ρ λ∗ −β L−α 2 k kφk2 dx1 e− 2 L (1− 4 )|Rφ(x1 )| e 4 L−1 X,1,σ ≤ O(1)L−β/2 |X| X Z 1 1 −β 2 2 (κ/4)(1−t 2 )kφk2 L−1 X,1,σ + dt √ e(λ∗ /2)L (1−t )(ρ/4)|Rφ(x1 )| e 2 1 − t 0 #6x¯ · (ˆr∗ )(X, tL−β Rφ(x), ¯ tL−β T x¯ Rφ) #6x¯ )(X, tL−β Rφ(x), ¯ tL−β T x¯ Rφ) + (∂ rˆ∗R

¯ tL−β T x¯ Rφ) . + (D(2) rˆ∗#6x¯ )(X, tL−β Rφ(x),

(5.12)

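In order to bound the activities in (5.12) in terms of the norms introduced in Sect. 2 we introduce the following intermediate regulator:
\[
G^*_\rho(X, \phi(\bar x), \phi) = e^{(\lambda_*/2)(1+\rho) |\phi(\bar x)|^2} \, G(X, \phi), \tag{5.13}
\]
where $G(X, \phi)$ is defined in (2.9). For $G^*_\rho(X, \phi(\bar x), \phi)$ the following lemma holds:

Lemma 5.1.4. Let $\beta$ be small enough but $O(1)$ independent of $L$. Let $X$ be a small set. Then
\[
G^*_\rho(X, \phi(\bar x), \phi) \le e^{(\lambda_*/2) 4\rho |\phi(\bar x)|^2} \, e^{2\kappa \|\phi\|^2_{X,1,\sigma}} . \tag{5.14}
\]

Proof. We plug in the definition of $G(X, \phi)$ and observe that
\[
\begin{aligned}
e^{-(\lambda_*/2)(1-\rho)|\phi(x)|^2} \, e^{(\lambda_*/2)(1+\rho)|\phi(\bar x)|^2}
&= e^{-(\lambda_*/2)(|\phi(x)|^2 - |\phi(\bar x)|^2)} \, e^{(\lambda_*/2)\rho(|\phi(x)|^2 + |\phi(\bar x)|^2)} \\
&\le e^{(\lambda_*/2)|\phi(x) - \phi(\bar x)| \, |\phi(x) + \phi(\bar x)|} \, e^{(\lambda_*/2)\rho(|\phi(x)|^2 + |\phi(\bar x)|^2)} .
\end{aligned}
\]
Recall that $x$ and $\bar x$ belong to $X$, a small set. Then by using the Sobolev inequality we have
\[
|\phi(x) - \phi(\bar x)| \le 2\|\phi\|_{X,1,\sigma}, \qquad
|\phi(x) + \phi(\bar x)| \le 2|\phi(\bar x)| + 2\|\phi\|_{X,1,\sigma}, \qquad
|\phi(x)|^2 \le 2|\phi(\bar x)|^2 + 4\|\phi\|^2_{X,1,\sigma} .
\]
Then, using elementary inequalities we get
\[
G^*_\rho(X, \phi(\bar x), \phi) \le e^{(\lambda_*/2) 4\rho |\phi(\bar x)|^2} \, e^{(\kappa + \lambda_*(1/\rho + 2 + 2\rho)) \|\phi\|^2_{X,1,\sigma}},
\]
and Lemma 5.1.4 follows choosing $\beta$ small enough to give (see (1.12)) $\lambda_* \le \kappa/(1/\rho + 2 + 2\rho)$. $\square$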
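The first display in the proof of Lemma 5.1.4 combines an exact rearrangement of the exponents with the inequality $\big||u|^2 - |v|^2\big| \le |u - v|\,|u + v|$. The following sketch, offered only as a sanity check, tests both facts on random vectors; the dimension range and the sampling intervals are arbitrary choices, not taken from the paper.

```python
import math
import random

def norm(v):
    # Euclidean norm of a list of floats.
    return math.sqrt(sum(c * c for c in v))

random.seed(1)
max_identity_gap = 0.0        # largest violation of the exponent identity
min_slack = float("inf")      # smallest value of |u-v||u+v| - ||u|^2 - |v|^2|
for _ in range(10_000):
    d = random.randint(1, 5)
    u = [random.uniform(-3.0, 3.0) for _ in range(d)]   # stands for phi(x)
    v = [random.uniform(-3.0, 3.0) for _ in range(d)]   # stands for phi(x-bar)
    rho = random.uniform(0.0, 1.0)
    a, b = norm(u) ** 2, norm(v) ** 2
    # Exponent identity: -(1-rho) a + (1+rho) b = -(a - b) + rho (a + b).
    left = -(1 - rho) * a + (1 + rho) * b
    right = -(a - b) + rho * (a + b)
    max_identity_gap = max(max_identity_gap, abs(left - right))
    # Inequality behind the second line of the display.
    diff = norm([x - y for x, y in zip(u, v)])
    summ = norm([x + y for x, y in zip(u, v)])
    min_slack = min(min_slack, diff * summ - abs(a - b))

print(max_identity_gap, min_slack)
```

The inequality is Cauchy-Schwarz applied to $|u|^2 - |v|^2 = (u - v)\cdot(u + v)$, so the slack should never be negative beyond rounding error.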


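We will also need the following lemma.

Lemma 5.1.5.
\[
\|\hat r_*(X)\|_{G^*_\rho} \le \|\hat r(X)\|_G, \qquad
\|(D^{(2)} \hat r_*)(X)\|_{G^*_\rho} \le \|(D\hat r)(X)\|_G, \qquad
\|(\partial \hat r_*)(X)\|_{G^*_\rho} \le \frac{O(1)}{\sqrt{\rho}} \|\hat r(X)\|_G,
\]
which follows immediately from the definition of $G^*_\rho$ and of $\hat r_*$.

Denoting with $J$ any of the activities $\hat r_*$, $\partial \hat r_*$, $D^{(2)} \hat r_*$ we have
\[
\begin{aligned}
\big| (J^{\sharp\Sigma^{\bar x}})(X, tL^{-\beta} R\phi(\bar x), tL^{-\beta} T^{\bar x} R\phi) \big|
&\le \int d\mu_{\Sigma^{\bar x}}(\zeta) \, G^*_\rho\big(X, \zeta(\bar x) + tL^{-\beta} R\phi(\bar x), \zeta + tL^{-\beta} T^{\bar x} R\phi\big) \, \|J(X)\|_{G^*_\rho} \\
&\le \int d\mu_{\Sigma^{\bar x}}(\zeta) \, e^{(\lambda_*/2) 4\rho |\zeta(\bar x) + tL^{-\beta} R\phi(\bar x)|^2} \, e^{4\kappa \|\zeta + tL^{-\beta} T^{\bar x} R\phi\|^2_{X,1,\sigma}} \, \|J(X)\|_{G^*_\rho},
\end{aligned} \tag{5.15}
\]
where in passing to the last line we have used Lemma 5.1.4. Now using the stability of the large fields regulator in the form (2.13) we have
\[
\begin{aligned}
\big| (J^{\sharp\Sigma^{\bar x}})(X, tL^{-\beta} R\phi(\bar x), tL^{-\beta} T^{\bar x} R\phi) \big|
&\le e^{(\lambda_*/2)(\rho/8) t^2 L^{-\beta} |R\phi(\bar x)|^2} \, e^{(\kappa/8) t^2 \|\phi\|^2_{L^{-1}X,1,\sigma}} \, \|J(X)\|_{G^*_\rho} \\
&\le e^{(\lambda_*/2)(\rho/4) t^2 L^{-\beta} |R\phi(x_1)|^2} \, e^{(\kappa/4) t^2 \|\phi\|^2_{L^{-1}X,1,\sigma}} \, \|J(X)\|_{G^*_\rho},
\end{aligned} \tag{5.16}
\]
where in passing to the last line we have used Lemma 5.1.1.

Finally returning to (5.12) and using (5.16) together with the fact that $\int_0^1 dt \, (1 - t^2)^{-1/2} = O(1)$ we obtain
\[
\begin{aligned}
\big| (1 - \mathcal{L})(R\hat r)^\natural (L^{-1}X, \phi) \big| \le{}& O(1) L^{-\beta/2} \frac{L^{-\alpha}}{|X|} \int_X dx_1 \, e^{-(\lambda_*/2)(1 - \rho/2) L^{-\beta} |R\phi(x_1)|^2} \, e^{(\kappa/2) \|\phi\|^2_{L^{-1}X,1,\sigma}} \\
& \cdot \Big[ \|\hat r_*(X)\|_{G^*_\rho} + \|(\partial \hat r_*)(X)\|_{G^*_\rho} + \|(D^{(2)} \hat r_*)(X)\|_{G^*_\rho} \Big]
\end{aligned} \tag{5.17}
\]
and then performing the rescaling, using $L^{-\alpha} = O(1) L^{-D}$ for $\varepsilon$ small enough, and Lemma 5.1.5 we get
\[
\big| (1 - \mathcal{L})(R\hat r)^\natural (L^{-1}X, \phi) \big| \le O(1) L^{-\beta/2} L^{-D} G(L^{-1}X, \phi) \, \|\hat r(X)\|_{G,1} . \tag{5.18}
\]
Now we want to obtain an analogous estimate for the derivative $(D(1 - \mathcal{L})(R\hat r)^\natural)(L^{-1}X, \phi)$.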


By definition we have (D(1 − L)(Rˆr )\ )(L−1 X, φ; f ) d (1 − L)(Rˆr )\ (L−1 X, φ + sf ) = ds s=0 Z λ∗ L−α −β 2 dx1 e−(λ∗ /2)L |Rφ(x1 )| − f (x1 /L)(L−β/2 Rφ)(x1 ) = |X| X 2 Z 1 h i λ∗ 2 −β ∂ ¯ 2 −|Rφ(x1 )|2 e− 2 t L |Rφ(x)| dt ¯ tL−β T x¯ Rφ) rˆ∗#6 x¯ (X, tL−β Rφ(x), · ∂t 0 λ∗ −β x¯ x1 x1 x¯ ¯ 2 −|Rφ(x1 )|2 (5.19) + λ∗ (f ( )φ( ) − f ( )φ( ))e− 2 L |Rφ(x)| L L L L λ∗

−β

¯ −|Rφ(x1 )| ¯ −β T x¯ Rφ) + e− 2 L |Rφ(x)| · rˆ∗#6x¯ (X,L−β Rφ(x),L h · (D(2) rˆ∗#6 )(X, L−β Rφ(x), ¯ L−β T x¯ Rφ; (0, L−β T x¯ Rφ))

+L

−β

2

2

i

x¯ (Rf (x))(∂ ¯ rˆ∗#6 )(X, L−β Rφ(x), ¯ L−β T x¯ Rφ; (L−β Rφ(x), ¯ 0))

.

We note that x¯ x1 x1 x¯ x1 x1 x¯ x¯ x1 x¯ |f ( )φ( ) − f ( )φ( )| ≤ |(f ( ) − f ( ))φ( )| + |f ( )(φ( ) − φ( ))| L L L L L L L L L L O(1) x1 kf kC 1 (L−1 X) (|φ( )| + kφkL−1 X,1,σ ) ≤ L L (5.20) and kL−β T x1 Rf kC 1 (X) ≤ O(1)L−β/2 kf kC 1 (L−1 X) which is proved as (5.10). Now we can proceed to the estimate of (5.19) along the same lines as the proof of (5.18). Recalling the definition of the norm of functional derivatives given in Sect. 2, we obtain

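\[
\big\| (D(1 - \mathcal{L})(R\hat r)^\natural)(L^{-1}X, \phi) \big\| \le O(1) L^{-\beta/2} L^{-D} G(L^{-1}X, \phi) \cdot \Big[ \|\hat r(X)\|_G + \|(D\hat r)(X)\|_G \Big] . \tag{5.21}
\]
The results (5.18) and (5.21) can be written in terms of the norms introduced in Sect. 2 in the following way:
\[
\big\| (1 - \mathcal{L})(R\hat r)^\natural (L^{-1}X, \phi) \big\|_1 \le O(1) L^{-\beta/2} L^{-D} G(L^{-1}X, \phi) \, \|\hat r(X)\|_{G,1} . \tag{5.22}
\]
Now we go back to (5.6). Using (5.22) and Lemma 2.1.1 we obtain
\[
\begin{aligned}
\big\| (1 - \mathcal{L})(\mathcal{S}_1 \hat r^{(s.s.)})^\natural (Z, \phi) \big\|_1 \, \Gamma_p(Z)
&\le O(1) L^{-\beta/2} L^{-D} \sum_{\substack{X \ \mathrm{s.s.} \\ \bar X = LZ}} G(L^{-1}X, \phi) \, \|\hat r(X)\|_{G,1} \, \Gamma(X) \\
&\le O(1) L^{-\beta/2} L^{-D} \|\hat r\|_{G,1,\Gamma} \cdot \sum_{\substack{X \ \mathrm{s.s.} \\ \bar X = LZ}} G(L^{-1}X, \phi) \\
&\le O(1) L^{-\beta/2} \|\hat r\|_{G,1,\Gamma} \, G(Z, \phi),
\end{aligned} \tag{5.23}
\]
where in the last step we have used Lemma 2.3.3. Observing now that $\sum_{Z \ \mathrm{s.s.},\, Z \supset \Delta} 1 = O(1)$ we have proved the following proposition.

Proposition 5.1.6. The contribution $(1 - \mathcal{L})(\mathcal{S}_1 \hat r^{(s.s.)})^\natural$ due to small set activities and linear reblocking satisfies the following bound for any $p \ge 0$,
\[
\big\| (1 - \mathcal{L})(\mathcal{S}_1 \hat r^{(s.s.)})^\natural \big\|_{G,1,\Gamma_p} \le O(1) L^{-\beta/2} \|\hat r\|_{G,1,\Gamma} . \tag{5.24}
\]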

5.2. Linear reblocking, large set contributions. The contribution $(\mathcal{S}_1 \hat r^{(l.s.)})^\natural$ due to large sets in linear reblocking satisfies, by virtue of Lemma 2.5.3 of Sect. 2, the following inequality:
\[
\big\| (\mathcal{S}_1 \hat r^{(l.s.)})^\natural \big\|_{G,1,\Gamma} \le O(1) L^{-(1-\beta/2)} \|\hat r\|_{G,1,\Gamma} . \tag{5.25}
\]

5.3. Bound for the relevant part. We prove some preliminary lemmas on the bounds of the relevant part. The relevant terms $F_{\hat r}(Z, \phi)$ are defined by (see (5.5))
\[
F_{\hat r}(Z, \phi) = \sum_{\substack{X \ \mathrm{s.s.} \\ \bar X = LZ}} \mathcal{L}(R\hat r)^\natural (L^{-1}X, \phi). \tag{5.26}
\]
Since $|\bar X| \le |X|$, $Z$ is a small set. Then we have

Lemma 5.3.1. For any integer $p \ge 0$,
\[
\|F_{\hat r}\|_{G,1,\Gamma_p} \le O(1) \|\hat r\|_{G,1,\Gamma}, \qquad
\|F_{\hat r}\|_{\infty,1,\Gamma_p} \le O(1) \|\hat r\|_{G,1,\Gamma}, \tag{5.27}
\]
where $O(1)$ depends on $p$.

Proof. $X$ is a small set. From (5.5) we have
\[
\mathcal{L}(R\hat r)^\natural (L^{-1}X, \phi) = \frac{L^{-\alpha}}{|L^{-1}X|} \int_{L^{-1}X} dx_1 \, e^{-(\lambda_*/2) |\phi(x_1)|^2} \, \hat r_*^{\sharp\Sigma^{\bar x}}(X, 0, 0) \tag{5.28}
\]
and it is easy to see, using (5.16) as well as $\|\hat r_*(X)\|_{G^*_\rho} \le O(1) \|\hat r(X)\|_G$ (from Lemma 5.1.5), that the following inequality holds:
\[
\| \mathcal{L}(R\hat r)^\natural (L^{-1}X) \|_G \le O(1) L^{-D} \|\hat r(X)\|_G .
\]
Analogously, for the functional derivative,
\[
\| (D\mathcal{L}(R\hat r)^\natural)(L^{-1}X) \|_G \le O(1) L^{-D} \|\hat r(X)\|_G
\]
so that we have
\[
\| \mathcal{L}(R\hat r)^\natural \|_{G,1,\Gamma_p} \le O(1) L^{-D} \|\hat r\|_{G,1,\Gamma} . \tag{5.29}
\]
Applying Lemma 2.3.3 finishes the proof of the first inequality in (5.27). The second is obtained following the same lines. $\square$


We want to write Frˆ (Z, φ) in terms of the sets where the dependence from the field φ is localized. In other words, we want to write the decomposition X Frˆ (Z, 1, φ), (5.30) Frˆ (Z, φ) = 1⊂Z

where in Frˆ (Z, 1, φ) appear only fields defined in 1. Frˆ (Z, 1, φ) is given by Z X X λ∗ −β 2 −α Frˆ (Z, 1, φ) = L dx1 e− 2 L |Rφ(x1 )| frˆ (X), (5.31) ¯ 1 =L1 11 :1

11

X: s.s. ¯ X=LZ X⊃11

where

1 #6x¯ rˆ (X, 0, 0). (5.32) |X| ∗ From the above expressions it is easy to check that (5.30) is satisfied. Now, following [BDH 1], we define the contribution to the local effective potential: X Frˆ (Z, 1, φ), (5.33) VF0 rˆ (1, φ) = − frˆ (X) =

Z⊃1 |Z|≤2, conn

and from (5.31) X Frˆ (Z, 1, φ) =

X

X X

¯ 1 =L1 Z⊃1 11 :1

Z⊃1

L−α

Z

X: s.s. ¯ X=LZ X⊃11

11

dx1 e−

λ∗ −β 2 2 L |Rφ(x1 )|

frˆ (X). (5.34)

It is immediate to see that this can be rewritten as Z X X λ∗ −β 2 X −α Frˆ (Z, 1, φ) = L dx1 e− 2 L |Rφ(x1 )| frˆ (X). 11

¯ 1 =L1 11 :1

Z⊃1

(5.35)

X: s.s. X⊃11

Now we want to prove the following lemma: Lemma 5.3.2.

VF0 rˆ (1, φ) = ξrˆ V∗ (1, φ)

(5.36)

with

(5.37) |ξrˆ | ≤ O(1)kˆr kG,1,0 ≤ O(1)ε5/2+η . P P 1 #6x¯ rˆ∗ (X, 0, 0) with x¯ midProof. By translation invariance X: s.s. frˆ (X) = X: s.s. |X| X⊃11

X⊃11

point of X is independent of 11 , i.e. 11 can be taken any unit block and the sum does not change. Therefore we define the constant ξrˆ by X 1 rˆ∗#6x¯ (X, 0, 0) = ξrˆ L−ε (5.38) |X| X: s.s. X⊃11

and (5.35) can be rewritten Z X Frˆ (Z, 1, φ) = L−α Z⊃1

L1

dx1 e−

λ∗ −β 2 2 L |Rφ(x1 )|

ξrˆ L−ε .

(5.39)


Performing the rescaling in (5.39) we obtain (5.36). To prove the bound (5.37) we observe |ξrˆ | ≤ O(1)

X X: s.s. X⊃11

X 1 #6x¯ |ˆr∗ (X, 0, 0)| ≤ O(1) O(1)kˆr (X)kG , |X| X: s.s.

(5.40)

X⊃11

where in the last step we used (5.16) followed by Lemma 5.1.5. From (5.40) we have X 1)kˆr kG,1,0 ≤ O(1)kˆr kG,1,0 , |ξrˆ | ≤ O(1)( Z s. s. Z⊃11

and by Lemma 3.3.1 we obtain the proof. u t 6. Extraction Estimates From Lemma 3.1.1, then reblocking-rescaling followed by fluctuation integration, and then the preliminary extraction provided by Lemma 3.4.1, we obtained a single step of RG in the form ε V (L−1 3) ∗

e−gV∗ (3) Exp ( + K)(3) −→ e−gL

−1 ˜ Exp ( + K)(L 3)

(6.1)

with K˜ given by (3.32). In Sects. 4–5 respectively we obtained the relevant parts FQ˜ ˜ rˆ (see (4.27)) and Frˆ (see (5.26)). They are supported on small sets and, for U = Q, have the following local representation: X FU (Z, 1). (6.2) FU (Z) = 1⊂Z

The corresponding local potential is X X VU0 (1), VU0 (1) = FU (Z, 1). VU0 (L−1 3) = 1⊂L−1 3

We have

(6.3)

Z⊃1 Z⊂L−1 3

VQ0˜ (1) = −Lε b1 g 2 V∗ (1), Vrˆ0 (1) = ξrˆ V∗ (1)

(6.4)

with b1 = O(ln L) > 0 (see (4.42)) and |ξrˆ | ≤ O(1)ε5/2+η (see Lemma 5.3.2). These lemmas are valid under the assumptions (3.4) and (3.8) on g and on the remainder r. Define the total relevant part and the corresponding contribution to the local effective potential, due to FQ˜ and Frˆ as VF0 = VQ0˜ + Vrˆ0 , F = FQ˜ + Frˆ .

(6.5)

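Then the local effective potential $gL^{\varepsilon} V_*(L^{-1}\Lambda)$ can be replaced by
\[
V' = gL^{\varepsilon} V_* + V'_F, \tag{6.6}
\]
provided we replace the activity $\tilde K$ with a new activity $\mathcal{E}(\tilde K)$. This procedure, known as extraction (adopting the terminology of BDH), is given by the following proposition:

Proposition 6.1. (Extraction)
\[
e^{-gL^{\varepsilon} V_*(L^{-1}\Lambda)} \, \mathrm{Exp}(\Box + \tilde K)(L^{-1}\Lambda) = e^{-V'(L^{-1}\Lambda)} \, \mathrm{Exp}(\Box + \mathcal{E}(\tilde K))(L^{-1}\Lambda), \tag{6.7}
\]
where
\[
\mathcal{E}(\tilde K) = (\tilde K - F) + (e^{-F} - 1 + F) + (e^{-F} - 1)^+_{\ge 2} + (e^{-F} - 1)^+ \vee \tilde K. \tag{6.8}
\]

Before we prove this proposition let us observe that $V' = g' V_*$, where
\[
g' = L^{\varepsilon} g (1 - b_1 g) + \xi_{\hat r} \tag{6.9}
\]
as follows from (6.5) and (6.6).

Proof of Proposition 6.1. $\mathcal{E}(\tilde K)$ is determined by
\[
\mathrm{Exp}(\Box + \mathcal{E}(\tilde K))(L^{-1}\Lambda) = e^{V'_F(L^{-1}\Lambda)} \, \mathrm{Exp}(\Box + \tilde K)(L^{-1}\Lambda).
\]
From (6.2) and (6.5) we have $F(Z) = \sum_{\Delta \subset Z} F(Z, \Delta)$ and then observe that
\[
\sum_{Z \subset L^{-1}\Lambda} F(Z)
= \sum_{Z \subset L^{-1}\Lambda} \sum_{\Delta \subset Z} F(Z, \Delta)
= \sum_{\Delta \subset L^{-1}\Lambda} \sum_{\substack{Z \supset \Delta \\ Z \subset L^{-1}\Lambda}} F(Z, \Delta)
= \sum_{\Delta \subset L^{-1}\Lambda} \big( -V'_F(\Delta) \big) = -V'_F(L^{-1}\Lambda). \tag{6.10}
\]
Hence
\[
e^{V'_F(L^{-1}\Lambda)} = \prod_{Z \subset L^{-1}\Lambda} e^{-F(Z)}
= \prod_{Z \subset L^{-1}\Lambda} \big( (e^{-F(Z)} - 1) + 1 \big)
= \mathrm{Exp}(\Box + (e^{-F} - 1)^+)(L^{-1}\Lambda). \tag{6.11}
\]
Using this, $\mathrm{Exp}(\Box + \mathcal{E}(\tilde K)) = \mathrm{Exp}(\Box + (e^{-F} - 1)^+) \, \mathrm{Exp}(\Box + \tilde K)$ and from Lemma 1.4.1,
\[
\begin{aligned}
\mathcal{E}(\tilde K) &= \tilde K + (e^{-F} - 1)^+ + (e^{-F} - 1)^+ \vee \tilde K \\
&= (\tilde K - F) + (e^{-F} - 1 + F) + (e^{-F} - 1)^+_{\ge 2} + (e^{-F} - 1)^+ \vee \tilde K.
\end{aligned} \tag{6.12}
\]
$\square$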
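The middle equality of (6.11) rests on the elementary expansion $\prod_Z (1 + a(Z)) = \sum_{\{Z_1,\dots,Z_N\}} \prod_j a(Z_j)$ with $a(Z) = e^{-F(Z)} - 1$. The toy sketch below checks that expansion for a handful of made-up values of $F(Z)$; the dictionary `F` and the labels `Z1`, ..., `Z4` are purely hypothetical, and the regrouping of overlapping sets into disjoint polymers (which the superscript $+$ encodes) is not modelled.

```python
import math
from itertools import combinations

def product_form(F):
    # prod over Z of exp(-F(Z)), written as exp(-sum F(Z)).
    return math.exp(-sum(F.values()))

def expanded_form(F):
    # Sum over all sub-collections {Z_1, ..., Z_N} of prod_j (exp(-F(Z_j)) - 1);
    # the empty collection contributes 1.
    a = [math.expm1(-f) for f in F.values()]
    total = 0.0
    for n in range(len(a) + 1):
        for subset in combinations(a, n):
            total += math.prod(subset)
    return total

# Hypothetical values of F on four sets; any real numbers would do.
F = {"Z1": 0.3, "Z2": -0.1, "Z3": 0.05, "Z4": 0.2}
print(product_form(F), expanded_form(F))
```

The two numbers agree up to rounding, which is all the identity asserts at this level of detail.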

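Note that from the definition of $\tilde K$ given in (3.37) and from (5.26),
\[
\mathcal{E}(\tilde K) = I_{k+1} + r', \tag{6.13}
\]
where
\[
\begin{aligned}
r' ={}& (1 - \mathcal{L})(\mathcal{S}_1 \hat r^{(s.s.)})^\natural + (\mathcal{S}_1 \hat r^{(l.s.)})^\natural + \tilde r^\natural + \bar r \\
& + (e^{-F} - 1 + F) + (e^{-F} - 1)^+_{\ge 2} + (e^{-F} - 1)^+ \vee \tilde K.
\end{aligned} \tag{6.14}
\]
We wish to estimate $r'$. This is provided by the following proposition.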

Proposition 6.2.
\[
\|r'\|_{G,1,\Gamma_6} \le \frac{1}{L^{\beta/4}} \, \varepsilon^{5/2+\eta} . \tag{6.15}
\]

Proof. The sum of the first four terms on the r.h.s. of (6.14) is bounded by $O(1) L^{-\beta/2} \varepsilon^{5/2+\eta}$ using Proposition 5.1.6, (5.25), Lemma 3.3.1, Lemma 3.5.2 and Lemma 3.5.1. Furthermore
\[
\|e^{-F} - 1 + F\|_{G,1,\Gamma_6} \le L^{-\beta/2} \varepsilon^{5/2+\eta}, \qquad
\|(e^{-F} - 1)^+_{\ge 2}\|_{G,1,\Gamma_6} \le L^{-\beta/2} \varepsilon^{5/2+\eta} . \tag{6.16}
\]
To prove the first inequality in (6.16) we observe that for any integer $p \ge 0$, with $O(1)$ depending on $p$, we have
\[
\|F\|_{G,1,\Gamma_p} \le O(1) \varepsilon^{7/4}, \qquad \|F\|_{\infty,1,\Gamma_p} \le O(1) \varepsilon^{7/4} .
\]
This follows from (4.44), Lemma 5.3.1 and Lemma 3.3.1. Moreover from Lemma 2.5.4,
\[
\|e^{-F} - 1 + F\|_{G,1,\Gamma_6} \le O(1) \|F\|_{G,1,\Gamma_6} \|F\|_{\infty,1,\Gamma_6} . \tag{6.17}
\]
Putting in (6.17) the bounds above, the inequality follows from the smallness of $\varepsilon$. The second inequality in (6.16) follows from Lemma 2.5.5, along the same lines as above. Finally we have the estimate
\[
\|(e^{-F} - 1)^+ \vee \tilde K\|_{G,1,\Gamma_6} \le L^{-\beta/2} \varepsilon^{5/2+\eta} . \tag{6.18}
\]
In fact
\[
\|(e^{-F} - 1)^+ \vee \tilde K\|_{G,1,\Gamma_6} \le \sum_{N,M \ge 1} O(1)^{N+M} \, \|(e^{-F} - 1)^+\|^N_{\infty,1,\Gamma_9} \, \|\tilde K\|^M_{G,1,\Gamma_9} . \tag{6.19}
\]
From the expression of $\tilde K$ (3.37), Corollary 2.5.2, bound (3.7) proved by means of (4.41), and Lemmas 3.5.1, 3.5.2 we have $\|\tilde K\|_{G,1,\Gamma_9} \le O(1) \varepsilon^{7/4}$. From Lemma 2.5.5 and the estimates above for $\|F\|$ we have $\|(e^{-F} - 1)^+\|_{\infty,1,\Gamma_9} \le O(1) \varepsilon^{7/4}$. Then $\|(e^{-F} - 1)^+ \vee \tilde K\|_{G,1,\Gamma_6} \le O(1) \varepsilon^{14/4}$ and (6.18) follows from the smallness of $\varepsilon$. Collecting the estimates above we have
\[
\|r'\|_{G,1,\Gamma_6} \le O(1) \frac{1}{L^{\beta/2}} \, \varepsilon^{5/2+\eta}, \tag{6.20}
\]
and Proposition 6.2 follows for $L$ large enough. $\square$

From (6.9) the evolved coupling constant $g'$ at the end of one RG step is $g' = L^{\varepsilon} g (1 - b_1 g) + \xi_{\hat r}$. In the absence of the remainder contribution $\xi_{\hat r}$, the approximate flow has the fixed point $\bar g$,
\[
\bar g = L^{\varepsilon} \bar g (1 - b_1 \bar g), \tag{6.21}
\]
whence
\[
\bar g = \frac{L^{\varepsilon} - 1}{L^{\varepsilon} b_1} . \tag{6.22}
\]
From (4.42) we have $0 < b_1 = O(\ln L)$. From the smallness of $\varepsilon$ (3.4) we have $\bar g = O(\varepsilon)$. Assume
\[
|g - \bar g| \le \varepsilon^{3/2} . \tag{6.23}
\]
Then, since $\bar g = O(\varepsilon)$, we have $g = O(\varepsilon)$ and the hypothesis at the beginning of Sect. 3 is satisfied. Under the assumption (6.23) above, we have the following proposition.
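The approximate flow $g \mapsto L^{\varepsilon} g (1 - b_1 g)$ and its fixed point (6.22) can be explored numerically. The sketch below is only an illustration: the values of `L` and `eps` are chosen for readable floating-point output and do not satisfy the smallness condition $\varepsilon < L^{-30}$ of the paper, `b1 = ln L` is a stand-in for the constant of (4.42), and the remainder $\xi_{\hat r}$ is set to zero.

```python
import math

def rg_step(g, L, eps, b1, xi=0.0):
    # One step g -> L^eps * g * (1 - b1 * g) + xi, as in (6.9).
    return L ** eps * g * (1.0 - b1 * g) + xi

def fixed_point(L, eps, b1):
    # Approximate fixed point (6.22): (L^eps - 1) / (L^eps * b1).
    return math.expm1(eps * math.log(L)) / (L ** eps * b1)

L, eps = 2.0, 0.01          # illustrative values only
b1 = math.log(L)            # stand-in for b_1 = O(ln L)
g_bar = fixed_point(L, eps, b1)

# Start on the boundary of the ball |g - g_bar| <= eps^(3/2) and iterate.
g = g_bar + eps ** 1.5
distances = [abs(g - g_bar)]
for _ in range(200):
    g = rg_step(g, L, eps, b1)
    distances.append(abs(g - g_bar))

print(g_bar, distances[0], distances[-1])
```

With $\xi_{\hat r} = 0$ the distance to $\bar g$ shrinks by the factor $1 - L^{\varepsilon} b_1 g$ at every step, so the ball of radius $\varepsilon^{3/2}$ is mapped into itself in this toy setting.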


Proposition 6.3. The evolved coupling constant $g'$ satisfies
\[
|g' - \bar g| \le \varepsilon^{3/2} \tag{6.24}
\]
so that the closed ball centered at $\bar g$ of radius $\varepsilon^{3/2}$ is stable under RG iteration. Also,
\[
|g' - g| \le \varepsilon^{1/2} \varepsilon^{3/2} . \tag{6.25}
\]

Proof. From (6.9) and an elementary calculation using property (6.21) we have
\[
g' - \bar g = (g - \bar g)(1 - L^{\varepsilon} b_1 g) + \xi_{\hat r} .
\]
$g$ is $O(\varepsilon)$, $b_1$ is $O(\ln L)$, hence for $\varepsilon$ sufficiently small $0 < 1 - L^{\varepsilon} b_1 g \le 1 - O(\varepsilon)$. By assumption (6.23) above and by Lemma 5.3.2, $|\xi_{\hat r}| \le \varepsilon^{5/2+\eta}$, so that we have
\[
|g' - \bar g| \le \varepsilon^{3/2} \big( 1 - O(\varepsilon) + O(\varepsilon^{1+\eta}) \big) \le \varepsilon^{3/2}
\]
for $\varepsilon$ sufficiently small. The next part of the proposition follows by writing
\[
g' - g = (g - \bar g)(-L^{\varepsilon} b_1 g) + \xi_{\hat r}
\]
and estimating as before. $\square$

7. Convergence to a Non-Gaussian Fixed Point

The density of the partition functional at the $n$th step of the RG can be parametrized by the triple $(g_n, I_n, r_n)$ in volume $L^{N+1-n}$. A further RG transformation is a map
\[
(g_n, I_n, r_n) \longrightarrow (g_{n+1}, I_{n+1}, r_{n+1}), \tag{7.1}
\]
where the volume changes to $L^{N+1-(n+1)}$. In order to discuss the convergence of the sequence of RG transformations we shall consider $N$, $n$ very large with $N \gg n$, so that we are effectively in "infinite" volume. With this hypothesis, the sequence of transformations (7.1) are iterations of a fixed mapping.

Let us now recall some results of the preceding sections. By hypothesis $I_0 = r_0 = 0$ and $|g_0 - \bar g| \le \varepsilon^{3/2}$. By the structure of irrelevant terms in second order perturbation theory given in Sect. 4, and by results established there, we can write for any $n$,
\[
I_n = \sum_{l=1}^{n} g^2_{n-l} \, \bar I_l, \tag{7.2}
\]
\[
\bar I_l = (\mathcal{S}_1 \bar I_{l-1})^\natural, \tag{7.3}
\]
\[
\|\bar I_l\|_{G,1,\Gamma_p} \le O(1) L^{3\beta - l\beta/2} L^{2(D+2)} \tag{7.4}
\]
for all integers $p \ge 1$, with $O(1)$ depending on $p$. Note that for $n \ge 1$:
\[
\|I_n - I_{n-1}\|_{G,1,\Gamma_p} \le L^{-n\beta/4} \varepsilon^{7/4}, \tag{7.5}
\]
where we have used the smallness of $\varepsilon$.
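The geometric decay in (7.5) is what later makes $(I_n)$ a Cauchy sequence: the increments beyond step $n$ sum to at most $\varepsilon^{7/4}$ times $\sum_{m > n} L^{-m\beta/4} = L^{-(n+1)\beta/4} / (1 - L^{-\beta/4})$. The sketch below compares this closed form with a truncated numerical sum; the values of `L`, `beta` and `n` are illustrative and not those required by the paper.

```python
def tail_bound(L, beta, n):
    # Closed form of sum_{m > n} L^(-m * beta / 4), a geometric series with ratio q < 1.
    q = L ** (-beta / 4)
    return q ** (n + 1) / (1 - q)

L, beta, n = 16.0, 0.5, 10      # illustrative values; here q = 16^(-1/8)
q = L ** (-beta / 4)

# Truncated tail: 2000 terms are far more than enough for q^m to underflow in relevance.
numeric_tail = sum(q ** m for m in range(n + 1, n + 1 + 2000))

print(numeric_tail, tail_bound(L, beta, n))
```

Since the tail tends to zero as $n$ grows, any two terms $I_n$, $I_{n'}$ with large indices are close in the norm $\|\cdot\|_{G,1,\Gamma_p}$.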


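By Propositions 6.2 and 6.3, the closed ball
\[
B = \big\{ (g, r) \, : \, |g - \bar g| \le \varepsilon^{3/2}, \ \|r\|_{G,1,\Gamma_6} \le \varepsilon^{5/2+\eta} \big\} \tag{7.6}
\]
is stable under RG iterations. Moreover from (7.2), (7.4), and using $g_n = O(\varepsilon)$ for all $n$, we get easily
\[
\|I_n\|_{G,1,\Gamma_p} \le O(\varepsilon^2) L^{2(D+2)} L^{3\beta} \sum_{l=1}^{n} L^{-l\beta/4} \le \varepsilon^{7/4} . \tag{7.7}
\]
We thus have

Proposition 7.1. For any $n$,
\[
|g_n - \bar g| \le \varepsilon^{3/2}, \tag{7.8}
\]
\[
\|r_n\|_{G,1,\Gamma_6} \le \varepsilon^{5/2+\eta}, \tag{7.9}
\]
\[
\|I_n\|_{G,1,\Gamma_6} \le \varepsilon^{7/4} . \tag{7.10}
\]

Define, for any sequence $\{a_n\}$, the increments $\Delta a_n = a_{n+1} - a_n$, and make the following inductive hypothesis: For all $j = 1, 2, \dots, n$,
\[
|\Delta g_{j-1}| \le k_*^{j} \varepsilon^{3/2}, \tag{7.11}
\]
\[
\|\Delta r_{j-1}\|_{G,1,\Gamma_6} \le k_*^{j} \varepsilon^{5/2+\eta}, \tag{7.12}
\]
where
\[
k_* = 1 - \varepsilon \ln L + 2\varepsilon^{1+\eta/2} . \tag{7.13}
\]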

Clearly $0 < k_* < 1$. Note that the inductive hypothesis is true for $j = 1$. In fact, for $j = 1$, (7.11) follows from Proposition 6.3, and (7.12) follows from Proposition 6.2 if we use $L^{-\beta/4} \le k_*$, for sufficiently large $L$. Remember $r_0 = 0$, $\eta > 0$ and sufficiently small, say $\eta = 1/20$ as in Sect. 3. Our task will be to prove that (7.11) and (7.12) are true for $j = n + 1$. To this end we first note some preliminary estimates. These are an increment version of Lemma 3.3.1, the estimate of $\tilde K$ given after (6.19), and Lemmas 3.5.1, 3.5.2. They are summarized in the following lemma:

Lemma 7.2. For any integer $p \ge 0$ and for $O(1)$ depending on $p$,
\[
\|\Delta \hat r_{n-1}\|_{G,1,\Gamma} \le O(1) k_*^{n} \varepsilon^{5/2+\eta}, \tag{7.14}
\]
\[
\|\Delta \tilde K_{n-1}\|_{G,1,\Gamma_p} \le O(1) k_*^{n} \varepsilon^{7/4}, \tag{7.15}
\]
\[
\|\Delta \bar r_{n-1}\|_{G,1,\Gamma_p} \le O(1) \frac{k_*^{n}}{L^{\beta/2}} \varepsilon^{5/2+\eta}, \tag{7.16}
\]
\[
\|(\Delta \tilde r_{n-1})^\natural\|_{G,1,\Gamma_p} \le O(1) \frac{k_*^{n}}{L^{\beta/2}} \varepsilon^{5/2+\eta} . \tag{7.17}
\]
The quantities $\hat r$, $\tilde K$, $\bar r$, $\tilde r$ are those introduced in Sect. 3.

We omit the proof of Lemma 7.2. The proof is straightforward. We have to use Proposition 7.1 and the inductive hypothesis. We use Lemmas 2.6.1, 2.6.2, 2.6.3. For (7.15) we use also (7.5) and $L^{-\beta/4} \le k_*$. Then follow the lines of the proofs of Lemma 3.3.1, the estimate of $\tilde K$, and Lemmas 3.5.1, 3.5.2.

Now turn to the extracted activities. By definition (see (6.14))
\[
\begin{aligned}
r_{n+1} ={}& (1 - \mathcal{L})(\mathcal{S}_1 \hat r_n^{(s.s.)})^\natural + (\mathcal{S}_1 \hat r_n^{(l.s.)})^\natural + \tilde r_n^\natural + \bar r_n \\
& + (e^{-F_n} - 1 + F_n) + (e^{-F_n} - 1)^+_{\ge 2} + (e^{-F_n} - 1)^+ \vee \tilde K_n .
\end{aligned} \tag{7.18}
\]
Hence
\[
\begin{aligned}
\Delta r_n ={}& (1 - \mathcal{L})(\mathcal{S}_1 \Delta \hat r_{n-1}^{(s.s.)})^\natural + (\mathcal{S}_1 \Delta \hat r_{n-1}^{(l.s.)})^\natural + (\Delta \tilde r_{n-1})^\natural + \Delta \bar r_{n-1} \\
& + \Delta(e^{-F_{n-1}} - 1 + F_{n-1}) + \Delta(e^{-F_{n-1}} - 1)^+_{\ge 2} \\
& + (\Delta(e^{-F_{n-1}} - 1)^+) \vee \tilde K_n + (e^{-F_{n-1}} - 1)^+ \vee \Delta \tilde K_{n-1} .
\end{aligned} \tag{7.19}
\]

Proposition 7.3.
\[
\|\Delta r_n\|_{G,1,\Gamma_6} \le O(1) k_*^{n+1} \varepsilon^{5/2+\eta} . \tag{7.20}
\]

Proof. We estimate each term on the r. h. s. of (7.19). By Proposition 5.1.6 and linearity O(1) O(1) (s.s.) \ k(1 − L) S1 1ˆrn−1 kG,1,06 ≤ β/2 k1ˆrn−1 kG,1,0 ≤ β/2 k∗n ε5/2+η , (7.21) L L where in the last step we have used Lemma 7.2, O(1) O(1) (l.s.) \ k S1 1ˆrn−1 kG,1,06 ≤ β/2 k1ˆrn−1 kG,1,0 ≤ β/2 k∗n ε5/2+η . L L

(7.22)

By Lemmas 2.5.3 and 7.2, k(1˜rn−1 )\ kG,1,06 ≤ O(1)

k∗n 5/2+η ε . Lβ/2

(7.23)

By Lemma 7.2, k1(e−Fn−1 − 1 + Fn−1 )kG,1,06 ≤ O(1)

k∗n 5/2+η ε . Lβ/2

(7.24)

To prove (7.24) we recall that Fn = FQ˜ n + Frˆn . Consider first FQ˜ n with g = gn = O(ε) by Proposition 7.1. From (4.44) for j = n − 1, n and any integer p ≥ 0, kFQ˜ j kG,1,0p ≤ O(1)ε7/4 , kFQ˜ j k∞,1,0p ≤ O(1)ε7/4 . Then, by Lemma 5.3.1, for j = n − 1, n and any integer p ≥ 0 kFrˆj kG,1,0p ≤ O(1)kˆrj kG,1,0 ≤ O(1)ε5/2+η , kFrˆj k∞,1,0p ≤ O(1)kˆrj kG,1,0 ≤ O(1)ε5/2+η , where in the last step we used Lemma 3.3.1.


Hence kFj kG,1,0p ≤ O(1)ε7/4 , kFj k∞,1,0p ≤ O(1)ε7/4

(7.25)

Hence, by Lemma 2.6.2, k1(e−Fn−1 − 1 + Fn−1 )kG,1,06 ≤ O(1)ε7/4 k1Fn−1 kG,1,06 .

(7.26)

By (4.30), (4.31) 2 | ≤ O(1)k∗n ε7/4 , k1FQ˜ n−1 kG,1,0p ≤ O(ln L)L2(D+2) |gn2 − gn−1

(7.27)

where we used in the last step the inductive hypothesis. By Lemma 5.3.1, linearity, and Lemma 7.2, k1Frˆn−1 kG,1,0p ≤ O(1)k1ˆrn−1 kG,1,0 ≤ O(1)k∗n ε5/2+η .

(7.28)

From (7.27), (7.28) we obtain k1Fn−1 kG,1,06 ≤ O(1)k∗n ε7/4 .

(7.29)

Putting (7.29) in (7.26) and using the smallness of ε we obtain the proof of (7.24). We have now the estimate k1(e−Fn−1 − 1)+ ≥2 kG,1,06 ≤

O(1) n 5/2+η k ε . Lβ/2 ∗

(7.30)

This follows from (7.25), (7.29) and Lemma 2.6.3. Next we have the estimate: k(1(e−Fn−1 − 1)+ ) ∨ K˜ n kG,1,06 ≤ O(1)k(1(e−Fn−1 − 1)+ )k∞,1,09 kK˜ n kG,1,09 ≤ O(1)k1Fn−1 k∞,1,09 kK˜ n kG,1,09 (7.31) ≤ O(1)k∗n ε14/4 ≤

O(1) n 5/2+η k ε , Lβ/2 ∗

where we have used Lemma 2.6.3 with k = 1, and then (7.29) and the estimate of K˜ given after (6.19). Finally, we have: k(e−Fn−1 − 1)+ ∨ 1K˜ n−1 kG,1,06 ≤ O(1)k(e−Fn−1 − 1)+ k∞,1,09 k1K˜ n−1 kG,1,09 O(1) ≤ O(1)k∗n ε14/4 ≤ β/2 k∗n ε5/2+η , L (7.32) where we have used the estimate of (e−Fn−1 − 1)+ given after (6.19) and Lemma 7.2 Adding up the estimates given in (7.21)–(7.24) and (7.30)–(7.32) we obtain k1rn kG,1,06 ≤

O(1) n 5/2+η k ε , Lβ/2 ∗

(7.33)

and from this we obtain the proof for $L$ large enough. $\square$

Proposition 7.4.
\[
|\Delta g_n| \le k_*^{n+1} \varepsilon^{3/2} . \tag{7.34}
\]

Proof. $g_{n+1} = L^{\varepsilon} g_n (1 - b_1 g_n) + \xi_{\hat r_n}$, whence
\[
\Delta g_n = \Delta g_{n-1} \big( L^{\varepsilon} - L^{\varepsilon} b_1 (g_n + g_{n-1}) \big) + \Delta \xi_{\hat r_{n-1}} . \tag{7.35}
\]
Using the definition of the approximate fixed point $\bar g = \frac{L^{\varepsilon} - 1}{L^{\varepsilon} b_1}$, we have
\[
L^{\varepsilon} - L^{\varepsilon} b_1 (g_n + g_{n-1}) = 2 - L^{\varepsilon} - L^{\varepsilon} b_1 (g_n - \bar g) - L^{\varepsilon} b_1 (g_{n-1} - \bar g).
\]
Using the fact that $L^{\varepsilon} - L^{\varepsilon} b_1 (g_n + g_{n-1}) > 0$ for $\varepsilon$ sufficiently small we have
\[
\begin{aligned}
0 \le L^{\varepsilon} - L^{\varepsilon} b_1 (g_n + g_{n-1}) &\le 1 - \varepsilon \ln L + L^{\varepsilon} b_1 |g_n - \bar g| + L^{\varepsilon} b_1 |g_{n-1} - \bar g| \\
&\le 1 - \varepsilon \ln L + 2 L^{\varepsilon} b_1 \varepsilon^{3/2} .
\end{aligned} \tag{7.36}
\]
We have also by the inductive hypothesis $|\Delta g_{n-1}| \le k_*^{n} \varepsilon^{3/2}$ as well as
\[
|\Delta \xi_{\hat r_{n-1}}| \le O(1) \|\Delta \hat r_{n-1}\|_{G,1,\Gamma} \le O(1) k_*^{n} \varepsilon^{5/2+\eta},
\]
where we also used linearity and Lemma 5.3.2. Putting these inequalities and (7.36) in (7.35) we get
\[
|\Delta g_n| \le \varepsilon^{3/2} \big( 1 - \varepsilon \ln L + 2 L^{\varepsilon} b_1 \varepsilon^{3/2} + O(1) \varepsilon^{1+\eta} \big) k_*^{n} \le k_*^{n+1} \varepsilon^{3/2},
\]
which gives the proof of (7.34). $\square$

Propositions 7.3 and 7.4 imply that the inductive hypothesis (7.11), (7.12) is actually true for all $j \ge 1$. This, together with (7.5) and Proposition 7.1, implies:

Theorem 7.5. Let $L$ be sufficiently large, $\beta = 1 - 2\delta > 0$ sufficiently small and independent of $L$, and $\varepsilon = 1 - \beta\frac{d}{2}$ such that $0 < \varepsilon < L^{-30}$. Moreover let the initial coupling $g_0$ be held within a ball of radius $\varepsilon^{3/2}$, centered at $\bar g$, the approximate second order fixed point given in (6.22). Then the sequence of effective coupling constants $g_n$ converges, as $n \to \infty$, to the fixed point $g_\infty$. The sequence $(I_n, r_n)$ of polymer activities converges, as $n \to \infty$, to $(I_\infty, r_\infty)$ in the norm $\|\cdot\|_{G,1,\Gamma_6}$. Moreover
\[
|g_\infty - \bar g| \le \varepsilon^{3/2}, \qquad \|r_\infty\|_{G,1,\Gamma_6} \le \varepsilon^{5/2+\eta}, \qquad \|I_\infty\|_{G,1,\Gamma_6} \le \varepsilon^{7/4} .
\]
Since $\bar g = O(\varepsilon)$, we have $g_\infty \ne 0$ so that the fixed point is non-Gaussian.

Unfortunately we have not been able to relax the smallness condition on $\beta$ used in Proposition 5.1.6 via Lemma 5.1.4.

Acknowledgements. We are grateful to David Brydges for many interesting conversations and suggestions during the course of this work. We thank Marzio Cassandro for having participated at an early stage of this project and for his continued interest. We thank Francois David and Kay Wiese for interesting comments and questions as well as bringing to our attention some relevant references. P.K.M. thanks Gerard Menessier for his mathematical vigilance and for having cooperated on an earlier version of the proof of Lemma 2.3.1. He thanks the INFN, sezione di Roma, for having supported his visits to Rome during the course of this work.


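Appendix A

In this appendix we will prove Lemma 2.3.1 on the stability of the large field regulator.

Proof. Using the master formula given in Lemma 1.3.1, we have
\[
\begin{aligned}
(\mu_\Gamma * G_{\rho,\kappa})(X, \phi) ={}& \frac{L^{-\alpha}}{|X|} \int_X dx_1 \, e^{-(\lambda_*/2) L^{-\beta} |\phi(x_1)|^2}
\cdot \int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{(\rho\lambda_*/2) |\zeta(x_1) + L^{-\beta} \phi(x_1)|^2} \, e^{\kappa \|\zeta + L^{-\beta} T^{x_1} \phi\|^2_{X,1,\sigma}} \\
\le{}& \frac{L^{-\alpha}}{|X|} \int_X dx_1 \, e^{-(\lambda_*/2) L^{-\beta} |\phi(x_1)|^2}
\cdot \Big( \int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{\rho\lambda_* |\zeta(x_1) + L^{-\beta} \phi(x_1)|^2} \Big)^{1/2} \\
& \cdot \Big( \int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{2\kappa \|\zeta + L^{-\beta} T^{x_1} \phi\|^2_{X,1,\sigma}} \Big)^{1/2} .
\end{aligned} \tag{A1.1}
\]
Observing that $\sigma = \Sigma^{x_1}(x_1, x_1) = \gamma - \lambda_* L^{-\beta} \gamma^2$, so that $\lambda_* \sigma = 1 - \frac{1}{L^{\beta}}$, we easily obtain the bound
\[
\begin{aligned}
\int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{\rho\lambda_* |\zeta(x_1) + L^{-\beta} \phi(x_1)|^2}
&\le e^{2\rho\lambda_* L^{-2\beta} |\phi(x_1)|^2} \int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{2\rho\lambda_* |\zeta(x_1)|^2} \\
&\le (1 - 4\rho)^{-d/2} \, e^{2\rho\lambda_* L^{-2\beta} |\phi(x_1)|^2},
\end{aligned} \tag{A1.2}
\]
hence for $0 < \rho < \frac{1}{8}$ we get from (A1.1)
\[
(\mu_\Gamma * G_{\rho,\kappa})(X, \phi) \le 2^{d/4} \frac{L^{-\alpha}}{|X|} \int_X dx_1 \, e^{-(\lambda_*/2) \big( 1 - \frac{2\rho}{L^{\beta}} \big) L^{-\beta} |\phi(x_1)|^2}
\cdot \Big( \int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{2\kappa \|\zeta + L^{-\beta} T^{x_1} \phi\|^2_{X,1,\sigma}} \Big)^{1/2} . \tag{A1.3}
\]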
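The second inequality in (A1.2) uses the Gaussian moment formula $\int d\mu_\sigma(\zeta) \, e^{c\zeta^2} = (1 - 2c\sigma)^{-1/2}$ per field component, with $c = 2\rho\lambda_*$ and $\lambda_*\sigma = 1 - L^{-\beta} \le 1$. The following sketch checks the one-component case ($d = 1$) by direct numerical integration; the values of `lam`, `L`, `beta` and `rho` are illustrative assumptions, and `gaussian_expectation` is a helper written for this example.

```python
import math

def gaussian_expectation(c, sigma, half_width=12.0, steps=20_000):
    # E[exp(c * z^2)] for z ~ N(0, sigma), via the midpoint rule on
    # [-half_width * sd, half_width * sd]; requires 2 * c * sigma < 1.
    sd = math.sqrt(sigma)
    h = 2 * half_width * sd / steps
    total = 0.0
    for i in range(steps):
        z = -half_width * sd + (i + 0.5) * h
        total += math.exp(c * z * z - z * z / (2 * sigma))
    return total * h / math.sqrt(2 * math.pi * sigma)

lam, L, beta, rho = 1.0, 2.0, 0.1, 0.1      # illustrative values, rho < 1/8
sigma = (1 - L ** (-beta)) / lam            # so that lam * sigma = 1 - L^(-beta)
c = 2 * rho * lam

numeric = gaussian_expectation(c, sigma)
closed = (1 - 4 * rho * lam * sigma) ** -0.5

print(numeric, closed, (1 - 4 * rho) ** -0.5, math.sqrt(2))
```

For $\rho < 1/8$ the closed form is bounded by $(1 - 4\rho)^{-1/2} \le \sqrt{2}$, which after taking the square root in (A1.1) is the origin of the factor $2^{d/4}$ in (A1.3).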

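We shall now estimate the last integral in (A1.3). First we observe that
\[
\zeta + L^{-\beta} T^{x_1} \phi = \zeta + \phi - \lambda_* L^{-\beta} \Gamma(\cdot, x_1) \phi(x_1).
\]
Then we have
\[
\int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{2\kappa \|\zeta + L^{-\beta} T^{x_1} \phi\|^2_{X,1,\sigma}}
\le e^{4\kappa \lambda_*^2 L^{-2\beta} \|\Gamma(\cdot, x_1)\|^2_{X,1,\sigma} |\phi(x_1)|^2}
\cdot \int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{4\kappa \|\zeta + \phi\|^2_{X,1,\sigma}} . \tag{A1.4}
\]
We now make the following claim.

Claim A.1. For $\kappa > 0$ sufficiently small, independent of $L$, and any $x_1 \in X$,
\[
\int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{4\kappa \|\zeta + \phi\|^2_{X,1,\sigma}} \le 2^{|X|} \, e^{8\kappa \|\phi\|^2_{X,1,\sigma}} . \tag{A1.5}
\]

Observe also that, from Lemma 1.1.1,
\[
\|\Gamma(\cdot, x_1)\|^2_{X,1,\sigma} \le \sum_{1 \le \alpha \le \sigma} \int_X dx \, |\partial^\alpha \Gamma(x - x_1)|^2
\le \sum_{1 \le \alpha \le \sigma} \int_{\mathbb{R}} dx \, |\partial^\alpha \Gamma(x - x_1)|^2 \le O(1). \tag{A1.6}
\]
Using (A1.5) and (A1.6) we get from (A1.4),
\[
\int d\mu_{\Sigma^{x_1}}(\zeta) \, e^{2\kappa \|\zeta + L^{-\beta} T^{x_1} \phi\|^2_{X,1,\sigma}}
\le 2^{|X|} \, e^{O(1) \kappa \lambda_*^2 L^{-2\beta} |\phi(x_1)|^2} \, e^{8\kappa \|\phi\|^2_{X,1,\sigma}}, \tag{A1.7}
\]
and using (A1.7) we get from (A1.3),
\[
(\mu_\Gamma * G_{\rho,\kappa})(X, \phi) \le O(1) \frac{L^{-\alpha}}{|X|} \int_X dx_1 \,
e^{-(\lambda_*/2) \big( 1 - \frac{2\rho}{L^{\beta}} (1 + O(1) \frac{\kappa}{\rho} \lambda_*) \big) L^{-\beta} |\phi(x_1)|^2} \, 2^{|X|} \, e^{4\kappa \|\phi\|^2_{X,1,\sigma}} . \tag{A1.8}
\]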

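We have chosen $\kappa > 0$, $O(1)$ in $L$, sufficiently small and $0 < \rho < 1/8$. We choose $\rho \ge \kappa$. Then we get (2.12) for $L$ sufficiently large. So Lemma 2.3.1 will have been proved provided we prove Claim A.1.

Proof of Claim A.1. The proof is along the lines of that of Lemma 3 in [BDH2], the only difference being that we have the "covariance" $\Sigma^{x_1}$ instead of the covariance $\Gamma$. Recall
\[
\Sigma^{x_1}(x, y) = \Gamma(x - y) - \lambda_* L^{-\beta} \Gamma(x - x_1) \Gamma(y - x_1). \tag{A1.9}
\]
Define for $t \in [0, 1]$
\[
G_t(X, \phi) = e^{U_t(X, \phi)}, \qquad \text{where} \quad U_t(X, \phi) = t \ln 2^{|X|} + 4\kappa (1 + t) \|\phi\|^2_{X,1,\sigma} .
\]
We have to prove
\[
\int_0^1 ds \, \frac{\partial}{\partial s} \, \mu_{(1-s)\Sigma^{x_1}} * G_s(X, \phi) \ge 0.
\]
It is sufficient to prove that the integrand is non-negative. Thus we have to prove, for $s \in [0, 1]$,
\[
\frac{\partial U_s}{\partial s} - \frac{1}{2} \Delta_{\Sigma^{x_1}} U_s - \frac{1}{2} \Sigma^{x_1}\Big( \frac{\partial U_s}{\partial \phi}, \frac{\partial U_s}{\partial \phi} \Big) \ge 0. \tag{A1.10}
\]
Here
\[
\Delta_{\Sigma^{x_1}} U_s = \int_X dx \int_X dy \, \Sigma^{x_1}(x, y) \frac{\delta}{\delta\phi(x)} \frac{\delta}{\delta\phi(y)} U_s
= \Delta_\Gamma U_s - \lambda_* L^{-\beta} \Big( \int_X dx \, \Gamma(x - x_1) \frac{\delta}{\delta\phi(x)} \Big)^2 U_s \tag{A1.11}
\]
and
\[
\begin{aligned}
\Sigma^{x_1}\Big( \frac{\partial U_s}{\partial \phi}, \frac{\partial U_s}{\partial \phi} \Big)
&= \int_X dx \int_X dy \, \Sigma^{x_1}(x, y) \frac{\delta U_s}{\delta\phi(x)} \frac{\delta U_s}{\delta\phi(y)} \\
&= \Gamma\Big( \frac{\partial U_s}{\partial \phi}, \frac{\partial U_s}{\partial \phi} \Big) - \lambda_* L^{-\beta} \Big( \int_X dx \, \Gamma(x - x_1) \frac{\delta U_s}{\delta\phi(x)} \Big)^2 .
\end{aligned} \tag{A1.12}
\]
We have
\[
\frac{\partial U_s}{\partial s} = \ln 2^{|X|} + 4\kappa \|\phi\|^2_{X,1,\sigma}, \tag{A1.13}
\]
\[
\frac{1}{2} \Delta_\Gamma U_s \le O(1) \kappa |X|, \tag{A1.14}
\]
where $O(1)$ is independent of $L$. The latter follows from the fact that the Sobolev norm starts with one derivative and Lemma 1.1.1,
\[
\Big( \int_X dx \, \Gamma(x - x_1) \frac{\delta}{\delta\phi(x)} \Big)^2 U_s \le 8\kappa (1 + s) \|\Gamma(\cdot, x_1)\|^2_{X,1,\sigma} \le O(1) \kappa \tag{A1.15}
\]
with $O(1)$ independent of $L$, by (A1.6). Using (A1.14) and (A1.15), we get from (A1.11),
\[
\frac{1}{2} \Delta_{\Sigma^{x_1}} U_s \le O(1) \kappa |X| . \tag{A1.16}
\]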

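It is easy to see that
\[
\begin{aligned}
\Gamma\Big( \frac{\partial U_s}{\partial \phi}, \frac{\partial U_s}{\partial \phi} \Big)
&\le 32 \kappa^2 \sum_{1 \le \alpha_1, \alpha_2 \le \sigma} \int_X dy \, |\partial^{\alpha_1} \phi(y)| \, \big| ((\partial^{\alpha_1 + \alpha_2} \Gamma) * \partial^{\alpha_2} \phi)(y) \big| \\
&\le 32 \kappa^2 \sum_{1 \le \alpha_1, \alpha_2 \le \sigma} \|\partial^{\alpha_1} \phi\|_{L^2(X)} \, \|(\partial^{\alpha_1 + \alpha_2} \Gamma) * \partial^{\alpha_2} \phi\|_{L^2(X)} \\
&\le 32 \kappa^2 \Big( \sup_{2 \le j \le 2\sigma} \|\partial^j \Gamma\|_{L^1(\mathbb{R})} \Big) \Big( \sum_{1 \le \alpha \le \sigma} \|\partial^\alpha \phi\|_{L^2(X)} \Big)^2 \\
&\le O(1) \kappa^2 \Big( \sup_{2 \le j \le 2\sigma} \|\partial^j \Gamma\|_{L^1(\mathbb{R})} \Big) \|\phi\|^2_{X,1,\sigma},
\end{aligned} \tag{A1.17}
\]
where, in passing to the next-to-last line, we have used Young's convolution inequality. Now, using the compact support of the kernel function $u$,
\[
\begin{aligned}
\sup_{2 \le j \le 2\sigma} \|\partial^j \Gamma\|_{L^1(\mathbb{R})}
&\le \sup_{2 \le j \le 2\sigma} \int_1^L \frac{dl}{l} \, l^{\beta - j} \int_{-l}^{l} dx \, |(\partial^j u)(x/l)| \\
&\le 2 \sup_{2 \le j \le 2\sigma} \int_1^L \frac{dl}{l} \, l^{\beta - j + 1} \|\partial^j u\|_\infty \le O(1),
\end{aligned} \tag{A1.18}
\]
where $O(1)$ is independent of $L$. From (A1.17) and (A1.18) we get
\[
\Gamma\Big( \frac{\partial U_s}{\partial \phi}, \frac{\partial U_s}{\partial \phi} \Big) \le O(1) \kappa^2 \|\phi\|^2_{X,1,\sigma} . \tag{A1.19}
\]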
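The $L$-independence claimed in (A1.18) comes from the scale integral $\int_1^L \frac{dl}{l} \, l^{\beta - j + 1} = \int_1^L l^{\beta - j} \, dl = \frac{1 - L^{-(j - 1 - \beta)}}{j - 1 - \beta}$, which stays below $1/(j - 1 - \beta)$ for every $L$ as long as $j \ge 2$ and $\beta < 1$. The sketch below confirms this numerically for a few values of $L$; the value of `beta` and the list of scales are illustrative, and both helper functions are written for this example only.

```python
import math

def scale_integral(L, beta, j, steps=20_000):
    # Midpoint-rule value of int_1^L l^(beta - j) dl, computed in the variable s = ln l,
    # where the integrand becomes exp((beta - j + 1) * s).
    h = math.log(L) / steps
    p = beta - j + 1
    return sum(math.exp(p * (i + 0.5) * h) for i in range(steps)) * h

def closed_form(L, beta, j):
    # (1 - L^(-(j - 1 - beta))) / (j - 1 - beta), valid for j - 1 - beta > 0.
    p = j - 1 - beta
    return (1 - L ** (-p)) / p

beta = 0.1                                   # illustrative small beta
results = []
for L in (2.0, 10.0, 1e3, 1e6):
    for j in (2, 3):
        results.append((L, j, scale_integral(L, beta, j), closed_form(L, beta, j)))

for row in results:
    print(row)
```

The values saturate as $L$ grows, which is the sense in which the $O(1)$ in (A1.18) does not depend on $L$.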


Next we have
\[
\begin{aligned}
\Big( \int_X dx \, \Gamma(x - x_1) \frac{\delta U_s}{\delta\phi(x)} \Big)^2
&\le 32 \kappa^2 \Big( \sum_{1 \le \alpha \le \sigma} \int_X dx \, \partial^\alpha \Gamma(x - x_1) \, \partial^\alpha \phi(x) \Big)^2 \\
&\le 32 \kappa^2 \Big( \sum_{1 \le \alpha \le \sigma} \Big( \int_X dx \, (\partial^\alpha \Gamma(x - x_1))^2 \Big)^{1/2} \Big( \int_X dx \, |\partial^\alpha \phi(x)|^2 \Big)^{1/2} \Big)^2 \\
&\le O(1) \kappa^2 \sup_{1 \le \alpha \le \sigma} \Big( \int_{\mathbb{R}} dx \, (\partial^\alpha \Gamma(x))^2 \Big) \|\phi\|^2_{X,1,\sigma} \le O(1) \kappa^2 \|\phi\|^2_{X,1,\sigma},
\end{aligned} \tag{A1.20}
\]
where $O(1)$ is independent of $L$ and we have used Lemma 1.1.1. Using (A1.19) and (A1.20) we get from (A1.12),
\[
\Sigma^{x_1}\Big( \frac{\partial U_s}{\partial \phi}, \frac{\partial U_s}{\partial \phi} \Big) \le O(1) \kappa^2 \|\phi\|^2_{X,1,\sigma} . \tag{A1.21}
\]
Putting together the estimates (A1.13), (A1.16) and (A1.21) we get
\[
\frac{\partial U_s}{\partial s} - \frac{1}{2} \Delta_{\Sigma^{x_1}} U_s - \frac{1}{2} \Sigma^{x_1}\Big( \frac{\partial U_s}{\partial \phi}, \frac{\partial U_s}{\partial \phi} \Big)
\ge (\ln 2 - O(1)\kappa) |X| + (4\kappa - O(1)\kappa^2) \|\phi\|^2_{X,1,\sigma} \ge 0 \tag{A1.22}
\]
for $\kappa > 0$ small enough and independent of $L$, since the $O(1)$ are independent of $L$. This completes the proof of the claim, and hence of Lemma 2.3.1. $\square$

References

[AL] Aronowitz, J.A. and Lubensky, T.C.: Europhys. Lett. 4, 395 (1987)
[B] Brydges, D.: Functional integrals and their applications. Technical report, Ecole Polytechnique Federale de Lausanne (1992)
[BDH1] Brydges, D., Dimock, J. and Hurd, T.R.: Commun. Math. Phys. 172, 143–186 (1995)
[BDH2] Brydges, D., Dimock, J. and Hurd, T.R.: Estimates on Renormalization Group Transformation. Preprint (1996)
[BDH3] Brydges, D., Dimock, J. and Hurd, T.R.: Commun. Math. Phys. 198, 111–156 (1998)
[BY] Brydges, D. and Yau, H.T.: Commun. Math. Phys. 129, 351–392 (1990)
[CM] Cassandro, M. and Mitter, P.K.: Nucl. Phys. B422, 634–674 (1994)
[D1] Duplantier, B.: Phys. Rev. Lett. 58, 2733 (1987)
[D2] Duplantier, B.: Phys. Rev. Lett. 62, 2337 (1989)
[DDG1] David, F., Duplantier, B. and Guitter, E.: Phys. Rev. Lett. 70, 2205 (1993)
[DDG2] David, F., Duplantier, B. and Guitter, E.: Nucl. Phys. B394, 555 (1993)
[DDG3] David, F., Duplantier, B. and Guitter, E.: Phys. Rev. Lett. 72, 311 (1994)
[DDG4] David, F., Duplantier, B. and Guitter, E.: Renormalization theory for the self-avoiding polymerized membranes. cond-mat 9702136 (1997)
[DH1] Dimock, J. and Hurd, T.R.: J. Stat. Phys. 66, 1277–1318 (1992)
[DH2] Dimock, J. and Hurd, T.R.: Commun. Math. Phys. 156, 547–580 (1993)
[DHK] Duplantier, B., Hwa, T. and Kardar, M.: Phys. Rev. Lett. 64, 2022 (1990)
[DW1] Wiese, K.J. and David, F.: Nucl. Phys. B 487, 529 (1997)
[DW2] David, F. and Wiese, K.J.: Phys. Rev. Lett. 76, 4564 (1996)
[DW3] Wiese, K.J. and David, F.: Nucl. Phys. B 450, 495 (1995)
[DW4] David, F. and Wiese, K.J.: Nucl. Phys. B 535, 555 (1998)
[GK] Gawedski, K. and Kupiainen, A.: Nucl. Phys. B 262, 33 (1985)
[GV] Gel'fand, I.M. and Vilenkin, N.Ya.: Generalized Functions. Vol. 4, New York: Academic Press, 1964
[H] Hwa, T.: Phys. Rev. A 41, 2733 (1987)
[KN] Kardar, M. and Nelson, D.R.: Phys. Rev. Lett. 58, 1298, 2280 (1987)
[KW] Kogut, J. and Wilson, K.G.: Phys. Rep. 12, 75 (1974)
[NPW] Nelson, D.R., Piron, T. and Weinberg, S., eds.: Statistical mechanics of membranes and surfaces. Singapore: World Scientific, 1989

Communicated by D. Brydges

Commun. Math. Phys. 209, 263 – 273 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Fibre Bundle for Spin and Charge in General Relativity

Kenichi Horie*

Institut für Physik der Johannes-Gutenberg-Universität, 55099 Mainz, Germany

Received: 12 February 1999 / Accepted: 6 July 1999

* Present address: High Energy Accelerator Research Organization (KEK), Tanashi Branch, Tokyo 188–8501, Japan.

Abstract: The Lorentzian and spin structures of general relativity are shown to allow a natural extension, by means of which the set of possible electromagnetic bundles is linked to the topology and geometry of the underlying causal structure. Further, both the Dirac operator and the electromagnetic potential are obtainable from a single linear connection 1-form.

1. Introduction

In general relativity the gravitational field is represented by the Riemannian curvature tensor, which is determined by the underlying spacetime geometry in terms of the metric. Furthermore, the geometry and topology of the spacetime manifold not only govern the large-scale behaviour of the gravitational field and the causal structure, but also determine the spin geometry, which is the adequate mathematical framework for spinorial matter. Whereas these gravitational and spinorial structures are intrinsic features of the underlying spacetime, the electromagnetic structure is commonly considered not to be of this type, but is rather imposed from the outside by attaching a U(1)-bundle and introducing an electromagnetic potential on it. The bundle is often taken to be trivial.

In this work we would like to discuss a natural and modest extension of the fibre bundle geometry employed in general relativity, which not only incorporates the usual setting for gravitational and spinor fields, but also an appropriate basis for electromagnetism. In order to motivate this extended structure, let us first review a few examples, which hint at a topological interrelationship between the spacetime topology and the electromagnetic structure.

When the spacetime manifold has non-trivial topology, there may exist physically distinct spinors due to inequivalent spin structures [6], which are characterized by the elements of the first cohomology group $H^1(M, \mathbb{Z}_2)$ [10,3]. In [12] it was shown that these elements naturally lead to an additional U(1)-potential in the Dirac operator, and the potential was constructed explicitly for the case when $H^1(M, \mathbb{Z})$ has no torsion: given an element of $H^1(M, \mathbb{Z}_2)$ in terms of transition functions $\{k_{\alpha\beta}\}$ subordinate to a covering $\{U_\alpha\}_{\alpha\in A}$, local functions $\lambda_\alpha : U_\alpha \to U(1)$ were shown to exist with $\lambda_\alpha \lambda_\beta^{-1} = k_{\alpha\beta}$. The induced potential $B := \frac{1}{2\pi i} (\lambda_\alpha^2)^{-1} d(\lambda_\alpha^2)$ is well-defined globally and represents an integer cohomology class. It is naturally interpreted as an effective non-dynamical electromagnetic potential which adds to the dynamical potential when considering inequivalent spinors. The physical implications of such an interplay between the non-trivial topology of the spacetime and electromagnetism were discussed in [12] for the case of a superconducting ring, where some phenomena in superconductivity could be explained geometrically by considering this effective potential.

Another example of a connection between the spacetime topology and electromagnetism is displayed by the $\mathrm{Spin}^c$-structure, see e.g. Appendix D of [8]. As a generalization of the notion of a spin structure, it consists of a principal $\mathrm{Spin} \times_{\mathbb{Z}_2} U(1)$-bundle $P_{\mathrm{Spin}^c}(M)$ and a twofold covering map $\xi^c$ onto the (fibre-)product of the orthonormal tangent frame bundle and a U(1)-bundle,

$$\xi^c : P_{\mathrm{Spin}^c}(M) \to (P_{SO} \times P_{U(1)})(M). \tag{1}$$

A $\mathrm{Spin}^c$-structure may be defined even if no spin structure exists. Any complex vector bundle carries a canonical $\mathrm{Spin}^c$-structure, and so every almost complex manifold is a $\mathrm{Spin}^c$-manifold. It was also proven that any oriented four-dimensional Riemannian manifold possesses a $\mathrm{Spin}^c$-structure. Given a $\mathrm{Spin}^c$-structure, charged Dirac spinors find a natural interpretation as sections of a vector bundle associated to $P_{\mathrm{Spin}^c}(M)$ [4]. In the $\mathrm{Spin}^c$-structure (1) the topology of the orthonormal frame bundle is tied to that of the accompanying electromagnetic bundle in that the second Stiefel–Whitney class of the orthonormal bundle must be equal to the mod 2 reduction of the Chern class of the U(1)-bundle [8]. This geometrical framework for electromagnetism becomes troublesome when bosons are also taken into account, since it turns out that fermions always must have half the charge of bosons [4]. Nevertheless, in [2] the $\mathrm{Spin}^c$-structure was extended to non-abelian groups, and an attempt was made to consider even the electro-weak structure.

In this work we consider just a principal $\mathrm{Spin} \times U(1)$-bundle instead of the $\mathrm{Spin} \times_{\mathbb{Z}_2} U(1)$-bundle $P_{\mathrm{Spin}^c}(M)$ and arrive at a different connection between the topology of the spacetime and that of the electromagnetic structure. Let $M$ be an $n$-dimensional spacetime manifold with metric $g$ of arbitrary signature, and consider first the bundle structure

$$\tilde\xi : (P_{\mathrm{Spin}} \times P_{U(1)})(M) \longrightarrow F_c(M), \tag{2}$$

where $(P_{\mathrm{Spin}} \times P_{U(1)})(M)$ denotes the product of some spin bundle and an as yet undetermined principal U(1)-bundle. The bundle $F_c(M)$ denotes the trivially complexified frame bundle. Obviously, such a complexification is algebraically necessary to account for the unitary group. Note that the bundle map $\tilde\xi$ is not surjective.

The idea behind (2) is to view both the spin and electromagnetic structures as originating from the geometry of the spacetime itself. The existence of a spin structure not only depends on the topology of the spacetime but, when $g$ has mixed signature, also on the geometry (i.e. causal structure) itself [3]. If a spin structure exists, then there may exist non-isomorphic ones if $H^1(M, \mathbb{Z}_2)$ is non-trivial. Thus in a theory like general relativity, where the metric is dynamical, consideration of spin structure deformations is an indispensable consequence. The bundle structure (2) ties these deformations to those of the electromagnetic structure. Contrary to the $\mathrm{Spin}^c$-structure $P_{\mathrm{Spin}^c}(M)$ (1), which may be viewed as the product of a possibly non-existent spin bundle and a possibly non-existent square root of some U(1)-bundle [8], we require both principal bundles representing spin and charge to exist in (2). On the other hand, the product of these two bundles $(P_{\mathrm{Spin}} \times P_{U(1)})(M)$ should be mapped effectively into the frame bundle itself and not merely onto some product of the frame bundle and a U(1)-bundle, since otherwise the U(1)-structure would be completely detached from the spacetime topology.

In view of the complexification of the frame bundle in (2), it may seem strange that one still considers a real spin bundle, since its underlying orthonormal bundle is necessarily a real structure. In fact, in this work we shall not consider (2) directly, but rather its complexified version,

$$\tilde\xi : (P_{\mathrm{CSpin}} \times P_{\mathbb{C}^*})(M) \longrightarrow F_c(M). \tag{3}$$

Here U(1) is replaced by the group of non-zero complex numbers $\mathbb{C}^*$, which however is only a minor change related to technical issues when discussing connection 1-forms. More important is the extension of the real spin group to the complex spin group CSpin, which, by definition, covers the complex orthonormal group. In this way, the structure (3) makes contact with complex general relativity and, in particular, with Ashtekar theory [1]. One standard way to arrive at the Ashtekar theory of general relativity is to consider first complex general relativity in terms of a complex Lagrangian based on complex tetrads and a complex self-dual Lorentz connection. After performing a Legendre transform to the Hamiltonian framework, one extracts the real general relativity by placing a simple reality condition on the complex phase space. Concerning the complex geometry considered in this work, in order to arrive at a real theory, a natural reality condition shall be placed upon the complex bundle structure (3).

Provided some Lie algebra identities hold, via pull-back with $\tilde\xi$ one can obtain from a given linear connection 1-form on the frame bundle $F_c(M)$ an appropriate connection on the product bundle $(P_{\mathrm{CSpin}} \times P_{\mathbb{C}^*})(M)$. As we shall detail in this work, the proposed bundle structure (3) not only ties the topology of possible electromagnetic bundles to that of the spacetime, but also allows for the deduction of an electromagnetic potential and a Dirac operator from the geometrical setting of the underlying spacetime.

The organization of this paper is as follows. In the next section we explain the fibre bundle geometric structures involved in (3) and the restrictions to be placed upon it. In Sect. 3 we shall discuss topological implications, especially those on the electromagnetic structure. The fourth section discusses the connection 1-forms and the Dirac operator. Section 5 contains a summary and discussion.

2. Fibre Bundle for Spin and Charge

Let $M$ be an $n$-dimensional manifold, not necessarily orientable, and define the internal metric to be the $n \times n$ matrix $\eta = (\eta_{ab}) = (\eta^{ab}) = \mathrm{diag}(1, \dots, 1, -1, \dots, -1)$ with signature $(k, l)$, $k + l = n$. The complex orthonormal group $\mathrm{CSO} = \mathrm{CSO}(\eta)$ with respect to $\eta$ consists of all complex $n \times n$ matrices $L$ with $L^T \eta L = \eta$ and determinant one. Let CSpin denote the corresponding spin group of CSO, see e.g. [13,8], where we take only the component connected with the identity, such that the spin mapping $\xi_0 : \mathrm{CSpin} \to \mathrm{CSO}$ is a twofold mapping as in the real case.


For example, if $\eta = \mathrm{diag}(1, -1, -1, -1)$, then $\mathrm{CSpin} \cong SL(2, \mathbb{C}) \times SL(2, \mathbb{C})$. To make the spin mapping explicit, define the vector space isomorphism from the complex Minkowski space $(\mathbb{C}^4, \eta)$ to the space of $2 \times 2$ matrices

$$\sim\ : x = (x^a) \mapsto \underset{\sim}{x} = \begin{pmatrix} x^0 + x^3 & x^1 - i x^2 \\ x^1 + i x^2 & x^0 - x^3 \end{pmatrix},$$

where the identity $x^T \eta x = \det \underset{\sim}{x}$ holds. Then the spin mapping $\xi_0(A, B)$, $A, B \in SL(2, \mathbb{C})$, is given by $\underset{\sim}{(\xi_0(A, B)(x))} := A \underset{\sim}{x} B^\dagger$, where $\dagger$ denotes Hermitian conjugation, see e.g. [13].

As was said in the Introduction, the proposed bundle map (3) is subject to natural requirements which we now explain. The spin bundle $P_{\mathrm{CSpin}}(M)$ in (3) naturally defines an orthonormal bundle by spin mapping, which shall be denoted by $P_{\mathrm{CSO}}(M)$. (More precisely, if $\{S_{\alpha\beta}\}$ denotes a family of transition functions for a given covering by local trivializations of the spin bundle, then the corresponding transition functions of the CSO-bundle $P_{\mathrm{CSO}}(M)$ are given by $\{\xi_0(S_{\alpha\beta})\}$.) A theory of gravitation along the lines of general relativity requires an orthonormal (or Lorentz) connection and thus implicitly the corresponding orthonormal bundle. Therefore, the bundle map $\tilde\xi$ in (3) cannot be just any map, but we require $\tilde\xi$ to factorize as follows:

$$(P_{\mathrm{CSpin}} \times P_{\mathbb{C}^*})(M) \xrightarrow{\ \xi \times \mathrm{id}\ } (P_{\mathrm{CSO}} \times P_{\mathbb{C}^*})(M) \xrightarrow{\ \Theta\ } F_c(M). \tag{4}$$

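Here $\xi$ denotes the spin bundle mapping $\xi : P_{\mathrm{CSpin}}(M) \to P_{\mathrm{CSO}}(M)$, and $\Theta$ is an as yet undetermined bundle mapping. To characterize $\Theta$, let us consider the following chain of Lie group homomorphisms:

$$\mathrm{CSpin} \times \mathbb{C}^* \xrightarrow{\ \xi_0 \times \mathrm{id}\ } \mathrm{CSO} \times \mathbb{C}^* \xrightarrow{\ \theta_0\ } G \xrightarrow{\ j_0\ } GL_c. \tag{5}$$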

The homomorphism $\theta_0$ is the multiplication of a complex orthonormal matrix with a non-zero complex number, yielding an element of the complex general linear group $GL_c = GL(n; \mathbb{C})$. Let $G$ be the image group of $\theta_0$ and $j_0$ be its inclusion into $GL_c$. The chain of homomorphisms (5) may be thought of as that of structure groups of principal fibre bundles. Correspondingly, we require the following chain of bundle mappings to exist for (3) as a continuation of the spin structure (4),

$$(P_{\mathrm{CSpin}} \times P_{\mathbb{C}^*})(M) \xrightarrow{\ \xi \times \mathrm{id}\ } (P_{\mathrm{CSO}} \times P_{\mathbb{C}^*})(M) \xrightarrow{\ \theta\ } P_G(M) \xrightarrow{\ j\ } F_c(M). \tag{6}$$

Here the bundle maps are denoted by the same letters as their accompanying Lie group homomorphisms in (5) and the principal bundles themselves by the letter $P$. Exceptionally, $F_c(M)$ denotes the complexified frame bundle. We have $\Theta = j \circ \theta$. The complexification in $F_c(M)$ is necessary for the consideration of the $\mathbb{C}^*$-structure. However, following the idea mentioned in the Introduction, namely that both the spin structure and the electromagnetic bundle be based on the intrinsic geometry of the spacetime manifold itself, we demand the inclusion of $P_G(M)$ in $F_c(M)$ be $\mathbb{C}$-trivial, i.e. $P_G(M)$ be without a “complex twist” and reducible to some real principal subbundle of the real tangent frame bundle $F(M)$. This is the reality condition to be placed upon (3). We make the following

Definition. The pair $(M, \eta)$ is said to be a charged spin manifold if the bundle structure (6) exists and if therein the bundle $P_G(M)$ is reducible to a real subbundle of the frame bundle $F(M)$.

The complexification of the frame bundle is needed purely to match the algebraic properties (5), and, as we shall see below, it may become geometrically significant only in special cases.
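The explicit spin mapping of this section for $\eta = \mathrm{diag}(1, -1, -1, -1)$ can be checked numerically. The following is a minimal sketch in plain Python, not part of the original construction: all function names (`tilde`, `untilde`, `spin_map`, and so on) and the sample matrices are hypothetical choices made for illustration. It checks the identity $x^T \eta x = \det \underset{\sim}{x}$, the invariance of this quadratic form under $\xi_0(A, B)$, and the twofold nature of the mapping, $\xi_0(-A, -B) = \xi_0(A, B)$.

```python
# Illustrative check of the spin mapping for eta = diag(1, -1, -1, -1).
# All names below are hypothetical helpers, not notation from the paper.

def matmul(X, Y):
    """Product of two matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def dagger(X):
    """Hermitian conjugate of a matrix."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

def det2(X):
    """Determinant of a 2x2 matrix."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def tilde(x):
    """Map x = (x0, x1, x2, x3) in C^4 to its 2x2 matrix."""
    x0, x1, x2, x3 = x
    return [[x0 + x3, x1 - 1j * x2], [x1 + 1j * x2, x0 - x3]]

def untilde(m):
    """Inverse of tilde: recover (x0, x1, x2, x3) from the 2x2 matrix."""
    return ((m[0][0] + m[1][1]) / 2,
            (m[0][1] + m[1][0]) / 2,
            (m[1][0] - m[0][1]) / 2j,
            (m[0][0] - m[1][1]) / 2)

def eta_norm(x):
    """Complex bilinear form x^T eta x (no complex conjugation)."""
    x0, x1, x2, x3 = x
    return x0 * x0 - x1 * x1 - x2 * x2 - x3 * x3

def spin_map(A, B, x):
    """Image of x under xi_0(A, B), defined through A x~ B^dagger."""
    return untilde(matmul(matmul(A, tilde(x)), dagger(B)))

def negate(M):
    return [[-e for e in row] for row in M]

A = [[1 + 1j, 2], [1j, 1.5 + 0.5j]]   # sample matrix with det A = 1
B = [[2, 1j], [-1j, 1]]               # sample matrix with det B = 1
x = (1 + 2j, 0.5, -3, 2 - 1j)         # sample vector in C^4

y = spin_map(A, B, x)
y_neg = spin_map(negate(A), negate(B), x)

print("defect of x^T eta x = det x~ :", abs(det2(tilde(x)) - eta_norm(x)))
print("defect of invariance         :", abs(eta_norm(y) - eta_norm(x)))
print("xi_0(-A,-B) equals xi_0(A,B) :",
      all(abs(p - q) < 1e-12 for p, q in zip(y, y_neg)))
```

Invariance follows because $\det(A \underset{\sim}{x} B^\dagger) = \det \underset{\sim}{x}$ whenever $\det A = \det B = 1$; the sketch only confirms this for one sample and is not a proof.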


3. Topological Implications

To discuss topological implications of a charged spin manifold, we concentrate on the bundle map $\theta$ in (6). Let $\{U_\alpha\}_{\alpha\in A}$ be a covering of $M$ by contractible open sets such that any bundle over each of these sets can be trivialized [5]. Let $\{\Lambda_{\alpha\beta}\}$ be a 1-cocycle representation of the bundle $P_{\mathrm{CSO}}(M)$ in (6), i.e. each $\Lambda_{\alpha\beta}$ is a CSO-valued transition function on the overlap $U_\alpha \cap U_\beta$ between two local trivializations fulfilling the cocycle condition $\Lambda_{\alpha\beta} \Lambda_{\beta\gamma} \Lambda_{\gamma\alpha} = 1$. Similarly, let $\{c_{\alpha\beta}\}$ be a cocycle representation of $P_{\mathbb{C}^*}(M)$ in (6). The requirement that the image of these two principal bundles under $\theta$ is reducible to a real subbundle of $F(M)$ implies the existence of a gauge transformation such that for the embedded principal $G$-bundle its transition functions $\Lambda_{\alpha\beta} c_{\alpha\beta}$ are always real matrix-valued functions, which is now assumed. Then $(\Lambda_{\alpha\beta} c_{\alpha\beta})^T \eta (\Lambda_{\alpha\beta} c_{\alpha\beta}) = c_{\alpha\beta}^2 \eta$, and so $c_{\alpha\beta}^2$ must be real-valued. If $c_{\alpha\beta}^2 > 0$, both $c_{\alpha\beta}$ and $\Lambda_{\alpha\beta}$ are real-valued. If, on the other hand, $c_{\alpha\beta}^2 < 0$, then $c_{\alpha\beta}$ is purely imaginary, and there must exist some function $\Lambda'_{\alpha\beta}$ with values in the real general linear group, such that $\Lambda_{\alpha\beta} = i \Lambda'_{\alpha\beta}$. This implies $-\eta = -\Lambda_{\alpha\beta}^T \eta \Lambda_{\alpha\beta} = \Lambda'^T_{\alpha\beta} \eta \Lambda'_{\alpha\beta}$. Since the index of a symmetric matrix $A$ is preserved under the transformation $A \mapsto W^T A W$, where $W$ is an arbitrary real regular matrix, the index of $-\eta$, namely $(l, k)$, must be equal to that of $\eta$, $(k, l)$, i.e. $k = l$.

We conclude that if $k \neq l$, then the cocycle $\{c_{\alpha\beta}\}$ defines a (real) $GL(1; \mathbb{R})$-bundle, which is always reducible to a $\mathbb{Z}_2$-bundle $\{z_{\alpha\beta}\} := \{c_{\alpha\beta}/|c_{\alpha\beta}|\}$. Moreover, $\Lambda_{\alpha\beta}$ must then be real and so belongs to the real orthonormal group with respect to $\eta$, $SO = SO(\eta)$.

For the remaining case $k = l$ let ISO be the set of purely imaginary elements of CSO, and select an arbitrary element of ISO, e.g. $I := \begin{pmatrix} 0 & i\mathbf{1} \\ i\mathbf{1} & 0 \end{pmatrix} \in \mathrm{ISO}$ ($\mathbf{1}$ is the $k \times k$ unity matrix). It is easy to show that $\mathrm{ISO} = \{I\Lambda \mid \Lambda \in SO\}$ and that $\mathrm{ISO} \cup SO$ makes up a subgroup of CSO with four connected components.
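The claim that the block matrix $I$ belongs to CSO when $k = l$ can be verified directly for small $k$. The sketch below is a plain-Python illustration under stated assumptions: the helper names (`eta_matrix`, `imaginary_element`, `det`) are hypothetical, and only $k = 1, 2, 3$ are tried. It checks $I^T \eta I = \eta$ and $\det I = 1$, and that the real matrix $\Lambda' = I/i$ satisfies $\Lambda'^T \eta \Lambda' = -\eta$, which is the relation used in the index argument.

```python
# Illustrative check that I = [[0, i*1], [i*1, 0]] lies in CSO(eta) for k = l.
# Helper names are hypothetical; matrices are nested lists.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def det(X):
    """Determinant by Laplace expansion along the first row (small n only)."""
    if len(X) == 1:
        return X[0][0]
    total = 0
    for j, entry in enumerate(X[0]):
        if entry == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in X[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def eta_matrix(k, l):
    """Internal metric diag(1, ..., 1, -1, ..., -1) with signature (k, l)."""
    n = k + l
    return [[(1 if i < k else -1) if i == j else 0 for j in range(n)]
            for i in range(n)]

def imaginary_element(k):
    """The 2k x 2k block matrix with i*1 in both off-diagonal blocks."""
    n = 2 * k
    I = [[0] * n for _ in range(n)]
    for i in range(k):
        I[i][k + i] = 1j
        I[k + i][i] = 1j
    return I

for k in (1, 2, 3):
    eta = eta_matrix(k, k)
    I = imaginary_element(k)
    preserves_eta = matmul(transpose(I), matmul(eta, I)) == eta
    # Real matrix Lambda' with I = i * Lambda'
    Lp = [[(e / 1j).real for e in row] for row in I]
    minus_eta = [[-e for e in row] for row in eta]
    flips_sign = matmul(transpose(Lp), matmul(eta, Lp)) == minus_eta
    print(k, preserves_eta, det(I) == 1, flips_sign)
```

For $k \neq l$ no real $\Lambda'$ with $\Lambda'^T \eta \Lambda' = -\eta$ can exist, by the index argument in the text; the sketch does not attempt to test that negative statement.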
All transition functions $\Lambda_{\alpha\beta}$ have their values in this subgroup. Concerning the $\mathbb{C}^*$-bundle, it is obvious that all its transition functions can be re-scaled to take values in $\mathbb{Z}_4 := \{\pm 1, \pm i\}$. We have shown

Proposition 1. Let $(M, \eta)$ be a charged spin manifold with signature $(k, l)$. If $k \neq l$, then in the diagram (6) the CSO-bundle reduces to an SO-bundle, and the $\mathbb{C}^*$-bundle reduces to a $\mathbb{Z}_2$-bundle. If $k = l$, the CSO-bundle reduces to an $(SO \cup \mathrm{ISO})$-bundle and the $\mathbb{C}^*$-bundle to a $\mathbb{Z}_4$-bundle.

If $k = l$, then in general neither the CSO-bundle nor the $\mathbb{C}^*$-bundle can be reduced further, and the image of these bundles under the map $\theta$ does not allow an easy interpretation as it does in the other case $k \neq l$. We henceforth assume $k \neq l$. The product bundle $(P_{\mathrm{CSO}} \times P_{\mathbb{C}^*})(M)$ in (6) reduces to an $SO \times \mathbb{Z}_2$-bundle which we denote by $(P_{SO} \times P_{\mathbb{Z}_2})(M)$ and represent by the cocycle $\{(\Lambda_{\alpha\beta}, z_{\alpha\beta})\}$. The image of this cocycle under $\theta$, the $G$-bundle $P_G(M)$ in (6) represented by the cocycle $\{\Lambda_{\alpha\beta} z_{\alpha\beta}\}$, also reduces to a real bundle. The multiplication of an orthonormal matrix $\Lambda_{\alpha\beta}$ with $z_{\alpha\beta} = \pm 1$ gives in general an orthogonal matrix with determinant $\pm 1$, i.e. an element of the real orthogonal group with respect to $\eta$, $O = O(\eta)$. Therefore, $P_G(M)$ reduces to an orthogonal subbundle of the real tangent frame bundle $F(M)$. In order to determine which $SO \times \mathbb{Z}_2$-bundles $\{(\Lambda_{\alpha\beta}, z_{\alpha\beta})\}$ are allowed in the diagram (6), let an orthogonal subbundle $P_O(M)$ (as a reduction of $P_G(M)$) of the tangent


frame bundle be given first, $\{L_{\alpha\beta}\}$ being its cocycle representation. Modulo gauge transformations the SO-bundle is obtained by multiplying the transition functions of $P_O(M)$ by the $\mathbb{Z}_2$-cocycles, $\{\Lambda_{\alpha\beta}\} = \{L_{\alpha\beta} z_{\alpha\beta}\}$. If the spacetime dimension $n$ is even, then the determinant of an $(n \times n)$-matrix remains the same upon multiplication by $\pm 1$. Thus we cannot start with any orthogonal subbundle in $F(M)$, but only with orthonormal ones, such that always $\det L_{\alpha\beta} = 1$. At the same time, the cocycle $\{z_{\alpha\beta}\}$ can be any element of $H^1(M, \mathbb{Z}_2)$. If $n$ is odd, a multiplication with $-1$ changes the sign of the determinant. Since $\{\Lambda_{\alpha\beta}\}$ is to define an orthonormal bundle, the determinant of each $O$-valued function $L_{\alpha\beta}$ must be equal to the value of the corresponding function $z_{\alpha\beta}$, which guarantees that the product of both functions has determinant one. Modulo gauge transformation the $\mathbb{Z}_2$-cocycle is actually the first Stiefel–Whitney class of the orthogonal subbundle. We obtain

Proposition 2. In the situation of Proposition 1 let $k \neq l$. If $n$ is even, the $\mathbb{Z}_2$-bundle $P_{\mathbb{Z}_2}(M)$ can represent any element of $H^1(M, \mathbb{Z}_2)$, and the orthonormal bundle $P_{SO}(M)$ is obtained by tensoring a tangent orthonormal bundle with this $\mathbb{Z}_2$-bundle. If $n$ is odd, $P_{SO}(M)$ is obtained by tensoring a tangent orthogonal bundle with the $\mathbb{Z}_2$-bundle representing its own Stiefel–Whitney class.

Note that whereas $P_O(M)$ is a subbundle of the frame bundle, this is not necessarily so for $P_{SO}(M)$. We further remark that the second Stiefel–Whitney class of $P_{SO}(M)$ must vanish in order for a spin structure to exist. Since the electromagnetic bundle is obtained by complexifying the real bundle $P_{\mathbb{Z}_2}(M)$, we have the following result, see e.g. [11,3].

Proposition 3. The Chern class of the electromagnetic bundle $P_{\mathbb{C}^*}(M)$ is given by

$$c_1(P_{\mathbb{C}^*}(M)) = b(w_1), \tag{7}$$

where $b$ is the Bockstein homomorphism of the coefficient sequence $0 \to \mathbb{Z} \xrightarrow{\times 2} \mathbb{Z} \to \mathbb{Z}_2 \to 0$ and $w_1$ is the first Stiefel–Whitney class of the underlying $\mathbb{Z}_2$-bundle $P_{\mathbb{Z}_2}(M)$. In particular, $c_1(P_{\mathbb{C}^*}(M)) = 0$ if and only if $w_1$ is the mod 2 reduction of an integral class in $H^1(M, \mathbb{Z})$.

As the complexification of a real bundle, the electromagnetic bundle satisfies the relation $2 c_1(P_{\mathbb{C}^*}(M)) = 0$, and so the image of the Chern class under the map $H^2(M, \mathbb{Z}) \to H^2(M, \mathbb{R})$ is always zero. This means that the spacetime integral of $\frac{i}{2\pi} F_A$, where $F_A$ is the curvature of an electromagnetic potential $A$ on the bundle, is always zero. $w_1$ is the mod 2 reduction of an integral class if, for example, the first cohomology group $H^1(M, \mathbb{Z})$ of the spacetime has no torsion elements of odd degree [11,3]. Note the difference between (7) and the $\mathrm{Spin}^c$-structure, where the Chern class of the electromagnetic bundle is related to the second Stiefel–Whitney class of the orthonormal bundle.

4. Connections

Having the bundle geometry (6) at hand, we shall now discuss how a general linear connection may give rise to a Dirac operator with electromagnetic potential. Let $\omega$ be a linear connection on the (complexified) frame bundle $F_c(M)$. Then, as a 1-form, $\omega$ always can be pulled back along the bundle maps in (6) to give a 1-form


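on each of the bundles. In order for these 1-forms to be actually connections, certain subsidiary algebraic conditions must hold [7], which we now want to illustrate. Let $\mathfrak{gl}$ denote the Lie algebra of the general linear group $GL = GL(n; \mathbb{R})$ and $\mathfrak{so}$ that of the orthonormal group $SO(\eta)$. The Lie algebra of $GL_c$ may be decomposed as

$$\mathbb{C} \otimes \mathfrak{gl} = \mathbb{C} \otimes \mathfrak{so} \oplus \mathbb{C} \oplus \mathfrak{n}, \qquad A = \tfrac{1}{2}(A - \eta A^T \eta) + \tfrac{1}{n} \mathrm{tr} A + \tfrac{1}{2}\left(A + \eta A^T \eta - \tfrac{2}{n} \mathrm{tr} A\right). \tag{8}$$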
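The decomposition (8) can be made concrete with exact rational arithmetic. The following plain-Python sketch is illustrative only: the function names, the sample matrix and the rational boost $\Lambda$ (with $\cosh = 5/3$, $\sinh = 4/3$) are assumptions chosen for the example, and the trace terms in (8) are read as multiples of the unit matrix. It splits a matrix $A$ into its three parts and shows that conjugation by an element of $SO(\eta)$ maps the $\mathfrak{n}$-part back into $\mathfrak{n}$, i.e. into trace-free matrices $N$ with $\eta N^T \eta = N$.

```python
from fractions import Fraction

# Illustrative check of decomposition (8) for eta = diag(1, -1, -1, -1).
# Names and sample data are hypothetical; eta is stored as its diagonal.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def eta_conj_transpose(X, eta):
    """Return eta X^T eta for a diagonal metric eta given as a list of +-1."""
    n = len(X)
    return [[eta[i] * X[j][i] * eta[j] for j in range(n)] for i in range(n)]

def decompose(A, eta):
    """Split A into its so-part, trace part and n-part as in (8)."""
    n = len(A)
    half = Fraction(1, 2)
    trace_coeff = Fraction(sum(A[i][i] for i in range(n)), n)
    B = eta_conj_transpose(A, eta)
    so_part = [[half * (A[i][j] - B[i][j]) for j in range(n)] for i in range(n)]
    trace_part = [[trace_coeff if i == j else Fraction(0) for j in range(n)]
                  for i in range(n)]
    n_part = [[half * (A[i][j] + B[i][j]) - trace_part[i][j] for j in range(n)]
              for i in range(n)]
    return so_part, trace_part, n_part

eta = [1, -1, -1, -1]
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
so_part, trace_part, n_part = decompose(A, eta)

# A boost in the 0-1 plane with rational entries, an element of SO(eta).
c, s = Fraction(5, 3), Fraction(4, 3)
Lam = [[c, s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
Lam_inv = eta_conj_transpose(Lam, eta)      # inverse of Lam, since Lam^T eta Lam = eta

# Adjoint action on the n-part.
N_conj = matmul(matmul(Lam, n_part), Lam_inv)

total = [[so_part[i][j] + trace_part[i][j] + n_part[i][j] for j in range(4)]
         for i in range(4)]
print("parts sum to A:", total == A)
print("n-part stays in n:", eta_conj_transpose(N_conj, eta) == N_conj,
      sum(N_conj[i][i] for i in range(4)) == 0)
```

Multiplication by a non-zero complex number commutes with conjugation, so checking an element of $SO(\eta)$ covers the orthonormal factor of $G$ in this real example; the complex case is analogous but is not tested here.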

We note that $\mathbb{C} \otimes \mathfrak{so} \oplus \mathbb{C}$ is the Lie algebra of the product group $\mathrm{CSO} \times \mathbb{C}^*$, which is mapped isomorphically under (the Lie algebra homomorphism of) $\theta_0$ onto the Lie algebra of $G$, which we name $\mathfrak{g}$. (The Lie group homomorphism $\theta_0$ itself is not one to one.) The vector subspace $\mathfrak{n}$ defined in (8) can easily be shown to be invariant under the adjoint action of $G$. Looking at the inclusion map $j : P_G(M) \to F_c(M)$ in (6), this algebraic condition implies that we can obtain from any connection 1-form $\omega$ on $F_c(M)$ a $G$-connection on $P_G(M)$ by pulling $\omega$ back with $j$ and then projecting its components to the Lie algebra of $G$, see e.g. [7]. Let us denote the connection so created by $\omega_G$,

$$\omega_G = j^* \omega \,|_{\mathfrak{g}}. \tag{9}$$

Moving backwards in the bundle diagram (6) from $P_G(M)$ to $(P_{\mathrm{CSO}} \times P_{\mathbb{C}^*})(M)$, we want to obtain a connection on the latter bundle from $\omega_G$. To this end, we note that given a bundle homomorphism $f : P(M) \to Q(M)$ such that the accompanying Lie algebra homomorphism $f_0$ is an isomorphism, any connection $\gamma$ on $Q(M)$ defines a connection on $P(M)$ by $f_0^{-1}(f^* \gamma)$, see e.g. [7]. For the case at hand, the Lie algebra homomorphism of $\theta_0$ is indeed an isomorphism, and the connection on $(P_{\mathrm{CSO}} \times P_{\mathbb{C}^*})(M)$ obtained from $\omega_G$ is given by

$$\omega_a := \theta_0^{-1} \theta^* \omega_G. \tag{10}$$

It is not difficult to show that a connection on a fibre-product bundle uniquely determines connections on each of the product bundles, which can be explicitly constructed. For the case at hand, the connection $\omega_a$ uniquely determines an orthonormal connection on $P_{\mathrm{CSO}}(M)$ which shall be called $\omega_{\mathrm{CSO}}$, and a potential on $P_{\mathbb{C}^*}(M)$ which we term $\omega_{\mathbb{C}^*}$. Let $p$ and $q$ be the canonical projections on $P_{\mathrm{CSO}}(M)$ and $P_{\mathbb{C}^*}(M)$, respectively. Then between the respective connections one has [7]

$$\omega_a = p^* \omega_{\mathrm{CSO}} + q^* \omega_{\mathbb{C}^*}. \tag{11}$$

As the next step the spin bundle map $\xi$ in (6) yields, again by the same argument as in the case of $\theta$, a spin connection on $P_{\mathrm{CSpin}}(M)$,

$$\omega_{\mathrm{CSpin}} := \xi_0^{-1} \xi^* \omega_{\mathrm{CSO}}. \tag{12}$$

We have shown

Proposition 4. For a charged spin manifold any complex linear connection on $F_c(M)$ yields a spin connection $\omega_{\mathrm{CSpin}}$ and a $\mathbb{C}^*$-potential $\omega_{\mathbb{C}^*}$ via pull-backs along the diagram (6).


We mention the trivial but important fact that, although we have started with a linear connection $\omega$ on $F_c(M)$, which may be subject to any $GL_c$-gauge transformation, the obtained spin connection $\omega_{\mathrm{CSpin}}$ in fact only transforms under spin gauge transformations, since it is defined on a spin bundle and not on the original frame bundle. Analogously, the $\mathbb{C}^*$-connection undergoes only $\mathbb{C}^*$-transformations. In particular, there is nothing like a twofold covering of the whole frame bundle.

With the help of the spin connection a covariant spinor derivative can be deduced. As for the Dirac operator, it is not possible to define it in the standard way, but one has to modify the usual procedure. In order to illustrate these fine points, we give a fairly detailed derivation of first the covariant derivative and then of the Dirac operator. To keep the following presentation lucid, we discard the electromagnetic structure. It can be incorporated in a standard fashion once the Dirac operator for neutral spinors is defined. Furthermore, we make use of Proposition 1 and henceforth consider only real bundle structures.

Let $\zeta : \mathrm{Spin} \to GL(V)$ be a representation of the spin group on a complex vector space $V$. A spinor field $\Psi$ is a cross section of the associated vector bundle $S(M) := P_{\mathrm{Spin}}(M) \times_\zeta V$, and if $\hat\sigma$ is a local section of $P_{\mathrm{Spin}}(M)$, the spinor field can be expressed as an equivalence class $\Psi = [\hat\sigma, \psi]$, where $\psi$ is a $V$-valued local function. The covariant spinor derivative is a certain linear mapping from the space of sections of $S(M)$ to the space of sections of the tensor product of the cotangent bundle and $S(M)$, $\nabla : \Gamma(S(M)) \to \Gamma(T^*M \otimes S(M))$. Let $\omega_{\mathrm{Spin}}$ be the spin connection (12) for the case of a real spin structure. With respect to a tangent vector $X \in T_x M$, the covariant derivative is defined as

$$\nabla_X \Psi = [\hat\sigma,\ X\psi + \zeta(\omega_{\mathrm{Spin}}(d\hat\sigma\, X))\psi]. \tag{13}$$

If $(e_a) = (e_a{}^\mu \partial_\mu)$, $a = 1, \dots, n$, denotes a local orthonormal frame field in the tangent bundle $TM$, and $(e^a) = (e^a{}_\mu dx^\mu)$ its dual, then the covariant derivative can be written as

$$\nabla = e^a \otimes \nabla_{e_a}, \tag{14}$$

where Einstein summation convention is understood [8]. In order to define a Dirac operator, we choose $\zeta$ to be an algebra representation of the Clifford algebra Cl on some left module $V$, enclosing the representation of the spin group Spin. The corresponding spinor fields on $S(M)$ are termed Dirac spinors. Let $\hat{T}M := P_{SO}(M) \times_{\mathrm{id}} \mathbb{R}^n$ be the vector bundle associated to the orthonormal bundle, on which a natural pseudo-Riemannian metric $\hat{g}$ can be defined. (If $[u, v]$ and $[u, w]$ represent two elements of $\hat{T}M$, $u \in P_{SO}(M)$, $v, w \in \mathbb{R}^n$, then $\hat{g}([u, v], [u, w]) := v^T \eta w$.) The metric on $\hat{T}M$ defines an isomorphism of its dual bundle $\hat{T}^*M$ with the bundle itself, which shall be denoted by the same letter $\hat{g}$. Let $\mathrm{Cl}(\hat{T}M)$ be the Clifford bundle of $\hat{T}M$ [8]. Note that $\hat{T}M$ is contained in $\mathrm{Cl}(\hat{T}M)$. Let $\mathrm{Ad} : \mathrm{Spin} \to \mathrm{Aut}(\mathrm{Cl})$ denote the adjoint representation of the spin group, and define the associated algebra bundle $P_{\mathrm{Spin}}(M) \times_{\mathrm{Ad}} \mathrm{Cl}$. As is well known, this is precisely the Clifford bundle $\mathrm{Cl}(\hat{T}M)$. Let $\gamma \in \Gamma(\mathrm{Cl}(\hat{T}M))$ be a (local) section of the Clifford bundle and $\Psi \in \Gamma(S(M))$ a Dirac spinor. By representing $\gamma$ and $\Psi$ as equivariant functions on the spin bundle $P_{\mathrm{Spin}}(M)$ the Clifford multiplication $\mu$ is defined by $\mu(\gamma \Psi)(s) := \zeta(\gamma(s)) \Psi(s)$, $s \in P_{\mathrm{Spin}}(M)$. The Clifford multiplication is also simply denoted by a dot. The considerations so far are valid for any abstract spin structure $P_{\mathrm{Spin}}(M)$ of an abstract SO vector bundle $\hat{T}M$. Now suppose the tangent case, i.e. where $\hat{T}M = TM$,


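the tangent bundle. Denoting the spaces of sections with $\Gamma$, the Dirac operator $D$ is then defined by

$$D : \Gamma(S(M)) \xrightarrow{\ \nabla\ } \Gamma(T^*M \otimes S(M)) \xrightarrow{\ \hat{g}\ } \Gamma(TM \otimes S(M)) \xrightarrow{\ \mu\ } \Gamma(S(M)). \tag{15}$$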

Locally, one starts with (14), identifies $e^a$ with $\hat{g}(e^a) = \epsilon(a) e_a$, $\epsilon(a) = +1$ for $a = 1, \dots, k$, and $\epsilon(a) = -1$ otherwise. One obtains the Dirac operator in local form via Clifford multiplication as $D = \epsilon(a) \zeta(e_a) \cdot \nabla_{e_a}$ [8,3].

Now turn back to the bundle geometry considered in this paper. The derivation of the Dirac operator (15) fails when trying to perform the Clifford multiplication, since $TM$ is in general not a subbundle of the Clifford bundle $\mathrm{Cl}(\hat{T}M)$. Nevertheless, for the special bundle setting, it is still possible to define the Dirac operator. The local frame $(e_a)$ is an element of the tangent orthogonal frame bundle $P_O(M)$ (i.e. the $O$-reduction of $P_G(M)$). In a local trivialization, it is related to an element $(f_a)$ of the abstract orthonormal bundle $P_{SO}(M)$ by $f_a = \pm e_a$, when taking the $\mathbb{Z}_2$-bundle into account. Thus locally, it is possible to define the covariant spinor derivative $\nabla_{f_a}$. Of course, such a definition can in general not be extended globally. However, the local expression $\epsilon(a) \zeta(f_a) \cdot \nabla_{f_a}$, which is well defined, is formally independent of the $\mathbb{Z}_2$-section, since both $\zeta$ and $\nabla$ are linear in $f_a$. Thus the local expression can be extended globally, yielding a Dirac operator.

Proposition 5. For a charged spin manifold with $k \neq l$, every complex linear connection yields a Dirac operator.

Example. $\eta = \mathrm{diag}(1, -1, -1, -1)$. Let $\sigma$ be a local cross-section of $P_G(M)$. Since this is a subbundle of the tangent frame bundle, one can express $\sigma$ as $\sigma = (e_a{}^\mu \partial_\mu)$, where $a = 1, \dots, 4$ are internal indices raised and lowered with the help of $\eta$. Let $\omega$ be a linear connection on $F_c(M)$. The pull-back of $\omega$ via $\sigma$ yields a local $\mathfrak{gl}$-valued connection 1-form on $M$, $\Gamma^a{}_{\mu b} E_a{}^b dx^\mu := \sigma^* \omega$, where $E_a{}^b$ is a matrix whose entry is 1 in the $a$-th row of the $b$-th column and 0 otherwise. By looking at the algebraic decomposition (8) the components of the pull-back of $\omega_G$ read $(\sigma^* \omega_G)^a{}_b = \frac{1}{2}(\Gamma^a{}_{\mu b} - \Gamma_{b\mu}{}^a) dx^\mu + \frac{1}{4} \Gamma^c{}_{\mu c} \delta^a{}_b dx^\mu$.

For simplicity, consider a trivial $\mathbb{C}^*$-bundle, such that $P_{\mathrm{CSO}}(M)$ is in fact a tangent orthonormal frame bundle, and so $\sigma$ can be viewed as a local section of $P_{\mathrm{CSO}}(M)$, too. Then, the orthonormal connection reads $\sigma^* \omega_{\mathrm{CSO}} = \frac{1}{2}(\Gamma^a{}_{\mu b} - \Gamma_{b\mu}{}^a) D_a{}^b dx^\mu$, where the $\mathfrak{so}$-generators are given by $D_a{}^b = \frac{1}{2}(E_a{}^b - \eta_{ac} E_d{}^c \eta^{db})$. Let $\hat{1}$ be the section of the trivial $\mathbb{C}^*$-bundle assigning to each point the value 1. Clearly, the $\mathbb{C}^*$-potential reads $\hat{1}^* \omega_{\mathbb{C}^*} = \frac{1}{4} \Gamma^c{}_{\mu c} dx^\mu$. Define $\hat\sigma$ as a section of $P_{\mathrm{CSpin}}(M)$, which maps to $\sigma$ under the spin map, $\xi(\hat\sigma) = \sigma$. With the help of $\hat\sigma$ the spin connection and the Dirac operator can be represented in local form on the spacetime manifold. Define the spin representation $\zeta : \mathrm{CSpin} \cong SL(2; \mathbb{C}) \times SL(2; \mathbb{C}) \to GL(4; \mathbb{C})$ by $\zeta(A, B) := \begin{pmatrix} A & 0 \\ 0 & (B^\dagger)^{-1} \end{pmatrix}$, see e.g. [13], and take the charge representation $\rho : \mathbb{C}^* \to GL(4; \mathbb{C})$ as $\rho(c) := c^q \mathbf{1}$. With the help of the spin mapping defined before and with the standard gamma matrices $\gamma^a$ in the chiral basis satisfying $\gamma^a \gamma^b + \gamma^b \gamma^a = 2 \eta^{ab} \mathbf{1}$, one can deduce that under the Lie algebra homomorphism $\zeta \xi_0^{-1}$ the generators $D_a{}^b$ are mapped to $-\frac{1}{4} \gamma^b \gamma_a$. Therefore, the Dirac operator reads locally as follows:

$$D = \gamma^a e_a{}^\mu \left( \partial_\mu - \frac{1}{4} \cdot \frac{1}{2} (\Gamma_{b\mu c} - \Gamma_{c\mu b}) \gamma^c \gamma^b + \frac{q}{4} \Gamma^b{}_{\mu b} \right). \tag{16}$$
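The Clifford relation $\gamma^a \gamma^b + \gamma^b \gamma^a = 2\eta^{ab}\mathbf{1}$ used in the Example can be checked for one concrete realization. The sketch below is an assumption-laden illustration: it takes $\gamma^0$ with unit matrices in the off-diagonal blocks and $\gamma^i$ with $\sigma^i$ and $-\sigma^i$ in the off-diagonal blocks, which is one common chiral-basis convention and need not be the one intended in [13]; indices run over 0, ..., 3 here, and the names `GAMMA`, `block`, `clifford_ok` are hypothetical.

```python
# Illustrative check of the Clifford relation for eta = diag(1, -1, -1, -1)
# in one common chiral-basis convention (an assumption, conventions vary).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def negate(M):
    return [[-e for e in row] for row in M]

ZERO = [[0, 0], [0, 0]]
ONE = [[1, 0], [0, 1]]
SIGMA = [
    [[0, 1], [1, 0]],        # Pauli matrix sigma^1
    [[0, -1j], [1j, 0]],     # Pauli matrix sigma^2
    [[1, 0], [0, -1]],       # Pauli matrix sigma^3
]

# gamma^0 and gamma^1, gamma^2, gamma^3
GAMMA = [block(ZERO, ONE, ONE, ZERO)] + \
        [block(ZERO, s, negate(s), ZERO) for s in SIGMA]
ETA = [1, -1, -1, -1]

def anticommutator(X, Y):
    XY, YX = matmul(X, Y), matmul(Y, X)
    return [[XY[i][j] + YX[i][j] for j in range(4)] for i in range(4)]

def clifford_ok():
    """True if gamma^a gamma^b + gamma^b gamma^a = 2 eta^{ab} 1 for all a, b."""
    for a in range(4):
        for b in range(4):
            expected = [[2 * ETA[a] if (a == b and i == j) else 0
                         for j in range(4)] for i in range(4)]
            if anticommutator(GAMMA[a], GAMMA[b]) != expected:
                return False
    return True

print("Clifford relation holds:", clifford_ok())
```

Any other basis related to this one by a similarity transformation satisfies the same relation, so the check does not depend on the particular convention chosen.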


When the connection is taken to be the Levi-Civita connection $\Gamma^a{}_{\mu b}$ (in anholonomic basis), which is already orthogonal, $\Gamma_{b\mu c} + \Gamma_{c\mu b} = 0$, the expression (16) gives back the standard Dirac operator in general relativity.

5. Summary and Discussion

Based on the physical viewpoint that both the spin geometry and the electromagnetic structure originate from the intrinsic topology and geometry of the underlying spacetime, a modest extension of the (pseudo-)Riemannian geometry of general relativity was proposed. It is expressed by the complexified bundle structure (3), whereupon a natural reduction requirement (6) is placed, followed by a reality condition. As a consequence, the electromagnetic structure and the spacetime geometry are topologically related. For even spacetime dimension the electromagnetic bundle originates from any of the possible $\mathbb{Z}_2$-bundles on the spacetime, and the accompanying orthonormal bundle is created by tensoring a tangent orthonormal frame bundle with exactly this $\mathbb{Z}_2$-bundle. For odd spacetime dimension the $\mathbb{Z}_2$-bundle turns out to be the determinant bundle of a tangent orthogonal frame bundle, and the orthonormal bundle $P_{\mathrm{CSO}}(M)$ is the tensor product of this tangent orthogonal frame bundle with its determinant bundle. Further, we have shown that any linear connection 1-form on the frame bundle yields a Dirac operator and an electromagnetic potential.

Regarding the topological, non-dynamical electromagnetic potentials considered in [12] arising from inequivalent spin structures, these can be simply added to the geometric potential in this work. For a $\mathrm{Spin}^c$-structure its electromagnetic part is the square root of the U(1)-bundle accompanying the orthonormal bundle [8], which causes a factor 2 between the boson charge and the spinor charge [4]. Contrary to this situation, from the bundle structure (6) it is obvious that no problems with boson charges occur in the present geometrical setting.
Besides the interrelation between spacetime geometry and electromagnetic structure, the extended geometry implies further physical consequences when considering the dynamics. Since the connection plays a central role in the present geometry, it is natural to formulate a gravitational theory with the first order formalism, where the dynamical variables are given by the components of the connection and of the soldering form. Usually, a soldering form gives an isomorphism between the tangent bundle $TM$ and an abstract $n$-dimensional vector bundle [14]. In the present setting the orthonormal connection is defined on an abstract orthonormal bundle $P_{SO}(M)$ (after reducing the complex bundle with Proposition 1), and its associated vector bundle $\hat{T}M$ is in general not isomorphic to the tangent bundle $TM$. Therefore, the soldering form is in general not a global isomorphism, i.e. it may be degenerate. Classically, such a degeneracy of the soldering form is not allowed. Note however, that the abstract metric $\hat{g}$ on $\hat{T}M$ is formally the same as the metric on the tangent bundle (induced by the orthogonal bundle $P_O(M)$), since this tangent orthogonal frame bundle is obtained from the abstract orthonormal bundle by tensoring with a $\mathbb{Z}_2$-bundle. (The formal equivalence can be shown by using arguments similar to those used to define the Dirac operator. Roughly speaking, vectors are locally multiplied with $\pm 1$, and so their bilinear product remains the same.) In particular, the metric itself is not degenerate. On the other hand, quantum mechanically the inclusion of degenerate soldering forms and even of degenerate metrics can remedy the unrenormalizability, at least of the 2 + 1 dimensional gravity [14]. Thus for quantum gravity the geometry of the present work might be of some relevance.


Other possible physical implications arise for odd dimensional spacetimes. According to Proposition 2, for odd dimensions the tangent orthogonal bundle need not be an orthonormal one. Therefore, non-orientable spacetime manifolds are not excluded from consideration in the present geometrical framework. Vacuum quantum gravity has already been considered for non-orientable spacetimes, e.g. for $\mathbb{R} \times$ (Klein bottle) in [9], and it would be interesting to investigate the extended geometry, which involves both gravity and electromagnetism, in such a context.

Acknowledgement. I am indebted to Prof. Martin Kretzschmar for his encouragement to this work and for his help in resolving an organizational problem.

References

1. Ashtekar, A.: Lectures on Non-Perturbative Canonical Gravity. Singapore: World Scientific, 1991
2. Back, A., Freund, P.G.O., Forger, M.: Phys. Lett. 77B, 181–184 (1978)
3. Baum, H.: Spin–Strukturen und Dirac–Operatoren über pseudoriemannschen Mannigfaltigkeiten. Leipzig: Teubner Verlag, 1981
4. Hawking, S.W., Pope, C.N.: Phys. Lett. 73B, 42–44 (1978)
5. Hirzebruch, F.: Topological Methods in Algebraic Geometry. Berlin: Springer-Verlag, 1978
6. Isham, C.J.: Proc. R. Soc. Lond. A 364, 591–599 (1978)
7. Kobayashi, S., Nomizu, K.: Foundations of Differential Geometry, Volume I. New York: John Wiley, 1963
8. Lawson, H.B., Michelson, M.-L.: Spin Geometry. Princeton: Princeton University Press, 1989
9. Louko, J.: Class. Quantum Grav. 12, 2441–2467 (1995)
10. Milnor, J.: Enseignement Math. 9, 198–203 (1963)
11. Milnor, J., Stasheff, J.D.: Characteristic Classes. Princeton: Princeton University Press, 1974
12. Petry, H.R.: J. Math. Phys. 20, 231–240 (1979)
13. Streater, R.F., Wightman, A.S.: PCT, Spin and Statistics, and All That. New York: Benjamin, 1964
14. Witten, E.: Nucl. Phys. B311, 46–78 (1988/89)

Communicated by H. Nicolai

Commun. Math. Phys. 209, 275 – 324 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Eisenstein Series and String Thresholds*

N. A. Obers¹, B. Pioline²

¹ Nordita and Niels Bohr Institute, Blegdamsvej 17, 2100 Copenhagen, Denmark. E-mail: [email protected]
² Centre de Physique Théorique, Ecole Polytechnique, Unité mixte CNRS UMR 7644, 91128 Palaiseau, France. E-mail: [email protected]

Received: 17 March 1999 / Accepted: 16 July 1999

Abstract: We investigate the relevance of Eisenstein series for representing certain $G(\mathbb{Z})$-invariant string theory amplitudes which receive corrections from BPS states only. $G(\mathbb{Z})$ may stand for any of the mapping class, T-duality and U-duality groups $Sl(d,\mathbb{Z})$, $SO(d,d,\mathbb{Z})$ or $E_{d+1(d+1)}(\mathbb{Z})$ respectively. Using $G(\mathbb{Z})$-invariant mass formulae, we construct invariant modular functions on the symmetric space $K\backslash G(\mathbb{R})$ of non-compact type, with $K$ the maximal compact subgroup of $G(\mathbb{R})$, that generalize the standard non-holomorphic Eisenstein series arising in harmonic analysis on the fundamental domain of the Poincaré upper half-plane. Comparing the asymptotics and eigenvalues of the Eisenstein series under second order differential operators with quantities arising in one- and $g$-loop string amplitudes, we obtain a manifestly T-duality invariant representation of the latter, conjecture their non-perturbative U-duality invariant extension, and analyze the resulting non-perturbative effects. This includes the $R^4$ and $R^4H^{4g-4}$ couplings in toroidal compactifications of M-theory to any dimension $D \geq 4$ and $D \geq 6$ respectively.

Contents
1. Introduction
2. Toroidal Compactification and $Sl(d,\mathbb{Z})$ Eisenstein Series
   2.1 Moduli space and Iwasawa gauge
   2.2 Fundamental and antifundamental Eisenstein series
   2.3 Higher representations and constrained Eisenstein series
   2.4 Decompactification and analyticity
   2.5 Partial Iwasawa decomposition
3. $SO(d,d,\mathbb{Z})$ Eisenstein Series and One-Loop Thresholds
   3.1 Moduli space and Iwasawa gauge
   3.2 Spinor and vector Eisenstein series
   3.3 One-loop modular integral and method of orbits
   3.4 A new second order differential operator
   3.5 Large volume behaviour
   3.6 Asymmetric thresholds and elliptic genus
   3.7 Boundary term and symmetry enhancement
4. U-Duality and Non-Perturbative $R^4$ Thresholds
   4.1 $R^4$ couplings and non-renormalization
   4.2 String multiplet and non-perturbative $R^4$ couplings
   4.3 Strings, particles and membranes
   4.4 Weak coupling expansion and instanton effects
5. Higher Genus Integrals and Higher Derivative Couplings
   5.1 Genus $g$ modular integral
   5.2 $N = 4$ topological string and higher derivative terms
   5.3 Non-perturbative $R^4H^{4g-4}$ couplings
6. Conclusions
Appendices
A. $Gl(d)$, $Sl(d)$, $SO(d,d)$ and $Sp(g)$ Laplacians
   A.1 Laplacian on the $SO(d)\backslash Gl(d,\mathbb{R})$ and $SO(d)\backslash Sl(d,\mathbb{R})$ symmetric spaces
   A.2 Laplacian on the $[SO(d)\times SO(d)]\backslash SO(d,d,\mathbb{R})$ symmetric space
   A.3 Laplacian on the $U(g)\backslash Sp(g,\mathbb{R})$ symmetric space
   A.4 Laplacian on the $K\backslash E_{d+1(d+1)}(\mathbb{R})$ symmetric space
   A.5 Decompactification of the Laplacians
B. Eigenmodes and Eigenvalues of the Laplacians
   B.1 $Sl(d,\mathbb{Z})$ Eisenstein series in the fundamental representation
   B.2 $SO(d,d,\mathbb{Z})$ Eisenstein series in the vector representation
   B.3 $SO(d,d,\mathbb{Z})$ Eisenstein series in the spinor representations
C. Large Volume Expansions of Eisenstein Series
   C.1 $SO(d,d,\mathbb{Z})$ vector Eisenstein series
   C.2 $SO(3,3,\mathbb{Z})$ spinor and conjugate spinor Eisenstein series
   C.3 $SO(4,4,\mathbb{Z})$ spinor and conjugate spinor Eisenstein series
   C.4 $SO(d,d,\mathbb{Z})$ spinor and conjugate spinor Eisenstein series
References

* Work supported in part by TMR networks ERBFMRXCT96-0045 and ERBFMRXCT96-0090.

1. Introduction

While worldsheet modular invariance has played a major role in the context of perturbative string theory since its early days, the advent of target space and non-perturbative dualities has brought into play yet another branch of the mathematics of automorphic forms invariant under infinite discrete groups. Indeed, physical amplitudes should depend on scalar fields usually taking values (in theories with many supersymmetries) in a symmetric space $K\backslash G(\mathbb{R})$, where $K$ is the maximal compact subgroup of $G$, while duality identifies points in $K\backslash G(\mathbb{R})$ differing by the right action of an infinite discrete subgroup $G(\mathbb{Z})$ of $G(\mathbb{R})$. This includes in particular the mapping class group $Sl(d,\mathbb{Z})$ in the case of toroidal compactifications of diffeomorphism-invariant theories, the T-duality group $SO(d,d,\mathbb{Z})$ in toroidal compactifications of string theories, as well as the non-perturbative U-duality group $E_{d+1(d+1)}(\mathbb{Z})$ in maximally supersymmetric compactified M-theory [1–3] (see for instance [4,5] for reviews and an exhaustive list of references). Moreover, supersymmetry constrains certain “BPS saturated” amplitudes to be eigenmodes of second order differential operators [6–8], so that harmonic analysis on such spaces provides a powerful tool for understanding these quantities. In the most favorable case, it can be used to determine exact non-perturbative results not obtainable otherwise, which can then be analyzed at weak coupling [20,10]. Other exact results can also be obtained from string-string duality, although in a much less general way, since one needs to be able to control the result on one side of the duality map. This approach was taken for vacua with 16 supersymmetries in [11–18]. In both cases, one generically obtains a few perturbative leading terms which can in principle be checked against a loop computation, whereas the non-perturbative contributions correspond to instantonic saddle points of the unknown string field theory. A number of hints for the rules of semi-classical calculus in string theory have been extracted from these results [19,20] and reproduced in Matrix models [21–23], but a complete prescription is still lacking. A better understanding of such effects would be very welcome, as it would for instance allow quantitative computations of perturbatively-forbidden processes in cases of more immediate physical relevance.

The prototypical example was proposed by Green and Gutperle, who conjectured that the $R^4$ couplings in ten-dimensional type IIB theory were exactly given by an S-duality invariant result [9],

$$f^{IIB}_{R^4} = \frac{1}{l_P^2}\sum_{(m,n)\neq 0}\left[\frac{\tau_2}{|m+n\tau|^2}\right]^{3/2} = \frac{\zeta(3)}{l_P^8}\sum_{(p,q)=1}\frac{1}{T_{(p,q)}^3}, \tag{1.1}$$

where in the first expression $\tau = a + i/g_s = \tau_1 + i\tau_2$ is the complexified string coupling transforming as a modular parameter under $Sl(2,\mathbb{Z})_S$ and $l_P = g_s^{1/4} l_s$ the S-duality invariant ten-dimensional Planck length. This result is interpreted in the second expression as a sum over the solitonic $(p,q)$ strings of tension $T_{(p,q)} = |p+q\tau|/l_s^2$. In particular, the scaling dimension $-8 + 3\times 2$ is appropriate for an $R^4$ coupling in ten dimensions.
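The equality of the two sums in (1.1) rests on factoring the greatest common divisor $r$ out of each pair $(m,n) = r\,(p,q)$, which produces the factor $\sum_r r^{-3} = \zeta(3)$. The following minimal sketch checks this reorganisation on a finite box $|m|,|n| \leq N$, where it holds exactly term by term. The function names and the truncation are illustrative assumptions, not part of the original text.

```python
import math

def lattice_sum(tau1, tau2, s, N):
    """Sum of (tau2/|m + n*tau|^2)^s over nonzero (m, n) with |m|, |n| <= N."""
    total = 0.0
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if (m, n) != (0, 0):
                total += (tau2 / ((m + n * tau1) ** 2 + (n * tau2) ** 2)) ** s
    return total

def coprime_sum(tau1, tau2, s, N):
    """Same truncated sum, reorganised as r^(-2s) times a sum over coprime (p, q)."""
    total = 0.0
    for p in range(-N, N + 1):
        for q in range(-N, N + 1):
            if math.gcd(p, q) != 1:
                continue  # skips (0, 0) and all non-primitive pairs
            size = max(abs(p), abs(q))
            term = (tau2 / ((p + q * tau1) ** 2 + (q * tau2) ** 2)) ** s
            # multiples r*(p, q) that still fit in the box |m|, |n| <= N
            total += term * sum(r ** (-2 * s) for r in range(1, N // size + 1))
    return total
```

As $N \to \infty$ the inner sum over $r$ tends to $\zeta(2s)$, which is $\zeta(3)$ for $s = 3/2$; both $(p,q)$ and $(-p,-q)$ are counted, as in the sum over coprime pairs in (1.1).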
The invariant function in (1.1) is a particular case $s = 3/2$ of a set of non-holomorphic automorphic forms

$$E^{Sl(2,\mathbb{Z})}_{2;s} = \sum_{(m,n)\neq 0}\left[\frac{\tau_2}{|m+n\tau|^2}\right]^{s} \tag{1.2}$$

also known as Eisenstein series, which together with a discrete set of cusp forms generate the spectrum of the Laplacian on the fundamental domain of the upper half-plane, within the set of modular functions increasing at most polynomially as $\tau_2 \to \infty$ (see [24] for an elementary introduction):

$$\Delta_{U(1)\backslash Sl(2)}\, E^{Sl(2,\mathbb{Z})}_{2;s} = \frac{s(s-1)}{2}\, E^{Sl(2,\mathbb{Z})}_{2;s}, \qquad \Delta_{U(1)\backslash Sl(2)} = \frac{1}{2}\tau_2^2\left(\partial_{\tau_1}^2 + \partial_{\tau_2}^2\right). \tag{1.3}$$

Cusp forms are exponentially suppressed at large $\tau_2$, and lie at discrete values along the $s = 1/2 + i\mathbb{R}$ axis, although no explicit form is known for them. Eisenstein series on the other hand can be expanded at weak coupling (large $\tau_2$) by Poisson resummation on the integer $m$ (see Appendix C for useful formulae):

$$E^{Sl(2,\mathbb{Z})}_{2;s} = 2\zeta(2s)\,\tau_2^{s} + 2\sqrt{\pi}\,\tau_2^{1-s}\,\frac{\Gamma(s-1/2)}{\Gamma(s)}\,\zeta(2s-1) + \frac{2\pi^s\sqrt{\tau_2}}{\Gamma(s)}\sum_{m\neq 0}\sum_{n\neq 0}\left|\frac{m}{n}\right|^{s-1/2} K_{s-1/2}\left(2\pi\tau_2|mn|\right) e^{2\pi i mn\tau_1}. \tag{1.4}$$
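The expansion (1.4) can be compared numerically with the defining lattice sum (1.2). The sketch below does this for $s = 3$, a value chosen here only because the direct sum then converges quickly and $K_{5/2}$ has an elementary closed form; it is not the physically relevant $s = 3/2$. The function names, the truncations `N` and `M`, and the sample point are illustrative assumptions.

```python
import math

def zeta(x, terms=20000):
    """Riemann zeta function for x > 1 by direct summation (ample accuracy for x >= 5)."""
    return sum(k ** -x for k in range(1, terms + 1))

def bessel_k_five_halves(x):
    """Elementary closed form of the modified Bessel function K_{5/2}(x)."""
    return math.sqrt(math.pi / (2 * x)) * math.exp(-x) * (1 + 3 / x + 3 / x ** 2)

def eisenstein_direct(tau1, tau2, s, N):
    """Lattice sum (1.2), truncated to |m|, |n| <= N."""
    total = 0.0
    for m in range(-N, N + 1):
        for n in range(-N, N + 1):
            if (m, n) != (0, 0):
                total += (tau2 / ((m + n * tau1) ** 2 + (n * tau2) ** 2)) ** s
    return total

def eisenstein_expanded_s3(tau1, tau2, M=6):
    """Right-hand side of (1.4) at s = 3, Bessel sum truncated to |m|, |n| <= M."""
    s = 3
    power_terms = (2 * zeta(2 * s) * tau2 ** s
                   + 2 * math.sqrt(math.pi) * tau2 ** (1 - s)
                   * math.gamma(s - 0.5) / math.gamma(s) * zeta(2 * s - 1))
    bessel_sum = 0.0
    for m in range(1, M + 1):
        for n in range(1, M + 1):
            # the four sign choices of (m, n) combine into 4 cos(2 pi m n tau1)
            bessel_sum += (4 * (m / n) ** (s - 0.5)
                           * bessel_k_five_halves(2 * math.pi * tau2 * m * n)
                           * math.cos(2 * math.pi * m * n * tau1))
    return power_terms + 2 * math.pi ** s * math.sqrt(tau2) / math.gamma(s) * bessel_sum
```

For instance, `eisenstein_direct(0.3, 1.2, 3, 120)` and `eisenstein_expanded_s3(0.3, 1.2)` should agree up to the truncation error of the direct sum, while the Bessel terms are already negligible beyond the first few $(m,n)$.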

For $s = 3/2$, this exhibits a tree-level and one-loop term which can be checked against a perturbative computation, together with an infinite series of instantonic effects, from the saddle point expansion (C.3) of the modified Bessel function $K_1$:

$$E^{Sl(2,\mathbb{Z})}_{2;s=3/2} = 2\zeta(3)\,e^{-3\phi/2} + \frac{2\pi^2}{3}\,e^{\phi/2} + 4\pi\sum_{N\neq 0}\sum_{n|N}\frac{\sqrt{N}}{n^2}\left[e^{-2\pi N(e^{-\phi}+ia)} + e^{-2\pi N(e^{-\phi}-ia)}\right] + \dots, \tag{1.5}$$

where $e^{\phi} = g_s$ denotes the type IIB coupling. These effects can be interpreted as arising from D-instantons and anti-D-instantons [9]. As suggested in [25], one can in fact prove that the 32 supersymmetries of type IIB imply that the exact $R^4$ coupling should be an eigenmode of the Laplacian on the moduli space $U(1)\backslash Sl(2,\mathbb{R})$, with a definite eigenvalue (3/8 in the conventions of the present paper) [26,6,8], which uniquely selects out the $s = 3/2$ Eisenstein series. In particular, it rules out contributions from cusp forms, which on the basis of the leading perturbative terms alone would have been acceptable [6].

Whereas harmonic analysis on the fundamental domain of the upper half-plane $U(1)\backslash Sl(2,\mathbb{R})/Sl(2,\mathbb{Z})$ is rather well understood, it is not so for the more general symmetric spaces of interest in string theory (see however [27,28]). It is the purpose of this work to generalize these considerations to more elaborate cases, corresponding to a larger moduli space and discrete symmetry group. Such situations arise in compactifications with 16 or 32 supersymmetries, where supersymmetry prevents corrections to the scalar manifold. As we mentioned, this is the case of toroidal compactifications of string theories, with (part of the) moduli space $[SO(d)\times SO(d)]\backslash SO(d,d,\mathbb{R})/SO(d,d,\mathbb{Z})$, or of M-theory, with moduli space $K\backslash E_{d+1(d+1)}(\mathbb{R})/E_{d+1(d+1)}(\mathbb{Z})$. This also happens in more complicated cases, such as type IIA on $K_3$, with moduli space $\mathbb{R}^+\times[SO(4)\times SO(20)]\backslash SO(4,20,\mathbb{R})$ identified by the $SO(4,20,\mathbb{Z})$ (perturbative) mirror symmetry, or type IIB on $K_3$, with moduli space $[SO(5)\times SO(21)]\backslash SO(5,21,\mathbb{R})$ identified by the non-perturbative symmetry $SO(5,21,\mathbb{Z})$. It is even possible to have uncorrected tree-level scalar manifolds in theories with 8 supersymmetries, as in the FHSV model [29], with a moduli space $K\backslash[Sl(2,\mathbb{R})\times SO(2,10,\mathbb{R})\times SO(4,12,\mathbb{R})]$, where $K$ is the obvious maximal compact subgroup. Note that in this case the duality group is broken to a subgroup of $SO(2,10,\mathbb{Z})\times SO(4,12,\mathbb{Z})$, due to the effect of the freely acting orbifold. Usually however, in cases with 8 supersymmetries, amplitudes are given by sections of a symplectic bundle on the corrected moduli space, and our methods will not carry over in a straightforward way.

This generalization was in fact started in Ref. [10], where toroidal compactifications of type IIB string theory down to 7 or 8 dimensions were considered. It was demonstrated there that the straightforward extension of the order 3/2 Eisenstein series (1.2) to the U-duality groups $Sl(5,\mathbb{Z})$ and $Sl(3,\mathbb{Z})$ reproduces the tree-level and one-loop $R^4$ thresholds, together with $(p,q)$-string instantons. A generalization to lower dimensional cases was also proposed, and a distinct route using successive T-dualities was taken in Ref. [30] to obtain the contribution of the $O(e^{-1/g_s})$ D-brane instantons in the toroidally compactified type IIA and IIB theories; it was also pointed out that S-duality suggests extra $O(e^{-1/g_s^2})$ contributions yet to be understood. In the present work, we will take a more general approach, and investigate the properties and utility of the generalized Eisenstein series, defined for any symmetric space $K\backslash G(\mathbb{R})$ and any representation $R$ of $G$ by

$$E^{G(\mathbb{Z})}_{R;s}(g) = \sum_{m\in\Lambda_R\backslash\{0\}} \delta(m\wedge m)\left[m\cdot R^tR(g)\cdot m\right]^{-s}. \tag{1.6}$$

Here, $g$ denotes an element in the coset $K\backslash G(\mathbb{R})$, and $m$ a vector in an integer lattice $\Lambda_R$ transforming in the representation $R$. $\wedge$ is an integer-valued product on the lattice, such that the condition $m\wedge m = 0$ projects the symmetric tensor product $R\otimes_s R$ onto its highest irreducible component, thus keeping only the “completely symmetric” part. This definition is to be contrasted with the one used in the mathematical literature [28]:

$$E^{G(\mathbb{Z})}_{\{w_i\}}(g) = \sum_{h\in G(\mathbb{Z})/N}\ \prod_{i=1}^{r} a_i(gh)^{-w_i} \tag{1.7}$$

where $w_i$ is now an arbitrary $r$-dimensional vector in weight space, and $a(g)$ is the Abelian component of $g$ in the Iwasawa decomposition of the rank-$r$ non-compact group $G(\mathbb{R}) = K\cdot A\cdot N$ into maximal compact $K$, Abelian $A$ and nilpotent $N$ subgroups. Note that this definition is manifestly $K$-invariant on the left and $G(\mathbb{Z})$-invariant on the right. Choosing $w$ along a highest-weight vector $\lambda_R$ associated to a representation $R$ reduces (1.7) to (1.6) where $w = s\lambda_R$, up to an $s$-dependent factor. This generalizes the equality in (1.1) to higher rank groups. The definition (1.6), albeit less general, has a clearer physical meaning: the lattice $\Lambda_R$ labels the set of BPS states in the representation $R$ of the duality group, $M^2 = m\cdot R^tR(g)\cdot m$ gives their mass squared (or tension), and $m\wedge m = 0$ imposes the half-BPS condition; this will be shown to be a necessary requirement for the eigenmode condition $\Delta_{K\backslash G}\,E^{G(\mathbb{Z})}_{R;s} \propto E^{G(\mathbb{Z})}_{R;s}$, but could be dropped if one were to address non–half-BPS saturated amplitudes.

The outline of this paper is as follows. In Sect. 2, we will discuss the simplest case of $Sl(d,\mathbb{Z})$ Eisenstein series, where most of the features arise without the complications in the parametrization of the moduli space. In Sect. 3, we will turn to $SO(d,d,\mathbb{Z})$ Eisenstein series, and discuss their applications for the computation of T-duality invariant one-loop thresholds of string theories compactified on a torus $T^d$. In Sect. 4, we covariantize this expression to obtain exact non-perturbative $R^4$ couplings in toroidal compactifications of M-theory to $D \geq 4$. In Sect. 5, we apply the same techniques to the $g$-loop threshold, and use it to deduce exact $R^4H^{4g-4}$ couplings in the same theory. Computational details will be relegated to the Appendices. This work appeared on the archive simultaneously with Ref. [31], which uses similar techniques, albeit with a different motivation.

2. Toroidal Compactification and $Sl(d,\mathbb{Z})$ Eisenstein Series

2.1. Moduli space and Iwasawa gauge.
Infinite discrete symmetries appear in the simplest setting in compactifications of a diffeomorphism invariant field theory on a torus $T^d$. Specifying the internal manifold requires a flat metric on the torus, that is a positive definite metric $g$. Equivalently we may specify a vielbein $e \in Gl(d,\mathbb{R})$ such that $g = e^te$, defined up to orthogonal rotations $SO(d,\mathbb{R})$ acting on the left, which leave $g$ invariant. This gauge invariance can be fixed thanks to the Iwasawa decomposition

$$Gl(d,\mathbb{R}) = SO(d,\mathbb{R})\times(\mathbb{R}^+)^d\times N_d, \tag{2.1}$$

where $(\mathbb{R}^+)^d$ denotes the Abelian group of diagonal $d\times d$ matrices with positive non-zero entries, and $N_d$ the nilpotent group of upper triangular matrices with unit diagonal, by choosing $e$ in the last two factors, i.e. in an upper triangular form. The Abelian part corresponds to the radii of the torus, whereas $N_d$ parametrizes the Wilson lines $A_i^j$ of the Kaluza–Klein gauge field $g_{\mu i}$. By general covariance, the Kaluza–Klein reduction of the field theory on the torus only involves contractions with the metric $g_{ij}$, so that the reduced theory is invariant under a symmetry $h\in Gl(d,\mathbb{R})$ which transforms $g$ in the representation $g\to h^tgh$. The vielbein on the other hand is acted upon on the right, $e\to eh$, which has to be compensated by a field-dependent $SO(d,\mathbb{R})$ gauge transformation $e\to\omega(e,h)e$ to preserve the upper triangular form. Transforming by an element $h\propto 1$ in the center of $Gl(d,\mathbb{R})$ corresponds to changing the volume, whereas an $Sl(d,\mathbb{R})$ transformation affects the torus shape. This change is not always physical however, since an $Sl(d,\mathbb{Z})$ rotation can be compensated by a global diffeomorphism of the torus, i.e. an element of the mapping class group. The toroidal compactification is therefore parametrized by the symmetric space

$$\mathbb{R}^+\times\left[SO(d,\mathbb{R})\backslash Sl(d,\mathbb{R})/Sl(d,\mathbb{Z})\right] \tag{2.2}$$

and all physical amplitudes should be invariant under $Sl(d,\mathbb{Z})$. In particular, the effective action including the massive Kaluza–Klein modes will only be invariant under $Sl(d,\mathbb{Z})$, and not $Gl(d,\mathbb{R})$.

2.2. Fundamental and antifundamental Eisenstein series. Keeping the above in mind, it is now straightforward to generalize the $Sl(2,\mathbb{Z})$ Eisenstein series (1.2) to the fundamental representation of $Sl(d,\mathbb{Z})$ as

$$E^{Sl(d,\mathbb{Z})}_{d;s} = \hat{\sum_{m^i}}\left[m^i g_{ij} m^j\right]^{-s}, \tag{2.3}$$

where the subscript $d$ stands for the representation in which the integers $m^i$, $i = 1\dots d$ transform. In fact, the above form is really a $Gl(d,\mathbb{Z})$ Eisenstein series since we did not restrict $g$ to have unit determinant, but the dependence on $V_d = \sqrt{\det g}$ is trivial so we shall keep to this abuse of language. The $Sl(d,\mathbb{Z})$-invariant form in (2.3) is easily seen to be an eigenmode of the Laplacian$^1$ on the scalar manifold (2.2):

$$\Delta_{Gl(d)}\,E^{Sl(d,\mathbb{Z})}_{d;s} = \frac{s(2s-d+1)}{2}\,E^{Sl(d,\mathbb{Z})}_{d;s}, \tag{2.4a}$$

$$\Delta_{Gl(d)} = \frac{1}{4}\,g_{ik}g_{jl}\frac{\partial}{\partial g_{ij}}\frac{\partial}{\partial g_{kl}} + \frac{d+1}{4}\,g_{ij}\frac{\partial}{\partial g_{ij}}. \tag{2.4b}$$

1 In all expressions for the Laplacians in the main text of the paper we employ the convention that $\partial/\partial\tilde g_{ij}$ is taken with respect to the diagonally rescaled metric $\tilde g_{ij} = (1-\delta_{ij}/2)g_{ij}$, and for simplicity of notation we omit the tilde. As explained in Appendix A, this redefinition has the advantage that unrestricted sums can be used.

In fact, it is more appropriate to restrict to the $SO(d,\mathbb{R})\backslash Sl(d,\mathbb{R})$ moduli, in terms of which

$$\Delta_{Sl(d)}\,E^{Sl(d,\mathbb{Z})}_{d;s} = \frac{s(d-1)(2s-d)}{2d}\,E^{Sl(d,\mathbb{Z})}_{d;s}, \tag{2.5a}$$

$$\Delta_{Sl(d)} = \frac{1}{4}\,g_{ik}g_{jl}\frac{\partial}{\partial g_{ij}}\frac{\partial}{\partial g_{kl}} - \frac{1}{4d}\left(g_{ij}\frac{\partial}{\partial g_{ij}}\right)^2 + \frac{d+1}{4}\,g_{ij}\frac{\partial}{\partial g_{ij}}. \tag{2.5b}$$

Here we may wonder why we should choose the integers $m$ to lie in the fundamental representation $d$ of $Sl(d)$. Choosing $m$ to transform in the antifundamental representation,

$$E^{Sl(d,\mathbb{Z})}_{\bar d;s} = \hat{\sum_{m_i}}\left[m_i g^{ij} m_j\right]^{-s}, \tag{2.6}$$

does not bring much novelty, since a Poisson resummation over all integers $m_i$ brings us back to Eq. (2.3), albeit with a transformed order $s\to d/2-s$:

$$E^{Sl(d,\mathbb{Z})}_{\bar d;s} = V_d\,\frac{\pi^{s}\,\Gamma\!\left(\frac{d}{2}-s\right)}{\pi^{\frac{d}{2}-s}\,\Gamma(s)}\,E^{Sl(d,\mathbb{Z})}_{d;\frac{d}{2}-s}. \tag{2.7}$$
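For $d = 1$, where $V_1 = R$ and both series reduce to $2\zeta(2s)R^{\mp 2s}$, relation (2.7) is just the functional equation of the Riemann zeta function. The sketch below checks this at $s = 1$ and $s = 2$, using the standard special values $\zeta(-1) = -1/12$ and $\zeta(-3) = 1/120$ for the analytically continued side. The function names and the sample radius are illustrative assumptions.

```python
import math

# Known special values of the Riemann zeta function used for the check
ZETA = {2: math.pi ** 2 / 6, 4: math.pi ** 4 / 90, -1: -1 / 12, -3: 1 / 120}

def fundamental_d1(s, R):
    """d = 1 fundamental series: sum over m != 0 of (m^2 R^2)^(-s) = 2 zeta(2s) R^(-2s)."""
    return 2 * ZETA[round(2 * s)] * R ** (-2 * s)

def antifundamental_d1(s, R):
    """d = 1 antifundamental series: sum over m != 0 of (m^2 / R^2)^(-s)."""
    return 2 * ZETA[round(2 * s)] * R ** (2 * s)

def rhs_2_7_d1(s, R):
    """Right-hand side of (2.7) for d = 1, where V_1 = R."""
    prefactor = (R * math.pi ** s * math.gamma(0.5 - s)
                 / (math.pi ** (0.5 - s) * math.gamma(s)))
    return prefactor * fundamental_d1(0.5 - s, R)
```

Only the tabulated arguments $2s \in \{2, 4, -1, -3\}$ are supported; the point is the consistency of the prefactor in (2.7), not a general-purpose zeta implementation.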

Note that the two series $E^{Sl(d,\mathbb{Z})}_{d;s}$ and $E^{Sl(d,\mathbb{Z})}_{\bar d;s}$ have the same eigenvalue under $\Delta_{Sl(d)}$, but different eigenvalues under $\Delta_{Gl(d)}$. This simply stems from their different dependence on the volume, and is not sufficient to lift their degeneracy under the $Sl(d)$ Laplacian.

2.3. Higher representations and constrained Eisenstein series. We may also choose $m$ to transform in a higher dimensional representation, i.e. as a tensor $m_{ij\dots}$ with prescribed symmetry properties. In order to determine whether we still get an eigenmode, it is useful to take a more algebraic approach. We consider acting with the Laplacian on the integral representation

$$\left[m^t\mathcal{M}m\right]^{-s} = \frac{\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s}}\exp\left(-\frac{\pi}{t}\,m^t\mathcal{M}m\right), \tag{2.8}$$

where $\mathcal{M} = R^tR$ denotes the mass matrix in the representation $R$. Differentiating only once in the exponential yields the action of the Laplacian on $\mathcal{M}$, which transforms as a symmetric tensor product $R\otimes_s R$. In order to get an eigenmode, this tensor product should be irreducible when contracted with the charges $m$. This puts a quadratic constraint on $m$, which we generically denote $m\wedge m = 0$. In other words, $m\wedge m = 0$ projects onto the highest irreducible component of the symmetric tensor product $R\otimes_s R$. One may want to drop the quadratic constraint, and still impose higher cubic and quartic constraints, in order to obtain candidates for quarter-BPS amplitudes, but we will not pursue this line here. Assuming this constraint is fulfilled, we therefore get an insertion $-Q[R\otimes R]/4t$ in the integral, where $Q[S]$ is the Casimir $(T^i)^2$ in the representation $S$. The other term with two derivatives acting in the exponential gives a contribution $\tilde Q[R\otimes R]/4t^2$, where $\tilde Q[S]$ denotes the operator $T^i\otimes T^i$ acting on the symmetric tensor product $S\otimes_s S$. By expanding the square in $(T^i\otimes 1 + 1\otimes T^i)^2$, we find that $Q[S\otimes_s S] = 2Q[S] + 2\tilde Q[S]$, so that all in all

$$e^{\pi m^t\mathcal{M}m/t}\,\Delta\,e^{-\pi m^t\mathcal{M}m/t} = \frac{Q[R^{\otimes 4}] - 2Q[R^{\otimes 2}]}{8t^2}\left(m^t\mathcal{M}m\right)^2 - \frac{Q[R^{\otimes 2}]}{4t}\,m^t\mathcal{M}m. \tag{2.9}$$

We now use the expression for the Casimir of the $p$th symmetric power of a representation of highest weight $\lambda$,

$$Q[R_\lambda^{\otimes p}] = (p\lambda,\,p\lambda + 2\rho), \tag{2.10}$$

where $\rho$ is the Weyl vector, i.e. the sum of all the fundamental weights, and $(\cdot,\cdot)$ the inner product on the weight space with the length of the roots normalized to 2 (since we restrict to simply laced Lie groups of ADE type). Using formula (B.2) to integrate by parts in (2.8), we thus find

Proposition 1. The constrained Eisenstein series (1.6) associated to the representation of highest weight $\lambda$ is an eigenmode of the Laplacian with eigenvalue

$$\Delta_{K\backslash G}\,E^{G(\mathbb{Z})}_{R_\lambda;s} = s\,(\lambda,\,s\lambda-\rho)\,E^{G(\mathbb{Z})}_{R_\lambda;s}. \tag{2.11}$$
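As a cross-check of the algebra leading to Proposition 1, one can insert the Casimirs (2.10) into (2.9) and perform the $t$ integral of (2.8), and compare with the eigenvalue (2.5a) for the fundamental representation of $Sl(d)$. The sketch below does this in exact rational arithmetic. It assumes that each power of $m^t\mathcal{M}m/t$ in (2.9) is accompanied by a factor of $\pi$, as in the exponent, so that the $t^{-2}$ and $t^{-1}$ insertions integrate to $\Gamma(s+2)/\Gamma(s)$ and $\Gamma(s+1)/\Gamma(s)$. It also uses the standard values $(\lambda,\lambda) = (d-1)/d$ and $(\lambda,\rho) = (d-1)/2$ from the inverse Cartan matrix of $A_{d-1}$. All function names are illustrative.

```python
from fractions import Fraction

def sl_fundamental_products(d):
    """(lambda, lambda) and (lambda, rho) for the fundamental weight of Sl(d),
    with roots normalised to length squared 2."""
    # first row of the inverse Cartan matrix of A_{d-1}: (A^{-1})_{1j} = (d - j)/d
    first_row = [Fraction(d - j, d) for j in range(1, d)]
    return first_row[0], sum(first_row)

def casimir(p, lam_lam, lam_rho):
    """Casimir (p*lambda, p*lambda + 2*rho) of the p-th symmetric power, Eq. (2.10)."""
    return p * p * lam_lam + 2 * p * lam_rho

def eigenvalue_from_casimirs(s, lam_lam, lam_rho):
    """Eigenvalue obtained by inserting (2.9) into the integral representation (2.8)."""
    q2 = casimir(2, lam_lam, lam_rho)
    q4 = casimir(4, lam_lam, lam_rho)
    # Gamma(s+2)/Gamma(s) = s(s+1) and Gamma(s+1)/Gamma(s) = s
    return (q4 - 2 * q2) / 8 * s * (s + 1) - q2 / 4 * s

def eigenvalue_2_5a(d, s):
    """Eigenvalue s(d-1)(2s-d)/(2d) quoted in Eq. (2.5a)."""
    return s * (d - 1) * (2 * s - d) / (2 * d)
```

In this sketch the integral evaluates to $s\,[s(\lambda,\lambda) - (\lambda,\rho)]$, which is also $Q[R_\lambda^{\otimes p}]/4$ at $p = -2s$, and for the fundamental of $Sl(d)$ it coincides with (2.5a) for every $d$ and $s$ tried.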

This result reproduces the eigenvalue (2.5a) for the fundamental representation of $Sl(d,\mathbb{Z})$ but will be applied to many other situations in the following. It implies in particular that Eisenstein series associated to representations related by outer automorphisms, i.e. symmetries of the Dynkin diagram, are degenerate under $\Delta_{K\backslash G}$, as are two Eisenstein series of the same representation but of order $s$ and $[(\lambda,\rho)/(\lambda,\lambda)] - s$. We also note that (2.11) can be obtained more quickly by noting that $\mathcal{M}^{-2s} = (m^t\mathcal{M}m)^{-s}$ transforms as the symmetric power of order $-2s$ of $R$, and substituting $p = -2s$ in (2.10). Finally, we note that Eisenstein series are in fact eigenmodes of the complete algebra of invariant differential operators [28].

In some cases, it may happen that the constraint $m\wedge m = 0$ can be solved in terms of a lower dimensional representation. This is for instance the case of $p$th symmetric tensors of $Sl(d,\mathbb{R})$, where the constraint implies that the integers $m_{ijkl\dots}$ themselves are, up to an integer $r$, the symmetric power of a fundamental representation $n_i$:

$$m_{ijkl\dots} = r\,n_in_jn_kn_l\dots. \tag{2.12}$$

The summation over $r$ can then be carried out explicitly, and the result is proportional to the Eisenstein function in the fundamental representation, with a redefined order $s\to ps$. This, however, does not happen for antisymmetric tensors. Since the antisymmetric representations are associated to the nodes of the Dynkin diagram, we are therefore led to the conjecture that a generating set of eigenmodes of the Laplacian is in general provided by the Eisenstein series associated to the nodes of the Dynkin diagram, up to cusp forms; this does not preclude identifications such as (2.7), relating Eisenstein series for representations related by outer automorphisms.

2.4. Decompactification and analyticity. Our definition of Eisenstein series has so far remained rather formal: the infinite sums appearing in (2.3), (2.6) are absolutely convergent for $s > d/2$ only, and need to be analytically continued for other values of $s$ in the complex plane$^2$. It turns out that the analyticity properties can be determined by induction on $d$, which corresponds to the physical process of decompactification. We thus assume the torus $T^{d+1}$ to factorize into a circle of radius $R$ times a torus $T^d$ with metric $g_{ab}$ and use the integral representation (2.8) of the Eisenstein series, say in the fundamental representation,

$$E^{Sl(d+1,\mathbb{Z})}_{d+1;s} = \frac{\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s}}\ \hat{\sum_{m,n^a}}\exp\left(-\frac{\pi}{t}\left[n^ag_{ab}n^b + R^2m^2\right]\right). \tag{2.13}$$

2 Other regularization methods have also been discussed in Ref. [10].

Here $m$ denotes the first component of $n^i$. The leading term as $R\to\infty$ corresponds to the $m = 0$ contribution, which reduces to the $Sl(d,\mathbb{Z})$ Eisenstein series. Subleading contributions arise by Poisson resumming on the unrestricted (since now $m\neq 0$) integers $n^a$:

$$E^{Sl(d+1,\mathbb{Z})}_{d+1;s} = E^{Sl(d,\mathbb{Z})}_{d;s} + \frac{\pi^s}{V_d\,\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s-\frac{d}{2}}}\ \hat{\sum_{m}}\sum_{n_a}\exp\left(-\pi t\,n_ag^{ab}n_b - \frac{\pi}{t}R^2m^2\right), \tag{2.14}$$

where $V_d$ stands for the volume $\sqrt{\det g}$ of the torus $T^d$. Separating the $n_a = 0$ contribution from the still subleading $n_a\neq 0$ one, we get

$$E^{Sl(d+1,\mathbb{Z})}_{d+1;s} = E^{Sl(d,\mathbb{Z})}_{d;s} + \frac{2\pi^s\,\Gamma(s-d/2)\,\zeta(2s-d)}{\pi^{s-d/2}\,\Gamma(s)\,R^{2s-d}\,V_d} + \frac{2\pi^s}{\Gamma(s)\,R^{s-d/2}\,V_d}\ \hat{\sum_{m}}\hat{\sum_{n_a}}\left|\frac{n_ag^{ab}n_b}{m^2}\right|^{\frac{2s-d}{4}} K_{s-d/2}\left(2\pi|m|R\sqrt{n_ag^{ab}n_b}\right). \tag{2.15}$$

Using the asymptotic behaviour of the Bessel function (C.3), we see that the last term is exponentially suppressed of order $O(e^{-R})$, and the sum is absolutely convergent and thus analytic in $s$. For $d = 1$, the $Sl(d,\mathbb{Z})$ Eisenstein series reduces to $2\zeta(2s)R^{-2s}$ and has a simple pole at $s = 1/2$. For $d > 1$, induction shows that the pole at $s = d/2$ from the second term cancels the one in the $Sl(d,\mathbb{Z})$ Eisenstein series, leaving the pole at $s = (d+1)/2$ from the zeta function in (2.15). We thus have

Proposition 2. The $Sl(d,\mathbb{Z})$ Eisenstein series of order $s$ in the fundamental representation can be analytically continued to the $s$-plane with $s = d/2$ excluded, where it has a single pole with residue

$$E^{Sl(d,\mathbb{Z})}_{d;s} \simeq \frac{\pi^{d/2}}{V_d\,\Gamma(d/2)}\,\frac{1}{s-d/2}. \tag{2.16}$$

This result is well known in the mathematical literature [27]. Of course, the same holds for the antifundamental representation by replacing $V_d$ by its inverse. Let us mention in passing that, together with the functional relation (2.7), this implies a relation which generalizes $2\zeta(0) = -1$:

$$E^{Sl(d,\mathbb{Z})}_{d;s=0} = E^{Sl(d,\mathbb{Z})}_{\bar d;s=0} = -1. \tag{2.17}$$

We also note that the pole at $s = d/2$ coincides with the vanishing of the eigenvalue of the Eisenstein series under the Laplacian $\Delta_{Sl(d)}$. This is so because the residue is moduli independent. An invariant modular form can still be obtained by subtracting the pole, in which case the eigenmode equation gets a harmonic anomaly:

$$\Delta_{Sl(d)}\,\hat E^{Sl(d,\mathbb{Z})}_{d;s=d/2} = \frac{\pi^{d/2}\,(d-1)}{2\,\Gamma(d/2)\,V_d}. \tag{2.18}$$

The case $d = 2$ will be particularly relevant in the sequel:

$$\hat E^{Sl(2,\mathbb{Z})}_{2;s=1} = -\pi\log\left(\tau_2\,|\eta(\tau)|^4\right), \tag{2.19}$$

where $\eta(\tau)$ denotes the usual Dedekind function.
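The coefficient of the harmonic anomaly (2.18) follows from multiplying the residue (2.16) by the slope of the eigenvalue (2.5a) at its zero $s = d/2$. A minimal sketch of this arithmetic, in exact rationals, is given below; the function names and the small offset `eps` are illustrative assumptions.

```python
from fractions import Fraction

def eigenvalue_sl(d, s):
    """Eigenvalue s(d-1)(2s-d)/(2d) of the fundamental series under the Sl(d) Laplacian, Eq. (2.5a)."""
    return s * (d - 1) * (2 * s - d) / (2 * d)

def anomaly_coefficient(d, eps=Fraction(1, 10**6)):
    """eigenvalue(s) / (s - d/2) evaluated close to the pole s = d/2."""
    s = Fraction(d, 2) + eps
    return eigenvalue_sl(d, s) / eps
```

As `eps` tends to zero the ratio tends to $(d-1)/2$; multiplied by the residue $\pi^{d/2}/(V_d\,\Gamma(d/2))$ of (2.16), this gives the right-hand side of (2.18).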

This computation can unfortunately not be made for constrained Eisenstein series, since the constraint prevents a simple Poisson resummation. We shall come back to this problem in the next section for the $SO(d,d,\mathbb{Z})$ case. We can however conjecture the analytic structure from a simple argument: the divergences arise from the large $m$ region, where the integers can be approximated by $N = \dim R$ continuous variables. The $N_c$ quadratic constraints restrict the phase space to $\mathbb{R}^{N-N_c}$, while inserting an extra factor $r^{-N_c}$ in spherical coordinates, from $\delta(r^2) = \delta(r)/2r$. We are therefore led to the integral $\int r^{-N_c}\,r^{N-N_c-1}\,r^{-2s}\,dr$, which converges for $s > (N-2N_c)/2$. We therefore expect a simple pole at $s = (N-2N_c)/2$ for an Eisenstein series of an $N$-dimensional representation with $N_c$ independent constraints.

2.5. Partial Iwasawa decomposition. In determining the decompactification behaviour, we assumed the torus $T^{d+1}$ to factorize into $T^d\times S_1$. This may be too restrictive, as for instance in M-theory applications, where we are interested in the perturbative type II limit corresponding to a vanishingly small circle of radius $R_s = g_sl_s$ but still want to retain the effect of the off-diagonal metric, i.e. the Ramond one-form $A$. It is then convenient to take the Kaluza–Klein ansatz

$$dx^i g_{ij}\,dx^j = R^2\left(dx^1 + A_a\,dx^a\right)^2 + dx^a\,\hat g_{ab}\,dx^b \tag{2.20}$$

which is nothing but a partial Iwasawa decomposition. This breaks the higher dimensional symmetry $Sl(d+1,\mathbb{R})$ to a subgroup $Sl(d,\mathbb{R})$, together with a nilpotent group of constant shifts $A_a\to A_a + \Lambda_a$, which is what remains from the Kaluza–Klein gauge invariance on a flat torus. In terms of these variables, the Laplacian takes the form (see Appendix A.5 for details on the derivation)

$$\Delta_{Gl(d+1)} = \Delta_{Gl(d)} - \frac{1}{4}\,\hat g_{ab}\frac{\partial}{\partial\hat g_{ab}} + \frac{1}{4}\left(R\frac{\partial}{\partial R}\right)^2 + \frac{d}{4}\,R\frac{\partial}{\partial R} + \frac{\hat g_{ab}}{2R^2}\frac{\partial}{\partial A_a}\frac{\partial}{\partial A_b}. \tag{2.21}$$

One can then check that each term in (2.15), upon reinstating the dependence on $A_a$, is an eigenmode of the Laplacian with the correct eigenvalue.

3. $SO(d,d,\mathbb{Z})$ Eisenstein Series and One-Loop Thresholds

In this section we turn to the construction of Eisenstein series for $SO(d,d,\mathbb{Z})$ and its application to one-loop thresholds in type II string theory. Higher genus contributions are also amenable to an Eisenstein series representation, and will be addressed in Sect. 5.

3.1. Moduli space and Iwasawa gauge. Owing to the occurrence of winding states charged under the 2-form $B_{\mu\nu}$, any closed string theory on a torus $T^d$ exhibits a larger symmetry $O(d,d,\mathbb{Z})$, a discrete subgroup of the $O(d,d,\mathbb{R})$ symmetry of the massless degrees of freedom. The symmetry is actually reduced to $SO(d,d,\mathbb{Z})$ in type II theories, where the elements in $O(d,d,\mathbb{Z})$ with determinant $-1$ map type IIA to type IIB. This T-duality is valid to all orders in perturbation theory, and postulated to hold non-perturbatively as well. It contains the mapping class group $Sl(d,\mathbb{Z})$ of the torus as a subgroup, as well as generators that are non-perturbative from a world-sheet point of view. The moduli space includes a symmetric subspace

$$\left[SO(d,\mathbb{R})\times SO(d,\mathbb{R})\right]\backslash SO(d,d,\mathbb{R})/SO(d,d,\mathbb{Z}) \tag{3.1}$$

describing the metric of the torus and the two-form background, which can again be parametrized using the Iwasawa decomposition

$$SO(d,d,\mathbb{R}) = \left[SO(d,\mathbb{R})\times SO(d,\mathbb{R})\right]\times(\mathbb{R}^+)^d\times N^{SO}_{2d}. \tag{3.2}$$

More precisely, the Abelian part $(\mathbb{R}^+)^d$ corresponds to the $d$ radii (and $d$ inverse radii) of the torus and the nilpotent part $N^{SO}_{2d}$ parametrizes the Wilson lines $A_i^j$ of the Kaluza–Klein gauge field and the antisymmetric tensor $B_{ij}$. In particular, in the basis where the $SO(d,d,\mathbb{Z})$ invariant tensor is $\eta = \begin{pmatrix} 0 & 1_d \\ 1_d & 0 \end{pmatrix}$, the gauge-fixed vielbein $e$ can be chosen as

$$e = \begin{pmatrix} 1/R_1 & & & & & \\ & 1/R_2 & & & & \\ & & \ddots & & & \\ & & & R_1 & & \\ & & & & R_2 & \\ & & & & & \ddots \end{pmatrix}\cdot\begin{pmatrix} 1 & & & & & \\ -A_1^2 & 1 & & & & \\ \vdots & & \ddots & & & \\ & & & 1 & A_1^2 & \dots \\ & & & & 1 & \\ & & & & & \ddots \end{pmatrix}\cdot\begin{pmatrix} 1 & & & B_{11} & B_{12} & \dots \\ & 1 & & B_{21} & B_{22} & \dots \\ & & \ddots & \vdots & & \ddots \\ & & & 1 & & \\ & & & & 1 & \\ & & & & & \ddots \end{pmatrix} \tag{3.3}$$

and right symmetry transformations by an $SO(d,d,\mathbb{R})$ element have to be compensated by left $SO(d,\mathbb{R})\times SO(d,\mathbb{R})$ gauge transformations. The $Sl(d,\mathbb{R})$ subgroup corresponds to block diagonal elements $\begin{pmatrix} g^{-1} & \\ & g \end{pmatrix}$. In analogy with the $SO(d)\backslash Sl(d,\mathbb{R})$ case, we can trade the vielbein $e$ for the gauge invariant moduli matrix

$$M(V) = e^te = \begin{pmatrix} g^{-1} & g^{-1}B \\ -Bg^{-1} & g - Bg^{-1}B \end{pmatrix} \tag{3.4}$$

which provides the mass matrix for BPS states in the vector representation $V$ of $SO(d,d)$, namely momentum and winding states. D-branes on the other hand transform as (conjugate) spinor representations $S$ ($C$), and their mass matrix is given accordingly by $R(e)^tR(e)$ where $R(e)$ is the spinor or conjugate spinor representation of the group element $e$. We can therefore build $SO(d,d,\mathbb{Z})$ invariant functions by summing the BPS mass or tension over all BPS states, which we do now.

3.2. Spinor and vector Eisenstein series. In order to define these T-duality invariant functions, we need to be more explicit about the mass matrix and BPS conditions of these states. These have been reviewed in [5] (see [32] for a résumé) so we shall be brief in recalling them. The mass in the vector representation in terms of the KK momenta and winding numbers $m_i$, $m^i$ ($i = 1\dots d$), reads

$$M^2(V) = m\cdot e^te\cdot m = \tilde m_i\,g^{ij}\,\tilde m_j + m^i g_{ij} m^j, \tag{3.5a}$$

$$k = m_im^i = 0, \tag{3.5b}$$

where the last equation records the (quadratic) half-BPS condition $m \wedge m = k = 0$. Integer shifts $B \to B + b$ induce a spectral flow $m_i \to m_i - b_{ij} m^j$ on the lattice of BPS states, leaving the dressed charge $\tilde m_i = m_i + B_{ij} m^j$ invariant and preserving the condition $m \wedge m = 0$. For the spinor representation with $2^{d-1}$ charges $m = (m^{[1]}, m^{[3]}, m^{[5]}, \dots)$ ${}^{3}$ describing the wrapping numbers along the odd cycles of $T^d$, the charges can be encapsulated in a differential form $m = m^i dx_i + \frac{1}{3!} m^{ijk} dx_i \wedge dx_j \wedge dx_k + \dots$, and the effect of the $B$-field is to boost the charges as $\tilde m = \exp\left(\frac12 B_{ij}\, dx^i \wedge dx^j\right) \cdot m$, where $\cdot$ denotes the inner product. The formula [30]
\[ M^2(S) = \frac{1}{V_d}\left[ (\tilde m^i)^2 + \frac{1}{3!}(\tilde m^{ijk})^2 + \frac{1}{5!}(\tilde m^{ijklm})^2 + \dots \right], \tag{3.6a} \]
\[ \tilde m^i = m^i + \frac12 m^{jki} B_{jk} + \frac18 m^{jklmi} B_{jk} B_{lm} + \dots, \tag{3.6b} \]
\[ \tilde m^{ijk} = m^{ijk} + \frac12 m^{lmijk} B_{lm} + \dots, \tag{3.6c} \]
\[ \tilde m^{ijklm} = m^{ijklm} + \dots \tag{3.6d} \]
gives, up to a power $V_d/(g_s^2 l_s^8) = l_P^{d-8}$ of the T-duality invariant Planck length and subject to the half-BPS conditions
\[ k^{[4]} = k^{ijkl} = m^{[i} m^{jkl]} = 0, \tag{3.7a} \]
\[ k^{[1;5]} = k^{i;jklmn} = m^{i[jk} m^{lmn]} + m^{i[jklm} m^{n]} = 0, \tag{3.7b} \]
\[ k^{[2;6]} = k^{ij;klmnpq} = m^{ij[k} m^{lmnpq]} = 0, \tag{3.7c} \]
the mass of type IIB D-branes wrapped on an odd-dimensional cycle, or the tension of type IIA D-branes wrapped on an odd-dimensional cycle. Here we made explicit the constraints up to $d = 6$ only. In particular, we note that the first occurrence of the quadratic constraints is for $d = 4$, in which case they reduce to a singlet. For $d = 5$ they form a vector 10 (five components from $k^{[4]}$ and five from $k^{[1;5]}$), while for $d = 6$ they transform in an antisymmetric representation 66 of the T-duality group $SO(6,6,\mathbb{Z})$. More generally, one should require the representation $R \otimes_s R$ to be irreducible.

Similarly, for the conjugate spinor representation with wrapping numbers $m = (m, m^{[2]}, m^{[4]}, \dots)$ around the even cycles of $T^d$, we have
\[ M^2(C) = \frac{1}{V_d}\left[ \tilde m^2 + \frac12 (\tilde m^{ij})^2 + \frac{1}{4!}(\tilde m^{ijkl})^2 + \dots \right], \tag{3.8a} \]
\[ \tilde m = m + \frac12 m^{ij} B_{ij} + \frac18 m^{ijkl} B_{ij} B_{kl} + \dots, \tag{3.8b} \]
\[ \tilde m^{ij} = m^{ij} + \frac12 m^{klij} B_{kl} + \dots, \tag{3.8c} \]
\[ \tilde m^{ijkl} = m^{ijkl} + \dots, \tag{3.8d} \]
with half-BPS conditions
\[ k^{ijkl} = m^{[ij} m^{kl]} + m\, m^{ijkl} = 0, \tag{3.9a} \]
\[ k^{i;jklmn} = m^{i[j} m^{klmn]} + m\, m^{ijklmn} = 0, \tag{3.9b} \]
\[ k^{ij;klmnpq} = m^{ij[kl} m^{mnpq]} + m^{ij} m^{klmnpq} = 0. \tag{3.9c} \]

3 Integer subscripts or superscripts in square brackets denote the number of antisymmetric $Sl(d)$ indices. When separated by a semicolon as in (3.7), they stand for groups of antisymmetric indices with no mutual symmetry property. The upper or lower position of the indices denotes a gradient or contragredient representation of $Sl(d)$.


This describes the tension of type IIB D-branes wrapped on an even-dimensional cycle, or the mass of type IIA D-branes wrapped on an even-dimensional cycle.

With these T-duality invariant building blocks in hand, we may now define the Eisenstein series for each of these three representations as
\[ E^{SO(d,d,\mathbb{Z})}_{R;s} = \widehat{\sum_{m}}\ \delta(m \wedge m)\, [M^2(R)]^{-s}, \tag{3.10} \]
where we have used the labels $R = V, S, C$ for the vector, spinor and conjugate spinor representations. Here $\delta(m \wedge m)$ stands for the quadratic constraints (3.5b), (3.7), (3.9), and $M^2(R)$ are the mass formulae given in (3.5a), (3.6), (3.8). Not surprisingly, an explicit computation (see Appendix B) shows that these Eisenstein series are indeed eigenmodes of the Laplacian on the scalar manifold (3.1),
\[ \Delta_{SO(d,d)}\, E^{SO(d,d,\mathbb{Z})}_{R;s} = \Delta(R,s)\, E^{SO(d,d,\mathbb{Z})}_{R;s}, \tag{3.11a} \]
\[ \Delta_{SO(d,d)} = \frac14\, g_{ik} g_{jl} \left( \frac{\partial}{\partial g_{ij}} \frac{\partial}{\partial g_{kl}} + \frac{\partial}{\partial B_{ij}} \frac{\partial}{\partial B_{kl}} \right) + \frac12\, g_{ij} \frac{\partial}{\partial g_{ij}}, \tag{3.11b} \]
where the eigenvalues are given by
\[ \Delta(V,s) = s(s-d+1), \qquad \Delta(S,s) = \Delta(C,s) = \frac{s\,d\,(s-d+1)}{4}, \tag{3.12} \]
in agreement with Eq. (2.11). The degeneracy of the spinor and conjugate spinor (as well as the vector for $d = 4$) is a consequence of the outer automorphism which relates the two (or the three for $d = 4$, due to triality). We emphasize that the derivation shows that the quadratic 1/2 BPS constraints are essential for these Eisenstein series to be eigenmodes. For instance, in the case of the vector representation, the analogue of (2.9) is
\[ \Delta_{SO(d,d)}\, e^{-m^t M(V) m/t} = \left[ \frac{(m^t M(V) m)^2 - 4 (m \wedge m)^2}{t^2} - d\, \frac{m^t M(V) m}{t} \right] e^{-m^t M(V) m/t}, \tag{3.13} \]

where $m \wedge m = m_i m^i$ vanishes on half-BPS states only. For low dimensional cases however, the constraints drop or can be solved, so that we are back to ordinary Eisenstein series. This includes the $d = 1$ vector series,
\[ E^{SO(1,1,\mathbb{Z})}_{V;s} = 2\zeta(2s) \left( R^{2s} + R^{-2s} \right), \tag{3.14} \]
or the $d < 4$ spinor series,
\[ E^{SO(1,1,\mathbb{Z})}_{S;s} = 2\zeta(2s) R^{s}, \qquad E^{SO(1,1,\mathbb{Z})}_{C;s} = 2\zeta(2s) R^{-s}, \tag{3.15a} \]
\[ E^{SO(2,2,\mathbb{Z})}_{S;s} = E^{Sl(2,\mathbb{Z})}_{2;s}(T), \qquad E^{SO(2,2,\mathbb{Z})}_{C;s} = E^{Sl(2,\mathbb{Z})}_{2;s}(U), \tag{3.15b} \]
\[ E^{SO(3,3,\mathbb{Z})}_{S;s} = E^{Sl(4,\mathbb{Z})}_{\bar 4;s}, \qquad E^{SO(3,3,\mathbb{Z})}_{C;s} = E^{Sl(4,\mathbb{Z})}_{4;s}, \tag{3.15c} \]
where the identities in the last two lines follow from the local isomorphisms $SO(2,2,\mathbb{R}) = Sl(2,\mathbb{R}) \times Sl(2,\mathbb{R})$ ($U$ and $T$ denote the standard complex moduli $U = (g_{12} + iV_2)/g_{11}$, $T = B_{12} + iV_2$) and $SO(3,3,\mathbb{R}) = Sl(4,\mathbb{R})$.
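As a consistency check on the statement that the constraints can be solved in low dimension, the $d = 1$ vector series (3.14) can be recovered by hand. The sketch below assumes $g_{11} = R^2$ and the conventions of (3.5), with $m_1$ and $m^1$ the single momentum and winding number (there is no $B$-field for $d = 1$). The second part assumes that (3.13) holds as stated and that $\mathrm{Re}\, s > 0$, and shows how the vector eigenvalue in (3.12) then follows from a Mellin representation on the constraint surface $m \wedge m = 0$; the shorthand $M^2 = m^t M(V) m$ is ours.

```latex
% Part 1: d = 1. The constraint m_1 m^1 = 0 forces either the winding or the
% momentum to vanish (the origin is excluded by the hatted sum).
\begin{align*}
M^2(V) &= \frac{m_1^2}{R^2} + (m^1)^2 R^2 , \qquad m_1 m^1 = 0 ,\\
E^{SO(1,1,\mathbb{Z})}_{V;s}
  &= \sum_{m_1 \neq 0} \Big( \frac{m_1^2}{R^2} \Big)^{-s}
   + \sum_{m^1 \neq 0} \big( (m^1)^2 R^2 \big)^{-s}
   = 2\zeta(2s) \big( R^{2s} + R^{-2s} \big) .
\end{align*}

% Part 2: eigenvalue from (3.13) with m \wedge m = 0, writing M^2 = m^t M(V) m.
\begin{align*}
[M^2]^{-s} &= \frac{1}{\Gamma(s)} \int_0^\infty \frac{dt}{t^{1+s}}\, e^{-M^2/t} ,\\
\Delta_{SO(d,d)} [M^2]^{-s}
  &= \frac{1}{\Gamma(s)} \int_0^\infty \frac{dt}{t^{1+s}}
     \Big( \frac{(M^2)^2}{t^2} - d\, \frac{M^2}{t} \Big) e^{-M^2/t} \\
  &= \frac{\Gamma(s+2) - d\, \Gamma(s+1)}{\Gamma(s)}\, [M^2]^{-s}
   = s(s - d + 1)\, [M^2]^{-s} .
\end{align*}
```

The second computation makes visible why the constraint matters: a non-vanishing $(m \wedge m)^2$ term in (3.13) would add a piece proportional to $[M^2]^{-s-2}$, which is not of the original form.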


3.3. One-loop modular integral and method of orbits. Under toroidal compactification on a torus $T^d$, any string theory exhibits the T-duality symmetry $SO(d,d,\mathbb{Z})$, and all amplitudes should be expressible in terms of modular forms of this group. For half-BPS saturated couplings, the one-loop amplitude often reduces to an integral of a lattice partition function over the fundamental domain $\mathcal{F}$ of the moduli space of genus-1 Riemann surfaces,
\[ I_d = 2\pi \int_{\mathcal{F}} \frac{d^2\tau}{\tau_2^2}\, Z_{d,d}(g, B; \tau), \tag{3.16} \]
where $Z_{d,d}$ is the partition function (or theta function) of the even self-dual lattice describing the toroidal compactification,
\[ Z_{d,d} = V_d \sum_{m^i, n^i} e^{-\frac{\pi}{\tau_2} (m^i + \tau n^i)(g_{ij} + B_{ij})(m^j + \bar\tau n^j)} = (\tau_2)^{d/2} \sum_{m_i, n^i} e^{-\pi \tau_2 M^2(V) - 2\pi i \tau_1\, m \wedge m}. \tag{3.17a} \]
This is for instance the case for $R^4$ couplings in type II strings on $T^d$, or $R^2$ or $F^2$ couplings in type II on $K3 \times T^2$. In the above formula, a Poisson resummation on the integers $m^i$ takes us from the Lagrangian representation, manifestly invariant under the genus-1 modular group, to the Hamiltonian representation, manifestly invariant under T-duality. It is natural to expect a connection between this one-loop modular integral and the $SO(d,d,\mathbb{Z})$ Eisenstein series defined above.

As is well known, the $\tau$-integral can be carried out by the method of orbits, which corresponds to a large volume expansion of the integral. This was first carried out in [33] and extended to higher dimensional tori in [10, 14]. We will briefly review these results for later comparison with the Eisenstein series. In order to carry out the integral on the fundamental domain of the upper half plane, one uses the fact that an $Sl(2,\mathbb{Z})$ modular transformation on $\tau$ can be reabsorbed by an $Sl(2,\mathbb{Z})$ action on the doublet $(m^i, n^i)$: one can thus restrict the sum over $(m^i, n^i)$ to a sum over their $Sl(2,\mathbb{Z})$ orbits, while unfolding the integration to a larger domain depending on the centralizer of the orbit. The orbits can be classified by defining the sub-determinants $d^{ij} = m^i n^j - m^j n^i$, so that $d^{ij}$ is a $d \times d$ antisymmetric matrix. We then have the trivial orbit, $m^i = n^i = 0$, with a contribution
\[ I_d^{tr} = 2\pi V_d \int_{\mathcal{F}} \frac{d^2\tau}{\tau_2^2} = \frac{2\pi^2}{3} V_d; \tag{3.18} \]

the degenerate orbits, with all $d^{ij}$ being zero: in this case we can set $n^i = 0$, unfold the integration domain $\mathcal{F}$ onto the strip $\tau_1 \in [-\frac12, \frac12]$, $\tau_2 \in \mathbb{R}^+$, and carry out the integrals:
\[ I_d^{d} = 2V_d\, \widehat{\sum_{m^i}}\ \frac{1}{m^i g_{ij} m^j} = 2V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}; \tag{3.19} \]

the non-degenerate orbits, where at least one of the $d^{ij}$ is non-zero. The $Sl(2,\mathbb{Z})$ modular action can be completely fixed in order to unfold the integration domain to twice the upper-half plane. After Gaussian integration on $\tau$, we obtain:
\[ I_d^{n.d} = 4\pi V_d\, \overline{\sum_{m^i, n^i}}\ \frac{\exp\left[ -2\pi \sqrt{(m \cdot g \cdot m)(n \cdot g \cdot n) - (m \cdot g \cdot n)^2} + 2\pi i B_{ij} m^i n^j \right]}{\sqrt{(m \cdot g \cdot m)(n \cdot g \cdot n) - (m \cdot g \cdot n)^2}}. \tag{3.20} \]
The summation is performed over all sets of $2d$ integers having at least one non-zero $d^{ij}$, modded out by the $Sl(2,\mathbb{Z})$ modular action (for d = 1, this is m > 0, 0 ≤ n < m). These terms are all exponentially suppressed at large $V_d$, albeit not in a uniform way.

For low-dimensional cases, the sum can be further simplified, and yields the well-known results:
\[ I_1 = \frac{2\pi^2}{3} \left( R + \frac{1}{R} \right), \qquad I_2 = -2\pi \log\left( T_2 U_2 |\eta(T)\eta(U)|^4 \right). \tag{3.21} \]
It is remarkable that these results can be rewritten in terms of $SO(d,d,\mathbb{Z})$ Eisenstein series. Indeed, using the properties (3.15) and (2.19), we find

Proposition 3. For $d = 1, 2$, the one-loop integral $I_d$ in (3.16) can be rewritten as the sum of the $SO(d,d,\mathbb{Z})$ Eisenstein series of order 1 in the spinor and conjugate spinor representations:
\[ I_d = 2\, E^{SO(d,d,\mathbb{Z})}_{S;s=1} + 2\, E^{SO(d,d,\mathbb{Z})}_{C;s=1}. \tag{3.22} \]
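The $d = 1$ case of this statement can be checked directly from the orbit decomposition. The sketch below assumes $V_1 = R$ and $g_{11} = R^2$, and uses the fact that the antisymmetric matrix $d^{ij}$ vanishes identically when there is a single index value, so that only the trivial and degenerate orbits (3.18), (3.19) are taken to contribute.

```latex
\begin{align*}
I_1^{tr} &= \frac{2\pi^2}{3}\, R , \\
I_1^{d}  &= 2R \sum_{m \neq 0} \frac{1}{m^2 R^2}
          = \frac{4\zeta(2)}{R} = \frac{2\pi^2}{3}\, \frac{1}{R} , \\
I_1 &= I_1^{tr} + I_1^{d} = \frac{2\pi^2}{3} \Big( R + \frac{1}{R} \Big) , \\
2\, E^{SO(1,1,\mathbb{Z})}_{S;s=1} + 2\, E^{SO(1,1,\mathbb{Z})}_{C;s=1}
    &= 4\zeta(2) \Big( R + \frac{1}{R} \Big)
     = \frac{2\pi^2}{3} \Big( R + \frac{1}{R} \Big) .
\end{align*}
```

The third line reproduces $I_1$ in (3.21), and the last line, which uses (3.15a) and $\zeta(2) = \pi^2/6$, reproduces (3.22).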

In particular, the result is manifestly invariant under the extended T-duality $O(d,d,\mathbb{Z})$, where the extra generator exchanges the two spinors. We shall now substantiate a similar claim for $d > 2$, by showing that the two sides are eigenmodes of second order differential operators with the same eigenvalues, and that they also agree in various limits. At this point, we note that the fact that the two spinor representations contribute is in agreement with the invariance of the modular integral under the extended group $O(d,d,\mathbb{Z})$ which exchanges the two spinors. Besides, for $d = 1$ the vector Eisenstein series $-\frac{\pi^2}{3} E^{SO(1,1,\mathbb{Z})}_{V;s=-1/2}$ is an equally valid candidate.

3.4. A new second order differential operator. Given that our Eisenstein series are eigenmodes of the $SO(d,d)$ Laplacian (3.11b), we should ask about its action on the modular integral (3.16). An explicit computation of the action of $\Delta_{SO(d,d)}$ on the integrand shows that the lattice sum satisfies the differential equation
\[ \left( \Delta_{SO(d,d)} - 2\Delta_{Sl(2)} + \frac{d(d-2)}{4} \right) Z_{d,d} = 0, \tag{3.23} \]
where $\Delta_{Sl(2)} = \frac12 \tau_2^2 \left( \frac{\partial^2}{\partial \tau_1^2} + \frac{\partial^2}{\partial \tau_2^2} \right)$ is the Laplacian on the upper-half plane. Upon integrating by parts the second term, we get a boundary term which vanishes, so that the modular integral itself is an eigenmode of $\Delta_{SO(d,d)}$:
\[ \Delta_{SO(d,d)}\, I_d = \frac{d(2-d)}{4}\, I_d. \tag{3.24} \]


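The modular integral $I_d$ is therefore degenerate with the $SO(d,d,\mathbb{Z})$ Eisenstein series
\[ E^{SO(d,d,\mathbb{Z})}_{V;s=d/2-1}, \qquad E^{SO(d,d,\mathbb{Z})}_{S;s=1}, \qquad E^{SO(d,d,\mathbb{Z})}_{C;s=1}, \tag{3.25} \]
or their "duals"
\[ E^{SO(d,d,\mathbb{Z})}_{V;s=d/2}, \qquad E^{SO(d,d,\mathbb{Z})}_{S;s=d-2}, \qquad E^{SO(d,d,\mathbb{Z})}_{C;s=d-2}. \tag{3.26} \]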
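The degeneracy of all six series with the modular integral can be verified by substituting the quoted orders into the eigenvalue formulae (3.12). The short computation below uses only (3.12) and (3.24); it also shows that each "dual" order is obtained from the original one by $s \to d - 1 - s$, under which both eigenvalue formulae are symmetric.

```latex
\begin{align*}
\Delta(V, s = \tfrac{d}{2} - 1)
  &= \big( \tfrac{d}{2} - 1 \big) \big( \tfrac{d}{2} - 1 - d + 1 \big)
   = -\tfrac{d}{2} \big( \tfrac{d}{2} - 1 \big) = \frac{d(2-d)}{4} , \\
\Delta(V, s = \tfrac{d}{2})
  &= \tfrac{d}{2} \big( 1 - \tfrac{d}{2} \big) = \frac{d(2-d)}{4} , \\
\Delta(S, s = 1) = \Delta(C, s = 1)
  &= \frac{1 \cdot d \cdot (2 - d)}{4} = \frac{d(2-d)}{4} , \\
\Delta(S, s = d - 2) = \Delta(C, s = d - 2)
  &= \frac{(d-2) \cdot d \cdot (-1)}{4} = \frac{d(2-d)}{4} .
\end{align*}
```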

We expect the functions in (3.26) to be related to the ones in (3.25) by a duality transformation analogous to (2.7), although we cannot prove this statement at present due to the presence of constraints.

Less expected however is the existence of a second differential operator $\Box_d$, involving only the metric, which also annihilates the integrand up to a total derivative:
\[ \Box_d = \Delta_{Gl(d)} - \frac18 \left( g_{ij} \frac{\partial}{\partial g_{ij}} \right)^2 = \Delta_{Sl(d)} + \frac{2-d}{8d} \left( g_{ij} \frac{\partial}{\partial g_{ij}} \right)^2, \tag{3.27} \]
where the $Gl(d)$ and $Sl(d)$ Laplacians are given in (2.4b), (2.5b). Indeed an explicit computation shows that
\[ \left( \Box_d - \Delta_{Sl(2)} + \frac{d(d-2)}{8} \right) Z_{d,d} = 0, \quad \text{i.e.} \quad \Box_d\, I_d = \frac{d(2-d)}{8}\, I_d. \tag{3.28} \]
The operator $\Box_d$ is non-invariant under $SO(d,d)$, but is invariant under complete inversion of the metric.${}^{4}$ This last property gives a strong constraint for the identification of the modular integral $I_d$ with Eisenstein series. Indeed, one can show that the spinor Eisenstein series are eigenmodes of $\Box_d$ for $s = 1$ only, whereas the vector is always an eigenmode:
\[ \Box_d\, E^{SO(d,d,\mathbb{Z})}_{V;s} = \frac{s(s-d+1)}{2}\, E^{SO(d,d,\mathbb{Z})}_{V;s}, \tag{3.29a} \]
\[ \Box_d\, E^{SO(d,d,\mathbb{Z})}_{S;s=1} = \frac{d(2-d)}{8}\, E^{SO(d,d,\mathbb{Z})}_{S;s=1}, \tag{3.29b} \]
\[ \Box_d\, E^{SO(d,d,\mathbb{Z})}_{C;s=1} = \frac{d(2-d)}{8}\, E^{SO(d,d,\mathbb{Z})}_{C;s=1}. \tag{3.29c} \]

In particular, we see that the spinor Eisenstein series of order $s = 1$ and $s = d - 2$ are distinct, even though they are degenerate under $\Delta_{SO(d,d)}$. A peculiarity occurs for $d = 4$, where the spinor Eisenstein series is an eigenmode for all $s$, whereas the conjugate spinor is an eigenmode for $s = 1$ only:
\[ \Box_4(V,s) = \Box_4(S,s) = \frac{s(s-3)}{2}, \qquad \Box_4(C, s=1) = -1. \tag{3.30} \]
The three $SO(4,4,\mathbb{Z})$ Eisenstein series at $s = 1$ are therefore degenerate under both $\Delta_{SO}$ and $\Box_4$, and we conjecture that they are actually equated by triality. The degeneracy is however lifted at $s \neq 1$.

4 In Ref. [34] it was shown that the one-loop integral is an eigenfunction under a non-invariant second order operator $\Delta$ that involves both $g$ and $B$. The relation with $\Box_d$ in (3.27) is $\Box_d = \Delta_{SO} - \frac12 \Delta + \frac{d(d-2)}{8}$. Equation (3.23) involving $\Delta_{SO}$ was also given in Ref. [14].


Summarizing the results in this section, we see that the only candidates for representing the modular integral (3.16) are the order $s = 1$ spinor and conjugate spinor series, together with the order $s = d/2 - 1$ vector series and their duals. In order to sort out these possibilities, we need to determine the behaviour of these invariant functions in various limits.

3.5. Large volume behaviour. The large volume limit of the modular integral $I_d$ has already been obtained from the orbit decomposition. The behaviour of the Eisenstein series on the other hand can be obtained by Poisson resummation techniques similar to the ones described in Sect. 2, with the complication of the constraints. The actual computation is deferred to Appendix C, and we present only the results, specializing to the relevant value of $s$. In the case of the vector representation, we are able to determine the complete large volume expansion:
\[ E^{SO(d,d,\mathbb{Z})}_{V;s=\frac{d}{2}-1} = \frac{\pi^{\frac{d}{2}-2}}{\Gamma(\frac{d}{2}-1)} \left[ V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}(g_{ij}) + \frac{\pi^2}{3} V_d + 2\pi V_d\, \overline{\sum_{m^i, n^i}}\ \frac{\exp\left( -2\pi \sqrt{|(m \cdot g \cdot m)(n \cdot g \cdot n) - (m \cdot g \cdot n)^2|} + 2\pi i B_{ij} m^i n^j \right)}{\sqrt{|(m \cdot g \cdot m)(n \cdot g \cdot n) - (m \cdot g \cdot n)^2|}} \right]. \tag{3.31} \]

Here the sum runs over non-degenerate $Sl(2,\mathbb{Z})$ orbits of $(m^i, n^i)$. Comparing with the expansion $I_d^{tr} + I_d^{d} + I_d^{n.d.}$ of the modular integral (3.16) in Eqs. (3.18)–(3.20), we see a complete matching and thus obtain the theorem

Theorem 4. The integral $I_d$ (3.16) of the $(d,d)$ lattice partition function on the fundamental domain of the moduli space of genus 1 Riemann surfaces is given for $d \geq 3$ by the $SO(d,d,\mathbb{Z})$ Eisenstein series of order $s = d/2 - 1$ in the vector representation
\[ I_d = 2\, \frac{\Gamma(\frac{d}{2}-1)}{\pi^{\frac{d}{2}-2}}\, E^{SO(d,d,\mathbb{Z})}_{V;s=\frac{d}{2}-1}. \tag{3.32} \]
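The matching behind Theorem 4 amounts to multiplying the expansion (3.31) by the prefactor in (3.32). The sketch below spells this out; the shorthand $\Sigma_{n.d.}$ for the common non-degenerate orbit sum appearing in (3.20) and (3.31) is introduced here for brevity and is not the paper's notation.

```latex
\begin{align*}
2\, \frac{\Gamma(\frac{d}{2}-1)}{\pi^{\frac{d}{2}-2}}\,
E^{SO(d,d,\mathbb{Z})}_{V;s=\frac{d}{2}-1}
 &= 2 \Big[ V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}
      + \frac{\pi^2}{3} V_d + 2\pi V_d\, \Sigma_{n.d.} \Big] \\
 &= \underbrace{\frac{2\pi^2}{3} V_d}_{I_d^{tr}\ (3.18)}
  + \underbrace{2 V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}}_{I_d^{d}\ (3.19)}
  + \underbrace{4\pi V_d\, \Sigma_{n.d.}}_{I_d^{n.d.}\ (3.20)}
  = I_d .
\end{align*}
```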

This provides a convenient representation of the one-loop integral $I_d$, manifestly invariant under T-duality. The Eisenstein series of order $s = d/2$ in the vector representation is degenerate with the one above, but singular, so we ignore it here.

In the case of the Eisenstein series of order 1 in the spinor representations, the determination of the asymptotic behaviour is complicated by the presence of the constraints, and we have to content ourselves with the partial results
\[ E^{SO(d,d,\mathbb{Z})}_{S;s=1} = V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}(g_{ij}) + \frac{\pi^2}{3} V_d + \dots \tag{3.33a} \]
\[ E^{SO(d,d,\mathbb{Z})}_{C;s=1} = \frac{\pi^2}{3} V_d + \pi V_d\, E^{Sl(d,\mathbb{Z})}_{[2];s=1/2}(g_{ij}) + \dots \tag{3.33b} \]
to be compared with
\[ I_d = \frac{2\pi^2}{3} V_d + 2V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}(g_{ij}) + \dots. \tag{3.34} \]


The second term in (3.33a) is correct for $d \leq 3$, but we are not able to prove it explicitly for $d > 3$, due to the presence of the constraints; there are also exponentially suppressed corrections that we did not write. The second term in (3.33b) denotes the Eisenstein series of $Sl(d,\mathbb{Z})$ in the antisymmetric representation, and appears only when $d \geq 2$. For the particular order $s = 1/2$, it is easy to check from (B.14) that this series has the same eigenvalue as the Eisenstein series of order 1 in the fundamental representation. For $d = 2, 3$ we have also explicitly checked the equality of the two Eisenstein series, so that we are led to assert

Conjecture 5. For any $d$, the Eisenstein series of $Sl(d,\mathbb{Z})$ in the antisymmetric representation at the particular order $s = 1/2$ coincides with the Eisenstein series of order 1 in the fundamental representation:
\[ E^{Sl(d,\mathbb{Z})}_{[2];s=1/2} = \frac{1}{\pi}\, E^{Sl(d,\mathbb{Z})}_{d;s=1}. \tag{3.35} \]
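The role of this conjecture in the comparison with the modular integral can be made explicit by substituting (3.35) into (3.33b). The short computation below assumes (3.35) holds and keeps only the terms displayed in (3.33) and (3.34); the omitted terms, indicated by dots, are not controlled by this check.

```latex
\begin{align*}
E^{SO(d,d,\mathbb{Z})}_{C;s=1}
 &= \frac{\pi^2}{3} V_d
  + \pi V_d \cdot \frac{1}{\pi}\, E^{Sl(d,\mathbb{Z})}_{d;s=1}(g_{ij}) + \dots
  = \frac{\pi^2}{3} V_d + V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}(g_{ij}) + \dots , \\
2\, E^{SO(d,d,\mathbb{Z})}_{C;s=1}
 &= \frac{2\pi^2}{3} V_d + 2 V_d\, E^{Sl(d,\mathbb{Z})}_{d;s=1}(g_{ij}) + \dots .
\end{align*}
```

To the order shown, the first line coincides with the spinor expansion (3.33a), and the second with the expansion (3.34) of $I_d$.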

Assuming this is true, we can now formulate our second claim for the one-loop threshold:

Conjecture 6. The integral (3.16) of the $(d,d)$ lattice partition function on the fundamental domain of the moduli space of genus 1 Riemann surfaces is given for $d \geq 3$ by the $SO(d,d,\mathbb{Z})$ Eisenstein series of order $s = 1$ in either of the two spinor representations:
\[ I_d = 2\, E^{SO(d,d,\mathbb{Z})}_{S;s=1} = 2\, E^{SO(d,d,\mathbb{Z})}_{C;s=1}. \tag{3.36} \]

This is to be contrasted with the d = 1, 2 case (3.22), where the two spinors contribute in order to enforce the O(d, d, Z) invariance of the integral (3.16). When d > 2, we conjecture that the two Eisenstein series are equal for the particular order s = 1, so that a single series is sufficient to reproduce the threshold. For d = 3, this conjecture is actually a theorem, as follows from the computation of R 4 couplings in 7 dimensions [10]. For d = 4, the conjecture (3.36) together with the theorem (3.32) implies that the one-loop integral (3.16) is invariant under SO(4, 4) triality, a fact not obvious from its representation as a theta function.

3.6. Asymmetric thresholds and elliptic genus. So far, we focused on symmetric thresholds of the type (3.16), which often appear for half-BPS saturated couplings in type II strings, and showed how they could be expressed in terms of Eisenstein series of the $SO(d,d,\mathbb{Z})$ duality group. For heterotic strings however, the BPS condition constrains only the left-movers to be in their ground states, and the amplitude usually involves all excitations of the right-moving oscillators. Here we want to investigate the possible relevance of Eisenstein series for these quantities. Even though the (negative) outcome can already be anticipated due to the issue of symmetry enhancement, this will allow us to establish some identities that may become useful in later studies. One-loop BPS-saturated couplings for toroidal compactifications of the heterotic string can usually be written as the modular integral
\[ I^{het} = \int_{\mathcal{F}} \frac{d^2\tau}{\tau_2^2}\, Z_{d,d}(g, B; \tau)\, A(F, R, \tau), \tag{3.37} \]


where the insertion $A$ is an almost holomorphic modular form of weight 0, depending on the background gauge-field $F$ and curvature $R$ in the non-compact dimensions. By almost holomorphic, we mean that $A$ can be expanded as a finite polynomial in $1/\tau_2$,
\[ A(F, R, \tau) = \sum_{\nu=0}^{\nu_{max}} \frac{1}{\tau_2^{\nu}}\, A^{(\nu)}(F, R, \tau), \tag{3.38} \]

with $A^{(\nu)}(F, R, \tau)$ a meromorphic function in $q = e^{2\pi i \tau}$. The non-holomorphic contributions $\nu \geq 1$ come from back-reaction effects, or equivalently from contact terms at the boundary of moduli space. In all string applications, the coefficients $A^{(\nu)}$ have Laurent expansions with at most a simple pole in $q$, arising from the left-moving tachyon. When the elliptic genus does not depend on the gauge fields, it is actually possible to switch on Wilson lines $Y$, giving an $SO(d,d+k,\mathbb{Z})$-invariant threshold
\[ I_{d,k} = \int_{\mathcal{F}} \frac{d^2\tau}{\tau_2^2}\, Z_{d,d+k}(g, B, Y; \tau)\, A_k(R, \tau), \tag{3.39} \]
where
\[ Z_{d,d+k} = (\tau_2)^{d/2} \sum_{m_i, p^I, n^i} e^{-\pi \tau_2 M^2(V) - 2\pi i \tau_1 v^t \eta v}, \tag{3.40a} \]
\[ M^2(V) = v^t M_{d,k}(V)\, v, \qquad v = (m_i, p^I, n^i), \quad i = 1 \dots d, \quad I = 1 \dots k, \tag{3.40b} \]

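and $A_k$ is now an almost holomorphic modular form of weight $-k/2$. We can derive, also in this case, a set of second order partial differential equations satisfied by the lattice partition function $Z_{d,d+k}$. It is convenient to choose the following Iwasawa gauge, in the basis where the $SO(d,d+k)$ invariant tensor reads $\eta = \begin{pmatrix} & & 1_d \\ & 1_k & \\ 1_d & & \end{pmatrix}$:
\[ M_{d,k}(V) = \begin{pmatrix} 1 & & \\ Y^t & 1 & \\ C^t & -Y & 1 \end{pmatrix} \cdot \begin{pmatrix} g^{-1} & & \\ & 1_k & \\ & & g \end{pmatrix} \cdot \begin{pmatrix} 1 & Y & C \\ & 1 & -Y^t \\ & & 1 \end{pmatrix} \tag{3.41} \]
with $C = B - YY^t/2$ and $B$ antisymmetric, as a result of the $SO(d,d+k)$ constraint $M_{d,k}^t\, \eta\, M_{d,k} = \eta$. The right action by the $SO(d,d+k)$ elements
\[ \begin{pmatrix} 1 & y & -yy^t/2 \\ & 1 & -y^t \\ & & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & & b \\ & 1 & \\ & & 1 \end{pmatrix} \tag{3.42} \]
preserving the Iwasawa gauge generates a set of continuous Borel symmetries
\[ Y \to Y + y, \quad B \to B + \frac12 (yY^t - Yy^t) \qquad \text{or} \qquad B \to B + b, \tag{3.43} \]
which reduce to a discrete subgroup at the quantum level. The Laplacian then takes the form
\[ \Delta_{SO(d,d+k)} = \frac14\, g_{ik} g_{jl} \left( \frac{\partial}{\partial g_{ij}} \frac{\partial}{\partial g_{kl}} + \frac{\partial}{\partial B_{ij}} \frac{\partial}{\partial B_{kl}} \right) + \frac{2-k}{4}\, g_{ij} \frac{\partial}{\partial g_{ij}} + \Delta_Y, \tag{3.44} \]
where
\[ \Delta_Y = \frac12\, g_{ij}\, \delta^{IJ} \left( \frac{\partial}{\partial Y_i^I} - \frac12 Y_k^I \frac{\partial}{\partial B_{ik}} \right) \left( \frac{\partial}{\partial Y_j^J} - \frac12 Y_l^J \frac{\partial}{\partial B_{jl}} \right). \tag{3.45} \]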

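We may then show that the lattice partition function satisfies the following identities, generalizing (3.23) and (3.28),
\[ \left( \Delta_{SO(d,d+k)} + \frac{d(d+k-2)}{4} - D \right) Z_{d,d+k} = 0, \tag{3.46a} \]
\[ \left( \Box_{d,k} + \frac{d(d+k-2)}{8} - \frac12 D \right) Z_{d,d+k} = 0, \tag{3.46b} \]
where $D$ is the modular-covariant second order differential operator acting on modular forms of weight $(k/2, 0)$, and $\Box_{d,k}$ generalizes the non-invariant operator (3.27):
\[ D = 4\tau_2^2\, \partial_{\bar\tau} D_\tau, \qquad D_\tau = \partial_\tau - i\frac{k}{4\tau_2}, \tag{3.47a} \]
\[ \Box_{d,k} = \frac14\, g_{ik} g_{jl} \frac{\partial}{\partial g_{ij}} \frac{\partial}{\partial g_{kl}} - \frac18 \left( g_{ij} \frac{\partial}{\partial g_{ij}} \right)^2 + \frac{d+1-k/2}{4}\, g_{ij} \frac{\partial}{\partial g_{ij}} + \frac12 \Delta_Y. \tag{3.47b} \]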
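That these identities generalize (3.23) and (3.28) can be seen by setting $k = 0$. The sketch below uses the standard conventions $\partial_\tau = \frac12(\partial_{\tau_1} - i\partial_{\tau_2})$ and $\partial_{\bar\tau} = \frac12(\partial_{\tau_1} + i\partial_{\tau_2})$, which are assumed here rather than taken from the text, together with the definition of $\Delta_{Sl(2)}$ given below (3.23).

```latex
\begin{align*}
D \big|_{k=0} &= 4\tau_2^2\, \partial_{\bar\tau} \partial_\tau
  = \tau_2^2 \big( \partial_{\tau_1}^2 + \partial_{\tau_2}^2 \big)
  = 2\, \Delta_{Sl(2)} , \\
\text{(3.46a)} \big|_{k=0} :\quad
 & \Big( \Delta_{SO(d,d)} + \frac{d(d-2)}{4} - 2\Delta_{Sl(2)} \Big) Z_{d,d} = 0
   \quad \text{[this is (3.23)]} , \\
\text{(3.46b)} \big|_{k=0} :\quad
 & \Big( \Box_{d,0} + \frac{d(d-2)}{8} - \Delta_{Sl(2)} \Big) Z_{d,d} = 0
   \quad \text{[this is (3.28)]} , \\
\Box_{d,0} &= \frac14\, g_{ik} g_{jl}
   \frac{\partial}{\partial g_{ij}} \frac{\partial}{\partial g_{kl}}
 + \frac{d+1}{4}\, g_{ij} \frac{\partial}{\partial g_{ij}}
 - \frac18 \Big( g_{ij} \frac{\partial}{\partial g_{ij}} \Big)^2 .
\end{align*}
```

At $k = 0$ there are no Wilson lines, so $\Delta_Y$ drops out. The last line agrees with $\Box_d$ in (3.27) provided the first two terms are the $Gl(d)$ Laplacian of (2.4b); that equation is not reproduced in this section, so this identification is an assumption of the check.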

3.7. Boundary term and symmetry enhancement. A quick glance at the partial differential equations (3.46) may lead us to the conclusion that the $SO(d,d+k)$-invariant one-loop integral (3.39) should again be an eigenmode of the operators $\Delta_{SO(d,d+k)}$ and $\Box_{d,k}$. This is wrong however, due to the presence in $A$ of the tachyonic pole in $1/q$. This pole is usually killed by the integration on the strip $\tau_1 \in [-\frac12, \frac12]$ (at large $\tau_2$), except at special points of the moduli space where the lattice contains a length 2 vector: the contribution $q^1 \bar q^0$ from the lattice sum cancels the pole, which signals an enhancement of the gauge symmetry in space-time. In particular, using the identity (3.46) requires particular care for the boundary term
\[ \int_{\mathcal{F}} d^2\tau\, \frac{\partial}{\partial \bar\tau} \left[ \frac{1}{\tau_2^{\nu}}\, A_k^{(\nu)}(F, R)\, \frac{\partial}{\partial \tau} \left( Z_{d,d+k} \right) \right] = -\lim_{\tau_2 \to \infty} (\tau_2)^{d/2-\nu-1} \left[ A_k^{(\nu)}(F, R)\, \frac{Z_{d,d+k}}{(\tau_2)^{d/2}} \right]_{q^0 \bar q^0}. \tag{3.48} \]
The contribution from the constant term in $A$ and the ground state of $Z_{d,d+k}/(\tau_2)^{d/2}$ yields a moduli-independent divergent (for $d/2 - \nu - 1 > 0$) term which implies a harmless non-harmonicity, whereas the pole term in $A$ generates a harmonic anomaly localized at enhanced symmetry points in the moduli space, clearly not captured by any candidate Eisenstein series. Finally, the term integrated by parts now involves the descendant of the elliptic genus, as explained in [14]. For $d = 2$, the answer to this problem is well known: the threshold involves the automorphic form of $SO(2, 2+k)$ constructed by Borcherds [35,36] (see also [37,14] in the physics literature). The evaluation of the modular integral (3.39) by the method of orbits gives a presentation of this form as an infinite product over a sublattice, each term of which vanishes on a particular divisor of $[SO(2) \times SO(k)] \backslash SO(2, 2+k)$, where the gauge symmetry is enhanced. It would be interesting to construct the generalization of these objects to $d > 2$, where the complex structure is not present (and find the analogue of the generalized prepotentials obtained in Ref. [14]), but we will not pursue this line here.

4. U-Duality and Non-Perturbative $R^4$ Thresholds

While Eisenstein series provide a nice way to rewrite one-loop integrals such as (3.16), their utility becomes even more apparent when trying to extend the perturbative computation into a non-perturbatively exact result. Indeed, a prospective exact threshold should reduce in a weak coupling expansion to a sum of T-duality invariant Eisenstein-like perturbative terms, plus exponentially suppressed contributions, and Eisenstein series of the larger non-perturbative duality symmetry are natural candidates in that respect. This approach was taken in [10] for $R^4$ couplings in type II theories toroidally compactified to 7, 8, 9 dimensions, where the technology of $SO(d,d,\mathbb{Z})$ Eisenstein series was hardly needed; here we would like to extend it to lower dimensional compactifications, in an attempt to understand non-perturbative effects in these cases as well.

4.1. $R^4$ couplings and non-renormalization. Four graviton $R^4$ couplings in maximally supersymmetric theories have been argued in dimension $D \geq 8$ to receive no perturbative corrections beyond the tree-level and one-loop terms, and we shall assume that this holds in lower dimensions as well. The tree level term is simply obtained by dimensional reduction of the ten-dimensional $2\zeta(3) e^{-2\phi}$ term found in [38], while the one-loop term was explicitly shown to be given by the modular integral (3.16), after cancellation of the bosonic and fermionic oscillators, so that
\[ f_{R^4} = 2\zeta(3)\, \frac{V_d}{g_s^2} + I_d + \text{non pert.} \tag{4.1} \]
While the Ramond scalars are decoupled from the perturbative expansion by Peccei–Quinn symmetries, the full non-perturbative result should depend on all the scalars in the symmetric space $K \backslash E_{d+1(d+1)}(\mathbb{R})$, where $E_{d+1(d+1)}(\mathbb{R})$ is the maximally non-compact real form (also known as the normal real form) of the series of classical simply laced Lie groups $E_2 = Sl(2)$, $E_3 = Sl(3) \times Sl(2)$, $E_4 = Sl(5)$, $E_5 = SO(5,5)$ and exceptional Lie groups $E_6$, $E_7$, $E_8$ [39,40]. It should furthermore be invariant under the discrete symmetry group $E_{d+1(d+1)}(\mathbb{Z})$, also known as the U-duality group [41], which arises from the T-duality $SO(d,d,\mathbb{Z})$ by adjoining the exchange of the eleventh M-theory direction with any perturbative direction. The moduli space $K \backslash E_{d+1(d+1)}(\mathbb{R})$ has the structure of a bundle on the manifold $[SO(d) \times SO(d)] \backslash SO(d,d,\mathbb{R})$ on which the Neveu–Schwarz scalars $\phi$, $g$, $B$ live, with a fibre transforming as a spinor representation of $SO(d,d,\mathbb{R})$ in which the Ramond scalars live. For $D \geq 8$, it was shown [6, 8] that the exact threshold is an eigenmode of the Laplacian on the full scalar manifold as a consequence of supersymmetry, and we shall also assume that this persists in lower dimensions.

As proposed in Conjecture 6, the one-loop contribution can be written as the order $s = 1$ $SO(d,d,\mathbb{Z})$ Eisenstein series in the spinor representation. On the other hand, the tree-level term can itself be represented as an Eisenstein series in the singlet representation, using the property
\[ E^{G(\mathbb{Z})}_{1;s} = 2\zeta(2s), \tag{4.2} \]
valid for any $G$, which provides a natural representation for Apéry's transcendental number $\zeta(3)$. In analogy with the $D \geq 8$ case, we do not expect any further perturbative contribution. The $R^4$ threshold should thus be an automorphic form of $E_{d+1(d+1)}(\mathbb{Z})$ with asymptotic behavior
\[ f_{R^4} = \frac{V_d}{g_s^2}\, E^{SO(d,d,\mathbb{Z})}_{1;s=3/2} + E^{SO(d,d,\mathbb{Z})}_{S;s=1} + \text{non pert.} \tag{4.3} \]

4.2. String multiplet and non-perturbative $R^4$ couplings. In order to propose a non-perturbative extension of this result, we therefore need to unify the singlet and spinor representations of $SO(d,d,\mathbb{Z})$ into a representation of $E_{d+1(d+1)}(\mathbb{Z})$. Remarkably, there is one, namely the string multiplet, corresponding to the leftmost node in the Dynkin diagram
\[ \begin{array}{ccccccccc} & & & & \frac{1}{l_M^3} & & & & \\ & & & & | & & & & \\ \frac{R_1}{l_M^3} & - & \frac{R_1 R_2}{l_M^6} & - & \frac{R_1 R_2 R_3}{l_M^9} & - & \frac{R_1 R_2 R_3 R_4}{l_M^9} & - \cdots - & \frac{1}{R_{d+1}} \end{array} \tag{4.4} \]
where each node is labelled by the tension of the states transforming in the corresponding representation [42,5]. The string multiplet is described by a collection of charges $m^{[1]}$, $m^{[4]}$, $m^{[1;6]}$ describing the wrappings of membranes, five-branes and Kaluza–Klein monopoles respectively${}^{5}$, with a BPS mass given by
\[ T^2 = \frac{1}{l_M^6} \left( \tilde m^{[1]} \right)^2 + \frac{1}{l_M^{12}} \left( \tilde m^{[4]} \right)^2 + \frac{1}{l_M^{18}} \left( \tilde m^{[1;6]} \right)^2. \tag{4.5} \]
The dressed charges are given by
\[ \tilde m^{[1]} = m^{[1]} + C_3\, m^{[4]} + (C_3 C_3 + E_6)\, m^{[1;6]}, \qquad \tilde m^{[4]} = m^{[4]} + C_3\, m^{[1;6]}, \qquad \tilde m^{[1;6]} = m^{[1;6]}, \tag{4.6} \]
where $C_3$ and $E_6$ are the expectation values of the M-theory three-form and its dual, to be supplemented with an extra $K_{1;8}$ form in $D = 3$. See Ref. [43] for the $d \leq 4$ case and [44,5] for the general $d$ case. This amounts to an explicit partial Iwasawa decomposition of the symmetric spaces $K \backslash E_{d+1(d+1)}(\mathbb{R})$. The corresponding state preserves half the supersymmetries provided the following conditions are obeyed [43,5]:
\[ k^{[5]} = m^{[1]} m^{[4]} = 0, \tag{4.7a} \]
\[ k^{[2;6]} = m^{[1]} m^{[1;6]} + m^{[4]} m^{[4]} = 0, \tag{4.7b} \]
\[ k^{[5;6]} = m^{[4]} m^{[1;6]} = 0. \tag{4.7c} \]

The above constraints in turn transform as a U-duality multiplet, namely the three-brane multiplet [5]. For completeness, Table 4.1 lists the U-duality groups and string multiplets for any $d \leq 6$. The decomposition of this $E_{d+1(d+1)}(\mathbb{Z})$ irreducible representation into $SO(d,d,\mathbb{Z})$ representations was carried out in [44,5], and indeed gives a singlet $m = m^s$, a spinor $S = (m^i, m^{sijk}, m^{s;sijklm})$, plus some other multiplets $O$ when $d \geq 4$. In particular, for $d = 4$, there is an extra singlet $O = m^{ijkl}$ of $SO(4,4,\mathbb{Z})$, and for $d = 5$ a vector $O = (m^{ijkl}, m^{i;jklmn})$ of $SO(5,5,\mathbb{Z})$. The mass formula (4.5) is easily rewritten, for vanishing RR backgrounds, in terms of T-duality quantities, using the relations $l_M^3 = g_s l_s^3$, $R_s = g_s l_s$:
\[ T^2 = m^2 + \frac{V_d}{g_s^2}\, M^2(S) + \frac{V_d^2}{g_s^4}\, M^2(O), \tag{4.8} \]
where we set $l_s = 1$ and $M^2(O)$ is the usual T-duality invariant mass for a singlet ($d = 4$) or a vector ($d = 5$).

5 For simplicity we restrict ourselves to the case $d \leq 6$, i.e. $D \geq 4$.

Table 4.1. String multiplets of $E_{d+1(d+1)}(\mathbb{Z})$

D    d+1   U-duality group         irrep         Sl(d+1) content     SO(d,d) content
10   1     1                       1             1                   1
9    2     Sl(2,Z)                 2             2                   1+1
8    3     Sl(3,Z) × Sl(2,Z)       (3,1)         3                   1+2
7    4     Sl(5,Z)                 5             4+1                 1+4
6    5     SO(5,5,Z)               10            5+$\bar 5$          1+8_S+1
5    6     E6(6)(Z)                $\bar{27}$    6+15+$\bar 6$       1+16+10
4    7     E7(7)(Z)                133           7+35+28+...         1+32+(1+66)+...

Given this group theory fact, it is therefore quite tempting to consider the following non-perturbative generalization of (4.1):

Conjecture 7. The exact four-graviton $R^4$ coupling in toroidal compactifications of type II theory on $T^d$, or equivalently M-theory on $T^{d+1}$, is given, up to a factor of Newton's constant, by the Eisenstein series of the U-duality group $E_{d+1(d+1)}(\mathbb{Z})$ in the string multiplet representation, with order $s = 3/2$:
\[ f_{R^4} = \frac{V_{d+1}}{l_M^9}\, E^{E_{d+1(d+1)}(\mathbb{Z})}_{string;s=3/2}. \tag{4.9} \]

Here $l_M$ is the eleven-dimensional Planck length, and $V_{d+1} = R_s V_d$ the volume of the M-theory torus $T^{d+1}$. The quantity $V_{d+1}/l_M^9 = l_P^{d-8}$ is the U-duality invariant gravitational constant in dimension $D = 10 - d$. As an immediate check, the proposal has the appropriate scaling dimension $d + 1 - 9 + 3 \times 2$ for an $R^4$ coupling in dimension $D = 10 - d$.

4.3. Strings, particles and membranes. Before showing how this conjecture reproduces the tree-level and one-loop terms, a few remarks are in order. Firstly, our claim reduces to the Green–Gutperle conjecture (1.1) in the $d = 1$ case of M-theory on $T^2$, or equivalently $D = 10$ type IIB; it also contains the $D = 8, 7$ extension of [10], where the string multiplet transforms as a (3, 1) and 5 of $Sl(3,\mathbb{Z}) \times Sl(2,\mathbb{Z})$ and $Sl(5,\mathbb{Z})$ respectively, as well as the $D = 6$ proposal in [10], although in a refined way, since it is now a constrained Eisenstein series that is involved. This is needed to obtain an eigenmode of the Laplacian on the scalar manifold $K \backslash E_{d+1(d+1)}(\mathbb{R})$. Although such a requirement was strictly proved in $D \geq 8$ [6,8], it should very plausibly hold in lower dimensions. Using the general formula (2.11), we can compute the eigenvalue of the $E_{d+1(d+1)}(\mathbb{Z})$ Eisenstein series in the string, particle and membrane representations. These representations correspond to the leftmost, rightmost and upmost nodes in the Dynkin diagram (4.4) and can be labelled by $Sl(d+1)$ charges as follows: the charges of the string multiplet are given in (4.5), while the particle and membrane multiplets have charges $m^{[1]}, m^{[2]}, m^{[5]}, m^{[1;7]}, \dots$ and $m, m^{[3]}, m^{[1;5]}, \dots$ respectively, and are listed in Tables 4.2 and 4.3. Using the weights given in [5] we can compute the eigenvalues under the Laplacian:
\[ \Delta_{E_{d+1(d+1)}}\, E^{E_{d+1(d+1)}(\mathbb{Z})}_{string;s} = \frac{s(4s - d^2 + d - 4)}{8-d}\, E^{E_{d+1(d+1)}(\mathbb{Z})}_{string;s}, \tag{4.10a} \]
\[ \Delta_{E_{d+1(d+1)}}\, E^{E_{d+1(d+1)}(\mathbb{Z})}_{particle;s} = \frac{s(2(9-d)s + d^2 - 17d + 12)}{2(8-d)}\, E^{E_{d+1(d+1)}(\mathbb{Z})}_{particle;s}, \tag{4.10b} \]
\[ \Delta_{E_{d+1(d+1)}}\, E^{E_{d+1(d+1)}(\mathbb{Z})}_{membrane;s} = \frac{(d+1)s(2s - 3d + 4)}{2(8-d)}\, E^{E_{d+1(d+1)}(\mathbb{Z})}_{membrane;s}. \tag{4.10c} \]
(See Appendix A.4, (A.21) for the explicit form of the Laplacian on the $K \backslash E_{d+1(d+1)}(\mathbb{R})$ scalar manifold.) Substituting $s = 3/2$ in (4.10a) and noting that the U-duality invariant factor $V_{d+1}/l_M^9 = l_P^{d-8}$ is inert under the Laplacian, we obtain

Corollary 8. $R^4$ couplings in M-theory compactified on a torus $T^{d+1}$ are eigenmodes of the Laplacian on the symmetric space $K \backslash E_{d+1(d+1)}(\mathbb{R})$, with eigenvalue
\[ \Delta_{E_{d+1(d+1)}}\, f_{R^4} = \frac{3(d+1)(2-d)}{2(8-d)}\, f_{R^4}. \tag{4.11} \]
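The substitution leading to this eigenvalue is short enough to write out. The computation below uses only the string-multiplet eigenvalue (4.10a) at $s = 3/2$ and an elementary factorization.

```latex
\begin{align*}
\frac{s(4s - d^2 + d - 4)}{8 - d} \bigg|_{s = 3/2}
 &= \frac{3}{2} \cdot \frac{6 - d^2 + d - 4}{8 - d}
  = \frac{3}{2} \cdot \frac{-(d^2 - d - 2)}{8 - d} \\
 &= \frac{3}{2} \cdot \frac{-(d+1)(d-2)}{8 - d}
  = \frac{3(d+1)(2-d)}{2(8-d)} .
\end{align*}
```

In particular the eigenvalue vanishes at $d = 2$, as it does for the T-duality eigenvalue $d(2-d)/4$ of the one-loop integral in (3.24).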

This property could in principle be proved from supersymmetry arguments along the lines of [6,8], and holds order by order in the weak coupling expansion. In particular, the tree-level contribution $V_d/(g_s^2 l_s^2) = e^{12\phi/(d-8)}/l_P^{2-d}$, albeit not U-duality invariant, is an eigenmode of $\Delta_{E_{d+1(d+1)}}$ with the same eigenvalue as above, see Appendix A.4, Eq. (A.24).

Secondly, we assumed according to Conjecture 1 that the Eisenstein series in the spinor of SO(d, d, Z) reproduces the one-loop threshold; for d = 1, 2, this is incorrect, since we need also the conjugate spinor. However, the two contribute to two different kinematic structures $(t_8 t_8 \pm \epsilon_8 \epsilon_8/4)R^4$, and (4.9) is only concerned with the + structure, while the − structure is given at one-loop only by the SO(d, d, Z) Eisenstein series of order s = 1 in the conjugate spinor representation, and is U-duality invariant by itself.

Thirdly, we could have considered the representation (3.32) of the one-loop threshold in terms of the Eisenstein series of order d/2 − 1 in the vector of SO(d, d, Z); the latter appears as the leading term in the branching of the particle multiplet of $E_{d+1(d+1)}(\mathbb{Z})$ into representations of SO(d, d, Z) (see Table 4.2), so we would be led to the $E_{d+1(d+1)}(\mathbb{Z})$ Eisenstein series of order d/2 − 1 in the particle representation. Upon weak coupling expansion, this would start as a one-loop term $\mathcal{E}^{SO(d,d,\mathbb{Z})}_{V;s=d/2-1}$ as in (3.32), but would also include another perturbative term after Poisson resumming on the vector charges, which would plausibly be the tree-level term in (4.1). Similarly, we might have started from the representation of the one-loop coupling in terms of the SO(d, d, Z) Eisenstein series of order 1 in the conjugate spinor representation; the latter arises as the leading term in the branching of the membrane multiplet of $E_{d+1(d+1)}(\mathbb{Z})$ into representations of SO(d, d, Z) (see Table 4.3), so we would be led to the $E_{d+1(d+1)}(\mathbb{Z})$ Eisenstein series


Table 4.2. Particle multiplets of $E_{d+1(d+1)}(\mathbb{Z})$

D    d+1   U-duality group       irrep    Sl(d+1) content       SO(d,d) content
10   1     1                     1        1                     1
9    2     Sl(2,Z)               3        2 + 1                 2 + 1
8    3     Sl(3,Z) × Sl(2,Z)     (3,2)    3̄ + 3                 4 + 2
7    4     Sl(5,Z)               10       4̄ + 6                 6 + 4
6    5     SO(5,5,Z)             16       5̄ + 10 + 1            8_V + 8_C
5    6     E6(6)(Z)              27       6̄ + 15 + 6            10 + 16 + 1
4    7     E7(7)(Z)              56       7̄ + 21 + 2̅1̅ + 7       12 + 32 + 12

Table 4.3. Membrane multiplets of $E_{d+1(d+1)}(\mathbb{Z})$

D    d+1   U-duality group       irrep    Sl(d+1) content       SO(d,d) content
10   1     1                     1        1                     1
9    2     Sl(2,Z)               1        1                     1
8    3     Sl(3,Z) × Sl(2,Z)     (1,2)    1 + 1                 2
7    4     Sl(5,Z)               5̄        1 + 4                 4 + 1
6    5     SO(5,5,Z)             1̅6̅       1 + 10 + 5            8_C + 8_V
5    6     E6(6)(Z)              78       1 + 20 + 36 + ...     1̅6̅ + (1 + 45) + 16
4    7     E7(7)(Z)              912      1 + 35 + ...          32 + ...

of order 1 in the membrane representation, yielding the correct one-loop term plus an extra (presumably tree-level) perturbative contribution. Indeed, it is easy to check that the Eisenstein series

\[ \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{string};s=3/2}, \qquad \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{particle};s=d/2-1}, \qquad \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{membrane};s=1}, \tag{4.12a} \]

\[ \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{string};s=(d+1)(d-2)/4}, \qquad \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{particle};s=3(d+1)/(9-d)}, \qquad \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{membrane};s=3(d-2)/2} \tag{4.12b} \]

are all degenerate with $f_{R^4}$ under the Laplacian. It is thus quite tempting to conjecture

Conjecture 9. The Eisenstein series of $E_{d+1(d+1)}(\mathbb{Z})$, d > 2, in the string multiplet representation at the particular order s = 3/2 is equal to the one in the particle multiplet of order s = d/2 − 1, and to the one in the membrane multiplet of order s = 1, up to numerical coefficients and powers of Newton's constant:

\[ \frac{V_{d+1}}{l_M^9}\, \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{string};s=3/2} = \frac{\Gamma(d/2-1)}{\pi^{d/2-2}}\, \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{particle};s=d/2-1} = \frac{V_{d+1}}{l_M^9}\, \mathcal{E}^{E_{d+1(d+1)}(\mathbb{Z})}_{\mathrm{membrane};s=1}. \tag{4.13} \]

Again, it is easy to check that the scaling dimensions match. Note that the restriction d > 2 applies because we are making use of (3.36). For d = 3, 4, this conjecture nicely checks with (2.7), (3.35), (3.36):

\[ \mathcal{E}^{Sl(5,\mathbb{Z})}_{5;s=3/2} = \pi\, \mathcal{E}^{Sl(5,\mathbb{Z})}_{10;s=1/2} = \mathcal{E}^{Sl(5,\mathbb{Z})}_{\bar 5;s=1}, \tag{4.14a} \]

\[ \mathcal{E}^{SO(5,5,\mathbb{Z})}_{10;s=3/2} = \mathcal{E}^{SO(5,5,\mathbb{Z})}_{16;s=1} = \mathcal{E}^{SO(5,5,\mathbb{Z})}_{\overline{16};s=1} \tag{4.14b} \]


up to factors of Newton's constant, whereas d > 4 gives new identities. The automorphic forms in (4.13) should give three different representations of the same $R^4$ threshold in M-theory on $T^{d+1}$.

4.4. Weak coupling expansion and instanton effects. Now, in order to justify the claim (4.9), we need to show that it reproduces the perturbative contributions in (4.1) in a weak coupling expansion. This is achieved as usual by a sequence of Poisson resummations on the integral representation

\[ f_{R^4} = \frac{V_d}{g_s^2} \frac{\pi^s}{\Gamma(s)} \int\!\!\int \frac{dt\, d\theta}{t^{1+s}} \hat{\sum} \exp\left\{ -\frac{\pi}{t} \left[ m^2 + \frac{1}{g_s^2}\left( (\tilde m^i)^2 + V_d^2 (m_i)^2 \right) + \frac{V_d^2}{g_s^4}\, \bar m^2 \right] + 2\pi i \theta \left( m \bar m + m^i m_i \right) \right\}, \tag{4.15} \]

where the integral runs from 0 to +∞ for t and 0 to 1 for the Lagrange multiplier θ; the sum is on unrestricted integers, not vanishing all at the same time, and for definiteness we restricted to the d = 4 case with vanishing RR fields, and defined $m^i = \epsilon^{ijkl} m_{sjkl}/3!$ and $\bar m = \epsilon^{ijkl} m_{ijkl}/4!$. The leading contribution as $g_s \to 0$ arises from the term $m^i = m_i = \bar m = 0$ with $m \neq 0$, and reproduces the tree-level term in (4.1). After subtracting this term, the sum over m is now unrestricted, and we can Poisson resum on m using the formula (C.1):

\[ f_{R^4} = 2\zeta(2s) \frac{V_d}{g_s^2} + \frac{V_d}{g_s^2} \frac{\pi^s}{\Gamma(s)} \int\!\!\int \frac{dt\, d\theta}{t^{1+s-\frac12}} \hat{\sum} \exp\left\{ -\pi t\, (m + \theta \bar m)^2 - \frac{\pi}{t} \left[ \frac{1}{g_s^2}\left( (\tilde m^i)^2 + V_d^2 (m_i)^2 \right) + \frac{V_d^2}{g_s^4}\, \bar m^2 \right] + 2\pi i \theta\, m^i m_i \right\}, \tag{4.16} \]

where we should substitute s = 3/2. This now contains several contributions when $\bar m = 0$ (and therefore $m^i$, $m_i$ not simultaneously zero): for m = 0, we precisely recover the Eisenstein series of order s − 1/2 = 1 in the spinor representation, whereas $m \neq 0$ contains non-perturbative $e^{-1/g_s}$ effects:

\[ f_{R^4} = 2\zeta(2s) \frac{V_d}{g_s^2} + \left( \frac{V_d}{g_s^2} \right)^{\frac32 - s} \mathcal{E}^{SO(d,d,\mathbb{Z})}_{S;s-1/2} + \left( \frac{V_d}{g_s^2} \right)^{\frac{3-2s}{4}} \frac{2\pi^s}{\Gamma(s)} \hat{\sum_{m}}\, \hat{\sum_{m^i, m_i}} \delta(m^i m_i) \left[ \frac{m^2 V_d}{(\tilde m^i)^2 + V_d^2 (m_i)^2} \right]^{\frac{2s-1}{4}} K_{s-\frac12}\!\left( \frac{2\pi |m|}{g_s} \sqrt{(\tilde m^i)^2 + V_d^2 (m_i)^2} \right) + \dots. \tag{4.17} \]

Using the saddle point approximation (C.3) of the Bessel function at s = 3/2, we see that these non-perturbative terms can be interpreted as a superposition of Euclidean D0- and D2-branes wrapped on a one-cycle $m^i$ or a three-cycle $\epsilon^{ijkl} m_l$ of $T^4$, preserving half


of the supersymmetries ($m^i m_i = 0$) [30]. In addition to these terms, we have further contributions arising from $\bar m \neq 0$,

\[ \left( \frac{V_d}{g_s^2} \right)^{\frac{3-2s}{4}} \frac{2\pi^s}{\Gamma(s)} \int_0^1 d\theta\; \hat{\sum_{m}}\, \hat{\sum_{m^i, m_i}} \left[ \frac{m^2 g_s^2 V_d}{V_d^2 \bar m^2 + g_s^2 (\tilde m^i)^2 + g_s^2 V_d^2 (m_i)^2} \right]^{\frac{2s-1}{4}} K_{s-\frac12}\!\left( \frac{2\pi |m + \theta \bar m|}{g_s^2} \sqrt{V_d^2 \bar m^2 + g_s^2 (\tilde m^i)^2 + g_s^2 V_d^2 (m_i)^2} \right) e^{2\pi i \theta\, m^i m_i}, \tag{4.18} \]

which behave superficially as $e^{-1/g_s^2}$. Such non-perturbative effects are certainly unexpected in toroidal compactifications to D > 4, since there are no half-BPS instanton configurations with this action (the NS5-brane does have a tension scaling as $1/g_s^2$, but it can only give rise to Euclidean configurations with finite action when D ≤ 4). Unfortunately, the infinite sum is not uniformly convergent ($|m + \theta \bar m|$ can vanish at any rational value of θ), so we cannot be positive about the existence of such effects at this stage (see footnote 6). The matching of the tree-level and one-loop contributions, together with the consistent interpretation of the D-brane contribution, is however strong support for our conjecture.
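The step from (4.15) to (4.16) rests on a one-dimensional Gaussian Poisson resummation in m, with the θ-dependent phase turning into a shift of the dual integer. A minimal numerical sketch of that identity follows; the function names and the truncation `cutoff` are hypothetical, and whether the normalization coincides with formula (C.1) of the appendix is an assumption, since (C.1) is not reproduced in this section. Here `a` plays the role of $\theta \bar m$.

```python
import math

def lattice_side(t, a, cutoff=50):
    """sum_m exp(-pi m^2 / t) * exp(2 pi i m a); the imaginary part cancels by symmetry."""
    return sum(
        math.exp(-math.pi * m * m / t) * math.cos(2 * math.pi * m * a)
        for m in range(-cutoff, cutoff + 1)
    )

def dual_side(t, a, cutoff=50):
    """sqrt(t) * sum_k exp(-pi t (k + a)^2), the Poisson-resummed form."""
    return math.sqrt(t) * sum(
        math.exp(-math.pi * t * (k + a) ** 2)
        for k in range(-cutoff, cutoff + 1)
    )

# The two sides agree for any t > 0; the extra factor sqrt(t) is what lowers
# the power of t in the measure from t^(1+s) to t^(1+s-1/2) in (4.16).
for t, a in [(0.1, 0.3), (0.7, 0.0), (1.0, 0.25), (5.0, 0.8)]:
    assert abs(lattice_side(t, a) - dual_side(t, a)) < 1e-12
```

The original sum converges quickly at large 1/t and the dual sum at large t, which is why the resummed form is the convenient one for extracting the small-$g_s$ behaviour.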

5. Higher Genus Integrals and Higher Derivative Couplings

5.1. Genus g modular integral. Having discussed the modular integrals arising in one-loop amplitudes, one may ask if our methods carry over to higher-loop amplitudes, which are notoriously difficult to evaluate. We shall not attempt to make any full-fledged higher-genus amplitude computation, but we will consider the higher-genus analogue of (3.16), namely the integral of a lattice partition function on the (3g − 3)-dimensional moduli space of genus g curves

\[ I_d^g = \int_{\mathcal{M}_g} d\mu\; Z^g_{d,d}\left( g_{ij}, B_{ij}; \tau \right), \tag{5.1a} \]

\[ Z^g_{d,d} = V_d^g \sum_{m^i_A,\, n^{iA} \in \mathbb{Z}} \exp\left[ -\pi (g_{ij} + B_{ij}) (m^i_A + \tau_{AB}\, n^{iB})\, \tau_2^{AC}\, (m^j_C + \bar\tau_{CD}\, n^{jD}) \right]. \tag{5.1b} \]

Here, the integers $m^i_A$, $n^{iA}$ denote the winding numbers along the cycles $\gamma_A$ and $\gamma^A$ of a symplectic basis of the homology lattice of the genus g curve, and the period matrix $\tau_{AB}$, of positive definite imaginary part, describes the complex structure on the curve. $(m^i_A, n^{iA})$ transforms as a symplectic vector under Sp(g, Z), which now plays the role of the modular group. μ is the modular invariant Weil–Petersson measure on the moduli space $\mathcal{M}_g$ of genus g curves (see for instance [45] for a review). Except for g = 1, 2, $\tau_{AB}$ is a redundant parametrization of the Teichmüller space of dimension (3g − 3), constrained by Schottky relations. Nonetheless, for our computation it will be convenient to consider it as a set of independent parameters living in the symmetric space U(g)\Sp(g, R), with partial Iwasawa decomposition

\[ M(V) = \begin{pmatrix} I_g & \\ \tau_1 & I_g \end{pmatrix} \cdot \begin{pmatrix} \tau_2^{-1} & \\ & \tau_2 \end{pmatrix} \cdot \begin{pmatrix} I_g & \tau_1 \\ & I_g \end{pmatrix}. \tag{5.2} \]

6 One may carry out the Gaussian θ integration by summing over m modulo $\bar m$ only and then compute the sum over m, but this only takes us back to (4.15).

Note that the boost parameter $\tau_1$ is now symmetric, as imposed by the symplectic condition. From this it is straightforward (see Appendix A for the derivation) to determine an Sp(g, R) invariant second order differential operator, namely the Laplacian on this manifold (see footnote 7):

\[ \Delta_{Sp(g)} = \frac14\, \tau_{2AC}\, \tau_{2BD} \left( \frac{\partial}{\partial \tau_{1AB}} \frac{\partial}{\partial \tau_{1CD}} + \frac{\partial}{\partial \tau_{2AB}} \frac{\partial}{\partial \tau_{2CD}} \right), \tag{5.3} \]

which reduces to half the Sl(2, R) Laplacian (1.3) for g = 1. An explicit computation along the same lines as before shows that the genus g lattice sum continues to obey a partial differential equation

\[ \left( \Delta_{SO(d,d)} - \Delta_{Sp(g)} + \frac{dg(d-g-1)}{4} \right) Z^g_{d,d} = 0. \tag{5.4} \]
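The simplest instance of (5.4), g = 1 and d = 1 with $g_{11} = R^2$ and B = 0, can be checked numerically. The sketch below is only an illustration under stated assumptions: with the diagonally rescaled derivatives, (A.17) is read as $\Delta_{SO(1,1)} = \frac14 (R\partial_R)^2$ and (5.3) as $\Delta_{Sp(1)} = \tau_2^2(\partial_{\tau_1}^2 + \partial_{\tau_2}^2)$, and the constant in (5.4) becomes −1/4. The function names, the lattice truncation and the finite-difference step are all choices of this sketch. Each term of the lattice sum satisfies the equation separately, so the truncation does not spoil the check.

```python
import math

def lattice_sum(u, t1, t2, cutoff=6):
    """Genus-one (1,1) lattice sum of (5.1b) with R = exp(u), truncated to |m|, |n| <= cutoff."""
    radius = math.exp(u)
    total = 0.0
    for m in range(-cutoff, cutoff + 1):
        for n in range(-cutoff, cutoff + 1):
            norm = ((m + n * t1) ** 2 + (n * t2) ** 2) / t2
            total += math.exp(-math.pi * radius * radius * norm)
    return radius * total

def second_derivative(func, x, step=1e-3):
    """Central finite-difference estimate of the second derivative of func at x."""
    return (func(x + step) - 2.0 * func(x) + func(x - step)) / (step * step)

def pde_residual(u, t1, t2):
    """Left-hand side of (5.4) for g = d = 1, in the conventions assumed above."""
    z = lattice_sum(u, t1, t2)
    d_uu = second_derivative(lambda x: lattice_sum(x, t1, t2), u)   # (R d/dR)^2
    d_11 = second_derivative(lambda x: lattice_sum(u, x, t2), t1)
    d_22 = second_derivative(lambda x: lattice_sum(u, t1, x), t2)
    return 0.25 * d_uu - t2 * t2 * (d_11 + d_22) - 0.25 * z

# The residual should vanish up to finite-difference error.
assert abs(pde_residual(0.0, 0.3, 1.2)) < 1e-4
assert abs(pde_residual(0.2, -0.1, 0.9)) < 1e-4
```

Working with $u = \log R$ turns $R\partial_R$ into an ordinary derivative, which keeps the finite-difference stencil uniform in all three variables.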

The non-trivial step is now to integrate by parts the $\Delta_{Sp(g)}$ term. As we already emphasized, except in the g = 1, 2 case, the integration measure is not the Sp(g, R)-invariant measure on τ-space, but its restriction to the solution of the Schottky constraints. Nevertheless, we assume that the expression of $\Delta_{Sp(g)}$ in terms of the independent coordinates still yields the appropriate Laplacian, and we can therefore integrate it by parts. Under this plausible assumption, we obtain

\[ \Delta_{SO(d,d)}\, I_d^g = \frac{dg(g+1-d)}{4}\, I_d^g. \tag{5.5} \]

Quite amazingly, comparison with (3.12) shows that this eigenvalue agrees with that of the order s = g Eisenstein series in the spinor and conjugate spinor representations. We are therefore led to the

Conjecture 10. The integral (5.1) of the (d, d) lattice partition function on the fundamental domain of the moduli space of genus g Riemann surfaces is given, up to an overall factor, by the SO(d, d, Z) Eisenstein series of order g in the spinor representation:

\[ I_d^g \propto \mathcal{E}^{SO(d,d,\mathbb{Z})}_{S;s=g} + \mathcal{E}^{SO(d,d,\mathbb{Z})}_{C;s=g}. \tag{5.6} \]

Note that the superposition of the two spinor representations is required by the O(d, d, Z) invariance of the integrand. Normalizing (5.6) would require a knowledge of the Weil–Petersson volume of the moduli space of genus g curves.

7 Again, the derivatives with respect to the symmetric matrices $\tau_1$ and $\tau_2$ are computed in terms of the diagonally rescaled matrices $(1 - \delta_{AB}/2)\,\tau_{1,2;AB}$.


5.2. N = 4 topological string and higher derivative terms. The conjecture (5.6) is less substantiated than the one-loop conjecture (3.36), since we do not have a second differential operator at our disposal, nor can we control the large volume limit of the left-hand side of (5.6). It is however strongly reminiscent of the genus g partition function of the N = 4 topological string [46] on $T^2$, which was shown to be exactly given by the Eisenstein series of order s = g in the spinor representation, $\mathcal{E}^{Sl(2,\mathbb{Z})}_{2;s=g}(T)$ [47]. The precise result

\[ F^g(u_L, u_R) \propto \hat{\sum_{(m,n)}} |n + mT|^{2g-4} \left( \frac{u_L^+ u_R^+}{n + mT} + \frac{u_L^- u_R^-}{n + m\bar T} \right)^{4g-4} \tag{5.7} \]

involves a set of harmonic variables u, with charge 1/2 under the R-symmetry SO(2). This result was obtained from a set of first-order differential equations, which, loosely speaking, are nothing but the holomorphic half of our second-order differential equation (5.5). It was subsequently used to derive a set of higher derivative topological couplings $R^4 H^{4g-4}$ in type IIB string theory compactified on $T^2$ [48]. Our conjecture (5.6) suggests a natural generalization of these results to lower dimensions, which we shall now present.

The topological amplitude (5.7) can be identified with higher derivative couplings $R^4 H^{4g-4}$ in type IIB string theory on $T^2$ in the following way. The field-strengths of the Ramond two-forms $B_{\mu\nu}$, $D_{\mu\nu 12}$ transform as a doublet $H^i_{RR}$ of $Sl(2,\mathbb{R})_T$. Using the SO(2)\Sl(2, R) two-bein $e_i^{\pm\pm}$, these two three-forms can be converted into an SO(2) doublet $H^{\pm\pm}_{RR} = H^i_{RR}\, e_i^{\pm\pm}$, and further contracted with the harmonic variables into an SO(2) invariant $\hat H_{RR} = u_L^+ u_R^+ H^{--}_{RR} + u_L^- u_R^- H^{++}_{RR}$. Integrating (5.7) against $R^4 \hat H_{RR}^{4g-4}$ in harmonic superspace (see footnote 8) yields the physical coupling

\[ \int d^8x \sqrt{-\gamma} \sum_{p=2-2g}^{2g-2} (-)^p\, R^4\, (H^{++}_{RR})^{2g-2+p}\, (H^{--}_{RR})^{2g-2-p} \cdot \hat{\sum_{m,n}} \frac{T_2^g}{(m + nT)^{g+p}\, (m + n\bar T)^{g-p}}. \tag{5.8} \]

8 The precise contraction of the Lorentz indices is also obtained by dressing $\hat H_{RR}$ with Grassmann parameters, and generalizes the usual $t_8 t_8 + \epsilon_8 \epsilon_8/4$ combination [48].

Using the identity $(m + nT) H^{--}_{RR} - (m + n\bar T) H^{++}_{RR} = m H^2_{RR} - n H^1_{RR}$, we can rewrite the above result in a more suggestive way,

\[ \int d^8x \sqrt{-\gamma}\; \hat{\sum_{m,n}} \frac{R^4\, (m_i H^i_{RR})^{4g-4}}{\left( m_i M^{ij}(C)\, m_j \right)^{3g-2}}, \tag{5.9} \]

where M(C) is the mass matrix in the conjugate spinor representation C of SO(2, 2, Z). Indeed, $H^i$ transforms as a conjugate spinor under the T-duality group, while $m_i = (m, n)$ transforms in the dual way. More generally, in type IIB on $T^d$ the 2-form and 1-form potentials in the RR sector transform in the conjugate spinor and spinor representation of SO(d, d) respectively, while in type IIA these two representations are interchanged. Using the representation (5.9), the generalization of the g-loop $R^4 H^{4g-4}$ coupling to lower dimensions is then obvious: in type IIA variables,


Conjecture 11. The $R^4 H^{4g-4}$ couplings between 4 gravitons and 4g − 4 Ramond three-form field-strengths in type IIA compactified on $T^d$, d ≤ 4, are given at genus g by the SO(d, d, Z) constrained Eisenstein series in the spinor representation with insertions of 4g − 4 charges:

\[ I = \int d^{10-d}x \sqrt{-\gamma}\; e^{6(g-1)\phi}\; \hat{\sum_m} \delta(m \wedge m)\, \frac{R^4\, (m \cdot H_{RR})^{4g-4}}{\left( m \cdot M(S) \cdot m \right)^{3g-2}}, \tag{5.10} \]

where φ is the T-duality invariant dilaton, related to the ten-dimensional coupling as $e^{-2\phi} = V_d/(g_s^2 l_s^d)$, and we work in units of $l_s$.

The restriction d ≤ 4 is due to the fact that for D = 5 three-form field-strengths are Poincaré dual to two-form field-strengths, while for D = 4 they become part of the scalar manifold after dualization. A similar conjecture also holds for the coupling computed by the topological B-model [46],

Conjecture 12. The $R^4 F^{4g-4}$ couplings between 4 gravitons and 4g − 4 Ramond two-form field-strengths in type IIA compactified on $T^d$, d ≤ 6, are given at genus g by the SO(d, d, Z) constrained Eisenstein series in the conjugate spinor representation with insertions of 4g − 4 charges:

\[ I = \int d^{10-d}x \sqrt{-\gamma}\; e^{6(g-1)\phi}\; \hat{\sum_m} \delta(m \wedge m)\, \frac{R^4\, (m \cdot F_{RR})^{4g-4}}{\left( m \cdot M(C) \cdot m \right)^{3g-2}}. \tag{5.11} \]

Here, the restriction d ≤ 6 is due to the fact that for D = 3 two-form field-strengths become part of the scalar manifold after Poincaré dualization. The relation between these two conjectures and the genus g integral (5.6) is similar to the case of $(t_8 t_8 \pm \epsilon_8 \epsilon_8/4)R^4$ couplings in dimensions 8 or higher: the insertions of the vertex operators of the four gravitons and the 4g − 4 two-forms $F_{RR}$ or three-forms $H_{RR}$ saturate the fermionic zero-modes and select one out of the two spinor contributions in the modular integral (5.6). The end results (5.10) and (5.11) involve covariant modular functions instead of invariant ones, but behave as Eisenstein series of order 3g − 2 − (4g − 4)/2 = g for most purposes. They generalize the Sl(2, Z) modular functions $f^{p,q} = \hat{\sum}\, \tau_2^{(p+q)/2} / \left[ (m + n\tau)^p (m + n\bar\tau)^q \right]$, invariant up to a phase, that were also used in the context of non-perturbative type IIB string theory in [49,50].

5.3. Non-perturbative $R^4 H^{4g-4}$ couplings. Having put the g-loop amplitude in a manifestly T-duality invariant form (5.10), it is now straightforward to propose a non-perturbative completion, invariant under the full U-duality group. For that purpose, we note that the set of three-form field-strengths in M-theory compactified on $T^{d+1}$ fall into a representation of $E_{d+1(d+1)}$ dual to the string multiplet which already appeared in Sect. 4 (this is strictly speaking only correct for D ≥ 5, as explained below (5.10)). The string multiplet decomposes under SO(d, d, Z) into a singlet (the Neveu-Schwarz $H_{NS}$), a spinor (the Ramond three-forms obtained by reducing the M-theory four-form field-strength), as well as further terms for d ≥ 4. It is therefore tempting to propose

Conjecture 13. The $R^4 H^{4g-4}$ couplings between 4 gravitons and 4g − 4 three-form field-strengths in M-theory compactified on $T^{d+1}$, d ≤ 4, are exactly given, up to a power of Newton's constant, by the $E_{d+1(d+1)}(\mathbb{Z})$ constrained Eisenstein series in the string representation with insertions of 4g − 4 charges:

\[ I = \frac{V_{d+1}}{l_M^9} \int d^{10-d}x \sqrt{-\gamma}\; \hat{\sum_m} \delta(m \wedge m)\, \frac{R^4\, (m \cdot H)^{4g-4}}{\left( m \cdot M(\mathrm{string}) \cdot m \right)^{3g-\frac32}}. \tag{5.12} \]
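The statement in Section 5.2 that the Sl(2, Z) functions $f^{p,q}$ are invariant only up to a phase can be made concrete for the generator τ → −1/τ. The following sketch is illustrative only: the function name and the square truncation box are hypothetical choices, and the phase $(\tau/|\tau|)^p (\bar\tau/|\tau|)^q$ is derived here from the definition rather than quoted from the text. Because the relabelling (m, n) → (−n, m) maps the square box to itself, the covariance holds term by term for the truncated sum.

```python
def f_pq(tau, p, q, cutoff=12):
    """Truncated f^{p,q}(tau) = sum' tau_2^((p+q)/2) / ((m + n tau)^p (m + n conj(tau))^q)."""
    tau2 = tau.imag
    total = 0j
    for m in range(-cutoff, cutoff + 1):
        for n in range(-cutoff, cutoff + 1):
            if m == 0 and n == 0:
                continue  # the hatted sum omits the origin
            z = m + n * tau
            total += tau2 ** ((p + q) / 2) / (z ** p * z.conjugate() ** q)
    return total

def s_phase(tau, p, q):
    """Phase picked up by f^{p,q} under tau -> -1/tau."""
    unit = tau / abs(tau)
    return unit ** p * unit.conjugate() ** q

tau = 0.3 + 1.1j
for p, q in [(3, 1), (2, 2), (5, 1)]:
    lhs = f_pq(-1 / tau, p, q)
    rhs = s_phase(tau, p, q) * f_pq(tau, p, q)
    assert abs(lhs - rhs) < 1e-9
```

For p = q the phase is trivial and the usual invariant Eisenstein series of order p is recovered, which matches the remark that (5.10) and (5.11) behave as series of order g for most purposes.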


As an immediate check, we note that this proposal has the appropriate scaling dimension. The leading contribution arises by restricting the summation to $m_s \neq 0$ only, where $m_s$ is the top charge in the string multiplet m, contracted with the top three-form $H_{NS}$:

\[ I = \frac{V_d}{g_s^2 l_s^8} \int d^{10-d}x \sqrt{-\gamma}\; 2\zeta(2g+1)\, R^4 H_{NS}^{4g-4} + \dots \tag{5.13} \]

corresponding to a tree-level interaction involving the Neveu-Schwarz three-form only. The next-to-leading contribution is obtained by Poisson resummation on the integer $m_s$, and setting the dual integer to zero, as in our analysis of $R^4$ couplings. This has the effect of setting $m \cdot H = m_{RR} \cdot H_{RR}$ (for vanishing value of the Ramond scalars) and shifting the order 3g − 3/2 → 3g − 2. We thus reproduce the g-loop result (5.10). The analysis of non-perturbative effects is as in the $R^4$ case, and shows order $e^{-1/g_s}$ D-brane effects as well as, for d ≥ 4, contributions superficially of order $e^{-1/g_s^2}$. More explicitly, in the simplest example of ten-dimensional type IIB theory, we obtain, in units of the 10D Planck length,

\[ \hat{\sum_{m,n}} \left( \frac{\tau_2}{|m + n\tau|^2} \right)^{3g-\frac32} R^4\, (m H_{NS} - n H_{RR})^{4g-4} = 2\zeta(2g+1)\, R^4 H_{NS}^{4g-4} + 2\sqrt{\pi}\, \tau_2^{\frac52 - 3g}\, \frac{\Gamma(3g-2)}{\Gamma(3g - \frac32)}\, \zeta(6g-4)\, R^4\, (H_{RR} - \tau_1 H_{NS})^{4g-4} + O(e^{-1/g_s}). \tag{5.14} \]

Turning finally to the case of non-perturbative $R^4 F^{4g-4}$ couplings, we note that the two-form field-strengths of M-theory compactified on $T^{d+1}$ transform as the dual of the particle multiplet. The particle multiplet is the representation associated to the rightmost node in the Dynkin diagram (4.4) (see Ref. [5] for further details). It is thus quite natural to propose a non-perturbative completion as

\[ I = \int d^{10-d}x \sqrt{-\gamma}\; \hat{\sum_m} \delta(m \wedge m)\, \frac{R^4\, (m \cdot F)^{4g-4}}{\left( m \cdot M(\mathrm{particle}) \cdot m \right)^{4g-5+\frac{d}{2}}}, \tag{5.15} \]

where the power 4g − 5 + d/2 has been set by dimensional analysis. The particle multiplet decomposes as a vector and conjugate spinor of SO(d, d, Z) in that order, so that this proposal implies a one-loop term given by the SO(d, d, Z) Eisenstein series of order 2g − 3 + d/2 in the vector representation, plus a higher perturbative term which should reproduce the genus g term (5.11). Due to the presence of constraints, we are unfortunately not able to prove this statement at present. For g = 1, this conjecture is implied by the alternative form of the $R^4$ threshold in (4.13). Note that this proposal may in principle lift the difficulty raised by Berkovits and Vafa, who noted that in 8 dimensions the non-perturbative generalization of the genus g $R^4 F^{4g-4}$ terms should include a mixing between the U(1)\Sl(2, R) and SO(3)\Sl(3, R) moduli [48]. Here the mixing is built-in, since the particle multiplet transforms in the (3, 2) of Sl(3, Z) × Sl(2, Z). Let us finally note that our techniques could also be used to generalize the conjectures about $\nabla^{2k} R^4$ and $R^{3m+1}$ terms [51,52], but the status of these is less clear.


6. Conclusions

Duality provides strong constraints on the non-perturbative extension of string theory. It is especially powerful in vacua with many supersymmetries, where physical amplitudes and low energy couplings have to be invariant under the symmetry group. For a restricted class of BPS saturated couplings, the supersymmetry constraints close into a set of partial differential equations, which together with perturbative boundary conditions allow one to determine the result exactly. Such techniques have enabled us to obtain convenient representations of one-loop thresholds manifestly invariant under T-duality, to compute higher-genus amplitudes not tractable otherwise, and to propose an exact non-perturbative completion of $R^4 H^{4g-4}$ couplings in toroidal compactifications of M-theory. Upon expansion in weak coupling, these results reveal a tree-level and g-loop contribution, non-perturbative order $e^{-1/g_s}$ effects that can be attributed to Euclidean D-branes wrapped on various cycles of the internal torus, as well as further ill-understood non-perturbative effects superficially of order $e^{-1/g_s^2}$, appearing in dimension D = 6 and lower. It would be very interesting to ascertain the behaviour of these effects, and eventually give an instantonic interpretation for them. In D = 4 we expect such $e^{-1/g_s^2}$ effects from the Euclidean NS5-brane wrapped on $T^6$, which should be extracted from our conjecture (4.9). Finally, the generalization to D ≤ 2 should involve Eisenstein-like series for affine Lie algebras and even hyperbolic Kac-Moody algebras.

We have focused in this work on half-BPS saturated couplings in maximally supersymmetric theories. It would be interesting to extend our techniques to (i) couplings preserving a lesser amount of supersymmetry, and (ii) half-BPS states in theories with less supersymmetry. Given that the quadratic half-BPS constraint imposes second order differential equations and that the quarter-BPS condition is cubic in the charges, one may envisage that quarter-BPS saturated couplings should be eigenmodes of a cubic Casimir operator, and expressible as generalized Eisenstein series. As for the second issue, one has to face situations where the gauge symmetry can be enhanced at a particular point in the moduli space, a case where Eisenstein series seem to be of little relevance. The differential equations (3.46) and the generalized prepotentials of [14] should prove useful for constructing automorphic forms with the required singularity structure, generalizing [53,37,36,54]. Particularly interesting cases include the toroidal compactifications of the heterotic string, where five-brane instantons are little understood; type IIB compactified on K3, where the moduli space unifies the dilaton with the other scalars in a simple form [SO(5) × SO(21)]\SO(5, 21) and where tensionless strings appear at singularities of K3; and the FHSV model [29], where the duality group is broken to a subgroup of SO(2, 10, Z) by the freely acting orbifold construction.

On a more mathematical level, our results provide a wealth of explicit examples of modular functions on symmetric spaces of non-compact type K\G, with G a real simply laced Lie group in the normal real form, that generalize the Eisenstein series on the fundamental domain of the upper half-plane. These functions can be associated to any fundamental representation of G, and are eigenmodes of the Laplacian with an easily computable eigenvalue. From analyzing their asymptotics and their behaviour under the Laplace operator as well as some other differential operator, we have been able to obtain a number of relations between Eisenstein series in various representations, although we had to content ourselves with conjectures rather than proofs in several cases. This has shown that Eisenstein series may become equal for certain values of the order s, the most useful example being the equality of the vector, spinor and conjugate spinor Eisenstein series of SO(d, d, Z) at s = d/2 − 1, s = 1 and s = 1 respectively. On the other hand, two Eisenstein series with the same eigenvalue under the Laplacian may


still be separated by an extra differential operator, like d in the SO(d, d) case. We have not addressed the question of the analyticity of Eisenstein series with respect to the order s: this would require an asymptotic expansion analogous to (1.4) or (2.15) with a uniformly suppressed general term. Unfortunately, it seems that the presence of constraints tends to give rise to ill-behaved expansions such as (3.31). This problem is the mathematical counterpart of the physical one raised above, namely understanding the instanton effects that are superficially of order $e^{-1/g_s^2}$. It would be interesting to understand more precisely what Eisenstein series are needed to generate the spectrum of the Laplace operator for any eigenvalue (note in that respect that the order s is no longer a good parametrization, since the relation between the eigenvalue and s depends on the representation). From a mathematical point of view however, Eisenstein series are the least interesting part of the spectrum on such manifolds, which should also include a discrete series of cusp forms. Perhaps string theory will provide an explicit example of these elusive objects.

Acknowledgements. We are grateful to Nathan Berkovits, Joseph Bernstein, Richard Borcherds, Greg Moore, Marios Petropoulos, Erik Verlinde, Jean-Bernard Zuber and Gysbert Zwart for useful discussions or correspondence, and especially to Elias Kiritsis for participation at an early stage of this project. B.P. thanks Nordita and both of the authors thank the CERN Theory Division for their kind hospitality and support during the completion of this work.

Appendices

A. Gl(d), Sl(d), SO(d, d) and Sp(g) Laplacians

In this appendix we give some details of the derivation of the Laplacians (2.4b), (2.5b), (3.11b) and (5.3) on the scalar manifolds for the four cases of Gl(d), Sl(d), SO(d, d) and Sp(g) symmetry, as well as some useful alternative forms. The Laplacians are computed from the general expression

\[ \Delta = \frac{1}{\sqrt{\gamma}}\, \partial_\mu \sqrt{\gamma}\, \gamma^{\mu\nu} \partial_\nu, \qquad ds^2 = \gamma_{\mu\nu}\, dx^\mu dx^\nu = -\frac12 \mathrm{Tr}\; dM\, dM^{-1}, \tag{A.1} \]

where γ is the bi-invariant metric on the symmetric space K\G, parametrized by the symmetric matrix M.

A.1. Laplacian on the SO(d)\Gl(d, R) and SO(d)\Sl(d, R) symmetric spaces. For the SO(d)\Gl(d) case, we can choose M = g a symmetric positive definite matrix, and the metric $ds^2$ and volume element take the form

\[ ds^2 = g^{ik} g^{jl}\, dg_{ij}\, dg_{kl}, \qquad \det(ds^2) = 2^{\frac{d(d-1)}{2}} (\det g)^{-(d+1)}. \tag{A.2} \]
Its inverse is easily computed by ordering the indices, 2 = dsinv

X i,j

+2

gij gij dg ii dg jj + X i;k
1 X 2

i<j ;k
ii

gik gil dg dg

kl

gik gj l + gil gj k dg ij dg kl (A.3)


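and using the relation

\[ \frac{\partial \det g}{\partial g_{ij}} = (2 - \delta_{ij})\, g^{ij} \det g. \tag{A.4} \]

We find, after some algebra,

\[ \Delta_{Gl(d)} = \sum_{i \le j;\, k \le l} \frac{\partial}{\partial g_{ij}}\, g_{ik}\, g_{jl}\, \frac{\partial}{\partial g_{kl}} - \frac{d+1}{2} \sum_{i \le j} g_{ij} \frac{\partial}{\partial g_{ij}}, \tag{A.5} \]

which can also be put in the form

\[ \Delta_{Gl(d)} = \sum_{i \le j;\, k \le l} g_{ik}\, g_{jl}\, \frac{\partial}{\partial g_{ij}} \frac{\partial}{\partial g_{kl}} + \frac{d+1}{2} \sum_{i \le j} g_{ij} \frac{\partial}{\partial g_{ij}}. \tag{A.6} \]

In order to avoid the cumbersome sums over ordered indices, it is convenient to introduce the diagonally rescaled metric

\[ \tilde g_{ij} = (1 - \delta_{ij}/2)\, g_{ij}. \tag{A.7} \]

This then satisfies the properties

\[ \frac{\partial g_{ij}}{\partial \tilde g_{kl}} = \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}, \qquad \frac{\partial \det g}{\partial \tilde g_{ij}} = 2\, g^{ij} \det g, \tag{A.8} \]

which allows one to write the above Laplacian in the covariant form

\[ \Delta_{Gl(d)} = \frac14\, g_{ik}\, g_{jl}\, \frac{\partial}{\partial \tilde g_{ij}} \frac{\partial}{\partial \tilde g_{kl}} + \frac{d+1}{4}\, g_{ij} \frac{\partial}{\partial \tilde g_{ij}}, \tag{A.9} \]

where now repeated indices are summed over without further restrictions. This is the form given in (2.4b), where we omitted the tilde on the redefined metric, as done throughout the text of the paper for simplicity of notation.

To compute the Laplacian on the SO(d)\Sl(d) symmetric space from this, we decompose the element g of Gl(d) as $g = t\, \tilde g$, with $\det \tilde g = 1$. The metric then takes the form

\[ ds^2_{Gl} = ds^2_{Sl} + d \left( \frac{dt}{t} \right)^2, \qquad \sum_{i \le j} g_{ij} \frac{\partial}{\partial g_{ij}} = t\, \partial_t, \tag{A.10} \]

so that the Laplacian reads

\[ \Delta_{Gl(d)} = \Delta_{Sl(d)} + \frac{1}{d}\, t\partial_t\, t\partial_t = \Delta_{Sl(d)} + \frac{1}{4d} \left( g_{ij} \frac{\partial}{\partial \tilde g_{ij}} \right)^2. \tag{A.11} \]

Together with (A.9), this yields the result (2.5b) for the Sl(d) Laplacian.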


A.2. Laplacian on the [SO(d) × SO(d)]\SO(d, d, R) symmetric space. Next, we turn to the Laplacian on the symmetric space [SO(d) × SO(d)]\SO(d, d) of dimension $d^2$. We can choose the symmetric moduli matrix M as in (3.4), so that the metric in (A.1) reads

\[ ds^2 = g^{ik} g^{jl} \left( dg_{ij}\, dg_{kl} + dB_{ij}\, dB_{kl} \right). \tag{A.12} \]

This is a fibration on the coset SO(d)\Gl(d), so we only need to compute the Laplacian on the fiber. We order the indices as

\[ ds_B^2 = 2 \sum_{i<j;\, k<l} \left( g^{ik} g^{jl} - g^{il} g^{jk} \right) dB_{ij}\, dB_{kl}. \tag{A.13} \]

The determinant of the metric on the fiber is $\gamma_B = 1/(\det g)^{d-1}$, up to an irrelevant numerical factor. Using (A.2), the volume form on the total manifold is therefore $\sqrt{\gamma} = (\det g)^{-d}$. The inverse metric reads

\[ ds^2_{B,\mathrm{inv}} = \frac12 \sum_{i<j;\, k<l} \left( g_{ik}\, g_{jl} - g_{il}\, g_{jk} \right) dB^{ij}\, dB^{kl}, \tag{A.14} \]

so that the Laplacian on the fiber is given by

\[ \Delta_B = \frac12 \sum_{i<j;\, k<l} \left( g_{ik}\, g_{jl} - g_{il}\, g_{jk} \right) \frac{\partial}{\partial B_{ij}} \frac{\partial}{\partial B_{kl}}, \tag{A.15} \]

where we let $\partial B_{ij}/\partial B_{kl} = \delta_{ik}\delta_{jl} - \delta_{il}\delta_{jk}$. Putting this together with the Laplacian (A.6) on the base (with the appropriate volume element), we find

\[ \Delta_{SO(d,d)} = \sum_{i \le j;\, k \le l} g_{ik}\, g_{jl}\, \frac{\partial}{\partial g_{ij}} \frac{\partial}{\partial g_{kl}} + \sum_{i \le j} g_{ij} \frac{\partial}{\partial g_{ij}} + \frac14 \sum_{ijkl} g_{ik}\, g_{jl}\, \frac{\partial}{\partial B_{ij}} \frac{\partial}{\partial B_{kl}}, \tag{A.16} \]

where the sum in the last term runs over unconstrained indices. An alternative form using the redefined metric (A.7) is

\[ \Delta_{SO(d,d)} = \frac14\, g_{ik}\, g_{jl} \left( \frac{\partial}{\partial \tilde g_{ij}} \frac{\partial}{\partial \tilde g_{kl}} + \frac{\partial}{\partial B_{ij}} \frac{\partial}{\partial B_{kl}} \right) + \frac12\, g_{ij} \frac{\partial}{\partial \tilde g_{ij}}, \tag{A.17} \]

which is the one given in (3.11b). The SO(d, d + k) Laplacian (3.44) can be computed using similar techniques, but we will not give the details of this computation here.

A.3. Laplacian on the U(g)\Sp(g, R) symmetric space. Next, we derive the Laplacian on the U(g)\Sp(g) symmetric space, relevant for the genus g amplitude in (5.1). Using for M the moduli matrix (5.2), the metric in (A.1) takes the form

\[ ds^2 = \tau_2^{AC}\, \tau_2^{BD} \left( d\tau_{1AB}\, d\tau_{1CD} + d\tau_{2AB}\, d\tau_{2CD} \right). \tag{A.18} \]

This is again a fibration on the coset SO(g)\Gl(g), so again we only need to compute the Laplacian on the fiber. The determinant of the metric on the fiber is $\gamma|_{\tau_1} = 1/(\det \tau_2)^{g+1}$, up to an (irrelevant) numerical factor, so that, using (A.2), the volume form on the total manifold is $\sqrt{\gamma} = (\det \tau_2)^{-(g+1)}$. With the known result (A.6) for the Gl(d) Laplacian, we then obtain

\[ \Delta_{Sp(g)} = \sum_{A \le B;\, C \le D} \frac{\partial}{\partial \tau_{2AB}}\, \tau_{2AC}\, \tau_{2BD}\, \frac{\partial}{\partial \tau_{2CD}} + \sum_{A \le B;\, C \le D} \tau_{2AC}\, \tau_{2BD}\, \frac{\partial}{\partial \tau_{1AB}} \frac{\partial}{\partial \tau_{1CD}} - (g+1) \sum_{A \le B} \tau_{2AB} \frac{\partial}{\partial \tau_{2AB}}. \tag{A.19} \]

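Diagonally rescaling $\tau_1$ and $\tau_2$ as before gives the more compact and covariant expression
$$\Delta_{Sp(g)} = \frac14\,\tau_{2AC}\tau_{2BD}\left(\frac{\partial}{\partial\tau_{1AB}}\frac{\partial}{\partial\tau_{1CD}} + \frac{\partial}{\partial\tau_{2AB}}\frac{\partial}{\partial\tau_{2CD}}\right), \tag{A.20}$$
which is the form given in (5.3) and reduces to half the usual Laplacian on the Poincaré upper half-plane for $g = 1$.

A.4. Laplacian on the $K\backslash E_{d+1(d+1)}(\mathbb{R})$ symmetric space. We finally give here also the Laplacian on the scalar manifold $K\backslash E_{d+1(d+1)}(\mathbb{R})$ of eleven-dimensional supergravity on $T^{d+1}$ (equivalently type IIA string theory on $T^d$). In this case, the scalars are given by the metric $g_{IJ}$, $I = 1\dots d+1$, a three-form $C_{IJK}$ and its dual $E_6$ (and for $D = 11-(d+1)\le 3$ an extra $K_{1,8}$-form, which will not be included below). For $d\le 6$, the corresponding Laplacian is given by
$$\begin{aligned}
\Delta_{E_{d+1(d+1)}} ={}& \frac14\, g_{IK}g_{JL}\frac{\partial}{\partial g_{IJ}}\frac{\partial}{\partial g_{KL}} + \frac{1}{4(d-8)}\left(g_{IJ}\frac{\partial}{\partial g_{IJ}}\right)^2 + \frac{(d+7)(d-4)}{4(8-d)}\, g_{IJ}\frac{\partial}{\partial g_{IJ}}\\
&+ \frac{1}{2\cdot 3!\, l_M^6}\, g_{IK}g_{JL}g_{PQ}\left(\frac{\partial}{\partial C_{IJP}} - 10\, C_{RST}\frac{\partial}{\partial E_{RSTIJP}}\right)\cdot\left(\frac{\partial}{\partial C_{KLQ}} - 10\, C_{UVW}\frac{\partial}{\partial E_{UVWKLQ}}\right)\\
&+ \frac{1}{2\cdot 6!\, l_M^{12}}\, g_{IK}g_{JL}g_{PQ}g_{RU}g_{SV}g_{TW}\frac{\partial}{\partial E_{IJPRST}}\frac{\partial}{\partial E_{KLQUVW}}.
\end{aligned} \tag{A.21}$$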

The eigenvalues (4.10) of the Eisenstein series of the particle, string and membrane multiplet can be checked explicitly from this form using the mass formulae of these multiplets and the techniques employed in Appendix B. To this end it is important to express the 11D Planck length $l_M$, which is not invariant under the U-duality group $E_{d+1(d+1)}(\mathbb{Z})$, in terms of the invariant Planck length $l_P$ using the relation $V_{d+1}/l_M^9 = l_P^{d-8}$. Note also that, since for $d\le 4$ the U-duality groups $E_{d+1(d+1)}(\mathbb{Z})$ are of the $Sl$ and $SO$ type, the Laplacian above should reduce to the corresponding forms by appropriate redefinition of the scalars. For $d = 5, 6$, with U-duality group $E_6$, $E_7$, the above Laplacian is not contained in the previous results.


It is useful to determine the T-duality decomposition of the Laplacian (A.21). For that purpose, we compute the kinetic terms of the scalars in the Kaluza–Klein reduction of ten-dimensional type IIA theory. Going to the Einstein frame $g\to e^{4\phi/(8-d)}g$, where $e^{\phi} = g_s/\sqrt{V_d}$ is the invariant dilaton, we find
$$S = \int d^{10-d}x\,\sqrt{-g}\left[R + \frac{4}{8-d}\,\partial\phi\,\partial\phi - \frac14\,\partial g\,\partial g^{-1} + \frac14\,\partial B\,g^{-1}\,\partial B\,g^{-1} + \frac{e^{2\phi}}{2}\,\partial\mathcal{R}\cdot M(S)\cdot\partial\mathcal{R} + \dots\right]. \tag{A.22}$$
Here, $\mathcal{R}$ denotes the Ramond scalars transforming in the spinor representation of $SO(d,d)$, and the dots stand for extra scalars which originate from dualizing the Kaluza–Klein one-form, Neveu–Schwarz two-form or Ramond forms in $d\ge 5$. From the property
$$\sum_{k\ \mathrm{even}} k\binom{d}{k} = \sum_{k\ \mathrm{odd}} k\binom{d}{k} = d\,2^{d-2}, \tag{A.23}$$
it follows that the mass matrix $M(S)$, like $M(C)$, has unit determinant. The volume element is thus given by $\sqrt{\gamma} = e^{2^{d-1}\phi}$ (for $d<5$ and in fact also $d=5$), and the Laplacian on the symmetric space $K\backslash E_{d+1(d+1)}(\mathbb{R})$ then reads, in variables appropriate for T-duality,
$$\Delta_{E_{d+1(d+1)}} = \frac{8-d}{16}\left[\partial_\phi^2 + 2^{d-1}\,\partial_\phi\right] + \Delta_{SO(d,d)} + \frac{e^{-2\phi}}{2}\,\partial_{\mathcal{R}}\cdot M^{-1}(S)\cdot\partial_{\mathcal{R}} + \dots. \tag{A.24}$$
From this we can for example check that the Einstein-frame tree-level $R^4$ term $e^{12\phi/(d-8)}$, or the one-loop term $e^{2(d-2)\phi/(d-8)}\,E^{SO(d,d,\mathbb{Z})}_{S,C;s=1}$, are eigenmodes of the U-duality invariant Laplacian as required by the conjecture (4.11).
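The binomial property (A.23), which underlies the unit determinant of the spinor mass matrices, can be spot-checked for small $d$. The following is a minimal sketch; the function name `weighted_binomial_sums` is hypothetical and not part of the original text.

```python
from math import comb

def weighted_binomial_sums(d):
    """Return (sum over even k, sum over odd k) of k * C(d, k), as in (A.23)."""
    even = sum(k * comb(d, k) for k in range(0, d + 1, 2))
    odd = sum(k * comb(d, k) for k in range(1, d + 1, 2))
    return even, odd

# Both sums should equal d * 2**(d - 2) for every d >= 2.
for d in range(2, 11):
    even, odd = weighted_binomial_sums(d)
    assert even == odd == d * 2 ** (d - 2)
```

The check only covers $2\le d\le 10$; it illustrates the identity rather than proving it.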

A.5. Decompactification of the Laplacians. We conclude by giving the decompactification formulae for the $Gl(d)$ and $SO(d,d)$ Laplacians. These are relevant for the study of the decompactification properties of the corresponding Eisenstein series. We will consider only the $SO(d,d)$ case, since the resulting formulae for $Gl(d)$ and $Sl(d)$ can easily be obtained from this case. For the metric we take the $U(1)$-fibered form
$$dx^i g_{ij}\,dx^j = R^2\,(dx^1 + A_a\,dx^a)^2 + dx^a\,\hat g_{ab}\,dx^b, \tag{A.25}$$
where $a = 2,\dots,d$ and the original metric is $g_{ij}$. We also define
$$B_{1a} = B_a,\qquad B_{ab} = \hat B_{ab} + \frac12\left[A_aB_b - A_bB_a\right]. \tag{A.26}$$
In terms of these variables, T-duality takes the simple form
$$R\leftrightarrow\frac{1}{R},\qquad e^{\phi}\leftrightarrow\frac{e^{\phi}}{R},\qquad A_a\leftrightarrow B_a,\qquad (\hat g_{ab},\hat B_{ab})\ \text{inv.} \tag{A.27}$$


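For the purpose of dimensional reduction it is, however, more convenient to introduce a modified $\tilde B_{ab}$ field invariant under gauge transformations of $A_a$ (but not under shifts of $B_a$):
$$B_{ab} = \tilde B_{ab} + A_aB_b - B_aA_b. \tag{A.28}$$
In the expressions below, we also use the diagonally rescaled metric (A.7) for $\hat g$ whenever it appears in derivatives. The Jacobian for the change of variables from $(g_{ij}, B_{ij})$ to $(R, A_a, \hat g_{ab}, B_a, \tilde B_{ab})$ is given by
$$\frac{\partial}{\partial g_{11}} = \frac{1}{2R}\frac{\partial}{\partial R} - \frac{A_a}{R^2}\frac{\partial}{\partial A_a} + \frac12 A_aA_b\frac{\partial}{\partial\hat g_{ab}} + \frac{A_aB_b}{R^2}\frac{\partial}{\partial\tilde B_{ab}}, \tag{A.29a}$$
$$\frac{\partial}{\partial g_{1a}} = \frac{1}{R^2}\frac{\partial}{\partial A_a} - A_b\frac{\partial}{\partial\tilde{\hat g}_{ab}} - \frac{B_b}{R^2}\frac{\partial}{\partial\tilde B_{ab}}, \tag{A.29b}$$
$$\frac{\partial}{\partial B_{1a}} = \frac{\partial}{\partial B_a} + A_b\frac{\partial}{\partial\tilde B_{ab}},\qquad \frac{\partial}{\partial g_{ab}} = \frac{\partial}{\partial\hat g_{ab}},\qquad \frac{\partial}{\partial B_{ab}} = \frac{\partial}{\partial\hat B_{ab}}. \tag{A.29c}$$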
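The metric part of this Jacobian follows from the chain rule applied to (A.25), that is $g_{11} = R^2$, $g_{1a} = R^2A_a$, $g_{ab} = \hat g_{ab} + R^2A_aA_b$. The sketch below checks the raw partial derivatives with respect to $g_{11}$ by finite differences. The helper names `decompose` and `d_dg11` and the sample metric are illustrative assumptions. The factor $1/2$ multiplying $A_aA_b\,\partial/\partial\hat g_{ab}$ in (A.29a) presumably reflects the summation convention with the rescaled metric (A.7), which this sketch does not model.

```python
import math

def decompose(g):
    """Split a symmetric metric g into (R, A, ghat) following (A.25):
    g_11 = R^2, g_1a = R^2 A_a, g_ab = ghat_ab + R^2 A_a A_b."""
    r2 = g[0][0]
    n = len(g)
    A = [g[0][a] / r2 for a in range(1, n)]
    ghat = [[g[a][b] - g[0][a] * g[0][b] / r2 for b in range(1, n)]
            for a in range(1, n)]
    return math.sqrt(r2), A, ghat

def d_dg11(g, eps=1e-6):
    """Central finite difference of (R, A, ghat) with respect to g_11,
    holding g_1a and g_ab fixed."""
    gp = [row[:] for row in g]
    gm = [row[:] for row in g]
    gp[0][0] += eps
    gm[0][0] -= eps
    Rp, Ap, hp = decompose(gp)
    Rm, Am, hm = decompose(gm)
    k = len(Ap)
    dR = (Rp - Rm) / (2 * eps)
    dA = [(Ap[a] - Am[a]) / (2 * eps) for a in range(k)]
    dh = [[(hp[a][b] - hm[a][b]) / (2 * eps) for b in range(k)] for a in range(k)]
    return dR, dA, dh

# Hypothetical sample metric (d = 3), chosen only for illustration.
g = [[4.0, 1.0, 0.5], [1.0, 3.0, 0.2], [0.5, 0.2, 2.0]]
R, A, ghat = decompose(g)
dR, dA, dh = d_dg11(g)
# Expected from the chain rule: dR = 1/(2R), dA_a = -A_a/R^2, d ghat_ab = A_a A_b.
assert abs(dR - 1 / (2 * R)) < 1e-6
assert all(abs(dA[a] + A[a] / R**2) < 1e-6 for a in range(2))
assert all(abs(dh[a][b] - A[a] * A[b]) < 1e-6 for a in range(2) for b in range(2))
```

The coefficients $1/(2R)$ and $-A_a/R^2$ match the first two terms of (A.29a) directly.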

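The Jacobian relevant for the $Gl(d)$ Laplacian is simply obtained by ignoring the terms involving $B$. Then, we find for the $Gl(d)$ Laplacian the decomposed result (2.21), while for $SO(d,d)$ we have after some algebra
$$\begin{aligned}
\Delta_{SO(d+1,d+1)} ={}& \Delta_{SO(d,d)} - \frac12\,\hat g_{ab}\frac{\partial}{\partial\hat g_{ab}} + \frac14\left(R\frac{\partial}{\partial R}\right)^2 + \frac{\hat g_{ab}}{2R^2}\frac{\partial}{\partial A_a}\frac{\partial}{\partial A_b} + \frac{R^2\,\hat g_{ab}}{2}\frac{\partial}{\partial B_a}\frac{\partial}{\partial B_b}\\
&- \frac{1}{R^2}\,\hat g_{ab}\,B_c\frac{\partial}{\partial A_a}\frac{\partial}{\partial\tilde B_{bc}} + \frac{1}{2R^2}\,\hat g_{ac}\,B_bB_d\frac{\partial}{\partial\tilde B_{ab}}\frac{\partial}{\partial\tilde B_{cd}}.
\end{aligned} \tag{A.30}$$
We also note that the corresponding Jacobian for the change to the $(R, A_a, \hat g_{ab}, B_a, \hat B_{ab})$ variables can be obtained from the one in (A.29) by substituting $\tilde B\to 2\hat B$ except for the last equation. For completeness, we also give the decomposed $SO(d,d)$ Laplacian in these variables,
$$\begin{aligned}
\Delta_{SO(d+1,d+1)} ={}& \Delta_{SO(d,d)} - \frac12\,\hat g_{ab}\frac{\partial}{\partial\hat g_{ab}} + \frac14\left(R\frac{\partial}{\partial R}\right)^2 + \frac{\hat g_{ab}}{2R^2}\frac{\partial}{\partial A_a}\frac{\partial}{\partial A_b} + \frac{R^2\,\hat g_{ab}}{2}\frac{\partial}{\partial B_a}\frac{\partial}{\partial B_b}\\
&+ \frac18\,\hat g_{ac}\left(R^2A_bA_d + \frac{B_bB_d}{R^2}\right)\frac{\partial}{\partial\hat B_{ab}}\frac{\partial}{\partial\hat B_{cd}} - \frac12\,\hat g_{ab}\left(R^2A_c\frac{\partial}{\partial B_a} + \frac{B_c}{R^2}\frac{\partial}{\partial A_a}\right)\frac{\partial}{\partial\hat B_{bc}},
\end{aligned} \tag{A.31}$$
which manifestly exhibits the T-duality symmetry (A.27).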


B. Eigenmodes and Eigenvalues of the Laplacians

In this appendix we give some details on the explicit computation of the eigenvalues under the Laplacian and the non-invariant differential operator (3.27) of the various Eisenstein series and modular integral considered in the main text. These computations are most easily done using the integral representation
$$\left[M^2\right]^{-s} = \frac{\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s}}\,\exp\left(-\frac{\pi}{t}M^2\right) \tag{B.1}$$
of the generic term in the Eisenstein series. The result of differentiation can be integrated by parts using
$$\int_0^\infty\frac{dt}{t^{1+s}}\left[\alpha\frac{C^2}{t^2} + \beta\frac{C}{t}\right]e^{-C/t} = s(\alpha s + \alpha + \beta)\int_0^\infty\frac{dt}{t^{1+s}}\,e^{-C/t}. \tag{B.2}$$

B.1. $Sl(d,\mathbb{Z})$ Eisenstein series in the fundamental representation. We start with the fundamental representation of $Sl(d,\mathbb{Z})$, for which the mass matrix reads $M^2(d) = m^tgm = m^ig_{ij}m^j$, and obeys the identities
$$\frac{\partial M^2(d)}{\partial g_{ij}} = 2m^im^j,\qquad \frac{\partial^2M^2(d)}{\partial g_{ij}\,\partial g_{kl}} = 0, \tag{B.3}$$
so that using the Laplacian (2.4b), we obtain
$$\begin{aligned}
e^{M^2(d)/t}\,\Delta_{Gl(d)}\,e^{-M^2(d)/t} &= \frac{1}{t^2}\,g_{ik}g_{jl}\,(m^im^j)(m^km^l) - \frac{d+1}{4t}\,g_{ij}\,2m^im^j\\
&= \frac{1}{t^2}\left[M^2(d)\right]^2 - \frac{d+1}{2t}\,M^2(d).
\end{aligned} \tag{B.4}$$
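The integration-by-parts identity (B.2) can be spot-checked numerically. The sketch below evaluates both sides with a plain trapezoid rule after the substitution $t = e^x$; the function names, the sample values of $s$, $C$, $d$ and the integration window are illustrative assumptions, not taken from the text.

```python
import math

def t_integral(weight, s, C, lo=-6.0, hi=40.0, steps=46000):
    """Trapezoid estimate of int_0^inf dt / t^(1+s) * weight(C/t) * exp(-C/t),
    using t = exp(x) so that dt / t^(1+s) = exp(-s x) dx."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        u = C * math.exp(-x)  # u = C / t
        f = math.exp(-s * x) * weight(u) * math.exp(-u)
        total += 0.5 * f if i in (0, steps) else f
    return total * h

def both_sides(alpha, beta, s, C):
    """Left- and right-hand sides of (B.2)."""
    lhs = t_integral(lambda u: alpha * u * u + beta * u, s, C)
    rhs = s * (alpha * s + alpha + beta) * t_integral(lambda u: 1.0, s, C)
    return lhs, rhs

# Sample values: alpha = 1, beta = -(d+1)/2 are the coefficients read off (B.4).
d, s, C = 4, 1.7, 2.3
lhs, rhs = both_sides(1.0, -(d + 1) / 2, s, C)
assert math.isclose(lhs, rhs, rel_tol=1e-8)
# The base integral is Gamma(s) * C**(-s), consistent with (B.1).
assert math.isclose(t_integral(lambda u: 1.0, s, C), math.gamma(s) * C ** (-s), rel_tol=1e-8)
```

With these coefficients the prefactor $s(\alpha s+\alpha+\beta)$ equals $s\left(s+1-\frac{d+1}{2}\right)$.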

Then, using the identity (B.2) we immediately find the eigenvalue $s\left(s+1-\frac{d+1}{2}\right)$ as given in (2.4a). The corresponding eigenvalue under the $Sl(d)$ Laplacian follows by subtracting the $(t\,\partial/\partial t)^2/d$ contribution in (A.11),
$$\frac{1}{4d}\,e^{M^2(d)/t}\left(g_{ij}\frac{\partial}{\partial g_{ij}}\right)^2e^{-M^2(d)/t} = \frac{1}{d}\left[\frac{1}{t^2}\left[M^2(d)\right]^2 - \frac{1}{t}\,M^2(d)\right], \tag{B.5}$$
so that the eigenvalue is $s(s+1-(d+1)/2) - s^2/d$ as given in (2.5a).

B.2. $SO(d,d,\mathbb{Z})$ Eisenstein series in the vector representation. For the case of the vector representation of $SO(d,d)$, the mass matrix now reads
$$M^2(V) = m^tM(V)m = \tilde m_ig^{ij}\tilde m_j + n^ig_{ij}n^j,\qquad \tilde m_i = m_i + B_{ij}n^j, \tag{B.6}$$
and satisfies
$$\frac{\partial M^2(V)}{\partial g_{ij}} = 2\left[-\tilde m^i\tilde m^j + n^in^j\right],\qquad \frac{\partial M^2(V)}{\partial B_{ij}} = 2\left[\tilde m^in^j - \tilde m^jn^i\right], \tag{B.7}$$


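where $\tilde m^i = g^{ij}\tilde m_j$. To compute the action of the Laplacian (3.11b) and of the operator $\Box_d$ in (3.27) we need the quantities
$$D_0 = \left(\frac12\,g_{ij}\frac{\partial M^2(V)}{\partial g_{ij}}\right)^2 = (\tilde m^2)^2 + (n^2)^2 - 2\tilde m^2n^2, \tag{B.8a}$$
$$D_1 = \frac14\,g_{ik}g_{jl}\frac{\partial M^2(V)}{\partial g_{ij}}\frac{\partial M^2(V)}{\partial g_{kl}} = (\tilde m^2)^2 + (n^2)^2 - 2(\tilde mn)^2, \tag{B.8b}$$
$$D_2 = \frac14\,g_{ik}g_{jl}\frac{\partial M^2(V)}{\partial B_{ij}}\frac{\partial M^2(V)}{\partial B_{kl}} = 2\left[\tilde m^2n^2 - (\tilde mn)^2\right], \tag{B.8c}$$
as well as
$$C_1 = \frac14\,g_{ik}g_{jl}\frac{\partial^2M^2(V)}{\partial g_{ij}\,\partial g_{kl}} = (d+1)\,\tilde m^2,\qquad C_2 = \frac14\,g_{ik}g_{jl}\frac{\partial^2M^2(V)}{\partial B_{ij}\,\partial B_{kl}} = (d-1)\,n^2, \tag{B.9a}$$
$$C_0 = \frac14\,g_{ij}\frac{\partial}{\partial g_{ij}}\,g_{kl}\frac{\partial M^2(V)}{\partial g_{kl}} = \tilde m^2 + n^2,\qquad C = \frac12\,g_{ij}\frac{\partial M^2(V)}{\partial g_{ij}} = -\tilde m^2 + n^2. \tag{B.9b}$$
Using these data it is then easy to compute
$$e^{M^2(V)/t}\,\Delta_{SO(d,d)}\,e^{-M^2(V)/t} = \frac{1}{t^2}\left[D_1 + D_2\right] - \frac{1}{t}\left[C_1 + C_2 + C\right] \tag{B.10a}$$
$$= \frac{1}{t^2}\left[(M^2(V))^2 - 4(mn)^2\right] - \frac{d}{t}\,M^2(V), \tag{B.10b}$$
$$e^{M^2(V)/t}\,\Box_d\,e^{-M^2(V)/t} = \frac{1}{t^2}\left[D_1 - \frac12D_0\right] - \frac{1}{t}\left[C_1 - \frac12C_0 + \frac12(d+1)\,C\right] \tag{B.10c}$$
$$= \frac{1}{2t^2}\left[(M^2(V))^2 - 4(mn)^2\right] - \frac{d}{2t}\,M^2(V). \tag{B.10d}$$
The eigenvalues $s(s-d+1)$ and $\frac{s}{2}(s-d+1)$ of the vector Eisenstein series under the Laplacian $\Delta_{SO(d,d)}$ and the non-invariant operator $\Box_d$ follow by using the identity (B.2), provided the half-BPS constraint $\tilde mn = mn = 0$ is satisfied. These are the values quoted in (3.12) and (3.29) respectively.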
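The algebra leading from (B.7) to (B.10) can be checked on random charges. The following sketch works in an orthonormal frame ($g_{ij} = \delta_{ij}$, an assumption made only to avoid tracking index positions), builds $D_0$, $D_1$, $D_2$ from the derivatives (B.7), takes $C_0$, $C_1$, $C_2$, $C$ as quoted in (B.9), and evaluates the combinations in (B.10). All function names are hypothetical.

```python
import random

def vector_rep_residuals(d, seed=1):
    """Residuals of (B.10a)=(B.10b) and (B.10c)=(B.10d) for random charges,
    in an orthonormal frame g_ij = delta_ij."""
    rng = random.Random(seed)
    mt = [rng.uniform(-2, 2) for _ in range(d)]  # dressed momenta \tilde m
    n = [rng.uniform(-2, 2) for _ in range(d)]   # windings n
    m2 = sum(x * x for x in mt)
    n2 = sum(x * x for x in n)
    mn = sum(x * y for x, y in zip(mt, n))
    M2 = m2 + n2
    # First derivatives from (B.7).
    dg = [[2 * (-mt[i] * mt[j] + n[i] * n[j]) for j in range(d)] for i in range(d)]
    dB = [[2 * (mt[i] * n[j] - mt[j] * n[i]) for j in range(d)] for i in range(d)]
    D0 = (0.5 * sum(dg[i][i] for i in range(d))) ** 2
    D1 = 0.25 * sum(dg[i][j] ** 2 for i in range(d) for j in range(d))
    D2 = 0.25 * sum(dB[i][j] ** 2 for i in range(d) for j in range(d))
    # Second-derivative data as quoted in (B.9).
    C1, C2, C0, C = (d + 1) * m2, (d - 1) * n2, m2 + n2, -m2 + n2
    return (
        D1 + D2 - (M2 ** 2 - 4 * mn ** 2),
        C1 + C2 + C - d * M2,
        D1 - 0.5 * D0 - 0.5 * (M2 ** 2 - 4 * mn ** 2),
        C1 - 0.5 * C0 + 0.5 * (d + 1) * C - 0.5 * d * M2,
    )

def eigenvalue(alpha, beta, s):
    """Prefactor s(alpha s + alpha + beta) produced by (B.2)."""
    return s * (alpha * s + alpha + beta)

for d in (2, 3, 5, 8):
    assert all(abs(r) < 1e-9 for r in vector_rep_residuals(d))
    s = 1.3
    # With mn = 0: (B.10b) has alpha = 1, beta = -d; (B.10d) has alpha = 1/2, beta = -d/2.
    assert abs(eigenvalue(1.0, -d, s) - s * (s - d + 1)) < 1e-12
    assert abs(eigenvalue(0.5, -d / 2, s) - 0.5 * s * (s - d + 1)) < 1e-12
```

The $(mn)^2$ terms do not cancel in either combination, which is why the half-BPS constraint is needed before (B.2) can be applied.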

B.3. $SO(d,d,\mathbb{Z})$ Eisenstein series in the spinor representations. The direct computation of the eigenvalues of the (conjugate) spinor Eisenstein series under the $SO(d,d)$ Laplacian is more involved and will not be given here. The general results in (3.12) have been checked directly for $d\le 4$, showing also in this case explicitly the importance of imposing the half-BPS constraints (3.7a) and (3.9a) that occur for $d = 4$.


Finally, we turn to the action of the $\Box_d$ operator on the (conjugate) spinor representation (3.6), (3.8) of $SO(d,d)$, with mass
$$M^2(S) = \frac{1}{V_d}\sum_{p\ \mathrm{odd}}\frac{(\tilde m^{[p]})^2}{p!},\qquad M^2(C) = \frac{1}{V_d}\sum_{p\ \mathrm{even}}\frac{(\tilde m^{[p]})^2}{p!}, \tag{B.11}$$
where $\tilde m^{[p]} = m^{[p]} + B_2m^{[p-2]} + B_2^2m^{[p-4]} + \dots$ are the dressed charges and $(\tilde m^{[p]})^2$ denotes the invariant square obtained with $p$ powers of the metric. We need the derivative
$$\frac{\partial M^2(S)}{\partial g_{ij}} = \frac{1}{V_d}\sum_p\frac{2p\,[(\tilde m^{[p]})^2]^{ij} - (\tilde m^{[p]})^2\,g^{ij}}{p!}, \tag{B.12}$$
where $[(\tilde m^{[p]})^2]^{ij}$ denotes the invariant square with one power of the metric taken out. The direct computation of the full $\Box_d$ along the same lines as the cases treated above is rather intricate. We therefore employ a method that uses the underlying group theory and the realization that $\Box_d$ contains the $Sl(d)$ Laplacian, as well as the structural form (3.7) of the constraints. We first note that each term in (B.11) represents a totally antisymmetric tensor of $Sl(d)$ with $p$ indices. For an antisymmetric $p$-tensor of $Sl(d)$, the Casimir of the $r$-th symmetric power is given by
$$Q([p]^{\otimes r}) = \frac{rp(d-p)(r+d)}{d}, \tag{B.13}$$
so that according to the general formula (2.11) the action of the $Sl(d)$ Laplacian is
$$\Delta_{Sl(d)}\,[(m^{[p]})^2]^{-s} = \frac{p(d-p)\,s(2s-d)}{2d}\,[(m^{[p]})^2]^{-s}. \tag{B.14}$$
Using the identity (B.2), we therefore have, up to cross terms which we neglect for the moment,
$$e^{M^2(S)/t}\,\Delta_{Sl(d)}\,e^{-M^2(S)/t} = \frac{1}{t^2}\left[\sum_p\frac{p(d-p)}{d}\left(\frac{(m^{[p]})^2}{V_d\,p!}\right)^2 + \text{cross}\right] - \frac{1}{t}\left[\sum_p\frac{p(d-p)(d+2)}{2d}\,\frac{(m^{[p]})^2}{V_d\,p!}\right], \tag{B.15}$$
where we emphasize that this result is only valid when enforcing the quadratic constraints on the charges. We also need
$$e^{M^2(S)/t}\left(\frac12\,g_{ij}\frac{\partial}{\partial g_{ij}}\right)^2e^{-M^2(S)/t} = \frac{1}{t^2}\left[\sum_p\left(p-\frac d2\right)\frac{(m^{[p]})^2}{V_d\,p!}\right]^2 - \frac{1}{t}\left[\sum_p\left(p-\frac d2\right)^2\frac{(m^{[p]})^2}{V_d\,p!}\right], \tag{B.16}$$
obtained by direct calculation using (B.12). Using the form of $\Box_d$ in (3.27) we obtain from the two expressions in (B.15) and (B.16) that
$$e^{M^2(S)/t}\,\Box_d\,e^{-M^2(S)/t} = \frac{1}{t^2}\left[\sum_p\left(\frac{p(d-p)}{2} - \frac{d(d-2)}{8}\right)\left(\frac{(m^{[p]})^2}{V_d\,p!}\right)^2 + \text{cross}\right] + \frac{1}{t}\left[\sum_p\left(-p(d-p) + \frac{d(d-2)}{8}\right)\frac{(m^{[p]})^2}{V_d\,p!}\right]. \tag{B.17}$$
Using (B.2) we then deduce that
$$\Box_d\,(M^2(S))^{-s} = s\left\{(s+1)\left[\sum_p\left(\frac{p(d-p)}{2} - \frac{d(d-2)}{8}\right)\left(\frac{(m^{[p]})^2}{V_d\,p!}\right)^2 + \text{cross}\right] + M^2(S)\sum_p\left(-p(d-p) + \frac{d(d-2)}{8}\right)\frac{(m^{[p]})^2}{V_d\,p!}\right\}(M^2(S))^{-s-2}. \tag{B.18}$$

Requiring the right-hand side to be proportional to (the diagonal terms in) $(M^2(S))^2$ then shows us that (for generic value of $d$) this is only possible when $s = 1$, in which case we find that
$$\Box_d\,(M^2(S))^{-s} = \frac{d(2-d)}{8}\,(M^2(S))^{-s},\qquad s = 1, \tag{B.19}$$
so that the $s = 1$ spinor and conjugate spinor Eisenstein series are eigenmodes of $\Box_d$ as recorded in (3.29b), (3.29c). A special feature arises for the spinor representation of $SO(4,4)$, in which case we have $p(d-p) = p(4-p) = 3$ for both the relevant values $p = 1$ and $3$, so that the terms in (B.18) are proportional to $(M^2(S))^2$ for all values of $s$. As a result we find that the spinor Eisenstein series for $SO(4,4)$ is an eigenmode of $\Box_d$ with eigenvalue $s(s-3)/2$ as noted in (3.30)$^9$. This is not the case for the conjugate spinor representation of $SO(4,4)$.

Finally, we wish to point out some further checks on the cross terms that have been neglected so far. First, we have explicitly checked the full result by direct computation for the spinor representation in the cases $d\le 4$. In particular, for $d = 4$ one finds, as expected, that the constraint $m^{[1]}\wedge m^{[3]} = 0$ of (3.7a) is crucial for the eigenvalue condition. More generally, using the metric on weight space $g_{[p][q]} = p(d-q)/d$ with $[p]\le[q]$ two totally antisymmetric $Sl(d)$ representations, we know from the group theory arguments in (2.9) that the cross terms in (B.15) can be incorporated by replacing the $1/t^2$ term by
$$\frac{1}{t^2}\left[\sum_{p\le q}\frac{p(d-q)}{d}\,(2-\delta_{pq})\,\frac{(m^{[p]})^2}{V_d\,p!}\,\frac{(m^{[q]})^2}{V_d\,q!}\right]. \tag{B.20}$$

$^9$ In a similar way one can see that the spinor and conjugate spinor Eisenstein series of $SO(2,2)$ are eigenmodes for all $s$, but that was expected since in that case $\Box_d$ reduces to the $Sl(2)$ Laplacian.
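The diagonal part of (B.18) can be evaluated exactly to confirm the eigenvalue $d(2-d)/8$ at $s = 1$ in (B.19) and the $SO(4,4)$ spinor eigenvalue $s(s-3)/2$. This is a minimal sketch in exact rational arithmetic; the function name `diagonal_coefficient` is hypothetical, and cross terms between different $p$ are deliberately ignored, as in the text up to this point.

```python
from fractions import Fraction

def diagonal_coefficient(s, p, d):
    """Coefficient of ((m^[p])^2 / (V_d p!))^2 (M^2)^(-s-2) in (B.18),
    keeping only the diagonal (same-p) contributions."""
    a = Fraction(p * (d - p), 2) - Fraction(d * (d - 2), 8)  # 1/t^2 term of (B.17)
    b = -p * (d - p) + Fraction(d * (d - 2), 8)              # 1/t term of (B.17)
    return s * ((s + 1) * a + b)

# At s = 1 the result is independent of p and equals d(2-d)/8, as in (B.19).
for d in range(2, 9):
    for p in range(0, d + 1):
        assert diagonal_coefficient(1, p, d) == Fraction(d * (2 - d), 8)

# SO(4,4) spinor (p = 1, 3): p-independent for every s, with value s(s-3)/2.
for s in (Fraction(1, 2), 2, 5, Fraction(7, 3)):
    assert diagonal_coefficient(s, 1, 4) == diagonal_coefficient(s, 3, 4) == s * (s - 3) / 2

# SO(4,4) conjugate spinor (p = 0, 2, 4): the coefficients differ at s = 2.
assert diagonal_coefficient(2, 0, 4) != diagonal_coefficient(2, 2, 4)
```

The last assertion illustrates why the conjugate spinor series of $SO(4,4)$ is not an eigenmode for generic $s$.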


Together with the directly computed cross terms in (B.16), this changes the $1/t^2$ term in (B.17) to
$$\frac{1}{t^2}\sum_{p\le q}(2-\delta_{pq})\left[\frac{d(p+q) + 2(p-q) - 2pq}{4} - \frac{d(d-2)}{8}\right]\frac{(m^{[p]})^2}{V_d\,p!}\,\frac{(m^{[q]})^2}{V_d\,q!}, \tag{B.21}$$
which will produce an analogous correction to (B.18). For $s = 1$ we then see that, taking into account the cross terms from the second term in (B.18), the $(p,q)$-dependent part is given by
$$\left(d(p+q) + 2(p-q) - 2pq\right) - \left(p(d-p) + q(d-q)\right) = (p-q)(p-q+2), \tag{B.22}$$
which we see vanishes (besides the diagonal terms $p = q$) for the cross terms $q-p = 2$. If $q-p>2$, there are non-trivial effects from the constraints. A simple way to see them is to consider the two-form in the $d = 4$ conjugate spinor. The $1/t^2$ contribution to $\Delta_{Sl}$ includes a term $(m^{[2]})^4$, where the contraction is the non-factorized one. By the Cayley–Hamilton theorem for $4\times 4$ antisymmetric matrices $A$,
$$A^4 - \frac12\,(\mathrm{Tr}\,A^2)\,A^2 + (\mathrm{Pfaffian}\,A)^2\,\mathbb{1} = 0, \tag{B.23}$$
we see that this is $[(m^{[2]})^2]^2/2$ up to a $(m^{[2]}\wedge m^{[2]})^2$ term, which by the half-BPS condition (3.9a) is equivalent to an extra cross term $(m\,m^{[4]})^2$ that was not taken into account previously and will cancel the deficit seen in (B.22).
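Both the factorization (B.22) and the Cayley–Hamilton relation (B.23) are exact algebraic statements and can be verified with integer arithmetic. The sketch below does so; the function names and the sample matrix entries are illustrative assumptions.

```python
def cross_term_deficit(p, q, d):
    """Left-hand side of (B.22)."""
    return (d * (p + q) + 2 * (p - q) - 2 * p * q) - (p * (d - p) + q * (d - q))

def antisymmetric(a12, a13, a14, a23, a24, a34):
    """4x4 antisymmetric integer matrix built from its six independent entries."""
    return [[0, a12, a13, a14],
            [-a12, 0, a23, a24],
            [-a13, -a23, 0, a34],
            [-a14, -a24, -a34, 0]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def cayley_hamilton_residual(entries):
    """A^4 - (1/2) Tr(A^2) A^2 + Pf(A)^2 * identity, which (B.23) says vanishes."""
    A = antisymmetric(*entries)
    A2 = matmul(A, A)
    A4 = matmul(A2, A2)
    tr = sum(A2[i][i] for i in range(4))  # always even for integer antisymmetric A
    a12, a13, a14, a23, a24, a34 = entries
    pf = a12 * a34 - a13 * a24 + a14 * a23
    return [[A4[i][j] - (tr // 2) * A2[i][j] + (pf * pf if i == j else 0)
             for j in range(4)] for i in range(4)]

# (B.22): the deficit factorizes and vanishes for q = p and q = p + 2.
for d in range(2, 9):
    for p in range(d + 1):
        for q in range(p, d + 1):
            assert cross_term_deficit(p, q, d) == (p - q) * (p - q + 2)

# (B.23): residual is the zero matrix for a sample antisymmetric matrix.
assert cayley_hamilton_residual((1, 2, 3, 4, 5, 6)) == [[0] * 4 for _ in range(4)]
```

For $q-p = 4$ the deficit is $(p-q)(p-q+2) = 8$, the nonvanishing case that the constraint argument in the text addresses.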

C. Large Volume Expansions of Eisenstein Series

Here, we derive the results (3.31), (3.33) by considering the large volume expansions of the vector, spinor and conjugate spinor Eisenstein series of $SO(d,d)$. In the computations below we shall repeatedly use the Poisson resummation formula
$$\sum_m e^{-\pi(m+a)^tA(m+a) + 2\pi i\,mb} = \frac{1}{\sqrt{\det A}}\sum_{\tilde m}e^{-\pi(\tilde m+b)^tA^{-1}(\tilde m+b) - 2\pi i\,(\tilde m+b)a}. \tag{C.1}$$
Note that an insertion of $m$ on the left-hand side translates into an insertion of $-a + iA^{-1}(\tilde m+b)$ on the right-hand side. We also recall the integral representation of the Bessel function
$$\int_0^\infty\frac{dx}{x^{1+s}}\,e^{-b/x - cx} = 2\left|\frac{c}{b}\right|^{s/2}K_s\left(2\sqrt{|bc|}\right). \tag{C.2}$$
It is an even function in $s$, and admits the asymptotic expansion at large $x$,
$$K_s(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x}\left(1 + \sum_{k=1}^\infty\frac{1}{(2x)^k}\,\frac{\Gamma\left(s+k+\frac12\right)}{k!\,\Gamma\left(s-k+\frac12\right)}\right). \tag{C.3}$$
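The Poisson resummation formula (C.1) can be spot-checked in one dimension, where $A$ is a positive number and $a$, $b$ are real shifts. The sketch below truncates both lattice sums at a finite cutoff; the function names, the cutoff and the sample values are assumptions for illustration.

```python
import cmath
import math

def poisson_lhs(A, a, b, cutoff=60):
    """Left-hand side of (C.1) for a one-dimensional lattice, truncated at |m| <= cutoff."""
    return sum(cmath.exp(-math.pi * A * (m + a) ** 2 + 2j * math.pi * m * b)
               for m in range(-cutoff, cutoff + 1))

def poisson_rhs(A, a, b, cutoff=60):
    """Right-hand side of (C.1) for the same data, with A^{-1} = 1/A and det A = A."""
    total = sum(cmath.exp(-math.pi * (mt + b) ** 2 / A - 2j * math.pi * (mt + b) * a)
                for mt in range(-cutoff, cutoff + 1))
    return total / math.sqrt(A)

# Sample values; both sums converge rapidly, so the truncation error is negligible.
for A, a, b in [(0.7, 0.3, 0.2), (2.5, -0.4, 0.15), (1.0, 0.0, 0.0)]:
    assert abs(poisson_lhs(A, a, b) - poisson_rhs(A, a, b)) < 1e-12
```

For small $A$ the left-hand sum converges slowly and the right-hand one quickly, which is the property the large volume expansions below rely on.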


The expansion truncates when $s$ is half-integer, and in particular, for $s = 1/2$ the saddle point approximation is exact:
$$K_{1/2}(x) = \sqrt{\frac{\pi}{2x}}\,e^{-x}. \tag{C.4}$$
We also recall some useful facts about the Riemann Zeta and Gamma functions,
$$\zeta(s) = \sum_{m=1}^\infty\frac{1}{m^s} = \frac{\pi^{s/2}\,\Gamma\left(\frac{1-s}{2}\right)}{\pi^{(1-s)/2}\,\Gamma(s/2)}\,\zeta(1-s), \tag{C.5a}$$
$$\zeta(-1) = -\frac{1}{12},\qquad \zeta(0) = -\frac12,\qquad \zeta(2) = \frac{\pi^2}{6}, \tag{C.5b}$$
$$\Gamma(s) = \int_0^\infty\frac{dt}{t^{1+s}}\,e^{-1/t},\qquad \Gamma(s+1) = s\,\Gamma(s),\qquad \Gamma(1) = 1,\qquad \Gamma(1/2) = \sqrt{\pi}. \tag{C.5c}$$
It is also useful to recall that $\zeta(s)$ has a simple pole at $s = 1$ and simple zeros at $s = -2, -4, \dots$, whereas $\Gamma(s)$ has simple poles at $s = 0, -1, -2, \dots$:
$$\zeta(1+\epsilon) = \frac{1}{\epsilon} + \gamma + O(\epsilon),\qquad \Gamma(\epsilon) = \frac{1}{\epsilon} - \gamma + O(\epsilon), \tag{C.6}$$
where $\gamma = 0.577215\dots$ is the Euler constant.

C.1. $SO(d,d,\mathbb{Z})$ vector Eisenstein series. We first consider the large volume expansion of the Eisenstein series in the vector representation of $SO(d,d)$, for which we use the integral representation
$$E^{SO(d,d,\mathbb{Z})}_{V;s} = \frac{\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s}}\int_0^1d\theta\,\widehat{\sum_{m_i,n^i}}\exp\left(-\frac{\pi}{t}(m_i + B_{ij}n^j)^2 - \frac{\pi}{t}(n^i)^2 + 2\pi i\theta\,m_in^i\right). \tag{C.7}$$
Here the integration over $\theta$ incorporates the constraint $m_in^i = 0$ and the squares denote the invariant contraction with the metric or inverse metric depending on the position of the indices. We first extract the $n^i = 0$ piece, and in the remaining part Poisson resum on the integers $m_i$ which are now unconstrained. Then $E^{SO(d,d,\mathbb{Z})}_{V;s} = J_1(V) + J_2(V)$ with
$$J_1(V) = \widehat{\sum_{m_i}}\left(\frac{1}{m_ig^{ij}m_j}\right)^s = E^{Sl(d,\mathbb{Z})}_{\bar d;s} = \frac{\pi^s\,\Gamma\left(\frac d2-s\right)}{\pi^{\frac d2-s}\,\Gamma(s)}\,V_d\,E^{Sl(d,\mathbb{Z})}_{d;\frac d2-s}, \tag{C.8a}$$
$$J_2(V) = \frac{V_d\,\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s-\frac d2}}\int_0^1d\theta\,\sum_{m^i}\widehat{\sum_{n^i}}\exp\left(-\pi t\,(m^i+\theta n^i)^2 - \frac{\pi}{t}(n^i)^2 + 2\pi i\,B_{ij}m^in^j\right). \tag{C.8b}$$


Here we have recognized the first term (C.8a) as the Eisenstein series of the antifundamental of $Sl(d)$ and used the identity (2.7) in the last step. Continuing with the second term (C.8b) we note that although the integration over $\theta$ runs from 0 to 1 only, we can reabsorb a shift $\theta\to\theta+1$ into a spectral flow $m^i\to m^i + n^i$. We therefore extend the integration range of $\theta$ to $N\to\infty$ but sum on $m^i$ modulo $n^i$ only. Then, after performing the Gaussian integration on $\theta$, the second term becomes
$$J_2(V) = \frac{V_d\,\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s-\frac{d-1}{2}}}\,\widehat{\sum_{n^i}}\sum_{m^i|n^i}\frac{1}{\sqrt{n^ig_{ij}n^j}}\exp\left(-\pi t\,\frac{(m\cdot m)(n\cdot n) - (m\cdot n)^2}{n\cdot n} - \frac{\pi}{t}\,n\cdot n + 2\pi i\,B_{ij}m^in^j\right). \tag{C.9}$$
We now extract the terms for which $(m\cdot n)^2 - (m\cdot m)(n\cdot n) = 0$. By the Schwarz inequality, this is the case if and only if $m^i = \lambda n^i$ for all $i$, and therefore the phase factor in (C.9) is irrelevant. For a given vector $n$, the number of parallel vectors $m$ modulo the spectral flow is $\gcd(\{n^i\})$, so that we have $J_2(V) = J_{2a}(V) + J_{2b}(V)$ with
$$\begin{aligned}
J_{2a}(V) &= V_d\,\pi^{\frac{d-1}{2}}\,\frac{\Gamma\left(s-\frac{d-1}{2}\right)}{\Gamma(s)}\,\widehat{\sum_{n^i}}\gcd(\{n^i\})\left(\frac{1}{n^ig_{ij}n^j}\right)^{s-\frac{d-2}{2}}\\
&= V_d\,\pi^{\frac{d-1}{2}}\,\frac{\Gamma\left(s-\frac{d-1}{2}\right)\zeta(2s-d+1)}{\Gamma(s)\,\zeta(2s-d+2)}\,E^{Sl(d,\mathbb{Z})}_{d;s-\frac d2+1}.
\end{aligned} \tag{C.10}$$
Here we have split the integers $n^i$ into coprime $n'^i$'s and greatest common divisor $r$, carried out the $r$-summation, and rewritten the coprime integers in terms of integers again at the expense of yet another $r$ summation. Finally, for the remaining non-degenerate terms $J_{2b}(V)$ in (C.9) we can perform the integral on $t$ using (C.2) so that
$$J_{2b}(V) = \frac{4V_d\,\pi^s}{\Gamma(s)}\,\overline{\sum_{m^i,n^i}}\frac{1}{\sqrt{n^ig_{ij}n^j}}\left(\frac{(n\cdot n)^2}{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|}\right)^{\frac{d-1-2s}{4}}K_{s-\frac{d-1}{2}}\left(2\pi\sqrt{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|}\right)e^{2\pi i\,B_{ij}m^in^j}, \tag{C.11}$$
where the sum runs over (non-degenerate) $Sl(2,\mathbb{Z})$ orbits of vectors $(m,n)$. Our final result for the large volume expansion of the order $s$ Eisenstein series of the vector representation of $SO(d,d)$ is thus obtained as the sum of expressions (C.8a), (C.10) and (C.11). Substituting the particular value $s = \frac d2-1$, we find in particular that
$$E^{SO(d,d,\mathbb{Z})}_{V;s=\frac d2-1} = \frac{\pi^{\frac d2-2}}{\Gamma\left(\frac d2-1\right)}\left[V_d\,E^{Sl(d,\mathbb{Z})}_{d;s=1} + \frac{\pi^2}{3}\,V_d + 2\pi V_d\,\overline{\sum_{m^i,n^i}}\frac{\exp\left(-2\pi\sqrt{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|} + 2\pi i\,B_{ij}m^in^j\right)}{\sqrt{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|}}\right]. \tag{C.12}$$
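The coefficients of the second and third terms in (C.12) follow from (C.10) and (C.11) at $s = \frac d2-1$, using the special values in (C.5b), the closed form (C.4) for $K_{1/2}$, and the value $E^{Sl(d,\mathbb{Z})}_{d;s=0} = -1$ quoted from (2.16). The sketch below reproduces this arithmetic numerically; the function names and sample arguments are hypothetical, and the factor $V_d/\Gamma(\frac d2-1)$ common to all terms is stripped off.

```python
import math

ZETA = {-1: -1.0 / 12.0, 0: -0.5}  # special values from (C.5b)
E_SL_AT_ZERO = -1.0                # E^{Sl(d,Z)}_{d;s=0}, as quoted from (2.16)

def j2a_coefficient(d):
    """J_2a(V) from (C.10) at s = d/2 - 1, divided by V_d / Gamma(d/2 - 1)."""
    # At this value s - (d-1)/2 = -1/2, 2s - d + 1 = -1 and 2s - d + 2 = 0.
    return math.pi ** ((d - 1) / 2) * math.gamma(-0.5) * ZETA[-1] / ZETA[0] * E_SL_AT_ZERO

def k_half(x):
    """Closed form (C.4) of the Bessel function K_{1/2}."""
    return math.sqrt(math.pi / (2 * x)) * math.exp(-x)

def j2b_summand(d, nn, D):
    """One summand of (C.11) at s = d/2 - 1 without the phase, divided by
    V_d / Gamma(d/2 - 1); nn = n.n and D = |(m.m)(n.n) - (m.n)^2|."""
    s = d / 2 - 1
    # K_{s-(d-1)/2} = K_{-1/2} = K_{1/2}, since K is even in its order.
    return (4 * math.pi ** s / math.sqrt(nn)
            * (nn ** 2 / D) ** ((d - 1 - 2 * s) / 4)
            * k_half(2 * math.pi * math.sqrt(D)))

for d in (3, 4, 6):
    prefactor = math.pi ** (d / 2 - 2)
    # Second term of (C.12): prefactor * pi^2 / 3.
    assert math.isclose(j2a_coefficient(d), prefactor * math.pi ** 2 / 3, rel_tol=1e-12)
    # Third term of (C.12): prefactor * 2 pi * exp(-2 pi sqrt(D)) / sqrt(D).
    nn, D = 2.5, 0.8
    expected = prefactor * 2 * math.pi * math.exp(-2 * math.pi * math.sqrt(D)) / math.sqrt(D)
    assert math.isclose(j2b_summand(d, nn, D), expected, rel_tol=1e-12)
```

The dependence on $n\cdot n$ drops out of the third term at this value of $s$, consistent with (C.12).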

Here, to simplify the second term (C.10) we have used (C.5b) and $E^{Sl(d,\mathbb{Z})}_{d;s=0} = -1$ (see (2.16)). For the third term (C.11) we have used (C.4) to express the Bessel function $K_{1/2}$ as an exponential. We have thus reproduced the announced result (3.31).

C.2. $SO(3,3,\mathbb{Z})$ spinor and conjugate spinor Eisenstein series. We start with the spinor representation of $SO(3,3)$, whose Eisenstein series has the integral representation
$$E^{SO(3,3,\mathbb{Z})}_{S;s} = \frac{\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s}}\,\widehat{\sum_{m^i,n}}\exp\left(-\frac{\pi}{V_3t}(m^i + nB^i)^2 - \frac{\pi V_3}{t}\,n^2\right), \tag{C.13}$$
where we have introduced the singlet charge $n = \frac{1}{3!}\epsilon_3m^{[3]}$ dual to the three-form charge and $B^i = \frac12\epsilon^{ijk}B_{jk}$ is the dual of the NS 2-form. We single out the contribution with $n = 0$ and for the remaining terms we Poisson resum on the (unconstrained) integer $m^i$ whose dual charge is $m_i$. The latter contribution splits up into a part with $m_i = 0$ and a remaining non-degenerate contribution, so that after some algebra we can write
$$E^{SO(3,3,\mathbb{Z})}_{S;s} = \widehat{\sum_{m^i}}\left(\frac{V_3}{m^ig_{ij}m^j}\right)^s + \frac{2\pi^{3/2}\,\Gamma(s-3/2)\,\zeta(2s-3)\,V_3^{2-s}}{\Gamma(s)} + \frac{2\pi^s\,V_3^{1/2}}{\Gamma(s)}\,\widehat{\sum_{m_i}}\widehat{\sum_n}\left(\frac{n^2}{m_ig^{ij}m_j}\right)^{\frac{3-2s}{4}}K_{s-3/2}\left(2\pi V_3|n|\sqrt{m_ig^{ij}m_j}\right)e^{2\pi i\,nm_iB^i}. \tag{C.14}$$
In particular for $s = 1$ this becomes
$$E^{SO(3,3,\mathbb{Z})}_{S;s=1} = V_3\,E^{Sl(3,\mathbb{Z})}_{3;s=1} + \frac{\pi^2}{3}\,V_3 + \pi\,\widehat{\sum_{m_i,n}}\frac{\exp\left(-2\pi V_3|n|\sqrt{m_ig^{ij}m_j} + 2\pi i\,nm_iB^i\right)}{\sqrt{m_ig^{ij}m_j}}, \tag{C.15}$$

where we have used the definition (2.3) of the $Sl(d)$ Eisenstein series, (C.5b) and (C.4) in each of the three terms respectively. The two leading terms establish the claim in (3.33a), reproducing the trivial and degenerate orbit contribution of the 1-loop integral $I_3$ respectively. Moreover, exact agreement is also explicitly seen [10] between the third term and the non-degenerate orbit contribution (3.20). For the conjugate spinor of $SO(3,3)$ the integral representation of the Eisenstein series is
$$E^{SO(3,3,\mathbb{Z})}_{C;s} = \frac{\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s}}\,\widehat{\sum_{m_i,n}}\exp\left(-\frac{\pi}{V_3t}(n + m_iB^i)^2 - \frac{\pi V_3}{t}\,m_ig^{ij}m_j\right), \tag{C.16}$$
where in this case we have dualized the two-form into a one-form $n_1 = \epsilon_3m^{[2]}/2$, and the dual $B$-field is as above. In this case, we first separate the $m_i = 0$ contributions and for the remainder Poisson resum on the unconstrained integer $n$, whereafter we distinguish between $n = 0$ and the rest. After some algebra we then have
$$E^{SO(3,3,\mathbb{Z})}_{C;s} = 2\zeta(2s)\,V_3^s + \frac{\pi^{2s-2}\,\Gamma(2-s)}{\Gamma(s)}\,\widehat{\sum_{m^i}}\left(\frac{V_3}{m^ig_{ij}m^j}\right)^{2-s} + \frac{2\pi^s\sqrt{V_3}}{\Gamma(s)}\,\widehat{\sum_{m_i}}\widehat{\sum_n}\left(\frac{n^2}{m_ig^{ij}m_j}\right)^{\frac{2s-1}{4}}K_{s-1/2}\left(2\pi V_3|n|\sqrt{m_ig^{ij}m_j}\right)e^{2\pi i\,nm_iB^i}, \tag{C.17}$$

where we have also used the identity (2.7) to rewrite the second term in terms of the fundamental representation of $Sl(d)$. Setting $s = 1$ we find exactly the same result (C.15) as obtained for the spinor representation, with the first two terms interchanged as noted in (3.33b). The equality of the $d = 3$ spinor and conjugate series for $s = 1$ is obvious from the fact that the two representations have inverse mass matrices in this case and (since there are no constraints on the charges) can hence be related by a complete Poisson resummation
$$E^{SO(3,3,\mathbb{Z})}_{S;s} = E^{SO(3,3,\mathbb{Z})}_{C;2-s}. \tag{C.18}$$
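The statement that (C.14) and (C.17) agree at $s = 1$ with the first two terms interchanged can be checked at the level of coefficients. The sketch below compares the constant terms, $2\pi^{3/2}\Gamma(s-\frac32)\zeta(2s-3)$ against $2\zeta(2s)$, and reduces the Bessel summand with the standard closed form of $K_{1/2}$. The function names are hypothetical, $\zeta(2)$ is approximated by a partial sum with a simple tail estimate, and $\zeta(-1) = -1/12$ is taken from (C.5b).

```python
import math

def zeta2_estimate(N=100000):
    """Partial sum of zeta(2) plus the tail estimate 1/N (integral of 1/x^2 from N)."""
    return sum(1.0 / (m * m) for m in range(1, N + 1)) + 1.0 / N

def spinor_constant_term():
    """Coefficient of V_3 in the second term of (C.14) at s = 1, with zeta(-1) = -1/12."""
    return 2 * math.pi ** 1.5 * math.gamma(-0.5) * (-1.0 / 12.0)

def conjugate_constant_term():
    """Coefficient of V_3 in the first term of (C.17) at s = 1, namely 2 zeta(2)."""
    return 2 * zeta2_estimate()

def bessel_summand(V3, n, msq):
    """Non-degenerate summand of (C.14) or (C.17) at s = 1 without the phase;
    msq stands for m_i g^{ij} m_j. Both exponents (3-2s)/4 and (2s-1)/4 equal 1/4,
    and K_{-1/2} = K_{1/2}."""
    x = 2 * math.pi * V3 * abs(n) * math.sqrt(msq)
    k_half = math.sqrt(math.pi / (2 * x)) * math.exp(-x)
    return 2 * math.pi * math.sqrt(V3) * (n * n / msq) ** 0.25 * k_half

target = math.pi ** 2 / 3
assert abs(spinor_constant_term() - target) < 1e-12
assert abs(conjugate_constant_term() - target) < 1e-9

# The summand reduces to pi * exp(-2 pi V3 |n| sqrt(msq)) / sqrt(msq), as in (C.15).
V3, n, msq = 1.3, 2, 0.45
expected = math.pi * math.exp(-2 * math.pi * V3 * abs(n) * math.sqrt(msq)) / math.sqrt(msq)
assert math.isclose(bessel_summand(V3, n, msq), expected, rel_tol=1e-12)
```

This is consistent with (C.18) at the self-dual point $s = 1$, but it does not test the relation for other values of $s$.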

Equivalently, this identity follows from (3.15c) and (2.7).

C.3. $SO(4,4,\mathbb{Z})$ spinor and conjugate spinor Eisenstein series. Moving on to $SO(4,4)$ we remark that from this case on, one needs to incorporate the non-trivial half-BPS constraints (3.7) and (3.9). The integral representation of the spinor Eisenstein series reads
$$E^{SO(4,4,\mathbb{Z})}_{S;s} = \frac{\pi^s}{\Gamma(s)}\int_0^\infty\frac{dt}{t^{1+s}}\int_0^1d\theta\,\widehat{\sum_{m^i,n_i}}\exp\left(-\frac{\pi}{V_4t}(m^i + B^{ij}n_j)^2 - \frac{\pi}{t}\,V_4\,n_i^2 + 2\pi i\theta\,m^in_i\right), \tag{C.19}$$
where we have dualized the three-form into a one-form $n_1 = \frac{1}{3!}\epsilon_4m^{[3]}$ and introduced the dual $B$-field $B^2 = \frac12\epsilon_4B_2$. The constraint $m^{[1]}\wedge m^{[3]} = 0$ then becomes $m^in_i = 0$ and is incorporated due to the $\theta$ integration. The evaluation of this integral proceeds in a way analogous to the $SO(d,d)$ vector case, and omitting the details we record the final result
$$\begin{aligned}
E^{SO(4,4,\mathbb{Z})}_{S;s} ={}& \widehat{\sum_{m^i}}\left(\frac{V_4}{m^ig_{ij}m^j}\right)^s + \frac{V_4^{2-s}\,\pi^{3/2}\,\Gamma\left(s-\frac32\right)\zeta(2s-3)}{\Gamma(s)\,\zeta(2s-2)}\,\widehat{\sum_{n_i}}\left(\frac{1}{n_ig^{ij}n_j}\right)^{s-1}\\
&+ \frac{4\sqrt{V_4}\,\pi^s}{\Gamma(s)}\,\overline{\sum_{m_i,n_i}}\frac{1}{\sqrt{n_ig^{ij}n_j}}\left(\frac{(n\cdot n)^2}{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|}\right)^{\frac{3-2s}{4}}K_{s-\frac32}\left(2\pi V_4\sqrt{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|}\right)e^{2\pi i\,B^{ij}m_in_j},
\end{aligned} \tag{C.20}$$

where all inner products are taken with the inverse metric. In fact, this result can be obtained immediately from the result of the $SO(4,4)$ vector representation (substitute $d = 4$ in (C.8a) + (C.10) + (C.11)), using the triality relation
$$M^2(S;\,g, B;\,m^i, n_i) = M^2(V;\,V_4\,g^{-1}, B^*;\,n_i, m^i) \tag{C.21}$$
between the $SO(4,4)$ spinor and vector mass. For use below we also note that the first two terms can be expressed in terms of $Sl(4)$ Eisenstein series,
$$E^{SO(4,4,\mathbb{Z})}_{S;s} = V_4^s\,E^{Sl(4,\mathbb{Z})}_{4;s} + \frac{V_4^{2-s}\,\pi^{3/2}\,\Gamma\left(s-\frac32\right)\zeta(2s-3)}{\Gamma(s)\,\zeta(2s-2)}\,E^{Sl(4,\mathbb{Z})}_{\bar 4;s-1} + \dots. \tag{C.22}$$

Evaluating this at $s = 1$ with the use of (C.5b) we reproduce the two leading terms (3.33a). Using (C.2) the non-degenerate contribution at $s = 1$ in (C.20) takes the form
$$2\pi\,\overline{\sum_{m_i,n_i}}\frac{\exp\left(-2\pi V_4\sqrt{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|} + 2\pi i\,B^{ij}m_in_j\right)}{\sqrt{|(m\cdot m)(n\cdot n) - (m\cdot n)^2|}}. \tag{C.23}$$
Although we have not been able to show it explicitly, this contribution should be equal to the corresponding non-degenerate contribution in (C.12) for the $SO(4,4)$ vector Eisenstein series at $s = 1$, and hence equal to the non-degenerate contribution of the 1-loop integral $I_4$.

C.4. $SO(d,d,\mathbb{Z})$ spinor and conjugate spinor Eisenstein series. More generally, we can compute for all $d$ the leading term for the spinor Eisenstein series, obtained by setting all charges $m^{[3]} = m^{[5]} = \dots = 0$ except $m^{[1]}$, so that the constraints are trivial. This shows that
$$E^{SO(d,d,\mathbb{Z})}_{S;s} = \widehat{\sum_{m^i}}\left(\frac{V_d}{m^ig_{ij}m^j}\right)^s + \dots = V_d^s\,E^{Sl(d,\mathbb{Z})}_{d;s} + \dots, \tag{C.24}$$
so that, for $s = 1$, we observe the leading term in (3.33a). For the conjugate spinor, we can go even further and obtain the first two leading terms. Focusing on the contributions from $m$ and $m^{[2]}$ only and setting $m^{[4]} = m^{[6]} = \dots = 0$ (so that the constraints can be ignored) we find that
$$\begin{aligned}
E^{SO(d,d,\mathbb{Z})}_{C;s} &= \widehat{\sum_m}\left(\frac{V_d}{m^2}\right)^s + \frac{\pi^s\,\Gamma\left(s-\frac12\right)}{\pi^{s-\frac12}\,\Gamma(s)}\,V_d^s\,\widehat{\sum_{m^{ij}}}\left(\frac{1}{m^{ij}g_{ik}g_{jl}m^{kl}}\right)^{s-\frac12}\delta(m^{[2]}\wedge m^{[2]}) + \dots\\
&= 2V_d^s\,\zeta(2s) + \frac{\pi^s\,\Gamma\left(s-\frac12\right)}{\pi^{s-\frac12}\,\Gamma(s)}\,V_d^s\,E^{Sl(d,\mathbb{Z})}_{[2];s-\frac12} + \dots.
\end{aligned} \tag{C.25}$$
Here, the leading term is obtained from $m^{[2]} = 0$, while the second term follows after Poisson resummation on the unconstrained $m$ in the remainder and setting (the dual) $m = 0$. Substituting $s = 1$ we immediately recognize the leading term $\frac{\pi^2}{3}V_d$.


References

1. Hull, C.M. and Townsend, P.K.: Enhanced gauge symmetries in superstring theories. Nucl. Phys. B451, 525–546 (1995); hep-th/9505073
2. Townsend, P.K.: The eleven-dimensional supermembrane revisited. Phys. Lett. B350, 184–187 (1995); hep-th/9501068
3. Witten, E.: String theory dynamics in various dimensions. Nucl. Phys. B443, 85–126 (1995); hep-th/9503124
4. Giveon, A., Porrati, M. and Rabinovici, E.: Target space duality in string theory. Phys. Rept. 244, 77–202 (1994); hep-th/9401139
5. Obers, N.A. and Pioline, B.: U-duality and M-theory. Phys. Rept. 318, 113–225 (1999); hep-th/9809039
6. Pioline, B.: A note on non-perturbative $R^4$ couplings. Phys. Lett. B431, 73–76 (1998); hep-th/9804023
7. Paban, S., Sethi, S. and Stern, M.: Supersymmetry and higher derivative terms in the effective action of Yang–Mills theories. J. High Energy Phys. 06, 012 (1998); hep-th/9806028
8. Green, M.B. and Sethi, S.: Supersymmetry constraints on type IIB supergravity. Phys. Rev. D59, 046006 (1999); hep-th/9808061
9. Green, M.B. and Gutperle, M.: Effects of D instantons. Nucl. Phys. B498, 195–227 (1999); hep-th/9701093
10. Kiritsis, E. and Pioline, B.: On $R^4$ threshold corrections in IIB string theory and (p, q) string instantons. Nucl. Phys. B508, 509–534 (1997); hep-th/9707018
11. Harvey, J.A. and Moore, G.: Five-brane instantons and $R^2$ couplings in N = 4 string theory. Phys. Rev. D57, 2323–2328 (1998); hep-th/9610237
12. Gregori, A., Kiritsis, E., Kounnas, C., Obers, N.A., Petropoulos, P.M. and Pioline, B.: $R^2$ corrections and nonperturbative dualities of N = 4 string ground states. Nucl. Phys. B510, 423–476 (1998); hep-th/9708062
13. Bachas, C., Fabre, C., Kiritsis, E., Obers, N.A. and Vanhove, P.: Heterotic/type I duality and D-brane instantons. Nucl. Phys. B509, 33–52 (1998); hep-th/9707126
14. Kiritsis, E. and Obers, N.A.: Heterotic type I duality in D < 10-dimensions, threshold corrections and D instantons. J. High Energy Phys. 10, 004 (1997); hep-th/9709058
15. Antoniadis, I., Pioline, B. and Taylor, T.R.: Calculable $e^{-1/\lambda}$ effects. Nucl. Phys. B512, 61–78 (1998); hep-th/9707222
16. Lerche, W. and Stieberger, S.: Prepotential, mirror map and F theory on K3. Adv. Theor. Math. Phys. 2, 1105–1140 (1998); hep-th/9804176
17. Lerche, W., Stieberger, S. and Warner, N.P.: Quartic gauge couplings from K3 geometry. hep-th/9811228
18. Foerger, K. and Stieberger, S.: Higher derivative couplings and heterotic type I duality in eight-dimensions. hep-th/9901020
19. Bachas, C.: Heterotic versus type-I. Nucl. Phys. Proc. Suppl. 68, 348 (1998); hep-th/9710102
20. Green, M.B. and Gutperle, M.: D particle bound states and the D instanton measure. J. High Energy Phys. 01, 5 (1998); hep-th/9711107
21. Moore, G., Nekrasov, N. and Shatashvili, S.: D particle bound states and generalized instantons. hep-th/9803265
22. Kostov, I.K. and Vanhove, P.: Matrix string partition functions. Phys. Lett. B444, 196 (1998); hep-th/9809130
23. Gava, E., Hammou, A., Morales, J.F. and Narain, K.S.: On the perturbative corrections around D string instantons. J. High Energy Phys. 03, 023 (1999); hep-th/9902202
24. Terras, A.: Harmonic analysis on symmetric spaces and applications, Vol. I. Berlin–Heidelberg–New York: Springer Verlag, 1985
25. Green, M.B. and Vanhove, P.: D-instantons, strings and M theory. Phys. Lett. B408, 122–134 (1997); hep-th/9704145
26. Berkovits, N.: Construction of $R^4$ terms in N = 2, D = 8 superspace. Nucl. Phys. B514, 191–203 (1998); hep-th/9709116
27. Terras, A.: Harmonic analysis on symmetric spaces and applications, Vol. II. Berlin–Heidelberg–New York: Springer Verlag, 1985
28. Harish-Chandra: Automorphic Forms on Semisimple Lie Groups. No. 62 in Lecture Notes in Mathematics, Berlin–Heidelberg–New York: Springer Verlag, 1968
29. Ferrara, S., Harvey, J.A., Strominger, A. and Vafa, C.: Second quantized mirror symmetry. Phys. Lett. B361, 59–65 (1995); hep-th/9505162
30. Pioline, B. and Kiritsis, E.: U duality and D-brane combinatorics. Phys. Lett. B418, 61–69 (1998); hep-th/9710078
31. Ganor, O.: Two conjectures on gauge theories, gravity, and infinite dimensional Kac-Moody groups. hep-th/9903110
32. Obers, N.A. and Pioline, B.: U duality and M theory, an algebraic approach. In 2nd Conference on Quantum Aspects of Gauge Theories, Supersymmetry and Unification, Corfu. Sept, 1998. hep-th/9812139
33. Dixon, L.J., Kaplunovsky, V. and Louis, J.: Moduli dependence of string loop corrections to gauge coupling constants. Nucl. Phys. B355, 649–688 (1991)
34. Kiritsis, E. and Kounnas, C.: Infrared behavior of closed superstrings in strong magnetic and gravitational fields. Nucl. Phys. B456, 699–731 (1995); hep-th/9508078
35. Borcherds, R.E.: Automorphic forms on $O_{s+2,2}(\mathbb{R})$ and infinite products. Invent. Math. 120, 161 (1995)
36. Borcherds, R.E.: Automorphic forms with singularities on Grassmannians. Invent. Math. 132, 491–562 (1998); alg-geom/9609022
37. Harvey, J.A. and Moore, G.: Algebras, BPS states, and strings. Nucl. Phys. B463, 315–368 (1996); hep-th/9510182
38. Gross, D.J. and Witten, E.: Superstring modifications of Einstein's equations. Nucl. Phys. B277, 1 (1986)
39. Cremmer, E. and Julia, B.: The SO(8) supergravity. Nucl. Phys. B159, 141 (1979)
40. Julia, B.L.: Dualities in the classical supergravity limits: Dualizations, dualities and a detour via (4k+2)-dimensions. In NATO Advanced Study Institute on Strings, Branes and Dualities, Cargese. May, 1997; hep-th/9805083
41. Hull, C.M. and Townsend, P.K.: Unity of superstring dualities. Nucl. Phys. B438, 109–137 (1995); hep-th/9410167
42. Elitzur, S., Giveon, A., Kutasov, D. and Rabinovici, E.: Algebraic aspects of matrix theory on $T^d$. Nucl. Phys. B509, 122–144 (1998); hep-th/9707217
43. Dijkgraaf, R., Verlinde, E. and Verlinde, H.: BPS spectrum of the five-brane and black hole entropy. Nucl. Phys. B486, 77–88 (1997); hep-th/9603126
44. Obers, N.A., Pioline, B. and Rabinovici, E.: M theory and U duality on $T^d$ with gauge backgrounds. Nucl. Phys. B525, 163–181 (1998); hep-th/9712084
45. D'Hoker, E. and Phong, D.H.: The geometry of string perturbation theory. Rev. Mod. Phys. 60, 917 (1988)
46. Berkovits, N. and Vafa, C.: N = 4 topological strings. Nucl. Phys. B433, 123–180 (1995); hep-th/9407190
47. Ooguri, H. and Vafa, C.: All loop N = 2 string amplitudes. Nucl. Phys. B451, 121–161 (1995); hep-th/9505183
48. Berkovits, N. and Vafa, C.: Type IIB $R^4H^{4g-4}$ conjectures. Nucl. Phys. B533, 181–198 (1998); hep-th/9803145
49. Kehagias, A. and Partouche, H.: The exact quartic effective action for the type IIB superstring. Phys. Lett. B422, 109–116 (1998); hep-th/9710023
50. Green, M.B., Gutperle, M. and Kwon, H.: Sixteen fermion and related terms in M theory on $T^2$. Phys. Lett. B421, 149–161 (1998); hep-th/9710151
51. Russo, J.G. and Tseytlin, A.A.: One loop four graviton amplitude in eleven-dimensional supergravity. Nucl. Phys. B508, 245–259 (1997); hep-th/9707134
52. Kehagias, A. and Partouche, H.: D instanton corrections as (p, q) string effects and nonrenormalization theorems. Int. J. Mod. Phys. A13, 5075–5092 (1998); hep-th/9712164
53. Mayr, P. and Stieberger, S.: Moduli dependence of one loop gauge couplings in (0,2) compactifications. Phys. Lett. B355, 107–116 (1995); hep-th/9504129
54. Berglund, P., Henningson, M. and Wyllard, N.: Special geometry and automorphic forms. Nucl. Phys. B503, 256–276 (1997); hep-th/9703195

Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 209, 325 – 352 (2000)


Calabi–Yau Black Holes and (0, 4) Sigma Models

Ruben Minasian$^1$, Gregory Moore$^{1,2}$, Dimitrios Tsimpis$^1$

$^1$ Department of Physics, Yale University, New Haven, CT 06520, USA
$^2$ School of Natural Sciences, Institute for Advanced Study, Olden Lane, Princeton, NJ 08540, USA

Received: 10 May 1999 / Accepted: 16 July 1999

Abstract: When an M-theory fivebrane wraps a holomorphic surface $\mathcal{P}$ in a Calabi–Yau 3-fold $X$ the low energy dynamics is that of a black string in 5 dimensional $N = 1$ supergravity. The infrared dynamics on the string worldsheet is an $N = (0, 4)$ 2D conformal field theory. Assuming the 2D CFT can be described as a nonlinear sigma model, we describe the target space geometry of this model in terms of the data of $X$ and $\mathcal{P}$. Variations of weight two Hodge structures enter the construction of the model in an interesting way.

1. Introduction

D-brane and M-brane models of black holes have provided an extremely intriguing approach to an understanding of black hole entropy [1] and promise to lead to important insights in other aspects of black hole physics. The program of Strominger and Vafa is based on mapping the low energy dynamics of certain configurations of branes to the conformal field theory of an effective string. The derivation of this conformal field theory is best understood (and already quite subtle) for black holes in backgrounds preserving 16 supersymmetries, such as IIB compactification on $K3 \times S^1$. In this paper we will investigate an analogous conformal field theory for 4D black holes in backgrounds with 8 unbroken supersymmetries.

Specifically, in this paper we continue the investigation of the microscopic dynamics of wrapped five-branes following the work of Maldacena, Strominger, and Witten [2]. We consider M-theory compactifications on $\mathbb{R}^{1,3} \times S^1 \times X$, where $X$ is a nonsingular compact Calabi–Yau 3-fold. The radius of $S^1$ is taken to be large with respect to the length scale set by $X$, which is in turn large compared to the 11D Planck scale. We usually will take the background 3-form $C^{(3)}$ to vanish. If an M5-brane worldvolume $W_6$ wraps $\mathbb{R} \times S^1 \times \mathcal{P}$, where $\mathcal{P}$ is a four-manifold $\mathcal{P} \subset X$, then the resulting object is a string $S$ with worldsheet $W_2$ wrapping $\mathbb{R} \times S^1$ in $\mathbb{R}^{1,3} \times S^1$. At long distances the supergravity background is that of a black hole in $\mathbb{R}^{1,3}$ with 8 unbroken supersymmetries at infinity.


The 4-manifold $\mathcal{P}$ must be a holomorphically embedded complex surface to preserve supersymmetry. In this case the low energy dynamics of the string $S$ is described by a (0, 4) CFT and the number of massless boson and fermion degrees of freedom can be expressed purely in terms of the topology of $\mathcal{P}$ and of its embedding into $X$ [2]. Knowing the number of massless degrees of freedom suffices to determine the entropy microscopically, but for many purposes one would certainly like to know the data of the (0, 4) conformal field theory of $S$ in much more detail. The object of this paper is to express this data in terms of the data of the ambient Calabi–Yau geometry and the topology and geometry of $\mathcal{P}$.

In this introduction we summarize the structure of the sigma model that we will find. The detailed justification is described in the subsequent sections. Much of what we say is implicitly (and sometimes explicitly) described in [2].

Let us begin with the overall count of the degrees of freedom. In a supersymmetric configuration the surface $\mathcal{P}$ is a divisor for a holomorphic line bundle $L$ over $X$. Let $P = [\mathcal{P}] \in H^2(X; \mathbb{Z})$ be the first Chern class of $L$. It is Poincaré dual to the 4-cycle defined by $\mathcal{P}$. Macroscopically, in the 5D supergravity obtained from M theory on $X$, the string is a black string and $P$ is the charge of the string. Using index theory and holomorphic geometry [2] we computed the left- and right-moving central charges:

$$c_R = 6D + \tfrac{1}{2}\, c_2 \cdot P, \qquad c_L = 6D + c_2 \cdot P, \tag{1.1}$$

where $D := \frac{1}{6}\int_X P^3$ and $c_2 \cdot P := \int_X P \wedge c_2(TX)$, verifying microscopically the entropy computed macroscopically in [3]. These central charges can also be obtained from the requirement of the complete anomaly cancellation in five dimensions [4].

We now describe some of the local geometry of the target space. This is obtained by considering the collective coordinates of the wrapped 5-brane. These include the collective coordinates associated to the 5 scalars $X^a$ and the chiral two-form $\beta$ of the 5-brane worldvolume tensormultiplet.

We begin with the collective coordinates $\varphi$ associated to the five scalars. The space of supersymmetric wrappings of charge $P$ is the set of divisors in $X$ in the class $P$. This is called a "linear system" because all divisors are zero-loci of global holomorphic sections of $L$, and the latter is a linear space. Of course two sections related by a multiplicative constant have the same divisor, so the linear system is just a projective space

$$|P| := \mathbb{P}H^0(X, L) = \mathbb{CP}^N. \tag{1.2}$$

Assuming $\mathcal{P}$ is a smooth ample divisor, as we should to apply classical geometry [2], the Riemann–Roch formula gives the dimension of the linear system (1.2):

$$N := D + \tfrac{1}{12}\, c_2 \cdot P - 1. \tag{1.3}$$
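As a quick numerical illustration of (1.3), the following minimal sketch evaluates $N$ with exact rational arithmetic. The input is an assumption chosen for illustration and is not taken from the text: $X$ is taken to be the quintic threefold and $P$ its hyperplane class, for which $\int_X P^3 = 5$ and $c_2 \cdot P = 50$. The function name `linear_system_dimension` is likewise hypothetical.

```python
from fractions import Fraction


def linear_system_dimension(p_cubed, c2_dot_p):
    """Return N = D + c2.P/12 - 1 with D = P^3/6, as in (1.3)."""
    d = Fraction(p_cubed, 6)  # D := (1/6) * integral of P^3 over X
    return d + Fraction(c2_dot_p, 12) - 1


# Illustrative input (an assumption, not from the paper):
# hyperplane class of the quintic threefold, P^3 = 5 and c2.P = 50.
N = linear_system_dimension(5, 50)
print(N)  # 4
```

For this input $N = 4$, as one expects: the hyperplane sections of a quintic in $\mathbb{P}^4$ are parametrized by the dual projective space, which is again a $\mathbb{P}^4$.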

Taking into account the position in noncompact $\mathbb{R}^3$, the target space of the scalars is

$$\varphi : W_2 \to \mathbb{R}^3 \times |P|. \tag{1.4}$$

In this paper we will make the important restriction that the fivebrane wraps a smooth 4-cycle $\mathcal{P}$. Thus if $\mathcal{D}$ is the discriminant locus of singular divisors in the linear system we restrict to maps $\varphi$ with image in

$$|P|_s := |P| - \mathcal{D}. \tag{1.5}$$


Moreover, because of monodromy, we will even restrict attention to maps into a local neighborhood $\mathcal{U} \subset |P|_s$.$^1$

Now we consider the collective coordinates arising from the chiral two-form $\beta$ on worldvolumes of the form $W_6 = W_2 \times \mathcal{P}$. The massless modes are associated with harmonic two-forms on $\mathcal{P}$. Since the form is chiral there are $b_2^- := b_2^-(\mathcal{P})$ left-moving and $b_2^+ := b_2^+(\mathcal{P})$ right-moving chiral bosons. Moreover, as shown in [2], one can express these topological invariants of $\mathcal{P}$ in terms of $D$ and $c_2 \cdot P$:

$$b_2^- = 4D + \tfrac{5}{6}\, c_2 \cdot P - 1, \qquad b_2^+ = 2D + \tfrac{1}{6}\, c_2 \cdot P - 1. \tag{1.6}$$
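The relations (1.1), (1.3) and (1.6) can be cross-checked against one another. The sketch below is only a consistency check under stated assumptions: it uses the field content summarized in this introduction (three noncompact scalars, $2N$ real scalars from $\varphi$, $b_2^\pm$ chiral bosons, and $4(N+1)$ real right-moving fermions, each fermion counted with central charge 1/2), and it is evaluated on an illustrative input not taken from the text (the hyperplane class of the quintic threefold, $\int_X P^3 = 5$, $c_2 \cdot P = 50$). The function names are hypothetical.

```python
from fractions import Fraction


def wrapped_fivebrane_counts(p_cubed, c2_dot_p):
    """Evaluate (1.1), (1.3) and (1.6) for given P^3 and c2.P."""
    d = Fraction(p_cubed, 6)
    c2p = Fraction(c2_dot_p)
    return {
        "N": d + c2p / 12 - 1,                         # (1.3)
        "b2_minus": 4 * d + Fraction(5, 6) * c2p - 1,  # (1.6)
        "b2_plus": 2 * d + c2p / 6 - 1,                # (1.6)
        "c_left": 6 * d + c2p,                         # (1.1)
        "c_right": 6 * d + c2p / 2,                    # (1.1)
    }


def central_charges_from_fields(counts):
    """Count central charges from the massless fields (assumed content)."""
    n = counts["N"]
    # Left-movers: 3 noncompact scalars, 2N real scalars from varphi,
    # and b2^- chiral bosons; no left-moving fermions are assumed.
    left = 3 + 2 * n + counts["b2_minus"]
    # Right-movers: the same 3 + 2N scalars, b2^+ chiral bosons,
    # and 4(N + 1) real fermions contributing 1/2 each.
    right = 3 + 2 * n + counts["b2_plus"] + 2 * (n + 1)
    return left, right


# Illustrative input (an assumption): quintic hyperplane class.
counts = wrapped_fivebrane_counts(5, 50)
left, right = central_charges_from_fields(counts)
print(counts["b2_minus"], counts["b2_plus"])                 # 44 9
print(left == counts["c_left"], right == counts["c_right"])  # True True
```

The agreement is an algebraic identity in $D$ and $c_2 \cdot P$, so it holds for any input; for the illustrative input it gives $b_2^+ = 9 = 2N + 1$ and $(c_L, c_R) = (55, 30)$.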

If $b_1(\mathcal{P}) = 0$ (which follows if $b_1(X) = 0$) the only fermions are right-moving. These pair up to form

$$D + \tfrac{1}{12}\, c_2 \cdot P \tag{1.7}$$

$N = (0, 4)$ scalar multiplets with both left- and right-moving scalars. In addition there are real purely left-moving scalars neutral under supersymmetry. Since

$$N = D + \tfrac{1}{12}\, c_2 \cdot P - 1 = \tfrac{1}{2}(b_2^+ - 1) \tag{1.8}$$

there are $|\sigma(\mathcal{P})| = b_2^-(\mathcal{P}) - b_2^+(\mathcal{P})$ such scalars.

Let us now consider the scalars from the chiral two-form in more detail. The splitting into left-movers and right-movers follows from the decomposition

$$H^2(\mathcal{P}; \mathbb{R}) = H^{2,-}(\mathcal{P}; \mathbb{R}) \oplus H^{2,+}(\mathcal{P}; \mathbb{R}) \tag{1.9}$$

into anti-self-dual and self-dual parts respectively. Since $\mathcal{P}$ is Kähler we may further decompose

$$H^{2,+}(\mathcal{P}; \mathbb{R}) = [H^{2,0}(\mathcal{P}) \oplus H^{0,2}(\mathcal{P})]_{\mathbb{R}} \oplus \mathbb{R} \cdot J, \tag{1.10}$$

where $J$ is the Kähler class of $\mathcal{P}$ induced by that of $X$, while $H^{2,-}(\mathcal{P}; \mathbb{R})$ is purely of Hodge type (1, 1).

A crucial point is that the splitting (1.9), (1.10) depends on $\mathcal{P}$ (i.e., on the values of $\varphi$), and hence on the (weight two) Hodge structure of $H^2(\mathcal{P}; \mathbb{Z})$. Thus, a natural framework for working with the (0, 4) model is the theory of variation of Hodge structures. (See the references in [5] for some useful background material.) The Hodge structure on $H^2(\mathcal{P})$ decomposes into a "fixed part" and a "variable part" (as functions of $\mathcal{P}$):

$$H^2(\mathcal{P}; \mathbb{R}) = H_f^2(\mathcal{P}; \mathbb{R}) \oplus H_v^2(\mathcal{P}; \mathbb{R}), \tag{1.11}$$

where $H_v^2$ is the orthogonal complement of $H_f^2$ in the Hodge metric $(\theta_1, \theta_2) := \int_{\mathcal{P}} \theta_1 \wedge \theta_2$. The "fixed" or "rigid" part is simply the space of 2-forms which extend to $X$. As pointed out in [2], since $\mathcal{P}$ is ample the restriction map

$$\iota^* : H^2(X; \mathbb{Z}) \to H^2(\mathcal{P}; \mathbb{Z}) \tag{1.12}$$

is injective so $H_f^2(\mathcal{P}; \mathbb{R}) \cong H^2(X; \mathbb{R})$. Physically, the splitting (1.11) means the (0, 4) sigma model splits (up to possible discrete identifications by a finite group) into a product of two sigma models which we call the "universal factor" and the "entropic factor". The terminology refers to the intuition that $\mathcal{P}$ should be regarded as large, so that $D$ is a very large positive integer, determining the leading term in the black hole entropy.

$^1$ We thank E. Witten for stressing the importance of the monodromy.

The CFT for the universal factor is easily described. It consists of a single (0, 4) multiplet with target space $\mathbb{R}^3 \times S^1$ (for left- and right-movers) together with $h^{1,1}(X) - 2$ purely left-moving bosons.$^2$ Since $\beta$ can be shifted by large gauge transformations in $H^2(X; \mathbb{Z})$ the universal factor is just a (0, 4) Narain model with left-moving gauge group of rank $h^{1,1}(X) - 2$. The Narain data is obtained from the projection of $H^2(X; \mathbb{Z}) \otimes \mathbb{R}$ onto the definite signature subspaces of the quadratic form $(\theta_1, \theta_2) = \int_{\mathcal{P}} \theta_1 \wedge \theta_2 = \int_X P \wedge \theta_1 \wedge \theta_2$.

The entropic factor is more subtle and is the focus of much of this paper. In describing this model we will make the important assumption that the model can be described by a geometrical Lagrangian (see footnote 1 above). Roughly speaking, a (0, 4) sigma model which has a hyper-Kähler Lagrangian is determined by a choice of target space $\widetilde{M}$ with a connection (with torsion) together with a triholomorphic vector bundle with connection $V \to \widetilde{M}$. The detailed conditions on the sigma model Lagrangian are written in Eqs. (2.12)–(2.19) below.

In the following sections we will argue that in our case the target space $\widetilde{M}$ of the entropic factor has a holomorphic projection

$$p : \widetilde{M} \to \mathbb{P}^N, \tag{1.13}$$

where the projective space $\mathbb{P}^N$ is the linear system $|P|$. Physically, the degrees of freedom describing the fibers of $p$ have their origin in those of the self-dual two-form $\beta$ on $W_6$. The fibers are complex tori of complex dimension $N = \frac{1}{2}(b_2^+ - 1)$. Moreover, the vector bundle $V$ has real rank $|\sigma(\mathcal{P})| - (h^{1,1}(X) - 2) = b_2^-(\mathcal{P}) - b_2^+(\mathcal{P}) - (h^{1,1}(X) - 2)$, and the connection on $V$ is vertical, and flat.

In Sect. 5.3 below we argue that the torsion of the connection on $T\widetilde{M}$ is zero, so that the metric on $\widetilde{M}$ is hyper-Kähler. This metric may be described as follows. The metric of the ambient Calabi–Yau $X$ induces a metric on the normal bundle of $\mathcal{P} \subset X$, and therefore a metric on the linear system $|P|$. This metric is Kähler,$^3$ and, by a version of the Calabi ansatz/c-map ([6, 7]) there is an induced hyper-Kähler metric on $T^*|P|$. Thus, the local geometry of $\widetilde{M}$ is that of $T^*|P|$ with a hyper-Kähler structure.

Having described the local geometry of the target space we now turn to global issues. There are several interesting issues one should address, but we focus on only one, namely the nature of the fibers of $p$ (working locally in a patch of $|P|$). The derivation of the sigma model Lagrangian uses the chiral fivebrane Lagrangians of [8, 9]. Unfortunately, the formalism of these papers does not determine the way in which zeromodes of the $b_2^-$ left-chiral and $b_2^+$ right-chiral scalars are paired with each other. Therefore we must resort to some guess-work.

The key motivation for our guess is that we expect that the target space of the sigma model $\widetilde{M}$ should be compact. Otherwise it is hard to understand how we could have finite dimensional spaces of BPS states. Since the linear system $|P|$ is compact the question reduces to compactness of the fibers. Therefore, all the $b_2(\mathcal{P})$ modes due to the chiral $\beta$-field should be compact scalars. The most natural way to achieve this is to assume that the fiber above $\mathcal{P}$ is just a conformal field theory on a torus with a flat connection, i.e., a Narain model. The data of the Narain model consists of a choice of lattice $\Gamma$ of signature $(p, q)$ of zeromodes of scalars and an orthogonal projection of $\Gamma \otimes \mathbb{R}$ to the definite signature subspaces defining the spectrum of left- and right-moving parts of the winding/momentum lattice. In our case we will take $\Gamma = H^2(\mathcal{P}; \mathbb{Z})$ and the orthogonal projection is

$$H^2(\mathcal{P}; \mathbb{Z}) \otimes \mathbb{R} \to H^{2,+}(\mathcal{P}; \mathbb{R}) \perp H^{2,-}(\mathcal{P}; \mathbb{R}) \tag{1.14}$$

induced from the metric on $\mathcal{P}$. Thus, according to our hypothesis, the fibers of the projection in (1.13) are complex tori of dimension $N$, so we have a holomorphic integrable system. Two interesting subtleties in this discussion are, first, there is a nontrivial flat connection on the toroidal fibers, and second, by a mechanism mentioned in [2] most of the charges in $H^2(\mathcal{P}; \mathbb{Z})$ are not conserved (we comment on this briefly in Sect. 6).

$^2$ Except when $h^{1,1}(X) = 1$. In this case the compact scalar in $S^1$ is purely right-moving and there is no left-moving gauge bundle.

$^3$ Warning: the metric on the linear system is not the Fubini–Study metric.

Finally we mention another motivation for the present work. D-brane models of black holes appear to have interesting arithmetic properties [10, 11], at least for the case of black holes in backgrounds with 16 supersymmetries. One can entertain various conjectures about the arithmetic nature of D-brane black holes in Calabi–Yau compactification and several of these are related to questions about the numbers of BPS states $\dim H_{BPS}(\gamma)$ for charge $\gamma \in H^{\mathrm{even}}(X)$. The results of the present paper might help to elucidate the nature of these BPS degeneracies. Our hope is that these degeneracies can be studied using (0, 4) elliptic genera.

We summarize the remaining sections as follows. In Sect. 2 we review the general form of (0, 4) Lagrangians. In Sect. 3 we review the relation between unbroken supersymmetry and holomorphically wrapped 4-cycles. In Sect. 4 we describe in detail the derivation of the collective coordinates and the relation to variation of Hodge structures. We derive carefully the (0, 4) supermultiplets by reducing the supersymmetry transformations of the 6D tensormultiplet. In Sect. 5 we use the Kaluza–Klein ansatz of Sect. 4 in the chiral 5-brane action and derive the geometry of the target space of the (0, 4) model. In Sect. 6 we comment on some of the global aspects of the target space model. These involve the toroidal fibers of (1.13) and their relation to Narain models. In Sect. 7 we describe briefly what we think are some of the most interesting open problems raised by this paper.
Many conventions and technical points may be found in the appendices. Appendix D describes the close analogy of the models in this paper with the strings obtained by wrapping D3 branes around holomorphic curves in a K3 surface.

2. Geometrical Data for (0, p) σ-Models

In this section we summarize the geometrical data used to construct (0, 4) supersymmetric Lagrangians. This material is standard, and this section follows mostly [12, 13]. In two spacetime dimensions the supersymmetry algebra of type $(0, p)$ is carried by $p$ negative-chirality supersymmetries $Q_-^I$, $I = 1, \ldots, p$, obeying:

$$\{Q_-^I, Q_-^J\} = 2\delta^{IJ} P_-. \tag{2.1}$$

In the construction of the (0, 4) sigma-models it is convenient to consider a formulation with only (0, 1) manifest supersymmetry. (0, 1) superspace consists of two Bose coordinates $x^+$, $x^-$ and a single negative-chirality anticommuting coordinate $\theta^-$. Our sigma-model is defined by a map from (0, 1) superspace $\Sigma$ to a $d$-dimensional target manifold $M$, given by scalar superfields $\Phi^i(x, \theta^-)$, $i = 1, \ldots, d$. In general there can also be another field which is a section of the vector bundle $S_- \otimes \phi^* V$ over $\Sigma$, given by negative-chirality$^4$ spinor superfields $\Lambda_+^a(x, \theta^-)$, $a = 1, \ldots, n$, where $n$ is the fibre dimension of $V$ and $S_-$ is the spinor bundle over $\Sigma$. $V$ is equipped with a positive definite metric $h_{ab}$ and a connection $A_i{}^a{}_b$ with curvature $F_{ij}{}^a{}_b$ valued in some subgroup of $O(n)$. The requirement of (0, 4) SUSY imposes additional constraints which will be analyzed at the end of this section. The superfields have the following expansions:

$$\Phi^i(x, \theta^-) = \phi^i(x) + i\theta^- \psi_-^i(x), \qquad \Lambda_+^a(x, \theta^-) = \lambda_+^a(x) + i\theta^- F^a(x). \tag{2.2}$$

The action for the model in terms of (0, 1) superfields reads:

$$S = \int d^2x\, d\theta^- \Big\{ \big(g_{ij}(\Phi) + b_{ij}(\Phi)\big)\, D_-\Phi^i\, \partial_+\Phi^j + i\Lambda_+^a \big( D_-\Lambda_+^b + D_-\Phi^i A_i{}^b{}_c \Lambda_+^c \big) h_{ab} + i m\, C^a \Lambda_+^b h_{ab} \Big\}, \tag{2.3}$$

where

$$D_- = \frac{\partial}{\partial\theta^-} + i\theta^- \frac{\partial}{\partial x^-}, \tag{2.4}$$

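and $A_i{}^a{}_b$ is the connection on $V$ with curvature $F_{ij}{}^a{}_b$. After eliminating the auxiliary fields and expanding in components, (2.3) reads:

$$S = \int d^2x \Big\{ (g_{ij} + b_{ij})\, \partial_+\phi^i \partial_-\phi^j + i g_{ij}\, \psi_-^i \nabla_+^{(+)} \psi_-^j - i\lambda_+^a D_-\lambda_+^b h_{ab} - \tfrac{1}{2}\, \lambda_+^a \lambda_+^b \psi_-^i \psi_-^j F_{ij\,ab} + m \nabla_i C^a\, \psi_-^i \lambda_+^b h_{ab} - \tfrac{1}{4}\, m^2 C^a C^b h_{ab} \Big\}, \tag{2.5}$$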

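where $\nabla^{(\pm)}$ is the covariant derivative with respect to the connection with torsion

$$\Upsilon^{(\pm)i}{}_{jk} = \Gamma^i{}_{jk} \pm H^i{}_{jk}, \tag{2.6}$$

$$H_{ijk} = \tfrac{3}{2}\, \partial_{[i} b_{jk]}, \tag{2.7}$$

and

$$D_-\lambda_+^a = \partial_-\lambda_+^a + \partial_-\phi^i A_i{}^a{}_b \lambda_+^b. \tag{2.8}$$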

The action (2.3) is manifestly invariant under supersymmetry transformations

$$\delta\Phi^i = i\eta_+ D_-\Phi^i, \qquad \delta\Lambda_+^a = -i\eta_+ D_-\Lambda_+^a. \tag{2.9}$$

$^4$ The subscript refers to the fact that in a free theory the $\theta$-independent component in the expansion of $\Lambda$ would be a left-moving fermion in the sense that $\partial_-\lambda_+^a = 0$.

Let us now assume that (2.3) possesses additional supersymmetries, parametrized by anticommuting parameters $\eta_+^r$, $r = 1, \ldots, p - 1$, of the form

$$\begin{aligned} \delta\Phi^i &= i\eta_+^r J_r{}^i{}_j(\Phi)\, D_-\Phi^j, \\ \delta\Lambda_+^a &= -\delta\Phi^i A_i{}^a{}_b \Lambda_+^b + \eta_+^r I_r{}^a{}_b(\Phi)\, S^b - m\,\eta_+^r t_r^a(\Phi), \end{aligned} \tag{2.10}$$

where

$$S^a = 2\nabla_-\Lambda_+^a + m C^a, \tag{2.11}$$

and $J_r$, $I_r$, $t_r$ are to be determined. Invariance of the action under (2.9) and (2.10), and on-shell closure of the supersymmetry algebra are equivalent to the following set of conditions [13–15]:

$$J_r J_s = -\delta_{rs} + f_{rs}{}^t J_t, \tag{2.12}$$

$$N(J_r, J_s)^i{}_{jk} = 0, \tag{2.13}$$

$$F_{ij}\, J_r{}^i{}_{[k} J_s{}^j{}_{l]} = F_{kl}\, \delta_{rs}, \tag{2.14}$$

$$J_r{}^k{}_{(i}\, g_{j)k} = 0, \tag{2.15}$$

$$\nabla^{(+)} J_r = 0, \tag{2.16}$$

$$I_r{}^c{}_{(a}\, h_{b)c} = 0, \tag{2.17}$$

$$\partial_i(t_r^a C^b h_{ab}) = 0, \tag{2.18}$$

$$\nabla_i t_r^a = J_r{}^j{}_i \nabla_j C^a. \tag{2.19}$$
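The algebra (2.12) and the hermiticity condition (2.15) can be illustrated in the simplest case. The sketch below assumes flat $M = \mathbb{R}^4$ with metric $g_{ij} = \delta_{ij}$, identifies $\mathbb{R}^4$ with the quaternions, and takes $J_1, J_2, J_3$ to be left multiplication by the unit quaternions $i, j, k$, with structure constants $f_{rs}{}^t = \epsilon_{rst}$. These choices, and all function names, are illustrative assumptions rather than data taken from the text.

```python
def quaternion_product(p, q):
    """Hamilton product of quaternions (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def left_multiplication_matrix(unit):
    """4x4 real matrix of x -> unit * x in the basis (1, i, j, k)."""
    basis = [tuple(int(m == n) for m in range(4)) for n in range(4)]
    columns = [quaternion_product(unit, e) for e in basis]
    return [[columns[col][row] for col in range(4)] for row in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def levi_civita(r, s, t):
    """Totally antisymmetric symbol on the indices 0, 1, 2."""
    return (r - s) * (s - t) * (t - r) // 2


# Assumed complex structures: left multiplication by i, j, k.
J = [left_multiplication_matrix(u)
     for u in ((0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1))]


def quaternion_algebra_holds():
    """Check J_r J_s = -delta_rs + eps_rst J_t, cf. (2.12)."""
    for r in range(3):
        for s in range(3):
            expected = [[-(r == s) * (i == j)
                         + sum(levi_civita(r, s, t) * J[t][i][j]
                               for t in range(3))
                         for j in range(4)] for i in range(4)]
            if matmul(J[r], J[s]) != expected:
                return False
    return True


def metric_is_hermitian():
    """For g = identity, (2.15) says each J_r is antisymmetric."""
    return all(J[r][i][j] == -J[r][j][i]
               for r in range(3) for i in range(4) for j in range(4))


print(quaternion_algebra_holds(), metric_is_hermitian())  # True True
```

In this flat example the remaining conditions are trivially satisfied for vanishing $b_{ij}$ and $F_{ij}$, so it only illustrates the algebraic content of (2.12) and (2.15).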

Here $N$ is essentially the Nijenhuis tensor. The SUSY algebra closes on-shell on $\lambda_+^a$ by virtue of (2.12), (2.13), (2.14), (2.19). Conditions (2.11)–(2.17) can be summarized as follows [13]: $M$ admits three complex structures obeying the algebra of quaternions, the metric on $M$ is hermitian with respect to all three complex structures and the holonomy of the connection $\Upsilon^{(+)}$ is a subgroup of $Sp(d/4)$. The bundle $V \otimes \mathbb{C}$ is holomorphic with respect to all three complex structures and carries a Hermitian metric $h_{ab}$.

3. Supersymmetrically Wrapped Fivebranes

The low energy states of the black string $S$ are obtained from small deformations of supersymmetrically wrapped cycles. Thanks to the existence of a κ-symmetric fivebrane action we can analyze which configurations preserve supersymmetry, following the analysis of [16]. The unbroken supersymmetries are the result of combining the supersymmetry $\delta_\epsilon \Theta = \epsilon$ with κ-symmetry $\delta_\kappa \Theta = 2(1 + \Gamma)\kappa$, where the matrix $\Gamma$ is field dependent (the explicit expression can be found in [17]) and has the property that $(1 \pm \Gamma)$ are projection operators. The condition of unbroken supersymmetry is

$$(1 - \Gamma)\epsilon \equiv P_-\epsilon = 0. \tag{3.1}$$

Let the M5-brane be stretched in the $X^0$–$X^5$ directions, and consider a compactification on a Calabi–Yau threefold along $X^2, \ldots, X^7$. We consider the wrapping of the fivebrane on a four-cycle $\mathcal{P}$ stretched in the $X^2$–$X^5$ directions. The coordinates on the fivebrane worldvolume are denoted by $\sigma^\alpha$, $\alpha = 0, \ldots, 5$. It is convenient to choose a gauge such that $\sigma^0 = X^0$, $\sigma^1 = X^1$. Also let $X^m$, $X^{\bar m}$, $m = 1, 2$ be a complex basis for $X^2$–$X^5$ (choosing a gauge such that $dX^6 + i\,dX^7$ is an eigenvector of the complex structure on the Calabi–Yau). An eleven-dimensional spinor is decomposed as

$$\epsilon_\pm = \lambda^{(3)} \otimes \lambda_\pm^{(2)} \otimes \xi^{(6)}, \tag{3.2}$$

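where $\lambda_\pm^{(2)}$ is a spinor of $Spin(2)_{01}$ of positive (negative) chirality, $\lambda^{(3)}$ is a $Spin(3)_{8910}$ spinor and $\xi^{(6)}$ is the covariantly constant spinor of the Calabi–Yau. The eleven-dimensional $\Gamma$-matrices can be decomposed as follows

$$\begin{aligned} \Gamma^{8,9,10} &= \gamma^{8,9,10} \otimes \rho_{01}^{(2)} \otimes \rho^{(6)}, \\ \Gamma^{0,1} &= 1_3 \otimes \gamma^{0,1} \otimes \rho^{(6)}, \\ \Gamma^{2,3,4,\ldots,7} &= 1_5 \otimes \gamma^{2,3,4,\ldots,7}, \end{aligned} \tag{3.3}$$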

where $\rho_{01}^{(2)} = i\gamma^0\gamma^1$ is the chirality operator of $Spin(2)_{01}$ and $\rho^{(6)}$ is the chirality operator acting on the Calabi–Yau spinors. We let $\Gamma_{M_1 \ldots M_k} = \Gamma_{[M_1} \cdots \Gamma_{M_k]}$. Denoting a positive (negative)-chirality spinor on the Calabi–Yau by $\xi_\pm^{(6)}$, we can choose a normalization such that the following identities hold:

$$\gamma_m\, \xi_+^{(6)} = 0, \qquad \gamma_{npq}\, \xi_+^{(6)} = 2i J_{n[p}\gamma_{q]}\, \xi_+^{(6)}, \tag{3.4}$$

where $J$ is the Kähler form for $X$. Also, we have passed to a complex basis for the $\gamma$ matrices such that $\gamma_c^1 = \gamma^2 + i\gamma^4 = (\gamma_c^{\bar 1})^\dagger$ and $\gamma_c^2 = \gamma^3 + i\gamma^5 = (\gamma_c^{\bar 2})^\dagger$. We will omit the subscript on the complex matrices whenever there is no possibility of confusion. Equation (3.4) implies

$$\gamma_{mnpq}\, \xi^{(6)} = (J_{np}J_{mq} - J_{mp}J_{nq})\, \xi^{(6)}, \tag{3.5}$$

where we used $J_{mn} = i g_{mn}$ and the anticommutation relations. We thus have

$$P_-\epsilon_\pm = \frac{1}{2}\Big[ 1 \pm \frac{1}{4}\, \partial_2 X^m \partial_3 X^n \partial_4 X^p \partial_5 X^q \big( J_{np}J_{mq} - (m \leftrightarrow n) \big) \Big] \epsilon_\pm, \tag{3.6}$$

where $\partial_i = \partial/\partial\sigma^i$, $i = 2, \ldots, 5$ and we have used $\gamma^{1\bar 1 2\bar 2} = -4\gamma^{2345}$. As expected, $P_-\epsilon_\pm = 0$ implies

$$dV_4 = \pm\,\iota^*\big(\tfrac{1}{2}\, J \wedge J\big), \tag{3.7}$$

where $dV_4$ is the volume form of the part of the fivebrane wrapping the Calabi–Yau four-cycle and $\iota^*(J \wedge J)$ is the pullback of $J \wedge J$ to the fivebrane. From (3.6), (3.7) it follows that for holomorphic (anti-holomorphic) cycles $\mathcal{P}$ only $\epsilon_+$ ($\epsilon_-$) can satisfy $P_-\epsilon = 0$. Thus only one eighth of the supersymmetry is preserved and the resulting σ-model is chiral (4, 0) ((0, 4)).


4. The Massless (0, 4) Supermultiplets

We now turn to the description of the massless degrees of freedom on $S$ arising from small fluctuations around a supersymmetric wrapping of the fivebrane worldvolume on $\mathbb{R} \times S^1 \times \mathcal{P}$. Our method will be to reduce the six-dimensional (2, 0) multiplet along $\mathcal{P}$. Of course, this presupposes some facts about the equations of motion. These will be justified in Sect. 5.

4.1. Reduction of the scalars. The five scalars of the 6D (2, 0) tensormultiplet, denoted $X^a$ (cf. B.2), parametrize the position of the fivebrane in eleven dimensions. When we wrap an M-theory fivebrane on a real four-cycle $\mathcal{P}$ inside a Calabi–Yau manifold, three scalars (call them $X^{8,9,10}$) parametrize the position of the string in the noncompact dimensions, and the remaining two describe the position of the cycle $\mathcal{P}$ inside the Calabi–Yau and should therefore be thought of infinitesimally as sections of the normal bundle. The massless scalars arise from deformations of the position of $W_2 \times \mathcal{P}$ preserving unbroken supersymmetry. The space of deformations of $\mathcal{P}$ as a complex submanifold of $X$ has tangent space

$$T_{\mathcal{P}}|P| = H^0(\mathcal{P}, \mathcal{N}), \tag{4.1}$$

where $\mathcal{N}$ is the normal bundle. We may identify $\mathcal{N} \cong L|_{\mathcal{P}}$. It follows from the index theorem that if $\mathcal{P}$ is ample the complex dimension of the space of deformations of $\mathcal{P}$ is $N = D + \frac{1}{12}\, c_2 \cdot P - 1$.

When we consider small fluctuations of the wrapped fivebrane on a cycle $\mathcal{P}$ we are really considering a family of cycles $\mathcal{P}_\varphi$ near a point $\mathcal{P}_{\varphi=0}$ in the moduli space of deformations of $\mathcal{P}$. These fit into a holomorphic fibration

$$\pi : \mathcal{X} \to \mathcal{U}, \tag{4.2}$$

where $\mathcal{U} \subset |P|_s$ is a neighborhood of $\varphi = 0$. The fibers of this family are diffeomorphic to $\mathcal{P}_{\varphi=0}$, but have variable complex structure. In Appendix C we show that there is an injection

$$0 \to H^0(X, L)/\mathbb{C} \cdot s \cong H^0(\mathcal{P}, L|_{\mathcal{P}}) \to H^{0,1}(T\mathcal{P}), \tag{4.3}$$

where $\mathcal{P}$ is the vanishing locus of the section $s$. By Kodaira–Spencer theory $H^{0,1}(T\mathcal{P})$ is the space of infinitesimal deformations of the complex structure of $\mathcal{P}$ [18]. Thus, a first order deformation of $\varphi$ induces a nonzero deformation of complex structure. This will be important in the next section.

Finally, we write out the Kaluza–Klein ansatz for the scalars describing fluctuations around $\mathcal{P}$. Choosing a basis, $\upsilon_I$, for $H^0(\mathcal{P}, L|_{\mathcal{P}})$ and a coordinate system on $X$ so that $dX^6 + i\,dX^7$ is normal to $\mathcal{P}$ we may expand, to first order in $\varphi^I$,

$$X^6 + iX^7 = \upsilon_I \varphi^I \tag{4.4}$$

to obtain complex two-dimensional massless fields $\varphi^I$, $I = 1, \ldots, N$. These scalars are both left- and right-moving.


4.2. Reduction of the worldvolume two-form. The chiral two-form $\beta$ on $W_6$ reduces to left-moving and right-moving scalars according to the decomposition into self-dual and anti-self-dual parts as in Eqs. (1.9), (1.10) of the introduction. As mentioned in the introduction, $H^2(\mathcal{P})$ carries a polarized Hodge structure of weight two. The Hodge decomposition is simply

$$H^2(\mathcal{P}; \mathbb{Z}) \otimes \mathbb{C} = H^{2,0}(\mathcal{P}) \oplus H^{1,1}(\mathcal{P}) \oplus H^{0,2}(\mathcal{P}) \tag{4.5}$$

and the polarization is given by the Hodge metric.

[Figure: the subspaces $H_+^2$, $H_-^2$ at $\varphi = 0$ and $H_+^2(\varphi)$, $H_-^2(\varphi)$ at a nearby point $\varphi$.]

Fig. 1. As we move in the moduli space of deformations, we may use $C^\infty$ diffeomorphisms to define a flat basis for $H^2(\mathcal{P}; \mathbb{Z})$ in the fibers. However both the Hodge decomposition and the decomposition into self-dual/anti-self-dual parts change

It is important to bear in mind that the decomposition (4.5) and hence the decomposition of $\beta$ into left- and right-moving scalars depends on $\varphi$. As $\varphi$ changes the Kähler metric and complex structure on $\mathcal{P}_\varphi$ vary, as illustrated in Fig. 1. This is the standard geometrical realization of variation of Hodge structures [5]. Since the fibers are all diffeomorphic we can choose a family of $C^\infty$ diffeomorphisms and define a local system (i.e. a flat bundle) with a locally constant basis for $H^2(\mathcal{P}; \mathbb{Z})$. Extending by linearity defines the Gauss–Manin connection on the bundle $R^2\pi_*(\mathbb{C})$ over $\mathcal{U}$ whose fiber at $\mathcal{P}$ is $H^2(\mathcal{P})$. The Gauss–Manin connection allows us to differentiate in the $\varphi^I$ direction, and is crucial in deriving the low-energy (0, 4) Lagrangian. If we choose a smoothly-varying basis $\omega_I$, $I = 1, \ldots, \frac{1}{2}(b_2^+ - 1)$, of harmonic (2, 0) forms on $\mathcal{P}$, and $\omega_{-,a}$, $a = 1, \ldots, b_2^-(\mathcal{P})$, of anti-self-dual (1, 1)-forms on $\mathcal{P}$, then these bases will rotate into one another in accord with Griffiths transversality. That is, if we define the holomorphically varying filtration $H^2(\mathcal{P}) = F^0 \supset F^1 \supset F^2$ by

$$F^0 = H^{2,0} \oplus H^{1,1} \oplus H^{0,2}, \qquad F^1 = H^{2,0} \oplus H^{1,1}, \qquad F^2 = H^{2,0}, \tag{4.6}$$

then

$$\nabla : F^p \to F^{p-1} \otimes \Omega^1(\mathcal{U}). \tag{4.7}$$

Or, in plain English, the connection matrix is upper triangular and lowers $p$ in the decomposition into $(p, q)$ forms by at most one. We can split the Hodge structure into a fixed and a variable part as in (1.11) of the introduction. The Hodge structure $H_f^2$ is fixed (as a function of $\mathcal{P}$) because it is purely of type (1, 1) for all $\mathcal{P}$.

Finally, we write out the Kaluza–Klein ansatz for the chiral two-form $\beta$ as:

$$\beta = \rho^a \omega_{-,a} + 4(\pi^I \omega_I + \text{c.c.}) + u^4 J. \tag{4.8}$$

The two-dimensional complex scalars $\pi^I$ and the real scalar $u^4$ are right-moving whereas the $b_2^-$ real scalars $\rho^a$ are left-moving.

4.3. Reduction of (2, 0) tensormultiplet fermion fields along $\mathcal{P}$. We now describe the Kaluza–Klein ansatz for the fermions in the 6D tensormultiplet on $W_2 \times \mathcal{P}$. We will expand these in terms of harmonic 2-forms and 0-forms on $\mathcal{P}$. The conceptual reason we can do this is the following. The 5-brane breaks local 6D Lorentz symmetry on $W_6$ as

$$Spin(1, 1) \times Spin(4) \hookrightarrow Spin(1, 5). \tag{4.9}$$

So the fermions are in the $\mathbf{4} = (+\frac{1}{2}; \mathbf{2}, \mathbf{1}) \oplus (-\frac{1}{2}; \mathbf{1}, \mathbf{2})$. Moreover, the tensormultiplet theory has a $Spin(5) = USp(4)$ R-symmetry group from local Lorentz rotations in the normal directions (in 11D) to $W_2 \times \mathcal{P}$. The Calabi–Yau background breaks this to $Spin(3) \times Spin(2)$, where $Spin(3)$ are rotations in the noncompact normal directions and $Spin(2)$ is the structure group of the normal bundle $\mathcal{N}$ for $\mathcal{P}$ in $X$. The restriction of the spin bundle on $X$ to $\mathcal{P}$ decomposes as:

$$S^+(TX) \cong S^+(T\mathcal{P}) \otimes K^{1/2} \oplus S^-(T\mathcal{P}) \otimes K^{-1/2} \tag{4.10}$$

but, because $\mathcal{P}$ is Kähler,

$$S^+(T\mathcal{P}) \otimes K^{1/2} \cong \Omega^{0,0}(\mathcal{P}) \oplus \Omega^{2,0}(\mathcal{P}). \tag{4.11}$$

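Hence we expand the zeromodes in terms of 0-forms and 2-forms on $\mathcal{P}$. In order to reduce the supersymmetry transformations we will need to make (4.11) more explicit. Our conventions for six-dimensional supersymmetry are in Appendix B. We choose a basis for six-dimensional $\Gamma$ matrices to be

$$\Gamma^{0,1} = \gamma^{0,1} \otimes \rho^{(4)}, \qquad \Gamma^{2,3,4,5} = 1_2 \otimes \gamma^{2,3,4,5}, \tag{4.12}$$

where

$$\gamma^0 = i\sigma^2; \quad \gamma^1 = \sigma^1; \quad \gamma^{2,3,4} = \begin{pmatrix} 0 & \sigma^{1,2,3} \\ \sigma^{1,2,3} & 0 \end{pmatrix}; \quad \gamma^5 = \begin{pmatrix} 0 & i\,1_2 \\ -i\,1_2 & 0 \end{pmatrix}. \tag{4.13}$$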
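The matrices (4.13) can be checked directly against the Clifford algebra. The following sketch, written in plain Python with hypothetical helper names, verifies that $\gamma^0, \gamma^1$ satisfy $\{\gamma^a, \gamma^b\} = 2\eta^{ab}$ with $\eta = \mathrm{diag}(-1, +1)$ and that $\gamma^{2,3,4,5}$ satisfy the Euclidean relation $\{\gamma^a, \gamma^b\} = 2\delta^{ab}$. The signature convention is an assumption inferred from $(\gamma^0)^2 = (i\sigma^2)^2 = -1$, and the standard Pauli matrices are assumed.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] + ba[i][j] for j in range(n)] for i in range(n)]


def off_diagonal(upper, lower):
    """4x4 block matrix [[0, upper], [lower, 0]] built from 2x2 blocks."""
    return [[0, 0] + upper[0], [0, 0] + upper[1],
            lower[0] + [0, 0], lower[1] + [0, 0]]


def satisfies_clifford(gammas, metric):
    """Check {gamma^a, gamma^b} = 2 * metric[a][b] * identity."""
    n = len(gammas[0])
    for a, ga in enumerate(gammas):
        for b, gb in enumerate(gammas):
            expected = [[2 * metric[a][b] * (i == j) for j in range(n)]
                        for i in range(n)]
            if anticommutator(ga, gb) != expected:
                return False
    return True


# Standard Pauli matrices (assumed convention).
sigma1 = [[0, 1], [1, 0]]
sigma2 = [[0, -1j], [1j, 0]]
sigma3 = [[1, 0], [0, -1]]

# Two-dimensional part of (4.13): gamma^0 = i sigma^2, gamma^1 = sigma^1.
gamma0 = [[1j * x for x in row] for row in sigma2]
gamma1 = sigma1

# Four-dimensional part of (4.13): gamma^{2,3,4} and gamma^5.
i_one = [[1j, 0], [0, 1j]]
minus_i_one = [[-1j, 0], [0, -1j]]
gammas_4d = [off_diagonal(s, s) for s in (sigma1, sigma2, sigma3)]
gammas_4d.append(off_diagonal(i_one, minus_i_one))

minkowski_2d = [[-1, 0], [0, 1]]
euclidean_4d = [[int(a == b) for b in range(4)] for a in range(4)]
print(satisfies_clifford([gamma0, gamma1], minkowski_2d))  # True
print(satisfies_clifford(gammas_4d, euclidean_4d))         # True
```

All entries are $0$, $\pm 1$ or $\pm i$, so the comparison is exact and no numerical tolerance is needed.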


We will decompose the covariantly constant spinor on the Calabi–Yau into two- and four-dimensional parts

$$\xi^{(6)} = \xi^{(2)} \otimes \xi. \tag{4.14}$$

We take $\xi^{(6)}$ to be anti-chiral in order to conform to the chirality of the tensor multiplet. Note that the $Spin(2)_{67}$ and $Spin(4)_{2345}$ spinors $\xi^{(2)}$ and $\xi$ are not covariantly constant but only projectively covariantly constant. That is, they are parallel up to a phase, and the phase cancels between $\xi^{(2)}$ and $\xi$. Again, using (4.9), the 6D tensormultiplet spinors $\psi^{(6)}$ in their turn decompose as

$$\psi_i^{(6)} = \psi_{i-}^I \otimes \Delta_{(i)}^I + \psi_{i-}^0 \otimes \xi_{(i)}. \tag{4.15}$$

Here $i = 1, 2, 3, 4$ is a $USp(4)$ R-symmetry index (see Appendix B) and there is no summation on $i$. Moreover,

$$\Delta_{(i)}^I = \begin{cases} \omega^I_{mn}\gamma^{mn}\xi & \text{for } i = 1, 3 \\ \omega_{I,mn}\gamma^{mn}\xi^* & \text{for } i = 2, 4 \end{cases} \tag{4.16}$$

and

$$\xi_{(i)} = \begin{cases} \xi & \text{for } i = 1, 3 \\ \xi^* & \text{for } i = 2, 4. \end{cases} \tag{4.17}$$

This decomposition is consistent with the symplectic reality condition, which with our conventions reads

$$\psi_{1-} = i\psi_{2-}^\dagger, \qquad \psi_{3-} = -i\psi_{4-}^\dagger. \tag{4.18}$$

Thus the two-dimensional spinors $\psi_{i-}^I$ correspond to two complex spinor degrees of freedom; they carry a subscript minus since only right-moving spinors survive the wrapping as massless degrees of freedom. Note, for use in Sect. 5.2, that since $\gamma_m \xi^{(6)} = 0$ one learns that $\gamma_m \xi = 0$.

4.4. Reduction of the supersymmetry transformations. We now come to the reduction of the 6D supersymmetry transformations. In principle, we should take into account all the complications of kappa supersymmetry and the exact superisometries unbroken by the background determined by $X$ and $\mathcal{P}$. However, as explained in Sect. 4.6 below, it is sufficient for our purposes to consider the reduction of the supersymmetry transformations from flat space.

As a preliminary to the calculation it will prove very useful to relate the basis $\upsilon_I$ of (4.4) (used for the scalars) to the basis $\omega_I$ of holomorphic (2, 0) forms on $\mathcal{P}$. Abstractly this is a consequence of

$$H^0(\mathcal{P}, L|_{\mathcal{P}}) \cong H^0(\mathcal{P}, \Omega_{\mathcal{P}}^2) \cong H^{2,0}(\mathcal{P}), \tag{4.19}$$

which follows from the adjunction formula $(K_X \otimes L)|_{\mathcal{P}} \cong K_{\mathcal{P}}$ and the triviality of the canonical bundle $K_X$ of $X$. More explicitly we have a relation between the basis of the holomorphic two-forms on $\mathcal{P}$, $\omega_{I,mn}$, and $\upsilon_I$ given by:

$$\upsilon_I = \xi^\dagger\, \omega_{I,mn}\gamma^{mn}\, \xi^*; \qquad \bar\upsilon_I = \xi^{\mathrm{Tr}}\, \bar\omega_{I,mn}\gamma_{mn}\, \xi. \tag{4.20}$$

Conversely, one can write $\iota(\upsilon_I)\,\Omega^{(3,0)}|_{\mathcal{P}} = \omega_I^{(2,0)}$, where $\iota$ is a contraction and $\Omega^{(3,0)}$ is a nowhere zero holomorphic three-form on $X$.

Substituting the above expansions into the supersymmetry transformations of the tensormultiplet the supersymmetry transformation of the $\beta$-field yields

$$\begin{aligned} \delta\pi^I &= -2(\epsilon_+^i)^\dagger (1 + iT)_{ij}\, \psi_{j-}^I, \\ \delta u^4 &= 2(\epsilon_+^i)^\dagger T_{ij}\, \psi_{j-}^0, \\ \delta\rho^a &= 0, \end{aligned} \tag{4.21}$$

where $T = \hat\gamma^6\hat\gamma^7$ and $\frac{1}{2}(1 + iT)$ is a projection operator. Similarly, the reduction of the susy transformations for the $X^a$ gives

$$\begin{aligned} \delta\varphi^I &= 2(\epsilon_+^i)^\dagger \big[\hat\gamma^6(1 + iT)\big]_{ij}\, \psi_{j-}^I, \\ \delta X^{8,9,10} &= 2(\epsilon_+^i)^\dagger \hat\gamma_{ij}^{8,9,10}\, \psi_{j-}^0. \end{aligned} \tag{4.22}$$

The second lines in (4.21) and (4.22) can be joined into a single equation

$$\delta u^A = 2(\epsilon_+^i)^\dagger \check\gamma_{ij}^A\, \psi_{j-}^0, \tag{4.23}$$

where $A = 1, \ldots, 4$ and we have defined $\check\gamma^{A=1,2,3} = \hat\gamma^{8,9,10}$ and $\check\gamma^{A=4} = T$, and $u^1 = X^8$, $u^2 = X^9$, $u^3 = X^{10}$. Finally, the transformation laws of the fermions reduce to

$$\delta\psi_{2-}^I = \tfrac{1}{2}\,\partial_-\big(\varphi^I \epsilon_+^4 + \pi^I \epsilon_+^2\big), \qquad \delta\psi_{4-}^I = \tfrac{1}{2}\,\partial_-\big(-\varphi^I \epsilon_+^2 + \pi^I \epsilon_+^4\big), \tag{4.24}$$

$$\delta\psi_{i-}^0 = \tfrac{1}{2}\,\partial_- u^A\, \check\gamma_{ij}^A\, \epsilon_+^j, \tag{4.25}$$

where $\bar\epsilon = -i\epsilon^\dagger$, $\partial_- = -\partial_0 + \partial_1$. In the first line of (4.25) we have found it more convenient to work explicitly with components. Of course the other two transformations ($\delta\psi_{1-}^I$, $\delta\psi_{3-}^I$) are related to the above by the symplectic reality condition (4.18). We will give a considerably more attractive form of this equation in the next section.

4.5. Assembling the multiplets and the Hyper-Kähler structure. We will now summarize the previous sections by describing the (0, 4) supermultiplets in terms of the hyper-Kähler structure. Note first that from the above supersymmetry transformations the four real scalars $u^A$ and the four real component spinor $\psi^0$ transform amongst themselves. Since this multiplet is present no matter what the topology of $X$ or $\mathcal{P}$ is, we refer to it as the "universal" multiplet. It has some nice analogies with the universal hypermultiplet of Calabi–Yau compactification of type II strings.

We now cast the susy transformations in quaternionic form. By letting $w = u^4 - iu^3$; $z = -u^2 + iu^1$ the bosonic coordinate $X$ has the form

$$X = \begin{pmatrix} z & w \\ -\bar w & \bar z \end{pmatrix}. \tag{4.26}$$
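To make the quaternionic form (4.26) concrete, the following sketch builds $X$ from four real numbers $u^A$ and checks two properties: $\det X = \sum_A (u^A)^2$, and closure of matrices of this form under multiplication, which is what allows their products to be treated as quaternions. The barred entries $-\bar w$, $\bar z$ in the second row are assumed, as is standard for the $2 \times 2$ complex representation of quaternions; the function names and the sample values are illustrative.

```python
def quaternion_matrix(z, w):
    """2x2 complex matrix [[z, w], [-conj(w), conj(z)]]."""
    return [[z, w], [-w.conjugate(), z.conjugate()]]


def bosonic_coordinate(u1, u2, u3, u4):
    """X of (4.26) with w = u4 - i u3 and z = -u2 + i u1."""
    return quaternion_matrix(complex(-u2, u1), complex(u4, -u3))


def determinant(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def has_quaternion_form(m):
    """True if m = [[a, b], [-conj(b), conj(a)]] for some complex a, b."""
    return (m[1][0] == -m[0][1].conjugate()
            and m[1][1] == m[0][0].conjugate())


# Sample integer values (illustrative), so the arithmetic is exact.
X = bosonic_coordinate(1, 2, 3, 4)
Y = bosonic_coordinate(2, -1, 0, 5)
print(determinant(X) == 1 + 4 + 9 + 16)    # True
print(has_quaternion_form(matmul(X, Y)))   # True
```

The determinant identity shows that $X$ encodes the flat metric on the four scalars $u^A$.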


R. Minasian, G. Moore, D. Tsimpis

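Moreover if we define the quaternions

$$\Theta = \begin{pmatrix} \eta & \theta\\ -\bar\theta & \bar\eta \end{pmatrix};\qquad
\Psi^0 = \begin{pmatrix} \chi & \psi\\ -\bar\psi & \bar\chi \end{pmatrix}, \tag{4.27}$$

where we have set $\eta = i\epsilon_{2+}$; $\theta = -i\epsilon_{4+}$; $\chi = \psi^0_{1-}$; $\psi = \psi^0_{4-}$, then (4.23) reads

$$\delta X = -4i\,\Theta\cdot\Psi^0. \tag{4.28}$$

The corresponding fermionic transformation (the second line of (4.25)) becomes

$$\delta\Psi^{0\dagger} = -\frac{i}{2}\,\partial_- X\cdot\Theta. \tag{4.29}$$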

These are the supersymmetry transformations of the universal superfield in a manifestly (0, 4) invariant form. Geometrically, the four scalars parametrize $U = \mathbb{R}^3\times S^1$ (the $u^4$ scalar coming from the tensor field is periodic). Similarly, if we define

$$\delta\pi^I := D^{I\bar J}\,\delta\pi_{\bar J} \tag{4.30}$$

and

$$D_{I\bar J} := \int_P \omega_I\wedge\bar\omega_{\bar J}, \tag{4.31}$$

we see from the supersymmetry transformations that the scalars $\pi^I$ and $\phi^I$ mix under $\delta^2$. Proceeding in the same manner we define

$$\delta X^I = \begin{pmatrix} \delta\phi^I & \delta\pi^I\\ -\delta\bar\pi^I & \delta\bar\phi^I \end{pmatrix};\qquad
\Psi^I = \begin{pmatrix} \chi^I & \psi^I\\ -\bar\psi^I & \bar\chi^I \end{pmatrix} \tag{4.32}$$

and

$$\Xi = \begin{pmatrix} \zeta & \lambda\\ -\bar\lambda & \bar\zeta \end{pmatrix}, \tag{4.33}$$

where now $\zeta = \epsilon_{4+}$; $\lambda = \epsilon_{2+}$; $\chi^I = \psi^I_{2-}$; $\psi^I = \psi^I_{4-}$. We see that the transformations $\delta\phi^I$ and $\delta\pi^I$ are joined together, since the first lines of (4.21) and (4.22) can be expressed compactly as

$$\delta X^I = -4i\,\Xi\cdot\Psi^I. \tag{4.34}$$

Similarly the fermionic transformations (the first line of (4.25)) become

$$\delta\Psi^I = \frac12\,\Xi^\dagger\cdot\partial_- X^I. \tag{4.35}$$

And once more the (0, 4) supersymmetry is manifest. It is clear from our construction that the $\pi^I$ play the role of coordinates on the cotangent space. Thus, the target space is locally $T^*|P|$.

Calabi–Yau Black Holes and (0, 4) Sigma Models


4.6. Comparison with standard (0, 4) models. Now that we have determined the field content and (0, 4) multiplets, let us begin to make contact with the general form of the Lagrangian described in Sect. 2. Since the target space is locally $U\times T^*|P|$, there must be $N + 1 = D + \frac{1}{12}c_2\cdot P$ scalar multiplets. The scalars in these multiplets have both left- and right-moving degrees of freedom, whereas the complex scalars $\pi^I$ derived from (4.8) are purely right-moving. On the other hand, the chiral 2-form β also gives $b_2^-$ real left-moving scalars $\rho^a$. Since the signature of P is negative, we must pair $b_2^+$ degrees of freedom from the $\rho^a$ with the right-movers $\pi^I$. The remaining $|\sigma(P)|$ degrees of freedom correspond to the left-moving fermions denoted by λ in Sect. 2. We will discuss this pairing of left- and right-moving degrees of freedom further in Sect. 5 below.

We can also compare with the supersymmetry transformations of Sect. 2. Let $Z^M = (X, \theta)$ be the superembedding coordinates and let $M = (m, \mu)$; $A = (a, \alpha)$ be “curved” and “inertial” eleven-dimensional indices respectively. Lower-case Latin (Greek) letters denote bosonic (fermionic) components. Up to local Lorentz transformations the vielbein transforms under supercoordinate transformations as $\delta_\epsilon E_M{}^A = D_M\epsilon^A$, where $D_M$ is the covariant derivative. We see that the fermionic transformations which preserve a given background are parametrized by covariantly constant spinors $\epsilon$. The fermionic symmetries of the fivebrane are:

$$\delta_\epsilon Z^M = E^M_\alpha\,\epsilon^\alpha;\qquad \delta_\kappa Z^M = (1+\Gamma)^\alpha{}_\beta\,\kappa^\beta E^M_\alpha, \tag{4.36}$$

where Γ is as in Sect. 3. In order to recover the field content of the six-dimensional (2, 0) tensor multiplet we need to do some gauge fixing. Upon reducing the resulting equations to two dimensions we get highly non-linear expressions. Ultimately we want to compare with the supersymmetries of the sigma model presented in Sect. 2 (which is quadratic both in derivatives and in the right-moving fermions). Such a truncation brings us back to the supersymmetry transformations of Appendix B.2 and their reduction, Eqs. (4.34), (4.35). Comparing with (2.10) we can read off the three complex structures encoded in these equations. Let us choose a real basis $\delta\phi^I = \delta\varphi^{1I} + i\delta\varphi^{2I}$; $\delta\pi^I := D^{I\bar J}\delta\pi_{\bar J} = \delta\varphi^{3I} + i\delta\varphi^{4I}$. Similarly $\chi^I = \theta^{1I} + i\theta^{2I}$; $\psi^I = -\theta^{4I} + i\theta^{3I}$ and $\zeta = \epsilon^0 + i\epsilon^1$; $\lambda = \epsilon^2 + i\epsilon^3$. With these definitions (4.34) takes the form

$$\delta\varphi^{iI} = \epsilon^0\,\theta^{iI} + \epsilon^r\,J_r{}^{iI}{}_{jJ}\,\theta^{jJ},\qquad i = 1,\dots,4, \tag{4.37}$$

where

$$J_1 = [\sigma^3\otimes i\sigma^2]^i{}_j\,\delta^I{}_J,\qquad J_2 = [i\sigma^2\otimes 1_2]^i{}_j\,\delta^I{}_J,\qquad J_3 = [\sigma^1\otimes i\sigma^2]^i{}_j\,\delta^I{}_J. \tag{4.38}$$

One can verify that they satisfy $J_rJ_s = -(\delta_{rs} + \varepsilon_{rst}J_t)$.
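The quaternion relations for the three matrices in (4.38) can be checked mechanically. The sketch below is only an illustration: it strips off the identity factor in the index I, builds the 4×4 matrices as plain Kronecker products, and tests that each squares to −1 and that the three close into a quaternion algebra. The sign in front of $\varepsilon_{rst}$ depends on conventions (index placement, or equivalently the orientation of the triple): read naively as Kronecker products the matrices close with $+\varepsilon_{rst}$, while their transposes close with $-\varepsilon_{rst}$, which is the form quoted in the text. The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker product of two square matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]


def scale(c, a):
    """Multiply every entry of a by the number c."""
    return [[c * x for x in row] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


ONE2 = [[1, 0], [0, 1]]
SIGMA1 = [[0, 1], [1, 0]]
SIGMA3 = [[1, 0], [0, -1]]
I_SIGMA2 = [[0, 1], [-1, 0]]  # i*sigma^2 is a real matrix

# The matrices of (4.38) with the delta^I_J factor stripped off.
J = [kron(SIGMA3, I_SIGMA2), kron(I_SIGMA2, ONE2), kron(SIGMA1, I_SIGMA2)]


def orientation(js):
    """Return s = +1 or -1 such that J_r J_s = -delta_rs + s * eps_rst J_t.

    Raises ValueError if the matrices do not form a quaternion algebra.
    """
    n = len(js[0])
    minus_one = scale(-1, [[int(i == j) for j in range(n)] for i in range(n)])
    if any(matmul(j, j) != minus_one for j in js):
        raise ValueError("some J_r does not square to -1")
    cyclic = ((0, 1, 2), (1, 2, 0), (2, 0, 1))
    for sign in (+1, -1):
        if all(matmul(js[r], js[s]) == scale(sign, js[t])
               and matmul(js[s], js[r]) == scale(-sign, js[t])
               for r, s, t in cyclic):
            return sign
    raise ValueError("the matrices do not close into a quaternion algebra")


sign_as_written = orientation(J)
sign_transposed = orientation([transpose(j) for j in J])
```

Either orientation defines a hypercomplex structure; only the labelling of the triple changes.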

5. Local Target-Space Geometry

The low-energy two-dimensional Lagrangian encodes the geometry of the target space of the (0, 4) model. We will derive this Lagrangian by Kaluza–Klein reduction of the chiral fivebrane Lagrangian of [8, 9].


5.1. Bosonic Lagrangian. Let us start by reducing the bosonic part of the five-brane action presented in [9]. The action possesses manifest general coordinate covariance only along five of the six worldvolume dimensions. Here we will take the distinguished direction to be the spatial direction of the two-dimensional world-sheet $W_2$, which is taken to be flat. For our conventions/definitions we refer to Appendix A.2. The action consists of three terms

$$L_1 = -\sqrt{-\det\!\Big(G_{\hat\mu\hat\nu} + \frac{G_{\hat\mu\rho}\,G_{\hat\nu\lambda}\,\tilde H^{\rho\lambda}}{\sqrt{-G_5}}\Big)}, \tag{5.1}$$

$$L_2 = -\frac14\,\tilde H^{\mu\nu}\,\partial_1\beta_{\mu\nu}, \tag{5.2}$$

$$L_3 = \frac18\,\varepsilon_{\mu\nu\rho\kappa\lambda}\,\frac{G^{1\rho}}{G^{11}}\,\tilde H^{\mu\nu}\tilde H^{\kappa\lambda}. \tag{5.3}$$

We will use the “static gauge” $X^{\hat\mu} = \sigma^{\hat\mu}$, in which the above expressions simplify and we recover the field content of the (2, 0) six-dimensional multiplet discussed in Appendix B (note however that here we are using a gauge in which the antisymmetric tensor is effectively five-dimensional in the sense that $\beta_{1\hat\mu} = 0$). Moreover we will keep only terms at most quadratic in $\partial X$ and/or $H$. As explained before, when reducing to $W_2\times P$ the only nonvanishing components of the field β are along P. Hence (5.1)–(5.3) read

$$L_1 = \frac12\,g_{ab}\,\partial_{\hat\mu}X^a\partial^{\hat\mu}X^b - \frac14\,\tilde H^{\mu\nu}\tilde H_{\mu\nu}\,\big(1 + g_{ab}\,\partial_\rho X^a\partial^\rho X^b\big) - \frac12\,\tilde H^{\mu\kappa}\tilde H^{\nu}{}_{\kappa}\,g_{ab}\,\partial_\mu X^a\partial_\nu X^b, \tag{5.4}$$

$$L_2 = -\frac18\,\varepsilon^{ABCD}\,\partial_0\beta_{AB}\,\partial_1\beta_{CD}, \tag{5.5}$$

$$L_3 = \frac14\,\tilde H^{\mu\nu}H_{\rho\mu\nu}\,g_{ab}\,\partial^\rho X^a\partial_1 X^b, \tag{5.6}$$

where $g_{ab}$ is the metric on the space transverse to the fivebrane. Keeping only up to two-derivative terms and dropping terms with derivatives along P (which are suppressed by the size of P), the above expressions simplify further:

$$S = \int_{W_6} dV\,\Big[\frac12\big(-\partial_0X^a\partial_0X^b + \partial_1X^a\partial_1X^b\big)g_{ab} + \frac14\,g^{AC}g^{BD}\,\partial_0\beta_{AB}\,\partial_0\beta_{CD} - \frac18\,\varepsilon^{ABCD}\,\partial_0\beta_{AB}\,\partial_1\beta_{CD}\Big]. \tag{5.7}$$

Reducing the kinetic term of the scalars in (5.7) using (4.4) we see that it gives rise to a term

$$\int_{W_2} d^2\sigma\;\partial_{(+}\phi^I\,\partial_{-)}\bar\phi^{\bar J}\,G_{I\bar J}, \tag{5.8}$$

where $G_{I\bar J} = \int_P dV\,G(L)\,\upsilon_I\bar\upsilon_{\bar J}$, and $G(L)$ is the hermitian metric on the line bundle $L|_P = N(P\hookrightarrow X)$ (in complex notation, it has only one component). By using a Fierz identity (see B.1) we can establish the metric on the space of ϕ’s in terms of the intersection matrix

$$D_{I\bar J} = \int_P \omega_I\wedge\bar\omega_{\bar J}. \tag{5.9}$$


As for the chiral two-form, the reduction of (5.7) gives (omitting the universal superfield)

$$S_0 = \int_{W_2} d^2\sigma\,\big(\partial_{(0}\pi_I\,\partial_{+)}\bar\pi_{\bar J}\,D^{I\bar J} - \partial_0\rho^a\,\partial_-\rho^b\,D_{ab}\big). \tag{5.10}$$

5.2. Fermionic Lagrangian. Equations (5.8), (5.9) and (5.10) contain some of the essential information we need to extract the geometric data for the (0, 4) Lagrangian. However, to extract all the data we must consider the quadratic terms in fermions. Therefore, we look at the terms quadratic in the right-moving fermions containing exactly one derivative along $W_2$. These come from the reduction to $W_2$ of the quadratic Lagrangian for fermions:

$$\int_{W_2}\int_P dV\,\Big[\frac12\,g^{AC}g^{BD}\,\partial_0\beta_{AB}\,\bar\theta\,\Gamma^0\Gamma_C D_D\theta + \frac14\,\epsilon^{ABCD}\,\partial_0\beta_{AB}\,\bar\theta\,\Gamma^1\Gamma_C D_D\theta - \bar\theta\,\Gamma^\alpha D_\alpha\theta + \frac{1}{4!}\,\epsilon^{\alpha\beta}\epsilon^{ABCD}\,\bar\theta\,\Gamma_{ABCD}\Gamma_\alpha D_\beta\theta\Big], \tag{5.11}$$

where θ is the eleven-dimensional superspace coordinate (the superpartner of the embedding coordinates $X^M(\sigma)$), the Γ’s are eleven-dimensional gamma-matrices and D is the pullback of the spin connection from the ambient space to the fivebrane worldvolume $W_6$. This piece of the action is obtained by gauge fixing the κ-symmetric action of [17] in a general curved background, keeping only the terms quadratic in θ which involve exactly one derivative along $W_2$, and discarding $O((X^a)^2)$ terms. The last term comes from the coupling of the six-form potential to the fivebrane worldvolume.

In reducing (5.11) we can make use of the κ-symmetry to eliminate the unphysical degrees of freedom of θ and express it in terms of six-dimensional spinors $\psi^I_-\otimes\Delta^I$, $\chi^I_-\otimes(\Delta^*)^I$. Here $\psi^I_-$, $\chi^I_-$ are two-dimensional Weyl spinors and the $\Delta^I$ are four-dimensional Weyl spinors (see Sect. 3.2 and Appendix B). The six-dimensional Γ-matrices decompose as in Sect. 3.2. Suppressing internal (along P) covariant derivatives on the spinors, the reduced action can be cast in the form (the universal superfield is not included)

$$\int_{W_2} d^2\sigma\;\partial_+\phi^K\,\big[(\psi^J_-)^\dagger\psi^I_-\,R_K{}^{JI} + (\chi^J_-)^\dagger\chi^I_-\,R_{KJI}\big] + \text{c.c.}, \tag{5.12}$$

where we use the same conventions for the two-dimensional fermions as in Sect. 3 and we have defined

$$R_K{}^{JI} = \int_P dV\,(\Delta^J)^\dagger\,\nabla_K\Delta^I;\qquad R_{KJI} = \int_P dV\,(\Delta^J)^{tr}\,\nabla_K(\Delta^*)^I, \tag{5.13}$$

where $\nabla_I$ is the covariant derivative discussed in Sect. 4.2. We will now analyze the meaning of the “coupling” of (5.13) to extract the target space geometry. As in Sect. 4.1 we consider a family of surfaces $P_\phi$ near $P_{\phi=0}$. With respect to our basis of holomorphic two-forms ∇ acts in the following way:

$$\nabla_J\,\omega_I = [\Gamma_J]^x{}_I\,\omega_x, \tag{5.14}$$


where $x\in\{I, \bar I, a\}$ and $[\Gamma_J]$ is the ϕ-dependent Gauss–Manin connection matrix. Therefore

$$\partial_K D_{I\bar J} = \int_P \nabla_K\omega_I\wedge\bar\omega_{\bar J} = \int_P [\Gamma_K]^L{}_I\,\omega_L\wedge\bar\omega_{\bar J}, \tag{5.15}$$

where we have defined $\partial_I = \partial/\partial\phi^I$. On the other hand $\Delta^I = \omega_I(\gamma^{(2)})\,\xi$, with $\gamma^{(2)} = \gamma^A\gamma^B\,\frac{\partial}{\partial x^A}\otimes\frac{\partial}{\partial x^B}$, and it is easy to see that

$$R_{K\bar JI} = \int_P dV\,\xi^\dagger\,\bar\omega_{\bar J}(\gamma^{(2)})\,\nabla_K\omega_I(\gamma^{(2)})\,\xi = \int_P dV\,\xi^\dagger\,\gamma^{\bar m\bar n}\,\bar\omega_{\bar J\bar m\bar n}\,[\Gamma_K]^x{}_I\,\omega_{xAB}\,\gamma^{AB}\,\xi = \partial_K D_{I\bar J}. \tag{5.16}$$

For the last step we have used (5.15), the fact that $\gamma_m\xi = 0$ (as in 4.3) and a little bit of gamma-matrix algebra. We conclude that $R_{K\bar JI}$ is just the Christoffel symbol of the manifold $|P|$ with Kähler metric (5.9).

5.3. Comparison to the standard (0, 4) Lagrangian. Let us now assemble the data we have gathered and compare to the standard Lagrangian spelled out in Sect. 2. The part of (the bosonized version of) the (0, 4) action (2.5) containing all one-derivative terms quadratic in fermions is

$$\psi^i_-\,\psi^j_-\,\big[F_{ij\hat a}\,\partial_+\rho^{\hat a} + \Upsilon^{(+)}_{ijk}\,\partial_+\varphi^k\big], \tag{5.17}$$

where $\rho^{\hat a}$, $\hat a = 1,\dots,b_2^- - b_2^+$, is the set of purely left-moving scalar fields. Comparison to (5.12) implies that the b-field and the gauge connection of the vector bundle over the target space are flat, and that the metric on the target space is

$$ds^2 = D(\phi,\bar\phi)_{I\bar J}\,d\phi^I\,d\bar\phi^{\bar J} + D(\phi,\bar\phi)^{I\bar J}\,(\nabla\pi)_I\,(\nabla\bar\pi)_{\bar J}, \tag{5.18}$$

where $D_{I\bar J}$ is the intersection pairing defined in (5.9). Since the connection on $T\tilde M$ has no torsion, supersymmetry requires that the metric (5.18) is hyper-Kähler, and indeed gives a way to prove the hyper-Kähler property of the metric. Using the supersymmetry transformations (4.37) and the fact that (0, 4) symmetry is unbroken, it follows that the 3 complex structures in (4.38) are covariantly constant.⁵ It is interesting to compare the metric with the c-map construction [7]. The metric there reads

$$ds^2 = D_{I\bar J}\,d\phi^I\otimes d\bar\phi^{\bar J} + D^{I\bar J}\,(\nabla\pi)_I\otimes(\nabla\bar\pi)_{\bar J} \tag{5.19}$$

(note that in this case DI J = DI J ). One has three closed two-forms $\omega^1, \omega^2, \omega^3$, where $\omega^1$ is the associated (1, 1) form and $\omega^{2,3}$ are the real and imaginary parts of $d\phi^I\wedge(\nabla\pi)_I$. Setting $d\phi^I = d\varphi^{1I} + i\,d\varphi^{2I}$; $(\nabla\pi)^I := D^{I\bar J}(\nabla\pi)_{\bar J} = d\varphi^{3I} + i\,d\varphi^{4I}$, the metric takes the form $ds^2 = G_{iI,jJ}\,d\varphi^{iI}\otimes d\varphi^{jJ}$, where $G_{iI,jJ} = D_{IJ}\,\delta_{ij}$. The three two-forms $\omega^r$ can be used to construct three complex structures $J_r{}^{iI}{}_{jJ} := G^{iI,kK}\,\omega^r_{kK,jJ}$, which in components read: $J_1 = [\sigma^3\otimes i\sigma^2]^i{}_j\,\delta^I{}_J$, $J_2 = [i\sigma^2\otimes 1_2]^i{}_j\,\delta^I{}_J$, $J_3 = [\sigma^1\otimes i\sigma^2]^i{}_j\,\delta^I{}_J$. These are exactly the same as in (4.38).

⁵ It would be desirable to have a more direct, and more standard, proof of this fact. This is being investigated in [19].


6. Comments on the Global Structure of the Target Space

6.1. Narain theory in the entropic factor. We now consider the periodicities of β, needed to determine the global structure of the fibers of p in (1.13). As we have stressed in the introduction, we expect the target space to be compact. Therefore, while the local target manifold is $\mathbb{R}^4\times T^*|P|$, the fibers should be compactified, maintaining the hyper-Kähler property. The most natural (perhaps the only) way to do this is to take a quotient by a lattice in the fiber of $T^*|P|$, so that the fibers of $\tilde M$ are complex tori.

Passing to a real basis $\{\omega_{\hat I},\ \hat I = 1,\dots,b_2^+\}$ of self-dual two-forms on P we can expand

$$\beta = \pi^{\hat I}\omega_{\hat I} + \rho^a\omega_a \tag{6.1}$$

and reexpress (5.10) as

$$S_0 = \int_{W_2} d^2\sigma\,\big(\partial_0\pi^{\hat I}\,\partial_+\pi^{\hat J}\,D_{\hat I\hat J} - \partial_0\rho^a\,\partial_-\rho^b\,D_{ab}\big), \tag{6.2}$$

where $D_{\hat I\hat J} = \int_P \omega_{\hat I}\wedge\omega_{\hat J}$. Thus the metric is diagonal on left- and right-movers. However, the information of how to “combine” the zero-modes of the left- and right-moving bosons to define the state space of the full conformal field theory is not contained in (6.2). This has to be imposed ad hoc. Our ansatz is that the Lagrangian in (6.2) is a Narain σ-model with non-trivial Narain data: a constant metric, a constant torsion, and Wilson lines. Thus, we take the periodicities

$$\beta \to \beta + n^x U_x,\qquad n^x\in\mathbb{Z}, \tag{6.3}$$

where we have introduced a basis $\{U_x;\ x = 1,\dots,b_2\}$ of $H^2(P,\mathbb{Z})$. The data for the zero-modes of the scalars are encoded in the projections onto the definite-signature subspaces:

$$P:\ H^2(P;\mathbb{Z})\otimes\mathbb{R} \to H^{2-}(P;\mathbb{R}) \perp H^{2+}(P;\mathbb{R}). \tag{6.4}$$

In particular, the left- and right-moving momenta are just $p = (F^a_x\,n^x e_a;\ f^{\hat I}_x\,n^x e_{\hat I})$, where $F^a_x = \int_P \omega_a\wedge U_x$; $f^{\hat I}_x = \int_P \omega_{\hat I}\wedge U_x$, and $e_a$ is the vielbein for the metric $D_{ab}$ and $e_{\hat I}$ is the vielbein for $D_{\hat I\hat J}$.

6.2. Charge violation by instantons: The “MSW effect”. The Narain model of the previous section is somewhat peculiar because the conserved U(1) charges coupling to the string are in the lattice $H^2(X;\mathbb{Z})$, which is a (small!) sublattice of $H^2(P;\mathbb{Z})$. This puzzle was resolved in [2] as follows.⁶ The charges $H^2(P;\mathbb{Z})$ are conserved in the (0, 4) sigma model studied in this paper, but they are violated by membrane instanton processes in the full M-theory. As mentioned in [2], if a state in the (0, 4) CFT is charged under an element in $H^2(P;\mathbb{Z})$ which is not in $H^2(X;\mathbb{Z})$ it will decay to a state charged in $H^2(X;\mathbb{Z})$. Indeed, since the map (1.12) is injective the dual map

$$H_2(P;\mathbb{Z}) \xrightarrow{\ \iota_*\ } H_2(X;\mathbb{Z}) \to 0 \tag{6.5}$$

⁶ We thank J. Maldacena and E. Witten for important clarifying explanations about this process.


is surjective, and hence has a large kernel. Elements of the kernel are nontrivial surfaces $[\Sigma]\in H_2(P;\mathbb{Z})$ which bound a 3-ball in X, $\Sigma = \partial B$. It is possible to have a membrane instanton whose worldvolume is B because the equation

$$dH = -Q(M2)\,\delta(\Sigma\hookrightarrow W_6), \tag{6.6}$$

where $H = d\beta$ and $Q(M2)$ is the membrane charge, allows membranes to end on fivebranes [20]. Since this process uses M-theory instantons, it will only be important near degenerations of P.

One interesting question raised by this “MSW effect” is whether states on the 5-brane can carry torsion charges. The kernel of $\iota_*$ is a sublattice of $H_2(P;\mathbb{Z})$. We claim that under Poincaré duality $PD: H_2(P;\mathbb{Z})\to H^2(P;\mathbb{Z})$ we have $PD(\ker\iota_*) = \iota^*(H^2(X;\mathbb{Z}))^\perp$, where the orthogonal complement is in the Hodge metric of P. To prove this note that if $[\Sigma]\in\ker\iota_*$, then its Poincaré dual form $\eta_\Sigma\in H^2(P;\mathbb{Z})$ satisfies

$$\int_P \eta_\Sigma\wedge\iota^*(\theta) = \int_{\iota_*(\Sigma)}\theta \tag{6.7}$$

for all $\theta\in H^2(X;\mathbb{Z})$. Since $\iota^*(H^2(X;\mathbb{Z}))$ is not unimodular while $H^2(P;\mathbb{Z})$ is unimodular, the sublattice $\iota^*(H^2(X;\mathbb{Z}))\oplus\iota^*(H^2(X;\mathbb{Z}))^\perp$ will have finite index in $H^2(P;\mathbb{Z})$. The quotient group is a (large) group of potential torsion charges. We say “potential” because we do not fully understand the model globally on $|P|$. It would be interesting to understand how the above torsion charges can be understood in the framework of the K-theory interpretation of D-brane charges [21, 22].
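The finite-index statement can be illustrated on a toy lattice. The sketch below is not the lattice $H^2(P;\mathbb{Z})$ of the text; it assumes the simplest unimodular example, the hyperbolic plane $\mathbb{Z}^2$ with form $\begin{pmatrix}0&1\\1&0\end{pmatrix}$, and takes the rank-one sublattice generated by $v = (1,1)$, which has $v\cdot v = 2$ and is therefore not unimodular. Its orthogonal complement is generated by $w = (1,-1)$, and the direct sum of the two has index 2 in $\mathbb{Z}^2$, so the quotient is $\mathbb{Z}/2$. All names are illustrative.

```python
def det(m):
    """Integer determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def pairing(x, y, form):
    """Bilinear pairing x . y with respect to the Gram matrix `form`."""
    return sum(x[i] * form[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


FORM = [[0, 1], [1, 0]]   # hyperbolic plane: unimodular, |det| = 1
v = [1, 1]                # generates a sublattice with v.v = 2 (not unimodular)
w = [1, -1]               # generates the orthogonal complement of v

# The index of the sublattice spanned by v and w inside Z^2 is |det| of the
# matrix whose rows are the generators.
index = abs(det([v, w]))
gram_of_sum = [[pairing(a, b, FORM) for b in (v, w)] for a in (v, w)]
```

Consistently, the Gram determinant of the direct sum, $2\cdot(-2) = -4$, equals the square of the index times the determinant of the ambient form.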

6.3. Narain data for the universal factor. It is possible to be much more explicit about the Lagrangian for the universal multiplet. Just as for the rest of the fields, its action follows from the reduction of (5.7) and (5.11). Let $\{J, \theta_\Lambda;\ \Lambda = 1,\dots,h^{1,1}(X)-1\}$ be a basis of $H^{1,1}(X,\mathbb{R})$ such that the $\theta_\Lambda$ restrict to a basis of anti-self-dual forms on P. Moreover, let $\{Y_w;\ w = 0,\dots,h^{1,1}(X)-1\}$ be a basis of $H^{1,1}(X,\mathbb{Z})$. The part $\beta_u$ of the chiral 2-form contributing to the “universal” multiplet is expanded as

$$\beta_u = u^4 J + \rho^\Lambda\theta_\Lambda \tag{6.8}$$

with the periodicities

$$\beta_u \to \beta_u + n^w Y_w;\qquad n^w\in\mathbb{Z}. \tag{6.9}$$

The universal multiplet is governed by the action

$$S_{un} = \int_{W_2} d^2\sigma\,\big\{D_{00}\,\partial_0u^4\,\partial_+u^4 - D_{\Lambda\Lambda'}\,\partial_0\rho^\Lambda\,\partial_-\rho^{\Lambda'}\big\}, \tag{6.10}$$

where $D_{\Lambda\Lambda'} := \int_P\theta_\Lambda\wedge\theta_{\Lambda'} = \int_X P\wedge\theta_\Lambda\wedge\theta_{\Lambda'}$; $D_{00} := \int_P J\wedge J = 2\,\mathrm{Vol}(P)$. Repeating the analysis of Sect. 6.1 we see that the left/right-moving momenta are given by

$$p = (F^\Lambda_w\,n^w e_\Lambda;\ f^0_w\,n^w e_0), \tag{6.11}$$

where $F^\Lambda_w := \int_P\theta_\Lambda\wedge Y_w$; $f^0_w := \int_P J\wedge Y_w$ are the projections

$$P:\ H^2(X,\mathbb{Z})\otimes\mathbb{R} \to H^{2+}(X,\mathbb{R})\oplus H^{2-}(X,\mathbb{R}), \tag{6.12}$$

and $e_\Lambda$ ($e_0$) is a vielbein for the metric $D_{\Lambda\Lambda'}$ ($D_{00}$). The self-dual (right-moving) piece is generated by J. The radius R of the $S^1$ in the target space is given by $R^2 = \frac{1}{2\pi^2}\mathrm{Vol}(P)$ when $h^{1,1}(X) = 1$, and by more complicated formulae in general.

6.4. Effects of the M-theory 3-form. Finally, let us comment on two effects that happen when we turn on the $C_3$ field of the eleven-dimensional supergravity. First, in the Kaluza–Klein reduction of M-theory on X the field $C_3$ gives rise to $h^{1,1}(X)$ five-dimensional vectors (together with KK modes from the metric these form the gravity multiplet and $h^{1,1}(X)-1$ vector multiplets). The coupling of $C_3$ to the fivebrane worldvolume induces string couplings to the background gauge fields

$$\int_{W_2} d^2\sigma\,\big\{A^\Lambda_+\,\partial_-\rho^{\Lambda'}\,D_{\Lambda\Lambda'} + A^0_-\,\partial_+u^4\,D_{00}\big\}, \tag{6.13}$$

where the $A^\Lambda$ are the abelian vector fields and $A^0$ is the graviphoton. Such couplings are also important for cancellation of anomalies in the gauge transformations in the presence of the string [23]. Since the projections in (6.12) already encode Narain data, including the flat connection on the gauge bundle, we see that turning on $C_3$ just shifts the gauge fields.

Second, the 5-brane action consists of a Dirac part and a WZ part. In the Dirac part the field strength of the chiral 2-form enters through $H = d\beta - C_3$. In the Kaluza–Klein reduction this leads to shifts of the periodicities of the chiral scalars, for example $\partial_+\pi^{\hat I}\to\partial_+\pi^{\hat I} + C^{\hat I}$. If $C_3 = dX^1\wedge\theta$, with $\theta\in H^2(X;\mathbb{R})$, then the Narain vectors are shifted by $p\to p+\theta$, leading to a shift in $L_0 - \tilde L_0$. If we consider the corresponding IIA picture this is in accord with the Witten-effect shifting of the D0 charge:

$$\Delta(L_0 - \tilde L_0) = \int_P\Big(p\wedge\theta + \frac12\,\theta\wedge\theta\Big), \tag{6.14}$$

where we have identified p with the first Chern class of the Chan–Paton bundle on the D4 brane.

7. Conclusion: 5 Problems on 5 Branes

First and foremost it would be good to extend the discussion in this paper to understand the global geometry on $|P|$. This consists of at least two important sub-problems. First, we have restricted to an open neighborhood in $|P|_s$. It would be interesting to take into account the effects of monodromy. Second, the 4-cycle P will degenerate on a codimension-one discriminant locus $D = |P| - |P|_s$ of the linear system. The generic singularity will be a rational double point. Many interesting and important questions depend crucially on understanding what happens to the (0, 4) model when the fivebrane degenerates. In [24] a drastic degeneration with D points of self-intersection was successfully used to count black hole entropy at leading order in large charges.
Second, as mentioned in the introduction, one of the original motivations for this work was to find a state-counting formula for BPS states in M-theory compactifications which are macroscopically 4d black holes with 8 supersymmetries. We believe that combining the elliptic genus of (0, 4) models studied in [25] with the results of this paper one can derive formulae for the BPS degeneracies. This idea is currently under investigation.

Third, it would be nice to clarify the status of the above model as a CFT. Since the σ-model described in this paper is rather elaborate, it would be nice to have a clear understanding of whether the entropic factor is, in fact, a conformal field theory (and if not, what it flows to). Moreover, it might be useful to find a linear sigma model which renormalizes to the above nonlinear model. This would be possible if the metric on $T^*|P|$ were given by a hyper-Kähler quotient. Thus, an interesting question raised by this work is whether there is a sense in which the metric on $T^*|P|$ induced by the Calabi–Yau metric becomes the hyper-Kähler quotient metric in the limit of large P.

Fourth, it would be nice to extend the discussion to fivebranes with even less supersymmetry, leaving a (0, 2) string. Such configurations would appear if the M5 worldvolume is near a boundary, as in the Hořava–Witten picture. At a formal level, much of the above discussion generalizes to the (0, 2) case. However, quantum corrections are expected to be much more important here.

Fifth, if the M-theory compactification has a heterotic dual then there must be a description of the same strings in the heterotic picture. Indeed, in the case $X = K3\times T^2$ with $P = K3$ one reproduces the heterotic string itself [26, 27]. However, in the case of P defined by a class P with P large there will be a large number of left- and right-moving degrees of freedom. Because of the MSW effect it is not obvious that these charges should really be visible. We think this is worth understanding better.

Acknowledgements. GM would like to thank J. Maldacena and E. Witten for several important discussions on this subject. We would also like to thank P. Deligne, D. Freed, D. Morrison, T. Pantev, A. Todorov, and G. Zuckerman for discussions. RM acknowledges the hospitality of the Erwin Schrödinger Institute, Ecole Polytechnique and LTPHE, Paris VI–VII. GM thanks the Institute for Advanced Study for hospitality and the Monell Foundation for support during the completion of this paper. This work is also supported by DOE grant DE-FG02-92ER40704.

Appendix A. List of Some Notation

A.1. General notation.

$A_{[1\cdots k]} = A_{1\cdots k}$ for a k-form A
α = 0, 1: the directions along the string world-sheet
a = 6, …, 10: the directions transverse to $W_6$ ($a = 1,\dots,b_2^-$ also enumerates the basis vectors of $H^{2-}(P)$)
$\hat a = 1,\dots,b_2^- - b_2^+$
A, B = 2, …, 5: the (real) directions along P
β: the chiral 2-form of the 5-brane 6D tensor multiplet
$\gamma^\mu$, $\Gamma^M$, $\hat\gamma^a$: gamma matrices defined in Sects. 3, 4.3 and B.1
$\check\gamma^1$ – $\check\gamma^4$: matrices defined in Sect. 4.4 (below (4.23))
$D_{I\bar J} \equiv \int_P\omega_I\wedge\bar\omega_{\bar J}$
$D_{ab} \equiv \int_P\omega_a\wedge\omega_b$
$H^{1,1}(X)^\perp$: the subspace of $H^{1,1}(X)$ “orthogonal” to J, see Sect. 4.1
$H^{2\pm}(P)$: the spaces of self-dual, anti-self-dual 2-forms on P
$\theta_\Lambda$, $\Lambda = 1,\dots,h^{1,1}(X)-1$: a basis of $H^{1,1}(X)^\perp$
i = 1, …, 4: a USp(4) index, except in Sect. 2
$I = 1,\dots,\frac12(b_2^+ - 1)$, except in Sect. 2
$\hat I = 1,\dots,b_2^+$
J: the Kähler form on P and on X
L: the holomorphic line bundle over X, associated to the divisor P
M = 0, …, 10: the spacetime index
μ, ν = 0, …, 5: the directions along $W_6$
$m, \bar m = 1, 2$: the (complex) directions along P
N: the number of (0, 4) multiplets, defined in (1.3)
$\omega^-_a$, $a = 1,\dots,b_2^-$: a basis of $H^{2-}(P)$
$\omega_I$ ($\bar\omega_{\bar I}$), $I = 1,\dots,\frac12(b_2^+ - 1)$: a basis of $H^{(2,0)}(P)$ ($H^{(0,2)}(P)$)
$\omega_{\hat I}$, $\hat I = 1,\dots,b_2^+$: a basis of $H^{2+}(P,\mathbb{R})$
P: a generic smooth holomorphic surface inside X
$|P|_s$: the locus of smooth divisors in the linear system $|P|$
P: the cohomology class in $H^2(X;\mathbb{Z})$ dual to the 4-cycle P
$\sigma^0$ – $\sigma^5$: the coordinates on $W_6$
$U_x$, $x = 1,\dots,b_2(P)$: a basis of $H^2(P,\mathbb{Z})$
$Y_w$, $w = 1,\dots,h^{1,1}(X)$: a basis of $H^{1,1}(X,\mathbb{Z})$
X: a Calabi–Yau 3-fold, used for compactifying M-theory
$X^M(\sigma)$: the embedding of $W_6$ into the eleven-dimensional spacetime
$\xi^{(6)}$: the covariantly constant spinor of the Calabi–Yau
ξ: the component of $\xi^{(6)}$ along P (in a local decomposition)

A.2. Conventions for Sect. 5.1.

$\hat\mu, \hat\nu = 0,\dots,5$: the directions along $W_6$
μ, ν = 0, 2, …, 5: omitting the “distinguished” direction
$\sigma^{\hat\mu}$: the coordinates on $W_6$
$G_{\hat\mu\hat\nu} = \eta_{MN}\,\partial_{\hat\mu}X^M\,\partial_{\hat\nu}X^N$
$G_5 = \det(G_{\mu\nu})$
$H_{\mu\nu\rho} = 3\,\partial_{[\mu}\beta_{\nu\rho]}$
$\tilde H^{\mu\nu} = \frac16\,\varepsilon^{\mu\nu\rho\kappa\lambda}H_{\rho\kappa\lambda}$

Appendix B. (2, 0) Tensor Multiplet

B.1. The conventions. In this section we work in six-dimensional Minkowski space. The R-symmetry group for the theory with sixteen real supercharges is SO(5). Let a = 1, …, 5 (the index a is SO(5) Euclidean) and μ = 0, 1, …, 5. A basis of gamma-matrices $\hat\gamma^a$ in five-dimensional Euclidean space can be constructed as follows:

$$\hat\gamma^{6,7,8} = \begin{pmatrix} 0 & \sigma^{1,2,3}\\ \sigma^{1,2,3} & 0 \end{pmatrix};\qquad
\hat\gamma^9 = \begin{pmatrix} 0 & i1_2\\ -i1_2 & 0 \end{pmatrix};\qquad
\hat\gamma^{10} = \begin{pmatrix} -1_2 & 0\\ 0 & 1_2 \end{pmatrix}. \tag{B.1}$$

In checking the susy transformations of the (2, 0) multiplet of B.2 it is more convenient to work in a slightly different basis than the one we used in 4.3 for the gamma-matrices $\Gamma^\mu$ in six-dimensional Minkowski space:

$$\Gamma^\mu = \begin{pmatrix} 0 & \gamma^\mu\\ \tilde\gamma^\mu & 0 \end{pmatrix}, \tag{B.2}$$


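where

$$\begin{aligned}
&\gamma^0 = \begin{pmatrix} 0 & 1_2\\ -1_2 & 0 \end{pmatrix};\qquad
\gamma^{1,2,3} = \begin{pmatrix} 0 & \sigma^{1,2,3}\\ \sigma^{1,2,3} & 0 \end{pmatrix};\qquad
\gamma^4 = \begin{pmatrix} 1_2 & 0\\ 0 & -1_2 \end{pmatrix};\\
&\tilde\gamma^{0-4} = \gamma^{0-4};\qquad \tilde\gamma^5 = -\gamma^5 = -i1_4.
\end{aligned} \tag{B.3}$$

In this basis the charge-conjugation and the chirality matrices are

$$C^{(6)} = \begin{pmatrix} 0 & c\\ -c & 0 \end{pmatrix};\qquad
\rho^{(6)} = i^2\,\Gamma^1\cdots\Gamma^5\,\Gamma^0 = \begin{pmatrix} 1_4 & 0\\ 0 & -1_4 \end{pmatrix}, \tag{B.4}$$

where

$$c = \begin{pmatrix} \epsilon & 0\\ 0 & -\epsilon \end{pmatrix} \tag{B.5}$$

with ε given by

$$\epsilon^{AB} = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix} = i\sigma^2;\qquad \epsilon_{AB} = -\epsilon^{AB}. \tag{B.6}$$

The real, antisymmetric tensor Ω of USp(4) obeys

$$\Omega = -\Omega^{-1};\qquad \Omega\,(\hat\gamma_a)^{Tr} = \hat\gamma^a\,\Omega. \tag{B.7}$$

We can write Ω in this basis explicitly:

$$\Omega = \begin{pmatrix} -\epsilon & 0\\ 0 & \epsilon \end{pmatrix}. \tag{B.8}$$

For an (anti)chiral Spin(1,5) spinor $\theta_i$, i = 1, …, 4, transforming in the 4 of USp(4) (i is a USp(4) index), the symplectic-reality condition reads

$$\theta_i = \Omega_{ij}\,c\,\bar\theta^{j\,Tr};\qquad \bar\theta^i = -\Omega^{ij}\,\theta_j^{Tr}\,c, \tag{B.9}$$

where $\bar\theta = \theta^\dagger\gamma^0$.

B.2. 6D supersymmetry. The 6D (2, 0) multiplet consists of a self-dual (on-shell) antisymmetric two-form $\beta_{\mu\nu}$, four six-dimensional Weyl spinors $\{\psi_i,\ i = 1,\dots,4\}$ obeying the symplectic reality condition (B.9), and five scalars $\{X^a,\ a = 6,\dots,10\}$. In other words, under the little group Spin(4) × USp(4), $\beta_{\mu\nu}$, ψ, $X^a$ transform in the ((3, 1); 1), (4; 4), ((1, 1); 5) respectively. After the elimination of the auxiliary field introduced in the covariant formulation of the fivebrane of [17], supersymmetry closes on-shell. The susy transformations are (suppressing the USp(4) index on the fermions):

$$\begin{aligned}
\delta X^a &= -2\,\bar\epsilon\,\hat\gamma^a\psi,\\
\delta\psi &= \Big(\frac12\,\gamma^\mu\partial_\mu X^a\,\hat\gamma_a + \frac18\,\gamma^{\mu\nu\rho}H_{\mu\nu\rho}\Big)\epsilon,\\
\delta\beta_{\mu\nu} &= -2\,\bar\epsilon\,\gamma_{\mu\nu}\psi,
\end{aligned} \tag{B.10}$$

where $H_{\mu\nu\rho} = \partial_{[\mu}\beta_{\nu\rho]}$. Using these equations one checks that the algebra closes on shell.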
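The basis of five-dimensional Euclidean gamma-matrices in (B.1) can be checked directly. The following sketch assumes the block form read off from (B.1), namely off-diagonal Pauli blocks for $\hat\gamma^{6,7,8}$, off-diagonal $\pm i1_2$ blocks for $\hat\gamma^9$, and $\mathrm{diag}(-1_2, 1_2)$ for $\hat\gamma^{10}$, and verifies the Clifford relations $\{\hat\gamma^a,\hat\gamma^b\} = 2\delta^{ab}$. It also confirms that $T = \hat\gamma^6\hat\gamma^7$ squares to −1, which is what makes $\frac12(1+iT)$ a projector. Helper names are illustrative.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def block(a, b, c, d):
    """Assemble a 4x4 matrix from 2x2 blocks arranged as [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]


ZERO2 = [[0, 0], [0, 0]]
ONE2 = [[1, 0], [0, 1]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

# gamma-hat^{6,7,8,9,10} as read from (B.1); this reading is an assumption.
GAMMA_HAT = {6 + k: block(ZERO2, s, s, ZERO2) for k, s in enumerate(PAULI)}
GAMMA_HAT[9] = block(ZERO2, scale(1j, ONE2), scale(-1j, ONE2), ZERO2)
GAMMA_HAT[10] = block(scale(-1, ONE2), ZERO2, ZERO2, ONE2)


def is_clifford(gammas):
    """True if {g_a, g_b} = 2 * delta_ab * identity for all pairs."""
    n = len(gammas[0])
    for a, ga in enumerate(gammas):
        for b, gb in enumerate(gammas):
            anti = add(matmul(ga, gb), matmul(gb, ga))
            expected = [[2 * int(a == b and i == j) for j in range(n)]
                        for i in range(n)]
            if anti != expected:
                return False
    return True


clifford_ok = is_clifford([GAMMA_HAT[a] for a in range(6, 11)])
T = matmul(GAMMA_HAT[6], GAMMA_HAT[7])
T_squared = matmul(T, T)
```

The property $T^2 = -1$ follows from the Clifford relations alone, so it holds in any basis, including the one used in Sect. 4.3.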


Appendix C. Some Remarks on Kodaira–Spencer Theory

In this appendix we show (4.3). Let X be a complex manifold with a divisor P. There are two exact sheaf sequences which we are going to use:

$$0 \to O(TP) \to O(TX|_P) \to O(L|_P) \to 0, \tag{C.1}$$

which is a sequence over P, and

$$0 \to O(TX\otimes[-P]) \to O(TX) \to O(TX|_P) \to 0, \tag{C.2}$$

which is a sequence over X. From the long exact sheaf-cohomology sequence associated to (C.1) we obtain:

$$\cdots \to H^0(P, O(TX|_P)) \to H^0(P, O(L)) \to H^1(P, O(TP)) \to H^1(P, O(TX|_P)) \to \cdots. \tag{C.3}$$

Since $H^1(P, O(TP)) \cong H^{0,1}(TP)$, in order to show that the mapping (4.3) is injective, it suffices to show that $H^0(P, O(TX|_P)) = 0$. For this we will use the following part of the long exact sheaf-cohomology sequence associated to (C.2):

$$\cdots \to H^0(X, O(TX)) \to H^0(P, O(TX|_P)) \to H^1(X, O(TX\otimes[-P])) \to \cdots, \tag{C.4}$$

where we noted that $H^*(X, O(TX|_P)) = H^*(P, O(TX|_P))$. However $H^0(X, O(TX)) \cong H^{0,0}(TX) \cong H^{2,0}(X) = 0$ (where the last equivalence can be seen using the existence of a unique nowhere-vanishing holomorphic three-form on X). Moreover, using Kodaira–Serre duality we have

$$H^q(X, O(TX\otimes[-P])) \cong H^{3-q}(X, \Omega^1(L))^* = 0;\qquad q = 0, 1, 2, \tag{C.5}$$

where the last equality is due to the fact that L is associated to the very ample divisor P, and we can take $c_1(L)$ to be arbitrarily large. Thus it immediately follows from (C.4) that $H^0(P, O(TX|_P)) = 0$ and hence (4.3) is indeed injective. One can show that $H^1(P, O(TX|_P)) \cong H^1(X, O(TX)) \cong H^{0,1}(TX) \cong H^{2,1}(X) \neq 0$, so we cannot conclude from (C.3) that (4.3) is surjective.

Appendix D. D3 on K3 and the (4, 4) σ-Model

Although outside the main line of development of this paper, it is worthwhile discussing the properties of (4, 4) models within the framework of this paper. For a recent account see [28]. Here we address some complementary issues. To obtain a (4, 4) model we will consider a D3 wrapped on a holomorphic two-cycle (a Riemann surface) P inside X = K3 (whenever it doesn’t lead to confusion, we will keep the same notation as for the corresponding discussion in the case where X is a Calabi–Yau three-fold). This is a much simpler system to analyze, since all the scalars coming from the reduction to the string world-sheet of the D3-brane low-energy Lagrangian are non-chiral. The number of left- and right-movers is given by a formula similar to the one for the fivebrane:

$$N_L^B = N_R^B = d_P + b_1(P) + 4. \tag{D.1}$$


The (bosonic part of the) gauge theory on the worldvolume involves a vector field $A_\mu$ and six scalars. When wrapped on the two-cycle, two scalars $X^4$, $X^5$ will parametrise the deformations, yielding $d_P$ scalars on the string worldvolume, while the other four, $X^6$ – $X^9$, will form the universal superfield (here, in analogy to the discussion for the M5, we consider the D3-brane to be along $X^0$ – $X^3$ while X is taken along $X^2$ – $X^5$). In this case, the universal superfield does not contain compact scalars, and is given simply by $\mathbb{R}^4$. In its turn, the vector field gives rise to $b_1(P)$ scalars. Note that the Kähler form no longer appears in our analysis of the scalar spectrum and the structure of the universal superfield is considerably simpler. More precisely, the counting goes as follows:

$$\chi(P) = \int_P c_1(P) = -\int_X P^2 = 2 - b_1(P), \tag{D.2}$$

where as before we have set $P = c_1(L) = -c_1(P)$. On the other hand

$$\chi(L) = \sum_{i=0}^{\dim X}(-1)^i\,h^i(X, L) = h^0(X, L),\qquad h^i(X, L) \equiv \dim_{\mathbb{C}} H^i(X, L), \tag{D.3}$$

where the last equality follows from the fact that P is very ample, and

$$h^0(X, L) = \int_X e^P\,Td(X) = \int_X\Big(\frac12 P^2 + \frac{1}{12}c_2(X)\Big). \tag{D.4}$$

Taking (D.2) into account and the fact that $\chi(X) = \int_X c_2(X) = 24$ for X = K3, we finally get

$$d_P = b_1(P) = 2D + 2, \tag{D.5}$$

where again $d_P$ stands for the real dimension of $H^0(X, L)$ and $D = \frac12\int_X P^2$ as before. The (4, 4) action follows from the reduction of the Born–Infeld action for D3

$$L = -\frac14 F^2 + \sum_{a=4}^{9}\partial_\alpha X^a\,\partial^\alpha X^a + \dots. \tag{D.6}$$
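The arithmetic behind (D.2)–(D.5) is short enough to spell out. The sketch below takes the self-intersection $\int_X P^2$ as input and evaluates $b_1(P)$ from (D.2) and $h^0(X, L)$ from (D.4) with $\int_X c_2(X) = 24$. To reproduce (D.5) it assumes that $d_P$ counts the real dimension of the linear system $|P|$, that is $d_P = 2(h^0 - 1)$; this reading is an assumption made here, chosen because it is the one that gives $2D + 2$. The function name is hypothetical.

```python
from fractions import Fraction


def k3_curve_counts(p_squared, c2_integral=24):
    """Counts for a very ample curve class P on K3 with self-intersection p_squared.

    Returns (D, b1, h0, d_P) following (D.2)-(D.5).
    """
    big_d = Fraction(p_squared, 2)            # D = (1/2) * int_X P^2
    b1 = 2 + p_squared                        # (D.2): -int_X P^2 = 2 - b1(P)
    h0 = big_d + Fraction(c2_integral, 12)    # (D.4): int_X (P^2/2 + c2/12)
    d_p = 2 * (h0 - 1)                        # assumed: real dimension of |P|
    return big_d, b1, h0, d_p


# Example: a class with int_X P^2 = 4, i.e. a genus-3 curve (b1 = 6).
D, b1, h0, d_P = k3_curve_counts(4)
```

With this reading, $d_P = b_1(P) = 2D + 2$ for every even value of $\int_X P^2$, and the doubled target-space dimension quoted later is $2b_1 = 4D + 4$.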

Let $\{\omega_I(\phi),\ I = 1,\dots,\frac12 b_1\}$ be a basis of holomorphic one-forms on $P_\phi$. We can write the gauge field in terms of this basis:

$$A_m = \pi_I\,\omega_{I,m}. \tag{D.7}$$

Moreover, to first order in ϕ we expand

$$X^4 + iX^5 = \phi^I\upsilon_I, \tag{D.8}$$

with $\{\upsilon_I\}$ a basis of holomorphic sections of $L_P$. The c-map works in the same way as in the case of [7], and we obtain the result that all the terms in the reduction of (D.6) come from a Kähler potential of the form⁷

$$\tilde\kappa = \kappa(\phi,\bar\phi) + D^{I\bar J}\,\pi_I\bar\pi_{\bar J};\qquad D_{I\bar J} = \partial_I\partial_{\bar J}\kappa. \tag{D.9}$$

⁷ To prove this, one should use $\omega_I = \iota(\upsilon_I)\,\Omega^{(2,0)}|_P$.


Again supersymmetry mixes the scalars, effectively doubling the coordinates and yielding a target space of real dimension 2b1 = 4D + 4. Finally, the coupling of D3 to background RR fields gives rise to the σ-model b-field. In particular we have $\int_{D3} \phi^{RR}\, F \wedge F$, where the RR scalar has to be kept fixed in the D3 background. The discussion of the compactness of the target space as well as the dependence of the σ-model data in terms of K3 geometry follows the generic Calabi–Yau constructions.

Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 209, 353 – 392 (2000)


A Geometric Approach to the Existence of Orbits with Unbounded Energy in Generic Periodic Perturbations by a Potential of Generic Geodesic Flows of $\mathbb{T}^2$

Amadeu Delshams¹, Rafael de la Llave², Tere M. Seara¹

¹ Departament de Matemàtica Aplicada I, Universitat Politècnica de Catalunya, Diagonal 647, 08028 Barcelona, Spain. E-mail: [email protected]; [email protected]

² Department of Mathematics, University of Texas at Austin, Austin, TX, 78712, USA. E-mail: [email protected]

Received: 22 September 1998 / Accepted: 2 August 1999

Abstract: We give a proof based on geometric perturbation theory of a result proved by J. N. Mather using variational methods, namely the existence of orbits with unbounded energy in perturbations of a generic geodesic flow in $\mathbb{T}^2$ by a generic periodic potential.

1. Introduction

The goal of this paper is to give a proof, using geometric perturbation methods, of a result proved by J. N. Mather using variational methods [Mat95]. We will prove:

Theorem 1.1. Let $g$ be a $C^r$ generic metric on $\mathbb{T}^2$, $U : \mathbb{T}^2 \times \mathbb{T} \to \mathbb{R}$ a generic $C^r$ function, $r$ sufficiently large. Consider the time dependent Lagrangian

$$L(q, \dot q, t) = \frac{1}{2}\, g_q(\dot q, \dot q) - U(q, t),$$

where $g_q$ denotes the metric in $T_q\mathbb{T}^2$. Then, the Euler–Lagrange equation of $L$ has a solution $q(t)$ whose energy

$$E(t) = \frac{1}{2}\, g_{q(t)}\bigl(\dot q(t), \dot q(t)\bigr) + U(q(t), t)$$

tends to infinity as $t \to \infty$.

Remark 1.2. Note that, in fact, the only unbounded part in $E(t)$ is $\dot q(t)$, so that the theorem could be expressed as unbounded growth in the velocity.

Remark 1.3. As is usually the case in problems of diffusion, one not only constructs orbits whose energy grows without bound, but also orbits whose energy makes more or less arbitrary excursions. We formulate this precisely in Theorem 4.26, and deduce Theorem 1.1 from it.
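As a small worked check of Remark 1.2 (our own sketch, not part of the original argument), one can pass to the momentum $p = g_q(\dot q, \cdot)$. The symbols $p$ and $H$ are notation introduced here for illustration, and $g_q(p, p)$ denotes the induced metric on covectors. The energy of Theorem 1.1 is then the value of $H$ along the orbit, and Hamilton's equations $\dot q = \partial_p H$, $\dot p = -\partial_q H$ show that it can change only through the explicit time dependence of $U$:

```latex
\begin{align*}
H(p, q, t) &= \tfrac{1}{2}\, g_q(p, p) + U(q, t), \\
E(t) &= H\bigl(p(t), q(t), t\bigr), \\
\frac{dE}{dt}
  &= \partial_q H \cdot \dot q + \partial_p H \cdot \dot p + \partial_t H
   = \partial_t U\bigl(q(t), t\bigr).
\end{align*}
```

Since $U$ is bounded on the compact set $\mathbb{T}^2 \times \mathbb{T}$, unbounded $E(t)$ forces the kinetic term to be unbounded, as stated in Remark 1.2. The same identity gives $|dE/dt| \le \sup |\partial_t U|$, so the energy can grow at most linearly in time.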


Remark 1.4. The argument presented here shows that $r \ge 15$ is large enough for Theorem 1.1. (See the proof of Lemma 4.23.) We do not claim that this is optimal for the geometric method to go through.

Remark 1.5. Actually, the results of Mather contain this theorem as a particular case, as do ours. The theorem as stated seems to be just a common ground that allows some comparison of the methods. Notably, Mather can deal with situations involving much less regularity. Our method seems to apply to other situations. Notably, it applies without substantial changes to geodesic flows in any manifold, provided we assume that they have a periodic orbit which is hyperbolic in an energy surface and whose stable and unstable manifolds intersect transversally in the energy surface. Besides geodesic flows, it also applies to some mechanical systems and to quasiperiodic perturbations. We hope to come back to these extensions in future work.

Remark 1.6. The assumptions of genericity will be made quite explicit in Theorem 4.26, a more general result than Theorem 1.1. They amount to the existence of a closed hyperbolic geodesic with a homoclinic connecting orbit for the metric $g$, and to the requirement that a certain function $\mathcal{L}$, called the Poincaré function, computed from the potential on the homoclinic orbit, is not constant. The work of Mather [Mat95] also requires similar assumptions. As far as we can see, the main difference between the hypotheses of [Mat95] and those of this paper is that [Mat95] also uses that the periodic orbits and the connecting ones are minimizing and class A. On the other hand, the differentiability hypotheses of this note are much more restrictive than those in [Mat95]. The orbits with growing energy produced in this work and those produced in [Mat95] are not necessarily the same: the orbits we produce here shadow smooth invariant curves, whereas those in [Mat95] shadow minimizing Aubry–Mather sets (which could be Cantor sets). We find it remarkable that the functional that is required to be non-constant is the same in both approaches. We hope that this could lead to a more geometric understanding of [Mat95], which could perhaps lead to some new results.

Remark 1.7. We note that it is possible to choose $g$ and $U$ arbitrarily close to the flat metric and to zero, respectively, in an analytic topology. Hence, this could be considered as an analogue of Arnol’d diffusion. Depending on what one defines precisely as Arnol’d diffusion, it may not be appropriate to call the phenomenon described in [Mat95] and here by this name. Since a universally accepted precise definition of Arnol’d diffusion seems to be lacking, we just point out that the phenomenon described here has a similar flavor and indeed the methods that we use here are very similar to the methods traditionally used in the field. The analogy with the traditional approaches to Arnol’d diffusion is much closer when we consider what happens for a bounded range of (rather high) energies. We note that in this case there are two smallness parameters. One is the distance of the metric to the flat metric and the other is the size of the potential. For high energy, the potential is a very small perturbation of the geodesic flow (we will make all this more precise later). If we choose $g$ close to flat, for the theorem to go through we need to choose the energy for which the potential can be considered as a sufficiently small perturbation. The same feature of two smallness parameters was present in the original example [Arn64].

Remark 1.8. Note that the geodesic flow, which in our case plays the rôle of the unperturbed system, is assumed to have some hyperbolicity properties. Indeed, these properties require that the system contain hyperbolic sets whose invariant manifolds intersect transversally in an energy surface. This is a somewhat stronger form of hyperbolicity than that of the a priori unstable unperturbed systems of [CG94], which are integrable. We propose the name a priori chaotic for systems such as those considered in this paper, in which the reference system has some conserved quantities, but there are orbits which are hyperbolic and have transverse heteroclinic intersections in the manifolds corresponding to the conserved quantities. One can hope that, besides their intrinsic interest since they appear in physically relevant models, the study of a priori chaotic systems can be used as a stepping stone for the study of other systems, in the same way that a priori unstable systems are used as a step in the study of a priori stable systems. Note that, since a priori chaotic systems are not close to integrable, the Nekhoroshev upper bounds for the time of diffusion and the KAM bounds on the volume of diffusing trajectories do not apply.

Remark 1.9. An important feature of this problem is that, besides two smallness parameters, it has two time scales. For high energy, the frequency of the unperturbed problem is high while the frequency of the perturbation is small. Hence, one can bring to bear methods of adiabatic theory to obtain small gaps between KAM tori. (This phenomenon also happens in the models considered in [CG94], who emphasized the important rôle played by this fact in the conclusions and also identified several physical models where this is a natural assumption.)

Remark 1.10. The main difference between the methods presented here and more traditional approaches to Arnol’d diffusion is the reliance on hyperbolic perturbation theory and center manifold reduction rather than the exclusive reliance on KAM perturbation theories. (The proposed method was already sketched in [Lla96].) We think that the locally invariant normally hyperbolic manifold is an interesting structure, since one can study the dynamics on it using the powerful methods of two dimensional dynamics, notably Aubry–Mather theory. We hope to come back to these issues, exploiting the many structures present in the invariant center manifold, in the near future. Similar ideas were used in [LW89]. We note that methods based on normal hyperbolicity have long been used successfully to deal with systems with two time scales in a geometric way (see, e.g., [Fen79]).

We want to draw attention to [BT99], which presents another geometric method to obtain similar results. (In particular, they also give a geometric proof of Mather’s result.) They construct a transition chain relying on standard KAM theory and the Poincaré–Melnikov method and do not use normally hyperbolic theory as we do in this paper. Rather than relying on periodic orbits as we do in this paper, they rely on whiskered tori with one hyperbolic degree of freedom. For systems with two degrees of freedom (such as geodesic flows on $\mathbb{T}^2$) periodic orbits are the same as whiskered tori with one hyperbolic degree of freedom, but for systems with more degrees of freedom, they are not. Hence, the escaping orbits constructed by the two methods are different.

Of course, the methods used in [Mat95] are completely different from all the methods based on geometric perturbation theory. We have hopes that a blending of the traditional methods with hyperbolic perturbation theory, a more geometric understanding and variational methods could lead to progress in the problem of Arnol’d diffusion.
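The two smallness parameters and two time scales of Remarks 1.7 and 1.9 can be made concrete by a standard rescaling. The following is only a sketch under our own normalization: the symbols $\varepsilon$, $E_*$, $P$, $\tau$ and $H_\varepsilon$ are introduced here for illustration, and the normalization need not be the one adopted later in the paper. Fix a reference energy $E_*$ and rescale momentum and time:

```latex
\begin{align*}
H(p, q, t) &= \tfrac{1}{2}\, g_q(p, p) + U(q, t),
  \qquad \varepsilon = \frac{1}{\sqrt{2E_*}}, \\
p &= P/\varepsilon, \qquad t = \varepsilon\,\tau, \\
H_\varepsilon(P, q, \tau) &= \varepsilon^2\, H(P/\varepsilon,\, q,\, \varepsilon\tau)
  = \tfrac{1}{2}\, g_q(P, P) + \varepsilon^2\, U(q, \varepsilon\tau).
\end{align*}
```

A direct computation shows that $(P(\tau), q(\tau))$ satisfies Hamilton's equations for $H_\varepsilon$. In these variables the energy surface $H_0 = E_*$ becomes the unit-speed surface $\tfrac12 g_q(P, P) = \tfrac12$, the potential enters with the small factor $\varepsilon^2$, and its period in $\tau$ is $1/\varepsilon$, so for high energy the perturbation is both small and slow.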


1.1. Summary of the method. The proof we present here can be conveniently divided into different stages.

In a first stage, we use classical Riemannian geometry to establish the existence of a family of periodic orbits. The whole family is a two dimensional normally hyperbolic manifold which carries an exact symplectic form (the restriction of the symplectic form in the phase space). Its stable and unstable manifolds intersect transversally and the motion on it is a twist map with an unbounded frequency. This step is due to Morse, Hedlund and Mather, and is covered in Sects. 2 and 3.2.

In a second stage (Sect. 4.2), we show that, for high enough energy, the perturbation introduced by the potential can be considered small. This is just an elementary scaling argument. We give full details mainly to set the notation.

In a third stage, we use perturbation theory of normally hyperbolic manifolds to show that this normally hyperbolic manifold persists as a locally invariant normally hyperbolic manifold, and that its stable and unstable manifolds keep on intersecting transversally. Also, we note that the perturbed invariant manifold inherits a symplectic structure from the ambient space and that, therefore, the rich methods of Hamiltonian perturbation theory can be brought to bear on the motion restricted to it. A brief summary of hyperbolic perturbation theory is presented in Appendix A, and the application to our problem is presented in Sects. 3.3 and 4.3. It is important to note that the motion on this invariant manifold has a faster time scale than the perturbation introduced by the potential.

In a fourth stage (Sect. 4.4.1), we use averaging theory to eliminate the fast angles from the Hamiltonian, and obtain that the motion on the normally hyperbolic invariant manifold can be reduced to an integrable one up to an error which is of very high order in the perturbation parameter, which is given by the inverse of the square root of the energy. Hence, the error decreases as an inverse power of the energy.

In a fifth stage (Sect. 4.4.2), we use quantitative versions of KAM theory to show that the smallness of the perturbation in the invariant manifold implies that this invariant manifold is filled very densely with KAM tori, and we obtain approximate expressions for these tori.

In a sixth stage, we use the Poincaré–Melnikov method to compute the change of energy in a homoclinic excursion and show that, under appropriate non-degeneracy assumptions, the stable manifold of one KAM torus intersects transversally the unstable manifold of another, very close, KAM torus, giving rise to heteroclinic orbits. These calculations are not completely standard due to the presence of two time scales. We also note that the literature about Melnikov functions for quasiperiodic objects is somewhat confusing. Notably, some of the terms that make the naïve Melnikov integrals not absolutely convergent are incorrectly omitted in many papers. Hence, we decided to present rather full details in Sect. 4.6.

In a seventh stage (Sect. 4.7), we use the results which show that, given transition chains, one can find orbits that shadow them.

We emphasize that all these stages use only standard techniques and theorems that are readily, or almost readily, available. (Perhaps the least standard part is the calculation of Poincaré–Melnikov functions, so it appears fully expanded.) Moreover, these stages are largely independent, so that if we assume the conclusions of one (or arrive at them by other methods), all the subsequent results apply. In particular, if we assumed that the geodesic flow in a manifold (not necessarily $\mathbb{T}^2$ and not necessarily two dimensional) has a periodic orbit which, when considered in the unit energy surface, is hyperbolic and has a transverse homoclinic intersection, all the results would go through. (The place where we need some more serious modifications for higher dimensional manifolds is the obstruction property, since the λ-lemma we quote works for codimension one surfaces.) Other mechanical systems could also be treated in a similar manner.

In particular, the above strategy was designed to be compatible with variational methods. The invariant manifolds produced using the theory of normally hyperbolic manifolds carry Aubry–Mather sets, as pointed out by J. N. Mather. Moreover, variational methods can be used to provide powerful shadowing lemmas that can be used in the last stage.

2. Classical Geometry of the Geodesic Flow

The following geometric facts were proved by Morse, Hedlund and Mather, and their relevance for the problem we are considering was discovered and emphasized in [Mat95].

Theorem 2.1. For a $C^r$ open and dense set of metrics in $\mathbb{T}^2$, $r = 2, \ldots, \infty, \omega$, there exists a closed geodesic “$\Lambda$” which is hyperbolic in the dynamical systems sense as a periodic orbit of the geodesic flow. Moreover, there exist another geodesic “$\gamma$” and real numbers $a_+$, $a_-$, such that

$$\operatorname{dist}\bigl(\text{“}\Lambda\text{”}(t + a_\pm),\, \text{“}\gamma\text{”}(t)\bigr) \to 0 \quad \text{as } t \to \pm\infty. \tag{2.1}$$

Here we will take the standard definition that a geodesic “$\Lambda$” is a curve “$\Lambda$” $: \mathbb{R} \to \mathbb{T}^2$, parameterized by arc length, which is a critical point for the length between any two of its points. Later, we will consider curves in the cotangent bundle that are orbits of the geodesic flow. Clearly, these orbits are closely related to the geometric geodesics in the manifold. We will use for the orbits in the cotangent bundle the same letter as for the geodesics but suppress the quotation marks. When we want to speak about the orbits of the geodesic flow as manifolds in phase space (more properly, the range of the mapping $\Lambda$), we will use a hat (i.e. $\hat\Lambda = \operatorname{Range} \Lambda$). Note that the speed of a unit geodesic is 1 and that, therefore, its energy is 1/2.

We assume without any loss of generality that the length of “$\Lambda$” in the metric $g$ is 1. (It suffices to multiply the metric by a constant, which, physically, corresponds to choosing the units of length.) Therefore, $\Lambda$ as an orbit of the geodesic flow has period 1.

Note that by changing the origin of time, we obtain another geodesic, so that the geodesics satisfying geometric properties always come in one parameter families. This consideration will be important when we consider time dependent perturbations. When the change of origin of time is an integer (an integer number of times the period of “$\Lambda$”), then (2.1) remains unaltered. Hence $a_\pm$ are defined only up to the simultaneous addition of an integer to both of them.

Actually Morse and Hedlund showed much more. They showed that there exists one “$\Lambda$” in each free homotopy class. Moreover, they showed that “$\Lambda$” can be taken to be minimizing and that “$\gamma$” satisfies other minimizing properties (class A). These results were essentially established (with no need to mention genericity, hyperbolicity or higher differentiability) in [Mor24] for any two dimensional manifold of genus bigger than 1 and in [Hed32] for the torus. Such minimization properties play an important rôle in the work [Mat95].

In this work, what is important is that the closed geodesic “$\Lambda$” is hyperbolic and that there exists a connecting geodesic “$\gamma$”. Of course, the fact that “$\Lambda$” is hyperbolic implies (when it has the right index) that it is a local minimizer for the length functional, which is the assumption used in [Mat95]. On the other hand, our method seems to work without any minimizing assumptions on the connecting geodesic “$\gamma$”. Recall that, using dynamical systems theory, given a periodic orbit with homoclinic connections, there exist other homoclinic connections (and other periodic orbits). Even if the original connection was minimizing, the secondary ones will not, in general, be so. Similarly, we note that, since the analysis we perform is quite local in the neighborhood of the periodic orbit and its homoclinic connection, our method does not require that the manifold considered be the torus.

The transversality of the invariant manifolds associated to “$\Lambda$”, which plays an important rôle for our method, does not seem to play a rôle in [Mat95]. Of course, our method requires much more differentiability than the method of [Mat95].

3. The Unperturbed Problem

3.1. Hamiltonian formalism and notation. The present problem admits natural Lagrangian and Hamiltonian formulations. From our point of view neither of them plays a large rôle, but it seems that the Hamiltonian point of view is somewhat more convenient. Hence, this is the formalism that we will consider.

The Hamiltonian phase space of the geodesic flow is $T^*\mathbb{T}^2 = \mathbb{R}^2 \times \mathbb{T}^2$. We will denote the coordinates in $\mathbb{T}^2$ by $q$ and the cotangent directions by $p$. Note that we are taking some advantage, but mainly in the notation, of the fact that the cotangent bundle of $\mathbb{T}^2$ is trivial.

We point out that, as is well known, the phase space, being a cotangent bundle, admits a canonical symplectic form, which moreover is exact. It is well known that for a cotangent bundle such as $T^*\mathbb{T}^2$ there is a unique 1-form $\theta$ such that $\alpha^*\theta = \alpha$ for any one form $\alpha$ on $\mathbb{T}^2$. (Here we think of forms as maps from $\mathbb{T}^2$ to $T^*\mathbb{T}^2$.) Then, $\Omega = d\theta$ is a symplectic form. In local coordinates, $\theta = \sum_i p_i\, dq_i$, $\Omega = \sum_i dp_i \wedge dq_i$.

With respect to the form $\Omega$, the geodesic flow is Hamiltonian and the Hamiltonian function is

$$H_0(p, q) = \frac{1}{2}\, g_q(p, p),$$

where $g_q$ is the metric in $T^*\mathbb{T}^2$. We will denote by $\Phi_t$ this geodesic flow.

For each $E$, we will denote $\Sigma_E = \{(p, q) \mid H_0(p, q) = E\}$, and observe that, for any $E_0 > 0$ (later, we will use this for large $E_0$), $\tilde\Sigma_{E_0} = \bigcup_{E \ge E_0} \Sigma_E \simeq [E_0, \infty) \times \mathbb{T}^1 \times \mathbb{T}^2$, that is, we can take the energy as a part of a coordinate system. Note that the energy is one half the square of $|p|$, so that the energy can be used as a radial coordinate in $p$. This is quite convenient. We will also need an angle coordinate, to complete the polar coordinate system. We also note that $\Sigma_E$, a three dimensional manifold diffeomorphic to $\mathbb{T}^1 \times \mathbb{T}^2$, is invariant under the geodesic flow.

Given an arbitrary geodesic “$\lambda$” $: \mathbb{R} \to \mathbb{T}^2$ we will denote by $\lambda_E(t) = (\lambda_E^p(t), \lambda_E^q(t))$ the orbit of the geodesic flow that lies in the energy surface $\Sigma_E$, and whose projection over $q$ runs along the range of “$\lambda$”. Moreover, we fix the origin of time in $\lambda_E$ so that it corresponds to the origin of the parameterization in “$\lambda$”. (Formally $H_0(\lambda_E(t)) = E$, and $\operatorname{Range}(\text{“}\lambda\text{”}) = \operatorname{Range}(\lambda_E^q)$, “$\lambda$”$(0) = \lambda_E^q(0)$.) It is easy to check that the above conditions determine uniquely the orbit of the geodesic flow, in particular determine $\lambda_E^p(t)$.
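To make the notation $\lambda_E$ concrete, here is a minimal Python sketch for a toy case. It assumes the flat metric on the torus, for which $H_0 = |p|^2/2$ and geodesics are straight lines; the flat metric has no hyperbolic closed geodesic, so this illustrates only the bookkeeping of energy surfaces and not the setting of the theorems. The function names `h0` and `lift` are hypothetical.

```python
import math


def h0(p):
    """Toy Hamiltonian H0 = |p|^2 / 2 (assumption: g is the flat metric)."""
    return 0.5 * (p[0] ** 2 + p[1] ** 2)


def lift(direction, q0, energy):
    """Orbit t -> (p(t), q(t)) on the surface H0 = energy that runs along
    the straight-line geodesic through q0 with unit vector `direction`."""
    speed = math.sqrt(2.0 * energy)  # |p| = sqrt(2E) on the energy surface
    p = (speed * direction[0], speed * direction[1])

    def orbit(t):
        # Flat metric: p is constant and q moves linearly, taken mod 1.
        q = ((q0[0] + p[0] * t) % 1.0, (q0[1] + p[1] * t) % 1.0)
        return p, q

    return orbit


# A closed geodesic of length 1 in the direction (1, 0), lifted to E = 8.
orbit = lift((1.0, 0.0), (0.25, 0.5), 8.0)
p, q = orbit(0.0)
print(h0(p), q)
```

In this toy case the lifted orbit has speed $\sqrt{2E}$, so the closed geodesic of length 1 is traversed in time $1/\sqrt{2E}$.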

Note that

$$\bigl(\lambda_E^p(t),\, \lambda_E^q(t)\bigr) = \Bigl(\sqrt{2E}\; \lambda_{1/2}^p\bigl(\sqrt{2E}\, t\bigr),\; \lambda_{1/2}^q\bigl(\sqrt{2E}\, t\bigr)\Bigr), \tag{3.1}$$

so that, for the geodesic, the rôle of $E$ is just a rescaling of time. Since $\Lambda_{1/2}$ has period 1 with our conventions (see the remarks after Theorem 2.1), $\Lambda_E$ has period $1/\sqrt{2E}$.

3.2. Hyperbolicity properties. Extending the methods of Morse–Hedlund for Theorem 2.1, J. N. Mather showed:

Theorem 3.1. For a $C^r$ generic metric, $r = 2, \ldots, \infty, \omega$, and for any value of the Hamiltonian $H_0(p, q) = E > 0$, there exists a periodic orbit $\Lambda_E(t)$, as in (3.1), of the geodesic flow whose range $\hat\Lambda_E$ is a normally hyperbolic invariant manifold in the energy surface. Its stable and unstable manifolds $W^{s,u}_{\hat\Lambda_E}$ are two dimensional, and there exists a homoclinic orbit $\gamma_E(t)$, that is, its range $\hat\gamma_E$ satisfies

$$\hat\gamma_E \subset \bigl(W^s_{\hat\Lambda_E} \setminus \hat\Lambda_E\bigr) \cap \bigl(W^u_{\hat\Lambda_E} \setminus \hat\Lambda_E\bigr).$$

Moreover, this intersection is transversal as an intersection of invariant manifolds in the energy surface along $\hat\gamma_E$. For $E = 1/2$, we have that, for some $a_\pm \in \mathbb{R}$,

$$\operatorname{dist}\bigl(\Lambda_{1/2}(t + a_\pm),\, \gamma_{1/2}(t)\bigr) \to 0 \quad \text{as } t \to \pm\infty. \tag{3.2}$$

We note that (3.2) is a general property of homoclinic orbits to hyperbolic manifolds and follows readily from the exponential convergence of $\gamma_{1/2}$ to $\Lambda_{1/2}$ and the comparison of the flow restricted to $\Lambda_{1/2}$ and $\gamma_{1/2}$.

We also note that since $\hat\gamma_E$ is one dimensional, $W^s_{\hat\Lambda_E}$, $W^u_{\hat\Lambda_E}$ are two dimensional, and the ambient manifold $\Sigma_E$ is three dimensional, we have $T_x \hat\gamma_E = T_x W^s_{\hat\Lambda_E} \cap T_x W^u_{\hat\Lambda_E}$ for all the points $x \in \hat\gamma_E$. Hence, by the implicit function theorem, $\hat\gamma_E$ is the locally unique intersection. Since we are considering manifolds invariant under flows, their intersection has to contain orbits, and $\gamma_E$ is locally the only possible orbit (up to a change in the origin of the parameter) in the intersection of $W^s_{\hat\Lambda_E}$ and $W^u_{\hat\Lambda_E}$.

For the geodesic flow, the energy is preserved and therefore the dynamics can be analyzed on each energy surface. This, however, will not be useful when we consider the external periodic potential, which changes the energy. Hence, it will be useful to discuss what happens for all energy surfaces. The following lemma is a description of the behavior of $\Lambda = \bigcup_{E \ge E_0} \hat\Lambda_E$ for all values of the energy.

Lemma 3.2. Define $\Lambda = \bigcup_{E \ge E_0} \hat\Lambda_E$. This is a manifold with boundary which is diffeomorphic to $[E_0, \infty) \times \mathbb{T}^1$, and the canonical symplectic form $\Omega$ on $T^*\mathbb{T}^2$ restricted to $\Lambda$ is non-degenerate. The form $\Omega|_\Lambda$ is invariant under the geodesic flow $\Phi_t$. We have for some $C, \alpha > 0$ and for all $x \in \hat\Lambda_E$,

$$T_x \Sigma_E = E^s_x \oplus E^u_x \oplus T_x \hat\Lambda_E$$

with $\|D\Phi_t(x)|_{E^s_x}\| \le C e^{-\alpha t}$ for $t \ge 0$, $\|D\Phi_t(x)|_{E^u_x}\| \le C e^{\alpha t}$ for $t \le 0$ and $\|D\Phi_t(x)|_{T_x \hat\Lambda_E}\| \le C$ for all $t \in \mathbb{R}$.

The stable and unstable manifolds to $\Lambda$, $W^s_\Lambda$ and $W^u_\Lambda$, are three dimensional manifolds diffeomorphic to $[E_0, \infty) \times \mathbb{T}^1 \times \mathbb{R}$, and

$$\gamma = \bigcup_{E \ge E_0} \hat\gamma_E \subset \bigl(W^s_\Lambda \setminus \Lambda\bigr) \cap \bigl(W^u_\Lambda \setminus \Lambda\bigr)$$

is diffeomorphic to $[E_0, \infty) \times \mathbb{R}$.

We also note that, since the definition of transversal intersection of manifolds only requires that the tangent spaces span the ambient space, when we add an extra dimension (in this case the energy, but later we will consider other parameters) the intersection of the extended manifolds is still transversal. The intersection of the extended manifolds will not be just one orbit, but we will have $T_x \gamma = T_x W^s_\Lambda \cap T_x W^u_\Lambda$. Hence, $\gamma$ will still be a locally unique intersection.

We note that the only properties of the geodesic flow that we will use are the conclusions of Theorem 3.1 and Lemma 3.2.

3.3. Extended phase spaces. Since we are going to consider periodic perturbations, it will be convenient to introduce an extra angle variable, which we will denote by $s$, which moves at a constant rate 1. Then, the phase space will be $T^*\mathbb{T}^2 \times \mathbb{T}^1$.

We will introduce the notation $\tilde\Lambda = \Lambda \times \mathbb{T}^1$, and analogously $\tilde\gamma = \gamma \times \mathbb{T}^1$, to denote the corresponding objects in the extended phase space.

In the case that we do not have any external potential, the dynamics in this extended phase space is just the product of the geodesic flow in $T^*\mathbb{T}^2$ and the motion with constant speed 1 in the circle (corresponding to the extra variable). In this extended phase space the results of Sect. 3.2 immediately imply:

• $\hat\Lambda_E \times \mathbb{T}^1$ is a two dimensional invariant manifold. Its (un)stable manifold is a three dimensional manifold. They intersect transversally in $\Sigma_E \times \mathbb{T}^1$. (Of course, they are not transversal in the whole extended space since they lie on the energy surface.)
• When we consider the results for all the energies, we obtain normal hyperbolicity: $\tilde\Lambda = \Lambda \times \mathbb{T}^1$ is a 3-dimensional manifold, and it is normally hyperbolic for the extended flow $\tilde\Phi_t$ (see Definition A.1 in Appendix A). The (un)stable manifolds of $\tilde\Lambda$ are $W^{u,s}_\Lambda \times \mathbb{T}^1$, and are 4-dimensional.
• Moreover, $\tilde\gamma = \gamma \times \mathbb{T}^1$ lies in the intersection of $W^s_\Lambda \times \mathbb{T}^1 = W^s_{\tilde\Lambda}$ and of $W^u_\Lambda \times \mathbb{T}^1 = W^u_{\tilde\Lambda}$, and the intersection is transversal.
• The extended flow $\tilde\Phi_t$ restricted to the invariant manifold $\tilde\Lambda$ is neither contracting nor expanding:

$$\|D\tilde\Phi_t(x)|_{T_x \tilde\Lambda}\| \le C \quad \forall t \in \mathbb{R},\ x \in \tilde\Lambda. \tag{3.3}$$

These observations will be important because they will allow us to use the rich theory of hyperbolic invariant manifolds summarized in Appendix A when we consider the problem with the external potential.

This extended phase space is obviously not symplectic (it has odd dimension). In order to perform some other calculations, we will find it convenient to perform a symplectic extension. This is accomplished by adding another real variable $a$, symplectically conjugate to $s$, which does not change with time.

Then, the symplectically extended phase space is $T^*\mathbb{T}^2 \times \mathbb{R} \times \mathbb{T}^1$. The symplectic form in this space is $\tilde\Omega = \Omega + da \wedge ds$. The flow is Hamiltonian and its Hamiltonian function is $h(a, s, p, q) = a + H_0(p, q)$. Since $a$ is conserved, the restriction of the flow of $h$ to each of the manifolds $a = \text{const.}$ is identical to the flow of $H_0$ in the extended phase space. In this case, the neutral direction given by $a$ spoils all the hyperbolicity properties. This situation is very common in Hamiltonian systems, since the neutrality along a manifold as in (3.3) implies similar bounds for the symplectic conjugate space.

3.4. The inner map. We will consider $F$, the time 1 map of the geodesic flow restricted to $\Lambda$, i.e., $F = \Phi_1|_\Lambda$. (This will make it easier to analyze the time periodic external forcing.) As we are dealing with the autonomous case, we note:

1. It is still true that $\Lambda$ is a normally hyperbolic surface for $\Phi_1$.
2. The stable and the unstable manifolds for $\Phi_1$ are the same as for the flow $\Phi_t$. In particular, they are still transversal.
3. $\Omega|_\Lambda$ is a symplectic form on $\Lambda$.
4. $\Phi_1^* \Omega = \Omega$. Hence $F^* \Omega|_\Lambda = \Omega|_\Lambda$.
5. We have the canonical 1-form $\theta$, called the symplectic potential, such that $d\theta = \Omega$. We note that $\Omega|_\Lambda = d\theta|_\Lambda$.
6. $\Phi_1^* \theta = \theta + dS$. Hence, $F^* \theta|_\Lambda = \theta|_\Lambda + dS|_\Lambda$. Therefore, the map $F$ restricted to $\Lambda$ is an exact symplectic map.

Remark 3.3. Note that the rescaling properties (3.1) of the geodesic flow imply scaling properties for the variational equations. As a consequence of them, the angle $\langle T_{\hat\Lambda_E} W^s_{\hat\Lambda_E},\, T_{\hat\Lambda_E} W^u_{\hat\Lambda_E} \rangle$ between the stable and the unstable bundles in $\hat\Lambda_E$ remains bounded independently of $E$. On the other hand, the Lyapunov exponents scale with $\sqrt{2E}$. Therefore,

$$\bigl\| D\Phi_1|_{T_{\hat\Lambda_E} W^s_{\hat\Lambda_E}} \bigr\| \le \alpha^{\sqrt{2E}}, \qquad \bigl\| D\Phi_{-1}|_{T_{\hat\Lambda_E} W^u_{\hat\Lambda_E}} \bigr\| \le \alpha^{\sqrt{2E}},$$

where $\alpha < 1$ is independent of $E$, even if it depends on the metric.

3.5. A coordinate system on $\Lambda$. Now we want to describe a coordinate system in $\Lambda$ that can be used to compute the motions on it as well as their perturbations. We want coordinate functions that are not only defined on $\Lambda$ but also on a neighborhood of it. This will be particularly important for us mainly in the calculation of the Poincaré function.

Since the manifolds we are going to consider are cylinders, we will take one real coordinate (momentum) and one angle coordinate (position). The real coordinate will be $J = \sqrt{2H_0} \ge \sqrt{2E_0}$. For the angle coordinate, we will take $\varphi \in \mathbb{T}^1$, which is determined by $dJ \wedge d\varphi = \Omega|_\Lambda$, and $\varphi = 0$ corresponds to the origin of the parameterization in “$\Lambda$”. Hence $\theta|_\Lambda = J\, d\varphi$.

If we express the motion in $\Lambda$ in these variables, it will be a Hamiltonian system of Hamiltonian $\frac{1}{2} J^2$ and therefore the equations of motion will be $\dot J = 0$; $\dot\varphi = J$.

Hence the geodesic $\Lambda_E(t)$ of formula (3.1) is given in these coordinates by $J = \sqrt{2E}$, $\varphi = \sqrt{2E}\, t$. Note that for any $\varphi_0 \in \mathbb{R}$, $\Lambda_E\bigl(t + \varphi_0/\sqrt{2E}\bigr)$ is another periodic orbit that in these coordinates is given by $J = \sqrt{2E}$, $\varphi = \varphi_0 + \sqrt{2E}\, t$.

For emphasis, when we consider the geodesic flow, the inner map of Sect. 3.4 (the time one map restricted to $\Lambda$) will be denoted by $F_0$. Its expression in these coordinates is

$$F_0(J, \varphi) = (J, \varphi + J). \tag{3.4}$$

Note that $F_0$ is a twist map and that

$$F_0^* \theta|_\Lambda = \theta|_\Lambda + d\bigl(J^2/2\bigr).$$
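The inner map (3.4) is simple enough to experiment with directly. The following Python sketch iterates $F_0$ in the coordinates $(J, \varphi)$, taking $\varphi$ modulo 1 since “$\Lambda$” has length 1. The function names `inner_map`, `iterate` and `jacobian_det` are our own hypothetical choices, and the numerical values are for illustration only.

```python
def inner_map(J, phi):
    """Unperturbed inner map F0(J, phi) = (J, phi + J), with phi taken mod 1."""
    return J, (phi + J) % 1.0


def iterate(J, phi, n):
    """Apply F0 n times; analytically this is (J, phi + n * J mod 1)."""
    for _ in range(n):
        J, phi = inner_map(J, phi)
    return J, phi


def jacobian_det():
    """Determinant of DF0 = [[1, 0], [1, 1]] in the coordinates (J, phi).

    The entry d(phi + J)/dJ = 1 is nonzero (twist condition), and the
    determinant is 1, consistent with F0 preserving dJ ^ dphi.
    """
    dJ_dJ, dJ_dphi = 1.0, 0.0
    dphi_dJ, dphi_dphi = 1.0, 1.0
    return dJ_dJ * dphi_dphi - dJ_dphi * dphi_dJ


# J is conserved, and the angle advances by J at every iterate.
print(inner_map(2.5, 0.25))
print(iterate(2.5, 0.0, 2))
print(jacobian_det())
```

Because the rotation number of each circle $J = \text{const.}$ is $J$ itself, circles with larger energy rotate faster, which is the unbounded twist mentioned in the summary of the method.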

3.6. The outer map. Another important ingredient in our approach is the map $S : \Lambda \to \Lambda$ that we will call the “scattering map” (in analogy with a similar object in quantum mechanics) or the “outer map” associated to $\gamma$. This map $S$ will transform the asymptotic point at $-\infty$ of a homoclinic orbit to $\Lambda$ into the asymptotic point at $+\infty$. For emphasis, we will denote by $S_0 : \Lambda \to \Lambda$ the scattering map of the geodesic flow. We define $x_+ = S_0(x_-)$ if $W^s(x_+) \cap W^u(x_-) \cap \gamma \neq \emptyset$. More precisely, $x_+ = S_0(x_-)$ means that there exists $z \in \gamma \subset T^*\mathbb{T}^2$ such that

$\mathrm{dist}(\Phi_t(x_\pm), \Phi_t(z)) \to 0$ as $t \to \pm\infty$.

We note that, as is obvious from the definition, the map $S_0$ depends on the $\gamma$ we have chosen. We have not included it in the notation to avoid typographical clutter, since in the rest of the paper $\gamma$ will be fixed. For the unperturbed case of the geodesic flow, this map can be computed explicitly. To compute $S_0$, we note that, from Theorem 3.1, we have:

$\mathrm{dist}(\Lambda_{1/2}(t + a_\pm), \gamma_{1/2}(t)) \to 0$ as $t \to \pm\infty$,    (3.5)

or, by the rescaling properties (3.1),

$\mathrm{dist}\big(\Lambda_E(t/\sqrt{2E} + a_\pm/\sqrt{2E}),\ \gamma_E(t/\sqrt{2E})\big) \to 0$ as $t \to \pm\infty$,    (3.6)

therefore, for any $\varphi_0 \in \mathbb{R}$,

$\mathrm{dist}\Big(\Lambda_E\Big(\frac{t + \varphi_0 + a_\pm}{\sqrt{2E}}\Big),\ \gamma_E\Big(\frac{t + \varphi_0}{\sqrt{2E}}\Big)\Big) \to 0$ as $t \to \pm\infty$.    (3.7)

Hence, the points $x_\pm = \Lambda_E\big((\varphi_0 + a_\pm)/\sqrt{2E}\big)$ are asymptotically connected through $z = \gamma_E\big(\varphi_0/\sqrt{2E}\big)$. (We note that $z$ is not unique: it can be replaced by $\gamma_E\big((\varphi_0 + n)/\sqrt{2E}\big)$, for any $n \in \mathbb{Z}$.) In the internal coordinates $(J, \varphi)$ of Sect. 3.5, the map $S_0$ is expressed as $S_0(J, a_- + \varphi) = (J, a_+ + \varphi)$,

Geometric Approach to the Existence of Orbits with Unbounded Energy


or more simply, calling $\Delta = a_+ - a_-$ the phase shift:

$S_0(J, \varphi) = (J, \varphi + \Delta)$.    (3.8)
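As a small illustration of (3.4) and (3.8), the following Python sketch implements the unperturbed inner and outer maps on the cylinder and composes them along a list of waiting times. It is only a sketch under stated assumptions: the numerical value of the phase shift `DELTA` and the helper name `pseudo_orbit` are hypothetical, and the angle is taken modulo 1.

```python
import math

DELTA = 0.3  # hypothetical phase shift a_+ - a_-; the true value depends on the metric


def inner_map(J, phi):
    """Unperturbed inner map F_0 of (3.4): a twist map, angle taken mod 1."""
    return J, (phi + J) % 1.0


def outer_map(J, phi, delta=DELTA):
    """Unperturbed scattering map S_0 of (3.8): a rigid shift of the angle."""
    return J, (phi + delta) % 1.0


def pseudo_orbit(J, phi, waits):
    """Apply F_0^{n_k} o S_0 o ... o F_0^{n_1} o S_0 for waiting times n_1, ..., n_k."""
    for n in waits:
        J, phi = outer_map(J, phi)
        for _ in range(n):
            J, phi = inner_map(J, phi)
    return J, phi


J_start = math.sqrt(2.0)  # corresponds to E = J^2 / 2 = 1
J_end, phi_end = pseudo_orbit(J_start, 0.1, waits=[3, 1, 4, 1, 5])
```

Both maps leave $J$ untouched, so no composition of $F_0$ and $S_0$ changes the energy $E = J^2/2$; any gain in energy has to come from the perturbed maps.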

Note that the phase shift $\Delta$ is uniquely defined in spite of the fact that the point $z$ is not unique and that the $a_\pm$ are defined only up to the simultaneous addition of an integer. The result of the previous calculation (that $x_+$ can indeed be defined as a function of $x_-$ and hence $S_0$ is a well defined function) can be explained geometrically by noting that the monodromy of the local definition of $x_+$ is trivial. Besides using the previous calculation, we can appeal to the general argument, which we will use later, that if the monodromy were non-trivial, we could find $x_+ \neq x_+^* \in \Lambda$ in such a way that $W^s(x_+) \cap W^s(x_+^*) \neq \emptyset$. This is impossible. Note that $z$ can be defined locally as a function of $x_-$: $z = Z(x_-)$ (this follows from the fact that the stable and the unstable manifolds intersect transversally). This local definition in neighborhoods of $x_- \in \Lambda$ cannot be made into a global definition on $\Lambda$ since there is a monodromy. Note that if $x_-$ moves around a non-trivial circle in the annulus $\Lambda$, the local $z$ changes from $z$ to $\Phi_T(z)$, where $T$ is the period of the orbit in $\Lambda$ through $x_-$. Later, when we have to consider perturbations, even if the direct calculation is impossible, the geometric argument will go through and it will establish that an $S$ defined in a fashion analogous to $S_0$ is indeed a smooth map.

4. The Problem with External Potential

4.1. Summary. The main idea is that, for high energy, the external potential is a small (and slow) perturbation of the geodesic flow. Therefore, all the geometric structures that we constructed based on normal hyperbolicity and transversality persist for high energy. In particular, the manifold $\Lambda$ will persist as well as the transversality of the intersection of its stable and unstable manifolds. This will allow us to define $F$, $S$, analogues of the maps $F_0$, $S_0$, and to compute them perturbatively. Using the information that we have on these maps, we will construct a sequence $n_1, \ldots, n_k, \ldots$, such that there is some point $x$ with

$x_k = F^{n_k} \circ S \circ \cdots \circ F^{n_1} \circ S(x) \to \infty$.    (4.1)

This sequence of points $x_k$ will be used as the skeleton for orbits of the perturbed geodesic flow whose energy grows to infinity. The points $x_k$ constitute a chain of heteroclinic connections between whiskered tori. Hence the existence of escape orbits can be described and established using the usual geometric methods for whiskered tori and their heteroclinic connections. Heuristically, these orbits can be described as follows: the orbits make excursions roughly along the homoclinic orbit when the external potential has a phase that helps to gain energy, but they bide their time between jumps, staying close to the unperturbed periodic orbits until the phase of the external potential becomes favorable again. By choosing the time at which to perform the jumps, it will be possible for the orbits to keep on gaining energy. Therefore, the main technical goal will be to compute perturbatively, for high energy, the inner and the outer maps $F$ and $S$, show that applying them alternately we can construct sequences $x_k$ as in (4.1), and then show that these orbits can be shadowed by real orbits. The existence of the points $x_k$ will require some non-degeneracy assumptions on the external potential (namely, that there are times at which jumping produces a gain in energy). It turns out that the gain in energy is expressed by an integral, commonly termed the Poincaré function, which depends on the phase at which the jump takes place (relative to the phase of the potential). If this function, as a function of the jumping time, is not constant, it is indeed possible to make jumps that gain energy. Rather remarkably, the same integral and the same condition appear in J. N. Mather's approach, even if with a very different motivation. Moreover, it is interesting to note that the variational construction in [Mat95] also involves jumps roughly along $\gamma$ separated by orbits that stay close to $\Lambda$.

Remark 4.1. We call attention to the fact that the problem has two different smallness parameters. One is how close the metric is to the integrable one. Another one is the inverse of the energy. For large values of the energy, the potential can be considered as a perturbation of the geodesic flow. We also note that there are two different time scales involved. One is the time scale of the period of the perturbation ($O(1)$) and the second one is that of the period of the geodesic ($1/\sqrt{2E}$), which is also a characteristic time of the homoclinic trajectory.

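4.2. The scaled problem. In order to make the perturbative structure of the problem more apparent we will scale the variables and the time. Thus, we pick a (large) number $E_*$ and introduce $\varepsilon = 1/\sqrt{E_*}$. Recall that the original Hamiltonian is $H(p, q, t) = \frac{1}{2} g_q(p, p) + U(q, t)$, hence $\varepsilon^2 H(p, q, t) = \frac{1}{2} g_q(\varepsilon p, \varepsilon p) + \varepsilon^2 U(q, t)$. If we denote $\varepsilon p = \bar p$ and consider the symplectic form $\bar\Omega = d\bar p \wedge dq = \varepsilon\,\Omega$, we note that $q, \bar p$ are conjugate variables in $\bar\Omega$. We also introduce a new time $\bar t = t/\varepsilon$. We see that the equations

$\frac{dp}{dt} = -\frac{\partial H}{\partial q} = -\frac{1}{2}\frac{\partial g_q}{\partial q}(p, p) - \frac{\partial U}{\partial q}(q, t)$,
$\frac{dq}{dt} = \frac{\partial H}{\partial p} = g_q(p, \cdot)$,

are equivalent to

$\frac{d\bar p}{d\bar t} = -\frac{1}{2}\frac{\partial g_q}{\partial q}(\bar p, \bar p) - \varepsilon^2 \frac{\partial U}{\partial q}(q, \varepsilon\bar t)$,
$\frac{dq}{d\bar t} = g_q(\bar p, \cdot)$,

which are Hamiltonian equations in $\bar\Omega$, for the time $\bar t$, with respect to the Hamiltonian

$\bar H_\varepsilon(\bar p, q, \varepsilon\bar t) = \frac{1}{2} g_q(\bar p, \bar p) + \varepsilon^2 U(q, \varepsilon\bar t)$.    (4.2)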
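The scaling identity behind (4.2) can be checked numerically. The Python sketch below evaluates $\varepsilon^2 H(p, q, t)$ and $\bar H_\varepsilon(\varepsilon p, q, \varepsilon \bar t)$ with $\bar t = t/\varepsilon$ and compares the two. The conformally flat metric and the potential are made-up stand-ins chosen only so that $g_q(p,p)$ is quadratic in $p$; they are not taken from the paper.

```python
import math


def metric_quadratic(q, p):
    """Hypothetical g_q(p, p): a conformally flat metric on T^2, quadratic in p."""
    conformal = 2.0 + math.cos(2 * math.pi * q[0])
    return (p[0] ** 2 + p[1] ** 2) / conformal


def potential(q, t):
    """Hypothetical potential U(q, t), 1-periodic in q and in t."""
    return math.cos(2 * math.pi * q[1]) * math.sin(2 * math.pi * t)


def physical_hamiltonian(p, q, t):
    """H(p, q, t) = g_q(p, p) / 2 + U(q, t) in the physical variables."""
    return 0.5 * metric_quadratic(q, p) + potential(q, t)


def scaled_hamiltonian(p_bar, q, t_bar, eps):
    """H-bar_eps of (4.2), evaluated at the rescaled time t_bar = t / eps."""
    return 0.5 * metric_quadratic(q, p_bar) + eps ** 2 * potential(q, eps * t_bar)


E_star = 400.0                      # arbitrary large reference energy
eps = 1.0 / math.sqrt(E_star)
p, q, t = (17.0, -9.0), (0.2, 0.7), 0.35
p_bar = (eps * p[0], eps * p[1])

lhs = eps ** 2 * physical_hamiltonian(p, q, t)
rhs = scaled_hamiltonian(p_bar, q, t / eps, eps)
```

The two values agree up to rounding because $g_q$ is homogeneous of degree two in the momentum, which is the only property of the metric the scaling uses.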

We also introduce $\bar E = E/E_*$. For our purposes, it suffices to analyze a fixed range in scaled energies (which we will fix arbitrarily to be $[1/2, 2]$) and establish that for large enough $E_*$, we can find pseudo-orbits which are often close to $\Lambda$ and whose energy increases from $\approx 1/2$ to $\approx 2$. Then, using that the result is valid for all large enough energies, we can construct a pseudo-orbit whose energy grows unboundedly. From now on and until further notice, we will drop the bar from the problem. We will refer to the bar variables as the rescaled variables and the original ones as the physical variables. Then the Hamiltonian $H_\varepsilon$ and all the functions derived from it will be $1/\varepsilon$ periodic in time. In order to make this more apparent we will use the notation given in (4.2). Since we have introduced the scaling, it will be convenient to express $S_0$, $F_0$ in these rescaled variables. Because $S_0$ was defined through geometric considerations it does not change when rescaled: $S_0(J, \varphi) = (J, \varphi + \Delta)$. On the other hand, $F_0$ becomes the time $1/\varepsilon$ map of the geodesic flow. Hence, we introduce the notation $f_0^\varepsilon : \Lambda \to \Lambda$ for its rescaled expression, which becomes $f_0^\varepsilon(J, \varphi) = (J, \varphi + J/\varepsilon)$. Similarly, we can study the hyperbolic properties of $\Lambda$ under the rescaled flow. It is easy to note that the stable and unstable bundles do not change under rescaling of time, and that the exponents get multiplied by $1/\varepsilon$.

4.3. The perturbed invariant manifold. Using the hyperbolicity properties of the manifold $\Lambda$ for the geodesic flow (see Sect. 3.2), we will apply the results of hyperbolic perturbation theory summarized in Appendix A. In order to do perturbation theory for the manifold $\Lambda$, it will be more convenient to use the flow rather than the time $1/\varepsilon$ map. Notice that the Lyapunov exponents of the unperturbed map are $\pm\infty$. Even if this does not interfere with stability (roughly, the larger the Lyapunov exponents are, the more stable the system is), it is cumbersome to write the arguments. We note that in the Hamiltonian (4.2), $\varepsilon$ enters in two different ways, both as a perturbation parameter in the Hamiltonian and as the frequency of the perturbing potential. To distinguish these two different rôles of $\varepsilon$, we find it more convenient to introduce the autonomous flow

$\dot p = -\frac{\partial H_0}{\partial q}(p, q) - \delta\,\frac{\partial H_1}{\partial q}(p, q, s/T)$,
$\dot q = \frac{\partial H_0}{\partial p}(p, q) + \delta\,\frac{\partial H_1}{\partial p}(p, q, s/T)$,    (4.3)
$\dot s = 1$,

defined on the extended phase space $T^*\mathbb{T}^2 \times T\,\mathbb{T}^1$. This problem is equivalent to our original one if we set $\delta = \varepsilon^2$, $T = 1/\varepsilon$, and $H_1(p, q, s/T) = U(q, \varepsilon s)$. We will denote the flow of (4.3) by $\tilde\Phi_{t,T,\delta}(p, q, s) = (\Gamma^{s,s+t}_{T,\delta}(p, q), s + t)$, where $\Gamma^{t,t'}_{T,\delta}(p, q)$ is the non-autonomous flow. Note that as usual $\Gamma^{t',t''}_{T,\delta} \circ \Gamma^{t,t'}_{T,\delta} = \Gamma^{t,t''}_{T,\delta}$ in the domains where these compositions make sense. We note that setting $\delta = 0$ in (4.3) we have that

$\tilde\Lambda := \Lambda \times T\,\mathbb{T}^1 \simeq [J_0, \infty) \times \mathbb{T}^1 \times T\,\mathbb{T}^1$

is a manifold locally invariant for the flow, where $J_0 = \sqrt{2E_0}$. This manifold is also normally hyperbolic in the sense of Definition A.1. Using Theorem A.14 and observation 1 after it, we have:


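Theorem 4.2. Assume that we have a system of equations as in (4.3), where the Hamiltonian $H = H_0 + \delta H_1$ is $C^r$, $2 \le r < \infty$. Then, there exists a $\delta^* > 0$ such that for $|\delta| < \delta^*$, there is a $C^{r-1}$ function

$\mathcal{F} : [J_0 + K\delta, \infty) \times \mathbb{T}^1 \times T\,\mathbb{T}^1 \times (-\delta^*, \delta^*) \to T^*\mathbb{T}^2 \times T\,\mathbb{T}^1$

such that

$\tilde\Lambda_{T,\delta} = \mathcal{F}\big([J_0 + K\delta, \infty) \times \mathbb{T}^1 \times T\,\mathbb{T}^1 \times \{\delta\}\big)$    (4.4)

is locally invariant for the flow of (4.3). Therefore, $\tilde\Lambda_{T,\delta}$ is $\delta$-close to $\tilde\Lambda_{T,0} = \tilde\Lambda$ in the $C^{r-2}$ sense.

Moreover, $\tilde\Lambda_{T,\delta}$ is a hyperbolic manifold. We can find a $C^{r-1}$ function

$\mathcal{F}^s : [J_0 + K\delta, \infty) \times \mathbb{T}^1 \times T\,\mathbb{T}^1 \times [0, \infty) \times (-\delta^*, \delta^*) \to T^*\mathbb{T}^2 \times T\,\mathbb{T}^1$

such that its (local) stable invariant manifold takes the form

$W^{s,\mathrm{loc}}(\tilde\Lambda_{T,\delta}) = \mathcal{F}^s\big([J_0 + K\delta, \infty) \times \mathbb{T}^1 \times T\,\mathbb{T}^1 \times [0, \infty) \times \{\delta\}\big)$.    (4.5)

If $x = \mathcal{F}(J, \varphi, s, \delta) \in \tilde\Lambda_{T,\delta}$, then $W^{s,\mathrm{loc}}(x) = \mathcal{F}^s(\{J\} \times \{\varphi\} \times \{s\} \times [0, \infty) \times \{\delta\})$. Therefore $W^{s,\mathrm{loc}}(\tilde\Lambda_{T,\delta})$ is $\delta$-close to $W^{s,\mathrm{loc}}(\tilde\Lambda)$ in the $C^{r-2}$ sense. Analogous results hold for the (local) unstable manifold.

Remark 4.3. Since $W^s(\tilde\Lambda)$, $W^u(\tilde\Lambda)$ are transversal at $\tilde\gamma \subset W^s(\tilde\Lambda) \cap W^u(\tilde\Lambda)$, we see that there exists a locally unique $\tilde\gamma_{T,\delta}$ which is $\delta$-close to $\tilde\gamma$ in the $C^{r-2}$ sense, such that $\tilde\gamma_{T,\delta} \subset W^s(\tilde\Lambda_{T,\delta}) \cap W^u(\tilde\Lambda_{T,\delta})$, and that $\tilde\gamma_{T,\delta}$ can be parameterized by a $C^{r-1}$ function on $\tilde\gamma \times (-\delta^*, \delta^*)$ to the extended phase space.

Notation 4.4. From now on, we are going to fix our attention on the case $\delta = \varepsilon^2$ and $T = 1/\varepsilon$, and we will call $\tilde\Lambda_\varepsilon = \tilde\Lambda_{1/\varepsilon,\varepsilon^2}$, $\tilde\gamma_\varepsilon = \tilde\gamma_{1/\varepsilon,\varepsilon^2}$, $\tilde\Phi_{t,\varepsilon} = \tilde\Phi_{t,1/\varepsilon,\varepsilon^2}$ and $\Gamma^{t,t'}_\varepsilon = \Gamma^{t,t'}_{1/\varepsilon,\varepsilon^2}$.

Remark 4.5. Even if Theorem 4.2 only guarantees local invariance for $\tilde\Lambda_\varepsilon$, we will show later that KAM theory will provide invariant boundaries consisting of KAM tori. Therefore, it is possible to take $\tilde\Lambda_\varepsilon$ invariant. Since the results in hyperbolic theory for locally invariant manifolds are somewhat sharper for invariant manifolds (they include uniqueness statements and a geometric definition of stable and unstable manifolds), this will allow us later to state slightly sharper results. The main results in this paper can be obtained without this improvement, hence we will just develop it in remarks.

Since the theory of normally invariant manifolds ignores symplectic structures, which will play an important rôle in our considerations, it will be useful to supplement the above considerations with a study of symplectic structure. For a fixed $s$, we denote by $\Lambda^s_\varepsilon \subset T^*\mathbb{T}^2$ the manifold obtained by fixing $s$ in $\tilde\Lambda_\varepsilon$ given by (4.4):

$(\Lambda^s_\varepsilon, s) = \mathcal{F}\big([E_0 + K\varepsilon^2, \infty) \times \mathbb{T}^1 \times \{s\} \times \{\varepsilon^2\}\big)$.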

By Theorem 4.2, $\Lambda^s_\varepsilon$ is $\varepsilon^2$-close to the unperturbed manifold $\Lambda$ in the $C^{r-2}$ sense. In particular, if we denote by $\Omega^s_\varepsilon$ the restriction of the symplectic form $\Omega$ to these manifolds, it is a symplectic form. We also have $\Omega^s_\varepsilon = d\theta^s_\varepsilon$, where $\theta^s_\varepsilon$ is the restriction of the symplectic potential form to $\Lambda^s_\varepsilon$.

The classical results of adiabatic perturbation theory we want to use in Sect. 4.4.1 refer to time dependent Hamiltonian flows on a fixed manifold with a fixed symplectic structure, whereas we have a time dependent manifold. Thus, we introduce changes of variables that keep the manifold fixed and study the flow induced in the fixed manifold. Since the Hamiltonian character is important in adiabatic perturbation theory, we pay attention to the Hamiltonian structure of the changes of variables.

Since $\tilde\Lambda_\varepsilon$ is invariant by the flow $\tilde\Phi_{t,\varepsilon}(p, q, s) = (\Gamma^{s,s+t}_\varepsilon(p, q), s + t)$ of (4.3), we have that $\Gamma^{t,t'}_\varepsilon : \Upsilon^t_\varepsilon \subset \Lambda^t_\varepsilon \to \Lambda^{t'}_\varepsilon$ (where $\Upsilon^t_\varepsilon$ excludes a neighborhood of order $\varepsilon^2$ outside the boundary of $\Lambda^t_\varepsilon$). Moreover, this flow transforms the symplectic structure in one manifold to the one of the image: $(\Gamma^{t,t'}_\varepsilon)^*\,\Omega^{t'}_\varepsilon = \Omega^t_\varepsilon$. Furthermore, it is an exact transformation, that is, $(\Gamma^{t,t'}_\varepsilon)^*\,\theta^{t'}_\varepsilon = \theta^t_\varepsilon + dS^{t,t'}_\varepsilon$, where $S^{t,t'}_\varepsilon$ is a real valued function in $\Lambda^t_\varepsilon$ and the $d$ refers to the exterior differential in that manifold.

Now, since the manifolds $\Lambda^s_\varepsilon$ are close to the standard one $\Lambda$, we can find coordinate maps $C^s_\varepsilon : \Lambda^s_\varepsilon \to \Lambda$. We claim that it is possible to choose these $C^s_\varepsilon$ in such a way that they transform the symplectic form into the standard one. In effect, if we push forward the symplectic forms $\Omega^s_\varepsilon$, we obtain a family of symplectic forms in $\Lambda$ which are close to $\Omega$. These symplectic forms are also exact. Applying Moser's method [Wei77], we can find maps from $\Lambda$ to $\Lambda$ that transform these symplectic forms into the standard one. We will just redefine the $C^s_\varepsilon$ to include the composition with these mappings in $\Lambda$. A proof that these maps can be chosen to be $C^{r-2}$ jointly with the parameters can be found in complete detail in [BLW96].

If we now consider $C^{t'}_\varepsilon \circ \Gamma^{t,t'}_\varepsilon \circ (C^t_\varepsilon)^{-1}$ we see that it is a flow of exact symplectic mappings in $\Lambda$. The Hamiltonian $k_\varepsilon(J, \varphi, \varepsilon s)$ generating this flow is the push-forward by $C^s_\varepsilon$ of the Hamiltonian $H_\varepsilon(p, q, s/T) = H_\varepsilon(p, q, \varepsilon s)$ generating the flow of (4.3) ($T = 1/\varepsilon$). In particular, it is a $C^{r-2}$ flow, $1/\varepsilon$ periodic, and it is a small perturbation of the constant flow $\dot J = 0$, $\dot\varphi = J$ of Hamiltonian $\frac{1}{2}J^2$.

4.4. The perturbed inner map. Given $s \in \frac{1}{\varepsilon}\mathbb{T}^1$, the perturbed inner map is the time $1/\varepsilon$ flow on $\Lambda^s_\varepsilon$:

$\Gamma^{s,s+1/\varepsilon}_\varepsilon : \Lambda^s_\varepsilon \to \Lambda^{s+1/\varepsilon}_\varepsilon$.

In the coordinate system $(J, \varphi)$ on $\Lambda$ introduced at the end of Sect. 3.5, we study the map $f^\varepsilon_\varepsilon : \Lambda \to \Lambda$, obtained setting $\tau = \varepsilon$ in:

$f^\tau_\varepsilon = C^{1/\tau}_\varepsilon \circ \Gamma^{0,1/\tau}_\varepsilon \circ (C^0_\varepsilon)^{-1}$.

This map is the time $1/\varepsilon$ flow of the Hamiltonian $k_\varepsilon(J, \varphi, \varepsilon s)$. Note that this map is a small perturbation of the map $f_0^\varepsilon$ introduced in Sect. 4.2. (The notation $f^\varepsilon_\varepsilon$ is designed to be a mnemonic of this fact: the upper $\varepsilon$ indicates the frequency of the perturbation and the lower $\varepsilon$ is a measure of the size of the perturbation.) Our goal is to study this map and show that it possesses KAM curves with very small gaps. If we applied KAM theory directly, we would obtain gaps significantly bigger than those desired for our purposes. Therefore, we will take advantage of the fact that the perturbation is slow so that we can apply several steps of averaging theory (see, for example, [AKN88, LM88]) and reduce the perturbation. If we apply KAM to the map after averaging (which is significantly closer to integrable than the original one), the KAM tori have small enough gaps for our purposes.


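4.4.1. Averaging theory. The result that allows us to reduce the perturbation by a change of variables is:

Theorem 4.6. Let $k_\varepsilon(J, \varphi, \varepsilon s)$ be a $C^n$ Hamiltonian, 1-periodic in $\varphi$ and $\varepsilon s$, such that $k_\varepsilon(J, \varphi, \varepsilon s) = \frac{1}{2}J^2 + \varepsilon^2 k^1(J, \varphi, \varepsilon s; \varepsilon)$. Then, for any $0 < m < n$, there exists a canonical change of variables $(J, \varphi, s) \mapsto (I, \psi, s)$, 1-periodic in $\varphi$ and $\varepsilon s$, which is $\varepsilon^2$-close to the identity in the $C^{n-m}$ topology, such that it transforms the Hamiltonian system of Hamiltonian $k_\varepsilon(J, \varphi, \varepsilon s)$ into a Hamiltonian system of Hamiltonian $K_\varepsilon(I, \psi, \varepsilon s)$. This new Hamiltonian is a $C^{n-m}$ function of the form:

$K_\varepsilon(I, \psi, \varepsilon s) = K^0_\varepsilon(I, \varepsilon s) + \varepsilon^{m+1} K^1_\varepsilon(I, \psi, \varepsilon s)$,

where $K^0_\varepsilon(I, \varepsilon s) = \frac{1}{2}I^2 + O_{C^1}(\varepsilon^2)$, and the notation $O_{C^1}(\varepsilon)$ means a function whose $C^1$ norm is $O(\varepsilon)$.

Proof. The proof of this theorem is standard. For more details and applications of the analytic case, one can see [AKN88]. We will just go over the proof to show that it works for finitely differentiable Hamiltonians. Calling $a$ the action conjugate to the time $s$, we have the 2-degrees of freedom Hamiltonian $a + k_\varepsilon(J, \varphi, \varepsilon s)$, which has a fast angle $\varphi$ and a slow one $\varepsilon s$. We look for a canonical change of variables which eliminates the fast angle $\varphi$. The change will be obtained through a composition of changes of variables. Each of these changes will be generated through a generating function of the form:

$P s + I\varphi + \varepsilon^{q+2} S_q(I, \varphi, \varepsilon s; \varepsilon)$,    (4.6)

where $S_q$ is 1-periodic in $\varphi$ and $\varepsilon s$. In this way, through the implicit equations

$J = I + \varepsilon^{q+2}\,\frac{\partial S_q}{\partial\varphi}(I, \varphi, \varepsilon s; \varepsilon)$,
$a = P + \varepsilon^{q+3}\,\frac{\partial S_q}{\partial\varepsilon s}(I, \varphi, \varepsilon s; \varepsilon)$,
$\psi = \varphi + \varepsilon^{q+2}\,\frac{\partial S_q}{\partial I}(I, \varphi, \varepsilon s; \varepsilon)$,

we obtain a canonical change of variables $(J, \varphi, a, \varepsilon s) \to (I, \psi, P, \varepsilon s)$, where

$(J, \varphi, a) = (I, \psi, P) + \varepsilon^{q+2}\,\psi_q(I, \psi, \varepsilon s; \varepsilon)$    (4.7)

which, by the implicit function theorem, has one degree less of differentiability than its generating function (4.6). We will apply the following inductive lemma:

Lemma 4.7. Consider a Hamiltonian of the form

$a + K_q(J, \varphi, \varepsilon s; \varepsilon) = a + K^0_q(J, \varepsilon s; \varepsilon) + \varepsilon^{q+2} K^1_q(J, \varphi, \varepsilon s; \varepsilon)$,

where $K^0_q = J^2/2 + O_{C^1}(\varepsilon^2)$ is $C^{n-q+1}$ and $K^1_q$ is $C^{n-q}$, $0 \le q \le n - 1$. We can find a function $S_q(I, \varphi, \varepsilon s; \varepsilon)$ verifying

$\frac{\partial}{\partial I}K^0_q(I, \varepsilon s; \varepsilon)\,\frac{\partial S_q}{\partial\varphi}(I, \varphi, \varepsilon s; \varepsilon) + K^1_q(I, \varphi, \varepsilon s; \varepsilon) = \bar K^1_q(I, \varepsilon s; \varepsilon)$,    (4.8)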

where

$\bar K^1_q(I, \varepsilon s; \varepsilon) = \int_0^1 K^1_q(I, \varphi, \varepsilon s; \varepsilon)\,d\varphi$.

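Then, the change (4.7) generated by (4.6) transforms the Hamiltonian $a + K_q(J, \varphi, \varepsilon s; \varepsilon)$ into a Hamiltonian

$a + K_{q+1}(I, \psi, \varepsilon s; \varepsilon) = a + K^0_{q+1}(I, \varepsilon s; \varepsilon) + \varepsilon^{q+3} K^1_{q+1}(I, \psi, \varepsilon s; \varepsilon)$,

where

$K^0_{q+1}(I, \varepsilon s; \varepsilon) = K^0_q(I, \varepsilon s; \varepsilon) + \varepsilon^{q+2}\bar K^1_q(I, \varepsilon s; \varepsilon) = \frac{I^2}{2} + O_{C^{n-q}}(\varepsilon^2)$

is $C^{n-q}$ and $K^1_{q+1}$ is $C^{n-q-1}$.

Proof. Note that a solution of (4.8) is $S_q = -\int d\varphi\,\big(K^1_q - \bar K^1_q\big)/\partial_I K^0_q$. It follows that $S_q$ and $\partial S_q/\partial\varphi$ are $C^{n-q}$. The new Hamiltonian is given by

$P + \varepsilon^{q+3}\,\frac{\partial S_q}{\partial\varepsilon s}(I, \varphi, \varepsilon s; \varepsilon) + K^0_q\Big(I + \varepsilon^{q+2}\frac{\partial S_q}{\partial\varphi}(I, \varphi, \varepsilon s; \varepsilon), \varepsilon s; \varepsilon\Big) + \varepsilon^{q+2} K^1_q\Big(I + \varepsilon^{q+2}\frac{\partial S_q}{\partial\varphi}(I, \varphi, \varepsilon s; \varepsilon), \varphi, \varepsilon s; \varepsilon\Big)$
$= P + K^0_{q+1}(I, \varepsilon s; \varepsilon) + \varepsilon^{q+3} K^1_{q+1}(I, \psi, \varepsilon s; \varepsilon)$,

where, Taylor expanding $K^0_q$ and $K^1_q$ and using the definition (4.8) of the generating function, we get:

$K^0_{q+1}(I, \varepsilon s; \varepsilon) = K^0_q(I, \varepsilon s; \varepsilon) + \varepsilon^{q+2}\bar K^1_q(I, \varepsilon s; \varepsilon)$

and

$\varepsilon^{q+3} K^1_{q+1}(I, \psi, \varepsilon s; \varepsilon) = K^0_q\Big(I + \varepsilon^{q+2}\frac{\partial S_q}{\partial\varphi}(I, \varphi, \varepsilon s; \varepsilon), \varepsilon s; \varepsilon\Big) - K^0_q(I, \varepsilon s; \varepsilon) - \frac{\partial}{\partial I}K^0_q(I, \varepsilon s; \varepsilon)\,\varepsilon^{q+2}\frac{\partial S_q}{\partial\varphi}(I, \varphi, \varepsilon s; \varepsilon)$
$\quad + \varepsilon^{q+2}\Big[K^1_q\Big(I + \varepsilon^{q+2}\frac{\partial S_q}{\partial\varphi}(I, \varphi, \varepsilon s; \varepsilon), \varphi, \varepsilon s; \varepsilon\Big) - K^1_q(I, \varphi, \varepsilon s; \varepsilon)\Big] + \varepsilon^{q+3}\frac{\partial S_q}{\partial\varepsilon s}(I, \varphi, \varepsilon s; \varepsilon)$,

where, in these formulas, $\varphi$ has to be expressed in terms of the variables $(I, \psi, \varepsilon s; \varepsilon)$ using the change of variables (4.7). Since $K^0_q$ is $C^{n-q+1}$ and $K^1_q$ is $C^{n-q}$, it is clear that $S_q$, $\partial_\varphi S_q$ are $C^{n-q}$, and the change of variables (4.7) is $C^{n-q-1}$. (Note that in the equation above, only the term $\varepsilon^{q+3}\partial_{\varepsilon s} S_q$ is $C^{n-q-1}$.) Then one has that $K^0_{q+1}$ is $C^{n-q}$ and $K^1_{q+1}$ is $C^{n-q-1}$. □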
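To make the averaging step concrete, the following Python sketch solves the equation (4.8) numerically for one toy example and evaluates its residual. It assumes the leading-order form $K^0 = I^2/2$ and a made-up perturbation `K1`; the quadrature rule, the step sizes and all function names are illustrative choices, not part of the construction above.

```python
import math


def K0_prime(I):
    """d/dI of the integrable part, assuming K^0 = I^2 / 2 (leading order only)."""
    return I


def K1(I, phi, tau):
    """Hypothetical perturbation, 1-periodic in the fast angle phi and the slow angle tau."""
    slow_mean = 0.3 * math.cos(2 * math.pi * tau)
    amplitude = 1.0 + 0.5 * math.sin(2 * math.pi * tau)
    return slow_mean + amplitude * math.cos(2 * math.pi * phi)


def K1_average(I, tau, n=256):
    """Average of K^1 over the fast angle (midpoint rule on n nodes)."""
    return sum(K1(I, (j + 0.5) / n, tau) for j in range(n)) / n


def S(I, phi, tau, n=2000):
    """S = -int_0^phi (K^1 - Kbar^1) dphi' / d_I K^0, computed by the midpoint rule."""
    avg = K1_average(I, tau)
    h = phi / n
    integral = h * sum(K1(I, (j + 0.5) * h, tau) - avg for j in range(n))
    return -integral / K0_prime(I)


def residual(I, phi, tau, d=1e-4):
    """Left side minus right side of (4.8), with dS/dphi by central differences."""
    dS_dphi = (S(I, phi + d, tau) - S(I, phi - d, tau)) / (2 * d)
    return K0_prime(I) * dS_dphi + K1(I, phi, tau) - K1_average(I, tau)


res = residual(1.2, 0.37, 0.15)
```

For this toy `K1` the generating function is $S = -(1 + \tfrac12\sin 2\pi\tau)\,\sin(2\pi\varphi)/(2\pi I)$, which is 1-periodic in $\varphi$ precisely because the average over the fast angle has been subtracted.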


To finish the proof of Theorem 4.6, we only need to apply the inductive Lemma 4.7 for $q = 0, 1, \ldots, m - 1$, and we obtain the desired result. For $q = 0$, it is important to note that $K^0_0(J, \varepsilon s; \varepsilon) = \frac{1}{2}J^2$ is $C^\infty$, and $K^1_0(J, \varphi, \varepsilon s; \varepsilon) = k^1(J, \varphi, \varepsilon s; \varepsilon)$ is $C^n$. Then the last Hamiltonian will be of class $C^{n-m}$. □

Lemma 4.8. In the conditions of Theorem 4.6 with $n = r - 2$, the map $f^\varepsilon_\varepsilon : \Lambda \to \Lambda$, which is exact symplectic, can be written in the coordinates $(I, \psi)$ introduced in Theorem 4.6 as

$f^\varepsilon_\varepsilon(I, \psi) = \Big(I,\ \psi + \frac{1}{\varepsilon}A(I, \varepsilon)\Big) + \varepsilon^m R(I, \psi; \varepsilon)$,    (4.9)

where $A(I, \varepsilon) = \varepsilon\int_0^{1/\varepsilon} D_1 K^0_\varepsilon(I, \varepsilon s)\,ds = I + O_{C^0}(\varepsilon^2)$, and $R$ is a $C^{r-m-4}$ function.

Proof. Recall that $f^\varepsilon_\varepsilon$ in the $(I, \psi)$ coordinates is the time $1/\varepsilon$ map of the $C^{r-2-m}$ Hamiltonian $K_\varepsilon$ whose flow is $C^{r-3-m}$. The flow, in these coordinates, is the flow of an integrable Hamiltonian $K^0_\varepsilon$ plus some Hamiltonian of order $O(\varepsilon^{m+1})$. Hence, using variational equations, we obtain that the time $1/\varepsilon$ map differs from that of the integrable part by an amount not larger than $\varepsilon^m$ in the $C^{n-4-m}$ topology. □

4.4.2. K.A.M. theory. We now recall a quantitative version of the KAM Theorem. The version below is somewhat weaker than that of [Her83] (we do not use fractional regularities so we lose whole integer number of derivatives in the conclusion while an arbitrary real positive number would suffice), but is enough for our purposes. We recall that a real number $\omega$ is called a Diophantine number of exponent $\theta$ if there exists a constant $C > 0$ such that $|\omega - p/q| \ge C/q^{\theta+1}$ for all $p \in \mathbb{Z}$, $q \in \mathbb{N}$.

Theorem 4.9. Let $f : [0, 1] \times \mathbb{T}^1 \to [0, 1] \times \mathbb{T}^1$ be an exact symplectic $C^l$ map, with $l \ge 4$. Assume that $f = f_0 + \delta f_1$, where $f_0(I, \psi) = (I, \psi + A(I))$, $A$ is $C^l$, $\left|\frac{dA}{dI}\right| \ge M$, and $\|f_1\|_{C^l} \le 1$. Then, if $\delta^{1/2} M^{-1} = \rho$ is sufficiently small, for a set of $\omega$ of Diophantine numbers of exponent $\theta = 5/4$, we can find invariant tori which are the graph of $C^{l-3}$ functions $u_\omega$, the motion on them is $C^{l-3}$ conjugate to the rotation by $\omega$, and $\|u_\omega\|_{C^{l-3}} \le \mathrm{cte.}\,\delta^{1/2}$, and the tori cover the whole annulus except a set of measure smaller than $\mathrm{cte.}\,M^{-1}\delta^{1/2}$. Moreover, if $l \ge 6$ we can find expansions $u_\omega = u^0_\omega + \delta u^1_\omega + r_\omega$, with $\|r_\omega\|_{C^{l-4}} \le \mathrm{cte.}\,\delta^2$, and $\|u^1_\omega\|_{C^{l-4}} \le \mathrm{cte.}$

Applying Theorem 4.9 to the map $f^\varepsilon_\varepsilon$ given in (4.9), we obtain KAM invariant tori of the system (4.3), as long as this map is $C^l$ with $l := r - m - 4 \ge 6$. Note that according to (4.9), the frequencies of $f^\varepsilon_0$ are roughly $(1/\varepsilon)A(I, \varepsilon)$, with $A(I, 0)$ the frequencies of the unperturbed Hamiltonian flow. Hence, the $C^{l-4}$ distance between invariant tori is not bigger than $\varepsilon^{m/2+1}$. We note that these invariant circles for $f^\varepsilon_\varepsilon$ correspond to invariant two dimensional tori for the extended flow. An invariant circle for $f^\varepsilon_\varepsilon$ with frequency $\omega$ corresponds to a two dimensional invariant torus for $\tilde\Phi_{t,\varepsilon}$ with frequency $(\omega, \varepsilon)$.

Remark 4.10. Note that these KAM tori that we have produced for the map $f^\varepsilon_\varepsilon$ are really whiskered tori for the extended flow $\tilde\Phi_{t,\varepsilon}$. They could have been produced also by appealing to the Graff-Zehnder Theorem.


In particular, proceeding as in Zehnder [Zeh75, Zeh76] we can obtain a normal form for the Hamiltonian $H_\varepsilon(p, q, \varepsilon s)$ in a neighborhood of these KAM tori:

$G(I, a, \varphi, s, z_s, z_u) = \omega I + a + \frac{\Gamma}{2} I^2 + \langle z_s, \Omega(\varphi, s) z_u \rangle + g(I, \varphi, s, z_s, z_u)$.    (4.10)

Such normal forms are commonly used in the study of inclination lemmas for whiskered tori. However, we will perform our study of inclination lemmas in the normal form for whiskered tori with one dimensional whiskers introduced in [FM98, Sect. 4.1]. This normal form does not require that the motion on the tori satisfies Diophantine conditions (only that it is an irrational rotation) and requires much less regularity.

Remark 4.11. When the metric and the potential are $C^\infty$ or $C^\omega$, even if the argument using the hyperbolic invariant manifold only allows us to construct finitely differentiable tori, appealing to the results in [Zeh75, Zeh76], we can conclude that the tori we constructed are indeed $C^\infty$ or $C^\omega$.

Remark 4.12. Note that KAM tori produced by Theorem 4.9 are of codimension 1 inside $\tilde\Lambda_\varepsilon$. If we choose a submanifold whose boundary consists of two KAM tori, this submanifold will be an invariant manifold for the extended flow. The results of hyperbolic perturbation theory of Appendix A can be extended to include uniqueness as is explained in observation 4 after Theorem A.14.

Once we have the existence of the invariant tori of system (4.3), it is worthwhile to obtain some explicit approximations for them in the coordinate system given by the phases $(\varphi, s)$ and the value of the Hamiltonian $H_\varepsilon$. (Note that since the Hamiltonian $H_\varepsilon$ is close to $J^2/2$, $(H_\varepsilon, \varphi, s)$ constitute a good system of coordinates.) We will find it convenient to write $U(q, \tau) = \bar U(\tau) + \tilde U(q, \tau)$, where the functions $\bar U(\tau)$ and $\tilde U(q, \tau)$ are given by

$\bar U(\tau) = \int_0^1 U(\Lambda_{1/2}(\varphi), \tau)\,d\varphi$, $\quad \tilde U(q, \tau) = U(q, \tau) - \bar U(\tau)$.    (4.11)

This decomposition is natural because of the different scales involved in the problem. We are separating explicitly the average on the fast variables. We call attention to the fact that $\bar U(\tau)$, being independent of $q$, does not affect the dynamics.

Lemma 4.13. Let $\omega$ be one of the frequencies allowed in Theorem 4.9. Then, in the coordinate system $(H_\varepsilon, \varphi, s)$, we can write the torus of frequency $(\omega, \varepsilon)$ as the graph of a function $G(\varphi, s; \varepsilon)$. Moreover, we can write

$G(\varphi, s; \varepsilon) = \frac{\omega^2}{2} + \varepsilon^2 \bar U(\varepsilon s) + \varepsilon^3 \tilde g(\varphi, s; \varepsilon) + O_{C^{l-4}}(\varepsilon^4)$,    (4.12)

where $\tilde g(\varphi, \tau; \varepsilon)$ is a function 1-periodic in $(\varphi, \tau)$ which verifies

$\omega D_1 \tilde g(\varphi, \tau; \varepsilon) + \varepsilon D_2 \tilde g(\varphi, \tau; \varepsilon) = D_2 \tilde U(\Lambda^q_{1/2}(\varphi), \tau) + O_{C^{l-4}}(\varepsilon^3)$    (4.13)

and $\|\tilde g(\cdot, \cdot; \varepsilon)\|_{C^{l-4}}$ is bounded uniformly in $\varepsilon$. Furthermore, we can choose $\tilde g$ in such a way that $\tilde g = D_2 \tilde h$. This $\tilde h$ satisfies (obviously)

$\omega D_1 \tilde h(\varphi, \tau; \varepsilon) + \varepsilon D_2 \tilde h(\varphi, \tau; \varepsilon) = \tilde U(\Lambda^q_{1/2}(\varphi), \tau) + O_{C^{l-4}}(\varepsilon^3)$    (4.14)

and $\|\tilde h(\cdot, \cdot; \varepsilon)\|_{C^{l-4}}$ is bounded uniformly in $\varepsilon$.

We call attention to the fact that the functions $\tilde g$, $\tilde h$ are not unique. On the other hand, as we will see later, the ambiguities only arise in subdominant terms.

Proof. We will first present a formal proof and then we will work out the relation with perturbative methods such as Lindstedt–Poincaré, which are somewhat subtle since the problem involves singular perturbations. (One frequency is much larger than the other.) The KAM Theorem 4.9 provides us with parameterizations $(p(\psi, \varepsilon s), q(\psi, \varepsilon s), s)$ of the invariant torus in the original variables $(p, q)$, in terms of the internal variables $\psi, s$ which satisfy $\dot\psi = \omega$, $\dot s = 1$. These parameterizations are $O_{C^{l-4}}(\varepsilon^3)$ close to constant when expressed in terms of the averaged variables. We denote by

$G(\psi, \varepsilon s; \varepsilon) = H_\varepsilon(p(\psi, \varepsilon s), q(\psi, \varepsilon s), \varepsilon s)$    (4.15)

and note that the derivative with respect to the flow of this equation is

$\frac{d}{dt}\, G \circ \Phi_{t,\varepsilon}\Big|_{t=0} = \omega D_1 G(\psi, \varepsilon s; \varepsilon) + \varepsilon D_2 G(\psi, \varepsilon s; \varepsilon) = \varepsilon^3 D_2 U(q(\psi, \varepsilon s; \varepsilon), \varepsilon s)$.    (4.16)

We note that the first two terms of the averaging transformations are $J^2/2 + \varepsilon^2 \bar U(\varepsilon s)$ and that, as a consequence of the hyperbolic perturbation theory, the averaging method and the KAM theory, the KAM tori are close to an orbit $\Lambda_E$, with $E = J^2/2$, of the unperturbed system. If we perform this substitution in (4.16), we obtain the desired result. □

Remark 4.14. The previous calculation can also be understood as a modification of the Lindstedt–Poincaré method. Since the Lindstedt–Poincaré method is a commonly used tool in singularly perturbed systems, we thought it could be interesting to some readers to develop a comparison. We refer to [Gal94] for a survey of Lindstedt methods for analytic systems that includes a treatment of singularly perturbed systems through the use of tree-like diagrams. Since we are considering a system with two time scales, the most standard method, which fixes the frequency and then seeks parameterizations of tori with the prescribed frequency as expansions in powers of $\varepsilon$, cannot work, since the frequency dependence on $\varepsilon$ will cause the composed frequency to go through resonances on which we do not expect tori to exist. Nevertheless, we will see that it is possible to compute systematically parameterizations $p(\psi, \varepsilon s; \varepsilon)$, $q(\psi, \varepsilon s; \varepsilon)$ that satisfy the equations of motion to a very high accuracy and whose coefficients are, furthermore, of moderate size. Once we have that, the Newton method started on them will lead to a true solution which is close to these approximate solutions. (See [Zeh75, Zeh76].) If we seek a parameterization of the torus with frequency vector $(\omega, \varepsilon)$, as above, we obtain a system of equations

$[\omega D_1 + \varepsilon D_2]\, p(\psi, \varepsilon s; \varepsilon) = -D_q H_\varepsilon(p(\psi, \varepsilon s; \varepsilon), q(\psi, \varepsilon s; \varepsilon), s)$,
$[\omega D_1 + \varepsilon D_2]\, q(\psi, \varepsilon s; \varepsilon) = D_p H_\varepsilon(p(\psi, \varepsilon s; \varepsilon), q(\psi, \varepsilon s; \varepsilon), s)$.    (4.17)

Even if, as we will soon see, it is a bad idea to try to obtain solutions that are just powers of $\varepsilon$ with coefficients that are functions only of the other variables, it is quite feasible to obtain expansions in powers of $\varepsilon$ with coefficients that are functions of all the variables, including $\varepsilon$, which solve (4.17) up to a high order power in $\varepsilon$ and such that all the coefficients are of order 1. These coefficients are not unique since the term of a certain order is only defined up to terms of higher order. The main observation is that, given $\Gamma$ with $\int_{\mathbb{T}} \Gamma(\psi, \varepsilon s; \varepsilon)\,d\psi = 0$ and smooth, the equation for $\eta$

$[\omega D_1 + \varepsilon D_2]\,\eta(\psi, \varepsilon s; \varepsilon) = \Gamma(\psi, \varepsilon s; \varepsilon)$    (4.18)

can be satisfied up to high order error in $\varepsilon$ by functions whose size is comparable to $\Gamma$. As is well known, this is the homology equation, and the Lindstedt series can be computed by recursively solving this equation on expressions that involve only previously computed quantities. If we try to solve (4.18) using Fourier analysis, we find that it is equivalent to

$\hat\eta_{k_1,k_2} = \big(2\pi i(\omega k_1 + \varepsilon k_2)\big)^{-1}\,\hat\Gamma_{k_1,k_2}$.    (4.19)
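The formula (4.19), restricted to the modes with $|k| \le \varepsilon^{-1/2}$ that this remark retains, can be sketched in a few lines of Python. The frequency `omega`, the value of `eps` and the sample coefficients in `gamma_hat` are hypothetical; the dictionary representation of Fourier coefficients is only one convenient choice.

```python
import math


def truncated_solution(gamma_hat, omega, eps):
    """Apply (4.19) to the modes with |k| <= eps**(-1/2); all other modes stay zero.

    gamma_hat maps (k1, k2) to a complex Fourier coefficient. Modes with k1 = 0
    are assumed absent, as for a function with zero average in psi.
    """
    cutoff = 1.0 / math.sqrt(eps)
    eta_hat = {}
    for (k1, k2), coeff in gamma_hat.items():
        if math.hypot(k1, k2) <= cutoff:
            eta_hat[(k1, k2)] = coeff / (2j * math.pi * (omega * k1 + eps * k2))
    return eta_hat


def smallest_divisor(omega, eps):
    """min |omega*k1 + eps*k2| over integer modes with k1 != 0 and |k| <= eps**(-1/2)."""
    cutoff = 1.0 / math.sqrt(eps)
    n = int(cutoff)
    return min(
        abs(omega * k1 + eps * k2)
        for k1 in range(-n, n + 1)
        for k2 in range(-n, n + 1)
        if k1 != 0 and math.hypot(k1, k2) <= cutoff
    )


omega, eps = 1.3, 0.01  # hypothetical frequency and small parameter
gamma_hat = {(1, 0): 1.0, (1, -3): 0.5j, (2, 40): 0.1}  # made-up coefficients
eta_hat = truncated_solution(gamma_hat, omega, eps)
```

For $\omega > \sqrt{\varepsilon}$ the retained divisors satisfy $|\omega k_1 + \varepsilon k_2| \ge \omega - \sqrt{\varepsilon}$, since $|k_1| \ge 1$ and $|k_2| \le \varepsilon^{-1/2}$; the mode $(2, 40)$ in the sample data lies outside the cutoff and is simply left in the error term.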

If we choose $\eta$ in such a way that its Fourier coefficients with $|k| \le \varepsilon^{-1/2}$ are obtained according to (4.19) and the other ones are zero, we note that:

a) If $\Gamma$ is $C^m$ then

$|\hat\Gamma_{k_1,k_2}| \le C|k|^{-m}\|\Gamma\|_{C^m}$.

Hence, $\eta$ solves Eq. (4.18) up to an error whose $C^l$ norm can be bounded by $C\|\Gamma\|_{C^m} \cdot \sum_{|k| \ge \varepsilon^{-1/2}} |k|^{l-m}$, which can, in turn, be bounded by $C\|\Gamma\|_{C^m}\,\varepsilon^{(-l+m-2)/2}$ when $l - m + 1 < -1$.

b) Since $\Gamma$ has no Fourier coefficients with $k_1 = 0$, the denominators of (4.19) are uniformly bounded from below and we have, using the same estimates as above, $\|\eta\|_{C^l} \le C\|\Gamma\|_{C^m}$ when $l - m + 1 < -1$.

By repeating this construction in all the steps in which we have to solve (4.18) in the calculation of the Lindstedt series, we obtain functions of size bounded uniformly in $\varepsilon$ which satisfy (4.17) up to an error which can be bounded by a power of $\varepsilon$. This power can be made arbitrarily high if we are considering systems that are differentiable enough. Note that these approximate solutions, in contrast to those of the standard Lindstedt method, are not unique since they include choices such as the level of truncation (we took $|k| \le \varepsilon^{-1/2}$ but could have made other choices). The above procedure makes it clear that it is a bad idea to solve Eqs. (4.18) exactly. If we considered in (4.19) the coefficients with $|k| \approx \varepsilon^{-1}$ or bigger we would indeed have to consider small divisors. This is a reflection of the fact that there is no number $\omega$ such that $(\omega, \varepsilon)$ is a nonresonant vector for an interval of $\varepsilon$ around zero. Since the goal of this equation was to eliminate terms from the perturbation, we have decided to respect those modes corresponding to $|k| \ge \varepsilon^{-1/2}$, since the regularity assumptions guarantee that they are small. Once we have parameterizations that solve (4.17) with a very small error, we can apply an appropriate version of the KAM theorem to produce an exact solution. Indeed, this Lindstedt method is an alternative to the averaging method that we used in the main text. We emphasize that for the applications that we have in mind here, it suffices to compute only a finite number of terms to obtain approximations to $O(\varepsilon^n)$. Hence, there is no need to discuss convergence and we only need that the functions involved are finitely differentiable.

4.5. The perturbed outer map. Theoretical results. The goal of this section is to define and to compute the outer map $S$ which characterizes intersections of stable and unstable manifolds for the perturbed flow. This will be done in a very similar way to the one used to define the outer map $S_0$ for the geodesic flow in Sect. 3.6. We recall that, according to Theorem 4.2 and Remark 4.3, when we consider the perturbed flow (4.3) in the extended phase space, we can find $\tilde\Lambda_\varepsilon$, $W^{s,u}(\tilde\Lambda_\varepsilon)$, $\tilde\gamma_\varepsilon$, continuing those of the unperturbed system. Then, given $\tilde x_+, \tilde x_- \in \tilde\Lambda_\varepsilon$, we say $\tilde x_+ = S(\tilde x_-)$ when

$W^s(\tilde x_+) \cap W^u(\tilde x_-) \cap \tilde\gamma_\varepsilon \neq \emptyset$.

(4.20)

That is, there exists z˜ ∈ γ˜ε such that ˜ t,ε (˜z) → 0 as t → ±∞, ˜ t,ε (x˜± ), 8 dist 8

(4.21)

which, by the hyperbolicity properties is equivalent to ˜ t,ε (˜z) ≤ cte. e−βt for ˜ t,ε (x˜± ), 8 dist 8

(4.22)

± t ≥ 0.

Note that, if we write $\tilde x_\pm = (x_\pm, s_\pm)$, $\tilde z = (z, s_z)$, since the flow (4.3) satisfies $\dot s = 1$, we see that (4.21) implies $s_+ = s_- = s_z$, which we will henceforth denote by $s$.

Now, we want to argue that the map $S$ is indeed well defined and that it is smooth in the $\tilde x_-$ argument. If we fix $\varepsilon$ small enough, we see that, because of the differentiability of $W^{s,u}(\tilde x_-)$ with respect to $\tilde x_-$ and the transversality of $W^{s,u}(\tilde\Lambda)$ at $\tilde\gamma$, the condition (4.20) defines $\tilde z$ as a local function of $\tilde x_-$. (Note that we have several $\tilde z$ that satisfy (4.21) so that $\tilde z(\tilde x_-)$ cannot be defined as a function.) Using that, we can define $\tilde x_+$ as a local function of $\tilde x_-$. As in Sect. 3.6 we argue that the monodromy of $\tilde x_+(\tilde x_-)$ is trivial (even if that of $\tilde z(\tilde x_-)$ is not). We just observe that if we could find two different $\tilde x_+, \tilde x_+^* \in \tilde\Lambda$ which satisfy (4.20) for the same $\tilde x_-$, we should have $W^s(\tilde x_+) \cap W^s(\tilde x_+^*) \neq \emptyset$, which is impossible.

In order to perform explicit calculations, we will express the map $S$ in terms of the explicit coordinates that we have introduced before. We will use the maps $C_\varepsilon^s : \Lambda_\varepsilon^s \to \Lambda$ introduced at the end of Sect. 4.3, the coordinate system $(J, \varphi)$ for $\Lambda$ introduced in Sect. 3.5 and the map $F$ given by the perturbation theory for normally hyperbolic manifolds (Theorem 4.2). We introduce the coordinate system $K$ by:
\[
\tilde x = (x, s) = \big(F\big((C_\varepsilon^s)^{-1}(J, \varphi), s, \varepsilon^2\big), s\big) = \big(K(J, \varphi, s, \varepsilon^2), s\big). \tag{4.23}
\]

Geometric Approach to the Existence of Orbits with Unbounded Energy

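In these coordinates, if we consider $\tilde x_+ = S(\tilde x_-)$ connected through a point $\tilde z$ verifying (4.22), and set $\tilde x_\pm = (x_\pm, s)$, with $x_\pm = K(J_\pm, \varphi_\pm, s, \varepsilon^2)$, we have $\varphi_\pm = \varphi_0 + a_\pm + O(\varepsilon^2)$, $J_\pm = J_0 + O(\varepsilon^2)$, where $a_\pm$ were introduced in Theorem 2.1, for some $\varphi_0 \in \mathbb{R}$, $J_0 \in \mathbb{R}$. Moreover, we have
\[
\begin{aligned}
\tilde\Phi_{t,\varepsilon}(\tilde x_\pm) &= \Big(\Lambda_E\Big(t + \frac{\varphi_0 + a_\pm}{\sqrt{2E}}\Big) + O(\varepsilon^2),\ s + t\Big),\\
\tilde\Phi_{t,\varepsilon}(\tilde z) &= \Big(\gamma_E\Big(t + \frac{\varphi_0}{\sqrt{2E}}\Big) + O(\varepsilon^2),\ s + t\Big)
\end{aligned}
\tag{4.24}
\]
with $E = J_0^2/2$. In the formulas (4.24), the $O(\varepsilon^2)$ is uniform for $t \in \mathbb{R}$. This follows from the hyperbolicity theory and Remark 4.3.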

4.6. The perturbed outer map. The Poincaré function. The main goal of this section is to define and to compute a function which characterizes and quantifies the existence of heteroclinic intersections between the KAM tori for the inner map (whiskered tori for the perturbed flow) obtained in Sect. 4.4.2. That is, we will need to characterize when, given KAM tori $\tau_1$, $\tau_2$ in $\tilde\Lambda_\varepsilon$, we have that $S(\tau_1)$ is transversal to $\tau_2$ in $\tilde\Lambda_\varepsilon$.

The main idea is to use the fact that $(H_\varepsilon, \varphi, s)$ constitutes a good system of coordinates in the manifold $\tilde\Lambda_\varepsilon$. The KAM tori as given in Lemma 4.13 correspond very approximately to $H_\varepsilon = \mathrm{cte.}$ and indeed, we have expressions on their dependence. If $\tilde x_-$ lies on a KAM torus $\tau_1$ we will be interested in computing $H_\varepsilon(\tilde x_+) - H_\varepsilon(y)$, where $y$ is the projection of $\tilde x_+ = S(\tilde x_-)$ on the KAM torus $\tau_1$ (see Fig. 1). The function $H_\varepsilon(\tilde x_+) - H_\varepsilon(y)$ will be our desired measurement. Its main term will be the Melnikov function (which is the gradient of the Melnikov potential).

Following [Tre94], we will compute $H_\varepsilon(\tilde x_+) - H_\varepsilon(y)$ as $H_\varepsilon(\tilde x_+) - H_\varepsilon(\tilde x_-) + H_\varepsilon(\tilde x_-) - H_\varepsilon(y)$. The first term will be computed by means of a classical calculation that goes back to Poincaré. Indeed, since $\tilde x_+$ and $\tilde x_-$ are connected through an orbit, we can use the fundamental theorem of calculus and obtain the difference by integrating the derivative and taking appropriate limits. This will be done in detail in Lemma 4.15. The term $H_\varepsilon(\tilde x_-) - H_\varepsilon(y)$ can be computed using the explicit expansions of KAM tori that we computed in Lemma 4.13.

For the system at hand, we can take advantage of the slow dynamics and we can use the fact that the point $\tilde\Phi_{1/\omega,\varepsilon}(x_-) \equiv u$ has the same phases $(\varphi, \varepsilon s)$ as $y$ up to order $\varepsilon$. Using this fact, in Lemma 4.19 we will give an explicit formula for the leading term of the Melnikov potential in terms of the potential $U$ and the unperturbed geodesics, which will be called the Poincaré function, with no need to solve any small divisors equation to obtain $H_\varepsilon(y)$. This explicit expression will be quite important to establish that, for high enough energies (in the scaled variables, for small enough $\varepsilon$), the KAM tori have transversal heteroclinic intersections.

Fig. 1. Illustration of the perturbed tori and the outer map (points marked in the figure: $\tilde x_-$, $\tilde x_+ = S(\tilde x_-)$, $u$, $y$)

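Lemma 4.15. Let $\tilde x_-$ and $\tilde x_+$ be two points on $\tilde\Lambda_\varepsilon$ such that $\tilde x_+ = S(\tilde x_-)$. Then
\[
\begin{aligned}
H_\varepsilon(\tilde x_+) - H_\varepsilon(\tilde x_-) = \varepsilon^3 \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_{-T_1}^{T_2} dt\, D_2\tilde U\Big(\gamma_E^q\Big(t + \frac{\varphi_0}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&- \int_{-T_1}^{0} dt\, D_2\tilde U\Big(\Lambda_E^q\Big(t + \frac{\varphi_0 + a_-}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&- \int_{0}^{T_2} dt\, D_2\tilde U\Big(\Lambda_E^q\Big(t + \frac{\varphi_0 + a_+}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\bigg] + O(\varepsilon^5),
\end{aligned}
\tag{4.25}
\]
where
\[
\tilde x_\pm = (x_\pm, s) = \big(K(J_\pm, \varphi_\pm, s, \varepsilon^2), s\big) = \Big(\Lambda_E\Big(\frac{\varphi_0 + a_\pm}{\sqrt{2E}}\Big) + O(\varepsilon^2),\ s\Big)
\]
for some $\varphi_0 \in \mathbb{R}$, $J_0 \in \mathbb{R}$, where $E = J_0^2/2$, $K$ is introduced in (4.23), and $\tilde U$, introduced in (4.11), is the forcing potential minus its average on the periodic orbit $\Lambda_{1/2}$.

Proof. Recall that if a trajectory $\tilde\lambda(t) = (\lambda^p(t), \lambda^q(t), s + t)$ satisfies (4.3) then:
\[
\frac{d}{dt} H_\varepsilon \circ \tilde\lambda(t) = \varepsilon^3 D_2 U(\lambda^q(t), \varepsilon s + \varepsilon t).
\]
Therefore, for any two trajectories $\tilde\lambda = (\lambda^p, \lambda^q, s + t)$, $\tilde\mu = (\mu^p, \mu^q, r + t)$ of (4.3), we have, by the fundamental theorem of calculus,
\[
\begin{aligned}
H_\varepsilon(\tilde\lambda(T)) - H_\varepsilon(\tilde\mu(T)) = {}& H_\varepsilon(\tilde\lambda(0)) - H_\varepsilon(\tilde\mu(0))\\
&+ \varepsilon^3 \int_0^T dt\, D_2 U(\lambda^q(t), \varepsilon s + \varepsilon t) - \varepsilon^3 \int_0^T dt\, D_2 U(\mu^q(t), \varepsilon r + \varepsilon t).
\end{aligned}
\tag{4.26}
\]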

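As $\tilde x_+ = S(\tilde x_-)$, we know that there exists $\tilde z \in T^*\mathbb{T}^2 \times T\,\mathbb{T}^1$, $T = 1/\varepsilon$, such that the trajectories $\tilde\gamma_{(\varepsilon)}(t) = \tilde\Phi_{t,\varepsilon}(\tilde z)$ and $\tilde\Lambda_{\pm,(\varepsilon)}(t) = \tilde\Phi_{t,\varepsilon}(\tilde x_\pm)$ verify (4.22). Now we can use (4.26) and, by (4.22), taking limits at $\pm\infty$ as appropriate,
\[
\begin{aligned}
0 &= H_\varepsilon(\tilde x_+) - H_\varepsilon(\tilde z) + \lim_{T_2\to\infty} \varepsilon^3 \int_0^{T_2} dt\, \Big[D_2 U\big(\Lambda^q_{+,(\varepsilon)}(t), \varepsilon s + \varepsilon t\big) - D_2 U\big(\gamma^q_{(\varepsilon)}(t), \varepsilon s + \varepsilon t\big)\Big],\\
0 &= H_\varepsilon(\tilde x_-) - H_\varepsilon(\tilde z) + \lim_{T_1\to\infty} \varepsilon^3 \int_0^{-T_1} dt\, \Big[D_2 U\big(\Lambda^q_{-,(\varepsilon)}(t), \varepsilon s + \varepsilon t\big) - D_2 U\big(\gamma^q_{(\varepsilon)}(t), \varepsilon s + \varepsilon t\big)\Big].
\end{aligned}
\]
Subtracting these two equations we obtain:
\[
\begin{aligned}
H_\varepsilon(\tilde x_+) - H_\varepsilon(\tilde x_-) = -\varepsilon^3 \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_0^{T_2} dt\, \Big[D_2 U\big(\Lambda^q_{+,(\varepsilon)}(t), \varepsilon s + \varepsilon t\big) - D_2 U\big(\gamma^q_{(\varepsilon)}(t), \varepsilon s + \varepsilon t\big)\Big]\\
&- \int_0^{-T_1} dt\, \Big[D_2 U\big(\Lambda^q_{-,(\varepsilon)}(t), \varepsilon s + \varepsilon t\big) - D_2 U\big(\gamma^q_{(\varepsilon)}(t), \varepsilon s + \varepsilon t\big)\Big]\bigg].
\end{aligned}
\tag{4.27}
\]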
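As a bookkeeping check, the bracket in (4.27) can be regrouped into the three integrals that appear in (4.25). The shorthand $F_\pm$, $F_\gamma$ below is introduced only for this sketch and is not notation from the text; the manipulation is carried out for finite $T_1$, $T_2$, before the limit is taken.

```latex
% Shorthand assumed for this check only:
%   F_\pm(t)    = D_2 U(\Lambda^q_{\pm,(\varepsilon)}(t), \varepsilon s + \varepsilon t)
%   F_\gamma(t) = D_2 U(\gamma^q_{(\varepsilon)}(t),      \varepsilon s + \varepsilon t)
\begin{align*}
-\Big[\int_0^{T_2} (F_+ - F_\gamma)\,dt - \int_0^{-T_1} (F_- - F_\gamma)\,dt\Big]
 &= \int_0^{T_2} F_\gamma\,dt - \int_0^{T_2} F_+\,dt
    + \int_0^{-T_1} F_-\,dt - \int_0^{-T_1} F_\gamma\,dt \\
 &= \int_0^{T_2} F_\gamma\,dt + \int_{-T_1}^{0} F_\gamma\,dt
    - \int_0^{T_2} F_+\,dt - \int_{-T_1}^{0} F_-\,dt \\
 &= \int_{-T_1}^{T_2} F_\gamma\,dt - \int_{-T_1}^{0} F_-\,dt - \int_0^{T_2} F_+\,dt .
\end{align*}
```

The last line has the same structure as the bracket in (4.25): one integral along the connecting orbit over $[-T_1, T_2]$, minus one integral along each asymptotic orbit over its own half-line.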

By (4.22), these limits are reached uniformly in $\varepsilon$. (They are reached exponentially fast and the constants are uniform in $\varepsilon$.) We also note that the dependence of the trajectories on $\varepsilon$ is uniform on compact intervals of time. Hence, at the expense only of introducing an error of higher order in $\varepsilon$, we can substitute in (4.27) for $\Lambda_{\pm,(\varepsilon)}$ and $\gamma_{(\varepsilon)}$ the unperturbed orbits given by (4.24). We note that the right-hand side of (4.27) is linear in $U$. Hence if we use the decomposition $U(q, \tau) = \bar U(\tau) + \tilde U(q, \tau)$ given in (4.11), and observe that computing the right-hand side of (4.27) in $\bar U$ gives zero, we obtain (4.25). □

Lemma 4.16. Let $y$ be a point with the phases of $\tilde x_+$ and which lies on the invariant torus for the perturbed flow which contains $\tilde x_-$, where
\[
\tilde x_+ = \big(K(J_+, \varphi_+, \varepsilon s, \varepsilon^2), s\big) = \Big(\Lambda_E\Big(\frac{\varphi_0 + a_+}{\sqrt{2E}}\Big) + O(\varepsilon^2),\ s\Big),
\]
with $E = J_0^2/2$. Then:
\[
\begin{aligned}
H_\varepsilon(\tilde x_+) - H_\varepsilon(y) = \varepsilon^3 \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_{-T_1}^{T_2} dt\, D_2\tilde U\Big(\gamma_E^q\Big(t + \frac{\varphi_0}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&- \tilde g\big(\varphi_0 + a_+ + \sqrt{2E}\, T_2,\ \varepsilon s + \varepsilon T_2;\ \varepsilon\big)\\
&+ \tilde g\big(\varphi_0 + a_- - \sqrt{2E}\, T_1,\ \varepsilon s - \varepsilon T_1;\ \varepsilon\big)\bigg] + O(\varepsilon^5),
\end{aligned}
\tag{4.28}
\]
where $\tilde g$ is the function given in Lemma 4.13 verifying (4.13), associated to the invariant torus of the perturbed flow which contains $\tilde x_-$.

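Proof. We use Lemma 4.15 for $H_\varepsilon(\tilde x_+) - H_\varepsilon(\tilde x_-)$ and Lemma 4.13 for $H_\varepsilon(\tilde x_-) - H_\varepsilon(y)$:
\[
\begin{aligned}
H_\varepsilon(\tilde x_+) - H_\varepsilon(y) &= H_\varepsilon(\tilde x_+) - H_\varepsilon(\tilde x_-) + H_\varepsilon(\tilde x_-) - H_\varepsilon(y)\\
&= \varepsilon^3 \lim_{(T_1,T_2)\to\infty} \bigg[
\int_{-T_1}^{T_2} dt\, D_2\tilde U\Big(\gamma_E^q\Big(t + \frac{\varphi_0}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&\qquad - \int_{-T_1}^{0} dt\, D_2\tilde U\Big(\Lambda_E^q\Big(t + \frac{\varphi_0 + a_-}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&\qquad - \int_{0}^{T_2} dt\, D_2\tilde U\Big(\Lambda_E^q\Big(t + \frac{\varphi_0 + a_+}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&\qquad - \tilde g(\varphi_0 + a_+, \varepsilon s; \varepsilon) + \tilde g(\varphi_0 + a_-, \varepsilon s; \varepsilon)\bigg] + O(\varepsilon^5).
\end{aligned}
\tag{4.29}
\]
Now, calling $A_-(t) = \tilde g\big(\varphi_0 + a_- + \sqrt{2E}\, t, \varepsilon s + \varepsilon t; \varepsilon\big)$, we have, using the functional equation (4.13) verified by $\tilde g$:
\[
\begin{aligned}
\dot A_-(t) &= \sqrt{2E}\, D_1\tilde g\big(\varphi_0 + a_- + \sqrt{2E}\, t, \varepsilon s + \varepsilon t; \varepsilon\big) + \varepsilon D_2\tilde g\big(\varphi_0 + a_- + \sqrt{2E}\, t, \varepsilon s + \varepsilon t; \varepsilon\big)\\
&= D_2\tilde U\big(\Lambda_{1/2}\big(\sqrt{2E}\, t + \varphi_0 + a_-\big), \varepsilon s + \varepsilon t\big) + O(\varepsilon^3)\\
&= D_2\tilde U\Big(\Lambda_E\Big(t + \frac{\varphi_0 + a_-}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big) + O(\varepsilon^3),
\end{aligned}
\]
and a similar identity holds for $A_+(t) = \tilde g\big(\varphi_0 + a_+ + \sqrt{2E}\, t, \varepsilon s + \varepsilon t; \varepsilon\big)$, which verifies:
\[
\dot A_+(t) = D_2\tilde U\Big(\Lambda_E\Big(t + \frac{\varphi_0 + a_+}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big) + O(\varepsilon^3).
\]
Then, using the fundamental theorem of calculus, we have for any $T$:
\[
A_\pm(T) - A_\pm(0) = \int_0^T dt\, D_2\tilde U\Big(\Lambda_E\Big(t + \frac{\varphi_0 + a_\pm}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big) + O(\varepsilon^3),
\]
and using these identities to express the second and third integrals in (4.29) with $T_1$ and $T_2$ we obtain formula (4.28). □

Remark 4.17. The function provided by Lemma 4.16:
\[
\begin{aligned}
M(\varphi_0, \varepsilon s, E; \varepsilon) = \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_{-T_1}^{T_2} dt\, D_2\tilde U\Big(\gamma_E^q\Big(t + \frac{\varphi_0}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&- \tilde g\big(\varphi_0 + a_+ + \sqrt{2E}\, T_2,\ \varepsilon s + \varepsilon T_2;\ \varepsilon\big)\\
&+ \tilde g\big(\varphi_0 + a_- - \sqrt{2E}\, T_1,\ \varepsilon s - \varepsilon T_1;\ \varepsilon\big)\bigg]
\end{aligned}
\tag{4.30}
\]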

is usually called the Melnikov function associated to the perturbed torus. As
\[
H_\varepsilon(\tilde x_+) - H_\varepsilon(y) = \varepsilon^3 M(\varphi_0, \varepsilon s, E; \varepsilon) + O(\varepsilon^5), \tag{4.31}
\]

$M$ is the leading term of the function we will use to study the existence of heteroclinic intersections among tori. Even if we will not be concerned with homoclinic intersections, we note that the non-degenerate zeros of this function lead to homoclinic intersections.

Remark 4.18. Note that in (4.30), in general, neither the integral nor the other terms reach a limit as $T_1, T_2 \to \infty$, but rather oscillate quasiperiodically. Only their combination converges. The meaning of this phenomenon can be clearly understood when we realize that $\tilde g$ measures the displacement of the invariant torus under the perturbation. If we are interested in the intersections of the manifolds of perturbed tori, we need to consider the changes induced in the stable manifolds of the perturbed tori, not on the unperturbed ones. We warn the reader that in many places in the literature, this term is omitted. This omission is incorrect, unless special circumstances (e.g. symmetries, that the perturbation vanishes on the torus, etc.) justify it.

As a matter of fact, the Melnikov function is the derivative of the Melnikov potential (see [DR97]) defined by:
\[
\begin{aligned}
L(\varphi_0, \varepsilon s, E; \varepsilon) = \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_{-T_1}^{T_2} dt\, \tilde U\Big(\gamma_E^q\Big(t + \frac{\varphi_0}{\sqrt{2E}}\Big), \varepsilon s + \varepsilon t\Big)\\
&- \tilde h\big(\varphi_0 + a_+ + \sqrt{2E}\, T_2,\ \varepsilon s + \varepsilon T_2;\ \varepsilon\big)\\
&+ \tilde h\big(\varphi_0 + a_- - \sqrt{2E}\, T_1,\ \varepsilon s - \varepsilon T_1;\ \varepsilon\big)\bigg],
\end{aligned}
\tag{4.32}
\]
where $D_2\tilde h = \tilde g$ and $\tilde h$ verifies (4.14). The Melnikov potential satisfies the following properties:

1. $M(\varphi_0, \varepsilon s, E; \varepsilon) = D_2 L(\varphi_0, \varepsilon s, E; \varepsilon)$. Note that the uniform convergence of the difference of two integrals by (4.22) readily justifies the computation of derivatives by computing the derivative of each term separately and also taking derivatives by taking them under the integral sign.
2. $L(\varphi_0, \varepsilon s, E; \varepsilon)$ is $1/\varepsilon$-periodic in $s$.
3. For any $u \in \mathbb{R}$ one has:
\[
L\big(\varphi_0 + \sqrt{2E}\, u,\ \varepsilon s + \varepsilon u,\ E;\ \varepsilon\big) = L(\varphi_0, \varepsilon s, E; \varepsilon),
\]
and, taking $u = -\varphi_0/\sqrt{2E}$:
\[
L(\varphi_0, \varepsilon s, E; \varepsilon) = L\big(0,\ \varepsilon(s - \varphi_0/\sqrt{2E}),\ E;\ \varepsilon\big),
\]
that is, $L$ is a $\sqrt{2E}/\varepsilon$-periodic function of $\varphi_0$.

In the following lemma we are going to give an approximation of the Melnikov potential $L(\varphi_0, \varepsilon s, E; \varepsilon)$ in terms of a function $L(\tau)$, which will be called the Poincaré function.
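The invariance stated in property 3 can be checked directly from the definition (4.32) by a time shift. The following is a sketch of that verification; the primed quantities $t'$, $T_1'$, $T_2'$ are introduced here only for this purpose.

```latex
% Evaluate (4.32) at the shifted arguments (\varphi_0 + \sqrt{2E}\,u,\ \varepsilon s + \varepsilon u)
% and substitute t' = t + u, T_1' = T_1 - u, T_2' = T_2 + u.
\begin{align*}
\int_{-T_1}^{T_2} dt\, \tilde U\Big(\gamma_E^q\Big(t + u + \tfrac{\varphi_0}{\sqrt{2E}}\Big),
      \varepsilon s + \varepsilon u + \varepsilon t\Big)
 &= \int_{-T_1'}^{T_2'} dt'\, \tilde U\Big(\gamma_E^q\Big(t' + \tfrac{\varphi_0}{\sqrt{2E}}\Big),
      \varepsilon s + \varepsilon t'\Big), \\
\tilde h\big(\varphi_0 + \sqrt{2E}\,u + a_+ + \sqrt{2E}\,T_2,\ \varepsilon s + \varepsilon u + \varepsilon T_2;\ \varepsilon\big)
 &= \tilde h\big(\varphi_0 + a_+ + \sqrt{2E}\,T_2',\ \varepsilon s + \varepsilon T_2';\ \varepsilon\big), \\
\tilde h\big(\varphi_0 + \sqrt{2E}\,u + a_- - \sqrt{2E}\,T_1,\ \varepsilon s + \varepsilon u - \varepsilon T_1;\ \varepsilon\big)
 &= \tilde h\big(\varphi_0 + a_- - \sqrt{2E}\,T_1',\ \varepsilon s - \varepsilon T_1';\ \varepsilon\big).
\end{align*}
% Since (T_1', T_2') \to \infty exactly when (T_1, T_2) \to \infty for fixed u,
% the limit is L(\varphi_0, \varepsilon s, E; \varepsilon).
% Periodicity in \varphi_0: take u = 1/\varepsilon and use property 2,
%   L(\varphi_0 + \sqrt{2E}/\varepsilon,\ \varepsilon s) = L(\varphi_0 + \sqrt{2E}/\varepsilon,\ \varepsilon s + 1)
%                                                      = L(\varphi_0,\ \varepsilon s).
```

In other words, shifting the phase along the flow only relabels the truncation times, which is why $L$ depends on $(\varphi_0, \varepsilon s)$ only through the combination $\varepsilon(s - \varphi_0/\sqrt{2E})$.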

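Lemma 4.19.
\[
L(\varphi_0, \varepsilon s, E; \varepsilon) = \frac{1}{\sqrt{2E}}\, L\Big(\varepsilon\Big(s - \frac{\varphi_0}{\sqrt{2E}}\Big)\Big) + O_{C^2}(\varepsilon), \tag{4.33}
\]
where
\[
L(\tau) = \lim_{(T_1,T_2)\to\infty} \bigg[\int_{-T_1}^{+T_2} dt\, \tilde U(\gamma_{1/2}(t), \tau) - \int_{-T_1 + a_-}^{+T_2 + a_+} dt\, \tilde U(\Lambda_{1/2}(t), \tau)\bigg]. \tag{4.34}
\]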
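For comparison with the proof, the single integral along $\Lambda_{1/2}$ in (4.34) can be split into three pieces. This is a sketch that uses the relation $a_- + 1 = a_+$ invoked in the proof of the lemma; the integration variable $u$ is a relabelling introduced here.

```latex
% Split [-T_1 + a_-, T_2 + a_+] at a_- and a_+ = a_- + 1, then shift each piece.
\begin{align*}
\int_{-T_1 + a_-}^{T_2 + a_+} dt\, \tilde U(\Lambda_{1/2}(t), \tau)
 &= \int_{-T_1 + a_-}^{a_-} dt\, \tilde U(\Lambda_{1/2}(t), \tau)
  + \int_{a_-}^{a_+} dt\, \tilde U(\Lambda_{1/2}(t), \tau)
  + \int_{a_+}^{T_2 + a_+} dt\, \tilde U(\Lambda_{1/2}(t), \tau) \\
 &= \int_{-T_1}^{0} du\, \tilde U(\Lambda_{1/2}(u + a_-), \tau)
  + \int_{0}^{1} du\, \tilde U(\Lambda_{1/2}(u + a_-), \tau)
  + \int_{0}^{T_2} du\, \tilde U(\Lambda_{1/2}(u + a_+), \tau).
\end{align*}
% Hence (4.34) can be rewritten as
\begin{align*}
L(\tau) = \lim_{(T_1,T_2)\to\infty} \bigg[
   &\int_{-T_1}^{0} du\, \Big(\tilde U(\gamma_{1/2}(u), \tau) - \tilde U(\Lambda_{1/2}(u + a_-), \tau)\Big) \\
 + &\int_{0}^{T_2} du\, \Big(\tilde U(\gamma_{1/2}(u), \tau) - \tilde U(\Lambda_{1/2}(u + a_+), \tau)\Big)\bigg]
 - \int_{0}^{1} du\, \tilde U(\Lambda_{1/2}(u + a_-), \tau).
\end{align*}
```

Written this way, each half-line integral pairs the connecting orbit with the periodic orbit it approaches, with the phase shift $a_\mp$ appropriate to that end, and the remaining term is an integral over a unit interval of $\Lambda_{1/2}$.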

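Proof. In order to obtain the first order terms in the Melnikov potential we write (4.32) as
\[
\begin{aligned}
L(0, \tau, E; \varepsilon) = \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_{-T_1}^{T_2} dt\, \tilde U\big(\gamma_E^q(t), \tau + \varepsilon t\big)\\
&- \tilde h\big(a_+ + \sqrt{2E}\, T_2, \tau + \varepsilon T_2; \varepsilon\big) + \tilde h(a_+, \tau; \varepsilon)\\
&+ \tilde h\big(a_- - \sqrt{2E}\, T_1, \tau - \varepsilon T_1; \varepsilon\big) - \tilde h(a_-, \tau; \varepsilon)\\
&- \tilde h(a_+, \tau; \varepsilon) + \tilde h\big(a_- + 1, \tau + \varepsilon/\sqrt{2E}; \varepsilon\big)\\
&- \tilde h\big(a_- + 1, \tau + \varepsilon/\sqrt{2E}; \varepsilon\big) + \tilde h(a_-, \tau; \varepsilon)\bigg].
\end{aligned}
\]
The fourth line in this expression is of order $\varepsilon$ in the $C^1$ norm due to the fact that $\tilde h(\cdot, \cdot; \varepsilon)$ is a bounded function with bounded derivatives (see Lemma 4.13) and $a_- + 1 = a_+$. In order to obtain integral expressions for the other three, we only need to use the fundamental theorem of calculus and the functional equation (4.14) verified by $\tilde h$. Thus,
\[
\begin{aligned}
L(0, \tau, E; \varepsilon) = \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_{-T_1}^{0} dt\, \Big(\tilde U\big(\gamma_E^q(t), \tau + \varepsilon t\big) - \tilde U\Big(\Lambda_E^q\Big(t + \frac{a_-}{\sqrt{2E}}\Big), \tau + \varepsilon t\Big)\Big)\\
&+ \int_{0}^{T_2} dt\, \Big(\tilde U\big(\gamma_E^q(t), \tau + \varepsilon t\big) - \tilde U\Big(\Lambda_E^q\Big(t + \frac{a_+}{\sqrt{2E}}\Big), \tau + \varepsilon t\Big)\Big)\\
&- \int_{0}^{1/\sqrt{2E}} dt\, \tilde U\Big(\Lambda_E^q\Big(t + \frac{a_-}{\sqrt{2E}}\Big), \tau + \varepsilon t\Big)\bigg] + O(\varepsilon),
\end{aligned}
\]
or equivalently, by the rescaling properties (3.1), and the change of variable $u = \sqrt{2E}\, t$,
\[
\begin{aligned}
L(0, \tau, E; \varepsilon) = \frac{1}{\sqrt{2E}} \times \lim_{(T_1,T_2)\to\infty} \bigg[
&\int_{-T_1}^{0} du\, \Big(\tilde U\Big(\gamma_{1/2}^q(u), \tau + \frac{\varepsilon u}{\sqrt{2E}}\Big) - \tilde U\Big(\Lambda_{1/2}^q(u + a_-), \tau + \frac{\varepsilon u}{\sqrt{2E}}\Big)\Big)\\
&+ \int_{0}^{T_2} du\, \Big(\tilde U\Big(\gamma_{1/2}^q(u), \tau + \frac{\varepsilon u}{\sqrt{2E}}\Big) - \tilde U\Big(\Lambda_{1/2}^q(u + a_+), \tau + \frac{\varepsilon u}{\sqrt{2E}}\Big)\Big)\\
&- \int_{0}^{1} du\, \tilde U\Big(\Lambda_{1/2}^q(u + a_-), \tau + \frac{\varepsilon u}{\sqrt{2E}}\Big)\bigg] + O(\varepsilon),
\end{aligned}
\]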

and taking the dominant terms in $\varepsilon$,
\[
\begin{aligned}
L(0, \tau, E; \varepsilon) &= \frac{1}{\sqrt{2E}} \lim_{(T_1,T_2)\to\infty} \bigg[
\int_{-T_1}^{0} du\, \Big(\tilde U\big(\gamma_{1/2}^q(u), \tau\big) - \tilde U\big(\Lambda_{1/2}^q(u + a_-), \tau\big)\Big)\\
&\qquad + \int_{0}^{T_2} du\, \Big(\tilde U\big(\gamma_{1/2}^q(u), \tau\big) - \tilde U\big(\Lambda_{1/2}^q(u + a_+), \tau\big)\Big)\\
&\qquad - \int_{0}^{1} du\, \tilde U\big(\Lambda_{1/2}^q(u + a_-), \tau\big)\bigg] + \frac{1}{\sqrt{2E}} R(\tau, \varepsilon) + O(\varepsilon)\\
&= \frac{1}{\sqrt{2E}} L(\tau) + \frac{1}{\sqrt{2E}} R(\tau, \varepsilon) + O(\varepsilon),
\end{aligned}
\]
where $R(\tau, \varepsilon)$ is defined so that the above is an identity. Note that it only involves the difference of integrals whose integrands have second arguments that are slightly different. One can bound $R(\tau, \varepsilon)$, using the properties (4.22) and the fact that $\tilde U(q, \tau)$ is a periodic function with respect to its second variable $\tau$, as
\[
|R(\tau, \varepsilon)| \le K\varepsilon \bigg(\int_{-\infty}^{+\infty} du\, e^{-\beta|u|} + \int_0^1 du\bigg) \le C\varepsilon.
\]
Similarly, one can bound the first and second derivatives because one can take derivatives under the integral sign (the convergence of the integrand is exponentially fast) and then cancellations similar to those used above establish the result. Then taking $\tau = \varepsilon(s - \varphi_0/\sqrt{2E})$, we have the lemma. □

Proposition 4.20. Given a metric that satisfies the genericity conditions of Theorem 3.1, the set of periodic potentials for which the Poincaré function $L(\tau)$ in Lemma 4.19 is identically constant is a $C^l$ closed subspace of infinite codimension for $l > 0$.

Proof. We note that, for every $\tau$, $\tau'$, the mapping $U \mapsto L(\tau) - L(\tau')$ is a continuous linear functional if we give $U$ the $C^l$ topology, $l > 0$. This functional is non-trivial, as can be observed by noting that, since "$\Lambda$" and "$\gamma$" do not coincide, it is possible to choose potentials $U$ with support near "$\gamma$" so that the functional does not vanish. □

4.7. Transition chains and transition lemmas. We recall that according to [Arn64], [AA67], a transition chain for a Hamiltonian flow is a sequence of transition tori such that the unstable manifold of one intersects transversally the stable manifold of the next. The definition of transition tori in [Arn64] is topological, but for our purposes we note that it has been shown in several places (we will follow [FM98] in Lemma 4.24) that all whiskered tori with one dimensional whiskers and with irrational motion are transition tori. This includes the tori produced applying Theorem 4.9 to the inner map fεε of our problem. The importance of transition chains is that there are orbits that follow them closely. Therefore, our first step will be to verify that there exists a sequence of tori obtained by applying Theorem 4.9 to fεε and such that the stable manifold of one crosses transversally the unstable manifold of the previous one. Then, we will discuss some small

modifications needed to the standard arguments (they only apply to finite sequences) that show that indeed there are orbits that follow them. We note that, in the notation that we have introduced in this paper, the assertion that the unstable manifold of a torus contained in $\tilde\Lambda_\varepsilon$ intersects the stable manifold of another one is equivalent to the assertion that the image of the first torus under the outer map $S$ intersects the second. We will refer to the invariant tori obtained applying Theorem 4.9 to fεε simply as KAM tori.

Lemma 4.21. Assume that $r \ge 15$. If the Poincaré function $L(\tau)$ is not constant, we can find $K > 0$ such that for $\varepsilon$ sufficiently small, given a KAM torus $\mathcal{T}$, we can find other KAM tori $\mathcal{T}^+$, $\mathcal{T}^-$ such that
\[
W^u_{\mathcal{T}} \pitchfork W^s_{\mathcal{T}^+}, \qquad W^u_{\mathcal{T}} \pitchfork W^s_{\mathcal{T}^-},
\]
\[
\min H_\varepsilon(\mathcal{T}) \ge \max H_\varepsilon(\mathcal{T}^-) + K\varepsilon^3, \qquad \max H_\varepsilon(\mathcal{T}) \le \min H_\varepsilon(\mathcal{T}^+) - K\varepsilon^3.
\]
Proof. Observe that, since $L$ is periodic and $C^2$, if it is not constant, we can find two numbers $\tau_\pm$ such that $L'(\tau_+) > 0$, $L'(\tau_-) < 0$, $L''(\tau_\pm) \neq 0$. Since $L$ is $C^2$ the same inequalities are true for small intervals around $\tau_\pm$. We study the dynamics on $\tilde\Lambda_\varepsilon$ using the coordinates $H_\varepsilon$, $\varphi$, $s$. Since $L$ approximates in the $C^2$ sense the Melnikov potential, and the derivative of this function measures the increase in $H_\varepsilon$ under the map $S$, it follows that for small enough $\varepsilon$, given any KAM torus $\mathcal{T}$, its image under $S$ has to include segments $\rho_\pm$ (corresponding to the intervals around $\tau_\pm$ above) such that $\max H_\varepsilon(\rho_\pm) - \min H_\varepsilon(\rho_\pm) \ge K_1\varepsilon^3$. On the other hand, the projection of these intervals over the $\varphi$ variable has a length not more than $K_2\varepsilon$. We note that in the averaged coordinates, the KAM tori are not more than $\varepsilon^{m/2+1}$ apart and that they correspond very approximately to surfaces of constant action. Hence, in the original coordinates, they will be graphs of functions in the $\varphi$, $s$, $H_\varepsilon$ coordinates which are not more than $\varepsilon^{m/2+1}$ apart in the $C^{l-4}$ sense. Since the interpretation of the function $M$ (see Fig. 1) was the increment in energy over a torus of the map $S$, we see that the image of one torus has to cross two KAM tori, one of higher energy and another one of lower energy.

Moreover, this intersection has to be transversal. The fact that $L''(\tau_\pm) \neq 0$ implies that the derivative of the gain in energy with respect to the angle is bounded from below by a constant times $\varepsilon^3$. That is, if we express the tori $\mathcal{T}$, $S(\mathcal{T})$ and $\mathcal{T}^+$, $\mathcal{T}^-$ as graphs of functions $\Psi$, $\Psi_S$, $\Psi_\pm$ respectively, we have $|\Psi_S' - \Psi_\pm'| \ge K\varepsilon^3$ in a neighborhood of the intersection $S(\mathcal{T}) \cap \mathcal{T}^\pm$, which is therefore transversal in $\tilde\Lambda_\varepsilon$: $S(\mathcal{T}) \pitchfork \mathcal{T}^\pm$ in $\tilde\Lambda_\varepsilon$. On the other hand, by the definition of the outer map $S$, $W^u_{\mathcal{T}} \cap \tilde\gamma_\varepsilon = W^s_{S(\mathcal{T})} \cap \tilde\gamma_\varepsilon$, and hence
\[
\big(W^u_{\mathcal{T}} \cap \tilde\gamma_\varepsilon\big) \pitchfork \big(W^s_{\mathcal{T}^\pm} \cap \tilde\gamma_\varepsilon\big) = \big(W^s_{S(\mathcal{T})} \cap \tilde\gamma_\varepsilon\big) \pitchfork \big(W^s_{\mathcal{T}^\pm} \cap \tilde\gamma_\varepsilon\big) \quad \text{in } \tilde\gamma_\varepsilon.
\]
Finally, the transversal intersection of $W^u_{\tilde\Lambda_\varepsilon}$ with $W^s_{\tilde\Lambda_\varepsilon}$ along $\tilde\gamma_\varepsilon$ implies that $\big(W^u_{\mathcal{T}} \cap \tilde\gamma_\varepsilon\big) \pitchfork \big(W^s_{\mathcal{T}^\pm} \cap \tilde\gamma_\varepsilon\big)$ if and only if $W^u_{\mathcal{T}} \pitchfork W^s_{\mathcal{T}^\pm}$. □

Remark 4.22. The lemma above does not assert the existence of transverse homoclinic orbits to any of the tori $\mathcal{T}$, $\mathcal{T}^-$ and $\mathcal{T}^+$. The existence of transverse homoclinic orbits

Fig. 2. Illustration of the action of the map $S$ on a torus $\tau$ (labels in the figure: $O(\varepsilon^3)$, $O(\varepsilon^{m/2+1})$, $S(\tau)$, $\tau$)

is related to the existence of nondegenerate critical points of the Poincaré function. We emphasize that, for the purposes of this paper, what we need are transverse heteroclinic intersections.

As an immediate consequence, we have:

Lemma 4.23. Assume that the metric $g$ satisfies the assumptions of Theorem 3.1 and that the potential $U$ is such that the Poincaré function $L$ is not constant. Assume moreover that both $g$ and $U$ are $C^{15}$. Then, there exist $M > 0$, $\alpha > 0$, such that if
\[
I_i = [E_-^i, E_+^i], \quad i = 1, \ldots
\]
is any sequence of intervals such that
\[
E_-^i \ge M, \qquad (E_+^i - E_-^i) \ge M (E_+^i)^{-\alpha},
\]
then we can find a sequence $\{\mathcal{T}_i\}$ of KAM tori such that $W^s_{\mathcal{T}_{i+1}} \pitchfork W^u_{\mathcal{T}_i}$, and a subsequence $\{\mathcal{T}_{j_i}\}$ of those tori in such a way that $H(\mathcal{T}_{j_i}) \cap I_i \neq \emptyset$.

Our next goal is to show that the pseudo orbits obtained by interspersing the KAM homoclinic jumps with the motion along the torus can be shadowed by true orbits of the system. As is usual in the literature on Arnol'd diffusion, the key step is to find an appropriate inclination lemma (sometimes also called a λ-lemma). In the literature, one can find very sharp inclination lemmas, including even some estimates of the times needed to do the shadowing, for analytic maps when the rotation is Diophantine, in [Mar96,Cre97,Val98]. (Related results appear in [CG94].) The result that we have found best adapted to our purposes is that of [FM98] for whiskered tori with one dimensional strong (un)stable directions, as is the case in the problem we are considering, which works for $C^1$ maps and only requires that the torus has an irrational rotation. A particular case of the results of [FM98] is:

Lemma 4.24. Let $f$ be a $C^1$ symplectic mapping in a $2(d + 1)$-dimensional symplectic manifold. Assume that the map leaves invariant a $C^1$ $d$-dimensional torus $\mathcal{T}$ and that the motion on the torus is an irrational rotation. Let $\Gamma$ be a $(d + 1)$-dimensional manifold intersecting $W^u_{\mathcal{T}}$ transversally. Then
\[
W^s_{\mathcal{T}} \subset \overline{\bigcup_{i>0} f^{-i}(\Gamma)}.
\]

An immediate consequence of this is that any finite transition chain can be shadowed by a true orbit. The argument for infinite chains requires some elementary point set topology.

Lemma 4.25. Let $\{\mathcal{T}_i\}_{i=1}^\infty$ be a sequence of transition tori. Given $\{\varepsilon_i\}_{i=1}^\infty$ a sequence of strictly positive numbers, we can find a point $P$ and an increasing sequence of numbers $T_i$ such that
\[
\Phi_{T_i}(P) \in N_{\varepsilon_i}(\mathcal{T}_i),
\]
where $N_{\varepsilon_i}(\mathcal{T}_i)$ is a neighborhood of size $\varepsilon_i$ of the torus $\mathcal{T}_i$.

Proof. Let $x \in W^s_{\mathcal{T}_1}$. We can find a closed ball $B_1$, centered on $x$, and such that
\[
\Phi_{T_1}(B_1) \subset N_{\varepsilon_1}(\mathcal{T}_1). \tag{4.35}
\]
By the Inclination Lemma 4.24, $W^s_{\mathcal{T}_2} \cap B_1 \neq \emptyset$. Hence, we can find a closed ball $B_2 \subset B_1$, centered at a point in $W^s_{\mathcal{T}_2}$ such that, besides satisfying (4.35): $\Phi_{T_2}(B_2) \subset N_{\varepsilon_2}(\mathcal{T}_2)$. Proceeding by induction, we can find a sequence of closed balls
\[
B_i \subset B_{i-1} \subset \cdots \subset B_1, \qquad \Phi_{T_j}(B_i) \subset N_{\varepsilon_j}(\mathcal{T}_j), \quad j \le i.
\]
Since the balls are compact, $\cap B_i \neq \emptyset$. A point $P$ in the intersection satisfies the required property. □

Putting together Lemma 4.23 and Lemma 4.25, we obtain the following result, which clearly implies Theorem 1.1.

Theorem 4.26. Assume that the metric $g$ satisfies the assumptions of Theorem 3.1 and that the potential $U$ is such that the Poincaré function $L$ is not constant. Assume moreover that both $g$ and $V$ are $C^{15}$. Then, there exist $M > 0$, $\alpha > 0$, such that if
\[
I_i = [E_-^i, E_+^i], \quad i = 1, \ldots
\]
is any sequence of intervals such that
\[
E_-^i \ge M, \qquad (E_+^i - E_-^i) \ge M (E_+^i)^{-\alpha},
\]
then we can find an orbit $p(t)$, $q(t)$ of the Hamiltonian flow and an increasing sequence of times $t_1 < t_2 < \cdots < t_n < \cdots$, such that
\[
H(p(t_i), q(t_i), t_i) \in I_i,
\]
and $(p(t_i), q(t_i))$ is in a neighborhood of size $M(E_-^i)^{-2}$ of the periodic orbit $\Lambda_{E_-^i}$.

Note. By assuming more differentiability in the hypothesis of the theorem, we can get $\alpha$ to be arbitrarily large.

Remark 4.27. A question that has often been asked of us, and which is indeed quite relevant for physical applications, is what the measure of the diffusing orbits is. We do not know at the time of this writing how to produce a set of positive measure of diffusing orbits. (The set of orbits we have produced here is uncountable, but we do not know how to determine its measure.) Of course, the mechanism described here is presumably not the only mechanism that contributes to diffusion.

Remark 4.28. Another physically relevant question is what speed of diffusion can be reached by these orbits. A heuristic argument (which at the moment we cannot even raise as a conjecture) suggests that the orbit following the mechanism studied in this paper can perform $\approx E^{1/2}$ heteroclinic excursions in a unit of time and in each of them it can gain $\approx E^{-3/2}$ rescaled energy, which is equivalent to a gain of $E^{-1/2}$ of energy per heteroclinic excursion. Hence, the gain in energy per unit time could be about constant and therefore $E(t) \approx t$. Note that this argument implicitly assumes that the proportion of times that are favorable for the jump, and indeed the average gain in energy per jump, reach a limit as the energy grows, and that the time that one needs to spend preparing for the next jump is a fixed proportion of the total time. Note that since $\frac{d}{dt} H(x(t), t) = \partial_2 V(q(t), t)$ and, by compactness, the right-hand side term is uniformly bounded, we have that the energy of any orbit cannot grow faster than linearly in time, so that, up to multiplicative constants, the rate above would be optimal. The rigorous justification (and indeed a non-rigorous but reliable assessment) of this assumption seems like a daunting task, but we hope some reader may be motivated to investigate this question.

Remark 4.29. Another question that is relevant for physical applications but, to our knowledge, remains open is whether the quantum mechanical analogues of our system can have states with energy unbounded with time.
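The heuristic count in Remark 4.28 and the linear upper bound that follows it can be written out in two lines. This is only a restatement of the heuristic, not a proof; the constants $c$ and $C_V$ are names introduced here for illustration.

```latex
% Heuristic rate: (excursions per unit time) x (energy gain per excursion).
\begin{align*}
\frac{dE}{dt} \;\approx\; \underbrace{c_1 E^{1/2}}_{\text{excursions per unit time}}
              \cdot \underbrace{c_2 E^{-1/2}}_{\text{gain per excursion}}
  \;=\; c, \qquad\text{so}\qquad E(t) \;\approx\; E(0) + c\,t .
\end{align*}
% Rigorous upper bound: with C_V = \sup |\partial_2 V| (finite by compactness),
\begin{align*}
\Big|\frac{d}{dt} H(x(t), t)\Big| = \big|\partial_2 V(q(t), t)\big| \le C_V
  \quad\Longrightarrow\quad H(x(t), t) \le H(x(0), 0) + C_V\, t \quad (t \ge 0).
\end{align*}
```

Both expressions grow linearly in $t$, which is the sense in which the heuristic rate would be optimal up to multiplicative constants.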

Acknowledgements. We thank J. N. Mather for communicating his results and for encouragement and advice. This work has been partially supported by the NATO grant CRG950273. Research by A.D. and T.M.S. is also supported by the Spanish grant DGICYT PB94-0215, the Catalan grant CIRIT 1998SGR–0041, and the INTAS project 97-10771. Research by R.L. is also supported by NSF grants. We also thank TICAM, UPC and IMA for invitations that made possible our collaboration.

A. Appendix: Brief Summary of Hyperbolicity Theory

In this appendix we collect some of the results from the rich theory of hyperbolic (or normally hyperbolic or further qualifications) invariant manifolds. The results we present are quite standard and can be found in many places (indeed the theory seems to have been developed several times), so we will just highlight some of the more subtle points of the conclusions which affect some of the statements of the theorems we will prove. We just recommend [Fen74,HP70,Wig94] as readable and complete references. Another version, somewhat more demanding in notation and style, is in [HPS77]. Yet another point of view can be found in [SS74] and following papers. We refer to [Wig94] for a discussion of original references.

We discuss only three aspects: the regularity properties of invariant manifolds and foliations, their persistence properties and the smooth dependence on parameters. We will present here the theory for flows. Using the standard suspension trick, all the general results for flows imply corresponding results for invertible maps. There are aspects of the theory of hyperbolic non-invertible maps without flow counterparts, but the aspects of the theory we are discussing are identical for flows and for maps. The theory of non-invertible maps is still somewhat incomplete even in the aspects we discuss here.

Definition A.1. Let $M$ be a manifold and $\Phi_t$ a $C^r$, $r \ge 1$, flow on it. We say that a manifold $\Lambda \subset M$ (possibly with boundary) invariant under $\Phi_t$ is hyperbolic when there is a bundle decomposition
\[
TM = T\Lambda \oplus E^s \oplus E^u \tag{A.1}
\]
invariant under the flow, and numbers $C > 0$, $0 < \beta < \alpha$, such that for $x \in \Lambda$,
\[
\begin{aligned}
v \in E^s_x &\iff |D\Phi_t(x)v| \le Ce^{-\alpha t}|v| \quad \forall\, t > 0,\\
v \in E^u_x &\iff |D\Phi_t(x)v| \le Ce^{\alpha t}|v| \quad \forall\, t < 0,\\
v \in T_x\Lambda &\iff |D\Phi_t(x)v| \le Ce^{\beta|t|}|v| \quad \forall\, t.
\end{aligned}
\tag{A.2}
\]

Remark A.2. In this paper, we will refer to (A.2) as saying that the manifold is "hyperbolic". In some references where more precision is needed, names such as $\alpha$-$\beta$ hyperbolic or normally hyperbolic are used. The hypotheses (A.2) are often referred to by saying that the bundle decomposition (A.1) satisfies exponential dichotomies.

Remark A.3. There are two different ways of developing hyperbolicity theory. One is, as we stated, to assume that the constants in (A.2) are uniform in the bundle. Another one is to assume bounds such as those in (A.2) along an orbit and that the ratios of several constants along the orbit are bounded. The first method is the basis of [HP70] and [HPS77]. The second one was used in [Fen74,Fen77]. Clearly, the hypotheses of the bundle approach imply those of the orbit method. The difference in the bounds can be particularly significant in systems in which a geometric

structure implies relations between expansion and contraction rates along an orbit but not on a bundle. One example of this situation is the study of the horospheric foliation in geodesic flows in manifolds of negative curvature ([HK90]). Moreover, the study of individual orbits leads naturally to the non-uniform hyperbolic theory [Pes76,Pes77]. For the applications we have in mind, we do not need the sharper results, so that we will state the results in the somewhat simpler language of bundles.

Remark A.4. Note that if the inequalities (A.2) are established for $|t| \le T$ with $T$ sufficiently large to overcome the constant $C$ (i.e. $Ce^{(\beta-\alpha)T} < 1$), then we can recover the definition we have given because we can bound $\|D\Phi_{nT+s}\| \le \|D\Phi_T\|^n \cdot \|D\Phi_s\|$. This observation is useful when we want to study the persistence of these structures for sufficiently small perturbations.

Remark A.5. Similarly, we note that, by redefining the metric in $M$, one can get rid of the constant $C$ in Definition A.1. A metric satisfying $C = 1$ is called the adapted metric or sometimes, especially in the East, Lyapunov metric. We refer to the general references above.

Remark A.6. If we construct a splitting between bundles in such a way that the bundles are not assumed to be invariant but that they satisfy the inequalities in (A.2) for $|t| \le T$, with $T$ large enough with respect to $C$, $\alpha$ and $\beta$, then one can construct invariant bundles that satisfy similar inequalities with slightly worse constants.

Intuitively, Definition A.1 means that the normal infinitesimal perturbations grow faster (either in the future or in the past) than the infinitesimal perturbations along the manifold. The first result we quote is about the existence of invariant stable and unstable manifolds for hyperbolic manifolds.

Theorem A.7. Let $\Lambda$ be a compact hyperbolic manifold (possibly with boundary) for the $C^r$ flow $\Phi_t$, satisfying Definition A.1.
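Then, there exists a sufficiently small neighborhood $U$, and a sufficiently small $\delta > 0$, such that:

1. The manifold $\Lambda$ is $C^{\min(r, r_1 - \delta)}$, where $r_1 = \alpha/\beta$.
2. For any $x$ in $\Lambda$, the set
\[
\begin{aligned}
W^s_x &= \{y \in U : \operatorname{dist}(\Phi_t(y), \Phi_t(x)) \le C_y e^{(-\alpha+\delta)t} \text{ for } t > 0\}\\
&= \{y \in U : \operatorname{dist}(\Phi_t(y), \Phi_t(x)) \le C_y e^{(-\beta-\delta)t} \text{ for } t > 0\}
\end{aligned}
\]
is a $C^r$ manifold and $T_x W^s_x = E^s_x$, $\Phi_t(W^s_x) = W^s_{\Phi_t(x)}$.
3. For appropriately chosen $C > 0$, for any $x \in \Lambda$,
\[
\begin{aligned}
W^{s,\mathrm{loc}}_x &= \{y \in U : \operatorname{dist}(\Phi_t(y), \Phi_t(x)) \le Ce^{(-\alpha+\delta)t} \text{ for } t > 0\}\\
&= \{y \in U : \operatorname{dist}(\Phi_t(y), \Phi_t(x)) \le Ce^{(-\beta-\delta)t} \text{ for } t > 0\}
\end{aligned}
\]
is a $C^r$ manifold and $T_x W^{s,\mathrm{loc}}_x = E^s_x$, $\Phi_t(W^s_x) \subset W^s_{\Phi_t(x)}$ for $t \ge T_0$.
4. Moreover, we have $W^s_x = \cup_{t>0} \Phi_{-t}\big(W^{s,\mathrm{loc}}_{\Phi_t(x)}\big)$.
5. The bundle $E^s_x$ is $C^{\min(r, r_0 - \delta)}$ in $x$, where $r_0 = (\alpha - \beta)/\beta$.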

6. The set
\[
\begin{aligned}
W^s_\Lambda &= \{y \in U : \operatorname{dist}(\Phi_t(y), \Lambda) \le C_y e^{(-\alpha+\delta)t} \text{ for } t > 0\}\\
&= \{y \in U : \operatorname{dist}(\Phi_t(y), \Lambda) \le C_y e^{(-\beta-\delta)t} \text{ for } t > 0\}
\end{aligned}
\]
is a $C^{\min(r, r_0 - \delta)}$ manifold. Clearly, $\Phi_t(W^s_\Lambda) = W^s_\Lambda$ for all $t \in \mathbb{R}$.
7. For appropriately chosen $C > 0$, the set
\[
\begin{aligned}
W^{s,\mathrm{loc}}_\Lambda &= \{y \in U : \operatorname{dist}(\Phi_t(y), \Lambda) \le Ce^{(-\alpha+\delta)t} \text{ for } t > 0\}\\
&= \{y \in U : \operatorname{dist}(\Phi_t(y), \Lambda) \le Ce^{(-\beta-\delta)t} \text{ for } t > 0\}
\end{aligned}
\]
is a $C^{\min(r, r_0 - \delta)}$ manifold. $\Phi_t(W^{s,\mathrm{loc}}_\Lambda) \subset W^{s,\mathrm{loc}}_\Lambda$ for $t > t_0$, $W^s_\Lambda = \cup_{t>0} \Phi_{-t}\big(W^{s,\mathrm{loc}}_\Lambda\big)$.
8. $T_x W^s_\Lambda = E^s_x$.
9. $\operatorname{Lip} \Phi_t|_{W^{s,\mathrm{loc}}_\Lambda} \le Ce^{(-\beta-\delta)t}$.
10. $W^s_\Lambda = \cup_{x\in\Lambda} W^s_x$, and this union is disjoint (i.e. $W^s_x \cap W^s_y \neq \emptyset$, $x, y \in \Lambda$ implies $x = y$).
11. Moreover, we can find a $\rho > 0$ sufficiently small and a $C^{\min(r, r_0 - \delta)}$ diffeomorphism from the bundle of balls of radius $\rho$ in $E^s_\Lambda$ to $W^{s,\mathrm{loc}}_\Lambda$.

Remark A.8. We note that $W^s_x$, $W^s_\Lambda$ may fail to be embedded manifolds since they may accumulate on themselves. (But they do not intersect themselves.) Also, we note that their boundaries may be rather complicated sets (often they are fractal sets) so that, when considering global properties of these sets, one has to be careful about the precise definition of a manifold. If the definition is very restrictive in terms of what the possible boundary is, they may fail to be manifolds in that sense.

An analogous theorem can be stated for $W^u_\Lambda$ considering the flow generated by $-X$. Notice that the definition of $W^s_x$ includes that the convergence is somewhat fast, not just convergence. There could be other points in $\Lambda$ whose orbit approaches that of $x$ albeit at a slower rate. Even if it is customary (and we follow the custom) to refer to $W^s_x$ as the stable manifold for $x$, we note that it would be more appropriate to refer to it as the strong stable manifold. The last part of the conclusions states, roughly, that all the orbits that approach the manifold $\Lambda$ fast enough approach an orbit in $\Lambda$. Moreover, for the points approaching $\Lambda$ fast enough and in a sufficiently small neighborhood of $\Lambda$, the point whose orbit is approached is a well defined function in $W^s_\Lambda$ and is $C^{\min(r, r_0 - \delta)}$. We point out that compactness enters only mildly in the assumptions. We only need that the flow is uniformly $C^r$ in a neighborhood of $\Lambda$.

Remark A.9. When $\beta = 0$, $r_0$ and $r_1$ have zero denominator. This cannot be interpreted as $\infty$ without care.
Even if $r = \infty, \omega$, we cannot conclude that $\min(r, r_0) = \infty$ and that the manifolds are $C^\infty$ or $C^\omega$. The best that can be said is that there are $C^k$ manifolds for every $k$. There are examples where the $C^\infty$ conclusions are false even for polynomial perturbations.

Remark A.10. We emphasize that, even if the manifolds $W^s_x$ are as smooth as the flow, the dependence on $x$ is not claimed to be smoother than $r_0$, which depends on the contraction factors in the tangent and (un)stable bundles. Indeed, it is sometimes the case that these bounds are sharp in $C^{r_0}$ open sets. Similarly, the regularity of the manifold $\Lambda$ and that of $W^s_\Lambda$ can be sharp even if the flow is assumed to be analytic. An example for maps can be obtained setting $f : \mathbb{T}^2 \times \mathbb{R} \to \mathbb{T}^2 \times \mathbb{R}$ given by
\[
f(x, y) = \left( \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} x, \; h(x) + \frac{1}{50}\, y \right)
\]

with $h : \mathbb{T}^2 \to \mathbb{R}$ a conveniently chosen trigonometric polynomial. Using the standard suspension trick, similar examples can be obtained for flows. More examples along these lines and a more detailed analysis can be found in [Lla92].

Remark A.11. Even if the above examples show that the regularity numbers $r_0$, $r_1$ cannot be improved in general, it is possible to obtain sharper results if we introduce more parameters to characterize the exponential rates of the different bundles. Here we have used only $\alpha$ and $\beta$, but one obtains sharper results if one introduces different parameters for the contraction rate of the unstable bundle in the past and the stable bundle in the future.

The above theorem has as a corollary the smooth dependence on parameters of the (un)stable manifolds. The trick is completely elementary and will be used later several times.

Corollary A.12. Assume that $\Phi_{t,\varepsilon}$ is a family of flows which is jointly $C^r$ in all its variables (the base point $x$, the time $t$ and the parameter $\varepsilon$) and that for all the values of the parameter in a ball $\Phi_{t,\varepsilon}$ leaves invariant the manifold $\Lambda$. Then, for sufficiently small $|\varepsilon|$, it is possible to apply Theorem A.7. Moreover, the manifolds $W^s_{\Lambda,\varepsilon}$, $W^s_{x,\varepsilon}$ are $C^{\min(r, r_0-\delta)}$ jointly in $x$ and $\varepsilon$.

The idea of the proof is very simple. We just consider the extended flow $\tilde\Phi_t(x, \varepsilon) = (\Phi_{t,\varepsilon}(x), \varepsilon)$ on $M \times B$, with $B$ a sufficiently small ball in $\varepsilon$. It is easy to check that the manifold $\Lambda \times B$ is invariant for the flow and that, for a finite time, the flow satisfies the exponential dichotomy bounds in the stable and unstable subspaces. Using Remark A.6 we conclude that there are invariant bundles with very close constants. A moment's reflection shows that the dependence of manifolds of the extended system on the base point gives the dependence on parameters and the base point in the original system. $\square$

Remark A.13. We note that the dependence on parameters cannot be more differentiable than the dependence on the base point and, indeed, the examples alluded to in Remark A.10 can be easily made into examples in which the dependence with respect to parameters is optimal. Take, for example, $\Phi_{t,\varepsilon}(x) = \Phi_t(x - \varepsilon v) + \varepsilon v$, so that the invariant objects of $\Phi_{t,\varepsilon}$ are just translates by $\varepsilon v$ of the invariant objects for $\Phi_t$ and, therefore, the dependence on parameters is the same as the space dependence in the original problem. This is in sharp contrast with the results of the usual implicit function theorem, so that the formulations of these problems in terms of implicit function theorems need to involve specialized implicit function theorems that do not have the same properties as the usual one.

Now, we continue to discuss persistence. Roughly, we state that any perturbation of a system admitting a hyperbolic manifold has to carry another hyperbolic invariant manifold which is a perturbation of that of the original system.

Theorem A.14. Let $\Lambda \subset M$ (not necessarily compact) be hyperbolic for the flow $\Phi_t$ generated by the vector field $X$, which is uniformly $C^r$ in a neighborhood $U$ of $\Lambda$ such that $\operatorname{dist}(M \setminus U, \Lambda) > 0$. Let $\Psi_t$ be the flow generated by another vector field $Y$ which is $C^r$ and sufficiently close to $X$ in the $C^1$ topology. Then, we can find a manifold $\Gamma$ which is hyperbolic for $Y$ and close to $\Lambda$ in the $C^{\min(r, r_1-\delta)}$ topology. The constants in Definition A.1 for $\Gamma$ are arbitrarily close to those of $\Lambda$ if $Y$ is sufficiently close to $X$ in the $C^1$ topology. The manifold $\Gamma$ is the only $C^1$ manifold close to $\Lambda$ in the $C^0$ topology, and invariant under the flow of $Y$.

There are several extensions of this result that can be readily obtained. We will just sketch the method of proof and refer to the sources mentioned above.

1. Similarly to Corollary A.12, one can obtain smooth dependence on parameters in Theorem A.14 by extending the system by another one with trivial dynamics. Again, we obtain only $\min(r, r_1 - \delta)$ regularity and this is optimal in examples. A convenient way of formulating this smooth dependence on parameters is using the implicit function theorem and finding a $C^{\min(r, r_1-\delta)}$ mapping $F : \Lambda \times B \to M$ in such a way that $F(\Lambda, \varepsilon) = \Lambda_\varepsilon$, and $F(\cdot, 0)$ is the identity.
2. Using the remark above, given a family of flows, we can use the map $F$ to identify all the local invariant manifolds of all the flows. Extending the mapping $F$ to a neighborhood and changing coordinates to it, we can reduce the study of a family of flows to the problem of a family of flows which preserve a common manifold. This is precisely the case considered in Corollary A.12. Hence, we obtain that there is a $C^{\min(r, r_1-\delta)}$ mapping $F^s : W^{s,\mathrm{loc}}_\Lambda \times B \to M$ in such a way that
\[
F^s(W^{s,\mathrm{loc}}_\Lambda, \varepsilon) = W^{s,\mathrm{loc}}_{\Lambda_\varepsilon,\varepsilon}, \qquad F^s(\cdot, \varepsilon)|_\Lambda = F(\cdot, \varepsilon), \qquad F^s(W^{s,\mathrm{loc}}_x, \varepsilon) = W^{s,\mathrm{loc}}_{F(x,\varepsilon),\varepsilon}.
\]
3. It is also possible to discuss persistence of manifolds with boundary and locally invariant manifolds. The idea is that we can extend the flow to a globally defined one, with $C^r$ bounds which are close to the ones of the original problem and with bounds on the bundles which are also close to the ones we assumed, and which agrees with our original flow in the points of the original manifold which are sufficiently far from the boundary. Then, we can apply Theorem A.14 to the extended system. The invariant manifold for the extended system will be locally invariant for the original one.
4. Even if Theorem A.14 includes uniqueness in its conclusions and, therefore, the manifold produced is unique (under appropriate conditions) for the extended system, the extension process is not unique and the manifold produced does depend on the extension used. Hence, one cannot claim uniqueness for the locally invariant manifold produced for the original system. On the other hand, it follows from the uniqueness conclusions of Theorem A.7 that all the orbits that remain in a sufficiently small neighborhood of $\Lambda$ and away from the boundary should be present in all the extensions that do not modify the vector field away from this neighborhood of the boundary. Similarly, note that the definition of the stable manifold of a point $x$ or of a manifold $\Lambda$ involves discussing what happens for arbitrarily large times in the evolution. Such long time orbits depend on the extension used if the orbit of $x$ is not contained in the manifold $\Lambda$ away from the boundary.

On the other hand, for the orbits that indeed remain inside of $\Lambda$, the definition identifies the points of the stable manifold. Hence, the germs of these stable manifolds have to agree in all the extensions.

5. The above extension process can be combined with the dependence on parameters. We just remark that given a family of perturbations, one can perform the extension in such a way that it depends smoothly on parameters. (The extension only involves elements such as cut-off functions, mappings to identify spaces, etc., that can be used for all the values of the parameter.) Again, the extension process is not unique and the smooth dependence on parameters should be interpreted as the possibility of finding a map $F$ or $F^s$ so that its range produces the invariant manifolds. As before, we note that the orbits that are contained in a small neighborhood of $\Lambda$ away from the boundaries and the germs of their stable and unstable manifolds should be present in all the extended systems.

References

[AA67] Arnold, V.I. and Avez, A.: Ergodic problems of classical mechanics. New York: Benjamin, 1967
[AKN88] Arnold, V.I., Kozlov, V.V. and Neishtadt, A.I.: Dynamical Systems III. Volume 3 of Encyclopaedia Math. Sci. Berlin: Springer, 1988
[Arn64] Arnold, V.I.: Instability of dynamical systems with several degrees of freedom. Sov. Math. Doklady 5, 581–585 (1964)
[BLW96] Banyaga, A., de la Llave, R. and Wayne, C.E.: Cohomology equations near hyperbolic points and geometric versions of Sternberg linearization theorem. J. Geom. Anal. 6 (4), 613–649 (1996)
[BT99] Bolotin, S. and Treschev, D.: Unbounded growth of energy in nonautonomous Hamiltonian systems. Nonlinearity 12 (2), 365–388 (1999)
[CG94] Chierchia, L. and Gallavotti, G.: Drift and diffusion in phase space. Ann. Inst. H. Poincaré Phys. Théor. 60 (1), 1–144 (1994)
[Cre97] Cresson, J.: A λ-lemma for partially hyperbolic tori and the obstruction property. Lett. Math. Phys. 42 (4), 363–377 (1997)
[DR97] Delshams, A. and Ramírez-Ros, R.: Melnikov potential for exact symplectic maps. Commun. Math. Phys. 190, 213–245 (1997)
[Fen77] Fenichel, N.: Asymptotic stability with rate conditions. II. Indiana Univ. Math. J. 26 (1), 81–93 (1977)
[Fen79] Fenichel, N.: Geometric singular perturbation theory for ordinary differential equations. J. Differential Equations 31 (1), 53–98 (1979)
[Fen74] Fenichel, N.: Asymptotic stability with rate conditions. Indiana Univ. Math. J. 23, 1109–1137 (1973/74)
[FM98] Fontich, E. and Martín, P.: Arnold diffusion in perturbations of analytic integrable Hamiltonian systems. Preprint 98-319, [email protected], 1998
[Gal94] Gallavotti, G.: Twistless KAM tori, quasi flat homoclinic intersections, and other cancellations in the perturbation series of certain completely integrable Hamiltonian systems. A review. Rev. Math. Phys. 6 (3), 343–411 (1994)
[Hed32] Hedlund, G.H.: Geodesics on a two-dimensional Riemannian manifold with periodic coefficients. Ann. of Math. 33, 719–739 (1932)
[Her83] Herman, M.-R.: Sur les courbes invariantes par les difféomorphismes de l'anneau. Vol. 1. Volume 103 of Astérisque, Paris: Société Mathématique de France, 1983
[HK90] Hurder, S. and Katok, A.: Differentiability, rigidity and Godbillon-Vey classes for Anosov flows. Inst. Hautes Études Sci. Publ. Math. 72, 5–61 (1991), (1990)
[HP70] Hirsch, M.W. and Pugh, C.C.: Stable manifolds and hyperbolic sets. In: S. Chern and S. Smale, editors, Global Analysis (Proc. Sympos. Pure Math., Vol. XIV, Berkeley, Calif., 1968), Providence, RI: Amer. Math. Soc., 1970, pp. 133–163
[HPS77] Hirsch, M.W., Pugh, C.C. and Shub, M.: Invariant manifolds. Volume 583 of Lecture Notes in Math. Berlin: Springer-Verlag, 1977
[Lla92] de la Llave, R.: Smooth conjugacy and S-R-B measures for uniformly and non-uniformly hyperbolic systems. Commun. Math. Phys. 150 (2), 289–320 (1992)
[Lla96] de la Llave, R.: Letter to J. N. Mather, 1996
[LM88] Lochak, P. and Meunier, C.: Multiphase Averaging for Classical Systems. Volume 72 of Appl. Math. Sci. New York: Springer, 1988
[LW89] de la Llave, R. and Wayne, C.E.: Whiskered and lower dimensional tori in nearly integrable Hamiltonian systems. Preprint, 1989
[Mar96] Marco, J.-P.: Transition le long des chaînes de tores invariants pour les systèmes hamiltoniens analytiques. Ann. Inst. H. Poincaré Phys. Théor. 64 (2), 205–252 (1996)
[Mat95] Mather, J.N.: Graduate course at Princeton, 95–96, and Lectures at Penn State, Spring 96, Paris, Summer 96, Austin, Fall 96
[Mor24] Morse, M.: A fundamental class of geodesics on any closed surface of genus greater than one. Trans. of the A. M. S. 26, 25–60 (1924)
[Pes76] Pesin, Ja.B.: Families of invariant manifolds that correspond to nonzero characteristic exponents. Math. USSR-Izv. 40 (6), 1261–1305 (1976)
[Pes77] Pesin, Ja.B.: Characteristic Ljapunov exponents, and smooth ergodic theory. Russ. Math. Surv. 32 (4), 55–114 (1977)
[SS74] Sacker, R.J. and Sell, G.R.: Existence of dichotomies and invariant splittings for linear differential systems. I. J. Differ. Eqs. 15, 429–458 (1974)
[Tre94] Treschev, D.V.: Hyperbolic tori and asymptotic surfaces in Hamiltonian systems. Russ. J. Math. Phys. 2 (1), 93–110 (1994)
[Val98] Valdinoci, E.: Whiskered transition tori for a priori stable and unstable Hamiltonian systems. Preprint 98-200, [email protected], 1998
[Wei77] Weinstein, A.: Lectures on Symplectic Manifolds. Volume 29 of CBMS Regional Conf. Ser. in Math. Providence, RI: Am. Math. Soc., 1977
[Wig94] Wiggins, S.: Normally hyperbolic invariant manifolds in dynamical systems. Volume 105 of Applied Mathematical Sciences, New York: Springer-Verlag, 1994
[Zeh75] Zehnder, E.: Generalized implicit function theorems with applications to some small divisor problems/I. Comm. Pure Appl. Math. 28, 91–140 (1975)
[Zeh76] Zehnder, E.: Generalized implicit function theorems with applications to some small divisor problems/II. Comm. Pure Appl. Math. 29, 49–111 (1976)

Communicated by Ya. G. Sinai

Commun. Math. Phys. 209, 393 – 405 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Space of Spaces as a Metric Space

Masafumi Seriu^{1,2}

1 Department of Physics, Fukui University, Fukui 910–8507, Japan. E-mail: [email protected]
2 Yukawa Institute for Theoretical Physics, Kyoto University, Kyoto 606-8224, Japan. E-mail: [email protected]

Received: 31 March 1999 / Accepted: 4 August 1999

Abstract: In spacetime physics, we frequently need to consider a set of all spaces ("universes") as a whole. In particular, the concept of "closeness" between spaces is essential. However, there has been no established mathematical theory so far which deals with a space of spaces in a manner suitable for spacetime physics. Based on the scheme of the spectral representation of geometry, we construct a space $\mathcal{S}_N$, which is a space of all compact Riemannian manifolds equipped with the spectral measure of closeness. We show that $\mathcal{S}_N$ can be regarded as a metric space. We also show other desirable properties of $\mathcal{S}_N$, such as the partition of unity, local compactness and second countability. These facts show that the space $\mathcal{S}_N$ can be a basic arena for spacetime physics.

1. Introduction

In the course of the development of theoretical physics, mathematics has been an efficient language for a precise formulation and analysis of the problems. However, sometimes physics goes ahead: we are occasionally forced to face a problem in physics for which an appropriate mathematical language has not yet been established. In this case, we cannot even state the problem properly, though its importance is obvious. Though troublesome, such a situation can be a strong motivation for a new development of mathematics, which in turn would help further progress in theoretical physics. The theory presented here can be regarded as being of this kind.

In spacetime physics, it often happens that not a single space ("universe"), but a set of spaces should be considered as a whole. However, there is no workable mathematical theory suitable for handling such a situation. Perhaps the most famous example of this category is the "spacetime foam" picture due to Wheeler [13]: Near the Planck scale ($l_{pl} = (G\hbar/c^3)^{1/2} \simeq 10^{-33}$ cm), it is anticipated that there are drastic topological fluctuations of spacetime taking place ("spacetime foam"), because of the quantum effect on spacetime.
Now as the observational energy scale $E$ goes down, the finer topological structure of scale less than $E^{-1}$ would be averaged in some manner, and the effective geometry would seem simpler than the original one. If $E$ is decreased to a much lower energy scale, say $10^{15}$ GeV, almost all of the topological handles would be smoothed out, resulting in the simplest structure that we usually experience.

In the "spacetime foam" picture briefly described above, we find that the concept of a "set of spaces" appears twice:
(1) Topological fluctuations. We are tacitly considering a set of spaces with various topologies.
(2) Scale-dependent topology [12,8,9].

It is appropriate to explain here the scale-dependent topology in some detail: in the standard mathematical context, topology by definition is a scale-free concept, i.e. it represents the geometrical properties that are independent of the size of the structures. However, topology in spacetime physics is quite different: Here the observational energy scale $E$ naturally enters into the discussion, so that geometrical structures of scales less than $E^{-1}$ are of no significant meaning. Thus we need to consider effective topology as a function of $E$. In other words, we need to establish a procedure of "topological approximation" at the energy scale $E$. In this manner, not only the topology (in the ordinary sense) of a handle, but also its size should be taken into account in spacetime physics. Hence it is desirable to establish a suitable framework in which local geometry and global topology are treated on the same footing [10].

As another example which requires the concept of a "set of spaces", we mention a fundamental problem in cosmology. The real geometry of our universe is very complicated, so that we can perceive what our universe is like only through cosmological models. Thus cosmology in principle requires a mapping procedure from reality to a model.
By analyzing the observational data, we choose, among a set of models, the optimal model which is most compatible with the data. Here a mapping from reality to the model, or a smoothing out of reality to get the model, is taking place, but so far this process is not well understood. We need a scheme for analyzing the mapping procedure itself quantitatively and for judging the validity of the model choice. For this purpose, we need to establish a suitable language which describes the "closeness" between two spaces (reality and a model). Thus we should give a definite meaning to a space of all spaces, supplying a suitable distance between spaces, for further understanding of the problem of model-fitting in cosmology [11].

The fundamental problems in spacetime physics like the examples mentioned above have been frequently discussed so far, without firm foundations for the space of spaces, effective topology, topological approximation, and so on. As perhaps the first attempt to describe the scale-dependent topology quantitatively, the scattering cross-sections of a small handle of various topologies in 2-dimensional space have been investigated, and it has been analyzed how the cross-sections are influenced by the topology of a handle and the energy scale of a probe [8,9].

Based on this preliminary investigation, a more systematic scheme for handling space structures has been introduced: the spectral representation of geometrical structures along with the spectral distance between spaces [10]. It echoes a famous question in Riemannian geometry, "Can one hear the shape of a drum?" [3]. The basic idea behind the spectral representation is very clear: We use the "sounds" of a space to characterize the geometrical structures of the space. To be more specific, we use the set of eigenvalues of an elliptic operator (typically, the Laplacian) $\{\lambda_n\}_{n=1}^{\infty}$ to characterize geometry. (We confine ourselves to the case of spatially compact spaces for definiteness.) Then one can also introduce a measure of closeness $d_N(\mathcal{G}, \mathcal{G}')$ between two geometries $\mathcal{G}$ and $\mathcal{G}'$ by comparing $\{\lambda_n\}_{n=1}^{N}$ for $\mathcal{G}$ and $\{\lambda'_n\}_{n=1}^{N}$ for $\mathcal{G}'$. (We can treat the cut-off number $N$ as a running parameter.)

There are several advantages of the spectral representation:
(1) The spectra $\{\lambda_n\}_{n=1}^{\infty}$ form a countable set of positive quantities and they are easy to handle.
(2) The spectra $\{\lambda_n\}_{n=1}^{\infty}$ contain the information on both the local geometry and global topology of a space. In other words, $\{\lambda_n\}_{n=1}^{\infty}$ are suitable quantities for treating the local and global geometry in a unified manner.
(3) On dimensional grounds, the lower (higher) spectrum corresponds to the larger (smaller) scale behavior of geometry. For instance, if we compare two sets of spectra up to the $N$th eigenvalue, $\{\lambda_n\}_{n=1}^{N}$ and $\{\lambda'_n\}_{n=1}^{N}$, we are in effect comparing the corresponding two geometries, $\mathcal{G}$ and $\mathcal{G}'$, neglecting the small-scale behavior of order $o(\lambda_N^{-1/2})$. In this manner, the spectra are suitable for describing the scale-dependent behavior of geometry.
(4) The spectra $\{\lambda_n\}_{n=1}^{\infty}$ are spatial-diffeomorphism invariant quantities, and the spectral distance $d_N$ also possesses this property.
(5) The spectral distance $d_N$ between spaces is constructed from purely "internal" concepts, i.e. the spectra are defined within spaces themselves, and they are independent of the way the spaces are embedded into some other space. In this sense $d_N$ is a physical measure of closeness between spaces.

The basic properties of the spectral distance have been investigated. Among several possibilities, one particular choice of the spectral distance $d_N$ is especially important since it can be derived from quite a general argument for introducing a distance, and since it can be related to the reduced density matrix element in quantum cosmology under some circumstances. At the same time, however, it turned out that this form of $d_N$ does not satisfy the triangle inequality [10].
Though the triangle inequality is not of absolute necessity, its utility clearly makes the arguments efficient and compact, and furthermore it is compatible with our intuitive notion of "closeness". Thus further investigations on this point have been awaited.

In this paper we will show that the failure of the triangle inequality of the spectral distance $d_N$ is only a mild one, and in fact we will see that its slight modification, $\bar{d}_N$, recovers the inequality. With the help of $\bar{d}_N$, which is a distance in a rigorous sense, we can investigate the properties of the space $\mathcal{S}_N$, the space of all spaces equipped with $d_N$. Then we will see that, as a topological space, the space $\mathcal{S}_N$ is equivalent to the space of all spaces equipped with $\bar{d}_N$, which is a metric space. Thus $\mathcal{S}_N$ is a metrizable space and it provides us with a notion of "closeness" between spaces in a manner that is compatible with our intuitive notion of "closeness". In this way it is justified to treat $\mathcal{S}_N$ as a metric space, provided that we resort to $\bar{d}_N$ whenever the triangle inequality is needed in the arguments. We will also show that the space $\mathcal{S}_N$ possesses several desirable properties, such as the partition of unity, local compactness and second countability. These observations support regarding the space $\mathcal{S}_N$ as a basic arena for spacetime physics.

In Sect. 2, we will introduce the spectral representation and the spectral distance $d_N$. In Sect. 3, we will consider the space $\mathcal{S}_N$, the space of all spaces equipped with $d_N$, and will see that $\mathcal{S}_N$ can be regarded as a metric space. We will also investigate several other properties of $\mathcal{S}_N$. Section 4 is devoted to several discussions.

2. The Spectral Distance

Let us first recollect the attempts in Riemannian geometry to define "closeness" between spaces. There is the Gromov-Hausdorff distance $d_{GH}(X, Y)$ between two compact metric spaces $X$ and $Y$ [2,7]. It is defined by means of isometric embeddings of $X$ and $Y$ into another metric space $Z$ as
\[
d_{GH}(X, Y) := \inf_{\varphi_1, \varphi_2} d_H(\varphi_1(X), \varphi_2(Y)).
\]
Here $d_H$ denotes the Hausdorff distance^1 on $Z$; $\varphi_1 : X \hookrightarrow Z$ and $\varphi_2 : Y \hookrightarrow Z$ are isometric embeddings of $X$ and $Y$, respectively, into $Z$. In other words, we search for the optimal isometric embeddings such that $X$ and $Y$ overlap "as closely as possible" in $Z$, and the distance is defined as the extent to which they fail to overlap.

Another quantity related to "closeness" between spaces is a norm of Riemannian manifolds due to Petersen [7]. Let $(U_\lambda, \phi_\lambda)_{\lambda\in\Lambda}$ be an atlas of a Riemannian manifold $(\Sigma, h)$. If each patch $U_\lambda$ is chosen to be sufficiently small, the metric tensors w.r.t. (with respect to) the atlas do not change much within each chart $(U_\lambda, \phi_\lambda)$; namely, locally the manifold looks Euclidean to some extent. Now, the larger the size of the charts becomes, the more the metric tensors vary within each chart. The norm due to Petersen can be explained as the maximum size of the admissible charts under the condition that the variation of the metric tensors within each chart lies within a given range. Its precise definition is very complicated and we do not go further here. This norm measures how close a Riemannian manifold is to the Euclidean space, but it does not provide us with the distance between two manifolds.

Though these quantities play a significant role in the convergence theory of Riemannian geometry [7], they seem too abstract and complicated to be directly applied to spacetime physics. Thus let us focus on another measure of closeness between spaces which would be suitable for spacetime physics. We make use of the eigenvalues of an elliptic operator on a space (or "spectra" hereafter). The spectra contain the information of both local geometry and global topology, and the difference in geometry is reflected in the difference in the spectra.^2 Thus, to state it symbolically, "we 'hear' the shape of the universe". Let us call such a representation of geometry in terms of the spectra the spectral representation, for brevity [10].

Now let $Riem$ be a space of all $D$-dimensional, compact Riemannian manifolds without boundaries. Let $\mathcal{G} = (\Sigma, h)$, $\mathcal{G}' = (\Sigma', h') \in Riem$. (We regard them as models of spaces and not spacetimes.) For definiteness we consider only the Laplacian operator $\Delta$ here as an elliptic operator, though the arguments below are quite universal. Setting the eigenvalue problem on each manifold
\[
\Delta f = -\lambda f,
\]
the set of eigenvalues (numbered in increasing order) is obtained; $\{\lambda_m\}_{m=0}^{\infty}$ for $\mathcal{G}$ and $\{\lambda'_n\}_{n=0}^{\infty}$ for $\mathcal{G}'$.

^1 Let $(Z, d)$ be a metric space. For $A \subset Z$ and $\epsilon > 0$, we define the $\epsilon$-neighborhood of $A$ as $B(A, \epsilon) := \{x \in Z \,|\, d(x, A) < \epsilon\}$, where $d(x, A) := \inf_{y\in A} d(x, y)$. Then the Hausdorff distance between two subsets of $Z$, $A_1$ and $A_2$, is defined as $d_H(A_1, A_2) := \inf\{\epsilon \,|\, \epsilon > 0,\ A_1 \subset B(A_2, \epsilon),\ A_2 \subset B(A_1, \epsilon)\}$.
^2 Geometrical structures are classified into two categories: local geometry and global geometry (global topology), though they are related to each other and there is no clear separation between them. Throughout this paper "geometry" indicates both properties. We use the symbols $\mathcal{G}$, $\mathcal{G}'$, etc. to represent geometry as a whole in this broad sense.

The first option that one can imagine easily is probably
\[
d_{Euclid}(\mathcal{G}, \mathcal{G}') := \sqrt{\sum_{n=1}^{N} (\lambda_n - \lambda'_n)^2},
\]

which is similar to the Euclidean distance on $\mathbb{R}^N$. However, from the viewpoint of physics, it is unsatisfactory for two reasons.
(1) The spectra $\{\lambda_n\}_{n=1}^{\infty}$ have the physical dimension [Length$^{-2}$], so that $d_{Euclid}$ also has the dimension [Length$^{-2}$]. It introduces a scale into the theory, which is not desirable. For instance, the statement $d_{Euclid} \ll 1$ becomes meaningless, and rather we should say $d_{Euclid} \ll l_{pl}^{-2}$, if we choose $l_{pl}$ as a typical length scale. In this way a particular scale (e.g. $l_{pl}$) enters into the discussion. This is unsatisfactory, considering the fundamental nature of the theory of "closeness" between spaces.
(2) Remembering the arguments of scale-dependent topology (see Sect. 1), it is clear that, in spacetime physics, the larger scale behavior of geometry is of more importance than the smaller scale behavior. However, looking at the expression of $d_{Euclid}$, the measure $d_{Euclid}$ counts the smaller scale behavior (i.e. $\lambda_n$ with larger $n$) with more importance, which is unsatisfactory.
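The two objections to $d_{Euclid}$ can be made concrete with a short numerical sketch. It assumes, purely for illustration, that both geometries are circles, whose nonzero Laplacian eigenvalues are $k^2/R^2$ ($k = 1, 2, \dots$), each with multiplicity 2. The helper names `circle_spectrum`, `d_euclid` and `top_mode_share` are hypothetical and are not taken from Ref. [10].

```python
import math


def circle_spectrum(radius, n_modes):
    """First n_modes nonzero Laplacian eigenvalues of a circle of the given radius.

    The eigenvalues are k^2 / radius^2 for k = 1, 2, ..., each appearing twice.
    """
    return [((i // 2) + 1) ** 2 / radius ** 2 for i in range(n_modes)]


def d_euclid(spec_a, spec_b):
    """Euclidean-type measure: sqrt of the summed squared eigenvalue differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))


def top_mode_share(spec_a, spec_b, n_top):
    """Fraction of d_euclid^2 contributed by the n_top highest modes."""
    squares = [(a - b) ** 2 for a, b in zip(spec_a, spec_b)]
    return sum(squares[-n_top:]) / sum(squares)


if __name__ == "__main__":
    n_cut = 6
    small_a, small_b = circle_spectrum(1.0, n_cut), circle_spectrum(1.1, n_cut)
    # Same pair of shapes with both radii multiplied by 10.
    large_a, large_b = circle_spectrum(10.0, n_cut), circle_spectrum(11.0, n_cut)
    print("d_euclid ratio (small / large):",
          d_euclid(small_a, small_b) / d_euclid(large_a, large_b))
    print("share of the two highest modes:", top_mode_share(small_a, small_b, 2))
```

In this toy setting a uniform rescaling of both circles changes $d_{Euclid}$ by the square of the scale factor, and the highest modes included in the sum carry most of its value, which is the behavior criticized in points (1) and (2).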

Thus a simple difference $\lambda_n - \lambda'_n$ is not appropriate for our purpose. Rather we should take the ratio $\frac{\lambda'_n}{\lambda_n} = 1 + \frac{\delta\lambda_n}{\lambda_n}$, which implies that the difference $\delta\lambda_n := \lambda'_n - \lambda_n$ in the lower spectrum is counted with more importance. Hence we introduce a measure of closeness between $\mathcal{G}$ and $\mathcal{G}'$ as
\[
d_N(\mathcal{G}, \mathcal{G}') = \sum_{n=1}^{N} F\!\left(\frac{\lambda'_n}{\lambda_n}\right), \tag{1}
\]
where $F(x)$ ($x > 0$) is a suitably chosen function which satisfies $F \ge 0$, $F(1/x) = F(x)$, $F(y) > F(x)$ if $y > x \ge 1$. For most of the cases we also require $F(1) = 0$. (However, see an exceptional case below ($F_2$).) Note that the zero modes $\lambda_0 = \lambda'_0 = 0$ are excluded from the summation in Eq. (1).

On dimensional grounds, $\lambda_n$ with the larger number $n$ reflects the smaller scale behavior of the geometry. Therefore the cut-off number $N$ indicates the scale up to which $\mathcal{G}$ and $\mathcal{G}'$ are compared. In other words, the difference between the two geometries in the scale of order $o(1/\sqrt{\lambda_N})$ is neglected. Treating $N$ as a running parameter, $d_N(\mathcal{G}, \mathcal{G}')$ as a function of $N$ indicates the coarse-grained similarity between $\mathcal{G}$ and $\mathcal{G}'$ at each scale. In this way the spectral measure of closeness gives a natural basis for analyzing the scale-dependent behavior of geometries. Here we remember that such a quantitative, scale-dependent description of the geometrical structures (the scale-dependent topology in particular) is essential for developing spacetime physics (see Sect. 1) [8,9].

Now there are several possibilities for the choice of $F$ in Eq. (1). Its detailed form should be determined according to the features of geometry that we are interested in. Among several possibilities, however, there is one especially interesting choice for $F$, which is $F_1(x) = \frac{1}{2} \ln \frac{1}{2}\left(\sqrt{x} + 1/\sqrt{x}\right)$. By this choice, $d_N$ can be related to the reduced density matrix element in quantum cosmology. For the details on the derivation and physical interpretation of this choice, we refer the reader to Ref. [10]. Thus we get [10]
\[
d_N(\mathcal{G}, \mathcal{G}') = \frac{1}{2} \sum_{n=1}^{N} \ln \frac{1}{2}\left(\sqrt{\frac{\lambda'_n}{\lambda_n}} + \sqrt{\frac{\lambda_n}{\lambda'_n}}\right). \tag{2}
\]
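As a minimal sketch of how Eq. (2) can be evaluated, the snippet below computes $d_N$ from two finite lists of nonzero eigenvalues. The function names `f1`, `spectral_measure` and `circle_spectrum` are hypothetical, and the circle spectra ($k^2/R^2$, each with multiplicity 2) are used only as convenient test data. For two circles of radii $R$ and $R'$ every ratio $\lambda'_n/\lambda_n$ equals $R^2/R'^2$, so Eq. (2) reduces to $d_N = \frac{N}{2}\ln\frac{1}{2}(R/R' + R'/R)$, which depends only on the ratio of the radii.

```python
import math


def circle_spectrum(radius, n_modes):
    """First n_modes nonzero Laplacian eigenvalues of a circle (k^2 / radius^2, twice each)."""
    return [((i // 2) + 1) ** 2 / radius ** 2 for i in range(n_modes)]


def f1(x):
    """F_1(x) = (1/2) ln[(sqrt(x) + 1/sqrt(x)) / 2], defined for x > 0."""
    root = math.sqrt(x)
    return 0.5 * math.log(0.5 * (root + 1.0 / root))


def spectral_measure(spec_a, spec_b, n_cut):
    """d_N of Eq. (2): sum of F_1 over the first n_cut eigenvalue ratios.

    spec_a and spec_b must list the nonzero eigenvalues in increasing order
    (the zero mode is excluded, as in Eq. (1)).
    """
    if n_cut > min(len(spec_a), len(spec_b)):
        raise ValueError("need at least n_cut eigenvalues for each geometry")
    return sum(f1(b / a) for a, b in zip(spec_a[:n_cut], spec_b[:n_cut]))


if __name__ == "__main__":
    n_cut = 8
    g = circle_spectrum(1.0, n_cut)
    g_prime = circle_spectrum(2.0, n_cut)
    print("d_N(G, G')  =", spectral_measure(g, g_prime, n_cut))
    print("closed form =", 0.5 * n_cut * math.log(0.5 * (2.0 + 0.5)))
    # Running the cut-off N shows the coarse-grained comparison scale by scale.
    for n in (2, 4, 8):
        print(n, spectral_measure(g, g_prime, n))
```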

This measure of closeness^3 possesses the following properties:
(I) $d_N(\mathcal{G}, \mathcal{G}') \ge 0$, and $d_N(\mathcal{G}, \mathcal{G}') = 0 \Leftrightarrow \mathcal{G} \sim \mathcal{G}'$, where $\sim$ means equivalent up to isospectral manifolds^4,
(II) $d_N(\mathcal{G}, \mathcal{G}') = d_N(\mathcal{G}', \mathcal{G})$,
but it does not satisfy the triangle inequality [10],
(III) $d_N(\mathcal{G}, \mathcal{G}') + d_N(\mathcal{G}', \mathcal{G}'') \not\ge d_N(\mathcal{G}, \mathcal{G}'')$.

However, it turns out that the breakdown of the triangle inequality is only a mild one in the following sense: A universal constant $a (> 0)$ can be chosen such that $d'_N(\mathcal{G}, \mathcal{G}') := d_N(\mathcal{G}, \mathcal{G}') + a$ satisfies the triangle inequality. Here $a$ is independent of $\mathcal{G}$ and $\mathcal{G}'$.

Indeed there is another option for $F$ as $F_2(x) := \log_2 \left(\sqrt{x} + 1/\sqrt{x}\right)^{c/N}$. Here $c$ is an arbitrary positive constant. Note that $F_2 = \frac{2c}{N \ln 2}\left(F_1 + \frac{1}{2}\ln 2\right)$. Then a modified measure of closeness becomes
\[
\tilde{d}_N(\mathcal{G}, \mathcal{G}') = \sum_{n=1}^{N} \log_2 \left(\sqrt{\frac{\lambda'_n}{\lambda_n}} + \sqrt{\frac{\lambda_n}{\lambda'_n}}\right)^{c/N}.
\]
This measure $\tilde{d}_N$ satisfies
(I') $\tilde{d}_N(\mathcal{G}, \mathcal{G}') \ge c$, and $\tilde{d}_N(\mathcal{G}, \mathcal{G}') = c \Leftrightarrow \mathcal{G} \sim \mathcal{G}'$, where $\sim$ means equivalent up to isospectral manifolds,
(II') $\tilde{d}_N(\mathcal{G}, \mathcal{G}') = \tilde{d}_N(\mathcal{G}', \mathcal{G})$,
(III') $\tilde{d}_N(\mathcal{G}, \mathcal{G}') + \tilde{d}_N(\mathcal{G}', \mathcal{G}'') \ge \tilde{d}_N(\mathcal{G}, \mathcal{G}'')$.

In this case the triangle inequality holds ((III')) but the lower bound of $\tilde{d}_N$ is $c (> 0)$ and not zero ((I')). Note that $\tilde{d}_N = \frac{2c}{N \ln 2} d_N + c$. Thus, the precise form of (III) turns out to be
\[
d_N(\mathcal{G}, \mathcal{G}') + d_N(\mathcal{G}', \mathcal{G}'') + \frac{N \ln 2}{2} \ge d_N(\mathcal{G}, \mathcal{G}'').
\]
Looking at (I)–(III) and (I')–(III'), we see that the measures of closeness introduced here are generalizations of an ordinary distance.^5

^3 We will see below that it is justified to regard $d_N$ as a distance if suitable care is taken. Until then, however, let us call it a "measure of closeness" for safety.
^4 Two non-isometric Riemannian manifolds $\mathcal{G}$ and $\mathcal{G}'$ are called isospectral manifolds when $\{\lambda_m\}_{m=0}^{\infty} \equiv \{\lambda'_n\}_{n=0}^{\infty}$ [6,3,1]. However, the weaker condition $\{\lambda_m\}_{m=0}^{N} \equiv \{\lambda'_n\}_{n=0}^{N}$ is enough instead of $\{\lambda_m\}_{m=0}^{\infty} \equiv \{\lambda'_n\}_{n=0}^{\infty}$ for the present purpose.
^5 It is desirable to construct the theory of a generalized metric space which is characterized by the generalized axioms of distance:
(a) $d(p, q) \ge 0$, and $d(p, q) = 0 \Leftrightarrow p = q$.
(b) $d(p, q) = d(q, p)$.
(c) $d(p, q) + d(q, r) + c \ge d(p, r)$, where $c > 0$ is a universal constant independent of $p$, $q$ and $r$.
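The mild failure of the triangle inequality for $d_N$, and the relaxed inequality with the additive constant $\frac{N \ln 2}{2}$, can be checked numerically. The sketch below is an illustration under stated assumptions: the three one-mode "spectra" $\{1\}$, $\{100\}$, $\{10000\}$ and the random test spectra are hypothetical data chosen here, not examples from Ref. [10], and the names `d_n`, `triangle_gap` and `random_spectrum` are likewise made up for this sketch.

```python
import math
import random


def f1(x):
    """F_1(x) = (1/2) ln[(sqrt(x) + 1/sqrt(x)) / 2], defined for x > 0."""
    root = math.sqrt(x)
    return 0.5 * math.log(0.5 * (root + 1.0 / root))


def d_n(spec_a, spec_b):
    """d_N of Eq. (2) for two equally long lists of nonzero eigenvalues."""
    return sum(f1(b / a) for a, b in zip(spec_a, spec_b))


def triangle_gap(g0, g1, g2, slack=0.0):
    """d_N(g0,g1) + d_N(g1,g2) + slack - d_N(g0,g2); a negative value is a violation."""
    return d_n(g0, g1) + d_n(g1, g2) + slack - d_n(g0, g2)


def random_spectrum(rng, n_modes):
    """Hypothetical test data: n_modes positive numbers in increasing order."""
    return sorted(rng.uniform(0.1, 10.0) for _ in range(n_modes))


if __name__ == "__main__":
    g0, g1, g2 = [1.0], [100.0], [10000.0]
    n_cut = len(g0)
    print("gap without constant:", triangle_gap(g0, g1, g2))
    print("gap with N ln2 / 2  :", triangle_gap(g0, g1, g2, slack=n_cut * math.log(2) / 2))

    rng = random.Random(0)
    n_cut = 5
    worst = min(
        triangle_gap(*(random_spectrum(rng, n_cut) for _ in range(3)),
                     slack=n_cut * math.log(2) / 2)
        for _ in range(1000)
    )
    print("smallest relaxed gap over random triples:", worst)
```

For the one-mode triple the plain gap is negative, so property (III) is a genuine failure, while adding $\frac{N \ln 2}{2}$ restores a non-negative gap, in line with the precise form of (III) stated above.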


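For later use, we also pay attention to another choice for $F$ in Eq. (1): we can choose $F_0(x) := \frac{1}{2} \ln \max\left(\sqrt{x}, 1/\sqrt{x}\right)$, which is a slight modification of $F_1$. Then Eq. (1) becomes

$$\bar d_N(G, G') = \frac{1}{2} \sum_{n=1}^{N} \ln \max\left(\sqrt{\frac{\lambda'_n}{\lambda_n}}, \sqrt{\frac{\lambda_n}{\lambda'_n}}\right). \tag{3}$$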

It is clear that $\bar d_N$ satisfies all of the axioms of a distance. In particular it satisfies the triangle inequality

(III'') $\bar d_N(G, G') + \bar d_N(G', G'') \geq \bar d_N(G, G'')$,

because of the relation $\max(x, 1/x)\,\max(y, 1/y) \geq \max(xy, 1/(xy))$ for $x, y > 0$. Therefore $\bar d_N$ is a distance.

3. The Space of All Spaces Equipped with the Spectral Distance

We introduce an $r$-ball centered at $G$ defined by $d_N$ as

$$B(G, r; d_N) := \{G' \in \mathrm{Riem}/_\sim \mid d_N(G, G') < r\}.$$

Here $d_N$ is the one defined by Eq. (2) and $\sim$ indicates the identification of isospectral manifolds. We also consider an $r$-ball centered at $G$ defined by $\bar d_N$ as

$$B(G, r; \bar d_N) := \{G' \in \mathrm{Riem}/_\sim \mid \bar d_N(G, G') < r\}.$$

Below we will show that the set of all balls defined by $d_N$ forms a basis of topology (Lemma 6 below), and that the topology generated by this set of balls is equivalent to the topology generated by the set of all balls defined by $\bar d_N$ (Theorem 1 below). Here we note that the latter topology makes the space $\mathrm{Riem}/_\sim$ a metric space. Thus the space

$$S_N^o := (\mathrm{Riem}, d_N)/_\sim \tag{4}$$

turns out to be a metrizable space, which is an ideal property. (We will consider its completion $S_N$ after establishing Theorem 1.) Now we prepare a series of lemmas before showing Theorem 1.

Lemma 1.
(1) $d_N(G, G') \leq \bar d_N(G, G')$.
(2) For $\forall B(G, r; d_N)$ there exists $r'\,(>0)$ s.t. $B(G, r'; \bar d_N) \subset B(G, r; d_N)$.
(3) For $\forall B(G, r; \bar d_N)$ there exists $r'\,(>0)$ s.t. $B(G, r'; d_N) \subset B(G, r; \bar d_N)$.

Proof. (1): It immediately follows from the inequality $\frac{1}{2}(p + 1/p) \leq \max(p, 1/p)$ for $p > 0$.
(2): Indeed, it follows that $B(G, r; \bar d_N) \subset B(G, r; d_N)$ due to Lemma 1 (1).
(3): Suppose there exist $G$ and $r > 0$ s.t. $B(G, \epsilon; d_N) \not\subset B(G, r; \bar d_N)$ for $\forall \epsilon > 0$. For a fixed $\epsilon > 0$, take $G'$ s.t. $G' \in B(G, \epsilon; d_N) \setminus B(G, r; \bar d_N)$. Then $d_N(G, G') < \epsilon$, which implies that

$$\frac{1}{2} \ln \frac{1}{2}\left(\sqrt{\frac{\lambda'_n}{\lambda_n}} + \sqrt{\frac{\lambda_n}{\lambda'_n}}\right) < \epsilon \quad \text{for } n = 1, 2, \cdots, N.$$

(Here $\{\lambda_n\}_{n=0}^{\infty}$ and $\{\lambda'_n\}_{n=0}^{\infty}$ are the spectra corresponding to $G$ and $G'$, respectively.) Then it easily follows that

$$1 \leq \max\left(\sqrt{\frac{\lambda'_n}{\lambda_n}}, \sqrt{\frac{\lambda_n}{\lambda'_n}}\right) < \exp 2\epsilon + \sqrt{\exp 4\epsilon - 1} \quad \text{for } n = 1, 2, \cdots, N.$$

Thus we get a relation $r \leq \bar d_N(G, G') < \frac{N}{2} \ln\left(\exp 2\epsilon + \sqrt{\exp 4\epsilon - 1}\right)$. However, this inequality cannot hold for $\epsilon$ s.t. $0 < \epsilon < \frac{1}{2} \ln\left(\cosh \frac{2r}{N}\right)$, which is a contradiction. □
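The inequalities behind Lemma 1 lend themselves to a quick numerical sanity check. The sketch below is an illustration under stated assumptions rather than part of the proof: the spectra are random positive numbers (hypothetical data), and the upper bound is used in its non-strict form with $\epsilon$ set equal to $d_N(G, G')$ itself, which is our reading of the estimate in part (3). The triangle inequality for $\bar d_N$ is checked at the same time.

```python
import math
import random

def d_N(spec_a, spec_b):
    """Sum over modes of (1/2) ln[(1/2)(sqrt(b/a) + sqrt(a/b))]."""
    return sum(0.5 * math.log(0.5 * (math.sqrt(b / a) + math.sqrt(a / b)))
               for a, b in zip(spec_a, spec_b))

def d_bar_N(spec_a, spec_b):
    """Sum over modes of (1/2) ln max(sqrt(b/a), sqrt(a/b)), as in Eq. (3)."""
    return sum(0.5 * math.log(max(math.sqrt(b / a), math.sqrt(a / b)))
               for a, b in zip(spec_a, spec_b))

def lemma1_upper_bound(eps, n_modes):
    """(N/2) ln(exp(2 eps) + sqrt(exp(4 eps) - 1)), the bound from part (3)."""
    root = math.sqrt(max(math.exp(4.0 * eps) - 1.0, 0.0))
    return 0.5 * n_modes * math.log(math.exp(2.0 * eps) + root)

random.seed(0)
N = 5        # number of modes kept (illustrative choice)
TOL = 1e-9   # slack for floating-point rounding

def random_spectrum():
    # Hypothetical positive "eigenvalues"; not spectra of actual manifolds.
    return [random.uniform(0.5, 20.0) for _ in range(N)]

violations = 0
for _ in range(1000):
    G, G1, G2 = random_spectrum(), random_spectrum(), random_spectrum()
    dn, db = d_N(G, G1), d_bar_N(G, G1)
    checks = (
        dn <= db + TOL,                                   # Lemma 1 (1)
        db <= lemma1_upper_bound(dn, N) + TOL,            # estimate in (3)
        db + d_bar_N(G1, G2) + TOL >= d_bar_N(G, G2),     # (III'')
    )
    violations += not all(checks)

print("violations:", violations)
```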

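Lemma 2. For $\forall G'' \in B(G', \epsilon; \bar d_N)$, it follows that

$$0 \leq \frac{|\lambda''_n - \lambda'_n|}{\lambda'_n} < \exp 4\epsilon - 1 \quad (n = 1, 2, \cdots, N).$$

Here $\{\lambda'_n\}_{n=0}^{\infty}$ and $\{\lambda''_n\}_{n=0}^{\infty}$ are the spectra corresponding to $G'$ and $G''$, respectively.

Proof. The assumption implies that

$$0 \leq \frac{1}{2} \ln \prod_{n=1}^{N} \max\left(\sqrt{\frac{\lambda''_n}{\lambda'_n}}, \sqrt{\frac{\lambda'_n}{\lambda''_n}}\right) < \epsilon, \quad \text{or} \quad 1 \leq \prod_{n=1}^{N} \max\left(\sqrt{\frac{\lambda''_n}{\lambda'_n}}, \sqrt{\frac{\lambda'_n}{\lambda''_n}}\right) < \exp 2\epsilon.$$

Thus $1 \leq \max\left(\sqrt{\lambda''_n/\lambda'_n}, \sqrt{\lambda'_n/\lambda''_n}\right) < \exp 2\epsilon$ $(n = 1, 2, \cdots, N)$. From this inequality, either

$$0 \leq \frac{\lambda''_n - \lambda'_n}{\lambda'_n} < \exp 4\epsilon - 1 \quad \text{or} \quad 0 \leq \frac{\lambda'_n - \lambda''_n}{\lambda''_n} < \exp 4\epsilon - 1$$

follows. Then, it is straightforward to get $0 \leq \frac{|\lambda''_n - \lambda'_n|}{\lambda'_n} < \exp 4\epsilon - 1$. □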

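Lemma 3. Let $\{\lambda_n\}_{n=0}^{\infty}$, $\{\lambda'_n\}_{n=0}^{\infty}$ and $\{\lambda''_n\}_{n=0}^{\infty}$ be the spectra for $G$, $G'$ and $G''$, respectively. Then

$$d_N(G, G'') = d_N(G, G') + \frac{1}{4} \sum_{n=1}^{N} \frac{\lambda'_n - \lambda_n}{\lambda_n + \lambda'_n}\, \frac{\lambda''_n - \lambda'_n}{\lambda'_n} + R.$$

Here $R = \sum_{n=1}^{N} c_n \left(\frac{\lambda''_n - \lambda'_n}{\lambda'_n}\right)^2$, with the $c_n$ being finite constants. Furthermore $R$ is bounded as $|R| < \sum_{n=1}^{N} \left(\frac{\lambda''_n - \lambda'_n}{\lambda'_n}\right)^2$.

Proof. (1°) Let $f(x) := \frac{1}{2} \ln \frac{1}{2}\left(\sqrt{x} + 1/\sqrt{x}\right)$ $(x > 0)$. By the Taylor–Maclaurin theorem, it follows that

$$f(x + \delta x) = f(x) + \frac{1}{4}\, \frac{\sqrt{x} - 1/\sqrt{x}}{\sqrt{x} + 1/\sqrt{x}}\, \frac{\delta x}{x} + c(x + \xi\, \delta x) \left(\frac{\delta x}{x}\right)^2 \quad \text{with } 0 < \exists \xi < 1.$$

Here $c(x)$ is a smooth function for $x > 0$. Indeed, setting $\sqrt{x} = \exp\theta$, $f(x)$ can be represented as $f(x) = \frac{1}{2} \ln\cosh\theta$. Then $\frac{df(x)}{dx} = \frac{1}{4x} \tanh\theta =: \frac{1}{x} F_1(\theta)$, and $\frac{d^2 f(x)}{dx^2} = \frac{1}{x^2} F_2(\theta)$, where $F_2(\theta) = \frac{1}{2} F'_1(\theta) - F_1(\theta) = \frac{1}{8}\left(\frac{1}{\cosh^2\theta} - 2\tanh\theta\right)$. Thus $\tilde F_2(x)$, which is $F_2$ regarded as a function of $x$, is a well-defined function of $\sqrt{x} + 1/\sqrt{x}$ and $\sqrt{x} - 1/\sqrt{x}$ and it is smooth for $x > 0$. Then we can set $c(x) = \frac{1}{2!} \tilde F_2(x)$. We also note that $|c(x)| = \frac{1}{2} |F_2(\theta)| < \frac{1}{16}\left(\frac{1}{\cosh^2\theta} + 2|\tanh\theta|\right) < \frac{3}{16} < 1$.

(2°) Applying the result of (1°) to

$$d_N(G, G'') = \sum_{n=1}^{N} f\left(\frac{\lambda''_n}{\lambda_n}\right) = \sum_{n=1}^{N} f\left(\frac{\lambda'_n}{\lambda_n} + \frac{\lambda''_n - \lambda'_n}{\lambda_n}\right),$$

the claim follows. □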

Lemma 4. For $\forall G' \in B(G, r; d_N)$, there exists $\epsilon > 0$ s.t. $B(G', \epsilon; \bar d_N) \subset B(G, r; d_N)$.

Proof. Let $\rho := d_N(G, G')$. (Then $0 \leq \rho < r$.) For a fixed $\epsilon$, take $\forall G'' \in B(G', \epsilon; \bar d_N)$. Then,

$$d_N(G, G'') = d_N(G, G') + \frac{1}{4} \sum_{n=1}^{N} \frac{\lambda'_n - \lambda_n}{\lambda_n + \lambda'_n}\, \frac{\lambda''_n - \lambda'_n}{\lambda'_n} + R \quad \text{(Lemma 3)}$$
$$< \rho + \frac{1}{4} \sum_{n=1}^{N} \frac{|\lambda''_n - \lambda'_n|}{\lambda'_n} + |R|.$$

Now let us pay attention to the last line. The last term is bounded as $|R| < \sum_{n=1}^{N} \left(\frac{\lambda''_n - \lambda'_n}{\lambda'_n}\right)^2$ (Lemma 3). Due to Lemma 2, thus, one can choose sufficiently small $\epsilon$ s.t. the last term $|R|$ is less in magnitude than the middle term. Let $\tilde\epsilon\,(>0)$ be such $\epsilon$. Then we can continue the estimation of $d_N(G, G'')$ as

$$d_N(G, G'') < \rho + \frac{1}{2} \sum_{n=1}^{N} \frac{|\lambda''_n - \lambda'_n|}{\lambda'_n} < \rho + \frac{N}{2}\left(\exp 4\tilde\epsilon - 1\right) \quad \text{(Lemma 2)}.$$

By choosing $\tilde\epsilon$ again if necessary, we can assume that $0 < \tilde\epsilon < \frac{1}{4} \ln\left(1 + \frac{2}{N}(r - \rho)\right)$. Then it follows that $d_N(G, G'') < r$ for $\forall G'' \in B(G', \tilde\epsilon; \bar d_N)$. Hence there exists $\tilde\epsilon\,(>0)$ s.t. $B(G', \tilde\epsilon; \bar d_N) \subset B(G, r; d_N)$. □

Lemma 5. For $\forall G' \in B(G, r; \bar d_N)$, there exists $\epsilon > 0$ s.t. $B(G', \epsilon; d_N) \subset B(G, r; \bar d_N)$.

Proof. Since $\bar d_N$ is a distance, there exists $\epsilon_0\,(>0)$ s.t. $B(G', \epsilon_0; \bar d_N) \subset B(G, r; \bar d_N)$. However, due to Lemma 1 (3), there exists $\epsilon_1\,(>0)$ s.t. $B(G', \epsilon_1; d_N) \subset B(G', \epsilon_0; \bar d_N)$. Hence $B(G', \epsilon_1; d_N) \subset B(G, r; \bar d_N)$. □

Lemma 6. The set of balls $\{B(G, r; d_N) \mid G \in \mathrm{Riem}/_\sim,\ r > 0\}$ can form a basis of topology.

Proof. Let $B_1$ and $B_2$ be two balls defined by $d_N$, and suppose $B_1 \cap B_2 \neq \emptyset$. Because of Lemma 4, for $\forall G \in B_1 \cap B_2$, there exists $r\,(>0)$ s.t. $B(G, r; \bar d_N) \subset B_1 \cap B_2$. However, due to Lemma 1 (3), there exists $r'\,(>0)$ s.t. $B(G, r'; d_N) \subset B(G, r; \bar d_N)$. Hence $B(G, r'; d_N) \subset B_1 \cap B_2$. Thus the set of all balls defined by $d_N$ satisfies the condition for a basis of topology. □

Now we can show

Theorem 1. The set of balls $\{B(G, r; d_N) \mid G \in \mathrm{Riem}/_\sim,\ r > 0\}$ and the set of balls $\{B(G, r; \bar d_N) \mid G \in \mathrm{Riem}/_\sim,\ r > 0\}$ generate the same topology on $\mathrm{Riem}/_\sim$.

Proof. $\forall B(G, r; d_N)$ is open in $\bar d_N$-topology due to Lemma 4. On the other hand, $\forall B(G, r; \bar d_N)$ is open in $d_N$-topology due to Lemma 5. □

Corollary. The space $S_N^o$ is a metrizable space. The distance function for metrization is provided by $\bar d_N$.

Hence it is appropriate to extend $S_N^o$ to its completion $S_N$. We understand that $\{\lambda_n\}_{n=0}^{\infty}$, $d_N$ and $\bar d_N$ are extended on $S_N$ accordingly. Now the metrizable space $S_N$ is a normal space, not to mention a Hausdorff space⁶. Due to Theorem 1, it is justified to regard $S_N$ as a metric space, provided that we resort to the distance function $\bar d_N$ whenever the triangle inequality is needed in a discussion. From now on we call $d_N$ in Eq. (1) (the form of $d_N$ in Eq. (2) in particular) a spectral distance for brevity.
Here it may be appropriate to add some comments on the applications of $S_N$ to physics. From the viewpoint of the practical applications in spacetime physics, $d_N$ is more convenient than $\bar d_N$ since the former is easier to handle than the latter, which contains max in the expression. Furthermore $d_N$ can be related to the reduced density matrix element for the universe in the context of quantum cosmology [10]: It is possible to state that, under some circumstances, two universes $G$ and $G'$ are separated far in $d_N$ when their quantum decoherence is strong.

⁶ For the basics of point set topology, see e.g. Refs. [5,4,14].

With these comments in mind let us now turn back to the mathematical aspects of the space $S_N$. Since $S_N$ is a metrizable space, it follows that [4,14]

Theorem 2. The space $S_N$ is paracompact.

Since $S_N$ is Hausdorff and paracompact, it follows that [4,14]

Corollary. There exists a partition of unity subject to any open covering of $S_N$.

Now we prepare two lemmas to show that $S_N$ is locally compact (Theorem 3 below).

Lemma 7. The set $D(G, r; d_N) := \{G' \in S_N \mid d_N(G, G') \leq r\}$ is closed and compact in $S_N$.

Proof. (1°) Note that the map $d_N(G, \cdot): S_N \to [0, \infty)$ is continuous and that $[0, r]$ is closed in $[0, \infty)$. Since $D(G, r; d_N)$ is the inverse image of the closed set $[0, r]$ by the continuous map $d_N(G, \cdot)$, it is closed in $S_N$.

(2°) We now show that $D(G, r; d_N)$ is sequentially compact. Any sequence $\{G_n\}_{n=1}^{\infty} \subset D(G, r; d_N)$ can be embedded into an $N$-cube in $\mathbb{R}^N$, $\{G_n\}_{n=1}^{\infty} \hookrightarrow [0, L]^N$ for some $L > 0$. Indeed let $\{\lambda_k^{(n)}\}_{k=1}^{\infty}$ be the spectra (zero-mode is excluded) for $G_n$. Then a map

$$\{\lambda_k^{(n)}\}_{k=1}^{N} \mapsto \mu^{(n)} := \left(\sqrt{\frac{\lambda_1^{(n)}}{\lambda_1}}, \sqrt{\frac{\lambda_2^{(n)}}{\lambda_2}}, \cdots, \sqrt{\frac{\lambda_k^{(n)}}{\lambda_k}}, \cdots, \sqrt{\frac{\lambda_N^{(n)}}{\lambda_N}}\right)$$

provides the embedding. ($\{\lambda_k\}_{k=1}^{\infty}$ is the spectra for $G$.) Here we note that $\sqrt{\lambda_k^{(n)}}$ $(k = 1, 2, \cdots, N)$ is bounded as $\sqrt{\lambda_k^{(n)}} \leq \left(\exp 2r + \sqrt{\exp 4r - 1}\right) \sqrt{\lambda_k}$ because $d_N(G, G_n) \leq r$. Hence we can set $L = \exp 2r + \sqrt{\exp 4r - 1}$.

Now $\{\mu^{(n)}\}_{n=1}^{\infty}$ is a sequence in a compact set $[0, L]^N$, so that there exists its subsequence $\{\mu^{(n')}\}_{n'=1}^{\infty}$ which converges to a point in $[0, L]^N$ (in the sense of $\mathbb{R}^N$-topology). Let this convergent point be $\mu^{(\infty)} = (\mu_1^{(\infty)}, \mu_2^{(\infty)}, \cdots, \mu_N^{(\infty)})$. Then for any $\epsilon$ $(0 < \epsilon < 1)$ there exists $M \in \mathbb{N}$ s.t. for $\forall m \geq M$, it follows that $1 - \epsilon < \mu_k^{(m)}/\mu_k^{(\infty)} < 1 + \epsilon$. Then it follows that

$$\frac{1}{2} \sum_{k=1}^{N} \ln \max\left(\frac{\mu_k^{(m)}}{\mu_k^{(\infty)}}, \frac{\mu_k^{(\infty)}}{\mu_k^{(m)}}\right) < \frac{N}{2} \ln\left(\frac{1}{1 - \epsilon}\right).$$

This implies that, for any $\epsilon$ $(0 < \epsilon < 1)$, there exists $M \in \mathbb{N}$ s.t. for $\forall m, m' \geq M$, $\bar d_N(G_m, G_{m'}) < N \ln\left(\frac{1}{1 - \epsilon}\right)$. Namely $\{G_{n'}\}_{n'=1}^{\infty}$ corresponding to $\{\mu^{(n')}\}_{n'=1}^{\infty}$ is a Cauchy sequence w.r.t. $\bar d_N$. Thus $\{G_{n'}\}_{n'=1}^{\infty}$ is a Cauchy sequence w.r.t. $d_N$ also, due to Lemma 1 (1). However $D(G, r; d_N)$ is closed in the complete space $S_N$, so that $\{G_{n'}\}_{n'=1}^{\infty}$ converges to a point $\exists G_\infty$ in $D(G, r; d_N)$. Hence any sequence $\{G_n\}_{n=1}^{\infty} \subset D(G, r; d_N)$ contains a subsequence which converges to a point in $D(G, r; d_N)$, i.e. $D(G, r; d_N)$ is sequentially compact.

(3°) Since $D(G, r; d_N)$ is a set in the metrizable space $S_N$ as well as sequentially compact, it is compact. □

Lemma 8. The set $D(G, r; \bar d_N) := \{G' \in S_N \mid \bar d_N(G, G') \leq r\}$ is closed and compact in $S_N$.


Proof. Since $\bar d_N$ is a distance, it is clear that $D(G, r; \bar d_N)$ is closed in $\bar d_N$-topology. Thus $D(G, r; \bar d_N)$ is closed in $S_N$ due to Theorem 1. The rest goes almost in the same manner as the Proof of Lemma 7: Any sequence $\{G_n\}_{n=1}^{\infty} \subset D(G, r; \bar d_N)$ can be embedded into an $N$-cube in $\mathbb{R}^N$, $\{G_n\}_{n=1}^{\infty} \hookrightarrow [0, L]^N$ for some $L > 0$. The embedding map $\{\lambda_k^{(n)}\}_{k=1}^{N} \mapsto \mu^{(n)}$ is the same as in the Proof of Lemma 7. Since $\bar d_N(G, G_n) \leq r$, it follows that $\sqrt{\lambda_k^{(n)}} \leq \exp 2r\, \sqrt{\lambda_k}$ $(k = 1, 2, \cdots, N)$ due to Lemma 2. Thus we can set $L = \exp 2r$. Because $\{\mu^{(n)}\}_{n=1}^{\infty}$ is a sequence in a compact set $[0, L]^N$, there exists its subsequence $\{\mu^{(n')}\}_{n'=1}^{\infty}$ which converges to a point $\mu^{(\infty)}$ in $[0, L]^N$ (in the sense of $\mathbb{R}^N$-topology). Repeating the same argument as in the Proof of Lemma 7, we conclude that $\{G_{n'}\}_{n'=1}^{\infty}$ corresponding to $\{\mu^{(n')}\}_{n'=1}^{\infty}$ is a Cauchy sequence w.r.t. $d_N$. However $D(G, r; \bar d_N)$ is closed in the complete space $S_N$, then $\{G_{n'}\}_{n'=1}^{\infty}$ converges to a point $\exists G_\infty$ in $D(G, r; \bar d_N)$. Hence $D(G, r; \bar d_N)$ is sequentially compact. Since $D(G, r; \bar d_N)$ is a set in the metrizable space $S_N$ as well as sequentially compact, it is compact. □

Theorem 3. The space $S_N$ is locally compact.

Proof. For any $G \in S_N$, one can take $D(G, r; d_N)$ or $D(G, r; \bar d_N)$ as its compact neighborhood because of Lemma 7 and Lemma 8. □

Corollary. If a sequence of continuous functions on $S_N$, $\{f_n\}_{n=1}^{\infty}$, pointwise converges to a function $f_\infty$, then $f_\infty$ is continuous on a dense subset of $S_N$.

Proof. The set of discontinuous points of $f_\infty$ is a set of first category⁷. Since $S_N$ is Hausdorff and locally compact, it becomes a Baire space, i.e. a space in which the complement of any set of first category becomes dense [4,14]. Thus the claim follows. □

Due to Theorem 3, we can define integral over $S_N$. Since $S_N$ is Hausdorff and locally compact, one can consider its one-point compactification $S_N \cup \{\infty\}$, which is Hausdorff. Moreover, $S_N$ is metrizable so that it is completely regular, then one can construct its Stone–Čech compactification [4,14]. Furthermore we can show

Theorem 4. The space $S_N$ satisfies the second countability axiom.

Proof. (1°) Since $S_N$ is a metrizable space, it suffices to show that $S_N$ is separable [4,14]. First we choose a suitable countable subset of $S_N$. For a fixed $M \in \mathbb{N}$, the label $(m_1 \leq m_2 \leq \cdots \leq m_N)$ is uniquely assigned to the spectra $\{\lambda_n\}_{n=1}^{\infty}$, where $m_1, m_2, \cdots, m_N \in \mathbb{N}$: This can be achieved by choosing $m_1, m_2, \cdots, m_N$ s.t. $\frac{m_n - 1}{M} < \lambda_n \ell^2 \leq \frac{m_n}{M}$ $(n = 1, 2, \cdots, N)$. (Here $\ell$ is any constant of physical dimension [Length]. It has been introduced only for physical comfort, and it is not essential for the arguments below.) For a given $M$, thus, the space $S_N$ is uniquely decomposed into classes labeled by $(M; m_1 \leq m_2 \leq \cdots \leq m_N)$. (Some of the classes can be empty.) Then we can choose a representative $G_{(M; m_1 \leq m_2 \leq \cdots \leq m_N)}$ in the class $(M; m_1 \leq m_2 \leq \cdots \leq m_N) \neq \emptyset$. Thus we obtain a countable subset $C := \{G_{(M; m_1 \leq m_2 \leq \cdots \leq m_N)} \mid M, m_1, m_2, \cdots, m_N \in \mathbb{N}\}$. (For notational simplicity let $(M; m)$ denote $(M; m_1 \leq m_2 \leq \cdots \leq m_N)$ hereafter.)

(2°) Take $\forall G \in S_N$. For $\forall M \in \mathbb{N}$, there uniquely exists a class $(M; m)$ s.t. $G \in (M; m)$. Let $\{\lambda_n\}_{n=0}^{\infty}$ and $\{\lambda^*_n\}_{n=0}^{\infty}$ be, respectively, the spectra for $G$ and $G_{(M;m)}$, the representative of the class $(M; m)$. Then $|\lambda_n - \lambda^*_n|\, \ell^2 \leq \frac{1}{M}$. Hence $\sqrt{1 - \frac{1}{M \lambda_n \ell^2}} \leq \sqrt{\frac{\lambda^*_n}{\lambda_n}} \leq \sqrt{1 + \frac{1}{M \lambda_n \ell^2}}$. Thus $\bar d_N(G_{(M;m)}, G) < \frac{N}{2} \ln\left(1 \Big/ \sqrt{1 - \frac{1}{M \lambda_1 \ell^2}}\right)$. Due to Lemma 1 (1), thus, it follows that $\forall \epsilon > 0$, there exists $G_{(M;m)} \in C$ s.t. $G_{(M;m)} \in B(G, \epsilon; d_N)$. Hence $S_N$ is separable, so that the claim follows immediately. □

⁷ A set of first category is defined as a set which can be expressed as a union of at most countable number of sets that are nowhere dense.

Theorem 1–Theorem 4 and their corollaries indicate that one can deal with functions on $S_N$ to a great extent, which makes $S_N$ a basic arena for spacetime physics. Thus it is essential to investigate the mathematical structures of $S_N$ in detail.

4. Discussion

We have introduced the space $S_N$ and have shown that it has several desirable properties. In particular we have shown that $S_N$ is a metrizable space and in effect it can be regarded as a metric space provided that care is taken with regard to the triangle inequality: Whenever we need the arguments linked with the triangle inequality, it is safer to resort to $\bar d_N$, which is a slight modification of $d_N$ and which defines the same point set topology as $d_N$ (Theorem 1). However, $d_N$ is of more importance as well as easier to handle in practical applications. Therefore it is significant that it has been justified to treat $S_N$ as a metric space. Several properties of $S_N$ that we have shown indicate that the space of spaces $S_N$ provides us with a firm platform for pursuing meaningful investigations in spacetime physics (recall the arguments in Sect. 1). Hence we await a more detailed investigation on the properties of $S_N$ to be performed.
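The grid labelling used in the proof of Theorem 4 can be made concrete with a small sketch. Everything below is illustrative: the spectrum is a hypothetical list of positive numbers, the function names are ours, and the grid point $m_n/M$ is used as a stand-in for the spectrum of a class representative (in the proof the representative is an actual element of the class, not a grid point).

```python
import math

def grid_label(spectrum, M, ell=1.0):
    """Labels m_n with (m_n - 1)/M < lambda_n * ell**2 <= m_n/M."""
    return tuple(math.ceil(M * lam * ell ** 2) for lam in spectrum)

def d_bar_N(spec_a, spec_b):
    """Sum over modes of (1/2) ln max(sqrt(b/a), sqrt(a/b)), as in Eq. (3)."""
    return sum(0.5 * math.log(max(math.sqrt(b / a), math.sqrt(a / b)))
               for a, b in zip(spec_a, spec_b))

# Hypothetical first N = 3 non-zero eigenvalues (with ell = 1).
spectrum = [0.3712, 1.5237, 4.9318]

distances = []
for M in (10, 100, 1000):
    label = grid_label(spectrum, M)
    # Stand-in for the representative's spectrum: upper edge of each grid cell.
    stand_in = [m / M for m in label]
    distances.append(d_bar_N(spectrum, stand_in))
    print(M, label, distances[-1])
```

As $M$ grows the stand-in spectrum approaches the given one in $\bar d_N$, which is the mechanism behind separability in step (2°).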
Since the spectral distance is explicitly defined in terms of the spectra, which are of definite physical meaning, it possesses direct applicability to physics as well as theoretical firmness. Explicit applications of the spectral formalism will be discussed elsewhere. (See e.g. Ref. [11].)

Finally we make some comments on the isospectral manifolds [6,3,1]. It is no surprise that there exist Riemannian manifolds with identical spectra of the Laplacian even though they are non-isometric to each other: In this case we are comparing "sounds" produced by a single type of oscillation corresponding to the Laplacian. If we change the type of oscillation, namely if we use a different elliptic operator, the difference in sound would make a distinction between such spaces. If by any chance there were non-isometric Riemannian manifolds s.t. the spectra are identical for any elliptic operator, they should be regarded as identical from the physical point of view. We can even imagine a new picture of "space" suggested by the spectral formalism, in which one regards all of the geometrical information of a space as a collection of all spectral information such as

$$\text{Space} = \bigcup_k \left(D_k, \{\lambda_n^{(k)}\}_{n=0}^{\infty}, \{f_n^{(k)}\}_{n=0}^{\infty}\right),$$

where $D_k$ denotes an elliptic operator, and $\{\lambda_n^{(k)}\}_{n=0}^{\infty}$ and $\{f_n^{(k)}\}_{n=0}^{\infty}$ are its spectra and eigenfunctions. Here the index $k$ runs over all possible elliptic operators. Any observation selects out a subclass of elliptic operators related to the observational apparatus so that only a small portion of the whole geometrical information is obtained by a single observation. In some cases, such incomplete information is not enough to distinguish some manifolds from each other. (This is the physical interpretation of isospectral manifolds.) Then one should perform other types of observations (corresponding to other elliptic operators) to make finer distinctions. It is also tempting to regard the spectral information as most fundamental. Further investigations are required as to whether this viewpoint of spacetime makes sense.

Acknowledgements. The author thanks S. Naito for valuable discussions on point set topology. He is also grateful to H. Kodama, K. Piotrkowska and S. Yasukura for helpful comments. This work has been completed during the author's stay in the Department of Mathematics and Applied Mathematics, University of Cape Town. The author thanks the department for its hospitality. He also thanks the Inamori Foundation, Japan, for encouragement as well as financial support.

References

1. Chavel, I.: Eigenvalues in Riemannian Geometry. Orlando: Academic Press, 1984
2. Gromov, M., Lafontaine, J., Pansu, P.: Structures métriques pour les variétés riemanniennes. Paris: Cedic/Fernand Nathan, 1981
3. Kac, M.: Can one hear the shape of a drum? Am. Math. Mon. 73(4), Part II, 1–23 (1966)
4. Kelley, J.L.: General Topology. Princeton: D. van Nostrand, 1955
5. Kolmogorov, A.N., Fomin, S.V.: Foundations of Functional Analysis (4th ed.). Moscow: Nauka, 1976, Chapter 2
6. Milnor, J.: Eigenvalues of the Laplace operator on certain manifolds. Proc. Nat. Acad. Sci. USA 51, 542 (1964)
7. Petersen, P.: Riemannian Geometry. New York: Springer-Verlag, 1998, Chapter 10
8. Seriu, M.: Analytical Description of Scale-Dependent Topology. A Toy Model. Phys. Lett. B319, 74–82 (1993)
9. Seriu, M.: The Scale-Dependent Topology: The Effects of Small Handles on the Propagation of a Field. Vistas in Astronomy 37, 637–640 (1993)
10. Seriu, M.: The Spectral Representation of the Spacetime Structure: The "Distance" between Universes with Different Topologies. Phys. Rev. D53, 6902–6920 (1996)
11. Seriu, M.: Averaging Procedure in Terms of the Spectral Representation. In: K. Oohara (ed.), Proceedings of the 8th Workshop on General Relativity and Gravitation, Niigata University 1998, pp. 334–339. Niigata: Niigata University, 1999
12. Visser, M.: Wormholes, baby universes, and causality. Phys. Rev. D41, 1116–1124 (1990)
13. Wheeler, J.A.: On the Nature of Quantum Geometrodynamics. Ann. Phys. (N.Y.) 2, 604–614 (1957)
14. Yano, K.: Metric Spaces and Topological Structures. Tokyo: Kyoritsu Shuppan, 1998, Chaps. 3 and 4

Communicated by H. Nicolai

Commun. Math. Phys. 209, 407 – 435 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Cluster Coagulation

J. R. Norris

Statistical Laboratory, 16 Mill Lane, Cambridge, CB2 1SB, UK. E-mail: [email protected]

Received: 13 July 1998 / Accepted: 9 August 1999

Abstract: We introduce a general class of coagulation models, where clusters of given types may coagulate in more than one way and where the rate at which this happens may depend on the cluster types. In the continuum version of these models there is a generalization of Smoluchowski's coagulation equation. We introduce a notion of strong solution for this equation and prove the existence of a maximal strong solution, which while it persists is the only solution. When the total rate of coagulation for particles is bounded above and below by constant multiples of the product of their masses, we show that the maximal strong solution coincides with the maximal mass-conserving solution and does not persist for all time. Thus, for these models, loss of mass (to infinity) coincides with divergence of the second moment of the mass distribution and takes place in a finite time. When the total rate of coagulation of large particles is proportional to their masses, we establish the existence and uniqueness of solutions for all time. In a restricted class of "polymer" models, we allow coagulation of weighted shapes in a finite number of ways. For this class we establish a discrete approximation scheme for the continuum dynamics. For each continuum coagulation model, there is a corresponding finite-particle-number stochastic model. We show that, in the polymer case, which includes the case of simple mass coalescence, as the number of particles becomes large, the stochastic model converges weakly to the deterministic continuum model, at an exponential rate.

1. Introduction

In the standard Smoluchowski [vS16] model for coagulation, particles having masses x and y are assumed to coagulate at a rate κ(x, y), forming a single particle of mass x + y. We refer to [Ald90] for a recent survey of the many fields of application and mathematical problems associated with this model. Since the dynamics depend only on the particle masses, this model is most obviously applicable to coalescence, for example of liquid droplets, where the combined particle is indistinguishable from other particles of the same mass. There are many physical processes where one sees the evolution of a cluster of basic particles, which is not characterized by its mass alone. The cluster may have a shape and may retain characteristics depending on the basic particles themselves. Examples are provided by any process involving the coagulation of solid particles or, at a smaller scale, by the growth of polymers. It may be that the coagulation rates in these processes depend only on the cluster mass, in which case the evolution of the mass distribution may be modelled as for coalescence. In general, coalescence will not provide even a good approximation for cluster-dependent coagulation. We aim here to set up and to analyse mathematical models appropriate to physical processes involving pairwise coagulation of clusters x and y at rates K(x, y) depending on the detailed structure of each cluster. Just as for coalescence, the informal idea of clusters coagulating at rate K can be made precise either in terms of a finite stochastic particle system, whose construction is elementary, or in a deterministic continuum version, where the existence and uniqueness of dynamics is not so obvious.

There are three main sorts of result in this paper. First, we will obtain conditions for existence and uniqueness of dynamics in the continuum model (Theorems 2.1 and 2.3). Second, we will provide criteria for the presence or absence of gelation in the continuum models (Theorems 2.1, 2.2 and 2.8). Third, we will establish weak convergence to the continuum dynamics of the stochastic particle system, as the number of particles becomes large (Theorems 4.2 and 4.3). The continuum model is studied in Sects. 2 and 3, and weak convergence in Sect. 4.

The main points of mathematical interest are the following. We are able to adapt our analysis of coalescence models in [Nor99] to the current context.
The continuum equations are treated there as the flow of a quadratic vector field in a space of measures. The same formulation adapts to deal with the present model. By a careful use of positivity, we are able to give, here as in [Nor99], a stronger uniqueness statement for the continuum dynamics than was known even for coalescence. We can identify the gelation time for the multiplicative coalescent exactly, as a function of the initial mass distribution. A coupling argument then allows us to characterize gelation for approximately multiplicative coalescents. We are also able to prove weak convergence of the particle system to the continuum dynamics at an exponential rate. This improves on the corresponding result in [Nor99] and makes use of some aspects of our more general models in an approximation scheme, even when the given rate of coagulation depends on clusters only through their total mass. The approximation scheme allows us to dispense entirely with weak compactness arguments, which are replaced by a finite-dimensional stability argument based on the exponential martingale inequality.

2. A Generalization of Smoluchowski's Coagulation Equation

Let $(E, \mathcal{E})$ be a measurable space. We call the elements of $E$ clusters. We suppose there is given a measurable function $m : E \to (0, \infty)$. For a cluster $x$, we interpret $m(x)$ as its mass. A kernel is a map $K : E \times E \times \mathcal{E} \to [0, \infty]$ such that

(i) $(x, y) \mapsto K(x, y, A) : E \times E \to [0, \infty]$ is measurable for all $A \in \mathcal{E}$,
(ii) $A \mapsto K(x, y, A) : \mathcal{E} \to [0, \infty]$ is a measure for all $(x, y) \in E \times E$.

We say that $K$ is a coagulation kernel if in addition it is finite, symmetric and preserves mass, that is, respectively

(iii) $\bar K(x, y) = K(x, y, E) < \infty$ for all $x, y \in E$,
(iv) $K(x, y, A) = K(y, x, A)$ for all $x, y \in E$ and $A \in \mathcal{E}$,
(v) $m(z) = m(x) + m(y)$ for $K(x, y, \cdot)$-a.a. $z$, for all $x, y \in E$.

For functions $\psi$ on $(0, \infty)$ and subsets $B$ of $(0, \infty)$, we adopt the notation $\tilde\psi = \psi \circ m$ and $\tilde B = m^{-1}(B)$. Denote by $\mathcal{M}^+$ the space of measures on $E$. Given a weight function $w : E \to (0, \infty]$, denote by $\mathcal{M}(w)$ the space of signed measures $\mu$ on $E$ for which $w\mu$ has finite total variation. We write $\|\mu\|$ for the total variation of $\mu$. Let $K$ be a coagulation kernel. For $\mu \in \mathcal{M}^+$ we attempt to define $L(\mu)$ by

$$\langle f, L(\mu) \rangle = \frac{1}{2} \int_{E \times E \times E} \{f(z) - f(x) - f(y)\}\, K(x, y, dz)\, \mu(dx)\, \mu(dy)$$

for some suitable class of test-functions $f$. Let $S$ denote the set of functions on $E$ comprising $\tilde\psi$, where $\psi(x) = x 1_{x \leq 1}$, together with all bounded non-negative measurable functions $f$ supported on $\tilde B$ for some compact set $B \subseteq (0, \infty)$. We consider the following generalization of Smoluchowski's coagulation equation

$$\mu_t = \mu_0 + \int_0^t L(\mu_s)\, ds. \tag{2.1}$$
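For a finitely supported measure the pairing $\langle f, L(\mu) \rangle$ reduces to a double sum, which makes the definition easy to experiment with. The sketch below assumes the simple coalescence setting $E = (0, \infty)$, $m(x) = x$, with a kernel of the form $K(x, y, dz) = \kappa(x, y)$ times the unit mass at $x + y$; the function name `weak_generator`, the sample masses and weights, and the choice of the multiplicative rate $\kappa(x, y) = xy$ are all illustrative assumptions.

```python
def weak_generator(f, masses, weights, kappa):
    """<f, L(mu)> for mu = sum_i weights[i] * (unit mass at masses[i]).

    Assumes K(x, y, dz) = kappa(x, y) * (unit mass at x + y), so the integral
    over z collapses to evaluating f at x + y.
    """
    total = 0.0
    for x, cx in zip(masses, weights):
        for y, cy in zip(masses, weights):
            total += (f(x + y) - f(x) - f(y)) * kappa(x, y) * cx * cy
    return 0.5 * total

# Hypothetical discrete mass distribution.
masses = [1.0, 2.0, 3.5]
weights = [0.5, 0.25, 0.125]

def kappa(x, y):
    """Multiplicative coalescence rate (illustrative choice)."""
    return x * y

# Test function f(x) = x: the bracket f(x+y) - f(x) - f(y) vanishes,
# reflecting that each coagulation event preserves mass.
mass_rate = weak_generator(lambda x: x, masses, weights, kappa)

# Test function f = 1: each event removes one particle on balance.
count_rate = weak_generator(lambda x: 1.0, masses, weights, kappa)

print("d/dt total mass     :", mass_rate)
print("d/dt particle number:", count_rate)
```

Note that $f(x) = x$ is not in the class $S$ of test functions, since it is unbounded; it is used here only because the measure has finite support, so all sums are finite.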

We admit as a local solution any map $t \mapsto \mu_t : [0, T) \to \mathcal{M}^+$, where $T \in (0, \infty]$, such that:

(i) for all Borel sets $A \subseteq E$ the following map is measurable: $t \mapsto \mu_t(A) : [0, T) \to [0, \infty]$;
(ii) for all $f \in S$ we have $\langle f, \mu_0 \rangle < \infty$;
(iii) for all compact sets $B \subseteq (0, \infty)$ and all $t < T$,
$$\int_0^t \int_{\tilde B \times E} \bar K(x, y)\, \mu_s(dx)\, \mu_s(dy)\, ds < \infty;$$
(iv) for all $f \in S$ and all $t < T$,
$$\langle f, \mu_t \rangle = \langle f, \mu_0 \rangle + \int_0^t \langle f, L(\mu_s) \rangle\, ds. \tag{2.2}$$

In the case $T = \infty$, we have a solution. A local solution to (2.1) will also be called a $K$-coagulant. It is easy to check that conditions (i), (ii), (iii) give an unambiguous meaning to (iv). We will sometimes write equations such as (2.2) in their differentiated form, in which case the derivative $(d/dt)\langle f, \mu_t \rangle$ is always to be understood in the weak sense. The condition that (2.2) holds for $\tilde\psi$ is a boundary condition, expressing that no mass enters at 0. We obtain an equivalent condition on replacing $\psi$ by any non-vanishing sublinear function $(0, \infty) \to [0, \infty)$ of bounded support which is linear near 0. A function $\varphi : (0, \infty) \to [0, \infty)$ is sublinear if

$$\varphi(\lambda x) \leq \lambda \varphi(x), \quad x \in (0, \infty),\ \lambda \geq 1.$$
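The sublinearity condition is easy to probe numerically for candidate functions. The following sketch checks $\varphi(\lambda x) \leq \lambda \varphi(x)$ on a finite grid only, so a passing result is evidence rather than proof; the helper name and the sample functions other than $\psi(x) = x 1_{x \leq 1}$ are illustrative choices.

```python
import math

def is_sublinear_on_grid(phi, xs, lams, tol=1e-12):
    """Check phi(lam * x) <= lam * phi(x) for all grid points (lam >= 1)."""
    return all(phi(lam * x) <= lam * phi(x) + tol for x in xs for lam in lams)

xs = [0.01 * k for k in range(1, 501)]     # sample points in (0, 5]
lams = [1.0 + 0.1 * k for k in range(51)]  # scaling factors in [1, 6]

def psi(x):
    """The test function psi(x) = x * 1_{x <= 1} from the definition of S."""
    return x if x <= 1.0 else 0.0

candidates = {
    "psi(x) = x 1_{x<=1}": psi,
    "sqrt(x)": math.sqrt,
    "1 + x": lambda x: 1.0 + x,
    "x^2": lambda x: x * x,   # grows faster than linearly, so expected to fail
}

for name, phi in candidates.items():
    print(name, "->", is_sublinear_on_grid(phi, xs, lams))
```

On this grid the first three candidates pass and $x^2$ fails, as one would expect from the definition.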


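Note that such a function $\varphi$ is always subadditive: $\varphi(x + y) \leq \varphi(x) + \varphi(y)$, $x, y \in (0, \infty)$. Hence $\langle \tilde\varphi, L(\mu) \rangle \leq 0$ for all $\mu \in \mathcal{M}^+$. Note also that, if $\varphi : (0, \infty) \to [0, \infty)$ is any sublinear function and if

$$\varphi_n(x) = \begin{cases} n x\, \varphi(n^{-1}), & 0 < x \leq n^{-1}, \\ \varphi(x), & n^{-1} < x \leq n, \\ 0, & x > n, \end{cases}$$

then $\varphi_n(x) \uparrow \varphi(x)$ for all $x$, and $\varphi_n$ is sublinear of bounded support, linear near 0. So, for any local solution (µt )t
Hence, using monotone convergence on the left and Fatou's lemma on the right,

$$\langle \tilde\varphi, \mu_0 \rangle \geq \langle \tilde\varphi, \mu_t \rangle - \int_0^t \langle \tilde\varphi, L(\mu_s) \rangle\, ds. \tag{2.3}$$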

Thus hϕ, ˜ µt i is non-increasing in t. In particular, the total mass density hm, µt i is nonincreasing in t; if it is finite and constant, then we say that (µt )t
(2.4)

for some continuous sublinear function ϕ : (0, ∞) → (0, ∞). For example, we might take ϕ(x) = a + bx + c(x), where a and b are non-negative constants and c : (0, ∞) → (0, ∞) is continuous and decreasing. We also assume that the initial measure µ0 satisfies hϕ, ˜ µ0 i < ∞.

(2.5)

˜ µt i is non-increasing For any local solution (µt )t
a strong local solution. Here is our basic existence and uniqueness result. It extends the corresponding result of [Nor99] to the case of cluster-dependent coagulation. For earlier work on this problem, see [McL62,McL64,Whi80,BC90,Hei92,DS96]. An example where (2.1) has more than one conservative solution has been constructed in [Nor99]. The proof will follow later. Theorem 2.1. Assume conditions (2.4) and (2.5). If (µt )t 0, then any strong local solution is conservative. Moreover, if hϕ˜ 2 , µ0 i < ∞, then

(i) there exists a unique maximal strong solution $(\mu_t)_{t < \zeta(\mu_0)}$, with $\zeta(\mu_0) \geq \langle \tilde\varphi^2, \mu_0 \rangle^{-1}$,
(ii) if $\varphi^2$ is sublinear or if $\bar K(x, y) \leq \tilde\varphi(x) + \tilde\varphi(y)$ for all $x, y \in E$, then $\zeta(\mu_0) = \infty$.

This result provides sufficient conditions for $\zeta(\mu_0) = \infty$ but not for $\zeta(\mu_0) < \infty$, nor does it assert that $(\mu_t)_{t < \zeta(\mu_0)}$ has no conservative extension; indeed we know this is not always true (see [Nor99]). The prototypical example where $\zeta(\mu_0) < \infty$ is the multiplicative coalescent, which we obtain by taking $E = (0, \infty)$, $m(x) = x$ and $K(x, y, dz) = xy\, \varepsilon_{x+y}(dz)$. We show in Theorem 2.8 that the multiplicative coalescent has no conservative extension beyond $\zeta(\mu_0)$. More generally, let us call $K$ approximately multiplicative if, for some constants $\varepsilon > 0$ and $M < \infty$, we have

$$\varepsilon\, m(x) m(y) \leq \bar K(x, y) \leq M (1 + m(x))(1 + m(y)), \quad x, y \in E.$$

The following result is related to Corollary 2 in [Jeo98], which establishes a notion of finite time gelation for approximately multiplicative coalescence in $\mathbb{N}$, also allowing for particle fragmentation. Our result applies to more general coagulation models but we have not considered the case of fragmentation. By proving that loss of mass occurs with the divergence of the second moment, we are able to tie up conservativity of local solutions with uniqueness.

Theorem 2.2. Assume that $K$ is approximately multiplicative and $\langle m + 1, \mu_0 \rangle < \infty$.
(i) If $\langle m^2, \mu_0 \rangle = \infty$ then (2.1) has no conservative solution.
(ii) If $\langle m^2, \mu_0 \rangle < \infty$ then (2.1) has a unique maximal conservative solution (µt )t
Z 0

t

S(µs )ds

(2.6)

exactly as for (2.1). Note that the extra term in S(µ) comes into play only after a solution has ceased to be conservative. Thus (2.6) and (2.1) have the same conservative local solutions. The extra term represents an interaction between the “mass at infinity” and µt , which is the natural extension in this context of the interaction with large clusters. By contrast, Smoluchowski’s equation makes the “mass at infinity” inert.


Theorem 2.3. Assume that $K$ is eventually multiplicative and that $\langle m+1,\mu_0\rangle<\infty$. Then (2.6) has a unique solution $(\mu_t)_{t\ge0}$ starting from $\mu_0$.

Note that, in many cases, Theorem 2.2 shows that (2.1) has no conservative solution beyond some finite time. Thus the solution provided by Theorem 2.3 will typically be non-conservative.

We turn to the proofs of the above three results. First we introduce a more general sort of dynamics, which will provide the main tool in the proof of Theorem 2.1. Within this framework, we are able to approximate the dynamics on $E$ in terms of an evolution of clusters on $\tilde B\subseteq E$, where $B\subseteq(0,\infty)$ is compact. This has the advantage that $\bar K$ is bounded on $\tilde B\times\tilde B$. At the same time, we have to take account of what the clusters outside $\tilde B$ might do to those inside; that is the function of the auxiliary process $(\lambda_t)_{t\ge0}$.

Let us suppose given a coagulation kernel $J:E\times E\times\mathcal E\to[0,\infty)$, a symmetric measurable function $J^*:E\times E\to[0,\infty]$ and a measurable function $w:E\to(0,\infty]$ such that

$$\bar J(x,y)\le J^*(x,y)\le w(x)w(y),\quad x,y\in E$$

and

$$w\le w(x)+w(y),\quad J(x,y,\cdot)\text{-a.e.}$$

In the case where $w$ is bounded, for $\mu\in\mathcal M(w)$ and $\lambda\in\mathbb R$, we can define $L(\mu,\lambda)\in\mathcal M(1)\times\mathbb R$ by

$$\begin{aligned}
\langle(f,a),L(\mu,\lambda)\rangle={}&\frac12\int_{E\times E\times E}\{f(z)-f(x)-f(y)\}J(x,y,dz)\mu(dx)\mu(dy)\\
&+\frac12\int_{E\times E}\{aw(x)+aw(y)-f(x)-f(y)\}(J^*-\bar J)(x,y)\mu(dx)\mu(dy)\\
&+\lambda\int_E\{aw(x)-f(x)\}w(x)\mu(dx)
\end{aligned}$$

for all bounded measurable functions $f$ on $E$ and all $a\in\mathbb R$. Here we used the notation $\langle(f,a),(\mu,\lambda)\rangle=\langle f,\mu\rangle+a\lambda$. In the case where $w$ is unbounded, this formula still serves to define $L(\mu,\lambda)\in\mathcal M(1)\times(\mathbb R\cup\{\infty\})$ provided $\mu\ge0$ and $\lambda\ge0$. Consider the equation

$$(\mu_t,\lambda_t)=(\mu_0,\lambda_0)+\int_0^tL(\mu_s,\lambda_s)\,ds. \tag{2.7}$$

For each $T\in(0,\infty]$, we admit as a local solution any measurable map $t\mapsto(\mu_t,\lambda_t):[0,T)\to\mathcal M^+(w)\times[0,\infty)$ such that for all non-negative bounded measurable functions $f$ on $E$ and all $a\in\mathbb R$, for all $t<T$,

$$\langle(f,a),(\mu_t,\lambda_t)\rangle=\langle(f,a),(\mu_0,\lambda_0)\rangle+\int_0^t\langle(f,a),L(\mu_s,\lambda_s)\rangle\,ds.$$

When T = ∞ we have a solution. Note that, by our basic assumptions (2.4) and (2.5) and the subsequent remarks, in the ˜ if (µt )t
Cluster Coagulation


Proposition 2.4. Assume that $w$ is bounded. Then, for all $\mu_0\in\mathcal M^+(w)$ and all $\lambda_0\in[0,\infty)$, Eq. (2.7) has a unique solution $(\mu_t,\lambda_t)_{t\ge0}$ starting from $(\mu_0,\lambda_0)$.

Proof. By a scaling argument we may assume without loss that $\langle w,\mu_0\rangle+\lambda_0\le1$. We shall show, by a standard iterative scheme, that there is a constant $T>0$, depending only on $w$, and a continuous map

$$t\mapsto(\mu_t,\lambda_t):[0,T]\to\mathcal M(w)\times\mathbb R \tag{2.8}$$

such that, for all $t\le T$, for all non-negative measurable functions $f$ on $E$ with $f\le w$ and all $a\in\mathbb R$,

$$\langle(f,a),(\mu_t,\lambda_t)\rangle=\langle(f,a),(\mu_0,\lambda_0)\rangle+\int_0^t\langle(f,a),L(\mu_s,\lambda_s)\rangle\,ds. \tag{2.9}$$

Then we shall show, moreover, that $\mu_t\ge0$ for all $t\in[0,T]$. First of all, let us see that this is enough to prove the proposition. If we put $f=0$ and $a=1$ in (2.9), we obtain

$$\frac{d}{dt}\lambda_t=\frac12\int_{E\times E}\{w(x)+w(y)\}(J^*-\bar J)(x,y)\mu_t(dx)\mu_t(dy)+\lambda_t\int_Ew(x)^2\mu_t(dx).$$

So, since $\mu_t\ge0$, we deduce $\lambda_t\ge0$ for all $t$. Hence we have a local solution to (2.7) on $[0,T)$. Next, we put $f=w$ and $a=1$ to see that

$$\frac{d}{dt}\big(\langle w,\mu_t\rangle+\lambda_t\big)=\frac12\int_{E\times E\times E}\{w(z)-w(x)-w(y)\}J(x,y,dz)\mu_t(dx)\mu_t(dy)\le0.$$

Hence

$$\langle w,\mu_T\rangle+\lambda_T\le\langle w,\mu_0\rangle+\lambda_0\le1.$$

We can now start again from $(\mu_T,\lambda_T)$ at time $T$ to extend the solution to $[0,2T]$, and so on, to prove the proposition. We use the following complete norm on $\mathcal M(w)\times\mathbb R$:

$$\|(\mu,\lambda)\|=\|w\mu\|+|\lambda|,\quad\mu\in\mathcal M(w),\ \lambda\in\mathbb R.$$

Note the following estimates: there is a constant $C<\infty$, depending only on $w$, such that, for all $\mu,\mu'\in\mathcal M(w)$ and all $\lambda,\lambda'\in\mathbb R$,

$$\|L(\mu,\lambda)\|\le C\|(\mu,\lambda)\|^2, \tag{2.10}$$

$$\|L(\mu,\lambda)-L(\mu',\lambda')\|\le C\|(\mu,\lambda)-(\mu',\lambda')\|\,\big(\|(\mu,\lambda)\|+\|(\mu',\lambda')\|\big). \tag{2.11}$$

Set $(\mu^0_t,\lambda^0_t)=(\mu_0,\lambda_0)$ for all $t$ and define inductively a sequence of continuous maps $t\mapsto(\mu^n_t,\lambda^n_t):[0,\infty)\to\mathcal M(w)\times\mathbb R$ by

$$(\mu^{n+1}_t,\lambda^{n+1}_t)=(\mu_0,\lambda_0)+\int_0^tL(\mu^n_s,\lambda^n_s)\,ds.$$


Set

$$f_n(t)=\|(\mu^n_t,\lambda^n_t)\|,$$

then $f_0(t)=f_n(0)=\|(\mu_0,\lambda_0)\|\le1$ and by the estimate (2.10),

$$f_{n+1}(t)\le1+C\int_0^tf_n(s)^2\,ds.$$

Hence

$$f_n(t)\le(1-Ct)^{-1},\quad t<C^{-1}$$

for all $n$, so, setting $T=(2C)^{-1}$, we have

$$\|(\mu^n_t,\lambda^n_t)\|\le2,\quad t\le T. \tag{2.12}$$
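The bound leading to (2.12) can be checked numerically. The sketch below is only an illustration under stated assumptions: it replaces the inequality by the worst case $f_{n+1}(t)=1+C\int_0^tf_n(s)^2\,ds$, takes $C=1$ by default, and discretises the integral by a left Riemann sum. The function name `picard_norm_bounds` and the grid parameters are hypothetical and not part of the paper.

```python
def picard_norm_bounds(C=1.0, n_iter=25, n_grid=2000):
    """Iterate f_{n+1}(t) = 1 + C * int_0^t f_n(s)^2 ds on [0, T], T = 1/(2C).

    The integral is a left Riemann sum.  Since the comparison function
    1 / (1 - C t) has an increasing square, the left sum underestimates
    its integral, so the discrete iterates stay below 1 / (1 - C t).
    """
    T = 1.0 / (2.0 * C)
    h = T / n_grid
    grid = [k * h for k in range(n_grid + 1)]
    f = [1.0] * (n_grid + 1)          # f_0(t) = 1 for all t
    history = [f]
    for _ in range(n_iter):
        g = [1.0]                      # f_{n+1}(0) = 1
        acc = 0.0
        for k in range(1, n_grid + 1):
            acc += f[k - 1] ** 2 * h   # left endpoint rule
            g.append(1.0 + C * acc)
        f = g
        history.append(f)
    return grid, history
```

Every iterate returned in `history` should stay below $(1-Ct)^{-1}$ on the grid, and hence below $2$ on $[0,T]$, which is the content of (2.12) in this worst case.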

Next, set $g_0(t)=f_0(t)$ and for $n\ge1$,

$$g_n(t)=\|(\mu^n_t,\lambda^n_t)-(\mu^{n-1}_t,\lambda^{n-1}_t)\|.$$

By the estimates (2.11) and (2.12), there is a constant $C<\infty$, depending only on $w$, such that

$$g_{n+1}(t)\le C\int_0^tg_n(s)\,ds,\quad t\le T.$$

Hence, by the usual arguments, $(\mu^n_t,\lambda^n_t)$ converges in $\mathcal M(w)\times\mathbb R$, uniformly in $t\le T$, to the desired map (2.8), which is moreover unique. Indeed, for some constant $C<\infty$, depending only on $w$, we have

$$\|(\mu_t,\lambda_t)\|\le C,\quad t\le T. \tag{2.13}$$

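It remains to show that $\mu_t\ge0$ for all $t$. For this we need the following result.

Proposition 2.5. Let $(\mu_t,\lambda_t)_{t\le T}$ denote the map (2.8) and let $(t,x)\mapsto f_t(x):[0,T]\times E\to\mathbb R$ be a bounded measurable function, having a bounded partial derivative $\partial f/\partial t$. Assume that $|f|,|\partial f/\partial t|\le w$. Then for all $t\le T$,

$$\frac{d}{dt}\langle f_t,\mu_t\rangle=\Big\langle\frac{\partial f}{\partial t},\mu_t\Big\rangle+\langle(f_t,0),L(\mu_t,\lambda_t)\rangle.$$

Proof. Fix $t\le T$ and set $\lfloor s\rfloor_n=(n/t)^{-1}\lfloor ns/t\rfloor$ and $\lceil s\rceil_n=(n/t)^{-1}\lceil ns/t\rceil$. Then

$$\langle f_t,\mu_t\rangle=\langle f_0,\mu_0\rangle+\int_0^t\Big\langle\frac{\partial f}{\partial s},\mu_{\lfloor s\rfloor_n}\Big\rangle\,ds+\int_0^t\langle(f_{\lceil s\rceil_n},0),L(\mu_s,\lambda_s)\rangle\,ds.$$

We note the estimate $\langle(f,0),L(\mu,\lambda)\rangle\le(3/2)\|f\|\,\|(\mu,\lambda)\|^2$ together with the bound (2.13). The proposition follows on letting $n\to\infty$. □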


We resume the proof of non-negativity. For $t\le T$, set

$$\theta_t(x)=\exp\int_0^t\Big(\int_EJ^*(x,y)\mu_s(dy)+\lambda_sw(x)\Big)ds$$

and define $G_t:\mathcal M(w)\to\mathcal M(w)$ by

$$\langle f,G_t(\mu)\rangle=\frac12\int_{E\times E\times E}(f\theta_t)(z)J(x,y,dz)\theta_t(x)^{-1}\theta_t(y)^{-1}\mu(dx)\mu(dy).$$

Note that $G_t(\mu)\ge0$ whenever $\mu\ge0$ and, for some $C<\infty$, depending only on $w$, we have

$$\|G_t(\mu)\|_w\le C\|\mu\|_w^2,\qquad\|G_t(\mu)-G_t(\mu')\|_w\le C\|\mu-\mu'\|_w\big(\|\mu\|_w+\|\mu'\|_w\big),$$

where $\|\mu\|_w=\|w\mu\|$. Set $\nu_t=\theta_t\mu_t$. By Proposition 2.5, for all measurable functions $f$ with $|f|\le w$, we have

$$\frac{d}{dt}\langle f,\nu_t\rangle=\Big\langle f\frac{\partial\theta}{\partial t},\mu_t\Big\rangle+\langle(f\theta_t,0),L(\mu_t,\lambda_t)\rangle=\langle f,G_t(\nu_t)\rangle.$$

(Thus the function $\theta_t$ acts as an integrating factor to remove the negative terms appearing in $L$.) Define inductively a new sequence of measures $\nu^n_t$ by setting $\nu^0_t=\mu_0$ and, for $n\ge0$,

$$\nu^{n+1}_t=\mu_0+\int_0^tG_s(\nu^n_s)\,ds.$$

By an argument similar to that used for the original iterative scheme, we can show, first, and possibly for a smaller value of $T>0$, but still depending only on $w$, that $\|\nu^n_t\|_w$ is bounded, uniformly in $n$, for $t\le T$, and then that $\|\nu^n_t-\nu_t\|_w\to0$ as $n\to\infty$. Since $\nu^n_t\ge0$ for all $n$, we deduce $\nu_t\ge0$ and hence $\mu_t\ge0$ for all $t\le T$. This completes the proof of Proposition 2.4. □

We remark that the formula in Proposition 2.5 remains valid, under the same conditions on f , for any local solution (µt , λt )t

Proof. Since $w$ is bounded on $A$, we have

$$\frac{d}{dt}\big(\langle w,\mu_t\rangle+\lambda_t\big)=\frac12\int_{E\times E\times E}\{w(z)-w(x)-w(y)\}J(x,y,dz)\mu_t(dx)\mu_t(dy)\le0.$$

On the other hand, for $w^n=w\wedge n$,

$$\begin{aligned}
\frac{d}{dt}\langle w^n,\mu'_t\rangle={}&\frac12\int_{E\times E\times E}\{w^n(z)-w^n(x)-w^n(y)\}J'(x,y,dz)\mu'_t(dx)\mu'_t(dy)\\
&-\int_{E\times E}w^n(x)(J^{*\prime}-\bar J')(x,y)\mu'_t(dx)\mu'_t(dy)\\
&-\lambda'_t\int_Ew^n(x)w'(x)\mu'_t(dx).
\end{aligned}$$

Note that $w^n\le w^n(x)+w^n(y)$, $J'(x,y,\cdot)$-a.e., so, as $n\to\infty$, Fatou's lemma applies in the first integral on the right, and monotone convergence everywhere else, to give $\langle w^n,\mu'_t\rangle\le\omega'_t$, where $\omega'_0=\langle w,\mu'_0\rangle$ and

$$\begin{aligned}
\frac{d}{dt}\omega'_t={}&\frac12\int_{E\times E\times E}\{w(z)-w(x)-w(y)\}J'(x,y,dz)\mu'_t(dx)\mu'_t(dy)\\
&-\int_{E\times E}w(x)(J^{*\prime}-\bar J')(x,y)\mu'_t(dx)\mu'_t(dy)\\
&-\lambda'_t\int_Ew(x)w'(x)\mu'_t(dx).
\end{aligned}$$

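Set

$$\theta_t(x)=\exp\int_0^t\Big(\int_EJ^*(x,y)\mu_s(dy)+\lambda_sw(x)\Big)ds,$$

$$\pi_t=1_A\theta_t(\mu'_t-\mu_t),$$

$$\chi_t=\langle w,\mu_t\rangle+\lambda_t-\langle w,\mu'_t\rangle-\lambda'_t,\qquad\rho_t=\langle w,\mu_t\rangle+\lambda_t-\omega'_t-\lambda'_t.$$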

Note that $\theta_t$ is bounded on $A$, uniformly on compacts in $t$, that $\pi_0\ge0$, $\chi_0\ge0$ and that $\chi_t\ge\rho_t$ for all $t$. As we remarked above, the formula of Proposition 2.5 extends to any local solution of (2.7). So, for any bounded measurable function $f$ supported in $A$,

$$\begin{aligned}
\frac{d}{dt}\langle f,\pi_t\rangle={}&\langle(f\theta_t,0),L'(\mu'_t,\lambda'_t)-L(\mu_t,\lambda_t)\rangle+\Big\langle f\frac{\partial\theta}{\partial t},\mu'_t-\mu_t\Big\rangle\\
={}&\frac12\int_{E\times E\times E}f\theta_t(z)\big(J'(x,y,dz)\mu'_t(dx)\mu'_t(dy)-J(x,y,dz)\mu_t(dx)\mu_t(dy)\big)\\
&-\int_{E\times E}f\theta_t(x)J^{*\prime}(x,y)\mu'_t(dx)\mu'_t(dy)\\
&-\lambda'_t\int_Ef\theta_t(x)w'(x)\mu'_t(dx)\\
&+\int_Ef\theta_t(x)\Big(\int_EJ^*(x,y)\mu_t(dy)+\lambda_tw(x)\Big)\mu'_t(dx).
\end{aligned}$$


The inequalities we have assumed for the coefficients allow us to rewrite the last equation so as to make clear that, so long as $\pi_t$ and $\rho_t$ remain non-negative, so does the right-hand side:

$$\begin{aligned}
\frac{d}{dt}\langle f,\pi_t\rangle={}&\frac12\int_{E\times E\times E}f\theta_t(z)J(x,y,dz)\big(1_{x\in A}\mu'_t(dx)\theta_t(y)^{-1}\pi_t(dy)+\theta_t(x)^{-1}\pi_t(dx)\mu_t(dy)\big)\\
&+\int_{E\times E}f\theta_t(x)\big(w(x)w(y)-J^*(x,y)\big)\mu'_t(dx)\theta_t(y)^{-1}\pi_t(dy)\\
&+\rho_t\int_Ef\theta_t(x)w(x)\mu'_t(dx)\\
&+\frac12\int_{E\times E\times E}f\theta_t(z)(J'-1_{A\times A}J)(x,y,dz)\mu'_t(dx)\mu'_t(dy)\\
&+\int_{E\times E}f\theta_t(x)(J^*-J^{*\prime})(x,y)\mu'_t(dx)\mu'_t(dy)\\
&+\int_{E\times E}f\theta_t(x)\big(w(x)w(y)-J^*(x,y)\big)1_{y\notin A}\,\mu'_t(dx)\mu'_t(dy)\\
&+(\chi_t-\rho_t)\int_Ef\theta_t(x)w(x)\mu'_t(dx)\\
&+\lambda'_t\int_Ef\theta_t(x)(w-w')(x)\mu'_t(dx).
\end{aligned}$$

In the same spirit

$$\begin{aligned}
\frac{d}{dt}\rho_t={}&\frac12\int_{E\times E\times E}\{w(z)-w(x)-w(y)\}J(x,y,dz)\mu_t(dx)\mu_t(dy)\\
&-\frac12\int_{E\times E\times E}\{w(z)-w(x)-w(y)\}J'(x,y,dz)\mu'_t(dx)\mu'_t(dy)\\
&-\int_{E\times E}(w'-w)(x)(J^{*\prime}-\bar J')(x,y)\mu'_t(dx)\mu'_t(dy)\\
&-\lambda'_t\int_E(w'-w)(x)w'(x)\mu'_t(dx)\\
={}&\frac12\int_{E\times E\times E}\{w(x)+w(y)-w(z)\}J(x,y,dz)\big(1_{x\in A}\mu'_t(dx)\theta_t(y)^{-1}\pi_t(dy)+\theta_t(x)^{-1}\pi_t(dx)\mu_t(dy)\big)\\
&+\frac12\int_{E\times E\times E}\{w(x)+w(y)-w(z)\}(J'-1_{A\times A}J)(x,y,dz)\mu'_t(dx)\mu'_t(dy)\\
&+\int_{E\times E}(w-w')(x)(J^{*\prime}-\bar J')(x,y)\mu'_t(dx)\mu'_t(dy)\\
&+\lambda'_t\int_E(w-w')(x)w'(x)\mu'_t(dx).
\end{aligned}$$

So $(\pi_t,\rho_t)$ satisfies an equation of the form

$$\frac{d}{dt}(\pi_t,\rho_t)=H_t(\pi_t,\rho_t)+(\alpha_t,\beta_t),$$


where $H_t:\mathcal M(w)\times\mathbb R\to\mathcal M(w)\times\mathbb R$ is linear, where $H_t(\pi,\rho)\ge0$ whenever $(\pi,\rho)\ge0$, where $\alpha_t\in\mathcal M^+(w)$, $\beta_t\in[0,\infty)$, and where we have estimates $\|H_t(\pi,\rho)\|\le C\|(\pi,\rho)\|$ for some constant $C<\infty$ depending only on $w$ and $A$, and

$$\int_0^t\|(\alpha_s,\beta_s)\|\,ds<\infty,\quad t<T.$$

Therefore we can apply the same sort of argument that we used for non-negativity to see that $\pi_t\ge0$ and $\rho_t\ge0$ for all $t$, as required. □

Proof of Theorem 2.1. Fix $\mu_0\in\mathcal M^+(\tilde\varphi)$. For each compact set $B\subseteq(0,\infty)$, set

$$\mu^B_0=1_{\tilde B}\mu_0,\qquad\lambda^B_0=\langle\tilde\varphi1_{E\setminus\tilde B},\mu_0\rangle.$$

Take

$$J(x,y,dz)=K(x,y,dz)1_{z\in\tilde B},\qquad J^*(x,y)=\bar K(x,y),\qquad w(x)=\tilde\varphi(x). \tag{2.14}$$

By Proposition 2.4, for each finite constant $M$ with $M\ge\sup_{x\in B}\varphi(x)$, there is a unique solution $(\mu^B_t,\lambda^B_t)_{t\ge0}$ to (2.7), starting from $(\mu^B_0,\lambda^B_0)$, corresponding to coefficients $J\wedge M^2$, $J^*\wedge M^2$, $w\wedge M$. On taking $f=1_{E\setminus\tilde B}$ and $a=0$ in (2.7), we find that $\operatorname{supp}\mu^B_t\subseteq\tilde B$ for all $t$. It follows by uniqueness that $(\mu^B_t,\lambda^B_t)_{t\ge0}$ does not depend on our choice of $M$ and is moreover a solution to (2.7) for coefficients $J,J^*,w$. By Proposition 2.6, for $B\subseteq B'$ we have

$$\mu^B_t\le\mu^{B'}_t,\qquad\langle\tilde\varphi,\mu^B_t\rangle+\lambda^B_t\ge\langle\tilde\varphi,\mu^{B'}_t\rangle+\lambda^{B'}_t.$$

Consider the monotone limits $\mu_t=\lim_{B\uparrow(0,\infty)}\mu^B_t$ and $\lambda_t=\lim_{B\uparrow(0,\infty)}\lambda^B_t$. Note that

$$\langle\tilde\varphi,\mu_t\rangle=\lim_{B\uparrow(0,\infty)}\langle\tilde\varphi,\mu^B_t\rangle\le\langle\tilde\varphi,\mu_0\rangle<\infty.$$

So, by dominated convergence, using (2.4), for all bounded measurable functions $f$, as $B\uparrow(0,\infty)$,

$$\int_{E\times E\times E}f(z)K(x,y,dz)1_{z\notin\tilde B}\,\mu^B_t(dx)\mu^B_t(dy)\to0,$$

and we can pass to the limit in (2.7) to obtain

$$\frac{d}{dt}\langle f,\mu_t\rangle=\frac12\int_{E\times E\times E}\{f(z)-f(x)-f(y)\}K(x,y,dz)\mu_t(dx)\mu_t(dy)-\lambda_t\langle f\tilde\varphi,\mu_t\rangle.$$

We take $J'=K$ and $J^{*\prime}=\bar K$ in Proposition 2.6 to see that, for any local solution (νt )t

Hence, if λt = 0 for all t < T , then (µt )t
0

for all $t<T$; this allows us to pass to the limit in (2.7) to obtain

$$\frac{d}{dt}\lambda_t=\lambda_t\langle\tilde\varphi^2,\mu_t\rangle \tag{2.15}$$

and to deduce from this equation that λt = 0 for all t < T ; it follows that (νt )t 0, then, by dominated convergence, the second term on the right tends to 0 as n → ∞, showing that (νt )t
so, for $t<T$,

$$\langle\tilde\varphi^2,\mu_t\rangle\le\lim_{B\uparrow(0,\infty)}\langle\tilde\varphi^2,\mu^B_t\rangle\le(T-t)^{-1}.$$

Hence (2.15) holds and forces λt = 0 for t < T as above, so (µt )t
so

$$\langle\tilde\varphi^2,\mu_t\rangle\le\langle\tilde\varphi^2,\mu_0\rangle\exp\{2\langle\tilde\varphi,\mu_0\rangle t\}.$$

In either case we can deduce that $(\mu_t)_{t\ge0}$ is a strong solution. □


Proof of Theorem 2.3. Suppose that $(\mu_t)_{t\ge0}$ is a solution to (2.6). Fix $R\ge R(K)$ and set $\nu_t=1_{m\le R}\,\mu_t$. For each bounded measurable function $f$ on $E$, on applying (2.6) to $1_{m\le R}f$, we find that $(\nu_t)_{t\ge0}$ satisfies the autonomous equation

$$\frac{d}{dt}\langle f,\nu_t\rangle=\frac12\int_{E\times E\times E}\{f(z)1_{m(z)\le R}-f(x)-f(y)\}K(x,y,dz)\nu_t(dx)\nu_t(dy)-\lambda_t\langle\psi f,\nu_t\rangle, \tag{2.16}$$

where

$$\lambda_t=\langle m,\mu_0\rangle-\langle m,\nu_t\rangle.$$

Since $m1_{m\le R}$ is bounded, we have

$$\frac{d}{dt}\lambda_t=\frac12\int_{E\times E}\{m(x)+m(y)\}1_{m(x)+m(y)>R}\,\bar K(x,y)\nu_t(dx)\nu_t(dy)+\lambda_t\langle\psi m,\nu_t\rangle. \tag{2.17}$$

By a minor modification of the proof of Proposition 2.4, Eqs. (2.16), (2.17) have a unique solution, with $\lambda_t+\langle m,\nu_t\rangle=\langle m,\mu_0\rangle$. Moreover, by uniqueness, for $R(K)\le R_1<R_2$, with an obvious notation, $\nu^{R_1}_t=1_{m\le R_1}\nu^{R_2}_t$. Set

$$\mu_t=\lim_{R\to\infty}\nu^R_t,\qquad\rho_t=\lim_{R\to\infty}\lambda^R_t.$$

Then $\rho_t=\langle m,\mu_0-\mu_t\rangle$ and $(\mu_t)_{t\ge0}$ satisfies (2.6). Hence (2.6) has a unique solution starting from $\mu_0$. □

The conclusion of the next result is obtained by taking $f=m^2$ in (2.2). Since $m^2$ is unbounded this is not necessarily valid. We show here that it is valid for a conservative local solution.

Proposition 2.7. Let (µt )t
E×E

Proof. On taking $f=m1_{m\le\lambda}$ in (2.1) and subtracting from the equation $\langle m,\mu_t\rangle=\langle m,\mu_0\rangle$, we obtain, for all $t<T$ and $\lambda\ge0$,

$$\langle m1_{m>\lambda},\mu_t\rangle=\langle m1_{m>\lambda},\mu_0\rangle+\frac12\int_0^t\int_{E\times E\times E}\{m(z)1_{m(z)>\lambda}-m(x)1_{m(x)>\lambda}-m(y)1_{m(y)>\lambda}\}K(x,y,dz)\mu_s(dx)\mu_s(dy).$$

Now $m(z)=m(x)+m(y)$ for $K(x,y,\cdot)$-almost all $z$, so the integrand on the right is non-negative. Hence on integrating in $\lambda$ we obtain

$$\langle m^2,\mu_t\rangle=\int_0^\infty\langle m1_{m>\lambda},\mu_t\rangle\,d\lambda=\langle m^2,\mu_0\rangle+\frac12\int_0^t\int_{E\times E\times E}\{m(z)^2-m(x)^2-m(y)^2\}K(x,y,dz)\mu_s(dx)\mu_s(dy),$$

which simplifies to the desired equation. □
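The integration in $\lambda$ used in this proof rests on the identity $\int_0^\infty\langle m1_{m>\lambda},\mu\rangle\,d\lambda=\langle m^2,\mu\rangle$. The sketch below checks it for a purely atomic measure on $(0,\infty)$ with $m(x)=x$; the representation of $\mu$ as a list of (mass, weight) pairs and the function name are assumptions made only for this illustration.

```python
def second_moment_via_tails(atoms):
    """Return int_0^inf <m 1_{m > lam}, mu> d lam for mu = sum of weight * delta_mass.

    `atoms` is a list of (mass, weight) pairs with positive masses.  The tail
    function lam -> <m 1_{m > lam}, mu> is piecewise constant between
    consecutive masses, so the integral is computed exactly as a finite sum.
    """
    atoms = sorted(atoms)
    total = 0.0
    prev = 0.0
    for idx, (mass, _) in enumerate(atoms):
        # For lam in [prev, mass) the atoms with m > lam are those from idx on.
        tail = sum(m * w for m, w in atoms[idx:])
        total += tail * (mass - prev)
        prev = mass
    return total
```

For example, with atoms `[(1.0, 0.5), (2.0, 0.25), (3.5, 2.0)]` the result agrees with the directly computed second moment $0.5\cdot1^2+0.25\cdot2^2+2\cdot3.5^2=26$.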


The next result concerns the multiplicative coalescent, where $E=(0,\infty)$, $m(x)=x$ and $K(x,y,dz)=xy\,\varepsilon_{x+y}(dz)$. A similar result, established by essentially the same arguments, is given in [EZH]. Assume that $\langle m+1,\mu_0\rangle<\infty$ and denote by $(\mu_t)_{t\ge0}$ the unique solution to (2.6) provided by Theorem 2.3.

Theorem 2.8. Set

$$\tau_p=\tau(p)=\Big\langle m\int_0^me^{-px}\,dx,\ \mu_0\Big\rangle^{-1},\quad p\ge0.$$

Then $\tau_0=(\langle m^2,\mu_0\rangle)^{-1}$ and $\tau:[0,\infty)\to[\tau_0,\infty)$ is a continuous and increasing bijection, $C^\infty$ on $(0,\infty)$. We have

$$\langle m,\mu_t\rangle=\langle m,\mu_0\rangle,\qquad\langle m^2,\mu_t\rangle=(\tau_0-t)^{-1},\quad0\le t<\tau_0$$

and

$$\langle m,\mu_{\tau(p)}\rangle=\langle me^{-pm},\mu_0\rangle,\quad p\ge0.$$

In particular, the map $t\mapsto\langle m,\mu_t\rangle$ is continuous on $[0,\infty)$ and gelation occurs at $\tau_0$.

Proof. By rescaling we can take $\langle m,\mu_0\rangle=1$. For $t\ge0$ and $p\ge0$, set $m_t(p)=\langle me^{-pm},\mu_t\rangle$. Note that $m_t$ is continuous on $[0,\infty)$ and continuously differentiable on $(0,\infty)$, with

$$(\partial/\partial p)m_t(p)=-\langle m^2e^{-pm},\mu_t\rangle<0.$$

Since $(\mu_t)_{t\ge0}$ satisfies (2.6), $m_t(p)$ is also continuously differentiable in $t$, for each $p>0$, with

$$(\partial/\partial t)m_t(p)=\langle m^2e^{-pm},\mu_t\rangle(m_t(p)-1)<0.$$

Hence, by the implicit function theorem, for every $p_0>0$, there is a constant $\tau(p_0)\in(0,\infty]$ (not yet identified as above) and a continuously differentiable map $t\mapsto p_t:[0,\tau(p_0))\to(0,\infty)$ such that

$$m_t(p_t)=m_0(p_0),\quad t<\tau(p_0)$$

and $p_t\to0$ as $t\to\tau(p_0)$ in the case $\tau(p_0)<\infty$. We have

$$0=(\partial/\partial t)m_t(p_t)=\langle m^2e^{-p_tm},\mu_t\rangle\{m_t(p_t)-1-(\partial p/\partial t)\}$$

so

$$p_t=p_0-(1-m_0(p_0))t,\quad t<\tau(p_0)$$

and so $\tau(p_0)=p_0/(1-m_0(p_0))$. Hence our use of $\tau$ is consistent with the statement of the theorem. For all $p>0$ we have

$$\langle m,\mu_{\tau(p)}\rangle=m_{\tau(p)}(0)=m_0(p).$$

In particular, for $t<\tau_0$ we deduce that $\langle m,\mu_t\rangle=m_t(0)=1$. (We know this also by Theorem 2.1.) So we can apply Proposition 2.7 to deduce that

$$(d/dt)\langle m^2,\mu_t\rangle=\langle m^2,\mu_t\rangle^2,$$

which implies $\langle m^2,\mu_t\rangle=(\tau_0-t)^{-1}$. □
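The formulas of Theorem 2.8 are explicit enough to evaluate for an atomic initial measure. The sketch below is illustrative only: it assumes $\mu_0=\sum_iw_i\varepsilon_{x_i}$ is given as a list of (mass, weight) pairs, and the function names `m0` and `tau` are hypothetical. It uses $\int_0^me^{-px}\,dx=(1-e^{-pm})/p$ for $p>0$ and the limiting value $m$ at $p=0$.

```python
import math

def m0(p, atoms):
    """Return <m e^{-p m}, mu_0> for mu_0 = sum of weight * delta_mass."""
    return sum(w * m * math.exp(-p * m) for m, w in atoms)

def tau(p, atoms):
    """Return tau(p) = <m int_0^m e^{-p x} dx, mu_0>^{-1} for p >= 0."""
    if p == 0.0:
        # int_0^m dx = m, so tau(0) = 1 / <m^2, mu_0>.
        return 1.0 / sum(w * m * m for m, w in atoms)
    # int_0^m e^{-p x} dx = (1 - e^{-p m}) / p, computed stably via expm1.
    return 1.0 / sum(w * m * (-math.expm1(-p * m)) / p for m, w in atoms)
```

For monodisperse initial data $\mu_0=\varepsilon_1$, this gives gelation time $\tau(0)=1$ and $\tau(p)=p/(1-e^{-p})$, in agreement with $\tau(p_0)=p_0/(1-m_0(p_0))$ from the proof; the mass remaining in finite clusters at time $\tau(p)$ is then `m0(p, atoms)` $=e^{-p}$.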


Proposition 2.9. Assume conditions (2.4) and (2.5). Let (µt )t
K(x, y, ·) a.a.z.

Then (2.2) holds for f . Proof. Since (2.2) is linear in f and already holds for f 1m≤1 , we can reduce to the case where 0 ≤ f ≤ Cm. Since (2.2) holds for m1m≤R for every R > 0, and (µt )t
$$\lim_{R\to\infty}\int_0^t\int_{E\times E}1_{m(x)\le R,\,m(x)+m(y)>R}\,m(x)\bar K(x,y)\mu_s(dx)\mu_s(dy)\,ds=\lim_{R\to\infty}\langle m1_{m\le R},\mu_0-\mu_t\rangle=0. \tag{2.18}$$

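Now (2.2) holds for $f1_{m\le R}$:

$$\begin{aligned}
\langle f1_{m\le R},\mu_t\rangle={}&\langle f1_{m\le R},\mu_0\rangle+\frac12\int_0^t\int_{E\times E\times E}\{f(z)-f(x)-f(y)\}1_{m(x)+m(y)\le R}\,K(x,y,dz)\mu_s(dx)\mu_s(dy)\,ds\\
&-\int_0^t\int_{E\times E}f(x)1_{m(x)\le R,\,m(x)+m(y)>R}\,\bar K(x,y)\mu_s(dx)\mu_s(dy)\,ds.
\end{aligned}$$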

We let R → ∞ to see that (2.2) holds for f : dominated convergence applies to the second term on the right and (2.18) shows that the third term tends to 0. u t Proof of Theorem 2.2. We will show that, given a conservative local solution (µt )t 0, choose s < T with hm2 , µs i ≥ (δε)−1 . Suppose (µt )t<S is a conservative local solution to (2.1), with S > s. Then so is (µs+t )t<S−s . Hence S − s ≤ (εhm2 , µs i)−1 ≤ δ, so S < T + δ. Since δ was arbitrary, this shows that (µt )t

m(x, ˜ x) ˜ = m(x). Set π(x, x) ˜ = x. Consider the coagulation kernel K˜ on E˜ given by ˜ K((x, x), ˜ (y, y), ˜ (dz, d z˜ )) Z = K(x, y, dz)

ξ ηx(dξ ˜ )y(dµ) ˜ εx+ (d z˜ ) ˜ y+c(ξ,µ) ˜ ¯ K(x, y) (0,∞)×(0,∞) m(x)m(y) εx+ + 1− ˜ y˜ (d z˜ ) , ¯ K(x, y)

where c(ξ, µ) = εξ +µ − εξ − εµ . Note that K˜ (x, x), ˜ (y, y), ˜ π −1 (A) = K(x, y, A), A ∈ E. ˜ We will show that we can construct a K-coagulant (µ˜ t )t
µt = µ˜ t ◦ π −1 , t ≥ 0. Then hm, ˜ µ˜ t i = hm, µt i for all t, so (µ˜ t )t
Then (νt )t
Note that, since (µt )t 0, we obtain an iteration for µ˜ R,n t using the estimate ¯ K(x, y) ≤ M (1 + m(x)) (1 + m(y)) we can argue as in the proof of Proposition 2.4 that, for each R > 0, (µ˜ R,n t )t

The argument differs from Proposition 2.4 only in that, since K¯ is unbounded but µt is already known, we estimate the last term in this equation by Z R ≤ M(1 + R)kf k∞ kµR kh1 + m, µ0 i. ¯ f (x, x) ˜ K(x, y)µ (dx, d x)µ ˜ (dy) t ˜ E×E

The processes (µR t )t
3. Discrete Approximation Scheme

In this section we find conditions under which the strong solutions of Sect. 2 can be approximated by an evolution of measures supported on a finite set. This will be used in Sect. 4 to reduce the problem of the large particle number behaviour of the stochastic model to a finite-dimensional stability problem, which can be handled by the exponential martingale inequality.

Let us now describe the context adopted in this section, which is less general than that of Sect. 2, but still sufficiently general to include models for cluster growth dependent on the shape and constitution of the clusters. In particular, this restricted class of models may be appropriate for the evolution of polymers. We remark that the standard mass coalescence model can be embedded in a simple case of these ‘polymer’ models, as we explain below.

By a shape we mean a finite set $\sigma$ together with a subgroup $G$ of the group of permutations of $\sigma$. We denote by $W(\sigma)$ the set of equivalence classes of functions $f:\sigma\to(0,\infty)$, where $f\sim f'$ if and only if $f'=f\circ g$ for some $g\in G$. Thus $W(\sigma)$ is the set of all weighted shapes on $\sigma$ with symmetry group $G$. Given shapes $(\sigma_1,G_1)$, $(\sigma_2,G_2)$ and $(\sigma,G)$, and a bijection $\theta:\sigma_1\sqcup\sigma_2\to\sigma$, we consider, for $x_i=[f_i]\in W(\sigma_i)$, the atomic probability measure on $W(\sigma)$ given by

$$\pi(x_1,x_2,dz)=|[\theta]|^{-1}\sum_{\psi\sim\theta}\varepsilon_{[f\circ\psi^{-1}]}(dz).$$

Here we write $[\theta]$ for the class of bijections equivalent to $\theta$, where $\psi\sim\theta$ if $\psi|_{\sigma_i}=(\theta|_{\sigma_i})\circ g_i$ for some $g_i\in G_i$, $i=1,2$. Also, we write $f$ for the function on $\sigma_1\sqcup\sigma_2$ given by $f|_{\sigma_i}=f_i$, $i=1,2$. It is easy to check that $\pi(x_1,x_2,dz)$ does not depend on the choices of representatives $f_1,f_2$. The kernels $\pi$ provide the basic coagulations that we allow. If $G$ is large enough, then $f\circ\psi^{-1}\sim f\circ\theta^{-1}$ whenever $\psi\sim\theta$, so $\pi(x_1,x_2,dz)$ is a unit mass. If not, then $\pi$ allows us to make a random coagulation in which symmetry is lost. We call any of these kernels $\pi$ a coagulation of $\sigma_1$ and $\sigma_2$ into $\sigma$.

We assume that, for each $k\in\mathbb N$, there is a finite set $\Sigma_k$ of shapes of cardinality $k$ such that

$$E=\bigcup_{k=1}^\infty E_k,\qquad E_k=\bigcup_{\sigma\in\Sigma_k}W(\sigma). \tag{3.1}$$

We use the topology on $E$ generated by sets of the form $\{[f]\in W(\sigma):f_1<f<f_2\}$, where $f_1,f_2:\sigma\to(0,\infty)$ and $\sigma\in\Sigma_k$, $k\in\mathbb N$. We take $\mathcal E$ to be the associated Borel $\sigma$-algebra. Set

$$m([f])=\sum_{z\in\sigma}f(z),\qquad\inf[f]=\inf_{z\in\sigma}f(z).$$


Note that $m:E\to(0,\infty)$ is continuous, so $\tilde\varphi=\varphi\circ m$ is too. Given $\sigma_i\in\Sigma_{k(i)}$, $i=1,2$, we denote by $C(\sigma_1,\sigma_2)$ the finite set of coagulations of $\sigma_1$ and $\sigma_2$ into some $\sigma\in\Sigma_{k(1)+k(2)}$. We assume that the coagulation kernel $K$ has the form

$$K(x,y,dz)=\sum_{\pi\in C(\sigma,\tau)}\kappa(x,y,\pi)\,\pi(x,y,dz),\quad x\in W(\sigma),\ y\in W(\tau) \tag{3.2}$$

for some continuous symmetric function $\kappa:W(\sigma)\times W(\tau)\times C(\sigma,\tau)\to[0,\infty)$.

A simple example occurs when we take $E$ to be the set of finite integer-valued measures on $(0,\infty)$ with $x$ and $y$ coagulating to $x+y$ at rate $\kappa(x,y)$. Here a cluster $x=\sum_{i=1}^k\varepsilon_{x_i}$ is the mass distribution of its constituent particles, but we do not retain any notion of configuration of these particles. We can take $\Sigma_k=\{\sigma_k\}$, $\sigma_k=(\{1,\dots,k\},S_k)$. Then each set $C(\sigma,\tau)$ contains just one element, given by $\pi(x,y,dz)=\varepsilon_{x+y}(dz)$ and the coagulation kernel has the required form $K(x,y,dz)=\kappa(x,y)\varepsilon_{x+y}(dz)$. In particular, this example may be used to study the mass coalescence model on $(0,\infty)$ where masses $x$ and $y$ coagulate at rate $\kappa(x,y)$, as follows. The map $x\mapsto\varepsilon_x:(0,\infty)\to E$ allows us to regard a measure $\mu_0$ on $(0,\infty)$ as a measure $\tilde\mu_0$ on $E$. Then any solution (µ˜ t )t
Note that, for any coagulation $\pi(x,y)(dz)=\pi(x,y,dz)$, we have

$$\lceil\pi(x,y)\rceil_n=\pi(\lceil x\rceil_n,\lceil y\rceil_n)$$


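and, if $\lceil\mu\rceil_n=\lceil\pi(x,y)\rceil_n$, then $\mu=\pi(x',y')$ for some $x',y'$ with $\lceil x'\rceil_n=\lceil x\rceil_n$, $\lceil y'\rceil_n=\lceil y\rceil_n$. Fix $B\subseteq(0,\infty)$ compact and for $n\ge1$ set

$$\begin{aligned}
J_n(x,y,A)&=\sum_{\pi\in C(\sigma,\tau)}\ \inf_{\substack{\lceil x'\rceil_n=\lceil x\rceil_n\\ \lceil y'\rceil_n=\lceil y\rceil_n}}\kappa(x',y',\pi)\,\pi(x,y,A\cap\tilde B),\\
J^*_n(x,y)&=\sup_{\substack{\lceil x'\rceil_n=\lceil x\rceil_n\\ \lceil y'\rceil_n=\lceil y\rceil_n}}\bar K(x',y'),\\
w_n(x)&=\sup_{\lceil x'\rceil_n=\lceil x\rceil_n}\tilde\varphi(x').
\end{aligned} \tag{3.3}$$

Note that

$$\bar J_n(x,y)\le J^*_n(x,y)\le w_n(x)w_n(y),\quad x,y\in E$$

and, for any coagulation $\pi$,

$$w_n\le w_n(x)+w_n(y),\quad\pi(x,y,\cdot)\text{-a.e.}$$

so

$$w_n\le w_n(x)+w_n(y),\quad K(x,y,\cdot)\text{-a.e.}$$

Note also that

$$J_n\le J_{n+1}\le J\le K,\qquad\bar K\le J^*\le J^*_{n+1}\le J^*_n,\qquad\tilde\varphi=w\le w_{n+1}\le w_n,$$

where $J,J^*,w$ are given by (2.14). For $\varepsilon>0$, set $\tilde B_\varepsilon=\{x\in\tilde B:\inf x>\varepsilon\}$. Note that, if $\inf x>2^{-n}$, then $\lceil x'\rceil_n=\lceil x\rceil_n$ implies $m(x)/2\le m(x')\le2m(x)$ and so, by sublinearity, $\tilde\varphi(x')=\varphi(m(x'))\le4\varphi(m(x)/2)$. Hence, for $\varepsilon\ge2^{-n}$, $w_n$ is bounded on $\tilde B_\varepsilon$. Fix $\varepsilon\ge2^{-n}$ and set

$$\mu^{\varepsilon,B}_0=1_{\tilde B_\varepsilon}\mu_0,\qquad\lambda^{\varepsilon,B}_0=\langle1_{E\setminus\tilde B_\varepsilon}\tilde\varphi,\mu_0\rangle.$$

Choose a finite constant $M\ge\sup_{x\in\tilde B_\varepsilon}w_n(x)$ and denote by $(\mu^{\varepsilon,B}_t,\lambda^{\varepsilon,B}_t)_{t\ge0}$ and $(\mu^{\varepsilon,B,n}_t,\lambda^{\varepsilon,B,n}_t)_{t\ge0}$ the unique solutions to (2.7), provided by Proposition 2.4, starting from $(\mu^{\varepsilon,B}_0,\lambda^{\varepsilon,B}_0)$, and corresponding respectively to coefficients $J\wedge M^2$, $J^*\wedge M^2$, $w\wedge M$ and $J_n\wedge M^2$, $J^*_n\wedge M^2$, $w_n\wedge M$. By uniqueness, these solutions do not depend on our choice of $M$ and remain solutions when we take $M=\infty$.

The key point about $(\mu^{\varepsilon,B,n}_t,\lambda^{\varepsilon,B,n}_t)_{t\ge0}$, which is a consequence of the special form of the coefficients $J_n,J^*_n,w_n$, is that $(\lceil\mu^{\varepsilon,B,n}_t\rceil_n,\lambda^{\varepsilon,B,n}_t)_{t\ge0}$ is a solution to (2.7) starting from $(\lceil\mu^{\varepsilon,B}_0\rceil_n,\lambda^{\varepsilon,B}_0)$ with coefficients $J_n,J^*_n,w_n$. This is useful because $\lceil\mu^{\varepsilon,B,n}_t\rceil_n$ is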


supported in the finite set $\{\lceil x\rceil_n:x\in\tilde B_\varepsilon\}$ for all $t$. To verify that $(\lceil\mu^{\varepsilon,B,n}_t\rceil_n,\lambda^{\varepsilon,B,n}_t)_{t\ge0}$ satisfies (2.7) is straightforward once it is seen that

$$\int_{E\times E\times E}\{f(z)-f(x)-f(y)\}J_n(x,y,dz)\lceil\mu\rceil_n(dx)\lceil\mu\rceil_n(dy)=\int_{E\times E\times E}\{f_n(z)-f_n(x)-f_n(y)\}J_n(x,y,dz)\mu(dx)\mu(dy),$$

where $f_n(x)=f(\lceil x\rceil_n)$.

Theorem 3.1. Assume conditions (2.4) and (2.5). Assume also that $\langle\tilde\varphi^2,\mu_0\rangle<\infty$ and denote by (µt )t
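So, by Proposition 2.6, for all $t\ge0$,

$$\mu^{\varepsilon,B,n}_t\le\mu^{\varepsilon,B}_t,\qquad\langle w_n,\mu^{\varepsilon,B,n}_t\rangle+\lambda^{\varepsilon,B,n}_t\ge\langle w_n,\mu^{\varepsilon,B}_t\rangle+\lambda^{\varepsilon,B}_t,$$

so

$$\langle\tilde\varphi,\mu^{\varepsilon,B,n}_t\rangle+\lambda^{\varepsilon,B,n}_t\ge\langle\tilde\varphi,\mu^{\varepsilon,B}_t\rangle+\lambda^{\varepsilon,B}_t,$$

and, for all $t<T$,

$$\mu^{\varepsilon,B}_t\le\mu_t,\qquad\langle\tilde\varphi,\mu^{\varepsilon,B}_t\rangle+\lambda^{\varepsilon,B}_t\ge\langle\tilde\varphi,\mu_t\rangle.$$

Moreover, again by Proposition 2.6, we have monotone limits

$$\mu^{\varepsilon,B,\infty}_t=\lim_{n\to\infty}\mu^{\varepsilon,B,n}_t,\qquad\lambda^{\varepsilon,B,\infty}_t=\lim_{n\to\infty}\lambda^{\varepsilon,B,n}_t,\qquad\lambda_t=\lim_{\varepsilon\downarrow0,\ B\uparrow(0,\infty)}\lambda^{\varepsilon,B}_t.$$

Note that $\lambda^{\varepsilon,B,\infty}_0=\lambda^{\varepsilon,B}_0$ and $\lambda_0=0$. Now $(\lambda^{\varepsilon,B,n}_t)_{t\ge0}$ satisfies the equation

$$\frac{d}{dt}\lambda^{\varepsilon,B,n}_t=\frac12\int_{E\times E}\{w_n(x)+w_n(y)\}(J^*_n-\bar J_n)(x,y)\mu^{\varepsilon,B,n}_t(dx)\mu^{\varepsilon,B,n}_t(dy)+\lambda^{\varepsilon,B,n}_t\int_Ew_n(x)^2\mu^{\varepsilon,B,n}_t(dx).$$

Since $K$ and $\tilde\varphi$ are continuous, we obtain in the limit $n\to\infty$, by bounded convergence,

$$\frac{d}{dt}\lambda^{\varepsilon,B,\infty}_t=\frac12\int_{E\times E}\{\tilde\varphi(x)+\tilde\varphi(y)\}(J^*-\bar J)(x,y)\mu^{\varepsilon,B,\infty}_t(dx)\mu^{\varepsilon,B,\infty}_t(dy)+\lambda^{\varepsilon,B,\infty}_t\int_E\tilde\varphi(x)^2\mu^{\varepsilon,B,\infty}_t(dx).$$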


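On the other hand $(\lambda^{\varepsilon,B}_t)_{t\ge0}$ satisfies

$$\frac{d}{dt}\lambda^{\varepsilon,B}_t=\frac12\int_{E\times E}\{\tilde\varphi(x)+\tilde\varphi(y)\}(J^*-\bar J)(x,y)\mu^{\varepsilon,B}_t(dx)\mu^{\varepsilon,B}_t(dy)+\lambda^{\varepsilon,B}_t\int_E\tilde\varphi(x)^2\mu^{\varepsilon,B}_t(dx).$$

So $\lambda^{\varepsilon,B,\infty}_t\le\lambda^{\varepsilon,B}_t$ for all $t\ge0$. Since (µt )t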
$$\sum_{i=1}^p\varepsilon_{x_i}.$$

A Markov process $(X_t)_{t\ge0}$ of finite integer-valued measures on $E$ can be constructed as follows: for each pair $i<j$, take independent random variables $T_{ij}$ and $Y_{ij}$, where $T_{ij}$ is exponential of parameter $\bar K(x_i,x_j)$ and $Y_{ij}$ has law $K(x_i,x_j,\cdot)/\bar K(x_i,x_j)$; set $T=\min_{i<j}T_{ij}$, set $X_t=X_0$ for $t<T$ and set

$$X_T=X_0-\varepsilon_{x_i}-\varepsilon_{x_j}+\varepsilon_{Y_{ij}}\quad\text{if}\quad T=T_{ij},$$

then begin the construction afresh from $X_T$. In this process, each pair of clusters $\{x_i,x_j\}$ coalesces at rate $K(x_i,x_j,dy)$ to form a new cluster $y$. We call $(X_t)_{t\ge0}$ a (stochastic) $K$-coagulant.

Our main result, Theorem 4.2, is a weak limit theorem for $K$-coagulants as the number of clusters becomes large, where the limiting deterministic dynamics are provided by the Smoluchowski-type equations from Sect. 2. We begin however by introducing a more elaborate sort of particle system, cases of which will be used to approximate the $K$-coagulant.

We suppose given, as in Sect. 2, a coagulation kernel $J$, a symmetric measurable function $J^*:E\times E\to[0,\infty)$ and a measurable function $w:E\to(0,\infty)$ such that

$$\bar J(x,y)\le J^*(x,y)\le w(x)w(y),\quad x,y\in E$$

and

$$w\le w(x)+w(y),\quad J(x,y,\cdot)\text{-a.e.}$$

Our particle system will be a Markov process $(Y_t,\Lambda_t)_{t\ge0}$, where, for each $t$, $Y_t$ is a finite integer-valued measure on $E$ and $\Lambda_t\in[0,\infty)$. Let $(Y_0,\Lambda_0)$ be given with

$$Y_0=\sum_{i=1}^q\varepsilon_{x_i}$$


and take independent exponential random variables $S_{ij}$, $S^*_{ij}$, for $i<j$, and $S_i$, for all $i$, of parameters $\bar J(x_i,x_j)$, $(J^*-\bar J)(x_i,x_j)$ and $w(x_i)\Lambda_0$ respectively; take also independent random variables $Y_{ij}$ of laws $J(x_i,x_j,\cdot)/\bar J(x_i,x_j)$. Set

$$S=\min_{i<j}(S_{ij}\wedge S^*_{ij})\wedge\min_iS_i.$$

Set $(Y_t,\Lambda_t)=(Y_0,\Lambda_0)$ for $t<S$ and set

$$(Y_S,\Lambda_S)=\begin{cases}(Y_0-\varepsilon_{x_i}-\varepsilon_{x_j}+\varepsilon_{Y_{ij}},\ \Lambda_0)&\text{if }S=S_{ij},\\(Y_0-\varepsilon_{x_i}-\varepsilon_{x_j},\ \Lambda_0+w(x_i)+w(x_j))&\text{if }S=S^*_{ij},\\(Y_0-\varepsilon_{x_i},\ \Lambda_0+w(x_i))&\text{if }S=S_i.\end{cases}$$

Now repeat the same construction independently from $(Y_S,\Lambda_S)$. We call the Markov process thus obtained a $(J,J^*,w)$-coagulant. In the case where $J\le K$ and $\bar K\le J^*$, we will show that one can couple this process with the $K$-coagulant in a useful way.

Proposition 4.1. Assume that $w$ is bounded. Let $\mu_0\in\mathcal M^+(w)$ and $\lambda_0\in[0,\infty)$. Denote by $(\mu_t,\lambda_t)_{t\ge0}$ the unique solution to (2.7) provided by Proposition 2.4. Let $(Y^N_t,\Lambda^N_t)_{t\ge0}$ be a sequence of $(J,J^*,w)$-coagulants. Suppose that there is a finite set $F\subseteq E$ such that, for all $t\ge0$ and all $N\ge1$,

$$\operatorname{supp}\mu_t\subseteq F,\qquad\operatorname{supp}Y^N_t\subseteq F.$$

Suppose also that, as $N\to\infty$, uniformly in probability,

$$\|\tilde Y^N_0-\mu_0\|+|\tilde\Lambda^N_0-\lambda_0|\to0,$$

where $\tilde Y^N_t=N^{-1}Y^N_{N^{-1}t}$ and $\tilde\Lambda^N_t=N^{-1}\Lambda^N_{N^{-1}t}$. Then, for all $t\ge0$ and all $\delta>0$,

$$\limsup_{N\to\infty}N^{-1}\log P\Big\{\sup_{s\le t}\|\tilde Y^N_s-\mu_s\|+|\tilde\Lambda^N_s-\lambda_s|>\delta\Big\}<0.$$
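The $(J,J^*,w)$-coagulant defined before Proposition 4.1 can be simulated one jump at a time. The sketch below is a simplified illustration, not the paper's construction verbatim: it assumes $E=(0,\infty)$ and $J(x,y,dz)=j(x,y)\,\varepsilon_{x+y}(dz)$, so that $Y_{ij}=x_i+x_j$, and it replaces the minimum of independent exponential clocks by the equivalent recipe of one exponential waiting time at the total rate followed by a choice of event with probability proportional to its rate. The names `coagulant_step`, `j`, `jstar` and `w` are hypothetical.

```python
import random

def coagulant_step(ys, lam, j, jstar, w, rng):
    """One jump of a (J, J*, w)-coagulant with J(x, y, dz) = j(x, y) delta_{x+y}(dz).

    ys  : list of cluster masses (the atoms of Y),
    lam : current value of Lambda.
    Returns (waiting_time, new_ys, new_lam), or None if every rate is zero.
    """
    events = []  # entries are (rate, kind, indices of clusters involved)
    n = len(ys)
    for a in range(n):
        for b in range(a + 1, n):
            rate = j(ys[a], ys[b])                 # clock S_ab
            star = jstar(ys[a], ys[b]) - rate      # clock S*_ab
            if rate > 0.0:
                events.append((rate, "coagulate", (a, b)))
            if star > 0.0:
                events.append((star, "absorb_pair", (a, b)))
        single = w(ys[a]) * lam                    # clock S_a
        if single > 0.0:
            events.append((single, "absorb_one", (a,)))
    total = sum(e[0] for e in events)
    if total <= 0.0:
        return None
    wait = rng.expovariate(total)
    u = rng.random() * total
    acc = 0.0
    for rate, kind, idx in events:
        acc += rate
        if u <= acc:
            break
    rest = [y for k, y in enumerate(ys) if k not in idx]
    if kind == "coagulate":
        return wait, rest + [ys[idx[0]] + ys[idx[1]]], lam
    # Absorbed clusters leave Y and their w-weight is added to Lambda.
    return wait, rest, lam + sum(w(ys[k]) for k in idx)
```

With `j == jstar` and `lam = 0` only coagulation events have positive rate, so the same routine simulates a stochastic $K$-coagulant. With $w(x)=x$, $J^*(x,y)=xy$ and $j(x,y)=xy\,1_{x+y\le R}$, a truncation in the spirit of (2.14), the quantity $\langle w,Y_t\rangle+\Lambda_t$ is conserved along the simulation.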

Proof. Fix $N\ge1$ and define a process $(M_t)_{t\ge0}$ in $\mathcal M\times\mathbb R$ by

$$M_t=N^{1/2}\Big\{(\tilde Y^N_t,\tilde\Lambda^N_t)-(\tilde Y^N_0,\tilde\Lambda^N_0)-\int_0^tL^{(N)}(\tilde Y^N_s,\tilde\Lambda^N_s)\,ds\Big\}, \tag{4.1}$$

where we define $L^{(N)}(\mu,\lambda)$ by the requirement

$$\begin{aligned}
\langle(f,a),L^{(N)}(\mu,\lambda)\rangle={}&\frac12\int_{E\times E\times E}\{f(z)-f(x)-f(y)\}J(x,y,dz)\mu^{(N)}(dx,dy)\\
&+\frac12\int_{E\times E}\{aw(x)+aw(y)-f(x)-f(y)\}(J^*-\bar J)(x,y)\mu^{(N)}(dx,dy)\\
&+\lambda\int_E\{aw(x)-f(x)\}w(x)\mu(dx)
\end{aligned}$$

for all bounded measurable functions $f$ on $E$ and all $a\in\mathbb R$, and where $\mu^{(N)}$ is given by

$$\mu^{(N)}(A\times B)=\mu(A)\mu(B)-N^{-1}\mu(A\cap B).$$
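The role of the correction term in $\mu^{(N)}$ is to exclude self-interaction: for a rescaled empirical measure $\mu=N^{-1}\sum_i\varepsilon_{x_i}$, integrating against $\mu^{(N)}$ is the same as summing over ordered pairs of distinct particles. The sketch below checks this identity numerically; the function names and the representation of $\mu$ by the list of its atoms are assumptions made for the illustration.

```python
def integrate_mu_N(xs, N, g):
    """Integrate g(x, y) against mu^(N) = mu x mu - N^{-1} (mu on the diagonal),
    where mu = N^{-1} * sum of delta_{x_i} over the list xs."""
    product_part = sum(g(x, y) for x in xs for y in xs) / N**2
    diagonal_part = sum(g(x, x) for x in xs) / N   # integral of g(x, x) mu(dx)
    return product_part - diagonal_part / N

def integrate_distinct_pairs(xs, N, g):
    """N^{-2} times the sum of g(x_i, x_j) over ordered pairs with i != j."""
    n = len(xs)
    return sum(g(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j) / N**2
```

The two functions agree for any list of atoms, including repeated atoms, and for any scaling parameter `N`, which need not equal the number of particles.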


Set

$$M^{f,a}_t=\langle(f,a),M_t\rangle,$$

then $(M^{f,a}_t)_{t\ge0}$ is a martingale, with previsible increasing process $(\langle M^{f,a}\rangle_t)_{t\ge0}$ given by

$$\langle M^{f,a}\rangle_t=\int_0^tQ^{(N)}(\tilde Y^N_s,\tilde\Lambda^N_s)(f,a)\,ds,$$

where

$$\begin{aligned}
Q^{(N)}(\mu,\lambda)(f,a)={}&\frac12\int_{E\times E\times E}\{f(z)-f(x)-f(y)\}^2J(x,y,dz)\mu^{(N)}(dx,dy)\\
&+\frac12\int_{E\times E}\{aw(x)+aw(y)-f(x)-f(y)\}^2(J^*-\bar J)(x,y)\mu^{(N)}(dx,dy)\\
&+\lambda\int_E\{aw(x)-f(x)\}^2w(x)\mu(dx).
\end{aligned}$$

Set

$$U_t=(\tilde Y^N_t,\tilde\Lambda^N_t)-(\mu_t,\lambda_t)$$

and subtract Eqs. (4.1) and (2.7) to obtain

$$U_t=Z_t+\int_0^tG_s(U_s)\,ds,$$

where

$$Z_t=U_0+N^{-1/2}M_t+\int_0^t(L^{(N)}-L)(\tilde Y^N_s,\tilde\Lambda^N_s)\,ds$$

and $G_t$ is given by

$$\begin{aligned}
\langle(f,a),G_t(\mu,\lambda)\rangle={}&\frac12\int_{E\times E\times E}\{f(z)-f(x)-f(y)\}J(x,y,dz)(\tilde Y^N_t+\mu_t)(dx)\mu(dy)\\
&+\frac12\int_{E\times E}\{aw(x)+aw(y)-f(x)-f(y)\}(J^*-\bar J)(x,y)(\tilde Y^N_t+\mu_t)(dx)\mu(dy)\\
&+(\tilde\Lambda^N_t+\lambda_t)\int_E\{aw(x)-f(x)\}w(x)\mu(dx)\\
&+\lambda\int_E\{aw(x)-f(x)\}w(x)(\tilde Y^N_t+\mu_t)(dx).
\end{aligned}$$

There is a constant $C<\infty$, which is independent of $N$, such that, for all $t\ge0$, all functions $f$ on $E$ with $\|f\|\le1$ and all $|a|\le1$,

$$\|(L^{(N)}-L)(\tilde Y^N_t,\tilde\Lambda^N_t)\|\le C/N,\qquad|Q^{(N)}(\tilde Y^N_t,\tilde\Lambda^N_t)(f,a)|\le C,$$

$$\|G_t(\mu,\lambda)\|\le C\|(\mu,\lambda)\|,\qquad|M^{f,a}_t-M^{f,a}_{t-}|\le CN^{-1/2}.$$

Set

$$u(t)=\sup_{s\le t}\|U_s\|,\qquad z(t)=\sup_{s\le t}\|Z_s\|,$$


then

$$u(t)\le z(t)+C\int_0^tu(s)\,ds,$$

so

$$u(t)\le e^{Ct}z(t).$$
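The passage from the integral inequality to the exponential bound is an instance of Gronwall's lemma. A short sketch of the step, using only that $z$ is non-decreasing (it is a running supremum) and assuming, for this illustration, that $u(t)$ is finite:

```latex
\begin{aligned}
&\text{Fix } t \text{ and let } s\le t. \text{ Since } z(s)\le z(t),\qquad
u(s)\le z(t)+C\int_0^s u(r)\,dr .\\
&\text{Substituting this bound into itself } n-1 \text{ times gives}\\
&\qquad u(s)\le z(t)\sum_{k=0}^{n-1}\frac{(Cs)^k}{k!}
  +C^n\int_0^s\frac{(s-r)^{n-1}}{(n-1)!}\,u(r)\,dr .\\
&\text{The remainder is at most } u(t)\,\frac{(Cs)^n}{n!}\to 0
  \text{ as } n\to\infty, \text{ so } u(s)\le z(t)e^{Cs}, \text{ and } s=t
  \text{ gives } u(t)\le e^{Ct}z(t).
\end{aligned}
```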

Hence it suffices to show that, for all $t\ge0$ and $\delta>0$,

$$\limsup_{N\to\infty}N^{-1}\log P(z(t)\ge\delta)<0.$$

We have assumed that, as $N\to\infty$, uniformly in probability,

$$\|U_0\|=\|\tilde Y^N_0-\mu_0\|+|\tilde\Lambda^N_0-\lambda_0|\to0.$$

We have also

$$\Big\|\int_0^t(L^{(N)}-L)(\tilde Y^N_s,\tilde\Lambda^N_s)\,ds\Big\|\le Ct/N.$$

So it suffices to show that, for all $t\ge0$ and $\delta>0$,

$$\limsup_{N\to\infty}N^{-1}\log P\Big(\sup_{s\le t}\|M_s\|\ge N^{1/2}\delta\Big)<0.$$

Since $F$ is finite, it then suffices to show that, for all functions $f$ on $E$ with $\|f\|\le1$ and all $|a|\le1$, for all $t\ge0$ and $\delta>0$,

$$\limsup_{N\to\infty}N^{-1}\log P\Big(\sup_{s\le t}M^{f,a}_s\ge N^{1/2}\delta\Big)<0. \tag{4.2}$$

We recall a form of the exponential martingale inequality for martingales $m$ whose jumps are bounded uniformly by $A\in[0,\infty)$ and which have a continuous previsible increasing process $\langle m\rangle$: for all $\theta\ge0$ we have

$$P\Big(\sup_tm_t\ge\gamma\text{ and }\langle m\rangle_\infty\le\varepsilon\Big)\le\exp\Big\{-\theta\gamma+\frac12\theta^2e^{\theta A}\varepsilon\Big\}.$$

This may be established as follows: set $\alpha=(1/2)\theta^2e^{\theta A}$, then by Itô's formula $x_t=\exp\{\theta m_t-\alpha\langle m\rangle_t\}$ is a supermartingale; set $T=\inf\{t\ge0:m_t>\gamma\}$, then by optional stopping $E(x_T)\le1$ and the claimed inequality follows by Chebyshev's inequality.

Now take $m_s=M^{f,a}_{s\wedge t}$, $A=CN^{-1/2}$, $\gamma=N^{1/2}\delta$, $\varepsilon=Ct$ and $\theta=\gamma/(3Ct)$. We can assume that $\delta\le N^{1/2}t$, so

$$e^{\theta A}\le e^{\delta/3N^{1/2}t}\le e^{1/3}\le3/2$$

and so

$$P\Big(\sup_{s\le t}M^{f,a}_s\ge N^{1/2}\delta\Big)\le e^{-N\delta^2/4Ct},$$

which certainly implies (4.2). □
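For the reader's convenience, the arithmetic behind the final exponent can be written out. This is only a worked check of the stated choice of parameters, taking the bound $e^{\theta A}\le3/2$ from the text as given and using $|Q^{(N)}|\le C$, which gives $\langle m\rangle_\infty\le Ct=\varepsilon$:

```latex
\begin{aligned}
-\theta\gamma+\tfrac12\theta^2e^{\theta A}\varepsilon
&\le-\frac{\gamma^2}{3Ct}+\frac12\cdot\frac{\gamma^2}{9C^2t^2}\cdot\frac32\cdot Ct
 \qquad\big(\theta=\gamma/(3Ct),\ \varepsilon=Ct\big)\\
&=-\frac{\gamma^2}{3Ct}+\frac{\gamma^2}{12Ct}
 =-\frac{\gamma^2}{4Ct}
 =-\frac{N\delta^2}{4Ct}
 \qquad\big(\gamma=N^{1/2}\delta\big).
\end{aligned}
```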


We now describe a coupling of the $K$-coagulant with the $(J,J^*,w)$-coagulant. Assume that $J\le K$ and $\bar K\le J^*$. Suppose we are given two finite integer-valued measures $X_0$ and $Y_0$ on $E$ and $\Lambda_0\in[0,\infty)$ such that

$$Y_0\le X_0,\qquad\langle w,Y_0\rangle+\Lambda_0\ge\langle w,X_0\rangle.$$

We can write

$$Y_0=\sum_{i=1}^q\varepsilon_{x_i},\qquad X_0=\sum_{i=1}^p\varepsilon_{x_i}$$

with $q\le p$. For $i<j\le q$, take independent exponential random variables $S_{ij}$, $R_{ij}$, $R^*_{ij}$, having parameters $\bar J(x_i,x_j)$, $(\bar K-\bar J)(x_i,x_j)$, $(J^*-\bar K)(x_i,x_j)$ respectively. Also, for $i<j\le q$, take independent random variables $Y_{ij}$ and $Z_{ij}$, $Y_{ij}$ having law $J(x_i,x_j,\cdot)/\bar J(x_i,x_j)$ and $Z_{ij}$ having law $(K-J)(x_i,x_j,\cdot)/(\bar K-\bar J)(x_i,x_j)$. For $i<j$ and $j>q$, take independent exponential random variables $T_{ij}$ of parameter $\bar K(x_i,x_j)$ and independent random variables $W_{ij}$ having law $K(x_i,x_j,\cdot)/\bar K(x_i,x_j)$. Finally, for each $i\le q$, take an independent exponential random variable $R_i$ of parameter $w(x_i)\Lambda_0-\sum_{j>q}\bar K(x_i,x_j)$. Our hypotheses ensure these last parameters are all non-negative.

For $i<j\le q$, set $T_{ij}=S_{ij}\wedge R_{ij}$ and $S^*_{ij}=R_{ij}\wedge R^*_{ij}$, so that $T_{ij}$ and $S^*_{ij}$ are exponential of parameters $\bar K(x_i,x_j)$ and $(J^*-\bar J)(x_i,x_j)$ respectively. Set also $W_{ij}=Y_{ij}1_{T_{ij}=S_{ij}}+Z_{ij}1_{T_{ij}=R_{ij}}$ and note that $W_{ij}$ has distribution $K(x_i,x_j,\cdot)/\bar K(x_i,x_j)$. For $i\le q$, set

$$S_i=R_i\wedge\min_{j>q}T_{ij},$$

then $S_i$ is exponential of parameter $w(x_i)\Lambda_0$. Now set

$$T=\min_{i<j\le q}(S_{ij}\wedge R_{ij}\wedge R^*_{ij})\wedge\min_{i<j,\,j>q}T_{ij}\wedge\min_{i\le q}R_i.$$

Set $X_t=X_0$ and $(Y_t,\Lambda_t)=(Y_0,\Lambda_0)$ for $t<T$ and define

$$X_T=\begin{cases}X_0-\varepsilon_{x_i}-\varepsilon_{x_j}+\varepsilon_{W_{ij}}&\text{if }T=T_{ij},\\X_0&\text{otherwise;}\end{cases}$$

$$(Y_T,\Lambda_T)=\begin{cases}(Y_0-\varepsilon_{x_i}-\varepsilon_{x_j}+\varepsilon_{Y_{ij}},\ \Lambda_0)&\text{if }T=S_{ij},\ i<j\le q,\\(Y_0-\varepsilon_{x_i}-\varepsilon_{x_j},\ \Lambda_0+w(x_i)+w(x_j))&\text{if }T=S^*_{ij},\ i<j\le q,\\(Y_0-\varepsilon_{x_i},\ \Lambda_0+w(x_i))&\text{if }T=S_i,\ i\le q,\\(Y_0,\Lambda_0)&\text{otherwise.}\end{cases}$$

Then repeat the construction independently from time $T$. It is clear that the coupled processes $(X_t)_{t\ge0}$ and $(Y_t,\Lambda_t)_{t\ge0}$ thus constructed are, respectively, a $K$-coagulant and a $(J,J^*,w)$-coagulant. Moreover a straightforward check of the various cases shows that

$$Y_T\le X_T,\qquad\langle w,Y_T\rangle+\Lambda_T\ge\langle w,X_T\rangle,$$

and hence, inductively, for all $t$,

$$Y_t\le X_t,\qquad\langle w,Y_t\rangle+\Lambda_t\ge\langle w,X_t\rangle.$$


Here is the main result of this section. It is expressed in terms of a metric $d_\varphi$ on $\mathcal M^+(\tilde\varphi)=\{\mu\in\mathcal M^+:\langle\tilde\varphi,\mu\rangle<\infty\}$ compatible with the topology of $\tilde\varphi$-weighted weak convergence: thus $d_\varphi(\mu_n,\mu)\to0$ if and only if $\langle f\tilde\varphi,\mu_n\rangle\to\langle f\tilde\varphi,\mu\rangle$ for all bounded continuous functions $f$ on $E$. We remark that the exponential rate of convergence established here becomes computable only once one can quantify the speed of convergence of the discrete approximation scheme in Sect. 3. This would require much stronger hypotheses than we have made on $K$ and $\mu_0$. For related recent work see [CK96, Jeo98].

Theorem 4.2. Assume that the space $E$ of clusters and the coagulation kernel $K$ have the structure (3.1) and (3.2). Let $\mu_0$ be a measure on $E$. Assume that, for some continuous sublinear function $\varphi:(0,\infty)\to(0,\infty)$,

$$\bar K(x,y)\le\tilde\varphi(x)\tilde\varphi(y),\quad x,y\in E,$$

and that $\langle\tilde\varphi,\mu_0\rangle<\infty$ and $\langle\tilde\varphi^2,\mu_0\rangle<\infty$. Denote by (µt )t 0,

$$\limsup_{N\to\infty}N^{-1}\log P\Big(\sup_{s\le t}d_\varphi(\tilde X^N_s,\mu_s)>\delta\Big)<0.$$

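Proof. It is convenient to assume that $d_\varphi(\mu, \nu) \le \|\tilde\varphi(\mu - \nu)\|$ and we do this without loss. Fix $\delta > 0$ and $t < T$. Fix $\varepsilon > 0$ and $B \subseteq (0, \infty)$ compact and recall that we write $\tilde B_\varepsilon = \{x \in \tilde B : \inf x \ge \varepsilon\}$. For $n \ge 1$, define $J_n$, $J_n^*$ and $w_n$ as in (3.3). Denote by $(\mu^{B,n}_s, \lambda^{B,n}_s)_{s\ge0}$ the unique solution to (2.7) starting from
\[ \mu^n_0 = 1_{\tilde B_\varepsilon}\mu_0, \qquad \lambda^n_0 = \langle\tilde\varphi 1_{E\setminus\tilde B_\varepsilon}, \mu_0\rangle \]
with coefficients $J_n$, $J_n^*$, $w_n$. By Theorem 3.1, we can choose $\varepsilon$, $B$ and $n_0$ so that, for all $n \ge n_0$, $\lambda^n_t < \delta/5$. Now, for fixed $\varepsilon$, $B$ and $C < \infty$, there exists $n \ge n_0$ such that, for all measures $\mu$ supported on $\tilde B_\varepsilon$ with $\langle\tilde\varphi, \mu\rangle \le C$,
\[ d_\varphi(\mu, \lceil\mu\rceil_n) < \delta/5. \]
We fix $C < \infty$ so that
\[ \sup_N \langle\tilde\varphi, \tilde X^N_0\rangle \le C, \qquad \langle\tilde\varphi, \mu_0\rangle \le C. \]
Set $X^{n,N}_0 = 1_{\tilde B_\varepsilon}X^N_0$ and $\Lambda^{n,N}_0 = \langle\tilde\varphi 1_{E\setminus\tilde B_\varepsilon}, X^N_0\rangle$. Then $X^{n,N}_0 \le X^N_0$ and $\langle\tilde\varphi, X^{n,N}_0\rangle + \Lambda^{n,N}_0 \ge \langle\tilde\varphi, X^N_0\rangle$. Hence, using the coupling construction described above, we can find a sequence of $(J_n, J_n^*, w_n)$-coagulants $(X^{n,N}_s, \Lambda^{n,N}_s)_{s\ge0}$ such that, for all $s$,
\[ X^{n,N}_s \le X^N_s, \qquad \langle\tilde\varphi, X^{n,N}_s\rangle + \Lambda^{n,N}_s \ge \langle\tilde\varphi, X^N_s\rangle. \]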


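Write $(\tilde X^{n,N}_s, \tilde\Lambda^{n,N}_s)_{s\ge0}$ for the rescaled processes as in Proposition 4.1. Then $\mu^n_s$ and $\tilde X^{n,N}_s$ are supported in $\tilde B_\varepsilon$ for all $s$ and $N$, and
\[ \sup_N \langle\tilde\varphi, \tilde X^{n,N}_s\rangle \le C, \qquad \langle\tilde\varphi, \mu^n_s\rangle \le C. \]
Hence
\[ d_\varphi(\tilde X^{n,N}_s, \lceil\tilde X^{n,N}_s\rceil_n) < \delta/5, \qquad d_\varphi(\mu^n_s, \lceil\mu^n_s\rceil_n) < \delta/5. \]
Now, for $s \le t$, we have, using Proposition 3.1,
\[ d_\varphi(\mu^n_s, \mu_s) \le \|\tilde\varphi(\mu^n_s - \mu_s)\| = \langle\tilde\varphi, \mu^n_s - \mu_s\rangle \le \lambda^n_s \le \lambda^n_t < \delta/5. \]
Similarly, using the coupling,
\[ d_\varphi(\tilde X^N_s, \tilde X^{n,N}_s) \le \tilde\Lambda^{n,N}_s \le \lambda^n_s + |\tilde\Lambda^{n,N}_s - \lambda^n_s| < \delta/5 + |\tilde\Lambda^{n,N}_s - \lambda^n_s|. \]
Hence, for $s \le t$,
\[ \begin{aligned} d_\varphi(\tilde X^N_s, \mu_s) &\le d_\varphi(\tilde X^N_s, \tilde X^{n,N}_s) + d_\varphi(\tilde X^{n,N}_s, \lceil\tilde X^{n,N}_s\rceil_n) + d_\varphi(\lceil\tilde X^{n,N}_s\rceil_n, \lceil\mu^n_s\rceil_n) \\ &\quad + d_\varphi(\lceil\mu^n_s\rceil_n, \mu^n_s) + d_\varphi(\mu^n_s, \mu_s) \\ &\le 4\delta/5 + \|\tilde\varphi(\lceil\tilde X^{n,N}_s\rceil_n - \lceil\mu^n_s\rceil_n)\| + |\tilde\Lambda^{n,N}_s - \lambda^n_s|. \end{aligned} \]
Now $(\lceil\mu^n_s\rceil_n, \lambda^n_s)_{s\ge0}$ is the unique solution to (2.7) starting from $(\lceil\mu^n_0\rceil_n, \lambda^n_0)$ and $(\lceil X^{n,N}_s\rceil_n, \Lambda^{n,N}_s)_{s\ge0}$ is a $(J_n, J_n^*, w_n)$-coagulant. Both $\lceil\mu^n_s\rceil_n$ and $\lceil\tilde X^{n,N}_s\rceil_n$ take their values in the finite set $F = \{\lceil x\rceil_n : x \in \tilde B_\varepsilon\}$. We have
\[ \|\lceil\tilde X^{n,N}_0\rceil_n - \lceil\mu^n_0\rceil_n\| + |\tilde\Lambda^{n,N}_0 - \lambda^n_0| \to 0, \]
uniformly in probability, as $N \to \infty$. Hence, by Proposition 4.1, since $\tilde\varphi$ is bounded on $F$,
\[ \limsup_{N\to\infty} N^{-1} \log \mathbb{P}\Big\{ \sup_{s\le t} \|\tilde\varphi(\lceil\tilde X^{n,N}_s\rceil_n - \lceil\mu^n_s\rceil_n)\| + |\tilde\Lambda^{n,N}_s - \lambda^n_s| > \delta/5 \Big\} < 0 \]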

and that completes the proof.

Finally, in the eventually multiplicative case, a modification, indeed simplification, of the above argument leads to the following limit theorem, which remains valid beyond gelation. The key point, coming from the proof of Theorem 2.3, is that, for $R \ge R(K)$, we have $1_{(0,R]}\mu_t = \nu^R_t$. This means that we can run the argument as above, but with $B$ replaced by $(0, R]$ and the final limit $R \to \infty$ becomes unnecessary. The details are left to the reader. For $R > 0$ we write $d_R$ for a metric compatible with the topology of weak convergence of measures on $\{m \le R\}$.

Theorem 4.3. Assume that the space $E$ of clusters and the coagulation kernel $K$ have the structure (3.1) and (3.2) and that $K$ is eventually multiplicative. Let $\mu_0$ be a measure on $E$ with $\langle m + 1, \mu_0\rangle < \infty$. Denote by $(\mu_t)_{t\ge0}$ the unique solution to (2.6) provided by Theorem 2.3. Let $(X^N_t)_{t\ge0}$ be a sequence of stochastic $K$-coagulants. Set $\tilde X^N_t = N^{-1}X^N_{N^{-1}t}$ and suppose that, for some $R \ge R(K)$, uniformly in probability as $N \to \infty$,
\[ d_R(\tilde X^N_0, \mu_0) \to 0, \qquad \langle m, \tilde X^N_0\rangle \to \langle m, \mu_0\rangle. \]
Then, for all $t \ge 0$ and all $\delta > 0$,
\[ \limsup_{N\to\infty} N^{-1} \log \mathbb{P}\Big( \sup_{s\le t} d_R(\tilde X^N_s, \mu_s) > \delta \Big) < 0. \]


References

[Ald90] Aldous, D. J.: Deterministic and stochastic models for coalescence (aggregation, coagulation): A review of the mean-field theory for probabilists. Bernoulli 5, 3–48 (1999)
[BC90] Ball, J. M. and Carr, J.: The discrete coagulation-fragmentation equations: Existence, uniqueness, and density conservation. J. Stat. Phys. 61 (1–2), 203–234 (1990)
[CK96] Clark, J. M. C. and Katsouros, V.: Stably coalescent stochastic froths. Adv. Appl. Probab. 31, 199–219 (1999)
[DS96] Dubovskiĭ, P. B. and Stewart, I. W.: Existence, uniqueness and mass conservation for the coagulation-fragmentation equation. Math. Methods Appl. Sci. 19 (7), 571–591 (1996)
[EZH] Ernst, M. H., Ziff, R. M. and Hendriks, E. M.: Coagulation processes with a phase transition. J. Colloid Interface Sci. 97, 266–277 (1984)
[Hei92] Heilmann, Ole J.: Analytical solutions of Smoluchowski's coagulation equation. J. Phys. A 25 (13), 3763–3771 (1992)
[Jeo98] Jeon, I.: Existence of gelling solutions for coagulation-fragmentation equations. Commun. Math. Phys. 194, 541–567 (1998)
[McL62] McLeod, J. B.: On an infinite set of nonlinear differential equations. Quart. J. Math. Oxford 13, 119–128 (1962)
[McL64] McLeod, J. B.: On the scalar transport equation. Proc. London Math. Soc. 14, 445–458 (1964)
[Nor99] Norris, J. R.: Smoluchowski's coagulation equation: Uniqueness, non-uniqueness and a hydrodynamic limit for the stochastic coalescent. Ann. Appl. Probab. 9, 78–109 (1999)
[vS16] van Smoluchowski, M.: Drei Vorträge über Diffusion, Brownsche Bewegung und Koagulation von Kolloidteilchen. Physik. Z. 17, 557–585 (1916)
[Whi80] White, Warren H.: A global existence theorem for Smoluchowski's coagulation equations. Proc. Amer. Math. Soc. 80 (2), 273–276 (1980)

Communicated by J. L. Lebowitz

Commun. Math. Phys. 209, 437 – 476 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Shape Fluctuations and Random Matrices

Kurt Johansson

Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden. E-mail: [email protected]

Received: 12 March 1999 / Accepted: 19 August 1999

Abstract: We study a certain random growth model in two dimensions closely related to the one-dimensional totally asymmetric exclusion process. The results show that the shape fluctuations, appropriately scaled, converge in distribution to the Tracy–Widom largest eigenvalue distribution for the Gaussian Unitary Ensemble (GUE).

1. Introduction and Results

The shape and height fluctuations in many 2-d random growth models are expected to be of order $N^\chi$, with $\chi = 1/3$, if the mean of the linear size of the shape or the height is of order $N$. See [KS] for a review and [NP] for rigorous bounds on $\chi$ in first-passage percolation. In this paper we will consider a specific model. It can be given several probabilistic interpretations: as a randomly growing Young diagram, a totally asymmetric one-dimensional exclusion process, a certain zero-temperature directed polymer in a random environment, or a kind of first-passage site percolation model. The model has the advantage that we can prove that $\chi = 1/3$ and also compute the asymptotic distribution of the appropriately rescaled random variable. Interestingly, the limit distribution that occurs is the same as that of the scaled largest eigenvalue of an $N \times N$ random matrix from the Gaussian Unitary Ensemble (GUE) in the limit $N \to \infty$. The model in this paper has many similarities with the problem of the distribution of the length of the longest increasing subsequence in a random permutation, where the same limiting distribution and $\chi = 1/3$ were found in [BDJ].

To define the model let $w(i, j)$, $(i, j) \in \mathbb{Z}_+^2$, be independent geometrically distributed random variables,
\[ \mathbb{P}[w(i, j) = k] = (1 - q)q^k, \qquad k \in \mathbb{N}, \]
where $0 < q < 1$. Let $\Pi_{M,N}$ be the set of all up/right paths $\pi$ in $\mathbb{Z}_+^2$ from $(1, 1)$ to $(M, N)$, i.e. sequences $(i_k, j_k)$, $k = 1, \dots, M + N - 1$, of sites in $\mathbb{Z}_+^2$ such that $(i_1, j_1) = (1, 1)$, $(i_{M+N-1}, j_{M+N-1}) = (M, N)$ and $(i_{k+1}, j_{k+1}) - (i_k, j_k) = (1, 0)$ or $(0, 1)$. Define the random variable
\[ G(M, N) = \max_{\pi\in\Pi_{M,N}} \sum_{(i,j)\in\pi} w(i, j). \tag{1.1} \]
We also define the closely related random variable
\[ G^*(M, N) = \max_{\pi\in\Pi_{M,N}} \sum_{(i,j)\in\pi} w^*(i, j), \]
where $w^*(i, j) = w(i, j) + 1$, so that $\mathbb{P}[w^*(i, j) = k] = (1 - q)q^{k-1}$, $k \ge 1$. Clearly,
\[ G^*(M, N) = G(M, N) + M + N - 1, \tag{1.2} \]
since all paths have the same length. Using this random variable we can define, for each $t \ge 0$, a random subset of the first quadrant by
\[ A(t) = \{(M, N) \in \mathbb{Z}_+^2 \,;\, G^*(M, N) \le t\} + [-1, 0]^2. \tag{1.3} \]
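As an illustration of definitions (1.1) and (1.2), the maximum over up/right paths can be computed without enumerating paths, because every path reaches a site either from the left or from below. The following Python sketch is not part of the paper; the function names, the sampler and the parameter values are chosen here only for illustration. It compares the recursion with a brute-force enumeration on a small table.

```python
import random
from itertools import combinations


def last_passage(w):
    """G(M, N) for a weight table w[i][j] (0-based), by dynamic programming."""
    M, N = len(w), len(w[0])
    G = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            # An up/right path reaches (i, j) from (i-1, j) or from (i, j-1).
            preds = []
            if i > 0:
                preds.append(G[i - 1][j])
            if j > 0:
                preds.append(G[i][j - 1])
            G[i][j] = w[i][j] + (max(preds) if preds else 0)
    return G[M - 1][N - 1]


def last_passage_brute(w):
    """The same maximum, by enumerating every up/right path (small tables only)."""
    M, N = len(w), len(w[0])
    steps = M + N - 2
    best = None
    for i_steps in combinations(range(steps), M - 1):  # steps that increase i
        chosen = set(i_steps)
        i = j = 0
        total = w[0][0]
        for s in range(steps):
            if s in chosen:
                i += 1
            else:
                j += 1
            total += w[i][j]
        best = total if best is None else max(best, total)
    return best


def sample_geometric(rng, q):
    """Draw k >= 0 with P[k] = (1 - q) q^k."""
    k = 0
    while rng.random() < q:
        k += 1
    return k


rng = random.Random(0)          # fixed seed, illustrative only
M, N, q = 3, 4, 0.6
w = [[sample_geometric(rng, q) for _ in range(N)] for _ in range(M)]
w_star = [[x + 1 for x in row] for row in w]
print(last_passage(w), last_passage(w_star))
```

The second printed value exceeds the first by $M + N - 1$, which is relation (1.2).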

From the definition of $G^*(M, N)$ and the fact that we consider up/right paths it follows that $A(t)$ has the form $\bigcup_{k=1}^r [k - 1, k] \times [0, \lambda_k]$ for some integers $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_r \ge 1$, so we can think of $A(t)$ as a Young diagram $\lambda = (\lambda_1, \dots, \lambda_r)$. If we think of $t \in \mathbb{N}$ as a discrete time variable, $A(t)$ is a randomly growing Young diagram. Let $\partial^*A(t)$ be those unit cubes adjacent to $A(t)$ that can be added to $A(t)$ so that it is still a Young diagram, i.e. each cube in $\partial^*A(t)$ must have a cube in $A(t)$ or $\mathbb{R}^2 \setminus [0, \infty)^2$ immediately below and to the left of it. The fact that the $w^*(i, j)$'s are independent and geometrically distributed random variables implies that $A(t + 1)$ is obtained by picking each cube in $\partial^*A(t)$ independently with probability $p = 1 - q$ and adding those cubes that were picked to $A(t)$. (Recall that $\mathbb{P}[w^*(i, j) = k + l \mid w^*(i, j) \ge k] = \mathbb{P}[w(i, j) = l]$, $l \ge 0$, the lack of memory property.) The starting configuration is $A(0) = \emptyset$ and $\partial^*A(0) = [0, 1]^2$. In this model $G^*(M, N) = k$ means that the box $[M - 1, M] \times [N - 1, N]$ is added at time $k$. This growth model has been considered in [JPS].

This randomly growing Young diagram can also, equivalently, be thought of as a certain totally asymmetric exclusion process with discrete time, compare [Ro] or [Li, p. 412]. Let $C(t) = \partial([0, \infty)^2 \setminus A(t))$ and note that $C(t)$ consists of vertical and horizontal line segments of length 1. To each vertical line segment we associate a 1 and to each horizontal line segment a 0. If we read the numbers along $C(t)$, starting at infinity along the $y$-axis and ending at infinity along the $x$-axis, we get an infinite sequence $X(t) = (\dots, x_{-1}(t), x_0(t), x_1(t), x_2(t), \dots)$ of 0's and 1's, starting with infinitely many 1's and ending with infinitely many 0's; we let $x_0$ be the last number we have before passing through the line $x = y$.
We can think of $X(t)$ as a configuration of particles, where $x_k = 1$ means that there is a particle at $k$, whereas $x_k = 0$ means that there is no particle at $k$. The stochastic growth of $A(t)$ described above corresponds to the following stochastic dynamics of the particle system. At time $t$ each particle independently moves to the right with probability $1 - q$ provided there is no particle immediately to the right of it. Otherwise it does not move. The starting configuration is $x_k(0) = 1_{(-\infty,0]}(k)$. In this particle model $G^*(M, N) = k$ means that the particle initially at position $-(N - 1)$ has moved $M$ steps at time $k$. Our first result concerns the mean and large deviation properties of $G(M, N)$.
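The particle dynamics just described can be simulated directly. The sketch below is an illustration only, not the author's construction: the function name, the parallel-update loop and the seed are assumptions made here. It follows the $n$ rightmost particles of the step initial configuration and records, for the particle started at $-(N-1)$, the time of its $M$-th jump, which plays the role of $G^*(M, N)$.

```python
import random


def simulate_tasep(n_particles, steps, q, seed=0):
    """Discrete-time exclusion process started from x_k(0) = 1 for k <= 0.

    Only the n rightmost particles are tracked; particles further to the left
    never influence them. Particle idx (0-based) starts at position -idx.
    In each time step a particle jumps one site to the right with probability
    1 - q, provided the site to its right was empty at the start of the step.
    Returns jump_time, where jump_time[idx][m] is the time of the m-th jump
    of particle idx (entry 0 is an unused placeholder).
    """
    rng = random.Random(seed)
    pos = [-idx for idx in range(n_particles)]
    jump_time = [[None] for _ in range(n_particles)]
    for t in range(1, steps + 1):
        old = pos[:]  # configuration at time t - 1 (parallel update)
        for idx in range(n_particles):
            blocked = idx > 0 and old[idx - 1] == old[idx] + 1
            if not blocked and rng.random() < 1 - q:
                pos[idx] = old[idx] + 1
                jump_time[idx].append(t)
    return jump_time


jt = simulate_tasep(n_particles=5, steps=200, q=0.5, seed=1)
# First few jump times of the particle started at the origin.
print(jt[0][1:6])
```

In this sketch the $M$-th jump of a particle always happens strictly after the $M$-th jump of the particle ahead of it and after its own $(M-1)$-th jump, mirroring the fact that a box can be added to $A(t)$ only once the boxes below and to the left of it are present.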


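Theorem 1.1. For each $q \in (0, 1)$ and $\gamma \ge 1$,
\[ \lim_{N\to\infty} \frac{1}{N}\,\mathbb{E}[G([\gamma N], N)] = \frac{(1 + \sqrt{q\gamma})^2}{1 - q} - 1 \doteq \omega(\gamma, q). \tag{1.4} \]
Also, $G([\gamma N], N)$ has the following large deviation properties. There are functions $i(\epsilon)$ and $\ell(\epsilon)$ (which depend on $q$ and $\gamma$), so that, for any $\epsilon > 0$,
\[ \lim_{N\to\infty} \frac{1}{N^2} \log \mathbb{P}[G([\gamma N], N) \le N(\omega(\gamma, q) - \epsilon)] = -\ell(\epsilon) \tag{1.5} \]
and
\[ \lim_{N\to\infty} \frac{1}{N} \log \mathbb{P}[G([\gamma N], N) \ge N(\omega(\gamma, q) + \epsilon)] = -i(\epsilon). \tag{1.6} \]
The functions $\ell(x)$ and $i(x)$ are $> 0$ if $x > 0$.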
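The constant in (1.4) is explicit, so it can be evaluated directly. The following sketch is an illustration only; the helper names are chosen here. It computes $\omega(\gamma, q)$ and, using (1.2), the point at which the rescaled growing set should meet the ray in direction $(\gamma, 1)$. These points can then be compared with the boundary curve $y + 2\sqrt{qxy} + x = 1 - q$ of the asymptotic shape that the text states after the theorem.

```python
import math


def omega(gamma, q):
    """The limit of E[G([gamma N], N)] / N given in (1.4)."""
    return (1 + math.sqrt(q * gamma)) ** 2 / (1 - q) - 1


def shape_point(gamma, q):
    """Boundary point of the rescaled shape in direction (gamma, 1).

    By (1.2), G*([gamma N], N) / N tends to omega + gamma + 1, so the site
    ([gamma N], N) is reached at time t of order N (omega + gamma + 1).
    """
    scale = omega(gamma, q) + gamma + 1
    return gamma / scale, 1 / scale


for gamma in (1.0, 2.0, 5.0):
    x, y = shape_point(gamma, 0.5)
    # The last printed number should be close to 1 - q = 0.5.
    print(gamma, x, y, x + 2 * math.sqrt(0.5 * x * y) + y)
```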

Note that the existence of the limit (1.4) follows by a subadditivity argument, so it is the explicit form of the constant that is interesting. The large deviation result (1.6) has been obtained in [Se2]. The theorem will be proved in Sect. 2. The theorem implies that $\frac{1}{t}A(t)$ has an asymptotic shape $A_0$ as $t \to \infty$, in the sense that given any $\epsilon > 0$,
\[ (1 - \epsilon)A_0 \subseteq \frac{1}{t}A(t) \subseteq (1 + \epsilon)A_0 \]
for all sufficiently large $t$. It follows from the definition of $A(t)$, (1.3), and Theorem 1.1 that
\[ A_0 = \{(x, y) \in [0, \infty)^2 \,;\, y + 2\sqrt{qxy} + x \le 1 - q\}. \]
The boundary of $A_0$ consists of two line segments from the origin to $(1 - q, 0)$ and $(0, 1 - q)$ and part of an ellipse that is tangent to the $x$- and $y$-axes.

We now want to understand the fluctuations of $A(t)$ around its asymptotic shape $A_0$, i.e. the fluctuations of $G([\gamma N], N)$ around $N\omega(\gamma, q)$. Before we can formulate the result we need some preliminaries. Let $\mathrm{Ai}(x)$ be the Airy function defined by
\[ \mathrm{Ai}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i(t+is)^3/3 + ix(t+is)}\,dt, \]
where $s > 0$ is arbitrary. Consider the Airy kernel
\[ A(x, y) = \frac{\mathrm{Ai}(x)\mathrm{Ai}'(y) - \mathrm{Ai}'(x)\mathrm{Ai}(y)}{x - y}, \tag{1.7} \]
as an integral kernel on $L^2[s, \infty)$. The Fredholm determinant
\[ F(s) = \det(I - A)\big|_{L^2[s,\infty)} = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \int_{[s,\infty)^k} \det(A(x_i, x_j))_{i,j=1}^k\,d^kx \tag{1.8} \]

is a distribution function. It is the distribution function of the appropriately scaled largest eigenvalue of an $N \times N$ random matrix from the Gaussian Unitary Ensemble (GUE) in the limit $N \to \infty$, the Tracy–Widom distribution, see [TW1]. The distribution function $F(s)$ can also be defined using a certain Painlevé II function,
\[ F(s) = \exp\Big[-\int_s^{\infty} (x - s)u(x)^2\,dx\Big], \tag{1.9} \]
where $u(x)$ is the unique solution of the Painlevé II equation
\[ u'' = 2u^3 + xu, \]
with the asymptotics $u(x) \sim \mathrm{Ai}(x)$ as $x \to \infty$. The fact that the expressions (1.8) and (1.9) are equal is proved in [TW1].

Theorem 1.2. For each $q \in (0, 1)$, $\gamma \ge 1$ and $s \in \mathbb{R}$,
\[ \lim_{N\to\infty} \mathbb{P}\Big[\frac{G([\gamma N], N) - N\omega(\gamma, q)}{\sigma(\gamma, q)N^{1/3}} \le s\Big] = F(s), \tag{1.10} \]
where
\[ \sigma(\gamma, q) = \frac{q^{1/6}\gamma^{-1/6}}{1 - q}\,(\sqrt{\gamma} + \sqrt{q})^{2/3}(1 + \sqrt{q\gamma})^{2/3}. \tag{1.11} \]
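The centering and scaling in (1.10) and (1.11) are fully explicit. The sketch below is illustrative only; the function names are chosen here. It evaluates them and forms the rescaled variable for an observed value of $G([\gamma N], N)$. As a plausibility check, $(1 - q)\sigma(\gamma, q)$ approaches $\gamma^{-1/6}(1 + \sqrt{\gamma})^{4/3}$ as $q \to 1$, which is the scale appearing in the exponential case (1.22).

```python
import math


def omega(gamma, q):
    """Centering constant from (1.4)."""
    return (1 + math.sqrt(q * gamma)) ** 2 / (1 - q) - 1


def sigma(gamma, q):
    """Scale constant from (1.11)."""
    return (q ** (1 / 6) * gamma ** (-1 / 6) / (1 - q)
            * (math.sqrt(gamma) + math.sqrt(q)) ** (2 / 3)
            * (1 + math.sqrt(q * gamma)) ** (2 / 3))


def rescale(g, gamma, q, n):
    """The variable (G - N omega) / (sigma N^(1/3)) appearing in (1.10)."""
    return (g - n * omega(gamma, q)) / (sigma(gamma, q) * n ** (1 / 3))


gamma, q = 2.0, 0.5
print(omega(gamma, q), sigma(gamma, q))

# q -> 1 comparison with the exponential-case scale gamma^(-1/6) (1 + sqrt(gamma))^(4/3).
q_near_one = 1 - 1e-9
print((1 - q_near_one) * sigma(gamma, q_near_one),
      gamma ** (-1 / 6) * (1 + math.sqrt(gamma)) ** (4 / 3))
```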

The theorem will be proved in Sect. 3. We have not proved convergence of the moments of the rescaled random variable, see Remark 2.5. This theorem should be compared with the result obtained in [BDJ], that if $\ell_N(\sigma)$ is the length of a longest increasing subsequence in a random permutation $\sigma \in S_N$ (all $N!$ permutations have the same probability), then
\[ \lim_{N\to\infty} \mathbb{P}\big[(\sqrt{N})^{-1/3}(\ell_N(\sigma) - 2\sqrt{N}) \le s\big] = F(s). \tag{1.12} \]

Note that in both cases we have the same exponent 1/3: the standard deviation is $\sim (\text{mean})^{1/3}$. The proofs of Theorems 1.1 and 1.2 are based on the following result, which will be proved in Sect. 2.

Proposition 1.3. For any $M \ge N \ge 1$,
\[ \mathbb{P}[G(M, N) \le t] = \frac{1}{Z_{M,N}} \sum_{\substack{h\in\mathbb{N}^N \\ \max\{h_i\}\le t+N-1}} \prod_{1\le i<j\le N} (h_i - h_j)^2 \prod_{i=1}^{N} \binom{h_i + M - N}{h_i} q^{h_i}, \tag{1.13} \]

where $Z_{M,N}$ is the normalization constant (partition function). This remarkable formula should be compared with the formula for the distribution function for the largest eigenvalue, $\lambda_{\max}$, of an $N \times N$ random matrix from GUE,
\[ \mathbb{P}[\lambda_{\max} \le t] = \frac{1}{Z_N} \int_{(-\infty,t]^N} \prod_{1\le i<j\le N} (x_i - x_j)^2 \prod_{j=1}^{N} e^{-2Nx_j^2}\,d^Nx. \tag{1.14} \]

There is a clear similarity between the two expressions, so we can use the ideas developed to investigate (1.14). Just as the right-hand side of (1.14) can be written as a Fredholm determinant, so can the right-hand side of (1.13). The kernel for (1.13) is the Meixner kernel,
\[ K_{M,N}(x, y) = \frac{\kappa_{N-1}}{\kappa_N}\,\frac{M_N(x)M_{N-1}(y) - M_{N-1}(x)M_N(y)}{x - y}\,\big(w_K^q(x)w_K^q(y)\big)^{1/2}, \tag{1.15} \]

where $M_N(x) = \kappa_N x^N + \dots$ are the normalized orthogonal polynomials with respect to the discrete weight, $K = M - N + 1$,
\[ w_K^q(x) = \binom{x + K - 1}{x} q^x, \qquad x \in \mathbb{N}. \tag{1.16} \]
This Meixner kernel also appears in the recent paper [BO]. The polynomial $M_N(x)$ is a multiple of the classical Meixner polynomials $m_N^{K,q}(x)$. Using the explicit generating function for the Meixner polynomials, see [Ch], the appropriate asymptotics of the kernel (1.15) can be analyzed. This will be done in Sect. 5.

Let $u(i, j)$, $(i, j) \in \mathbb{Z}_+^2$, be independent exponentially distributed random variables with parameter 1. Let $H(M, N)$ be the analogue of $G(M, N)$ for these random variables, i.e.
\[ H(M, N) = \max\Big\{\sum_{(i,j)\in\pi} u(i, j) \,;\, \pi \in \Pi_{M,N}\Big\}. \tag{1.17} \]

We can consider the related stochastically growing Young diagram and totally asymmetric exclusion process just as in the geometric case, where we now have continuous time. This simple exclusion process is exactly the one considered by Rost, [Ro], see also [Li]. In this process $X(t) = (\eta_k(t))_{k=-\infty}^{\infty} \in \{0, 1\}^{\mathbb{Z}}$, the initial configuration is $1_{(-\infty,0]}(k)$ and a particle ($\eta_k = 1$) jumps with exponential rate to the right one step provided there is no particle at $k + 1$ ($\eta_{k+1} = 0$). By taking the $q \to 1$ limit in (1.13) we obtain

Proposition 1.4. For any $M \ge N \ge 1$, $t \ge 0$,
\[ \mathbb{P}[H(M, N) \le t] = \frac{1}{Z'_{M,N}} \int_{[0,t]^N} \prod_{1\le i<j\le N} (x_i - x_j)^2 \prod_{j=1}^{N} x_j^{M-N} e^{-x_j}\,d^Nx. \tag{1.18} \]

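Proof. If $X_L$ is geometrically distributed with parameter $1 - 1/L$, then $L^{-1}X_L$ converges in distribution to an exponential random variable with parameter 1. Since $G(M, N)$ is a continuous function of the $w(i, j)$'s, Proposition 1.3 gives
\[ \begin{aligned} \mathbb{P}[H(M, N) \le t] &= \lim_{L\to\infty} \frac{1}{Z_{M,N}} \sum_{(*)} \prod_{1\le i<j\le N} (h_i - h_j)^2 \prod_{i=1}^{N} \binom{h_i + M - N}{h_i} (1 - 1/L)^{h_i} \\ &= \lim_{L\to\infty} \frac{L^{N^2}}{Z_{M,N}(M - N)!} \sum_{(*)} \prod_{1\le i<j\le N} \Big(\frac{h_i - h_j}{L}\Big)^2 \prod_{i=1}^{N} \Big( e^{-\frac{h_i}{L} + o(\frac{1}{L})} \prod_{k=1}^{M-N} \frac{h_i + k}{L} \Big) \\ &= \frac{1}{Z'_{M,N}} \int_{[0,t]^N} \prod_{1\le i<j\le N} (x_i - x_j)^2 \prod_{j=1}^{N} x_j^{M-N} e^{-x_j}\,d^Nx, \end{aligned} \]
where $(*)$ means summation over all $h \in \mathbb{N}^N$ such that $\max\{h_i\} \le [Lt] + N - 1$.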
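The first step of this proof, the convergence of a rescaled geometric variable to an exponential one, is easy to check numerically. The sketch below is illustrative only; the function name is chosen here. It uses the exact tail of a geometric variable with $q = 1 - 1/L$ in the paper's convention $\mathbb{P}[X_L = k] = (1 - q)q^k$.

```python
import math


def geometric_tail(L, x):
    """P[X_L / L > x] when P[X_L = k] = (1/L) (1 - 1/L)^k for k >= 0.

    Since P[X_L >= m] = (1 - 1/L)^m, the event X_L > L x has probability
    (1 - 1/L)^(floor(L x) + 1).
    """
    return (1 - 1 / L) ** (math.floor(L * x) + 1)


for x in (0.5, 1.0, 3.0):
    # The two columns should agree better and better as L grows.
    print(x, geometric_tail(10 ** 6, x), math.exp(-x))
```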


Remark 1.5. The right-hand side in (1.18) is the probability that the largest eigenvalue in the Laguerre ensemble is $\le t$. It occurs in the following way. Let $A$ be an $N \times M$ rectangular matrix ($N \le M$) with entries that are complex Gaussian random variables with mean zero and variance 1/2. Then the right-hand side in (1.18) is the distribution function for the largest eigenvalue of $AA^*$, see [Ja].

Theorem 1.6. For each $\gamma \ge 1$,
\[ \lim_{N\to\infty} \frac{1}{N}\,\mathbb{E}[H([\gamma N], N)] = (1 + \sqrt{\gamma})^2, \tag{1.19} \]
and there are functions $i_*(\epsilon)$ and $\ell_*(\epsilon)$ (which depend on $\gamma$), so that for any $\epsilon > 0$,
\[ \lim_{N\to\infty} \frac{1}{N^2} \log \mathbb{P}[H([\gamma N], N) \le N((1 + \sqrt{\gamma})^2 - \epsilon)] = -\ell_*(\epsilon) \tag{1.20} \]
and
\[ \lim_{N\to\infty} \frac{1}{N} \log \mathbb{P}[H([\gamma N], N) \ge N((1 + \sqrt{\gamma})^2 + \epsilon)] = -i_*(\epsilon). \tag{1.21} \]
Furthermore, assume that $a_N = O(N^{1/3})$ as $N \to \infty$ and pick $d_N$ so that $d_N - (1 + 1/\sqrt{\gamma})a_N = o(N^{1/3})$ as $N \to \infty$. Then, for each $\gamma \ge 1$,
\[ \lim_{N\to\infty} \mathbb{P}\Big[\frac{H(\gamma N + a_N, N) - (1 + \sqrt{\gamma})^2 N - d_N}{\gamma^{-1/6}(1 + \sqrt{\gamma})^{4/3} N^{1/3}} \le s\Big] = F(s). \tag{1.22} \]

Proof. For the proof of (1.19) to (1.21) see Remark 2.3. Write $c = (1 + \sqrt{\gamma})^2$ and $\rho = \gamma^{-1/6}(1 + \sqrt{\gamma})^{4/3}$. Then, by Proposition 1.4,
\[ \mathbb{P}[H(\gamma N + a_N, N) \le cN + d_N + \rho N^{1/3}s] = \frac{1}{Z'_{\gamma N+a_N,N}} \int_{[0,\,cN+d_N+\rho N^{1/3}s]^N} \Delta(x)^2 \prod_{j=1}^{N} x_j^{\alpha_N} e^{-x_j}\,d^Nx, \]
where $\Delta(x) = \prod_{1\le i<j\le N}(x_j - x_i)$ and $\alpha_N = (\gamma - 1)N + a_N$. By a standard argument, see [Me, Ch. 5], [TW3] or Sect. 3, this equals the Fredholm determinant
\[ \sum_{k=0}^{N} \frac{(-1)^k}{k!} \int_{[s,\infty)^k} \det\big(\rho N^{1/3} K_N^{\alpha_N}(cN + d_N + \rho N^{1/3}\xi_i,\ cN + d_N + \rho N^{1/3}\xi_j)\big)_{i,j=1}^k\,d^k\xi, \tag{1.23} \]
where
\[ K_N^{\alpha}(x, y) = \frac{\kappa_{N-1}}{\kappa_N}\,\frac{\ell_N^{\alpha}(x)\ell_{N-1}^{\alpha}(y) - \ell_N^{\alpha}(y)\ell_{N-1}^{\alpha}(x)}{x - y}\,\big(x^{\alpha}e^{-x}y^{\alpha}e^{-y}\big)^{1/2} \]
is the Laguerre kernel. Here,
\[ \ell_n^{\alpha}(x) = \Big(\frac{n!}{(\alpha + n)!}\Big)^{1/2} (-1)^n L_n^{\alpha}(x) = \kappa_n x^n + \dots \]
are the normalized associated Laguerre polynomials,
\[ \int_0^{\infty} \ell_n^{\alpha}(x)\ell_m^{\alpha}(x)x^{\alpha}e^{-x}\,dx = \delta_{nm}. \]
From asymptotic formulas for these polynomials it follows that
\[ \lim_{N\to\infty} K_N^{\alpha_N}(cN + d_N + \rho N^{1/3}\xi,\ cN + d_N + \rho N^{1/3}\eta) = A(\xi, \eta). \tag{1.24} \]
This can be proved in the same way as the corresponding results for Meixner polynomials, see Sects. 3 and 4, by using the integral representation
\[ L_n^{\alpha}(x) = \frac{e^x}{2\pi i} \int_C \frac{e^{-xz}z^{n+\alpha}}{(z - 1)^{n+1}}\,dz, \]
where $C$ is a circle surrounding $z = 1$. Using (1.23), (1.24) and some estimates (compare Lemma 3.1) we obtain
\[ \lim_{N\to\infty} \mathbb{P}[H(\gamma N + a_N, N) \le cN + d_N + \rho N^{1/3}s] = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!} \int_{[s,\infty)^k} \det(A(\xi_i, \xi_j))_{i,j=1}^k\,d^k\xi = F(s). \]

We will not present all the details since they are similar to the proof of Theorem 1.2.

Using this result we can get a fluctuation theorem for Rost's totally asymmetric simple exclusion process defined above. The random variable $H(M, N)$ is the first time at which the particle starting at $-(N - 1)$ has moved exactly $M$ steps to the right. If we define $Y(k, t) = \sum_{j>k} \eta_j(t)$ to be the number of particles to the right of $k$ at time $t$, then $Y(k, t) > m$ means that the particle that starts at $-m$ has moved $\ge m + k + 1$ steps at time $t$. Hence
\[ \mathbb{P}[Y(k, t) \le m] = 1 - \mathbb{P}[H(m + k + 1, m + 1) \le t]. \]
Using this relation and (1.19) to (1.21) we obtain the following result first proved by Rost, [Ro],
\[ \frac{1}{t}\,Y([ut], t) \to \frac{1}{4}(1 - u)^2 \]
almost surely as $t \to \infty$, $|u| \le 1$. Now, using (1.22) it is fairly straightforward to show the following result.

Corollary 1.7. For each $u \in [0, 1)$,
\[ \lim_{t\to\infty} \mathbb{P}\Big[Y([ut], t) \le \frac{t}{4}(1 - u)^2 + \frac{(1 - u)^{2/3}}{(1 + u)^{1/3}}\,\xi t^{1/3}\Big] = 1 - F(-\xi). \]

Remark 1.8. We can interpret Theorems 1.1 and 1.2 (and analogously Theorem 1.6) as a result for a kind of zero-temperature directed polymer or equivalently a directed first-passage site percolation model in the following way. Let $S_k$ be the simple random walk in $\mathbb{Z}$ starting at 0 at time 0 and ending at 0 at time $2N + 2$. Denote the set of all possible paths by $P_N$. Let $v(i, j)$, $(i, j) \in \mathbb{Z}^2$, be independent, identically distributed random variables, and let $\beta > 0$. On $P_N$ we put the random path probability measure
\[ Q_N^{\beta}[S] = \frac{1}{C_N^{\beta}} \exp\Big(-\beta \sum_{k=1}^{2N} v(k, S_k)\Big), \]
$S \in P_N$, where $C_N^{\beta}$ is the normalization constant. This measure describes a directed polymer ($S$) fixed at both endpoints at inverse temperature $\beta$ in the random environment given by the $v(i, j)$'s, see [Pi]. The free energy is $-\beta^{-1} \log C_N^{\beta}$, and in the zero temperature limit $\beta \to \infty$ this becomes
\[ F_N^{GS} = \min_{S\in P_N} \sum_{k=1}^{2N} v(k, S_k), \tag{1.25} \]
the ground state energy. By rotating the coordinate system by the angle $-\pi/4$ it is seen that (1.25) can, equivalently, be thought of as a first-passage time in a directed first-passage site percolation model. Let $u(i, j)$, $(i, j) \in \mathbb{Z}_+^2$, be independent, identically distributed random variables (with the same distribution as the $v(i, j)$'s). Then $F_N^{GS}$ has the same distribution as $F(N, N)$, where
\[ F(M, N) = \min_{\pi\in\Pi_{M,N}} \sum_{(i,j)\in\pi} u(i, j). \]
(The $u(i, j)$'s are usually thought of as passage times and $F(M, N)$ is the minimal flow time from $(1, 1)$ to $(M, N)$. Hence it is natural to assume that $u(i, j) \ge 0$, but this will not be the case below.) We can define a random shape
\[ B(t) = \{(M, N) \in \mathbb{Z}_+^2 \,;\, F(M, N) \le t\} + [-1, 0]^2. \]
Set $u(i, j) = \alpha - w(i, j)$, where $\alpha > \alpha_{\min} = (1 - q)^{-1}(q + \sqrt{q})$ (this condition on $\alpha$ ensures that $B(t)$ will grow); $w(i, j)$ are the geometrically distributed random variables considered above. Then clearly,
\[ F(M, N) = \alpha(M + N - 1) - G(M, N). \tag{1.26} \]

Let $\gamma \ge 1$, set $\hat x(\gamma) = (1 + \gamma^2)^{-1/2}(\gamma, 1)$, a unit vector, and $[n\hat x(\gamma)] = ([N\gamma], N)$ ($[\cdot]$ the integer part), where $N = [(1 + \gamma^2)^{-1/2}n]$, so that $[n\hat x(\gamma)]$ is a lattice site near $n\hat x(\gamma)$. Let $T_n(\gamma)$ be the first time $s \ge 0$ for which $B(s)$ reaches $[n\hat x(\gamma)]$,
\[ T_n(\gamma) = \inf\{s \ge 0 \,;\, [n\hat x(\gamma)] \in B(s)\}. \]
Clearly, by the definition of $B(s)$ and Eq. (1.26), $T_n(\gamma) = \alpha([\gamma N] + N - 1) - G([\gamma N], N)$, where $N = [(1 + \gamma^2)^{-1/2}n]$. Theorem 1.1 implies that for each $q \in (0, 1)$ and $\gamma \ge 1$,
\[ \lim_{n\to\infty} \frac{1}{n}\,\mathbb{E}[T_n(\gamma)] = \frac{1}{\sqrt{1 + \gamma^2}} \Big[\alpha(\gamma + 1) - \frac{(1 + \sqrt{q\gamma})^2}{1 - q} + 1\Big] \doteq \mu(\gamma). \]


Also, $T_n(\gamma)$ has large deviation properties similar to those for $G([\gamma N], N)$. Using this result we can compute the asymptotic shape of $B(t)$. It follows from Theorem 1.2 that
\[ \mathbb{P}\Big[\frac{T_n(\gamma) - n\mu(\gamma)}{(1 + \gamma^2)^{-1/6}\rho(q, \gamma)n^{1/3}} \le s\Big] \to 1 - F(-s), \]

as $n \to \infty$.

Conjecture 1.9. Is the result for $G([\gamma N], N)$ limited to geometric and exponential random variables? Normally, we expect limit laws for appropriately scaled random variables to be independent of the details. It is therefore natural to conjecture that if the $w(i, j)$'s are i.i.d. random variables with some suitable assumptions on their distribution, then there are constants $a$ and $b$ so that $(G([\gamma N], N) - aN)/bN^{1/3}$ converges to a random variable with distribution $F(s)$. By Remark 1.8 this leads to a related conjecture for directed first-passage site percolation.

2. The Coulomb Gas

2.1. Combinatorics. The key combinatorial ingredient is the Knuth correspondence introduced in [Kn]. It generalizes the Schensted correspondence [Sc] which is used in [BDJ]. Write $[N] = \{1, \dots, N\}$. Let $\mathcal{M}_{M,N}$ denote the set of all $M \times N$ matrices $A = (a_{ij})$ with non-negative integer elements, and let $\mathcal{M}^k_{M,N}$ be the subset of those matrices that satisfy $\sum_{i=1}^{M}\sum_{j=1}^{N} a_{ij} = k$. A two-rowed array
\[ \sigma = \begin{pmatrix} i_1 & \dots & i_k \\ j_1 & \dots & j_k \end{pmatrix} \]
is called a generalized permutation if the columns $\binom{i_r}{j_r}$ are lexicographically ordered, i.e. either $i_r < i_{r+1}$ or $i_r = i_{r+1}$, $j_r \le j_{r+1}$. There is a one-to-one correspondence between the set $S^k_{M,N}$ of all generalized permutations of length $k$, where the elements in the upper row come from $[M]$ and the elements in the lower row from $[N]$, and $\mathcal{M}^k_{M,N}$, defined by $\sigma \to f(\sigma) = A = (a_{ij})$, where
\[ a_{ij} = \#\,\text{times } \tbinom{i}{j} \text{ occurs in } \sigma. \]
We say that $\binom{i_{r_1}}{j_{r_1}}, \dots, \binom{i_{r_m}}{j_{r_m}}$, $r_1 < r_2 < \dots < r_m$, is an increasing subsequence in $\sigma$ if $j_{r_1} \le j_{r_2} \le \dots \le j_{r_m}$. Let $\ell(\sigma)$ denote the length of a longest increasing subsequence in $\sigma$.

Example. The generalized permutation
\[ \begin{pmatrix} 1 & 1 & 1 & 2 & 2 & 2 & 3 & 3 & 4 & 4 \\ 1 & 2 & 2 & 2 & 2 & 2 & 1 & 2 & 1 & 3 \end{pmatrix} \]
corresponds to
\[ \begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}. \]
A longest increasing subsequence is
\[ \begin{pmatrix} 1 & 1 & 1 & 2 & 2 & 2 & 3 & 4 \\ 1 & 2 & 2 & 2 & 2 & 2 & 2 & 3 \end{pmatrix} \]
so $\ell(\sigma) = 8$.
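The example can be reproduced mechanically. The sketch below is illustrative only; the function names are chosen here. It builds the matrix $f(\sigma)$ from the two rows, computes $\ell(\sigma)$ with a patience-sorting routine for weakly increasing subsequences of the lower row (the upper row is already non-decreasing by the lexicographic ordering), and compares the result with the maximal up/right path sum of the matrix in the sense of (1.1).

```python
from bisect import bisect_right


def to_matrix(upper, lower, M, N):
    """The matrix A = f(sigma): a_ij counts the columns (i, j) of sigma."""
    A = [[0] * N for _ in range(M)]
    for i, j in zip(upper, lower):
        A[i - 1][j - 1] += 1
    return A


def longest_weakly_increasing(seq):
    """Length of a longest weakly increasing subsequence (patience sorting)."""
    tails = []  # tails[r] = smallest possible last element for length r + 1
    for v in seq:
        pos = bisect_right(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)


def max_path_sum(A):
    """Maximum of the entry sums over up/right paths from (1, 1) to (M, N)."""
    M, N = len(A), len(A[0])
    G = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            preds = []
            if i > 0:
                preds.append(G[i - 1][j])
            if j > 0:
                preds.append(G[i][j - 1])
            G[i][j] = A[i][j] + (max(preds) if preds else 0)
    return G[M - 1][N - 1]


upper = [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
lower = [1, 2, 2, 2, 2, 2, 1, 2, 1, 3]
A = to_matrix(upper, lower, 4, 3)
print(A, longest_weakly_increasing(lower), max_path_sum(A))
```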


Recall from Sect. 1 that $\Pi_{M,N}$ denotes the set of all up/right paths $\pi$ from $(1, 1)$ to $(M, N)$ through the sites $(i, j)$ with $1 \le i \le M$, $1 \le j \le N$.

Lemma 2.1. For each $A \in \mathcal{M}^k_{M,N}$,
\[ \max\Big\{\sum_{(i,j)\in\pi} a_{ij} \,;\, \pi \in \Pi_{M,N}\Big\} = \ell(f^{-1}(A)). \tag{2.1} \]

Proof. This is clear from the definitions. That we go to the right corresponds to the fact that $i_{r_1} \le \dots \le i_{r_m}$ and that we go up corresponds to $j_{r_1} \le \dots \le j_{r_m}$ (the upper row gives row indices whereas the lower row gives column indices in the matrix).

Now, Knuth has defined a one-to-one mapping from the set $S^k_{M,N}$ to pairs $(P, Q)$ of semi-standard Young tableaux of the same shape $\lambda$, which is a partition of $k$, $\lambda \vdash k$, where $P$ has elements in $[N]$ and $Q$ has elements in $[M]$. (More information on Young tableaux can be found in [Sa] and [Fu].) This correspondence has the property that if $\sigma \to (P, Q)$ and $P$, $Q$ have shape $\lambda$, then $\ell(\sigma)$ = the length of the first row, $\lambda_1$, in $\lambda$. Consider $G(M, N)$ defined by (1.1). The $M \times N$ matrix $W = (w(i, j))$ is a random element in $\mathcal{M}_{M,N}$. Let
\[ S(M, N) = \sum_{i=1}^{M}\sum_{j=1}^{N} w(i, j) \]
and
\[ p_{M,N}(t) = \mathbb{P}[G(M, N) \le t]. \]
Then,
\[ p_{M,N}(t) = \sum_{k=0}^{\infty} \mathbb{P}[G(M, N) \le t \mid S(M, N) = k]\,\mathbb{P}[S(M, N) = k]. \tag{2.2} \]
For a fixed $A \in \mathcal{M}^k_{M,N}$ we have
\[ \mathbb{P}[\{A\}] = \prod_{i,j} (1 - q)q^{a_{ij}} = (1 - q)^{MN}q^k, \]
since $\sum_{i,j} a_{ij} = k$. We have proved

Lemma 2.2. The conditional probability $\mathbb{P}[\,\cdot \mid S(M, N) = k]$ is the uniform distribution on $\mathcal{M}^k_{M,N}$.

This lemma is the reason that we choose the $w(i, j)$'s to be independent and geometrically distributed. Note that
\[ \mathbb{P}[S(M, N) = k] = \#\mathcal{M}^k_{M,N}\,(1 - q)^{MN}q^k. \tag{2.3} \]
Let $L(\lambda, M, N)$ denote the number of pairs $(P, Q)$ of semi-standard Young tableaux of shape $\lambda$, such that $P$ has elements in $[N]$ and $Q$ has elements in $[M]$. Combining Lemma 2.1, Lemma 2.2 and the Knuth correspondence we see that
\[ \mathbb{P}[G(M, N) \le t \mid S(M, N) = k] = \frac{1}{\#\mathcal{M}^k_{M,N}} \sum_{\lambda\vdash k,\ \lambda_1\le t} L(\lambda, M, N). \tag{2.4} \]
To compute $L(\lambda, M, N)$ we use


Lemma 2.3. The number of semi-standard tableaux of shape $\lambda$ and elements in $[N]$ equals
\[ \prod_{1\le i<j\le N} \frac{\lambda_i - \lambda_j + j - i}{j - i}. \]

Proof. We have two formulas for the Schur polynomial in $N$ variables associated with the partition $\lambda$, [Sa, Fu],
\[ s_\lambda(x) = \sum_T x^T = \frac{\det(x_j^{\lambda_i+N-i})_{1\le i,j\le N}}{\det(x_j^{N-i})_{1\le i,j\le N}}, \]
where the sum is over all semi-standard $\lambda$-tableaux $T$ with elements in $[N]$ and $x^T = x_1^{m_1}\cdots x_N^{m_N}$ with $m_j$ equal to the number of times $j$ occurs in $T$. Hence, evaluating the Vandermonde determinants,
\[ s_\lambda(1, x, \dots, x^{N-1}) = x^r \prod_{1\le i<j\le N} \frac{x^{\lambda_i-\lambda_j+j-i} - 1}{x^{j-i} - 1}, \]
where $r = \sum_{i=1}^{N}(i - 1)\lambda_i$. The number of semi-standard tableaux with elements in $[N]$ equals
\[ s_\lambda(1, 1, \dots, 1) = \lim_{x\to1} s_\lambda(1, x, \dots, x^{N-1}) = \prod_{1\le i<j\le N} \frac{\lambda_i - \lambda_j + j - i}{j - i}. \]

This completes the proof of the lemma.

It follows from Lemma 2.3 that
\[ L(\lambda, M, N) = \prod_{1\le i<j\le M} \frac{\lambda_i - \lambda_j + j - i}{j - i} \prod_{1\le i<j\le N} \frac{\lambda_i - \lambda_j + j - i}{j - i}. \tag{2.5} \]
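The product formula of Lemma 2.3, and with it (2.5), can be checked against a direct enumeration for small shapes. The sketch below is illustrative only; the function names are chosen here and the brute-force routine is practical only for a handful of cells. Exact rational arithmetic avoids rounding in the product.

```python
from fractions import Fraction
from itertools import product


def ssyt_count_formula(lam, N):
    """Number of semi-standard tableaux of shape lam with entries in [N]."""
    parts = [p for p in lam if p > 0]
    if len(parts) > N:
        return 0  # columns must increase strictly, so at most N rows fit
    lam = parts + [0] * (N - len(parts))
    total = Fraction(1)
    for i in range(N):
        for j in range(i + 1, N):
            total *= Fraction(lam[i] - lam[j] + j - i, j - i)
    return int(total)


def ssyt_count_brute(lam, N):
    """The same count by testing every filling of the diagram."""
    cells = [(r, c) for r, row_len in enumerate(lam) for c in range(row_len)]
    count = 0
    for filling in product(range(1, N + 1), repeat=len(cells)):
        T = dict(zip(cells, filling))
        rows_ok = all(c == 0 or T[(r, c - 1)] <= T[(r, c)] for r, c in cells)
        cols_ok = all(r == 0 or T[(r - 1, c)] < T[(r, c)] for r, c in cells)
        if rows_ok and cols_ok:
            count += 1
    return count


def pair_count(lam, M, N):
    """L(lam, M, N) as the product of the two counts, as in (2.5)."""
    return ssyt_count_formula(lam, M) * ssyt_count_formula(lam, N)


print(ssyt_count_formula((2, 1), 3), ssyt_count_brute((2, 1), 3))
print(pair_count((2, 1), 3, 2))
```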

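We assume from now on that $M \ge N$; the other case is analogous by symmetry. Since the numbers in the columns in $P$ and $Q$ are strictly increasing we must have $\lambda_i = 0$ if $N < i \le M$. Hence
\[ L(\lambda, M, N) = \prod_{1\le i<j\le N} \Big(\frac{\lambda_i - \lambda_j + j - i}{j - i}\Big)^2 \prod_{i=1}^{N}\prod_{j=N+1}^{M} \frac{\lambda_i + j - i}{j - i}. \]
Let $h_j = \lambda_j + N - j$, $j = 1, \dots, N$, so that $h_1 = \lambda_1 + N - 1$, $h_N = \lambda_N \ge 0$ and $h_1 > h_2 > \dots > h_N$. Then
\[ \begin{aligned} L(\lambda, M, N) &= \prod_{1\le i<j\le N} \frac{(h_i - h_j)^2}{(j - i)^2} \prod_{i=1}^{N}\prod_{j=N+1}^{M} \frac{h_i + j - N}{j - i} \\ &= \prod_{j=0}^{N-1} \frac{1}{j!\,(M - N + j)!} \prod_{1\le i<j\le N} (h_i - h_j)^2 \prod_{i=1}^{N} \frac{(h_i + M - N)!}{h_i!}. \end{aligned} \tag{2.6} \]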


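The condition $\sum_{j=1}^{N} \lambda_j = k$ translates into $\sum_{j=1}^{N} h_j = k + N(N - 1)/2$ and $\lambda_1 \le t$ to $h_1 \le t + N - 1$. By (2.2), (2.3) and (2.4) we have
\[ p_{M,N}(t) = \sum_{k=0}^{\infty} (1 - q)^{MN}q^k \sum_{\lambda\vdash k,\ \lambda_1\le t} L(\lambda, M, N), \]
and inserting (2.6) yields
\[ p_{M,N}(t) = \frac{(1 - q)^{MN}}{N!}\,q^{-N(N-1)/2} \prod_{j=0}^{N-1} \frac{1}{j!\,(M - N + j)!} \times \sum_{k=0}^{\infty} \sum_{\substack{h\in\mathbb{N}^N \\ \sum h_i = k+N(N-1)/2 \\ \max\{h_i\}\le t+N-1}} \prod_{1\le i<j\le N} (h_i - h_j)^2 \prod_{i=1}^{N} \frac{(h_i + M - N)!}{h_i!}\,q^{\sum_{i=1}^{N} h_i}, \]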

where we have used the symmetry under permutation of the $h_i$'s. Summing over $k$ gives all the possible values of $\sum h_i$, so we obtain
\[ p_{M,N}(t) = \frac{1}{Z_{M,N}} \sum_{\substack{h\in\mathbb{N}^N \\ \max\{h_i\}\le t+N-1}} \prod_{1\le i<j\le N} (h_i - h_j)^2 \prod_{i=1}^{N} w^q_{M-N+1}(h_i), \tag{2.7} \]
where $w^q_K(x)$ is given by (1.16) and
\[ Z_{M,N} = q^{N(N-1)/2}(1 - q)^{-MN} \prod_{j=0}^{N-1} j!\,(M - N + j)!. \tag{2.8} \]
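For small $M$, $N$ and $t$ the derivation can be checked exactly with rational arithmetic. The sketch below is an illustration under stated assumptions: the function names are chosen here, and the right-hand side is taken in the factorial form obtained above by inserting (2.6), with its explicit prefactor including $1/N!$, so the check does not depend on the closed form of the normalization constant. The left-hand side is computed by summing the geometric weights over all tables with $G(M, N) \le t$; every entry of such a table is at most $t$.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def last_passage(w):
    """G(M, N) for a table w[i][j], by dynamic programming."""
    M, N = len(w), len(w[0])
    G = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            preds = []
            if i > 0:
                preds.append(G[i - 1][j])
            if j > 0:
                preds.append(G[i][j - 1])
            G[i][j] = w[i][j] + (max(preds) if preds else 0)
    return G[M - 1][N - 1]


def prob_brute(M, N, t, q):
    """P[G(M, N) <= t] by exhaustive summation over weight tables."""
    total = Fraction(0)
    for flat in product(range(t + 1), repeat=M * N):
        w = [flat[i * N:(i + 1) * N] for i in range(M)]
        if last_passage(w) <= t:
            total += q ** sum(flat)
    return (1 - q) ** (M * N) * total


def prob_formula(M, N, t, q):
    """Sum over h in {0, ..., t+N-1}^N with factorial weights (needs M >= N)."""
    pref = (1 - q) ** (M * N) / (q ** (N * (N - 1) // 2) * factorial(N))
    for j in range(N):
        pref /= factorial(j) * factorial(M - N + j)
    total = Fraction(0)
    for h in product(range(t + N), repeat=N):
        term = Fraction(1)
        for i in range(N):
            for j in range(i + 1, N):
                term *= (h[i] - h[j]) ** 2
            term *= Fraction(factorial(h[i] + M - N), factorial(h[i])) * q ** h[i]
        total += term
    return pref * total


q = Fraction(1, 3)
for M, N, t in [(1, 1, 2), (3, 1, 2), (2, 2, 1), (3, 2, 1)]:
    print((M, N, t), prob_brute(M, N, t, q), prob_formula(M, N, t, q))
```

Both columns agree exactly for these small cases, for example $(1 - q)^4(1 + 4q + q^2)$ when $M = N = 2$ and $t = 1$.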

This proves Proposition 1.3.

2.2. The large deviation estimate. In order to investigate the location of the rightmost charge in (2.7) and prove large deviation formulas we rescale the discrete Coulomb gas (2.7). Let $M = [\gamma N]$, $\gamma \ge 1$ fixed, and $K = K(N) = [\gamma N] - N + 1$. Set $A_N = \frac{1}{N}\mathbb{N}$, $A_N(s) = \{x \in A_N \,;\, x \le s\}$ and
\[ V_N^{\gamma,q}(t) = -\frac{1}{N} \log w^q_{K(N)}(Nt), \qquad t \ge 0. \]
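The potential just defined can be evaluated for large $N$ with log-gamma functions and compared with the closed form that the text derives from Stirling's formula in (2.9). The sketch below is illustrative only; the function names are chosen here, $K$ is taken to be $M - N + 1$ with $M = [\gamma N]$ as in (1.16), and the closed form is used only for $t > 0$ and $\gamma > 1$, where all logarithms are finite.

```python
import math


def V_N(t, gamma, q, N):
    """-(1/N) log w_K^q(N t), with w_K^q(x) = binom(x + K - 1, x) q^x."""
    K = math.floor(gamma * N) - N + 1
    x = round(N * t)
    # log binom(x + K - 1, x) = lgamma(x + K) - lgamma(x + 1) - lgamma(K)
    log_binom = math.lgamma(x + K) - math.lgamma(x + 1) - math.lgamma(K)
    return -(log_binom + x * math.log(q)) / N


def V_limit(t, gamma, q):
    """Closed form of the limit in (2.9), for t > 0 and gamma > 1."""
    s = t + gamma - 1
    return (t * math.log(1 / q) - s * math.log(s)
            + t * math.log(t) + (gamma - 1) * math.log(gamma - 1))


for N in (10 ** 2, 10 ** 4, 10 ** 6):
    # The difference should shrink roughly like log(N) / N.
    print(N, V_N(1.5, 2.0, 0.5, N) - V_limit(1.5, 2.0, 0.5))
```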

Using Stirling's formula we see that
\[ \lim_{N\to\infty} V_N^{\gamma,q}(t) = t \log\frac{1}{q} - (t + \gamma - 1)\log(t + \gamma - 1) + t \log t + (\gamma - 1)\log(\gamma - 1) \doteq V^{\gamma,q}(t) \tag{2.9} \]
uniformly on compact subsets of $[0, \infty)$. (We will often omit the superscripts $\gamma$ and $q$.) Rescaling the variables in (2.7) by setting $h_i = Nx_i$, $x_i \in A_N$, we see that (2.7) can be written
\[ p_N(t) \doteq p_{M(N),N}(t) = \frac{Z_N(\frac{t}{N} + 1 - \frac{1}{N})}{Z_N}, \tag{2.10} \]


where
\[ Z_N(s) = \sum_{x\in A_N(s)^N} \Delta_N(x)^2 \exp\Big(-N \sum_{j=1}^{N} V_N(x_j)\Big) \tag{2.11} \]
and $Z_N = Z_N(\infty)$. Here $\Delta_N(x) = \prod_{1\le i<j\le N}(x_j - x_i)$ is the Vandermonde determinant. When investigating the large deviation properties of $p_N(t)$ we may just as well consider more general confining potentials $V_N$. Assume that $V_N : [0, \infty) \to \mathbb{R}$, $N \ge 1$, satisfy

(i) $V_N$ is continuous, $N \ge 1$.
(ii) There are constants $\xi > 0$, $T \ge 0$ and $N_0 \ge 1$ such that
\[ V_N(t) \ge (1 + \xi)\log(t^2 + 1) \tag{2.12} \]
for $t \ge T$ and $N \ge N_0$.
(iii) $V_N(t) \to V(t)$ uniformly on compact subsets of $[0, \infty)$.

Set for $x \in A_N^M$ and $\beta > 0$,
\[ Q_{M,N}(x) = |\Delta_M(x)|^{\beta} \prod_{j=1}^{M} \exp\Big(-\frac{\beta N}{2} V_N(x_j)\Big). \]
(This $M$ is not the same as the previous $M$.) Define the partition functions
\[ Z_{M,N}(t) = \sum_{x\in A_N(t)^M} Q_{M,N}(x), \]
$Z_{M,N} = Z_{M,N}(\infty)$ and the probability measure
\[ \mathbb{P}_{M,N}[B] = \frac{1}{Z_{M,N}} \sum_{x\in B} Q_{M,N}(x), \]
$B \subseteq A_N^M$. We are interested in the distribution of the position of the rightmost charge, $\max_{1\le k\le M} x_k$. Its distribution function is given by
\[ F_{M,N}(t) = \mathbb{P}_{M,N}[\max x_k \le t] = \frac{Z_{M,N}(t)}{Z_{M,N}}. \tag{2.13} \]
(If $M = N$ we write $F_N(t)$.) In order to formulate the large deviation results for $F_N(t)$ we need some results from weighted potential theory, [ST]. The results we need differ from the usual ones since we are interested in the continuum limit of a discrete Coulomb gas, so that the particle density of the rescaled gas is always $\le 1$. Hence, the equilibrium measures will be absolutely continuous with a density $\phi$ satisfying $0 \le \phi \le 1$. Let $\mathcal{A}_s$ denote the set of all $\phi \in L^1[0, s)$ such that $0 \le \phi \le 1$ and $\int_0^s \phi = 1$, $1 \le s \le \infty$. Given $V : [0, \infty) \to \mathbb{R}$, continuous and such that there is a $\delta > 0$ and a $T \ge 0$ such that
\[ V(t) \ge (1 + \delta)\log(t^2 + 1) \tag{2.14} \]


for $t \ge T$, we set
\[ k_V(x, y) = \log|x - y|^{-1} + \frac{1}{2}V(x) + \frac{1}{2}V(y) \]
and
\[ I_V[\phi] = \int_0^s\!\!\int_0^s k_V(x, y)\phi(x)\phi(y)\,dx\,dy, \]
for $\phi \in \mathcal{A}_s$. The proof of the next proposition is similar to the corresponding result in weighted potential theory. See [DS] and also [LL] where a very similar problem is treated.

Proposition 2.1. For each $s \in [1, \infty]$ there is a unique $\phi_V^s \in \mathcal{A}_s$ such that
\[ \inf_{\phi\in\mathcal{A}_s} I_V[\phi] = I_V[\phi_V^s] \doteq F_V^s. \]
The extremal function $\phi_V^s$ has compact support. (If $s = \infty$ we will drop the superscript.)

Let $b_V = \sup(\operatorname{supp}\phi_V)$ be the right endpoint of the support of $\phi_V$. Set $J(t) = 0$ for $t \le b_V$ and
\[ J(t) = \inf_{\tau\ge t} \int_0^{\infty} k_V(\tau, x)\phi_V(x)\,dx - F_V \tag{2.15} \]
for $t \ge b_V$. Also, set
\[ L(t) = \frac{1}{2}(F_V^t - F_V) \]
for $t \ge 1$. The next theorem gives the large deviations for the distribution function $F_N(t)$ defined by (2.13).

Theorem 2.2. Assume that VN (t) satisfies the assumptions (i)–(iii) above. Then 1 log FN (t) = −βL(t) (2.16) N→∞ N 2 for any t ≥ 1 and L(t) > 0 if t < bV . Assume furthermore that J (t) > 0 for t > bV . Then 1 log(1 − FN (t)) = −βJ (t) (2.17) lim N→∞ N for all t. lim

We postpone the proof to Sect. 4.

Remark 2.3. The same result is true for a continuous Coulomb gas on $\mathbb{R}$ with density
\[ \frac{1}{Z_N^{\beta}}\,|\Delta_N(x)|^{\beta} \exp\Bigl(-\frac{\beta N}{2}\sum_{j=1}^{N} V(x_j)\Bigr), \tag{2.18} \]
on $\mathbb{R}^N$, which occurs in random matrix theory. The choice $\beta = 2$ and $V(t) = 2t^2$ corresponds to the Gaussian Unitary Ensemble (GUE), compare (1.14). We assume that $V$ is continuous and satisfies (2.14). In this case $\mathcal{A}_s$ is replaced by $\mathcal{M}_1(s)$, the set of all probability measures on $(-\infty, s)$, and $\phi_V(x)dx$ is replaced by the equilibrium measure $d\mu_V(t)$, see [Jo]. The proof is essentially the same. The formula (2.16) for certain $V$ is a consequence of the result in [BG], see also [HP]. Also, (2.17) has been proved in the case $V(t) = t^2/2$ in [BDG]. If we take (2.18) on $[0,\infty)^N$ with $\beta = 2$ and $V(t) = -(M/N - 1)\log t + t$ we get the measure in (1.18), and in this way we can prove (1.19) to (1.21).

We can now apply Theorem 2.2 to the model we are interested in. It is straightforward to verify that $V_N^{\gamma,q}$ satisfies the conditions (i)–(iii) with limiting external potential $V^{\gamma,q}(t)$. Write $b_{V^{\gamma,q}} = b(\gamma,q)$. The computation of $\phi_{V^{\gamma,q}}$ will be outlined in Sect. 6. We have
\[ b(\gamma,q) = \frac{(1+\sqrt{q\gamma})^2}{1-q}. \]
If $\gamma \ge 1/q$, then
\[ \phi_{V^{\gamma,q}}(t) = v\Bigl(\frac{2}{c}(t-a) - 1\Bigr), \qquad a \le t \le b, \]
where $a = (1-\sqrt{q\gamma})^2/(1-q)$, $c = b(\gamma,q) - a$ and
\[ v(x) = \frac{1}{2\pi}\Bigl[\arctan\Bigl(\frac{Dx+1}{\sqrt{1-x^2}\sqrt{D^2-1}}\Bigr) - \arctan\Bigl(\frac{Bx+1}{\sqrt{1-x^2}\sqrt{B^2-1}}\Bigr)\Bigr], \tag{2.19} \]
$B = (\gamma+q)/2\sqrt{q\gamma}$, $D = (1+q\gamma)/2\sqrt{q\gamma}$. If $\gamma < 1/q$, then
\[ \phi_{V^{\gamma,q}}(t) = \begin{cases} 1, & \text{if } 0 \le t \le a \\ v\bigl(\tfrac{2}{c}(t-a) - 1\bigr), & \text{if } a \le t \le b, \end{cases} \]
where
\[ v(x) = \frac{1}{2\pi}\Bigl[\pi - \arctan\Bigl(\frac{Dx+1}{\sqrt{1-x^2}\sqrt{D^2-1}}\Bigr) - \arctan\Bigl(\frac{Bx+1}{\sqrt{1-x^2}\sqrt{B^2-1}}\Bigr)\Bigr] \tag{2.20} \]
with $a$, $c$, $B$, $D$ as before. We will not discuss the explicit form of the lower tail rate function. The upper tail rate function is given by
\[ J(t) = \frac{c}{8\sqrt{q\gamma}} \int_1^x (x-y)\Bigl[\frac{\gamma-q}{y+B} + \frac{1-q\gamma}{y+D}\Bigr]\frac{dy}{\sqrt{y^2-1}}, \tag{2.21} \]
with $c$, $B$, $D$ as above and $x = 2(t-a)/c - 1$. Using this formula we can show that (see Sect. 6) there are constants $c_1 > 0$ and $c_2 > 0$ so that
\[ J(b+\delta) \ge \begin{cases} c_1\delta^{3/2} & \text{if } 0 \le \delta \le 1 \\ c_2\delta & \text{if } \delta \ge 1 \end{cases} \tag{2.22} \]
and
\[ J(b+\delta) = \frac{2(1-q)^{3/2}\gamma^{1/4}}{3q^{1/4}(\sqrt{q}+\sqrt{\gamma})(1+\sqrt{q\gamma})}\,\delta^{3/2} + O(\delta^{5/2}). \tag{2.23} \]
In particular $J(t) > 0$ if $t > b(\gamma,q)$. From (2.10), (2.13) and Theorem 2.2 we obtain
\[ \lim_{N\to\infty} \frac{1}{N^2}\log p_N(Nt) = -2L(t+1) \tag{2.24} \]
and
\[ \lim_{N\to\infty} \frac{1}{N}\log(1 - p_N(Nt)) = -2J(t+1) \tag{2.25} \]
for each $t \ge 0$. These formulas imply Theorem 1.1 with $\ell(\epsilon) = 2L(b_V - \epsilon)$ and $i(\epsilon) = 2J(b_V + \epsilon)$. By Theorem 2.2 and (2.22) we have $i(\epsilon) > 0$ and $\ell(\epsilon) > 0$ if $\epsilon > 0$. By a superadditivity argument, the limit (2.25) actually gives a large deviation estimate for all $N$, compare [Se1].
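As a numerical sanity check on the relation between (2.21) and (2.23), the following minimal Python sketch evaluates the integral in (2.21) after the substitution $y = \cosh\theta$, which turns $dy/\sqrt{y^2-1}$ into $d\theta$ and removes the endpoint singularity, and compares the result with the leading term in (2.23). The function names, the Simpson discretization and the sample values $\gamma = 1$, $q = 1/2$ are illustrative assumptions and not part of the paper; the integrand is (2.21) as read here.

```python
import math


def edge_parameters(gamma, q):
    """Return a, b, c, B, D as defined around (2.19)."""
    s = math.sqrt(q * gamma)
    a = (1 - s) ** 2 / (1 - q)
    b = (1 + s) ** 2 / (1 - q)
    B = (gamma + q) / (2 * s)
    D = (1 + q * gamma) / (2 * s)
    return a, b, b - a, B, D


def upper_tail_rate(t, gamma, q, n=2000):
    """J(t) from (2.21), by Simpson's rule in theta where y = cosh(theta)."""
    a, b, c, B, D = edge_parameters(gamma, q)
    x = 2 * (t - a) / c - 1
    if x <= 1:
        return 0.0  # t <= b(gamma, q): the rate function vanishes
    top = math.acosh(x)

    def integrand(theta):
        y = math.cosh(theta)
        return (x - y) * ((gamma - q) / (y + B) + (1 - q * gamma) / (y + D))

    h = top / n  # n is assumed even, as Simpson's rule requires
    total = integrand(0.0) + integrand(top)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return c / (8 * math.sqrt(q * gamma)) * total * h / 3


def leading_term(delta, gamma, q):
    """Leading delta^(3/2) term of J(b + delta) from (2.23)."""
    coeff = 2 * (1 - q) ** 1.5 * gamma ** 0.25 / (
        3 * q ** 0.25 * (math.sqrt(q) + math.sqrt(gamma)) * (1 + math.sqrt(q * gamma))
    )
    return coeff * delta ** 1.5


if __name__ == "__main__":
    gamma, q = 1.0, 0.5
    b = edge_parameters(gamma, q)[1]
    for delta in (1e-1, 1e-2, 1e-3):
        ratio = upper_tail_rate(b + delta, gamma, q) / leading_term(delta, gamma, q)
        print(delta, ratio)
```

The printed ratio should approach 1 as $\delta$ decreases, in line with the $O(\delta^{5/2})$ remainder in (2.23).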


Corollary 2.4. For all $t \ge 0$ and $N \ge 1$,
\[ 1 - p_N(Nt) \le \exp(-2NJ(t+1)). \tag{2.26} \]

Proof. For $1 \le M_1 \le M_2$ and $1 \le N_1 \le N_2$ we let $G[(M_1,N_1),(M_2,N_2)]$ denote the maximum of $\sum_{(i,j)\in\pi} w(i,j)$ over all up/right paths from $(M_1,N_1)$ to $(M_2,N_2)$. Note that if $1 \le M_1 < M_2$ and $1 \le N_1 < N_2$, then

(i) $G[(M_1+1,N_1+1),(M_2,N_2)]$ and $G[(1,1),(M_2-M_1,N_2-N_1)]$ are identically distributed.
(ii) $G[(1,1),(M_1,N_1)]$ and $G[(M_1+1,N_1+1),(M_2,N_2)]$ are independent.

Since $[2\gamma N] \ge 2[\gamma N]$, we have

(iii) $G[([\gamma N]+1,N+1),([2\gamma N],2N)] \ge G[([\gamma N]+1,N+1),(2[\gamma N],2N)]$.

Write $a_N(t) = 1 - p_N(Nt) = P[G((1,1),([\gamma N],N)) > Nt]$. Then, by (i) and (iii),
\[ a_N(t) \le P[G(([\gamma N]+1,N+1),([2\gamma N],2N)) > Nt] \]
and hence, by (ii), $a_N(t)^2 \le a_{2N}(t)$. Repeated use of this inequality yields $N^{-1}\log a_N(t) \le (2^kN)^{-1}\log a_{2^kN}(t)$, and by letting $k \to \infty$ and using (2.25) we find $N^{-1}\log a_N(t) \le -2J(t+1)$. $\square$

Remark 2.5. We cannot prove convergence of the moments of the rescaled random variable in Theorem 1.2 since we have no finite $N$ estimate of $P[G([\gamma N],N) - \omega N \le -sN^{1/3}]$ for $s > 0$ large. This would require an estimate of the finite $N$ Fredholm determinant. In the other direction we can use the estimate in Corollary 2.4. The same remark applies to Theorem 1.6.

Remark 2.6. In [BR] it is proved by Baik and Rains that if we consider permutations with certain restrictions we can get the Tracy–Widom distributions for GOE and GSE as limiting laws for longest increasing and decreasing subsequences. By considering a restricted geometry we can obtain the Tracy–Widom distribution for GOE, [TW2], also in the present setting. Let $w(i,j)$, $1 \le i \le j$, be independent geometrically distributed random variables, $P[w(i,j) = k] = (1-q)q^k$ for $1 \le i < j$ and $P[w(i,i) = k] = (1-\sqrt{q})q^{k/2}$ for $i \ge 1$. Set $w(i,j) = w(j,i)$ if $i > j \ge 1$, so that $A = (w(i,j))$ is a symmetric matrix. The Knuth correspondence maps $A$ to a pair of semistandard Young tableaux $(P,Q)$ with $Q = P$, i.e. $A$ maps to a single semistandard Young tableau, see [Kn] or [Fu]. Let $\Pi^{\mathrm{sym}}_{N,N}$ be the set of all up/right paths from $(1,1)$ to $(N,N)$ in $\{(i,j) \in \mathbb{Z}_+^2 \,;\, 1 \le i \le j\}$, i.e. in a triangle, and set
\[ F(N) = \max\Bigl\{\sum_{(i,j)\in\pi} w(i,j) \,;\, \pi \in \Pi^{\mathrm{sym}}_{N,N}\Bigr\}. \]
Now, we also have
\[ F(N) = \max\Bigl\{\sum_{(i,j)\in\pi} w(i,j) \,;\, \pi \in \Pi_{N,N}\Bigr\}, \]
which equals the length of the first row in $P$, because those parts of a maximal path in $\Pi_{N,N}$ which go below the diagonal can be reflected in the diagonal to give a path in $\Pi^{\mathrm{sym}}_{N,N}$ without changing the sum $\sum w(i,j)$, since $w(i,j)$ is symmetric.
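The reflection argument of Remark 2.6 can be illustrated with a short dynamic-programming sketch. It computes the last-passage value over up/right paths, once over the full square and once restricted to the triangle $i \le j$, for symmetric weights drawn as in the remark, and the two values should coincide. The helper names, the 0-based indexing and the inversion sampler are illustrative assumptions, not taken from the paper.

```python
import math
import random


def geometric(ratio, rng):
    """Sample k >= 0 with P[k] = (1 - ratio) * ratio**k by inversion."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return int(math.log(u) / math.log(ratio))


def symmetric_weights(n, q, rng):
    """Symmetric n x n matrix: parameter q off the diagonal, sqrt(q) on it."""
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = geometric(math.sqrt(q), rng)
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = geometric(q, rng)
    return w


def in_triangle(i, j):
    """Cells of the region {i <= j} used for the restricted path set."""
    return i <= j


def last_passage(w, allowed=lambda i, j: True):
    """Max of the sum of w over up/right paths (0,0) -> (n-1,n-1) in allowed cells."""
    n = len(w)
    neg = float("-inf")
    best = [[neg] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if not allowed(i, j):
                continue
            if i == 0 and j == 0:
                prev = 0
            else:
                prev = max(best[i - 1][j] if i > 0 else neg,
                           best[i][j - 1] if j > 0 else neg)
            if prev > neg:  # the cell is reachable
                best[i][j] = prev + w[i][j]
    return best[n - 1][n - 1]


if __name__ == "__main__":
    rng = random.Random(1)
    w = symmetric_weights(6, 0.6, rng)
    print(last_passage(w), last_passage(w, in_triangle))
```

Because the weights are integers, the comparison between the two maxima is exact and no floating-point tolerance is needed.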

The same argument as above now gives
\[ P[F(N) \le t] = \frac{1}{Z_N^{(1)}} \sum_{\substack{h\in\mathbb{N}^N \\ \max\{h_j\}\le t+N-1}} \ \prod_{1\le i<j\le N} |h_i - h_j| \prod_{i=1}^{N} q^{h_i/2}. \]
This corresponds to $\beta = 1$, $\gamma = 1$ in Theorem 2.2. It should be possible to analyze the asymptotics in this case analogously to GOE, see [TW2], to show that we can find constants $a$ and $b$ so that $P[F(N) \le aN + sbN^{1/3}]$ converges to $F_1(t)$, the Tracy–Widom distribution for GOE. However it is not immediate to generalize the techniques of [TW2], so this remains to be done. Note that again we can take the limit $q \to 1$ to get the case of exponentially distributed random variables.

3. The Fredholm Determinant

From the identity (2.7) we have
\[ p_N(t) = \psi_N(t + N - 1), \tag{3.1} \]
where
\[ \psi_N(s) = E_N\Bigl[\prod_{j=1}^{N} (1 - \chi_s(h_j))\Bigr]. \tag{3.2} \]

Here
\[ E_N[\,\cdot\,] = \frac{1}{Z_{M(N),N}} \sum_{h\in\mathbb{N}^N} (\,\cdot\,)\,\Delta_N(h)^2 \prod_{j=1}^{N} w^q_{K(N)}(h_j), \]
$K(N) = M(N) - N + 1$, $M(N) = [\gamma N]$ and $\chi_s(t)$ is the indicator function for the interval $(s,\infty)$. We will take $s$ in (3.2) to be an integer.

Let $M_j^{K,q}(x)$, $j = 0, 1, \dots$ be the normalized orthogonal polynomials with respect to the weight $w_K^q(x)$ on $\mathbb{N}$,
\[ \sum_{x=0}^{\infty} M_i^{K,q}(x)\,M_j^{K,q}(x)\,w_K^q(x) = \delta_{ij}, \]
and $M_j^{K,q}(x) = \kappa_j x^j + \dots$ with $\kappa_j > 0$. Set
\[ K_N(x,y) = \sum_{j=0}^{N-1} M_j^{K,q}(x)\,M_j^{K,q}(y)\,w_K^q(x)^{1/2} w_K^q(y)^{1/2}, \]
so that $K_N(x,y)$ is a reproducing kernel on $\ell^2(\mathbb{N})$.

The polynomials $M_n^{K,q}$ are multiples of the standard Meixner polynomials, [NSU, Ch],
\[ M_n^{K,q}(x) = \frac{(-1)^n}{d_n}\, m_n^{K,q}(x), \]
where
\[ d_n^2 = \frac{n!\,(n+K-1)!}{(1-q)^K q^n (K-1)!}. \]
The leading coefficient in $m_n^{K,q}$ is $\bigl(\frac{q-1}{q}\bigr)^n$ and consequently
\[ \kappa_n = \frac{1}{d_n}\Bigl(\frac{1-q}{q}\Bigr)^n. \]
The Meixner polynomials have the generating function, [Ch],
\[ \sum_{n=0}^{\infty} m_n^{K,q}(x)\,\frac{t^n}{n!} = \Bigl(1 - \frac{t}{q}\Bigr)^x (1-t)^{-x-K}. \tag{3.3} \]
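The normalization $d_n^2$ can be checked numerically from the generating function (3.3). The sketch below reads $m_n^{K,q}(x)$ off as $n!$ times the coefficient of $t^n$ in the product of the two series and sums against the weight. The weight is not restated in this section, so the form $w_K^q(x) = \binom{x+K-1}{x} q^x$ used here is an assumption (it is the choice that reproduces $d_0^2 = (1-q)^{-K}$); the function names and the truncation point are likewise illustrative.

```python
import math


def meixner(n, x, K, q):
    """m_n^{K,q}(x): n! times the t^n coefficient of (1 - t/q)^x (1 - t)^(-x-K)."""
    total = 0.0
    for k in range(n + 1):
        left = math.comb(x, k) * (-1.0 / q) ** k        # t^k coefficient of (1 - t/q)^x
        right = math.comb(x + K + n - k - 1, n - k)     # t^(n-k) coefficient of (1 - t)^(-x-K)
        total += left * right
    return math.factorial(n) * total


def weight(x, K, q):
    """Assumed weight w_K^q(x) = binom(x + K - 1, x) q^x on the non-negative integers."""
    return math.comb(x + K - 1, x) * q ** x


def d_squared(n, K, q):
    """d_n^2 as stated before (3.3)."""
    return (math.factorial(n) * math.factorial(n + K - 1)
            / ((1 - q) ** K * q ** n * math.factorial(K - 1)))


def inner_product(i, j, K, q, x_max=400):
    """Truncated sum over x of m_i(x) m_j(x) w(x)."""
    return sum(meixner(i, x, K, q) * meixner(j, x, K, q) * weight(x, K, q)
               for x in range(x_max + 1))


if __name__ == "__main__":
    K, q = 2, 0.3
    for n in range(4):
        print(n, inner_product(n, n, K, q), d_squared(n, K, q))
```

For integer $x \ge 0$ the first series is a finite binomial sum, so only the truncation of the sum over $x$ introduces an (exponentially small) error.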

The Christoffel–Darboux formula, [Sz], gives
\[ \begin{aligned} K_N(x,y) &= \frac{\kappa_{N-1}}{\kappa_N}\,\frac{M_N(x)M_{N-1}(y) - M_N(y)M_{N-1}(x)}{x-y}\; w_K^q(x)^{1/2} w_K^q(y)^{1/2} \\ &= -\frac{q}{(1-q)\,d_{N-1}^2}\,\frac{m_N(x)m_{N-1}(y) - m_N(y)m_{N-1}(x)}{x-y}\; w_K^q(x)^{1/2} w_K^q(y)^{1/2}, \end{aligned} \tag{3.4} \]
where we have omitted the upper indices. Standard computations from random matrix theory, [Me], Ch. 5 and [TW2], show that $\psi_N$ can be written as a Fredholm determinant,
\[ \psi_N(s) = \sum_{k=0}^{N} \frac{(-1)^k}{k!} \sum_{h\in\{s+1,s+2,\dots\}^k} \det(K_N(h_i,h_j))_{1\le i,j\le k}. \tag{3.5} \]
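The two expressions for the Meixner kernel, the defining sum and the Christoffel–Darboux form (3.4), can be compared numerically at integer points $x \ne y$. This is a minimal sketch under the same assumption as before on the weight, $w_K^q(x) = \binom{x+K-1}{x} q^x$, which is not restated in this section; all function names are hypothetical.

```python
import math


def meixner(n, x, K, q):
    """m_n^{K,q}(x) from the generating function (3.3)."""
    total = 0.0
    for k in range(n + 1):
        total += (math.comb(x, k) * (-1.0 / q) ** k
                  * math.comb(x + K + n - k - 1, n - k))
    return math.factorial(n) * total


def weight(x, K, q):
    """Assumed weight w_K^q(x) = binom(x + K - 1, x) q^x."""
    return math.comb(x + K - 1, x) * q ** x


def d_squared(n, K, q):
    """d_n^2 as stated before (3.3)."""
    return (math.factorial(n) * math.factorial(n + K - 1)
            / ((1 - q) ** K * q ** n * math.factorial(K - 1)))


def kernel_sum(N, x, y, K, q):
    """K_N(x, y) as the sum over j < N of M_j(x) M_j(y) sqrt(w(x) w(y))."""
    s = sum(meixner(j, x, K, q) * meixner(j, y, K, q) / d_squared(j, K, q)
            for j in range(N))
    return s * math.sqrt(weight(x, K, q) * weight(y, K, q))


def kernel_christoffel_darboux(N, x, y, K, q):
    """K_N(x, y) from the second line of (3.4); requires x != y."""
    num = (meixner(N, x, K, q) * meixner(N - 1, y, K, q)
           - meixner(N, y, K, q) * meixner(N - 1, x, K, q))
    pref = -q / ((1 - q) * d_squared(N - 1, K, q))
    return pref * num / (x - y) * math.sqrt(weight(x, K, q) * weight(y, K, q))


if __name__ == "__main__":
    N, K, q = 4, 3, 0.4
    for x, y in [(0, 1), (2, 5), (7, 3)]:
        print(x, y, kernel_sum(N, x, y, K, q), kernel_christoffel_darboux(N, x, y, K, q))
```

The signs $(-1)^n$ in $M_n = (-1)^n m_n / d_n$ cancel in the sum but leave the overall minus sign in the Christoffel–Darboux form.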

The proof of Theorem 1.2 is based on taking the appropriate limit in (3.5). The next lemma will allow us to compute the asymptotics of the right-hand side of (3.5).

Lemma 3.1. Let $b \ge 0$ be a constant and assume that $\rho_N \to \infty$ as $N \to \infty$. Suppose furthermore that $K_N : \mathbb{N}\times\mathbb{N} \to \mathbb{R}$, $N \ge 1$, satisfies the following properties.

(i) Let $M_1 > 0$ be a given constant. There is a constant $C$ such that
\[ \sum_{m=1}^{\infty} K_N(bN + \rho_N\tau + m,\, bN + \rho_N\tau + m) \le C \tag{3.6} \]
for all $N \ge 1$, $\tau \ge -M_1$.
(ii) Given $\epsilon > 0$, there is an $L > 0$ so that
\[ \sum_{m=1}^{\infty} K_N(bN + \rho_N L + m,\, bN + \rho_N L + m) \le \epsilon, \tag{3.7} \]
for all $N \ge 1$.
(iii) Let $M_0 > 0$ be a given constant. If $A(\xi,\eta)$ is the Airy kernel defined by (1.7), then
\[ \lim_{N\to\infty} \rho_N K_N(bN + \rho_N\xi,\, bN + \rho_N\eta) = A(\xi,\eta) \tag{3.8} \]
uniformly for $\xi, \eta \in [-M_0, M_0]$.
(iv) The matrix $(K_N(x_i,x_j))_{i,j=1}^{k}$ is positive definite for any $x_i, x_j \in [0,\infty)$, $k \ge 1$.

Then, for each fixed $t \in \mathbb{R}$,
\[ \lim_{N\to\infty} \sum_{k=0}^{N} \frac{(-1)^k}{k!} \sum_{h\in\mathbb{N}^k} \det(K_N(bN + \rho_N t + h_i,\, bN + \rho_N t + h_j))_{i,j=1}^{k} = F(t), \tag{3.9} \]
where $F(t)$ is given by (1.8).

Proof. It follows from (iv) that
\[ |\det(K_N(x_i,x_j))_{1\le i,j\le k}| \le \prod_{j=1}^{k} K_N(x_j,x_j), \tag{3.10} \]
see for example [HJ]. Consequently,
\[ \Bigl|\sum_{h\in\mathbb{N}^k} \det(K_N(a_N + h_i,\, a_N + h_j))_{1\le i,j\le k}\Bigr| \le \Bigl(\sum_{m=1}^{\infty} K_N(a_N + m,\, a_N + m)\Bigr)^{k}, \tag{3.11} \]

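where we have written $a_N = bN + \rho_N t$. Choose $M_1$ so that $|t| \le M_1$. Let $\epsilon > 0$ be given. It follows from the estimates (3.6) and (3.11) that we can choose $\ell$ so that
\[ \Bigl|\sum_{k=\ell+1}^{N} \frac{(-1)^k}{k!} \sum_{h\in\mathbb{N}^k} \det(K_N(a_N + h_i,\, a_N + h_j))_{i,j=1}^{k}\Bigr| \le \sum_{k=\ell+1}^{\infty} \frac{C^k}{k!} \le \epsilon, \tag{3.12} \]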

for all N ≥ 1. Choose L0 so that (3.11) holds with L = L0 − M0 . Then, by the estimates (3.6), (3.7) and (3.10),  X X  − h∈Nk h∈([L ρ

0 N

≤

X

k Y

h∈Nk

i=1

some hj >L0 ρN

≤

k X j =1

≤k

X

k Y

KN (aN + hi , aN + hi )

KN (aN + hi , aN + hi )

h∈Nk i=1 hj >L0 ρN

X ∞

m=1 k−1

≤ kC

 det(KN (aN + hi , aN + hj ))1≤i,j ≤k ]c )k 

k−1 X ∞

KN (aN + m, aN + m)

.

KN (bN + LρN + m, bN + LρN + m)

m=1

(3.13)


Denote the Fredholm determinant in the right-hand side of (3.9) by DN (t). Inserting the estimates (3.12) and (3.13) into the formula (3.9) we obtain ` k X X h h (−1) 1 j i DN (t) − det(KN (σ + ,σ + ))1≤i,j ≤k k k! ρ ρ ρN N N k=0 h∈[L0 ρN ]k (3.14) X ` k−1 kC + 1 ≤ (1 + eC ), ≤ k! k=0

where

KN (ξ, η) = ρN KN (bN + ρN ξ, bN + ρN η).

By assumption (iii), with M0 = L0 + M1 , we can chose N0 so that if N ≥ N0 , then x y x y ,σ + )) − det(A(σ + ,σ + ))| ≤ k | det(KN (σ + ρN ρN ρN ρN L0 for all x, y ∈ [L0 ρN ]. Thus, X ` (−1)k X h h h h 1 j j i i ,t + )) − det(A(t + ,t + )) k det(KN (t + k! ρ ρ ρ ρ ρN N N N N k k=0

≤

` X k=0

1 k!

h∈[L0 ρN ]

L0 ρN + 1 L0 ρN

k

≤ C 0 . (3.15)

Combining the estimates (3.14) and (3.15) we find ` k X X h h (−1) 1 j k i 00 DN (t) − det(A(σ + , σ + )) i,j =1 k ≤ C . k! ρ ρ ρ N N N k k=0

(3.16)

h∈[L0 ρN ]

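The Airy kernel can be written, [TW1],
\[ A(x,y) = \int_0^\infty \mathrm{Ai}(x+s)\,\mathrm{Ai}(y+s)\,ds. \tag{3.17} \]
Using the formula, see for example [Hö], p. 214,
\[ \mathrm{Ai}(x) = \frac{1}{2\pi}\,e^{-\frac23 x^{3/2}} \int_{-\infty}^{\infty} e^{-\xi^2\sqrt{x} + i\xi^3/3}\,d\xi, \]
valid for $x > 0$, we see that
\[ |\mathrm{Ai}(x)| \le \frac{1}{2\sqrt{\pi}\,x^{1/4}}\,e^{-\frac23 x^{3/2}}, \qquad x > 0. \]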
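The integral representation and the resulting bound on $|\mathrm{Ai}(x)|$ are easy to test numerically. The following sketch, which uses only the standard library, evaluates the contour-shifted integral with the trapezoidal rule (the imaginary part cancels by symmetry, leaving $\cos(\xi^3/3)$) and compares the value with the Gaussian bound. The cutoff rule, the step size and the function names are illustrative assumptions.

```python
import math


def airy_ai(x, step=1e-3):
    """Ai(x) for x > 0 from the shifted-contour integral, by the trapezoidal rule."""
    if x <= 0:
        raise ValueError("this representation is used here only for x > 0")
    root = math.sqrt(x)
    xi_max = math.sqrt(40.0 / root)  # beyond this point exp(-xi^2 sqrt(x)) < e^-40
    n = int(xi_max / step) + 1
    h = xi_max / n

    def f(xi):
        # real part of exp(-xi^2 sqrt(x) + i xi^3 / 3); it is even in xi
        return math.exp(-xi * xi * root) * math.cos(xi ** 3 / 3.0)

    total = 0.5 * (f(0.0) + f(xi_max))
    for i in range(1, n):
        total += f(i * h)
    integral = 2.0 * h * total  # doubling accounts for the negative half-line
    return math.exp(-2.0 / 3.0 * x ** 1.5) / (2.0 * math.pi) * integral


def airy_bound(x):
    """Right-hand side of the bound |Ai(x)| <= e^{-(2/3) x^{3/2}} / (2 sqrt(pi) x^{1/4})."""
    return math.exp(-2.0 / 3.0 * x ** 1.5) / (2.0 * math.sqrt(math.pi) * x ** 0.25)


if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0, 4.0):
        print(x, airy_ai(x), airy_bound(x))
```

The bound follows by dropping the oscillating factor, which leaves the Gaussian integral $\sqrt{\pi}/x^{1/4}$.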

This estimate can be used to show that the Airy kernel satisfies (i) and (ii) above. Since the matrix (A(ξi , ξj ))1≤i,j ≤k is positive definite, we can use the same argument as above to show that ∞ Z ! ` Z X X (−1)k k k − (3.18) det(A(ξi , ξj ))i,j =1 d ξ ≤ k! [t,∞)k [t,L0 ]k k=0

k=0


provided ` and L0 are sufficiently large. From (3.17) we see that choosing N1 ≥ N0 large enough we have Z ` X (−1)k k det(A(ξi , ξj ))1≤i,j ≤k d ξ ≤ C 000 (3.19) DN (t) − k! [t,L0 ]k k=0

for all $N \ge N_1$. If we combine the estimates (3.18) and (3.19) we have proved the lemma. $\square$

To apply this lemma to the Meixner kernel (3.4) we need

Lemma 3.2. The Meixner kernel satisfies the properties (i) to (iv) in Lemma 3.1 with $b = b(\gamma,q)$ as before and $\rho_N = \sigma N^{1/3}$, where $\sigma$ is given by (1.11).

This lemma will be proved in Sect. 5. We can now combine (3.1), (3.5) and (3.9) to get
\[ \lim_{N\to\infty} p_N((b-1)N + \sigma N^{1/3}t) = F(t), \tag{3.20} \]
which is (1.10) and Theorem 1.2 is proved. $\square$

4. Proof of the Large Deviation Theorem

In this section we will prove Theorem 2.2. Set
\[ K_{N,V}(x) = \sum_{1\le i\ne j\le N} k_V(x_i,x_j). \]

By adding a constant $C$ to $V_N$, which does not alter the problem, we can, by assumption (ii) on $V_N$, assume that
\[ V_N(t) - \log(t^2+1) \ge \xi\log(t^2+1) \tag{4.1} \]
for all $t \ge 0$. Since $|t-s|^2 \le (t^2+1)(s^2+1)$, this implies
\[ -K_{M,V_N}(x) \le -\xi(M-1)\sum_{j=1}^{M-1} \log(1+x_j^2) \tag{4.2} \]
for all $x \in [0,\infty)^M$. Note that
\[ \sum_{1\le j\ne k\le N-1} \log|x_j - x_k| - N\sum_{j=1}^{N-1} V_N(x_j) = -K_{N-1,V_N}(x) - \sum_{j=1}^{N-1} V_N(x_j). \tag{4.3} \]
The next lemma is analogous to Lemma 4.2 in [Jo].

Lemma 4.1. Let $\{s_N\}$ be a sequence in $[0,\infty)$ such that $s_N \to s > 0$ as $N \to \infty$, or $s_N \equiv \infty$. Set, for a given $\alpha > 0$,
\[ \Omega_{N,\alpha}(s) = \Bigl\{x \in A_N(s)^{N-1} \,;\, \frac{1}{N^2}K_{N-1,V_N}(x) \le F_V^{\sigma} + \alpha\Bigr\}. \]
Let $0 \le \lambda \le 1$ and let $\sigma_N \in A_N$, $N \ge 1$, be a sequence converging to $\sigma > 0$. Define a probability measure on $A_N(s_N)^{N-1}$ by
\[ P^{\lambda,\sigma_N}_{N-1,N}(\Omega; s_N) = \frac{1}{Z^{\lambda,\sigma_N}_{N-1,N}(s_N)} \sum_{x\in\Omega} \prod_{j=1}^{N-1} |\sigma_N - x_j|^{\lambda\beta}\, Q_{N-1,N}(x), \tag{4.4} \]
where $Z^{\lambda,\sigma_N}_{N-1,N}(s_N)$ is a normalization constant. ($E^{\lambda,\sigma_N}_{N-1,N}[\,\cdot\,; s_N]$ denotes the corresponding expectation and if $s_N \equiv \infty$ or $\lambda = \sigma_N = 0$ we omit them in the notation.) Fix $\eta > 0$. Then there is an $N_1$ such that for all $a \ge 0$ and $N \ge N_1$,
\[ P^{\lambda,\sigma_N}_{N-1,N}(\Omega_{N,\eta+a}(s_N)^c; s_N) \le e^{-\frac{\beta}{4}aN^2}. \tag{4.5} \]

Proof. We first prove the following claim. Claim 4.2. Let σN ∈ AN , σN → σ as N → ∞ and s ∈ (0, ∞]. For each N ≥ 2 we N ) ∈ A (s)N −1 so that can choose (x1N , . . . , xN N −1 1 N2

X 1≤j 6=k≤N−1

log |xjN − xkN |−1 +

N −1 1 X VN (xjN ) N j =1

N −1 1 X log |σN − xjN | → FVs − 2 N

(4.6)

j =1

as N → ∞. To see this set ykN = max{

j ; j ∈ N and N

Z 0

j/N

φVs (t)dt <

k }. N

If ykN 6 = σN for k = 1, . . . , N − 1, we set xkN = ykN . If ykN0 = σN , we set xkN = ykN for k < k0 and xkN = ykN + 1/N for k = k0 , . . . , N − 1. Using the fact that 0 ≤ φVs ≤ 1 N it is not difficult to see that x1N < x2N < · · · < xN−1 ≤ L for all N and some fixed L. Furthermore N −1 1 X δx N → φVs (x)dx (4.7) k N −1 k=1

weakly as N → ∞. The property (iii) in the assumptions on VN implies Z ∞ N−1 1 X VN (xjN ) → V (t)φVs (t)dt. N 0

(4.8)

N−1 N −1 2 X N N N −1 2 1 X N −1 log |σ − x | ≤ log log = , N j N2 N2 j N2 (N − 1)!

(4.9)

j =1

Clearly,

j =1

j =1


which → 0 as N → ∞. Also, since σN → σ and the xjN belong to a bounded set, we get a bound in the other direction which goes to 0 as N → ∞. Given M ≥ 1, set fM (t) = min{log |t|−1 , log M}. Write 1 X 1 X N N −1 log |x − x | = fM (xjN − xkN ) j k N2 N2 j 6 =k

1 + 2 N

j 6 =k

X

j 6=k |xjN −xkN |<1/M

(log |xjN − xkN |−1 − fM (xjN − xkN )).

(4.10)

The absolute value of the second sum in the right-hand side of (4.10) is ≤

1 N2

X

log |

1≤|j −k|≤N/M |j |,|k|≤LN

log M N |≤C . j −k M

Thus, using the weak convergence (4.7) and then letting M → ∞ we obtain Z ∞Z ∞ 1 X N N −1 log |xj − xk | = log |x − y|−1 φVs (x)φVs (y)dxdy, lim N→∞ N 2 0 0 j 6=k

which together with (4.8) and (4.9) proves the claim. We turn now to the proof of Lemma 4.1. Let > 0 be given. We want to estimate λ,σN from below. Choose N0 so that sN ≥ s − if N ≥ N0 . Then ZN−1,N λ,σN λ,σN ZN −1,N (sN ) ≥ ZN −1,N (s − ),

if N ≥ N0 . Choose (xkN )N−1 k=1 ⊆ AN (s − ) as in the claim. Clearly,  1 X β 1 λ,σN log ZN−1,N (sN ) ≥ −  2 log |xjN − xkN |−1 2 N 2 N j 6 =k  N −1 N−1 X X 1 + VN (xjN ) − 2 log |σN − xjN | , N j =1

j =1

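and consequently, by Claim 4.2,
\[ \liminf_{N\to\infty} \frac{1}{N^2}\log Z^{\lambda,\sigma_N}_{N-1,N}(s_N) \ge -\frac{\beta}{2}F_V^{s-\epsilon}. \]
Since $F_V^{s-\epsilon} \searrow F_V^s$ as $\epsilon \to 0+$,
\[ \liminf_{N\to\infty} \frac{1}{N^2}\log Z^{\lambda,\sigma_N}_{N-1,N}(s_N) \ge -\frac{\beta}{2}F_V^{s}. \tag{4.11} \]
Thus, given $\delta > 0$, we can choose $N(\delta)$ so that if $N \ge N(\delta)$, then
\[ \frac{1}{N^2}\log Z^{\lambda,\sigma_N}_{N-1,N}(s_N) \ge -\frac{\beta}{2}(F_V^{s} + \delta). \tag{4.12} \]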

It follows from (4.2) with M = N − 1 and (4.3), that for any 0 < ρ < 1/2, λ,σN (N,η+a (SN )c ; sN ) PN−1,N

≤e

βN 2 s 2 (FV +δ)

X

β

x∈A(sN )N −1 \N,η+a (sN )

≤e

P

β

e− 2 KN −1,VN (x)− 2

βN 2 β s s 2 2 (FV +δ)− 2 (1−ρ)(FV +η+a)N

 

j

VN (xj )

|σN − xj |λβ

j =1

X

− β2 ξ(N−1)

(t 2 + 1)

t∈AN

≤e

N −1 Y

N

(1 + σN2 )λβ/2 

− β4 aN 2

if $N$ is sufficiently large (independent of $a \ge 0$). Note that $\delta + \rho F_V^s - \eta < 0$ if we choose $\delta = \eta/2$ and $\rho$ sufficiently small. This completes the proof. $\square$

This lemma can be used to prove

Corollary 4.3. For any $s \in (1,\infty]$,
\[ \lim_{N\to\infty} \frac{1}{N^2}\log Z_N(s) = -\frac{\beta}{2}F_V^s. \tag{4.13} \]
Furthermore $F_V^s - F_V > 0$ if $s < b_V$.

Proof. The lower limit follows by taking $\lambda = \sigma_N = 0$ in (4.11) (replacing $N-1$ by $N$ does not modify the argument above in any essential way). Given $0 < \rho < 1$, we can use (4.2) with $M = N$ and the continuity of $\exp K_{N,V_N}$ to see that
\[ \begin{aligned} Z_N(s) &= \sum_{x\in A_N(s)^N} e^{-\frac{\beta}{2}K_{N,V_N}(x) - \frac{\beta}{2}\sum_{j=1}^{N} V_N(x_j)} \\ &\le \sup_{x\in A_N(s)^N} e^{-\frac{\beta}{2}(1-\rho)K_{N,V_N}(x)} \sum_{x\in A_N(s)^N} e^{-\frac{\beta}{2}\rho\xi(N-1)\sum_j \log(1+x_j^2)} \\ &\le e^{-\frac{\beta}{2}(1-\rho)K_{N,V_N}(y^N) + CN}, \end{aligned} \tag{4.14} \]
if $N$ is sufficiently large, where $y^N = (y_1^N,\dots,y_N^N) \in A_N(s)^N$. Clearly, $y_j^N \ne y_k^N$ if $j \ne k$. Set $\lambda_N = N^{-1}\sum_j \delta_{y_j^N}$. It follows from (4.12), with $\lambda = \sigma = 0$ and $N-1$ replaced by $N$, that $N^{-2}\log Z_N(s) \ge -\beta(F_V^s + \delta)/2$ for $N \ge N(\delta)$, so (4.2) and (4.14) yield
\[ \int_0^\infty \log(1+t^2)\,d\lambda_N(t) \le C. \]
Thus $\{\lambda_N\}_{N=1}^{\infty}$ is tight. Pick a subsequence that gives the upper limit of $N^{-2}\log Z_N(s)$, and a further subsequence so that $\lambda_{N_j}$ converges weakly to $\nu = \psi\,dx$. The measure $\nu$ has to be absolutely continuous with density satisfying $0 \le \psi \le 1$ because of the definition of $\lambda_N$. Using (4.1) and $|t-s| \le \sqrt{t^2+1}\sqrt{s^2+1}$ we see that $k_{V_N}(t,s) \ge 0$. Set, for given $M > 0$, $k^M_{V_N}(t,s) = \min(k_{V_N}(t,s), M)$ and choose $\phi_T(t)$ continuous so that $0 \le \phi_T \le 1$, $\phi_T(t) = 1$ if $|t| \le T$, $= 0$ if $|t| \ge T+1$ and $\phi_T(t) \le \phi_{T'}(t)$ if $T \le T'$. Then, $k_{V_N}(t,s) \ge \phi_T(t)\phi_T(s)k^M_{V_N}(t,s)$ and using the estimate (4.14) we get
\[ \frac{1}{N_j^2}\log Z_{N_j}(s) \le \frac{C + \frac{\beta}{2}(1-\rho)M}{N_j} - \frac{\beta}{2}(1-\rho)\int_0^\infty\!\!\int_0^\infty \phi_T(t)\phi_T(s)\,k^M_{V_N}(t,s)\,d\lambda_{N_j}(t)\,d\lambda_{N_j}(s), \]

and thus, letting $j \to \infty$, $M \to \infty$, $T \to \infty$ and $\rho \to 0+$ in that order, we obtain
\[ -\frac{\beta}{2}F_V^s \le \liminf_{N\to\infty}\frac{1}{N^2}\log Z_N(s) \le \limsup_{N\to\infty}\frac{1}{N^2}\log Z_N(s) \le -\frac{\beta}{2}I_V[\psi]. \]
Thus $I_V[\psi] \le F_V^s$ and $\psi \in \mathcal{A}_s$, so we must have $\psi = \phi_V^s$.

Assume that $F_V^s \le F_V$ and $s < b_V$. Then $I_V[\phi_V^s] \le I_V[\phi_V]$ and consequently $\phi_V^s = \phi_V$ by the uniqueness of the minimizing measure. This contradicts the definition of $b_V$. The corollary is proved. $\square$

Note that by (2.13) Corollary 4.3 implies (2.16) so we have proved the first part of Theorem 2.2. Before turning to the proof of the second part we need one more consequence of Lemma 4.1.

Corollary 4.4. Let $\{s_N\}$ be as in Lemma 4.1 and assume that $f : [0,\sigma+\epsilon] \to \mathbb{R}$, $\epsilon > 0$, is continuous, or $f : [0,\infty) \to \mathbb{R}$ is continuous and bounded in case $s_N \equiv \infty$. Then
\[ \lim_{N\to\infty} \frac{1}{N}\log E^{y,\sigma_N}_{N-1,N}\bigl[e^{\sum_{j=1}^{N} f(x_j)}; s_N\bigr] = \int_0^\infty f(t)\,\phi_V^{\sigma}(t)\,dt. \tag{4.15} \]
Furthermore let
\[ u^{y,\sigma_N}_{N-1,N}(t) = \frac{1}{N-1}\,E^{y,\sigma_N}_{N-1,N}\Bigl[\sum_{i=1}^{N-1} \delta_{t,x_i}\Bigr], \tag{4.16} \]
($\delta_{t,s}$ is Kronecker's delta), be the 1-dimensional marginal distribution of the probability measure (4.4) (with $s_N \equiv \infty$). Then for each $0 < y \le 1$:

(i) $0 \le u^{y,\sigma_N}_{N-1,N}(t) \le \frac{1}{N-1}$ for all $t \in A_N$,
(ii) if $\delta_t$ is the Dirac measure at $t$, then $\sum_{t\in A_N} u^{y,\sigma_N}_{N-1,N}(t)\,\delta_t$ converges weakly to $\phi_V(t)dt$ as $N \to \infty$,
(iii) $u^{y,\sigma_N}_{N-1,N}(\sigma_N) = 0$.

Proof. We can prove (4.15) using Lemma 4.1 in exactly the same way as in the proof of (2.5) on p. 194 in [Jo], see also [De]. The weak limit (ii) is a direct consequence of (4.15), see [De]. Note that the limit does not depend on $y$ since the factor $\prod_{i=1}^{N-1}|\sigma_N - x_i|^{y\beta}$ does not affect the leading asymptotics. In the expectation (4.16) all the $x_i$'s have to be different, otherwise the probability is zero, and consequently the expectation is $\le 1$, which proves (i). That (iii) holds follows from the presence of the factor $\prod_{i=1}^{N-1}|\sigma_N - x_i|^{y\beta}$. The corollary is proved.


We turn now to the proof of the upper-tail limit. Note that
\[ Q_{M,N}(x) = e^{-\frac{N\beta}{2}V_N(x_M)} \prod_{i=1}^{M-1} |x_M - x_i|^{\beta}\, Q_{M-1,N}(x'), \tag{4.17} \]
where $x' = (x_1,\dots,x_{M-1})$. Using this identity we see that
\[ Z_{M,N}(t) = M! \sum_{\substack{x\in A_N^M \\ x_1\le\dots\le x_M\le t}} Q_{M,N}(x) = M \sum_{s\in A_N(t)} e^{-\frac{N\beta}{2}V_N(s)} \sum_{x\in A_N(s)^{M-1}} \prod_{i=1}^{M-1} |s - x_i|^{\beta}\, Q_{M-1,N}(x). \]
If we define
\[ H_{M-1,N}(s) = \frac{1}{Z_{M-1,N}(s)} \sum_{x\in A_N(s)^{M-1}} \prod_{i=1}^{M-1} |s - x_i|^{\beta}\, Q_{M-1,N}(x), \]
this can be written
\[ Z_{M,N}(t) = M \sum_{s\in A_N(t)} e^{-\frac{N\beta}{2}V_N(s)}\, Z_{M-1,N}(s)\, H_{M-1,N}(s), \tag{4.18} \]
or
\[ F_{M,N}(t) = \frac{M Z_{M-1,N}}{Z_{M,N}} \sum_{s\in A_N(t)} e^{-\frac{N\beta}{2}V_N(s)}\, F_{M-1,N}(s)\, H_{M-1,N}(s). \tag{4.19} \]
This is the main formula to be used in the proof of (2.17). We will need two choices of $M$, namely $M = N$ and $M = N-1$. They are handled completely analogously and we will consider only the case $M = N$. Write $A_N(t,s) = A_N \cap (t,s)$ for any $0 \le t < s \le \infty$ and $A_N(t)^* = A_N(t,\infty)$. If we let $t \to \infty$ in (4.19) and then subtract (4.19) from the limiting equality, we get
\[ 1 - F_N(t) = \frac{N Z_{N-1,N}}{Z_{N,N}} \sum_{s\in A_N(t)^*} e^{-\frac{N\beta}{2}V_N(s)}\, F_{N-1,N}(s)\, H_{N-1,N}(s). \tag{4.20} \]
Set
\[ \Phi_V = F_V - \frac12\int_0^\infty V(s)\,\phi_V(s)\,ds. \]
From the variational relations for $\phi_V(t)$ it follows that
\[ \int_0^\infty \log|b_V - s|^{-1}\phi_V(s)\,ds + \frac12 V(b_V) = \Phi_V. \tag{4.21} \]

Lemma 4.5. We have
\[ \limsup_{N\to\infty} \frac{1}{N}\log\frac{Z_{N-1,N}}{Z_{N,N}} \le \beta\Phi_V. \tag{4.22} \]

Proof. By (4.17) we have N −1 X Y Nβ ZN,N = e− 2 VN (s) EN −1,N [ |s − xi |β ] ZN−1,N i=1

s∈AN

≥e

− Nβ 2 VN (r)

N −1 Y

EN −1,N [

(4.23)

|r − xi |β ]

i=1

for any r ∈ AN . One difficulty in estimating the right-hand side in (4.23) comes from the fact that, due to the discrete nature of the problem the integrand could, apriori, be zero for many r’s with high probability. Note that we define 0y = 0 for any y > 0. Let ψs (t) = 1 if t 6 = s and ψs (s) = 0. Consider N −1 Y 1 |s − xi |yβ ψs (xi )]. log EN −1,N [ fN (y; s) = N i=1

Then,

N −1

fN (0+; s) = lim fN (y; s) = y→0+

Y 1 log EN −1,N [ ψs (xi )] N i=1

(4.24)

1 log PN −1,N [all xi 6 = s]. = N Let > 0 be given and write BN () = AN (bV + , bV + 2). Now, [ X PN−1,N [all xi 6 = s] ≥ PN −1,N [ {all xi 6= s}] s∈BN ()

s∈BN ()

= 1 − PN −1,N [

\

{one xi = s}].

(4.25)

s∈BN ()

Take g : [0, ∞) → [0, ∞) continuous such that g(s) = 1 if bV + ≤ s ≤ bV + 2 and g(s) = 0 if 0 ≤ s ≤ bV or s ≥ bV + 3. Then, PN \ {one xi = s}] ≤ EN −1,N [e i=1 g(xi ) ] ≤ eN/2 (4.26) eN PN−1,N [ s∈BN ()

for all sufficiently large N. The first inequality follows from the definitions whereas the second follows from Corollary 4.4, (4.15). Combining (4.25) and (4.26) we see that max PN −1,N [all xi 6= s] ≥

s∈BN ()

1 2N

(4.27)

for all sufficiently large N . Hence, by (4.24) and (4.27) we can choose σN = σN () ∈ BN () so that (4.28) lim fN (0+; σN ) = 0. N →∞

Take r = σN in (4.23). Then ZN,N β 1 log ≥ − VN (σN ) + fN (1; σN ) N ZN−1,N 2 β = − VN (σN ) + fN (0+; σN ) + β 2

Z 0

1

fN0 (y; σN )dy.

(4.29)

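We can pick a subsequence $\{N_j\}$ which gives $\liminf_{N\to\infty} \frac{1}{N}\log\frac{Z_{N,N}}{Z_{N-1,N}}$ and such that $\sigma_{N_j}(\epsilon) \to \sigma(\epsilon) \in [b_V + \epsilon, b_V + 2\epsilon]$. Then, by (4.28) and (4.29),
\[ \liminf_{N\to\infty} \frac{1}{N}\log\frac{Z_{N,N}}{Z_{N-1,N}} \ge -\frac{\beta}{2}V(\sigma(\epsilon)) + \beta\liminf_{j\to\infty} \int_0^1 f'_{N_j}(y;\sigma_{N_j})\,dy. \tag{4.30} \]
Now,
\[ f'_N(y;\sigma_N) = E^{y,\sigma_N}_{N-1,N}\Bigl[\frac{1}{N}\sum_{i=1}^{N-1} \log|\sigma_N - x_i|\Bigr] = \frac{N-1}{N}\sum_{t\in A_N} \log|\sigma_N - t|\; u^{y,\sigma_N}_{N-1,N}(t). \]
Hence, by Corollary 4.4 (i) and (iii),
\[ f'_N(y;\sigma_N) \ge 2\,\frac{1}{N}\sum_{i=1}^{N} \log\frac{i}{N} \ge -2, \]
and consequently, by Fatou's lemma,
\[ \liminf_{j\to\infty} \int_0^1 f'_{N_j}(y;\sigma_{N_j})\,dy \ge \int_0^1 \liminf_{j\to\infty} f'_{N_j}(y;\sigma_{N_j})\,dy. \tag{4.31} \]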

Given δ > 0, small, and M > 0 set   log M, if |t| ≥ M fM,δ (t) = log |t|, if δ ≤ |t| < M  log δ, if |t| ≤ δ. By Corollary 4.4 (i) and (iii) we have X y,σN (min(log M, log |σN − t|) − fM,δ (σN − t))uN −1,N (t) t∈AN [N δ] X σN − t 1 2 X Nδ log ≤ log δ N − 1 ≤ N − 1 k k=1

t∈AN ; 0<|t−σN |≤δ

2N δ. ≤ N −1 Also, if |σN − σ | ≤ δ, which is true if N is large enough, 1 |fM,δ (|σN − t|) − fM,δ (|σ () − t|)| ≤ δ log . δ Since log |σN − t| ≥ min(log M, log |σN − t|) and M,δ are arbitrary it follows from Corollary 4.4, (ii) that Z ∞ log |σ () − t|φV (t)dt. lim inf fN0 j (y; σNj ) ≥ j →∞

0


Together with (4.30) and (4.31) this gives ZN,N β 1 log ≥ − V (σ ()) + β lim inf N→∞ N ZN−1,N 2

Z

∞

0

log |σ () − t|φV (t)dt.

We can pick a sequence j → 0 such that σ (j ) → bV and using (4.24) we obtain lim inf N→∞

1 ZN,N ≥ −β8V , log N ZN −1,N

and the lemma is proved. $\square$

Given $\delta > 0$ we can use Lemma 4.5 to find $N_0(\delta)$ so that
\[ \frac{Z_{N-1,N}}{Z_{N,N}} \le e^{N\beta(\Phi_V + \delta)} \tag{4.32} \]
if $N \ge N_0(\delta)$. Since $F_{N-1,N}(s) \le 1$ we can combine (4.20) and (4.32) to get the estimate
\[ 1 - F_N(t) \le N e^{N\beta(\Phi_V+\delta)} \sum_{s\in A_N(t)^*} e^{-\frac{N\beta}{2}V_N(s)}\, H_{N-1,N}(s). \tag{4.33} \]
We have
\[ H_{N-1,N}(s) = E_{N-1,N}\Bigl[\prod_{i=1}^{N-1} |s - x_i|^{\beta}; s\Bigr] \le (1+s^2)^{\frac{\beta}{2}(N-1)}\, E^{0,0}_{N-1,N}\Bigl[\prod_{i=1}^{N-1} (1+x_i^2)^{\beta/2}; s\Bigr] \le e^{CN}(1+s^2)^{\beta N/2}, \]
where the last inequality is proved, using Lemma 4.1, just as (4.25) in [Jo]. Together with (4.1) this gives
\[ e^{-\frac{N\beta}{2}V_N(s)}\, H_{N-1,N}(s) \le e^{CN - \frac{N\beta\xi}{2}\log(1+s^2)}. \tag{4.34} \]
Hence, given a constant $D > 0$, there is a constant $d > 0$ such that
\[ e^{N\beta(\Phi_V+\delta)} \sum_{s\in A_N(d)^*} e^{-N\beta V_N(s)/2}\, H_{N-1,N}(s) \le e^{-ND}. \tag{4.35} \]
For $t \ge s$ we define
\[ H_{N-1,N}(t,s) = \frac{1}{Z_{N-1,N}(s)} \sum_{x\in A_N(s)^{N-1}} \prod_{i=1}^{N-1} |t - x_i|^{\beta}\, Q_{N-1,N}(x). \]
Clearly,
\[ H_{N-1,N}(s) = H_{N-1,N}(s,s) \le H_{N-1,N}(t,s) \tag{4.36} \]
if $t \ge s$. Combining the estimates (4.33), (4.35) and (4.36) we obtain
\[ 1 - F_N(t) \le N e^{-ND} + N e^{N\beta(\Phi_V+\delta)} \sum_{s\in A_N(t,d)} e^{-\frac{N\beta}{2}V_N(s)}\, H_{N-1,N}(s+\epsilon, s) \tag{4.37} \]
for any $\epsilon > 0$. Let $s_N \in A_N(t,d)$ be the $s$ which gives the largest term in the sum in (4.37). Then
\[ 1 - F_N(t) \le N e^{-ND} + N^2(d-t)\,e^{N\beta(\Phi_V + \delta - \frac12 V_N(s_N))}\, H_{N-1,N}(s_N+\epsilon, s_N). \tag{4.38} \]

Choose a sequence which gives the upper limit of N −1 log(1 − FN (t)) and such that sNj → σ ∈ [t, d]. We would like to prove that 1 log HNj −1,Nj (sNj + , sNj ) = −β lim j →∞ Nj

Z

log |σ + − t|φVσ (t)dt.

(4.39)

We will write N instead of Nj for simplicity. Looking at the definition of HN−1,N (t, s), we see that we are interested in the limit of PN −1 1 log EN−1,N [eβ j =1 log |sN +−xi | ; sN ] N

as N → ∞, sN → σ . Since | log |sN + −xi |−log |σ + −xi || = | log |1+

sN − σ |sN − σ | || ≤ C , (4.40) σ + − xi σ + − xi

where C is a numerical constant, and sN ≤ σ + /2 for N large enough, the limit (4.39) follows from Corollary 4.4. If t > bV , then φVσ = φV , since σ ≥ t, and combining (4.38) and (4.39) yields 1 log(1 − FN (t)) N N→∞ Z β ≤ max{−D, β8V + δ − V (σ ) − β log |σ + − t|−1 φV (t)dt}. 2

lim sup

(4.41)

Note that σ could depend on and d. Pick a sequence = j → 0+ and then a subsequence so that σ (jk ) → τ ∈ [t, d]. Then, since D and δ are arbitrary, we get lim sup N→∞

1 log(1 − FN (t)) ≤ β(8V − inf τ ≥t N

Z kV (τ, s)φV (s)ds)

(4.42)

and we have proved one half of (2.17). We now turn to the lower limit. If we start with M = N − 1 instead of N then (4.42) holds with FN−1 replaced by FN−1,N (t). By assumption the right-hand side of (4.42) is negative for all t > bV . Hence, if t > bV , we see that FN −1,N (t) ≥ 1/2

(4.43)

for all sufficiently large N. Note that, if t ≥ s, then HN−1,N (t) ≥

ZN−1,N (s) HN −1,N (t, s) ≥ FN −1,N (s)HN −1,N (t, s). ZN−1,N (t)

(4.44)


R The function f (τ ) = kV (τ, s)φV (s)ds is continuous on [t, ∞) and f (τ ) → ∞ as τ → ∞, so it assumes its minimum in [t, ∞) at some point τ0 ≥ t. Let > 0. Pick sN ∈ AN (t)∗ such that sN & τ0 + . Then, picking one term in the sum X

e−

Nβ 2 VN (s)

FN −1,N (s)HN −1,N (s)

s∈AN (t)∗

≥ e−

Nβ 2 VN (sN )

FN −1,N (τ0 )2 HN −1,N (sN , sN − ).

If we use the limit (4.39), the estimate (4.43) with s = τ0 , and let → 0+, we see that lim inf N→∞

1 log N

X s∈AN

β ≥ − V (τ0 ) − β 2

e−

Nβ 2 VN (s)

FN −1,N (s)HN −1,N (s)

(t)∗

(4.45)

Z

log |τ0 − t|

−1

φV (t)dt.

To complete the proof we need

Lemma 4.6. For any $V_N$ satisfying the conditions (i)–(iii),
\[ \liminf_{N\to\infty} \frac{1}{N}\log\frac{Z_{N-1,N}}{Z_{N,N}} \ge \beta\Phi_V. \tag{4.46} \]

Proof. If we let t → ∞ in (4.19), we see that, > 0, X Nβ ZN,N =N e− 2 VN (s) FN −1,N (s)HN −1,N (s) ZN−1,N s∈AN X Nβ e− 2 VN (s) FN −1,N (s)HN −1,N (s) ≤N

(4.47)

s∈AN (bV −)

+N

X

s∈AN (bV

e−

Nβ 2 VN (s)

HN −1,N (s),

−)∗

since FN−1,N (s) ≤ 1. By adjusting the constant C we see that (4.34) holds for all s ∈ AN , so the first sum in the right-hand side of (4.47) is ≤ eCN FN−1,N (bV − )

X

β

e− 2 N ξ log(1+s

2)

β

≤ eCN − 2 L(bV −)N

2

s∈AN

for all sufficiently large N by the first part of Theorem 2.2. (Replacing FN (t) by FN−1,N (t) does not make any difference.) Since L(bV − ) > 0 if > 0, the first part of the right hand side of (4.47) is negligible. The same argument that lead us from (4.33) to (4.42) allows us to treat the second term in the right-hand side of (4.47) and obtain ZN,N 1 log N Z N −1,N N→∞ Z β ≤ max{−D, − V (σ ) − β log |σ + η − t|−1 φVσ (t)dt}, 2

lim sup

(4.48)


where σ ∈ [bV − , d], η > 2, D > 0 are given. Take = j → 0+ so that σ ( )

σ (j ) → τ ∈ [bV , d] . Note that φV j (t)dt converges weakly to φVτ (t)dt = φV (t)dt. Using an inequality like (4.40) we get 1 ZN,N log ZN−1,N N→∞ N Z β ≤ max{−D, − V (τ ) − β log |τ + η − t|−1 φV (t)dt}. 2

lim sup

(4.49)

We can now repeat the argument that leads from (4.41) to (4.42) and obtain Z ZN,N β 1 V (s)φV (s)ds ≤ lim sup log ZN−1,N 2 N→∞ N Z kV (τ, s)φV (s)ds ≤ −β8V , − β inf since

R

τ ≥bV

t kV (τ, s)φV (s)ds ≥ FV if τ ≥ bV . The lemma is proved. u

Combining (4.20), (4.45) and Lemma 4.6, we see that 1 log(1 − FN (t)) N Z Z ≥ β(FV − kV (τ0 , s)φV (s)ds) = β(FV − inf kV (τ, s)φV (s)ds),

lim inf N→∞

τ ≥t

by the choice of τ0 . This completes the proof of Theorem 2.2. 5. Asymptotics for the Meixner Kernel This section is devoted to the proof of Lemma 3.2, which is based on establishing the appropriate asymptotics of the Meixner polynomials. See [Go] and [JW] for some results on the asymptotics of Meixner polynomials. From (3.3) we obtain, x ∈ R, √ √ n+K Z √ n! γ + z/ q x dz K,q n( γ) mn (x) = (−1) √ n √ √ K n+1 √ √ ( q) 2πi 0r γ + qz ( γ + qz) z (5.1) √ x √ n+K Z r √ n! γ − t/ q sin πx ( γ ) dt − , √ √ √γ − √qt (√γ − √qt)K t n+1 π ( q)n γq √ √ where 0r is the circle |z| = r, 0 < r < γ /q; if 0 < r ≤ γ q the second integral √ should be omitted. Let b = (1 + γ q)2 /(1 − q) as before, let σ be given by (1.11) and set √ √ ( γ + q)2 . a =b+γ −1= 1−q Set √ √ √ γ+ q γq + z , t (z) = √ √ √ γ + qz γq + 1 √ √ γ+ q s(z) = √ √ , γ + qz


and

For 0 < r <

√


bx (x + K − 1)!N ! γ K+N AN (x) = x+K x x!(N + K − 2)! 1 − q

r

q . γ

γ /q we define 1 2π

Dnr (x; g) =

Z

π

−π

g(reiθ )t (reiθ )x s(reiθ )K

dθ r n einθ

,

√ √ √ γ q, and if γ q < r < γ /q, then Z r dτ Fnr (x; g) = (−1)n+x+1 √ |t (−τ )|x s(−τ )K g(−τ ) n+1 . τ γq

(5.3)

Fnr (x; g) = 0 if 0 < r ≤

(5.4)

The powers are defined by taking the prinipal branch of the logarithm. The Meixner kernel (3.4) can now be written, for x, y integers (which is the case we need), p DN (x; g1 )DN (y; g2 ) − DN (x; g2 )DN (y; g1 ) AN (x)AN (y) x−y

(5.5)

KN (x, x) = AN (x)[DN (x − 1; g3 )DN (x; g2 ) − DN (x; g1 )DN (x − 1; g4 ) + FN (x; g1 )DN (x; g2 ) − FN (x; g2 )DN (x; g1 )],

(5.6)

KN (x, y) = if x 6 = y, and

where g1 (z) ≡ 1, g2 (z) = z − 1, g3 (z) = t (z) log t (z) and g4 (z) = g2 (z)g3 (z). The functions gi (z) are bounded for |z| ≤ 1. . . Write x = Nb + y and K = [γ N ] − N + 1 = N (γN − 1) = N (γ − 1) + ωN , 0 < ωN ≤ 1. Lemma 5.1. If x = Nb + ξ σ N 1/3 and M0 > 0 is a given constant, there are constants c1 (q, γ ) and c2 (q, γ ), such that 1 −2/3 AN (x) ≤ c1 (q, γ )ec2 (q,γ )ξ N N

(5.7)

for all ξ ≥ −M0 . Furthermore, √ γ q 1 AN (x) = √ N→∞ N (1 − q) ab lim

(5.8)

uniformly for |ξ | ≤ M0 . Proof. By Stirling’s formula (x + K)x+K N N bx (N + K)(N + K − 1) γ K+N x N +K x+K x (N + K) a x+K s r (x + K)N 1 q o(1) e . × x(N + K) 1 − q γ

AN (x) =

(5.9)


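Write $a_N = b + \gamma_N - 1$. Then,

$$\frac{(x+K)^{x+K}N^N}{x^x(N+K)^{N+K}}\,\frac{b^x\gamma^{K+N}}{a^{x+K}} = \left(\frac{Nb}{x}\right)^{x}\left(\frac{x+K}{Na_N}\right)^{x+K}\left(\frac{a_N}{a}\right)^{x+K}\left(\frac{\gamma}{\gamma_N}\right)^{N+K}. \tag{5.10}$$

Write $u = Na_N$ and $v = Nb < u$. Then

$$\left(\frac{Nb}{x}\right)^{x}\left(\frac{x+K}{Na_N}\right)^{x+K} = \left(1+\frac{y}{u}\right)^{u+y}\left(1+\frac{y}{v}\right)^{-v-y} \doteq e^{g(y)}.$$

Since $g(0) = g'(0) = 0$ and $g''(t) = (v-u)(u+t)^{-1}(v+t)^{-1} < 0$, we have $\exp g(t) \le 1$ if $\xi \ge 0$. If $-M_0 \le \xi \le M_0$, then

$$|g(t)| = \left|\int_0^t (t-s)\,g''(s)\,ds\right| \le CN^{-1/3}.$$

Furthermore

$$\left(\frac{a_N}{a}\right)^{x+K} = e^{\omega_N + O(\xi N^{-2/3}) + o(1)}$$

and

$$\left(\frac{\gamma}{\gamma_N}\right)^{K+N} = e^{-\omega_N + o(1)}.$$

Inserting these estimates into (5.10) we obtain

$$\frac{(x+K)^{x+K}N^N}{x^x(N+K)^{N+K}}\,\frac{b^x\gamma^{K+N}}{a^{x+K}} \le Ce^{C\xi N^{-2/3}}$$

for $\xi \ge -M_0$ and

$$\lim_{N\to\infty}\frac{(x+K)^{x+K}N^N}{x^x(N+K)^{N+K}}\,\frac{b^x\gamma^{K+N}}{a^{x+K}} = 1$$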

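uniformly for $|\xi| \le M_0$. By (5.9) this proves (5.7) and (5.8). The lemma is proved. □

Set

$$u(z) = b\log(\sqrt{\gamma q}+z) - a\log(\sqrt{\gamma}+\sqrt{q}z) - \log z$$

so that

$$D_N^r(x;g) = \frac{1}{2\pi}\int_{\Gamma_r} e^{N(u(z)-u(1)) + y\log t(z) + \omega_N\log s(z)}\,g(z)\,\frac{dz}{iz}. \tag{5.11}$$

Now,

$$u'(z) = -\rho(1-z)^2 - \rho(1-z)^3\,\frac{\sqrt{q}z^2 + (\sqrt{q}+\sqrt{\gamma}+q\sqrt{\gamma})z + \sqrt{q}+\sqrt{\gamma}+q\sqrt{\gamma}+\gamma\sqrt{q}}{z(z+\sqrt{\gamma q})(\sqrt{\gamma}+\sqrt{q}z)},$$

where

$$\rho = \frac{\gamma\sqrt{q}}{(1+\sqrt{\gamma q})(\sqrt{\gamma}+\sqrt{q})}.$$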
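The derivative of $u$ and the constant ρ can be checked numerically. The Python sketch below compares a central finite difference of $u$ with the closed form $u'(z) = -\rho(1-z)^2 - \rho(1-z)^3 Q(z)/D(z)$, where $Q$ is the quadratic numerator and $D(z) = z(z+\sqrt{\gamma q})(\sqrt{\gamma}+\sqrt{q}z)$, and it checks that $u(1-\delta)-u(1)$ behaves like $\rho\delta^3/3$ for small δ. The sign of the $(1-z)^3$ term, the helper names and the sample parameters are choices made for this sketch rather than something taken from the text.

```python
import math

def constants(q, gamma):
    """Return (a, b, rho) as defined in Section 5."""
    b = (1 + math.sqrt(gamma * q)) ** 2 / (1 - q)
    a = b + gamma - 1
    rho = gamma * math.sqrt(q) / ((1 + math.sqrt(gamma * q))
                                  * (math.sqrt(gamma) + math.sqrt(q)))
    return a, b, rho

def u(z, q, gamma):
    a, b, _ = constants(q, gamma)
    return (b * math.log(math.sqrt(gamma * q) + z)
            - a * math.log(math.sqrt(gamma) + math.sqrt(q) * z)
            - math.log(z))

def u_prime_closed(z, q, gamma):
    """Closed form of u'(z) with the (1 - z)^3 correction entering with a minus sign."""
    _, _, rho = constants(q, gamma)
    sq, sg = math.sqrt(q), math.sqrt(gamma)
    Q = sq * z * z + (sq + sg + q * sg) * z + sq + sg + q * sg + gamma * sq
    D = z * (z + math.sqrt(gamma * q)) * (sg + sq * z)
    return -rho * (1 - z) ** 2 - rho * (1 - z) ** 3 * Q / D

def u_prime_numeric(z, q, gamma, h=1e-6):
    """Central finite difference approximation of u'(z)."""
    return (u(z + h, q, gamma) - u(z - h, q, gamma)) / (2 * h)

def cubic_ratio(delta, q, gamma):
    """(u(1 - delta) - u(1)) / (rho delta^3 / 3); tends to 1 as delta -> 0."""
    _, _, rho = constants(q, gamma)
    return (u(1 - delta, q, gamma) - u(1.0, q, gamma)) / (rho * delta ** 3 / 3)

if __name__ == "__main__":
    q, gamma = 0.25, 1.0
    print(u_prime_closed(0.5, q, gamma), u_prime_numeric(0.5, q, gamma))
    print(cubic_ratio(1e-2, q, gamma))
```

For q = 1/4, γ = 1 one has a = b = 3 and ρ = 2/9, so the first line can also be compared with the direct evaluation of $3/(z+\tfrac12) - \tfrac32/(1+\tfrac12 z) - 1/z$ at z = 1/2.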


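Hence we can write

$$u(z) - u(1) = \frac{1}{3}\rho(1-z)^3 + \rho(1-z)^4 v(z), \tag{5.12}$$

where one verifies that $|v(z)| \le 28/27$ if $|z-1| \le 1/4$. By taking absolute values in (5.3) we obtain

$$|D_N^r(x;g)| \le \frac{C}{2\pi}\left(\frac{a}{b}\right)^{x/2}\frac{a^{K/2}(1-q)^{K/2}}{r^N}\int_{-\pi}^{\pi} e^{f(\cos\theta)}\,d\theta, \tag{5.13}$$

where

$$f(\tau) = \frac{x}{2}\log(\gamma q + r^2 + 2\sqrt{\gamma q}\,r\tau) - \frac{x+K}{2}\log(\gamma + qr^2 + 2\sqrt{\gamma q}\,r\tau).$$

Write $r = 1-\delta$, $0 \le \delta < 1$. A computation shows that $f'(\tau) \ge 0$ if (say)

$$y \ge -\delta\,\frac{1+q+2\sqrt{\gamma q}}{1-q}\,N, \tag{5.14}$$

which covers all the y's we are interested in. Thus, if (5.14) is fulfilled, then

$$|D_N^r(x;g)| \le C\exp\big(N(u(1-\delta)-u(1)) + y\log t(1-\delta)\big). \tag{5.15}$$

By (5.12),

$$u(1-\delta) - u(1) \le \rho\delta^3\left(\frac{1}{3} + \frac{28}{27}\delta\right) \le \frac{2}{3}\rho\delta^3 \tag{5.16}$$

if $0 \le \delta \le 1/4$. Now,

$$\log t(1-\delta) = \log\left(1 - \frac{1}{1-\frac{\sqrt{q}}{\sqrt{\gamma}+\sqrt{q}}\,\delta}\;\frac{(1-q)\sqrt{\gamma}}{(1+\sqrt{\gamma q})(\sqrt{\gamma}+\sqrt{q})}\,\delta\right) \le -\rho(1-q)\frac{1}{\sqrt{\gamma q}}\,\delta,$$

and consequently it follows from (5.15) and (5.16) that, if $y \ge 0$, then

$$|D_N^r(x;g)| \le C\exp\left[\frac{2N}{3}\rho\delta^3 - \rho(1-q)\frac{1}{\sqrt{\gamma q}}\,\delta y\right]. \tag{5.17}$$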

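Recall that $y = \sigma N^{1/3}\xi$ with σ given by (1.11). Note that $\sigma = (1-q)^{-1}\sqrt{\gamma q}\,\rho^{-2/3}$. Choose $\delta = (\rho N)^{-1/3}\sqrt{\xi}$ if $\xi \le (N\rho)^{2/3}/16$ and $\delta = 1/4$ if $\xi \ge (N\rho)^{2/3}/16$. Inserting this into (5.17) gives

$$|D_N^r(x;g)| \le C\exp\left(-\frac{1}{3}\min\Big(\sqrt{\xi}, \frac{1}{4}(N\rho)^{1/3}\Big)\xi\right) \tag{5.18}$$

for $\xi \ge 0$. Let $\varepsilon \in [0,\pi]$ and set

$$I_1' = \frac{1}{2\pi}\int_{-\varepsilon}^{\varepsilon} g(re^{i\theta})\,t(re^{i\theta})^x s(re^{i\theta})^K\,\frac{d\theta}{r^N e^{iN\theta}}, \qquad I_1'' = D_N^r(x;g) - I_1'.$$

By the same argument that was used for (5.13) above, we see that if y satisfies (5.14), then

$$|I_1''| \le C\,|t(re^{i\varepsilon})|^x\,|s(re^{i\varepsilon})|^K\,\frac{1}{r^N} \le C\exp\big(N\,\mathrm{Re}\,(u(re^{i\varepsilon})-u(1)) + y\log|t(re^{i\varepsilon})|\big). \tag{5.19}$$

Next, we consider $F_N^r(x;g)$, $\sqrt{\gamma q} < r \le 1$. Taking absolute values in (5.4) yields

$$|F_N^r(x;g)| \le C\int_{\sqrt{\gamma q}}^{r}\left|\frac{\sqrt{\gamma q}-\tau}{\sqrt{\gamma q}+1}\right|^{x}\left(\frac{\sqrt{\gamma}+\sqrt{q}}{\sqrt{\gamma}-\sqrt{q}\tau}\right)^{x+K}\frac{d\tau}{\tau^{N+1}}. \tag{5.20}$$

The integrand in (5.20) is an increasing function of τ for all x that we are considering. The monotonicity argument used for (5.13) now shows that, if (5.14) is fulfilled, then

$$|F_N^r(x;g)| \le C\,|t(-r)|^x\,|s(-r)|^K\,\frac{1}{r^N} \le C\,|t(re^{i\varepsilon})|^x\,|s(re^{i\varepsilon})|^K\,\frac{1}{r^N} \le C\exp\big(N\,\mathrm{Re}\,(u(re^{i\varepsilon})-u(1)) + y\log|t(re^{i\varepsilon})|\big), \tag{5.21}$$

where the last inequality is the same as in (5.19). If we take $\varepsilon = 0$, we get the same right-hand side as in (5.15) and hence we obtain the same estimates, i.e.

$$|F_N^r(x;g)| \le C\exp\left(-\frac{1}{3}\min\Big(\sqrt{\xi}, \frac{1}{4}(N\rho)^{1/3}\Big)\xi\right).$$

Combining this with (5.6), (5.7) and (5.18) yields

$$|K_N(x,x)| \le CN\exp\left(-\frac{1}{4}\min\Big(\sqrt{\xi}, \frac{1}{4}(N\rho)^{1/3}\Big)\xi\right) \tag{5.22}$$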

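for any $\xi \ge 0$; x an integer. Consider now $\xi \in [-M_0, (\rho N)^{1/6}]$. Take $\varepsilon = (\rho N)^{-1/4}$, $\delta = \eta(\rho N)^{-1/3} \le (\rho N)^{-1/4}$, where $\eta > 0$ will be chosen below. By (5.12), we have

$$I_1' = \frac{1}{2\pi}\int_{-\varepsilon}^{\varepsilon} g((1-\delta)e^{i\theta})\exp\Big[N\Big(\frac{1}{3}\rho(1-(1-\delta)e^{i\theta})^3 + \rho(1-(1-\delta)e^{i\theta})^4 v((1-\delta)e^{i\theta})\Big) + y\log t((1-\delta)e^{i\theta}) + \omega_N\log s((1-\delta)e^{i\theta})\Big]\,d\theta. \tag{5.23}$$

We make the change of variables $\theta = \omega(\rho N)^{-1/3}$. For $0 < \eta \le (\rho N)^{1/12}$, $|\theta| \le \varepsilon$, we have

$$N\Big[\frac{1}{3}\rho(1-(1-\delta)e^{i\theta})^3 + \rho(1-(1-\delta)e^{i\theta})^4 v((1-\delta)e^{i\theta})\Big] = \frac{1}{3}(\eta - i\omega)^3 + R_1, \tag{5.24}$$

where $R_1 \to 0$ uniformly as $N \to \infty$. Furthermore, if $\xi \in [-M_0, (\rho N)^{1/6}]$, then

$$y\log t((1-\delta)e^{i\theta}) = (-\eta + i\omega)\xi + R_2, \tag{5.25}$$

where $R_2 \to 0$ uniformly as $N \to \infty$.

Suppose $g^{(j)}(1) = 0$, $j = 0,\dots,\ell-1$ but $g^{(\ell)}(1) \ne 0$, so that

$$g((1-\delta)e^{i\theta}) = \frac{1}{\ell!}\,g^{(\ell)}(1)\,(\rho N)^{-\ell/3}(-\eta + i\omega)^{\ell} + \dots. \tag{5.26}$$

We now have all the estimates we need. Let $\eta = \sqrt{\xi}$ if $\xi \ge M_0$ and $\eta = 1$ if $|\xi| \le M_0$. By (5.12) and (5.24) we obtain

$$\mathrm{Re}\,N\big(u((1-\delta)e^{i\theta}) - u(1)\big) = \frac{1}{3}\eta^3 - \eta\omega^2 + R_1$$

and hence, if $\xi \in [-M_0, (\rho N)^{1/6}]$, $\varepsilon = \omega(\rho N)^{-1/3}$ with $\omega = (\rho N)^{1/12}$, (5.19) yields

$$|I_1''| \le C\exp\Big(\frac{1}{3}\eta^3 - \eta(\rho N)^{1/6} - \eta\xi + R_3\Big) \le \frac{C}{N^{(\ell+1)/3}}\exp\Big(-\frac{2}{3}|\xi|^{3/2}\Big). \tag{5.27}$$

Similarly, by (5.21), for $\xi \in [-M_0, (\rho N)^{1/6}]$,

$$|I_1'| \le \frac{C}{N^{(\ell+1)/3}}\exp\Big(-\frac{2}{3}|\xi|^{3/2}\Big). \tag{5.29}$$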

The dominated convergence theorem gives

$$\lim_{N\to\infty} N^{(\ell+1)/3} I_1' = \frac{\rho^{-(\ell+1)/3}}{\ell!}\,g^{(\ell)}(1)\,\frac{1}{2\pi}\int_{-\infty}^{\infty}(-\eta+i\omega)^{\ell}\exp\Big(\frac{i}{3}(\omega+i\eta)^3 + i\xi(\omega+i\eta)\Big)\,d\omega = \frac{\rho^{-(\ell+1)/3}}{\ell!}\,g^{(\ell)}(1)\,\mathrm{Ai}^{(\ell)}(\xi), \tag{5.30}$$

uniformly for $|\xi| \le M_0$. Observe that $g_1(1) = 1$, $g_2(1) = 0$ but $g_2'(1) = 1$, $g_3(1) = 0$ but $g_3'(1) = \rho(1-q)(\gamma q)^{-1/2}$, and $g_4(1) = g_4'(1) = 0$ but $g_4''(1) = 2\rho(1-q)(\gamma q)^{-1/2}$. Combining (5.27) and (5.29) we obtain

$$|D_N^r(x;g)| \le \frac{C}{N^{(\ell+1)/3}}\exp\Big(-\frac{2}{3}|\xi|^{3/2}\Big), \tag{5.31}$$

for $\xi \in [-M_0, (\rho N)^{1/6}]$. The estimate (5.27) and the limit (5.30) give

$$\lim_{N\to\infty} N^{1/3} D_N^r(x;g_1) = \rho^{-1/3}\,\mathrm{Ai}(\xi), \tag{5.32a}$$

$$\lim_{N\to\infty} N^{2/3} D_N^r(x;g_2) = \rho^{-2/3}\,\mathrm{Ai}'(\xi), \tag{5.32b}$$

$$\lim_{N\to\infty} N^{2/3} D_N^r(x;g_3) = \frac{\rho^{1/3}(1-q)}{\sqrt{\gamma q}}\,\mathrm{Ai}'(\xi), \tag{5.32c}$$

and

$$\lim_{N\to\infty} N D_N^r(x;g_4) = \frac{1-q}{\sqrt{\gamma q}}\,\mathrm{Ai}''(\xi). \tag{5.32d}$$

We can now use (5.22), (5.28), (5.31) and (5.32) in (5.5) and (5.6) to prove (3.6), (3.7) and (3.8) for the Meixner kernel. The lemma is proved.
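The contour-integral representation of the Airy function that appears in (5.30) can be checked numerically. The sketch below, in plain Python, applies the trapezoid rule to $\frac{1}{2\pi}\int(-\eta+i\omega)^{\ell}\exp\big(\frac{i}{3}(\omega+i\eta)^3 + i\xi(\omega+i\eta)\big)\,d\omega$ and compares the result at ξ = 0 with the classical values $\mathrm{Ai}(0) = 3^{-2/3}/\Gamma(2/3)$ and $\mathrm{Ai}'(0) = -3^{-1/3}/\Gamma(1/3)$. The function name, the truncation of the integral to a finite window and the step count are choices made for this illustration.

```python
import cmath
import math

def airy_contour(xi, ell=0, eta=1.0, half_width=12.0, steps=24000):
    """Trapezoid-rule value of
    (1/2pi) * integral over w of (-eta + i w)^ell * exp(i (w + i eta)^3 / 3 + i xi (w + i eta)).
    For eta > 0 the integrand decays like exp(-eta w^2), so a finite window suffices."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        w = -half_width + k * h
        z = complex(w, eta)  # z = w + i*eta, so 1j*z = -eta + i*w
        term = (1j * z) ** ell * cmath.exp(1j * z ** 3 / 3 + 1j * xi * z)
        total += term if 0 < k < steps else 0.5 * term
    return total * h / (2 * math.pi)

# Classical closed forms for Ai(0) and Ai'(0).
AI_0 = 3 ** (-2 / 3) / math.gamma(2 / 3)
AI_PRIME_0 = -(3 ** (-1 / 3)) / math.gamma(1 / 3)

if __name__ == "__main__":
    print(airy_contour(0.0).real, AI_0)
    print(airy_contour(0.0, ell=1).real, AI_PRIME_0)
```

The value should not depend on the shift η > 0, which is the freedom used in the proof when η is chosen as a function of ξ.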


6. The Equilibrium Measure

The equilibrium measure $\phi_V(t)dt$ satisfies certain variational conditions.

Proposition 6.1. Assume that $\phi \in \mathcal{A}_s$ satisfies

(i) $\int_0^s k_V(t,\tau)\phi(\tau)\,d\tau \ge \lambda$ if $\phi(t) = 0$,
(ii) $\int_0^s k_V(t,\tau)\phi(\tau)\,d\tau \le \lambda$ if $\phi(t) = 1$,
(iii) $\int_0^s k_V(t,\tau)\phi(\tau)\,d\tau = \lambda$ if $0 < \phi(t) < 1$,

for some λ (which $= F_V$). Then $\phi = \phi_V$.

We will not prove this here, see [LL] for a very similar result. The way to compute $\phi_V$ is to seek a candidate solution φ and then verify that φ satisfies the variational conditions. In a region where $0 < \phi(t) < 1$ we can differentiate (iii) and obtain

$$\int_0^s \frac{\phi(\tau)}{\tau - t}\,d\tau = -\frac{1}{2}V'(t). \tag{6.1}$$

Since $V^{\gamma,q}$ is convex the support of $\phi_V$ is a single interval. If we consider the variational problem without the constraint $\phi \le 1$, and this problem has a solution $\psi_0$ such that $0 \le \psi_0 \le 1$, then this $\psi_0$ is the solution we are seeking. This is the case when $\gamma \ge 1/q$, and then $[a_V, b_V] = [a,b]$ and

$$\int_a^b \frac{\phi(\tau)}{\tau - t}\,d\tau = -\frac{1}{2}V'(t), \quad a \le t \le b. \tag{6.2}$$

We must have $\phi(b) = 0$ and $\phi(a)$ bounded ($\phi(a) = 0$ if $\gamma > 1/q$). If the solution $\psi_0(t) > 1$ in some interval, e.g. $\psi_0(t) > 1$ in $[0,a_0)$ but $0 < \psi_0(t) < 1$ in $(a_0,b_0)$, we make an ansatz that $\phi(t) = 1$ in $[0,a]$ and $0 < \phi(t) < 1$ in $(a,b)$ for some a, b, $[a_V,b_V] = [0,b]$. This is the situation when $\gamma < 1/q$. By (6.1),

$$\int_a^b \frac{\phi(\tau)}{\tau - t}\,d\tau = -\frac{1}{2}V'(t) - \int_0^a \frac{d\tau}{\tau - t}, \tag{6.3}$$

and $\phi(a) = 1$, $\phi(b) = 0$. By making the substitution $x = 2(t-a)/c - 1$, $y = 2(\tau-a)/c - 1$, $c = b-a$, in (6.2) and (6.3) we get an equation of the form

$$\frac{1}{\pi}\int_{-1}^{1}\frac{v(x)}{x-y}\,dx = f(y), \quad -1 \le x \le 1, \tag{6.4}$$

with some f. This equation has the general solution, [Tr],

$$v(x) = -\frac{1}{\pi\sqrt{1-x^2}}\int_{-1}^{1}\frac{f(y)\sqrt{1-y^2}}{y-x}\,dy + \frac{C}{\pi\sqrt{1-x^2}},$$

where C is an arbitrary constant. In this way we obtain (2.19) and (2.20). Equation (2.21) is obtained by substituting (2.19) or (2.20) into (2.15) (the infimum is assumed for $\tau = t$). Consider the case $\gamma > 1/q$; the other case is similar. Then, with $t = a + c(x+1)/2$,

$$J(t) = \int_b^t J'(s)\,ds = \frac{c}{2}\int_1^x J'(a + c(y+1)/2)\,dy$$


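and

$$g(y) \doteq J'(a + c(y+1)/2) = \frac{c}{2}\int_{-1}^{1}\frac{v(x)}{x-y}\,dx + \frac{1}{2}V'(a + c(y+1)/2) = \frac{1}{2}\int_{-1}^{1}\log|y-x|\,v'(x)\,dx + \frac{c}{2}\Big[\log\frac{1}{q} - \log(y+B) + \log(y+D)\Big].$$

Now,

$$v'(x) = \frac{1}{2\pi}\left(\frac{\sqrt{D^2-1}}{x+D} - \frac{\sqrt{B^2-1}}{x+B}\right)\frac{1}{\sqrt{1-x^2}}$$

and

$$\int_{-1}^{1}\log|y-x|\,v'(x)\,dx = \frac{1}{2}F(y,D) - \frac{1}{2}F(y,B),$$

where

$$F(y,R) = \frac{1}{\pi}\int_{-1}^{1}\frac{\sqrt{R^2-1}}{(x+R)\sqrt{1-x^2}}\log|y-x|\,dx.$$

Note that

$$\frac{d}{dy}F(y,R) = \frac{\sqrt{R^2-1}}{y+R}\left[\frac{1}{\sqrt{R^2-1}} + \frac{1}{\sqrt{y^2-1}}\right].$$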

Using these formulas we see that $g(-1) = 0$ and hence

$$J(t) = \frac{c}{4}\int_1^x g(y)\,dy = \frac{c}{4}\int_1^x (x-y)\,g'(y)\,dy = \frac{c}{4}\int_1^x (x-y)\left(\frac{\sqrt{B^2-1}}{y+B} - \frac{\sqrt{D^2-1}}{y+D}\right)\frac{dy}{\sqrt{y^2-1}},$$

which gives (2.21). If $f(y) = (\gamma-q)(y+B)^{-1} + (1-q\gamma)(y+D)^{-1}$, then $f(y) > 0$ for all $y \ge 1$ and $a_0 = \inf_{1\le y\le 1/c} f(y) > 0$. Thus for $0 \le \delta \le 1$, by (2.21),

$$J(b+\delta) \ge \frac{a_0 c}{8\sqrt{q\gamma}}\int_1^{1+\delta/c}\Big(1 + \frac{2\delta}{c} - y\Big)\frac{dy}{\sqrt{y+1}\sqrt{y-1}} \ge c_1\delta^{3/2},$$

for some constant $c_1 > 0$. If $\delta \ge 1$, then

$$J(b+\delta) \ge \frac{a_0 c}{8\sqrt{q\gamma}}\int_1^{1+1/c}\Big(1 + \frac{2\delta}{c} - y\Big)\frac{dy}{\sqrt{y+1}\sqrt{y-1}},$$

which proves (2.22). A more careful computation for small δ yields (2.23).

Acknowledgements. I thank C. Newman for drawing my attention to the fact that the exponent χ = 1/3 occurs in many problems. I want to express my gratitude to A. Dembo and P. Diaconis for telling me about the interpretation of G(M, N) in terms of randomly growing Young diagrams and the exclusion process. Remark 2.6 was motivated by discussions with J. Baik. Finally I thank the referee for pointing out some mistakes in a previous version. This work was supported by the Swedish Natural Science Research Council (NFR).


References

[BR] Baik, J., Rains, E.: The asymptotics of monotone subsequences of involutions. math.CO/9905084
[BDJ] Baik, J., Deift, P.A., Johansson, K.: On the distribution of the longest increasing subsequence in a random permutation. J. Amer. Math. Soc. 12, 1119–1178 (1999)
[BDG] Ben Arous, G., Dembo, A., Guionnet, A.: Ageing of Spherical Spin Glasses. Preprint
[BG] Ben Arous, G., Guionnet, A.: Large Deviations for Wigner's Law and Voiculescu's Non-commutative Entropy. Probab. Theory Relat. Fields 108, 517–542 (1997)
[BO] Borodin, A., Olshanski, G.: Statistics on Partitions, Point Processes and the Hypergeometric Kernel. math.CO/9904010
[Ch] Chihara, T.S.: An Introduction to Orthogonal Polynomials. New York: Gordon and Breach, 1978
[De] Deift, P.A.: Orthogonal polynomials and random matrices: A Riemann–Hilbert approach. New York: Courant Lecture Notes in Mathematics, 3, 1999
[DS] Dragnev, P.D., Saff, E.B.: Constrained energy problems with applications to orthogonal polynomials of a discrete variable. J. Anal. Math. 72, 223–259 (1997)
[Fu] Fulton, W.: Young Tableaux. London Mathematical Society, Student Texts 35, Cambridge: Cambridge Univ. Press, 1997
[Go] Goh, W.M.Y.: Plancherel–Rotach Type Asymptotics of the Meixner Polynomials. Preprint
[HP] Hiai, F., Petz, D.: A Large Deviation Theorem for the Empirical Eigenvalue Distribution of Random Unitary Matrices. Math. Inst. of the Hungarian Academy of Sciences, Preprint No. 17/1997
[HJ] Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge: Cambridge Univ. Press, 1985
[Hö] Hörmander, L.: The Analysis of Linear Partial Differential Operators I. Berlin–Heidelberg: Springer-Verlag, 1983
[Ja] James, A.T.: Distributions of Matrix Variates and Latent Roots Derived from Normal Samples. Ann. Math. Statist. 35, 475–501 (1964)
[JW] Jin, X.-S., Wang, R.: Uniform Asymptotic Expansions for Meixner Polynomials. Constr. Approx. 14, 113–150 (1998)
[JPS] Jockusch, W., Propp, J., Shor, P.: Random domino tilings and the arctic circle theorem. Preprint 1995, math.CO/9801068
[Jo] Johansson, K.: On Fluctuations of Eigenvalues of Random Hermitian Matrices. Duke Math. J. 91, 151–204 (1998)
[Kn] Knuth, D.E.: Permutations, Matrices and Generalized Young Tableaux. Pacific J. Math. 34, 709–727 (1970)
[KS] Krug, J., Spohn, H.: Kinetic Roughening of Growing Interfaces. In: Solids far from Equilibrium: Growth, Morphology and Defects, Ed. C. Godrèche, Cambridge: Cambridge University Press, 1992, pp. 479–582
[LL] Lax, P.D., Levermore, C.D.: The Small Dispersion Limit of the Korteweg–de Vries Equation. I. Commun. Pure and Appl. Math. 36, 253–290 (1983)
[Li] Liggett, T.M.: Interacting Particle Systems. New York: Springer-Verlag, 1985
[NP] Newman, C.M., Piza, M.S.T.: Divergence of Shape Fluctuations in Two Dimensions. Ann. Prob. 23, 977–1005 (1995)
[NSU] Nikiforov, A.F., Suslov, S.K., Uvarov, V.B.: Classical Orthogonal Polynomials of a Discrete Variable. Springer Series in Computational Physics, Berlin–Heidelberg: Springer-Verlag, 1991
[Pi] Piza, M.S.T.: Directed Polymers in a Random Environment: Some Results on Fluctuations. J. Stat. Phys. 89, 581–603 (1997)
[Ro] Rost, H.: Non-Equilibrium Behaviour of a Many Particle Process: Density Profile and Local Equilibria. Zeitschrift f. Wahrsch. Verw. Gebiete 58, 41–53 (1981)
[ST] Saff, E.B., Totik, V.: Logarithmic Potentials with External Fields. Grundlehren der Mathematischen Wissenschaften, 316, Berlin: Springer-Verlag, 1997
[Sa] Sagan, B.: The Symmetric Group. Brooks/Cole Publ. Comp., 1991
[Se1] Seppäläinen, T.: Large Deviations for Increasing Subsequences on the Plane. Probab. Theory Relat. Fields 112, 221–244 (1998)
[Se2] Seppäläinen, T.: Coupling the totally asymmetric simple exclusion process with a moving interface. Markov Process. Related Fields 4, 593–628 (1998)
[Sz1] Szegö, G.: Orthogonal Polynomials. American Mathematical Society Colloquium Publications, Vol. XXII, New York, 1939
[TW1] Tracy, C.A., Widom, H.: Level Spacing Distributions and the Airy Kernel. Commun. Math. Phys. 159, 151–174 (1994)
[TW2] Tracy, C.A., Widom, H.: On Orthogonal and Symplectic Matrix Ensembles. Commun. Math. Phys. 177, 727–754 (1996)
[TW3] Tracy, C.A., Widom, H.: Correlation Functions, Cluster Functions, and Spacing Distributions for Random Matrices. J. Statist. Phys. 92, 809–835 (1998)
[Tr] Tricomi, F.G.: Integral Equations. Pure Appl. Math. V, London: Interscience, 1957

Communicated by A. Kupiainen

Commun. Math. Phys. 209, 477 – 516 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Renormalization Proof for Spontaneously Broken Yang–Mills Theory with Flow Equations

Christoph Kopper¹, Volkhard F. Müller²

¹ Centre de Physique Théorique de l'Ecole Polytechnique, 91128 Palaiseau, France
² Fachbereich Physik, Universität Kaiserslautern, 67653 Kaiserslautern, Germany

Received: 4 February 1999 / Accepted: 20 August 1999

Abstract: In this paper we present a renormalizability proof for spontaneously broken SU(2) gauge theory. It is based on Flow Equations, i.e. on the Wilson renormalization group adapted to perturbation theory. The power counting part of the proof, which is conceptually and technically simple, follows the same lines as that for any other renormalizable theory. The main difficulty stems from the fact that the regularization violates gauge invariance. We prove that there exists a class of renormalization conditions such that the renormalized Green functions satisfy the Slavnov–Taylor identities of SU(2) Yang–Mills theory on which the gauge invariance of the renormalized theory is based.

1. Introduction

In the early seventies Wilson and his collaborators published their ideas on the renormalization group and effective Lagrangians [WiKo], which have stimulated the progress of quantum field theory and statistical mechanics ever since. In 1984 Polchinski [Pol] showed that these ideas are suited as a basis for perturbative renormalization theory.¹ He proved Euclidean massive $\Phi^4_4$ to be renormalizable without introducing Feynman diagrams, thus sidestepping the associated complicated analysis of their divergence/convergence properties. Instead, the problem is solved by bounding inductively the solutions of a system of first order differential equations, the Flow Equations (FE), which are a reduction of the Wilson FE to their perturbative content.

Over the past decade Polchinski's argument has been considerably simplified technically, extended to physical renormalization conditions and has been rendered rigorous [KKSa]. Beyond that, it has been applied, again in mathematical rigour, to nearly all situations of physical interest: The $\Phi^4_4$ proof itself already also holds for any other massive theory with global symmetries only and renormalizable power counting, like e.g. the Yukawa models, O(N)-models etc. It could then be extended to Euclidean massless $\Phi^4_4$ [KK1] and QED$_4$ [KK2] and also to theories in Minkowski space [KKSc]. The FE method also served to extract properties of, or bounds on, Green functions which were harder, if at all, to get by other methods. We mention composite operator renormalization together with (generalized) Zimmermann identities [KK3], Wilson's operator product expansion [KK4], Symanzik improvement in the convergence of the regularized theory [Ke1, Wie], de Calan–Rivasseau large order bounds on perturbation theory [Ke2], bounds on the singularities of Green functions at exceptional momenta [KK1], analyticity properties of Green functions in Minkowski space [KKSc] and decoupling theorems [Kim]. Renormalization of the nonlinear σ-model on the ultraviolet side is analysed in [MiRa]. A recent review (in German) on previous work on FEs can be found in [Kop].

We should also mention that the interest in FEs over the last decade goes far beyond mathematical physics and has led to many interesting results, ideas and calculations in theoretical physics. To give a few examples we mention that critical exponents for $\Phi^4_4$-type theories have been calculated in [TeWe]. Truncated FE have also been applied to the bound state problem in [Ell], to Yang–Mills theory in [EHW], in particular to the study of vacuum condensates in [ReWe].

Among the entries in our list of solved renormalization problems there is still one missing, which is of fundamental importance in physics, namely nonabelian gauge theory. The present paper is intended to close this gap by treating spontaneously broken SU(2) Yang–Mills theory, which corresponds to the weak sector of the standard model.² Another interesting problem, which should be studied, is QCD where the problem of gauge invariance is intertwined with the infrared problem. Since the latter has already been extensively studied we chose the spontaneously broken theory which is infrared finite and thus simpler. On the other hand the Slavnov–Taylor identities (STI) or Ward identities of the spontaneously broken symmetry are more complicated to analyse.³

The (ultraviolet) power counting part of the FE renormalization proof is (up to notational and other minor changes) the same and simple for all the above mentioned theories, which renders the method attractive. Gauge theories, however, present a difficulty coming from the well-known fact that gauge symmetry is broken by cutoffs in momentum space, and it is just the flow of such a cutoff which produces the FE. What we have to show is that gauge invariance is restored when the cutoffs are taken away. On the level of the Green functions (which are not gauge invariant) this means that we have to verify the STI of the theory. They then allow one to argue that physical quantities such as the S-matrix are gauge-invariant [ZiJ].

On analysing the FE for a gauge theory one realizes that the restoration of the STI depends on the choice of the renormalization conditions and cannot be true in general. More precisely, since gauge invariance is violated in the regularized theory, the renormalization group flow will generally produce nonvanishing contributions to all those relevant parameters of the theory which are forbidden by gauge invariance, e.g. a noninvariant gauge field self-coupling of the form $(A^2)^2$. The question is then: Can we use the freedom in adjusting the renormalization conditions such that the STI are nevertheless restored in the end? To answer this question a first observation, already encountered when treating QED, is crucial: The violation of the STI in the regularized theory can be expressed through Green functions carrying an operator insertion, which depends on the regulators. FE theory for such insertions tells us that these Green functions will vanish once the cutoffs are removed, if we achieve renormalization conditions on the theory such that the inserted Green functions (uniquely calculated from those) have vanishing renormalization conditions for all relevant terms, i.e. up to the dimension of the insertion (which is 5 in our case). Comparing the number of relevant terms for the SU(2) theory, 37 (see App. A), and for the insertion, 53 (see App. C), we realize that it is not possible to make 53 terms vanish by adjusting 37 free parameters, unless there are linear interdependences. It is again the FE (in its global integrated form) which helps us to make these interdependences transparent. The problem of how to find one's way through the STI and adjust the renormalization conditions appropriately is somewhat complicated through spontaneous symmetry breaking, since the latter mixes Green functions of different dimension.

One may of course ask the question whether such a proof of the renormalizability of Yang–Mills theory is still necessary in view of the fact that the problem has been settled in the seventies by the pioneering work of 't Hooft and Veltman and successors. Without going into details or giving references on work which has made entrance into nearly all textbooks on quantum field theory or particle physics we would still like to mention that there rests a bit of uneasiness on the mathematical physicists' side on the form in which the subject has settled in the course of time. This is because the standard way in which the argument is presented nowadays is based on two main ingredients: the existence of an invariant regularization scheme, i.e. dimensional regularization, and algebraic manipulations on generating functionals, which can be given rigorous meaning for regularized path integral formulations. To date nobody has achieved a (rigorous) definition of dimensionally regularized path integrals so that there remains a gap in the reasoning which could only be closed if the analysis of the STI were directly performed on individual Feynman graphs, a presumably awkward procedure. These arguments do not apply to the lattice regularization,⁴ which allows for a (particularly transparent) path integral formulation while respecting gauge invariance. It violates Euclidean or Lorentz symmetry however. We emphasize the work of Reiß as a largely coherent and rigorous analysis of the perturbative renormalization problem of (QCD type) gauge theories on the lattice [Rei]. His work is based on an adaptation of BPHZ renormalization to the lattice, where quite a number of new problems appear.

As a guide to the logical structure of the paper we now expose the main line of arguments. Our starting point is a massive UV regularized theory. The generating functional $L^{\Lambda,\Lambda_0}$ of the connected amputated Green functions (CAG) with momenta in the interval $[\Lambda,\Lambda_0]$ satisfies a flow equation (35) with respect to Λ, which when reduced to its perturbative content (37) permits one to bound inductively the l-loop n-point functions $L^{\Lambda,\Lambda_0}_{l,n}$ in such a way (39, 43) that their existence for $\Lambda_0 \to \infty$ becomes obvious. This is true for all theories renormalizable by power counting under the condition that all relevant terms, i.e. local terms of mass dimension ≤ 4, are fixed by ($\Lambda_0$-independent) renormalization conditions (r.c.). In gauge theories the number of such terms is generally much bigger than the number of free parameters of the theory. For our model the respective numbers are 37 (listed in App. A) and 8 (cf. (121)). So most of the r.c. cannot be freely chosen for a gauge theory. A priori it does not seem possible to guess which r.c. are the right ones. Thus we analyse the action $L^{0,\Lambda_0}$ for general r.c. and expose the violation of the STI as a functional associated with an operator insertion, which turns out to be of dimension 5.

¹ Wilson himself remarked already in the late sixties that this should be possible, as we learned from E. Brézin.
² For vanishing Weinberg angle. This is however not of decisive importance for the line of the argument. It matters insofar as the explicit description and treatment of the whole SU(2)×U(1)-theory would require much more space.
³ We mention also that FE and STI for pure Yang–Mills theory in the limit case without UV cutoff have been considered in [BAM2].
⁴ The above mentioned algebraic analysis is however based on the continuum formulation.

We denote it as $L_1 = L_1^{0,\Lambda_0}$ (75). This is achieved by using an UV regularized version (62, 66) of the BRS transformation (13, 14, 18). General results from FE theory tell us that $L_1^{0,\Lambda_0}$ will vanish for $\Lambda_0 \to \infty$ if all its relevant terms, i.e. the local parts of dimension ≤ 5, are fixed to be 0 by the r.c. and if the irrelevant terms in $L_1^{\Lambda_0,\Lambda_0}$ vanish sufficiently rapidly for $\Lambda_0 \to \infty$ (110). The 53 renormalization parts for $L_1^{0,\Lambda_0}$ (see App. C) are functions of the 37 r.c. for $L^{0,\Lambda_0}$ and 7 free parameters in the BRS transformation (see App. B). Thus if the model can be renormalized respecting the STI there must be linear interdependences among the 53 relations. These are not explicit in the theory $L^{0,\Lambda_0}$, since $L^{0,\Lambda_0}$ contains irrelevant terms of arbitrary dimension which are not known explicitly. We therefore derive the violated Slavnov–Taylor identities (VSTI) also in terms of the bare functionals $L^{\Lambda_0,\Lambda_0}$ and $L_1^{\Lambda_0,\Lambda_0}$ (98, 99), using again the FE for that purpose. The FE may also be used (104, 113–120) to relate $L_1^{0,\Lambda_0}$ and $L_1^{\Lambda_0,\Lambda_0}$ with each other (111, 112) so that – respecting the inductive procedure, i.e. climbing up in the loop order l, and for given l in the number of external legs n – we may hope to satisfy the STI (for $\Lambda_0 \to \infty$) as well by imposing the relevant terms in $L_1^{\Lambda_0,\Lambda_0}$ to vanish (instead of those in $L_1^{0,\Lambda_0}$). Since $L^{\Lambda_0,\Lambda_0}$ does not contain unknown irrelevant terms,⁵ an explicit analysis of the bare STI is possible, and we can make 53 terms vanish order by order in l by appropriately fixing $L^{\Lambda_0,\Lambda_0}$ and the free BRS constants. However starting at the wrong end, i.e. fixing counterterms instead of r.c., we cannot prove renormalizability. Thus the task is threefold:

i) Reveal a number of free renormalization constants corresponding to the free parameters of the theory (121).
ii) Satisfy a subset of the STI for the relevant parts by choosing appropriate r.c. for $L^{0,\Lambda_0}$ (125, 127). This subset has to be chosen sufficiently large to get a hold on the finiteness problem, with the help of the FE and afterwards also of the STI themselves.
iii) Satisfy the remaining STI for the relevant parts by choosing the appropriate l-loop terms in $L^{\Lambda_0,\Lambda_0}$ (122, 123, 124). It is possible indeed to show that all remaining STI ((128, 129, 130) and those mentioned after (131)) can be satisfied. These are far more than the constants fixed in iii).

All this has to be done respecting the order of the inductive procedure. If it were not possible to make ends meet (i.e. if either the subset in ii) is too small to prove finiteness, or the one in iii) is too small in order to satisfy all STI) we would face what is called an anomaly.

Our procedure is complicated by a technical point. The analysis of the relevant part of the STI at Λ = 0 is much more complicated for $L^{0,\Lambda_0}$ than for $\Gamma^{0,\Lambda_0}$, the generating functional of the one-particle irreducible functions. For $L^{0,\Lambda_0}$ many more terms of the same loop order may appear in a single STI. Passing to one-particle irreducible objects achieves to a considerable degree the disentangling of the l-loop renormalization parts in the inhomogeneous linear equations of App. C. So App. C has indeed been written for the $\Gamma^{0,\Lambda_0}$- and not for the $L^{0,\Lambda_0}$-functional. The price to pay is that we have to provide the necessary machinery for the Γ-functional (flow equations (87), STI (82)) too, using the Legendre transform (78, 79). This should not obscure the fact that all results of this paper are to be obtained from $L^{0,\Lambda_0}$.

⁵ $L^{\Lambda_0,\Lambda_0}$ will include some well-behaved irrelevant terms (107, 108) linked to the particular nature of the cutoff (30) chosen.

This paper is organized as follows. In Sect. 2 we introduce the classical action of the model and fix notations. In Sect. 3 we introduce the concepts from FE theory and recall the statements on renormalizability we need. As regards the general aspects of bounding inductively solutions of the FE we tend to be brief as the reasoning follows the lines of previous papers. In Sect. 4 we derive the VSTI for the regularized theory in various forms, comment on the adaptation of the renormalization results to the vertex functions, analyse the above mentioned operator insertion and show how to make its relevant parts vanish step by step by disposing of the freedom in choosing the renormalization conditions. This is the key part of the paper. With the aid of the results from Sect. 3 it permits us to prove that the STI are restored and thus solves the renormalization problem for spontaneously broken SU(2) Yang–Mills theory.

2. Classical Theory and Tree Approximation

We collect some basic properties of the classical Euclidean SU(2) Yang–Mills–Higgs model in four dimensional Euclidean spacetime, mainly to introduce the notation and the conventions. We largely follow the textbook of Faddeev and Slavnov [FaSl]. The action considered involves the real Yang–Mills field $\{A^a_\mu\}_{a=1,2,3}$ and the complex scalar doublet $\{\phi_\alpha\}_{\alpha=1,2}$. All bosonic fields appearing in this paper may be viewed as smooth functions of (sufficiently) rapid fall-off. Details do not matter in view of the fact that we do not perform any nonperturbative analysis of path integrals. The action has the form

$$S_{\rm inv} = \int dx\,\Big\{\frac{1}{4}F^a_{\mu\nu}F^a_{\mu\nu} + \frac{1}{2}(\nabla_\mu\phi)^*\nabla_\mu\phi + \lambda(\phi^*\phi - \rho^2)^2\Big\}, \tag{1}$$

with the curvature tensor

$$F^a_{\mu\nu}(x) = \partial_\mu A^a_\nu(x) - \partial_\nu A^a_\mu(x) + g\,\epsilon^{abc}A^b_\mu(x)A^c_\nu(x) \tag{2}$$

and the covariant derivative

$$\nabla_\mu = \partial_\mu + g\,\frac{1}{2i}\sigma^a A^a_\mu(x) \tag{3}$$

acting on the SU(2)-spinor φ. The parameters g, λ, ρ are real positive, $\epsilon^{abc}$ is totally skew symmetric, $\epsilon^{123} = +1$, and $\{\sigma^a\}_{a=1,2,3}$ are the standard Pauli matrices. For simplicity the wave function normalizations of the fields are chosen equal to one. The action (1) is invariant under local gauge transformations of the fields

$$\frac{1}{2i}\sigma^a A^a_\mu(x) \longrightarrow u(x)\,\frac{1}{2i}\sigma^a A^a_\mu(x)\,u^*(x) + g^{-1}u(x)\,\partial_\mu u^*(x), \qquad \phi(x) \longrightarrow u(x)\phi(x) \tag{4}$$

with $u : \mathbb{R}^4 \to SU(2)$ smooth. A stable ground state of the action (1) implies spontaneous symmetry breaking, taken into account by reparametrizing the complex scalar doublet as

$$\phi(x) = \begin{pmatrix} B^2(x) + iB^1(x) \\ \rho + h(x) - iB^3(x) \end{pmatrix}, \tag{5}$$

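where $\{B^a(x)\}_{a=1,2,3}$ is a real triplet and $h(x)$ the real Higgs field. Moreover, in place of the parameters ρ, λ we introduce the masses

$$m = \frac{1}{2}g\rho, \qquad M = (8\lambda\rho^2)^{1/2}. \tag{6}$$

Aiming at a quantized theory we choose the 't Hooft gauge fixing

$$S_{\rm g.f.} = \frac{1}{2\alpha}\int dx\,(\partial_\mu A^a_\mu - \alpha m B^a)^2, \tag{7}$$

with $\alpha \in \mathbb{R}_+$, implemented by the anticommuting Faddeev–Popov ghost and antighost fields $\{c^a\}_{a=1,2,3}$ and $\{\bar c^a\}_{a=1,2,3}$, respectively, via

$$S_{\rm gh} = -\int dx\,\bar c^a\Big\{(-\partial_\mu\partial_\mu + \alpha m^2)\delta^{ab} + \frac{1}{2}\alpha g m\,h\,\delta^{ab} + \frac{1}{2}\alpha g m\,\epsilon^{acb}B^c - g\,\partial_\mu\epsilon^{acb}A^c_\mu\Big\}c^b. \tag{8}$$

Hence, the total "classical action" is

$$S_{\rm BRS} = S_{\rm inv} + S_{\rm g.f.} + S_{\rm gh}, \tag{9a}$$

which we decompose as

$$S_{\rm BRS} = \int dx\,\big\{\mathcal{L}_{\rm quad}(x) + \mathcal{L}_{\rm int}(x)\big\} \tag{9b}$$

into its quadratic part, with $\Delta \equiv \partial_\mu\partial_\mu$,

$$\mathcal{L}_{\rm quad} = \frac{1}{4}(\partial_\mu A^a_\nu - \partial_\nu A^a_\mu)^2 + \frac{1}{2\alpha}(\partial_\mu A^a_\mu)^2 + \frac{1}{2}m^2A^a_\mu A^a_\mu + \frac{1}{2}h(-\Delta + M^2)h + \frac{1}{2}B^a(-\Delta + \alpha m^2)B^a - \bar c^a(-\Delta + \alpha m^2)c^a, \tag{10}$$

and into its interaction part

$$\begin{aligned}\mathcal{L}_{\rm int} ={}& g\,\epsilon^{abc}(\partial_\mu A^a_\nu)A^b_\mu A^c_\nu + \frac{1}{4}g^2(\epsilon^{abc}A^b_\mu A^c_\nu)^2 \\ &+ \frac{1}{2}g\big\{(\partial_\mu h)A^a_\mu B^a - hA^a_\mu\partial_\mu B^a - \epsilon^{abc}A^a_\mu(\partial_\mu B^b)B^c\big\} \\ &+ \frac{1}{8}gA^a_\mu A^a_\mu\big\{4mh + g(h^2 + B^aB^a)\big\} \\ &+ \frac{1}{4}g\frac{M^2}{m}h(h^2 + B^aB^a) + \frac{1}{32}g^2\Big(\frac{M}{m}\Big)^2(h^2 + B^aB^a)^2 \\ &- \frac{1}{2}\alpha g m\,\bar c^a\big\{h\delta^{ab} + \epsilon^{acb}B^c\big\}c^b - g\,\epsilon^{acb}(\partial_\mu\bar c^a)A^c_\mu c^b.\end{aligned} \tag{11}$$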
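The scalar self-interaction terms in (10) and (11) can be cross-checked against the potential $\lambda(\phi^*\phi - \rho^2)^2$ of (1). The Python sketch below evaluates both sides at sample field values, using the parametrization (5) and the mass relations (6). The function names and the numerical inputs are illustrative only.

```python
import math

def potential_from_doublet(h, B, lam, rho):
    """lambda (phi* phi - rho^2)^2 with phi = (B2 + i B1, rho + h - i B3)."""
    B1, B2, B3 = B
    phi_sq = B2 ** 2 + B1 ** 2 + (rho + h) ** 2 + B3 ** 2
    return lam * (phi_sq - rho ** 2) ** 2

def potential_from_expansion(h, B, g, lam, rho):
    """Mass term of h from (10) plus the cubic and quartic scalar terms of (11)."""
    m = 0.5 * g * rho                    # gauge boson mass, eq. (6)
    M = math.sqrt(8 * lam * rho ** 2)    # Higgs mass, eq. (6)
    BB = sum(b * b for b in B)
    quadratic = 0.5 * M ** 2 * h ** 2
    cubic = 0.25 * g * (M ** 2 / m) * h * (h ** 2 + BB)
    quartic = (g ** 2 / 32) * (M / m) ** 2 * (h ** 2 + BB) ** 2
    return quadratic + cubic + quartic

if __name__ == "__main__":
    args = dict(h=0.4, B=(0.1, -0.2, 0.3))
    print(potential_from_doublet(lam=0.3, rho=1.9, **args))
    print(potential_from_expansion(g=0.7, lam=0.3, rho=1.9, **args))
```

The two printed numbers should agree up to rounding, and the result does not depend on g because g cancels between m and the explicit prefactors.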

In (10) we recognize that all fields are massive and that no coupling term $A^a_\mu\partial_\mu B^a$ appears. The propagators of the Yang–Mills field $A^a_\mu$, of the Higgs field h, and of the ghost field $c^a$ and the Goldstone field $B^a$, are thus (respectively)

$$C^{ab}_{\mu\nu}(p) = \frac{\delta^{ab}}{p^2+m^2}\Big\{\delta_{\mu\nu} - (1-\alpha)\frac{p_\mu p_\nu}{p^2+\alpha m^2}\Big\}, \qquad C(p) = \frac{1}{p^2+M^2}, \qquad S^{ab}(p) = \frac{\delta^{ab}}{p^2+\alpha m^2}. \tag{12}$$
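As an illustration of how the gauge field propagator in (12) follows from (10), the sketch below builds the momentum-space kernel of the quadratic form for $A_\mu$, namely $(p^2+m^2)\delta_{\mu\nu} - (1 - 1/\alpha)p_\mu p_\nu$, and checks numerically that its product with the bracketed propagator is the identity matrix. Plain Python lists are used to avoid dependencies; the function names and the sample momentum are hypothetical.

```python
def kinetic_kernel(p, m, alpha):
    """Momentum-space kernel of the quadratic A-terms in (10):
    (p^2 + m^2) delta_{mu nu} - (1 - 1/alpha) p_mu p_nu."""
    p2 = sum(c * c for c in p)
    return [[(p2 + m * m) * (mu == nu) - (1 - 1 / alpha) * p[mu] * p[nu]
             for nu in range(4)] for mu in range(4)]

def gauge_propagator(p, m, alpha):
    """Lorentz structure of C_{mu nu}(p) from (12), isospin delta^{ab} dropped."""
    p2 = sum(c * c for c in p)
    return [[((mu == nu) - (1 - alpha) * p[mu] * p[nu] / (p2 + alpha * m * m))
             / (p2 + m * m)
             for nu in range(4)] for mu in range(4)]

def matmul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def max_deviation_from_identity(p, m, alpha):
    """Largest entry of |K C - 1| for the given Euclidean momentum."""
    P = matmul(kinetic_kernel(p, m, alpha), gauge_propagator(p, m, alpha))
    return max(abs(P[i][j] - (i == j)) for i in range(4) for j in range(4))

if __name__ == "__main__":
    print(max_deviation_from_identity([0.3, -1.2, 0.7, 2.0], m=0.8, alpha=2.5))
```

The same check with α = 1 reduces to the Feynman-gauge case, in which both matrices are proportional to the identity.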

The classical action $S_{\rm BRS}$ in (9b) has the following properties:

i) Euclidean invariance: $S_{\rm BRS}$ is an O(4)-scalar.
ii) Rigid SO(3)-isosymmetry: The fields $\{A^a_\mu\}$, $\{B^a\}$, $\{c^a\}$, $\{\bar c^a\}$ are isovectors and h is an isoscalar; $S_{\rm BRS}$ is invariant under global SO(3)-transformations.
iii) BRS-invariance: Introducing the classical composite fields

$$\psi^a_\mu(x) = \big\{\partial_\mu\delta^{ab} + g\,\epsilon^{arb}A^r_\mu(x)\big\}c^b(x), \tag{13a}$$

$$\psi(x) = -\frac{1}{2}gB^a(x)c^a(x), \tag{13b}$$

$$\psi^a(x) = \Big\{\Big(m + \frac{1}{2}g\,h(x)\Big)\delta^{ab} + \frac{1}{2}g\,\epsilon^{arb}B^r(x)\Big\}c^b(x), \tag{13c}$$

$$\Omega^a(x) = \frac{1}{2}g\,\epsilon^{apq}c^p(x)c^q(x), \tag{13d}$$

the BRS-transformations of the fields are defined as

$$A^a_\mu(x) \longrightarrow A^a_\mu(x) - \psi^a_\mu(x)\,\varepsilon, \tag{14a}$$

$$h(x) \longrightarrow h(x) - \psi(x)\,\varepsilon, \tag{14b}$$

$$B^a(x) \longrightarrow B^a(x) - \psi^a(x)\,\varepsilon, \tag{14c}$$

$$c^a(x) \longrightarrow c^a(x) - \Omega^a(x)\,\varepsilon, \tag{14d}$$

$$\bar c^a(x) \longrightarrow \bar c^a(x) - \frac{1}{\alpha}\big(\partial_\nu A^a_\nu(x) - \alpha mB^a(x)\big)\,\varepsilon. \tag{14e}$$

In these transformations ε is a spacetime independent Grassmann element that commutes with the fields $\{A^a_\mu, h, B^a\}$ but anticommutes with the (anti-)ghosts $\{c^a, \bar c^a\}$. To show the BRS-invariance of the total classical action (9) one first observes that the composite classical fields (13) are themselves invariant under the BRS-transformations (14). Moreover, we can write (8) in the form

$$S_{\rm gh} = -\int dx\,\bar c^a\big\{-\partial_\mu\psi^a_\mu + \alpha m\psi^a\big\}. \tag{15}$$

Using these properties the BRS-invariance of (9) is straightforward (if somewhat tedious) to verify. It is convenient to add to the classical action (9) source terms both for the fields and the composite fields (13), defining

$$S_c = S_{\rm BRS} + \int dx\,\big\{\gamma^a_\mu\psi^a_\mu + \gamma\psi + \gamma^a\psi^a + \omega^a\Omega^a\big\} - \int dx\,\big\{j^a_\mu A^a_\mu + sh + b^aB^a + \bar\eta^ac^a + \bar c^a\eta^a\big\}. \tag{16}$$

484

C. Kopper, V. F. Müller

The sources $\gamma^a_\mu$, $\gamma$, $\gamma^a$ have dimension 2, ghost number $-1$ and are Grassmann elements, whereas $\omega^a$ has dimension 2 and ghost number $-2$; the sources $\eta^a$ and $\bar\eta^a$ have ghost number $+1$ and $-1$, respectively, and are Grassmann elements. The BRS-transformation (14) of $S_c$ can be written as

$$ S_c \longrightarrow S_c + \mathcal{D} S_c\, \varepsilon, \tag{17} $$

employing the BRS-operator $\mathcal{D}$, defined by

$$ \mathcal{D} = \int dx\, \Big\{ j^a_\mu \frac{\delta}{\delta\gamma^a_\mu} + s\,\frac{\delta}{\delta\gamma} + b^a \frac{\delta}{\delta\gamma^a} + \bar\eta^a \frac{\delta}{\delta\omega^a} + \eta^a \Big( \frac{1}{\alpha}\,\partial_\nu \frac{\delta}{\delta j^a_\nu} - m\,\frac{\delta}{\delta b^a} \Big) \Big\}. \tag{18} $$

(Observe that $\varepsilon$ anticommutes with $\eta$, $\bar\eta$, too.) For some purposes it will turn out to be convenient to regard the fields and functionals thereof in momentum space. Our conventions are

$$ \phi(x) = \int_p e^{ipx}\, \hat\phi(p), \qquad \int_p = \int \frac{d^4p}{(2\pi)^4}, \tag{19} $$

where mostly we will omit the hat on $\hat\phi(p)$. From (19) we obtain

$$ \frac{\delta}{\delta\phi(x)} = \int d^4p\; e^{-ipx}\, \frac{\delta}{\delta\hat\phi(p)} = (2\pi)^4 \int_p e^{-ipx}\, \frac{\delta}{\delta\hat\phi(p)} . $$

For functionals with operator insertions like, e.g.

$$ S_\gamma(x) := \frac{\delta S_c}{\delta\gamma(x)}, $$

we define

$$ S_\gamma(p) := \int d^4x\; e^{ipx}\, S_\gamma(x) \tag{20} $$

(again in abusively shortened notation). For later use it will be convenient to introduce a shortened collective notation for the fields, sources and propagators. As for the latter, we will sometimes denote all propagators (12) collectively by $C$. Furthermore we write for the bosonic fields

$$ \varphi_\tau = (A^a_\mu, h, B^a) \quad \text{with corresponding sources} \quad J_\tau = (j^a_\mu, s, b^a), \tag{21} $$

for all fields

$$ \Phi = (\varphi_\tau, c^a, \bar c^a) \quad \text{and for their sources} \quad K = (J_\tau, \bar\eta^a, \eta^a), \tag{22} $$

and for the insertion sources

$$ \xi = (\gamma^a_\mu, \gamma, \gamma^a, \omega^a) \quad \text{and} \quad \gamma_\tau = (\gamma^a_\mu, \gamma, \gamma^a). \tag{23} $$

The quantization of the classical theory amounts to constructing a well-defined version of the formal functional integral representation for the generating functional $W$ of the connected Green functions such that these functions satisfy the system of STI. Considering the formal expression for the modified generating functional

$$ \exp\Big\{\frac{1}{\hbar} W\Big\} = N \int [dA\, dh\, dB\, dc\, d\bar c]\; \exp\Big\{ -\frac{1}{\hbar} S_c \Big\}, \tag{24} $$


we observe that the quadratic part (10) appearing in $S_c$ constitutes a well-defined Gaussian measure⁶. In a formal loop expansion of the remaining part of the exponent the emerging order $\hbar^0$, i.e. the tree approximation, is well-defined and satisfies

$$ \mathcal{D} W|_{\hbar^0} = 0, \tag{25} $$

which follows from (17) when using the invariance of the (formal) measure in (24) under BRS transformations. In the sequel we will inductively tackle all orders $\hbar^l$, $l \in \mathbb{N}$, of the loop expansion.

3. Flow Equations: Renormalizability Without Slavnov–Taylor Identities

3.1. The Flow Equations for the SU(2) Yang–Mills Higgs Model. The FE of Wilson's renormalization group is obtained as a differential equation w.r.t. the flow parameter $\Lambda$, which is the energy scale down to which the degrees of freedom have been integrated out, starting from the UV region. We will consider the generating functional of the connected amputated Green functions (CAG) which we denote as

$$ L^{\Lambda,\Lambda_0}(\varphi_\tau, c, \bar c) \tag{26} $$

with the following explanations: We have introduced a UV regularization⁷ $\Lambda_0$ to have a well-defined starting point, so that

$$ 0 \le \Lambda \le \Lambda_0 < \infty. \tag{27} $$

The functional $L^{\Lambda,\Lambda_0}(\varphi, c, \bar c)$ is to be viewed as a formal power series in $\hbar$, since we are studying the perturbative renormalization problem in the loop expansion. To be more precise on its definition we write it as

$$ L^{\Lambda,\Lambda_0} = \sum_{|n|=3}^{\infty} L^{\Lambda,\Lambda_0}_{l=0,n} + \sum_{l=1}^{\infty} \hbar^l \sum_{|n|=1}^{\infty} L^{\Lambda,\Lambda_0}_{l,n} . \tag{28} $$

Here the multiindex $n$ denotes the number of field variables of each species appearing:

$$ n = \{n_A, n_h, n_B, n_{\bar c}, n_c\}, \qquad |n| := n_A + n_h + n_B + n_{\bar c} + n_c . \tag{29} $$

So for $|n| = 4$ we are e.g. regarding a four point function. Equation (28) implies that, by definition, at 0 loop order $L^{\Lambda,\Lambda_0}$ contains no contribution from the one- or two-point functions. With this restriction it is the generating functional of the CAG of the following theory:

i) The propagators are those from (12) including the regulating factor

$$ \sigma_{\Lambda,\Lambda_0}(p^2) = \frac{\sigma_{\Lambda_0}(p^2) - \sigma_\Lambda(p^2)}{\sigma_{\Lambda_0}(0)} \quad \text{with} \quad \sigma_\Lambda(p^2) = e^{-\frac{1}{\Lambda^6}\left[(p^2+m^2)(p^2+\alpha m^2)(p^2+M^2)\right]} . \tag{30} $$

6 Once we have introduced the regularization (30), the support of the measure consists of sufficiently well-behaved functions. 7 Furthermore we should restrict the theory to a finite volume V as long as field independent vacuum terms are generated by the flow, which diverge in infinite volume by translation invariance. We do not make this explicit here and refer the interested reader to previous work [KKSa, KK3].

In the sequel this choice of the cutoff function turns out to be technically convenient⁸. Besides being explicit it permits one to verify easily the following bounds on the regularized propagators $C^{\Lambda,\Lambda_0}(p) := C(p)\,\sigma_{\Lambda,\Lambda_0}(p^2)$:

$$ \Big| \Big( \prod_{i=1}^{|w|} \frac{\partial}{\partial p_{\mu_i}} \Big) \frac{\partial}{\partial\Lambda}\, C^{\Lambda,\Lambda_0}(p) \Big| \le \begin{cases} C, & \text{for } 0 \le \Lambda \le m, \\ \Lambda^{-3-|w|}\, P(|p|/\Lambda)\, \sigma_\Lambda(p^2), & \text{for } m \le \Lambda \le \Lambda_0 . \end{cases} \tag{31} $$

Here and in the following $P$ denotes (each time it appears possibly a new) polynomial with nonnegative coefficients. These as well as the constant $C$ depend on $\alpha$, $m$, $M$, $|w|$, but not on $p$, $\Lambda$, $\Lambda_0$.

ii) The vertices are to be taken from our starting bare action (interaction Lagrangian including counterterms)

$$ L^0 := L^{\Lambda_0,\Lambda_0} . \tag{32} $$

In the case of an invariant regularization we would choose here $S_{\mathrm{BRS}}$ from (9b), modified by including counterterms of any order $\hbar^l$, $l \ge 1$, of the same structure and by excluding the 0-loop quadratic part. In our case such a restricted choice would not allow us to prove restoration of the STI. Therefore we will allow at first for all counterterms permitted by the unbroken global symmetries of the theory, i.e. O(4) and SO(3)$_{\mathrm{iso}}$. These terms will then become unique functions of the renormalization conditions chosen. There are 37 such local terms of dimension $\le 4$, corresponding to those listed in Appendix A. At the tree level $l = 0$ we shall always consider the terms with $|n| + |w| \le 4$ to be given by (11). We denote by

$$ (2\pi)^{4(|n|-1)}\, \delta^n_{\Phi(p)} L^{\Lambda,\Lambda_0}_l \big|_{\Phi\equiv 0} = \delta(p_1 + \ldots + p_{|n|})\; L^{\Lambda,\Lambda_0}_{l,n}(p_1, \ldots, p_{|n|-1}) \tag{33} $$

the $n$-point CAG of loop order $l$ involving the indicated number of $(A_\mu, h, B, \bar c, c)$ fields. We will also write $p$ for $(p_1, \ldots, p_{|n|-1})$ in the following. We stay somewhat imprecise about the momentum assignment to the fields since this would unnecessarily blow up the notation. We also omit vector and isovector indices. Finally we will also use the shorthand

$$ \partial^w := \prod_{i=1}^{|n|-1} \prod_{\mu=1}^{4} \Big( \frac{\partial}{\partial p_{i,\mu}} \Big)^{w_{i,\mu}} \quad \text{with} \quad w = (w_{1,1}, \ldots, w_{|n|-1,4}), \quad |w| = \sum w_{i,\mu} . \tag{34} $$

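The flow equations (FE) have been derived quite generally several times, so we will be brief. The Wilson FE written for $L^{\Lambda,\Lambda_0}$ takes the form⁹

$$ e^{-\frac{1}{\hbar}\left(L^{\Lambda,\Lambda_0} + I^{\Lambda,\Lambda_0}\right)} = e^{\hbar\,\Delta(\Lambda,\Lambda_0)}\; e^{-\frac{1}{\hbar} L^0} . \tag{35} $$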

8 There is of course a lot of arbitrariness in this choice. What is needed is a sufficiently well-behaved function tending to 1 for $\Lambda \to 0$, $\Lambda_0 \to \infty$, which is essentially supported for momenta between $\Lambda$ and $\Lambda_0$. The verification of the restoration of the STI in Sect. 4 would be somewhat easier using a suitable regulating function with compact support of the type $\sigma_\Lambda(p) = K\big(\frac{p^2+m^2}{\Lambda^2}\big)$, where $K(x) = 1$, $x \le 1$, $K(x) = 0$, $x \ge 2$, $K$ monotonic and smooth. But the choice (30) allows one to perform the analytic continuation to Minkowski space as shown in [KKSc], and it has the advantage that $(\sigma_\Lambda(p))^{-1}$ is well-defined. Avoiding its appearance is possible, but sometimes needs detours.

9 $I^{\Lambda,\Lambda_0}$ is the vacuum functional which strictly speaking exists only in finite volume. Since it plays hardly any role in the following, we do not discuss this issue here and refer to [KKSa, KK3] for further comments.

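Here $\Delta(\Lambda,\Lambda_0)$ is the functional Laplace operator which in our theory takes the form

$$ \Delta(\Lambda,\Lambda_0) = \frac12 \Big\langle \frac{\delta}{\delta A^a_\mu},\, C^{\Lambda,\Lambda_0}_{\mu\nu} \frac{\delta}{\delta A^a_\nu} \Big\rangle + \frac12 \Big\langle \frac{\delta}{\delta h},\, C^{\Lambda,\Lambda_0} \frac{\delta}{\delta h} \Big\rangle + \frac12 \Big\langle \frac{\delta}{\delta B^a},\, S^{\Lambda,\Lambda_0} \frac{\delta}{\delta B^a} \Big\rangle + \Big\langle \frac{\delta}{\delta c^a},\, S^{\Lambda,\Lambda_0} \frac{\delta}{\delta \bar c^a} \Big\rangle . \tag{36} $$

Using our shorthand notation we obtain the FE for the CAG $L^{\Lambda,\Lambda_0}_{l,n}$ from (35) on deriving w.r.t. $\Lambda$, expanding $L$ as in (28) and using (33):

$$
\begin{aligned}
\partial_\Lambda \partial^w L^{\Lambda,\Lambda_0}_{l,n}(p) ={}& \sum_{n',\,|n'|=|n|+2} c_{n'} \int_k \big(\partial_\Lambda C^{\Lambda,\Lambda_0}(k)\big)\; \partial^w L^{\Lambda,\Lambda_0}_{l-1,n'}(p, k, -k) \\
&- \sum_{\substack{l_1+l_2=l,\; w_1+w_2+w_3=w \\ n_1,n_2,\; |n_1|+|n_2|=|n|+2}} c_{n_1,n_2} \Big[ \partial^{w_1} L^{\Lambda,\Lambda_0}_{l_1,n_1}(p_1, \ldots, p_{|n_1|-1}) \cdot \big(\partial^{w_3}\partial_\Lambda C^{\Lambda,\Lambda_0}(p')\big)\; \partial^{w_2} L^{\Lambda,\Lambda_0}_{l_2,n_2}(-p', \ldots, p_{|n|-1}) \Big]_{s,a} .
\end{aligned} \tag{37}
$$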

The constants $c_{n'}$, $c_{n_1,n_2}$ are combinatorial. The field assignment of the propagators $C^{\Lambda,\Lambda_0}$ is not written; it is implicit in the multiindices $n'$, $n_1$, $n_2$ related to $n$. On the r.h.s. the integrated momentum $k$ refers to that of the fields from $n' - n$, and $-p' = p_1 + \ldots + p_{|n_1|-1}$. Furthermore the subscripts $s, a$ indicate (anti)symmetrization according to the statistics of the various fields, since we assume the $L^{\Lambda,\Lambda_0}_{l,n}$ to be (anti)symmetrized from the beginning.

3.2. Renormalizability. The system of differential FE (37) can be integrated inductively, using mixed boundary conditions (b.c.):

A1) At $\Lambda = \Lambda_0$ the $n$ point functions with $|n| + |w| > 4$, i.e. the irrelevant ones, are supposed to be smooth functions of $p$, $\Lambda_0$ obeying the bounds

$$ |\partial^w L^{\Lambda_0,\Lambda_0}_{l,n}(p)| \le \Lambda_0^{4-|n|-|w|}\; P_1\Big(\log\frac{\Lambda_0}{m}\Big)\, P_2\Big(\frac{|p|}{\Lambda_0}\Big), \qquad |n| + |w| \ge 5. \tag{38} $$

The standard case is that of b.c. where the r.h.s. of (38) vanishes. We need to be slightly more general to compensate for effects of the cutoff function $\sigma_{0,\Lambda_0}$, see Sect. 4, (107, 108).

A2) At $\Lambda = 0$ the CAG with $|n| + |w| \le 4$, i.e. the relevant ones, are fixed, order by order in $\hbar$ at the renormalization point, which we choose at $p = 0$ for simplicity. The renormalization conditions (r.c.) may be chosen weakly $\Lambda_0$-dependent; we restrict to smooth uniformly bounded functions of $\Lambda_0$ converging for $\Lambda_0 \to \infty$.

Of course we always restrict to b.c. respecting the global (Euclidean and iso-)symmetries. With the FE we can inductively obtain the following bounds on the CAG $L^{\Lambda,\Lambda_0}_{l,n}$:

Proposition 1.

$$ |\partial^w L^{\Lambda,\Lambda_0}_{l,n}(p)| \le (\Lambda + m)^{4-|n|-|w|}\; P_1\Big(\log\frac{\Lambda+m}{m}\Big)\, P_2\Big(\frac{|p|}{\Lambda+m}\Big). \tag{39} $$

The polynomials $P_1$, $P_2$ have nonnegative coefficients depending on $l$, $n$, $w$, $\alpha$, $m$, $M$, but not on $p$, $\Lambda$, $\Lambda_0$. We do not present a proof of the proposition since the line of thought is the same as in the references [KKSa, KK3, Kop] and restrict ourselves to a few comments. It proceeds by induction upwards in the number of loops and for given loop order upwards in $|n|$ (in contrast to the procedure employed when expanding in a coupling constant: there one proceeds downwards in $|n|$). For given $l$, $n$ we proceed downwards in $|w|$, starting from some arbitrary¹⁰ $|w_{\max}| \ge 3$. Thus we have to start at loop order $l = 0$ and from $|n| = 3$, since $L^{\Lambda,\Lambda_0}_{l=0}$ does not contain contributions for $|n| \le 2$. Equation (37) immediately gives

$$ L^{\Lambda,\Lambda_0}_{0,n}(p) = L^{\Lambda_0,\Lambda_0}_{0,n}(p), \qquad |n| = 3, $$

since the r.h.s. vanishes. Thus the bound is satisfied. For $|n| = 4$, $l = 0$ we may also fix the b.c. at $\Lambda = \Lambda_0$, if we want to read them off the action (11), since here the second term on the r.h.s. of (37) contributes and leads to a one particle reducible difference between $L^{\Lambda,\Lambda_0}_{0,n}$ and $L^{\Lambda_0,\Lambda_0}_{0,n}$. This deviation from the rules A1), A2) is a pure matter of convenience however. The inductive proof then proceeds by inserting the induction hypothesis on the r.h.s. of the FE (which has already been bounded) and performing the momentum and $\Lambda$-integrals, starting from the respective b.c. and using the bound (31). An important point to note is the following: which bounds for the $L^{\Lambda,\Lambda_0}$ can be obtained depends only on the b.c. imposed and on the propagators (and dimensionality). Note finally that for the purpose of renormalizability only the bound on $L^{0,\Lambda_0}$ in the limit $\Lambda_0 \to \infty$ is needed. The rest is of a technical nature. In the next chapter we want to make use of the following, also somewhat technical,

Corollary. For given $l_0 > 0$ and $n_0$, $w_0$ with $|n_0| + |w_0| \le 4$ we assume that the b.c. on the CAG $\partial^w L^{\Lambda,\Lambda_0}_{l,n}$, ($|w| \le |w_{\max}|$), have been imposed in agreement with A1), A2) for $l < l_0$ and arbitrary $n$, $w$; and for $l = l_0$ and $|n| < |n_0|$ and $|w| \le |w_0|$. Suppose that we fix the b.c. for $\partial^{w_0} L^{\Lambda,\Lambda_0}_{l_0,n_0}$ 'on the wrong side', i.e. at $\Lambda_0$, such that it obeys the bound

$$ |\partial^{w_0} L^{\Lambda_0,\Lambda_0}_{l_0,n_0}(0)| \le \Lambda_0^{4-|n_0|-|w_0|}\; P(\log(\Lambda_0/m)). \tag{40} $$

Then we also have

$$ |\partial^{w_0} L^{\Lambda,\Lambda_0}_{l_0,n_0}(0)| \le \Lambda_0^{4-|n_0|-|w_0|}\; P(\log(\Lambda_0/m)). \tag{41} $$

10 The minimal value of 3 is needed, because for the relevant terms the passage from the fixed momentum, at which the renormalization conditions are imposed, to any momentum is achieved by the Schlömilch or integrated Taylor formula [KKSa, Pol]. For the two point function there thus appear up to three derivatives. If one also wants to prove smoothness one has to admit arbitrarily high $|w_{\max}|$.

Proof. Due to our assumptions, the r.h.s. of the FE (37) is bounded by (39), since the bounds on all terms preceding $(l_0, n_0, w_0)$ in the induction remain unchanged apart from those with $|w| > |w_0|$. Those are not needed however because we only make a statement at the renormalization point $p = 0$ and thus do not require a bound on the Taylor remainder. The deterioration of the bound then stems from both the b.c. contribution (40) and from the fact that the r.h.s. of the FE has to be integrated from $\Lambda_0$ to $\Lambda$ (instead of integrating from 0 to $\Lambda$), i.e. from the wrong side. This gives the bound

$$ |\partial^{w_0} L^{\Lambda,\Lambda_0}_{l_0,n_0}(0)| \le \Lambda_0^{4-|n_0|-|w_0|}\, P_1(\log(\Lambda_0/m)) + \int_\Lambda^{\Lambda_0} d\Lambda'\; \Lambda'^{\,4-|n_0|-|w_0|-1}\, P_2(\log(\Lambda'/m)) \le \Lambda_0^{4-|n_0|-|w_0|}\, P_3(\log(\Lambda_0/m)). $$

Note that the bound does not improve, if we set the b.c. for $\partial^{w_0} L^{\Lambda_0,\Lambda_0}_{l_0,n_0}(0)$ equal to zero. □
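The elementary estimate behind the last step of the proof can be illustrated numerically. The sketch below is only an illustration under simplifying assumptions of our own: the polynomial $P_2$ is replaced by a single power $\log^j$, the lower limit of the $\Lambda'$-integral is taken to be $m$, and $k$ stands for $4 - |n_0| - |w_0| \in \{0, 1, 2, 3\}$. The function names are hypothetical.

```python
import math

def wrong_side_integral(k, j, lam0, m=1.0, steps=20000):
    """Midpoint-rule value of the integral of lam**(k-1) * log(lam/m)**j over [m, lam0].
    Substitution t = log(lam/m), so that d(lam) = lam dt."""
    T = math.log(lam0 / m)
    h = T / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(k * t) * t ** j
    return m ** k * total * h

def bound(k, j, lam0, m=1.0):
    """Upper bound of the form lam0**k times a polynomial in log(lam0/m)."""
    T = math.log(lam0 / m)
    if k == 0:
        # marginal case: the integral equals T**(j+1)/(j+1), one extra power of the logarithm
        return T ** (j + 1) / (j + 1)
    # log(lam/m)**j <= T**j on the whole range, and the power integrates to at most lam0**k / k
    return lam0 ** k * T ** j / k

for k in range(4):
    for j in range(3):
        print(k, j, wrong_side_integral(k, j, 50.0), bound(k, j, 50.0))
```

In the marginal case $k = 0$ the degree of the logarithmic polynomial increases by one, which is why the polynomial $P_3$ in the proof differs from $P_1$ and $P_2$.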

We remark that statements similar to that of the corollary could also be extended to general external momenta; they are not needed however. In response to the remarks made before one may ask oneself whether the previous bounds (39) may be improved, if the b.c. are in some sense smaller. This is indeed the case. Regard e.g. the CAG containing an odd number of scalar fields, i.e. $n_h + n_B \in 2\mathbb{N} - 1$. Then the following improved bounds hold:

$$ |\partial^w L^{\Lambda,\Lambda_0}_{l,n}(p)| \le (\Lambda + m)^{3-|n|-|w|}\; P_1\Big(\log\frac{\Lambda+m}{m}\Big)\, P_2\Big(\frac{|p|}{\Lambda+m}\Big). \tag{42} $$

The main reason why we may expect an improvement of power counting for those terms in our theory is that, as can be seen in App. A, at $l = 0$ the terms in question are all proportional to a mass factor. Since we will not need such sharpened statements we do not give a proof of (42) here. As usual the bound on the Green functions should be complemented by a convergence statement, since (39, 42) would still admit bounded but oscillating solutions¹¹. Convergence follows from

Proposition 2.

$$ |\partial_{\Lambda_0} \partial^w L^{\Lambda,\Lambda_0}_{l,n}(p)| \le \frac{1}{\Lambda_0^2}\, (\Lambda + m)^{5-|n|-|w|}\; P_1(\log(\Lambda_0/m))\, P_2\Big(\frac{|p|}{\Lambda+m}\Big). \tag{43} $$

As before the nonnegative coefficients in the (new) polynomials $P_i$ may depend on $l$, $n$, $w$, $\alpha$, $m$, $M$, but not on $p$, $\Lambda$, $\Lambda_0$. For the proof, which follows the same inductive scheme, we refer again to the earlier references [KKSa, KK3, Kop].
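The inductive scheme referred to here (upwards in the loop order $l$, and for fixed $l$ upwards in $|n|$) works because every term on the r.h.s. of the FE (37) precedes the term on the l.h.s. in this ordering. The following sketch checks this bookkeeping under simplifying assumptions of our own: only the total number of fields $|n|$ is tracked (no field species, no momentum derivatives $w$), and the function names are hypothetical.

```python
def n_min(l):
    """Smallest |n| present at loop order l according to (28)."""
    return 3 if l == 0 else 1

def dependencies(l, n):
    """Pairs (loop order, |n|) appearing on the r.h.s. of (37) for the term (l, n)."""
    deps = set()
    if l >= 1:
        deps.add((l - 1, n + 2))        # first term: two legs closed into a loop
    for l1 in range(l + 1):             # second term: l1 + l2 = l, |n1| + |n2| = |n| + 2
        l2 = l - l1
        for n1 in range(n_min(l1), n + 2):
            n2 = n + 2 - n1
            if n2 >= n_min(l2):
                deps.add((l1, n1))
                deps.add((l2, n2))
    return deps

def order_is_consistent(max_loops, max_fields):
    """True if every dependency comes strictly earlier in the (l, |n|) ordering."""
    for l in range(max_loops + 1):
        for n in range(n_min(l), max_fields + 1):
            # tuples compare lexicographically: first by loop order, then by |n|
            if any(dep >= (l, n) for dep in dependencies(l, n)):
                return False
    return True

print(dependencies(0, 3), dependencies(0, 4), order_is_consistent(4, 8))
```

For $l = 0$, $|n| = 3$ the set of dependencies is empty, matching the statement that the r.h.s. of (37) vanishes there, while for $l = 0$, $|n| = 4$ only the tree level three point functions enter.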

11 a possibility generally only envisaged by mathematical physicists since such oscillations are counterintuitive to any experience from calculations

3.3. Bounds on Green Functions with Operator Insertions. The problem of renormalizing Green functions with operator insertions has been studied quite generally in [KK3, KK4]. Again we state the propositions needed for SU(2) Yang–Mills theory without proofs, restricting to remarks on the (minor) modifications needed. We have to deal with two kinds of operator insertions here. The first are the BRS insertions (13a)–(13d). These are defined as operator insertions of dimension 2, ghost number 1 for (13a)–(13c) and ghost number 2 for (13d), which transform as vector-isovector, scalar-isoscalar, scalar-isovector and scalar-isovector respectively. By the general renormalization theory we thus have to allow for all counterterms of dimension $\le 2$ and of the same symmetry properties. In the bare action the insertions take the form

$$ \psi^a_\mu(x) = R^0_1\, \partial_\mu c^a(x) + R^0_2\, g\,\epsilon^{arb} A^r_\mu(x)\, c^b(x), \tag{44a} $$
$$ \psi(x) = -R^0_3\, \tfrac12\, g\, B^a(x)\, c^a(x), \tag{44b} $$
$$ \psi^a(x) = R^0_4\, m\, c^a(x) + R^0_5\, \tfrac12\, g\, h(x)\, c^a(x) + R^0_6\, \tfrac12\, g\,\epsilon^{arb} B^r(x)\, c^b(x), \tag{44c} $$
$$ \Omega^a(x) = R^0_7\, \tfrac12\, g\,\epsilon^{apq}\, c^p(x)\, c^q(x), \tag{44d} $$

where we demand

$$ R^0_i = 1 + O(\hbar), \tag{45} $$

i.e. the counterterms are again viewed as formal power series in $\hbar$, and we of course assume the insertions to agree with (13a)–(13d) at the tree level. The following remark might be helpful, as regards the transformation (14e) of the antighost: We do not introduce constants $R^0_8, \ldots, R^0_{11}$, corresponding to the terms of dimension $\le 2$ with the same symmetry properties (besides the ones in (14e) these are $h B^a$ and $\epsilon^{abc} c^b \bar c^c$). The claim implicit (not only here, but throughout the literature) and verified in Sect. 4 is then that it is possible to obtain a finite renormalized theory¹² satisfying the STI, by fixing these constants at $\Lambda = \Lambda_0$, i.e. on the wrong side; in fact setting $R^0_8, R^0_9 = 1$, $R^0_{10}, R^0_{11} = 0$. In the more general case one would have to admit arbitrary values for these four constants and to introduce another source for the respective composite operator. The (violated) STI (see below (75, 82, 98)) would then take a more symmetric form, the terms involving $A^a_\mu$, $B^a$ being replaced by another one of the form $\langle c^a, D\, L_{\gamma^a} \rangle$. The insertions may be generated by the respective sources as in (16); we set

$$ L_\xi^{\Lambda_0,\Lambda_0} = \int dx\, \big\{ \gamma^a_\mu(x)\psi^a_\mu(x) + \gamma(x)\psi(x) + \gamma^a(x)\psi^a(x) + \omega^a(x)\Omega^a(x) \big\}, \tag{46} $$

and also

$$ \tilde L^{\Lambda_0,\Lambda_0} = L^{\Lambda_0,\Lambda_0} + L_\xi^{\Lambda_0,\Lambda_0} . \tag{47} $$

We again get a Wilson FE (cf. (35)) for $\tilde L^{\Lambda,\Lambda_0}$ generating the CAG with operator insertions¹³

$$ e^{-\frac{1}{\hbar}\left(\tilde L^{\Lambda,\Lambda_0} + \tilde I^{\Lambda,\Lambda_0}\right)} = e^{\hbar\,\Delta(\Lambda,\Lambda_0)}\; e^{-\frac{1}{\hbar}\tilde L^{\Lambda_0,\Lambda_0}} . \tag{48} $$

12 This is related to the fact that (14e) is linear in $\Phi$.

13 We will only regard insertions with nonvanishing ghost number. Therefore the vacuum functional $\tilde I$ equals $I$, since there are no vacuum diagrams with nonvanishing ghost number, due to ghost number conservation under the flow. Thus we will always write $I$ subsequently.

Restricting our attention to CAG with one insertion, e.g.

$$ L^{\Lambda,\Lambda_0}_{\gamma(x)} := \frac{\delta \tilde L^{\Lambda,\Lambda_0}}{\delta\gamma(x)} \Big|_{\xi=0} \tag{49} $$

(similarly for the other insertions) we obtain by deriving (48) w.r.t. $\Lambda$ a linear FE for $L^{\Lambda,\Lambda_0}_{\gamma(x)}$. Writing similarly as in (33),

$$ (2\pi)^{4(|n|-1)}\, \delta^n_{\Phi(p)} L^{\Lambda,\Lambda_0}_{\gamma(q);l} \big|_{\Phi\equiv 0} = \delta(q + p_1 + \ldots + p_{|n|})\; L^{\Lambda,\Lambda_0}_{\gamma(q);l,n}(p_1, \ldots, p_{|n|-1}), \tag{50} $$

we obtain the differential FE for CAG with one insertion

$$
\begin{aligned}
\partial_\Lambda \partial^w L^{\Lambda,\Lambda_0}_{\gamma(q);l,n}(p) ={}& \sum_{n',\,|n'|=|n|+2} c_{n'} \int_k \big(\partial_\Lambda C^{\Lambda,\Lambda_0}(k)\big)\; \partial^w L^{\Lambda,\Lambda_0}_{\gamma(q);l-1,n'}(p, k, -k) \\
&- \sum_{\substack{l_1+l_2=l,\; w_1+w_2+w_3=w \\ n_1,n_2,\; |n_1|+|n_2|=|n|+2}} c_{n_1,n_2} \Big[ \partial^{w_1} L^{\Lambda,\Lambda_0}_{\gamma(q);l_1,n_1}(p_1, \ldots, p_{|n_1|-1}) \cdot \big(\partial^{w_3}\partial_\Lambda C^{\Lambda,\Lambda_0}(p')\big)\; \partial^{w_2} L^{\Lambda,\Lambda_0}_{l_2,n_2}(-p', \ldots, p_{|n|-1}) \Big]_{s,a} ,
\end{aligned} \tag{51}
$$

the notation being that of (37). Since ghost and antighost in (36) do not appear symmetrically, the $\bar c$ ($c$)-derivative appears once in $n_1$ ($n_2$) and once in $n_2$ ($n_1$). In the following we denote for shortness by $\xi(q)$ any of the sources $\gamma^a_\mu(q)$, $\gamma(q)$, $\gamma^a(q)$, $\omega^a(q)$. Obviously each of the insertions leads to a FE as (51). In the derivation of (51) no use is made of the specific kind of insertion considered. Thus even more generally we replace $\xi(q)$ by $\chi(q)$ when talking of an insertion of dimension $D$ (instead of 2). This is because we also want to cover the CAG with one insertion of dimension 5 describing the BRS violating terms of the regularized theory. This insertion is analysed in Sect. 4.1. The particular kind of insertion chosen only comes into play when considering the b.c., which are fixed as follows:

B1) At $\Lambda = \Lambda_0$ the $n$ point functions $\partial^w L^{\Lambda,\Lambda_0}_{\chi(q);l,n}$ with $|n| + |w| > D$, i.e. the irrelevant ones, are supposed to obey the bounds (cf. A1), (38))

$$ |\partial^w L^{\Lambda_0,\Lambda_0}_{\chi(q);l,n}(p)| \le \Lambda_0^{D-|n|-|w|}\; P_1\Big(\log\frac{\Lambda_0}{m}\Big)\, P_2\Big(\frac{|p|}{\Lambda_0}\Big), \qquad |n| + |w| > D. \tag{52} $$

B2) At $\Lambda = 0$ the CAG with $|n| + |w| \le D$, i.e. the relevant ones, are fixed, order by order in $\hbar$ at the renormalization point $p = 0$, with the same restrictions as in A2).

Again (51) lends itself to an inductive scheme through which we may prove the renormalizability of the CAG with insertion. For the $L^{\Lambda,\Lambda_0}_{\xi(q);l,n}$ there are seven free r.c. which fix the seven parameters $R^0_i$ from (45). For the CAG $L^{\Lambda,\Lambda_0}_{\chi(q);l,n}$ with insertion $L_1^{\Lambda_0,\Lambda_0}$ from (67) we have to fix 53 r.c. corresponding to the list in App. C. Under these conditions our inductive scheme may now also be employed to prove boundedness and convergence of inserted Green functions.

Proposition 3.

$$ |\partial^w L^{\Lambda,\Lambda_0}_{\chi(q);l,n}(p)| \le (\Lambda + m)^{D-|n|-|w|}\; P_1\Big(\log\frac{\Lambda+m}{m}\Big)\, P_2\Big(\frac{|p|}{\Lambda+m}\Big), \tag{53} $$

$$ |\partial_{\Lambda_0} \partial^w L^{\Lambda,\Lambda_0}_{\chi(q);l,n}(p)| \le \frac{(\Lambda + m)^{D+1-|n|-|w|}}{\Lambda_0^2}\; P_1(\log(\Lambda_0/m))\, P_2\Big(\frac{|p|}{\Lambda+m}\Big). \tag{54} $$

Whereas the bounds from Proposition 3 are sufficient for our purposes with regard to the functions $L^{\Lambda,\Lambda_0}_{\xi(q);l,n}$, we need a stronger result for the BRS violating insertions $L^{\Lambda,\Lambda_0}_{\chi(q);l,n}$, which we can achieve on imposing further restrictions on the b.c. It is important in this respect that the FE for the inserted CAG is linear. This implies e.g. that multiplying all inserted CAG by a $\Lambda$-independent factor gives a new solution. If we want to show that the CAG $L^{\Lambda,\Lambda_0}_{\chi(q);l,n}$ from Sect. 4.1 vanish in the limit $\Lambda_0 \to \infty$, the strategy is thus to reveal a negative power of $\Lambda_0$, which can be factorized from the CAG $L^{\Lambda,\Lambda_0}_{\chi(q);l,n}$. It is quite conceivably a sufficient condition for achieving this, to require that all r.c. be bounded by a negative power of $\Lambda_0$. The main issue of Sect. 4 will be to prove that there exist r.c. on the CAG such that the inserted CAG describing BRS violation obey such suppressed r.c. Once this is accomplished we can rely on the following proposition for the restoration of BRS invariance:

Proposition 4. Replace the statements from B2) on the renormalization conditions by

B3) At $\Lambda = 0$ the $L^{0,\Lambda_0}_{\chi(q);l,n}$ with $|n| + |w| \le D$ are fixed at order $\hbar^l$ and $p = 0$ to be smooth functions of $\Lambda_0$ bounded by

$$ \frac{1}{\Lambda_0}\, P(\log(\Lambda_0/m)). \tag{55} $$

Then we have the bound

$$ |\partial^w L^{\Lambda,\Lambda_0}_{\chi(q);l,n}(p)| \le \frac{1}{\Lambda_0}\, (\Lambda + m)^{D+1-|n|-|w|}\; P_1(\log(\Lambda_0/m))\, P_2\Big(\frac{|p|}{\Lambda+m}\Big). \tag{56} $$

Again we do not give a proof, but refer to our previous remarks, to [KK3] and in particular to Prop. 7 in the paper on QED [KK2], where similar results were obtained in the more complicated situation of a massless theory. Proposition 4 obviously shows that the CAG $L^{\Lambda,\Lambda_0}_{\chi(q);l,n}$ vanish for $\Lambda_0 \to \infty$. We remark that in Sect. 4 we will arrange for r.c. such that the bound (55) can be set to 0. This does not improve (56), because of the nonvanishing b.c. for the irrelevant terms (see B1), (52) above).

4. Restoration of the Slavnov–Taylor Identities

4.1. Violated Slavnov–Taylor Identities for Connected and Proper Green Functions. Once the physical free parameters of the theory, i.e. $g$, $\lambda$, $m$ and the gauge fixing parameter $\alpha$¹⁴ have been fixed, the Yang–Mills–Higgs theory should be uniquely determined up to normalizations of the fields. The standard tool to enforce this uniqueness are the Slavnov–Taylor identities. Their role is twofold in renormalization procedures based on invariant regularization schemes: apart from assuring uniqueness and physical gauge invariance, they also serve as a technical tool to show inductively that the theory can be renormalized without introducing counterterms not present in the bare action. We only have to ensure their validity for the first purpose. At an intermediate stage they are inevitably violated by the regularization in momentum space, as gauge invariance is. We want to show that they hold after removing the regularization, if we choose the renormalization conditions properly. Our starting point is the generating functional of the regularized Green functions at the physical value $\Lambda = 0$ of the flow parameter. Remembering (21, 22) we write

$$ \langle \Phi, K \rangle = \int dx\, \Big\{ \sum_\tau \varphi_\tau(x) J_\tau(x) + \bar c^a(x)\eta^a(x) + \bar\eta^a(x) c^a(x) \Big\}. \tag{57} $$

14 on which physical quantities should not depend

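The Gaussian measure $d\mu_{\Lambda_0}(\Phi)$ corresponding to the quadratic form $\frac{1}{\hbar} Q_{\Lambda_0}$ with

$$ Q_{\Lambda_0} = \tfrac12 \langle A^a_\mu, (C^{0,\Lambda_0})^{-1}_{\mu\nu} A^a_\nu \rangle + \tfrac12 \langle h, (C^{0,\Lambda_0})^{-1} h \rangle + \tfrac12 \langle B^a, (S^{0,\Lambda_0})^{-1} B^a \rangle - \langle \bar c^a, (S^{0,\Lambda_0})^{-1} c^a \rangle \tag{58} $$

is given by its characteristic functional

$$ \int d\mu_{\Lambda_0}(\Phi)\; e^{\frac{1}{\hbar}\langle \Phi, K \rangle} = e^{\frac{1}{\hbar} P(K)} \tag{59} $$

with

$$ P(K) = \tfrac12 \langle j^a_\mu, C^{0,\Lambda_0}_{\mu\nu} j^a_\nu \rangle + \tfrac12 \langle s, C^{0,\Lambda_0} s \rangle + \tfrac12 \langle b^a, S^{0,\Lambda_0} b^a \rangle - \langle \bar\eta^a, S^{0,\Lambda_0} \eta^a \rangle . \tag{60} $$

The generating functional of the regularized Green functions may now be written as

$$ Z^{0,\Lambda_0}(K) = \int d\mu_{\Lambda_0}(\Phi)\; e^{-\frac{1}{\hbar} L^{\Lambda_0,\Lambda_0} + \frac{1}{\hbar}\langle \Phi, K \rangle} . \tag{61} $$

Defining regularized BRS variations of the fields through

$$
\begin{aligned}
\delta_{\mathrm{BRS}}\, \varphi_\tau(x) &= -(\sigma_{0,\Lambda_0}\psi_\tau)(x)\, \varepsilon, \\
\delta_{\mathrm{BRS}}\, c^a(x) &= -(\sigma_{0,\Lambda_0}\Omega^a)(x)\, \varepsilon, \\
\delta_{\mathrm{BRS}}\, \bar c^a(x) &= -\big[\sigma_{0,\Lambda_0}\big(\tfrac{1}{\alpha}\partial_\nu A^a_\nu - m B^a\big)\big](x)\, \varepsilon,
\end{aligned} \tag{62}
$$

the BRS transform of the Gaussian measure is given by

$$
\begin{aligned}
d\mu_{\Lambda_0}(\Phi) \mapsto{}& d\mu_{\Lambda_0}(\Phi)\, \Big\{ 1 + \frac{1}{\hbar} \sum_\tau \langle \varphi_\tau, (C_\tau^{0,\Lambda_0})^{-1} \sigma_{0,\Lambda_0} \psi_\tau \rangle\, \varepsilon - \frac{1}{\hbar} \langle \bar c^a, (S^{0,\Lambda_0})^{-1} \sigma_{0,\Lambda_0} \Omega^a \rangle\, \varepsilon \\
&\qquad + \frac{1}{\hbar} \langle \tfrac{1}{\alpha}\partial_\nu A^a_\nu - m B^a,\; \sigma_{0,\Lambda_0} (S^{0,\Lambda_0})^{-1} c^a \rangle\, \varepsilon \Big\} \\
={}& d\mu_{\Lambda_0}(\Phi)\, \Big\{ 1 - \frac{1}{\hbar}\, \delta_{\mathrm{BRS}} Q_{\Lambda_0} \Big\}.
\end{aligned} \tag{63}
$$

The BRS-variation of the measure has mass dimension 5, since $\sigma_{0,\Lambda_0}$ just cancels its inverse appearing in the inverted propagators in (63). This is convenient, and it is the basic reason why we regularized the BRS-transformation. Requiring the invariance of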

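the functional integral in (61) under (regularized) BRS-transformations of the field variables¹⁵, (62) provides us with the Violated Slavnov–Taylor identities (VSTI):

$$ 0 \stackrel{!}{=} \int d\mu_{\Lambda_0}(\Phi)\; e^{-\frac{1}{\hbar} L^{\Lambda_0,\Lambda_0} + \frac{1}{\hbar}\langle \Phi, K \rangle}\, \Big\{ \delta_{\mathrm{BRS}} \langle \Phi, K \rangle - \delta_{\mathrm{BRS}} \big( Q_{\Lambda_0} + L^{\Lambda_0,\Lambda_0} \big) \Big\}. \tag{64} $$

15 These transformations of variables and consequently (64) can be given rigorous meaning for the regularized Gaussian integrals. Arguing formally, (64) amounts to the somewhat sloppy statement that the Jacobian of the BRS-transformation equals 1, which in turn has rigorous meaning for the lattice regularization, see e.g. [Rei].

The BRS variations in (64) can be generated using an appropriate operator insertion:

i) First we define the modified generating functional using (47),

$$ \tilde Z^{0,\Lambda_0}(K, \xi) = \int d\mu_{\Lambda_0}(\Phi)\; e^{-\frac{1}{\hbar} \tilde L^{\Lambda_0,\Lambda_0} + \frac{1}{\hbar}\langle \Phi, K \rangle}, \tag{65} $$

together with the regularized BRS operator (compare to (18))

$$ \mathcal{D}_{\Lambda_0} = \sum_\tau \Big\langle J_\tau,\, \sigma_{0,\Lambda_0} \frac{\delta}{\delta\gamma_\tau} \Big\rangle + \Big\langle \bar\eta^a,\, \sigma_{0,\Lambda_0} \frac{\delta}{\delta\omega^a} \Big\rangle + \Big\langle \Big( \frac{1}{\alpha}\partial_\nu \frac{\delta}{\delta j^a_\nu} - m \frac{\delta}{\delta b^a} \Big),\, \sigma_{0,\Lambda_0}\, \eta^a \Big\rangle . \tag{66} $$

ii) Secondly we define the terms emerging from the BRS-noninvariance of the action to form the insertion $L_1^{\Lambda_0,\Lambda_0}$ with ghost number 1,

$$ L_1^{\Lambda_0,\Lambda_0}\, \varepsilon := -\delta_{\mathrm{BRS}} \big( Q_{\Lambda_0} + L^{\Lambda_0,\Lambda_0} \big). \tag{67} $$

Due to the regularizing factor $\sigma_{0,\Lambda_0}$ in (62) the insertion $L_1^{\Lambda_0,\Lambda_0}$ is not a local operator. Using (67) we introduce the generating functional

$$ Z_\chi^{0,\Lambda_0}(K) := \int d\mu_{\Lambda_0}(\Phi)\; e^{-\frac{1}{\hbar}\left( L^{\Lambda_0,\Lambda_0} + \chi L_1^{\Lambda_0,\Lambda_0} \right) + \frac{1}{\hbar}\langle \Phi, K \rangle} \tag{68} $$

for $\chi \in \mathbb{R}$. Now the VSTI (64) can be rewritten as

$$ \mathcal{D}_{\Lambda_0} \tilde Z^{0,\Lambda_0}(K, \xi) \big|_{\xi\equiv 0} = \frac{d}{d\chi} Z_\chi^{0,\Lambda_0}(K) \Big|_{\chi=0} . \tag{69} $$

The modified functionals from (65, 68) permit us to define the generating functionals of the corresponding CAG with the respective insertions

$$ \tilde Z^{0,\Lambda_0}(K, \xi) = e^{\frac{1}{\hbar} P(K)}\; e^{-\frac{1}{\hbar}\left( I^{0,\Lambda_0} + \tilde L^{0,\Lambda_0}(\varphi_\tau, c, \bar c;\, \xi) \right)}, \tag{70} $$

$$ Z_\chi^{0,\Lambda_0}(K) = e^{\frac{1}{\hbar} P(K)}\; e^{-\frac{1}{\hbar}\left( I^{0,\Lambda_0} + L_\chi^{0,\Lambda_0}(\varphi_\tau, c, \bar c) \right)}, \tag{71} $$

with the relations

$$ \varphi_\tau(x) = \int dy\, C_\tau^{0,\Lambda_0}(x-y)\, J_\tau(y), \qquad c^a(x) = -\int dy\, S^{0,\Lambda_0}(x-y)\, \eta^a(y), \qquad \bar c^a(x) = -\int dy\, S^{0,\Lambda_0}(x-y)\, \bar\eta^a(y) \tag{72} $$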

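between the variables of the $Z$ and $L$ functionals. Introducing the shorthand

$$ D_\tau = \Big( (-\Delta + m^2)\,\delta_{\mu\nu} - \frac{1-\alpha}{\alpha}\, \partial_\mu\partial_\nu,\;\; -\Delta + M^2,\;\; -\Delta + \alpha m^2 \equiv D \Big) \tag{73} $$

for the inverted nonregularized propagators and also (remember (49))

$$ L_1 := L_1^{0,\Lambda_0} = \frac{d}{d\chi} L_\chi^{0,\Lambda_0} \Big|_{\chi=0}, \qquad L := L^{0,\Lambda_0} = \tilde L^{0,\Lambda_0} \big|_{\xi\equiv 0} \;\big( = L_\chi^{0,\Lambda_0} \big|_{\chi=0} \big), \tag{74} $$

since we will mostly regard the theory with $\Lambda$ set to 0 in this section, we obtain from (69) via (70, 71, 72) the VSTI for the connected amputated functions (CAG),

$$ L_1 = \Big\langle c^a,\, D \Big( \frac{1}{\alpha}\partial_\nu A^a_\nu - m B^a \Big) \Big\rangle - \Big\langle c^a,\, \sigma_{0,\Lambda_0} \Big( \partial_\nu \frac{\delta L}{\delta A^a_\nu} - m \frac{\delta L}{\delta B^a} \Big) \Big\rangle + \sum_\tau \langle \varphi_\tau, D_\tau L_{\gamma_\tau} \rangle - \langle \bar c^a, D\, L_{\omega^a} \rangle . \tag{75} $$

Since we also have to regard the proper vertex functions we define in an intermediate step the generating functional of connected nonamputated Green functions¹⁶

$$ e^{\frac{1}{\hbar} \tilde W(K, \xi)} = \frac{\tilde Z(K, \xi)}{\tilde Z(0, 0)} \tag{76} $$

(leaving out again the upper indices $0, \Lambda_0$). From this we derive using (69, 71, 72)

$$ \mathcal{D}_{\Lambda_0} \tilde W(K, \xi) \big|_{\xi=0} = -L_1(\varphi_\tau, c, \bar c). \tag{77} $$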

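16 Noting again that vacuum functionals should only appear before taking the infinite volume limit.

The Legendre transform of $\tilde W$ now leads us to the generating functional of the proper vertex functions. We set

$$ \tilde\Gamma(\underline{\varphi}_\tau, \underline{c}, \underline{\bar c};\, \xi) + \tilde W(J_\tau, \bar\eta, \eta;\, \xi) = \int dy\, \Big\{ \sum_\tau \underline{\varphi}_\tau J_\tau + \underline{\bar c}\,\eta + \bar\eta\, \underline{c} \Big\} \tag{78} $$

with the relations

$$
\begin{aligned}
J_\tau(x) &= \frac{\delta\tilde\Gamma}{\delta\underline{\varphi}_\tau(x)}, & \bar\eta^a(x) &= -\frac{\delta\tilde\Gamma}{\delta\underline{c}^a(x)}, & \eta^a(x) &= \frac{\delta\tilde\Gamma}{\delta\underline{\bar c}^a(x)}, \\
\underline{\varphi}_\tau(x) &= \frac{\delta\tilde W}{\delta J_\tau(x)}, & \underline{\bar c}^a(x) &= -\frac{\delta\tilde W}{\delta\eta^a(x)}, & \underline{c}^a(x) &= \frac{\delta\tilde W}{\delta\bar\eta^a(x)} .
\end{aligned} \tag{79}
$$

Note that (78) says that $J_\tau, \ldots$ may be viewed as a formal power series in $\hbar$ with coefficients depending on the classical fields $\underline{\varphi}_\tau, \ldots$. These series may be inverted to express $\underline{\varphi}_\tau, \ldots$ as series in terms of $J_\tau, \ldots$. As a consequence of (78) the relations

$$ \frac{\delta\tilde W}{\delta\gamma_\tau} + \frac{\delta\tilde\Gamma}{\delta\gamma_\tau} = 0 \tag{80} $$

and an analogous one for the derivative w.r.t. the source $\omega^a$ hold. Similarly as before we write

$$ \Gamma = \tilde\Gamma \big|_{\xi\equiv 0}, \qquad \Gamma_{\gamma_\tau}(x) = \frac{\delta\tilde\Gamma}{\delta\gamma_\tau(x)} \Big|_{\xi\equiv 0} . \tag{81} $$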
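The relation (80) between the source derivatives of $\tilde W$ and $\tilde\Gamma$ is a general property of the Legendre transform (78), and it can be checked in a zero-dimensional toy model. The sketch below is purely illustrative: the function $W(J, g)$, with a single bosonic source $J$ and a single parameter $g$ standing in for an insertion source, is a hypothetical choice of ours and has nothing to do with the actual functionals of the theory.

```python
import math

def W(J, g):
    """Toy 'connected' function of one source J and one insertion parameter g (hypothetical)."""
    return math.log(math.cosh(J)) + g * J * J

def dW_dJ(J, g):
    """Classical field phi = dW/dJ, cf. (79)."""
    return math.tanh(J) + 2.0 * g * J

def solve_J(phi, g, lo=-50.0, hi=50.0):
    """Invert phi = dW/dJ by bisection; dW/dJ is strictly increasing in J for g >= 0."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if dW_dJ(mid, g) < phi:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def Gamma(phi, g):
    """Legendre transform: Gamma(phi, g) + W(J, g) = phi * J, cf. (78)."""
    J = solve_J(phi, g)
    return phi * J - W(J, g)

def legendre_residual(phi, g, eps=1e-5):
    """dGamma/dg at fixed phi plus dW/dg at fixed J; vanishes by the analogue of (80)."""
    dGamma_dg = (Gamma(phi, g + eps) - Gamma(phi, g - eps)) / (2.0 * eps)
    J = solve_J(phi, g)
    dW_dg = J * J                      # partial derivative of W with respect to g at fixed J
    return dGamma_dg + dW_dg

print(legendre_residual(0.7, 0.3))
```

The residual is zero up to the finite-difference error, because the implicit dependence of $J$ on $g$ at fixed $\varphi$ drops out by the stationarity condition $\varphi = \partial W / \partial J$.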

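Then the VSTI for the proper vertex functions emerging from (77) (where the upper indices $\Lambda = 0$, $\Lambda_0$ in (82, 83, 84) are understood) read

$$ \sum_\tau \Big\langle \frac{\delta\Gamma}{\delta\underline{\varphi}_\tau},\, \sigma_{0,\Lambda_0} \Gamma_{\gamma_\tau} \Big\rangle - \Big\langle \frac{\delta\Gamma}{\delta\underline{c}^a},\, \sigma_{0,\Lambda_0} \Gamma_{\omega^a} \Big\rangle - \Big\langle \Big( \frac{1}{\alpha}\partial_\nu \underline{A}^a_\nu - m \underline{B}^a \Big),\, \sigma_{0,\Lambda_0} \frac{\delta\Gamma}{\delta\underline{\bar c}^a} \Big\rangle = \Gamma_1(\underline{\varphi}, \underline{c}, \underline{\bar c}) \tag{82} $$

with

$$ \Gamma_1(\underline{\varphi}, \underline{c}, \underline{\bar c}) = L_1(\varphi, c, \bar c) \tag{83} $$

and

$$ \varphi_\tau(x) = \int dy\, C_\tau(x-y)\, \frac{\delta\Gamma}{\delta\underline{\varphi}_\tau(y)}, \qquad c^a(x) = -\int dy\, S(x-y)\, \frac{\delta\Gamma}{\delta\underline{\bar c}^a(y)}, \qquad \bar c^a(x) = \int dy\, \frac{\delta\Gamma}{\delta\underline{c}^a(y)}\, S(y-x). \tag{84} $$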

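4.2. Flow Equations and Renormalizability of Vertex Functions. In this section we briefly comment on flow equations for proper vertex functions. Such FE have been analysed previously in [KKSc] for $\phi^4_4$-theory, to prove analyticity statements in Minkowski space. They have been derived and applied before in the literature, see e.g. [BAM1, Wet]. Writing (70, 76, 78) with general $\Lambda$ instead of $\Lambda = 0$ we may derive FE similarly as in the previous chapter by deriving w.r.t. $\Lambda$. Deriving (76) we obtain

$$ \partial_\Lambda \tilde W^{\Lambda,\Lambda_0}(K, \xi) = \partial_\Lambda P^{\Lambda,\Lambda_0}(K) - \partial_\Lambda \tilde L^{\Lambda,\Lambda_0}(\varphi_\tau, c, \bar c), \tag{85} $$

and (78) then implies

$$ \partial_\Lambda \tilde\Gamma^{\Lambda,\Lambda_0} + \partial_\Lambda \tilde W^{\Lambda,\Lambda_0} = 0. \tag{86} $$

Combining both equations and using the FE derived from (48) for the functional $\tilde L^{\Lambda,\Lambda_0}$ we obtain the FE for $\tilde\Gamma^{\Lambda,\Lambda_0}$:

$$
\begin{aligned}
\partial_\Lambda \tilde\Gamma^{\Lambda,\Lambda_0}(\underline{\varphi}_\tau, \underline{c}, \underline{\bar c}) &- \frac12 \sum_\tau \int_p \underline{\varphi}_\tau(p)\; \partial_\Lambda \big(C_\tau^{\Lambda,\Lambda_0}(p)\big)^{-1}\, \underline{\varphi}_\tau(-p) + \int_p \underline{\bar c}^a(p)\; \partial_\Lambda \big(S^{\Lambda,\Lambda_0}(p)\big)^{-1}\, \underline{c}^a(-p) \\
&= \hbar\, \big(\partial_\Lambda \Delta(\Lambda,\Lambda_0)\big)\, \tilde L^{\Lambda,\Lambda_0}(\varphi_\tau, c, \bar c).
\end{aligned} \tag{87}
$$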

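The functional on the r.h.s. has to be viewed as depending on the (classical) fields $\underline{\varphi}_\tau, \underline{\bar c}, \underline{c}$. In momentum space the fields $\varphi_\tau, \bar c, c$ are given in terms of those through
$$\varphi_\tau(p) = (2\pi)^4\, C_\tau^{\Lambda,\Lambda_0}(p)\,\frac{\delta\tilde\Gamma^{\Lambda,\Lambda_0}}{\delta\underline{\varphi}_\tau(-p)}, \qquad c^a(p) = -(2\pi)^4\, S^{\Lambda,\Lambda_0}(p)\,\frac{\delta\tilde\Gamma^{\Lambda,\Lambda_0}}{\delta\underline{\bar c}^a(-p)}, \qquad \bar c^a(p) = (2\pi)^4\, S^{\Lambda,\Lambda_0}(p)\,\frac{\delta\tilde\Gamma^{\Lambda,\Lambda_0}}{\delta\underline{c}^a(-p)}$$
corresponding to (84). The r.h.s. of (87) is expressed in terms of $\underline{\varphi}_\tau, \underline{\bar c}, \underline{c}\,^{17}$ using the following relations (and the chain rule):
$$\begin{aligned}
(2\pi)^{-4}\big(C_\tau^{\Lambda,\Lambda_0}(p)\big)^{-1}\underline{\varphi}_\tau(p) &= -\frac{\delta\tilde L^{\Lambda,\Lambda_0}}{\delta\varphi_\tau(-p)} + \frac{\delta\tilde\Gamma^{\Lambda,\Lambda_0}}{\delta\underline{\varphi}_\tau(-p)}, \\
(2\pi)^{-4}\big(S^{\Lambda,\Lambda_0}(p)\big)^{-1}\underline{c}^a(p) &= \frac{\delta\tilde L^{\Lambda,\Lambda_0}}{\delta\bar c^a(-p)} - \frac{\delta\tilde\Gamma^{\Lambda,\Lambda_0}}{\delta\underline{\bar c}^a(-p)}, \\
(2\pi)^{-4}\big(S^{\Lambda,\Lambda_0}(p)\big)^{-1}\underline{\bar c}^a(p) &= -\frac{\delta\tilde L^{\Lambda,\Lambda_0}}{\delta c^a(-p)} + \frac{\delta\tilde\Gamma^{\Lambda,\Lambda_0}}{\delta\underline{c}^a(-p)}.
\end{aligned} \tag{88}$$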

The inverted propagators appearing in (87, 88) remain only at the tree level; they cancel at loop order $\ge 1$. Considering first the functional without insertions we may again inductively bound the functions $\partial^w\Gamma^{\Lambda,\Lambda_0}_{l,n}$, proceeding as in Sect. 3 upwards in $l$ (note the factor of $\hbar$ on the r.h.s.), for given $l$ upwards in $|n|$, and for given $l, |n|$ downwards in the number of momentum derivatives. The induction starts from the tree order vertex functional
$$\Gamma^{\Lambda,\Lambda_0}_{l=0} = \frac12\sum_\tau\int_p \underline{\varphi}_\tau(p)\,\big(C_\tau^{\Lambda,\Lambda_0}(p)\big)^{-1}\underline{\varphi}_\tau(-p) - \int_p \underline{\bar c}^a(p)\,\big(S^{\Lambda,\Lambda_0}(p)\big)^{-1}\underline{c}^a(-p) + \big(\Gamma_3^{\Lambda,\Lambda_0} + \Gamma_4^{\Lambda,\Lambda_0}\big)_{l=0} + L^0_{irr}\big|_{l=0}. \tag{89}$$
The tree level three and four point functions from the third term are given in App. A; the last term is the tree level contribution to the irrelevant extension of $L^0$ in (107, 108). Imposing b.c. analogous to those imposed on the CAG from Sect. 3.2 in A1), A2) we may then derive the bounds

Proposition 5.
$$\big|\partial^w\Gamma^{\Lambda,\Lambda_0}_{l,n}(p)\big| \le (\Lambda+m)^{4-|n|-|w|}\; P_1\Big(\log\frac{\Lambda+m}{m}\Big)\; P_2\Big(\frac{|p|}{\Lambda+m}\Big) \tag{90}$$
with the same comments as for Proposition 1.

17 Note that $\Delta(\Lambda, \Lambda_0)$ in (87) is still the one in terms of the fields $\varphi_\tau, \bar c, c$.


C. Kopper, V. F. Müller

We again skip the proof. Finally we note that to obtain the analogous renormalizability statements for vertex functions with one insertion, the FE (87) has to be differentiated w.r.t. the corresponding source. Again an FE emerges that is linear in the inserted vertex functions but also involves the noninserted ones. Its solutions are bounded in the same way as the corresponding CAG from Sect. 3. Since the analysis of the STI is more transparent in terms of the vertex functions, the renormalization conditions will be imposed on those. We may then directly infer the finiteness of the theory from the results of this section. We could also calculate from the b.c. on the vertex functions those for the CAG, which then also satisfy A1), A2), and conclude the finiteness by Sect. 3, so that we might have skipped this section altogether, paying instead more attention to how to calculate b.c. on $L$ from those for $\Gamma$ and vice versa. Generally speaking it seems to us that FE for vertex functions are useful in their own right. Nevertheless the CAG should perhaps be viewed as the "primary objects" of interest, since the FE for them takes a closed functional form. This closed form is of fundamental importance for the analysis of the linear relations among the STI and thus crucial for the proof of the theorem and in particular Lemma 2 below.

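4.3. Violated Slavnov–Taylor Identities for the bare functional $L^0$. In this section we use again the abbreviations
$$\Delta = \Delta(0,\Lambda_0), \quad L = L^{0,\Lambda_0}, \quad \tilde L = \tilde L^{0,\Lambda_0}, \quad L^0 = L^{\Lambda_0,\Lambda_0}, \quad \tilde L^0 = \tilde L^{\Lambda_0,\Lambda_0}, \quad L^0_1 = L_1^{\Lambda_0,\Lambda_0}. \tag{91}$$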

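Our starting point is the VSTI (75). By commuting the functional differential operator appearing on the rhs of (75) with the renormalization group flow we will obtain the VSTI in terms of $L^0$. We introduce some further abbreviations:
$$\frac{\delta}{\delta R^a(x)} = \frac{1}{\alpha}\,\partial_\nu\frac{\delta}{\delta A^a_\nu(x)} - m\,\frac{\delta}{\delta B^a(x)}, \qquad X = \Big\langle D c^a,\, \Big(\frac{1}{\alpha}\partial_\nu A^a_\nu - m B^a\Big)\Big\rangle,$$
$$Y = \Big\langle c^a,\, \sigma_{0,\Lambda_0}\frac{\delta}{\delta R^a}\Big\rangle - \sum_\tau\Big\langle \varphi_\tau,\, D_\tau\frac{\delta}{\delta\gamma_\tau}\Big\rangle + \Big\langle \bar c^a,\, D\frac{\delta}{\delta\omega^a}\Big\rangle. \tag{92}$$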

Now we can write (75) in the form
$$L_1 = e^{\frac{1}{\hbar}\tilde L}\,(X + \hbar Y)\, e^{-\frac{1}{\hbar}\tilde L}\Big|_{\xi\equiv 0}. \tag{93}$$

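The last two factors may be rewritten as (remember (48))
$$\begin{aligned}
(X + \hbar Y)\, e^{-\frac{1}{\hbar}\tilde L} &= e^{\frac{1}{\hbar}I}\, e^{\hbar\Delta}\, e^{-\hbar\Delta}\,(X + \hbar Y)\, e^{\hbar\Delta}\, e^{-\frac{1}{\hbar}\tilde L^0} \\
&= e^{\frac{1}{\hbar}I}\, e^{\hbar\Delta}\Big( X + \hbar Y - \hbar\,[\Delta, X + \hbar Y] + \frac{\hbar^2}{2}\,[\Delta,[\Delta, X + \hbar Y]]\Big)\, e^{-\frac{1}{\hbar}\tilde L^0}.
\end{aligned} \tag{94}$$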
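The expansion in (94) stops at the double commutator. A brief sketch of why, assuming only that $\Delta$ is a second-order functional differential operator with field-independent coefficients (as it enters (48)) and using the forms of $X$ and $Y$ from (92); the statements about the individual commutators can be read off from (95a)–(95c):

```latex
% Adjoint (Hadamard) expansion for operators \Delta, B:
\begin{align*}
e^{-\hbar\Delta}\, B\, e^{\hbar\Delta}
  &= \sum_{\nu\ge 0}\frac{(-\hbar)^\nu}{\nu!}\,[\Delta, B]_\nu,
  \qquad [\Delta,B]_0 = B,\quad [\Delta,B]_{\nu+1} = [\Delta,[\Delta,B]_\nu].
\end{align*}
% X is a multiplication operator, quadratic in the fields:
%   [\Delta, X]            is linear in the fields and of first order in derivatives, cf. (95b);
%   [\Delta,[\Delta, X]]   contains no fields, cf. (95c), hence commutes with \Delta.
% Y is linear in the fields and of first order in derivatives:
%   [\Delta, Y]            contains no fields, cf. (95a), hence commutes with \Delta.
\begin{align*}
[\Delta, X]_\nu = 0 \quad (\nu\ge 3), \qquad [\Delta, Y]_\nu = 0 \quad (\nu\ge 2).
\end{align*}
```

So only the three terms shown in (94) survive, and the $\hbar^2/2$ term receives a contribution from $X$ alone.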


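We have to calculate the commutators
$$[\Delta, Y] = -\Big\langle \frac{\delta}{\delta\bar c^a},\, \sigma_{0,\Lambda_0} S\,\frac{\delta}{\delta R^a}\Big\rangle - \sum_\tau\Big\langle \frac{\delta}{\delta\varphi_\tau},\, \sigma_{0,\Lambda_0}\frac{\delta}{\delta\gamma_\tau}\Big\rangle + \Big\langle \frac{\delta}{\delta c^a},\, \sigma_{0,\Lambda_0}\frac{\delta}{\delta\omega^a}\Big\rangle, \tag{95a}$$
$$[\Delta, X] = \Big\langle c^a,\, \sigma_{0,\Lambda_0}\frac{\delta}{\delta R^a}\Big\rangle - \Big\langle \frac{\delta}{\delta\bar c^a},\, \sigma_{0,\Lambda_0}\Big(\frac{1}{\alpha}\partial_\nu A^a_\nu - m B^a\Big)\Big\rangle, \tag{95b}$$
$$[\Delta,[\Delta, X]] = -2\,\Big\langle \frac{\delta}{\delta\bar c^a},\, \sigma_{0,\Lambda_0} S\,\frac{\delta}{\delta R^a}\Big\rangle. \tag{95c}$$
From these relations we obtain
$$\begin{aligned}
(X + \hbar Y)\, e^{-\frac{1}{\hbar}\tilde L} = e^{\frac{1}{\hbar}I}\, e^{\hbar\Delta}\,\Big(
&\Big\langle c^a,\, D\Big(\frac{1}{\alpha}\partial_\nu A^a_\nu - m B^a\Big)\Big\rangle
 - \Big\langle \frac{\delta\tilde L^0}{\delta\bar c^a},\, \sigma_{0,\Lambda_0}\Big(\frac{1}{\alpha}\partial_\nu A^a_\nu - m B^a\Big)\Big\rangle \\
&+ \sum_\tau\Big\langle \varphi_\tau,\, D_\tau\frac{\delta\tilde L^0}{\delta\gamma_\tau}\Big\rangle
 - \Big\langle \bar c^a,\, D\frac{\delta\tilde L^0}{\delta\omega^a}\Big\rangle \\
&+ \sum_\tau\Big\langle \frac{\delta\tilde L^0}{\delta\varphi_\tau},\, \sigma_{0,\Lambda_0}\frac{\delta\tilde L^0}{\delta\gamma_\tau}\Big\rangle
 - \Big\langle \frac{\delta\tilde L^0}{\delta c^a},\, \sigma_{0,\Lambda_0}\frac{\delta\tilde L^0}{\delta\omega^a}\Big\rangle
\Big)\, e^{-\frac{1}{\hbar}\tilde L^0}.
\end{aligned} \tag{96}$$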
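As a consistency check on (96), the operator inside the parentheses of (94) can be collected term by term from (92) and (95a)–(95c). The following sketch only rearranges those formulas; signs from reordering anticommuting fields are taken as fixed by the paper's conventions and are not rederived here.

```latex
\begin{align*}
&X + \hbar Y - \hbar[\Delta, X] - \hbar^2[\Delta, Y] + \tfrac{\hbar^2}{2}[\Delta,[\Delta, X]] \\
&\quad = X
 + \hbar\Big( -\sum_\tau\big\langle \varphi_\tau, D_\tau\tfrac{\delta}{\delta\gamma_\tau}\big\rangle
   + \big\langle \bar c^a, D\tfrac{\delta}{\delta\omega^a}\big\rangle
   + \big\langle \tfrac{\delta}{\delta\bar c^a}, \sigma_{0,\Lambda_0}\big(\tfrac{1}{\alpha}\partial_\nu A^a_\nu - mB^a\big)\big\rangle \Big) \\
&\qquad + \hbar^2\Big( \sum_\tau\big\langle \tfrac{\delta}{\delta\varphi_\tau}, \sigma_{0,\Lambda_0}\tfrac{\delta}{\delta\gamma_\tau}\big\rangle
   - \big\langle \tfrac{\delta}{\delta c^a}, \sigma_{0,\Lambda_0}\tfrac{\delta}{\delta\omega^a}\big\rangle \Big).
\end{align*}
% Cancellations used:
%   +\hbar\langle c^a, \sigma\,\delta/\delta R^a\rangle            from \hbar Y
%   -\hbar\langle c^a, \sigma\,\delta/\delta R^a\rangle            from -\hbar[\Delta, X]
%   +\hbar^2\langle \delta/\delta\bar c^a, \sigma S\,\delta/\delta R^a\rangle   from -\hbar^2[\Delta, Y]
%   -\hbar^2\langle \delta/\delta\bar c^a, \sigma S\,\delta/\delta R^a\rangle   from (\hbar^2/2)[\Delta,[\Delta, X]]
```

Acting on $e^{-\tilde L^0/\hbar}$, each derivative produces a factor $-\frac{1}{\hbar}\,\delta\tilde L^0$, which turns the order-$\hbar$ terms into the second line of (96) and the order-$\hbar^2$ terms into its last line, up to a second-derivative remainder of order $\hbar$.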

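Note that due to the form of $\tilde L^0$ the contribution
$$\hbar\sum_\tau\Big\langle \frac{\delta}{\delta\varphi_\tau},\, \sigma_{0,\Lambda_0}\frac{\delta}{\delta\gamma_\tau}\Big\rangle\tilde L^0 - \hbar\,\Big\langle \frac{\delta}{\delta c^a},\, \sigma_{0,\Lambda_0}\frac{\delta}{\delta\omega^a}\Big\rangle\tilde L^0$$
vanishes and thus may be omitted in the parentheses in (96). On the other hand using (93, 74) we can also express $(X + \hbar Y)\, e^{-\frac{1}{\hbar}\tilde L}|_{\xi\equiv 0}$ in terms of $L^0_1$:
$$\begin{aligned}
(X + \hbar Y)\, e^{-\frac{1}{\hbar}\tilde L}\Big|_{\xi\equiv 0} = L_1\, e^{-\frac{1}{\hbar}\tilde L}\Big|_{\xi\equiv 0} &= -\hbar\,\frac{d}{d\chi}\, e^{-\frac{1}{\hbar}L_\chi}\Big|_{\chi=0} \\
&= -\hbar\,\frac{d}{d\chi}\, e^{\frac{1}{\hbar}I}\, e^{\hbar\Delta}\, e^{-\frac{1}{\hbar}L^0_\chi}\Big|_{\chi=0} = e^{\frac{1}{\hbar}I}\, e^{\hbar\Delta}\, L^0_1\, e^{-\frac{1}{\hbar}L^0}.
\end{aligned} \tag{97}$$
Remember that $\tilde L|_{\xi\equiv 0} = L_\chi|_{\chi=0} = L$. Equality of (96) for $\xi \equiv 0$ and (97) and invertibility of $\exp\hbar\Delta$ (in perturbation theory) now obviously give
$$\begin{aligned}
&\Big\langle c^a,\, D\Big(\frac{1}{\alpha}\partial_\nu A^a_\nu - m B^a\Big)\Big\rangle - \Big\langle \frac{\delta L^0}{\delta\bar c^a},\, \sigma_{0,\Lambda_0}\Big(\frac{1}{\alpha}\partial_\nu A^a_\nu - m B^a\Big)\Big\rangle \\
&\quad + \sum_\tau\big\langle \varphi_\tau,\, D_\tau L^0_{\gamma_\tau}\big\rangle - \big\langle \bar c^a,\, D L^0_{\omega^a}\big\rangle + \sum_\tau\Big\langle \frac{\delta L^0}{\delta\varphi_\tau},\, \sigma_{0,\Lambda_0} L^0_{\gamma_\tau}\Big\rangle - \Big\langle \frac{\delta L^0}{\delta c^a},\, \sigma_{0,\Lambda_0} L^0_{\omega^a}\Big\rangle = L^0_1.
\end{aligned} \tag{98}$$
Equation (98) is the VSTI for the bare functional $L^0$. It turns out that it plays, unexpectedly, a prominent role in the analysis of how the STI can be restored. Since we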


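impose renormalization conditions in momentum space we also express (98) through the Fourier transformed fields (using the conventions from Sect. 2)
$$\begin{aligned}
L^0_1 ={}& \int_p c^a(p)\,(p^2 + \alpha m^2)\Big\{-\frac{i}{\alpha}\,p_\nu A^a_\nu(-p) - m B^a(-p)\Big\} \\
&- (2\pi)^4\int_p \frac{\delta L^0}{\delta\bar c^a(p)}\Big\{\frac{i}{\alpha}\,p_\nu A^a_\nu(p) - m B^a(p)\Big\}\,\sigma_{0,\Lambda_0}(p^2) \\
&+ \int_p A^a_\mu(p)\Big[(p^2 + m^2)\,\delta_{\mu\nu} + \frac{1-\alpha}{\alpha}\,p_\mu p_\nu\Big] L^0_{\gamma^a_\nu}(p) + \int_p h(p)\,(p^2 + M^2)\,L^0_\gamma(p) \\
&+ \int_p B^a(p)\,(p^2 + \alpha m^2)\,L^0_{\gamma^a}(p) - \int_p \bar c^a(p)\,(p^2 + \alpha m^2)\,L^0_{\omega^a}(p) \\
&+ (2\pi)^4\int_p \sigma_{0,\Lambda_0}(p^2)\cdot\Big\{\frac{\delta L^0}{\delta A^a_\lambda(p)}\,L^0_{\gamma^a_\lambda}(-p) + \frac{\delta L^0}{\delta h(p)}\,L^0_\gamma(-p) + \frac{\delta L^0}{\delta B^a(p)}\,L^0_{\gamma^a}(-p) - \frac{\delta L^0}{\delta c^a(p)}\,L^0_{\omega^a}(-p)\Big\}.
\end{aligned} \tag{99}$$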

4.4. Choice of Renormalization Conditions and Restoration of the Slavnov–Taylor Identities. We have derived the STI in the previous two subsections for all three functionals $\Gamma$, $L$, $L^0$. In fact the $L$-functional is only needed as a connecting link between the other two. As we mentioned before, this threefold description will be required to recognize the linear interdependences among the STI projected onto the relevant parts of the various functionals. For this purpose we also need termwise equivalence relations among the relevant parts of $\Gamma$ and $L^0$. These termwise equivalence relations are simplified if we assume that the renormalization conditions for the functionals $\Gamma$ or $L$ are chosen such that:
$$\kappa := \frac{\delta\Gamma}{\delta h(x)}\Big|_{\Phi\equiv 0} = 0 \iff \frac{\delta L}{\delta h(x)}\Big|_{\Phi\equiv 0} = 0. \tag{100}$$
The condition (100) on the absence of tadpoles, although probably not indispensable, simplifies the subsequent formulae, and it is not really a physical restriction, but rather one on the parametrization of the theory. Here and in the following we use the shorthand notation
$$\partial^w\delta^n_\Phi F\big|_0$$
to denote the derivative of the functional $F$ (which might be $L$ or $\Gamma$) w.r.t. $n$ fields $\Phi$, evaluated at $\Phi \equiv 0$, followed by removing the global $\delta$-function and performing the derivatives $\partial^w$. When we write
$$\partial^w\delta^n_\Phi F\big|_{0,0},$$
in addition we set all momenta to 0 afterwards, and
$$\partial^w\delta^n_\Phi F\big|_{0,0,l} \tag{101}$$
is the $l$-th order coefficient in the loop expansion of the previous expression. We now state


Lemma 1. Under the assumption (100) we have: If for given $l, n, w$ and for all $l', n', w'$ with $l' < l$ and $(n', w') \subseteq (n, w)$, or with $l' = l$ and $(n', w') \subset (n, w)$, we have $\partial^{w'}\delta^{n'}_\Phi\Gamma_1|_{0,0,l'} = 0$, then$^{18}$
$$\partial^w\delta^n_\Phi\Gamma_1\big|_{0,0,l} = 0 \iff \partial^w\delta^n_\Phi L_1\big|_{0,0,l} = 0. \tag{102}$$

Proof. Equation (102) follows from (83, 84) on noting that all propagators are finite and nonvanishing and that all possible factorizations appearing when we apply the chain rule in going from the $L$- to the $\Gamma$-functions or vice versa vanish due to the conditions on the lower dimension terms. $\square$

Lemma 1 suggests that we satisfy the STI for the relevant terms proceeding upwards in the number of fields and momentum derivatives. The subsequent comparison of the VSTI for $L$ (75) and $L^0$ (98) shows that we also should proceed upwards in the number of loops. Before proceeding to the termwise comparison it is instructive to note quite generally that from
$$L_1 = \frac{d}{d\chi}L_\chi\Big|_{\chi=0} \quad\text{and}\quad L^0_1 = \frac{d}{d\chi}L^0_\chi\Big|_{\chi=0}, \tag{103}$$
it follows similarly as in (97) that
$$L_1\, e^{-\frac{1}{\hbar}(L+I)} = e^{\hbar\Delta}\, L^0_1\, e^{-\frac{1}{\hbar}L^0}, \tag{104}$$
and from the (perturbative) invertibility of $e^{\hbar\Delta}$ we then obtain the relation
$$L_1 = 0 \iff L^0_1 = 0. \tag{105}$$
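The step from (103) to (104) can be written out in two lines. The sketch assumes, as in (97), that the flow relation (48) holds for the $\chi$-dependent functionals and that the field-independent term $I$ does not depend on $\chi$:

```latex
\begin{align*}
e^{-\frac{1}{\hbar}(L_\chi + I)} &= e^{\hbar\Delta}\, e^{-\frac{1}{\hbar}L^0_\chi}
  && \text{((48) with } \chi\text{-dependent bare functional)} \\
L_1\, e^{-\frac{1}{\hbar}(L + I)} &= e^{\hbar\Delta}\Big(L^0_1\, e^{-\frac{1}{\hbar}L^0}\Big)
  && \text{(apply } -\hbar\,\tfrac{d}{d\chi}\big|_{\chi=0} \text{ and use (103))}
\end{align*}
% If L^0_1 = 0 the right-hand side vanishes, and since the exponential on the left is
% invertible as a formal power series, L_1 = 0. Conversely, applying e^{-\hbar\Delta}
% to both sides shows that L_1 = 0 forces L^0_1 = 0.
```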

Our goal is to arrange for renormalization conditions such that the relevant terms in $\Gamma_1$ vanish, proceeding inductively in the number of loops $l$. These relevant terms are listed in App. C, (I–XXIX). By the statements from Sect. 2 and 4.2 and App. C there are no nonvanishing relevant terms in $\Gamma_1$ and $L^0_1$ at the tree level in the limit $\Lambda_0 \to \infty$ (this limit remains formal before we have stated how to renormalize the theory in agreement with the STI). Since in the relevant part of the VSTI there are contributions stemming from $\dot\sigma$ (106) for finite $\Lambda_0$ which might conspire to give finite contributions in the VSTI when combining with divergent terms, our strategy is to compensate for them by introducing irrelevant terms in the bare action $L^0$. In this respect it is important to note that the termwise identities (I–XXIX) take the same form for $\Gamma$ and $L^0$ apart from the crucial fact that i) $L^0$ contains only those irrelevant terms we are going to introduce explicitly, and from the fact that ii) there appear additional terms in (98) as compared to (82) which just replace those 0-loop terms excluded in $L^0$ by its definition$^{19}$, so that as a consequence the termwise identities look as before (when ignoring the irrelevant terms).

18 We use the set theoretic relations for the multiindices $n, w$, though strictly speaking they are sequences. The symbol $\subset$ means by definition strict inclusion.
19 These terms contribute only when performing up to three field derivatives.


We will briefly denote the relevant terms in $L^0$ by adding a sub- or superscript 0 to the corresponding term appearing in $\Gamma$. In the same way we denote (I–XXIX) written for $L^0$ as (I$^0$–XXIX$^0$). In a number of STI the irrelevant terms introduced in $L^0$ below (107, 108) will make an appearance, namely in III, V, VII, VIII. For those terms the STI for $L^0$ are rewritten explicitly in App. C including these terms. We use similar notation as in App. A and App. C, in particular the shorthand
$$\dot\sigma := \dot\sigma_{0,\Lambda_0}(0) := \frac{d\sigma_{0,\Lambda_0}(p^2)}{dp^2}\Big|_{p^2=0} = -\frac{\alpha m^4 + (1+\alpha)\,m^2 M^2}{\Lambda_0^6}, \tag{106}$$

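and add the following contribution$^{20}$ to $L^0$:
$$\begin{aligned}
L^0_{irr} = \int_p\int_q \Big\{\, &\epsilon^{rst} A^r_\mu(p) A^s_\nu(q) B^t(-p-q)\cdot\big[\,\delta_{\mu\nu}(p^2 - q^2)\, i^{AAB}_{1,0} + (p_\mu p_\nu - q_\mu q_\nu)\, i^{AAB}_{2,0}\,\big] \\
&+ \bar c^r(p)\, c^r(q)\, h(-p-q)\,\big[\, p^2\, i^{\bar c c h}_{1,0} + q^2\, i^{\bar c c h}_{2,0} + pq\; i^{\bar c c h}_{3,0}\,\big] \\
&+ \epsilon^{rst}\, \bar c^r(p)\, c^s(q)\, B^t(-p-q)\,\big[\, p^2\, i^{\bar c c B}_{1,0} + q^2\, i^{\bar c c B}_{2,0} + pq\; i^{\bar c c B}_{3,0}\,\big]\Big\}.
\end{aligned} \tag{107}$$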

We have presented L0 directly in momentum space, where we perform the analysis of the STI. The letter i was chosen to recall “irrelevant”, and we listed all terms of the respective field content allowed by the global symmetries, which are of second order in the momenta. The constants i ... will be chosen as follows:

ccB ¯ m(i10

AAB AAB m R40 = −σ˙ δm20 g R20 , i20 = 0, 2 i10 1 g m ccB ¯ ¯ cch ¯ − i30 ) = −σ˙ m F0ccB − 60BB g R60 , = σ˙ 60hh R30 , i30 2 2 2 m cch ¯ cch ¯ cch ¯ BB g 0 (2i − i30 ) = −σ˙ m F0 + 60 R , 2 10 2 5 m cch cch ¯ (2i ¯ − i30 ) = −2 σ˙ F0BBh m R40 , 2 20 ccB ¯ ccB ¯ ¯ ccB ¯ ¯ − i30 ) = σ˙ 60cc g R70 , i20 = −σ˙ F0ccB . m R40 (2 i10

(108)

These relations are written in terms of the linear combinations which appear in the respective STI and are needed to verify them. By the general results of Sect. 3 and Sect. 4.2 the theory stays finite when adding such "irrelevant" dimension 5 terms to the bare action, under the condition that these terms can be bounded by $\Lambda_0^{-1}\, P_1(\log(\Lambda_0/m))\, P_2(|p|/\Lambda_0)$. If the relevant terms appearing in (108) obey (39, 53) for $\Lambda = \Lambda_0$, this bound is obvious from the fact that $\dot\sigma = O(\Lambda_0^{-6})$. After this modification of the bare action we may state our

Induction hypothesis. For $l \ge 1$ and all $l' \le l - 1$,
i) we assume that the theory to order $l'$ has been renormalized according to A1) (38) and A2) for the $\Gamma$ (or equivalently for the $L$) functional.

20 We remark that when working with a regulator as in footnote 8, we could spare the detour (107, 108), because then $\dot\sigma$ would be zero.


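ii) Furthermore we assume
$$\partial^w\delta^n_\Phi\Gamma_1\big|_{0,0,l'} = 0, \qquad \partial^w\delta^n_\Phi L^0_1\big|_{0,0,l'} = 0, \qquad \text{for } (n, w) \text{ with } |n| + |w| \le 5. \tag{109}$$
iii) Finally we assume that
$$\big|\partial^w\delta^n_\Phi L_1^{\Lambda_0,\Lambda_0}\big|_{0,l'}\big| \le O\big(\Lambda_0^{5-|n|-|w|}\big)\; P_1\Big(\log\frac{\Lambda_0}{m}\Big)\; P_2\Big(\frac{|p|}{\Lambda_0}\Big), \qquad \text{for } (n, w) \text{ with } |n| + |w| > 5. \tag{110}$$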

All these statements are fulfilled at the tree level by our assumptions on the tree level action. The rest of this section is devoted to proving the

Theorem. The induction hypothesis holds at loop order $l$.

Proof. At loop order $l$ we first prove the crucial

Lemma 2. For given $(n, w)$ with $|n| + |w| \le 5$, under the assumptions (100, 109) and if
$$\partial^{w'}\delta^{n'}_\Phi L_1\big|_{0,0,l} = 0, \qquad \partial^{w'}\delta^{n'}_\Phi L^0_1\big|_{0,0,l} = 0 \qquad \text{for } (n', w') \subset (n, w) \text{ and } n' \subset n, \tag{111}$$
the following equality holds:
$$\partial^w\delta^n_\Phi L_1\big|_{0,0,l} = \partial^w\delta^n_\Phi L^0_1\big|_{0,0,l}. \tag{112}$$

Proof. Due to the induction assumption ii), Lemma 1 and (111) we find
$$(-\hbar)\,\frac{d}{d\chi}\,\partial^w\delta^n_\Phi\, e^{-\frac{1}{\hbar}L_\chi}\Big|_{0,0,\chi=0,l} = \partial^w\delta^n_\Phi L_1\big|_{0,0,l}, \tag{113}$$
noting that factorized terms give vanishing contribution, since $|n| + |w| \le 5$. On the other hand we also obtain (cf. (104))
$$(-\hbar)\,\frac{d}{d\chi}\,\partial^w\delta^n_\Phi\, e^{-\frac{1}{\hbar}L_\chi}\Big|_{\Phi\equiv 0,\chi=0,l} = \partial^w\delta^n_\Phi\Big(e^{\hbar\Delta}\, L^0_1\, e^{-\hbar\Delta}\, e^{-\frac{1}{\hbar}L}\Big)\Big|_{\Phi\equiv 0,l}. \tag{114}$$
Note that here we do not yet restrict to vanishing momenta $p$, but assume that the momenta of the fields appearing in the derivatives, to be called $p_1, \ldots, p_{|n|}$, have been chosen nonexceptional$^{21}$. Later we take $p \to 0$ $^{22}$. We may rewrite the term $e^{\hbar\Delta} L^0_1 e^{-\hbar\Delta}$ as
$$e^{\hbar\Delta}\, L^0_1\, e^{-\hbar\Delta} = L^0_1 + \sum_{\nu=1}^{5}\frac{\hbar^\nu}{\nu!}\,[\Delta, L^0_1]_\nu, \tag{115}$$
with the definition
$$[\Delta, \cdot\,]_\nu := \underbrace{[\Delta, [\ldots[\Delta, \cdot\,]\ldots]]}_{\nu\ \text{times}}. \tag{116}$$

21 i.e. no subsum vanishes
22 We point out that (113) should strictly speaking also be viewed as being obtained first for nonexceptional $p$, where correction terms appear, which then smoothly tend to 0 for $p \to 0$, so that we need not pay attention to them.
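The sum in (115) terminates because each commutator with $\Delta$ lowers the degree in the fields by at least one. A one-variable toy model makes this concrete. Here $x$ stands in for a field and $\Delta = \tfrac12\,\partial_x^2$ is an illustrative stand-in for the functional Laplacian, not the operator used in the paper.

```latex
\begin{align*}
[\partial_x^2,\; a(x)\,\partial_x^k] &= a''(x)\,\partial_x^k + 2\,a'(x)\,\partial_x^{k+1}, \\
[\Delta,\; x^5] &= 10\,x^3 + 5\,x^4\,\partial_x, \qquad \Delta = \tfrac12\,\partial_x^2 .
\end{align*}
% Each commutator replaces a coefficient a(x) by a'(x) or a''(x), so the polynomial
% degree of every coefficient drops by at least one per step. Starting from degree 5,
% [\Delta, x^5]_\nu has coefficients of degree at most 5 - \nu, and [\Delta, x^5]_6 = 0.
```

The same degree counting applied to a functional of degree 5 in the fields gives the upper limit $\nu = 5$ of the sum.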


In (115) we used that $L^0_1$ is of degree 5 in the fields. We may then define
$$P^0_1\, e^{-\frac{1}{\hbar}L} = \Big(L^0_1 + \sum_{\nu=1}^{5}\frac{\hbar^\nu}{\nu!}\,[\Delta, L^0_1]_\nu\Big)\, e^{-\frac{1}{\hbar}L}, \tag{117}$$
and recognize $P^0_1$ as given by the sum over the contributions from the connected amputated diagrams containing i) exactly one vertex from $L^0_1$, ii) up to 5 vertices from $-L$, which are all directly linked to the vertex from $L^0_1$ via a propagator from $\Delta$, iii) multiplied by the monomial in the fields produced by the derivatives from $\Delta$ acting on the respective term in $(-L)$, multiplied by the respective power of $\hbar$ and a combinatoric factor to be read from (117). We now have to regard
$$\partial^w\delta^n_\Phi\big(P^0_1\, e^{-\frac{1}{\hbar}L}\big)\Big|_{\Phi\equiv 0,\,l}. \tag{118}$$
After performing the field and momentum derivatives and after splitting off the global $\delta(p_1 + \ldots + p_{|n|})$-function we let all momenta go to 0, so that then
$$\partial^w\delta^n_\Phi\big(P^0_1\, e^{-\frac{1}{\hbar}L}\big)\Big|_{0,0,\,l} \tag{119}$$
is given by the sum over all $l$-loop connected amputated diagrams containing exactly one vertex from $L^0_1$, $|n|$ external lines of the kind specified in $\delta^n_\Phi$, up to 5 vertices from $-L$ directly linked to the one from $L^0_1$ via a propagator, and weighted with a combinatoric factor as above. The functions are differentiated w.r.t. external momenta as indicated through $\partial^w$ and taken at 0 external momenta in the end. Note that the restriction on the momenta avoids the production of disconnected terms by momentum conservation. Now remembering (100) and the fact that $L$ does not contain 0-loop two point functions, we can use the induction hypothesis (109) and (111) to conclude that all contributions to (119) vanish apart from the term
$$\partial^w\delta^n_\Phi L^0_1\big|_{0,0,l} = \partial^w\delta^n_\Phi L_1^{\Lambda_0,\Lambda_0}\big|_{0,0,l}. \tag{120}$$
Any other contribution would require nonvanishing $\partial^{w'}\delta^{n'}_\Phi L^0_1|_{0,0,l'}$ with $l' < l$ or $(n', w') \subset (n, w)$ and $n' \subset n$. The term (120) then equals $\partial^w\delta^n_\Phi L_1|_{\Phi\equiv 0,0,l}$ by (113, 114) and subsequent comments. $\square$

After these preparations we present the renormalization conditions at $l$-loop order, lower orders being already fixed by induction. This means we fix the 37 relevant terms of the theory and the 7 normalization parameters $R_i$ appearing in the BRS transformation at order $\hbar^l$ $^{23}$:

A) We fix $\kappa = 0$ (100), and we choose freely in $\Gamma$ the 8 terms$^{24}$
$$\underline{\Sigma}_{trans},\quad \underline{\Sigma}_{long},\quad \underline{\dot\Sigma}^{BB},\quad \underline{\dot\Sigma}^{\bar c c},\quad \underline{\Sigma}^{AB},\quad F^{BBh},\quad R_2,\quad R_3. \tag{121}$$

23 We mostly leave out the index $l$ of the loop order for readability in the rest of this subsection.
24 cf. App. A (137) for the notation.


This then fixes uniquely the corresponding terms in $L$. In fact we could interchange $R_1$ with $\Sigma_{long}$, $R_4$ with $\Sigma^{AB}$, and/or $F^{AAA}$ with $R_2$ in (121). We made the previous choice since it simplifies the check of the STI. This means that we may choose freely all field normalizations with the exception of $h$ $^{25}$, one global normalization for the BRS-transformations and the two couplings through $F^{AAA}$ and $F^{BBh}$. Our simplifying assumption $\kappa = 0$ (100) is related to the freedom in choosing the vacuum expectation value of the Higgs field.

B) We fix in $\tilde L^0$ the following relevant terms:
$$R^0_6 = R^0_7 = R^0_2, \qquad R^0_5 = \frac{(R^0_2)^2}{R^0_3}. \tag{122}$$

This means that $R^0_6$, $R^0_7$ are fixed to equal $R^0_2$, which in turn is uniquely given at $l$ loop order by our free choice of $R_2$, and by lower loop order constants fixed before. Similarly $R^0_5$ is fixed through $R^0_3$ and $R^0_2$ at $l$ loop order. Remember again that, by the FE for 1PI functions, an $l$-loop contribution depends only on lower loop order terms and the $l$-loop boundary condition for the term in question.

C) All those $r_0$-terms in $L^0$ having no tree correspondence are chosen equal to zero (11 terms to be read from App. A), i.e.
$$r^{hBA}_{2,0},\ \ldots,\ r^{\bar c c\bar c c}_{0} = 0. \tag{123}$$

D) Furthermore we fix in $L^0$ the following relevant terms:
$$F_0^{BBA} = -\frac{R^0_3}{2R^0_2}\,F_{1,0}^{hBA}, \qquad F_0^{\bar c c B} = -\frac{R^0_3}{R^0_2}\,F_0^{\bar c c h}, \qquad F_0^{AAhh} = \frac{R^0_5}{R^0_3}\,F_{1,0}^{AABB}. \tag{124}$$
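A quick sanity check of (122) and (124) at tree level, using the tree values of the couplings listed in App. A. It assumes that all $R_i$ equal 1 at tree level, which is not stated in this section and should be checked against the conventions for the BRS insertions in App. B.

```latex
\begin{align*}
F^{BBA}\big|_{\rm tree} &= -\tfrac14\,g = -\tfrac12\cdot\tfrac12\,g
   = -\frac{R_3}{2R_2}\,F_1^{hBA}\Big|_{\rm tree}, \\
F^{\bar c c B}\big|_{\rm tree} &= \tfrac12\,\alpha g m = -\big(-\tfrac12\,\alpha g m\big)
   = -\frac{R_3}{R_2}\,F^{\bar c c h}\Big|_{\rm tree}, \\
F^{AAhh}\big|_{\rm tree} &= \tfrac18\,g^2
   = \frac{R_5}{R_3}\,F_1^{AABB}\Big|_{\rm tree}, \\
R_6 = R_7 = R_2 &= 1, \qquad R_5 = \frac{R_2^2}{R_3} = 1
   \qquad \text{(assumed tree values } R_i = 1\text{)} .
\end{align*}
```

Under this assumption the conditions B) and D) are compatible with the tree-level action, as they must be for the induction to start.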

More precisely (124) should be understood as follows: The $F$-terms on the rhs in (124) will be uniquely fixed as functions of (a subset of) the ones fixed previously in A)–C). Then we fix each of the three terms on the lhs as a function of those on the rhs. Finally also the remaining 18 relevant constants will be uniquely fixed as functions of the previous ones in our sweep through the STI. Since 17 relevant terms (those from B)–D)) are fixed on the wrong side, namely in $L^0$, one may wonder how we will get a finite theory in the end. The tool to achieve this will in fact again be the STI, once we know they are satisfied. This is not unexpected from the traditional use made of the STI in renormalization proofs. Now we first satisfy a subset of the STI (I–VIII) containing only up to three field derivatives by choosing appropriately
$$\begin{aligned}
&\delta m^2\,[\Sigma^{\bar c c}, R_1]\ (\mathrm{I}_a),\quad \Sigma^{BB}\,[\Sigma^{\bar c c}, R_4]\ (\mathrm{II}_a),\quad R_1\ (\mathrm{I}_b),\quad R_4\ (\mathrm{II}_b),\quad F^{AAA}\,[\dot\sigma\,\delta m^2]\ (\mathrm{III}_b), \\
&F_1^{\bar c c A}\,[F^{BBA}, r_2^{\bar c c A}]\ (\mathrm{IV}_b),\quad F_1^{hBA}\,[R_5, \dot\sigma(F^{\bar c c h}, \Sigma^{BB})]\ (\mathrm{VII}_c),\quad F^{AAh}\,[F_1^{hBA}]\ (\mathrm{VI}_b), \\
&F^{\bar c c h}\,[F^{AAh}, F_1^{hBA}, r_2^{hBA}]\ (\mathrm{VI}_a),\quad \Sigma^{\bar c c}\,[R_7, F^{\bar c c B}]\ (\mathrm{VIII}_a),\quad \Sigma^{hh}\,[R_5, \Sigma^{BB}, F^{\bar c c h}]\ (\mathrm{VII}_a), \\
&\dot\Sigma^{hh}\,[F_1^{hBA}, \dot\sigma\,\Sigma^{hh}]\ (\mathrm{VII}_b).
\end{aligned} \tag{125}$$

25 Remember that $h$, $B^a$ stem from the same complex scalar doublet (5).


We wrote in parentheses the STI which is satisfied by the respective choice of a renormalization constant and in square brackets the other relevant constants at loop order $l$ on which this choice depends. In the square brackets we omitted the terms from (121), which are freely chosen, and $R_1$ and $R_4$, which by (125) depend on such terms only. Note however that I$_b$ and II$_b$ cannot be solved for $R_1$ and $R_4$ depending only on such terms before we know that I$_a$ and II$_a$ hold. Therefore we indicated the dependence on $R_1$ and $R_4$ in the first two terms. At this stage $R_1$ and $R_4$ can already be seen to be finite. All other terms, depending on constants fixed on the wrong side, might diverge with $\Lambda_0$. We come back to the finiteness problem later and first convince ourselves that the system (125) is consistent, i.e. solvable. This is a problem only if a term is present before and within square brackets at the same time, when we successively replace each term within square brackets by those on which it depends at $l$ loop.$^{26}$ Checking the list (125), we find that this happens only for $F^{\bar c c h}$, which, when substituting $F_1^{hBA}$ from (125), depends on itself. Solving for $F^{\bar c c h}$, it appears with a coefficient $1/\alpha + \dot\sigma\, m^2 (R_4/R_1)$. Since we know that $R_1$, $R_4$ are finite, this coefficient does not vanish for $\Lambda_0$ large.$^{27}$ As a result we may replace $F^{\bar c c h}\,[F_1^{hBA}, r_2^{hBA}]$ by
$$F^{\bar c c h}\,[r_2^{hBA}]. \tag{126}$$

After this change one rapidly realizes the solvability of (125). Now we impose renormalization conditions for the remaining 6 relevant terms by satisfying the following relations among (I–XXIX) for $\Gamma$:
$$F^{BBBB}\ (\mathrm{X}),\quad F^{BBhh}\ (\mathrm{XX}),\quad F^{hhh}\ (\mathrm{IX}),\quad F^{hhhh}\ (\mathrm{XIX}),\quad F_1^{AABB}\ (\mathrm{XIII}_2),\quad F_1^{AAAA}\ (\mathrm{XIV}_c). \tag{127}$$
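With (127) every relevant constant has been assigned. The tally below simply adds up the counts quoted in steps A) to D), in (125) and in (127); nothing beyond those counts is assumed.

```latex
\begin{array}{lcl}
\text{A) } \kappa \text{ and the 8 free terms of (121)}            & : & 1 + 8 = 9 \\
\text{B) } R^0_5,\ R^0_6,\ R^0_7 \text{ in (122)}                   & : & 3 \\
\text{C) } r_0\text{-terms without tree correspondence, (123)}      & : & 11 \\
\text{D) } F_0^{BBA},\ F_0^{\bar c c B},\ F_0^{AAhh} \text{ in (124)} & : & 3 \\
\text{constants fixed by (125)}                                     & : & 12 \\
\text{constants fixed by (127)}                                     & : & 6 \\
\hline
\text{total}                                                        & : & 9 + 3 + 11 + 3 + 12 + 6 = 44 = 37 + 7
\end{array}
```

The subtotals match the text: B) to D) give the $3 + 11 + 3 = 17$ terms fixed in $L^0$, and (125) together with (127) give the $12 + 6 = 18$ remaining constants.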

The order is important for the first four terms, for the last two it is arbitrary. Again we wrote in parentheses the relation which is satisfied by and which fixes the respective renormalization condition. At this stage the 37 + 7 relevant parameters are completely fixed. All the remaining relations among the STI will now be verified for $L^0$. Since there are no dimension 3 terms left, we start with the dimension 4 terms which have not yet been verified. IV$^0_a$ is the only relation left among (I$^0$–VIII$^0$): Using (123) it takes the form
$$2m\,R^0_4\,F_0^{BBA} + \frac{g}{2}\,R^0_6\,\Sigma_0^{AB} + \frac{1}{\alpha}\,F_0^{\bar c c B} = 0. \tag{128}$$
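At tree level (128) can be checked by hand with the couplings of App. A. The check assumes $R_4 = 1$ at tree level (not stated in this section) and uses that the self energies, in particular $\Sigma^{AB}$, vanish at order $l = 0$:

```latex
\begin{align*}
2m\,R_4\,F^{BBA} + \tfrac{g}{2}\,R_6\,\Sigma^{AB} + \tfrac{1}{\alpha}\,F^{\bar c c B}\Big|_{\rm tree}
 &= 2m\cdot\big(-\tfrac14\,g\big) + 0 + \tfrac{1}{\alpha}\cdot\tfrac12\,\alpha g m \\
 &= -\tfrac12\,g m + \tfrac12\,g m = 0 .
\end{align*}
```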

From (125) we know VI$_a$ and VI$_b$ to be true. Lemma 2 then implies VI$^0_a$ and VI$^0_b$ to be true as well. These relations together with (122, 123, 124) then allow us to verify (128). Now
$$\mathrm{XI}^0,\quad \mathrm{XII}^0,\quad \mathrm{XIII}^0_1 \tag{129}$$
are the last relations of dimension $\le 4$ to be analysed. They follow directly from (122, 123, 124). By Lemma 2 we pass from $L^0_1$ to $L_1$ for XI, XII, XIII$_1$. Therefore Lemma 1 now tells us that all terms in $\Gamma_1$ of dimension $\le 5$ vanish iff they vanish in $L_1$, and Lemma 2 tells us that all terms in $L^0_1$ of dimension $\le 5$ vanish iff they vanish in $L_1$.

26 E.g. at $l$ loop $R_6$ depends on $R_2$ only by (122), whereas $F^{BBA}$ depends on $R_3$, $R_2$, $F_1^{hBA}$ by (124).
27 $\alpha$ is supposed to be finite, but $\alpha \to \infty$ may be taken after $\Lambda_0 \to \infty$.


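Among the relations containing 4 or more field derivatives the following ones:
$$\begin{aligned}
&\mathrm{XIV}^0_a,\ \mathrm{XIV}^0_b,\ \mathrm{XIV}^0_d,\ \mathrm{XIV}^0_e,\ \mathrm{XV}^0_{1a},\ \mathrm{XV}^0_{1b},\ \mathrm{XV}^0_{2a}, \\
&\mathrm{XV}^0_{2b},\ \mathrm{XV}^0_{2c},\ \mathrm{XVI}^0_a,\ \mathrm{XVI}^0_b,\ \mathrm{XVI}^0_c,\ \mathrm{XVII}^0_a,\ \mathrm{XVII}^0_b, \\
&\mathrm{XVIII}^0_a,\ \mathrm{XVIII}^0_b,\ \mathrm{XVIII}^0_c,\ \mathrm{XXI}^0,\ \mathrm{XXII}^0 - \mathrm{XXIX}^0
\end{aligned} \tag{130}$$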

remain to be verified. Only those written in (131) are not immediately obvious from (122, 123, 124). They can be verified using the relations we wrote in parentheses:
$$\mathrm{XV}^0_{1a}\ (122,\ \mathrm{VI}^0_b,\ \mathrm{XIII}^0_2), \qquad \mathrm{XVII}^0_a\ (124,\ \mathrm{XIII}^0_2,\ \mathrm{VI}^0_b). \tag{131}$$
We have not yet checked the following five relations of dimension 5 involving three fields only: III$_a$, V, VII$_d$, VIII$_b$, VIII$_c$, which are the most delicate ones. They contain terms multiplied by $\dot\sigma$. For III$_a$ this is true when inserting $F^{AAA}$ from III$_b$. We first forget about these corrections and at the same time about the modification of $L^0$ by (107, 108) and convince ourselves that the 5 STI are fulfilled in this case. To do so we use the following relations:
$$\mathrm{I}^0_b,\quad \mathrm{II}^0_b,\quad \mathrm{IV}^0_b,\quad \mathrm{VII}^0_c. \tag{132}$$
First we can verify V$^0$ using (124, 122) and VII$^0_c$. Next we regard III$^0_a$ and realize that it amounts to showing that
$$g\,R^0_2\,(1 + \Sigma_{long,0}) = F_{1,0}^{\bar c c A}. \tag{133}$$

The lhs equals $g\,(R^0_2/R^0_1)(1 + \dot\Sigma_0^{\bar c c})$ by I$^0_b$, the rhs equals $-(g/m)\,R^0_2\,\Sigma_0^{AB} - 4\,R^0_4\,F_0^{BBA}$ by IV$^0_b$. Using II$^0_b$ for the first and V$^0$ for the second term we recognize now that (133) holds. VII$^0_d$ follows directly from (122), and VIII$^0_b$ follows similarly as III$^0_a$ from IV$^0_b$, II$^0_b$ and V$^0$; VIII$^0_c$ follows from (122).

Now we also take into account the correction terms: Those relations among (I$^0$–XXIX$^0$) which are affected by $L^0_{irr}$ (107) are listed explicitly in App. C. On inspection one realizes that the choice (108) exactly cancels all terms $\sim\dot\sigma$ in (I$^0$–XXIX$^0$). Thus all STI are fulfilled in our theory, and item ii) of the induction hypothesis is satisfied to loop order $l$.

What remains to show is that the theory defined up to $l$-loop order is finite for $\Lambda_0 \to \infty$. As we noted, apart from the 9 evidently finite constants in (121), finiteness of $R_1$ and $R_4$ can be inferred from (125). To proceed further it is important to note that all irrelevant terms appearing in the STI apart from those in (107, 108) are a priori finite at $l$-loop since they only depend on the renormalization conditions at order $l' \le l - 1$. The next step is then to convince oneself of the fact that $(F^{AAA})^{0,\Lambda_0}$ has a finite limit for $\Lambda_0 \to \infty$. As we see from III$_b$ the finiteness of $F^{AAA}$ follows if we can show that $\dot\sigma\,\delta m^2$ has a finite limit. From (122, 123, 124) it is evident that all relevant parameters fixed on the wrong side (at $\Lambda = \Lambda_0$) satisfy the bound assumed in the corollary from Sect. 3.2. From this corollary (adapted to the $\Gamma$-functional) and the induction hypothesis we therefore conclude that $\delta m^2$ is bounded by $\Lambda_0^2\, P(\log\frac{\Lambda_0}{m})$, whereas $\dot\sigma \sim \Lambda_0^{-6}$. This proves the finiteness of $F^{AAA}$.$^{28}$ Then we go through the STI as follows:
$$\begin{aligned}
&F_1^{AAAA}\ (\mathrm{XIV}_c),\quad r_1^{AA\bar c c}\ (\mathrm{XIV}_b),\quad r_2^{AAAA}\ (\mathrm{XIV}_a),\quad r_2^{AA\bar c c}\ (\mathrm{XIV}_e), \\
&r_1^{BB\bar c c}\ (\mathrm{XV}_{1b}),\quad r^{\bar c c\bar c c}\ (\mathrm{XVIII}_b),\quad r_2^{\bar c c A}\ (\mathrm{XVIII}_c),\quad r_2^{AABB}\ (\mathrm{XXII}).
\end{aligned} \tag{134}$$

28 Using the STI we may in fact show at this stage that $\delta m^2$ diverges at most logarithmically.


In parentheses we wrote the STI from which the finiteness of the respective relevant term may be inferred. In $r^{\bar c c\bar c c}$ (XVIII$_b$) note that XVIII$_b$ does not depend on $r_2^{\bar c c A}$ at $l$-loop order. We now infer from VII$_d$ that
$$r_2^{hBA} = 2m\,\dot\sigma\, F_0^{BBh}\,\frac{R^0_4}{R_1}\Big|_l + \text{finite} \tag{135}$$
has a finite limit for $\Lambda_0 \to \infty$. Here the first contribution stems from the irrelevant term
$$\frac{m}{2}\,\big(2\, i^{\bar c c h}_{2,0} - i^{\bar c c h}_{3,0}\big) = -2m\,\dot\sigma\, F_0^{BBh}\, R^0_4$$
in (108). In VII$_d$ this contribution appears among the irrelevant terms and originates from the b.c. at $\Lambda = \Lambda_0$. Note that $F_0^{BBh}$, $R^0_4$ diverge at most linearly with $\Lambda_0$ using the results from Sect. 3.2 and Sect. 4.2. Disposing of the finiteness of $R_2$ and $r_2^{hBA}$, finiteness follows now also for $R_6$ (XVI$_a$), $R_7$ (XVIII$_a$), $r^{hh\bar c c}$ (XVII$_b$), $r^{hB\bar c c}$ (XXVIII), $r_2^{BB\bar c c}$ (XXVII), $F^{BBBB}$ (X). Similarly as in (135) we may now conclude from V the finiteness of $F^{BBA}$ (V). Next we pass through the following finiteness chain: $F_1^{\bar c c A}$ (IV$_b$), $F_1^{AABB}$ (XV$_{1a}$), $F^{AAh}$ (XIII$_2$), $F_1^{hBA}$ (VI$_b$), $F^{\bar c c B}$ (IV$_a$), $\Sigma^{\bar c c}$ (VIII$_a$), $\Sigma^{BB}$ (II$_a$), $\delta m^2$ (I$_a$), and then we can establish finiteness of $F^{\bar c c h}$ (VI$_a$), $R_5$ (VII$_c$ or XVI$_b$). Finally it is easy to convince oneself of the finiteness of the remaining constants $\Sigma^{hh}$ (VII$_a$), $\dot\Sigma^{hh}$ (VII$_b$), $F^{BBhh}$ (XX), $F^{hhh}$ (IX), $F^{hhhh}$ (XIX), $F^{AAhh}$ (XVII$_a$). In regard to the previous series of finiteness statements it is interesting to note that it is first extracted for the pure gauge sector and last for the terms involving the $h$ field.$^{29}$ By now all of the 44 relevant constants are known to be finite, and thus item i) of the induction hypothesis to loop order $l$ is verified.$^{30}$ Once i), ii) are verified, item iii) immediately follows from the general bounds in Sect. 3.2 on noting that
a) from our choice of the bare action it is evident that $\partial^w\delta^n_\Phi L^0_1|_{0,0,l} = 0$, $|n| > 5$,
b) the irrelevant terms in $L^0_1$ generated from those introduced in (107) on BRS transformation obey the required bound as a consequence of the previous finiteness statements,
c) all other irrelevant terms in $L^0_1$ are generated by momentum derivatives acting on the regulating factor $\sigma_{0,\Lambda_0}(p)$, which automatically produces (more than) the required negative powers of $\Lambda_0$. $\square$

So the induction hypothesis holds to l-loop order. This ends the proof of the theorem.
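The finiteness arguments in the proof repeatedly use that a factor $\dot\sigma$ overwhelms any divergence allowed by the bounds of Sect. 3.2. Spelled out for the two cases that occur, with $P$ a polynomial in the logarithm as in the text, and with $c_1$, $c_2$ hypothetical constants introduced only for this sketch (possible logarithmic factors in the second estimate are suppressed):

```latex
\begin{align*}
\dot\sigma &= -\frac{\alpha m^4 + (1+\alpha)\,m^2 M^2}{\Lambda_0^6} = O(\Lambda_0^{-6}), \\
|\dot\sigma\,\delta m^2| &\le c_1\,\Lambda_0^{-6}\cdot\Lambda_0^{2}\,P\Big(\log\frac{\Lambda_0}{m}\Big)
   = c_1\,\Lambda_0^{-4}\,P\Big(\log\frac{\Lambda_0}{m}\Big)
   \;\longrightarrow\; 0 \quad (\Lambda_0\to\infty), \\
|\dot\sigma\,F_0^{BBh}\,R^0_4| &\le c_2\,\Lambda_0^{-6}\cdot\Lambda_0\cdot\Lambda_0
   = c_2\,\Lambda_0^{-4} \;\longrightarrow\; 0 \quad (\Lambda_0\to\infty).
\end{align*}
```

In both cases the product not only stays finite but vanishes in the limit, which is more than the argument for $F^{AAA}$ and for (135) requires.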

Once the theorem is proven, Proposition 4 tells us that the STI hold in the limit $\Lambda_0 \to \infty$.

29 This is reminiscent of the fact that the radiative corrections in the scalar boson sector are more rapidly divergent, namely quadratically, than all other ones.
30 The smoothness assumption directly follows from the smoothness of the regulator and from the b.c. which depend on $\Lambda_0$ only through the regulator.


Concluding Remarks

We have presented a renormalization proof for spontaneously broken Yang–Mills theory based on the Wilson renormalization group. The renormalization conditions admissible in view of the STI could be stated explicitly in (121) to (127).$^{31}$ We tried to avoid any ambiguity as regards the analytical status of the statements we made, in particular for which values of the cutoffs they hold. We did not make use of unregularized path integrals. We think that the analytical aspect is generally somewhat neglected in the recent literature including textbooks. We did not attempt generality on the symmetry or group theoretical aspects, which have been studied extensively in the literature, and restricted ourselves for simplicity to the physically interesting SU(2) case. We think it would be worthwhile to extend the work, with the same precision on analytical status, to the physical consequences to be drawn from the STI, in particular the gauge invariance of the S-matrix. Further interesting problems to be treated in this context are the renormalization of QCD and the analysis of anomaly problems and of the action principle.

Appendix A

Here we consider the generating functional for the proper vertex functions
$$\Gamma(A, h, B, \bar c, c) = \sum_{n=1}^{4}\Gamma_n + \Gamma_{(n>4)},$$
$n$ counting the number of fields, and extract its relevant part, i.e. its local field content with mass dimension not greater than four. Generally we will not underline the field variable symbols in the Appendices, though of course all $\Gamma$ functional arguments should be understood as such. In App. A and App. B the regulators are not made explicit, apart from the subsequent comments on the two-point functions, where contributions arising for finite $\Lambda_0$ are made explicit.

1) One-point function:
$$\Gamma_1 = \kappa\,\hat h(0).$$

2) Two-point functions:
$$\Gamma_2 = \int_p\Big\{\frac12\,A^a_\mu(p)A^a_\nu(-p)\,\Gamma^{AA}_{\mu\nu}(p) + \frac12\,h(p)h(-p)\,\Gamma^{hh}(p) + \frac12\,B^a(p)B^a(-p)\,\Gamma^{BB}(p) - \bar c^a(p)c^a(-p)\,\Gamma^{\bar c c}(p) + A^a_\mu(p)B^a(-p)\,\Gamma^{AB}_\mu(p)\Big\},$$
$$\Gamma^{AA}_{\mu\nu}(p) = \delta_{\mu\nu}(m^2 + \delta m^2) + (p^2\delta_{\mu\nu} - p_\mu p_\nu)\big(1 + \Sigma_{trans}(p^2)\big) + \frac{1}{\alpha}\,p_\mu p_\nu\big(1 + \Sigma_{long}(p^2)\big),$$
$$\Gamma^{hh}(p) = p^2 + M^2 + \Sigma^{hh}(p^2), \qquad \Gamma^{BB}(p) = p^2 + \alpha m^2 + \Sigma^{BB}(p^2),$$
$$\Gamma^{\bar c c}(p) = p^2 + \alpha m^2 + \Sigma^{\bar c c}(p^2), \qquad \Gamma^{AB}_\mu(p) = i p_\mu\,\Sigma^{AB}(p^2).$$

31 Using in particular (123) it should be possible to derive the antighost equation of motion often used in textbooks [FaSl, ZiJ].


Besides the unregularized tree order there emerge 10 relevant parameters from the various self energies: $\delta m^2$, $\Sigma_{trans}(0)$, $\Sigma_{long}(0)$, $\Sigma^{hh}(0)$, $\dot\Sigma^{hh}(0)$, $\Sigma^{BB}(0)$, $\dot\Sigma^{BB}(0)$, $\Sigma^{\bar c c}(0)$, $\dot\Sigma^{\bar c c}(0)$ and $\Sigma^{AB}(0)$, where we used the notation $\dot\Sigma(0) \equiv (\partial_{p^2}\Sigma)(0)$.

By (78, 79, 84) the 0-loop-order functional $\Gamma^{0,\Lambda_0}_{2,l=0}$ carries the inverted regulating factor $(\sigma_{0,\Lambda_0})^{-1}(p^2) = 1 - \dot\sigma\,p^2 + O((p^2)^2)$ with $\dot\sigma = -(\alpha m^4 + (1+\alpha)m^2M^2)/\Lambda_0^6$. Therefore all self energies vanish at order $l = 0$, whereas
$$\dot\Sigma^{hh}_{l=0}(0) = -\dot\sigma\,M^2, \qquad \Sigma_{trans}|_{l=0}(0) = -\dot\sigma\,m^2, \qquad \dot\Sigma^{BB}_{l=0}(0) = \dot\Sigma^{\bar c c}_{l=0}(0) = -\dot\sigma\,\alpha m^2, \qquad \Sigma_{long}|_{l=0}(0) = -\dot\sigma\,\alpha m^2. \tag{136}$$
To clearly isolate the tree level cutoff effects from the loop contributions we introduce the notation
$$\underline{\Sigma}(0) = \Sigma(0) - \Sigma(0)|_{l=0}, \qquad \underline{\dot\Sigma}(0) = \dot\Sigma(0) - \dot\Sigma(0)|_{l=0}. \tag{137}$$
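The values in (136) follow from multiplying the unregularized tree-level two-point functions by the inverted regulating factor and comparing with the parametrization of $\Gamma_2$ given at the start of this appendix. A short check, assuming that the same factor $1 - \dot\sigma p^2 + O((p^2)^2)$ multiplies each two-point function (for the vector field, separately the coefficients of the transverse and longitudinal structures):

```latex
\begin{align*}
(p^2 + M^2)(1 - \dot\sigma p^2) &= M^2 + p^2\,(1 - \dot\sigma M^2) + O((p^2)^2)
  &&\Rightarrow\ \dot\Sigma^{hh}_{l=0}(0) = -\dot\sigma M^2, \\
(p^2 + \alpha m^2)(1 - \dot\sigma p^2) &= \alpha m^2 + p^2\,(1 - \dot\sigma\alpha m^2) + O((p^2)^2)
  &&\Rightarrow\ \dot\Sigma^{BB}_{l=0}(0) = \dot\Sigma^{\bar c c}_{l=0}(0) = -\dot\sigma\alpha m^2, \\
(p^2 + m^2)(1 - \dot\sigma p^2) &= m^2 + p^2\,(1 - \dot\sigma m^2) + O((p^2)^2)
  &&\Rightarrow\ \Sigma_{trans}|_{l=0}(0) = -\dot\sigma m^2, \\
\tfrac{1}{\alpha}(p^2 + \alpha m^2)(1 - \dot\sigma p^2) &= m^2 + \tfrac{1}{\alpha}\,p^2\,(1 - \dot\sigma\alpha m^2) + O((p^2)^2)
  &&\Rightarrow\ \Sigma_{long}|_{l=0}(0) = -\dot\sigma\alpha m^2 .
\end{align*}
```

In each line the constant term reproduces the tree-level mass term, so the self energies themselves vanish at $p^2 = 0$ and only the coefficients of $p^2$ are shifted.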

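3) Three-point functions: Only the relevant part is given explicitly: $r = O(\hbar)$ denotes a relevant parameter which vanishes in the tree order, otherwise a relevant parameter is denoted by $F$. Moreover, we indicate an irrelevant part by a symbol $O_n$, $n \in \mathbb{N}$, indicating that this part vanishes as an $n$-th power of the momentum in the limit when all momenta tend to zero homogeneously.
$$\begin{aligned}
\Gamma_3 = \int_p\int_q\Big\{\,&\epsilon_{rst}\,A^r_\mu(p)A^s_\nu(q)A^t_\lambda(-p-q)\,\Gamma^{AAA}_{\mu\nu\lambda}(p,q) + A^r_\mu(p)A^r_\nu(q)h(-p-q)\,\Gamma^{AAh}_{\mu\nu}(p,q) \\
&+ \epsilon_{rst}\,B^r(p)B^s(q)A^t_\mu(-p-q)\,\Gamma^{BBA}_\mu(p,q) + h(p)B^r(q)A^r_\mu(-p-q)\,\Gamma^{hBA}_\mu(p,q) \\
&+ \epsilon_{rst}\,\bar c^r(p)c^s(q)A^t_\mu(-p-q)\,\Gamma^{\bar c c A}_\mu(p,q) + B^r(p)B^r(q)h(-p-q)\,\Gamma^{BBh}(p,q) \\
&+ h(p)h(q)h(-p-q)\,\Gamma^{hhh}(p,q) + \bar c^r(p)c^r(q)h(-p-q)\,\Gamma^{\bar c c h}(p,q) \\
&+ \epsilon_{rst}\,\bar c^r(p)c^s(q)B^t(-p-q)\,\Gamma^{\bar c c B}(p,q)\Big\},
\end{aligned}$$
$$\begin{aligned}
\Gamma^{AAA}_{\mu\nu\lambda}(p,q) &= \delta_{\mu\nu}\,i(p-q)_\lambda\,F^{AAA} + O_3, & F^{AAA} &= -\tfrac12\,g + r^{AAA}, \\
\Gamma^{AAh}_{\mu\nu}(p,q) &= \delta_{\mu\nu}\,F^{AAh} + O_2, & F^{AAh} &= \tfrac12\,mg + r^{AAh}, \\
\Gamma^{BBA}_\mu(p,q) &= i(p-q)_\mu\,F^{BBA} + O_3, & F^{BBA} &= -\tfrac14\,g + r^{BBA}, \\
\Gamma^{hBA}_\mu(p,q) &= i(p-q)_\mu\,F_1^{hBA} + i(p+q)_\mu\,r_2^{hBA} + O_3, & F_1^{hBA} &= \tfrac12\,g + r_1^{hBA}, \\
\Gamma^{\bar c c A}_\mu(p,q) &= ip_\mu\,F_1^{\bar c c A} + iq_\mu\,r_2^{\bar c c A} + O_3, & F_1^{\bar c c A} &= g + r_1^{\bar c c A}, \\
\Gamma^{BBh}(p,q) &= F^{BBh} + O_2, & F^{BBh} &= \tfrac14\,g\,\tfrac{M^2}{m} + r^{BBh}, \\
\Gamma^{hhh}(p,q) &= F^{hhh} + O_2, & F^{hhh} &= \tfrac14\,g\,\tfrac{M^2}{m} + r^{hhh}, \\
\Gamma^{\bar c c h}(p,q) &= F^{\bar c c h} + O_2, & F^{\bar c c h} &= -\tfrac12\,\alpha g m + r^{\bar c c h}, \\
\Gamma^{\bar c c B}(p,q) &= F^{\bar c c B} + O_2, & F^{\bar c c B} &= \tfrac12\,\alpha g m + r^{\bar c c B}.
\end{aligned}$$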


The 3-point functions $AAB$ and $BBB$ have no relevant local content.

4) Four-point functions: With parameters $r$ and $F$ defined as before
\[
\begin{aligned}
\Gamma_4|_{\rm rel} = \int_k\int_p\int_q \Bigl\{\, & \epsilon^{abc}\epsilon^{ars} A^b_\mu(k)A^c_\nu(p)A^r_\mu(q)A^s_\nu(-k-p-q)\,F_1^{AAAA} \\
& + A^r_\mu(k)A^r_\mu(p)A^s_\nu(q)A^s_\nu(-k-p-q)\,r_2^{AAAA} \\
& + A^a_\mu(k)A^b_\mu(p)\,\bar c^r(q)c^s(-k-p-q)\,(\delta^{ab}\delta^{rs} r_1^{AA\bar c c} + \delta^{ar}\delta^{bs} r_2^{AA\bar c c}) \\
& + A^a_\mu(k)A^b_\mu(p)B^r(q)B^s(-k-p-q)\,(\delta^{ab}\delta^{rs} F_1^{AABB} + \delta^{ar}\delta^{bs} r_2^{AABB}) \\
& + B^a(k)B^b(p)\,\bar c^r(q)c^s(-k-p-q)\,(\delta^{ab}\delta^{rs} r_1^{BB\bar c c} + \delta^{ar}\delta^{bs} r_2^{BB\bar c c}) \\
& + h(k)h(p)h(q)h(-k-p-q)\,F^{hhhh} + B^r(k)B^r(p)h(q)h(-k-p-q)\,F^{BBhh} \\
& + B^r(k)B^r(p)B^s(q)B^s(-k-p-q)\,F^{BBBB} + A^r_\mu(k)A^r_\mu(p)h(q)h(-k-p-q)\,F^{AAhh} \\
& + h(k)h(p)\,\bar c^r(q)c^r(-k-p-q)\,r^{hh\bar c c} + \bar c^a(k)c^a(p)\,\bar c^r(q)c^r(-k-p-q)\,r^{\bar c c\bar c c} \\
& + \epsilon^{rst} h(k)B^r(p)\,\bar c^s(q)c^t(-k-p-q)\,r^{hB\bar c c} \Bigr\},
\end{aligned}
\]
\[
\begin{aligned}
F_1^{AAAA} &= \tfrac14 g^2 + r_1^{AAAA}, & F_1^{AABB} &= \tfrac18 g^2 + r_1^{AABB}, \\
F^{hhhh} &= \tfrac{1}{32} g^2 \bigl(\tfrac{M}{m}\bigr)^2 + r^{hhhh}, & F^{BBhh} &= \tfrac{1}{16} g^2 \bigl(\tfrac{M}{m}\bigr)^2 + r^{BBhh}, \\
F^{BBBB} &= \tfrac{1}{32} g^2 \bigl(\tfrac{M}{m}\bigr)^2 + r^{BBBB}, & F^{AAhh} &= \tfrac18 g^2 + r^{AAhh}.
\end{aligned}
\]
Hence, in total $\Gamma$ involves 1 + 10 + 11 + 15 = 37 relevant parameters.

Appendix B

We also have to consider the vertex functions with operator insertions stemming from the BRS-transforms. These insertions have mass dimension ≤ 2. Only the respective relevant part of the four vertex functions with insertions is listed:
\[
\begin{aligned}
\Gamma_{\gamma^a_\mu}(p)|_{\rm rel} &= -i p_\mu c^a(-p) R_1 + \epsilon^{arb}\int_q A^r_\mu(q)c^b(-p-q)\,gR_2, \\
\Gamma_{\gamma}(p)|_{\rm rel} &= \int_q B^r(q)c^r(-p-q)\,\bigl(-\tfrac12 gR_3\bigr), \\
\Gamma_{\gamma^a}(p)|_{\rm rel} &= m c^a(-p)R_4 + \int_q h(q)c^a(-p-q)\,\tfrac12 gR_5 + \epsilon^{arb}\int_q B^r(q)c^b(-p-q)\,\tfrac12 gR_6, \\
\Gamma_{\omega^a}(p)|_{\rm rel} &= \epsilon^{ars}\int_q c^r(q)c^s(-p-q)\,\tfrac12 gR_7.
\end{aligned}
\]

There appear 7 relevant parameters Ri = 1 + ri , ri = O(h¯ ), i = 1, ..., 7. All other 2-point functions, and the higher ones, of course, are of irrelevant type. Appendix C Here we present the 53 conditions which result upon requiring that the functional 01 , (83), has a vanishing local part for (mass) dimensions smaller or equal to five ! ¯ c)|dim≤5 = 0 : 01 (A, h, B, c, Into most of these conditions also irrelevant contributions enter which are not given explicitly but are simply indicated by “irr”. To recognize the local origin, we keep the momentum factors arising. The δ-distribution emerging from the functional derivatives and forcing the sum of the corresponding momenta to zero is not written. Relations explicitly rewritten for L0 carry a zero in the numbering. In those, the irrelevant terms from (107,108) are the only ones appearing and are written explicitly. The STI for 0 are supposed to be written for the case 3 = 0, 30 ≤ ∞. Note that they take a different form for 30 < ∞ and 30 → ∞ only, if σ˙ appears, which is the case in Ib , IIb , IIIb ,V,VIIb ,VIIc , VIId ,VIIIb ,VIIIc . For the L0 -functional we write for the loop ˙ 0 instead of 6, 6. ˙ level two-point functions 60 instead of 6 and 6 0 , 6 Two fields δAaµ (q) δcr (k) 01 |0 o n P P¯ ! a) 0 = qµ −(m2 + δm2 )R1 + AB (0)mR4 + m2 + α1 cc (0) , n ¯ ! ˙ cc (0)) b) 0 = q 2 qµ − α1 (1 + 6 long (0))R1 + α1 (1 + 6 o P P AB cc ¯ 1 (0)mR4 − α (0)] + irr . − σ˙ [δm2 R1 − II) δB a (q) δcr (k) 01 |0 P P¯ ! a) 0 = m(αm2 + BB (0))R4 − m(αm2 + cc (0)) + κ(− 21 g)R3 , n P ¯ ! ¯ (0) − ˙ BB (0))R4 − m(1 + 6 ˙ cc (0)) − σ˙ m[6 cc b) 0 = q 2 − AB (0)R1 + m(1 + 6 o 6 BB (0)R4 ] + irr .

I)

Three fields III)

δArµ (p) δAsν (q) δct (k) 01 |0

n ! ¯ ¯ ) a) 0 = (pµ pν − qµ qν ) − 2F AAA R1 − α1 (F1ccA − r2ccA i o h + α1 (1 + 6 long (0)) − (1 + 6 trans (0)) gR2 + irr , ! b) 0 = (p2 − q 2 )δµν 2F AAA R1 + (1 + 6 trans (0))gR2 + σ˙ δm2 gR2 + irr ,


! b0 ) 0 = (p2 − q 2 )δµν 2F0AAA R10 + (1 + 6 0,trans (0))gR20 AAB m R 0 , + σ˙ δm20 gR20 + 2 i10 4 IV) δArµ (p) δB s (q) δct (k) 01 |0 o n P ! ¯ ¯ a) 0 = pµ 2F BBA mR4 + 21 g AB (0)R6 + α1 F ccB − mr2ccA + irr , o n P ! ¯ ¯ ) + irr , − r2ccA b) 0 = qµ g AB (0)R2 + 4F BBA mR4 + m(F1ccA V) δB r (p) δB s (q) δct (k) 01 |0 n ! ˙ BB (0)) g R6 0 = (p2 − q 2 ) 2R1 F BBA + (1 + 6 2 ¯ − σ˙ [m F ccB − 6 BB (0) g2 R6 ] + irr , n ! g 0 ¯ ˙ BB ˙ [m F0ccB − 60BB (0) g2 R60 ] V 0 ) 0 = (p2 − q 2 ) 2R10 F0BBA + (1 + 6 0 (0)) 2 R6 − σ ccB ¯ ccB ¯ ) , − i30 − m(i10 VI) δArµ (p) δh(q) δct (k) 01 |0 o n P ! ¯ + irr , a) 0 = pµ − 2R1 F AAh + mR4 (F1hBA − r2hBA ) + AB (0) 21 gR5 − α1 F cch ! b) 0 = qµ −2R1 F AAh + 2mR4 F1hBA + irr , VII) δh(p) δB s (q) δct (k) 01 |0 P ! ¯ a) 0 = (M 2 + hh (0))(− 21 gR3 ) + 2mF BBh R4 + mF cch PBB 1 2 + (αm + (0)) 2 gR5 , o n Phh 1 ! hBA 2 ˙ hh (0)) 1 gR3 − σ˙ (0) 2 gR3 + irr , b) 0 = p F1 R1 − (1 + 6 2 n o Phh 1 0 1 ! 1 cch ¯ 0 ˙ hh ˙ b0 ) 0 = p 2 F1hBA R10 − (1 + 6 0 (0)) 2 gR3 − σ 0 (0) 2 gR3 + 2 m i30 , o n P ! ¯ + BB (0) 1 gR ]+irr , ˙ BB (0)) 1 gR5 +σ˙ [mF cch c) 0 = q 2 −F1hBA R1 +(1+6 5 2 2 n PBB 1 0 BB ! 2 1 hBA cch ¯ 0 0 0 ˙ 0 (0)) gR + σ˙ [mF + 0 (0) 2 gR5 ] c ) 0 = q − F10 R1 + (1 + 6 0 5 2 o cch ¯ − i cch ¯ + 21 m (2i10 30 ) , ! d) 0 = k 2 r2hBA R1 + σ˙ 2mF BBh R4 + irr , hBA 0 ! cch ¯ − i cch ¯ R1 + σ˙ 2mF0BBh R40 + 21 m (2i20 d 0 ) 0 = k 2 r20 30 ) , VIII) δct (q) δcs (p) δc¯r (k) 01 |0 Pcc ! ¯ ccB ¯ R − (αm2 + a) 0 = 2mF (0))gR7 , 4 o n Pcc ¯ ! 2 ¯ ccA ¯ ccA ¯ ˙ cc (0))gR7 − σ˙ (0))gR7 + irr , b) 0 = k F1 R1 − r2 R1 − (1 + 6 n P¯ ¯ ! ccA ¯ R 0 − r ccA ¯ 0 0 0 ˙ cc ˙ cc b0 ) 0 = k 2 F10 0 (0))gR7 20 R1 − (1 + 6 0 (0))gR7 − σ 1 o ccB ¯ ccB ¯ ) , + m R40 (2i10 − i30 n o ! ¯ R +σ ¯ R + irr , ˙ mF ccB c) 0 = (p2 + q 2 ) r2ccA 1 4 n o ! ccA ¯ R0 + σ ¯ R 0 + m R 0 i ccB ¯ ˙ mF0ccB . c0 ) 0 = (p2 + q 2 ) r20 1 4 4 20 Four fields IX)

δh(p) δh(q) δB 1 (k) δc1 (l) 01 |0 !

¯ + irr. 0 = 6F hhh (− 21 gR3 ) + 4F BBhh mR4 + 2F BBh gR5 + 2mr hhcc


X) XI)


δB 1 (k) δB 1 (p) δB 2 (q) δc2 (l) 01 |0

! ¯ + r BB cc ¯ + irr. 0 = −F BBh gR3 + 8F BBBB mR4 + m 2r1BB cc 2 δh(l) δc¯3 (k) δc1 (p) δc2 (q) 01 |0 !

¯ mR + F ccB ¯ gR + F cch ¯ gR + irr. 0 = 2r hB cc 4 7 5 XII) δc2 (k) δc¯2 (l) δc1 (p) δB 1 (q) 01 |0 !

¯ −r BB cc ¯ )mR +F ccB ¯ (− 1 gR )+(2r BB cc ¯ ( 1 gR −gR )+2mr cc ¯ cc ¯ +irr. 0 = F cch 3 4 6 7 1 2 2 2 XIII)1 δA1µ (k) δA2ν (p) δB 1 (q) δc2 (l) 01 |0 !

¯ + irr. 0 = 2r2AABB R4 + r2AAcc XIII)2 δA1µ (k) δA1ν (p) δB 2 (q) δc2 (l) 01 |0 !

¯ + irr. 0 = −F AAh gR3 + 4F1AABB mR4 + 2mr1AAcc XIV) δA1µ (p) δA1ν (q) δA2ρ (k) δc2 (l) 01 |0 o n ! ¯ + irr , a) 0 = 2δµν lρ 4(F1AAAA + r2AAAA )R1 + 2F AAA gR2 + α1 r1AAcc ! ¯ + irr , b) 0 = δµν (pρ + qρ ) α2 r1AAcc ! c) 0 = (δµρ lν + δνρ lµ ) −4F1AAAA R1 − 2F AAA gR2 + irr , !

d) 0 = (δµρ pν + δνρ qµ ) {0 + irr}, ! ¯ + irr . e) 0 = (δµρ qν + δνρ pµ ) − α1 r2AAcc XV)1 δB 1 (p) δB 1 (q) δA2µ (k) δc2 (l) 01 |0 o n ! a) 0 = lµ 4F1AABB R1 + 2F BBA gR6 + irr , ! ¯ + irr , b) 0 = kµ r1BB cc XV)2 δB 1 (p) δB 2 (q) δA1µ (k) δc2 (l) 01 |0 ! a) 0 = pµ −2r2AABB R1 + 2F BBA gR2 + F1hBA gR3 + irr , ! b) 0 = qµ −2r2AABB R1 − 2F BBA gR2 + 2F BBA gR6 + irr , o n ! ¯ +irr , c) 0 = kµ −2r2AABB R1 +F1hBA 21 gR3 +r2hBA 21 gR3 +F BBA gR6 − α1 r2BB cc XVI) δh(p) δA1µ (k) δB 2 (q) δc3 (l) 01 |0 ! a) 0 = pµ F1hBA g(R6 − R2 ) − r2hBA gR2 + irr , ! b) 0 = qµ F1hBA gR2 − r2hBA gR2 + 2F BBA gR5 + irr , o n ! ¯ + irr , c) 0 = kµ F1hBA 21 gR6 − r2hBA 21 gR6 + F BBA gR5 − α1 r hB cc XVII) δh(p) δh(q) δA1µ (k) δc1 (l) 01 |0 ! a) 0 = lµ 4F AAhh R1 − F1hBA gR5 + irr , ! ¯ + irr . b) 0 = kµ r2hBA gR5 + α2 r hhcc XVIII) δA2µ (k) δc2 (p) δc1 (q) δc¯1 (l) 01 |0 ¯ ! ¯ cc ¯ + irr , a) 0 = lµ F1ccA g(R2 − R7 ) + α2 r cc o n ! ¯ R + r ccA ¯ g(R − R ) + 2 r cc ¯ cc ¯ + irr , b) 0 = pµ 2r1AAcc 1 2 7 2 α o n ! 2 cc AA cc ¯ ccA ¯ ¯ cc ¯ R1 − r2 gR7 + α r + irr . c) 0 = qµ − r2


Five fields XIX)

δh(p) δh(q) δh(k) δB 1 (l) δc1 (l 0 ) 01 |0

XX)

0 = −2F hhhh R3 + F hhBB R5 + irr. δh(p) δB 1 (q) δB 1 (k) δB 2 (l) δc2 (l 0 ) 01 |0

XXI)

0 = −F BBhh R3 + 2F BBBB R5 + irr. δA1µ (k) δA1ν (p) δh(k) δB 2 (l) δc2 (l 0 ) 01 |0

XXII)

0 = −F AAhh R3 + F1AABB R5 + irr. δA1µ (k) δB 1 (p) δc1 (l 0 ) δA2ν (q) δB 3 (l) 01 |0

! !

!

!

0 = r2AABB (R6 − 2R2 ) + irr. XXIII) δA1µ (k) δB 1 (q) δA2ν (p) δc2 (l 0 ) δh(l) 01 |0 !

0 = r2AABB R5 + irr. XXIV) δA3µ (k) δA3ν (p) δc¯2 (q) δc3 (l) δc1 (l 0 ) 01 |0 !

XXV)

¯ R + r AAcc ¯ R + irr. 0 = r2AAcc 2 7 1 δA3µ (k) δc¯3 (q) δA2ν (p) δc3 (l) δc1 (l 0 ) 01 |0 !

¯ (3R − R ) + irr. 0 = r2AAcc 2 7 XXVI) δB 1 (p) δB 1 (q) δc¯1 (k) δc2 (l) δc3 (l 0 ) 01 |0 !

¯ (R − R ) − r BB cc ¯ R + irr. 0 = r2BB cc 6 7 7 1 XXVII) δB 1 (p) δc¯1 (k) δB 2 (q) δc3 (l) δc1 (l 0 ) 01 |0 !

¯ (3R − 2R ) + irr. ¯ R + r BB cc 0 = −r hB cc 3 6 7 2 XXVIII) δh(p) δh(q) δc¯1 (k) δc2 (l) δc3 (l 0 ) 01 |0 !

¯ R + r hhcc ¯ R + irr. 0 = r hB cc 7 5 XXIX) δh(p) δB 1 (q) δc1 (l) δc¯2 (k) δc2 (l 0 ) 01 |0 !

¯ R + r BB cc ¯ R + r hB cc ¯ R − 2r BB cc ¯ (−R + 2R ) + irr. 0 = 2r hhcc 3 6 7 5 5 1 2

These 53 conditions are fulfilled in the (tree) order $\hbar^0$ for $\Lambda = 0$ and $\Lambda_0 \le \infty$. For finite $\Lambda_0$ we also have to take into account the tree order irrelevant contribution from (107, 108) to the classical action.

References

[BAM1] Bonini, M., D’Attanasio, M., Marchesini, G.: Ward identities and Wilson renormalization group for QED. Nucl. Phys. B418, 81–112 (1994)
[BAM2] Bonini, M., D’Attanasio, M., Marchesini, G.: BRS symmetry for Yang–Mills theory with exact renormalization group. Nucl. Phys. B437, 163–186 (1995)
[Ell] Ellwanger, U.: Flow equations and Bound states in Quantum Field theory. Zeitsch. f. Physik C38, 619–629 (1993)
[EHW] Ellwanger, U., Hirsch, M., Weber, A.: Flow Equations for the relevant part of the pure Yang–Mills action. Zeitsch. f. Physik C69, 687–697 (1996)
[FaSl] Faddeev, L.D., Slavnov, A.A.: Gauge Fields: Introduction to Quantum Theory. Reading, MA: Benjamin, 1980
[Ke1] Keller, G.: The Perturbative Construction of Symanzik’s improved Action for $\phi^4_4$ and QED$_4$. Helv. Phys. Acta 66, 453 (1993)
[Ke2] Keller, G.: Local Borel summability of Euclidean $\Phi^4_4$: A simple Proof via Differential Flow Equations. Commun. Math. Phys. 161, 311–323 (1994)
[Kim] Kim, C.: A Renormalization Group Flow Approach to Decoupling and Irrelevant Operators. Ann. Phys. (N.Y.) 243, 117–143 (1995)
[KK1] Keller, G., Kopper, Ch.: Perturbative Renormalization of Massless $\Phi^4_4$ with Flow Equations. Commun. Math. Phys. 161, 515–532 (1994)
[KK2] Keller, G., Kopper, Ch.: Perturbative Renormalization of QED via flow equations. Phys. Lett. B273, 323–332 (1991); Renormalizability Proof for QED Based on Flow Equations. Commun. Math. Phys. 176, 193–226 (1996)
[KK3] Keller, G., Kopper, Ch.: Perturbative Renormalization of Composite Operators via Flow Equations I. Commun. Math. Phys. 148, 445–467 (1992)
[KK4] Keller, G., Kopper, Ch.: Perturbative Renormalization of Composite Operators via Flow Equations II: Short distance expansion. Commun. Math. Phys. 153, 245–276 (1993)
[KKSa] Keller, G., Kopper, Ch., Salmhofer, M.: Perturbative Renormalization and Effective Lagrangians in $\Phi^4_4$. Helv. Phys. Acta 65, 32–52 (1991)
[KKSc] Keller, G., Kopper, Ch., Schophaus, C.: Perturbative Renormalization with Flow Equations in Minkowski Space. Helv. Phys. Acta 70, 247–274 (1997)
[Kop] Kopper, Ch.: Renormierungstheorie mit Flußgleichungen. Aachen: Shaker Verlag, 1998
[MiRa] Mitter, P.K., Ramadas, T.R.: The Two-Dimensional O(N) Nonlinear σ-Model: Renormalisation and Effective Actions. Commun. Math. Phys. 122, 575–596 (1989)
[Pol] Polchinski, J.: Renormalization and Effective Lagrangians. Nucl. Phys. B231, 269–295 (1984)
[Rei] Reiß, Th.: Lattice Gauge Theory: Renormalization to all orders in the Loop Expansion. Nucl. Phys. B313, 417–463 (1989), and previous work of this author cited there
[ReWe] Reuter, M., Wetterich, Ch.: Gluon Condensation in Nonperturbative Flow Equations. Phys. Rev. D56, 7893–7916 (1997)
[TeWe] Tetradis, N., Wetterich, Ch.: Critical exponents from the Average Action. Nucl. Phys. B422, 541–592 (1994)
[Wet] Wetterich, Ch.: Exact evolution equation for the effective potential. Phys. Lett. B301, 90–94 (1993)
[Wie] Wieczerkowski, Ch.: Symanzik’s Improved actions from the viewpoint of the Renormalization Group. Commun. Math. Phys. 120, 148–176 (1988)
[WiKo] Wilson, K., Kogut, J.B.: The Renormalization Group and the ε-Expansion. Phys. Rep. 12C, 75–199 (1974)
[ZiJ] Zinn-Justin, J.: Quantum Field Theory and Critical Phenomena. Oxford: Clarendon Press, 3rd ed., 1997

Communicated by D. Brydges

Commun. Math. Phys. 209, 517 – 545 (2000)


© Springer-Verlag 2000

Percolation, Path Large Deviations and Weakly Gibbs States

Christian Maes 1,*, Frank Redig 1,**, Senya Shlosman 2,***, Annelies Van Moffaert 1,†

1 Inst. voor Theoretische Fysica, K.U. Leuven, Celestijnenlaan 200D, 3001 Leuven, Belgium
2 CPT/CNRS, Luminy, Case 907, 13288 Marseille, Cedex, France

Received: 14 May 1999 / Accepted: 16 September 1999

Abstract: We present a unified approach to establishing the Gibbsian character of a wide class of non-Gibbsian states, arising in the Renormalisation Group theory. Inside the realm of the Pirogov–Sinai theory for lattice spin systems, we prove that RG transformations applied to low temperature phases give rise to weakly Gibbsian measures. In other words, we show that the Griffiths–Pearce–Israel scenario of RG pathologies is carried by atypical configurations. The renormalized measures are described by an effective interaction, with relative energies well-defined on a full measure set of configurations. In this way we complete the first part of the Dobrushin Restoration Program: to give a Gibbsian description to non-Gibbsian states. A disagreement percolation estimate is used in the proof to bound the decay of quenched correlations through which the interaction potential is constructed. The percolation is controlled via a novel type of pathwise large deviation theory.

* Onderzoeksleider FWO, Flanders.
** Post-doctoraal Onderzoeker FWO, Flanders.
*** The work of S.S. is partially supported by the NSF through the grant DMS-9800860 and by the Russian Basic Research Foundation through the grant 99-01-00284.
† Aspirant FWO, Flanders.

1. Introduction and the Main Result

1.1. Problem of Gibbsianity for the restrictions of Gibbs random fields. In this paper we continue the study of the Gibbsian nature of certain random fields, arising naturally in the context of statistical mechanics. As is known by now, not all reasonable random fields are Gibbs fields. One class of examples can be obtained by applying simple Renormalization Group transformations to some of the most usual lattice Gibbs fields of statistical mechanics. A theorem of van Enter, Fernandez, and Sokal [EFS], extending earlier results of Griffiths and Pearce [GP1, GP2] and Israel [I], states that the restriction

of the (+)-phase of the two-dimensional low temperature Ising model to any square sublattice is not a Gibbs field. In [Sch] it is proven that the restriction of the same (+)-phase to the one-dimensional sublattice is also not a Gibbs field. Following a proposal of Dobrushin [D2], some efforts were made to generalize the notion of a Gibbsian field so as to bring these restrictions back into some class of generalized Gibbs fields. One way of doing this is to compromise on the condition that the interaction energy between a finite volume configuration and the outside world is always defined. More precisely, for states of infinite range dependence, this energy has to be represented by an infinite series. However, in some natural cases the absolute convergence of this series cannot be satisfied for all configurations, for whatever choice of the interaction function. Dobrushin’s idea was that the convergence condition can be relaxed, and one should be content with only almost everywhere convergence, with respect to the corresponding probability measure. This idea was implemented in [DS, MvdV1] for the projection of the (+)-phase of the two-dimensional low temperature Ising model to the 1D sublattice, and in [BKL] for the projection to a square sublattice. It is worthwhile to mention that the methods of these papers are quite different. In [BKL] the crucial technique is the one used in the study of the behavior of the Ising model in a random field, [BK]. It seems that such a technique cannot be applied to the case of projections to lower dimensions. The methods of [DS] can in principle be applied to other situations, but their implementation requires much additional technical work.

Let us remind the reader of the remark in Sect. 4.2 of [EFS] (used in [LMV]) that the image measure under a renormalization group transformation of a Gibbs measure may be viewed as the restriction of (another) Gibbs measure (obtained as the joint distribution of the original Gibbs measure with its RG image). In that way, the study of restrictions of Gibbs measures in fact incorporates a wide class of renormalization group transformations applied to Gibbs measures. For that reason, in the present paper we will concentrate on the case of the simplest renormalization group transformation, that of the restriction (≡ projection) to a sublattice. The generalization to other examples of renormalization group transformations is straightforward. In fact, our restrictions are more general, since we project Gibbs measures on a quite arbitrary infinite countable subset M of the lattice. On the other hand, the Gibbs measures we treat are the so-called pure phases of models satisfying the conditions of the Pirogov–Sinai theory [Sin]. (The plus and minus phases of the standard Ising model are the best known examples.) We develop a universal approach to the problem, which is insensitive to the geometry of the subset M. In particular, all the above cited results are included. However, the range of temperatures for which our technique works depends on how sparse the set M is, and shrinks to zero as the sparseness increases. Our strategy is a development of the one used in [MvdV1] for the case of projecting the 2D Ising model onto the 1D sublattice. The idea of [MvdV1] to use percolation techniques has to be supplemented in our present more general situation by a certain large deviation theory. The required large deviation estimates are of a novel type, which is developed in Sect. 6 of this paper (see also Sect. 1.2 of the present Introduction).

We now describe briefly our results. Throughout the paper we fix a countable subset M of the regular d-dimensional lattice $\mathbb{Z}^d$, containing the origin. The only restriction is that the set M has to be k-connected, k > 0. It means that the set
\[
\{x \in \mathbb{Z}^d : \mathrm{dist}(x, M) \le k\} \tag{1}
\]
is connected. The number k is fixed throughout the paper. Let now the random field P be an extremal low temperature Gibbs state of a model of statistical mechanics on $\mathbb{Z}^d$, d ≥ 2, satisfying all the conditions of the Pirogov–Sinai (PS) theory (see [Sin]).
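The k-connectedness condition (1) can be illustrated on a finite toy set. The sketch below is only an illustration under stated assumptions: it takes dist to be the $\ell^1$ lattice distance, reads "connected" as nearest-neighbor connectivity, and works with a finite set M (the paper's M is infinite); the function names are hypothetical.

```python
from collections import deque
from itertools import product


def thickened(M, k):
    """All lattice sites within l1-distance k of the finite set M (assumed metric)."""
    d = len(next(iter(M)))
    sites = set()
    for site in M:
        for off in product(range(-k, k + 1), repeat=d):
            if sum(abs(o) for o in off) <= k:
                sites.add(tuple(s + o for s, o in zip(site, off)))
    return sites


def is_k_connected(M, k):
    """Check nearest-neighbor connectivity of {x : dist(x, M) <= k} by breadth-first search."""
    sites = thickened(M, k)
    start = next(iter(sites))
    seen = {start}
    queue = deque([start])
    d = len(start)
    while queue:
        x = queue.popleft()
        for axis in range(d):
            for step in (-1, 1):
                y = x[:axis] + (x[axis] + step,) + x[axis + 1:]
                if y in sites and y not in seen:
                    seen.add(y)
                    queue.append(y)
    return len(seen) == len(sites)
```

For instance, two sites at distance 3 form a 1-connected set in this sense, while two sites at distance 4 do not; they become connected once k = 2.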


(The reader can think about the (+)-phase of the two-dimensional Ising model.) The field P is a probability measure on the set $\Omega = S^{\mathbb{Z}^d}$, S finite, of all spin configurations σ on $\mathbb{Z}^d$. We are interested in the projection (≡ restriction) $P_M$ of P onto the subset $\Omega_M \subset \Omega$ of all spin configurations $\sigma_M$ on M. We are looking for a Gibbsian potential for $P_M$, i.e. for a system $U = (U(T, \sigma_T),\ T \subset M,\ 0 < |T| < \infty)$ of real-valued functions $U(T, \sigma_T)$ of $\sigma_T \in S^T$, such that the usual Gibbs formula for the conditional distributions of $P_M$ holds. However, a potential which is absolutely summable does not exist in general, as we already said above. What it is possible to find is a system U which makes $P_M$ into a weakly Gibbs random field. That means that one can find a tail measurable subset $\tilde\Omega_M \subset \Omega_M$, such that
\[
P_M(\tilde\Omega_M) = 1, \tag{2}
\]
and the relative energy series
\[
E^U_V(\sigma_V \mid \bar\sigma_{M\setminus V}) = \sum_{T \subseteq V,\ T \ne \emptyset} U(T, \sigma_T) + \sum_{\substack{T \subset M:\ T\cap V \ne \emptyset, \\ T\cap(M\setminus V) \ne \emptyset,\ |T| < \infty}} U(T, \sigma_{T\cap V} \cup \bar\sigma_{T\cap(M\setminus V)}) \tag{3}
\]
converges absolutely for all boundary conditions $\bar\sigma_M \in \tilde\Omega_M$. The properties (2), (3) allow one to write the Gibbs specification for $P_M$-almost all configurations, and hence one can also write down the DLR equations, which in turn are satisfied by our measure $P_M$. We refer to [MRV1, EMS] for further definitions and for a comparison with the notion of an almost Gibbsian field. Summarizing, our results in a preliminary form are given by the following

Theorem 1. The projection $P_M$ of a Gibbs state P, describing a low temperature pure state of the PS model, to a k-connected subset $M \subset \mathbb{Z}^d$, is a weakly Gibbs random field. The set of configurations $\tilde\Omega_M$, for which the Gibbs specifications can be defined, is given by a constructive procedure. (A more detailed statement is contained in Theorem 4 below.)

Actually, the construction of the set $\tilde\Omega_M$ is an interesting subject in itself, so we conclude the introduction by mentioning our results concerning it.

1.2. Path large deviations. For the sake of simplicity we describe in the introduction the corresponding results in the simplified setting of the (+)-phase of the low temperature 2D Ising model. We want to discuss properties of the typical configurations, or, rather, typical properties of configurations. One well-known example of a typical property is the property of “having the right magnetization”. It means the following. Consider the event
\[
A(\varepsilon, V) = \Bigl\{ \sigma \in \Omega : \Bigl| \frac{1}{|V|}\sum_{x\in V}\sigma_x - m^*(\beta) \Bigr| > \varepsilon \Bigr\},
\]
where $m^*(\beta)$ is the spontaneous magnetization at inverse temperature β. Then for every ε > 0,
\[
P_V^{\beta,+}(A(\varepsilon, V)) \to 0 \quad \text{as } V \to \mathbb{Z}^2, \tag{4}
\]

where $P_V^{\beta,+}$ is the Gibbs state in the square box V with (+) boundary conditions. In particular, if we put
\[
A(\varepsilon) = \bigcap_{N}\bigcup_{n \ge N} A(\varepsilon, V_n),
\]
where $V_n$ is an n-square, then for every ε > 0, $P^{\beta,+}\bigl((A(\varepsilon))^c\bigr) = 1$. Here $P^{\beta,+}$ stands for the (+)-phase. In the present paper we need properties which are valid for almost all configurations not just in the bulk, but along every single selfavoiding path. So let $\mathcal{S}_V$ be the collection of all selfavoiding paths in V, connecting the origin to the boundary of V. Is it then true that the following strengthening of (4) holds: for every ε > 0,
\[
P_{V_n}^{\beta,+}\Bigl( \bigcup_{W \in \mathcal{S}_{V_n}} A(\varepsilon, W) \Bigr) \to 0 \quad \text{as } V_n \to \mathbb{Z}^d\,? \tag{5}
\]
In other words, is the property of “having the right magnetization along every path” a typical one? The answer is clearly negative, since for a $P^{\beta,+}$-typical configuration σ we can easily find a selfavoiding path γ which avoids essentially all contours of σ, and so the magnetization of σ along the path γ, that is the quantity $\frac{1}{|\gamma|}\sum_{x\in\gamma}\sigma_x$, can easily be almost equal to 1. So we introduce a smaller event
\[
B(\varepsilon, W) = \Bigl\{ \sigma \in \Omega : m^*(\beta) - \frac{1}{|W|}\sum_{x\in W}\sigma_x > \varepsilon \Bigr\}. \tag{6}
\]
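To make the event (6) concrete, here is a minimal sketch that evaluates the magnetization of a configuration along a given path and tests membership in $B(\varepsilon, W)$. The dictionary representation of σ, the function names, and the numerical value used for $m^*(\beta)$ in the usage note are illustrative assumptions, not quantities taken from the paper.

```python
def path_magnetization(sigma, path):
    """Average spin (1/|W|) * sum_{x in W} sigma_x along the sites of the path W."""
    return sum(sigma[x] for x in path) / len(path)


def in_event_B(sigma, path, m_star, eps):
    """True if sigma lies in B(eps, W): the path magnetization falls below m_star - eps."""
    return m_star - path_magnetization(sigma, path) > eps


# Toy configuration on a 3x3 block: all +1 except two flipped spins in the middle row.
sigma = {(x, y): +1 for x in range(3) for y in range(3)}
sigma[(1, 1)] = -1
sigma[(2, 1)] = -1

bottom_row = [(0, 0), (1, 0), (2, 0)]   # avoids the flipped spins
middle_row = [(0, 1), (1, 1), (2, 1)]   # runs through them
```

With a hypothetical value such as `m_star = 0.9` and `eps = 0.1`, the bottom row is not in B (its magnetization is 1), while the middle row is, which mirrors the remark that B occurs when a path spends too much time inside contours.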

For example, the event $B(\varepsilon, \gamma)$ happens if the path γ enters too often inside the contours of the configuration σ. Then the following theorem holds:

Theorem 2. If β is large enough, then
\[
P_{V_n}^{\beta,+}\Bigl( \bigcup_{W \in \mathcal{S}_{V_n}} B(\varepsilon, W) \Bigr) \to 0 \quad \text{as } n \to \infty. \tag{7}
\]

In words, the above statement means that the magnetization of any typical configuration along every selfavoiding path is above $m^*(\beta) - \varepsilon$. If a configuration $\sigma \notin \bigcup_{W \in \mathcal{S}_{V_n}} B(\varepsilon, W)$, then we say that σ has the correct Path Large Deviation property, or simply that σ is a PLD configuration. In contrast with (4), where the convergence is exponential in |V|, in (7) we only have a stretched exponential decay. This is the content of Theorem 9 below, which in particular proves the claim of Theorem 2 above.

In the next section we introduce the notations. In Sect. 3 we reduce the proof of Theorem 1 to the question of correlation decay in a random (quenched) environment. In Sect. 4 the correlation decay question is reduced to a question about percolation in a random environment. In Sect. 5 the percolation problem is solved under the hypothesis that the random environment has a property of the type described in Theorem 2. Finally, in Sect. 6 the generalization of Theorem 2 is proven, which justifies the use of the hypothesis above.


2. Notation

We fix an arbitrary k-connected (see (1)) infinite subset M of the regular d-dimensional lattice $\mathbb{Z}^d$, containing the origin. The most interesting cases concern $M = l\,\mathbb{Z}^r$, $r = 1, \dots, d$; $l = 1, \dots, 2k-1$, where we keep the invariance under (a subgroup of) translations. In the following, |A| is the cardinality of the set A, while $A^c$ denotes the complement of A in $\mathbb{Z}^d$. For $W \subset M$ we sometimes denote by $W^c$ the complement $M \setminus W$. General elements (sites) of $\mathbb{Z}^d$ are written as x, y, z, but we write i, j when referring to sites (elements) in M. The distance between $x = (x^1, \dots, x^d)$ and $y = (y^1, \dots, y^d)$ is
\[
|x - y| \equiv \sum_{\alpha=1}^{d} |x^\alpha - y^\alpha|. \tag{8}
\]
The distance between two sets A and B is $d(A, B) \equiv \min_{x\in A,\, y\in B} |x - y|$. The diameter of a set A is $\mathrm{diam}\,A \equiv \max_{x,y\in A} |x - y|$. When dealing with a singleton A = {x}, we often write A = x. $\Lambda_n$ is the cube $\{x \in \mathbb{Z}^d : |x| \le n\}$, n = 1, 2, .... Its intersection with M is $\Lambda_n \cap M \equiv V_n$. The boundary of a set Λ is $\partial\Lambda \equiv \{x \in \Lambda^c : \exists y \in \Lambda,\ |x - y| = 1\}$ and must be distinguished from its internal boundary $\partial_i\Lambda \equiv \{x \in \Lambda : \exists y \in \partial\Lambda,\ |x - y| = 1\}$.

We find it also useful to regard $\mathbb{Z}^d$ as a graph with its sites as vertices and its bonds (nearest-neighbor connections) as edges. x and y are adjacent (nearest-neighbors) if |x − y| = 1. A (finite) path (of length n) from x to y is a sequence of consecutive and mutually distinct nearest-neighbors $(x_0 = x, x_1, \dots, x_{n-1} = y)$. Infinite paths are the natural extensions of this. A path from A to B is any path starting in a site x ∈ A and ending in a site y ∈ B (its length is at least d(A, B)). We will be using the lexicographic order “≤” on $\mathbb{Z}^d$ to say that x < y if $x^1 < y^1$, or $x^1 = y^1$ and $x^2 < y^2$, or ... $x^1 = y^1, x^2 = y^2, \dots, x^{d-1} = y^{d-1}$ and $x^d < y^d$. Dual to paths are surfaces. They are sometimes referred to as ∗-circuits in two dimensions. A surface around A is any collection of next-nearest-neighbor connected sites in $A^c$ so that by removing them from the lattice, no infinite path can exist starting in A.

We consider lattice spin systems on $\mathbb{Z}^d$. A general spin configuration on $\mathbb{Z}^d$ is denoted by σ or η. They are elements of the configuration space $\Omega \equiv S^{\mathbb{Z}^d}$, where S is the finite set (|S| ≡ q ≥ 2) of spin-values a, b, c, ... at a single site. Ising-spins have S = {+1, −1}. The value of the spin at a site x in the configuration σ is σ(x) ∈ S. We will frequently use some reference configuration, denoted by 1 with 1(x) = +1 everywhere. The restriction of a σ ∈ Ω to a set A is $\sigma_A \in S^A$; $\sigma_A\eta_{A^c} \equiv \sigma_A \cup \eta_{A^c}$ equals σ on A (i.e., $\sigma_A\eta_{A^c}(x) = \sigma(x)$ for all x ∈ A) and equals η on $A^c$.
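The definitions of the boundary $\partial\Lambda$ and the internal boundary $\partial_i\Lambda$ can be checked on a small example. The following is a minimal sketch for finite sets of sites given as tuples; the helper names are hypothetical and the $\ell^1$ nearest-neighbor structure is taken from (8).

```python
def neighbors(x):
    """Nearest neighbors of the site x, i.e. the sites y with |x - y| = 1 in the l1 distance (8)."""
    for axis in range(len(x)):
        for step in (-1, 1):
            yield x[:axis] + (x[axis] + step,) + x[axis + 1:]


def boundary(Lam):
    """Sites outside Lam that are adjacent to some site of Lam."""
    return {y for x in Lam for y in neighbors(x) if y not in Lam}


def internal_boundary(Lam):
    """Sites of Lam that are adjacent to some site of the boundary of Lam."""
    bd = boundary(Lam)
    return {x for x in Lam if any(y in bd for y in neighbors(x))}


# A 3x3 square block in Z^2 as a test set.
square = {(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)}
```

For the 3x3 block the boundary consists of the 12 sites adjacent to its sides (the diagonal corner sites are not nearest neighbors), and the internal boundary is every site of the block except the center.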
We write $\sigma^A$ for the configuration which equals σ on A and is equal to +1 outside A. The restriction of Ω to M is $\Omega_M \equiv S^M$ and $\Omega_n \equiv \Omega_{\Lambda_n}$. We often consider $\Omega_M$ as a subset of Ω via the natural embedding $\sigma \in \Omega_{\mathbb{Z}^d} \to \sigma_M \in \Omega_M$. Therefore the same symbols σ, η, ξ will sometimes appear for configurations in Ω and in $\Omega_M$. All notation is inherited, e.g. $\xi^V$ equals ξ on V and is +1 on $V^c$.

A function f on Ω is local if its dependence set $D_f$, i.e. the minimal set A such that f(σ) = f(η) whenever $\sigma_A = \eta_A$, is finite. Continuous functions are uniform limits of local functions with the sup-norm $\|f\| \equiv \sup_\sigma |f(\sigma)|$. The sigma-algebra generated by the evaluations x ∈ A → σ(x) is denoted by $\mathcal{F}_A$. When $A = \mathbb{Z}^d$, respectively A = M, we simply set $\mathcal{F} = \mathcal{F}_{\mathbb{Z}^d}$, respectively $\mathcal{F}' = \mathcal{F}_M$. The tailfield sigma-algebras are denoted by $\mathcal{F}^\infty = \cap_n \mathcal{F}_{\Lambda_n^c}$ and $\mathcal{F}'^\infty = \mathcal{F}^\infty \cap \mathcal{F}'$ respectively.

In what follows we will be considering probability measures µ on $(\Omega, \mathcal{F})$. Their corresponding random field is denoted by $X \equiv (X(x), x \in \mathbb{Z}^d)$. Expectations are abbreviated as $\int f(\sigma)\,d\mu(\sigma) = \mu(f)$ and a covariance is written as µ(f; g) = µ(fg) − µ(f)µ(g). The probability of an event $E \in \mathcal{F}$ is µ[E] = µ[X ∈ E] or also µ(I[X ∈ E]), where we introduced the indicator function I. The same notation is used for probability measures ν on $(\Omega, \mathcal{F}')$. Such a measure appears as the restriction of µ to $\mathcal{F}'$ (or, to M). We denote by $Y \equiv (Y(i), i \in M)$ the restriction of the random field X to M.

The basic model system we will be dealing with here is defined via a nearest-neighbor interaction
\[
H_{\Lambda_n}(\sigma) = - \sum_{x \in \Lambda_n,\ y \in \mathbb{Z}^d,\ |x-y| = 1} J\bigl(\sigma^{\Lambda_n}(x), \sigma^{\Lambda_n}(y)\bigr). \tag{9}
\]
The interaction term J(·, ·) is a real-valued symmetric function on S × S (possibly containing a self-energy). We have taken care of putting +1 boundary conditions outside the cube $\Lambda_n$. The nearest-neighbor aspect ensures that the corresponding Gibbs fields will be Markov random fields but that will not be essential. The partition function is
\[
Z_n \equiv \sum_{\sigma \in S^{\Lambda_n}} e^{-\beta H_{\Lambda_n}(\sigma)}. \tag{10}
\]
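For very small boxes the Hamiltonian (9) and the partition function (10) can be evaluated by brute force. The sketch below is a toy illustration only: the Ising-type coupling `J_ising` (with J(+1,+1) = 0 and J(−1,+1) = −2) is a hypothetical choice, the box is taken to be $\{x : |x| \le n\}$ in the $\ell^1$ distance (8), and the enumeration is feasible only for a handful of sites.

```python
from itertools import product
from math import exp


def box(n, d):
    """Sites x in Z^d with l1-norm |x| <= n, following the definition of Lambda_n."""
    return [x for x in product(range(-n, n + 1), repeat=d)
            if sum(abs(c) for c in x) <= n]


def neighbors(x):
    """Nearest neighbors of x in Z^d."""
    for axis in range(len(x)):
        for step in (-1, 1):
            yield x[:axis] + (x[axis] + step,) + x[axis + 1:]


def J_ising(a, b):
    """Hypothetical coupling: zero for equal spins, -2 for unequal spins."""
    return 0.0 if a == b else -2.0


def hamiltonian(sigma, sites, J=J_ising):
    """H(sigma) as in (9): sum over x in the box and all neighbors y, with +1 outside the box."""
    total = 0.0
    for x in sites:
        for y in neighbors(x):
            total -= J(sigma[x], sigma.get(y, +1))
    return total


def partition_function(n, d, beta, J=J_ising, spins=(+1, -1)):
    """Z_n as in (10), by enumerating all configurations on the box."""
    sites = box(n, d)
    Z = 0.0
    for values in product(spins, repeat=len(sites)):
        sigma = dict(zip(sites, values))
        Z += exp(-beta * hamiltonian(sigma, sites, J))
    return Z
```

In this normalization the all-plus configuration has energy zero, so each term of the sum lies in (0, 1] and Z at β = 0 simply counts the configurations.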

We suppose that H satisfies the conditions of the PS theory, and that the configuration 1 with 1(x) = +1 everywhere is a ground state and gives rise to pure states at low temperatures in the usual sense of PS theory. More specifically, we assume that for all a, b ∈ S \ {+1},
\[
J(+1, +1) = 0, \qquad J(a, +1) < -1, \qquad J(a, b) \le 0. \tag{11}
\]
We assume further that all ground state configurations of H are translation invariant. That implies in particular that J(a, b) < 0 for a ≠ b. We then assume that in such a situation J(a, b) < −1 as well.

3. Correlation Decay ⇒ Weak Gibbsianity

We start with the definition of some finite subsets of M. For every i ∈ M we put
\[
L_{i,m} \equiv \{ j \in M : j \le i,\ |j - i| \le m \}, \qquad m = 0, 1, \dots. \tag{12}
\]
Clearly, $L_{i,m-1} \subset L_{i,m} \subset M$ and
\[
v(i, m) \equiv |L_{i,m} \setminus L_{i,m-1}| \le \frac{2^{d-1} m^{d-1}}{(d-1)!} \le 2 m^{d-1}, \qquad m \ge 1. \tag{13}
\]
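The sets (12) and the shell sizes v(i, m) are easy to enumerate in low dimension. The following sketch is an illustration with hypothetical names: it uses Python's tuple comparison for the lexicographic order, the $\ell^1$ distance (8), and an optional predicate `member` describing which lattice sites belong to M (by default the full lattice).

```python
from itertools import product


def L(i, m, member=lambda j: True):
    """L_{i,m}: sites j of M with j <= i (lexicographically) and |j - i| <= m."""
    d = len(i)
    out = set()
    for off in product(range(-m, m + 1), repeat=d):
        if sum(abs(o) for o in off) <= m:
            j = tuple(a + o for a, o in zip(i, off))
            if j <= i and member(j):   # tuple comparison is lexicographic
                out.add(j)
    return out


def v(i, m, member=lambda j: True):
    """v(i, m) = |L_{i,m} \\ L_{i,m-1}| for m >= 1."""
    return len(L(i, m, member) - L(i, m - 1, member))
```

For $M = \mathbb{Z}^2$ the enumeration gives v(i, m) = 2m, so the first inequality in (13) holds with equality in this case, while for the one-dimensional sublattice $M = \mathbb{Z} \times \{0\}$ each shell contributes a single site.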

We can now, following the lexicographic order, add sites one by one to $L_{i,m-1}$ to end up with the set $L_{i,m}$. So let us write $L_{i,m} \setminus L_{i,m-1} = \{j_1, j_2, \dots, j_v\}$ with $j_1 < j_2 < \dots < j_v$ and their number v = v(i, m) possibly depending on m and i but not exceeding $2m^{d-1}$. It can happen that $L_{i,m-1} = L_{i,m}$, in which case v(i, m) = 0. Define the sets
\[
Q_{i,m,r} \equiv L_{i,m-1} \cup \{j_1, j_2, \dots, j_r\} \tag{14}
\]
with r = 1, 2, ..., v. We have $Q_{i,m,v} = L_{i,m}$ and we put $Q_{i,m,0} \equiv L_{i,m-1}$.

Let $\mu_n$ be the Gibbs state in the box $\Lambda_n$, defined by our Hamiltonian H and the configuration 1 as boundary condition. According to our hypothesis, the sequence of probability measures $\mu_n \equiv \mu_{\Lambda_n}$ on $(\Omega, \mathcal{F})$ weakly converges to a measure µ, which is


a pure state of our model. Let νn denote the probability measures obtained from µn by restricting them to M. Then, the limit limn νn = ν exists and equals µ restricted to M. ξ For every configuration ξ ∈ M the measure µn on (n , F) is defined in the following way: i) one takes the conditional Gibbs distribution µ¯ n,σ¯ n,ξ in 3n , defined by our Hamiltonian H and the following boundary conditions σ¯ n,ξ : ξ (x) for x ∈ ∂3n ∩ M, σ¯ n,ξ (x) = 1 for x ∈ 3cn \ (∂3n ∩ M) , (in our notation, σ¯ n,ξ = ξ ∂3n ∩M ), ξ ii) one defines µn by the conditioning: µξn [X = σ on A] = µ¯ n,σ¯ n,ξ [X = σ on A|X = ξ on Vn ], where A ⊂ 3n . ξ

ξ

(15)

ξ

We denote by n ≡ n,M ⊂ Zd the support of the measure µn ; by definition, these configurations coincide with σ¯ n,ξ outside 3n , and with ξ on Vn . Also, we define µξ by ξ µ· [A] = µ 1A |F 0 ; by what follows one can show that µξ = limn µn on a set of ξ ’s of PM -measure one. ξ Let us define the local observables φi = φi (σ ) , i ∈ M so that for all i ∈ Vn , Q

µn [X = ξ Q\i on Vn ] = µn [X = ξ Q on Vn ]µξn (φi ).

(16)

Of course, these functions $\phi_i$ can be written down, but we do not need explicit expressions for them. Also, for all $i, j \in V_n$ we have

\[ \mu_n[X = \xi^{Q \setminus \{i,j\}} \text{ on } V_n] = \mu_n[X = \xi^{Q} \text{ on } V_n]\, \mu_n^{\xi^{Q}}(\Phi_{ij}). \tag{17} \]

Since the interaction is nearest neighbor,

\[ \Phi_{ij} = \phi_i \phi_j, \tag{18} \]

provided $|i - j| > 4$.

Next we formulate the Correlation Decay property, which, if valid, implies Weak Gibbsianity.

Definition 3. We say that the Quenched Correlation Decay (QCD) property holds, if there are constants $C < \infty$, $\lambda > 0$, a tail-set $K \in \mathcal{F}^0$ with

\[ \mu_n[K] = \mu[K] = 1 \tag{19} \]

and a function $\ell(i, \xi)$, defined for $\xi \in K$, $i \in M$, such that for every $j \in M$ the set

\[ \mathrm{Bad}(j, \xi) \equiv \{ i \in M : \ell(i, \xi) \ge |i - j| \} \tag{20} \]

is finite, and for all finite $Q \subset M$, all $n$ and all $j$ with $|i - j| > \min\{\ell(i, \xi), \ell(j, \xi)\}$,

\[ |\mu_n^{\xi^{Q}}(\phi_i; \phi_j)| \le C e^{-\lambda |i - j|}. \tag{21} \]

We suppose additionally that for every $\bar\xi \in K$ and every $Q \subset M$ (finite or infinite) $\bar\xi^{Q} \in K$ as well, and that for every $j \in M$, $\mathrm{Bad}(j, \bar\xi^{Q}) \subset \mathrm{Bad}(j, \bar\xi)$.


Theorem 4. Assume that the condition QCD holds. Then the restricted state, $\nu$, is weakly Gibbsian with interaction potential $U = \{U(T, \cdot),\ T \subset M\}$ vanishing except possibly for the sets $T = L_{i,m}$ with $v(i, m) \ge 1$, where $i \in M$, $m \in \mathbb{N}$, and is then given by

\[ U(L_{i,m}, \xi) = - \sum_{r=1}^{v(i,m)} \ln \frac{\mu^{\xi^{Q_{i,m,r}}}(\phi_i \phi_{j_r})}{\mu^{\xi^{Q_{i,m,r}}}(\phi_i)\, \mu^{\xi^{Q_{i,m,r}}}(\phi_{j_r})} \quad \text{for } m > 4, \tag{22} \]

\[ U(L_{i,m}, \xi) = - \sum_{r=1}^{v(i,m)} \ln \frac{\mu^{\xi^{Q_{i,m,r}}}(\Phi_{i j_r})}{\mu^{\xi^{Q_{i,m,r}}}(\phi_i)\, \mu^{\xi^{Q_{i,m,r}}}(\phi_{j_r})} \quad \text{for } 1 \le m \le 4, \tag{23} \]

while for $m = 0$

\[ U(L_{i,0}, \xi) = \ln \mu^{\xi^{i}}(\phi_i). \tag{24} \]

This potential is absolutely summable on the tail-set $K$. Moreover, it satisfies the following bound: there exist constants $C_1, C_2 < \infty$, $\lambda > 0$ such that for all $m \ge 0$, $\xi \in K$, $i \in M$,

\[ |U(L_{i,m}, \xi)| \le C_1\, I[m \le \ell(i, \xi)]\, m^{d-1} + C_2\, I[m > \ell(i, \xi)]\, m^{d-1} \exp[-\lambda m], \tag{25} \]

with the $\ell(i, \xi)$ and the set $K$ as in QCD conditions (19)-(21) above.

Proof. Consider the probability to find the configuration $\xi$ in $V_n$. In our notation it is

\[ \nu_n[Y = \xi \text{ on } V_n] = \mu_n[X = \xi \text{ on } V_n], \tag{26} \]

and we will abbreviate it as $\mu_n[\xi]$. Order the sites in $V_n$ lexicographically as $i_1 < i_2 < \ldots < i_{|V_n|}$ to write

\[ \frac{\mu_n[\xi]}{\mu_n[1]} = \prod_{s=1}^{|V_n|} \frac{\mu_n[\xi^{\{i_1, \ldots, i_s\}}]}{\mu_n[\xi^{\{i_1, \ldots, i_{s-1}\}}]}. \tag{27} \]

For every $i \in V_n$ we define $m(i, n) \equiv \max_{j \in V_n :\, j \le i} |i - j|$. Then we can rewrite every factor in (27) as

\[ \frac{\mu_n[\xi^{\{i_1, \ldots, i_s\}}]}{\mu_n[\xi^{\{i_1, \ldots, i_{s-1}\}}]} = \prod_{m=0}^{m(i_s, n)} \frac{\mu_n[\xi^{L_{i_s,m}}]\, \mu_n[\xi^{L_{i_s,m-1} \setminus i_s}]}{\mu_n[\xi^{L_{i_s,m-1}}]\, \mu_n[\xi^{L_{i_s,m} \setminus i_s}]}. \]

So if we define the family $U^n = \{U^n(\cdot, \cdot)\}$ by

\[ U^n(L_{i,m}, \xi) \equiv - \ln \frac{\mu_n[\xi^{L_{i,m}}]\, \mu_n[\xi^{L_{i,m-1} \setminus i}]}{\mu_n[\xi^{L_{i,m-1}}]\, \mu_n[\xi^{L_{i,m} \setminus i}]}, \tag{28} \]

then we have

\[ \mu_n[\xi] = \mu_n[1] \exp\Bigl\{ - \sum_{i \in V_n} \sum_{m=0}^{m(i,n)} U^n(L_{i,m}, \xi_{V_n}) \Bigr\}. \tag{29} \]
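The telescoping identity (27)-(29) is purely algebraic, so it can be checked numerically on any strictly positive measure. The sketch below is a toy check under assumptions that are not the paper's setting: $V_n = \{0, \ldots, n-1\}$ is a one-dimensional chain, the measure is a random positive weight table rather than a Gibbs state, and $\xi^{A}$ denotes the configuration equal to $\xi$ on $A$ and $+1$ elsewhere. The function name `kozlov_demo` is hypothetical.

```python
import itertools
import math
import random

def kozlov_demo(n=4, seed=0):
    """Largest violation of (29) over all xi in {+1,-1}^n for a random positive measure."""
    rng = random.Random(seed)
    configs = list(itertools.product((+1, -1), repeat=n))
    weights = {c: rng.uniform(0.5, 2.0) for c in configs}
    total = sum(weights.values())
    mu = {c: w / total for c, w in weights.items()}

    def restrict(xi, A):
        # xi^A: equal to xi on A and +1 elsewhere
        return tuple(xi[j] if j in A else +1 for j in range(n))

    def L(i, m):
        # L_{i,m} on the chain; the condition is never met for m = -1, giving the empty set
        return {j for j in range(n) if j <= i and i - j <= m}

    def U(i, m, xi):
        # the potential of (28)
        num = mu[restrict(xi, L(i, m))] * mu[restrict(xi, L(i, m - 1) - {i})]
        den = mu[restrict(xi, L(i, m - 1))] * mu[restrict(xi, L(i, m) - {i})]
        return -math.log(num / den)

    plus = (+1,) * n
    worst = 0.0
    for xi in configs:
        # on the chain, m(i, n) = i
        energy = sum(U(i, m, xi) for i in range(n) for m in range(i + 1))
        worst = max(worst, abs(mu[xi] - mu[plus] * math.exp(-energy)))
    return worst

max_error = kozlov_demo()
```

The returned value should be zero up to floating-point rounding, since for each site the product over $m$ collapses to a single ratio.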


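Note that $\xi^{\emptyset} \equiv 1$, and so

\[ U^n(L_{i,0}, \xi) \equiv - \ln \frac{\mu_n[\xi^{i}]}{\mu_n[1]} = \ln \mu_n^{\xi^{i}}(\phi_i). \tag{30} \]

Observe also that $m(i, n) = 0$ when $i = i_1$ is the "first" site in $V_n$, and that

\[ U^n(L_{i,m}, \xi) = 0, \text{ provided } \xi(i) = +1, \text{ or } v(i, m) = 0, \text{ or } \xi(j) = +1 \text{ for all } j \in L_{i,m} \setminus L_{i,m-1}. \tag{31} \]

We can further telescope (28) as

\[ U^n(L_{i,m}, \xi) = - \sum_{r=1}^{v(i,m)} \ln \frac{\mu_n[\xi^{Q_{i,m,r}}]\, \mu_n[\xi^{Q_{i,m,r-1} \setminus i}]}{\mu_n[\xi^{Q_{i,m,r-1}}]\, \mu_n[\xi^{Q_{i,m,r} \setminus i}]}, \tag{32} \]

provided $m > 0$ and $v(i, m) \ge 1$. If $m > 4$, we can use (16)-(18) to rewrite (32) as

\[ U^n(L_{i,m}, \xi) = - \sum_{r=1}^{v(i,m)} \ln \frac{\mu_n^{\xi^{Q_{i,m,r}}}(\phi_i \phi_{j_r})}{\mu_n^{\xi^{Q_{i,m,r}}}(\phi_i)\, \mu_n^{\xi^{Q_{i,m,r}}}(\phi_{j_r})}, \tag{33} \]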

while for $1 \le m \le 4$ we cannot use the factoring (18), so we keep the initial local observables $\Phi_{i j_r}$. Here we remind the reader that the conditioning $\mu_n^{\xi^{Q}}$ of the measure $\mu_n$ by the configuration $\xi^{Q}$ means the use of the condition $X = \xi$ on $Q$, $X = 1$ on $M \setminus Q$, $|Q| < \infty$. So the cluster expansion easily provides us with the existence of the limits $\lim_{n \to \infty} \mu_n^{\xi^{Q_{i,m,r}}}(f)$ for every local observable $f$. Taking the limits $n \to \infty$ in (33), (30), we arrive at the formulas (22)-(24). The estimate (25) follows from relation (21). The almost sure convergence of the relative energy series (3) follows from the finiteness of the sets $\mathrm{Bad}(j, \xi)$ of "bad" points (20), when $\xi \in K$.

What remains is to show that indeed the random field $\nu$ is a Gibbs field with the potential $U = \{U(T, \cdot),\ T \subset M\}$, i.e. to prove that the corresponding DLR equations hold. The rest of the present section contains the proof of validity of the DLR equations. Since this proof is not used in the rest of the paper, the reader might want to go directly to the next section.

The following proof is an adaptation of the similar statement from [DS], Sect. 8. We begin by introducing for every finite $V \subset M$ the Gibbs specification

\[ p_V^{U}(\xi_V \mid \bar\xi_{M \setminus V}) = \frac{\exp\{-E_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})\}}{Z_V^{U}(\bar\xi_{M \setminus V})}, \tag{34} \]

where the partition function

\[ Z_V^{U}(\bar\xi_{M \setminus V}) = \sum_{\xi_V \in X_V} \exp\{-E_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})\}. \tag{35} \]
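Formulas (34)-(35) amount to normalising Boltzmann weights over the configurations in $V$ with the outside configuration held fixed. The sketch below illustrates only this normalisation. It does not use the relative energy series (3): `toy_energy` is a hypothetical nearest-neighbour energy on a one-dimensional chain, and the spin values $\pm 1$ and all function names are assumptions made for the example.

```python
import itertools
import math

def gibbs_specification(energy, V, boundary, spins=(+1, -1)):
    """Return {xi_V: p_V(xi_V | boundary)} by normalising exp(-E) as in (34)-(35)."""
    configs = list(itertools.product(spins, repeat=len(V)))
    boltz = {c: math.exp(-energy(dict(zip(V, c)), boundary)) for c in configs}
    Z = sum(boltz.values())  # the partition function of (35)
    return {c: w / Z for c, w in boltz.items()}

def toy_energy(xi_V, boundary, beta=1.0):
    """Hypothetical energy: -beta times the sum of s_x * s_{x+1} over bonds touching V."""
    full = {**boundary, **xi_V}
    return -beta * sum(
        full[x] * full[x + 1]
        for x in full
        if x + 1 in full and (x in xi_V or x + 1 in xi_V)
    )

# V = {1, 2} with the outside sites 0 and 3 fixed to +1
p = gibbs_specification(toy_energy, V=[1, 2], boundary={0: +1, 3: +1})
```

With both outside spins equal to $+1$, the all-plus configuration in $V$ has the lowest toy energy and therefore the largest conditional probability.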


Here the function $E_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})$ is defined by the series (3), if the latter converges. Let us show that for $\bar\xi = \xi_V \cup \bar\xi_{M \setminus V} \in K$ this convergence follows from that part of Theorem 4 that is already proven. Indeed, let the set $L_{i,m} \subset M$ be "bad", in the sense that it intersects the box $V$, and $\ell(i, \bar\xi) \ge m$. In other words, the contribution of $U(L_{i,m}, \xi_V \cup \bar\xi_{M \setminus V})$ to $E_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})$ might be big, according to (25). Then necessarily the site $i$ belongs to the set $\bigcup_{j \in V} \mathrm{Bad}(j, \bar\xi)$, which is finite. Note that the number of different sets $L_{i,m}$ with $m \le \ell(i, \bar\xi)$ is less than $2\ell(i, \bar\xi)^{d}$, so the total number of "bad" $L_{i,m}$-s is bounded by

\[ \sum_{i \in \bigcup_{j \in V} \mathrm{Bad}(j, \bar\xi)} 2\ell(i, \bar\xi)^{d}, \]

which is also finite. The above argument implies that the functions $p_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})$ are defined $\nu$-a.s., which makes well-defined the rhs of the DLR equation (36), which follows:

\[ \int \phi_W(\xi_W)\, \nu(d\xi) = \int \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})\, \nu_{M \setminus V}(d\bar\xi_{M \setminus V}). \tag{36} \]

Here $V \subseteq W$ are arbitrary finite subsets of $M$, $\xi_W = (\xi_V \cup \bar\xi_{M \setminus V})|_W$ is (with some abuse of notation) the restriction, the local observable $\phi_W(\xi_W) \equiv \phi_W(\xi_V \cup \xi_{W \setminus V})$ is a function on $S^{W}$, while $\nu_{M \setminus V}$ is the restriction of the measure $\nu$ to the $\sigma$-algebra $\mathcal{F}^0_{M \setminus V}$. Equation (36) should hold for any choice of $V \subseteq W$. To prove (36), let us introduce the subsets $X_N \equiv X_N(W) \subset K$ as

\[ X_N = \Bigl\{ \bar\xi \in K : \sum_{j \in W} |\mathrm{Bad}(j, \bar\xi)| < N \Bigr\}, \]

and rewrite the rhs integral of (36) as a sum:

\[ \int (\cdot) = \int_{X_N} (\cdot) + \int_{(X_N)^c} (\cdot). \tag{37} \]

Note that the last integral goes to zero as $N \to \infty$, since the function $\sum_{j \in W} |\mathrm{Bad}(j, \bar\xi)|$ is finite on $K$ and because the integrand is bounded. We further introduce the subsets $X_{N,R}$ of $X_N$:

\[ X_{N,R} = \Bigl\{ \bar\xi \in X_N : \forall\, i \in \bigcup_{j \in W} \mathrm{Bad}(j, \bar\xi) \text{ we have } \mathrm{dist}(i, W) \le R \Bigr\}. \]

We have that

\[ \int_{X_N} (\cdot) = \int_{X_{N,R}} (\cdot) + \int_{X_N \setminus X_{N,R}} (\cdot). \tag{38} \]

Again, the last integral goes to zero as $R \to \infty$.


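The estimate (25) implies the continuity of the function $p_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})$ on the subspaces $X_N$, $X_{N,R}$. Note also that Definition 3 of the QCD property implies that the following implications hold: $\bar\xi \in X_N \Rightarrow \bar\xi^{Q} \in X_N$, and $\bar\xi \in X_{N,R} \Rightarrow \bar\xi^{Q} \in X_{N,R}$. Together these properties imply that for every $\varepsilon > 0$ there exists a distance $\rho(\varepsilon, N, R)$ big enough, such that for every set $Q_\rho$ containing the set

\[ Q(W; \rho(\varepsilon, N, R)) = \{ i \in M : \mathrm{dist}(i, W) \le \rho(\varepsilon, N, R) \} \]

we have:

\[ \Bigl| \int_{X_{N,R}} \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U}(\xi_V \mid \bar\xi_{M \setminus V})\, \nu_{M \setminus V}(d\bar\xi_{M \setminus V}) - \int_{X_{N,R}} \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U}(\xi_V \mid \bar\xi^{Q_\rho}_{M \setminus V})\, \nu_{M \setminus V}(d\bar\xi_{M \setminus V}) \Bigr| < \varepsilon. \tag{39} \]

Repeating the arguments (37), (38) for the last integral, in the reversed order, we can replace it by the integral over the whole space,

\[ \int \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U}(\xi_V \mid \bar\xi^{Q_\rho}_{M \setminus V})\, \nu_{M \setminus V}(d\bar\xi_{M \setminus V}), \tag{40} \]

again with arbitrary precision. But because of (31) the integrand in the last integral is a local function! Therefore we can approximate it arbitrarily closely by the integral with respect to the finite volume measure $\nu_n$,

\[ \int \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U}(\xi_V \mid \bar\xi^{Q_\rho}_{\Lambda_n \setminus V})\, (\nu_n)_{\Lambda_n \setminus V}(d\bar\xi_{\Lambda_n \setminus V}), \tag{41} \]

provided $n > n(Q_\rho)$ is large enough. Note that for any local event $A$ we have the exponential convergence $\mu_n(A) \to \mu(A)$ as $n \to \infty$ (though not uniform in $A$, of course). Therefore the two functions $p_V^{U^n}(\xi_V \mid \bar\xi^{Q_\rho}_{\Lambda_n \setminus V})$ and $p_V^{U}(\xi_V \mid \bar\xi^{Q_\rho}_{\Lambda_n \setminus V})$ can be made arbitrarily close, uniformly in $\xi_V \cup \bar\xi^{Q_\rho}_{\Lambda_n \setminus V}$, provided only that $n$ $(= n(\rho))$ is large enough. The last step would be to use again the approximations (37)-(39) to replace the integral

\[ \int \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U^n}(\xi_V \mid \bar\xi^{Q_\rho}_{\Lambda_n \setminus V})\, (\nu_n)_{\Lambda_n \setminus V}(d\bar\xi_{\Lambda_n \setminus V}) \]

by

\[ \int \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U^n}(\xi_V \mid \bar\xi_{\Lambda_n \setminus V})\, (\nu_n)_{\Lambda_n \setminus V}(d\bar\xi_{\Lambda_n \setminus V}), \]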


and then to observe that because of the validity of the finite volume DLR equation (which is an identity)

\[ \int \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U^n}(\xi_V \mid \bar\xi^{Q_\rho}_{\Lambda_n \setminus V})\, (\nu_n)_{\Lambda_n \setminus V}(d\bar\xi_{\Lambda_n \setminus V}) = \int \phi_W(\xi_W)\, \nu_n(d\bar\xi_{\Lambda_n}) \to \int \phi_W(\xi_W)\, \nu(d\bar\xi). \]

This use is possible since the relations (37)-(39) hold in fact for the integrals

\[ \int \sum_{\xi_V \in S^{V}} \phi_W(\xi_W)\, p_V^{U^n}(\xi_V \mid \bar\xi_{M \setminus V})\, (\nu_n)_{M \setminus V}(d\bar\xi_{M \setminus V}) \]

as well, uniformly in $n \to \infty$, due to our Definition 3 of QCD. □

Remark. The construction of the potential of the basic lemma above goes back to the paper of Kozlov, [Koz]. This telescoping potential was already used in [MvdV1, MRV2] and it has the nice property that, when the set $M$ allows it, $U$ is explicitly translation-invariant.

4. Percolation ⇒ Correlation Decay

We continue with the same setup of the previous section. We must estimate the covariances in the left hand side of (21). Using an idea of [B, BM, BS] as applied in [MvdV1, MRV2], we can reduce this to a percolation question. For $m > 4$, the two local observables $f = \phi_i$ and $g = \phi_{j_r}$ have disjoint dependence sets $D_f = F$, $D_g = G \subset \Lambda_n$. We will take $n$ large enough, so that $Q_{i,m,r} \subset \Lambda_n$. To save on notation we fix $\xi$ and $Q_{i,m,r}$ and we write $\mu_n^{\xi^{Q_{i,m,r}}} = \rho_n$. We must estimate $\rho_n(f; g)$. To proceed we consider the product space of configurations $S^{\Lambda_n} \times S^{\Lambda_n} = (S \times S)^{\Lambda_n}$ on which we put the product coupling $\rho_n \times \rho_n$. So, we just consider two independent copies of the original system. Now, for any two disjoint finite sets $B$ and $C$, we introduce the event $E(B, C)$ that there is a path from $B$ to $C$ in $\Lambda_n$, such that for every site $x$ of this path in $\Lambda_n \setminus Q_{i,m,r}$ we have $(X(x), X'(x)) \neq (+1, +1)$.

Lemma 5. For the Markov random field $\rho_n$ we have

\[ |\rho_n(f; g)| \le 2\, \|f\|\, \|g\|\, (\rho_n \times \rho_n)[E(F, G)]. \tag{42} \]

Proof. First of all, we can write

\[ \begin{aligned} |\rho_n(fg) - \rho_n(f)\rho_n(g)| &= \bigl| \rho_n\bigl( g(X) [\rho_n(f \mid X_G) - \rho_n(f(X))] \bigr) \bigr| \\ &\le \|g\|\, \rho_n\bigl( |\rho_n(f \mid X_G) - \rho_n(f)| \bigr) \\ &= \|g\| \int \rho_n(dX)\, \Bigl| \bigl( \rho_n(dX' \mid X_G) \times \rho_n(\cdot) \bigr) \bigl( f(X) \times 1 - 1 \times f(X') \bigr) \Bigr|. \end{aligned} \tag{43} \]

To save on notation, let $\Lambda \equiv \Lambda_n \setminus Q_{i,m,r} \setminus G$. Now imagine a fixed configuration $\sigma$ on $\partial\Lambda$ with $\sigma = +1$ on $\partial\Lambda_n$, $\sigma = \xi$ on $Q_{i,m,r}$ and $\sigma$ some fixed configuration on $G$; we must study $(\rho_n(\cdot \mid X = \sigma \text{ on } \partial\Lambda) \times \rho_n(\cdot))(f \times 1 - 1 \times f)$.


If for every path in $\Lambda_n$ from $F$ to $G$ there is a point $x \in \Lambda_n \setminus Q_{i,m,r}$ on it where $(X(x), X'(x)) = (+1, +1)$, then there exists a surface (or $*$-circuit) around $F$ separating it inside $\Lambda_n$ from $G$ and on which $X \equiv X'$. In general, this surface has a part in $\Lambda \setminus F \cup \partial\Lambda_n$ on which $(X, X') \equiv (+1, +1)$ and a part in $Q_{i,m,r}$ on which the configuration $\xi$ lives and thus there $(X, X') \equiv (\xi, \xi)$. Hence there exists a maximal surface among these, $\Delta$, inside $\Lambda$ (maximal in the inclusion sense). Now since $X = X'$ on $\Delta$ and by the Markov property,

\[ \bigl| (\rho_n(\cdot \mid X = \sigma \text{ on } \partial\Lambda) \times \rho_n(\cdot))(f \times 1 - 1 \times f) \bigr| \le 2\, \|f\|\, (\rho_n(\cdot \mid X = \sigma \text{ on } \partial\Lambda) \times \rho_n(\cdot))[E(F, G)]. \tag{44} \]

Continuing with (43), (44) now yields that

\[ |\rho_n(f; g)| \le 2\, \|f\|\, \|g\|\, (\rho_n \times \rho_n)[E(F, G)], \tag{45} \]

which we wanted to prove. □

Remark. Clearly, from the proof above, an analogous estimate to (42) holds in the case that $\mu_n$ is not (strictly) Markovian but becomes Markovian when to each site of the lattice a finite number of edges is attached linking that site with more than just its nearest neighbors.

5. PLD ⇒ Contour Estimates ⇒ Exponentially Weak Percolation

Let $F$ be a finite subset of $\mathbb{Z}^d$, as in the previous section. Denote by $C_F$ the random set containing all sites $x \in \mathbb{Z}^d \setminus M$ which are nearest-neighbor connected to $\partial F$ via sites $y \in \mathbb{Z}^d \setminus M$ for which $(X(y), X'(y)) \neq (+1, +1)$. This is the cluster of $F$. Our task is now to find a (large) set $K$ of configurations $\xi$ and corresponding lengths $\ell(i, \xi)$ (see (19)), for which

\[ \bigl( \mu_n^{\xi^{Q_{i,m,r}}} \times \mu_n^{\xi^{Q_{i,m,r}}} \bigr)[\mathrm{diam}(C_F) > m] \le e^{-\lambda m}, \tag{46} \]

whenever $m > \ell(i, \xi)$. Given the bound (42), this would take care of the assumption (21). This of course is reminiscent of the stochastic-geometric structure of the low-temperature phases in the realm of the Pirogov-Sinai theory.

Since the coupling we are considering is just a product coupling $\mu_n^{\xi^{Q_{i,m,r}}} \times \mu_n^{\xi^{Q_{i,m,r}}}$, it is clear that we have (46) once we know that a suitable analogue of the Peierls estimate holds for the state $\mu_n^{\xi^{Q_{i,m,r}}} \equiv \mu_n^{\xi^{Q}}$ itself. So we will formulate it next, leaving the derivation of (46) for the end of the present section.

Let $X \in \Omega_n^{+} = \Omega_n^{+}(\xi)$, where $\Omega_n^{+}(\xi)$ is the set of configurations that are $+1$ outside $\Lambda_n$, except at sites in $M \setminus \Lambda_n$, where they coincide with $\xi$. We introduce now the contours of the configuration $X$ in the usual manner. Namely, we call a face $F$ of the dual lattice a boundary face, iff the values $X_y$, $X_z$ at the sites $y, z$ closest to $F$ are different. The connected components of the boundary faces are called contours. We denote by $G(X)$ the set of all contours of $X$, while $G^{ex}(X)$ is the set of exterior contours of $X$; these are the contours which are not surrounded by other contours. Finally, we call a contour closed, if it does not contain faces outside $\Lambda_n$. Otherwise the contour will be called open.


If the configuration $X \in \Omega_n^{+}(\xi)$, and $\xi \not\equiv +1$, then $X$ can have both open and closed contours. For $x \in \Lambda_n$ we define the contour $\Theta_x(X)$ by

\[ \Theta_x(X) = \begin{cases} \Gamma, & \text{if } \Gamma \in G^{ex}(X),\ x \in \mathrm{Int}\,\Gamma, \\ \emptyset & \text{otherwise.} \end{cases} \]

We are interested in showing that the event that the contour $\Theta_0(\cdot)$ is of the size $L$ happens with $\mu_n^{\xi^{Q}}$-probability $\le \exp\{-c\beta L\}$. However, that cannot possibly be true without putting conditions on the configuration $\xi$. The condition should be of the type that the set $D(\xi) \subset M$ of sites where $\xi \neq 1$ is quite sparse. The following generalization of the Path Large Deviation property is a condition of this type. We will still call it PLD property.

Definition 6. Let $\xi \in \Omega_M$, $\lambda > 0$ and $x \in \mathbb{Z}^d$ be given. Put

\[ \ell_\lambda(x, \xi) \equiv \ell_\lambda(x, D(\xi)) = \begin{cases} \min\{ l : \text{for all finite } k\text{-connected } T \subset \mathbb{Z}^d \text{ with } x \in T,\ \mathrm{diam}(T) > l \text{ we have } |T \cap D(\xi)| \le \lambda |T| \} & \text{if such } T\text{-s exist,} \\ \infty & \text{otherwise.} \end{cases} \]

A set $D$ will be called $\lambda$-sparse, iff $\ell_\lambda(x, D)$ is finite for all $x$. We define the set

\[ \mathrm{Bad}(y, \xi; \lambda) \equiv \{ x \in \mathbb{Z}^d : \ell_\lambda(x, \xi) \ge \tfrac{1}{2} |x - y| \}, \tag{47} \]

and we put $K(\lambda) \equiv \bigcap_{y \in \mathbb{Z}^d} K_y(\lambda)$ with $K_y(\lambda) \equiv \{ \xi \in \Omega_M : |\mathrm{Bad}(y, \xi; \lambda)| < \infty \}$. We say that PLD holds iff $\mu(K(\lambda)) = 1$.

Theorem 7. Let $\xi \in K(\lambda/2)$ with $\lambda \le \lambda(J)$, where $\lambda(J)$ is small enough. Then for all $Q \subset \mathbb{Z}^d$ finite and uniformly in large $n$

\[ \mu_n^{\xi^{Q}}\{ X : \mathrm{diam}(\Theta_0(X)) > L \} \le \exp\{-c\beta L\}, \tag{48} \]

provided $L > \ell_\lambda(0, \xi)$. Here $c$ is some constant.

Note. In case the set $D(\xi)$ itself would contain a long contour $\Gamma$ surrounding the origin, the event $\mathrm{diam}(\Theta_0(X)) \ge \mathrm{diam}(\Gamma)$ happens with $\mu_n^{\xi^{Q}}$-probability one, if $\Gamma \subset Q$, so Theorem 7 has no chance to hold in such a case. Happily, under the condition that $\xi \in K_0(\lambda)$ we have immediately that $D(\xi)$ cannot contain a contour with $\mathrm{diam}(\Gamma) > \ell_\lambda(0, \xi)$ once $\lambda < 1/2$.

Proof. We begin by showing that under our hypothesis the value $\ell_\lambda(0, \xi)$ is finite. (This is the only property of the configuration $\xi$ needed to prove (48).) Since $\xi \in K(\lambda/2)$, we have $\ell_{\lambda/2}(x, \xi) < \infty$ for some $x$. Let us check now that if for some $x$ the value $\ell_\kappa(x, \xi)$ is finite, then for all $y$ the values $\ell_{2\kappa}(y, \xi)$ are also finite. Indeed, let $y \in T$, $\mathrm{diam}(T) = l$ and $|T \cap D(\xi)| \ge 2\kappa |T|$. Below, we denote by $[x, y] \subset \mathbb{Z}^d$ any n.n. connection between $x$ and $y$, having $|x - y|$ sites. If $\max_{z \in T} |x - z| > \ell_\kappa(x, \xi)$, then

\[ |(T \cup [x, y]) \cap D(\xi)| < \kappa\, |T \cup [x, y]| \le \kappa |T| + \kappa |x - y|. \]


On the other hand,

\[ |(T \cup [x, y]) \cap D(\xi)| \ge |T \cap D(\xi)| \ge 2\kappa |T| \ge 2\kappa \frac{l}{k}. \]

Therefore $l < k |x - y|$, and $\ell_{2\kappa}(y, \xi) < \max\{ k |x - y|,\ 2\ell_\kappa(x, \xi) \}$.

For convenience, we suppress the index $Q$ throughout the proof. First, let $\Gamma$ be a contour, surrounding the origin and not attached to the boundary of the box $\Lambda_n$, i.e. a closed contour. Then we have to estimate the probability $\mu_n^{\xi}\{ X : \Gamma \in G^{ex}(X) \}$. The first remark is that it does not exceed the ratio

\[ \frac{Z'(\mathrm{Int}\,\Gamma, +1, \beta)}{Z''(\mathrm{Int}\,\Gamma, +1, \beta)}, \tag{49} \]

where the partition function $Z'$ is calculated over the subset $\Omega'(\mathrm{Int}\,\Gamma, M, \xi)$ of configurations in $\mathrm{Int}\,\Gamma$, where

\[ \Omega'(\Lambda, M, \xi) = \{ \sigma \in \Omega_\Lambda : \sigma(x) \neq +1 \text{ for every } x \in \partial_i(\Lambda),\ \sigma(x) = \xi(x) \text{ for all } x \in M \cap \Lambda \}, \tag{50} \]

while $Z''$ is calculated over $\Omega''(\mathrm{Int}\,\Gamma, M, \xi)$ with

\[ \Omega''(\Lambda, M, \xi) = \{ \sigma \in \Omega_\Lambda : \sigma(x) = \xi(x) \text{ for all } x \in M \cap \Lambda \}. \tag{51} \]

To estimate the ratio (49), we need the following two cluster expansion results for the partition functions in the framework of the PS theory (see [Sin] and [KP] or [D1]). The first one deals with the case when we calculate the partition function $Z(\Lambda, a, \beta)$ in the box $\Lambda$ with the constant boundary condition $a$, corresponding to the stable phase. Then

\[ \ln Z(\Lambda, a, \beta) = |\Lambda| f(\beta) + c(\Lambda, a, \beta), \tag{52} \]

where $f(\beta)$ is the free energy of the model, and the boundary term $c(\Lambda, a, \beta)$ has the property that the ratio $c(\Lambda, a, \beta) / |\partial\Lambda|$ is exponentially small in $\beta$. The second case is obtained when we restrict the summation in the partition function (52) to the configurations $\sigma$ which possess a contour $\Gamma$ right at the boundary of $\Lambda$. In other words, for all points $x \in \Lambda$ adjacent to $\partial_i \Lambda$, $\sigma(x) \neq a$. This partition function will be denoted by $Z(\Lambda, \Gamma, a, \beta)$. Then

\[ \ln Z(\Lambda, \Gamma, a, \beta) = |\mathrm{Int}\,\Gamma| f(\beta) + c'(\Lambda, a, \beta) - \beta E(\Gamma). \tag{53} \]

Here $c'(\Lambda, a, \beta)$ is again a boundary term, while $E(\Gamma)$ is the (temperature independent) energy of the contour $\Gamma$. It satisfies the Peierls condition: $E(\Gamma) \ge c_1 |\Gamma|$ with $c_1 > 0$, which bound is the precondition of the PS theory.


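The estimate of (49) proceeds as follows. Note that every configuration $\sigma$ from the set (50) possesses in addition to the contour $\Gamma$ a collection of contours $\tilde\kappa(\sigma) = \{\gamma_i\}$, separating the set

\[ M_1(\xi, \Gamma) = \{ x \in M \cap \mathrm{Int}\,\Gamma,\ \xi(x) = 1 \} \tag{54} \]

from the contour $\Gamma$. Their presence is due to the fact that the 1-spins sitting in the set $M_1(\xi, \Gamma)$ force the 1-phase inside the box $\mathrm{Int}\,\Gamma$. Our notation $\tilde\kappa(\sigma)$ refers to the collection of all such contours, and we denote by $\kappa(\sigma) \subset \tilde\kappa(\sigma)$ a subset of external contours of $\tilde\kappa(\sigma)$. The contours from $\kappa(\sigma)$ have the 1-phase inside them and are not separated from $\Gamma$ by other contours. Another way of defining them is to say that the contours from $\kappa(\sigma)$ are boundaries of maximal 1-surfaces, surrounding points of $M_1(\xi, \Gamma)$. We now split the partition function $Z'(\mathrm{Int}\,\Gamma, +1, \beta)$ according to what the family $\kappa = \kappa(\sigma)$ is:

\[ Z'(\mathrm{Int}\,\Gamma, +1, \beta) = \sum_{\kappa} Z'(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)\, Z''(\mathrm{Int}\,\kappa \setminus \partial_i(\mathrm{Int}\,\kappa), +1, \beta); \]

here the partition function $Z'(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)$ is calculated over the set $\Omega'(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, M, \xi)$. Hence,

\[ \frac{Z'(\mathrm{Int}\,\Gamma, +1, \beta)}{Z''(\mathrm{Int}\,\Gamma, +1, \beta)} \le \sum_{\kappa} \frac{Z'(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)\, Z''(\mathrm{Int}\,\kappa \setminus \partial_i(\mathrm{Int}\,\kappa), +1, \beta)}{Z''(\mathrm{Int}\,\Gamma, +1, \beta)} \le \sum_{\kappa} \frac{Z'(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)}{Z''(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)}. \tag{55} \]

We note for clarity that for every $\kappa$ the set $\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa$ contains no sites of the set $M_1(\xi, \Gamma)$. To obtain the upper estimate for $Z'$, we use (53). Our upper bound on $Z'$ is

\[ \ln Z'(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta) \le |(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa) \setminus M|\, f(\beta) - \tau (|\Gamma| + |\kappa|), \tag{56} \]

with $\tau$ diverging as $\beta \to \infty$. Moreover, $\tau = \tau(\beta)$ satisfies the estimate

\[ \tau(\beta) \ge c_2 \beta, \tag{57} \]

with $c_2 = c_2(c_1) > 0$. To estimate the denominator, we diminish the partition function $Z''(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)$, replacing it by the partition function $Z'''(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)$ calculated over the subset $\Omega'''(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, M, \xi)$ of $\Omega''(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, M, \xi)$, where

\[ \Omega'''(\Lambda, M, \xi) = \{ \sigma \in \Omega_\Lambda : \sigma(x) = \xi(x) \text{ for all } x \in M \cap \Lambda,\ \sigma(s) = 1 \text{ for all } s \in \partial D(\xi) \cap \Lambda \}. \]

Here by $\partial D(\xi)$ we denote the set

\[ \partial D(\xi) = \{ x \in \mathbb{Z}^d \setminus M : \text{for some } i \in M \text{ with } \xi(i) \neq 1 \text{ we have } |x - i| = 1 \}. \]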


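To estimate the partition function $Z'''$ from below, we use (52):

\[ \ln Z'''(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta) \ge |(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa) \setminus M|\, f(\beta) - c_3(\beta) \bigl( |\Gamma| + |\kappa| + |\partial D(\xi) \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)| \bigr) - R\, |\partial D(\xi) \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)|, \tag{58} \]

with $c_3(\beta) \to 0$ as $\beta \to \infty$; the last term corresponds to the interaction of the 1-spins $\sigma$ sitting at points $x \in \partial D(\xi)$ with spins $\xi(i) \neq 1$, $i \in M \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)$. Clearly,

\[ R \le \beta\, C(J(\cdot, \cdot)). \tag{59} \]

The crucial observation now is that the set $\partial(\mathrm{Int}\,\Gamma) \cup [M \cap \mathrm{Int}\,\Gamma]$ is $k$-connected, and so the set $M(\Gamma, \kappa) = \partial(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa) \cup [M \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)]$ is $k$-connected as well. We are going to argue now that it implies that the term $(|\Gamma| + |\kappa|)$ from (56) suppresses the term $|\partial D(\xi) \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)|$ in (58) provided the constant $\lambda$ is small. To see it we first need the following lemma. It claims, roughly, that if an Ising contour $\Gamma$ surrounds a $\lambda$-sparse set $D$, which in turn is a subset of a $k$-connected set $T$, then either the length (surface area) $|\Gamma|$ of this contour is of the order of $\lambda^{-1}|D|$, or else the contour surrounds a lot of points belonging to $T$.

Lemma 8. Let $D \subset \mathbb{Z}^d$ be a finite $\lambda$-sparse set, see Def. 6, and let $T \subset \mathbb{Z}^d$ be a $k$-connected set, with $D \subset T$. Let $\kappa$ be a finite collection of mutually external contours, while $\Gamma$ is a contour surrounding the family $\kappa$. (The family $\kappa$ can be empty.) Suppose that $D \subset (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)$. Then

\[ |\Gamma| + |\kappa| + |T \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)| \ge \lambda^{-1} |D|. \]

Proof. Consider the set $\partial(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa) \cup [T \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)]$. This set is $k$-connected and it contains $D$. Therefore

\[ |D| \le \lambda\, \bigl| \partial(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa) \cup [T \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)] \bigr|, \]

so the claim follows. □

We would like now to use for the set $M(\Gamma, \kappa)$ the information provided by the assumptions of the theorem we are proving: namely, that $\mathrm{diam}(M(\Gamma, \kappa)) > \ell_\lambda(0, \xi)$. However, that would be of use only if $0 \in M(\Gamma, \kappa)$. If that is not the case, we can modify the set $M(\Gamma, \kappa)$ by attaching to it a connection to the origin of length $\le |\Gamma|/2$. The resulting set will still be denoted by $M(\Gamma, \kappa)$. Now Lemma 8, under the conditions of the theorem we are proving, provides us with the estimate:

\[ |\partial D(\xi) \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)| \le \frac{3d\lambda}{1 - \lambda} (|\Gamma| + |\kappa|). \tag{60} \]

(To get it we apply Lemma 8 with $D = D(\xi) \cap (\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa)$ and $T = M(\Gamma, \kappa)$.) Using (60) together with the bounds (57), (59) and the estimates (56) and (58), we have for (55),

\[ \sum_{\kappa} \frac{Z'(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)}{Z''(\mathrm{Int}\,\Gamma \cap \mathrm{Ext}\,\kappa, +1, \beta)} \le \sum_{\kappa} \exp\{ -\tau' (|\Gamma| + |\kappa|) \} \tag{61} \]

with $\tau' = \tau'(\tau, \lambda)$ diverging with $\tau$. We remind the reader that the last summation goes over families $\kappa$ surrounding the set $M_1(\xi, \Gamma) \cap \mathrm{Int}\,\Gamma$. Hence (48) follows from (61) by standard combinatorics.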


As the last step in the proof of Theorem 7 we have to consider the case when a contour $\Gamma$ is not a closed contour, but is attached to the boundary of the box $\Lambda_n$, due to the impurity of the boundary conditions around the box $\Lambda_n$, caused by the presence of the conditioning by $\xi$. Note however that we can well assume that $n > \ell_\lambda(0, \xi)$, since otherwise the theorem holds trivially, and also that $L > n$. That implies that the set $D(\xi) \cap \partial_i(\mathrm{Int}\,\Gamma)$ is $\lambda$-sparse. (Again, we add to $\Gamma$ the connector to $0$, if needed.) So the number of these impurities to which the contour $\Gamma$ is attached does not exceed $\lambda |\Gamma|$. If now $\sigma$ is any configuration in $\Lambda_n$ with the contour $\Gamma$, then it can be modified to the value $+1$ in the vicinities of these impurities, so that the resulting configuration $\sigma'$ has a (closed) contour $\Gamma'$, which is not attached to the boundary of $\Lambda_n$ any more. The energy cost of such a modification is no more than $\lambda C(J) |\Gamma|$, while the contour $\Gamma'$ can be treated by the already proven part of our theorem. So if $\lambda$ is small, the factor $\exp\{-c\beta |\Gamma'|\}$ beats the energy weight $\exp\{\beta\lambda C(J) |\Gamma|\}$, and that completes our argument. □

Proof of Estimate (46). The proof goes essentially along the same lines. Without loss of generality we can suppose that $0 \in C_F$. Let $Y = \{ y \in C_F : X(y) \neq +1 \}$, $Y' = \{ y \in C_F : X'(y) \neq +1 \}$. Then $Y \cup Y' = C_F$. Consider the following collections of contours: $\Theta_Y = \{ \Gamma \in G^{ex}(X) : \mathrm{Int}\,\Gamma \cap Y \neq \emptyset \}$, $\Theta_{Y'} = \{ \Gamma \in G^{ex}(X') : \mathrm{Int}\,\Gamma \cap Y' \neq \emptyset \}$. Note that the union $\Upsilon$ of all contours from $\Theta_Y \cup \Theta_{Y'}$ is a connected set. Clearly,

\[ \bigl( \mu_n^{\xi^{Q_{i,m,r}}} \times \mu_n^{\xi^{Q_{i,m,r}}} \bigr)[\mathrm{diam}(C_F) > m] \le \sum_{\Theta_Y, \Theta_{Y'} :\ \mathrm{diam}(\Theta_Y \cup \Theta_{Y'}) > m} \mu_n^{\xi^{Q_{i,m,r}}}(\Theta_Y)\, \mu_n^{\xi^{Q_{i,m,r}}}(\Theta_{Y'}). \]

Using the analogue of the estimates (56), (58) and the analogue of Lemma 8 with $\gamma$ replaced by the (connected) union $\Upsilon$, we arrive at the following analogue of (61):

\[ \mu_n^{\xi^{Q_{i,m,r}}}(\Theta_Y)\, \mu_n^{\xi^{Q_{i,m,r}}}(\Theta_{Y'}) \le \exp\{ -\tau'' |\Upsilon| \}. \]

The rest of the proof is standard combinatorics. □

6. Proof of PLD

In this last section we finally prove an unconditional statement: the PLD property, as defined in the previous section, holds with probability one. We will start with the low-temperature (+)-phase $P^{\beta,+}$ of the Ising model. We consider the general case in the last subsection. The events we are interested in here are

\[ B(\varepsilon, W) = \Bigl\{ \sigma \in \Omega : m^{*}(\beta) - \frac{1}{|W|} \sum_{x \in W} \sigma_x > \varepsilon \Bigr\}. \tag{62} \]

We now introduce the notation $S_V(x; k, N)$ for the family of all $k$-connected subsets of the box $V$ of diameter between $N$ and $2N$, containing a site $x$, with $S_V(k, N) \equiv S_V(0; k, N)$. Then we have the following theorem:

Theorem 9. For every $k > 0$, $\varepsilon_1 > 0$ there exists a value $\beta_k$, such that for all $\beta > \beta_k$ and all $V$ containing $x$

\[ P_V^{\beta,+}\Bigl( \bigcup_{W \in S_V(x; k, N)} B(\varepsilon, W) \Bigr) \le \exp\Bigl\{ -\beta N^{\frac{d-1}{(2 + \varepsilon_1) d}} \Bigr\}. \tag{63} \]

Let us now fix $N$, and consider the union

\[ S^{\mathbb{Z}^d}(y; k, N) = \bigcup_{x \in \mathbb{Z}^d} S_{\mathbb{Z}^d}\bigl( x; k, N + \tfrac{1}{3} |x - y| \bigr). \]

Note that because of (63)

\[ P^{\beta,+}\Bigl( \bigcup_{W \in S^{\mathbb{Z}^d}(y; k, N)} B(\varepsilon, W) \Bigr) \to 0 \]

as $N \to \infty$. On the other hand, if $\sigma \in \Omega \setminus \bigcup_{W \in S^{\mathbb{Z}^d}(0; k, N)} B(\varepsilon, W)$, then the set $\mathrm{Bad}(0, \sigma; \bar\lambda)$ (see (20), (47)) with

\[ \bar\lambda = \bar\lambda(\varepsilon, \beta) = \frac{1 + \varepsilon - m^{*}(\beta)}{2} \]

is contained within the ball of radius $6N$ around the origin and is therefore finite. Hence, the set of configurations

\[ \tilde K(\varepsilon) = \bigcap_{y \in \mathbb{Z}^d} \bigcup_{N} \bigcap_{W \in S^{\mathbb{Z}^d}(y; k, N)} \bigl( \Omega \setminus B(\varepsilon, W) \bigr) \]

is contained in the set $K(\bar\lambda)$. Since $\tilde K(\varepsilon)$ has full measure, PLD is satisfied. Note finally that $\bar\lambda(\varepsilon, \beta) \to 0$ as $\varepsilon \to 0$, $\beta \to \infty$, so for $\varepsilon$ small and $\beta$ large $\bar\lambda(\varepsilon, \beta) \le \lambda(J)$ (see Theorem 7).

So what is left is the proof of Theorem 9. We will consider the case $x = 0$, therefore $x$ will disappear from the notation.

6.1. The strategy.

The proofs of the above results turn out to be quite non-trivial. To explain the nature of the difficulties, we present below a short account of the straightforward idea of the proof, together with the explanation why it does not work. Let us try to prove Theorem 9 for the case of $k = 1$; moreover, let us take $V = V_n$ to be a cube, and restrict the set $S_{V_n}(k)$ to consist of self-avoiding paths only, connecting the origin to the boundary $\partial V_n$. So, we are talking about the events which may be called "large deviations along paths". If we were able to prove the estimate

\[ P_{V_n}^{\beta,+}\bigl( B(\varepsilon, \gamma) \bigr) \le \exp\{ -c(\beta) |\gamma| \}, \tag{64} \]

valid for every path $\gamma$, with the exponent $c(\beta)$ diverging as $\beta \to \infty$, then we would be done. Indeed, the number of paths $\gamma$ containing the origin and having length $l$ is bounded


by $3^{l}$, so we can finish the proof by summation over all paths. The point however is that the estimate (64) does not hold in general. To see that it can be violated, let us introduce the set $C(\Gamma)$ of configurations having the contour $\Gamma$ among their exterior contours, and take $\Gamma$ to be a boundary of a cubic box centered at the origin and having the volume $n$. Then

\[ P_{V_n}^{\beta,+}[C(\Gamma)] \ge \exp\bigl\{ -c(\beta)\, n^{\frac{d-1}{d}} \bigr\}. \]

Now, consider a shortest possible path $\gamma_0$, which visits all points inside $\Gamma$ prior to leaving for $\partial V$. Its length is proportional to $n$. On the other hand, evidently

\[ P_{V_n}^{\beta,+}\bigl( B(\varepsilon, \gamma) \bigr) \ge P_{V_n}^{\beta,+}\bigl( B(\varepsilon, \gamma) \mid C(\Gamma) \bigr)\, P_{V_n}^{\beta,+}[C(\Gamma)], \]

and the first factor is larger than $\tfrac{1}{2}$ for $\beta$ and $n$ large. So the probability $P_{V_n}^{\beta,+}(B(\varepsilon, \gamma))$ decays in $n$ subexponentially only.

The reason why the above argument fails is that the summation over all paths includes an overcounting; the same set of configurations makes many different paths have the wrong magnetization along them. On the other hand, to control the contribution of the set $C(\Gamma)$ to the probability $P_V^{\beta,+}\bigl[ \bigcup_{\gamma \in S_V(k)} B(\varepsilon, \gamma) \bigr]$, one does not need any extra argument involving path counting, simply because $P_V^{\beta,+}[C(\Gamma)] \le \exp\bigl\{ -c'(\beta)\, n^{\frac{d-1}{d}} \bigr\}$ (Peierls estimate).

To proceed with the proof of Theorem 9, we introduce some more definitions. Let a configuration $\sigma \in \Omega^{+}$ be given, where $\Omega^{+} = \bigcup_n \Omega_n^{+}$. We denote by $G(\sigma)$ the set of its exterior contours. We call a set $W \in S_V(k, N)$ a bad set for $\sigma$, iff

\[ \frac{\sum_{\Gamma \in G(\sigma) :\ \mathrm{Int}\,\Gamma \cap W \neq \emptyset} |\mathrm{Int}\,\Gamma|}{|W|} \ge \delta. \tag{65} \]

Here $\delta > 0$ is some fixed number. Clearly, if $\sigma$ belongs to the event $B(\varepsilon, W)$, then $W$ is a bad set for $\sigma$, with $\delta = \delta(\varepsilon, \beta)$. We denote the event (65) by $C(\delta, W)$.

As one will see later, the proofs of the above theorems require the introduction of different scales, and these scales are needed to treat contours of different sizes. Anticipating that, we introduce now the events $C_r(\cdot, W)$, $r = 1, 2, \ldots$, as follows:

\[ \sigma \in C_r\Bigl( \frac{\delta_r}{r}, W \Bigr) \iff \frac{\sum_{\Gamma \in G(\sigma) :\ |\mathrm{Int}\,\Gamma| = r,\ \mathrm{Int}\,\Gamma \cap W \neq \emptyset} 1}{|W|} \ge \frac{\delta_r}{r}. \tag{66} \]

If $\sigma$ belongs to the event $C_r\bigl( \frac{\delta_r}{r}, W \bigr)$, then we say that the set $W$ is an $r$-bad set for $\sigma$. Our choice of the parameters $\delta_r$ will be the following:

\[ \delta_r = \frac{\delta}{K(\varepsilon_1)\, r^{1 + \varepsilon_1}}, \tag{67} \]

with any positive $\varepsilon_1 < \frac{1}{2d}$. In this way the inclusion

\[ C(\delta, W) \subset \bigcup_r C_r\Bigl( \frac{\delta_r}{r}, W \Bigr) \tag{68} \]


holds, provided $\sum_{r=1}^{\infty} \frac{1}{r^{1 + \varepsilon_1}} < K(\varepsilon_1)$. That means, in words, that if $W$ is a bad set for $\sigma$, then $W$ is also $r$-bad for $\sigma$ for at least one value of $r$.

Let now the sequence of integers $x_1 \le x_2 \le \ldots$ be given, $x_1 \ge k$ (where $k$ is the value of the connection parameter entering the formulation of our theorems). The values $x_r$ will be used as scales for studying the configurations with $r$-bad sets. The choice of these scales, depending on $\delta_r$, will also be made later. The sequence $x_r$ does not depend on any other parameter, but for a fixed value of $N$ we will use only the first $r(N)$ terms of it, where $r(N)$ is defined in (80) below. The reason for that is that for a given $N$ the contribution of contours of size $r > r(N)$ to the event (65) can be neglected.

Now we will start to estimate the probability of the event $\bigcup_{W \in S_V(k, N)} C_r\bigl( \frac{\delta_r}{r}, W \bigr)$. In order to avoid the overcounting, to which we were alluding above, we will make a coarse graining of our system. So we introduce the natural partition $L_r$ of $\mathbb{Z}^d$ into cubes $A_{x_r}$ of size $x_r$, with faces parallel to the coordinate planes:

\[ L_r = \bigl\{ A_{x_r}(y) \equiv A_{x_r} + y,\ y \in x_r \mathbb{Z}^d \bigr\}, \]

and we denote by $W^{r}$ the $x_r$-fattening of $W$, i.e. the union of all those cubes $A_{x_r}(y)$ of $L_r$ which contain at least one point of the initial set $W$. Note that the set $W^{r}$ is always connected, since $W$ is supposed to be $k$-connected, while $x_r \ge k$. If $\sigma \in C_r\bigl( \frac{\delta_r}{r}, W \bigr)$, then evidently

\[ \frac{\sum_{\Gamma \in G(\sigma) :\ |\mathrm{Int}\,\Gamma| = r,\ \mathrm{Int}\,\Gamma \cap W^{r} \neq \emptyset} 1}{|W|} \ge \frac{\sum_{\Gamma \in G(\sigma) :\ |\mathrm{Int}\,\Gamma| = r,\ \mathrm{Int}\,\Gamma \cap W \neq \emptyset} 1}{|W|} \ge \frac{\delta_r}{r}, \tag{69} \]

so

\[ \frac{\sum_{\Gamma \in G(\sigma) :\ |\mathrm{Int}\,\Gamma| = r,\ \mathrm{Int}\,\Gamma \cap W^{r} \neq \emptyset} 1}{|W^{r}|} \ge \frac{\delta_r}{\kappa\, r\, (x_r)^{d-1}}, \tag{70} \]

provided the estimate

\[ |W^{r}| \le \kappa\, (x_r)^{d-1} |W| \tag{71} \]

holds for some $\kappa = \kappa(d, k)$, all $W \in S_V(k, N)$ and all $r = 1, 2, \ldots, r(N)$. The estimate (71) is indeed valid under the condition that the size $x_{r(N)}$ of the cubes of the partition $L_{r(N)}$ is much smaller than $N$. The reason is that if $\gamma \subset \mathbb{Z}^d$ is a path of length $|\gamma| \ge N$, then the number of cubes of the partition $L_r$ the path $\gamma$ can hit is bounded from above by $C(d) |\gamma| / x_r$, provided $x_r \ll N$. This last condition is ensured for all $N$ large enough and all $r = 1, 2, \ldots, r(N)$ by the choices (80) and (86) made below.

Let us now introduce the family $S_V^{r}(N)$ as the collection of all connected subsets of $\mathbb{Z}^d$ of diameter between $N$ and $3N$, containing the origin, which are made from the cubes of the partition $L_r$, intersecting $V$. For future use we introduce for the sets $W \in S_V^{r}(N)$ the notation $\|W\|$ for the number of $x_r$-cubes they are composed of; of

538

C. Maes, F. Redig, S. Shlosman, A. Van Moffaert

course, ||W || = (xr )−d |W |. It follows from (69), (70) and the definition (66) that β,+

PV

 

[

Cr



 

 

δr β,+ ,W ≤ PV   r

[

Cr

δr

κ r (xr )d−1

,W

 

 X δr β,+ PV , W . (72) ≤ Cr κr (xr )d−1 W ∈S r (N) W ∈SVr (N)

W ∈SV (k,N)

V

A heuristic comment. At first glance the maneuver we performed in (72) could well have been done with the sets $W$ themselves, without passing to the fattenings $W^r$. The key difference, however, lies in the number of events of the form (66) or (70). For example, there are about $C(d,k)^N$ different sets $W$ with $|W| = N$ which may appear in (66), so in order for the summation over them to converge, we need the estimate of the probability of the event (66) to be exponentially small in $N$. But such an estimate simply does not hold! On the other hand, the corresponding number of different (connected) sets which can be produced from these $W$-s via fattening by $x_r$-cubes is bounded by $3^{d \cdot 2C(d)N/x_r}$, and that makes the summation possible, as we will see soon.

Note that the Peierls estimate implies immediately that for every set $\tilde{W}$ and $\tilde{\delta} > 0$ we have

$$P_V^{\beta,+}\left\{ C_r\left(\tilde{\delta}, \tilde{W}\right) \right\} \le \sum_{k = \tilde{\delta}|\tilde{W}|}^{|\tilde{W}|} \binom{|\tilde{W}|}{k} \left[ \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^k, \tag{73}$$

with $\beta'$ diverging together with $\beta$. Indeed, we have to have $\tilde{\delta}|\tilde{W}|$ exterior contours, surrounding points of $\tilde{W}$, which explains the first factor. The surface of every contour contributing to the lhs of (73) is at least $2d\, r^{\frac{d-1}{d}}$, the number of different $\Gamma$-s, containing a given face and having the surface $L$, is bounded by $3^L$, and

$$\sum_{L \ge 2d\, r^{\frac{d-1}{d}}} 3^L \exp\{-2\beta L\} \le \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\}.$$

We want to interpret the rhs of (73) as a probability. So we introduce the Bernoulli random field, made by i.i.d. Bernoulli random variables $\xi_i^r$, $i \in \mathbb{Z}^d$:

$$\xi_i^r = \begin{cases} 1 & \text{with probability } p_r = \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1}, \\ 0 & \text{with probability } q_r = \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1}. \end{cases} \tag{74}$$

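Then we can rewrite (73) as

$$P_V^{\beta,+}\left\{ C_r\left(\tilde{\delta}, \tilde{W}\right) \right\} \le \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{|\tilde{W}|} \sum_{k = \tilde{\delta}|\tilde{W}|}^{|\tilde{W}|} \binom{|\tilde{W}|}{k} p_r^k q_r^{|\tilde{W}| - k} \le \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{|\tilde{W}|} P\left\{ \sum_{i=1}^{|\tilde{W}|} \xi_i^r \ge \tilde{\delta} |\tilde{W}| \right\}. \tag{75}$$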
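The passage from the Peierls-type sum to the Bernoulli form rests on the identity $p_r^k q_r^{n-k} (1+e)^n = e^k$, where $e$ stands for the exponential weight and $p_r = e/(1+e)$, $q_r = 1/(1+e)$. The following minimal Python sketch checks this identity numerically; the function names and the sample values of `n`, `m` and `e` are illustrative assumptions, not taken from the paper.

```python
from math import comb, exp, isclose

def peierls_sum(n, m, e):
    """Sum over k = m..n of C(n, k) * e**k (the form of the Peierls bound)."""
    return sum(comb(n, k) * e**k for k in range(m, n + 1))

def bernoulli_form(n, m, e):
    """(1 + e)**n times the Binomial(n, p) tail from m, with p = e / (1 + e)."""
    p = e / (1.0 + e)
    q = 1.0 / (1.0 + e)
    tail = sum(comb(n, k) * p**k * q**(n - k) for k in range(m, n + 1))
    return (1.0 + e)**n * tail

if __name__ == "__main__":
    e = exp(-2.0)  # hypothetical value of the exponential weight
    print(peierls_sum(30, 6, e), bernoulli_form(30, 6, e))
```

The two printed numbers agree up to rounding, which is all the rewriting uses: the prefactor $(1+e)^n$ is the price of normalising the weights into probabilities.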

That is why we will study now the large deviation properties of the random field $\xi_i^r$.

6.2. Large deviations for Bernoulli variables.

The results in this section are fairly standard. We present them just for completeness.

Lemma 10. Let $\xi_i$ be a sequence of Bernoulli random variables,

$$\xi_i = \begin{cases} 1 & \text{with probability } p, \\ 0 & \text{with probability } 1 - p. \end{cases}$$

Let $S_K = \sum_{i=1}^{K} \xi_i$. Then for every $k$ and every $z > 2$ we have the estimate

$$P(S_K > k) \le z^{-k} (1 - p + pz)^K. \tag{76}$$

Proof. We use Cramér tilting. We have

$$P(S_K = k) = \binom{K}{k} p^k (1-p)^{K-k} \equiv \binom{K}{k} (pz)^k (1-p)^{K-k} z^{-k} \equiv \left[ \frac{\binom{K}{k} (pz)^k (1-p)^{K-k}}{(1 - p + pz)^K} \right] z^{-k} (1 - p + pz)^K.$$

But the expression in the square brackets is again the probability of the same event, now according to the Bernoulli sequence with $p' = \frac{pz}{1-p+pz}$, $1 - p' = \frac{1-p}{1-p+pz}$. Summation over $k$ yields the result. □

Corollary 11. Let $A$ be a real number, such that $Ap < 1$. Then

$$P(S_K > ApK) \le \varepsilon^K, \tag{77}$$

with

$$\varepsilon = \varepsilon(p, A) = \left( \frac{1-p}{1-Ap} \right)^{(1-Ap)} \left( \frac{1}{A} \right)^{Ap}. \tag{78}$$

Note. Of course, the notation $\varepsilon(p, A)$ does not necessarily make the quantity (78) small. This notation just expresses our hope.

Proof. Let us take $z$ to be the solution $z_0$ of the equation

$$\frac{pz}{1 - p + pz} = Ap.$$

(Note that such a choice of $z$ makes the $p'$-Bernoulli process of the previous proof have the mean value of the sum of its $K$ elements equal to $ApK$.) The solution is given by

$$z_0 = A\, \frac{1-p}{1-Ap}.$$

Note also the relation

$$1 - p + pz_0 = \frac{z_0}{A}.$$

Putting this into (76), we have

$$P(S_K > ApK) \le z_0^{-ApK} \left( \frac{z_0}{A} \right)^K = \frac{(z_0)^{K(1-Ap)}}{A^K} = \left( \frac{1-p}{1-Ap} \right)^{K(1-Ap)} \frac{1}{A^{ApK}} \equiv \left[ \left( \frac{1-p}{1-Ap} \right)^{(1-Ap)} \left( \frac{1}{A} \right)^{Ap} \right]^K.$$

□
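As a sanity check on Corollary 11, the sketch below compares the exact binomial tail $P(S_K > ApK)$ with the bound $\varepsilon(p,A)^K$, and verifies the relation $1 - p + pz_0 = z_0/A$ used in the proof. It is only an illustration under stated assumptions: the function names and the sample parameters are hypothetical, and the check is restricted to $A > 1$, where the tilting parameter $z_0$ is at least 1 and the exponential Markov inequality applies to the upper tail.

```python
from math import comb

def binomial_tail(K, p, threshold):
    """Exact P(S_K > threshold) for S_K ~ Binomial(K, p)."""
    return sum(comb(K, k) * p**k * (1 - p)**(K - k)
               for k in range(K + 1) if k > threshold)

def epsilon(p, A):
    """The quantity eps(p, A) of the corollary; assumes A > 1 and A * p < 1."""
    a = A * p
    return ((1 - p) / (1 - a))**(1 - a) * (1 / A)**a

def chernoff_bound(K, p, A):
    """Upper bound eps(p, A)**K on P(S_K > A * p * K)."""
    return epsilon(p, A)**K

if __name__ == "__main__":
    # Hypothetical sample parameters, chosen only for illustration.
    for K, p, A in [(50, 0.1, 3.0), (200, 0.05, 2.0)]:
        exact = binomial_tail(K, p, A * p * K)
        bound = chernoff_bound(K, p, A)
        print(f"K={K} p={p} A={A}: exact tail={exact:.3e}, bound={bound:.3e}")
```

In both sample cases the exact tail lies below the bound, as it must; how much slack there is depends on the parameters.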

6.3. Proof of Theorem 9.

i) Let the number $N$ be fixed. Let us denote by $R(>k, N)$ the set of all configurations $\sigma = (\sigma_x, x \in V)$ which have a large contour $\Gamma \in G(\sigma)$ in the vicinity of the origin: $\mathrm{dist}(0, \mathrm{Int}\,\Gamma) < 2N$, $|\Gamma| > k$. Then it is immediate to see that if $k(N) > \ln N$, then

$$P_V^{\beta,+}\left( R(>k(N), N) \right) \le \exp\left\{ -\beta' k(N) \right\} \to 0 \quad \text{as } N \to \infty, \tag{79}$$

so we need to study only the intersection $\left[ \bigcup_{W \in S_V(k,N)} B(\varepsilon, W) \right] \cap R(<k(N), N)$. Note that this intersection satisfies the inclusion

$$\left[ \bigcup_{W \in S_V(k,N)} B(\varepsilon, W) \right] \cap R(<k(N), N) \subset \bigcup_{r=1}^{r(N)} \bigcup_{W \in S_V(k,N)} C_r\left( \frac{\delta_r}{r}, W \right)$$

(compare with (68)) if

$$r(N) = k(N)^{\frac{d}{d-1}}. \tag{80}$$

For our purposes the optimal choice of the function $k(N)$ turns out to be

$$k(N) = N^{\frac{d-1}{2d(1+\varepsilon_1)}}. \tag{81}$$

ii) To estimate the probability $P_V^{\beta,+}\left\{ C_r\left( \frac{\delta_r}{\kappa r (x_r)^{d-1}}, W^r \right) \right\}$ of the event (70) we will use first the estimate (75) and then (77) with

$$p = p_r = \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1},$$

see (74), and

$$A = p_r^{-1}\, \frac{\delta_r}{\kappa r (x_r)^{d-1}}.$$

We have:

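$$P_V^{\beta,+}\left\{ C_r\left( \frac{\delta_r}{\kappa r (x_r)^{d-1}}, W^r \right) \right\} \le \left\{ \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{(x_r)^d} \times \left[ \frac{1 - \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1}}{1 - \frac{\delta_r}{\kappa r (x_r)^{d-1}}} \right]^{(x_r)^d \left( 1 - \frac{\delta_r}{\kappa r (x_r)^{d-1}} \right)} \times \left[ \frac{\exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1}}{\frac{\delta_r}{\kappa r (x_r)^{d-1}}} \right]^{(x_r)^d \frac{\delta_r}{\kappa r (x_r)^{d-1}}} \right\}^{\|W^r\|}. \tag{82}$$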

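If we are able to show that the expression in curly brackets in (82) is small for every $r$, then we would be done, since that would enable us to beat the entropy factor $3^{d \cdot 2\|W^r\|}$. (It is not hard to see that the quantity $3^{d \cdot 2k}$ estimates the number of connected sets made from $k$ unit cubes in $\mathbb{R}^d$, containing a given one.) So we will look at the logarithm of $\{\cdot\}$ in (82). It is equal to

$$\ln\{\cdot\} = (x_r)^d \ln\left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right] + (x_r)^d \left( 1 - \frac{\delta_r}{\kappa r (x_r)^{d-1}} \right) \ln \frac{1 - \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1}}{1 - \frac{\delta_r}{\kappa r (x_r)^{d-1}}} + (x_r)^d\, \frac{\delta_r}{\kappa r (x_r)^{d-1}} \ln \frac{\exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1}}{\frac{\delta_r}{\kappa r (x_r)^{d-1}}}$$

$$\lesssim (x_r)^d \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} + (x_r)^d \left( 1 - \frac{\delta_r}{\kappa r (x_r)^{d-1}} \right) \cdot \left[ \frac{\delta_r}{\kappa r (x_r)^{d-1}} - \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \left[ 1 + \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} \right]^{-1} \right] + \frac{x_r \delta_r}{\kappa r} \left[ -\beta' r^{\frac{d-1}{d}} - \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\} - \ln \frac{\delta_r}{\kappa r (x_r)^{d-1}} \right]$$

$$\lesssim -\frac{x_r \delta_r}{\kappa r} \left[ \beta' r^{\frac{d-1}{d}} + \ln \frac{\delta_r}{\kappa r (x_r)^{d-1}} - 1 \right] + (x_r)^d \exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\}. \tag{83}$$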

The only thing needed for the validity of the above estimate is the smallness of $\exp\left\{ -\beta' r^{\frac{d-1}{d}} \right\}$ and $\frac{\delta_r}{\kappa r (x_r)^{d-1}}$, which is not an extra constraint. From this estimate we see that the following requirements are sufficient for the numbers $x_r$ to make the above logarithm very negative:

$$\frac{x_r \delta_r}{\kappa r} \ge \alpha > 0 \quad \text{for some } \alpha \text{ and all } r, \tag{84}$$

$$(x_r)^d \ll \alpha \exp\left\{ \beta' r^{\frac{d-1}{d}} \right\} \quad \text{for all } r. \tag{85}$$

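Indeed, under (84)

$$\beta' r^{\frac{d-1}{d}} + \ln \frac{\delta_r}{\kappa r (x_r)^{d-1}} = \ln\left[ \exp\left\{ \beta' r^{\frac{d-1}{d}} \right\} \frac{\delta_r}{\kappa r (x_r)^{d-1}} \right] \ge \ln\left[ \alpha \exp\left\{ \beta' r^{\frac{d-1}{d}} \right\} (x_r)^{-d} \right] \gg 1,$$

because of (85). For example, any choice

$$x_r \sim r^{2 + 3\varepsilon_1/2} \tag{86}$$

would go. Then the dominating term in (83) would be

$$-\beta'\, \frac{x_r \delta_r}{\kappa r}\, r^{\frac{d-1}{d}} \ll -1$$

uniformly in $r$, provided $\beta$ is large.

iii) What remains now is to estimate the sum

$$\sum_{r=1}^{k(N)^{\frac{d}{d-1}}} \sum_{W^r \in S_V^r(N)} P_V^{\beta,+}\left( C_r\left( \frac{\delta_r}{\kappa r (x_r)^{d-1}}, W^r \right) \right).$$

For $W^r \in S_V^r(N)$ we have $\|W^r\| \ge \frac{N}{x_r}$.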

Hence, by the estimates (82), (83) the last sum is bounded by

$$\sum_{r=1}^{k(N)^{\frac{d}{d-1}}} \sum_{L \ge \frac{N}{x_r}} 3^{d \cdot 2L} \exp\left\{ -\beta'\, \frac{x_r \delta_r}{\kappa r}\, r^{\frac{d-1}{d}} L \right\}.$$

According to the choices made before, see (67), the second exponent beats the first one, and the resulting upper bound is, due to (81),

$$\sum_{r=1}^{k(N)^{\frac{d}{d-1}}} \exp\left\{ -\beta'\, \frac{\delta_r}{\kappa r}\, r^{\frac{d-1}{d}} N \right\} \le \exp\left\{ -\beta' N^{\frac{d-1}{2d(1+\varepsilon_1)}} \right\}.$$

Together with the estimate (79) and again in view of the choice made in (81), that proves the result. □

Remark 1. The reader might have noted from (85) that the values of $\beta$ for which our results hold depend on $k$ and go to infinity as $k$ increases. We believe that this is only due to technical reasons, and in fact the same result should hold for $\beta$ large enough, uniformly in $k$. This belief turned out to be correct, as is shown in the paper [S] of one of us.
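The interplay of the scale choices in the proof can be checked by exact exponent bookkeeping. The sketch below reads (81) as $k(N) = N^{(d-1)/(2d(1+\varepsilon_1))}$, (80) as $r(N) = k(N)^{d/(d-1)}$ and (86) as $x_r \sim r^{2+3\varepsilon_1/2}$, and additionally assumes that $\delta_r$ decays like $r^{-(1+\varepsilon_1)}$; that last decay rate, the function name and the dictionary keys are assumptions made for illustration only.

```python
from fractions import Fraction

def scale_exponents(d, eps1):
    """Exponents governing the scales of the proof, as exact fractions.

    Assumes delta_r ~ r**-(1 + eps1); this decay rate is an assumption
    of the sketch, not a formula quoted from the text.
    """
    eps1 = Fraction(eps1)
    k_exp = Fraction(d - 1, 2 * d) / (1 + eps1)   # k(N) = N**k_exp
    r_exp = k_exp * Fraction(d, d - 1)            # r(N) = k(N)**(d/(d-1)) = N**r_exp
    x_exp = 2 + Fraction(3, 2) * eps1             # x_r ~ r**x_exp
    cube_exp = x_exp * r_exp                      # x_{r(N)} = N**cube_exp
    ratio_exp = x_exp - (1 + eps1) - 1            # x_r * delta_r / r ~ r**ratio_exp
    return {"k_of_N": k_exp, "r_of_N": r_exp,
            "cube_size": cube_exp, "ratio": ratio_exp}

if __name__ == "__main__":
    for name, value in scale_exponents(3, Fraction(1, 10)).items():
        print(name, value)
```

Under these assumptions the largest cube size grows like a power of $N$ strictly below 1, which is the condition $x_r \ll N$ behind (71), while $x_r \delta_r / r$ grows like $r^{\varepsilon_1/2}$, so a lower bound of the form (84) is available.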

6.4. The general case.

In this subsection we use the notation $P^{\beta,+}$ for the Gibbs state of our Hamiltonian (9) corresponding to the boundary condition $+1$. For $P^{\beta,+}$-almost every configuration $\sigma$ the set of points $x \in \mathbb{Z}^d$ where $\sigma(x) = 1$ contains a unique infinite component $\mathrm{Ext}\,\sigma$, with all connected components of the complement $\mathbb{Z}^d \setminus \mathrm{Ext}\,\sigma$ finite. We will denote these components by $\Delta_i(\sigma)$, their collection by $D(\sigma)$, and will call them droplets of $\sigma$. In analogy with (62) we introduce the event

$$B(\varepsilon, W) = \left\{ \sigma \in \Omega : \frac{1}{|W|} \sum_{i:\ \Delta_i(\sigma) \cap W \neq \emptyset} |\Delta_i(\sigma)| > \varepsilon \right\}.$$

With this notation Theorem 9 is valid in the above generality.

Proof. The proof of this statement is the same as the one given for the case of the Ising model. The only extra ingredient needed for the general case is the estimate that for every finite connected set $\Delta \subset \mathbb{Z}^d$,

$$P^{\beta,+}\left( \sigma : \Delta \in D(\sigma) \right) \le \exp\{ -c(\beta) |\partial\Delta| \},$$

with $c(\beta)$ diverging with $\beta$. But this is a standard corollary of the PS theory. □

7. Outlook

We end the paper with a short discussion concerning logical continuations of the present work. First of all, it is clear that not all interesting cases of non-Gibbsian states have been covered. As an example, we have not dealt here with fuzzy descriptions of Gibbs random fields [MvdV2] which can be seen as a coarse graining of the spin space. A well-known example is the projection of the massless Gaussian field on the sign variables, see [LM,EFS,ES]. Moreover, even for the projection to a sublattice, our assumption that the original measure is a low temperature phase of the PS theory is generally violated for the space-time Gibbs measures describing the steady state of a stochastic dynamics. So, we have no results on the Gibbsianness of the projections to spatial layers, hence on the Gibbsian character of stationary measures in the coexistence regime, see [LMS, GKLM]. Finally, the present work should be followed by establishing the standard results of the Gibbs formalism (existence of thermodynamic potentials, variational principle, etc.). The basis for that can be the structure of weakly Gibbsian fields, uncovered in the present work, see the estimate (25) and Definition 6. That would contribute to the second part of the Dobrushin program of Gibbsian Restoration.

References

[B] van den Berg, J.: A uniqueness condition for Gibbs measures, with application to the 2-dimensional Ising antiferromagnet. Commun. Math. Phys. 152, 161–166 (1993)
[BK] Bricmont, J. and Kupiainen, A.: Phase transition in the 3D random field Ising model. Commun. Math. Phys. 116, 539–572 (1988)
[BKL] Bricmont, J., Kupiainen, A. and Lefevere, R.: Renormalization group pathologies and the definition of Gibbs states. Commun. Math. Phys. 194, 359–388 (1998)
[BM] van den Berg, J. and Maes, C.: Disagreement percolation in the study of Markov fields. Ann. Prob. 22, 749–763 (1994)
[BS] Burton, R.M. and Steif, J.E.: Quite weak Bernoulli with exponential rate and percolation for random fields. Stoch. Process. Appl. 58, 35 (1995)
[D1] Dobrushin, R.L.: Estimates of semi-invariants for the Ising model at low temperatures. In: Beresin memorial volume, AMS Translations Series, vol. 177(2), 59–81 (1996)
[D2] Dobrushin, R.L.: A Gibbsian representation for non-Gibbsian fields. Lecture given at the workshop “Probability and Physics”, September 1995, Renkum (the Netherlands)
[DS] Dobrushin, R.L. and Shlosman, S.B.: “Non-Gibbsian” states and their Gibbs description. Commun. Math. Phys. 200(1), 125–179 (1999)
[EFS] van Enter, A.C.D., Fernandez, R. and Sokal, A.D.: Regularity properties and pathologies of position-space renormalization transformations: scope and limitations of Gibbsian theory. J. Stat. Phys. 72, 879–1167 (1993)
[EMS] van Enter, A.C.D., Maes, C. and Shlosman, S.B.: Dobrushin’s program on Gibbsianity restoration: Weakly Gibbsian and Almost Gibbsian random fields. In: On Dobrushin’s way. From Probability Theory to Statistical Mechanics, R.A. Minlos, S.B. Shlosman, Yu.M. Suhov eds, Providence, RI: AMS, 1999
[ES] van Enter, A.C.D. and Shlosman, S.B.: (Almost) Gibbsian description of the sign-fields of SOS-fields. J. Stat. Phys. 92, 353–368 (1998)
[GKLM] Goldstein, S., Kuik, R., Lebowitz, J.L. and Maes, Ch.: From PCAs to equilibrium systems and back. Commun. Math. Phys. 125, 71–79 (1989)
[GP1] Griffiths, R.B. and Pearce, P.A.: Position-space renormalization-group transformations: Some proofs and some problems. Phys. Rev. Lett. 41, 917–920 (1978)
[GP2] Griffiths, R.B. and Pearce, P.A.: Mathematical problems of position-space renormalization-group transformations. J. Stat. Phys. 20, 499–545 (1979)
[I] Israel, R.B.: Banach algebras and Kadanoff transformations. In: Random Fields, Esztergom, 1979, J. Fritz, J.L. Lebowitz, D. Szasz eds, Amsterdam: North-Holland, 1981, 2, pp. 593–608
[KP] Kotecky, R. and Preiss, D.: Cluster expansion for abstract polymer models. Commun. Math. Phys. 103, 491–498 (1986)
[Koz] Kozlov, O.K.: Gibbs description of a system of random variables. Prob. Inform. Transmission 10, 258–265 (1974)
[LM] Lebowitz, J.L. and Maes, C.: The effect of an external field on an interface, entropic repulsion. J. Stat. Phys. 46, 39–49 (1987)
[LMS] Lebowitz, J.L., Maes, C. and Speer, E.: Statistical mechanics of probabilistic cellular automata. J. Stat. Phys. 59, 117–170 (1990)
[LMV] Lorinczi, J., Maes, C. and Vande Velde, K.: Transformations of Gibbs measures. Prob. Th. Rel. Fields 112, 121–147 (1998)
[MRV1] Maes, C., Redig, F. and Van Moffaert, A.: Almost Gibbsian versus weakly Gibbsian measures. Stoch. Proc. Appl. 79, 1–15 (1999)
[MRV2] Maes, C., Redig, F. and Van Moffaert, A.: The restriction of the Ising model to a layer. To appear in J. Stat. Phys. (1999)
[MvdV1] Maes, C. and Vande Velde, K.: Relative energies for non-Gibbsian states. Commun. Math. Phys. 189, 277–286 (1997)
[MvdV2] Maes, C. and Vande Velde, K.: The fuzzy Potts model. J. Phys. A 28, 4261–4271 (1995)
[Sch] Schonmann, R.H.: Projections of Gibbs measures can be non-Gibbsian. Commun. Math. Phys. 124, 1–7 (1989)
[S] Shlosman, S.: Path large deviation and other typical properties of the low-temperature models, with applications to the weakly Gibbs states. In preparation
[Sin] Sinai, Ya.G.: Theory of Phase Transitions: Rigorous Results. Oxford: Pergamon Press, 1982

Communicated by Ya. G. Sinai

Commun. Math. Phys. 209, 547 – 548 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Erratum

Entropic Repulsion of the Lattice Free Field

Erwin Bolthausen1, Jean-Dominique Deuschel2, Ofer Zeitouni3

1 Institute for Mathematics, Universität Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
2 Fachbereich Mathematik, Sekr. MA 7-4, TU Berlin, Strasse des 17. Juni 135, 10623 Berlin, Germany
3 Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel

Received: 9 November 1999 / Accepted: 9 November 1999

Commun. Math. Phys. 170, 417–443 (1995)

Contrary to what is claimed in Lemma A.2 of [1], it is not enough to assume (b’) in order to prove (A.5) (the last line in the proof, p. 438, contains a gap). The results of the paper [1] remain correct if, when assuming (b’), one also assumes (A.5). See [2] or [3] for sufficient conditions ensuring (A.5). In particular, an explicit computation using the representation by Bessel functions shows that assumption (b) in [1] does imply (A.5), see [3], p. 414, for a hint. We take this opportunity to correct a few misprints in the proof of Proposition A.18, p. 441: in line 7, replace X X |R(|i − j |)| + sup |R(|i − j |)|), ≤ ||α||22 ||β||22 (sup i∈I1 j ∈I

j ∈I2 i∈I

2

by ≤ (1/2)||α||22 (sup

X

i∈I1 j ∈I

1

j ∈I2 i∈I

2

and replace in line 13 τξI1 ,I2

≤1∧

(supi∈I1 (1 − 2 supi∈I1

X

|R(|i − j |)|) + (1/2)||β||22 sup

|R(|i − j |)|) ,

1

P

j ∈I2

P

j ∈I1

P |R(|i − j |)| + supj ∈I2 i∈I1 |R(|i − j |)|) . P ¯ − j |)|)(1 − 2 supi∈I ¯ |R(|i j ∈I |R(|i − j |)|) 2

2

by τξI1 ,I2

1 ≤1∧ 2

(supi∈I1

P

(1−2 supi∈I1

|R(|i −j |)| ¯ j ∈I |R(|i −j |)|)

j ∈I2

P

1

+

supj ∈I2

P

(1 − 2 supi∈I2

! |R(|i −j |)|) . P ¯ j ∈I |R(|i −j |)|)

i∈I1

2

548

E. Bolthausen, J.-D. Deuschel, O. Zeitouni

Acknowledgements. We are grateful to P. Caputo for pointing out the gap in the proof of (A.5) and Y. Peres for the reference to [3].

References 1. Bolthausen, E., Deuschel, J.-D. and Zeitouni, O.: Entropic repulsion of the lattice free field. Commun. Math. Phys. 170, 417–443 (1995) 2. Le Gall, J.-F. and Rosen, J.: The range of stable random walks. Ann. of Proba. 19, 650–705 (1991) 3. Williamson, J.A.: Random walks and Riesz kernels. Pacific J. Math. 25 393–415 (1968) Communicated by J. L. Lebowitz

Commun. Math. Phys. 209, 549 – 594 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Axiomatic Conformal Field Theory Matthias R. Gaberdiel, Peter Goddard Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Silver Street, Cambridge, CB3 9EW, UK Received: 22 October 1998 / Accepted: 16 July 1999

Abstract: A new rigorous approach to conformal field theory is presented. The basic objects are families of complex-valued amplitudes, which define a meromorphic conformal field theory (or chiral algebra) and which lead naturally to the definition of topological vector spaces, between which vertex operators act as continuous operators. In fact, in order to develop the theory, Möbius invariance rather than full conformal invariance is required but it is shown that every Möbius theory can be extended to a conformal theory by the construction of a Virasoro field. In this approach, a representation of a conformal field theory is naturally defined in terms of a family of amplitudes with appropriate analytic properties. It is shown that these amplitudes can also be derived from a suitable collection of states in the meromorphic theory. Zhu’s algebra then appears naturally as the algebra of conditions which states defining highest weight representations must satisfy. The relationship of the representations of Zhu’s algebra to the classification of highest weight representations is explained.

1. Introduction

Conformal field theory has been a subject which has attracted a great deal of attention in the last thirty years, much of the interest being motivated by its importance in string theory. Its role in string theory goes back to its very beginning when Veneziano [1] proposed a form for the scattering amplitude for four particles, quickly generalised to n-particle amplitudes, which could conveniently be expressed as integrals in the complex plane of meromorphic functions [2]. From these amplitudes, the space of states was obtained by factorisation. The power of two-dimensional conformal field theory was conclusively demonstrated by the work of Belavin, Polyakov and Zamolodchikov [3]. They set a general framework for its study which was further developed by Moore and Seiberg [4], in particular. This approach is couched within the general language of quantum field theory. Not least because it is possible to establish very strong results in conformal field theory, it is very desirable to have a precise mathematical context within which they can be established. Rigorous approaches to conformal field theory have been developed, broadly speaking, from three different standpoints: a geometrical approach initiated by Segal [5]; an algebraic approach due to Borcherds [6, 7], Frenkel, Lepowsky and Meurman [8] and developed further by Frenkel, Huang and Lepowsky [9] and Kac [10]; and a functional analytic approach in which techniques from algebraic quantum field theory are employed and which has been pioneered by Wassermann [11] and Gabbiani and Fröhlich [12]. Each of these three approaches produces a different perspective on conformal field theory and each facilitates the appreciation of its deep connections with other parts of mathematics, different in the three cases. Here we present a rigorous approach closely related to the way conformal field theory arose at the birth of string theory. It is a development of earlier studies of meromorphic conformal field theory [13]. Starting from a family of “amplitudes”, which are functions of n complex variables and describe the vacuum expectation values of n fields associated with certain basic states, the full space of states of the theory is obtained by factorising these amplitudes in a certain sense. The process of reconstructing the space of states from the vacuum expectation values of fields is familiar from axiomatic quantum field theory. In the usual Osterwalder-Schrader framework of Euclidean quantum field theory [14], the reflection positivity axiom guarantees that the resulting space of states has the structure of a Hilbert space. (In the context of conformal field theory this approach has been developed by Felder, Fröhlich and Keller [15].)
In the present approach, the construction of the space of states depends only on the meromorphicity of the given family of amplitudes, A, and positivity is not required for the basic development of the theory. The spaces of states that are naturally defined are not Hilbert spaces but topological vector spaces, their topology being determined by requirements designed to ensure meromorphic amplitudes. (Recently Huang has also introduced topological vector spaces, which are related to ours, but from a different point of view [16].) They are also such that one can introduce fields (“vertex operators”) for the basic states which are continuous operators. The locality property of these vertex operators is a direct consequence of the locality assumption about the family of amplitudes, A, and this is then sufficient to prove the duality property (or Jacobi identity) of the vertex operators [13]. To develop the theory further, we need to assume that the basic amplitudes, A, are Möbius invariant. This enables us to define vertex operators for more general states, modes of vertex operators and a Fock space which contains the essential algebraic content of the theory. This Fock space also enables us to define the concept of the equivalence of conformal field theories. The assumptions made so far are very general but if we assume that the amplitudes satisfy a cluster decomposition property we place much more severe restrictions on the theory, enabling us, in particular, to prove the uniqueness of the vacuum state. Nothing assumed so far implies that the theory has a conformal structure, only one of Möbius symmetry. However, we show that it is always possible to extend the theory in such a way that it acquires a conformal structure. (For theories with a conformal structure this leaves the theory unchanged.) A conformal structure is necessary if we want to be able to define the theory on higher genus Riemann surfaces (although this is not discussed in the present paper). 
For this purpose, we also need to introduce the concept of a representation of a conformal (or rather a Möbius) field theory. Developing an idea of Montague, we show that any representation corresponds to a state in the space of states of the theory [17]. This naturally poses the question of what conditions

a state has to satisfy in order to define a representation. For the case of highest weight representations, the conditions define an associative algebra which is that originally introduced by Zhu [18]. It is the main content of Zhu’s Theorem that this algebra can be defined in terms of the algebraic Fock space. The plan of the paper is as follows. In Sect. 2, we introduce the basic assumptions about the family of amplitudes, A, and construct the topological vector space of states and the vertex operators for the basic states. In Sect. 3, Möbius invariance and its consequences are discussed. In Sect. 4, we define modes of vertex operators and use them to construct Fock spaces and thus to define the equivalence of theories. Examples of conformal field theories are provided in Sect. 5: the U (1) theory, affine Lie algebra theory, the Virasoro theory, lattice theories, and an example which does not have a conformal structure. In Sect. 6, the assumption of cluster decomposition is introduced and in Sect. 7 we show how to extend a Möbius invariant theory to make it conformally invariant. In Sect. 8, we define what is meant by a representation and show how any representation can be characterised by a state in the theory. In Sect. 9, we define the idea of a Möbius covariant representation and the notion of equivalence for representations. An example of a representation is given in Sect. 10. In Sect. 11, we introduce Zhu’s algebra and explain the significance of Zhu’s Theorem in our context. Further developments, which are to be the subject of a future paper [19], are surveyed in Sect. 12. There are seven appendices in which some of the more technical details are described.

2. Amplitudes, Spaces and Vertex Operators

The starting point for our approach is a collection of functions, which are eventually to be regarded as the vacuum expectation values of the fields associated with a certain basic set of states which generate the whole theory. We shall denote the space spanned by such states by $V$. In terms of the usual concepts of conformal field theory, $V$ would be a subspace of the space of quasi-primary states. $V$ can typically be taken to be finite-dimensional but this is not essential in what follows. (If it is infinite-dimensional, we shall at least assume that the algebraic dimension of $V$ is countable, that is that the elements of $V$ consist of finite linear combinations of a countable basis.) We suppose that $V$ can be regarded as the direct sum of a collection of subspaces, $V_h$, to each of which we can attach an integer, $h$, called the conformal weight of the states in that subspace, so that $V = \bigoplus_h V_h$. This is equivalent to saying that we have a diagonalisable operator $\delta : V \to V$, with eigenspaces $V_h = \{\psi \in V : \delta\psi = h\psi\}$. We also suppose that for any positive integer $n$, and any finite collection of vectors $\psi_i \in V_{h_i}$, and $z_i \in \mathbb{P}$ (the Riemann Sphere), where $i = 1, \dots, n$, we have a density

$$f(\psi_1, \dots, \psi_n; z_1, \dots, z_n) \equiv \langle V(\psi_1, z_1) V(\psi_2, z_2) \cdots V(\psi_n, z_n) \rangle \prod_{j=1}^{n} (dz_j)^{h_j}. \tag{1}$$

Here $\langle V(\psi_1, z_1) V(\psi_2, z_2) \cdots V(\psi_n, z_n) \rangle$ is merely a suggestive notation for what will in the end acquire an interpretation as the vacuum expectation value of a product of fields. These “amplitudes” are assumed to be multilinear in $\psi_i$, invariant under the exchange of $(\psi_i, z_i)$ with $(\psi_j, z_j)$, and analytic in $z_i$, save only for possible singularities occurring at $z_i = z_j$ for $i \neq j$, which we shall assume to be poles of finite order (although one could consider generalisations in which the amplitudes are allowed to have essential singularities). Because of the independence of order of the $(\psi_j, z_j)$, we can use the notation

$$f(\psi_1, \dots, \psi_n; z_1, \dots, z_n) = \left\langle \prod_{j=1}^{n} V(\psi_j, z_j) \right\rangle \prod_{j=1}^{n} (dz_j)^{h_j}. \tag{2}$$

We denote the collection of these densities, and the theory we develop from them, by $\mathcal{A} = \{f\}$. We may assume that if all amplitudes in $\mathcal{A}$ involving a given $\psi \in V$ vanish then $\psi = 0$ (for, if this is not so, we may replace $V$ by its quotient by the space of all vectors $\psi \in V$ which are such that all amplitudes involving $\psi$ vanish).

We use these amplitudes to define spaces of states associated with certain subsets $C$ of the Riemann Sphere $\mathbb{P}$. We can picture these spaces as consisting of states generated by fields acting at points of $C$. First introduce the set, $B_C$, whose elements are labelled by finite collections of $\psi_i \in V_{h_i}$, $z_i \in C \subset \mathbb{P}$, $i = 1, \dots, n$, $n \in \mathbb{N}$ and $z_i \neq z_j$ if $i \neq j$; we denote a typical element $\boldsymbol{\psi} \in B_C$ by

$$\boldsymbol{\psi} = V(\psi_1, z_1) V(\psi_2, z_2) \cdots V(\psi_n, z_n) \Omega \equiv \prod_{i=1}^{n} V(\psi_i, z_i) \Omega. \tag{3}$$

We shall immediately identify $\boldsymbol{\psi} \in B_C$ with the other elements of $B_C$ obtained by replacing each $\psi_j$ in (3) by $\mu_j \psi_j$, $1 \le j \le n$, where $\mu_j \in \mathbb{C}$ and $\prod_{j=1}^{n} \mu_j = 1$. Next we introduce the free (complex) vector space on $B_C$, i.e. the complex vector space with basis $B_C$ that is consisting of formal finite linear combinations $\Psi = \sum_j \lambda_j \boldsymbol{\psi}_j$, $\lambda_j \in \mathbb{C}$, $\boldsymbol{\psi}_j \in B_C$; we denote this space by $V_C$.

The vector space $V_C$ is enormous, and, intuitively, as we consider more and more complex combinations of the basis vectors, $B_C$, we generate vectors which are very close to one another. To measure this closeness, we need in essence to use suitably chosen amplitudes as test functions. To select a collection of linear functionals which we may use to construct from $V_C$ a space in which we have some suitable idea of topology, we select another subset $O \subset \mathbb{P}$ with $O \cap C = \emptyset$, where $O$ is open, and we suppose further that the interior of $C$, $C^o$, is not empty. Let

$$\boldsymbol{\phi} = V(\phi_1, \zeta_1) V(\phi_2, \zeta_2) \cdots V(\phi_m, \zeta_m) \Omega \in B_O, \tag{4}$$

where $\phi_j \in V_{k_j}$, $j = 1, \dots, m$. Each $\boldsymbol{\phi} \in B_O$ defines a map on $\boldsymbol{\psi} \in B_C$ by

$$\eta_{\boldsymbol{\phi}}(\boldsymbol{\psi}) = (\boldsymbol{\phi}, \boldsymbol{\psi}) = \left\langle \prod_{i=1}^{m} V(\phi_i, \zeta_i) \prod_{j=1}^{n} V(\psi_j, z_j) \right\rangle, \tag{5}$$

which we can use as a contribution to our measure of nearness of vectors in $V_C$. [Strictly speaking, this map defines a density rather than a function, so that we should really be considering $\eta_{\boldsymbol{\phi}}(\boldsymbol{\psi}) \prod_{i=1}^{m} (d\zeta_i)^{k_i} \prod_{j=1}^{n} (dz_j)^{h_j}$.]

For each $\boldsymbol{\phi} \in B_O$, $\eta_{\boldsymbol{\phi}}$ extends by linearity to a map $V_C \to \mathbb{C}$, provided that $O \cap C = \emptyset$. We use these linear functionals to define our concept of closeness or, more precisely, the topology of our space. To make sure that we end up with a space which is complete, we need to consider sequences of elements of $V_C$ which are convergent in a suitable sense. Let $\tilde{V}_C$ be the space of sequences $\boldsymbol{\Psi} = (\Psi_1, \Psi_2, \dots)$, $\Psi_j \in V_C$. We consider the subset $\tilde{V}_C^O$ of such sequences $\boldsymbol{\Psi}$ for which $\eta_{\boldsymbol{\phi}}(\Psi_j)$ converges on subsets of $\boldsymbol{\phi}$ of the form

$$\{ \boldsymbol{\phi} = V(\phi_1, \zeta_1) V(\phi_2, \zeta_2) \cdots V(\phi_m, \zeta_m) \Omega : \zeta_j \in K,\ |\zeta_i - \zeta_j| \ge \epsilon,\ i \neq j \}, \tag{6}$$

where for each collection of $\phi_j$, $\epsilon > 0$ and a compact subset $K \subset O$, the convergence is uniform in the (compact) set

$$\{ (\zeta_1, \dots, \zeta_m) : \zeta_j \in K,\ |\zeta_i - \zeta_j| \ge \epsilon,\ i \neq j \}. \tag{7}$$

If $\boldsymbol{\Psi} \in \tilde{V}_C^O$, the limit

$$\lim_{j \to \infty} \eta_{\boldsymbol{\phi}}(\Psi_j) \tag{8}$$

is necessarily an analytic function of the $\zeta_j$, for $\zeta_j \in O$, with singularities only at $\zeta_i = \zeta_j$, $i \neq j$. (Again these could in principle be essential singularities, but the assumption of the cluster decomposition property, made in Sect. 6, will imply that these are only poles of finite order.) We denote this function by $\eta_{\boldsymbol{\phi}}(\boldsymbol{\Psi})$. [A necessary and sufficient condition for uniform convergence on the compact set (7) is that the functions $\eta_{\boldsymbol{\phi}}(\Psi_j)$ should be both convergent in the compact set and locally uniformly bounded, i.e. each point of (7) has a neighbourhood in which $\eta_{\boldsymbol{\phi}}(\Psi_j)$ is bounded independently of $j$; see Appendix A for further details.]

It is natural that we should regard two such sequences $\boldsymbol{\Psi}^1 = (\Psi^1_i)$ and $\boldsymbol{\Psi}^2 = (\Psi^2_i)$ as equivalent if

$$\lim_{j \to \infty} \eta_{\boldsymbol{\phi}}(\Psi^1_j) = \lim_{j \to \infty} \eta_{\boldsymbol{\phi}}(\Psi^2_j), \tag{9}$$

i.e. $\eta_{\boldsymbol{\phi}}(\boldsymbol{\Psi}^1) = \eta_{\boldsymbol{\phi}}(\boldsymbol{\Psi}^2)$, for each $\boldsymbol{\phi} \in B_O$. We identify such equivalent sequences, and denote the space of them by $V_C^O$.

The space $V_C^O$ has a natural topology: we define a sequence $\chi_j \in V_C^O$, $j = 1, 2, \dots$, to be convergent if, for each $\boldsymbol{\phi} \in B_O$, $\eta_{\boldsymbol{\phi}}(\chi_j)$ converges uniformly on each (compact) subset of the form (7). The limit

$$\lim_{j \to \infty} \eta_{\boldsymbol{\phi}}(\chi_j) \tag{10}$$

is again necessarily a meromorphic function of the $\zeta_j$, for $\zeta_j \in O$, with poles only at $\zeta_i = \zeta_j$, $i \neq j$. Provided that the limits of such sequences are always in $V_C^O$, i.e.

$$\lim_{j \to \infty} \eta_{\boldsymbol{\phi}}(\chi_j) = \eta_{\boldsymbol{\phi}}(\chi), \quad \text{for some } \chi \in V_C^O, \tag{11}$$

we can define the topology by defining its closed subsets to be those for which the limit of each convergent sequence of elements in the subset is contained within it. In fact we do not have to incorporate the need for the limit to be in VCO , because it is so necessarily; we show this in Appendix B. [As we note in this appendix, this topology on VCO can be induced by a countable family of seminorms of the form ||χ||n = max1≤i≤n maxζij |ηφ i (χ )|, where the φij in φ i are chosen from finite subsets of a countable basis and the ζij are in a compact set of the form (7).] eO (using constant sequences), and this has an BC can be identified with a subset of V C O image in VC . It can be shown that this image is necessarily faithful provided that we assume the cluster property introduced in Sect. 6. In any case, we shall assume that this is the case in what follows and identify BC with its image in VCO . There is a common vector ∈ BC ⊂ VCO for all C, O which is called the vacuum vector. The linear span of BC is dense in VCO , (i.e. it is what is called a total space). With this identification, the image of BC in VCO , ψ, defined as in (3), depends linearly on the vectors ψj ∈ V .


M. R. Gaberdiel, P. Goddard

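A key result in our approach is that, for suitable $\mathcal{O}$, $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ does not depend on $\mathcal{C}$. This is an analogue of the Reeh-Schlieder Theorem of Axiomatic Quantum Field Theory. In our context it is basically a consequence of the fact that any meromorphic function is determined by its values in an arbitrary open set. Precisely, we have the result:

Theorem 1. $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ is independent of $\mathcal{C}$ if the complement of $\mathcal{O}$ is path-connected.

The proof is given in Appendix C. In the following we shall mainly consider the case where the complement of $\mathcal{O}$ is path-connected and, in this case, we denote $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ by $\mathcal{V}^{\mathcal{O}}$.

The definition of $\eta_{\boldsymbol{\phi}} : \mathcal{V}_{\mathcal{C}} \to \mathbb{C}$, $\widetilde{\mathcal{V}}^{\mathcal{O}}_{\mathcal{C}}$ and, in particular, $\mathcal{V}^{\mathcal{O}}$ all depend, at least superficially, on the particular coordinate chosen on $\mathbb{P}$, that is the particular identification of $\mathbb{P}$ with $\mathbb{C} \cup \{\infty\}$. However the coefficients with which elements of $\mathcal{B}_{\mathcal{C}}$ are combined to constitute elements of $\mathcal{V}_{\mathcal{C}}$ should be regarded as densities, and then a change of coordinate on $\mathbb{P}$ induces an endomorphism of $\mathcal{V}_{\mathcal{C}}$ which relates the definitions of the space $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ which we would get with the different choices of coordinates, because $\eta_{\boldsymbol{\phi}}$ only changes by an overall factor (albeit a function of the $\zeta_i$). In this way $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$, etc. can be regarded as coordinate independent.

Suppose that $\mathcal{O} \subset \mathcal{O}'$ and $\mathcal{C} \cap \mathcal{O}' = \emptyset$ with $\mathcal{C}^o \neq \emptyset$. Then if a sequence $\boldsymbol{\Psi} = (\Psi_j) \in \widetilde{\mathcal{V}}_{\mathcal{C}}$ is such that $\eta_{\boldsymbol{\phi}}(\Psi_j)$ is convergent for all $\boldsymbol{\phi} \in \mathcal{B}_{\mathcal{O}'}$, it follows that it is convergent for all $\boldsymbol{\phi} \in \mathcal{B}_{\mathcal{O}} \subset \mathcal{B}_{\mathcal{O}'}$. In these circumstances, if $\eta_{\boldsymbol{\phi}}(\boldsymbol{\Psi})$ vanishes for all $\boldsymbol{\phi} \in \mathcal{B}_{\mathcal{O}}$, it follows that $\eta_{\boldsymbol{\phi}'}(\boldsymbol{\Psi})$ will vanish for all $\boldsymbol{\phi}' \in \mathcal{B}_{\mathcal{O}'}$, because each $\eta_{\boldsymbol{\phi}'}(\boldsymbol{\Psi})$ is the analytic continuation of $\eta_{\boldsymbol{\phi}}(\boldsymbol{\Psi})$, for some $\boldsymbol{\phi} \in \mathcal{B}_{\mathcal{O}}$; the converse is also immediate because $\mathcal{B}_{\mathcal{O}} \subset \mathcal{B}_{\mathcal{O}'}$. Thus members of an equivalence class in $\mathcal{V}^{\mathcal{O}'}_{\mathcal{C}}$ are also in the same equivalence class in $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$. We thus have an injection $\mathcal{V}^{\mathcal{O}'} \to \mathcal{V}^{\mathcal{O}}$, and we can regard $\mathcal{V}^{\mathcal{O}'} \subset \mathcal{V}^{\mathcal{O}}$. Since $\mathcal{B}_{\mathcal{C}}$ is dense in $\mathcal{V}^{\mathcal{O}}$, it follows that $\mathcal{V}^{\mathcal{O}'}$ is also. Given a subset $\mathcal{C} \subset \mathbb{P}$ with $\mathcal{C}^o \neq \emptyset$, $\mathcal{B}_{\mathcal{C}}$ is dense in a collection of spaces $\mathcal{V}^{\mathcal{O}}$, with $\mathcal{C} \cap \mathcal{O} = \emptyset$.

Given open sets $\mathcal{O}_1$ and $\mathcal{O}_2$ such that the complement of $\mathcal{O}_1 \cup \mathcal{O}_2$ contains an open set, $\mathcal{B}_{\mathcal{C}}$ will be dense in both $\mathcal{V}^{\mathcal{O}_1}$ and $\mathcal{V}^{\mathcal{O}_2}$ if $\mathcal{C}$ is contained in the complement of $\mathcal{O}_1 \cup \mathcal{O}_2$ and $\mathcal{C}^o \neq \emptyset$. The collection of topological vector spaces $\mathcal{V}^{\mathcal{O}}$, where $\mathcal{O}$ is an open subset of the Riemann sphere whose complement is path-connected, forms in some sense the space of states of the meromorphic field theory we are considering.

A vertex operator can be defined for $\psi \in V$ as an operator $V(\psi, z) : \mathcal{V}^{\mathcal{O}} \to \mathcal{V}^{\mathcal{O}'}$, where $z \in \mathcal{O}$ but $z \notin \mathcal{O}' \subset \mathcal{O}$, by defining its action on the dense subset $\mathcal{B}_{\mathcal{C}}$, where $\mathcal{C} \cap \mathcal{O} = \emptyset$,

$$V(\psi, z)\boldsymbol{\psi} = V(\psi, z)V(\psi_1, z_1)V(\psi_2, z_2) \cdots V(\psi_n, z_n)\Omega, \tag{12}$$

and $\boldsymbol{\psi} \in \mathcal{B}_{\mathcal{C}}$. The image is in $\mathcal{V}_{\mathcal{C}'}$ for any $\mathcal{C}' \supset \mathcal{C}$ which contains $z$, and we can choose $\mathcal{C}'$ such that $\mathcal{C}' \cap \mathcal{O}' = \emptyset$. This then extends by linearity to a map $\mathcal{V}_{\mathcal{C}} \to \mathcal{V}_{\mathcal{C}'}$. To show that it induces a map $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}} \to \mathcal{V}^{\mathcal{O}'}_{\mathcal{C}'}$, we need to show that if $\Psi_j \in \mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$, $\Psi_j \to 0$ as $j \to \infty$, then $V(\psi, z)\Psi_j \to 0$ as $j \to \infty$; i.e. if $\eta_{\boldsymbol{\phi}}(\Psi_j) \to 0$ for all $\boldsymbol{\phi} \in \mathcal{B}_{\mathcal{O}}$, then $\eta_{\boldsymbol{\phi}'}(V(\psi, z)\Psi_j) \to 0$ for all $\boldsymbol{\phi}' \in \mathcal{B}_{\mathcal{O}'}$. But $\eta_{\boldsymbol{\phi}'}(V(\psi, z)\Psi_j) = \eta_{\boldsymbol{\phi}}(\Psi_j)$, where $\boldsymbol{\phi} = V(\psi, z)\boldsymbol{\phi}' \in \mathcal{B}_{\mathcal{O}}$, and so tends to zero as required. It is straightforward to show that the vertex operator $V(\psi, z)$ is continuous. We shall refer to these vertex operators also as meromorphic fields. It follows directly from the invariance of the amplitudes under permutations that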

Axiomatic Conformal Field Theory


Proposition 2. If $z, \zeta \in \mathcal{O}$, $z \neq \zeta$, and $\phi, \psi \in V$, then

$$V(\phi, z)V(\psi, \zeta) = V(\psi, \zeta)V(\phi, z) \tag{13}$$

as an identity on $\mathcal{V}^{\mathcal{O}}$.

This result, that the vertex operators $V(\psi, z)$ commute at different $z$ in a (bosonic) meromorphic conformal field theory, is one which should hold morally, but normally one has to attach a meaning to it in some other sense, such as analytic continuation (compare for example [13]).

3. Möbius Invariance

In order to proceed much further, without being dependent in some essential way on how the Riemann sphere is identified with the complex plane (and infinity), we shall need to assume that the amplitudes $\mathcal{A}$ have some sort of Möbius invariance. We shall say that the densities in $\mathcal{A}$ are invariant under the Möbius transformation $\gamma$, where

$$\gamma(z) = \frac{az + b}{cz + d}, \tag{14}$$

(and we can take $ad - bc = 1$), provided that the densities in (2) satisfy

$$\Big\langle \prod_{j=1}^{n} V(\psi_j, z_j) \Big\rangle \prod_{j=1}^{n} (dz_j)^{h_j} = \Big\langle \prod_{j=1}^{n} V(\psi_j, \zeta_j) \Big\rangle \prod_{j=1}^{n} (d\zeta_j)^{h_j}, \quad \text{where } \zeta_j = \gamma(z_j), \tag{15a}$$

i.e.

$$\Big\langle \prod_{j=1}^{n} V(\psi_j, z_j) \Big\rangle = \Big\langle \prod_{j=1}^{n} V(\psi_j, \gamma(z_j)) \Big\rangle \prod_{j=1}^{n} \big(\gamma'(z_j)\big)^{h_j}. \tag{15b}$$

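Here $\psi_j \in V_{h_j}$. The Möbius transformations form the group $\mathcal{M} \cong SL(2, \mathbb{C})/\mathbb{Z}_2$. If $\mathcal{A}$ is invariant under the Möbius transformation $\gamma$, we can define an operator $U(\gamma) : \mathcal{V}^{\mathcal{O}} \to \mathcal{V}^{\mathcal{O}_\gamma}$, where $\mathcal{O}_\gamma = \{\gamma(z) : z \in \mathcal{O}\}$, by defining it on the dense subset $\mathcal{B}_{\mathcal{C}}$, for some $\mathcal{C}$ with $\mathcal{C} \cap \mathcal{O} = \emptyset$, by

$$U(\gamma)\boldsymbol{\psi} = \prod_{j=1}^{n} V(\psi_j, \gamma(z_j)) \prod_{j=1}^{n} \big(\gamma'(z_j)\big)^{h_j}\, \Omega, \tag{16}$$

where $\boldsymbol{\psi} = V(\psi_1, z_1) \cdots V(\psi_n, z_n)\Omega \in \mathcal{B}_{\mathcal{C}}$. Again, this extends by linearity to a map defined on $\mathcal{V}_{\mathcal{C}}$, and to show that it defines a map $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}} \to \mathcal{V}^{\mathcal{O}_\gamma}_{\mathcal{C}_\gamma}$, where $\mathcal{C}_\gamma = \{\gamma(z) : z \in \mathcal{C}\}$, we again need to show that if $\eta_{\boldsymbol{\phi}}(\Psi_j) \to 0$ for all $\boldsymbol{\phi} \in \mathcal{B}_{\mathcal{O}}$, then $\eta_{\boldsymbol{\phi}'}(U(\gamma)\Psi_j) \to 0$ for all $\boldsymbol{\phi}' \in \mathcal{B}_{\mathcal{O}_\gamma}$. By the assumed invariance under $\gamma$, we have $\eta_{\boldsymbol{\phi}'}(U(\gamma)\Psi_j) = \eta_{\boldsymbol{\phi}}(\Psi_j)$, where $\boldsymbol{\phi} = U(\gamma^{-1})\boldsymbol{\phi}'$, and the result follows. It follows immediately from the definition of $U(\gamma)$ that $U(\gamma)\Omega = \Omega$ (where we have identified $\Omega \in \mathcal{V}^{\mathcal{O}}$ with $\Omega \in \mathcal{V}^{\mathcal{O}_\gamma}$ as explained in Sect. 2). Furthermore,

$$U(\gamma)V(\psi, z)U(\gamma^{-1}) = V(\psi, \gamma(z))\,\gamma'(z)^{h}, \quad \text{for } \psi \in V_h. \tag{17}$$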
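The covariance condition (15b) can be illustrated numerically. The following is only a sketch under an assumption that is not taken from the text at this point: the two-point amplitude is taken to be $(z_1 - z_2)^{-2h}$ with integer weight $h$, and the function names are hypothetical.

```python
# Sketch: numerical check of the covariance condition (15b) for n = 2.
# Assumption (hypothetical example): the two-point amplitude is
# <V(psi,z1) V(psi,z2)> = (z1 - z2)**(-2h) with integer weight h.

def two_point(z1, z2, h):
    """Assumed two-point amplitude of two fields of weight h."""
    return (z1 - z2) ** (-2 * h)


def gamma(z, a, b, c, d):
    """Moebius transformation (14)."""
    return (a * z + b) / (c * z + d)


def dgamma(z, a, b, c, d):
    """Derivative gamma'(z) = (ad - bc) / (cz + d)^2."""
    return (a * d - b * c) / (c * z + d) ** 2


def covariance_defect(z1, z2, h, abcd):
    """Absolute difference between the two sides of (15b)."""
    lhs = two_point(z1, z2, h)
    rhs = (two_point(gamma(z1, *abcd), gamma(z2, *abcd), h)
           * dgamma(z1, *abcd) ** h * dgamma(z2, *abcd) ** h)
    return abs(lhs - rhs)


if __name__ == "__main__":
    # (a, b, c, d) = (2, 1, 3, 2) satisfies ad - bc = 1.
    print(covariance_defect(0.3 + 0.2j, -1.1 + 0.5j, 2, (2, 1, 3, 2)))
```

The defect vanishes up to rounding because $\gamma(z_1) - \gamma(z_2) = (ad - bc)(z_1 - z_2)/((cz_1 + d)(cz_2 + d))$, so the factors $\gamma'(z_j)^h$ cancel the change in the denominator exactly.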


By choosing a point $z_0 \notin \mathcal{O}$, we can identify $V$ with a subspace of $\mathcal{V}^{\mathcal{O}}$ by the map $\psi \mapsto V(\psi, z_0)\Omega$; this map is an injection provided that $\mathcal{A}$ is invariant under an infinite subgroup of $\mathcal{M}$ which maps $z_0$ to an infinite number of distinct image points. For, if

$$\Big\langle \prod_{i=1}^{n} V(\psi_i, z_i)\, V(\psi, \zeta) \Big\rangle \tag{18}$$

vanishes for $\zeta = z_0$ for all $\psi_i$ and $z_i$, then by the invariance property, the same holds for an infinite number of $\zeta$'s. Regarded as a function of $\zeta$, (18) defines a meromorphic function with infinitely many zeros; it therefore vanishes identically, thus implying that $\psi = 0$.

In the following we shall use elements of $SL(2, \mathbb{C})$ to denote the corresponding elements of $\mathcal{M}$ where no confusion will result, so that

$$\text{if } \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \gamma(z) = \frac{az + b}{cz + d}. \tag{19}$$

An element of $\mathcal{M}$ has either one or two fixed points or is the identity. The one-parameter complex subgroups of $\mathcal{M}$ are either conjugate to the translation group $z \mapsto z + \lambda$ (one fixed point) or the dilatation group $z \mapsto e^{\lambda} z$ (two fixed points).

Now, first, consider a theory which is invariant under the translation group $z \mapsto \tau_\lambda(z) = z + \lambda$. Then, if $\tau_\lambda = e^{\lambda L_{-1}}$, and we do not distinguish between $U(L_{-1})$ and $L_{-1}$ in terms of notation, from (17) we have

$$e^{\lambda L_{-1}} V(\psi, z) e^{-\lambda L_{-1}} = V(\psi, z + \lambda). \tag{20}$$

[If, instead, we had a theory invariant under a subgroup of the Möbius group conjugate to the translation group, $\{\gamma_0^{-1}\tau_\lambda\gamma_0 : \lambda \in \mathbb{C}\}$ say, and if $\zeta = \gamma_0(z)$, then $\zeta \mapsto \zeta' = \zeta + \lambda$ under $\gamma_0^{-1}\tau_\lambda\gamma_0$; then, if $\hat{V}(\psi, \zeta) = V(\psi, z)\gamma_0'(z)^{-h}$ and $\hat{L}_{-1} = \gamma_0^{-1} L_{-1} \gamma_0$, we have $e^{\lambda \hat{L}_{-1}} \hat{V}(\psi, \zeta) e^{-\lambda \hat{L}_{-1}} = \hat{V}(\psi, \zeta + \lambda)$.]

Consider now a theory which is invariant under the whole Möbius group. We can pick a group conjugate to the translation group, and we can change coordinates so that $z = \infty$ is the fixed point of the selected translation group. (In particular, this defines an identification of $\mathbb{P}$ with $\mathbb{C} \cup \{\infty\}$ up to a Euclidean or scaling transformation of $\mathbb{C}$.) If we select a point $z_0$ to define the injection $V \to \mathcal{V}^{\mathcal{O}}$, $z_0 \notin \mathcal{O}$, we have effectively selected two fixed points. Without loss of generality, we can choose $z_0 = 0$. Then

$$\psi = V(\psi, 0)\Omega \in \mathcal{V}^{\mathcal{O}}. \tag{21}$$

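We can then introduce naturally two other one-parameter groups, one generated by $L_0$ which fixes both $0$ and $\infty$ (the group of dilatations or scaling transformations), and another which fixes only $0$, generated by $L_1$ (the group of special conformal transformations). Then

$$e^{\lambda L_{-1}}(z) = z + \lambda, \quad e^{\lambda L_0}(z) = e^{\lambda} z, \quad e^{\lambda L_1}(z) = \frac{z}{1 - \lambda z}, \tag{22a}$$

$$e^{\lambda L_{-1}} = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}, \quad e^{\lambda L_0} = \begin{pmatrix} e^{\frac{1}{2}\lambda} & 0 \\ 0 & e^{-\frac{1}{2}\lambda} \end{pmatrix}, \quad e^{\lambda L_1} = \begin{pmatrix} 1 & 0 \\ -\lambda & 1 \end{pmatrix}, \tag{22b}$$

and thus

$$L_{-1} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad L_0 = \begin{pmatrix} \frac{1}{2} & 0 \\ 0 & -\frac{1}{2} \end{pmatrix}, \quad L_1 = \begin{pmatrix} 0 & 0 \\ -1 & 0 \end{pmatrix}. \tag{22c}$$

In particular, it then follows that

$$[L_m, L_n] = (m - n)L_{m+n}, \quad m, n = 0, \pm 1. \tag{23}$$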
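The commutation relations (23) can be confirmed directly from the $2 \times 2$ matrices in (22c). The following sketch uses exact rational arithmetic; the helper names are illustrative and not part of the text.

```python
# Sketch: verify [L_m, L_n] = (m - n) L_{m+n} for m, n in {-1, 0, 1},
# using the 2x2 matrices of (22c) with exact rational entries.
from fractions import Fraction

L = {
    -1: [[Fraction(0), Fraction(1)], [Fraction(0), Fraction(0)]],
    0: [[Fraction(1, 2), Fraction(0)], [Fraction(0), Fraction(-1, 2)]],
    1: [[Fraction(0), Fraction(0)], [Fraction(-1), Fraction(0)]],
}
ZERO = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(A, B):
    """Matrix commutator AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(2)] for i in range(2)]


def expected(m, n):
    """Right-hand side (m - n) L_{m+n}; it vanishes when m == n."""
    if m == n:
        return ZERO
    return [[(m - n) * x for x in row] for row in L[m + n]]


def check_mobius_algebra():
    """True if (23) holds for all nine pairs (m, n)."""
    return all(commutator(L[m], L[n]) == expected(m, n)
               for m in L for n in L)


if __name__ == "__main__":
    print(check_mobius_algebra())
```

Whenever $m \neq n$ the index $m + n$ stays in $\{-1, 0, 1\}$, so the three matrices close under commutation, as (23) asserts.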

We also have that $L_n\Omega = 0$, $n = 0, \pm 1$. With this parametrisation, the operator corresponding to the Möbius transformation $\gamma$, defined in (19), is given as (see [13])

$$U(\gamma) = \exp\Big(\frac{b}{d} L_{-1}\Big) \Big(\frac{\sqrt{ad - bc}}{d}\Big)^{2L_0} \exp\Big(-\frac{c}{d} L_1\Big). \tag{24}$$

For $\psi \in V_h$, by (17), $U(\gamma)V(\psi, z)U(\gamma^{-1}) = V(\psi, \gamma(z))\gamma'(z)^h$, and so, by (21), $U(\gamma)\psi = \lim_{z\to 0} V(\psi, \gamma(z))\Omega\, \gamma'(z)^h$. From this it follows that

$$L_0\psi = h\psi, \quad L_1\psi = 0, \quad L_{-1}\psi = V'(\psi, 0)\Omega. \tag{25}$$
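At the level of $2 \times 2$ matrices, the factorisation $\gamma \propto e^{(b/d)L_{-1}} \big(\sqrt{ad - bc}/d\big)^{2L_0} e^{-(c/d)L_1}$ can be checked numerically. The sketch below assumes $d \neq 0$ and uses the matrix exponentials of (22b); the function names are illustrative.

```python
# Sketch: check that the product
#   exp((b/d) L_{-1}) * x^{2 L_0} * exp(-(c/d) L_1),  x = sqrt(ad - bc)/d,
# reproduces the matrix (a b; c d) up to the overall factor 1/sqrt(ad - bc),
# which does not change the Moebius map.  Assumes d != 0.
import cmath


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def factorised(a, b, c, d):
    """Matrix product of the three one-parameter factors."""
    x = cmath.sqrt(a * d - b * c) / d
    translation = [[1, b / d], [0, 1]]   # exp(lambda L_{-1}), lambda = b/d
    dilatation = [[x, 0], [0, 1 / x]]    # x^{2 L_0} = diag(x, 1/x)
    special = [[1, 0], [c / d, 1]]       # exp(lambda L_1), lambda = -c/d
    return matmul(matmul(translation, dilatation), special)


def max_deviation(a, b, c, d):
    """Largest entry of sqrt(ad - bc) * product - (a b; c d)."""
    s = cmath.sqrt(a * d - b * c)
    product = factorised(a, b, c, d)
    target = [[a, b], [c, d]]
    return max(abs(s * product[i][j] - target[i][j])
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(max_deviation(2, 1, 3, 2))
```

With the exponent $2L_0$ the dilatation factor is $\mathrm{diag}(x, 1/x)$, which acts as $z \mapsto x^2 z = (ad - bc)z/d^2$, the scaling needed to recover $\gamma$.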

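Thus $L_0 = \delta$ acting on $V$. Henceforth we shall assume that our theory defined by $\mathcal{A}$ is Möbius invariant.

Having chosen an identification of $\mathbb{P}$ with $\mathbb{C} \cup \{\infty\}$ and of $V$ with a subspace of $\mathcal{V}^{\mathcal{O}}$, we can now also define vertex operators for $\boldsymbol{\psi} = \prod_{j=1}^{n} V(\psi_j, z_j)\Omega \in \mathcal{B}_{\mathcal{C}}$ by

$$V(\boldsymbol{\psi}, z) = \prod_{j=1}^{n} V(\psi_j, z_j + z). \tag{26}$$

Then $V(\boldsymbol{\psi}, z)$ is a continuous operator $\mathcal{V}^{\mathcal{O}_1} \to \mathcal{V}^{\mathcal{O}_2}$, provided that $z_j + z \notin \mathcal{O}_2 \subset \mathcal{O}_1$ but $z_j + z \in \mathcal{O}_1$, $1 \le j \le n$.

We can further extend the definition of $V(\boldsymbol{\psi}, z)$ by linearity from $\boldsymbol{\psi} \in \mathcal{B}_{\mathcal{C}}$ to vectors $\Psi \in \mathbb{V}^{\mathcal{O}}_{\mathcal{C}}$, the image of $\mathcal{V}_{\mathcal{C}}$ in $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$, to obtain a continuous linear operator $V(\Psi, z) : \mathcal{V}^{\mathcal{O}_1} \to \mathcal{V}^{\mathcal{O}_2}$, where $\mathcal{C}_z \cap \mathcal{O}_2 = \emptyset$, $\mathcal{O}_2 \subset \mathcal{O}_1$ and $\mathcal{C}_z \subset \mathcal{O}_1$ for $\mathcal{C}_z = \{\zeta + z : \zeta \in \mathcal{C}\}$. One might be tempted to try to extend the definition of the vertex operator even further to states in $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}} \cong \mathcal{V}^{\mathcal{O}}$, but the corresponding operator will then only be well-defined on a suitable dense subspace of $\mathcal{V}^{\mathcal{O}_1}$.

For the vertex operator associated to $\Psi \in \mathbb{V}^{\mathcal{O}}_{\mathcal{C}}$, we again have

$$e^{\lambda L_{-1}} V(\Psi, z) e^{-\lambda L_{-1}} = V(\Psi, z + \lambda), \quad V(\Psi, 0)\Omega = \Psi. \tag{27}$$

Furthermore,

$$V(\Psi, z)V(\phi, \zeta) = V(\phi, \zeta)V(\Psi, z), \tag{28a}$$

$$V(\Psi, z)\Omega = e^{zL_{-1}}\Psi \tag{28b}$$

for any $\phi \in V$, $\zeta \notin \mathcal{C}_z$. [In (28a), the left-hand and right-hand sides are to be interpreted as maps $\mathcal{V}^{\mathcal{O}_1} \to \mathcal{V}^{\mathcal{O}_2}$, with $V(\phi, \zeta) : \mathcal{V}^{\mathcal{O}_1} \to \mathcal{V}^{\mathcal{O}_L}$ and $V(\Psi, z) : \mathcal{V}^{\mathcal{O}_L} \to \mathcal{V}^{\mathcal{O}_2}$ on the left-hand side and $V(\Psi, z) : \mathcal{V}^{\mathcal{O}_1} \to \mathcal{V}^{\mathcal{O}_R}$ and $V(\phi, \zeta) : \mathcal{V}^{\mathcal{O}_R} \to \mathcal{V}^{\mathcal{O}_2}$ on the right-hand side, where $\mathcal{O}_2 \subset \mathcal{O}_L \subset \mathcal{O}_1$, $\mathcal{O}_2 \subset \mathcal{O}_R \subset \mathcal{O}_1$, $\zeta \in \mathcal{O}_R \cap \mathcal{O}_L^c \cap \mathcal{O}_2^c$ and $\mathcal{C}_z \subset \mathcal{O}_L \cap \mathcal{O}_R^c \cap \mathcal{O}_2^c$ (where $\mathcal{O}_2^c$ denotes the complement of $\mathcal{O}_2$, etc.). Equation (28b) holds in $\mathcal{V}^{\mathcal{O}}$ with $\mathcal{C}_z \cap \mathcal{O} = \emptyset$.] Actually, these two conditions already characterise the vertex operator uniquely: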


Theorem 3 (Uniqueness). For $\Psi \in \mathbb{V}^{\mathcal{O}}_{\mathcal{C}}$, the operator $V(\Psi, z)$ is uniquely characterised by the conditions (28a) and (28b).

The proof is essentially that contained in ref. [13]: If $W(z)V(\phi, \zeta) = V(\phi, \zeta)W(z)$ for $\phi \in V$, $\zeta \notin \mathcal{C}_z$, and $W(z)\Omega = e^{zL_{-1}}\Psi$, it follows that, for $\Phi \in \mathbb{V}^{\mathcal{O}'}_{\mathcal{C}'}$, $W(z)V(\Phi, \zeta) = V(\Phi, \zeta)W(z)$ provided that $\mathcal{C}'_\zeta \cap \mathcal{C}_z = \emptyset$, and so

$$W(z)e^{\zeta L_{-1}}\Phi = W(z)V(\Phi, \zeta)\Omega = V(\Phi, \zeta)W(z)\Omega = V(\Phi, \zeta)e^{zL_{-1}}\Psi = V(\Phi, \zeta)V(\Psi, z)\Omega = V(\Psi, z)V(\Phi, \zeta)\Omega = V(\Psi, z)e^{\zeta L_{-1}}\Phi$$

for all $\Phi \in \mathbb{V}^{\mathcal{O}'}_{\mathcal{C}'}$, which is dense in $\mathcal{V}^{\mathcal{O}'}$, showing that $W(z) = V(\Psi, z)$.

From this uniqueness result and (17) we can deduce the commutators of vertex operators $V(\psi, z)$, $\psi \in V_h$, with $L_{-1}$, $L_0$, $L_1$:

$$[L_{-1}, V(\psi, z)] = \frac{d}{dz} V(\psi, z), \tag{29a}$$

$$[L_0, V(\psi, z)] = z\frac{d}{dz} V(\psi, z) + hV(\psi, z), \tag{29b}$$

$$[L_1, V(\psi, z)] = z^2\frac{d}{dz} V(\psi, z) + 2hzV(\psi, z). \tag{29c}$$

We recall from (25) that $L_1\psi = 0$ and $L_0\psi = h\psi$ if $\psi \in V_h$; if $L_1\psi = 0$, $\psi$ is said to be quasi-primary.

The definition (26) immediately implies that, for states $\psi, \phi \in V$, $V(\psi, z)V(\phi, \zeta) = V(V(\psi, z - \zeta)\phi, \zeta)$. This statement generalises to the key duality result of Theorem 4, which can be seen to follow from the uniqueness theorem:

Theorem 4 (Duality). If $\Psi \in \mathbb{V}^{\mathcal{O}}_{\mathcal{C}}$ and $\Phi \in \mathbb{V}^{\mathcal{O}'}_{\mathcal{C}'}$, where $\mathcal{C}_z \cap \mathcal{C}'_\zeta = \emptyset$, then

$$V(\Psi, z)V(\Phi, \zeta) = V(V(\Psi, z - \zeta)\Phi, \zeta). \tag{30}$$

[In (30), the left-hand and right-hand sides are to be interpreted as maps $\mathcal{V}^{\mathcal{O}_1} \to \mathcal{V}^{\mathcal{O}_2}$, with $V(\Phi, \zeta) : \mathcal{V}^{\mathcal{O}_1} \to \mathcal{V}^{\mathcal{O}_L}$ and $V(\Psi, z) : \mathcal{V}^{\mathcal{O}_L} \to \mathcal{V}^{\mathcal{O}_2}$ on the left-hand side and $V(\Psi, z - \zeta)\Phi \in \mathcal{V}_{\mathcal{C}_{z-\zeta} \cup \mathcal{C}'}$, where $\mathcal{O}_2 \subset \mathcal{O}_L \subset \mathcal{O}_1$, $\mathcal{C}_z \subset \mathcal{O}_L \cap \mathcal{O}_2^c$ and $\mathcal{C}'_\zeta \subset \mathcal{O}_1 \cap \mathcal{O}_L^c$.] The result follows from the uniqueness theorem on noting that $V(\Phi, z)V(\Psi, \zeta)\Omega = V(\Phi, z)e^{\zeta L_{-1}}\Psi = e^{\zeta L_{-1}}V(\Phi, z - \zeta)\Psi = V(V(\Phi, z - \zeta)\Psi, \zeta)\Omega$.

4. Modes, Fock Spaces and the Equivalence of Theories

The concept of equivalence between two meromorphic field theories in our definition could be formulated in terms of the whole collection of spaces $\mathcal{V}^{\mathcal{O}}$, where $\mathcal{O}$ ranges over the open subsets of $\mathbb{P}$ with path-connected complement, but this would be very unwieldy. In fact, each meromorphic field theory has a Fock space at its heart and we can focus on this in order to define (and, in practice, test for) the equivalence of theories. To approach this we first need to introduce the concept of the modes of a vertex operator.


It is straightforward to see that we can construct contour integrals of vectors in $\mathcal{V}^{\mathcal{O}}$, e.g. of the form

$$\int_{C_1} dz_1 \int_{C_2} dz_2 \cdots \int_{C_r} dz_r\, \mu(z_1, z_2, \ldots, z_r) \prod_{i=1}^{n} V(\psi_i, z_i)\Omega, \tag{31}$$

where $r \le n$ and the weight function $\mu$ is analytic in some neighbourhood of $C_1 \times C_2 \times \cdots \times C_r$ and the distances $|z_i - z_j|$, $i \neq j$, are bounded away from $0$ on this set. In this way we can define the modes

$$V_n(\psi) = \oint_{C} z^{h+n-1} V(\psi, z)\, dz, \quad \text{for } \psi \in V_h, \tag{32}$$

as linear operators on $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$, where the contour $C$ encircles $\mathcal{C}$ and $C \subset \mathcal{O}$ with $\infty \in \mathcal{O}$ and $0 \in \mathcal{C}$, and we absorb a factor of $1/2\pi i$ into the definition of the symbol $\oint$. The meromorphicity of the amplitudes allows us to establish

$$V(\psi, z) = \sum_{n=-\infty}^{\infty} V_n(\psi)\, z^{-n-h} \tag{33}$$

with convergence with respect to the topology of $\mathcal{V}^{\mathcal{O}'}$ for an appropriate $\mathcal{O}'$.

The definition of $V_n(\psi)$ is independent of $C$ if it is taken to be a simple contour encircling the origin once positively. Further, if $\mathcal{O}_2 \subset \mathcal{O}_1$, $\mathcal{V}^{\mathcal{O}_1} \subset \mathcal{V}^{\mathcal{O}_2}$ and if $\infty \in \mathcal{O}_2$, $0 \notin \mathcal{O}_1$, the definition of $V_n(\psi)$ on $\mathcal{V}^{\mathcal{O}_1}$, $\mathcal{V}^{\mathcal{O}_2}$ agrees on $\mathcal{V}^{\mathcal{O}_1}$, which is dense in $\mathcal{V}^{\mathcal{O}_2}$, so that we may regard the definition as independent of $\mathcal{O}$ also. $V_n(\psi)$ depends on our choice of $0$ and $\infty$, but different choices can be related by Möbius transformations.

We define the Fock space $\mathcal{H}^{\mathcal{O}} \subset \mathcal{V}^{\mathcal{O}}$ to be the space spanned by finite linear combinations of vectors of the form

$$\Psi = V_{n_1}(\psi_1) V_{n_2}(\psi_2) \cdots V_{n_N}(\psi_N)\Omega, \tag{34}$$

where $\psi_j \in V$ and $n_j \in \mathbb{Z}$, $1 \le j \le N$. Then, by construction, $\mathcal{H}^{\mathcal{O}}$ has a countable basis. It is easy to see that $\mathcal{H}^{\mathcal{O}}$ is dense in $\mathcal{V}^{\mathcal{O}}$. Further it is clear that $\mathcal{H}^{\mathcal{O}}$ is independent of $\mathcal{O}$, and, where there is no ambiguity, we shall denote it simply by $\mathcal{H}$. It does however depend on the choice of $0$ and $\infty$, but different choices will be related by the action of the Möbius group again.

It follows from (28b) that

$$V(\psi, 0)\Omega = \psi \tag{35}$$

which implies that

$$V_n(\psi)\Omega = 0 \quad \text{if } n > -h \tag{36}$$

and

$$V_{-h}(\psi)\Omega = \psi. \tag{37}$$

Thus $V \subset \mathcal{H}$.

Since $\infty$ and $0$ play a special role, it is not surprising that $L_0$, the generator of the subgroup of $\mathcal{M}$ preserving them, does as well. From (29b) it follows that

$$[L_0, V_n(\psi)] = -nV_n(\psi), \tag{38}$$


so that for $\Psi$ defined by (34),

$$L_0\Psi = h\Psi, \quad \text{where } h = -\sum_{j=1}^{N} n_j. \tag{39}$$

Thus

$$\mathcal{H} = \bigoplus_{h \in \mathbb{Z}} \mathcal{H}_h, \quad \text{where } V_h \subset \mathcal{H}_h, \tag{40}$$

where $\mathcal{H}_h$ is the subspace spanned by vectors of the form (34) for which $h = -\sum_j n_j$. Thus $L_0$ has a spectral decomposition and the $\mathcal{H}_h$, $h \in \mathbb{Z}$, are the eigenspaces of $L_0$. They have countable dimensions, but here we shall only consider theories for which their dimensions are finite. (This is not guaranteed by the finite-dimensionality of $V$; in fact, in practice, it is not easy to determine whether these spaces are finite-dimensional or not, although it is rather obvious in many examples.)

We can define vertex operators for the vectors (34) by

$$V(\Psi, z) = \oint_{C_1} z_1^{h_1+n_1-1} V(\psi_1, z + z_1)\, dz_1 \cdots \oint_{C_N} z_N^{h_N+n_N-1} V(\psi_N, z + z_N)\, dz_N, \tag{41}$$

where the $C_j$ are contours about $0$ with $|z_i| > |z_j|$ if $i < j$. We can then replace the densities (1) by the larger class $\mathcal{A}'$ of densities

$$\langle V(\Psi_1, z_1)V(\Psi_2, z_2) \cdots V(\Psi_n, z_n) \rangle \prod_{j=1}^{n} (dz_j)^{h_j}, \tag{42}$$

where $\Psi_j \in \mathcal{H}_{h_j}$, $1 \le j \le n$. It is not difficult to see that replacing $\mathcal{A}$ with $\mathcal{A}'$, i.e. replacing $V$ with $\mathcal{H}$, does not change the definition of the spaces $\mathcal{V}^{\mathcal{O}}$. Theorem 3 (Uniqueness) and Theorem 4 (Duality) will still hold if we replace $\mathbb{V}^{\mathcal{O}}_{\mathcal{C}}$ with $\mathbb{H}^{\mathcal{O}}_{\mathcal{C}}$, the space we would obtain if we started with $\mathcal{H}$ rather than $V$, etc. These theorems enable the Möbius transformation properties of vertex operators to be determined (see Appendix D).

However, if we use the whole of $\mathcal{H}$ as a starting point, the Möbius properties of the densities $\mathcal{A}'$ cannot be as simple as in Sect. 3, because not all $\psi \in \mathcal{H}$ have the quasi-primary property $L_1\psi = 0$. But we can introduce the subspaces of quasi-primary vectors within $\mathcal{H}$ and $\mathcal{H}_h$,

$$\mathcal{H}^Q = \{\Psi \in \mathcal{H} : L_1\Psi = 0\}, \quad \mathcal{H}^Q_h = \{\Psi \in \mathcal{H}_h : L_1\Psi = 0\}, \quad \mathcal{H}^Q = \bigoplus_h \mathcal{H}^Q_h. \tag{43}$$

$V \subset \mathcal{H}^Q$ and $\mathcal{H}^Q$ is the maximal $V$ which will generate the theory with the same spaces $\mathcal{V}^{\mathcal{O}}$ and with agreement of the densities. [Under the cluster decomposition assumption of Sect. 6, $\mathcal{H}$ is generated from $\mathcal{H}^Q$ by the action of the Möbius group or, more particularly, $L_{-1}$. See Appendix D.]

We are now in a position to define the equivalence of two theories. A theory specified by a space $V$ and amplitudes $\mathcal{A} = \{f\}$, leading to a quasi-primary space $\mathcal{H}^Q$, is said to be equivalent to the theory specified by a space $\hat{V}$ and amplitudes $\hat{\mathcal{A}} = \{\hat{f}\}$, leading to a quasi-primary space $\hat{\mathcal{H}}^Q$, if there are graded injections $\iota : V \to \hat{\mathcal{H}}^Q$ (i.e. $\iota(V_h) \subset \hat{\mathcal{H}}^Q_h$) and $\hat{\iota} : \hat{V} \to \mathcal{H}^Q$ which map amplitudes to amplitudes.


Many calculations in conformal field theory are most easily performed in terms of modes of vertex operators, which capture in essence the algebraic structure of the theory. In particular, the modes of the vertex operators define what is usually called a W-algebra; this can be seen as follows.

The duality property of the vertex operators can be rewritten in terms of modes as

$$V(\Phi, z)V(\Psi, \zeta) = V(V(\Phi, z - \zeta)\Psi, \zeta) = \sum_{n} V(V_n(\Phi)\Psi, \zeta)(z - \zeta)^{-n-h_\Phi}, \tag{44}$$

where $L_0\Psi = h_\Psi\Psi$ and $L_0\Phi = h_\Phi\Phi$, and $\Psi, \Phi \in \mathcal{H}$. We can then use the usual contour techniques of conformal field theory to derive from this formula commutation relations for the respective modes. Indeed, the commutator of two modes $V_m(\Phi)$ and $V_n(\Psi)$ acting on $\mathcal{B}_{\mathcal{C}}$ is defined by

$$[V_m(\Phi), V_n(\Psi)] = \oint\!\!\oint_{|z|>|\zeta|} dz\, d\zeta\, z^{m+h_\Phi-1} \zeta^{n+h_\Psi-1} V(\Phi, z)V(\Psi, \zeta) - \oint\!\!\oint_{|\zeta|>|z|} dz\, d\zeta\, z^{m+h_\Phi-1} \zeta^{n+h_\Psi-1} V(\Phi, z)V(\Psi, \zeta), \tag{45}$$

where the contours on the right-hand side encircle $\mathcal{C}$ anti-clockwise. We can then deform the two contours so as to rewrite (45) as

$$[V_m(\Phi), V_n(\Psi)] = \oint_{0} \zeta^{n+h_\Psi-1}\, d\zeta \oint_{\zeta} z^{m+h_\Phi-1}\, dz \sum_{l} V(V_l(\Phi)\Psi, \zeta)(z - \zeta)^{-l-h_\Phi}, \tag{46}$$

where the $z$ contour is a small positive circle about $\zeta$ and the $\zeta$ contour is a positive circle about $\mathcal{C}$. Only terms with $l \ge 1 - h_\Phi$ contribute, and the integral becomes

$$[V_m(\Phi), V_n(\Psi)] = \sum_{N=-h_\Phi+1}^{\infty} \binom{m + h_\Phi - 1}{m - N} V_{m+n}(V_N(\Phi)\Psi). \tag{47}$$
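The binomial coefficient in (47) comes from the inner contour integral in (46): with the factor $1/2\pi i$ absorbed, $\oint_\zeta z^{a}(z - \zeta)^{-p}\,dz = \binom{a}{p-1}\zeta^{a-p+1}$, where $a = m + h_\Phi - 1$ and $p = l + h_\Phi$. The sketch below checks this numerically for integers $a \ge 0$ and $p \ge 1$ (a restriction made here for simplicity); the function names are illustrative.

```python
# Sketch: numerical check of the residue behind the binomial in (47),
#   (1/2 pi i) * contour integral of z^a (z - zeta)^(-p) around zeta
#     = C(a, p-1) * zeta^(a-p+1),
# for integers a >= 0, p >= 1.
import cmath
from math import comb


def contour_integral(a, p, zeta, radius=0.25, steps=256):
    """Trapezoidal rule on the circle |z - zeta| = radius, 1/(2 pi i) included."""
    total = 0
    for s in range(steps):
        w = radius * cmath.exp(2j * cmath.pi * s / steps)  # w = z - zeta
        # dz = i w dtheta, so the 1/(2 pi i) measure reduces to an average of f * w
        total += (zeta + w) ** a * w ** (-p) * w
    return total / steps


def residue_formula(a, p, zeta):
    """Closed form C(a, p-1) zeta^(a-p+1); comb returns 0 when p-1 > a."""
    return comb(a, p - 1) * zeta ** (a - p + 1)


if __name__ == "__main__":
    zeta = 0.7 + 0.2j
    print(abs(contour_integral(5, 3, zeta) - residue_formula(5, 3, zeta)))
```

Setting $p - 1 = N + h_\Phi - 1$ and using $\binom{a}{k} = \binom{a}{a-k}$ gives the coefficient $\binom{m + h_\Phi - 1}{m - N}$ appearing in (47).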

In particular, if $m \ge -h_\Phi + 1$, $n \ge -h_\Psi + 1$, then $m - N \ge 0$ in the sum, and $m + n \ge N + n \ge N - h_\Psi + 1$. This implies that the modes $\{V_m(\Psi) : m \ge -h_\Psi + 1\}$ close as a Lie algebra; the same also holds for $\{V_m(\Psi) : 0 \ge m \ge -h_\Psi + 1\}$.

As we shall discuss below in Sect. 6, in conformal field theory it is usually assumed that the amplitudes satisfy another property which guarantees that the spectrum of $L_0$ is bounded below by $0$. If this is the case then the sum in (47) is also bounded above by $h_\Psi$.

5. Some Examples

Before proceeding further, we shall give a number of examples of theories that satisfy the axioms that we have specified so far.


5.1. The U(1) theory. The simplest example is the case where $V$ is a one-dimensional vector space, spanned by a vector $J$ of weight 1, in which case we write $J(z) \equiv V(J, z)$. The amplitude of an odd number of $J$-fields is defined to vanish, and in the case of an even number it is given by

$$\langle J(z_1) \cdots J(z_{2n}) \rangle = \frac{k^n}{2^n n!} \sum_{\pi \in S_{2n}} \prod_{j=1}^{n} \frac{1}{(z_{\pi(j)} - z_{\pi(j+n)})^2} \tag{48a}$$

$$= k^n \sum_{\pi \in S'_{2n}} \prod_{j=1}^{n} \frac{1}{(z_{\pi(j)} - z_{\pi(j+n)})^2}, \tag{48b}$$

where $k$ is an arbitrary (real) constant and, in (48a), $S_{2n}$ is the permutation group on $2n$ objects, whilst, in (48b), the sum is restricted to the subset $S'_{2n}$ of permutations $\pi \in S_{2n}$ such that $\pi(i) < \pi(i + n)$ and $\pi(i) < \pi(j)$ if $1 \le i < j \le n$. (This defines the amplitudes on a basis of $V$ and we extend the definition by multilinearity.) It is clear that the amplitudes are meromorphic in $z_j$, and that they satisfy the locality condition. It is also easy to check that they are Möbius covariant, with the weight of $J$ being 1.

From the amplitudes we can directly read off the operator product expansion of the field $J$ with itself as

$$J(z)J(w) \sim \frac{k}{(z - w)^2} + O(1). \tag{49}$$

Comparing this with (44), and using (47), we then obtain

$$[J_n, J_m] = nk\,\delta_{n,-m}. \tag{50}$$
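The equality of the full sum (48a) and the restricted sum (48b) can be checked by brute force for small $n$. The sketch below uses exact rational arithmetic; the function names and the sample points are illustrative choices, not part of the text.

```python
# Sketch: compare the full permutation sum (48a) with the restricted
# sum (48b) for the U(1) amplitudes, using exact fractions.
from fractions import Fraction
from itertools import permutations
from math import factorial


def pairing_term(zs, pi, n):
    """Product over j of 1 / (z_{pi(j)} - z_{pi(j+n)})^2."""
    term = Fraction(1)
    for j in range(n):
        term /= (zs[pi[j]] - zs[pi[j + n]]) ** 2
    return term


def amplitude_full_sum(zs, k):
    """Right-hand side of (48a): sum over all of S_2n, divided by 2^n n!."""
    n, odd = divmod(len(zs), 2)
    if odd:
        return Fraction(0)  # amplitudes with an odd number of fields vanish
    total = sum((pairing_term(zs, pi, n) for pi in permutations(range(2 * n))),
                Fraction(0))
    return k ** n * total / (2 ** n * factorial(n))


def amplitude_restricted_sum(zs, k):
    """Right-hand side of (48b): one representative permutation per pairing."""
    n, odd = divmod(len(zs), 2)
    if odd:
        return Fraction(0)
    total = Fraction(0)
    for pi in permutations(range(2 * n)):
        ordered_pairs = all(pi[i] < pi[i + n] for i in range(n))
        ordered_firsts = all(pi[i] < pi[i + 1] for i in range(n - 1))
        if ordered_pairs and ordered_firsts:
            total += pairing_term(zs, pi, n)
    return k ** n * total


if __name__ == "__main__":
    points = [Fraction(x) for x in (0, 1, 3, 7)]
    print(amplitude_full_sum(points, Fraction(2)),
          amplitude_restricted_sum(points, Fraction(2)))
```

Each pairing of the $2n$ points is hit by $2^n n!$ permutations in (48a), namely the $n!$ orderings of the pairs times the $2^n$ orientations within pairs, which is why the prefactor disappears in (48b).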

This defines (a representation of) the affine algebra $\hat{u}(1)$.

5.2. Affine Lie algebra theory. Following Frenkel and Zhu [20], we can generalise this example to the case of an arbitrary finite-dimensional Lie algebra $g$. Suppose that the matrices $t^a$, $1 \le a \le \dim g$, provide a finite-dimensional representation of $g$ so that $[t^a, t^b] = f^{ab}{}_c\, t^c$, where $f^{ab}{}_c$ are the structure constants of $g$. In this case, the space $V$ is of dimension $\dim g$ and has a basis consisting of weight one states $J^a$, $1 \le a \le \dim g$. Again, we write $J^a(z) = V(J^a, z)$. If $K$ is any matrix which commutes with all the $t^a$, define

$$\kappa^{a_1 a_2 \ldots a_m} = \mathrm{tr}(K t^{a_1} t^{a_2} \cdots t^{a_m}). \tag{51}$$

The $\kappa^{a_1 a_2 \ldots a_m}$ have the properties that

$$\kappa^{a_1 a_2 a_3 \ldots a_{m-1} a_m} = \kappa^{a_2 a_3 \ldots a_{m-1} a_m a_1} \tag{52}$$

and

$$\kappa^{a_1 a_2 a_3 \ldots a_{m-1} a_m} - \kappa^{a_2 a_1 a_3 \ldots a_{m-1} a_m} = f^{a_1 a_2}{}_b\, \kappa^{b a_3 \ldots a_{m-1} a_m}. \tag{53}$$

With a cycle $\sigma = (i_1, i_2, \ldots, i_m) \equiv (i_2, \ldots, i_m, i_1)$ we associate the function

$$f_\sigma^{a_{i_1} a_{i_2} \ldots a_{i_m}}(z_{i_1}, z_{i_2}, \ldots, z_{i_m}) = \frac{\kappa^{a_{i_1} a_{i_2} \ldots a_{i_m}}}{(z_{i_1} - z_{i_2})(z_{i_2} - z_{i_3}) \cdots (z_{i_{m-1}} - z_{i_m})(z_{i_m} - z_{i_1})}. \tag{54}$$


If the permutation $\rho \in S_n$ has no fixed points, it can be written as the product of cycles of length at least 2, $\rho = \sigma_1 \sigma_2 \ldots \sigma_M$. We associate to $\rho$ the product $f_\rho$ of functions $f_{\sigma_1} f_{\sigma_2} \ldots f_{\sigma_M}$ and define $\langle J^{a_1}(z_1)J^{a_2}(z_2) \ldots J^{a_n}(z_n) \rangle$ to be the sum of such functions $f_\rho$ over permutations $\rho \in S_n$ with no fixed point. Graphically, we can construct these amplitudes by summing over all graphs with $n$ vertices where the vertices carry labels $a_j$, $1 \le j \le n$, and each vertex is connected by two directed lines (propagators) to other vertices, one of the lines at each vertex pointing towards it and one away. Thus, in a given graph, the vertices are divided into directed loops or cycles, each loop containing at least two vertices. To each loop, we associate a function as in (54) and to each graph we associate the product of functions associated to the loops of which it is composed. Again, this defines the amplitudes on a basis of $V$ and we extend the definition by multilinearity. The amplitudes are evidently local and meromorphic, and one can verify that they satisfy the Möbius covariance property with the weight of $J^a$ being 1.

The amplitudes determine the operator product expansion to be of the form

$$J^a(z)J^b(w) \sim \frac{\kappa^{ab}}{(z - w)^2} + \frac{f^{ab}{}_c\, J^c(w)}{(z - w)} + O(1), \tag{55}$$

and the algebra therefore becomes

$$[J^a_m, J^b_n] = f^{ab}{}_c\, J^c_{m+n} + m\kappa^{ab}\delta_{m,-n}. \tag{56}$$

This is (a representation of) the affine algebra $\hat{g}$. In the particular case where $g$ is simple, $\kappa^{ab} = \mathrm{tr}(K t^a t^b) = k\delta^{ab}$, for some $k$, if we choose a suitable basis.

5.3. The Virasoro theory. Again following Frenkel and Zhu [20], we can construct the Virasoro theory in a similar way. In this case, the space $V$ is one-dimensional, spanned by a vector $L$ of weight 2, and we write $L(z) = V(L, z)$. We can again construct the amplitudes graphically by summing over all graphs with $n$ vertices, where the vertices are labelled by the integers $1 \le j \le n$, and each vertex is connected by two lines (propagators) to other vertices. In a given graph, the vertices are now divided into loops, each loop containing at least two vertices. To each loop $\ell = (i_1, i_2, \ldots, i_m)$, we associate a function

$$f_\ell(z_{i_1}, z_{i_2}, \ldots, z_{i_m}) = \frac{c/2}{(z_{i_1} - z_{i_2})^2 (z_{i_2} - z_{i_3})^2 \cdots (z_{i_{m-1}} - z_{i_m})^2 (z_{i_m} - z_{i_1})^2}, \tag{57}$$

where $c$ is a real number, and, to a graph, the product of the functions associated to its loops. [Since it corresponds to a factor of the form $(z_i - z_j)^{-2}$ rather than $(z_i - z_j)^{-1}$, each line or propagator might appropriately be represented by a double line.] The amplitudes $\langle L(z_1)L(z_2) \ldots L(z_n) \rangle$ are then obtained by summing the functions associated with the various graphs with $n$ vertices. [Note graphs related by reversing the direction of any loop contribute equally to this sum.]

These amplitudes determine the operator product expansion to be

$$L(z)L(\zeta) \sim \frac{c/2}{(z - \zeta)^4} + \frac{2L(\zeta)}{(z - \zeta)^2} + \frac{L'(\zeta)}{z - \zeta} + O(1), \tag{58}$$

which leads to the Virasoro algebra

$$[L_m, L_n] = (m - n)L_{m+n} + \frac{c}{12}\, m(m^2 - 1)\delta_{m,-n}. \tag{59}$$
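The loop construction of the Virasoro amplitudes can be sketched as a sum over permutations. The code below rests on an assumption about the counting that goes beyond what the text spells out: each directed loop is counted once, so that the sum runs over fixed-point-free permutations as in Sect. 5.2, with (57) attached to every cycle. The function names are illustrative.

```python
# Sketch: Virasoro amplitudes <L(z_1) ... L(z_n)> as a sum over
# fixed-point-free permutations, with the function (57) for each cycle.
# Assumption: directed loops are counted once each (as in Sect. 5.2).
from fractions import Fraction
from itertools import permutations


def cycles(perm):
    """Cycle decomposition of a permutation given as a tuple of images."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, j = [], start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = perm[j]
        result.append(cycle)
    return result


def loop_function(zs, cycle, c):
    """Equation (57): (c/2) divided by the squared differences around the loop."""
    value = c / 2
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        value /= (zs[a] - zs[b]) ** 2
    return value


def virasoro_amplitude(zs, c):
    """Sum over permutations without fixed points of the product over cycles."""
    n = len(zs)
    total = Fraction(0)
    for perm in permutations(range(n)):
        if any(perm[i] == i for i in range(n)):
            continue
        term = Fraction(1)
        for cycle in cycles(perm):
            term *= loop_function(zs, cycle, c)
        total += term
    return total


if __name__ == "__main__":
    print(virasoro_amplitude([Fraction(0), Fraction(2)], Fraction(5)))
```

Under this counting the two-point function is $(c/2)/(z_1 - z_2)^4$, matching the leading term of (58), and the three-point function is $c/\big((z_1 - z_2)^2 (z_2 - z_3)^2 (z_3 - z_1)^2\big)$, since the two orientations of the single three-loop contribute equally.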


5.4. Lattice theories. Suppose that $\Lambda$ is an even $n$-dimensional Euclidean lattice, so that, if $k \in \Lambda$, $k^2$ is an even integer. We introduce a basis $e_1, e_2, \ldots, e_n$ for $\Lambda$, so that any element $k$ of $\Lambda$ is an integral linear combination of these basis elements. We can introduce an algebra consisting of matrices $\gamma_j$, $1 \le j \le n$, such that $\gamma_j^2 = 1$ and $\gamma_i\gamma_j = (-1)^{e_i \cdot e_j}\gamma_j\gamma_i$. If we define $\gamma_k = \gamma_1^{m_1}\gamma_2^{m_2} \ldots \gamma_n^{m_n}$ for $k = m_1e_1 + m_2e_2 + \ldots + m_ne_n$, we can define quantities $\epsilon(k_1, k_2, \ldots, k_N)$, taking the values $\pm 1$, by

$$\gamma_{k_1}\gamma_{k_2} \ldots \gamma_{k_N} = \epsilon(k_1, k_2, \ldots, k_N)\,\gamma_{k_1+k_2+\ldots+k_N}. \tag{60}$$

We define the theory associated to the lattice $\Lambda$ by taking $V$ to have a basis $\{\psi_k : k \in \Lambda\}$, where the weight of $\psi_k$ is $\frac{1}{2}k^2$, and, writing $V(\psi_k, z) = V(k, z)$, the amplitudes to be

$$\langle V(k_1, z_1)V(k_2, z_2) \cdots V(k_N, z_N) \rangle = \epsilon(k_1, k_2, \ldots, k_N) \prod_{1 \le i < j \le N} (z_i - z_j)^{k_i \cdot k_j} \tag{61}$$

if $k_1 + k_2 + \ldots + k_N = 0$ and zero otherwise. The $\epsilon(k_1, k_2, \ldots, k_N)$ obey the conditions

$$\epsilon(k_1, \ldots, k_{j-1}, k_j, k_{j+1}, k_{j+2}, \ldots, k_N) = (-1)^{k_j \cdot k_{j+1}}\, \epsilon(k_1, \ldots, k_{j-1}, k_{j+1}, k_j, k_{j+2}, \ldots, k_N),$$

which guarantees locality, and

$$\epsilon(k_1, \ldots, k_j)\,\epsilon(k_{j+1}, \ldots, k_N) = \epsilon(k_1 + \ldots + k_j, k_{j+1} + \ldots + k_N)\,\epsilon(k_1, \ldots, k_j, k_{j+1}, \ldots, k_N),$$

which implies the cluster decomposition property of Sect. 6.

5.5. A non-conformal example. The above examples actually define meromorphic conformal field theories, but since we have not yet defined what we mean by a theory to be conformal, it is instructive to consider also an example that satisfies the above axioms but is not conformal. The simplest such case is a slight modification of the U(1) example described in 5.1: again we take $V$ to be a one-dimensional vector space, spanned by a vector $K$, but now the grade of $K$ is taken to be 2. Writing $K(z) \equiv V(K, z)$, the amplitudes of an odd number of $K$-fields vanish, and in the case of an even number we have

$$\langle K(z_1) \cdots K(z_{2n}) \rangle = \frac{k^n}{2^n n!} \sum_{\pi \in S_{2n}} \prod_{j=1}^{n} \frac{1}{(z_{\pi(j)} - z_{\pi(j+n)})^4}. \tag{62}$$

It is not difficult to check that these amplitudes satisfy all the axioms we have considered so far. In this case the operator product expansion is of the form

$$K(z)K(w) \sim \frac{k}{(z - w)^4} + O(1), \tag{63}$$

and the algebra of modes is given by

$$[K_n, K_m] = \frac{k}{6}\, n(n^2 - 1)\delta_{n,-m}. \tag{64}$$
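The central terms in (50), (59) and (64) follow one pattern. Assuming that the only vacuum contribution in (47) is the term with $V_h(\Phi)\Phi = \mathrm{norm}\cdot\Omega$, where norm is the coefficient of the leading pole $(z - w)^{-2h}$ (an assumption read off from the operator product expansions, with illustrative function names), the coefficient of $\delta_{m,-n}$ is $\mathrm{norm}\cdot\binom{m + h - 1}{2h - 1}$:

```python
# Sketch: central terms of the mode algebras from the leading OPE pole.
# Assumption: the central term is norm * C(m + h - 1, 2h - 1), where
# norm is the coefficient of (z - w)^(-2h) in the OPE of a weight-h field.
from fractions import Fraction


def binomial(x, r):
    """Generalised binomial x(x-1)...(x-r+1)/r! for any integer x, r >= 0."""
    result = Fraction(1)
    for i in range(r):
        result *= Fraction(x - i, i + 1)
    return result


def central_term(m, h, norm):
    """Coefficient of delta_{m,-n} in the commutator of modes m and n."""
    return norm * binomial(m + h - 1, 2 * h - 1)


if __name__ == "__main__":
    k, c = Fraction(3), Fraction(7)
    for m in range(-3, 4):
        print(m,
              central_term(m, 1, k),      # U(1): m k, as in (50)
              central_term(m, 2, c / 2),  # Virasoro: c m (m^2 - 1) / 12, as in (59)
              central_term(m, 2, k))      # Sect. 5.5: k m (m^2 - 1) / 6, as in (64)
```

The generalised binomial is used because the mode index $m$ may be negative; for $h = 2$ it reduces to $(m + 1)m(m - 1)/6$.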


6. Cluster Decomposition

So far the axioms we have formulated do not impose any restrictions on the relative normalisation of amplitudes involving, for example, a different number of vectors in $V$, and the class of theories we are considering is therefore rather flexible. This is mirrored by the fact that it does not yet follow from our considerations that the spectrum of the operator $L_0$ is bounded from below, and since $L_0$ is in essence the energy of the corresponding physical theory, we may want to impose this constraint. In fact, we would like to impose the slightly stronger condition that the spectrum of $L_0$ is bounded below by $0$, and that there is precisely one state with eigenvalue equal to zero. This will follow (as we shall show momentarily) from the cluster decomposition property, which states that if we separate the variables of an amplitude into two sets and scale one set towards a fixed point (e.g. $0$ or $\infty$) the behaviour of the amplitude is dominated by the product of two amplitudes, corresponding to the two sets of variables, multiplied by an appropriate power of the separation, specifically

$$\Big\langle \prod_i V(\phi_i, \zeta_i) \prod_j V(\psi_j, \lambda z_j) \Big\rangle \sim \Big\langle \prod_i V(\phi_i, \zeta_i) \Big\rangle \Big\langle \prod_j V(\psi_j, z_j) \Big\rangle \lambda^{-\Sigma h_j} \tag{65}$$

as $\lambda \to 0$, where $\phi_i \in V_{h'_i}$, $\psi_j \in V_{h_j}$. It follows from Möbius invariance that this is equivalent to

$$\Big\langle \prod_i V(\phi_i, \lambda\zeta_i) \prod_j V(\psi_j, z_j) \Big\rangle \sim \Big\langle \prod_i V(\phi_i, \zeta_i) \Big\rangle \Big\langle \prod_j V(\psi_j, z_j) \Big\rangle \lambda^{-\Sigma h'_i} \tag{66}$$

as $\lambda \to \infty$. The cluster decomposition property extends also to vectors $\Phi_i, \Psi_j \in \mathcal{H}$. It is not difficult to check that the examples of the previous section satisfy this condition.

We can use the cluster decomposition property to show that the spectrum of $L_0$ is non-negative and that the vacuum is, in a sense, unique. To this end let us introduce the projection operators defined by

$$P_N = \oint_0 u^{L_0 - N - 1}\, du, \quad \text{for } N \in \mathbb{Z}. \tag{67}$$

In particular, we have

$$P_N \prod_j V(\psi_j, z_j)\Omega = \oint u^{h - N - 1} \prod_j V(\psi_j, uz_j)\Omega\, du, \tag{68}$$

where $h = \sum_j h_j$. It then follows that the $P_N$ are projection operators

$$P_N P_M = 0, \ \text{if } N \neq M, \quad P_N^2 = P_N, \quad \sum_N P_N = 1 \tag{69}$$

onto the eigenspaces of $L_0$,

$$L_0 P_N = N P_N. \tag{70}$$
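On an eigenvector of $L_0$ with integer eigenvalue $h$, the operator $P_N$ of (67) reduces to the scalar contour integral $\oint_0 u^{h-N-1}\,du$ (with the factor $1/2\pi i$ absorbed), which equals $\delta_{h,N}$. The sketch below evaluates this integral numerically; the function name and the discretisation are illustrative choices.

```python
# Sketch: the scalar contour integral behind the projectors P_N of (67),
#   (1/2 pi i) * contour integral of u^(h - N - 1) around 0 = delta_{h,N},
# evaluated by the trapezoidal rule on a circle.
import cmath


def projector_eigenvalue(h, N, radius=0.5, steps=64):
    """Eigenvalue of P_N on an L_0 eigenvector with integer eigenvalue h."""
    total = 0
    for s in range(steps):
        u = radius * cmath.exp(2j * cmath.pi * s / steps)
        # du = i u dtheta, so (1/2 pi i) du becomes an average of f(u) * u
        total += u ** (h - N - 1) * u
    return total / steps


if __name__ == "__main__":
    print(abs(projector_eigenvalue(3, 3)), abs(projector_eigenvalue(3, 1)))
```

This reproduces the properties (69) and (70) eigenvalue by eigenvalue: $P_N$ acts as the identity on the eigenspace with $L_0 = N$ and annihilates all others.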


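For $N \le 0$, we then have

$$\Big\langle \prod_i V(\phi_i, \zeta_i)\, P_N \prod_j V(\psi_j, z_j) \Big\rangle = \oint_0 u^{\Sigma h_j - N - 1} \Big\langle \prod_i V(\phi_i, \zeta_i) \prod_j V(\psi_j, uz_j) \Big\rangle du \sim \Big\langle \prod_i V(\phi_i, \zeta_i) \Big\rangle \Big\langle \prod_j V(\psi_j, z_j) \Big\rangle \oint_{|u|=\rho} u^{-N-1}\, du,$$

which, by taking $\rho \to 0$, is seen to vanish for $N < 0$ and, for $N = 0$, to give

$$P_0 \prod_j V(\psi_j, z_j)\Omega = \Omega\, \Big\langle \prod_j V(\psi_j, z_j) \Big\rangle, \tag{71}$$

and so $P_0\Psi = \Omega\langle \Psi \rangle$. Thus the cluster decomposition property implies that $P_N = 0$ for $N < 0$, i.e. the spectrum of $L_0$ is non-negative, and that $\mathcal{H}_0$ is spanned by the vacuum $\Omega$, which is thus the unique state with $L_0 = 0$.

As we have mentioned before, the absence of negative eigenvalues of $L_0$ gives an upper bound on the order of the pole in the operator product expansion of two vertex operators, and thus an upper bound on the sum in (47): if $\Phi, \Psi \in \mathcal{H}$ are of grade $L_0\Phi = h_\Phi\Phi$, $L_0\Psi = h_\Psi\Psi$, we have that $V_n(\Phi)\Psi = 0$ for $n > h_\Psi$, because otherwise $V_n(\Phi)\Psi$ would have a negative eigenvalue, $h_\Psi - n$, with respect to $L_0$. In particular, this shows that the leading singularity in $V(\Phi, z)V(\Psi, \zeta)$ is at most $(z - \zeta)^{-h_\Psi - h_\Phi}$.

The cluster property also implies that the space of states of the meromorphic field theory does not have any proper invariant subspaces in a suitable sense. To make this statement precise we must first give meaning to a subspace of the space of states of a conformal field theory. The space of states of the theory is really the collection of topological spaces $\mathcal{V}^{\mathcal{O}}$, where $\mathcal{O}$ is an open subset of $\mathbb{P}$ whose complement is path-connected. Recall that $\mathcal{V}^{\mathcal{O}} \subset \mathcal{V}^{\mathcal{O}'}$ if $\mathcal{O} \supset \mathcal{O}'$. By a subspace of the conformal field theory we shall mean subspaces $\mathcal{U}^{\mathcal{O}} \subset \mathcal{V}^{\mathcal{O}}$ specified for each open subset $\mathcal{O} \subset \mathbb{P}$ with path-connected complement, such that $\mathcal{U}^{\mathcal{O}} = \mathcal{U}^{\mathcal{O}'} \cap \mathcal{V}^{\mathcal{O}}$ if $\mathcal{O} \supset \mathcal{O}'$.

Proposition 5. Suppose $\{\mathcal{U}^{\mathcal{O}}\}$ is an invariant closed subspace of $\{\mathcal{V}^{\mathcal{O}}\}$, i.e. $\mathcal{U}^{\mathcal{O}}$ is closed; $\mathcal{U}^{\mathcal{O}} = \mathcal{U}^{\mathcal{O}'} \cap \mathcal{V}^{\mathcal{O}}$ if $\mathcal{O} \supset \mathcal{O}'$; and $V(\psi, z)\mathcal{U}^{\mathcal{O}} \subset \mathcal{U}^{\mathcal{O}'}$ for all $\psi \in V$, where $z \in \mathcal{O}$, $z \notin \mathcal{O}' \subset \mathcal{O}$. Then $\{\mathcal{U}^{\mathcal{O}}\}$ is not a proper subspace, i.e. either $\mathcal{U}^{\mathcal{O}} = \mathcal{V}^{\mathcal{O}}$ for all $\mathcal{O}$, or $\mathcal{U}^{\mathcal{O}} = \{0\}$.

Proof. Suppose that $\phi \in \mathcal{U}^{\mathcal{O}}$, $\psi_j \in V$, $z_j \in \mathcal{O}$, $z_j \notin \mathcal{O}' \subset \mathcal{O}$ and consider

$$\prod_{j=1}^{n} V(\psi_j, z_j)\phi \in \mathcal{U}^{\mathcal{O}'}. \tag{72}$$

Now, taking a suitable integral of the left-hand side,

$$P_0 \prod_{j=1}^{n} V(\psi_j, z_j)\phi = \lambda\Omega = \Omega\, \Big\langle \prod_{j=1}^{n} V(\psi_j, z_j)\phi \Big\rangle. \tag{73}$$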


Thus either all the amplitudes involving $\phi$ vanish for all $\phi \in U$, in which case $U = \{0\}$, or $\Omega \in U^{\mathcal{O}'}$ for some $\mathcal{O}'$, in which case it is easy to see that $\Omega \in U^{\mathcal{O}}$ for all $\mathcal{O}$ and it follows that $U^{\mathcal{O}} = \mathcal{V}^{\mathcal{O}}$ for all $\mathcal{O}$.

The cluster property also implies that the image of $\mathcal{B}_{\mathcal{C}}$ in $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ is faithful. To show that the images of the elements $\psi, \psi' \in \mathcal{B}_{\mathcal{C}}$ are distinct we note that otherwise $\eta_\phi(\psi) = \eta_\phi(\psi')$ for all $\phi \in \mathcal{V}_{\mathcal{O}}$ with $\phi$ as in (4). Taking $m$ in (4) to be sufficiently large, we divide the $\zeta_i$, $1 \le i \le m$, into $n$ groups which we allow to approach the $z_j$, $1 \le j \le n$, successively. The cluster property then shows that these must be the same points as the $z'_j$, $1 \le j \le n'$, in $\psi'$ and that $\psi_j = \mu_j \psi'_j$ for some $\mu_j \in \mathbb{C}$ with $\prod_{j=1}^{n} \mu_j = 1$, establishing that $\psi = \psi'$ as elements of $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$. $\square$

7. Conformal Symmetry

So far our axioms do not require that our amplitudes correspond to a conformal field theory, only that the theory have a Möbius invariance, and indeed, as we shall see, the example in 5.3 is not conformally invariant. Further, what we shall discuss in the sections which follow the present one will not depend on a conformal structure, except where we explicitly mention it; in this sense, the present section is somewhat of an interlude. On the other hand, the conformal symmetry is crucial for more sophisticated considerations, in particular the theory on higher genus Riemann surfaces, and therefore forms a very important part of the general framework.

Let us first describe a construction by means of which a potentially new theory can be associated to a given theory, and then explain in terms of this construction what it means for a theory to be conformal. Suppose we are given a theory that is specified by a space $V$ and amplitudes $\mathcal{A} = \{f\}$. Let us denote by $\hat{V}$ the vector space that is obtained from $V$ by appending a vector $L$ of grade two, and let us write $V(L, z) = L(z)$. The amplitudes involving only fields in $V$ are given as before, and the amplitude

\[
\Big\langle \prod_{j=1}^{m} L(w_j) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle, \tag{74}
\]

where $\psi_i \in V_{h_i}$, is defined as follows: we associate to each of the $n+m$ fields a point, and then consider the (ordered) graphs consisting of loops where each loop contains at most one of the points associated to the $\psi_i$, and each point associated to an $L$ is a vertex of precisely one loop. (The points associated to $\psi_i$ may be vertices of an arbitrary number of loops.) To each loop whose vertices only consist of points corresponding to $L$ we associate the same function as before in Sect. 5.3, and to the loop $(z_i, w_{\pi(1)}, \ldots, w_{\pi(l)})$ we associate the expression

\[
\prod_{j=1}^{l-1} \frac{1}{(w_{\pi(j)} - w_{\pi(j+1)})^2}
\left( \frac{h_i}{(w_{\pi(1)} - z_i)(w_{\pi(l)} - z_i)}
 + \frac{1}{2} \left[ \frac{1}{(w_{\pi(1)} - z_i)} \frac{d}{dz_i}
 + \frac{1}{(w_{\pi(l)} - z_i)} \frac{d}{dz_i} \right] \right). \tag{75}
\]

We then associate to each graph the product of the expressions associated to the different loops acting on the amplitude which is obtained from (74) upon removing $L(w_1) \cdots L(w_m)$, and the total amplitude is the sum of the functions associated to all such (ordered) graphs. (The product of the expressions of the form (75) is taken to be “normal ordered” in the sense that all derivatives with respect to $z_i$ only act on the amplitude that is obtained from (74) upon excising the $L$s; in this way, the product is independent of the order in which the expressions of the form (75) are applied.) We extend this definition by multilinearity to amplitudes defined for arbitrary states in $\hat{V}$. It follows immediately that the resulting amplitudes are local and meromorphic; in Appendix E we shall give a more explicit formula for the extended amplitudes, and use it to prove that the amplitudes also satisfy the Möbius covariance and the cluster property. In terms of conventional conformal field theory, the construction treats all quasiprimary states in $V$ as primary with respect to the Virasoro algebra of the extended theory; this is apparent from the formula given in Appendix E.

We can generalise this definition further by considering in addition graphs which contain “double loops” of the form $(z_i, w_j)$ for those points $z_i$ which correspond to states in $V$ of grade two, where in this case neither $z_i$ nor $w_j$ can be a vertex of any other loop. We associate the function

\[
\frac{c_\psi / 2}{(z_i - w_j)^4} \tag{76}
\]

to each such loop (where $c_\psi$ is an arbitrary linear functional on the states of weight two in $V$), and the product of the different expressions corresponding to the different loops in the graph acts in this case on the amplitude (74), where in addition to all $L$-fields also the fields corresponding to $V(\psi_i, z_i)$ (for each $i$ which appears in a double loop) have been removed. It is easy to see that this generalisation also satisfies all axioms.
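As a hedged illustration of the loop rule (75), consider the simplest case of a single field $L(w)$, i.e. $m = 1$, without the double loops of (76). Assuming that the single point $w$ cannot form a loop by itself (an assumption about the loop functions of Sect. 5.3, which are not restated here), every graph consists of one loop $(z_i, w)$ with $l = 1$. The product in (75) is then empty and the two derivative terms coincide, so the sum over graphs would read:

```latex
\[
\Big\langle L(w) \prod_{j=1}^{n} V(\psi_j, z_j) \Big\rangle
 = \sum_{i=1}^{n} \left( \frac{h_i}{(w - z_i)^2}
   + \frac{1}{w - z_i}\, \frac{d}{dz_i} \right)
   \Big\langle \prod_{j=1}^{n} V(\psi_j, z_j) \Big\rangle .
\]
```

Under these assumptions this is the familiar form of the conformal Ward identity for fields of weight $h_i$, in line with the remark above that the construction treats the quasiprimary states of $V$ as primary.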
This construction typically modifies the structure of the meromorphic field theory in the sense that it changes the operator product expansion (and thus the commutators of the corresponding modes) of vectors in $V$; this is for example the case for the “nonconformal” model described in Sect. 5.5. If we introduce the field $L$ as described above, we find the commutation relations

\[
[L_m, K_n] = (m - n) K_{m+n} + \frac{c_K}{12}\, m (m^2 - 1)\, \delta_{m,-n}. \tag{77}
\]

However, this is incompatible with the original commutator in (64): the Jacobi identity requires that

\[
\begin{aligned}
0 &= [L_m, [K_n, K_l]] + [K_n, [K_l, L_m]] + [K_l, [L_m, K_n]] \\
  &= (l - m)\, [K_n, K_{l+m}] + (m - n)\, [K_l, K_{m+n}] \\
  &= \delta_{l+m+n,0}\, \frac{k}{6} \Big[ -(l - m)(l + m)\big((l + m)^2 - 1\big) + (2m + l)\, l\, (l^2 - 1) \Big] \\
  &= \delta_{l+m+n,0}\, \frac{k}{6}\, m (m^2 - 1)(2l + m),
\end{aligned}
\]

and this is not satisfied unless $k = 0$ (in which case the original theory is trivial). In fact, the introduction of $L$ modifies (64) as

\[
\begin{aligned}
[K_m, K_n] &= \frac{k}{6}\, m(m^2 - 1)\, \delta_{m,-n} + \frac{k}{a}\, (m - n) Z_{m+n}, \\
[L_m, K_n] &= \frac{c_K}{12}\, m(m^2 - 1)\, \delta_{m,-n} + (m - n) K_{m+n}, \\
[Z_m, K_n] &= 0, \\
[L_m, Z_n] &= \frac{a}{12}\, m(m^2 - 1)\, \delta_{m,-n} + (m - n) Z_{m+n}, \\
[L_m, L_n] &= \frac{c}{12}\, m(m^2 - 1)\, \delta_{m,-n} + (m - n) L_{m+n}, \\
[Z_m, Z_n] &= 0,
\end{aligned}
\]

where $a$ is non-zero and can be set to equal $k$ by rescaling $Z$, and the $Z_n$ are the modes of a field of grade two. This set of commutators then satisfies the Jacobi identities. It also follows from the fact that the commutators of $Z$ with $K$ and $Z$ vanish, that amplitudes that involve only $K$-fields and at least one $Z$-field vanish; in this way we recover the original amplitudes and commutators.

The construction actually depends on the choice of $V$ (as well as the values of $c_\psi$ and $c$), and therefore does not only depend on the equivalence class of meromorphic field theories. However, we can ask whether a given equivalence class of meromorphic field theories contains a representative $(V, \mathcal{A})$ (i.e. a choice of $V$ that gives an equivalent description of the theory) for which $(\hat{V}, \hat{\mathcal{A}})$ is equivalent to $(V, \mathcal{A})$; if this is the case, we call the meromorphic field theory conformal. It follows directly from the definition of equivalence that a meromorphic field theory is conformal if and only if there exists a representative $(V, \mathcal{A})$ and a vector $L^0 \in V$ (of grade two) so that

\[
\Big\langle \big( L(w) - L^0(w) \big) \prod_{j=1}^{n} V(\psi_j, z_j) \Big\rangle = 0 \tag{78}
\]

for all $\psi_j \in V$, where $L$ is defined as above. In this case, the linear functional $c_\psi$ is defined by $c_\psi = 2 (w - z)^4 \langle L^0(w) V(\psi, z) \rangle$. In the case of the non-conformal example of Sect. 5.5, it is clear that (78) cannot be satisfied as the Fock space only contains one vector of grade two, $L^0 = \alpha K_{-2} \Omega$, and $L^0$ does not satisfy (78) for any value of $\alpha$. On the other hand, for the example of Sect. 5.1, we can choose

\[
L^0 = \frac{1}{2k}\, J_{-1} J_{-1} \Omega, \tag{79}
\]

and this then satisfies (78). Similarly, in the case of the example of Sect. 5.2, we can choose

\[
L^0 = \frac{1}{2(k + Q)} \sum_a J^a_{-1} J^a_{-1} \Omega,
\]

where $Q$ is the dual Coxeter number of $g$ (i.e. the value of the quadratic Casimir in the adjoint representation), and again (78) is satisfied for this choice of $L^0$ (and the above choice of $V$). This construction is known as the “Sugawara construction”.

For completeness it should be mentioned that the modes of the field $L$ (that is contained in the theory in the conformal case) satisfy the Virasoro algebra

\[
[L_m, L_n] = (m - n) L_{m+n} + \frac{c}{12}\, m (m^2 - 1)\, \delta_{m,-n},
\]

where $c$ is the number that appears in the above definition of $L$. Furthermore, the modes $L_m$ with $m = 0, \pm 1$ agree with the Möbius generators of the theory.
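Two algebraic statements of this section can be checked mechanically. The sketch below is only an illustration, not part of the original argument: it verifies the polynomial identity behind the Jacobi obstruction (the bracketed expression multiplying $k/6$, with $n = -l-m$) on a grid of integers, and it checks the Jacobi identity for the Virasoro algebra on a finite range of modes. The function names (`jacobi_obstruction`, `virasoro_bracket`, and so on) and the dictionary encoding of algebra elements are hypothetical choices made for this example.

```python
from fractions import Fraction
from itertools import product


def jacobi_obstruction(l, m):
    """Bracketed polynomial of the Jacobi computation (factor k/6 removed)."""
    return -(l - m) * (l + m) * ((l + m) ** 2 - 1) + (2 * m + l) * l * (l ** 2 - 1)


def closed_form(l, m):
    """Simplified form m (m^2 - 1) (2l + m) quoted in the text."""
    return m * (m ** 2 - 1) * (2 * l + m)


def virasoro_bracket(m, n, c):
    """[L_m, L_n] as a dict {mode index or 'central': coefficient}."""
    out = {}
    if m != n:
        out[m + n] = Fraction(m - n)
    if m == -n and m * (m * m - 1) != 0:
        out["central"] = c * Fraction(m * (m * m - 1), 12)
    return out


def nested_bracket(m, n, l, c):
    """[L_m, [L_n, L_l]]; the central element commutes with everything."""
    total = {}
    for key, coeff in virasoro_bracket(n, l, c).items():
        if key == "central":
            continue
        for key2, coeff2 in virasoro_bracket(m, key, c).items():
            total[key2] = total.get(key2, 0) + coeff * coeff2
    return total


def virasoro_jacobi_holds(m, n, l, c):
    """True if the cyclic sum of nested brackets vanishes for L_m, L_n, L_l."""
    total = {}
    for a, b, d in ((m, n, l), (n, l, m), (l, m, n)):
        for key, coeff in nested_bracket(a, b, d, c).items():
            total[key] = total.get(key, 0) + coeff
    return all(value == 0 for value in total.values())


if __name__ == "__main__":
    grid = range(-6, 7)
    print(all(jacobi_obstruction(l, m) == closed_form(l, m) for l in grid for m in grid))
    print(all(virasoro_jacobi_holds(m, n, l, Fraction(1, 2))
              for m, n, l in product(range(-4, 5), repeat=3)))
```

Since both sides of the polynomial identity have degree four in $l$ and $m$, agreement on the grid used here already implies the identity. The central term of `virasoro_bracket` vanishes for $m = 0, \pm 1$, which is consistent with the statement that these modes generate the Möbius subalgebra.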


M. R. Gaberdiel, P. Goddard

8. Representations

In order to introduce the concept of a representation of a meromorphic conformal field theory or conformal algebra, we consider a collection of densities more general than those used in Sect. 2 to define the meromorphic conformal field theory itself. The densities we now consider are typically defined on a cover of the Riemann sphere, $\mathbb{P}$, rather than $\mathbb{P}$ itself. We consider densities which are functions of variables $u_i$, $1 \le i \le N$, and $z_j$, $1 \le j \le n$, which are analytic if no two of these $N + n$ variables are equal, may have poles at $z_i = z_j$, $i \ne j$, or $z_i = u_j$, and may be branched about $u_i = u_j$, $i \ne j$. To define a representation, we need the case where $N = 2$, in which the densities are meromorphic in all but two of the variables.

Starting again with $V = \oplus_h V_h$, together with two finite-dimensional spaces $W_\alpha$ and $W_\beta$ (which may be one-dimensional), we suppose that, for each integer $n \ge 0$, and $z_i \in \mathbb{P}$ and $u_1, u_2$ on some branched cover of $\mathbb{P}$, and for any collection of vectors $\psi_i \in V_{h_i}$ and $\chi_1 \in W_\alpha$, $\chi_2 \in W_\beta$, we have a density

\[
\begin{aligned}
& g(\psi_1, \ldots, \psi_n; z_1, \ldots, z_n; \chi_1, \chi_2; u_1, u_2) \\
& \quad \equiv \langle V(\psi_1, z_1) V(\psi_2, z_2) \cdots V(\psi_n, z_n) W_\alpha(\chi_1, u_1) W_\beta(\chi_2, u_2) \rangle
 \prod_{j=1}^{n} (dz_j)^{h_j}\, (du_1)^{r_1} (du_2)^{r_2},
\end{aligned} \tag{80}
\]

where $r_1, r_2$ are real numbers, which we call the conformal weights of $\chi_1$ and $\chi_2$, respectively. The amplitudes

\[
\langle V(\psi_1, z_1) V(\psi_2, z_2) \cdots V(\psi_n, z_n) W_\alpha(\chi_1, u_1) W_\beta(\chi_2, u_2) \rangle \tag{81}
\]

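are taken to be multilinear in the $\psi_j$ and $\chi_1, \chi_2$, and invariant under the exchange of $(\psi_i, z_i)$ with $(\psi_j, z_j)$, and meromorphic as a function of the $z_j$, analytic except for possible poles at $z_i = z_j$, $i \ne j$, and $z_i = u_1$ or $z_i = u_2$. As functions of $u_1, u_2$, the amplitudes are analytic except for the possible poles at $u_1 = z_i$ or $u_2 = z_i$ and a possible branch cut at $u_1 = u_2$. We denote a collection of such densities by $\mathcal{R} = \{g\}$.

Just as before, given an open set $\mathcal{C} \subset \mathbb{P}$ we introduced spaces $\mathcal{B}_{\mathcal{C}}$, whose elements are of the form (3), so we can now introduce sets, $\mathcal{B}_{\mathcal{C}\alpha\beta}$, labelled by finite collections of $\psi_i \in V_{h_i}$, $z_i \in \mathcal{C} \subset \mathbb{P}$, $i = 1, \ldots, n$, $n \in \mathbb{N}$ and $z_i \ne z_j$ if $i \ne j$, together with $\chi_1 \in W_\alpha$, $\chi_2 \in W_\beta$ and $u_1, u_2 \in \mathcal{C}$, $u_1 \ne u_2$ and $z_i \ne u_j$, denoted by

\[
\boldsymbol{\chi} = V(\psi_1, z_1) V(\psi_2, z_2) \cdots V(\psi_n, z_n) W_\alpha(\chi_1, u_1) W_\beta(\chi_2, u_2)
 \equiv \prod_{i=1}^{n} V(\psi_i, z_i)\, W_\alpha(\chi_1, u_1) W_\beta(\chi_2, u_2). \tag{82}
\]

We again immediately identify different $\boldsymbol{\chi} \in \mathcal{B}_{\mathcal{C}\alpha\beta}$ with the other elements of $\mathcal{B}_{\mathcal{C}\alpha\beta}$ obtained by replacing each $\psi_j$ in (82) by $\mu_j \psi_j$, $1 \le j \le n$, $\chi_i$ by $\lambda_i \chi_i$, $i = 1, 2$, where $\lambda_1, \lambda_2, \mu_j \in \mathbb{C}$ and $\lambda_1 \lambda_2 \prod_{j=1}^{n} \mu_j = 1$. Proceeding as before, we introduce the vector space $\mathcal{V}_{\mathcal{C}\alpha\beta}$ with basis $\mathcal{B}_{\mathcal{C}\alpha\beta}$ and we cut it down to size exactly as before, i.e. we note that if we introduce another open set $\mathcal{O} \subset \mathbb{P}$, with $\mathcal{O} \cap \mathcal{C} = \emptyset$, and, as in (4) write

\[
\boldsymbol{\phi} = V(\phi_1, \zeta_1) V(\phi_2, \zeta_2) \cdots V(\phi_m, \zeta_m) \in \mathcal{B}_{\mathcal{O}}, \tag{83}
\]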

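where $\phi_j \in V_{k_j}$, $j = 1, \ldots, m$, each $\boldsymbol{\phi} \in \mathcal{B}_{\mathcal{O}}$ defines a map on $\mathcal{B}_{\mathcal{C}\alpha\beta}$ by

\[
\eta_\phi(\boldsymbol{\chi}) = (\boldsymbol{\phi}, \boldsymbol{\chi})
 = \Big\langle \prod_{i=1}^{m} V(\phi_i, \zeta_i) \prod_{i=1}^{n} V(\psi_i, z_i)\, W_\alpha(\chi_1, u_1) W_\beta(\chi_2, u_2) \Big\rangle. \tag{84}
\]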

Again $\eta_\phi$ extends by linearity to a map $\mathcal{V}_{\mathcal{C}\alpha\beta} \to \mathbb{C}$ and we consider the space, $\tilde{\mathcal{V}}_{\mathcal{C}\alpha\beta}$, of sequences $X = (X_1, X_2, \ldots)$, $X_j \in \mathcal{V}_{\mathcal{C}\alpha\beta}$, for which $\eta_\phi(X_j)$ converges uniformly on each of the family of compact sets of the form (7). We write $\eta_\phi(X) = \lim_{j \to \infty} \eta_\phi(X_j)$ and define the space $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}\alpha\beta}$ as being composed of the equivalence classes of such sequences, identifying two sequences $X^1, X^2$, if $\eta_\phi(X^1) = \eta_\phi(X^2)$ for all $\phi \in \mathcal{B}_{\mathcal{O}}$. Using the same arguments as in the proof of Theorem 1 (see Appendix C), it can be shown that the space $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}\alpha\beta}$ is independent of $\mathcal{C}$, provided that the complement of $\mathcal{O}$ is path-connected; in this case we write $\mathcal{V}^{\mathcal{O}}_{\alpha\beta} \equiv \mathcal{V}^{\mathcal{O}}_{\mathcal{C}\alpha\beta}$. We can define a family of seminorms for $\mathcal{V}^{\mathcal{O}}_{\alpha\beta}$ by $\|X\|_\phi = |\eta_\phi(X)|$, where $\phi$ is an arbitrary element of $\mathcal{B}_{\mathcal{O}}$, and the natural topology on $\mathcal{V}^{\mathcal{O}}_{\alpha\beta}$ is the topology that is induced by this family of seminorms. (This is to say, that a sequence of states $X_j \in \mathcal{V}^{\mathcal{O}}_{\alpha\beta}$ converges if and only if $\eta_\phi(X_j)$ converges for every $\phi \in \mathcal{B}_{\mathcal{O}}$.)

So far we have not specified a relationship between the spaces $\mathcal{V}^{\mathcal{O}}$, which define the conformal field theory, and the new spaces $\mathcal{V}^{\mathcal{O}}_{\alpha\beta}$, which we have now introduced to define a representation of it. Such a relation is an essential part of the definition of a representation; it has to express the idea that the two spaces define the same relations between combinations of vectors in the sets $\mathcal{B}_{\mathcal{C}}$. To do this consider the space of all continuous linear functionals on $\mathcal{V}^{\mathcal{O}}$, the dual space of $\mathcal{V}^{\mathcal{O}}$, which we will denote $(\mathcal{V}^{\mathcal{O}})'$, and also the dual, $(\mathcal{V}^{\mathcal{O}}_{\alpha\beta})'$, of $\mathcal{V}^{\mathcal{O}}_{\alpha\beta}$. It is natural to consider these dual spaces as topological vector spaces with the weak topology: for each $f \in (\mathcal{V}^{\mathcal{O}})'$, we can consider the (uncountable) family of seminorms defined by $\|f\|_\Psi \equiv |f(\Psi)|$, where $\Psi$ is an arbitrary element of $\mathcal{V}^{\mathcal{O}}$ (and similarly for $(\mathcal{V}^{\mathcal{O}}_{\alpha\beta})'$). The weak topology is then the topology that is induced by this family of seminorms (so that $f_n \to f$ if and only if $f_n(\Psi) \to f(\Psi)$ for each $\Psi \in \mathcal{V}^{\mathcal{O}}$).

Every element of $\phi \in \mathcal{B}_{\mathcal{O}}$ defines a continuous linear functional both on $\mathcal{V}^{\mathcal{O}}$ and on $\mathcal{V}^{\mathcal{O}}_{\alpha\beta}$, each of which we shall denote by $\eta_\phi$, and the linear span of the set of all linear functionals that arise in this way is dense in both $(\mathcal{V}^{\mathcal{O}})'$ and $(\mathcal{V}^{\mathcal{O}}_{\alpha\beta})'$. We therefore have a map from a dense subspace of $(\mathcal{V}^{\mathcal{O}})'$ to a dense subspace of $(\mathcal{V}^{\mathcal{O}}_{\alpha\beta})'$, and the condition for the amplitudes (80) to define a representation of the meromorphic (conformal) field theory whose spaces of states are given by $\mathcal{V}^{\mathcal{O}}$ is that this map extends to a continuous map between the dual spaces, i.e. that there exists a continuous map

\[
\iota : (\mathcal{V}^{\mathcal{O}})' \to (\mathcal{V}^{\mathcal{O}}_{\alpha\beta})' \quad \text{such that} \quad \iota(\eta_\phi) = \eta_\phi. \tag{85}
\]

This in essence says that $\mathcal{V}^{\mathcal{O}}_{\alpha\beta}$ will not distinguish limits of linear combinations of $\mathcal{B}_{\mathcal{O}}$ not distinguished by $\mathcal{V}^{\mathcal{O}}$.

Given a collection of densities $\mathcal{R}$ we can construct (in a similar way as before for the collection of amplitudes $\mathcal{A}$) spaces of states $\mathcal{V}^{\mathcal{O}}_\alpha$ and $\mathcal{V}^{\mathcal{O}}_\beta$, on which the vertex operators of the meromorphic theory are well-defined operators. By the by now familiar scheme, let us introduce the set $\mathcal{B}_{\mathcal{C}\alpha}$ that is labelled by finite collections of $\psi_i \in V_{h_i}$, $z_i \in \mathcal{C} \subset \mathbb{P}$,

$i = 1, \ldots, n$, $n \in \mathbb{N}$ and $z_i \ne z_j$ if $i \ne j$, together with $\chi \in W_\alpha$ and $u \in \mathcal{C}$, $z_i \ne u$, denoted by

\[
\boldsymbol{\chi} = V(\psi_1, z_1) V(\psi_2, z_2) \cdots V(\psi_n, z_n) W_\alpha(\chi, u)
 \equiv \prod_{i=1}^{n} V(\psi_i, z_i)\, W_\alpha(\chi, u). \tag{86}
\]

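We again immediately identify different $\boldsymbol{\chi} \in \mathcal{B}_{\mathcal{C}\alpha}$ with the other elements of $\mathcal{B}_{\mathcal{C}\alpha}$ obtained by replacing each $\psi_j$ in (86) by $\mu_j \psi_j$, $1 \le j \le n$, $\chi$ by $\lambda \chi$, where $\lambda, \mu_j \in \mathbb{C}$ and $\lambda \prod_{j=1}^{n} \mu_j = 1$. We also define $\mathcal{B}_{\mathcal{C}\beta}$ analogously (by replacing $\chi \in W_\alpha$ by $\chi \in W_\beta$).

We then introduce the vector space $\mathcal{V}_{\mathcal{C}\alpha}$ with basis $\mathcal{B}_{\mathcal{C}\alpha}$, and we cut it down to size exactly as before by considering the map analogous to (84), where now $\phi \in \mathcal{B}_{\mathcal{O}\beta}$. The resulting space is denoted by $\mathcal{V}^{\mathcal{O}\beta}_{\mathcal{C}\alpha}$, and is again independent of $\mathcal{C}$ provided that the complement of $\mathcal{O}$ is path-connected; in this case we write $\mathcal{V}^{\mathcal{O}\beta}_{\alpha} \equiv \mathcal{V}^{\mathcal{O}\beta}_{\mathcal{C}\alpha}$. It also has a natural topology induced by the seminorms $|\eta_\phi(X)|$, where now $\phi \in \mathcal{B}_{\mathcal{O}\beta}$. With respect to this topology, the span of $\mathcal{B}_{\mathcal{C}\alpha}$ is dense in $\mathcal{V}^{\mathcal{O}\beta}_{\alpha}$. We can similarly consider the spaces $\mathcal{V}^{\mathcal{O}\alpha}_{\beta}$ by exchanging the rôles of $W_\alpha$ and $W_\beta$.

For $\psi \in V$, a vertex operator $V(\psi, z)$ can be defined as an operator $V(\psi, z) : \mathcal{V}^{\mathcal{O}\beta}_{\alpha} \to \mathcal{V}^{\mathcal{O}'\beta}_{\alpha}$, where $z \in \mathcal{O}$ but $z \notin \mathcal{O}' \subset \mathcal{O}$, by defining its action on the total subset $\mathcal{B}_{\mathcal{C}\alpha}$, where $\mathcal{C} \cap \mathcal{O} = \emptyset$,

\[
V(\psi, z) \boldsymbol{\chi} = V(\psi, z) V(\psi_1, z_1) \cdots V(\psi_n, z_n) W_\alpha(\chi, u),
\]

and $\boldsymbol{\chi} \in \mathcal{B}_{\mathcal{C}\alpha}$ is as in (86). The image is in $\mathcal{B}_{\mathcal{C}'\alpha}$ for any $\mathcal{C}' \supset \mathcal{C}$ which contains $z$, and we can choose $\mathcal{C}'$ such that $\mathcal{C}' \cap \mathcal{O}' = \emptyset$. This then extends by linearity to a map $\mathcal{V}_{\mathcal{C}\alpha} \to \mathcal{V}_{\mathcal{C}'\alpha}$, and we can show, by analogous arguments as before, that it induces a map $\mathcal{V}^{\mathcal{O}\beta}_{\mathcal{C}\alpha} \to \mathcal{V}^{\mathcal{O}'\beta}_{\mathcal{C}'\alpha}$.

By the same arguments as before in Sect. 4, this definition can be extended to vectors $\Psi$ of the form (34) that span the Fock space of the meromorphic theory. The actual Fock space $\mathcal{H}^{\mathcal{O}'}$ (that is typically a quotient space of the free vector space spanned by the vectors of the form (34)) is a subspace of $(\mathcal{V}^{\mathcal{O}})'$ provided that $\mathcal{O}' \cup \mathcal{O} = \mathbb{P}$, and if the amplitudes define a representation, $\iota(\mathcal{H}^{\mathcal{O}'}) \subset (\mathcal{V}^{\mathcal{O}}_{\alpha\beta})'$ because of (85). In this case it is then possible to define vertex operators $V(\Psi, z) : \mathcal{V}^{\mathcal{O}\beta}_{\alpha} \to \mathcal{V}^{\mathcal{O}'\beta}_{\alpha}$ for arbitrary elements of the Fock space $\mathcal{H}$, and this is what is usually thought to be the defining property of a representation. By the same argument the vertex operators are also well-defined for elements in $\mathcal{V}_{\mathcal{C}}$ for suitable $\mathcal{C}$.

There exists an alternative criterion for a set of densities to define a representation, which is in essence due to Montague [17], and which throws considerable light on the nature of conformal field theories and their representations. (Indeed, we shall use it to construct an example of a representation for the u(1)-theory below.)

Theorem 6. The densities (80) define a representation provided that, for each open set $\mathcal{O} \subset \mathbb{P}$ with path-connected complement and $u_1, u_2 \notin \mathcal{O}$, there is a state $\Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2) \in \mathcal{V}^{\mathcal{O}}$ that is equivalent to $W_\alpha(u_1, \chi_1) W_\beta(u_2, \chi_2)$ in the sense that the amplitudes of the representation are given by $\eta_\phi(\Sigma_{\alpha\beta})$: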

\[
\Big\langle \prod_{i=1}^{m} V(\phi_i, \zeta_i)\, W_\alpha(u_1, \chi_1) W_\beta(u_2, \chi_2) \Big\rangle
 = \Big\langle \prod_{i=1}^{m} V(\phi_i, \zeta_i)\, \Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2) \Big\rangle, \tag{87}
\]

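where $\zeta_i \in \mathcal{O}$.

The proof of this theorem depends on the following:

Lemma. There exist sequences $e_i \in \mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$, $f_i \in (\mathcal{V}^{\mathcal{O}}_{\mathcal{C}})'$, dense in the appropriate topologies, such that $f_j(e_i) = \delta_{ij}$ and such that $\sum_{i=1}^{\infty} e_i f_i(\Psi)$ converges to $\Psi$ for all $\Psi \in \mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$.

To prove the lemma, take the $\{e_i\}$ to be formed from the union of the bases of the eigenspaces $\mathcal{H}_N$ of $L_0$, which we have taken to be finite-dimensional, taken in order, $N = 0, 1, 2, \ldots$. Using the projection operators $P_N$ defined by (67), we have that $\sum_{N=0}^{\infty} P_N \Psi = \Psi$ and $P_N \Psi$ can be written as a sum of the $e_i$ which are basis elements of $\mathcal{H}_N$, with coefficients $f_i(\Psi)$ which depend continuously and linearly on $\Psi$. It is then clear that $\sum_{i=1}^{\infty} e_i f_i(\Psi) = \Psi$ and, if $\eta \in (\mathcal{V}^{\mathcal{O}})'$, $\eta = \sum_{i=1}^{\infty} f_i\, \eta(e_i)$, showing that $\{e_i\}$ is dense in $\mathcal{V}^{\mathcal{O}}$ and $\{f_i\}$ is dense in $(\mathcal{V}^{\mathcal{O}})'$.

Proof of Theorem 6. Assuming we have a continuous map $\iota : (\mathcal{V}^{\mathcal{O}})' \to (\mathcal{V}^{\mathcal{O}}_{\alpha\beta})'$, let us define $\Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2)$ by

\[
\Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2) = \sum_i e_i\, \iota(f_i) \big( W_\alpha(u_1, \chi_1) W_\beta(u_2, \chi_2) \big). \tag{88}
\]

Then, if $\eta_\phi = \sum_j \lambda_j f_j$,

\[
\begin{aligned}
\Big\langle \prod_{i=1}^{m} V(\phi_i, \zeta_i)\, W_\alpha(u_1, \chi_1) W_\beta(u_2, \chi_2) \Big\rangle
 &= \eta_\phi \big( \Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2) \big) \\
 &= \sum_{j i} \lambda_j f_j(e_i)\, \iota(f_i) \big( W_\alpha(u_1, \chi_1) W_\beta(u_2, \chi_2) \big) \\
 &= \iota(\eta_\phi) \big( W_\alpha(u_1, \chi_1) W_\beta(u_2, \chi_2) \big) \\
 &= \Big\langle \prod_{i=1}^{m} V(\phi_i, \zeta_i)\, \Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2) \Big\rangle,
\end{aligned}
\]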

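and the convergence of (88) can be deduced from this. Conversely, suppose that (87) holds; then

\[
\prod_{i=1}^{n} V(\psi_i, \zeta_i)\, W_\alpha(u_1, \chi_1) W_\beta(u_2, \chi_2)
 \mapsto \prod_{i=1}^{n} V(\psi_i, \zeta_i)\, \Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2) \tag{89}
\]

defines a continuous map $\mathcal{V}^{\mathcal{O}}_{\alpha\beta} \to \mathcal{V}^{\mathcal{O}}$ (where $\zeta_i, u_1, u_2 \notin \mathcal{O}$), and this induces a dual map $\iota : (\mathcal{V}^{\mathcal{O}})' \to (\mathcal{V}^{\mathcal{O}}_{\alpha\beta})'$, continuous in the weak topology, satisfying $\iota(\eta_\phi) = \eta_\phi$, i.e. (85) holds.

The map (89) defines an isomorphism of $\mathcal{V}^{\mathcal{O}}_{\alpha\beta}$ onto $\mathcal{V}^{\mathcal{O}}$: it is onto for otherwise its image would define an invariant subspace of $\mathcal{V}^{\mathcal{O}}$ and the argument of Proposition 5 shows that this must be the whole space; and it is an injection because if it maps a vector $X$ to zero, $\eta_\phi(X)$ must vanish for all $\phi \in \mathcal{B}_{\mathcal{O}}$, implying $X = 0$. $\square$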

9. Möbius Covariance, Fock Spaces and the Equivalence of Representations

We shall now assume that each density in the collection $\mathcal{R}$ is invariant under the action of the Möbius transformations, i.e. that the amplitudes satisfy

\[
\begin{aligned}
\Big\langle \prod_{i=1}^{n} V(\psi_i, z_i)\, W_\alpha(\chi_1, u_1) W_\beta(\chi_2, u_2) \Big\rangle
 ={}& \Big\langle \prod_{i=1}^{n} V(\psi_i, \gamma(z_i))\, W_\alpha(\chi_1, \gamma(u_1)) W_\beta(\chi_2, \gamma(u_2)) \Big\rangle \\
 & \cdot \prod_{l=1}^{2} \big( \gamma'(u_l) \big)^{r_l} \prod_{i=1}^{n} \big( \gamma'(z_i) \big)^{h_i},
\end{aligned} \tag{90}
\]

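where $r_l$ are the real numbers which appear in the definition of the densities, and $h_i$ is the grade of $\psi_i$.

In this case, we can define operators $U(\gamma)$, mapping $\mathcal{V}^{\mathcal{O}\beta}_{\alpha}$ to $\mathcal{V}^{\mathcal{O}_\gamma \beta}_{\alpha}$; on the total subset $\mathcal{B}_{\mathcal{C}\alpha}$, where $\mathcal{C} \cap \mathcal{O} = \emptyset$, these operators are defined by

\[
U(\gamma) \boldsymbol{\chi} = V(\psi_1, \gamma(z_1)) \cdots V(\psi_n, \gamma(z_n))\, W_\alpha(\chi, \gamma(u))
 \prod_{i=1}^{n} \big( \gamma'(z_i) \big)^{h_i}\, \gamma'(u)^{r_1}, \tag{91}
\]

where $\boldsymbol{\chi}$ is defined as in (86), and $h_i$ is the grade of $\psi_i$, $i = 1, \ldots, n$. This definition extends by linearity to operators being defined on $\mathcal{V}_{\mathcal{C}\alpha}$, and by analogous arguments to those in Sect. 3, this extends to a well-defined map $\mathcal{V}^{\mathcal{O}\beta}_{\alpha} \to \mathcal{V}^{\mathcal{O}_\gamma \beta}_{\alpha}$. If we choose two points $z_\infty$ and $z_0$ as before, we can introduce the Möbius generators $L^M_0$, $L^M_{\pm 1}$ which are well-defined on these spaces.

We define the Fock space $\mathcal{H}^{\mathcal{O}}_\alpha \subset \mathcal{V}^{\mathcal{O}}_\alpha$ to be the space spanned by finite linear combinations of vectors of the form

\[
\Phi = V_{n_1}(\psi_1) \cdots V_{n_N}(\psi_N)\, W_\alpha(\chi, 0), \tag{92}
\]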

where $\psi_j \in V$, $\chi \in W_\alpha$ and $n_j \in \mathbb{Z}$, $1 \le j \le N$. Here the modes $V_n(\psi)$ are defined as before in (32) where the contour encircles the point $0 \in \mathcal{C}$, and this still makes sense since the amplitudes $\mathcal{R}$ are not branched about $u_i = z_j$. It is clear that $\mathcal{H}^{\mathcal{O}}_\alpha$ is a dense subspace of $\mathcal{V}^{\mathcal{O}}_\alpha$, and that it is independent of $\mathcal{O}$; where no ambiguity arises we shall therefore denote it by $\mathcal{H}_\alpha$. By construction, $W_\alpha \subset \mathcal{H}_\alpha$. We can also define $W_\beta \subset \mathcal{H}_\beta$ in the same way.

As before it is then possible to extend the amplitudes $\mathcal{R}$ to amplitudes being defined for $\chi_1 \in \mathcal{H}_\alpha$ and $\chi_2 \in \mathcal{H}_\beta$ (rather than $\chi_1 \in W_\alpha$ and $\chi_2 \in W_\beta$), and for the subset of quasiprimary states in $\mathcal{H}_\alpha$ and $\mathcal{H}_\beta$ (i.e. for the states that are annihilated by $L^M_1$ defined above), the Möbius properties are analogous to those in (90).

As in the case of the meromorphic theory we can then define the equivalence of two representations. Let us suppose that for a given meromorphic field theory specified by $V$ and $\mathcal{A} = \{f\}$, we have two collections of densities, one specified by $W_\alpha$, $W_\beta$ with the amplitudes given by $\mathcal{R} = \{g\}$, and one specified by $\hat{W}_\alpha$, $\hat{W}_\beta$ and $\hat{\mathcal{R}}$. We denote the corresponding Fock spaces by $\mathcal{H}_\alpha$, $\mathcal{H}_\beta$ in the case of the former densities, and by $\hat{\mathcal{H}}_\alpha$ and $\hat{\mathcal{H}}_\beta$ in the case of the latter. We say that the two densities define equivalent representations if there exist graded injections

\[
\iota_\alpha : W_\alpha \to \hat{\mathcal{H}}_\alpha, \qquad \iota_\beta : W_\beta \to \hat{\mathcal{H}}_\beta, \tag{93}
\]

and

\[
\hat{\iota}_\alpha : \hat{W}_\alpha \to \mathcal{H}_\alpha, \qquad \hat{\iota}_\beta : \hat{W}_\beta \to \mathcal{H}_\beta, \tag{94}
\]

that map amplitudes to amplitudes. We similarly define two representations to be conjugate to one another if instead of (93) and (94) the amplitudes are mapped to each other under

\[
\iota_\alpha : W_\alpha \to \hat{\mathcal{H}}_\beta, \qquad \iota_\beta : W_\beta \to \hat{\mathcal{H}}_\alpha, \tag{93$'$}
\]

and

\[
\hat{\iota}_\alpha : \hat{W}_\alpha \to \mathcal{H}_\beta, \qquad \hat{\iota}_\beta : \hat{W}_\beta \to \mathcal{H}_\alpha. \tag{94$'$}
\]

A representation is called highest weight, if the equivalence class of collections of densities contains a representative which has the highest weight property: for each density $g$ and each choice of $\chi_1 \in W_\alpha$, $\chi_2 \in W_\beta$ and $\psi_i \in V_{h_i}$, the pole in $(z_i - u_l)$ is bounded by $h_i$. This definition is slightly more general than the definition which is often used, in that it is not assumed that the highest weight vectors transform in any way under the zero modes of the meromorphic fields.

In Sect. 6, we showed, using the cluster property, that the meromorphic conformal field theory does not have any proper ideals. This implies now

Proposition 7. Every non-trivial representation is faithful.

Proof. Suppose that $V(\Phi, z)$, where $\Phi \in \mathcal{V}^{\mathcal{O}'}_{\mathcal{C}}$, $\mathcal{C} \cap \mathcal{O}' = \emptyset$ and $\mathcal{C}_z \subset \mathcal{O}$, acts trivially on the representation $\mathcal{V}^{\mathcal{O}}_\alpha$, i.e. that

\[
V(\Phi, z) \Psi = 0 \quad \text{for every } \Psi \in \mathcal{V}^{\mathcal{O}}_\alpha. \tag{95}
\]

Then, for any $\psi \in V$ and $\zeta \in \mathcal{O}'$ for which $\zeta + z \in \mathcal{O}$ we have

\[
V\big( V(\psi, \zeta) \Phi, z \big) \Psi = V(\psi, \zeta + z) V(\Phi, z) \Psi = 0, \tag{96}
\]

and thus $V(\psi, \zeta) \Phi$ also acts trivially on $\mathcal{V}^{\mathcal{O}}_\alpha$. This implies that the subspace of states in $\mathcal{V}^{\mathcal{O}'}_{\mathcal{C}}$ that act trivially on $\mathcal{V}^{\mathcal{O}}_\alpha$ is an ideal. Since there are no non-trivial ideals in $\mathcal{V}^{\mathcal{O}'}_{\mathcal{C}}$, this implies that the representation is faithful. $\square$

10. An Example of a Representation

Let us now consider the example of the $U(1)$ theory which was first introduced in Sect. 5.1. In this section we want to construct a family of representations for this meromorphic conformal field theory. Let us first define the state

\[
\Psi_n = \int_a^b dw_1 \cdots \int_a^b dw_n\, {:} J(w_1) \cdots J(w_n) {:}, \tag{97}
\]

where $a, b \in \mathcal{C} \subset \mathbb{C}$, the integrals are chosen to lie in $\mathcal{C}$, and the normal ordering prescription ${:}\,\cdot\,{:}$ means that all poles in $w_i - w_j$ for $i \ne j$ are subtracted. We can deduce

from the definition of the amplitudes (48) and (97) that the amplitudes involving $\Psi_n$ are of the form

\[
\Big\langle \Psi_n \prod_{j=1}^{N} J(\zeta_j) \Big\rangle
 = k^n \sum_{\substack{i_1, \ldots, i_n \in \{1, \ldots, N\} \\ i_j \ne i_l}}
   \prod_{l=1}^{n} \frac{(b - a)}{(a - \zeta_{i_l})(b - \zeta_{i_l})}
   \Big\langle \prod_{j \notin \{i_1, \ldots, i_n\}} J(\zeta_j) \Big\rangle, \tag{98}
\]

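where $\zeta_j \in \mathcal{O} \subset \mathbb{C}$ and $\mathcal{C} \cap \mathcal{O} = \emptyset$. By analytic continuation of (98) we can then calculate the contour integral $\oint_{C_a} J(z)\, dz\, \Psi_n$, where $C_a$ is a contour in $\mathcal{C}$ encircling $a$ but not $b$, and we find that

\[
\oint_{C_a} J(z)\, dz\, \Psi_n = -n k\, \Psi_{n-1}, \tag{99}
\]

and

\[
\oint_{C_a} (z - a)^n J(z)\, dz\, \Psi_n = 0 \quad \text{for } n \ge 1, \tag{100}
\]

where the equality holds in $\mathcal{V}^{\mathcal{O}}$. Similar statements also hold for the contour integral around $b$,

\[
\oint_{C_b} J(z)\, dz\, \Psi_n = n k\, \Psi_{n-1}, \tag{101}
\]

and

\[
\oint_{C_b} (z - b)^n J(z)\, dz\, \Psi_n = 0 \quad \text{for } n \ge 1. \tag{102}
\]

Next we define

\[
\Psi_\alpha = \sum_{n=0}^{\infty} \frac{\alpha^n}{n!\, k^n}\, \Psi_n
 = {:} \exp\Big( \frac{\alpha}{k} \int_a^b J(w)\, dw \Big) {:}, \tag{103}
\]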

where $\alpha$ is any (real) number. This series converges in $\mathcal{V}^{\mathcal{O}}$, since for any amplitude of the form $\langle \Psi_\alpha J(\zeta_1) \cdots J(\zeta_N) \rangle$ only the terms in (103) with $n \le N$ contribute, as follows from (98). We can use $\Psi_\alpha$ to define amplitudes as in (87), and in order to show that these form a representation, it suffices (because of Theorem 6) to demonstrate that the functions so defined have the appropriate analyticity properties. The only possible obstruction arises from the singularity for $\zeta_i \to a$ and $\zeta_i \to b$, but it follows from (99–102) that

\[
J(\zeta) \Psi_\alpha \sim \frac{-\alpha}{(\zeta - a)} + O(1) \quad \text{as } \zeta \to a,
\]

and

\[
J(\zeta) \Psi_\alpha \sim \frac{\alpha}{(\zeta - b)} + O(1) \quad \text{as } \zeta \to b,
\]

and thus that the singularities are only simple poles. This proves that the amplitudes defined by (87) give rise to a representation of the $U(1)$ theory. From the point of view of conventional conformal field theory, this representation (and its conjugate) is the highest weight representation that is generated from a state of $U(1)$-charge $\pm\alpha$.

It may be worthwhile to point out that we can rescale all amplitudes of a representation of a meromorphic field theory by

\[
g \mapsto C (u_1 - u_2)^{2\delta}\, g, \tag{104}
\]

where $C$ and $\delta$ are fixed constants (that are the same for all $g$), without actually violating any of the conditions we have considered so far. (The only effect is that $r_1$ and $r_2$ are replaced by $\hat{r}_l = r_l - \delta$, $l = 1, 2$.) For the representation of a meromorphic conformal field theory, the ambiguity in $\delta$ can however be canonically fixed: since the meromorphic fields contain the stress-energy field $L$ (whose modes $L_n$ satisfy the Virasoro algebra), we can require that

\[
L_n = L^M_n \quad \text{for } n = 0, \pm 1, \tag{105}
\]

when acting on $\mathcal{H}_\alpha$. The action of $L_0$ on $\mathcal{H}_\alpha$ is not modified by (104), but since $\hat{r}_l = r_l - \delta$, the action of $L^M_0$ is, and (105) therefore fixes the choice of $\delta$ in (104).

In the above example, in order to obtain a representation of the meromorphic conformal field theory (with $L^0$ being given by (79)), we have to modify the amplitudes as in (104) with $\delta = -\alpha^2 / 2k$. This can be easily checked using (99–102).

11. Zhu’s Algebra

The description of representations in terms of collections of densities has a large redundancy in that many different collections of densities define the same representation. Typically we are only interested in highest weight representations, and for these we may restrict our attention to the representatives for which the highest weight property holds. In this section we want to analyse the conditions that characterise the corresponding states $\Sigma_{\alpha\beta}$; this approach is in essence due to Zhu [18].

Suppose we are given a highest weight representation, i.e. a collection of amplitudes that are described in terms of the states $\Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2) \in \mathcal{V}^{\mathcal{O}}$, where $u_l \notin \mathcal{O}$. Each such state defines a linear functional on the Fock space $\mathcal{H}^{\mathcal{O}'}$, where $\mathcal{O} \cup \mathcal{O}' = \mathbb{P}$. But, for given $u_1, u_2$, the states $\Sigma_{\alpha\beta}(u_1, u_2; \chi_1, \chi_2)$, associated with the various possible representations, satisfy certain linear conditions: they vanish on a certain subspace $O_{u_1,u_2}(\mathcal{H}^{\mathcal{O}'})$. Thus they define, and are characterised by, linear functionals on the quotient space $\mathcal{H}^{\mathcal{O}'} / O_{u_1,u_2}(\mathcal{H}^{\mathcal{O}'})$.
This is a crucial realisation, because it turns out that, in cases of interest, this quotient is finite-dimensional. Further the quotient has the structure of an algebra, first identified by Zhu [18], in terms of which the equivalence of representations, defined by these linear functionals, can be characterised.

Let us consider the case where $u_1 = \infty$ and $u_2 = -1$, for which we can choose $\mathcal{O}$ and $\mathcal{O}'$ so that $0 \in \mathcal{O}$ and $0 \notin \mathcal{O}'$. We want to characterise the subspace of $\mathcal{H} = \mathcal{H}^{\mathcal{O}'}$ on which the linear functional defined by $\Sigma_{\alpha\beta}(\infty, -1; \chi_1, \chi_2)$ vanishes identically. Given $\psi$ and $\chi$ in $\mathcal{H}$, we define the state $V^{(N)}(\psi)\chi$ in $\mathcal{H}$ by

\[
V^{(N)}(\psi)\chi = \oint_0 \frac{dw}{w^{N+1}}\, V\big( (w + 1)^{L_0} \psi, w \big) \chi, \tag{106}
\]

where $N$ is an arbitrary integer, and the contour is a small circle (with radius less than one) around $w = 0$. If $\Sigma_{\alpha\beta}$ has the highest weight property then

\[
\langle \Sigma_{\alpha\beta}(\infty, -1; \chi_1, \chi_2)\, V^{(N)}(\psi) \phi \rangle = 0 \quad \text{for } N > 0. \tag{107}
\]

This follows directly from the observation that the integrand in (107) does not have any poles at $w = -1$ or $w = \infty$.

Let us denote by $O(\mathcal{H})$ the subspace of $\mathcal{H}$ that is generated by states of the form (106) with $N > 0$, and define the quotient space $A(\mathcal{H}) = \mathcal{H} / O(\mathcal{H})$. Then it follows that every highest weight representation defines a linear functional on $A(\mathcal{H})$. If two representations induce the same linear functional on $A(\mathcal{H})$, then they are actually equivalent representations, and thus the number of inequivalent representations is always bounded by the dimension of $A(\mathcal{H})$. In fact, as we shall show below, the vector space $A(\mathcal{H})$ has the structure of an associative algebra, where the product is defined by (106) with $N = 0$. In terms of the states $\Sigma_{\alpha\beta}$ this product corresponds to

\[
\langle \Sigma_{\alpha\beta}(\infty, -1; \chi_1, \chi_2)\, V^{(0)}(\psi) \phi \rangle
 = (-1)^{h_\psi} \langle \Sigma_{\alpha\beta}(\infty, -1; V_0(\psi)\chi_1, \chi_2)\, \phi \rangle. \tag{108}
\]

One may therefore expect that the different highest weight representations of the meromorphic conformal field theory are in one-to-one correspondence with the different representations of the algebra $A(\mathcal{H})$, and this is indeed true [18]. Most conformal field theories of interest have the property that $A(\mathcal{H})$ is a finite-dimensional algebra, and there exist therefore only finitely many inequivalent highest weight representations of the corresponding meromorphic conformal field theory; we shall call a meromorphic conformal field theory for which this is true finite.

In the above discussion the two points, $u_1 = \infty$ and $u_2 = -1$ were singled out, but the definition of the quotient space (and the algebra) is in fact independent of this choice. Let us consider the Möbius transformation $\gamma$ which maps $\infty \mapsto u_1$, $-1 \mapsto u_2$ and $0 \mapsto 0$ (where $u_l \ne 0$); it is explicitly given as

\[
\gamma(\zeta) = \frac{u_1 u_2\, \zeta}{u_2 (\zeta + 1) - u_1}
 \quad \leftrightarrow \quad
 \begin{pmatrix} u_1 u_2 & 0 \\ u_2 & u_2 - u_1 \end{pmatrix},
\]

with inverse

\[
\gamma^{-1}(z) = \frac{(u_1 - u_2)\, z}{u_2 (z - u_1)}
 \quad \leftrightarrow \quad
 \begin{pmatrix} u_1 - u_2 & 0 \\ u_2 & -u_1 u_2 \end{pmatrix}.
\]
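The three defining conditions on $\gamma$ and the formula for its inverse are easy to confirm with exact rational arithmetic. The following sketch is only an illustration; the helper names (`gamma_matrix`, `act`, `same_point`, `matmul`) and the use of homogeneous coordinates $(x, y)$ for the point $x/y$, with $(1, 0)$ standing for $\infty$, are choices made for this example.

```python
from fractions import Fraction


def gamma_matrix(u1, u2):
    """2x2 matrix of gamma(zeta) = u1 u2 zeta / (u2 (zeta + 1) - u1)."""
    return ((u1 * u2, 0), (u2, u2 - u1))


def gamma_inverse_matrix(u1, u2):
    """2x2 matrix of gamma^{-1}(z) = (u1 - u2) z / (u2 (z - u1))."""
    return ((u1 - u2, 0), (u2, -u1 * u2))


def act(matrix, point):
    """Apply a Moebius matrix to a point given in homogeneous coordinates."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


def same_point(p, q):
    """Two homogeneous pairs describe the same point iff they are proportional."""
    return p[0] * q[1] == p[1] * q[0]


def matmul(m1, m2):
    """Product of two 2x2 matrices."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


if __name__ == "__main__":
    u1, u2 = Fraction(3), Fraction(-2, 5)      # arbitrary sample values, u1 != u2
    G, G_inv = gamma_matrix(u1, u2), gamma_inverse_matrix(u1, u2)
    print(same_point(act(G, (1, 0)), (u1, 1)))   # infinity -> u1
    print(same_point(act(G, (-1, 1)), (u2, 1)))  # -1 -> u2
    print(same_point(act(G, (0, 1)), (0, 1)))    # 0 -> 0
    print(matmul(G, G_inv))                      # a multiple of the identity
```

The product of the two matrices is $u_1 u_2 (u_1 - u_2)$ times the identity, so the two matrices are inverse to each other as Möbius transformations provided $u_1 \ne u_2$ and $u_l \ne 0$.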

Writing $\psi' = U(\gamma)\psi$ and $\chi' = U(\gamma)\chi$ we then find (see Appendix F)

\[
V^{(N)}_{u_1,u_2}(\psi')\, \chi' = U(\gamma)\, V^{(N)}(\psi)\, \chi, \tag{109}
\]

where $V^{(N)}_{u_1,u_2}(\psi)$ is defined by

\[
V^{(N)}_{u_1,u_2}(\psi)\chi = \oint_0 \frac{dw}{w}\, \frac{u_1}{(u_1 - w)}
 \left( \frac{u_2 (u_1 - w)}{(u_2 - u_1)\, w} \right)^{N}
 V\left[ \left( \frac{(u_1 - w)(u_2 - w)}{u_1 u_2} \right)^{L_0} e^{\frac{w}{u_1 u_2} L_1} \psi, w \right] \chi, \tag{110}
\]

and the contour encloses $w = 0$ but not $w = u_l$. We can then also define $O_{u_1,u_2}(\mathcal{H})$ to be the space that is generated by states of the form (110) with $N > 0$, and $A_{u_1,u_2}(\mathcal{H}) = \mathcal{H} / O_{u_1,u_2}(\mathcal{H})$. As $z = 0$ is a fixed point of $\gamma$, $U(\gamma) : \mathcal{H} \to \mathcal{H}$, and because of (109), $U(\gamma) : O(\mathcal{H}) \to O_{u_1,u_2}(\mathcal{H})$. It also follows from (109) with $N = 0$ that the product is covariant, and this implies that the different algebras $A_{u_1,u_2}(\mathcal{H})$ for different choices of $u_1$ and $u_2$ are isomorphic. To establish that the algebra action is well-defined and associative, it is therefore sufficient to consider the case corresponding to $u_1 = \infty$ and $u_2 = -1$. In this case we write $V^{(0)}(\psi)\chi$ also as $\psi * \chi$.
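For $u_1 = \infty$, $u_2 = -1$ and $\psi$ an $L_0$-eigenvector of integer weight $h \ge 0$, the operator $V^{(N)}(\psi)$ of (106) is a finite combination of modes. The sketch below lists the coefficients under two assumptions that are not restated in this section: the mode expansion $V(\psi, w) = \sum_n V_n(\psi)\, w^{-n-h}$, and a contour normalised so that $\oint dw / w = 1$. Expanding $(w+1)^h$ binomially then gives $V^{(N)}(\psi) = \sum_{j=0}^{h} \binom{h}{j} V_{j-N-h}(\psi)$. The function name `zhu_mode_coefficients` is hypothetical.

```python
from math import comb


def zhu_mode_coefficients(h, N):
    """Return {n: c_n} with V^{(N)}(psi) = sum_n c_n V_n(psi) for L_0 psi = h psi.

    Assumes V(psi, w) = sum_n V_n(psi) w^(-n-h) and a contour integral
    normalised so that the integral of dw/w around 0 equals 1.
    """
    if h < 0:
        raise ValueError("h must be a non-negative integer")
    return {j - N - h: comb(h, j) for j in range(h + 1)}


if __name__ == "__main__":
    # Weight-two field, N = 1: V_{-3} + 2 V_{-2} + V_{-1}.
    print(zhu_mode_coefficients(2, 1))
    # Weight-one field, N = 0 (the product): V_{-1} + V_0.
    print(zhu_mode_coefficients(1, 0))
```

With these conventions the case $N = 1$ reproduces the combination $\sum_{n=0}^{h_\psi} \binom{h_\psi}{n} V_{n-h_\psi-1}(\psi)$ that the paper uses for $V^{(1)}(\psi)$.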

Let us first show that $O_{u_1,u_2}(\mathcal{H}) = O_{u_2,u_1}(\mathcal{H})$. Because of the Möbius covariance it is sufficient to prove this for the special case, where $u_1 = \infty$ and $u_2 = -1$. For this case we have $V^{(N)}_{\infty,-1}(\psi) = V^{(N)}(\psi)$ as before, and

\[
V^{(N)}_{-1,\infty}(\psi) \equiv V^{(N)}_c(\psi)
 = (-1)^N \oint \frac{dw}{w}\, \frac{1}{(w + 1)} \left( \frac{w + 1}{w} \right)^{N} V\big( (w + 1)^{L_0} \psi, w \big). \tag{111}
\]

The result then follows from the observation that, for $N \ge 1$,

\[
\frac{(w + 1)^{N-1}}{w^N} = \sum_{l=1}^{N} \binom{N-1}{l-1} w^{-l},
\]

and

\[
\frac{1}{w^N} = \sum_{l=1}^{N} (-1)^{N-l} \binom{N-1}{l-1} \frac{(w + 1)^{l-1}}{w^l}.
\]
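Both binomial identities can be confirmed with exact rational arithmetic. The following sketch is an illustration only; the function names are hypothetical and the sample points are arbitrary non-zero rationals.

```python
from fractions import Fraction
from math import comb


def first_identity_sides(w, N):
    """Both sides of (w+1)^(N-1) / w^N = sum_l C(N-1, l-1) w^(-l)."""
    lhs = (w + 1) ** (N - 1) / w ** N
    rhs = sum(comb(N - 1, l - 1) * w ** (-l) for l in range(1, N + 1))
    return lhs, rhs


def second_identity_sides(w, N):
    """Both sides of 1 / w^N = sum_l (-1)^(N-l) C(N-1, l-1) (w+1)^(l-1) / w^l."""
    lhs = 1 / w ** N
    rhs = sum((-1) ** (N - l) * comb(N - 1, l - 1) * (w + 1) ** (l - 1) / w ** l
              for l in range(1, N + 1))
    return lhs, rhs


if __name__ == "__main__":
    samples = [Fraction(2, 3), Fraction(-5, 7), Fraction(4)]
    for N in range(1, 8):
        for w in samples:
            a, b = first_identity_sides(w, N)
            c, d = second_identity_sides(w, N)
            assert a == b and c == d
    print("both identities hold at all sample points")
```

The first identity is the binomial expansion of $(w+1)^{N-1}$ divided by $w^N$; the second follows from expanding $\big( \tfrac{w+1}{w} - 1 \big)^{N-1} = w^{-(N-1)}$. Together they express each of the kernels in (106) and (111), for $N \ge 1$, as a finite combination of kernels of the other type with $1 \le l \le N$.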

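In particular, it follows from this calculation that (107) also holds if $V^{(N)}(\psi)$ is replaced by $V^{(N)}_c(\psi)$. Because of the definition of $V^{(N)}_c(\psi)$ it is clear that the analogue of (108) is now

\[
\langle \Sigma_{\alpha\beta}(\infty, -1; \chi_1, \chi_2)\, V^{(0)}_c(\psi) \phi \rangle
 = \langle \Sigma_{\alpha\beta}(\infty, -1; \chi_1, V_0(\psi)\chi_2)\, \phi \rangle. \tag{112}
\]

One should therefore expect that for $N \ge 0$, the action of $V^{(0)}(\psi)$ and $V^{(N)}_c(\chi)$ commute up to elements of the form $V^{(M)}_c(\phi)$, where $M > 0$, which generate states in $O(\mathcal{H})$. To prove this, it is sufficient to consider the case, where $\psi$ and $\chi$ are eigenvectors of $L_0$ with eigenvalues $h_\psi$ and $h_\chi$, respectively; then the commutator $[V^{(0)}(\psi), V^{(N)}_c(\chi)]$ equals (up to the constant $(-1)^N$ in (111))

\[
\begin{aligned}
& \oint\!\!\oint_{|\zeta| > |w|} \frac{d\zeta}{\zeta} (\zeta + 1)^{h_\psi}\, \frac{dw}{w (w + 1)} \left( \frac{w + 1}{w} \right)^{N} (w + 1)^{h_\chi}\, V(\psi, \zeta) V(\chi, w) \\
& \quad - \oint\!\!\oint_{|w| > |\zeta|} \frac{dw}{w (w + 1)} \left( \frac{w + 1}{w} \right)^{N} (w + 1)^{h_\chi}\, \frac{d\zeta}{\zeta} (\zeta + 1)^{h_\psi}\, V(\chi, w) V(\psi, \zeta) \\
& = \oint_0 \frac{dw}{w (w + 1)} \left( \frac{w + 1}{w} \right)^{N} (w + 1)^{h_\chi} \oint_w \frac{d\zeta}{\zeta} (\zeta + 1)^{h_\psi}\, V(\psi, \zeta) V(\chi, w) \\
& = \sum_n \oint_0 \oint_w \frac{d\zeta}{\zeta} (\zeta + 1)^{h_\psi}\, V(V_n(\psi)\chi, w)\, (\zeta - w)^{-n - h_\psi}
   \cdot \frac{dw}{w (w + 1)} \left( \frac{w + 1}{w} \right)^{N} (w + 1)^{h_\chi} \\
& = \sum_{n \le h_\chi} \sum_{l=0}^{n + h_\psi - 1} (-1)^l \binom{h_\psi}{l + 1 - n}
   \oint_0 \frac{dw}{w (w + 1)} \left( \frac{w + 1}{w} \right)^{N + l + 1} (w + 1)^{h_\chi - n}\, V(V_n(\psi)\chi, w) \approx 0,
\end{aligned}
\]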

580

M. R. Gaberdiel, P. Goddard

where we denote by $\approx$ equality in $\mathcal{H}$ up to states in $O(\mathcal{H})$. Because of the Möbius covariance, it then also follows that $[V^{(N)}(\psi), V_c^{(0)}(\chi)] \approx 0$ for $N \ge 0$.

As $O_{u_1,u_2}(\mathcal{H}) = O_{u_2,u_1}(\mathcal{H})$, this calculation implies that the action of $V^{(0)}(\psi)$ is well-defined on the quotient space. To prove that the action defines an associative algebra, we observe that, in the same way in which $V(\psi,z)$ is uniquely characterised by the two properties (28a) and (28b), $V^{(0)}(\psi)$ is uniquely determined by the two properties

\[ V^{(0)}(\psi)\Omega = \psi, \qquad [V^{(0)}(\psi), V_c^{(N)}(\chi)] \approx 0 \quad \text{for } N \ge 0. \tag{113} \]

Indeed, if $V^{(0)}(\psi_1)$ and $V^{(0)}(\psi_2)$ both satisfy these properties for the same $\psi$, then

\[
\begin{aligned}
V^{(0)}(\psi_1)\phi &= V^{(0)}(\psi_1)V_c^{(0)}(\phi)\Omega \approx V_c^{(0)}(\phi)V^{(0)}(\psi_1)\Omega = V_c^{(0)}(\phi)\psi \\
&= V_c^{(0)}(\phi)V^{(0)}(\psi_2)\Omega \approx V^{(0)}(\psi_2)V_c^{(0)}(\phi)\Omega = V^{(0)}(\psi_2)\phi,
\end{aligned}
\tag{114}
\]

where we have used that $V_c^{(0)}(\phi)\Omega = \phi$, as follows directly from the definition of $V_c^{(0)}(\phi)$. This therefore implies that $V^{(0)}(\psi_1) = V^{(0)}(\psi_2)$ on $A(\mathcal{H})$. It is now immediate that

\[ V^{(0)}(V^{(0)}(\psi)\chi) \approx V^{(0)}(\psi)V^{(0)}(\chi), \tag{115} \]

since both operators commute with $V_c^{(0)}(\phi)$ for arbitrary $\phi$, and since

\[ V^{(0)}(V^{(0)}(\psi)\chi)\Omega = V^{(0)}(\psi)\chi = V^{(0)}(\psi)V^{(0)}(\chi)\Omega. \tag{116} \]

We have thus shown that $(\psi * \chi) * \phi \approx \psi * (\chi * \phi)$. Similarly,

\[ V^{(0)}(V^{(N)}(\psi)\chi) \approx V^{(N)}(\psi)V^{(0)}(\chi), \tag{117} \]

and this implies that $\psi_1 * \phi \approx 0$ if $\psi_1 \approx 0$. This proves that $A(\mathcal{H})$ forms an associative algebra.

The algebraic structures on $A_{u_1,u_2}(\mathcal{H})$ and $A_{u_2,u_1}(\mathcal{H})$ are related by

\[ A_{u_1,u_2}(\mathcal{H}) = \left(A_{u_2,u_1}(\mathcal{H})\right)^{o}, \tag{118} \]

where $A^{o}$ is the reverse algebra as explained in Appendix G. Indeed, it follows from (111) and (113) that

\[ V^{(0)}(\psi_1)V^{(0)}(\psi_2)\Omega = V^{(0)}(\psi_1)V_c^{(0)}(\psi_2)\Omega \approx V_c^{(0)}(\psi_2)V^{(0)}(\psi_1)\Omega = V_c^{(0)}(\psi_2)V_c^{(0)}(\psi_1)\Omega, \]

and this implies (118).
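The step in the commutator calculation preceding (113) that turns the inner $\zeta$-integral into a finite sum over $l$ is a residue computation at $\zeta = w$. The following is a sketch of that step. It assumes contour integrals normalised so that $\oint dx/x = 1$, and the local variable $x = \zeta - w$ and the index $k$ are introduced only for this illustration.

```latex
% Expand the two regular factors around \zeta = w, with x = \zeta - w:
\frac{1}{\zeta} = \sum_{l \ge 0} (-1)^{l} \frac{x^{l}}{w^{l+1}},
\qquad
(\zeta+1)^{h_\psi} = \sum_{k=0}^{h_\psi} \binom{h_\psi}{k}\, x^{k}\, (w+1)^{h_\psi - k}.

% The residue against x^{-n-h_\psi} selects l + k = n + h_\psi - 1,
% i.e. k = n + h_\psi - 1 - l, and \binom{h_\psi}{k} = \binom{h_\psi}{l+1-n}:
\oint_{w} \frac{d\zeta}{\zeta}\, (\zeta+1)^{h_\psi}\, (\zeta - w)^{-n-h_\psi}
  = \sum_{l=0}^{n+h_\psi-1} (-1)^{l} \binom{h_\psi}{l+1-n}
    \frac{(w+1)^{l+1-n}}{w^{l+1}}.
```

Multiplying by the remaining factor $\frac{1}{w(w+1)}\left(\frac{w+1}{w}\right)^{N}(w+1)^{h_\chi}$ gives $\frac{1}{w(w+1)}\left(\frac{w+1}{w}\right)^{N+l+1}(w+1)^{h_\chi-n}$, so every term is a mode of the type $V_c^{(N+l+1)}$ with $N+l+1 > 0$.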


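By a similar calculation to the above, we can also deduce that for $h_\phi > 0$,

\[
\begin{aligned}
[V^{(0)}(\phi), V^{(0)}(\psi)] &\approx \sum_{m=0}^{h_\phi+h_\psi-1}\ \sum_{s=0}^{\min(h_\phi,m)} (-1)^{m+s}\binom{h_\phi}{s}\, V^{(0)}(V_{m+1-h_\phi}(\phi)\psi) \\
&= \sum_{m=0}^{h_\phi-1} V^{(0)}(V_{m+1-h_\phi}(\phi)\psi) \sum_{s=0}^{m} (-1)^{m+s}\binom{h_\phi}{s} \\
&= \sum_{m=0}^{h_\phi-1} \binom{h_\phi-1}{m}\, V^{(0)}(V_{m+1-h_\phi}(\phi)\psi) \\
&= \oint V^{(0)}(V(\phi,\zeta)\psi)\,(\zeta+1)^{h_\phi-1}\, d\zeta,
\end{aligned}
\tag{119}
\]

and

\[
\begin{aligned}
V^{(1)}(\psi)\Omega &= \oint V(\psi,\zeta)(\zeta+1)^{h_\psi}\frac{d\zeta}{\zeta^2}\,\Omega = \sum_{n=0}^{h_\psi} \binom{h_\psi}{n} V_{n-h_\psi-1}(\psi)\Omega \\
&= V_{-h_\psi-1}(\psi)\Omega + h_\psi V_{-h_\psi}(\psi)\Omega = (L_{-1}+L_0)V_{-h_\psi}(\psi)\Omega.
\end{aligned}
\]

In particular, this implies that $(L_{-1}+L_0)\psi \approx 0$ for every $\psi \in \mathcal{H}$. For the Virasoro field $L(z) = V(L,z)$, the right-hand side of (119), applied to $\Omega$, becomes

\[ \oint V^{(0)}(V(L,\zeta)\psi)\,(\zeta+1)\, d\zeta\ \Omega = \sum_{m=0}^{1} \binom{1}{m} V^{(0)}(V_{m-1}(L)\psi)\Omega = (L_{-1}+L_0)\psi \approx 0, \]

which thus implies that $L$ is central in Zhu’s algebra.

So far our considerations have been in essence algebraic, in that we have considered the conditions $\Sigma_{\alpha\beta}$ has to satisfy in terms of the linear functional it defines on the Fock space $\mathcal{H}$. If, however, we wish to reverse this process, and proceed from a linear functional on $A(\mathcal{H})$ to a representation of the conformal field theory, we need to be concerned about the analytic properties of the resulting amplitudes. To this end, we note that we can perform an analytic version of the construction as follows.

In fact, since $\Sigma_{\alpha\beta}$ is indeed an element of $\mathcal{V}^{\mathcal{O}}$ for a suitable $\mathcal{O}$, it actually defines a linear functional on the whole dual space $\mathcal{V}_{\mathcal{O}} \equiv (\mathcal{V}^{\mathcal{O}})'$ (of which the Fock space is only a dense subspace). Let us denote by $O(\mathcal{V}_{\mathcal{O}})$ the completion (in $\mathcal{V}_{\mathcal{O}}$) of the space that is generated by states of the form (110) with $N > 0$, where now $\psi \in \mathcal{H}$ and $\chi \in \mathcal{V}_{\mathcal{O}}$. By the same arguments as before, the linear functional associated to $\Sigma_{\alpha\beta}$ then vanishes on $O(\mathcal{V}_{\mathcal{O}})$, and thus defines a linear functional on the quotient space

\[ A(\mathcal{V}_{\mathcal{O}}) = \mathcal{V}_{\mathcal{O}}/O(\mathcal{V}_{\mathcal{O}}). \tag{120} \]

It is not difficult to show (see [19] for further details) that a priori $A(\mathcal{V}_{\mathcal{O}})$ is a quotient space of $A(\mathcal{H})$; the main content of Zhu’s Theorem [18] is equivalent to: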


Theorem 8 (Zhu’s Theorem). The two quotient spaces are isomorphic vector spaces, $A(\mathcal{V}_{\mathcal{O}}) \simeq A(\mathcal{H})$.

Proof. It follows from the proof in [18] that every non-trivial linear functional on $A(\mathcal{H})$ (that is defined by $\rho(a) = \langle w^*, a w\rangle$, where $w$ is an element of a representation of $A(\mathcal{H})$, and $w^*$ is an element of the corresponding dual space) defines a non-trivial state $\Sigma_{\alpha\beta} \in \mathcal{V}^{\mathcal{O}}$, and therefore a non-trivial element in the dual space of $A(\mathcal{V}_{\mathcal{O}})$. □

The main importance of this result is that it relates the analytic properties of correlation functions (which are in essence encoded in the definition of the space $\mathcal{V}^{\mathcal{O}}$, etc.) to the purely algebraic Fock space $\mathcal{H}$. Every linear functional on $A(\mathcal{V}_{\mathcal{O}})$ defines a highest weight representation of the meromorphic conformal field theory, and two such functionals define equivalent representations if they are related by the action of Zhu’s algebra as in (108). Because of Zhu’s Theorem there is therefore a one-to-one correspondence between highest weight representations of the meromorphic conformal field theory whose Fock space is $\mathcal{H}$, and representations of the algebra $A(\mathcal{H})$; this (or something closely related to it) is the form in which Zhu’s Theorem is usually stated.

Much of the structure of the meromorphic conformal field theory (and its representations) can be read off from properties of $A(\mathcal{H})$. For example, it was shown in ref. [21] (see also Appendix G) that if $A(\mathcal{H})$ is semisimple, then it is necessarily finite-dimensional, and therefore there exist only finitely many irreducible representations of the meromorphic field theory.

12. Further Developments

In this paper we have introduced a rigorous approach to conformal field theory taking the amplitudes of meromorphic fields as a starting point. We have shown how the paradigm examples of conformal field theories, i.e. lattice theories, affine Lie algebra theories and the Virasoro theory, all fit within this approach. We have shown how to introduce the concept of a representation of such a meromorphic conformal field theory by using a collection of amplitudes which involve two non-meromorphic fields, so that the amplitudes may be branched at the corresponding points. We showed how this led naturally to the introduction of Zhu’s algebra and why the condition that this algebra be finite-dimensional is a critical one in distinguishing interesting and tractable theories from those that appear to be less so.

To complete a treatment of the fundamental aspects of conformal field theory we should discuss subtheories, coset theories and orbifolds, all of which can be expressed naturally within the present approach [19]. It is also clear how to modify the axioms for theories involving fermions.

It is relatively straightforward to generalise the discussion of representations to correlation functions involving $N > 2$ non-meromorphic fields. The only difference is that in this case, there are more than two points $u_l$, $l = 1, \ldots, N$, at which the amplitudes are allowed to have branch cuts. The condition that a collection of such amplitudes defines an $N$-point correlation function of the meromorphic conformal field theory can then be described analogously to the case of $N = 2$: we consider the vector space $\mathcal{V}_{\mathcal{C}\alpha}$ (where $\alpha = (\alpha_1, \ldots, \alpha_N)$ denotes the indices of the $N$ vector spaces $W_{\alpha_i}$ that are associated to the $N$ points $u_1, \ldots, u_N$), whose elements are finite linear combinations of vectors of the form

\[ V(\psi_1,z_1)\cdots V(\psi_n,z_n)\, W_{\alpha_1}(\chi_1,u_1)\cdots W_{\alpha_N}(\chi_N,u_N)\Omega. \tag{121} \]

We complete this space (and cut it down to size) using the standard construction with respect to $\mathcal{B}_{\mathcal{O}}$ and the above set of amplitudes, and we denote the resulting space by $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}\alpha}$. The relevant condition is then that $\mathcal{B}_{\mathcal{O}}$ induces a continuous map

\[ (\mathcal{V}^{\mathcal{O}})' \to (\mathcal{V}^{\mathcal{O}}_{\alpha})'. \tag{122} \]

There also exists a formulation of this condition analogous to (87): a collection of amplitudes defines an $N$-point correlation function if there exists a family of states $\Sigma_\alpha(u_1,\ldots,u_N;\chi_1,\ldots,\chi_N) \in \mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ for each $\mathcal{O}$, $\mathcal{C}$ with $\mathcal{O} \cap \mathcal{C} = \emptyset$ that is equivalent to $W_{\alpha_1}(\chi_1,u_1)\cdots W_{\alpha_N}(\chi_N,u_N)$ in the sense that

\[ \Big\langle \prod_{i=1}^{n} V(\psi_i,z_i)\, W_{\alpha_1}(\chi_1,u_1)\cdots W_{\alpha_N}(\chi_N,u_N) \Big\rangle = \Big\langle \prod_{i=1}^{n} V(\psi_i,z_i)\, \Sigma_\alpha(u_1,\ldots,u_N;\chi_1,\ldots,\chi_N) \Big\rangle, \tag{123} \]

where $z_i \in \mathcal{O}$. An argument analogous to Theorem 6 then implies that (122) is equivalent to (123).

In the case of the two-point correlation functions (or representations) we introduced a quotient space (120) of the vector space $\mathcal{V}_{\mathcal{O}} = (\mathcal{V}^{\mathcal{O}})'$ that classified the different highest weight representations. We can now perform an analogous construction. Let us consider the situation where the $N$ highest weight states are at $u_1 = \infty, u_2, \ldots, u_N$, and define

\[ V_N^{(M)}(\psi)\chi = \oint_0 \frac{d\zeta}{\zeta^{1+M}} \left( \frac{\prod_{j=2}^{N}(\zeta-u_j)}{\zeta^{N-2}} \right)^{h_\psi} V(\psi,\zeta)\chi, \tag{124} \]

where $M$ is an integer, $\psi \in \mathcal{H}$, $\chi \in \mathcal{V}_{\mathcal{O}}$, and the contour encircles 0, but does not encircle $u_1, \ldots, u_N$. We denote by $O_N(\mathcal{V}_{\mathcal{O}})$ the completion of the space (in $\mathcal{V}_{\mathcal{O}}$) that is spanned by states of the form (124) with $M > 0$, and we denote by $A_N(\mathcal{V}_{\mathcal{O}})$ the quotient space $\mathcal{V}_{\mathcal{O}}/O_N(\mathcal{V}_{\mathcal{O}})$. (In this terminology, Zhu’s algebra is the space $A_2(\mathcal{V}_{\mathcal{O}})$.) By the same arguments as in (107) it is easy to see that every state $\Sigma_\alpha$ that corresponds to an $N$-point function of $N$ highest weights (where the highest weight property is defined as before) defines a functional on $\mathcal{V}_{\mathcal{O}}$ that vanishes on $O_N(\mathcal{V}_{\mathcal{O}})$, and thus defines a linear functional on $A_N(\mathcal{V}_{\mathcal{O}})$.

One can show (see [19] for more details) that the space $A_N(\mathcal{V}_{\mathcal{O}})$ carries $N$ commuting actions of Zhu’s algebra $A(\mathcal{H})$ which are naturally associated to the $N$ non-meromorphic points $u_1, \ldots, u_N$. For example, the action corresponding to $u_1$ is given by (124) with $M = 0$ and $\psi \in \mathcal{H}$, $\psi \circ \phi = V_N^{(0)}(\psi)\phi$. This action is actually well-defined for $\psi \in A(\mathcal{H})$ since we have, for $L \ge 0$,

\[ V_N^{(0)}\left(V^{(L)}(\psi_1)\psi_2\right)\phi \approx V_N^{(L)}(\psi_1)\, V_N^{(0)}(\psi_2)\, \phi, \tag{125} \]


where we denote by $\approx$ equality in $\mathcal{V}_{\mathcal{O}}$ up to states in $O_N(\mathcal{V}_{\mathcal{O}})$. Applying (125) for $L = 0$ implies that the algebra relations of $A(\mathcal{H})$ are respected, i.e. that

\[ (\psi_1 * \psi_2) \circ \phi = \psi_1 \circ (\psi_2 \circ \phi), \tag{126} \]

where $*$ denotes the multiplication of $A(\mathcal{H})$. Every $N$-point correlation function determines therefore $N$ representations of Zhu’s algebra, and because of Zhu’s Theorem, we can associate $N$ representations of the meromorphic conformal field theory to it. Conversely, every linear functional on $A_N(\mathcal{V}_{\mathcal{O}})$ defines an $N$-point correlation function, and two functionals define equivalent such functions if they are related by the actions of Zhu’s algebra; in this way the different $N$-point correlation functions of the meromorphic conformal field theory are classified by $A_N(\mathcal{V}_{\mathcal{O}})$.

There exists also an “algebraic” version of this quotient space, $A_N(\mathcal{H}) = \mathcal{H}/O_N(\mathcal{H})$, where $O_N(\mathcal{H})$ is generated by the states of the form (124), where now $\psi$ and $\phi$ are in $\mathcal{H}$. This space is much more amenable for study, and one may therefore hope that in analogy to Zhu’s Theorem, the two quotient spaces are isomorphic vector spaces, $A_N(\mathcal{V}_{\mathcal{O}}) \simeq A_N(\mathcal{H})$; it would be interesting if this could be established.

It is also rather straightforward to apply the above techniques to an analysis of correlation functions on higher genus Riemann surfaces. Again, it is easy to see that the correlation functions on a genus $g$ surface can be described in terms of a state of the meromorphic conformal field theory on the sphere, in very much the same way in which $N$-point correlation functions can be defined by (87). The corresponding state induces a linear functional on $\mathcal{V}_{\mathcal{O}}$ (or $\mathcal{H}$), and since it vanishes on a certain subspace thereof, defines a linear functional on a suitable quotient space. For the case of the genus $g = 1$ surface, the torus, the corresponding quotient space is very closely related to Zhu’s algebra, and one may expect that similar relations hold more generally.

Acknowledgements. We would like to thank Ben Garling, Terry Gannon, Graeme Segal and Anthony Wassermann for useful conversations. M.R.G. is grateful to Jesus College, Cambridge, for a Research Fellowship and to Harvard University for hospitality during the tenure of a NATO Fellowship in 1996/97. P.G. is grateful to the Mathematische Forschungsinstitut Oberwolfach and the Aspen Center for Physics for hospitality in January 1995 and August 1996, respectively. The visit to Aspen was partially funded by EPSRC grant GR/K30667.

Appendix A: Sequences of Holomorphic Functions

A sequence of functions $\{f_n\} = f_1, f_2, f_3, \ldots$, each defined on a domain $D \subset \mathbb{C}$, is said to be uniformly bounded on $D$ if there exists a real number $M$ such that $|f_n(z)| < M$ for all $n$ and $z \in D$. The sequence is said to be locally uniformly bounded on $D$ if, given $z_0 \in D$, $\{f_n\}$ is uniformly bounded on $N_\delta(z_0, D) = \{z \in D : |z - z_0| < \delta\}$ for some $\delta \equiv \delta(z_0) > 0$.

A sequence of functions $\{f_n\}$ defined on $D$ is said to be uniformly convergent to $f : D \to \mathbb{C}$ if, given $\epsilon > 0$, $\exists\, N$ such that $|f_n(z) - f(z)| < \epsilon$ for all $z \in D$ and $n > N$. We write $f_n \to f$ uniformly in $D$. The sequence is said to be locally uniformly convergent to $f : D \to \mathbb{C}$ if, given $z_0 \in D$, $f_n \to f$ uniformly in $N_\delta(z_0, D)$ for some $\delta \equiv \delta(z_0) > 0$.

Clearly, by the Heine-Borel Theorem, a sequence is locally uniformly bounded on $D$ if and only if it is uniformly bounded on every compact subset of $D$, and locally uniformly convergent on $D$ if and only if it is uniformly convergent on every compact subset of $D$. Local uniformity of the convergence of a sequence of continuous functions guarantees continuity of the limit and, similarly, analyticity of the limit of a sequence of analytic functions is guaranteed by local uniformity of the convergence [22]. The following result [23] is of importance in the approach to conformal field theory developed in this paper:

Theorem. If $D$ is an open domain and $f_n : D \to \mathbb{C}$ is analytic for each $n$, $f_n(z) \to f(z)$ at each $z \in D$, and the sequence $\{f_n\}$ is locally uniformly bounded in $D$, then $f_n \to f$ locally uniformly in $D$ and $f$ is analytic in $D$.

Proof. Again, given $z_0 \in D$, $N_\delta(z_0) = \{z \in \mathbb{C} : |z - z_0| \le \delta\} \subset D$ for some $\delta > 0$ because $D$ is open. Because $N_\delta(z_0)$ is compact, $\exists\, M$ such that $|f_n(z)| < M$ for all $n$ and all $z \in N_\delta(z_0)$. First we show that the sequence $f_n'(z)$ is uniformly bounded on $N_\rho(z_0)$ for all $\rho < \delta$. For

\[ |f_n'(z)| = \left| \frac{1}{2\pi i} \oint_{C_\delta(z_0)} \frac{f_n(\zeta)\, d\zeta}{(\zeta - z)^2} \right| \le \frac{M\delta}{(\delta-\rho)^2} = M_1(\rho), \text{ say.} \]

Thus for fixed $\rho$, given $\epsilon > 0$, $\exists\, \delta_1(\epsilon) > 0$ such that $|f_n(z) - f_n(z')| < \tfrac{1}{3}\epsilon$ for all values of $n$ provided that $z, z' \in N_\rho(z_0)$ are such that $|z - z'| < \delta_1(\epsilon)$. Then also $|f(z) - f(z')| \le \tfrac{1}{3}\epsilon$ for $|z - z'| < \delta_1(\epsilon)$. Now we can find a finite number $K$ of points $z_j \in N_\rho(z_0)$, $1 \le j \le K$, such that given any point $z \in N_\rho(z_0)$, $|z - z_j| < \delta_1$ for some $j$. Now each $f_n(z_j) \to f(z_j)$ and so we can find integers $L_j$ such that $|f(z_j) - f_n(z_j)| < \tfrac{1}{3}\epsilon$ for $n > L_j$. Now if $n > L = \max_{1 \le j \le K}\{L_j\}$ and $z \in N_\rho(z_0)$,

\[ |f_n(z) - f(z)| \le |f_n(z) - f_n(z_j)| + |f_n(z_j) - f(z_j)| + |f(z_j) - f(z)| < \epsilon, \]

establishing uniform convergence on $N_\rho(z_0)$ and so local uniform convergence. This is sufficient to deduce that $f$ is analytic. □

Appendix B: Completeness of $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$

We prove that if, for a sequence $\chi_j \in \mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$, $j = 1, 2, \ldots$, $\eta_\phi(\chi_j)$ converges on each subset of $\phi$ of the form (6) (where the convergence is uniform on (7)), the limit $\lim_{j\to\infty} \eta_\phi(\chi_j)$ necessarily equals $\eta_\phi(\chi)$ for some $\chi \in \mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$.
To see this, note that uniform convergence in this sense is implied by the (uniform) convergence on a countable collection of such sets, taken by considering $\epsilon = 1/N$, $N$ a positive integer, $K$ one of a collection of compact subsets of $\mathcal{O}$ and the $\phi_j$ to be elements of some countable basis. Taken together we obtain in this way a countable number of conditions for the uniform convergence. Defining $\|\Psi\|_n = \max_\phi \max_{\zeta_j} |\eta_\phi(\Psi)|$, where $\phi$ ranges over the first $n$ of these countable conditions for each $\mathcal{O}$ and the maximum is taken over $\zeta_j$ within (7), we have a sequence of semi-norms, $\|\Psi\|_n$, on $\mathcal{V}_{\mathcal{C}}$, with $\|\Psi\|_n \le \|\Psi\|_{n+1}$. Given such a sequence of semi-norms, we can define a Cauchy sequence $(\Psi_j)$, $\Psi_j \in \mathcal{V}_{\mathcal{C}}$, by the requirement that $\|\Psi_i - \Psi_j\|_n \to 0$ as $i, j \to \infty$ for each fixed $n$. This requirement is equivalent to uniform convergence on each set of the form (6).

Moreover, the space $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ is obtained by adding in the limits of these Cauchy sequences (identifying points zero distance apart with respect to all of the semi-norms). This space is necessarily complete because if $\chi_j \in \mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ is Cauchy, i.e. $\|\chi_i - \chi_j\|_n \to 0$ as $i, j \to \infty$ for each fixed $n$, and $\Psi_i^m \to \chi_i$ as $m \to \infty$, $\Psi_i^m \in \mathcal{V}_{\mathcal{C}}$, then selecting $I_N$ so that $\|\chi_i - \chi_j\|_N < 1/3N$ for $i, j \ge I_N$, and $I_{N+1} \ge I_N$, we can find an integer $m_N$ such that $\|\Psi^{m_N}_{I_N} - \chi_{I_N}\|_N < 1/3N$, and if $\psi_N = \Psi^{m_N}_{I_N}$,

\[
\begin{aligned}
\|\psi_M - \psi_N\|_p &\le \|\psi_M - \chi_{I_M}\|_p + \|\chi_{I_M} - \chi_{I_N}\|_p + \|\chi_{I_N} - \psi_N\|_p \\
&\le \|\psi_M - \chi_{I_M}\|_M + \|\chi_{I_M} - \chi_{I_N}\|_N + \|\chi_{I_N} - \psi_N\|_N \\
&\le 1/N,
\end{aligned}
\]

provided that $M \ge N \ge p$, implying that $\psi_M$ is Cauchy. It is easy to see that its limit is the limit of $\chi_j$, showing that $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ is complete. The completeness of this space is equivalent to the condition (11).

Appendix C: Proof that $\mathcal{V}^{\mathcal{O}}_{\mathcal{C}}$ is Independent of $\mathcal{C}$ (Theorem 1)

Proof. To prove that $\widetilde{\mathcal{V}}^{\mathcal{O}}_{\mathcal{C}}$ is independent of $\mathcal{C}$, first note that we may identify a vector $\chi \in \widetilde{\mathcal{V}}^{\mathcal{O}}_{\mathcal{C}}$ with $\psi = \prod_{i=1}^{n} V(\psi_i, z_i)\Omega$ (where $z_i \notin \mathcal{O}$ for $1 \le i \le n$ but it is not necessarily the case that $z_i \in \mathcal{C}$ for each $i$) if $\eta_\phi(\chi) = \eta_\phi(\psi)$ for all $\phi \in \mathcal{B}_{\mathcal{O}}$, i.e. the value of $\eta_\phi(\chi)$ is given by (5) for all $\phi \in \mathcal{B}_{\mathcal{O}}$. Consider then the set $Q$ of values of $z = (z_1, z_2, \ldots, z_n)$ for which $\psi(z) = \prod_{i=1}^{n} V(\psi_i, z_i)\Omega$ is a member of $\widetilde{\mathcal{V}}^{\mathcal{O}}_{\mathcal{C}}$. Then $D' \subset Q \subset D$, where $D' = \{z : z_i, z_j \in \mathcal{C},\ z_i \ne z_j,\ 1 \le i < j \le n\}$ and $D = \{z : z_i, z_j \in \mathcal{O}^c,\ z_i \ne z_j,\ 1 \le i < j \le n\}$, where $\mathcal{O}^c$ is the complement of $\mathcal{O}$ in $\mathbb{P}$. We shall show that $Q = D$.

If $z_b$ is in $D^o$, the interior of $D$, but not in $Q$, choose a point $z_a \in D' \subset Q$ and join it to $z_b$ by a path $C$ inside $D^o$, $\{z(t) : 0 \le t \le 1\}$ with $z(0) = z_a$ and $z(1) = z_b$. (There is such a point $z_a$ because $\mathcal{C}^o \ne \emptyset$; the path $C$ exists because the interior of $\mathcal{O}^c$ is connected, from which it follows that $D^o$ is.) Let $t_c$ be the supremum of the values of $t_0$ for which $\{z(t) : 0 \le t \le t_0\} \subset Q^o$, the interior of $Q$, and let $z_c = z(t_c)$. Then $z_c = (z_1^c, z_2^c, \ldots, z_n^c)$ is inside the open set $D^o$ and so we can find a neighbourhood of the form $N_1 = \{z : |z - z_c| < 4\delta\}$ which is contained inside $D^o$. Let $z_d$, $z_e$ be points each distant less than $\delta$ from $z_c$, with $z_d$ outside $Q$ and $z_e$ inside $Q^o$. (There must be a point $z_d$ outside $Q$ in every neighbourhood of $z_c$.) Then the set $N_2 = \{z_e + (z_d - z_e)\omega : |\omega| < 1\}$ is inside $N_1$ but contains points outside $Q$. We shall show that $N_2 \subset Q$, establishing a contradiction to the assumption that there is a point $z_b$ in $D^o$ but not in $Q$, so that we must have $D^o \subset Q \subset D$.

The disc $\{z_e + (z_d - z_e)\omega : |\omega| < \epsilon\}$ is inside $Q$ for some $\epsilon$ in the range $0 < \epsilon < 1$. Now, we can form the integral

\[ \chi = \int_S \psi(z)\mu(z)\, d^r z \]

of $\psi(z)$ over any compact $r$-dimensional sub-manifold $S \subset Q$, with continuous weight function $\mu(z)$, to obtain an element $\chi \in \widetilde{\mathcal{V}}^{\mathcal{O}}_{\mathcal{C}}$, because the approximating sums to the integral will have the necessary uniform convergence property. So the Taylor coefficients

\[ \psi_N = \oint_{|\omega| = \epsilon} \psi(z_e + (z_d - z_e)\omega)\, \omega^{-N-1}\, d\omega \in \widetilde{\mathcal{V}}^{\mathcal{O}}_{\mathcal{C}} \]

and, since the sum $\sum_{N=0}^{\infty} \eta_\phi(\psi_N)\omega^N$ converges to $\eta_\phi(\psi(z_e + (z_d - z_e)\omega))$ for $|\omega| < 1$ and all $\phi \in \mathcal{B}_{\mathcal{O}}$, we deduce that $N_2 \subset Q$, hence proving that $D^o \subset Q \subset D$. Finally, if $z_j$ is a sequence of points in $Q$ convergent to $z_0 \in D$, it is straightforward to see that $\psi(z_j)$ will converge to $\psi(z_0)$, so that $Q$ is closed in $D$, and so must equal $D$. □


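Appendix D. Möbius Transformation of Vertices

By virtue of the Uniqueness Theorem we can establish the transformation properties

\[ e^{\lambda L_{-1}} V(\Psi, z) e^{-\lambda L_{-1}} = V(\Psi, z + \lambda), \tag{D.1} \]
\[ e^{\lambda L_0} V(\Psi, z) e^{-\lambda L_0} = e^{\lambda h} V(\Psi, e^{\lambda} z), \tag{D.2} \]
\[ e^{\lambda L_1} V(\Psi, z) e^{-\lambda L_1} = (1 - \lambda z)^{-2h}\, V\left(\exp(\lambda(1-\lambda z)L_1)\Psi,\ z/(1-\lambda z)\right), \tag{D.3} \]

where we have used the relation

\[ \begin{pmatrix} 1 & 0 \\ -\lambda & 1 \end{pmatrix} \begin{pmatrix} 1 & z \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & \frac{z}{1-\lambda z} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -\lambda(1-\lambda z) & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{1-\lambda z} & 0 \\ 0 & 1-\lambda z \end{pmatrix}. \tag{D.4} \]

From these it follows that

\[ \langle V(\Phi, z) \rangle = 0, \tag{D.5} \]
\[ \langle V(\Phi, z) V(\Psi, \zeta) \rangle = \frac{\varphi(\Phi, \Psi)}{(z - \zeta)^{h_\Phi + h_\Psi}}, \tag{D.6} \]

where $L_0\Phi = h_\Phi\Phi$, $L_0\Psi = h_\Psi\Psi$ and $h_\Phi \ne 0$. The bilinear form $\varphi(\Phi, \Psi)$, defined as being the constant of proportionality in (D.6), has the symmetry property

\[ \varphi(\Phi, \Psi) = (-1)^{h_\Phi + h_\Psi} \varphi(\Psi, \Phi). \tag{D.7} \]

If, in addition, $L_1\Phi = L_1\Psi = 0$, it follows from (D.3) applied to (D.6) that $\varphi(\Psi, \Phi) = 0$ unless $h_\Phi = h_\Psi$.

It follows from (D.3) that, if $\Psi \in \mathcal{H}_1$,

\[ \langle V(\Psi, z) \rangle = (1-\lambda z)^{-2} \langle V(\Psi, z/(1-\lambda z)) \rangle + \frac{\lambda}{(1-\lambda z)} \langle V(L_1\Psi, z/(1-\lambda z)) \rangle, \tag{D.8} \]

and from (D.5) that both the left-hand side and the first term on the right-hand side of (D.8) vanish, implying that the second term on the right-hand side also vanishes. But $L_1\Psi \in \mathcal{H}_0$ and, if we assume cluster decomposition, so that the vacuum is unique, $L_1\Psi = \kappa\Omega$ for some $\kappa \in \mathbb{C}$. We deduce that $\kappa = \langle V(L_1\Psi, \zeta) \rangle = 0$, so that $\Psi$ is quasi-primary, i.e. $\mathcal{H}_1 = \mathcal{H}_1^Q$.

We can show inductively that $\mathcal{H}_h$ is the direct sum of spaces $L_{-1}^n \mathcal{H}^Q_{h-n}$, where $0 \le n < h$, i.e. $\mathcal{H}$ is composed of quasi-primary states and their descendants under the action of $L_{-1}$. Given $\Psi \in \mathcal{H}_h$, we can find $\Phi \in L_{-1}\mathcal{H}_{h-1}$ such that $L_1(\Psi - \Phi) = 0$; then $\Psi$ is the sum of the quasi-primary state $\Psi - \Phi$ and $\Phi$ which, by an inductive hypothesis, is the sum of descendants of quasi-primary states. To find $\Phi$, note

\[ L_1\Big(\Psi + \sum_{n=1}^{h} a_n L_{-1}^n L_1^n \Psi\Big) = \sum_{n=1}^{h} \big(a_{n-1} + a_n(2nh - n(n+1))\big) L_{-1}^{n-1} L_1^n \Psi, \]

where $a_0 = 1$. So choosing $a_n = -a_{n-1}/(2nh - n(n+1))$, $1 \le n \le h$, we have $L_1(\Psi - \Phi) = 0$ for $\Phi = -\sum_{n=1}^{h} a_n L_{-1}^n L_1^n \Psi \in L_{-1}\mathcal{H}_{h-1}$, establishing the result.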
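The coefficient multiplying $a_n$ in the recursion comes from commuting $L_1$ through $L_{-1}^n$. The sketch below assumes only the standard relations $[L_1, L_{-1}] = 2L_0$ and $[L_0, L_{-1}] = L_{-1}$, and that $L_1^n\Psi$ has $L_0$ eigenvalue $h - n$. The index $k$ is introduced only for this illustration.

```latex
% Move L_1 past the n factors of L_{-1} one at a time, then push
% L_0 to the right using L_0 L_{-1}^{j} = L_{-1}^{j} (L_0 + j):
[L_1, L_{-1}^{n}]
  = \sum_{k=0}^{n-1} L_{-1}^{k}\,(2L_0)\,L_{-1}^{n-1-k}
  = L_{-1}^{n-1}\bigl(2nL_0 + n(n-1)\bigr).

% Acting on L_1^{n}\Psi, where L_0 = h - n:
% 2n(h-n) + n(n-1) = 2nh - n(n+1), hence
L_1 L_{-1}^{n} L_1^{n}\Psi
  = L_{-1}^{n} L_1^{n+1}\Psi
  + \bigl(2nh - n(n+1)\bigr)\, L_{-1}^{n-1} L_1^{n}\Psi.

% Check for h = 2: L_1^{2}\Psi is proportional to \Omega and
% L_{-1}\Omega = 0, so only the n = 1 term survives and
L_1\bigl(\Psi + a_1 L_{-1} L_1 \Psi\bigr) = (1 + 2a_1)\, L_1\Psi ,
% which vanishes for a_1 = -1/2 = -a_0/(2 \cdot 1 \cdot 2 - 1 \cdot 2).
```

For $h = 1$ the coefficient $2nh - n(n+1)$ vanishes at $n = 1$, but in that case $L_1\Psi = 0$ has already been established from (D.8), so no subtraction is needed.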


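Appendix E: Proof that the Extended Amplitudes $\hat{A}$ Satisfy the Axioms

An alternative description of the extended amplitudes can be given as follows: we define the amplitudes involving vectors in $\hat{V}$ recursively (the recursion being on the number of times $L$ appears in an amplitude) by

\[ \langle L(w) \rangle = 0, \tag{E.1} \]

\[
\begin{aligned}
\Big\langle L(w) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle
&= \sum_{l=1}^{n} \frac{c_{\psi_l}}{(w - z_l)^4} \Big\langle \prod_{i \ne l} V(\psi_i, z_i) \Big\rangle
 + \sum_{l=1}^{n} \frac{h_l}{(w - z_l)^2} \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\quad + \sum_{l=1}^{n} \frac{1}{(w - z_l)} \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle
\end{aligned}
\tag{E.2}
\]

and

\[
\begin{aligned}
\Big\langle L(w) \prod_{j=1}^{m} L(w_j) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle
&= \sum_{k=1}^{m} \frac{2}{(w - w_k)^2} \Big\langle \prod_{j=1}^{m} L(w_j) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\quad + \sum_{l=1}^{n} \frac{h_l}{(w - z_l)^2} \Big\langle \prod_{j=1}^{m} L(w_j) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\quad + \sum_{k=1}^{m} \frac{1}{(w - w_k)} \frac{d}{dw_k} \Big\langle \prod_{j=1}^{m} L(w_j) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\quad + \sum_{l=1}^{n} \frac{1}{(w - z_l)} \frac{d}{dz_l} \Big\langle \prod_{j=1}^{m} L(w_j) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\quad + \sum_{k=1}^{m} \frac{c/2}{(w - w_k)^4} \Big\langle \prod_{j \ne k} L(w_j) \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\quad + \sum_{l=1}^{n} \frac{c_{\psi_l}/2}{(w - z_l)^4} \Big\langle \prod_{j=1}^{m} L(w_j) \prod_{i \ne l} V(\psi_i, z_i) \Big\rangle.
\end{aligned}
\tag{E.3}
\]

Here $h_i$ is the grade of the vector $\psi_i \in V$, $c$ is an arbitrary (real) number, and $c_\psi$ is zero unless $\psi$ is of grade two. It is not difficult to see that the functions defined by (E.1)–(E.3) agree with those defined in the main part of the text: for a given set of fields, the difference between the two amplitudes does not have any poles in $w_j$, and therefore is constant as a function of $w_j$; this constant is easily determined to be zero.

The diagrammatical description of the amplitudes immediately implies that the amplitudes are local. We shall now use the formulae (E.1)–(E.3) to prove that they are also Möbius covariant. The Möbius group is generated by translations, scalings and the inversion $z \mapsto 1/z$. It is immediate from the above formulae that the amplitudes (with the grade of $L$ being 2) are covariant under translations and scalings, and we therefore only have to check the covariance under the inversion $z \mapsto 1/z$. First, we calculate (setting for the moment $c_\psi = 0$ for all $\psi$ of grade two)

\[
\begin{aligned}
\Big\langle L(1/w) \prod_{i=1}^{n} V(\psi_i, 1/z_i) \Big\rangle
&= \sum_{l=1}^{n} \frac{h_l}{(1/w - 1/z_l)^2} \Big\langle \prod_{i=1}^{n} V(\psi_i, 1/z_i) \Big\rangle \\
&\quad + \sum_{l=1}^{n} \frac{1}{(1/w - 1/z_l)} \frac{d}{d\tilde{z}_l} \langle V(\psi_1, 1/z_1) \cdots V(\psi_l, \tilde{z}_l) \cdots V(\psi_n, 1/z_n) \rangle \Big|_{\tilde{z}_l = 1/z_l}.
\end{aligned}
\tag{E.4}
\]

Using the Möbius covariance of the original amplitudes, we find

\[
\begin{aligned}
\frac{d}{d\tilde{z}_l} & \langle V(\psi_1, 1/z_1) \cdots V(\psi_l, \tilde{z}_l) \cdots V(\psi_n, 1/z_n) \rangle \Big|_{\tilde{z}_l = 1/z_l}
 = -z_l^2 \frac{d}{dz_l} \Big\langle \prod_{i=1}^{n} V(\psi_i, 1/z_i) \Big\rangle \\
&= -z_l^2 \frac{d}{dz_l} \left[ \prod_{i \ne l} \left(\frac{-1}{z_i^2}\right)^{-h_i} \left(\frac{-1}{z_l^2}\right)^{-h_l} \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \right] \\
&= -z_l^2 \prod_{i=1}^{n} \left(\frac{-1}{z_i^2}\right)^{-h_i} \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle \\
&\quad - z_l^2 (-h_l) \prod_{i=1}^{n} \left(\frac{-1}{z_i^2}\right)^{-h_i} \left(\frac{-1}{z_l^2}\right)^{-1} \frac{2}{z_l^3} \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&= \prod_{i=1}^{n} \left(\frac{-1}{z_i^2}\right)^{-h_i} \left[ -z_l^2 \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle - 2 h_l z_l \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \right].
\end{aligned}
\]

Inserting this formula in the above expression, we get

\[
\begin{aligned}
\Big\langle L(1/w) \prod_{i=1}^{n} V(\psi_i, 1/z_i) \Big\rangle
&= \left(\frac{-1}{w^2}\right)^{-2} \prod_{i=1}^{n} \left(\frac{-1}{z_i^2}\right)^{-h_i} \Bigg\{ \sum_{l=1}^{n} h_l \left[ \frac{z_l^2}{w^2 (w - z_l)^2} + \frac{2 z_l^2}{w^3 (w - z_l)} \right] \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\qquad + \sum_{l=1}^{n} \frac{z_l^3}{w^3 (w - z_l)} \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle \Bigg\} \\
&= \left(\frac{-1}{w^2}\right)^{-2} \prod_{i=1}^{n} \left(\frac{-1}{z_i^2}\right)^{-h_i} \Bigg\{ \sum_{l=1}^{n} h_l \frac{3 z_l^2 w - 2 z_l^3}{w^3 (w - z_l)^2} \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\qquad + \sum_{l=1}^{n} \frac{z_l^3}{w^3 (w - z_l)} \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle \Bigg\}.
\end{aligned}
\]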


It remains to show that the expression in brackets actually agrees with (E.2). To prove this, we observe that, because of the Möbius invariance of the amplitudes, we have

\[ \sum_{l=1}^{n} \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle = 0, \]

\[ \sum_{l=1}^{n} h_l \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle + \sum_{l=1}^{n} z_l \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle = 0 \]

and

\[ \sum_{l=1}^{n} 2 h_l z_l \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle + \sum_{l=1}^{n} z_l^2 \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle = 0. \]

The claim then follows from the observation that

\[
\begin{aligned}
& \sum_{l=1}^{n} h_l \left[ \frac{1}{(w - z_l)^2} - \frac{3 z_l^2 w - 2 z_l^3}{w^3 (w - z_l)^2} \right] \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \\
&\quad + \sum_{l=1}^{n} \left[ \frac{1}{(w - z_l)} - \frac{z_l^3}{w^3 (w - z_l)} \right] \frac{d}{dz_l} \langle V(\psi_1, z_1) \cdots V(\psi_l, z_l) \cdots V(\psi_n, z_n) \rangle \\
&= \frac{1}{w^3} \Bigg\{ w^2 \sum_{l=1}^{n} \frac{d}{dz_l} \Big\langle \prod_{i} V(\psi_i, z_i) \Big\rangle
 + w \left[ \sum_{l=1}^{n} h_l \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle + \sum_{l=1}^{n} z_l \frac{d}{dz_l} \Big\langle \prod_{i} V(\psi_i, z_i) \Big\rangle \right] \\
&\qquad + \sum_{l=1}^{n} 2 h_l z_l \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle + \sum_{l=1}^{n} z_l^2 \frac{d}{dz_l} \Big\langle \prod_{i=1}^{n} V(\psi_i, z_i) \Big\rangle \Bigg\} \\
&= 0.
\end{aligned}
\]

We have thus shown that the functions of the form (E.2) (for $c_\psi = 0$) have the correct transformation property under Möbius transformations. This implies, as $L$ is quasi-primary, that the functions of the form (E.3) have the right transformation property for $c = 0$. However, the sum involving the $c$-terms also has (on its own) the right transformation property, and thus so do the above functions. This completes the proof.

Finally, we want to show that the amplitudes $\hat{A}$ have the cluster property provided the amplitudes $A$ do. We want to prove the cluster property by induction on the number $N_L$ of $L$-fields in the extended amplitudes. If $N_L = 0$, then the result follows from the assumption about the original amplitudes. Let us therefore assume that the result has been proven for $N_L = N$, and consider the amplitudes with $N_L = N + 1$. For a given amplitude, we subdivide the fields into two groups, and we consider the limit where the parameters $z_i$ of one group are scaled to zero, whereas the parameters $\zeta_j$ of the other group are kept fixed. Because of the Möbius covariance, we may assume that the group whose parameters $z_i$ are scaled to zero contains at least one $L$-field, $L(z_1)$, say, and we can use (E.2) (or (E.3)) to rewrite the amplitudes involving $L(z_1)$ in terms of amplitudes which do not involve $L(z_1)$ and which have $N_L \le N$. It then follows from (E.2) (or (E.3)) together with the induction hypothesis that the terms involving $(z_1 - \zeta_j)^{-l}$ (where $l = 1, 2$ or $l = 4$) are not of leading order in the limit where the $z_i$ are scaled to zero, whereas all terms with $(z_1 - z_i)^{-l}$ are. This implies, again by the induction hypothesis, that the amplitudes satisfy the cluster property for $N_L = N + 1$, and the result follows by induction.
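The cancellation used in the covariance check earlier in this Appendix (the step that reduces the difference of the two expressions to the three Möbius identities) rests on two elementary factorisations. They are spelled out here as a sketch, with $z$ standing for any one of the $z_l$.

```latex
% Coefficient of h_l <prod V>:
\frac{1}{(w-z)^2} - \frac{3z^2 w - 2z^3}{w^3 (w-z)^2}
  = \frac{w^3 - 3z^2 w + 2z^3}{w^3 (w-z)^2}
  = \frac{(w-z)^2 (w+2z)}{w^3 (w-z)^2}
  = \frac{w + 2z}{w^3}.

% Coefficient of (d/dz_l) <prod V>:
\frac{1}{w-z} - \frac{z^3}{w^3 (w-z)}
  = \frac{w^3 - z^3}{w^3 (w-z)}
  = \frac{w^2 + wz + z^2}{w^3}.
```

Collecting the powers $w^2$, $w$ and $w^0$ in the numerators then gives the translation, scaling and special conformal identities, respectively, each of which vanishes on its own.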

13. Appendix F: Möbius Transformation of Zhu’s Modes

We want to prove formula (110) in this Appendix. We have to show that

\[
\begin{aligned}
V^{(N)}_{u_1,u_2}(\psi) &= U(\gamma) V^{(N)}(U(\gamma)^{-1}\psi) U(\gamma)^{-1} \\
&= U(\gamma) \oint_0 V\left[ (\zeta+1)^{L_0} U(\gamma)^{-1}\psi, \zeta \right] \frac{d\zeta}{\zeta^{N+1}}\, U(\gamma)^{-1} \\
&= \oint_0 U(\gamma) V\left[ (\zeta+1)^{L_0} U(\gamma)^{-1}\psi, \zeta \right] U(\gamma)^{-1} \frac{d\zeta}{\zeta^{N+1}}.
\end{aligned}
\]

We therefore have to find an expression for the transformed vertex operator. By the uniqueness theorem, it is sufficient to evaluate the expression on the vacuum; then we find

\[ U(\gamma) V\left[ (\zeta+1)^{L_0} U(\gamma)^{-1}\psi, \zeta \right] U(\gamma)^{-1}\Omega = U(\gamma) e^{\zeta L_{-1}} (\zeta+1)^{L_0} U(\gamma)^{-1}\psi. \]

To calculate the product of the Möbius transformations, we write them in terms of $2 \times 2$ matrices, determine their product and rewrite the resulting matrix in terms of the generators $L_0$, $L_{\pm 1}$. After a slightly lengthy calculation we then find

\[ U(\gamma) e^{\zeta L_{-1}} (\zeta+1)^{L_0} U(\gamma)^{-1}\psi = V\left[ \left( \frac{\zeta+1}{\left(1 - \frac{u_2\zeta}{(u_1-u_2)}\right)^2} \right)^{L_0} \exp\left( \frac{\zeta}{u_2\zeta + (u_2-u_1)} L_1 \right) \psi,\ \frac{u_1 u_2 \zeta}{u_2\zeta + (u_2-u_1)} \right] \Omega. \]

In the integral for $V^{(N)}_{u_1,u_2}$ we then change variables to

\[ w = \frac{u_1 u_2 \zeta}{u_2\zeta + (u_2-u_1)} = \gamma(\zeta); \]

in terms of $w$ the relevant expressions become

\[ 1 + \zeta = \frac{u_1(w-u_2)}{u_2(w-u_1)}, \qquad 1 - \frac{u_2\zeta}{(u_1-u_2)} = \frac{u_1}{(u_1-w)}, \qquad d\zeta = dw\, \frac{u_1(u_2-u_1)}{u_2(w-u_1)^2}. \]

Putting everything together, we then obtain formula (110).
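The three expressions used in the change of variables follow from inverting $w = \gamma(\zeta)$. A short check, added here as a sketch:

```latex
% Solve w (u_2 \zeta + u_2 - u_1) = u_1 u_2 \zeta for \zeta:
\zeta = \frac{(u_2 - u_1)\, w}{u_2\, (u_1 - w)}.

% Then
1 + \zeta = \frac{u_2 (u_1 - w) + (u_2 - u_1) w}{u_2 (u_1 - w)}
          = \frac{u_1 (u_2 - w)}{u_2 (u_1 - w)}
          = \frac{u_1 (w - u_2)}{u_2 (w - u_1)},

1 - \frac{u_2 \zeta}{u_1 - u_2} = 1 + \frac{w}{u_1 - w} = \frac{u_1}{u_1 - w},

\frac{d\zeta}{dw} = \frac{u_2 - u_1}{u_2} \cdot \frac{(u_1 - w) + w}{(u_1 - w)^2}
                  = \frac{u_1 (u_2 - u_1)}{u_2 (w - u_1)^2}.
```

As a consistency check on $\gamma$ itself, $\zeta \to \infty$ gives $w \to u_1$, $\zeta = -1$ gives $w = u_2$, and $\zeta = 0$ gives $w = 0$, so the contour around $\zeta = 0$ is mapped to a contour around $w = 0$.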


14. Appendix G: Rings and Algebras In this Appendix we review various concepts in algebra; the treatment follows closely the book [24]. We restrict attention to rings, R which have a unit element, 1 ∈ R. An algebra, A, over a field F , is a ring which is also a vector space over F in such a way that the structures are compatible [λ(xy) = (λx)y, λ ∈ F , x, y ∈ R]. The dimension of A is its dimension as a vector space. We shall in general consider complex algebras, i.e. algebras over C. Since 1 ∈ A we have F ⊂ A. A (left) module for a ring R is an additive group M with a map R × M → M, compatible with the structure of R [i.e. (rs)m = r(sm), (r + s)m = rm + sm, r, s ∈ R, m ∈ M]. A module M for an algebra A, viewed as a ring, is necessarily a vector space over F (because F ⊂ A) and provides a representation of A as an algebra in terms of endomorphisms of the vector space M. R provides a module for itself, the adjoint module. A submodule N of a module M for R is an additive subgroup of M such that rN ⊂ N for all r ∈ R. A simple or irreducible module is one which has no proper submodules. A (left) ideal J of R is a submodule of the adjoint module, i.e. an additive subgroup J ⊂ R such that rj ∈ I for all r ∈ R, j ∈ J . The direct sum M1 ⊕ M2 of the R modules M1 , M2 is the additive group M1 ⊕ M2 with r(m1 , m2 ) = (rm1 , rm2 ), r ∈ R, m1 ∈ M1 , m2 ∈ M2 . The direct sum of a (possibly infinite) set Mi , i ∈ I , of R modules consists of elements (mi , i ∈ I ), with all but finitely many mi = 0. The module M is decomposable if it can be written as the direct sum of two non-zero modules and completely reducible if it can be written as the direct sum of a (possibly infinite) sum of irreducible modules. A representation of an algebra A is irreducible if it is irreducible as a module of the ring A. An ideal J is maximal in R if K ⊃ J is another ideal in R, then K = R. If M is an irreducible module for the ring R, then M ∼ = R/J for some maximal ideal J ⊂ R. 
[Take m ∈ M, m 6 = 0 and consider Rm ⊂ M. This is a submodule, so Rm = M. The kernel of the map r 7 → rm is an ideal, J ⊂ R. So M ∼ = R/J . If J ⊂ K ⊂ R and K is an ideal, then K/J defines a submodule of M, so that K/J = M and K = R, i.e. J is maximal.] Thus an irreducible representation of a finite-dimensional algebra A is necessarily finite-dimensional. The coadjoint representation A0 of an algebra A is defined on the dual vector space to A consisting of linear maps ρ : A → F with (rρ)(s) = ρ(sr). If M is an n-dimensional irreducible representation of A and dij (r) the corresponding representation matrices, the n elements dij (r), 1 ≤ j ≤ n, i fixed, define an n-dimensional invariant subspace A0 corresponding to a representation equivalent to M. So the sum of the dimensions of the inequivalent representations of A does not exceed dim A. This shows that each irreducible representation of a finite-dimensional algebra is finite-dimensional and there are only finitely many equivalence classes of such representations. This is not such a strong statement as it seems because A may have indecomposable representations. In fact A may have an infinite number of inequivalent representations of a given dimension even if dim A < ∞. [E.g. consider the three dimensional complex 2 2 algebra, consisting of λ + µx + νy, λ, µ, ν ∈ C, subject to x =  y = xy = yx = 0, λ0µ which has the faithful three dimensional representation 0 λ ν  and the inequivalent 00λ

Axiomatic Conformal Field Theory

593

λ µ + ξν , for each ξ ∈ C.] The situation is more 0 λ under control if the algebra is semi-simple. The ring R is semi-simple if the adjoint representation is completely reducible. If R is semi-simple, 1 is the sum of a finite number of elements of R, one in each of aP number of the summands in P the expression of R as a sum of irreducible modules, 1 = ni=1 ei and, since any r = ni=1 Lnrei , it follows that there is a finite number, n, of summands Ri = Rei and R = i=1 Ri . If R is semi-simple, every R module is completely reducible (though not necessarily into a finite number L of irreducible summands). [Any module M is the quotient of the free module R = m∈M Rm , where RP m is a copy of the adjoint module R, by the ideal consisting of those (rm )m∈M such that m∈M rm m = 0. The result follows since R is completely reducible if R is and the quotient of a completely reducible module is itself completely reducible.] P An R module M is finitely-generated if M = { ni=1 ri mi : ri ∈ R} for a finite number, n, of fixed elements mi ∈ M, 1 ≤ i ≤ n. If R is semi-simple, any finitely generated R module is completely reducible of summands. [This Ln into a finite number ∼ R , where R R, by the ideal {(ri ) : follows because M is the quotient of = i i i=1 Pn r m = 0}.] i i i=1 If M and N are R modules, an R-homomorphism f : M → N is a map satisfying rf = f r. If M and N are simple modules, Schur’s Lemma implies that the set of R-homomorphisms HomR (M, N ) = 0 if M and N are not equivalent. If M = N , HomR (M, M) ≡ EndR (M) is a division ring, that is every ring in which every non-zero element has an inverse. In the case of an algebra, if dim M < ∞, EndA (M) = F , the underlying field. If the R-module M is completely decomposable into a finite number of irreducible P ni submodules, we can write M = N i=1 Mi , where each Mi is irreducible and Mi and Mj are inequivalent if i 6 = j . Since HomR (Mi , Mj ) = 0 if i 6= j , two-dimensional representations

$$\mathrm{End}_R(M) = \prod_{i=1}^{N} \mathrm{End}_R(M_i^{n_i}) = \prod_{i=1}^{N} M_{n_i}(D_i),$$

where the division algebra $D_i = \mathrm{End}_R(M_i)$ and $M_n(D)$ is the ring of n × n matrices with entries in the division algebra D.

If R is a semi-simple ring, we can write $R = \bigoplus_{i=1}^{N} R_i^{n_i}$, where the $R_i$ are irreducible as R modules and inequivalent for i ≠ j. So $\mathrm{End}_R R = \prod_{i=1}^{N} M_{n_i}(D_i)$, where $D_i = \mathrm{End}_R(R_i)$. But $\mathrm{End}_R R = R^o$, the reverse ring to R defined on the set R by taking the product of r and s to be sr rather than rs. Since, evidently, $(R^o)^o = R$,

$$R = \prod_{i=1}^{N} M_{n_i}(D_i^o),$$

i.e. every semi-simple ring is isomorphic to the direct product of a finite number of finite-dimensional matrix rings over division algebras [Wedderburn's Structure Theorem]. In the case of a semi-simple algebra, each $D_i = F$, the underlying field, so

$$A = \prod_{i=1}^{N} M_{n_i}(F),$$

where Mn (F ) is the algebra of n × n matrices with entries in the field F . In particular, any semi-simple algebra is finite-dimensional.


References

1. Veneziano, G.: Construction of a crossing-symmetric, Regge-behaved amplitude for linearly rising trajectories. Nuovo Cim. 57A, 190 (1968)
2. Koba, Z. and Nielsen, H.B.: Manifestly crossing-invariant parametrization of n-meson amplitude. Nucl. Phys. B12, 517 (1969)
3. Belavin, A.A., Polyakov, A.M. and Zamolodchikov, A.B.: Infinite conformal symmetry in two-dimensional quantum field theory. Nucl. Phys. B241, 333 (1984)
4. Moore, G. and Seiberg, N.: Classical and Quantum Conformal Field Theory. Commun. Math. Phys. 123, 177 (1989)
5. Segal, G.: Notes on Conformal Field Theory. Unpublished manuscript
6. Borcherds, R.E.: Vertex algebras, Kac–Moody algebras and the monster. Proc. Nat. Acad. Sci. U.S.A. 83, 3068 (1986)
7. Borcherds, R.E.: Monstrous moonshine and monstrous Lie algebras. Invent. Math. 109, 405 (1992)
8. Frenkel, I., Lepowsky, J. and Meurman, A.: Vertex Operator Algebras and the Monster. London–New York: Academic Press, 1988
9. Frenkel, I., Huang, Y.-Z. and Lepowsky, J.: On axiomatic approaches to vertex operator algebras and modules. Mem. Am. Math. Soc. 104, 1 (1993)
10. Kac, V.: Vertex algebras for beginners. Providence, RI: Am. Math. Soc., 1997
11. Wassermann, A.J.: Operator algebras and conformal field theory. In: Proceedings of the I.C.M. Zürich 1994, Basel–Boston: Birkhäuser, 1995, p. 966
12. Gabbiani, F. and Fröhlich, J.: Operator algebras and conformal field theory. Commun. Math. Phys. 155, 569 (1993)
13. Goddard, P.: Meromorphic conformal field theory. In: Infinite dimensional Lie algebras and Lie groups: Proceedings of the CIRM Luminy Conference, 1988. Singapore: World Scientific, 1989, p. 556
14. Osterwalder, K. and Schrader, R.: Axioms for euclidean Green's functions. Commun. Math. Phys. 31, 83 (1973)
15. Felder, G., Fröhlich, J. and Keller, G.: On the structure of unitary conformal field theory. I. Existence of conformal blocks. Commun. Math. Phys. 124, 417 (1989)
16. Huang, Y.-Z.: A functional-analytic theory of vertex (operator) algebras. math.QA/9808022
17. Montague, P.: On Representations of Conformal Field Theories and the Construction of Orbifolds. Lett. Math. Phys. 38, 1 (1996), hep-th/9507083
18. Zhu, Y.: Vertex Operator Algebras, Elliptic Functions and Modular Forms. Caltech preprint (1990), J. Am. Math. Soc. 9, 237 (1996)
19. Gaberdiel, M.R. and Goddard, P.: In preparation
20. Frenkel, I.B. and Zhu, Y.: Vertex operator algebras associated to representations of affine and Virasoro algebras. Duke Math. J. 66, 123 (1992)
21. Dong, C., Li, H. and Mason, G.: Twisted Representations of Vertex Operator Algebras. q-alg/9509005
22. Hille, E.: Analytic Function Theory I. Blaisdell Publishing Company, 1959
23. Hille, E.: Analytic Function Theory II. Blaisdell Publishing Company, 1962
24. Farb, B. and Dennis, R.K.: Noncommutative Algebra. Berlin–Heidelberg–New York: Springer, 1993

Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 209, 595 – 632 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Large Deviations and Variational Principle for Harmonic Crystals

P. Caputo, J.-D. Deuschel

Fachbereich Mathematik, Sekr. MA 7-4, TU Berlin, Strasse des 17. Juni 136, 10623 Berlin, Germany. E-mail: [email protected]; [email protected]

Received: 2 June 1999 / Accepted: 9 August 1999

Abstract: We consider massless Gaussian fields with covariance related to the Green function of a long range random walk on $\mathbb{Z}^d$. These are viewed as Gibbs measures for a linear-quadratic interaction. We establish thermodynamic identities and prove a version of Gibbs' variational principle, showing that translation invariant Gibbs measures are characterized as minimizers of the relative entropy density. We then study the large deviations of the empirical field of a Gibbs measure. We show that a weak large deviation principle holds at the volume order, with rate given by the relative entropy density.

1. Introduction

Connections between equilibrium statistical mechanics and the theory of large deviations have been known for a long time, [24]. For lattice random fields with bounded interactions, estimates of the Donsker–Varadhan type [13] have been established for the empirical field of a Gibbs measure [7,32,18,21], showing that probabilities of large fluctuations of the system in the thermodynamical limit are exponentially small in the volume with rate given by a relative entropy functional. The latter can be expressed as a difference of free energies and a classical variational principle applies, giving a complete characterization of equilibrium measures, i.e. translation invariant Gibbs measures, as minima of relative entropy. The presence or absence of a phase transition in the system is then explicitly reflected in the large deviation picture, in the sense that if there exists more than one translation invariant Gibbs state, then large fluctuations can occur with probabilities decaying slower than exponentially in the volume of the system. Moreover, the interplay between large deviations and the variational principle has found interesting applications in testing the Gibbsian property of certain probability measures [16].
The case of unbounded spin systems is far less investigated, the reason being that in the noncompact setting, even existence of thermodynamics becomes nontrivial. Existence problems can be solved by making strong assumptions on the interactions such as superstability, [26]. Assuming superstability and other regularity properties, a version of the variational principle was discussed in [23]. Since there are many simple and basic models – e.g. the massless free field in dimension d ≥ 3 – which violate superstability, it is interesting to see how far the theory can be pushed without these assumptions. In the present work we make a first attempt in this direction, in the simplified context of Gaussian random fields. We consider an unbounded spin system on $\mathbb{Z}^d$, d ≥ 1, with linear-quadratic pair potential, also known as the harmonic crystal. The formal Hamiltonian is given by

$$H(\phi) = \sum_{x,y} J(x,y)\,\phi_x \phi_y, \tag{1.1}$$

where $\phi_x \in \mathbb{R}$, $x \in \mathbb{Z}^d$, and the couplings J(x, y) are symmetric and translation invariant. The linearity of the model makes it possible to give a complete description of the Gibbs measures associated to (1.1) – provided they exist – in terms of Gaussian fields, with covariance given by the Green function $G = J^{-1}$, [12]. It follows that, in order to have existence, minimal assumptions on J are

$$\hat J \ge 0 \quad \text{and} \quad \int \frac{d\theta}{\hat J(\theta)} < \infty,$$

with $\hat J$ denoting the Fourier transform of J, as a function on $(-\pi, \pi]^d$ ($\hat J^{-1}$ plays the role of a spectral density). We say that the harmonic crystal (1.1) is in a massless regime if $\hat J$ vanishes at least at one point. On the other hand, if $\hat J$ has no zeros the model is much simpler: the spectral density $\hat J^{-1}$ is bounded and we have weakly dependent Gaussian fields, with summable covariances. Such fields satisfy hypercontractive estimates which allow one to use standard tools to study large deviations, [6,11]. When $\hat J$ has no zeros we therefore speak of a hypercontractive regime. In a sense which is made precise in the next sections, in the hypercontractive regime we have uniqueness of the Gibbs measure, while in the massless regime we have a phase transition. Moreover, one can show that the interaction is superstable if and only if $\hat J$ has no zeros, [23].

We restrict our attention to couplings J of the following form. Let p(x, y) be the transition function of a symmetric, homogeneous random walk on $\mathbb{Z}^d$. We consider

$$J(x,y) = \beta \begin{cases} 1 - p(0,0) & \text{if } x = y, \\ -\eta_{x-y}\, p(x,y) & \text{if } x \neq y, \end{cases} \tag{1.2}$$

with β > 0 and $\eta \in \{-1,+1\}^{\mathbb{Z}^d}$, $\eta_x = \eta_{-x}$. The parameter β plays no role (there is no "temperature") and it will be set equal to one. The vector η allows us to treat the case of non ferromagnetic couplings. If we set η ≡ 1 the model is ferromagnetic, and the Hamiltonian (1.1) takes the form

$$H(\phi) = \frac{1}{2} \sum_{x,y} p(x,y)\,(\phi_x - \phi_y)^2. \tag{1.3}$$
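The passage from (1.1)–(1.2) with η ≡ 1 and β = 1 to the gradient form (1.3) uses only the symmetry of p and the normalization $\sum_y p(x,y) = 1$. The following minimal Python sketch checks the identity numerically. It rests on assumptions that are not in the text: the lattice $\mathbb{Z}^d$ is replaced by a finite cycle so that all sums are finite, and the step distribution `step_law_on_cycle` is a hypothetical example.

```python
import random


def step_law_on_cycle(n):
    """Hypothetical symmetric step distribution on the cycle Z_n (requires n >= 5)."""
    p = [0.0] * n
    p[0] = 0.1
    p[1] = p[n - 1] = 0.3
    p[2] = p[n - 2] = 0.15
    return p


def hamiltonian_coupling_form(phi, p):
    """sum_{x,y} J(x,y) phi_x phi_y with J(x,y) = delta(x,y) - p(x,y), cf. (1.1)-(1.2)."""
    n = len(phi)
    total = 0.0
    for x in range(n):
        for y in range(n):
            j_xy = (1.0 if x == y else 0.0) - p[(y - x) % n]
            total += j_xy * phi[x] * phi[y]
    return total


def hamiltonian_gradient_form(phi, p):
    """(1/2) sum_{x,y} p(x,y) (phi_x - phi_y)^2, cf. (1.3)."""
    n = len(phi)
    return 0.5 * sum(
        p[(y - x) % n] * (phi[x] - phi[y]) ** 2
        for x in range(n)
        for y in range(n)
    )


rng = random.Random(0)
phi = [rng.gauss(0.0, 1.0) for _ in range(8)]
p = step_law_on_cycle(8)
# The two numbers printed should agree up to rounding error.
print(hamiltonian_coupling_form(phi, p), hamiltonian_gradient_form(phi, p))
```

Both forms are unchanged when a constant is added to every $\phi_x$, which is the global shift symmetry discussed in the text.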

The system (1.3) can be interpreted as a model for a harmonic interface. The global shift symmetry of the field φ is spontaneously broken as soon as Gibbs measures exist, yielding a phase transition regime. As a consequence, Gibbs measures will not exist for short-range models in dimension d = 1, 2, cf. [19]. We ensure existence by requiring the random walk p to be transient. In particular, low-dimensional models can be treated if the random walk takes sufficiently long jumps. On the other hand for finite range p, large deviations estimates for harmonic crystals of the type (1.3) were obtained in [1]. For ferromagnetic models, our results on large deviations are the natural extension of [1] to the non-local case. If η ≢ 1, the model is not ferromagnetic and monotonicity tools such as FKG inequalities are no longer available. Moreover, there is no longer a global shift symmetry to be broken, but a phase transition can still occur if the Fourier transform $\hat J$ of J has zeros. In order to obtain a massless regime we require the constants η to have a suitable oscillating structure.

We now turn to a brief description of the main results of this paper. We refer to the relevant sections for precise statements and proofs. In Sect. 2, after introducing the models and reviewing some known material, we prove a new result concerning the structure of translation invariant Gibbs measures. It states that for ferromagnetic models generated by an irreducible random walk, translation invariant Gibbs measures cannot be realized as mixtures of non-translation invariant extremal Gibbs measures. Or, in other words, the set of translation invariant Gibbs measures is a face of the convex set of Gibbs measures. See Remark 3 for further discussion of this topic.

In Sect. 3 we introduce thermodynamics. We prove existence of relative entropy density together with an identity expressing the latter as a difference of free energies. In the case of bounded interactions, this is a classical result, [17]. For Gaussian random fields the identity was proven by Künsch under the assumption that either $\hat J^{-1}$ is bounded or J(x, y) decays at least as $|x - y|^{-3d/2}$, as |x − y| → ∞, cf. [22], Theorem 2.11. We show that in our setting the result is true with no assumption on the decay of J.
We next establish the variational principle characterizing translation invariant (tempered) Gibbs states as minima of the free energy or, equivalently, as zeros of relative entropy. In the finite range case, the proof that any minimizer must be Gibbsian follows from the standard argument based on quasilocality, [33,20]. The non-local case is much more difficult in the unbounded-spin setting, since specifications are no longer quasilocal and a weaker approximation is required (the same problem appears for compact spins in the context of weakly Gibbsian measures [28], see also [27] for results in this direction). Under assumptions of superstability and rapid decay of the interaction the issue was settled in [23], but the general case remained open. Exploiting the linearity of the model we are able to show that for our harmonic crystals the variational principle holds with no restriction on the range of J. The main tool in this section is an expansion in terms of random walks for the Green function G associated to J, closely related to the expansions introduced in [4].

Our results concerning large deviations are discussed in Sect. 4. The empirical field on a cubic box $\Lambda_N$ of side N is the random probability measure defined by

$$\tilde R_N(\phi) = \frac{1}{N^d} \sum_{x \in \Lambda_N} \delta_{\theta_x \phi}, \qquad \phi \in \Omega, \tag{1.4}$$

where $\delta_\xi$ stands for the Dirac measure at $\xi \in \Omega \equiv \mathbb{R}^{\mathbb{Z}^d}$, and $\theta_x \phi$ denotes the shifted configuration $\phi_{\cdot + x}$. Let P denote the centered Gaussian field with covariance given by G. Since P is ergodic with respect to lattice shifts,

$$\tilde R_N \Rightarrow P, \quad P\text{-a.s.}, \quad \text{as } N \to \infty,$$

in the sense of weak convergence. In particular, for any open set Γ of probability measures on Ω such that P ∉ Γ, we have $P(\tilde R_N \in \Gamma) \to 0$, as N → ∞. We show that these probabilities are exponentially decaying in the volume $N^d$, with rate given by the relative entropy density. Since we are interested in the massless regime, classical large deviations results for stationary Gaussian processes based on the continuity of the spectral density (cf. e.g. [14,3]) cannot be used. Following an idea introduced in [1], the exponential bounds are first established for the hypercontractive regime and then lifted to the general case. In contrast to the hypercontractive regime, in the phase transition regime the upper estimates must be restricted to certain compact sets of well behaved measures. In this sense we speak of a weak large deviation principle. In view of the variational principle one expects critical large deviations in the phase transition regime. In particular, if Γ is an open set such that P ∉ Γ but Γ contains some other translation invariant Gibbs measure, then $P(\tilde R_N \in \Gamma)$ decays slower than exponentially in the volume. This is indeed what has been described in detail in [1] for finite range harmonic crystals. We discuss critical large deviations for a class of long range models of the form (1.3) in a forthcoming paper, see also our remarks after Theorem 4.1. Finally, the appendix contains an expansion for Green functions in terms of random walks, and an estimate relevant for the hypercontractive regime.

2. The Models

2.1. Preliminaries. Let $\mathbb{Z}^d$ be the d-dimensional integer lattice. For $x \in \mathbb{Z}^d$, |x| denotes the euclidean norm. Let $\Omega = \mathbb{R}^{\mathbb{Z}^d}$, equipped with the product topology, and denote by $\mathcal{F}$ the corresponding Borel field. Probability measures on $(\Omega, \mathcal{F})$ are denoted by $\mathcal{M}_1(\Omega)$. The set $\mathcal{M}_1(\Omega)$ is equipped with the weak topology, i.e. the topology induced by convergence against bounded continuous functions on Ω. For each finite subset Λ of $\mathbb{Z}^d$, denoted $\Lambda \subset\subset \mathbb{Z}^d$, $\mathcal{F}_\Lambda$ is the σ-field generated by the canonical projections $\pi_x(\phi) = \phi_x$, x ∈ Λ, φ ∈ Ω, and $\phi_\Lambda$ denotes the restriction of φ ∈ Ω to Λ. We also set $\Lambda^c = \mathbb{Z}^d \setminus \Lambda$. A function f : Ω → R is called local if there exists a $\Lambda \subset\subset \mathbb{Z}^d$ such that f is $\mathcal{F}_\Lambda$-measurable. The smallest such Λ is called the support of f and is denoted $S_f$. Lattice shifts $\{\theta_y\}$ are defined by $\pi_x \circ \theta_y = \pi_{x+y}$, $x, y \in \mathbb{Z}^d$. Elements of $\mathcal{M}_1(\Omega)$ invariant under transformations induced by shifts are denoted $\mathcal{M}_1^S(\Omega)$. The set of shift invariant measures $\mathcal{M}_1^S(\Omega)$ is a closed convex subset of $\mathcal{M}_1(\Omega)$. A measure $\mu \in \mathcal{M}_1^S(\Omega)$ is called ergodic if µ(A) = 0 or 1 for any event $A \in \mathcal{F}$ such that $\theta_x A = A$ for all $x \in \mathbb{Z}^d$.

2.2. The interaction. Let $p : \mathbb{Z}^d \times \mathbb{Z}^d \to [0, 1]$ be a symmetric, homogeneous function

$$p(x,y) = p(0, y - x) = p(y,x), \qquad x, y \in \mathbb{Z}^d.$$

We make the following basic assumptions.

• p is the transition function of a random walk on $\mathbb{Z}^d$, i.e.

$$\sum_{y \in \mathbb{Z}^d} p(x,y) = 1, \qquad x \in \mathbb{Z}^d. \tag{2.1}$$


• p is transient, i.e.

$$\sigma^2 = \sum_{n=0}^{\infty} p^n(0,0) < \infty, \tag{2.2}$$

where $p^n$ denotes the nth iterate of p. The model is specified by the potential

$$J(x,y) = \delta(x,y) - Q(x,y), \tag{2.3}$$

where δ(x, y) = 1 if x = y, and δ(x, y) = 0 otherwise, and we have defined

$$Q(x,y) = \eta_{x-y}\, p(x,y), \tag{2.4}$$

with η an arbitrary symmetric vector with values in {−1, +1}:

$$\eta \in \{-1,+1\}^{\mathbb{Z}^d}, \qquad \eta_x = \eta_{-x} \quad \forall x \in \mathbb{Z}^d.$$

We also use the notation J = 1 − Q. We will be working in this setting for the rest of the paper, with the assumptions (2.1) and (2.2). In Sect. 4, the following mild decay condition will be considered:

• there exists β > 0, such that

$$\lim_{|x| \to \infty} |x|^{d+\beta}\, p(0,x) = 0. \tag{2.5}$$

We emphasize that our results up to Sect. 4 hold without any hypothesis (other than (2.1)) on the decay of p. Thanks to the transience condition (2.2), the Green function

$$G(x,y) = \sum_{n=0}^{\infty} Q^n(x,y) \tag{2.6}$$

is well defined, cf. [35]. Let $\hat J$ denote the Fourier transform of J,

$$\hat J(\theta) = \sum_{x \in \mathbb{Z}^d} J(0,x)\, e^{i x \cdot \theta}, \qquad \theta \in (-\pi, \pi]^d.$$

It is easily checked that (2.1) implies $\hat J \ge 0$, and (2.2) implies that $f \equiv \hat J^{-1}$ is integrable, with

$$G(x,y) = \frac{1}{(2\pi)^d} \int_{(-\pi,\pi]^d} f(\theta)\, e^{i (y-x) \cdot \theta}\, d\theta. \tag{2.7}$$

Remark 1. Let us define

$$\mathcal{N} = \{\theta \in (-\pi, \pi]^d : \hat J(\theta) = 0\}, \tag{2.8}$$

the set of zeros of $\hat J$. If η ≡ 1 in (2.4) then clearly $0 \in \mathcal{N}$. Moreover, it is known that $\mathcal{N} = \{0\}$ if p is irreducible, cf. [35]. By irreducibility we mean that for any $x, y \in \mathbb{Z}^d$ there exists an integer n > 0 such that $p^n(x, y) > 0$. In the non ferromagnetic case η ≢ 1 it could happen that $\mathcal{N} = \emptyset$. As discussed in the introduction, this reduces to a hypercontractive regime, with G(x, y) absolutely summable. One should therefore keep in mind that the interesting cases are the massless models, obtained by choosing η in such a way that $\mathcal{N} \neq \emptyset$. In Sect. 2.5 we discuss some examples.
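The zero set of $\hat J$ can be made concrete in a simple case. The Python sketch below evaluates $\hat J$ for the simple random walk on $\mathbb{Z}^3$ (which is transient), once with η ≡ 1 and once with the oscillating choice $\eta_x = \cos(\pi(x_1 + x_2 + x_3))$. Both the choice of walk and the function names are illustrative assumptions, not taken from the paper; the sketch only illustrates that the zero moves from θ = 0 to θ = (π, π, π) while $\hat J$ stays non-negative.

```python
import math
from itertools import product


def nearest_neighbour_steps():
    """The six unit steps of the simple random walk on Z^3."""
    steps = []
    for axis in range(3):
        for sign in (1, -1):
            step = [0, 0, 0]
            step[axis] = sign
            steps.append(tuple(step))
    return steps


def j_hat(theta, eta):
    """Fourier transform of J(0, .) = delta(0, .) - eta_x p(0, x), cf. (2.3)-(2.4).

    Since J(0, x) = J(0, -x), the imaginary parts cancel and only cosines remain.
    """
    total = 1.0  # J(0, 0) = 1 because p(0, 0) = 0 for the simple random walk
    for step in nearest_neighbour_steps():
        phase = sum(s * t for s, t in zip(step, theta))
        total -= eta(step) * (1.0 / 6.0) * math.cos(phase)
    return total


def eta_ferromagnetic(x):
    return 1


def eta_oscillating(x):
    # cos(pi * (x1 + x2 + x3)) equals +1 on even sites and -1 on odd sites
    return 1 if sum(x) % 2 == 0 else -1


origin = (0.0, 0.0, 0.0)
corner = (math.pi, math.pi, math.pi)
print(j_hat(origin, eta_ferromagnetic), j_hat(corner, eta_ferromagnetic))
print(j_hat(origin, eta_oscillating), j_hat(corner, eta_oscillating))

# Coarse grid scan: the minimum of J-hat over (-pi, pi]^3 should be (numerically) zero.
grid = [-math.pi + (k + 1) * math.pi / 4 for k in range(8)]
print(min(j_hat(theta, eta_oscillating) for theta in product(grid, repeat=3)))
```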


Let $\Lambda \subset\subset \mathbb{Z}^d$. We denote by $J_\Lambda$ and $Q_\Lambda$ the restriction to Λ of J and Q respectively. Then $J_\Lambda = 1_\Lambda - Q_\Lambda$ is a positive definite matrix and $\Gamma_\Lambda \equiv J_\Lambda^{-1}$ is given by

$$\Gamma_\Lambda(x,y) = \sum_{n=0}^{\infty} Q_\Lambda^n(x,y), \qquad x, y \in \Lambda. \tag{2.9}$$

Let $G_\Lambda$ be the restriction of G to Λ. Note that $G_\Lambda$ is positive definite. In the appendix, cf. Lemma A.1, we show that $G_\Lambda$ has an expansion analogous to (2.9) where $Q_\Lambda$ must be replaced by an effective coupling matrix $\tilde Q_\Lambda$. This gives

$$G_\Lambda^{-1} = 1_\Lambda - \tilde Q_\Lambda. \tag{2.10}$$

The matrix $G_\Lambda$ can be seen as the Green function of J on Λ with free boundary conditions, while $\Gamma_\Lambda$ is associated to Dirichlet boundary conditions. We call $P_\Lambda$ the centered Gaussian measure on $\mathbb{R}^\Lambda$ with covariance $G_\Lambda$, so that

$$\frac{dP_\Lambda}{d\phi_\Lambda}(\phi) = (2\pi)^{-|\Lambda|/2} (\det G_\Lambda)^{-1/2} \exp\Big(-\frac{1}{2} \big\langle \phi, G_\Lambda^{-1} \phi \big\rangle\Big).$$

Here $d\phi_\Lambda = \prod_{x \in \Lambda} d\phi_x$ denotes Lebesgue measure, $\langle \cdot, \cdot \rangle$ is the usual scalar product, and |Λ| is the cardinality of the set Λ. The measures $\{P_\Lambda, \Lambda \subset\subset \mathbb{Z}^d\}$ form a consistent family and there exists a unique measure $P \in \mathcal{M}_1(\Omega)$ such that its marginal on Λ is given by $P_\Lambda$. We also refer to P as the centered Gaussian field on $\mathbb{Z}^d$, with covariance G. From (2.7) and the Riemann–Lebesgue lemma, we have G(x, y) → 0, as |x − y| → ∞. This implies P is trivial on tail events, cf. e.g. [20], Chapter 7. In particular, P is ergodic w.r.t. lattice shifts.
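The series (2.9) for the Dirichlet Green function can be summed numerically in a small case. The sketch below is illustrative only and makes assumptions not in the text: it takes the simple random walk on $\mathbb{Z}$ with η ≡ 1 (recurrent on $\mathbb{Z}$, but the killed walk on a finite box still has a convergent series), the box Λ = {1, ..., n}, and the standard closed form $2\min(x,y)\,(n+1-\max(x,y))/(n+1)$ for the expected number of visits before exiting, against which the series is compared.

```python
def killed_walk_matrix(n):
    """Q_Lambda for the simple random walk on Z restricted to the box {1, ..., n}."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        q[i][i + 1] = q[i + 1][i] = 0.5
    return q


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def dirichlet_green(q, tol=1e-15, max_terms=10000):
    """Sum Gamma = sum_{n >= 0} Q^n, cf. (2.9), until the terms drop below tol."""
    n = len(q)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # Q^0
    gamma = [[0.0] * n for _ in range(n)]
    for _ in range(max_terms):
        for i in range(n):
            for j in range(n):
                gamma[i][j] += power[i][j]
        if max(max(row) for row in power) < tol:  # entries are non-negative
            break
        power = mat_mul(power, q)
    return gamma


def closed_form_green(n, x, y):
    """Expected number of visits to y, starting at x, before leaving {1, ..., n}."""
    size = n + 1
    return 2.0 * min(x, y) * (size - max(x, y)) / size


gamma = dirichlet_green(killed_walk_matrix(5))
print(gamma[0][0], closed_form_green(5, 1, 1))
print(gamma[1][3], closed_form_green(5, 2, 4))
```

Multiplying the result by $1_\Lambda - Q_\Lambda$ should return the identity matrix, which is the statement $\Gamma_\Lambda = J_\Lambda^{-1}$.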

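2.3. Gibbs measures. Let

$$\Omega_J = \Big\{ \phi \in \Omega : \sum_{y \in \mathbb{Z}^d} |J(x,y)\, \phi_y| < \infty, \text{ for all } x \in \mathbb{Z}^d \Big\}. \tag{2.11}$$

Note that $\Omega_J$ is a tail event. Fix $\Lambda \subset\subset \mathbb{Z}^d$ and a boundary condition $\tau \in \Omega_J$. The Hamiltonian is defined by

$$H_\Lambda^\tau(\phi) = \frac{1}{2} \sum_{x,y \in \Lambda} J(x,y)\, \phi_x \phi_y + \sum_{x \in \Lambda,\, y \notin \Lambda} J(x,y)\, \phi_x \tau_y. \tag{2.12}$$

We define specifications $\gamma_\Lambda^\tau \in \mathcal{M}_1(\Omega)$ by

$$\gamma_\Lambda^\tau(d\phi) = \begin{cases} (Z_\Lambda^\tau)^{-1} \exp(-H_\Lambda^\tau(\phi))\, d\phi_\Lambda & \text{if } \phi_{\Lambda^c} = \tau_{\Lambda^c}, \\ 0 & \text{otherwise,} \end{cases} \tag{2.13}$$

with

$$Z_\Lambda^\tau = \int e^{-H_\Lambda^\tau(\phi)}\, d\phi_\Lambda. \tag{2.14}$$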


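Then $\gamma_\Lambda^\tau$ is the Gaussian measure on $(\Omega, \mathcal{F})$ with mean

$$m_x^{\tau,\Lambda} \equiv \gamma_\Lambda^\tau(\phi_x) = \begin{cases} -\sum_{y \in \Lambda} \sum_{z \notin \Lambda} \Gamma_\Lambda(x,y)\, J(y,z)\, \tau_z & \text{if } x \in \Lambda, \\ \tau_x & \text{if } x \notin \Lambda, \end{cases} \tag{2.15}$$

and covariance (independent of $\tau \in \Omega_J$)

$$\gamma_\Lambda^\tau(\phi_x \phi_y) - \gamma_\Lambda^\tau(\phi_x)\, \gamma_\Lambda^\tau(\phi_y) = \begin{cases} \Gamma_\Lambda(x,y) & \text{if } x, y \in \Lambda, \\ 0 & \text{otherwise.} \end{cases} \tag{2.16}$$

The partition function (2.14) is explicitly given by

$$Z_\Lambda^\tau = (2\pi)^{|\Lambda|/2} (\det \Gamma_\Lambda)^{1/2} \exp\Big( \frac{1}{2} \big\langle m^{\tau,\Lambda}, \Gamma_\Lambda^{-1} m^{\tau,\Lambda} \big\rangle \Big). \tag{2.17}$$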

Given a local bounded function g, we denote by $\gamma_\Lambda(g)$ the $\mathcal{F}_{\Lambda^c}$-measurable function

$$\Omega_J \ni \tau \to \gamma_\Lambda^\tau(g) = \int \gamma_\Lambda^\tau(d\phi)\, g(\phi).$$

A measure $\mu \in \mathcal{M}_1(\Omega)$ is said to be a Gibbs measure relative to the potential J if

$$\mu(\Omega_J) = 1 \tag{2.18}$$

and

$$\mu(\gamma_\Lambda(g)) = \mu(g), \tag{2.19}$$

for any local bounded function g, for any $\Lambda \subset\subset \mathbb{Z}^d$. Let $\mathcal{G} = \mathcal{G}_J$ be the set of Gibbs measures relative to J. We now recall a basic result about the structure of $\mathcal{G}$. Let σ ∈ Ω and denote by $\gamma^\sigma$ the translate of P by the configuration σ, i.e. $\gamma^\sigma$ is the Gaussian field with mean $\gamma^\sigma(\phi_x) = \sigma_x$, $x \in \mathbb{Z}^d$, and covariance G. Let $H_J$ denote the set of J-harmonic configurations

$$H_J = \Big\{ \phi \in \Omega : \sum_{y \in \mathbb{Z}^d} J(x,y)\, \phi_y = 0, \text{ for all } x \in \mathbb{Z}^d \Big\}.$$

Let $\mathcal{G}_{\mathrm{ext}}$ denote the set of extremal points of the convex set $\mathcal{G}$. A result of Rozanov [34] and Dobrushin [12] (see also Theorems (13.24) and (13.36) of [20]) shows that

$$\mathcal{G}_{\mathrm{ext}} = \{\gamma^\sigma, \ \sigma \in H_J\}. \tag{2.20}$$

Remark 2. The transience assumption (2.2) guarantees existence of Gibbs measures, i.e. $\mathcal{G} \neq \emptyset$. Suppose p has finite range. Then the transience hypothesis forces d ≥ 3, cf. e.g. [35]. Moreover, condition (2.18) is automatically satisfied. It is then useful to restrict oneself to measures on smaller subsets of Ω. One can introduce sets of tempered configurations $\tilde\Omega$, and define tempered Gibbs measures as the set of $\mu \in \mathcal{G}$ such that $\mu(\tilde\Omega) = 1$. A natural choice in finite range models is $\tilde\Omega = S'(\mathbb{Z}^d)$, where $S'(\mathbb{Z}^d)$ denotes the space of configurations which increase at most polynomially at infinity, [12].


Remark 3. A general result (see [20], Theorem (14.17)) states that a shift invariant Gibbs measure has a unique decomposition in terms of ergodic Gibbs measures. But an ergodic Gibbs measure need not be extremal in $\mathcal{G}$ in general (an ergodic measure is not necessarily tail trivial). Indeed there are many well known examples (see e.g. [16], Sect. 2) of shift invariant Gibbs measures which can be written as mixtures of non shift invariant elements of $\mathcal{G}_{\mathrm{ext}}$. In Theorem 2.1 below we prove that for ferromagnetic models generated by an irreducible random walk this cannot happen, at least if we restrict to tempered measures. In other words all ergodic Gibbs measures are extremal in $\mathcal{G}$ and "extremal shift invariant Gibbs = shift invariant extremal Gibbs". We also give an example of models where this is violated.

2.4. Ferromagnetic models. The ferromagnetic case is obtained if η ≡ 1 in (2.4), i.e. Q = p, or equivalently J is the generator of a (continuous time) random walk on $\mathbb{Z}^d$. Since $\sum_y J(x, y) = 0$ for any x, any constant configuration $\sigma_x \equiv c$ is in $H_J$ and the global shift symmetry φ → φ + c is spontaneously broken. In particular the set $\mathcal{G}$ contains uncountably many measures. A classical example is the massless free field. It is obtained when p is the simple random walk

$$p(x,y) = \begin{cases} 1/2d & \text{if } |x - y| = 1, \\ 0 & \text{otherwise,} \end{cases}$$

i.e. J = −Δ, if Δ is the discrete Laplace operator on $\mathbb{Z}^d$. As an example of a non local interaction we shall consider the following model, to which we refer as the α-stable crystal. For α ∈ (0, 2 ∧ d), let p be such that

$$\lim_{|x| \to \infty} |x|^{d+\alpha}\, p(0,x) = c_\alpha \tag{2.21}$$

for some $c_\alpha \in (0, \infty)$. The random walk will then be in the domain of attraction of a stable law (see [30] for some interesting properties of stable random walks). In this case the condition (2.18) forces all Gibbs measures to be tempered, with no additional requirement.

In ferromagnetic models the covariance matrices $\Gamma_\Lambda$, $G_\Lambda$, $\Lambda \subset\subset \mathbb{Z}^d$ admit a simple interpretation in terms of the random walk. We see from (2.9) that $\Gamma_\Lambda$ is the Green function of the random walk with transition p which is "killed" upon exiting the box Λ. On the other hand $G_\Lambda$ is the Green function of the random walk "embedded" in Λ, see the remark after Lemma A.1. One can use these facts to give the following representation for the mean values (2.15). If $P_x$ denotes the probability for the discrete time random walk $\xi_0, \xi_1, \ldots$ defined by p, with start in $\xi_0 = x \in \Lambda$, and if $t_\Lambda$ denotes the first hitting time for $\mathbb{Z}^d \setminus \Lambda$, we have

$$m_x^{\tau,\Lambda} = \sum_{y \notin \Lambda} P_x\big(\xi_{t_\Lambda} = y\big)\, \tau_y, \qquad \tau \in \Omega_J. \tag{2.22}$$

Formulas of this type represent only a special case of the so-called random walk representation, which can be generalized to anharmonic crystals with convex pair interaction, cf. e.g. [10] for a recent account. We turn to the result announced in Remark 3 above.
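The agreement between the Green function formula (2.15) and the exit-law formula (2.22) can be checked by hand in one dimension. The Python sketch below is an illustration under assumptions that are not in the text: p is the simple random walk on $\mathbb{Z}$ with η ≡ 1, Λ = {1, ..., n}, and the exit law is the classical gambler's ruin formula $P_x(\text{exit at } n+1) = x/(n+1)$. The first function applies the series (2.9) to the boundary term by fixed-point iteration; the second uses the exit probabilities directly.

```python
def mean_via_green_function(n, tau_left, tau_right, tol=1e-14, max_iter=100000):
    """m = Gamma_Lambda b with b_x = -sum_{z not in Lambda} J(x, z) tau_z, cf. (2.15).

    The iteration m <- Q_Lambda m + b produces the partial sums of the series (2.9)
    applied to b. Here tau_left = tau_0 and tau_right = tau_{n+1}.
    """
    b = [0.0] * n
    b[0] += 0.5 * tau_left        # -J(1, 0) tau_0 = p(1, 0) tau_0
    b[n - 1] += 0.5 * tau_right   # -J(n, n+1) tau_{n+1}
    m = [0.0] * n
    for _ in range(max_iter):
        new = [
            b[i]
            + (0.5 * m[i - 1] if i > 0 else 0.0)
            + (0.5 * m[i + 1] if i < n - 1 else 0.0)
            for i in range(n)
        ]
        if max(abs(u - v) for u, v in zip(new, m)) < tol:
            return new
        m = new
    return m


def mean_via_exit_law(n, tau_left, tau_right):
    """m_x = P_x(exit at 0) tau_0 + P_x(exit at n+1) tau_{n+1}, cf. (2.22)."""
    size = n + 1
    return [((size - x) * tau_left + x * tau_right) / size for x in range(1, n + 1)]


print(mean_via_green_function(6, 2.0, -1.0))
print(mean_via_exit_law(6, 2.0, -1.0))
```

In this example the mean is the linear interpolation of the two boundary values, i.e. the harmonic extension of τ into the box.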


Theorem 2.1. Let µ be a shift invariant Gibbs measure for J = 1 − p with p irreducible, and assume µ has finite first moments, i.e. $\mu(|\phi_0|) < \infty$. Then µ is a mixture of shift invariant elements of $\mathcal{G}_{\mathrm{ext}}$. In particular, there exists a probability measure $\rho_0$ on $\mathbb{R}$ with finite first moment and

$$\mu = \int_{-\infty}^{\infty} \gamma_t\, \rho_0(dt), \tag{2.23}$$

where $\gamma_t = \gamma^\sigma$ with $\sigma_x \equiv t \in \mathbb{R}$.

Proof. Let µ be given as in the hypothesis, and let $\rho \in \mathcal{M}_1(\Omega)$ be its extremal decomposition in $\mathcal{G}$, i.e.

$$\mu = \int_\Omega \gamma^\sigma\, \rho(d\sigma). \tag{2.24}$$

In particular, cf. (2.20), $\rho(H_J) = 1$. Moreover, ρ is shift invariant and has finite first moments. We want to show that ρ is actually concentrated on the constant elements of $H_J$. This would reduce the integral on Ω in (2.24) to the integral on $\mathbb{R}$ in (2.23), thereby proving the theorem. It is sufficient to show that for any $x, y \in \mathbb{Z}^d$, for any β > 0,

$$\rho\big(|\phi_x - \phi_y| > \beta\big) = 0. \tag{2.25}$$

In order to prove (2.25) we may proceed as follows. Observe that $\phi \in H_J$ is equivalent to $p_t \phi = \phi$ for all t > 0, where

$$p_t(x,y) = e^{-t} \sum_{n=0}^{\infty} \frac{t^n}{n!}\, p^n(x,y)$$

is the continuous time kernel on $\mathbb{Z}^d$ with Fourier transform $\exp(-t \hat J)$. Since $\rho(H_J) = 1$ we can replace φ in (2.25) by $p_t \phi$, and by Chebyshev's inequality and the shift invariance of ρ,

$$\rho\big(|\phi_x - \phi_y| > \beta\big) \le \beta^{-1} \rho(|\phi_0|) \sum_{z \in \mathbb{Z}^d} |p_t(x,z) - p_t(y,z)|.$$

Since t is arbitrary, (2.25) follows from

$$\lim_{t \to \infty} \sum_{z \in \mathbb{Z}^d} |p_t(x,z) - p_t(y,z)| = 0. \tag{2.26}$$
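The decay in (2.26) can be observed numerically in the simplest finite range case. The following Python sketch is an illustration under assumptions not made in the text: it takes the simple random walk on $\mathbb{Z}$ (irreducible, though not transient, which (2.26) itself does not require), the points x = 0 and y = 1, and truncates the Poisson sum defining $p_t$ at a level chosen so that the neglected weight is negligible for the values of t used.

```python
import math


def poisson_cutoff(t):
    """Truncation level for the Poisson sum; an ad hoc choice, ample for moderate t."""
    return int(t + 12.0 * math.sqrt(t) + 30.0)


def continuous_time_kernel(t, window):
    """p_t(0, z) for z in [-window, window], simple random walk on Z.

    Uses p_t = e^{-t} sum_n t^n / n! p^n with p^n(0, z) = C(n, (n + z) / 2) / 2^n
    when n + z is even, and 0 otherwise.
    """
    kernel = [0.0] * (2 * window + 1)
    weight = math.exp(-t)  # Poisson weight for n = 0
    for n in range(poisson_cutoff(t) + 1):
        if n > 0:
            weight *= t / n
        reach = min(n, window)
        for z in range(-reach, reach + 1):
            if (n + z) % 2 == 0:
                kernel[z + window] += weight * math.comb(n, (n + z) // 2) / 2 ** n
    return kernel


def shift_distance(t):
    """sum_z |p_t(0, z) - p_t(1, z)|, the quantity in (2.26) for x = 0, y = 1."""
    window = poisson_cutoff(t) + 1  # the truncated kernel vanishes at both ends
    k = continuous_time_kernel(t, window)
    # By homogeneity p_t(1, z) = p_t(0, z - 1), so adjacent differences suffice.
    return sum(abs(k[i] - k[i - 1]) for i in range(1, len(k)))


for t in (1.0, 10.0, 100.0):
    print(t, shift_distance(t))
```

The printed distances decrease as t grows, in line with the local limit theorem estimates invoked in the proof.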

If p has finite range and is irreducible, (2.26) follows from the standard local limit theorem estimates, as in the case of the simple random walk, cf. e.g. [25], Corollary 1.2.3. For a general irreducible random walk one can use the following argument, cf. [8]. Let K > 0 be a large number such that δ ∈ (0, 1) if

$$\delta = \sum_{x :\, |x| \le K} p(0,x).$$

Define two homogeneous transitions r(x, y) and s(x, y), by

$$r(x,y) = \delta^{-1}\, p(x,y)\, 1_{\{|x-y| \le K\}}, \qquad s(x,y) = (1-\delta)^{-1}\, p(x,y)\, 1_{\{|x-y| > K\}}.$$


Notice that since p is irreducible, K can be chosen in such a way that r is irreducible. If $r_t$ and $s_t$ are the corresponding continuous time kernels, then it is easy, using e.g. the Fourier transform, to see that

$$p_t(x,z) = \sum_{z' \in \mathbb{Z}^d} r_{\delta t}(x,z')\, s_{(1-\delta)t}(z',z).$$

We have

$$\sum_{z \in \mathbb{Z}^d} |p_t(x,z) - p_t(y,z)| \le \sum_{z, z' \in \mathbb{Z}^d} |r_{\delta t}(x,z') - r_{\delta t}(y,z')|\, s_{(1-\delta)t}(z',z) = \sum_{z' \in \mathbb{Z}^d} |r_{\delta t}(x,z') - r_{\delta t}(y,z')|. \tag{2.27}$$

Now (2.26) follows from (2.27) since r has finite range. □

2.5. Non-ferromagnetic models. Here we consider some examples of potentials J satisfying conditions (2.1) and (2.2) but such that

$$\sum_{x \in \mathbb{Z}^d} J(0,x) \neq 0. \tag{2.28}$$

Suppose η in (2.4) is such that $\mathcal{N} = \emptyset$, cf. (2.8). Then one can show (see e.g. [20], Corollary (13.40)) that the unique element of $\mathcal{G}$ with uniformly bounded first moments is P. Moreover P is hypercontractive in this case, cf. Appendix B. If p has long range, e.g. p satisfies (2.21) for some α > 0, one can actually show that $\mathcal{G} = \{P\}$. This is due to the fact that condition (2.18) forces all Gibbs measures to be strongly tempered, cf. [12,20]. More interesting is the case $\mathcal{N} \neq \emptyset$. Let $\theta_0 \in \mathcal{N}$ (by symmetry, this implies $-\theta_0 \in \mathcal{N}$). Then it is easy to see that there exists $m \in \mathbb{Z}^d$ such that $\theta_0 = \pi(m_1, \ldots, m_d)$, and the coefficients $\eta_x$ have the form

$$\eta_x = \cos\Big( \pi \sum_{i=1}^{d} m_i x_i \Big),$$

for any $x \in \mathbb{Z}^d$ such that p(0, x) > 0. Moreover, all oscillating configurations

$$\sigma_x = \alpha \cos(\theta_0 \cdot x), \qquad x \in \mathbb{Z}^d, \ \alpha \in \mathbb{R}, \tag{2.29}$$

are elements of $H_J$. We now give an example of translation invariant Gibbs measures which are not mixtures of translation invariant extremal Gibbs states. Such cases were first pointed out by Rozanov, [34]. Since (2.28) implies that P is the unique shift invariant element of $\mathcal{G}_{\mathrm{ext}}$, it will be enough to exhibit a shift invariant $\mu \in \mathcal{G}$ with µ ≠ P. The following example shows much more, namely that for any L > 0, we can find $\mu \in \mathcal{G} \cap \mathcal{M}_1^S(\Omega)$ with $\mu(\phi_0^2) > L$. Let T > 0 and let $\rho_T$ be the centered Gaussian field with spectral density

$$\hat\rho_T = \frac{T}{2} \big( \delta_{\theta_0} + \delta_{-\theta_0} \big),$$

or, in other words $\rho_T$ is the centered Gaussian field with covariance

$$\rho_T(\sigma_x, \sigma_y) = T \cos \theta_0 \cdot (x - y)$$

(cf. e.g. [20], Proposition (13.A7) to check that $\rho_T$ is well defined). Then, for any T > 0,

$$\mu_T = \int \gamma^\sigma\, \rho_T(d\sigma) \tag{2.30}$$

is a centered Gaussian field, since it is a convolution of centered Gaussian measures. Moreover it is translation invariant, since its covariance

$$G_T(x,y) = G(x,y) + T \cos \theta_0 \cdot (x - y)$$

is homogeneous on $\mathbb{Z}^d$. To show that $\mu_T \in \mathcal{G}$ we have to show $\rho_T(H_J) = 1$. But this follows from

$$\rho_T\Big( \Big| \sum_{y \in \mathbb{Z}^d} J(x,y)\, \sigma_y \Big|^2 \Big) = T \sum_{y,z \in \mathbb{Z}^d} J(x,y)\, J(x,z) \cos \theta_0 \cdot (y - z) = T\, \hat J(\theta_0)^2 = 0,$$

for any $x \in \mathbb{Z}^d$. We remark that in the case of Theorem 2.1 the measure $\mu_T$ can be realized only for $\theta_0 = 0$, cf. Remark 1, and $\rho_T$ is concentrated on the constant configurations. On the other hand if η ≡ 1, p is not irreducible and there exists $0 \neq \theta_0 \in \mathcal{N}$, then we see that the measure (2.30) violates the claim of Theorem 2.1.

3. Thermodynamics and Variational Principle

In this section we prove a version of Gibbs' variational principle. We first prove a basic thermodynamic identity (cf. Proposition 3.1) expressing relative entropy density as a difference of free energies.

3.1. Thermodynamics. For any $N \in \mathbb{Z}_+$ let $\Lambda_N = [-N/2, N/2)^d \cap \mathbb{Z}^d$. We will always assume N is an even integer, so that $\Lambda_N$ has side N and $|\Lambda_N| = N^d$. We define the sets of translation invariant measures

$$\mathcal{K}_L = \big\{ \mu \in \mathcal{M}_1^S(\Omega), \ \mu(\phi_0^2) \le L \big\}, \qquad \mathcal{K}_\infty = \bigcup_{L > 0} \mathcal{K}_L. \tag{3.1}$$

A tightness argument shows that $\mathcal{K}_L$ is compact in the weak topology for any L > 0. Given two measures µ, ν, let $\mu_N$, $\nu_N$ denote their marginals on $\Lambda_N$. The relative entropy of µ with respect to ν on $\Lambda_N$ is defined by

$$H_N(\mu|\nu) = \begin{cases} \int d\mu_N \log \dfrac{d\mu_N}{d\nu_N} & \text{if } \mu_N \ll \nu_N, \\ +\infty & \text{otherwise.} \end{cases} \tag{3.2}$$

Let also $H_N(\mu)$ denote the relative entropy of µ with respect to Lebesgue measure on $\Lambda_N$. The entropy density of $\mu \in \mathcal{M}_1^S(\Omega)$ is defined as

$$s(\mu) = -\lim_{N \to \infty} \frac{1}{N^d} H_N(\mu). \tag{3.3}$$


A general subadditivity argument shows that this limit always exists, although it may be equal to −∞. Moreover, s is affine (see e.g. Chapter 15 of [20]). The internal energy density of $\mu \in \mathcal{K}_\infty$ is defined as

$$e(\mu) = \frac{1}{2} \lim_{N \to \infty} \frac{1}{N^d} \sum_{x,y \in \Lambda_N} J(x,y)\, \mu(\phi_x \phi_y). \tag{3.4}$$

As a consequence of the absolute summability of J we obtain that, for $\mu \in \mathcal{K}_\infty$, the limit exists and equals

$$e(\mu) = \frac{1}{2} \sum_{y \in \mathbb{Z}^d} J(0,y)\, \mu(\phi_0 \phi_y). \tag{3.5}$$

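The pressure is defined by

$$p = \lim_{N \to \infty} \frac{1}{N^d} \log Z_{\Lambda_N}^0, \tag{3.6}$$

where $Z_{\Lambda_N}^0$ is given by (2.14) with τ = 0. By Theorem 2.5 in [22] the above limit exists and equals

$$p = \frac{1}{2} \log(2\pi) + \frac{1}{2} \frac{1}{(2\pi)^d} \int_{(-\pi,\pi]^d} \log f(\theta)\, d\theta, \tag{3.7}$$

where $f = \hat J^{-1}$ is the spectral density of P (cf. (2.7)). We define the relative entropy density of µ w.r.t. ν as

$$h(\mu|\nu) = \lim_{N \to \infty} \frac{1}{N^d} H_N(\mu|\nu). \tag{3.8}$$

Proposition 3.1. The function h( · |P) is well defined on $\mathcal{M}_1^S(\Omega)$, and

$$h(\mu|P) = \begin{cases} +\infty & \text{if } \mu \in \mathcal{M}_1^S(\Omega) \setminus \mathcal{K}_\infty, \\ e(\mu) - s(\mu) + p & \text{if } \mu \in \mathcal{K}_\infty. \end{cases} \tag{3.9}$$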

Moreover, h( · |P) is affine on $\mathcal{K}_\infty$.

Proof. Let $\mu \in \mathcal{M}_1(\Omega)$. We check the first line of (3.9). An alternative expression for $H_N$ is given by (cf. e.g. Lemma (3.2.13) of [11])

$$H_N(\mu|P) = \sup_{\psi} \big\{ \mu(\psi) - \log P(e^{\psi}) \big\}, \tag{3.10}$$

where the supremum is taken over bounded $\mathcal{F}_{\Lambda_N}$-measurable functions. Let $\alpha \in (0, \sigma^2/2)$, so that $\log P(e^{\alpha \phi_0^2}) \equiv c(\alpha) < \infty$. Then choosing $\psi_T(\phi) = \alpha(\phi_0^2 \wedge T)$, T > 0 in (3.10) gives $H_N(\mu|P) \ge \alpha \mu(\phi_0^2 \wedge T) - c(\alpha)$. Letting T → ∞, this implies $H_N(\mu|P) \ge \alpha \mu(\phi_0^2) - c(\alpha)$. Hence, h(µ|P) = ∞ if $\mu \notin \mathcal{K}_\infty$.


We turn to the proof of the identity in the second line of (3.9). We may assume that µN has a density, otherwise the identity holds trivially, since s(µ) = −∞ in this case. Let ψN be the density of PN , and call ψN0 the density of γ30 N , cf. (2.13). We have HN (µ|P ) = HN (µ) − µ(log ψN0 ) + µ(log ψN0 ψN−1 ). Since lim

N→∞

1 µ(log ψN0 ) = − p − e(µ), Nd

we need to prove lim

N →∞

1 µ(log ψN0 ψN−1 ) = 0. Nd

(3.11)

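Recall that

$$\psi_N^0\psi_N^{-1}(\phi) = \det\bigl(G_N\Gamma_N^{-1}\bigr)^{1/2}\exp\Bigl(-\frac{1}{2}\bigl\langle\phi,(\Gamma_N^{-1}-G_N^{-1})\phi\bigr\rangle\Bigr),$$

where $G_N\equiv G_{\Lambda_N}$ and $\Gamma_N\equiv\Gamma_{\Lambda_N}$. The argument of Theorem 2.5 in [22] shows that

$$\lim_{N\to\infty}\frac{1}{N^d}\log\det\bigl(G_N\Gamma_N^{-1}\bigr) = 0.$$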

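We are left with

$$\lim_{N\to\infty}\frac{1}{N^d}\,\mu\bigl(\bigl\langle\phi,(\Gamma_N^{-1}-G_N^{-1})\phi\bigr\rangle\bigr) = 0. \tag{3.12}$$

Let $x,y\in\Lambda_N$. Using the expansion of Lemma A.1, we have

$$\Gamma_N^{-1}(x,y) - G_N^{-1}(x,y) = \tilde{Q}_{\Lambda_N}(x,y) - Q_{\Lambda_N}(x,y) = \sum_{n=1}^{\infty} B_n(x,y).$$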

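Since $\mu\in\mathcal{K}_\infty$, by Schwarz' inequality and (A.2),

$$\begin{aligned}
\bigl|\mu\bigl(\langle\phi,(\Gamma_N^{-1}-G_N^{-1})\phi\rangle\bigr)\bigr| &\le \sum_{x,y\in\Lambda_N}\sum_{n=1}^{\infty}|B_n(x,y)|\,|\mu(\phi_x\phi_y)| \\
&\le \mu(\phi_0^2)\sum_{x,y\in\Lambda_N}\sum_{n=1}^{\infty}\ \sum_{z_1\notin\Lambda_N}\cdots\sum_{z_n\notin\Lambda_N} p(x,z_1)\cdots p(z_n,y).
\end{aligned} \tag{3.13}$$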

Let $\xi_0,\xi_1,\ldots$ be the random walk with transition $p$. We call $P_x$ the probability measure associated to the paths with $\xi_0 = x$, $x\in\Lambda_N$. Let $t_N = \inf\{n\ge 1,\ \xi_n\in\Lambda_N\}$. After summing over $y\in\Lambda_N$, (3.13) can be written as

$$\mu(\phi_0^2)\sum_{x\in\Lambda_N}\sum_{n=1}^{\infty}P_x\bigl(\xi_1\notin\Lambda_N;\ t_N = n+1\bigr) \le \mu(\phi_0^2)\sum_{x\in\Lambda_N}P_x\bigl(\xi_1\notin\Lambda_N\bigr).$$

We then have

$$\bigl|\mu\bigl(\langle\phi,(\Gamma_N^{-1}-G_N^{-1})\phi\rangle\bigr)\bigr| \le \mu(\phi_0^2)\sum_{x\in\Lambda_N}\sum_{z\notin\Lambda_N}p(x,z).$$
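A boundary sum of this type is of smaller order than the volume as soon as $p(0,\cdot)$ is summable. The following Python sketch evaluates it in dimension $d = 1$ for an illustrative kernel $p(0,y)\propto|y|^{-1-\alpha}$ truncated at a finite range. The kernel, the exponent `alpha`, the truncation `cutoff` and all function names are assumptions made for this illustration only. In $d=1$ the double sum over $x$ in the box $\{1,\ldots,N\}$ and $z$ outside it equals twice the sum of the one-sided tails of $p$, so its ratio to the volume is a Cesàro average of a sequence tending to zero.

```python
def one_sided_tails(alpha, cutoff):
    """tails[k] = sum over y >= k of p(0, y) for the assumed symmetric kernel."""
    # Assumed kernel: p(0, y) proportional to |y|^(-1 - alpha), 1 <= |y| <= cutoff.
    weights = [0.0] + [y ** -(1.0 + alpha) for y in range(1, cutoff + 1)]
    norm = 2.0 * sum(weights)  # normalise over both signs of y
    tails = [0.0] * (cutoff + 2)
    for k in range(cutoff, 0, -1):
        tails[k] = tails[k + 1] + weights[k] / norm
    return tails


def boundary_ratio(n_sites, alpha=0.5, cutoff=20000):
    """(1/N) * sum_{x in box} sum_{z not in box} p(x, z) for the box {1..N}, d = 1."""
    tails = one_sided_tails(alpha, cutoff)
    # A site at distance k from an end of the box exits through that end
    # with probability tails[k]; both ends contribute the same total.
    boundary = 2.0 * sum(tails[k] for k in range(1, n_sites + 1))
    return boundary / n_sites


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, boundary_ratio(n))
```

The printed ratios decrease with $N$, which is the one-dimensional version of the $o(N^d)$ statement; the rate of decay depends on the assumed exponent.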

Now a simple argument using just absolute summability of $p(0,y)$ shows that the above expression is $o(N^d)$. This implies (3.11) and (3.9) follows. Finally, the affinity of $h(\,\cdot\,|P)$ on $\mathcal{K}_\infty$ follows from (3.9) and the affinity of $s$. □

3.2. Variational principle. In classical statistical mechanics Gibbs' variational principle can be stated as follows. A measure $\mu\in\mathcal{M}_1^S(\Omega)$ is a minimum for the free energy $f(\mu) = e(\mu) - s(\mu)$ if and only if $\mu$ is a translation invariant Gibbs state. In view of Proposition 3.1 this can be turned into a statement about relative entropy density. Let $\mathcal{G}_2^S = \mathcal{G}\cap\mathcal{K}_\infty$, the set of translation invariant Gibbs states with finite second moment.

Theorem 3.2. Let $\mu\in\mathcal{M}_1^S(\Omega)$. Then,

$$\mu\in\mathcal{G}_2^S \iff h(\mu|P) = 0. \tag{3.14}$$

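Proof. Let $\mu\in\mathcal{G}_2^S$, and denote by $\rho\in\mathcal{M}_1(\Omega)$ its extremal decomposition, i.e. (cf. (2.20))

$$\mu = \int\gamma^{\sigma}\,\rho(d\sigma),\qquad \rho(\mathcal{H}_J) = 1.$$

Note that $\mu\in\mathcal{K}_\infty$ implies $\rho\in\mathcal{K}_\infty$. By convexity

$$H_N(\mu|P) \le \int H_N(\gamma^{\sigma}|P)\,\rho(d\sigma). \tag{3.15}$$

Moreover, by explicit computation,

$$H_N(\gamma^{\sigma}|P) = \frac{1}{2}\bigl\langle\sigma, G_N^{-1}\sigma\bigr\rangle.$$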
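The explicit computation is that of the relative entropy of a Gaussian measure with respect to a translate of itself. As a sanity check, the Python sketch below treats the scalar analogue: for a single site with variance $g$ and shift $m$, the relative entropy of $N(m,g)$ with respect to $N(0,g)$ should equal $m^2/(2g)$, the one-dimensional form of $\frac12\langle\sigma,G_N^{-1}\sigma\rangle$. The reduction to one site, the quadrature parameters and the function names are assumptions of this illustration.

```python
import math


def gaussian_density(x, mean, var):
    """Density of the normal law N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def relative_entropy_shifted(mean, var, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of H(N(mean, var) | N(0, var))."""
    sd = math.sqrt(var)
    lower = mean - half_width * sd
    step = 2.0 * half_width * sd / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * step
        # log of the density ratio dN(mean, var) / dN(0, var), written in closed form
        log_ratio = (x * x - (x - mean) ** 2) / (2.0 * var)
        total += gaussian_density(x, mean, var) * log_ratio * step
    return total


if __name__ == "__main__":
    m, g = 1.5, 2.0
    print(relative_entropy_shifted(m, g), m * m / (2.0 * g))
```

Both printed numbers agree to quadrature accuracy. In the lattice setting the scalar $1/g$ is replaced by the matrix $G_N^{-1}$.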

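For $\sigma\in\mathcal{H}_J$ we can write

$$\bigl\langle\sigma,G_N^{-1}\sigma\bigr\rangle = \bigl\langle\sigma,(G_N^{-1}-J_N)\sigma\bigr\rangle - \sum_{x\in\Lambda_N}\sum_{y\notin\Lambda_N}J(x,y)\,\sigma_x\sigma_y.$$

Here $J_N = \Gamma_N^{-1}$ is the restriction of $J$ to $\Lambda_N$. We can now use the argument of Proposition 3.1, cf. (3.12), and the fact that $\rho$ is concentrated on $\mathcal{H}_J$ to obtain

$$\lim_{N\to\infty}\frac{1}{N^d}\,\rho\bigl(\bigl\langle\sigma,G_N^{-1}\sigma\bigr\rangle\bigr) = 0.$$

From (3.15) we see that

$$h(\mu|P) = \lim_{N\to\infty}\frac{1}{N^d}\,H_N(\mu|P) = 0.$$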


We now turn to the more delicate part of (3.14). Namely, suppose $\mu\in\mathcal{M}_1^S(\Omega)$, and

$$h(\mu|P) = 0. \tag{3.16}$$

We are going to prove that $\mu$ satisfies (2.18) and the DLR equations (2.19). Note that Proposition 3.1 implies that $\mu$ has finite second moments, and therefore (2.18) follows from

$$\mu\Bigl(\sum_{y\in\mathbb{Z}^d}|J(x,y)|\,|\phi_y|\Bigr) = \mu(|\phi_0|)\sum_{y\in\mathbb{Z}^d}|J(x,y)| < \infty,\qquad x\in\mathbb{Z}^d.$$

Let $g$ be a local, bounded continuous function, and let $\Lambda\subset\subset\mathbb{Z}^d$. We will show

$$\mu(\gamma_\Lambda(g)) = \mu(g). \tag{3.17}$$

Let $V\subset\subset\mathbb{Z}^d$ be a large set containing both $\Lambda$ and the support of $g$. We write

$$f_V = \frac{d\mu_V}{dP_V}$$

for the density of $\mu$ w.r.t. $P$ on $V$. Then, since $\mu(g) = P(f_Vg)$, we have

$$|\mu(\gamma_\Lambda(g)) - \mu(g)| \le |\mu(\gamma_\Lambda(g)) - P(f_V\gamma_\Lambda(g))| + |P(f_V\gamma_\Lambda(g)) - P(f_Vg)|. \tag{3.18}$$

To prove (3.17) we show that both terms in the r.h.s. of (3.18) can be made arbitrarily small by choosing $V$ large enough. To be more precise, (3.17) is a consequence of the next two lemmas.

Lemma 3.3. For any $\varepsilon>0$, there exists $N = N(\varepsilon)\in\mathbb{Z}_+$ such that for any $V\supset\Lambda_N$,

$$|\mu(\gamma_\Lambda(g)) - P(f_V\gamma_\Lambda(g))| < \varepsilon. \tag{3.19}$$

Lemma 3.4. For any $\varepsilon>0$ and for all $N\in\mathbb{Z}_+$, there exists a finite set $V = V(\varepsilon,N)$, such that $V\supset\Lambda_N$, and

$$|P(f_V\gamma_\Lambda(g)) - P(f_Vg)| < \varepsilon. \tag{3.20}$$

Note that Lemma 3.3 becomes trivial if $p$ has finite range. In the non-local case the continuity expressed by (3.19) replaces the classical argument based on quasilocality. Issues of continuity for Gibbsian specifications in the non quasilocal setting were recently discussed in the paper [29].

Proof of Lemma 3.3. Notice that

$$P(f_V\gamma_\Lambda(g)) = P\bigl(f_V\,P(\gamma_\Lambda(g)|\mathcal{F}_V)\bigr) = \mu\bigl(P(\gamma_\Lambda(g)|\mathcal{F}_V)\bigr). \tag{3.21}$$

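Define

$$\gamma^{\tau}_{\Lambda,V}(g) = \gamma_\Lambda^{\tau_V 0_{V^c}}(g),$$

i.e. we put 0-boundary conditions outside of $V$. Note that $\gamma_{\Lambda,V}(g)$ is now an $\mathcal{F}_V$-measurable function. Let us fix a sequence $\{V_N\}$, with $V_N\supset\Lambda_N$, $N\in\mathbb{Z}_+$. We define $\varphi_N = \gamma_\Lambda(g) - \gamma_{\Lambda,V_N}(g)$.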


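Using (3.21) we see that for any $N$,

$$|\mu(\gamma_\Lambda(g)) - P(f_{V_N}\gamma_\Lambda(g))| \le \mu(|\varphi_N|) + \mu\bigl(P(|\varphi_N|\,|\,\mathcal{F}_{V_N})\bigr).$$

To prove (3.19) it is then sufficient to show

$$\lim_{N\to\infty}\mu(|\varphi_N|) = 0 \tag{3.22}$$

and

$$\lim_{N\to\infty}\mu\bigl(P(|\varphi_N|\,|\,\mathcal{F}_{V_N})\bigr) = 0. \tag{3.23}$$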

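We may assume $g$ to be a bounded Lipschitz-continuous local function. For such a function, define

$$\|g\|_L = \sup_{x\in\mathbb{Z}^d}\,\sup\Bigl\{\frac{|g(\sigma)-g(\eta)|}{|\sigma_x-\eta_x|},\ \sigma,\eta\in\Omega:\ \sigma_y=\eta_y,\ \forall\,y\neq x\Bigr\}.$$

From (2.15) and (2.16) we know that the measure $\gamma_\Lambda^{\tau_{V_N}0_{V_N^c}}$ can be seen as the measure $\gamma_\Lambda^{\tau}$ shifted by the mean values vector $m^N(\tau)$, given by

$$m^N_x(\tau) \equiv \gamma_\Lambda^{\tau_{V_N}0_{V_N^c}}(\phi_x) - \gamma_\Lambda^{\tau}(\phi_x) = \sum_{y\in\Lambda}\sum_{z\notin V_N}\Gamma_\Lambda(x,y)\,J(y,z)\,\tau_z.$$

We then have

$$|\varphi_N(\tau)| \le \int\gamma_\Lambda^{\tau}(d\phi)\,|g(\phi+m^N(\tau)) - g(\phi)| \le \|g\|_L\sum_{x\in\Lambda}|m^N_x(\tau)| \le C(g,\Lambda)\sum_{y\in\Lambda}\sum_{z\notin V_N}|J(y,z)|\,|\tau_z|, \tag{3.24}$$

having defined $C(g,\Lambda) = |\Lambda|\,\|g\|_L\,\sup_{x,y\in\Lambda}|\Gamma_\Lambda(x,y)|$.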

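Using the shift invariance of $\mu$ and the fact that $\mu\in\mathcal{K}_\infty$, we have

$$\mu(|\varphi_N|) \le C(g,\Lambda)\sum_{y\in\Lambda}\sum_{z\notin V_N}|J(y,z)|\,\mu(|\phi_z|) = C(g,\Lambda)\,\mu(|\phi_0|)\sum_{y\in\Lambda}\sum_{z\notin V_N}|J(y,z)|. \tag{3.25}$$

Now, the r.h.s. of (3.25) vanishes as $N\to\infty$, and (3.22) follows. We turn to the proof of (3.23). From (3.24) we have

$$\mu\bigl(P(|\varphi_N|\,|\,\mathcal{F}_{V_N})\bigr) \le C(g,\Lambda)\sum_{y\in\Lambda}\sum_{z\notin V_N}|J(y,z)|\,\mu\bigl(P(|\phi_z|\,|\,\mathcal{F}_{V_N})\bigr). \tag{3.26}$$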


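Let us fix $\tau\in\Omega_J$ and denote by $P(\,\cdot\,|\tau_{V_N})$ the conditional expectation $P(\,\cdot\,|\mathcal{F}_{V_N})$ evaluated at $\tau_{V_N}$. Then $P(\,\cdot\,|\tau_{V_N})$ is the Gaussian field on $V_N^c$ with mean

$$P(\phi_z|\tau_{V_N}) = -\sum_{y\in V_N^c}\sum_{x\in V_N}\Gamma_{V_N^c}(z,y)\,J(y,x)\,\tau_x,\qquad z\in\mathbb{Z}^d\setminus V_N, \tag{3.27}$$

and covariance

$$P(\phi_z\phi_{z'}|\tau_{V_N}) - P(\phi_z|\tau_{V_N})\,P(\phi_{z'}|\tau_{V_N}) = \Gamma_{V_N^c}(z,z'), \tag{3.28}$$

where

$$\Gamma_{V_N^c}(z,z') = \sum_{n=0}^{\infty}Q^n_{V_N^c}(z,z'),\qquad z,z'\in\mathbb{Z}^d\setminus V_N,$$

if $Q_{V_N^c}$ is the function given by

$$Q_{V_N^c}(x,y) = \begin{cases} Q(x,y) & \text{if } x,y\in\mathbb{Z}^d\setminus V_N, \\ 0 & \text{otherwise.}\end{cases}$$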

Note that (3.27) and (3.28) can be obtained as a limiting form of (2.15) and (2.16). For ferromagnetic models (3.27) and (3.28) assume a simple probabilistic meaning in terms of the random walk killed as it enters the set $V_N$ (cf. discussion in Sect. 2.4). Replacing $Q(x,y)$ by its absolute value $p(x,y)$ we can reduce ourselves to the ferromagnetic case. Let $P_z$, $z\in\mathbb{Z}^d\setminus V_N$, denote the probability associated to the random walk $\xi_0 = z,\xi_1,\xi_2,\ldots$ with transition $p$. Let $t_N$ be the first hitting time of the set $V_N$. We have

$$P(\phi_z^2|\tau_{V_N}) = \Gamma_{V_N^c}(z,z) + \Bigl(\sum_{y\in V_N^c}\sum_{x\in V_N}\Gamma_{V_N^c}(z,y)\,J(y,x)\,\tau_x\Bigr)^2 \le \sigma^2 + \sum_{x,x'\in V_N}P_z(\xi_{t_N}=x)\,P_z(\xi_{t_N}=x')\,|\tau_x\tau_{x'}|,$$

where we have used (2.2) to bound $\Gamma_{V_N^c}(z,z)$. We now use Schwarz' inequality and the shift invariance of $\mu$ to write $\mu\bigl(P(\phi_z^2|\mathcal{F}_{V_N})\bigr)\le\sigma^2+\mu(\tau_0^2)$. Therefore,

$$\mu\bigl(P(|\phi_z|\,|\,\mathcal{F}_{V_N})\bigr) \le \mu\bigl(P(\phi_z^2|\mathcal{F}_{V_N})\bigr)^{1/2} \le \bigl(\sigma^2+\mu(\tau_0^2)\bigr)^{1/2}. \tag{3.29}$$

Plugging (3.29) into (3.26) shows that

$$\mu\bigl(P(|\varphi_N|\,|\,\mathcal{F}_{V_N})\bigr) \le C(g,\Lambda)\,\bigl(\sigma^2+\mu(\tau_0^2)\bigr)^{1/2}\sum_{y\in\Lambda}\sum_{z\notin V_N}|J(y,z)| \to 0,$$

as $N\to\infty$. This completes the proof of Lemma 3.3. □


Proof of Lemma 3.4. Since $f_{V\setminus\Lambda} = d\mu_{V\setminus\Lambda}/dP_{V\setminus\Lambda}$ is $\mathcal{F}_{\Lambda^c}$-measurable, the DLR property of $P$ implies $P(f_{V\setminus\Lambda}\gamma_\Lambda(g)) = P(f_{V\setminus\Lambda}\,g)$, and we can write

$$|P(f_V\gamma_\Lambda(g)) - P(f_Vg)| \le 2\,\|g\|_\infty\,P(|f_{V\setminus\Lambda}-f_V|).$$

The lemma now follows from a classical argument (cf. [33,20]). Precisely, following [20], Step 2 in the proof of Theorem (15.37), one shows that (3.16) implies that for any $\varepsilon>0$ and for all $N\in\mathbb{Z}_+$, there exists $V = V(\varepsilon,N)\subset\subset\mathbb{Z}^d$, such that $V\supset\Lambda_N$, and

$$P(|f_V - f_{V\setminus\Lambda}|) < \varepsilon. \tag{3.30}$$

This completes the proof of Lemma 3.4. □

4. Large Deviations

In this section we discuss volume order large deviations of the empirical field with respect to the measure $P$. From now on, in addition to the usual assumptions (2.1) and (2.2), we assume that the coupling coefficients satisfy the decay condition (2.5). We also assume $\mathcal{N}\neq\emptyset$, i.e. we focus on the massless regime. The (periodized) empirical field $R_N$ is defined by

$$R_N(\phi) = \frac{1}{N^d}\sum_{x\in\Lambda_N}\delta_{\theta_x\phi^N}, \tag{4.1}$$

where $\delta_\xi$ stands for the Dirac measure at $\xi\in\Omega$, $\theta_x\phi$ denotes the shifted configuration $\phi_{\cdot+x}$, and $\phi^N$ denotes the configuration on $\Lambda_N$ periodized to the whole of $\mathbb{Z}^d$. With this definition $R_N(\phi)\in\mathcal{M}_1^S(\Omega)$ for all $\phi\in\Omega$, and the map $\phi\to R_N(\phi)$ is $\mathcal{F}_{\Lambda_N}$-measurable. It is well known, cf. e.g. [21], that as far as the large deviation behaviour is concerned (4.1) is equivalent to (1.4). Let

$$I(\mu) = \begin{cases} +\infty & \mu\in\mathcal{M}_1(\Omega)\setminus\mathcal{M}_1^S(\Omega), \\ h(\mu|P) & \mu\in\mathcal{M}_1^S(\Omega). \end{cases} \tag{4.2}$$

The main result in this section states that $R_N$ satisfies a weak large deviation principle with rate function $I$. Recall the definition (3.1) of the compact sets $\mathcal{K}_L\subset\mathcal{M}_1^S(\Omega)$.

Theorem 4.1. For any $L>0$, the restriction of $I$ to $\mathcal{K}_L$ is lower semi-continuous. Moreover, the following bounds hold:

• For any closed set $F\subset\mathcal{M}_1(\Omega)$, for any $L>0$,

$$\limsup_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\in F\cap\mathcal{K}_L\bigr) \le -\inf_{\mu\in F\cap\mathcal{K}_L}I(\mu). \tag{4.3}$$

• For any open set $G\subset\mathcal{M}_1(\Omega)$,

$$\liminf_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\in G\bigr) \ge -\inf_{\mu\in G}I(\mu). \tag{4.4}$$


Before starting the proof of Theorem 4.1 we make some remarks.

Remark 1. Since the upper bound (4.3) is restricted to compact sets we speak of a weak large deviation principle. The standard way to remove this restriction would be to prove exponential tightness in the form

$$\lim_{L\to\infty}\,\limsup_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\notin\mathcal{K}_L\bigr) = -\infty.$$

But this is false in the massless regime. Indeed, for fixed $L>0$, the measure $\mu_T$ defined by (2.30) is not in $\mathcal{K}_L$, for any $T>L-\sigma^2$. But $\mu_T\in\mathcal{G}_2^S$, and therefore the lower bound (4.4) together with Theorem 3.2 implies

$$\liminf_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\notin\mathcal{K}_L\bigr) \ge -h(\mu_T|P) = 0.$$

Note, moreover, that the upper bound (4.3) does not extend to any compact subset of $\mathcal{M}_1^S(\Omega)$ (it suffices to consider compact sets of measures which have finite first moment but infinite second moment). It is also easy to see that, in contrast to the hypercontractive case discussed in Theorem 4.2 below, $I$ cannot be lower semi-continuous on the whole of $\mathcal{M}_1(\Omega)$. Indeed, consider e.g. mixtures of the type (2.23). Let $\mu_n$, $n\in\mathbb{Z}_+$, be the measures obtained with such mixtures when $\rho_0^n(dt) = \psi_n(t)\,dt$ with densities $\psi_n\in L^2(\mathbb{R})$ such that $\psi_n$ converges in $L^1(\mathbb{R})$ to some $\psi\in L^1(\mathbb{R})\setminus L^2(\mathbb{R})$. Clearly $\mu_n\to\mu$ weakly, as $n\to\infty$, where $\mu$ is given by (2.23) with $\rho_0(dt) = \psi(t)\,dt$. On the other hand $I(\mu_n) = 0$ for all $n$ but since $\mu\notin\mathcal{K}_\infty$, we have $I(\mu) = \infty$.

Remark 2. In view of the variational principle proven in Theorem 3.2, we have

$$I(\mu) = 0 \iff \mu\in\mathcal{G}_2^S.$$

In particular we expect deviations of order $o(N^d)$ in the class of shift invariant Gibbs states. We then speak of critical large deviations. One can investigate this issue at the level of the empirical mean

$$M_N(\phi) = \frac{1}{N^d}\sum_{x\in\Lambda_N}\phi_x.$$

With respect to $P$, $M_N$ is a Gaussian variable with mean zero and variance

$$\sigma_N^2 = P(M_N^2) = \frac{1}{N^{2d}}\sum_{x,y\in\Lambda_N}G(x,y).$$

It is easy to check

$$N^d\sigma_N^2 \to f(0) = \sum_{x\in\mathbb{Z}^d}G(x,y),\qquad N\to\infty.$$

This implies that, for any $\beta>0$,

$$\lim_{N\to\infty}\frac{1}{N^d}\log P\bigl(M_N>\beta\bigr) = -\frac{\beta^2}{2f(0)}.$$
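The limit above is a plain Gaussian tail computation, and it can be checked numerically. The Python sketch below assumes, for illustration only, that $M_N$ is exactly $N(0, f(0)/N^d)$ (in the text this holds only asymptotically), writes `volume` for $N^d$ and `f0` for a finite value of $f(0)$, and evaluates the normalized log-probability with the complementary error function. The function name and parameter values are ours.

```python
import math


def log_tail_rate(volume, beta, f0):
    """(1/volume) * log P(M > beta) for M ~ N(0, f0 / volume).

    Assumption of this sketch: the variance of the empirical mean is
    exactly f0 / volume, whereas the text only has N^d * sigma_N^2 -> f(0).
    """
    sd = math.sqrt(f0 / volume)
    tail = 0.5 * math.erfc(beta / (sd * math.sqrt(2.0)))
    return math.log(tail) / volume


if __name__ == "__main__":
    beta, f0 = 1.0, 1.0
    for volume in (25, 100, 400):
        print(volume, log_tail_rate(volume, beta, f0), -beta ** 2 / (2.0 * f0))
```

The printed rates approach $-\beta^2/(2f(0))$ from below as the volume grows; the gap is the logarithmic correction from the Gaussian tail prefactor, which is negligible on the volume scale.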


In other words, if the model is not ferromagnetic, i.e. $f(0)<\infty$, the event $M_N>\beta$ cannot be realized in the class $\mathcal{G}_2^S$. For ferromagnetic models, $f(0)=+\infty$ and the event $M_N>\beta$ can always be produced within the class $\mathcal{G}_2^S$ (the latter contains all the fields $\gamma^\sigma$ with $\sigma$ a constant). For finite range ferromagnetic models with irreducible random walks, the full picture of critical large deviations has been obtained in the paper [1]. Since a classical invariance principle applies, one has here the order $N^{d-2}$, for $d\ge 3$. In particular one can prove that exponential tightness holds at this order. It is possible to extend this analysis of critical large deviations to the $\alpha$-stable models, cf. (2.21), for any $d\ge 1$. The correct order here is $N^{d-\alpha}$, due to the different scaling. This will be the subject of a separate paper, [5].

Remark 3. The proof of the upper bound in Theorem 4.1 is based on the argument introduced in [1] for the case of finite range ferromagnetic interaction. The main tool in extending this to our general setting is the expansion proved in the appendix, Lemma A.1, which is heavily used for comparisons of different Gaussian fields. The main line of the proof can be roughly described as follows. We perturb the original measure $P$ by removing the singularity of its spectral density, cf. (4.9). This defines a new Gaussian field, $P^\varepsilon$, with bounded spectral density. The new field is hypercontractive and satisfies a strong large deviation principle, cf. Theorem 4.2. This is a multidimensional version of the sort of results proven in Sect. 5.4 of [11], see also [6]. Hypercontractivity is obtained via the classical theorem of Nelson [31], cf. Lemma B.2 in the appendix. We then show that the bounds satisfied by $P^\varepsilon$ are stable (on the volume scale) when the singularity is restored, cf. Theorem 4.3. This is the most delicate part of the proof. In particular, the arguments of [1] do not apply and finer estimates are needed, cf. Lemma 4.6.

Remark 4. The lower bound in Theorem 4.1 can be proven along the lines of classical arguments [13,18]. This is done at the end of this section. Our proof of Theorem 4.1 also shows, however, how the lower bound for $P$ can in principle be derived from a lower bound on $P^\varepsilon$.

Remark 5 (On the effect of boundary conditions). We observe that the proof of Theorem 4.1 carries through with minor modifications if we replace the infinite volume measure $P$ by $P_N^0 = \gamma^0_{\Lambda_N}$, i.e. the Gibbs measure on $\Lambda_N$ with zero boundary condition. In particular the statement of Theorem 4.1 holds as it is when $P$ is replaced by $P_N^0$. Indeed, this amounts to replacing the covariance $G_N$ by $\Gamma_N$. As (the proof of) Proposition 3.1 shows, this cannot affect thermodynamics, and in general will not affect any volume order asymptotics. Let us discuss the case of an arbitrary boundary condition $\tau\in\Omega_J$. We will find sufficient conditions on $\tau$ such that the bounds of Theorem 4.1 hold with $P$ replaced by $P_N^\tau = \gamma^\tau_{\Lambda_N}$. Let

$$\psi_N = \frac{dP_N^\tau}{dP_N^0},$$

so that for any measurable set $B\subset\mathcal{M}_1(\Omega)$, $P_N^\tau(R_N\in B) = P_N^0(\psi_N;\,R_N\in B)$. If we assume the validity of the bounds (4.3) and (4.4) for $P_N^0$, then an application of Hölder's inequality, cf. (4.34) and (4.36), shows that to obtain the same bounds for $P_N^\tau$ we have to check, for any $p>1$,

$$\lim_{N\to\infty}\frac{1}{N^d}\log P_N^0\bigl(\psi_N^{\,p}\bigr) = \lim_{N\to\infty}\frac{1}{N^d}\log P_N^\tau\bigl(\psi_N^{-p}\bigr) = 0. \tag{4.5}$$


By explicit computation one has that (4.5) is equivalent to

$$\lim_{N\to\infty}\frac{1}{N^d}\bigl\langle m^{\tau,N},\Gamma_N^{-1}m^{\tau,N}\bigr\rangle = 0, \tag{4.6}$$

where $m^{\tau,N}$ stands for $m^{\tau,\Lambda_N}$, cf. (2.15). From (2.17) we see that condition (4.6) may be interpreted as the vanishing of the surface tension induced by $\tau$, i.e.

$$\sigma(\tau) = \lim_{N\to\infty}\frac{1}{N^d}\log\frac{Z_N^\tau}{Z_N^0} = 0, \tag{4.7}$$

(viewing the field as an interface in a $(d+1)$-dimensional space). If we consider the massless free field, $d\ge 3$, and we impose a linear tilt $\tau_x = u\cdot x$, for some $u\in\mathbb{R}^d$, then one can compute $\sigma(\tau) = |u|^2/2$, and one has a nontrivial influence on large deviations. On the other hand one can show that (4.7) holds for boundary conditions growing sufficiently slowly. To see this, let us consider the $\alpha$-stable crystal defined by (2.21), $\alpha\in(0,2\wedge d)$. We will show that (4.7) holds uniformly for $\tau$ in the class

$$\Omega_{L,\beta} = \bigl\{\tau\in\Omega:\ |\tau_x|\le L(1+|x|^\beta),\ x\in\mathbb{Z}^d\bigr\}$$

for any $L>0$ and $\beta<\alpha/2$. Let

$$\xi_x^{\tau,N} = \sum_{z\notin\Lambda_N}p(x,z)\,\tau_z,\qquad x\in\Lambda_N,$$

so that $m^{\tau,N} = \Gamma_N\xi^{\tau,N}$. Then, using (2.21) and the fact that

$$\Gamma_N(x,y)\le G(x-y)\le\mathrm{const.}\,|x-y|^{\alpha-d} \tag{4.8}$$

(cf. [5]) we have

$$N^{-d}\bigl\langle m^{\tau,N},\Gamma_N^{-1}m^{\tau,N}\bigr\rangle = N^{-d}\bigl\langle\xi^{\tau,N},\Gamma_N\xi^{\tau,N}\bigr\rangle \le \mathrm{const.}\,L^2N^{2\beta-\alpha}.$$

This proves our claim. It follows that the large deviation principle of Theorem 4.1 holds uniformly for boundary conditions $\tau\in\Omega_{L,\beta}$ for any fixed $L>0$ and $\beta<\alpha/2$. For finite range models, the same argument (set $\alpha=2$ in (4.8)) shows that we may choose $\tau$ uniformly in $\Omega_{L,\beta}$, for $L>0$ and $\beta<1$.

4.1. Proof of Theorem 4.1. For each $\varepsilon>0$, we define the measure $P^\varepsilon\in\mathcal{M}_1(\Omega)$, as the centered Gaussian field on $\mathbb{Z}^d$ with spectral density (cf. (2.6))

$$f^\varepsilon = (\hat{J}+\varepsilon)^{-1} = \frac{f}{1+\varepsilon f}. \tag{4.9}$$

Let $h(\mu|P^\varepsilon)$ be the relative entropy density associated to the measure $P^\varepsilon$, cf. (3.8), and define

$$I_\varepsilon(\mu) = \begin{cases}+\infty & \mu\in\mathcal{M}_1(\Omega)\setminus\mathcal{M}_1^S(\Omega), \\ h(\mu|P^\varepsilon) & \mu\in\mathcal{M}_1^S(\Omega).\end{cases} \tag{4.10}$$


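Theorem 4.2. The function $I_\varepsilon:\mathcal{M}_1(\Omega)\to[0,\infty]$ is lower semi-continuous and its level sets

$$\bigl\{\mu\in\mathcal{M}_1(\Omega):\ I_\varepsilon(\mu)\le L\bigr\},\qquad L>0,$$

are compact. For each $\mu\in\mathcal{K}_\infty$ it satisfies

$$I_\varepsilon(\mu) = I(\mu)+\frac{\varepsilon}{2}\,\mu(\phi_0^2)+\phi(\varepsilon),\qquad\mu\in\mathcal{K}_\infty, \tag{4.11}$$

where $\phi(\varepsilon)$ is given by

$$\phi(\varepsilon) = -\frac{1}{2}\,\frac{1}{(2\pi)^d}\int_{(-\pi,\pi]^d}\log\bigl(1+\varepsilon f(\theta)\bigr)\,d\theta. \tag{4.12}$$

Moreover,

• For any closed set $F\subset\mathcal{M}_1(\Omega)$,

$$\limsup_{N\to\infty}\frac{1}{N^d}\log P^\varepsilon\bigl(R_N\in F\bigr)\le-\inf_{\mu\in F}I_\varepsilon(\mu). \tag{4.13}$$

• For any open set $G\subset\mathcal{M}_1(\Omega)$,

$$\liminf_{N\to\infty}\frac{1}{N^d}\log P^\varepsilon\bigl(R_N\in G\bigr)\ge-\inf_{\mu\in G}I_\varepsilon(\mu). \tag{4.14}$$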

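The proof of Theorem 4.2 is postponed until the end of the section. Observe that the semi-continuity of $I$ on each $\mathcal{K}_L$ is a consequence of (4.11) and the semi-continuity of $I_\varepsilon$. We turn to the proof of the bounds (4.3) and (4.4). Let $\varepsilon>0$ be fixed, and choose an integer $N$. We define a new measure $P^{N,\varepsilon}\in\mathcal{M}_1(\Omega)$ by

$$\frac{dP^{N,\varepsilon}}{dP}(\phi) = F_{N,\varepsilon}(\phi)\equiv\frac{1}{Z_{N,\varepsilon}}\exp\Bigl(-\frac{\varepsilon}{2}\sum_{x\in\Lambda_N}\phi_x^2\Bigr). \tag{4.15}$$

The normalizing factor $Z_{N,\varepsilon}$ can be computed:

$$Z_{N,\varepsilon} = P\Bigl[\exp\Bigl(-\frac{\varepsilon}{2}\sum_{x\in\Lambda_N}\phi_x^2\Bigr)\Bigr] = \det(1_N+\varepsilon G_N)^{-1/2}.$$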
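The determinant formula for the normalizing factor is the standard Gaussian integral identity. The Python sketch below checks it on a two-site toy example: it compares a midpoint-rule evaluation of the Gaussian expectation of $\exp(-\frac{\varepsilon}{2}(\phi_1^2+\phi_2^2))$ with $\det(1+\varepsilon G)^{-1/2}$. The $2\times 2$ covariance matrix, the value of `eps`, the quadrature grid and the function names are assumptions chosen for the illustration.

```python
import math


def gaussian_expectation_2d(cov, eps, half_width=10.0, steps=300):
    """Midpoint-rule value of E[exp(-eps/2 * (x^2 + y^2))] under N(0, cov)."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    # Entries of the inverse covariance matrix.
    inv_a, inv_b, inv_c = c / det, -b / det, a / det
    step = 2.0 * half_width / steps
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * step
        for j in range(steps):
            y = -half_width + (j + 0.5) * step
            quad = inv_a * x * x + 2.0 * inv_b * x * y + inv_c * y * y
            total += math.exp(-0.5 * quad - 0.5 * eps * (x * x + y * y))
    return total * norm * step * step


def determinant_formula(cov, eps):
    """det(1 + eps * cov)^(-1/2) for a symmetric 2x2 matrix cov."""
    (a, b), (_, c) = cov
    det = (1.0 + eps * a) * (1.0 + eps * c) - (eps * b) ** 2
    return det ** -0.5


if __name__ == "__main__":
    toy_cov = [[1.0, 0.4], [0.4, 1.5]]  # illustrative covariance, not from the text
    print(gaussian_expectation_2d(toy_cov, 0.7), determinant_formula(toy_cov, 0.7))
```

The two printed values agree to quadrature accuracy. The same identity with $G_N$ in place of the toy matrix gives the expression for $Z_{N,\varepsilon}$.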

In particular, the argument of Theorem 2.5 in [22] shows that

$$\lim_{N\to\infty}\frac{1}{N^d}\log Z_{N,\varepsilon} = \phi(\varepsilon), \tag{4.16}$$

where $\phi(\varepsilon)$ is given in (4.12). The key intermediate step in deriving the bounds of Theorem 4.1 is given by the following result.


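Theorem 4.3. For each closed set $F\subset\mathcal{M}_1(\Omega)$,

$$\limsup_{N\to\infty}\frac{1}{N^d}\log P^{N,\varepsilon}\bigl(R_N\in F\bigr)\le-\inf_{\mu\in F}I_\varepsilon(\mu), \tag{4.17}$$

and, for each open set $G\subset\mathcal{M}_1(\Omega)$,

$$\liminf_{N\to\infty}\frac{1}{N^d}\log P^{N,\varepsilon}\bigl(R_N\in G\bigr)\ge-\inf_{\mu\in G}I_\varepsilon(\mu). \tag{4.18}$$

Let us assume, in fact, the validity of Theorem 4.3. Let $L>0$ be fixed. For each closed set $F\subset\mathcal{M}_1(\Omega)$, setting $F_L\equiv F\cap\mathcal{K}_L$, we have

$$P\bigl(R_N\in F_L\bigr) = P^{N,\varepsilon}\bigl(F_{N,\varepsilon}^{-1};\ R_N\in F_L\bigr) \le Z_{N,\varepsilon}\exp(\varepsilon LN^d/2)\,P^{N,\varepsilon}\bigl(R_N\in F_L\bigr), \tag{4.19}$$

where we have used the fact that if $R_N(\phi)\in\mathcal{K}_L$, then

$$\sum_{x\in\Lambda_N}\phi_x^2\le LN^d.$$

On the other hand, for any open set $G\subset\mathcal{M}_1(\Omega)$, we have

$$P\bigl(R_N\in G\bigr) = P^{N,\varepsilon}\bigl(F_{N,\varepsilon}^{-1};\ R_N\in G\bigr)\ge Z_{N,\varepsilon}\,P^{N,\varepsilon}\bigl(R_N\in G\bigr). \tag{4.20}$$

Let us prove the upper bound (4.3). From (4.19) and (4.17) we have

$$\limsup_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\in F_L\bigr)\le-\inf_{\mu\in F_L}I_\varepsilon(\mu)+\phi(\varepsilon)+\frac12\,\varepsilon L.$$

Using (4.11) we have

$$\limsup_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\in F_L\bigr)\le-\inf_{\mu\in F_L}I(\mu)+\frac12\,\varepsilon L. \tag{4.21}$$

Then (4.3) follows from (4.21) letting $\varepsilon\to0$. To prove the lower bound (4.4), it is sufficient to show that for each $\mu\in G$,

$$\liminf_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\in G\bigr)\ge-I(\mu). \tag{4.22}$$

We may assume $\mu\in\mathcal{K}_\infty$. From (4.20) and (4.18), we have

$$\liminf_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\in G\bigr)\ge-I_\varepsilon(\mu)+\phi(\varepsilon).$$

Using (4.11) this implies

$$\liminf_{N\to\infty}\frac{1}{N^d}\log P\bigl(R_N\in G\bigr)\ge-I(\mu)-\frac12\,\varepsilon\mu(\phi_0^2).$$

Letting $\varepsilon\to0$ we obtain (4.22). This completes the proof of Theorem 4.1. □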


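4.2. Proof of Theorem 4.3. Before proving the bounds (4.17) and (4.18), we shall derive some useful facts about the measures $P^\varepsilon$ and $P^{N,\varepsilon}$. Let $Q$ be given by (2.4). For $x,y\in\mathbb{Z}^d$, we define the modified couplings

$$Q^\varepsilon(x,y) = \frac{Q(x,y)}{1+\varepsilon},\qquad Q^{\varepsilon,N}(x,y) = \begin{cases}Q^\varepsilon(x,y)&\text{if }x\in\Lambda_N,\\ Q(x,y)&\text{if }x\notin\Lambda_N.\end{cases} \tag{4.23}$$

Lemma 4.4. Let $G^\varepsilon$ be the covariance of the Gaussian field $P^\varepsilon$. Then

$$G^\varepsilon = (1+\varepsilon)^{-1}\sum_{n=0}^\infty(Q^\varepsilon)^n. \tag{4.24}$$

In particular, the marginal of $P^\varepsilon$ on each $\Lambda\subset\subset\mathbb{Z}^d$ is given by the centered Gaussian measure with covariance

$$G^\varepsilon_\Lambda = (1+\varepsilon)^{-1}\sum_{n=0}^\infty(\tilde Q^\varepsilon_\Lambda)^n, \tag{4.25}$$

where, for $x,y\in\Lambda$,

$$\tilde Q^\varepsilon_\Lambda(x,y) = Q^\varepsilon(x,y)+\sum_{l=1}^\infty\ \sum_{z_1\notin\Lambda}\cdots\sum_{z_l\notin\Lambda}Q^\varepsilon(x,z_1)\cdots Q^\varepsilon(z_l,y). \tag{4.26}$$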

Proof. The covariance $G^\varepsilon$ is, by definition,

$$G^\varepsilon(x,y) = \frac{1}{(2\pi)^d}\int_{(-\pi,\pi]^d}f^\varepsilon(\theta)\,e^{i(y-x)\cdot\theta}\,d\theta.$$

From (4.9) we see that $f^\varepsilon$ is the Fourier transform of $(1+\varepsilon)^{-1}(1-Q^\varepsilon)^{-1}$. This implies (4.24). The rest is a straightforward consequence of Lemma A.1 applied to (4.24). □

Lemma 4.5. Let $\Lambda\subset\Lambda_N$. Then the marginal of $P^{N,\varepsilon}$ on $\Lambda$ is the centered Gaussian measure with covariance

$$G^{\varepsilon,N}_\Lambda = (1+\varepsilon)^{-1}\sum_{n=0}^\infty(\tilde Q^{\varepsilon,N}_\Lambda)^n, \tag{4.27}$$

where, for $x,y\in\Lambda$,

$$\tilde Q^{\varepsilon,N}_\Lambda(x,y) = Q^\varepsilon(x,y)+\sum_{l=1}^\infty\ \sum_{z_1\notin\Lambda}\cdots\sum_{z_l\notin\Lambda}Q^{\varepsilon,N}(x,z_1)\cdots Q^{\varepsilon,N}(z_l,y). \tag{4.28}$$

Proof. We first observe that the marginal of $P^{N,\varepsilon}$ on $\Lambda_N$ is the centered Gaussian measure with covariance

$$G^{\varepsilon,N} = \bigl(G_N^{-1}+\varepsilon 1_N\bigr)^{-1}. \tag{4.29}$$

From Lemma A.1 we have $G_N^{-1} = 1_N-\tilde Q_N$ and we can write

$$G^{\varepsilon,N} = (1+\varepsilon)^{-1}\sum_{n=0}^\infty(1+\varepsilon)^{-n}\,\tilde Q_N^{\,n}. \tag{4.30}$$

The marginal of $P^{N,\varepsilon}$ on $\Lambda\subset\Lambda_N$ is then the centered Gaussian measure with covariance $\{G^{\varepsilon,N}(x,y)\}_{x,y\in\Lambda}$. By applying Lemma A.1 to (4.30), it is straightforward to check that the effective couplings in (4.27) are given precisely by (4.28). □


Remark 6. In the ferromagnetic case, the couplings $Q^\varepsilon$, cf. (4.23), define a discrete time random walk on $\mathbb{Z}^d$ with constant killing. The probability to die at each step is $\varepsilon/(1+\varepsilon)$. Similarly, the couplings $Q^{\varepsilon,N}$ define a random walk for which killing occurs only inside $\Lambda_N$. Moreover, the covariance $G^\varepsilon$ can be seen as the Green function associated to the random walk with constant killing probability $\varepsilon/(1+\varepsilon)$, with killing starting at time 0, cf. (4.24).

We turn to the proof of the upper bound (4.17). We use here the projective limit approach. Let $M\in\mathbb{Z}_+$ and let $\pi_M:\Omega\to\Omega_M$ be the canonical projection $\phi\to\phi_{\Lambda_M}$. We will also use the notation $\mu\circ\pi_M^{-1}$ to denote the marginal $\mu_M$ on $\Lambda_M$ of a measure $\mu\in\mathcal{M}_1(\Omega)$. The Dawson-Gärtner theorem (see e.g. Theorem 4.6.1 in [9]) shows that in order to establish the upper bound (4.17) it is enough to show, for each $M\in\mathbb{Z}_+$, for each closed set $F_M\subset\mathcal{M}_1(\Omega_M)$,

$$\limsup_{N\to\infty}\frac1{N^d}\log P^{N,\varepsilon}\bigl(R_N\circ\pi_M^{-1}\in F_M\bigr)\le-\inf\bigl\{I_\varepsilon(\mu),\ \mu\in\mathcal{M}_1(\Omega):\ \mu_M\in F_M\bigr\}. \tag{4.31}$$

Let us fix $M\in\mathbb{Z}_+$ and denote by $\rho_M$ a metric on $\mathcal{M}_1(\Omega_M)$, compatible with the weak topology and such that $\rho_M$ is dominated by the total variation norm (cf. e.g. [15]). For a given $\delta\in(0,1)$, define

$$R_{N,\delta}(\phi) = \frac1{|\Lambda_{N,\delta}|}\sum_{x\in\Lambda_{N,\delta}}\delta_{\theta_x\phi^N},$$

where

$$\Lambda_{N,\delta}\equiv\bigl[-(1-\delta)N/2,\ (1-\delta)N/2\bigr]^d\cap\mathbb{Z}^d.$$

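It is easy to check that there exist constants $\delta_0(d)>0$ and $c(d)<\infty$, only depending on $d$, such that for any bounded measurable function $f$, for any $N\in\mathbb{Z}_+$,

$$\sup_{\phi\in\Omega}\bigl|R_N(\phi)(f)-R_{N,\delta}(\phi)(f)\bigr| < c(d)\,\|f\|_\infty\,\delta$$

provided $\delta<\delta_0(d)$. This implies

$$\sup_{\phi\in\Omega}\rho_M\bigl(R_N(\phi),R_{N,\delta}(\phi)\bigr) < c(d)\,\delta \tag{4.32}$$

for $\delta<\delta_0(d)$, for integers $N>M$. Now observe that, if e.g. $N>2M/\delta$, the map $\phi\to R_{N,\delta}(\phi)\circ\pi_M^{-1}$ is $\mathcal{F}_{\Lambda_{N,\delta/2}}$-measurable. Let us define the density

$$\phi_{N,\delta}\equiv\frac{dP^{N,\varepsilon}_{N,\delta/2}}{dP^{\varepsilon}_{N,\delta/2}}, \tag{4.33}$$

where $P^{N,\varepsilon}_{N,\delta/2}$ and $P^\varepsilon_{N,\delta/2}$ are the marginals of $P^{N,\varepsilon}$ and $P^\varepsilon$ respectively, on the box $\Lambda_{N,\delta/2}$. We define the set

$$F^{(\delta)} = \bigl\{\nu\in\mathcal{M}_1(\Omega_M):\ \rho_M(\nu,F_M)\le c(d)\,\delta\bigr\}.$$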


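Then, by (4.32),

$$P^{N,\varepsilon}\bigl(R_N\circ\pi_M^{-1}\in F_M\bigr)\le P^{N,\varepsilon}\bigl(R_{N,\delta}\circ\pi_M^{-1}\in F^{(\delta)}\bigr) = P^\varepsilon\bigl(\phi_{N,\delta};\ R_{N,\delta}\circ\pi_M^{-1}\in F^{(\delta)}\bigr).$$

For each $1<p<\infty$, $q=p/(p-1)$, Hölder's inequality yields

$$\log P^{N,\varepsilon}\bigl(R_N\circ\pi_M^{-1}\in F_M\bigr)\le\frac1p\log P^\varepsilon\bigl(\phi_{N,\delta}^{\,p}\bigr)+\frac1q\log P^\varepsilon\bigl(R_N\circ\pi_M^{-1}\in F^{(2\delta)}\bigr), \tag{4.34}$$

where we have used (4.32) again. We shall need the following key approximation result.

Lemma 4.6. Let $\phi_{N,\delta}$ be the density defined in (4.33). For all $p>1$, $\varepsilon>0$ and $\delta>0$, we have

$$\lim_{N\to\infty}\frac1{N^d}\log P^\varepsilon\bigl(\phi^{\,p}_{N,\delta}\bigr) = \lim_{N\to\infty}\frac1{N^d}\log P^{N,\varepsilon}\bigl(\phi^{-p}_{N,\delta}\bigr) = 0. \tag{4.35}$$

Let us show that Lemma 4.6 allows us to prove the desired estimate. Suppose (4.35) holds. Then (4.34) together with the upper bound of Theorem 4.2 imply

$$\limsup_{N\to\infty}\frac1{N^d}\log P^{N,\varepsilon}\bigl(R_N\circ\pi_M^{-1}\in F_M\bigr)\le-\frac1q\,\inf\bigl\{I_\varepsilon(\mu),\ \mu\in\mathcal{M}_1(\Omega):\ \mu\circ\pi_M^{-1}\in F^{(2\delta)}\bigr\}.$$

This implies (4.31) since $q$ and $\delta$ are arbitrarily close to 1 and 0 respectively. The proof of (4.17) is complete.

Before proving Lemma 4.6, we derive the lower bound (4.18). Let us fix an open set $G_M\subset\mathcal{M}_1(\Omega_M)$, for a given $M\in\mathbb{Z}_+$. Given $\delta>0$ define $G^{(\delta)} = \{\mu\in\mathcal{M}_1(\Omega_M):\ \rho_M(\mu,G_M^c)>c(d)\,\delta\}$. Then, by (4.32)

$$P^{N,\varepsilon}\bigl(R_N\circ\pi_M^{-1}\in G_M\bigr)\ge P^{N,\varepsilon}\bigl(R_{N,\delta}\circ\pi_M^{-1}\in G^{(\delta)}\bigr) = P^\varepsilon\bigl(\phi_{N,\delta};\ R_{N,\delta}\circ\pi_M^{-1}\in G^{(\delta)}\bigr).$$

For any $1<p<\infty$, $p/q=p-1$, Hölder's inequality yields

$$\log P^{N,\varepsilon}\bigl(R_N\circ\pi_M^{-1}\in G_M\bigr)\ge-\frac qp\log P^{N,\varepsilon}\bigl(\phi^{-p}_{N,\delta}\bigr)+q\log P^\varepsilon\bigl(R_N\circ\pi_M^{-1}\in G^{(2\delta)}\bigr). \tag{4.36}$$

From Lemma 4.6, the arbitrariness of $q$ and $\delta$, and from the lower bound of Theorem 4.2, we have

$$\liminf_{N\to\infty}\frac1{N^d}\log P^{N,\varepsilon}\bigl(R_N\circ\pi_M^{-1}\in G_M\bigr)\ge-I_\varepsilon(\mu),$$

for any $\mu\in\mathcal{M}_1(\Omega)$ such that $\mu_M\in G_M$. This implies (4.18). The proof of Theorem 4.3 is completed. □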


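4.3. Proof of Lemma 4.6. We fix $\varepsilon>0$ and $\delta>0$. Let $G^{N,\varepsilon}$ be the covariance function of the field $P^{N,\varepsilon}$, cf. (4.29). Let us define the matrices

$$C_N = \bigl\{G^{N,\varepsilon}(x,y)\bigr\}_{x,y\in\Lambda_{N,\delta/2}},\qquad D_N = \bigl\{G^\varepsilon(x,y)\bigr\}_{x,y\in\Lambda_{N,\delta/2}}.$$

Then (4.33) can be written

$$\phi_{N,\delta}(\sigma) = \det\bigl(D_NC_N^{-1}\bigr)^{1/2}\exp\Bigl(-\frac12\bigl\langle\sigma,(C_N^{-1}-D_N^{-1})\sigma\bigr\rangle\Bigr). \tag{4.37}$$

An explicit computation shows that

$$P^\varepsilon\bigl(\phi^{\,p}_{N,\delta}\bigr) = \det(1_N+B_N)^{p/2}\,\det(1_N+p\,B_N)^{-1/2}, \tag{4.38}$$

where we have defined $B_N = D_NC_N^{-1}-1_N$, with $1_N$ denoting the unit diagonal matrix on $\Lambda_{N,\delta/2}$. Moreover, setting $q=p/(p-1)$,

$$P^{N,\varepsilon}\bigl(\phi^{-p}_{N,\delta}\bigr) = \det(1_N+B_N)^{-p/2q}\,\det\Bigl(1_N-\frac pq\,B_N\Bigr)^{-1/2}. \tag{4.39}$$

Let us denote by $\{\lambda_j(N),\ j=1,\ldots,|\Lambda_{N,\delta}|\}$ the set of eigenvalues of the matrix $B_N$. Observe that, by elementary matrix calculus, we have

$$\sup_j|\lambda_j(N)|\le\|B_N\|\equiv\sup_{x\in\Lambda_{N,\delta}}\sum_{y\in\Lambda_{N,\delta}}|B_N(x,y)|. \tag{4.40}$$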
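The bound of the eigenvalues by the maximal absolute row sum, and the way it controls a log-determinant, can be seen on a small example. The Python sketch below uses a symmetric $2\times2$ matrix, for which the eigenvalues have a closed form. The symmetry assumption, the sample entries and the function names are ours and serve only as an illustration; the matrix $B_N$ of the text is not symmetric in general.

```python
import math


def row_sum_norm(matrix):
    """Maximal absolute row sum, i.e. the norm used in (4.40)."""
    return max(sum(abs(v) for v in row) for row in matrix)


def eigenvalues_symmetric_2x2(matrix):
    """Closed-form eigenvalues of a symmetric matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = matrix
    center = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return center - radius, center + radius


# Illustrative matrix with small entries (an assumption, not data from the text).
B = [[0.10, -0.25], [-0.25, 0.05]]
norm_b = row_sum_norm(B)
lams = eigenvalues_symmetric_2x2(B)
# log det(1 + B) is the sum of log(1 + lambda_i) over the eigenvalues.
log_det = sum(math.log(1.0 + lam) for lam in lams)

if __name__ == "__main__":
    print("eigenvalues:", lams, "row-sum norm:", norm_b)
    print("log det(1+B):", log_det, "bound:", len(B) * math.log(1.0 + norm_b))
```

Each eigenvalue lies within the row-sum norm in absolute value, so $\log\det(1+B)$ is at most the dimension times $\log(1+\|B\|)$. This is the mechanism that turns $\|B_N\|\to0$ into bounds of order $o(N^d)$.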

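From (4.38),

$$\log P^\varepsilon\bigl(\phi^{\,p}_{N,\delta}\bigr) = \frac p2\sum_i\log(1+\lambda_i)-\frac12\sum_i\log(1+p\,\lambda_i) \le \frac p2\,|\Lambda_{N,\delta}|\log(1+\|B_N\|)-\frac12\,|\Lambda_{N,\delta}|\log(1-p\,\|B_N\|).$$

A similar computation for (4.39) shows

$$\log P^{N,\varepsilon}\bigl(\phi^{-p}_{N,\delta}\bigr)\le-\frac p{2q}\,|\Lambda_{N,\delta}|\log(1-\|B_N\|)-\frac12\,|\Lambda_{N,\delta}|\log\Bigl(1-\frac pq\,\|B_N\|\Bigr).$$

In order to prove (4.35) it is then sufficient to show

$$\lim_{N\to\infty}\|B_N\| = 0. \tag{4.41}$$

We have

$$\|B_N\|\le\|D_N\|\,\|C_N^{-1}-D_N^{-1}\|,$$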


where the norm $\|\cdot\|$ is defined as in (4.40). From the absolute summability of $G^\varepsilon$, cf. Appendix, it follows that for each $\varepsilon>0$

$$\sup_{N\in\mathbb{Z}_+}\|D_N\|<\infty.$$

Then (4.41) will follow from

$$\lim_{N\to\infty}\|C_N^{-1}-D_N^{-1}\| = 0. \tag{4.42}$$

Thanks to Lemma 4.4 and Lemma 4.5, we can write, for $x,y\in\Lambda_{N,\delta/2}$,

$$C_N^{-1}(x,y)-D_N^{-1}(x,y) = (1+\varepsilon)\bigl[\tilde Q^{\varepsilon}_{\Lambda_{N,\delta/2}}(x,y)-\tilde Q^{\varepsilon,N}_{\Lambda_{N,\delta/2}}(x,y)\bigr].$$

We now use explicitly the expressions (4.23), (4.26) and (4.28). Given sites $z_1\notin\Lambda_{N,\delta/2},\ldots,z_l\notin\Lambda_{N,\delta/2}$, let $k(z_1,\ldots,z_l)$ be the number of $i$'s, $i\in\{1,\ldots,l\}$, such that $z_i\notin\Lambda_N$. Then, for all $x,y\in\Lambda_{N,\delta/2}$, letting $p^\varepsilon(x,y) = |Q^\varepsilon(x,y)|$,

$$\begin{aligned}
|C_N^{-1}(x,y)-D_N^{-1}(x,y)| &\le \sum_{l=1}^\infty\ \sum_{z_1\notin\Lambda_{N,\delta/2}}\cdots\sum_{z_l\notin\Lambda_{N,\delta/2}}p^\varepsilon(x,z_1)\cdots p^\varepsilon(z_l,y)\,\bigl[(1+\varepsilon)^{k(z_1,\ldots,z_l)}-1\bigr] \\
&\le 2\sum_{l=1}^\infty\ \sum_{z_1\notin\Lambda_{N,\delta/2}}\cdots\sum_{z_l\notin\Lambda_{N,\delta/2}}p^\varepsilon(x,z_1)\cdots p^\varepsilon(z_l,y)\,(1+\varepsilon)^{k(z_1,\ldots,z_l)}\,\chi\bigl(k(z_1,\ldots,z_l)\ge1\bigr),
\end{aligned} \tag{4.43}$$

where $\chi$ stands for the indicator function. Let us consider now the random walk on $\mathbb{Z}^d$ with rates $p^{\varepsilon,N}(x,y) = |Q^{\varepsilon,N}(x,y)|$, i.e. the process with killing occurring only inside $\Lambda_N$, cf. the remark after Lemma 4.5. We observe that the quantity appearing in the r.h.s. of (4.43), apart from the factor 2, is the probability that the walker started in $x\in\Lambda_{N,\delta/2}$ arrives eventually in $y\in\Lambda_{N,\delta/2}$, taking only jumps out of $\Lambda_{N,\delta/2}$ and visiting at least once the set $\mathbb{Z}^d\setminus\Lambda_N$. Let $\{\xi_n\}$, $n=0,1,\ldots$, $\xi_0=x$, denote the above described process, and let $P_x$ be the associated probability measure. Let $\tau_\varepsilon$ be the time at which $\{\xi_n\}$ is killed. Let also $\tau_N$ be the hitting time for $\mathbb{Z}^d\setminus\Lambda_N$, i.e. $\tau_N = \inf\{n\ge1,\ \xi_n\notin\Lambda_N\}$. Summing over $y\in\Lambda_{N,\delta/2}$ in (4.43) we can estimate

$$\|D_N^{-1}-C_N^{-1}\|\le 2\sup_{x\in\Lambda_{N,\delta/2}}P_x\bigl(\tau_\varepsilon>\tau_N\bigr). \tag{4.44}$$

Observe that due to the constant killing inside $\Lambda_N$, there exists $m(\varepsilon)>0$ such that for any $N$, $x\in\Lambda_{N,\delta/2}$, for any $l>0$ sufficiently large,

$$P_x\bigl(\tau_\varepsilon>l;\ \tau_N=l\bigr)\le\exp(-m(\varepsilon)\,l).$$


We then estimate, for any sufficiently large integer $n$,

$$P_x\bigl(\tau_\varepsilon>\tau_N\bigr)\le\sum_{l=0}^{n}P_x\bigl(\tau_N=l\bigr)+\sum_{l=n+1}^\infty P_x\bigl(\tau_\varepsilon>l;\ \tau_N=l\bigr)\le n\,P_x\bigl(\tau_N\le n\bigr)+\exp(-m(\varepsilon)\,n/2). \tag{4.45}$$

On the other hand, for each fixed $n>0$,

$$\sup_{x\in\Lambda_{N,\delta/2}}P_x\bigl(\tau_N\le n\bigr)\le n\sum_{|z|>\delta N/n}p(0,z)$$

is vanishing in the limit $N\to\infty$. Since $n$ is arbitrary in (4.45), this implies (4.42) and the lemma follows. □

4.4. The hypercontractive regime. This subsection is devoted to the proof of Theorem 4.2. Let the rate functions $I$ and $I_\varepsilon$ be defined as in (4.2) and (4.10). Let also $\phi(\varepsilon)$ be given by (4.12).

Lemma 4.7. For each $\varepsilon>0$, $I_\varepsilon$ satisfies the variational principle

$$I_\varepsilon(\mu)=0\iff\mu=P^\varepsilon,\qquad\mu\in\mathcal{M}_1(\Omega), \tag{4.46}$$

and the identity

$$I_\varepsilon(\mu) = I(\mu)+\frac\varepsilon2\,\mu(\phi_0^2)+\phi(\varepsilon),\qquad\mu\in\mathcal{K}_\infty. \tag{4.47}$$

Proof. From Proposition 3.1 we have

$$h(\mu|P^\varepsilon) = e^\varepsilon(\mu)-s(\mu)+p^\varepsilon, \tag{4.48}$$

where the entropy $s$ is given by (3.3) (independent of $\varepsilon$), and $e^\varepsilon$, $p^\varepsilon$ are the internal energy and the pressure associated to the potential $J+\varepsilon1$. According to (3.7) and (4.9) we see that $p^\varepsilon-p = \phi(\varepsilon)$. Then (4.47) follows from the obvious identity (cf. (3.5))

$$e^\varepsilon(\mu)-e(\mu) = \frac\varepsilon2\,\mu(\phi_0^2).$$

To prove the variational principle we may use Theorem 3.2 and the fact that $P^\varepsilon$ is the only shift invariant Gibbs measure with finite second moment for the potential $J+\varepsilon1$, cf. e.g. Corollary (13.40) in [20]. □

Upper bound. Once the hypercontractivity of $P^\varepsilon$ is established, see Lemma B.2, the proof of the upper bound in Theorem 4.2 follows the standard line, cf. Sect. 5.4 of [11]. We point out the main steps needed.

Step 1. Let $Q\subset\subset\mathbb{Z}^d$ be a fixed cube containing the origin. Using Lemma B.2 one shows that for all bounded measurable functions $f$ such that $S_f\subset Q$, the limit

$$\Lambda_Q(f) = \lim_{N\to\infty}\frac1{N^d}\log P^\varepsilon\Bigl[\exp\Bigl(\sum_{x\in\Lambda_N}f\circ\theta_x\Bigr)\Bigr] \tag{4.49}$$


exists, cf. Appendix. An application of Hölder's inequality shows that $\Lambda_Q$ is a convex functional. Its Legendre transform is the convex function on $\mathcal{M}_1(\Omega_Q)$ given by

$$\Lambda_Q^*(\mu) = \sup_f\,\bigl[\mu(f)-\Lambda_Q(f)\bigr], \tag{4.50}$$

where the supremum is taken over continuous bounded $f$ with $S_f\subset Q$.

Step 2. Following Lemma 5.4.13 and Corollary 5.1.11 in [11], one can check that $\Lambda_Q^*$ has compact level sets on $\mathcal{M}_1(\Omega_Q)$ and for each closed set $F_Q\subset\mathcal{M}_1(\Omega_Q)$,

$$\limsup_{N\to\infty}\frac1{N^d}\log P^\varepsilon\bigl(R_N\circ\pi_Q^{-1}\in F_Q\bigr)\le-\inf_{\mu\in F_Q}\Lambda_Q^*(\mu). \tag{4.51}$$

Step 3. Let

$$\Lambda^*(\mu) = \sup_Q\Lambda_Q^*(\mu\circ\pi_Q^{-1}),\qquad\mu\in\mathcal{M}_1(\Omega),$$

where the supremum is taken over all finite cubes containing the origin. By definition, the function $\Lambda^*$ is non-negative, convex and lower semicontinuous. Moreover, the standard projective limit argument (cf. e.g. [9], Theorem 4.6.1) shows that $\Lambda^*$ has compact level sets, and (4.51) can be lifted to

$$\limsup_{N\to\infty}\frac1{N^d}\log P^\varepsilon\bigl(R_N\in F\bigr)\le-\inf_{\mu\in F}\Lambda^*(\mu) \tag{4.52}$$

for all closed sets $F\subset\mathcal{M}_1(\Omega)$.

Step 4. With the same arguments as in [11], Lemma 5.4.17, one then proves the identity

$$\Lambda^*(\mu) = I_\varepsilon(\mu),\qquad\mu\in\mathcal{M}_1(\Omega).$$

From Step 3, this implies that $I_\varepsilon$ has compact level sets and satisfies the upper bound (4.13) of Theorem 4.2.

Lower bound. We remark that a proof of the lower bound in Theorem 4.2 using hypercontractivity as in [6,11] would require a condition stronger than Lemma B.2 (cf. Sect. 5.4 of [11]). Since the proof we give below does not depend on hypercontractivity we may equally well set $\varepsilon=0$ and consider the original field $P$, instead of $P^\varepsilon$. We prove that for any $\mu\in\mathcal{M}_1^S(\Omega)$ and for any open neighbourhood $G$ of $\mu$,

$$\liminf_{N\to\infty}\frac1{N^d}\log P\bigl(R_N\in G\bigr)\ge-I(\mu). \tag{4.53}$$

The above statement is equivalent to the lower bound (4.14), if we replace $P$ by $P^\varepsilon$ and $I$ by $I_\varepsilon$. A classical change of measure argument, [13] or Lemma 5.4.21 in [11], implies that (4.53) is always satisfied if $\mu$ is ergodic with respect to lattice shifts. The general case follows from ergodic approximation. Namely, let $\mu\in\mathcal{M}_1^S(\Omega)$, and assume $\mu\in\mathcal{K}_\infty$. Following Theorem (14.12) and Proposition (16.40) of [20] we construct a sequence of ergodic measures $\mu_n\in\mathcal{K}_\infty$ such that $\mu_n\to\mu$ weakly and $s(\mu_n)\to s(\mu)$, where $s$ denotes the entropy (3.3). Moreover, using the explicit form of $\mu_n$ it is not hard to check that $e(\mu_n)\to e(\mu)$, cf. (3.5). Proposition 3.1 now implies that

$$I(\mu_n)\to I(\mu),\qquad n\to\infty,$$

and the lower bound (4.53) follows. □


A. The Expansion

The point of this section is to replace (2.6) by an expansion involving only paths belonging to some finite region $\Lambda\subset\subset\mathbb{Z}^d$ rather than the whole lattice. We fix a finite box $\Lambda\subset\subset\mathbb{Z}^d$, and for each integer $n\ge1$, we define the matrix

$$B_n(x,y) = \sum_{z_1\notin\Lambda}\cdots\sum_{z_n\notin\Lambda}Q(x,z_1)\cdots Q(z_n,y),\qquad x,y\in\Lambda.$$

Let also $B_0(x,y) = Q(x,y)$. Here $Q$ is given by (2.4).

Lemma A.1. For each $x,y\in\Lambda$, we have

$$G(x,y) = \sum_{n=0}^\infty\tilde Q_\Lambda^{\,n}(x,y), \tag{A.1}$$

where $\tilde Q_\Lambda$ is the matrix given by

$$\tilde Q_\Lambda(x,y) = \sum_{l=0}^\infty B_l(x,y). \tag{A.2}$$
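The statement of the lemma can be tested on a finite toy model, where it reduces to a Schur complement identity. The Python sketch below replaces $\mathbb{Z}^d$ by a path of five sites with nearest-neighbour couplings of weight 0.4 (so that all Neumann series converge), takes the first two sites as the box, and compares $G = \sum_n Q^n$ restricted to the box with $\sum_n\tilde Q^n$, where $\tilde Q$ is assembled from excursions outside the box. The finite graph, the weights, the truncation of the series and all function names are assumptions of this illustration and are not taken from the text.

```python
def identity(size):
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]


def neumann_sum(m, terms=400):
    """Truncated series sum_{n=0}^{terms} m^n for a square matrix of small norm."""
    total = identity(len(m))
    power = identity(len(m))
    for _ in range(terms):
        power = mat_mul(power, m)
        total = mat_add(total, power)
    return total


def block(m, rows, cols):
    return [[m[i][j] for j in cols] for i in rows]


def lemma_a1_gap(n_sites=5, box=(0, 1), weight=0.4):
    """Largest entrywise gap between G restricted to the box and sum_n Qtilde^n."""
    q = [[weight if abs(i - j) == 1 else 0.0 for j in range(n_sites)]
         for i in range(n_sites)]
    inside = list(box)
    outside = [i for i in range(n_sites) if i not in box]
    # Full Green function G = sum_n Q^n, restricted to the box.
    g_box = block(neumann_sum(q), inside, inside)
    # Qtilde = B_0 + sum_{l>=1} B_l: a direct step plus excursions outside the box.
    excursions = neumann_sum(block(q, outside, outside))
    q_tilde = mat_add(
        block(q, inside, inside),
        mat_mul(mat_mul(block(q, inside, outside), excursions),
                block(q, outside, inside)),
    )
    g_renormalized = neumann_sum(q_tilde)
    return max(abs(g_box[i][j] - g_renormalized[i][j])
               for i in range(len(inside)) for j in range(len(inside)))


if __name__ == "__main__":
    print(lemma_a1_gap())
```

The printed gap is at the level of floating-point rounding, in agreement with (A.1) for this toy model.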

Remark. In the ferromagnetic case $Q=p$, the above lemma becomes a restatement of the results of [35], Chapter 3. In particular, in this case $\tilde Q_\Lambda(x,y)$ can be interpreted as the transition function of a “renormalized” or embedded random walk.

Proof. Let $x,y\in\Lambda$ be fixed. Then, from (2.6), we have

$$G(x,y) = \delta(x,y)+Q(x,y)+\sum_{n=2}^\infty Q^n(x,y). \tag{A.3}$$

Moreover,

$$\sum_{n=2}^\infty Q^n(x,y) = \sum_{n=1}^\infty\ \sum_{z_1\in\mathbb{Z}^d}\cdots\sum_{z_n\in\mathbb{Z}^d}Q(x,z_1)\cdots Q(z_n,y) = \sum_{n=1}^\infty\ \sum_{\pi\in\{\Lambda,\Lambda^c\}^n}\ \sum_{z_1\in\pi(1)}\cdots\sum_{z_n\in\pi(n)}Q(x,z_1)\cdots Q(z_n,y),$$

where we have isolated the contribution of each sequence $\pi\in\{\Lambda,\Lambda^c\}^n$. For integers $n\ge1$ and $k\le n$, define the matrix element $A(n,k)$ by

$$A(n,k) = \sum_{\pi\in\{\Lambda,\Lambda^c\}^n:\ \#_\Lambda(\pi)=k}\ \sum_{z_1\in\pi(1)}\cdots\sum_{z_n\in\pi(n)}Q(x,z_1)\cdots Q(z_n,y),$$

where we only have summed the contributions of those sequences $\pi\in\{\Lambda,\Lambda^c\}^n$ such that $\Lambda$ occurs exactly $k$ times ($\#_\Lambda(\pi)=k$). Setting $A(0,0)=0$, we have

$$\sum_{n=2}^\infty Q^n(x,y) = \sum_{n=1}^\infty\sum_{k=0}^{n}A(n,k) = \sum_{k=0}^\infty\sum_{n=k}^\infty A(n,k). \tag{A.4}$$

P. Caputo, J.-D. Deuschel

Now observe that $A(n,0) = B_n(x,y)$, for all $n \geq 1$. In particular

$$\sum_{n=1}^{\infty} A(n,0) = \tilde{Q}_\Lambda(x,y) - Q(x,y),$$

where $\tilde{Q}_\Lambda$ is defined by (A.2). From (A.3) and (A.4) we have

$$G(x,y) = \delta(x,y) + \tilde{Q}_\Lambda(x,y) + \sum_{k=1}^{\infty} \sum_{n=k}^{\infty} A(n,k).$$

In order to finish the proof, we have to check the identity

$$\sum_{n=k}^{\infty} A(n,k) = \tilde{Q}_\Lambda^{\,k+1}(x,y), \qquad k \geq 1. \tag{A.5}$$

For fixed integers $n \geq k \geq 1$, we can write

$$A(n,k) = \sum_{1 \leq i_1 < \cdots < i_k \leq n} \ \sum_{\xi_1 \in \Lambda} \cdots \sum_{\xi_k \in \Lambda} B_{i_1-1}(x,\xi_1)\, B_{i_2-i_1-1}(\xi_1,\xi_2) \cdots B_{n-i_k}(\xi_k,y).$$

Summing over $n$ we obtain

$$\sum_{n=k}^{\infty} A(n,k) = \sum_{i_1=1}^{\infty} \sum_{i_2=i_1+1}^{\infty} \cdots \sum_{i_k=i_{k-1}+1}^{\infty} \sum_{n=i_k}^{\infty} \ \sum_{\xi_1 \in \Lambda} \cdots \sum_{\xi_k \in \Lambda} B_{i_1-1}(x,\xi_1)\, B_{i_2-i_1-1}(\xi_1,\xi_2) \cdots B_{n-i_k}(\xi_k,y).$$

The above line can now be written as

$$\tilde{Q}_\Lambda^{\,k+1}(x,y) = \sum_{l_1=0}^{\infty} \cdots \sum_{l_{k+1}=0}^{\infty} \ \sum_{\xi_1 \in \Lambda} \cdots \sum_{\xi_k \in \Lambda} B_{l_1}(x,\xi_1)\, B_{l_2}(\xi_1,\xi_2) \cdots B_{l_{k+1}}(\xi_k,y).$$

This proves the identity (A.5) and completes the proof of (A.1). □

B. Hypercontractivity Estimates

We start with an application of Nelson’s famous result on the hypercontractivity of Gaussian measures. Similar estimates have been obtained in the work [2]. Let $P^\varepsilon$ be the centered Gaussian field with spectral density given in (4.9). Also let $\beta > 0$ be as in the decay assumption (2.5). We denote by $\|f\|_p$, $1 \leq p < \infty$, the $L^p$ norm of $f$ w.r.t. $P^\varepsilon$.

Lemma B.1. There exist constants $\alpha = \alpha(\beta) > 0$ and $l_0 > 0$ such that for all $l > l_0$ and for any pair of local functions $f, g \in L^1(P^\varepsilon)$ with $d(S_f, S_g) > l$,

$$\|fg\|_1 \leq \|f\|_{1+\tau(l)}\, \|g\|_{1+\tau(l)}, \quad \text{with } \tau(l) \leq l^{-\alpha}. \tag{B.1}$$

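Proof. Given two configurations $\xi, \eta \in \Omega$ of finite support, we define

$$\langle \xi, \eta \rangle_\varepsilon = \sum_{x,y \in \mathbb{Z}^d} \xi_x\, G_\varepsilon(x,y)\, \eta_y,$$

and let $\|\xi\|_\varepsilon^2 = \langle \xi, \xi \rangle_\varepsilon$. Following [6] we define, for $\Lambda_1, \Lambda_2 \subset\subset \mathbb{Z}^d$,

$$\tau(\Lambda_1,\Lambda_2) = \sup\left\{ \langle \xi, \eta \rangle_\varepsilon \,:\, \|\xi\|_\varepsilon = \|\eta\|_\varepsilon = 1,\ \xi = 0 \text{ on } \Lambda_1^c,\ \eta = 0 \text{ on } \Lambda_2^c \right\},$$

and set, for any $l > 0$,

$$\tau(l) = \sup\left\{ \tau(\Lambda_1,\Lambda_2) \,;\, \Lambda_1, \Lambda_2 \subset\subset \mathbb{Z}^d,\ d(\Lambda_1,\Lambda_2) > l \right\}. \tag{B.2}$$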

We may use now Nelson’s estimate (cf. e.g. Theorem 5.11 and Lemma 5.12 in [6]), to write $\|fg\|_1 \leq \|f\|_{1+\tau(l)} \|g\|_{1+\tau(l)}$, for any pair of local functions $f, g \in L^1(P^\varepsilon)$ with $d(S_f, S_g) > l$. Let us check the decay of $\tau(l)$, as $l \to \infty$. Choose $\Lambda_1, \Lambda_2 \subset\subset \mathbb{Z}^d$ and $\xi, \eta \in \Omega$ such that $\xi = 0$ on $\Lambda_1^c$ and $\eta = 0$ on $\Lambda_2^c$, and $\|\xi\|_\varepsilon = \|\eta\|_\varepsilon = 1$. We write

$$|\langle \xi, \eta \rangle_\varepsilon| \leq 2 \sum_{x,y \in \mathbb{Z}^d} (\xi_x^2 + \eta_y^2)\, |G_\varepsilon(x,y)| \leq 4 \sup_{x \in \Lambda_1} \sum_{y \in \Lambda_2} |G_\varepsilon(x,y)| + 4 \sup_{y \in \Lambda_2} \sum_{x \in \Lambda_1} |G_\varepsilon(x,y)|, \tag{B.3}$$

having used the fact that, for any $\xi \in \Omega$ of finite support,

$$\sum_{x \in \mathbb{Z}^d} \xi_x^2 \leq 2\, \|\xi\|_\varepsilon^2,$$

which is easily checked using e.g. the Fourier transform (2.7). It is then sufficient to show

$$\sum_{|x| > l} |G_\varepsilon(0,x)| \leq l^{-\alpha}, \tag{B.4}$$

for some $\alpha > 0$ and for all $l$ sufficiently large. We first observe that $G_\varepsilon(0,x)$ is indeed absolutely summable, cf. (4.24). We can then write

$$\sum_{|x| > l} |G_\varepsilon(0,x)| \leq \sum_{|x| > l} p(0,x) + \sum_{n=2}^{\infty} (1+\varepsilon)^{-n} \sum_{|x| > l} p_n(0,x). \tag{B.5}$$

Let $\beta$ be the constant appearing in (2.5). It is easy to check that the first term above can be bounded by $l^{-\beta}$ as soon as $l$ is sufficiently large. Let $P_0$ denote the probability measure for the random walk $\{\xi_0, \xi_1, \xi_2, \ldots\}$ with transition $p$ and starting point $\xi_0 = 0$. For each $n \geq 2$, we have

$$\sum_{|x| > l} p_n(0,x) = P_0\big(|\xi_n| > l\big) \leq n\, P_0\big(|\xi_1| > l/n\big) \leq c_1\, n^{1+\beta}\, l^{-\beta},$$

where $c_1 = c_1(d)$ is a finite constant (independent of $n$ and $l$). Therefore there exist constants $c_2 < \infty$ and $m(\varepsilon) > 0$, such that for any integer $M$,

$$\sum_{n=2}^{\infty} (1+\varepsilon)^{-n} \sum_{|x| > l} p_n(0,x) \leq \sum_{n=2}^{M} \sum_{|x| > l} p_n(0,x) + \sum_{n=M+1}^{\infty} (1+\varepsilon)^{-n} \sum_{|x| > l} p_n(0,x) \leq c_2\, M^{2+\beta}\, l^{-\beta} + \exp(-m(\varepsilon) M). \tag{B.6}$$

We choose now $M = M(l) = m(\varepsilon)^{-1} \beta \log l$ in (B.6). From (B.5) we see that this implies (B.4) for any $0 < \alpha < \beta$ and $l$ sufficiently large. □

Let $f_i : \Omega \to \mathbb{R}$, $i = 1, \ldots, n$, be a finite collection of local functions. Given a number $l > 0$, we say that $f_1, f_2, \ldots, f_n$ are $l$-separated if their supports $S_1, S_2, \ldots, S_n$ are such that $d(S_i, S_j) > l$, for all $i \neq j$.

Lemma B.2. There exists a constant $l_0 > 0$ and a non-increasing function $\rho : [0,\infty) \to [1,\infty)$, with $\rho(l) \to 1$, as $l \to \infty$, such that the following holds for arbitrary integers $n \in \mathbb{Z}_+$, $l > l_0$ and $L > l$. Suppose $f_1, f_2, \ldots, f_n$ are $l$-separated local functions in $L^1(P^\varepsilon)$. Suppose there exist sites $x_i \in (L+l)\mathbb{Z}^d$, $i = 1, \ldots, n$, with $S_i \subset B_i \equiv x_i + [0, L+l]^d \cap \mathbb{Z}^d$, $i = 1, \ldots, n$. Then

$$\Big\| \prod_{i=1}^{n} f_i \Big\|_1 \leq \prod_{i=1}^{n} \|f_i\|_{\rho(l)}. \tag{B.7}$$

Proof. Let us consider the case $d = 2$. The general case requires an obvious generalization of the argument. Let $l > 0$, $L > l$ and let $f_1, f_2, \ldots, f_n$ and $B_1, \ldots, B_n$ be as above. We group the cubes $B_i$, $i = 1, \ldots, n$, into four families $A$, $B$, $C$ and $D$, depending on whether the coordinates of the corresponding $x_i$ are (even, even), (odd, even), (odd, odd) or (even, odd) respectively. Define

$$f_A \equiv \prod_{i :\, S_i \in A} f_i.$$

We define $f_B$, $f_C$ and $f_D$ in the same way. In this way $f_A$, $f_B$, $f_C$ and $f_D$ are $l$-separated functions, and Lemma B.1 yields

$$\Big\| \prod_{i=1}^{n} f_i \Big\|_1 \leq \|f_A\|_{1+\tau(l)}\, \|f_B f_C f_D\|_{1+\tau(l)} \leq \|f_A\|_{1+\tau(l)}\, \|f_B\|_{(1+\tau(l))^2}\, \|f_C\|_{(1+\tau(l))^3}\, \|f_D\|_{(1+\tau(l))^3}.$$
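The parity grouping used above can be sketched in a few lines. This is only an illustration under assumptions not fixed by the text: the distance between supports is taken to be the sup-norm distance, each support is taken to fill a square of side $L$ with lower corner $x_i \in (L+l)\mathbb{Z}^2$, and the helper names are hypothetical. With these conventions, two squares in the same family are at least $L + 2l$ apart, which is the improved separation the iteration relies on.

```python
from itertools import product


def sup_distance(corner_a, corner_b, side):
    """Sup-norm gap between the squares corner + [0, side]^2 (assumed metric)."""
    return max(max(abs(a - b) - side, 0) for a, b in zip(corner_a, corner_b))


def parity_families(corners, spacing):
    """Split corners in spacing * Z^2 into the families A, B, C, D by coordinate parity."""
    labels = {(0, 0): "A", (1, 0): "B", (1, 1): "C", (0, 1): "D"}
    families = {"A": [], "B": [], "C": [], "D": []}
    for corner in corners:
        key = tuple((c // spacing) % 2 for c in corner)
        families[labels[key]].append(corner)
    return families


def min_gap_within_families(L, l, blocks_per_side=4):
    """Smallest gap between two distinct squares belonging to the same family."""
    spacing = L + l
    corners = [(i * spacing, j * spacing)
               for i, j in product(range(blocks_per_side), repeat=2)]
    families = parity_families(corners, spacing)
    return min(sup_distance(a, b, L)
               for members in families.values()
               for a in members for b in members if a != b)
```

For instance, with `L = 5` and `l = 2` neighbouring squares are only `2` apart, while `min_gap_within_families(5, 2)` gives the gap inside each family, here `L + 2 * l`.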

Fig. 1. Geometric construction in the proof of Lemma B.2

We can use the fact that $\|f\|_p \leq \|f\|_{p'}$, for $1 < p \leq p'$, to see that, for some constant $\gamma \in (0,\infty)$ (independent of $n$, $L$ and $l$)

$$\Big\| \prod_{i=1}^{n} f_i \Big\|_1 \leq \|f_A\|_{1+\gamma\tau(l)}\, \|f_B\|_{1+\gamma\tau(l)}\, \|f_C\|_{1+\gamma\tau(l)}\, \|f_D\|_{1+\gamma\tau(l)}. \tag{B.8}$$

Since $L > l$, we see that, within each family $A$, $B$, $C$, $D$, the functions are now at least $2l$-separated. Therefore we can repeat the argument by dividing the family $A$ into four subfamilies $AA$, $AB$, $AC$ and $AD$. Within each subfamily the functions are now $4l$-separated. Implementing this idea, and iterating (B.8), we have

$$\Big\| \prod_{i=1}^{n} f_i \Big\|_1 \leq \prod_{i=1}^{n} \|f_i\|_{\rho(l)},$$

where we have defined

$$\rho(l) \equiv \prod_{k=1}^{\infty} \big(1 + \gamma\, \tau(2^k l)\big).$$

Using the bound $\tau(l) \leq l^{-\alpha}$, cf. Lemma B.1, it is straightforward to check that $\rho(l)$ satisfies the desired properties. □

The hypotheses of Lemma B.2 are tailored in such a way that they apply to the following situation (see [11], Lemma 5.4.13, for the one-dimensional case). Suppose a local bounded function $f$ is given, whose support $S_f$ is contained in the cube of side $q \in \mathbb{Z}_+$, $Q = [0, q-1]^d \cap \mathbb{Z}^d$.

We show that the limit

$$\Lambda_Q(f) = \lim_{N\to\infty} \frac{1}{N^d} \log P^\varepsilon\Big( \exp \sum_{x \in \Lambda_N} f \circ \theta_x \Big) \tag{B.9}$$

exists. To this end we may assume $f \geq 0$, otherwise replace $f$ with $f + \|f\|_\infty$. By shift invariance of $P^\varepsilon$, we may replace $\Lambda_N$ in (4.49) by the cube $B_N = [0, N-1]^d \cap \mathbb{Z}^d$. We also let $B_N(x)$ denote the translate $B_N + x$, $x \in \mathbb{Z}^d$. Let $M \in \mathbb{Z}_+$, $M < N$ and divide $B_N$ into translates of the cube $B_M$. If $m = [N/M]$, the integer part, then $B_N = B_{mM} \cup R_{N,M}$, with $|R_{N,M}| \leq c_M N^{d-1}$, for some $M$-dependent constant $c_M$. We now enlarge the cubes $B_M$ contained in $B_{mM}$ by changing their side from $M$ to $M + q + l$, for some $l > 0$. We choose $m^d$ sites $x_i \in (M+q+l)\mathbb{Z}^d$, in such a way that

$$B_{mM} \subset \bigcup_{i=1}^{m^d} B_{M+q+l}(x_i).$$

We may estimate

$$\sum_{x \in B_N} f \circ \theta_x \leq \sum_{i=1}^{m^d} \sum_{x \in B_{M+q+l}(x_i)} f \circ \theta_x + c_M N^{d-1} \|f\|_\infty.$$

Moreover, there is a finite constant $c_{q,l}$, depending on $q$ and $l$, such that for any $i = 1, \ldots, m^d$ and for any $M \in \mathbb{Z}_+$, $|B_{M+q+l}(x_i) \setminus B_M(x_i)| \leq c_{q,l} M^{d-1}$, and therefore

$$\sum_{x \in B_N} f \circ \theta_x \leq \big( c_{q,l}\, m^d M^{d-1} + c_M N^{d-1} \big) \|f\|_\infty + \sum_{i=1}^{m^d} \sum_{x \in B_M(x_i)} f \circ \theta_x. \tag{B.10}$$

By construction, the supports $S_i$ of the functions

$$f_i \equiv \sum_{x \in B_M(x_i)} f \circ \theta_x$$

are contained in $B_{M+q}(x_i)$, so that $d(S_i, S_j) > l$, for all $i \neq j$ and, setting $L = M + q$, we see that Lemma B.2 applies. If $l$ is large enough we can then use the estimate (B.7) to obtain

$$P^\varepsilon\Big( \prod_{i=1}^{m^d} \exp f_i \Big) \leq \prod_{i=1}^{m^d} \Big[ P^\varepsilon\big( \exp \rho(l) f_i \big) \Big]^{1/\rho(l)}, \tag{B.11}$$

where $\rho(l) \to 1$, as $l \to \infty$. Also, notice that for each $i = 1, \ldots, m^d$,

$$\frac{1}{\rho(l)} \log P^\varepsilon\big( \exp \rho(l) f_i \big) \leq \log P^\varepsilon\big( \exp f_i \big) + (\rho(l) - 1)\, M^d\, \|f\|_\infty.$$

Let us define

$$\Gamma_N(f) = \log P^\varepsilon\Big( \exp \sum_{x \in B_N} f \circ \theta_x \Big).$$

Using shift invariance of $P^\varepsilon$, from (B.11) and (B.10) we have

$$\Gamma_N(f) \leq m^d\, \Gamma_M(f) + \Big[ m^d M^d (\rho(l) - 1) + c_{q,l}\, m^d M^{d-1} + c_M N^{d-1} \Big] \|f\|_\infty.$$

For fixed $M$ and $l$, this implies

$$\limsup_{N\to\infty} \frac{\Gamma_N(f)}{N^d} \leq \frac{\Gamma_M(f)}{M^d} + \Big[ (\rho(l) - 1) + \frac{c_{q,l}}{M} \Big] \|f\|_\infty.$$

By letting first $M \to \infty$, and then $l \to \infty$, we have

$$\limsup_{N\to\infty} \frac{\Gamma_N(f)}{N^d} \leq \liminf_{M\to\infty} \frac{\Gamma_M(f)}{M^d}.$$

This proves the existence of the limit in (4.49).

Acknowledgements. The authors thank Yvan Velenik and Aernout van Enter for stimulating conversations.

References

1. Bolthausen, E., Deuschel, J.D.: Critical large deviations for Gaussian fields in the phase transition regime. Ann. Probab. 21, 1876–1920 (1993)
2. Bolthausen, E., Deuschel, J.D., Zeitouni, O.: Entropic repulsion of the lattice free field. Commun. Math. Phys. 170, 417–443 (1995)
3. Bryc, W., Dembo, A.: On large deviations of empirical measures for stationary Gaussian processes. Stochastic Processes Appl. 58, 23–34 (1995)
4. Brydges, D., Fröhlich, J., Spencer, T.: The random walk representation of classical spin systems and correlation inequalities. Commun. Math. Phys. 83, 123–150 (1982)
5. Caputo, P., Deuschel, J.D.: Critical large deviations for massless Gaussian fields with long range interaction. In preparation
6. Chiyonobu, T., Kusuoka, S.: The large deviations principle for hypermixing processes. Probab. Th. Rel. Fields 78, 627–649 (1988)
7. Comets, F.: Grandes deviations pour des champs de Gibbs sur $\mathbb{Z}^d$. C. R. Acad. Sci. Paris I 303, 511–513 (1986)
8. Cox, J.T., Greven, A., Shiga, T.: Finite and infinite systems of interacting diffusions. Probab. Th. Rel. Fields 102, 165–197 (1995)
9. Dembo, A., Zeitouni, O.: Large deviations. Boston: Jones and Bartlett publishers, 1993
10. Deuschel, J.D., Giacomin, G., Ioffe, D.: Large deviations and concentration properties for ∇ϕ interfaces. Preprint 1998
11. Deuschel, J.D., Stroock, D.W.: Large deviations. Pure and Applied Math. 137, New York: Academic Press 1989
12. Dobrushin, R.L.: Gaussian random fields: Gibbsian point of view. In: Multicomponent random systems, R.L. Dobrushin and Ya.G. Sinai eds., New York: Marcel Dekker, 1980
13. Donsker, M.D., Varadhan, S.R.S.: Asymptotic evaluation of certain Markov process expectations for large time IV. Commun. Pure Appl. Math. 28, 1–47 (1975)
14. Donsker, M.D., Varadhan, S.R.S.: Large deviations for stationary Gaussian processes. Commun. Math. Phys. 97, 187–210 (1985)
15. Dudley, R.M.: Real analysis and probability. Monterey: Wadsworth and Brooks/Cole, 1989
16. van Enter, A.C.D., Fernandez, R., Sokal, A.D.: Regularity properties and pathologies of position–space renormalization–group transformations: scope and limitations of Gibbsian theory. J. Stat. Phys. 72, 879–1167 (1993)
17. Föllmer, H.: On entropy and information gain in random fields. Z. Wahrsch. verw. Gebiete 26, 207–217 (1973)
18. Föllmer, H., Orey, S.: Large deviations for the empirical field of a Gibbs measure. Ann. Probab. 16, 961–977 (1988)
19. Fröhlich, J., Pfister, C.: On the absence of spontaneous symmetry breaking and of crystalline ordering in two–dimensional systems. Commun. Math. Phys. 81, 277–298 (1981)
20. Georgii, H.O.: Gibbs measures and phase transitions. Studies in Math. 9, Amsterdam: de Gruyter, 1988
21. Georgii, H.O.: Large deviations and maximum entropy principle for interacting random fields on $\mathbb{Z}^d$. Ann. Probab. 21, 1845–1875 (1993)
22. Künsch, H.: Thermodynamics and statistical analysis of Gaussian random fields. Z. Wahrsch. verw. Gebiete 58, 407–421 (1981)
23. Künsch, H.: Almost sure entropy and variational principle for random fields with unbounded state space. Z. Wahrsch. verw. Gebiete 58, 69–85 (1981)
24. Lanford III, O.E.: Entropy and equilibrium states in classical statistical mechanics. In: Statistical mechanics and mathematical problems, Lecture Notes in Physics 20, Berlin: Springer 1973, pp. 1–113
25. Lawler, G.F.: Intersections of random walks. Boston: Birkhäuser, 1991
26. Lebowitz, J.L., Presutti, E.: Statistical mechanics of unbounded spins systems. Commun. Math. Phys. 50, 195–218 (1976)
27. Lefevere, R.: Variational Principle for some renormalized measures. J. Stat. Phys. 96, 109–134 (1999)
28. Lörinczi, J., Maes, C.: Weakly Gibbsian measures for lattice spin systems. J. Stat. Phys. 89, 561–579 (1997)
29. Maes, C., Redig, F., Van Moffaert, A.: Almost Gibbsian versus weakly Gibbsian measures. Stoch. Proc. Appl. 79, 1–15 (1999)
30. Montroll, E.W., West, B.J.: On an enriched collection of stochastic processes. In: Fluctuation phenomena, Studies in statistical mechanics vol. VII, eds. E.W. Montroll and J.L. Lebowitz, Amsterdam: North-Holland, 1979
31. Nelson, E.: The free Markov field. J. Funct. Anal. 12, 211–227 (1973)
32. Olla, S.: Large deviations for Gibbs random fields. Probab. Th. Rel. Fields 77, 343–357 (1988)
33. Preston, C.: Random fields. Lecture Notes in Mathematics 534, Berlin–Heidelberg–New York: Springer, 1976
34. Rozanov, Y.A.: On Gaussian fields with given conditional distributions. Th. Prob. Appl. 12, 381–391 (1967)
35. Spitzer, F.: Principles of random walks, 2nd ed. Berlin–Heidelberg–New York: Springer, 1976

Communicated by J. L. Lebowitz

Commun. Math. Phys. 209, 633 – 670 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Albanese Maps and Off Diagonal Long Time Asymptotics for the Heat Kernel

Motoko Kotani¹,⋆, Toshikazu Sunada²

¹ Department of Mathematics, Faculty of Science, Toho University, Miyama 2-2-1, Funabashi, Chiba 274, Japan. E-mail: [email protected]
² Mathematical Institute, Graduate School of Sciences, Tohoku University, Aoba, Sendai 980-77, Japan. E-mail: [email protected]

Received: 20 September 1998 / Accepted: 19 August 1999

Abstract: We discuss long time asymptotic behaviors of the heat kernel on a noncompact Riemannian manifold which admits a discontinuous free action of an abelian isometry group with a compact quotient. A local central limit theorem and the asymptotic power series expansion for the heat kernel as the time parameter goes to infinity are established by employing perturbation arguments on eigenvalues and eigenfunctions of twisted Laplacians. Our ideas and techniques are motivated partly by analogy with Floquet–Bloch theory on periodic Schrödinger operators. For the asymptotic expansion, we make careful use of the classical Laplace method. In the course of the discussion, we observe that the notion of Albanese maps associated with the abelian group action is closely related to the asymptotics. A similar idea is available for asymptotics of the transition probability of a random walk on a lattice graph. The results obtained in the present paper refine our previous ones [4]. In the asymptotics, the Euclidean distance associated with the standard realization of the lattice graph, which we call the Albanese distance, plays a crucial role.

1. Introduction

We first recall that the heat kernel on the $m$-dimensional Euclidean space is exactly given by

$$k(t,x,y) = (4\pi t)^{-m/2} \exp\left( -\frac{\|x-y\|^2}{4t} \right).$$
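As a quick numerical illustration of this formula, the sketch below evaluates the Euclidean heat kernel and checks, in dimension one, that it integrates to 1 in the second variable. The function names, the integration window and the step size are choices made here for illustration only.

```python
import math


def euclidean_heat_kernel(t, x, y):
    """k(t, x, y) = (4 pi t)^(-m/2) exp(-|x - y|^2 / (4 t)) for points x, y in R^m."""
    m = len(x)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return (4.0 * math.pi * t) ** (-m / 2.0) * math.exp(-squared_distance / (4.0 * t))


def total_mass_1d(t, half_width=30.0, step=0.01):
    """Riemann sum of y -> k(t, 0, y) over [-half_width, half_width] in dimension one."""
    count = int(round(2 * half_width / step))
    return step * sum(
        euclidean_heat_kernel(t, (0.0,), (-half_width + i * step,))
        for i in range(count + 1)
    )
```

For moderate values of `t` the sum returned by `total_mass_1d(t)` should be very close to 1, reflecting conservation of heat on Euclidean space.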

⋆ Present address: Mathematical Institute, Graduate School of Sciences, Tohoku University, Aoba, Sendai 980-77, Japan. E-mail: [email protected]

This shape is almost preserved for the heat kernel on a more general Riemannian manifold $X$ as far as the near-diagonal short time asymptotics are concerned; say

$$k(t,x,y) \sim (4\pi t)^{-m/2} \exp\left( -\frac{d(x,y)^2}{4t} \right) \big( a_0(x,y) + a_1(x,y)\,t + a_2(x,y)\,t^2 + \cdots \big) \qquad (t \downarrow 0),$$

where $m = \dim X$ and $d(x,y)$ denotes the Riemannian distance between $x$ and $y$ (see S. A. Molchanov [6] and Y. Kannai [2] for instance). It should be noted that the nature of coefficients $a_i(x,y)$ is “local” in the sense that they are described by quantities defined only on a neighborhood of the shortest geodesic joining $x$ and $y$. This is roughly explained by the intuitive observation that the short time behavior of the heat diffusion on $X$ should be similar to the one on the Euclidean space.

On the other hand, the asymptotics of $k(t,x,y)$ as $t$ goes to infinity have to be controlled by the global properties of the manifold, and hence might become much more complicated than short time asymptotics. Actually, we have very few examples of non-compact manifolds for which the long time asymptotics of the heat kernel are established.

In the present paper, we shall confine ourselves to a class of “periodic” manifolds. Namely we consider a non-compact Riemannian manifold $X$ which admits a discontinuous free action of an abelian isometry group $\Gamma$ with a compact manifold $M = \Gamma \backslash X$ as its quotient. We denote by $\pi : X \to M$ the canonical covering map, associated with which we have a surjective homomorphism of $H_1(M,\mathbb{Z})$ onto $\Gamma$ and its extension to a surjective linear map of $H_1(M,\mathbb{R}) = H_1(M,\mathbb{Z}) \otimes \mathbb{R}$ onto $\Gamma \otimes \mathbb{R}$. We introduce an inner product on $H^1(M,\mathbb{R})$ by identifying it with the space of harmonic 1-forms on $M$. The dual inner product is equipped on $H_1(M,\mathbb{R}) = \mathrm{Hom}(H^1(M,\mathbb{R}),\mathbb{R})$. As the inner product on $\Gamma \otimes \mathbb{R}$, we take up the quotient inner product derived from the inner product on $H_1(M,\mathbb{R})$.

To describe long time asymptotics of the heat kernel on $X$, we need a distance function different from the Riemannian one. For this sake, define the map $\tilde{\Phi}_\Gamma : X \to \Gamma \otimes \mathbb{R}$ by using the pairing of $\Gamma \otimes \mathbb{R}$ and $\mathrm{Hom}(\Gamma,\mathbb{R})$ as

$$\langle \tilde{\Phi}_\Gamma(x), \omega \rangle = \int_{x_0}^{x} \tilde{\omega},$$

where $\omega \in \mathrm{Hom}(\Gamma,\mathbb{R}) \subset H^1(M,\mathbb{R})$, and $\tilde{\omega}$ denotes its lift to $X$. It should be pointed out that the integral in the right-hand side does not depend on the choice of a path joining $x_0$ and $x$. We then put $d_\Gamma(x,y) = \|\tilde{\Phi}_\Gamma(y) - \tilde{\Phi}_\Gamma(x)\|$, where $\|\cdot\|$ denotes the Euclidean norm on $\Gamma \otimes \mathbb{R}$ associated with the inner product defined above. We call $d_\Gamma$ the $\Gamma$-Albanese pseudo-distance. We shall see in Sect. 2 that the pseudo-distance $d_\Gamma$ is roughly equivalent to the Riemannian distance $d$, and hence $d(x,y) \uparrow \infty$ if and only if $d_\Gamma(x,y) \uparrow \infty$.

Consider the flat torus $\Gamma \otimes \mathbb{R} / \Gamma \otimes \mathbb{Z}$ with the flat metric induced from the inner product defined as above. Since $\tilde{\Phi}_\Gamma(\sigma x) = \tilde{\Phi}_\Gamma(x) + \sigma \otimes 1$ for $\sigma \in \Gamma$, we obtain a map $\Phi_\Gamma : M \to \Gamma \otimes \mathbb{R} / \Gamma \otimes \mathbb{Z}$ whose lift to $X$ is $\tilde{\Phi}_\Gamma$. From now on, we call the flat torus $\Gamma \otimes \mathbb{R} / \Gamma \otimes \mathbb{Z}$ the $\Gamma$-Albanese torus and denote it by $\mathrm{Alb}_\Gamma$. We also call $\Phi_\Gamma$ the $\Gamma$-Albanese map. The map $\Phi_\Gamma$ is harmonic. When $X$ is the maximal abelian covering manifold of $M$, that is, $\rho$ is an isomorphism, the flat torus $\mathrm{Alb}_\Gamma$ and the map $\Phi_\Gamma$ coincide with what T. Nagano and B. Smyth [7] call the Albanese torus and the Albanese map respectively.

Let $r = \mathrm{rank}(\Gamma)$ $(> 0)$, and $\mathrm{Tor}(\Gamma)$ be the torsion part of $\Gamma$, which coincides with the kernel of the canonical homomorphism $(\sigma \mapsto \sigma \otimes 1)$ of $\Gamma$ onto the free abelian group $\Gamma \otimes \mathbb{Z}$. In our discussion, the following constant shows up frequently:

$$C(X) = \frac{\mathrm{vol}(M)^{r/2-1}\, \mathrm{vol}(\mathrm{Alb}_\Gamma)}{\#\,\mathrm{Tor}(\Gamma)}.$$

The first main result is a Local Central Limit Theorem which is closely related to the long time asymptotic behavior of the Brownian motion on $X$.

Theorem 1 (Local Central Limit Theorem).

$$\lim_{t \uparrow \infty} \left[ (4\pi t)^{r/2}\, k(t,x,y) - C(X) \exp\left( -\frac{\mathrm{vol}(M)}{4t}\, d_\Gamma(x,y)^2 \right) \right] = 0,$$

uniformly for all $x, y \in X$. In particular, for each fixed $x$ and $y$, we have

$$\lim_{t \uparrow \infty} (4\pi t)^{r/2}\, k(t,x,y) = C(X).$$
The above theorem leads to the following

Theorem 2. Let $\{x_\delta\}_{\delta > 0}$ be a sequence in $X$ with $\lim_{\delta \to 0} \delta\, \tilde{\Phi}_\Gamma(x_\delta) = \mathbf{x}$. For every continuous function $f$ on the Euclidean space $\Gamma \otimes \mathbb{R}$ with compact support, we have

$$\lim_{\delta \downarrow 0} \int_X k\big(\delta^{-2}\, \mathrm{vol}(M)\, t,\ x_\delta,\ y\big)\, f\big(\delta\, \tilde{\Phi}_\Gamma(y)\big)\, dy = (4\pi t)^{-r/2} \int_{\Gamma \otimes \mathbb{R}} \exp\left( -\frac{1}{4t} \|\mathbf{x} - \mathbf{y}\|^2 \right) f(\mathbf{y})\, d\mathbf{y},$$

where $dy$ denotes the Riemannian density on $X$ and $d\mathbf{y}$ denotes the Lebesgue density on $\Gamma \otimes \mathbb{R}$.

In terms of Brownian motions, the above theorem tells us the following. If $B(t)$ is the Brownian motion on $X$, then the process $\delta\, \tilde{\Phi}_\Gamma(B(\delta^{-2}\, \mathrm{vol}(M)\, t))$ converges in distribution to the Brownian motion on the Euclidean space $\Gamma \otimes \mathbb{R}$ as $\delta$ goes to zero. It is worthwhile to mention that the stochastic process $\tilde{\Phi}_\Gamma(B(t))$ on $\Gamma \otimes \mathbb{R}$ is a martingale since $\tilde{\Phi}_\Gamma$ is a harmonic map.

We then proceed with the asymptotic expansion of $k(t,x,y)$ as $t \uparrow \infty$. We may expect, in view of the limit formula in Theorem 1, that the asymptotic expansion should involve the term

$$(4\pi t)^{-r/2} \exp\left( -\frac{\mathrm{vol}(M)}{4t}\, d_\Gamma(x,y)^2 \right) = (4\pi t)^{-r/2} \sum_{i=0}^{\infty} \frac{1}{i!} \left( -\frac{\mathrm{vol}(M)}{4} \right)^{i} d_\Gamma(x,y)^{2i}\, t^{-i}. \tag{1}$$

As a matter of fact, we have extra terms in general and find out that the coefficients in the right-hand side of (1) dominate each coefficient in the asymptotic expansion with respect to $t^{-1}$ as $d(x,y) \uparrow \infty$ as is seen in the following and Theorem 5 below.

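Theorem 3. There are functions $\beta_i(p,q;\xi)$ on $M \times M \times (\Gamma \otimes \mathbb{R})$ which are polynomials in $\xi \in \Gamma \otimes \mathbb{R}$ with degree less than $2i$ such that

$$k(t,x,y) \sim (4\pi t)^{-r/2}\, C(X) \exp\left( -\frac{\mathrm{vol}(M)}{4t}\, d_\Gamma(x,y)^2 \right) \times \big( 1 + b_1(x,y)\, t^{-1} + b_2(x,y)\, t^{-2} + \cdots \big),$$

where $b_i(x,y) = \beta_i\big(\pi(x), \pi(y); \tilde{\Phi}_\Gamma(y) - \tilde{\Phi}_\Gamma(x)\big)$, and the term $\exp\left( -\frac{\mathrm{vol}(M)}{4t}\, d_\Gamma(x,y)^2 \right)$ should be considered as an asymptotic power series in $t^{-1}$ as in (1).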

It should be noted that, if we do not impose the condition of the degree on the polynomials $\beta_i(p,q;\xi)$ in $\xi$, then the expression of the asymptotic formula above is meaningless since every asymptotic power series in $t^{-1}$ is divisible by any asymptotic power series with a non-zero constant term. By taking the product of two asymptotic series in Theorem 3, we have the following

Theorem 4.

$$k(t,x,y) \sim (4\pi t)^{-r/2}\, C(X) \big( 1 + c_1(x,y)\, t^{-1} + c_2(x,y)\, t^{-2} + \cdots \big) \qquad (t \uparrow \infty),$$

$$c_l(x,y) \sim \frac{1}{l!} \left( -\frac{\mathrm{vol}(M)}{4} \right)^{l} d_\Gamma(x,y)^{2l} \qquad (d(x,y) \uparrow \infty).$$

The explicit forms of the coefficients $c_i(x,y)$ in Theorem 4 are complicated in general. Actually, in order to calculate $c_i(x,y)$, we have to know detailed information on the first eigenvalues of and eigenfunctions of the twisted Laplacians. To give the exact shape of $c_1(x,y)$, let $\omega_1, \ldots, \omega_r$ be an orthonormal basis of the space $\mathrm{Hom}(\Gamma,\mathbb{R})$ $(\subset H^1(M,\mathbb{R}))$ and let $G : C^\infty(M) \to C^\infty(M)$ be the Green operator.

Theorem 5.

$$c_1(x,y) = -\frac{\mathrm{vol}(M)}{4}\, d_\Gamma(x,y)^2 - \frac{\mathrm{vol}(M)}{2} \left( G\Big( \sum_{i=1}^{r} |\omega_i|^2 \Big)(\pi(x)) + G\Big( \sum_{i=1}^{r} |\omega_i|^2 \Big)(\pi(y)) \right) + \frac{\mathrm{vol}(M)}{4} \left( \int_M G\Big( \sum_{i=1}^{r} |\omega_i|^2 \Big) \sum_{i=1}^{r} |\omega_i|^2 + 2 \sum_{i,j=1}^{r} \int_M G\big( \langle \omega_i, \omega_j \rangle \big) \langle \omega_i, \omega_j \rangle \right).$$

Remark. The functions $\sum_{i=1}^{r} |\omega_i|^2$, $\sum_{i,j=1}^{r} G(\langle \omega_i, \omega_j \rangle) \langle \omega_i, \omega_j \rangle$ do not depend on the choice of an orthonormal basis $\{\omega_i\}_{i=1}^{r}$.

We conclude, from Theorem 1 and Theorem 5, that the constant $C(X)$ and the pseudo-distance $d_*(x,y) = \mathrm{vol}(\Gamma \backslash X)^{1/2}\, d_\Gamma(x,y)$ do not depend on the choice of $\Gamma$ as far as it satisfies our condition. Although it is rather straightforward to check that two commensurable $\Gamma$’s have the same constant $C(X)$ and the same pseudo-distance $d_*(x,y)$, this fact is, by no means, obvious since there exist two groups $\Gamma_1$ and $\Gamma_2$ which satisfy our condition and are not commensurable. The case where no extra terms in the asymptotic expansion appear is characterized as follows.

Theorem 6. The following conditions are equivalent:

1. $k(t,x,y) \sim (4\pi t)^{-r/2}\, C(X) \exp\left( -\frac{\mathrm{vol}(M)}{4t}\, d_\Gamma(x,y)^2 \right)$ $(t \uparrow \infty)$, where the right-hand side should be considered as an asymptotic power series in $t^{-1}$.
2. $c_1(x,y) = -\frac{\mathrm{vol}(M)}{4}\, d_\Gamma(x,y)^2$.
3. Every harmonic 1-form $\omega \in \mathrm{Hom}(\Gamma,\mathbb{R})$ has constant length, i.e. $|\omega|(p)$ is a constant function on $M$.
4. The map $\Phi_\Gamma : M \to \mathrm{Alb}_\Gamma$ is a homothetic Riemannian submersion; namely by a homothetic change of the metric on $M$, $\Phi_\Gamma$ becomes a Riemannian submersion.

We finally consider the case of random walks (cf. [4]), which is regarded as a discrete model of Brownian motions. A reversible random walk on a locally finite graph $X = (V,E)$ is given by a positive valued function $p$ on the set of all oriented edges $E$ and a positive valued function $m$ on the set $V$ of vertices satisfying:

• $\sum_{e \in E_x} p(e) = 1$, where $E_x = \{ e \in E : o(e) = x \}$, and $o(e)$ is the origin of the edge $e$,
• $p(e)\, m(o(e)) = p(\bar{e})\, m(t(e))$ for $e \in E$, where $\bar{e}$ denotes the inverse edge of $e$.

We consider $p(e)$ the probability that a particle placed at $o(e)$ moves to the terminus $t(e)$ along the edge $e$ in one unit time. The probability that a particle starting at $x$ reaches $y$ at time $n$ is given by

$$p(n,x,y) = \sum_{c = (e_1, e_2, \ldots, e_n)} p(e_1)\, p(e_2) \cdots p(e_n),$$

where the sum is taken over all paths $c = (e_1, \ldots, e_n)$ of length $n$ whose origin $o(c) = x$ and terminus $t(c) = y$. It is easy to check that $m(x)\, p(n,x,y) = m(y)\, p(n,y,x)$. We may regard $p(n,x,y)\, m(y)^{-1}$ as a discrete analogue of the heat kernel $k(t,x,y)$. Our concern is the asymptotic of $p(n,x,y)$ as $n$ goes to infinity.

We suppose that $X$ is a lattice graph, namely it admits a free action of an abelian automorphism group $\Gamma$ with a finite graph $X_0 = \Gamma \backslash X$ as its quotient. Furthermore we assume that the functions $p$ and $m$ are invariant under the $\Gamma$-action so that we have a reversible random walk on $X_0 = (V_0, E_0)$. Typical examples of lattice graphs are the $\mathbb{Z}^k$-lattice, the triangular lattice and the hexagonal lattice, which appear in many contexts of mathematical sciences.

Since a lattice graph with an invariant reversible random walk is regarded as a discrete analogue of a “periodic manifold”, we may introduce similar notions and terminologies for the lattice graph $X$ as ones in the case of manifolds. We identify $H^1(X_0,\mathbb{R})$ with the space of harmonic 1-forms

$$\Big\{ \omega : E_0 \to \mathbb{R} \ \Big|\ \omega(\bar{e}) = -\omega(e),\ \ \delta\omega(x) = \sum_{e \in E_x} p(e)\, \omega(e) = 0 \ \ (x \in V_0) \Big\},$$

which is endowed with the inner product defined by

$$\langle\!\langle \omega_1, \omega_2 \rangle\!\rangle = \frac{1}{2} \sum_{e \in E} \omega_1(e)\, \omega_2(e)\, p(e)\, m(o(e)).$$
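The path-sum definition of $p(n,x,y)$ given above can be made concrete for the simplest lattice graph. The sketch below is an illustration only: it assumes the graph $\mathbb{Z}$ with $p(e) = 1/2$ on every oriented edge and $m \equiv 1$, computes $p(n,0,y)$ by repeatedly applying the one-step transition, and compares it with the classical Gaussian local limit value $2(2\pi n)^{-1/2} \exp(-y^2/(2n))$ for this walk (valid when $n + y$ is even). The function names are hypothetical.

```python
import math


def transition_probabilities(n):
    """Return {y: p(n, 0, y)} for the walk on Z with p(e) = 1/2 on every oriented edge."""
    current = {0: 1.0}
    for _ in range(n):
        following = {}
        for x, prob in current.items():
            for step in (-1, 1):
                following[x + step] = following.get(x + step, 0.0) + 0.5 * prob
        current = following
    return current


def gaussian_approximation(n, y):
    """Classical local limit value for the simple walk on Z when n + y is even."""
    return 2.0 / math.sqrt(2.0 * math.pi * n) * math.exp(-y * y / (2.0 * n))
```

After an even number of steps only even sites carry mass, which reflects the bipartite structure of $\mathbb{Z}$, and for large `n` the entries of `transition_probabilities(n)` are close to `gaussian_approximation(n, y)`.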

Imitating the procedure in the continuous case, we equip an inner product on $\Gamma \otimes \mathbb{R}$, and define the map $\tilde{\Phi}_\Gamma : V_0 \to \Gamma \otimes \mathbb{R}$ and the pseudo-distance $d_\Gamma$. The interpolation map for $\tilde{\Phi}_\Gamma$ by line segments is said to be the standard realization of $X$ in $\Gamma \otimes \mathbb{R}$. We should remark that the standard realization gives a “standard” way to realize an abstract graph in the Euclidean space as the $\mathbb{Z}^2$-lattice is realized as the square lattice in $\mathbb{R}^2$. More discussion about standard realizations is found in [5]. We also introduce the $\Gamma$-Albanese torus $\mathrm{Alb}_\Gamma$ and the constant

$$C(X) = \frac{m(V_0)^{r/2-1}\, \mathrm{vol}(\mathrm{Alb}_\Gamma)}{\#\,\mathrm{Tor}(\Gamma)},$$

where $m(V_0) = \sum_{x \in V_0} m(x)$ and $r = \mathrm{rank}(\Gamma)$.

The following is a discrete version of Theorem 1 which gives a refinement of our previous result [4] and a generalization of a local central limit theorem for the isotropic random walk on the square lattice in the Euclidean space (see F. Spitzer [8]). Actually we were inspired by this theorem, where, instead of the graph-theoretical distance, the Euclidean distance shows up, and wondered what distance would appear in the asymptotics for a more general lattice graph. The Albanese distance is found out to be a nice distance to describe the asymptotics as is seen in the following

Theorem 7. Let $X$, $\Gamma$ be as above.

1. If $X$ is a non-bipartite graph, then
$$\lim_{n \uparrow \infty} \left[ (4\pi n)^{r/2}\, p(n,x,y)\, m(y)^{-1} - C(X) \exp\left( -\frac{m(V_0)}{4n}\, d_\Gamma(x,y)^2 \right) \right] = 0$$
uniformly for all $x, y \in V$.
2. If $X$ is a bipartite graph with a bipartition $V = A \sqcup B$, and
(a) if $x, y \in A$ or $x, y \in B$, then $p(n,x,y) = 0$ for odd $n$ and
$$\lim_{n \uparrow \infty} \left[ (4\pi n)^{r/2}\, p(n,x,y)\, m(y)^{-1} - 2\,C(X) \exp\left( -\frac{m(V_0)}{4n}\, d_\Gamma(x,y)^2 \right) \right] = 0$$
uniformly for $x, y \in A$ or for $x, y \in B$, where $n$ runs over even numbers,
(b) if $x \in A$, $y \in B$ or $x \in B$, $y \in A$, then $p(n,x,y) = 0$ for even $n$ and
$$\lim_{n \uparrow \infty} \left[ (4\pi n)^{r/2}\, p(n,x,y)\, m(y)^{-1} - 2\,C(X) \exp\left( -\frac{m(V_0)}{4n}\, d_\Gamma(x,y)^2 \right) \right] = 0$$
uniformly for $x \in A$, $y \in B$ or for $x \in B$, $y \in A$, where $n$ runs over odd numbers.

This theorem implies roughly that, when the mesh of the lattice graph $X$ realized in the Euclidean space $\Gamma \otimes \mathbb{R}$ becomes finer, the random walk on $X$ approaches the Brownian motion in $\Gamma \otimes \mathbb{R}$ with a suitable time scale change. As for the asymptotic expansion of $p(n,x,y)$ in $n^{-1}$ as $n$ goes to infinity, we have similar results as Theorem 4 and Theorem 5. A minor change is required only for the coefficient $c_1(x,y)$.

The organization of this paper is as follows. We begin by introducing the concepts of $\Gamma$-Albanese maps and $\Gamma$-Albanese distances (Sect. 2). We give a characterization of the $\Gamma$-Albanese maps among harmonic maps into flat tori, and establish rough equivalence between the $\Gamma$-Albanese distance and the Riemannian distance. Employing ideas in the Floquet–Bloch theory, in Sect. 3, we reduce our problem to the study of twisted Laplacians acting on sections of the flat line bundles, which allows us to use geometry of the compact manifold downstairs. By choosing a special section of the line bundle, we analyze eigenvalues and eigenfunctions of the twisted Laplacians $\Delta_\chi$. They are also effectively used for an integral expression of the heat kernel over the group $\hat{\Gamma}$ of unitary characters $\chi$ of $\Gamma$. In this stage, the $\Gamma$-Albanese map shows up explicitly in the heat kernel. We then observe that the first eigenvalues of $\Delta_\chi$ around the trivial character dominate the integral when $t$ goes to infinity. This observation leads us to the local central limit theorem in Sect. 5 and to the asymptotic expansion in Sects. 6–7 by making careful use of the Laplace method. In the final section, we study the asymptotics of the transition probability of a random walk on a lattice graph.

2. Albanese Distance

Let $X$ be a non-compact abelian covering manifold of a compact connected Riemannian manifold $M$. We denote by $\Gamma$ its covering transformation group, by $\pi : X \to M$ the covering map, and by $\rho : H_1(M,\mathbb{Z}) \to \Gamma$ the surjective homomorphism associated with the covering map $\pi$. Extend $\rho$ to the surjective linear map $\rho_{\mathbb{R}} : H_1(M,\mathbb{R}) = H_1(M,\mathbb{Z}) \otimes \mathbb{R} \to \Gamma \otimes \mathbb{R}$. Consider also the transpose ${}^t\rho : \mathrm{Hom}(\Gamma,\mathbb{Z}) \to \mathrm{Hom}(H_1(M,\mathbb{Z}),\mathbb{Z}) = H^1(M,\mathbb{Z})$ and its extension ${}^t\rho_{\mathbb{R}} : \mathrm{Hom}(\Gamma,\mathbb{R}) \to H^1(M,\mathbb{R})$, which is an injective linear map. From now on, we identify $\mathrm{Hom}(\Gamma,\mathbb{R})$ with the subspace $\mathrm{Image}({}^t\rho_{\mathbb{R}})$ in $H^1(M,\mathbb{R})$, and $\Gamma \otimes \mathbb{R}$ with the quotient linear space of $H_1(M,\mathbb{R})$. In view of the Kodaira–Hodge Theorem, we identify $H^1(M,\mathbb{R})$ with the space of harmonic 1-forms and $H^1(M,\mathbb{Z})$ with

$$\Big\{ \omega \in H^1(M,\mathbb{R}) \ \Big|\ \int_c \omega \in \mathbb{Z} \text{ for all closed curves } c \Big\}.$$

It is straightforward to check that $\omega \in H^1(M,\mathbb{R})$ is in $\mathrm{Hom}(\Gamma,\mathbb{R})$ if and only if $\int_c \omega = 0$ for every closed curve $c$ in $M$ with $\rho([c]) = 0$, where $[c]$ denotes the homology class of $c$. We introduce the inner product on $H^1(M,\mathbb{R})$ defined by

$$\langle\!\langle \omega_1, \omega_2 \rangle\!\rangle = \int_M \langle \omega_1, \omega_2 \rangle.$$

We omit the Riemannian density in the integral here and from now on. The dual inner product is endowed on $H_1(M,\mathbb{R}) = \mathrm{Hom}(H^1(M,\mathbb{R}),\mathbb{R})$. By using these inner products, we equip the inner products on the subspace $\mathrm{Hom}(\Gamma,\mathbb{R})$ of $H^1(M,\mathbb{R})$ and the quotient space $\Gamma \otimes \mathbb{R}$ of $H_1(M,\mathbb{R})$ in a natural manner, which, as is easily checked, are dual to each other as inner product spaces.

The identity component $\hat{\Gamma}_1$ of the unitary character group $\hat{\Gamma}$ of $\Gamma$ is identified with the torus $\mathrm{Hom}(\Gamma,\mathbb{R}) / \mathrm{Hom}(\Gamma,\mathbb{Z})$. Actually for $\omega \in \mathrm{Hom}(\Gamma,\mathbb{R})$ $(\subset H^1(M,\mathbb{R}))$, define $\chi_\omega \in \hat{\Gamma}$ as

$$\chi_\omega(\sigma) = \exp\big( 2\pi \sqrt{-1}\, \langle \sigma \otimes 1, \omega \rangle \big) \qquad (\sigma \in \Gamma),$$

where $\langle\,,\rangle$ denotes the pairing map on $\Gamma \otimes \mathbb{R}$ and $\mathrm{Hom}(\Gamma,\mathbb{R})$. If $c_\sigma$ is a closed curve such that $\rho([c_\sigma]) = \sigma$, then

$$\langle \sigma \otimes 1, \omega \rangle = \int_{c_\sigma} \omega.$$

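It is easy to see that $\omega \mapsto \chi_\omega$ gives an isomorphism from $\mathrm{Hom}(\Gamma,\mathbb{R}) / \mathrm{Hom}(\Gamma,\mathbb{Z})$ to $\hat{\Gamma}_1$. We call the flat tori $\Gamma \otimes \mathbb{R} / \Gamma \otimes \mathbb{Z}$ and $\mathrm{Hom}(\Gamma,\mathbb{R}) / \mathrm{Hom}(\Gamma,\mathbb{Z})$ with the flat metrics induced from the inner products defined above the $\Gamma$-Albanese torus and $\Gamma$-Jacobian torus and denote them by $\mathrm{Alb}_\Gamma$ and $\mathrm{Jac}_\Gamma$, respectively. Since the lattice group $\Gamma \otimes \mathbb{Z}$ in $\Gamma \otimes \mathbb{R}$ and the lattice group $\mathrm{Hom}(\Gamma,\mathbb{Z})$ in $\mathrm{Hom}(\Gamma,\mathbb{R})$ are dual to each other, we observe that $\mathrm{Alb}_\Gamma$ is the dual flat torus of $\mathrm{Jac}_\Gamma$, and hence $\mathrm{vol}(\mathrm{Alb}_\Gamma) = \mathrm{vol}(\mathrm{Jac}_\Gamma)^{-1}$.

Following Nagano–Smyth [7], we define the map $\tilde{\Phi}_\Gamma : X \to \Gamma \otimes \mathbb{R}$ by

$$\langle \tilde{\Phi}_\Gamma(x), \omega \rangle = \int_{x_0}^{x} \tilde{\omega} \qquad (\omega \in \mathrm{Hom}(\Gamma,\mathbb{R})).$$

Here $x_0$ is a fixed base point in $X$ and $\tilde{\omega}$ stands for the lift of $\omega$ to $X$. Note that the integral above does not depend on the choice of a curve from $x_0$ to $x$. Indeed, for any closed curve $c$ in $X$,

$$\int_c \tilde{\omega} = \int_{\pi c} \omega = 0,$$

since $\rho([\pi c]) = 0$ in $\Gamma$. This implies that $\tilde{\omega} = df$ for some $f \in C^\infty(X)$ and

$$\langle \tilde{\Phi}_\Gamma(x), \omega \rangle = f(x) - f(x_0).$$

Since $\Delta f = \delta \tilde{\omega} = 0$, we observe that $\tilde{\Phi}_\Gamma$ is a vector-valued harmonic function. For $\sigma \in \Gamma$, we find

$$\langle \tilde{\Phi}_\Gamma(\sigma x), \omega \rangle = \int_{x_0}^{\sigma x} \tilde{\omega} = \int_{x_0}^{x} \tilde{\omega} + \int_{x}^{\sigma x} \tilde{\omega} = \langle \tilde{\Phi}_\Gamma(x), \omega \rangle + \int_{c_\sigma} \omega = \langle \tilde{\Phi}_\Gamma(x), \omega \rangle + \langle \sigma \otimes 1, \omega \rangle,$$

which implies

$$\tilde{\Phi}_\Gamma(\sigma x) = \tilde{\Phi}_\Gamma(x) + \sigma \otimes 1.$$

Therefore the map $\tilde{\Phi}_\Gamma$ induces $\Phi_\Gamma : M \to \mathrm{Alb}_\Gamma$, which we call the $\Gamma$-Albanese map. The $\Gamma$-Albanese map is “universal” among harmonic maps of $M$ into flat tori in the following sense.

Theorem 2.1 ([7]).

1. The $\Gamma$-Albanese map is harmonic.
2. Let $\varphi : M \to T$ be a harmonic map from a compact Riemannian manifold $M$ to a flat torus $T$, and $\varphi_* : \pi_1(M) \to \pi_1(T)$ be the induced homomorphism of the fundamental groups. Put $\Gamma = \pi_1(M) / \mathrm{Ker}\, \varphi_*$. Then there exists a unique affine map $g : \mathrm{Alb}_\Gamma \to T$ with $\varphi = g \circ \Phi_\Gamma$.

The first claim is a consequence of the fact that $\tilde{\Phi}_\Gamma$ is a harmonic function. The proof of the second claim is carried out in the same way as the one in [7].

We write $E(\varphi, g)$ for the energy of a smooth map $\varphi$ of $M$ into a flat torus $(T, g)$. We have the following characterization of the $\Gamma$-Albanese map.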

Albanese Maps and Asymptotics for the Heat Kernel

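Theorem 2.2. Let $X$ and $\Gamma$ be as above. Let $g$ be a flat metric on the torus $T = \Gamma \otimes \mathbb{R}/\Gamma \otimes \mathbb{Z}$ with $\mathrm{vol}(T, g) = \mathrm{vol}(\mathrm{Alb}^\Gamma)$, and $\varphi$ be a smooth map of $M = \Gamma\backslash X$ into the torus $T$. If $\varphi$ is homotopic to $\Phi_\Gamma$, then
$$E(\Phi_\Gamma, g_0) \le E(\varphi, g),$$
where $g_0$ denotes the flat metric on the $\Gamma$-Albanese torus $\mathrm{Alb}^\Gamma$. The equality holds if and only if $(T, g) = \mathrm{Alb}^\Gamma$ and $\varphi$ coincides with $\Phi_\Gamma$ up to a translation.

Proof. Let $c_1, \ldots, c_r$ be a $\mathbb{Z}$-basis of $\Gamma \otimes \mathbb{Z}$, and $u_1, \ldots, u_r$ be its dual basis of $\mathrm{Hom}(\Gamma,\mathbb{Z})$, namely $\langle u_k, c_l \rangle = \delta_{kl}$. We then find
$$\tilde\Phi_\Gamma(x) = \sum_{k=1}^{r} \Big(\int_{x_0}^{x} u_k\Big)\, c_k,$$
and hence, for $v \in T(M)$,
$$d\Phi_\Gamma(v) = \sum_{k=1}^{r} u_k(v)\, c_k.$$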

We may assume, without loss of generality, that $\varphi : M \to (T, g)$ is a harmonic map, since a harmonic map into a flat torus is energy minimizing among maps in its homotopy class. Note here that, if $\varphi_1$ and $\varphi_2$ are homotopic harmonic maps into flat tori $(T, g_1)$ and $(T, g_2)$ respectively, then $\varphi_1 - \varphi_2$ is constant. Therefore, $\varphi$ coincides with $\Phi_\Gamma$ up to a translation. It is thus enough to show that
$$E(\Phi_\Gamma, g_0) \le E(\Phi_\Gamma, g), \tag{2}$$
and the equality holds if and only if $g = g_0$. Let $\langle\,,\rangle_g$ be the inner product on $\Gamma \otimes \mathbb{R}$ corresponding to the flat metric $g$, and consider the $r \times r$ matrix $A_g = (\langle c_i, c_j \rangle_g)$. Since $\mathrm{vol}(T, g) = (\det A_g)^{1/2}$, the assumption $\mathrm{vol}(T, g) = \mathrm{vol}(\mathrm{Alb}^\Gamma)$ is equivalent to $\det A_g = \det A_{g_0}$. Take an orthonormal local frame $e_1, \ldots, e_m$ ($m = \dim M$) and the dual co-frame $\omega_1, \ldots, \omega_m$. Write $u_k = \sum_{i=1}^{m} u_{ki}\,\omega_i$ for $k = 1, \ldots, r$, and $U = (u_{ki})$. The energy density $e(\Phi_\Gamma, g)$ for $\Phi_\Gamma : M \to (T, g)$ is given by
$$e(\Phi_\Gamma, g) = \frac{1}{2}\sum_{i=1}^{m} \langle d\Phi_\Gamma(e_i), d\Phi_\Gamma(e_i) \rangle_g = \frac{1}{2}\sum_{i=1}^{m}\sum_{k,l=1}^{r} u_{ki}u_{li}\langle c_k, c_l \rangle_g = \frac{1}{2}\mathrm{tr}(U^t A_g U) = \frac{1}{2}\mathrm{tr}(U U^t A_g).$$
Hence we find that
$$E(\Phi_\Gamma, g) = \frac{1}{2}\int_M \mathrm{tr}(U U^t A_g) = \frac{1}{2}\,\mathrm{tr}\Big(\Big(\int_M U U^t\Big) A_g\Big).$$
On the other hand, since $\{u_k\}$ is the dual basis of $\{c_l\}$, we have
$$A_{g_0}^{-1} = \big(\langle\!\langle u_k, u_l \rangle\!\rangle\big) = \Big(\int_M \langle u_k, u_l \rangle\Big) = \Big(\int_M \sum_{i=1}^{m} u_{ki}u_{li}\Big) = \int_M U U^t.$$

Thus we obtain
$$E(\Phi_\Gamma, g) = \frac{1}{2}\,\mathrm{tr}(A_{g_0}^{-1} A_g).$$
We now employ the inequality $\mathrm{tr}(A_{g_0}^{-1} A_g) \ge r$, where the equality holds if and only if $A_g = A_{g_0}$; this is the arithmetic-geometric mean inequality for the eigenvalues of $A_{g_0}^{-1}A_g$, which are positive with product $\det A_g/\det A_{g_0} = 1$. From this, the desired inequality (2) follows, and the equality holds if and only if $g = g_0$. This completes the proof. □

Let $T$ be a torus, and $C$ be a homotopy class of maps from $M$ into $T$ which induce a surjective homomorphism of $\pi_1(M)$ onto $\pi_1(T)$. The homomorphism $\eta = \varphi_*$ of $\pi_1(M)$ onto $\pi_1(T) = H_1(T,\mathbb{Z})$ induced from a map $\varphi : M \to T$ does not depend on the choice of $\varphi \in C$. Putting $\Gamma = \pi_1(M)/\mathrm{Ker}\,\eta$ and $X = \mathrm{Ker}\,\eta\backslash\tilde M$, where $\tilde M$ is the universal covering manifold of $M$, we use Theorem 2.2 to obtain

Theorem 2.3. For a given positive constant $v_0$, there exists a pair $(\varphi_0, g_0)$ of a harmonic map $\varphi_0$ and a flat metric $g_0$ on $T$ with $\mathrm{vol}(T, g_0) = v_0$ satisfying
$$E(\varphi_0, g_0) \le E(\varphi, g)$$
for every $\varphi \in C$ and every flat metric $g$ on $T$ with $\mathrm{vol}(T, g) = v_0$. The pair $(\varphi_0, g_0)$ is uniquely determined. More precisely, if $E(\varphi_0, g_0) = E(\varphi, g)$, then $g = g_0$ and $\varphi$ coincides with a translation of $\varphi_0$.

Proposition 2.4. The following two conditions are equivalent:
1. Every harmonic 1-form $\omega \in \mathrm{Hom}(\Gamma,\mathbb{R})$ has constant length, i.e. $|\omega|(p)$ is a constant function on $M$.
2. The $\Gamma$-Albanese map $\Phi_\Gamma : M \to \mathrm{Alb}^\Gamma$ is a homothetic Riemannian submersion.

Proof. It suffices to note that $(d_p\Phi_\Gamma)^*(\omega) = \omega(p)$. Actually $\langle d_p\Phi_\Gamma(v), \omega \rangle = \langle v, \omega \rangle_p$ for $v \in T_pM$, and $d_p\Phi_\Gamma : (\mathrm{Ker}\, d_p\Phi_\Gamma)^\perp \to \Gamma \otimes \mathbb{R}$ is a homothetic bijection if and only if $(d_p\Phi_\Gamma)^*$ is a homothetic injection. □

Put $d_\Gamma(x, y) = \|\tilde\Phi_\Gamma(x) - \tilde\Phi_\Gamma(y)\|$ for $x, y \in X$, which defines a pseudo-distance on $X$. We call $d_\Gamma$ the $\Gamma$-Albanese pseudo-distance.

Proposition 2.5. Let $d$ be the distance associated with the Riemannian metric of $X$. The identity map $\iota : (X, d) \to (X, d_\Gamma)$ is a rough isometry [1]. More precisely, there are positive constants $C_1, C_2, C_3$ such that
$$C_1\, d_\Gamma(x, y) \le d(x, y) \le C_2\, d_\Gamma(x, y) + C_3.$$
In particular, $d_\Gamma(x, y) \uparrow \infty \iff d(x, y) \uparrow \infty$.

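Proof. To prove the first inequality, let $\omega_1, \ldots, \omega_r$ be an orthonormal basis of $\mathrm{Hom}(\Gamma,\mathbb{R})$ and take a shortest geodesic $c : [0, 1] \to X$ from $x$ to $y$. Then for each $\omega_i$,
$$\Big|\int_x^y \tilde\omega_i\Big| = \Big|\int_0^1 \Big\langle \frac{dc}{dt}, \tilde\omega_i \Big\rangle dt\Big| \le \int_0^1 \Big|\frac{dc}{dt}\Big|\,|\tilde\omega_i(c(t))|\,dt \le \max_{p \in M}|\omega_i(p)| \int_0^1 \Big|\frac{dc}{dt}\Big|\,dt = \max_{p \in M}|\omega_i(p)|\; d(x, y).$$
Putting $a = \max\{|\omega_i(p)| \mid p \in M,\ i = 1, \ldots, r\}$, we have
$$d_\Gamma(x, y)^2 = \|\tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x)\|^2 = \sum_{i=1}^{r} |\langle \tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x), \omega_i \rangle|^2 = \sum_{i=1}^{r} \Big|\int_x^y \tilde\omega_i\Big|^2 \le r a^2\, d(x, y)^2.$$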

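Next we prove the second inequality. Let $D$ be a relatively compact fundamental domain of $\Gamma$ in $X$ and fix a base point $x_0 \in D$. For $x = \sigma x'$, $y = \tau y' \in X$ with $\sigma, \tau \in \Gamma$ and $x', y' \in D$, we have
$$\begin{aligned} d(x, y) &= d(\sigma x', \tau y') = d(x', \sigma^{-1}\tau y') \\ &\le d(x', x_0) + d(x_0, \sigma^{-1}\tau x_0) + d(\sigma^{-1}\tau x_0, \sigma^{-1}\tau y') \\ &= d(x', x_0) + d(x_0, y') + d(x_0, \sigma^{-1}\tau x_0) \le 2\,\mathrm{diam}_d(D) + d(x_0, \sigma^{-1}\tau x_0), \end{aligned}$$
where $\mathrm{diam}_d(D)$ stands for the diameter of $D$ with respect to the distance $d$. Writing $|\sigma| = d(x_0, \sigma x_0)$ for $\sigma \in \Gamma$, we easily observe
$$|\sigma + \tau| \le |\sigma| + |\tau|, \qquad |\sigma^{-1}| = |\sigma|.$$
Take elements $\sigma_1, \ldots, \sigma_r \in \Gamma$ such that $\sigma_1 \otimes 1, \ldots, \sigma_r \otimes 1$ form a $\mathbb{Z}$-basis of $\Gamma \otimes \mathbb{Z}$. For $\sigma = \sum_{i=1}^{r} k_i\sigma_i + \mu \in \Gamma$ with $k_i \in \mathbb{Z}$, $\mu \in \mathrm{Tor}(\Gamma)$, we have
$$|\sigma| \le \sum_{i=1}^{r} |k_i|\,|\sigma_i| + |\mu| \le \max_{i=1,\ldots,r}|\sigma_i| \sum_{j=1}^{r} |k_j| + \max_{\mu \in \mathrm{Tor}(\Gamma)}|\mu|,$$
which implies that there are positive constants $a$ and $b$ such that
$$|\sigma| \le a\|\sigma \otimes 1\| + b \qquad (\sigma \in \Gamma),$$
since one can find a constant $c'$ with $\sum_{j=1}^{r} |k_j| \le c'\|\sigma \otimes 1\|$. Therefore, for $x = \sigma x'$, $y = \tau y'$, we have
$$d(x, y) \le a\|\sigma^{-1}\tau \otimes 1\| + b + 2\,\mathrm{diam}_d(D).$$
On the other hand, we see
$$\begin{aligned} d_\Gamma(x, y) &= d_\Gamma(x', \sigma^{-1}\tau y') = \|\tilde\Phi_\Gamma(y') + \sigma^{-1}\tau \otimes 1 - \tilde\Phi_\Gamma(x')\| \\ &\ge \|\sigma^{-1}\tau \otimes 1\| - \|\tilde\Phi_\Gamma(y') - \tilde\Phi_\Gamma(x')\| \ge \|\sigma^{-1}\tau \otimes 1\| - c, \end{aligned}$$
where $c = \mathrm{diam}_{d_\Gamma}(D) \le c_1 \cdot \mathrm{diam}_d(D)$. Putting all together, we obtain the second inequality. □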

Remark. We have the following expressions of the Albanese distance:
$$\begin{aligned} d_\Gamma(x, y) &= \sup\Big\{\Big|\int_x^y \tilde\omega\Big| \;\Big|\; \omega \in \mathrm{Hom}(\Gamma,\mathbb{R}),\ \|\omega\| \le 1\Big\} \\ &= \sup\{|f(x) - f(y)| \mid f \in C^\infty(X),\ \Delta f \equiv 0,\ df \text{ is } \Gamma\text{-invariant},\ \|df\| \le 1\}. \end{aligned}$$
It is interesting to compare this characterization of $d_\Gamma$ with the one for the Riemannian distance $d$:
$$d(x, y) = \sup\{|f(x) - f(y)| \mid f \in C^\infty(X),\ |df| \le 1\}.$$
The following proposition gives a characterization of a manifold $X$ whose $d_\Gamma$ is homothetic to the Riemannian distance.

Proposition 2.6. If there is a positive constant $a$ such that $d_\Gamma(x, y) = a \cdot d(x, y)$ for every $x, y \in X$, then $X$ is the $r$-dimensional Euclidean space.

Proof. Take an arbitrary length minimizing geodesic $c : [0, T] \to X$ with the arc-length parameter. From the assumption, we see
$$\Big\|\frac{d}{dt}\tilde\Phi_\Gamma(c(t))\Big\| = \lim_{h \to 0} \frac{\|\tilde\Phi_\Gamma(c(t + h)) - \tilde\Phi_\Gamma(c(t))\|}{|h|} \equiv a.$$
Put $\phi(t) = \frac{d}{dt}\tilde\Phi_\Gamma(c(t))$. For any $0 < s < t < T$, the above equation tells that
$$\Big\|\int_s^t \phi(\tau)\,d\tau\Big\| = \|\tilde\Phi_\Gamma(c(t)) - \tilde\Phi_\Gamma(c(s))\| = a(t - s) = \int_s^t \Big\|\frac{d}{d\tau}\tilde\Phi_\Gamma\Big\|\,d\tau = \int_s^t \|\phi(\tau)\|\,d\tau,$$
which implies that for any $s < u < t$,
$$\Big\|\int_s^u \phi(\tau)\,d\tau + \int_u^t \phi(\tau)\,d\tau\Big\| = \Big\|\int_s^u \phi(\tau)\,d\tau\Big\| + \Big\|\int_u^t \phi(\tau)\,d\tau\Big\|.$$
We find especially a function $\beta$ such that
$$\int_s^u \phi(\tau)\,d\tau = \beta(u)\int_s^t \phi(\tau)\,d\tau,$$
which implies $\phi(u)$ is parallel to a constant vector. Since $\|\phi(t)\| \equiv a$, $\frac{d}{dt}\tilde\Phi_\Gamma(c(t)) = \phi(t)$ has to be a constant vector, namely $t \mapsto \tilde\Phi_\Gamma(c(t))$ is a line segment. Therefore $\tilde\Phi_\Gamma$ is a totally geodesic map from $X$ to $\mathbb{R}^r$, which is also homothetic. This proves the proposition. □
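As a sanity check on Proposition 2.6, the flat case can be worked out by hand. The sketch below assumes $M = \mathbb{R}^r/L$ for a lattice $L$ of covolume $V$ with the Euclidean metric, $X = \mathbb{R}^r$ and $\Gamma = L$; this setting and the symbols $L$, $V$, $w$ are illustrative choices, not taken from the text.

```latex
% Assumed setting (illustration only): M = R^r / L, X = R^r, Gamma = L, vol(M) = V.
\begin{align*}
\omega &= \sum_{i=1}^{r} w_i\,dx_i, \qquad
\|\omega\|^2 = \int_M |\omega|^2 = V\,|w|^2, \\
\langle \tilde\Phi_\Gamma(x), \omega \rangle &= \int_{x_0}^{x} \tilde\omega = w \cdot (x - x_0)
\;\Longrightarrow\; \tilde\Phi_\Gamma(x) = x - x_0, \\
\|v\|_{\Gamma\otimes\mathbb{R}} &= \sup_{\|\omega\| \le 1} |w \cdot v| = V^{-1/2}\,|v|
\;\Longrightarrow\; d_\Gamma(x, y) = V^{-1/2}\, d(x, y).
\end{align*}
```

Under these assumptions the hypothesis of Proposition 2.6 holds with $a = \mathrm{vol}(M)^{-1/2}$, and $X$ is indeed Euclidean.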

3. Twisted Laplacians

We first review some material related to twisted Laplacians (see [3,9]). For $\chi \in \hat\Gamma$, define the flat line bundle over $M$ as the quotient $L_\chi = X \times \mathbb{C}/\sim$, where the equivalence relation is given by $(x, v) \sim (\sigma x, \chi(\sigma)v)$ $(\sigma \in \Gamma)$. A section $s$ of $L_\chi$ is identified with a $\mathbb{C}$-valued function $\tilde s$ on $X$ satisfying $\tilde s(\sigma x) = \chi(\sigma)\tilde s(x)$. By restricting the Laplacian $\Delta_X$ on $X$ to $C^\infty(L_\chi)$ through this identification, we have a self-adjoint positive elliptic operator $\Delta_\chi : C^\infty(L_\chi) \to C^\infty(L_\chi)$, which we call the twisted Laplacian. Let its eigenvalues be
$$0 \le \lambda_0(\chi) \le \lambda_1(\chi) \le \cdots \le \lambda_k(\chi) \le \cdots \uparrow \infty.$$
Perturbation theory assures that each $\lambda_k(\chi)$ is a continuous function on $\hat\Gamma$. Since the first eigenvalue $\lambda_0(1) = 0$ for the trivial character $1$ is simple, $\lambda_0(\chi)$ is simple and depends smoothly on $\chi$ around $\chi = 1$. Moreover, $\lambda_0(\chi) = 0$ if and only if $\chi = 1$ ([9]). By employing perturbation theory again, we may take an orthonormal basis $\{s_{\chi,k}\}_{k=0}^{\infty}$ of $L^2(L_\chi)$ such that
1. $\Delta_\chi s_{\chi,k} = \lambda_k(\chi)s_{\chi,k}$,
2. $s_{\chi,k}$ is bounded and integrable with respect to $\chi$,
3. $s_{\chi,0}$ is smooth in $\chi$ around $\chi = 1$ and $s_{1,0} \equiv \mathrm{vol}(M)^{-1/2}$.

Lemma 3.1. The heat kernel $k(t, x, y)$ on $X$ is given by
$$k(t, x, y) = \sum_{k=0}^{\infty} \int_{\hat\Gamma} \exp(-\lambda_k(\chi)t)\,\tilde s_{\chi,k}(x)\,\overline{\tilde s_{\chi,k}(y)}\,d\chi,$$
where $d\chi$ is the normalized Haar measure of $\hat\Gamma$.

Proof. The kernel function $k_\chi(t, p, q) \in (L_\chi)_p \otimes (L_\chi^*)_q$ of the operator $\exp(-t\Delta_\chi)$ $(p, q \in M)$ is given as
$$k_\chi(t, p, q) = \sum_{i=0}^{\infty} \exp(-\lambda_i(\chi)t)\,s_{\chi,i}(p) \otimes \overline{s_{\chi,i}(q)},$$
and its lift
$$\tilde k_\chi(t, x, y) = \sum_{i=0}^{\infty} \exp(-\lambda_i(\chi)t)\,\tilde s_{\chi,i}(x)\,\overline{\tilde s_{\chi,i}(y)}$$
to $X \times X$ has the following expression:
$$\tilde k_\chi(t, x, y) = \sum_{\sigma \in \Gamma} \chi(\sigma)\,k(t, x, \sigma y).$$
From the orthogonal relation of unitary characters
$$\int_{\hat\Gamma} \chi(\sigma)\,d\chi = \begin{cases} 1 & \sigma = 1, \\ 0 & \sigma \ne 1, \end{cases} \tag{3}$$
the lemma follows. □
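Lemma 3.1 can be checked by hand in the simplest case. The following sketch assumes $M = \mathbb{R}/\mathbb{Z}$, $X = \mathbb{R}$, $\Gamma = \mathbb{Z}$, with characters $\chi_w(n) = e^{2\pi\sqrt{-1}\,nw}$ for $w \in [0, 1)$; the eigenfunctions are indexed by $n \in \mathbb{Z}$ rather than in increasing order of eigenvalue, which only relabels the sum. None of these choices come from the text.

```latex
% Assumed setting (illustration only): M = R/Z, X = R, Gamma = Z.
\begin{align*}
\tilde s_{w,n}(x) &= e^{2\pi\sqrt{-1}\,(n+w)x}, \qquad
\Delta_{\chi_w}\, s_{w,n} = 4\pi^2 (n+w)^2\, s_{w,n}, \\
\sum_{n\in\mathbb{Z}} \int_0^1 e^{-4\pi^2 (n+w)^2 t}\,
   \tilde s_{w,n}(x)\,\overline{\tilde s_{w,n}(y)}\; dw
 &= \int_{\mathbb{R}} e^{-4\pi^2 u^2 t}\, e^{2\pi\sqrt{-1}\,u(x-y)}\, du \\
 &= (4\pi t)^{-1/2} \exp\!\Big(-\frac{(x-y)^2}{4t}\Big).
\end{align*}
```

The right-hand side is the Euclidean heat kernel on $\mathbb{R}$, as the lemma predicts for this covering.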

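Since $\inf\{\lambda_1(\chi) \mid \chi \in \hat\Gamma\}$ is positive, we observe that the asymptotic of $k(t, x, y)$ as $t$ goes to infinity coincides with that of the integral
$$\int_U \exp(-\lambda_0(\chi)t)\,\tilde s_{\chi,0}(x)\,\overline{\tilde s_{\chi,0}(y)}\,d\chi,$$
where $U$ is an arbitrarily small neighborhood of $\chi = 1$. Indeed,
$$\int_{\hat\Gamma - U} \exp(-\lambda_0(\chi)t)\,\tilde s_{\chi,0}(x)\,\overline{\tilde s_{\chi,0}(y)}\,d\chi \quad\text{and}\quad \int_{\hat\Gamma} \sum_{i=1}^{\infty} \exp(-\lambda_i(\chi)t)\,\tilde s_{\chi,i}(x)\,\overline{\tilde s_{\chi,i}(y)}\,d\chi$$
have exponential decay as $t \uparrow \infty$. We shall elaborate on this observation in Sects. 5–7. From now on, we identify $U$ with a neighborhood of $\omega = 0$ in $\mathrm{Hom}(\Gamma,\mathbb{R})$ through the correspondence $\omega \mapsto \chi_\omega$. To analyze the behavior of $\lambda_0(\chi)$ and of $s_{\chi,0}$ around $\chi = 1$, take $s_\omega \in C^\infty(L_{\chi_\omega})$ defined by
$$\tilde s_\omega(x) = \exp\Big(2\pi\sqrt{-1}\int_{x_0}^{x} \tilde\omega\Big) = \exp\big(2\pi\sqrt{-1}\,\langle \tilde\Phi_\Gamma(x), \omega \rangle\big).$$
For $f \in C^\infty(M)$, we see that $f s_\omega \in C^\infty(L_{\chi_\omega})$ and
$$\Delta_{\chi_\omega}(f s_\omega) = \big(\Delta_M f - 4\pi\sqrt{-1}\,\langle \omega, df \rangle + 4\pi^2|\omega|^2 f\big)s_\omega.$$
Put $H_\omega f := \Delta_M f - 4\pi\sqrt{-1}\,\langle \omega, df \rangle + 4\pi^2|\omega|^2 f$. Since $|s_\omega| \equiv 1$, the operator $H_\omega : C^\infty(M) \to C^\infty(M)$ is unitarily equivalent to $\Delta_{\chi_\omega}$. The function $\phi_\omega \in C^\infty(M)$ defined by $s_{\chi_\omega,0} = \phi_\omega s_\omega$ satisfies
$$H_\omega\phi_\omega = \lambda_0(\chi_\omega)\phi_\omega, \qquad \phi_0 \equiv \mathrm{vol}(M)^{-1/2}.$$
It is obvious that $\phi_\omega$ depends smoothly on $\omega$ around $0$. For later use, we should also remark
$$\begin{aligned} \tilde s_{\chi_\omega,0}(x)\,\overline{\tilde s_{\chi_\omega,0}(y)} &= \phi_\omega(\pi(x))\,\overline{\phi_\omega(\pi(y))}\,\exp\Big(2\pi\sqrt{-1}\int_y^x \tilde\omega\Big) \\ &= \phi_\omega(\pi(x))\,\overline{\phi_\omega(\pi(y))}\,\exp\big(-2\pi\sqrt{-1}\,\langle \tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x), \omega \rangle\big). \end{aligned}$$
Consider the smooth path $\tau\omega$ in $\mathrm{Hom}(\Gamma,\mathbb{R})$ for a fixed $\omega$ and write $H_\tau = H_{\tau\omega}$, $\phi_\tau = \phi_{\tau\omega}$ and $\lambda_0(\tau) = \lambda_0(\chi_{\tau\omega})$ for simplicity. Notice that $H_0 = \Delta_M$, $\lambda_0(0) = 0$ and $\phi_0 = \mathrm{vol}(M)^{-1/2}$. In the sequel, the $k$-th derivative of a function $g(\tau)$ is denoted by $g^{(k)}(\tau)$. The following lemma is a key to our discussion.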

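Lemma 3.2.
$$\lambda_0^{(2k-1)}(0) = 0, \tag{4}$$
$$\lambda_0^{(2)}(0) = 8\pi^2\,\frac{\|\omega\|^2}{\mathrm{vol}(M)}, \tag{5}$$
$$\lambda_0^{(4)}(0) = \frac{48\pi^2}{\mathrm{vol}(M)^{1/2}}\Big(\int_M \phi_0^{(2)}|\omega|^2 - \frac{\|\omega\|^2}{\mathrm{vol}(M)}\int_M \phi_0^{(2)}\Big) = -6\int_M \Delta_M\phi_0^{(2)} \cdot \phi_0^{(2)}, \tag{6}$$
$$\phi_0^{(1)} \equiv \mathrm{const}, \tag{7}$$
$$\Delta_M\phi_0^{(2)} = 8\pi^2\,\mathrm{vol}(M)^{-1/2}\Big(\frac{\|\omega\|^2}{\mathrm{vol}(M)} - |\omega|^2\Big), \tag{8}$$
$$\Delta_M\phi_0^{(3)} = 12\pi\sqrt{-1}\,\langle \omega, d\phi_0^{(2)} \rangle + 24\pi^2\Big(\frac{\|\omega\|^2}{\mathrm{vol}(M)} - |\omega|^2\Big)\phi_0^{(1)}. \tag{9}$$

Proof. By definition, $\phi_\tau$ satisfies
$$H_\tau\phi_\tau = \lambda_0(\tau)\phi_\tau, \qquad \phi_0 \equiv \mathrm{vol}(M)^{-1/2}, \tag{10}$$
and is normalized as
$$\int_M |\phi_\tau|^2 \equiv 1. \tag{11}$$
Observe that $\overline{H_\omega\phi} = H_{-\omega}\overline{\phi}$ and therefore $\lambda_0(\omega) = \lambda_0(-\omega)$, from which (4) follows. Differentiating both sides of (10) with respect to $\tau$ at $\tau = 0$, we have $\Delta_M\phi_0^{(1)} \equiv 0$, which implies that $\phi_0^{(1)}$ is a constant function. Differentiating (10) again, we obtain
$$\Delta_M\phi_0^{(2)} + 8\pi^2|\omega|^2\,\mathrm{vol}(M)^{-1/2} = \lambda_0^{(2)}(0)\,\mathrm{vol}(M)^{-1/2}. \tag{12}$$
Integrating (12) over $M$, we have $\lambda_0^{(2)}(0) = 8\pi^2\|\omega\|^2\,\mathrm{vol}(M)^{-1}$, and substituting this back into (12) gives (8). Repeating the process, we also have
$$\Delta_M\phi_0^{(3)} = 12\pi\sqrt{-1}\,\langle \omega, d\phi_0^{(2)} \rangle + 24\pi^2\Big(\frac{\|\omega\|^2}{\mathrm{vol}(M)} - |\omega|^2\Big)\phi_0^{(1)},$$
$$\Delta_M\phi_0^{(4)} - 16\pi\sqrt{-1}\,\langle \omega, d\phi_0^{(3)} \rangle + 48\pi^2|\omega|^2\phi_0^{(2)} = \mathrm{vol}(M)^{-1/2}\lambda_0^{(4)}(0) + 6\lambda_0^{(2)}(0)\phi_0^{(2)}.$$
Integrating both sides again over $M$, we obtain
$$\lambda_0^{(4)}(0)\,\mathrm{vol}(M)^{1/2} = 48\pi^2\Big(\int_M \phi_0^{(2)}|\omega|^2 - \frac{\|\omega\|^2}{\mathrm{vol}(M)}\int_M \phi_0^{(2)}\Big).$$
□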
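The repeated differentiation in the proof of Lemma 3.2 can be organized by the Leibniz rule. The following is a sketch of that bookkeeping, using only $H_\tau = \Delta_M - 4\pi\sqrt{-1}\,\tau\langle\omega, d\,\cdot\,\rangle + 4\pi^2\tau^2|\omega|^2$; the shorthand $H^{(1)}$, $H^{(2)}$ for the $\tau$-derivatives of the operator is introduced here for illustration only.

```latex
% H^{(1)} and H^{(2)} are ad hoc names for the tau-derivatives of H_tau.
\begin{align*}
H^{(1)} &= -4\pi\sqrt{-1}\,\langle \omega, d\,\cdot\,\rangle, \qquad
H^{(2)} = 8\pi^2 |\omega|^2, \qquad H^{(k)} = 0 \quad (k \ge 3), \\
\Delta_M \phi_0^{(k)} + k\,H^{(1)}\phi_0^{(k-1)} + \binom{k}{2} H^{(2)}\phi_0^{(k-2)}
 &= \sum_{j=0}^{k} \binom{k}{j}\,\lambda_0^{(j)}(0)\,\phi_0^{(k-j)} .
\end{align*}
```

With $\lambda_0(0) = 0$, the vanishing of the odd derivatives of $\lambda_0$ at $0$, and $\phi_0$, $\phi_0^{(1)}$ constant, the cases $k = 2, 3, 4$ of this identity reduce to (12), (9) and the fourth-order equation used in the proof.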

By changing the eigen-sections $s_{\chi,0}$ around $1$, if necessary, we may assume that
1. $\phi_0^{(1)} \equiv 0$,
2. $\phi_0^{(2)}$ is real valued,
3. $\int_M \phi_0^{(2)} = 0$.

Indeed, as we have seen, $\phi_0^{(1)}$ is a constant function. On the other hand, by differentiating both sides of (11) with respect to $\tau$ at $\tau = 0$, we see that $\phi_0^{(1)}$ takes a value in pure imaginary numbers, say $\sqrt{-1}\,a(\omega)$ for $a(\omega) \in \mathbb{R}$. Taking a look at the imaginary part of (8), we have $\Delta_M(\mathrm{Im}(\phi_0^{(2)})) = 0$, i.e. $\mathrm{Im}(\phi_0^{(2)})$ is a constant, which we put $b(\omega)$. It should be noted that $a(\tau\omega) = \tau a(\omega)$ and $b(\tau\omega) = \tau^2 b(\omega)$. Replacing $\phi_\omega$ with $\phi_\omega\exp\big(\sqrt{-1}\,(-a(\omega) + \tfrac{1}{2}b(\omega)^2)\,\mathrm{vol}(M)^{-1/2}\big)$, we can assume the first two properties above for $\phi_0^{(1)}$ and $\phi_0^{(2)}$. The third property is a consequence of (11). From now on, we assume the above properties for $\phi_\omega$. Put $P(\omega) = \phi_\omega(\pi(x))\,\overline{\phi_\omega(\pi(y))}$ and $P_\tau = P(\tau\omega)$. We then have
$$P_0^{(1)} = 0, \qquad P_0^{(2)} = \mathrm{vol}(M)^{-1/2}\big(\phi_0^{(2)}(\pi(x)) + \phi_0^{(2)}(\pi(y))\big).$$
We denote by $G$ the Green operator on $M$, that is, the operator acting on $C^\infty(M)$ satisfying
$$I = H + G\Delta_M, \qquad G\Delta_M = \Delta_M G, \qquad GH = HG = O,$$
where $Hf = \mathrm{vol}(M)^{-1}\int_M f$. From (8), it follows that
$$G\Delta_M\phi_0^{(2)} = -8\pi^2\,\mathrm{vol}(M)^{-1/2}\,G(|\omega|^2),$$
since $G$ annihilates constants. On the other hand
$$G\Delta_M\phi_0^{(2)} = \phi_0^{(2)} - H\phi_0^{(2)} = \phi_0^{(2)},$$
so that
$$\phi_0^{(2)} = -8\pi^2\,\mathrm{vol}(M)^{-1/2}\,G(|\omega|^2).$$
Summarizing, we have

Lemma 3.3. We can take an eigen-section $s_{\chi,0}$ such that
$$P(0) = \mathrm{vol}(M)^{-1}, \qquad \nabla P|_{\omega=0} = 0,$$
$$\mathrm{Hess}\,P|_{\omega=0}(\omega, \omega) = -\frac{8\pi^2}{\mathrm{vol}(M)}\big(G(|\omega|^2)(\pi(x)) + G(|\omega|^2)(\pi(y))\big).$$

4. Fourier Transform

We quickly review several facts about the Fourier transform and a version of the classical Laplace method which we need in later sections. The Fourier transform $\hat f$ of a rapidly decreasing function $f$ on $\mathbb{R}^r$ is
$$\hat f(\xi) = \int_{\mathbb{R}^r} f(u)\exp(-2\pi\sqrt{-1}\,u \cdot \xi)\,du.$$
Recall that, for the Gaussian function $g(u) = \exp(-\pi|u|^2)$, we have $\hat g = g$. Define the function $F_A f(\xi)$, for a positive constant $A$ and a function $f(\xi, u)$ $((\xi, u) \in \mathbb{R}^r \times \mathbb{R}^r)$, by
$$F_A f(\xi) := \int_{\mathbb{R}^r} \exp(-A|u|^2 - 2\pi\sqrt{-1}\,u \cdot \xi)\,f(\xi, u)\,du.$$

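It is easy to see
$$F_{at}1(\xi) = \Big(\frac{\pi}{at}\Big)^{r/2}\exp\Big(-\frac{\pi^2}{at}|\xi|^2\Big), \tag{13}$$
for $a, t > 0$. Differentiation of the left-hand side of (13) with respect to $\xi$ gives
$$\partial_\xi^\alpha(F_{at}1) = F_{at}\big((-2\pi\sqrt{-1}\,u)^\alpha\big).$$
On the other hand, inductive arguments lead to the following

Lemma 4.1. Let $\alpha$ be a multi-index and $b = \pi^2/a$.
1. When $|\alpha| = 2m$, there are polynomials $P_l^\alpha$ with $\deg P_l^\alpha \le 2(l - m)$ $(\le 2l)$ such that
$$\partial_\xi^\alpha\Big(\exp\Big(-\frac{b}{t}|\xi|^2\Big)\Big) = \sum_{l=m}^{2m} t^{-l}P_l^\alpha(\xi)\exp\Big(-\frac{b}{t}|\xi|^2\Big).$$
2. When $|\alpha| = 2m - 1$, there are polynomials $Q_l^\alpha$ with $\deg Q_l^\alpha \le 2(l - m) + 1$ $(\le 2l)$ such that
$$\partial_\xi^\alpha\Big(\exp\Big(-\frac{b}{t}|\xi|^2\Big)\Big) = \sum_{l=m}^{2m-1} t^{-l}Q_l^\alpha(\xi)\exp\Big(-\frac{b}{t}|\xi|^2\Big).$$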

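For the case $|\alpha| \le 3$, we shall write down the precise formulae.

Lemma 4.2. Write $\partial_i = \frac{\partial}{\partial\xi_i}$. Then
$$\partial_i\exp\Big(-\frac{b}{t}|\xi|^2\Big) = -\frac{2b}{t}\,\xi_i\exp\Big(-\frac{b}{t}|\xi|^2\Big),$$
$$\partial_i\partial_j\exp\Big(-\frac{b}{t}|\xi|^2\Big) = \Big(-\frac{2b}{t}\,\delta_{ij} + \frac{4b^2}{t^2}\,\xi_i\xi_j\Big)\exp\Big(-\frac{b}{t}|\xi|^2\Big),$$
$$\partial_i\partial_j\partial_k\exp\Big(-\frac{b}{t}|\xi|^2\Big) = \Big(\frac{4b^2}{t^2}\,\delta_{ij}\xi_k + \frac{4b^2}{t^2}\,\delta_{jk}\xi_i + \frac{4b^2}{t^2}\,\delta_{ik}\xi_j - \frac{8b^3}{t^3}\,\xi_i\xi_j\xi_k\Big)\exp\Big(-\frac{b}{t}|\xi|^2\Big).$$
Comparing differentials of both sides of (13) when $|\alpha| = 2$, we have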

Lemma 4.3.
$$-4\pi^2 F_{at}(u_iu_j) = \Big(\frac{\pi}{at}\Big)^{r/2}\Big(-\frac{2\pi^2}{at}\,\delta_{ij} + \Big(\frac{2\pi^2}{at}\Big)^2\xi_i\xi_j\Big)\exp\Big(-\frac{\pi^2}{at}|\xi|^2\Big).$$

Let $Q(\xi, u)$ be a function defined on $\mathbb{R}^r \times \mathbb{R}^r$ with compact support with respect to the variable $u$ for each $\xi \in \mathbb{R}^r$, and let its Taylor expansion around $u = 0$ be $Q(\xi, u) \sim \sum_\alpha q_\alpha(\xi)u^\alpha$. The following lemma is related to Theorem 3.

Lemma 4.4. Assume that the $q_\alpha(\xi)$ are polynomials in $\xi$ and $q_0(\xi) \equiv q_0$. Then there are polynomials $\beta_l(\xi)$ with $\beta_0 \equiv 1$ such that
$$F_{at}Q(\xi) \sim \Big(\frac{\pi}{at}\Big)^{r/2}\exp\Big(-\frac{\pi^2}{at}|\xi|^2\Big)\,q_0\sum_{i=0}^{\infty}\beta_i(\xi)t^{-i},$$
where we should understand the term $\exp(-\frac{\pi^2}{at}|\xi|^2)$ to be an asymptotic power series in $t^{-1}$. Moreover, if $\deg q_\alpha(\xi) < |\alpha|$ $(|\alpha| \ne 0)$, we have $\deg\beta_l < 2l$ $(l > 0)$.

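Proof. We first make a formal computation.
$$\begin{aligned} F_{at}\Big(\sum_\alpha q_\alpha u^\alpha\Big) &= \Big(\frac{\pi}{at}\Big)^{r/2}\sum_\alpha \frac{1}{(-2\pi\sqrt{-1})^{|\alpha|}}\,q_\alpha(\xi)\,\partial_\xi^\alpha\exp\Big(\frac{-\pi^2}{at}|\xi|^2\Big) \\ &= \Big(\frac{\pi}{at}\Big)^{r/2}\exp\Big(\frac{-\pi^2|\xi|^2}{at}\Big)\Big[\sum_{l=0}^{\infty}\frac{1}{(2\pi\sqrt{-1})^{2l}}\sum_{|\alpha|=2l}q_\alpha(\xi)\sum_{i=l}^{2l}P_i^\alpha(\xi)t^{-i} \\ &\qquad + \sum_{l=1}^{\infty}\frac{-1}{(2\pi\sqrt{-1})^{2l-1}}\sum_{|\alpha|=2l-1}q_\alpha(\xi)\sum_{i=l}^{2l-1}Q_i^\alpha(\xi)t^{-i}\Big]. \end{aligned}$$
Thus if we put
$$\beta_i(\xi) = \sum_{i/2 \le l \le i}\frac{1}{(2\pi\sqrt{-1})^{2l}}\sum_{|\alpha|=2l}q_0^{-1}q_\alpha(\xi)P_i^\alpha(\xi) + \sum_{(i+1)/2 \le l \le i}\frac{-1}{(2\pi\sqrt{-1})^{2l-1}}\sum_{|\alpha|=2l-1}q_0^{-1}q_\alpha(\xi)Q_i^\alpha(\xi),$$
we have the following formal expression:
$$F_{at}Q(\xi) = \Big(\frac{\pi}{at}\Big)^{r/2}\exp\Big(-\frac{\pi^2}{at}|\xi|^2\Big)\,q_0\sum_{l=0}^{\infty}\beta_l(\xi)t^{-l}.$$
To make the above argument rigorous, write
$$Q(\xi, u) = \sum_{|\alpha| \le l}q_\alpha(\xi)u^\alpha + R(\xi, u),$$
with $|R(\xi, u)| \le q(\xi)|u|^l$. Note
$$\begin{aligned} |F_{at}R| &\le \int_{\mathbb{R}^r}|R(\xi, u)|\exp(-at|u|^2)\,du = t^{-r/2}\int_{\mathbb{R}^r}|R(\xi, t^{-1/2}v)|\exp(-a|v|^2)\,dv \\ &\le t^{-(r+l)/2}\,q(\xi)\int_{\mathbb{R}^r}|v|^l\exp(-a|v|^2)\,dv. \end{aligned}$$
This completes the proof of the first claim. The second claim follows from
$$\deg q_\alpha(\xi)P_i^\alpha(\xi) < |\alpha| + 2(i - l) = 2i \quad (|\alpha| = 2l),$$
$$\deg q_\alpha(\xi)Q_i^\alpha(\xi) < |\alpha| + 2(i - l) + 1 = 2i \quad (|\alpha| = 2l - 1).$$
□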

In the above formulae, we kept $\exp(-\frac{\pi^2}{at}|\xi|^2)$ unexpanded. The coefficient $c_l$ of $t^{-l}$ for the asymptotic power series
$$F_{at}Q(\xi) \sim \Big(\frac{\pi}{at}\Big)^{r/2}q_0\sum_{l=0}^{\infty}c_l(\xi)t^{-l},$$
obtained by expanding $\exp(-\frac{\pi^2}{at}|\xi|^2)$ and executing the product, is given by
$$c_l = \sum_{i+j=l}\beta_i(\xi)\,\frac{1}{j!}\Big(-\frac{\pi^2}{a}\Big)^j|\xi|^{2j} = \beta_0\,\frac{1}{l!}\Big(-\frac{\pi^2}{a}\Big)^l|\xi|^{2l} + \sum_{i=1}^{l}\beta_i(\xi)\,\frac{1}{(l-i)!}\Big(-\frac{\pi^2}{a}\Big)^{l-i}|\xi|^{2(l-i)}.$$
It should be remarked that the degree of each $\beta_i(\xi)|\xi|^{2(l-i)}$ for $i \ge 1$ appearing in the above formula is less than $2l$, since $\deg\beta_i < 2i$. This observation will be employed in the proof of Theorem 4. Suppose $Q(\xi, u)$ has a Taylor expansion of the form
$$Q(\xi, u) = q_0 + \sum_{i,j}q_{ij}(\xi)u_iu_j + \ldots,$$
with $q_0 = \text{constant}$. Then the coefficients $\beta_1$ and $c_1$ are given as follows.

Lemma 4.5.
$$\beta_1(\xi) = \frac{1}{2aq_0}\sum_{i=1}^{r}q_{ii}(\xi), \tag{14}$$
$$c_1(\xi) = \beta_1 - \frac{\pi^2}{a}|\xi|^2\beta_0 = \frac{1}{2aq_0}\sum_{i=1}^{r}q_{ii}(\xi) - \frac{\pi^2}{a}|\xi|^2. \tag{15}$$
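Formula (13) and Lemma 4.3 are easy to check numerically in dimension $r = 1$. The sketch below is only an illustration: the function names `F_A`, `rhs_13` and `rhs_lemma_4_3`, the trapezoid rule, and the truncation window are choices made here, and `A` plays the role of the product $at$.

```python
import cmath
import math


def F_A(f, A, xi, half_width=12.0, n=4001):
    """Trapezoid approximation, for r = 1, of
    F_A f(xi) = integral of exp(-A u^2 - 2*pi*i*u*xi) * f(u) du,
    truncated to [-half_width, half_width]. The window is an assumption
    that suits moderate A (the Gaussian tail is then negligible)."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0 + 0.0j
    for k in range(n):
        u = -half_width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * cmath.exp(-A * u * u - 2j * math.pi * u * xi) * f(u)
    return total * h


def rhs_13(A, xi):
    """Right-hand side of (13) with r = 1 and A = a*t."""
    return math.sqrt(math.pi / A) * math.exp(-math.pi ** 2 * xi ** 2 / A)


def rhs_lemma_4_3(A, xi):
    """Lemma 4.3 with r = 1, divided by -4*pi^2, i.e. the value of F_A(u^2)."""
    gauss = math.exp(-math.pi ** 2 * xi ** 2 / A)
    return math.sqrt(math.pi / A) * (1.0 / (2.0 * A) - math.pi ** 2 * xi ** 2 / A ** 2) * gauss


if __name__ == "__main__":
    A, xi = 2.0, 0.3
    print(abs(F_A(lambda u: 1.0, A, xi) - rhs_13(A, xi)))
    print(abs(F_A(lambda u: u * u, A, xi) - rhs_lemma_4_3(A, xi)))
```

Both printed differences should be at the level of floating-point rounding, since the trapezoid rule converges very quickly for Gaussian integrands.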

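5. The Local Central Limit Theorem

This section is devoted to the proof of Theorem 1 (the Local Central Limit Theorem). Recall
$$k(t, x, y) = \sum_{i=0}^{\infty}\int_{\hat\Gamma}\exp(-\lambda_i(\chi)t)\,\tilde s_{\chi,i}(x)\,\overline{\tilde s_{\chi,i}(y)}\,d\chi.$$
Since there is a positive constant $\varepsilon$ with $\varepsilon \le \lambda_i(\chi)$ for all $i \ge 1$ and $\chi \in \hat\Gamma$, and $\varepsilon \le \lambda_0(\chi)$ for $\chi \in \hat\Gamma\setminus U$ ($U$ is an arbitrarily small neighborhood of $\chi = 1$), we observe that
$$\lim_{t\uparrow\infty}(4\pi t)^{r/2}\sum_{i=1}^{\infty}\int_{\hat\Gamma}\exp(-\lambda_i(\chi)t)\,\tilde s_{\chi,i}(x)\,\overline{\tilde s_{\chi,i}(y)}\,d\chi = 0,$$
$$\lim_{t\uparrow\infty}(4\pi t)^{r/2}\int_{\hat\Gamma\setminus U}\exp(-\lambda_0(\chi)t)\,\tilde s_{\chi,0}(x)\,\overline{\tilde s_{\chi,0}(y)}\,d\chi = 0,$$
uniformly for all $x, y \in X$. From now on, convergence always means uniform convergence on $(x, y) \in X \times X$. Take an orthonormal basis $\{\omega_1, \ldots, \omega_r\}$ of $\mathrm{Hom}(\Gamma,\mathbb{R})$ and write $\omega = \sum_{i=1}^{r}w_i\omega_i$. The normalized Haar measure $d\chi$ on the character group $\hat\Gamma$ is written by using the Lebesgue measure $d\omega = dw_1 \wedge dw_2 \wedge \cdots \wedge dw_r$ as $d\chi = C_0\,d\omega$ with $C_0 = (\mathrm{vol}(\mathrm{Jac}^\Gamma)\,\#\mathrm{Tor}(\Gamma))^{-1} = \mathrm{vol}(\mathrm{Alb}^\Gamma)(\#\mathrm{Tor}(\Gamma))^{-1}$. Therefore, identifying $\chi \in U$ with $\omega \in U' \subset \mathrm{Hom}(\Gamma,\mathbb{R})$, we have
$$\lim_{t\uparrow\infty}(4\pi t)^{r/2}\Big[k(t, x, y) - C_0\int_{U'}\exp(-\lambda_0(\omega)t)\,P(\omega; \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega\Big] = 0,$$
where $\lambda_0(\omega) = \lambda_0(\chi_\omega)$, $P(\omega; p, q) = \phi_\omega(p)\overline{\phi_\omega(q)}$ and $\xi = \tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x)$. Take an extension $\lambda(\omega)$ of $\lambda_0(\omega)$ to the whole $\mathrm{Hom}(\Gamma,\mathbb{R})$ such that $\lambda(\omega) \ge b\|\omega\|^2$ for some positive constant $b$. Extend also $P(\omega; p, q)$ to a smooth function on $\mathrm{Hom}(\Gamma,\mathbb{R}) \times M \times M$ with compact support in the variable $\omega$. Note that $P(\omega; \pi(x), \pi(y))$ is uniformly bounded with respect to $x, y$ and $P(0; p, q) = \mathrm{vol}(M)^{-1}$. Therefore we have
$$\lim_{t\uparrow\infty}(4\pi t)^{r/2}\Big[\int_{U'}\exp(-\lambda_0(\omega)t)\,P(\omega; \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega - \int_{\mathbb{R}^r}\exp(-\lambda(\omega)t)\,P(\omega; \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega\Big] = 0.$$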

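Put $a = 4\pi^2\,\mathrm{vol}(M)^{-1}$. We then have
$$\begin{aligned} &(4\pi t)^{r/2}\Big[\int_{\mathbb{R}^r}\exp(-\lambda(\omega)t - 2\pi\sqrt{-1}\,\xi \cdot \omega)P(\omega)\,d\omega - \int_{\mathbb{R}^r}\exp(-a\|\omega\|^2t - 2\pi\sqrt{-1}\,\xi \cdot \omega)P(0)\,d\omega\Big] \\ &= (4\pi t)^{r/2}\Big[\int_{\mathbb{R}^r}\big(\exp(-\lambda(\omega)t) - \exp(-a\|\omega\|^2t)\big)P(\omega)\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega \\ &\qquad + \int_{\mathbb{R}^r}\exp(-a\|\omega\|^2t - 2\pi\sqrt{-1}\,\xi \cdot \omega)\big(P(\omega) - P(0)\big)\,d\omega\Big] \\ &= (4\pi)^{r/2}\Big[\int_{\mathbb{R}^r}\Big(\exp\Big(-\lambda\Big(\frac{\omega}{\sqrt t}\Big)t\Big) - \exp(-a\|\omega\|^2)\Big)P\Big(\frac{\omega}{\sqrt t}\Big)\exp\Big(-2\pi\sqrt{-1}\,\xi \cdot \frac{\omega}{\sqrt t}\Big)\,d\omega \\ &\qquad + \int_{\mathbb{R}^r}\exp\Big(-a\|\omega\|^2 - 2\pi\sqrt{-1}\,\xi \cdot \frac{\omega}{\sqrt t}\Big)\Big(P\Big(\frac{\omega}{\sqrt t}\Big) - P(0)\Big)\,d\omega\Big], \end{aligned}$$
where we simply write $P(\cdot) = P(\cdot\,; \pi(x), \pi(y))$. Since
$$\lambda\Big(\frac{\omega}{\sqrt t}\Big)t \ge b\|\omega\|^2, \qquad \lim_{t\uparrow\infty}\lambda\Big(\frac{\omega}{\sqrt t}\Big)t = a\|\omega\|^2,$$
$$\lim_{t\uparrow\infty}P\Big(\frac{\omega}{\sqrt t}; p, q\Big) = P(0; p, q) \quad\text{uniformly for } p, q \in M,$$
we obtain
$$\lim_{t\uparrow\infty}(4\pi t)^{r/2}\Big[\int_{\mathbb{R}^r}\exp(-\lambda(\omega)t)P(\omega; \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega - \int_{\mathbb{R}^r}\exp(-a\|\omega\|^2t)P(0; \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega\Big] = 0.$$
On the other hand,
$$(4\pi t)^{r/2}\int_{\mathbb{R}^r}\exp(-a\|\omega\|^2t)P(0; \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega = \mathrm{vol}(M)^{r/2-1}\exp\Big(-\frac{\mathrm{vol}(M)}{4t}\,d_\Gamma(x, y)^2\Big).$$
Putting all together leads us to the local central limit theorem.

Theorem 5.1. Let $C(X) = C_0\,\mathrm{vol}(M)^{r/2-1}$. Then
$$\lim_{t\uparrow\infty}\Big[(4\pi t)^{r/2}k(t, x, y) - C(X)\exp\Big(-\frac{\mathrm{vol}(M)}{4t}\,d_\Gamma(x, y)^2\Big)\Big] = 0,$$
uniformly for all $x, y \in X$. In particular, for each fixed two points $x, y$,
$$\lim_{t\uparrow\infty}(4\pi t)^{r/2}k(t, x, y) = C(X).$$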
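The constants in Theorem 5.1 can be tested against the one case where the heat kernel is known in closed form. The sketch below assumes $X = \mathbb{R}^r$, $M = \mathbb{R}^r/L$ for a lattice $L$ of covolume $V$, and $\Gamma = L$ (so $\mathrm{Tor}(\Gamma)$ is trivial); the setting and the symbols $L$, $V$ are illustrative and not part of the text.

```latex
% Assumed setting (illustration only): X = R^r, M = R^r / L, Gamma = L, vol(M) = V.
\begin{align*}
\|v\|_{\Gamma\otimes\mathbb{R}} = V^{-1/2}|v|
 &\;\Longrightarrow\; \mathrm{vol}(M)\, d_\Gamma(x,y)^2 = |x-y|^2, \\
C_0 = \mathrm{vol}(\mathrm{Alb}^\Gamma) = V^{-r/2}\cdot V
 &\;\Longrightarrow\; C(X) = C_0\, V^{r/2-1} = 1, \\
C(X)\exp\!\Big(-\frac{\mathrm{vol}(M)}{4t}\, d_\Gamma(x,y)^2\Big)
 &= \exp\!\Big(-\frac{|x-y|^2}{4t}\Big) = (4\pi t)^{r/2}\, k_{\mathbb{R}^r}(t,x,y).
\end{align*}
```

In this flat case the difference in Theorem 5.1 vanishes identically, which is consistent with the normalization of $C(X)$ and of the exponent.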

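We now prove

Theorem 5.2. For every continuous function $f$ on the Euclidean space $\Gamma \otimes \mathbb{R}$ with compact support, and for a sequence $\{x_\delta\}_{\delta>0}$ in $X$ with $\lim_{\delta\downarrow 0}\delta\tilde\Phi_\Gamma(x_\delta) = \mathbf{x}$, we have
$$\lim_{\delta\downarrow 0}\int_X k(\delta^{-2}\mathrm{vol}(M)t, x_\delta, y)\,f(\delta\tilde\Phi_\Gamma(y))\,dy = (4\pi t)^{-r/2}\int_{\Gamma\otimes\mathbb{R}}\exp\Big(-\frac{1}{4t}\|\mathbf{x} - \mathbf{y}\|^2\Big)f(\mathbf{y})\,d\mathbf{y},$$
where $dy$ denotes the Riemannian density on $X$ and $d\mathbf{y}$ denotes the Lebesgue density on $\Gamma \otimes \mathbb{R}$. In particular,
$$\lim_{\delta\downarrow 0}\int_X k(\delta^{-2}\mathrm{vol}(M)t, x, y)\,f(\delta\tilde\Phi_\Gamma(y))\,dy = (4\pi t)^{-r/2}\int_{\Gamma\otimes\mathbb{R}}\exp\Big(-\frac{1}{4t}\|\mathbf{y}\|^2\Big)f(\mathbf{y})\,d\mathbf{y}.$$

Proof. In view of the local central limit theorem, for an arbitrarily small $\varepsilon > 0$, there is a positive number $\delta_0$ such that
$$\Big|(4\pi t')^{r/2}k(t', x, y) - \mathrm{vol}(M)^{r/2-1}C_0\exp\Big(-\frac{1}{4t}\|\delta\tilde\Phi_\Gamma(x) - \delta\tilde\Phi_\Gamma(y)\|^2\Big)\Big| < \varepsilon,$$
for $\delta \le \delta_0$, where $t' = t\,\mathrm{vol}(M)\delta^{-2}$. Dividing it by $(4\pi t')^{r/2}$ and putting $\varepsilon' = (4\pi t\,\mathrm{vol}(M))^{-r/2}\varepsilon$, we have
$$\Big|k(t', x_\delta, y) - (4\pi t)^{-r/2}\delta^r\,\mathrm{vol}(M)^{-1}C_0\exp\Big(-\frac{1}{4t}\|\delta\tilde\Phi_\Gamma(x_\delta) - \delta\tilde\Phi_\Gamma(y)\|^2\Big)\Big| < \varepsilon'\delta^r.$$
Therefore,
$$\Big|\int_X k(t', x_\delta, y)f(\delta\tilde\Phi_\Gamma(y))\,dy - (4\pi t)^{-r/2}\delta^r\,\mathrm{vol}(M)^{-1}C_0\int_X\exp\Big(-\frac{1}{4t}\|\delta\tilde\Phi_\Gamma(x_\delta) - \delta\tilde\Phi_\Gamma(y)\|^2\Big)f(\delta\tilde\Phi_\Gamma(y))\,dy\Big| < \varepsilon'\delta^r\int_X|f(\delta\tilde\Phi_\Gamma(y))|\,dy. \tag{16}$$
Recall the definition of the Riemann integral. Since the map $\Gamma \to \Gamma \otimes \mathbb{Z}$ is surjective and $\#\mathrm{Tor}(\Gamma)$-to-1, we find that, for an arbitrary continuous function $F$ on $\Gamma \otimes \mathbb{R}$ with compact support,
$$\lim_{\delta\downarrow 0}\delta^r\sum_{\sigma\in\Gamma}F(\delta(\mathbf{x} + \sigma \otimes 1)) = \frac{\#\mathrm{Tor}(\Gamma)}{\mathrm{vol}(\mathrm{Alb}^\Gamma)}\int_{\Gamma\otimes\mathbb{R}}F(\mathbf{y})\,d\mathbf{y} = C_0^{-1}\int_{\Gamma\otimes\mathbb{R}}F(\mathbf{y})\,d\mathbf{y}$$
uniformly for $\mathbf{x} \in \Gamma \otimes \mathbb{R}$. Therefore we have
$$\lim_{\delta\downarrow 0}\delta^r\int_X F(\delta\tilde\Phi_\Gamma(y))\,dy = \lim_{\delta\downarrow 0}\delta^r\sum_{\sigma\in\Gamma}\int_D F(\delta(\tilde\Phi_\Gamma(y) + \sigma \otimes 1))\,dy = C_0^{-1}\int_D\int_{\Gamma\otimes\mathbb{R}}F(\mathbf{y})\,d\mathbf{y}\,dy = \mathrm{vol}(M)C_0^{-1}\int_{\Gamma\otimes\mathbb{R}}F(\mathbf{y})\,d\mathbf{y},$$
where $D$ is a fundamental domain of the $\Gamma$-action on $X$. Substituting $F(\mathbf{y}) = \exp(-\frac{1}{4t}\|\mathbf{x} - \mathbf{y}\|^2)f(\mathbf{y})$ into the above equation, we obtain
$$\lim_{\delta\downarrow 0}\delta^r\int_X\exp\Big(-\frac{1}{4t}\|\mathbf{x} - \delta\tilde\Phi_\Gamma(y)\|^2\Big)f(\delta\tilde\Phi_\Gamma(y))\,dy = \mathrm{vol}(M)C_0^{-1}\int_{\Gamma\otimes\mathbb{R}}\exp\Big(-\frac{1}{4t}\|\mathbf{x} - \mathbf{y}\|^2\Big)f(\mathbf{y})\,d\mathbf{y}.$$
We also have
$$\lim_{\delta\downarrow 0}\delta^r\int_X|f(\delta\tilde\Phi_\Gamma(y))|\,dy = \mathrm{vol}(M)C_0^{-1}\int_{\Gamma\otimes\mathbb{R}}|f(\mathbf{y})|\,d\mathbf{y}.$$
On the other hand, there exists a positive number $\delta_1$ such that, for $\delta \le \delta_1$,
$$\Big|\exp\Big(-\frac{1}{4t}\|\delta\tilde\Phi_\Gamma(x_\delta) - \delta\tilde\Phi_\Gamma(y)\|^2\Big) - \exp\Big(-\frac{1}{4t}\|\mathbf{x} - \delta\tilde\Phi_\Gamma(y)\|^2\Big)\Big| < \varepsilon$$
uniformly in $y$, so that
$$\delta^r\Big|\int_X\Big[\exp\Big(-\frac{1}{4t}\|\delta\tilde\Phi_\Gamma(x_\delta) - \delta\tilde\Phi_\Gamma(y)\|^2\Big) - \exp\Big(-\frac{1}{4t}\|\mathbf{x} - \delta\tilde\Phi_\Gamma(y)\|^2\Big)\Big]f(\delta\tilde\Phi_\Gamma(y))\,dy\Big| < \varepsilon\delta^r\int_X|f(\delta\tilde\Phi_\Gamma(y))|\,dy$$
and
$$\Big|\lim_{\delta\downarrow 0}\delta^r\int_X\exp\Big(-\frac{1}{4t}\|\delta\tilde\Phi_\Gamma(x_\delta) - \delta\tilde\Phi_\Gamma(y)\|^2\Big)f(\delta\tilde\Phi_\Gamma(y))\,dy - \mathrm{vol}(M)C_0^{-1}\int_{\Gamma\otimes\mathbb{R}}\exp\Big(-\frac{1}{4t}\|\mathbf{x} - \mathbf{y}\|^2\Big)f(\mathbf{y})\,d\mathbf{y}\Big| \le \varepsilon\,\mathrm{vol}(M)C_0^{-1}\int_{\Gamma\otimes\mathbb{R}}|f(\mathbf{y})|\,d\mathbf{y}.$$
Since $\varepsilon$ is arbitrarily small, combining this with (16) we find
$$\lim_{\delta\downarrow 0}\int_X k(t\,\mathrm{vol}(M)\delta^{-2}, x_\delta, y)\,f(\delta\tilde\Phi_\Gamma(y))\,dy = (4\pi t)^{-r/2}\int_{\Gamma\otimes\mathbb{R}}\exp\Big(-\frac{1}{4t}\|\mathbf{x} - \mathbf{y}\|^2\Big)f(\mathbf{y})\,d\mathbf{y}.$$
□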
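The Riemann-sum step in the proof above, $\lim_{\delta\downarrow 0}\delta^r\sum_{\sigma}F(\delta(\mathbf{x} + \sigma \otimes 1)) = C_0^{-1}\int F$, can be illustrated numerically. The sketch below assumes the simplest case $\Gamma = \mathbb{Z}$ with $\Gamma \otimes \mathbb{R} = \mathbb{R}$ normalized so that $C_0 = 1$; the function names and the tent-shaped test function are choices made here, not taken from the text.

```python
import math


def scaled_lattice_sum(F, delta, x=0.0, cutoff=2.0):
    """delta * sum over integers n of F(delta * (x + n)).

    The sum is truncated to a window that covers |delta * (x + n)| up to
    roughly `cutoff`; `cutoff` is assumed to exceed the support radius of F
    by at least delta, so no nonzero term is dropped."""
    n_max = int(cutoff / delta) + 1
    base = math.floor(-x)  # integer n for which x + n lies in (-1, 0]
    total = 0.0
    for n in range(base - n_max, base + n_max + 1):
        total += F(delta * (x + n))
    return delta * total


def tent(y):
    """Continuous, compactly supported test function with integral 1."""
    return max(0.0, 1.0 - abs(y))


if __name__ == "__main__":
    for delta in (0.3, 0.1, 0.03, 0.01):
        print(delta, scaled_lattice_sum(tent, delta, x=0.37))
```

The printed values approach $\int \mathrm{tent} = 1$ as `delta` decreases, and the limit does not depend on the shift `x`, matching the uniformity in $\mathbf{x}$ used in the proof.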

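We shall explain what we have proved above in terms of stochastic processes. Let $B(t) : (\Omega, P) \to X$ $(t \ge 0)$ be the Brownian motion on $X$, namely for each $x \in X$, and for any continuous function $f$ on $X$ with compact support,
$$E(f(B(t)) \mid \Omega_x) = \int_X k(t, x, y)f(y)\,dy,$$
where $\Omega_x = \{\omega \in \Omega \mid B(0)(\omega) = x\}$ and $E(\,\cdot \mid \Omega_x)$ denotes the conditional expectation.

Theorem 5.3. Let $B^0(t)$ be the Brownian motion on the Euclidean space $\Gamma \otimes \mathbb{R}$, and let $\{x_\delta\}_{\delta>0}$ be a sequence in $X$ with $\lim_{\delta\downarrow 0}\delta\tilde\Phi_\Gamma(x_\delta) = \mathbf{x}$. Then, for any continuous function $f$ on $\Gamma \otimes \mathbb{R}$ with compact support, we have
$$\lim_{\delta\downarrow 0}E\big(f(\delta\tilde\Phi_\Gamma(B(t\,\mathrm{vol}(M)\delta^{-2}))) \mid \Omega_{x_\delta}\big) = E\big(f(B^0(t)) \mid \Omega_{\mathbf{x}}\big),$$
namely,
$$\lim_{\delta\downarrow 0}\delta\tilde\Phi_\Gamma(B(t\,\mathrm{vol}(M)\delta^{-2}))\big|_{\Omega_{x_\delta}} = B^0(t)\big|_{\Omega_{\mathbf{x}}}$$
in distribution.

Proof. It suffices to note
$$E\big(f(\delta\tilde\Phi_\Gamma(B(t\,\mathrm{vol}(M)\delta^{-2}))) \mid \Omega_{x_\delta}\big) = \int_X k(t\,\mathrm{vol}(M)\delta^{-2}, x_\delta, y)\,f(\delta\tilde\Phi_\Gamma(y))\,dy,$$
$$E\big(f(B^0(t)) \mid \Omega_{\mathbf{x}}\big) = (4\pi t)^{-r/2}\int_{\Gamma\otimes\mathbb{R}}\exp\Big(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{4t}\Big)f(\mathbf{y})\,d\mathbf{y}.$$
□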

6. Proof of Theorem 3 and Theorem 4

We keep the notation of the previous section. Since $\lambda(0) = 0$, $\lambda(-\omega) = \lambda(\omega)$ and $\mathrm{Hess}\,\lambda|_{\omega=0}(v, v) = 8\pi^2\,\mathrm{vol}(M)^{-1}\|v\|^2$, we may apply the Morse lemma in a refined form to find a new coordinate system $(U'', u)$ around $\omega = 0$ such that
$$\begin{cases} \omega(u) = -\omega(-u), \\ \omega(u) = u + h(u), \quad h(u) = O(\|u\|^3), \\ \lambda(\omega(u)) = a\|u\|^2 \end{cases} \tag{17}$$
for $a = 4\pi^2\,\mathrm{vol}(M)^{-1}$. We then have
$$\begin{aligned} &\int_{U'}\exp(-\lambda(\omega)t)\,P(\omega; \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot \omega)\,d\omega \\ &= \int_{U''}\exp(-a\|u\|^2t)\,P(\omega(u); \pi(x), \pi(y))\exp(-2\pi\sqrt{-1}\,\xi \cdot h(u))\exp(-2\pi\sqrt{-1}\,\xi \cdot u)\,\Big|\frac{\partial\omega}{\partial u}\Big|\,du, \end{aligned}$$
where $\xi = \tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x)$, and
$$\Big|\frac{\partial\omega}{\partial u}\Big| = \Big|\frac{\partial(w_1, \ldots, w_r)}{\partial(u_1, \ldots, u_r)}\Big|$$
is the Jacobian of the coordinate change. Extend $h(u)$, $P(\omega(u); p, q)$ and $|\frac{\partial\omega}{\partial u}|$ smoothly to functions on $\mathbb{R}^r$ with compact supports in the variable $u$ respectively, and denote those functions by the same symbols. Put
$$Q(\xi, u; p, q) = P(\omega(u); p, q)\exp(-2\pi\sqrt{-1}\,\xi \cdot h(u))\,\Big|\frac{\partial\omega}{\partial u}\Big|.$$
Since $|k(t, x, y) - C_0F_{at}Q(\xi; \pi(x), \pi(y))|$ has an exponential decay as $t$ goes to infinity, the asymptotic expansion of $k(t, x, y)$ reduces to that of $C_0F_{at}Q(\xi; \pi(x), \pi(y))$. To apply the discussion in Sect. 4, we should note that the Taylor expansion
$$Q(\xi, u; p, q) \sim \sum_\alpha q_\alpha(\xi; p, q)u^\alpha$$
has the coefficients $q_\alpha(\xi; p, q)$ which are polynomials in $\xi$ with degree less than $|\alpha|$ $(|\alpha| \ne 0)$. In fact, this comes from the fact that the Taylor expansion of $h(u)$ at $u = 0$ starts from a polynomial with degree greater than two. We should also note that $q_0(\xi; p, q) = Q(\xi, 0; p, q) = P(0; p, q) = \mathrm{vol}(M)^{-1}$ since $h(0) = 0$ and $|\frac{\partial\omega}{\partial u}(0)| = 1$. Hence
$$\begin{aligned} F_{at}Q(\xi; p, q) &\sim \Big(\frac{\pi}{at}\Big)^{r/2}\exp\Big(-\frac{\pi^2}{at}|\xi|^2\Big)\mathrm{vol}(M)^{-1}\sum_{l=0}^{\infty}\beta_l(p, q; \xi)t^{-l} \\ &= \Big(\frac{\mathrm{vol}(M)}{4\pi t}\Big)^{r/2}\exp\Big(-\frac{\mathrm{vol}(M)}{4t}\|\tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x)\|^2\Big) \times \mathrm{vol}(M)^{-1}\sum_{l=0}^{\infty}\beta_l(p, q; \xi)t^{-l}, \end{aligned}$$
where $\beta_l(p, q; \xi)$ are polynomials in $\xi$ with $\deg\beta_l(p, q; \xi) < 2l$ for $l \ge 1$, and $\beta_0 \equiv 1$. Therefore we have
$$k(t, x, y) \sim (4\pi t)^{-r/2}C(X)\exp\Big(-\frac{\mathrm{vol}(M)}{4t}\,d_\Gamma(x, y)^2\Big) \times \big(1 + b_1(x, y)t^{-1} + b_2(x, y)t^{-2} + \ldots\big),$$
where $b_i(x, y) = \beta_i(\pi(x), \pi(y); \tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x))$. This completes the proof of Theorem 3. For Theorem 4, take a look at the coefficient
$$c_l(x, y) = \frac{1}{l!}\Big(-\frac{\mathrm{vol}(M)}{4}\Big)^l|\xi|^{2l} + \sum_{i=1}^{l}\beta_i(\pi(x), \pi(y); \xi)\,\frac{1}{(l-i)!}\Big(-\frac{\mathrm{vol}(M)}{4}\Big)^{l-i}|\xi|^{2(l-i)}$$

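for the asymptotic expansion
$$k(t, x, y) \sim (4\pi t)^{-r/2}C(X)\big(1 + c_1(x, y)t^{-1} + c_2(x, y)t^{-2} + \cdots\big) \qquad (t \uparrow \infty).$$
Since $|\xi|^{-2l}\big(\beta_i(\pi(x), \pi(y); \xi)|\xi|^{2(l-i)}\big)$ goes to zero as $|\xi| = d_\Gamma(x, y) \uparrow \infty$, we conclude that
$$c_l(x, y) \sim \frac{1}{l!}\Big(-\frac{\mathrm{vol}(M)}{4}\Big)^l d_\Gamma(x, y)^{2l}.$$
This proves Theorem 4.

7. Proof of Theorem 5

To determine the coefficient $c_1$ in Theorem 4, we need to compute $\sum_{i=1}^{r}q_{ii}$ for the coefficient $q_{ij}$ in the expansion
$$Q(\xi, u; p, q) = P(\omega; p, q)\exp(-2\pi\sqrt{-1}\,\xi \cdot h(u))\,\Big|\frac{\partial\omega}{\partial u}\Big| = q_0 + \sum_{i,j=1}^{r}q_{ij}u_iu_j + \cdots \qquad (q_0 = \mathrm{vol}(M)^{-1}),$$
(see Lemma 4.5). For this purpose, take the eigenfunction $\phi_\tau$ of $H_\tau$ with the eigenvalue $\lambda(\tau)$ as in Sect. 3. In view of Lemma 3.2, we have
$$\lambda_0^{(2)} = \frac{8\pi^2}{\mathrm{vol}(M)}\|\omega\|^2, \qquad \lambda_0^{(4)} = -6\,\frac{(8\pi^2)^2}{\mathrm{vol}(M)}\int_M G|\omega|^2 \cdot |\omega|^2.$$
Therefore the Taylor expansion of $\lambda = \lambda(\omega)$ around $\omega = 0$ is given by
$$\lambda(\omega) = \frac{1}{2}\lambda^{(2)}(0) + \frac{1}{4!}\lambda^{(4)}(0) + \ldots = \frac{4\pi^2}{\mathrm{vol}(M)}\|\omega\|^2 - \frac{(4\pi^2)^2}{\mathrm{vol}(M)}\int_M G|\omega|^2 \cdot |\omega|^2 + O(\|\omega\|^6).$$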

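Using the coordinate change $\omega = \omega(u) = u + h(u)$ introduced above, and noting $\|h(u)\| = O(\|u\|^3)$, we find
$$\begin{aligned} \frac{4\pi^2}{\mathrm{vol}(M)}\|u\|^2 &= \frac{4\pi^2}{\mathrm{vol}(M)}\|u + h(u)\|^2 - \frac{(4\pi^2)^2}{\mathrm{vol}(M)}\int_M G|u + h(u)|^2 \cdot |u + h(u)|^2 + O(\|u\|^6) \\ &= \frac{4\pi^2}{\mathrm{vol}(M)}\Big(\|u\|^2 + 2\langle\!\langle u, \tilde h(u) \rangle\!\rangle - 4\pi^2\int_M G|u|^2 \cdot |u|^2 + O(\|u\|^6)\Big), \end{aligned}$$
where $\tilde h(u)$ is the 3rd order term in the Taylor expansion of $h$, namely $\tilde h(u)$ is a vector-valued homogeneous polynomial of degree 3. Comparing both sides, we find that
$$\langle\!\langle u, \tilde h(u) \rangle\!\rangle = 2\pi^2\int_M G|u|^2 \cdot |u|^2.$$
On the other hand, since
$$\phi_0 \equiv \frac{1}{\mathrm{vol}(M)^{1/2}}, \qquad \phi_0^{(2)} = -\frac{8\pi^2}{\mathrm{vol}(M)^{1/2}}\,G|\omega|^2,$$
we have
$$\begin{aligned} P(\omega(u); p, q) &= \phi_{\omega(u)}(p)\,\overline{\phi_{\omega(u)}(q)} = \Big(\phi_0 + \frac{1}{2}\phi_0^{(2)}(p) + \ldots\Big)\Big(\phi_0 + \frac{1}{2}\phi_0^{(2)}(q) + \ldots\Big) \\ &= \frac{1}{\mathrm{vol}(M)}\Big(1 - 4\pi^2\big(G(|u|^2)(p) + G(|u|^2)(q)\big)\Big) + O(\|u\|^3). \end{aligned}$$
We write $u = \sum_{i=1}^{r}u_i\omega_i$ and $\tilde h = \sum_{i=1}^{r}h_i(u)\omega_i$. Keeping in mind that
$$\exp(-2\pi\sqrt{-1}\,\xi \cdot h(u)) = 1 + O(\|u\|^3), \qquad \Big|\frac{\partial\omega}{\partial u}\Big| = \Big|\delta_{ij} + \frac{\partial h_i}{\partial u_j}\Big| = 1 + \sum_{i=1}^{r}\frac{\partial h_i}{\partial u_i} + O(\|u\|^4),$$
we obtain
$$\begin{aligned} Q(\xi, u; p, q) &= P(\omega; p, q)\exp(-2\pi\sqrt{-1}\,\xi \cdot h(u))\,\Big|\frac{\partial\omega}{\partial u}\Big| \\ &= \frac{1}{\mathrm{vol}(M)}\Big(1 - 4\pi^2\sum_{i,j=1}^{r}\big[(G\langle \omega_i, \omega_j \rangle)(p) + (G\langle \omega_i, \omega_j \rangle)(q)\big]u_iu_j + \sum_{i=1}^{r}\frac{\partial h_i}{\partial u_i}\Big) + O(\|u\|^3). \end{aligned}$$
We write
$$\sum_{i=1}^{r}\frac{\partial h_i}{\partial u_i} = \sum_{i,j=1}^{r}a_{ij}u_iu_j.$$
Since
$$\sum_{i=1}^{r}a_{ii} = \frac{1}{2}\sum_{i=1}^{r}\frac{\partial^2}{\partial u_i^2}\sum_{j=1}^{r}\frac{\partial h_j}{\partial u_j},$$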

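it suffices to compute the right-hand side. To this end, we first note
$$\frac{\partial^2}{\partial u_i^2}\frac{\partial^2}{\partial u_j^2}\langle\!\langle u, \tilde h(u) \rangle\!\rangle = 2\,\frac{\partial^2}{\partial u_i^2}\frac{\partial h_j}{\partial u_j} + 2\,\frac{\partial^2}{\partial u_j^2}\frac{\partial h_i}{\partial u_i},$$
so that
$$\frac{1}{2}\sum_{i=1}^{r}\frac{\partial^2}{\partial u_i^2}\sum_{j=1}^{r}\frac{\partial h_j}{\partial u_j} = \frac{1}{8}\sum_{i,j=1}^{r}\frac{\partial^2}{\partial u_i^2}\frac{\partial^2}{\partial u_j^2}\langle\!\langle u, \tilde h(u) \rangle\!\rangle.$$
Recall
$$\langle\!\langle u, \tilde h(u) \rangle\!\rangle = 2\pi^2\int_M G|u|^2 \cdot |u|^2, \qquad \frac{\partial^2}{\partial u_i\partial u_j}|u|^2 = 2\langle \omega_i, \omega_j \rangle,$$
and obtain
$$\begin{aligned} &\sum_{i,j=1}^{r}\frac{\partial^2}{\partial u_i^2}\frac{\partial^2}{\partial u_j^2}\Big(2\pi^2\int_M G|u|^2 \cdot |u|^2\Big) \\ &= 4\pi^2\int_M\Big(\sum_{i,j=1}^{r}G\Big(\frac{\partial^2}{\partial u_i^2}|u|^2\Big) \cdot \frac{\partial^2}{\partial u_j^2}|u|^2 + 2\sum_{i,j=1}^{r}G\Big(\frac{\partial^2}{\partial u_i\partial u_j}|u|^2\Big) \cdot \frac{\partial^2}{\partial u_i\partial u_j}|u|^2\Big) \\ &= 16\pi^2\int_M\Big(G\Big(\sum_{i=1}^{r}|\omega_i|^2\Big) \cdot \sum_{j=1}^{r}|\omega_j|^2 + 2\sum_{i,j=1}^{r}G(\langle \omega_i, \omega_j \rangle) \cdot \langle \omega_i, \omega_j \rangle\Big). \end{aligned}$$
Therefore we have
$$\frac{1}{2}\sum_{i=1}^{r}\frac{\partial^2}{\partial u_i^2}\sum_{k=1}^{r}\frac{\partial h_k}{\partial u_k} = 2\pi^2\int_M\Big(G\Big(\sum_{i=1}^{r}|\omega_i|^2\Big) \cdot \sum_{j=1}^{r}|\omega_j|^2 + 2\sum_{i,j=1}^{r}G(\langle \omega_i, \omega_j \rangle) \cdot \langle \omega_i, \omega_j \rangle\Big),$$
and
$$\sum_{i=1}^{r}q_{ii} = \frac{1}{\mathrm{vol}(M)}\Big\{-4\pi^2\Big[G\Big(\sum_{i=1}^{r}|\omega_i|^2\Big)(p) + G\Big(\sum_{i=1}^{r}|\omega_i|^2\Big)(q)\Big] + 2\pi^2\Big[\int_M G\Big(\sum_{i=1}^{r}|\omega_i|^2\Big) \cdot \sum_{j=1}^{r}|\omega_j|^2 + 2\int_M\sum_{i,j=1}^{r}G\langle \omega_i, \omega_j \rangle \cdot \langle \omega_i, \omega_j \rangle\Big]\Big\}.$$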

Putting it all together, we finally obtain the formula for c1 : r

c1 =

1 −1 X π2 2 q0 |ξ | qii (ξ ) − 2a a i=1

r r X X 1 1 = − vol(M)|ξ |2 − vol(M) G( |ωi |2 )(π(x)) + G( |ωi |2 )(π(y)) 4 2 i=1 i=1 Z r r r X X X 1 |ωi |2 ) · |ωj |2 + 2 G(< ωi , ωj >)· < ωi , ωj > . G( + vol(M) 4 M i=1

j =1

i,j =1

This completes the proof of Theorem 5 since |ξ | = d0 (x, y). Proposition 7.1. The pseudo-distance d∗ (x, y) := vol(0\X)1/2 d0 (x, y) does not depend on the choice of a group 0. Proof. Since c1 (x, y), especially c1 (x, x), depends only on X, we find that the function X r 2 |ωi | )(π(x) − vol(0\X)G +

vol(0\X) 4

Z

i=1

0\X

X X r r r X |ωi |2 · |ωi |2 + 2 (G < ωi , ωj >) < ωi , ωj > G i=1

i=1

i,j =1

t does not depend on the choice of 0, therefore nor does d∗ (x, y). u We now give a condition in order that c1 (x, y) has no extra terms, namely coincides with its principal term − vol(M)d0 (x, y)2 /4. 1 2 Proposition 7.2. If c1 (x, y) = − vol(M) 4 d0 (x, y) , then < ωi , ωj >≡ vol(M) δij . In other ∗ words, the evaluation map Hom(0, R) → Tp M defined by ω 7→ ω(p) gives homothetic injection for every p ∈ M, or equivalently the 0-Albanese map 80 : M → Alb0 is a homothetic Riemannian submersion.

Proof. If the assumption holds, we have, for every $p, q$ in $M$,
\[
G\Big(\sum_{i=1}^r|\omega_i|^2\Big)(p) + G\Big(\sum_{i=1}^r|\omega_i|^2\Big)(q)
= \frac12\int_M G\Big(\sum_{i=1}^r|\omega_i|^2\Big)\cdot\sum_{i=1}^r|\omega_i|^2
+ \sum_{i,j=1}^r\int_M G(\langle\omega_i,\omega_j\rangle)\cdot\langle\omega_i,\omega_j\rangle,
\]
which implies
\[
G\Big(\sum_{i=1}^r|\omega_i|^2\Big) \equiv \text{constant}.
\]
Therefore $\sum_{i=1}^r|\omega_i|^2$ is a constant function, and hence
\[
G\Big(\sum_{i=1}^r|\omega_i|^2\Big) \equiv 0,
\qquad
\sum_{i,j=1}^r\int_M G(\langle\omega_i,\omega_j\rangle)\cdot\langle\omega_i,\omega_j\rangle = 0.
\]
Since $G$ is a positive operator, we conclude $G\langle\omega_i,\omega_j\rangle \equiv 0$ for all $i, j$, and $\langle\omega_i,\omega_j\rangle \equiv c_{ij}$ for some constant $c_{ij}$. Since $c_{ij} = \mathrm{vol}(M)^{-1}\langle\!\langle\omega_i,\omega_j\rangle\!\rangle = \mathrm{vol}(M)^{-1}\delta_{ij}$, the proof is completed. □

Proposition 7.3. If $\Phi_\Gamma : M \to \mathrm{Alb}_\Gamma$ is a homothetic Riemannian submersion, then
\[
k(t,x,y) \sim (4\pi t)^{-r/2}\exp\Big(-\frac{\mathrm{vol}(M)\,d_\Gamma(x,y)^2}{4t}\Big) \qquad (t \uparrow \infty),
\]
where the right-hand side should be considered as an asymptotic power series in $t^{-1}$. In particular, $c_1(x,y) = -\frac{\mathrm{vol}(M)}{4}d_\Gamma(x,y)^2$.

Proof. Since $|\omega|$ is constant, $4\pi^2|\omega|^2$ is an eigenvalue of $H_\omega$ with a constant eigenfunction. Thus we have $\lambda_0(\omega) = 4\pi^2|\omega|^2$, $\varphi_\omega \equiv \mathrm{vol}(M)^{-1}$ and
\[
\begin{aligned}
\int_U \exp(-\lambda_0(\chi)t)\,\tilde s_{\chi,0}(x)\,\tilde s_{\chi,0}(y)\,d\chi
&= \frac{C_0}{\mathrm{vol}(M)}\int_U \exp(-a\|\omega\|^2 t)\exp(-2\pi\sqrt{-1}\langle\xi,\omega\rangle)\,d\omega \\
&\sim \frac{C_0}{\mathrm{vol}(M)}\int_{\mathbb{R}^r} \exp(-a\|\omega\|^2 t)\exp(-2\pi\sqrt{-1}\langle\xi,\omega\rangle)\,d\omega,
\end{aligned}
\]
where $a = 4\pi^2\,\mathrm{vol}(M)^{-1}$ and $\xi = \tilde\Phi_\Gamma(y) - \tilde\Phi_\Gamma(x)$, from which the claim follows immediately. □

8. The Asymptotics for the Transition Probability

In this section, we see how our idea applies to the asymptotics of a random walk. Our argument does not differ much from the case of heat kernels. A minor change is required in the computation of the asymptotic expansion. We also need to handle with care the case of bipartite graphs. For the fundamental facts and the settings, we refer to [4].

Let $X = (V, E)$ be a locally finite connected graph, $V$ being the set of vertices and $E$ being the set of all oriented edges. For an oriented edge $e \in E$, the origin and the terminus of $e$ are denoted by $o(e)$ and $t(e)$, respectively. The inverse edge of $e$ is denoted by $\bar e$.


A reversible random walk on $X$ is given by positive valued functions $p$ on $E$ and $m$ on $V$ satisfying
\[
\sum_{e\in E_x} p(e) = 1 \quad\text{for } x \in V,
\qquad
p(e)\,m(o(e)) = p(\bar e)\,m(t(e)) \quad\text{for } e \in E,
\]
where $E_x = \{e \in E : o(e) = x\}$. Define the operator $L$ acting on functions on $V$ by
\[
(Lf)(x) = \sum_{e\in E_x} p(e)\,f(t(e)).
\]
The operator $L$ is what we call the transition operator associated with the random walk. The transition probability $p(n,x,y)$ is then defined as the "kernel function" of the $n$th iteration $L^n$; that is,
\[
(L^n f)(x) = \sum_{y\in V} p(n,x,y)\,f(y),
\]
and $p(n,x,y)\,m(y)^{-1}$ $(= p(n,y,x)\,m(x)^{-1})$ is regarded as a discrete analogue of the heat kernel.

In connection with the transition operator, and for other purposes, let us introduce the notion of the discrete Laplacian associated with the weight functions
\[
m_V(x) = m(x) \ (x \in V), \qquad m_E(e) = p(e)\,m(o(e)) \ (e \in E).
\]
Note that $m_E(\bar e) = m_E(e)$. Regarding $X$ as a 1-dimensional CW-complex, we let $C^i(X)$ $(i = 0, 1)$ be the cochain group with coefficients in $\mathbb{C}$. Define the linear operators $d : C^0(X) \to C^1(X)$ and $\delta : C^1(X) \to C^0(X)$ as
\[
df(e) = f(t(e)) - f(o(e)) \quad\text{for } f \in C^0(X),
\]
\[
(\delta\omega)(x) = -\frac{1}{m_V(x)}\sum_{e\in E_x} m_E(e)\,\omega(e) = -\sum_{e\in E_x} p(e)\,\omega(e) \quad\text{for } \omega \in C^1(X).
\]
The discrete Laplacian $\Delta$ is then defined by
\[
(\Delta f)(x) = (\delta df)(x) = -\frac{1}{m_V(x)}\sum_{e\in E_x}\big(f(t(e)) - f(o(e))\big)\,m_E(e).
\]
We easily find $L = I - \Delta$.

When $X$ is a finite graph, we may introduce a discrete analogue of the Green operator $G$ as in the case of compact Riemannian manifolds. Namely, $G$ is the operator acting on functions on $V$ satisfying
\[
I = H + G\Delta, \qquad G\Delta = \Delta G, \qquad GH = HG = O,
\]
where $Hf = m(V)^{-1}\sum_{x\in V} f(x)\,m(x)$ with $m(V) = \sum_{x\in V} m(x)$. To pursue an analogy with the continuous case, we occasionally write
\[
\int_V f := \sum_{x\in V} f(x)\,m(x).
\]
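The definitions above are easy to test on a small finite graph. The following is a minimal sketch, not part of the original argument: the graph (a triangle with one pendant vertex), the test function `f`, and all function names are illustrative assumptions. It uses the isotropic weights $p(e) = 1/\deg o(e)$, $m(x) = \deg x$, and checks that the rows of $L$ sum to one, that the walk is reversible, that $L = I - \Delta$, and that $\Delta f$ has mean zero with respect to $m$.

```python
from fractions import Fraction

# Hypothetical example graph: a triangle with a pendant vertex.
# Each unoriented edge contributes an oriented edge and its inverse.
UNORIENTED = [(0, 1), (1, 2), (2, 0), (2, 3)]
VERTICES = [0, 1, 2, 3]
EDGES = UNORIENTED + [(t, o) for (o, t) in UNORIENTED]  # (origin, terminus)


def degree(x):
    return sum(1 for (o, _) in EDGES if o == x)


# Isotropic walk: p(e) = 1/deg o(e), m(x) = deg x, hence m_E(e) = 1.
def p(e):
    return Fraction(1, degree(e[0]))


def m(x):
    return Fraction(degree(x))


def transition(f):
    """(Lf)(x) = sum over e in E_x of p(e) f(t(e))."""
    return {x: sum(p(e) * f[e[1]] for e in EDGES if e[0] == x) for x in VERTICES}


def laplacian(f):
    """(Delta f)(x) = -(1/m_V(x)) sum over e in E_x of (f(t(e)) - f(o(e))) m_E(e)."""
    return {
        x: -sum((f[e[1]] - f[e[0]]) * p(e) * m(e[0]) for e in EDGES if e[0] == x) / m(x)
        for x in VERTICES
    }


def mean(f):
    """Hf = m(V)^{-1} sum_x f(x) m(x)."""
    return sum(f[x] * m(x) for x in VERTICES) / sum(m(x) for x in VERTICES)


f = {0: Fraction(3), 1: Fraction(-1), 2: Fraction(2), 3: Fraction(5)}
Lf, Df = transition(f), laplacian(f)

rows_sum_to_one = all(sum(p(e) for e in EDGES if e[0] == x) == 1 for x in VERTICES)
reversible = all(p(e) * m(e[0]) == p((e[1], e[0])) * m(e[1]) for e in EDGES)
l_equals_i_minus_delta = all(Lf[x] == f[x] - Df[x] for x in VERTICES)
laplacian_has_mean_zero = mean(Df) == 0
```

Exact rational arithmetic is used so that the identities hold as equalities rather than up to rounding.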

We now suppose that $X$ is a lattice graph. Namely, $X$ is a graph which has an abelian group $\Gamma$ acting freely on $X$ as an automorphism group and has a finite quotient graph $\Gamma\backslash X = X_0 = (V_0, E_0)$. We also assume that
\[
p(\sigma e) = p(e), \qquad m(\sigma x) = m(x) \qquad (e \in E,\ x \in V,\ \sigma \in \Gamma),
\]
so that we have a reversible random walk on $X_0$ for which we make use of the same symbols $p, m$. We denote by $N$ the number of vertices in $X_0$.

Identify the 1-cohomology group $H^1(X_0,\mathbb{R})$ of $X_0$ with the space of harmonic 1-forms
\[
\{\omega : E_0 \to \mathbb{R} \mid \omega(\bar e) = -\omega(e),\ \delta\omega = 0\}
\]
(a discrete analogue of the Kodaira–Hodge theorem), and introduce the inner product on $H^1(X_0,\mathbb{R})$ given by
\[
\langle\!\langle\omega_1,\omega_2\rangle\!\rangle = \int_{V_0}\langle\omega_1,\omega_2\rangle \qquad (\omega_1,\omega_2 \in H^1(X_0,\mathbb{R})),
\]
where
\[
\langle\omega_1,\omega_2\rangle(p) = \frac12\sum_{e\in E_{0p}} p(e)\,\omega_1(e)\,\omega_2(e) \qquad (p \in V_0).
\]
As in the continuous case, we equip $\mathrm{Hom}(\Gamma,\mathbb{R})$ and $\Gamma\otimes\mathbb{R}$ with inner products via the surjective homomorphism of $H_1(X_0,\mathbb{Z})$ onto $\Gamma$ associated with the covering map $X \to X_0$, and define the map $\tilde\Phi_\Gamma : V \to \Gamma\otimes\mathbb{R}$ by
\[
\langle\tilde\Phi_\Gamma(x),\omega\rangle = \int_{x_0}^{x}\tilde\omega \qquad (\omega \in \mathrm{Hom}(\Gamma,\mathbb{R})),
\]
where $x_0 \in V$ is a reference point, $\tilde\omega$ is the lift of $\omega$ to $V$, and $\int_{x_0}^{x}\tilde\omega = \sum_{i=1}^{l}\tilde\omega(e_i)$ for a path $c = (e_1,\ldots,e_l)$ with $o(c) = x_0$ and $t(c) = x$. The standard realization of $X$ in $\Gamma\otimes\mathbb{R}$ is defined to be the interpolation of $\tilde\Phi_\Gamma$ by line segments. The pseudo-distance $d_\Gamma$ is defined in a similar way. It is readily checked that $d_\Gamma$ is roughly equivalent to the combinatorial distance $d$ on the graph $X$. We also define the $\Gamma$-Albanese torus $\mathrm{Alb}_\Gamma$ as the torus $\Gamma\otimes\mathbb{R}/\Gamma\otimes\mathbb{Z}$ with the flat metric induced from the inner product on $\Gamma\otimes\mathbb{R}$. A characterization of the Albanese map similar to that in Theorem 2.2 has been established in [5].

We are ready to prove Theorem 7 as well as discrete analogues of Theorems 4, 5. For a unitary character $\chi$ of $\Gamma$, consider the $N$-dimensional inner product space
\[
\ell^2_\chi = \{f : X \to \mathbb{C} \mid f(\sigma x) = \chi(\sigma)f(x) \text{ for } \sigma \in \Gamma\}
\]


with the inner product
\[
\langle f_1, f_2\rangle_\chi = \int_D f_1\,\overline{f_2},
\]
where $D \subset V$ is a fundamental set for the $\Gamma$-action. We may identify $\ell^2_1$ with the space $\ell^2(V_0) = \{f : V_0 \to \mathbb{C}\}$. The operator $L$ preserves $\ell^2_\chi$, i.e. $L(\ell^2_\chi) \subset \ell^2_\chi$. We define the twisted transition operator $L_\chi$ as the restriction of $L$ to $\ell^2_\chi$. It should be pointed out that $\Delta_\chi = I - L_\chi$ is a discrete analogue of the twisted Laplacian.

We enumerate the eigenvalues $\mu_i(\chi) \in [-1, 1]$ $(i = 1, \ldots, N)$ of the operator $L_\chi$ with repetition according to the multiplicity:
\[
\mu_1(\chi) \ge \mu_2(\chi) \ge \cdots \ge \mu_N(\chi).
\]
It should be noted that $\mu_1(1) = 1$ and $\mu_1(\chi)$ is simple for a character $\chi$ around the trivial one. Let $s_{\chi,1}, \ldots, s_{\chi,N}$ be an orthonormal basis of $\ell^2_\chi$ with $L_\chi s_{\chi,i} = \mu_i(\chi)s_{\chi,i}$ $(i = 1, \ldots, N)$. One can take $s_{\chi,i}$ in such a way that the $s_{\chi,i}$ are integrable in $\chi$, $s_{1,1} \equiv m(V_0)^{-1/2}$ and $s_{\chi,1}$ is smooth around $\chi = 1$. In just the same way as in the continuous case, we have the following integral expression for $p(n,x,y)$:
\[
p(n,x,y)\,m(y)^{-1} = \int_{\hat\Gamma}\sum_{i=1}^{N}\mu_i(\chi)^n\,s_{\chi,i}(x)\,\overline{s_{\chi,i}(y)}\,d\chi, \tag{18}
\]
and hence its asymptotic as $n$ goes to infinity is controlled by the characters $\chi$ with $\mu_1(\chi) = 1$ or $\mu_N(\chi) = -1$.

Lemma 8.1 ([4]).
1. $\mu_1(\chi) = 1 \iff \chi = 1$,
2. $\mu_N(\chi) = -1$ for some character $\chi$ $\iff$ $X$ is bipartite.

To see how $\mu_i(\chi)$ and $s_{\chi,i}$ depend on $\chi$, we imitate the method of Sect. 3. First, we identify $\omega$ and $\chi_\omega$ via the correspondence $\omega \mapsto \chi_\omega := \exp(2\pi\sqrt{-1}\langle\omega,\cdot\rangle)$. Then define the function $s_\omega$ by
\[
s_\omega(x) = \exp\Big(2\pi\sqrt{-1}\int_{x_0}^{x}\tilde\omega\Big). \tag{19}
\]
It is readily checked that $s_\omega \in \ell^2_{\chi_\omega}$. Define a unitary map $S : \ell^2(V_0) \to \ell^2_\chi$ by $S(f) = \tilde f s_\omega$, where $\tilde f$ is the lift of $f \in \ell^2(V_0)$ to $X$, and put
\[
H_\omega = S^{-1}L_{\chi_\omega}S : \ell^2(V_0) \to \ell^2(V_0).
\]
We find
\[
(H_\omega f)(x) = \sum_{e\in E_{0x}} p(e)\exp\big(2\pi\sqrt{-1}\,\omega(e)\big)f(t(e)).
\]
Put $\varphi_\omega = S^{-1}s_{\chi,0} \in \ell^2(V_0)$. The behavior of $\mu_1(\chi)$ around the trivial character $\chi = 1$ can be analyzed in much the same way as the first eigenvalues of the twisted Laplacian.
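The formula for $H_\omega$ can be made concrete on the smallest bipartite example. The sketch below is an illustration under stated assumptions, not taken from the paper: it takes the quotient of the line $\mathbb{Z}$ by the subgroup $2\mathbb{Z}$ (two vertices joined by two edges $a$, $b$), the isotropic walk with $p = 1/2$, and the harmonic 1-form with $\omega(a) = \omega(b) = w$; the names `twisted_operator` and `apply` are hypothetical. In this case $H_\omega$ has eigenvalues $\pm\cos(2\pi w)$, so at $w = 0$ one recovers $\mu_1 = 1$ and $\mu_N = -1$, in line with Lemma 8.1.

```python
import cmath
import math

# Hypothetical quotient graph X_0 = (2Z)\Z: vertices 0 and 1, two unoriented
# edges a and b. Oriented edges are (name, origin, terminus, sign), where
# sign = +1 for a, b and -1 for their inverses, so omega(e) = sign * w.
ORIENTED = [
    ("a", 0, 1, +1), ("a_inv", 1, 0, -1),
    ("b", 1, 0, +1), ("b_inv", 0, 1, -1),
]
P = 0.5  # isotropic walk: every vertex has degree 2


def twisted_operator(w):
    """Matrix of H_omega for the harmonic 1-form with omega(a) = omega(b) = w."""
    h = [[0j, 0j], [0j, 0j]]
    for _, o, t, sign in ORIENTED:
        # (H_omega f)(x) = sum_{e in E_0x} p(e) exp(2 pi i omega(e)) f(t(e))
        h[o][t] += P * cmath.exp(2j * math.pi * sign * w)
    return h


def apply(h, f):
    return [sum(h[x][y] * f[y] for y in range(2)) for x in range(2)]


w = 0.07
H = twisted_operator(w)
mu = math.cos(2 * math.pi * w)

even = apply(H, [1, 1])   # eigenvector for the top eigenvalue mu
odd = apply(H, [1, -1])   # eigenvector for the bottom eigenvalue -mu
```

The form is harmonic because at each vertex the two outgoing edges carry the values $w$ and $-w$ with equal weight, so $\delta\omega = 0$.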


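Lemma 8.2. Put $\varphi_\tau = \varphi_{\tau\omega}$ and $\mu_1(\tau) = \mu_1(\chi_{\tau\omega})$. By a suitable choice of $s_{\chi,0}$, we have
\[
\mu_1(0) = 1, \qquad \mu_1^{(2k-1)}(0) = 0, \qquad \mu_1^{(2)}(0) = -8\pi^2\frac{\|\omega\|^2}{m(V_0)},
\]
\[
\mu_1^{(4)}(0) = 6\int_{V_0}\Delta\varphi_0^{(2)}\cdot\varphi_0^{(2)} + \frac{16\pi^4}{m(V_0)}\sum_{e\in E_0}\omega(e)^4\,m_E(e),
\]
\[
\mathrm{Im}\,\varphi_0^{(2)} = 0, \qquad \varphi_0^{(1)} = 0, \qquad
\Delta\varphi_0^{(2)} = 8\pi^2 m(V_0)^{-1/2}\Big(\frac{\|\omega\|^2}{m(V_0)} - |\omega|^2\Big),
\]
\[
\Delta\varphi_0^{(3)} = 12\pi\sqrt{-1}\,\langle\omega, d\varphi_0^{(2)}\rangle + m(V_0)^{-1/2}\sum_{e\in E_{0x}} p(e)\big(2\pi\sqrt{-1}\,\omega(e)\big)^3.
\]

As immediate consequences, together with $\varphi_0^{(2)} = \frac{-8\pi^2}{\sqrt{m(V_0)}}G|\omega|^2$, we have the following expansions of $\mu_1$ and of $\varphi_0$:
\[
\mu_1 = 1 - \frac{4\pi^2}{m(V_0)}\|\omega\|^2 + \frac{(4\pi^2)^2}{m(V_0)}\int_{V_0}G|\omega|^2\cdot|\omega|^2
+ \frac{(4\pi^2)^2}{4!\,m(V_0)}\sum_{e\in E_0}\omega(e)^4\,m_E(e) + O(\|\omega\|^6),
\]
\[
\varphi_0(p) = m(V_0)^{-1/2}\big(1 - 4\pi^2 G(|\omega|^2)(p)\big) + O(\|\omega\|^4).
\]
A difference from the case of heat kernels is seen in the formula for $\mu_1^{(4)}(0)$, where an extra term appears. We put $\lambda(\omega) = -\log\mu_1(\chi_\omega)$. Since the Taylor expansion of $-\log(1-z)$ around $z = 0$ is given as
\[
-\log(1-z) = z + \frac{z^2}{2} + O(z^3),
\]
we have
\[
\begin{aligned}
\lambda(\omega) &= -\log\big(1 - (1 - \mu_1(\chi_\omega))\big) \\
&= \frac{4\pi^2}{m(V_0)}\|\omega\|^2 - \frac{(4\pi^2)^2}{m(V_0)}\int_{V_0}G|\omega|^2\cdot|\omega|^2
- \frac{(4\pi^2)^2}{4!\,m(V_0)}\sum_{e\in E_0}\omega(e)^4\,m_E(e) + \frac{(4\pi^2)^2}{2\,m(V_0)^2}\|\omega\|^4 + O(\|\omega\|^6) \\
&= \frac{4\pi^2}{m(V_0)}\Big\{\|\omega\|^2 - 4\pi^2\int_{V_0}G|\omega|^2\cdot|\omega|^2
- \frac{\pi^2}{6}\sum_{e\in E_0}\omega(e)^4\,m_E(e) + \frac{2\pi^2}{m(V_0)}\|\omega\|^4 + O(\|\omega\|^6)\Big\}.
\end{aligned}
\]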
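As a sanity check on the two expansions, they can be compared with an exactly solvable case. The sketch below assumes the one-vertex quotient of the line $\mathbb{Z}$ (a single loop $e$ and its inverse, $p(e) = 1/2$, $m(x) = 2$, so $m(V_0) = 2$ and $m_E(e) = 1$); this example and the function names are illustrative assumptions, not part of the paper. Here $H_\omega$ is multiplication by $\cos(2\pi w)$ with $w = \omega(e)$, $\|\omega\|^2 = w^2$, and $|\omega|^2$ is constant, so the term involving $G$ vanishes.

```python
import math

# Hypothetical one-vertex quotient of Z: one loop e and its inverse,
# p(e) = 1/2 and m(x) = 2, hence m(V_0) = 2 and m_E(e) = p(e) m(o(e)) = 1.
M_V0 = 2.0
M_E = 1.0


def mu1_exact(w):
    # H_omega is multiplication by (exp(2 pi i w) + exp(-2 pi i w)) / 2.
    return math.cos(2 * math.pi * w)


def norm_sq(w):
    # ||omega||^2 = m(x) * (1/2) * sum_{e in E_0x} p(e) omega(e)^2 = w^2
    return w * w


def quartic_edge_sum(w):
    # sum over the two oriented edges of omega(e)^4 m_E(e)
    return 2 * w ** 4 * M_E


def mu1_series(w):
    # |omega|^2 is constant on V_0, so the G|omega|^2 term drops out.
    c = 4 * math.pi ** 2
    return 1 - c / M_V0 * norm_sq(w) + c ** 2 / (24 * M_V0) * quartic_edge_sum(w)


def lambda_series(w):
    pi2 = math.pi ** 2
    bracket = (norm_sq(w)
               - pi2 / 6 * quartic_edge_sum(w)
               + 2 * pi2 / M_V0 * norm_sq(w) ** 2)
    return 4 * pi2 / M_V0 * bracket


w = 0.01
err_mu = abs(mu1_exact(w) - mu1_series(w))
err_lambda = abs(-math.log(mu1_exact(w)) - lambda_series(w))
```

Both errors are of sixth order in $w$, which is what the $O(\|\omega\|^6)$ remainders predict; in closed form the two series reduce to the Taylor polynomials of $\cos x$ and $-\log\cos x$ at $x = 2\pi w$.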

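We first treat the non-bipartite case. Our problem is reduced to the asymptotics of the integral
\[
C_0\int_U \exp(-\lambda(\omega)n)\,P(\omega)\exp(-2\pi\sqrt{-1}\,\xi\cdot\omega)\,d\omega,
\]
where $C_0 = \mathrm{vol}(\mathrm{Alb}_\Gamma)(\#\,\mathrm{Tor}(\Gamma))^{-1}$, $U$ is a small neighborhood of $\omega = 0$, $P(\omega) = \varphi_\omega(\pi(x))\,\varphi_\omega(\pi(y))$ and $\xi = \tilde\Phi_\Gamma(x) - \tilde\Phi_\Gamma(y)$. Therefore the same methods as in the case of manifolds may be employed for the proof of the first part of Theorem 7. We also have

Theorem 8.3. Put $r = \mathrm{rank}\,\Gamma$ and
\[
C(X) = \frac{m(V_0)^{r/2-1}\,\mathrm{vol}(\mathrm{Alb}_\Gamma)}{\#\,\mathrm{Tor}(\Gamma)}.
\]
Then we have
\[
p(n,x,y)\,m(y)^{-1} \sim (4\pi n)^{-r/2}\,C(X)\,\big(1 + c_1(x,y)n^{-1} + c_2(x,y)n^{-2} + \cdots\big) \qquad (n \uparrow \infty),
\]
\[
c_l(x,y) \sim \frac{1}{l!}\Big(-\frac{m(V_0)}{4}\Big)^{l} d_\Gamma(x,y)^{2l} \qquad (d(x,y) \uparrow \infty).
\]

We now establish an analogue of Theorem 5. Following Sect. 7, we take a change of the coordinate $\omega = u + h(u)$, and have
\[
\langle\!\langle u,\tilde h\rangle\!\rangle = 2\pi^2\int_{V_0}G|u|^2\cdot|u|^2 + \frac{\pi^2}{12}\sum_{e\in E_0}u(e)^4\,m_E(e) - \frac{\pi^2}{m(V_0)}\|u\|^4.
\]
From
\[
\frac{\partial^2}{\partial u_i^2}\frac{\partial^2}{\partial u_j^2}\|u\|^4
= \frac{\partial^2}{\partial u_i^2}\frac{\partial^2}{\partial u_j^2}\Big(\sum_k u_k^2\Big)^2 = 16\delta_{ij} + 8,
\qquad
\frac{\partial^2}{\partial u_i^2}\frac{\partial^2}{\partial u_j^2}u(e)^4\,m_E(e) = 24\big(\omega_i(e)\,\omega_j(e)\big)^2 m_E(e),
\]
with an orthonormal basis $\{\omega_i\}_{i=1}^r$ of $\mathrm{Hom}(\Gamma,\mathbb{R})$, it follows that
\[
\begin{aligned}
c_1(x,y) &= -\frac{m(V_0)}{4}d_\Gamma(x,y)^2
- \frac{m(V_0)}{2}\Big[G\Big(\sum_{i=1}^r|\omega_i|^2\Big)(\pi(x)) + G\Big(\sum_{i=1}^r|\omega_i|^2\Big)(\pi(y))\Big] \\
&\quad + \frac{m(V_0)}{4}\int_{V_0}\Big[G\Big(\sum_{i=1}^r|\omega_i|^2\Big)\sum_{i=1}^r|\omega_i|^2
+ 2\sum_{i,j=1}^r G(\langle\omega_i,\omega_j\rangle)\langle\omega_i,\omega_j\rangle\Big] \\
&\quad + \frac{m(V_0)}{32}\sum_{e\in E_0}\Big(\sum_{i=1}^r\omega_i(e)^2\Big)^2 m_E(e) - \frac{2r + r^2}{8}.
\end{aligned}
\]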

Next we consider the bipartite case. We may assume, without loss of generality, that $\Gamma$ preserves the bipartition of $X$, by taking a double covering of $X_0$, if necessary (see [4]). This being the case, if $V = A \sqcup B$ is the bipartition, and if we define the unitary operator $T : \ell^2_\chi \to \ell^2_\chi$ by
\[
(Tf)(x) = \begin{cases} f(x) & x \in A, \\ -f(x) & x \in B, \end{cases}
\]
then we have $TL_\chi T^{-1} = -L_\chi$. Therefore $\mu_N(\chi) = -\mu_1(\chi)$, and one may assume $s_{\chi,N} = Ts_{\chi,1}$, so that
\[
s_{\chi,N}(x)\,\overline{s_{\chi,N}(y)} = \begin{cases} s_{\chi,1}(x)\,\overline{s_{\chi,1}(y)} & x, y \in A \text{ or } x, y \in B, \\ -s_{\chi,1}(x)\,\overline{s_{\chi,1}(y)} & x \in A,\ y \in B \text{ or } x \in B,\ y \in A. \end{cases}
\]
From this, we conclude that
1. if $x, y \in A$ or $x, y \in B$, then $p(n,x,y)\,m(y)^{-1}$ has the same asymptotic as $2C_0\int_U \exp(-\lambda(\omega)n)\,P(\omega)\exp(-2\pi\sqrt{-1}\,\xi\cdot\omega)\,d\omega$ as $n = 2m \uparrow \infty$;
2. if $x \in A$, $y \in B$ or $x \in B$, $y \in A$, then $p(n,x,y)\,m(y)^{-1}$ has the same asymptotic as $2C_0\int_U \exp(-\lambda(\omega)n)\,P(\omega)\exp(-2\pi\sqrt{-1}\,\xi\cdot\omega)\,d\omega$ as $n = 2m - 1 \uparrow \infty$.

We may carry out the proof of the second part of Theorem 7, and obtain the following asymptotic expansion of $p(n,x,y)$ in the same manner as above.

Theorem 8.4. Let $X$ be a bipartite lattice graph with a bipartition $V = A \sqcup B$.
1. If $x, y \in A$ or $x, y \in B$, then
\[
p(n,x,y)\,m(y)^{-1} \sim (4\pi n)^{-r/2}\,2C(X)\,\big(1 + c_1(x,y)n^{-1} + c_2(x,y)n^{-2} + \cdots\big) \qquad (n = 2m \uparrow \infty).
\]
2. If $x \in A$, $y \in B$ or $x \in B$, $y \in A$, then
\[
p(n,x,y)\,m(y)^{-1} \sim (4\pi n)^{-r/2}\,2C(X)\,\big(1 + c_1(x,y)n^{-1} + c_2(x,y)n^{-2} + \cdots\big) \qquad (n = 2m - 1 \uparrow \infty).
\]
In either case, we have
\[
c_l(x,y) \sim \frac{1}{l!}\Big(-\frac{m(V_0)}{4}\Big)^{l} d_\Gamma(x,y)^{2l} \qquad (d(x,y) \uparrow \infty).
\]

The following theorem is a discrete analogue of Theorem 2.

Theorem 8.5. Let $X$ be a lattice graph. With a fixed $t > 0$ and $n\delta^2 = m(V_0)\,t$, we have
\[
\lim_{n\to\infty}\sum_{y\in V} p(n,x,y)\,f\big(\delta\tilde\Phi_\Gamma(y)\big) = (4\pi t)^{-r/2}\int_{\Gamma\otimes\mathbb{R}}\exp\Big(-\frac{1}{4t}\|y\|^2\Big)f(y)\,dy
\]
for every continuous function $f$ on the Euclidean space $\Gamma\otimes\mathbb{R}$ with compact support.
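The limit can be observed numerically for the simple random walk on $\mathbb{Z}$. The following is a rough sketch under explicit assumptions that are not spelled out at this point of the text: the quotient is taken to be a single vertex with $m(V_0) = 2$ (isotropic weights), the realization is taken to send $y \in \mathbb{Z}$ to $y \in \mathbb{R}$ (unit edge length), and the test function `bump` is an arbitrary continuous function supported on $[-1, 1]$. With $n\delta^2 = 2t$ the rescaled walk has variance $2t$, matching the Gaussian on the right-hand side.

```python
import math


def walk_probability(n, y):
    """p(n, 0, y) for the simple random walk on Z."""
    if (n + y) % 2 or abs(y) > n:
        return 0.0
    return math.comb(n, (n + y) // 2) / 2 ** n


def bump(y):
    """A continuous test function with compact support [-1, 1]."""
    return max(0.0, 1.0 - abs(y))


def discrete_side(n, t, m_v0=2.0):
    # Assumed normalisation: m(V_0) = 2 and the realization maps y to y.
    delta = math.sqrt(m_v0 * t / n)
    reach = int(1.0 / delta) + 1  # bump(delta * y) vanishes beyond this range
    return sum(walk_probability(n, y) * bump(delta * y)
               for y in range(-reach, reach + 1))


def continuum_side(t, steps=20000):
    # Midpoint rule for (4 pi t)^(-1/2) * integral of exp(-y^2 / 4t) bump(y) dy.
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        y = -1.0 + (k + 0.5) * h
        total += math.exp(-y * y / (4 * t)) * bump(y)
    return total * h / math.sqrt(4 * math.pi * t)


t = 1.0
gap = abs(discrete_side(4000, t) - continuum_side(t))
```

Since $n$ is even here, only even $y$ contribute to the discrete sum, which is the parity effect handled separately in the bipartite case of the proof.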


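The proof for the non-bipartite case is just the same as the one for Theorem 2 (see Sect. 5). When $X$ is a bipartite graph with the bipartition $V = A \sqcup B$ which is preserved by the $\Gamma$-action, we observe that $X_0$ is a bipartite graph with the bipartition $V_0 = A_0 \sqcup B_0$, where $A_0 = \Gamma\backslash A$ and $B_0 = \Gamma\backslash B$. Since
\[
m(A_0) = \sum_{x\in A_0}\sum_{e\in E_{0x}} p(e)\,m(o(e)) = \sum_{x\in A_0}\sum_{e\in E_{0x}} p(\bar e)\,m(t(e)) = \sum_{x\in B_0}\sum_{e\in E_{0x}} p(e)\,m(o(e)) = m(B_0),
\]
we find that $m(A_0) = m(B_0) = \frac12 m(V_0)$. When $x \in A$, we have
\[
\sum_{y\in V} p(n,x,y)\,f\big(\delta\tilde\Phi_\Gamma(y)\big) = \begin{cases} \sum_{y\in A} p(n,x,y)\,f\big(\delta\tilde\Phi_\Gamma(y)\big) & (n \text{ even}), \\ \sum_{y\in B} p(n,x,y)\,f\big(\delta\tilde\Phi_\Gamma(y)\big) & (n \text{ odd}). \end{cases}
\]
From Theorem 7, it follows that, for a given $\varepsilon > 0$, there exists a positive integer $n_0$ such that, for even $n \ge n_0$,
\[
\Big|\sum_{y\in V} p(n,x,y)\,f\big(\delta\tilde\Phi_\Gamma(y)\big) - (4\pi t)^{-r/2}\delta^r m(V_0)\,2C_0\sum_{y\in A}\exp\Big(-\frac{1}{4t}\big\|\delta\tilde\Phi_\Gamma(x) - \delta\tilde\Phi_\Gamma(y)\big\|^2\Big)f\big(\delta\tilde\Phi_\Gamma(y)\big)\,m(y)\Big|
< \varepsilon\,\delta^r\sum_{y\in A}\big|f\big(\delta\tilde\Phi_\Gamma(y)\big)\big|\,m(y),
\]
and for odd $n \ge n_0$,
\[
\Big|\sum_{y\in V} p(n,x,y)\,f\big(\delta\tilde\Phi_\Gamma(y)\big) - (4\pi t)^{-r/2}\delta^r m(V_0)\,2C_0\sum_{y\in B}\exp\Big(-\frac{1}{4t}\big\|\delta\tilde\Phi_\Gamma(x) - \delta\tilde\Phi_\Gamma(y)\big\|^2\Big)f\big(\delta\tilde\Phi_\Gamma(y)\big)\,m(y)\Big|
< \varepsilon\,\delta^r\sum_{y\in B}\big|f\big(\delta\tilde\Phi_\Gamma(y)\big)\big|\,m(y).
\]
On the other hand, for $K = A$ or $B$, we find, for an arbitrary continuous function $F$ on $\Gamma\otimes\mathbb{R}$ with compact support,
\[
\lim_{\delta\downarrow 0}\delta^r\sum_{y\in K}F\big(\delta\tilde\Phi_\Gamma(y)\big)\,m(y) = \frac12 m(V_0)\int_{\Gamma\otimes\mathbb{R}}F(y)\,dy.
\]
Thus applying the argument in Sect. 5, we obtain the claim in Theorem 8.5 when $x \in A$. In much the same way, we can prove the claim when $x \in B$.

At the end of the paper, we exhibit a few examples of isotropic random walks, i.e. those with $p(e) = (\deg o(e))^{-1}$, $m(x) = \deg x$. If $X$ is the maximal abelian covering graph of $X_0$, then $\mathrm{vol}(\mathrm{Alb}_\Gamma)$ is equal to the complexity of the finite graph $X_0$, namely the number of spanning trees in $X_0$. For more general lattice graphs, we have a recipe to compute the constant $C(X)$ (see [5]).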
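Before turning to the examples, the shape of the expansion can be checked against exact binomial probabilities in the simplest case, the simple random walk on $\mathbb{Z}$. The sketch below is only illustrative: it assumes the $r = 1$ instance of the $\mathbb{Z}^r$ formulas, namely leading factor $2\sqrt{2}/\sqrt{4\pi n}$ and $c_1 = -\frac12 d^2 - \frac14$, with the distance between $0$ and $y$ taken to be $|y|$ (unit edge length). The function names are hypothetical.

```python
import math


def exact(n, y):
    """p(n, 0, y) for the simple random walk on Z (n and y of equal parity)."""
    return math.comb(n, (n + y) // 2) / 2 ** n


def leading(n):
    # r = 1 in 2 (2r)^(r/2) / (4 pi n)^(r/2)
    return 2 * math.sqrt(2) / math.sqrt(4 * math.pi * n)


def c1(y):
    # r = 1 in c_1 = -(r/2) d^2 - r/4, with d(0, y) = |y| assumed
    return -0.5 * y * y - 0.25


n, y = 2000, 4
# Relative error of the leading term alone, and with the n^{-1} correction.
zeroth = abs(exact(n, y) / leading(n) - 1)
first = abs(exact(n, y) / (leading(n) * (1 + c1(y) / n)) - 1)
```

Including the $c_1 n^{-1}$ term should reduce the relative error from order $n^{-1}$ to order $n^{-2}$, which is what the two quantities `zeroth` and `first` are meant to display.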


Example 1. The standard realization of the $\mathbb{Z}^r$-lattice is the square lattice in $\mathbb{R}^r$ each of whose edges has unit length. The asymptotic of the transition probability is given as
\[
p(n,x,y) \sim \frac{2(2r)^{r/2}}{(4\pi n)^{r/2}}\Big(1 + \sum_{1\le i}c_i(x,y)\,n^{-i}\Big),
\qquad
c_1(x,y) = -\frac{r}{2}d_\Gamma(x,y)^2 - \frac{r}{4}.
\]

Example 2. The standard realization of the triangular lattice is the equilateral triangular lattice in $\mathbb{R}^2$ each of whose edges has length $\sqrt{2/3}$. The asymptotic of the transition probability is given as
\[
p(n,x,y) \sim \frac{2\sqrt{3}}{(4\pi n)^{r/2}}\Big(1 + \sum_{1\le i}c_i(x,y)\,n^{-i}\Big) \quad (r = 2),
\qquad
c_1(x,y) = -\frac{3}{2}d_\Gamma(x,y)^2 - \frac{1}{2}.
\]

Example 3. The standard realization of the hexagonal lattice is the equilateral hexagonal lattice in $\mathbb{R}^2$ each of whose edges has length $\sqrt{2/3}$. The asymptotic of the transition probability is given as
\[
p(n,x,y) \sim \frac{6\sqrt{3}}{(4\pi n)^{r/2}}\Big(1 + \sum_{1\le i}c_i(x,y)\,n^{-i}\Big) \quad (r = 2),
\qquad
c_1(x,y) = -\frac{3}{2}d_\Gamma(x,y)^2 - \frac{1}{2}.
\]

References

1. Kanai, M.: Rough isometry, and combinatorial approximation of geometries of noncompact Riemannian manifolds. J. Math. Soc. Japan 37, 391–413 (1985)
2. Kannai, Y.: Off diagonal short time asymptotics for fundamental solutions of diffusion equations. Comm. in P.D.E. 2, 781–830 (1977)
3. Katsuda, A. and Sunada, T.: Homology and closed geodesics in a compact Riemann surface. Am. J. Math. 109, 141–156 (1987)
4. Kotani, M., Shirai, S. and Sunada, T.: Asymptotic behavior of the transition probability of a random walk on an infinite graph. J. Funct. Anal. 159, 669–689 (1998)
5. Kotani, M. and Sunada, T.: Standard realizations of crystal lattices via harmonic maps. To appear in Trans. Amer. Math. Soc.
6. Molchanov, S.A.: Diffusion process and Riemannian geometry. Russian Math. Surveys 30, 1–63 (1975)
7. Nagano, T. and Smyth, B.: Minimal varieties and harmonic maps in tori. Comm. Math. Helv. 50, 249–265 (1975)
8. Spitzer, F.: Principles of random walk. Princeton, NJ: D. Van Nostrand, 1964
9. Sunada, T.: Unitary representations of fundamental groups and the spectrum of twisted Laplacians. Topology 28, 125–132 (1989)

Communicated by P. Sarnak

Commun. Math. Phys. 209, 671 – 690 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Scaling Limits of Wick Ordered KPZ Equation

Terence Chan

Department of Actuarial Mathematics & Statistics, Heriot-Watt University, Edinburgh EH14 4AS, UK. E-mail: [email protected]

Received: 20 September 1998 / Accepted: 20 August 1999

Abstract: Consider the KPZ equation
\[
\dot u(t,x) = \Delta u(t,x) + |\nabla u(t,x)|^2 + W(t,x), \qquad x \in \mathbb{R}^d,
\]
where $W(t,x)$ is a space-time white noise. This paper investigates the question of whether, for some exponents $\chi$ and $z$, $k^{-\chi}u(k^z t, kx)$ converges in some sense as $k \to \infty$, and if so, what are the values of these exponents. The non-linear term in the KPZ equation is interpreted as a Wick product and the equation is solved in a suitable space of stochastic distributions. The main tools for establishing the scaling properties of the solution are those of white noise analysis, in particular, the Wiener chaos expansion. A notion of convergence in law in the sense of Wiener chaos is formulated and convergence in this sense of $k^{-\chi}u(k^z t, kx)$ as $k \to \infty$ is established for various values of $\chi$ and $z$ depending on the dimension $d$.

1. Introduction

Consider the Kardar–Parisi–Zhang (KPZ) equation (Kardar, Parisi and Zhang (1986)):
\[
\frac{\partial u}{\partial t} = \nu\Delta u + \frac{\lambda}{2}|\nabla u|^2 + \sigma W(t,x), \qquad u(0,x) = u_0(x), \tag{1.1}
\]
where $W(t,x)$ is a space-time white noise, $u = u(t,x)$ and $x \in \mathbb{R}^d$. It is well-known that this equation can be transformed into a linear one by means of the so-called Cole–Hopf transform: define
\[
v(t,x) := \exp\Big(\frac{\lambda}{2\nu}u(t,x)\Big), \tag{1.2}
\]
then $v$ satisfies
\[
\frac{\partial v}{\partial t} = \nu\Delta v + \frac{\lambda\sigma}{2\nu}v(t,x)\,W(t,x). \tag{1.3}
\]
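The passage from (1.1) to (1.3) can be checked by a purely formal chain-rule computation, treating $u$ as smooth and all products as ordinary products (an assumption made only for this sketch; how the products should actually be interpreted is the subject of the paper). The constant $c := \lambda/(2\nu)$ is shorthand introduced here.

```latex
\begin{align*}
\partial_t v &= c\,v\,\partial_t u, \qquad c := \frac{\lambda}{2\nu},\\
\nabla v &= c\,v\,\nabla u,\\
\Delta v &= c\,v\,\Delta u + c^2\,v\,|\nabla u|^2,\\
\nu\Delta v &= c\,v\Bigl(\nu\Delta u + \frac{\lambda}{2}|\nabla u|^2\Bigr)
  \qquad\text{since } \nu c = \frac{\lambda}{2},\\
\partial_t v - \nu\Delta v
  &= c\,v\Bigl(\partial_t u - \nu\Delta u - \frac{\lambda}{2}|\nabla u|^2\Bigr)
   = \frac{\lambda\sigma}{2\nu}\,v\,W .
\end{align*}
```

The last equality uses (1.1), and the result is (1.3).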


In this paper, we are interested in the question of whether, for some exponents $\chi$ and $z$, $k^{-\chi}u(k^z t, kx)$ converges in some sense as $k \to \infty$, and if so, what are the values of these exponents. The aim of this paper is to present a mathematical framework in which to formulate and establish such scaling limits. There is wide agreement in the physics literature that when $d = 1$, we have $\chi = 1/2$ and $z = 3/2$. In higher dimensions, there is less certainty as to the values of these exponents. However, the interpretation of (1.1) and the associated notion of convergence presented in this paper do not lead to the kind of scaling behaviour expected by most physicists. The main results of this paper can be summarized as follows:

(i) Full scaling limit:
(a) $d = 1$: for fixed $t$ and $x$, $k^{-1/4}u(k^{7/4}t, kx)$ converges (in a sense to be explained) as $k \to \infty$.
(b) $d = 2$: (1.1) is scale invariant, and for fixed $t$, $x$ and $k$, $u(k^2 t, kx)$ is "identical in law" (in a sense to be explained) to $u(t,x)$.

(ii) White noise scaling limit: In any dimension $d \ge 1$, suppose $z$ satisfies
\[
z < \min\Big\{\frac{6+d}{4},\ 2\Big\}
\]
and let $\chi$ be given by
\[
z - \chi = 1 + \frac{d}{2}.
\]
Then for fixed $t$ and $x$, $k^{-\chi}u(k^z t, kx)$ converges (in the same sense) as $k \to \infty$, to a limit which is different from the full scaling limit described above and which is a multiple of white noise.

A proper statement of these results will be given later in Theorem 4.1 in Sect. 4. It should be stressed that it is not claimed here that these results contradict the conventional ($\chi = 1/2$, $z = 3/2$) scaling. The differences are most likely due to a difference in the way (1.1) is interpreted.

In the extensive body of literature on stochastic PDEs it is widely known that in general, the solution $u(t,x)$ to an SPDE like (1.1) or (1.3) does not exist in a "classical" sense as a random field or random variable, but only in some generalized sense as a (stochastic) distribution. The one exception is the case of 1 spatial dimension (i.e. $d = 1$), where the solution to (1.3) (appropriately interpreted) is a proper (2-parameter) stochastic process. Even the white noise $W(t,x)$ does not exist as a proper random field but may be regarded as a Hida distribution (see Kuo (1994)). As several authors have already pointed out (notably Holden et al. (1994) and Lindstrøm et al. (1995)), this immediately presents a problem as to how to interpret the non-linear term in (1.1) and the product $v(t,x)W(t,x)$ in (1.3). Following Holden, Lindstrøm et al., we shall interpret such products as Wick products, which will be explained in the next section (see for example Holden et al. (1994, 1995) and Lindstrøm et al. (1995)). While the Wick product formulation is in many respects the natural one from a mathematical point of view, it is also an approach adopted in some of the physics literature, for example Bertini and Giacomin (1997), although their formulation of the Wick product differs from the present one in some important respects.
A more detailed discussion of the differences between the interpretation of (1.1) presented here and the kind of interpretation physicists would like to adopt (in particular, the framework of Bertini and Giacomin (1997)) is deferred to Sect. 2, after the necessary precise definitions have been introduced.


Although our approach to (1.1) is "wrong" from the physical point of view, in that it does not exhibit the expected scaling properties, the interpretation adopted here is arguably the simplest possible and it seems that before one can understand what is the "right" interpretation of the KPZ equation, one should first understand the simple situation considered here.

We conclude this section by briefly summarizing some of the work of other authors who have investigated the KPZ and related equations within a probabilistic framework. The main existence and uniqueness result we shall rely on is due to Holden et al. (1995), mainly in connection with (1.3) and the stochastic Burgers' equation to which the KPZ equation can also be transformed. The authors consider the Wick product versions of these equations and establish existence and uniqueness of the solution in a certain space of distributions which will be described in the next section. The authors also show that the Cole–Hopf transformation (1.2) can be applied to these distributions and, more importantly from our point of view, can be inverted so that their results on (1.3) can be applied to the KPZ equation by inverting the Cole–Hopf transformation.

Holden et al. (1994) study the same class of equations, but in the context of what the authors call functional processes $X(t, x, \phi, \omega)$, where $\phi$ is a suitable test function on $\mathbb{R}_+ \times \mathbb{R}^d$ translated to the point $(t, x)$ (i.e. $\phi(\cdot - t, \cdot - x)$) and $\omega$ is a point in the probabilistic sample space. The authors show that the Wick versions of (1.3) and the stochastic Burgers' equation have unique functional process solutions. The KPZ equation (1.1) is discussed in passing, but the authors do not discuss how the inversion of the Cole–Hopf transform can be applied to their results.
Nualart and Rozovskii (1997) study a class of linear SPDEs which includes (the Wick version of) (1.3) and show that these have a unique solution in a much smaller space of distributions than that considered by Holden et al. (1995). However, it is not known if a similar result holds for the KPZ equation and in this paper, we shall settle for the larger space considered by Holden et al. (1995). Bertini and Giacomin (1997) consider the KPZ equation in 1 dimension. They construct the solution by first considering the KPZ equation driven by a regularized smooth approximation to white noise and then showing that a weak limit of these approximating solutions exists and is in fact a continuous function. Moreover, this limit is the same as (2ν/λ) log v(t, x), where v(t, x) is the solution to (1.3), with vW interpreted as a Wick product. The authors further show that the solution to the (1-dimensional) KPZ equation is the rescaled limit of a lattice growth model, the weakly asymmetric solid on solid growth process. The Cole–Hopf transformation plays an important role in their methods. However, the question as to how to interpret the non-linearity in (1.1) is left open. Also, while they consider a Wick product for the non-linear term in the smoothed KPZ equation, it is not clear how this Wick product is defined in the limit; it will be seen in the next section that, at least in the limit, the Wick product used for the non-linear term of (1.1) by Bertini and Giacomin (1997) is not the same as that used in this paper. (On the other hand, the Wick product interpretation of vW in (1.3) used by Bertini and Giacomin (1997) is the same as the one used here.) Finally, we mention the paper of Handa (1996), which presents a rigorous formulation of the KPZ equation (for arbitrary d) driven by a certain class of correlated noises. The author shows that a unique solution (in an appropriate weak sense) exists as a continuous function.


2. Mathematical Foundations: White Noise Analysis and Wiener Chaos

The purpose of this section is to present the mathematical setting in which one can solve equations such as (1.1). The reference most directly relevant to Eqs. (1.1) and (1.3) is Holden et al. (1995). For general background reference for white noise analysis, see Kuo (1994).

Let $S' = S'(\mathbb{R}_+ \times \mathbb{R}^d)$ be the space of tempered distributions whose space of test functions is $S = S(\mathbb{R}_+ \times \mathbb{R}^d)$. The basic sample space in the white noise framework is $\Omega = S'$. By the Bochner–Minlos theorem, there exists a unique probability measure $\mathbb{P}$ on $S'$ such that
\[
\int_{S'} e^{i\langle\omega,\phi\rangle}\,\mathbb{P}(d\omega) = \exp\Big(-\frac12\|\phi\|^2\Big), \qquad \forall\phi \in S,
\]
where $\|\cdot\|$ denotes the $L^2$ norm and $\langle\cdot,\cdot\rangle$ denotes both the dual pairing between $S$ and $S'$ and the $L^2$ inner product. Thus $\mathbb{P}$ is the analogue of Wiener measure and the random variable $\omega \mapsto \langle\omega,\phi\rangle$ has $N(0, \|\phi\|^2)$ distribution under $\mathbb{P}$. Let $B_{t,x}$ be a Brownian sheet and define the white noise functional $W$ on $S$ as follows:
\[
W(f) := \int_0^\infty\!\!\int_{\mathbb{R}^d} f(t,x)\,dB_{t,x} \equiv \langle\omega, f\rangle, \qquad f \in S.
\]
We have the following Itô isometry:
\[
\mathbb{E}[W(f)W(g)] = \langle f, g\rangle.
\]
Consequently, the white noise functional $W$ may be extended to the whole of $L^2$ and in particular we have the following representation for Brownian sheet:
\[
B_{t,x}(\omega) = \langle\omega, 1_{[0,t]\times[0,x]}\rangle,
\]
where, by a slight abuse of notation, we have used $[0, x]$ to denote the rectangle with $0$ and $x$ as diagonal vertices.

Denote by $H_n(x)$ the Hermite polynomial
\[
H_n(x) = (-1)^n e^{x^2/2}\frac{d^n}{dx^n}\big(e^{-x^2/2}\big).
\]
Let $\{m_i\}$ be an orthonormal basis for $L^2(\mathbb{R}_+)$ and $\{e_j\}$ be an orthonormal basis for $L^2(\mathbb{R}^d)$, so that $\{m_i \otimes e_j\}$ is an orthonormal basis for $L^2(\mathbb{R}_+ \times \mathbb{R}^d)$. Although we can take $\{e_j\}$ to be any orthonormal basis, in the context of white noise analysis, $\{e_j\}$ are usually tensor products of Hermite functions as follows: let $\xi_n$ be the $n$th Hermite function
\[
\xi_n(x) = (\sqrt{\pi}\,n!)^{-1/2}H_n(\sqrt{2}\,x)\,e^{-x^2/2}, \tag{2.1}
\]
take an arbitrary ordering of $d$-dimensional multi-indices $\beta^{(j)} = (\beta_1^{(j)}, \beta_2^{(j)}, \ldots, \beta_d^{(j)})$ and let
\[
e_j = \xi_{\beta_1^{(j)}} \otimes \cdots \otimes \xi_{\beta_d^{(j)}}. \tag{2.2}
\]
(For concreteness, we may take $\{m_i\}$ to be, say, the Laguerre functions.)


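For a matrix multi-index $\alpha = (\alpha_{ij})$, define the iterated Wiener integral
\[
W_\alpha := \prod_{i,j}H_{\alpha_{ij}}\big(W(m_i \otimes e_j)\big).
\]
(Recall that if $\{\theta_i\}$ is any orthonormal basis of $L^2(\mathbb{R}^n)$ and $\beta = (\beta_1, \ldots, \beta_k)$ is a multi-index, then the iterated Wiener integral may be written in terms of Hermite polynomials:
\[
\int_{\mathbb{R}^{n|\beta|}}\theta_1^{\otimes\beta_1}\,\hat\otimes\cdots\hat\otimes\,\theta_k^{\otimes\beta_k}\,dB^{\otimes|\beta|} = \prod_{i=1}^{k}H_{\beta_i}(\langle\omega,\theta_i\rangle)
\]
(where $\hat\otimes$ denotes symmetric tensor product), hence the reason for calling $W_\alpha$ iterated Wiener integrals.) The family $\{W_\alpha/\sqrt{\alpha!}\}$ (where $\alpha! = \prod_{i,j}\alpha_{ij}!$) forms an orthonormal basis for $(L^2) := L^2(S', \mathbb{P})$: thus any random variable $X \in (L^2)$ admits a Wiener chaos expansion $X = \sum_\alpha a_\alpha W_\alpha$ and $\mathbb{E}[X^2] = \sum_\alpha \alpha!\,a_\alpha^2$. Moreover, $\mathbb{E}[X] = a_0$ and if $Y = \sum_\alpha b_\alpha W_\alpha$ then $\mathbb{E}[XY] = \sum_\alpha \alpha!\,a_\alpha b_\alpha$.

The Wick product of two elements of $(L^2)$, which we denote by $X \diamond Y$, is defined by putting
\[
W_\alpha \diamond W_\beta := W_{\alpha+\beta} \tag{2.3}
\]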

and extending this definition to the whole of $(L^2)$ by linearity. It turns out that from the mathematical point of view, the most natural way to interpret (1.3) is to interpret the product $v(t,x)W(t,x)$ as a Wick product $v(t,x) \diamond W(t,x)$: for example, Holden et al. (1994, 1995), Lindstrøm et al. (1995) and Nualart and Rozovskii (1997) all take this approach (even though the last paper does not make explicit reference to Wick products). Moreover, as explained in Lindstrøm et al. (1992), the following identity holds in an appropriate sense:
\[
\int Y_r\,dB_r = \int Y_r \diamond W_r\,dr,
\]
where $B$ and $W$ are respectively a standard Brownian sheet and white noise and $Y$ is a suitable (multi-parameter) process. Therefore, any attempt at interpreting (1.3) as an Itô integral equation in some generalized sense, for example
\[
-\int_0^T\!\!\int_{\mathbb{R}^d} v(t,x)\,\dot\phi(t,x)\,dx\,dt = \nu\int_0^T\!\!\int_{\mathbb{R}^d} v(t,x)\,\Delta\phi(t,x)\,dx\,dt + \frac{\lambda\sigma}{2\nu}\int_0^T\!\!\int_{\mathbb{R}^d} v(t,x)\,\phi(t,x)\,dB(t,x),
\]
will boil down to treating the $vW$ term as a Wick product. However, it is well-known (see for example Nualart and Rozovskii (1997) and Walsh (1986)) that equations like (1.3) (even interpreted in the Wick sense) do not admit solutions in $(L^2)$ in dimensions $d > 1$. Consequently we need to seek solutions in a larger space of generalized functions or stochastic distributions. In Lindstrøm et al. (1995), it is shown that the Wick version of (1.3) has a unique solution in the space of Hida distributions (see Kuo (1994) for the definition). However, in order to be able to reverse the Cole–Hopf transformation (1.2) to get a solution to (1.1), we need an even larger space of distributions to accommodate solutions to the latter.
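The algebra behind the Wick product (2.3) and the chaos expansion is simple enough to sketch in code. The following is an illustrative toy model, not the construction used in the references: a chaos expansion is truncated to two basis Gaussians $g_1, g_2$ and stored as a dictionary from multi-indices to coefficients, with the matrix index $(i,j)$ flattened to a single position; all function names are hypothetical. It shows that the Wick product adds multi-indices, that $\mathbb{E}[X \diamond Y] = \mathbb{E}[X]\,\mathbb{E}[Y]$, and how $X \diamond Y$ differs from the ordinary product.

```python
from collections import defaultdict
from math import factorial


def hermite(n, x):
    """Probabilists' Hermite polynomial H_n(x) via H_{k+1} = x H_k - k H_{k-1}."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur


def wick(X, Y):
    """Wick product of expansions stored as {multi-index tuple: coefficient}."""
    out = defaultdict(float)
    for alpha, a in X.items():
        for beta, b in Y.items():
            # W_alpha <> W_beta = W_{alpha + beta}, extended bilinearly
            out[tuple(i + j for i, j in zip(alpha, beta))] += a * b
    return dict(out)


def expectation(X):
    """E[X] is the coefficient of the zero multi-index."""
    return sum(a for alpha, a in X.items() if not any(alpha))


def covariance_pairing(X, Y):
    """E[XY] = sum over alpha of alpha! a_alpha b_alpha."""
    total = 0.0
    for alpha, a in X.items():
        if alpha in Y:
            weight = 1
            for k in alpha:
                weight *= factorial(k)
            total += weight * a * Y[alpha]
    return total


def evaluate(X, xs):
    """Evaluate sum_alpha a_alpha prod_k H_{alpha_k}(xs[k]) at Gaussian values xs."""
    total = 0.0
    for alpha, a in X.items():
        term = a
        for k, x in zip(alpha, xs):
            term *= hermite(k, x)
        total += term
    return total


# X = 2 + g_1 and Y = 1 + 3 g_1 + g_2 in terms of two basis Gaussians.
X = {(0, 0): 2.0, (1, 0): 1.0}
Y = {(0, 0): 1.0, (1, 0): 3.0, (0, 1): 1.0}
XY = wick(X, Y)
```

In this example the ordinary product $XY$ exceeds $X \diamond Y$ by the constant $3 = \mathbb{E}[g_1 \cdot 3g_1]$, because $g_1 \diamond g_1 = H_2(g_1) = g_1^2 - 1$.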


We introduce the Kondratiev spaces of test functions and distributions; for a more detailed treatment see Holden et al. (1995). For $0 \le p \le 1$, let $(S^p)$ denote the space of $f = \sum_\alpha a_\alpha W_\alpha \in (L^2)$ such that
\[
\|f\|_{p,m}^2 := \sum_\alpha (\alpha!)^{1+p}a_\alpha^2\,(2\mathbb{N})^{\alpha m} < \infty \qquad \forall m \ge 0, \tag{2.4}
\]
where
\[
(2\mathbb{N})^\alpha = \prod_{i,j}\big(2^d\,i\,\beta_1^{(j)}\cdots\beta_d^{(j)}\big)^{\alpha_{ij}},
\]
and $(\beta_1^{(j)}, \ldots, \beta_d^{(j)})$ is related to the basis $\{e_j\}$ by $e_j = \xi_{\beta_1^{(j)}} \otimes \cdots \otimes \xi_{\beta_d^{(j)}}$. Let $(S^{-p})$ denote the space consisting of formal expansions $F = \sum_\alpha b_\alpha W_\alpha$ such that
\[
\sum_\alpha (\alpha!)^{1-p}b_\alpha^2\,(2\mathbb{N})^{-\alpha q} < \infty \qquad \text{for some } q > 0. \tag{2.5}
\]
The $\{\|\cdot\|_{p,m}\}_{m\ge 0}$ form a separating family of semi-norms which gives rise to the natural metric topology on $(S^p)$ and the space $(S^{-p})$ can then be regarded as the dual of $(S^p)$ via the action
\[
\langle F, f\rangle = \sum_\alpha \alpha!\,a_\alpha b_\alpha
\]
if $f = \sum_\alpha a_\alpha W_\alpha \in (S^p)$ and $F = \sum_\alpha b_\alpha W_\alpha \in (S^{-p})$. The spaces $(S^{-p})$ and $(S^p)$ are called Kondratiev distributions and Kondratiev test functions respectively. Note that it is one of the anomalies of this notation that $(S^0) \ne (S^{-0})$; in fact the latter is the space of Hida distributions while the former is the space of Hida test functions. We have the following analogue of the classical Gel'fand triple:
\[
(S^1) \subset (S^p) \subset (S^0) \subset (L^2) \subset (S^{-0}) \subset (S^{-p}) \subset (S^{-1}).
\]
We shall henceforth work with the largest of these spaces of distributions, $(S^{-1})$. Note that the Wick product can still be defined on $(S^{-1})$ by extending (2.3) by linearity.

For $F = \sum_\alpha b_\alpha W_\alpha \in (S^{-1})$, the Hermite transform $\mathcal{H}F$ of $F$ is defined to be the function on $\mathbb{C}^{\mathbb{N}\times\mathbb{N}}$ given by
\[
(\mathcal{H}F)(z) = \sum_\alpha b_\alpha z^\alpha, \tag{2.6}
\]
provided the series is convergent, where $z^\alpha = \prod_{i,j}z_{ij}^{\alpha_{ij}}$. The Hermite transform is unique, i.e. if $\mathcal{H}F = \mathcal{H}G$ then $F = G$. The Hermite transform is related to the $S$-transform $\mathcal{S}F$ more usually used in white noise analysis (see Kuo (1994)) via the identity $(\mathcal{H}F)(z) = (\mathcal{S}F)(\phi)$ if $\phi = \sum_{i,j}z_{ij}\,m_i \otimes e_j \in L^2(\mathbb{R}_+ \times \mathbb{R})$, at least for $F \in (L^2)$. Note that the definition of $\mathcal{H}F$ (unlike that of $\mathcal{S}F$) depends on the choice of basis $\{m_i \otimes e_j\}$. The two transforms have more or less identical properties; in particular
\[
\mathcal{H}(F \diamond G)(z) = \mathcal{H}F(z)\,\mathcal{H}G(z) \tag{2.7}
\]


and any bounded analytic function in some domain of CN×N is the Hermite transform of some Kondratiev distribution in (S −1 ). Consider now the Wick versions of (1.1) and (1.3), which we rewrite as λ ∂u = ν1u + |∇u|2 + σ W (t, x) ∂t 2

(2.8)

(where |x|2 = x12 + · · · + xd2 ) and λσ ∂v = ν1v + v(t, x) W (t, x). ∂t 2ν

(2.9)

Equations (2.8) and (2.9) are still related by the Cole–Hopf transformation provided the exponential in (1.2) is replaced by the Wick exponential, which we denote by $\exp^{\diamond}$; this may be defined via the Taylor series in which ordinary powers are replaced by Wick powers:

$$\exp^{\diamond}\{X\} = \sum_{n=0}^{\infty} \frac{X^{\diamond n}}{n!}$$

(see Holden et al. (1995) for the details and other useful properties of $\exp^{\diamond}$). Similarly, any real analytic function has a Wick version which can be defined via its Taylor series in the same way. In particular, the Wick logarithm $\log^{\diamond}$ exists and has the natural properties in relation to $\exp^{\diamond}$ that one might expect, in particular, $\log^{\diamond}(\exp^{\diamond} X) = \exp^{\diamond}(\log^{\diamond} X) = X$; see Holden et al. (1995). It is shown in Holden et al. (1995) that (2.9) has a unique solution in $(S^{-1})$ for any initial condition $v(0,x) = v_0(x)$. Moreover, $\log^{\diamond} v(t,x) \in (S^{-1})$ exists provided that $E[v_0(x)] > 0$. Combining these two results, we see that the KPZ equation (2.8) has a unique solution in $(S^{-1})$ for any initial condition $u(0,x) = u_0(x)$.

Bertini and Giacomin (1997) interpret the noise term in (1.3) in the sense of an Itô integral, and as has already been pointed out, this is identical to the Wick version (2.9). In the case $d = 1$, they show that the solution $v(t,x)$ exists as a continuous function and moreover $v(t,x) > 0$ for all $t$ if $v(0,x) > 0$. However, they then take $(2\nu/\lambda)\log v(t,x)$, with the ordinary logarithm, as the solution to the KPZ equation. This leaves unanswered the question as to what equation the "solution" $(2\nu/\lambda)\log v(t,x)$ satisfies; it is certainly not (2.8) because for that, one needs to take the Wick logarithm $(2\nu/\lambda)\log^{\diamond} v(t,x)$. It seems likely that this is the root cause of the unexpected scaling properties of (2.8) established here. Although, as has already been pointed out, (2.9) has a unique solution in $(S^{-0})$ (in fact, Nualart and Rozovskii (1997) show that the solution lives on an even smaller space), these smaller spaces are in general not closed under taking of Wick logarithms and there is no guarantee that $u(t,x) = (2\nu/\lambda)\log^{\diamond} v(t,x)$ exists in $(S^{-0})$.

3. A Wiener Chaos Approach to the KPZ Equation

In this section, we consider the Wiener chaos expansion of the solution $u(t,x)$ to (2.8). Writing $u(t,x) = \sum_\alpha c_\alpha(t,x) W_\alpha$, we shall derive a recursive system of (deterministic) PDEs for the coefficients $c_\alpha$. However, we first need a Wiener chaos expansion for space-time white noise.


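Formally, $W(t,x)$ is the derivative of Brownian sheet:

$$W(t,x) = \frac{\partial^{1+d} B_{t,x}}{\partial t\,\partial x_1 \cdots \partial x_d}.$$

Since $B_{t,x} = \langle \omega, 1_{[0,t]\times[0,x]}\rangle$, formally $W(t,x) = \langle\omega, \delta_{t,x}\rangle$ and in terms of the basis $\{m_i \otimes e_j\}$ of $L^2(\mathbb{R}_+ \times \mathbb{R}^d)$, $\delta_{t,x}$ has the formal expansion $\delta_{t,x} = \sum_{i,j} m_i(t) e_j(x)\, m_i \otimes e_j$; consequently $W(t,x)$ should have the Wiener chaos expansion

$$W(t,x) = \sum_{i,j} m_i(t) e_j(x) \int m_i(s) e_j(y)\, dB_{s,y} = \sum_{i,j} m_i(t) e_j(x)\, W_{\epsilon_{ij}}, \tag{3.1}$$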

where $\epsilon_{ij}$ is the matrix multi-index with 1 in the $(i,j)$ entry and 0 everywhere else. Indeed, we could define space-time white noise to be the object whose Wiener chaos expansion is given by (3.1). Clearly, $W(t,x) \notin (L^2)$ but it is not difficult to see that $W(t,x) \in (S^{-0})$ (see Kuo (1994)). Any study of the scaling properties of $u(t,x)$ must revolve around the scaling properties of $W(t,x)$, which we now investigate. From the Brownian scaling $a^{-1/2} b^{-d/2} B_{at,bx} \overset{D}{=} B_{t,x}$ and the fact that $W(t,x)$ should behave like the "derivative" of $B_{t,x}$, we see that $W$ should scale like $a^{1/2} b^{d/2} W(at,bx) \overset{D}{=} W(t,x)$. But such a statement has as yet no mathematical meaning, for $W(t,x)$ exists only as a distribution and not a random variable and so we need to give a proper meaning to the notion of "identity in law". This can be done via the Wiener chaos expansion as follows. From (3.1) we have

$$\sqrt{ab^d}\, W(at,bx) = \sqrt{ab^d} \sum_{i,j} m_i(at) e_j(bx)\, W_{\epsilon_{ij}}.$$

But $\tilde m_i \otimes \tilde e_j := \sqrt{a}\, m_i(a\,\cdot) \otimes \sqrt{b^d}\, e_j(b\,\cdot)$ also form an orthonormal basis and hence the family of random variables $\{W_{\epsilon_{ij}}\}$ is identical in law to the family $\{\widetilde W_{\epsilon_{ij}}\}$, where, of course, $\widetilde W_\alpha$ denotes the iterated Wiener integral of the orthonormal basis $\{\tilde m_i \otimes \tilde e_j\}$. Thus $\widetilde W(t,x) := \sum_{i,j} \tilde m_i(t) \tilde e_j(x)\, \widetilde W_{\epsilon_{ij}}$ is also a space-time white noise which can justifiably be said to be "identical in law" to $\sqrt{ab^d}\, W(at,bx)$. This motivates the following definition:

Definition 3.1. (i) Let $F = \sum_\alpha a_\alpha W_\alpha$ and $G = \sum_\alpha b_\alpha W_\alpha$ be two elements of $(S^{-1})$. Then $F$ and $G$ are said to be identical in law in the sense of Wiener chaos, written $F \overset{W}{=} G$, if there exists another orthonormal basis $\{\tilde m_i \otimes \tilde e_j\}$ and associated iterated Wiener integrals $\widetilde W_\alpha$ such that $G = \sum_\alpha a_\alpha \widetilde W_\alpha$. (In other words, $\sum_\alpha a_\alpha W_\alpha \overset{W}{=} \sum_\alpha a_\alpha \widetilde W_\alpha$.)
(ii) Let $(F_n)$ be a sequence in $(S^{-1})$. Then $F_n$ is said to converge in law in the sense of Wiener chaos to $F \in (S^{-1})$ as $n \to \infty$, written $F_n \overset{W}{\to} F$, if there exists a sequence $(\widetilde F_n)$ such that for every $n$, $F_n \overset{W}{=} \widetilde F_n$ and $\widetilde F_n \to F$ in the topology of $(S^{-1})$ as $n \to \infty$.

In terms of the Hermite transform, $F \overset{W}{=} G$ if and only if there are two bases $\{m_i \otimes e_j\}$ and $\{\tilde m_i \otimes \tilde e_j\}$ and associated Hermite transforms $\mathcal{H}$ and $\widetilde{\mathcal{H}}$ such that $\mathcal{H}F(z) = \widetilde{\mathcal{H}}G(z)$.


Substituting the Wiener chaos expansions $u(t,x) = \sum_\alpha c_\alpha(t,x) W_\alpha$ and (3.1) into (2.8) and equating coefficients, we obtain the following recursive system of PDEs for the coefficients $c_\alpha$:

$$\frac{\partial c_0}{\partial t} = \nu\Delta c_0 + \frac{\lambda}{2}|\nabla c_0|^2, \qquad c_0(0,x) = u_0(x), \tag{3.2a}$$

$$\frac{\partial c_{ij}}{\partial t} = \nu\Delta c_{ij} + \lambda\nabla c_0 \cdot \nabla c_{ij} + \sigma m_i(t) e_j(x), \qquad c_{ij}(0,x) = 0, \tag{3.2b}$$

$$\frac{\partial c_\alpha}{\partial t} = \nu\Delta c_\alpha + \lambda\nabla c_0 \cdot \nabla c_\alpha + \frac{\lambda}{2} \sum_{\substack{\beta,\gamma \neq 0:\\ \beta+\gamma=\alpha}} \nabla c_\beta \cdot \nabla c_\gamma, \qquad c_\alpha(0,x) = 0, \quad |\alpha| > 1. \tag{3.2c}$$

(In the above, we have used $c_{ij}$ as a simplified notation for $c_{\epsilon_{ij}}$.) The most striking feature of (3.2) is that all the equations apart from (3.2a) are linear autonomous equations for $c_\alpha$ (note that the non-homogeneous term in (3.2c) involves only $c_\eta$ for $|\eta| < |\alpha|$ which are already known from the earlier equations). The non-linear equation (3.2a) for $c_0$ is what one would get if the noise term in (2.8) were omitted (for constants, Wick multiplication agrees with ordinary multiplication) and $c_0(t,x)$ could be thought of as the "expected value" of $u(t,x)$ even though there is no proper notion of expectation on $(S^{-p})$. Note that the Cole–Hopf transformation (1.2) can be applied to $c_0$ to get a linear PDE. As equating coefficients to get (3.2) could be considered to be only a formal procedure, there is a little extra work required to prove the result rigorously.

Lemma 3.1. Consider the Wiener chaos expansion $u(t,x) = \sum_\alpha c_\alpha(t,x) W_\alpha$ of the solution to (2.8). Then the coefficients $c_\alpha$ satisfy Eqs. (3.2).

Proof. From (3.1), we see that the Hermite transform of space-time white noise $W(t,x)$ is given by $(\mathcal{H}W(t,x))(z) = \sum_{i,j} z_{ij} m_i(t) e_j(x)$. Let $U_z(t,x) = (\mathcal{H}u(t,x))(z)$ be the Hermite transform of $u(t,x)$. Then because of (2.7), $U_z$ satisfies the PDE

$$\frac{\partial U_z}{\partial t} = \nu\Delta U_z + \frac{\lambda}{2}|\nabla U_z|^2 + \sigma\phi_z, \qquad U_z(0,x) = u_0(x), \tag{3.3}$$

where we have put $\phi_z(t,x) = \sum_{i,j} z_{ij} m_i(t) e_j(x)$. Next, observe that the coefficients $c_\alpha$ have the following representation in terms of $U_z$:

$$c_\alpha = \frac{1}{\alpha!}\, \partial_z^\alpha U_z \Big|_{z=0}, \tag{3.4}$$

where $\partial_z^\alpha = \prod_{i,j} \partial^{\alpha_{ij}} / \partial z_{ij}^{\alpha_{ij}}$. To establish the lemma, we just have to check that if $c_\alpha$ are defined using (3.4), then $c_\alpha$ satisfy (3.2). This is clearly true for $c_0$. For the higher order coefficients, note that the linear terms in (3.3) and (3.2) present no problems, for

$$\frac{\partial c_\alpha}{\partial t} = \frac{1}{\alpha!}\, \partial_z^\alpha \frac{\partial U_z}{\partial t}\Big|_{z=0}, \qquad \Delta c_\alpha = \frac{1}{\alpha!}\, \partial_z^\alpha \Delta U_z \Big|_{z=0}. \tag{3.5}$$

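We just need to check that the quadratic and non-homogeneous terms in (3.3) and (3.2) agree with (3.4). For $c_{ij}$, we have

$$\partial_z^{\epsilon_{ij}} |\nabla U_z|^2 = \frac{\partial}{\partial z_{ij}} |\nabla U_z|^2 = 2\nabla U_z \cdot \partial_{z_{ij}} \nabla U_z,$$

$$\partial_{z_{ij}} \nabla U_z \Big|_{z=0} = \nabla \partial_{z_{ij}} U_z \Big|_{z=0} = \nabla c_{ij}, \qquad \text{and} \qquad \frac{\partial \phi_z}{\partial z_{ij}}\Big|_{z=0} = m_i e_j,$$

so that $c_{ij}$ satisfies (3.2b). For the higher-order coefficients, first note that

$$\partial_z^\alpha \phi_z \Big|_{z=0} = 0 \quad \text{if } |\alpha| > 1. \tag{3.6}$$

Next, observe the elementary identity

$$m \sum_{k=0}^{m} a_k a_{m-k} = 2 \sum_{k=1}^{m} k\, a_k a_{m-k}$$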

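for arbitrary sequences $(a_k)$. Hence, if $\alpha$ is any multi-index with $\alpha_{ij} = 0$ and $\epsilon_{ij}$ denotes the multi-index with the $(i,j)$ component equal to 1 and 0 elsewhere, then

$$\begin{aligned}
\partial_z^{\alpha+(n+1)\epsilon_{ij}} |\nabla U_z|^2
&= \partial_z^\alpha \frac{\partial^n}{\partial z_{ij}^n} \left[ 2\nabla(\partial_{z_{ij}} U_z) \cdot \nabla U_z \right] \\
&= 2\partial_z^\alpha \left[ \sum_{k=0}^{n} \frac{n!}{k!(n-k)!}\, \nabla(\partial_{z_{ij}}^{k+1} U_z) \cdot \nabla(\partial_{z_{ij}}^{n-k} U_z) \right] \\
&= 2(n+1)!\, \partial_z^\alpha \left[ \sum_{k=0}^{n} \frac{k+1}{n+1}\, \frac{1}{(k+1)!(n-k)!}\, \nabla(\partial_{z_{ij}}^{k+1} U_z) \cdot \nabla(\partial_{z_{ij}}^{n-k} U_z) \right] \\
&= (n+1)!\, \partial_z^\alpha \left[ \sum_{k=-1}^{n} \frac{1}{(k+1)!(n-k)!}\, \nabla(\partial_{z_{ij}}^{k+1} U_z) \cdot \nabla(\partial_{z_{ij}}^{n-k} U_z) \right] \\
&= (n+1)! \sum_{k=-1}^{n} \sum_{\beta+\gamma=\alpha} \frac{1}{(k+1)!(n-k)!}\, \frac{\alpha!}{\beta!\gamma!}\, \nabla(\partial_z^{\beta+(k+1)\epsilon_{ij}} U_z) \cdot \nabla(\partial_z^{\gamma+(n-k)\epsilon_{ij}} U_z) \\
&= (\alpha+(n+1)\epsilon_{ij})! \sum_{k=-1}^{n} \sum_{\beta+\gamma=\alpha} \nabla c_{\beta+(k+1)\epsilon_{ij}} \cdot \nabla c_{\gamma+(n-k)\epsilon_{ij}} \qquad \text{putting } z = 0 \\
&= (\alpha+(n+1)\epsilon_{ij})! \sum_{\beta+\gamma=\alpha+(n+1)\epsilon_{ij}} \nabla c_\beta \cdot \nabla c_\gamma.
\end{aligned} \tag{3.7}$$

Here the fourth line uses the elementary identity with $a_k = \nabla(\partial_{z_{ij}}^{k} U_z)/k!$ and $m = n+1$, so that the sum runs over all splittings of the $n+1$ derivatives in $z_{ij}$ between the two factors. Comparing (3.3) and (3.2c), taking into account (3.5), (3.6) and (3.7), shows that $c_{\alpha+(n+1)\epsilon_{ij}}$ satisfies (3.2c). The result now follows by induction. $\square$

We next establish the scaling properties of (3.2) which will form the basis for the scaling properties of $u(t,x)$.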
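The elementary convolution identity used in the proof of Lemma 3.1, $m\sum_{k=0}^m a_k a_{m-k} = 2\sum_{k=1}^m k\,a_k a_{m-k}$, can be spot-checked on random rational sequences. The sketch below treats the $a_k$ as scalars, whereas in the proof they are gradients combined by a dot product; since the dot product is symmetric the same argument applies. The function name `convolution_identity_gap` is a hypothetical helper introduced only for this check.

```python
import random
from fractions import Fraction


def convolution_identity_gap(a):
    """Return m*sum_{k=0}^m a_k a_{m-k} - 2*sum_{k=1}^m k a_k a_{m-k}, with m = len(a) - 1."""
    m = len(a) - 1
    lhs = m * sum(a[k] * a[m - k] for k in range(m + 1))
    rhs = 2 * sum(k * a[k] * a[m - k] for k in range(1, m + 1))
    return lhs - rhs


rng = random.Random(0)  # fixed seed so the check is reproducible
gaps = []
for m in range(0, 8):
    a = [Fraction(rng.randint(-9, 9), rng.randint(1, 9)) for _ in range(m + 1)]
    gaps.append(convolution_identity_gap(a))

print(all(g == 0 for g in gaps))
```

The identity follows from the symmetry $k \leftrightarrow m-k$ of the summand $a_k a_{m-k}$, which is why the gap vanishes exactly rather than approximately.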


Lemma 3.2. Let $c_\alpha(t,x)$ be the coefficients in the Wiener chaos expansion of $u(t,x)$ relative to the basis $\{m_i \otimes e_j\}$, satisfying Eqs. (3.2). Let $c_\alpha^{(b)}$ be the coefficients in the Wiener chaos expansion of $u(t,x)$ relative to the basis $\{b\, m_i(b^2\,\cdot) \otimes b^{d/2} e_j(b\,\cdot)\}$. Then $c_\alpha^{(b)}(b^2 t, bx) = b^{|\alpha|(1-d/2)} c_\alpha(t,x)$, provided that the initial condition $u_0(x)$ is scaled accordingly, so that $c_0^{(b)}(0,bx) = c_0(0,x)$.

Proof. Define $\tilde c_0(t,x) := c_0(b^2 t, bx)$. Then an easy exercise shows that $\tilde c_0$ satisfies (3.2a) (with $\tilde c_0(0,x) = c_0(0,bx)$). Next, define $\tilde c_{ij}(t,x) := b^{-(1-d/2)} c_{ij}(b^2 t, bx)$. Again, an elementary exercise shows that $\tilde c_{ij}$ satisfies (3.2b) with $c_0$ replaced by $\tilde c_0$ and $m_i(t) e_j(x)$ replaced by $\tilde m_i(t) \tilde e_j(x) := b\, m_i(b^2 t)\, b^{d/2} e_j(bx)$. Finally, defining $\tilde c_\beta(t,x) := b^{-|\beta|(1-d/2)} c_\beta(b^2 t, bx)$ for $|\beta| \le |\alpha|$, some more elementary computations show that $\tilde c_\alpha$ satisfies (3.2c) with $c$ replaced by $\tilde c$. Thus, $\tilde c_\alpha$ satisfy the same system of equations as $c_\alpha^{(b)}$. $\square$

We conclude this section by recording for future reference the explicit solution to Eqs. (3.2) which is given by:

$$c_0(t,x) = \frac{2\nu}{\lambda} \log E^x\left[ \exp\left\{ \frac{\lambda}{2\nu} u_0(B_{2\nu t}) \right\} \right], \tag{3.8a}$$

$$c_{ij}(t,x) = E^{t,x}\left[ \sigma \int_0^t m_i(s) e_j(X_s)\, ds \right], \tag{3.8b}$$

$$c_\alpha(t,x) = E^{t,x}\left[ \frac{\lambda}{2} \int_0^t \sum_{\substack{\beta,\gamma \neq 0:\\ \beta+\gamma=\alpha}} \nabla c_\beta(s,X_s) \cdot \nabla c_\gamma(s,X_s)\, ds \right], \tag{3.8c}$$

where, in (3.8a), $B$ is a standard Brownian motion in $\mathbb{R}^d$ and $E^x$ is expectation taken with respect to the law of $B$ while in (3.8b,c), $X$ is the $\mathbb{R}^d$-valued process satisfying the backward stochastic differential equation

$$-dX_s = \sqrt{2\nu}\, dB_s^{\leftarrow} + \lambda\nabla c_0(s,X_s)\, ds \tag{3.9}$$

and $E^{t,x}$ is expectation taken with respect to the law $P^{t,x}$ of $X$, which is the solution to the backward martingale problem

$$P^{t,x}(X_r = x,\ r \ge t) = 1,$$

$$s \mapsto M_s^f := f(s,X_s) - f(t,x) + \int_s^t \left[ \dot f(r,X_r) - \nu\Delta f(r,X_r) - \lambda\nabla c_0(r,X_r) \cdot \nabla f(r,X_r) \right] dr$$

is a backward martingale. In (3.9), $dB_s^{\leftarrow}$ denotes the backward Itô integral defined to be

$$\int_s^t Y_u\, dB_u^{\leftarrow} = \int_0^{t-s} Y_{t-u}\, dB_u.$$

In particular, note that if a flat initial condition $u_0 \equiv K = \text{const.}$ is assumed, then $c_0 \equiv K$ and in (3.8b,c), the process $X_s$ is just a (time-reversed) Brownian motion $s \mapsto B_{2\nu(t-s)}$ with $B_0 = x$. Clearly, we can take $K = 0$ without loss of generality. We shall henceforth make this simplifying assumption, which will enable us to do some explicit calculations.


4. The Main Result: Scaling Limits of the KPZ Equation

We are now ready to state the main result of this paper:

Theorem 4.1. Let $u(t,x)$ be the solution to the KPZ equation (2.8) with constant initial condition $u_0(x) = 0$. Then the following scaling limits hold:
(i) Full scaling limit:
(a) $d = 1$: for fixed $t$ and $x$, $k^{-1/4} u(k^{7/4} t, kx) \overset{W}{\to} u^\infty(t,x)$ as $k \to \infty$, where the limit $u^\infty(t,x) \in (S^{-1})$ is non-trivial and has non-zero Wiener-chaos coefficients $c_\alpha^\infty$ of all orders.
(b) $d = 2$: for fixed $t$, $x$ and $k$, $u(k^2 t, kx) \overset{W}{=} u(t,x)$.
(ii) White noise scaling limit: In any dimension $d \ge 1$, suppose $z$ satisfies

$$z < \min\left\{ \frac{6+d}{4},\ 2 \right\}$$

and let $\chi$ be given by

$$z - \chi = 1 + \frac{d}{2}.$$

Then for fixed $t$ and $x$, $k^{-\chi} u(k^z t, kx) \overset{W}{\to} w(t,x)$ as $k \to \infty$, where the limit $w(t,x)$ is a multiple of white noise.

Before proving Theorem 4.1, we need some preliminary lemmas. In what follows, expressions of the form $f(n) \sim g(n)$ will be interpreted in the weaker sense, $f(n) \sim O(g(n))$, that the ratio of the two sides tends to a constant (not necessarily 1) as $n \to \infty$.

Lemma 4.1. Let $g(t,x)$ be a smooth function and for some multi-index $p = (p_1, p_2, \ldots, p_d)$ let

$$g^{(p)} = \partial^p g = \frac{\partial^{|p|} g}{\partial x_1^{p_1} \cdots \partial x_d^{p_d}}.$$

Let $\gamma(t,x)$ be the function defined by

$$\gamma(t,x) = E^x\left[ \int_0^t g(s, B_{t-s})\, ds \right],$$

where $B$ is a standard Brownian motion in $\mathbb{R}^d$ and $E^x$ is the expectation taken with respect to the law of $B$ given $B_0 = x$. Then as $t \downarrow 0$,

$$\partial^p \gamma(t,x) \sim \int_0^t g^{(p)}(s,x)\, ds + \sum_{|r| \ge 1} \frac{1}{2^{|r|}\, r!} \int_0^t \partial^{2r} g^{(p)}(s,x)\, (t-s)^{|r|}\, ds,$$

where in the sum on the right-hand side above, $r = (r_1, \ldots, r_d)$ is a multi-index: thus $\partial^r = \prod_{j=1}^d \partial^{r_j} / \partial y_j^{r_j}$ etc. In particular, if $g^{(p)}(0,x) \neq 0$, then

$$\partial^p \gamma(t,x) \sim t\, g^{(p)}(0,x) + \sum_{|r| \ge 1} \frac{t^{|r|+1}}{2^{|r|} (|r|+1)\, r!}\, \partial^{2r} g^{(p)}(0,x).$$
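The mechanism behind Lemma 4.1 is the expansion $E[g(x + B_\tau)] = \sum_{r} \frac{\tau^{r}}{2^{r} r!}\, g^{(2r)}(x)$, which for polynomial $g$ terminates and can be verified exactly. The sketch below is restricted to $d = 1$ and to a time-independent polynomial $g$ (both simplifying assumptions made here for illustration), and computes the left-hand side from the Gaussian even-moment formula $E[Y^{2m}] = \tau^m (2m)!/(2^m m!)$ for $Y \sim N(0,\tau)$. All function names are hypothetical helpers.

```python
from fractions import Fraction
from math import comb, factorial


def gaussian_even_moment(m, tau):
    """E[Y^(2m)] for Y ~ N(0, tau): tau^m (2m)! / (2^m m!)."""
    return Fraction(factorial(2 * m), 2**m * factorial(m)) * tau**m


def derivative(poly):
    """Derivative of a polynomial given as a coefficient list [c0, c1, ...]."""
    return [k * c for k, c in enumerate(poly)][1:]


def evaluate(poly, x):
    return sum(c * x**k for k, c in enumerate(poly))


def expectation_direct(poly, x, tau):
    """E[g(x + Y)] via the binomial expansion of (x + Y)^n and even moments of Y."""
    total = Fraction(0)
    for n, c in enumerate(poly):
        for j in range(0, n + 1, 2):  # odd moments of Y vanish
            total += c * comb(n, j) * x ** (n - j) * gaussian_even_moment(j // 2, tau)
    return total


def expectation_series(poly, x, tau):
    """sum_r tau^r / (2^r r!) * g^(2r)(x); finite because g is a polynomial."""
    total = Fraction(0)
    g, r = list(poly), 0
    while g:
        total += tau**r / (2**r * factorial(r)) * evaluate(g, x)
        g = derivative(derivative(g))
        r += 1
    return total


g = [3, -1, 0, 2, 1, 0, 5]  # an arbitrary polynomial of degree 6
x, tau = Fraction(2, 3), Fraction(5, 4)
print(expectation_direct(g, x, tau) == expectation_series(g, x, tau))
```

For example, with $g(y) = y^4$ both routines return $x^4 + 6x^2\tau + 3\tau^2$; integrating such an expansion in $s$ with $\tau = t - s$ produces the powers $t^{|r|+1}/(|r|+1)$ appearing in the lemma.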


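Proof. Writing

$$\gamma(t,x) = \int_0^t \int_{\mathbb{R}^d} \frac{1}{\sqrt{(2\pi(t-s))^d}} \exp\left\{ -\frac{|x-y|^2}{2(t-s)} \right\} g(s,y)\, dy\, ds = \int_0^t \int_{\mathbb{R}^d} \frac{1}{\sqrt{(2\pi(t-s))^d}} \exp\left\{ -\frac{|y|^2}{2(t-s)} \right\} g(s,x+y)\, dy\, ds,$$

we see that

$$\partial^p \gamma(t,x) = \int_0^t \int_{\mathbb{R}^d} \frac{1}{\sqrt{(2\pi(t-s))^d}} \exp\left\{ -\frac{|y|^2}{2(t-s)} \right\} g^{(p)}(s,x+y)\, dy\, ds. \tag{4.1}$$

It is therefore sufficient to show that

$$\int_{\mathbb{R}^d} \frac{1}{\sqrt{(2\pi(t-s))^d}} \exp\left\{ -\frac{|y|^2}{2(t-s)} \right\} g^{(p)}(s,x+y)\, dy \sim g^{(p)}(s,x) + \sum_{|r| \ge 1} \frac{(t-s)^{|r|}}{2^{|r|}\, r!}\, \partial^{2r} g^{(p)}(s,x) \tag{4.2}$$

as $t \downarrow s$, uniformly in $s$ belonging to some small neighbourhood of 0. Next, expand $g^{(p)}(s,x+y)$ as a Taylor series

$$g^{(p)}(s,x+y) = g^{(p)}(s,x) + \sum_{|r| \ge 1} \frac{y^r}{r!}\, \partial^r g^{(p)}(s,x),$$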

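which has a common radius of convergence for all $s \in [0,\delta]$. The result (4.2) then follows upon substituting the above Taylor expansion into (4.1) and noting the following formula for the even moments of a centred Gaussian distribution with variance $t-s$:

$$\int_{\mathbb{R}} \frac{y^{2m}}{\sqrt{2\pi(t-s)}} \exp\left\{ -\frac{y^2}{2(t-s)} \right\} dy = \frac{(t-s)^m (2m)!}{2^m m!}. \tag{4.3}$$

$\square$

Lemma 4.2. Let $c_\alpha(t,x)$ be the coefficients in the Wiener chaos expansion of $u(t,x)$ with respect to the basis $\{m_i \otimes e_j\}$ given by (2.2). Then for fixed $x$, as $t \downarrow 0$,

$$c_\alpha(t,x) \sim n_\alpha \left( \frac{\lambda}{2} \right)^{|\alpha|-1} \left[ \frac{e^{(|\alpha|-1)}\, t^{2|\alpha|-1}}{(2|\alpha|-1)(2|\alpha|-3) \cdots 3 \cdot 1} + \frac{1}{2} \left( \frac{e^{(|\alpha|+1)}\, t^{2|\alpha|}}{(2|\alpha|)(2|\alpha|-2) \cdots 2} \right) \right] \tag{4.4}$$

and

$$|\nabla c_\alpha(t,x)| \sim \tilde n_\alpha \left( \frac{\lambda}{2} \right)^{|\alpha|-1} \left[ \frac{\tilde e^{(|\alpha|)}\, t^{2|\alpha|-1}}{(2|\alpha|-1)(2|\alpha|-3) \cdots 3 \cdot 1} + \frac{1}{2} \left( \frac{\tilde e^{(|\alpha|+2)}\, t^{2|\alpha|}}{(2|\alpha|)(2|\alpha|-2) \cdots 2} \right) \right], \tag{4.5}$$

where

$$n_\alpha,\ \tilde n_\alpha \le K^\alpha \tag{4.6}$$

for some constant $K$ and each $e^{(|\alpha|)} = e^{(|\alpha|)}(x)$ and $\tilde e^{(|\alpha|)} = \tilde e^{(|\alpha|)}(x)$ denotes a sum of partial derivatives of the basis functions $\{e_j\}$ of the form

$$e^{(|\alpha|)},\ \tilde e^{(|\alpha|)} = \partial^{|\alpha|} \partial^1 \cdots \partial^1 + \partial^{|\alpha|-1} \partial^2 \cdots \partial^2\, \partial^1 \cdots \partial^1 + \partial^{|\alpha|-2} \partial^3 \cdots \partial^3\, \partial^2 \cdots \partial^2\, \partial^1 \cdots \partial^1 + \cdots, \tag{4.7}$$

where each symbol $\partial^m$ represents a derivative of order $m$ of one particular function $e_j$ (i.e. $\partial^r e_j(x)$ for some multi-index with $|r| = m$) and the sum of the superscripts in each term of (4.7) is $2|\alpha|$ in the case of $e^{(|\alpha|)}$ and $2|\alpha|-1$ in the case of $\tilde e^{(|\alpha|)}$.

Proof. We prove the lemma by induction on $\alpha$. For $|\alpha| = 1$, the result follows from (3.8b) and Lemma 4.1. Next, assume that (4.4), (4.5), (4.6) and (4.7) hold for all matrix multi-indices $\beta$ with $|\beta| < |\alpha|$. Using (3.8c) and Lemma 4.1 we have

$$\begin{aligned}
c_\alpha(t,x) &\sim \left( \frac{\lambda}{2} \right)^{|\alpha|-2} \frac{\lambda}{2} \int_0^t \sum_{\beta+\gamma=\alpha} n_\beta n_\gamma \left( \frac{\tilde e^{(|\beta|)} s^{2|\beta|-1}}{(2|\beta|-1) \cdots 3 \cdot 1} + \frac{1}{2}\, \frac{\tilde e^{(|\beta|+2)} s^{2|\beta|}}{(2|\beta|) \cdots 2} \right) \\
&\qquad\qquad \times \left( \frac{\tilde e^{(|\gamma|)} s^{2|\gamma|-1}}{(2|\gamma|-1) \cdots 3 \cdot 1} + \frac{1}{2}\, \frac{\tilde e^{(|\gamma|+2)} s^{2|\gamma|}}{(2|\gamma|) \cdots 2} \right) ds \\
&\sim \left( \frac{\lambda}{2} \right)^{|\alpha|-1} \sum_{\beta+\gamma=\alpha} \frac{n_\beta n_\gamma\, \tilde e^{(|\beta|)} \tilde e^{(|\gamma|)}\, t^{2|\alpha|-1}}{(2|\alpha|-1)\, (2|\beta|-1) \cdots 1\, (2|\gamma|-1) \cdots 1} \\
&\quad + \frac{1}{4|\alpha|} \left( \frac{\lambda}{2} \right)^{|\alpha|-1} \sum_{\beta+\gamma=\alpha} \left( \frac{n_\beta n_\gamma\, \tilde e^{(|\beta|+2)} \tilde e^{(|\gamma|)}\, t^{2|\alpha|}}{(2|\beta|) \cdots 2\, (2|\gamma|-1) \cdots 1} + \frac{n_\beta n_\gamma\, \tilde e^{(|\beta|)} \tilde e^{(|\gamma|+2)}\, t^{2|\alpha|}}{(2|\beta|-1) \cdots 1\, (2|\gamma|) \cdots 2} \right) + o(t^{2|\alpha|}).
\end{aligned} \tag{4.8}$$

Now note that for large $|\beta|$,

$$(2|\beta|-1) \cdots 3 \cdot 1 \sim |\beta|^{-1/2}\, (2|\beta|) \cdots 2,$$
$$(2|\beta|-1) \cdots 3 \cdot 1 \sim (|\beta|-1) \sqrt{(2|\beta|-1)!},$$
$$(2|\beta|) \cdots 2 \sim |\beta| \sqrt{(2|\beta|)!}.$$

(To see this, consider the large $n$ behaviour of $\prod_{k=1}^n (2k+1)/(2k) = \prod_{k=1}^n (1 + 1/(2k))$.) Next, $(|\beta|-1)\sqrt{(2|\beta|-1)!}\, (|\gamma|-1)\sqrt{(2|\gamma|-1)!}$ (where $\beta + \gamma = \alpha$) has its minimum when $|\beta| = |\gamma| = |\alpha|/2$. By Stirling's formula,

$$\frac{[2(|\alpha|-1)]!}{[(|\alpha|-1)!]^2} \sim 2^{2(|\alpha|-1)}.$$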
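The first of the double-factorial asymptotics above, $(2n-1)\cdots 3\cdot 1 \sim n^{-1/2}\,(2n)\cdots 2$ in the paper's weak sense of $\sim$, can be checked numerically: the ratio of the odd to the even product, multiplied by $\sqrt{n}$, should settle to a constant. The sketch below computes that scaled ratio as a running product; the limiting constant $1/\sqrt{\pi}$ quoted in the comment is the classical Wallis value and is not stated in the text. The function name is a hypothetical helper.

```python
import math


def odd_over_even_double_factorial(n):
    """(2n-1)!! / (2n)!! as the running product of (2k-1)/(2k), k = 1..n."""
    ratio = 1.0
    for k in range(1, n + 1):
        ratio *= (2 * k - 1) / (2 * k)
    return ratio


# The scaled ratio should approach a constant (classically 1/sqrt(pi)).
scaled = {n: odd_over_even_double_factorial(n) * math.sqrt(n) for n in (10, 100, 1000, 10000)}
for n, value in scaled.items():
    print(n, value)
```

Working with the ratio directly avoids forming the factorials themselves, which overflow floating point long before the asymptotic regime is visible.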


Therefore

$$\begin{aligned}
(2|\beta|-1) \cdots 3 \cdot 1\ (2|\gamma|-1) \cdots 3 \cdot 1 &\ge (|\beta|-1)(|\gamma|-1)(|\alpha|-1)! \\
&\sim (|\beta|-1)(|\gamma|-1) \sqrt{(2|\alpha|-2)!}\ 2^{-(|\alpha|-1)} \\
&\sim (|\beta|-1)(|\gamma|-1)\, (2|\alpha|-3) \cdots 1\ |\alpha|^{-1/2}\, 2^{-(|\alpha|-1)} \\
&\ge (2|\alpha|-3) \cdots 1\ 2^{-|\alpha|}.
\end{aligned}$$

For $\beta + \gamma = \alpha$, we can have $|\beta| = |\alpha|-1$ and $|\gamma| = 1$, $|\beta| = |\alpha|-2$ and $|\gamma| = 2$ and so on, so it is not hard to see that $\tilde e^{(|\beta|)} \tilde e^{(|\gamma|)}$ gives a sum $e^{(|\alpha|-1)}$ of the form described by (4.7). The number of terms in the sum $\sum_{\beta+\gamma=\alpha}$ is $\prod_{i,j} (\alpha_{ij}+1) - 2$ (because for each individual entry, there are $\alpha_{ij}+1$ pairs of values, $(0,\alpha_{ij}), \ldots, (\alpha_{ij},0)$, which sum up to $\alpha_{ij}$ but the 2 pairs $(0,\alpha)$ and $(\alpha,0)$ are not included in the sum). The main point is that this number can not exceed $2^{\alpha}$. Consequently, putting all the above observations together gives

$$\left( \frac{\lambda}{2} \right)^{|\alpha|-1} \sum_{\beta+\gamma=\alpha} \frac{n_\beta n_\gamma\, \tilde e^{(|\beta|)} \tilde e^{(|\gamma|)}\, t^{2|\alpha|-1}}{(2|\alpha|-1)\, (2|\beta|-1) \cdots 1\, (2|\gamma|-1) \cdots 1} \le \left( \frac{\lambda}{2} \right)^{|\alpha|-1} \left( \frac{n_\alpha\, e^{(|\alpha|-1)}\, t^{2|\alpha|-1}}{(2|\alpha|-1)(2|\alpha|-3) \cdots 3 \cdot 1} \right),$$

where $n_\alpha$ satisfies (4.6). This shows that the leading term in (4.8) has the asymptotic behaviour claimed in (4.4). The correction term in (4.8) can be dealt with similarly. This establishes (4.4). The corresponding result (4.5) for $\nabla c_\alpha$ can be proved by noting that Lemma 4.1 shows that (recursively) differentiating under the integral sign in (3.8c) does not change the power of $t$ in the small $t$ asymptotics and so we can effectively simply differentiate (4.4). $\square$

Remark. Since $n_\alpha$ satisfies the exponential estimate (4.6), in future applications of Lemma 4.2, the $(\lambda/2)^{|\alpha|-1}$ appearing in (4.4) and (4.5) will be absorbed into $n_\alpha$ and omitted for notational convenience.

Proof of Theorem 4.1. Fix an orthonormal basis $\{m_i \otimes e_j\}$ of $L^2(\mathbb{R}_+ \times \mathbb{R}^d)$ as given by (2.2) and let $c_\alpha^{(k)}(t,x)$ denote the coefficients of the Wiener chaos expansion of $u(t,x)$ with respect to the basis $\{k^{-1} m_i(k^{-2}\,\cdot) \otimes k^{-d/2} e_j(k^{-1}\,\cdot)\}$. Then by Lemma 3.2, $c_\alpha^{(k)}(k^2 t, kx) = k^{|\alpha|(1-d/2)} c_\alpha(t,x)$ (where $c_\alpha$ are of course the Wiener chaos coefficients with respect to $\{m_i \otimes e_j\}$) and hence, for fixed $k$,

$$k^{-\chi} u(k^z t, kx) = k^{-\chi} \sum_\alpha c_\alpha^{(k)}(k^2 k^{z-2} t, kx)\, W_\alpha^{(k)} \overset{W}{=} k^{-\chi} \sum_\alpha k^{|\alpha|(1-d/2)} c_\alpha(k^{z-2} t, x)\, W_\alpha. \tag{4.9}$$

The full scaling limit in dimension $d = 2$ stated in (i)(b) of the theorem is now immediate, for putting $d = z = 2$ in (4.9) gives

$$u(k^2 t, kx) \overset{W}{=} \sum_\alpha c_\alpha(t,x)\, W_\alpha = u(t,x).$$

(This can also be easily checked directly from the KPZ equation (2.8) without recourse to Wiener chaos expansions.) In the other cases, we shall show that the right-hand side of (4.9) with the stated values of $\chi$ and $z$ converges in the topology of $(S^{-1})$ as $k \to \infty$, i.e. for any fixed test function $f = \sum_\alpha a_\alpha W_\alpha \in (S^{1})$ whose coefficients $a_\alpha$ satisfy the growth condition (2.4) with $p = 1$,

$$k^{-\chi} \sum_\alpha \alpha!\, a_\alpha\, k^{|\alpha|(1-d/2)} c_\alpha(k^{z-2} t, x) \tag{4.10}$$

converges to a non-trivial limit as $k \to \infty$. A sufficient condition for (4.10) to converge is that the individual terms

$$k^{-\chi} \alpha!\, a_\alpha\, k^{|\alpha|(1-d/2)} c_\alpha(k^{z-2} t, x) \tag{4.11}$$

converge uniformly in $\alpha$, and it is this last assertion that we shall prove. Before that, let us first find the values of the exponents $\chi$ and $z$ which are necessary for convergence. For $z < 2$, by Lemma 4.2,

$$c_\alpha(k^{z-2} t, x) \sim O\left( k^{(z-2)(2|\alpha|-1)} \right) \tag{4.12}$$

as $k \to \infty$. Hence, in order that the quantity in (4.11) converge to a non-zero limit as $k \to \infty$ for all $\alpha$, it is necessary that

$$k^{-\chi}\, k^{|\alpha|(1-d/2)}\, k^{(z-2)(2|\alpha|-1)} = k^{|\alpha|(2z-3-d/2)}\, k^{2-z-\chi} \tag{4.13}$$

converge for each $\alpha$ as $k \to \infty$; in other words

$$2z - 3 - d/2 = 0, \qquad z + \chi = 2. \tag{4.14}$$
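The exponent bookkeeping in (4.13) and (4.14) can be sketched with exact rational arithmetic. In the snippet below, `exponent` returns the power of $k$ on the left-hand side of (4.13) and `full_scaling_exponents` solves (4.14) for a given dimension; both names are hypothetical helpers introduced for illustration, and the white noise example uses the arbitrary admissible choice $z = 3/2$ in $d = 1$.

```python
from fractions import Fraction


def exponent(alpha_len, d, z, chi):
    """Power of k in (4.13): -chi + |alpha|(1 - d/2) + (z - 2)(2|alpha| - 1)."""
    return -chi + alpha_len * (1 - Fraction(d, 2)) + (z - 2) * (2 * alpha_len - 1)


def full_scaling_exponents(d):
    """Solve (4.14): 2z - 3 - d/2 = 0 and z + chi = 2."""
    z = Fraction(6 + d, 4)
    chi = 2 - z
    return z, chi


for d in (1, 2, 3):
    z, chi = full_scaling_exponents(d)
    # Under (4.14) the exponent vanishes for every |alpha|; z < 2 is needed for Lemma 4.2 to apply.
    print(d, z, chi, z < 2, [exponent(n, d, z, chi) for n in (1, 2, 3)])

# White noise regime in d = 1: z - chi = 1 + d/2 with z < (6 + d)/4.
z_wn = Fraction(3, 2)
chi_wn = z_wn - 1 - Fraction(1, 2)
print([exponent(n, 1, z_wn, chi_wn) for n in (1, 2, 3)])
```

In the white noise regime the exponent is zero for $|\alpha| = 1$ and strictly negative for $|\alpha| > 1$, so only the first-order chaos survives in the limit.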

For $d = 1$, the solution to (4.14) is $z = 7/4$, $\chi = 1/4$ as stated in (i)(a) of the theorem; for $d = 2$, the solution is $z = 2$, $\chi = 0$ as has already been proved; for $d \ge 3$ there is no solution for which $z < 2$ so our method based on the small $t$ asymptotics of $c_\alpha(t,x)$ breaks down. For the white noise scaling limit, we require that (4.11) and (4.13) converge to a non-zero limit for $|\alpha| = 1$ and to 0 for $|\alpha| > 1$. This can only happen if

$$z - \chi = 1 + d/2, \qquad z < \frac{6+d}{4}, \tag{4.15}$$

but in addition, we still need $z < 2$. To finish the proof, we need to show that, firstly, the limit of (4.9) is in $(S^{-1})$ and secondly, the individual terms given by (4.11) converge uniformly in $\alpha$. If (4.10) does indeed converge, then the limit of (4.9) will have Wiener chaos coefficients given by the limit as $k \to \infty$ of

$$k^{-\chi}\, k^{|\alpha|(1-d/2)}\, c_\alpha(k^{z-2} t, x), \tag{4.16}$$

which according to Lemma 4.2 is

$$c_\alpha^\infty := \lim_{k \to \infty} \frac{k^{-\chi}\, k^{|\alpha|(1-d/2)}\, k^{(z-2)(2|\alpha|-1)}\, t^{2|\alpha|-1}\, n_\alpha e^{(|\alpha|-1)}}{(2|\alpha|-1)(2|\alpha|-3) \cdots 3 \cdot 1} = \frac{n_\alpha e^{(|\alpha|-1)}\, t^{2|\alpha|-1}}{(2|\alpha|-1)(2|\alpha|-3) \cdots 3 \cdot 1} \tag{4.17}$$


if $\chi$ and $z$ satisfy (4.14). To prove that the limit of (4.9) is in $(S^{-1})$, we need to show that the limiting Wiener chaos coefficients $c_\alpha^\infty$ given by (4.17) satisfy the growth condition (2.5) with $p = 1$. We therefore need to know how the term $e^{(|\alpha|-1)}$ behaves for large $|\alpha|$. To this end, consider the $j$th Hermite function $\xi_j$ given by the formula (2.1). It is known that $\|\xi_j\|_\infty = O(j^{-1/12})$, and since

$$\frac{d^j}{dx^j} e^{-x^2/2} = (-1)^j e^{-x^2/2} H_j(x),$$

we have that

$$\sup_x \left| e^{-x^2/2} H_j(x) \right| = \sup_{x \in \mathbb{R}} \left| \frac{d^j}{dx^j} e^{-x^2/2} \right| = O\left( j^{-1/12} \sqrt{j!} \right).$$

Differentiating the last two expressions gives

$$\sup_x \left| \frac{d^n}{dx^n}\, e^{-x^2/2} H_j(x) \right| = \sup_x \left| \frac{d^{n+j}}{dx^{n+j}}\, e^{-x^2/2} \right| = O\left( (n+j)^{-1/12} \sqrt{(n+j)!} \right).$$

From this, we can infer that

$$\sup_x \left| \frac{d^n}{dx^n} \xi_j(x) \right| = (\sqrt{\pi}\, j!)^{-1/2} \sup_x \left| \frac{d^n}{dx^n}\, e^{-x^2/2} H_j(\sqrt{2}\, x) \right| \le O\left( \frac{2^{n/2}\, (n+j)^{-1/12} \sqrt{(n+j)!}}{\sqrt{j!}} \right). \tag{4.18}$$

Of course, in the context of (4.17), $n$ will play the role of $|\alpha|-1$ and $j$ will play the role of one of the $\beta_l^{(j)}$ in (2.2). For $j = O(bn)$, Stirling's formula shows that

$$\frac{(n+j)!}{j!} \sim O\left( \left\{ \frac{(1+b)^{1+b}}{b^b} \right\}^n n! \right); \tag{4.19}$$

for $j \gg n$,

$$\frac{(n+j)!}{j!} \sim O(j^n), \tag{4.20}$$

while for $n \gg j$,

$$\frac{(n+j)!}{j!} \sim O(n^j\, n!). \tag{4.21}$$

We have already seen in the proof of Lemma 4.2 that

$$(2|\alpha|-1) \cdots 3 \sim (|\alpha|-1) \sqrt{(2|\alpha|-1)!}. \tag{4.22}$$
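The Stirling estimate (4.19) can be examined numerically on the logarithmic scale. The sketch below takes $j = bn$ with integer $b$ (an assumption made so that $j$ is an integer) and compares $\log\frac{(n+j)!}{j!}$ with $\log\big(\{(1+b)^{1+b}/b^b\}^n\, n!\big)$ using `math.lgamma`. The two differ only by a term of order $\log n$, namely approximately $\tfrac12 \log\frac{1+b}{2\pi b n}$, which is negligible against the exponential and factorial growth that matters in the proof. The helper names are hypothetical.

```python
import math


def log_rising_ratio(n, j):
    """log((n + j)! / j!) computed via the log-gamma function."""
    return math.lgamma(n + j + 1) - math.lgamma(j + 1)


def log_stirling_form(n, b):
    """log of {(1+b)^(1+b) / b^b}^n * n!, the right-hand side of (4.19)."""
    return n * ((1 + b) * math.log(1 + b) - b * math.log(b)) + math.lgamma(n + 1)


def log_gap(n, b):
    """Difference of the two logarithms when j = b * n."""
    return log_rising_ratio(n, b * n) - log_stirling_form(n, b)


for b in (1, 2, 3):
    for n in (50, 500):
        predicted = 0.5 * math.log((1 + b) / (2 * math.pi * b * n))
        print(b, n, log_gap(n, b), predicted)
```

Because the gap grows only logarithmically in $n$, it can be absorbed into the factor $(2\mathbb{N})^{\alpha q}$ used in the growth conditions.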


Putting together the estimates (4.18)–(4.22) shows that, for some $q > 0$, $c_\alpha^\infty \sim o\left( (2\mathbb{N})^{\alpha q} \right)$ asymptotically for large $|\alpha|$ and therefore the sum in (2.5) converges for $p = 1$. In the case that $|\alpha| = 1$ and the exponents $\chi$ and $z$ satisfy (4.15), Lemma 4.1 shows that the limiting first-order Wiener chaos coefficients corresponding to (4.17) are given by

$$m_i(0)\, e_j(x)\, t \tag{4.23}$$

if $\alpha = \epsilon_{ij}$, so from (3.1) the white noise limit is $t\,W(0,x)$. It remains to show that the expression in (4.11) converges as $k \to \infty$ uniformly in $\alpha$. Because of the growth condition (2.4) (with $p = 1$) for the coefficients $a_\alpha$, it is sufficient to show that, for some $q > 0$,

$$k^{-\chi}\, k^{|\alpha|(1-d/2)}\, c_\alpha(k^{z-2} t, x)\, (2\mathbb{N})^{-\alpha q} \tag{4.24}$$

converges (to $c_\alpha^\infty (2\mathbb{N})^{-\alpha q}$) uniformly in $\alpha$. But this is immediately apparent on applying the preceding argument to the correction term in (4.4), to show that this behaves asymptotically for large $|\alpha|$ as

$$\frac{1}{2} \left( \frac{e^{(|\alpha|+1)}\, t^{2|\alpha|}}{(2|\alpha|)(2|\alpha|-2) \cdots 2} \right) \sim o\left( (2\mathbb{N})^{\alpha q} \right) \tag{4.25}$$

for some $q > 0$. Since, according to Lemma 4.2,

$$\left[ k^{-\chi}\, k^{|\alpha|(1-d/2)}\, c_\alpha(k^{z-2} t, x) - c_\alpha^\infty \right] (2\mathbb{N})^{-\alpha q} \sim \frac{1}{2} \left( \frac{e^{(|\alpha|+1)}\, t^{2|\alpha|}\, k^{|\alpha|(2z-3-d/2)-\chi}}{(2|\alpha|)(2|\alpha|-2) \cdots 2} \right) (2\mathbb{N})^{-\alpha q},$$

(4.25) shows that (4.24) converges uniformly in $\alpha$ as $k \to \infty$ provided $\chi$ and $z$ satisfy (4.14) or (4.15). $\square$

Remarks. (i) Note that the values of $\chi$ and $z$ for full scaling in dimensions 1 and 2 are related by

$$z + \chi = 2, \qquad z = \frac{6+d}{4}.$$

Therefore (4.14), corresponding to the full scaling limit in dimensions 1 and 2, can be thought of as the extremal case of (4.15), corresponding to the white noise scaling regime.

(ii) The only point in the proof of Theorem 4.1 where the assumption of a constant initial condition is used is in the proof of Lemma 4.1, which relies on certain explicit formulae for Brownian motion. Provided one can establish a similar result for the process $X_t$ defined by (3.9), the remaining argument is equally valid for a large class of (well-behaved) initial conditions $u_0(x)$. For example, the results should still hold for initial conditions $u_0(x)$ which are bounded, continuous and $|u_0(x)| \to 0$ (or $|u_0(x)| \to K$) as $|x| \to \infty$. To see this, note that in the proof of Theorem 4.1, the Wiener chaos coefficients $c_\alpha^{(k)}(t,x) = k^{|\alpha|(1-d/2)} c_\alpha(k^{-2} t, k^{-1} x)$ have to satisfy Eqs. (3.2) with initial condition $u_0(x)$, because we start by expanding $u(t,x)$ relative to the basis $\{W_\alpha^{(k)}\}$. This means that $c_\alpha$ have to satisfy (3.2) with the rescaled initial condition $u_0(kx)$. Heuristically, this situation reduces to that of a constant initial condition because $u_0(kx) \to 0$ as $k \to \infty$. To establish the analogous result to Lemma 4.1 for $X$, we need to compute expectations of functionals of $X$; but we can use Girsanov's theorem to transform such expectations into expectations of functionals of Brownian motion as follows. If $P = P^{t,x}$ denotes the probability law of $X$ given $X_t = x$, define a new law $Q$ by

$$\frac{dQ}{dP} = Z_t = \exp\left\{ -\frac{\lambda}{\sqrt{2\nu}} \int_0^t \nabla c_0(t-u, X_{t-u}) \cdot dB_u - \frac{\lambda^2}{4\nu} \int_0^t |\nabla c_0(t-u, X_{t-u})|^2\, du \right\}.$$

Then Girsanov's theorem says that under $Q$, $\tilde B_t = B_t + (\lambda/\sqrt{2\nu}) \int_0^t \nabla c_0(t-u, X_{t-u})\, du$ is a Brownian motion (where $B$ is a Brownian motion under $P$). Hence, from (3.9) $X_s = \sqrt{2\nu}\, \tilde B_{t-s}$ under $Q$ and so $E^{t,x}\left[ \int_0^t g(s,X_s)\, ds \right] = E^Q\left[ Z_t^{-1} \int_0^t g(s, \sqrt{2\nu}\, \tilde B_{t-s})\, ds \right]$ (where $E^Q$ denotes expectation with respect to $Q$). From the formula (3.8a),

$$c_0(t,x) = \frac{2\nu}{\lambda} \log\left\{ \frac{1}{\sqrt{4\pi\nu t}} \int_{-\infty}^{\infty} \exp\left( -\frac{|y-x|^2}{4\nu t} \right) \exp\left( \frac{\lambda}{2\nu} u_0(ky) \right) dy \right\},$$

we see that (for fixed $t$ and $x$) $c_0$ and $|\nabla c_0|$ converge to 0 as $k \to \infty$. It is then not hard to make rigorous the idea that $Z_t \approx 1$ for large $k$ and small $t \sim k^{z-2}$, and apply the results of Lemma 4.1 for Brownian motion.

(iii) Comparing (4.22) with the "worst case" of (4.19)–(4.21) and noting that $n!/(2n)! \sim O\left( (4^n n!)^{-1} \right)$, we see that in fact, $c_\alpha^\infty \sim o\left( (\alpha!)^{-1/2} (2\mathbb{N})^{\alpha q} \right)$ for some $q > 0$ and so $u^\infty(t,x) \in (S^{-1/2})$.

(iv) The most interesting case of Theorem 4.1 is likely to be the full scaling regime; the white noise scaling is a rather degenerate result in many respects. Thus in dimensions $d \ge 3$ the scaling results presented here are still incomplete.
(v) One further drawback of the approach presented here is that in the space $(S^{-1})$, there is no way of defining correlation functions, which can be written formally in terms of the Wiener chaos coefficients as

$$\langle u(t,x), u(s,y) \rangle = \sum_\alpha \alpha!\, c_\alpha(t,x)\, c_\alpha(s,y),$$

a sum which diverges. Consequently, one cannot consider the scaling properties of correlation functions, except on a formal level.

References

1. Bertini, L. and Giacomin, G.: Stochastic Burgers' and KPZ equations from particle systems. Commun. Math. Phys. 193, 571–607 (1997)
2. Handa, K.: On a stochastic PDE related to Burgers' equation with noise. In: Hydrodynamic Limit and Burgers' Turbulence, ed. T. Funaki and W. A. Woyczynski, Berlin–Heidelberg–New York: Springer, 1996
3. Holden, H., Lindstrøm, T., Øksendal, B., Ubøe, J. and Zhang, T.-S.: The Burgers' equation with a noisy force and the stochastic heat equation. Commun. Partial Differential Equations 19, 119–141 (1994)
4. Holden, H., Lindstrøm, T., Øksendal, B., Ubøe, J. and Zhang, T.-S.: The stochastic Wick-type Burgers' equation. In: Stochastic Partial Differential Equations, ed. A. Etheridge, Cambridge: CUP, 1995
5. Kardar, M., Parisi, G. and Zhang, Y.-C.: Dynamic scaling of growing interfaces. Phys. Rev. Lett. 56, 889–892 (1986)
6. Kuo, H.-H.: Analysis of white noise functionals. Soochow J. Math. 20, 419–464 (1994)
7. Lindstrøm, T., Øksendal, B. and Ubøe, J.: Wick multiplication and Itô-Skorohod stochastic differential equations. In: Ideas and Methods in Mathematical Analysis, Stochastics and Applications, ed. S. Albeverio, J. E. Fenstad, H. Holden and T. Lindstrøm, Cambridge: CUP, 1992
8. Lindstrøm, T., Øksendal, B., Ubøe, J. and Zhang, T.-S.: Stability properties of stochastic partial differential equations. Stoch. Analysis Appl. 13, 177–204 (1995)
9. Nualart, D. and Rozovskii, B.: Weighted stochastic Sobolev spaces and bilinear SPDEs driven by space-time white noise. J. Funct. Anal. 149 No. 1 (1997)
10. Walsh, J.B.: An introduction to stochastic partial differential equations. In: Ecole d'Eté de Probabilités de Saint-Flour XIV 1984, ed. P. L. Hennequin, Lecture Notes in Math. 1180, Berlin–Heidelberg–New York: Springer, 1986

Communicated by A. Kupiainen

Commun. Math. Phys. 209, 691 – 728 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

The Atiyah–Bott–Patodi Method in Deformation Quantization

Boris Fedosov

Institut für Mathematik, Universität Potsdam, Postfach 60 15 53, 14415 Potsdam, Germany

Received: 10 June 1999 / Accepted: 23 August 1999

Abstract: We give a new proof of the index theorem for deformation quantization following the Atiyah–Bott–Patodi scheme. The main ingredient consists of Tamarkin's theorem on topological invariants of symplectic connections which is a symplectic analog of the well-known theorems for Riemannian connections. We construct an averaging procedure which transforms any topological functional on connections to its standard form. The construction is combinatorial; its geometrical nature remains obscure. But it enables us to conclude that the index density is given by universal polynomials in Pontryagin and Chern forms. The Hirzebruch arguments reduce the index theorem to a direct check for complex projective spaces. This is done by means of the reduction theorem for deformation quantization providing a description of deformation quantization for projective spaces in terms of a harmonic oscillator.

Introduction

The Atiyah–Singer index theorem for elliptic operators is one of the most beautiful results in mathematics. Since its appearance in 1963 it has attracted the interest of mathematicians working in different areas. The number of different proofs of this result is now over six, let alone particular cases such as Gauss–Bonnet, Riemann–Roch or signature theorems. We mention here only the Atiyah–Singer K-theoretical proof [4,5] and the Atiyah–Bott–Patodi direct proof for operators of the Dirac type [3].

An index theorem for deformation quantization is a generalization of the Atiyah–Singer theorem. It was announced by the author [6], a compressed version of the proof appeared in [7] and the detailed proof in [8]. Later the author's investigations in deformation quantizations and index theory were summarized in the book [9]. A different approach was suggested in [15]; these authors have also considered various generalizations of the index theorem, e.g. for families [16].
In this paper we present a new proof of the index theorem for deformation quantization following the Atiyah–Bott–Patodi scheme [3]. The original proof in [8,9] was a


modification of the Atiyah–Singer K-theoretical proof. We thus have a kind of fascinating game: trying to adapt one of the numerous proofs of the index theorem for elliptic operators to deformation quantization. This adaptation is by no means automatic; as a result we gain deeper insight into both quantization and index theory. In particular, for the Atiyah–Bott–Patodi method such a modification means, roughly speaking, that we are able to extend the original method from Dirac operators to general elliptic operators.

We consider the simplest form of the index theorem for deformation quantization, assuming that the symplectic manifold in question is compact. Here we touch upon neither more general forms of the index theorem nor its consequences for pseudodifferential operators. These are standard matters which, for the sake of simplicity of exposition, are better published separately.

So, consider a compact symplectic manifold $(M, \omega)$ and a vector bundle $E$ over $M$. We denote by $K$ the bundle $\mathrm{Hom}(E, E)$ and refer to it as a coefficient bundle. A choice of a symplectic connection $\partial^s$ on $M$ and a connection $\partial^E$ on $E$ gives rise to a canonical quantization procedure $Q$ (see a brief survey in Sect. 1 below) which maps functions $a \in C^\infty(M, K)$ to an associative non-commutative algebra $W_D(M, K)$ of quantum observables. For a compact symplectic manifold $M$ there exists a trace functional
\[ \operatorname{Tr} : W_D \to \mathbb{C}[[h]]/h^{n}. \]
Here the right-hand side means the space of formal Laurent series in powers of a formal parameter $h$ (the Planck constant) with negative exponents not exceeding $n = \frac12 \dim M$. This functional satisfies the trace property $\operatorname{Tr} \hat a \circ \hat b = \operatorname{Tr} \hat b \circ \hat a$, where $\circ$ means the product in the algebra $W_D$. It is unique up to a constant factor, which may be fixed by a natural normalization condition (see Subsect. 1.3). A particular case of the index theorem considered in this paper is the following formula for the "number of quantum states".

Theorem 0.1.
\[ \operatorname{Tr} \hat 1 = \int_M \operatorname{ch} E \, \exp\Bigl(\frac{\omega}{2\pi h}\Bigr) \hat A(M). \tag{0.1} \]
Here $\hat 1 = Q(1)$ is the unit in the algebra $W_D(M, K)$,
\[ \operatorname{ch} E = \operatorname{tr} \exp(\kappa^E) \tag{0.2} \]
is the Chern character of the connection $\partial^E$,
\[ \hat A(M) = {\det}^{-1/2}\, \frac{\sinh(\kappa^s/2)}{\kappa^s/2} \tag{0.3} \]
is the Atiyah–Hirzebruch form of the connection $\partial^s$; $\kappa^E$, $\kappa^s$ denote curvature forms divided by $2\pi i$,
\[ \kappa^E = \frac{1}{2\pi i}\, \frac12 R^E_{ij}\, dx^i \wedge dx^j, \tag{0.4} \]
\[ \kappa^s = \frac{1}{2\pi i}\, \frac12 R^k_{lij}\, dx^i \wedge dx^j, \tag{0.5} \]
where $R^E_{ij}$, $R^k_{lij}$ are components of the curvature tensors of $\partial^E$ and $\partial^s$ respectively.
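As a small illustration of (0.3), the sketch below expands the scalar function $(x/2)/\sinh(x/2)$, which is the characteristic power series behind the form $\hat A(M)$, with exact rational coefficients. The helper names `sinh_ratio_coeffs` and `series_inverse` are ours and do not come from the paper; the code only shows the shape of the first terms and does not evaluate the determinant in (0.3).

```python
from fractions import Fraction
from math import factorial

def sinh_ratio_coeffs(order):
    """Coefficients c_k of sinh(x/2)/(x/2) = sum_k c_k x^k for k = 0..order."""
    c = [Fraction(0)] * (order + 1)
    for m in range(order // 2 + 1):
        # sinh(t)/t = sum_m t^(2m)/(2m+1)!  with t = x/2
        c[2 * m] = Fraction(1, factorial(2 * m + 1) * 4 ** m)
    return c

def series_inverse(c):
    """Coefficients of 1/f for a power series f with nonzero constant term."""
    inv = [Fraction(1) / c[0]]
    for k in range(1, len(c)):
        s = sum(c[j] * inv[k - j] for j in range(1, k + 1))
        inv.append(-s / c[0])
    return inv

# Coefficients of (x/2)/sinh(x/2) up to x^6.
a_hat = series_inverse(sinh_ratio_coeffs(6))
```

The nonzero coefficients obtained this way are 1, −1/24, 7/5760 and −31/967680 in degrees 0, 2, 4 and 6, so only even powers of the curvature enter.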


The Atiyah–Bott–Patodi method for Theorem 0.1 consists in the following. Using deformation quantization machinery, one can show that the left-hand side of (0.1) has the form
\[ \operatorname{Tr} \hat 1 = \frac{1}{(2\pi h)^n} \int_M t(x, h)\, \frac{\omega^n}{n!}, \tag{0.6} \]
where the function $t(x, h) = \dim E + h t_1(x) + h^2 t_2(x) + \dots$, called the trace density, is a formal series in powers of $h$ whose coefficients are polynomials in components of the curvature tensors $R^s_{ijkl} = \omega_{im} R^m_{jkl}$, $R^E_{ij}$ and their covariant derivatives, completely contracted by means of $\omega^{-1} = (\omega^{\alpha\beta})$. The problem is to find an averaging procedure (if it exists) not changing the functional (0.6) and mapping the integrand in (0.6) to a polynomial in Chern forms $c_1, c_2, \dots, c_m$ ($m = \dim E$) of the connection $\partial^E$, Pontryagin forms $p_1, p_2, \dots, p_n$ ($n = \frac12 \dim M$) of $\partial^s$, and $\omega$. The crucial theorem making possible the existence of such an averaging was proved by Tamarkin [17]. It may be considered as a symplectic analog of the Abramov [1] and Gilkey [13] theorems in the Atiyah–Bott–Patodi method.

Theorem 0.2 (Tamarkin). If the functional of the type (0.6) does not depend on the choice of connections $\partial^s$ and $\partial^E$, then the coefficients $t_k(x)$ are polynomials $P(c_k, p_l, \omega)$ up to exact forms
\[ \partial^s_i A_j\, \omega^{ij}\, \frac{\omega^n}{n!}, \]
where $A_j$ are polynomials in the curvature tensors $R^E_{ij}$, $R^s_{ijkl}$ and their covariant derivatives contracted by $\omega^{-1}$.

The construction and the properties of the averaging are the basic step of our proof. Previously [10], the case of a flat symplectic connection (that is, $R^s_{ijkl} = 0$ while $R^E_{kl} \neq 0$) was considered. In this case the averaging acts on monomials
\[ \operatorname{tr} a_{i_1 i_2 \dots i_{2n-1} i_{2n}}\, \omega^{i_1 i_2} \dots \omega^{i_{2n-1} i_{2n}} \tag{0.7} \]
coming into the trace density $t$ by alternation over all the upper (or, equivalently, lower) indices. Here $a_{i_1 i_2 \dots i_{2n-1} i_{2n}}$ is a product of tensors
\[ R^E_{kl,m\dots n} = \partial^E_n \dots \partial^E_m R^E_{kl}. \tag{0.8} \]
In general, when $R_{ijkl} \neq 0$, we have additional factors
\[ R^s_{ijkl,m\dots n} = \partial^s_n \dots \partial^s_m R^s_{ijkl}, \tag{0.9} \]
so that the complete antisymmetrization gives 0. After a rather long search the needed averaging procedure was found. Its description and properties are given in Sect. 2. The main property is that, being applied to the integrand in (0.6), it does not change the value of the integral, that is
\[ \operatorname{Tr} \hat 1 = \frac{1}{(2\pi h)^n} \int_M P t(x, h)\, \frac{\omega^n}{n!}, \tag{0.10} \]


where $P$ denotes the averaging. The integrand in (0.10) is a power series in $h$ whose coefficients are universal polynomials in Pontryagin and Chern forms and the symplectic form $\omega$. The multiplicative property of the index (Subsect. 1.3) enables us to identify this expression with the right-hand side of (0.1) using Hirzebruch's argument similarly to [3]. We need only to check the equality (0.1) for two particular cases:
1. $M = \mathbb{CP}^n$ with a trivial line bundle $E = \mathbb{C}$,
2. $M$ is a 2-dimensional torus $T^2$ with an arbitrary bundle $E$.
The latter case was already proved in [10], so we need only the case of $\mathbb{CP}^n$.

We study deformation quantization for $\mathbb{CP}^n$ in Sect. 3. The canonical construction described in Sect. 1 does not lead to explicit formulas, so for a more explicit description we use the reduction procedure [11]. The calculations in this section illustrate the relations between different versions of quantization. First, using the Weyl pseudodifferential operators on $\mathbb{R}^{n+1}$, we construct quantization on $\mathbb{CP}^n$ for discrete values of $h \in (0, 1]$. Then, applying the stationary phase method to rapidly decaying exponentials, we extend this quantization procedure as an asymptotic one for $h \to 0$. Finally, treating asymptotic series as formal ones, we arrive at deformation quantization. Here we encounter a non-trivial problem of the normalization of the trace. What we do in this section is a very particular case of a general program developed by A. Karabegov [14] for flag manifolds.

1. Deformation Quantization

Here we give a brief summary of results concerning deformation quantization on a symplectic manifold $(M, \omega)$. More details and proofs may be found in [12] or in the book [9].

1.1. Construction of the algebra $W_D$. The letter $W$ stands for Weyl (Hermann) and $D$ means a special connection. The notation $W_D$ means that our quantum objects are flat sections of a bundle $W$ (the Weyl algebras bundle) with respect to the connection $D$.

We denote by $K$ the bundle $\mathrm{Hom}(E, E)$, where $E$ is a complex vector bundle over $M$. A connection $\partial^E$ on $E$ defines a connection on $K$ (with the same notation) given in local frames by $\partial^E a = da + [\Gamma^E, a]$, where $a \in C^\infty(M, K)$ is a section of $K$ and $\Gamma^E = \Gamma_i\, dx^i$ is a local connection one-form of $\partial^E$. Define the Weyl algebras bundle $W = W(M, K)$ by the space $C^\infty(M, W)$ of its sections. A section $a$ of the bundle $W$ is a (formal) function on $TM$ with values in $K$ depending on a formal parameter $h$:
\[ a = a(x, y, h) = \sum_{k, |\alpha| = 0}^{\infty} h^k a_{k,\alpha}(x)\, y^\alpha. \tag{1.1} \]
Here $x \in M$, $y = (y^1, \dots, y^{2n}) \in T_x M$ is a tangent vector, $\alpha = (\alpha_1, \dots, \alpha_{2n})$ is a multiindex, $y^\alpha = (y^1)^{\alpha_1} (y^2)^{\alpha_2} \dots (y^{2n})^{\alpha_{2n}}$, $a_{k,\alpha}(x)$ are symmetric tensors on $M$ with values in $K$, and $h$ is a formal parameter. The series (1.1) is a formal power series in $h$, $y^i$. We define degrees: $\deg y^i = 1$, $\deg h = 2$, and order the terms by the total degree $2k + |\alpha|$.


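the tangent space $T_x$. The latter carries a linear symplectic structure given by the form $\omega$ at the point $x \in M$. Using this structure, we define a fibrewise product (the Weyl (Moyal) product) by
\[ a \circ b = \exp\Bigl(-\frac{ih}{2}\, \omega^{ij}(x)\, \frac{\partial}{\partial y^i} \frac{\partial}{\partial z^j}\Bigr) a(x, y, h)\, b(x, z, h)\Big|_{z=y} = \sum_{k=0}^{\infty} \Bigl(-\frac{ih}{2}\Bigr)^{k} \frac{1}{k!}\, \omega^{i_1 j_1} \dots \omega^{i_k j_k}\, \frac{\partial^k a}{\partial y^{i_1} \dots \partial y^{i_k}}\, \frac{\partial^k b}{\partial y^{j_1} \dots \partial y^{j_k}}. \tag{1.2} \]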
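The product (1.2) can be tried out on polynomials in a single fibre. The sketch below assumes $2n = 2$ with $\omega^{12} = -\omega^{21} = 1$, scalar coefficients, and keeps $\nu = -ih/2$ as a formal symbol so that all arithmetic stays rational; the dictionary encoding and the names `falling` and `moyal` are illustrative choices, not notation from the paper.

```python
from fractions import Fraction
from math import comb, factorial

def falling(n, m):
    """n (n-1) ... (n-m+1), the factor produced by differentiating y^n m times."""
    out = 1
    for j in range(m):
        out *= n - j
    return out

def moyal(a, b, omega12=1):
    """Fibrewise Weyl (Moyal) product (1.2) for 2n = 2.

    Elements are dicts {(k, p, q): c} standing for c * nu^k * y1^p * y2^q,
    where nu = -i h / 2 is treated as a formal symbol.
    """
    out = {}
    for (ka, pa, qa), ca in a.items():
        for (kb, pb, qb), cb in b.items():
            for k in range(pa + qa + 1):
                for m in range(k + 1):
                    # m factors omega^{12} d/dy1 (x) d/dy2 and
                    # k - m factors omega^{21} d/dy2 (x) d/dy1
                    coeff = (falling(pa, m) * falling(qa, k - m)
                             * falling(qb, m) * falling(pb, k - m))
                    if coeff == 0:
                        continue
                    coeff *= comb(k, m) * (-1) ** (k - m) * omega12 ** k
                    key = (ka + kb + k, pa + pb - k, qa + qb - k)
                    val = out.get(key, 0) + Fraction(coeff, factorial(k)) * ca * cb
                    if val == 0:
                        out.pop(key, None)
                    else:
                        out[key] = val
    return out

def sub(a, b):
    """Difference a - b of two elements, dropping zero coefficients."""
    out = dict(a)
    for key, c in b.items():
        val = out.get(key, 0) - c
        if val == 0:
            out.pop(key, None)
        else:
            out[key] = val
    return out

y1 = {(0, 1, 0): 1}
y2 = {(0, 0, 1): 1}
commutator = sub(moyal(y1, y2), moyal(y2, y1))  # equals 2 * nu = -i h
```

With these conventions the commutator of $y^1$ and $y^2$ is $2\nu = -ih\,\omega^{12}$, and the product is associative on polynomial inputs, which is what makes the fibre a Weyl algebra.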

We will also need differential forms on $M$ with values in $W$; these are sections of the bundle $W \otimes \Lambda$, where $\Lambda$ means the bundle of exterior forms. In local coordinates such a form is
\[ a = \sum h^k a_{kpq}, \quad \text{where } a_{kpq} = a_{k\, i_1 \dots i_p\, j_1 \dots j_q}(x)\, y^{i_1} \dots y^{i_p}\, dx^{j_1} \wedge \dots \wedge dx^{j_q}, \tag{1.3} \]
with the total degree of terms $2k + p$. The coefficients in (1.3) are tensors symmetric in $i_1, \dots, i_p$ and antisymmetric in $j_1, \dots, j_q$. Let $\partial^s$ be a symplectic connection on $M$, that is, a torsion-free connection preserving the symplectic tensor $\omega_{ij}(x)$. Given connections $\partial^s$ and $\partial^E$, we define a connection $\partial$ in the bundle $W$ (or $W \otimes \Lambda$) by taking covariant differentials of the coefficients in (1.1) or (1.3), which are tensors on $M$ with values in $K$. In local Darboux coordinates we have
\[ \partial a = d_x a + \frac{i}{h} [\Gamma, a], \]
where
\[ \Gamma = \frac12 \Gamma_{ijk}(x)\, y^i y^j\, dx^k - ih\, \Gamma_i(x)\, dx^i, \tag{1.4} \]
and $\Gamma_{ijk} = \omega_{im} \Gamma^m_{jk}$ are connection coefficients (Christoffel symbols) of the symplectic connection $\partial^s$. They are completely symmetric in $i, j, k$. Introduce an important derivation
\[ \delta a = dx^i \wedge \frac{\partial a}{\partial y^i} = -\frac{i}{h} [\omega_{ij}\, y^i dx^j, a]. \]
It acts on summands (1.3), replacing in turn $y^{i_1}$ by $dx^{i_1}$, ..., $y^{i_p}$ by $dx^{i_p}$ and summing the results. We have by a direct check $\delta^2 a = 0$; $\delta\partial + \partial\delta = 0$. Introduce further an operator
\[ \delta^* a = y^k\, i\Bigl(\frac{\partial}{\partial x^k}\Bigr) a, \]
which acts on (1.3), replacing in turn $dx^{j_1}$ by $y^{j_1}$, ..., $dx^{j_q}$ by $(-1)^{q-1} y^{j_q}$ and summing the results. Clearly, $(\delta^*)^2 = 0$, but it is not a derivation. Finally, introduce a "fibrewise Laplace operator"
\[ \delta^* \delta + \delta \delta^* = (\delta + \delta^*)^2 \]


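which acts on (1.3) by
\[ (\delta^* \delta + \delta \delta^*)\, a_{kpq} = (p + q)\, a_{kpq}. \]
Put
\[ \delta^{-1} a_{kpq} = \frac{1}{p + q}\, \delta^* a_{kpq}, \quad p + q > 0, \qquad \delta^{-1} a_{k00} = 0. \]
Then we have the following "fibrewise Hodge–de Rham decomposition":
\[ a = \delta^{-1} \delta a + \delta \delta^{-1} a + a_{00}. \tag{1.5} \]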
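The operators $\delta$, $\delta^*$, $\delta^{-1}$ and the decomposition (1.5) can be checked mechanically on polynomial forms in one fibre. The following sketch assumes $2n = 2$, scalar coefficients and no dependence on $h$ or $x$; the dictionary encoding `{(alpha, J): c}` for $c\, y^\alpha dx^J$ and all function names are our own illustrative choices.

```python
from fractions import Fraction

DIM = 2  # fibre dimension 2n = 2 (an assumption to keep the example small)

def add_term(out, key, c):
    """Accumulate c into out[key], dropping entries that cancel to zero."""
    val = out.get(key, 0) + c
    if val == 0:
        out.pop(key, None)
    else:
        out[key] = val

def add(*forms):
    out = {}
    for f in forms:
        for key, c in f.items():
            add_term(out, key, c)
    return out

def delta(a):
    """delta a = dx^i ^ (da/dy^i): replace one y^i by dx^i, placed in front."""
    out = {}
    for (alpha, J), c in a.items():
        for i in range(DIM):
            if alpha[i] == 0 or i in J:
                continue
            sign = (-1) ** sum(1 for j in J if j < i)  # sort dx^i into place
            alpha2 = alpha[:i] + (alpha[i] - 1,) + alpha[i + 1:]
            add_term(out, (alpha2, tuple(sorted(J + (i,)))), sign * alpha[i] * c)
    return out

def delta_star(a):
    """delta* a = y^i times the contraction of a with d/dx^i."""
    out = {}
    for (alpha, J), c in a.items():
        for pos, i in enumerate(J):
            alpha2 = alpha[:i] + (alpha[i] + 1,) + alpha[i + 1:]
            add_term(out, (alpha2, J[:pos] + J[pos + 1:]), (-1) ** pos * c)
    return out

def delta_inv(a):
    """delta^{-1} = delta* / (p + q); delta* preserves p + q, so the weight
    can be read off from the resulting monomial."""
    return {key: Fraction(c) / (sum(key[0]) + len(key[1]))
            for key, c in delta_star(a).items()}

# Sample element: 3 y1^2 y2 dx^2 + 5 + 2 y1 dx^1 ^ dx^2 + 7 y1 y2 (0-based indices in code).
a = {((2, 1), (1,)): 3, ((0, 0), ()): 5, ((1, 0), (0, 1)): 2, ((1, 1), ()): 7}
a00 = {((0, 0), ()): 5}
recombined = add(delta_inv(delta(a)), delta(delta_inv(a)), a00)
```

On this sample, `recombined` reproduces `a`, and both `delta(delta(a))` and `delta_star(delta_star(a))` are empty, in line with $\delta^2 = (\delta^*)^2 = 0$.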

We will consider more general connections on the bundle $W$,
\[ Da = -\delta a + \partial a + \frac{i}{h} [r, a], \tag{1.6} \]
where $r \in C^\infty(M, W \otimes \Lambda^1)$ is a globally defined 1-form with $\deg r \geq 3$. A simple computation shows that
\[ \partial^2 a = \frac{i}{h} [R, a], \]
where
\[ R = d\Gamma + \frac{i}{h}\, \Gamma \circ \Gamma = \frac14 R_{ijkl}\, y^i y^j\, dx^k \wedge dx^l - \frac{ih}{2} R^E_{kl}\, dx^k \wedge dx^l \]
is the curvature of $\partial$, $R^E_{kl}$ is the curvature tensor of the connection $\partial^E$, and $R_{ijkl} = \omega_{im} R^m_{jkl}$ is the curvature tensor of the symplectic connection $\partial^s$. It satisfies the following symmetry relations:
\[ R_{ijkl} = R_{jikl} = -R_{ijlk}, \tag{1.7} \]
\[ R_{ijkl} + R_{iklj} + R_{iljk} = 0, \tag{1.8} \]
and the Bianchi identity
\[ \partial_m R_{ijkl} + \partial_k R_{ijlm} + \partial_l R_{ijmk} = 0. \tag{1.9} \]
In local Darboux coordinates we may rewrite (1.6) in the form
\[ Da = d_x a + \frac{i}{h} [\omega_{ij}\, y^i dx^j + \Gamma + r, a]. \tag{1.10} \]
Then an easy calculation yields
\[ D^2 a = \frac{i}{h} \Bigl[ -\frac12 \omega_{ij}\, dx^i \wedge dx^j + R - \delta r + \partial r + \frac{i}{h} r^2,\ a \Bigr]. \]
The two-form
\[ \Omega = -\omega + R - \delta r + \partial r + \frac{i}{h} r^2 \tag{1.11} \]
appearing in this formula is called the Weyl curvature of $D$.


We will look for a special connection $D$ whose Weyl curvature is equal to $-\omega$. The form $\omega$ is central, so we would have
\[ D^2 a = -\frac{i}{h} [\omega, a] \equiv 0. \]
The connection $D$ with this property is called Abelian. From Eq. (1.11) we obtain an equation for this connection $D$,
\[ \delta r = R + \partial r + \frac{i}{h} r^2. \tag{1.12} \]
If we impose an additional condition
\[ \delta^{-1} r = 0 \tag{1.13} \]
(it means that $r$ is "fibrewise coclosed"), then by virtue of (1.5) we obtain an equivalent equation
\[ r = \delta^{-1} R + \delta^{-1} \Bigl( \partial r + \frac{i}{h} r^2 \Bigr) \tag{1.14} \]
with
\[ \delta^{-1} R = \frac18 R_{ijkl}\, y^i y^j y^k\, dx^l - \frac{ih}{2} R^E_{ij}\, y^i dx^j. \tag{1.15} \]
The basic theorem of deformation quantization reads

Theorem 1.1. There exists a unique solution $r$ of (1.12) satisfying (1.13). It may be found by iterations of (1.14).

From (1.15) we see that the symmetrized curvature tensor
\[ R_{(ijk)l} = \frac13 (R_{ijkl} + R_{jkil} + R_{kijl}) \tag{1.16} \]
is more important for deformation quantization than the curvature $R_{ijkl}$ itself. Here parentheses on the left-hand side mean symmetrization. Besides the symmetry in the first three indices, the tensor $R_{(ijk)l}$ satisfies an additional condition
\[ R_{((ijk),l)} = R_{(ijk)l} + R_{(ijl)k} + R_{(ikl)j} + R_{(ljk)i} = 0, \tag{1.17} \]
which means that its completely symmetric component vanishes. The original tensor $R_{ijkl}$ may be found from $R_{(ijk)l}$ by alternation in any pair of indices not belonging to the symmetric cluster:
\[ R_{ijkl} = \frac12 (R_{(ijk)l} - R_{(ijl)k}). \tag{1.18} \]
We define quantum objects to be flat sections of the bundle $W$ with respect to the special connection $D$:
\[ W_D := \{ a \in C^\infty(M, W) \mid Da = 0 \}. \]


The property of being flat may be written by (1.6) as
\[ \delta a = \partial a + \frac{i}{h} [r, a], \tag{1.19} \]
or, using (1.5), as
\[ a = a_0 + \delta^{-1} \Bigl( \partial a + \frac{i}{h} [r, a] \Bigr), \tag{1.20} \]
where $a_0(x, h) = a(x, 0, h)$ is a formal function on $M$.

Theorem 1.2. Equation (1.19) has a unique solution satisfying $a|_{y=0} = a_0$. It may be found by iterations of (1.20).

We will use the following notation for this solution: $a = \hat a_0 = Q(a_0)$. The operator
\[ Q : C^\infty(M, K)[[h]] \to W_D \]
will be called the quantization procedure or simply the quantization map. Its inverse is the restriction to $y = 0$.

Remark 1.3. Using $Q$ and $Q^{-1}$, we may transport the product $\circ$ in $W_D$ directly to the functions on $M$, defining the so-called star-product,
\[ a(x, h) * b(x, h) = Q^{-1}(Q(a) \circ Q(b)). \]
However, it is more convenient to work with the algebra $W_D$ and the fibrewise product $\circ$ than with the algebra of functions equipped with the star-product.

1.2. Isomorphisms and traces. The simplest example of the above construction is the standard symplectic space $\mathbb{R}^{2n}$ with a constant symplectic form
\[ \omega = \frac12 \omega_{ij}\, dx^i \wedge dx^j \]
and a trivial bundle $K$. We set $\partial = d_x$,
\[ D_0 = -\delta + d = d + \frac{i}{h} [\omega_{ij}\, y^i dx^j, \,\cdot\, ], \tag{1.21} \]
so that the flat sections in $W_{D_0}(\mathbb{R}^{2n})$ are those which have the form
\[ \hat a = Q(a(x, h)) = a(x + y, h) = \sum_{\alpha} \frac{1}{\alpha!}\, a^{(\alpha)}(x, h)\, y^\alpha. \]
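In this flat case ($r = 0$, $\partial = d$) the iteration (1.20) reduces to $a \mapsto a_0 + \delta^{-1}(da)$ and can be run by hand. The sketch below does so for scalar polynomials on $\mathbb{R}^2$, assuming no dependence on $h$; the encoding `{(ex, ey): c}` for $c\, x^{ex} y^{ey}$ and the function names are illustrative, not taken from the paper.

```python
from fractions import Fraction

def d_x(a, i):
    """Partial derivative of a polynomial section with respect to x^i."""
    out = {}
    for (ex, ey), c in a.items():
        if ex[i] > 0:
            ex2 = ex[:i] + (ex[i] - 1,) + ex[i + 1:]
            out[(ex2, ey)] = out.get((ex2, ey), 0) + ex[i] * c
    return out

def delta_inv_of_d(a, dim=2):
    """delta^{-1}(da) for a 0-form a: a monomial of da/dx^i with y-degree p
    becomes y^i * monomial / (p + 1), since the form degree is q = 1."""
    out = {}
    for i in range(dim):
        for (ex, ey), c in d_x(a, i).items():
            ey2 = ey[:i] + (ey[i] + 1,) + ey[i + 1:]
            key = (ex, ey2)
            out[key] = out.get(key, 0) + Fraction(c) / (sum(ey) + 1)
    return out

def quantize_flat(a0, steps, dim=2):
    """Iterate a <- a0 + delta^{-1}(d a), starting from a = a0."""
    a = dict(a0)
    for _ in range(steps):
        nxt = dict(a0)
        for key, c in delta_inv_of_d(a, dim).items():
            nxt[key] = nxt.get(key, 0) + c
        a = nxt
    return a

# a0 = x1^2 * x2; three steps suffice because a0 has degree 3.
a0 = {((2, 1), (0, 0)): 1}
flat = quantize_flat(a0, steps=3)
```

For this `a0` the result is the expansion of $(x^1 + y^1)^2 (x^2 + y^2)$, that is, $a_0(x + y)$, and further iterations leave it unchanged.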

Choose a local Darboux chart O ⊂ M and a local frame of the bundle E and consider the restriction W |O of the Weyl algebras bundle to O. We have two Abelian connections on W |O . One of them D has been constructed in the previous subsection globally on M. The other one D0 is defined locally on W |O by (1.21). Thus, we have two different algebras WD and WD0 over O, two different quantization maps Q and Q0


corresponding to $D$ and $D_0$. It turns out that the algebras $W_D$ and $W_{D_0}$ are isomorphic and the isomorphism may be taken in a special form
\[ a_0 = (\mathrm{Ad}\, U)\, a = U \circ a \circ U^{-1} \tag{1.22} \]
with
\[ U = \exp\Bigl( \frac{i}{h} H \Bigr), \tag{1.23} \]
where $H \in C^\infty(O, W)$ is a section of the bundle $W$ over $O$, $\deg H \geq 3$. Although (1.23) is not defined as a section of $W$, the expression (1.22) is defined correctly since
\[ U \circ a \circ U^{-1} = \exp\Bigl( \frac{i}{h}\, \mathrm{ad} H \Bigr) a = \sum_{k=0}^{\infty} \frac{1}{k!} \Bigl( \frac{i}{h} \Bigr)^{k} \underbrace{[H, [H, \dots [H}_{k}, a] \dots ]]. \tag{1.24} \]
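The identity behind (1.24), $e^{B} a\, e^{-B} = \exp(\mathrm{ad}\, B)\, a$, can be checked in a finite-dimensional stand-in. The sketch below replaces sections of $W$ by $3 \times 3$ rational matrices and absorbs the factor $i/h$ into a strictly upper triangular (hence nilpotent) matrix `H`, so that both series terminate; this matrix model and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction
from math import factorial

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def exp_series(H, terms):
    """Truncated exponential: sum of H^k / k! for k < terms."""
    result, power = identity(len(H)), identity(len(H))
    for k in range(1, terms):
        power = matmul(power, H)
        result = matadd(result, scale(Fraction(1, factorial(k)), power))
    return result

def conjugate(H, a, terms=6):
    """U a U^{-1} with U = exp(H) and U^{-1} = exp(-H)."""
    U = exp_series(H, terms)
    U_inv = exp_series(scale(-1, H), terms)
    return matmul(matmul(U, a), U_inv)

def exp_ad(H, a, terms=6):
    """Sum over k < terms of (1/k!) [H, [H, ... [H, a] ...]] (k commutators)."""
    result, nested = a, a
    for k in range(1, terms):
        nested = matadd(matmul(H, nested), scale(-1, matmul(nested, H)))
        result = matadd(result, scale(Fraction(1, factorial(k)), nested))
    return result

H = [[0, 1, 2], [0, 0, 3], [0, 0, 0]]   # nilpotent: H^3 = 0
a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
lhs = conjugate(H, a)
rhs = exp_ad(H, a)
```

Here `lhs` and `rhs` agree exactly. In the Weyl algebra bundle the role of nilpotency is played by the degree filtration: since $\deg H \geq 3$, each commutator with $\frac{i}{h} H$ raises the total degree, so only finitely many terms contribute in each degree.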

We may also define (1.23) in an extension of the bundle $W$. To this end consider an algebra bundle $W^+$ whose sections are sums (1.1), where $k$ may be negative, provided the total degree $2k + |\alpha|$ is positive, with a finite number of terms of a given total degree entering the sum (1.1). Then
\[ U = 1 + \frac{i}{h} H + \frac12 \Bigl( \frac{i}{h} \Bigr)^{2} H \circ H + \dots \]
is an invertible section of the bundle $W^+$ since $\deg H \geq 3$. If we require $a_0$ given by (1.24) to belong to $W_{D_0}$ for any $a \in W_D$, then we come to the following condition
\[ \Bigl[ U^{-1} \circ D_0 U - \frac{i}{h} \Delta\Gamma,\ a \Bigr] = 0, \]
which may be derived by differentiation of (1.22) with respect to $D_0$ and using
\[ D_0 a = Da - \frac{i}{h} [\Delta\Gamma, a] \]
with
\[ \Delta\Gamma = \Gamma + r, \tag{1.25} \]
as is easily seen from (1.21) and (1.10). To satisfy this condition, we set
\[ U^{-1} \circ D_0 U = \frac{i}{h} \Delta\Gamma. \tag{1.26} \]
First of all we see that the solution of (1.26) is not unique: multiplying $U$ by an invertible section $V \in C^\infty(O, W_{D_0})$ from the left, we obtain another solution $V \circ U$. To find a particular solution of (1.26), we take it in the form (1.23) and make use of the well-known formula
\[ D_0 \exp b = \exp b \circ \frac{1 - \exp(-\mathrm{ad}\, b)}{\mathrm{ad}\, b}\, D_0 b. \]


Thus, since $D_0 = -\delta + d$, we would have
\[ \delta H = dH - \frac{\mathrm{ad}\, \frac{i}{h} H}{1 - \exp(-\mathrm{ad}\, \frac{i}{h} H)}\, (\Gamma + r). \tag{1.27} \]
Applying $\delta^{-1}$ and setting $H|_{y=0} = 0$, we come to the equation
\[ H = \delta^{-1} \Bigl( dH - \frac{\mathrm{ad}\, \frac{i}{h} H}{1 - \exp(-\mathrm{ad}\, \frac{i}{h} H)}\, (\Gamma + r) \Bigr), \tag{1.28} \]
which may be solved by iterations starting with $H_0 \equiv 0$. It gives
\[ H = \frac16 \Gamma_{ijk}\, y^i y^j y^k - ih\, \Gamma_i\, y^i + \dots \tag{1.29} \]
Thus, we arrive at the following theorem.

Theorem 1.4. Any algebra $W_D$ is isomorphic locally to the standard algebra $W_{D_0}$ and the isomorphism may be taken in the form (1.22).

This theorem allows one to define a trace functional on elements $a \in W_D$ having compact support in $M$. So, let $\operatorname{supp} a \subset O \subset M$, where $O$ is a local Darboux coordinate chart. We take an isomorphism (1.22) to the algebra $W_{D_0}(\mathbb{R}^{2n})$ and set
\[ \operatorname{Tr} a = \frac{1}{(2\pi h)^n} \int_{\mathbb{R}^{2n}} \operatorname{tr}\, (\mathrm{Ad}\, U)\, a|_{y=0}\, \frac{\omega^n}{n!}. \tag{1.30} \]
For an arbitrary section $a \in W_D$ with compact support we take a locally finite covering by Darboux charts $O_i$ and trivializations of the bundle $E$ over $O_i$. Next, taking a partition of unity $\rho_i(x) \in C^\infty(M)$ subordinate to the covering $O_i$, we construct flat sections $\hat\rho_i = Q(\rho_i)$ which give a partition of unity in the algebra $W_D$. Now, taking isomorphisms $U_i$ for each chart $O_i$, we set
\[ \operatorname{Tr} a = \frac{1}{(2\pi h)^n} \sum_i \int_{\mathbb{R}^{2n}} \operatorname{tr}\, (\mathrm{Ad}\, U_i)\, (\hat\rho_i \circ a)|_{y=0}\, \frac{\omega^n}{n!}. \tag{1.31} \]

Theorem 1.5. The functional $\operatorname{Tr} a$ is correctly defined (that is, independent of the choices $O_i$, $\rho_i$, $U_i$) and satisfies the trace property
\[ \operatorname{Tr} a \circ b = \operatorname{Tr} b \circ a \tag{1.32} \]
for any $a, b \in W_D$ with compactly supported product. This is a unique (up to a normalization factor) functional with this property.

The factor $(2\pi h)^{-n}$ in (1.30), (1.31) is taken by analogy with pseudodifferential operators. There are deeper reasons for this choice (see the next subsection). If $M$ is a compact symplectic manifold, then any $a \in W_D$ has a trace.


1.3. Trace density and normalization condition. Take a scalar function $a(x) \in C_0^\infty(M)$ with compact support, consider the section $a(x) \otimes 1 \in C_0^\infty(M, K)$, where 1 means the unit in the algebra $K = \mathrm{Hom}(E, E)$, quantize this section obtaining $\hat a = Q(a(x) \otimes 1) \in W_D(M, K)$, and calculate its trace. The result will be
\[ \operatorname{Tr} \hat a = \frac{1}{(2\pi h)^n} \int_M t(x, h)\, a(x)\, \frac{\omega^n}{n!}, \tag{1.33} \]
where $t(x, h) = \operatorname{tr} 1 + h t_1 + h^2 t_2 + \dots \in C^\infty(M)[[h]]$ is a function on $M$ called the trace density. The explicit formulas of Subsects. 1.1 and 1.2 show that the coefficients $t_k(x)$ may be expressed as polynomials in the connection coefficients $\Gamma_i$, $\Gamma_{ijk}$ and their derivatives in local coordinates
\[ \Gamma_{i, i_1 i_2 \dots i_p}(x) = \frac{\partial^p \Gamma_i(x)}{\partial x^{i_1} \dots \partial x^{i_p}}, \tag{1.34} \]
\[ \Gamma_{ijk, k_1 k_2 \dots k_p}(x) = \frac{\partial^p \Gamma_{ijk}(x)}{\partial x^{k_1} \dots \partial x^{k_p}}, \tag{1.35} \]
completely contracted by the inverse symplectic tensor $\omega^{-1} = \omega^{ij}$. An extremely important fact is that although the quantities (1.34), (1.35) are not tensors and depend on the choice of local Darboux coordinates and local frames of $E$, the coefficients $t_k(x)$ are independent of these choices. Thus, to compute $t_k(x_0)$ at a point $x_0 \in M$, we may use a special Darboux coordinate system centered at $x_0$ and special frames of the bundle $E$. Observe that the quantities (1.35) are symmetric in $i, j, k$ and $k_1, k_2, \dots, k_p$ separately, while (1.34) are symmetric in $i_1, i_2, \dots, i_p$. We come to the notion of normal Darboux coordinates and normal frames centered at $x_0$ by imposing an additional symmetry property: completely symmetric components (in all indices) should vanish at the origin $x_0$:
\[ \Gamma_{(i, i_1 \dots i_p)}(x_0) = \Gamma_{(ijk, k_1 k_2 \dots k_p)}(x_0) = 0. \tag{1.36} \]
This requirement may be expressed in a more elegant form
\[ \Gamma_i(x)\, x^i \equiv \Gamma_{ijk}(x)\, x^i x^j x^k \equiv 0 \tag{1.37} \]
if the functions are understood as jets at $x_0$. Note that the normal Darboux coordinates are not geodesic ones with respect to the connection $\partial^s$ since the latter are not Darboux coordinates at all. So, the question arises whether normal Darboux coordinates exist. The answer is positive on the level of jets [9], which is sufficient for our purposes.

Theorem 1.6. There exists a local Darboux coordinate system at any point $x_0 \in M$ such that the function $\Gamma_{ijk}(x)\, x^i x^j x^k$ vanishes at $x = x_0$ up to infinite order. Two such systems differ up to infinite order by a linear symplectic change of variables.


As to normal frames, they may be obtained by parallel transport of a given frame at $x = x_0$ along the rays emanating from the origin of the normal Darboux coordinate system. Varying $x_0 \in M$ and taking normal coordinates centered at $x_0$, we obtain tensor fields
\[ T^E_{i, i_1 \dots i_p}(x_0) = \Gamma_{i, i_1 \dots i_p}(x)|_{x = x_0}, \qquad T_{ijk, k_1 \dots k_p}(x_0) = \Gamma_{ijk, k_1 \dots k_p}(x)|_{x = x_0}, \]
which are symmetric within the clusters $i$, $(i_1 \dots i_p)$, $ijk$, $(k_1 \dots k_p)$ with vanishing completely symmetric components. These tensors may be expressed as polynomials in curvature tensors and their covariant derivatives [9]. For example,
\[ T_{ijk,l} = -\frac34 R_{(ijk)l}, \tag{1.38} \]
\[ T^E_{i,j} = -\frac12 R^E_{ij}. \tag{1.39} \]
As was mentioned in the introduction, our task is to calculate the integral
\[ \operatorname{Tr} \hat 1 = \frac{1}{(2\pi h)^n} \int_M t(x)\, \frac{\omega^n}{n!}. \tag{1.40} \]
Of course, the explicit formulas of deformation quantization written in normal coordinates at a varying point $x_0 \in M$ allow us to express $t(x)$ in terms of the curvature tensors $R^E_{ij}$, $R_{(ijk)l}$ and their covariant derivatives. One can do it explicitly for lower dimensions $\dim M = 2, 4$. But even for $\dim M = 4$ the computations become extremely tiresome and lead to a formula which at first glance is quite different from the index formula (0.1) in the introduction. Only after tiresome transformations consisting in integration by parts can one derive the index formula. For higher dimensions $\dim M = 6, 8, \dots$ these direct computations become hopeless and we need additional tools. These tools consist in the following properties of the quantity $\operatorname{Tr} \hat 1$:

1. Stability. $\operatorname{Tr} \hat 1$ does not depend on the choice of connections $\partial^s$, $\partial^E$.

2. Multiplicative property. Let $E_1 \to M_1$, $E_2 \to M_2$ be vector bundles over two symplectic manifolds $M_1$, $M_2$. Consider $M = M_1 \times M_2$ with $E = E_1 \otimes E_2$, or more precisely $E = (\pi_1^* E_1) \otimes (\pi_2^* E_2)$, where $\pi_1^* E_1$ is the bundle over $M_1 \times M_2$ which is the pullback of $E_1$ over $M_1$ under the projection $\pi_1 : M_1 \times M_2 \to M_1$; the same is true for $\pi_2^* E_2$. Denoting by $Q_1$, $Q_2$ and $Q$ the corresponding quantization maps on $M_1$, $M_2$ and $M_1 \times M_2$, we have
\[ \operatorname{Tr} Q(1) = \operatorname{Tr} Q_1(1)\, \operatorname{Tr} Q_2(1). \tag{1.41} \]

The first property expresses the homotopic stability of the index. It follows from the fact that two quantum algebras $W_D$ and $W_{D'}$ obtained by the canonical construction for different choices of connections are isomorphic and the isomorphism preserves traces [9]. The second property is obvious since by explicit formulas of deformation quantization we see that all the variables $x, y \in TM$ are separated into $x_1, y_1 \in TM_1$ and $x_2, y_2 \in TM_2$. This implies for trace densities
\[ t(x, h) = t(x_1, h)\, t(x_2, h), \]


so that integration yields (1.41).

There is another version of the index theorem which gives a convenient normalization for the trace. Let $Q$ be a canonical deformation quantization with trivial coefficients; the corresponding algebra of flat sections will be denoted by $W_D(M)$. We can extend $Q$ to matrix-valued functions by quantizing each entry of the matrix, that is, for $a(x) = (a_{ij}(x))$ we set $Q(a(x)) = (Q(a_{ij}(x)))$. In this way we obtain an algebra $W_D(M, \mathrm{Mat})$ of matrix-valued quantum observables. A trace functional on $W_D(M)$ may also be extended to $W_D(M, \mathrm{Mat})$ in the usual way: for a matrix $\hat a = (\hat a_{ij})$ we set
\[ \operatorname{Tr} \hat a = \sum_i \operatorname{Tr} \hat a_{ii}. \]
Let further $\hat\pi \in W_D(M, \mathrm{Mat})$ satisfy $\hat\pi \circ \hat\pi = \hat\pi$, that is, $\hat\pi$ is an idempotent element. The corresponding classical object is a matrix-valued function $\pi(x) = \hat\pi|_{y=0, h=0}$ with the property $\pi^2(x) = \pi(x)$. It means that $\pi(x)$ is a projection matrix and thus defines a vector bundle $E$ whose fiber $E_x$ at the point $x \in M$ is $\operatorname{Im} \pi(x)$, the image of $\pi(x)$. Conversely, any vector bundle may be obtained in such a way [9]. In this notation we have the following version of the index theorem:

Theorem 1.7.
\[ \operatorname{Tr} \hat\pi = \int_M \operatorname{ch} E\, \exp\Bigl( \frac{\omega}{2\pi h} \Bigr) \hat A(M). \tag{1.42} \]
This formula holds also for virtual bundles, that is, for pairs $\{\hat\pi_0, \hat\pi_1\}$:
\[ \operatorname{Tr} \hat\pi_0 - \operatorname{Tr} \hat\pi_1 = \int_M (\operatorname{ch} E_0 - \operatorname{ch} E_1)\, \exp\Bigl( \frac{\omega}{2\pi h} \Bigr) \hat A(M). \tag{1.43} \]

Consider now virtual bundles with supports in a small neighborhood $O$ of a point $x_0 \in M$. It means that the corresponding projector-valued functions $\pi_0(x)$, $\pi_1(x)$ coincide outside $O$. For such virtual bundles (1.43) reduces to
\[ \operatorname{Tr} \hat\pi_0 - \operatorname{Tr} \hat\pi_1 = \int_O (\operatorname{ch} E_0 - \operatorname{ch} E_1). \tag{1.44} \]
We may consider such pairs as elements of $K_{\mathrm{comp}}(\mathbb{R}^{2n}) \cong \mathbb{Z}$; the isomorphism is given by the right-hand side of (1.44) (see, e.g. [2]). In particular, for a generator of $K_{\mathrm{comp}}(\mathbb{R}^{2n})$ we obtain
\[ \operatorname{Tr} \hat\pi_0 - \operatorname{Tr} \hat\pi_1 = 1, \tag{1.45} \]
and this equality may serve as a natural normalization condition for the trace on the original algebra $W_D(M)$ with scalar coefficients. Theorem 1.7 will be verified for $\mathbb{CP}^n$ in Sect. 3 by direct check. By Hirzebruch's argument it then holds in general since both sides of (1.42) possess the stability and multiplicative properties.


2. Averaging Procedure

The stability property allows us to apply Tamarkin's theorem, which guarantees that in principle, integrating by parts, one can transform the integrand in (1.40) to Chern and Pontryagin polynomials. But it gives no idea what the final expression should be. Here we construct a symmetrizer allowing one to find this final expression, provided it exists.

Let $L$ be a linear space of differential polynomials in the curvature tensors $R_{ijkl}$ and $R^E_{kl}$ whose coefficients are products of tensors $\omega^{ij}$. More precisely, the space $L$ is spanned by monomials of the form
\[ a = R_{i_1 j_1 k_1 l_1, m_1 \dots n_1} \dots R_{i_p j_p k_p l_p, m_p \dots n_p}\, \operatorname{tr} R^E_{k_{p+1} l_{p+1}, m_{p+1} \dots n_{p+1}} \dots R^E_{k_{p+q} l_{p+q}, m_{p+q} \dots n_{p+q}}\, \omega^{\alpha_1 \beta_1} \dots \omega^{\alpha_s \beta_s}. \tag{2.1} \]
Here we use the notation (0.8) and (0.9), suppressing the superscript $s$ for the symplectic connection. The upper indices $\alpha_1, \beta_1, \dots, \alpha_s, \beta_s$ form a permutation of the lower ones $i_1, j_1, k_1, l_1, m_1, \dots, n_1, \dots, k_{p+q}, l_{p+q}, m_{p+q}, \dots, n_{p+q}$ (of course, the total number of lower indices should be even). The symbols $\partial_i$ denote covariant derivatives satisfying the Leibniz rule,
\[ \partial_i (ab) = (\partial_i a)\, b + a\, \partial_i b, \tag{2.2} \]
and the Ricci identity
\[ [\partial_i, \partial_j]\, a_k = R_{kpij}\, a_q\, \omega^{pq} + [R^E_{ij}, a_k]. \tag{2.3} \]
The commutator of two derivatives $[\partial_i, \partial_j] = \partial_i \partial_j - \partial_j \partial_i$ is again a derivation and thus satisfies the Leibniz rule. This allows one to extend the Ricci identity to any tensor $a_{k \dots l}$. Recall the symmetry properties for the curvatures:
\[ R^E_{kl} = -R^E_{lk}, \tag{2.4} \]
\[ R_{ijkl} = R_{jikl} = -R_{ijlk}. \tag{2.5} \]
Since the symplectic connection is torsion-free, we have
\[ R_{ijkl} + R_{iklj} + R_{iljk} = 0, \tag{2.6} \]
which together with (2.5) implies that alternation with respect to any three indices gives 0. Another useful consequence of (2.6) is
\[ R_{i[jk]l} := R_{ijkl} - R_{ikjl} = R_{ilkj}. \tag{2.7} \]
There are also Bianchi identities for covariant derivatives of curvatures
\[ R^E_{kl,m} + R^E_{lm,k} + R^E_{mk,l} = 0, \tag{2.8} \]
\[ R_{ijkl,m} + R_{ijlm,k} + R_{ijmk,l} = 0. \tag{2.9} \]


Using symmetry properties (2.4)–(2.6), we may express (2.8), (2.9) by saying that the alternation in any three indices for the tensors $R^E_{kl,m}$ and $R_{ijkl,m}$ gives 0. The above relations allow one to rewrite (2.1) in many equivalent forms. Besides, we may consider more general monomials than (2.1), where some partial derivatives are applied to a product of tensors. This more general form may be reduced, of course, by means of the Leibniz rule to a standard form (2.1).

There are three important subspaces in $L$. The first one consists of divergences, that is, of expressions $\partial_i b_j\, \omega^{ij}$, where $b_j$ are of the form (2.1) with one free lower index. We denote this subspace by $L_{\mathrm{div}}$. The second one, $L_{PC}$, consists of Pontryagin–Chern polynomials. The latter are sums of expressions (2.1) of a special form
\[ R_{i_1 j_1 k_1 l_1} \dots R_{i_p j_p k_p l_p}\, \operatorname{tr} R^E_{k_{p+1} l_{p+1}} \dots R^E_{k_{p+q} l_{p+q}}\, \omega^{\alpha_1 \beta_1} \dots \omega^{\alpha_p \beta_p}\, \omega^{[k_1 l_1} \dots \omega^{k_{p+q} l_{p+q}]}, \tag{2.10} \]
where $\alpha_1, \beta_1, \dots, \alpha_p, \beta_p$ is a permutation of the indices $i_1, j_1, \dots, i_p, j_p$ and square brackets mean alternation. Comparing (2.10) with (2.1), we see that the Pontryagin–Chern polynomials have the following additional properties:
1. they contain no covariant derivatives of the curvature tensors,
2. the first two indices of the symplectic curvature tensors $i_1, j_1, \dots, i_p, j_p$ are contracted between themselves,
3. the remaining indices $k_1, l_1, \dots, k_{p+q}, l_{p+q}$ enter (2.10) in an antisymmetric way.
Finally, the third subspace $L_{\mathrm{inv}}$ consists of polynomials $a \in L$ such that the integral $\int_M a\, \omega^n$ is independent of the choice of connections $\partial^s$, $\partial^E$. In other words, it means that the Euler variational derivative vanishes. Clearly, $L_{\mathrm{inv}}$ contains both $L_{\mathrm{div}}$ and $L_{PC}$, and the Tamarkin theorem may be expressed by the equality $L_{\mathrm{inv}} = L_{PC} \oplus L_{\mathrm{div}}$.

We are going to construct a symmetrization operator $P : L \to L_{PC}$ with the following two properties:
1. $P$ vanishes on $L_{\mathrm{div}}$,
2. restricted to $L_{PC}$, $P$ is the identity operator.
These two properties imply that for any $a \in L_{\mathrm{inv}}$,
\[ \int_M a\, \omega^n = \int_M P a\, \omega^n. \tag{2.11} \]
The construction goes in two steps. First we construct an auxiliary operator $A$ called alternation which also maps the space $L$ to $L_{PC}$ and vanishes on divergences. At the second step we modify it to obtain a projector $P$. A crucial property of $A$ making this modification possible is described in the following theorem.

Theorem 2.1. The alternation operator $A$ restricted to the space $L_{PC}$ of Pontryagin–Chern polynomials is nondegenerate.

Denoting by $B$ the restriction of $A$ to the space $L_{PC}$ and by $B^{-1} : L_{PC} \to L_{PC}$ the inverse operator, define $P = B^{-1} A$. Then
\[ P(P(a)) = B^{-1} A B^{-1} A a = B^{-1} B B^{-1} A a = B^{-1} A a, \tag{2.12} \]


so $P$ is, in fact, a projector onto the space of Pontryagin–Chern polynomials. Clearly, $P$ annihilates polynomials which may be written in the divergent form since $A$ does. Thus, it remains to construct the alternation $A$ and to prove Theorem 2.1.

To define an averaging operator $A$ on monomials (2.1), we proceed as follows. We supply each tensor $\omega^{\alpha_k \beta_k}$ in (2.1) with a label $b$ (base) or $f$ (fiber), obtaining the so-called labelled monomial $a_{\mathrm{lab}}$. Set
\[ A(a) = \sum A(a_{\mathrm{lab}}), \tag{2.13} \]
where the sum is taken over all possible labellings. For a labelled monomial we set $A(a_{\mathrm{lab}}) = 0$ if the number of labels $f$ is not equal to the number of curvature tensors $R_{ijkl}$ entering (2.1), that is, if $\# \omega_f^{-1} \neq p$. Otherwise, that is, if $\# \omega_f^{-1} = p$, $A(a_{\mathrm{lab}})$ is defined as the alternation over all upper indices $\alpha_k, \beta_k$ entering the factors $\omega_b^{\alpha_k \beta_k}$ labelled with $b$. In other words,
\[ A(a_{\mathrm{lab}}) = R_{i_1 j_1 k_1 l_1, m_1 \dots n_1} \dots R_{i_p j_p k_p l_p, m_p \dots n_p}\, \operatorname{tr} R^E_{k_{p+1} l_{p+1}, m_{p+1} \dots n_{p+1}} \dots R^E_{k_{p+q} l_{p+q}, m_{p+q} \dots n_{p+q}}\, \omega_f^{\alpha_1 \beta_1} \dots \omega_f^{\alpha_p \beta_p}\, \omega_b^{[\alpha_{p+1} \beta_{p+1}} \dots \omega_b^{\alpha_s \beta_s]}. \tag{2.14} \]

Here square brackets mean alternation over all indices in the brackets. We take a special normalization, namely
\[ \omega_b^{[\alpha_1 \beta_1} \dots \omega_b^{\alpha_n \beta_n]} = \frac{1}{n!\, 2^n} \sum_{\sigma} \operatorname{sgn} \sigma\ \omega_b^{\sigma(\alpha_1) \sigma(\beta_1)} \dots \omega_b^{\sigma(\alpha_n) \sigma(\beta_n)}, \tag{2.15} \]
where $\sigma$ runs over the whole symmetric group $S_{2n}$. The normalizing factor in front of the sum is taken to avoid repetitions. Indeed, we have $n$ pairs of indices
\[ (\alpha_1, \beta_1), \dots, (\alpha_n, \beta_n). \tag{2.16} \]
The permutations $\sigma \in S_{2n}$ transposing the indices inside pairs or permuting them pairwise form the so-called octahedral subgroup (the Weyl group of the symplectic group). The number of elements of this subgroup is $2^n n!$ and we take it to normalize the sum (2.15).

This alternation procedure may be expressed in a slightly different way. Introduce an ordering in the set of all $2n$ indices $\alpha_k, \beta_k$, $k = 1, 2, \dots, n$, taking $\alpha_1 < \beta_1 < \alpha_2 < \beta_2 < \dots < \alpha_n < \beta_n$. A permutation $\sigma \in S_{2n}$ is called admissible if $\sigma(\alpha_k) < \sigma(\beta_k)$ $(k = 1, 2, \dots, n)$ and $\sigma(\alpha_1) < \sigma(\alpha_2) < \dots < \sigma(\alpha_n)$. Then (2.15) may be rewritten in the form
\[ \omega_b^{[\alpha_1 \beta_1} \dots \omega_b^{\alpha_n \beta_n]} = \sum_{\sigma} \operatorname{sgn} \sigma\ \omega_b^{\sigma(\alpha_1) \sigma(\beta_1)} \dots \omega_b^{\sigma(\alpha_n) \sigma(\beta_n)}, \tag{2.17} \]
where $\sigma$ runs over all admissible permutations. For example,
\[ \omega^{[\alpha_1 \beta_1} \omega^{\alpha_2 \beta_2]} = \omega^{\alpha_1 \beta_1} \omega^{\alpha_2 \beta_2} - \omega^{\alpha_1 \alpha_2} \omega^{\beta_1 \beta_2} + \omega^{\alpha_1 \beta_2} \omega^{\beta_1 \alpha_2}. \tag{2.18} \]
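The admissible permutations in (2.17) are easy to enumerate by brute force, which gives a quick check of the signs in (2.18). In the sketch below the indices $\alpha_1 < \beta_1 < \dots < \alpha_n < \beta_n$ are encoded as $0, 1, \dots, 2n - 1$; this encoding and the function names are our own conventions.

```python
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of images, via inversion count."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def admissible(n):
    """Yield (sign, pairs) for the admissible permutations of 0..2n-1.

    perm[2k] is sigma(alpha_{k+1}) and perm[2k+1] is sigma(beta_{k+1}).
    """
    for perm in permutations(range(2 * n)):
        inside = all(perm[2 * k] < perm[2 * k + 1] for k in range(n))
        ordered = all(perm[2 * k] < perm[2 * k + 2] for k in range(n - 1))
        if inside and ordered:
            yield sign(perm), [(perm[2 * k], perm[2 * k + 1]) for k in range(n)]

def count_pairings(n):
    """(2n)! / (2^n n!), the order of S_2n divided by that of the octahedral subgroup."""
    return factorial(2 * n) // (2 ** n * factorial(n))

terms_n2 = list(admissible(2))
```

For $n = 2$ this yields the pairings $(01)(23)$, $(02)(13)$, $(03)(12)$ with signs $+, -, +$, matching the three terms of (2.18); for $n = 3$ there are 15 admissible permutations, equal to `count_pairings(3)`.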


The number of summands in (2.18) is equal to
\[ \frac{(2n)!}{2^n n!} = (2n - 1)(2n - 3) \dots 3 \cdot 1 = (2n - 1)!!, \tag{2.19} \]
in particular, this number is odd.

Returning to the definition of $A(a_{\mathrm{lab}})$, we see that the alternation over upper indices is equivalent to the alternation over the corresponding lower indices. The indices $\alpha_k, \beta_k$ entering $\omega_b^{\alpha_k \beta_k}$ and the corresponding lower indices will be called the base ones. Thus, $A(a_{\mathrm{lab}})$ is the alternation over base indices (upper or lower ones). In the final expression (2.13) the labels $f$ and $b$ should be omitted. The operator $A$ defined in this way will also be called the alternation. We hope that this name will not cause any misunderstanding. For polynomials $A$ is defined by linearity.

First of all we need to prove that the definition is correct, that is, independent of the representation of the monomial $a$ in the form (2.1). If we transform (2.1) using the symmetry relations (2.4)–(2.6) or the Bianchi identities (2.8), (2.9), it does not affect the result $A(a)$ because all these transformations preserve the number of symplectic curvature tensors $R_{ijkl}$ in (2.1). It means that the number of labels $f$ and $b$ remains fixed under these transformations. So, the operator $A$ acts only on factors $\omega^{\alpha\beta}$ not affected by these transformations. But if we apply the Ricci identity (2.3), we get an extra factor $R_{ijkl}$ and an extra factor $\omega^{\alpha\beta}$, so it is not evident that the result $A(a)$ will be the same in this case. We postpone the treatment of the Ricci identity for a while and consider some properties of the alternation $A(a)$, supposing that the representation (2.1) is fixed.

Lemma 2.2. Let $A(a_{\mathrm{lab}}) \neq 0$. Then
1. each tensor $R_{ijkl}$ contains precisely two fiber indices,
2. there are no covariant derivatives in the representation (2.1).

Proof. Observe that the tensor $R_{ijkl}$ may not contain three or more base indices because the alternation over these indices gives 0 by virtue of the symmetry property (2.6). Thus, it may contain at most two base indices, hence at least two fiber ones. But the total number of fiber indices is equal to $2p$, where $p$ denotes the number of tensors $R_{ijkl}$ in (2.1). This proves the first item. All the remaining indices in (2.1) must be base ones. This means that the expression $\partial_m R_{ijkl}$ must contain three base indices, and alternation over these indices gives 0 by virtue of the Bianchi identity (2.9). Similarly, $\partial_m R^E_{kl}$ contains only base indices and is killed by alternation. □

Corollary 2.3. If $A(a) \neq 0$, then $a$ does not contain covariant derivatives.

Proof. If $a$ does contain covariant derivatives, so does $a_{\mathrm{lab}}$ for each labelling. By Lemma 2.2, $A(a_{\mathrm{lab}}) = 0$, implying $A(a) = 0$ because of (2.13). □

We will also need partly labelled monomials. It means that in (2.1) some tensors $\omega^{\alpha\beta}$ are supplied with labels $b$ and $f$ and some are not. We define the operator $A$ on partly labelled monomials similarly to (2.13),
\[ A(a_{\mathrm{p.lab}}) = \sum A(a_{\mathrm{lab}}), \]
but here the sum is taken over all labellings extending the given partial labelling. The notation $a \sim b$ or $a_{\mathrm{p.lab}} \sim b$ means that $A(a) = A(b)$ or $A(a_{\mathrm{p.lab}}) = A(b)$; we call such polynomials equivalent.


Lemma 2.4. Each polynomial $a \in L$ is equivalent to a linear combination of the following canonical monomials:

$$R_{i_1j_1k_1l_1}\cdots R_{i_pj_pk_pl_p}\,\operatorname{tr} R^E_{k_{p+1}l_{p+1}}\cdots R^E_{k_{p+q}l_{p+q}}\;\omega^{\alpha_1\beta_1}\cdots\omega^{\alpha_p\beta_p}\,\omega^{k_1l_1}\cdots\omega^{k_{p+q}l_{p+q}}, \tag{2.20}$$

where $\alpha_1, \beta_1, \dots, \alpha_p, \beta_p$ form a permutation of the indices $i_1, j_1, \dots, i_p, j_p$.

Proof. By Lemma 2.1 monomials containing partial derivatives are equivalent to 0, so we may drop them. For a given labelled monomial $a_{\mathrm{lab}}$ we may assume without loss of generality that in each factor $R_{ijkl}$ the last two indices $k, l$ are the base ones. Indeed, by Lemma 2.2 each factor contains two base indices. Alternation over these two indices along with (2.7) moves them to the last two ones. Thus, the following labelled monomials:

$$R_{i_1j_1k_1l_1}\cdots R_{i_pj_pk_pl_p}\,\operatorname{tr} R^E_{k_{p+1}l_{p+1}}\cdots R^E_{k_{p+q}l_{p+q}}\;\omega_f^{\alpha_1\beta_1}\cdots\omega_f^{\alpha_p\beta_p}\,\omega_b^{k_1l_1}\cdots\omega_b^{k_{p+q}l_{p+q}} \tag{2.21}$$

may be taken as canonical representatives. We see that (2.21) is obtained from (2.20) by a special choice of labels. It remains to show that any other labelling in (2.20) gives a labelled monomial $a_{\mathrm{lab}} \sim 0$. Indeed, if we replace $\omega_b^{k_1l_1}$ by $\omega_f^{k_1l_1}$, then the indices $k_1, l_1$ become fiber ones. By Lemma 2.2 we are forced to make the remaining indices $i_1, j_1$ the base ones. But this choice gives 0 after alternation over base indices because of the symmetry in $i_1, j_1$. □

Corollary 2.5. The polynomials $A(a)$, where $a$ is a canonical monomial (2.20), form a basis in the space $L_{PC}$.

Remark 2.6. The indices $k_1, l_1, \dots, k_{p+q}, l_{p+q}$ may be replaced by any permutation $\sigma(k_1), \sigma(l_1), \dots, \sigma(k_{p+q}), \sigma(l_{p+q})$. This does not change the equivalence class of (2.20) up to a factor $\operatorname{sgn}\sigma$.

We return to the Ricci identity.

Proposition 2.7. The Ricci identity does not change the equivalence class of monomials (2.1).

Proof. Let us suppose for simplicity that there are no terms with $R^E_{kl}$ in (2.1). The general case is not much harder and may be left to the reader. Let us denote

$$[\partial_m, \partial_n]R_{ijkl} := R_{i\alpha mn}R_{\beta jkl}\,\omega^{\alpha\beta} + R_{j\alpha mn}R_{i\beta kl}\,\omega^{\alpha\beta} + R_{k\alpha mn}R_{ij\beta l}\,\omega^{\alpha\beta} + R_{l\alpha mn}R_{ijk\beta}\,\omega^{\alpha\beta}, \tag{2.22}$$

that is, the right-hand side of the Ricci identity. The left-hand side will be denoted by $(\partial_m\partial_n - \partial_n\partial_m)R_{ijkl} = \partial_m\partial_n R_{ijkl} - \partial_n\partial_m R_{ijkl}$. We also write $[\partial_m, \partial_n]_f R_{ijkl}$ and $[\partial_m, \partial_n]_b R_{ijkl}$ for partially labelled expressions (2.22) with $\omega^{\alpha\beta}$ replaced by $\omega_f^{\alpha\beta}$ or $\omega_b^{\alpha\beta}$. We need to prove the following relation:


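$$(\partial_m\partial_n - \partial_n\partial_m)R_{i_1j_1k_1l_1}\cdots R_{i_pj_pk_pl_p}\,\omega^{\alpha_1\beta_1}\cdots\omega^{\alpha_{2p+1}\beta_{2p+1}} \sim [\partial_m, \partial_n]R_{i_1j_1k_1l_1}\cdots R_{i_pj_pk_pl_p}\,\omega^{\alpha_1\beta_1}\cdots\omega^{\alpha_{2p+1}\beta_{2p+1}}, \tag{2.23}$$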

where $\alpha_1, \beta_1, \dots, \alpha_{2p+1}, \beta_{2p+1}$ is a permutation of indices $m, n, i_1, j_1, k_1, l_1, \dots, i_p, j_p, k_p, l_p$. The left-hand side is equivalent to 0 since it contains covariant derivatives, so the same must be proved independently for the right-hand side. At first glance (2.23) does not seem to be the most general case since there may be extra covariant derivatives applied to the factors $R_{ijkl}$. But in this case both sides would be equivalent to 0 because of Lemma 2.2.

Replacing for a moment $[\partial_m, \partial_n]$ by $[\partial_m, \partial_n]_f$, we obtain a true relation. Indeed, in this case any labelling of factors $\omega^{\alpha\beta}$ with $p$ factors $\omega_f^{\alpha\beta}$ and $p+1$ factors $\omega_b^{\alpha\beta}$ on the left-hand side of (2.23) would give us a labelling on the right-hand side with $p+1$ factors $\omega_f^{\alpha\beta}$ and $p+1$ factors $\omega_b^{\alpha\beta}$ and vice versa. Thus, in this case the relation (2.23) is a consequence of the Ricci identity itself. It remains to prove that

$$[\partial_m, \partial_n]_b R_{i_1j_1k_1l_1}\cdots R_{i_pj_pk_pl_p}\,\omega^{\alpha_1\beta_1}\cdots\omega^{\alpha_{2p+1}\beta_{2p+1}} \sim 0. \tag{2.24}$$

We have to consider various labellings with $p$ factors $\omega_b^{\alpha\beta}$ and $p+1$ factors $\omega_f^{\alpha\beta}$ and then alternate over base indices including the indices $\alpha, \beta$ on the right-hand side of (2.22). Those labellings for which the result is 0 will be called trivial.

Lemma 2.8. For a non-trivial labelling the indices $m$ and $n$ cannot have the same label.

Proof. Suppose that $m$ and $n$ are both base indices. Then we have three base indices $\alpha, m, n$ in the first factor in each summand of (2.22) which gives a contradiction. Suppose now that $m$ and $n$ are both fiber indices and again consider the first factors in each summand of (2.22). The first two indices in these factors must be base ones, giving a contradiction because of symmetry in the first two indices. □

Lemma 2.9. For non-trivial labellings in (2.24) there is only one base index among $i_1, j_1, k_1, l_1$ and there are precisely two base indices in each factor $R_{i_2j_2k_2l_2}, \dots, R_{i_pj_pk_pl_p}$.

Proof. Suppose there are two base indices. By symmetry relations we may achieve that these are $k_1, l_1$. Then the first and the second summands of (2.22) have three base indices $\beta, k, l$ in the second factor. In the third and fourth summands we have three base indices in the first factor. Indeed, in the third summand the base indices are $k$, $\alpha$ and $m$ (or $n$) and in the fourth one $l$, $\alpha$ and $m$ (or $n$). This gives a contradiction. Thus, there may be at most one base index among $i_1, j_1, k_1, l_1$. Any other factor $R_{i_2j_2k_2l_2}, \dots, R_{i_pj_pk_pl_p}$ must contain at most two base indices. Since the total number of base indices is equal to $2(p+1)$, the lemma follows. □

Lemma 2.9 allows us to write any non-trivial labelled monomial in the form

$$[\partial_m, \partial_n]_b\big(R_{i_1j_1k_1l_1}R_{i_2j_2k_2l_2}\cdots R_{i_pj_pk_pl_p}\big)\,\omega_b^{\alpha_1\beta_1}\cdots\omega_b^{\alpha_p\beta_p}\,\omega_f^{\alpha_{p+1}\beta_{p+1}}\cdots\omega_f^{\alpha_{2p+1}\beta_{2p+1}}. \tag{2.25}$$

Indeed, the derivation [∂m , ∂n ]b with one base index m or n applied to each factor Ri2 j2 k2 l2 , . . . , Rip jp kp lp containing precisely two base indices yields zero. Now consider the action of the derivation [∂m , ∂n ]b on coupled fiber indices.


Lemma 2.10. The following identity holds:

$$[\partial_m, \partial_n]_b(A_{ij})\,\omega_f^{ij} \equiv -[\partial_m, \partial_n]_f(A_{ij})\,\omega_b^{ij}. \tag{2.26}$$

Proof. We may assume that the tensor $A_{ij}$ is antisymmetric since its symmetric part is killed by contraction with $\omega^{ij}$. Now on the left we have

$$R_{i\alpha mn}A_{\beta j}\,\omega_b^{\alpha\beta}\omega_f^{ij} + R_{j\alpha mn}A_{i\beta}\,\omega_b^{\alpha\beta}\omega_f^{ij} = 2R_{i\alpha mn}A_{\beta j}\,\omega_b^{\alpha\beta}\omega_f^{ij}$$

(the two summands are equal because of antisymmetry of $A_{ij}$ and $\omega_f^{ij}$). The right-hand side of (2.26) is

$$-R_{i\alpha mn}A_{\beta j}\,\omega_f^{\alpha\beta}\omega_b^{ij} - R_{j\alpha mn}A_{i\beta}\,\omega_f^{\alpha\beta}\omega_b^{ij}$$

which gives as before

$$-2R_{i\alpha mn}A_{\beta j}\,\omega_f^{\alpha\beta}\omega_b^{ij} = 2R_{\alpha imn}A_{j\beta}\,\omega_f^{\alpha\beta}\omega_b^{ij}.$$

Thus, it is equal to the left-hand side up to the change of notation. □

To finish the proof of Proposition 2.7, we rewrite (2.25) using (2.26) as a sum of terms

$$[\partial_m, \partial_n]_f\big(R_{i_1j_1k_1l_1}\cdots R_{i_pj_pk_pl_p}\big)\,\omega_b^{\alpha_1\beta_1}\cdots\omega_b^{\alpha_{p+1}\beta_{p+1}}\,\omega_f^{\alpha_{p+2}\beta_{p+2}}\cdots\omega_f^{\alpha_{2p+1}\beta_{2p+1}}$$

with one extra factor $\omega_b^{\alpha_{p+1}\beta_{p+1}}$. But this extra factor implies that one of the tensors $R_{i_1j_1k_1l_1}, \dots, R_{i_pj_pk_pl_p}$ would contain three (or more) base indices and thus vanishes under alternation over them. □

We have proved that the alternation operator $A$ is correctly defined on the space $L$, that is, it does not depend on a particular representation of monomials. Its image is the space of Pontryagin–Chern polynomials.

Proposition 2.11. The operator $A$ vanishes on the subspace $L_{\mathrm{div}}$.

Proof. By definition, divergences being the sum of expressions $\partial_i b_j\,\omega^{ij}$ necessarily contain partial derivatives. So, each term contains partial derivatives of curvature tensors (at least for some representation). By Corollary 2.3 the operator $A$ vanishes on such expressions. □

Proof of Theorem 2.1. To prove Theorem 2.1, we construct a basis in $L_{PC}$ for which the operator $A$ has a matrix with integer entries, the diagonal entries are odd and the off-diagonal ones are even. In other words, the operator $A$ (mod 2) is an identity. We introduce new variables (1.38), (1.39) instead of tensors $R_{ijkl}$,

$$T_{ijk,l} = -\frac{1}{4}\big(R_{ijkl} + R_{jkil} + R_{kijl}\big). \tag{2.27}$$

Conversely, as follows from (1.18), $R_{ijkl}$ may be expressed by means of $T_{ijk,l}$,

$$R_{ijkl} = T_{ijl,k} - T_{ijk,l}. \tag{2.28}$$
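The compatibility of (2.27) and (2.28) can be tested numerically. The following is a minimal sketch under stated assumptions: it takes a random tensor $T_{ijk,l}$ that is symmetric in its first three indices and has vanishing total symmetrization (the two properties of $T$ quoted in the text), builds $R$ by (2.28), and recovers $T$ by (2.27). The dimension `DIM` and all function names are illustrative choices, not part of the source.

```python
from fractions import Fraction
from itertools import permutations, product
import random

DIM = 2  # illustrative dimension of the index range
INDICES = list(product(range(DIM), repeat=4))

def random_T(seed=0):
    """Random T_{ijk,l}: symmetric in i, j, k, with zero total symmetrization."""
    rng = random.Random(seed)
    S = {ix: Fraction(rng.randint(-5, 5)) for ix in INDICES}
    # Symmetrize in the first three indices.
    T1 = {(i, j, k, l): sum(S[p + (l,)] for p in permutations((i, j, k)))
          for (i, j, k, l) in INDICES}
    # Subtract the totally symmetric part.
    def total_sym(i, j, k, l):
        return (T1[i, j, k, l] + T1[i, j, l, k] + T1[i, l, k, j] + T1[l, j, k, i]) / 4
    return {ix: T1[ix] - total_sym(*ix) for ix in INDICES}

def curvature_from_T(T):
    """R_{ijkl} = T_{ijl,k} - T_{ijk,l}, as in (2.28)."""
    return {(i, j, k, l): T[i, j, l, k] - T[i, j, k, l] for (i, j, k, l) in INDICES}

def T_from_curvature(R):
    """T_{ijk,l} = -(R_{ijkl} + R_{jkil} + R_{kijl}) / 4, as in (2.27)."""
    return {(i, j, k, l): -Fraction(1, 4) * (R[i, j, k, l] + R[j, k, i, l] + R[k, i, j, l])
            for (i, j, k, l) in INDICES}

for seed in range(5):
    T = random_T(seed)
    R = curvature_from_T(T)
    assert T_from_curvature(R) == T                                     # (2.27) inverts (2.28)
    assert all(R[i, j, k, l] == R[j, i, k, l] for i, j, k, l in INDICES)   # symmetric in i, j
    assert all(R[i, j, k, l] == -R[i, j, l, k] for i, j, k, l in INDICES)  # antisymmetric in k, l
```

The round trip works because the vanishing total symmetrization turns the three cyclic terms in (2.27) into $-4T_{ijk,l}$.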

Fig. 1. Diagrams a, b, c.

The tensor $T_{ijk,l}$ is symmetric in the three first indices $i, j, k$ while its symmetric component in all four indices vanishes: $T_{(ijk,l)} = 0$. The polynomials in these tensors will be represented by diagrams where each factor $T_{ijk,l}$ is represented by a set of four points corresponding to the indices, the cluster of three symmetric indices being surrounded by an oval, each factor $T^E_{kl}$ by a pair of points, and each contraction with $\omega^{\alpha\beta}$ by an arrow starting at the point $\alpha$ and ending at the point $\beta$. For example, all polynomials in two tensors $T_{i_1j_1k_1,l_1}$, $T_{i_2j_2k_2,l_2}$ are linear combinations of three monomials represented in Fig. 1. The diagrams a, b, c in Fig. 1 correspond to the monomials

$$T_{i_1j_1k_1,l_1}T_{i_2j_2k_2,l_2}\,\omega^{i_1i_2}\omega^{j_1j_2}\omega^{l_1k_1}\omega^{l_2k_2},$$
$$T_{i_1j_1k_1,l_1}T_{i_2j_2k_2,l_2}\,\omega^{i_1i_2}\omega^{j_1j_2}\omega^{l_1k_2}\omega^{l_2k_1},$$
$$T_{i_1j_1k_1,l_1}T_{i_2k_2j_2,l_2}\,\omega^{i_1i_2}\omega^{j_1j_2}\omega^{k_1k_2}\omega^{l_1l_2}$$

respectively. Let us also describe the action of the alternation operator $A$ on these diagrams. First we need to supply two arrows with a label $f$ and two remaining arrows with a label $b$ and then alternate over the set of initial and end points of the base arrows. For the diagram a there is only one non-zero choice of labelling: the vertical arrows get the label $f$ and the horizontal ones get the label $b$. The result of subsequent alternation is

$$A(a) = a - b - c. \tag{2.29}$$

A similar result is valid for the diagram b. Again we have the only non-zero choice of labels: the vertical arrows get the label f and the remaining two arrows get the label b. So, A(b) = −a + b + c.

(2.30)

For the diagram c we have three different possibilities (Fig. 2). Of course, they are equivalent to each other because of symmetry. Thus,

$$A(c) = 3A(b) = -3a + 3b + 3c. \tag{2.31}$$

The relation (2.31) also follows from the symmetry property

$$T_{ijk,l} = -T_{ijl,k} - T_{ilk,j} - T_{ljk,i}.$$

Fig. 2. Three labellings of the diagram c by f and b.
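The relations (2.29)–(2.31) determine the action of $A$ on the span of the monomials a, b, c, so the linear algebra of this example can be checked mechanically. The sketch below treats a, b, c as formal basis vectors (an assumption made only for the illustration; the names `apply_A` and `apply_P` are hypothetical) and verifies that $A \circ A = 5A$ on this span, so that $A/5$ is idempotent.

```python
from fractions import Fraction

BASIS = ("a", "b", "c")
# Images of the basis monomials under A, read off from (2.29)-(2.31).
A_IMAGES = {
    "a": {"a": 1, "b": -1, "c": -1},
    "b": {"a": -1, "b": 1, "c": 1},
    "c": {"a": -3, "b": 3, "c": 3},
}

def apply_A(vec):
    """Apply A to a formal linear combination given as {monomial: coefficient}."""
    out = {m: Fraction(0) for m in BASIS}
    for m, coeff in vec.items():
        for target, c in A_IMAGES[m].items():
            out[target] += coeff * c
    return out

def scale(vec, factor):
    return {m: factor * v for m, v in vec.items()}

def apply_P(vec):
    """Candidate projector P = A / 5 for this example."""
    return scale(apply_A(vec), Fraction(1, 5))

for m in BASIS:
    e = {m: Fraction(1)}
    Ae = apply_A(e)
    assert apply_A(Ae) == scale(Ae, 5)          # A(A(x)) = 5 A(x)
    assert apply_P(apply_P(e)) == apply_P(e)    # P is idempotent
```

The eigenvalue 5 is odd, in line with the claim that $A$ is the identity modulo 2 on $L_{PC}$.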

Let us also explain the construction of the projector $P$ for this example. We see that $A(a)$, $A(b)$, $A(c)$ are proportional to the Pontryagin polynomial $A(a) = a - b - c$. Now, using (2.29)–(2.31), we obtain

$$A(A(a)) = A(a) - A(b) - A(c) = 5A(a),$$

so that

$$P = \frac{1}{5}A.$$

Canonical monomials (2.20) in new variables $T_{ijk,l}$ have the same form

$$T_{i_1j_1k_1,l_1}\cdots T_{i_pj_pk_p,l_p}\,\operatorname{tr} T^E_{k_{p+1}l_{p+1}}\cdots T^E_{k_{p+q}l_{p+q}}\;\omega^{\alpha_1\beta_1}\cdots\omega^{\alpha_p\beta_p}\,\omega^{k_1l_1}\cdots\omega^{k_{p+q}l_{p+q}},$$

and the corresponding Pontryagin–Chern polynomials $A(a)$ form a basis in $L_{PC}$ (see Corollary 2.5). In the sequel we again suppose that there are no factors $T^E_{i,j}$, leaving the general case to the reader.

Further classification of canonical monomials may be obtained using partitions into cycles. Let $T_{ij} = T_{ijk,l}\,\omega^{lk}$ be a symmetric matrix obtained by contraction of indices $k, l$ in $T_{ijk,l}$. The monomial

$$T(s) = T_{i_1j_1}\omega^{j_1i_2}\,T_{i_2j_2}\omega^{j_2i_3}\cdots T_{i_sj_s}\omega^{j_si_1} \tag{2.32}$$

is called a cycle of length $s$.

Lemma 2.12. If $s$ is odd, then the cycle (2.32) is equal to 0.

Proof. Consider another expression

$$T_{j_1i_1}\omega^{i_2j_1}\,T_{j_2i_2}\omega^{i_3j_2}\cdots T_{j_si_s}\omega^{i_1j_s} \tag{2.33}$$

obtained from (2.32) by transpositions of all pairs of indices. Since $T_{ij}$ is symmetric and $\omega^{ij}$ is antisymmetric, (2.33) coincides with (2.32) up to a sign $(-1)^s$. On the other hand, changing the order of factors in (2.33), we obtain the same cycle (2.32) (up to a change of notation). Hence $T(s) = (-1)^s T(s)$, which forces $T(s) = 0$ for odd $s$. □

Fig. 3. Graphs a, b.

Given an (even) integer $s$, we consider all possible partitions of $s$

$$s:\quad s = s_1 + s_2 + \cdots + s_m$$

into a sum of positive even summands. For each partition we consider a canonical monomial

$$T(s) = T(s_1)T(s_2)\cdots T(s_m) \tag{2.34}$$

and the corresponding Pontryagin monomial $A(T(s))$. We use graphical representation for monomials (2.34). We simplify the graphical notation for the tensor $T_{ijk,l}$ to a relatively large circle corresponding to the cluster $ijk$ of symmetric indices and a smaller filled circle corresponding to the index $l$. The latter will be called outer while the indices belonging to the symmetric cluster will be called inner ones. At any large white circle there must be three arrows (incoming or outgoing). Since we are going to deal with integer coefficients modulo 2, the sign is irrelevant, so we may use unoriented lines instead of arrows. Figure 3 illustrates two such monomials corresponding to partitions $4 = 4$ (Fig. 3a) and $4 = 2 + 2$ (Fig. 3b). There is no ambiguity in labelling the edges of these graphs since all outer vertices are coupled with inner ones (see Lemma 2.4). Such graphs corresponding to coupled outer and inner indices within the same tensor $T_{ijk,l}$, that is, to the matrix $T_{ij} = T_{ijk,l}\,\omega^{kl}$, correspond to canonical monomials. For such a canonical graph, the base edges are those coupling filled and white circles, the other ones are fiber edges. Thus, to each monomial $T(s)$ corresponds a canonical graph and vice versa. To obtain the Pontryagin monomial from the canonical graph, we need to consider all admissible permutations of the end points of the base edges and sum the results

$$A(T(s)) \equiv \sum_{\sigma}\sigma(T(s)) \pmod 2. \tag{2.35}$$

Here $\sigma$ means an admissible permutation and the sum is taken over all of them. Figure 4 illustrates three admissible permutations applied to the graph of Fig. 3a. Unfortunately, we cannot illustrate all admissible permutations since there are $7 \cdot 5 \cdot 3 = 105$ of them. Now, we need to investigate the action of the operator $A$ on Pontryagin monomials (2.35). To apply $A$ to $\sigma(T(s))$ means to consider all relabellings of the edges and to alternate with respect to new base edges summing the results.

Fig. 4. Graphs a, b, c with edges labelled f and b.

Let us illustrate the action of $A$ by applying it to the graphs in Fig. 4. For the graph 4a there is no ambiguity in labelling (Lemma 2.4), thus $A(4a) \equiv A(3a) \pmod 2$. For the graph 4b there are two possible labellings: one is represented in the picture, the other one is obtained by interchanging $f$ and $b$ at the double edge. The result of alternation will be the same and we obtain $A(4b) \equiv 2A(3a) \equiv 0 \pmod 2$. There are still more possibilities for the graph 4c. Besides interchanging $f$ and $b$ at two double edges, we may relabel horizontal fiber edges as base ones, and vertical base edges as fiber ones. It gives $A(4c) \equiv 4A(3a) + A(3b) \equiv A(3b) \pmod 2$. In general, applying $A$ to $\sigma(T(s))$, we obtain a linear combination of Pontryagin monomials with coefficients in $\mathbb{Z}_2$. Thus,

$$A(A(T(s))) \equiv \sum_{s^1} a(s, s^1)\,A(T(s^1))$$

with coefficients $a(s, s^1) \in \mathbb{Z}_2$. We will show that $a(s, s) \equiv 1 \pmod 2$ while $a(s, s^1) \equiv 0 \pmod 2$ for $s \neq s^1$. To this end, observe that the monomial $T(s)$ carries a unique labelling. Thus, for any admissible permutation $\sigma$ the monomial $\sigma(T(s))$ also carries a labelling: the image of $T(s)$ by the action of $\sigma$. We may decompose

$$A(\sigma(T(s))) = A_0(\sigma(T(s))) + A_1(\sigma(T(s))) + \dots, \tag{2.36}$$

where $A_0$ means that no changes in labelling have been made, $A_1$ means that only one fiber edge was relabelled as a base one (of course, one base edge should be relabelled as a fiber one). Next, $A_2$ means that two fiber labels were renamed for base ones (and two base labels were renamed for fiber ones) and so on. In each case $A_0, A_1, A_2, \dots$ we have to consider all possible relabellings and for each relabelling we need to alternate again over base indices. Thus, $A_0(\sigma(T(s))) \equiv A(T(s)) \pmod 2$ and

$$\begin{aligned}
A(A(T(s))) &= \sum_{\sigma}\big(A_0(\sigma(T(s))) + A_1(\sigma(T(s))) + \dots\big)\\
&= \sum_{\sigma}A(T(s)) + \sum_{\sigma}\big(A_1(\sigma(T(s))) + A_2(\sigma(T(s))) + \dots\big)\\
&= (2s-1)!!\,A(T(s)) + \sum_{\sigma}\big(A_1(\sigma(T(s))) + A_2(\sigma(T(s))) + \dots\big).
\end{aligned} \tag{2.37}$$

The first summand gives an identity matrix times an odd number $(2s-1)!!$, the number of all admissible permutations. So, we need to show that $A_1(\sigma(T(s))), A_2(\sigma(T(s))), \dots$ have even coefficients.

Consider in more detail the term $A_k(\sigma(T(s)))$. It may be described as follows. We mark $k$ fiber edges and delete them. This breaks some cycles into pieces which then must be glued into new cycles by inserting $k$ base edges renamed for fiber ones. This process is represented in Fig. 5a for $k = 2$. We have a closed polygon $ABCDEFGH$ formed by fiber edges. Two edges $HA$ and $DE$ are marked and then deleted, being renamed for base ones. As a result we obtain two broken lines $ABCD$ and $EFGH$. The remaining three figures b, c, d show three possibilities of gluing these lines into cycles. The dashed lines mean the base edges renamed for new fiber ones. Figures 5b and 5c show new polygons $ABCDEFGH$ and $ABCDHGFE$; they are isomorphic as graphs to the original polygon in Fig. 5a. Figure 5d shows another partition corresponding to two cycles $ABCD$ and $EFGH$. These new cycles have even lengths by Lemma 2.12. Returning to the general case, observe that marking some fiber edges is equivalent to subdivision of cycles into pieces: from one marked edge (included) to the next one (excluded) in the chosen direction. For example, in Fig. 5a we have two pieces: $HABCD$ and $DEFGH$. Thus, for the lengths of cycles we have subpartitions $s_i = s_i' + s_i'' + s_i''' + \dots$.

Lemma 2.13. If the averaging over new base vertices is nonzero (mod 2), then the lengths are even numbers.

Fig. 5. Panels a, b, c, d: the polygon ABCDEFGH.
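The vanishing of odd cycles (Lemma 2.12), which the proof below relies on, can be illustrated numerically. In matrix form the contraction (2.32) is $\operatorname{tr}((T\omega)^s)$, so the sketch below draws random integer matrices, $T$ symmetric and $\omega$ antisymmetric, and checks that the trace vanishes exactly for odd $s$. The matrix size, the random ranges and the helper names are arbitrary illustrative choices.

```python
import random

def matmul(x, y):
    """Product of two square integer matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cycle(T, omega, s):
    """Full contraction T_{i1 j1} w^{j1 i2} ... T_{is js} w^{js i1} = tr((T w)^s)."""
    n = len(T)
    step = matmul(T, omega)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(s):
        power = matmul(power, step)
    return sum(power[i][i] for i in range(n))

def random_pair(n, rng):
    """Random symmetric T and antisymmetric w with small integer entries."""
    T = [[0] * n for _ in range(n)]
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = rng.randint(-3, 3)
        for j in range(i + 1, n):
            T[i][j] = T[j][i] = rng.randint(-3, 3)
            w[i][j] = rng.randint(-3, 3)
            w[j][i] = -w[i][j]
    return T, w

rng = random.Random(0)
for _ in range(20):
    T, w = random_pair(4, rng)
    for s in (1, 3, 5):
        assert cycle(T, w, s) == 0   # odd cycles vanish identically
```

Exact integer arithmetic is used so that the zero is not a rounding artefact; even cycles are in general non-zero.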

Proof. Suppose that one of these lengths, say $s_1'$, is odd. Deleting marked fiber edges, we obtain a piece of even length $s_1' - 1$. This piece must be glued to the remaining part of the cycle by two base edges renamed for fiber ones. We have three possibilities to join four ends with two edges (see Fig. 5b, c, d). One of them (see Fig. 5d) gives a cycle of odd length $s_1'$, for which the subsequent alternation over base indices gives 0 by Lemma 2.12. The remaining two (Fig. 5b, 5c) lead to isomorphic cycles, so their sum gives 0 (mod 2). □

Now, let us count how many subdivisions of an even cycle into pieces of even lengths exist.

Lemma 2.14. The number of different subdivisions of a cycle into pieces: $s = s_1 + s_2 + \cdots + s_k$, where $s, s_1, s_2, \dots, s_k$ are even numbers, is even.

Proof. Let $s_1, s_2, \dots, s_k$ be successive lengths of pieces. Denote by $k_0$ the smallest period of this sequence and let $s_{\min} = s_1 + s_2 + \cdots + s_{k_0-1}$ be the sum of lengths of all pieces within the smallest period. It is necessarily even since $s_1, s_2, \dots, s_{k_0-1}$ are. Given any position of marked edges defining this subdivision, we may obtain $s_{\min}$ different positions by cyclic shifts (see Fig. 6). This proves the lemma. □

Fig. 6. Panels a, b, c, d: the polygon ABCDEFGH.

Now we may complete the proof of Theorem 2.1. The last two lemmas imply that in (2.37) all the terms are equal to 0 (mod 2) except the first one $A_0(\sigma(T(s)))$, which defines an identical map (mod 2). Thus,

$$A(A(T(s))) \equiv (2s-1)!!\,A(T(s)) \equiv A(T(s)) \pmod 2,$$

proving the theorem. □

Now, applying the results of Sects. 1 and 2, we reach our aim.

Theorem 2.15. There exist universal polynomials $P_k(p_1, p_2, \dots, p_n, c_1, c_2, \dots, c_m, \omega)$ in Pontryagin forms $p_1, p_2, \dots, p_n$, Chern forms $c_1, c_2, \dots, c_m$ and $\omega$ such that

$$\operatorname{Tr}\hat{1} = \frac{1}{2\pi h^n}\int_M \sum_{k=0} h^k P_k. \tag{2.38}$$


Proof. The trace density $t(x, h) = \sum h^k t_k(x)$ is given by universal (that is, independent of $M$ and $E$) formulas of deformation quantization. Moreover, the stability property (Subsect. 1.3) implies that $t_k(x) \in L_{\mathrm{inv}}$. Then, by (2.11) we obtain

$$\int_M t(x, h)\,\omega^n = \sum_k \int_M P t_k(x)\,\omega^k.$$

The definition of the projector $P$ also has a universal character, so $Pt_k$ are universal Pontryagin–Chern polynomials. Being multiplied by $\omega^n$ these polynomials give polynomials in Pontryagin forms, Chern forms and $\omega$, proving the theorem. □

As has been explained in the introduction, to identify the integrand in (2.38) with the integrand in (0.1), we need to verify directly the index theorem for the case $M = \mathbb{CP}^n$ with a trivial bundle $E = \mathbb{C}$.

3. The Case of $\mathbb{CP}^n$

In this case the left-hand side and the right-hand side of (0.1) may be calculated directly. A crucial fact for our calculation is that $\mathbb{CP}^n$ is the result of the symplectic reduction of the standard symplectic space $\mathbb{C}^{n+1}$ with

$$\omega = i\sum_{i=0}^{n} dz_i \wedge d\bar z_i \tag{3.1}$$

by the action of the circle group $G = U(1)$: $z \mapsto ze^{it}$. This action is a Hamiltonian flow defined by the Hamiltonian function

$$H = \sum_{i=0}^{n} z_i\bar z_i = |z|^2. \tag{3.2}$$

According to the reduction procedure we consider invariant functions, that is, the functions $a = a(z, \bar z) \in C^\infty(\mathbb{C}^{n+1})$ such that

$$\frac{d}{dt}\,a\big(ze^{it}, \bar z e^{-it}\big)\Big|_{t=0} = \{H, a\} = 0 \tag{3.3}$$

and restrict them to the level set $H = 1$ which is a unit sphere $S^{2n+1} \subset \mathbb{C}^{n+1}$. The invariant functions on $S^{2n+1}$ may be considered as the functions on the orbit space $S^{2n+1}/G = \mathbb{CP}^n$. The form (3.1) restricted to $S^{2n+1}$ is invariant under the circle action, so it defines a symplectic form on $\mathbb{CP}^n$, the so-called Fubini-Study form. The normalized Fubini-Study form

$$\tilde\omega = \frac{i}{2\pi}\sum_{i=0}^{n} dz_i \wedge d\bar z_i$$

has the property

$$\int_{\mathbb{CP}^n}\frac{\tilde\omega^n}{n!} = 1. \tag{3.4}$$

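Consider now the reduction procedure for deformation quantization. We have the Weyl $*$-product on $\mathbb{C}^{n+1}$ which in complex coordinates has the form

$$a * b = \exp\Big\{\frac{h}{2}\sum_{i=0}^{n}\Big(\frac{\partial}{\partial z_i}\frac{\partial}{\partial \bar w_i} - \frac{\partial}{\partial \bar z_i}\frac{\partial}{\partial w_i}\Big)\Big\}\,a(z, \bar z)\,b(w, \bar w)\Big|_{w=z}. \tag{3.5}$$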
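For polynomial symbols the exponential in (3.5) terminates, so the product can be evaluated exactly. The following is a minimal sketch, assuming the placement of complex conjugates in (3.5) that reproduces the identities used later in this section, namely $\bar z_j * z_j = z_j\bar z_j - h/2$ and $z_i * H = (H + h) * z_i$. Polynomials in $z_i$, $\bar z_i$, $h$ are stored as dictionaries from exponent tuples to rational coefficients; the value `D` and all helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

D = 2  # complex dimension n + 1 (illustrative choice)

def mono(z=(), zb=(), h=0):
    """Monomial: one factor z_i per i in z, one conj(z_i) per i in zb, times h**h."""
    e = [0] * (2 * D + 1)
    for i in z:
        e[i] += 1
    for i in zb:
        e[D + i] += 1
    e[2 * D] = h
    return {tuple(e): Fraction(1)}

def add(p, q, c=1):
    """Return p + c*q, dropping zero coefficients."""
    out = dict(p)
    for key, val in q.items():
        total = out.get(key, 0) + c * val
        if total == 0:
            out.pop(key, None)
        else:
            out[key] = total
    return out

def diff(key, idx):
    """Derivative of a monomial in variable idx: (factor, new_key) or None."""
    if key[idx] == 0:
        return None
    k = list(key)
    factor = k[idx]
    k[idx] -= 1
    return factor, tuple(k)

def star(p, q):
    """Weyl product (3.5): sum over r of (h/2)^r / r! times the r-th power of the bidifferential."""
    tensor = {(ka, kb): va * vb for ka, va in p.items() for kb, vb in q.items()}
    result, r = {}, 0
    while tensor:
        for (ka, kb), v in tensor.items():
            key = tuple(x + y for x, y in zip(ka, kb))
            key = key[:-1] + (key[-1] + r,)          # extra factor h**r
            result = add(result, {key: v * Fraction(1, 2 ** r * factorial(r))})
        nxt = {}
        for (ka, kb), v in tensor.items():
            for i in range(D):
                # (d/dz_i, d/dwbar_i) with sign +, (d/dzbar_i, d/dw_i) with sign -
                for ia, ib, sign in ((i, D + i, 1), (D + i, i, -1)):
                    da, db = diff(ka, ia), diff(kb, ib)
                    if da and db:
                        k2 = (da[1], db[1])
                        nxt[k2] = nxt.get(k2, 0) + sign * da[0] * db[0] * v
        tensor = {k: v for k, v in nxt.items() if v != 0}
        r += 1
    return result

H = {}
for i in range(D):
    H = add(H, mono(z=[i], zb=[i]))
h = mono(h=1)
z0, z0bar = mono(z=[0]), mono(zb=[0])

assert star(z0bar, z0) == add(mono(z=[0], zb=[0]), h, Fraction(-1, 2))
assert star(z0, H) == star(add(H, h), z0)
```

With `D = n + 1` the same routine gives $\sum_j \bar z_j * z_j = H - \frac{n+1}{2}h$, the relation used below to determine $f_k(H)$.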

The algebra of quantum observables on $\mathbb{C}^{n+1}$ is $\hat A = C^\infty(\mathbb{C}^{n+1})[[h]]$ with the product (3.5). Since $H$ is a quadratic function, the quantum commutator coincides with the Poisson bracket (times $-ih$), so the invariant subalgebra $\hat A_0 \subset \hat A$ is similar to the classical invariant subalgebra,

$$a(z, \bar z, h) \in \hat A_0 \iff \frac{d}{dt}\,a\big(ze^{it}, \bar z e^{-it}, h\big)\Big|_{t=0} = [H, a] = -ih\{H, a\} = 0.$$

The reduced algebra is the quotient $\hat R = \hat A_0/\hat J$, where $\hat J$ is an ideal generated by the Hamiltonian $H - 1$. We refer to [11] where it is shown that the reduced algebra $\hat R$ is isomorphic to the algebra of quantum observables $W_D(\mathbb{CP}^n)$ obtained by the canonical construction. Thus, a quantum observable $\hat a \in \hat R \cong W_D(\mathbb{CP}^n)$ is a class

$$\hat a = \{a(z, \bar z, h) + b(z, \bar z, h) * (H - 1)\}, \tag{3.6}$$

where $a \in \hat A_0(\mathbb{C}^{n+1})$ is a fixed invariant function and $b$ runs over $\hat A_0(\mathbb{C}^{n+1})$. For the purpose of reduction we may restrict ourselves to invariant functions supported in the neighborhood $O_{2\varepsilon} = \{z : |H - 1| < 2\varepsilon\}$ of the unit sphere $S^{2n+1} \subset \mathbb{C}^{n+1}$. We will denote this class of invariant functions by $\hat A_0(O_{2\varepsilon})$. Introducing coordinates

$$H = |z|^2; \qquad \zeta = \frac{z}{|z|} \in S^{2n+1}$$

in $O_{2\varepsilon}$, we have the following lemma.

Lemma 3.1. For any function $a(H, \zeta, \bar\zeta, h) \in \hat A_0(O_{2\varepsilon})$ there exists a function $b(H, \zeta, \bar\zeta, h) \in \hat A_0(O_{2\varepsilon})$ such that the function $c(H, \zeta, \bar\zeta, h) = a + b * (H - 1)$ is independent of $H$ in a smaller neighborhood $O_\varepsilon = \{z : |H - 1| < \varepsilon\}$.

Proof. For an invariant function $b$ we have $\{H, b\} = 0$, so that

$$b * (H - 1) = b(H - 1) - \frac{h^2}{16}\Delta b,$$


where

$$\Delta = 4\sum_{i=0}^{n}\frac{\partial^2}{\partial z_i\,\partial \bar z_i}$$

is the Laplace operator. Thus,

$$c(H, h) = a(H, h) + b(H, h)(H - 1) - \frac{h^2}{16}\Delta b(H, h),$$

where we have omitted the variables $\zeta, \bar\zeta$. Supposing that $c$ is independent of $H$, we obtain an equation for $b$,

$$b(H, h) = -\frac{a(H, h) - a(1, h)}{H - 1} + \frac{h^2}{16}\,\frac{\Delta b(H, h) - \Delta b(1, h)}{H - 1}$$

which can be solved as a formal power series in $h$ by iterations (note that the ratio $\frac{a(H, h) - a(1, h)}{H - 1}$ is a smooth invariant function). It remains to multiply the function $b(H, h)$ by a cut-off function $\rho(H)$ which is equal to 1 for $|H - 1| < \varepsilon$ and vanishes for $|H - 1| > 2\varepsilon$. □

To construct a trace on the algebra $\hat R \cong \hat W_D(\mathbb{CP}^n)$, we first consider a representation of $\hat A_0(O_{2\varepsilon})$ by means of the Weyl pseudodifferential operators on $L^2(\mathbb{R}^{n+1})$. In this representation $h$ is treated as a numerical parameter $h \in (0, 1]$. So, for a function $a \in \hat A_0(O_{2\varepsilon})$ we first consider a partial sum

$$a|_N = \sum_{k=0}^{N} h^k a_k(z, \bar z).$$

Now $h$ may be treated as a number from $(0, 1]$ and the operator $\mathrm{Op}(a|_N)$ may be defined. By Lemma 3.1 we may assume that $a|_N$ is independent of $H$ in coordinates $H, \zeta$ in the neighborhood $|H - 1| < \varepsilon$. Equation (3.5) becomes an asymptotic formula for the operator product:

$$\big\|\mathrm{Op}(a|_N)\,\mathrm{Op}(b|_N) - \mathrm{Op}((a * b)|_N)\big\| = O(h^N),$$

$$\operatorname{Tr}\big(\mathrm{Op}(a|_N)\,\mathrm{Op}(b|_N) - \mathrm{Op}((a * b)|_N)\big) = O(h^{N-n})$$

when $h \to 0$ (see e.g. [9]). The Weyl representation may be defined for more general functions, in particular, for polynomials in $z, \bar z, h$. The corresponding operator $\mathrm{Op}(a)$ acts as a closed densely defined operator on $L^2(\mathbb{R}^{n+1})$. The asymptotic formula (3.5) becomes explicit if one of the factors is a polynomial. In particular, this is the case for $a = H$. The operator $\hat H = \mathrm{Op}(H)$ is the well-known quantum harmonic oscillator. It is a self-adjoint operator on $L^2(\mathbb{R}^{n+1})$ having a discrete spectrum consisting of eigenvalues

$$\lambda_k = h\Big(k + \frac{n+1}{2}\Big), \qquad k = 0, 1, \dots \tag{3.7}$$

with multiplicities

$$m_k = C^n_{n+k} = \frac{(n+k)!}{n!\,k!}. \tag{3.8}$$

More details about Weyl pseudodifferential operators and the quantum harmonic oscillator may be found e.g. in the book [9].

Suppose now that $\hat H$ has an eigenvalue $\lambda = 1$. This may happen if and only if $h$ belongs to a discrete subset

$$\Lambda = \Big\{h : \frac{1}{h} = k + \frac{n+1}{2},\quad k = 0, 1, 2, \dots\Big\}.$$
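The multiplicity (3.8) is a plain counting statement and can be checked by enumeration. The sketch below relies on the standard description of the isotropic oscillator in $n + 1$ variables, where the eigenspace of level $k$ is labelled by occupation numbers $(k_0, \dots, k_n)$ with $k_0 + \dots + k_n = k$; this labelling is an assumption brought in from textbook quantum mechanics, not something stated in the text, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def multiplicity(n, k):
    """Count occupation numbers (k_0, ..., k_n) of n + 1 oscillators with total k."""
    return sum(1 for ks in product(range(k + 1), repeat=n + 1) if sum(ks) == k)

def eigenvalue(h, n, k):
    """lambda_k = h (k + (n + 1)/2), as in (3.7)."""
    return h * (k + Fraction(n + 1, 2))

for n in range(0, 4):
    for k in range(0, 5):
        # (3.8): m_k = (n + k)! / (n! k!)
        assert multiplicity(n, k) == comb(n + k, n)
        # For 1/h = k + (n + 1)/2 the k-th eigenvalue equals 1.
        h = 1 / (k + Fraction(n + 1, 2))
        assert eigenvalue(h, n, k) == 1
```

The second assertion is just the defining condition of the discrete set of admissible values of $h$.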

Denoting by $P_h$, $h \in \Lambda$, the spectral projector in $L^2(\mathbb{R}^{n+1})$ corresponding to $\lambda = 1$, consider a functional

$$\operatorname{Tr}_{N,h} a := \operatorname{Tr} P_h\,\mathrm{Op}(a|_N), \qquad h \in \Lambda \tag{3.9}$$

defined on $\hat A_0(O_{2\varepsilon})$. Here $\operatorname{Tr}$ on the right-hand side means the operator trace. On the other hand, the numerical functional (3.9) may be approximated by a single formal series which we denote $\operatorname{Tr} a$ to distinguish it from operator traces.

Theorem 3.2. There exists a trace functional

$$\operatorname{Tr} : \hat R \to \mathbb{C}[[h]]/h^n$$

on the algebra

$$\hat R = \hat A_0(O_{2\varepsilon})/(H - 1) \cong W_D(\mathbb{CP}^n)$$

such that for any $N \in \mathbb{N}$

$$(\operatorname{Tr} a)|_N - \operatorname{Tr}_{N,h} a = O(h^{N-n}) \tag{3.10}$$

when $h \to 0$, $h \in \Lambda$. Here $(\operatorname{Tr} a)|_N$ means the $N$th partial sum of the formal series $\operatorname{Tr} a$.

Proof. By Lemma 3.1 any $a = a(H, \zeta, \zeta', h) \in \hat A_0(O_{2\varepsilon})$ may be uniquely represented in $O_\varepsilon$ as

$$a(H, \zeta, \zeta', h) = c(\zeta, \zeta', h) + b(H, \zeta, \zeta', h) * (H - 1). \tag{3.11}$$

We set

$$\operatorname{Tr} a = C^n_{1/h+(n-1)/2}\,\langle c(\zeta, \zeta', h)\rangle, \tag{3.12}$$

where $\langle\,\cdot\,\rangle$ means termwise averaging over a unit sphere $S^{2n+1}$ and the binomial coefficient $C^n_\nu$ is given by

$$C^n_\nu = \frac{\nu(\nu - 1)\cdots(\nu - n + 1)}{n!}, \tag{3.13}$$

so that

$$C^n_{1/h+(n-1)/2} = \frac{1}{n!}\Big(\frac{1}{h} + \frac{n-1}{2}\Big)\Big(\frac{1}{h} + \frac{n-1}{2} - 1\Big)\cdots\Big(\frac{1}{h} - \frac{n-1}{2}\Big). \tag{3.14}$$

Next we prove asymptotic estimates (3.10).
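A quick consistency check of the normalization: at the admissible values $1/h = k + (n+1)/2$ the factor (3.14) should reproduce the multiplicity (3.8), since then $\nu = 1/h + (n-1)/2 = n + k$. The sketch below evaluates (3.13) for rational $\nu$ with exact arithmetic; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def binom(nu, n):
    """C^n_nu = nu (nu - 1) ... (nu - n + 1) / n! for rational nu, as in (3.13)."""
    out = Fraction(1)
    for j in range(n):
        out *= (nu - j)
    return out / factorial(n)

def trace_of_unit(inv_h, n):
    """The factor in (3.12) and (3.14), viewed as a function of 1/h."""
    return binom(inv_h + Fraction(n - 1, 2), n)

for n in range(1, 5):
    for k in range(0, 6):
        inv_h = k + Fraction(n + 1, 2)    # admissible value of 1/h
        assert trace_of_unit(inv_h, n) == comb(n + k, n)
```

Since a polynomial of degree $n$ in $1/h$ is determined by its values at infinitely many points, this agreement is what singles out the binomial coefficient in the argument that follows.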

Fig. 7. The contour Γ; the points i and −i are marked.

Lemma 3.3. If $h \in \Lambda$, then the expression

$$p(z, \bar z, h) = \frac{1}{\pi}\int_\Gamma \exp\Big\{-\frac{2i}{h}\big(|z|^2\tau - \arctan\tau\big)\Big\}(1 + \tau^2)^{\frac{n-1}{2}}\,d\tau \tag{3.15}$$

is the symbol of the spectral projector $P_h$. Here $\Gamma$ is the contour in Fig. 7.

Proof. Consider a function

$$U(z, \bar z, h, t) = \exp\Big\{-\frac{2}{h}|z|^2\tanh\frac{t}{2} - n\ln\cosh\frac{t}{2}\Big\} \tag{3.16}$$

which is rapidly decaying in $z$ for $h > 0$, $t > 0$. One can easily check that

$$\frac{dU}{dt} = -\frac{1}{h}|z|^2 * U; \qquad U|_{t=0} = 1.$$

Thus, $U$ is the Weyl symbol of the operator

$$\exp\Big(-\frac{t}{h}\hat H\Big) = \sum_{k=0}^{\infty} e^{-\left(k + \frac{n+1}{2}\right)t}P_k.$$

The symbol of $P_k$ may be found as the Fourier coefficient

$$P_k(z, \bar z) = \frac{1}{2\pi i}\int_{\sigma - i\pi}^{\sigma + i\pi} U(z, \bar z, h, t)\,e^{\left(k + \frac{n+1}{2}\right)t}\,dt$$

for $h = \big(k + \frac{n+1}{2}\big)^{-1}$. Here $\sigma$ is an arbitrary positive number. Of course, we have to show that the integrand has an analytic extension for $\Gamma$ as a periodic function with the period $2\pi i$. This is indeed the case since the exponent may be transformed to the following expression:

$$-\frac{2}{h}|z|^2\,\frac{1 - e^{-t}}{1 + e^{-t}} + kt - n\ln\frac{1 + e^{-t}}{2},$$

where $h = \big(k + \frac{n+1}{2}\big)^{-1}$; $k \in \mathbb{N}$. Hence, the symbol of the projector $P_k$ is

$$p(z, \bar z, h) = \frac{1}{2\pi i}\int_{\sigma - i\pi}^{\sigma + i\pi}\exp\Big\{\frac{t}{h} - \frac{2}{h}|z|^2\tanh\frac{t}{2} - n\ln\cosh\frac{t}{2}\Big\}\,dt$$

for $h = \big(k + \frac{n+1}{2}\big)^{-1}$. Changing variables $\tanh t/2 = i\tau$, we obtain the integrand of (3.15) which is a single-valued function for $h \in \Lambda$. The contour $\Gamma$ is a circle surrounding the point $t = -i$ and may be deformed to the contour in Fig. 7 after making cuts $[-i, -i\infty)$, $[i, i\infty)$. The integrand decays exponentially for $|z| > 0$, $h > 0$ along the cut $[-i, -i\infty)$, so we may deform the contour without changing the integral. □

The integral (3.15) is meaningful for any $h > 0$, so the symbol $p(z, \bar z, h)$ is defined not only for $h \in \Lambda$ but on the whole interval $h \in (0, 1]$.

Lemma 3.4. Let $\rho = \rho(H)$ be a cut-off function equal to 1 for $|H - 1| < \varepsilon$ and equal to 0 for $|H - 1| > 2\varepsilon$. Then for $h \to 0$,

$$\frac{1}{(2\pi h)^{n+1}}\int_{\mathbb{C}^{n+1}} p(z, \bar z, h)\,\rho\,\frac{\omega^{n+1}}{(n+1)!} = C^n_{1/h+(n-1)/2} + O(h^\infty). \tag{3.17}$$

Proof. After integration over the unit sphere and the change of variables $(H - 1) = u$, the integral (3.17) becomes

$$\frac{1}{\pi h^{n+1}}\int_\Gamma d\tau\int_{-\infty}^{\infty}du\,\rho(1 + u)\exp\Big(-\frac{2i}{h}u\tau\Big)(1 + u)^n\exp\Big(\frac{2i}{h}(\arctan\tau - \tau)\Big)(1 + \tau^2)^{\frac{n-1}{2}}.$$

The inner integral over $u$ is equal to

$$\Big(1 + \frac{ih}{2}\frac{d}{d\tau}\Big)^n\hat\rho\Big(\frac{2\tau}{h}\Big),$$

where $\hat\rho(s)$ is the Fourier transform of $\rho(1 + u)$. Since $\rho(1 + u)$ is supported in the neighborhood $|u| < 2\varepsilon$, $\hat\rho(s)$ is an entire function satisfying estimates

$$|\hat\rho(s)| \le C_N(1 + |\Re s|)^{-N}\exp(2\varepsilon|\Im s|)$$

for any $N$. The remaining function in the integrand admits an estimate in the lower half plane $\Im s \le 0$,

$$Ce^{-\Im s}|s|^{n-1},$$

so the contour may be deformed to the real axis resulting in

$$I = \frac{1}{2\pi h^n}\int_{-\infty}^{\infty}\hat\rho(s)\Big(1 - i\frac{d}{ds}\Big)^n\Big\{\exp\Big(i\Big(\frac{2}{h}\arctan\frac{hs}{2} - s\Big)\Big)\Big(1 + \Big(\frac{hs}{2}\Big)^2\Big)^{\frac{n-1}{2}}\Big\}\,ds.$$

Now, using

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}s^k\hat\rho(s)\,ds = (-i)^k\rho^{(k)}(1)$$

which is 0 for $k > 0$ since $\rho(1 + u) \equiv 1$ in the neighborhood $|u| < \varepsilon$, we obtain the following asymptotic expression:

$$I = \frac{1}{h^n}\Big(1 - i\frac{d}{ds}\Big)^n\Big\{\exp\Big(i\Big(\frac{2}{h}\arctan\frac{hs}{2} - s\Big)\Big)\Big(1 + \Big(\frac{hs}{2}\Big)^2\Big)^{\frac{n-1}{2}}\Big\}\Big|_{s=0} + O(h^\infty).$$

The first summand is a polynomial in $1/h$ of degree $n$. To identify it with the binomial coefficient in (3.17), consider the integral $I$ for $h \in \Lambda$. Then $I = \operatorname{Tr} P_h + O(h^\infty)$. But $\operatorname{Tr} P_h$ for $h = \big(k + \frac{n+1}{2}\big)^{-1}$ is equal to the multiplicity $m_k$ (3.8). The latter is evidently equal to the value of (3.14) at $h = \big(k + \frac{n+1}{2}\big)^{-1}$, proving the lemma. □

Let us prove the estimate (3.10). From (3.11) we have

$$a|_N = c|_N + b * (H - 1)|_N = c|_N + (b|_N) * (H - 1) + O(h^{N+1}).$$

For the Weyl pseudo-differential operators we thus obtain

$$\mathrm{Op}(a|_N) = \mathrm{Op}(c|_N) + \mathrm{Op}(b|_N)(\hat H - 1) + O(h^{N+1}),$$

where $O(h^{N+1})$ means the norm estimate. For $h \in \Lambda$ we multiply both sides by $P_h$. Taking into account that $(\hat H - 1)P_h = 0$ and

$$\operatorname{Tr} P_h\,O(h^{N+1}) = O(h^{N-n+1}),$$

we get

$$\operatorname{Tr}_{N,h} a = \operatorname{Tr} P_h\,\mathrm{Op}(c|_N) + O(h^{N-n+1}).$$

Now, for Weyl pseudo-differential operators the trace formula

$$\operatorname{Tr}\mathrm{Op}(a)\,\mathrm{Op}(b) = \frac{1}{(2\pi h)^{n+1}}\int_{\mathbb{C}^{n+1}} ab\,\frac{\omega^{n+1}}{(n+1)!}$$

holds (see e.g. [9]). Thus, integrating over a unit sphere, we obtain the following expression:

$$\operatorname{Tr} P_h\,\mathrm{Op}(c|_N) = \frac{1}{h^{n+1}}\int_0^\infty p(H, h)\,\langle c|_N\rangle\,H^n\,dH.$$

But $\langle c|_N\rangle$ is independent of $H$ in the neighborhood $|H - 1| < \varepsilon$. By Lemma 3.4 the last integral is equal to

$$C^n_{1/h+(n-1)/2}\,\langle c|_N\rangle + O(h^\infty) = (\operatorname{Tr} a)|_N + O(h^\infty),$$

proving the estimate (3.10).


The trace property of the functional $\operatorname{Tr} a$ follows from that of the operator trace. Indeed, for any fixed $N \in \mathbb{N}$ and $h \to 0$, $h \in \Lambda$ we have

$$(\operatorname{Tr} a * b)|_N = \operatorname{Tr} P_h\,\mathrm{Op}(a|_N)\,\mathrm{Op}(b|_N) + O(h^{N-n}),$$

and similarly

$$(\operatorname{Tr} b * a)|_N = \operatorname{Tr} P_h\,\mathrm{Op}(b|_N)\,\mathrm{Op}(a|_N) + O(h^{N-n}).$$

But since $P_h$ commutes with $\mathrm{Op}(b|_N)$ and $\mathrm{Op}(a|_N)$ (since they commute with $\hat H$),

$$\operatorname{Tr} P_h\,\mathrm{Op}(a|_N)\,\mathrm{Op}(b|_N) = \operatorname{Tr} P_h\,\mathrm{Op}(b|_N)\,\mathrm{Op}(a|_N),$$

implying that expansions for $\operatorname{Tr} a * b$ and $\operatorname{Tr} b * a$ coincide. This proves the theorem. □

The trace $\operatorname{Tr}$ which we have constructed is unique up to a constant factor and we have to check that the factor (3.14) gives a proper normalization (1.45). There is a distinguished vector bundle $E$ over $\mathbb{CP}^n$: for any $z \in \mathbb{C}^{n+1} \setminus 0$ representing a point $\zeta \in \mathbb{CP}^n$ the fiber $E_\zeta$ is the one-dimensional subspace in $\mathbb{C}^{n+1}$ spanned by $z$. This bundle may be determined by a projector-valued matrix function $\pi(z, \bar z)$ on $\mathbb{CP}^n$ with the entries $\pi_{ij} = z_i H^{-1}\bar z_j$, $i, j = 0, 1, \dots, n$. Here $z \in O_\varepsilon$ and $H = |z|^2$. We also need tensor powers of $\pi$. They are given by a matrix $\pi^k$ with the entries

$$\pi^k_{i_1\dots i_k\,j_1\dots j_k} = z_{i_1}\cdots z_{i_k}\,H^{-k}\,\bar z_{j_1}\cdots\bar z_{j_k}, \tag{3.18}$$

all the indices run over 0, 1, . . . , n. b ε ) setWe define quantum projectors as matrix-valued functions with entries in A(O ting b πik1 ...ik j1 ...jk = zi1 . . . zik ∗ fk (H ) ∗ zj1 . . . zjk ,

(3.19)

where $f_k(H)$ is a rational function in $H$ (with respect to the product $*$). It should be determined from the identity $\hat\pi^k * \hat\pi^k = \hat\pi^k$. This yields the following equation:
\[ f_k(H) * \sum_{j_1, \ldots, j_k = 0}^{n} \bar z_{j_1} \cdots \bar z_{j_k} * z_{j_k} \cdots z_{j_1} = 1. \]
Here it should be mentioned that for functions $z_{j_1}, z_{j_2}, \ldots, z_{j_k}$, the $*$-product coincides with the usual commutative product of functions; the same is true for $\bar z_{j_1}, \ldots, \bar z_{j_k}$. Now,
\[ \sum_{j_k = 0}^{n} \bar z_{j_k} * z_{j_k} = H - \frac{n+1}{2}\, h. \]
Further, the identity $z_i * H = (H + h) * z_i$ implies for any rational function $f(H)$ that
\[ z_i * f(H) = f(H + h) * z_i. \tag{3.20} \]

B. Fedosov

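Using this property, we obtain successively
\begin{align*}
\sum_{j_1, \ldots, j_k = 0}^{n} \bar z_{j_1} \cdots \bar z_{j_{k-1}} \bar z_{j_k} * z_{j_k} z_{j_{k-1}} \cdots z_{j_1}
&= \sum_{j_1, \ldots, j_{k-1} = 0}^{n} \bar z_{j_1} \cdots \bar z_{j_{k-1}} * \Big(H - \frac{n+1}{2} h\Big) * z_{j_{k-1}} \cdots z_{j_1} \\
&= \sum_{j_1, \ldots, j_{k-2} = 0}^{n} \bar z_{j_1} \cdots \bar z_{j_{k-2}} * \Big(H - \frac{n+1}{2} h\Big) * \Big(H - h - \frac{n+1}{2} h\Big) * z_{j_{k-2}} \cdots z_{j_1} \\
&= \Big(H - \frac{n+1}{2} h\Big) * \cdots * \Big(H - kh - \frac{n-1}{2} h\Big).
\end{align*}
This expression is invertible in $A(\widehat{O}_\varepsilon)$ since the leading term in each parenthesis does not vanish. So, we set in (3.19),
\[ f_k(H) = \Big(H - \frac{n+1}{2} h\Big)^{-1} * \cdots * \Big(H - kh - \frac{n-1}{2} h\Big)^{-1}, \]
where the exponent $-1$ is taken with respect to the $*$-product. Clearly, the leading term of $\hat\pi^k$ coincides with (3.18). Now, using again (3.20), we get
\begin{align*}
\operatorname{Tr}(\hat\pi^k) &= \operatorname{Tr} \sum_{i_1, \ldots, i_k = 0}^{n} z_{i_1} \cdots z_{i_k} * f_k(H) * \bar z_{i_k} \cdots \bar z_{i_1} \\
&= \operatorname{Tr} f_k(H + kh) * \sum_{i_1, \ldots, i_k = 0}^{n} z_{i_1} \cdots z_{i_k} * \bar z_{i_k} \cdots \bar z_{i_1} \\
&= \operatorname{Tr} f_k(H + kh) * \Big(H + \frac{n+1}{2} h\Big) * \cdots * \Big(H + (k-1)h + \frac{n+1}{2} h\Big).
\end{align*}
We substitute $H = 1$ because everything is defined modulo $H - 1$. This gives
\begin{align*}
\operatorname{Tr}(\hat\pi^k) &= \Big[\Big(\frac{1}{h} + k - 1 - \frac{n-1}{2}\Big) h\Big]^{-1} \cdots \Big[\Big(\frac{1}{h} - \frac{n-1}{2}\Big) h\Big]^{-1} \Big(\frac{1}{h} + \frac{n+1}{2}\Big) h \cdots \Big(\frac{1}{h} + k + \frac{n-1}{2}\Big) h \; \operatorname{Tr} \hat 1 \\
&= C^n_{1/h + k + (n-1)/2}. \tag{3.21}
\end{align*}
We calculate now the right-hand side of (0.1) for $\mathbb{CP}^n$ taking $E = \pi^k$.

Lemma 3.5. The following equality holds:
\[ \int_{\mathbb{CP}^n} \operatorname{ch} \pi^k\, \hat A(\mathbb{CP}^n) \exp\Big(\frac{\tilde\omega}{h}\Big) = C^n_{1/h + k + (n-1)/2}. \tag{3.22} \]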

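Proof. The curvature of the bundle $\pi$ is
\[ \frac{\operatorname{tr} \pi\, d\pi\, d\pi}{2\pi i} = \tilde\omega, \]
so $\operatorname{ch} \pi^k = \exp(k \tilde\omega)$. Next, the Atiyah–Hirzebruch class is represented by the form (see e.g. [9])
\[ \hat A(\mathbb{CP}^n) = \left(\frac{\tilde\omega}{e^{\tilde\omega/2} - e^{-\tilde\omega/2}}\right)^{n+1} = e^{\frac{n+1}{2} \tilde\omega} \left(\frac{\tilde\omega}{e^{\tilde\omega} - 1}\right)^{n+1}. \]
Thus, the integrand in (3.22) is equal to
\[ \exp\Big[\tilde\omega \Big(\frac{1}{h} + k + \frac{n+1}{2}\Big)\Big] \left(\frac{\tilde\omega}{e^{\tilde\omega} - 1}\right)^{n+1}. \]
The integral is equal to the Taylor coefficient at $\tilde\omega^n$ which may be found as a residue
\[ \frac{1}{2\pi i} \int_{|z| = \varepsilon} \frac{e^{az}\, dz}{(e^z - 1)^{n+1}}, \]
where $a = 1/h + k + (n+1)/2$. Changing variables $\zeta = e^z - 1$, we obtain
\[ \frac{1}{2\pi i} \int_{|\zeta| = \varepsilon} (1 + \zeta)^{a-1} \frac{d\zeta}{\zeta^{n+1}} = C^n_{a-1}, \]
proving the lemma. □

It is well-known (see e.g. [2]) that the ring $K(\mathbb{CP}^n)$ is generated by powers of $\xi = \{\pi, 1\} \in K(\mathbb{CP}^n)$ modulo $\xi^{n+1}$. Thus, for any element $\eta \in K(\mathbb{CP}^n)$ represented by a pair $\{\pi_0, \pi_1\}$ of projectors we have
\[ \operatorname{Tr} \hat\pi_0 - \operatorname{Tr} \hat\pi_1 = \int_{\mathbb{CP}^n} (\operatorname{ch} \pi_0 - \operatorname{ch} \pi_1)\, \hat A(\mathbb{CP}^n) \exp\Big(\frac{\tilde\omega}{h}\Big) \tag{3.23} \]
because this equality has been proved for tensor powers $\pi^k$ and corresponding quantum projectors $\hat\pi^k$. In particular, this implies that our trace functional $\operatorname{Tr}$ satisfies the normalization condition. Indeed, if $\pi_0 - \pi_1 = 0$ outside the neighborhood $O$ of a point $x_0 \in \mathbb{CP}^n$, the integral on the right-hand side of (3.23) reduces to
\[ \int_O (\operatorname{ch} \pi_0 - \operatorname{ch} \pi_1), \]
which is the desired normalization.

Acknowledgements. I would like to thank my colleague, Prof. N. Tarkhanov, and Dr. A. Karabegov for helpful discussions.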

References

1. Abramov, A.A.: On topological invariants of Riemannian spaces obtained by integration of tensor fields. Dokl. Akad. Nauk SSSR 85, (1951) (Russian)
2. Atiyah, M.F.: K-Theory. Second edition, Reading, MA: Addison–Wesley, Inc., 1989
3. Atiyah, M.F., Bott, R., Patodi, V.K.: On the heat equation and the index theorem. Invent. Math. 19, 279–330 (1973)
4. Atiyah, M.F., Singer, I.M.: The index of elliptic operators I. Ann. Math. 87, 484–530 (1968)
5. Atiyah, M.F., Singer, I.M.: The index of elliptic operators III. Ann. Math. 87, 546–604 (1968)
6. Fedosov, B.: An index theorem in the algebra of quantum observables (Russian). Dokl. Akad. Nauk SSSR 305 4, 835–838 (1989); Translation in Soviet Phys. Dokl. 34, 4, 319–321 (1989)
7. Fedosov, B.: Index theorems. In: Sovremennye problemy matematiki. Fundamental’nye napravleniya. Vol. 65 VINITI, Moscow, 1991, pp. 165–268
8. Fedosov, B.: The index theorem for deformation quantization. In: Boundary value problems, Schrödinger operators, Deformation quantization. Math. Top. 8, Berlin: Akademie Verlag, 1995, pp. 206–318
9. Fedosov, B.: Deformation Quantization and Index Theory. Math. Top. 9, Berlin: Akademie Verlag, 1996, pp. 325
10. Fedosov, B.: A trace density in deformation quantization. In: Boundary value problems, Schrödinger operator, Deformation quantization. Math. Top. 8, Berlin: Akademie Verlag, 1995, pp. 319–333
11. Fedosov, B.: Non-Abelian reduction in deformation quantization. Lett. Math. Phys. 43, 137–154 (1998)
12. Fedosov, B.: A simple geometrical construction of deformation quantization. J. Diff. Geom. 40, 213–238 (1994)
13. Gilkey, P.B.: Invariance theory, the heat equation, and the Atiyah–Singer index theorem. Math. Lect. Series 11 Wilmington, Del.: Publish or Perish, Inc., 1984
14. Karabegov, A.: Pseudo-Kähler quantization on flag manifolds. Commun. Math. Phys. 200, 355–379 (1999)
15. Nest, R., Tsygan, B.: Algebraic index theorem. Commun. Math. Phys. 172, 223–262 (1995)
16. Nest, R., Tsygan, B.: Algebraic index theorem for families. Advances in Math. 113 2, 151–205 (1995)
17. Tamarkin, D.: Topological invariants of connections on symplectic manifolds. Functsional. Analysis and its Applications 29 4, 45–56 (1995)

Communicated by A. Jaffe

Commun. Math. Phys. 209, 729 – 755 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Bounds for Kac’s Master Equation⋆

Persi Diaconis¹, Laurent Saloff-Coste²,³

¹ Department of Mathematics, Stanford University, Stanford, CA 94305, USA
² CNRS, Toulouse, France
³ Department of Mathematics, Cornell University, Ithaca, NY 14853, USA. E-mail: [email protected]

Received: 26 April 1999 / Accepted: 30 August 1999

Abstract: Mark Kac considered a Markov chain on the n-sphere based on random rotations in randomly chosen coordinate planes. This same walk was used by Hastings on the orthogonal group. We show that the walk has spectral gap bounded below by $c/n^3$. This and curvature information are used to bound the rate of convergence to stationarity.

1. Introduction

On Euclidean space $\mathbb{R}^n$ consider the rotation
\[
g_{ij}(\theta) =
\begin{pmatrix}
1 & & & & & & \\
 & \ddots & & & & & \\
 & & c & \cdots & s & & \\
 & & \vdots & \ddots & \vdots & & \\
 & & -s & \cdots & c & & \\
 & & & & & \ddots & \\
 & & & & & & 1
\end{pmatrix}
\tag{1.1}
\]
where all the entries on the diagonal are equal to 1 except for the $(i, i)$ and $(j, j)$ entries that are equal to $c = \cos(\theta)$, and all the off-diagonal entries are 0 except for the $(i, j)$ and $(j, i)$ entries that are respectively $+s$ and $-s$ with $s = \sin(\theta)$, $0 \le \theta < 2\pi$. This represents a clockwise rotation by $\theta$ in the $(i, j)$ plane, $1 \le i < j \le n$. We consider the random walk on the orthogonal group $SO(n)$ generated by repeatedly multiplying by $g_{ij}(\theta)$ for $1 \le i < j \le n$ chosen uniformly and $\theta$ chosen uniformly in $[0, 2\pi)$. Call this measure $Q$ and let $Q^{*k}$ be the $k$-th convolution power. Let $U$ denote the uniform distribution (Haar measure) on $SO(n)$. Our main result shows that $Q^{*k}$ is close to $U$ for $k$ of order $n^5 \log n$.

⋆ Research partially supported by NATO Grant CRG 950686 and by NSF Grants DMS-9802855 and DMS-9504379.

Theorem 1. The random rotations measure $Q$ on $SO(n)$ satisfies
\[ |Q^{*k}(f) - U(f)| \le 7 \sqrt{n} \left(1 - \frac{1}{60 n^3}\right)^{2k / \left(\binom{n}{2} + 2\right)} \]

for $f$ any bounded Lipschitz function of norm at most 1. In Sect. 5, we prove a better though less explicit result showing convergence after $n^4 \log n$ steps.

First motivation. The present problem arose as part of Mark Kac’s study of Boltzmann’s derivation of a basic equation of kinetic theory (1956), (1959, pp. 109–132). Kac simplified the problem to an n-particle system in one dimension. Assuming the positions are in equilibrium, he studied the velocities $(v_1, v_2, \ldots, v_n)$. Kac assumed that only the total energy $v_1^2 + v_2^2 + \cdots + v_n^2 = n\sigma^2$ is conserved (hence the restriction to the sphere). In Kac’s model, the particles exchange energy as follows: at the times of a Poisson process with rate $n\lambda$, a pair of indices $(i, j)$ is chosen at random and the velocities $v_i, v_j$ are changed by $(v_i, v_j) \to (v_i \cos(\theta) + v_j \sin(\theta),\, -v_i \sin(\theta) + v_j \cos(\theta))$ with $\theta$ chosen uniformly in $[0, 2\pi)$. This gives rise to the operator $H_t = e^{-n\lambda t (I - Q)}$ on $L^2$ of the n-sphere with
\[ Qf(V) = \frac{1}{2\pi \binom{n}{2}} \sum_{i<j} \int_0^{2\pi} f(g_{ij}(\theta) V)\, d\theta, \tag{1.2} \]
where $V = (v_1, \ldots, v_n)^t$ and $g_{ij}(\theta)$ is the rotation defined at (1.1). If an initial density $\phi(V, 0)$ is assumed on the sphere then the process at time $t$ has density $\phi(V, t) = H_t \phi(V, 0)$. Differentiating shows that $\phi(V, t)$ satisfies Kac’s master equation
\[ \frac{\partial \phi(V, t)}{\partial t} = -n\lambda (I - Q) \phi(V, t) = \frac{n\lambda}{2\pi \binom{n}{2}} \sum_{i<j} \int_0^{2\pi} [\phi(g_{ij}(\theta) V, t) - \phi(V, t)]\, d\theta. \]

(1.3)

Of course this is linear in $\phi$. To get an analog of the non-linear Boltzmann equation, Kac studied the marginal distribution of the first coordinate $v_1$, call this $f_1^n(v, t)$, and of the first two coordinates, $f_2^n(v, w, t)$. Assuming that the sequence of initial densities $\phi^n(V, 0)$ is symmetric in $(v_1, \ldots, v_n)$ and varies with $n$ so that the marginals approximately factor,
\[ f_2^n(v, w, 0) \sim f_1^n(v, 0)\, f_1^n(w, 0), \]
Kac proved what has come to be called “propagation of chaos”:
\[ f_2^n(v, w, t) \sim f_1^n(v, t)\, f_1^n(w, t). \]
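The velocity update in Kac’s model can be sketched numerically. The following is a minimal illustration, not the authors’ code: the function names `kac_step` and `run_kac_walk` and the use of NumPy are assumptions made here. It applies the map $(v_i, v_j) \to (v_i \cos\theta + v_j \sin\theta,\, -v_i \sin\theta + v_j \cos\theta)$ to a uniformly chosen pair and lets one check that the total energy $v_1^2 + \cdots + v_n^2$ is conserved. The exponential waiting times of the Poisson clock are omitted, so this is the embedded discrete-time chain only.

```python
import numpy as np


def kac_step(v, rng):
    """One step of the walk: rotate a random coordinate pair by a uniform angle."""
    n = v.shape[0]
    # Pick an unordered pair i < j uniformly among the n(n-1)/2 pairs.
    i, j = sorted(rng.choice(n, size=2, replace=False))
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    out = v.copy()
    # (v_i, v_j) -> (v_i cos(theta) + v_j sin(theta), -v_i sin(theta) + v_j cos(theta))
    out[i] = c * v[i] + s * v[j]
    out[j] = -s * v[i] + c * v[j]
    return out


def run_kac_walk(v0, steps, seed=0):
    """Run the embedded discrete-time chain for a given number of steps (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v0, dtype=float)
    for _ in range(steps):
        v = kac_step(v, rng)
    return v
```

Starting from `np.ones(n)`, the squared norm of the returned vector should stay equal to `n` up to rounding, which is the conservation law that restricts the walk to the sphere.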

Plugging this in (1.3) gives the Kac analog of Boltzmann’s equation
\[ \frac{\partial f(x, t)}{\partial t} = C \int_{-\infty}^{\infty} \int_0^{2\pi} \big[ f(x \cos\theta + y \sin\theta, t)\, f(-x \sin\theta + y \cos\theta, t) - f(x, t) f(y, t) \big]\, d\theta\, dy. \]
Kac left many details of the derivation vague. Among these is a bound for the second eigenvalue of the basic operator $n(I - Q)$ with $Q$ as in (1.2). Kac comments that it depends on $n$ and conjectures that it is bounded away from zero, uniformly in $n$. This corresponds to a bound on the second eigenvalue of $Q$ of the form
\[ 1 - \frac{\text{const}}{n}. \]

The argument for Theorem 1 gives the lower bound $1 - 1/(60 n^3)$ and the upper bound $1 - 2/n$. Exactly determining the gap would be useful in pushing Kac’s attempt to justify Boltzmann’s proof of the H-theorem: entropy of the marginal density $f_1(v, t)$ is decreasing in $t$. Of course we know this for the entropy of the joint distribution density $\phi(V, t)$, and Kac (1956, pp. 185–186) shows that if $\phi(V, t) \sim \prod_i f_1(v_i, t)$ in a suitable sense then the desired result transfers to $f_1$. The stochastic dynamics underlying the Kac formulation is used as an algorithm for studying solutions of Boltzmann’s original equation. Indeed, following Kac (1956, Sect. 2), Grünbaum (1971) gave an appropriate stochastic dynamics for the spatially homogeneous version of Boltzmann’s original equation. This is further developed by Uchiyama (1988). Méléard (1996) points out that these stochastic processes are essentially the same as algorithms of Bird (1976) and Nambu (1983) for solving Boltzmann’s original equation. See Perthame (1994) for more of this. Thus, spectral gaps and results of the present paper correspond to running time bounds for these algorithms applied to Kac’s equation. Kac’s paper has given rise to a fair sized literature on “propagation of chaos”. Useful surveys are in Méléard (1996) and Sznitman (1991) and the thesis of Gottlieb (1998). There is also some literature on Kac’s equation (1.3). See McKean (1966), Grünbaum (1972), Desvillettes (1995), Carlen et al (1997) and Méléard (1996). A good overall survey on Boltzmann’s equation is Cercignani et al (1994).

Second motivation. The same random walk acting on all of $SO(n)$ was suggested by Hastings (1970) as a simple way of generating an approximately random rotation. In his paper, Hastings reports some numerical studies when $n = 50$. He used the random walk to estimate the average value of a function $f$: $Jf = \int_{SO(n)} f(m)\, dm$ by $\tilde J f = \frac{1}{N} \sum_{i=1}^{N} f(m_i)$. For example, if $f(m) = m_{11}^2 + m_{22}^2 + \cdots + m_{nn}^2$ it is known that $Jf = 1$. Using $N = 1000$, starting the walk at the identity, Hastings obtained $\tilde J f = 3.5 \pm 1.5$. He supposed this poor estimate was due to the starting place and showed empirically that if the walk is started more “centrally” (e.g., at a real version of the discrete Fourier matrix) satisfactory estimates were obtained. We note that the walk analyzed here is an example of what statisticians call the Gibbs sampler (see e.g., Smith and Roberts (1993)): to sample from a vector distribution, pick a few random coordinates, freeze the rest, and sample from the correct distribution on the chosen coordinates given the frozen coordinates. The Gibbs sampler is also known as the

heat bath or Glauber dynamics. To generate from the uniform distribution on the sphere these algorithms pick two coordinates at random and then choose from the conditional distribution given the rest. This is just Kac’s walk! Our theorem thus gives one of very few available examples of a rate of convergence result for the Gibbs sampler. One further motivation for the careful study of the present example is to begin the extension of the geometric theory of Markov chains developed in [7,8,38] from finite to continuous state spaces. There has been some previous study of rates of convergence of random walk on compact groups. Diaconis and Shahshahani (1986), Rosenthal (1994) and Porod (1995, 1996 a,b) study the walk on O(n) generated by random reflexions. This walk is constant on conjugacy classes so character theory can be used to bound convergence. One difficulty with Kac’s walk is that the convolution Q∗k is not absolutely continuous with respect to Haar measure. There is positive probability that all the gij chosen have the same value (i, j ). Thus Q∗k does not converge in L2 . This blocks the usual route used in [7, 8, 38] of bounding total variation convergence by L2 convergence. We are able to prove total variation convergence (Corollary 2.1) but the argument only shows 2 convergence after order 4n steps. The arguments developed in the present paper use a factorization of Haar measure to allow piecewise continuous paths to be chosen between points of SO(n). It then applies comparison inequalities, much as in [7,8], to prove spectral gap bounds. The operator Q is far from compact: In Sect. 3 we find eigenvalues with infinite multiplicity. Section 2 gives a careful description of the factorization of the Haar measure that we use. Basically, the Euler angles of a randomly chosen element in SO(n) are independent beta variates. Section 3 contains the spectral gap estimates. 
It also gives results for $\theta$ chosen from a non-uniform distribution at each stage and for the walk driven by uniform rotations in planes corresponding to consecutive coordinates $(i, i+1)$, $1 \le i \le n-1$. Section 4 reviews needed geometric tools (Ricci curvature, diameter and volume growth) on $SO(n)$. The quantitative bound on the dual bounded Lipschitz rate of weak convergence using a spectral gap estimate may be of general interest. Section 5 shows how one can take full advantage of comparison inequalities to obtain improved rates of convergence for a random walk on a group, much as in [7]. It is straightforward to extend the analysis to a parallel walk on the unitary group $U(n)$. This may be of interest in connection with quantum computing. Randomly choosing a pair of coordinates and multiplying by a random element of $U(2)$ can be studied as a model of noisy quantum circuits. See Aharonov and Ben-Or (1997) and Shor (1996). In preliminary work, David Maslin (1999) has determined that the spectral gap of Kac’s walk equals
\[ \frac{1}{2n} + \frac{3}{2n(n-1)} \]
with multiplicity $n(n-1)(n+6)(n+1)/24$. His argument makes heavy use of representation theory of $SO(n)$. E. Janvresse (1999) has also obtained a bound of form $c/n$ for the walk on the sphere by a different method.
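Hastings’s experiment from the second motivation can be imitated in a few lines. This is a rough sketch under assumptions made here, not a reconstruction of his program: the name `hastings_estimate` is hypothetical, the walk multiplies on the left by $g_{ij}(\theta)$, and every step of the walk is used in the average. It estimates $Jf$ for $f(m) = m_{11}^2 + \cdots + m_{nn}^2$ by the running mean $\tilde J f$ along a walk started at the identity.

```python
import numpy as np


def hastings_estimate(n, num_steps, seed=0):
    """Average f(m) = sum_i m_ii^2 along the random rotation walk on SO(n).

    Returns the estimate and the final matrix of the walk.
    """
    rng = np.random.default_rng(seed)
    m = np.eye(n)  # start at the identity, as in Hastings's first experiment
    total = 0.0
    for _ in range(num_steps):
        i, j = sorted(rng.choice(n, size=2, replace=False))
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        # Left multiplication by g_ij(theta) only mixes rows i and j.
        row_i, row_j = m[i].copy(), m[j].copy()
        m[i] = c * row_i + s * row_j
        m[j] = -s * row_i + c * row_j
        total += float(np.sum(np.diag(m) ** 2))
    return total / num_steps, m
```

Calling `hastings_estimate(50, 1000)` mirrors the values $n = 50$, $N = 1000$ quoted above. Since $f$ equals $n$ at the identity and the walk moves slowly, a short run started there can be expected to overshoot the true value $Jf = 1$, in line with the poor estimate Hastings reported.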

2. A Factorization of Haar Measure

This section gives a probabilistic interpretation to Hurwitz’ (1897) construction of Haar measure on $SO(n)$.

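Let $g_i(\theta) = g_{i-1,i}(\theta)$, $2 \le i \le n$. These rotations act on a column vector $[x_1, \ldots, x_n]^t$ by
\[ g_i x = [x_1, \ldots, c x_{i-1} + s x_i,\, -s x_{i-1} + c x_i, \ldots, x_n]^t. \]
Choosing
\[ c = \frac{\pm x_{i-1}}{\sqrt{x_i^2 + x_{i-1}^2}}, \qquad s = \frac{\pm x_i}{\sqrt{x_i^2 + x_{i-1}^2}} \]
(same sign in each) results in a vector with $i$-th coordinate zero. A succession of such rotations can be used to bring a given $m \in SO(n)$ to diagonal form. Suppose e.g. that $n$ is 4. Then, with obvious notation
\[
\begin{pmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & * \end{pmatrix}
\xrightarrow{g_4}
\begin{pmatrix} * & * & * & * \\ * & * & * & * \\ * & * & * & * \\ 0 & * & * & * \end{pmatrix}
\xrightarrow{g_3}
\begin{pmatrix} * & * & * & * \\ * & * & * & * \\ 0 & * & * & * \\ 0 & * & * & * \end{pmatrix}
\xrightarrow{g_2}
\begin{pmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & * & * & * \\ 0 & * & * & * \end{pmatrix}
\]
\[
\xrightarrow{g_4'}
\begin{pmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \end{pmatrix}
\xrightarrow{g_3'}
\begin{pmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & * & * \end{pmatrix}
\xrightarrow{g_4''}
\begin{pmatrix} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & 0 & * \end{pmatrix}
\]
The final matrix is orthogonal and so all off-diagonal entries must be zero and all diagonal entries must be $\pm 1$. By using the free choice of sign in $g_i$, the final matrix may be taken as the identity. Thus
\[ m = (g_4^t g_3^t g_2^t)(g_4'^{\,t} g_3'^{\,t})(g_4''^{\,t}). \]
Clearly this generalizes so that any element of $SO(n)$ can be represented as
\[ m = (g_n^1 g_{n-1}^1 \cdots g_2^1)(g_n^2 \cdots g_3^2) \cdots (g_n^{n-2} g_{n-1}^{n-2})\, g_n^{n-1}. \]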
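The reduction just described can be carried out numerically. The sketch below is an illustration under assumptions made here, not the authors’ construction verbatim: the names `givens`, `reduce_to_identity` and `random_rotation` are hypothetical, indices are 0-based, and the sign in each rotation is fixed to $+$ so that the pivot entry becomes non-negative. It clears the entries below the diagonal column by column with rotations in adjacent coordinates, in the order $g_n, g_{n-1}, \ldots, g_2$, then $g_n', \ldots$, and records the $\binom{n}{2}$ rotations used.

```python
import numpy as np


def givens(n, i, c, s):
    """Rotation acting on coordinates (i-1, i), 0-based, like g_i in the text."""
    g = np.eye(n)
    g[i - 1, i - 1] = c
    g[i, i] = c
    g[i - 1, i] = s
    g[i, i - 1] = -s
    return g


def reduce_to_identity(m):
    """Apply adjacent-plane rotations on the left until m becomes the identity.

    Returns the reduced matrix and the list of rotations in the order applied.
    """
    n = m.shape[0]
    a = np.array(m, dtype=float)
    rotations = []
    for col in range(n - 1):
        for i in range(n - 1, col, -1):
            x_prev, x_cur = a[i - 1, col], a[i, col]
            r = np.hypot(x_prev, x_cur)
            # c = x_{i-1}/r, s = x_i/r sends (x_{i-1}, x_i) to (r, 0).
            c, s = (x_prev / r, x_cur / r) if r > 0 else (1.0, 0.0)
            g = givens(n, i, c, s)
            a = g @ a
            rotations.append(g)
    return a, rotations


def random_rotation(n, rng):
    """Some element of SO(n) for testing (QR of a Gaussian matrix, sign-fixed)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q
```

If `a, rots = reduce_to_identity(m)` returns the identity, then multiplying the transposes of `rots` in the order they were applied recovers `m`, which is the numerical counterpart of the product formula above.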

Hurwitz discovered that a uniform probability distribution on $SO(n)$ (now called Haar measure) can be derived by giving an appropriate product measure to the $\{g_j^i\}$ above. This may be seen by an elementary argument. Recall the gamma density on $[0, \infty)$, $\gamma_a(x) = \Gamma(a)^{-1} e^{-x} x^{a-1}$. The following facts from a first probability course are useful.

Lemma 2.1. If $X_1, X_2, \ldots, X_n$ are independent with $X_i$ having a $\gamma_{a_i}$ distribution then

(1) $\dfrac{X_1}{X_1 + X_2},\ \dfrac{X_1 + X_2}{X_1 + X_2 + X_3},\ \ldots,\ \dfrac{X_1 + \cdots + X_{n-1}}{X_1 + \cdots + X_n},\ X_1 + \cdots + X_n$ are independent, with $\dfrac{X_1 + \cdots + X_i}{X_1 + \cdots + X_{i+1}}$ having density
\[ \beta(A, B; x) = \frac{\Gamma(A + B)}{\Gamma(A)\Gamma(B)}\, x^{A-1} (1 - x)^{B-1} \]
on $[0, 1]$ for $A = a_1 + \cdots + a_i$, $B = a_{i+1}$, and $S_n = X_1 + \cdots + X_n$ having density $\gamma_{a_1 + \cdots + a_n}$.

(2) $S_n$ is independent of the vector $\left(\dfrac{X_1}{S_n}, \ldots, \dfrac{X_n}{S_n}\right)$.

Lemma 2.2. Let $Z_1, Z_2, \ldots, Z_n$ be independent standard Gaussian random variables. Then

(1) $\left(\dfrac{Z_1}{\sqrt{Z_1^2 + \cdots + Z_n^2}}, \ldots, \dfrac{Z_n}{\sqrt{Z_1^2 + \cdots + Z_n^2}}\right)$ is uniformly distributed on the n-sphere with first coordinate having density
\[ \frac{\Gamma(\frac{n}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{n-1}{2})}\, (1 - x^2)^{\frac{n-3}{2}} \]
on $[-1, 1]$.

(2) Let $W_1, \ldots, W_{n+1}$ be standard Gaussian variables, independent of each other and of $Z_1, \ldots, Z_n$ in (1). Let
\[ A = \frac{W_1}{\sqrt{W_1^2 + \cdots + W_{n+1}^2}}, \qquad B = \sqrt{\frac{W_2^2 + \cdots + W_{n+1}^2}{W_1^2 + \cdots + W_{n+1}^2}} = \sqrt{1 - A^2}. \]
Then
\[ \left(A,\ B\, \frac{Z_1}{\sqrt{Z_1^2 + \cdots + Z_n^2}},\ \ldots,\ B\, \frac{Z_n}{\sqrt{Z_1^2 + \cdots + Z_n^2}}\right) \]
is uniformly distributed on the (n + 1)-sphere.

Proof. Property (1) is a classical fact following from the invariance of the Gaussian product density $e^{-\frac{1}{2}(z_1^2 + \cdots + z_n^2)} / (2\pi)^{\frac{n}{2}}$ under rotations. For (2), observe that $Z_1^2$ is a scale multiple of a $\gamma_{1/2}$ variable. Squaring $A$ and $B\, Z_i / \sqrt{Z_1^2 + \cdots + Z_n^2}$, the sum $W_1^2 + \cdots + W_{n+1}^2$ is independent of all the ratios. Multiplying through, we have
\[ W_1^2,\quad (W_2^2 + \cdots + W_{n+1}^2) \cdot \frac{Z_1^2}{Z_1^2 + \cdots + Z_n^2},\ \ldots,\ (W_2^2 + \cdots + W_{n+1}^2) \cdot \frac{Z_n^2}{Z_1^2 + \cdots + Z_n^2}. \]
The last $n$ components are distributed as a vector of independent scaled $\gamma_{1/2}$ variates using Lemma 2.1 (2). So the ratio of the square roots of the entries to the square root of the sum of all the entries is uniform on the (n + 1)-sphere by Lemma 2.1 (1). □

The next result puts the pieces together to give a probabilistic version of Hurwitz (1897). For $2 \le j \le n$ fixed, consider rotations of the form
\[
g_j =
\begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & x & y & & \\
 & & -y & x & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{pmatrix}
\]
where all diagonal entries are equal to 1 except for the $(j-1, j-1)$ and $(j, j)$ entries which are equal to $x \in [0, 1]$ and all off-diagonal entries are equal to zero except for the $(j-1, j)$ and $(j, j-1)$ entries which are equal respectively to $y = \sqrt{1 - x^2}$ and $-y$. Let $\nu_j$ be the measure supported by these rotations and such that, under $\nu_j$, $x$ has the distribution of the first coordinate of a point uniformly chosen on the $(n + 2 - j)$-sphere.

Proposition 2.1. Let $\{G_j^i\}$, $1 \le i < j \le n$, be independent random matrices in $SO(n)$ with $\{G_j^i\}_{i=1}^{j-1}$ having common distribution $\nu_j$. Then
\[ (G_n^1 G_{n-1}^1 \cdots G_2^1) \cdots (G_n^{n-2} G_{n-1}^{n-2})\, G_n^{n-1} \]
is uniformly distributed on $SO(n)$.
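The proposition rests on Lemma 2.2 (1), which is easy to probe by simulation. The following sketch is only a sanity check under assumptions made here: the function name `uniform_on_sphere` is hypothetical, and "n-sphere" is read, as in the text, as the unit sphere in $\mathbb{R}^n$. It normalizes Gaussian vectors and lets one compare the empirical second moment of the first coordinate with $1/n$, the value forced by symmetry since the squared coordinates sum to 1.

```python
import numpy as np


def uniform_on_sphere(n, size, rng):
    """Sample `size` points on the unit sphere of R^n by normalizing Gaussians."""
    z = rng.standard_normal((size, n))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


rng = np.random.default_rng(0)
points = uniform_on_sphere(5, 200_000, rng)
# By symmetry E[x_1^2] = 1/n, because x_1^2 + ... + x_n^2 = 1 for every sample.
second_moment = float(np.mean(points[:, 0] ** 2))
```

The same samples can be histogrammed against the density $\frac{\Gamma(n/2)}{\Gamma(1/2)\Gamma((n-1)/2)} (1 - x^2)^{(n-3)/2}$ stated in the lemma.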

Proof. The idea of the proof is simple. First, in $\mathbb{R}^3$, if a uniform rotation in the $(x, y)$ plane is followed by an independent rotation taking the $z$ axis to a uniform direction, the result is uniform on $SO(3)$. In general, let $e_1, \ldots, e_n$ be the standard basis for $\mathbb{R}^n$ and let $\{N_i\}_{i=2}^n$ be independent random matrices in $SO(n)$ with $N_2$ fixing $e_1, \ldots, e_{n-2}$ and uniform in the $(n-1, n)$ plane, $\ldots$, $N_i$ fixing $e_1, \ldots, e_{n-i}$ and taking $e_{n-i+1}$ to a uniform vector in $\operatorname{span}\{e_{n-i+1}, \ldots, e_n\}$. Then the product $N_n N_{n-1} \cdots N_2$ is uniformly distributed on $SO(n)$. This is given a formal proof in [10]. To finish the proof, we argue that
\[ N_i = G_n^{n-(i-1)} \cdots G_{n-(i-2)}^{n-(i-1)}, \qquad 2 \le i \le n, \]
have the required properties. Proceed by induction. Dropping the superscript, $G_n$ has form
\[
\begin{pmatrix}
1 & & & \\
 & \ddots & & \\
 & & a' & b' \\
 & & -b' & a'
\end{pmatrix}
\]
with $a', b'$ chosen uniformly on the 2-sphere. $G_{n-1}$ has form
\[
\begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & a & b & 0 \\
 & & -b & a & 0 \\
 & & 0 & 0 & 1
\end{pmatrix}
\]
with $a$ distributed as $\dfrac{Z_1}{\sqrt{Z_1^2 + Z_2^2 + Z_3^2}}$ and $b = \sqrt{1 - a^2}$. The product $G_n G_{n-1}$ has the last three elements of the third column $(a, -b a', b b')$. From Lemma 2.2, this is uniform on the three-sphere. The product $G_n G_{n-1} G_{n-2}$ has ones on the diagonal down to the $(n-5, n-5)$ place and the last four entries of column $n-4$ uniform on the four-sphere. Continuing, $G_n G_{n-1} \cdots G_2$ has its first column uniformly distributed on the n-sphere. □

Remark. Using standard characterizations of beta and gamma random variables we can prove a converse: if $G \in SO(n)$ is chosen from Haar measure then, almost surely, the factorization into rotations as above is uniquely defined and the terms $G_j^i$ are independent with distributions specified by Proposition 2.1.

As a corollary of Proposition 2.1 we show that the random rotation chain of Theorem 1 satisfies a Döblin condition and thus converges to the uniform distribution in total variation norm. This gives a remarkably poor bound but, up to minor improvements, it is the best we know.

Corollary 2.1. The convolution $Q^{*k}$ of Theorem 1 converges to Haar measure on $SO(n)$ in total variation. Indeed
\[ \|Q^{*k} - U\|_{TV} \le (1 - c)^{\lfloor k / \binom{n}{2} \rfloor} \quad \text{with } c = 4^{-n^2} n^{-n^2}. \]

Proof. We claim that $Q^{*\binom{n}{2}}(A) \ge c\, U(A)$ for all Borel sets $A$. This Döblin condition implies the result (see e.g. Kloss (1959)). To prove the claim observe that the chance that the first $\binom{n}{2}$ steps of the walk pick rotations in the exact coordinates used for the factorization of Haar measure in Proposition 2.1 is $1 / \binom{n}{2}^{\binom{n}{2}}$. For this component of $Q^{*\binom{n}{2}}$ the density of the corresponding random matrix is $\prod_{i,j} f_n(x_{ij})$ with the product ranging over the chosen coordinates and
\[ f_n(x) = \frac{(1 - x^2)^{-1/2}}{\pi} \ge \frac{1}{\pi}. \]
Proposition 2.1 gives the density of Haar measure as $\prod_{i,j} f_{n_{ij}}(x_{ij})$ for the same coordinates but different densities. There are $n - 1$ terms of the Haar density with $n_{ij} = n$. Cancel these from both sides. Bound the remaining density factors of the component of $Q^{*\binom{n}{2}}$ below by $1 / \pi^{\binom{n-1}{2}}$. To bound the remaining factors of the density of Haar measure above, use Lemma 2.2 (1): for $k \ge 3$,
\[ f_{n+2-k}(x) = \frac{\Gamma(\frac{k}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{k-1}{2})}\, (1 - x^2)^{\frac{k-3}{2}} \le \frac{\Gamma(\frac{k}{2})}{\Gamma(\frac{1}{2})\Gamma(\frac{k-1}{2})} \le \frac{k}{2\sqrt{\pi}}. \]
There are $n - k + 1$ terms in the product for a given $k$, $3 \le k \le n$. This shows that the remaining factors of the density of Haar measure are bounded above by
\[ \left(\frac{1}{2\sqrt{\pi}}\right)^{\binom{n-1}{2}} 3^{n-2} \cdot 4^{n-3} \cdots n. \]
Combining bounds gives the result with
\[ c > \left(\frac{1}{2\pi^{3/2}}\right)^{\binom{n-1}{2}} (3^{n-2} \cdot 4^{n-3} \cdots n)^{-1} > 4^{-n^2} n^{-n^2}. \]
□

The next corollary uses part of the factorization to represent the measure $Q_\theta$ on $SO(n)$ which rotates by a fixed angle $\theta$ in a randomly chosen two-dimensional space. Formally, let $R_\theta$ be the $n \times n$ matrix with the $2 \times 2$ block $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$ in the upper left-hand corner. Let $Q_\theta$ be the probability distribution corresponding to $M R_\theta M^{-1}$, where $M$ is chosen from the Haar measure. Thus $Q_\theta$ is uniformly distributed on the conjugacy class containing $R_\theta$. Repeated convolutions of $Q_\theta$ were studied by Rosenthal (1994). We will use his results to get bounds on the spectrum of $Q$ in the next section.

Corollary 2.2. With notation as in Proposition 2.1, the measure $Q_\theta$ is the probability distribution of $T R_\theta T^{-1}$, where
\[ T = (G_n^1 G_{n-1}^1 \cdots G_2^1)(G_n^2 G_{n-1}^2 \cdots G_3^2). \]

Proof. The argument for Proposition 2.1 shows that $(G_n^1 \cdots G_2^1)$ has columns $V_1, V_2, \ldots, V_n$ with $V_1$ uniformly distributed. Similarly, $(G_n^2 \cdots G_3^2)$ has form
\[
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & & & \\
\vdots & W_2 & \cdots & W_n \\
0 & & &
\end{pmatrix}
\]
with the column $W_2$ uniformly distributed. The product $T$ of these two has first two columns $V_1$, $W_{22} V_2 + W_{32} V_3 + \cdots + W_{n2} V_n$. Now $V_2, V_3, \ldots, V_n$ is an orthonormal basis for $V_1^\perp$ and so the second column is uniformly distributed in this space. Thus the first two columns of the product are a uniformly distributed two-plane. By direct computation, the matrix $T R_\theta T^{-1}$ only depends on the first two columns and so has the same distribution as $M R_\theta M^{-1}$. □

Remark. Similar factorizations hold for the unitary and symplectic group. Factorizations also hold for finite groups generated by reflexions (e.g. the symmetric group). Details and further applications can be found in [6]. These factorizations can be used exactly as in Sect. 3 below to give bounds on the eigenvalues of associated random walks. For example, on the symmetric group, the walk parallel to Kac’s walk is the one generated by random transpositions.

3. Spectral Bounds

3.1. Introduction. For $1 \le i < j \le n$, let $\mu_{ij}$ be the push-forward of the measure $f(x) = \dfrac{(1 - x^2)^{-1/2}}{\pi}$ on $[-1, 1]$ under the map
\[
x \mapsto
\begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & x & y & & \\
 & & -y & x & & \\
 & & & & \ddots & \\
 & & & & & 1
\end{pmatrix},
\qquad y = \sqrt{1 - x^2},
\]
where the $x$’s are in position $(i, i)$ and $(j, j)$ and $y$ (resp. $-y$) is in position $(i, j)$ (resp. $(j, i)$). This corresponds to rotation by a uniform angle in the $(i, j)$ plane. Let
\[ Q = \frac{1}{\binom{n}{2}} \sum_{i<j} \mu_{ij}. \tag{3.1} \]
Then $Q$ is a symmetric probability measure on $SO(n)$. It acts on the real vector space $L^2(SO(n))$ via $Qf(x) = \int f(x y^{-1})\, Q(dy) = \int f(xy)\, Q(dy)$. Because of symmetry, $Q$ is a bounded self-adjoint operator on $L^2$. It has real spectrum contained in $[-1, 1]$. In this section we bound the spectral gap, that is, the norm of $Q$ acting on $L_0^2 = \{f \in L^2 : \int f\, dx = 0\}$.

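We will show that for all $f \in L_0^2$,
\[ \|f\|_2^2 \le A \langle (I - Q) f, f \rangle, \qquad \|f\|_2^2 \le A \langle (I + Q) f, f \rangle, \qquad \text{for } A = 60 n^3. \tag{3.2} \]
For $f \in L_0^2$ with $\|f\|_2^2 = 1$, (3.2) implies that
\[ \langle Qf, f \rangle = 1 - \langle (I - Q) f, f \rangle \le 1 - \frac{1}{A}, \qquad \langle Qf, f \rangle = -1 + \langle (I + Q) f, f \rangle \ge -1 + \frac{1}{A}. \]
Now, an elementary argument in Riesz–Nagy ((1960), Sect. 9.2) shows $\|Q\|_{0,2\to2} = \max(-m, M)$ with $m = \min_{\|f\|_{0,2} = 1} \langle Qf, f \rangle$, $M = \max_{\|f\|_{0,2} = 1} \langle Qf, f \rangle$. Thus (3.2) proves

Theorem 3.1. The probability measure $Q$ of (3.1) on $SO(n)$ satisfies
\[ \|Q - U\|_{2\to2} = \|Q\|_{0,2\to2} \le 1 - \frac{1}{60 n^3}. \tag{3.3} \]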

The argument for (3.2) is by comparison with a measure $\tilde Q$ on $SO(n)$. This $\tilde Q$ results from rotating by $\pi$ in a randomly chosen plane. Formally

$\tilde Q$ is the distribution of $M^{-1} R M$ on $SO(n)$, where $M$ is Haar distributed and $R$ is a diagonal matrix with two minus ones and $n - 2$ ones on the diagonal. (3.4)

Another description of $\tilde Q$ is given by Corollary 2.2. This $\tilde Q$ is uniformly distributed on the conjugacy class containing $R$. Its spectral behavior can thus be obtained by character theory on $SO(n)$. This was done by Rosenthal (1994) whose results are described in Lemma 3.2. In Sect. 3.2 we show that for every $f \in L^2$,
\[ \langle (I - \tilde Q) f, f \rangle \le 16 n^2 \langle (I - Q) f, f \rangle, \qquad \langle (I + \tilde Q) f, f \rangle \le 16 n^2 \langle (I + Q) f, f \rangle. \]
These comparison inequalities are proved using the factorizations of Sect. 2. In Sect. 3.3 we show that $Q$ has the eigenvalue $1 - \frac{2}{n}$ with infinite multiplicity. In Sect. 3.4 we describe some variants of the measure $Q$ to which the present techniques apply. We conclude this section by proving two needed lemmas. The first is for functions on the circle $S^1$.

Lemma 3.1. For any $h \in L^2(S^1)$, and any probability measure $\nu$ on $S^1$,

(a) $\langle (I - \nu) h, h \rangle \le \iint (h(x) - h(xy))^2\, dx\, dy$,

(b) $\langle (I + \nu) h, h \rangle \le \iint (h(x) + h(xy))^2\, dx\, dy$.

Proof. Acting by convolution on $S^1$, $\nu$ has spectrum in $[-1, 1]$. Thus $0 \le \langle (I - \nu) h, h \rangle \le 2 \|h\|_2^2$. Further, if $\bar h = \int h(x)\, dx$, then $\langle (I - \nu)(h - \bar h), h \rangle = \langle (I - \nu) h, h \rangle$ and $\langle (I - \nu)(h - \bar h), \bar h \rangle = 0$. Hence
\[ \langle (I - \nu) h, h \rangle = \langle (I - \nu)(h - \bar h), h - \bar h \rangle \le 2 \|h - \bar h\|_2^2 = \iint (h(x) - h(xy))^2\, dx\, dy. \]
The proof of (b) is similar. □

e This The second lemma gives a sharp bound on the spectral gap for the measure Q. leans heavily on work of Rosenthal (1994). e defined at (3.4) and any f ∈ L2 , Lemma 3.2. For n ≥ 2, Q 0 e f i, kf k22 ≤ ah(I + Q)f, e f i with a = kf k22 ≤ ah(I − Q)f,

4n . 15

e by character theory. The Proof. Rosenthal (1994) determined all the eigenvalues of Q different eigenvalues are indexed by n-tuples a1 < a2 < · · · < am with ai integers or half integers. For definiteness, we treat the case where n = 2m + 1 is odd, so ai − 21 ∈ {0, 1, 2, . . . }. These ai index the irreducible representations. For example, the trivial representation corresponds to 21 , 23 , . . . , m − 21 and the n-dimensional representation corresponds to a∗ = 21 , 23 , . . . , m − 23 , m + 21 . Rosenthal shows there is an eigenvalue e for each such m-tuple given by β(a) of Q β(a) =

1 m (−1)aj −j + 2 (2m − 1)! X . Qm Qj −1 2 2 2 2 22m−1 j =1 aj i=1 (aj − ai ) i=j +1 (ai − aj )

(3.5)

These eigenvalues have multiplicity the square of the dimension of the corresponding irreducible representation which is given by a similar formula. We do not need to consider e these multiplicities. As a varies, these are all the eigenvalues of Q. The eigenvalues can be bounded by m 1 (2m − 1)! X . |β(a)| ≤ r(a) = Qm Qj −1 2 2 2 2 22m−1 j =1 aj i=1 (aj − ai ) i=j +1 (ai − aj )

(3.6)

Obviously, for a corresponding to any non–trivial representation, r(a) is largest when a = a∗ defined above. This a∗ corresponds to the n-dimensional representation for which the eigenvalue is β(a∗ ) = n1 Tr (R) = 1 − n4 . Comparing (3.5) and (3.6) we have r(a∗ ) = 1 −

4 15 4 ≤1− + . n 4n (n + 21 ) 4n

This r(a∗ ) bounds the absolute value of the largest and smallest eigenvalues β+ , β− . e follow since these operators have largest and smallest The claimed bounds for I ± Q t eigenvalues 1 − β+ , 1 − β− . u

3.2. Comparison inequalities. This section proves (3.2) and so Theorem 3.1.

Proposition 3.1. For the probabilities $Q$, $\tilde Q$ on $SO(n)$ defined in (3.1), (3.4) and any $f \in L_0^2(SO(n))$,
\[ \langle (I - \tilde Q) f, f \rangle \le 16 n^2 \langle (I - Q) f, f \rangle. \]

Proof. The argument uses the factorization of the Haar measure derived in Sect. 2. In the calculation below, squared differences are bounded by writing $y = y_1 \cdots y_k$ so
\begin{align*}
[f(x) - f(xy)]^2 &= [f(x) - f(x y_1 \cdots y_k)]^2 \\
&= [(f(x) - f(x y_1)) + (f(x y_1) - f(x y_1 y_2)) + \cdots + (f(x y_1 \cdots y_{k-1}) - f(x y_1 \cdots y_k))]^2 \\
&\le k\, [(f(x) - f(x y_1))^2 + \cdots + (f(x y_1 \cdots y_{k-1}) - f(x y_1 \cdots y_k))^2].
\end{align*}
Integrating over $x$ in $SO(n)$ gives
\[ \int (f(x) - f(xy))^2\, dx \le k \sum_{i=1}^{k} \int (f(x) - f(x y_i))^2\, dx. \]
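The pointwise inequality used above is the Cauchy–Schwarz bound $(a_1 + \cdots + a_k)^2 \le k (a_1^2 + \cdots + a_k^2)$ applied to the telescoping increments. A small numerical check, with the hypothetical helper name `telescoping_bound` chosen here, takes the values $f(x), f(x y_1), \ldots, f(x y_1 \cdots y_k)$ along a path and returns both sides.

```python
import numpy as np


def telescoping_bound(values):
    """Return (lhs, rhs) of (sum of increments)^2 <= k * sum of squared increments.

    `values` lists f at the successive points x, x*y_1, ..., x*y_1*...*y_k.
    """
    values = np.asarray(values, dtype=float)
    increments = np.diff(values)  # f(x y_1 ... y_i) - f(x y_1 ... y_{i-1})
    k = increments.size
    lhs = float((values[-1] - values[0]) ** 2)
    rhs = float(k * np.sum(increments ** 2))
    return lhs, rhs
```

Equality holds exactly when all increments are equal, for instance `telescoping_bound([0, 1, 2, 3])` gives both sides equal to 9; this is why the factor $k$ cannot be improved in general.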

The factorization of Sect. 2 depends on an ordering of $\{1, 2, \ldots, n\}$ and only involves rotations $G_i$ using pairwise adjacent coordinates. The measure $Q$ uses all pairs of coordinates. We symmetrize by conjugating by the permutation matrix corresponding to $\sigma$ in $S_n$. Write $g_{i,\sigma}(\theta) = g_{\sigma(i-1),\sigma(i)}(\theta)$ and $G_{i,\sigma}$ for the corresponding random element of $SO(n)$. Thus for any fixed $\sigma$,
\[ (G_{n,\sigma}^1 \cdots G_{2,\sigma}^1) \cdots (G_{n,\sigma}^{n-2} G_{n-1,\sigma}^{n-2})\, G_{n,\sigma}^{n-1} \tag{3.7} \]
is a uniformly distributed element of $SO(n)$. Further, with
\[ T_\sigma = (G_{n,\sigma}^1 \cdots G_{2,\sigma}^1)(G_{n,\sigma}^2 \cdots G_{3,\sigma}^2), \qquad R^\sigma = \sigma^{-1} \operatorname{diag}(-1, -1, 1, \ldots, 1)\, \sigma, \tag{3.8} \]
$T_\sigma R^\sigma T_\sigma^{-1}$ has distribution $\tilde Q$. See Corollary 2.2.

Write $\nu_{k,\sigma}^j$ for the distribution of $G_{k,\sigma}^j$ (this distribution actually does not depend on $j$) and $\mu_{k,\sigma}$ for the distribution corresponding to a uniform rotation in the coordinate plane $\sigma(k-1), \sigma(k)$. For any $f \in L_0^2$,
\begin{align*}
2 \langle (I - \tilde Q) f, f \rangle &= \iint (f(x) - f(xy))^2\, dx\, \tilde Q(dy) \\
&= \frac{1}{n!} \sum_{\sigma \in S_n} \int \cdots \int \big(f(x) - f(x\, g_{n,\sigma}^1 \cdots g_{3,\sigma}^2\, R^\sigma (g_{3,\sigma}^2)^{-1} \cdots (g_{n,\sigma}^1)^{-1})\big)^2\, dx\, \nu_{n,\sigma}^1(dg_{n,\sigma}^1) \cdots \nu_{3,\sigma}^2(dg_{3,\sigma}^2).
\end{align*}
Using the Cauchy–Schwarz inequality on the differences as above, $2 \langle (I - \tilde Q) f, f \rangle$ is bounded above by
\[ \frac{4n - 5}{n!} \sum_{\sigma \in S_n} \left[ \int (f(x) - f(x R^\sigma))^2\, dx + 2 \sum_{k=1}^{2} \sum_{\ell = k+1}^{n} \int (f(x) - f(xg))^2\, dx\, \nu_{\ell,\sigma}^k(dg) \right]. \]


To complete the argument we show that in each term above the measure ν can be replaced by the measure µ corresponding to a uniform rotation in the chosen coordinate plane. To see this, fix a term

$$\int |f(x)-f(xg)|^2\,dx\,d\nu^k_{\ell,\sigma}(g)$$

in the sum above. Factor dx into appropriate pieces in the order τ as in (3.7), where the permutation τ is chosen so that the right-most factor $G^{n-1}_{n,\tau}$ is a (uniform) rotation in the desired coordinate plane (σ(ℓ − 1), σ(ℓ)) (this is achieved by any τ such that (τ(n − 1), τ(n)) = (σ(ℓ − 1), σ(ℓ))). Fixing the other coordinates, define a function $\tilde f$ on $S^1$ as $\tilde f(z) = f(g^1_{n,\tau}\cdots g^{n-2}_{n,\tau}\,g^{n-2}_{n-1,\tau}\,g(z))$ with g(z) in SO(n) having $z, \sqrt{1-z^2}$ installed in the appropriate places. Using Lemma 3.1, the integral over z with any measure ν is smaller than the integral with z uniform. This also holds for the terms $R^\sigma$ (where ν is point mass). Thus $2\langle (I-\widetilde{Q})f, f\rangle$ is bounded above by

$$\frac{4n-5}{n!}\sum_{\sigma\in S_n}\Big[\int (f(x)-f(xg))^2\,dx\,\mu_{2,\sigma}(dg) + 2\sum_{k=1}^{2}\sum_{\ell=k+1}^{n}\int (f(x)-f(xg))^2\,dx\,\mu_{\ell,\sigma}(dg)\Big].$$

As $\mu_{\ell,\sigma} = \mu_{ij}$ with i = σ(ℓ − 1), j = σ(ℓ), we see that a given term

$$\int (f(x)-f(xg))^2\,dx\,\mu_{ij}(dg)$$

appears at most $2(n-2)! + 2\times 2(2n-3)(n-2)! \le 8(n-1)!$ times in the sum above. Finally, this yields the bound

$$\langle (I-\widetilde{Q})f, f\rangle \le \alpha\,\langle (I-Q)f, f\rangle \quad\text{with}\quad \alpha = \frac{(4n-5)\,8(n-1)!}{n!}\binom{n}{2} \le 16n^2.$$

□
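The two arithmetic claims closing the proof can be checked directly: the occurrence count is at most 8(n − 1)!, and α simplifies to 4(4n − 5)(n − 1), which is at most 16n². The Python sketch below does this with exact integer arithmetic; the function names are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial

def occurrence_count(n):
    """Count 2(n-2)! + 2*2(2n-3)(n-2)! bounding how often a fixed mu_ij term appears."""
    return 2 * factorial(n - 2) + 2 * 2 * (2 * n - 3) * factorial(n - 2)

def alpha(n):
    """alpha = (4n-5) * 8(n-1)! * C(n,2) / n!, kept exact with Fraction."""
    return Fraction((4 * n - 5) * 8 * factorial(n - 1) * comb(n, 2), factorial(n))

for n in range(3, 9):
    # Columns: n, count bound holds, alpha, 16 n^2
    print(n, occurrence_count(n) <= 8 * factorial(n - 1), alpha(n), 16 * n * n)
```

Since α = 16n² − 36n + 20, the bound 16n² is off only in lower-order terms.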

The next result yields a lower bound for negative eigenvalues.

Proposition 3.2. For the probabilities $Q$, $\widetilde{Q}$ on SO(n) defined in (3.1), (3.4) and any $f \in L^2_0(SO(n))$,

$$\langle (I+\widetilde{Q})f, f\rangle \le 16n^2\,\langle (I+Q)f, f\rangle.$$

Proof. The argument parallels the proof of Proposition 3.1 using a factorization of odd length k for $y = y_1\cdots y_k$, so that

$$\begin{aligned}
[f(x)+f(xy)]^2 &= [(f(x)+f(xy_1)) - (f(xy_1)+f(xy_1y_2)) + \cdots + (f(xy_1\cdots y_{k-1})+f(xy_1\cdots y_k))]^2\\
&\le k\,[(f(x)+f(xy_1))^2 + \cdots + (f(xy_1\cdots y_{k-1})+f(xy_1\cdots y_k))^2].
\end{aligned}$$

The factorization (3.8) always has odd length. Now, proceed as in Proposition 3.1, using Lemma 3.1 b). □


3.3. Examples. Section 3.2 gives upper bounds on the eigenvalues of Q defined in (3.1) of the form $\beta_i \le 1 - \frac{1}{60n^3}$. For the n-dimensional representation ρ(m) we have

$$\widehat{Q}(\rho) = \frac{1}{\binom{n}{2}}\sum_{i<j}\widehat{\mu}_{ij}(\rho) = \Big(1-\frac{2}{n}\Big)I, \tag{3.9}$$

where the last equality comes from computing $\widehat{\mu}_{ij}(\rho) = \int \rho(m)\,\mu_{ij}(dm)$. This is a diagonal matrix with zero entries at (i, i), (j, j) and ones elsewhere. Summing over i, j gives (3.9). Since the n-dimensional representation appears n times in the decomposition of $L^2_0$, $1-\frac{2}{n}$ appears as an eigenvalue with multiplicity at least $n^2$. The next result shows that it appears with infinite multiplicity.

Proposition 3.3. On SO(n), let $f(m) = m_{1,1}^k$ for k odd. Then

$$Qf(m) = \Big(1-\frac{2}{n}\Big)f(m) \quad\text{for all } k = 1, 3, 5, \ldots.$$

Proof.

$$Qf(m) = \frac{1}{\binom{n}{2}}\sum_{i<j}\mu_{ij}f(m) = \frac{\binom{n-1}{2}}{\binom{n}{2}}f(m) + \frac{1}{\binom{n}{2}}\sum_{j=2}^{n}\mu_{1j}f(m) = \Big(1-\frac{2}{n}\Big)f(m).$$

Indeed, $\mu_{ij}$ with 2 ≤ i < j ≤ n doesn’t move the first coordinate and

$$\mu_{1j}f(m) = \int_0^{2\pi}(m_{11}\cos\theta - m_{1j}\sin\theta)^k\,d\theta = 0.$$

□
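Both computations in this subsection are easy to confirm numerically. The Python sketch below builds the diagonal of the average in (3.9) from the description of $\widehat{\mu}_{ij}(\rho)$ given in the text, and approximates the angular integral in the proof of Proposition 3.3 by an equally spaced Riemann sum. The function names and the sample values 0.3, 0.7 are illustrative assumptions.

```python
from math import comb, cos, pi, sin

def q_hat_diagonal(n):
    """Diagonal of (1/C(n,2)) * sum_{i<j} mu_hat_ij(rho).

    Uses the description in the text: mu_hat_ij(rho) is diagonal with zeros
    at positions (i,i), (j,j) and ones elsewhere.
    """
    diag = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k != i and k != j:
                    diag[k] += 1
    return [d / comb(n, 2) for d in diag]

def odd_power_average(m11, m1j, k, steps=2000):
    """Riemann sum for (1/2pi) * int_0^{2pi} (m11 cos t - m1j sin t)^k dt."""
    h = 2 * pi / steps
    return sum((m11 * cos(i * h) - m1j * sin(i * h)) ** k for i in range(steps)) / steps

print(q_hat_diagonal(5))               # every entry should be close to 1 - 2/5
print(odd_power_average(0.3, 0.7, 5))  # should be numerically zero for odd k
```

For even k the same average is strictly positive, which is why the proposition is restricted to odd powers.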

By similar fooling around with test functions we can find eigenvalues of the form $1 - \frac{c}{n}$ with infinite multiplicity and c smaller than 2. Marc Kac conjectured, and it has now been proved by D. Maslin (1999) and, independently, by E. Janvresse (1999), that all eigenvalues are smaller than $1 - \frac{c}{n}$ for some universal c. Maslin’s result applies to the walk on SO(n) whereas Janvresse’s is restricted to the sphere.

3.4. Variations and remarks. The methods of this section are fairly robust and give similar results for a variety of measures Q on SO(n).

Variation 1. For 1 ≤ i < j ≤ n, let $\lambda_{ij}$ be a measure on SO(n) with the property that for some k and c > 0, $\lambda_{ij}^{*k} \ge c\,\mu_{ij}$ with $\mu_{ij}$ from (3.1). For example, $\lambda_{ij}$ may correspond to rotation in the (i, j) plane through an angle uniformly chosen in $[-\frac{\pi}{4}, \frac{\pi}{4}]$. Let

$$Q_\lambda = \frac{1}{\binom{n}{2}}\sum_{i<j}\lambda_{ij}. \tag{3.10}$$

Theorem 3.2. The probability measure $Q_\lambda$ of (3.10) satisfies

$$\|Q_\lambda - U\|_{2\to 2} \le 1 - \frac{c}{60k^2n^3}.$$


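Proof. We may compare $Q_\lambda$ with Q of (3.1) using the domination and Cauchy–Schwarz:

$$\begin{aligned}
\iint (f(x)\pm f(xy))^2\,dx\,\mu_{ij}(dy) &\le \frac{1}{c}\int\!\cdots\!\int (f(x)\pm f(xy_1y_2\cdots y_k))^2\,dx\,\lambda_{ij}(dy_1)\cdots\lambda_{ij}(dy_k)\\
&\le \frac{k^2}{c}\iint (f(x)\pm f(xy))^2\,dx\,\lambda_{ij}(dy).
\end{aligned}$$

This gives

$$\langle (I\pm Q)f, f\rangle \le \frac{k^2}{c}\,\langle (I\pm Q_\lambda)f, f\rangle.$$

□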

Variation 2. For 2 ≤ i ≤ n, let $\mu_i$ correspond to uniform rotation in coordinates (i − 1, i). Let

$$Q = \frac{1}{n-1}\sum_{i=2}^{n}\mu_i. \tag{3.11}$$

Thus Q corresponds to uniform rotation in pairwise adjacent coordinates. This might be an appropriate model for energy exchange of n particles confined to a line. The following theorem gives a bound on the spectral gap of Q.

Theorem 3.3. The probability measure Q of (3.11) satisfies

$$\|Q - U\|_{2\to 2} \le 1 - \frac{3}{16n^3}.$$

Proof. Use the factorization of Corollary 2.2 without further symmetrization to compare with the measure $\widetilde{Q}$ of (3.4). The argument for Proposition 3.1 gives

$$\begin{aligned}
\langle (I-\widetilde{Q})f, f\rangle &\le (4n-5)\Big[\int (f(x)-f(xR))^2\,dx + 4\sum_{i=2}^{n}\int (f(x)-f(xg))^2\,dx\,\nu_i(dg)\Big]\\
&\le 5(4n-5)(n-1)\,\langle (I-Q)f, f\rangle,
\end{aligned}$$

where $\nu_i$ is as in Proposition 2.1. Thus $\langle (I-\widetilde{Q})f, f\rangle \le 20n^2\,\langle (I-Q)f, f\rangle$. Now, Lemma 3.2 yields, for any $f \in L^2_0$,

$$\|f\|_2^2 \le \frac{16}{3}n^3\,\langle (I-Q)f, f\rangle.$$

An argument similar to that used in Proposition 3.2 shows

$$\|f\|_2^2 \le \frac{16}{3}n^3\,\langle (I+Q)f, f\rangle. \tag{3.12}$$

These results combine to prove the claim. □


Remarks. (1) We believe that the gap estimate c/n³ from Theorem 3.3 is sharp: Kac’s walk is somewhat analogous to random transposition on the symmetric group $S_n$, whereas the variant of Theorem 3.3 is analogous to random adjacent transposition. The spectral gap of random transposition is of order 1/n whereas that of random adjacent transposition is of order 1/n³ (see [7] and the references therein).

(2) In the arguments for Theorems 3.1–3.3 it is possible to avoid the use of character theory but get a slightly worse bound. For example, consider Q defined at (3.1). Use the factorization of Proposition 2.1 to represent a uniform rotation as a product of $\binom{n}{2}$ rotations. Using Cauchy–Schwarz and then Lemma 3.1 along with symmetrization gives a bound of the form

$$\|f\|_2^2 \le 2\binom{n}{2}^2\,\langle (I-Q)f, f\rangle.$$

The comparison with $\widetilde{Q}$ and Rosenthal’s sharp bounds on the eigenvalues of $\widetilde{Q}$ improve this by a factor of n.

4. Rates of Weak Convergence

4.1. Introduction. This section develops bounds for the rate of convergence to stationarity of a random walk generated by a probability Q on a compact group G. The bounds use the second eigenvalue and some geometric information about volume growth. Section 4.2 gives bounds for the dual bounded Lipschitz metric on probabilities. Section 4.3 gives bounds for discrepancy. Section 4.4 specializes the bounds to SO(n) and the n-sphere. The main results are summarized in Corollary 4.1 and Theorems 4.2, 4.3. We hope that this material may be more generally useful.

In the remainder of this introduction we set out our notation. Let G be a compact metrizable group with normalized Haar measure dg. Let H be a closed subgroup and X = G/H = {gH : g ∈ G} be the associated homogeneous space (G acts on the left of X). For example, if G = SO(n + 1) and H = {id} (resp. H = SO(n)), then X = SO(n + 1) (resp. X = $S^n$, the sphere in $\mathbb{R}^{n+1}$). Consider a symmetric measure Q on G (so $Q(A) = Q(A^{-1})$ for Borel sets A). Define

$$Qf(g) = Q * f(g) = \int_G f(v^{-1}g)\,Q(dv).$$

Hence Q is a Markov operator on G which is self-adjoint on $L^2(G)$ and commutes with right translations. Define a Markov kernel on X by setting $K(x, A) = Q(AHu^{-1})$ if x = uH, A = AH. Let $K^\ell(x, dy) = K^\ell_x(dy)$ denote the distribution after ℓ steps. Recall that there exists on X a unique G-invariant probability measure m(dx) = dx such that

$$\int_G f(g)\,dg = \int_X\int_H f(gh)\,dh\,dx$$

for any continuous function f (dh denotes the normalized Haar measure on H). It follows that, if x = gH and f(u) = f(uH), we have

$$Qf(g) = Kf(x) = \int_G f(v^{-1}x)\,Q(dv)$$

for any bounded measurable function f : X → R. The symmetry assumption on Q implies that K is reversible with respect to m. We will work on $L^2 = L^2(X, m)$, on which K is self-adjoint. Let m : f → mf denote the operator that sends any function f to the constant function $mf(x) = \int f(y)\,dy$ and set

$$\beta = \sup_{f\in L^2}\Big\{\frac{\|(K-m)f\|_2}{\|f\|_2}\Big\},$$

where $\|f\|_2^2 = \int_X |f(y)|^2\,dy$. Assume that X carries a G left-invariant distance $d = d_X$. Let B(x, r) ⊂ X denote the corresponding balls. Let ρ be the diameter of X.

4.2. Bounded Lipschitz functions. Define the bounded-Lipschitz norm of a function f by

$$\|f\|_{BL} = \|f\|_\infty + L(f) \quad\text{with}\quad L(f) = \sup_{x\ne y}\frac{|f(x)-f(y)|}{d(x,y)}.$$

The functions with $\|f\|_{BL} < \infty$ form a Banach algebra carefully discussed in Dudley (1976). Consider the volume growth function s → V(s) = m(B(e, s)), where e is a base point on X. By invariance of d and of the measure m, the volume V(r) does not depend on the choice of e. The volume growth functions for SO(n) and $S^n$ are determined in Sect. 4.4 below.

Theorem 4.1. Assume that there are positive c and n such that $V(r) \ge c(r/\rho)^n$ for 0 < r ≤ ρ. Then, the Markov chain $K^\ell$ on X defined in Sect. 4.1 satisfies

$$\|(K^\ell - m)f\|_\infty \le 3\rho\,c^{-1/(2+n)}\beta^{2\ell/(n+2)}L(f).$$

This theorem is proved by a sequence of lemmas. First observe that $\|(K^\ell - m)f\|_\infty = \|(K^\ell - m)(f-a)\|_\infty$ for any constant a. It follows that we may assume that f changes sign on X. Now, if f changes sign, $\|f\|_\infty \le \rho L(f)$. Set $\chi_r(x, y) = V(r)^{-1}1_{B(x,r)}(y)$ and

$$f_r(x) = \chi_r f(x) = \int_X f(y)\chi_r(x,y)\,dy = V(r)^{-1}\int_{B(x,r)} f(y)\,dy.$$

Lemma 4.1. For any Lipschitz function f, $\|f - f_r\|_\infty \le rL(f)$.

Proof.

$$\Big|f(x) - \int \chi_r(x,y)f(y)\,dy\Big| \le \int |f(x)-f(y)|\,\chi_r(x,y)\,dy \le rL(f).$$

□


Let $T_g : f \to T_g f$ be defined by $T_g f(x) = f(gx)$ for any function f. Since the distance d is invariant under the left action of G, we have gB(x, r) = B(gx, r) and

$$\begin{aligned}
T_g f_r(x) = f_r(gx) &= V(r)^{-1}\int_{B(gx,r)} f(y)\,dy = V(r)^{-1}\int_{gB(x,r)} f(y)\,dy\\
&= V(r)^{-1}\int_{B(x,r)} f(gy)\,dy = V(r)^{-1}\int_{B(x,r)} T_g f(y)\,dy.
\end{aligned}$$

Hence $T_g\chi_r = \chi_r T_g$ for all r > 0 and g ∈ G. It follows that

$$K\chi_r = \chi_r K, \tag{4.1}$$

that is, $\chi_r$ and K commute.

Lemma 4.2. For any bounded function f, $\|(K-m)^\ell f_r\|_\infty \le \beta^\ell V(r)^{-1/2}\|f\|_\infty$.

Proof. We have

$$\begin{aligned}
(K-m)^\ell f_r(x) &= (K-m)^\ell\chi_r f(x) = \chi_r (K-m)^\ell f(x)\\
&= \int_X\int_X \chi_r(x,z)(K^\ell(z,y)-1)f(y)\,dy\,dz\\
&\le \Big(\int |\chi_r(x,z)|^2\,dz\Big)^{1/2}\|(K-m)^\ell f\|_2\\
&\le V(r)^{-1/2}\beta^\ell\|f\|_2 \le V(r)^{-1/2}\beta^\ell\|f\|_\infty.
\end{aligned}$$

□

We return to the proof of the theorem. Fix f, and assume that f changes sign on X. Hence, $\|f\|_\infty \le \rho L(f)$. Write, for any r > 0,

$$\begin{aligned}
\|(K^\ell - m)f\|_\infty &\le \|(K-m)^\ell f_r\|_\infty + \|(K^\ell - m)(f-f_r)\|_\infty\\
&\le V(r)^{-1/2}\beta^\ell\|f\|_\infty + 2\|f-f_r\|_\infty\\
&\le (V(r)^{-1/2}\beta^\ell\rho + 2r)L(f).
\end{aligned}$$

Observe also that $\|(K^\ell - m)f\|_\infty \le 2\|f\|_\infty \le 2\rho L(f)$. Thus, if we assume that $V(r) \ge c(r/\rho)^n$ for 0 < r ≤ ρ, it follows that

$$\|(K^\ell - m)f\|_\infty \le \rho\,[(c(r/\rho)^n)^{-1/2}\beta^\ell + 2r/\rho]\,L(f)$$

for all r > 0. Picking r so that $(r/\rho)^{(2+n)/2} = c^{-1/2}\beta^\ell$ yields

$$\|(K^\ell - m)f\|_\infty \le 3\rho\,c^{-1/(2+n)}\beta^{2\ell/(n+2)}L(f).$$

□
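The last step balances the two terms in the bracket: with the stated choice of r, the first term equals r/ρ, so the bracket is 3r/ρ. The Python sketch below confirms that the bracket evaluated at the balanced radius reproduces the closed form; the parameter values are arbitrary illustrations, not values from the paper.

```python
from math import isclose, pi

def bracket_bound(r_over_rho, rho, c, n, beta, ell):
    """rho * [ (c (r/rho)^n)^(-1/2) beta^ell + 2 r/rho ], the bound before optimizing."""
    return rho * ((c * r_over_rho ** n) ** -0.5 * beta ** ell + 2 * r_over_rho)

def balanced_radius(c, n, beta, ell):
    """Solve (r/rho)^((2+n)/2) = c^(-1/2) beta^ell for r/rho."""
    return (c ** -0.5 * beta ** ell) ** (2 / (n + 2))

def closed_form(rho, c, n, beta, ell):
    """3 rho c^(-1/(2+n)) beta^(2 ell/(n+2)), the constant in Theorem 4.1."""
    return 3 * rho * c ** (-1 / (n + 2)) * beta ** (2 * ell / (n + 2))

# Illustrative parameters (assumptions): a 3-dimensional space of diameter pi.
rho, c, n, beta, ell = pi, 1.0, 3, 0.9, 50
r = balanced_radius(c, n, beta, ell)
print(isclose(bracket_bound(r, rho, c, n, beta, ell), closed_form(rho, c, n, beta, ell)))
```

The volume hypothesis is only assumed for r ≤ ρ, so the balanced radius is admissible when $c^{-1/2}\beta^\ell \le 1$; otherwise the trivial bound $2\rho L(f)$ noted in the proof already suffices.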

We end this section by stating a version of Theorem 4.1 in terms of the dual bounded Lipschitz distance. Following Dudley (1966), define the dual bounded Lipschitz distance D∗(µ, ν) between two probability measures µ and ν by

$$D^*(\mu,\nu) = \sup_{\|f\|_\infty + L(f)\le 1}|\mu(f)-\nu(f)|.$$

Dudley [11–14] shows that D∗ metrizes weak-∗ convergence of probability measures on X. Further, if the Prohorov metric on probability measures is defined by

$$R(\mu,\nu) = \inf\{\varepsilon : \mu(A) \le \nu(A^\varepsilon) + \varepsilon \text{ for all Borel } A\}, \qquad A^\varepsilon = \{y \in G : \exists\, x \in A : d(x,y) < \varepsilon\},$$

then R(µ, ν) ≤ 2D∗(µ, ν). Corollary 4.1 gives a bound for the rate of convergence in D∗ and R distance in the presence of a bound on the spectral gap β.

Corollary 4.1. Assume that $V(r) \ge c(r/\rho)^n$ for 0 < r ≤ ρ. Then, for every x and ℓ, the Markov chain $K^\ell$ on X defined in Sect. 4.1 satisfies

$$D^*(K^\ell_x, m) \le 3\rho\,c^{-1/(2+n)}\beta^{2\ell/(n+2)}.$$

4.3. Discrepancy. Consider now the discrepancy distance associated to the metric d, defined by

$$D(\mu,\nu) = \sup_{x\in X,\ r>0}|\mu(B(x,r)) - \nu(B(x,r))|.$$

Discrepancy is a standard measure of the rate of convergence in the metric theory of numbers. Kuipers and Niederreiter (1974) is a book-length treatment of techniques to bound discrepancy. Lubotzky–Phillips–Sarnak (1986) give discrepancy bounds for a random walk on the sphere. Su (1995) gives discrepancy rates for a variety of random walks on compact spaces. Some remarks comparing these results to our results appear at the end of this section.

Theorem 4.2. Assume that $V(r) \ge c(r/\rho)^n$ for 0 < r ≤ ρ and that for some C > 0, $V(r+\varepsilon) - V(r-\varepsilon) \le C\varepsilon/\rho$ for all r, ε > 0. Then, for every x and ℓ, the Markov chain $K^\ell$ defined in Sect. 4.1 satisfies

$$D(K^\ell_x, m) \le 2C^{n/(n+2)}c^{-1/(n+2)}\beta^{2\ell/(n+2)}.$$

Proof. Fix ε, r > 0 and y ∈ X. Set B = B(y, r), $\varphi = 1_B$ and $\varphi_1 = 1_{B(y,r-\varepsilon)}$, $\varphi_2 = 1_{B(y,r+\varepsilon)}$. Recall that $\chi_{\varepsilon,z}(w) = V(\varepsilon)^{-1}1_{B(z,\varepsilon)}(w)$. Viewing $\chi_\varepsilon$ as a Markov operator, we have $\chi_\varepsilon\varphi_1 \le \varphi \le \chi_\varepsilon\varphi_2$. Furthermore

$$m(|\varphi - \chi_\varepsilon\varphi_i|) \le V(r+\varepsilon) - V(r-\varepsilon) \le C\varepsilon/\rho.$$

We consider two cases, depending on whether $K^\ell_x(B) \ge m(B)$ or not. If $K^\ell_x(B) \ge m(B)$, then

$$\begin{aligned}
|K^\ell_x(B) - m(B)| &= K^\ell\varphi(x) - m(B)\\
&\le K^\ell\chi_\varepsilon\varphi_2(x) - m(B)\\
&\le m(|\varphi - \chi_\varepsilon\varphi_2|) + |(K^\ell - m)\chi_\varepsilon\varphi_2(x)|\\
&\le C\varepsilon/\rho + c^{-1/2}(\varepsilon/\rho)^{-n/2}\beta^\ell.
\end{aligned}$$

The last inequality uses the volume hypotheses and Lemma 4.2. In the case where $K^\ell_x(B) < m(B)$ the same inequality is obtained by using $\varphi_1$ instead of $\varphi_2$ in the argument. Hence

$$|K^\ell_x(B) - m(B)| \le C\varepsilon/\rho + c^{-1/2}(\varepsilon/\rho)^{-n/2}\beta^\ell.$$

For $(\varepsilon/\rho)^{(n+2)/2} = C^{-1}c^{-1/2}\beta^\ell$ this yields

$$D(K^\ell_x, m) \le 2C^{n/(n+2)}c^{-1/(n+2)}\beta^{2\ell/(n+2)}.$$

□
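Read as a statement about running time, the bound of Theorem 4.2 says how many steps ℓ suffice to push the discrepancy below a target. The Python sketch below inverts the bound for ℓ; the constants passed in the example are placeholders chosen for illustration, not constants computed in the paper.

```python
from math import ceil, log

def discrepancy_bound(C, c, n, beta, ell):
    """Right-hand side of Theorem 4.2: 2 C^(n/(n+2)) c^(-1/(n+2)) beta^(2 ell/(n+2))."""
    return 2 * C ** (n / (n + 2)) * c ** (-1 / (n + 2)) * beta ** (2 * ell / (n + 2))

def steps_needed(C, c, n, beta, target):
    """Smallest ell (up to floating-point rounding) with discrepancy_bound <= target."""
    prefactor = 2 * C ** (n / (n + 2)) * c ** (-1 / (n + 2))
    if prefactor <= target:
        return 0
    ell = ceil((n + 2) / 2 * log(prefactor / target) / log(1 / beta))
    # Guard against rounding at the boundary.
    while discrepancy_bound(C, c, n, beta, ell) > target:
        ell += 1
    return ell

# Hypothetical constants: C = 10, c = 1, dimension n = 2, beta = 0.9.
ell = steps_needed(10.0, 1.0, 2, 0.9, 0.01)
print(ell, discrepancy_bound(10.0, 1.0, 2, 0.9, ell))
```

The factor (n + 2)/2 in `steps_needed` makes the dimension dependence of the exponent 2ℓ/(n + 2) explicit: doubling n + 2 doubles the number of steps required for the same target.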

Remark. Su (1995) analyzes the simple random walk on the circle $S^1$ taking steps ±α for irrational α. He bounds the rate of convergence to stationarity, giving results that depend on the degree of irrationality of α. The bounds use standard tools from uniform distribution mod 1: LeVeque’s inequality and the Erdős–Turán bound [25]. In the notation of this section, LeVeque’s inequality on $S^1$ gives $D(K^\ell_x, m) \le C\beta^{2\ell/3}$ for a universal constant C. This also follows from Theorem 4.2. The Erdős–Turán bound gives $D(K^\ell_x, m) \le C(\frac{1}{h} + \beta^\ell\log h)$ for any positive integer h. Optimizing in h gives as a slight improvement the bound $C_1\ell\beta^\ell$. On the sphere $S^2$, Lubotzky–Phillips–Sarnak (1986) show that discrepancy satisfies $D(K^\ell_x, m) \le C\beta^{2\ell/3}$ for a universal C. Theorem 4.2 yields $D(K^\ell_x, m) \le C\beta^{\ell/2}$. Their improved estimate leans heavily on special features available for dimension 2.

4.4. Examples. To treat examples we will use the following results on volume growth for Riemannian manifolds. Gallot, Hulin, and LaFontaine (1990) is a useful reference for this material.

Lemma 4.3 (Bishop, Gromov; [15], p. 133). Let M be a compact Riemannian manifold of dimension n endowed with its distance function and its canonical Riemannian measure. Let $\mathcal{V}(x, r)$ denote the volume of the ball of radius r > 0 around x ∈ M. Let ρ be the diameter of M. Assume that the Ricci curvature of M is nonnegative. Then

$$\frac{\mathcal{V}(x,r)}{\mathcal{V}(x,t)} \le \frac{r^n}{t^n} \quad\text{and}\quad \frac{\mathcal{V}(x,r)-\mathcal{V}(x,s)}{\mathcal{V}(x,t)} \le \frac{r^n-s^n}{t^n}$$

for all 0 < t ≤ s ≤ r < +∞. In particular, if $\mathcal{V} = \mathcal{V}(\rho)$ is the volume of M, and $V(x, r) = \mathcal{V}(x, r)/\mathcal{V}$, then $V(x, r) \ge (r/\rho)^n$ for all 0 < r ≤ ρ. Furthermore,

$$V(x, r+\varepsilon) - V(x, r-\varepsilon) \le \frac{\omega_n n\rho^n}{\mathcal{V}}(\varepsilon/\rho)$$

for all r, ε > 0, where $\omega_n$ is the volume of the unit ball in $\mathbb{R}^n$.

Example 1. G = SO(n + 1), H = SO(n), X = $S^n \subset \mathbb{R}^{n+1}$ equipped with its canonical Riemannian distance. Let

$$\sigma_n = \mathrm{Vol}(S^n) = \frac{2\pi^{(n+1)/2}}{\Gamma((n+1)/2)}$$

be the volume of the unit sphere in $\mathbb{R}^{n+1}$. Recall that

$$\omega_n = \frac{\sigma_{n-1}}{n} = \frac{2\pi^{n/2}}{n\,\Gamma(n/2)}.$$

The diameter of $S^n$ is ρ = π. Corollary 4.1 and Theorem 4.2 yield

Theorem 4.3. For the sphere $S^n$, the bounded Lipschitz distance D∗ satisfies

$$D^*(K^\ell_x, m) \le 3\pi\beta^{2\ell/(n+2)},$$

whereas the discrepancy D satisfies

$$D(K^\ell_x, m) \le \frac{2\pi^n\,\Gamma((n+1)/2)}{\Gamma(n/2)}\,\beta^{2\ell/(n+2)}.$$
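The volume formulas in Example 1 can be sanity-checked against familiar low-dimensional values. The Python sketch below also evaluates the constant $C = \omega_n n\rho^n/\mathcal{V}$ of Lemma 4.3 with ρ = π and $\mathcal{V} = \sigma_n$, and checks that $2C^{n/(n+2)}$ stays below the prefactor stated in Theorem 4.3. That last comparison is one reading of how the prefactor arises and is an assumption, since the text does not spell out the step.

```python
from math import gamma, isclose, pi

def sphere_area(n):
    """sigma_n = Vol(S^n) = 2 pi^((n+1)/2) / Gamma((n+1)/2)."""
    return 2 * pi ** ((n + 1) / 2) / gamma((n + 1) / 2)

def ball_volume(n):
    """omega_n = sigma_{n-1} / n, the volume of the unit ball in R^n."""
    return sphere_area(n - 1) / n

def lemma_constant(n):
    """C = omega_n * n * rho^n / Vol(S^n) with rho = pi (constant from Lemma 4.3)."""
    return ball_volume(n) * n * pi ** n / sphere_area(n)

def theorem_prefactor(n):
    """2 pi^n Gamma((n+1)/2) / Gamma(n/2), the prefactor in Theorem 4.3."""
    return 2 * pi ** n * gamma((n + 1) / 2) / gamma(n / 2)

print(isclose(sphere_area(1), 2 * pi), isclose(sphere_area(2), 4 * pi))
print(isclose(ball_volume(2), pi), isclose(ball_volume(3), 4 * pi / 3))
print(all(2 * lemma_constant(n) ** (n / (n + 2)) <= theorem_prefactor(n) for n in range(1, 21)))
```

The familiar values (circle length 2π, sphere area 4π, disk area π, ball volume 4π/3) confirm that Γ is the symbol intended in the formulas above.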

Example 2. X = G = SO(n), H = {id}. We need to fix an invariant metric on SO(n) and compute the corresponding Riemannian volume $\mathcal{V}$ and diameter ρ. The dimension of SO(n) is $N = \binom{n}{2}$. Up to scaling, there exists a unique bi-invariant metric. Because of bi-invariance, we need only specify the distance to the identity, and d(m, id) only depends on the eigenvalues $\theta_i$ of m. For θ ∈ [0, 2π], let $|\theta|_1 = \min(|\theta|, |\theta - 2\pi|)$. Then $d^2(m, \mathrm{id})$ is proportional to $\sum |\theta_i|_1^2$. The Ricci curvature of any bi-invariant metric on a compact Lie group is non-negative (in fact the sectional curvature is non-negative). See, e.g., Proposition 3.17 in [15].

To fix the scaling constant, recall that the Lie algebra L of G = SO(n) can be identified with the space of real skew-symmetric matrices with the exponential map given by exp : L → G, M → exp(M) = $e^M$. Let $\{E_{i,j} : 1 \le i < j \le n\}$ be the natural basis of L. Here $E_{i,j}$ denotes the matrix with all entries equal to zero except the (i, j) and (j, i) entries, which are respectively equal to 1 and −1. The usual Euclidean structure

$$\langle M, N\rangle = \sum_{i<j} M_{i,j}N_{i,j},$$

for which the above basis is orthonormal, gives rise to a bi-invariant Riemannian structure on G. For this Riemannian structure the volume form is given by

$$dg = \bigwedge_{i<j} g_j^t\,dg_i,$$

where $g = (g_{i,j})_{1\le i,j\le n}$ and $g_i$ is the column vector $g_i = (g_{\ell,i})_{1\le \ell\le n}$. The volume $V_n$ of SO(n) is then equal to (recall $\sigma_i$ from Example 1)

$$V_n = \prod_{i=1}^{n-1}\sigma_i = \frac{2^{n-1}\pi^{N/2}}{\Gamma(n/2)\Gamma((n-1)/2)\cdots\Gamma(3/2)\Gamma(1)}.$$

The diameter is equal to the diameter of a maximal torus, which is $\rho_n = \pi k^{1/2}$, where n = 2k or n = 2k + 1.


More generally, in terms of eigenvalues the distance to the identity is exactly $d^2(m, \mathrm{id}) = \sum |\theta_i|_1^2$ for this normalization. Here the constants c, C in Theorems 4.1, 4.2 and Corollary 4.1 are given by c = 1 and

$$C = \frac{4\pi^N [n/2]^{N/2}\,\Gamma(n/2)\Gamma((n-1)/2)\cdots\Gamma(3/2)\Gamma(1)}{2^n\,\Gamma(N/2)}.$$

Recall that, for t ≥ 2,

$$\sqrt{2\pi}\,(t-1)^{t-1/2}e^{-t+1}e^{(12(t-1)+1)^{-1}} \le \Gamma(t) \le \sqrt{2\pi}\,(t-1)^{t-1/2}e^{-t+1}e^{(12(t-1))^{-1}}.$$

Hence

$$\Gamma(n/2)\Gamma((n-1)/2)\cdots\Gamma(3/2)\Gamma(1) \le n(2\pi)^{n/2}[(n-2)/2]^{N/2}e^{(-N+n)/2},$$

$$\frac{\Gamma(n/2)\Gamma((n-1)/2)\cdots\Gamma(3/2)\Gamma(1)}{\Gamma(N/2)} \le \frac{n(N/2)^{1/2}e^{1+n/2}(2\pi)^{(n-1)/2}}{(n/2)^{N/2}}.$$

It follows that $C \le 10^{N+1}$ if n ≥ 3. Applying these estimates we have

Theorem 4.4. For the special orthogonal group SO(n), n ≥ 3, the bounded Lipschitz distance D∗ satisfies

$$D^*(K^\ell_x, m) \le 3\pi(n/2)^{1/2}\beta^{2\ell/(N+2)},$$

whereas the discrepancy D satisfies

$$D(K^\ell_x, m) \le 2\times 10^{N+1}\beta^{2\ell/(N+2)},$$

where N = n(n − 1)/2.

5. Improved Bounds for Compact Groups

This last section shows how to take full advantage of the comparison with a known random walk in controlling the bounded Lipschitz distance or discrepancy on a compact group. In the case of Kac’s walk on SO(n), the bounds obtained below improve by a factor of n those of Theorem 4.4. The trick is to refine the comparison technique used in Sect. 4 to give bounds on all the eigenvalues (not just the spectral gap) and then use this additional information. Some care is needed. In our main example, the measure Q has eigenvalues of infinite multiplicity while $\widetilde{Q}$ has all eigenvalues of finite multiplicity.

Let G be a compact metrizable group equipped with its normalized Haar measure. Let Q be a symmetric probability measure. As in Sect. 4.1, we also view Q as a convolution operator. Q is then a self-adjoint operator on $L^2(G)$. Given a finite dimensional subspace H of $L^2(G)$, define

$$\beta(H) = \inf\{\langle Qf, f\rangle : f \in H,\ \|f\|_2 = 1\}$$

and set

$$\beta_i = \sup\{\beta(H) : \dim H = i+1\}, \quad i = 0, 1, 2, \ldots.$$

The $\beta_i$’s form a non-increasing sequence, $\beta_0 = 1$, and $\lim_{i\to\infty}\beta_i = \beta$ is the top of the essential spectrum of Q. Note that this limit exists because the $\beta_i$’s are bounded below by −1.


We now repeat this construction for the negative eigenvalues by starting from the opposite end of the spectrum. Namely, define

$$\alpha(H) = \sup\{\langle Qf, f\rangle : f \in H,\ \|f\|_2 = 1\}$$

and set

$$\alpha_i = \inf\{\alpha(H) : \dim H = i\}, \quad i = 1, 2, \ldots.$$

This time the $\alpha_i$’s form a non-decreasing sequence and we set $\lim_{i\to\infty}\alpha_i = \alpha$. This is the bottom of the essential spectrum of Q. Note that α ≤ β and that α = β = 0 if Q is a compact operator (e.g., when the measure Q has an $L^2$ density). Set

$$\gamma = \max\{-\alpha, \beta\}. \tag{5.1}$$

For any small ε ≥ 0, set

$$\Sigma_\varepsilon(\ell) = \sum_{i:\ i>0,\ \beta_i>\gamma+\varepsilon}|\beta_i|^{2\ell} + \sum_{i:\ \alpha_i<-\gamma-\varepsilon}|\alpha_i|^{2\ell}. \tag{5.2}$$
In words, $\Sigma_\varepsilon(\ell)$ is the sum of the 2ℓ-th powers of the eigenvalues lying outside the interval [−γ − ε, γ + ε] (excluding $\beta_0 = 1$).

Remark. For ε > 0, $\Sigma_\varepsilon(\ell)$ is always finite: it is a finite sum. The quantity $\Sigma_0(\ell)$ is infinite unless one of two cases arises: (1) α = β = 0 and $Q^{*\ell}$ has a density in $L^2(G)$, i.e., $Q^{2\ell}$ is trace class, in which case $\Sigma_0(\ell)^{1/2} = \|Q^\ell\|_{2\to\infty}$ is also the $L^2$-norm of the density of $Q^{*\ell}$ w.r.t. Haar measure. (2) γ ≠ 0 and there are only finitely many eigenvalues lying outside the interval [−γ, γ]. In this second case, $\Sigma_0(\ell)$ is a finite sum and if γ = β (resp. −γ = α), β (resp. α) is an eigenvalue of infinite multiplicity (if β = −α = γ at least one of them is an eigenvalue of infinite multiplicity, possibly both).

Keeping the notation of Sect. 4.2, we now can state a variant of Lemma 4.2.

Lemma 5.1. For any ε ≥ 0 and for any bounded function f,

$$\|(Q-m)^\ell f_r\|_\infty \le \big(\Sigma_\varepsilon(\ell)^{1/2} + (\gamma+\varepsilon)^\ell V(r)^{-1/2}\big)\|f\|_\infty.$$

Proof. It suffices to prove the result for ε > 0. The case ε = 0 then follows by passing to the limit (if $\Sigma_0(\ell)$ is infinite, the limit inequality is trivial). Fix ε > 0. Let $Q = \int_{-1}^{1}\lambda\,dE_\lambda$ be the spectral decomposition of Q. Define

$$Q_2 = \int_{[-\gamma-\varepsilon,\,\gamma+\varepsilon]}\lambda\,dE_\lambda$$

and $Q_1 = Q - Q_2$. Observe that $Q_1$ has density in $L^2(G)$ and that $Q_1$, $Q_2$ are bounded operators on $L^2(G)$ with $\|Q_2\|_{2\to 2} = \gamma + \varepsilon$. Observe also that $Q_1Q_2 = 0$. It follows that $(Q-m)^\ell = (Q_1-m)^\ell + Q_2^\ell$ for any ℓ = 1, 2, . . . . Moreover, $\|(Q_1-m)^\ell\|_{2\to\infty} = \Sigma_\varepsilon(\ell)^{1/2}$. Thus, we have

$$\begin{aligned}
(Q-m)^\ell f_r(x) &= (Q_1-m)^\ell f_r(x) + Q_2^\ell\chi_r f(x)\\
&= (Q_1-m)^\ell f_r(x) + \chi_r Q_2^\ell f(x)\\
&\le \|(Q_1-m)^\ell f_r\|_\infty + \|\chi_r\|_2\|Q_2^\ell f\|_2\\
&\le \|(Q_1-m)^\ell\|_{2\to\infty}\|f_r\|_2 + V(r)^{-1/2}\|Q_2^\ell\|_{2\to 2}\|f\|_2\\
&\le \big(\Sigma_\varepsilon(\ell)^{1/2} + V(r)^{-1/2}(\gamma+\varepsilon)^\ell\big)\|f\|_\infty.
\end{aligned}$$

Here we have used the obvious fact that $\|f_r\|_2 \le \|f\|_\infty$. □

Using this lemma, we obtain some improved versions of Theorems 4.1, 4.2.

Theorem 5.1. Under the assumptions of Theorem 4.1, the bounded Lipschitz distance is bounded by

$$D^*(Q^\ell, m) \le \Sigma_\varepsilon(\ell)^{1/2} + 3\rho\,c^{-1/(2+n)}(\gamma+\varepsilon)^{2\ell/(n+2)}$$

for all ε ≥ 0. Similarly, under the assumptions of Theorem 4.2, the discrepancy distance is bounded by

$$D(K^\ell_x, m) \le \Sigma_\varepsilon(\ell)^{1/2} + 2C^{n/(n+2)}c^{-1/(n+2)}(\gamma+\varepsilon)^{2\ell/(n+2)}$$

for all ε ≥ 0.

Let us now illustrate how these results can be used for Kac’s walk on SO(n). Keep the notation of Sect. 3.2. Let $Q$, $\widetilde{Q}$ be the two probability measures on SO(n) defined at (3.1), (3.4). From the results of [36], all the eigenvalues of $\widetilde{Q}$ have finite multiplicity and the only accumulation point in the spectrum of $\widetilde{Q}$ is 0. In fact, $\widetilde{Q}^{*\ell}$ has a bounded density for ℓ large enough. Moreover, it is proved in [36] that there exist B and b > 0 such that

$$\widetilde{\Sigma}_0(\ell) \le Be^{-bt} \tag{5.3}$$

for all t > 0 and ℓ such that

$$\ell \ge \frac{1}{8}n\log n + tn.$$

By Proposition 3.1 and the minimax principle we have, with obvious notation,

$$\beta_i \le 1 - \frac{1-\widetilde{\beta}_i}{16n^2} \tag{5.4}$$


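for i = 0, 1, . . . . From the definition of β and this inequality, it follows that

$$\beta \le 1 - \frac{1}{16n^2}.$$

Similarly, Proposition 3.2 yields

$$\alpha_i \ge -1 + \frac{1+\widetilde{\alpha}_i}{16n^2} \tag{5.5}$$

for i = 1, 2, . . . , and

$$\alpha \ge -1 + \frac{1}{16n^2}.$$

This yields the following lemma.

Lemma 5.2. Referring to the measures $Q$, $\widetilde{Q}$ on SO(n) defined at (3.1), (3.4), we have

$$\gamma \le 1 - \frac{1}{16n^2}, \qquad \widetilde{\gamma} = 0.$$

Moreover

$$\Sigma_\delta(\ell) \le \widetilde{\Sigma}_0([\ell/32n^2]) \quad\text{for}\quad \delta = 1 - \frac{1}{32n^2} - \gamma.$$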

Proof. Only the second inequality needs a further argument. The sum $\Sigma_\delta(\ell)$ only involves eigenvalues that fall outside $[-1 + (1/32n^2),\ 1 - (1/32n^2)]$. By (5.4)–(5.5), the corresponding eigenvalues of $\widetilde{Q}$ must be outside [−1/2, 1/2]. Thus,

$$\Sigma_\delta(\ell) \le \sum_{i:\ \widetilde{\beta}_i\ge 1/2}\Big(1 - \frac{1-\widetilde{\beta}_i}{16n^2}\Big)^{2\ell} + \sum_{i:\ \widetilde{\alpha}_i\le -1/2}\Big(1 - \frac{1-|\widetilde{\alpha}_i|}{16n^2}\Big)^{2\ell}.$$

By the elementary inequalities $1 - x \le e^{-x}$ for all x ∈ (0, ∞) and $e^{-2x} \le 1 - x$ for all x ∈ (0, 1/2), we get

$$\Sigma_\delta(\ell) \le \widetilde{\Sigma}_{1/2}([\ell/32n^2]).$$

□

Now, for the bounded Lipschitz distance D∗, Lemma 5.2, (5.3) and Theorem 5.1 yield (in applying Theorem 5.1, recall that SO(n) has dimension N = n(n − 1)/2, not n)

$$D^*(Q^\ell, m) \le Be^{-bt} + 3\pi(n/2)^{1/2}e^{-\ell/(8n^3(n+1))}$$

for all t > 0 and ℓ such that $\ell \ge 2n^3\log n + 16n^3t$. It is easy to check that the dominant term is the last term and this gives convergence for ℓ of order $n^4\log n$.
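The proof of Lemma 5.2 rests on two elementary inequalities and on the termwise comparison they give: for 1/2 ≤ b ≤ 1, $(1 - (1-b)/(16n^2))^{2\ell} \le b^{2[\ell/32n^2]}$. The Python sketch below checks both on a grid; the grid values and the relative tolerance are arbitrary choices made for illustration.

```python
from math import exp

def elementary_inequalities_hold(steps=1000):
    """Check 1 - x <= exp(-x) on (0, 5] and exp(-2x) <= 1 - x on (0, 1/2)."""
    first = all(1 - x <= exp(-x) for x in (5 * i / steps for i in range(1, steps + 1)))
    second = all(exp(-2 * x) <= 1 - x for x in (0.5 * i / steps for i in range(1, steps)))
    return first and second

def termwise_comparison_holds(b, n, ell):
    """Check (1 - (1-b)/(16 n^2))^(2 ell) <= b^(2 * floor(ell / (32 n^2)))."""
    lhs = (1 - (1 - b) / (16 * n * n)) ** (2 * ell)
    rhs = b ** (2 * (ell // (32 * n * n)))
    return lhs <= rhs * (1 + 1e-9)  # small relative slack for floating point

print(elementary_inequalities_hold())
print(all(
    termwise_comparison_holds(b, n, ell)
    for b in (0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
    for n in (3, 4, 5, 6)
    for ell in (1, 100, 1000, 10000, 100000)
))
```

The comparison follows by chaining the two inequalities: the left side is at most $e^{-(1-b)\ell/(8n^2)}$, and $b \ge e^{-2(1-b)}$ turns that exponential into a power of b.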


Theorem 5.2. For Kac’s walk on SO(n) there exists a constant B such that

$$D^*(Q^\ell, m) \le Be^{-t}$$

for all ℓ, t > 0 such that $\ell \ge 4n^4\log n + 8n^4t$, whereas the discrepancy distance satisfies

$$D(Q^\ell, m) \le B'e^{-t}$$

for all ℓ, t > 0 such that $\ell \ge 4n^6 + 8n^4t$.

Thanks to (3.12), the same result holds also for the walk Q defined at (3.11), with slightly different numerical constants. This is worth mentioning because it seems it would be hard to improve upon this result in the case of Q.

Acknowledgement. We thank Eric Carlen for telling us about Kac’s work and its development. We thank David Maslin for keeping us informed about his progress on Kac’s problem.

References

1. Aharonov, D. and Ben-Or, M. (1997): Fault tolerant quantum computation with constant error. Proc. 29th S.T.O.C., Assoc. Comp. Mach. New York, pp. 176–188
2. Bird, G. A. (1976): Molecular Gas Dynamics. Oxford: Clarendon Press
3. Carlen, E., Gabetta, E. and Toscani, G. (1997): Propagation of smoothness and the rate of exponential convergence to equilibrium for a spatially homogeneous Maxwellian gas. Preprint, Dept. of Mathematics, Georgia Tech.
4. Cercignani, C., Illner, R. and Pulvirenti, M. (1994): The mathematical theory of dilute gases. New York: Springer
5. Desvillettes, L. (1995): About the regularizing properties of the non cutoff Kac equation. Commun. Math. Phys. 168, 416–440
6. Diaconis, P. and Mallows, C. (1997): Identities for normal variables arising from the classical groups. Technical Report, Dept. of Mathematics, Cornell University
7. Diaconis, P. and Saloff-Coste, L. (1993 a): Comparison techniques for random walk on finite groups. Ann. Probab. 21, 2131–2156
8. Diaconis, P. and Saloff-Coste, L. (1993 b): Comparison theorems for reversible Markov chains. Ann. Appl. Probab. 3, 696–730
9. Diaconis, P. and Shahshahani, M. (1996): Products of random matrices as they arise in the study of random walks on groups. Contemp. Math. 50, 183–195
10. Diaconis, P. and Shahshahani, M. (1987): The subgroup algorithm for generating uniform random variables. Prob. Eng. Info. Sci. 1, 15–32
11. Dudley, R. (1966): Convergence of Baire measures. Studia Math. 27, 251–268
12. Dudley, R. (1968): Distances of probability measures and random variables. Ann. Math. Statist. 39, 1563–1572
13. Dudley, R. (1976): Probabilities and Metrics. Lecture Notes Series 45, Matematisk Institut Aarhus
14. Dudley, R. (1989): Real Analysis and Probability. Pacific Grove, CA: Wadsworth & Brooks/Cole
15. Gallot, S., Hulin, D. and LaFontaine, J. (1990): Riemannian Geometry. 2nd ed., Berlin: Springer-Verlag
16. Gottlieb, A. L. (1998): Markov Transitions and Propagation of Chaos. Ph.D. thesis, Department of Mathematics, University of California, Berkeley
17. Grünbaum, F. A. (1971): Propagation of chaos for the Boltzmann equation. Arch. Rat. Mech. and Anal. 42, 323–345
18. Grünbaum, A. (1972): Linearization for the Boltzmann equation. Trans. Amer. Math. Soc. 165, 425–499
19. Hastings, W. (1970): Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109
20. Hurwitz, A. (1897): Über die Erzeugung der Invarianten durch Integration. Nach. Gesell. Wissen, Göttingen Math-Phys Klasse, pp. 71–90; Reprinted in Hurwitz, A. (1963): Mathematische Werke, Vol. II, Basel: Birkhäuser, pp. 546–564
21. Janvresse, E. (1999): Bounds on random rotations on the sphere. Preprint
22. Kac, M. (1956): Foundations of Kinetic Theory. Proc. 3rd Berkeley Sympos., J. Neyman, ed. Vol. 3, pp. 171–197
23. Kac, M. (1959): Probability and Related Topics in Physical Science. New York: Wiley Interscience
24. Kloss, B. (1959): Limiting distributions on bicompact topological groups. Th. Prob. Appl. 4, 237–270
25. Kuipers, L. and Niederreiter, H. (1974): Uniform Distribution of Sequences. New York: Wiley
26. Lubotzky, A., Phillips, R. and Sarnak, P. (1986): Hecke operators and distributing points on the sphere I. Comm. Pure Appl. Math. 39, Supplement 1, S149–S186
27. Maslin, D. (1999): The eigenvalues of Kac’s master equation. Preprint, Department of Mathematics, Dartmouth
28. McKean, H. (1966): Speed of approach to equilibrium for Kac’s caricature of a Maxwellian gas. Arch. Rat. Mech. Anal. 2, 343–367
29. Méléard, S. (1996): Asymptotic behavior of some interacting particle systems, McKean–Vlasov and Boltzmann models. Lecture Notes in Math. 1627, New York: Springer
30. Nambu, K. (1983): Interrelations between various direct simulation methods for solving the Boltzmann equation. J. Phys. Soc. Japan 52, 3382–3388
31. Perthame, B. (1994): Introduction to the theory of random particles methods for Boltzmann equation. In: Progress on Kinetic Theory, Singapore: World Scientific
32. Porod, U. (1995): L²-lower bounds for a special class of random walks. Probab. Th. Related Fields 101, 277–289
33. Porod, U. (1996 a): The cut-off phenomenon for random reflections. Ann. Prob. 24, 74–99
34. Porod, U. (1996 b): The cut-off phenomenon for random reflections II: Complex and quaternionic cases. Probab. Th. Related Fields 104, 181–209
35. Riesz, F., Nagy, B. (1960): Functional Analysis. Ungar, New York
36. Rosenthal, J. (1994): Random rotations: Characters and random walks on SO(n). Ann. Probab. 22, 398–423
37. Rothaus, O. (1981): Diffusion on compact Riemannian manifolds and logarithmic Sobolev inequalities. J. Func. Anal. 42, 102–109
38. Saloff-Coste, L. (1997): Lectures on finite Markov chains. École d’été de St. Flour, LNM 1665, Berlin–Heidelberg–New York: Springer
39. Shor, P. (1998): Quantum computation. Proc. I.C.M., Berlin 1998, I, 467–486
40. Smith, A. and Roberts, G. (1993): Bayesian Computation via the Gibbs Sampler and Related Markov Chain Monte Carlo Methods (with discussion). J. Roy. Stat. Soc. B 55, 3–24
41. Su, F. (1995): Ph.D. Dissertation, Dept. Mathematics, Harvard University
42. Sznitman, A.-S. (1991): Topics in Propagation of Chaos. Springer Lecture Notes in Math. 1464, New York: Springer
43. Uchiyama, K. (1988): Derivation of the Boltzmann equation from particle dynamics. Hiroshima Math. J. 18, 245–297

Communicated by J. L. Lebowitz

Commun. Math. Phys. 209, 757 – 783 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

A New Invariant for σ Models

Ioannis P. Zois

Mathematical Institute, 24-29 St. Giles’, Oxford OX1 3LB, UK. E-mail: [email protected]

Received: 2 February 1999 / Accepted: 2 September 1999

To my brother Demetrios

Abstract: We introduce a new invariant for σ models (and foliations more generally) using the even pairing between K-homology and cyclic homology. We try to calculate it for the simplest case of foliations, namely principal bundles. We end up by discussing some possible physical applications including quantum gravity and M-Theory. In particular for M-Theory we propose an explicit topological Lagrangian and then using S-duality we conjecture on the existence of certain plane fields on $S^{11}$.

1. Introduction and Motivation In [37] we proposed a Lagrangian density for the topological part of the non-supersymmetric M-theory using Polyakov’s flat bundle description of non-linear σ models. The new key ingredient from the geometric point of view was “characteristic classes for flat bundles”. This idea about using characteristic classes of flat bundles came from the definition of a new invariant for Haefliger structures [38]. This invariant can be defined for any foliation in general; as far as physics is concerned however (and this includes the case of M-theory treated in [37]), we are primarily concerned with a special kind of foliations, called flat foliations of bundles. This is due to the fact that as Polyakov had noticed in [18], σ models can be thought of as flat principal bundles (see [37] for more details). Thus, hopefully, our invariant might be of some relevance whenever σ models are met in physics. We organise this paper as follows: in Sect. 2 we explain the strategy of the construction; in Sect. 3 we provide all the details; in Sect. 4 we give the invariant formula; in Sect. 5 we calculate this invariant for the simplest case of a principal bundle and in Sects. 6 and 7 we discuss some possible applications in physics.


2. Strategy

2.1. Instantons. Let us recall some facts about instantons. We would like to think of our invariant as an analogue of the instanton number for foliations. We consider a principal bundle $(P, \pi, M, G)$, where $M$ is the base manifold, assumed to be compact and 4-dimensional for brevity, $G$ is $SU(2)$ for simplicity, $P$ is the total space of the bundle and $\pi$ is the projection. Assuming we have a connection $A$ on $P$ with curvature $F$, the instanton number, ignoring constants, is simply $\int_M F \wedge F$, i.e. the second Chern number $c_2$ of the bundle $P$. We would like to think of this number slightly differently: more or less by definition, any principal bundle $P$ over $M$ defines an element (K-class) of the group $K^0(M)$ (we forget equivariant K-theory for simplicity). Using the Chern–Weil homomorphism we get the Chern classes of $P$, which belong to the cohomology groups $H^{2*}(M)$. Considering the (top dimensional) fundamental class $[M]$ of $M$ in the homology group $H_*(M)$ and taking the pairing between homology and cohomology, which in this case is just integration over $M$, we get the instanton number. We can consider the Chern–Weil homomorphism $K^0(M) \to H^{2*}(M)$ as a "black box" and forget all about cohomology for the moment; then the instanton number is the result of a pairing between K-theory and (singular) homology $H_*(M)$.

Since we are dealing with foliations (more accurately with the space of leaves of foliations), which provide a good example of non-commutative topological spaces, our construction imitates the above picture: to each foliation we can associate a homology class which will be the analogue of the fundamental class $[M]$ above; this class, however, belongs to an appropriate homology theory called cyclic homology, and it is called the transverse fundamental class of the foliation. Moreover one can also construct a class in K-homology, being the analogue of K-theory for our purpose. Then we use a formula for pairings between cyclic homology and K-homology to get our result.
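For orientation, the pairing just described can be written out with its constants restored. The following is only a sketch under one common convention, which is an assumption and not taken from the text: the curvature $F$ is $su(2)$-valued (anti-Hermitian, trace-free) and the total Chern class is $\det(1 + \tfrac{i}{2\pi}F)$. Other sign and normalisation conventions are in use.

```latex
% Sketch under the stated (assumed) convention: F anti-Hermitian with tr F = 0.
\[
  c(P) \;=\; \det\!\Bigl(1 + \tfrac{i}{2\pi}\,F\Bigr),
  \qquad
  c_2(P) \;=\; \frac{1}{8\pi^2}\,\operatorname{tr}(F \wedge F),
\]
\[
  k \;=\; \bigl\langle c_2(P),\,[M] \bigr\rangle
    \;=\; \frac{1}{8\pi^2}\int_M \operatorname{tr}(F \wedge F).
\]
```

Here $k$ is the instanton number: the cohomology class $c_2(P)$ comes from the K-class of $P$ via Chern–Weil theory, and evaluating it on $[M]$ is the integration over $M$ mentioned above.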

2.2. Non-commutative topology. In this subsection we would like to mention briefly what non-commutative topology is about. As its name suggests, it is related to non-commutative geometry, which appeared in the physics literature some years ago mainly through the so-called "quantised calculus". The starting point of non-commutative topology is the fact that, given any compact Hausdorff space $X$, the commutative $C^*$-algebra $C(X)$ of continuous complex valued functions on $X$ captures all the topological information of the space $X$ itself; in fact $X$ and $C(X)$ are completely equivalent, and each can be uniquely constructed from the other. Conversely, given any commutative (unital) $C^*$-algebra $A$, there exists a compact Hausdorff space $X$ (called the spectrum of $A$) "realising" $A$. Realising means that the commutative algebra $C(X)$ of continuous complex valued functions on $X$ is essentially the algebra $A$. In mathematical terminology one says that the categories of compact Hausdorff spaces and commutative unital $C^*$-algebras are (contravariantly) equivalent. This is the so-called Gelfand theorem.

We know however that there exist non-commutative $C^*$-algebras as well. The natural question then is whether one can find a "topological" realisation for them just as for the commutative ones. We are looking for a non-commutative analogue of Gelfand's theorem. This question is not fully answered in mathematics; it is related to the famous Baum–Connes conjecture. There are some things already known in mathematics and these are related to foliations. This is what we shall be using extensively in this paper. The appropriate framework is that of K-theory and various homology theories.
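The commutative half of this dictionary can be checked by hand on a finite space. The sketch below is a toy model and not part of the construction of this paper: it assumes a hypothetical three-point space $X$, represents $C(X)$ as tuples with pointwise operations, and tests on a few sample functions that the evaluation functionals are multiplicative (they are the characters, i.e. the points of the spectrum), while a functional such as the average is not.

```python
from itertools import product

# Hypothetical toy model: X is a finite (hence compact Hausdorff) space with
# three points, and C(X) is the algebra of complex tuples with pointwise product.
X = ("a", "b", "c")


def mul(f, g):
    """Pointwise product in C(X)."""
    return tuple(fi * gi for fi, gi in zip(f, g))


def evaluation(i):
    """The functional f -> f(x_i); these are the characters of C(X)."""
    return lambda f: f[i]


def average(f):
    """A linear functional on C(X) that is not a character."""
    return sum(f) / len(f)


def is_multiplicative(phi, samples):
    """Check phi(fg) = phi(f) phi(g) on all pairs of sample functions."""
    return all(
        abs(phi(mul(f, g)) - phi(f) * phi(g)) < 1e-12
        for f, g in product(samples, repeat=2)
    )


SAMPLES = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (2, -1, 3j), (1, 1, 1)]

# Indices of the points of X recovered as multiplicative evaluation functionals.
spectrum = [i for i in range(len(X)) if is_multiplicative(evaluation(i), SAMPLES)]
```

On this example `spectrum` lists all three points, so the space is recovered from the algebra; the check is of course only a finite sanity test, not a proof of Gelfand's theorem.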


During the '70s mathematicians (Baum, Douglas, Kasparov and others) developed a K-theory for arbitrary $C^*$-algebras (commutative or not), and it is a well-known theorem due to Serre and Swan that in the commutative case this K-theory reduces to Atiyah's original topological K-theory. Moreover in the '80s mathematicians (Connes, Loday, Quillen and others) developed a homology theory called cyclic homology for arbitrary algebras, which again in the commutative case gives in the limit the usual simplicial homology. So non-commutative topology, in terms of K-theory and various homology theories, gives a generalisation of ordinary topology through Gelfand's theorem. A good example of a non-commutative topological space is the space of leaves of a foliation (see below for definitions). In general, quotients of ordinary topological spaces by discrete groups give non-commutative (abbreviated to "nc" in the sequel) spaces. Good textbooks are [15] and [42] for an introduction to K-theory of $C^*$-algebras and cyclic homology respectively.

2.3. The invariant. In order to construct our invariant for any foliation [38], we use some ideas from non-commutative geometry (ncg), [20,22,23,27]. The strategy is as follows: given any foliation $F$ of a manifold $V$, namely an integrable subbundle $F$ of $TV$, one can associate to it another manifold $\Gamma(F)$, called the graph (or holonomy groupoid) of the foliation, introduced in [29]. This is of dimension $\dim V + \dim F$. Using the complex line bundle $\Omega^{1/2}(\Gamma(F))$ of 1/2-densities defined on $\Gamma(F)$, we consider the set (actually vector space) of smooth sections of this line bundle equipped with a $*$ product, thus obtaining an algebra. We then complete this algebra in a "minimal" manner (in standard $C^*$-algebra theory this is called the reduced $C^*$-algebra completion); thus we obtain a $C^*$-algebra denoted $C^*(F)$ which is naturally associated to our original foliation $F$.

From now on one can forget the original foliation $F$ of $V$ altogether and concentrate on its corresponding $C^*$-algebra $C^*(F)$. We are interested in the $K_0$ group of $C^*(F)$ and in its cyclic homology groups. If we pick a metric $g$ on the transverse bundle $t$ of $F$ we can construct in a natural way a $C^*(F)$-module $E(F)$, thus obtaining a class $[E(F)]$ in $K_0(C^*(F))$. Moreover, to our foliation one can associate in a natural way a cyclic cocycle $[F]$ in the $q$-th cyclic homology group of the $C^*$-algebra $C^*(F)$, called the fundamental transverse cyclic cocycle of the foliation, where $q$ is the codimension of the foliation $F$. Then we use the even pairing between K-homology and cyclic homology in this case, namely we consider the pairing

\[ \langle [E(F)], [F] \rangle := (m!)^{-1}\, (F \# \mathrm{Tr})(E(F), \ldots, E(F)) \]

as first introduced in the abstract algebraic context in [23]. Hence we obtain a complex number as a result of the above pairing, and this complex number characterises our original foliation $F$. (We have fixed the field of complex numbers as the ground field throughout this paper.)

3. The Constructions in Detail

3.1. Foliations. Let $V$ be a smooth manifold and $TV$ its tangent bundle. A smooth subbundle $F$ of $TV$ is called integrable iff one of the following equivalent conditions is satisfied:

1. Every $x \in V$ is contained in a submanifold $W$ of $V$ such that $T_y(W) = F_y$ for all $y \in W$, where $T_y$ denotes the tangent space over $y$.


2. Every $x \in V$ is in the domain $U \subset V$ of a submersion $p : U \to \mathbb{R}^q$ ($q = \operatorname{codim} F$) with $F_y = \ker(p_*)_y$ $\forall y \in U$.
3. $C^\infty(V, F) = \{X \in C^\infty(V, TV);\ X_x \in F_x\ \forall x \in V\}$ is a Lie subalgebra of the Lie algebra of vector fields on $V$.
4. The ideal $J(F)$ of smooth differential forms which vanish on $F$ is stable under differentiation: $d(J) \subset J$.

Condition 3 is simply Frobenius' theorem and Condition 4 its dual.

Example. Any 1-dimensional subbundle $F$ of $TV$ is integrable, but for $\dim F \geq 2$ the condition is non-trivial; for instance, if $V$ is the total space of a principal bundle with compact structure group, then we know that the subbundle of vertical vectors is always integrable, but the horizontal subbundle is integrable iff the connection is flat. We shall make extensive use of this fact in this piece of work.

A foliation of $V$ is given by an integrable subbundle $F$ of $TV$. The leaves of the foliation are the maximal connected submanifolds $L$ of $V$ with $T_x(L) = F_x$ $\forall x \in L$, and the partition of $V$ into leaves, $V = \bigcup_{a \in A} L_a$, is characterised geometrically by its "local triviality": every point $x \in V$ has a neighborhood $U$ and a system of local coordinates $(x^j)$, $j = 1, \ldots, \dim V$, called a foliation chart, so that the partition of $U$ into connected components of leaves, called plaques (they are the leaves of the restriction of the foliation to $U$), corresponds to the partition of $\mathbb{R}^{\dim V} = \mathbb{R}^{\dim F} \times \mathbb{R}^{\operatorname{codim} F}$ into the parallel affine subspaces $\mathbb{R}^{\dim F} \times pt$. Very simple examples indicate that the leaves $L$ may not be compact even if the manifold $V$ is, and that the space of leaves $X := V/F$ may not be Hausdorff for the quotient topology. The irrational-slope (Kronecker) foliation of the torus is such an example.

Throughout this paper we shall mainly restrict our attention to two special kinds of foliations: we consider a principal bundle $P$ with structure (Lie) group $G$ (assumed compact and connected) over a compact manifold $M$. The total space $P$ automatically has a foliation induced by the fibration: the leaves are the fibers, which are isomorphic to the structure group $G$, and the space of leaves is just the base space $M$ with its manifold topology. We shall refer to this foliation as the vertical foliation of the principal bundle and it will be denoted $P_V$. Clearly, the dimension of this foliation is equal to the dimension of the group $G$, the integrable subbundle of $TP$ being in this case the vertical subbundle. The codimension is equal to the dimension of the base space $M$. Now if in addition a flat connection is given on our principal bundle, we have another foliation of the total space, which we shall refer to as the horizontal or flat foliation and which will be denoted $P_H$. We shall study this foliation extensively in the following subsection. The dimension of this foliation equals the dimension of the base space and the codimension equals the dimension of the group. From this one can see that the vertical and the horizontal foliations of a principal bundle are transverse to each other. Now the vertical foliation behaves very well; everything is compact and Hausdorff, as were the spaces we started with to build our bundle. In this case the general theory of foliations gives nothing more than the well-known theory of principal bundles. However, the horizontal foliation can suffer from various "pathological" defects and for this reason it is interesting from the ncg point of view. Let us study it in greater detail.
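Condition 3 (closure under the Lie bracket) is easy to test numerically for a rank-2 subbundle of $T\mathbb{R}^3$. The sketch below is illustrative only; the two sample plane fields, the finite-difference step and the tolerance are assumptions chosen for the example and do not come from the text. It computes $[X, Y]$ by central differences and checks whether it lies in the span of $X$ and $Y$ through the vanishing of a $3 \times 3$ determinant.

```python
def bracket(X, Y, p, h=1e-5):
    """Lie bracket [X, Y] at p: [X,Y]^i = X^j d_j Y^i - Y^j d_j X^i (central differences)."""
    n = len(p)

    def directional(V, W):
        # Derivative of the field W at p in the direction V(p).
        v = V(p)
        plus = [p[j] + h * v[j] for j in range(n)]
        minus = [p[j] - h * v[j] for j in range(n)]
        w_plus, w_minus = W(plus), W(minus)
        return [(w_plus[i] - w_minus[i]) / (2 * h) for i in range(n)]

    d_xy = directional(X, Y)
    d_yx = directional(Y, X)
    return [d_xy[i] - d_yx[i] for i in range(n)]


def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def is_involutive_at(X, Y, p, tol=1e-6):
    """True if [X, Y](p) lies in span{X(p), Y(p)} (X, Y assumed independent at p)."""
    return abs(det3(X(p), Y(p), bracket(X, Y, p))) < tol


# Assumed example 1: planes tangent to the paraboloids z - x^2 - y^2 = const (integrable).
X1 = lambda p: [1.0, 0.0, 2 * p[0]]
Y1 = lambda p: [0.0, 1.0, 2 * p[1]]

# Assumed example 2: a contact plane field, spanned by d/dx + y d/dz and d/dy (not integrable).
X2 = lambda p: [1.0, 0.0, p[1]]
Y2 = lambda p: [0.0, 1.0, 0.0]
```

For the first pair the bracket vanishes identically, so the planes are tangent to the level surfaces of the submersion $z - x^2 - y^2$ (Condition 2); for the second pair $[X, Y] = -\partial_z$ is transverse to the planes, so no leaves exist.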

3.2. Flat foliation of a principal bundle. To begin with, a flat connection on a principal bundle $P$ with structure group $G$ and base space $M$ corresponds to a reduction of the structure group from $G$ to a subgroup isomorphic to a normal subgroup of the fundamental group of the base space $\pi_1(M)$. Moreover a (gauge equivalence class of a) flat connection also defines a (conjugacy class of a) representation $H : \pi_1(M) \to G$. If we identify the fundamental group with the group of covering translations of the universal covering $\tilde M$ of $M$, we get an action $\hbar$ of $\pi_1(M)$ on $\tilde M \times G$ defined as follows:

\[ \hbar : \pi_1(M) \times (\tilde M \times G) \to \tilde M \times G, \qquad (\gamma, (\tilde m, g)) \mapsto (\gamma(\tilde m), H(\gamma)(g)), \]

where we use the obvious notation $\gamma \in \pi_1(M)$, $g \in G$, $\tilde m \in \tilde M$. This action gives a commutative diagram:

\[
\begin{array}{ccc}
\tilde M \times G & \xrightarrow{\ pr\ } & \tilde M \\
\downarrow{\scriptstyle \pi} & & \downarrow{\scriptstyle q} \\
P' = (\tilde M \times G)/\hbar & \xrightarrow{\ p\ } & M
\end{array}
\tag{1}
\]

where $pr$ is the canonical projection, $\pi$ is the quotient map by $\hbar$, $p$ is uniquely induced by $pr$ and $q$ is just the map from the universal covering space to the original space. This construction is called suspension of the representation $H$. One can prove that the map $\pi$ is a covering map and that if $z := \operatorname{Im} H$ is endowed with the induced topology, then $\xi_H = (P', p, M)$ is a fiber bundle with fiber $G$, total space $P'$, base $M$, projection $p$ and structure group $z$.

To study the geometric properties of suspensions we introduce a new topology on the total space $P'$ of $\xi_H$. We denote by $G^\delta$ the set $G$ supplied with the discrete topology. Then the action $\hbar$ of $\pi_1(M)$ on $\tilde M \times G^\delta$ remains continuous and the map $\pi : \tilde M \times G^\delta \to P'$ induces on $P'$ a new topology which is finer than its manifold topology. We denote by $P^\delta$ the set $P'$ supplied with this topology. The topologies on $\tilde M \times G^\delta$ and on $P^\delta$ are called the leaf topologies. Then the suspension diagram below is a commutative diagram of covering maps:

\[
\begin{array}{ccc}
\tilde M \times G^\delta & \xrightarrow{\ pr\ } & \tilde M \\
\downarrow{\scriptstyle \pi} & & \downarrow{\scriptstyle q} \\
P^\delta & \xrightarrow{\ p\ } & M
\end{array}
\tag{2}
\]

The topological space $P^\delta$ is not connected unless the fiber is contractible. A connected component of $P^\delta$ is called a leaf of $\xi_H$. Each point $x = \pi(\tilde m, g) \in P'$ belongs to exactly one leaf, which is denoted $L_x$ and equals $\pi(\tilde M \times g)$. The leaves are injectively immersed submanifolds of $P'$ but in general not embedded. They are transverse to the fibers of $\xi_H$. Conjugate representations $H$ and $H'$ give suspension bundles $\xi_H$ and $\xi_{H'}$ which are isomorphic. Let now $x = \pi(\tilde m, g)$. Then the representation $H_x : \pi_1(L_x) \to G$

with image $z_g$, is called the holonomy representation of the leaf $L_x$ at the point $x$. The group $z_g$ is the holonomy group of the leaf $L_x$ at the point $x$; it is the isotropy group of $z$ at $g \in G$. Moreover $\pi_1(L_x)$ is isomorphic to the isotropy group of $\pi_1(M)$ at the point $g \in G$, namely

\[ \pi_1(L_x) \cong \{\gamma \in \pi_1(M) \mid H(\gamma) g = g\}. \]

See also [19].

There is a topological way to characterise these flat bundles, by using classifying spaces for flat bundles in a fashion analogous to that for ordinary bundles, namely: Let $G$ be a connected Lie group and let $G^\delta$ denote the same group with the discrete topology. The Milnor join construction for $G$ defines a connected space $BG$ which is the classifying space for principal $G$-bundles. The same construction applied to $G^\delta$ yields a connected topological space $BG^\delta$ which is an Eilenberg–MacLane space $K(G, 1)$, namely $\pi_1(BG^\delta) = G$ and $\pi_j(BG^\delta) = 0$ for $j > 1$. The inclusion $i : G^\delta \to G$ induces a continuous map $Bi : BG^\delta \to BG$. As sets these two spaces are the same, with the source having finer topology than the range. The difference between these two topologies is measured by introducing the homotopy fiber $BG'$. This is defined by first replacing $Bi$ with a homotopy equivalent weak fibration over $BG$, and then taking for $BG'$ the (homotopy class of the) fiber. The description then is just the construction of the Puppe sequence for $Bi$ (cf. [30]). Choose a base point in $BG^\delta$ and consider its image in $BG$. Then let $\Omega(BG)$ and $P(BG)$ denote the spaces of based loops and of paths with the given initial point in $BG$, respectively. Let $e$ be the end point map of a path. Then one has a fibration $\Omega(BG) \hookrightarrow P(BG) \to BG$, where the second map is $e$. Then define $BG'$ via the homotopy pull-back diagram:

\[
\begin{array}{ccc}
\Omega(BG) & \longrightarrow & \Omega(BG) \\
\downarrow & & \downarrow \\
BG' & \longrightarrow & P(BG) \\
\downarrow & & \downarrow{\scriptstyle e} \\
BG^\delta & \xrightarrow{\ Bi\ } & BG
\end{array}
\tag{3}
\]

A principal $G$-bundle $P$ over a manifold $M$ is equivalent to giving an open covering of $M$ and the transition functions. This data defines a continuous map $g_P : M \to BG$. If the transition functions are locally constant, namely if the bundle $P$ is flat, then $g_P$ can be factored through $BG^\delta$ as a continuous map. A choice of transition functions which are locally constant is equivalent to specifying a flat $G$-structure on $P$. Hence $P$ has a horizontal foliation whose holonomy map $a : \pi_1(M) \to G$ defines the classifying map $Ba : M \to BG^\delta$. Conversely, given a continuous map $Ba : M \to BG^\delta$, there is induced a representation $a : \pi_1(M) \to G$ and a corresponding flat principal $G$-bundle $P_a = \tilde M \times_{\pi_1(M)} G$, where $\tilde M$ is the universal covering of $M$. The topological type of the $G$-bundle $P_a$ is determined by the composition $g_a : M \to BG^\delta \to BG$. The principal bundle is trivial iff $g_a$ is homotopic to the constant map $M \to pt$. The choice of the homotopy is equivalent to specifying a global section of $P_a$.
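The isotropy description of $\pi_1(L_x)$ in this subsection can be made concrete in the simplest suspension. The sketch below assumes, purely for illustration, $M = S^1$ (so $\pi_1(M) = \mathbb{Z}$), $G = U(1) = \mathbb{R}/\mathbb{Z}$, and a representation sending the generator to the rotation by a rational angle `theta` (in units of full turns); none of these choices appears in the text. The leaf through $(\tilde m, g)$ meets a fiber in the orbit of $g$, and the isotropy subgroup is generated by the smallest $n > 0$ with $n\theta \in \mathbb{Z}$.

```python
from fractions import Fraction


def fiber_orbit(theta, g):
    """Points where the leaf through (m, g) meets one fiber U(1) = R/Z.

    theta: rational rotation angle of the generator of pi_1(S^1), in full turns.
    g: starting point in R/Z, given as a Fraction.
    """
    points = []
    x = g % 1
    while x not in points:          # terminates because theta is rational
        points.append(x)
        x = (x + theta) % 1
    return points


def isotropy_generator(theta):
    """Smallest n > 0 with H(n) g = g, i.e. n * theta an integer."""
    return len(fiber_orbit(theta, Fraction(0)))
```

For `theta = Fraction(2, 5)` the isotropy group is $5\mathbb{Z}$, so each leaf is a circle covering the base five times. An irrational angle (which this rational-arithmetic sketch cannot represent) would give trivial isotropy and non-compact dense leaves, which is the situation where the leaf space stops being Hausdorff.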


3.3. Groupoids and $C^*$-algebras associated to foliations. The next step is to associate the holonomy groupoid to any foliation. In general a groupoid is, roughly speaking, a small category with inverses, or more precisely:

Definition 1. A groupoid consists of a set $\Gamma$, a distinguished subset $\Gamma^{(0)}$ of $\Gamma$, two maps $r, s : \Gamma \to \Gamma^{(0)}$ and a law of composition

\[ \circ : \Gamma^{(2)} := \{(\gamma_1, \gamma_2) \in \Gamma \times \Gamma;\ s(\gamma_1) = r(\gamma_2)\} \to \Gamma \]

such that:

1. $s(\gamma_1 \circ \gamma_2) = s(\gamma_2)$, $r(\gamma_1 \circ \gamma_2) = r(\gamma_1)$ $\forall (\gamma_1, \gamma_2) \in \Gamma^{(2)}$,
2. $s(x) = r(x) = x$ $\forall x \in \Gamma^{(0)}$,
3. $\gamma \circ s(\gamma) = \gamma$, $r(\gamma) \circ \gamma = \gamma$ $\forall \gamma \in \Gamma$,
4. $(\gamma_1 \circ \gamma_2) \circ \gamma_3 = \gamma_1 \circ (\gamma_2 \circ \gamma_3)$,
5. Each $\gamma$ has a two sided inverse $\gamma^{-1}$, with $\gamma \gamma^{-1} = r(\gamma)$ and $\gamma^{-1} \gamma = s(\gamma)$.

The maps $r, s$ are called range and source maps. In category theory terminology, $\Gamma^{(0)}$ is the space of objects and $\Gamma$ is the space of morphisms, $\Gamma^{(2)}$ being the set of composable pairs.

Definition 2. A smooth groupoid $\Gamma$ is a groupoid together with a differentiable structure on $\Gamma$ and $\Gamma^{(0)}$ such that the maps $r, s$ are submersions and the object inclusion map $\Gamma^{(0)} \to \Gamma$ is smooth, as is the composition map $\Gamma^{(2)} \to \Gamma$.

The notion of a $\frac{1}{2}$-density on a smooth manifold allows one to define in a canonical manner the convolution algebra of a smooth groupoid $\Gamma$. Specifically, given $\Gamma$, let $\Omega^{1/2}$ be the line bundle over $\Gamma$ whose fiber $\Omega^{1/2}_\gamma$ at $\gamma \in \Gamma$, $r(\gamma) = x$, $s(\gamma) = y$, is the linear space of maps

\[ \rho : \wedge^k T_\gamma(\Gamma^x) \otimes \wedge^k T_\gamma(\Gamma_y) \to \mathbb{C} \quad \text{such that} \quad \rho(\lambda \nu) = |\lambda|^{1/2} \rho(\nu)\ \ \forall \lambda \in \mathbb{R}. \]

Here $\Gamma_y = \{\gamma \in \Gamma;\ s(\gamma) = y\}$, $\Gamma^x = \{\gamma \in \Gamma;\ r(\gamma) = x\}$, and $k = \dim T_\gamma(\Gamma^x) = \dim T_\gamma(\Gamma_y)$ are the dimensions of the fibers of the submersions $r : \Gamma \to \Gamma^{(0)}$ and $s : \Gamma \to \Gamma^{(0)}$. Then we endow the linear space $C_c^\infty(\Gamma, \Omega^{1/2})$ of smooth compactly supported sections of $\Omega^{1/2}$ with the convolution product

\[ (a * b)(\gamma) = \int_{\gamma_1 \circ \gamma_2 = \gamma} a(\gamma_1)\, b(\gamma_2) \qquad \forall a, b \in C_c^\infty(\Gamma, \Omega^{1/2}), \]

where the integral on the RHS makes sense since it is the integral of a 1-density, namely $a(\gamma_1) b(\gamma_1^{-1} \gamma)$, on the manifold $\Gamma^x$, $x = r(\gamma)$.

One can then prove the following: if $\Gamma$ is a smooth groupoid and $C_c^\infty(\Gamma, \Omega^{1/2})$ is the convolution algebra of smooth compactly supported $\frac{1}{2}$-densities with involution $*$, $f^*(\gamma) = \overline{f(\gamma^{-1})}$, then for each $x \in \Gamma^{(0)}$ the following defines an involutive representation $\pi_x$ of $C_c^\infty(\Gamma, \Omega^{1/2})$ in the Hilbert space $L^2(\Gamma_x)$:

\[ (\pi_x(f)\xi)(\gamma) = \int f(\gamma_1)\, \xi(\gamma_1^{-1} \gamma) \]


for all $\gamma \in \Gamma_x$, $\xi \in L^2(\Gamma_x)$. The completion of $C_c^\infty(\Gamma, \Omega^{1/2})$ for the norm $\|f\| = \sup_{x \in \Gamma^{(0)}} \|\pi_x(f)\|$ is a $C^*$-algebra denoted $C_r^*(\Gamma)$. Moreover one defines the $C^*$-algebra $C^*(\Gamma)$ as the completion of the involutive algebra $C_c^\infty(\Gamma, \Omega^{1/2})$ for the norm

\[ \|f\|_{\max} = \sup\{\|\pi(f)\|;\ \pi \text{ an involutive Hilbert space representation of } C_c^\infty(\Gamma, \Omega^{1/2})\}. \]

After this general introduction to groupoids and to the $C^*$-algebras associated to them, we now pass to groupoids and $C^*$-algebras associated to foliations. Let $(V, F)$ be a foliated manifold of codimension $q$. Given any $x \in V$ and a small enough open set $W$ in $V$ containing $x$, the restriction of the foliation $F$ to $W$ has as its leaf space an open set of $\mathbb{R}^q$, which we shall call a transverse neighborhood of $x$. In other words, this open set $W/F$ is the set of plaques around $x$. Now given a leaf $L$ of $(V, F)$ and two points $x, y \in L$, any simple path $\gamma$ from $x$ to $y$ on $L$ uniquely determines a germ $h(\gamma)$ of a diffeomorphism from a transverse neighborhood of $x$ to one of $y$. This depends only on the homotopy class of $\gamma$ and is called the holonomy of the path $\gamma$. The holonomy groupoid of a leaf $L$ is the quotient of its fundamental groupoid by the equivalence relation which identifies two paths $\gamma_1, \gamma_2$ from $x$ to $y$, both in $L$, iff $h(\gamma_1) = h(\gamma_2)$. Here by the fundamental groupoid of a leaf we mean the groupoid $\Gamma = L \times L$, where $r, s$ are the two projections, $\Gamma^{(0)} = L$ and the composition is $(x, y) \circ (y, z) = (x, z)$. (From this one can see that every space is a groupoid.) The holonomy covering $\tilde L$ of a leaf $L$ is the covering of $L$ associated to the normal subgroup of its fundamental group $\pi_1(L)$ given by paths with trivial holonomy. The holonomy groupoid or graph of the foliation is the union $\Gamma$ of the holonomy groupoids of its leaves. Given an element $\gamma$ of $\Gamma$ we denote by $s(\gamma) = x$ the origin of the path $\gamma$ and by $r(\gamma) = y$ its endpoint, where $r, s$ are the range and source maps as in the general case.

An element of $\Gamma$ is thus given by two points $x = s(\gamma)$ and $y = r(\gamma)$ of $V$ together with an equivalence class of smooth paths $\gamma(t)$, $t \in [0, 1]$, with $\gamma(0) = x$ and $\gamma(1) = y$, tangent to the bundle $F$, namely with $\frac{d\gamma}{dt} \in F_{\gamma(t)}$ $\forall t \in \mathbb{R}$, identifying $\gamma_1$ and $\gamma_2$ as equivalent iff the holonomy of the path $\gamma_2 \gamma_1^{-1}$ at the point $x$ is the identity. The graph $\Gamma$ has an obvious composition law. For $\gamma_1$ and $\gamma_2$ in $\Gamma$, the composition $\gamma_1 \circ \gamma_2$ makes sense if $s(\gamma_1) = r(\gamma_2)$. The groupoid $\Gamma$ is by construction a (not necessarily Hausdorff) manifold of dimension $\dim \Gamma = \dim V + \dim F$.

Definition 3. The $C^*$-algebra of the foliation is exactly the $C^*$-algebra of its graph, as described for arbitrary groupoids above.

For our foliations of interest, the graph $\Gamma$ is the following: for the vertical foliation it is just the manifold $P \times G$, whereas for the horizontal foliation it is $P \times_a \tilde M$, where $a$ is the representation from $\pi_1(M)$ to $G$ induced by the flat connection 1-form (via the holonomy). Moreover the distinguished subset $\Gamma^{(0)}$ in both cases is the manifold we want to foliate, namely $P$, the total space of our bundle. The $C^*$-algebras associated to our foliations are as follows: for the vertical foliation it is $C(M)$ tensored with compact operators which act as smoothing kernels along the leaves, which in turn is strongly Morita equivalent to just $C(M)$, whereas for the horizontal foliation it is strongly Morita equivalent (abbreviated to SME) to $C(P) \rtimes \pi_1(M)$. (Note: the representation of the fundamental group of the base into the structure Lie group induced by the flat connection 1-form enters the definition of the crossed product.) The first algebra is commutative (up to SME), but the second is not! It is for this reason that ncg has an important role to play; in fact we are deeply in the ncg setting. Obviously if the space is simply connected, i.e. $\pi_1$ vanishes, non-commutativity is lost. We would like to emphasise that in all cases in the literature where some "non-commutative" algebras were used, especially in connection with the well-known Connes–Lott model for electroweak theory (or even QCD), these algebras are in fact SME to commutative ones. Hence in terms of topology, this is not a real non-commutative case.
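The axioms of Definition 1 and the convolution product can be verified mechanically on a finite stand-in for the graph $P \times G$ of the vertical foliation. Everything specific in the sketch below is an assumption made for illustration: $P$ is replaced by the set $\mathbb{Z}_6$, $G$ by $\mathbb{Z}_3$ acting freely on the right by $p \cdot g = p + 2g \bmod 6$, units are identified with pairs $(p, 0)$, the range and source are taken as $r(p, g) = p$, $s(p, g) = pg$, and integrals become finite sums.

```python
from itertools import product

N_P, N_G = 6, 3  # hypothetical finite stand-ins for the total space P and the group G


def act(p, g):
    """Right action of G = Z_3 on P = Z_6 (free; the quotient has two points)."""
    return (p + 2 * g) % N_P


GAMMA = list(product(range(N_P), range(N_G)))  # the groupoid P x G


def r(x):
    return x[0]


def s(x):
    return act(x[0], x[1])


def unit(p):
    return (p, 0)


def composable(x, y):
    return s(x) == r(y)


def compose(x, y):
    assert composable(x, y)
    return (x[0], (x[1] + y[1]) % N_G)


def inverse(x):
    return (s(x), (-x[1]) % N_G)


GAMMA2 = [(x, y) for x in GAMMA for y in GAMMA if composable(x, y)]


def check_axioms():
    """Axioms 1, 3, 4 and 5 of Definition 1, tested exhaustively."""
    for x, y in GAMMA2:
        z = compose(x, y)
        if s(z) != s(y) or r(z) != r(x):
            return False
        for w in GAMMA:
            if composable(y, w) and compose(z, w) != compose(x, compose(y, w)):
                return False
    for x in GAMMA:
        if compose(x, unit(s(x))) != x or compose(unit(r(x)), x) != x:
            return False
        if compose(x, inverse(x)) != unit(r(x)) or compose(inverse(x), x) != unit(s(x)):
            return False
    return True


def convolve(a, b):
    """(a * b)(gamma) = sum over gamma1 o gamma2 = gamma of a(gamma1) b(gamma2)."""
    out = {x: 0 for x in GAMMA}
    for x, y in GAMMA2:
        out[compose(x, y)] += a[x] * b[y]
    return out
```

In this toy model the set of composable pairs has $|P| \cdot |G|^2 = 54$ elements, and associativity of the convolution product follows from axiom 4; the analytic issues (densities, completions) that the text handles do not arise in the finite case.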

3.4. K-classes associated to foliations. We shall give the general construction for an arbitrary foliation. Let $(V, F)$ be a foliated manifold and $t = TV/F$ the transverse bundle of the foliation. The holonomy groupoid $\Gamma$ of $(V, F)$ acts in a natural way on $t$ by the differential of the holonomy; thus every $\gamma \in \Gamma$, $\gamma : x \to y$, determines a linear map $h(\gamma) : t_x \to t_y$. We denote this action by $h$. It is not in general possible to find a Euclidean metric on $t$ which is invariant under the above action of $\Gamma$. Let $g$ be an arbitrary smooth Euclidean metric on the real vector bundle $t$. Thus for $\xi \in t_x$ we let $\|\xi\|_g = (\langle \xi, \xi \rangle_g)^{1/2}$ be the corresponding norms and inner products, and we drop the subscript $g$ henceforth. Using $g$ we define a $C^*$-module $E$ on the $C^*$-algebra $C_r^*(V, F)$ of the foliation. Recall that $C_r^*(V, F)$, denoted by $\Lambda$ for simplicity, is the completion of the convolution algebra $C_c^\infty(\Gamma, \Omega^{1/2})$, which acts by right convolution on the linear space $C_c^\infty(\Gamma, \Omega^{1/2} \otimes r^*(t_{\mathbb{C}}))$, where $t_{\mathbb{C}}$ is the complexification of the transverse bundle $t$:

\[ (\xi f)(\gamma) = \int_{\Gamma^y} \xi(\gamma_1)\, f(\gamma_1^{-1} \gamma), \]

where $y = r(\gamma)$. Endowing the complexified bundle $t_{\mathbb{C}}$ with the inner product associated to $g$ and anti-linear in the first variable, the following formula defines a $C_c^\infty(\Gamma, \Omega^{1/2})$-valued inner product:

\[ \langle \xi, \eta \rangle(\gamma) = \int_{\Gamma^y} \langle \xi(\gamma_1^{-1}), \eta(\gamma_1^{-1} \gamma) \rangle \]

for any $\xi, \eta \in C_c^\infty(\Gamma, \Omega^{1/2} \otimes r^*(t_{\mathbb{C}}))$. One then checks that the completion $E$ of the space $C_c^\infty(\Gamma, \Omega^{1/2} \otimes r^*(t_{\mathbb{C}}))$ for the norm $\|\xi\| = (\|\langle \xi, \xi \rangle\|_{C_r^*(V,F)})^{1/2}$ becomes a $C^*$-module over $C_r^*(V, F)$. If one also takes the action $h$ of $\Gamma$ on $t$ into account, with some extra effort one can make $E$ into a $(\Lambda, \Sigma)$-bimodule (for the definition of the algebra $\Sigma$ see below). The first construction thus gives us an element $E$ of $K_0(C_r^*(V, F))$, whereas the second gives $E$ as an element of $KK_0(\Lambda, \Sigma)$, Kasparov's bivariant K-theory. (Recall that the 0th Kasparov bivariant K-group in this case consists of stable isomorphism classes of $(\Lambda, \Sigma)$-bimodules.) We shall use this action $h$ to define a left action of $C_c^\infty(\Gamma, \Omega^{1/2})$ on $E$ by:

\[ (f \xi)(\gamma) = \int_{\Gamma^y} f(\gamma_1)\, h(\gamma_1)\, \xi(\gamma_1^{-1} \gamma) \qquad \forall f \in C_c^\infty(\Gamma, \Omega^{1/2}),\ \xi \in E. \]

One can then prove that for any $f \in C_c^\infty(\Gamma, \Omega^{1/2})$ the above formula defines an endomorphism $\lambda(f)$ of the $C^*$-module $E$ whose adjoint $\lambda(f)^*$ is given by

\[ (\lambda(f)^* \xi)(\gamma) = \int_{\Gamma^y} f^\#(\gamma_1)\, h(\gamma_1)\, \xi(\gamma_1^{-1} \gamma), \]

where $f^\#(\gamma) := \tilde f(\gamma^{-1})\, \Delta(\gamma)$ and $\Delta(\gamma) = (h(\gamma)^{-1})^t\, h(\gamma)^{-1} \in \mathrm{End}(t_{\mathbb{C}}(r(\gamma)))$.

This shows that unless the metric on $t$ is $\Gamma$-invariant, the representation $\lambda$ is not a $*$-representation, the subtle difference between $\lambda(f)^*$ and $\lambda(f^*)$ being measured by $\Delta$. In particular $\lambda$ is not in general bounded for the $C^*$-algebra norms on both $\mathrm{End}_{C_r^*(V,F)} E$ and $C_r^*(V, F) \supset C_c^\infty(\Gamma, \Omega^{1/2})$. However $\lambda$ is a closable homomorphism of $C^*$-algebras, namely, the closure of the graph of $\lambda$ is the graph of a densely defined homomorphism. Then with the graph norm $\|x\|_\lambda = \|x\| + \|\lambda(x)\|$, the domain $\Sigma$ of the closure $\tilde\lambda$ of $\lambda$ is a Banach algebra which is dense in the $C^*$-algebra $\Lambda = C_r^*(V, F)$. The $C^*$-module $E$ is then a $(\Lambda, \Sigma)$-bimodule. This particular module $E$ we constructed here will be one of the two main ingredients which define the invariant we want, and we shall denote it $E(F)$.

3.5. Cyclic classes associated to foliations (transverse fundamental cyclic cocycle). We begin with some definitions from cyclic homology:

Definition 4.
1. A cycle of dimension $n$ is a triple $(\Omega, d, \int)$, where $\Omega = \oplus_{j=0}^{n} \Omega^j$ is a graded algebra over $\mathbb{C}$, $d$ is a differential on $\Omega$ and $\int : \Omega^n \to \mathbb{C}$ is a closed graded trace on $\Omega$.
2. Let $A$ be an algebra over $\mathbb{C}$. Then a cycle over $A$ is given by a cycle $(\Omega, d, \int)$ and a homomorphism $\rho : A \to \Omega^0$.

A cycle over $A$ of dimension $n$ is essentially determined by its character, which is the following $(n + 1)$-linear functional on $A$:

\[ \tau(a^0, \ldots, a^n) = \int \rho(a^0)\, d(\rho(a^1))\, d(\rho(a^2)) \cdots d(\rho(a^n)) \qquad \forall a^j \in A. \]

One can then prove that this is a cyclic cocycle of $A$, namely that it defines a cohomology class in the cyclic homology of $A$, and that the above is a necessary and sufficient statement.

We shall now describe the transverse fundamental class associated to foliations. There is a general construction for arbitrary foliations which is quite involved, since one has to complete the graded algebra. This is so because the transverse bundle of the foliation may not be integrable, and in this case derivation along transverse directions will not be a differential. We, however, are primarily interested in our two special kinds of foliations, the vertical and the horizontal foliation of a principal bundle. These foliations are transverse and both are integrable, so derivatives are differentials and hence one does not have to complete the graded algebras. We refer to [27] for the general construction. Here we shall only describe the classes which are associated to our two foliations: the vertical and the horizontal (or flat) foliations of the total space of our principal bundle $P$ will be denoted $(P_V)$ and $(P_H)$ respectively. One then has a natural cycle for the algebra of each foliation, namely:

Vertical Foliation, cycle denoted $[P_V]$: The natural cycle canonically associated to the algebra $C_c^\infty(P \times G, \Omega^{1/2})$ of the vertical foliation consists of:


1. The graded algebra $C_c^\infty(P \times G, \Omega^{1/2} \otimes r^*(\wedge P_H^*))$, where $P_H$ is the horizontal subbundle (i.e. the transverse bundle to the vertical foliation).
2. The differential $d = d_V + d_H + \theta$, where

\[ d_H : C^\infty(P, \wedge^r P_V^* \otimes \wedge^s P_H^*) \to C^\infty(P, \wedge^r P_V^* \otimes \wedge^{s+1} P_H^*), \]
\[ d_V : C^\infty(P, \wedge^r P_V^* \otimes \wedge^s P_H^*) \to C^\infty(P, \wedge^{r+1} P_V^* \otimes \wedge^s P_H^*), \]

and $\theta$ means contraction with the section $\theta \in C^\infty(P, P_V \otimes \wedge^2 P_H^*)$, defined by $\theta(p_H(X), p_H(Y)) = p_V([X, Y])$ for any pair of horizontal vector fields $X, Y \in C^\infty(P, P_H)$, where $(p_H, p_V)$ is the isomorphism $TP \to P_H \oplus P_V$ given by $P_H$.
3. The trace, defined via

\[ \tau(\omega) = \int_{\Gamma^{(0)}} \omega, \]

where $\Gamma$ is the graph of the vertical foliation, $\Gamma = P \times G$.

Similarly one defines a fundamental class for the horizontal foliation. One can then define the character of these cycles (essentially the trace), which is a class in the cyclic homology of the appropriate algebra for each foliation [22].

Note. Since we now have a cyclic homology class, say $\phi$, of the algebra of the foliation, say $\Lambda$, we automatically have a map $K_i(\Lambda) \to \mathbb{C}$ given by pairing it with K-group elements ($i = 0, 1$ above) to get index theorems for leafwise elliptic operators. Let us mention here that the analytic index of an operator elliptic along the leaves of an arbitrary foliation, say $(V, F)$, is an element of $K_0(C^*(V, F))$, being in fact a generalisation of the index of families of elliptic operators considered by Atiyah and Singer. (In the Atiyah–Singer case of families of elliptic operators one is dealing with the foliation induced by the fibration, which is the commutative geometry case.) The operator itself, which is elliptic along the leaves of the foliation, is an element of $KK_0(C^*(V, F), C(V))$.

4. Invariant for the nlσm

The final step then is to make use of the general formula for pairings between K-homology and cyclic homology. In more concrete terms, one has:

Definition 5. Let $A$ be an algebra. Then the following equalities define a bilinear pairing between K-theory and cyclic homology:

1. Even case: $K_0(A)$ and $HC^{ev}(A)$:

\[ \langle [e], [\phi] \rangle := (m!)^{-1}\, (\phi \# \mathrm{Tr})(e, \ldots, e) \]

for $e \in K_0(A)$, using the idempotents' description, and $\phi \in HC^{2m}(A)$, where $\#$ is the cup product in cyclic homology (see for instance [23] for the precise definition).

2. Odd case: $K_1(A)$ and $HC^{odd}(A)$:

\[ \langle [u], [\phi] \rangle = \frac{1}{\sqrt{2i}}\, 2^{-n}\, \Gamma\!\left(\frac{n}{2} + 1\right)^{-1} (\phi \# \mathrm{Tr})(u^{-1} - 1, u - 1, u^{-1} - 1, \ldots, u - 1). \]
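Definition 4 and the odd case of Definition 5 can be tried out on the simplest commutative example. The sketch below is an illustration under stated assumptions and is not the foliation cocycle of this paper: $A$ is taken to be the algebra of Laurent polynomials on the circle (stored as dictionaries from exponents to coefficients), the cycle is the de Rham complex of $S^1$ with character $\phi(f^0, f^1) = \oint f^0\, df^1$, the matrix size is $k = 1$ so that $\mathrm{Tr}$ plays no role, and $n = 1$. The helper names `tau`, `mul` and `odd_pairing` are hypothetical.

```python
import cmath
import math


def mul(f, g):
    """Product of two Laurent polynomials given as {exponent: coefficient}."""
    out = {}
    for m, a in f.items():
        for n, b in g.items():
            out[m + n] = out.get(m + n, 0) + a * b
    return out


def sub(f, g):
    """Difference f - g of two Laurent polynomials."""
    out = dict(f)
    for n, b in g.items():
        out[n] = out.get(n, 0) - b
    return out


def tau(f0, f1):
    """Normalised character (1 / 2 pi i) * (contour integral of f0 d f1) on the unit circle."""
    return sum(n * c * f0.get(-n, 0) for n, c in f1.items())


def odd_pairing(k):
    """Odd pairing of Definition 5 for u = z^k, with n = 1 and phi = 2 pi i * tau."""
    one = {0: 1}
    u_minus_1 = sub({k: 1}, one)
    u_inv_minus_1 = sub({-k: 1}, one)
    phi = 2j * math.pi * tau(u_inv_minus_1, u_minus_1)
    n = 1
    return phi / (cmath.sqrt(2j) * 2 ** n * math.gamma(n / 2 + 1))
```

With these conventions $\tau(u^{-1} - 1, u - 1) = k$ for $u = z^k$, so the pairing equals $\sqrt{2\pi i}\, k$: up to a universal constant it is the winding number of $u$. The character is also antisymmetric, $\tau(f^0, f^1) = -\tau(f^1, f^0)$, and satisfies the Hochschild cocycle identity, which is what makes it a cyclic cocycle.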


This is an important point because by pairing the $C^*$-module $E$ we constructed previously, naturally associated to the foliation considered, with the cyclic cocycle naturally associated to the foliation, we get an invariant for arbitrary foliations. In particular if we apply this to the horizontal foliation, we get a complex number which is an invariant for the nlσm. Namely one has:

\[ \langle [E(P_H)], [P_H] \rangle = (m!)^{-1}\, ((P_H) \# \mathrm{Tr})(E(P_H), \ldots, E(P_H)) \in \mathbb{C}. \]

In more concrete terms, assuming that $E(P_H) \in M_k(C^*(P_H))$ for some $k$ (where $C^*(P_H)$ is the $C^*$-algebra corresponding to the horizontal foliation) and $[P_H] \in Z^q(C^*(P_H))$, where $Z$ denotes cyclic cocycles and $q$ is the codimension of the horizontal foliation, then $(P_H) \# \mathrm{Tr} \in Z^q(M_k(C^*(P_H)))$ is defined by

\[ ((P_H) \# \mathrm{Tr})(a^0 \otimes m^0, \ldots, a^q \otimes m^q) = (P_H)(a^0, \ldots, a^q)\, \mathrm{Tr}(m^0 \cdots m^q) \]

for any $a^i \in C^*(P_H)$, $m^i \in M_k(\mathbb{C})$, $i = 0, \ldots, q$.

Note. Let us mention that the odd case formula is related to the η invariant for leafwise elliptic operators, see [31,32], which in turn is related to global anomalies and to the Freedman–Townsend invariance (cf. [33,38,25,41]).

5. An Example: Principal Fibre Bundles

In order to get some more insight into this pairing we shall try to calculate it for the case of principal bundles (vertical foliation), which is the simplest example. We begin by describing the graph in detail: the set $\Gamma$ in this case is the manifold $P \times G$, the distinguished subset is $\Gamma^{(0)} = P \times \{e\}$ and, denoting the action (on the right) of $g \in G$ on $p \in P$ simply by $(p, g) \mapsto pg$, one has that the range and source maps are respectively $r(p, g) = p$ and $s(p, g) = pg$, the inverse is $(p, g)^{-1} = (pg, g^{-1})$ and the law of composition is $(p_1, g_1) \circ (p_2, g_2) = (p_1, g_1 g_2)$ if $p_1 g_1 = p_2$. Obviously the set $\Gamma^{(2)} = P \times G \times G$. Moreover we recall that the $C^*$-algebra for the vertical foliation is strongly Morita equivalent to $C(M)$. We now make use of two important facts:

1. $K_0(C(M)) = K^0(M)$, namely the Serre–Swan theorem, and
2. $H^*_{cont}(C(M)) = H_*(M)$, where on the RHS we have the ordinary homology of $M$ (with complex coefficients) and by definition for the LHS we have $H^*(C(M)) := \varinjlim (HC^n(C(M)), S)$ (see [23] for explanations of the notation), $HC^*$ denotes cyclic homology and "cont" means restriction to continuous linear functionals.

The first fact says that for commutative $C^*$-algebras one gets Atiyah's topological K-theory for the underlying space (described in terms of stable isomorphism classes of complex vector bundles over the space considered), and the second says that in the commutative case cyclic homology is, "roughly speaking", the ordinary homology of the underlying space (and thus we see that non-commutative geometry reduces to ordinary geometry in the commutative case). Since we have these two results at our disposal, we shall try to reduce the whole discussion to bundles and ordinary homology theory because this is more comprehensible. In order to describe the pairing we then need the transverse fundamental cyclic cocycle: we shall give a simple dimensional argument here; the exact computations are


rather too technical to be presented in greater detail. The cyclic cocycle we get from the vertical foliation will be of dimension equal to the codimension of the vertical foliation, which is equal to the dimension of the base space of our bundle. Moreover, as we mentioned above, in this case the C*-algebra of this foliation is SME to the algebra of functions on the base, C(M). This is a commutative C*-algebra whose cyclic homology is more or less the de Rham cohomology of the base space. Hence it is natural to suspect that we get a top homology class, which in fact turns out to be the fundamental class [M] of our base space. For the module denoted E above one uses the following fact: it is a consequence of the Serre–Swan theorem mentioned above that the link between topological K-theory and K-theory of commutative C*-algebras is that, given a complex vector bundle over M (thus a topological K-class), one considers the corresponding C(M)-module of smooth sections of the given complex vector bundle. Thus in this case the module E we get is $C_c^\infty(P \times G, \Omega^{1/2} \otimes r^*(t_{\mathbb{C}}))$, hence we can recover the corresponding complex vector bundle over M as follows: if we denote by $(P, \pi, G, M)$ our original principal bundle and we consider the vertical foliation, then its normal bundle t would be $\pi^*(TM)$, where TM is the tangent bundle of M. We prefer the topological K-theory description, which in this case is rather easy to read: the bundle associated to this C(M)-module E is

$$\begin{array}{ccccc}
\Omega^{1/2} \otimes \mathrm{pr}_1^* \pi^*(TM) & \longrightarrow & \pi^*(TM) & \longrightarrow & TM \\
\downarrow & & \downarrow & & \downarrow \tau \\
P \times G & \xrightarrow{\ \mathrm{pr}_1\ } & P & \xrightarrow{\ \pi\ } & M,
\end{array} \tag{4}$$

where the fibre of the line bundle $\Omega^{1/2}$ is the linear space of maps $\rho : \wedge^{\dim G} T_\gamma(G) \otimes \wedge^{\dim G} T_\gamma(G) \to \mathbb{C}$ satisfying the well-known property for 1/2-densities. Hence in this case the result will be the number we get if we take the bundle $\Omega^{1/2} \otimes \mathrm{pr}_1^* \pi^*(TM)$, seen as a bundle over M, apply the ordinary Chern character to it and integrate over M. The result will be a combination of the Pontryagin class of TM and the second Chern class of P (recall that we assumed M to be 4-dimensional and P to be an SU(2) bundle), as expected. There are some subtleties though: the bundle $\Omega^{1/2} \otimes \mathrm{pr}_1^* \pi^*(TM)$ lives over the graph, which is $P \times G$, and we want to see it as a bundle over M. We consider first the factor $\mathrm{pr}_1^* \pi^*(TM)$. This is indeed a bundle over M with fibre $G \times G \times \mathbb{R}^{\dim M}$, but it is neither a vector bundle nor a principal bundle, and in order to talk about characteristic classes one actually needs one or the other. In order not to change the topology, which is what we are mainly interested in, we can consider the vector bundle $TM \otimes \mathrm{ad}P$ instead, where $\mathrm{ad}P$ is the adjoint bundle of P. To study formally the classes of $TM \otimes \mathrm{ad}P$ is an exercise in mathematics (we forget the pull-backs since they can be treated easily). The point is that we shall get a combination of the Pontryagin classes of TM (or Chern classes if we complexify) and of the Chern classes of P. The latter are known since the bundle is given, whereas the former can be computed from topological information about M itself. For example, for simply connected closed 4-manifolds one has from the Hirzebruch signature formula that (see [17]) $p_1 = 3\tau = 3(b^+ - b^-)$, where τ is the signature.
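As a quick sanity check of the signature formula quoted above, the following minimal Python sketch evaluates p1 = 3(b+ − b−) for a few standard closed 4-manifolds. The function names and the table of examples are illustrative assumptions that do not appear in the original text; the values of b± are the standard ones for these manifolds.

```python
# Sketch: first Pontryagin number of a closed oriented 4-manifold
# from the Hirzebruch signature formula p1 = 3 * (b_plus - b_minus).
# All names below are hypothetical helpers chosen for illustration.

def signature(b_plus: int, b_minus: int) -> int:
    """Signature tau = b+ - b- of the intersection form."""
    return b_plus - b_minus


def first_pontryagin_number(b_plus: int, b_minus: int) -> int:
    """p1 = 3 * tau, as quoted in the text."""
    return 3 * signature(b_plus, b_minus)


# (b+, b-) for some standard simply connected closed 4-manifolds.
EXAMPLES = {
    "S^4": (0, 0),
    "CP^2": (1, 0),
    "S^2 x S^2": (1, 1),
    "K3": (3, 19),
}

if __name__ == "__main__":
    for name, (bp, bm) in EXAMPLES.items():
        print(name, "tau =", signature(bp, bm),
              "p1 =", first_pontryagin_number(bp, bm))
```

In particular the tangent-bundle contribution vanishes for S^4 and S^2 × S^2, so for such base spaces only the Chern classes of P could contribute to the combination described above.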


As for the other factor, the 1/2-densities of the graph, seen as a rank 1 real bundle over the graph $P \times G$: it is rather dull. It will be determined by $\omega_1$, the first Stiefel–Whitney class: it is either trivial ($\omega_1 = 0$) or non-orientable ($\omega_1 = 1$). (Note: half densities over a complex manifold, say N with dim N = k, are slightly more complicated: in this case the class will be $\frac{1}{2} c_1(\wedge^k T^*)$, where $\wedge^k T^*$ is the canonical bundle, so it will correspond to spin$^c$ structures on N.) But we still have the same problem that over M it is neither a vector bundle nor a principal bundle. This can be overcome as before; the point is that, since it is a dull bundle over $P \times G$, the projection does not change anything, so as a bundle over M it will be determined by the topology of P, hence we also have Chern classes of P. What we gave above was a qualitative description, and the lesson was that the invariant will be some combination of Chern numbers of P and the Pontryagin number of the tangent bundle TM of M. The key point is that TM appears because it is the transverse bundle of the vertical foliation. We expect then that characteristic classes of the transverse bundle should be important in general. What about the flat foliation, which is the case related to nlσm in general and to M-theory in particular in physics? This is a purely nc case. Computations are much harder and a picture involving bundles is impossible, since we do not have the Serre–Swan theorem. Moreover, cyclic homology of nc algebras has no relation to the usual topology. We can still say something though: first of all, since we have a flat principal bundle, all its characteristic classes vanish, so we get nothing from them. Since the base space M is not simply connected, gauge inequivalent classes of flat connections are characterised by their holonomy. So we expect the holonomy to play some role. Moreover, from the vertical foliation case we saw that the normal bundle of the foliation also plays a vital role.
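The graph of the vertical foliation described at the beginning of this section (the set P × G with r(p, g) = p, s(p, g) = pg, inverse (pg, g⁻¹) and composition (p1, g1 g2) when p1 g1 = p2) can be checked on a finite toy model. The sketch below is only an illustration under stated assumptions: P is replaced by the finite set Z/6 and G by Z/3 acting freely by p·g = p + 2g mod 6, which is not a principal bundle in the smooth sense but has the same algebraic structure. All names are hypothetical.

```python
# Sketch: the groupoid P x G of a free right G-action, on a finite toy model.
# Assumption (not from the text): P = Z/6, G = Z/3, action p.g = p + 2g mod 6.
from itertools import product

N_P, N_G = 6, 3  # toy sizes of P and G


def act(p, g):
    """Right action of g in G on p in P."""
    return (p + 2 * g) % N_P


def r(x):
    """Range map r(p, g) = p."""
    return x[0]


def s(x):
    """Source map s(p, g) = pg."""
    p, g = x
    return act(p, g)


def inverse(x):
    """(p, g)^(-1) = (pg, g^(-1))."""
    p, g = x
    return (act(p, g), (-g) % N_G)


def compose(x, y):
    """(p1, g1) o (p2, g2) = (p1, g1 g2) if p1 g1 = p2, else undefined."""
    (p1, g1), (p2, g2) = x, y
    if act(p1, g1) != p2:
        return None
    return (p1, (g1 + g2) % N_G)


GRAPH = list(product(range(N_P), range(N_G)))  # the set P x G


def composable_pairs():
    """All pairs (x, y) with s(x) = r(y)."""
    return [(x, y) for x in GRAPH for y in GRAPH if compose(x, y) is not None]


def check_groupoid_axioms():
    """Verify units, inverses and compatibility of r, s with composition."""
    for x in GRAPH:
        if compose(x, inverse(x)) != (r(x), 0):
            return False
        if compose(inverse(x), x) != (s(x), 0):
            return False
    for x, y in composable_pairs():
        xy = compose(x, y)
        if r(xy) != r(x) or s(xy) != s(y):
            return False
    return True
```

In this toy model the number of composable pairs equals |P|·|G|·|G|, matching the identification of the set of composable pairs with P × G × G stated in the text.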
Note. It is not always true that in the commutative case cyclic homology identifies well with ordinary homology; in fact this is always essentially true only for $H_0$. There may be complications. However, this point will not be treated in greater detail in this article; see [24].

6. Relation to Physics

There are three cases in physics where this invariant may play some role:

1. Nlσm. As is well known, σ models classically describe harmonic maps between two Riemannian manifolds (target and source spaces). From the general remarks made in the preceding section, we expect the invariant to include the holonomy of the flat connection (namely $\pi_1$ of the source space) plus the topology of the space of leaves (target space). Hence this invariant should contain information about the topology of both manifolds involved in σ models. Characteristic classes of foliations may provide the way to calculate the invariant (an analogue of Chern–Weil theory, see [35]). Moreover, in the flat foliation case the invariant describes the topological charge of the M-theory Lagrangian density suggested in [37]. Another application is the following (we thank Dr. S. T. Tsou for pointing this out to us): we know from Polyakov ([18]) that Yang–Mills theories can be formulated as nlσm on the loop space. This point of view very recently exhibited some very nice dualities of the Standard Model, see [44]. Hence the invariant, for the flat foliation case (since nlσm can be thought of as a flat bundle with structure group the isometries of the target space), may be of some relevance also for Yang–Mills theories, the setting however


will involve loop spaces now! We do not know exactly what its physical significance would be. Moreover, doing K-theory on loop spaces (infinite dimensional manifolds) is considerably harder. There are however some path integral techniques (see again [44] and references therein).

2. Instantons with non-simply connected boundary. Following largely the case of instantons, we suggest that this invariant is related to interpolation between gauge inequivalent vacua which exist because the space considered is not simply connected. Clearly there is extra degeneracy of the vacuum coming from the fact that our space is not simply connected. This degeneracy is of different origin from that of instantons since, as is well known, for ordinary instantons the degeneracy comes from the different topologies of the bundle considered. In more concrete terms, we suppose that this invariant will be relevant in the following case: let us try to follow the discussion in the famous BPST paper on instantons [36], but assume that we have a space whose boundary is not just $S^3$, as in that case, but a 3-manifold with non-trivial $\pi_1$. In this case we want the potential to become flat (pure gauge) on the boundary. However, if the boundary is a 3-manifold with a non-trivial fundamental group, then flat connections are not unique (up to gauge equivalence of course). We know more specifically that gauge equivalence classes of flat connections are in 1-1 correspondence with conjugacy classes of representations of the fundamental group into the structure group considered. Thus in this case the flat connection we choose will not be unique. This extra degeneracy of the vacuum comes from the different possible choices of flat connection, which is something noticed here for the first time. The invariant is related to interpolation between these extra vacua.
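To make the counting of these extra vacua concrete, here is a minimal sketch under assumptions that are not in the original text: the boundary is taken to be a lens space, whose fundamental group is cyclic of order p, and the structure group is SU(2). Every homomorphism from Z/p to SU(2) can be conjugated so that the generator goes to diag(z^k, z^-k) with z = exp(2πi/p), and the Weyl group identifies k with −k mod p. The function name is hypothetical.

```python
# Sketch: gauge-inequivalent flat SU(2) connections on a 3-manifold with
# fundamental group Z/p (e.g. a lens space), counted as conjugacy classes
# of homomorphisms Z/p -> SU(2). Hypothetical helper, for illustration only.

def su2_flat_classes(p: int) -> list:
    """Return one label k per conjugacy class of homomorphisms Z/p -> SU(2).

    The generator is sent to diag(z**k, z**-k), z = exp(2*pi*i/p);
    conjugation by the Weyl group identifies k with -k mod p.
    """
    if p < 1:
        raise ValueError("p must be a positive integer")
    return sorted({min(k, (-k) % p) for k in range(p)})


if __name__ == "__main__":
    for p in (1, 2, 5, 8):
        print(p, su2_flat_classes(p))
```

For p = 1 (the case of the boundary S^3) the list has a single entry, the trivial flat connection, while for p > 1 there are several classes, which is the extra degeneracy discussed above.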
We expect then some relation with the so-called ALE gravitational instantons, which are important both in quantum gravity and in gauge theory [39,40].

3. Gravity, Non-Commutative Topological Quantum Field Theories (ncTQFT for brevity). In ordinary Yang–Mills theory, gauge transformations are described as automorphisms of the bundle (namely fibre preserving maps) which induce the identity on the base space (cf. for example [26]). Sometimes these are called strong bundle automorphisms. If one wants to generalise this picture and attempts to include the symmetry of general relativity, namely local diffeomorphisms of the base space, then there is a problem because there are local diffeomorphisms of the base space which cannot be induced by bundle automorphisms (cf. [16]). In simple words: there are “more” local diffeomorphisms of the base space than bundle automorphisms. The way theoretical physicists usually try to get around this problem is to introduce first supersymmetry and then supergravity. The origins of supersymmetry are actually quantum mechanical (multiplets with the same number of bosons and fermions, plus a symmetry between particles of different spin, which should exist if all interactions are eventually unified, since gauge particles have spin 1 whereas the graviton is supposed to have spin 2. Another point of view is to examine the largest possible symmetry of the S-matrix elements in the framework of relativistic quantum field theory on Minkowski space). To make a long story short, based on the Coleman–Mandula theorem (which is responsible for introducing anti-commuting coordinates), what one actually does (following the superspace formalism for N = 1 supersymmetry, coming from the observation that Minkowski space is actually Poincaré/Lorentz) is to enlarge the base manifold (assumed to be spacetime) by adding some fermionic dimensions (non-commuting coordinates), thus obtaining another space, the so-called superspace.
This superspace, in an analogous fashion, can be seen as the quotient space superPoincaré/Lorentz. Supersymmetric Yang–Mills theories


are principal G-bundles over superspace with G some compact and connected Lie group (usually SU(N)), whereas supergravity can be seen as a principal G-bundle over superspace where G is the super-Poincaré group (a generalisation of Einstein–Cartan theory). However, the so-called Noether technique, which makes a rigid supersymmetry local and which is used mainly to construct supersymmetric interacting Lagrangians, actually suggests that supersymmetric Yang–Mills theories can equivalently be seen as principal G-bundles over ordinary spacetime with G some super Lie group.

Supersymmetric theories have met with some severe criticism (e.g. the positive metric assumption), especially when the discussion comes to supergravity. The most important experimental problem is that none of the superpartners of particles has ever been observed; the way phenomenologists try to overcome this problem is to assume spontaneous breaking of supersymmetry. Two of the main theoretical problems are the following. First, the view of supergravity as a local gauge theory is not completely justified mathematically: for example, in the N = 8, D = 4 supergravity theory coming from D = 11, N = 1 supergravity (which is supposed to be the best candidate for unification and, according to recent progress, one of the two low energy limits of M-theory), local diffeomorphisms are supposed to come from gauging the group O(8), an assumption based on the observation that ordinary gravity comes from gauging the Poincaré group, something which is wrong because of the existence of the soldering form on arbitrary curved manifolds. Second, most of the extended supersymmetric and supergravity theories are up to now formulated only “on-shell”, namely they are essentially classical theories. For N = 1 supergravity there is a different problem: one has more than one “off-shell” formulation, a fact which now has an explanation from the recently observed string/5-brane duality in D = 10 (old brane-scan). To be fair, we must mention that the good features of such theories are that they offer probably the only hope for unification known up to now, and that they give “less divergent” theories, something essential for perturbative quantum field theories.

Leaving all this aside, we would like to propose here another approach, more in the spirit of non-perturbative quantum field theories, in fact topological quantum field theories: instead of enlarging the base manifold by considering anti-commuting coordinates, we choose to relax the fibre-preserving condition, meaning that we now allow “bundle” maps which are not fibre-preserving; in such a case fibres may be “mixed up”, for example they may be “tilted” or “broken”. The resulting structure, after applying these more general transformations to our original bundle, may no longer be a fibre bundle, but it will still be a foliation. In this case, however, what we get as the quotient space will not necessarily be the manifold we had originally in our bundle construction (supposed to be space-time) but another space of leaves with the same dimension and maybe very different topology. Here the dimension of the leaves is kept fixed, equal to the dimension of the Lie algebra considered; had we changed that, the dimension of the space of leaves would have changed accordingly. This picture is quite close to the one that string theorists advocate, namely that space-time is not fixed but emerges as a ground state from some dynamical process. In fact there is a deep result due to Thurston which, for a given manifold M, relates the group of local diffeomorphisms of M to the group of foliations of M [3]. In particular Thurston proves that one has an isomorphism in cohomology, after some shift in the degrees, between the classifying space of local diffeomorphisms and the classifying space of foliations of a closed manifold M.
If we take M to be the total space of a principal bundle P over spacetime, then obviously local diffeomorphisms of the base space are included in local diffeomorphisms of the total space, which in turn are


very closely related to the group of foliations. Hence, at least in principle, looking at the group of foliations of the total space of a principal bundle provides a framework which is rich enough to incorporate local diffeomorphisms of the base space, something we need in order to relate the symmetries of general relativity to those of Yang–Mills theory, and this framework is mathematically rigorous. The group of all foliations of the total space with fixed codimension is then huge. It definitely contains all foliations which are “regular” enough to yield manifolds diffeomorphic to our original one. Yet foliations can be really nasty: in this case the quotient space may not be a manifold at all but a “quantum” topological space. All these cases need to be studied. For the moment we know that whenever the foliations have a corresponding C*-algebra which is SME to a commutative one, then the space of leaves will be a compact Hausdorff topological space of the same dimension. If the C*-algebras of two foliations can be related with a *-preserving homomorphism, then the corresponding quotient spaces will be homeomorphic. What the appropriate condition on the C*-algebras is in order to get diffeomorphic manifolds, we do not know (this point is of particular interest in dimension 4 due to the existence of the so-called “exotic structures” for 4-manifolds). The main point here is that we can “control” how much non-commutativity we want in the C*-algebra and then see what this means topologically. At this point we would like to recall that mathematically, going from classical to quantum physics is going from commuting algebras to non-commuting ones. The essence of Planck’s constant then is that it tells us “how much” non-commutativity we want.
Moreover there is the fundamental theorem for representations of C*-algebras, namely that for each C*-algebra (commutative or not) there exists a Hilbert space on which the algebra is faithfully represented, so that it is “the same” as a norm-closed *-subalgebra of the bounded operators on that space. Hence for each foliation there exists a corresponding C*-algebra (commutative or not), a corresponding topological space (the space of leaves, which may be a manifold or a quantum topological space respectively) and, finally, a Hilbert space as a representation space! Let us now turn to something related to the above but a little more concrete: for the moment let us consider the case where the dimensions of the leaves and of the space of leaves are kept fixed. This situation has some similarities with quantum gravity seen as a TQFT (in fact we generalise that picture and present a way to consider unified theories, namely gravity and Yang–Mills theories, as Non-Commutative Topological Quantum Field Theories). In [39] it was argued that TQFT may provide a framework which is rich enough for the development of a quantum theory of gravity. In that approach, space-time was treated as an unquantized object whereas the metric was quantum mechanical. The idea in the TQFT framework is to find an invariant Z(M) for a topological space M and then to seek a Lagrangian density whose partition function yields the invariant Z(M), see [34]. One has to be a little more careful though in order for Atiyah’s axioms for TQFT to be satisfied [14]. In quantum mechanics one usually has a space of quantum states associated to a given system. Often this space of states refers to a particular instant of time, which can be represented in a 4-dimensional world by a space-like hypersurface. In TQFT this vector space appears as part of the definition, when the space-time M has a boundary, i.e. a space of dimension 1 less.
In more concrete terms, to a (d − 1)-dim space $\Sigma$ we associate a vector space $V(\Sigma)$ and to each d-dim space M with $\partial M = \Sigma$ we associate a vector Z(M), the partition function of the space M. This point can be generalised; in fact $\Sigma$ can be any embedded submanifold of M with dimension 1 less. One interpretation of these conditions is that $\Sigma$ represents the “present instant” of time and that the vectors in $V(\Sigma)$ which are determined by various choices of observables represent a memory of past facts. The primary problem nonetheless is


the construction of invariants for spaces; the state spaces and partition functions for spaces with boundary are usually obtained as a by-product. Since by our proposal above one can end up with quantum topological spaces as spaces of leaves of foliations, one can call this theory non-commutative topological quantum field theory, and we believe that it can provide a framework for quantum unified theories (including Yang–Mills and gravity). The picture we have then is the following: we start with a G-bundle P over a 4-manifold M. From symmetry considerations, namely because we want to include local diffeomorphisms of the base space and relate them to bundle automorphisms (hence relating general relativity to gauge theory), we end up considering all (dim G)-dimensional foliations of P. Automatically the 4-dim space which is the space of leaves is somehow “quantized”, namely it is forced to have one of the leaf topologies. This is a difference from TQFT as explained in [39], where the metric was quantised but space-time was unquantized (needless to say, in a case such as ours the metric is automatically quantised too). Moreover, for each foliation we have a quotient space of leaves and hence an invariant Z(M) which is a complex number. The boundary can of course be added with its vector space attached to it; one however has to examine what happens to it as foliations vary. The Lagrangian density whose partition function is this invariant for foliations is an open question. It should be related to characteristic classes for foliations. A good indication for that is the fact that the $\Gamma_q$ functor of q-dim Haefliger structures, or $\Gamma_q$ structures as they are known in topology (and hence of foliations, which are an example of a $\Gamma_q$ structure, where q is the codimension of the foliation), is representable, see [13]. So for the moment we do not have a “full” specific ncTQFT, since we have the invariant but not the appropriate Lagrangian density.
We presented though a generalisation of TQFT for non-commutative cases. Another possible application might be the ability to construct deformed Yang–Mills theories, see [43]. In that paper, some new compactifications of the IKKT matrix theory on non-commutative tori were introduced which, in a certain sense, could be realised as deformed Yang–Mills theories. Clearly in this case our invariant will be the “instanton number” of these deformed Yang–Mills theories. This picture also suggests that the above described non-commutative topological quantum field theories can be seen as emerging from M-theory compactified down to some non-commutative spaces (tori or other).

7. M-Theory

In this section we shall present an application to M-Theory. Since it is more extensive, we give it separately. We know that M-Theory consists of membranes and 5-branes living on an 11-manifold ([11,12]) and that it is non-perturbative. This theory has a very intriguing feature: we can only extract information about it from its limiting theories, namely either from D = 11, N = 1 supergravity or from superstrings in D = 10. This is so because this theory is genuinely non-perturbative for a reason which lies at the heart of manifold topology: let us recall that in string theory the path integral involves summation over all topologically distinct diagrams (the same holds for point particles of course). Strings are 1-branes, hence in time they sweep out a 2-manifold. At the tree level then we need all topologically distinct simply connected 2-manifolds (actually there is only one, as topology tells us) and for loop corrections, topology again says that topologically distinct


non-simply connected 2-manifolds are classified by their genus, so we sum over all Riemann surfaces of different genus. It is clear then that for a perturbative quantum field theory involving p-branes we have to sum over all topologically distinct (p+1)-dim manifolds: simply connected ones for tree level and non-simply connected ones for loop corrections. Thus we must know beforehand the topological classification of manifolds in the dimension of interest. That is the main problem of manifold topology in mathematics. But now we face a deep and intractable problem: geometry tells us, essentially via a no-go theorem which is due to Whitehead from the late ’40s, that “we cannot classify non-simply connected manifolds of dimension greater than or equal to 4”! Hence for p-branes with p greater than or equal to 3, all we can do via perturbative methods is go up to tree level! What happens for 3-manifolds then (hence for membranes)? The answer from mathematics is that we do not know if all 3-manifolds can be classified! So even for 2-branes it is still unclear whether perturbative methods work (to all orders of perturbation theory)! The way out of this situation that we propose here is not merely to look only at non-perturbative aspects of these theories (i.e. the soliton part of the theory) and then apply S-duality, as was done up to now, but to abandon perturbative methods completely from the very beginning. There is only one way known up to now which can achieve this “radical” solution to our problem: formulate the theory as a Topological Quantum Field Theory and hence get rid of all perturbations once and for all. Let us explain how this can be achieved. Our approach is based on one physical “principle”: A theory containing p-branes should be formulated on an m-dim manifold which admits $\Gamma_q$-structures, where q = m − p − 1. N.B.
Although we used in our physical principle $\Gamma_q$-structures, which are more general than foliations, we shall use both these terms to mean essentially the same structure. The interested reader may refer to [2] for example to see the precise definitions, which are quite complicated. The key point however is that the difference between $\Gamma_q$-structures (or Haefliger structures as they are most commonly known in topology) and codim-q foliations is essentially the difference between transverse and normal. This does not affect any of what we have to say, since the Bott–Haefliger theory of characteristic classes is formulated for the most general case, namely $\Gamma$-structures. We would also like to mention the relation between $\Gamma$-structures and $\Omega$-spectra, which is currently an active field in topology. (For D-branes we need a variant of the above principle, namely we need what are called plane foliations, but we shall not elaborate on this point here.) One way of thinking about this principle is that it is analogous to the “past histories” approach of quantum mechanics. Clearly on the quantum level one should integrate over all foliations of a given codim. A warning here: this principle does not imply that all physical processes between branes are described by foliations. Although the group of foliations is huge, in fact comparable in size with the group of local diffeomorphisms [3], and foliations can be really “very nasty”, we would not like to make such a strong statement. What is definitely true though is that some physical processes are indeed described by foliations, hence at least this condition must be satisfied because of them.


Note. Before going further, we would like to make one crucial remark: this principle puts severe restrictions on the topology that the underlying manifold may have; in the case of M-Theory this is an 11-manifold. It is also very important whether the manifold is open or closed. This may be of some help, as we hope, for the compactification problem of string theory or even M-Theory, namely how we go from D = 10 (or D = 11) to D = 4, which is our intuitive dimension of spacetime. We shall address this question in the next section. The final comment is this: this principle puts absolutely no restriction on the usual quantum field theory for point particles in D = 4, e.g. electroweak theory or QCD. This is so because in this case spacetime is just $\mathbb{R}^4$, which is non-compact, and we have 0-branes (point particles) and consequently 1-dim foliations, for which the integrability condition is trivially satisfied (essentially this is due to a deep result of Gromov for foliations on open manifolds, which states that all open manifolds admit codim 1 foliations; in striking contrast, closed manifolds admit codim 1 foliations iff their Euler characteristic is zero, see for example [2,4] and references therein). If we believe this principle, then the story goes on as follows: we are on an 11-manifold, call it M for brevity, and we want to describe a theory containing 5-branes for example (and get membranes from S-duality). Then M should admit 6-dim foliations or equivalently codim 5 foliations. We know from Haefliger that the $\Gamma_q$-functor, namely the functor of codim q Haefliger structures and in particular codim q foliations, is representable. Practically this means that we can have an analogue of Chern–Weil theory which characterises foliations of M up to homotopy using cohomology classes of M.
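The closed-manifold criterion quoted in the Note above (a closed manifold admits a codim 1 foliation iff its Euler characteristic vanishes) is easy to test on examples once the Betti numbers are known. The following minimal sketch simply evaluates that criterion; the function names and the choice of examples are illustrative assumptions, not part of the original text.

```python
# Sketch: evaluate the criterion "chi(M) = 0" for the existence of a
# codimension-1 foliation on a closed manifold M, given its Betti numbers.
# Hypothetical helper names, for illustration only.

def euler_characteristic(betti):
    """Alternating sum of the Betti numbers b_0, b_1, ..., b_n."""
    return sum((-1) ** i * b for i, b in enumerate(betti))


def closed_manifold_admits_codim1_foliation(betti):
    """Criterion quoted in the text: chi(M) == 0."""
    return euler_characteristic(betti) == 0


# Betti numbers (b_0, ..., b_n) of some closed manifolds.
EXAMPLES = {
    "S^2": [1, 0, 1],
    "T^2": [1, 2, 1],
    "S^3": [1, 0, 0, 1],
    "S^4": [1, 0, 0, 0, 1],
}

if __name__ == "__main__":
    for name, betti in EXAMPLES.items():
        print(name, euler_characteristic(betti),
              closed_manifold_admits_codim1_foliation(betti))
```

Since every closed odd-dimensional manifold has vanishing Euler characteristic, the criterion is automatically satisfied in odd dimensions, in particular for closed 11-manifolds; the restriction is only felt in even dimensions.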
(One brief comment on foliations: one way of describing Haefliger structures more generally is to say that they generalise fibre bundles in exactly the same way that fibre bundles generalise the Cartesian product. This observation is also important when mentioning gerbes later on.) In fact it is proved that the correct cohomology to classify Haefliger structures up to homotopy (and hence foliations, which constitute a particular example of Haefliger structures) is the Gelfand–Fuchs cohomology. This is a result of Bott and Haefliger, essentially generalising an earlier result due to Godbillon and Vey which dealt only with codim 1 foliations [5]. Now we have a happy coincidence: the Bott–Haefliger class for a codim 5 foliation (which, recall, is what we want for 5-branes on an 11-manifold) is exactly an 11-form, something that fits well with using it as a Lagrangian density! The construction for arbitrary codim q foliations goes as follows: let F be a codim q foliation on an m-manifold M and suppose its normal bundle $\nu(F)$ is orientable. Then F is defined by a global decomposable q-form $\Omega$. Let $\{(U_i, X_i)\}_{i \in I}$ be a locally finite cover of M by distinguished coordinate charts with a smooth partition of unity $\{\rho_i\}$. Then set

$$\Omega = \sum_{i \in I} \rho_i \, dx_i^{m-q+1} \wedge \cdots \wedge dx_i^{m}.$$

Since $\Omega$ is integrable,

$$d\Omega = \theta \wedge \Omega, \tag{5}$$

where $\theta$ is some 1-form on M. The (2q+1)-form

$$\gamma = \theta \wedge (d\theta)^q \tag{6}$$

is closed and its de Rham cohomology class is independent of all choices involved in defining it, depending only on the homotopy type of F. That is the class we want.
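The degree bookkeeping behind the “happy coincidence” can be spelled out in a few lines. The sketch below combines the rule q = m − p − 1 from the physical principle with the degree 2q + 1 of the class γ = θ ∧ (dθ)^q, and checks when γ is a top-degree form on the m-manifold. The helper names are hypothetical, and the code does nothing more than this arithmetic.

```python
# Sketch: degree count for the class gamma = theta ^ (d theta)^q.
# q = m - p - 1 is the codimension required by the principle stated in the
# text for p-branes on an m-manifold; gamma has degree 2q + 1.
# Hypothetical helper names, for illustration only.

def foliation_codimension(m: int, p: int) -> int:
    """Codimension q = m - p - 1 (the leaves have dimension p + 1)."""
    if not 0 <= p < m:
        raise ValueError("need 0 <= p < m")
    return m - p - 1


def bhgv_degree(q: int) -> int:
    """Degree of gamma = theta ^ (d theta)^q, i.e. 2q + 1."""
    return 2 * q + 1


def class_is_top_form(m: int, p: int) -> bool:
    """True when gamma has degree m, so it can be integrated over M."""
    return bhgv_degree(foliation_codimension(m, p)) == m


if __name__ == "__main__":
    for m, p in [(11, 5), (11, 2), (10, 1), (3, 1)]:
        q = foliation_codimension(m, p)
        print(f"m={m}, p={p}: q={q}, deg(gamma)={bhgv_degree(q)}, "
              f"top form: {class_is_top_form(m, p)}")
```

Solving 2(m − p − 1) + 1 = m gives m = 2p + 1, so 5-branes single out m = 11, whereas for membranes on an 11-manifold the degree 17 exceeds the dimension and the form vanishes identically.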


Clearly for our case we are on an 11-manifold dealing with 5-branes, hence 6-dim foliations, hence codim 5, and thus the class γ is an 11-form. This construction can be generalised to arbitrary $\Gamma_q^r$-structures as a mixed de Rham–Čech cohomology class and thus gives an element in $H^{2q+1}(B\Gamma_q^r; \mathbb{R})$, where $B\Gamma_q^r$ is the classifying space for $\Gamma_q^r$-structures. Note that in fact the BHGV class is a cobordism invariant of codim q foliations of compact (2q+1)-dim manifolds. This construction gives one computable characteristic class for foliations. Optimally we would like a generalisation of the Chern–Weil construction for $GL_q$. That is, we would like an abstract GDA with the property that for any codim q foliation F on a manifold M there is a GDA homomorphism into the de Rham algebra on M, defined in terms of F, such that the induced map on cohomology factors through a universal map into $H^*(B\Gamma_q^r; \mathbb{R})$. This algebra is nothing more than the Gelfand–Fuchs Lie coalgebra of formal vector fields in one variable. More concretely, let $\Gamma$ be a transitive Lie pseudogroup acting on $\mathbb{R}^n$ and let $a(\Gamma)$ denote the Lie algebra of formal $\Gamma$ vector fields associated to $\Gamma$. Here a vector field defined on $U \subset \mathbb{R}^n$ is called a $\Gamma$ vector field if the local 1-parameter group which it engenders is in $\Gamma$, and $a(\Gamma)$ is defined as the inverse limit $a(\Gamma) = \varprojlim a^k(\Gamma)$ of the k-jets at 0 of $\Gamma$ vector fields. In the pseudogroup $\Gamma$ let $\Gamma_0$ be the set of elements of $\Gamma$ keeping 0 fixed and set $\Gamma_0^k$ equal to the k-jets of elements in $\Gamma_0$. Then the $\Gamma_0^k$ form an inverse system of Lie groups and we can find a subgroup $K \subset \varprojlim \Gamma_0^k$ whose projection on every $\Gamma_0^k$ is a maximal compact subgroup for k > 0. This follows from the fact that the kernel of the projection $\Gamma_0^{k+1} \to \Gamma_0^k$ is a vector space for k > 0. The subgroup K is unique up to conjugation and its Lie algebra $\mathfrak{k}$ can be identified with a subalgebra of $a(\Gamma)$.
For our purposes we need the cohomology of basic elements rel K in $a(\Gamma)$, namely $H(a(\Gamma); K)$, which is defined as follows: let $A\{a^k(\Gamma)\}$ denote the algebra of multilinear alternating forms on $a^k(\Gamma)$ and let $A\{a(\Gamma)\}$ be the direct limit of the $A\{a^k(\Gamma)\}$. The bracket in $a(\Gamma)$ induces a differential on $A\{a(\Gamma)\}$ and we write $H\{a(\Gamma)\}$ for the resulting cohomology group. The relative group $H^*(a(\Gamma); K)$ is now defined as the cohomology of the subcomplex of $A\{a(\Gamma)\}$ consisting of elements which are invariant under the natural action of K and annihilated by all inner products with elements of $\mathfrak{k}$. Then the result is: let F be a $\Gamma$-foliation on M. There is an algebra homomorphism $\varphi : H\{a(\Gamma); K\} \to H(M; \mathbb{R})$ which is a natural transformation on the category $C(\Gamma)$. The construction of $\varphi$ is as follows: let $P^k(\Gamma)$ be the differentiable bundle of k-jets at the origin of elements of $\Gamma$. It is a principal $\Gamma_0^k$-bundle. On the other hand $\Gamma$ acts transitively on the left on $P^k(\Gamma)$. Denote by $A(P^\infty(\Gamma))$ the direct limit of the algebras $A(P^k(\Gamma))$ of differential forms on $P^k(\Gamma)$. The forms invariant with respect to the action of $\Gamma$ constitute a differential subalgebra denoted $A_\Gamma$. One can then prove that it is actually isomorphic to $A(a(\Gamma))$. Now let F be a foliation on M and let $P^k(F)$ be the differentiable bundle over M whose fibre at every point, say $x \in M$, is the space of k-jets at this point of local projections that vanish on x. This is a principal $\Gamma_0^k$-bundle. Its restriction is isomorphic to the inverse image of the bundle $P^k(\Gamma)$, hence the differential algebra of $\Gamma$-invariant


forms on $P^k(\Gamma)$ is mapped into the algebra $A(P^k(F))$ of differential forms on $P^k(F)$. If we denote by $A(P^\infty(F))$ the direct limit of $A(P^k(F))$ we get an injective homomorphism φ of $A(a(\Gamma))$ in $A(P^\infty(F))$ commuting with the differential. This homomorphism is compatible with the action of K, hence induces a homomorphism on the subalgebra of K-basic elements. But the algebra $A(P^k(F); K)$ of K-basic elements in $A(P^k(F))$ is isomorphic to the algebra of differential forms on $P^k(F)/K$, which is a bundle over M with contractible fibre $\Gamma_0^k/K$. Hence $H(A(P^k(F); K))$ is isomorphic via the de Rham theorem to $H(M;\mathbb{R})$. The homomorphism φ is therefore obtained as the composition $H(a(\Gamma); K) \to H(A(P^\infty(F); K)) = H(M;\mathbb{R})$. But we think that is enough of the abstract nonsense formalism. Let us make our discussion more down to earth: Consider the GDA (over $\mathbb{R}$) $WO_q = \wedge(u_1, u_3, \ldots, u_{2\lceil q/2\rceil - 1}) \otimes P_q(c_1, \ldots, c_q)$ with $du_i = c_i$ for odd i and $dc_i = 0$ for all i, and $W_q = \wedge(u_1, u_2, \ldots, u_q) \otimes P_q(c_1, \ldots, c_q)$ with $du_i = c_i$ and $dc_i = 0$ for i = 1, ..., q, where $\deg u_i = 2i - 1$, $\deg c_i = 2i$, ∧ denotes the exterior algebra, and $P_q$ denotes the polynomial algebra in the $c_i$'s modulo elements of total degree greater than 2q. The cohomology of $W_q$ is the Gelfand–Fuchs cohomology of the Lie algebra of formal vector fields in q variables. We note that the ring structure at the cohomology level is trivial, that is, all cup products are zero. Then the main result is that there are homomorphisms

$$\varphi : H^*(WO_q) \to H^*(B\Gamma_q^r;\mathbb{R}), \qquad \tilde\varphi : H^*(W_q) \to H^*(B\tilde\Gamma_q^r;\mathbb{R})$$

for r ≥ 2 with the following property ($B\tilde\Gamma_q^r$ denotes the classifying space for framed foliations): If F is a codim q $C^r$ foliation of a manifold M, there is a GDA homomorphism $\varphi_F : WO_q \to \wedge^*(M)$ into the de Rham algebra on M, defined in terms of the differential geometry of F and unique up to chain homotopy, such that on cohomology we have $\varphi_F = f^* \circ \varphi$, where $f : M \to B\Gamma_q^r$ classifies F. If the normal bundle of F is trivial, there is a homomorphism $\tilde\varphi_F : W_q \to \wedge^*(M)$ with analogous properties. Combining this result with the fact that $B\tilde\Gamma_q^0$ is contractible, we deduce that a foliation is essentially determined by the structure of its normal bundle; the Chern classes of the normal bundle are contained in the image of the map φ above, but we have additional non-trivial classes in the case of foliations (which are rather difficult to find though). One of these is the BHGV class, which we constructed explicitly; it is the class we use as a purely topological Lagrangian density, since its degree fits nicely for describing 5-branes. There is an alternative approach due to Simons [9] which avoids passing to the normal bundle by using circle coefficients. What he actually does is to associate to a principal bundle


with connection a family of characteristic homomorphisms from the integral cycles on a manifold to $S^1$ and then defines an extension denoted $K_q^{2k}$ of $H^{2k}(BGL_q;\mathbb{Z})$. This approach is related to gerbes. A gerbe over a manifold is a construction which locally looks like the Cartesian product of the manifold with a line bundle. Clearly it is a special case of foliations (remember our previous comment on foliations). However, this approach actually suggests that they might be equivalent, if the approach of Bott–Haefliger is equivalent to that of Simons, something which is not known. Now the conjecture is that the partition function of this Lagrangian is related to the invariant introduced in [38]. In order to establish the relation with physics, we must make some identifications. The 1-form θ appearing in the Lagrangian has no direct physical meaning. In physics it is assumed that a 5-brane gives rise to a 6-form gauge field denoted $A_6$ whose field strength is simply

$$dA_6 = F_7. \tag{7}$$

The only way we can explain this geometrically is that this 6-form is the Poincaré dual of the 6-chain that the 5-brane sweeps out as it moves in time. We know that since we have S-duality between membranes and 5-branes, in an obvious notation one has

$$F_7 = *F_4, \tag{8}$$

which is the S-duality relation, where

$$F_4 = dA_3. \tag{9}$$

Observe now that the starting point for 5-brane theory is $A_6$, whereas the starting point to construct the BHGV class was the 5-form Ω. How are they related? There are three obvious possibilities:

I. $d\Omega = A_6$. That would imply that $A_6$ is pure gauge.

II. $dF_4 = \Omega$. This is trivial because it implies $d\Omega = 0$, hence $d\Omega = \theta \wedge \Omega = 0$.

III. The only remaining possibility is

$$*A_6 = \Omega. \tag{10}$$

We call this the “reality condition”. So now in principle we can substitute Eqs. (10) and (5) into (6) and get an expression for the Lagrangian which involves the gauge field $A_6$. The Euler–Lagrange equations, which are actually analogous to the D=11 N=1 supergravity Euler–Lagrange equations (see Eq. (12) below), read:

$$d * d\theta + \frac{1}{5}(d\theta)^5 = 0. \tag{11}$$

The on-shell relation with D=11 N=1 supergravity is established as follows: recall that the bosonic sector of this supergravity theory is

$$\int F_4 \wedge F_4 \wedge A_3,$$

where $F_4 = dA_3$, with Euler–Lagrange equations

$$d * F_4 + \frac{1}{2} F_4 \wedge F_4 = 0. \tag{12}$$

Constraining A3 via (12), by (9), (8), (7), (10) and (5) we get a constraint for θ which can be added to the class γ as a Lagrange multiplier. In order to calculate the partition function, some additional difficulties may arise because we do not know which notion of equivalence between foliations is the appropriate one for physics in order to fix the gauge and add Faddeev–Popov terms as constraints to kill off the gauge freedom. There are actually four different notions of equivalence for foliations: conjugation, homotopy, integrable homotopy and foliated cobordism. In principle, one must end up with an equivalent theory starting with membranes (because of S-duality), provided of course a suitable class is found. Clearly the BHGV class for a membrane would be a 17-form. The final comment refers to [45]. In that article it was conjectured that the quantum mechanics of branes could be described as a matrix model. As is well known, matrix models use point particle degrees of freedom. This is rather intriguing since we are talking about M-theory, which contains various p-branes. In our approach, though, we propose a Lagrangian density which has as a fundamental object a mysterious 1-form which, if seen as a gauge potential, would imply the existence of some yet unknown underlying point particle!
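The degree counting used in this section (an 11-form for 5-branes, a 17-form for membranes) can be checked with a few lines of arithmetic. The following is only a sketch under the assumption, taken from the text, that a p-brane worldvolume is a (p+1)-dimensional leaf and that the BHGV class of a codim q foliation has degree 2q+1; the function name `bhgv_degree` is hypothetical.

```python
def bhgv_degree(manifold_dim, brane_p):
    """Return (codimension q, degree 2q + 1) for a p-brane foliation.

    Assumption from the text: a p-brane sweeps out a (p + 1)-dimensional
    worldvolume, which is taken as a leaf, so q = manifold_dim - (p + 1).
    """
    leaf_dim = brane_p + 1
    q = manifold_dim - leaf_dim
    if q <= 0:
        raise ValueError("worldvolume must have positive codimension")
    return q, 2 * q + 1


# D = 11: 5-branes and membranes (2-branes), as discussed in the text.
for name, p in [("5-brane", 5), ("membrane", 2)]:
    q, degree = bhgv_degree(11, p)
    print(f"{name}: codim {q}, BHGV class is a {degree}-form")
```

For the 5-brane the degree equals the dimension of the manifold, which is why the class can serve as a Lagrangian density there; for the membrane the degree 17 exceeds 11, which is why a different suitable class would be needed.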

7.1. Plane fields. We now pass on to the second question raised in this application, namely the restrictions on the topology of the underlying manifold of a theory containing p-branes via our physical principle. It is clear from the definition that the existence of a foliation of certain dim, say d (or equivalently codim q=n-d) on an n-manifold (closed) depends: a) On the existence of a dim d subbundle of the tangent bundle, b) On this d-dim subbundle being integrable. The second question has been answered almost completely by Bott and in a more general framework by Thurston. Bott’s result dictates that for a codim q subbundle of the tangent bundle to be integrable, the ring of Pontrjagin classes of the subbundle with degree > 2q must be zero. There is a secondary obstruction due to Shulman involving certain Massey triple products but we shall not elaborate on this. However Bott’s result suggests nothing for question a) above. Let us also mention that this result of Bott can be deduced by another theorem due to Thurston which states that the classifying space B 0˜q∞ of smooth codim q framed foliations is (q+1)-connected. On the contrary, Thurston’s result reduces the existence of codim q > 1 foliations (at least up to homotopy) to the existence of q-plane fields. This is a deep question in differential topology, related to the problem of classification of closed manifolds according to their rank. Now the problem of existence of q-plane fields has been answered only for some cases for spheres S n for various values of n,q [6]. In particular we know everything for spheres of dimension 10 and less. We should however mention a theorem due to Winkelnkemper [7] which is quite general in nature and talks about simply connected compact manifolds of dim n greater than 5. If n is not 0 mod 4 then it admits a so-called Alexander decomposition which under special assumptions can give a particular kind of


a codim 1 foliation with S 1 as the space of leaves and a surjection from the manifold to S1. If n is 0 mod 4 then the manifold admits an Alexander decomposition iff its signature is zero. Let us return to string theory now: String theory works in D=10 and in this case we have the old brane-scan suggesting the string/5-brane duality. The new brane-scan contains all p-branes for p ≤ 6 and some D-7 and 8-branes are thought to exist. However topology says that for a sphere in dim 10 we can have only dim 0 and dim 10 plane fields (in fact this is true for all even dim spheres), hence by Thurston only dim 0 and dim 10 foliations and then our physical principle suggests that S 10 is ruled out as a possible underlying topological space for string theory. What about M-Theory in D = 11 then? For the case of S 11 then it is known that S 11 admits a 3-plane field, hence by our physical principle a theory containing membranes can be formulated on S 11 . For S 11 nothing is known for the existence of q-plane fields for q greater than 3. But now we apply S-duality between membranes/5-branes and conjecture that: S 11 should admit 5-plane fields. Let us close with two final remarks: 1. There is extensive work in foliations with numerous results which actually insert many extra parameters into their study, for example metric aspects, existence of foliations with compact leaves (all or at least one or exactly one), with leaves diffeomorphic to Rn for some n, etc. We do not have a clear picture for the moment concerning imposing these in physics. Let us only mention one particularly strong result due to Wall generalising a result of Reeb [8]: if a closed n-manifold admits a codim 1 foliation whose leaves are homeomorphic to Rn−1 , then by Thurston we know that its Euler characteristic must vanish, but in fact we have more: it has to be the n-torus! 
The interesting point however is that although all these extended object theories in physics are expressed as σ models [37], hence they involve metrics on the manifold (target space) and on the worldvolumes i.e. on the leaves, in our approach the metric is only used in the reality condition (10) which makes connection with physical fields (that is some metric on the target space) where at the same time we do not use any metric on the source space (worldvolumes-leaves of the foliation). 2. In [37] another Lagrangian density was proposed. It is different from the one described here but they are related in an analogous way to the relation between the Polyakov and Nambu-Goto (in fact Dirac [10]) actions for the free bosonic string: extended objects basically imitate string theory and we have two formalisms: the σ model one which is the Lagrangian exhibited in [37] using Polyakov’s picture of σ models as flat principal bundles with structure group the isometries of the metric on the target space [18]; yet we also have the embedded surface picture which is the Dirac (Nambu–Goto) action and whose analogue is described in this work. In the light of very recent work [34], we can also make some further comments: the first is that the Moyal algebra used in order to discuss noncommutative solitons is actually Morita equivalent to the usual commutative one. This fact can be further verified from the explicit construction of an algebra homomorphism between noncommutative and ordinary Yang–Mills fields based on Gelfand’s theorem. Truly noncommutative situations appear when discussing noncommutative tori. The next comment refers to the last section of that paper: we already know that strings in a constant magnetic field can be described from that Moyal algebra-like spacetime structure and they also discuss what may happen to 5-branes in M-theory. Our point of


view coincides with theirs in the following way: we here propose that this is indeed the case for 5-branes with a C-field turned on, namely that this situation can be described by 6-dim foliations which rather correspond to a “free” theory of branes but on a “noncommutative” topological space, which is actually, in our case, the space of leaves of the corresponding foliation. Note added in proof. The partition function of the BHGV class should be expressed in terms of the Ray-Singer analytic torsion, see [46] where abelian Chern–Simons theory was discussed. One important point here is that considering the full non-abelian 11-dim Chern–Simons theory, apart from many complications, one will probably also have to abandon S-duality since the Hodge isomorphism holds only for real-valued forms and with mild assumptions on the holonomy can in fact be extended to flat bundle valued forms (with non-abelian structure group, see [47]). We hope to report more on this elsewhere. A crucial observation is that for a 3-manifold the BHGV class gives a topological quantum field theory which coincides with the abelian version of the Jones– Witten theory. Acknowledgement. I would like to thank John Roe for useful discussions.

References

1. Bott, R. and Haefliger, A.: Characteristic classes of Γ-foliations. Bull. Am. Math. Soc. 78.6 (1972)
2. Lawson, H.B.: Foliations. Bull. Am. Math. Soc. 80.3 (1974)
3. Thurston, W.: Theory of foliations of codim greater than 1. Comment. Math. Helvetici 49, 214–231 (1974); Thurston, W.: Foliations and groups of diffeomorphisms. Bull. Am. Math. Soc. 80.2 (1974)
4. Gromov, M.L.: Stable mappings of foliations into manifolds. Izv. Akad. Nauk. USSR Ser. Mat. 33 (1969)
5. Godbillon, C. and Vey, J.: Un invariant des feuilletages de codim 1. C. R. Acad. Sci. Paris Ser AB 273 (1971)
6. Steenrod, N.: The topology of fibre bundles. Princeton, NJ: Princeton Univ. Press, 1951
7. Winkelnkemper, H.E.: Manifolds as open books. Bull. Am. Math. Soc. 79 (1973)
8. Reeb, G.: Feuilletages, résultats anciens et nouveaux. Montreal, 1982
9. Simons, J.: Characteristic forms and transgression. Preprint, SUNY Stony Brook
10. Dirac, P.A.M.: Proc. Roy. Soc. London A166, (1969)
11. Duff, M.J. et al.: String Solitons. Phys. Rep. 259, 213 (1995); Duff, M.J.: Supermembranes. hep-th 9611203
12. West, P.C.: Supergravity, brane dynamics and string dualities. hep-th 9811101
13. Bott, R.: Lectures on characteristic classes and foliations. Berlin–Heidelberg–New York: Springer LNM 279 1972
14. Atiyah, M.F.: Topological quantum field theories. Publ. Math. IHES 68, 175–186 (1989); The geometry and the physics of knots. Cambridge: Cambridge University Press, 1990
15. Wegge-Olsen: K-Theory and C*-algebras. Oxford: Oxford University Press, 1992
16. Shardanashvili, G., Zakharov, O.: Gauge Gravitation Theory. Singapore: World Scientific, 1991
17. Hirzebruch, F.: Geometrical methods in Topology. Berlin: Springer, 1970
18. Polyakov, A.: Nucl. Phys. B164 (1981); Phys. Lett. 82B (1979); hep-th/9607049 Princeton preprint 1996
19. Hector, G., Hirsch, U.: Introduction to the Geometry of Foliations. Vieweg, 1981
20. Connes, A.: A survey of foliations and operator algebras. Proc. Sympos. Pure Math. 38, Providence, RI: AMS 1982
21. Baum, P., Douglas, R.: K-homology and Index theory. ibid.
22. Connes, A.: Non-Commutative Geometry. London–New York: Academic Press, 1994
23. Connes, A.: Noncommutative differential Geometry I, II. IHES Publ. Math. 63 (1985)
24. Connes, A.: Cyclic cohomology and the transverse fundamental class of a foliation. Pitman Res. Notes in Math. 123, London: Longman Harlow, 1986
25. Townsend, P., Freedman, D.: Nucl. Phys. B177, 282 (1981)
26. Donaldson, S.K., Kronheimer, P.B.: The geometry of 4-manifolds. Oxford: Oxford University Press, 1991
27. Connes, A., Skandalis, G.: The longitudinal index theorem for foliations. Publ. Res. Inst. Sci. Kyoto 20 (1984)
28. Atiyah, M.F., Patodi, V.K., Singer, I.M.: Spectral asymmetry and Riemannian geometry I, II, III. Math. Proc. Cambridge Philos. Soc. 77 (1975)
29. Winkelnkemper, H.E.: The graph of a foliation. Ann. Global Anal. and Geom. 1 No. 3, 51 (1983)
30. Whitehead, G.W.: Elements of Homotopy Theory. Berlin: Springer, 1978
31. Douglas, R., Harder, S., Kaminker, J.: Toeplitz operators and the eta invariant: The case of S^1. Contemp. Math. 70 (1988)
32. Douglas, R., Harder, S., Kaminker, J.: Cyclic cocycles, renormalisation and eta invariant. Invent. Math 103 (1991)
33. Witten, E.: Global Gravitation Anomalies. Commun. Math. Phys. 100 (1985)
34. Witten, E.: Quantum Field Theories and the Jones polynomial. Commun. Math. Phys. 121 (1989); Topology-changing amplitudes in (2+1)-dim gravity. Nucl. Phys. B323, 113–140 (1989); (2+1)-dim gravity as an exactly soluble system. Nucl. Phys. B311, 46–78 (1988); String Theory and Noncommutative Geometry. hep-th/9908142
35. Kamber, F.W., Tondeur, P.: Foliated Bundles and Characteristic Classes. LNM 493, Berlin–Heidelberg–New York: Springer, 1975
36. Belavin, A.A., Polyakov, A.M., Schwartz, A.S., Tyupkin, Y.S.: Pseudoparticle Solutions of the Yang–Mills Equations. Phys. Lett. 59B (1975)
37. Zois, I.P.: Search for the M-theory Lagrangian. Phys. Lett. B402, 33–35 (1997)
38. Zois, I.P.: The duality between two-index potentials and the non-linear σ model in field theory. D. Phil Thesis, Oxford, Michaelmas, 1996
39. Barrett, J.W.: Quantum Gravity as a Topological Quantum Field Theory. Nottingham Mathematics Preprint 1995
40. Kronheimer, P.B., Nakajima, H.: Yang–Mills instantons on ALE gravitational instantons. Mathematische Annalen 288, 263 (1990)
41. Henneaux, M.: Uniqueness of the Freedman–Townsend interaction vertex for 2-form gauge fields. Preprint Universite Libre de Bruxelles, 1996
42. Loday, J.-L.: Cyclic Homology. Berlin: Springer, 1991
43. Connes, A., Douglas, M.R. and Schwarz, A.: Noncommutative Geometry and Matrix Theory: Compactification on Tori. J. High Energy Phys. 02, 003 (1998)
44. Tsou, S.T. et al.: Features of Quark and Lepton Mixing from Differential Geometry of Curves on Surfaces. Phys. Rev. D58 053006 (1998); Tsou, S.T. et al.: The Dualised Standard Model and its Applications. Talk given at the International Conference of High Energy Physics 1998 (ICHEP 98), Vancouver, Canada
45. Banks, T. et al.: M-Theory as a Matrix Model: A Conjecture. Phys. Rev. D55, 5112 (1997)
46. Schwarz, A.S.: The partition function of degenerate quadratic functional and the Ray–Singer invariants. Lett. Math. Phys. 2 (1978)
47. Tsou, S.T. and Zois, I.P.: Two index potentials and twisted de Rham cohomology. To appear in Rept. Math. Phys.

Communicated by A. Connes

Commun. Math. Phys. 209, 785 – 810 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

On Nonlinear σ-Models Arising in (Super-)Gravity

Peter Breitenlohner, Dieter Maison

Max-Planck-Institut für Physik (Werner-Heisenberg-Institut), Föhringer Ring 6, 80805 München, Germany

Received: 3 August 1998 / Accepted: 1 December 1999

Abstract: In a previous paper with Gibbons [1] we derived a list of three dimensional symmetric space σ -models obtained by dimensional reduction of a class of four dimensional gravity theories with abelian gauge fields and scalars. Here we give a detailed analysis of their group theoretical structure leading to an abstract parametrization in terms of “triangular” group elements. This allows for a uniform treatment of all these models. As an interesting application we give a simple derivation of a “Quadratic Mass Formula” for strictly stationary black holes.

1. Introduction

Starting from the maximal eleven dimensional supergravity theory many different models with N ≥ 1 supersymmetries have been constructed in lower dimensions via Kałuza–Klein reductions. In the case of compactification on a torus the use of adapted coordinates leads to abelian gauge symmetries derived from the general covariance in eleven dimensions. Depending on the dimension some of the abelian vector resp. tensor gauge fields can be replaced by scalar potentials via Hodge duality. Experience shows that all the scalars of these models organize themselves into the simple structure of a non-linear sigma model with a coset space G/H as target space. Even more, these coset spaces have the structure of a Riemannian resp. pseudo-Riemannian non-compact symmetric space. As the number of scalars increases when the dimension is reduced it reaches a maximum in three dimensions. Although further reduction to two dimensions results in another scalar from the 3-metric, the latter plays a different role and does not lead to a further increase of G/H.¹

¹ There is however a dramatic increase of G to the infinite dimensional Geroch group, if the complete integrability of the two dimensional theories is taken into account [2,3].

In our previous paper with Gibbons [1] we gave a general classification of four dimensional bosonic theories leading to three dimensional reductions allowing for “Ehlers-Harrison” type transformations. Besides the models obtained


from supergravities we found many new cases including certain infinite series, all of them resulting in symmetric space sigma models. The present paper is devoted to an elaboration of the specific structure of all these coset spaces. An essential step for the identification of the particular sigma model corresponding to a theory is the parametrization of the coset space with the scalar fields of the theory. Considering specific examples one finds a typical structure of the target space metric. While some of the fields appear only in polynomial form others do not. This has been exploited in the past to construct certain triangular matrix representations of the coset spaces, such that the polynomial dependence is related to the nilpotent (off-diagonal) part of the matrices. This reminds one very much of the Iwasawa decomposition for semi-simple Lie groups resp. its reduction to the coset space. There is a corresponding structure in the action of the various transformations of G on the fields, in particular the generalized Ehlers and Harrison transformations. In fact, this will be the starting point of our structural analysis of the Lie algebra g of G. It turns out that all the different cases can be characterized by certain matrices made up from the structure constants of g. As a result, we not only get the complete action of g on the fields, but also obtain an abstract parametrization of the coset space G/H in terms of the fields via the exponential map. Clearly, in order to obtain a concrete parametrization, the above mentioned matrices have to be worked out for each case resp. family of models individually and we shall exemplify this in detail. Nevertheless for certain applications our knowledge on the general structure of G is sufficient without the need of any specific parametrization. A particularly interesting case is provided by the “Quadratic Mass Formula” for strictly stationary single black hole solutions, generalizing work of Heusler [4]. 
Another important application we have in mind but have not yet completely worked out is a proof of the uniqueness of the “Generalized Kerr Solutions”, which we announced already in [1]. In Sect. 2 we review the dimensional reduction from four to three dimensions for stationary or axisymmetric solutions, concentrating on those cases where the scalars of the 3-dimensional theory form a nonlinear σ-model, i.e. parametrize a pseudo-Riemannian or Riemannian symmetric space, and consequently there exist generalized Ehlers-Harrison transformations. We closely follow the treatment in [1] and for the convenience of the reader we repeat the list of all such cases from that paper. In Sect. 3 we analyze the general structure of the Lie algebra g and derive consistency conditions for the coefficients used to parametrize the structure constants of the Lie algebra g for all the possible cases. In Sect. 4 we discuss the parametrization of coset representatives forming a “triangular” subgroup T ⊂ G, the action of (infinitesimal) Ehlers-Harrison transformations, and the form of the Lagrangian built from the Lie algebra valued conserved current J. Section 5 contains two simple applications for strictly stationary black holes. In the appendix we give a detailed discussion of all the cases.

2. Dimensional Reduction from 4 to 3 Dimensions

We consider 4-dimensional theories with scalars forming a nonlinear σ-model coupled to gravity and k abelian gauge fields. Stationary solutions of the field equations can be described in terms of a “dimensionally reduced” 3-dimensional theory. We will closely follow the analysis in [1] and use a rather similar notation.

2.1. The 4-Dimensional Theory. We start from a Lagrangian field theory over a spacetime manifold $\Sigma_4$ with coordinates $x^a$ and metric $g_{ab}(x)$ (with $ds^2 = g_{ab}\,dx^a dx^b > 0$ along time-like directions). Let $\bar\Phi$ be a Riemannian symmetric space with (real) coordinates $\bar\phi^i$ and metric $\bar\gamma_{ij}(\bar\phi)$. The nonlinear σ-model with target space $\bar\Phi$ is characterized by $L(x) = \frac{1}{2} g^{ab}(x)\,\partial_a\bar\phi^i(x)\,\partial_b\bar\phi^j(x)\,\bar\gamma_{ij}(\bar\phi(x))$. We are interested in particular in the case $\bar\Phi = \bar G/\bar H$, where $\bar G$ is a noncompact Lie group with maximal compact subgroup $\bar H$. We choose a basis $s_i$, $i = 1, \ldots, \dim\bar G$, for the Lie algebra $\bar g$ of $\bar G$ with commutators

$$[s_i, s_j] = f_{ij}{}^{l}\, s_l.$$

Due to the assumption that $\bar G/\bar H$ is a symmetric space, there exists an involutive automorphism $\bar\tau$ of $\bar g$,

$$\bar\tau([X, Y]) = [\bar\tau(X), \bar\tau(Y)] \ \text{ for all } X, Y \in \bar g, \qquad \bar\tau^2 = \mathrm{Id},$$

such that the Lie algebra of $\bar H$ is $\bar h = \{X \in \bar g : \bar\tau(X) = X\}$. We can describe elements of the coset space $\bar G/\bar H$ by coset representatives $\bar\pi(x) \in \bar G$. The group $\bar G$ acts on these representatives, but in addition there is the gauge group $\bar H$ changing the coset representatives,

$$\bar\pi(x) \mapsto \bar v(x)\,\bar\pi(x)\,\bar u^{-1}, \quad \text{with } \bar u \in \bar G,\ \bar v(x) \in \bar H.$$

In order to eliminate the gauge degrees of freedom we consider the gauge invariant group element $\bar\mu = \bar\tau(\bar\pi^{-1})\bar\pi$ and the Lie algebra valued currents $\bar J = \bar J_a\,dx^a = \frac{1}{2}\bar\mu^{-1} d\bar\mu$. This allows us to rewrite the kinetic term for the scalars in the form $\frac{1}{4}\langle \bar J_a, \bar J^a\rangle_{\bar g}$, where $\langle\cdot,\cdot\rangle_{\bar g}$ is a suitable (not necessarily unique) invariant scalar product on the Lie algebra $\bar g$. Finally, let there be k (real) abelian vector fields (“electric” potentials) $B_a = (B_a^I)$, $I = 1, \ldots, k$, with field strengths $G_{ab} = \partial_a B_b - \partial_b B_a$ and their duals ${*G}_{ab}$. For the coupled system of gravity, scalars, and vector fields we choose the action

$$S_4 = \int_{\Sigma_4} \sqrt{|g|}\, d^4x \left[ -\frac{1}{2} R + \frac{1}{4}\langle \bar J_a, \bar J^a\rangle_{\bar g} - \frac{c}{8}\, G_{ab}^T\left(\tilde\mu\, G^{ab} - \tilde\nu\, {*G}^{ab}\right) \right],$$

where R is the scalar curvature, c is a positive constant, $\tilde\mu(\bar\phi)$ and $\tilde\nu(\bar\phi)$ are symmetric k×k matrices, and $\tilde\mu$ is positive definite. The constant c could be absorbed by rescaling the matrices $\tilde\mu$ and $\tilde\nu$ and/or the vector fields $B_a$. Rescaling the matrices would, however, invalidate the identification of $\tilde\mu$ and $\bar M$ (defined below) as matrix representatives of elements of the group $\bar G$ (compare Sect. 4). Rescaling the vector fields may be equally undesirable; for the Kałuza–Klein theories discussed in detail in Sect. A.1 the vector fields $B_a^I$ are directly related to components of the metric in 4 + n dimensions. For the generalized Einstein–Maxwell theories discussed in Sect. A.2 one would like to normalize the vector fields, and thus the charges, such that a Reissner–Nordstrøm black hole with charge q has mass m ≥ |q|.

The field equations for the vector fields, $\nabla_a(\tilde\mu\, G^{ab} - \tilde\nu\, {*G}^{ab}) = 0$, can be interpreted as Bianchi identities for field strengths $H_{ab} = \partial_a C_b - \partial_b C_a$ derived from “magnetic” potentials $C_a$. Choosing ${*H}_{ab} = \eta(\tilde\mu\, G_{ab} - \tilde\nu\, {*G}_{ab})$ with some constant orthogonal matrix η, we obtain 2k vector fields $A_a$ and field strengths $F_{ab} = \partial_a A_b - \partial_b A_a$ satisfying the linear relation

$$A_m = \begin{pmatrix} B_m \\ C_m \end{pmatrix}, \qquad F_{ab} = \begin{pmatrix} G_{ab} \\ H_{ab} \end{pmatrix}, \qquad F_{ab} = \bar Y \bar M\, {*F}_{ab},$$

with matrices $\bar M = \bar M^T$ and $\bar Y = -\bar Y^T$ such that $\bar Y \bar M \bar Y = -\bar M^{-1}$,

$$\bar Y = \begin{pmatrix} 0 & \eta^T \\ -\eta & 0 \end{pmatrix}, \qquad \bar M = \begin{pmatrix} \tilde\mu + \tilde\nu\tilde\mu^{-1}\tilde\nu & \tilde\nu\tilde\mu^{-1}\eta^T \\ \eta\tilde\mu^{-1}\tilde\nu & \eta\tilde\mu^{-1}\eta^T \end{pmatrix}.$$

Assume there is a 2k-dimensional real matrix representation $\bar\rho$ of $\bar G$ with

$$\bar\rho : \bar\pi \mapsto \bar\rho(\bar\pi) = \bar P, \qquad \bar\rho(\bar\tau(\bar\pi)) = \bar P^{T\,-1}, \qquad \bar\rho(\bar\mu) = \bar P^T \bar P = \bar M.$$

The field equations for the vectors are then $\bar G$ invariant, provided the action of $\bar G$ on the field strengths is $\bar G \ni \bar u : F_{ab} \mapsto \bar\rho(\bar u) F_{ab}$. Thus the group $\bar G$ acts nonlinearly on the scalars $\bar\phi$, but acts linearly on the field strengths $G_{ab}$ and their duals ${*G}_{ab}$ (with coefficients depending on the scalars). The contribution of the vector fields to the gravitational and scalar field equations can be similarly expressed in terms of $F_{ab}$ in explicitly $\bar G$-covariant form. This “on-shell” symmetry can, however, in general not be formulated as an invariance of the action.
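The algebraic relation between the matrices Ȳ and M̄ stated above can be checked in the simplest case. The following is a minimal sketch, assuming a single vector field (k = 1), so that µ̃ and ν̃ are numbers and the orthogonal matrix η is ±1; the helper names `matmul` and `build_Y_M` and the sample values are illustrative and not taken from the paper.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two small matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def build_Y_M(mu, nu, eta):
    """k = 1 versions of Y-bar and M-bar; eta must be +1 or -1."""
    Y = [[0, eta], [-eta, 0]]
    M = [[mu + nu * nu / mu, nu * eta / mu],
         [eta * nu / mu, eta * eta / mu]]
    return Y, M


# Arbitrary sample values with mu > 0 (mu-tilde is positive definite).
mu, nu, eta = Fraction(3), Fraction(2, 5), Fraction(-1)
Y, M = build_Y_M(mu, nu, eta)

# Y M Y = -M^{-1} is equivalent to (Y M Y) M = -identity.
check = matmul(matmul(matmul(Y, M), Y), M)
print(check)
```

Exact rational arithmetic is used so that the comparison with minus the identity matrix is not affected by rounding. In this 2×2 case the identity reduces to det M̄ = 1.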

2.2. The 3-Dimensional Theory. A solution of the 4-dimensional field equations with a 1-parameter symmetry group is characterized by a Killing vector field K such that the Lie derivative with respect to K of the metric, scalars, and field strengths vanishes. The solution is stationary, strictly stationary, or axisymmetric if K is asymptotically time-like, everywhere time-like, or asymptotically space-like respectively. In the following we will mostly assume that the solution is strictly stationary, i.e. $\Delta = K_a K^a > 0$, but the results hold as well for axisymmetric solutions with Δ < 0. Although the dimensional reduction from 4 to 3 dimensions could be formulated in a coordinate independent form, the discussion is simplified by choosing adapted coordinates such that the isometry is just a translation (e.g. $x^a = (x^m, t)$ with $K = \partial/\partial t$). The metric, scalars, and field strengths will then depend only on the three coordinates $x^m$, m = 1, 2, 3, parametrizing the orbit space $\Sigma_3$ of the action of K; in a suitable gauge this will also be true for the vector fields. We decompose the metric and vector fields into pieces perpendicular and parallel to $K_a = (\Delta k_m, \Delta)$,

$$g_{ab} = \begin{pmatrix} -\frac{1}{\Delta} h_{mn} + \Delta k_m k_n & \Delta k_m \\ \Delta k_n & \Delta \end{pmatrix}, \qquad B_a = \begin{pmatrix} \tilde B_m + B k_m \\ B \end{pmatrix}.$$

Inserting this decomposition into the action $S_4$ and omitting the integration over dt as well as surface terms, we obtain the action $S_3$ describing a Lagrangian field theory over the orbit space $\Sigma_3$, the dimensionally reduced theory. The action $S_3$ depends on the 3-vectors $k_m$ and $\tilde B_m$ only through their field strengths $k_{mn} = \partial_m k_n - \partial_n k_m$ and $\tilde B_{mn} = \partial_m \tilde B_n - \partial_n \tilde B_m$; the resulting field equations can again be interpreted as consistency conditions for the existence of dual potentials. For ${*\tilde B}_{mn}$ these are just the parallel components C of $C_a$, thus the k vectors $B_a$ of the 4-dimensional theory yield 2k scalars A. The “twist” ${*k}_{mn}$ can be expressed in terms of the “twist vector” $\omega_m$ with the “twist potential” ψ,

$$A = \begin{pmatrix} B \\ C \end{pmatrix}, \qquad \omega_m = \partial_m\psi + \frac{c}{2} A^T \bar Y^{-1} \partial_m A.$$


Performing these dualizations via Lagrange multipliers finally yields the dimensionally reduced action in the form

$$S_3 = \int_{\Sigma_3} \sqrt{|h|}\, d^3x \left[ \frac{1}{2} R - \frac{1}{4}\langle \bar J_m, \bar J^m\rangle_{\bar g} + \frac{c}{4\Delta}\,\partial_m A^T \bar M\, \partial^m A - \frac{1}{4\Delta^2}\left(\partial_m\Delta\, \partial^m\Delta + \omega_m\omega^m\right) \right],$$

where the 3-metric $h_{mn}$ and its inverse are used to compute the scalar curvature R as well as to raise indices. We have used the rescaled 3-metric $h_{mn}$, assuming Δ ≠ 0, in order to obtain an action without additional coefficient for the scalar curvature. The action $S_3$ describes a nonlinear σ-model with target space Φ coupled to 3-dimensional gravity (note, however, that 3-dimensional gravity has no dynamical degrees of freedom). Φ is a homogeneous space parametrized by $(\dim\bar G - \dim\bar H) + 2k + 2$ scalars $\phi = (\bar\phi, A, \Delta, \psi)$. The theory is invariant under various transformations (with constant parameters):

(1) twist gauge transformations $\psi \to \psi + \chi$,
(2) “electromagnetic” gauge transformations $A \to A + \alpha$, $\psi \to \psi + \frac{c}{2} A^T \bar Y^{-1}\alpha$,
(3) scale transformations $A \to \sqrt{\zeta}\, A$, $\Delta \to \zeta\Delta$, $\psi \to \zeta\psi$, and
(4) $\bar G$ transformations acting on $\bar\phi$ and A.
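Transformations (2) and (3) in the list above can be spot-checked on the twist vector ω_m = ∂_m ψ + (c/2) A^T Ȳ⁻¹ ∂_m A. The sketch below assumes k = 1 and η = 1 (so that Ȳ⁻¹ is a fixed antisymmetric 2×2 matrix) and evaluates everything at a single point and for a single direction m, treating A, ∂_m A and ∂_m ψ as independent sample numbers. The function name `twist_vector` and all numerical values are illustrative.

```python
from fractions import Fraction


def twist_vector(dpsi, A, dA, c, Yinv):
    """omega = dpsi + (c/2) A^T Yinv dA at one point, for one direction m."""
    bilinear = sum(A[i] * Yinv[i][j] * dA[j]
                   for i in range(2) for j in range(2))
    return dpsi + c / 2 * bilinear


c = Fraction(4, 3)
Yinv = [[0, -1], [1, 0]]  # inverse of Y-bar = [[0, 1], [-1, 0]] (k = 1, eta = 1)
A = [Fraction(2), Fraction(-5)]
dA = [Fraction(7, 2), Fraction(1, 3)]
dpsi = Fraction(9)
omega = twist_vector(dpsi, A, dA, c, Yinv)

# (2) "electromagnetic" gauge transformation with constant alpha:
# A -> A + alpha, psi -> psi + (c/2) A^T Yinv alpha, so dA is unchanged and
# dpsi picks up (c/2) dA^T Yinv alpha.
alpha = [Fraction(11), Fraction(-3, 4)]
A_gauge = [A[i] + alpha[i] for i in range(2)]
dpsi_gauge = dpsi + c / 2 * sum(dA[i] * Yinv[i][j] * alpha[j]
                                for i in range(2) for j in range(2))
omega_gauge = twist_vector(dpsi_gauge, A_gauge, dA, c, Yinv)

# (3) scale transformation; use s = sqrt(zeta) to keep the arithmetic rational.
s = Fraction(3, 2)
zeta = s * s
omega_scaled = twist_vector(zeta * dpsi, [s * a for a in A],
                            [s * d for d in dA], c, Yinv)

print(omega, omega_gauge, omega_scaled)
```

The gauge invariance of ω_m relies only on the antisymmetry of Ȳ⁻¹, and the scale transformation multiplies ω_m by ζ, the same weight as Δ, which is what makes the combination (∂_m Δ ∂^m Δ + ω_m ω^m)/Δ² in the action scale invariant.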

The field equations for the scalar fields are equivalent to the conservation laws for corresponding Noether currents.

The 3-metric $h_{mn}$ is positive definite for $\Delta > 0$ (strict stationarity) whereas $\Phi$ has an indefinite metric with $2k$ negative eigenvalues. For the axisymmetric case we will use primed variables $\Phi' = (\bar\phi, A', \Delta', \psi')$ with $\Delta' < 0$; the metric on $\Phi'$ is positive definite but the 3-metric $h'_{mn}$ has signature $(+, -, -)$. The restriction of $\Phi$ (or $\Phi'$) to the invariant submanifold $A = 0$ (or $A' = 0$) is the symmetric space $\bar G/\bar H \otimes SL(2)/SO(2)$, invariant under the well known (infinitesimal) Ehlers transformation $\delta\bar\phi = 0$, $\delta\Delta = 2\Delta\psi$, $\delta\psi = \psi^2 - \Delta^2$ [5]. For some such theories the Ehlers transformation can be extended to an invariance of the whole target space $\Phi$ or $\Phi'$; commuting this generalized Ehlers transformation with (infinitesimal) electromagnetic gauge transformations finally yields generalized Harrison transformations [6]. One easily sees that all these transformations form a noncompact Lie group $G$ with maximal compact subgroup $H'$, and the target space is either the Riemannian symmetric space $\Phi' = G/H'$ or the pseudo-Riemannian symmetric space $\Phi = G/H$, where $H$ is a noncompact real form of $H'$. The dimensions of $G$ and $H$ are
\[
\dim G = \dim \bar G + \dim SL(2) + 4k, \qquad \dim H = \dim \bar H + \dim SO(2) + 2k .
\]
The dimensionally reduced action can then be expressed in the form
\[
S_3 = \int_{\Sigma_3} \sqrt{|h|}\, d^3x \left( \frac12 R + L \right), \quad \text{with } L = -\frac14 \langle J_m, J^m \rangle,
\]
where $\langle \cdot, \cdot \rangle$ is a suitably normalized invariant scalar product on the Lie algebra $g$ of $G$. The $G/H$ σ-model is characterized by two commuting involutive automorphisms $\tau$ and $\tau'$: All elements of $G$ that are invariant under $\tau$ form the subgroup $H$, all elements


P. Breitenlohner, D. Maison

Table 1. List of all symmetric spaces obtained by dimensional reduction from four to three dimensions of theories with scalars and vectors (reproduced from Table 2 in [1]). The third column lists the 4-dimensional coset $\bar G/\bar H$, the fourth its dimension, and the last the number $k$ of vector fields.

 #   G/H                                   \bar G/\bar H                                  dim \bar G/\bar H   k
 1   SL(n+2)/SO(n,2)                       GL(n)/SO(n)                                    n(n+1)/2            n
 2   SU(p+1,q+1)/S(U(p,1) × U(1,q))        U(p,q)/(U(p) × U(q))                           2pq                 p+q
 3   SO(p+2,q+2)/(SO(p,2) × SO(2,q))       SO(p,q)/(SO(p) × SO(q)) × SO(2,1)/SO(2)        pq+2                p+q
 4   SO*(2n+4)/U(n,2)                      SO*(2n)/U(n) × SU(2)/SU(2)                     n(n−1)              2n
 5   Sp(2n+2;R)/U(n,1)                     Sp(2n;R)/U(n)                                  n(n+1)              n
 6   G2(+2)/(SU(1,1) × SU(1,1))            SU(1,1)/U(1)                                   2                   2
 7   F4(+4)/(Sp(6;R) × SU(1,1))            Sp(6;R)/U(3)                                   12                  7
 8   E6(+6)/Sp(8;R)                        SL(6)/SO(6)                                    20                  10
 9   E6(+2)/(SU(3,3) × SU(1,1))            SU(3,3)/S(U(3) × U(3))                         18                  10
10   E6(−14)/(SO*(10) × SO(2))             SU(5,1)/U(5)                                   10                  10
11   E7(+7)/SU(4,4)                        SO(6,6)/(SO(6) × SO(6))                        36                  16
12   E7(−5)/(SO*(12) × SO(2,1))            SO*(12)/U(6)                                   30                  16
13   E7(−25)/(E6(−14) × SO(2))             SO(10,2)/(SO(10) × SO(2))                      20                  16
14   E8(+8)/SO*(16)                        E7(+7)/SU(8)                                   70                  28
15   E8(−24)/(E7(−25) × SU(1,1))           E7(−25)/(E6(−78) × SO(2))                      54                  28

invariant under $\tau'$ form the subgroup $H'$, and all elements invariant under $\tau\tau'$ form the subgroup $\bar G \otimes SL(2)$. The restriction of $\tau$ or $\tau'$ to $\bar G$ is the original automorphism $\bar\tau$; the restriction to $SL(2)$ is the automorphism defining the maximal compact subgroup $SO(2)$. We choose a basis $t_i$ for the Lie algebra $sl(2)$, where $t_+ = e$, $t_0 = d$, and $t_- = k$ satisfy the commutation relations
\[
[d, e] = e, \quad [d, k] = -k, \quad [e, k] = 2d, \quad \text{and} \quad \tau(e) = -k, \quad \tau(d) = -d, \quad \tau(k) = -e .
\]
We may assume that the representation $\bar\rho$ of $\bar G$ is faithful, i.e., that all scalars of the 4-dimensional $\bar G/\bar H$ σ-model couple to the vector fields. The group $G$ will then be simple, and Table 1 (a reproduction of Table 2 in [1]) lists all possible cases.
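The dimension formula $\dim G = \dim \bar G + \dim SL(2) + 4k$ quoted in Sect. 2.2 can be used as a quick consistency check on the exceptional rows of Table 1. The following is only a bookkeeping sketch: the Lie algebra dimensions (14, 52, 78, 133, 248 for the exceptional algebras, and the classical values for $\bar G$) are standard numbers entered by hand, and the names `ROWS`, `predicted_dim`, and `mismatches` are hypothetical helpers, not part of the paper.

```python
# Consistency check: dim G = dim Gbar + dim SL(2) + 4k for rows 6-15 of Table 1.
# All dimensions are standard values typed in by hand (an assumption of this sketch).

EXCEPTIONAL_DIM = {"G2": 14, "F4": 52, "E6": 78, "E7": 133, "E8": 248}
DIM_SL2 = 3

# (row number, complex type of G, dim of Gbar, number k of vector fields)
ROWS = [
    (6, "G2", 3, 2),      # Gbar = SU(1,1)
    (7, "F4", 21, 7),     # Gbar = Sp(6;R)
    (8, "E6", 35, 10),    # Gbar = SL(6)
    (9, "E6", 35, 10),    # Gbar = SU(3,3)
    (10, "E6", 35, 10),   # Gbar = SU(5,1)
    (11, "E7", 66, 16),   # Gbar = SO(6,6)
    (12, "E7", 66, 16),   # Gbar = SO*(12)
    (13, "E7", 66, 16),   # Gbar = SO(10,2)
    (14, "E8", 133, 28),  # Gbar = E7(+7)
    (15, "E8", 133, 28),  # Gbar = E7(-25)
]


def predicted_dim(dim_gbar, k):
    """Dimension of G predicted from the 4-dimensional data."""
    return dim_gbar + DIM_SL2 + 4 * k


def mismatches():
    """Return the row numbers for which the prediction disagrees with dim G."""
    return [row for row, g, dim_gbar, k in ROWS
            if predicted_dim(dim_gbar, k) != EXCEPTIONAL_DIM[g]]


if __name__ == "__main__":
    print("rows with mismatching dimensions:", mismatches())
```

Real forms do not enter here because the dimension depends only on the complexified algebra.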


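3. Parametrization of the Lie Algebra

3.1. Commutation Relations. We choose a basis for the Lie algebra $g$ of $G$ consisting of the generators $s_i$ of $\bar g$, the generators $t_i$ of $sl(2)$, and $4k$ additional generators $h_i$, $a_i$, $i = 1, \ldots, 2k$, transforming as doublet under $sl(2)$,
\[
\begin{aligned}
{}[d, h_i] &= \tfrac12 h_i, & [e, h_i] &= 0, & [k, h_i] &= -a_i, \\
[d, a_i] &= -\tfrac12 a_i, & [e, a_i] &= -h_i, & [k, a_i] &= 0,
\end{aligned}
\]
and with a $2k$-dimensional real matrix representation $\bar\rho$ of $\bar g$,
\[
\bar\rho : s_i \mapsto \bar\rho(s_i) = R_i, \qquad [s_i, h\cdot\alpha] = h\cdot R_i\alpha, \qquad [s_i, a\cdot\alpha] = a\cdot R_i\alpha .
\]
The remaining commutators can be determined from the Jacobi identities $J(X, Y, Z) \equiv [[X, Y], Z] + [[Y, Z], X] + [[Z, X], Y] = 0$:
\[
\begin{aligned}
J(h\cdot\alpha, h\cdot\beta, d) &\Rightarrow [h\cdot\alpha, h\cdot\beta] = \alpha\cdot y\beta\; e, \\
J(h\cdot\alpha, h\cdot\beta, k) &\Rightarrow [h\cdot\alpha, a\cdot\beta] = \alpha\cdot y\beta\; d + \alpha\cdot x^l\beta\; s_l, \\
J(h\cdot\alpha, a\cdot\beta, k) &\Rightarrow [a\cdot\alpha, a\cdot\beta] = -\alpha\cdot y\beta\; k,
\end{aligned}
\]
with $y^T = -y$ and $x^{lT} = x^l$. Finally
\[
\begin{aligned}
J(h\cdot\alpha, a\cdot\beta, s_i) &\Rightarrow y R_i + R_i^T y = 0, \quad x^l R_i + R_i^T x^l = f_{ij}{}^{l}\, x^j, \\
J(h\cdot\alpha, h\cdot\beta, a\cdot\gamma) &\Rightarrow \tfrac12 \alpha\,(\beta\cdot y\gamma) + R_l\alpha\,(\beta\cdot x^l\gamma) - (\alpha \leftrightarrow \beta) = \gamma\,(\alpha\cdot y\beta).
\end{aligned}
\]
The trace of this “completeness relation” yields
\[
x^l R_l - (x^l R_l)^T + (2k + 1)\, y = 0 .
\]

3.2. The Invariant Scalar Product. The simple Lie algebra $g$ has the invariant scalar product (unique up to an overall factor)
\[
\langle X, Y \rangle_g = \frac{1}{c_g}\, \mathrm{Tr}_g(\mathrm{Ad}(X)\, \mathrm{Ad}(Y)), \qquad X, Y \in g,
\]
similarly the invariant scalar product on $sl(2)$ is
\[
\langle X, Y \rangle_{sl(2)} = \frac{1}{c_{sl(2)}}\, \mathrm{Tr}_{sl(2)}(\mathrm{Ad}(X)\, \mathrm{Ad}(Y)), \qquad X, Y \in sl(2),
\]
where the action of the adjoint representation is $\mathrm{Ad}(X)\, |Y\rangle = |[X, Y]\rangle$, $X, Y \in g$. In order to compute the scalar product for $sl(2)$ we need the commutators
\[
\begin{aligned}
{}[d, [d, e]] &= e, & [d, [d, d]] &= 0, & [d, [d, k]] &= k & &\Rightarrow \langle d, d \rangle_{sl(2)} = \frac{2}{c_{sl(2)}}, \\
[e, [k, e]] &= 2e, & [e, [k, d]] &= 2d, & [e, [k, k]] &= 0 & &\Rightarrow \langle e, k \rangle_{sl(2)} = \frac{4}{c_{sl(2)}} .
\end{aligned}
\]
Choosing $c_{sl(2)} = 2$ we obtain for $X = \epsilon e + \delta d + \kappa k$,
\[
\langle X, X \rangle_{sl(2)} = \delta^2 + 4\epsilon\kappa = \delta^2 + (\epsilon + \kappa)^2 - (\epsilon - \kappa)^2 .
\]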
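The $sl(2)$ relations above can be checked in a concrete realization. The sketch below assumes the two-dimensional matrices $e = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $d = \mathrm{diag}(\tfrac12, -\tfrac12)$, $k = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, with $\tau(X) = -X^T$ and $\langle X, Y \rangle = 2\,\mathrm{Tr}(XY)$; this agrees with the representation $\tilde\rho$ used in Sect. 4.2, but the helper names (`mul`, `comm`, `tau`, `scalar`, `element`) are illustrative only.

```python
from fractions import Fraction as F

# Assumed 2x2 realization of sl(2); exact rational arithmetic throughout.
E = [[0, 1], [0, 0]]
D = [[F(1, 2), 0], [0, F(-1, 2)]]
K = [[0, 0], [1, 0]]


def mul(a, b):
    """2x2 matrix product."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, a):
    """Multiply a matrix by the scalar c."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = mul(a, b), mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def tau(a):
    """Involutive automorphism, realized here as minus the transpose."""
    return [[-a[j][i] for j in range(2)] for i in range(2)]


def scalar(a, b):
    """Invariant scalar product <a, b> = 2 Tr(ab), i.e. c_sl(2) = 2."""
    ab = mul(a, b)
    return 2 * (ab[0][0] + ab[1][1])


def element(eps, delta, kappa):
    """X = eps*e + delta*d + kappa*k as a matrix."""
    return [[F(delta, 2), eps], [kappa, F(-delta, 2)]]


if __name__ == "__main__":
    print(comm(D, E) == E, comm(E, K) == scale(2, D), tau(E) == scale(-1, K))
    X = element(3, 5, 7)
    print(scalar(X, X))  # delta^2 + 4*eps*kappa
```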


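Extending the trace over all generators of $g$,
\[
\begin{aligned}
{}[d, [d, h\cdot\alpha]] &= \tfrac14\, h\cdot\alpha, & [d, [d, a\cdot\alpha]] &= \tfrac14\, a\cdot\alpha & &\Rightarrow \langle d, d \rangle_g = \frac{k+2}{c_g}, \\
[e, [k, h\cdot\alpha]] &= h\cdot\alpha, & [e, [k, a\cdot\alpha]] &= 0 & &\Rightarrow \langle e, k \rangle_g = \frac{2(k+2)}{c_g} .
\end{aligned}
\]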
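These traces can be checked numerically in the smallest case. The sketch below assumes $g = sl(3)$ with $k = 1$ (row 1 of Table 1 with $n = 1$) and the embedding $d = \mathrm{diag}(\tfrac12, 0, -\tfrac12)$, $e = E_{13}$, $k = E_{31}$, which is an illustrative choice in the spirit of Appendix A.1. It evaluates $\mathrm{Tr}(\mathrm{Ad}(X)\mathrm{Ad}(Y))$ on the basis $E_{ij}$ of $gl(n)$; for traceless $X$, $Y$ this equals the trace over $sl(n)$, because the identity matrix is annihilated by every commutator.

```python
from fractions import Fraction as F


def unit(n, i, j):
    """Elementary n x n matrix E_ij."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(n)]


def mul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def comm(a, b):
    n = len(a)
    ab, ba = mul(a, b), mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def adjoint_trace(x, y):
    """Tr(Ad(x) Ad(y)): sum over E_ij of the E_ij-component of [x, [y, E_ij]]."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(n):
            total += comm(x, comm(y, unit(n, i, j)))[i][j]
    return total


def sl2_like_triple(n):
    """Generators d, e, k placed in the corners of an n x n matrix (assumed embedding)."""
    gen_d = [[0] * n for _ in range(n)]
    gen_d[0][0], gen_d[n - 1][n - 1] = F(1, 2), F(-1, 2)
    return gen_d, unit(n, 0, n - 1), unit(n, n - 1, 0)


if __name__ == "__main__":
    for n, k_vectors in [(2, 0), (3, 1)]:
        gen_d, gen_e, gen_k = sl2_like_triple(n)
        # Expected: k + 2 and 2(k + 2), with k the number of vector fields.
        print(n, adjoint_trace(gen_d, gen_d), adjoint_trace(gen_e, gen_k))
```

For $n = 2$ the doublets are absent and the values reduce to the $sl(2)$ results $2$ and $4$; for $n = 3$ they become $k + 2 = 3$ and $2(k + 2) = 6$.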

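We choose $c_g = k + 2$ such that the scalar product $\langle \cdot, \cdot \rangle_{sl(2)}$ coincides with the restriction of $\langle \cdot, \cdot \rangle_g$ to the subalgebra $sl(2)$. The restriction of $\langle \cdot, \cdot \rangle_g$ to the subalgebra $\bar g$ similarly defines a particular invariant scalar product on $\bar g$ (among possibly different ones when $\bar g$ is not simple). For the scalar product $\langle s_i, s_j \rangle$ we need the commutators
\[
[s_i, [s_j, h\cdot\alpha]] = h\cdot R_i R_j \alpha, \qquad [s_i, [s_j, a\cdot\alpha]] = a\cdot R_i R_j \alpha,
\]
and find
\[
\langle s_i, s_j \rangle = \frac{\mathrm{Tr}_{\bar g}(\mathrm{Ad}(s_i)\, \mathrm{Ad}(s_j)) + 2\, \mathrm{Tr}(R_i R_j)}{k+2} .
\]
For the scalar product $\langle h\cdot\alpha, a\cdot\beta \rangle$ we need the commutators
\[
\begin{aligned}
{}[h\cdot\alpha, [a\cdot\beta, e]] &= (\alpha\cdot y\beta)\, e, \\
[h\cdot\alpha, [a\cdot\beta, h\cdot\gamma]] &= h\cdot R_l\alpha\, (\beta\cdot x^l\gamma) - \tfrac12\, h\cdot\alpha\, (\beta\cdot y\gamma), \\
[h\cdot\alpha, [a\cdot\beta, d]] &= \tfrac12 (\alpha\cdot y\beta)\, d + \tfrac12 (\alpha\cdot x^l\beta)\, s_l, \\
[h\cdot\alpha, [a\cdot\beta, s_i]] &= -(\alpha\cdot y R_i\beta)\, d - (\alpha\cdot x^j R_i\beta)\, s_j, \\
[h\cdot\alpha, [a\cdot\beta, a\cdot\gamma]] &= -a\cdot\alpha\, (\beta\cdot y\gamma), \\
[h\cdot\alpha, [a\cdot\beta, k]] &= 0 .
\end{aligned}
\]
Collecting all terms we find
\[
\langle h\cdot\alpha, a\cdot\beta \rangle = \frac{1}{k+2} \left( 3(\alpha\cdot y\beta) - \alpha\cdot x^l R_l\beta + \beta\cdot x^l R_l\alpha \right),
\]
and finally using the completeness relation
\[
\langle h\cdot\alpha, a\cdot\beta \rangle = 2(\alpha\cdot y\beta) .
\]

3.3. The Involutive Automorphism. As noted above the two involutive automorphisms $\tau$ and $\tau'$ coincide when restricted to the subalgebra $\bar g \oplus sl(2)$, whereas $\tau(X) = -\tau'(X)$ for $X = h\cdot\alpha + a\cdot\beta$. The action of $\tau$ on the subalgebra $sl(2)$ is given above, the action on $\bar g$ can be expressed by a matrix $T = (T^i{}_j)$,
\[
\tau(s_i) = s_j T^j{}_i, \quad \text{with } T^i{}_j T^j{}_k = \delta^i_k .
\]
From the commutators $[d, h_i] = \tfrac12 h_i$ and $[d, a_i] = -\tfrac12 a_i$ together with $\tau(d) = -d$ we can deduce
\[
\tau(h\cdot\alpha) = a\cdot Z\alpha, \qquad \tau(a\cdot\alpha) = h\cdot Z^{-1}\alpha,
\]
with some nonsingular $2k \times 2k$ matrix $Z$. The properties of this matrix $Z$ can be determined by consistency requirements:
\[
\begin{aligned}
\left.\begin{array}{l} \tau([e, a\cdot\alpha]) = [\tau(e), \tau(a\cdot\alpha)] \\ \tau([k, h\cdot\alpha]) = [\tau(k), \tau(h\cdot\alpha)] \end{array}\right\} &\Rightarrow Z^{-1} = -Z, \\
\left.\begin{array}{l} \tau([s_i, a\cdot\alpha]) = [\tau(s_i), \tau(a\cdot\alpha)] \\ \tau([s_i, h\cdot\alpha]) = [\tau(s_i), \tau(h\cdot\alpha)] \end{array}\right\} &\Rightarrow Z R_i Z = -\tau(R_i) \equiv -R_j T^j{}_i, \\
\tau([h\cdot\alpha, a\cdot\beta]) = [\tau(h\cdot\alpha), \tau(a\cdot\beta)] &\Rightarrow Z^T y Z = y, \quad Z^T x^k Z = T^k{}_l\, x^l .
\end{aligned}
\]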
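As an illustration of these consistency conditions, the sketch below takes matrices of the block form $Z = \begin{pmatrix} 0 & -\eta^T \\ \eta & 0 \end{pmatrix}$, $y = c \begin{pmatrix} 0 & \eta^T \\ -\eta & 0 \end{pmatrix}$ with an orthogonal $k \times k$ matrix $\eta$, and verifies $Z^{-1} = -Z$, $Z^T y Z = y$, and that $yZ$ is $c$ times the unit matrix. The specific values ($k = 2$, $c = 4$, $\eta$ a rotation by 90 degrees) are arbitrary assumptions chosen only to exercise the algebra.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def mul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def scale(c, a):
    return [[c * v for v in row] for row in a]


def block(a, b, c, d):
    """Assemble the block matrix [[a, b], [c, d]] from equally sized square blocks."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def build(eta, c):
    """Return (Z, y) in the assumed block form for an orthogonal eta."""
    n = len(eta)
    zero = [[0] * n for _ in range(n)]
    eta_t = transpose(eta)
    z = block(zero, scale(-1, eta_t), eta, zero)
    y = scale(c, block(zero, eta_t, scale(-1, eta), zero))
    return z, y


if __name__ == "__main__":
    eta = [[0, 1], [-1, 0]]          # orthogonal: eta^T eta = 1
    Z, y = build(eta, c=4)
    print(mul(Z, Z) == scale(-1, identity(4)))      # Z^{-1} = -Z
    print(mul(transpose(Z), mul(y, Z)) == y)        # Z^T y Z = y
    print(mul(y, Z) == scale(4, identity(4)))       # yZ = c * 1
```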


Note that the symmetric matrix $yZ$ is positive definite, since the scalar product $\langle X, \tau'(X) \rangle$ is negative definite. Changing the basis of the representation space for $\bar\rho$ yields modified matrices $R_i$, $Z$, $y$, and $x^l$,
\[
R_i \mapsto S^{-1} R_i S, \qquad Z \mapsto S^{-1} Z S, \qquad y \mapsto S^T y S, \qquad x^l \mapsto S^T x^l S .
\]
We choose a basis such that $yZ = c\, 1_{2k}$, where $1_n$ is the $n$-dimensional unit matrix and $c$ is the positive constant introduced in Sect. 2, with the consequence $\tau(R_i) = -R_i^T$. The matrix $Z$ is then antisymmetric, i.e. antihermitian, and has therefore $k$ pairs of complex conjugate eigenvectors with eigenvalues $\pm i$. This allows us to further specify the basis such that
\[
Z = \begin{pmatrix} 0 & -\eta^T \\ \eta & 0 \end{pmatrix}, \qquad y = c \begin{pmatrix} 0 & \eta^T \\ -\eta & 0 \end{pmatrix} = -c\, \bar Y^{-1}, \qquad \eta^T \eta = 1_k,
\]
and to identify the Lie algebra elements $k$, $a\cdot\alpha$, $d$, and $s_i$ as generators of infinitesimal twist gauge, electromagnetic gauge, scale, and $\bar G$ transformations respectively. We will occasionally decompose the $2k$-dimensional vectors $A, a, h, \alpha, \ldots$ into $k$-dimensional “electric” and “magnetic” parts $A_{(e)}, a_{(e)}, \ldots$ and $A_{(m)}, a_{(m)}, \ldots$ respectively (with $A_{(e)} = B$ and $A_{(m)} = C$).

4. Parametrization of the σ-Model

4.1. The General Structure. We use an “Iwasawa type” (KAN) decomposition of the Lie algebra $g$ with the two subgroups $H = \exp \mathcal{K}$ and $T = \exp \mathcal{A} \exp \mathcal{N}$ and coset representatives $\pi : G/H \mapsto T$ (“triangular gauge”). The action of the group $G$ on these coset representatives is
\[
G \ni u : \pi(x) \mapsto v(x)\, \pi(x)\, u^{-1}, \quad \text{with } v(x) \in H .
\]
The transformation with the constant parameter $u \in G$ is combined with a gauge transformation with parameter $v(x) \in H$ chosen such that the transformed coset representative is again an element of the triangular subgroup $T$. The infinitesimal form of these transformations is
\[
g \ni \delta g : \delta\pi(x) = \delta h(x)\, \pi(x) - \pi(x)\, \delta g, \quad \text{with } \delta h(x) \in h .
\]
It is convenient to rewrite the variation of $\pi(x)$ in the form
\[
\delta\pi(x) = \left( \delta h(x) - \pi(x)\, \delta g\, \pi^{-1}(x) \right) \pi(x),
\]
which allows us to determine $\delta h(x) \equiv \delta h(\pi(x), \delta g)$.

Note that the Iwasawa decomposition and the triangular subgroup $T$ are very convenient but are not really needed; all we need are unique coset representatives $\pi$ that allow us to compute $\delta h(\pi(x), \delta g)$. For some of the coset spaces $\bar G/\bar H$ discussed in Appendix A we will actually not use the triangular subgroup.

Given $\pi(x)$ we can compute two Lie algebra valued 1-forms $\mathcal{A}$ and $\mathcal{J}$,
\[
d\pi\, \pi^{-1} = \mathcal{A} + \mathcal{J} = (\mathcal{A}_m + \mathcal{J}_m)\, dx^m, \quad \text{where } \tau(\mathcal{A}) = \mathcal{A}, \quad \tau(\mathcal{J}) = -\mathcal{J},
\]


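with transformation laws
\[
\delta\mathcal{J} = [\delta h, \mathcal{J}], \qquad \delta\mathcal{A} = d\,\delta h + [\delta h, \mathcal{A}],
\]
i.e., $\mathcal{J}$ is a covariant (“matter”) field and $\mathcal{A}$ can be interpreted as gauge connection. The σ-model Lagrangian is
\[
L = -\frac14 \langle \mathcal{J}, \mathcal{J} \rangle,
\]
with the implied contraction of 1-forms $\mathcal{J}_m dx^m\, \mathcal{J}_n dx^n \mapsto \mathcal{J}_m \mathcal{J}^m$. We can rewrite the resulting field equations
\[
D_m \mathcal{J}^m \equiv \nabla_m \mathcal{J}^m - [\mathcal{A}_m, \mathcal{J}^m] = 0,
\]
in the form
\[
D_m \mathcal{J}^m = \pi\, \nabla_m J^m\, \pi^{-1}, \quad \text{with } J = \pi^{-1} \mathcal{J} \pi, \quad L = -\frac14 \langle J, J \rangle,
\]
and express $J$ in terms of $\pi$ as
\[
J = \frac12\, \mu^{-1} d\mu, \quad \text{with } \mu = \tau(\pi^{-1})\, \pi,
\]
and with the linear transformation laws
\[
G \ni u : \mu \mapsto \tau(u)\, \mu\, u^{-1}, \qquad J \mapsto u J u^{-1} .
\]

4.2. The SL(2)/SO(2) σ-Model. Dimensional reduction of gravity from 4 to 3 dimensions yields the well known $SL(2)/SO(2)$ σ-model with two scalar fields, the square $\Delta$ of the Killing vector and the twist potential $\psi$ [5].

The coset representatives $\tilde\pi \in \tilde G \equiv SL(2)$ are
\[
\tilde\pi = e^{\ln\Delta\, d}\, e^{\psi k} .
\]
We will need the expressions
\[
\delta\tilde\pi\, \tilde\pi^{-1} = \frac{\delta\Delta}{\Delta}\, d + \frac{\delta\psi}{\Delta}\, k, \qquad
\tilde\pi^{-1} \delta\tilde\pi = \frac{\delta\Delta}{\Delta}\, d + \Big( \delta\psi - \psi\, \frac{\delta\Delta}{\Delta} \Big) k,
\]
in order to compute the variation of the fields $\Delta$ and $\psi$ under infinitesimal transformations generated by $e$ (Ehlers transformation), $d$ (scale transformation), and $k$ (twist gauge transformation). The expression $\tilde\pi\, \delta\tilde g$ has already the required form for $\delta\tilde g = k$ or $d$ (the generators of the triangular subgroup $T$) and therefore the “compensating” gauge transformation vanishes in these two cases:
\[
\begin{aligned}
\delta\tilde g = k : \quad \tilde\pi^{-1} \delta\tilde\pi &= -k & &\Rightarrow \delta\Delta = 0, & \delta\psi &= -1, \\
\delta\tilde g = d : \quad \tilde\pi^{-1} \delta\tilde\pi &= -d & &\Rightarrow \delta\Delta = -\Delta, & \delta\psi &= -\psi .
\end{aligned}
\]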


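For the Ehlers transformation we find
\[
\delta\tilde g = e : \quad \delta\tilde\pi\, \tilde\pi^{-1} = \delta\tilde h - \tilde\pi\, e\, \tilde\pi^{-1} = \delta\tilde h - \Delta\, e + 2\psi\, d + \frac{\psi^2}{\Delta}\, k .
\]
Since the coefficient of $e$ in $\delta\tilde\pi\, \tilde\pi^{-1}$ has to vanish we must choose $\delta\tilde h = \Delta (e - k)$ and obtain
\[
\delta\tilde g = e : \quad \delta\tilde\pi\, \tilde\pi^{-1} = 2\psi\, d + \Big( \frac{\psi^2}{\Delta} - \Delta \Big) k \quad \Rightarrow \quad \delta\Delta = 2\psi\Delta, \quad \delta\psi = \psi^2 - \Delta^2 .
\]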
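The three variations of $(\Delta, \psi)$ obtained above, under $k$, $d$, and $e$, can be checked to close into $sl(2)$ with the ordering $[\delta(X), \delta(Y)] = \delta([Y, X])$. The sketch below treats each variation as a vector field on the $(\Delta, \psi)$ plane and evaluates the commutator by central differences, which are exact up to rounding because all components are at most quadratic. The function names and the sample point are assumptions made for this illustration.

```python
def delta_k(D, p):
    """Twist gauge transformation: (delta Delta, delta psi)."""
    return (0.0, -1.0)


def delta_d(D, p):
    """Scale transformation."""
    return (-D, -p)


def delta_e(D, p):
    """Ehlers transformation."""
    return (2.0 * p * D, p * p - D * D)


def bracket(V, W, D, p, h=1e-4):
    """[delta(X), delta(Y)] on (Delta, psi): V^j d_j W^i - W^j d_j V^i."""
    def directional(F, direction):
        a, b = direction
        plus, minus = F(D + h * a, p + h * b), F(D - h * a, p - h * b)
        return tuple((u - v) / (2.0 * h) for u, v in zip(plus, minus))

    dW = directional(W, V(D, p))
    dV = directional(V, W(D, p))
    return tuple(u - v for u, v in zip(dW, dV))


def close(u, v, tol=1e-7):
    return all(abs(a - b) < tol for a, b in zip(u, v))


if __name__ == "__main__":
    D, p = 1.5, 0.7
    # [k, d] = k, [e, d] = -e, [k, e] = -2d in the sl(2) algebra of Sect. 2.2
    print(close(bracket(delta_d, delta_k, D, p), delta_k(D, p)))
    print(close(bracket(delta_d, delta_e, D, p),
                tuple(-v for v in delta_e(D, p))))
    print(close(bracket(delta_e, delta_k, D, p),
                tuple(-2.0 * v for v in delta_d(D, p))))
```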

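The commutators of these variations of the fields $\Delta$ and $\psi$ reflect the structure of the Lie algebra
\[
[\delta(X), \delta(Y)] = \delta([Y, X]),
\]
where $\delta(X)$ are the variations under the transformation generated by $X \in \tilde g$.

The decomposition of $d\tilde\pi\, \tilde\pi^{-1}$ yields
\[
\tilde{\mathcal{A}} = \frac{d\psi}{2\Delta}\, (k - e), \qquad \tilde{\mathcal{J}} = \frac{d\Delta}{\Delta}\, d + \frac{d\psi}{2\Delta}\, (e + k), \qquad L = -\frac{(\partial\Delta)^2 + (\partial\psi)^2}{4\Delta^2},
\]
and the gauge invariant current is
\[
\tilde J = \tilde\pi^{-1} \tilde{\mathcal{J}}\, \tilde\pi = \frac{d\psi}{2\Delta^2}\, e + \frac{\Delta\, d\Delta + \psi\, d\psi}{\Delta^2}\, d + \frac{(\Delta^2 - \psi^2)\, d\psi - 2\psi\Delta\, d\Delta}{2\Delta^2}\, k .
\]
We can easily translate these relations for Lie algebra and group elements into matrix equations, using the 2-dimensional matrix representation $\tilde\rho$,
\[
\tilde\rho(\epsilon e + \delta d + \kappa k) = \begin{pmatrix} \frac12 \delta & \epsilon \\ \kappa & -\frac12 \delta \end{pmatrix}, \qquad \langle X, Y \rangle = 2\, \mathrm{Tr}(\tilde\rho(X)\, \tilde\rho(Y)),
\]
with
\[
\tilde\rho(\tilde\tau(X)) = -\tilde\rho^T(X) = \tilde y\, \tilde\rho(X)\, \tilde y^{-1}, \qquad \tilde y = i\sigma_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},
\]
and $\tilde\rho(\tilde\pi) = \tilde P$, $\tilde\rho(\tilde\mu) = \tilde M$,
\[
\tilde P = \begin{pmatrix} \sqrt\Delta & 0 \\ \frac{\psi}{\sqrt\Delta} & \frac{1}{\sqrt\Delta} \end{pmatrix}, \qquad
\tilde M = \tilde P^T \tilde P = \begin{pmatrix} \Delta + \Delta^{-1}\psi^2 & \Delta^{-1}\psi \\ \Delta^{-1}\psi & \Delta^{-1} \end{pmatrix} .
\]

4.3. The G/H σ-Model. The fields of the $G/H$ (or $G/H'$) σ-model are the scalars from the 4-dimensional $\bar G/\bar H$ σ-model described by coset representatives $\bar\pi$, the two scalars $\Delta$ and $\psi$ from 4-dimensional gravity, and $2k$ scalar (“electromagnetic”) potentials $A_i$ from the $k$ vector fields. Note that the twist vector $\omega_m$ now has a contribution from the electromagnetic potentials
\[
\omega = \omega_m dx^m = d\psi - \frac12\, A\cdot y\, dA .
\]
We choose $G/H$ coset representatives $\pi$ in the triangular group $T \subset G$,
\[
\pi = e^{\ln\Delta\, d}\, \bar\pi\, e^{a\cdot A + \psi k},
\]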


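with the $\bar G/\bar H$ coset representatives $\bar\pi \in \bar T \subset \bar G$. We will again need the expressions
\[
\begin{aligned}
\delta\pi\, \pi^{-1} &= \frac{\delta\Delta}{\Delta}\, d + \delta\bar\pi\, \bar\pi^{-1} + a\cdot\frac{\bar P\, \delta A}{\sqrt\Delta} + \frac{\delta\psi - \frac12 A\cdot y\, \delta A}{\Delta}\, k, \\
\pi^{-1} \delta\pi &= \frac{\delta\Delta}{\Delta}\, d + \bar\pi^{-1} \delta\bar\pi + a\cdot\Big( \delta A + \bar P^{-1} \delta\bar P\, A - A\, \frac{\delta\Delta}{2\Delta} \Big) \\
&\quad + \Big( \delta\psi + \frac12 A\cdot y\, \delta A + \frac12 A\cdot y\, \bar P^{-1} \delta\bar P\, A - \psi\, \frac{\delta\Delta}{\Delta} \Big) k,
\end{aligned}
\]
where $\bar P$ represents the group element $\bar\pi \in \bar G$ in the $2k$-dimensional matrix representation $\bar\rho$,
\[
\bar g \ni s_i \mapsto R(s_i) = R_i, \qquad \bar G \ni \bar\pi \mapsto \bar P, \quad \text{with } \bar\pi\, h\cdot\alpha\, \bar\pi^{-1} = h\cdot\bar P\alpha .
\]
For the infinitesimal transformations generated by $k$ (shift of the twist potential), $a$ (shift of the electromagnetic potentials), and $d$ (scale transformation) we have $\delta h = 0$ and thus $\pi^{-1} \delta\pi = -\delta g$. The variations of the fields are:
\[
\begin{aligned}
\delta g = k : \quad & \delta\Delta = 0, & \delta\bar\pi &= 0, & \delta A &= 0, & \delta\psi &= -1, \\
\delta g = a\cdot\alpha : \quad & \delta\Delta = 0, & \delta\bar\pi &= 0, & \delta A &= -\alpha, & \delta\psi &= \tfrac12 A\cdot y\alpha, \\
\delta g = d : \quad & \delta\Delta = -\Delta, & \delta\bar\pi &= 0, & \delta A &= -\tfrac12 A, & \delta\psi &= -\psi .
\end{aligned}
\]
For $\delta g = \delta\bar g \in \bar g$ we need the compensating gauge transformation $\delta h = \delta\bar h(\bar\pi, \delta\bar g)$ resulting in
\[
\begin{aligned}
\delta g = \delta\bar g : \quad \pi^{-1} \delta\pi &= \pi^{-1}\, \delta\bar h\, \pi - \delta\bar g \\
&= \bar\pi^{-1}\, \delta\bar h\, \bar\pi - \delta\bar g + a\cdot R(\bar\pi^{-1} \delta\bar h\, \bar\pi) A + \tfrac12 \left( A\cdot y R(\bar\pi^{-1} \delta\bar h\, \bar\pi) A \right) k .
\end{aligned}
\]
Comparing coefficients and using $\bar P^{-1} \delta\bar P = R(\bar\pi^{-1} \delta\bar\pi)$ we find
\[
\delta\Delta = 0, \qquad \delta\bar\pi = \delta\bar h\, \bar\pi - \bar\pi\, \delta\bar g, \qquad \delta A = R(\delta\bar g) A, \qquad \delta\psi = 0 .
\]
In order to determine the compensating gauge transformation $\delta h$ required for the infinitesimal transformations generated by $h$ (Harrison transformations) and $e$ (Ehlers transformation) we first compute $\pi\, \delta g\, \pi^{-1}$. For the Harrison transformations with $\delta g = h\cdot\alpha$ we obtain
\[
\begin{aligned}
\delta h - \pi\, \delta g\, \pi^{-1} &= \delta h - \exp\Big( \frac{a\cdot\bar P A}{\sqrt\Delta} + \frac{\psi}{\Delta}\, k \Big)\, h\cdot\bar P\alpha\, \sqrt\Delta\, \exp\Big( -\Big( \frac{a\cdot\bar P A}{\sqrt\Delta} + \frac{\psi}{\Delta}\, k \Big) \Big) \\
&= \delta h - h\cdot\bar P\alpha\, \sqrt\Delta - A\cdot y\alpha\; d + A\cdot x^l\alpha\; \bar\pi s_l \bar\pi^{-1} \\
&\quad + \frac{1}{\sqrt\Delta}\, a\cdot\bar P \left( \alpha\, \psi - \tfrac14 A\, A\cdot y\alpha - \tfrac12 R_l A\, A\cdot x^l\alpha \right) \\
&\quad - \frac{\psi}{\Delta}\, A\cdot y\alpha\; k + \frac{1}{6\Delta}\, A\cdot y R_l A\; A\cdot x^l\alpha\; k .
\end{aligned}
\]
This requires
\[
\delta h(\pi, h\cdot\alpha) = (h\cdot\bar P\alpha + a\cdot Z\bar P\alpha)\, \sqrt\Delta - A\cdot x^l\alpha\; \delta\bar h(\bar\pi, s_l),
\]
and yields
\[
\begin{aligned}
\delta\pi\, \pi^{-1} &= -A\cdot y\alpha\; d - A\cdot x^l\alpha \left( \delta\bar h(\bar\pi, s_l) - \bar\pi s_l \bar\pi^{-1} \right) + a\cdot Z\bar P\alpha\, \sqrt\Delta \\
&\quad + \frac{1}{\sqrt\Delta}\, a\cdot\bar P \left( \alpha\, \psi - \tfrac14 A\, A\cdot y\alpha - \tfrac12 R_l A\, A\cdot x^l\alpha \right) \\
&\quad - \frac{\psi}{\Delta}\, A\cdot y\alpha\; k + \frac{1}{6\Delta}\, A\cdot y R_l A\; A\cdot x^l\alpha\; k .
\end{aligned}
\]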


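Comparing coefficients we find the variations of the fields
\[
\begin{aligned}
\delta\Delta &= -A\cdot y\alpha\; \Delta, \\
\delta\bar\pi &= -A\cdot x^l\alpha \left( \delta\bar h(\bar\pi, s_l)\, \bar\pi - \bar\pi s_l \right), \\
\delta A &= \alpha\, \psi + c\, y^{-1} \bar M \alpha\, \Delta - \tfrac14 A\, A\cdot y\alpha - \tfrac12 R_l A\, A\cdot x^l\alpha, \\
\delta\psi &= -\tfrac12 A\cdot y\alpha\; \psi + \tfrac{c}{2}\, A\cdot\bar M\alpha\; \Delta - \tfrac{1}{12}\, A\cdot y R_l A\; A\cdot x^l\alpha,
\end{aligned}
\]
with the positive definite symmetric matrix $\bar M = \bar P^T \bar P$. For the Ehlers transformations with $\delta g = e$ we obtain
\[
\begin{aligned}
\delta h - \pi\, \delta g\, \pi^{-1} &= \delta h - \exp\Big( \frac{a\cdot\bar P A}{\sqrt\Delta} + \frac{\psi}{\Delta}\, k \Big)\, \Delta\, e\, \exp\Big( -\Big( \frac{a\cdot\bar P A}{\sqrt\Delta} + \frac{\psi}{\Delta}\, k \Big) \Big) \\
&= \delta h - e\, \Delta - h\cdot\bar P A\, \sqrt\Delta + 2\psi\, d + \tfrac12 A\cdot x^l A\; \bar\pi s_l \bar\pi^{-1} \\
&\quad + \frac{1}{\sqrt\Delta}\, a\cdot\bar P \left( A\, \psi - \tfrac16 R_l A\, A\cdot x^l A \right) + \frac{\psi^2}{\Delta}\, k + \frac{1}{24\Delta}\, A\cdot y R_l A\; A\cdot x^l A\; k .
\end{aligned}
\]
This requires
\[
\delta h(\pi, e) = \Delta (e - k) + (h\cdot\bar P A + a\cdot Z\bar P A)\, \sqrt\Delta - \tfrac12 A\cdot x^l A\; \delta\bar h(\bar\pi, s_l),
\]
and yields
\[
\begin{aligned}
\delta\pi\, \pi^{-1} &= 2\psi\, d - \tfrac12 A\cdot x^l A \left( \delta\bar h(\bar\pi, s_l) - \bar\pi s_l \bar\pi^{-1} \right) + a\cdot Z\bar P A\, \sqrt\Delta \\
&\quad + \frac{1}{\sqrt\Delta}\, a\cdot\bar P \left( A\, \psi - \tfrac16 R_l A\, A\cdot x^l A \right) + \frac{\psi^2 - \Delta^2}{\Delta}\, k + \frac{1}{24\Delta}\, A\cdot y R_l A\; A\cdot x^l A\; k .
\end{aligned}
\]
Comparing coefficients we find the variations of the fields
\[
\begin{aligned}
\delta\Delta &= 2\psi\, \Delta, \\
\delta\bar\pi &= -\tfrac12 A\cdot x^l A \left( \delta\bar h(\bar\pi, s_l)\, \bar\pi - \bar\pi s_l \right), \\
\delta A &= A\, \psi + c\, y^{-1} \bar M A\, \Delta - \tfrac16 R_l A\, A\cdot x^l A, \\
\delta\psi &= \psi^2 - \Delta^2 + \tfrac{c}{2}\, A\cdot\bar M A\; \Delta - \tfrac{1}{24}\, A\cdot y R_l A\; A\cdot x^l A .
\end{aligned}
\]
The decomposition of $d\pi\, \pi^{-1}$ yields
\[
\begin{aligned}
\mathcal{A} &= \bar{\mathcal{A}} + \frac{1}{2\sqrt\Delta} \left( a\cdot\bar P\, dA - h\cdot Z\bar P\, dA \right) + \frac{\omega}{2\Delta}\, (k - e), \\
\mathcal{J} &= \frac{d\Delta}{\Delta}\, d + \bar{\mathcal{J}} + \frac{1}{2\sqrt\Delta} \left( a\cdot\bar P\, dA + h\cdot Z\bar P\, dA \right) + \frac{\omega}{2\Delta}\, (e + k),
\end{aligned}
\]
where $\bar{\mathcal{A}}$ and $\bar{\mathcal{J}}$ are the quantities derived from $\bar\pi$. The Lagrangian is
\[
L = -\frac14 \langle \mathcal{J}, \mathcal{J} \rangle = -\frac14 \langle \bar{\mathcal{J}}, \bar{\mathcal{J}} \rangle + \frac{c}{4\Delta}\, \partial A\cdot\bar M\, \partial A - \frac{(\partial\Delta)^2 + \omega^2}{4\Delta^2} .
\]
The gauge invariant current is
\[
J = \pi^{-1} \mathcal{J} \pi = N^{-1} \hat J N, \qquad N = \exp(a\cdot A + \psi k),
\]
with
\[
\hat J = \frac{\omega}{2\Delta^2}\, e + \frac{1}{2\Delta}\, h\cdot Z\bar M\, dA + \frac{d\Delta}{\Delta}\, d + \bar J + \frac12\, a\cdot dA + \frac{\omega}{2}\, k .
\]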


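In order to compute $N^{-1} \hat J N$ let us first evaluate $N^{-1} X N$ for all $X \in g$,
\[
\begin{aligned}
N^{-1} e\, N &= e - h\cdot A + 2\psi\, d - \tfrac12 A\cdot x^i A\; s_i - a\cdot A\, \psi - \tfrac16\, a\cdot R_i A\; A\cdot x^i A - \psi^2 k - \tfrac{1}{24}\, A\cdot x^i A\; A\cdot y R_i A\; k, \\
N^{-1} h\cdot\alpha\, N &= h\cdot\alpha - A\cdot y\alpha\; d + A\cdot x^i\alpha\; s_i + a\cdot\alpha\, \psi + \tfrac14\, a\cdot A\; A\cdot y\alpha + \tfrac12\, a\cdot R_i A\; A\cdot x^i\alpha \\
&\quad + A\cdot y\alpha\; \psi\, k + \tfrac16\, A\cdot x^i A\; A\cdot y R_i\alpha\; k, \\
N^{-1} d\, N &= d - \tfrac12\, a\cdot A - \psi\, k, \\
N^{-1} s_i\, N &= s_i + a\cdot R_i A + \tfrac12\, A\cdot y R_i A\; k, \\
N^{-1} a\cdot\alpha\, N &= a\cdot\alpha + A\cdot y\alpha\; k, \\
N^{-1} k\, N &= k .
\end{aligned}
\]
Expanding $J$ with respect to the generators
\[
J = \tfrac12 J^e\, e + \tfrac12\, h\cdot Z J^h + J^d\, d + J^i s_i + \tfrac12\, a\cdot J^a + \tfrac12 J^k\, k,
\]
and collecting all terms we obtain the conserved currents
\[
\begin{aligned}
J^e &= \frac{\omega}{\Delta^2}, \\
J^h &= \frac{1}{\Delta}\, \bar M\, dA + Z A\, \frac{\omega}{\Delta^2}, \\
J^d &= \frac{d\Delta}{\Delta} - \frac{c}{2\Delta}\, A\cdot\bar M\, dA + \frac{\omega}{\Delta^2}\, \psi, \\
J^i s_i &= \bar J + \frac{1}{2\Delta}\, A\cdot x^i Z\bar M\, dA\; s_i - \frac{\omega}{4\Delta^2}\, A\cdot x^i A\; s_i, \\
J^a &= dA - A\, \frac{d\Delta}{\Delta} + 2 R(\bar J) A + \frac{1}{2\Delta}\, R_i A\; A\cdot x^i Z\bar M\, dA + \frac{1}{\Delta}\, Z\bar M\, dA\, \psi \\
&\quad + \frac{c}{4\Delta}\, A\; A\cdot\bar M\, dA - \frac{\omega}{\Delta^2}\, A\, \psi - \frac{\omega}{6\Delta^2}\, R_i A\; A\cdot x^i A, \\
J^k &= \omega + A\cdot y\, dA - 2\, \frac{d\Delta}{\Delta}\, \psi + A\cdot y R(\bar J) A + \frac{c}{\Delta}\, A\cdot\bar M\, dA\; \psi \\
&\quad - \frac{c}{6\Delta}\, A\cdot x^i A\; A\cdot R_i^T \bar M\, dA - \frac{\omega}{\Delta^2}\, \psi^2 - \frac{\omega}{24\Delta^2}\, A\cdot x^i A\; A\cdot y R_i A,
\end{aligned}
\]
and the Lagrangian
\[
L = -\frac14 \langle J, J \rangle = -\frac14 \left( J^i J^j \langle s_i, s_j \rangle - c\, J^a\cdot J^h + J^d J^d + J^e J^k \right) .
\]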

Using the identity $x^i \langle s_i, s_j \rangle = -2 y R_j$ it is straightforward, although tedious, to verify that this reproduces the expression given above.

5. Application to Black Holes

In our previous paper with Gibbons [1] we used the σ-model formulation in order to generalize a number of results on stationary black holes of the Einstein–Maxwell theory to a large class of 4-dimensional theories with abelian gauge fields and scalars. With the more detailed knowledge of their group theoretical structure gained in this paper, some further applications are conceivable. Here we just give simple derivations of two such


results for strictly stationary single black holes. First we generalize a quadratic mass formula of Heusler [4] to all the theories under consideration. It is quite remarkable how the use of the group theoretical structure simplifies the derivation. As a second application we state the action of the Ehlers–Harrison transformation on the charges of the black hole.

Consider a strictly stationary single black hole solution for one of the theories discussed in this article in its dimensionally reduced form. We can choose suitable coordinates on $\Sigma_3$ as well as a suitable gauge such that the behaviour of the fields at infinity is
\[
h_{mn}(x) = \delta_{mn} + O\Big( \frac1r \Big), \qquad \pi(x) = \exp\Big( \frac{\pi_1}{r} + O\Big( \frac{1}{r^2} \Big) \Big) .
\]
The element $\pi_1 \in t$ (where $t$ is the Lie algebra of the triangular group $T$) contains the charges
\[
\pi_1 = -2m\, d + 2n\, k + a\cdot q + \bar\pi_1,
\]
the total mass $m$, the NUT-charge $n$, the combined electric and magnetic charges $q$, and scalar charges determined by $\bar\pi_1 \in \bar t$. Integrating the conserved currents $J$ over a closed surface $\Sigma$ enclosing the black hole yields the Lie algebra valued charge
\[
Q = \frac{1}{4\pi} \oint J^m\, d\Sigma_m .
\]
Evaluating this integral at infinity we obtain
\[
Q = -\frac12 \left( \pi_1 - \tau(\pi_1) \right) = 2m\, d - n (e + k) - \frac12 \left( a\cdot q + h\cdot Z q \right) + Q^i s_i,
\]
with the scalar charges $\bar Q = Q^i s_i = -\frac12 (\bar\pi_1 - \tau(\bar\pi_1))$. The overall sign follows from inserting the expansion of $\pi_1$ and using $\tau(d) = -d$, $\tau(k) = -e$, $\tau(a\cdot q) = -h\cdot Z q$.

Due to the assumption of strict stationarity, the field $\Delta$ vanishes on the horizon $H$ and is positive everywhere outside $H$. The boundary conditions at the horizon [7] imply that $\psi$, $A$, and the surface gravity $\kappa = \frac12 \sqrt{h^{mn} \partial_m\Delta\, \partial_n\Delta}$ are constant on the horizon, whereas the scalar fields $\phi^i$ are finite but may vary. Moreover, the twist vector $\omega_m$ vanishes linearly with $\Delta$, and the area of the horizon is given by
\[
a_H = \lim_{\Delta \to 0} \oint_{\Delta = \mathrm{const.}} \frac{1}{\Delta}\, n^m\, d\Sigma_m,
\]
where $n$ is the unit normal to the surface $\Delta = \mathrm{const.}$ with respect to the rescaled metric $h_{mn}$.

5.1. A Quadratic Mass Formula. Evaluating the integral for $Q$ on the horizon yields
\[
Q = N_H^{-1} Q_H N_H,
\]
where $N_H$ is the value of $N = \exp(a\cdot A + \psi k)$ on the horizon, and the charges $Q_H$ on the horizon are
\[
Q_H = \frac{1}{4\pi} \oint_H \hat J^m\, d\Sigma_m = 2 m_H\, d - n_H\, e - \frac12\, h\cdot Z q_H,
\]


with the “irreducible mass”
\[
m_H = \frac{1}{8\pi} \oint_H \frac{1}{\Delta}\, \partial_m\Delta\, d\Sigma^m = \frac{1}{4\pi}\, \kappa_H a_H,
\]
and the NUT and vector charges “at the horizon”
\[
n_H = -\frac{1}{8\pi} \oint_H \frac{\omega_m}{\Delta^2}\, d\Sigma^m, \qquad q_H = -\frac{1}{4\pi} \oint_H \frac{\bar M\, \partial_m A}{\Delta}\, d\Sigma^m .
\]
Note that $Q_H$ has no contributions with the generators $s_i$, $a$, and $k$, because the corresponding terms in $\hat J$ carry no inverse powers of $\Delta$. Using the invariance of the scalar product, the relation between $Q$ and $Q_H$ implies $\langle Q, Q \rangle = \langle Q_H, Q_H \rangle$. Furthermore the structure of the scalar product (pairing $e$ with $k$, etc.) yields $\langle Q_H, Q_H \rangle = 4 m_H^2$ and hence the desired quadratic mass formula
\[
m^2 + n^2 - \frac{c}{4}\, q^T q + \langle \bar Q, \bar Q \rangle_{\bar g} = m_H^2,
\]
extending the well known relation $m^2 + n^2 - q^2 - p^2 = m_H^2$ for the generalized Reissner–Nordström solutions with electric and magnetic charges $q$ and $p$ to a large class of theories.

5.2. The Action of G on Black Holes. The action of $G$, given explicitly in Sect. 4.3 for the infinitesimal transformations, preserves the boundary conditions on the horizon, and leaves the irreducible mass invariant. The action of $G$ on the conserved currents is
\[
G \ni u : J \mapsto u J u^{-1} .
\]
In order to preserve the boundary conditions at infinity, we have to restrict the transformations to the subgroup $H$, with the action on the Lie algebra valued charge $Q$ given by
\[
H \ni u : Q \mapsto u Q u^{-1} .
\]
For the infinitesimal transformation generated by $\delta g = e - k$ this yields
\[
\delta m = -2n, \qquad \delta n = 2m, \qquad \delta q = Z q, \qquad \delta\bar Q = 0,
\]
for the transformation generated by $\delta g = h\cdot\alpha + a\cdot Z\alpha$,
\[
\begin{aligned}
\delta m &= -\frac{c}{2}\, q\cdot Z\alpha, & \delta q &= 2\alpha\, n - 2 Z\alpha\, m + 2 R(\bar Q) Z\alpha, \\
\delta n &= \frac{c}{2}\, q\cdot\alpha, & \delta\bar Q &= -\frac12\, q\cdot x^l\alpha\, (s_l - \tau(s_l)),
\end{aligned}
\]
and for the transformations generated by $\delta g = \delta\bar g \in \bar h$,
\[
\delta m = 0, \qquad \delta n = 0, \qquad \delta q = R(\delta\bar g)\, q, \qquad \delta\bar Q = [\delta\bar g, \bar Q] .
\]
Starting from a nondegenerate black hole (i.e. $m_H > 0$) with charges $m$, $n$, $q$, and $\bar Q$, we can apply a suitable group element from $H$ to obtain a new solution with the same $m_H$, and with modified charges $m$, $n$, $q$, and $\bar Q$. This new solution is again a strictly stationary black hole as long as the transformation does not introduce zeros or infinities of $\Delta$.
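The special case quoted in Sect. 5.1, $m^2 - q^2 - p^2 = m_H^2$ with $n = 0$, can be illustrated numerically for the Reissner–Nordström solution. The sketch below is not taken from the paper: it assumes the textbook expressions in units $G = c = 1$ with Gaussian charges, namely horizon radii $r_\pm = m \pm \sqrt{m^2 - q^2 - p^2}$, surface gravity $\kappa = (r_+ - r_-)/(2 r_+^2)$, and horizon area $4\pi r_+^2$, and then evaluates $m_H = \kappa_H a_H / 4\pi$ as defined above.

```python
import math


def rn_horizon(m, q, p):
    """Surface gravity and horizon area of a Reissner-Nordstrom black hole.

    Assumes units G = c = 1 and Gaussian electric (q) and magnetic (p) charges.
    """
    disc = m * m - q * q - p * p
    if disc <= 0.0:
        raise ValueError("need m^2 > q^2 + p^2 for a nondegenerate horizon")
    root = math.sqrt(disc)
    r_plus, r_minus = m + root, m - root
    kappa = (r_plus - r_minus) / (2.0 * r_plus ** 2)
    area = 4.0 * math.pi * r_plus ** 2
    return kappa, area


def irreducible_mass(m, q, p):
    """m_H = kappa_H * a_H / (4 pi), in the sense used in Sect. 5.1."""
    kappa, area = rn_horizon(m, q, p)
    return kappa * area / (4.0 * math.pi)


if __name__ == "__main__":
    m, q, p = 2.0, 1.0, 0.5
    lhs = m * m - q * q - p * p
    print(lhs, irreducible_mass(m, q, p) ** 2)
```

For vanishing charges the function returns $m_H = m$, the Schwarzschild value.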


A. Appendix: The Individual Models

Dimensional reduction of 4-dimensional theories consisting of a $\bar G/\bar H$ σ-model coupled to $k$ abelian vector fields and gravity to 3 dimensions yields $(\dim \bar G - \dim \bar H) + 2k + 2$ scalars coupled to gravity. The conditions that all these scalars form one $G/H$ σ-model have been discussed in [1]. In Table 1, reproduced from that paper, we list 15 different possibilities, all with a simple Lie group $G$. In the following we discuss some of these cases in detail and finally indicate a general procedure applicable to all cases.

A.1. SL(n+2)/SO(n,2). These are the Kałuza–Klein theories obtained from pure gravity in $D = n + 4$ dimensions by dimensional reduction with respect to $n$ commuting space-like Killing vectors. The 4-dimensional theory consists of $k = n$ vector fields and a $GL(n)/SO(n)$ σ-model with $\frac12 n(n+1)$ scalars. Let $\hat\pi$ be the $GL(n)/SO(n)$ coset representative and $\hat\rho$ the $k$-dimensional matrix representation
\[
\hat\rho : \hat\pi \mapsto \hat\rho(\hat\pi) = \hat P, \qquad \tau(\hat P) = \hat P^{T\,-1}, \qquad \hat\rho(s_i) = r_i .
\]
The action for the 4-dimensional theory is given by
\[
\langle \bar J, \bar J \rangle_{\bar g} = 2\, \mathrm{Tr}(\hat J \hat J) - \frac{2}{n+2}\, \mathrm{Tr}(\hat J)\, \mathrm{Tr}(\hat J), \qquad c = 1, \quad \tilde\mu = \hat M, \quad \tilde\nu = 0,
\]
where the symmetric matrix $\hat M = \hat P^T \hat P$ contains the (rescaled) scalar products of the $n$ Killing vectors, and the σ-model currents are
\[
\hat\rho(\bar J) = \hat J = \frac12\, \hat M^{-1} d\hat M .
\]
The theory is invariant under the action of $GL(n)$ and $SO(n)$,
\[
GL(n) \ni u : B_a \mapsto u\, B_a, \qquad \hat P \mapsto v \hat P u^{-1}, \quad \text{with } v \in SO(n) .
\]
Further dimensional reduction with respect to a time-like Killing vector yields (with a suitable choice for $\eta$) the 3-dimensional theory with
\[
\bar M = \begin{pmatrix} \hat M & 0 \\ 0 & \hat M^{-1} \end{pmatrix}, \qquad R_i = \begin{pmatrix} r_i & 0 \\ 0 & -r_i^T \end{pmatrix}, \qquad y = \begin{pmatrix} 0 & 1_n \\ -1_n & 0 \end{pmatrix} .
\]
This model has the unique property that the scalar potentials $A$ transform under a reducible representation of the group $\bar G = GL(n)$, due to the fact that the action of $\bar G$ on the field strengths $B_{ab}$ involves no duality transformations; this leads to an unambiguous decomposition into electric and magnetic potentials $A_{(e)}$ and $A_{(m)}$ respectively.

In order to exhibit the $SL(n+2)/SO(n,2)$ σ-model consider the $(n+2)$-dimensional matrix representation $\rho(X)$ of the element $X = \epsilon e + h\cdot\alpha + \delta d + \lambda^i s_i + a\cdot\beta + \kappa k$ of the Lie algebra $sl(n+2)$ and the involutive automorphism $\rho(\tau(X)) = -D^{-1} \rho^T(X) D$ with
\[
\rho(X) = \begin{pmatrix} \frac12 \delta & -\alpha_{(m)}^T & \epsilon \\ \beta_{(e)} & \lambda^i r_i & \alpha_{(e)} \\ \kappa & \beta_{(m)}^T & -\frac12 \delta \end{pmatrix} - \frac{1}{n+2}\, \mathrm{Tr}(\lambda^i r_i)\, 1_{n+2}, \qquad
D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1_n & 0 \\ 0 & 0 & 1 \end{pmatrix},
\]


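and with the scalar product $\langle X, Y \rangle = 2\, \mathrm{Tr}(\rho(X)\, \rho(Y))$. Finally we obtain
\[
P = \rho(\pi) = (\det \hat P)^{-\frac{1}{n+2}} \begin{pmatrix} \sqrt\Delta & 0 & 0 \\ \hat P A_{(e)} & \hat P & 0 \\ \frac{1}{\sqrt\Delta} \big( \psi + \frac12 A_{(m)}^T A_{(e)} \big) & \frac{1}{\sqrt\Delta}\, A_{(m)}^T & \frac{1}{\sqrt\Delta} \end{pmatrix},
\]
and $\rho(J) = \frac12 M^{-1} dM$ with $M = P^T D P$.

A.2. SU(p+1,q+1)/S(U(p,1) × U(1,q)). These are 4-dimensional theories with $k = p + q$ vector fields and the $2pq$ real scalars from $U(p,q)/(U(p) \times U(q))$. For $q = 0$ there are no scalars, the only effect of the $U(p)/U(p)$ σ-model is the action of $U(p)$ on the field strength. For $p = 1$, $q = 0$ this is the Einstein–Maxwell theory; the theories with $p > 1$, $q = 0$ are generalizations with several vector fields.

Let $\bar\pi$ be the $U(p,q)/(U(p) \times U(q))$ coset representative, $\hat\rho$ the $k$-dimensional complex matrix representation
\[
\hat\rho : \bar\pi \mapsto \hat\rho(\bar\pi) = \hat P, \qquad \hat V \hat P = (\hat P^+)^{-1} \hat V, \qquad \tau(\hat P) = (\hat P^+)^{-1},
\]
with the real $U(p,q)$ metric $\hat V = \hat V^T = \hat V^* = \hat V^{-1}$, and $\bar\rho$ the corresponding $2k$-dimensional real representation
\[
\bar\rho : \bar\pi \mapsto \bar\rho(\bar\pi) = \bar P = \begin{pmatrix} \mathrm{Re}\, \hat P & -\mathrm{Im}\, \hat P \\ \mathrm{Im}\, \hat P & \mathrm{Re}\, \hat P \end{pmatrix} .
\]
Decomposing the hermitian matrix $\hat M = \hat P^+ \hat P$ satisfying $\hat M \hat V \hat M = \hat V$ into real and imaginary parts $\tilde M = \tilde M^T = \mathrm{Re}\, \hat M$ and $\tilde N = -\tilde N^T = \mathrm{Im}\, \hat M$ we obtain the relations
\[
\tilde M \hat V \tilde M - \tilde N \hat V \tilde N = \hat V, \qquad \tilde M \hat V \tilde N + \tilde N \hat V \tilde M = 0 .
\]
This allows us to express the action for the 4-dimensional theory with $c = 4$ in terms of the symmetric matrices
\[
\tilde\mu = \hat V \tilde M^{-1} \hat V, \qquad \tilde\nu = \hat V \tilde M^{-1} \tilde N,
\]
and the invariant scalar product on $u(p,q)$,
\[
\langle X, Y \rangle_{\bar g} = 2\, \mathrm{Tr}(\hat\rho(X)\, \hat\rho(Y)) - \frac{2}{p+q+2}\, \mathrm{Tr}\, \hat\rho(X)\; \mathrm{Tr}\, \hat\rho(Y) .
\]
Note, however, that $\det \hat M = 1$ and therefore $\mathrm{Tr}\, \hat\rho(J) = 0$. Choosing $\eta = \hat V$, we obtain the 3-dimensional theory with
\[
\bar M = \begin{pmatrix} \tilde M & -\tilde N \\ \tilde N & \tilde M \end{pmatrix}, \qquad y = c \begin{pmatrix} 0 & \hat V \\ -\hat V & 0 \end{pmatrix} .
\]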

In order to construct the (k +2)-dimensional matrix representation ρ of SU (p +1, q +1) we first rearrange the 2k real components of A into a k-dimensional complex vector A =


A(e) +iA(m) and similarly for a, h, α, etc. The element X = e+h·α+δd+λi si +a·β+κk of su(p + 1, q + 1) is represented by √ +   1 ˆ i ˆ √2 δ − i 2α V √ i ))  − Tr(λ ρ(s 1k+2 , ρ(X) =  i 2β λ ρ(s ˆ ) i 2α i √ + k + 2 2β Vˆ κ − 21 δ such that ρ(X) = −V −1 ρ + (X)V  0 0 V =  0 Vˆ i 0

and ρ(τ (X)) = −D −1 ρ + (X)D with    −i 1 0 0 0  , D =  0 −1k 0  , 0 0 1 0

and with the scalar product < X, Y >= 2 Tr(ρ(X) ρ(Y )). Finally we obtain   √ 1 0 0 √ 1  Pˆ 0  i 2Pˆ A P = ρ(π ) = (det Pˆ )− k+2  , √ 2 1 + + √ (ψ + iA Vˆ A) √ A Vˆ √1 1

1

1

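the twist vector $\omega = d\psi - 2\, \mathrm{Im}\, A^+ \hat V dA$, and $\rho(J) = \frac12 M^{-1} dM$ with $M = P^+ D P$.

A.3. SO(p+2,q+2)/(SO(p,2) × SO(2,q)). These are 4-dimensional theories with $k = p + q$ vector fields, with $pq$ scalars from $SO(p,q)/(SO(p) \times SO(q))$, and with a dilaton $\varphi$ and an axion $\chi$ from $SL(2)/SO(2)$. Some well known examples are the Einstein–Maxwell-dilaton-axion theory with $p = 1$, $q = 0$ [8], the bosonic sector of $N = 4$ supergravity with $p = 6$, $q = 0$, and the (dimensionally reduced) bosonic sector of 10-dimensional supergravity with $p = q = 6$.

Let $\hat\pi$ be the $SO(p,q)/(SO(p) \times SO(q))$ coset representative and $\hat\rho$ the $k$-dimensional matrix representation
\[
\hat\rho : \hat\pi \mapsto \hat\rho(\hat\pi) = \hat P, \qquad \hat V \hat P = \hat P^{T\,-1} \hat V, \qquad \tau(\hat P) = \hat P^{T\,-1},
\]
with the real $SO(p,q)$ metric $\hat V = \hat V^T = \hat V^{-1}$, and consider the action for the 4-dimensional theory with
\[
\langle \bar J, \bar J \rangle_{\bar g} = \frac{(\partial\varphi)^2 + (\partial\chi)^2}{\varphi^2} + \mathrm{Tr}(\hat J \hat J), \qquad c = 1, \quad \tilde\mu = \varphi \hat M, \quad \tilde\nu = \chi \hat V,
\]
where
\[
\hat M = \hat P^T \hat P, \qquad \hat J = \frac12\, \hat M^{-1} d\hat M .
\]
Choosing $\eta = \hat V$, we obtain the 3-dimensional theory with $\bar M = \hat M \otimes \tilde M$, $y = \hat V \otimes \tilde y$, $\tilde y = i\sigma_2$, where
\[
\tilde P = \begin{pmatrix} \sqrt\varphi & 0 \\ \frac{\chi}{\sqrt\varphi} & \frac{1}{\sqrt\varphi} \end{pmatrix}, \qquad
\tilde M = \tilde P^T \tilde P = \begin{pmatrix} \varphi + \varphi^{-1}\chi^2 & \varphi^{-1}\chi \\ \varphi^{-1}\chi & \varphi^{-1} \end{pmatrix},
\]
parametrizes the $SL(2)/SO(2)$ σ-model.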
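The dilaton-axion parametrization can be checked directly. The sketch below builds $\tilde P$ and $\tilde M$ from sample values of $\varphi$ and $\chi$ (chosen arbitrarily for illustration) and verifies that $\tilde M = \tilde P^T \tilde P$, that $\det \tilde M = 1$, and that $\tilde M \tilde y \tilde M = \tilde y$ with $\tilde y = i\sigma_2$, as expected for a symmetric $SL(2)$ matrix. The helper names are hypothetical.

```python
import math


def p_tilde(phi, chi):
    """Triangular coset representative for the dilaton phi > 0 and axion chi."""
    s = math.sqrt(phi)
    return [[s, 0.0], [chi / s, 1.0 / s]]


def m_tilde(phi, chi):
    """Symmetric matrix M = P^T P written out in closed form."""
    return [[phi + chi * chi / phi, chi / phi], [chi / phi, 1.0 / phi]]


def transpose(a):
    return [list(row) for row in zip(*a)]


def mul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


Y_TILDE = [[0.0, 1.0], [-1.0, 0.0]]  # i * sigma_2

if __name__ == "__main__":
    phi, chi = 4.0, 3.0
    P, M = p_tilde(phi, chi), m_tilde(phi, chi)
    print(close(mul(transpose(P), P), M))
    print(M[0][0] * M[1][1] - M[0][1] * M[1][0])   # determinant
    print(close(mul(M, mul(Y_TILDE, M)), Y_TILDE))
```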


In order to construct the (k + 4)-dimensional matrix representation ρ of SO(p + 2, q + 2) we first rearrange the 2k components of A into a k × 2 matrix A = (A(e) , A(m) ) ¯ into sˆi ∈ gˆ = so(p, q) and and similarly for a, h, α, . . . and split the generators si of G s˜i ∈ g˜ = sl(2). The element X = e + h · α + δd + λi si + a · β + κk of so(p + 2, q + 2) is represented by   1 ˜ i ˜ si ) −α T Vˆ 1k 2 δ12 + λ ρ(˜ , ρ(X) =  β y˜ λˆ i ρ(ˆ ˆ si ) α y˜ β T Vˆ − 21 δ12 + λ˜ i ρ(˜ ˜ si ) κ12 such that ρ(X) = −V −1 ρ T (X)V  0 0 V =  0 Vˆ −y˜ 0

and ρ(τ (X)) = −D −1 ρ T (X)D with    y˜ 0 12 0 0  , D =  0 −1k 0  , 0 0 12 0

and with the scalar product < X, Y >= Tr(ρ(X) ρ(Y )). Finally we obtain   0 0 11/2 P˜ , P = ρ(π ) =  Pˆ Ay˜ Pˆ 0 ˜ 1−1/2 AT Vˆ 1−1/2 P˜ 1−1/2 (ψ12 + 1 AT Vˆ Ay) 2

and ρ(J ) = 21 M −1 dM with M = P T DP . A.4. SO ∗ (2n + 4)/U (n, 2). These are 4-dimensional theories with k = 2n vector fields and the n(n − 1) scalars from SO ∗ (2n)/U (n) × SU (2)/SU (2). In the following we assume n > 1, since for n = 1 this is the SU (3, 1)/U (2, 1) theory already discussed in Sect. A.2. The 4n electromagnetic potentials transform under the 2n-dimensional representation ρˆ of SO ∗ (2n) and under the 2-dimensional representation ρ˜ of SU (2), and ¯ into sˆi ∈ gˆ = so∗ (2n) and s˜i ∈ g˜ = su(2). we will therefore split the generators si of G ∗ The group SO (2n) is defined as the subgroup of SO(2n; C) that leaves the antihermitian form ϕ + iσ2 ⊗ 1n χ invariant. Allowing for a change of basis we obtain ρˆ : sˆi 7 → Rˆ i , Vˆ Rˆ i + Rˆ i+ Vˆ = 0, Wˆ Rˆ i + Rˆ iT Wˆ = 0, τ (Rˆ i ) = −Rˆ i+ , with matrices Vˆ = −Vˆ + and Wˆ = Wˆ T such that Cˆ ≡ Vˆ −1 Wˆ + = Wˆ −1 Vˆ T . This implies the reality condition Rˆ i = Cˆ Rˆ i∗ Cˆ −1 with Cˆ Cˆ ∗ = −12n . Hence the representation ρˆ is “pseudo real”, i.e., is equivalent to its complex conjugate but cannot be written with real matrices; the same holds true for the 2-dimensional representation ρ˜ of su(2) with i 0 1 + −1 ˜ T ˜ ˜ ˜ ˜ y˜ = iσ2 = , ρ˜ : s˜i 7 → Ri = σi = τ (Ri ) = −Ri = −y˜ Ri y, −1 0 2 and R˜ i = y˜ R˜ i∗ y˜ −1 . We can now construct the (2n+4)-dimensional matrix representation ρ of so∗ (2n+4) with the Lie algebra element X = e + h · α + δd + λi si + a · β + κk represented by   1 ˜ i ˜ si ) −α + Vˆ 12 2 δ12 + λ ρ(˜ , ρ(X) =  β λˆ i ρ(ˆ ˆ si ) α 1 + ˆ i ˜ β V − 2 δ12 + λ ρ(˜ ˜ si ) κ12


such that Vρ(X) + ρ + (X)V = 0, Wρ(X) + ρ T (X)W = 0, and ρ(τ (X)) = −D −1 ρ + (X)D with      0 0 y˜ 0 0 −12 12 0 0 V =  0 Vˆ 0  , W =  0 Wˆ 0  , D =  0 −12n 0  , 0 0 12 0 −y˜ 0 0 12 0 

and with the scalar product < X, Y >= Tr(ρ(X) ρ(Y )). The representation matrices satisfy the reality condition ρ(X)∗ = C −1 ρ(X)C with the “charge conjugation” matrix 

C = V −1 W + = W −1 V T

 −y˜ 0 0 =  0 Cˆ 0  , 0 0 −y˜

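The $2n \times 2$ matrix $A$ (and similarly $\alpha, \beta, \ldots$) satisfies the reality condition $A = \hat C A^{*} \tilde y$ and can be expressed in terms of the $4n$ components of $A$, the precise form depending on the choice of $\hat V$ and $\hat W$. Choosing $\hat V = i\sigma_1 \otimes 1_n$ and $\hat W = \sigma_3 \otimes 1_n$ with $\hat C = -\sigma_2 \otimes 1_n$, as well as $A = (A_{(c)}, \hat C A_{(c)}^{*})$ with $A_{(c)} = A_{(e)} + iA_{(m)}$, we obtain
$$(\alpha \cdot y\beta)\,1_2 = \beta^{+}\hat V\alpha - \alpha^{+}\hat V\beta, \qquad y = \begin{pmatrix} 0 & 2\sigma_1 \otimes 1_n \\ -2\sigma_1 \otimes 1_n & 0 \end{pmatrix},$$
i.e. $c = 2$ and $\eta = \sigma_1 \otimes 1_n$, as well as the $4n$-dimensional real representation $\bar\rho = \hat\rho \otimes \tilde\rho$ of $\bar g$ in the form
$$\bar\rho : \lambda^i s_i \mapsto \hat\lambda^i \begin{pmatrix} \mathrm{Re}\,\hat R_i & -\mathrm{Im}\,\hat R_i \\ \mathrm{Im}\,\hat R_i & \mathrm{Re}\,\hat R_i \end{pmatrix} + \begin{pmatrix} \tfrac{\tilde\lambda^1}{2} i\sigma_2 & \tfrac{\tilde\lambda^3}{2} i\sigma_2 - \tfrac{\tilde\lambda^2}{2} 1_2 \\ \tfrac{\tilde\lambda^3}{2} i\sigma_2 + \tfrac{\tilde\lambda^2}{2} 1_2 & -\tfrac{\tilde\lambda^1}{2} i\sigma_2 \end{pmatrix} \otimes 1_n.$$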
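The choice $\hat V = i\sigma_1 \otimes 1_n$, $\hat W = \sigma_3 \otimes 1_n$ can be spot-checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes $n = 1$ (so the factor $\otimes 1_n$ is trivial), and all helper names are hypothetical. It evaluates both expressions for $\hat C$ and the pseudo-reality condition $\hat C \hat C^{*} = -1$.

```python
# Pauli matrices as nested lists; n = 1 is assumed so that "(x) 1_n" is trivial.
SIGMA1 = [[0, 1], [1, 0]]
SIGMA2 = [[0, -1j], [1j, 0]]
SIGMA3 = [[1, 0], [0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def conj(a):
    return [[complex(x).conjugate() for x in row] for row in a]

def dagger(a):
    # Hermitian conjugate, written "+" in the text
    return conj(transpose(a))

def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

V_hat = scale(1j, SIGMA1)   # V^ = i sigma_1, antihermitian
W_hat = SIGMA3              # W^ = sigma_3, symmetric

C_from_V = matmul(inverse(V_hat), dagger(W_hat))      # V^{-1} W^+
C_from_W = matmul(inverse(W_hat), transpose(V_hat))   # W^{-1} V^T

print("both forms equal -sigma_2:",
      close(C_from_V, scale(-1, SIGMA2)) and close(C_from_W, scale(-1, SIGMA2)))
print("C C* = -1:", close(matmul(C_from_V, conj(C_from_V)), [[-1, 0], [0, -1]]))
```

For $n > 1$ the same identities hold block by block, since every matrix involved is a Pauli matrix tensored with $1_n$.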

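Finally we obtain
$$P = \rho(\pi) = \begin{pmatrix} \Delta^{1/2} 1_2 & 0 & 0 \\ \hat P A & \hat P & 0 \\ \Delta^{-1/2}\left(\psi 1_2 + \tfrac12 A^{+}\hat V A\right) & \Delta^{-1/2} A^{+}\hat V & \Delta^{-1/2} 1_2 \end{pmatrix}$$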

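and $\rho(J) = \tfrac12 M^{-1}dM$ with $M = P^{+}DP$, and therefore
$$\langle J, J\rangle = \mathrm{Tr}(\rho(J)\rho(J)) = \mathrm{Tr}(\hat\rho(\hat J)\hat\rho(\hat J)) - \frac{2}{\Delta}\,\partial A_{(c)}^{+}\hat M\,\partial A_{(c)} + \frac{(\partial\Delta)^2 + \omega^2}{\Delta^2},$$
with $\hat\rho(\hat J) = \tfrac12 \hat M^{-1}d\hat M$, $\hat M = \hat P^{+}\hat P$, and $\hat M\,\sigma_1 \otimes 1_n\,\hat M = \sigma_1 \otimes 1_n$. We can therefore proceed as in Sect. A.2, and express the 4-dimensional theory in terms of the symmetric matrices
$$\tilde\mu = \sigma_1 \otimes 1_n\,(\mathrm{Re}\,\hat M)^{-1}\,\sigma_1 \otimes 1_n, \qquad \tilde\nu = \sigma_1 \otimes 1_n\,(\mathrm{Re}\,\hat M)^{-1}\,\mathrm{Im}\,\hat M.$$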


P. Breitenlohner, D. Maison

A.5. $Sp(2n+2;\mathbb{R})/U(n,1)$. These correspond to 4-dimensional theories with $k = n$ vector fields and the $n(n+1)$ scalars from $Sp(2n;\mathbb{R})/U(n)$. For $n = 1$ this is again the Einstein-Maxwell-dilaton-axion theory [8]. Consider the action for the 4-dimensional theory with
$$\langle \bar J, \bar J\rangle_{\bar g} = \mathrm{Tr}(\hat M^{-1}\partial\hat M\,\hat M^{-1}\partial\hat M) + \mathrm{Tr}(\hat M^{-1}\partial\hat N\,\hat M^{-1}\partial\hat N), \qquad c = 1, \quad \tilde\mu = \hat M, \quad \tilde\nu = \hat N,$$
where the symmetric matrices $\hat M = \hat P^{T}\hat P$ and $\hat N$ describe scalar “dilaton” fields and pseudo scalar “axion” fields respectively. Choosing $\eta = S_n$, where $S_n$ is the $n$-dimensional “skew diagonal unit matrix” with elements $(S_n)_{ij} = \delta_{i+j,n+1}$, we obtain the 3-dimensional theory with $\bar M = \bar P^{T}\bar P$, and
$$\bar P = \begin{pmatrix} \hat P & 0 \\ S_n \hat P^{T-1}\hat N & S_n \hat P^{T-1} S_n \end{pmatrix}, \qquad y = \begin{pmatrix} 0 & S_n \\ -S_n & 0 \end{pmatrix}.$$

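The matrix $\bar P$ is the $Sp(2n;\mathbb{R})/U(n)$ coset representative $\bar\pi$ in the $2n$-dimensional matrix representation
$$\bar\rho : \bar\pi \mapsto \bar\rho(\bar\pi) = \bar P, \qquad y\bar P = \bar P^{T-1}y, \qquad \tau(\bar P) = \bar P^{T-1},$$
with the “symplectic metric” $y = -y^{T} = -y^{-1}$, and we can express the invariant scalar product on the Lie algebra $sl(2n;\mathbb{R})$ in the form $\langle X, Y\rangle_{\bar g} = 2\,\mathrm{Tr}(\bar\rho(X)\bar\rho(Y))$.

In order to exhibit the $Sp(2n+2;\mathbb{R})/U(n,1)$ σ-model consider the $(2n+2)$-dimensional matrix representation $\rho(X)$ of the element $X = \varepsilon e + h\cdot\alpha + \delta d + \lambda^i s_i + a\cdot\beta + \kappa k$ of the Lie algebra $sp(2n+2;\mathbb{R})$
$$\rho(X) = \begin{pmatrix} \tfrac12\delta & \tfrac{1}{\sqrt2}\alpha^{T}y & \varepsilon \\ \tfrac{1}{\sqrt2}\beta & \lambda^i\bar\rho(s_i) & \tfrac{1}{\sqrt2}\alpha \\ \kappa & -\tfrac{1}{\sqrt2}\beta^{T}y & -\tfrac12\delta \end{pmatrix},$$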

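with the symplectic metric $V$ such that $\rho(X) = -V^{-1}\rho^{T}(X)V$ and with the involutive automorphism $\rho(\tau(X)) = -D^{-1}\rho^{T}(X)D$, where
$$V = \begin{pmatrix} 0 & S_{n+1} \\ -S_{n+1} & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1_{2n} & 0 \\ 0 & 0 & 1 \end{pmatrix},$$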

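and with the scalar product $\langle X, Y\rangle = 2\,\mathrm{Tr}(\rho(X)\rho(Y))$. Finally we obtain
$$P = \rho(\pi) = \begin{pmatrix} \sqrt{\Delta} & 0 & 0 \\ \tfrac{1}{\sqrt2}\bar P A & \bar P & 0 \\ \tfrac{1}{\sqrt{\Delta}}\psi & -\tfrac{1}{\sqrt{2\Delta}}A^{T}y & \tfrac{1}{\sqrt{\Delta}} \end{pmatrix}$$
and $\rho(J) = \tfrac12 M^{-1}dM$ with $M = P^{T}DP$.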
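As a numerical spot check (not part of the original derivation), the sketch below assembles the matrix $P$ for $n = 1$ from arbitrary sample values and tests that it preserves the symplectic metric, $P^{T}VP = V$. The function names and sample numbers are hypothetical, and the code assumes the block form $P = \bigl(\sqrt\Delta, 0, 0;\ \bar P A/\sqrt2, \bar P, 0;\ \psi/\sqrt\Delta, -A^{T}y/\sqrt{2\Delta}, 1/\sqrt\Delta\bigr)$ with $\bar P = \bigl(\hat P, 0;\ \hat N/\hat P, 1/\hat P\bigr)$, which is the $n = 1$ case of the formulas of this section.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# V = [[0, S_2], [-S_2, 0]] with the 2x2 skew diagonal unit matrix S_2
V = [[0, 0, 0, 1],
     [0, 0, 1, 0],
     [0, -1, 0, 0],
     [-1, 0, 0, 0]]

def build_P(delta, psi, p_hat, n_hat, A):
    """Assemble the 4x4 matrix P for n = 1 (all inputs are sample values)."""
    s = math.sqrt(delta)
    r2 = math.sqrt(2.0)
    # P_bar = [[P^, 0], [S P^{T-1} N^, S P^{T-1} S]] with S_1 = 1
    Pbar = [[p_hat, 0.0], [n_hat / p_hat, 1.0 / p_hat]]
    y = [[0.0, 1.0], [-1.0, 0.0]]
    PbarA = [Pbar[0][0] * A[0] + Pbar[0][1] * A[1],
             Pbar[1][0] * A[0] + Pbar[1][1] * A[1]]
    # row vector A^T y
    ATy = [A[0] * y[0][0] + A[1] * y[1][0],
           A[0] * y[0][1] + A[1] * y[1][1]]
    return [
        [s, 0.0, 0.0, 0.0],
        [PbarA[0] / r2, Pbar[0][0], Pbar[0][1], 0.0],
        [PbarA[1] / r2, Pbar[1][0], Pbar[1][1], 0.0],
        [psi / s, -ATy[0] / (r2 * s), -ATy[1] / (r2 * s), 1.0 / s],
    ]

def symplectic_defect(P):
    """Largest entry of |P^T V P - V|; zero for a symplectic matrix."""
    PtVP = matmul(transpose(P), matmul(V, P))
    return max(abs(PtVP[i][j] - V[i][j]) for i in range(4) for j in range(4))

P = build_P(delta=2.0, psi=0.3, p_hat=1.5, n_hat=-0.7, A=[0.4, -1.1])
print("defect:", symplectic_defect(P))
```

The defect vanishes up to rounding because $\bar P^{T}y\bar P = y$ and $A^{T}yA = 0$, which cancels the terms quadratic in $A$.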


A.6. The General Procedure. For the remaining cases of Table 1 the group $G$ is one of the exceptional groups $G_2$, $F_4$, $E_6$, $E_7$, or $E_8$. Some of these cases describe the bosonic sector of supergravity theories, e.g., $N = 8$ supergravity in 4 dimensions with $\bar G/\bar H = E_{7(+7)}/SU(8)$ and $G/H = E_{8(+8)}/SO^*(16)$ [9]. Instead of discussing each of these cases in detail, we outline the general procedure and apply it to Case 6 with $G/H = G_{2(+2)}/(SU(1,1) \times SU(1,1))$, $\bar G/\bar H = SU(1,1)/U(1)$, and $k = 2$ vector fields.

Given the $G/H$ and $\bar G/\bar H$ σ-models in 3 and 4 dimensions, we know that the 4-dimensional theory has $k = \tfrac12(\dim(G/H) - \dim(\bar G/\bar H) - 2) = \tfrac14(\dim G - \dim\bar G - 3)$ vector fields. Parametrizing the Lie algebra of $G$ as described in Chapter 3, and choosing some $c > 0$ we find that the electromagnetic potentials transform with a $2k$-dimensional real representation $\bar\rho$ of $\bar G$ with
$$\bar\rho : s_i \mapsto \bar\rho(s_i) = R_i = -y^{-1}R_i^{T}y, \qquad y = c\begin{pmatrix} 0 & \eta^{T} \\ -\eta & 0 \end{pmatrix},$$
where $\eta$ is an arbitrary orthogonal matrix and $\tau(R_i) = -R_i^{T}$. The representation matrices have therefore the structure
$$R_i = \begin{pmatrix} A_i & B_i\eta^{T} \\ \eta C_i & -\eta A_i^{T}\eta^{T} \end{pmatrix},$$
with symmetric $k \times k$ matrices $B$ and $C$.

In our example with $\bar G/\bar H = SU(1,1)/U(1) \sim SL(2)/SO(2)$ we choose $c = 1$ and generators $\bar d$, $\bar e$, and $\bar k$ of $\bar G$ with the commutation relations
$$[\bar d, \bar e] = \bar e, \qquad [\bar d, \bar k] = -\bar k, \qquad [\bar e, \bar k] = 2\bar d,$$
the automorphism
$$\tau(\bar d) = -\bar d, \qquad \tau(\bar e) = -\bar k, \qquad \tau(\bar k) = -\bar e,$$
and the 4-dimensional matrix representation of the Lie algebra element $X = \bar\varepsilon\bar e + \bar\delta\bar d + \bar\kappa\bar k$,
$$\bar\rho(X) = \begin{pmatrix} \tfrac32\bar\delta & \sqrt3\,\bar\varepsilon & 0 & 0 \\ \sqrt3\,\bar\kappa & \tfrac12\bar\delta & 2\bar\varepsilon & 0 \\ 0 & 2\bar\kappa & -\tfrac12\bar\delta & \sqrt3\,\bar\varepsilon \\ 0 & 0 & \sqrt3\,\bar\kappa & -\tfrac32\bar\delta \end{pmatrix}, \qquad \eta = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$
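The properties claimed for this 4-dimensional representation can be verified numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the representation to be the spin-3/2 matrices with diagonal $(\tfrac32, \tfrac12, -\tfrac12, -\tfrac32)\bar\delta$, superdiagonal $(\sqrt3, 2, \sqrt3)\bar\varepsilon$ and subdiagonal $(\sqrt3, 2, \sqrt3)\bar\kappa$, with $c = 1$ and $\eta$ as above; the helper names are hypothetical.

```python
import math

R3 = math.sqrt(3.0)

def rho(eps, delta, kappa):
    """Matrix of X = eps*e + delta*d + kappa*k in the 4-dimensional representation."""
    return [
        [1.5 * delta, R3 * eps, 0.0, 0.0],
        [R3 * kappa, 0.5 * delta, 2.0 * eps, 0.0],
        [0.0, 2.0 * kappa, -0.5 * delta, R3 * eps],
        [0.0, 0.0, R3 * kappa, -1.5 * delta],
    ]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [[a[j][i] for j in range(4)] for i in range(4)]

def combine(a, b, ca=1.0, cb=1.0):
    """Entrywise ca*a + cb*b."""
    return [[ca * a[i][j] + cb * b[i][j] for j in range(4)] for i in range(4)]

def commutator(a, b):
    return combine(matmul(a, b), matmul(b, a), 1.0, -1.0)

def max_abs(a):
    return max(abs(x) for row in a for x in row)

e, d, k = rho(1, 0, 0), rho(0, 1, 0), rho(0, 0, 1)

# y = c [[0, eta^T], [-eta, 0]] with c = 1 and eta = [[0, 1], [-1, 0]]
y = [[0, 0, 0, -1],
     [0, 0, 1, 0],
     [0, -1, 0, 0],
     [1, 0, 0, 0]]

X = rho(0.7, -1.3, 0.4)                 # arbitrary sample element
tau_X = rho(-0.4, 1.3, -0.7)            # tau: e -> -k, d -> -d, k -> -e

checks = {
    "[d,e] = e": max_abs(combine(commutator(d, e), e, 1.0, -1.0)),
    "[d,k] = -k": max_abs(combine(commutator(d, k), k, 1.0, 1.0)),
    "[e,k] = 2d": max_abs(combine(commutator(e, k), d, 1.0, -2.0)),
    "yR + R^T y = 0": max_abs(combine(matmul(y, X), matmul(transpose(X), y))),
    "rho(tau X) = -rho(X)^T": max_abs(combine(tau_X, transpose(X))),
}
for name, defect in checks.items():
    print(name, defect < 1e-12)
```

The condition $yR + R^{T}y = 0$ is equivalent to $R = -y^{-1}R^{T}y$ and is what forces the block structure with symmetric $B_i$ and $C_i$.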

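Next we compute the invariant scalar product $\langle s_i, s_j\rangle$, in our example $\langle X, X\rangle = 3\bar\delta^2 + 12\bar\varepsilon\bar\kappa$, and use the identity $x^i\langle s_i, s_j\rangle = -2yR_j$ to determine the matrices $x^i$.

We would like to choose coset representatives $\bar\pi$ from a triangular group $\bar T$, and choose a basis for the representation $\bar\rho$, such that the generators of $\bar T$ are represented by matrices $R_i$ with $B_i = 0$ (and $A_i$ suitably restricted). If that is possible, we obtain matrices $\bar P = \bar\rho(\bar\pi)$ and $\bar M = \bar\rho(\bar\mu)$,
$$\bar P = \begin{pmatrix} \tilde P & 0 \\ \eta\tilde P^{T-1}\tilde\nu & \eta\tilde P^{T-1}\eta^{T} \end{pmatrix}, \qquad \bar M = \bar P^{T}\bar P = \begin{pmatrix} \tilde\mu + \tilde\nu\tilde\mu^{-1}\tilde\nu & \tilde\nu\tilde\mu^{-1}\eta^{T} \\ \eta\tilde\mu^{-1}\tilde\nu & \eta\tilde\mu^{-1}\eta^{T} \end{pmatrix},$$
with symmetric matrices $\tilde\mu = \tilde P^{T}\tilde P$ and $\tilde\nu$, and can then reconstruct the 4-dimensional theory in terms of these matrices. All we really need is, however, a way to express the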


matrices $\bar M$ in terms of $\tilde\mu$, $\tilde\nu$, and $\eta$. In our example we choose $\bar\pi = e^{\ln\varphi\,\bar d}\,e^{\chi\bar k}$ represented by
$$\bar P = \begin{pmatrix} \varphi^{3/2} & 0 & 0 & 0 \\ \sqrt3\,\varphi^{1/2}\chi & \varphi^{1/2} & 0 & 0 \\ \sqrt3\,\varphi^{-1/2}\chi^2 & 2\varphi^{-1/2}\chi & \varphi^{-1/2} & 0 \\ \varphi^{-3/2}\chi^3 & \sqrt3\,\varphi^{-3/2}\chi^2 & \sqrt3\,\varphi^{-3/2}\chi & \varphi^{-3/2} \end{pmatrix},$$
and finally obtain
$$\tilde\mu = \begin{pmatrix} \varphi^3 + 3\varphi\chi^2 & \sqrt3\,\varphi\chi \\ \sqrt3\,\varphi\chi & \varphi \end{pmatrix}, \qquad \tilde\nu = \begin{pmatrix} 2\chi^3 & \sqrt3\,\chi^2 \\ \sqrt3\,\chi^2 & 2\chi \end{pmatrix},$$
and
$$\langle\bar J, \bar J\rangle = 3\,\frac{(\partial\varphi)^2 + (\partial\chi)^2}{\varphi^2}.$$

In the following we will show for the remaining Cases 7–15 from Table 1, that the matrices $\bar M$ can indeed be expressed in terms of $\tilde\mu$, $\tilde\nu$, and $\eta$ as required. In most cases this can be achieved by choosing $\bar P$ in the form given above. Assume that there exist bases in the Lie algebra $\bar g$ and in the representation space for $\bar\rho$ such that for each $R_i$ at least one of the submatrices $B_i$ and $C_i$ is zero. We can then choose the triangular subgroup $\bar T$ for the coset representatives, generated by Lie algebra elements $s_i$ with $B_i = 0$ and with $A_i$ suitably restricted.

This is always possible if there exists a “coset generator” $X = -\tau(X) \in \bar g$ such that all eigenvalues of $\bar\rho(X)$ are nonzero. The symmetric matrix $\bar\rho(X)$ can be diagonalized by an orthogonal transformation, and we can choose the eigenvectors with positive and negative eigenvalues as first and last $k$ elements respectively of a new basis in the representation space for $\bar\rho$, automatically preserving the structure of $y$. Next we can diagonalize $\mathrm{Ad}\,X$, which is symmetric provided we choose a basis for $\bar g$ with $\langle s_i, \tau(s_j)\rangle = -\delta_{ij}$, such that $[X, s_i] = \xi_i s_i$. In these new bases $B_i = 0$ except when $\xi_i > 0$ and $C_i = 0$ except when $\xi_i < 0$ as desired.

In the following we demonstrate the existence of such an $X$ for each of the Cases 7–9 and 11–15 of Table 1. For Case 10, where no such $X$ can be found, we will directly demonstrate how to express the matrices $\bar M$ in terms of symmetric matrices $\tilde\mu$ and $\tilde\nu$.

A.6.1. The 14-Dimensional Representation of $Sp(6;\mathbb{R})$. In Case 7 of Table 1 there are 7 vector fields and the electromagnetic potentials transform under one of the two inequivalent 14-dimensional representations of $Sp(6;\mathbb{R})$. Decomposing this representation with respect to the subgroup $SL(2) \otimes SL(2) \otimes SL(2)$ (using the isomorphism $Sp(2;\mathbb{R}) = SL(2)$) yields
$$2 \otimes 2 \otimes 2 \;\oplus\; 2 \otimes 1 \otimes 1 \;\oplus\; 1 \otimes 2 \otimes 1 \;\oplus\; 1 \otimes 1 \otimes 2.$$
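The dimension count and the spectrum of a sum of three commuting generators on this decomposition can be tabulated mechanically. The sketch below is an illustration only; it assumes that each generator acts with eigenvalues $\pm\tfrac12$ on its doublet 2 and with eigenvalue 0 on the singlet 1, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

def weights(factor):
    """Eigenvalues of one generator on a doublet ('2') or a singlet ('1')."""
    return [HALF, -HALF] if factor == "2" else [Fraction(0)]

def spectrum_of_sum(summands):
    """Eigenvalues (with multiplicity) of d1 + d2 + d3 on a direct sum of
    threefold tensor products, each given as a tuple such as ('2', '1', '1')."""
    spectrum = Counter()
    for term in summands:
        for combo in product(*(weights(f) for f in term)):
            spectrum[sum(combo)] += 1
    return spectrum

rep14 = [("2", "2", "2"), ("2", "1", "1"), ("1", "2", "1"), ("1", "1", "2")]
spectrum = spectrum_of_sum(rep14)

print("dimension:", sum(spectrum.values()))
print("eigenvalues:", sorted(spectrum))
print("zero eigenvalue present:", 0 in spectrum)
```

Every eigenvalue is a sum of an odd number of terms $\pm\tfrac12$, so it is a half-odd integer and cannot vanish; this is the property required of the coset generator $X$.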
Each of the three $SL(2)$ subgroups has a coset generator $\bar d_i$ with eigenvalues $\pm\tfrac12$ in the 2-dimensional real representation denoted by 2. Therefore, we can choose $X = \bar d_1 + \bar d_2 + \bar d_3$ with eigenvalues $\pm\tfrac12$ and $\pm\tfrac32$.

A.6.2. The 20-Dimensional Representation of $A_5$. In Cases 8–10 of Table 1 there are 10 vector fields and the electromagnetic potentials transform under the 20-dimensional representation of one of the noncompact forms $sl(6)$, $su(3,3)$, or $su(5,1)$ of the Lie algebra $A_5$. As representation space we may take the totally antisymmetric 3-index tensors $\varphi_{ijk}$, real for $sl(6)$ and subject to the reality condition
$$\bar\varphi^{ijk} \equiv \gamma^{il}\gamma^{jm}\gamma^{kn}(\varphi_{lmn})^{*} = \tfrac16\,\varepsilon^{ijklmn}\varphi_{lmn}$$
for $su(3,3)$ and $su(5,1)$, where $\gamma^{ij}$ is the $su(p,q)$ metric (chosen as $\gamma = \begin{pmatrix} 1_p & 0 \\ 0 & -1_q \end{pmatrix}$ for simplicity).

For $sl(6)$ and $su(3,3)$ we can decompose the representation with respect to the subalgebra $sl(2) \oplus sl(2) \oplus sl(2)$ as in Subsect. A.6.1 (using this time the isomorphism $su(1,1) = sl(2)$). Distributing the three indices of $\varphi_{ijk}$ among the three $sl(2)$ subalgebras yields again the representations $2 \otimes 2 \otimes 2$, $2 \otimes 1 \otimes 1$, $1 \otimes 2 \otimes 1$, and $1 \otimes 1 \otimes 2$. Therefore, we can again choose $X = \bar d_1 + \bar d_2 + \bar d_3$ with eigenvalues $\pm\tfrac12$ and $\pm\tfrac32$.

This procedure does, however, not work for Case 10 with $su(5,1)$, and it may even be impossible to bring the matrices $\bar P$ into “block triangular” form. We therefore have to analyze the matrices $\bar M$ in some detail. Let $\hat\rho$ be the 6-dimensional complex representation of $su(5,1)$ with
$$\hat\rho : s_i \mapsto \hat\rho(s_i) = \hat R_i, \qquad \hat V\hat R_i + \hat R_i^{+}\hat V = 0, \qquad \tau(\hat R_i) = -\hat R_i^{+},$$
with the $su(5,1)$ metric $\hat V = \hat V^{+}$; we choose
$$\hat V = \begin{pmatrix} 1_5 & 0 \\ 0 & -1 \end{pmatrix}, \qquad \lambda^i\hat R_i = \begin{pmatrix} a & b \\ b^{+} & -\mathrm{Tr}\,a \end{pmatrix},$$
where $a = (a_i{}^j)$, $i, j = 1, \ldots, 5$ is (the matrix representative of) an element of $u(5)$, and $b = (b_i)$, $b^{+} = (\bar b^j)$ represent the coset generators.

Next consider the action of $su(5,1)$ on the totally antisymmetric 3-index tensors $\varphi_{ijk}$, choosing $\varphi_{ij} \equiv \varphi_{ij6}$ and $\bar\varphi^{ij} = (\varphi_{ij})^{*}$ as basis for this 20-dimensional representation,
$$\tilde\rho : s_i \mapsto \tilde R_i, \qquad \tilde V\tilde R_i + \tilde R_i^{+}\tilde V = 0, \qquad \tilde W\tilde R_i + \tilde R_i^{T}\tilde W = 0, \qquad \tau(\tilde R_i) = -\tilde R_i^{+},$$
with matrices
$$\lambda^i\tilde R_i = \begin{pmatrix} a_{ij}{}^{kl} & \bar b_{ijkl} \\ b^{ijkl} & \bar a^{ij}{}_{kl} \end{pmatrix}, \qquad \tilde V = \begin{pmatrix} -i1_{10} & 0 \\ 0 & i1_{10} \end{pmatrix}, \qquad \tilde W = \begin{pmatrix} 0 & i1_{10} \\ -i1_{10} & 0 \end{pmatrix},$$
and matrix elements
$$a_{ij}{}^{kl} = 2a_{[i}{}^{[k}\delta_{j]}^{l]} - \delta_{[i}^{[k}\delta_{j]}^{l]}\,a_m{}^m = -\bar a^{kl}{}_{ij}, \qquad b^{ijkl} = \tfrac12\,\varepsilon^{ijklm}b_m = (\bar b_{ijkl})^{*}.$$

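We finally obtain the real 20-dimensional representation $\bar\rho$ of $su(5,1)$ with $\mathrm{Re}\,\varphi_{ij}$ and $\mathrm{Im}\,\varphi_{ij}$ as new basis,
$$\bar\rho : s_i \mapsto \bar\rho(s_i) = R_i = S^{-1}\tilde R_i S, \qquad \bar V R_i + R_i^{T}\bar V = 0, \qquad \tau(R_i) = -R_i^{T},$$
with
$$S = \frac{1}{\sqrt2}\begin{pmatrix} 1 & i \\ 1 & -i \end{pmatrix}, \qquad \bar V = S^{+}\tilde V S = S^{T}\tilde W S = \begin{pmatrix} 0 & 1_{10} \\ -1_{10} & 0 \end{pmatrix},$$
and therefore $\bar\rho(\bar\pi) = \bar P$ with $\bar V\bar P = \bar P^{T-1}\bar V$, and $\bar\rho(\bar\mu) = \bar P^{T}\bar P = \bar M$ with $\bar M\bar V\bar M = \bar V$. The last equation implies that $\bar M$ has the required form with symmetric matrices $\tilde\mu$, $\tilde\nu$ and with $\eta = 1_{10}$.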
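The two expressions for $\bar V$ can be checked with $2 \times 2$ complex arithmetic, since every block is a multiple of $1_{10}$. The sketch below is a hedged illustration: it assumes $\tilde V = \mathrm{diag}(-i, i)$ and $\tilde W$ with off-diagonal entries $i$ and $-i$ (each entry standing for that multiple of $1_{10}$), together with $S = \tfrac{1}{\sqrt2}\bigl(1, i;\ 1, -i\bigr)$; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def dagger(a):
    # Hermitian conjugate, written "+" in the text
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

r = 1.0 / math.sqrt(2.0)
S = [[r, 1j * r], [r, -1j * r]]       # maps (Re phi, Im phi) to (phi, phi-bar)
V_tilde = [[-1j, 0], [0, 1j]]         # each entry multiplies 1_10
W_tilde = [[0, 1j], [-1j, 0]]

V_bar_from_V = matmul(dagger(S), matmul(V_tilde, S))      # S^+ V~ S
V_bar_from_W = matmul(transpose(S), matmul(W_tilde, S))   # S^T W~ S
expected = [[0, 1], [-1, 0]]                              # symplectic metric

print("S^+ V~ S matches:", close(V_bar_from_V, expected))
print("S^T W~ S matches:", close(V_bar_from_W, expected))
```

Because $\bar V$ is the standard symplectic metric, the condition $\bar M\bar V\bar M = \bar V$ for a symmetric $\bar M$ is what yields the block form in terms of $\tilde\mu$ and $\tilde\nu$ with $\eta = 1_{10}$.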


A.6.3. The 32-Dimensional Representation of $D_6$. In Cases 11–13 of Table 1 there are 16 vector fields and the electromagnetic potentials transform under a 32-dimensional real “spinor” representation of one of the noncompact forms $so(6,6)$, $so^*(12)$, or $so(10,2)$ of the Lie algebra $D_6$. The Lie algebra $D_n$ has two inequivalent $2^{n-1}$-dimensional “chiral” spinor representations $S^n_\pm$. For $so(p,q)$ with $p = q \pmod 8$ they are both real, for $so(p,q)$ with $p = q + 4 \pmod 8$ they are both pseudo-real, for $so(p,q)$ with $p = q \pm 2 \pmod 8$ and for $so^*(4p+2)$ they are complex conjugate, whereas $S^n_+$ is real and $S^n_-$ is pseudo-real for $so^*(4p)$. Decomposing the representations $S^{p+q}_\pm$ with respect to the subalgebra $D_p \oplus D_q$ yields $S^{p+q}_\pm = S^p_+ \otimes S^q_\pm \oplus S^p_- \otimes S^q_\mp$.

For $so(6,6)$ and $so(10,2)$ we use the subalgebra $so(2,2) \oplus D_4$ with $D_4 = so(4,4)$ and $D_4 = so(8)$ respectively and obtain $S^6_+ = S^2_+ \otimes S^4_+ \oplus S^2_- \otimes S^4_-$ with real spinors $S^2_\pm$ and $S^4_\pm$. Using the isomorphism $so(2,2) = sl(2) \oplus sl(2)$, we find $S^2_+ = 2 \otimes 1$ and $S^2_- = 1 \otimes 2$. Therefore, we can choose $X = \bar d_+ + \bar d_-$ with eigenvalues $\pm\tfrac12$, where $\bar d_\pm$ are the coset generators of the two $sl(2)$ subalgebras.

For $so^*(12)$ we use the subalgebra $so^*(4) \oplus so^*(4) \oplus so^*(4)$ and obtain $S^6_+ = \sum S^2_\pm \otimes S^2_\pm \otimes S^2_\pm$, where the sum contains all combinations with an even number of $S^2_-$. Using the isomorphism $so^*(4) = sl(2) \oplus su(2)$ we find $S^2_+ = 2 \otimes 1$ and $S^2_- = 1 \otimes 2'$, where $2'$ is the 2-dimensional pseudo-real representation of $su(2)$. Therefore, we can choose $X = \bar d_1 + \bar d_2 + \bar d_3$ with eigenvalues $\pm\tfrac12$ and $\pm\tfrac32$, where $\bar d_i$ are the coset generators of the three $sl(2)$ subalgebras.

A.6.4. The 56-Dimensional Representation of $E_7$. In Cases 14 and 15 of Table 1 there are 28 vector fields and the electromagnetic potentials transform under the 56-dimensional representation of one of the noncompact forms $E_{7(+7)}$ or $E_{7(-25)}$ of the Lie algebra $E_7$.
We first decompose the representation with respect to the subalgebra $sl(2) \oplus D_6$, with $D_6 = so(6,6)$ for $E_{7(+7)}$ or $D_6 = so(10,2)$ for $E_{7(-25)}$, and obtain $2 \otimes V^6 \oplus 1 \otimes S^6_+$, where $V^6$ denotes the vector representation of $D_6$. Further decomposing the representations of $D_6$ with respect to the subalgebra $sl(2) \oplus sl(2) \oplus D_4$ as in Subsect. A.6.3 finally yields $V^6 = 2 \otimes 2 \otimes 1 \oplus 1 \otimes 1 \otimes V^4$. Therefore, we can choose $X = \bar d_1 + \bar d_2 + \bar d_3$ with eigenvalues $\pm\tfrac12$ and $\pm\tfrac32$, where $\bar d_i$ are again the coset generators of the three $sl(2)$ subalgebras.

References

1. Breitenlohner, P., Maison, D., and Gibbons, G.: Commun. Math. Phys. 120, 295–334 (1987)
2. Julia, B.: In Proc. Johns Hopkins Workshop on Particle Theory, Baltimore, 1981
3. Breitenlohner, P., and Maison, D.: Ann. Inst. Henri Poincaré 46, 215–246 (1987)
4. Heusler, M.: Phys. Rev. D 56, 961–973 (1997)
5. Neugebauer, G., and Kramer, D.: Ann. d. Physik 24, 62–71 (1969); Geroch, R.: J. Math. Phys. 12, 918–924 (1971)
6. Harrison, K.: J. Math. Phys. 9, 1744–1752 (1968)
7. Carter, B.: In Black Holes, Proc. 1972 Les Houches Summer School, Eds. De Witt, C. and De Witt, B.S., New York: Gordon and Breach, 1973
8. Gal’tsov, D.V., and Kechkin, O.V.: Phys. Rev. D 50, 7394–7399 (1994); Gal’tsov, D.V.: Phys. Rev. Lett. 74, 2863–2866 (1995)
9. Marcus, N., and Schwarz, J.H.: Nucl. Phys. B 228, 145–162 (1983)

Communicated by H. Nicolai