Commun. Math. Phys. 223, 1 – 12 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Resonance Expansions in Semi-Classical Propagation

Nicolas Burq1, Maciej Zworski2

1 Université de Paris-Sud, 91405 Orsay Cedex, France. E-mail: [email protected]
2 Department of Mathematics, University of California, Berkeley, CA 94720, USA. E-mail: [email protected]

Received: 27 July 2000 / Accepted: 5 March 2001
Abstract: We present a long time expansion of a semi-classical propagation in terms of resonances close to the real axis. Our method also gives an expansion of scattered classical waves in terms of scattering poles close to the real axis.

1. Introduction and Statement of Results

Resonances, or scattering poles, are complex numbers which mathematically describe meta-stable states: the real part of a resonance gives the rest energy, and its imaginary part gives the rate of decay of a meta-stable state. They appear in many branches of physics, chemistry and mathematics, from molecular dynamics to automorphic forms – see [14] for a general introduction (and [7, Sect. 1] for a somewhat different point of view).

The purpose of this note is to present an expansion of a semi-classical propagating state in terms of resonances close to the real axis. Since the inverse of the distance to the real axis gives the life-span of a resonance, the times at which the expansions are valid have to be large enough to eliminate the contribution of other resonances (see the remark at the end of Sect. 3). Our method also gives an expansion of scattered classical waves in terms of scattering poles close to the real axis. This expansion is weaker than the expansion presented in [13], but it has the advantage of not depending on any hard-to-verify conditions (as was the case in [13]).

Our semi-classical results are applicable to Schrödinger or wave equations for long range "black box" perturbations [8] – see Sect. 2 for a review of definitions. A typical operator to keep in mind is $P(h) = -h^2\Delta + V(x)$, where $|V(x)| \le C|x|^{-\epsilon}$, $\epsilon > 0$, and $V$ is analytic in a complex conic neighbourhood of infinity (in this case $R_0$ in the theorem below can be taken to be $0$). We denote by $\mathrm{Res}(P)$ the set of resonances of $P(h)$, that is, the set of poles of the meromorphic continuation of $R(z,h) = (P(h) - z)^{-1}$ from $\mathrm{Im}\,z > 0$ to the lower half-plane.

Theorem 1. Let $P(h)$ be an operator satisfying the assumptions of Sect. 2 and let $\chi \in C_c^\infty(\mathbb{R}^n)$ be equal to one on a neighbourhood of $B(0,R_0)$. Let $\psi \in C_c^\infty((0,\infty))$ and let $\mathrm{chsupp}\,\psi = [a,b]$ (the convex hull of the support of $\psi$). We put $\mu(z) = z$ or $\mu(z) = \sqrt{z}$, with the convention that $\sqrt{z} > 0$ for $z > 0$. There exists $c(h)$ with $0 < \delta < c(h) < 2\delta$ such that for every $M > M_0$ there exists $L = L(M)$ for which
$$\chi e^{-it\mu(P)/h}\chi\psi(P) = \sum_{z\in\Omega(h)\cap\mathrm{Res}(P)} \chi\,\mathrm{Res}\big(e^{-it\mu(\bullet)/h}R(\bullet,h), z\big)\chi\psi(P) + \mathcal{O}_{\mathcal{H}\to\mathcal{H}}(h^\infty), \qquad t > h^{-L}, \tag{1.1}$$
where $\Omega(h) = (a - c(h), b + c(h)) - i[0, h^M)$ and $\mathrm{Res}(f(\bullet), z)$ denotes the residue of a meromorphic family of operators $f$ at $z$.

The function $c(h)$ depends on the distribution of resonances: roughly speaking, we cannot "cut" through a dense cloud of resonances. Even in the very well understood case of the modular surface [1, Theorem 1] there is, currently at least, a need for some non-explicit grouping of terms. This is eliminated by the separation condition [13, (4.4)], which, however, is hard to verify.

The method of proof is a direct consequence of earlier work of Tang and the second author, which in turn draws heavily from the work of Melrose, Sjöstrand, and Stefanov–Vodev – see [12, 13] and the references given there. The dynamical nature of resonances was emphasized early by Lax and Phillips [4], and their celebrated semi-group still provides the most elegant connection between the stationary and dynamical definitions. In practice, that is, when evolution equations are being considered, the semi-group methods provide expansions in non-trapping situations only – see Sect. 3 of [13] for a review of Vainberg's direct proof.

Time dependent theories of resonances have also been investigated recently in [7] and [11] (see also [3] for earlier results). The difference here lies in considering many resonances at high energies rather than a time dependent theory of a single resonance obtained by perturbing an embedded eigenvalue. In the energy régimes at which molecular reactions take place there are normally many resonant states, and it is their contributions to the propagation of a state that seem to be of interest – see [5] and references given there. The random matrix method is one of the tools used in transition state theory, and it still remains inaccessible in rigorous work on quantum mechanics.

Ideas used in the semi-classical case provide an unconditional result in the case of the classical wave equation.
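The residues $\mathrm{Res}(e^{-it\mu(\bullet)/h}R(\bullet,h), z)$ in (1.1) can be illustrated in finite dimensions. The sketch below is only a toy model. A small matrix $P$ stands in for the operator, $\mu(z) = z$ and $h = 1$, and the residue is approximated by the trapezoidal rule on a small circle around the pole. The name `propagated_residue` and its parameters are our own. The sketch shows that a diagonalizable eigenvalue $\lambda$ contributes $e^{-it\lambda}\mathrm{Res}(R(\bullet),\lambda)$, while a Jordan block produces an extra factor of $t$. This matches the remark after Theorem 2 on algebraic and geometric multiplicities.

```python
import numpy as np

def propagated_residue(P, lam, t, radius=0.5, n=2048):
    """Approximate Res(e^{-itz} (P - z)^{-1}, lam) for a square matrix P.

    Uses (1/2πi)∮ f(z) dz over |z - lam| = radius, discretized with the
    trapezoidal rule; the circle must enclose no other eigenvalue of P.
    """
    identity = np.eye(P.shape[0])
    total = np.zeros(P.shape, dtype=complex)
    for k in range(n):
        w = radius * np.exp(2j * np.pi * k / n)  # the point z - lam on the circle
        z = lam + w
        # dz = i w dθ with dθ = 2π/n, so (1/2πi) f(z) dz = f(z) w / n
        total += np.exp(-1j * t * z) * np.linalg.inv(P - z * identity) * w / n
    return total

lam, t = 2.0 - 0.1j, 3.0                      # toy "resonance" and time
diagonal = np.diag([lam, lam])                # algebraic = geometric multiplicity
jordan = np.array([[lam, 1.0], [0.0, lam]])   # algebraic multiplicity 2, geometric 1

r_diag = propagated_residue(diagonal, lam, t)
r_jordan = propagated_residue(jordan, lam, t)
# r_diag   ≈ -e^{-itλ} I                   (= e^{-itλ} Res((P - z)^{-1}, λ))
# r_jordan ≈ e^{-itλ} [[-1, it], [0, -1]]  (a power of t appears)
```

The contour-integral formulation is used instead of a closed formula so that the same function handles any matrix, whatever its Jordan structure.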
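Suppose that the operator $P$ satisfies the assumptions of Sect. 2 with $h = 1$. The wave group of $P$ can be defined abstractly by
$$\mathcal{U}(t) = \exp\left(it\begin{pmatrix}0 & I\\ P & 0\end{pmatrix}\right) = \begin{pmatrix}D_tU(t) & U(t)\\ D_t^2U(t) & D_tU(t)\end{pmatrix}, \qquad U(t) = i\,\frac{\sin t\sqrt{P}}{\sqrt{P}}, \tag{1.2}$$
with $D_t = -i\partial_t$.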
As usual, we define the iterated domain by
$$\mathcal{D}_L = (P + i)^{-L}\mathcal{H}. \tag{1.3}$$
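Identity (1.2) can be checked numerically when $P$ is replaced by a symmetric positive definite matrix. This is a hypothetical stand-in for the operator, and the check assumes $D_t = -i\partial_t$. The sketch builds $U(t)$, $D_tU(t)$ and $D_t^2U(t)$ through the eigendecomposition of $P$ and compares them with `scipy.linalg.expm` applied to the block matrix. The function name `wave_group_blocks` is our own.

```python
import numpy as np
from scipy.linalg import expm

def wave_group_blocks(P, t):
    """U(t) = i sin(t√P)/√P and its D_t-derivatives (D_t = -i d/dt) for symmetric P > 0."""
    w, V = np.linalg.eigh(P)          # P = V diag(w) V^T with w > 0
    s = np.sqrt(w)

    def f(values):                    # apply a function of √P through the eigenbasis
        return V @ np.diag(values) @ V.T

    U = f(1j * np.sin(t * s) / s)
    DtU = f(np.cos(t * s))            # -i d/dt applied to U
    Dt2U = f(1j * s * np.sin(t * s))  # -i d/dt applied to D_t U; equals P U
    return U, DtU, Dt2U

rng = np.random.default_rng(0)
B = rng.standard_normal((3, 3))
P = B @ B.T + np.eye(3)               # hypothetical positive definite stand-in for P
t, n = 1.3, P.shape[0]

A = np.block([[np.zeros((n, n)), np.eye(n)], [P, np.zeros((n, n))]])
U, DtU, Dt2U = wave_group_blocks(P, t)
lhs = expm(1j * t * A)
rhs = np.block([[DtU, U], [Dt2U, DtU]])
print(np.allclose(lhs, rhs))          # True
```

The agreement reflects the power series: the even powers of the block matrix give $\cos t\sqrt{P}$ on the diagonal, and the odd powers give the off-diagonal blocks.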
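Our result is slightly different depending on finer assumptions on $P$, which we state as three cases:

Case 1: $P|_{\mathbb{R}^n\setminus B(0,R_0)} = -\Delta|_{\mathbb{R}^n\setminus B(0,R_0)}$, $n$ odd;
Case 2: $P|_{\mathbb{R}^n\setminus B(0,R_0)} = -\Delta|_{\mathbb{R}^n\setminus B(0,R_0)}$, $n$ even;
Case 3: $P|_{\mathbb{R}^n\setminus B(0,R_0)} = Q|_{\mathbb{R}^n\setminus B(0,R_0)}$, any $n$,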
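where $Q$ is an elliptic operator close to the Laplacian at infinity – see (2.2) with $h = 1$.

Theorem 2. Let $P$ be an operator satisfying the assumptions of Sect. 2 with $h = 1$. Let $\chi \in C_c^\infty(\mathbb{R}^n)$ be equal to one on a neighbourhood of $B(0,R_0)$, and let $\varrho \in C^\infty(\mathbb{R})$ be an even function such that
$$\varrho(x) = 1 \text{ for } x\in\mathbb{R} \text{ in cases 1 and 2}; \qquad \varrho(x) = 0 \text{ near } 0 \text{ and } \varrho(x) = 1 \text{ for } x\ge1 \text{ in case 3}. \tag{1.4}$$
For every $M > M_0$ there exist $\epsilon = \epsilon(M) > 0$ and a function $c(t)$ satisfying $|c(t) - t^\epsilon| \le C$ such that
$$\chi U(t)\varrho(\sqrt{P})\chi = \sum_{\substack{\lambda_j^2\in\mathrm{Res}(P),\ \mathrm{Im}\,\lambda_j<0\\ \mathrm{Im}\,\lambda_j > -|\lambda_j|^{-M},\ 1<|\mathrm{Re}\,\lambda_j|<c(t)}} \chi\,\mathrm{Res}\big(e^{-it\bullet}R(\bullet), \lambda_j\big)\chi + E(t) + W(t),$$
$$\|E(t)\|_{\mathcal{D}_L\to\mathcal{H}} \le \begin{cases} C_M\,t^{K_0 - \epsilon L} & \text{in cases 1 and 3},\\ C\,t^{-n+1} & \text{in case 2},\end{cases} \tag{1.5}$$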
where $K_0$ is a fixed constant, $L$ is large enough in case 2, and $W(t)$ corresponds to the contribution from the pure point spectrum of $P$. We note that when the algebraic multiplicity is equal to the geometric multiplicity we have $\mathrm{Res}(e^{-it\bullet}R(\bullet), \lambda_j) = e^{-it\lambda_j}\mathrm{Res}(R(\bullet), \lambda_j)$, while in general powers of $t$ will appear in the expansion. The term $W(t)$ has the usual expression:
$$W(t) = i\sum_{\mu_k\in\sigma_{pp}(P)} \frac{\sin(t\sqrt{\mu_k})}{\sqrt{\mu_k}}\,\chi\,\varrho(\sqrt{\mu_k})\,\Pi_{\mu_k}\chi,$$
where $\Pi_\mu$ is the orthogonal projection on the eigenspace of $\mu$.

2. Preliminaries

To avoid the analysis of specific aspects of obstacle, potential, or metric scattering, we work in the "black box" formalism introduced in [10] and generalized further in [8].
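The operator we study acts on $\mathcal{H}$, a complex Hilbert space with an orthogonal decomposition
$$\mathcal{H} = \mathcal{H}_{R_0}\oplus L^2(\mathbb{R}^n\setminus B(0,R_0)),$$
where $R_0 > 0$ is fixed and $B(x,R) = \{y\in\mathbb{R}^n : |x - y| < R\}$. The corresponding orthogonal projections are denoted by $u|_{B(0,R_0)}$ and $u|_{\mathbb{R}^n\setminus B(0,R_0)}$, or by $\mathbf{1}_{B(0,R_0)}u$ and $\mathbf{1}_{\mathbb{R}^n\setminus B(0,R_0)}u$ respectively, where $u\in\mathcal{H}$. We work in the semi-classical setting, and for each $h\in(0,h_0]$ we have $P(h) : \mathcal{H}\to\mathcal{H}$ with domain $\mathcal{D}$, independent of $h$, satisfying $\mathbf{1}_{\mathbb{R}^n\setminus B(0,R_0)}\mathcal{D} = H^2(\mathbb{R}^n\setminus B(0,R_0))$ uniformly with respect to $h$ (see [8] or [12] for a precise meaning of this statement). We also assume that
$$\mathbf{1}_{B(0,R_0)}(P(h) + i)^{-1} : \mathcal{H}\to\mathcal{H}_{R_0} \text{ is compact}, \tag{2.1}$$
$$\mathbf{1}_{\mathbb{R}^n\setminus B(0,R_0)}P(h)u = Q(h)\big(u|_{\mathbb{R}^n\setminus B(0,R_0)}\big), \qquad u\in\mathcal{D}, \tag{2.2}$$
where $Q(h)$ is a formally self-adjoint operator on $L^2(\mathbb{R}^n)$ given by
$$Q(h)v = \sum_{|\alpha|\le2} a_\alpha(x;h)(hD_x)^\alpha v, \qquad v\in C_0^\infty(\mathbb{R}^n),$$
such that $a_\alpha(x;h) = a_\alpha(x)$ is independent of $h$ for $|\alpha| = 2$, and $a_\alpha(x;h)\in C_b^\infty(\mathbb{R}^n)$ are uniformly bounded with respect to $h$ (here $C_b^\infty(\mathbb{R}^n)$ denotes the space of $C^\infty$ functions on $\mathbb{R}^n$ with bounded derivatives of all orders),
$$\sum_{|\alpha|=2} a_\alpha(x;h)\xi^\alpha \ge (1/c)|\xi|^2, \qquad \forall\,\xi\in\mathbb{R}^n,$$
for some constant $c > 0$, and
$$\sum_{|\alpha|\le2} a_\alpha(x;h)\xi^\alpha \longrightarrow \xi^2$$
uniformly with respect to $h$ as $|x|\to\infty$.

The meromorphic continuation is guaranteed by the following analyticity assumption: there exist $\theta_0\in[0,\pi)$, $\epsilon > 0$ and $R\ge R_0$ such that the coefficients $a_\alpha(x;h)$ of $Q(h)$ extend holomorphically in $x$ to
$$\{r\omega : \omega\in\mathbb{C}^n,\ \mathrm{dist}(\omega,\mathbb{S}^{n-1}) < \epsilon,\ r\in\mathbb{C},\ |r| > R,\ \arg r\in[-\epsilon, \theta_0 + \epsilon)\},$$
and such that the convergence $\sum_{|\alpha|\le2} a_\alpha(x;h)\xi^\alpha\to\xi^2$, uniformly with respect to $h$ as $|x|\to\infty$, remains valid in this larger set of $x$'s.

We use $P(h)$ to construct a self-adjoint operator $P^\sharp(h)$ on $\mathcal{H}^\sharp = \mathcal{H}_{R_0}\oplus L^2(M\setminus B(0,R_0))$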
as in [10], where $M = (\mathbb{R}/R\mathbb{Z})^n$ for some $R\gg R_0$. Let $N(P^\sharp(h), I)$ denote the number of eigenvalues of $P^\sharp(h)$ in the interval $I$; we assume
$$N(P^\sharp(h), [-\lambda,\lambda]) = \mathcal{O}\big((\lambda/h^2)^{n^\sharp/2}\big), \qquad \lambda\ge1, \tag{2.3}$$
for some number $n^\sharp\ge n$. Under the above assumptions on $P(h)$, the resonances close to the real axis can be defined by the method of complex scaling (see [8] and references given there). They coincide with the poles of the meromorphic continuation of the resolvent $(P(h) - z)^{-1}$ from $\mathrm{Im}\,z > 0$ to a conic neighbourhood of the positive half axis in the lower half plane. The set of resonances of $P(h)$ will be denoted by $\mathrm{Res}\,P(h)$, and we include them with their multiplicity. The spectral assumption (2.3) implies (in a non-trivial way – see [8] and references given there) a bound on the number of resonances: let $\Omega\Subset\{z : \mathrm{Im}\,z\le0,\ \mathrm{Re}\,z > 0\}$; then
$$\#\big(\Omega\cap\mathrm{Res}(P(h))\big) \le Ch^{-n^\sharp}, \tag{2.4}$$
where $\#$ denotes the number of elements counted according to their multiplicities. We will need two lemmas which come from [12]:

Lemma 2.1. For any simply connected compact set $\tilde\Omega\subset S_\theta$, where
$$S_\theta = \{z\in\mathbb{C} : \max(-\pi, 2\theta - 2\pi) < -\arg z < 2\theta\}$$
($\theta$ as in Condition 3 of $P(h)$), and any positive function $g(h)\ll1$ defined on $0 < h < h_0$, there exist constants $A = A(\tilde\Omega) > 0$ and $h_1$ with $0 < h_1 < h_0$ such that
$$\|\chi(P(h) - z)^{-1}\chi\|_{\mathcal{H}\to\mathcal{H}} \le Ae^{Ah^{-n^\sharp}\log(1/g(h))}, \qquad \forall\,z\in\tilde\Omega\setminus\bigcup_{z_j\in\mathrm{Res}\,P(h)\cap\tilde\Omega} D(z_j, g(h)),$$
for $h < h_1$, where $\chi\in C_0^\infty(\mathbb{R}^n)$ with $\chi = 1$ near $B(0,R_0)$, and
$$D(z_j, g(h)) = \{z\in\mathbb{C} : |z_j - z|\le g(h)\}.$$

This is a generalization of earlier results on bounds on the resolvent in the nonphysical plane (see [12] and references given there, and [13, Prop. 4.1] for a direct proof in a simpler setting), and is based on the work of Sjöstrand [8]. The second lemma is the "semi-classical" maximum principle (see [13, Lemma 4.1]):

Lemma 2.2. Fix an integer $n^\sharp$ and let $0 < h < 1$. Suppose $F(z,h)$ is a family of holomorphic functions (in $z$) defined in a neighbourhood of
$$\Omega(h) = [E(h) - 5h^{\tilde k}, E(h) + 5h^{\tilde k}] + i[-4h^{k+n^\sharp+2}, 4h^{k+1}]$$
for $k > n^\sharp$, $\tilde k\le k - n^\sharp$, and a real number $E(h) > \delta > 0$. If $F(z,h)$ satisfies
$$|F(z,h)| \le Ae^{Ah^{-n^\sharp}\log(1/h)} \text{ on } \Omega(h), \qquad |F(z,h)| \le \frac{1}{|\mathrm{Im}\,z|} \text{ on } \Omega(h)\cap\{\mathrm{Im}\,z > 0\},$$
then there exist $h(k) > 0$ and $C > 0$ such that
$$|F(z,h)| \le \frac{C}{h^{k+n^\sharp+2}} \qquad \text{for } z\in\tilde\Omega(h) = [E(h) - h^{\tilde k}, E(h) + h^{\tilde k}] + i[-2h^{k+n^\sharp+2}, 2h^{k+n^\sharp+2}]. \tag{2.5}$$
A direct consequence of these two lemmas is the following

Lemma 2.3. Suppose that the operator $P(h)$ has no resonances in
$$\Omega(h) = [E(h) - 5h^{\tilde k}, E(h) + 5h^{\tilde k}] + i[-4h^{k+n^\sharp+2}, 4h^{k+1}]$$
for $k > n^\sharp$, $\tilde k\le k - n^\sharp$, and a real number $E(h) > \delta > 0$. Then there exist $h(k) > 0$ and $C > 0$ such that
$$\|\chi(P(h) - z)^{-1}\chi\| \le \frac{C}{h^{k+n^\sharp+2}} \qquad \text{for } z\in\tilde\Omega(h) = [E(h) - h^{\tilde k}, E(h) + h^{\tilde k}] + i[-2h^{k+n^\sharp+2}, 2h^{k+n^\sharp+2}]. \tag{2.6}$$
Remark. Resolvent estimates were present in many previous works on resonances, in particular in almost all the works in which resonances were constructed. The point here is the abstract and general nature of the estimate. In fact, as was pointed out to us by M. Hitrik, the estimates of the type used here bear some similarity to the abstract estimates developed by Markus and Matsaev in the study of non-self-adjoint operator pencils – see [6] and references given there.

3. Semi-Classical Expansions

For simplicity of presentation we assume that there is no point spectrum. The contribution of eigenvalues of $P$ to (1.1) is immediate, and the modification of the argument is clear. Let us write
$$R_\pm(z,h) = (P(h) - z)^{-1}, \qquad \text{analytic for } \pm\,\mathrm{Im}\,z > 0,$$
using the same notation for the meromorphic continuation. The spectral projection is then given by Stone's formula, $dE_\lambda = (2\pi i)^{-1}(R_-(\lambda) - R_+(\lambda))\,d\lambda$, and the left-hand side of (1.1) can be rewritten as
$$\chi e^{-it\mu(P)/h}\chi\psi(P) = \frac{1}{2\pi i}\int_0^\infty e^{-it\mu(\lambda)}\chi\big(R_-(\lambda) - R_+(\lambda)\big)\chi\,d\lambda\,\psi(P). \tag{3.1}$$
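We would like to deform the contour in this integral. Some modifications are necessary, since we would like to commute $\chi$ and $\psi(P)$ and since $\psi\in C_c^\infty((0,\infty))$. As recalled in Sect. 2, the number of resonances in $[a - 2\delta, b + 2\delta] - i[0,\delta]$ is bounded by $\mathcal{O}(h^{-n^\sharp})$. Each of these resonances rules out at most two intervals of values of $c$, each of length $4h^{n^\sharp+1}$ (one for each endpoint). The excluded values therefore have total length $\mathcal{O}(h)$, which is smaller than $\delta$ for $h$ small. Hence there exists $c(h)$, $0 < \delta < c(h) < 2\delta$, such that
$$D(a - c(h), 2h^{n^\sharp+1})\cap\mathrm{Res}(P) = D(b + c(h), 2h^{n^\sharp+1})\cap\mathrm{Res}(P) = \emptyset.$$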
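We construct an $h$-dependent function $\psi_h$ satisfying
$$\psi_h\in C^\infty(\mathbb{R}), \quad \mathrm{supp}\,\psi_h\subset[a - c(h) - h^{n^\sharp+1}, b + c(h) + h^{n^\sharp+1}], \quad \psi_h\equiv1 \text{ on } [a - c(h) + h^{n^\sharp+1}, b + c(h) - h^{n^\sharp+1}]. \tag{3.2}$$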
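A concrete function with the properties (3.2) can be built from the standard smooth step based on $e^{-1/x}$. The sketch below is one possible construction, not necessarily the authors'. It uses $\eta = h^{n^\sharp+1}$, and the parameter values in the usage lines are hypothetical. The helper names `smooth_step` and `psi_h` are ours.

```python
import numpy as np

def _g(s):
    """exp(-1/s) for s > 0 and 0 otherwise; smooth on the whole real line."""
    s = np.asarray(s, dtype=float)
    safe = np.where(s > 0, s, 1.0)          # avoid division by zero in the unused branch
    return np.where(s > 0, np.exp(-1.0 / safe), 0.0)

def smooth_step(x):
    """C^infinity function equal to 0 for x <= 0 and to 1 for x >= 1."""
    return _g(x) / (_g(x) + _g(1.0 - x))

def psi_h(x, a, b, c, h, n_sharp):
    """Cut-off with the properties (3.2), using eta = h**(n_sharp + 1)."""
    eta = h ** (n_sharp + 1)
    rise = smooth_step((x - (a - c - eta)) / (2 * eta))   # 0 -> 1 on [a-c-eta, a-c+eta]
    fall = smooth_step(((b + c + eta) - x) / (2 * eta))   # 1 -> 0 on [b+c-eta, b+c+eta]
    return rise * fall

a, b, c, h, n_sharp = 1.0, 2.0, 0.15, 0.1, 2              # hypothetical values
eta = h ** (n_sharp + 1)
inside = np.linspace(a - c + 2 * eta, b + c - 2 * eta, 101)
outside = np.array([a - c - 2 * eta, b + c + 2 * eta])
print(psi_h(inside, a, b, c, h, n_sharp).min(), psi_h(outside, a, b, c, h, n_sharp).max())
```

Only the transition width $2\eta$ depends on $h$. Consequently each derivative of $\psi_h$ grows like a power of $h^{-(n^\sharp+1)}$, which is the scaling that appears in (3.3).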
We have the following slight modification of the standard almost analytic continuation lemma (see for instance [2, Sect. 8]):

Lemma 3.1. The function $\psi_h$ satisfying (3.2) can be extended to a function $\tilde\psi_h\in C^\infty(\mathbb{C})$ satisfying
$$\partial_{\bar z}\tilde\psi_h(z) = \begin{cases}\mathcal{O}\big(|\mathrm{Im}\,z/h^{n^\sharp+1}|^\infty\big),\\ 0 & \text{if } |\mathrm{Re}\,z - a + c(h)| > h^{n^\sharp+1} \text{ and } |\mathrm{Re}\,z - b - c(h)| > h^{n^\sharp+1}.\end{cases} \tag{3.3}$$

We now introduce fixed cut-off functions $\psi_i\in C_c^\infty((0,\infty))$:
$$\mathrm{supp}\,\psi_1\subset[a - 3\delta, b + 3\delta], \quad \mathrm{supp}\,\psi_2\subset[a - \delta, b + \delta], \quad \psi_1\equiv1 \text{ on } [a - 2\delta, b + 2\delta], \quad \psi_2\equiv1 \text{ on } [a - \delta/2, b + \delta/2]. \tag{3.4}$$
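The mechanism behind almost analytic extensions such as the one in Lemma 3.1 can be seen on the simplest model, the truncated Taylor series in $\mathrm{Im}\,z$. The SymPy sketch below checks the identity $\partial_{\bar z}\tilde\psi = \psi^{(N+1)}(x)(iy)^N/(2\,N!)$ for $\tilde\psi(x+iy) = \sum_{k\le N}\psi^{(k)}(x)(iy)^k/k!$. It follows that $\partial_{\bar z}\tilde\psi = \mathcal{O}(|\mathrm{Im}\,z|^N)$. The function names, the test function $e^{-x^2}$ and the order $N$ are illustrative choices. The sketch omits two ingredients needed to obtain (3.3): the cut-off in $\mathrm{Im}\,z$ with summation over $N$ that gives $\mathcal{O}(|\mathrm{Im}\,z|^\infty)$, and the rescaling by $h^{n^\sharp+1}$.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

def taylor_extension(psi, N):
    """Truncated Taylor extension  sum_{k=0}^{N} psi^{(k)}(x) (i y)^k / k!."""
    return sum(
        (sp.diff(psi, x, k) if k else psi) * (sp.I * y) ** k / sp.factorial(k)
        for k in range(N + 1)
    )

def dbar(F):
    """Cauchy-Riemann operator d/d(z-bar) = (d/dx + i d/dy) / 2 with z = x + i y."""
    return (sp.diff(F, x) + sp.I * sp.diff(F, y)) / 2

psi = sp.exp(-x**2)   # an illustrative smooth function of x
N = 4
residual = dbar(taylor_extension(psi, N))
expected = sp.diff(psi, x, N + 1) * (sp.I * y) ** N / (2 * sp.factorial(N))
print(sp.expand(residual - expected))   # 0
```

All lower-order terms cancel in pairs. Only the top derivative survives, multiplied by $(iy)^N$.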
We then rewrite the right-hand side of (3.1) as
$$\begin{aligned}&\frac{1}{2\pi i}\int_0^\infty e^{-it\mu(\lambda)}\chi(R_-(\lambda) - R_+(\lambda))\psi_h(\lambda)\chi\psi(P)\,d\lambda\\ &+ \frac{1}{2\pi i}\int_0^\infty e^{-it\mu(\lambda)}\chi(R_-(\lambda) - R_+(\lambda))(\psi_1 - \psi_h)(\lambda)\big((1 - \psi_2)(P)\chi\psi(P)\big)\,d\lambda\\ &+ \frac{1}{2\pi i}\int_0^\infty e^{-it\mu(\lambda)}\chi(R_-(\lambda) - R_+(\lambda))\big((1 - \psi_1)(P)\chi\psi(P)\big)\,d\lambda,\end{aligned} \tag{3.5}$$
and we claim that the last two terms give $\mathcal{O}(h^\infty)$ contributions. For that we need the following lemma, which comes essentially from Sect. 4 of [8] (see also Sect. 5 of [10] for a simpler version, and Sect. 3 of [9]):

Lemma 3.2. Let $\chi$, $\psi$, and $\psi_i$ have the same properties as in (3.5). Then
$$P^m\psi_i(P)(1 - \chi) = Q^m\psi_i(Q)(1 - \chi) + \mathcal{O}_{\mathcal{H}\to\mathcal{H}}(h^\infty), \qquad P^m(1 - \chi)\psi(P) = Q^m(1 - \chi)\psi(Q) + \mathcal{O}_{\mathcal{H}\to\mathcal{H}}(h^\infty),$$
where $Q = Q(h)$ is the same as in (2.2).

Hence, to show that the last terms in (3.5) are negligible, we observe that $Q^m(1 - \psi_j)(Q)\chi\psi(Q) = \mathcal{O}_{\mathcal{H}\to\mathcal{H}}(h^\infty)$, which, in view of the support properties of $\psi$ and $\psi_j$, follows from the semi-classical functional calculus (see Sect. 8 of [2]). We needed the powers of $P$ to guarantee the convergence of the spectral measure integral on the real line.

We now deform the contour in the first term of (3.5) (see Fig. 1), using the Green formula and noting that there are no poles of $R_-(z,h) - R_+(z,h)$ in the support of $\partial_{\bar z}\tilde\psi_h$. Thus
$$\begin{aligned}\frac{1}{2\pi i}\int_0^\infty e^{-it\mu(z)}\chi(R_-(z) - R_+(z))\psi_h(z)\chi\psi(P)\,dz &= \sum_{z\in\Omega(h)\cap\mathrm{Res}(P)}\chi\,\mathrm{Res}\big(e^{-it\mu(\bullet)/h}R(\bullet,h), z\big)\chi\psi(P)\\ &\quad + \frac{1}{2\pi i}\int_{B(h)} e^{-it\mu(z)}\chi(R_-(z) - R_+(z))\tilde\psi_h(z)\chi\psi(P)\,dz\\ &\quad + \frac{1}{\pi}\int e^{-it\mu(z)}\chi(R_-(z) - R_+(z))\,\partial_{\bar z}\tilde\psi_h(z)\chi\,L(dz)\,\psi(P),\end{aligned} \tag{3.6}$$
[Fig. 1: the deformed contour below the real axis between $a - 2\delta$ and $b + 2\delta$, at depth of order $h^M$, with the support of $\partial_{\bar z}\tilde\psi_h$ indicated.]
Fig. 1. The contour deformation in the semi-classical case (there are no resonances in the shaded region)
where $\Omega(h)$ is as in Theorem 1, $B(h)$ is the interval $[a - 2\delta, b + 2\delta] - id(h)$ with the positive orientation, and $L(dz)$ is the Lebesgue ($dx\,dy$) measure in $\mathbb{C}$. Here $d(h) = h^M(1 + \mathcal{O}(h))$ is chosen so that there are no resonances in $[a - 2\delta, b + 2\delta] - i[d(h) - h^{M_1}, d(h) + h^{M_1}]$, and so that $d(B(h), \mathrm{Res}(P(h))) > h^{M_2}$ for some fixed $M_2$ (the constants $M_1$ and $M_2$ depend on $M$ and $n^\sharp$). The existence of $d(h)$ follows again from the polynomial upper bound on the number of resonances.

The last two terms of the right-hand side of (3.6) are negligible. In fact, Lemma 2.1 and the choice of $B(h)$ guarantee that
$$\chi(R_-(z,h) - R_+(z,h))\chi = \mathcal{O}_{\mathcal{H}\to\mathcal{H}}(1)\,e^{h^{-K_1}},$$
where $K_1$ depends on $M_2$ and $n^\sharp$ only. Hence the integral over $B(h)$ is bounded by
$$\exp(-th^M)\,e^{h^{-K_1}} = \mathcal{O}(h^\infty) \qquad \text{if } t > h^{-L},\ L > K_1 + M;$$
indeed, $th^M - h^{-K_1} > h^{-K_1}\big(h^{K_1+M-L} - 1\big)$ and $h^{K_1+M-L}\to\infty$ as $h\to0$. The last term on the right-hand side of (3.6) is estimated using Lemma 2.3, as we recall that there are no poles in the support of $\partial_{\bar z}\tilde\psi_h$. Hence we bound the norm of the integrand by $|\mathrm{Im}\,z/h^{n^\sharp+1}|^\infty h^{-K_2}$. Since $|\mathrm{Im}\,z|\le h^M$, this is $\mathcal{O}(h^\infty)$ if $M > n^\sharp + 1$.
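The role of the condition $L > K_1 + M$ can be checked numerically on the logarithm of the bound $\exp(-th^M)e^{h^{-K_1}}$. The exponents $M = 2$ and $K_1 = 3$ below are hypothetical values chosen only for illustration. For $t = h^{-L}$ with $L > K_1 + M$, the ratio $\log(\text{bound})/\log h$ grows without limit as $h\to0$. This ratio is the power of $h$ that the bound beats. When $L < K_1 + M$ the bound exceeds $1$ and gives no information.

```python
import math

def log_contour_bound(h, t, M, K1):
    """Natural logarithm of exp(-t h^M) * exp(h^(-K1))."""
    return -t * h**M + h ** (-K1)

M, K1 = 2, 3                      # hypothetical exponents
for L in (K1 + M + 1, K1 + M - 1):
    print(f"L = {L}")
    for h in (0.5, 0.2, 0.1, 0.05):
        t = h ** (-L)
        lb = log_contour_bound(h, t, M, K1)
        # When lb < 0 the bound equals h ** (lb / log h); a larger ratio means a smaller bound.
        print(f"  h = {h}: log bound = {lb:.3g}, power of h = {lb / math.log(h):.3g}")
```

Working with logarithms avoids the overflow that evaluating $e^{h^{-K_1}}$ directly would cause for small $h$.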
Finally we note that $d(h)$ can be replaced by $h^M$. If $P_h(z) = \prod_{z_j\in\mathrm{Res}(P)\cap\Omega}(z - z_j)$, $\Omega = (a - 2\delta, b + 2\delta) - i(0,\epsilon)$, then $|P_h(z)|\le\exp(Ch^{-n^\sharp})$ and $P_h(z)R_-(z,h)$ is holomorphic in $\Omega$. Lemma 2.1 and the maximum principle give bounds on the residues:
$$\chi\,\mathrm{Res}\big(e^{it\bullet}R_-(\bullet,h)\chi\big)(z_j) = \mathcal{O}\big(e^{-t\,\mathrm{Im}\,z_j}e^{Ch^{-K}}\big) : \mathcal{H}\to\mathcal{H},$$
and that is $\mathcal{O}(h^\infty)$ if $t > h^{-L}$ and $\mathrm{Im}\,z_j\,h^{-M} = \mathcal{O}(1)$. Combined with the polynomial bound on the number of resonances, this allows us to change $d(h)$ to $h^M$.

Remark. As was observed by Laurence Nedelec, the method of proof of Theorem 1 easily allows replacing $\Omega(h)$ by $\tilde\Omega(h) = (a - c(h), b + c(h)) - i[0,\epsilon)$, where $0 < \epsilon < 1/C$. That corresponds to the fact that
$$\sum_{z\in\tilde\Omega(h)\cap\mathrm{Res}(P)\setminus\Omega(h)}\chi\,\mathrm{Res}\big(e^{-it\mu(\bullet)/h}R(\bullet,h), z\big)\chi\psi(P) = \mathcal{O}(h^\infty) \qquad \text{for } t > h^{-L},$$
which follows from the argument presented at the end of the proof.
4. Expansions for the Wave Equation

In this section we come back to the classical case and assume that the operator $P$ satisfies the assumptions of Sect. 2 with $h = 1$. As in the previous section, we assume for simplicity that $P$ has no point spectrum. For $C > 0$ sufficiently large we write $\tilde\Omega = \{|\mathrm{Im}\,z| < |\mathrm{Re}\,z|^{-2}/C,\ |z| > 1\}$. In the classical case, Lemma 2.1 gives, by a scaling argument, the following estimates on the truncated resolvent:
$$\|\chi(P - z)^{-1}\chi\|_{\mathcal{H}\to\mathcal{H}} \le Ae^{A|z|^{n^\sharp/2}\log(1/g(1/|z|))}, \qquad \forall\,z\in\tilde\Omega\setminus\bigcup_{z_j\in\mathrm{Res}(P)\cap\tilde\Omega}D(z_j, g(1/|z|)), \tag{4.1}$$
and, for $z_0\gg1$, if $P$ has no resonances in the set
$$\Omega(z_0) = [z_0 - 5|z_0|^{-\tilde k/2}, z_0 + 5|z_0|^{-\tilde k/2}] + i[-4|z_0|^{-(k+n^\sharp+2)/2}, 4|z_0|^{-(k+n^\sharp+2)/2}],$$
then Lemma 2.3 gives
$$\|\chi(P - z)^{-1}\chi\| \le C|z|^{(k+n^\sharp+2)/2} \qquad \text{for } z\in\tilde\Omega(z_0) = [z_0 - |z_0|^{-\tilde k/2}, z_0 + |z_0|^{-\tilde k/2}] + i[-2|z_0|^{-(k+n^\sharp+2)/2}, 2|z_0|^{-(k+n^\sharp+2)/2}]. \tag{4.2}$$
In the case of the wave equation it is convenient to make the change of variables $z = \lambda^2$, with $\mathbb{C}\setminus[0,\infty)$ corresponding to $\mathrm{Im}\,\lambda > 0$. We then write $R(\lambda) = (P - \lambda^2)^{-1}$, and we now have a meromorphic continuation to a conic neighbourhood of $\mathbb{R}$. We are going to use the estimates (4.2) and (4.1) to perform a contour deformation in the integral
$$\chi U(t)(P + i)^{-L}\varrho(\sqrt{P})\chi = \frac{i}{2\pi}\int_{-\infty}^{+\infty}e^{-it\lambda}\chi R(\lambda)(\lambda^2 + i)^{-L}\varrho(\lambda)\chi\,d\lambda, \qquad t > 0, \tag{4.3}$$
where we noted that the contribution of $R(-\lambda)$ in the spectral projection can be eliminated by contour deformation when $t > 0$ – see [13, Sect. 4].

We first consider Case 1: odd dimensions and $P = -\Delta$ in the exterior of a (large) ball. Let us fix $M > (k + n^\sharp + 2)/2$. In view of the estimate on the number of resonances, we can construct a smooth curve
$$\tilde B = \{z = x + i\gamma(x) : x\in\mathbb{R}\} \tag{4.4}$$
such that
$$\forall\,x\in\mathbb{R}, \quad \gamma(x) > 0 \text{ and } \gamma(x) = |x|^{-M}\big(1 + \mathcal{O}(|x|^{-1})\big), \tag{4.5}$$
$$\mathrm{dist}(z, \mathrm{Res}(P)) > |z|^{-M}, \qquad \forall\,z\in\tilde B, \tag{4.6}$$
and such that the length of the part of $\tilde B$ lying between $-x$ and $x$ is smaller than $3x$. This can be achieved by considering $h_j = 2^{-j/2}$, $j\in\mathbb{N}$, and scaling: for $|x|\in[2^j, 2^{j+1}]$ the curve is obtained from the curve $\mathrm{Im}\,z = d(h_j)$ constructed in the proof of Theorem 1. It is then enough to modify the curves slightly near each of the points $x = 2^j$ in order to join them together.
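Take $\epsilon > 0$ to be fixed later. Using again the estimate on the number of resonances, there exists a function $c(t)\sim t^\epsilon$ such that the operator $P$ has no resonances in the balls $D(\pm c(t), t^{-\epsilon M})$. Now we deform the contour in (4.3) to the following new contour:
$$B = \{z = x + i\gamma(x) : |x| < c(t)\}\cup\big(\pm c(t) - i[0, \gamma(c(t))]\big)\cup\{|x|\ge c(t)\}, \tag{4.7}$$
$$B = B_1 + B_2 + B_3, \tag{4.8}$$
and we obtain
$$\chi U(t)(P + i)^{-L}\chi = \frac{i}{2\pi}\int_{B_1+B_2+B_3}e^{-it\lambda}\chi R(\lambda)(\lambda^2 + i)^{-L}\chi\,d\lambda + \sum_{\substack{\mathrm{Im}\,\lambda_j > -\gamma(\mathrm{Re}\,\lambda_j)\\ |\mathrm{Re}\,\lambda_j| < c(t)}}\chi\,\mathrm{Res}\big(e^{-it\bullet}R_-(\bullet), \lambda_j\big)(P + i)^{-L}\chi. \tag{4.9}$$

[Fig. 2: the deformed contour, with vertical segments at $\pm c(t)$, $c(t)\sim t^\epsilon$, reaching depth $\gamma(c(t))\sim t^{-\epsilon M}$.]
Fig. 2. The contour deformation in the classical case (for case 2)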
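The finitely many additional resonances appearing in (4.9) (compared to (1.5); all of them are near 0) give an exponentially small contribution. To obtain Theorem 2 it is now sufficient to control the error term given by the integral on the deformed contour. Let us consider first the contribution of $B_1$. We use (4.1) and get, for any $f\in\mathcal{H}$,
$$\Big\|\int_{B_1}e^{-it\lambda}\chi R(\lambda)(\lambda^2 + i)^{-L}\chi\,d\lambda\,f\Big\| \le \int_{B_1}e^{-\alpha t|z|^{-M}}Ae^{A|z|^{n^\sharp}}\,|dz|\,\|f\| \le \int_{-c(t)}^{c(t)}e^{-\alpha t\,t^{-\epsilon M}}Ae^{At^{\epsilon n^\sharp}}\,dz\,\|f\|, \qquad \alpha > 0, \tag{4.10}$$
and for $\epsilon(M + n^\sharp) < 1$ this term decays faster than any polynomial: then $1 - \epsilon M > \epsilon n^\sharp$, so the factor $e^{-\alpha t^{1-\epsilon M}}$ dominates $e^{At^{\epsilon n^\sharp}}$. We now fix $\epsilon < 1/(M + n^\sharp)$ and consider the contribution of $B_2$, where we use (4.2):
$$\Big\|\int_{B_2}e^{-it\lambda}\chi R(\lambda)(\lambda^2 + i)^{-L}\chi\,d\lambda\,f\Big\| \le \int_0^{\gamma(c(t))}e^{-s\alpha t}\frac{|z|^{(k+n^\sharp+2)/2}}{(1 + |z|)^L}\,ds\,\|f\| \le \int_0^{2t^{-\epsilon M}}t^{\epsilon[(k+n^\sharp+2)/2 - L]}\,ds\,\|f\| \le Ct^{\epsilon[(k+n^\sharp+2)/2 - L]}\|f\|. \tag{4.11}$$
This is exactly the decay stated in Theorem 2. For the contribution coming from $B_3$, we use the spectral calculus to obtain
$$\Big\|\int_{B_3}e^{-it\lambda}\chi\big(R(\lambda) - R(-\lambda)\big)(\lambda^2 + i)^{-L}\chi\,d\lambda\,f\Big\| \le \sup_{B_3}|\lambda^2 + i|^{-L}\|f\| \le \frac{\|f\|}{t^{\epsilon L}}, \tag{4.12}$$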
where (see [13, Sect. 4]) we can insert the term $R(-\lambda)$ and use the bounds on the spectral projection.

To deal with Case 2, that is, with $n$ even, we have to modify our argument because the resolvent has a branching point at $\lambda = 0$. Hence we have to deform the contour, which near $0$ is equal to
$$\{z = x - i\beta x,\ x\ge0\}\cup\{z = x + i\beta x,\ x\le0\}, \qquad \beta \text{ small}, \tag{4.13}$$
and to use the usual estimates for the resolvent near $0$:
$$\|\chi R(z)\chi\| \le C_{a,b}|z|^{n-2}|\log z| \tag{4.14}$$
in any sector $\arg z\in[-a, b]$.
[Fig. 3: the deformed contour near $0$, with regions labelled $\tilde\varrho = 1$, $\tilde\varrho = 0$, $\tilde\varrho = 1$ and the support of $\partial_{\bar z}\tilde\varrho$ shaded.]
Fig. 3. The contour deformation in the "black box" case
To deal with the general case, we consider an almost analytic extension $\tilde\varrho$ of the function $\varrho$ such that $\partial_{\bar z}\tilde\varrho$ is supported in a set where $P$ has no resonances. We deform (4.3) to a contour which for $|z| > 1$ is the same as before, and for $|z| < 1$ is as in Fig. 3. By Stokes's formula we get exactly the same contributions as in Case 1 (since $\tilde\varrho = 0$ near $0$), with an additional term
$$\frac{i}{2\pi}\int\partial_{\bar z}\tilde\varrho(z)\,e^{-it\lambda}\chi R(\lambda)(\lambda^2 + i)^{-L}\chi\,L(dz), \tag{4.15}$$
where the integral is taken over the domain between the real axis and the new contour (shaded in Fig. 3). Using that $\partial_{\bar z}\tilde\varrho(z) = \mathcal{O}(|\mathrm{Im}\,z|^\infty)$ in this domain, we obtain that this last term is $\mathcal{O}(t^{-\infty})$ (in the energy norm).

Finally, we point out that the remark made at the end of Sect. 3 applies also in the classical case: we can sum over resonances in a larger region, but the additional contribution can be absorbed into the error term $E(t)$. We can now replace the region over which we sum the contributions from resonances by the simpler region given in Theorem 2, arguing as at the end of the proof of Theorem 1.

Acknowledgements. The authors would like to thank the National Science and Engineering Research Council of Canada and the France-Berkeley Fund for partial support. The research of the second author was partially supported by the National Science Foundation of the U.S. We would also like to thank Laurence Nedelec for her comments on the first version of the paper, and Michael Hitrik for a useful reference.
References

1. Christiansen, T. and Zworski, M.: Resonance wave expansions: Two hyperbolic examples. Commun. Math. Phys. 212, 323–336 (2000)
2. Dimassi, M. and Sjöstrand, J.: Spectral asymptotics in the semi-classical limit. LMS Lecture Series. Cambridge: Cambridge University Press, 1999
3. Hunziker, W.: Distortion analyticity and molecular resonance curves. Ann. Inst. H. Poincaré Phys. Théor. 45, 339–358 (1986)
4. Lax, P. and Phillips, R.: Scattering Theory. 2nd edition. New York: Academic Press, 1989
5. Peskin, U., Reisler, H., and Miller, W.H.: On the relation between unimolecular reaction rates and overlapping resonances. J. Chem. Phys. 101 (11), 9672–9680 (1994)
6. Markus, A.S.: Introduction to the spectral theory of polynomial operator pencils. Translations of Mathematical Monographs, 71. Providence, RI: Am. Math. Soc., 1988
7. Merkli, M. and Sigal, I.M.: A time-dependent theory of quantum resonances. Commun. Math. Phys. 201, 549–576 (1999)
8. Sjöstrand, J.: A trace formula and review of some estimates for resonances. In: Microlocal analysis and spectral theory (Lucca, 1996), NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., 490. Dordrecht: Kluwer Acad. Publ., 1997, pp. 377–437
9. Sjöstrand, J.: Resonances for bottles and trace formulae. Preprint, 1998, to appear in Math. Nachrichten
10. Sjöstrand, J. and Zworski, M.: Complex scaling and the distribution of scattering poles. J. A.M.S. 4 (4), 729–769 (1991)
11. Soffer, A. and Weinstein, M.: Resonances, radiation damping and instability in Hamiltonian nonlinear wave equations. Invent. Math. 136, 9–74 (1999)
12. Tang, S.H. and Zworski, M.: From quasimodes to resonances. Math. Res. Lett. 5, 261–272 (1998)
13. Tang, S.H. and Zworski, M.: Resonance expansions of scattered waves. Comm. Pure Appl. Math. 53, 1305–1334 (2000)
14. Zworski, M.: Resonances in physics and geometry. Notices Am. Math. Soc. 46, 319–328 (1999)

Communicated by B. Simon
Commun. Math. Phys. 223, 13 – 28 (2001)
Bound States in Curved Quantum Layers

P. Duclos1,2, P. Exner3,4, D. Krejčiřík1,2,3,5

1 Centre de Physique Théorique, CNRS, 13288 Marseille-Luminy, France
2 PHYMAT, Université de Toulon et du Var, 83957 La Garde, France. E-mail: [email protected]
3 Nuclear Physics Institute, Academy of Sciences, 25068 Řež near Prague, Czech Republic. E-mail: [email protected]; [email protected]
4 Doppler Institute, Czech Technical University, Břehová 7, 11519 Prague, Czech Republic
5 Faculty of Mathematics and Physics, Charles University, V Holešovičkách 2, 18000 Prague, Czech Republic

Received: 26 February 2001 / Accepted: 21 May 2001
Abstract: We consider a nonrelativistic quantum particle constrained to a curved layer of constant width built over a non-compact surface embedded in $\mathbb{R}^3$. We suppose that the latter is endowed with geodesic polar coordinates and that the layer has a hard-wall boundary. Under the assumption that the surface curvatures vanish at infinity, we find sufficient conditions which guarantee the existence of geometrically induced bound states.
1. Introduction

Relations between the geometry of a region $\Omega$ in $\mathbb{R}^n$, boundary conditions at $\partial\Omega$, and spectral properties of the corresponding Laplacian are one of the vintage problems of mathematical physics. Recent years brought new motivations and focused attention on aspects of the problem which attracted little attention earlier. A strong impetus comes from mesoscopic physics, where new experimental techniques make it possible to fabricate semiconductor systems which can be regarded, with a reasonable degree of accuracy, as waveguides, resonators, etc., for effectively free quantum particles. Often the potential barriers at their boundaries can be modelled as a hard wall, in which case it is natural to identify the system Hamiltonian (up to a constant which is usually unimportant) with the Dirichlet Laplacian, $-\Delta_D^\Omega$, defined as the Friedrichs extension – cf. Sect. 3.3. Moreover, these solid-state physics advances inspired new insights into classical physics, because analogous problems involving the Dirichlet Laplacian also arise in flat electromagnetic waveguides. For more information about the physical background see [DE, LCM] and references therein.

On the mathematical side, a new interesting effect is binding due to curvature, supposed to be nonzero and asymptotically vanishing, of an infinitely stretched tubular region in $\mathbb{R}^n$, $n = 2, 3$. Such "trapped modes" may be generated by other local perturbations of a straight tube as well – see, e.g., [BGRS] – but in the bent-tube case they are
of a purely quantum origin, because there are no classical closed trajectories apart from a zero-measure set of initial conditions in the phase space. More generally, quantum motion in the vicinity of a manifold with a potential constraint or Dirichlet condition was studied a long time ago [JK, dC1, dC2, T] in formal attempts to justify quantization on submanifolds. For a thin neighbourhood one excludes the transverse part of the Hamiltonian, which gives rise to normal oscillations, and the Hamiltonian is replaced by a tangential operator on the submanifold with the energy appropriately renormalized. Interest in this problem was renewed recently, when time evolution around a compact $n$-dimensional manifold in $\mathbb{R}^{n+m}$ was treated in a rigorous way and compared with the corresponding classical dynamics [FH]. The confinement was realized by a harmonic potential transverse to the manifold, and the thin-neighbourhood limit was performed by means of a dilation procedure followed by averaging in the normal direction. If the normal bundle is trivial, which is the case, e.g., for manifolds of codimension one, the resulting tangential Hamiltonian contains two terms: the first is proportional to the Laplace-Beltrami operator on the constraint manifold, and the second is an effective potential which depends not only on the intrinsic quantities but also on the external curvature of the constraint manifold. Notice also that if $\mathbb{R}^{n+m}$ is replaced by a manifold of the same dimension, the effective potential also depends on the curvature of this ambient space [M]. This potential is also important when the width of the "fat manifold" is finite and fixed.
This was first noticed for bent planar Dirichlet strips in the paper [EŠ], which was followed by numerous studies in which the existence conditions and properties of the geometrically induced discrete spectrum were further investigated – see, in particular, [GJ, DE, RB], the first two papers also for a generalization to curved tubes in $\mathbb{R}^3$. On the other hand, much less is known about other possible generalizations of this problem to higher dimensions, starting from the physically interesting case of curved layers in $\mathbb{R}^3$. This is the question we address in the present paper.

While the strategy will be the same as in the work mentioned above, using suitable curvilinear coordinates to transform the Laplacian, the two-dimensional character of the underlying manifold brings new features. To characterize them briefly, recall that in the simplest (1+1)-case the effective potential is $-\frac14\kappa^2$, where $\kappa$ is the curvature, so the potential is negative wherever the curvature is nonzero. In the case of a layer, $n = 2$ and $m = 1$, which we consider here, the (leading term of the) effective potential is given by $-\frac14(k_1 - k_2)^2$ – see the derivation of (3.12) – where $k_1, k_2$ are the principal curvatures of the surface. This expression may also vanish if the surface is locally spherical, $k_1 = k_2$, but the last relation cannot hold everywhere on a non-compact surface unless the latter is a plane, $k_1 = k_2 = 0$. Thus the effective potential again has an attractive component, which now combines with a more complicated tangential operator, the surface Laplace-Beltrami operator, since, unlike a curve, the surface cannot be fully rectified. This makes the layer case richer and more interesting.
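The leading term of the layer potential can be rewritten in terms of the Gauss curvature $K = k_1k_2$ and the mean curvature $H = (k_1 + k_2)/2$ as $-\frac14(k_1 - k_2)^2 = K - H^2$. Only $H^2$ enters, so the sign convention for $H$ does not matter. The short check below verifies the identity symbolically and evaluates the potential on a sphere, where it vanishes, and on a cylinder, where it does not. The variable and function names are ours.

```python
import sympy as sp

k1, k2 = sp.symbols("k1 k2", real=True)
H = (k1 + k2) / 2                       # mean curvature
K = k1 * k2                             # Gauss curvature
V_leading = -sp.Rational(1, 4) * (k1 - k2) ** 2

print(sp.expand(V_leading - (K - H**2)))   # 0

def leading_effective_potential(kappa1, kappa2):
    """-(k1 - k2)^2 / 4 for given principal curvatures."""
    return -0.25 * (kappa1 - kappa2) ** 2

R = 2.0
print(leading_effective_potential(1 / R, 1 / R))   # sphere of radius R: zero
print(leading_effective_potential(1 / R, 0.0))     # cylinder of radius R: -1/(4 R^2) = -0.0625
```

The cylinder value $-1/(4R^2)$ coincides with the potential $-\frac14\kappa^2$ of the planar strip bent with curvature $\kappa = 1/R$, consistent with the (1+1)-case recalled above.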
2. Survey of the Paper
The ultimate objective of this work is to establish a list of sufficient conditions guaranteeing the existence of curvature-induced bound states. We naturally restrict ourselves to noncompact layers only, since the spectrum of the Dirichlet Laplacian in a bounded region of $\mathbb{R}^n$ is always discrete [Dav, Chap. 6].
Bound States in Curved Quantum Layers
The layer configuration space $\Omega$ itself is properly defined in Sect. 3 as a tubular neighbourhood of width $d$ built over a surface $\Sigma$ embedded in $\mathbb{R}^3$ which is diffeomorphic to $\mathbb{R}^2$. To make it more visual, we can understand $\Omega$ as the part of $\mathbb{R}^3$ between a pair of parallel surfaces. For technical reasons we suppose from the beginning that the surface admits at least one pole from which we can parametrize the surface globally by geodesic polar coordinates. We stress already here that the existence of a pole in $\Sigma$ is a strong geometric assumption and that there may be no poles in general [GM]. We first introduce quantities describing the layer geometry and formulate some basic assumptions. In the subsequent part, the Dirichlet Laplacian, $-\Delta_D^\Omega$, is expressed in terms of the couple $q = (q^1, q^2)$ of the surface (also called longitudinal) coordinates together with the normal (transverse) coordinate $u$. In Sect. 4, we estimate the threshold of the essential spectrum of the Hamiltonian under the assumption ⟨Σ0⟩ that the reference surface is asymptotically planar in the sense that its Gauss and mean curvatures vanish at large distances. We find that this part of the spectrum is bounded from below by $\kappa_1^2 := (\pi/d)^2$, which is the lowest transverse-mode energy. Section 5 is dedicated to the analysis of the discrete part of the spectrum. We find here three sufficient conditions and illustrate them on examples. Since these results leave open the existence question for thick layers of positive total Gauss curvature, we present in Sect. 6 an alternative method, which covers the case of asymptotically planar layers that are cylindrically symmetric. Finally, we conclude in Sect. 7 with an example of a layer which has no bound states; the reference surface here is not asymptotically planar.
To state here the main results of the paper we need to mention some assumptions, which will be discussed in more detail below: ⟨Σ1⟩ and ⟨Σ2⟩ mean, respectively, the integrability of the Gauss curvature $K$ and of the square of $\nabla_g M$, where $M$ is the mean curvature, and ⟨Ω1⟩ requires the layer half-width to be less than the minimal normal curvature radius of $\Sigma$. The integral (total) curvatures $\mathcal K$ and $\mathcal M$ corresponding to $K$ and $M$ are defined in (3.3).
Theorem 2.1. Let $\Sigma$ be a $C^2$-smooth complete simply connected non-compact surface with a pole embedded in $\mathbb{R}^3$. Let the layer $\Omega$ built over the surface be not self-intersecting. If the surface is not a plane but it is asymptotically planar, then any of the conditions
• ⟨Σ1⟩ and the total Gauss curvature is non-positive,
• $\Sigma$ is $C^3$-smooth and the layer is sufficiently thin,
• $\Sigma$ is $C^3$-smooth, ⟨Σ1⟩, ⟨Σ2⟩, and the total mean curvature is infinite,
• ⟨Σ1⟩ and $\Sigma$ is cylindrically symmetric,
is sufficient for the Laplace operator $-\Delta_D^\Omega$ to have at least one isolated eigenvalue of finite multiplicity below $\inf\sigma_{\rm ess}(-\Delta_D^\Omega)$ for all the layer half-widths satisfying ⟨Ω1⟩.
While this theorem covers various wide classes of layers, the list is not exhaustive. For instance, it remains to be clarified whether one can also include thick layers without cylindrical symmetry built over surfaces with strictly positive total Gauss curvature which, however, do not satisfy the assumption ⟨Σ2⟩. Another open question is whether one can replace ⟨Σ1⟩ by an assumption requiring only the existence of the total Gauss curvature, defined in the principal value sense. Finally, it is desirable to find existence results also for layers over more general surfaces which do not possess poles or are not diffeomorphic to $\mathbb{R}^2$. Properties of the obtained curvature-induced bound states will be discussed elsewhere. Let us just mention that in analogy to bent strips [DE] one can perform the
P. Duclos, P. Exner, D. Krejčiřík
Birman–Schwinger analysis for slightly curved planar layers (weak-coupling regime), which yields the first term in the asymptotic expansion of the gap between the eigenvalue and the threshold of the essential spectrum. We also remark that the weak coupling analysis of bent “fat” manifolds is similar to that of a local one-sided deformation of a straight strip [BGRS] or of a planar layer [BEGK]. We use the standard component notation of tensor analysis, the range of indices being 1, 2 for Greek and 1, 2, 3 for Latin ones. The indices are associated with the above mentioned coordinates by $(1, 2, 3) \leftrightarrow (q^1, q^2, u) \equiv (s, \vartheta, u)$. The partial derivatives are denoted by commas; however, we also use the dot notation for the derivatives w.r.t. $s$.
3. Preliminaries
Let $\Sigma$ be a $C^2$-smooth surface in $\mathbb{R}^3$ which has at least one pole, i.e., a point $o \in \Sigma$ such that the exponential mapping $\exp_o : T_o\Sigma \to \Sigma$ is a diffeomorphism. The existence of a pole in $\Sigma$ is a nontrivial assumption which has important topological consequences. In particular, $\Sigma$ is necessarily diffeomorphic to $\mathbb{R}^2$, and as such it is simply connected and non-compact. Using the geodesic polar coordinates we can parametrize the surface (with the exception of the pole $o$) by a unique patch $p : \Sigma_0 \to \mathbb{R}^3$, where $\Sigma_0 := (0, \infty) \times S^1$. The tangent vectors $p_{,\mu} := \partial p/\partial q^\mu$ are linearly independent and their cross-product defines a unit normal field $n$ on $\Sigma$. Put $\Omega_0 := \Sigma_0 \times (-a, a)$. We define a layer $\Omega := L(\Omega_0)$ of width $d = 2a > 0$ over the surface $\Sigma$ by virtue of the mapping $L : \Omega_0 \to \mathbb{R}^3$, which acts as (cf. [Sp3, Prob. 12 of Chap. 3])
$$L(q, u) := p(q) + u\,n(q). \tag{3.1}$$
3.1. The Surface Geometry. The induced surface metric in the geodesic polar coordinates has the diagonal form $(g_{\mu\nu}) = \mathrm{diag}(1, r^2)$, where $r^2 \equiv g := \det(g_{\mu\nu})$ and $r$ is the Jacobian of the exponential mapping, which satisfies the classical Jacobi equation
$$\ddot r(s, \vartheta) + K(s, \vartheta)\,r(s, \vartheta) = 0 \quad\text{with}\quad r(0, \vartheta) = 0,\ \ \dot r(0, \vartheta) = 1. \tag{3.2}$$
The Gauss curvature $K$, together with the mean curvature $M$, can be determined via the Weingarten tensor $h_\mu^{\ \nu}$ – cf. [Kli, Prop. 3.5.5].
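As a quick illustration of (3.2), the following minimal sketch integrates the Jacobi equation along a single geodesic with a classical fourth-order Runge–Kutta scheme. It assumes, for illustration only, that $K$ is supplied as a function of $s$ along that geodesic; the function name `jacobi_r` is hypothetical. For constant $K$ the solutions are explicit, $r(s) = \sin(\sqrt K s)/\sqrt K$ for $K > 0$, $r(s) = s$ for $K = 0$ and $r(s) = \sinh(\sqrt{-K} s)/\sqrt{-K}$ for $K < 0$, which gives a simple check.

```python
import math
from typing import Callable, List, Tuple


def jacobi_r(K: Callable[[float], float], s_max: float,
             n_steps: int = 2000) -> List[Tuple[float, float, float]]:
    """Solve r'' + K(s) r = 0, r(0) = 0, r'(0) = 1 on [0, s_max] with RK4.

    Returns the samples (s, r(s), r'(s)) at the n_steps + 1 grid points.
    """
    h = s_max / n_steps

    def rhs(s: float, r: float, dr: float) -> Tuple[float, float]:
        # First-order system: (r, r')' = (r', -K(s) r).
        return dr, -K(s) * r

    r, dr = 0.0, 1.0
    samples = [(0.0, r, dr)]
    for i in range(n_steps):
        s = i * h
        k1r, k1d = rhs(s, r, dr)
        k2r, k2d = rhs(s + h / 2, r + h / 2 * k1r, dr + h / 2 * k1d)
        k3r, k3d = rhs(s + h / 2, r + h / 2 * k2r, dr + h / 2 * k2d)
        k4r, k4d = rhs(s + h, r + h * k3r, dr + h * k3d)
        r += h / 6 * (k1r + 2 * k2r + 2 * k3r + k4r)
        dr += h / 6 * (k1d + 2 * k2d + 2 * k3d + k4d)
        samples.append(((i + 1) * h, r, dr))
    return samples


# Unit sphere (K = 1): r(s) = sin s; hyperbolic plane (K = -1): r(s) = sinh s.
sphere = jacobi_r(lambda s: 1.0, 1.0)
hyperbolic = jacobi_r(lambda s: -1.0, 1.0)
print(sphere[-1][1], math.sin(1.0))
print(hyperbolic[-1][1], math.sinh(1.0))
```

Integrating the same equation once in $s$ gives $\dot r(s) = 1 - \int_0^s K r\,ds'$, which is the mechanism behind the linear bound (3.4) when the curvature is integrable.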
By means of the invariant surface element, $d\Sigma := g^{1/2} d^2q$, we may introduce some global quantities characterizing $\Sigma$, namely the total Gauss curvature $\mathcal K$ and the total mean curvature $\mathcal M$, which are defined, respectively, by the integrals
$$\mathcal K := \int_{\Sigma_0} K\,d\Sigma \quad\text{and}\quad \mathcal M := \int_{\Sigma_0} M^2\,d\Sigma. \tag{3.3}$$
The latter always exists (it may be $+\infty$), while the former is well defined provided
⟨Σ1⟩ $K \in L^1(\Sigma_0, d\Sigma)$.
If this condition is not satisfied, one can understand the above integral as the principal value defined through the area enclosed by the geodesic circle $p(s, \cdot)$ of radius $s \to \infty$. Assuming $\mathcal K$ to be finite, an integration of (3.2) yields the following useful estimate:
$$\exists C > 0\ \ \forall s \in (0, \infty):\quad \int_0^{2\pi} r(s, \vartheta)\,d\vartheta \le C s. \tag{3.4}$$
The norm and the inner product in the Hilbert space $L^2(\Sigma_0, d\Sigma)$ will be indicated by the subscript “g”.
3.2. The Layer Geometry. It is clear from the definition (3.1) that the metric tensor of the layer (as a submanifold of $\mathbb{R}^3$) has the block form
$$(G_{ij}) = \begin{pmatrix} (G_{\mu\nu}) & 0 \\ 0 & 1 \end{pmatrix} \quad\text{with}\quad G_{\mu\nu} = (\delta_\mu^{\ \sigma} - u\,h_\mu^{\ \sigma})(\delta_\sigma^{\ \rho} - u\,h_\sigma^{\ \rho})\,g_{\rho\nu}. \tag{3.5}$$
This formula is well suited for calculation of the determinant, $G := \det(G_{ij})$, because the eigenvalues of the matrix of the Weingarten map are the principal curvatures $k_1, k_2$, and $K = k_1 k_2$, $M = \frac12(k_1 + k_2)$. Hence
$$G = g\,[(1 - uk_1)(1 - uk_2)]^2 = g\,(1 - 2Mu + Ku^2)^2. \tag{3.6}$$
In particular, this expression defines through $d\Omega := G^{1/2}\,d^2q\,du$ the volume element of $\Omega$. Henceforth, we shall assume
⟨Ω0⟩ $\Omega$ is not self-intersecting, i.e., $L$ is injective.
We also have to require that $L$ is a diffeomorphism. In view of the regularity assumptions imposed on $\Sigma$ and the inverse function theorem, this is equivalent to assuming that $1 - 2Mu + Ku^2$ does not vanish on $\Omega_0$, which can be guaranteed by imposing a restriction on the layer thickness:
⟨Ω1⟩ $a < \rho_m := \left(\max\{\|k_1\|_\infty, \|k_2\|_\infty\}\right)^{-1}$.
The number $\rho_m$ is naturally interpreted as the minimal normal curvature radius of $\Sigma$ (for planar surfaces one can put $\rho_m := \infty$). It follows from (3.5) that $C_- \le 1 - 2Mu + Ku^2 \le C_+$ holds with $C_\pm := (1 \pm a\rho_m^{-1})^2$. The lower bound explains why we assume ⟨Ω1⟩ (together with ⟨Ω0⟩) to get the global diffeomorphism. On the other hand, the supremum norms in the definition of $\rho_m$ are necessarily finite, since a meaningful layer must have a non-zero width. Another consequence of these considerations is that under the assumption ⟨Ω1⟩, $G_{\mu\nu}$ can be immediately estimated by the surface metric,
$$C_-\,g_{\mu\nu} \le G_{\mu\nu} \le C_+\,g_{\mu\nu} \quad\text{with}\quad 0 < C_- \le 1 \le C_+ < 4. \tag{3.7}$$
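The following sketch checks numerically, for randomly sampled principal curvatures $|k_i| \le \rho_m^{-1}$ and transverse positions $|u| < a$, the factorization behind (3.6) and the bounds behind (3.7). The sampling ranges, the helper names and the values of $a$ and $\rho_m$ are illustrative assumptions, not taken from the paper.

```python
import math
import random


def weight(u: float, k1: float, k2: float) -> float:
    """1 - 2Mu + Ku^2 with M = (k1 + k2)/2 and K = k1*k2, cf. (3.6)."""
    M = 0.5 * (k1 + k2)
    K = k1 * k2
    return 1.0 - 2.0 * M * u + K * u * u


def c_bounds(a: float, rho_m: float) -> tuple:
    """C_- and C_+ of (3.7): C_pm = (1 pm a/rho_m)^2."""
    return (1.0 - a / rho_m) ** 2, (1.0 + a / rho_m) ** 2


def check_bounds(a: float, rho_m: float, trials: int = 10000, seed: int = 0) -> int:
    """Sample |k_i| <= 1/rho_m and |u| < a; return the number of violations."""
    assert a < rho_m, "assumption <Omega1> requires a < rho_m"
    rng = random.Random(seed)
    c_minus, c_plus = c_bounds(a, rho_m)
    violations = 0
    for _ in range(trials):
        k1 = rng.uniform(-1.0 / rho_m, 1.0 / rho_m)
        k2 = rng.uniform(-1.0 / rho_m, 1.0 / rho_m)
        u = rng.uniform(-a, a)
        w = weight(u, k1, k2)
        # (3.6): the weight factorizes into (1 - u k1)(1 - u k2).
        if not math.isclose(w, (1 - u * k1) * (1 - u * k2), rel_tol=1e-12, abs_tol=1e-12):
            violations += 1
        # (3.7): the weight and the eigenvalues (1 - u k_i)^2 of G relative to g lie in [C_-, C_+].
        for value in (w, (1 - u * k1) ** 2, (1 - u * k2) ** 2):
            if not c_minus <= value <= c_plus:
                violations += 1
    return violations


print(c_bounds(0.4, 1.0))      # C_- = 0.36, C_+ = 1.96 up to rounding
print(check_bounds(0.4, 1.0))  # expected: 0
```

Each factor $1 - uk_i$ lies in $[1 - a\rho_m^{-1}, 1 + a\rho_m^{-1}]$, which is all the sampling exercises; the sketch is only a sanity check, not a proof.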
Remark. We stress the following points, which will be supposed throughout the paper but will not always be referred to hereafter:
• We consider surfaces which can be parametrized by means of the geodesic polar coordinates. This requires the existence of at least one pole.
• Since $\Sigma$ is assumed to be of class $C^2$, the surface curvatures $K, M$ are $C^0$ and as such locally bounded.
• Moreover, since we assume layers with non-zero widths, the principal curvatures have to be bounded uniformly on all of $\Sigma_0$ due to ⟨Ω1⟩. By virtue of the relation between $k_1, k_2$ and $K, M$, the same is true for the latter.
3.3. The Hamiltonian. After the geometric preliminaries, let us define the Hamiltonian of our model. We consider a nonrelativistic spinless particle confined to $\Omega$ which is free within it, and we suppose that the boundary of the layer is a hard wall, i.e., the wavefunctions satisfy the Dirichlet boundary condition there. For the sake of simplicity we set Planck's constant $\hbar = 1$ and the mass of the particle $m = \frac12$. Then the Hamiltonian can be identified with the Dirichlet Laplacian $-\Delta_D^\Omega$ on $L^2(\Omega)$, which is defined for an open set $\Omega \subset \mathbb{R}^3$ as the Friedrichs extension of the free Laplacian defined initially on $C_0^\infty(\Omega)$ – cf. [RS4, Sect. XIII.15] or [Dav, Chap. 6]. The domain of the closure of the corresponding quadratic form is the Sobolev space $W_0^{1,2}(\Omega)$. A natural way to investigate this operator is to pass to the coordinates $(q, u)$, in which it acquires the Laplace-Beltrami form ($G^{ij}G_{jk} := \delta^i_{\ k}$)
$$H := -G^{-1/2}\,\partial_i\,G^{1/2}G^{ij}\,\partial_j \quad\text{on}\quad L^2(\Omega_0, G^{1/2}\,d^2q\,du). \tag{3.8}$$
This coordinate change is nothing else than the unitary transformation $U : L^2(\Omega) \to L^2(\Omega_0, d\Omega) : \{\psi \mapsto U\psi := \psi \circ L\}$, which relates the two operators by $H = U(-\Delta_D^\Omega)U^{-1}$. If $\Sigma$ is not $C^3$-smooth, the operator $H$ has to be understood in the form sense,
$$Q[\psi] := \|H^{1/2}\psi\|_G^2 = (\psi_{,i}, G^{ij}\psi_{,j})_G, \qquad \mathrm{Dom}\,Q = W_0^{1,2}(\Omega_0, d\Omega). \tag{3.9}$$
Here the subscript “G” indicates the norm and the inner product in the Hilbert space of (3.8). Employing the block form (3.5) of $G_{ij}$, we can split $H$ into a sum of two parts, $H = H_1 + H_2$, given by
$$H_1 := -G^{-1/2}\partial_\mu G^{1/2}G^{\mu\nu}\partial_\nu = -\partial_\mu G^{\mu\nu}\partial_\nu - 2F_{,\mu}G^{\mu\nu}\partial_\nu, \tag{3.10}$$
$$H_2 := -G^{-1/2}\partial_3 G^{1/2}\partial_3 = -\partial_3^2 - 2\,\frac{Ku - M}{1 - 2Mu + Ku^2}\,\partial_3, \tag{3.11}$$
where we have introduced $F := \ln G^{1/4}$ and expressed $F_{,3}$ explicitly for $H_2$. At the same time, it is useful to have an alternative form of the Hamiltonian which has the factor $1 - 2Mu + Ku^2$ removed from the weight $G^{1/2}$ of the inner product. It is obtained by another unitary transformation,
$$\hat U : L^2(\Omega_0, d\Omega) \to L^2(\Omega_0, d\Sigma\,du) : \{\psi \mapsto \hat U\psi := (1 - 2Mu + Ku^2)^{1/2}\psi\},$$
which leads to the unitarily equivalent operator $\hat H := \hat U H \hat U^{-1}$. This operator makes sense if we impose a stronger regularity assumption on $\Sigma$, namely that the latter is piecewise $C^4$-smooth (or $C^3$ if $\hat H$ is considered in the form sense). The operator $\hat H$ can be rewritten by means of an effective potential $V$, using $J := \frac12\ln(1 - 2Mu + Ku^2)$, as follows:
$$\hat H = -g^{-1/2}\partial_i\,g^{1/2}G^{ij}\partial_j + V, \qquad V = g^{-1/2}\big(g^{1/2}G^{ij}J_{,j}\big)_{,i} + J_{,i}G^{ij}J_{,j},$$
and again, employing the particular form of $G^{ij}$, the operator $\hat H$ can be split into a sum $\hat H_1 + \hat H_2$. The first operator is defined by the part of $\hat H$ where one sums over the Greek indices, and
$$\hat H_2 = -\partial_3^2 + V_2, \qquad V_2 = \frac{K - M^2}{(1 - 2Mu + Ku^2)^2}.$$
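The transverse part of the effective potential can be checked independently of the longitudinal geometry: with $J = \frac12\ln(1 - 2Mu + Ku^2)$, the $i = j = 3$ part of $V$ is $J_{,33} + J_{,3}^2$, which should coincide with $V_2$, and at $u = 0$ it reduces to $K - M^2 = -\frac14(k_1 - k_2)^2 \le 0$. The sketch below compares the closed form with central finite differences; the sample curvatures and the step size are arbitrary illustrative values.

```python
import math


def v2(u: float, k1: float, k2: float) -> float:
    """Closed form V_2 = (K - M^2) / (1 - 2Mu + Ku^2)^2 of the operator H-hat_2."""
    M = 0.5 * (k1 + k2)
    K = k1 * k2
    return (K - M * M) / (1.0 - 2.0 * M * u + K * u * u) ** 2


def v2_from_j(u: float, k1: float, k2: float, h: float = 1e-4) -> float:
    """J'' + (J')^2 with J = (1/2) ln(1 - 2Mu + Ku^2), by central differences in u."""
    M = 0.5 * (k1 + k2)
    K = k1 * k2

    def J(x: float) -> float:
        return 0.5 * math.log(1.0 - 2.0 * M * x + K * x * x)

    dJ = (J(u + h) - J(u - h)) / (2 * h)
    d2J = (J(u + h) - 2 * J(u) + J(u - h)) / (h * h)
    return d2J + dJ * dJ


k1, k2 = 0.7, -0.2  # sample principal curvatures; any |u| < 1/0.7 keeps the weight positive
for u in (-0.5, 0.0, 0.3):
    print(u, v2(u, k1, k2), v2_from_j(u, k1, k2))
print(v2(0.0, k1, k2), -0.25 * (k1 - k2) ** 2)  # K - M^2 = -(k1 - k2)^2 / 4
```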
To motivate the considerations of the following sections, let us look at this transformed operator from a heuristic point of view. While the operator $\hat H_1 + V_2$ depends on all three coordinates, in thin layers ($a \ll \rho_m$) its leading term depends, up to an error $O(a\rho_m^{-1})$, on the longitudinal coordinates $q$ only. One can estimate the former in the form sense by means of (3.7) and use the fact that $C_\pm = 1 + O(a\rho_m^{-1})$. The transverse coordinate $u$ is isolated in $\hat H_2 - V_2 = -\partial_3^2$, so up to higher-order terms in $a$ the Hamiltonian decouples into a sum of the operators
$$H_q := -g^{-1/2}\partial_\mu\,g^{1/2}g^{\mu\nu}\partial_\nu + K - M^2 \quad\text{and}\quad H_u := -\partial_3^2, \tag{3.12}$$
the first one being the Laplace-Beltrami operator of $\Sigma$ except for the additional potential $K - M^2$, which can be rewritten by means of the principal curvatures as $-\frac14(k_1 - k_2)^2$. This is the attractive interaction mentioned in the introduction. Let us remark that similar Laplace-Beltrami operators penalized by a quadratic function of the curvature lead on compact surfaces to interesting isoperimetric problems [H, HL, EHL, F]. In what follows we shall use the family of eigenfunctions $\{\chi_n\}_{n=1}^\infty$ of the transverse operator $(-\partial_3^2)_D$, which is given by
$$\chi_n(u) := \begin{cases} \sqrt{2/d}\,\cos(\kappa_n u) & \text{if } n \text{ is odd},\\ \sqrt{2/d}\,\sin(\kappa_n u) & \text{if } n \text{ is even}. \end{cases}$$
Here $\kappa_n^2 := (\kappa_1 n)^2$ with $\kappa_1 := \pi/d$ are the corresponding eigenvalues.
4. Essential Spectrum
The essential spectrum of a planar layer ($K, M \equiv 0$) is clearly $[\kappa_1^2, \infty)$. By a bracketing argument [DEK, Sect. 3.1] and using an appropriate Weyl sequence, it is easy to see that the same remains true if $\Omega$ is obtained by a compactly supported deformation of a planar layer. In this section we will prove the inclusion $\sigma_{\rm ess}(-\Delta_D^\Omega) \subseteq [\kappa_1^2, \infty)$ under the assumption that the surface is asymptotically planar in the sense
⟨Σ0⟩ $K, M \to 0$ as $s \to \infty$.
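The threshold $\kappa_1^2$ used throughout this section is the lowest eigenvalue of the transverse operator $(-\partial_3^2)_D$ on $(-a, a)$, $d = 2a$. As a minimal sketch, one can check numerically that the functions $\chi_n$ vanish at $u = \pm a$, are orthonormal, and have Rayleigh quotient $\int|\chi_n'|^2\,du = \kappa_n^2 = (n\pi/d)^2$. The width $d = 1.3$ and the quadrature size are arbitrary choices.

```python
import math


def chi(n: int, u: float, d: float) -> float:
    """Normalized Dirichlet eigenfunction chi_n of -d^2/du^2 on (-d/2, d/2)."""
    kappa = n * math.pi / d
    amp = math.sqrt(2.0 / d)
    return amp * math.cos(kappa * u) if n % 2 == 1 else amp * math.sin(kappa * u)


def dchi(n: int, u: float, d: float) -> float:
    """Derivative of chi_n with respect to u."""
    kappa = n * math.pi / d
    amp = math.sqrt(2.0 / d)
    return -amp * kappa * math.sin(kappa * u) if n % 2 == 1 else amp * kappa * math.cos(kappa * u)


def integrate(f, lo: float, hi: float, n: int = 4000) -> float:
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


d = 1.3
a = d / 2
for n in range(1, 5):
    kinetic = integrate(lambda u: dchi(n, u, d) ** 2, -a, a)
    print(n, kinetic, (n * math.pi / d) ** 2)  # the two columns agree
```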
Theorem 4.1. Suppose ⟨Ω0⟩, ⟨Ω1⟩ and assume that the surface is asymptotically planar, ⟨Σ0⟩. Then
$$\inf\sigma_{\rm ess}(-\Delta_D^\Omega) \ge \kappa_1^2.$$
Proof. We divide the layer into an exterior and an interior part by putting $\Omega_{\rm ext} := L(\Omega_{0,s_0})$ and $\Omega_{\rm int} := \Omega \setminus \Omega_{\rm ext}$, respectively, where $\Omega_{0,s_0} := \Sigma_{0,s_0} \times (-a, a)$, $\Sigma_{0,s_0} := (s_0, \infty) \times S^1$ for some $s_0 > 0$. Imposing the Neumann boundary condition at the common boundary of the two parts, $s = s_0$, we arrive at the decoupled Hamiltonian $H^N = H^N_{\rm int} \oplus H^N_{\rm ext}$. More precisely, it is obtained as the operator associated with the quadratic form $Q^N$ acting as (3.9), however with the domain $\mathrm{Dom}\,Q^N := \mathrm{Dom}\,Q^N_{\rm int} \oplus \mathrm{Dom}\,Q^N_{\rm ext}$, where
$$\mathrm{Dom}\,Q^N_\omega := \{\psi \in W^{1,2}(\Omega_\omega, d\Omega) \mid \psi(\cdot, \pm a) = 0\}, \qquad \omega \in \{{\rm int}, {\rm ext}\}.$$
Since $H \ge H^N$ and the spectrum of $H^N_{\rm int}$ is purely discrete [Dav, Chap. 7], the minimax principle gives the estimate $\inf\sigma_{\rm ess}(H) \ge \inf\sigma_{\rm ess}(H^N_{\rm ext}) \ge \inf\sigma(H^N_{\rm ext})$. Hence it is
sufficient to find a lower bound on $H^N_{\rm ext}$. Now, by virtue of (3.9) and (3.5), we have for all $\psi \in \mathrm{Dom}\,Q^N_{\rm ext}$:
$$\begin{aligned}
Q^N_{\rm ext}[\psi] &\ge \|\psi_{,3}\|^2_{G,{\rm ext}} \ge \inf_{\Omega_{0,s_0}}\{1 - 2Mu + Ku^2\}\,\|\psi_{,3}\|^2_{L^2(\Omega_{0,s_0},\,d\Sigma\,du)}\\
&\ge \Big(1 - \sup_{\Sigma_{0,s_0}}\{2a|M| + a^2|K|\}\Big)\,\kappa_1^2\,\|\psi\|^2_{L^2(\Omega_{0,s_0},\,d\Sigma\,du)}\\
&\ge \frac{1 - \sup_{\Sigma_{0,s_0}}\{2a|M| + a^2|K|\}}{1 + \sup_{\Sigma_{0,s_0}}\{2a|M| + a^2|K|\}}\,\kappa_1^2\,\|\psi\|^2_{G,{\rm ext}} =: \big(1 + \Theta(s_0)\big)\,\kappa_1^2\,\|\psi\|^2_{G,{\rm ext}},
\end{aligned}$$
where $\Theta$ denotes a function which goes to zero as $s_0 \to \infty$ due to ⟨Σ0⟩. The subscript “ext” indicates the restriction of the norm to the exterior part. In the second line we have used $(-\partial_3^2)_D \ge \kappa_1^2$. The claim then easily follows from the fact that $s_0$ can be chosen arbitrarily large. □
Remark. This threshold estimate is sufficient for the subsequent investigation of the discrete spectrum, which is our goal in this paper. In order to show that all energies above $\kappa_1^2$ belong to the spectrum, one has to construct an appropriate Weyl sequence to check the opposite inclusion $\sigma_{\rm ess}(-\Delta_D^\Omega) \supseteq [\kappa_1^2, \infty)$. This can be done under an assumption stronger than ⟨Σ0⟩ which involves derivatives of the Weingarten tensor as well.
5. Discrete Spectrum
The aim of this section is to prove three different conditions sufficient for the Hamiltonian to have a non-empty spectrum below $\kappa_1^2$. Since we have shown that the essential spectrum does not start below this value for the layers built over asymptotically planar surfaces, the conditions immediately yield the existence of curvature-induced bound states. All the proofs here are based on the variational idea of finding a trial function $\Psi$ from the form domain of $H$ such that
$$\tilde Q[\Psi] := Q[\Psi] - \kappa_1^2\|\Psi\|_G^2 < 0.$$
It is convenient to split $Q$ into two parts, $Q = Q_1 + Q_2$, which are associated with $H_1$ and $H_2$ of (3.10) and (3.11), respectively. A powerful method in this situation is to construct a trial function by deforming the transverse-threshold resonance wavefunction separately in the central and tail regions. The idea goes back to Goldstone and Jaffe [GJ], see also [DE, Thm. 2.1], [RB] and [DEK, Sect. 3.2].
Theorem 5.1. Assume ⟨Ω0⟩, ⟨Ω1⟩, ⟨Σ1⟩, and suppose that $\Sigma$ is not planar. If the surface has a non-positive total Gauss curvature, i.e., $\mathcal K \le 0$, then
$$\inf\sigma(-\Delta_D^\Omega) < \kappa_1^2.$$
Proof. We begin the construction of $\Psi$ by considering a radially symmetric function $\psi(s, \vartheta, u) := \varphi(s)\chi_1(u)$, where $\varphi$ is arbitrary for the moment. Employing the explicit form (3.11) of $H_2$ we immediately get
$$Q_2[\psi] - \kappa_1^2\|\psi\|_G^2 = (\varphi, K\varphi)_g, \tag{5.1}$$
while the “longitudinal kinetic part” $Q_1[\psi]$ can be estimated by virtue of (3.7) and (3.4) as
$$Q_1[\psi] \le C_1\int_0^\infty |\dot\varphi(s)|^2\,s\,ds. \tag{5.2}$$
The r.h.s. of this inequality depends on the surface geometry only through the constant $C_1 := (C_+/C_-)^2 C$. To make this integral arbitrarily small we replace $\varphi$ by the family $\{\varphi_\sigma : \sigma \in (0, 1]\}$ of functions which are equal to 1 on a compact set, $s \le s_0$, for some $s_0 > 0$, and outside are given by scaled Macdonald functions [AS, Sect. 9.6]:
$$\varphi_\sigma(s) := \min\left\{1, \frac{K_0(\sigma s)}{K_0(\sigma s_0)}\right\}.$$
Since $K_0$ is strictly decreasing, the corresponding $\psi_\sigma := \varphi_\sigma\chi_1$ will not be smooth at $s = s_0$, but it remains continuous; hence it is an admissible trial function as an element of $\mathrm{Dom}\,Q$. Using the properties of the Macdonald function [AS, Sect. 9.6] and [GR, 5.54], it is now easy to verify that for $\sigma s_0$ small enough
$$\exists C_2 > 0:\quad \int_0^\infty |\dot\varphi_\sigma(s)|^2\,s\,ds < \frac{C_2}{|\ln\sigma s_0|}, \tag{5.3}$$
and therefore $Q_1[\psi_\sigma] \to 0+$ as $\sigma \to 0+$. On the other hand, since we assume ⟨Σ1⟩ and $|\varphi_\sigma| \le 1$ together with $\varphi_\sigma \to 1-$ pointwise as $\sigma \to 0+$, we get by the dominated convergence theorem that (5.1) (after the replacement $\psi \to \psi_\sigma$) converges to $\mathcal K$. Thus, by choosing $\sigma$ small enough, $\tilde Q[\psi_\sigma]$ can be made strictly negative if the total Gauss curvature is strictly negative too. In order to deal with the case $\mathcal K = 0$, in analogy to [GJ] we construct the trial function by a small deformation of $\psi_\sigma$ in the central region. We set $\Psi_{\sigma,\varepsilon} := \psi_\sigma + \varepsilon\Phi$, where $\Phi(q, u) := j(q)\,u\chi_1(u)$ with $j \in C_0^\infty((0, s_0) \times S^1)$. Since $\Phi$ is evidently a function from $\mathrm{Dom}\,Q$ as well, we can write
$$\tilde Q[\Psi_{\sigma,\varepsilon}] = \tilde Q[\psi_\sigma] + 2\varepsilon\,\tilde Q(\Phi, \psi_\sigma) + \varepsilon^2\,\tilde Q[\Phi]. \tag{5.4}$$
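The decay (5.3) can be observed numerically. For $s > s_0$ one has $\dot\varphi_\sigma(s) = -\sigma K_1(\sigma s)/K_0(\sigma s_0)$, and the substitution $t = \sigma s$ gives $\int_0^\infty |\dot\varphi_\sigma|^2 s\,ds = K_0(x)^{-2}\int_x^\infty t\,K_1(t)^2\,dt$ with $x = \sigma s_0$, so the integral depends on $\sigma s_0$ only. The sketch below evaluates $K_0, K_1$ from the standard integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ rather than from a special-function library; this is only a choice to keep the example dependency-free, and the step sizes and cut-offs are illustrative assumptions.

```python
import math


def bessel_k(nu: int, x: float, h: float = 0.02) -> float:
    """Macdonald function K_nu(x), x > 0, via int_0^inf exp(-x cosh t) cosh(nu t) dt.

    The integrand is smooth, even in t and decays double-exponentially, so the
    trapezoid rule is very accurate; the range is cut where x cosh t exceeds 60.
    """
    t_max = math.acosh(max(60.0 / x, 1.0)) + 1.0
    n = int(t_max / h) + 1
    total = 0.5 * math.exp(-x)  # half-weighted endpoint t = 0
    for i in range(1, n + 1):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return h * total


def tail_kinetic(x: float, n: int = 400, t_max: float = 40.0) -> float:
    """int_0^inf |phi_sigma'(s)|^2 s ds as a function of x = sigma * s0.

    Equals int_x^inf t K1(t)^2 dt / K0(x)^2; the t-integral uses the midpoint
    rule in v = ln t (so that t K1^2 dt = (t K1)^2 dv) and is cut at t_max.
    """
    v_lo, v_hi = math.log(x), math.log(t_max)
    h = (v_hi - v_lo) / n
    acc = 0.0
    for i in range(n):
        t = math.exp(v_lo + (i + 0.5) * h)
        acc += (t * bessel_k(1, t)) ** 2
    return h * acc / bessel_k(0, x) ** 2


for x in (1e-1, 1e-2, 1e-3):
    value = tail_kinetic(x)
    print(x, value, value * abs(math.log(x)))
```

In this range the integral decreases as $\sigma s_0 \to 0$ while its product with $|\ln\sigma s_0|$ stays of order one, in line with (5.3).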
An explicit calculation, in which one employs the fact that the scaling acts outside the support of the localization function $j$, yields $\tilde Q(\Phi, \psi_\sigma) = -(j, M)_g$, which can be made nonzero by choosing $j$ supported on a compact set where $M$ does not change sign and does not vanish identically. Let us stress that it is independent of $\sigma$, because $\varphi_\sigma = 1$ on $\mathrm{supp}\,j$; the same is true for $\tilde Q[\Phi]$. Such a compact set surely exists, because $\Sigma$ is supposed not to be a plane and we can take the parameter $s_0$ arbitrarily large. If we now choose the sign of $\varepsilon$ in such a way that the second term on the r.h.s. of (5.4) is negative, then the sum with the last term will also be negative for sufficiently small $\varepsilon$, and we can choose $\sigma$ so small that $\tilde Q[\Psi_{\sigma,\varepsilon}] < 0$, because $\tilde Q[\psi_\sigma] \to \mathcal K = 0$ as $\sigma \to 0+$ here. □
Remarks. (a) The special choice of the Macdonald function $K_0$ for the mollifier $\varphi$ is not indispensable. In analogy to [GJ] or [DE, Thm. 2.1] we need a family of suitable functions scaled exterior to $(0, s_0)$ in such a way that the integral (5.2) tends to zero as $\sigma \to 0+$. However, since this integral contains the extra factor $s$ (the relic of integration in a higher dimension), we have to be more careful about the decay properties. We have adopted for this purpose the mollifier employed in [EV, BCEZ], which is the most natural in a sense, because it employs the Green function kernel of the free 2-dimensional Laplacian at zero
energy. Nevertheless, we would have succeeded equally well if we had chosen for the scaled tail, e.g., a compactly supported function similar to that of the proof of Theorem 6.1.
(b) In the case $\mathcal K = 0$ we have not used the deformation proposed in [DE], $\tilde\Phi := \tilde j\,(H - \kappa_1^2)\psi_\sigma$ with $\tilde j \in C_0^\infty((0, s_0) \times S^1 \times (-a, a))$, because it requires an extra condition on the surface regularity. The analogous condition in the strip case has been forgotten in [DE, Thm. 2.1]. Moreover, the localization function $j$ used here is simpler since it is independent of $u$.
A class of layers to which the above theorem applies is represented by those built over Cartan–Hadamard surfaces, i.e., geodesically complete simply connected non-compact surfaces with non-positive Gauss curvature. In view of the Cartan–Hadamard theorem [Kli, Thm. 6.6.4], each point is a pole, and we can therefore construct infinitely many geodesic polar coordinate systems. Excluding the trivial planar case, the total Gauss curvature is always strictly negative, and so all these layers possess at least one bound state provided they are asymptotically planar, $\mathcal K$ is finite, and the assumptions ⟨Ω0⟩, ⟨Ω1⟩ are satisfied.
Example 1 (Hyperbolic Paraboloid). The simple quadric given in $\mathbb{R}^3$ by the equation $z = x^2 - y^2$ is an asymptotically planar surface with $\mathcal K = -2\pi$.
Example 2 (Monkey Saddle). Take $z = x^3 - 3xy^2$. One can again check that ⟨Σ0⟩ holds true, and the total Gauss curvature now equals $-4\pi$.
A family of layers of the limit case $\mathcal K = 0$ was investigated in [DEK]. We considered there compactly supported deformations of a planar layer, for which the zero value of $\mathcal K$ follows at once from the Gauss–Bonnet theorem. If such a deformed plane contains at least one pole, all the spectral results are trivial consequences of the present Theorems 4.1 and 5.1. On the other hand, the results of [DEK] are more general in the sense that, due to the compact support assumption, the technique works without the requirement of the existence of a pole.
Example 3 (Compactly Perturbed Plane Without Poles). Suppose that a plane with a circular hole is connected via a cylindrical tube perpendicular to it with a pierced sphere. Both interfaces can be made as smooth as needed. If the tube is sufficiently long, there is only one pole $o$ provided the surface has a cylindrical symmetry w.r.t. the axis of the tube; it coincides with the intersection of the axis with the sphere. If we now break the symmetry by taking an ellipsoid instead of the sphere, we destroy the injectivity of the exponential mapping $\exp_o$ without creating new poles.
The Goldstone–Jaffe trick of choosing the ground state of the transverse operator as the generalized annihilator of the shifted energy form $\tilde Q$ has proven its usefulness as a robust argument for demonstrating the existence of bound states. However, in the present context it reaches its limits, because the above proof does not work for layers built over surfaces with positive total curvature, for instance:
Example 4 (Elliptic Paraboloid). The surfaces $z = (x/x_0)^2 + (y/y_0)^2$ with $x_0, y_0 > 0$ are asymptotically planar but $\mathcal K = 2\pi > 0$. They always contain two poles given by their umbilics, which coincide if the surface is a paraboloid of revolution.
On the other hand, due to the heuristic argument based on (3.12), one expects the existence of bound states in any sufficiently thin non-planar layer. This is indeed true. This fact, together with another sufficient condition, is established in the next theorem.
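The total curvatures quoted in Examples 1, 2 and 4 can be reproduced numerically. For a graph $z = f(x, y)$ one has $K\,d\Sigma = (f_{xx}f_{yy} - f_{xy}^2)(1 + |\nabla f|^2)^{-3/2}\,dx\,dy$. The sketch below integrates this density over $\mathbb{R}^2$ in polar coordinates, compactifying the radius by $\rho = t/(1 - t)$. The function name, the grid sizes and the choice $x_0 = 1$, $y_0 = 2$ for Example 4 are illustrative assumptions.

```python
import math


def total_gauss_curvature(fx, fy, fxx, fyy, fxy, n_t=1000, n_theta=64):
    """Total Gauss curvature of the graph z = f(x, y), given its first and second derivatives."""
    dt = 1.0 / n_t
    dth = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_t):
        t = (i + 0.5) * dt              # midpoint rule in t in (0, 1)
        rho = t / (1.0 - t)
        jac = rho / (1.0 - t) ** 2      # rho * d(rho)/dt from polar coordinates
        for j in range(n_theta):
            th = j * dth                # equally spaced (periodic) rule in theta
            x, y = rho * math.cos(th), rho * math.sin(th)
            hess = fxx(x, y) * fyy(x, y) - fxy(x, y) ** 2
            density = hess / (1.0 + fx(x, y) ** 2 + fy(x, y) ** 2) ** 1.5
            total += density * jac
    return total * dt * dth


# Example 1: z = x^2 - y^2.
saddle = total_gauss_curvature(lambda x, y: 2 * x, lambda x, y: -2 * y,
                               lambda x, y: 2.0, lambda x, y: -2.0, lambda x, y: 0.0)
# Example 2: z = x^3 - 3 x y^2.
monkey = total_gauss_curvature(lambda x, y: 3 * x * x - 3 * y * y, lambda x, y: -6 * x * y,
                               lambda x, y: 6 * x, lambda x, y: -6 * x, lambda x, y: -6 * y)
# Example 4 with x0 = 1, y0 = 2: z = x^2 + y^2 / 4.
bowl = total_gauss_curvature(lambda x, y: 2 * x, lambda x, y: y / 2,
                             lambda x, y: 2.0, lambda x, y: 0.5, lambda x, y: 0.0)
print(saddle / math.pi, monkey / math.pi, bowl / math.pi)  # close to -2, -4, 2
```

For the elliptic paraboloid the value $2\pi$ does not depend on $x_0, y_0$, since the Gauss map covers the open upper hemisphere once; changing the two lambdas for $f_y, f_{yy}$ is an easy way to see this.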
Theorem 5.2. Assume ⟨Ω0⟩, ⟨Ω1⟩, and suppose that $\Sigma$ is $C^3$-smooth, non-planar, and obeys in addition
⟨Σ2⟩ $\nabla_g M \in L^2(\Sigma_0, d\Sigma)$.
Then $\inf\sigma(-\Delta_D^\Omega) < \kappa_1^2$ if one of the following two conditions is satisfied:
(a) the layer is sufficiently thin, i.e., $d$ is small enough,
(b) ⟨Σ1⟩ holds and the total mean curvature is infinite, i.e., $\mathcal M = \infty$.
For brevity we have introduced here the non-component notation $\nabla_g$ for the covariant derivative on $\Sigma$.
Proof. We use $\Psi_\sigma(s, \vartheta, u) := (1 + M(s, \vartheta)u)\,\psi_\sigma(s, u)$, where $\psi_\sigma = \varphi_\sigma\chi_1$ is the trial function defined in the first part of the proof of Theorem 5.1. Under the stated regularity assumption, $\Psi_\sigma$ is an admissible trial function, i.e., it belongs to $\mathrm{Dom}\,Q$. Using (3.7) together with Minkowski's inequality and (3.11), we get
$$\begin{aligned}
Q_1[\Psi_\sigma] &\le 2\,(C_+/C_-)^2\left[(1 + a\|M\|_\infty)^2\,\|\dot\varphi_\sigma\|_g^2 + a^2\,\|\varphi_\sigma\nabla_g M\|_g^2\right],\\
Q_2[\Psi_\sigma] - \kappa_1^2\|\Psi_\sigma\|_G^2 &= \big(\varphi_\sigma, (K - M^2)\varphi_\sigma\big)_g + \frac{\pi^2 - 6}{3\kappa_1^2}\,\big(\varphi_\sigma, KM^2\varphi_\sigma\big)_g.
\end{aligned}$$
We start by checking the second sufficient condition. We recall that, due to ⟨Ω1⟩, $K$ and $M$ are uniformly bounded. Thus, thanks to ⟨Σ2⟩ and the hypotheses assumed in (b), it follows that $\tilde Q[\Psi_\sigma] \to -\infty$ as $\sigma \to 0+$.
We now pass to the first sufficient condition. Since $K - M^2$ is non-positive – cf. (3.12) – and continuous, and the surface is supposed to be non-planar, the first term on the r.h.s. of the second line is strictly negative, say $-c^2$, for a sufficiently large value of $s_0$ (the radius of the disc where $\psi_\sigma = \chi_1$). On the other hand, $\|\dot\varphi_\sigma\|_g$ is estimated by (5.3), so we can choose $\sigma$ so small that the corresponding term is less than $c^2/3$. Now we choose the layer half-width $a$ so small that the sum of the remaining terms of the estimated $\tilde Q[\Psi_\sigma]$ is less than $c^2/3$ as well. For this we recall that $\kappa_1^2$ is proportional to $a^{-2}$. Hence $\tilde Q[\Psi_\sigma] \le -c^2/3 < 0$ for $\sigma, d$ small enough. □
Remark. In order to obtain the first sufficient condition, one can replace ⟨Σ2⟩ by an assumption on the boundedness of $\nabla_g M$. Moreover, if we had used the compactly supported function $\varphi_n$ from the proof of Theorem 6.1 below instead of $\varphi_\sigma$, it would have been sufficient to assume that $\nabla_g M$ is only locally bounded, which is exactly the situation when $\Sigma$ is of class $C^3$. This is why ⟨Σ2⟩ is not included in the thin layer case of Theorem 2.1. We believe that the hypothesis ⟨Σ2⟩ is technical – cf. Example 6. Even with it, however, the class of layers possessing bound states without any restriction on the layer thickness other than ⟨Ω1⟩ is extended significantly. For instance, it is an easy exercise to verify that all the conditions of Theorem 5.2 (b) are fulfilled for the elliptic paraboloids and many other surfaces with a positive total Gauss curvature. Removing this technical condition is still an open question, except for layers endowed with cylindrical symmetry, which we shall discuss below.
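The transverse integration behind the identity for $Q_2[\Psi_\sigma] - \kappa_1^2\|\Psi_\sigma\|_G^2$ in the proof of Theorem 5.2 can be checked pointwise on $\Sigma$. For fixed values of $K$ and $M$, the sketch below computes $\int_{-a}^a \big(|f'|^2 - \kappa_1^2 f^2\big)(1 - 2Mu + Ku^2)\,du$ for $f = (1 + Mu)\chi_1$ by the midpoint rule and compares it with the closed form $K - M^2 + 4m_2KM^2$, where $m_2 := \int_{-a}^a u^2\chi_1^2\,du = \frac{\pi^2 - 6}{12\kappa_1^2}$. The sample triples $(K, M, a)$ are arbitrary choices with $K \le M^2$ that respect ⟨Ω1⟩.

```python
import math


def transverse_form(K: float, M: float, a: float, n: int = 20000) -> float:
    """int (|f'|^2 - kappa_1^2 f^2)(1 - 2Mu + Ku^2) du over (-a, a), f = (1 + Mu) chi_1."""
    d = 2.0 * a
    kappa = math.pi / d
    amp = math.sqrt(2.0 / d)
    h = d / n
    acc = 0.0
    for i in range(n):
        u = -a + (i + 0.5) * h                # midpoint rule
        chi = amp * math.cos(kappa * u)
        dchi = -amp * kappa * math.sin(kappa * u)
        f = (1.0 + M * u) * chi
        df = M * chi + (1.0 + M * u) * dchi
        weight = 1.0 - 2.0 * M * u + K * u * u
        acc += (df * df - kappa * kappa * f * f) * weight
    return acc * h


def second_moment(a: float, n: int = 20000) -> float:
    """m_2 = int u^2 chi_1(u)^2 du over (-a, a)."""
    d = 2.0 * a
    kappa = math.pi / d
    h = d / n
    return h * sum((u := -a + (i + 0.5) * h) ** 2 * (2.0 / d) * math.cos(kappa * u) ** 2
                   for i in range(n))


def predicted(K: float, M: float, a: float) -> float:
    """Closed form K - M^2 + 4 m_2 K M^2 with m_2 = (pi^2 - 6) / (12 kappa_1^2)."""
    kappa = math.pi / (2.0 * a)
    m2 = (math.pi ** 2 - 6.0) / (12.0 * kappa ** 2)
    return K - M * M + 4.0 * m2 * K * M * M


for K, M, a in ((0.2, 0.5, 0.4), (-0.3, 0.6, 0.3), (0.0, 0.8, 0.5)):
    print(K, M, a, transverse_form(K, M, a), predicted(K, M, a))
```

The factor $4m_2 = \frac{\pi^2 - 6}{3\kappa_1^2}$ is the same one that multiplies $(\phi_n, K\phi_n)_g$ in the proof of Theorem 6.1; the check uses Python 3.8+ (assignment expression).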
6. Cylindrically Symmetric Layers
Consider now layers which are invariant w.r.t. rotations around a fixed axis in $\mathbb{R}^3$. We may thus suppose that $\Sigma$ is a surface of revolution parametrized by $p : \Sigma_0 \to \mathbb{R}^3$,
$$p(s, \vartheta) := \big(r(s)\cos\vartheta,\ r(s)\sin\vartheta,\ z(s)\big), \qquad\text{where}\quad r, z \in C^2((0, \infty)),\ r > 0.$$
It will be the geodesic polar coordinate chart if we impose the following condition on the canonical parametrization,
$$\dot r^2 + \dot z^2 = 1; \qquad\text{then also}\quad \dot r\ddot r + \dot z\ddot z = 0. \tag{6.1}$$
An explicit calculation yields the diagonal form of the Weingarten tensor, $(h_\mu^{\ \nu}) = \mathrm{diag}(k_s, k_\vartheta)$, with the principal curvatures $k_s = \dot r\ddot z - \ddot r\dot z$ and $k_\vartheta = \dot z\,r^{-1}$. In fact, it is sufficient to know the function $s \mapsto k_s(s)$ only, since $r, z$ can be constructed from the relations
$$r(s) = \int_0^s \cos b(\xi)\,d\xi, \qquad z(s) = \int_0^s \sin b(\xi)\,d\xi, \qquad\text{with}\quad b(s) := \int_0^s k_s(\xi)\,d\xi. \tag{6.2}$$
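Relations (6.2) are straightforward to implement. The sketch below rebuilds the meridian from a given $k_s$ with the trapezoid rule and, as a check, recovers the sphere of radius $R$ from $k_s \equiv 1/R$, for which $r(s) = R\sin(s/R)$, $z(s) = R(1 - \cos(s/R))$ and $k_\vartheta = \dot z/r = 1/R$. It also evaluates $2\pi\int_0^S K r\,ds$: since $Kr = k_s k_\vartheta r = k_s\sin b$, this equals $2\pi(1 - \cos b(S))$, the finite-$S$ form of the Gauss–Bonnet relation used below. The profile $k_s(s) = 0.8\,e^{-s}$ and the function name are hypothetical examples.

```python
import math


def meridian(ks, s_max: float, n: int = 20000):
    """Grids s, b, r, z built from k_s via (6.2) (trapezoid rule), plus 2*pi*int_0^s_max K r ds."""
    h = s_max / n
    s = [i * h for i in range(n + 1)]
    b, r, z = [0.0], [0.0], [0.0]
    for i in range(n):
        b.append(b[-1] + 0.5 * h * (ks(s[i]) + ks(s[i + 1])))
    for i in range(n):
        r.append(r[-1] + 0.5 * h * (math.cos(b[i]) + math.cos(b[i + 1])))
        z.append(z[-1] + 0.5 * h * (math.sin(b[i]) + math.sin(b[i + 1])))
    # K r = k_s * k_theta * r = k_s * sin(b), because k_theta = z'/r and z' = sin(b).
    kr = [ks(s[i]) * math.sin(b[i]) for i in range(n + 1)]
    total_K = 2.0 * math.pi * h * (sum(kr) - 0.5 * (kr[0] + kr[-1]))
    return s, b, r, z, total_K


# Sphere of radius R = 2 (k_s = 1/R), followed up to s = 2 < pi R / 2.
s, b, r, z, _ = meridian(lambda t: 0.5, 2.0)
print(r[-1], 2.0 * math.sin(1.0), math.sin(b[-1]) / r[-1])  # k_theta = 1/R = 0.5

# Hypothetical profile k_s(s) = 0.8 exp(-s): b(inf) = 0.8.
s, b, r, z, total = meridian(lambda t: 0.8 * math.exp(-t), 30.0)
print(total, 2.0 * math.pi * (1.0 - math.cos(b[-1])))
```

For the second profile $\dot r(\infty) = \cos b(\infty) = \cos 0.8$, and the computed total curvature matches $2\pi(1 - \dot r(\infty))$.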
Recall that by Theorem 5.1 the spectrum bottom of any layer is strictly less than the first transverse eigenvalue provided $\mathcal K \le 0$. However, only the case $\mathcal K = 0$ is relevant to the present situation of surfaces of revolution, because by the Gauss–Bonnet theorem (see also (3.2))
$$\mathcal K + 2\pi\,\dot r(\infty) = 2\pi, \qquad\text{where}\quad \dot r(\infty) := \lim_{s\to\infty}\dot r(s), \tag{6.3}$$
and $\dot r(\infty) > 1$ is not allowed because of (6.1). Notice, on the other hand, that $\dot r(\infty)$ always exists, since the existence of the total Gauss curvature is supposed. Moreover, the positivity of $r$ requires $\mathcal K \le 2\pi$. The goal of this section is to show that in the present special case of symmetric layers $\inf\sigma(-\Delta_D^\Omega) < \kappa_1^2$ holds true also for all admissible strictly positive values of $\mathcal K$, irrespective of the layer thickness. Our argument requires us to exclude the extreme case $\mathcal K = 0$, for which the result is already known without any symmetry assumption. Hereafter we will therefore assume that $0 \le \dot r(\infty) < 1$. It follows that there exist $0 < \delta_+ < \frac12$ and $s_0 > 0$ such that for all $s \ge s_0$ one has $-\delta_+ \le \dot r(s) \le 1 - \delta_+$. Using now the explicit dependence of $k_\vartheta$ on $r$, $\dot z$ and (6.1), we obtain the essential ingredients of our strategy:
Lemma 6.1. Assume $\mathcal K > 0$. There exist $\delta > 0$ and $s_0 > 0$ such that
$$\forall s \ge s_0:\quad \frac{\delta}{r(s)} \le |k_\vartheta(s)| \le \frac{1}{r(s)},$$
and $k_\vartheta(s)$ does not change sign.
In particular, employing (3.4), it follows that $k_\vartheta$ is not integrable in $L^1(\mathbb{R}_+)$. On the other hand, the meridian curvature $k_s$ is integrable under the assumption ⟨Σ1⟩, which is seen from the regularity properties imposed on $p$ and the following estimate:
$$\infty > \int_0^\infty |K(s)|\,r(s)\,ds \ge \int_{s_0}^\infty |k_s(s)k_\vartheta(s)|\,r(s)\,ds \ge \delta\int_{s_0}^\infty |k_s(s)|\,ds.$$
This is the essence of what we are going to use in our method. Even if $M$ decays at infinity, it is not negligible there in the integral sense. However, $K$ is supposed to be integrable, which will enable us to eliminate the unpleasant contribution of the corresponding total curvature – cf. (5.1) – by going to large distances by means of a family of trial functions supported there.
Theorem 6.1. Assume ⟨Ω0⟩, ⟨Ω1⟩, ⟨Σ1⟩, and suppose that $\Sigma$ is a surface of revolution. Then $\inf\sigma(-\Delta_D^\Omega) < \kappa_1^2$.
Proof. Since the result for $\mathcal K = 0$ is included in Theorem 5.1, we suppose $\mathcal K > 0$ in the following. We use $\Psi_{n,\varepsilon}(s, u) := (\varphi_n(s) + \varepsilon\phi_n(s)u)\chi_1(u)$, where $\varepsilon$ will be specified later and $\varphi_n, \phi_n$ are functions “localized at infinity” as $n \to \infty$. They are defined in the following way: consider three sequences $b_1, b_2, b_3 : \mathbb{N} \to \mathbb{N}$ such that $0 < b_1 < b_2 < b_3$ and $b_1(n) \to \infty$ as $n \to \infty$. We set
$$\varphi_n(s) := \frac{\ln(s/b_i)}{\ln(b_j/b_i)} \quad\text{and}\quad \phi_n(s) := \frac{\varphi_n(s)}{s} \qquad\text{if}\ \min\{b_i, b_j\} < s \le \max\{b_i, b_j\},\ \ (i, j) \in \{(1, 2), (3, 2)\},$$
and assume that $\varphi_n, \phi_n$ are zero elsewhere. Defined in this way, the functions are not smooth at the matching points; however, $\Psi_{n,\varepsilon}$ still belongs to $\mathrm{Dom}\,Q$, because they are continuous and of compact support for each $n \in \mathbb{N}$. Next we note that they are non-negative and uniformly bounded (the maximum of $\phi_n$ even decreases as $n \to \infty$). Using (3.7) and (3.4) we can estimate the longitudinal kinetic parts of $\tilde Q$ – cf. also (5.2) – by one-dimensional integrals,
$$Q_1[\varphi_n\chi_1] \le C_1\int_0^\infty \dot\varphi_n(s)^2\,s\,ds, \qquad Q_1[\phi_n u\chi_1] \le C_1\,\frac{d^2}{2}\int_0^\infty \dot\phi_n(s)^2\,s\,ds,$$
and an explicit calculation shows that both converge to zero as $n \to \infty$ if we demand, in addition, that $b_2/b_1$ and $b_3/b_2$ tend to infinity as $n \to \infty$. The same is true for the mixed term $Q_1(\varphi_n\chi_1, \phi_n u\chi_1)$ by the Schwarz inequality. On the other hand, an explicit integration w.r.t. $u$ for the rest of $\tilde Q$ yields
$$Q_2[\Psi_{n,\varepsilon}] - \kappa_1^2\|\Psi_{n,\varepsilon}\|_G^2 = (\varphi_n, K\varphi_n)_g - 2\varepsilon(\varphi_n, M\phi_n)_g + \varepsilon^2\left[\|\phi_n\|_g^2 + \frac{\pi^2 - 6}{3\kappa_1^2}(\phi_n, K\phi_n)_g\right].$$
For large $n$ the contribution of the Gauss curvature will be negligible because of ⟨Σ1⟩ and the facts that $\varphi_n$ and $\phi_n$ are uniformly bounded and the infimum of their support tends to infinity as $n \to \infty$. Summing up the results, we arrive at
$$\lim_{n\to\infty}\tilde Q[\Psi_{n,\varepsilon}] = \lim_{n\to\infty}\left(\varepsilon^2\|\phi_n\|_g^2 - 2\varepsilon(\varphi_n, M\phi_n)_g\right) \tag{6.4}$$
if the limit on the r.h.s. exists. We put $\varepsilon \equiv \varepsilon_n := (\varphi_n, M\phi_n)_g^{-1}$, which will be seen in a moment to be a reasonable choice, because the integral tends to infinity as $n \to \infty$ for particular choices of $b_j$; $\varepsilon_n$ is thus well defined for $n$ large enough. With this choice the r.h.s. of (6.4) equals $-2$ plus the limit
$$\lim_{n\to\infty}\frac{(\phi_n, \phi_n)_g}{(\varphi_n, M\phi_n)_g^2},$$
so the problem reduces to showing that this limit is less than 2.
In the special case of cylindrically symmetric surfaces, when one has information about the explicit behaviour of $M$ at infinity, this is an easy matter. Indeed, since $k_s$ is integrable in $L^1(\mathbb{R}_+)$ and $\phi_n$ is chosen so as to eliminate the weight $r$ with the help of (3.4), the meridian curvature does not contribute to the denominator, while in view of Lemma 6.1, $k_\vartheta r$ can be replaced by a constant value near infinity. Using in addition (3.4) in the numerator, one is therefore seeking the zero limit of
$$\frac{\int_0^\infty \phi_n(s)^2\,s\,ds}{\left(\int_0^\infty \varphi_n(s)\phi_n(s)\,ds\right)^2} = \frac{1}{\int_0^\infty \phi_n(s)^2\,s\,ds} = \frac{3}{\ln(b_3/b_1)}.$$
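The two one-dimensional integrals that drive this part of the proof are elementary for the logarithmic ramps $\varphi_n$. In the variable $v = \ln s$ the ramp is piecewise linear, so $\int_0^\infty \dot\varphi_n^2\,s\,ds = 1/\ln(b_2/b_1) + 1/\ln(b_3/b_2)$ and $\int_0^\infty \phi_n^2\,s\,ds = \int \varphi_n^2\,dv = \ln(b_3/b_1)/3$. The sketch below confirms these values numerically for geometric sequences $b_j = q^j$ (an illustrative choice); the helper names are hypothetical.

```python
import math


def ramp(s: float, b1: float, b2: float, b3: float) -> float:
    """The function varphi_n of the proof of Theorem 6.1 (zero outside (b1, b3])."""
    if b1 < s <= b2:
        return math.log(s / b1) / math.log(b2 / b1)
    if b2 < s <= b3:
        return math.log(s / b3) / math.log(b2 / b3)
    return 0.0


def cutoff_integrals(b1: float, b2: float, b3: float, n: int = 20000):
    """Return (int varphi_n'^2 s ds, int phi_n^2 s ds) with phi_n = varphi_n / s.

    With v = ln s: varphi'^2 s ds = (d varphi/dv)^2 dv and phi_n^2 s ds = varphi^2 dv.
    For b2 = sqrt(b1 b3) and even n the kink at v = ln b2 is a grid node, so the
    difference quotients below are exact on every cell.
    """
    v_lo, v_hi = math.log(b1), math.log(b3)
    h = (v_hi - v_lo) / n
    kinetic = 0.0
    mass = 0.0
    for i in range(n):
        left = ramp(math.exp(v_lo + i * h), b1, b2, b3)
        right = ramp(math.exp(v_lo + (i + 1) * h), b1, b2, b3)
        kinetic += ((right - left) / h) ** 2 * h
        mass += ramp(math.exp(v_lo + (i + 0.5) * h), b1, b2, b3) ** 2 * h  # midpoint rule
    return kinetic, mass


for q in (10, 100, 1000):
    kinetic, mass = cutoff_integrals(float(q), float(q ** 2), float(q ** 3))
    print(q, kinetic, 2.0 / math.log(q), 1.0 / mass, 3.0 / (2.0 * math.log(q)))
```

Both the kinetic integral and the ratio $3/\ln(b_3/b_1)$ decay like $1/\ln q$, which is why the kinetic contributions vanish and the ratio in (6.4) tends to zero.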
One can choose, for instance, ∀n ≥ 2: b1 (n) := n, b2 (n) := n2 , b3 (n) := n3 , which fulfill also the other properties earlier required about these sequences. We conclude ˜ n,ε ] → −2 as n → ∞ so we can find a finite n0 for which the form will be by Q[6 negative. & ' Remark. Notice that (6.4) is a general result. We have not supposed anything of the surface symmetry when deriving this relation. Example 5 (Hyperboloid of Revolution). Consider one of the two sheets of the hyperboloid given by the equation x 2 + y 2 − (z/z0 )2 = −1. It is an asymptotically planar surface of revolution and via the parameter z0 > 0 we can get arbitrary value of the total Gauss curvature between 0 and 2π . Example 6 (Surface with Non Square Integrable ∇g M). Let us construct an asymptotically planar surface of revolution which satisfies 1 but contradicts 2. We define ks (s) := s −2 sin s 2 and use (6.2) to get the functions r, z and in this way the map p. One can easily check that there is a c such that r(s) ≥ cs for all s ∈ R+ . Therefore kϑ = z˙ r −1 → 0 as s → ∞ because |˙z| = | sin b(s)| ≤ 1; the same limit holds, of course, for ks . Since K, M are expressed by means of the principal curvatures, it follows that the surface is asymptotically planar 0. At the same time, |K|r = |ks z˙ | ≤ |ks | is integrable in L1 (R+ ) which gives 1. On the other hand, while it is true that k˙ϑ = ks r −1 cos b − r −2 sin b cos b belongs to L2 (R+ , r(s)ds), the same does not hold ˙ 0) does not fulfill 2. We note that an for k˙s by its definition. Hence, ∇g M = (M,
explicit calculation together with (6.3) yields $\mathcal{K} = 2\pi\bigl(1 - \cos\sqrt{\pi/2}\bigr) \approx 1.38\,\pi$ in this example.
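To make the numbers in Examples 5 and 6 concrete, here is a minimal Python sketch. It assumes the Gauss–Bonnet form $\mathcal{K} = 2\pi(1 - \cos b(\infty))$ for a surface of revolution whose meridian leaves the pole horizontally, with $\dot r = \cos b$, $\dot z = \sin b$ and $\dot b = k_s$ (consistent with $|\dot z| = |\sin b|$ above, but not quoted from (6.2), (6.3)); all function names are hypothetical. For the hyperboloid the meridian slope tends to $z_0$, so $b(\infty) = \arctan z_0$; for Example 6, $b(\infty) = \int_0^\infty s^{-2}\sin s^2\,ds$, a Fresnel-type integral equal to $\sqrt{\pi/2}$.

```python
import math

def total_curvature(b_inf: float) -> float:
    """Total Gauss curvature 2*pi*(1 - cos b(inf)) of a surface of revolution
    whose meridian starts horizontally at the pole (assumed Gauss-Bonnet form)."""
    return 2 * math.pi * (1 - math.cos(b_inf))

def hyperboloid_curvature(z0: float) -> float:
    """Example 5: upper sheet z = z0*sqrt(1 + r^2); the slope dz/dr tends to z0."""
    return total_curvature(math.atan(z0))

def k_s(s: float) -> float:
    """Example 6: meridian curvature sin(s^2)/s^2, with its limit 1 at s = 0."""
    return 1.0 if s == 0.0 else math.sin(s * s) / (s * s)

def simpson(f, a: float, b: float, n: int) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * f(a + i * h)
    return acc * h / 3

# b(inf) = int_0^inf k_s(s) ds; the tail beyond s = 20 is at most 20**-3 in size.
b_inf = simpson(k_s, 0.0, 20.0, 200_000)

for z0 in (0.1, 1.0, 10.0):
    print(z0, hyperboloid_curvature(z0) / math.pi)   # ratios lie in (0, 2)
print(b_inf, math.sqrt(math.pi / 2))
print(total_curvature(b_inf) / math.pi)               # roughly 1.376
```

Sweeping $z_0$ from 0 to ∞ moves $2\pi\bigl(1 - (1+z_0^2)^{-1/2}\bigr)$ monotonically from 0 to 2π, in line with the claim of Example 5.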
Remark (Partial Wave Decomposition). An alternative approach to the spectral properties of layers with cylindrical symmetry is to decompose $-\Delta_D$ with respect to the angular-momentum subspaces. The resulting partial-wave Hamiltonians have a form similar to the pure strip Hamiltonian (cf. [EŠ, DE]), except for an additional centrifugal term and a different operator domain for the lowest wave. This, however, makes the spectral analysis of layers more complicated than a direct use of the non-decomposed Hamiltonian H. At the same time, it gives insight into the choice of the trial function in the proof of Theorem 6.1, which has to be supported in the region where the influence of the centrifugal term is negligible.

7. A Layer Without Bound States

Consider a semi-cylinder of radius R closed by a hemisphere; the total Gauss curvature is 2π. Since the mean curvature of the cylindrical part is constant, $M = (2R)^{-1} > 0$, such
a surface is not asymptotically planar. We shall demonstrate that the Hamiltonian $H := -\Delta_D$ of the corresponding layer built over this surface does not possess bound states for any a < R. Imposing the Neumann or Dirichlet boundary condition on the segment connecting the hemispherical and the cylindrical layer, we get the bounds
$$H^N_{sph} \oplus H^N_{cyl} \le H \le H^D_{sph} \oplus H^D_{cyl}.$$
The spectrum of the hemispherical-segment Hamiltonians is purely discrete. By the minimax principle only the cylindrical part of the estimating operators contributes to the essential spectrum, while a possible eigenvalue of H below the essential spectrum is squeezed between the corresponding eigenvalues of $H^N_{sph}$ and $H^D_{sph}$. In particular, for our purpose it is sufficient to show that $\inf\sigma(H^N_{sph}) > \inf\sigma_{ess}(H^D_{cyl})$. The spectral analysis of these operators becomes trivial if they are expressed in spherical or cylindrical coordinates, respectively.

Due to mirror symmetry, the ground-state energy of $H^N_{sph}$ is the same as the lowest eigenvalue of the entire spherical layer, which is $\kappa_1^2$. On the other hand, $\sigma(H^j_{cyl}) = \sigma_{ess}(H^j_{cyl}) = [5_1, \infty)$ for both conditions $j \in \{N, D\}$, where the threshold $5_1$ is given by the first eigenvalue of the Dirichlet radial operator $-\partial_r^2 - (4r^2)^{-1}$ on $L^2((R-a, R+a))$. Since the latter is less than $-\partial_r^2 - (4(R+a)^2)^{-1}$, whose lowest Dirichlet eigenvalue is $\kappa_1^2 - (4(R+a)^2)^{-1}$, the Rayleigh principle yields $5_1 < \kappa_1^2$. It is now easy to conclude that the spectrum of the unified layer satisfies
$$\sigma(H) = \sigma_{ess}(H) = [5_1, \infty). \tag{7.1}$$
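The inequality between the cylindrical threshold and $\kappa_1^2$ can also be checked numerically. The sketch below is an illustration, not part of the proof: it discretizes the Dirichlet radial operator $-\partial_r^2 - (4r^2)^{-1}$ on $(R-a, R+a)$ by second-order finite differences and locates the lowest eigenvalue by Sturm-sequence bisection (a standard technique swapped in here). It assumes $\kappa_1 = \pi/(2a)$, the lowest transverse Dirichlet eigenvalue for a layer of width 2a; the values R = 1, a = 0.5 and the function names are illustrative choices.

```python
import math

def lowest_eigenvalue(diag, off, tol=1e-10):
    """Smallest eigenvalue of a symmetric tridiagonal matrix with main diagonal
    `diag` and constant off-diagonal entry `off`, via Sturm-sequence bisection."""
    def count_below(x):
        # Number of negative pivots in the LDL^T factorization of (T - x I),
        # which equals the number of eigenvalues below x (Sylvester's law of inertia).
        count, d = 0, 1.0
        for i, a in enumerate(diag):
            d = (a - x) - (off * off / d if i > 0 else 0.0)
            if d == 0.0:
                d = 1e-300
            if d < 0.0:
                count += 1
        return count

    lo = min(diag) - 2 * abs(off)   # Gershgorin bounds enclose the spectrum
    hi = max(diag) + 2 * abs(off)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def radial_threshold(R, a, n=1000):
    """Finite-difference approximation of the lowest Dirichlet eigenvalue of
    -d^2/dr^2 - 1/(4 r^2) on (R - a, R + a), using n interior grid points."""
    h = 2 * a / (n + 1)
    radii = [R - a + (i + 1) * h for i in range(n)]
    diag = [2 / h**2 - 1 / (4 * r * r) for r in radii]
    return lowest_eigenvalue(diag, -1 / h**2)

R, a = 1.0, 0.5
threshold = radial_threshold(R, a)
kappa1_sq = (math.pi / (2 * a)) ** 2
print(threshold, kappa1_sq)   # the threshold lies strictly below kappa_1^2
```

Because $-(4r^2)^{-1} \le -(4(R+a)^2)^{-1}$ holds entry by entry on the diagonal, the discrete eigenvalue obeys the same strict bound $\kappa_1^2 - (4(R+a)^2)^{-1}$ as the continuous one, so the comparison does not hinge on the grid size.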
Remark. The above example shows that without condition 0, or at least without M → 0 at infinity, one cannot guarantee the existence of bound states. Notice that the reference surface is not $C^2$-smooth in this counterexample, and thus it does not belong to the class of manifolds considered from the beginning. Nevertheless, one can construct a sequence of domains which converge in an appropriate sense to the hemispherical layer and which, at the same time, are connected to the cylindrical part in a sufficiently smooth way. It then follows from [RT, Thm. 1.5] that the spectral result (7.1) remains valid for the domains close to the limiting layer.

Acknowledgement. The authors would like to thank Mark S. Ashbaugh for private communications, and Wolfgang T. Meyer, who suggested Example 3. The work was done during the visits of P. E. and D. K. to the Centre de Physique Théorique, Marseille-Luminy, and of P. D. to the Nuclear Physics Institute, AS CR; the authors express their gratitude to the hosts. The work has been partially supported by the Grant AS A 1048101 and the CAS-CNRS Exchange Agreement 7919.
References

[AS] Abramowitz, M.S. and Stegun, I.A., eds.: Handbook of mathematical functions. New York: Dover, 1965
[BCEZ] Bentosela, F., Cavalcanti, R.M., Exner, P. and Zagrebnov, V.A.: Anomalous electron trapping by localized magnetic fields. J. Phys. A 32, 3029–3039 (1999)
[BEGK] Borisov, D., Exner, P., Gadyl'shin, R. and Krejčiřík, D.: Bound states in weakly deformed strips and layers. Ann. H. Poincaré 2, 553–572 (2001)
[BGRS] Bulla, W., Gesztesy, F., Renger, W. and Simon, B.: Weakly coupled bound states in quantum waveguides. Proc. Am. Math. Soc. 127, 1487–1495 (1997)
[dC1] da Costa, R.C.T.: Quantum mechanics of a constrained particle. Phys. Rev. 23, 1982–1987 (1981)
[dC2] da Costa, R.C.T.: Constraints in quantum mechanics. Phys. Rev. A 25, 2893–2900 (1982)
[Dav] Davies, E.B.: Spectral theory and differential operators. Cambridge: Camb. Univ. Press, 1995
[DE] Duclos, P. and Exner, P.: Curvature-induced bound states in quantum waveguides in two and three dimensions. Rev. Math. Phys. 7, 73–102 (1995)
[DEK] Duclos, P., Exner, P. and Krejčiřík, D.: Locally curved quantum layers. Ukrainian J. Phys. 45, 595–601 (2000)
[EHL] Exner, P., Harrell, E.M. and Loss, M.: Optimal eigenvalues for some Laplacians and Schrödinger operators depending on curvature. In: Proceedings of QMath7 (Prague 1998), Oper. Theory Adv. Appl. Vol. 108, Basel–Boston: Birkhäuser, 1999, pp. 47–58
[EŠ] Exner, P. and Šeba, P.: Bound states in curved quantum waveguides. J. Math. Phys. 30, 2574–2580 (1989)
[EV] Exner, P. and Vugalter, S.A.: Asymptotic estimates for bound states in quantum waveguides coupled laterally through a narrow window. Ann. Inst. H. Poincaré 65, 109–123 (1996)
[F] Freitas, P.: On minimal eigenvalues of Schrödinger operators on manifolds. Commun. Math. Phys. 217, 375–382 (2001)
[FH] Froese, R. and Herbst, I.: Realizing holonomic constraints in classical and quantum mechanics. Commun. Math. Phys. 220, 489–535 (2001)
[GJ] Goldstone, J. and Jaffe, R.L.: Bound states in twisting tubes. Phys. Rev. B 45, 14100–14107 (1992)
[GR] Gradshtein, I.S. and Ryzhik, I.M.: Table of integrals, series and products. New York: Academic Press, 1980
[GM] Gromoll, D. and Meyer, W.: On complete open manifolds of positive curvature. Ann. of Math. 90, 75–90 (1969)
[H] Harrell, E.M.: On the second eigenvalue of the Laplace operator penalized by curvature. J. Differ. Geom. and Appl. 6, 397–400 (1996)
[HL] Harrell, E.M. and Loss, M.: On the Laplace operator penalized by mean curvature. Commun. Math. Phys. 195, 645–650 (1998)
[JK] Jensen, H. and Koppe, H.: Quantum mechanics with constraints. Ann. Phys. 63, 586–591 (1971)
[Kli] Klingenberg, W.: A course in differential geometry. New York: Springer-Verlag, 1978
[LCM] Londergan, J.T., Carini, J.P. and Murdock, D.P.: Binding and scattering in two-dimensional systems. LNP Vol. 60, Berlin: Springer, 1999
[M] Mitchell, K.A.: Gauge fields and extrapotentials in constrained quantum systems. Phys. Rev. A 63, art. 042112 (2001)
[RT] Rauch, J. and Taylor, M.: Potential and scattering theory on wildly perturbed domains. J. Funct. Anal. 18, 27–59 (1975)
[RS4] Reed, M. and Simon, B.: Methods of modern mathematical physics, IV. Analysis of operators. New York: Academic Press, 1978
[RB] Renger, W. and Bulla, W.: Existence of bound states in quantum waveguides under weak conditions. Lett. Math. Phys. 35, 1–12 (1995)
[Sp3] Spivak, M.: A comprehensive introduction to differential geometry. Vol. III, Berkeley, CA: Publish or Perish, 1975
[T] Tolar, J.: On a quantum mechanical d'Alembert principle. In: Group theoretical methods in physics, LNP, Vol. 313, Berlin–Heidelberg–New York: Springer, 1988, pp. 268–274
Communicated by B. Simon
Commun. Math. Phys. 223, 29–46 (2001)
Communications in Mathematical Physics
© Springer-Verlag 2001

Generalized q-Hermite Polynomials

Christian Berg¹, Andreas Ruffing²

¹ Department of Mathematics, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark. E-mail: [email protected]
² Zentrum Mathematik, Technische Universität München, Arcisstrasse 21, 80333 München, Germany. E-mail: [email protected]

Received: 6 December 1999 / Accepted: 21 May 2001
Abstract: We consider two operators A and $A^+$ in a Hilbert space of functions on the exponential lattice $\{q^n, -q^n \mid n \in \mathbb{Z}\}$, where 0 < q < 1. The operators are formal adjoints of each other and depend on a real parameter γ < 1/2. We show how these operators lead to an essentially unique symmetric ground state $\psi_0$ and that A and $A^+$ are ladder operators for the sequence $\psi_n = (A^+)^n\psi_0$. The sequence $(\psi_n/\psi_0)$ is shown to be a family of orthogonal polynomials, which we identify as symmetrized q-Laguerre polynomials. We obtain in this way a new proof of the orthogonality of these polynomials. When γ = 0 the polynomials are the discrete q-Hermite polynomials of type II, studied in several papers on q-quantum mechanics.
1. Introduction

It is a well-established fact that there are deep connections between the theory of orthogonal polynomials on the one hand and properties of Schrödinger operators on the other. Those operators are assumed to act in conventionally used Hilbert spaces such as $L^2(\mathbb{R}^n)$. A prominent example of these connections is given by the classical Hermite polynomials, which correspond to Schrödinger operators with a quadratic potential. In the one-dimensional case these polynomials are orthogonal with respect to a normal distribution, whose support is the whole real line. When dealing with a discretization of this support, one meets a new ingredient which has enriched the investigation of orthogonal polynomials: the aspect of deformation. The idea of q-analogues or q-deformations of classical functions has played a crucial role in the context of special functions. In some cases of q-deformations one sees that the deformation itself can be associated with discretizing the support of the orthogonality measures for certain orthogonal polynomials.

Our purpose is to look at generalized Hermite polynomials on the lattice-like support $\{+q^n, -q^n \mid n \in \mathbb{Z}\}$. These polynomials will turn out to be one-parameter generalizations of the discrete q-Hermite polynomials of type II. Our basic starting point is the introduction of difference operators which transform these polynomials into each other. These operators contain two parameters: the deformation parameter q, which is related to the discrete support $\{+q^n, -q^n \mid n \in \mathbb{Z}\}$, and a parameter γ which couples one-dimensional Dunkl-like operators $M_q$, $M_q^+$ to the difference operators under consideration. For the multidimensional case of the corresponding Dunkl operators in the continuum situation, see [18, 19]. The parameter γ that occurs in our context can be regarded as a further deformation parameter.

Having derived the generalized q-Hermite polynomials in Sect. 2, we classify them in Sect. 3. It turns out that they are symmetrized versions of certain q-Laguerre polynomials $L_n^{(\alpha)}(x;q)$. In the limit q → 1 the monic generalized q-Hermite polynomials tend to the generalized Hermite polynomials in the sense of Szegő, cf. [21]. We shall comment on that briefly in Sect. 4.

Let us start with the basic definitions and tools. We need the exponential lattice
$$R_q := \{+q^n, -q^n \mid n \in \mathbb{Z}\} \tag{1.1}$$
and furthermore the set of functions
$$L^2(R_q) := \Bigl\{ f: R_q \to \mathbb{C} \;\Bigm|\; (1-q)\sum_{k=-\infty}^{\infty} q^k\bigl(|f(q^k)|^2 + |f(-q^k)|^2\bigr) < \infty \Bigr\}, \tag{1.2}$$
which is the ordinary Hilbert space of square-integrable functions on $R_q$ with respect to the measure
$$\mu_q = (1-q)\sum_{k=-\infty}^{\infty} q^k\bigl(\delta_{q^k} + \delta_{-q^k}\bigr). \tag{1.3}$$
The corresponding scalar product
$$(f,g)_J := (1-q)\sum_{k=-\infty}^{\infty} q^k\Bigl(f(q^k)\overline{g(q^k)} + f(-q^k)\overline{g(-q^k)}\Bigr) \tag{1.4}$$
is called the Jackson product. An orthonormal basis for $L^2(R_q)$ is given by the functions $e_n^\sigma$, σ ∈ {+1, −1}, n ∈ Z, defined by
$$e_n^\sigma(\tau q^m) := q^{-n/2}(1-q)^{-1/2}\,\delta_{mn}\,\delta_{\sigma\tau}, \tag{1.5}$$
where m, n ∈ Z, σ, τ ∈ {+1, −1}, and $\delta_{mn}$, $\delta_{\sigma\tau}$ denote the Kronecker delta. It is simple to verify that the measure $\mu_q$ converges to Lebesgue measure on the real line when q → 1. The convergence is to be understood in the so-called vague sense,
$$\lim_{q\to 1}\int f\,d\mu_q = \int_{-\infty}^{\infty} f(x)\,dx,$$
for all continuous functions f: R → C with compact support. We shall apply the theory of linear operators in the Hilbert space $L^2(R_q)$, which will be an important tool in the course of our calculations.
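The vague convergence of $\mu_q$ can be seen numerically. In the Python sketch below (function names hypothetical), `jackson_integral` evaluates $\int f\,d\mu_q = (1-q)\sum_k q^k\bigl(f(q^k) + f(-q^k)\bigr)$ for a continuous f vanishing outside a bounded interval, truncating the sum once $q^k$ is negligible. For the test function $f(x) = \max(0, 1 - x^2)$ the Jackson sum has the closed form $2\bigl(1 - (1-q)/(1-q^3)\bigr)$, which tends to $\int f\,dx = 4/3$ as q → 1.

```python
import math

def jackson_integral(f, q, support=1.0, tiny=1e-16):
    """(1-q) * sum_k q^k (f(q^k) + f(-q^k)) for f vanishing outside [-support, support].

    Indices with q^k > support contribute nothing; the sum stops once q^k < tiny."""
    k = math.ceil(math.log(support) / math.log(q))  # first k with q^k <= support
    total = 0.0
    x = q ** k
    while x > tiny:
        total += x * (f(x) + f(-x))
        k += 1
        x = q ** k
    return (1 - q) * total

def hat(x):
    """Continuous test function with compact support [-1, 1]; its integral is 4/3."""
    return max(0.0, 1.0 - x * x)

for q in (0.5, 0.9, 0.99, 0.999):
    exact = 2 * (1 - (1 - q) / (1 - q ** 3))
    print(q, jackson_integral(hat, q), exact)   # both columns approach 4/3
```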
In the following section we derive the generalized discrete q-Hermite polynomials of type II by a formalism of q-difference ladder operators $A^+$ and A. It is well known that ladder operators play a crucial role in mathematical physics, for instance in the context of Schrödinger operators. For discretized versions of ladder operators in the context of q-deformations, see for example [2, 3, 13, 15, 17, 9, 16], where similar questions were treated. For a more general framework see [12].

2. The Generalized Polynomials

Let 0 < q < 1. We consider the following operators on functions φ: $R_q$ → C, given for x ∈ $R_q$ by
$$D\varphi(x) := \frac{\varphi(qx) - \varphi(x)}{qx - x}, \tag{2.1}$$
$$M_q\varphi(x) := \frac{\varphi(q^{-1}x) - \varphi(-q^{-1}x)}{x}, \tag{2.2}$$
$$M_q^+\varphi(x) := \frac{\varphi(qx) + \varphi(-qx)}{x}, \tag{2.3}$$
$$R\varphi(x) := \varphi(qx), \qquad L\varphi(x) := \varphi(q^{-1}x), \tag{2.4}$$
$$X\varphi(x) := x\varphi(x). \tag{2.5}$$
Furthermore let γ be a real parameter and f: $R_q$ → R a real-valued function on the exponential lattice $R_q$. We then introduce the operators
$$A_\gamma := q^{-1}\bigl(LD + \gamma LM_q^+ + Lf(X)\bigr), \tag{2.6}$$
$$A_\gamma^+ := -D + \gamma M_qR + f(X)R, \tag{2.7}$$
where f(X) is the operator of multiplication given by
$$f(X)\varphi(x) = f(x)\varphi(x). \tag{2.8}$$
Each of the above operators can be considered as an operator in $L^2(R_q)$ by choosing the domain of definition properly. We first notice that $(1/\sqrt{q})L$ and $\sqrt{q}R$ are unitary operators which are adjoint and inverse of each other, i.e.
$$L^* = qR, \qquad R^* = \frac{1}{q}L. \tag{2.9}$$
We also notice that
$$LD = qDL, \qquad DR = qRD. \tag{2.10}$$
Let $C_c(R_q)$ be the set of functions φ: $R_q$ → C with finite support. This is a dense linear subspace of $L^2(R_q)$, and it is clearly invariant under each of the operators (2.1)–(2.8). The operators (2.1)–(2.3) and (2.5) are unbounded, and we can consider a minimal and a maximal version of each of them. The domain of definition of the maximal version is the set of functions in $L^2(R_q)$ whose image under the operator belongs to $L^2(R_q)$. The domain of definition of the minimal version is $C_c(R_q)$.
It is elementary to verify the following formulas, where at least one of the functions φ, ψ: $R_q$ → C has finite support:
$$(LD\varphi, \psi)_J = -q(\varphi, D\psi)_J, \tag{2.11}$$
$$(LM_q^+\varphi, \psi)_J = q(\varphi, M_qR\psi)_J, \tag{2.12}$$
$$(Lf(X)\varphi, \psi)_J = q(\varphi, f(X)R\psi)_J. \tag{2.13}$$
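Formulas (2.11)–(2.13), and the resulting formal adjointness of $A_\gamma$ and $A_\gamma^+$, can be spot-checked on random finitely supported functions. In this hypothetical Python sketch a function on $R_q$ is a callable of the lattice index (s, k) standing for the point $sq^k$, so that x ↦ qx becomes k ↦ k + 1 and x ↦ −x becomes s ↦ −s. The values q = 0.6, γ = 0.3 and the function f(x) = 0.7x − 0.2/x are arbitrary choices; only real-valued functions are used, so no complex conjugation appears.

```python
import random

q, gamma = 0.6, 0.3
K = range(-30, 31)   # lattice indices summed over; test functions vanish for |k| > 4

def x_of(s, k):
    """Lattice point s*q^k of R_q (s = +1 or -1)."""
    return s * q ** k

def jackson(phi, psi):
    """Jackson product (1.4) for real-valued functions given as callables of (s, k)."""
    return (1 - q) * sum(q ** k * phi(s, k) * psi(s, k) for k in K for s in (1, -1))

def random_fn(seed):
    """A random real function on R_q supported on the indices |k| <= 4."""
    rng = random.Random(seed)
    vals = {(s, k): rng.uniform(-1, 1) for s in (1, -1) for k in range(-4, 5)}
    return lambda s, k: vals.get((s, k), 0.0)

def f(x):
    return 0.7 * x - 0.2 / x   # an arbitrary real-valued (here odd) function

# Operators (2.1)-(2.5) and f(X); each maps a callable to a new callable.
def D(p):
    return lambda s, k: (p(s, k + 1) - p(s, k)) / ((q - 1) * x_of(s, k))

def Mq(p):
    return lambda s, k: (p(s, k - 1) - p(-s, k - 1)) / x_of(s, k)

def Mp(p):   # M_q^+
    return lambda s, k: (p(s, k + 1) + p(-s, k + 1)) / x_of(s, k)

def R(p):
    return lambda s, k: p(s, k + 1)

def L(p):
    return lambda s, k: p(s, k - 1)

def fX(p):
    return lambda s, k: f(x_of(s, k)) * p(s, k)

def A(p):        # (2.6)
    return lambda s, k: (L(D(p))(s, k) + gamma * L(Mp(p))(s, k) + L(fX(p))(s, k)) / q

def A_plus(p):   # (2.7)
    return lambda s, k: -D(p)(s, k) + gamma * Mq(R(p))(s, k) + fX(R(p))(s, k)

phi, psi = random_fn(1), random_fn(2)
checks = {
    "(2.11)": (jackson(L(D(phi)), psi), -q * jackson(phi, D(psi))),
    "(2.12)": (jackson(L(Mp(phi)), psi), q * jackson(phi, Mq(R(psi)))),
    "(2.13)": (jackson(L(fX(phi)), psi), q * jackson(phi, fX(R(psi)))),
    "adjoint": (jackson(A(phi), psi), jackson(phi, A_plus(psi))),
}
for name, (lhs, rhs) in checks.items():
    print(name, lhs, rhs)
```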
From (2.11) we see that the adjoint in $L^2(R_q)$ of $LD_{\min}$ is $-qD_{\max}$, i.e.
$$(LD_{\min})^* = -qD_{\max}, \tag{2.14}$$
and similarly
$$\bigl(L(M_q^+)_{\min}\bigr)^* = q(M_qR)_{\max}. \tag{2.15}$$
From the general theory of self-adjoint operators it is known that $f(X)_{\max}$ is self-adjoint. Combining these equations we get $(A_\gamma\varphi, \psi)_J = (\varphi, A_\gamma^+\psi)_J$ if at least one of the functions φ, ψ has finite support. In the following we consider only the maximal versions of the operators, and we shall omit the suffix max.

We now ask for the solution of the following problem: let λ > 0. Does there exist a non-vanishing real-valued function $\psi_\gamma \in L^2(R_q)$ such that the following two equations have a common solution for all x ∈ $R_q$?
$$A_\gamma\psi_\gamma(x) = 0, \tag{2.16}$$
$$A_\gamma^+\psi_\gamma(x) = \lambda x\psi_\gamma(x). \tag{2.17}$$
We shall first assume that such a solution exists and then see which properties we can derive from it. Applying the operator R to (2.16), we obtain
$$\bigl(D + \gamma M_q^+ + f(X)\bigr)\psi_\gamma = 0. \tag{2.18}$$
Let us now further assume that $\psi_\gamma$ is an even function on $R_q$, i.e.
$$\psi_\gamma(x) = \psi_\gamma(-x) \tag{2.19}$$
for all x ∈ $R_q$. Since $M_q^+\psi_\gamma(x) = 2x^{-1}\psi_\gamma(qx)$ for even $\psi_\gamma$, Eq. (2.18) can now be rewritten as follows:
$$D\psi_\gamma(x) + 2\gamma x^{-1}\psi_\gamma(qx) + f(x)\psi_\gamma(x) = 0. \tag{2.20}$$
We see that the function f is determined by the so-called ground state $\psi_\gamma$ via the equation
$$f(x) = -\frac{D\psi_\gamma(x)}{\psi_\gamma(x)} - \frac{2\gamma}{x}\,\frac{\psi_\gamma(qx)}{\psi_\gamma(x)}. \tag{2.21}$$
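Inserting f into Eq. (2.17) and using that $M_q$ vanishes on even functions, we obtain
$$\Bigl(-D - \frac{D\psi_\gamma(x)}{\psi_\gamma(x)}R - \frac{2\gamma}{x}\,\frac{\psi_\gamma(qx)}{\psi_\gamma(x)}R\Bigr)\psi_\gamma(x) = \lambda x\psi_\gamma(x), \tag{2.22}$$
hence
$$-\psi_\gamma(x)(D\psi_\gamma)(x) - (D\psi_\gamma)(x)(R\psi_\gamma)(x) - 2\gamma x^{-1}\psi_\gamma^2(qx) = \lambda x\psi_\gamma^2(x). \tag{2.23}$$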
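Using the q-difference product rule for two arbitrary functions $\varphi_1, \varphi_2$,
$$D(\varphi_1\varphi_2) = (D\varphi_1)\varphi_2 + (R\varphi_1)(D\varphi_2), \tag{2.24}$$
with $\varphi_1 := \varphi_2 := \psi_\gamma$, we get from (2.23)
$$D\psi_\gamma^2(x) = -\lambda x\psi_\gamma^2(x) - 2\gamma x^{-1}\psi_\gamma^2(qx). \tag{2.25}$$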
Defining $g_\gamma = \psi_\gamma^2$, we finally obtain
$$\frac{g_\gamma(qx) - g_\gamma(x)}{qx - x} = -\lambda x g_\gamma(x) - 2\gamma x^{-1}g_\gamma(qx),$$
hence
$$\bigl(1 - 2\gamma(1-q)\bigr)g_\gamma(qx) = \bigl(1 + (1-q)\lambda x^2\bigr)g_\gamma(x).$$
In the following it will be convenient to define
$$\tau := 1 - 2\gamma(1-q). \tag{2.26}$$
Since $g_\gamma$ is assumed non-vanishing, hence positive, we have to impose the restriction τ > 0, and we then get
$$g_\gamma(qx) = \frac{1 + (1-q)\lambda x^2}{\tau}\,g_\gamma(x). \tag{2.27}$$
This shows that the even function $g_\gamma$ is fixed on $R_q$ as soon as it is known for one value of x. Iterating (2.27) we get
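$$g_\gamma(q^nx) = \frac{(-(1-q)\lambda x^2; q^2)_n}{\tau^n}\,g_\gamma(x), \qquad x \in R_q,\ n \ge 0,$$
where we use the standard notation for q-shifted factorials, cf. [8],
$$(a;q)_n = \prod_{k=1}^{n}\bigl(1 - aq^{k-1}\bigr), \qquad n \in \mathbb{N}_0 \cup \{\infty\}. \tag{2.28}$$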
Extending the definition to negative integers by
$$(a;q)_{-n} = \frac{1}{(aq^{-n};q)_n}, \qquad n \in \mathbb{N}, \tag{2.29}$$
and putting $g_\gamma(1) = c > 0$, we get
$$g_\gamma(\pm q^n) = \frac{(-(1-q)\lambda; q^2)_n}{\tau^n}\,c, \qquad n \in \mathbb{Z}. \tag{2.30}$$
We shall make use of the following simple identity,
$$(a;q)_{-n} = \frac{(-q/a)^n}{(q/a;q)_n}\,q^{n(n-1)/2}, \qquad n \in \mathbb{N}, \tag{2.31}$$
and are now ready to prove:
Proposition 2.1. The function $g_\gamma$ defined for τ > 0 by (2.30) with c = 1 belongs to $L^1(R_q)$ if and only if τ > q (equivalently γ < 1/2). For these values of γ we even have $X^ng_\gamma \in L^1(R_q)$ for all n ∈ $\mathbb{N}_0$. The function $\psi_\gamma := \sqrt{g_\gamma}$ is a non-vanishing even function satisfying (2.16) and (2.17), where f is the odd function given by (2.21), which reduces to
$$f(x) = \frac{1}{(1-q)x}\Bigl(\sqrt{\tau\bigl(1 + (1-q)\lambda x^2\bigr)} - 1\Bigr). \tag{2.32}$$

Proof. The function $X^ng_\gamma$ belongs to $L^1(R_q)$ if and only if
$$\sum_{k=-\infty}^{\infty} q^{k(n+1)}g_\gamma(q^k) < \infty. \tag{2.33}$$
If this holds for n = 0, we have in particular
$$\sum_{k=0}^{\infty}(-(1-q)\lambda; q^2)_k\Bigl(\frac{q}{\tau}\Bigr)^k < \infty,$$
and since $(-(1-q)\lambda; q^2)_k$ converges to $(-(1-q)\lambda; q^2)_\infty \ge 1$, we see that q < τ, which holds precisely for γ < 1/2 (under the assumption τ > 0). On the other hand, if γ < 1/2, then the sum of the terms in (2.33) corresponding to k ≥ 0 is clearly finite, and using (2.31) the sum for k < 0 can be written
$$\sum_{k=1}^{\infty}\frac{q^{k^2}}{\bigl(-q^2/((1-q)\lambda); q^2\bigr)_k}\Bigl(\frac{\tau}{\lambda(1-q)q^n}\Bigr)^k,$$
which is finite for any n. When f is defined by (2.21), it is clear that $\psi_\gamma$ satisfies (2.16), and Eq. (2.17) is equivalent to (2.27), which is satisfied because of the formula $(a;q)_{n+1} = (1 - aq^n)(a;q)_n$, valid for n ∈ Z. From (2.21) we get
$$f(x) = \frac{1}{(1-q)x}\Bigl(\tau\,\frac{\psi_\gamma(qx)}{\psi_\gamma(x)} - 1\Bigr),$$
so (2.32) follows by (2.27).
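The following Python sketch checks Proposition 2.1 and the ingredients it uses at a few lattice points: the extended q-shifted factorial (2.28), (2.29), the identity (2.31), the recursion (2.27) for the values (2.30) with c = 1, and the fact that $\psi_\gamma = \sqrt{g_\gamma}$ together with f from (2.32) satisfies (2.16) and (2.17). The parameter values (q = 0.5, λ = 1.3, γ = 0.2) and all helper names are illustrative assumptions; functions on $R_q$ are indexed by (s, k) for the point $sq^k$.

```python
import math

q, lam, gamma = 0.5, 1.3, 0.2
tau = 1 - 2 * gamma * (1 - q)              # (2.26); tau > q because gamma < 1/2

def qpoch(a, base, n):
    """q-shifted factorial (a; base)_n for every integer n, following (2.28), (2.29)."""
    if n < 0:
        return 1.0 / qpoch(a * base ** n, base, -n)
    prod = 1.0
    for k in range(1, n + 1):
        prod *= 1 - a * base ** (k - 1)
    return prod

def identity_231(a, base, n):
    """Right-hand side of (2.31), for n >= 1."""
    return (-base / a) ** n * base ** (n * (n - 1) / 2) / qpoch(base / a, base, n)

def g(k):                                   # g_gamma(+-q^k) from (2.30) with c = 1
    return qpoch(-(1 - q) * lam, q * q, k) / tau ** k

def x_of(s, k):                             # lattice point s*q^k
    return s * q ** k

def psi(s, k):                              # psi_gamma = sqrt(g_gamma), an even function
    return math.sqrt(g(k))

def f(x):                                   # (2.32)
    return (math.sqrt(tau * (1 + (1 - q) * lam * x * x)) - 1) / ((1 - q) * x)

def D_psi(s, k):                            # (2.1) applied to psi at s*q^k
    return (psi(s, k + 1) - psi(s, k)) / ((q - 1) * x_of(s, k))

def A_psi(s, k):
    """(2.6): q^{-1} [D psi + gamma M_q^+ psi + f psi], evaluated at q^{-1} x."""
    j = k - 1                               # L maps the index k to k - 1
    mplus = (psi(s, j + 1) + psi(-s, j + 1)) / x_of(s, j)
    return (D_psi(s, j) + gamma * mplus + f(x_of(s, j)) * psi(s, j)) / q

def A_plus_psi(s, k):
    """(2.7): -D psi + gamma M_q R psi + f(x) psi(qx)."""
    mq_r = (psi(s, k) - psi(-s, k)) / x_of(s, k)   # M_q R psi, zero for even psi
    return -D_psi(s, k) + gamma * mq_r + f(x_of(s, k)) * psi(s, k + 1)

for n in range(1, 6):                       # identity (2.31)
    print(n, qpoch(0.7, q, -n), identity_231(0.7, q, n))
for k in range(-4, 5):                      # recursion (2.27) at x = q^k
    print(k, g(k + 1) / g(k), (1 + (1 - q) * lam * q ** (2 * k)) / tau)
for s in (1, -1):                           # equations (2.16) and (2.17)
    for k in range(-3, 4):
        print(s, k, A_psi(s, k), A_plus_psi(s, k) - lam * x_of(s, k) * psi(s, k))
```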
In the rest of the paper we shall assume that γ < 1/2, and we shall use the following abbreviations:
$$A := A_\gamma, \qquad A^+ := A_\gamma^+, \qquad \psi(x) := \psi_\gamma(x), \tag{2.34}$$
$$\psi_n(x) := (A^+)^n\psi_\gamma(x), \qquad n \in \mathbb{N}_0. \tag{2.35}$$
Note that $\psi_0 = \psi$ and that (2.35) is to be understood as a pointwise definition. Only later will it become clear that $\psi_n \in L^2(R_q)$ for all n ∈ $\mathbb{N}_0$ (under the restriction γ < 1/2).

Proposition 2.2. For n ∈ $\mathbb{N}_0$ we have $\psi_n(-x) = (-1)^n\psi_n(x)$, and there exists a real number $\alpha_n$ such that
$$\psi_{n+1}(x) - \lambda q^nx\psi_n(x) + \alpha_n\psi_{n-1}(x) = 0. \tag{2.36}$$
Proof. We shall give a proof by induction, but the argument differs for n even and n odd. In the case n = 0 we know that $\psi_0 = \psi$ is even and, by (2.17), that
$$\psi_1(x) - \lambda x\psi_0(x) = 0, \tag{2.37}$$
which shows the assertion for n = 0 with $\alpha_0 = 0$. Note that $\psi_1$ is odd.

Case 1. n even: We assume that (2.36) holds for an even n and that $\psi_n$ is even. We shall prove that (2.36) also holds for n + 1 and that $\psi_{n+1}$ is odd. Applying $A^+$ to (2.36) we obtain
$$\psi_{n+2}(x) - \lambda q^nA^+\bigl(x\psi_n(x)\bigr) + \alpha_n\psi_n(x) = 0. \tag{2.38}$$
This means
$$\psi_{n+2}(x) - \lambda q^n\bigl(-D + \gamma M_qR + f(X)R\bigr)\bigl(x\psi_n(x)\bigr) + \alpha_n\psi_n(x) = 0, \tag{2.39}$$
and using that $\psi_n$ is even, this can be rewritten as
$$\psi_{n+2}(x) - \lambda q^n\bigl(-D + f(X)R\bigr)\bigl(x\psi_n(x)\bigr) - \lambda q^n\gamma\,\frac{2x\psi_n(x)}{x} + \alpha_n\psi_n(x) = 0, \tag{2.40}$$
which is equivalent to
$$\psi_{n+2}(x) - \lambda q^{n+1}x\bigl(-D + f(X)R\bigr)\psi_n(x) + \bigl(\alpha_n + \lambda q^n - 2\lambda q^n\gamma\bigr)\psi_n(x) = 0, \tag{2.41}$$
where we made use of the commutation relation
$$DX = qXD + 1, \tag{2.42}$$
which is a special case of (2.24). As $\psi_n$ is an even function, we can rewrite the last equation as follows:
$$\psi_{n+2}(x) - \lambda q^{n+1}x\bigl(-D + \gamma M_qR + f(X)R\bigr)\psi_n(x) + \bigl(\alpha_n + \lambda q^n - 2\lambda q^n\gamma\bigr)\psi_n(x) = 0,$$
but this is Eq. (2.36) with n replaced by n + 1, and we have found
$$\alpha_{n+1} = \alpha_n + \lambda q^n - 2\gamma\lambda q^n. \tag{2.43}$$
In particular $\alpha_1 = \lambda(1 - 2\gamma)$. We also observe that $\psi_{n+1}(x) = -D\psi_n(x) + f(x)\psi_n(qx)$ is odd, since $\psi_n$ is even and f is odd.

Case 2. n odd: We assume that (2.36) holds for an odd n with $\psi_n$ odd. Applying $A^+$ to Eq. (2.36) and using that $x\psi_n(x)$ is even, we obtain
$$\psi_{n+2}(x) - \lambda q^n\bigl(-D + f(X)R\bigr)\bigl(x\psi_n(x)\bigr) + \alpha_n\psi_n(x) = 0. \tag{2.44}$$
Using again the commutation relation (2.42), this leads to
$$\psi_{n+2}(x) - \lambda q^{n+1}x\bigl(-D + f(X)R\bigr)\psi_n(x) + \bigl(\alpha_n + \lambda q^n\bigr)\psi_n(x) = 0. \tag{2.45}$$
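We rewrite the last equation as
$$\psi_{n+2}(x) - \lambda q^{n+1}x\bigl(-D + \gamma M_qR + f(X)R\bigr)\psi_n(x) + \lambda q^{n+1}x\gamma M_qR\psi_n(x) + \bigl(\alpha_n + \lambda q^n\bigr)\psi_n(x) = 0. \tag{2.46}$$
Since
$$\lambda q^{n+1}x\gamma M_qR\psi_n(x) = 2\lambda q^{n+1}\gamma\psi_n(x), \tag{2.47}$$
which holds true because $\psi_n$ is odd, we finally get
$$\psi_{n+2}(x) - \lambda q^{n+1}x\psi_{n+1}(x) + \bigl(2\lambda q^{n+1}\gamma + \alpha_n + \lambda q^n\bigr)\psi_n(x) = 0. \tag{2.48}$$
Again this is (2.36) with n replaced by n + 1. We have
$$\alpha_{n+1} = \alpha_n + 2\lambda\gamma q^{n+1} + \lambda q^n, \tag{2.49}$$
and $\psi_{n+1}(x) = -D\psi_n(x) + 2\gamma x^{-1}\psi_n(x) + f(x)\psi_n(qx)$ is an even function.

Combining (2.43) and (2.49) we obtain
$$\alpha_{2n+2} = \alpha_{2n} + (1+q)\lambda q^{2n}\tau, \tag{2.50}$$
$$\alpha_{2n+3} = \alpha_{2n+1} + \lambda(1+q)q^{2n+1} \tag{2.51}$$
with n ∈ $\mathbb{N}_0$. These recursion relations are easy to solve for n ∈ $\mathbb{N}_0$ using
$$\alpha_0 = 0, \qquad \alpha_1 = \lambda(1 - 2\gamma). \tag{2.52}$$
We get
$$\alpha_{2n} = \frac{\lambda\tau}{1-q}\bigl(1 - q^{2n}\bigr), \tag{2.53}$$
$$\alpha_{2n+1} = \frac{\lambda\tau}{1-q}\Bigl(1 - \frac{q^{2n+1}}{\tau}\Bigr). \tag{2.54}$$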
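Solving the parity-dependent recursions is mechanical but sign-sensitive, so a quick cross-check may help. The sketch below (hypothetical helper names, arbitrary admissible parameters q = 0.7, λ = 2, γ = −0.4) compares the values produced step by step by (2.43) and (2.49) with the closed forms (2.53), (2.54).

```python
def alphas_recursive(q, lam, gamma, N):
    """alpha_0, ..., alpha_N from alpha_0 = 0 and the steps (2.43) (n even), (2.49) (n odd)."""
    alpha = [0.0]
    for n in range(N):
        if n % 2 == 0:
            alpha.append(alpha[n] + lam * q ** n - 2 * gamma * lam * q ** n)
        else:
            alpha.append(alpha[n] + 2 * lam * gamma * q ** (n + 1) + lam * q ** n)
    return alpha

def alpha_closed(q, lam, gamma, n):
    """Closed forms (2.53) (n even) and (2.54) (n odd), with tau from (2.26)."""
    tau = 1 - 2 * gamma * (1 - q)
    if n % 2 == 0:
        return lam * tau / (1 - q) * (1 - q ** n)
    return lam * tau / (1 - q) * (1 - q ** n / tau)

q, lam, gamma = 0.7, 2.0, -0.4
rec = alphas_recursive(q, lam, gamma, 12)
for n, value in enumerate(rec):
    print(n, value, alpha_closed(q, lam, gamma, n))
```

Taking γ > 1/2 instead, e.g. `alphas_recursive(q, lam, 0.6, 1)[1]`, makes $\alpha_1 = \lambda(1 - 2\gamma)$ negative, which is the obstruction discussed in Remark 2.10.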
Proposition 2.3. The function $\psi_n = (A^+)^n\psi$ is in $L^2(R_q)$ for every n ∈ $\mathbb{N}_0$, and
$$\frac{1}{x}\psi_n(x)\psi_{n-1}(x), \qquad \frac{1}{x}\psi_n(qx)\psi_{n-1}(x), \qquad \frac{1}{x}\psi_n(x)\psi_{n-1}(qx) \;\in L^1(R_q) \tag{2.55}$$
for n ≥ 1.

Proof. From (2.36) it follows that $\psi_n$ is a linear combination of the functions $X^j\psi_0$, j ∈ {0, ..., n}, and since $X^n\psi_0^2 \in L^1(R_q)$ for all n by Proposition 2.1, the first assertion follows. The second assertion is proved by induction. For n = 1 we have by (2.37)
$$\frac{1}{x}\psi_1(x)\psi_0(x) = \lambda\psi_0^2(x), \qquad \frac{1}{x}\psi_1(qx)\psi_0(x) = \lambda q\psi_0(x)\psi_0(qx), \qquad \frac{1}{x}\psi_1(x)\psi_0(qx) = \lambda\psi_0(x)\psi_0(qx),$$
which all belong to $L^1(R_q)$. Suppose now that (2.55) holds for some n. By (2.36) we get
$$\frac{1}{x}\psi_{n+1}(x)\psi_n(x) = \lambda q^n\psi_n^2(x) - \frac{\alpha_n}{x}\psi_n(x)\psi_{n-1}(x),$$
$$\frac{1}{x}\psi_{n+1}(qx)\psi_n(x) = \lambda q^{n+1}\psi_n(qx)\psi_n(x) - \frac{\alpha_n}{x}\psi_n(x)\psi_{n-1}(qx),$$
$$\frac{1}{x}\psi_{n+1}(x)\psi_n(qx) = \lambda q^n\psi_n(x)\psi_n(qx) - \frac{\alpha_n}{x}\psi_n(qx)\psi_{n-1}(x),$$
which all belong to $L^1(R_q)$ by the induction hypothesis and the fact that $\psi_n$ and its right translate belong to $L^2(R_q)$.

Corollary 2.4. For n ≥ 1 we have $f(x)\psi_n(qx)\psi_{n-1}(x) \in L^1(R_q)$.

Proof. By Proposition 2.1, f is bounded as x → ±∞, and f is O(1/x) as x → 0.

Proposition 2.5. The operators $A^+$ and A are ladder operators for the functions $\{\psi_n\}$ in the sense that
$$A^+\psi_n = \psi_{n+1}, \qquad A\psi_n = \frac{\alpha_n}{q}\psi_{n-1}, \qquad n \ge 0 \tag{2.56}$$
(with $\psi_{-1} := 0$).

Proof. The first formula is evident from the definition $\psi_n = (A^+)^n\psi_0$. The second formula will be proved by induction, and it is clearly true for n = 0 by (2.16), since $\alpha_0 = 0$. As in Proposition 2.2, the proof depends on the parity of n, resp. of $\psi_n$. Let us assume that the second formula holds for all values from zero to n; we shall then establish $A\psi_{n+1} = (\alpha_{n+1}/q)\psi_n$. By (2.36) and the induction hypothesis we get
$$A\psi_{n+1}(x) = \lambda q^nA\bigl(x\psi_n(x)\bigr) - \alpha_nA\psi_{n-1}(x) = \lambda q^{n-1}L\Bigl(D\bigl(x\psi_n(x)\bigr) + \gamma M_q^+\bigl(x\psi_n(x)\bigr) + xf(x)\psi_n(x)\Bigr) - \frac{\alpha_n\alpha_{n-1}}{q}\psi_{n-2}(x),$$
and this can be transformed to
$$A\psi_{n+1}(x) = \lambda q^{n-1}L\Bigl(\psi_n(qx) + xD\psi_n(x) + \gamma M_q^+\bigl(x\psi_n(x)\bigr) + xf(x)\psi_n(x)\Bigr) - \frac{\alpha_n\alpha_{n-1}}{q}\psi_{n-2}(x). \tag{2.57}$$

Case 1. n even: In this case (2.57) reduces to
$$A\psi_{n+1}(x) = \lambda q^{n-1}L\Bigl(\psi_n(qx) + x\bigl(D\psi_n(x) + f(x)\psi_n(x)\bigr)\Bigr) - \frac{\alpha_n\alpha_{n-1}}{q}\psi_{n-2}(x).$$
However, if we insert
$$A\psi_n(x) = \frac{1}{q}L\bigl(D\psi_n(x) + f(x)\psi_n(x)\bigr) + \frac{2\gamma}{x}\psi_n(x),$$
we get
$$A\psi_{n+1}(x) = \lambda q^{n-1}\psi_n(x) + \lambda q^{n-1}x\Bigl(A\psi_n(x) - \frac{2\gamma}{x}\psi_n(x)\Bigr) - \frac{\alpha_n\alpha_{n-1}}{q}\psi_{n-2}(x)$$
$$= \lambda q^{n-1}(1 - 2\gamma)\psi_n(x) + \frac{\alpha_n}{q}\bigl(\lambda q^{n-1}x\psi_{n-1}(x) - \alpha_{n-1}\psi_{n-2}(x)\bigr)$$
$$= \lambda q^{n-1}(1 - 2\gamma)\psi_n(x) + \frac{\alpha_n}{q}\psi_n(x) = \frac{\alpha_{n+1}}{q}\psi_n(x).$$

Case 2. n odd: In this case (2.57) reduces to
$$A\psi_{n+1}(x) = \lambda q^{n-1}L\Bigl(\psi_n(qx) + x\bigl(D\psi_n(x) + f(x)\psi_n(x)\bigr) + 2\gamma q\psi_n(qx)\Bigr) - \frac{\alpha_n\alpha_{n-1}}{q}\psi_{n-2}(x),$$
and when inserting
$$A\psi_n(x) = \frac{1}{q}L\bigl(D\psi_n(x) + f(x)\psi_n(x)\bigr)$$
we find
$$A\psi_{n+1}(x) = \lambda q^{n-1}(1 + 2\gamma q)\psi_n(x) + \lambda q^{n-1}xA\psi_n(x) - \frac{\alpha_n\alpha_{n-1}}{q}\psi_{n-2}(x)$$
$$= \lambda q^{n-1}(1 + 2\gamma q)\psi_n(x) + \frac{\alpha_n}{q}\bigl(\lambda q^{n-1}x\psi_{n-1}(x) - \alpha_{n-1}\psi_{n-2}(x)\bigr)$$
$$= \Bigl(\lambda q^{n-1}(1 + 2\gamma q) + \frac{\alpha_n}{q}\Bigr)\psi_n(x) = \frac{\alpha_{n+1}}{q}\psi_n(x).$$

Lemma 2.6. For n ≥ 1 we have $(A^+\psi_n, \psi_{n-1})_J = (\psi_n, A\psi_{n-1})_J$.

Proof. Case 1. n even: By Proposition 2.3 and Corollary 2.4 we can split the inner product on the left-hand side into three sums,
$$(A^+\psi_n, \psi_{n-1})_J = \frac{1}{1-q}\Bigl(\frac{\psi_n(qx)}{x}, \psi_{n-1}\Bigr)_J - \frac{1}{1-q}\Bigl(\frac{\psi_n(x)}{x}, \psi_{n-1}\Bigr)_J + \bigl(f(x)\psi_n(qx), \psi_{n-1}\bigr)_J,$$
and by (2.9) this is easily transformed to $(\psi_n, A\psi_{n-1})_J$. The case of n odd is treated similarly.

Proposition 2.7. The functions $\{\psi_n\}$ are mutually orthogonal in $L^2(R_q)$.

Proof. We shall establish that $(\psi_n, \psi_k)_J = 0$ for 0 ≤ k < n by induction. For n = 1 we have to prove that $(\psi_1, \psi_0)_J = 0$, which is clear, since it is the Jackson sum of an odd function. Under the induction hypothesis $(\psi_m, \psi_k)_J = 0$ for 0 ≤ k < m, m = 1, ..., n, we shall establish $(\psi_{n+1}, \psi_k)_J = 0$ for k < n + 1. The assertion is again clear for k = n
since we sum an odd function, and for k < n − 1 it follows from the induction hypothesis by the recursion (2.36):
$$(\psi_{n+1}, \psi_k)_J = \lambda q^n(x\psi_n, \psi_k)_J - \alpha_n(\psi_{n-1}, \psi_k)_J = \lambda q^n(\psi_n, x\psi_k)_J = q^{n-k}\bigl[(\psi_n, \psi_{k+1})_J + \alpha_k(\psi_n, \psi_{k-1})_J\bigr] = 0.$$
We finally have to show that $(\psi_{n+1}, \psi_{n-1})_J = 0$, but this follows from Lemma 2.6, since
$$(\psi_{n+1}, \psi_{n-1})_J = (A^+\psi_n, \psi_{n-1})_J = (\psi_n, A\psi_{n-1})_J,$$
and by Proposition 2.5 the last expression is equal to $\frac{\alpha_{n-1}}{q}(\psi_n, \psi_{n-2})_J$, which is zero by the induction hypothesis.

Proposition 2.8. For n ≥ 1 we have
$$\|\psi_n\|^2 = \frac{\alpha_n}{q}\|\psi_{n-1}\|^2.$$

Proof. Taking the inner product of the recursion relation (2.36) with $\psi_{n+1}$, we get
$$\|\psi_{n+1}\|^2 = \lambda q^n(x\psi_n, \psi_{n+1})_J. \tag{2.58}$$
If we take the inner product with $\psi_{n-1}$ instead, we find $\alpha_n\|\psi_{n-1}\|^2 = \lambda q^n(x\psi_n, \psi_{n-1})_J$. Dividing this by q and using (2.58) (with n replaced by n − 1), we get the assertion.

From Proposition 2.8 we find
$$\|\psi_n\|^2 = \frac{\alpha_1\cdots\alpha_n}{q^n}\|\psi_0\|^2. \tag{2.59}$$
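Propositions 2.7 and 2.8, and the evaluation of $\|\psi_0\|^2$ in (2.61), can be checked numerically by working with $p_n = \psi_n/\psi_0$, so that $(\psi_n, \psi_m)_J = \int p_np_m\,g_\gamma\,d\mu_q$. The Python sketch below is an illustration under stated assumptions: it builds $p_n$ from the recurrence obtained by dividing (2.36) by $\psi_0$, with $\alpha_n$ from (2.53), (2.54); it truncates the lattice sums to −30 ≤ k ≤ 200 (the weight is negligible outside for q = 0.5, λ = 1, γ = 0.2) and infinite products to 400 factors. All names are hypothetical.

```python
import math

q, lam, gamma = 0.5, 1.0, 0.2
tau = 1 - 2 * gamma * (1 - q)            # tau > q, i.e. gamma < 1/2
KS = range(-30, 201)                       # lattice indices k kept in the sums

def qpoch(b, base, n=None):
    """(b; base)_n for n in Z via (2.28), (2.29); n=None gives (b; base)_inf truncated."""
    if n is None:
        n = 400
    if n < 0:
        return 1.0 / qpoch(b * base ** n, base, -n)
    prod = 1.0
    for j in range(n):
        prod *= 1 - b * base ** j
    return prod

# g_gamma(+-q^k) from (2.30) with c = 1; g_gamma = psi_0^2.
G = {k: qpoch(-(1 - q) * lam, q * q, k) / tau ** k for k in KS}

def alpha(n):                              # (2.53), (2.54)
    c = lam * tau / (1 - q)
    return c * (1 - q ** n) if n % 2 == 0 else c * (1 - q ** n / tau)

def p(n, x):
    """p_n(x) from p_{m+1} = lam q^m x p_m - alpha_m p_{m-1}, p_0 = 1, p_{-1} = 0."""
    prev, cur = 0.0, 1.0
    for m in range(n):
        prev, cur = cur, lam * q ** m * x * cur - alpha(m) * prev
    return cur

def inner(n, m):
    """(psi_n, psi_m)_J = (1-q) sum_k q^k g(q^k) [p_n p_m (q^k) + p_n p_m (-q^k)]."""
    total = 0.0
    for k in KS:
        x = q ** k
        total += q ** k * G[k] * (p(n, x) * p(m, x) + p(n, -x) * p(m, -x))
    return (1 - q) * total

def norm0_product():                       # right-hand side of (2.61)
    b, c = q * q, lam * (1 - q)
    num = qpoch(b, b) * qpoch(-q * c / tau, b) * qpoch(-q * tau / c, b)
    return 2 * (1 - q) * num / (qpoch(-b / c, b) * qpoch(q / tau, b))

n0 = inner(0, 0)
print(n0, norm0_product())
for n in range(1, 6):
    predicted = math.prod(alpha(j) for j in range(1, n + 1)) / q ** n * n0   # (2.59)
    print(n, inner(n, n), predicted, inner(n, n - 1))
```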
We can evaluate $\|\psi_0\|^2$ by Ramanujan's sum, cf. [8],
$$\sum_{k=-\infty}^{\infty}(a;q)_kz^k = \frac{(q, az, q/(az); q)_\infty}{(q/a, z; q)_\infty}. \tag{2.60}$$
By (2.30) we get
$$\|\psi_0\|^2 = 2(1-q)\sum_{k=-\infty}^{\infty}\Bigl(\frac{q}{\tau}\Bigr)^k(-\lambda(1-q); q^2)_k = 2(1-q)\,\frac{\bigl(q^2, -q\lambda(1-q)/\tau, -q\tau/(\lambda(1-q)); q^2\bigr)_\infty}{\bigl(-q^2/(\lambda(1-q)), q/\tau; q^2\bigr)_\infty}, \tag{2.61}$$
where as before τ = 1 − 2γ(1 − q).

The recurrence relation (2.36) resembles the three-term recurrence relation for orthogonal polynomials. We now define the functions
$$p_n(x) = p_n(x;\lambda,\gamma) := \psi_n(x)/\psi_0(x), \qquad n \ge 0, \tag{2.62}$$
and immediately get the following result:
Proposition 2.9. The functions $p_n$ defined by (2.62) are polynomials of degree n satisfying the recurrence relation
$$p_{n+1}(x) - \lambda q^nxp_n(x) + \alpha_np_{n-1}(x) = 0, \qquad p_0(x) = 1, \quad p_{-1}(x) = 0, \tag{2.63}$$
where $\alpha_n$ is given by (2.53), (2.54).

Remark 2.10. The polynomials given by (2.62) can be defined under the hypothesis 0 < q < 1, λ > 0 and γ < 1/(2(1 − q)). By Favard's theorem, cf. [6], the polynomials are orthogonal with respect to a positive measure on the real line if and only if $\alpha_n > 0$ for n ≥ 1. Clearly $\alpha_{2n} > 0$ under the given restrictions, and $\alpha_{2n+1} > 0$ for all n if and only if τ > q, i.e. γ < 1/2, which was also the restriction imposed on γ by Proposition 2.1. It follows that the discrete measure (cf. (1.3))
$$|\psi_0(x)|^2\,d\mu_q(x), \tag{2.64}$$
concentrated on the lattice $R_q$, is an orthogonality measure for the polynomials $\{p_n\}$. In the next section we shall see that the polynomials correspond to an indeterminate moment problem, and we shall find some further orthogonality measures. It turns out that the orthonormal system $\{\psi_n/\|\psi_n\|\}$ is not an orthonormal basis for $L^2(R_q)$.

3. Identification of the Polynomials

To identify the polynomials from Proposition 2.9 with a known family of orthogonal polynomials, we derive the recurrence relation for the corresponding monic polynomials. To do so, we introduce the sequence of non-vanishing real numbers $(k_n)$ and the sequence of monic polynomials $q_n(x) = q_n(x;\lambda,\gamma)$ such that
$$k_nq_n(x) = p_n(x). \tag{3.1}$$
This yields
$$xq_n(x) = \frac{k_{n+1}}{\lambda q^nk_n}q_{n+1}(x) + \frac{\alpha_nk_{n-1}}{\lambda q^nk_n}q_{n-1}(x), \tag{3.2}$$
and hence $k_{n+1}/(\lambda q^nk_n) = 1$. Since $k_0 = 1$, we find $k_n = \lambda^nq^{n(n-1)/2}$, which yields
$$xq_n(x) = q_{n+1}(x) + \beta_{n+1}q_{n-1}(x) \tag{3.3}$$
with
$$\beta_{2n+1} = \frac{\tau}{\lambda(1-q)q^{4n-1}}\bigl(1 - q^{2n}\bigr), \tag{3.4}$$
$$\beta_{2n+2} = \frac{\tau}{\lambda(1-q)q^{4n+1}}\Bigl(1 - \frac{q^{2n+1}}{\tau}\Bigr), \tag{3.5}$$
where in both cases n ∈ $\mathbb{N}_0$. The polynomials are symmetric in the sense that $q_n(-x) = (-1)^nq_n(x)$, because the middle term is missing in the three-term recurrence relation (3.3).

We now need some general remarks about symmetric monic polynomials, cf. [6, p. 40]. Let $(P_n)$ be a sequence of monic polynomials orthogonal with respect to a positive measure σ supported by the interval [0, ∞[, and let $(K_n)$ be the monic polynomials
orthogonal with respect to the measure x dσ(x). They are called the kernel polynomials for the parameter value 0. Then it is easy to see that the symmetric polynomials
$$S_{2n}(x) := P_n(x^2), \qquad S_{2n+1}(x) := xK_n(x^2) \tag{3.6}$$
are orthogonal with respect to the symmetric measure µ on the real line determined by the equations
$$\int g(x^2)\,d\mu(x) = \int g(x)\,d\sigma(x),$$
where g is an arbitrary continuous function on [0, ∞[ of at most polynomial growth. In other words, σ is the image measure of µ under the mapping x ↦ x². If σ has the density w(x) on [0, ∞[, then µ has the density $|x|w(x^2)$. If σ is a discrete measure of the form
$$\sigma = a_0\delta_0 + \sum_{k=1}^{\infty}a_k\delta_{x_k}, \tag{3.7}$$
where $a_k \ge 0$, $x_k > 0$, then µ is given as
$$\mu = a_0\delta_0 + \sum_{k=1}^{\infty}\frac{a_k}{2}\bigl(\delta_{-\sqrt{x_k}} + \delta_{\sqrt{x_k}}\bigr). \tag{3.8}$$
We say that $(S_n)$ are the symmetrized monic orthogonal polynomials corresponding to $(P_n)$.

Theorem 3.1. The polynomials $q_n(\cdot;\lambda,\gamma)$ given by (3.3) are the symmetrized monic orthogonal polynomials corresponding to the discrete $q^2$-Laguerre polynomials $L_n^{(\alpha-\frac12)}(x;q^2)$ if λ = 1/(1 − q) and α is determined such that
$$\gamma = \frac{1}{2}\,\frac{1 - q^{-2\alpha}}{1-q}. \tag{3.9}$$
The condition γ < 1/2 corresponds to α > −1/2.
Proof. The discrete q-Laguerre polynomials $L_n^{(\alpha)}(x;q)$ are orthogonal polynomials when α > −1 and 0 < q < 1. They correspond to an indeterminate Stieltjes moment problem, see [1, 10, 14]. For a new treatment of the moment problem see [20]. In the normalization of [11] their recurrence relation reads
$$-q^{2n+\alpha+1}xL_n^{(\alpha)}(x;q) + \bigl[(1 - q^{n+1}) + q(1 - q^{n+\alpha})\bigr]L_n^{(\alpha)}(x;q) = (1 - q^{n+1})L_{n+1}^{(\alpha)}(x;q) + q(1 - q^{n+\alpha})L_{n-1}^{(\alpha)}(x;q). \tag{3.10}$$
They are given by the following formula involving a basic hypergeometric function:
$$L_n^{(\alpha)}(x;q) = \frac{(q^{\alpha+1};q)_n}{(q;q)_n}\;{}_1\phi_1\Bigl(\begin{matrix}q^{-n}\\ q^{\alpha+1}\end{matrix};\,q,\,-xq^{n+\alpha+1}\Bigr).$$
It follows that the corresponding monic polynomials $\tilde L_n^{(\alpha)}(x;q)$ are given as
$$\tilde L_n^{(\alpha)}(x;q) = x^n + k_n(\alpha;q)x^{n-1} + \cdots, \tag{3.11}$$
where
$$k_n(\alpha;q) = -q^{-2n-\alpha+1}\bigl(1 - q^{n+\alpha}\bigr)\frac{1 - q^n}{1-q}. \tag{3.12}$$
We claim that $(q_n)$ are the symmetrized orthogonal polynomials corresponding to the polynomials
$$P_n(x) = \tilde L_n^{(\alpha-\frac12)}(x;q^2). \tag{3.13}$$
Since $(P_n)$ are orthogonal with respect to
$$\frac{x^{\alpha-\frac12}}{(-x;q^2)_\infty} \tag{3.14}$$
on the half-line, cf. [11], we see that the corresponding monic kernel polynomials for the parameter 0 are
$$K_n(x) = \tilde L_n^{(\alpha+\frac12)}(x;q^2), \tag{3.15}$$
and the symmetric polynomials given by (3.6) are orthogonal with respect to the symmetric weight function on the real line
$$\frac{|x|^{2\alpha}}{(-x^2;q^2)_\infty}, \qquad \alpha > -\frac{1}{2}. \tag{3.16}$$
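Their recurrence relation is easily found from (3.12). In fact, if we write it as
$$xS_n(x) = S_{n+1}(x) + \omega_{n+1}S_{n-1}(x), \tag{3.17}$$
and insert the expressions (3.6), we get the two equations
$$\tilde L_n^{(\alpha-\frac12)}(x;q^2) = \tilde L_n^{(\alpha+\frac12)}(x;q^2) + \omega_{2n+1}\tilde L_{n-1}^{(\alpha+\frac12)}(x;q^2), \tag{3.18}$$
$$x\tilde L_n^{(\alpha+\frac12)}(x;q^2) = \tilde L_{n+1}^{(\alpha-\frac12)}(x;q^2) + \omega_{2n+2}\tilde L_n^{(\alpha-\frac12)}(x;q^2), \tag{3.19}$$
and we see that
$$\omega_{2n+1} = k_n(\alpha - \tfrac12; q^2) - k_n(\alpha + \tfrac12; q^2) = q^{-2\alpha-4n+1}\bigl(1 - q^{2n}\bigr), \tag{3.20}$$
$$\omega_{2n+2} = k_n(\alpha + \tfrac12; q^2) - k_{n+1}(\alpha - \tfrac12; q^2) = q^{-4n-1}\bigl(q^{-2\alpha} - q^{2n+1}\bigr). \tag{3.21}$$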
Comparing $\beta_{2n+1}$ from (3.4) with $\omega_{2n+1}$, we see that they agree if and only if
$$q^{-2\alpha} = \frac{\tau}{\lambda(1-q)}. \tag{3.22}$$
Inserting this in (3.21) we find
$$\omega_{2n+2} = \frac{\tau}{\lambda(1-q)q^{4n+1}}\Bigl(1 - \frac{\lambda(1-q)q^{2n+1}}{\tau}\Bigr). \tag{3.23}$$
Comparing finally the last formula with $\beta_{2n+2}$ from (3.5), we see that they agree if and only if
$$\lambda = \frac{1}{1-q}. \tag{3.24}$$
Using (3.22) we finally get
$$\gamma = \frac{1}{2}\,\frac{1 - q^{-2\alpha}}{1-q}. \tag{3.25}$$
Generalized q-Hermite Polynomials
We have identified the monic polynomials $q_n$ in the special case λ = 1/(1 − q). This is no severe restriction, because the polynomials in the general case can be expressed by $q_n^*(x) = q_n(x; 1/(1-q), \gamma)$, as we shall now see. The recurrence relation for $q_n^*$ is

$$x q_n^*(x) = q_{n+1}^*(x) + \beta_{n+1}^*\,q_{n-1}^*(x), \tag{3.26}$$

where

$$\beta_{2n+1}^* = \frac{\tau}{q^{4n-1}}\,(1-q^{2n}), \tag{3.27}$$

$$\beta_{2n+2}^* = \frac{\tau}{q^{4n+1}}\left(1 - \frac{q^{2n+1}}{\tau}\right), \tag{3.28}$$

and the recurrence relation (3.3) for $q_n(x) = q_n(x;\lambda,\gamma)$ can be written

$$x q_n(x) = q_{n+1}(x) + \frac{1}{\lambda(1-q)}\,\beta_{n+1}^*\,q_{n-1}(x). \tag{3.29}$$
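For a > 0 we consider the monic polynomials $\tilde q_n(x) = a^{-n} q_n^*(ax)$, which satisfy

$$x\,\tilde q_n(x) = \tilde q_{n+1}(x) + a^{-2}\beta_{n+1}^*\,\tilde q_{n-1}(x). \tag{3.30}$$

This shows that $\tilde q_n = q_n$ if $a = (\lambda(1-q))^{1/2}$, hence

$$q_n(x;\lambda,\gamma) = (\lambda(1-q))^{-n/2}\,q_n\!\left(\sqrt{\lambda(1-q)}\,x;\; 1/(1-q),\, \gamma\right). \tag{3.31}$$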
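Identity (3.31) is easy to test numerically. The sketch below evaluates $q_n^*$ by the recurrence (3.26)–(3.28) and $q_n(\cdot;\lambda,\gamma)$ by (3.29), then compares both sides of (3.31) at a few points. Here τ is simply treated as a free positive number (its relation to γ via (2.26) plays no role in the scaling argument). All sample values and function names are our own. Besides each value, the evaluation routine also returns the same recurrence run on absolute values; this gives a natural scale for rounding errors near roots.

```python
import math


def beta_star(m, tau, q):
    """beta*_m (m >= 1) from (3.27)-(3.28)."""
    n, r = divmod(m - 1, 2)
    if r == 0:  # m = 2n + 1
        return tau / q ** (4 * n - 1) * (1 - q ** (2 * n))
    return tau / q ** (4 * n + 1) * (1 - q ** (2 * n + 1) / tau)  # m = 2n + 2


def eval_monic(n, x, coeff):
    """Return (p_n(x), m_n): p_k follows x p_k = p_{k+1} + coeff(k+1) p_{k-1}
    (p_0 = 1, p_{-1} = 0) and m_n is the same recurrence run on absolute values."""
    prev, cur = 0.0, 1.0
    mprev, mcur = 0.0, 1.0
    for k in range(n):
        c = coeff(k + 1)
        prev, cur = cur, x * cur - c * prev
        mprev, mcur = mcur, abs(x) * mcur + abs(c) * mprev
    return cur, mcur


def scaling_defect(lam, tau, q, n_max=8, points=(-1.3, -0.2, 0.7, 2.1)):
    """Largest scaled difference between the two sides of (3.31)."""
    a = math.sqrt(lam * (1 - q))
    worst = 0.0
    for n in range(n_max + 1):
        for x in points:
            # left side: recurrence (3.29) with coefficients beta*_m / (lam (1 - q))
            lhs, mag_l = eval_monic(n, x, lambda m: beta_star(m, tau, q) / (lam * (1 - q)))
            # right side: q*_n from (3.26), evaluated at a x and rescaled by a^(-n)
            val_r, mag_r = eval_monic(n, a * x, lambda m: beta_star(m, tau, q))
            rhs = a ** (-n) * val_r
            scale = max(mag_l, a ** (-n) * mag_r)
            worst = max(worst, abs(lhs - rhs) / scale)
    return worst


print(scaling_defect(2.5, 1.3, 0.7))
```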
The q-Laguerre polynomials have a family of discrete orthogonality measures, cf. [1, 14]. These families are also considered in [4, 5]. In the normalization of [11] we see that $\{L_n^{(\alpha-\frac12)}(x;q^2)\}$ has the orthogonality measure

$$\sum_{k=-\infty}^{\infty} \frac{q^{k(2\alpha+1)}}{(-q^{2k};q^2)_\infty}\,\delta_{q^{2k}}, \tag{3.32}$$

which by (3.8) means that $(q_n^*)$ has the orthogonality measure

$$\frac12\sum_{k=-\infty}^{\infty} \frac{q^{k(2\alpha+1)}}{(-q^{2k};q^2)_\infty}\left(\delta_{-q^k} + \delta_{q^k}\right). \tag{3.33}$$
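The orthogonality of $(q_n^*)$ with respect to (3.33) can be observed numerically. The sketch below makes two assumptions. First, it takes $\tau = q^{-2\alpha}$, which is what the agreement condition for $\beta_{2n+1}$ gives when λ = 1/(1 − q). Second, it truncates the sum over k to a finite window of our own choosing. It computes the weights in log form to avoid overflow and reports the largest normalized off-diagonal entry of the Gram matrix of $q_0^*, \dots, q_5^*$. This is only an illustration; the names and sample parameters are hypothetical.

```python
import math


def log_qpoch_neg(x, base, tol=1e-18):
    """log of (-x; base)_inf = prod_{j>=0} (1 + x * base**j), for x > 0."""
    total, term = 0.0, x
    while term > tol:
        total += math.log1p(term)
        term *= base
    return total


def beta_star(m, tau, q):
    """beta*_m (m >= 1) from (3.27)-(3.28)."""
    n, r = divmod(m - 1, 2)
    if r == 0:
        return tau / q ** (4 * n - 1) * (1 - q ** (2 * n))
    return tau / q ** (4 * n + 1) * (1 - q ** (2 * n + 1) / tau)


def q_star_values(n_max, x, tau, q):
    """[q*_0(x), ..., q*_{n_max}(x)] from x q*_n = q*_{n+1} + beta*_{n+1} q*_{n-1}."""
    vals, prev, cur = [1.0], 0.0, 1.0
    for k in range(n_max):
        prev, cur = cur, x * cur - beta_star(k + 1, tau, q) * prev
        vals.append(cur)
    return vals


def gram_matrix(alpha, q, n_max=5, k_range=range(-30, 80)):
    """Gram matrix of q*_0..q*_{n_max} for the (truncated) measure (3.33), tau = q**(-2*alpha)."""
    tau = q ** (-2 * alpha)
    logw = [k * (2 * alpha + 1) * math.log(q) - log_qpoch_neg(q ** (2 * k), q * q)
            for k in k_range]
    shift = max(logw)  # common rescaling; orthogonality is scale invariant
    G = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for k, lw in zip(k_range, logw):
        w = 0.5 * math.exp(lw - shift)
        for x in (q ** k, -q ** k):
            v = q_star_values(n_max, x, tau, q)
            for i in range(n_max + 1):
                for j in range(n_max + 1):
                    G[i][j] += w * v[i] * v[j]
    return G


def max_normalized_offdiag(G):
    n = len(G)
    return max(abs(G[i][j]) / math.sqrt(G[i][i] * G[j][j])
               for i in range(n) for j in range(n) if i != j)


print(max_normalized_offdiag(gram_matrix(0.3, 0.5)))
```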
Using (3.24), (3.25) we see that the function $g_\gamma$ from Proposition 2.1 (defined by (2.30) with c = 1) is given by

$$g_\gamma(\pm q^k) = q^{2k\alpha}(-1;q^2)_k = \frac{(-1;q^2)_\infty}{(-q^{2k};q^2)_\infty}\,q^{2k\alpha}, \tag{3.34}$$

which shows that (3.33) is proportional to the measure with density $g_\gamma$ with respect to $\mu_q$ given by (1.3). Since $g_\gamma = \psi_0^2$, we see that the orthogonality of the polynomials $(q_n^*)$ with respect to $g_\gamma\,d\mu_q$ is equivalent to

$$(\psi_n,\psi_m)_J = 0 \quad \text{for } n \neq m. \tag{3.35}$$
Since (3.35) was established in Sect. 2, the results there can be used to provide an orthogonality measure for the q-Laguerre polynomials. The moment problem corresponding to the orthogonal polynomials $p_n(x) = p_n(x;\lambda,\gamma)$, λ > 0, γ < 1/2, is indeterminate, being the symmetrized version of an indeterminate Stieltjes moment problem. Using the parameter τ given by (2.26), we get the following orthogonality measures for $\{p_n\}$ corresponding to (3.16) and (3.33), respectively:

$$\frac{|x|^{-\log\tau/\log q}}{(-\lambda(1-q)x^2;q^2)_\infty}\,dx, \tag{3.36}$$

$$\sum_{k=-\infty}^{\infty} \frac{q^{k(1-\log\tau/\log q)}}{(-q^{2k};q^2)_\infty}\left(\delta_{-q^k/\sqrt{\lambda(1-q)}} + \delta_{q^k/\sqrt{\lambda(1-q)}}\right). \tag{3.37}$$
We shall not make any effort to normalize the measures (3.36), (3.37) to probabilities and to find the norm of the polynomials $\{p_n\}$, since this is equivalent to well-known facts about the q-Laguerre polynomials. The solution $\psi_0^2\,d\mu_q$ to the above moment problem is not an N-extremal solution, because it is not concentrated in a discrete subset of the real line. By the Theorem of Riesz there exists a non-zero function h, square integrable with respect to $\psi_0^2\,d\mu_q$, for which $\int h\,p_n\,\psi_0^2\,d\mu_q = 0$, n ≥ 0, hence

$$(h\psi_0, \psi_n)_J = 0, \qquad n \ge 0. \tag{3.38}$$
Equation (3.38) shows that the orthonormal system $\{\psi_n/\|\psi_n\|\}$ is not complete in the Hilbert space $L^2(\mathbb{R}_q)$, cf. Remark 2.10.

4. The Generalized Hermite Polynomials

For q → 1 the three-term recurrence relation (3.3) converges to

$$x q_n(x) = q_{n+1}(x) + \beta_{n+1}\,q_{n-1}(x) \tag{4.1}$$

with

$$\beta_{2n+1} = \frac{2n}{\lambda}, \qquad \beta_{2n+2} = \frac{2n+1-2\gamma}{\lambda}. \tag{4.2}$$
It follows that the generalized monic q-Hermite polynomials $\{q_n(x;\lambda,\gamma)\}$ converge to a family $\{h_n(x;\lambda,\gamma)\}$ of monic polynomials determined by (4.1) and the initial condition $h_0(x;\lambda,\gamma) = 1$. We shall next see that the weight function (3.36) converges pointwise to the weight function

$$|x|^{-2\gamma}\exp\left(-\tfrac12\lambda x^2\right) \qquad \text{for } q \to 1. \tag{4.3}$$
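In fact, it is well known that

$$(-z;q)_\infty = \sum_{n=0}^{\infty}\frac{q^{\binom{n}{2}}\,z^n}{(q;q)_n}, \qquad z \in \mathbb{C}, \tag{4.4}$$

cf. [8], and it follows that

$$\lim_{q\to1}\,(-\lambda(1-q)x^2;q^2)_\infty = \sum_{n=0}^{\infty}\frac{\left(\frac12\lambda x^2\right)^n}{n!} = \exp\left(\tfrac12\lambda x^2\right)$$

pointwise for x ∈ ℝ, since

$$\lim_{q\to1}\frac{q^{n(n-1)}(1-q)^n}{(q^2;q^2)_n} = \frac{1}{2^n\,n!}$$

and

$$\frac{q^{n(n-1)}(1-q)^n}{(q^2;q^2)_n} = \prod_{k=1}^{n}\frac{q^{2k-2}}{1+q+\cdots+q^{2k-1}} < \prod_{k=1}^{n}\frac{1}{2k-1},$$

so that dominated convergence applies to the series.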
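The limit $(-\lambda(1-q)x^2;q^2)_\infty \to \exp(\frac12\lambda x^2)$ can be observed numerically. The sketch below is an illustration only. It evaluates the infinite product in log form and stops once the factors differ from 1 by less than a fixed tolerance (our choice). It then prints the ratio to the Gaussian factor for a few values of q approaching 1.

```python
import math


def qpoch_neg(z, base, tol=1e-18):
    """(-z; base)_inf = prod_{j>=0} (1 + z * base**j) for z >= 0 and 0 < base < 1.

    Truncated once z * base**j < tol; the neglected factors change the log by
    less than tol / (1 - base)."""
    log_total, term = 0.0, z
    while term > tol:
        log_total += math.log1p(term)
        term *= base
    return math.exp(log_total)


def ratio_to_gaussian(lam, x, q):
    """(-lam (1-q) x^2; q^2)_inf / exp(lam x^2 / 2); tends to 1 as q -> 1."""
    return qpoch_neg(lam * (1 - q) * x * x, q * q) / math.exp(0.5 * lam * x * x)


for q in (0.9, 0.99, 0.999):
    print(q, [round(ratio_to_gaussian(1.0, x, q), 6) for x in (0.5, 1.0, 2.0)])
```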
The even moments of the symmetric weight function (4.3) are given as

$$s_{2n} = \left(\frac{2}{\lambda}\right)^{n-\gamma+\frac12}\Gamma\left(n-\gamma+\tfrac12\right). \tag{4.5}$$
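As a cross-check of (4.5), the following sketch compares the closed form with a composite Simpson approximation of $2\int_0^\infty x^{2n-2\gamma}e^{-\lambda x^2/2}\,dx$. The cutoff and step count are our own choices. We only use n ≥ 1: there the integrand vanishes at 0 (since γ < 1/2) and the quadrature is well behaved.

```python
import math


def s2n_closed(n, gamma, lam):
    """Right-hand side of (4.5)."""
    a = n - gamma + 0.5
    return (2 / lam) ** a * math.gamma(a)


def s2n_simpson(n, gamma, lam, steps=20000):
    """2 * int_0^X x^(2n - 2 gamma) exp(-lam x^2 / 2) dx by composite Simpson's rule,
    with X = sqrt(80 / lam), beyond which the Gaussian factor is below e^-40."""
    upper = math.sqrt(80.0 / lam)
    f = lambda x: x ** (2 * n - 2 * gamma) * math.exp(-0.5 * lam * x * x)
    h = upper / steps  # steps must be even
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return 2 * total * h / 3


for n, gamma, lam in [(1, 0.3, 1.0), (2, -0.5, 2.0), (3, 0.1, 0.7)]:
    print(n, gamma, lam, s2n_closed(n, gamma, lam), s2n_simpson(n, gamma, lam))
```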
The corresponding moment problem is determinate by Carleman's criterion. Using the Ramanujan integral

$$\int_0^\infty \frac{t^c}{(-t;q)_\infty}\,dt = -\frac{\pi}{\sin(\pi c)}\,\frac{(q^{-c};q)_\infty}{(q;q)_\infty}, \qquad c > -1, \tag{4.6}$$

cf. [1, 8], it is easy to calculate the 2n-th moment $s_{2n}(q)$ of (3.36), and we get

$$s_{2n}(q) = -(\lambda(1-q))^{-c-1}\,\frac{\pi}{\sin(\pi c)}\,\frac{(q^{-2c};q^2)_\infty}{(q^2;q^2)_\infty} \tag{4.7}$$

with

$$c = n - \frac12\,\frac{\log\tau}{\log q} - \frac12.$$
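Both (4.6) and the limit $s_{2n}(q) \to s_{2n}$ can be checked numerically. The sketch below works as follows:

- The infinite products are computed in log form, together with their sign.
- The integral in (4.6) is evaluated after the substitution $t = e^s$, using the trapezoidal rule.
- For the moments, we simply assume $\log\tau/\log q = 2\gamma$ exactly. This is a hypothetical choice, made only so that $c = n - \gamma - \frac12$ does not depend on q; in the text, τ comes from (2.26).

All truncation parameters are our own.

```python
import math


def log_qpoch(a, base, tol=1e-17):
    """Return (log|(a; base)_inf|, sign of (a; base)_inf) for real a and 0 < base < 1.

    The product prod_{k>=0} (1 - a * base**k) is truncated once |a * base**k| < tol;
    a must not be of the form base**(-k) (otherwise a factor vanishes)."""
    log_abs, sign, term = 0.0, 1, a
    while abs(term) >= tol:
        factor = 1.0 - term
        if factor < 0:
            sign = -sign
        log_abs += math.log(abs(factor))
        term *= base
    return log_abs, sign


def ramanujan_rhs(c, q):
    """Right-hand side of (4.6)."""
    la, sa = log_qpoch(q ** (-c), q)
    lb, sb = log_qpoch(q, q)
    return -math.pi / math.sin(math.pi * c) * sa * sb * math.exp(la - lb)


def ramanujan_lhs(c, q, s_min=-60.0, s_max=40.0, steps=2000):
    """Left-hand side of (4.6) after t = e^s, by the trapezoidal rule in s."""
    def integrand(s):
        log_den, term = 0.0, math.exp(s)  # log (-t; q)_inf with t = e^s
        while term > 1e-18:
            log_den += math.log1p(term)
            term *= q
        return math.exp((c + 1) * s - log_den)

    h = (s_max - s_min) / steps
    total = 0.5 * (integrand(s_min) + integrand(s_max))
    total += sum(integrand(s_min + i * h) for i in range(1, steps))
    return total * h


def s2n_q(n, gamma, lam, q):
    """(4.7), assuming log(tau)/log(q) = 2*gamma so that c = n - gamma - 1/2."""
    c = n - gamma - 0.5
    la, sa = log_qpoch(q ** (-2 * c), q * q)
    lb, sb = log_qpoch(q * q, q * q)
    return (-(lam * (1 - q)) ** (-c - 1) * math.pi / math.sin(math.pi * c)
            * sa * sb * math.exp(la - lb))


def s2n_limit(n, gamma, lam):
    """(4.5)."""
    a = n - gamma + 0.5
    return (2 / lam) ** a * math.gamma(a)


print(ramanujan_lhs(0.5, 0.5), ramanujan_rhs(0.5, 0.5))
for q in (0.9, 0.99, 0.999):
    print(q, s2n_q(2, 0.2, 1.0, q) / s2n_limit(2, 0.2, 1.0))
```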
Since limq→1 s2n (q) = s2n , it follows from the method of moments (cf. [7]) that any solution to the indeterminate moment problem with moments (4.7) converges weakly to the density (4.3). In particular (3.36) and (3.37) converge weakly to the measure given by (4.3). It also follows that {hn (x; λ, γ )} are the monic orthogonal polynomials with respect to (4.3). The change of scale corresponding to λ = 2 gives the weight function |x|−2γ exp(−x 2 ). The corresponding orthogonal polynomials are called generalized Hermite polynomials in Chihara’s monograph [6]. See also [18, 19].
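The limiting polynomials $h_n$ can be generated directly from (4.1)–(4.2). The sketch below is an illustration with hypothetical names. It builds their coefficients and checks that λ = 2, γ = 0 reproduces the monic Hermite polynomial $x^4 - 3x^2 + \frac34$. It then tests orthogonality against $|x|^{-2\gamma}\exp(-\frac12\lambda x^2)$ using the moments (4.5); odd moments vanish by symmetry.

```python
import math


def gen_hermite(n_max, lam, gamma):
    """Monic h_0..h_{n_max} from (4.1)-(4.2), as coefficient lists (lowest degree first)."""
    def beta(m):  # beta_m for m >= 1
        n, r = divmod(m - 1, 2)
        return 2 * n / lam if r == 0 else (2 * n + 1 - 2 * gamma) / lam

    polys, prev = [[1.0]], [0.0]
    for k in range(n_max):
        cur = polys[-1]
        nxt = [0.0] + cur                # x * h_k
        for i, c in enumerate(prev):
            nxt[i] -= beta(k + 1) * c    # - beta_{k+1} h_{k-1}
        prev = cur
        polys.append(nxt)
    return polys


def moment(j, lam, gamma):
    """j-th moment of |x|^(-2 gamma) exp(-lam x^2 / 2): 0 for odd j, (4.5) for even j."""
    if j % 2:
        return 0.0
    a = j // 2 - gamma + 0.5
    return (2 / lam) ** a * math.gamma(a)


def inner(p, r, lam, gamma):
    """<p, r> with respect to the weight (4.3), computed from the moments."""
    return sum(a * b * moment(i + j, lam, gamma)
               for i, a in enumerate(p) for j, b in enumerate(r))


print(gen_hermite(4, 2.0, 0.0)[4])  # coefficients of x^4 - 3x^2 + 3/4, lowest degree first
```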
References
1. Askey, R.: Ramanujan's extension of the gamma and beta functions. Am. Math. Monthly 87, 346–359 (1980)
2. Askey, R., Suslov, S.K.: The q-harmonic oscillator and an analogue of the Charlier polynomials. J. Phys. A 26, 693–698 (1993)
3. Askey, R., Suslov, S.K.: The q-harmonic oscillator and the Al-Salam and Carlitz polynomials. Lett. Math. Phys. 29, 123–132 (1993)
4. Berg, C.: On some indeterminate moment problems for measures on a geometric progression. J. Comput. Appl. Math. 99, 67–75 (1998)
5. Berg, C.: From discrete to absolutely continuous solutions of indeterminate moment problems. Arab. J. Math. Sc. 4, No. 2, 1–18 (1998)
6. Chihara, T.S.: An Introduction to Orthogonal Polynomials. New York–London–Paris: Gordon and Breach, 1978
7. Feller, W.: An Introduction to Probability Theory and Its Applications. Vol. II. New York: Wiley, 1966
8. Gasper, G., Rahman, M.: Basic Hypergeometric Series. Cambridge: Cambridge University Press, 1990
9. Hinterding, R., Wess, J.: q-deformed Hermite polynomials in q-quantum mechanics. Eur. Phys. J. C (1998)
10. Ismail, M.E.H., Rahman, M.: The q-Laguerre Polynomials and Related Moment Problems. J. Math. Anal. Appl. 218, 155–174 (1998)
11. Koekoek, R., Swarttouw, R.F.: The Askey-scheme of hypergeometric orthogonal polynomials and its q-analogue. Report 98-17, Delft University of Technology, Faculty TWI, 1998
12. Koornwinder, T.H.: Orthogonal polynomials in connection with quantum groups. In: Orthogonal Polynomials: Theory and Practice (ed. P. Nevai), NATO ASI Series C, vol. 294, Dordrecht: Kluwer, 1990, pp. 257–292
13. Lorek, A., Ruffing, A., Wess, J.: A q-Deformation of the Harmonic Oscillator. Zeitschrift für Physik C 74, 369–377 (1997)
14. Moak, D.S.: The q-analogue of the Laguerre polynomials. J. Math. Anal. Appl. 81, 20–47 (1981)
15. Ruffing, A.: Doctorate Thesis, LMU München, 1996
16. Ruffing, A.: On Schrödinger-Hermite Operators in Lattice Quantum Mechanics. Lett. Math. Phys. 47, 197–214 (1999)
17. Ruffing, A., Witt, M.: On the Integrability of q-Oscillators Based on Invariants of Discrete Fourier Transforms. Lett. Math. Phys. 42, No. 2, 167–181 (1997)
18. Rösler, M.: Habilitation Thesis. TU München, 1999
19. Rösler, M.: Generalized Hermite Polynomials and the Heat Equation for Dunkl Operators. Comm. Math. Phys. 192, 519–542 (1998)
20. Simon, B.: The classical moment problem as a self-adjoint finite difference operator. Adv. Math. 137, 82–203 (1998)
21. Szegő, G.: Orthogonal Polynomials. Fourth Edition, Providence, RI: American Mathematical Society, 1975

Literature for Further Reading
1. Atakishiev, N.M., Suslov, S.K.: A realization of the q-Harmonic Oscillator. Theoretical and Mathematical Physics, Vol. 87, No. 1, 1991, pp. 442–444
2. Atakishiev, N.M., Suslov, S.K.: Difference Analogs of the Harmonic Oscillator. Theoretical and Mathematical Physics, Vol. 85, No. 1, 1991, pp. 1055–1062
3. Atakishiyev, N.M., Mir-Kasimov, R.M., Nagiyev, Sh.M.: A Relativistic Model of the isotropic Oscillator. Annalen der Physik, 7. Folge, 42, 1, 25–30 (1985)
4. Benaoum, H.: h analogue of Newton's binomial formula. J. Phys. A: Math. Gen. 31, L751–L754 (1999)
5. Benaoum, H.: (q, h)-analogue of Newton's binomial formula. J. Phys. A: Math. Gen. 32, 2037–2040 (1999)

Communicated by T. Miwa
Commun. Math. Phys. 223, 47 – 65 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
On the Integrated Density of States for Schrödinger Operators on $\mathbb{Z}^2$ with Quasi Periodic Potential

Wilhelm Schlag

Department of Mathematics, Princeton University, Fine Hall, Princeton, NJ 08544, USA.

Received: 13 February 2001 / Accepted: 21 May 2001
Abstract: In this paper we consider discrete Schrödinger operators on the lattice $\mathbb{Z}^2$ with quasi periodic potential. We establish new regularity results for the integrated density of states, as well as a quantitative version of a "Thouless formula", as previously considered by Craig and Simon, for real energies and with rates of convergence. The main ingredient is a large deviation theorem for the Green's function that was recently established by Bourgain, Goldstein, and the author. For the integrated density of states an argument of Bourgain is used. Finally, we establish certain fine properties of separately subharmonic functions of two variables that might be of independent interest.

1. Introduction

The purpose of this note is mainly to establish certain regularity properties of the integrated density of states for the two-dimensional discrete quasi periodic model

$$H = -\Delta_{\mathbb{Z}^2} + \lambda V(\theta_1 + n_1\omega_1,\ \theta_2 + n_2\omega_2)$$

for large λ. In [6] it was shown that for analytic V that are nonconstant on any vertical and horizontal line, and for large λ, Anderson localization holds for a large (in measure) set of $(\omega_1,\omega_2) \in \mathbb{T}^2$ and $(\theta_1,\theta_2) \in \mathbb{T}^2$. Our first result shows that under the same hypotheses the integrated density of states (IDS) has modulus of continuity $\exp(-|\log t|^c)$ for some small c > 0. It is reasonable to believe that the IDS should have better regularity properties, but our current methods do not allow us to conclude that.

Studying the regularity of the IDS has a long history that we will not review in detail. The IDS is known to be continuous under very mild assumptions, see Delyon and Souillard [9] or Figotin and Pastur [10], Theorem 3.4. It is also well known that this is equivalent to saying that any given number has zero probability of being an eigenvalue, see Craig and Simon [8].
In the case of the two-dimensional quasi periodic model considered above Craig and Simon [8] showed that the IDS is log-Hölder continuous (in fact, their argument is valid in any dimension).
W. Schlag
Recall that the IDS is the limiting distribution of the eigenvalues. More precisely, if $\Lambda \subset \mathbb{Z}^2$ is a square centered at the origin, let $E_j^{(\Lambda)}(\theta)$ denote the eigenvalues of H(θ) restricted to Λ with Dirichlet boundary conditions. Set

$$N_\theta^{(\Lambda)}(E) := \frac{1}{|\Lambda|}\,\#\left[j : E_j^{(\Lambda)}(\theta) \le E\right]. \tag{1.1}$$
It is a simple and well-known consequence of the ergodic theorem (in this case with two commuting shifts $T_{\omega_1}$ and $T_{\omega_2}$) that the limit

$$\lim_{\mathrm{diam}(\Lambda)\to\infty} N_\theta^{(\Lambda)} = k$$

exists for almost every $\theta \in \mathbb{T}^2$. The probability measure given by k is called the IDS. For further details see [1] or [10]. In fact, combining [1] with [9] implies that this limit procedure converges for every $\theta \in \mathbb{T}^2$. Clearly, one also has

$$\lim_{\mathrm{diam}(\Lambda)\to\infty} \int_{\mathbb{T}^2} N_\theta^{(\Lambda)}\,d\theta = k.$$
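To make the definition (1.1) concrete, here is a small Python sketch that diagonalizes $H_\Lambda(\theta)$ on an L × L square and evaluates $N_\theta^{(\Lambda)}(E)$. Several choices are ours and purely illustrative:

- The discrete Laplacian is taken as nearest-neighbour hopping with weight −1, with no diagonal term.
- The potential is the hypothetical $v(\theta_1,\theta_2) = \cos 2\pi\theta_1 + \cos 2\pi\theta_2$, which is non-constant in each variable as (2.2) requires.
- The frequency vector is an arbitrary sample.
- The eigenvalues come from a plain cyclic Jacobi iteration rather than any particular library.

Such tiny squares of course say nothing about the limit k itself.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Sorted eigenvalues of a real symmetric matrix (list of lists), cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                # rotation angle chosen to annihilate a[p][r]
                theta = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and r)
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):  # A <- J^T A (rows p and r)
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
    return sorted(a[i][i] for i in range(n))


def hamiltonian(L, lam, theta, omega, v):
    """Matrix of H_Lambda(theta) on Lambda = {0,...,L-1}^2 with Dirichlet boundary conditions."""
    sites = [(n1, n2) for n1 in range(L) for n2 in range(L)]
    index = {s: i for i, s in enumerate(sites)}
    H = [[0.0] * len(sites) for _ in sites]
    for (n1, n2), i in index.items():
        H[i][i] = lam * v(theta[0] + n1 * omega[0], theta[1] + n2 * omega[1])
        for nb in ((n1 + 1, n2), (n1, n2 + 1)):
            if nb in index:  # bonds leaving Lambda are simply dropped
                j = index[nb]
                H[i][j] = H[j][i] = -1.0
    return H


def counting_function(eigs, E):
    """N_theta^(Lambda)(E) from (1.1): fraction of eigenvalues that are <= E."""
    return sum(1 for e in eigs if e <= E) / len(eigs)


def v(t1, t2):
    """Hypothetical sample potential, non-constant in each variable."""
    return math.cos(2 * math.pi * t1) + math.cos(2 * math.pi * t2)


omega = (math.sqrt(2) - 1, (math.sqrt(5) - 1) / 2)  # arbitrary sample frequencies
H = hamiltonian(5, 3.0, (0.1, 0.2), omega, v)
eigs = jacobi_eigenvalues(H)
for E in (-5.0, 0.0, 5.0):
    print(E, counting_function(eigs, E))
```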
For one-dimensional models (such as almost Mathieu) the IDS is naturally connected with the Lyapunov exponent by means of the Thouless formula

$$\gamma(E) = \int \log|E - E'|\,dk(E'),$$

see Avron and Simon [1]. On $\mathbb{Z}^2$ there is no Lyapunov exponent. Nevertheless, Craig and Simon [8] made successful use of the integral on the right-hand side for E with Im(E) ≠ 0. Amongst other things they showed that one has

$$\int \log|E - E'|\,dk(E') \ge 0,$$

which was crucial for their proof of log-Hölder continuity. Below we study this mean for real E and show that it is (as one would expect) the limit of logarithms of determinants. Rates of convergence as well as a large deviation theorem for the determinants are also obtained. We want to emphasize, though, that all aforementioned results are obtained only for large disorders (because we use large deviation theorems for the Green's function from [6], see the following section).

Part of our original motivation was to understand if large deviation estimates could be obtained for $\log|\det(H_\Lambda(\theta) - E)|$. This is natural in view of recent nonperturbative arguments by Jitomirskaya [12], Bourgain and Goldstein [5], and Bourgain and Jitomirskaya [7]. Although a nonperturbative proof for the $\mathbb{Z}^2$ model seems rather remote at this point, it might be helpful to know what should be true based on the results obtained here for large disorders.

Throughout this paper C will stand for a numerical constant that can change from line to line. Usually we will indicate which parameters various constants depend on. Also, $a \lesssim b$ stands for $a \le Cb$, and similarly with $a \gtrsim b$. Finally, $a \asymp b$ means both $a \lesssim b$ and $a \gtrsim b$.
Integrated Density of States for Schrödinger Operators
2. The IDS for Large Disorder

Let

$$H(\theta) = -\Delta + \lambda V(\theta), \tag{2.1}$$

where $V(\theta)(n_1,n_2) = v(\theta_1 + n_1\omega_1,\ \theta_2 + n_2\omega_2)$ and λ is a large parameter. As in [6] we assume that the real-analytic function $v: \mathbb{T}^2 \to \mathbb{R}$ satisfies:

$$\theta_1 \mapsto v(\theta_1,\theta_2) \quad\text{and}\quad \theta_2 \mapsto v(\theta_1,\theta_2) \tag{2.2}$$

are nonconstant functions for any choice of the other variable. Most of the work in [6] was devoted to proving large deviation estimates for the Green's functions $G_\Lambda(\theta,E) := (H_\Lambda(\theta) - E)^{-1}$, where $H_\Lambda(\theta) := R_\Lambda H(\theta) R_\Lambda$, $R_\Lambda$ being the restriction operator to Λ. Here Λ should be thought of as a square in $\mathbb{Z}^2$, but for technical reasons it was necessary to consider a larger class of L-shaped sets in [6], which were referred to as "elementary regions". We shall not dwell on this point here, see Sect. 2 in [6] for details. Returning to the large deviation estimates (LDE), we call $G_\Lambda(\theta,E)$ good provided, for some fixed 0 < b < 1 and γ > 0,
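$$\|G_\Lambda(\theta,E)\| \le \lambda^{-1}e^{\ell^b}, \qquad |G_\Lambda(\theta,E)(x,y)| \le e^{-\gamma|x-y|} \quad \text{for all } x,y \in \Lambda,\ |x-y| > \ell/4, \tag{2.3}$$

where ℓ = diam(Λ). Otherwise, $G_\Lambda(\theta,E)$ is called bad. The main technical statement in [6] is the following proposition, see Sect. 4 there. We set $\mathcal{E}_\lambda = [-4 - 2\lambda\|v\|_\infty,\ 4 + 2\lambda\|v\|_\infty]$ so that the spectrum of H is always strictly contained inside $\mathcal{E}_\lambda$.

Proposition 2.1. Let v be a real-analytic function satisfying (2.2). Given ε > 0 there exist $\Omega_\varepsilon \subset \mathbb{T}^2$ so that $\mathrm{mes}[\mathbb{T}^2 \setminus \Omega_\varepsilon] < \varepsilon$, and (large) numbers $\lambda_0 = \lambda_0(v,\varepsilon)$ and $N_0 = N_0(v,\varepsilon)$ with the following property: For any $\omega \in \Omega_\varepsilon$, all $\lambda \ge \lambda_0$ and all $N \ge N_0$ there is the estimate

$$\sup_{E \in \mathcal{E}_\lambda} \mathrm{mes}\left[\theta \in \mathbb{T}^2 : G_\Lambda(\theta,E) \text{ is bad}\right] \le \exp(-N^\rho) \tag{2.4}$$

for any square $\Lambda \subset \mathbb{Z}^2$, diam(Λ) = N, with $\gamma = \frac14\log\lambda$ and some constants 0 < b, ρ < 1.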
This LDE was a crucial ingredient in the proof of localization, the other being a technique of energy elimination via properties of semi-algebraic sets. The latter will not concern us here, but we will rely heavily on Proposition 2.1. Clearly, we may assume that there is some (large) constant $C_\varepsilon$ so that for all $\omega \in \Omega_\varepsilon$,

$$\|\omega_1 n_1 + \omega_2 n_2\| \ge C_\varepsilon^{-1}\left[|n_1| + |n_2|\right]^{-3} \quad \text{for all } (n_1,n_2) \in \mathbb{Z}^2 \setminus \{(0,0)\}.$$

Here ‖·‖ denotes the distance to the nearest integer. This Diophantine condition will be used below without further mention.

Fix some small ε > 0, and let $\Omega_\varepsilon$, $\lambda_0$, and $N_0$ be as above. For any $N \ge N_0$ and $\Lambda \subset \mathbb{Z}^2$ a square of side length N, the operator $H_\Lambda := H_\Lambda(\theta)$ has eigenvectors $\xi_\alpha = \xi_\alpha(\theta)$ with corresponding eigenvalues $E_\alpha = E_\alpha(\theta)$ (here θ is fixed but arbitrary). We will show that for any interval J ⊂ ℝ and large N,

$$\#\left[\alpha : E_\alpha \in J\right] \le C\exp\left(-|\log|J||^c\right)|\Lambda| \tag{2.5}$$
with constants c, C that only depend on ε, v. This of course shows that the IDS has the modulus of continuity stated in the introduction. The proof of (2.5) is based on the following idea (for the details see the following proposition): Let δ = |J| and choose $N_0 \asymp |\log\delta|^{1/b}$ (this notation means comparable up to constants). Let E be the center of J. Partitioning Λ into squares $\{Q_j\}$ of size $N_0$, it follows from Proposition 2.1, and the fact that the set in (2.4) is contained in a semi-algebraic set of degree $N_0^{C_1}$ for some constant $C_1$ and comparable measure, that

$$\#\left[j : G_{Q_j}(\theta,E) \text{ is bad}\right] \lesssim N_0^{C_1}\exp(-N_0^\rho)\,N^2. \tag{2.6}$$

Now denote $\Lambda^* = \bigcup_{\mathrm{bad}} Q_j$, where the union runs over those squares in Λ that are bad. Using the resolvent identity it can be shown that

$$\|G_{\Lambda\setminus\Lambda^*}(\theta,E)\| \lesssim \exp(N_0^b). \tag{2.7}$$

By choice of $N_0$ one has $\delta e^{N_0^b} \ll 1$. It is then easy to see from (2.7) that any eigenfunction $\xi = \xi_\alpha$ with eigenvalue belonging to J has most of its $\ell^2$-mass on the set Λ*. Since the $\xi_\alpha$'s under consideration are pairwise orthogonal, in view of (2.6) there can basically be no more than

$$|\Lambda^*| \lesssim N_0^{C_1}\exp(-N_0^\rho)\,N^2 \lesssim \exp\left(-|\log\delta|^{\rho/b}\right)|\Lambda|$$

many of these eigenfunctions, as claimed. This argument is taken from [3], where Bourgain studied the regularity of the IDS for the almost Mathieu (and more general) equations by means of this method. The main point of his paper was to show that the norm of the Green's function can be controlled by quadratic polynomials. This allows him to prove Hölder-($\frac12-$) regularity for the IDS. Using Proposition 2.1 instead of the explicit control via (quadratic) polynomials gives a correspondingly weaker result. Presently it is not clear how to improve on it. The same argument is also used in [4]. We now turn to a more detailed account of the regularity result for the IDS in the $\mathbb{Z}^2$ model.

Proposition 2.2. Given ε, let $\Omega_\varepsilon$, $\lambda_0$, and $N_0$ be as in Proposition 2.1. There exist constants c = c(ε, v), C = C(ε, v) so that for any $\omega \in \Omega_\varepsilon$, $\lambda \ge \lambda_0$, and any interval J ⊂ ℝ the bound (2.5) holds for sufficiently large N, i.e.,

$$\sup_{\theta\in\mathbb{T}^2} \#\left[\alpha : E_\alpha(\theta) \in J\right] \le C\exp\left(-|\log\delta|^c\right) N^2,$$

where δ = |J|. In particular, for any such ω and λ, the IDS has modulus of continuity $\exp(-|\log t|^c)$.
Proof. Assume ε > 0 and $\omega \in \Omega_\varepsilon$ are fixed. Fix some small interval of energies J of length δ ≪ 1 and center E. Further, let $\Lambda \subset \mathbb{Z}^2$ be a large square centered at 0. Let $N_0 \asymp |\log\delta|^{1/b}$ be an integer, where b is as in (2.4) and the multiplicative constant in this notation is taken sufficiently small. Increasing N if necessary, we can assume that Λ can be partitioned into squares $\{Q_j\}$ of side length $N_0$. Let $\Lambda_0$ denote the square centered at 0 with side length $N_0$ and set $B_0 := \{\theta \in \mathbb{T}^2 : G_{\Lambda_0}(\theta,E) \text{ is bad}\}$, see (2.3). Clearly, $G_{Q_j}(0,E)$ is good iff $(m_j\omega_1, n_j\omega_2) \notin B_0 \bmod \mathbb{Z}^2$, where $(m_j,n_j) \in \mathbb{Z}^2$ is the center of $Q_j$. We have set the phase θ = 0 merely for convenience; any other phase works just as well. Since Proposition 2.1 provides the measure estimate $\mathrm{mes}(B_0) \le \exp(-N_0^\rho)$, and $B_0$ is semi-algebraic of degree at most $N_0^{C_1}$, one has

$$\#\left[(n_1,n_2) \in \Lambda : (n_1\omega_1, n_2\omega_2) \in B_0 \bmod \mathbb{Z}^2\right] \lesssim N_0^{C_1}\exp(-N_0^\rho)\,|\Lambda|. \tag{2.8}$$

To verify this claim, observe first that we may replace the potential function with a trigonometric polynomial of degree $N_0^2$, thus providing the semi-algebraic property of $B_0$. More precisely, by analyticity of v there is a trigonometric polynomial $P_N$ of degree at most $N^2$, say, such that $\|v - P_N\|_\infty \lesssim e^{-N}$ when N is large. This introduces at most an exponentially small error that is negligible, whereas the set $\tilde B_0$ defined in terms of $P_N$ rather than v is semi-algebraic of degree at most $N^{10}$, see [6], Remark 4.3 for details. For convenience, we do not distinguish between $B_0$ and $\tilde B_0$. In light of this fact, we may cover $B_0$ by at most $N_0^{C_1}$ many disks of size $\exp(-\frac12 N_0^\rho)$. Since the vector $(\omega_1,\omega_2)$ satisfies a Diophantine condition, one concludes (2.8) by means of standard discrepancy considerations, say. Therefore, the number of bad squares $Q_j$ in Λ cannot exceed the right-hand side of (2.8). In particular, $\Lambda^* = \bigcup_{\mathrm{bad}} Q_j$ satisfies

$$|\Lambda^*| \lesssim N_0^{C_1}\exp(-N_0^\rho)\,|\Lambda|. \tag{2.9}$$
On the other hand,

$$\|G_{\Lambda\setminus\Lambda^*}(0,E)\| \lesssim e^{N_0^b}. \tag{2.10}$$

This follows by means of a straightforward application of the resolvent identity. The details can be found in Lemma 2.2 and Lemma 4.4 of [6]¹. Denote the eigenfunctions of $H_\Lambda = H_\Lambda(0)$ with eigenvalues falling into the interval J by $\{\xi_j\}_{j=1}^M$. Let ξ be one of them, with eigenvalue E′. By definition,

$$R_{\Lambda\setminus\Lambda^*}(H_\Lambda - E)R_{\Lambda\setminus\Lambda^*}\xi + R_{\Lambda\setminus\Lambda^*}(H_\Lambda - E)R_{\Lambda^*}\xi = (E' - E)\,R_{\Lambda\setminus\Lambda^*}\xi.$$

¹ Strictly speaking, one needs to define a good square so that every point in it is surrounded by a good elementary, i.e., L-shaped, region of a certain size. But this only brings in another factor of $N_0^C$. For more details concerning elementary regions as well as the details of the resolvent identity argument we refer the reader to [6].
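Applying $G_{\Lambda\setminus\Lambda^*}(0,E)$ to this line yields

$$R_{\Lambda\setminus\Lambda^*}\xi + G_{\Lambda\setminus\Lambda^*}(0,E)(H_\Lambda - E)R_{\Lambda^*}\xi = (E' - E)\,G_{\Lambda\setminus\Lambda^*}(0,E)R_{\Lambda\setminus\Lambda^*}\xi. \tag{2.11}$$

Let P denote the projection onto the range of $G_{\Lambda\setminus\Lambda^*}(0,E)(H_\Lambda - E)R_{\Lambda^*}$. Clearly, the dimension of this range does not exceed |Λ*|. Thus rank(P) ≤ |Λ*|. In view of (2.11) and (2.10), and since |E′ − E| ≤ δ,

$$\|R_{\Lambda\setminus\Lambda^*}\xi - PR_{\Lambda\setminus\Lambda^*}\xi\| \lesssim \delta e^{N_0^b}. \tag{2.12}$$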
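By taking $N_0$ to be a small multiple of $|\log\delta|^{1/b}$, the right-hand side of (2.12) can be made less than $\frac{1}{10}$, say. Invoking (2.12) for each of the (orthonormal) $\xi_j$ shows that

$$\begin{aligned}
M = \sum_{j=1}^M \|\xi_j\|^2 &\le \sum_{j=1}^M \left(\|PR_{\Lambda\setminus\Lambda^*}\xi_j\|^2 + \|R_{\Lambda^*}\xi_j\|^2\right) + \frac{M}{2} \\
&\le \frac{M}{2} + 2\sum_{j=1}^M \|P\xi_j\|^2 + 3\sum_{j=1}^M \|R_{\Lambda^*}\xi_j\|^2 \\
&\le \frac{M}{2} + 2\,\mathrm{trace}(P) + 3\,\mathrm{trace}(R_{\Lambda^*}) \le \frac{M}{2} + 2\,\mathrm{rank}(P) + 3|\Lambda^*| \\
&\le \frac{M}{2} + C N_0^C\exp(-N_0^\rho)\,|\Lambda|.
\end{aligned}$$

Hence $M \le 2CN_0^C\exp(-N_0^\rho)|\Lambda|$, and with $N_0 \asymp |\log\delta|^{1/b}$ this yields the desired bound (2.5).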
Inspection of this proof shows that an LDE (2.4) with b = ρ = 1 implies Hölder continuity of the IDS (in any dimension). This should explain why it was possible to establish Hölder continuity of the IDS in [11] from the "sharp LDEs" established there. Note, however, that the previous argument is more satisfactory, as it bounds the number of eigenvalues inside a small interval. In contrast, [11] uses the Thouless formula, which only applies to the limit. As far as LDEs with b = ρ = 1 are concerned, they have been established only for the case of a one-dimensional equation with one frequency. In all other cases where LDEs are known, the value of ρ is rather small.

For the remainder of this section we discuss LDEs for logarithms of determinants and the "Thouless formula". More precisely, we shall consider squares $\Lambda \subset \mathbb{Z}^2$ centered at the origin and we define

$$f_{\Lambda,E}(\theta) := \frac{1}{|\Lambda|}\log|\det(H_\Lambda(\theta) - E)|.$$

Let

$$\gamma_\Lambda(E) := \int_{\mathbb{T}^2} f_{\Lambda,E}(\theta)\,d\theta \quad\text{and}\quad \gamma(E) := \int \log|E - E'|\,dk(E').$$
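The quantities $f_{\Lambda,E}$ and $\gamma_\Lambda(E)$ are straightforward to compute for small squares. The sketch below is a toy illustration with our own conventions:

- Hopping is −1 between nearest neighbours.
- The potential v is hypothetical and supplied by the caller.
- A corner parameter, our own device, is used to translate squares.
- log|det| is computed by Gaussian elimination with partial pivoting.
- $\gamma_\Lambda(E)$ is approximated by a midpoint rule on $\mathbb{T}^2$. For real E the integrand has logarithmic singularities, so this is only rough.

The test also checks the exact covariance $f_{\Lambda+(1,1),E}(\theta) = f_{\Lambda,E}(\theta+\omega)$. This covariance is what turns the comparison of $\Lambda_1$ and $\Lambda_2 = \Lambda_1 + (1,1)$ in the proof of Proposition 2.5 into an almost invariance property.

```python
import math


def log_abs_det(m):
    """log|det m| by Gaussian elimination with partial pivoting (m: list of lists)."""
    a = [row[:] for row in m]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return float("-inf")  # singular matrix
        a[col], a[pivot] = a[pivot], a[col]
        total += math.log(abs(a[col][col]))
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return total


def hamiltonian(corner, L, lam, theta, omega, v):
    """H_Lambda(theta) for the L x L square Lambda with lower-left corner `corner`
    (assumed conventions: hopping -1, potential lam * v, Dirichlet boundary conditions)."""
    sites = [(corner[0] + i, corner[1] + j) for i in range(L) for j in range(L)]
    index = {s: k for k, s in enumerate(sites)}
    H = [[0.0] * len(sites) for _ in sites]
    for (n1, n2), k in index.items():
        H[k][k] = lam * v(theta[0] + n1 * omega[0], theta[1] + n2 * omega[1])
        for nb in ((n1 + 1, n2), (n1, n2 + 1)):
            if nb in index:
                H[k][index[nb]] = H[index[nb]][k] = -1.0
    return H


def f_lambda(corner, L, lam, E, theta, omega, v):
    """f_{Lambda,E}(theta) = log|det(H_Lambda(theta) - E)| / |Lambda|."""
    H = hamiltonian(corner, L, lam, theta, omega, v)
    for k in range(len(H)):
        H[k][k] -= E
    return log_abs_det(H) / len(H)


def gamma_lambda(corner, L, lam, E, omega, v, grid=24):
    """Midpoint-rule approximation of gamma_Lambda(E), the integral of f_{Lambda,E} over T^2."""
    pts = [(i + 0.5) / grid for i in range(grid)]
    values = [f_lambda(corner, L, lam, E, (t1, t2), omega, v) for t1 in pts for t2 in pts]
    return sum(values) / len(values)


def v(t1, t2):
    """Hypothetical sample potential."""
    return math.cos(2 * math.pi * t1) + math.cos(2 * math.pi * t2)


omega = (math.sqrt(2) - 1, (math.sqrt(5) - 1) / 2)  # arbitrary sample frequencies
print(gamma_lambda((0, 0), 4, 5.0, 0.3, omega, v, grid=12))
```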
For the case of Im(E) ≠ 0 the quantity γ(E) was introduced by Craig and Simon [8]. Their objective was to prove the log-Hölder continuity of the IDS (in all dimensions). They accomplished this by showing that γ(E) ≥ 0 for all E with Im(E) ≠ 0.
Of course, it follows from the fact that the IDS exists that for a.e. θ,

$$f_{\Lambda,E}(\theta) \to \gamma(E) \quad\text{as } \mathrm{diam}(\Lambda)\to\infty, \quad\text{for all } E \text{ with } \mathrm{Im}(E) \neq 0.$$

However, it is much harder to show that this limit is always nonnegative, and Craig and Simon achieved this by means of a reduction to strips. In the latter case one has an interpretation of γ(E) as an average of all nonnegative Lyapunov exponents. Observe that their result implies by means of Fatou's lemma that γ(E) ≥ 0 for all real E.
As already apparent in the proof of the Thouless formula in [1], it is more subtle to understand whether or not the limit of $f_{\Lambda,E}(\theta)$ exists for real E and equals this integral. By general principles one can easily conclude that for a.e. θ it exists in an $L^2$ sense in E, cf. Proposition 2.7 below. We show here that for large disorders and most ω one has $\gamma_\Lambda(E) \to \gamma(E)$ for all E. Moreover, we obtain the rate of convergence $|\gamma_\Lambda(E) - \gamma(E)| \lesssim |\Lambda|^{-\delta}$ with some constant δ > 0. Finally, we establish an LDE for $f_{\Lambda,E}$ for large disorders, see Proposition 2.5 below. The argument proving this proposition is again very general and applies to all cases (in any dimension) where an LDE for the Green's function is known. By means of this LDE one concludes that for all E, $f_{\Lambda,E}(\theta) \to \gamma(E)$ for a.e. θ. Presently it is not clear whether this can be true for a.e. θ and all E.

The following lemma is Weyl's well-known eigenvalue comparison theorem for Hermitian matrices. The proof is an immediate consequence of the min-max characterization of eigenvalues, see Theorem 8.4 in [2].

Lemma 2.3. Let A, B be Hermitian d × d matrices. Suppose rank(A − B) ≤ k. If $a_1 \le a_2 \le \dots \le a_d$ and $b_1 \le b_2 \le \dots \le b_d$ denote the eigenvalues of A and B, respectively, then

$$a_{\ell+k} \ge b_\ell \quad \text{for any } d \ge \ell + k \ge \ell \ge 1, \qquad b_\ell \ge a_{\ell-k} \quad \text{for any } d \ge \ell \ge \ell - k \ge 1.$$
This lemma is used to compare determinants, as stated in the following result.

Corollary 2.4. Suppose A, B are Hermitian with rank(A − B) ≤ k. If dist(Spec(A), 0) ≥ ρ > 0, then for all t ∈ ℝ,

$$|\det(B + it)| \le \rho^{-4k}\,\|B + it\|^{4k}\,|\det(A + it)|.$$
Proof. Consider first the case t = 0. Let the eigenvalues of A be given by

$$a_1 \le a_2 \le \dots \le a_{\ell-1} \le a_\ell < 0 < a_{\ell+1} \le a_{\ell+2} \le \dots \le a_d.$$

The eigenvalues of B are denoted by $b_j$. Then, by Weyl's lemma,

$$\begin{aligned}
|\det A| &= a_d a_{d-1}\cdots a_{\ell+1}\,|a_\ell a_{\ell-1}\cdots a_1| \\
&\ge b_{d-k}b_{d-k-1}\cdots b_{\ell+k+1}\; a_{\ell+2k}\cdots a_{\ell+1}\; |a_\ell a_{\ell-1}\cdots a_{\ell-2k+1}\; b_{\ell-k}b_{\ell-k-1}\cdots b_{k+2}b_{k+1}| \\
&\ge \rho^{4k}\, b_{d-k}b_{d-k-1}\cdots b_{\ell+k+1}\,|b_{\ell-k}b_{\ell-k-1}\cdots b_{k+2}b_{k+1}|.
\end{aligned}$$

Since each of the remaining 4k eigenvalues of B is bounded in absolute value by ‖B‖, we obtain $|\det B| \le \rho^{-4k}|\det A|\,\|B\|^{4k}$, as claimed. For the case t ≠ 0 simply note that for real numbers a ≥ b ≥ 0 one has |a + it| ≥ |b + it|. Hence the same arguments apply to the general case as well.
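Lemma 2.3 and Corollary 2.4 are easy to test on random examples. The Python sketch below is illustrative only; random real symmetric matrices stand in for Hermitian ones. It perturbs a matrix A by a random symmetric matrix of rank at most k and computes both spectra with a cyclic Jacobi iteration. It then checks the interlacing inequalities and the determinant bound, the latter in logarithmic form. Since B + it is normal, |det(B + it)| and ‖B + it‖ are read off from the eigenvalues. In these random samples ‖B‖ is much larger than ρ, which is also the regime of interest for $H_\Lambda(\theta)$ at large λ. The function names are hypothetical.

```python
import math
import random


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=50):
    """Sorted eigenvalues of a real symmetric matrix (list of lists), cyclic Jacobi method."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for r in range(p + 1, n):
                if a[p][r] == 0.0:
                    continue
                theta = (a[r][r] - a[p][p]) / (2.0 * a[p][r])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and r
                    akp, akr = a[k][p], a[k][r]
                    a[k][p], a[k][r] = c * akp - s * akr, s * akp + c * akr
                for k in range(n):  # rows p and r
                    apk, ark = a[p][k], a[r][k]
                    a[p][k], a[r][k] = c * apk - s * ark, s * apk + c * ark
    return sorted(a[i][i] for i in range(n))


def random_symmetric(d, rng):
    m = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
    return [[m[i][j] + m[j][i] for j in range(d)] for i in range(d)]


def rank_k_perturbation(a, k, rng):
    """Return B = A + sum_{i<k} s_i u_i u_i^T, so that rank(A - B) <= k."""
    d = len(a)
    b = [row[:] for row in a]
    for _ in range(k):
        u = [rng.uniform(-1, 1) for _ in range(d)]
        s = rng.uniform(-3, 3)
        for i in range(d):
            for j in range(d):
                b[i][j] += s * u[i] * u[j]
    return b


def check_weyl_and_corollary(d=10, k=1, trials=20, t_values=(0.0, 0.7), seed=1, eps=1e-9):
    rng = random.Random(seed)
    for _ in range(trials):
        A = random_symmetric(d, rng)
        B = rank_k_perturbation(A, k, rng)
        a, b = jacobi_eigenvalues(A), jacobi_eigenvalues(B)
        # Lemma 2.3 with 0-based indices: a[l+k] >= b[l] and b[l] >= a[l-k]
        if any(a[l + k] < b[l] - eps for l in range(d - k)):
            return False
        if any(b[l] < a[l - k] - eps for l in range(k, d)):
            return False
        rho = min(abs(x) for x in a)  # dist(Spec(A), 0)
        for t in t_values:
            # B + it is normal, so |det| and the norm come from |b_j + it|
            log_det_a = sum(0.5 * math.log(x * x + t * t) for x in a)
            log_det_b = sum(0.5 * math.log(x * x + t * t) for x in b)
            log_norm_b = max(0.5 * math.log(x * x + t * t) for x in b)
            bound = 4 * k * (log_norm_b - math.log(rho)) + log_det_a
            if log_det_b > bound + eps:
                return False
    return True


print(check_weyl_and_corollary())
```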
The following proposition shows that $f_{\Lambda,E}$ does not deviate much from its mean $\gamma_\Lambda(E)$. The proof uses the LDEs for the Green's function. In fact, passing from the Green's functions to the determinants is rather straightforward.

Proposition 2.5. Let ε > 0 and let $\Omega_\varepsilon$, $\lambda_0$, $N_0$ be as in Proposition 2.1. Fix any $\omega \in \Omega_\varepsilon$, and let $\lambda \ge \lambda_0$ and $N \ge N_0 + (\log\lambda)^C$. Then for any square Λ of size N and any $E \in [-C\log\lambda, C\log\lambda]$,

$$\mathrm{mes}\left[\theta \in \mathbb{T}^2 : |f_{\Lambda,E}(\theta) - \gamma_\Lambda(E)| > N^{-\delta}\right] < \exp(-N^\delta). \tag{2.13}$$
Here δ > 0 is some small constant.

Proof. Let $\Lambda_1 \subset \mathbb{Z}^2$ be a large square of size N and set $\Lambda_2 := \Lambda_1 + (1,1)$. Let Λ be the smallest square containing $\Lambda_1 \cup \Lambda_2$ (it has size N + 1), see Fig. 1. Define $H_\Lambda^{(j)}(\theta)$ to be the operator that is obtained from $H_\Lambda(\theta)$ by cutting the bonds along the boundary of $\Lambda_j$ that lies inside Λ. More precisely, $H_\Lambda^{(j)}(\theta)$ is the direct sum of $H_{\Lambda_j}(\theta)$ and the operator on $\ell^2(\Lambda\setminus\Lambda_j)$ that acts solely by multiplication with the potential at any site in $\Lambda\setminus\Lambda_j$. Observe that for all θ,

$$\mathrm{rank}\left(H_\Lambda(\theta) - H_\Lambda^{(j)}(\theta)\right) \le 10N \quad\text{for } j = 1, 2.$$

By Proposition 2.1 we know that

$$\|G_\Lambda(\theta,E)\| + \max_{j=1,2}\|G_{\Lambda_j}(\theta,E)\| < e^{N^b}$$

up to a θ-set of measure less than $e^{-N^\rho}$. Moreover, in view of (2.2) one easily concludes that up to a θ-set of measure $\lesssim e^{-cN^b}$,

$$\|(H_\Lambda^{(j)}(\theta) - E)^{-1}\| \lesssim e^{N^b}.$$
Fig. 1. The three squares
This holds because the eigenvalues of the operator on $\ell^2(\Lambda\setminus\Lambda_j)$ are simply the values of the potential along the boundary. For such θ Corollary 2.4 implies that

$$|\det(H_\Lambda(\theta) - E)| < (C\lambda)^{40N}e^{40N^{b+1}}\,|\det(H_{\Lambda_j}(\theta) - E)|,$$

$$|\det(H_\Lambda(\theta) - E)| > (C\lambda)^{-40N}e^{-40N^{b+1}}\,|\det(H_{\Lambda_j}(\theta) - E)|,$$

where we have absorbed the contribution from the boundary strip into the error terms. Therefore,

$$\left|\frac{1}{|\Lambda|}\log|\det(H_\Lambda(\theta) - E)| - \frac{1}{|\Lambda_j|}\log|\det(H_{\Lambda_j}(\theta) - E)|\right| \lesssim N^{b-1} + \frac{\log\lambda}{N}$$

for j = 1, 2 and such θ. This clearly implies that

$$\left|\frac{1}{|\Lambda_1|}\log|\det(H_{\Lambda_1}(\theta) - E)| - \frac{1}{|\Lambda_2|}\log|\det(H_{\Lambda_2}(\theta) - E)|\right| \lesssim N^{b-1} + \frac{\log\lambda}{N}$$

up to a θ-set of measure less than $e^{-N^\rho}$. Hence, for any (large) square of size $N \gtrsim (\log\lambda)^{1/b}$,

$$|f_{\Lambda,E}(\theta) - f_{\Lambda,E}(\theta + \omega)| \lesssim N^{-\delta} \tag{2.14}$$

up to a θ-set of measure less than $e^{-N^\rho}$ (with δ = 1 − b). This is an almost invariance property like the one used in [5] and [11] for the monodromy matrices. It is clear that such an invariance property cannot hold uniformly in θ in the case of the determinant.

We shall now apply Theorem 3.7 to a suitably normalized version of the function $f_{\Lambda,E}$. Firstly, observe that

$$\sup_{|z_1|,|z_2|\le 2} f_{\Lambda,E}(z_1,z_2) \le C\log\lambda$$

since |E| ≤ C log λ. It therefore remains to check that $f_{\Lambda,E}(\theta_1,\theta_2)$ is not too negative for some $\theta \in \mathbb{T}^2$. This can be seen as follows: Let $N_1 = (\log N)^{2/\rho}$ and for every point x ∈ Λ consider a square $\Lambda_{N_1}(x) \subset \Lambda$ of size about $N_1$. By Proposition 2.1 the Green's function $G_{\Lambda_{N_1}(x)}(\theta,E)$ is good for every x ∈ Λ up to a θ-set of measure less than

$$N^2 e^{-N_1^\rho} \le N^2 e^{-(\log N)^2} < \tfrac12,$$

say, for large N. This implies by means of the resolvent identity that

$$\inf_{\theta\in\mathbb{T}^2}\|G_\Lambda(\theta,E)\| \le e^{N_1},$$

see Lemma 2.2 in [6]. Hence,

$$\sup_{\theta\in\mathbb{T}^2} f_{\Lambda,E}(\theta) \gtrsim -N_1.$$
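In view of the preceding,

$$\frac{f_{\Lambda,E} + C(\log N)^{2/\rho}}{C(\log N)^{2/\rho} + C\log\lambda}$$

is a separately subharmonic function, see Definition 3.5. One now applies Theorem 3.7 below with $r = N^{-\varepsilon}$, $\gamma = \frac18$, and $\rho = \frac12$. Thus, there is an $N^{-\varepsilon}\times N^{-\varepsilon}$-rectangle $R \subset \mathbb{T}^2$ with the property that

$$|f_{\Lambda,E}(\theta) - f_{\Lambda,E}(\theta')| \lesssim N^{-\varepsilon/2}\left(\log\lambda + (\log N)^{2/\rho}\right) \quad\text{for any } \theta,\theta' \in R\setminus B, \tag{2.15}$$

where $\mathrm{mes}[B] < e^{-N^{\varepsilon/8}}$. Here we take $\varepsilon = \frac{\delta}{100}$. By the Diophantine property of ω any point of $\mathbb{T}^2$ can be moved into R by no more than $N^{3\varepsilon}$ ω-steps. In view of (2.14) and (2.15) this implies that

$$|f_{\Lambda,E}(\theta) - f_{\Lambda,E}(\theta')| < N^{-\frac{\delta}{300}} \quad\text{for any } \theta,\theta' \in \mathbb{T}^2\setminus\tilde B, \tag{2.16}$$

where $\mathrm{mes}[\tilde B] < \exp(-N^{\frac{\delta}{800}})$ and $N > (\log\lambda)^C$. In view of Lemma 3.6,

$$\int_{\mathbb{T}^2} f_{\Lambda,E}(\theta)^2\,d\theta < C\left(\log\lambda + (\log N)^{2/\rho}\right)^2.$$

The desired bound (2.13) now follows from (2.16) and Cauchy–Schwarz. Next we turn to considerations involving the convergence of the $\gamma_\Lambda$. For technical reasons, we also allow complex energies. In what follows, E and η are always real.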
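Lemma 2.6. Under the assumptions of Proposition 2.5 there are constants δ > 0 and C(λ, |E| + |η|) such that

$$|\gamma_\Lambda(E + i\eta) - \gamma_{2\Lambda}(E + i\eta)| \le C(\lambda, |E| + |\eta|)\,(\mathrm{diam}\,\Lambda)^{-\delta}$$

for all squares $\Lambda \subset \mathbb{Z}^2$. Here 2Λ denotes the double of Λ. In particular, the limit

$$\lim_{\ell\to\infty}\gamma_{2^\ell\Lambda}(E + i\eta) =: \gamma_\infty^\Lambda(E + i\eta)$$

exists for every Λ and E + iη, and for all ℓ ≥ 0,

$$|\gamma_{2^\ell\Lambda}(E + i\eta) - \gamma_\infty^\Lambda(E + i\eta)| \lesssim (2^\ell\,\mathrm{diam}\,\Lambda)^{-\delta}$$

uniformly in E + iη in bounded sets.

Proof. Fix a large square 2Λ of size 2N, say. Partition it into four congruent squares $\{\Lambda_j\}_{j=1}^4$ of size N. Let $\tilde H(\theta)$ denote the operator which is the direct sum of the $H_{\Lambda_j}(\theta)$. Then $\mathrm{rank}[H_{2\Lambda}(\theta) - \tilde H(\theta)] \le 10N$. By Proposition 2.1 one has

$$\|G_{2\Lambda}(\theta,E)\| + \max_{j=1,2,3,4}\|G_{\Lambda_j}(\theta,E)\| < e^{N^b}$$

up to a θ-set of measure not exceeding $e^{-N^\rho}$. Corollary 2.4 therefore implies that

$$\left|\log|\det(H_{2\Lambda}(\theta) - (E+i\eta))| - \sum_{j=1}^4\log|\det(H_{\Lambda_j}(\theta) - (E+i\eta))|\right| \lesssim N^{b+1} + N\log(\lambda + |E| + |\eta|)$$

for all $\theta \in \mathcal{G}_\Lambda(E)$, where $\mathrm{mes}[\mathbb{T}^2\setminus\mathcal{G}_\Lambda(E)] < e^{-N^\rho}$. Integrating this last line over $\mathcal{G}_\Lambda(E)$ and applying Cauchy–Schwarz to the integral over $\mathbb{T}^2\setminus\mathcal{G}_\Lambda(E)$ (as in the previous proof) yields

$$|\gamma_{2\Lambda}(E + i\eta) - \gamma_\Lambda(E + i\eta)| \le C(\lambda, |E| + |\eta|)\,N^{-\delta}$$

with δ = 1 − b, as claimed.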
In the following proposition we identify the limit $\gamma_\infty^\Lambda$.

Proposition 2.7. Assume that $\omega \in \Omega_\varepsilon$ and that λ is large, cf. Proposition 2.5. Then the limit $\gamma_\infty^\Lambda$ from the previous lemma does not depend on Λ. In fact,

$$\gamma_\infty^\Lambda(E) = \int \log|E - E'|\,dk(E') \quad\text{for all } E.$$

Moreover, for every E,

$$f_{\Lambda,E}(\theta) \to \int \log|E - E'|\,dk(E') \quad\text{for a.e. } \theta.$$
Proof. This is basically the same as Sect. 4 in [1]. Denote

$$\int_{\mathbb{T}^2} N_\theta^{(\Lambda)}(\cdot)\,d\theta =: N^{(\Lambda)}(\cdot),$$

see Sect. 1. Then $N^{(\Lambda)}(E) \to k(E)$ as diam(Λ) → ∞ for every E that is not an atom of k (in particular, for a.e. E). Hence they also converge in $L^2$. By definition and standard properties of the Hilbert transform,

$$\gamma_\Lambda(E) = \int \log|E - E'|\,dN^{(\Lambda)}(E') = \int \frac{N^{(\Lambda)}(E')}{E - E'}\,dE',$$

where the second equality holds for a.e. E. By $L^2$ boundedness of the Hilbert transform one has

$$\int\frac{N^{(\Lambda)}(E')}{E - E'}\,dE' \to \int\frac{k(E')}{E - E'}\,dE' = \int\log|E - E'|\,dk(E') \quad\text{as } \mathrm{diam}(\Lambda)\to\infty$$

in the $L^2$ sense w.r.t. E. By the previous lemma, therefore,

$$\gamma_\infty^\Lambda(E) = \int\frac{k(E')}{E - E'}\,dE' = \int\log|E - E'|\,dk(E') \tag{2.17}$$

for a.e. E (note that the previous equality holds for all E + iη with η ≠ 0 by virtue of the existence of the IDS as a weak limit). The right-hand side is clearly subharmonic in (complex) E. It is important to recall at this point that subharmonicity requires both the sub-mean value property and upper semi-continuity (the latter being Fatou in the case of the logarithmic integral). As a uniform limit of continuous subharmonic functions, $\gamma_\infty^\Lambda(E)$ is also subharmonic. Indeed, Lemma 2.6 guarantees that uniform convergence takes place in bounded sets of the complex E + iη plane. Since two subharmonic functions that are equal a.e. are equal everywhere (which is an immediate consequence of the aforementioned two properties of subharmonic functions), it follows that (2.17) holds for all E. The final statement of the proposition is obtained by combining the previous one with the LDE of Proposition 2.5.

3. Polar Sets and Cartan's Theorem

In this section we present some material that is basically already contained in [11], see Sect. 8 there. However, the two-dimensional Cartan theorem proved there is not strong enough for our purposes, because the functions are assumed to be bounded. It is simple to remove that assumption, though. As the resulting theorem, see Theorem 3.7 below, has both a stronger conclusion and weaker assumptions, we have decided to include it here with all details. The following lemma is Cartan's theorem, see [14] Section 11.2. It differs from the statement there only by allowing for the parameter ε.
Integrated Density of States for Schrödinger Operators
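Lemma 3.1. Let
$$u(z) = \int_{\mathbb{C}} \log|z - \zeta|\, d\mu(\zeta) \tag{3.1}$$
for some positive finite measure $\mu$. For any $0 < \varepsilon, H < 1$ there exist disks $\{D(z_j, r_j)\}_{j=1}^\infty$ with the property that
$$\sum_j r_j^{\varepsilon} \le (5H)^{\varepsilon}, \tag{3.2}$$
$$u(z) > -\|\mu\|\Big(\varepsilon^{-1} + \log\frac{1}{H}\Big) \quad \text{for all } z \in \mathbb{C} \setminus \bigcup_{j=1}^\infty D(z_j, r_j). \tag{3.3}$$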
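Proof. Fix $\varepsilon > 0$. For any $p > 0$ we say that $z$ is $p$-good if
$$\mu(D(z, r)) \le p\, r^{\varepsilon} \quad \forall\, r > 0,$$
and $p$-bad otherwise. By a well-known covering theorem, see Stein [15], page 9, there are pairwise disjoint disks $\{D(z_j, r_j/5)\}_{j=1}^\infty$ (possibly empty) with the property that
$$B_{\varepsilon,p} := \{z \in \mathbb{C} \mid z \text{ is } p\text{-bad}\} \subset \bigcup_{j=1}^\infty D(z_j, r_j)$$
and
$$\sum_j r_j^{\varepsilon} \le 5^{\varepsilon}\, \frac{1}{p}\, \|\mu\|.$$
Setting $p = H^{-\varepsilon}\|\mu\|$, this latter inequality is exactly (3.2). Furthermore, if $z \notin B_{\varepsilon,p}$, then
$$u(z) \ge \int_{|z-\zeta|\le 1} \log|z - \zeta|\, d\mu(\zeta) = -\int_0^1 \frac{\mu(D(z,r))}{r}\, dr \ge -\int_0^H p\, r^{\varepsilon}\, \frac{dr}{r} - \|\mu\| \int_H^1 \frac{dr}{r} = -\|\mu\|\Big(\varepsilon^{-1} + \log\frac{1}{H}\Big),$$
as claimed.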
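Two elementary computations are used in the last display; we spell them out for convenience, with $p = H^{-\varepsilon}\|\mu\|$ as above. First, for $|z-\zeta| \le 1$ one has $\log|z-\zeta| = -\int_{|z-\zeta|}^1 dr/r$, so Fubini gives the layer-cake identity below, while the part of $\mu$ with $|z-\zeta| > 1$ only contributes positively to $u(z)$. Second, one uses $\mu(D(z,r)) \le p\,r^{\varepsilon}$ (valid since $z$ is $p$-good) for $r < H$, and $\mu(D(z,r)) \le \|\mu\|$ for $H \le r \le 1$:
$$\int_{|z-\zeta|\le 1}\log|z-\zeta|\,d\mu(\zeta) = -\int_0^1 \frac{\mu(\{\zeta : |z-\zeta| \le r\})}{r}\,dr, \qquad \int_0^H p\,r^{\varepsilon}\,\frac{dr}{r} = \frac{p\,H^{\varepsilon}}{\varepsilon} = \frac{\|\mu\|}{\varepsilon}, \qquad \|\mu\|\int_H^1\frac{dr}{r} = \|\mu\|\log\frac{1}{H}.$$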
Observe that this has the following well-known consequence. Corollary 3.2. Let $u$ be as in (3.1). Then $\dim[u = -\infty] = 0$, where dim refers to Hausdorff dimension.
W. Schlag
Definition 3.3. Let $0 < H < 1$. For any subset $B \subset \mathbb{C}$ we say that $B \in \mathrm{Car}_1(H)$ if
$$B \subset \bigcup_j D(z_j, r_j) \quad \text{with} \quad \sum_j r_j \le H. \tag{3.4}$$
If $d$ is a positive integer greater than one and $B \subset \mathbb{C}^d$, we define inductively that $B \in \mathrm{Car}_d(H)$ if there exists some $B_0 \in \mathrm{Car}_{d-1}(H)$ so that
$$B = \{(z_1, z_2, \ldots, z_d) : (z_2, \ldots, z_d) \in B_0 \text{ or } z_1 \in B(z_2, \ldots, z_d) \text{ for some } B(z_2, \ldots, z_d) \in \mathrm{Car}_1(H)\}.$$
We refer to the sets in $\mathrm{Car}_d(H)$ for any $d$ and $H$ collectively as Cartan sets. The following lemma collects some well-known facts, see [14] and [13]. The proof of this lemma is in [11] (we assume there that $u$ is bounded, but this assumption is irrelevant). Lemma 3.4. Suppose $u : D(0,2) \to \mathbb{R} \cup \{-\infty\}$ is a subharmonic function satisfying
$$\sup_{z \in D(0,2)} u(z) \le 1 \quad \text{and} \quad \sup_{-1<x<1} u(x) = 0.$$
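Let $\mu$ be the Riesz measure of $u$. For any $z_0 \in D(0, \frac12)$, $0 < r < \frac12$, and $H \in (0,1)$ there exists $B \in \mathrm{Car}_1(H)$ so that
$$|u(z) - u(z')| < C\Big[\mu(D(z_0, r)) \log\frac{1}{H} + |z - z'| \Big(1 + \int_{D(0,1)\setminus D(z_0,r)} \frac{d\mu(\zeta)}{|z_0 - \zeta|}\Big)\Big] \tag{3.5}$$
for all $z, z' \in D(z_0, r/2) \setminus B$. In particular, if for some $A \ge 1$
$$M_1\mu(z_0) = \sup_{0 < t < 1} \frac{\mu(D(z_0, t))}{t} \le A, \tag{3.6}$$
then
$$|u(z) - u(z')| < C A \Big[r \log\frac{1}{H} + |z - z'| \log\frac{1}{r}\Big] \quad \text{for all } z, z' \in D(z_0, r/2) \setminus B. \tag{3.7}$$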
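The passage from (3.5) to (3.7) is a dyadic summation, sketched here for convenience (the decomposition is ours, not quoted from [11]). By (3.6), $\mu(D(z_0,r)) \le Ar$. Splitting $D(0,1)\setminus D(z_0,r)$ into the annuli $2^k r \le |\zeta - z_0| < 2^{k+1}r$ and applying (3.6) to those with $2^{k+1}r < 1$,
$$\int_{D(0,1)\setminus D(z_0,r)}\frac{d\mu(\zeta)}{|z_0-\zeta|} \le \sum_{k\,:\,2^{k+1}r<1}\frac{\mu(D(z_0,2^{k+1}r))}{2^k r} + 2\mu(D(0,1)) \le \sum_{k\,:\,2^{k+1}r<1} 2A + 2\mu(D(0,1)) \le C A\log\frac{1}{r},$$
where the remaining annuli satisfy $|z_0 - \zeta| \ge \frac12$, and we take $\mu(D(0,1))$ to be bounded by an absolute constant under the normalization of Lemma 3.4 (and use $A \ge 1$). Inserting this and $\mu(D(z_0,r)) \le Ar$ into (3.5) gives (3.7).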
Our main concern in this section is to obtain a suitable analogue of the previous lemma that applies to functions of two variables which are subharmonic in each variable. The precise meaning of this is given in the following definition. Definition 3.5. Let $u$ be a continuous function on $\Pi(0,2) := D(0,2) \times D(0,2) \subset \mathbb{C}^2$ with values in $\mathbb{R} \cup \{-\infty\}$ so that
$$\sup_{\Pi(0,2)} u \le 1 \quad \text{and} \quad \sup_{-1<x_1<1,\; -1<x_2<1} u(x_1, x_2) = 0.$$
Suppose further that
$$z_1 \mapsto u(z_1, z_2) \text{ is subharmonic for each } z_2 \in D(0,2), \qquad z_2 \mapsto u(z_1, z_2) \text{ is subharmonic for each } z_1 \in D(0,2). \tag{3.8}$$
Then $u$ will be called separately subharmonic.
We first dispense with a small technical lemma. Lemma 3.6. For any separately subharmonic function $u$ one has
$$\int_{-1}^1 \int_{-1}^1 u^2(x_1, x_2)\, dx_1\, dx_2 < C,$$
where $C$ is some absolute constant. Proof. Applying conformal transformations in each variable separately, one may assume that $u(0,0) = 0$. Then $z_2 \mapsto u(0, z_2)$ is a subharmonic function with bounded Riesz measure and harmonic part. In particular,
$$-C < \int_{-1}^1 u(0, x_2)\, dx_2 < C.$$
Then the subharmonic function
$$v(z_1) := \int_{-1}^1 u(z_1, x_2)\, dx_2$$
satisfies
$$v(0) > -C \quad \text{and} \quad \sup_{|z_1| \le 2} v(z_1) \le 2.$$
Hence, by Cartan's estimate,
$$\mathrm{mes}\,[x_1 \in [-1,1] : v(x_1) \le -Ct] \le e^{-t} \tag{3.9}$$
for every $t \ge 1$. If $v(x_1) \ge -Ct$, then also $\sup_{-1<x_2<1} u(x_1, x_2) \ge -Ct$. Hence, via the Riesz representation in the second variable,
$$\int_{-1}^1 u^2(x_1, x_2)\, dx_2 < Ct^2$$
in that case. One concludes from (3.9) that therefore
$$\mathrm{mes}\Big[x_1 \in [-1,1] : \int_{-1}^1 u^2(x_1, x_2)\, dx_2 > Ct\Big] \le e^{-\sqrt{t}}$$
for $t \ge 1$. Clearly, this implies the lemma.
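The final step is the layer-cake formula. Writing $F(x_1) = \int_{-1}^1 u^2(x_1,x_2)\,dx_2 \ge 0$, bounding the measure by $2$ for $s \le C$ and substituting $s = Ct$ for $s > C$,
$$\int_{-1}^1 F(x_1)\,dx_1 = \int_0^\infty \mathrm{mes}[x_1\in[-1,1] : F(x_1) > s]\,ds \le 2C + C\int_1^\infty e^{-\sqrt{t}}\,dt = 2C + \frac{4C}{e},$$
since $\int_1^\infty e^{-\sqrt{t}}\,dt = \int_1^\infty 2\tau e^{-\tau}\,d\tau = 4/e$.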
Theorem 3.7. Let $u$ be a separately subharmonic function as in Definition 3.5. Fix some $\gamma \in (0, \frac12)$. Given $r \in (0,1)$ and $r < \rho < 1$, there exist $\Omega_0 \subset [-1,1]^2$ with $\mathrm{mes}([-1,1]^2 \setminus \Omega_0) \le \rho$ and a set $B \in \mathrm{Car}_2(e^{-r^{-\gamma}})$, as defined in Definition 3.3, such that for every choice of $(x_1^{(0)}, x_2^{(0)}) \in \Omega_0$ the polydisk $\Pi := D(x_1^{(0)}, r^{1-\gamma}) \times D(x_2^{(0)}, r)$ satisfies
$$|u(z_1, z_2) - u(z_1', z_2')| < \frac{C_\gamma}{\rho^2}\, r^{1-2\gamma} \log\frac{1}{r} \quad \text{for all } (z_1, z_2), (z_1', z_2') \in \Pi \setminus B. \tag{3.10}$$
Proof. Applying conformal transformations in each variable separately, one may assume that $u(0,0) = 0$. Then the subharmonic function $z_2 \mapsto u(0, z_2)$ has finite Riesz measure and bounded harmonic part. The Riesz representation therefore implies that
$$\int_{-1}^1 u(0, x_2)\, dx_2 > -C \tag{3.11}$$
with some absolute constant $C$. For any $z_1 \in D(0,2)$ define
$$v(z_1) = \int_{-1}^1 u(z_1, x_2)\, dx_2. \tag{3.12}$$
The subharmonic function $v : D(0,2) \to \mathbb{R}$ satisfies $v(0) > -C$, see (3.11), and $v \le 2$. Hence its Riesz measure $\mu_v$ is bounded, and
$$\int_{-1}^1 v(x_1)\, dx_1 > -C \tag{3.13}$$
for some absolute constant $C$. Let $M_1\mu_v$ be the maximal function given by (3.6). Clearly, $M_1$ satisfies the usual weak-type $L^1$ inequality
$$\mathrm{mes}\,[x_1 \in [-1,1] : M_1\mu_v(x_1) > \lambda] \le \frac{C}{\lambda}\, \mu_v\big(D(0, \tfrac32)\big). \tag{3.14}$$
In view of (3.13) and (3.11), up to a set of $x_1$-measure at most $\rho$ one has both $M_1\mu_v(x_1) \lesssim \rho^{-1}$ and $v(x_1) \gtrsim -\rho^{-1}$. Pick one such $x_1^{(0)}$. For any $z_2 \in D(0,2)$ let
$$g_t(z_2) = \int_0^1 u(x_1^{(0)} + t e^{2\pi i\theta}, z_2)\, d\theta - u(x_1^{(0)}, z_2). \tag{3.15}$$
By Jensen's formula, see Theorem 2 in Sect. 7.2 of [14],
$$g_t(z_2) = \int_{|z_1 - x_1^{(0)}| < t} \log\frac{t}{|z_1 - x_1^{(0)}|}\, \mu(dz_1, z_2) = \int_0^t \frac{n(s, z_2)}{s}\, ds, \tag{3.16}$$
where $n(s, z_2) = \mu(D(x_1^{(0)}, s), z_2)$ with the Riesz measure $\mu(\cdot, z_2)$ of $u(\cdot, z_2)$. Clearly,
$$\mu_v(D(x_1^{(0)}, s)) = \int_{-1}^1 n(s, x_2)\, dx_2.$$
Therefore, in view of (3.15), (3.16), and our choice of $x_1^{(0)}$,
$$\int_{-1}^1 g_t(x_2)\, dx_2 = \int_0^t \frac{\mu_v(D(x_1^{(0)}, s))}{s}\, ds \le C\,\frac{t}{\rho}.$$
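Formula (3.16) can be checked numerically in the simplest case. The sketch below is ours and purely illustrative: it takes the one-variable function $u(z_1) = \log|z_1 - a|$, whose Riesz measure is the unit point mass at $a$ under the normalization $\mu = \frac{1}{2\pi}\Delta u$ (an assumption about the convention of [14]). Then $n(s) = 1$ for $s > |x - a|$, so the right-hand side of (3.16) equals $\log(t/|x-a|)$ if $|x - a| < t$ and $0$ otherwise. The left-hand side, the circle mean minus the centre value, is computed with the trapezoidal rule.

```python
import math


def jensen_left(a, x, t, n=4096):
    """g_t for u(z_1) = log|z_1 - a|: mean of u over |z_1 - x| = t minus u(x).

    x is a real centre, a a complex point. The periodic trapezoidal rule used
    here converges geometrically as long as the circle stays away from a.
    """
    s = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n
        z = complex(x + t * math.cos(theta), t * math.sin(theta))
        s += math.log(abs(z - a))
    return s / n - math.log(abs(x - a))


def jensen_right(a, x, t):
    """int_0^t n(s)/s ds for the unit point mass at a (Riesz measure of log|z - a|)."""
    d = abs(x - a)
    return math.log(t / d) if d < t else 0.0


for a in (0.3 + 0.1j, 0.8):
    print(a, jensen_left(a, 0.0, 0.5), jensen_right(a, 0.0, 0.5))
```

Both quantities are nonnegative, in line with $g_t \ge 0$, and by linearity the same check extends to any finite sum of point masses.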
Now fix some $r \in (0, 1/2)$ and define
$$G = \sum_{0 \le j < \log(1/r)} 2^{-j} g_{2^j r}. \tag{3.17}$$
The subharmonicity of $z_1 \mapsto u(z_1, z_2)$ implies that $g_t \ge 0$, so that $G$ is a sum of nonnegative terms. By (3.17),
$$\int_{-1}^1 G(x_2)\, dx_2 \le \frac{C}{\rho}\, r \log\frac{1}{r},$$
and thus
$$\mathrm{mes}\Big[x_2 \in [-1,1] : G(x_2) > \frac{C}{\rho^2}\, r \log\frac{1}{r}\Big] < \rho, \tag{3.18}$$
provided $C$ is a sufficiently large absolute constant. For technical reasons we introduce the auxiliary subharmonic function
$$h(z_2) = \int_0^1 u(x_1^{(0)} + r^2 e^{2\pi i\theta}, z_2)\, d\theta \quad \text{for any } z_2 \in D(0,2). \tag{3.19}$$
We denote the Riesz measure of $h$ by $\mu_h$. The function $g_t$ introduced in (3.15) is the difference of two subharmonic functions on $D(0,2)$. Let $\mu_t$ and $\mu_0$ be their respective Riesz measures. By our choice of $x_1^{(0)}$, $v(x_1^{(0)}) > -\frac{C}{\rho}$ and thus
$$\sup_{-1<x_2<1} u(x_1^{(0)}, x_2) > -\frac{C}{\rho}.$$
Since also $\sup_{z_2 \in D(0,2)} u(x_1^{(0)}, z_2) \le 1$, the Riesz measure $\mu_0$ satisfies the bound
$$\mu_0(D(0, 3/2)) \le \frac{C}{\rho}. \tag{3.20}$$
Since the integral in (3.15) is pointwise bigger than $u(x_1^{(0)}, z_2)$ for any choice of $z_2$, one concludes that (3.20) also holds for $\mu_t$ for any $t$ (and thus in particular for $\mu_h$). By the weak-$L^1$ bound on $M_1$ and (3.20), all points $x_2 \in [-1,1]$ up to a set of measure at most $\rho$ satisfy
$$M_1\Big[\sum_{0 \le j < \log(1/r)} \mu_{2^j r} + \mu_0 + \mu_h\Big](x_2) \le \frac{C}{\rho^2} \log\frac{1}{r}. \tag{3.21}$$
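In view of Lemma 3.4, there exists $B_0 = B_0(x_1^{(0)}) \in \mathrm{Car}_1(\exp(-r^{-\gamma}))$ so that for any such $x_2$
$$\sup_{0 \le j < \log(1/r)} |g_{2^j r}(z_2) - g_{2^j r}(z_2')| < \frac{C}{\rho^2}\, r^{1-\gamma} \log\frac{1}{r} \quad \text{for all } z_2, z_2' \in D(x_2, r) \setminus B_0. \tag{3.22}$$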
Combining (3.18) and (3.22) shows that, up to an $x_2$-set of measure at most $\rho$,
$$g_{2^j r}(z_2) \le \frac{C}{\rho^2}\,[2^j r + r^{1-\gamma}] \log\frac{1}{r} \quad \text{for all } z_2 \in D(x_2^{(0)}, r) \setminus B_0 \text{ and all } 0 \le j < C\log\frac{1}{r}. \tag{3.23}$$
Now fix such a point $x_2^{(0)} \in [-1,1]$. Using (3.16), one concludes from (3.23) that
$$\mu(D(x_1^{(0)}, 2^j r), z_2) \le \frac{C}{\rho^2}\,[2^j r + r^{1-\gamma}] \log\frac{1}{r}$$
for all $z_2$ and $j$ as before. Inserting this bound into (3.5) with $H = \exp(-r^{-\gamma})$ and $r^{1-\gamma}$ instead of $r$, one obtains for any such $z_2$ a Cartan set $B(z_2) \in \mathrm{Car}_1(H)$ so that
$$|u(z_1, z_2) - u(z_1', z_2)| \le \frac{C}{\rho^2}\Big[r^{1-\gamma} \log\frac{1}{r} \log\frac{1}{H} + |z_1 - z_1'| \log^2\frac{1}{r}\Big] \le \frac{C}{\rho^2}\, r^{1-2\gamma} \log\frac{1}{r} \tag{3.24}$$
for any $z_1, z_1' \in D(x_1^{(0)}, r^{1-\gamma}) \setminus B(z_2)$.
To control the deviation in $z_2$ we invoke the auxiliary subharmonic function $h$ from above. Because of (3.21), Lemma 3.4 implies that
$$|h(z_2) - h(z_2')| \le \frac{C}{\rho^2}\, r^{1-\gamma} \log\frac{1}{r} \quad \text{for all } z_2, z_2' \in D(x_2^{(0)}, r) \setminus B_1, \tag{3.25}$$
where $B_1 = B_1(x_1^{(0)}) \in \mathrm{Car}_1(H)$, $H = \exp(-r^{-\gamma})$. By the definition of a Cartan set and (3.24),
$$|h(z_2) - u(z_1, z_2)| \le \frac{C}{\rho^2}\Big[r^{1-2\gamma} \log\frac{1}{r} + r^{-2} H\Big] \quad \text{for all } z_2 \in D(x_2^{(0)}, r) \setminus B_0,\; z_1 \in D(x_1^{(0)}, r) \setminus B(z_2). \tag{3.26}$$
Let $\Pi = D(x_1^{(0)}, r^{1-\gamma}) \times D(x_2^{(0)}, r)$ and
$$B = \{(z_1, z_2) : z_2 \in B_0 \cup B_1 \text{ or } z_2 \in D(x_2^{(0)}, r) \setminus (B_0 \cup B_1) \text{ and } z_1 \in B(z_2)\}.$$
In view of Definition 3.3, $B \in \mathrm{Car}_2(H)$ with $H = \exp(-r^{-\gamma})$. Combining (3.26) with (3.25) implies that
$$|u(z_1, z_2) - u(z_1', z_2')| \le \frac{C}{\rho^2}\, r^{1-2\gamma} \log\frac{1}{r} \quad \text{for all } (z_1, z_2), (z_1', z_2') \in \Pi \setminus B,$$
as claimed. Let $\Omega_0$ be the set of all $(x_1^{(0)}, x_2^{(0)})$ as above. By construction, $\mathrm{mes}([-1,1]^2 \setminus \Omega_0) \lesssim \rho$. Notice that the set $B$ depends on $\Pi$. To finish the proof, cover $\Omega_0$ by about $r^{-2}$ such polydisks and take the union of the resulting sets $B$.

Acknowledgement. The author is grateful to the anonymous referee for pointing out several inaccuracies in an earlier version of this paper, as well as for making numerous suggestions for improving the exposition. He was partially supported by the NSF, grant DMS-0070538.
References
1. Avron, J., Simon, B.: Almost periodic Schrödinger operators II. The integrated density of states. Duke Math. J. 50, no. 1, 369–391 (1983)
2. Bhatia, R.: Perturbation bounds for matrix eigenvalues. Pitman Research Notes in Mathematics Series 162, London: Longman, 1987
3. Bourgain, J.: Hölder regularity of integrated density of states for almost Mathieu operator in perturbative regime. Lett. Math. Phys. 51, 83–118 (2000)
4. Bourgain, J.: Estimates on Green's functions, localization and the quantum kicked rotor model. Preprint 2000, to appear in Annals of Math.
5. Bourgain, J., Goldstein, M.: On nonperturbative localization with quasi-periodic potential. Ann. of Math. (2) 152, no. 3, 835–879 (2000)
6. Bourgain, J., Goldstein, M., Schlag, W.: Anderson localization for Schrödinger operators on $\mathbb{Z}^2$ with quasi-periodic potential. To appear in Acta Math.
7. Bourgain, J., Jitomirskaya, S.: Anderson localization for the band model. Geometric Aspects of Functional Analysis. Lecture Notes in Math. 1745, Berlin: Springer, 2000
8. Craig, W., Simon, B.: Log Hölder continuity of the integrated density of states for stochastic Jacobi matrices. Commun. Math. Phys. 90, no. 2, 207–218 (1983)
9. Delyon, F., Souillard, B.: Remark on the continuity of the density of states of ergodic finite difference operators. Commun. Math. Phys. 94, no. 2, 289–291 (1983)
10. Figotin, A., Pastur, L.: Spectra of random and almost-periodic operators. Grundlehren der mathematischen Wissenschaften 297, Berlin–Heidelberg–New York: Springer, 1992
11. Goldstein, M., Schlag, W.: Hölder continuity of the integrated density of states for quasi-periodic Schrödinger equations and averages of shifts of subharmonic functions. Ann. of Math. 154, no. 1, 155–203 (2001)
12. Jitomirskaya, S.: Metal–insulator transition for the almost Mathieu operator. Ann. of Math. 150, no. 3, 1159–1175 (1999)
13. Koosis, P.: The logarithmic integral II. Cambridge Studies in Advanced Mathematics 21, Cambridge: Cambridge University Press, 1992
14. Levin, B. Ya.: Lectures on entire functions. Transl. of Math. Monographs, Vol. 150, Providence, RI: AMS, 1996
15. Stein, E.: Singular integrals and differentiability properties of functions. Princeton, NJ: Princeton University Press, 1970

Communicated by B. Simon
Commun. Math. Phys. 223, 67 – 86 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Genealogy of Shocks in Burgers Turbulence with White Noise Initial Velocity
Christophe Giraud
Laboratoire de Probabilités et Modèles Aléatoires, Université Pierre et Marie Curie, et C.N.R.S. UMR 7599, 175, rue du Chevaleret, 75013 Paris, France.
Received: 5 September 2000 / Accepted: 29 May 2001
Abstract: As time passes, the shocks of the solution of the inviscid Burgers equation aggregate. We characterize, in the case of white noise initial velocity, the stochastic fragmentation process obtained when time runs backwards. In other words, we specify the law of the genealogy of the shocks of the Burgers turbulence with white noise initial velocity.

1. Introduction

Burgers equation
$$\partial_t u + u \cdot \nabla u = \mu \Delta u \tag{1}$$
is a simplified version of the Navier–Stokes equation, in which the pressure and force terms are neglected. First introduced by Burgers (see [6]) as a simple model for hydrodynamic turbulence, Burgers equation also appears in other fields, such as the theory of interfaces for ballistic aggregation, or cosmology for the formation of superstructures in the universe (see [19] and references therein). We focus henceforth on dimension one when the viscosity $\mu$ tends to 0. It is known that the solution $u^\mu$ of (1) converges as $\mu \to 0$ to $u^0 = u$, which is the (weak) entropy solution of the so-called inviscid Burgers equation
$$\partial_t u + u\, \partial_x u = 0. \tag{2}$$
Several studies concern statistics of the solution of (2) when the initial velocity u(., 0) is a random process, see [6, 13, 19]. The solution u(., t) at fixed time t of (2) is now well known in many cases, such as for example when u(., 0) is a Brownian motion (see [18, 4]) or a white noise (see [2, 3, 6, 11, 16, 17]). Recently, Bertoin has studied the evolution in time of the solution u when the initial velocity u(., 0) is a Brownian motion, and proved a striking connection with the additive coalescent, see [5].
C. Giraud
The aim of this paper is to describe the evolution in time of $u$ when $u(\cdot,0)$ is a white noise. The solution $u(\cdot,t)$ is then a tooth-path with a discrete sequence of discontinuities and segments of slope $1/t$. In particular, the solution $u(\cdot,t)$ is determined by the location and amplitude of its discontinuities, which are usually called shocks. As time passes, when two shocks "collide", they form a single shock whose amplitude is the sum of the amplitudes of the two previous shocks. The evolution in time of $u$ is thus governed by the deterministic dynamics of clustering of the shocks. This dynamics induces a loss of information, in the sense that if $t_1 < t_2$, the solution $u(\cdot,t_1)$ cannot be recovered from $u(\cdot,t_2)$. When time runs backwards, we thus obtain a random process of fragmentation: a shock $S$ splits after a certain time $\rho$ into two shocks $S_1$ and $S_2$, which in turn split after times $\rho_1$ and $\rho_2$, respectively, into $S_{11}, S_{12}$ and $S_{21}, S_{22}$, etc. The purpose of the present work is to describe the law of this process of fragmentation, that is, the genealogy of the shocks.
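[Figure: genealogy of a shock. In backward time, starting from time $t$, the shock $S$ splits after a time $\rho$ (forward time $t-\rho$) into $S_1$ and $S_2$, which split in turn (after further times $\rho_1$ and $\rho_2$; the figure marks the forward time $t-\rho-\rho_1$) into $S_{11}, S_{12}$ and $S_{21}, S_{22}$. Axes: backward time, forward time.]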
In this direction it is convenient to use the sticky particles interpretation of the inviscid Burgers equation. Consider at time $t = 0$ infinitesimal particles uniformly distributed on the line, with initial velocity $u(\cdot,0)$. We suppose that they evolve with the dynamics of completely inelastic shocks. This means that the velocity of a particle only changes in case of collision, and when two particles (or clusters of particles) collide, they form a heavier cluster with conservation of masses and momenta. The evolution of the system is completely described by the entropy solution of (2), and $u(x,t)$ then represents the velocity of the particle located at $x$ at time $t$. The case of white noise initial velocity arises, for example, as the infinitesimal limit of a discrete sticky particles system in which particles are evenly spread with i.i.d. velocities (with zero mean and finite variance). A shock of $u(\cdot,t)$ represents a cluster at time $t$ whose mass is $t$ times the amplitude of the shock. The fragmentation of the shocks then corresponds to the fragmentation of the clusters obtained by reversing the dynamics of the sticky particles. We prove here that, in the case of white noise initial data, conditionally on the state of the system at time $t$, masses split independently of their location, their velocity and their environment. We compute the law of the parameters which describe the splitting of a mass, and which also characterize the law of the fragmentation process. In the second section we present some material for the study of the system. We specify the fragmentation in the third section. The fourth section is devoted to the proofs of the preliminary results. The Appendix contains the somewhat technical proof of Lemma 6.
Shocks in Burgers Turbulence with White Noise Initial Velocity
2. Preliminaries

2.1. Inviscid Burgers equation and sticky particles. Let us consider the entropy solution $u$ of the inviscid Burgers equation. This is the weak solution of (2) such that $x \mapsto u(x,t)$ has only discontinuities of the first kind and no positive jumps. We shall work here with the version satisfying
$$u(x,t) = \frac{u(x+,t) + u(x-,t)}{2},$$
where $u(x+,t)$ and $u(x-,t)$ refer to the right and left limits of $u(\cdot,t)$ at $x$. We call initial potential the process
$$W(z) = \int_0^z u(x,0)\, dx, \quad z \in \mathbb{R}.$$
When $W(z) = o(|z|^2)$ for $z \to \pm\infty$, we denote by $a(x,t)$ the largest location of the minimum of the function $z \mapsto W(z) + \frac{1}{2t}(z - x)^2$. We have the following two equivalent geometrical interpretations of $a(\cdot,t)$.

[Figure: left, the parabola $z \mapsto -(z-x)^2/2t + C$ touching the graph of $W$ at abscissa $a(x,t)$; right, the line of slope $x/t$ touching the graph of $z \mapsto W(z) + z^2/2t$ at abscissa $a(x,t)$.]
First, consider a parabola $z \mapsto -\frac{1}{2t}(z - x)^2 + C$, with $C$ chosen such that this parabola lies strictly below the path of $W$. Let $C$ increase until this parabola touches the graph of $W$. The largest abscissa of the contact points is then $a(x,t)$. Alternatively, bring up a line of slope $x/t$ until it touches the graph of $z \mapsto W(z) + \frac{1}{2t}z^2$. The largest abscissa of contact is again $a(x,t)$. The function $a(\cdot,t)$ may thus be described in terms of the convex minorant of the initial potential with a $\frac{1}{2t}$-parabolic drift, $z \mapsto W(z) + \frac{1}{2t}z^2$, or in terms of the "$\frac{1}{2t}$-parabolic" minorant of the initial potential $W$. One notices in particular that $a(\cdot,t)$ is right continuous and non-decreasing. We can express the entropy solution of the inviscid Burgers equation in terms of $a(\cdot,t)$. Indeed, it is known (see Hopf [12] or Cole [7]) that
$$u(x,t) = \frac{x - a(x,t)}{t}$$
at every $x$ where $a(\cdot,t)$ is continuous, and
$$u(x,t) = \frac{u(x+,t) + u(x-,t)}{2}$$
elsewhere.
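[Figure: a cluster located at $x$ at time $t$ has mass $m(x,t)$, the length of $[a(x-,t), a(x,t)]$; the points $a(x_1,s), \ldots, a(x_{k+1},s)$ at an earlier time $s$ cut this interval into pieces of masses $m_1, \ldots, m_k$.]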
We have seen that the function $a(\cdot,t)$ is right continuous and non-decreasing. It therefore possesses a right continuous inverse
$$x(a,t) = \inf\{y \in \mathbb{R} : a(y,t) > a\},$$
which is known as the Lagrangian function. In the sticky particles interpretation, the latter gives the location at time $t$ of the particle started from $a \in \mathbb{R}$. In particular, the Eulerian shock points at time $t$, which are the abscissas $x$ of discontinuity of $a(\cdot,t)$, are the locations of clusters in the system at time $t$. They have mass $m(x,t) = a(x+,t) - a(x-,t)$ and velocity $u(x,t)$, since (conservation of momenta)
$$u(x,t) = \frac{u(x+,t) + u(x-,t)}{2} = \frac{1}{a(x+,t) - a(x-,t)} \int_{a(x-,t)}^{a(x+,t)} u(z,0)\, dz.$$
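The second equality can be checked directly from the variational definition of $a(\cdot,t)$. At a shock point $x$, both $a_- = a(x-,t)$ and $a_+ = a(x+,t)$ minimize $z \mapsto W(z) + \frac{1}{2t}(z-x)^2$ (by continuity of $W$), so
$$W(a_+) - W(a_-) = \frac{(a_- - x)^2 - (a_+ - x)^2}{2t} = (a_+ - a_-)\,\frac{2x - a_- - a_+}{2t},$$
and dividing by $a_+ - a_-$,
$$\frac{1}{a_+ - a_-}\int_{a_-}^{a_+} u(z,0)\,dz = \frac{W(a_+) - W(a_-)}{a_+ - a_-} = \frac{1}{2}\Big(\frac{x - a_-}{t} + \frac{x - a_+}{t}\Big) = \frac{u(x-,t) + u(x+,t)}{2}.$$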
When the initial potential $W$ is a Brownian motion (i.e. $u(\cdot,0)$ is a white noise), it was shown (see [2, 11]) that $a(\cdot,t)$ is a.s. a step function, which means that $u(\cdot,t)$ is a tooth-path. The particles are thus located at time $t > 0$ on a discrete set of clusters. One says that the shock structure is discrete. We are interested in the fragmentation of a cluster located at $x$ at time $t$ when time runs backwards. We write $m_1(x,s,t), \ldots, m_k(x,s,t)$ for the masses of the clusters at time $s$ which form this cluster at time $t$. One may notice that $m_1(x,s,t), \ldots, m_k(x,s,t)$ are exactly the lengths of the intervals of the partition of $[a(x-,t), a(x,t)]$ induced by the range of $a(\cdot,s)$. We therefore give the following definition.
Definition 1. For any Eulerian shock point $x$ at time $t$ and $0 < s < t$, we set
$$M(x,s,t) = (m_1(x,s,t), \ldots, m_k(x,s,t)),$$
where $m_1(x,s,t), \ldots, m_k(x,s,t)$ are the lengths of the intervals appearing in the partition of $[a(x-,t), a(x,t)]$ by the range of $a(\cdot,s)$, ranked in increasing order of their location. $M(x,s,t)$ takes values in the space of finite positive numerical sequences $\mathcal{S} = \bigcup_{n \in \mathbb{N}^*} \,]0,\infty[^n$. The process $(M(x, t-r, t);\, 0 \le r \le t)$ will be called the fragmentation process. We notice in the following lemma that the fragmentation of a cluster only depends on the "excursions" of the initial potential above the "parabolic minorant". We write in the sequel $\mathcal{E} = \bigcup_{m>0} \{m\} \times C([0,m], \mathbb{R}_+)$ for the space of positive excursions, where $m$ is meant to represent the duration of the excursion.
Lemma 1. There exists a function $F : \mathbb{R}_+ \times \mathbb{R}_+ \times \mathcal{E} \to \mathcal{S}$ such that for any initial potential $W$ which induces a discrete shock structure, and any Eulerian shock point $x$ at time $t$, we have
$$M(x,s,t) = F(s, t, m(x,t), \varepsilon^{(x,t)}), \quad \text{for } 0 < s < t,$$
where
$$\varepsilon^{(x,t)}(z) = W(a(x-,t) + z) - W(a(x-,t)) - \frac{1}{2t} z(m(x,t) - z) - z\,u(x,t), \quad \text{for } 0 \le z \le m(x,t).$$
Proof of Lemma 1. We set
$$\tilde\varepsilon^{(x,t)}(z) = W(a(x-,t) + z) - W(a(x-,t)) - \frac{1}{2t} z(m(x,t) - z) - z\,u(x,t), \quad \text{for } z \in \mathbb{R}.$$
For any $y \in \mathbb{R}$ and $0 < s < t$, writing $y' = y - a(x-,t)$, we have
$$a(y,s) = a(x-,t) + \operatorname*{argmin}_{z \in \mathbb{R}} \Big(W(a(x-,t) + z) - W(a(x-,t)) + \frac{1}{2s}(z - y')^2\Big) = a(x-,t) + \operatorname*{argmin}_{z \in \mathbb{R}} \Big(\tilde\varepsilon^{(x,t)}(z) + \frac{1}{2t} z(m(x,t) - z) + \frac{1}{2s}\big(z - y' + s\,u(x,t)\big)^2\Big),$$
where $\operatorname{argmin}_{z \in \mathbb{R}}(g(z))$ denotes the largest $z$ that minimizes $g$ (the two functions being minimized differ by a constant independent of $z$). We deduce that the range of $a(\cdot,s) - a(x-,t)$ only depends on $s$, $t$, $m(x,t)$ and $\tilde\varepsilon^{(x,t)}$. Now the intersection of the range of $a(\cdot,s) - a(x-,t)$ with $[0, m(x,t)]$ depends on $s$, $t$, $m(x,t)$ and the restriction of $\tilde\varepsilon^{(x,t)}$ to $[0, m(x,t)]$, which is $\varepsilon^{(x,t)}$. Since $M(x,s,t)$ is the finite sequence of the lengths of the intervals of the partition of $[0, m(x,t)]$ by the range of $a(\cdot,s) - a(x-,t)$, it is therefore a function of $s$, $t$, $m(x,t)$ and $\varepsilon^{(x,t)}$. The proof is complete.
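The constructions of Sect. 2.1 and Definition 1 can be tried out on a discretized path. The sketch below is ours and makes several labeled assumptions: $W$ is a Gaussian random walk on a finite grid $[0, L]$ standing in for Brownian motion, so the first and last clusters are boundary artifacts, and the convex minorant of $z \mapsto W(z) + z^2/2t$ is computed with the standard monotone-chain lower hull. Consecutive hull vertices play the role of the Lagrangian points $a_{n-1}(t), a_n(t)$; the shock location is $x = t \times$ (slope of the hull segment), where the line of slope $x/t$ touches both ends. The function `fragmentation` returns $M(x_n, s, t)$ by cutting $[a_{n-1}(t), a_n(t)]$ with the Lagrangian points at the earlier time $s$. All function names are hypothetical.

```python
import random


def brownian_potential(n=1000, length=20.0, seed=0):
    """Random-walk stand-in for the initial potential W on the grid z_k = k * length / n."""
    rng = random.Random(seed)
    dz = length / n
    z = [k * dz for k in range(n + 1)]
    W = [0.0]
    for _ in range(n):
        W.append(W[-1] + rng.gauss(0.0, dz ** 0.5))  # white-noise velocity => Brownian W
    return z, W


def lower_hull_indices(z, f):
    """Indices of the vertices of the convex minorant of (z_k, f_k), z increasing (monotone chain)."""
    hull = []
    for k in range(len(z)):
        while len(hull) >= 2:
            i, j = hull[-2], hull[-1]
            cross = (z[j] - z[i]) * (f[k] - f[i]) - (f[j] - f[i]) * (z[k] - z[i])
            if cross > 0:  # strict left turn: j stays on the lower hull
                break
            hull.pop()
        hull.append(k)
    return hull


def clusters(z, W, t):
    """Clusters at time t, read off the convex minorant of z -> W(z) + z^2/(2t)."""
    f = [w + zk * zk / (2 * t) for zk, w in zip(z, W)]
    idx = lower_hull_indices(z, f)
    out = []
    for i, j in zip(idx, idx[1:]):
        mass = z[j] - z[i]                  # m_n(t) = a_n(t) - a_{n-1}(t)
        x = t * (f[j] - f[i]) / mass        # shock location: line of slope x/t
        out.append({
            "left": i, "right": j, "mass": mass, "x": x,
            "velocity": (2 * x - z[i] - z[j]) / (2 * t),
            "mean_initial_velocity": (W[j] - W[i]) / mass,
        })
    return out


def brute_force_a(z, W, x, t):
    """Grid index of the largest minimiser of W(z) + (z - x)^2/(2t), i.e. a(x, t)."""
    best_k, best = 0, float("inf")
    for k, (zk, wk) in enumerate(zip(z, W)):
        val = wk + (zk - x) ** 2 / (2 * t)
        if val <= best:                     # '<=' keeps the largest minimiser
            best_k, best = k, val
    return best_k


def fragmentation(z, W, s, t, n):
    """M(x_n, s, t): the n-th time-t cluster cut by the Lagrangian points at time s < t."""
    c = clusters(z, W, t)[n]
    lo, hi = z[c["left"]], z[c["right"]]
    f_s = [w + zk * zk / (2 * s) for zk, w in zip(z, W)]
    cuts = sorted({lo, hi} | {z[k] for k in lower_hull_indices(z, f_s) if lo < z[k] < hi})
    return [b - a for a, b in zip(cuts, cuts[1:])]


z, W = brownian_potential(seed=3)
for c in clusters(z, W, t=1.0)[:3]:
    print(round(c["x"], 3), round(c["mass"], 3), round(c["velocity"], 3))
```

The printed velocity of each cluster agrees with the mean initial velocity over its Lagrangian interval, which is the conservation of momenta stated in Sect. 2.1, and the pieces returned by `fragmentation` add up to the mass of the parent cluster.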
2.2. Laplace transform of the integral of a 3-d Bessel bridge (after Groeneboom). We recall in this subsection the value of the Laplace transform of the integral of a 3-d Bessel bridge, which shall appear in further calculations. Let $0 > -\omega_1 > -\omega_2 > \cdots$ denote the zeros of the Airy function Ai as defined on p. 446 of [1]. We introduce
$$C(\lambda) = \sqrt{2\pi}\,\lambda \sum_{n=1}^\infty \exp(-2^{-1/3}\omega_n \lambda^{2/3}),$$
and
$$F^{(x,y)}(\lambda) = 2^{-1/3}\lambda^{2/3} \sum_{n=1}^\infty \frac{\mathrm{Ai}(2^{1/3}\lambda^{1/3} y - \omega_n)}{\mathrm{Ai}'(-\omega_n)} \exp(-2^{-1/3}\lambda^{2/3} x\,\omega_n).$$
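Groeneboom has computed the Laplace transform of the integral of a three-dimensional Bessel bridge, see [11], Theorem 2-1 and formulas (4-9), (4-13).
Lemma 2. For $x, y \ge 0$, let $\beta^{[x]}_{0\to y}$ be a three-dimensional Bessel bridge with duration $x$ starting at 0 and ending at $y$. We set
$$L^{(x,y)}(\lambda) = \mathbb{E}\Big[\exp\Big(-\lambda \int_0^x \beta^{[x]}_{0\to y}(u)\, du\Big)\Big].$$
We have
$$L^{(x,y)}(\lambda) = \begin{cases} \dfrac{\sqrt{2\pi x^3}}{y}\, \exp\big(\lambda x y + y^2/2x\big)\, F^{(x,y)}(\lambda) & \text{for } y > 0,\\[2mm] \mathbb{E}\Big[\exp\Big(-\lambda\int_0^x e^{[x]}(s)\,ds\Big)\Big] = C(x^{3/2}\lambda) & \text{for } y = 0,\end{cases}$$
where $e^{[x]}$ denotes a Brownian excursion with duration $x$.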
2.3. Excursions above parabolas. We state here some useful results on excursions conditioned to stay above a parabola. The proofs of these results are given in Sect. 4. For $m > 0$, we write $P^{[m]}$ for the law of a Brownian excursion (or of a 3-d Bessel bridge starting and ending at 0) with duration $m$. For any $a, m > 0$ we introduce the probability measure
$$\nu(a,m) = \frac{\exp\big(-a\int_0^m X_s\, ds\big)}{C(a m^{3/2})}\, P^{[m]},$$
which is absolutely continuous with respect to $P^{[m]}$, where $X_s$ denotes the canonical process and the normalizing factor $C$ has been defined in the previous subsection. In the sequel, we will often write $e^{[m]}$ for a Brownian excursion with duration $m$. We first connect $\nu(a,m)$ to the law of an excursion conditioned to stay above a parabola.
Lemma 3. For any $a, m > 0$, we set for $0 \le z \le m$,
$$p^{(a,m)}(z) = \frac{a}{2} z(m - z).$$
If $e^{[m]}$ is a Brownian excursion with duration $m$, then the law of $e^{[m]} - p^{(a,m)}$ conditionally on $e^{[m]} \ge p^{(a,m)}$ is $\nu(a,m)$. We now associate to a Brownian excursion $e^{[m]}$ with duration $m$ the two variables
$$\sigma(m) = \sup\{a \ge 0 : e^{[m]} \ge p^{(a,m)}\},$$
$$\eta(m) = \text{the largest abscissa } z_0 \in (0, m) \text{ such that } e^{[m]}(z_0) = p^{(\sigma(m),m)}(z_0).$$
We shall write in the sequel $e = e^{[1]}$, $\sigma = \sigma(1)$ and $\eta = \eta(1)$. Let us specify the law of $(\sigma(m), \eta(m))$.
Lemma 4. Law of $(\sigma(m), \eta(m))$. (i) For any $m > 0$ we have the scaling identity
$$(\sigma(m), \eta(m)) \overset{\text{law}}{=} (m^{-3/2}\sigma,\; m\eta).$$
(ii) The law of $(\sigma, \eta)$ is given by
$$P(\eta > x,\, \sigma \in da) = e^{-a^2/24}\, \partial_2 G^{(x)}(a, 0)\, da,$$
where
$$G^{(x)}(a,b) = \sqrt{8\pi}\, e^{ab(1-x)^2(2x+1)/12}\, e^{O(b^2)} \int_0^\infty e^{yb(x-1/2)}\, F^{(x,y)}(a)\, F^{(1-x,\,y+\beta)}(a-b)\, dy,$$
with $O(b^2) = \frac{b^2}{24}(1-x)(8x^2 - 4x - 1)$, $\beta = p^{(b,1)}(x) = \frac{b}{2}x(1-x)$ and $F^{(x,y)}$ defined in the previous subsection. In particular, we have
$$P(\sigma \ge a) = e^{-a^2/24}\, C(a),$$
with $C$ defined in the previous subsection.
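The scaling identity (i), and the normalization of $\nu(a,m)$ by $C(am^{3/2})$, both follow from Brownian scaling $e^{[m]}(z) \overset{\text{law}}{=} \sqrt{m}\, e(z/m)$. With $w = z/m$,
$$e^{[m]}(z) \ge \frac{a}{2} z(m-z) \;\Longleftrightarrow\; \sqrt{m}\,e(w) \ge \frac{a m^2}{2}\, w(1-w) \;\Longleftrightarrow\; e(w) \ge p^{(am^{3/2},1)}(w),$$
so that $(\sigma(m), \eta(m)) \overset{\text{law}}{=} (m^{-3/2}\sigma, m\eta)$ jointly, while
$$\mathbb{E}\Big[\exp\Big(-a\int_0^m e^{[m]}(s)\,ds\Big)\Big] = \mathbb{E}\Big[\exp\Big(-a m^{3/2}\int_0^1 e(w)\,dw\Big)\Big] = C(am^{3/2}),$$
which is the case $y = 0$ of Lemma 2 (with $x = m$, $\lambda = a$).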
We give in the last lemma the law of an excursion conditionally on $(\sigma = a, \eta = x)$. In this direction, it is convenient to define the concatenation of two processes $(\varepsilon_1(z);\, 0 \le z \le m_1)$ and $(\varepsilon_2(z);\, 0 \le z \le m_2)$ as the process
$$\big(\varepsilon_1(z)\mathbf{1}_{0\le z\le m_1} + \varepsilon_2(z - m_1)\mathbf{1}_{m_1\le z\le m_1+m_2};\; 0 \le z \le m_1+m_2\big).$$
Lemma 5. Conditionally on $(\sigma = a, \eta = x)$, $e - p^{(\sigma,1)}$ has the law of the concatenation of two independent processes of laws $\nu(a,x)$ and $\nu(a,1-x)$. As a consequence, the law of $e^{[m]} - p^{(\sigma(m),m)}$ under $\nu(a,m)$ conditionally on $(\sigma(m) = b - a, \eta(m) = xm)$ is the law of the concatenation of two independent processes of laws $\nu(b, xm)$ and $\nu(b, (1-x)m)$.
Remark. The previous lemma ensures in particular that there exists a.s. a unique abscissa $z_0 \in (0,1)$ such that $e^{[m]}(z_0) = p^{(a,m)}(z_0)$. Indeed, an excursion is a.s. positive on $(0,1)$, and this property still holds under $\nu(a,m)$. We can now start our investigations.
2.4. Conditional distribution of the initial data. Clusters are ranked in increasing order of their location, with the convention that $x_1(t)$ is the location of the first cluster to the right of 0. From a physical point of view, the state of the system at time $t$ is described by the sequence $((x_n(t), m_n(t), v_n(t));\, n \in \mathbb{Z})$, where $m_n(t)$ and $v_n(t)$ are the mass and velocity of the $n$th cluster. Yet, one may notice that the useful datum is the sequence $((x_n(t), a_n(t));\, n \in \mathbb{Z})$, where $a_n(t) = a(x_n(t), t)$. Indeed, we have
$$m_n(t) = a_n(t) - a_{n-1}(t) \quad \text{and} \quad v_n(t) = \frac{2x_n(t) - (a_n(t) + a_{n-1}(t))}{2t}.$$
We thus introduce $\mathcal{F}_t = \sigma((x_n(t), a_n(t));\, n \in \mathbb{Z})$, which is the datum given by the state of the system at time $t$. The law of $((x_n(t), m_n(t), v_n(t));\, n \in \mathbb{Z})$ is known, see formulas (101) and the following remark, (50), (54), (55), (66), (67) and (70) in [8]. We now specify the law of the initial potential $W$ conditionally on the state of the turbulence at time $t$. We prove that the pieces of Brownian motion between the Lagrangian points $a_n(t)$ are independent conditionally on $\mathcal{F}_t$, and we connect their law to $\nu(a,m)$ (recall that this distribution has been introduced in Sect. 2.3).
Lemma 6. The "excursions" of the Brownian motion above the "$\frac{1}{2t}$-parabolic minorant"
$$\varepsilon^{(x_n(t),t)}(z) = W(z + a_{n-1}(t)) - W(a_{n-1}(t)) - p^{(1/t,\, m_n(t))}(z) - z\,v_n(t), \quad 0 \le z \le m_n(t),$$
are independent conditionally on $\mathcal{F}_t$, and their conditional law is $\nu(1/t, m_n(t))$. This lemma is the key of our analysis, since it connects the study of the fragmentation to the law $\nu(a,m)$. Let us sketch an explanation for the origin of this result. Groeneboom [10] has studied the convex minorant of the Brownian motion (he actually focused on the concave majorant, which is its mirror image with respect to the abscissa axis). He has shown that it is a piecewise linear path, and that, conditionally on the edge points of this path, the Brownian motion realizes independent Brownian excursions above each
segment of the convex minorant. By the Girsanov theorem, adding a parabolic drift $z \mapsto \frac{1}{2t}z^2$ to the Brownian motion amounts to working under the probability measure
$$\exp\Big(\frac{1}{t}\int_0^T z\, dW_z - \frac{T^3}{6t^2}\Big)\, P_{|\mathcal{G}_T},$$
where $\mathcal{G}_T = \sigma(W_s;\, 0 \le s \le T)$. In other words, conditionally on the "$\frac{1}{2t}$-parabolic minorant" of the Brownian motion, the excursions $\varepsilon^{(x_n(t),t)}$ of the Brownian motion above the "$\frac{1}{2t}$-parabolic minorant" are independent and have the law $\nu(1/t, m_n(t))$. Nevertheless, in order to apply the Girsanov theorem we need to investigate the convex minorant of a Brownian motion on a finite interval, which cannot be deduced easily from the convex minorant of a Brownian motion on $[0,\infty)$. However, we can prove Lemma 6 from results in [11]; this somewhat technical proof is given in the Appendix.

3. Fragmentation Statistics

3.1. Statement of the main results. We henceforth turn our attention to the process of fragmentation generated by the dynamics of sticky particles as time runs backwards. Since the clustering dynamics of sticky particles is deterministic and induces a loss of information, the fragmentation process obtained by time reversal is a stochastic Markovian process. Our aim is to describe this process. Recall that $(x_n(t), m_n(t), v_n(t))$ denotes the (location, mass, velocity) of the $n$th cluster at time $t$ and $\mathcal{F}_t = \sigma((x_n(t), m_n(t), v_n(t));\, n \in \mathbb{Z})$. We point out that $(\mathcal{F}_t,\, t \ge 0)$ is a backwards filtration, since the evolution of the system is deterministic and induces a loss of information. The variable $M(x,s,t)$ is defined in Sect. 2.1 (Definition 1). We first specify the dependence between the fragmentation of a cluster and its environment. In this direction, we write $\mu(t,m)$ for the law of $(M(x_1(t), t-r, t);\, 0 \le r \le t)$ conditionally on $(m_1(t) = m)$.
Theorem 1. Fix $t > 0$. Conditionally on the state $\mathcal{F}_t$ of the turbulence at time $t$, the fragmentation processes $(M(x_n(t), t-r, t);\, 0 \le r \le t)$, $n \in \mathbb{Z}$, are independent, and their conditional laws only depend on $(t, m_n(t))$.
More precisely, they are given by $\mu(t, m_n(t))$.
Roughly, Theorem 1 claims that, conditionally on the state of the system at time $t$, the masses of the clusters at time $t - r$ are obtained by breaking each cluster at time $t$ into pieces, independently of its location and velocity and of the other clusters. In particular, we deduce from Theorem 1 that for any Eulerian shock point $x$, the fragmentation process $(M(x, t-r, t);\, 0 \le r \le t)$ is an inhomogeneous Markov process. We now want to describe the law $\mu(t,m)$. We denote by $\rho(t,m)$ (or simply $\rho$) the time at which the cluster located at $x_1(t)$ at time $t$ splits, conditionally on $(m(x_1(t), t) = m)$. We will check that we have a binary splitting $M(x_1(t), t - \rho, t) = (m_1, m_2)$, and we denote by $R(t,m)$ (or simply $R$) the ratio $R = m_1/(m_1 + m_2)$. The next theorem characterizes the law $\mu(t,m)$ in terms of the law of the variable $(\rho, R)$. To this aim we introduce the operation $*$ on the space of finite numerical sequences: for any $M_1 = (m_{11}, \ldots, m_{1k_1})$ and $M_2 = (m_{21}, \ldots, m_{2k_2})$, we write $M_1 * M_2 = (m_{11}, \ldots, m_{1k_1}, m_{21}, \ldots, m_{2k_2})$. We can now state our result, which is a splitting property at time $\rho$.
Theorem 2. For any $t, m > 0$, let $M = (M(r);\, 0 \le r \le t)$ be a process of law $\mu(t,m)$. We have
$$M(\rho + r) = M_1(r) * M_2(r) \quad \text{for } 0 \le r \le t - \rho,$$
where $M_1$ and $M_2$ are independent conditionally on $(\rho, R)$, with conditional laws $\mu(t-\rho, Rm)$ and $\mu(t-\rho, (1-R)m)$.
In other words, conditionally on $m(x_1(t), t)$, the cluster located at $x_1(t)$ at time $t$ splits after a time $\rho_0 = \rho(t, m(x_1(t),t))$ into two clusters of masses $m_1 = R_0\, m(x_1(t),t)$ and $m_2 = (1 - R_0)\, m(x_1(t),t)$, where $R_0 = R(t, m(x_1(t),t))$. Moreover, the fragmentation processes of these two clusters are independent conditionally on $(m(x_1(t),t), R_0, \rho_0)$, and their conditional laws are $\mu(t-\rho_0, m_1)$ and $\mu(t-\rho_0, m_2)$. We can therefore iterate Theorem 2 to obtain that, in turn, each of these two clusters splits into two clusters, of masses
$$m_{11} = R(t-\rho_0, m_1)\, m_1, \quad m_{12} = (1 - R(t-\rho_0, m_1))\, m_1$$
and
$$m_{21} = R(t-\rho_0, m_2)\, m_2, \quad m_{22} = (1 - R(t-\rho_0, m_2))\, m_2,$$
at times $\rho_1 = \rho(t-\rho_0, m_1)$ and $\rho_2 = \rho(t-\rho_0, m_2)$, and so on. The last theorem gives the joint law of $\rho(t,m)$ and $R(t,m)$, which completes the characterization of the law $\mu(t,m)$.
Theorem 3. For any $0 < r < t$, $m > 0$ and $0 \le \alpha \le 1$, the law of $(\rho(t,m), R(t,m))$ is given by
$$\mu(t,m)(\rho \in dr, R > \alpha) = \exp\Big(\frac{m^3}{24}\Big(\frac{1}{t^2} - \frac{1}{(t-r)^2}\Big)\Big)\, \partial_2 G^{(\alpha)}\Big(\frac{m^{3/2}}{t-r}, 0\Big)\, \frac{m^{3/2}\, dr}{C(m^{3/2}/t)\,(t-r)^2},$$
where $G^{(\alpha)}(a,b)$ is defined in Lemma 4 and $C$ in Sect. 2.2. In particular, we have
$$\mu(t,m)(\rho \ge r) = \exp\Big(\frac{m^3}{24}\Big(\frac{1}{t^2} - \frac{1}{(t-r)^2}\Big)\Big)\, \frac{C(m^{3/2}/(t-r))}{C(m^{3/2}/t)}.$$
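Iterating Theorem 2 produces a binary tree of clusters, and the masses present at backward time $r$ are obtained by concatenating (the operation $*$) the fragments of the two children. A minimal sketch of this recursion, assuming a user-supplied function `sample_split(t, m)` that returns one draw of $(\rho(t,m), R(t,m))$; this sampler is hypothetical and would have to implement the law of Theorem 3, which is not done here. The toy `halving_sampler` below only serves to exercise the recursion and is not the law of Theorem 3. Note that the gap $(t - \rho) - (r - \rho) = t - r$ is preserved along each branch, which is why the recursion requires $r < t$.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: (t, m) -> one draw of (rho, R).
Sampler = Callable[[float, float], Tuple[float, float]]


def masses_at_backward_time(t: float, m: float, r: float,
                            sample_split: Sampler) -> List[float]:
    """Masses M(r), at time t - r, of the fragments of one cluster of mass m at time t.

    Splitting property of Theorem 2: draw (rho, R) for the current cluster; if
    it has not split by backward time r it is still intact, otherwise recurse on
    the two children with remaining time t - rho and concatenate (M_1 * M_2).
    """
    if not 0 <= r < t:
        raise ValueError("need 0 <= r < t")
    rho, R = sample_split(t, m)
    if rho > r:
        return [m]
    left = masses_at_backward_time(t - rho, R * m, r - rho, sample_split)
    right = masses_at_backward_time(t - rho, (1 - R) * m, r - rho, sample_split)
    return left + right


def halving_sampler(t: float, m: float) -> Tuple[float, float]:
    """Toy placeholder only: split at half the remaining time into equal halves."""
    return t / 2, 0.5


print(masses_at_backward_time(1.0, 1.0, 0.8, halving_sampler))
```

With the toy sampler the cluster splits at backward times $0.5$ and $0.75$, so at $r = 0.8$ it consists of four pieces of mass $0.25$; replacing it by an exact sampler for Theorem 3 would give realizations of $\mu(t,m)$.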
C. Giraud
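We therefore deduce the following asymptotics for the distribution of the splitting time ρ.
Corollary 1. For any t, m > 0, we have
$$\mu(t,m)(\rho\ge r)\ \underset{r\to t}{\sim}\ \frac{\exp\big(m^3/24t^2\big)}{C(m^{3/2}/t)}\;\frac{\sqrt{2\pi}\,m^{3/2}}{t-r}\;\exp\Big(-\frac{m^3}{24(t-r)^2}-2^{-1/3}\,\omega_1\,\frac{m}{(t-r)^{2/3}}\Big),$$
with ω1 ≈ 2.3381, and
$$\mu(t,m)(\rho\le r)\ \underset{r\to0}{\sim}\ \Big(\frac{m^{3/2}}{12t}-\frac{C'(m^{3/2}/t)}{C(m^{3/2}/t)}\Big)\frac{m^{3/2}\,r}{t^2},$$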
where C is defined in Sect. 2.2 and C′ denotes its derivative. Let us now prove these results.
3.2. Numerical illustrations. The joint density of the time ρ(1, 1) and the position R(1, 1) computed in Theorem 3 is plotted in Fig. 1. The joint density is not plotted at the extremal values α = 0 and α = 1 of the position R(1, 1), since the series $F^{(x,y)}(\lambda)$ does not converge when x = 0.
Fig. 1. Joint density of time ρ(1, 1) and the position R(1, 1) [surface plot; axes: time, position, joint density]
Shocks in Burgers Turbulence with White Noise Initial Velocity
Fig. 2. Theoretical and simulated density of the time ρ(1, 1)
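The rejection procedure described in the next paragraph can be sketched in a few lines of Python. This is a minimal sketch, not the author's code, resting on stated assumptions: the excursion is sampled as the Euclidean norm of a three-dimensional Brownian bridge (a Bessel(3) bridge from 0 to 0, exact at the grid points); σ is taken as $\sup\{a\ge0:\ e\ge p^{(a,1)}\}$ with $p^{(a,1)}(z)=a\,z(1-z)/2$; η is the rightmost minimiser of $2e(z)/(z(1-z))$; and, as in the proof of Theorem 3 with t = m = 1, ρ(1, 1) = 1 − 1/σ and R(1, 1) = η. The function names are illustrative.

```python
import math
import random


def brownian_excursion(n_steps, rng):
    """Standard Brownian excursion at z = k / n_steps, k = 0, ..., n_steps.

    The norm of a 3-dimensional Brownian bridge from 0 to 0 is a Bessel(3) bridge,
    i.e. a Brownian excursion, so the grid values are exact in law."""
    dt = 1.0 / n_steps
    coords = []
    for _ in range(3):
        w = [0.0]
        for _ in range(n_steps):
            w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
        w1 = w[-1]
        # pin the walk to 0 at z = 1: b(z) = w(z) - z w(1)
        coords.append([w[k] - k * dt * w1 for k in range(n_steps + 1)])
    return [math.sqrt(cx * cx + cy * cy + cz * cz) for cx, cy, cz in zip(*coords)]


def draw_rho_R(n_steps, rng):
    """One accepted draw of (rho(1,1), R(1,1)) by rejection sampling."""
    while True:
        e = brownian_excursion(n_steps, rng)
        sigma, k_min = math.inf, None
        for k in range(1, n_steps):               # interior points (0/0 at z = 0 and z = 1)
            z = k / n_steps
            ratio = 2.0 * e[k] / (z * (1.0 - z))  # e(z) >= a z(1-z)/2  iff  ratio >= a
            if ratio <= sigma:                    # '<=' keeps the rightmost minimiser
                sigma, k_min = ratio, k
        if sigma >= 1.0:                          # path stays above z(1-z)/2: accept
            return 1.0 - 1.0 / sigma, k_min / n_steps  # (rho, R) = (1 - 1/sigma, eta)


def simulate(n_samples, n_steps=1000, seed=0):
    """Independent draws of (rho(1,1), R(1,1))."""
    rng = random.Random(seed)
    return [draw_rho_R(n_steps, rng) for _ in range(n_samples)]
```

For example, histograms of `simulate(10_000, n_steps=1000)` can be compared with Figs. 2 and 3. At the scale reported in the text (3 · 10^6 draws of 1000 steps) a vectorised version would be needed, and the grid bias discussed below affects this sketch in the same way.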
In Figs. 2 and 3, the theoretical and simulated marginal densities of ρ(1, 1) and R(1, 1) are plotted. The simulated curves are obtained by drawing a Brownian excursion with 1000 steps. The paths that do not stay everywhere above the parabola z ↦ z(1 − z)/2 are rejected. We then compute σ and η to deduce (ρ(1, 1), R(1, 1)). The plots result from 3 · 10^6 iterations. The theoretical and simulated curves do not match exactly; the difference is due to the discretisation of the Brownian excursion. Increasing the number of steps of the simulated Brownian excursion makes the simulated curves converge towards the theoretical ones. The density of R(1, 1) is only plotted on [0.5, 0.95], since the series $F^{(x,y)}(\lambda)$ converges slowly for small values of x.
3.3. Proofs. Proof of Theorem 1. Lemma 1 ensures that the fragmentation processes may be written $M(x_n(t),t-\cdot,t)=F(t-\cdot,t,m_n(t),\varepsilon^{(x_n(t),t)})$. We thus deduce from Lemma 6 that they are independent conditionally on $\mathcal F_t$. Since the $\mathcal F_t$-conditional law of the processes $\varepsilon^{(x_n(t),t)}$, n ∈ Z, is $\nu(m_n(t),1/t)$, the $\mathcal F_t$-conditional law of $M(x_n(t),t-\cdot,t)$, n ∈ Z, only depends on $(t,m_n(t))$ and is $\mu(t,m_n(t))$.
Proof of Theorem 2. The law of the excursion $\varepsilon^{(x_1(t),t)}$ of the Brownian motion above the “1/(2t)-parabolic minorant” is ν(m(x1(t), t), 1/t). We write τ = t − ρ,
$$\varepsilon_1(z)=\varepsilon^{(x_1(t),t)}(z)-p^{(m(x_1(t),t),\,1/\tau-1/t)}(z),\qquad 0\le z\le m_1=R\,m(x_1(t),t),$$
Fig. 3. Theoretical and simulated density of the position R(1, 1) [curves: simulated, theoretical]
and
$$\varepsilon_2(z)=\varepsilon^{(x_1(t),t)}(m_1+z)-p^{(m(x_1(t),t),\,1/\tau-1/t)}(m_1+z),\qquad 0\le z\le m_2=m(x_1(t),t)-m_1.$$
Since
$$\frac1\tau=\sup\big\{a\ge0;\ \varepsilon^{(x_1(t),t)}(z)\ge p^{(m(x_1(t),t),\,a-1/t)}(z)\ \text{for}\ 0\le z\le m(x_1(t),t)\big\},$$
Lemma 5 ensures that, conditionally on (m(x1(t), t) = m, 1/τ = 1/t1 − 1/t, R = x), the law of ε1 is ν(xm, 1/t1) and the law of ε2 is ν((1 − x)m, 1/t1). Moreover, since the fragmentation of [a(x1(t)−, t), a(x1(t), t)] after time ρ is obtained from the fragmentations of [a(x1(t)−, t), a(x1(t)−, t) + Rm(x1(t), t)] and [a(x1(t)−, t) + Rm(x1(t), t), a(x1(t), t)], we deduce from Lemma 1 that
$$M(x_1(t),\tau-r,t)=F\big(\tau-r,\tau,m(x_1(t),t),\varepsilon^{(x_1(t),\tau)}\big)=F(\tau-r,\tau,m_1,\varepsilon_1)\ast F(\tau-r,\tau,m_2,\varepsilon_2).$$
Putting the pieces together, one obtains Theorem 2.
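Iterating Theorem 2, as described after its statement, generates the backward fragmentation as a random binary tree: a cluster of mass m seen at time t splits after ρ(t, m) into masses Rm and (1 − R)m, whose fragmentations then run independently over the shorter horizon t − ρ, and ∗ concatenates their mass sequences. The Python sketch below assumes a user-supplied sampler of (ρ(t, m), R(t, m)); the true joint law is the one of Theorem 3, and `toy_split` is a hypothetical placeholder used only to exercise the recursion. The names and the depth cut-off `max_depth` are illustrative choices.

```python
import random
from typing import Callable, List, Tuple

# A split sampler returns one draw of (rho(t, m), R(t, m)).
SplitSampler = Callable[[float, float, random.Random], Tuple[float, float]]


def masses_at(t: float, m: float, r: float, sample_split: SplitSampler,
              rng: random.Random, max_depth: int = 25) -> List[float]:
    """Masses M(r), from left to right, for a cluster of mass m at time t, obtained by
    iterating the splitting property of Theorem 2 down to backward time r (0 <= r < t)."""
    if max_depth == 0:
        return [m]                      # safety cut-off for the recursion
    rho, R = sample_split(t, m, rng)
    if rho > r:
        return [m]                      # no split yet at backward time r
    # M(rho + s) = M1(s) * M2(s): both pieces evolve over the horizon t - rho
    left = masses_at(t - rho, R * m, r - rho, sample_split, rng, max_depth - 1)
    right = masses_at(t - rho, (1.0 - R) * m, r - rho, sample_split, rng, max_depth - 1)
    return left + right                 # the operation * is concatenation


def toy_split(t: float, m: float, rng: random.Random) -> Tuple[float, float]:
    """Hypothetical placeholder, NOT the law of Theorem 3: rho ~ U(0, t), R ~ U(0, 1)."""
    return rng.uniform(0.0, t), rng.uniform(0.0, 1.0)
```

For instance, `masses_at(1.0, 1.0, 0.5, toy_split, random.Random(0))` returns a list of masses summing to 1 up to rounding; a sampler of the law in Theorem 3 would give draws of M(r) under µ(1, 1), up to the depth cut-off.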
Proof of Theorem 3. Conditionally on m(x1(t), t) = m, we have
$$\frac{1}{t-\rho}=\frac1\tau=\sup\big\{a\ge0;\ \varepsilon^{(x_1(t),t)}(z)\ge p^{(m,\,a-1/t)}(z)\ \text{for}\ 0\le z\le m\big\}.$$
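We thus have, with Lemma 6 and the scaling property of Brownian excursions,
$$\mu(t,m)(\rho\in dr,\ R>\alpha)=E\Big[\frac{\exp\big(-m^{3/2}t^{-1}\int_0^1e_s\,ds\big)}{C(m^{3/2}/t)};\ \sigma\in d\Big(\frac{m^{3/2}}{t-r}-\frac{m^{3/2}}{t}\Big);\ \eta>\alpha\Big],$$
where σ and η are defined in Sect. 2.3. Lemma 3 now ensures that
$$\mu(t,m)(\rho\in dr,\ R>\alpha)=P\Big(\sigma\in d\frac{m^{3/2}}{t-r};\ \eta>\alpha\ \Big|\ \sigma\ge\frac{m^{3/2}}{t}\Big)=P\Big(\sigma\in d\frac{m^{3/2}}{t-r};\ \eta>\alpha\Big)\Big/P\Big(\sigma\ge\frac{m^{3/2}}{t}\Big).$$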
We thus deduce the joint law of (τ, R) from the joint law of (σ, η), which has been computed in Lemma 4.
4. Proof of the Preliminary Results
4.1. Proof of Lemma 3. Lemma 3 is mainly an application of the Girsanov formula. Let W be a Brownian motion starting from x > 0 under $P_x$. For any bounded Borel functional f,
$$E_x\big[f\big(W_s-p^{(a,1)}(s);\ 0\le s\le1\big)\big]=e^{-a^2/24}\,E_x\Big[f(W_s;\ 0\le s\le1)\exp\Big(-\frac a2\int_0^1(1-2s)\,dW_s\Big)\Big].$$
For y > 0, it follows from the equality $p^{(a,1)}(1)=0$ that
$$E_x\big[f\big(W_s-p^{(a,1)}(s);\ 0\le s\le1\big)\,\big|\,W_1=y\big]=\frac{E_x\big[f\big(W_s-p^{(a,1)}(s);\ 0\le s\le1\big);\ W_1-p^{(a,1)}(1)\in dy\big]}{P_x(W_1\in dy)}=e^{-a^2/24}\,E_x\Big[f(W_s;\ 0\le s\le1)\exp\Big(-\frac a2\int_0^1(1-2s)\,dW_s\Big)\,\Big|\,W_1=y\Big].$$
After an integration by parts, we obtain
$$E\big[f\big(b^{[1]}_{x\to y}(s)-p^{(a,1)}(s);\ 0\le s\le1\big)\big]=e^{a(x+y)/2-a^2/24}\,E\Big[f\big(b^{[1]}_{x\to y}(s);\ 0\le s\le1\big)\exp\Big(-a\int_0^1b^{[1]}_{x\to y}(s)\,ds\Big)\Big],$$
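where $b^{[1]}_{x\to y}$ is a Brownian bridge of duration 1 from x to y. We have in particular
$$E\big[f\big(b^{[1]}_{x\to y}(s)-p^{(a,1)}(s);\ 0\le s\le1\big);\ b^{[1]}_{x\to y}-p^{(a,1)}\ge0\big]=e^{a(x+y)/2-a^2/24}\,E\Big[f\big(b^{[1]}_{x\to y}(s);\ 0\le s\le1\big)\exp\Big(-a\int_0^1b^{[1]}_{x\to y}(s)\,ds\Big);\ b^{[1]}_{x\to y}\ge0\Big].$$
We now use the fact that a Brownian bridge $b^{[1]}_{x\to y}$ conditioned to stay positive has the law of a Bessel(3) bridge $\beta^{[1]}_{x\to y}$. We divide the previous equality first by $P(b^{[1]}_{x\to y}\ge0)$:
$$E\big[f\big(\beta^{[1]}_{x\to y}(s)-p^{(a,1)}(s);\ 0\le s\le1\big);\ \beta^{[1]}_{x\to y}\ge p^{(a,1)}\big]=e^{a(x+y)/2-a^2/24}\,E\Big[f\big(\beta^{[1]}_{x\to y}(s);\ 0\le s\le1\big)\exp\Big(-a\int_0^1\beta^{[1]}_{x\to y}(s)\,ds\Big)\Big],$$
and then by $P(\beta^{[1]}_{x\to y}\ge p^{(a,1)})$:
$$E\big[f\big(\beta^{[1]}_{x\to y}(s)-p^{(a,1)}(s);\ 0\le s\le1\big)\,\big|\,\beta^{[1]}_{x\to y}\ge p^{(a,1)}\big]=E\Big[\exp\Big(-a\int_0^1\beta^{[1]}_{x\to y}(s)\,ds\Big)\Big]^{-1}E\Big[f\big(\beta^{[1]}_{x\to y}(s);\ 0\le s\le1\big)\exp\Big(-a\int_0^1\beta^{[1]}_{x\to y}(s)\,ds\Big)\Big].$$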
This means that, for a Bessel(3) bridge $\beta^{[1]}_{x\to y}$ of duration 1 under P, the law of $\beta^{[1]}_{x\to y}-p^{(a,1)}$ conditionally on $\beta^{[1]}_{x\to y}\ge p^{(a,1)}$ is
$$E\Big[\exp\Big(-a\int_0^1\beta^{[1]}_{x\to y}(s)\,ds\Big)\Big]^{-1}\exp\Big(-a\int_0^1\beta^{[1]}_{x\to y}(s)\,ds\Big)\cdot P.$$
Now, since the law of a Bessel(3) bridge $\beta^{[1]}_{x\to y}$ converges to the law of an excursion e when x, y → 0, we claim that the law of $e-p^{(a,1)}$ conditionally on $(e\ge p^{(a,1)})$ is ν(a, 1). Using the scaling property of Brownian excursions, one obtains Lemma 3.
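The constant $e^{a(x+y)/2-a^2/24}$ in the proof of Lemma 3 comes from two elementary computations, recorded here assuming $p^{(a,1)}(s)=\frac a2\,s(1-s)$ (the form consistent with $\alpha=p^{(a,1)}(x)=\frac a2x(1-x)$ in Lemma 7): the Cameron–Martin correction for the deterministic shift $p^{(a,1)}$, and an integration by parts of the stochastic integral, with $W_0=x$ and $W_1=y$.

```latex
\frac12\int_0^1 \big((p^{(a,1)})'(s)\big)^2\,ds
  = \frac12\int_0^1 \frac{a^2(1-2s)^2}{4}\,ds
  = \frac{a^2}{8}\cdot\frac13 = \frac{a^2}{24},

-\frac a2\int_0^1 (1-2s)\,dW_s
  = -\frac a2\Big(\big[(1-2s)W_s\big]_0^1 + 2\int_0^1 W_s\,ds\Big)
  = \frac{a(W_0+W_1)}{2} - a\int_0^1 W_s\,ds .
```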
4.2. Proof of Lemma 4. We first give some relations between Bessel bridges.
Lemma 7. Let $\beta^{[m]}_{x\to y}$ denote a Bessel(3) bridge from x to y of duration m. For any y, z, α ≥ 0 and x > 0, the process
$$\Big(\beta^{[x]}_{z\to y+\alpha}(t)-\frac{\alpha}{x}\,t;\ 0\le t\le x\Big)$$
conditioned to stay positive has the same law as $\beta^{[x]}_{z\to y}$. As a consequence, for a, y ≥ 0, x > 0 and $\alpha=p^{(a,1)}(x)=\frac a2\,x(1-x)$, the law of the process $\beta^{[x]}_{0\to y+\alpha}-p^{(a,1)}$ conditioned to stay positive is the same as the law of $\beta^{[x]}_{0\to y}-p^{(a,x)}$ conditioned to stay positive.
Proof of Lemma 7. For any y, z > 0, let $W^{z-y}$ be a Brownian motion starting from z − y. It is well known that the process
$$\Big(y+W^{z-y}(t)-\frac tx\,W^{z-y}(x);\ 0\le t\le x\Big)$$
is a Brownian bridge $b^{[x]}_{z\to y}$, which is independent of $W^{z-y}_x$. We can therefore condition $W^{z-y}$ by $(W^{z-y}_x=\alpha)$, which gives
$$\Big(y+b^{[x]}_{z-y\to\alpha}(t)-\frac tx\,\alpha;\ 0\le t\le x\Big)\overset{\text{law}}{=}\big(b^{[x]}_{z\to y}(t);\ 0\le t\le x\big),$$
which may be written
$$\Big(b^{[x]}_{z\to y+\alpha}(t)-\frac{\alpha}{x}\,t;\ 0\le t\le x\Big)\overset{\text{law}}{=}\big(b^{[x]}_{z\to y}(t);\ 0\le t\le x\big).$$
Since a Brownian bridge conditioned to stay positive has the law of a Bessel(3) bridge, the process $\big(\beta^{[x]}_{z\to y+\alpha}(t)-\frac{\alpha}{x}\,t;\ 0\le t\le x\big)$ conditioned to stay positive has the same law as $\beta^{[x]}_{z\to y}$. The result still holds for y = 0 or z = 0 by taking the limits y → 0, z → 0. The second part of the lemma follows from the first part and the equality
$$p^{(a,1)}(t)=p^{(a,x)}(t)+\frac{\alpha}{x}\,t.\qquad(3)$$
The proof is complete.
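Equality (3) is a one-line check, assuming the parabola of length x is $p^{(a,x)}(s)=\frac a2\,s(x-s)$ (the form consistent with the factor $e^{-a^2x^3/24}$ in the proof of Lemma 4 below) and $\alpha=p^{(a,1)}(x)=\frac a2\,x(1-x)$; the second identity is the corresponding Girsanov constant on [0, x].

```latex
p^{(a,1)}(t) - p^{(a,x)}(t)
  = \frac a2\big(t(1-t) - t(x-t)\big)
  = \frac a2\,t(1-x)
  = \frac{\alpha}{x}\,t,
\qquad
\frac12\int_0^x \Big(\frac a2\,(x-2s)\Big)^2 ds = \frac{a^2}{8}\cdot\frac{x^3}{3} = \frac{a^2x^3}{24}.
```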
We now prove Lemma 4. We set
$$\tau_a=\inf\{u>0,\ e(u)\le p^{(a,1)}(u)\}.$$
We want to compute
$$P(\eta>x,\ \sigma\in da)=P(\tau_a>x;\ \sigma\in da)=\lim_{b\downarrow0}\frac{P(\tau_a>x;\ \sigma\ge a-b)-P(\tau_a>x;\ \sigma\ge a)}{b}.$$
Conditionally on $(e_x=z)$, a Brownian excursion is the concatenation of two independent Bessel bridges $\beta^{[x]}_{0\to z}$ and $\beta^{[1-x]}_{z\to0}$. This decomposition leads us to
$$P(\tau_a>x;\ \sigma\ge a-b)=\int_0^\infty P(e_x-\alpha\in dy)\,P\big(\beta^{[x]}_{0\to y+\alpha}\ge p^{(a,1)}\big)\,P\big(\beta^{[1-x]}_{y+\alpha\to0}\ge p^{(a-b,1)}(x+\cdot)\big),$$
where $\alpha=p^{(a,1)}(x)$. We deduce from Lemma 7 and (3) that
$$P\big(\beta^{[x]}_{0\to y+\alpha}\ge p^{(a,1)}\big)=P\big(\beta^{[x]}_{0\to y}\ge p^{(a,x)}\big)\,P\Big(\beta^{[x]}_{0\to y+\alpha}(s)\ge\frac{\alpha}{x}\,s;\ 0\le s\le x\Big).$$
If one proceeds as in the proof of Lemma 3, one obtains
$$P\big(\beta^{[x]}_{0\to y}\ge p^{(a,x)}\big)=e^{-a^2x^3/24}\,E\Big[\exp\Big(-\frac a2\int_0^x(x-2s)\,d\beta^{[x]}_{0\to y}(s)\Big)\Big]=e^{-a^2x^3/24}\,e^{-axy/2}\,L^{(x,y)}(a),$$
where the second equality stems from Lemma 2.
Let us compute $P\big(\beta^{[x]}_{0\to y+\alpha}(s)\ge\frac{\alpha}{x}\,s;\ 0\le s\le x\big)$. Let B denote a three-dimensional Bessel process starting from z under $P^z$. Under $P^0$, we have the identity in law
$$(sB_{1/s};\ s\ge0)\overset{\text{law}}{=}(B_s;\ s\ge0),$$
which implies that
$$P^0\Big(B_s\ge\frac{\alpha}{x}\,s;\ 0\le s\le x\ \Big|\ B_x=y+\alpha\Big)=P^0\Big(B_{1/s}\ge\frac{\alpha}{x};\ 0\le s\le x\ \Big|\ B_{1/x}=\frac{y+\alpha}{x}\Big)=P^{(y+\alpha)/x}\Big(B_s\ge\frac{\alpha}{x};\ s\ge0\Big),$$
where the last equality follows from the Markov property of B. Moreover, it is known (see Chap. VI, Corollary (3-4) in [15]) that this last quantity is equal to y/(y + α). Putting the pieces together, we obtain
$$P\big(\beta^{[x]}_{0\to y+\alpha}\ge p^{(a,1)}\big)=\frac{y}{y+\alpha}\,e^{-a^2x^3/24}\,e^{-axy/2}\,L^{(x,y)}(a).$$
Now, since (with $\beta=p^{(b,1)}(x)$)
$$P\big(\beta^{[1-x]}_{y+\alpha\to0}\ge p^{(a-b,1)}(x+\cdot)\big)=P\big(\beta^{[1-x]}_{0\to y+\alpha}\ge p^{(a-b,1)}\big)=\frac{y+\beta}{y+\alpha}\,e^{-(a-b)^2(1-x)^3/24}\,e^{-(a-b)(1-x)(y+\beta)/2}\,L^{(1-x,\,y+\beta)}(a-b)$$
and
$$P\big(e_x\in d(y+\alpha)\big)=\frac{2(y+\alpha)^2}{\sqrt{2\pi x^3(1-x)^3}}\exp\Big(-\frac{(y+\alpha)^2}{2x(1-x)}\Big)\,dy,$$
we obtain
$$P(\tau_a>x;\ \sigma\ge a-b)=e^{-a^2/24}\,G^{(x)}(a,b),\qquad(4)$$
and at last
$$P(\tau_a>x;\ \sigma\in da)=-e^{-a^2/24}\,\partial_2G^{(x)}(a,0).$$
With Lemma 2 and the decomposition of an excursion into two Bessel bridges, one may check that $G^{(x)}(a,0)=C(a)$ for any x ∈ [0, 1]. Finally, taking x = 0 and b = 0 in formula (4) gives $P(\sigma\ge a)=e^{-a^2/24}\,C(a)$. The proof of Lemma 4 is complete.
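The value y/(y + α) quoted from [15] in the proof of Lemma 4 can also be seen from the scale function of the three-dimensional Bessel process: r ↦ −1/r is a scale function (equivalently, 1/B is a local martingale for B started away from 0). Writing $T_c$ for the hitting time of level c (notation used only here), for 0 < c < b < R:

```latex
P^b(T_c < T_R) = \frac{1/b - 1/R}{1/c - 1/R} \xrightarrow[R\to\infty]{} \frac{c}{b},
\qquad\text{hence}\qquad
P^b(B_s \ge c;\ s \ge 0) = 1 - \frac{c}{b};

b = \frac{y+\alpha}{x},\quad c = \frac{\alpha}{x}
\quad\Longrightarrow\quad
1 - \frac{c}{b} = 1 - \frac{\alpha}{y+\alpha} = \frac{y}{y+\alpha}.
```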
4.3. Proof of Lemma 5. We shall mainly use here a path decomposition for Markov processes due to Millar [14]. For any x > 0 and 0 ≤ z ≤ 1, let $P^{(x,z)}$ be the law of $\{(\beta^{[1-z]}_{x\to0}(s),\ z+s);\ 0\le s\le1-z\}$, where $\beta^{[m]}_{x\to y}$ is a Bessel(3) bridge of duration m from x to y. The canonical process X is Markovian under $P^{(x,z)}$ in its natural filtration $\{\mathcal G_s,\ s>0\}$. We set
$$f(x,y)=\begin{cases}2y(1-y)/x&\text{if }x>0,\\+\infty&\text{otherwise.}\end{cases}$$
Under $P^{(0,0)}$, we write X(s) = (e(s), s), where e is a Brownian excursion. We thus have $\sigma=\inf_{s>0}f(X_s)$, and η is the rightmost time at which $f(X_s)$ reaches its overall minimum. The theorem of Millar in [14] allows us to decompose e at time η. Indeed, applying this theorem to X under $P^{(0,0)}$, we obtain that $(e(\eta+s);\ 0\le s\le1-\eta)$ is a Markov process independent of $(e(s);\ 0\le s\le\eta)$ conditionally on (η, σ), with transitions
$$E\big[g(e_{\eta+t})\,\big|\,e_u;\ u\le\eta+s\big]=\int g(y)\,H_{t-s}(e_{\eta+s},\eta+s,\sigma;dy),$$
where
$$H_t(x,u,a;dy)=P\big(\beta^{[1-u]}_{x\to0}(t)\in dy\,\big|\,T_a=\infty\big)\quad\text{and}\quad T_a=\inf\big\{t>0,\ \beta^{[1-u]}_{x\to0}(t)<p^{(a,1)}(u+t)\big\}.$$
This means that, conditionally on (σ = a, η = x), the process $e_{\eta+\cdot}$ has the law of a Bessel(3) bridge $\beta^{[1-x]}_{p^{(a,1)}(x)\to0}$ conditioned by $(T_a=\infty)$. As in Lemma 7, the process (with $\alpha=p^{(a,1)}(x)$)
$$\big(\beta^{[1-x]}_{\alpha\to0}(t)-p^{(a,1)}(x+t);\ 0\le t\le1-x\big)$$
conditioned to stay positive has the law of
$$\big(e^{[1-x]}(t)-p^{(a,1-x)}(t);\ 0\le t\le1-x\big)$$
conditioned to stay positive, which is ν(a, 1 − x). We have proved that the law of
$$\big(e(\eta+s)-p^{(\sigma,1)}(\eta+s);\ 0\le s\le1-\eta\big)$$
conditionally on (σ = a, η = x) is ν(a, 1 − x). Using the symmetry in law
$$(e(s);\ 0\le s\le1)\overset{\text{law}}{=}(e(1-s);\ 0\le s\le1)$$
of the Brownian excursion, we obtain that the law of $(e(s)-p^{(\sigma,1)}(s);\ 0\le s\le\eta)$ conditionally on (σ = a, η = x) is ν(a, x). The first part of Lemma 5 is proved. For any 0 ≤ a ≤ b and 0 ≤ x ≤ 1, we write $e=e_2+p^{(a,1)}$
and
$$\sigma_2=\sigma-a=\sup\big\{\alpha\ge-a;\ e_2(s)\ge p^{(\alpha,1)}(s)\ \text{for}\ 0\le s\le1\big\}.$$
Conditionally on (σ = b, η = x), $e-p^{(\sigma,1)}$ has the law of the concatenation of two independent processes with laws ν(b, x) and ν(b, 1 − x) (first part of the lemma). Since $e-p^{(\sigma,1)}=e_2-p^{(\sigma_2,1)}$, we have that, conditionally on (σ = b, η = x), the law of $e_2$ is ν(a, 1) conditioned on (σ2 = b − a, η = x). The second part of Lemma 5 now follows from the scaling property of Brownian excursions.
5. Appendix
The key to the proof of Lemma 6 is the following result.
Lemma 8. We write $Y(z)=W(z)+z^2/2t$ for the Brownian motion with parabolic drift and $Y^*=\inf_{z\in\mathbb R}Y(z)$ for its minimum. For any a ∈ R and m ≥ 0, the law of the process $(Y(a+z)-Y(a);\ 0\le z\le m)$ conditionally on $Y^*=Y(a)=Y(a+m)$ is ν(m, 1/t).
Proof of Lemma 8. It follows from the work of Millar [14] that the law of $(W(a+z)-W(a),\ 0\le z\le m)$ conditionally on $Y^*=Y(a)$ is the weak limit, when x decreases to 0, of the conditional law
$$P_x\Big(\,\cdot\ \Big|\ \overline W(z)\ge-\frac1{2t}\,z(z+2a)\ \text{for}\ 0\le z\le m\Big),$$
where $\overline W$ is a Brownian motion starting from x under $P_x$. As a consequence, the law of $(W(a+z)-W(a),\ 0\le z\le m)$ conditionally on $Y(a)=Y^*=Y(a+m)$ is the weak limit, when x decreases to 0, of the conditional law
$$P_x\Big(\,\cdot\ \Big|\ \overline W(z)\ge-\frac1{2t}\,z(z+2a)\ \text{for}\ 0\le z\le m;\ \overline W(m)=x-\frac1{2t}\,m(m+2a)\Big),$$
which is actually the weak limit, when x decreases to 0, of the law of a Brownian bridge $b^{[m]}_{x\to x-\alpha}$ of duration m joining x to x − α (with α = m(m + 2a)/2t), conditioned to stay above the path z ↦ −z(z + 2a)/2t. The same argument as in the proof of Lemma 7 ensures that the law of $b^{[m]}_{x\to x-\alpha}$ conditioned to stay above the path z ↦ −z(z + 2a)/2t is the same as the law of $(b^{[m]}_{x\to x}(z)-\alpha z/m;\ 0\le z\le m)$ with $b^{[m]}_{x\to x}$ conditioned to stay above z ↦ z(m − z)/2t. Now, on the one hand, the law of a Brownian bridge $b^{[m]}_{x\to x}$ conditioned to stay above z ↦ z(m − z)/2t is the same as the law of a Bessel bridge $\beta^{[m]}_{x\to x}$ of duration m joining x to x, conditioned to stay above z ↦ z(m − z)/2t (see the proof of Lemma 3 for very similar arguments); on the other hand, the law of $\beta^{[m]}_{x\to x}$ converges, when x decreases to 0, towards the law of a Brownian excursion $e^{[m]}$ of duration m. Putting the pieces together, we deduce that the piece of Brownian motion $(W(a+z)-W(a),\ 0\le z\le m)$ conditioned by $Y(a)=Y^*=Y(a+m)$ has the law of $(e^{[m]}(z)-\alpha z/m;\ 0\le z\le m)$, where $e^{[m]}$ is conditioned to stay above z ↦ z(m − z)/2t. Lemma 8 now follows from Lemma 3 and the equality
$$Y(a+z)-Y(a)=W(a+z)-W(a)+\frac{1}{2t}\,z(z+2a).$$
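The parabola in the last step comes from combining the linear correction with the drift: with α = m(m + 2a)/2t as above, for 0 ≤ z ≤ m,

```latex
e^{[m]}(z) - \frac{\alpha z}{m} + \frac{z(z+2a)}{2t}
  = e^{[m]}(z) - \frac{z(m+2a)}{2t} + \frac{z(z+2a)}{2t}
  = e^{[m]}(z) - \frac{z(m-z)}{2t},
```

so that $Y(a+z)-Y(a)$ is the excursion $e^{[m]}$ minus the parabola z ↦ z(m − z)/2t, conditioned to stay nonnegative, which is exactly the setting of Lemma 3.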
Proof of Lemma 6. Let us define, for n ∈ Z, the processes
$$W_-^{a_n(t)}=\big(W(z);\ -\infty<z\le a_n(t)\big),\qquad W_+^{a_n(t)}=\big(W(z+a_n(t))-W(a_n(t));\ z\ge0\big).$$
We know from the theory of splitting times (see [9] for a short introduction to splitting times) that $W_-^{a(0,t)}$ is independent of $W_+^{a(0,t)}$ conditionally on a(0, t). We now want to prove that $W_-^{a_n(t)}$ is independent of $W_+^{a_n(t)}$ conditionally on $\mathcal F_t$. For any real y, let us call $N_t(y)$ the index of the first cluster located to the right of y. By stationarity of W, the processes $(W(a_0(t)+z);\ z\in\mathbb R)$ and $(W(a_{N_t(y)}(t)+z);\ z\in\mathbb R)$ have the same law. In particular, $W_-^{a_{N_t(y)}(t)}$ is independent of $W_+^{a_{N_t(y)}(t)}$ conditionally on $a_{N_t(y)}(t)$. Let $(q_k;\ k\in\mathbb N)$ be an enumeration of the rational numbers, and define the event
$$\{q_k\in A^n_t\}=\{N_t(q_k)=n;\ N_t(q_l)\ne n,\ \forall\,l<k\},$$
which is $\mathcal F_t$-measurable. For any Borel functionals f, g on paths, we have
$$\begin{aligned}E\big[f(W_-^{a_n(t)})\,g(W_+^{a_n(t)})\,\big|\,\mathcal F_t\big]&=\sum_{k=0}^\infty E\big[f(W_-^{a_{N_t(q_k)}(t)})\,g(W_+^{a_{N_t(q_k)}(t)});\ q_k\in A^n_t\,\big|\,\mathcal F_t\big]\\&=\sum_{k=0}^\infty E\Big[E\big[f(W_-^{a_{N_t(q_k)}(t)})\,\big|\,\mathcal F_t\big]\,E\big[g(W_+^{a_{N_t(q_k)}(t)})\,\big|\,\mathcal F_t\big];\ q_k\in A^n_t\,\Big|\,\mathcal F_t\Big]\\&=E\big[f(W_-^{a_n(t)})\,\big|\,\mathcal F_t\big]\,E\big[g(W_+^{a_n(t)})\,\big|\,\mathcal F_t\big].\end{aligned}$$
This ensures the independence of $W_-^{a_n(t)}$ and $W_+^{a_n(t)}$ conditionally on $\mathcal F_t$ and, consequently, the independence of the processes $(\varepsilon^{(x_n(t),t)};\ n\in\mathbb Z)$ conditionally on $\mathcal F_t$. Let us now give the law of $\varepsilon^{(x_n(t),t)}$ conditionally on $\mathcal F_t$. As a consequence of the stationarity of the Brownian motion W, the process $\varepsilon^{(x_n(t),t)}$ conditioned by $a_{n-1}(t)-x_n(t)=a$ and $m_n(t)=m$ has the same law as the process $(Y(a+z)-Y(a);\ 0\le z\le m)$ conditioned by $Y^*=Y(a)=Y(a+m)$. The conditional law of $\varepsilon^{(x_n(t),t)}$ given $\mathcal F_t$ is then $\nu(m_n(t),1/t)$.
Acknowledgement. I am very grateful to J. Bertoin for his guidance, without which this work would not have been possible.
References
1. Abramowitz, M. and Stegun, I.A.: Handbook of mathematical functions. Washington: Nat. Bur. Stand. 1964
2. Avellaneda, M. and E, W.: Statistical properties of shocks in Burgers Turbulence. Commun. Math. Phys. 172, 13–38 (1995)
3. Avellaneda, M.: Statistical properties of shocks in Burgers Turbulence II. Commun. Math. Phys. 169, 45–59 (1995)
4. Bertoin, J.: The inviscid Burgers equation with Brownian initial velocity. Commun. Math. Phys. 193, 397–406 (1998)
5. Bertoin, J.: Clustering statistics for sticky particles with Brownian initial velocity. J. Math. Pures Appl. 79, 2, 173–194 (2000)
6. Burgers, J.M.: The nonlinear diffusion equation. Dordrecht: Reidel (1974)
7. Cole, J.D.: On a quasi linear parabolic equation occurring in aerodynamics. Quart. Appl. Math. 9, 225–236 (1951)
8. Frachebourg, L. and Martin, P.A.: Exact statistical properties of the Burgers equation. To appear in J. Fluids Mech.
9. Getoor, R.K.: Splitting times and shift functionals. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 47, 69–81 (1979)
10. Groeneboom, P.: The concave majorant of Brownian motion. Ann. of Proba. 11, 4, 1016–1027 (1983)
11. Groeneboom, P.: Brownian motion with a parabolic drift and Airy functions. Probability Theory and Rel. Fields 81, 79–109 (1989)
12. Hopf, E.: The partial differential equation u_t + uu_x = µu_xx. Comm. Pure Appl. Math. 3, 201–230 (1950)
13. Leonenko, N.: Limit theorems for random fields with singular spectrum. Math. and Appl. Dordrecht: Kluwer Academic Publishers, 1999
14. Millar, P.W.: A path decomposition for Markov processes. Ann. of Proba. 6, 345–348 (1978)
15. Revuz, D., Yor, M.: Continuous martingales and Brownian motion. Grundlehren Math. Wiss. 293, Berlin–Heidelberg–New York: Springer-Verlag, 1991
16. Ryan, R.: Large-deviation analysis of Burgers turbulence with white-noise initial data. Comm. Pure Applied Math. 51, 47–75 (1998)
17. She, Z., Aurell, E. and Frisch, U.: The inviscid Burgers equation with initial data of Brownian type. Commun. Math. Phys. 148, 623–641 (1992)
18. Sinai, Y.: Statistics of shocks in solutions of inviscid Burgers Equation. Commun. Math. Phys. 148, 601–621 (1992)
19. Woyczynski, W.A.: Burgers-KPZ Turbulence. Göttingen lectures, Lecture Notes in Math. 1700, Berlin–Heidelberg–New York: Springer, 1998
Communicated by Ya. G. Sinai
Commun. Math. Phys. 223, 87–123 (2001)
Communications in Mathematical Physics
© Springer-Verlag 2001
Infinite Random Matrices and Ergodic Measures
Alexei Borodin¹, Grigori Olshanski²
¹ Department of Mathematics, The University of Pennsylvania, Philadelphia, PA 19104-6395, USA.
² Dobrushin Mathematics Laboratory, Institute for Problems of Information Transmission, Bolshoy Karetny 19, 101447 Moscow GSP-4, Russia.
Received: 22 January 2001 / Accepted: 30 May 2001
Abstract: We introduce and study a 2-parameter family of unitarily invariant probability measures on the space of infinite Hermitian matrices. We show that the decomposition of a measure from this family into ergodic components is described by a determinantal point process on the real line. The correlation kernel for this process is explicitly computed. For certain values of the parameters, the kernel turns into the well-known sine kernel, which describes the local correlations in the Circular and Gaussian Unitary Ensembles. Thus, the random point configuration of the sine process is interpreted as the random set of “eigenvalues” of infinite Hermitian matrices distributed according to the corresponding measure.
Contents
Introduction
1. The Pseudo-Jacobi Ensemble
2. The Scaling Limit of the Correlation Functions
3. The Hua–Pickrell Measures
4. Ergodic Measures
5. Approximation of Spectral Measures
6. The Main Result
7. Vanishing of the Parameter γ2
8. Remarks and Problems
9. Appendix: Existence and Uniqueness of Decomposition on Ergodic Components
Introduction We first introduce some basic notions, and then describe the main results of the paper.
Random point configurations and correlation functions. Let X be a locally compact space. A locally finite point configuration in X is a finite or countably infinite collection of points in X, also called particles, such that any compact set contains finitely many particles. The ordering of the particles is irrelevant. For the sake of brevity, we will omit the adjective “locally finite”. A point process on X is a probability measure on the space Conf(X) of point configurations. Given a point process, we can speak about the random point configuration. The nth correlation measure of a point process (n = 1, 2, . . . ) is a symmetric measure ρn on X^n, which is determined by the relation
$$\langle\rho_n,F\rangle=E\sum F(x_1,\dots,x_n),\qquad(0.1)$$
where F is a compactly supported test function on X^n, E denotes expectation, and the summation is taken over all ordered n-tuples of distinct particles chosen from the random point configuration. The nth correlation function is the density of ρn with respect to the nth power of a certain reference measure on X. Usually, the reference measure is the Lebesgue measure. The first correlation function is also called the density function. See [Len], [DVJ, Ch. 5]¹, [So].
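Relation (0.1) also suggests a direct Monte Carlo estimator of ⟨ρn, F⟩ from sampled configurations. The Python sketch below is illustrative (the function name is ours). With F ≡ 1 on a bounded set A, the inner sum counts ordered n-tuples of distinct particles in A, so the estimator targets the factorial moment E[k(k − 1)···(k − n + 1)] of the number k of particles in A, which is where the name mentioned in the footnote comes from.

```python
from itertools import permutations


def correlation_pairing(configurations, F, n):
    """Monte Carlo estimate of <rho_n, F> in (0.1) from sampled finite configurations.

    For each configuration, F is summed over all ordered n-tuples of distinct particles
    (itertools.permutations picks distinct positions of the list); the sums are averaged."""
    total = 0.0
    for config in configurations:
        total += sum(F(*xs) for xs in permutations(config, n))
    return total / len(configurations)
```

For example, for the two configurations [0, 1, 2] and [5] with F ≡ 1 and n = 2, the sums are 3 · 2 = 6 and 0, so the estimate is 3.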
The Dyson circular unitary ensemble. Let T ⊂ C be the unit circle and T^N/S(N) be the set of orbits of the symmetric group S(N) of degree N acting on the torus T^N, where N = 1, 2, . . . . Consider the following probability measure on T^N/S(N):
$$\text{const}\cdot\prod_{1\le j<k\le N}|u_j-u_k|^2\prod_{j=1}^N d\varphi_j,\qquad u_j=e^{2\pi i\varphi_j}\in\mathbb T,\quad\varphi_j\in[-\tfrac12,\tfrac12],\qquad(0.2)$$
where const is the normalizing factor and $i=\sqrt{-1}$. This measure defines a point process on X = T living on the N-point configurations, which is called the Nth Dyson circular unitary ensemble, or simply the Dyson ensemble for short. Note that the Dyson ensemble is invariant under rotations of T. Let U(N) be the group of N × N unitary matrices. Consider the natural projection U(N) → T^N/S(N) assigning to a matrix U ∈ U(N) the collection of its eigenvalues. Note that the fibers of this projection are exactly the conjugacy classes of the group U(N). The measure (0.2) coincides with the pushforward of the normalized Haar measure on U(N) under this projection. In other words, (0.2) is the radial part of the Haar measure. It follows that the Dyson ensemble is formed by the spectra of random unitary matrices U ∈ U(N) distributed according to the Haar measure. See [Dys, Me].
¹ In the book [DVJ] the correlation measures are called “factorial moment measures”.
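Since (0.2) is the pushforward of the Haar measure, a sample of the Nth Dyson ensemble can be drawn by sampling a Haar unitary matrix and taking the angles of its eigenvalues. The Python sketch below uses the standard construction from the QR factorisation of a complex Gaussian matrix, with the phases of the diagonal of R absorbed into Q; this recipe is our choice and is not described in the text. The function names are illustrative.

```python
import numpy as np


def haar_unitary(N, rng):
    """An N x N unitary matrix distributed according to the normalized Haar measure on U(N).

    QR-factorise a complex Gaussian (Ginibre) matrix and absorb the phases of diag(R)
    into Q, so that the result does not depend on the sign conventions of the QR routine."""
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2.0)
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))  # multiplies column j of Q by the phase of R[j, j]


def dyson_angles(N, rng):
    """Sorted angular coordinates phi_j in [-1/2, 1/2] of the eigenvalues
    u_j = exp(2 pi i phi_j) of a Haar unitary: one sample of the Nth Dyson ensemble (0.2)."""
    u = np.linalg.eigvals(haar_unitary(N, rng))
    return np.sort(np.angle(u) / (2.0 * np.pi))
```

A quick sanity check for Haar-distributed U is E|Tr U|² = 1 for every N ≥ 1, which the test below verifies empirically.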
The sine process. This is a translationally invariant point process on X = R. Its correlation functions (with respect to the Lebesgue measure on R) are given by
$$\rho_n(y_1,\dots,y_n)=\det\Big[\frac{\sin(\pi(y_j-y_k))}{\pi(y_j-y_k)}\Big]_{j,k=1}^n,\qquad n=1,2,\dots,\quad y_1,\dots,y_n\in\mathbb R.\qquad(0.3)$$
The function $\frac{\sin(\pi(y-y'))}{\pi(y-y')}$ on R × R is called the sine kernel. The correlation functions of the sine process can be obtained from the correlation functions of the Nth Dyson ensemble by the following scaling limit as N → ∞. Fix an arbitrary point u0 ∈ T and rescale the angular coordinate ϕ about the point u0 by writing $u=u_0e^{2\pi iy/N}$. Then, for any fixed n, the nth correlation function of the Nth Dyson ensemble, expressed in terms of the y-variables, converges, as N → ∞, to the function (0.3). See [Dys, Me].
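The determinant (0.3) and the scaling limit just described are easy to check numerically. The Python sketch below evaluates ρn of the sine process and compares the rescaled Dyson kernel with the sine kernel. It assumes the standard fact (see [Dys, Me]) that the Nth Dyson ensemble is determinantal with kernel sin(πN(ϕ − ϕ′))/sin(π(ϕ − ϕ′)) in the angular coordinate, up to a phase factor that cancels in the determinants; dividing by N accounts for the change of variables ϕ = ϕ0 + y/N. The function names are ours.

```python
import math

import numpy as np


def sine_correlation(ys):
    """rho_n(y_1, ..., y_n) of the sine process, computed as the determinant in (0.3).

    numpy.sinc(d) = sin(pi d) / (pi d), with value 1 at d = 0 (the diagonal of the kernel)."""
    ys = np.asarray(ys, dtype=float)
    return float(np.linalg.det(np.sinc(ys[:, None] - ys[None, :])))


def dyson_kernel_rescaled(y1, y2, N):
    """Dyson kernel sin(pi N d) / sin(pi d) at d = (y1 - y2) / N, divided by N."""
    d = (y1 - y2) / N
    if d == 0.0:
        return 1.0
    return math.sin(math.pi * N * d) / (N * math.sin(math.pi * d))
```

In particular ρ1 ≡ 1, and ρ2(y, y′) = 1 − (sin π(y − y′)/π(y − y′))² vanishes as y′ → y, the familiar level repulsion.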
A substitute of the Haar measure. A natural question is whether the sine process can be interpreted as the radial part of an infinite-dimensional analog of the Haar measure. In this paper we suggest such an interpretation. It is convenient to pass from unitary matrices to Hermitian matrices. Let H(N) be the linear space of N × N complex Hermitian matrices. Consider the Cayley transform
$$H(N)\ni X\mapsto U=\frac{i-X}{i+X}\in U(N),\qquad N=1,2,\dots.\qquad(0.4)$$
The map (0.4) is one-to-one, and the complement of its image in U(N) is a negligible set. Thus, we can transfer the normalized Haar measure from U(N) to H(N). The result has the following form:
$$\text{const}\cdot\det(1+X^2)^{-N}\times(\text{the Lebesgue measure}).\qquad(0.5)$$
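A quick numerical check of (0.4): the Python sketch below (function names are ours) verifies that the Cayley transform of a Hermitian matrix is unitary, that $X=i(1-U)(1+U)^{-1}$ inverts it, and that eigenvalues transform as λ ↦ (i − λ)/(i + λ). Since (i − λ)/(i + λ) ≠ −1 for real λ, the image misses exactly the unitary matrices having −1 as an eigenvalue, which is the negligible set mentioned above.

```python
import numpy as np


def cayley(X):
    """U = (i - X)(i + X)^{-1} as in (0.4); the two factors commute, so the order is immaterial."""
    I = np.eye(X.shape[0])
    return (1j * I - X) @ np.linalg.inv(1j * I + X)


def inverse_cayley(U):
    """X = i(1 - U)(1 + U)^{-1}; defined whenever -1 is not an eigenvalue of U."""
    I = np.eye(U.shape[0])
    return 1j * (I - U) @ np.linalg.inv(I + U)
```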
Let H be the space of all infinite Hermitian matrices $X=[X_{jk}]_{j,k=1}^\infty$. A remarkable fact is that the measures (0.5) with different values of N are consistent with the natural projections H(N) → H(N − 1) and, therefore, determine a probability measure m on H. We view m as a substitute for the Haar measure on U(N) for N = ∞.
Ergodic measures. Assume that we have a group acting on a Borel space. An invariant probability Borel measure is called ergodic if any invariant mod 0 set has measure 0 or 1. Ergodic measures coincide with the extreme points of the convex set of all invariant probability measures, see [Ph]. For continuous actions of compact groups, ergodic measures are exactly the orbital measures, i.e., invariant probability measures supported by orbits. According to the general philosophy of ergodic theory, the concept of ergodic measure is the right generalization of that of orbital measure. We are interested in the special situation when the space is H and the group is an infinite-dimensional version U(∞) of the groups U(N). By definition, U(∞) is the union of the groups U(N). Its elements are infinite unitary matrices $[U_{jk}]_{j,k=1}^\infty$ with finitely many entries $U_{jk}$ not equal to $\delta_{jk}$. The group U(∞) acts on the space H by conjugations.
Consider the space Ω whose elements ω are given by 2 infinite sequences
$$\alpha_1^+\ge\alpha_2^+\ge\dots\ge0,\qquad\alpha_1^-\ge\alpha_2^-\ge\dots\ge0,\qquad\text{where}\quad\sum_{j=1}^\infty(\alpha_j^+)^2+\sum_{j=1}^\infty(\alpha_j^-)^2<\infty,\qquad(0.6)$$
and 2 extra real parameters γ1, γ2, where γ2 ≥ 0. It is known that the ergodic measures on H can be parametrized by the points ω ∈ Ω. We consider Ω as a substitute for the space T^N/S(N) for N = ∞.
Let us explain the asymptotic meaning of the parameters $\alpha_j^\pm$, γ1, γ2. According to a general result, each ergodic measure M on H can be approximated by a sequence $\{M^{(N)}\mid N=1,2,\dots\}$, where $M^{(N)}$ is an orbital measure on H(N) with respect to the action of U(N) by conjugations. Any such measure $M^{(N)}$ is specified by a collection $\lambda^{(N)}$ of eigenvalues. Then the parameters of ω describe the asymptotic behavior of $\lambda^{(N)}$ as N → ∞:
$$\begin{aligned}\lambda^{(N)}=\big(\lambda_1^{(N)}\ge\dots\ge\lambda_N^{(N)}\big)&\sim\big(N\alpha_1^+,N\alpha_2^+,\dots,-N\alpha_2^-,-N\alpha_1^-\big),\\ \frac{\lambda_1^{(N)}+\dots+\lambda_N^{(N)}}{N}&\to\gamma_1,\\ \frac{(\lambda_1^{(N)})^2+\dots+(\lambda_N^{(N)})^2}{N^2}&\to\gamma_2+(\alpha_1^+)^2+(\alpha_2^+)^2+\dots+(\alpha_1^-)^2+(\alpha_2^-)^2+\dots.\end{aligned}\qquad(0.7)$$
For more details, see [Pi2, OV], and references therein.
From spectral measures to point processes. It can be proved that any U(∞)-invariant probability measure on H can be decomposed into ergodic components, i.e., it can be written as a continual convex combination of ergodic measures. This decomposition is unique; we call it the spectral decomposition. It is determined by a probability measure on Ω, which we call the spectral measure of the initial invariant measure. We map the space Ω to the space Conf(R*) of point configurations on the punctured real line R* = R \ {0} as follows:
$$\omega=\big(\{\alpha_j^+\},\{\alpha_j^-\},\gamma_1,\gamma_2\big)\mapsto C=\big(-\alpha_1^-,-\alpha_2^-,\dots,\alpha_2^+,\alpha_1^+\big)\in\mathrm{Conf}(\mathbb R^*),\qquad(0.8)$$
where we omit possible zeros among the numbers $\alpha_j^\pm$. The map (0.8) transforms any spectral measure (which is a probability measure on Ω) into a point process on R*. This makes it possible to describe spectral measures in terms of correlation functions. However, the map (0.8) ignores the parameters γ1, γ2. Note that each configuration C ∈ Conf(R*) of the form (0.8) is contained in a sufficiently large interval |x| ≤ const. It follows that $C^{-1}$ (the image of C under the inversion map x ↦ 1/x) is a well-defined configuration on the whole line R.
An interpretation of the sine process. Applying the procedure described above to the measure m on H, we prove the following result.
Theorem I. Let P be the spectral measure of the U(∞)-invariant measure m and let 𝒫 be the corresponding point process on R*. Then the point process on R obtained from 𝒫 under the transform x ↦ y = −1/(πx) coincides with the sine process.
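The maps in (0.8) and Theorem I, together with the heuristics (0.9) and (0.10) that follow, can be illustrated numerically. In the Python sketch below (function names are ours), `exact_y` uses the elementary identity (i − Nx)/(i + Nx) = −exp(−2i arctan(1/(Nx))), so that y = −(N/π) arctan(1/(Nx)) exactly; this tends to −1/(πx) as N → ∞. The function `mobius` is the map (0.10), which corresponds to λ ↦ λ/N under the Cayley transform.

```python
import math


def configuration(alpha_plus, alpha_minus):
    """The map (0.8): the points -alpha_j^- and alpha_j^+ on R* = R minus {0}, zeros omitted."""
    return sorted([-a for a in alpha_minus if a != 0] + [a for a in alpha_plus if a != 0])


def sine_variable(points):
    """The transform x -> y = -1/(pi x) of Theorem I, applied pointwise."""
    return sorted(-1.0 / (math.pi * x) for x in points)


def cayley_scalar(lam):
    """u = (i - lam)/(i + lam): the Cayley transform (0.4) on one eigenvalue."""
    return (1j - lam) / (1j + lam)


def exact_y(x, N):
    """The y with cayley_scalar(N x) = -exp(2 pi i y / N); compare with (0.9)."""
    return -(N / math.pi) * math.atan(1.0 / (N * x))


def mobius(u, N):
    """The fractional-linear map (0.10) of the unit circle, with fixed points +1 and -1."""
    return ((N + 1) * u + (N - 1)) / ((N - 1) * u + (N + 1))
```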
A simple explanation of this result follows from the comparison of two approximation procedures: that for the correlation functions of the sine process and that for the ergodic measures. Indeed, the eigenvalues in (0.7) grow linearly in N, so we rescale them according to the rule λ = Nx. Under the Cayley transform $u=\frac{i-\lambda}{i+\lambda}$ the scaling takes the form
$$u=\frac{i-Nx}{i+Nx}=-1+\frac{2i}{Nx}+O\Big(\frac1{N^2}\Big)=(-1)\,e^{2\pi iy/N}+O\Big(\frac1{N^2}\Big),\qquad y=-\frac{1}{\pi x},\qquad(0.9)$$
which means that the variable y is consistent with the scaling of the Dyson ensemble near the point u0 = −1. Thus, the statement of Theorem I is not surprising. However, justifying the formal limit transition at the level of correlation functions requires some effort. Note also that dividing the eigenvalues λ ∈ R by N corresponds, in terms of $u=\frac{i-\lambda}{i+\lambda}$, to the fractional-linear transformation of T of the form
$$u\mapsto\frac{(N+1)u+(N-1)}{(N-1)u+(N+1)}.\qquad(0.10)$$
This transformation has two fixed points, +1 and −1. Near the point −1 it looks like a dilation by the factor N, while near the point +1 it looks like a contraction by the factor N. Using (0.10) as a scaling transformation, one can define a scaling limit for the correlation functions of the Dyson ensembles staying on the circle T. Theorem I is complemented by
Theorem II. The spectral measure P of the measure m is concentrated on the subset {ω ∈ Ω | γ2 = 0}.
Thus, the parameter γ2 (which is ignored by the map (0.8)) is actually irrelevant for the measure m. In a certain sense, this means that the measure m does not involve Gaussian components (see Sect. 4 about the connection of the parameter γ2 with Gaussian measures).
A generalization: The main result. Let s ∈ C with Re s > −1/2 be a parameter. Consider the following probability measure on T^N/S(N):
$$\text{const}\cdot\prod_{1\le j<k\le N}|u_j-u_k|^2\prod_{j=1}^N(1+u_j)^{\bar s}(1+\bar u_j)^{s}\,d\varphi_j,\qquad u_j=e^{2\pi i\varphi_j}\in\mathbb T,\quad\varphi_j\in[-\tfrac12,\tfrac12].\qquad(0.11)$$
When s = 0, we get (0.2). Thus, this is a deformation of the measure (0.2) depending on two real parameters, Re s and Im s. The measure (0.11) is the radial part of the probability measure on U(N) of the form
$$\text{const}\cdot\det\big((1+U)^{\bar s}\big)\det\big((1+U^{-1})^{s}\big)\times(\text{the Haar measure on }U(N)).\qquad(0.12)$$
Transferring the measure (0.12) from the group U(N) to the space H(N) by means of the Cayley transform (0.4), we get the following measure on H(N), which is a deformation of the measure (0.5):
$$\text{const}\cdot\det\big((1+iX)^{-s-N}\big)\det\big((1-iX)^{-\bar s-N}\big)\times(\text{the Lebesgue measure on }H(N)).\qquad(0.13)$$
When s is real, the expression (0.13) takes a simpler form:
$$\text{const}\cdot\det\big((1+X^2)^{-s-N}\big)\times(\text{the Lebesgue measure on }H(N)),\qquad s\in\mathbb R,\quad s>-\tfrac12.\qquad(0.14)$$
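For numerical work with (0.13), it is convenient to evaluate the weight through the eigenvalues λj of X. The Python sketch below assumes the principal branch of the logarithm for the complex powers; this is an assumption, since the text does not spell out the branch. With that choice, log(1 − iλ) is the complex conjugate of log(1 + iλ) for real λ, so the log-weight is real. For real s it reduces to −(s + N) log det(1 + X²), i.e. to (0.14). The function name is ours.

```python
import numpy as np


def hua_pickrell_log_weight(X, s):
    """Log of the unnormalised density (0.13) at a Hermitian N x N matrix X, for complex s.

    With eigenvalues lam_j of X and the principal branch of log (an assumption):
        sum_j [ -(s + N) log(1 + i lam_j) - (conj(s) + N) log(1 - i lam_j) ].
    The two terms are complex conjugates, so the value is real up to rounding."""
    N = X.shape[0]
    lam = np.linalg.eigvalsh(X)
    terms = -(s + N) * np.log(1 + 1j * lam) - (np.conj(s) + N) * np.log(1 - 1j * lam)
    return complex(np.sum(terms))
```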
Again, it turns out that the measures (0.13) are consistent with the projections $H(N)\to H(N-1)$, and they determine a $U(\infty)$-invariant probability measure on the space $H$. We denote it by $m^{(s)}$. Note that $m^{(0)}=m$. To our knowledge, the finite-dimensional measures (0.14) were first studied by Hua. He calculated the normalizing constant in (0.14) using a recurrence relation in $N$, and his argument proves the consistency property (although he did not state it explicitly), see [Hua, Theorem 2.1.5]. Much later Pickrell [Pi1] considered analogs of the measures (0.12) and (0.13) (with real $s$), which live on complex Grassmannians and on the spaces of all complex matrices, respectively. He proved the consistency property and considered the analogs of the measures $m^{(s)}$ on the space of all complex matrices of infinite order. His paper also contains a few other important ideas and results. Apparently, Pickrell was unaware of Hua's work. Note also Shimomura's paper [Shim], where an analog of the measure $m^{(0)}$ for the infinite-dimensional orthogonal group was constructed (more general measures depending on a parameter are not discussed in [Shim]). The possibility of introducing a complex parameter (in the case of Hermitian matrices) was discovered by Neretin [Ner2]. He also examined further generalizations of the measures $m^{(s)}$. We propose to call the measures $m^{(s)}$ the Hua–Pickrell measures.

Theorem III. The Hua–Pickrell measures $m^{(s)}$ on $H$ are pairwise disjoint. That is, for any two different values $s'$, $s''$ of the parameter there exist two disjoint Borel subsets of $H$ supporting $m^{(s')}$ and $m^{(s'')}$, respectively.

The next claim is the main result of the paper.

Theorem IV. Let $P^{(s)}$ be the spectral measure of a Hua–Pickrell measure $m^{(s)}$. The corresponding point process $\mathcal{P}^{(s)}$ on $\mathbb{R}^*$ can be described in terms of its correlation functions. They have the determinantal form
$$\rho_n^{(s)}(x_1,\dots,x_n)=\det\big[K^{(s)}(x_j,x_k)\big]_{j,k=1}^{n}, \tag{0.15}$$
where $K^{(s)}(x',x'')$ is a certain kernel on $\mathbb{R}^*\times\mathbb{R}^*$ which can be expressed through the confluent hypergeometric function or, for real values of $s$, through the Bessel function. We give explicit expressions for the kernel in Theorem 2.1 below. As in Theorem I, one can use the transformation $C\to C^{-1}$ to pass from $\mathbb{R}^*$ to $\mathbb{R}$.

Pseudo-Jacobi polynomials. The proof of Theorem IV, similarly to that of Theorem I, consists of three steps: the calculation of the correlation functions for the finite-dimensional measures (0.13), the scaling limit transition as $N\to\infty$, and a justification. However, the first step is more involved than for the Dyson ensemble. We show that the correlation functions are expressed through the Christoffel–Darboux kernel for the so-called pseudo-Jacobi polynomials. This family of orthogonal polynomials, which is not widely known, has interesting features. It is defined by a weight function on $\mathbb{R}$ with only finitely many moments, so that the system of orthogonal polynomials is finite.
Infinite Random Matrices and Ergodic Measures
93
Organization of the paper. In Sect. 1 we introduce the pseudo-Jacobi ensemble and obtain its correlation functions. In Sect. 2 we compute the scaling limit of these correlation functions as the number of particles goes to infinity. The limit correlation functions are given by a determinantal formula, and we write down the correlation kernel explicitly. In Sect. 3 we define the Hua–Pickrell measures $m^{(s)}$ and show that they are pairwise disjoint. Section 4 provides a brief summary of known results about the ergodic $U(\infty)$-invariant probability measures on $H$. In Sect. 5 we show that the spectral measure for any $U(\infty)$-invariant probability measure $M$ on $H$ can be approximated by finite-dimensional projections of $M$. Section 6 contains the proof of our main result (Theorem IV above). In Sect. 7 we prove that the sine process has no Gaussian component (Theorem II above). Section 8 contains remarks concerning the connections of our work with other subjects as well as several open problems. Section 9 is an appendix where we prove the existence and uniqueness of the decomposition of $U(\infty)$-invariant probability measures on $H$ into ergodic measures.

1. The Pseudo-Jacobi Ensemble

In this section we define the pseudo-Jacobi ensemble and compute its correlation functions. Consider the radial part of the Haar measure on $U(N)$ which determines the Dyson ensemble, see (0.2). Under the inverse Cayley transform $\mathbb{T}\to\mathbb{R}$, which takes $u\in\mathbb{T}$ to $x=i\,\frac{1-u}{1+u}\in\mathbb{R}$, the measure (0.2) turns into the following measure on $\mathbb{R}^N/S(N)=\mathrm{Conf}_N(\mathbb{R})$, the set of $N$-point configurations on $\mathbb{R}$:
$$\mathrm{const}\cdot\prod_{1\le j<k\le N}(x_j-x_k)^2\cdot\prod_{j=1}^{N}(1+x_j^2)^{-N}\,dx_j. \tag{1.1}$$
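More generally, let $s$ be a complex parameter. We introduce the following deformation of the measure (1.1) depending on $s$:
$$\begin{aligned}&\mathrm{const}\cdot\prod_{1\le j<k\le N}(x_j-x_k)^2\cdot\prod_{j=1}^{N}(1+ix_j)^{-s-N}(1-ix_j)^{-\bar s-N}\,dx_j\\&\qquad=\mathrm{const}\cdot\prod_{1\le j<k\le N}(x_j-x_k)^2\cdot\prod_{j=1}^{N}(1+x_j^2)^{-\Re s-N}\,e^{2\Im s\,\operatorname{Arg}(1+ix_j)}\,dx_j.\end{aligned}\tag{1.2}$$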
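Here we assume that the function $\operatorname{Arg}(\cdot)$ takes values in $(-\pi,\pi)$ (actually, $\operatorname{Arg}(1+ix_j)\in(-\frac{\pi}{2},\frac{\pi}{2})$).

Proposition 1.1. The measure (1.2) is finite provided that $\Re s>-\frac12$.

Proof. This follows from the estimate
$$(1+x^2)^{-\Re s-N}\,e^{2\Im s\,\operatorname{Arg}(1+ix)}\asymp|x|^{-2\Re s-2N},\qquad x\in\mathbb{R},\quad|x|\gg0, \tag{1.3}$$
and the fact that the expansion of $\prod_{1\le j<k\le N}(x_j-x_k)^2$ into monomials contains each variable $x_j$ in degree at most $2N-2$, so that the integral of each monomial against $\prod_j\phi(x_j)\,dx_j$ converges at infinity: $\int^{\infty}|x|^{2N-2}\,|x|^{-2\Re s-2N}\,dx<\infty$ exactly when $\Re s>-\frac12$. □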
94
A. Borodin, G. Olshanski
Henceforth we assume the condition $\Re s>-\frac12$ to be satisfied, and we choose the normalizing constant in (1.2) in such a way that (1.2) defines a probability measure. For the case $\Re s\le-\frac12$ see Sect. 8 below. Note that (1.2) corresponds, via the Cayley transform, to the measure (0.11). For real values of the parameter $s$ the expression (1.2) takes a simpler form:
$$\mathrm{const}\cdot\prod_{1\le j<k\le N}(x_j-x_k)^2\cdot\prod_{j=1}^{N}(1+x_j^2)^{-s-N}\,dx_j,\qquad s\in\mathbb{R}.$$
Our aim is to compute the correlation functions of the measure (1.2). We remark that (1.2) is an orthogonal polynomial ensemble (see [Me, NW]) corresponding to the weight function
$$\phi(x)=(1+ix)^{-s-N}(1-ix)^{-\bar s-N}=(1+x^2)^{-\Re s-N}\,e^{2\Im s\,\operatorname{Arg}(1+ix)},\qquad x\in\mathbb{R}. \tag{1.4}$$
We call it the $N$th pseudo-Jacobi ensemble. The reason why we use this term is explained below. For real $s$, this ensemble was also considered in [WF], where it was called the unitary Cauchy ensemble. The reason is that for real $s$ the weight function (1.4) is proportional to a power of the density of the classical Cauchy distribution. For generalities about orthogonal polynomial ensembles, see, e.g., [Me, NW]. Let $p_0\equiv1,p_1,p_2,\dots$ denote the monic orthogonal polynomials on $\mathbb{R}$ associated with the weight function (1.4). Since for any $s$, $\phi(x)$ has only finitely many moments, this system of orthogonal polynomials is finite. Specifically, it follows from (1.3) that the polynomial $p_m(x)$ exists if $m<\Re s+N-\frac12$. According to a well-known general principle (see, e.g., [Me]), the correlation functions in question are given by determinantal formulas involving the Christoffel–Darboux kernel
$$\sum_{m=0}^{N-1}\frac{p_m(x')\,p_m(x'')}{\|p_m\|^2}. \tag{1.5}$$
By the assumption $\Re s>-\frac12$, the polynomials up to the order $m=N-1$ exist, so that this kernel is well-defined. The orthogonal polynomials $p_m$ are known. They were introduced by V. Romanovski in 1929, see [Ro], and studied by R. Askey [A] and P. A. Lesky [Les1, §5], [Les2, §1.4]. Following P. A. Lesky we call them pseudo-Jacobi polynomials, which explains our choice of the name for the ensemble (1.2). Let
$${}_2F_1\!\left[\begin{matrix}a,\,b\\ c\end{matrix}\,\Big|\,z\right]=\sum_{n=0}^{\infty}\frac{a(a+1)\cdots(a+n-1)\cdot b(b+1)\cdots(b+n-1)}{c(c+1)\cdots(c+n-1)\cdot n!}\,z^n$$
denote the Gauss hypergeometric function.

Proposition 1.2. Let $m<\Re s+N-\frac12$, so that the $m$th monic orthogonal polynomial $p_m$ with the weight function (1.4) exists. Then it is given by the explicit formula
$$p_m(x)=(x-i)^m\,{}_2F_1\!\left[\begin{matrix}-m,\ s+N-m\\ 2\Re s+2N-2m\end{matrix}\,\Big|\,\frac{2}{1+ix}\right] \tag{1.6}$$
and its norm is given by
$$\|p_m\|^2=\int_{-\infty}^{\infty}p_m^2(x)\,\phi(x)\,dx=\frac{\pi\,2^{-2\Re s}}{2^{2(N-m-1)}}\,\Gamma\!\left[\begin{matrix}2\Re s+2(N-m)-1,\ 2\Re s+2(N-m),\ m+1\\ s+N-m,\ \bar s+N-m,\ 2\Re s+2N-m\end{matrix}\right], \tag{1.7}$$
where we use the notation
$$\Gamma\!\left[\begin{matrix}a,\,b,\,\dots\\ c,\,d,\,\dots\end{matrix}\right]=\frac{\Gamma(a)\,\Gamma(b)\cdots}{\Gamma(c)\,\Gamma(d)\cdots}.$$
Proof. These formulas can be extracted from [A], [Les1, §5], [Les2, §1.4]. Another way to get them is to use a general method described in [NU]. This method works for any orthogonal polynomials of hypergeometric type and allows one to compute all the data starting from the differential equation. In our case the differential equation has the form
$$-(1+x^2)\,p_m''+2\big(-\Im s+(\Re s+N-1)x\big)\,p_m'+m(m+1-2\Re s-2N)\,p_m=0. \tag{1.8}$$
□

Note the symmetry property
$$p_m(-x)=(-1)^m\,p_m(x)\big|_{s\leftrightarrow\bar s}. \tag{1.9}$$
It follows from the symmetry of the weight function, $\phi(-x)=\phi(x)\big|_{s\leftrightarrow\bar s}$, and can be verified directly from the expression (1.6). To compute the Christoffel–Darboux kernel we will use the classical formula
$$\sum_{m=0}^{N-1}\frac{p_m(x')\,p_m(x'')}{\|p_m\|^2}=\frac{1}{\|p_{N-1}\|^2}\cdot\frac{p_N(x')\,p_{N-1}(x'')-p_{N-1}(x')\,p_N(x'')}{x'-x''}. \tag{1.10}$$
If the parameter $s$ satisfies the stronger condition $\Re s>\frac12$, then the polynomial $p_N(x)$ exists and the formula holds. Since all the terms in the left-hand side depend analytically on $s$ and $\bar s$, we can use the formula for $s$ with $\frac12\ge\Re s>-\frac12$ as well, with the understanding that the kernel is obtained by analytic continuation in $s$ and $\bar s$ viewed as independent variables (or, equivalently, by analytic continuation in the variables $s$ and $s+\bar s$). Note that the trick with analytic continuation is actually needed only for the values of $s$ on the vertical line $\Re s=0$, because a singularity in the expression (1.6) for $m=N$ arises for $\Re s=0$ only. The next lemma makes it possible to get an alternative expression for the Christoffel–Darboux kernel. The advantage of this new formula is that none of its terms has singularities in the whole region $\Re s>-\frac12$.
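As a numerical sanity check of Proposition 1.2 and of (1.8) and (1.9), the following Python sketch (our own illustration, not taken from the cited sources) expands (1.6) into monomial coefficients. It uses $1+ix=i(x-i)$, so that the $k$th term of the terminating series, multiplied by $(x-i)^m$, equals $(-2i)^k(x-i)^{m-k}$ times the usual series coefficient. The helper names, the sample parameters and the midpoint quadrature after the substitution $x=\tan\theta$ are our own choices; the norm (1.7) is checked only for real $s$, because the Python standard library has no complex Gamma function.

```python
import math


def poch(a, k):
    """Rising factorial (a)_k = a(a+1)...(a+k-1); a may be complex."""
    out = 1
    for j in range(k):
        out *= a + j
    return out


def pseudo_jacobi(m, s, N, c=None):
    """Monomial coefficients (lowest degree first) of
    (x - i)^m 2F1(-m, s+N-m; c; 2/(1+ix)), with c = 2 Re s + 2N - 2m by default, as in (1.6)."""
    if c is None:
        c = 2 * s.real + 2 * N - 2 * m
    b = s + N - m
    coeffs = [0j] * (m + 1)
    for k in range(m + 1):
        # the k-th series term times (x - i)^m equals weight * (x - i)^(m - k)
        weight = poch(-m, k) * poch(b, k) / (poch(c, k) * math.factorial(k)) * (-2j) ** k
        for j in range(m - k + 1):  # binomial expansion of (x - i)^(m - k)
            coeffs[j] += weight * math.comb(m - k, j) * (-1j) ** (m - k - j)
    return coeffs


def evaluate(p, x):
    return sum(a * x ** k for k, a in enumerate(p))


def deriv(p):
    return [k * p[k] for k in range(1, len(p))]


def ode_residual(m, s, N, x):
    """Left-hand side of the differential equation (1.8) at the point x."""
    p = pseudo_jacobi(m, s, N)
    d1, d2 = deriv(p), deriv(deriv(p))
    return (-(1 + x * x) * evaluate(d2, x)
            + 2 * (-s.imag + (s.real + N - 1) * x) * evaluate(d1, x)
            + m * (m + 1 - 2 * s.real - 2 * N) * evaluate(p, x))


def norm_formula(m, s, N):
    """Right-hand side of (1.7), real s only."""
    g = math.gamma
    r = 2 * s + 2 * (N - m)
    return (math.pi * 2 ** (-2 * s) / 2 ** (2 * (N - m - 1))
            * g(r - 1) * g(r) * g(m + 1) / (g(s + N - m) ** 2 * g(2 * s + 2 * N - m)))


def norm_quadrature(m, s, N, n=100_000):
    """||p_m||^2 for real s via x = tan(theta), where phi(x) dx = cos^(2s+2N-2)(theta) dtheta."""
    p = pseudo_jacobi(m, complex(s), N)
    h = math.pi / n
    total = 0.0
    for k in range(n):
        th = -math.pi / 2 + (k + 0.5) * h  # the midpoint rule never touches +-pi/2
        total += evaluate(p, math.tan(th)).real ** 2 * math.cos(th) ** (2 * s + 2 * N - 2)
    return total * h
```

For instance, with $N=1$ the coefficients returned for $m=1$ are those of $x-\Im s/\Re s$, which exhibits the singularity on the line $\Re s=0$ mentioned above.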
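Lemma 1.3. Set
$$\tilde p_N(x)=p_N(x)-\frac{2iNs}{2\Re s\,(2\Re s+1)}\,p_{N-1}(x). \tag{1.11}$$
This polynomial, initially defined for $\Re s>\frac12$, makes sense for $\Re s>-\frac12$, as follows from the explicit formula
$$\tilde p_N(x)=(x-i)^N\,{}_2F_1\!\left[\begin{matrix}-N,\ s\\ 2\Re s+1\end{matrix}\,\Big|\,\frac{2}{1+ix}\right]. \tag{1.12}$$

Proof. Indeed, using the power series expansion of the hypergeometric function it is readily verified that the following general relation holds:
$${}_2F_1\!\left[\begin{matrix}a,\,b\\ c\end{matrix}\,\Big|\,z\right]={}_2F_1\!\left[\begin{matrix}a,\,b\\ c+1\end{matrix}\,\Big|\,z\right]+\frac{abz}{c(c+1)}\,{}_2F_1\!\left[\begin{matrix}a+1,\,b+1\\ c+2\end{matrix}\,\Big|\,z\right]. \tag{1.13}$$
From (1.13) and (1.6), applied with $m=N$ and $m=N-1$ and using $1+ix=i(x-i)$, we easily get (1.12). □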
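The relation (1.13), Lemma 1.3 and the Christoffel–Darboux formula (1.10) can likewise be checked pointwise. The sketch below is a hypothetical illustration with our own helper names and sample values: it evaluates (1.6) and (1.12) through a truncated Gauss series (exact for the terminating series), and for (1.10) it restricts to real $s>\frac12$, so that $p_N$ exists and `math.gamma` suffices for the norms (1.7).

```python
import math


def hyp2f1(a, b, c, z, terms=400):
    """Partial sum of the Gauss series; exact when a is a non-positive integer."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total


def p(m, s, N, x, c=None):
    """p_m(x) from (1.6); with m = N and c = 2 Re s + 1 this is tilde p_N from (1.12)."""
    if c is None:
        c = 2 * s.real + 2 * N - 2 * m
    return (x - 1j) ** m * hyp2f1(-m, s + N - m, c, 2 / (1 + 1j * x))


def norm2(m, s, N):
    """||p_m||^2 from (1.7), real s only."""
    g = math.gamma
    r = 2 * s + 2 * (N - m)
    return (math.pi * 2 ** (-2 * s) / 2 ** (2 * (N - m - 1))
            * g(r - 1) * g(r) * g(m + 1) / (g(s + N - m) ** 2 * g(2 * s + 2 * N - m)))


# (1.13) for a non-terminating series with |z| < 1
a, b, c, z = 0.4 + 0.2j, 1.3, 2.7, 0.35
identity_gap = abs(hyp2f1(a, b, c, z) - hyp2f1(a, b, c + 1, z)
                   - a * b * z / (c * (c + 1)) * hyp2f1(a + 1, b + 1, c + 2, z))

# Lemma 1.3: the combination (1.11) agrees with the closed form (1.12)
s, N = 0.3 + 0.7j, 5
lemma_gaps = []
for x in (0.8, -2.1):
    tilde = p(N, s, N, x, c=2 * s.real + 1)
    combo = p(N, s, N, x) - 2j * N * s / (2 * s.real * (2 * s.real + 1)) * p(N - 1, s, N, x)
    lemma_gaps.append(abs(tilde - combo) / max(1.0, abs(tilde)))

# Christoffel-Darboux formula (1.10) for real s > 1/2
sr, N, x1, x2 = complex(0.8), 4, 0.6, -1.7
cd_sum = sum(p(m, sr, N, x1) * p(m, sr, N, x2) / norm2(m, 0.8, N) for m in range(N))
cd_closed = ((p(N, sr, N, x1) * p(N - 1, sr, N, x2) - p(N - 1, sr, N, x1) * p(N, sr, N, x2))
             / (norm2(N - 1, 0.8, N) * (x1 - x2)))
```

Because the terms proportional to $p_{N-1}(x')p_{N-1}(x'')$ cancel in the antisymmetric numerator, replacing $p_N$ by $\tilde p_N$ leaves (1.10) unchanged, which is how (1.16) below is obtained from (1.15).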
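We summarize the above results in the following

Theorem 1.4. The correlation functions of the $N$th pseudo-Jacobi ensemble (1.2) have the form
$$\rho_n^{(s,N)}(x_1,\dots,x_n)=\det\big[K^{(s,N)}(x_i,x_j)\big]_{i,j=1}^{n} \tag{1.14}$$
with a kernel $K^{(s,N)}(x',x'')$ defined on $\mathbb{R}\times\mathbb{R}$. This kernel is given by the formulas
$$K^{(s,N)}(x',x'')=\frac{2^{2\Re s}}{\pi}\,\Gamma\!\left[\begin{matrix}2\Re s+N+1,\ s+1,\ \bar s+1\\ N,\ 2\Re s+1,\ 2\Re s+2\end{matrix}\right]\frac{p_N(x')\,p_{N-1}(x'')-p_{N-1}(x')\,p_N(x'')}{x'-x''}\,\sqrt{\phi(x')\,\phi(x'')} \tag{1.15}$$
or, equivalently,
$$K^{(s,N)}(x',x'')=\frac{2^{2\Re s}}{\pi}\,\Gamma\!\left[\begin{matrix}2\Re s+N+1,\ s+1,\ \bar s+1\\ N,\ 2\Re s+1,\ 2\Re s+2\end{matrix}\right]\frac{\tilde p_N(x')\,p_{N-1}(x'')-p_{N-1}(x')\,\tilde p_N(x'')}{x'-x''}\,\sqrt{\phi(x')\,\phi(x'')}, \tag{1.16}$$
where
$$\phi(x)=(1+ix)^{-s-N}(1-ix)^{-\bar s-N}=(1+x^2)^{-\Re s-N}\,e^{2\Im s\,\operatorname{Arg}(1+ix)},\qquad x\in\mathbb{R}, \tag{1.17}$$
and
$$p_N(x)=(x-i)^N\,{}_2F_1\!\left[\begin{matrix}-N,\ s\\ 2\Re s\end{matrix}\,\Big|\,\frac{2}{1+ix}\right], \tag{1.18}$$
$$p_{N-1}(x)=(x-i)^{N-1}\,{}_2F_1\!\left[\begin{matrix}-N+1,\ s+1\\ 2\Re s+2\end{matrix}\,\Big|\,\frac{2}{1+ix}\right], \tag{1.19}$$
$$\tilde p_N(x)=(x-i)^N\,{}_2F_1\!\left[\begin{matrix}-N,\ s\\ 2\Re s+1\end{matrix}\,\Big|\,\frac{2}{1+ix}\right]. \tag{1.20}$$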
Note that the expression (1.15) is directly applicable when the parameter $s$ does not lie on the line $\Re s=0$, while the expression (1.16) makes sense for any $s$ with $\Re s>-\frac12$.

Proof. A standard argument from Random Matrix Theory, see, e.g., [Me], shows that the correlation functions are given by the determinantal formula (1.14), where the kernel is equal to the Christoffel–Darboux kernel (1.5) multiplied by the factor $\sqrt{\phi(x')\phi(x'')}$. Together with (1.6), (1.7), (1.10) this leads to the expression (1.15) for the kernel. The alternative formula (1.16) then follows from Lemma 1.3. □

Remark 1.5. For $s=0$ the polynomial $p_N$ can be defined by taking the limit as $s\to0$ along the real line. Taking the limit in the hypergeometric series it is easy to get the following expression:
$$p_N(x)\big|_{s=0}=\frac{(x+i)^N+(x-i)^N}{2}.$$
Likewise, we get
$$p_{N-1}(x)\big|_{s=0}=\frac{(x+i)^N-(x-i)^N}{2iN}.$$
It follows that the Christoffel–Darboux kernel (1.10) is an elementary expression. This agrees with the fact that for $s=0$ our ensemble is related (via the Cayley transform) to the Dyson ensemble.
2. The Scaling Limit of the Correlation Functions

In this section we compute the scaling limit of the correlation functions of the pseudo-Jacobi ensemble as the number of particles goes to infinity. The limit correlation functions have a determinantal form, and we express the correlation kernel through the confluent hypergeometric function. Recall the definition of the confluent hypergeometric function:
$${}_1F_1\!\left[\begin{matrix}a\\ c\end{matrix}\,\Big|\,z\right]=\sum_{n=0}^{\infty}\frac{a(a+1)\cdots(a+n-1)}{c(c+1)\cdots(c+n-1)\cdot n!}\,z^n,$$
see, e.g., [Er, 6.1].

Let us rescale the correlation functions $\rho_n^{(s,N)}$ of the pseudo-Jacobi ensemble (see (1.14)) by setting
$$\tilde\rho_n^{(s,N)}(x_1,\dots,x_n)=N^n\cdot\rho_n^{(s,N)}(Nx_1,\dots,Nx_n).$$
Note that the factor $N^n$ comes from the transformation of the reference (Lebesgue) measure $dx_1\cdots dx_n$. We will assume that the variables range over the punctured real line $\mathbb{R}^*$, not the whole line $\mathbb{R}$ as before.

Theorem 2.1. Let $\Re s>-\frac12$, as before. For any $n=1,2,\dots$ and $x_1,\dots,x_n\in\mathbb{R}^*$ there exists a limit of the scaled $n$th correlation functions $\tilde\rho_n^{(s,N)}$ as $N\to\infty$:
$$\lim_{N\to\infty}\tilde\rho_n^{(s,N)}(x_1,\dots,x_n)=\det\big[K^{(s,\infty)}(x_i,x_j)\big]_{1\le i,j\le n}.$$
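Here the kernel $K^{(s,\infty)}(x',x'')$ on $\mathbb{R}^*\times\mathbb{R}^*$ is as follows:
$$K^{(s,\infty)}(x',x'')=\frac{1}{2\pi}\,\Gamma\!\left[\begin{matrix}s+1,\ \bar s+1\\ 2\Re s+1,\ 2\Re s+2\end{matrix}\right]\frac{P(x')\,Q(x'')-Q(x')\,P(x'')}{x'-x''},$$
$$P(x)=\Big(\frac{2}{|x|}\Big)^{\Re s}e^{-i/x+\pi\Im s\cdot\operatorname{sgn}(x)/2}\,{}_1F_1\!\left[\begin{matrix}s\\ 2\Re s\end{matrix}\,\Big|\,\frac{2i}{x}\right],$$
$$Q(x)=\Big(\frac{2}{|x|}\Big)^{\Re s}\,\frac{2}{x}\,e^{-i/x+\pi\Im s\cdot\operatorname{sgn}(x)/2}\,{}_1F_1\!\left[\begin{matrix}s+1\\ 2\Re s+2\end{matrix}\,\Big|\,\frac{2i}{x}\right]. \tag{2.1}$$
Or, equivalently,
$$K^{(s,\infty)}(x',x'')=\frac{1}{2\pi}\,\Gamma\!\left[\begin{matrix}s+1,\ \bar s+1\\ 2\Re s+1,\ 2\Re s+2\end{matrix}\right]\frac{\tilde P(x')\,Q(x'')-Q(x')\,\tilde P(x'')}{x'-x''},$$
$$\tilde P(x)=\Big(\frac{2}{|x|}\Big)^{\Re s}e^{-i/x+\pi\Im s\cdot\operatorname{sgn}(x)/2}\,{}_1F_1\!\left[\begin{matrix}s\\ 2\Re s+1\end{matrix}\,\Big|\,\frac{2i}{x}\right]. \tag{2.2}$$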
The limit is uniform provided that the variables $x_1,\dots,x_n$ range over any compact subset of $\mathbb{R}^*$.

Comments. 1. As in Theorem 1.4, the formula (2.1) is directly applicable provided that $s$ does not lie on the line $\Re s=0$, while the formula (2.2) holds for any $s$ with $\Re s>-\frac12$.

2. The kernel $K^{(s,\infty)}(x',x'')$ can be expressed through the $M$-Whittaker functions, see [Er, 6.9] for the definition. Namely,
$$P(x)=e^{-i\pi s\,\operatorname{sgn}(x)/2}\,M_{-i\Im s,\,\Re s-\frac12}\Big(\frac{2i}{x}\Big),\qquad Q(x)=\operatorname{sgn}(x)\,e^{-i\pi(s+1)\operatorname{sgn}(x)/2}\,M_{-i\Im s,\,\Re s+\frac12}\Big(\frac{2i}{x}\Big). \tag{2.3}$$

3. The symmetry property (1.9) of the pseudo-Jacobi polynomials implies that
$$P(-x)=P(x)\big|_{s\leftrightarrow\bar s},\qquad Q(-x)=-Q(x)\big|_{s\leftrightarrow\bar s}, \tag{2.4}$$
which can also be verified directly from (2.3) using the formula [Er, 6.9(7)]:
$$M_{\kappa,\mu}(t)=e^{i\epsilon\pi(\mu+\frac12)}\,M_{-\kappa,\mu}(-t),\qquad\epsilon=\begin{cases}1,&\Im t>0,\\-1,&\Im t<0.\end{cases}$$
It follows that the correlation kernel $K^{(s,\infty)}(x',x'')$ remains invariant when $x',x'',s$ are replaced by $-x',-x'',\bar s$ (there is one more change of sign in the denominator $x'-x''$).

4. Formula (2.4) implies that the functions $P(x)$ and $Q(x)$ are real-valued, which agrees with the fact that the pseudo-Jacobi polynomials have real coefficients. Hence, the kernel $K^{(s,\infty)}(x',x'')$ is real symmetric.

5. When $s$ is real, the confluent hypergeometric function ${}_1F_1$ turns into the Bessel function, and the expressions for $P$ and $Q$ can be written as follows:
$$P(x)=2^{2s-1/2}\,\Gamma\big(s+\tfrac12\big)\,|x|^{-1/2}J_{s-1/2}\Big(\frac{1}{|x|}\Big),\qquad Q(x)=\operatorname{sgn}(x)\,2^{2s+3/2}\,\Gamma\big(s+\tfrac32\big)\,|x|^{-1/2}J_{s+1/2}\Big(\frac{1}{|x|}\Big).$$
6. For $s=0$ the Bessel functions with indices $\pm\frac12$ degenerate to trigonometric functions, and we get
$$P(x)\big|_{s=0}=\cos\Big(\frac1x\Big),\qquad Q(x)\big|_{s=0}=2\sin\Big(\frac1x\Big),\qquad K^{(0,\infty)}(x',x'')=\frac{1}{\pi}\,\frac{\sin\big(\frac{1}{x''}-\frac{1}{x'}\big)}{x'-x''}.$$
Changing the variable, $y=\frac{1}{\pi x}$, and taking into account the corresponding transformation of the differential $dx$, we get the sine kernel, in accordance with (0.9).
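The following sketch is an illustration under our own conventions (principal branches, truncated power series for ${}_1F_1$, arbitrary sample points and helper names). It evaluates $P$ and $Q$ from (2.1) and checks Comments 2 to 4 and 6 numerically: the Whittaker form (2.3), the realness of $P$ and $Q$, the symmetry (2.4), the degeneration to $\cos(1/x)$ and $2\sin(1/x)$ as $s\to0$ along the real axis, and the passage to the sine kernel under $y=1/(\pi x)$.

```python
import cmath
import math


def hyp1f1(a, c, z, terms=300):
    """Partial sum of the Kummer series 1F1(a; c; z)."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= (a + n) / ((c + n) * (n + 1)) * z
    return total


def prefactor(x, s):
    """Common factor (2/|x|)^{Re s} exp(-i/x + pi Im s sgn(x)/2) in (2.1)."""
    sg = 1 if x > 0 else -1
    return (2 / abs(x)) ** s.real * cmath.exp(-1j / x + math.pi * s.imag * sg / 2)


def P(x, s):
    return prefactor(x, s) * hyp1f1(s, 2 * s.real, 2j / x)


def Q(x, s):
    return prefactor(x, s) * (2 / x) * hyp1f1(s + 1, 2 * s.real + 2, 2j / x)


def whittaker_M(kappa, mu, z):
    """M_{kappa,mu}(z) = e^{-z/2} z^{mu+1/2} 1F1(mu-kappa+1/2; 2mu+1; z), principal branch."""
    return cmath.exp(-z / 2) * z ** (mu + 0.5) * hyp1f1(mu - kappa + 0.5, 2 * mu + 1, z)


def K0(x1, x2):
    """The kernel K^{(0,infinity)} of Comment 6."""
    return math.sin(1 / x2 - 1 / x1) / (math.pi * (x1 - x2))


def sine_kernel(y1, y2):
    return math.sin(math.pi * (y1 - y2)) / (math.pi * (y1 - y2))
```

The last check below uses positive $y$'s; for $y'$, $y''$ of opposite signs the transformed kernel acquires the factor $\operatorname{sgn}(y')\operatorname{sgn}(y'')$, which does not affect the determinants.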
Proof of Theorem 2.1. We will show that
$$\lim_{N\to\infty}\big(\operatorname{sgn}(x')\operatorname{sgn}(x'')\big)^N\,N\cdot K^{(s,N)}(Nx',Nx'')=K^{(s,\infty)}(x',x''),\qquad x',x''\in\mathbb{R}^*,$$
uniformly on compact sets in $\mathbb{R}^*$. Note that the factor $(\operatorname{sgn}(x')\operatorname{sgn}(x''))^N$ does not affect the determinantal formula. We start with the formula (1.15). First of all, we remark that
$$\frac{\Gamma(2\Re s+N+1)}{\Gamma(N)}\sim N^{2\Re s+1},$$
which easily follows from the Stirling formula. Next, we will examine the asymptotics of
$$p_N(Nx)\sqrt{\phi(Nx)},\qquad p_{N-1}(Nx)\sqrt{\phi(Nx)},\qquad N\to\infty.$$
Here we will assume that $x$ is not a real but a complex variable ranging in a neighborhood of a point $x_0\in\mathbb{R}^*$. This will allow us to overcome the difficulty related to the singularity $x'-x''=0$ in the denominator of (1.15) by making use of the Cauchy integral formula. The asymptotics of the hypergeometric functions entering the formulas (1.18) and (1.19) are as follows:
$$\lim_{N\to\infty}{}_2F_1\!\left[\begin{matrix}-N,\ s\\ 2\Re s\end{matrix}\,\Big|\,\frac{2}{1+iNx}\right]={}_1F_1\!\left[\begin{matrix}s\\ 2\Re s\end{matrix}\,\Big|\,\frac{2i}{x}\right],$$
$$\lim_{N\to\infty}{}_2F_1\!\left[\begin{matrix}-N+1,\ s+1\\ 2\Re s+2\end{matrix}\,\Big|\,\frac{2}{1+iNx}\right]={}_1F_1\!\left[\begin{matrix}s+1\\ 2\Re s+2\end{matrix}\,\Big|\,\frac{2i}{x}\right].$$
Indeed, this is a special case of the well-known limit relation
$$\lim_{|a|\to\infty}{}_2F_1\!\left[\begin{matrix}a,\,b\\ c\end{matrix}\,\Big|\,\frac{z}{a}\right]={}_1F_1\!\left[\begin{matrix}b\\ c\end{matrix}\,\Big|\,z\right],\qquad z\in\mathbb{C}.$$
This can be readily verified using the integral representation of the hypergeometric function written in the form
$${}_2F_1\!\left[\begin{matrix}a,\,b\\ c\end{matrix}\,\Big|\,\frac{z}{a}\right]=\Gamma(c)\left\langle\frac{t_+^{b-1}(1-t)_+^{c-b-1}}{\Gamma(b)\,\Gamma(c-b)},\ \frac{1}{(1-tz/a)^{a}}\right\rangle,$$
where the brackets denote the pairing between a generalized function (which in the present case is supported by $[0,1]$) and a test function, and $t$ is the argument of both functions. Note that the limit is uniform provided that $z$ ranges over a bounded subset of $\mathbb{C}$. The asymptotics of the remaining terms look as follows:
$$(\pm1)^N(Nx-i)^N\sqrt{\phi(Nx)}\sim N^{-\Re s}(\pm x)^{-\Re s}\,e^{-i/x}\,e^{\pm\pi\Im s/2},\qquad N\to\infty,$$
where $\pm$ is the sign of $\Re x$ and the relation holds uniformly on compact subsets of the open right or left half-plane. Indeed, assume $\Re x>0$. In the transformations below any expression of the form $z^c$ with $c\in\mathbb{C}$ is understood as a holomorphic function in the domain $\mathbb{C}\setminus(-\infty,0]$. We have
$$\begin{aligned}(Nx-i)^N\sqrt{\phi(Nx)}&=(Nx-i)^N(1+iNx)^{-(s+N)/2}(1-iNx)^{-(\bar s+N)/2}\\&=(Nx)^N(iNx)^{-(s+N)/2}(-iNx)^{-(\bar s+N)/2}\times\Big(1-\frac{i}{Nx}\Big)^N\Big(1+\frac{1}{iNx}\Big)^{-(s+N)/2}\Big(1-\frac{1}{iNx}\Big)^{-(\bar s+N)/2}\\&=N^{-\Re s}x^{-\Re s}\,i^{-(s+N)/2}(-i)^{-(\bar s+N)/2}\times\Big(1-\frac{i}{Nx}\Big)^N\Big(1+\frac{1}{iNx}\Big)^{-(s+N)/2}\Big(1-\frac{1}{iNx}\Big)^{-(\bar s+N)/2}\\&\sim N^{-\Re s}x^{-\Re s}\,e^{\pi\Im s/2}\,e^{-i/x}.\end{aligned}$$
For $\Re x<0$ the argument is similar. Combining all these asymptotics we get the desired result. □
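The three asymptotic ingredients of the proof (the Gamma ratio, the hypergeometric limit, and the behavior of $(Nx-i)^N\sqrt{\phi(Nx)}$) can be observed numerically for real $x$. The following Python sketch is our illustration only: it works with logarithms to avoid overflow, uses principal branches as in the text, and the sample values of $s$, $x$, $N$ and the helper names are arbitrary.

```python
import cmath
import math


def hyp2f1(a, b, c, z, terms):
    """Partial sum of the Gauss series (exact for a = -N when terms = N + 1)."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
    return total


def hyp1f1(a, c, z, terms=300):
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= (a + n) / ((c + n) * (n + 1)) * z
    return total


def scaled_weight(N, x, s):
    """(sgn x)^N (Nx - i)^N sqrt(phi(Nx)) for real x != 0, computed through logarithms;
    sqrt(phi(y)) = (1+iy)^(-(s+N)/2) (1-iy)^(-(conj(s)+N)/2) with principal branches."""
    y = N * x
    log_val = (N * cmath.log(y - 1j)
               - (s + N) / 2 * cmath.log(1 + 1j * y)
               - (s.conjugate() + N) / 2 * cmath.log(1 - 1j * y))
    sign = 1 if x > 0 else -1
    return sign ** N * cmath.exp(log_val)


def gamma_ratio_normalized(N, s):
    """Gamma(2 Re s + N + 1) / Gamma(N) divided by N^(2 Re s + 1); tends to 1."""
    a = 2 * s.real + 1
    return math.exp(math.lgamma(N + a) - math.lgamma(N) - a * math.log(N))


s, x = 0.3 + 0.7j, 0.7
limit = hyp1f1(s, 2 * s.real, 2j / x)
hyp_errors = [abs(hyp2f1(-N, s, 2 * s.real, 2 / (1 + 1j * N * x), terms=N + 1) - limit)
              for N in (10_000, 100_000)]

N = 10_000
weight_errors = []
for xx in (0.7, -0.7):
    sign = 1 if xx > 0 else -1
    expected = ((N * abs(xx)) ** (-s.real) * math.exp(sign * math.pi * s.imag / 2)
                * cmath.exp(-1j / xx))
    weight_errors.append(abs(scaled_weight(N, xx, s) / expected - 1))
```

All three errors decay like $O(1/N)$, consistent with the uniform convergence claimed in Theorem 2.1.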
3. The Hua–Pickrell Measures

In this section we define the Hua–Pickrell measures. They form a 2-parameter family of $U(\infty)$-invariant probability measures on the space of infinite Hermitian matrices.

Let $H(N)$ denote the real vector space formed by complex Hermitian $N\times N$ matrices, $N=1,2,\dots$. Let $H$ stand for the space of all infinite Hermitian matrices $X=[X_{ij}]_{i,j=1}^{\infty}$. For $X\in H$ and $N=1,2,\dots$, we denote by $\theta_N(X)\in H(N)$ the upper left $N\times N$ corner of $X$. Using the projections $\theta_N\colon H\to H(N)$, $N=1,2,\dots$, we may identify $H$ with the projective limit space $\varprojlim H(N)$. We equip $H$ with the corresponding projective limit topology. We will also use the Borel structure on $H$ generated by this topology.

Let $U(N)$ be the group of unitary $N\times N$ matrices, $N=1,2,\dots$. For any $N$, we embed $U(N)$ into $U(N+1)$ using the mapping $u\mapsto\begin{bmatrix}u&0\\0&1\end{bmatrix}$. Let $U(\infty)=\varinjlim U(N)$ denote the corresponding inductive limit group. We regard $U(\infty)$ as the group of infinite unitary matrices $U=[U_{ij}]_{i,j=1}^{\infty}$ with finitely many entries $U_{ij}\ne\delta_{ij}$. The group $U(\infty)$ acts on the space $H$ by conjugations.

Proposition 3.1. For any $s\in\mathbb{C}$ with $\Re s>-\frac12$, there exists a Borel probability measure $m^{(s)}$ on $H$, characterized by the following property: for any $N=1,2,\dots$, the image of
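$m^{(s)}$ under the projection $\theta_N$ is the probability measure $m^{(s,N)}$ on $H(N)$ defined by
$$m^{(s,N)}(dX)=(\mathrm{const}_N)^{-1}\det\big((1+iX)^{-s-N}\big)\det\big((1-iX)^{-\bar s-N}\big)\times\prod_{j=1}^{N}dX_{jj}\prod_{1\le j<k\le N}d(\Re X_{jk})\,d(\Im X_{jk}), \tag{3.1}$$
where
$$\mathrm{const}_N=\prod_{j=1}^{N}\frac{\pi^j\,\Gamma(s+\bar s+j)}{2^{s+\bar s+2j-2}\,\Gamma(s+j)\,\Gamma(\bar s+j)}.$$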
The measure $m^{(s)}$ is invariant under the action of $U(\infty)$.

Comments. 1. For $X\in H(N)$ and $z\in\mathbb{C}$ we define the matrix $(1\pm iX)^z$ by means of the functional calculus. This makes the expression
$$f_N(X)=\det\big((1+iX)^{-s-N}\big)\det\big((1-iX)^{-\bar s-N}\big),\qquad X\in H(N),$$
meaningful. Equivalently, denoting by $x_1,\dots,x_N$ the eigenvalues of $X$,
$$f_N(X)=\prod_{j=1}^{N}(1+ix_j)^{-s-N}(1-ix_j)^{-\bar s-N}, \tag{3.2}$$
where we use the analytic continuation of the function $(\cdot)^z$ from the positive axis to the region $\mathbb{C}\setminus(-\infty,0]$.

2. When $s$ is real, the expression (3.2) takes a simpler form:
$$f_N(X)=\big(\det(1+X^2)\big)^{-s-N},\qquad X\in H(N),\quad s\in\mathbb{R}.$$
Proof. Step 1. First of all, note that $f_N(X)\ge0$. Therefore, if $f_N$ is integrable, then it defines a finite measure on $H(N)$. Fix $N\ge2$ and write an arbitrary matrix $X\in H(N)$ in the block form
$$X=\begin{bmatrix}Y&\xi\\ \xi^*&t\end{bmatrix},\qquad Y\in H(N-1),\quad\xi\in\mathbb{C}^{N-1},\quad t\in\mathbb{R}.$$
We will prove that for any $Y\in H(N-1)$ the integral of $f_N$ over $\xi,t$ is finite and equal to
$$\int_{(\xi,t)\in\mathbb{C}^{N-1}\times\mathbb{R}}f_N\!\left(\begin{bmatrix}Y&\xi\\ \xi^*&t\end{bmatrix}\right)\prod_{j=1}^{N-1}d(\Re\xi_j)\,d(\Im\xi_j)\cdot dt=f_{N-1}(Y)\cdot\frac{\pi^N\,\Gamma(s+\bar s+N)}{2^{s+\bar s+2N-2}\,\Gamma(s+N)\,\Gamma(\bar s+N)}. \tag{3.3}$$
For $N=1$, $Y$ and $\xi$ disappear, and the claim is that the integral of $f_1$ over $\mathbb{R}$ is finite and given by
$$\int_{t\in\mathbb{R}}f_1(t)\,dt=\int_{-\infty}^{\infty}(1+it)^{-s-1}(1-it)^{-\bar s-1}\,dt=\frac{\pi\,\Gamma(s+\bar s+1)}{2^{s+\bar s}\,\Gamma(s+1)\,\Gamma(\bar s+1)}. \tag{3.4}$$
Let us show that (3.3) and (3.4) imply the proposition. Indeed, using induction on $N$ we see that the integral of $f_N$ over $H(N)$ is finite and equals $\mathrm{const}_N$. Thus, the measure $m^{(s,N)}$ is correctly defined for any $N$. Next, (3.3) implies that the measures $m^{(s,N)}$ and $m^{(s,N-1)}$ are consistent with the projection $X\mapsto Y$ from $H(N)$ to $H(N-1)$.² Since $H$ coincides with the projective limit of the spaces $H(N)$ as $N\to\infty$, we conclude that the measure $m^{(s)}$ exists and is unique. Finally, $m^{(s)}$ is invariant under the action of $U(\infty)$, because each $m^{(s,N)}$ is invariant under the action of $U(N)$ for all $N=1,2,\dots$.

Step 2. We proceed to the proof of (3.3) and (3.4). The latter formula follows from formula (3.10) in Lemma 3.3 below. The former formula is proved in [Hua, Theorem 2.1.5] for real $s$, and we employ his argument with slight modifications. Applying Lemma 3.2 (see below) we get
$$f_N(X)=\det\big((1+iY)^{-s-N}\big)\big(1+it+\xi^*(1+iY)^{-1}\xi\big)^{-s-N}\times\det\big((1-iY)^{-\bar s-N}\big)\big(1-it+\xi^*(1-iY)^{-1}\xi\big)^{-\bar s-N}. \tag{3.5}$$
Next, note that the integral (3.3) is invariant under the conjugation of $Y$ by a matrix $V\in U(N-1)$. Indeed, to see this, we use the invariance of the function $f_N$ and make the change of variable $V\xi\to\xi$. Therefore, without loss of generality we may assume that $Y$ is a diagonal matrix. Denoting its diagonal entries (which are real numbers) by $y_1,\dots,y_{N-1}$ and using (3.5), we reduce the integral (3.3) to
$$\begin{aligned}&\prod_{j=1}^{N-1}(1+iy_j)^{-s-N}(1-iy_j)^{-\bar s-N}\int_{(\xi,t)\in\mathbb{C}^{N-1}\times\mathbb{R}}\Bigg(1+\sum_{j=1}^{N-1}\frac{|\xi_j|^2}{1+y_j^2}+i\Big(t-\sum_{j=1}^{N-1}\frac{|\xi_j|^2y_j}{1+y_j^2}\Big)\Bigg)^{-s-N}\\&\qquad\times\Bigg(1+\sum_{j=1}^{N-1}\frac{|\xi_j|^2}{1+y_j^2}-i\Big(t-\sum_{j=1}^{N-1}\frac{|\xi_j|^2y_j}{1+y_j^2}\Big)\Bigg)^{-\bar s-N}\prod_{j=1}^{N-1}d(\Re\xi_j)\,d(\Im\xi_j)\cdot dt.\end{aligned}\tag{3.6}$$
This integral is easily simplified. First, assuming that the variables $\xi_1,\dots,\xi_{N-1}$ are fixed, we make the change of variable
$$t-\sum_{j=1}^{N-1}\frac{|\xi_j|^2y_j}{1+y_j^2}\to t.$$
Next, we change the variables $\xi_j$,
$$\frac{\xi_j}{\sqrt{1+y_j^2}}\to\xi_j,\qquad j=1,\dots,N-1,$$
which gives rise to the factor $\prod_{j=1}^{N-1}(1+y_j^2)$. Then (3.6) is reduced to
$$\prod_{j=1}^{N-1}(1+iy_j)^{-s-N+1}(1-iy_j)^{-\bar s-N+1}\cdot\int_{(\xi,t)\in\mathbb{C}^{N-1}\times\mathbb{R}}\Big(1+\sum_{j=1}^{N-1}|\xi_j|^2+it\Big)^{-s-N}\Big(1+\sum_{j=1}^{N-1}|\xi_j|^2-it\Big)^{-\bar s-N}\prod_{j=1}^{N-1}d(\Re\xi_j)\,d(\Im\xi_j)\cdot dt. \tag{3.7}$$

² For real $s$, this fact was discovered by Hua Loo-Keng [Hua]. As we learnt from Peter Forrester, it was also discussed in the physics literature, see [Br].

Setting $r=\sum_{j=1}^{N-1}|\xi_j|^2$, we readily reduce (3.7) to
$$\prod_{j=1}^{N-1}(1+iy_j)^{-s-N+1}(1-iy_j)^{-\bar s-N+1}\cdot\frac{\pi^{N-1}}{\Gamma(N-1)}\int_{r\ge0}\int_{t\in\mathbb{R}}(1+r+it)^{-s-N}(1+r-it)^{-\bar s-N}\,r^{N-2}\,dr\,dt.$$
By Lemma 3.3, the double integral is finite and its value is given by (3.9), where we substitute $a=s+N$, $b=\bar s+N$ (the assumption of Lemma 3.3 is satisfied because $\Re s>-\frac12$). This implies (3.3). □

We proceed to the proof of two lemmas which were used in the proof of Proposition 3.1.

Lemma 3.2. Consider the $N\times N$ matrix analog of the right half-plane in $\mathbb{C}$:
$$\mathrm{Mat}(N,\mathbb{C})_+=\{A\in\mathrm{Mat}(N,\mathbb{C})\mid A+A^*>0\}.$$
Write $N\times N$ matrices in the block form according to a partition $N=N_1+N_2$,
$$A=\begin{bmatrix}A_{11}&A_{12}\\A_{21}&A_{22}\end{bmatrix}.$$
Then for $z\in\mathbb{C}$ and $A\in\mathrm{Mat}(N,\mathbb{C})_+$ the following relation holds:
$$\det(A^z)=\det(A_{11}^z)\,\det\big((A_{22}-A_{21}A_{11}^{-1}A_{12})^z\big). \tag{3.8}$$
Proof. First of all, we show that both sides in (3.8) make sense. Note that if $A\in\mathrm{Mat}(N,\mathbb{C})_+$, then any eigenvalue $\lambda$ of $A$ lies in the open right half-plane (indeed, if $\xi\in\mathbb{C}^N$ is an eigenvector with the eigenvalue $\lambda$, then $0<((A+A^*)\xi,\xi)=2\Re\lambda\,(\xi,\xi)$, which implies $\Re\lambda>0$). Therefore, we can define the matrix $A^z$ by means of the functional calculus. Next, note that the matrices $A_{11}$ and $A_{22}-A_{21}A_{11}^{-1}A_{12}$ also belong to the matrix right half-planes. Indeed, for the former matrix this is evident, and for the latter matrix this follows from the fact that $A^{-1}\in\mathrm{Mat}(N,\mathbb{C})_+$ and
$$A_{22}-A_{21}A_{11}^{-1}A_{12}=\big((A^{-1})_{22}\big)^{-1}.$$
Thus, the expressions $(\cdot)^z$ in the right-hand side of (3.8) are well-defined. Since both sides of (3.8) are holomorphic functions of $A$ in the connected region $\mathrm{Mat}(N,\mathbb{C})_+$, we may assume, without loss of generality, that $A$ lies in a small neighborhood of the identity matrix $1$. Then we may interchange the symbols of determinant and exponentiation. This reduces (3.8) to the classical formula for the determinant of a block matrix, $\det A=\det A_{11}\cdot\det(A_{22}-A_{21}A_{11}^{-1}A_{12})$, see, e.g., [Ga, Ch. II, §5.3]. □
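For $N=2$ the identity (3.5) can be tested directly, since the eigenvalues of a $2\times2$ Hermitian matrix are explicit. The sketch below is our illustration (the function names and sample matrices are hypothetical); Python's complex power uses the principal branch, which matches the convention $\mathbb{C}\setminus(-\infty,0]$ adopted above. It also checks the property $f_2\ge0$ used in Step 1 of the proof of Proposition 3.1.

```python
import math


def f2_eigen(y, t, xi, s):
    """f_2(X) from (3.2) for X = [[y, xi], [conj(xi), t]], via the explicit eigenvalues."""
    mean = (y + t) / 2
    rad = math.sqrt(((y - t) / 2) ** 2 + abs(xi) ** 2)
    out = 1 + 0j
    for lam in (mean + rad, mean - rad):
        out *= (1 + 1j * lam) ** (-s - 2) * (1 - 1j * lam) ** (-s.conjugate() - 2)
    return out


def f2_schur(y, t, xi, s):
    """Right-hand side of (3.5) for N = 2 (Y = y is 1x1 and xi is a scalar)."""
    w = abs(xi) ** 2
    sb = s.conjugate()
    return ((1 + 1j * y) ** (-s - 2) * (1 + 1j * t + w / (1 + 1j * y)) ** (-s - 2)
            * (1 - 1j * y) ** (-sb - 2) * (1 - 1j * t + w / (1 - 1j * y)) ** (-sb - 2))


s = 0.3 + 0.7j
samples = [(0.5, -1.2, 0.8 + 0.3j), (3.0, 2.0, -1.1j), (-2.5, 0.4, 1.7)]
results = [(f2_eigen(*smp, s), f2_schur(*smp, s)) for smp in samples]
```

The Schur complements $1\pm it+|\xi|^2/(1\pm iy)$ have positive real part, so both sides of (3.5) involve only principal powers of points in the right half-plane, in line with the proof of Lemma 3.2.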
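Lemma 3.3. We have
$$\frac{\pi^{N-1}}{\Gamma(N-1)}\int_{r\ge0}\int_{t\in\mathbb{R}}(1+r+it)^{-a}(1+r-it)^{-b}\,r^{N-2}\,dr\,dt=\frac{\pi^N\,\Gamma(a+b-N)}{2^{a+b-2}\,\Gamma(a)\,\Gamma(b)},\qquad a,b\in\mathbb{C},\quad\Re(a+b)>N,\quad N>1, \tag{3.9}$$
and
$$\int_{t\in\mathbb{R}}(1+it)^{-a}(1-it)^{-b}\,dt=\frac{\pi\,\Gamma(a+b-1)}{2^{a+b-2}\,\Gamma(a)\,\Gamma(b)},\qquad a,b\in\mathbb{C},\quad\Re(a+b)>1. \tag{3.10}$$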
Proof. The integral (3.10) is readily reduced to a known integral, see [Er, 1.5(30)]. To evaluate the integral (3.9), we make the change of variable $t\to(1+r)t$. The integral splits into a product of two integrals, one of which is (3.10) and the other one is
$$\frac{1}{\Gamma(N-1)}\int_{r\ge0}r^{N-2}(1+r)^{-a-b+1}\,dr=\frac{\Gamma(a+b-N)}{\Gamma(a+b-1)}.$$
This proves (3.9). Note also that (3.10) is a degeneration of (3.9), because $r_+^{N-2}/\Gamma(N-1)$ degenerates to the delta function $\delta(r)$ at $N=1$. □

Let $\mathbb{C}_+$ denote the right half-plane. Following Neretin [Ner2] we define a map
$$H\ni X=[X_{jk}]_{j,k=1}^{\infty}\mapsto(\zeta_1,\zeta_2,\dots)\in\mathbb{R}\times\mathbb{C}_+^{\infty} \tag{3.11}$$
as follows. For any $N=2,3,\dots$, write the matrix $\theta_N(X)=[X_{jk}]_{j,k=1}^{N}$ in the block form
$$\theta_N(X)=\begin{bmatrix}\theta_{N-1}(X)&\xi\\ \xi^*&t\end{bmatrix}$$
and then set
$$\zeta_N=it+\xi^*\big(1+i\theta_{N-1}(X)\big)^{-1}\xi\in\mathbb{C}_+.$$
Finally, set $\zeta_1=X_{11}\in\mathbb{R}$.

Proposition 3.4. The pushforward of the measure $m^{(s)}$ under the map (3.11) is a product measure $\mu_1\times\mu_2\times\cdots$ on the space $\mathbb{R}\times\mathbb{C}_+^{\infty}$. Here $\mu_1,\mu_2,\dots$ are the following probability measures:
$$\mu_1(dt)=\frac{2^{s+\bar s}\,\Gamma(s+1)\,\Gamma(\bar s+1)}{\pi\,\Gamma(s+\bar s+1)}\,(1+it)^{-s-1}(1-it)^{-\bar s-1}\,dt$$
and, for $N\ge2$ and $\zeta=r+it\in\mathbb{C}_+$,
$$\mu_N(d\zeta)=\frac{2^{s+\bar s+2N-2}\,\Gamma(s+N)\,\Gamma(\bar s+N)}{\pi\,\Gamma(s+\bar s+N)}\cdot\frac{r^{N-2}}{\Gamma(N-1)}\,(1+\zeta)^{-s-N}(1+\bar\zeta)^{-\bar s-N}\,dr\,dt. \tag{3.12}$$

Proof. This follows from the proof of Proposition 3.1. □
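The integrals in Lemma 3.3 are easy to test numerically for real $a$, $b$. In the sketch below (our illustration; the parameter values, the helper names and the midpoint rule are arbitrary choices) the substitution $t=\tan\theta$ turns (3.10) into $\int_{-\pi/2}^{\pi/2}\cos^{a+b-2}\theta\,e^{-i(a-b)\theta}\,d\theta$, whose imaginary part vanishes by oddness, and $r=u/(1-u)$ turns the radial integral from the proof of (3.9) into the Beta integral $\int_0^1u^{N-2}(1-u)^{a+b-N-1}\,du$.

```python
import math


def lhs_310(a, b, n=100_000):
    """Left-hand side of (3.10) for real a, b via t = tan(theta):
    (1+it)^(-a) (1-it)^(-b) dt = cos^(a+b-2)(theta) exp(-i(a-b)theta) dtheta."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        th = -math.pi / 2 + (k + 0.5) * h
        total += math.cos(th) ** (a + b - 2) * math.cos((a - b) * th)  # odd imaginary part dropped
    return total * h


def rhs_310(a, b):
    return math.pi * math.gamma(a + b - 1) / (2 ** (a + b - 2) * math.gamma(a) * math.gamma(b))


def radial_factor(a, b, N, n=100_000):
    """(1/Gamma(N-1)) * integral_0^inf r^(N-2) (1+r)^(1-a-b) dr, via r = u/(1-u)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += u ** (N - 2) * (1 - u) ** (a + b - N - 1)
    return total * h / math.gamma(N - 1)
```

Multiplying `radial_factor` by `rhs_310` reproduces the right-hand side of (3.9) divided by $\pi^{N-1}$, which is exactly the factorization used in the proof.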
Theorem 3.5. The Hua–Pickrell measures $m^{(s)}$ are pairwise disjoint. That is, if $s'$, $s''$ are two distinct values of the parameter $s$, then there exist two disjoint Borel sets in $H$ supporting the measures $m^{(s')}$ and $m^{(s'')}$, respectively.

Proof. We will apply Kakutani's theorem [Ka]. Assume first we are given two probability measures, $\mu'$ and $\mu''$, defined on the same Borel space. Take any measure $\nu$ such that both $\mu'$ and $\mu''$ are absolutely continuous with respect to $\nu$; for instance, $\nu=\mu'+\mu''$. Denote by $\mu'/\nu$ and $\mu''/\nu$ the respective Radon–Nikodym derivatives. The measure
$$\sqrt{\frac{\mu'}{\nu}\cdot\frac{\mu''}{\nu}}\cdot\nu$$
does not depend on the choice of $\nu$. Denote it by $\sqrt{\mu'\mu''}$ and set
$$\langle\mu',\mu''\rangle=\int\sqrt{\mu'\mu''}.$$
We have $0\le\langle\mu',\mu''\rangle\le1$. Moreover, $\langle\mu',\mu''\rangle=1$ is equivalent to $\mu'=\mu''$, while $\langle\mu',\mu''\rangle=0$ exactly means that $\mu'$ and $\mu''$ are disjoint.

Next, assume $\mu'=\mu'_1\times\mu'_2\times\cdots$ and $\mu''=\mu''_1\times\mu''_2\times\cdots$ are two product probability measures defined on the same countably infinite product space. Kakutani's theorem [Ka] says that $\mu'$ and $\mu''$ are disjoint if the infinite product $\prod_{N=1}^{\infty}\langle\mu'_N,\mu''_N\rangle$ is divergent, i.e., the partial products tend to $0$.

Finally, consider the product space $\mathbb{R}\times\mathbb{C}_+^{\infty}$ and take as $\mu'$ and $\mu''$ the pushforwards of the measures $m^{(s')}$ and $m^{(s'')}$, respectively, as explained in Proposition 3.4. We prove that $\mu'$ and $\mu''$ are disjoint; this immediately implies that the initial measures $m^{(s')}$ and $m^{(s'')}$ are disjoint. We omit the value $N=1$, which plays a special role, and calculate the integral defining $\langle\mu'_N,\mu''_N\rangle$ for $N\ge2$. By (3.12) and (3.9) we get
$$\langle\mu'_N,\mu''_N\rangle=\sqrt{\frac{\Gamma(s'+N)\,\Gamma(\bar s'+N)\,\Gamma(s''+N)\,\Gamma(\bar s''+N)}{\Gamma(s'+\bar s'+N)\,\Gamma(s''+\bar s''+N)}}\cdot\frac{\Gamma(s+\bar s+N)}{\Gamma(s+N)\,\Gamma(\bar s+N)},\qquad s=\frac{s'+s''}{2}.$$
The classical asymptotic formula for the ratio of two $\Gamma$-functions, see [Er, 1.18(4)], implies that
$$\frac{\Gamma(z+N)\,\Gamma(\bar z+N)}{\Gamma(z+\bar z+N)\,\Gamma(N)}=1-\frac{z\bar z}{N}+O\Big(\frac{1}{N^2}\Big).$$
It follows that
$$\langle\mu'_N,\mu''_N\rangle=1-\frac{|s'-s''|^2}{4N}+O\Big(\frac{1}{N^2}\Big).$$
Thus, since $\sum_N 1/N$ diverges, the product of the $\langle\mu'_N,\mu''_N\rangle$'s is divergent. □
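The asymptotics $\langle\mu'_N,\mu''_N\rangle=1-|s'-s''|^2/(4N)+O(N^{-2})$ and the resulting divergence of the product can be observed numerically. The sketch below is our illustration only: it restricts to real $s'$, $s''$, where $\bar z=z$ and `math.lgamma` suffices, and the parameter values and helper names are arbitrary.

```python
import math


def log_ratio(z, N):
    """log[Gamma(z+N) Gamma(zbar+N) / (Gamma(z+zbar+N) Gamma(N))] for real z (zbar = z)."""
    return 2 * math.lgamma(z + N) - math.lgamma(2 * z + N) - math.lgamma(N)


def affinity(s1, s2, N):
    """<mu'_N, mu''_N> for real s' = s1, s'' = s2, expressed through the ratio above."""
    s = (s1 + s2) / 2
    return math.exp(0.5 * (log_ratio(s1, N) + log_ratio(s2, N)) - log_ratio(s, N))


s1, s2 = 0.2, 1.1
log_partial = 0.0
partial = {}
for N in range(2, 100_001):
    log_partial += math.log(affinity(s1, s2, N))
    if N in (1_000, 100_000):
        partial[N] = math.exp(log_partial)
```

The partial products decay only like a small power of $N$ (roughly $N^{-(s'-s'')^2/4}$), which is why the divergence of the product, rather than any single factor, is what separates the measures.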
4. Ergodic Measures

In this section we recall the classification theorem and some other known results on $U(\infty)$-invariant ergodic probability measures on the space of infinite Hermitian matrices.
Consider the natural embeddings
$$H(N)\to H(N+1),\qquad A\mapsto\begin{bmatrix}A&0\\0&0\end{bmatrix},$$
and denote by $H(\infty)$ the corresponding inductive limit space $\varinjlim H(N)$. Then $H(\infty)$ is identified with the space of infinite Hermitian matrices with finitely many nonzero entries. We equip $H(\infty)$ with the inductive limit topology. In particular, a function $f\colon H(\infty)\to\mathbb{C}$ is continuous if its restriction to $H(N)$ is continuous for any $N$. There is a natural pairing
$$H(\infty)\times H\to\mathbb{R},\qquad(A,X)\mapsto\operatorname{tr}(AX).$$
$H$ is the algebraic dual space of $H(\infty)$ with respect to this pairing. Using the map
$$H\ni X\mapsto\{X_{ii}\}_{i=1}^{\infty}\sqcup\{\Re X_{ij},\Im X_{ij}\}_{i<j}$$
we can identify $H$, as a topological vector space, with the infinite product space $\mathbb{R}^\infty=\mathbb{R}\times\mathbb{R}\times\cdots$. Under this identification, $H(\infty)\subset H$ turns into $\mathbb{R}_0^\infty:=\bigcup_{n\ge1}\mathbb{R}^n$, and the pairing defined above turns into the standard pairing between $\mathbb{R}_0^\infty$ and $\mathbb{R}^\infty$. Given a Borel probability measure $M$ on $H$, we define its Fourier transform, or characteristic function, as the following function on $H(\infty)$:
$$A\mapsto\int_H e^{i\operatorname{tr}(AX)}\,M(dX). \tag{4.1}$$
The group $U(\infty)$ acts by conjugations both on $H(\infty)$ and $H$, and the pairing between these two spaces is, clearly, $U(\infty)$-invariant. Each matrix from $H(\infty)$ can be brought by a conjugation to a diagonal matrix $\operatorname{diag}(r_1,r_2,\dots)$ with finitely many nonzero entries. It follows that the Fourier transform of a $U(\infty)$-invariant measure on $H$ is uniquely determined by its values on the diagonal matrices from $H(\infty)$. Set
$$\Omega=\Big\{\omega=(\alpha^+,\alpha^-,\gamma_1,\delta)\in\mathbb{R}^{2\infty+2}=\mathbb{R}^\infty\times\mathbb{R}^\infty\times\mathbb{R}\times\mathbb{R}\ \Big|\ \alpha^\pm=(\alpha_1^\pm\ge\alpha_2^\pm\ge\dots\ge0),\ \gamma_1\in\mathbb{R},\ \delta\ge0,\ \sum_{i=1}^{\infty}(\alpha_i^+)^2+\sum_{i=1}^{\infty}(\alpha_i^-)^2\le\delta\Big\}.$$
This is a closed subset of $\mathbb{R}^{2\infty+2}$. Denote
$$\gamma_2=\delta-\sum_{i=1}^{\infty}(\alpha_i^+)^2-\sum_{i=1}^{\infty}(\alpha_i^-)^2\ge0.$$
In this notation we have

Proposition 4.1. There exists a parametrization of ergodic $U(\infty)$-invariant probability measures on the space $H$ by the points of the space $\Omega$. Given $\omega\in\Omega$, the Fourier transform (4.1) of the corresponding ergodic measure $M^\omega$ is given by
$$\int_{X\in H}e^{i\operatorname{tr}(\operatorname{diag}(r_1,\dots,r_n,0,0,\dots)\,X)}\,M^\omega(dX)=\prod_{j=1}^{n}\Bigg(e^{i\gamma_1r_j-\gamma_2r_j^2}\prod_{k=1}^{\infty}\frac{e^{-i\alpha_k^+r_j}}{1-i\alpha_k^+r_j}\prod_{k=1}^{\infty}\frac{e^{i\alpha_k^-r_j}}{1+i\alpha_k^-r_j}\Bigg).$$
Proof. See [Pi2, Proposition 5.9] and [OV, Theorem 2.9]. □
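The right-hand side of Proposition 4.1 is straightforward to evaluate when only finitely many $\alpha_k^\pm$ are nonzero. The following sketch is our illustration (the truncation to finitely many $\alpha$'s and the function name are assumptions made for the example). It checks that the transform equals $1$ at the origin and has modulus at most $1$, that it factorizes over the diagonal entries $r_j$, and the multiplicativity behind Remark 4.2: the transform for combined parameters is the product of the transforms of the elementary pieces, which is the Fourier-side form of the convolution statement.

```python
import cmath


def char_fn(r, alpha_plus=(), alpha_minus=(), gamma1=0.0, gamma2=0.0):
    """Right-hand side of Proposition 4.1 at diag(r_1, ..., r_n, 0, ...),
    assuming only finitely many nonzero alpha's."""
    value = 1 + 0j
    for rj in r:
        factor = cmath.exp(1j * gamma1 * rj - gamma2 * rj ** 2)
        for a in alpha_plus:
            factor *= cmath.exp(-1j * a * rj) / (1 - 1j * a * rj)
        for a in alpha_minus:
            factor *= cmath.exp(1j * a * rj) / (1 + 1j * a * rj)
        value *= factor
    return value


r = [0.3, -1.1, 2.0]
params = dict(alpha_plus=(1.2, 0.5), alpha_minus=(0.8,), gamma1=-0.4, gamma2=0.25)
full = char_fn(r, **params)
pieces = (char_fn(r, alpha_plus=(1.2,)) * char_fn(r, alpha_plus=(0.5,))
          * char_fn(r, alpha_minus=(0.8,)) * char_fn(r, gamma1=-0.4) * char_fn(r, gamma2=0.25))
```

Each factor $e^{-i\alpha r}/(1-i\alpha r)$ has modulus $(1+\alpha^2r^2)^{-1/2}\le1$, and $|e^{i\gamma_1r-\gamma_2r^2}|=e^{-\gamma_2r^2}\le1$, so the bound $|\text{value}|\le1$ holds term by term.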
Remark 4.2. If only one of the parameters $\alpha_i^\pm$, $\gamma_1$, $\gamma_2$ is distinct from $0$, then the corresponding ergodic measure is called elementary. A description of the elementary measures can be found in [OV, Corollaries 2.5–2.7]. Note, in particular, that the elementary measures corresponding to the parameter $\gamma_2$ are standard Gaussian measures on $H$, see [OV, Corollary 2.6]. Since the expression of Proposition 4.1 is multiplicative with respect to the coordinates of $\omega$, any ergodic measure is a convolution product of elementary ergodic measures.

For $N=1,2,\dots$, let $S_N\subset\mathbb{R}^N$ denote the set of $N$-tuples of weakly decreasing real numbers $\lambda=(\lambda_1\ge\dots\ge\lambda_N)$. Given $\lambda\in S_N$, let $\mathrm{Orb}(\lambda)$ denote the set of matrices $X\in H(N)$ with eigenvalues $\lambda_1,\dots,\lambda_N$. The sets of the form $\mathrm{Orb}(\lambda)$ are exactly the $U(N)$-orbits in $H(N)$. Given $\lambda\in S_N$, we set
$$a_i^+(\lambda)=\begin{cases}\dfrac{\max(\lambda_i,0)}{N},&i=1,\dots,N,\\[4pt]0,&i=N+1,N+2,\dots,\end{cases}\qquad a_i^-(\lambda)=\begin{cases}\dfrac{\max(-\lambda_{N+1-i},0)}{N},&i=1,\dots,N,\\[4pt]0,&i=N+1,N+2,\dots.\end{cases}$$
Equivalently, if $k$ and $l$ denote the numbers of strictly positive terms in $\{a_i^+\}$ and $\{a_i^-\}$, respectively, then
$$\frac{\lambda}{N}=\big(a_1^+(\lambda),\dots,a_k^+(\lambda),0,\dots,0,-a_l^-(\lambda),\dots,-a_1^-(\lambda)\big).$$
Further, we set
$$c(\lambda)=\sum_{i=1}^{\infty}a_i^+(\lambda)-\sum_{i=1}^{\infty}a_i^-(\lambda)=\frac{\lambda_1+\dots+\lambda_N}{N},\qquad d(\lambda)=\sum_{i=1}^{\infty}\big(a_i^+(\lambda)\big)^2+\sum_{i=1}^{\infty}\big(a_i^-(\lambda)\big)^2=\frac{\lambda_1^2+\dots+\lambda_N^2}{N^2}.$$
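The coordinates $a_i^\pm(\lambda)$, $c(\lambda)$, $d(\lambda)$ are elementary to compute from a spectrum. The sketch below (our illustration, with hypothetical names and sample spectra) checks the two closed forms for $c$ and $d$ and evaluates two sequences of spectra: $\lambda^{(N)}=(\alpha N,0,\dots,0,-\beta N)$, for which the limits in Proposition 4.3 below are $\alpha_1^+=\alpha$, $\alpha_1^-=\beta$, $\gamma_1=\alpha-\beta$, $\delta=\alpha^2+\beta^2$; and a spectrum with $N/2$ eigenvalues equal to $\sqrt N$ and $N/2$ equal to $-\sqrt N$, for which every $a_i^\pm\to0$, $c=0$ and $d=1$, so that in the limit $\gamma_2=1$ and only the Gaussian parameter survives.

```python
def coordinates(lam):
    """Map a spectrum lambda (any order) to ({a_i^+}, {a_i^-}, c, d) as defined above."""
    lam = sorted(lam, reverse=True)  # lambda_1 >= ... >= lambda_N
    N = len(lam)
    a_plus = [max(l, 0.0) / N for l in lam]
    a_minus = [max(-l, 0.0) / N for l in reversed(lam)]  # a_i^- uses lambda_{N+1-i}
    c = sum(a_plus) - sum(a_minus)
    d = sum(a * a for a in a_plus) + sum(a * a for a in a_minus)
    return a_plus, a_minus, c, d


sample = [2.5, -4.0, 0.0, 1.25, -0.5, 3.75, -1.0]
ap, am, c, d = coordinates(sample)

alpha, beta = 1.5, 0.4
two_point = [coordinates([alpha * N] + [0.0] * (N - 2) + [-beta * N]) for N in (10, 100, 1000)]
gaussian = [coordinates([N ** 0.5] * (N // 2) + [-(N ** 0.5)] * (N // 2)) for N in (100, 10_000)]
```

Only the finitely many values $a_i^\pm$ with $i\le N$ are stored; the remaining ones are zero by definition.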
By virtue of [OV, Theorem 3.3], any ergodic measure can be approximated by orbital measures on the spaces $H(N)$ as $N\to\infty$. The next result provides an explicit description of the approximating orbital measures. It also clarifies the meaning of the parameters in Proposition 4.1.

Proposition 4.3. Let $\{\mathrm{Orb}(\lambda^{(N)})\mid\lambda^{(N)}\in S_N\}$ be a sequence of orbits and let $\{M^{(N)}\}$ be the sequence of the corresponding orbital measures on the spaces $H(N)$, $N=1,2,\dots$. We view each $M^{(N)}$ as a measure on $H$. The measures $M^{(N)}$ weakly converge to a measure $M$ on $H$, i.e., $\langle f,M^{(N)}\rangle\to\langle f,M\rangle$ for any bounded continuous function $f$ on $H$, if and only if there exist limits
$$\alpha_i^\pm=\lim_{N\to\infty}a_i^\pm(\lambda^{(N)}),\quad i=1,2,\dots,\qquad\gamma_1=\lim_{N\to\infty}c(\lambda^{(N)}),\qquad\delta=\lim_{N\to\infty}d(\lambda^{(N)}).$$
If this condition holds, then the collection $\omega=(\{\alpha_i^+\},\{\alpha_i^-\},\gamma_1,\delta)$ is a point of $\Omega$ and the limit measure $M$ coincides with the ergodic measure $M^\omega$.

Proof. See [OV, Theorem 4.1]. □

Proposition 4.4. For any $U(\infty)$-invariant probability measure $M$ on $H$ there exists a probability measure $P$ on $\Omega$ such that
$$M=\int_\Omega M^\omega\,P(d\omega),$$
which means that for any bounded Borel function $f$ on $H$,
$$\langle f,M\rangle=\int_\Omega\langle f,M^\omega\rangle\,P(d\omega). \tag{4.2}$$
Such a measure $P$ is unique. Conversely, any probability measure $P$ on $\Omega$ arises in this way from a certain measure $M$.

Proof. This follows from Theorem 9.1 and Proposition 9.4. □
We will call P the spectral measure for M.

5. Approximation of Spectral Measures

In this section we show that the spectral measure for a U(∞)-invariant probability measure M on H can be obtained as a certain limit of finite-dimensional projections of M. For X ∈ H, let $\lambda^{(N)}(X)\in S_N$ denote the spectrum of the finite matrix $\theta_N(X)\in H(N)$. Let us say that X ∈ H is regular if there exist limits

$$\alpha_i^\pm(X)=\lim_{N\to\infty}a_i^\pm(\lambda^{(N)}(X)),\quad i=1,2,\dots,\qquad \gamma_1(X)=\lim_{N\to\infty}c(\lambda^{(N)}(X)),\qquad \delta(X)=\lim_{N\to\infty}d(\lambda^{(N)}(X)).\tag{5.1}$$
Let $H_{\rm reg}\subset H$ denote the subset of regular matrices in H. Since $\lambda^{(N)}(X)$ is a continuous function of X for any N, the functions $a_i^\pm(\lambda^{(N)}(X))$, $c(\lambda^{(N)}(X))$, and $d(\lambda^{(N)}(X))$ are also continuous. It follows that $H_{\rm reg}$ is a Borel subset of H (more precisely, a subset of type $F_{\sigma\delta}$).

Theorem 5.1. Any U(∞)-invariant probability measure on H is supported by $H_{\rm reg}$.

Proof. First, let M be an ergodic U(∞)-invariant probability measure on H. By Vershik's ergodic theorem (see [OV, Theorem 3.2]), M is concentrated on the set of those X ∈ H for which the orbital measures on $\mathrm{Orb}(\lambda^{(N)}(X))$ weakly converge to M. By Proposition 4.3, this set consists of exactly those X for which the limits (5.1) exist and coincide with the parameters of M given in Proposition 4.1. All such matrices X belong to $H_{\rm reg}$, so M is supported by $H_{\rm reg}$. Thus, the claim of the theorem holds for ergodic measures. Now let M be an arbitrary U(∞)-invariant probability measure on H and P be its spectral measure. Apply (4.2), taking as f the characteristic function of the set $H_{\rm reg}\subset H$. We have $\langle f,M^\omega\rangle=1$ for any ω ∈ Ω. Since P is a probability measure, (4.2) gives $\langle f,M\rangle=1$. Therefore, $H_{\rm reg}$ is of full measure with respect to M. □
Infinite Random Matrices and Ergodic Measures
Let $\pi:H_{\rm reg}\to\Omega$ denote the map sending $X\in H_{\rm reg}$ to the point ω with the coordinates defined by (5.1). This is a Borel map, because it is a pointwise limit of a sequence of continuous maps.

Theorem 5.2. Let M be a U(∞)-invariant probability measure on H and let $M|_{H_{\rm reg}}$ be the restriction of M to $H_{\rm reg}$, which is well defined by Theorem 5.1. The pushforward of the measure $M|_{H_{\rm reg}}$ under the Borel map π introduced above coincides with the spectral measure P.

Proof. Let F be an arbitrary bounded Borel function on Ω and f be its pullback on $H_{\rm reg}$. We must prove that $\langle f,M\rangle=\langle F,P\rangle$. By definition of P, we have

$$\langle f,M\rangle=\int_\Omega\langle f,M^\omega\rangle\,P(d\omega).$$

On the other hand, we know that for any ω ∈ Ω, the measure $M^\omega$ is supported by $\pi^{-1}(\omega)\subset H_{\rm reg}$ (see the beginning of the proof of Theorem 5.1). Finally, by the definition of f, we have $f|_{\pi^{-1}(\omega)}\equiv F(\omega)$, so that $\langle f,M^\omega\rangle=F(\omega)$. Therefore, the integral on the right-hand side is equal to $\langle F,P\rangle$. □

For N = 1, 2, ..., let $\pi_N:H\to\Omega\subset\mathbb R^{2\infty+2}$ denote the composition of the maps

$$H\ni X\mapsto\lambda^{(N)}(X)\in S_N\qquad\text{and}\qquad S_N\ni\lambda\mapsto\big(\{a_i^+(\lambda)\},\{a_i^-(\lambda)\},c(\lambda),d(\lambda)\big)\in\Omega.$$

Theorem 5.3. Let M be a U(∞)-invariant probability measure on H, P be its spectral measure, and $P_N$ be the pushforward of M under the map $\pi_N:H\to\Omega$ defined above. Then $P_N$ weakly converge to P as N → ∞. That is, for any continuous bounded function F on Ω,

$$\lim_{N\to\infty}\langle F,P_N\rangle=\langle F,P\rangle.$$
Proof. By Theorem 5.1, $H_{\rm reg}\subset H$ is of full measure with respect to M, so we may view $(H_{\rm reg},M)$ as a probability space. We have $\pi_N(M)=P_N$ and $\pi(M)=P$. Indeed, the first equality follows from the definition of $P_N$ and the fact that $H_{\rm reg}$ is of full measure, and the second equality is given by Theorem 5.2. Next, by the very definition of $H_{\rm reg}$, we have $\pi_N(t)\to\pi(t)$ for any $t\in H_{\rm reg}$ as N → ∞, where the limit is taken with respect to coordinatewise convergence on the space $\mathbb R^{2\infty+2}$. Since F is continuous, we get $F(\pi_N(t))\to F(\pi(t))$. That is, $F\circ\pi_N$ converges to $F\circ\pi$ at every point $t\in H_{\rm reg}$. Since these functions are uniformly bounded, the dominated convergence theorem gives

$$\int_{H_{\rm reg}}(F\circ\pi_N)(t)\,M(dt)\to\int_{H_{\rm reg}}(F\circ\pi)(t)\,M(dt).$$

Since $\pi_N(M)=P_N$ and $\pi(M)=P$,

$$\int_{H_{\rm reg}}(F\circ\pi_N)(t)\,M(dt)=\langle F,P_N\rangle,\qquad\int_{H_{\rm reg}}(F\circ\pi)(t)\,M(dt)=\langle F,P\rangle.$$

Consequently, $\langle F,P_N\rangle\to\langle F,P\rangle$. □
6. The Main Result

Let $s\in\mathbb C$, $\operatorname{Re}s>-\tfrac12$. Consider the Hua–Pickrell measure $m^{(s)}$. Let $P^{(s)}$ be its spectral measure and $\mathcal P^{(s)}$ be the corresponding point process on $\mathbb R^*$, see (0.8). In this section we prove the following theorem, which is our main result.

Theorem 6.1. The correlation functions of the process $\mathcal P^{(s)}$ exist and coincide with the limit correlation functions from Theorem 2.1.

Let X range over $H_{\rm reg}$. Recall that in Sect. 5 we attached to X two monotone sequences $\{\alpha_i^+(X)\}$, $\{\alpha_i^-(X)\}$ and also, for any N = 1, 2, ..., two monotone sequences

$$\{a_{i,N}^+(X)=a_i^+(\lambda^{(N)}(X))\},\qquad\{a_{i,N}^-(X)=a_i^-(\lambda^{(N)}(X))\}.$$

From these data we form point configurations

$$C(X)=\{\alpha_i^+(X)\}\sqcup\{-\alpha_i^-(X)\},\qquad C_N(X)=\{a_{i,N}^+(X)\}\sqcup\{-a_{i,N}^-(X)\},$$
where we omit the zero coordinates. Let M be a U(∞)-invariant probability measure on H. We restrict M to $H_{\rm reg}$, which is a subset of full measure, and view $(H_{\rm reg},M)$ as a probability space. Then any quantity depending on X becomes a random variable. Let P be the spectral measure of M and let $P_N$ be the finite-dimensional measures defined in Theorem 5.3. Recall that the $P_N$'s approximate P as N → ∞. Let $\mathcal P_N$ and $\mathcal P$ be the point processes on $\mathbb R^*$ corresponding to $P_N$ and P, respectively. We may view $\mathcal P_N$ and $\mathcal P$ as the random point configurations $C_N(X)$ and $C(X)$, where X is viewed as a point of the probability space $(H_{\rm reg},M)$. By $\rho_k^{(N)}$ and $\rho_k$ we denote the kth correlation measures of the processes $\mathcal P_N$ and $\mathcal P$, respectively. Note that the very existence of the measures $\rho_k$ is not evident. For a compact set $A\subset\mathbb R^*$ we set

$$N_{A,N}(X)=\operatorname{Card}(C_N(X)\cap A),\qquad N_A(X)=\operatorname{Card}(C(X)\cap A).$$

These are random variables. We know that for any fixed X and for any index i = 1, 2, ..., $a_{i,N}^\pm(X)$ tends to $\alpha_i^\pm(X)$ as N → ∞. We would like to conclude from this that $\rho_k^{(N)}$ converges to $\rho_k$ as N → ∞. The next lemma says that, under a reasonable technical assumption, this is indeed true.

Lemma 6.2. Assume that for any compact set $A\subset\mathbb R^*$ there exist estimates, uniform in N,

$$\mathbb E\big[N_{A,N}^{\,l}\big]\le C_l,\qquad l=1,2,\dots,$$

where the symbol $\mathbb E$ stands for the expectation. Then for any k = 1, 2, ..., the correlation measure $\rho_k$ exists and coincides with the weak limit of the measures $\rho_k^{(N)}$ as N → ∞. The limit is understood in the following sense: for any continuous compactly supported function F on $(\mathbb R^*)^k$,

$$\lim_{N\to\infty}\langle F,\rho_k^{(N)}\rangle=\langle F,\rho_k\rangle.$$
Proof. Fix a continuous compactly supported function F on $(\mathbb R^*)^k$. It will be convenient to assume that F is nonnegative (this does not mean any loss of generality). Introduce random variables f and $f_N$ as follows:

$$f(X)=\sum_{x_1,\dots,x_k\in C(X)}F(x_1,\dots,x_k),\qquad f_N(X)=\sum_{x_1,\dots,x_k\in C_N(X)}F(x_1,\dots,x_k),\tag{6.1}$$

where the sums are taken over ordered k-tuples of points with pairwise distinct labels. Any such sum is actually finite because F is compactly supported and the point configurations are locally finite. By the definition of the correlation measures,

$$\langle F,\rho_k\rangle=\mathbb E[f],\qquad\langle F,\rho_k^{(N)}\rangle=\mathbb E[f_N].$$

The correlation measure $\rho_k$ exists if $\mathbb E[f]$ is finite for any f as above, see, e.g., [Len]. Thus, we have to prove that $\mathbb E[f_N]\to\mathbb E[f]<\infty$ as N → ∞. By a general theorem (see [Shir, Ch. II, §6, Theorem 4]), it suffices to check the following two conditions:

Condition 1. $f_N(X)\to f(X)$ for any $X\in H_{\rm reg}$.

Condition 2. The random variables $f_N$ are uniformly integrable, that is,

$$\sup_N\int_{\{X\,:\,f_N(X)\ge c\}}f_N(X)\,M(dX)\to0\quad\text{as }c\to+\infty.$$

Let us check Condition 1. This condition does not depend on M; it is a simple consequence of the regularity property. Indeed, let us fix $X\in H_{\rm reg}$. For any ε > 0 set $\mathbb R_\varepsilon=\mathbb R\setminus(-\varepsilon,\varepsilon)$. Choose ε so small that the function F is supported by $(\mathbb R_\varepsilon)^k$. Fix j so large that $\alpha_j^\pm(X)<\varepsilon$. Since $a_{j,N}^\pm(X)\to\alpha_j^\pm(X)$, we have $a_{j,N}^\pm(X)<\varepsilon$ for all N large enough. By monotonicity, the same inequality holds for the indices j + 1, j + 2, ... as well. Recall that each point $x\in C_N(X)$ has the form $x=a_{i,N}^+(X)$ or $x=-a_{i,N}^-(X)$ for a certain index i. It follows that in the sums (6.1), only the points with indices i = 1, ..., j − 1 may really contribute. Then, using the continuity of F, we conclude that $f_N(X)\to f(X)$.

Let us check Condition 2. Choose a compact set A such that F is supported by $A^k$. The supremum of F (let us denote it by sup F) is finite. We have

$$f_N(X)\le\sup F\cdot N_{A,N}(X)\big(N_{A,N}(X)-1\big)\cdots\big(N_{A,N}(X)-k+1\big)\le\sup F\cdot\big(N_{A,N}(X)\big)^k.$$

Therefore, the random variables $f_N$ are uniformly integrable provided that this is true for the random variables $(N_{A,N})^k$ for any fixed k. But the latter fact follows from the assumption of the lemma and Chebyshev's inequality. □

Assume that $\mathcal P_N$ is a determinantal process given by a symmetric nonnegative integral operator $K_N$ on $\mathbb R^*$. That is, the correlation functions have determinantal form with the kernel $K_N$. For a compact set $A\subset\mathbb R^*$ we denote by $K_{A,N}$ the restriction of the kernel $K_N$ to A.

Lemma 6.3. Assume that for any compact set $A\subset\mathbb R^*$ we have an estimate $\operatorname{tr}K_{A,N}\le\mathrm{const}$, where the constant does not depend on N. Then the assumption of Lemma 6.2 is satisfied.
Proof. Instead of ordinary moments we can deal with factorial moments. Given l = 1, 2, ..., the lth factorial moment of $N_{A,N}$ is equal to

$$\rho_l^{(N)}(A^l)=\int_{A^l}\det\big[K_{A,N}(x_i,x_j)\big]_{1\le i,j\le l}\,dx_1\cdots dx_l=l!\,\operatorname{tr}\big(\wedge^lK_{A,N}\big).$$

Since $K_{A,N}$ is nonnegative, we have $\operatorname{tr}(\wedge^lK_{A,N})\le\operatorname{tr}(\otimes^lK_{A,N})=(\operatorname{tr}K_{A,N})^l$. This concludes the proof, because we have a uniform bound for the traces by the assumption. □
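The identity and the bound used in the proof of Lemma 6.3 can be tested on a discrete analogue. The sketch below is our illustration under a stated assumption: A is replaced by a finite set of sites with counting measure, so the kernel $K_{A,N}$ becomes a symmetric nonnegative matrix `K`. It checks that the sum of $\det[K(x_i,x_j)]$ over all ordered l-tuples equals $l!\,e_l(\text{eigenvalues})=l!\operatorname{tr}(\wedge^lK)$ (tuples with a repeated site give a zero determinant) and that this is at most $(\operatorname{tr}K)^l$. The helper names are hypothetical.

```python
import itertools
import math
import numpy as np

def factorial_moment(K, l):
    """Sum of det[K(x_i, x_j)] over all ordered l-tuples of sites
    (discrete analogue of the integral over A^l)."""
    n = K.shape[0]
    total = 0.0
    for idx in itertools.product(range(n), repeat=l):
        total += np.linalg.det(K[np.ix_(idx, idx)])  # zero if a site repeats
    return total

def elementary_symmetric(vals, l):
    """e_l(vals): sum of products over all l-element subsets."""
    return sum(math.prod(c) for c in itertools.combinations(vals, l))

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
K = B @ B.T / 10                      # symmetric, nonnegative definite
eig = np.linalg.eigvalsh(K)
for l in (1, 2, 3):
    print(l, factorial_moment(K, l), math.factorial(l) * elementary_symmetric(eig, l),
          np.trace(K) ** l)
```

The inequality $l!\,e_l(\lambda)\le(\sum\lambda_i)^l$ for nonnegative $\lambda_i$ is just the multinomial expansion of $(\sum\lambda_i)^l$ with the nonnegative terms having repeated indices dropped, which is the finite-dimensional content of $\operatorname{tr}(\wedge^lK)\le\operatorname{tr}(\otimes^lK)$.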
Proof of Theorem 6.1. Take $M=m^{(s)}$ and denote the correlation measure $\rho_k^{(N)}$ by $\rho_k^{(s,N)}$. The latter measure is calculated in Sect. 1: it coincides with a scaling of the kth correlation function $\tilde\rho_k^{(s,N)}(x_1,\dots,x_k)$ for the Nth pseudo-Jacobi ensemble. In terms of the corresponding correlation functions,

$$\rho_k^{(s,N)}(x_1,\dots,x_k)=N^k\,\tilde\rho_k^{(s,N)}(Nx_1,\dots,Nx_k),\qquad x_1,\dots,x_k\in\mathbb R^*.$$

By Theorem 2.1, for each k = 1, 2, ..., there exists a limit

$$\lim_{N\to\infty}\rho_k^{(s,N)}(x_1,\dots,x_k)=\rho_k^{(s,\infty)}(x_1,\dots,x_k),\tag{6.2}$$

uniformly on compact subsets of $(\mathbb R^*)^k$. Moreover, the correlation functions have determinantal form. It follows that the assumptions of Lemma 6.3 are satisfied (indeed, $\operatorname{tr}K_{A,N}$ is simply the integral of the first correlation function $\rho_1^{(s,N)}(x)$ over A). Consequently, we may apply Lemma 6.2. By this lemma, the correlation measures of the process $\mathcal P^{(s)}$ exist and coincide with the limits of the measures $\rho_k^{(s,N)}$ as N → ∞. Therefore, they are nothing else than the measures $\rho_k^{(s,\infty)}$ defined by the limit correlation functions (6.2). □
7. Vanishing of the Parameter γ2

In this section we show that the parameter γ2, which is responsible for the presence of the Gaussian component, vanishes for the measure $m^{(0)}$. We start with a general result concerning an abstract U(∞)-invariant probability measure M. As in Sect. 6, let $\mathcal P_N$ and $\mathcal P$ denote the corresponding point processes on $\mathbb R^*$, and let $\rho_1^{(N)}$ and $\rho_1$ be their first correlation measures. We assume that $\rho_1^{(N)}$ approach $\rho_1$, as N → ∞, in the sense of Lemma 6.2:

$$\langle G,\rho_1^{(N)}\rangle\to\langle G,\rho_1\rangle\quad\text{for any }G\in C_0(\mathbb R^*),\tag{7.1}$$

where $C_0(\mathbb R^*)$ denotes the space of continuous functions with compact support on $\mathbb R^*$. In Sect. 6 we verified that condition (7.1) holds when M is a Hua–Pickrell measure.
Proposition 7.1. Let M satisfy condition (7.1). Further, assume that

$$\lim_{\varepsilon\to0}\int_{-\varepsilon}^{\varepsilon}x^2\,\rho_1^{(N)}(dx)=0\quad\text{uniformly in }N.\tag{7.2}$$

Then the spectral measure P of the measure M is concentrated on the subset {γ2 = 0} of Ω.

Comment. The density of the measure $\rho_1$ may have a singularity at 0. For instance, when $M=m^{(0)}$, the density function is proportional to $1/x^2$. Condition (7.2) means that the densities of the measures $\rho_1^{(N)}$, multiplied by $x^2$, are uniformly integrable near x = 0.

We need a simple lemma.

Lemma 7.2. Assume we are given sequences

$$a_{1,N}^+\ge a_{2,N}^+\ge\dots\ge0,\qquad a_{1,N}^-\ge a_{2,N}^-\ge\dots\ge0,\qquad N=1,2,\dots,$$

such that

$$\lim_{N\to\infty}a_{i,N}^\pm=\alpha_i^\pm,\quad i=1,2,\dots,\qquad\text{and}\qquad\lim_{N\to\infty}\sum_{i=1}^\infty\big((a_{i,N}^+)^2+(a_{i,N}^-)^2\big)=\delta<+\infty.$$

Further, let F(x) be an arbitrary continuous function on $\mathbb R$ such that $F(x)=x^2$ for |x| < ε, with a certain ε > 0. Set $\gamma_2=\delta-\sum_{i=1}^\infty\big((\alpha_i^+)^2+(\alpha_i^-)^2\big)$ and note that γ2 ≥ 0. Then we have

$$\lim_{N\to\infty}\sum_{i=1}^\infty\big(F(a_{i,N}^+)+F(-a_{i,N}^-)\big)=\sum_{i=1}^\infty\big(F(\alpha_i^+)+F(-\alpha_i^-)\big)+\gamma_2.$$
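Proof. Fix k so large that $\alpha_{k+1}^+<\varepsilon$ and $\alpha_{k+1}^-<\varepsilon$. Then $a_{k+1,N}^+<\varepsilon$ and $a_{k+1,N}^-<\varepsilon$ for sufficiently large N and, moreover, $a_{i,N}^+<\varepsilon$, $a_{i,N}^-<\varepsilon$ for all i ≥ k + 1 by monotonicity. Likewise, $\alpha_i^+<\varepsilon$, $\alpha_i^-<\varepsilon$ for i ≥ k + 1. Therefore,

$$F(\pm a_{i,N}^\pm)=(a_{i,N}^\pm)^2\ \ (\text{for large }N),\qquad F(\pm\alpha_i^\pm)=(\alpha_i^\pm)^2,\qquad i\ge k+1.$$

It follows that

$$\sum_{i=1}^\infty\big(F(a_{i,N}^+)+F(-a_{i,N}^-)\big)=\sum_{i=1}^k\big(F(a_{i,N}^+)+F(-a_{i,N}^-)\big)+\sum_{i=k+1}^\infty\big((a_{i,N}^+)^2+(a_{i,N}^-)^2\big)$$

and similarly

$$\sum_{i=1}^\infty\big(F(\alpha_i^+)+F(-\alpha_i^-)\big)=\sum_{i=1}^k\big(F(\alpha_i^+)+F(-\alpha_i^-)\big)+\sum_{i=k+1}^\infty\big((\alpha_i^+)^2+(\alpha_i^-)^2\big).$$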
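As N → ∞, we have

$$\sum_{i=1}^k\big(F(a_{i,N}^+)+F(-a_{i,N}^-)\big)\to\sum_{i=1}^k\big(F(\alpha_i^+)+F(-\alpha_i^-)\big)$$

by continuity of F, and

$$\sum_{i=k+1}^\infty\big((a_{i,N}^+)^2+(a_{i,N}^-)^2\big)\to\sum_{i=k+1}^\infty\big((\alpha_i^+)^2+(\alpha_i^-)^2\big)+\gamma_2$$

by the assumption of the lemma (the full sums of squares tend to δ, while the first k terms tend to $\sum_{i=1}^k((\alpha_i^+)^2+(\alpha_i^-)^2)$). This concludes the proof. □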
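A toy example shows how γ2 arises in Lemma 7.2 as mass that escapes to 0. The sequences below are our own choice (not from the paper): $a_{1,N}^+=0.5$, $a_{i,N}^+=1/\sqrt N$ for $i=2,\dots,N+1$, and no negative coordinates. Then $\alpha_1^+=0.5$, all other limits vanish, $\delta=0.25+1$, and $\gamma_2=1$. The test function $F(x)=\min(x^2,\varepsilon^2)$ is one admissible choice: it is continuous and equals $x^2$ for $|x|<\varepsilon$.

```python
import math

EPS = 0.2

def F(x, eps=EPS):
    """Continuous test function equal to x**2 on (-eps, eps), capped at eps**2."""
    return min(x * x, eps * eps)

def lhs_sum(N):
    # a^+_{1,N} = 0.5 and a^+_{i,N} = 1/sqrt(N) for i = 2, ..., N+1; no a^- terms
    plus = [0.5] + [1.0 / math.sqrt(N)] * N
    return sum(F(a) for a in plus)

alpha_plus = [0.5]                                 # the only nonzero limit alpha_i^+
delta = 0.5 ** 2 + 1.0                             # lim of 0.25 + N * (1/N)
gamma2 = delta - sum(a * a for a in alpha_plus)    # = 1.0, the escaped mass
rhs = sum(F(a) for a in alpha_plus) + gamma2

for N in (10, 100, 10_000):
    print(N, lhs_sum(N), rhs)
```

For N = 10 the small points $1/\sqrt{10}$ still exceed ε, so the left-hand side is about 0.44; once $1/\sqrt N<\varepsilon$ it equals $F(0.5)+1=1.04$, which is the right-hand side of the lemma. Without the term γ2 the limit would be wrong, which is exactly the phenomenon that Proposition 7.1 rules out under assumption (7.2).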
Proof of Proposition 7.1. Let X range over $H_{\rm reg}$. Recall the notation $a_{i,N}^\pm(X)$ and $\alpha_i^\pm(X)$ introduced in Sect. 5 and at the beginning of Sect. 6. Let γ2(X) denote the value of the parameter γ2 at the point π(X) ∈ Ω, where $\pi:H_{\rm reg}\to\Omega$ is the projection defined in Sect. 5. Our aim is to prove that γ2(X) = 0 almost everywhere with respect to the measure M. This implies the claim of the proposition. Fix a continuous function F(x) ≥ 0 with compact support on $\mathbb R$ and such that $F(x)=x^2$ near 0. For any $X\in H_{\rm reg}$ set

$$\varphi_N(X)=\sum_{i=1}^\infty\big(F(a_{i,N}^+(X))+F(-a_{i,N}^-(X))\big),\qquad\varphi_\infty(X)=\sum_{i=1}^\infty\big(F(\alpha_i^+(X))+F(-\alpha_i^-(X))\big).$$

Applying Lemma 7.2 to the sequences $a_{i,N}^\pm=a_{i,N}^\pm(X)$ and $\alpha_i^\pm=\alpha_i^\pm(X)$, we get

$$\varphi_N(X)\to\varphi_\infty(X)+\gamma_2(X),\qquad X\in H_{\rm reg}.$$

The functions $\varphi_N(X)$, $\varphi_\infty(X)$, $\gamma_2(X)$ are all nonnegative Borel functions. By Fatou's lemma (see, e.g., [Shir, Ch. II, §6, Theorem 2]),

$$\liminf_{N\to\infty}\int_{H_{\rm reg}}\varphi_N(X)\,M(dX)\ge\int_{H_{\rm reg}}\varphi_\infty(X)\,M(dX)+\int_{H_{\rm reg}}\gamma_2(X)\,M(dX).$$

Recall that at the beginning of Sect. 6 we introduced the point configurations $C_N(X)$ associated with an arbitrary $X\in H_{\rm reg}$. We have

$$\varphi_N(X)=\sum_{i=1}^\infty\big(F(a_{i,N}^+(X))+F(-a_{i,N}^-(X))\big)=\sum_{x\in C_N(X)}F(x),$$

so that

$$\int_{H_{\rm reg}}\varphi_N(X)\,M(dX)=\langle F,\rho_1^{(N)}\rangle.$$

Likewise,

$$\int_{H_{\rm reg}}\varphi_\infty(X)\,M(dX)=\langle F,\rho_1\rangle.$$
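Therefore,

$$\liminf_{N\to\infty}\langle F,\rho_1^{(N)}\rangle\ge\langle F,\rho_1\rangle+\int_{H_{\rm reg}}\gamma_2(X)\,M(dX).\tag{7.3}$$

On the other hand, we will prove that

$$\limsup_{N\to\infty}\langle F,\rho_1^{(N)}\rangle\le\langle F,\rho_1\rangle.\tag{7.4}$$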
It will follow from (7.3) and (7.4) that γ2(X) = 0 for M-almost all X, because γ2(X) ≥ 0. To prove (7.4) we represent F(x), for an arbitrary ε > 0, in the form $F(x)=F_\varepsilon(x)+G_\varepsilon(x)$, where

$$0\le F_\varepsilon(x)\le x^2,\qquad\operatorname{supp}F_\varepsilon\subset[-\varepsilon,\varepsilon],\qquad F_\varepsilon(x)=x^2\ \text{near }0,\qquad G_\varepsilon\in C_0(\mathbb R^*).$$

Choosing ε small enough, we can make $\langle F_\varepsilon,\rho_1^{(N)}\rangle$ arbitrarily small, uniformly in N, by virtue of assumption (7.2). As for $\langle G_\varepsilon,\rho_1^{(N)}\rangle$, it tends to $\langle G_\varepsilon,\rho_1\rangle$ by (7.1), and $\langle G_\varepsilon,\rho_1\rangle\le\langle F,\rho_1\rangle$ because $G_\varepsilon=F-F_\varepsilon\le F$. This concludes the proof of Proposition 7.1. □

Theorem 7.3. The spectral measure of the measure $m^{(0)}$ is concentrated on the subset {γ2 = 0} of Ω.

Proof. By virtue of Proposition 7.1, it suffices to verify condition (7.2). To do this, we use the fact that in our case the first correlation function $\rho_1^{(N)}(x)=\rho_1^{(0,N)}(x)$ has a very simple expression:

$$\rho_1^{(0,N)}(x)=\frac1\pi\,\frac{N^2}{1+N^2x^2}.\tag{7.5}$$

The simplest way to check (7.5) is to use the relationship to the Nth Dyson ensemble, where the first correlation function is identically equal to N. From (7.5) and the trivial estimate $N^2x^2/(1+N^2x^2)\le1$ we get $\int_{-\varepsilon}^{\varepsilon}x^2\rho_1^{(0,N)}(x)\,dx\le2\varepsilon/\pi$ for all N, so condition (7.2) is indeed satisfied. □

We expect that Theorem 7.3 holds for any Hua–Pickrell measure.

8. Remarks and Problems

Orthogonal polynomials on the circle. In this paper we deal with the pseudo-Jacobi ensemble (1.1) defined by the weight function (1.4) on the real line. Instead, one could work with the orthogonal polynomial ensemble (0.11). Then we need orthogonal polynomials on the unit circle $\mathbb T$ with the weight function

$$(1+u)^{\bar s}(1+\bar u)^{s}=2^a(1+\cos\varphi)^a e^{b\varphi},\qquad u=e^{i\varphi}\in\mathbb T,\quad-\pi<\varphi<\pi,\quad s=a+ib.$$

For real s, the weight function depends only on $\cos\varphi\in[-1,1]$. Then one can use a general trick described in [Sz, §11.5]. It allows one to express the polynomials on $\mathbb T$ in terms of two families of orthogonal polynomials on the interval [−1, 1], which, in our case, turn out to be certain Jacobi polynomials.
This makes it possible to evaluate the Christoffel–Darboux kernel and then pass to a limit as N → ∞, which leads to another derivation of Theorem 2.1 (for real s). Perhaps, such an approach can be used for nonreal values of s as well.
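The identity between the two forms of the circle weight can be checked numerically. The sketch below (our illustration) uses Python's principal-branch complex powers, which match the branch implicit in the formula since $\arg(1+e^{i\varphi})=\varphi/2\in(-\pi/2,\pi/2)$ for $-\pi<\varphi<\pi$; the function names are hypothetical.

```python
import cmath
import math

def weight_complex(phi, s):
    """(1+u)^{conj(s)} (1+conj(u))^{s} with u = e^{i phi}, principal branches."""
    u = cmath.exp(1j * phi)
    return (1 + u) ** s.conjugate() * (1 + u.conjugate()) ** s

def weight_real_form(phi, s):
    """2^a (1 + cos phi)^a e^{b phi} with s = a + ib."""
    a, b = s.real, s.imag
    return 2 ** a * (1 + math.cos(phi)) ** a * math.exp(b * phi)

for s in (0.3 + 0.7j, -0.2 + 1.5j, 1.0 + 0j):
    for phi in (-3.0, 0.0, 2.9):
        print(s, phi, weight_complex(phi, s), weight_real_form(phi, s))
```

Analytically, $1+e^{i\varphi}=2\cos(\varphi/2)\,e^{i\varphi/2}$, so the product equals $(2\cos(\varphi/2))^{s+\bar s}e^{i\varphi(\bar s-s)/2}=(2(1+\cos\varphi))^{a}e^{b\varphi}$; for real s (b = 0) only $(1+\cos\varphi)^a$ remains, which is why the weight then depends on $\cos\varphi$ alone.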
Painlevé V. Consider a kernel of the form

$$K(x',x'')=\frac{P(x')Q(x'')-Q(x')P(x'')}{x'-x''},$$

where the functions P and Q satisfy a differential equation of the form

$$\frac{d}{dx}\begin{pmatrix}P(x)\\Q(x)\end{pmatrix}=A(x)\begin{pmatrix}P(x)\\Q(x)\end{pmatrix}$$

with a traceless rational 2×2 matrix A(x). Let J be a union of intervals inside the real line. Then the Fredholm determinant $\det(1+K|_J)$ satisfies a certain system of partial differential equations with the endpoints of J regarded as variables, see [TW]. In particular, when only one endpoint is moving, the corresponding ordinary differential equation often happens to be one of the Painlevé equations. The kernel $K^{(s,\infty)}$ introduced in Theorem 2.1 is no exception. In particular, the function

$$\sigma(t)=t\,\frac{d}{dt}\ln\det\Big(1-K^{(s,\infty)}\big|_{(t^{-1},+\infty)}\Big),\qquad t>0,$$

satisfies a σ-version of the Painlevé V equation:

$$-(t\sigma'')^2=\big(2(t\sigma'-\sigma)+(\sigma')^2+i(\bar s-s)\sigma'\big)^2-(\sigma')^2(\sigma'-2is)(\sigma'+2i\bar s),$$

see [BD] for details. Note that the approach of [BD] is very different from the machinery developed in [TW].
Infinite measures. The construction of the Hua–Pickrell measures $m^{(s)}$, $\operatorname{Re}s>-\tfrac12$, given in Sect. 3 can be extended to arbitrary complex values of s. However, when $\operatorname{Re}s\le-\tfrac12$, $m^{(s)}$ ceases to be a probability measure and becomes an infinite measure. Its pushforward $m^{(s,N)}$ under the projection $\theta_N:H\to H(N)$ makes sense only for sufficiently large values of N. Specifically, N must be strictly greater than $-2\operatorname{Re}s$. Then the measure $m^{(s,N)}$ is defined, within a constant factor not depending on N, by formula (3.1), where the factor $\mathrm{const}_N$ is subject to the recurrence relation

$$\mathrm{const}_N=\mathrm{const}_{N-1}\,\frac{\pi^N\,\Gamma(s+\bar s+N)}{2^{s+\bar s+2N-2}\,\Gamma(s+N)\,\Gamma(\bar s+N)}.$$

In other words, even if the measures $m^{(s,N)}$ are infinite, their projective limit $m^{(s)}=\varprojlim m^{(s,N)}$ still exists. The reason is that the fibers of the projection H(N) → H(N − 1) have finite mass with respect to the conditional measures provided that N is large enough.

Problem. Define and study the spectral decomposition of the infinite measures $m^{(s)}$, $\operatorname{Re}s\le-\tfrac12$.
Representation-theoretic meaning of U(∞)-invariant measures on H. Let $G(N)=U(N)\ltimes H(N)$ be the semidirect product of the group U(N) acting on the additive group H(N) by conjugations. Similarly, set $G=U(\infty)\ltimes H(\infty)=\varinjlim G(N)$. The groups G(N) are examples of the so-called Cartan motion groups, and the group G is an infinite-dimensional version of the groups G(N). A unitary representation T of the group G is called spherical if it possesses a cyclic unit vector ξ which is invariant with respect to the subgroup U(∞) ⊂ G. There is a one-to-one correspondence between the equivalence classes of the pairs (T, ξ) and the U(∞)-invariant probability Borel measures M on H. Given M, the representation T can be realized in the Hilbert space $L^2(H,M)$. Elements U ∈ U(∞) and A ∈ H(∞) act on functions $f\in L^2(H,M)$ as follows:

$$(T(U)f)(X)=f(U^{-1}XU),\qquad(T(A)f)(X)=e^{i\operatorname{tr}(AX)}f(X),\qquad X\in H.$$
In this realization, ξ is the constant function 1. Consider the matrix coefficient φ(g) = (T(g)ξ, ξ), called the spherical function. Since φ is U(∞)-biinvariant, the function $\varphi|_{H(\infty)}$, the restriction of φ to the subgroup H(∞) ⊂ G, is a U(∞)-invariant positive definite normalized function on H(∞). It follows that $\varphi|_{H(\infty)}$ coincides with the Fourier transform (4.1) of the U(∞)-invariant probability Borel measure M. Under the correspondence (T, ξ) ↔ M, ergodicity of M is equivalent to irreducibility of T. Note also that for an irreducible spherical representation T, the vector ξ is unique (within a scalar multiple), so that the function φ is an invariant of T. Thus, irreducible spherical representations of the group $G=U(\infty)\ltimes H(\infty)$ are parametrized by ergodic measures on H. For more details about representations of the group G, see [Ol2, Pi2].

The graph of spectra. Recall that by $S_N$ we denoted the subset of $\mathbb R^N$ formed by vectors λ with weakly decreasing coordinates. For $\mu\in S_{N-1}$ and $\lambda\in S_N$ we write μ ≺ λ if the coordinates of λ and μ interlace: $\lambda_1\ge\mu_1\ge\lambda_2\ge\dots\ge\lambda_{N-1}\ge\mu_{N-1}\ge\lambda_N$. We set

$$q_{N-1,N}(\mu,\lambda)=\begin{cases}(N-1)!\,\dfrac{\prod_{1\le i<j\le N-1}(\mu_i-\mu_j)}{\prod_{1\le k<l\le N}(\lambda_k-\lambda_l)}, & \text{if }\mu\prec\lambda,\\[2mm] 0, & \text{otherwise}.\end{cases}$$

Note that for any $\lambda\in S_N$,

$$\int_{S_{N-1}}q_{N-1,N}(\mu,\lambda)\,d\mu=1,\qquad d\mu=d\mu_1\cdots d\mu_{N-1}.$$
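The normalization of the cotransition density can be verified numerically. In this sketch (our illustration), the density is taken to be $(N-1)!\prod_{i<j}(\mu_i-\mu_j)/\prod_{k<l}(\lambda_k-\lambda_l)$ on the interlacing region, and λ is assumed to have pairwise distinct coordinates. Interlacing means exactly $\mu_i\in[\lambda_{i+1},\lambda_i]$ for each i independently, so the region is a box, and the Vandermonde product has degree N − 2 in each variable; a tensor-product Gauss–Legendre rule with N nodes per axis therefore integrates it exactly. Without the factor (N − 1)! the integral would be 1/(N − 1)! instead of 1 (for N = 3 this is a two-line hand computation). The function names are hypothetical.

```python
import itertools
import math
import numpy as np

def vandermonde(x):
    """prod_{i<j} (x_i - x_j); equals 1 for a single coordinate."""
    return math.prod(x[i] - x[j] for i in range(len(x)) for j in range(i + 1, len(x)))

def cotransition_mass(lam, n_nodes=None):
    """Integral of q_{N-1,N}(mu, lam) over mu, by exact Gauss-Legendre quadrature."""
    lam = np.asarray(lam, dtype=float)          # strictly decreasing coordinates
    N = lam.size
    n = n_nodes or N                            # N nodes: exact up to degree 2N-1
    nodes, weights = np.polynomial.legendre.leggauss(n)
    lo, hi = lam[1:], lam[:-1]                  # mu_i ranges over [lam_{i+1}, lam_i]
    mid, half = (hi + lo) / 2, (hi - lo) / 2
    total = 0.0
    for idx in itertools.product(range(n), repeat=N - 1):
        idx = list(idx)
        mu = mid + half * nodes[idx]
        w = np.prod(weights[idx] * half)
        total += w * vandermonde(mu)
    return math.factorial(N - 1) * total / vandermonde(lam)

print(cotransition_mass([3.0, 1.0, -0.5, -2.0]))
```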
Let M be an arbitrary U(∞)-invariant probability Borel measure and $P_N$ be the radial part of the measure $\theta_N(M)$ (this is a probability measure on $S_N$). Then the measures $P_1,P_2,\dots$ satisfy the following consistency relation:

$$\int_{S_N}q_{N-1,N}(\mu,\lambda)\,P_N(d\lambda)=\text{the density of }P_{N-1}\text{ at }\mu.$$
Conversely, if a sequence $\{P_N\}$ of probability measures satisfies the above consistency relation for each pair of adjacent indices, then this sequence comes from a certain measure M. Introduce the set $\mathcal T$ formed by all infinite sequences

$$\tau=(\tau^{(1)}\prec\tau^{(2)}\prec\dots),\qquad\tau^{(N)}\in S_N.$$

Consider the probability measures $\widetilde P$ on $\mathcal T$ with the following property: for each N = 2, 3, ..., the probability that $\tau^{(N-1)}$ lies in an infinitesimal region dμ about a point $\mu\in S_{N-1}$, provided that $\tau^{(N)}=\lambda$, is $q_{N-1,N}(\mu,\lambda)\,d\mu$. Any such measure $\widetilde P$ is uniquely determined by a sequence $\{P_N\}$ satisfying the consistency relations. Thus, the measures $\widetilde P$ are in one-to-one correspondence with the U(∞)-invariant probability measures M on H. We call the collection of sets $\{S_N\}$ together with the functions $q_{N-1,N}(\mu,\lambda)$ the graph of spectra. This term was suggested by Sergei Kerov. According to the philosophy of [VK] we call the functions $q_{N-1,N}(\mu,\lambda)$ the cotransition functions of the graph of spectra. Here the term "graph" should not be understood literally; it only hints at a similarity with some "branching graphs" like the Young graph [VK] or the Gelfand–Tsetlin graph [BO]. For instance, the set $\mathcal T$ is an analogue of the set of paths in a branching graph. It can be shown that the graph of spectra can be obtained from the Gelfand–Tsetlin graph via a scaling limit procedure.

Projective limit of the spaces U(N). There exist projections (not group homomorphisms!) U(N) → U(N − 1) which correspond, via the Cayley transform, to the projections H(N) → H(N − 1). This allows one to form the projective limit space $\mathcal U=\varprojlim U(N)$. The space $\mathcal U$ admits a natural two-sided action of the group U(∞). The space H is embedded into $\mathcal U$, and the measures $m^{(s)}$ are transferred to $\mathcal U$ via this embedding. The resulting measures on $\mathcal U$ are quasiinvariant with respect to the two-sided action of U(∞). This makes it possible to construct analogs of the biregular representation for the group U(∞), see [Ner2, Ol5] for more details.
Analogy with the infinite symmetric group and the Poisson–Dirichlet distributions. The construction of the space $\mathcal U$ mentioned above is parallel to the construction of the space $\varprojlim S(n)$ of virtual permutations, see [KOV]. Here S(n) denotes the symmetric group of degree n. The family of the Hua–Pickrell measures should be viewed as a counterpart of a family $\{\mu_t\}_{t>0}$ of probability measures on the space of virtual permutations, see [KOV]. The Hua–Pickrell measures play the same role in harmonic analysis on the group U(∞) as the measures $\mu_t$ do in harmonic analysis on the infinite symmetric group S(∞). The decomposition of the measures $\mu_t$ on ergodic components is described by the Poisson–Dirichlet distributions. These are remarkable probability measures on an infinite-dimensional simplex (see [Kin]), which were studied by many authors. Thus, the spectral measures $P^{(s)}$ may be viewed as counterparts of the Poisson–Dirichlet distributions.

Other examples of group actions. The action of the group U(∞) on the space H examined in the present paper is connected with a particular series of flat symmetric spaces
$\{G(N)/U(N)=H(N)\}_{N=1,2,\dots}$, which in turn is related to a series of compact symmetric spaces: the unitary groups U(N) with the action of U(N) × U(N). There exist 10 infinite series of compact symmetric spaces and related flat spaces. With each such series, one can associate an infinite-dimensional group action on a space of infinite matrices (see, e.g., [Pi2]) and a family of "Hua–Pickrell measures" on that space depending on a real or complex parameter (see [Ner2]). We expect that the results of the present paper can be carried over to this more general context.

9. Appendix: Existence and Uniqueness of Decomposition on Ergodic Components

Let $\mathcal M$ be the set of U(∞)-invariant probability Borel measures on H. We equip $\mathcal M$ with the Borel structure generated by the functions of the form $M\mapsto\langle F,M\rangle$, where M ranges over $\mathcal M$ and F is an arbitrary bounded Borel function on H. Let the symbol ex(...) denote the set of extreme points of a convex set. Recall that elements of $\operatorname{ex}\mathcal M$ are called ergodic measures.

Theorem 9.1. (i) $\operatorname{ex}\mathcal M$ is a Borel subset of $\mathcal M$.
(ii) For any $M\in\mathcal M$ there exists a probability Borel measure P on $\operatorname{ex}\mathcal M$ representing M, i.e.,

$$\langle F,M\rangle=\int_{M'\in\operatorname{ex}\mathcal M}\langle F,M'\rangle\,P(dM')\tag{9.1}$$

for any bounded Borel function F on H.
(iii) The measure P is unique.
There exist different ways to prove such results, in particular:
(a) Representation-theoretic techniques.
(b) Dynkin's theorem about boundaries of general Markov processes, see [Dyn] and the references therein.
(c) Choquet's theorem about existence and uniqueness of barycentric decomposition in compact metrizable convex sets which are "Choquet simplices", see [Ph].
In (a) we reduce the problem to that of decomposing a spherical representation of the Cartan motion group G (see Sect. 8 above). Here we have to apply the classical disintegration theory for representations of locally compact groups and C*-algebras (see [Dix]) to groups which are not locally compact but are inductive limits of locally compact groups (see [Ol1, §3.6]). A crucial fact is that (G, U(∞)) is a Gelfand pair in the sense of [Ol4, §6]. In (b) one should use the graph of spectra (see Sect. 8) to reduce Theorem 9.1 to Dynkin's theorem. We follow (c) below.

Proposition 9.2 (Choquet's theorems). Let A be a convex subset of a locally convex topological vector space E. Assume that A is compact and metrizable.
(i) ex A is a Borel subset of A (more precisely, a $G_\delta$ subset).
(ii) For any a ∈ A there exists a probability Borel measure P on ex A representing a, i.e.,

$$f(a)=\int_{b\in\operatorname{ex}A}f(b)\,P(db)\tag{9.2}$$

for any continuous linear functional f on E.
(iii) The measure P is unique if and only if the cone spanned by A is a lattice.

Proof. Claim (i) is an elementary fact, see [Ph, Prop. 1.3]. Claims (ii) and (iii) are Choquet's theorems, see [Ph, Sects. 3 and 9]. □

We need one more general result.

Proposition 9.3. For any group action on a Borel space, the cone of finite invariant Borel measures is a lattice.

Proof. See [Ph, Sect. 10]. □
By Proposition 9.3, the set $\mathcal M$ satisfies the lattice condition from the last part of Proposition 9.2. However, there is no apparent way to make $\mathcal M$ a compact space, which is the major obstacle to applying Choquet's theorems. We bypass it by embedding $\mathcal M$ into a larger convex set to which Choquet's theorems can be applied. Here we use an idea borrowed from the proof of Theorem 22.10 in [Ol3] (see also Sect. 6 in [OkOl]).

Proof of Theorem 9.1. For N = 1, 2, ... let $\mathcal M_N$ denote the set of U(N)-invariant probability Borel measures on H(N), and let $\widetilde{\mathcal M}_N$ be the larger set formed by U(N)-invariant finite Borel measures of total mass less than or equal to 1. Further, let $C_0(H(N))$ be the Banach space of continuous functions on H(N) vanishing at infinity, and let $E_N$ denote its dual space equipped with the weak-star topology. Using the natural pairing between functions from $C_0(H(N))$ and finite measures, we embed $\widetilde{\mathcal M}_N$ into $E_N$. Note that $\widetilde{\mathcal M}_N$ is a compact metrizable space with respect to the topology of $E_N$. For N = 2, 3, ..., let $\theta_{N-1,N}$ denote the projection H(N) → H(N − 1) which consists in removing the Nth row and column from an N × N matrix. This projection sends $\widetilde{\mathcal M}_N$ to $\widetilde{\mathcal M}_{N-1}$ and also sends $\mathcal M_N$ to $\mathcal M_{N-1}$. Moreover, $\mathcal M$ coincides with the projective limit space $\varprojlim\mathcal M_N$.

Note that the map $\theta_{N-1,N}:\widetilde{\mathcal M}_N\to\widetilde{\mathcal M}_{N-1}$ is not continuous. The reason is that the projection H(N) → H(N − 1) is not a proper map. (To illustrate this phenomenon, consider the projection of the plane $\mathbb R^2$ onto its first coordinate axis. Take the Dirac measure at a point on the second coordinate axis and move the point to infinity. Then the measure will weakly converge to the zero measure, while its projection will remain fixed.) However, the map $\theta_{N-1,N}:\widetilde{\mathcal M}_N\to\widetilde{\mathcal M}_{N-1}$ possesses a weaker property: it is semicontinuous from below. (This property does not rely on the specific character of the projection H(N) → H(N − 1); it holds for any continuous map between locally compact spaces.)
This implies that for any N = 2, 3, ... the set

$$A_{N-1,N}=\big\{(M_{N-1},M_N)\in\widetilde{\mathcal M}_{N-1}\times\widetilde{\mathcal M}_N\ \big|\ M_{N-1}\ge\theta_{N-1,N}(M_N)\big\}\tag{9.3}$$

is closed. It is convenient to allow the index N in (9.3) to take the value 1. To this end we define H(0) as a one-point set. Then $\theta_{0,1}$ projects H(1) onto a single point, the vector space $E_0$ is identified with $\mathbb R$, $\widetilde{\mathcal M}_0$ is the interval $[0,1]\subset E_0$, and $\mathcal M_0$ is identified with 1. Next, we take as A the subset of $E_0\times E_1\times\cdots$ formed by infinite sequences $a=(M_0,M_1,\dots)$ such that $M_0=1$, $M_N\in\widetilde{\mathcal M}_N$ for N = 1, 2, ..., and, for any N = 1, 2, ..., the pair $(M_{N-1},M_N)$ belongs to the set $A_{N-1,N}$ defined in (9.3). We remark that A is a convex compact metrizable set.
For any N = 0, 1, 2, ..., we define an embedding $\iota:\mathcal M_N\to A$ as follows:

$$\mathcal M_N\ni M\mapsto a=(M_0,M_1,\dots,M_N,0,0,\dots),\qquad M_N=M,\quad M_{i-1}=\theta_{i-1,i}(M_i),\quad i=N,\dots,1.$$

We also consider the embedding $\iota:\mathcal M\to A$ which comes from the identification of $\mathcal M$ with $\varprojlim\mathcal M_N$. Now we make the following crucial observation:

(*) Any element a ∈ A can be written as a convex combination of certain elements $a_N\in\iota(\mathcal M_N)$ and an element $a_\infty\in\iota(\mathcal M)$. Moreover, this representation is unique.

By Proposition 9.3, for any N, the cone in $E_N$ spanned by $\mathcal M_N$ is a lattice, and the same is also true for $\mathcal M$. Together with (*), this implies that the cone generated by A is a lattice. Thus, the set A satisfies all the assumptions of Choquet's theorem. Applying this theorem, we get that any point a ∈ A is uniquely represented by a probability measure P on ex A. On the other hand, (*) implies the following fact:

(**) ex A is the disjoint union of the sets $\iota(\operatorname{ex}\mathcal M)$, $\iota(\operatorname{ex}\mathcal M_0)$, $\iota(\operatorname{ex}\mathcal M_1)$, ....

Since ex A is a Borel set by Choquet's theorem, and since all the sets $\iota(\operatorname{ex}\mathcal M_N)$ are evidently Borel sets, we conclude from (**) that $\iota(\operatorname{ex}\mathcal M)\subset A$ is a Borel set. Next, we note that the Borel structure on $\mathcal M$ coming from its embedding into A coincides with its initial Borel structure. Indeed, both structures are defined by functions on $\mathcal M$ of the form $M\mapsto\langle F,M\rangle$; the only difference is in the choice of a class {F} of functions on the space H. In the latter case, F may be an arbitrary bounded Borel function, while in the former case F belongs to the smaller class of cylindrical functions of the form $G\circ\theta_N$ with $G\in C_0(H(N))$, N = 1, 2, .... However, both classes clearly generate the same Borel structure. This proves claim (i) of Theorem 9.1. Further, it follows from (**) and the definition of the set A that if $a\in\iota(\mathcal M)$ then its representing measure P is concentrated on $\iota(\operatorname{ex}\mathcal M)\subset\operatorname{ex}A$. Comparing (9.1) and (9.2), we get that (9.1) holds for any cylindrical function of the form $F=G\circ\theta_N$ with $G\in C_0(H(N))$.
But then it also holds for any bounded Borel function on H, as required. □

Recall that we have an explicit description of the set $\operatorname{ex}\mathcal M$: it is parametrized by the space Ω (Proposition 4.1). The next claim, together with Theorem 9.1, is used in Proposition 4.4 above:

Proposition 9.4. The "abstract" Borel structure on $\operatorname{ex}\mathcal M$, which comes from the standard Borel structure on $\mathcal M$, coincides with the "concrete" Borel structure, which comes from the natural Borel structure on Ω via the bijection $\operatorname{ex}\mathcal M\leftrightarrow\Omega$.

Proof. Let us show that for any bounded Borel function f, the expression $\langle f,M^\omega\rangle$ is a Borel function in ω ∈ Ω. Indeed, it suffices to check this claim for functions f of the form $f(X)=e^{i\operatorname{tr}(AX)}$, where A is an arbitrary fixed matrix from H(∞). Further, without loss of generality we may assume that A is a diagonal matrix, and then the claim follows from Proposition 4.1. Consider the correspondence $\operatorname{ex}\mathcal M\leftrightarrow\Omega$ provided by Proposition 4.1. We have just proved that $\Omega\to\operatorname{ex}\mathcal M$ is a Borel map. Since both Ω and $\operatorname{ex}\mathcal M$ are standard Borel spaces, we may apply a general result (see [Ma, Theorem 3.2]) to conclude that our correspondence is an isomorphism of Borel spaces. □
Acknowledgements. At various stages of the work we discussed the subject with Sergei Kerov, Yuri Neretin, and Anatoly Vershik. We are grateful to them for valuable remarks. The present work was completed during our stay at the Erwin Schrödinger International Institute for Mathematical Physics in July 2000, in the framework of the semester “Representation Theory 2000” organized by Victor Kac and Alexandre Kirillov. We thank them for the invitation and the administration of ESI for warm hospitality. We also thank Peter Forrester for drawing our attention to [WF] and [Br]. The second author (G. O.) was supported by the Russian Foundation for Basic Research, grant 98-01-00303.
References
[A] Askey, R.: An integral of Ramanujan and orthogonal polynomials. J. Indian Math. Soc. 51, 27–36 (1987)
[BD] Borodin, A., Deift, P.: In preparation
[BO] Borodin, A., Olshanski, G.: Harmonic analysis on the infinite-dimensional unitary group and determinantal point processes. In preparation
[Br] Brouwer, P.W.: Generalized circular ensemble of scattering matrices for a chaotic cavity with nonideal leads. Phys. Rev. B 51, 16878–16884 (1995); cond-mat/9501025³
[DVJ] Daley, D.J., Vere-Jones, D.: An introduction to the theory of point processes. Springer Series in Statistics. Berlin–Heidelberg–New York: Springer, 1988
[Dix] Dixmier, J.: Les C*-algèbres et leurs représentations. Paris: Gauthier-Villars, 1969
[Dyn] Dynkin, E.B.: Sufficient statistics and extreme points. Ann. Probab. 6, 705–730 (1978)
[Dys] Dyson, F.J.: Statistical theory of the energy levels of complex systems I, II, III. J. Math. Phys. 3, 140–156, 157–165, 166–175 (1962)
[Er] Erdélyi, A. (ed.): Higher transcendental functions, Vol. 1. New York: McGraw–Hill, 1953
[Ga] Gantmakher, F.R.: The theory of matrices. Russian edition: Moscow: Nauka, 1988; English edition: New York: Chelsea Publ. Co., 1959
[Hua] Hua, L.K.: Harmonic analysis of functions of several complex variables in the classical domains. Chinese edition: Peking: Science Press, 1958; Russian edition: Moscow: IL, 1959; English edition: Transl. Math. Monographs 6, Providence, RI: Amer. Math. Soc., 1963
[Ka] Kakutani, S.: On equivalence of infinite product measures. Ann. of Math. 49, 214–224 (1948)
[Kin] Kingman, J.F.C.: Poisson processes. Oxford: Oxford Univ. Press, 1993
[KOV] Kerov, S., Olshanski, G., Vershik, A.: Harmonic analysis on the infinite symmetric group. A deformation of the regular representation. Comptes Rend. Acad. Sci. Paris, Sér. I 316, 773–778 (1993); detailed version in preparation
[Len] Lenard, A.: Correlation functions and the uniqueness of the state in classical statistical mechanics. Commun. Math. Phys. 30, 35–44 (1973)
[Les1] Lesky, P.A.: Endliche und unendliche Systeme von kontinuierlichen klassischen Orthogonalpolynomen. Z. angew. Math. Mech. 76 (3), 181–184 (1996)
[Les2] Lesky, P.A.: Eine Charakterisierung der kontinuierlichen und diskreten klassischen Orthogonalpolynome. Preprint 98–12, Mathematisches Institut A, Universität Stuttgart, 1998
[Ma] Mackey, G.W.: Borel structure in groups and their duals. Trans. Amer. Math. Soc. 85, 134–165 (1957)
[Me] Mehta, M.L.: Random matrices. 2nd edition. New York: Academic Press, 1991
[NW] Nagao, T., Wadati, M.: Correlation functions of random matrix ensembles related to classical orthogonal polynomials. J. Phys. Soc. Japan 60 (10), 3298–3322 (1991)
[Ner1] Neretin, Yu.A.: Separation of spectra in analysis of Berezin kernel. Funct. Anal. Appl. 34 (3), 197–207 (2000); math/9906075
[Ner2] Neretin, Yu.A.: Hua type integrals over unitary groups and over projective limits of unitary groups. math-ph/0010014
[NU] Nikiforov, A.F., Uvarov, V.B.: Special functions of mathematical physics. Russian edition: Moscow: Nauka, 1984; English edition: Basel–Boston, MA: Birkhäuser Verlag, 1988
[OkOl] Okounkov, A., Olshanski, G.: Asymptotics of Jack polynomials as the number of variables goes to infinity. Internat. Math. Res. Notices 13, 641–682 (1998)
[Ol1] Olshanski, G.I.: Unitary representations of the infinite-dimensional classical groups U(p, ∞), SO(p, ∞), Sp(p, ∞) and the corresponding motion groups. Funct. Anal. Appl. 12, 185–195 (1979)
[Ol2] Olshanski, G.I.: Method of holomorphic extensions in the representation theory of infinite-dimensional classical groups. Funct. Anal. Appl. 22 (4), 273–285 (1989)
³ A reference of the form math/??????? means a preprint posted in the "arXiv.org" (formerly "xxx.lanl.gov") electronic archive and available via http://arXiv.org/abs/math/???????.
[Ol3] Olshanski, G.I.: Unitary representations of infinite-dimensional pairs (G, K) and the formalism of R. Howe. In: Representation of Lie Groups and Related Topics, A.M. Vershik, D.P. Zhelobenko (eds.), Advanced Studies in Contemporary Math. 7, New York, etc.: Gordon and Breach Science Publishers, 1990, pp. 269–463
[Ol4] Olshanski, G.I.: On semigroups related to infinite-dimensional groups. In: Topics in representation theory, A.A. Kirillov (ed.), Advances in Soviet Math., Vol. 2. Providence, RI: Amer. Math. Soc., 1991, pp. 67–101
[Ol5] Olshanski, G.I.: The problem of harmonic analysis on the infinite-dimensional unitary group. In preparation
[OV] Olshanski, G., Vershik, A.: Ergodic unitary invariant measures on the space of infinite Hermitian matrices. In: Contemporary Mathematical Physics, R.L. Dobrushin, R.A. Minlos, M.A. Shubin, A.M. Vershik (eds.), American Mathematical Society Translations, Ser. 2, Vol. 175, Providence, RI: Amer. Math. Soc., 1996, pp. 137–175
[Ph] Phelps, R.R.: Lectures on Choquet's theorem. Van Nostrand, 1966
[Pi1] Pickrell, D.: Measures on infinite-dimensional Grassmann manifolds. J. Funct. Anal. 70 (2), 323–356 (1987)
[Pi2] Pickrell, D.: Mackey analysis of infinite classical motion groups. Pacific J. Math. 150, 139–166 (1991)
[Ro] Romanovski, V.: Sur quelques classes nouvelles de polynômes orthogonaux. C. R. Acad. Sci. Paris 188, 1023–1025 (1928)
[Shim] Shimomura, H.: On the construction of invariant measure over the orthogonal group on the Hilbert space by the method of Cayley transformation. Publ. RIMS Kyoto Univ. 10, 413–424 (1974/75)
[Shir] Shiryaev, A.N.: Probability. Russian edition: Moscow: Nauka, 1980; English edition: New York: Springer-Verlag, 1996
[So] Soshnikov, A.: Determinantal random point fields. Russian Math. Surveys 55, 923–975 (2000); math/0002099
[Sz] Szegö, G.: Orthogonal polynomials. AMS Colloquium Publications XXIII. New York: Amer. Math. Soc., 1959
[TW] Tracy, C., Widom, H.: Fredholm determinants, differential equations and matrix models. Commun. Math. Phys. 163, 33–72 (1994)
[VK] Vershik, A.M., Kerov, S.V.: Asymptotic theory of characters of the symmetric group. Funct. Anal. Appl. 15 (4), 246–255 (1981)
[WF] Witte, N.S., Forrester, P.J.: Gap probabilities in the finite and scaled Cauchy random matrix ensembles. Nonlinearity 13, 1965–1986 (2000); math-ph/0009022
Communicated by P. Sarnak
Commun. Math. Phys. 223, 125 – 141 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
On Ruelle–Perron–Frobenius Operators. I. Ruelle Theorem
Aihua Fan¹, Yunping Jiang²,⋆
¹ Département de Mathématiques et Informatique, Université de Picardie Jules Verne, 33, Rue Saint Leu, 80039 Amiens Cedex 1, France. E-mail: [email protected]
² Department of Mathematics, Queens College of CUNY, Flushing, NY 11367, USA and Department of Mathematics, CUNY Graduate School, 635 Fifth Avenue, New York, NY 10016, USA. E-mail: [email protected]
Received: 31 May 2000 / Accepted: 1 June 2001
Abstract: We study Ruelle–Perron–Frobenius operators for locally expanding and mixing dynamical systems on general compact metric spaces associated with potentials satisfying the Dini condition. In this paper, we give a proof of the Ruelle Theorem on Gibbs measures. It is the first part of our research on the subject. The rate of convergence of powers of the operator will be presented in a forthcoming paper.
1. Introduction
Let (X, d) be a compact metric space and f: X → X be a continuous map. It gives us a dynamical system $\{f^n\}_{n=0}^{\infty}$. We simply call f a dynamical system on X. We call a function $\psi: X \to \mathbb R^+ := \{x > 0\}$ a potential. Given a dynamical system f and a potential ψ, we can define an operator $L = L_{f,\psi}$ as
$$L\phi(x) = \sum_{y \in f^{-1}(x)} \psi(y)\phi(y)$$
for φ in a suitable function space on X. The operator we just defined is called a Ruelle–Perron–Frobenius operator, or simply a Ruelle operator. Ruelle operators play important roles in thermodynamics. They are useful tools in many different areas of mathematics and mathematical physics. The famous Ruelle theorem deals with the maximal spectrum of the transfer operator associated to a locally expanding dynamical system and a potential with a certain smoothness. When the given dynamical system is the one-sided shift on a symbolic space of finite type and the given potential is a Hölder continuous function, Ruelle proved in [R1, R2] (see also [Bo]) that the Ruelle operator acting on the space of Hölder continuous functions has a unique maximal positive eigenvalue ρ with a positive eigenfunction.
⋆ Supported in part by NSF grants and PSC-CUNY awards
This
result gives a mathematical understanding of the existence and uniqueness of the Gibbs measure (also called the equilibrium measure) of a lattice gas. Walters [Wa] generalized the Ruelle theorem to a more general setting in which the dynamical system is expansive and mixing and the potential is of summable variation. Since then, other approaches to the Ruelle theorem have appeared. For example, for the one-sided shift on a symbolic space of finite type, Ferrero and Schmitt [FS] gave a geometric proof using the Hilbert projective metrics introduced by Birkhoff in [Bi]; the first author of the present paper gave a proof by bringing in some ideas from probability theory [Fa]. For a locally expanding and mixing dynamical system and a Hölder continuous potential, the second author of the present paper gave in [Ji] a simple proof of the existence and simplicity of the maximal eigenvalue of a Ruelle operator without using any fixed point theorem. By combining the ideas and techniques of both authors' research, we have studied Ruelle operators extensively. In this paper, we present a new proof of the Ruelle theorem for a locally expanding and mixing dynamical system on a general compact metric space associated with a potential satisfying the Dini condition. An important problem in dynamical systems concerns the asymptotic behaviour of the iterates of a Ruelle operator. In our second paper [FJ] on this research, we will present our estimates of the convergence speed of the iterates of a Ruelle operator. The article is organized as follows. In Sect. 2, we give some geometric properties of a locally expanding and mixing dynamical system on a compact metric space. In Sect. 3, we study the Ruelle operator. When the potential satisfies the Dini condition, we first adapt the technique in [Ji] to prove that there is a strictly positive eigenvalue of the Ruelle operator with a strictly positive eigenfunction.
Unlike the case of a Hölder potential, the strictly positive eigenfunction here may not have the same smoothness as the potential (see Remark 1). This eigenfunction allows us to renormalize the Ruelle operator. Then we use the technique in [DF] to prove that there is a unique Gibbs measure for a system with a Dini potential. We do not need to construct or use any Markov partition. The reader who is interested in the Ruelle Theorem may refer to [Bo, R1, R2, Wa] for other proofs under different settings. In the last section, we will apply our result to prove the existence and uniqueness of a smooth invariant measure for an expanding $C^{1+\omega}$ dynamical system on a d-dimensional connected compact Riemannian manifold, where ω is a modulus of continuity which satisfies the Dini condition. This result is a slightly more general version of the Krzyzewski–Szlenk Theorem (see [KS]).
2. Geometry of Locally Expanding and Mixing Dynamical Systems
Let X be a compact metric space with metric d and let f: X → X be a continuous map. For any n ≥ 0, we define a new metric $d_n$ on X, called the n-Bowen metric, as
$$d_n(x, y) = \max_{0 \le j \le n} d(f^j(x), f^j(y)).$$
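To make the n-Bowen metric concrete, the following Python sketch evaluates it for an example of our own choosing (not one used in the paper): the doubling map f(x) = 2x mod 1 on the circle ℝ/ℤ with the arc-length distance. Exact rational arithmetic via `fractions.Fraction` avoids rounding. For this assumed example and r ≤ 1/4, one can check by hand that $B_n(x, r)$ is the arc of radius $r/2^n$ about x; the assertions below probe exactly that boundary.

```python
from fractions import Fraction

def circle_dist(x, y):
    """Arc-length distance on the circle R/Z."""
    t = (x - y) % 1
    return min(t, 1 - t)

def doubling(x):
    """Assumed example map: f(x) = 2x mod 1."""
    return (2 * x) % 1

def bowen_dist(x, y, n):
    """n-Bowen metric d_n(x, y) = max_{0 <= j <= n} d(f^j x, f^j y)."""
    best = circle_dist(x, y)
    for _ in range(n):
        x, y = doubling(x), doubling(y)
        best = max(best, circle_dist(x, y))
    return best

if __name__ == "__main__":
    x = Fraction(1, 3)
    # Nearby points separate by a factor 2 per step until they are 1/2 apart.
    print(bowen_dist(x, x + Fraction(1, 1000), 5))
```

Each iteration doubles the separation of two close points, so $d_n$ is governed by the last iterate, which is why Bowen balls shrink geometrically in n.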
The n-Bowen ball centered at x ∈ X of radius r > 0 is denoted by $B_n(x, r)$. The 0-Bowen metric is just the original metric d on X. A dynamical system f on X is said to be locally expanding if there are constants λ > 1 and b > 0 such that
$$d(f(x), f(y)) \ge \lambda\, d(x, y) \qquad \text{whenever } d(x, y) \le b.$$
We call (λ, b) a primary expanding parameter. A dynamical system f on X is said to be mixing if for any non-empty open set U of X, there is an integer n > 0 such that f n (U ) = X. We will need to know some properties of a locally expanding dynamical
system. We state them in the following three propositions, whose proofs are postponed to the end of the paper (Appendix).
Proposition 1. Suppose f is a locally expanding dynamical system on a compact metric space (X, d) with a primary expanding parameter (λ, b). Then
1. f is a local homeomorphism. More precisely, for any x ∈ X and 0 < b′ ≤ b, $f: B(x, b') \to f(B(x, b'))$ is a homeomorphism.
2. For any y ∈ X, $f^{-1}(y)$ is finite. Moreover, there is a constant 0 < a ≤ b such that for any y ∈ X with $f^{-1}(y) = \{x_1, \dots, x_n\}$, there are local inverses $g_1, \dots, g_n$ of f defined on B(y, a) such that $g_j(y) = x_j$ and the sets $g_j(B(y, a))$ (1 ≤ j ≤ n) are pairwise disjoint.
3. Let a > 0 be the constant in (2). We have $\#(f^{-1}(x)) = \#(f^{-1}(y))$ if d(x, y) ≤ a. Furthermore, we can arrange $f^{-1}(x) = \{x_1, \dots, x_n\}$ and $f^{-1}(y) = \{y_1, \dots, y_n\}$ so that
$$d(x_j, y_j) \le \frac{d(x, y)}{\lambda} \qquad (1 \le j \le n).$$
4. There is a constant $n_0$ such that $\#(f^{-1}(x)) \le n_0$ for all x ∈ X.
5. Let a > 0 be the constant in (2). If 0 < r ≤ a, the map $f^n: B_n(x, r) \to B(f^n(x), r)$ is a homeomorphism.
If X is connected, $\#(f^{-1}(x))$ is a constant independent of x. In this case, we say that f is a covering.
Henceforth, we call a pair of constants (λ, a) as in Proposition 1 (2) an expanding parameter of the dynamical system (X, f). Note that if 1 < λ′ ≤ λ and 0 < a′ ≤ a, then (λ′, a′) is also an expanding parameter. In the sequel, (λ, a) will be reserved for this usage.
Proposition 2. Suppose f: X → X is a mixing map defined on a compact metric space (X, d). For any r > 0, there is an integer p = p(r) ≥ 1 such that $f^p(B(x, r)) = X$ for any x ∈ X.
Proposition 3. Suppose f: X → X is a locally expanding and mixing dynamical system defined on a compact metric space (X, d) with an expanding parameter (λ, a). For any 0 < r ≤ a, let p = p(r) ≥ 1 be an integer as in Proposition 2. Then
$$1 \le \#\big(f^{-(n+p)}(y) \cap B_n(x, r)\big) \le n_0^{\,p}$$
for any x, y ∈ X and for any n ≥ 1.
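Proposition 3 can be sanity-checked by brute force in an assumed example that is not from the paper: the doubling map f(x) = 2x mod 1 on ℝ/ℤ, where every point has $n_0 = 2$ preimages. We take r = 1/8, which we assume is admissible since the map doubles all distances below 1/4; then p = 3 works in Proposition 2, because $f^3$ stretches any open arc of length 1/4 to length 2. The helper `count_in_bowen_ball` is a name introduced here for illustration.

```python
from fractions import Fraction

def circle_dist(x, y):
    """Arc-length distance on the circle R/Z."""
    t = (x - y) % 1
    return min(t, 1 - t)

def doubling(x):
    """Assumed example map: f(x) = 2x mod 1."""
    return (2 * x) % 1

def bowen_dist(x, y, n):
    """n-Bowen metric d_n(x, y) = max_{0 <= j <= n} d(f^j x, f^j y)."""
    best = circle_dist(x, y)
    for _ in range(n):
        x, y = doubling(x), doubling(y)
        best = max(best, circle_dist(x, y))
    return best

def preimages(y, m):
    """All z with f^m(z) = y for the doubling map: z = (y + k) / 2^m."""
    return [(y + k) / 2**m for k in range(2**m)]

def count_in_bowen_ball(x, y, n, p, r):
    """Brute-force #(f^{-(n+p)}(y) ∩ B_n(x, r)), with B_n the open n-Bowen ball."""
    return sum(1 for z in preimages(y, n + p) if bowen_dist(z, x, n) < r)

R = Fraction(1, 8)  # assumed admissible radius (r <= 1/4)
P = 3               # f^3 maps every open arc of radius 1/8 onto the circle
N0 = 2              # each point has exactly two preimages

if __name__ == "__main__":
    for x in (Fraction(0), Fraction(1, 3)):
        print(x, [count_in_bowen_ball(x, Fraction(2, 9), n, P, R) for n in range(1, 6)])
```

The counts stay bounded between 1 and $n_0^p = 8$ uniformly in n, as Proposition 3 asserts; in this linear example they are in fact 1 or 2.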
3. Ruelle Theorem
Let (X, d) be a compact metric space. Let f be a locally expanding and mixing map with an expanding parameter (λ, a), λ > 1, 0 < a < 1. Denote by C = C(X, ℝ) the space of all continuous functions φ: X → ℝ with the supremum norm $\|\phi\| = \max_{x \in X} |\phi(x)|$.
A right-continuous increasing function $\omega: \mathbb R^+ \to \mathbb R^+$ with ω(0) = 0 is called a modulus of continuity. Given a modulus of continuity ω, denote by $H^\omega = H^\omega(X, \mathbb R)$ the space of all functions φ ∈ C satisfying
$$[\phi]_\omega = \sup_{x, y \in X,\ 0 < d(x, y) \le a} \frac{|\phi(x) - \phi(y)|}{\omega(d(x, y))} < \infty.$$
A modulus of continuity ω(t) is said to satisfy the Dini condition if
$$\int_0^1 \frac{\omega(t)}{t}\, dt < \infty.$$
For a modulus of continuity ω, we define
$$\tilde\omega(t) = \sum_{n=1}^{\infty} \omega(\lambda^{-n} t).$$
Lemma 1. Suppose ω satisfies the Dini condition. Then
$$\tilde\omega(t) = \sum_{n=1}^{\infty} \omega(\lambda^{-n} t) \to 0 \qquad \text{as } t \to 0^+.$$
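Proof. Since ω is increasing, $\omega(\lambda^{-n}t) \le \omega(\lambda^{-x}t)$ for $n - 1 \le x \le n$. Summing over n ≥ 1 and substituting $u = \lambda^{-x}t$, we get
$$\tilde\omega(t) \le \int_0^\infty \omega(\lambda^{-x} t)\, dx = \frac{1}{\log\lambda} \int_0^t \frac{\omega(u)}{u}\, du \to 0 \qquad \text{as } t \to 0^+,$$
where the convergence to 0 follows from the Dini condition. □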
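Lemma 1 and Remark 1 can be checked numerically. The Python sketch below uses parameter values of our own choosing (λ = 2, α = 1/2, β = 2) and sums ω̃(t) for the two model moduli ω(u) = u^α and ω(u) = 1/|log u|^β. It compares each sum with the integral bound $(1/\log\lambda)\int_0^t \omega(u)/u\,du$ from the proof, which equals $t^\alpha/(\alpha\log\lambda)$ and $|\log t|^{1-\beta}/((\beta - 1)\log\lambda)$ respectively. The functions take s = log u as input, an implementation choice (ours) that avoids floating-point underflow of $\lambda^{-n}t$.

```python
import math

# Illustrative parameters (our own choices, not taken from the paper).
LAM = 2.0    # expansion constant lambda > 1
ALPHA = 0.5  # exponent in omega(u) = u**ALPHA
BETA = 2.0   # exponent in omega(u) = 1 / |log u|**BETA (a Dini modulus for BETA > 1)

def power_modulus(s):
    """omega(u) = u**ALPHA, evaluated at u = e**s."""
    return math.exp(ALPHA * s)

def log_modulus(s):
    """omega(u) = 1 / |log u|**BETA for 0 < u < 1, evaluated at u = e**s (s < 0)."""
    return (-s) ** (-BETA)

def omega_tilde(omega_of_log, t, n_terms):
    """Truncated sum  sum_{n=1}^{n_terms} omega(LAM**-n * t), computed in log space."""
    s0, step = math.log(t), math.log(LAM)
    return math.fsum(omega_of_log(s0 - n * step) for n in range(1, n_terms + 1))

def power_bound(t):
    """(1 / log LAM) * integral_0^t omega(u) / u du for omega(u) = u**ALPHA."""
    return t ** ALPHA / (ALPHA * math.log(LAM))

def log_bound(t):
    """The same integral bound for omega(u) = 1 / |log u|**BETA, 0 < t < 1."""
    return abs(math.log(t)) ** (1 - BETA) / ((BETA - 1) * math.log(LAM))

if __name__ == "__main__":
    for t in (1e-2, 1e-4, 1e-8):
        wp = omega_tilde(power_modulus, t, 200)
        wl = omega_tilde(log_modulus, t, 100_000)
        print(f"t={t:.0e}  power: {wp:.4e} <= {power_bound(t):.4e}"
              f"   log: {wl:.4e} <= {log_bound(t):.4e}")
```

For the power modulus the sum is the geometric series $t^\alpha/(\lambda^\alpha - 1)$, so ω̃ keeps the order $t^\alpha$; for β = 2 the sum decays only like $1/|\log t|$, which is the loss of one power of the logarithm described in Remark 1.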
Remark 1. The orders of ω(t) and ω̃(t) near 0⁺ may be different. If $\omega(t) \approx t^\alpha$ near 0⁺ (0 < α ≤ 1), we indeed have $\tilde\omega(t) \approx t^\alpha$ near 0⁺. However, in general, $H^\omega \subset H^{\tilde\omega}$ but the two spaces are not equal. For example, if $\omega(t) \approx 1/|\log t|^\beta$ for some β > 1, we have $\tilde\omega(t) \approx 1/|\log t|^{\beta-1}$. In particular, if β ≤ 2, then ω̃(t) does not satisfy the Dini condition.
For any modulus of continuity ω and fixed constants s > 0 and K > 0, define $H^\omega_{K,s} = H^\omega_{K,s}(X, \mathbb R)$ to be the subset of $H^\omega$ consisting of all functions satisfying
$$\phi(x) \ge s, \qquad \frac{\phi(x)}{\phi(y)} \le e^{K\omega(d(x,y))}, \qquad x, y \in X,\ d(x, y) \le a.$$
Henceforth, suppose ω is a modulus of continuity satisfying the Dini condition. Take a function ψ > 0 in $H^\omega_{K_0,s_0}$ as potential. We define a linear operator $L = L_\psi$ as
$$L\phi(x) = \sum_{y \in f^{-1}(x)} \psi(y)\phi(y), \qquad \phi \in C.$$
This operator is called a Ruelle–Perron–Frobenius operator, or Ruelle operator. Without loss of generality, we always assume $s_0 = 1$ (otherwise, we consider $\psi/s_0$ as our potential). Let M be the dual space of C. By the Riesz representation theorem, M is the space of all finite signed Borel measures on X. Let $L^*: M \to M$ be the dual operator of L: C → C. Define
$$G_n(x) = \prod_{j=0}^{n-1} \psi(f^j(x)) \qquad (n \ge 1,\ x \in X).$$
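For any ν ∈ M and φ ∈ C, let $\langle \nu, \phi\rangle = \int_X \phi\, d\nu$. We also need the space $H^{\tilde\omega} = H^{\tilde\omega}(X, \mathbb R)$ of all functions φ ∈ C satisfying
$$[\phi]_{\tilde\omega} = \sup_{x, y \in X,\ 0 < d(x, y) \le a} \frac{|\phi(x) - \phi(y)|}{\tilde\omega(d(x, y))} < \infty.$$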
Theorem 1 (Ruelle Theorem). Under the above assumptions, we have the following statements:
1. There are a strictly positive number ρ and a strictly positive function $h \in H^{\tilde\omega}_{K,s}$ such that Lh = ρh.
2. There is a unique probability measure $\nu = \nu_\psi \in M$ such that $L^*\nu = \rho\nu$.
3. For any 0 < r ≤ a, there is a constant C = C(r) > 0 such that
$$C^{-1} \le \frac{\nu(B_n(x, r))}{\rho^{-n} G_n(x)} \le C$$
holds for all x ∈ X and n ≥ 1.
4. Take h in (1) such that ⟨ν, h⟩ = 1. Then for any φ ∈ C, $\rho^{-n}L^n\phi$ converges uniformly to $\langle\nu, \phi\rangle h$ as n goes to infinity.
5. The number ρ is a simple eigenvalue of the operator L: C → C.
Remark 2. Since L is a positive operator acting on C, its spectral radius is equal to $\lim_{n\to\infty}\|L^n 1\|^{1/n}$. It then follows from Theorem 1 (4) that the eigenvalue ρ is equal to the spectral radius of L acting on C.
Remark 3. The probability measure µ = hν coming from Theorem 1 (4) is called the Gibbs measure for the system f with potential ψ. We also refer to the inequalities in Theorem 1 (3) as the Gibbs property. Theorem 1 implies that there is a unique f-invariant probability measure satisfying the Gibbs property, namely µ = hν.
Our proof of Theorem 1 is based on several lemmas.
Lemma 2 (Naive Distortion). For any x, y ∈ X with $d_n(x, y) \le a$,
$$\left|\log\frac{G_n(x)}{G_n(y)}\right| \le K_0\,\tilde\omega\big(d(f^n(x), f^n(y))\big).$$
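Proof. Let $x_i = f^i(x)$ and $y_i = f^i(y)$ for 0 ≤ i ≤ n. Then $d(x_i, y_i) \le \lambda^{-(n-i)} d(x_n, y_n)$. So
$$\begin{aligned}
|\log G_n(x) - \log G_n(y)| &\le \sum_{i=0}^{n-1} |\log\psi(x_i) - \log\psi(y_i)| \le \sum_{i=0}^{n-1} K_0\,\omega(d(x_i, y_i))\\
&\le K_0\sum_{i=0}^{n-1} \omega\big(\lambda^{-(n-i)} d(x_n, y_n)\big) = K_0\sum_{i=1}^{n} \omega\big(\lambda^{-i} d(x_n, y_n)\big) \le K_0\,\tilde\omega(d(x_n, y_n)).
\end{aligned}$$
□
Lemma 3. $L H^{\tilde\omega}_{K,s} \subseteq H^{\tilde\omega}_{K,s}$ (K ≥ K_0).
Proof. Suppose $\phi \in H^{\tilde\omega}_{K,s}$. For any x, y ∈ X with d(x, y) ≤ a, let $f^{-1}(x) = \{x_1, \dots, x_k\}$ and $f^{-1}(y) = \{y_1, \dots, y_k\}$ be arranged so that $d(x_i, y_i) \le \lambda^{-1} d(x, y)$ (Proposition 1 (3)). Then
$$\begin{aligned}
L\phi(x) = \sum_{i=1}^{k} \psi(x_i)\phi(x_i) &\le \sum_{i=1}^{k} \psi(y_i)e^{K_0\omega(d(x_i, y_i))}\,\phi(y_i)e^{K\tilde\omega(d(x_i, y_i))}\\
&\le L\phi(y)\, e^{K_0\omega(\lambda^{-1}d(x,y)) + K\tilde\omega(\lambda^{-1}d(x,y))} \le L\phi(y)\, e^{K\tilde\omega(d(x,y))},
\end{aligned}$$
where the last step uses K ≥ K_0 and the identity $\tilde\omega(t) = \omega(\lambda^{-1}t) + \tilde\omega(\lambda^{-1}t)$. It is clear that Lφ(x) ≥ s for all x ∈ X because ψ ≥ 1, φ ≥ s and $f^{-1}(x) \ne \emptyset$. □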
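The following lemma is a direct consequence of the Arzelà–Ascoli Theorem and Lemma 1, which makes the functions in $H^{\tilde\omega}_{K,s}$ with bounded supremum norm equicontinuous.
Lemma 4. Any sequence in $H^{\tilde\omega}_{K,s}$ which is bounded in the supremum norm has a subsequence converging in C whose limit is still in $H^{\tilde\omega}_{K,s}$.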
Define S as the set of positive real numbers ξ > 0 such that there is a φ in $H^{\tilde\omega}_{K,s}$ satisfying Lφ ≥ ξφ.
Lemma 5. The set S is a non-empty bounded subset of the real line ℝ.
Proof. First let us show that S is non-empty. Take a function φ in $H^{\tilde\omega}_{K,s}$. Then for any x in X, we have
$$L\phi(x) = \sum_{y \in f^{-1}(x)} \psi(y)\phi(y) = \phi(x)\sum_{y \in f^{-1}(x)} \psi(y)\frac{\phi(y)}{\phi(x)} \ge \frac{s}{\|\phi\|}\,\phi(x).$$
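Thus ξ = s/‖φ‖ is in S.
Now let us show that S is bounded. For any φ in $H^{\tilde\omega}_{K,s}$, let ‖φ‖ = φ(x) for some x ∈ X. Then
$$L\phi(x) = \sum_{y \in f^{-1}(x)} \psi(y)\phi(y) \le \phi(x)\sum_{y \in f^{-1}(x)} \psi(y) \le D_\psi\,\phi(x), \qquad \text{where } D_\psi = \max_{y \in X}\sum_{z \in f^{-1}(y)} \psi(z) < \infty.$$
Hence, if Lφ ≥ ξφ, evaluating at this point x gives ξ ≤ D_ψ. Therefore, S is bounded by D_ψ. □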
Proof of Theorem 1 (1). Let ρ = sup S > 0. There is a sequence $\{\xi_n\}_{n=1}^{\infty}$ in S converging to ρ. Let $\phi_n$ be a function in $H^{\tilde\omega}_{K,s}$ such that $L\phi_n \ge \xi_n\phi_n$. We may assume that $\min_{x \in X}\phi_n(x) = s$. Under this assumption, $\{\phi_n\}_{n=1}^{\infty}$ is a bounded sequence in C since it is a subset of $H^{\tilde\omega}_{K,s}$. By Lemma 4, it has a subsequence converging in C whose limit is in $H^{\tilde\omega}_{K,s}$. Let us denote the limit by $\phi_0$. Then $L\phi_0 \ge \rho\phi_0$.
We claim that $L\phi_0 = \rho\phi_0$. Otherwise, there would be a point y in X such that $L\phi_0(y) > \rho\phi_0(y)$. Then there is a neighborhood U of y such that
$$L\phi_0(y') - \rho\phi_0(y') > 0 \qquad (\forall y' \in U).$$
Since f is mixing, there is an integer n > 0 such that $f^n(U) = X$. Then $L^n(L\phi_0 - \rho\phi_0) > 0$, i.e. $L(L^n\phi_0) > \rho L^n\phi_0$. Therefore, by choosing $\phi = L^n\phi_0 \in H^{\tilde\omega}_{K,s}$ (Lemma 3) and using the compactness of X, we get a ξ > ρ such that Lφ ≥ ξφ. This contradicts the maximality of ρ. □
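The characterization ρ = sup S and Remark 2 can be seen in a toy case. The Python sketch below assumes, as an example of our own that the paper does not use, the full shift on two symbols with a potential depending only on the first two symbols, ψ(y) = W[y_0][y_1] ≥ 1, where W is a hypothetical matrix. On functions of the first symbol, L then acts as a 2 × 2 matrix. The quantity $\min_i (L\phi)_i/\phi_i$ is the largest ξ with Lφ ≥ ξφ, so it never exceeds ρ and equals ρ at φ = h. Power iteration recovers ρ and h, $\|L^n 1\|^{1/n}$ approaches ρ, and $\rho^{-n}L^n\phi$ becomes a multiple of h.

```python
import math

# Assumed toy example: full shift on {0, 1}, potential psi(y) = W[y0][y1] >= 1.
W = [[1.0, 2.0],
     [1.5, 1.0]]

def ruelle(phi):
    """L on functions of the first symbol: (L phi)[i] = sum_a W[a][i] * phi[a]."""
    return [sum(W[a][i] * phi[a] for a in range(2)) for i in range(2)]

def collatz_wielandt(phi):
    """Largest xi with L phi >= xi * phi, for a strictly positive phi."""
    Lphi = ruelle(phi)
    return min(Lphi[i] / phi[i] for i in range(2))

def leading_eigenpair(n_iter=200):
    """Power iteration; phi is renormalized so that max(phi) == 1."""
    phi, rho = [1.0, 1.0], 1.0
    for _ in range(n_iter):
        nxt = ruelle(phi)
        rho = max(nxt)
        phi = [v / rho for v in nxt]
    return rho, phi

def exact_rho():
    """Perron root of the 2x2 matrix, from its characteristic polynomial."""
    tr = W[0][0] + W[1][1]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    return (tr + math.sqrt(tr * tr - 4 * det)) / 2

def spectral_radius_estimate(n):
    """||L^n 1||^(1/n) in the sup norm (compare Remark 2)."""
    phi = [1.0, 1.0]
    for _ in range(n):
        phi = ruelle(phi)
    return max(phi) ** (1.0 / n)

def rescaled_iterate(phi, n, rho):
    """rho^(-n) L^n phi, which by Theorem 1 (4) tends to <nu, phi> h."""
    for _ in range(n):
        phi = [v / rho for v in ruelle(phi)]
    return phi

if __name__ == "__main__":
    rho, h = leading_eigenpair()
    print("rho =", rho, " exact =", exact_rho(), " h =", h)
    print([collatz_wielandt(p) for p in ([1.0, 3.0], [2.0, 1.0], h)])
```

Because ψ here is locally constant, it lies in every $H^\omega$, so this is a (very special) instance of the setting of Theorem 1.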
For ρ and h in Theorem 1 (1), take
$$\tilde\psi(x) = \frac{h(x)}{\rho\, h(f(x))}\,\psi(x) > 0.$$
Consider the Ruelle operator $\tilde L = L_{\tilde\psi}$ with potential ψ̃. The important feature of L̃ is that
$$\tilde L 1 = 1.$$
We call it a normalized Ruelle operator. Let
$$\tilde G_n(x) = \prod_{i=0}^{n-1} \tilde\psi(f^i(x)).$$
Then
$$\tilde G_n(x) = G_n(x)\,\frac{h(x)}{\rho^n h(f^n(x))}.$$
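In the same hypothetical two-symbol example (ψ(y) = W[y_0][y_1] on the full shift, our own choice), this Python sketch builds the normalized potential ψ̃ = hψ/(ρ h∘f) and checks the two facts just stated: L̃1 = 1, and the telescoping identity $\tilde G_n(x) = G_n(x)h(x)/(\rho^n h(f^n(x)))$. Since ψ̃ depends on the first two symbols, a point y = (a, b, …) has ψ̃(y) = psi_tilde(a, b).

```python
import itertools
import math

# Hypothetical potential on the full 2-shift: psi(y) = W[y0][y1].
W = [[1.0, 2.0],
     [1.5, 1.0]]

def leading_eigenpair(n_iter=200):
    """rho and h (max h = 1) for (L phi)[i] = sum_a W[a][i] phi[a], by power iteration."""
    phi, rho = [1.0, 1.0], 1.0
    for _ in range(n_iter):
        nxt = [sum(W[a][i] * phi[a] for a in range(2)) for i in range(2)]
        rho = max(nxt)
        phi = [v / rho for v in nxt]
    return rho, phi

RHO, H = leading_eigenpair()

def psi_tilde(a, b):
    """psi~(y) = h(y) psi(y) / (rho h(f y)) at a point y = (a, b, ...)."""
    return H[a] * W[a][b] / (RHO * H[b])

def G(word):
    """G_n(y) = prod_{j<n} psi(f^j y) for the orbit segment word = (y_0, ..., y_n)."""
    return math.prod(W[word[j]][word[j + 1]] for j in range(len(word) - 1))

def G_tilde(word):
    """The same product for the normalized potential psi~."""
    return math.prod(psi_tilde(word[j], word[j + 1]) for j in range(len(word) - 1))

def L_tilde_one(b):
    """(L~ 1)(x) for any x whose first symbol is b: sum over the preimages (a, x)."""
    return sum(psi_tilde(a, b) for a in range(2))

if __name__ == "__main__":
    print([L_tilde_one(b) for b in range(2)])
```

The identity L̃1 = 1 is just the eigenvalue equation Lh = ρh divided by ρh, which is why the normalization only needs ρ and h.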
(Notice that ψ̃ is a function in $H^{\tilde\omega}$ and may not satisfy the Dini condition. But the $\tilde G_n$ have a distortion property similar to that of the $G_n$; see Lemma 2.) Consequently, we have the following relations between L and L̃ and between L* and L̃*:
$$L^n\phi = \rho^n h\,\tilde L^n(\phi h^{-1}) \qquad\text{and}\qquad L^{*n}\nu = \rho^n h^{-1}\tilde L^{*n}(h\nu).$$
It follows (for n = 1) that a measure ν in M satisfies $L^*\nu = \rho\nu$ if and only if $\tilde L^*\mu = \mu$ for µ = hν.
A probability measure µ ∈ M is called a ψ̃-measure if $\tilde L^*\mu = \mu$. The above relation gives us a one-to-one correspondence between the ψ̃-measures and the measures ν such that $L^*\nu = \rho\nu$. So, Theorem 1 (2) is equivalent to saying that there is a unique ψ̃-measure.
Now we are led to prove that there is a unique ψ̃-measure. To do this we introduce a sequence of linear operators $\mathcal P = \{P_n\}_{n=1}^{\infty}$ defined as
$$P_n\phi(x) = \tilde L^n\phi(f^n(x)) = \sum_{y \in f^{-n}(f^n(x))} \tilde G_n(y)\phi(y).$$
Each $P_n$ is a positive operator and satisfies the normalization condition $P_n 1 = 1$. Let $P_n^*$ be the dual operator of $P_n$. A probability measure µ such that $P_n^*\mu = \mu$ for all n ≥ 1 is called a G-measure. By the Schauder–Tychonoff fixed point theorem there is at least one ψ̃-measure. It is easy to see that any ψ̃-measure is f-invariant (a consequence of $\tilde L(\phi\circ f) = \phi$) and is a G-measure. Therefore, in order to prove Theorem 1 (2), we only need to prove that there is a unique G-measure. The uniqueness of the G-measure is described by the following lemmas (Lemmas 6–9).
Lemma 6. The operators $P_n: C \to C$ (n ≥ 1) satisfy the relations $P_mP_n = P_nP_m = P_m$ (m ≥ n).
Proof. Let us show that $P_mP_n = P_m$. Writing $w = f^n(y)$ and using $\tilde G_m(y) = \tilde G_n(y)\,\tilde G_{m-n}(w)$, we get
$$\begin{aligned}
P_mP_n\phi(x) &= \sum_{y \in f^{-m}(f^m(x))} \tilde G_m(y)\,P_n\phi(y)
= \sum_{w \in f^{-(m-n)}(f^m(x))}\ \sum_{y \in f^{-n}(w)} \tilde G_{m-n}(w)\tilde G_n(y)\sum_{z \in f^{-n}(f^n(y))} \tilde G_n(z)\phi(z)\\
&= \sum_{w \in f^{-(m-n)}(f^m(x))}\ \sum_{y \in f^{-n}(w)} \tilde G_n(y)\sum_{z \in f^{-n}(w)} \tilde G_{m-n}(w)\tilde G_n(z)\phi(z)
= \sum_{w \in f^{-(m-n)}(f^m(x))}\ \sum_{y \in f^{-n}(w)} \tilde G_n(y)\sum_{z \in f^{-n}(w)} \tilde G_m(z)\phi(z)\\
&= \sum_{w \in f^{-(m-n)}(f^m(x))}\ \sum_{z \in f^{-n}(w)} \tilde G_m(z)\phi(z)
= \sum_{z \in f^{-m}(f^m(x))} \tilde G_m(z)\phi(z) = P_m\phi(x).
\end{aligned}$$
We used the facts that $\sum_{y \in f^{-n}(w)} \tilde G_n(y) = 1$ and $f^n(y) = w$. This also implies that $P_n$ is a projection, i.e., $P_n^2 = P_n$. Similar arguments imply that $P_nP_m = P_m$. □
Lemma 7. Denote $\Delta_n = \operatorname{Im} P_n$. For any φ ∈ C and χ ∈ Δ_n, $P_n(\phi\chi) = \chi P_n\phi$.
Proof. Suppose $\chi(x) = \sum_{y \in f^{-n}(f^n(x))} \tilde G_n(y)\beta(y)$. Then, since $f^n(z) = f^n(x)$ for every $z \in f^{-n}(f^n(x))$,
$$\begin{aligned}
P_n(\phi\chi)(x) &= \sum_{z \in f^{-n}(f^n(x))} \tilde G_n(z)\phi(z)\sum_{y \in f^{-n}(f^n(z))} \tilde G_n(y)\beta(y)
= \sum_{y \in f^{-n}(f^n(x))}\ \sum_{z \in f^{-n}(f^n(x))} \tilde G_n(y)\tilde G_n(z)\phi(z)\beta(y)\\
&= \sum_{y \in f^{-n}(f^n(x))} \tilde G_n(y)\beta(y)\sum_{z \in f^{-n}(f^n(x))} \tilde G_n(z)\phi(z) = \chi(x)P_n\phi(x).
\end{aligned}$$
□
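Lemmas 6 and 7 are exact algebraic identities, so they can be checked in finite arithmetic. The Python sketch below reuses the hypothetical two-symbol potential ψ(y) = W[y_0][y_1] on the full shift and, as a further simplification of our own, restricts $P_n$ to functions of the first M = 4 symbols, which turns every $P_n$ into a finite sum over the prefixes $(a_0, \dots, a_{n-1})$. The sample functions `PHI` and `BETA` are arbitrary choices.

```python
import itertools
import math

# Hypothetical potential on the full 2-shift: psi(y) = W[y0][y1].
W = [[1.0, 2.0],
     [1.5, 1.0]]

def leading_eigenpair(n_iter=200):
    """rho and h (max h = 1) for (L phi)[i] = sum_a W[a][i] phi[a], by power iteration."""
    phi, rho = [1.0, 1.0], 1.0
    for _ in range(n_iter):
        nxt = [sum(W[a][i] * phi[a] for a in range(2)) for i in range(2)]
        rho = max(nxt)
        phi = [v / rho for v in nxt]
    return rho, phi

RHO, H = leading_eigenpair()

def psi_tilde(a, b):
    """Normalized potential psi~ at a point y = (a, b, ...)."""
    return H[a] * W[a][b] / (RHO * H[b])

def G_tilde(word):
    """G~_n(y) for the orbit segment word = (y_0, ..., y_n)."""
    return math.prod(psi_tilde(word[j], word[j + 1]) for j in range(len(word) - 1))

M = 4  # simplification: test functions depend only on the first M symbols
WORDS = list(itertools.product(range(2), repeat=M))

def P(n, phi):
    """P_n phi(x) = sum over y in f^{-n}(f^n x) of G~_n(y) phi(y), for 1 <= n < M.

    The points of f^{-n}(f^n x) are y = (a_0, ..., a_{n-1}, x_n, x_{n+1}, ...)."""
    out = {}
    for x in WORDS:
        total = 0.0
        for a in itertools.product(range(2), repeat=n):
            y = a + x[n:]
            total += G_tilde(y[:n + 1]) * phi[y]
        out[x] = total
    return out

def close(u, v, tol=1e-12):
    """Sup-norm comparison of two functions stored as dicts over WORDS."""
    return all(abs(u[k] - v[k]) < tol for k in u)

def times(u, v):
    """Pointwise product of two functions stored as dicts over WORDS."""
    return {k: u[k] * v[k] for k in u}

# Deterministic sample functions on the cylinders (arbitrary choices).
PHI = {x: 2 + math.sin(1 + x[0] + 2 * x[1] + 4 * x[2] + 8 * x[3]) for x in WORDS}
BETA = {x: math.cos(x[0] - 3 * x[1] + 5 * x[2] + 7 * x[3]) for x in WORDS}
```

For instance, `close(P(3, P(2, PHI)), P(3, PHI))` tests Lemma 6 with m = 3, n = 2, and `close(P(2, times(PHI, P(2, BETA))), times(P(2, BETA), P(2, PHI)))` tests Lemma 7 with χ = P_2β ∈ Δ_2.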
The above two lemmas show that $\mathcal P$ is a compatible chain of Markovian projections (CCMP). Let $F_n$ be the σ-algebra generated by Δ_n. Lemma 6 implies $\Delta_m \subseteq \Delta_n$ for m ≥ n ≥ 1. So $\{F_n\}_{n=1}^{\infty}$ is a decreasing sequence of sub-σ-algebras of the Borel σ-field $F_X$ of X. Let $F_\infty = \bigcap_{n=1}^{\infty} F_n$. A G-measure µ is said to be $\mathcal P$-ergodic if $\mu|_{F_\infty}$ is trivial, i.e., µ(A) = 0 or 1 for any A ∈ F_∞.
The following statements and results are quite standard in ergodic theory. We postpone the proofs of Lemmas 8 and 9 to the Appendix. Let µ ∈ M be a probability measure. We use $E(\phi|F_n)$ to denote the conditional expectation of φ given F_n. If µ is a G-measure for $\mathcal P$, then by Lemma 7 we have, for any χ ∈ Δ_n,
$$\langle\mu, \chi P_n\phi\rangle = \langle\mu, P_n(\phi\chi)\rangle = \langle P_n^*\mu, \phi\chi\rangle = \langle\mu, \phi\chi\rangle.$$
This means $P_n\phi = E(\phi|F_n)$, µ-a.e. Therefore, by the decreasing martingale theorem (see [Pa, pp. 21–30]), for any φ ∈ C, $P_n\phi \to E(\phi|F_\infty)$ a.e. and in $L^1(X, F_X, \mu)$. Furthermore,
$$\mu \text{ is } \mathcal P\text{-ergodic} \iff \lim_{n\to\infty} P_n\phi = \langle\mu, \phi\rangle \quad \mu\text{-a.e.}\ (\forall\phi \in C).$$
Lemma 8. Suppose $\mu_1$, $\mu_2$ and µ are G-measures.
1. If $\mu_1$ and $\mu_2$ are $\mathcal P$-ergodic, then either $\mu_1 = \mu_2$ or $\mu_1 \perp \mu_2$.
2. µ is $\mathcal P$-ergodic iff µ is an extremal point of the set of all G-measures.
The set G of all G-measures is a compact, convex and metrizable subset of M. Let EG be the set of $\mathcal P$-ergodic µ in G. Then Lemma 8 says that EG consists of all extremal points of G. By the Choquet representation theorem (see [OR, pp. 1–32]), for each µ ∈ G there exists a Borel probability measure m on G, supported on the set EG of extremal points, such that
$$\mu = \int_G \nu\, dm(\nu).$$
So, the uniqueness of G-measures reduces to the uniqueness of ergodic G-measures. The uniqueness of (ergodic) G-measures also ensures the convergence of $\tilde L^n$. Indeed, by the relation between $\tilde L^n$ and $P_n$ and the surjectivity of $f^n: X \to X$, we have
$$\|P_n\phi - c\| = \|\tilde L^n\phi - c\| \qquad (\forall c \text{ constant}).$$
It follows that for any φ in C, $P_n\phi$ converges uniformly to a constant if and only if $\tilde L^n\phi$ converges uniformly to the same constant. It is easy to see that the constant must be ⟨µ, φ⟩ for any G-measure µ (a consequence of $P_n^*\mu = \mu$).
Lemma 9. The following are equivalent.
1. There is a unique G-measure µ for $\mathcal P$.
2. For every φ ∈ C, $P_n\phi$ converges uniformly to a constant.
3. For every φ ∈ C, $P_n\phi$ converges pointwise to a constant.
Now we are ready to prove the rest of the Theorem. We first prove the Gibbs property for all G-measures by using the naive distortion lemma and Proposition 3.
Proof of Theorem 1 (3). We prove the Gibbs property for any G-measure µ. Recall that
$$\tilde G_n(x) = G_n(x)\,\frac{h(x)}{\rho^n h(f^n(x))}.$$
Lemma 2 and Lemma 1 imply that there is a constant $C_0 = C_0(r) > 0$, with $C_0(r) \to 1$ as $r \to 0^+$, such that
$$C_0^{-1} \le \frac{\tilde G_n(z)}{\tilde G_n(x)} \le C_0$$
for all z, x ∈ X and n ≥ 1 with $d_n(x, z) \le 2r$. For any x ∈ X and r such that 0 < 2r ≤ a, take a φ ∈ C such that $1_{B_n(x,r)} \le \phi \le 1_{B_n(x,2r)}$, where $1_B$ denotes the characteristic function of a set B. Then we have
$$\mu(B_n(x, r)) \le \int\phi\, dP_n^*\mu = \int P_n\phi\, d\mu,$$
where
$$P_n\phi(y) = \sum_{z \in f^{-n}(f^n(y))} \tilde G_n(z)\phi(z) \le \sum_{z \in f^{-n}(f^n(y))} \tilde G_n(z)\,1_{B_n(x,2r)}(z).$$
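Proposition 1 implies that $\#\big(f^{-n}(f^n(y)) \cap B_n(x, 2r)\big) \le 1$. Thus we get
$$\mu(B_n(x, r)) \le C_0\,\tilde G_n(x).$$
On the other hand, we have
$$\mu(B_n(x, 2r)) \ge \int\phi\, dP_{n+p}^*\mu = \int P_{n+p}\phi\, d\mu,$$
where p is the integer in Proposition 3 and
$$P_{n+p}\phi(y) = \sum_{z \in f^{-n-p}(f^{n+p}(y))} \tilde G_{n+p}(z)\phi(z) \ge \sum_{z \in f^{-n-p}(f^{n+p}(y))} \tilde G_{n+p}(z)\,1_{B_n(x,r)}(z).$$
By Proposition 3, the last sum contains at least one nonzero term, and each such term satisfies $\tilde G_{n+p}(z) = \tilde G_n(z)\,\tilde G_p(f^n(z)) \ge C_0^{-1}\big(\min_x\tilde\psi(x)\big)^p\,\tilde G_n(x)$. Therefore,
$$\mu(B_n(x, 2r)) \ge C_0^{-1}\big(\min_x\tilde\psi(x)\big)^p\,\tilde G_n(x).$$
Let k be the least integer such that $\lambda^k \ge 2$. Then we have $B_n(x, r) \supset B_{n+k}(x, \lambda^k r) \supset B_{n+k}(x, 2r)$. Applying the last inequality with n + k in place of n, and using $\tilde G_{n+k}(x) \ge \big(\min_x\tilde\psi(x)\big)^k\tilde G_n(x)$, we get
$$\mu(B_n(x, r)) \ge C_0^{-1}\big(\min_x\tilde\psi(x)\big)^{p+k}\,\tilde G_n(x).$$
Taking $C = C_0\big(\min_x\tilde\psi(x)\big)^{-(p+k)}$, we have for any G-measure µ
$$C^{-1} \le \frac{\mu(B_n(x, r))}{\tilde G_n(x)} \le C$$
for any x ∈ X, n ≥ 0. □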
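In the same hypothetical two-symbol example the Gibbs property can be observed explicitly. We assume the usual metric $d(x, y) = 2^{-k}$, where k is the first index at which x and y differ, so that the Bowen ball $B_n(x, 1/2)$ is the cylinder $[x_0 \cdots x_{n+1}]$. For this locally constant ψ̃, a direct computation (ours, not the paper's) shows that $\mu([w_0 \cdots w_k]) = \tilde G_k(w)\,\pi(w_k)$, with π the positive probability vector solving $\sum_b \tilde\psi(a, b)\pi(b) = \pi(a)$, is a consistent ψ̃-measure. Hence $\mu(B_n(x, 1/2))/\tilde G_n(x) = \tilde\psi(x_n, x_{n+1})\,\pi(x_{n+1})$, which stays between two constants independent of n.

```python
import itertools
import math

# Hypothetical potential on the full 2-shift: psi(y) = W[y0][y1].
W = [[1.0, 2.0],
     [1.5, 1.0]]

def leading_eigenpair(n_iter=200):
    """rho and h (max h = 1) for (L phi)[i] = sum_a W[a][i] phi[a], by power iteration."""
    phi, rho = [1.0, 1.0], 1.0
    for _ in range(n_iter):
        nxt = [sum(W[a][i] * phi[a] for a in range(2)) for i in range(2)]
        rho = max(nxt)
        phi = [v / rho for v in nxt]
    return rho, phi

RHO, H = leading_eigenpair()

def psi_tilde(a, b):
    """Normalized potential psi~ at a point y = (a, b, ...)."""
    return H[a] * W[a][b] / (RHO * H[b])

def G_tilde(word):
    """G~_n(y) for the orbit segment word = (y_0, ..., y_n); equals 1 for a single symbol."""
    return math.prod(psi_tilde(word[j], word[j + 1]) for j in range(len(word) - 1))

def stationary(n_iter=500):
    """pi > 0 with sum_b psi~(a, b) pi(b) = pi(a); the total mass 1 is preserved
    because sum_a psi~(a, b) = 1 for every b."""
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        pi = [sum(psi_tilde(a, b) * pi[b] for b in range(2)) for a in range(2)]
    return pi

PI = stationary()

def mu_cylinder(word):
    """mu([w_0 ... w_k]) = G~_k(w) * pi(w_k) for the psi~-measure (our computation)."""
    return G_tilde(word) * PI[word[-1]]

def gibbs_ratio(word):
    """mu(B_n(x, 1/2)) / G~_n(x), where word = (x_0, ..., x_{n+1}) and n = len(word) - 2."""
    return mu_cylinder(word) / G_tilde(word[:-1])

if __name__ == "__main__":
    for n in (1, 5, 10):
        ratios = [gibbs_ratio(w) for w in itertools.product(range(2), repeat=n + 2)]
        print(n, min(ratios), max(ratios))
```

The printed minimum and maximum do not move with n, which is the content of the Gibbs property in this toy case.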
Proof of Theorem 1 (2) and (4). As we have seen above, the only thing left to prove is the uniqueness of ergodic G-measures. Let µ be an (ergodic) G-measure. The Gibbs property that we have proved states that, for a fixed number r with 0 < 2r ≤ a, there is a constant C > 0 such that
$$C^{-1}\tilde G_n(x) \le \mu(B_n(x, r)), \qquad \mu(B_n(x, 2r)) \le C\,\tilde G_n(x)$$
for any x ∈ X, n ≥ 1. By Lemma 8, we have to show that any two $\mathcal P$-ergodic G-measures $\mu_1$ and $\mu_2$ are mutually absolutely continuous. We will see that the Gibbs property implies a little more: there is a constant C > 0 such that $C^{-1}\mu_1(U) \le \mu_2(U) \le C\mu_1(U)$ for any open set U of X. Let $\{x_1, \dots, x_m\}$ be a 2r-net in (X, d); this means that the balls $\{B(x_i, r)\}_{1 \le i \le m}$ are disjoint and the balls $\{B(x_i, 2r)\}_{1 \le i \le m}$ form a cover of X. Define
$$A_1 = B(x_1, 2r)\setminus\big(B(x_2, r)\cup\dots\cup B(x_m, r)\big), \qquad A_i = B(x_i, 2r)\setminus(A_1\cup\dots\cup A_{i-1}), \quad 2 \le i \le m.$$
Then we get a partition $Q_0 = \{A_i\}_{i=1}^{m}$ of X satisfying
$$B(x_i, r) \subseteq A_i \subseteq B(x_i, 2r), \qquad 1 \le i \le m.$$
For every n ≥ 1 and every 1 ≤ i ≤ m, denote $f^{-n}(x_i) = \{z_j\}_{j=1}^{k_{ni}}$. Let $g_{jn}$ be the inverse of $f^n: B_n(z_j, 2r) \to B(x_i, 2r)$. Define $A_{nij} = g_{jn}(A_i)$. We call $A_{nij}$ an n-component of $f^{-n}|Q_0$. Let $Q_n$ be the set of all n-components of $f^{-n}|Q_0$. It is again a partition of X and satisfies, for any A ∈ Q_n,
$$B_n(c_A, r) \subseteq A \subseteq B_n(c_A, 2r),$$
where $c_A \in A$ is such that $f^n(c_A) = x_j$ for some j. The point $c_A$ is called the center of A. It is worth noting that for n > k ≥ 1 and for any A ∈ Q_n, $f^{n-k}(A) \in Q_k$. However, $Q_k$ may not be a refinement of $Q_n$. (So they are not Markov partitions.) Let U be an arbitrary open set in X. For n ≥ 1, let $Q_n(U)$ be the family of all elements A of the partition $Q_n$ such that the n-Bowen ball $B_n(c_A, r)$ is entirely contained in U. Let
$$V_n = \bigcup_{A \in Q_n(U)} A.$$
This is a Borel subset of U which is a countable union of disjoint sets. From the Gibbs property, we get
$$\begin{aligned}
\mu_1(V_n) &= \sum_{A \in Q_n(U)} \mu_1(A) \le \sum_{A \in Q_n(U)} \mu_1(B_n(c_A, 2r)) \le C\sum_{A \in Q_n(U)} \tilde G_n(c_A)\\
&\le C^2\sum_{A \in Q_n(U)} \mu_2(B_n(c_A, r)) \le C^2\sum_{A \in Q_n(U)} \mu_2(A) = C^2\mu_2\Big(\bigcup_{A \in Q_n(U)} A\Big) = C^2\mu_2(V_n).
\end{aligned}$$
Then we have $\mu_1(U) \le C^2\mu_2(U)$ by using Fatou's lemma and the fact that $U = \liminf_{n\to\infty} V_n$. Similarly, $\mu_2(U) \le C^2\mu_1(U)$. Therefore, the G-measure is unique.
Let µ be the unique G-measure. By Lemma 9, $P_n\phi \to \langle\mu, \phi\rangle$ as n → ∞ for any φ ∈ C. Therefore, $\tilde L^n\phi \to \langle\mu, \phi\rangle$. From the relation between L and L̃, $\rho^{-n}L^n\phi \to \langle\nu, \phi\rangle h$, where ν = µ/h and $\int 1/h\, d\mu = 1$. □
Proof of Theorem 1 (5). Let $E_\rho = \{\phi \in C : L\phi = \rho\phi\}$ be the eigenspace of L: C → C. Suppose φ is any function in $E_\rho$. Let
$$a = \min_{x \in X}\frac{\phi(x)}{h(x)}$$
and $\phi_1 = \phi - ah$. Then $\phi_1$ is in $E_\rho$ and $\phi_1 \ge 0$. Moreover, there is a point y in X such that $\phi_1(y) = 0$. Since $0 = \rho\phi_1(y) = \sum_{x \in f^{-1}(y)} \psi(x)\phi_1(x)$ with ψ > 0, we get $\phi_1(x) = 0$ for all x in $f^{-1}(y)$. Inductively, we have $\phi_1 = 0$ on $X_y = \bigcup_{n=0}^{\infty} f^{-n}(y)$. Since f is mixing, $X_y$ is a dense subset of X. So $\phi_1 = 0$ on X, that is, φ = ah. So $E_\rho$ is one-dimensional, that is, ρ is a simple eigenvalue. □
4. Application: Smooth Invariant Measures
Suppose X = M is a d-dimensional connected compact C² Riemannian manifold. Let f be an expanding C¹ dynamical system on M. Then it is automatically mixing (see [KS]). We say that f is $C^{1+\omega}$, where ω is a modulus of continuity, if the determinant Jf(x) of the Jacobian matrix of f is in $H^\omega$. We also call Jf the Jacobian of f. The Riemann–Lebesgue measure dx on M is not necessarily f-invariant. We would like to know whether there is an f-invariant measure on M equivalent to the Lebesgue measure. A measure µ on M is said to be smooth if the Radon–Nikodym derivative h = dµ/dx (called the density of µ) is continuous on M. A smooth measure with positive density is equivalent to the Lebesgue measure. Suppose f is a C¹ expanding dynamical system on M and Jf(x) ≠ 0 for all x ∈ M. A smooth measure µ with density h is f-invariant if and only if
$$\sum_{y \in f^{-1}(x)} \frac{h(y)}{Jf(y)} = h(x).$$
That means 1 is an eigenvalue, with a positive eigenfunction, of the Ruelle operator defined by the left-hand side. The following existence result, a more general version of the Krzyzewski–Szlenk Theorem (see [KS]), is a corollary of Theorem 1.
Theorem 2. Suppose M is a d-dimensional connected compact Riemannian manifold and suppose f is an expanding $C^{1+\omega}$ dynamical system on M, where ω satisfies the Dini condition. Then f has a unique smooth f-invariant probability measure with a strictly positive density belonging to $H^{\tilde\omega}$.
Proof. Consider the positive transfer operator L for the dynamical system f and the potential function ψ = 1/Jf. By Theorem 1, there is a strictly positive function $h \in H^{\tilde\omega}$ such that Lh = ρh. Observe that $\int Lh(x)\,dx = \int h(x)\,dx$, where dx is the Lebesgue measure on M (see, for example, [Ji]). It follows that ρ = 1. The uniqueness follows from the simplicity of the eigenspace corresponding to the maximal eigenvalue ρ. □
Appendix
Proof of Proposition 1. (1) First, it is clear that $f|_{\overline B(x, b')}$ is injective. Since f is continuous on the closed ball $\overline B(x, b')$ and since the closed ball is compact, the inverse of $f|_{\overline B(x, b')}$ is also continuous. But $f: B(x, b') \to f(B(x, b'))$ is bijective, so it is a homeomorphism.
(2) If $\#(f^{-1}(y))$ were not finite, f would not be a local homeomorphism at the limit points of $f^{-1}(y)$, which would contradict (1). Let
$$d(y) = \inf_{k \ne j} d(x_k, x_j) > 0$$
be the shortest distance between the preimages of y. By (1), if 0 < r ≤ min(b, d(y)/2), f : B(xj , r) → f (B(xj , r)) is a homeomorphism for each j . Since the open set
A. Fan, Y. Jiang
$\bigcap_{j=1}^{n} f(B(x_j, r))$ contains $y$, it contains a sufficiently small ball $B(y, r_y)$ with $r_y > 0$ such that the inverse $g_j$ mapping $y$ to $x_j$ satisfies $g_j: B(y, r_y) \to g_j(B(y, r_y)) \subset B(x_j, r)$. Since the balls $B(x_j, r)$ are disjoint, so are the sets $g_j(B(y, r_y))$. Now take finitely many balls $B(y_i, r_{y_i})$ such that the balls $B(y_i, r_{y_i}/2)$ cover $X$. We claim that
$$a = \frac{1}{2}\min_i r_{y_i}$$
satisfies (2). In fact, for any $y \in X$, we have $y \in B(y_i, r_{y_i}/2)$ for some $i$. Then $B(y, a) \subset B(y_i, r_{y_i})$. So the components of $f^{-1}(B(y, a))$ are contained, respectively, in the components of $f^{-1}(B(y_i, r_{y_i}))$, which are pairwise disjoint.

(3) This is a consequence of (2) and the expanding condition on $f$.

(4) By (3), $\#(f^{-1}(x))$ is a locally constant function of $x$. It is therefore bounded, by the compactness of $X$.

(5) Let $g_1, \dots, g_n$ be the local inverses of $f$ such that
$$x \xleftarrow{\;g_1\;} f(x) \xleftarrow{\;g_2\;} f^2(x) \xleftarrow{\;g_3\;} \cdots \xleftarrow{\;g_{n-1}\;} f^{n-1}(x) \xleftarrow{\;g_n\;} f^n(x).$$
We claim that for any $1 \le k \le n$,
$$g_{n-k+1} \circ \cdots \circ g_n\big(B(f^n(x), r)\big) = B_k(f^{n-k}(x), r).$$
We prove this by induction on $k$. When $k = 1$, we have to show that $g_n(B(f^n(x), r)) = B_1(f^{n-1}(x), r)$. Since $f$ is locally expanding, $g_n$ is contracting, i.e.
$$g_n\big(B(f^n(x), r)\big) \subseteq B\Big(f^{n-1}(x), \frac{r}{\lambda}\Big) \subseteq B(f^{n-1}(x), r).$$
On the other hand, $g_n$ being a homeomorphism with inverse $f$, we have $y \in g_n(B(f^n(x), r))$ if and only if $f(y) \in B(f^n(x), r)$. These two facts give the case $k = 1$. Suppose the claim holds for $k$. Then
$$g_{n-k} \circ \cdots \circ g_n\big(B(f^n(x), r)\big) = g_{n-k}\big(B_k(f^{n-k}(x), r)\big).$$
So we have to show that $g_{n-k}(B_k(f^{n-k}(x), r)) = B_{k+1}(f^{n-k-1}(x), r)$. In fact, $z \in g_{n-k}(B_k(f^{n-k}(x), r))$ if and only if $d(z, f^{n-k-1}(x)) < r$ and $f(z) \in B_k(f^{n-k}(x), r)$. (We used the fact that $B_k(y, r)$ decreases as $k$ increases and that $f$ is a local homeomorphism mapping $f^{n-k-1}(x)$ to $f^{n-k}(x)$ with inverse $g_{n-k}$.) Equivalently,
$$d(z, f^{n-k-1}(x)) < r, \quad d(f(z), f^{n-k}(x)) < r, \quad \dots, \quad d(f^{k+1}(z), f^n(x)) < r,$$
that is to say, $z \in B_{k+1}(f^{n-k-1}(x), r)$. This completes the induction. In particular, we have proved that $g_1 \circ g_2 \circ \cdots \circ g_n(B(f^n(x), r)) = B_n(x, r)$. Applying $f^n$ to both sides, we get $B(f^n(x), r) = f^n(B_n(x, r))$. Thus Proposition 1 is proved. We have actually considered a chain of homeomorphisms in Proposition 1:
$$B_n(x, r) \xleftarrow{\;g_1\;} B_{n-1}(f(x), r) \xleftarrow{\;g_2\;} \cdots \xleftarrow{\;g_{n-1}\;} B_1(f^{n-1}(x), r) \xleftarrow{\;g_n\;} B(f^n(x), r).$$
On Ruelle–Perron–Frobenius Operators. I. Ruelle Theorem
Proof of Proposition 2. By the compactness of $X$, there is a finite cover of $X$ by balls $B(x_i, r/2)$. For each $i$, there is an integer $p(x_i) > 0$ such that
$$f^{p(x_i)}\big(B(x_i, r/2)\big) = X.$$
Let
$$p = \max_i \{p(x_i)\}.$$
This $p$ is a good choice. In fact, for any $x \in X$, we have $x \in B(x_i, r/2)$ for some $i$. So $B(x, r) \supset B(x_i, r/2)$, and then
$$f^p(B(x, r)) \supseteq f^{p-p(x_i)}\Big(f^{p(x_i)}\big(B(x_i, r/2)\big)\Big) = f^{p-p(x_i)}(X) = X.$$

Proof of Proposition 3. According to Proposition 1, $f^n: B_n(x, r) \to B(f^n(x), r)$ is a homeomorphism. From Proposition 2,
$$f^{n+p}(B_n(x, r)) = f^p\big(B(f^n(x), r)\big) = X.$$
So $f^{-(n+p)}(y) \cap B_n(x, r) \neq \emptyset$ for any $x, y \in X$. On the other hand, $\#(f^{-p}(y)) \le n_0^p$ by Proposition 1 (4). By Proposition 1 (5), every $z \in f^{-p}(y) \cap B(f^n(x), r)$, if any, has exactly one preimage in $B_n(x, r)$ under $f^n$. It follows that
$$\#\big(f^{-(n+p)}(y) \cap B_n(x, r)\big) \le n_0^p.$$
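Proof of Lemma 8. (1) Suppose $\mu_1 \neq \mu_2$. There exists $\varphi \in C$ such that $\langle \mu_1, \varphi\rangle \neq \langle \mu_2, \varphi\rangle$. Define
$$A_1 = \Big\{x \in X;\ \lim_{n\to\infty} P_n\varphi(x) = \langle \mu_1, \varphi\rangle\Big\} \quad\text{and}\quad A_2 = \Big\{x \in X;\ \lim_{n\to\infty} P_n\varphi(x) = \langle \mu_2, \varphi\rangle\Big\}.$$
We have $\mu_1(A_1) = 1$, $\mu_2(A_1) = 0$, $\mu_1(A_2) = 0$ and $\mu_2(A_2) = 1$. This implies that $\mu_1 \perp \mu_2$.

(2) Suppose $\mu$ is $P$-ergodic and $\mu = t\nu_1 + (1-t)\nu_2$, where $\nu_1$ and $\nu_2$ are $G$-measures and $0 < t < 1$. For any $A \in \mathcal{F}_\infty$, $\mu(A) = 0$ or $1$ because of the ergodicity of $\mu$. Then $\nu_1(A) = \nu_2(A) = 0$ or $1$, because $0 < t < 1$. This means that $\nu_1$ and $\nu_2$ are also $P$-ergodic. We claim that $\nu_1 = \nu_2$. Otherwise, according to (1), we can find $A \in \mathcal{F}_\infty$ such that $\nu_1(A) = 1$ and $\nu_2(A) = 0$. Consequently $\mu(A) = t$, which contradicts the ergodicity of $\mu$. Therefore $\nu_1 = \nu_2$, and $\mu$ is extremal in the set of all $G$-measures.

Conversely, suppose $\mu$ is a $G$-measure but is not $P$-ergodic. Let $A \in \mathcal{F}_\infty$ be such that $0 < t = \mu(A) < 1$. Define
$$\nu_1 = \frac{1}{t}\,1_A\,\mu, \qquad \nu_2 = \frac{1}{1-t}\,1_{X\setminus A}\,\mu.$$
We now show that $\nu_1$ and $\nu_2$ are $G$-measures. For any $n \ge 1$,
$$\langle P_n^*\nu_1, \varphi\rangle = \frac{1}{t}\langle \mu, 1_A P_n\varphi\rangle, \qquad \varphi \in C.$$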
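However, since $A \in \mathcal{F}_\infty \subseteq \mathcal{F}_n$,
$$\langle \mu, 1_A P_n\varphi\rangle = \sup_{\varphi'}\langle \mu, \varphi' P_n\varphi\rangle = \sup_{\varphi'}\langle P_n^*\mu, \varphi'\varphi\rangle = \sup_{\varphi'}\langle \mu, \varphi'\varphi\rangle = \langle 1_A\mu, \varphi\rangle,$$
where the supremum is taken over the $\mathcal{F}_n$-measurable functions $\varphi' \in C$ with $\varphi' \le 1_A$. This implies that $P_n^*\nu_1 = \nu_1$ for all $n \ge 1$, that is, $\nu_1$ is a $G$-measure. Similarly, $\nu_2$ is a $G$-measure. But $\mu = t\nu_1 + (1-t)\nu_2$ with $\nu_1 \neq \nu_2$ (since $\nu_1(A) = 1$ while $\nu_2(A) = 0$), which implies that $\mu$ is not an extremal point.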
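Proof of Lemma 9. For any $G$-measure $\mu$, the constant in (3) is $\langle \mu, \varphi\rangle$ (by the Lebesgue dominated convergence theorem). So (3) implies (1). It is clear that (2) implies (3). It suffices to show that (1) implies (2). Suppose it does not. Then there are a continuous function $\varphi \in C$, a number $\varepsilon > 0$, a sequence of integers $\{n_i\}$ and a sequence of points $\{x_i\}$ in $X$ such that
$$|P_{n_i}\varphi(x_i) - \langle \mu, \varphi\rangle| \ge \varepsilon \quad \text{for all } i.$$
Observe that $P_{n_i}\varphi(x_i) = \langle P_{n_i}^*\delta_{x_i}, \varphi\rangle$, where $\delta_{x_i}$ is the Dirac measure concentrated at $x_i$. Passing to a subsequence, let $\mu'$ be a weak limit of $P_{n_i}^*\delta_{x_i}$. Then for any $\phi \in C$ and $n \ge 1$, using $P_{n_i}P_n = P_{n_i}$ once $n_i \ge n$,
$$\langle \mu', P_n\phi\rangle = \lim_{i\to\infty}\langle P_{n_i}^*\delta_{x_i}, P_n\phi\rangle = \lim_{i\to\infty}\langle \delta_{x_i}, P_{n_i}P_n\phi\rangle = \lim_{i\to\infty}\langle \delta_{x_i}, P_{n_i}\phi\rangle = \lim_{i\to\infty}\langle P_{n_i}^*\delta_{x_i}, \phi\rangle = \langle \mu', \phi\rangle.$$
So $\mu'$ is also a $G$-measure, but $|\langle \mu, \varphi\rangle - \langle \mu', \varphi\rangle| \ge \varepsilon$.
This is a contradiction.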
Acknowledgement. This work was started when the second author visited the Faculté de Mathématiques et Informatique at Université de Picardie Jules Verne in Amiens, France, and it was completed when the first author visited the Institute of Mathematical Science (IMS) at the Chinese University of Hong Kong and when the second author visited the Institute for Mathematical Research (FIM) at ETH-Zürich in Switzerland. The authors would like to thank these institutes for their hospitality and support. The authors would also like to thank Professors A.-S. Sznitman, K. S. Lau and O. Lanford III for their interest and helpful discussions. Thanks also go to A. Rivière and Y. L. Ye for valuable comments.
References
[Bi] Birkhoff, G.: Extensions of Jentzsch's theorem. Trans. Amer. Math. Soc. 85, 219–227 (1957)
[Bo] Bowen, R.: Equilibrium states and the ergodic theory of Anosov diffeomorphisms. LNM 470. Berlin: Springer, 1975
[DF] Dooley, A.H. and Fan, A.H.: Chains of Markovian projections and (G, 1)-measures. In: Trends in Probability and Related Analysis, eds. N. Kono and N.R. Shieh, Singapore: World Scientific, 1997, pp. 101–116
[Fa] Fan, A.H.: A proof of the Ruelle theorem. Reviews Math. Phys. 7, no. 8, 1241–1247 (1995)
[FJ] Fan, A.H. and Jiang, Y.P.: On Ruelle–Perron–Frobenius Operators. II. Convergence speeds. Commun. Math. Phys. 223, 143–159 (2001)
[FS] Ferrero, P. and Schmitt, B.: Ruelle's Perron–Frobenius theorem and projective metrics. Coll. Math. Soc. János Bolyai, 27, 333–336 (1979)
[Ji] Jiang, Y.P.: A proof of existence and simplicity of a maximal eigenvalue for Ruelle–Perron–Frobenius operators. Lett. Math. Phys. 48, 211–219 (1999)
[KS] Krzyzewski, K. and Szlenk, W.: On invariant measures for expanding differentiable mappings. Studia Math. XXXIII, 83–92 (1969)
[OR] Odell, E. and Rosenthal, H.: Functional Analysis. Lecture Notes in Mathematics, 1332. Berlin–Heidelberg–New York: Springer-Verlag, 1988
[Pa] Parry, W.: Topics in ergodic theory. Cambridge–London–New York: Cambridge University Press, 1981
[R1] Ruelle, D.: Statistical mechanics of a one-dimensional lattice gas. Commun. Math. Phys. 9, 267–278 (1968)
[R2] Ruelle, D.: A measure associated with Axiom A attractors. Am. J. Math. 98, 619–654 (1976)
[Wa] Walters, P.: Invariant measures and equilibrium states for some mappings which expand distances. Trans. Am. Math. Soc. 236, 121–153 (1978)
Communicated by Ya. G. Sinai
Commun. Math. Phys. 223, 143 – 159 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
On Ruelle–Perron–Frobenius Operators. II. Convergence Speeds
Aihua Fan¹, Yunping Jiang²
¹ Département de Mathématiques et Informatique, Université de Picardie Jules Verne, 33, Rue Saint Leu, 80039 Amiens Cedex 1, France
² Department of Mathematics, Queens College of CUNY, Flushing, NY 11367, USA, and Department of Mathematics, CUNY Graduate School, 365 Fifth Avenue, New York, NY 10016, USA
Received: 31 May 2000 / Accepted: 1 June 2001
Abstract: We study Ruelle operators on expanding and mixing dynamical systems with a potential function satisfying the Dini condition. We give an estimate for the convergence speed of the iterates of a Ruelle operator. Our proof avoids Markov partitions. This is the second part of our research on Ruelle operators.

1. Introduction

Let $X$ be a compact metric space with metric $d$ and let $f: X \to X$ be a continuous map. The couple $(X, f)$ is called a dynamical system. Let $\psi: X \to \mathbb{R}_+^*$ be a strictly positive continuous function, called a potential. The Ruelle–Perron–Frobenius operator $\mathcal{L} = \mathcal{L}_{f,\psi}$, simply called the Ruelle operator, is defined by
$$\mathcal{L}\varphi(x) = \sum_{y \in f^{-1}(x)} \psi(y)\varphi(y)$$
for $\varphi$ in a suitable space of functions on $X$. The Ruelle operator is an important tool in the study of dynamical systems. The famous Ruelle theorem describes the spectral properties of $\mathcal{L}$ and implies the convergence of the normalized powers of $\mathcal{L}$. For an expanding and mixing dynamical system with a Dini potential, the Ruelle theorem is proved in the first part of our study [FJ]. (See also [Ru1, Ru2, Bo, Wa] for different proofs in different settings.) In this paper, we present our results on the convergence speed of the powers of $\mathcal{L}$. Our result is new in the general setting considered here. Our method may also work in the setting considered in [Wa], where no convergence speed was studied. Recall that a dynamical system $f$ on $X$ is said to be locally expanding if there are constants $\lambda > 1$ and $b > 0$ such that
$$d(f(x), f(y)) \ge \lambda\, d(x, y), \qquad x, y \in X,\ d(x, y) \le b.$$
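For a concrete feel for $\mathcal{L}$, here is a minimal Python sketch under our own assumptions (the example is not taken from the paper): $X = \mathbb{R}/\mathbb{Z}$ with the doubling map $f(x) = 2x \bmod 1$, which is locally expanding with $\lambda = 2$ (any $b < 1/4$ works) and mixing. Every $x$ has exactly the two preimages $x/2$ and $(x+1)/2$, so $\mathcal{L}\varphi$ is a two-term sum. The helper names `doubling_preimages`, `ruelle` and `ruelle_iterate` are hypothetical. The last one evaluates the iterate through the explicit formula $\mathcal{L}^n\varphi(x) = \sum_{y \in f^{-n}(x)} G_n(y)\varphi(y)$ with $G_n(y) = \prod_{j<n}\psi(f^j y)$, which should agree with applying `ruelle` $n$ times.

```python
import math
from typing import Callable, Tuple

RealFunc = Callable[[float], float]


def doubling_preimages(x: float) -> Tuple[float, float]:
    """Both preimages of x under the doubling map f(x) = 2x mod 1 on [0, 1)."""
    x = x % 1.0
    return (x / 2.0, (x + 1.0) / 2.0)


def ruelle(psi: RealFunc, phi: RealFunc) -> RealFunc:
    """Return the function L phi, where (L phi)(x) = sum_{y in f^{-1}(x)} psi(y) phi(y)."""
    def l_phi(x: float) -> float:
        return sum(psi(y) * phi(y) for y in doubling_preimages(x))
    return l_phi


def ruelle_iterate(psi: RealFunc, phi: RealFunc, n: int, x: float) -> float:
    """(L^n phi)(x) computed directly as sum_{y in f^{-n}(x)} G_n(y) phi(y).

    The 2^n preimages of x under f^n are (x + k) / 2^n, k = 0, ..., 2^n - 1, and
    G_n(y) = psi(y) psi(f y) ... psi(f^{n-1} y) is the multiplicative cocycle.
    """
    x = x % 1.0
    total = 0.0
    for k in range(2 ** n):
        y = (x + k) / 2 ** n
        g = 1.0
        for j in range(n):
            g *= psi((2 ** j * y) % 1.0)
        total += g * phi(y)
    return total


if __name__ == "__main__":
    psi = lambda x: math.exp(0.3 * math.cos(2 * math.pi * x))  # a smooth positive potential
    phi = lambda x: 2.0 + math.sin(2 * math.pi * x)
    l3 = ruelle(psi, ruelle(psi, ruelle(psi, phi)))              # L^3 phi by composition
    print(l3(0.21), ruelle_iterate(psi, phi, 3, 0.21))            # the two values agree
```

With $\psi \equiv 1$ one gets $\mathcal{L}1 = 2$, the number of preimages; with $\psi \equiv 1/2$ the operator preserves Lebesgue integrals.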
Supported in part by NSF grants and PSC-CUNY awards.
We call $(\lambda, b)$ a primary expanding parameter. The system is said to be mixing if for any nonempty open set $U \subseteq X$ there is an integer $n > 0$ such that $f^n(U) = X$. For any $n \ge 0$, we define a new metric $d_n$ on $X$, called the $n$-Bowen metric, by
$$d_n(x, y) = \max_{0 \le j \le n} d(f^j(x), f^j(y)).$$
The $n$-Bowen ball centered at $x \in X$ of radius $r > 0$ is denoted by $B_n(x, r)$. The $0$-Bowen metric is just the original metric $d$ on $X$. Let $C = C(X, \mathbb{R})$ be the space of all continuous functions $\varphi: X \to \mathbb{R}$ with the supremum norm
$$\|\varphi\|_\infty = \max_{x \in X} |\varphi(x)|.$$
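Continuing with the doubling map as an illustration (our assumption, not the paper's example), the sketch below computes $d_n$ directly from the definition. For $f(x) = 2x \bmod 1$ with the circle distance and $r \le 1/4$, one checks by hand that $B_n(x, r)$ is the arc $B(x, r2^{-n})$ and that $f^n$ stretches it onto $B(f^n(x), r)$, which is what Proposition 1 (3) below asserts in general. The checks probe points just inside and just outside that arc. All function names are hypothetical.

```python
def circle_dist(x: float, y: float) -> float:
    """Distance on the circle R/Z."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)


def doubling(x: float, j: int = 1) -> float:
    """f^j(x) for the doubling map f(x) = 2x mod 1."""
    return (2 ** j * x) % 1.0


def bowen_dist(x: float, y: float, n: int) -> float:
    """n-Bowen metric d_n(x, y) = max_{0 <= j <= n} d(f^j x, f^j y)."""
    return max(circle_dist(doubling(x, j), doubling(y, j)) for j in range(n + 1))


def in_bowen_ball(y: float, x: float, n: int, r: float) -> bool:
    """Membership test for the open Bowen ball B_n(x, r)."""
    return bowen_dist(x, y, n) < r


if __name__ == "__main__":
    x, n, r = 0.3, 5, 0.2
    radius = r / 2 ** n                                   # predicted radius of B_n(x, r)
    print(in_bowen_ball(x + 0.99 * radius, x, n, r))     # True
    print(in_bowen_ball(x + 1.01 * radius, x, n, r))     # False
```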
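For a right-continuous increasing function $\omega: \mathbb{R}^+ \to \mathbb{R}^+$ with $\omega(0) = 0$ (called a modulus of continuity), we define $H^\omega = H^\omega(X, \mathbb{R})$ to be the space of all functions $\varphi \in C$ satisfying
$$[\varphi]_\omega = \sup_{x, y \in X,\ 0 < d(x, y) \le a} \frac{|\varphi(x) - \varphi(y)|}{\omega(d(x, y))} < \infty.$$
(Here $a > 0$ will be chosen and fixed later.) A modulus of continuity $\omega(t)$ is said to satisfy the Dini condition if
$$\int_0^1 \frac{\omega(t)}{t}\,dt < \infty.$$
For such a Dini function $\omega$, define
$$\tilde\omega(t) = \sum_{n=1}^{\infty} \omega(\lambda^{-n}t).$$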
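As a numerical sanity check of the definition of $\tilde\omega$ (a sketch with hypothetical helper names), the partial sums below are compared with two elementary facts. For a Hölder modulus $\omega(t) = t^\alpha$ the series is geometric, $\tilde\omega(t) = t^\alpha/(\lambda^\alpha - 1)$. For $\omega(t) = |\log t|^{-\beta}$ with $\beta > 1$ (the kind of Dini modulus used in Sect. 4), comparing the decreasing terms with an integral gives $\tilde\omega(t) \le |\log t|^{1-\beta}/((\beta - 1)\log\lambda)$, so passing from $\omega$ to $\tilde\omega$ costs exactly one power of the logarithm. Only finitely many terms are summed, so the code computes a lower bound for the series, which suffices for the second check.

```python
import math
from typing import Callable


def omega_tilde(omega: Callable[[float], float], t: float, lam: float, terms: int = 200) -> float:
    """Partial sum sum_{n=1}^{terms} omega(lam^{-n} t) of the series defining omega-tilde."""
    return sum(omega(lam ** (-n) * t) for n in range(1, terms + 1))


def holder(alpha: float) -> Callable[[float], float]:
    """Hoelder modulus omega(t) = t^alpha."""
    return lambda t: t ** alpha


def log_modulus(beta: float) -> Callable[[float], float]:
    """Modulus omega(t) = |log t|^{-beta}, meaningful for 0 < t < 1."""
    return lambda t: abs(math.log(t)) ** (-beta)


if __name__ == "__main__":
    lam, t = 2.0, 0.01
    alpha, beta = 0.5, 2.0
    print(omega_tilde(holder(alpha), t, lam), t ** alpha / (lam ** alpha - 1))
    bound = abs(math.log(t)) ** (1 - beta) / ((beta - 1) * math.log(lam))
    print(omega_tilde(log_modulus(beta), t, lam), "<=", bound)
```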
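It is easy to see that $\tilde\omega$ is also a modulus of continuity: since $\omega$ is increasing, $\omega(\lambda^{-n}t)\log\lambda \le \int_{\lambda^{-n}t}^{\lambda^{-n+1}t} \frac{\omega(s)}{s}\,ds$, so $\tilde\omega(t) \le \frac{1}{\log\lambda}\int_0^t \frac{\omega(s)}{s}\,ds$, which tends to $0$ as $t \to 0$ by the Dini condition. Let $M$ be the dual space of $C$ and let $\mathcal{L}^*: M \to M$ be the dual operator of $\mathcal{L}: C \to C$. For any measure $\nu \in M$ and any function $\varphi \in C$, we use $\langle \nu, \varphi\rangle$ to denote the integral of $\varphi$ with respect to $\nu$. Let us recall the Ruelle theorem that we proved in [FJ].

Theorem 1 (Ruelle Theorem). Suppose that $\omega$ is a Dini modulus of continuity and $\psi \in H^\omega$. Then the following statements hold:
1. There exist a strictly positive number $\rho$ and a strictly positive function $h \in H^{\tilde\omega}$ such that $\mathcal{L}h = \rho h$.
2. There exists a unique probability measure $\nu = \nu_\psi \in M$ such that $\mathcal{L}^*\nu = \rho\nu$.
3. For sufficiently small $r > 0$, there is a constant $C = C(r) > 0$ such that
$$C^{-1} \le \frac{\nu(B_n(x, r))}{\rho^{-n}G_n(x)} \le C \qquad \text{(Gibbs property)}$$
holds for all $x \in X$ and $n \ge 1$, where $G_n(x) = \prod_{j=0}^{n-1} \psi(f^j x)$.
4. Take $h$ in (1) such that $\langle \nu, h\rangle = 1$. Then for any $\varphi \in C$, $\rho^{-n}\mathcal{L}^n\varphi \to \langle \nu, \varphi\rangle h$ as $n \to \infty$.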
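Statements 1 and 4 of Theorem 1 can be watched numerically. The following sketch, entirely under our own assumptions, discretizes the Ruelle operator of the doubling map $f(x) = 2x \bmod 1$ with the hypothetical potential $\psi(x) = e^{0.3\cos 2\pi x}$ on a uniform grid, using periodic linear interpolation at the preimages $x/2$ and $(x+1)/2$, and runs power iteration. It is a discretization for illustration only, not the method of [FJ]. One should see a positive $h$ with $\mathcal{L}h \approx \rho h$, the elementary bounds $2\min\psi \le \rho \le 2\max\psi$ (since $2\min\psi \le \mathcal{L}1 \le 2\max\psi$ and $\mathcal{L}$ is positive), and $\rho^{-n}\mathcal{L}^n\varphi/h$ flattening to a constant, which by statement 4 is $\langle \nu, \varphi\rangle$. The quantity $\mathcal{L}h/(\rho h) - 1$ printed at the end is the defect of the identity $\tilde{\mathcal{L}}1 = 1$ for the normalized potential $\psi h/(\rho\, h\circ f)$ used in Sect. 3.

```python
import math
from typing import List, Tuple

N = 256      # number of grid points on the circle [0, 1)
A = 0.3      # amplitude of the illustrative potential


def psi(x: float) -> float:
    """Illustrative smooth positive potential psi(x) = exp(A cos 2 pi x)."""
    return math.exp(A * math.cos(2 * math.pi * x))


def interp(values: List[float], x: float) -> float:
    """Periodic linear interpolation of grid values (given at k/N) evaluated at x."""
    s = (x % 1.0) * len(values)
    k = int(s)
    w = s - k
    return (1.0 - w) * values[k % len(values)] + w * values[(k + 1) % len(values)]


def apply_L(values: List[float]) -> List[float]:
    """Discretized Ruelle operator: sum over y in {x_k/2, (x_k+1)/2} of psi(y) times interpolated v(y)."""
    out = []
    for k in range(N):
        x = k / N
        out.append(sum(psi(y) * interp(values, y) for y in (x / 2, (x + 1) / 2)))
    return out


def leading_eigenpair(iterations: int = 300) -> Tuple[float, List[float]]:
    """Power iteration; h is kept with grid mean 1, so rho is the grid mean of L h."""
    h = [1.0] * N
    rho = 1.0
    for _ in range(iterations):
        lh = apply_L(h)
        rho = sum(lh) / N
        h = [v / rho for v in lh]
    return rho, h


if __name__ == "__main__":
    rho, h = leading_eigenpair()
    defect = max(abs(lh / (rho * hk) - 1.0) for lh, hk in zip(apply_L(h), h))
    print("rho =", rho, " max |L~1 - 1| =", defect)
```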
Notice that the function $h$ belongs to $H^{\tilde\omega}$ but, in general, not to $H^\omega$. Our concern in this paper is the convergence speed of $\rho^{-n}\mathcal{L}^n\varphi$. Such speeds give good knowledge of the statistical properties of the dynamical system. We shall see that the convergence speed depends on the regularity of both $\psi$ and $\varphi$. For any function $\varphi$, denote by $\Omega_\varphi(t)$ its modulus of continuity, defined by $\Omega_\varphi(t) = \sup_{d(x,y) \le t} |\varphi(x) - \varphi(y)|$. Our main result in this paper is the following.

Theorem 2. Make the same assumptions as in Theorem 1. Take an eigenfunction $h$ (associated with the eigenvalue $\rho$) such that $\langle \nu, h\rangle = 1$. Then for any $\varepsilon$ with $0 < c_2\varepsilon \le a$, where $c_2 = 2\lambda/(\lambda - 1)$, there exist constants $0 < \gamma < 1$, $p \ge 0$ and $C > 0$ such that for any $n \ge 1$, any $\varphi \in C$ and any integer partition of $[1, n]$,
$$1 \le n_0 < n_1 < \cdots < n_{\ell-1} < n_\ell \le n,$$
satisfying $n_j - n_{j-1} > p$ for $1 \le j < \ell$ (with $n_{-1} = 0$), we have
$$\big\|\rho^{-n}\mathcal{L}^n\varphi - \langle \nu, \varphi\rangle h\big\|_\infty \le C\Big[\Omega_\varphi(c_2\varepsilon\lambda^{-n_0}) + \|\varphi\|_\infty\sum_{j=0}^{\ell}\tilde\omega\big(\lambda^p c_2\varepsilon\lambda^{-(n_j - n_{j-1})}\big) + \|\varphi\|_\infty\gamma^\ell\Big].$$
Our result in this general setting unifies, to some extent, the existing ones (see [FP]). Our method is completely different and relatively simple. Contrary to what one might expect, Markov partitions are not needed; this is one reason for the simplicity of the method. In place of a Markov partition, we use a non-Markovian partition, which is very easy to construct and may be adapted to the setting studied in [Wa]. The article is organized as follows. In Sect. 2, we recall some properties of expanding and mixing dynamical systems and construct non-Markovian partitions. In Sect. 3, we prove our main result (Theorem 2, as well as Theorems 3 and 4). In Sect. 4, we give examples exhibiting different kinds of convergence speeds (polynomial, superpolynomial, subexponential, etc.). In Sect. 5, we apply the main result to obtain the decay of correlations and the central limit theorem.

2. Construction of Non-Markovian Partitions

For a locally expanding dynamical system $(X, f)$ with primary expanding parameter $(\lambda, b)$, the restriction $f: B(x, b') \to f(B(x, b'))$ is a homeomorphism for any $x \in X$ and $0 < b' \le b$. Moreover, there is an integer $m_0 > 0$ such that $\#(f^{-1}(x)) \le m_0$ for any $x \in X$, and for any $x \in X$ and any $0 < r \le b$ we have
$$B_k(x, r) \subseteq B_{k-1}(x, \lambda^{-1}r) \qquad (k \ge 1).$$
Some further properties, listed below, are proved in [FJ].

Proposition 1. Suppose $f$ is a locally expanding and mixing dynamical system with a primary expanding parameter $(\lambda, b)$.
1. There is a constant $0 < a \le b$ such that for any $x \in X$ with $f^{-1}(x) = \{x_1, \dots, x_n\}$, there are local inverses $g_1, \dots, g_n$ of $f$ defined on $B(x, a)$ such that $g_j(x) = x_j$ and the sets $g_j(B(x, a))$ $(1 \le j \le n)$ are pairwise disjoint.
2. Let $a > 0$ be a constant as in (1). We have $\#(f^{-1}(x)) = \#(f^{-1}(y))$ if $d(x, y) \le a$. Furthermore, we can arrange $f^{-1}(x) = \{x_1, \dots, x_n\}$ and $f^{-1}(y) = \{y_1, \dots, y_n\}$ so that
$$d(x_j, y_j) \le \frac{d(x, y)}{\lambda} \qquad (1 \le j \le n).$$
3. Let $a > 0$ be a constant as in (1). If $0 < r \le a$, the map $f^n: B_n(x, r) \to B(f^n(x), r)$ is a homeomorphism.
4. Let $a > 0$ be a constant as in (1). Then for any $0 < r \le a$, there is an integer $p = p(r) \ge 1$ such that $f^p(B(x, r)) = X$ for any $x \in X$. Moreover, for any $x, y \in X$ and $n \ge 0$,
$$1 \le \#\big(f^{-(n+p)}(y) \cap B_n(x, r)\big) \le m_0^p.$$

Let $a$ be a constant as in (1). We call the pair $(\lambda, a)$ an expanding parameter for $f$. Henceforth we suppose that $f$ is a locally expanding and mixing dynamical system with a fixed expanding parameter $(\lambda, a)$. We now construct a sequence of partitions of $X$ when $\lambda > 3$. Denote
$$c_1 = \frac{\lambda - 3}{\lambda - 1} \qquad\text{and}\qquad c_2 = \frac{2\lambda}{\lambda - 1}.$$
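Let $\varepsilon$ be a real number satisfying $0 < 2\varepsilon \le a$. Let $\{x_1, \dots, x_m\}$ be a $2\varepsilon$-net in $(X, d)$, that is to say, the balls $\{B(x_j, \varepsilon)\}_{1 \le j \le m}$ are disjoint and the balls $\{B(x_j, 2\varepsilon)\}_{1 \le j \le m}$ cover $X$. Define
$$A_j = B(x_j, 2\varepsilon) \setminus \big(A_1 \cup \cdots \cup A_{j-1} \cup B(x_{j+1}, \varepsilon) \cup \cdots \cup B(x_m, \varepsilon)\big) \qquad (1 \le j \le m),$$
so that $A_1 = B(x_1, 2\varepsilon) \setminus \big(B(x_2, \varepsilon) \cup \cdots \cup B(x_m, \varepsilon)\big)$. (The later $\varepsilon$-balls are removed at every step so that $B(x_j, \varepsilon)$ stays inside $A_j$.) Thus we get a partition $P_0 = \{A_j\}$ of $X$ such that
$$B(x_j, \varepsilon) \subseteq A_j \subseteq B(x_j, 2\varepsilon) \qquad (1 \le j \le m).$$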
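The construction of $P_0$ is easy to carry out on a computer. The sketch below (our illustration, with hypothetical names) works on a finite grid of the circle $\mathbb{R}/\mathbb{Z}$. A greedy pass selects centres that are pairwise at distance $\ge 2\varepsilon$, so the open $\varepsilon$-balls are disjoint, and by maximality every grid point lies at distance $< 2\varepsilon$ from some centre, so the $2\varepsilon$-balls cover. Each point is then put in the cell of the $\varepsilon$-ball containing it, if there is one, and otherwise in the cell of the first $2\varepsilon$-ball containing it. This realizes $B(x_j, \varepsilon) \subseteq A_j \subseteq B(x_j, 2\varepsilon)$.

```python
from typing import List


def circle_dist(x: float, y: float) -> float:
    """Distance on the circle R/Z."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)


def greedy_net(points: List[float], eps: float) -> List[float]:
    """Centres pairwise >= 2*eps apart; maximal, so every point is < 2*eps from a centre."""
    centres: List[float] = []
    for z in points:
        if all(circle_dist(z, c) >= 2 * eps for c in centres):
            centres.append(z)
    return centres


def partition_p0(points: List[float], centres: List[float], eps: float) -> List[int]:
    """Cell index j of each point, chosen so that B(x_j, eps) <= A_j <= B(x_j, 2 eps)."""
    labels = []
    for z in points:
        inner = [j for j, c in enumerate(centres) if circle_dist(z, c) < eps]
        if inner:                      # at most one, since the eps-balls are disjoint
            labels.append(inner[0])
        else:                          # first 2*eps-ball containing z (exists by maximality)
            labels.append(next(j for j, c in enumerate(centres)
                               if circle_dist(z, c) < 2 * eps))
    return labels


if __name__ == "__main__":
    grid = [k / 1000 for k in range(1000)]
    eps = 0.03
    centres = greedy_net(grid, eps)
    labels = partition_p0(grid, centres, eps)
    print(len(centres), "cells")
```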
For any $n \ge 1$ and $1 \le j \le m$, the preimage under $f^n$ of each $A_j$ is composed of disjoint sets (called components), each of which contains a $d_n$-ball of radius $\varepsilon$ and is contained in a $d_n$-ball of radius $2\varepsilon$ (Proposition 1 (3)). More precisely, for each component $A$, $f^n: A \to A_j$ is a homeomorphism and
$$B_n(c_A, \varepsilon) \subseteq A \subseteq B_n(c_A, 2\varepsilon),$$
where $c_A \in A$ and $f^n(c_A) = x_j$. We call $c_A$ the center of $A$. The set of all such components forms a partition, which we denote by $P_n$. It is worth noting that if $n > k \ge 1$ and $A \in P_n$, then $f^kA \in P_{n-k}$. However, $P_n$ is not necessarily a refinement of $P_k$. In the following, we modify $\{P_k\}_{k=0}^n$ to get a new (finite) sequence of partitions $\{Q_k\}_{k=0}^n$ such that $Q_{k+1}$ is a refinement of $Q_k$.

Proposition 2. Suppose $\lambda > 3$. For any $n \ge 1$ and any $\varepsilon$ such that $0 < c_2\varepsilon \le a$, there are partitions $Q_k$ $(0 \le k \le n)$ such that
1. $Q_{k+1}$ is a refinement of $Q_k$ $(0 \le k < n)$;
2. any element of $Q_k$ contains a $d_k$-ball of radius $c_1\varepsilon$ and is contained in a $d_k$-ball of radius $c_2\varepsilon$.
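Proof. We construct $Q_k$ $(0 \le k \le n)$ by induction on $k$, decreasing from $n$ to $0$. First take $Q_n = P_n$. For $A \in P_{n-1}$, let
$$\tilde A = \bigcup_{D \in Q_n:\ c_D \in A} D,$$
where $c_D$ is the center of $D \in Q_n = P_n$. We claim that
$$B_{n-1}\big(c_A, \varepsilon(1 - 2\lambda^{-1})\big) \subseteq \tilde A \subseteq B_{n-1}\big(c_A, 2\varepsilon(1 + \lambda^{-1})\big).$$
In fact, suppose that the center $c_D$ of $D \in Q_n$ is outside $A$. Since $A$ contains the $d_{n-1}$-ball $B_{n-1}(c_A, \varepsilon)$, we have $d_{n-1}(c_A, c_D) \ge \varepsilon$. This implies that for $z \in D \subseteq B_n(c_D, 2\varepsilon) \subset B_{n-1}(c_D, 2\varepsilon/\lambda)$ we have
$$d_{n-1}(c_A, z) \ge d_{n-1}(c_A, c_D) - d_{n-1}(c_D, z) \ge \varepsilon(1 - 2\lambda^{-1}).$$
Since $Q_n$ is a partition of $X$, every point of $B_{n-1}(c_A, \varepsilon(1 - 2\lambda^{-1}))$ therefore lies in some $D$ whose center is in $A$, which proves the first inclusion. On the other hand, suppose that the center $c_D$ of $D \in Q_n$ is inside $A$. Since $A$ is contained in the $d_{n-1}$-ball $B_{n-1}(c_A, 2\varepsilon)$, we have $d_{n-1}(c_D, c_A) \le 2\varepsilon$. This implies that for $z \in D \subseteq B_n(c_D, 2\varepsilon) \subset B_{n-1}(c_D, 2\varepsilon/\lambda)$ we have
$$d_{n-1}(z, c_A) \le d_{n-1}(z, c_D) + d_{n-1}(c_D, c_A) \le 2\varepsilon\lambda^{-1} + 2\varepsilon = 2\varepsilon(1 + \lambda^{-1}).$$
Thus the second inclusion is proved. All these sets $\tilde A$ form $Q_{n-1}$. Again we call $c_{\tilde A} = c_A$ the center of $\tilde A$ in $Q_{n-1}$. When there is no confusion, we still use $A$ (without tilde) to denote an element of $Q_{n-1}$. Let
$$s_1 = \varepsilon(1 - 2\lambda^{-1}), \qquad t_1 = 2\varepsilon(1 + \lambda^{-1}).$$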
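Suppose we have constructed $Q_{n-(k-1)}$ $(2 \le k \le n)$ such that for any $D \in Q_{n-(k-1)}$,
$$B_{n-(k-1)}(c_D, s_{k-1}) \subseteq D \subseteq B_{n-(k-1)}(c_D, t_{k-1}),$$
where $c_D$ is the center of $D$. Now for any $A \in P_{n-k}$, define an element $\tilde A$ of $Q_{n-k}$ by
$$\tilde A = \bigcup_{D \in Q_{n-k+1}:\ c_D \in A} D.$$
Let $c_A$ be the center of $A$. We claim that
$$B_{n-k}\big(c_A, \varepsilon - t_{k-1}\lambda^{-1}\big) \subseteq \tilde A \subseteq B_{n-k}\big(c_A, 2\varepsilon + t_{k-1}\lambda^{-1}\big).$$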
In fact, $A \in P_{n-k}$ contains the $d_{n-k}$-ball $B_{n-k}(c_A, \varepsilon)$ and is contained in the $d_{n-k}$-ball $B_{n-k}(c_A, 2\varepsilon)$. Suppose $D \in Q_{n-(k-1)}$ has its center $c_D$ outside $A$. Then $d_{n-k}(c_A, c_D) \ge \varepsilon$. Hence, for any $z \in D \subseteq B_{n-(k-1)}(c_D, t_{k-1}) \subseteq B_{n-k}(c_D, t_{k-1}\lambda^{-1})$, we have
$$d_{n-k}(c_A, z) \ge d_{n-k}(c_A, c_D) - d_{n-k}(c_D, z) \ge \varepsilon - t_{k-1}\lambda^{-1}.$$
This proves the first inclusion in the claim. On the other hand, for every $D \in Q_{n-(k-1)}$ whose center $c_D$ is in $A$, we have $d_{n-k}(c_A, c_D) \le 2\varepsilon$, and for any $z \in D \subseteq B_{n-(k-1)}(c_D, t_{k-1}) \subseteq B_{n-k}(c_D, t_{k-1}\lambda^{-1})$ we have $d_{n-k}(z, c_D) \le t_{k-1}\lambda^{-1}$. Thus
$$d_{n-k}(z, c_A) \le d_{n-k}(z, c_D) + d_{n-k}(c_D, c_A) \le 2\varepsilon + t_{k-1}\lambda^{-1}.$$
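This is the second inclusion in the claim. Now let
$$s_k = \varepsilon - t_{k-1}\lambda^{-1} \qquad\text{and}\qquad t_k = 2\varepsilon + t_{k-1}\lambda^{-1}.$$
For any $\tilde A$ in $Q_{n-k}$,
$$B_{n-k}(c_{\tilde A}, s_k) \subseteq \tilde A \subseteq B_{n-k}(c_{\tilde A}, t_k),$$
where $c_{\tilde A} = c_A$ is the center of $\tilde A$. An easy calculation from $t_1 = 2\varepsilon(1 + \lambda^{-1})$ and the recursion shows that
$$t_k = 2\varepsilon\,\frac{\lambda - \lambda^{-k}}{\lambda - 1} \qquad\text{and}\qquad s_k = \varepsilon\Big(1 - \frac{2(1 - \lambda^{-k})}{\lambda - 1}\Big).$$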
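The radii can be checked mechanically. The short sketch below (hypothetical names) iterates the recursion $t_k = 2\varepsilon + t_{k-1}\lambda^{-1}$, $s_k = \varepsilon - t_{k-1}\lambda^{-1}$ starting from $t_0 = 2\varepsilon$, the outer radius of the elements of $P_n$; with this start, $s_1$ and $t_1$ come out as above. It compares the result with the closed forms $t_k = 2\varepsilon(\lambda - \lambda^{-k})/(\lambda - 1)$ and $s_k = \varepsilon(1 - 2(1 - \lambda^{-k})/(\lambda - 1))$ that the recursion yields, and confirms $t_k \le c_2\varepsilon$ and $s_k \ge c_1\varepsilon$ for several values of $\lambda > 3$.

```python
from typing import List, Tuple


def radii(lam: float, eps: float, kmax: int) -> Tuple[List[float], List[float]]:
    """Iterate s_k = eps - t_{k-1}/lam and t_k = 2 eps + t_{k-1}/lam, starting from t_0 = 2 eps."""
    s, t = [eps], [2 * eps]          # k = 0: elements of P_n contain eps-balls, lie in 2eps-balls
    for k in range(1, kmax + 1):
        s.append(eps - t[k - 1] / lam)
        t.append(2 * eps + t[k - 1] / lam)
    return s, t


def t_closed(lam: float, eps: float, k: int) -> float:
    """Closed form of the outer radius t_k."""
    return 2 * eps * (lam - lam ** (-k)) / (lam - 1)


def s_closed(lam: float, eps: float, k: int) -> float:
    """Closed form of the inner radius s_k."""
    return eps * (1 - 2 * (1 - lam ** (-k)) / (lam - 1))


if __name__ == "__main__":
    lam, eps = 4.0, 0.01
    s, t = radii(lam, eps, 10)
    c1, c2 = (lam - 3) / (lam - 1), 2 * lam / (lam - 1)
    print(min(s[1:]) >= c1 * eps, max(t) <= c2 * eps)
```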
We see that $t_k \le c_2\varepsilon$ and that, for $\lambda > 3$, $s_k \ge c_1\varepsilon > 0$. This completes the proof.

3. Convergence Speeds of Ruelle Operators

We give here a proof of Theorem 2. Let $s = \min_{x \in X}\psi(x)$ and $K_0 = [\psi]_\omega/s$. For any $x, y \in X$, let $x_i = f^i(x)$ and $y_i = f^i(y)$ for $i \ge 0$. The following distortion property is easy to obtain by using the fact that $d(x_i, y_i) \le \lambda^{-(n-i)}d(x_n, y_n)$ for $0 \le i < n$ (a detailed proof is given in [FJ]).

Lemma 1 (Naive Distortion). For any $x, y \in X$ with $d_n(x, y) \le a$,
$$\Big|\log\frac{G_n(x)}{G_n(y)}\Big| \le K_0\,\tilde\omega(d(x_n, y_n)),$$
where $G_n(x) = \prod_{j=0}^{n-1}\psi(f^j x)$.
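Given $\varphi \in C$, let $\tilde\varphi = \varphi - \langle \nu, \varphi\rangle h$. Then $\int\tilde\varphi\,d\nu = 0$, because $\langle \nu, h\rangle = 1$. Moreover, since $\rho^{-n}\mathcal{L}^n h = h$,
$$\rho^{-n}\mathcal{L}^n\tilde\varphi = \rho^{-n}\mathcal{L}^n\varphi - \langle \nu, \varphi\rangle h \qquad\text{and}\qquad \|\tilde\varphi\|_\infty \le (1 + \|h\|_\infty)\|\varphi\|_\infty.$$
Therefore, Theorem 2 is a consequence of the following theorem.

Theorem 3. Make the same assumptions as in Theorem 2. Then for any $\varepsilon$ such that $0 < c_2\varepsilon \le a$, there exist constants $0 < \gamma < 1$, $p \ge 0$ and $C > 0$ such that for any $n \ge 1$, any $\varphi \in C$ with $\langle \nu, \varphi\rangle = 0$, and any integer partition of $[1, n]$, $1 \le n_0 < n_1 < \cdots < n_{\ell-1} < n_\ell \le n$, satisfying $n_j - n_{j-1} > p$ for $1 \le j < \ell$ (with $n_{-1} = 0$), we have
$$\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty \le C\Big[\Omega_\varphi(c_2\varepsilon\lambda^{-n_0}) + \|\varphi\|_\infty\sum_{j=0}^{\ell}\tilde\omega\big(\lambda^p c_2\varepsilon\lambda^{-(n_j - n_{j-1})}\big) + \|\varphi\|_\infty\gamma^\ell\Big].$$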
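Instead of working with the operator $\mathcal{L}$, we shall work with its normalization $\tilde{\mathcal{L}}$, which is defined as follows. Let
$$\tilde\psi = \psi\,\frac{h}{\rho\,h \circ f}.$$
Define
$$\tilde{\mathcal{L}}\varphi(x) = \sum_{y \in f^{-1}(x)}\tilde\psi(y)\varphi(y).$$
The important feature of $\tilde{\mathcal{L}}$ is that $\tilde{\mathcal{L}}1 = 1$ (indeed, $\tilde{\mathcal{L}}1 = \mathcal{L}h/(\rho h)$). Denote
$$\tilde G_n(x) = \prod_{i=0}^{n-1}\tilde\psi(f^i(x)) = \frac{G_n(x)}{\rho^n}\,\frac{h(x)}{h \circ f^n(x)}.$$
Then we have
$$\tilde{\mathcal{L}}^n\varphi(x) = \sum_{y \in f^{-n}(x)}\tilde G_n(y)\varphi(y).$$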
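The following lemma is an easy consequence of Lemma 1.

Lemma 2. Let $K_1 = K_0 + 2[h]_{\tilde\omega}/\min h$. For any $x, y \in X$ with $d_n(x, y) \le a$,
$$\Big|\log\frac{\tilde G_n(x)}{\tilde G_n(y)}\Big| \le K_1\,\tilde\omega(d(x_n, y_n)).$$
Moreover, if $K = K_1e^{K_1\tilde\omega(a)}$, we have
$$\Big|\frac{\tilde G_n(x)}{\tilde G_n(y)} - 1\Big| \le K\,\tilde\omega(d(x_n, y_n)).$$
We remark that for $0 < \delta \le a$ and $1 \le k \le m$, Lemma 2 gives
$$\sup\Big\{\Big|\frac{\tilde G_k(x)}{\tilde G_k(y)} - 1\Big| ;\ d_m(x, y) \le \delta\Big\} \le K\,\tilde\omega\big(\lambda^{-(m-k)}\delta\big).$$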
Let $\nu$ be the measure in Theorem 1 (2) and take $h$ in Theorem 1 (1) such that $\langle \nu, h\rangle = 1$. Let $\mu = h\nu$ (the Gibbs measure). We will show that Theorem 3 follows from the following theorem.

Theorem 4. Make the same assumptions as in Theorem 2. Then for any $\varepsilon$ with $0 < c_2\varepsilon \le a$, there exist constants $0 < \gamma < 1$, $p \ge 0$ and $K > 0$ such that for any $n \ge 1$, any $\varphi \in C$ with $\langle \mu, \varphi\rangle = 0$, and any integer partition of $[1, n]$, $1 \le n_0 < n_1 < \cdots < n_{\ell-1} < n_\ell \le n$, satisfying $n_j - n_{j-1} > p$ for $1 \le j < \ell$, we have
$$\|\tilde{\mathcal{L}}^n\varphi\|_\infty \le \Omega_\varphi(c_2\varepsilon\lambda^{-n_0}) + K\|\varphi\|_\infty\sum_{j=1}^{\ell}\tilde\omega\big(\lambda^{-(n_j - n_{j-1})}c_2\varepsilon\lambda^p\big) + \|\varphi\|_\infty\gamma^\ell.$$
Notice that the sum in the inequality of Theorem 3 is taken over $0 \le j \le \ell$, while that in the inequality of Theorem 4 is taken over $1 \le j \le \ell$. To prove Theorem 4, we need several lemmas. The first one is of independent interest; it is simple but decisive.
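Lemma 3. Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and let $0 < \alpha < \beta < \infty$ be two constants. There exists a constant $0 < \gamma = \gamma(\alpha, \beta) < 1$ such that the inequality
$$\Big|\int\varphi\chi\,d\mu\Big| \le \gamma\int|\varphi|\chi\,d\mu$$
holds for any measurable function $\chi$ such that $\alpha \le \chi(x) \le \beta$ and any integrable function $\varphi$ such that $\int\varphi\,d\mu = 0$ (the optimal $\gamma$ is $(\beta - \alpha)/(\beta + \alpha)$).

Proof. The special case corresponding to the discrete measure $\mu = \delta_1 + \delta_2$,
$$|x_1 - x_2| \le \gamma(x_1 + x_2) \qquad (\alpha \le x_1, x_2 \le \beta),$$
is trivial. Now, without loss of generality, we assume that
$$\int_{\varphi > 0}\varphi\,d\mu = -\int_{\varphi < 0}\varphi\,d\mu = 1.$$
Apply the special case to
$$x_1 = \int_{\varphi > 0}\varphi\chi\,d\mu, \qquad x_2 = -\int_{\varphi < 0}\varphi\chi\,d\mu.$$
Since $\alpha \le x_1, x_2 \le \beta$, we have
$$\Big|\int\varphi\chi\,d\mu\Big| = |x_1 - x_2| \le \gamma(x_1 + x_2) = \gamma\int|\varphi|\chi\,d\mu.$$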
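Lemma 3 is easy to probe numerically. The sketch below (our illustration; names are hypothetical) draws random discrete measures $\mu = \sum_i w_i\delta_i$, random $\varphi$ centred so that $\int\varphi\,d\mu = 0$, and random $\chi$ with values in $[\alpha, \beta]$, and checks $|\int\varphi\chi\,d\mu| \le \gamma\int|\varphi|\chi\,d\mu$ with $\gamma = (\beta - \alpha)/(\beta + \alpha)$. The two-point example $\varphi = (1, -1)$, $\chi = (\beta, \alpha)$ with unit weights attains equality, which is why this $\gamma$ is optimal.

```python
import random
from typing import List, Tuple


def lemma3_sides(phi: List[float], chi: List[float], w: List[float]) -> Tuple[float, float]:
    """(|int phi chi dmu|, int |phi| chi dmu) for the discrete measure mu = sum_i w_i delta_i."""
    lhs = abs(sum(p * c * m for p, c, m in zip(phi, chi, w)))
    rhs = sum(abs(p) * c * m for p, c, m in zip(phi, chi, w))
    return lhs, rhs


def random_instance(rng: random.Random, size: int, alpha: float, beta: float):
    """Random weights, a phi with int phi dmu = 0, and chi with alpha <= chi <= beta."""
    w = [rng.uniform(0.1, 1.0) for _ in range(size)]
    phi = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    mean = sum(p * m for p, m in zip(phi, w)) / sum(w)
    phi = [p - mean for p in phi]
    chi = [rng.uniform(alpha, beta) for _ in range(size)]
    return phi, chi, w


if __name__ == "__main__":
    alpha, beta = 0.5, 2.0
    gamma = (beta - alpha) / (beta + alpha)
    rng = random.Random(0)
    worst = max(l / r for l, r in (lemma3_sides(*random_instance(rng, 20, alpha, beta))
                                    for _ in range(1000)))
    print("largest ratio observed:", worst, "<= gamma =", gamma)
```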
The main idea of the proof of Theorem 4 is to introduce a sequence of linear operators $P = \{P_n\}_{n=1}^\infty$ defined by
$$P_n\varphi(x) = \tilde{\mathcal{L}}^n\varphi(f^n(x)) = \sum_{y \in f^{-n}(f^n(x))}\tilde G_n(y)\varphi(y).$$
As we have seen in [FJ], $P_n$ is positive, $P_n1 = 1$ and
$$P_mP_n = P_nP_m = P_m \qquad (m \ge n \ge 1).$$
Assume for the moment that $\lambda > 3$. Fix $n \ge 1$ and $\varepsilon$ such that $0 < c_2\varepsilon \le a$. Let $Q_k$ $(1 \le k \le n)$ be the partitions constructed in Proposition 2. We also use $Q_k$ to denote the $\sigma$-algebra generated by $Q_k$. Let $E_k = E(\cdot\,|\,Q_k)$ be the conditional expectation with respect to $Q_k$ on the probability space $(X, \mu)$.

Lemma 4. Let $p_0 = p(\varepsilon)$ be a fixed integer as in Proposition 1 (4). Then there exists a constant $0 < \gamma < 1$ depending only on $\varepsilon$ and $(f, \psi)$ such that for any $\varphi \in L^\infty(\mu)$ with $\langle \mu, \varphi\rangle = 0$, any $p \ge p_0$ and any $k \ge 1$,
$$\|P_{k+p}E_k\varphi\|_\infty \le \gamma\|\varphi\|_\infty.$$
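Proof. Note that
$$P_{k+p}E_k\varphi(x) = \sum_{y \in f^{-(k+p)}(f^{k+p}(x))}\tilde G_{k+p}(y)\,E_k\varphi(y) = \sum_{A \in Q_k}\frac{\int_A\varphi\,d\mu}{\mu(A)}\sum_{y \in A \cap f^{-(k+p)}(f^{k+p}(x))}\tilde G_{k+p}(y).$$
By Proposition 1 (4) and the Gibbs property in Theorem 1 (3), there is a constant $C_0 = C_0(\varepsilon) > 0$ such that
$$C_0^{-1} \le \frac{1}{\mu(A)}\sum_{y \in A \cap f^{-(k+p)}(f^{k+p}(x))}\tilde G_{k+p}(y) \le C_0.$$
So Lemma 3, applied to $\mu$ and to the function equal on each $A \in Q_k$ to the quantity just bounded, implies that, with $\gamma = (C_0 - C_0^{-1})/(C_0 + C_0^{-1}) < 1$,
$$|P_{k+p}E_k\varphi(x)| \le \gamma\sum_{A \in Q_k}\frac{\int_A|\varphi|\,d\mu}{\mu(A)}\sum_{y \in A \cap f^{-(k+p)}(f^{k+p}(x))}\tilde G_{k+p}(y) \le \gamma\|\varphi\|_\infty\sum_{A \in Q_k}\ \sum_{y \in A \cap f^{-(k+p)}(f^{k+p}(x))}\tilde G_{k+p}(y) = \gamma\|\varphi\|_\infty,$$
because
$$\sum_{A \in Q_k}\ \sum_{y \in A \cap f^{-(k+p)}(f^{k+p}(x))}\tilde G_{k+p}(y) = P_{k+p}1 = 1.$$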
For any function $\varphi$ defined on $X$, define
$$V_m^{(n,\varepsilon)}(\varphi) = \sup_{A \in Q_m}\ \sup_{x, y \in A}|\varphi(x) - \varphi(y)| \qquad (1 \le m \le n).$$
This describes the variation of $\varphi$ on the partition $Q_m$, which depends on $n$ and $\varepsilon$. A function $\varphi$ is $Q_m$-measurable if and only if $V_m^{(n,\varepsilon)}(\varphi) = 0$, i.e., $\varphi$ is piecewise constant with respect to $Q_m$.

Lemma 5. For any $\varepsilon$ such that $0 < c_2\varepsilon \le a$, there exists an integer $q_0 \ge 1$ such that for any $q \ge q_0$, any $n \ge m \ge k + q > k \ge 1$ and any $Q_k$-measurable $\varphi$, we have
$$V_m^{(n,\varepsilon)}(P_{k+q}\varphi) \le K\|\varphi\|_\infty\,\tilde\omega\big(\lambda^{-(m-k)}c_2\varepsilon\lambda^q\big).$$
Proof. Suppose $A \in Q_m$ and $x, y \in A$. Let $f^{-(k+q)}(f^{k+q}(x)) = \{x_j\}$ and $f^{-(k+q)}(f^{k+q}(y)) = \{y_j\}$. By Proposition 2, $A$ is contained in a $d_m$-ball of radius $c_2\varepsilon$. We may assume that for each $j$, $x_j$ and $y_j$ are contained in a $d_m$-ball of radius $c_2\varepsilon$, which is contained in a $d_{m-q}$-ball of radius $c_2\varepsilon\lambda^{-q}$. Take $q_0$ such that $\lambda^{q_0} > c_2/c_1$.
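Then $x_j$ and $y_j$ are contained in a $d_{m-q}$-ball of radius $c_1\varepsilon$, which is contained in a $d_k$-ball of radius $c_1\varepsilon$ because $m - q \ge k$. As $\varphi$ is $Q_k$-measurable, $\varphi(x_j) = \varphi(y_j)$. So
$$\begin{aligned} |P_{k+q}\varphi(x) - P_{k+q}\varphi(y)| &= \Big|\sum_j\tilde G_{k+q}(x_j)\varphi(x_j) - \sum_j\tilde G_{k+q}(y_j)\varphi(y_j)\Big| \le \|\varphi\|_\infty\sum_j\big|\tilde G_{k+q}(x_j) - \tilde G_{k+q}(y_j)\big| \\ &\le \|\varphi\|_\infty\sum_j\Big|\frac{\tilde G_{k+q}(x_j)}{\tilde G_{k+q}(y_j)} - 1\Big|\,\tilde G_{k+q}(y_j) \le K\|\varphi\|_\infty\,\tilde\omega\big(\lambda^{-(m-k-q)}c_2\varepsilon\big)\sum_j\tilde G_{k+q}(y_j) \\ &= K\|\varphi\|_\infty\,\tilde\omega\big(\lambda^{-(m-k)}c_2\varepsilon\lambda^q\big). \end{aligned}$$
We have used the remark after Lemma 2 and the fact that $\sum_j\tilde G_{k+q}(y_j) = P_{k+q}1 = 1$.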
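The following lemma is obvious for any $\mu$-measurable function $\varphi$, because on each $A \in Q_k$ the value of $E_k\varphi$ is a $\mu$-average of the values of $\varphi$ on $A$.

Lemma 6. For $1 \le k \le n$, we have
$$\|(I - E_k)\varphi\|_\infty \le V_k^{(n,\varepsilon)}(\varphi).$$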
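Lemma 6 can be checked on a finite probability space: on each cell, $E_k\varphi$ is the weighted average of $\varphi$ over the cell, so it lies between the cell's minimum and maximum of $\varphi$. The minimal sketch below (cells given by integer labels; all names hypothetical) computes $E_k\varphi$ and the variation $V$ and checks the inequality on random data.

```python
import random
from typing import Dict, List


def cond_exp(phi: List[float], weights: List[float], labels: List[int]) -> List[float]:
    """E(phi | partition): on each cell, phi is replaced by its weighted average over the cell."""
    totals: Dict[int, float] = {}
    masses: Dict[int, float] = {}
    for p, m, a in zip(phi, weights, labels):
        totals[a] = totals.get(a, 0.0) + p * m
        masses[a] = masses.get(a, 0.0) + m
    return [totals[a] / masses[a] for a in labels]


def variation(phi: List[float], labels: List[int]) -> float:
    """V(phi) = max over cells A of (sup_A phi - inf_A phi)."""
    lo: Dict[int, float] = {}
    hi: Dict[int, float] = {}
    for p, a in zip(phi, labels):
        lo[a] = min(lo.get(a, p), p)
        hi[a] = max(hi.get(a, p), p)
    return max(hi[a] - lo[a] for a in lo)


if __name__ == "__main__":
    rng = random.Random(0)
    size = 200
    phi = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    weights = [rng.uniform(0.1, 1.0) for _ in range(size)]
    labels = [rng.randrange(12) for _ in range(size)]
    e_phi = cond_exp(phi, weights, labels)
    sup_dev = max(abs(p - e) for p, e in zip(phi, e_phi))
    print(sup_dev, "<=", variation(phi, labels))
```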
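Proof of Theorem 4. Let $p$ be the larger of $p_0 = p(\varepsilon)$ in Lemma 4 and $q_0$ in Lemma 5. For $n \ge k + p$, we have $P_n = P_nP_{k+p}$. Write $Q_{k+p} = P_{k+p}E_k$. Since $P_nE_k = P_nP_{k+p}E_k$, we have
$$P_n = P_n(I - E_k) + P_nE_k = P_n(I - E_k) + P_nQ_{k+p}.$$
By induction,
$$\begin{aligned} P_n &= P_n(I - E_{n_0}) + P_nQ_{n_0+p} \\ &= P_n(I - E_{n_0}) + P_n\big[(I - E_{n_1}) + Q_{n_1+p}\big]Q_{n_0+p} \\ &= P_n(I - E_{n_0}) + P_n(I - E_{n_1})Q_{n_0+p} + P_nQ_{n_1+p}Q_{n_0+p} \\ &\;\;\vdots \\ &= P_n\Big[(I - E_{n_0}) + \sum_{j=1}^{\ell}(I - E_{n_j})\prod_{i=0}^{j-1}Q_{n_i+p} + \prod_{i=0}^{\ell-1}Q_{n_i+p}\Big], \end{aligned}$$
where in each product $Q_{n_0+p}$ is applied first. Each $Q_{n_i+p} = P_{n_i+p}E_{n_i}$ preserves the condition $\langle \mu, \cdot\rangle = 0$, since $E_{n_i}$ preserves $\mu$-integrals and $\langle \mu, P_m\phi\rangle = \langle \mu, \phi\rangle$. By using the fact that $\|P_n\varphi\|_\infty \le \|\varphi\|_\infty$, Lemma 6 and Lemma 4, we have
$$\begin{aligned} \|P_n\varphi\|_\infty &\le \|(I - E_{n_0})\varphi\|_\infty + \sum_{j=1}^{\ell}\Big\|(I - E_{n_j})\prod_{i=0}^{j-1}Q_{n_i+p}\varphi\Big\|_\infty + \Big\|\prod_{i=0}^{\ell-1}Q_{n_i+p}\varphi\Big\|_\infty \\ &\le V_{n_0}^{(n,\varepsilon)}(\varphi) + \sum_{j=1}^{\ell}V_{n_j}^{(n,\varepsilon)}\Big(\prod_{i=0}^{j-1}Q_{n_i+p}\varphi\Big) + \gamma^\ell\|\varphi\|_\infty. \end{aligned}$$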
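Let $\varphi_j = \prod_{i=0}^{j-1}Q_{n_i+p}\varphi$. Then
$$\varphi_j = Q_{n_{j-1}+p}\varphi_{j-1} = P_{n_{j-1}+p}E_{n_{j-1}}\varphi_{j-1}.$$
Here $\tilde\varphi_j = E_{n_{j-1}}\varphi_{j-1}$ is $Q_{n_{j-1}}$-measurable. From Lemma 5 and the fact that $\|\tilde\varphi_j\|_\infty \le \|\varphi\|_\infty$, we get
$$V_{n_j}^{(n,\varepsilon)}\Big(\prod_{i=0}^{j-1}Q_{n_i+p}\varphi\Big) = V_{n_j}^{(n,\varepsilon)}\big(P_{n_{j-1}+p}\tilde\varphi_j\big) \le K\|\varphi\|_\infty\,\tilde\omega\big(\lambda^{-(n_j - n_{j-1})}c_2\varepsilon\lambda^p\big).$$
On the other hand,
$$V_{n_0}^{(n,\varepsilon)}(\varphi) \le \sup_{d_{n_0}(x,y) \le c_2\varepsilon}|\varphi(x) - \varphi(y)| \le \Omega_\varphi(c_2\varepsilon\lambda^{-n_0}).$$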
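Thus we obtain
$$\|P_n\varphi\|_\infty \le \Omega_\varphi(c_2\varepsilon\lambda^{-n_0}) + K\|\varphi\|_\infty\sum_{j=1}^{\ell}\tilde\omega\big(\lambda^{-(n_j - n_{j-1})}c_2\varepsilon\lambda^p\big) + \gamma^\ell\|\varphi\|_\infty.$$
Because $P_n\varphi(x) = (\tilde{\mathcal{L}}^n\varphi)(f^n(x))$ and $f^n$ is surjective, we now get
$$\|\tilde{\mathcal{L}}^n\varphi\|_\infty \le \Omega_\varphi(c_2\varepsilon\lambda^{-n_0}) + K\|\varphi\|_\infty\sum_{j=1}^{\ell}\tilde\omega\big(\lambda^{-(n_j - n_{j-1})}c_2\varepsilon\lambda^p\big) + \gamma^\ell\|\varphi\|_\infty.$$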
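For general $\lambda > 1$, first take an integer $\theta > 1$ such that $\lambda^\theta > 3$ and consider the locally expanding and mixing map $g = f^\theta$. Then consider the normalization $\tilde{\mathcal{L}}_g$ of the Ruelle operator $\mathcal{L}_g$. By the case already proved, there are constants $0 < \gamma_g < 1$, $p_g \ge 0$ and $K_g > 0$ such that for any $\varphi$ having mean zero with respect to $\mu$ and any integer partition of $[1, n]$, $1 \le n_0 < n_1 < \cdots < n_{\ell-1} < n_\ell \le n$, satisfying $n_j - n_{j-1} > p_g$,
$$\|\tilde{\mathcal{L}}_g^n\varphi\|_\infty \le \Omega_\varphi(c_2\varepsilon\lambda^{-\theta n_0}) + K_g\|\varphi\|_\infty\sum_{j=1}^{\ell}\tilde\omega_g\big(\lambda^{-\theta(n_j - n_{j-1})}c_2\varepsilon\lambda^p\big) + \gamma_g^\ell\|\varphi\|_\infty.$$
This, together with the fact that $\tilde{\mathcal{L}}_g^n = \tilde{\mathcal{L}}_f^{\theta n}$, implies the desired result.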
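Proof of Theorem 3. The relations between $\mathcal{L}$ and $\tilde{\mathcal{L}}$ and between $\mathcal{L}^*$ and $\tilde{\mathcal{L}}^*$ are
$$\mathcal{L}^n\varphi = \rho^nh\,\tilde{\mathcal{L}}^n(\varphi h^{-1}) \qquad\text{and}\qquad \mathcal{L}^{*n}\nu = \rho^nh^{-1}\tilde{\mathcal{L}}^{*n}(h\nu).$$
Let $\mu = h\nu$. Then $\langle \nu, \varphi\rangle = 0$ if and only if $\langle \mu, \varphi h^{-1}\rangle = 0$. Therefore,
$$\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty \le \|h\|_\infty\,\|\tilde{\mathcal{L}}^n(\varphi h^{-1})\|_\infty.$$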
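However, denoting $h_{\min} = \min_x h(x)$, we have
$$\Omega_{\varphi h^{-1}}(t) \le h_{\min}^{-2}\|\varphi\|_\infty\,\Omega_h(t) + h_{\min}^{-1}\,\Omega_\varphi(t).$$
Notice that $h \in H^{\tilde\omega}$, so $\Omega_h(t) \le [h]_{\tilde\omega}\,\tilde\omega(t)$ for $t \le a$; at $t = c_2\varepsilon\lambda^{-n_0}$ this is at most $[h]_{\tilde\omega}$ times the $j = 0$ term $\tilde\omega(\lambda^{-(n_0 - n_{-1})}c_2\varepsilon\lambda^p)$ (recall $n_{-1} = 0$). By Theorem 4, there is a constant $C > 0$ such that
$$\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty \le C\Big[\Omega_\varphi(c_2\varepsilon\lambda^{-n_0}) + \|\varphi\|_\infty\sum_{j=0}^{\ell}\tilde\omega\big(\lambda^{-(n_j - n_{j-1})}c_2\varepsilon\lambda^p\big) + \gamma^\ell\|\varphi\|_\infty\Big].$$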
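4. Examples

If $\omega(t) \le Ct^\alpha$ for some constants $C > 0$ and $0 < \alpha \le 1$, then $\tilde\omega(t) \le \tilde Ct^\alpha$ for another constant $\tilde C > 0$. In this case $H^\omega = H^{\tilde\omega} = C^\alpha$ is the space of $\alpha$-Hölder continuous functions, and it is known that the convergence speed is exponential (cf. [Bo, PP]), i.e., there are constants $C > 0$ and $\vartheta > 0$ such that for any $\varphi \in C^\alpha$,
$$\|\rho^{-n}\mathcal{L}^n\varphi - \langle \nu, \varphi\rangle h\|_\infty \le Ce^{-\vartheta n}\|\varphi\|_{C^\alpha} \qquad (n \ge 1).$$
Moreover, $\mathcal{L}: C^\alpha \to C^\alpha$ is quasi-compact (see [PP, He]). When $\psi$ is less regular, $\|\rho^{-n}\mathcal{L}^n\varphi - \langle \nu, \varphi\rangle h\|_\infty$ for $\varphi \in H^\omega$ may not decay exponentially. Our result gives different speeds for the decay of $\|\rho^{-n}\mathcal{L}^n\varphi - \langle \nu, \varphi\rangle h\|_\infty$ when $\omega$ satisfies the Dini condition. The following are some illustrative examples.

Corollary 1. Suppose $\alpha, \beta > 1$ and
$$\omega(t) = \frac{1}{|\log t|^\beta} \qquad\text{and}\qquad \omega_0(t) = \frac{1}{|\log t|^\alpha}.$$
Suppose $0 < \psi \in H^\omega$ is the potential and $\varphi \in H^{\omega_0}$ is any function such that $\int\varphi\,d\nu = 0$. Then there exists a constant $C > 0$ such that
$$\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty \le C\,\frac{(\log n)^{\max\{\alpha,\beta\}}}{n^{\min\{\alpha,\beta-1\}}} \qquad (n \ge 1).$$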
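Proof. Note that $\tilde\omega(t) = O(|\log t|^{-(\beta-1)})$. Apply Theorem 2 by choosing
$$n_0 = n_j - n_{j-1} = \Big[\frac{n}{c\log n}\Big] \quad (1 \le j \le \ell) \qquad\text{with}\qquad \ell = [c\log n] - 1,$$
where $[x]$ denotes the integer part of a real number $x$ and $c > 0$ is chosen sufficiently large (with this choice $n_\ell \le n$). We get
$$\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty \le C'\Big[\Big(\frac{\log n}{n}\Big)^\alpha + (\log n)\Big(\frac{\log n}{n}\Big)^{\beta-1} + \gamma^{c\log n}\Big] \le C''\,\frac{(\log n)^{\max(\alpha,\beta)}}{n^{\min(\alpha,\beta-1)}},$$
where $C', C'' > 0$ are two constants.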
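Corollary 2. Suppose $\omega(t) = e^{-\alpha|\log|\log t||^\beta}$ $(\alpha > 0,\ \beta > 1)$. Suppose $\psi \in H^\omega$ is the potential and $\varphi \in H^\omega$ is any function with $\int\varphi\,d\nu = 0$. Then for any $\epsilon > 0$ there exists $B = B(\alpha, \beta, \epsilon) > 0$ such that
$$\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty \le Be^{-(\alpha-\epsilon)(\log n)^\beta} \qquad (n \ge 1).$$
Proof. We first show the estimate (for small $t$)
$$\tilde\omega(t) \le C\int_0^t e^{-\alpha(\log\log\frac{1}{\xi})^\beta}\,\frac{d\xi}{\xi} \le C\,\frac{|\log t|}{(\log\log\frac{1}{t})^{\beta-1}}\,e^{-\alpha(\log\log\frac{1}{t})^\beta}.$$
In fact, by making successively the two changes of variables $u = |\log\xi|$ and $v = u/|\log t|$, we get
$$\tilde\omega(t) \le C\int_{|\log t|}^\infty e^{-\alpha(\log u)^\beta}\,du = C|\log t|\int_1^\infty e^{-\alpha(\log v + \log\log\frac{1}{t})^\beta}\,dv.$$
By using the inequality $(1+x)^\beta \ge 1 + \beta x$ $(\beta \ge 1,\ x \ge 0)$, we get
$$\tilde\omega(t) \le C|\log t|\,e^{-\alpha(\log\log\frac{1}{t})^\beta}\int_1^\infty e^{-\beta\alpha(\log\log\frac{1}{t})^{\beta-1}\log v}\,dv.$$
Now note that the integrand in the last integral is a power of $v$, namely $v^{-\beta\alpha(\log\log\frac{1}{t})^{\beta-1}}$, and its integral is equal to
$$\Big(\beta\alpha\big(\log\log\tfrac{1}{t}\big)^{\beta-1} - 1\Big)^{-1}.$$
Apply Theorem 2 by choosing
$$n_0 = n_j - n_{j-1} = \frac{n}{\log^q n} \quad (1 \le j \le \ell) \qquad\text{with}\qquad \ell = \log^q n - 1,$$
where $q > \beta$. We get that, up to a multiplicative constant, $\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty$ is bounded by the sum of the following three terms:
$$e^{-\alpha(\log n - q\log\log n)^\beta}, \qquad \log^q n\cdot\frac{n\log^{-q}n}{(\log n - q\log\log n)^{\beta-1}}\cdot e^{-\alpha(\log n - q\log\log n)^\beta}, \qquad e^{\log\gamma\cdot\log^q n}.$$
To finish the proof, it suffices to note that the second term is the largest and that it is bounded, up to a constant, by $Be^{-(\alpha-\epsilon)(\log n)^\beta}$.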
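Corollary 3. Suppose $\omega(t) = e^{-\alpha|\log t|^\beta}$ $(\alpha > 0,\ 0 < \beta < 1)$. Suppose $\psi \in H^\omega$ is the potential and $\varphi \in H^\omega$ is any function with $\int\varphi\,d\nu = 0$. Then there exist $B = B(\alpha, \beta) > 0$ and $C = C(\alpha, \beta) > 0$ such that
$$\|\rho^{-n}\mathcal{L}^n\varphi\|_\infty \le Be^{-Cn^{\beta/(1+\beta)}} \qquad (n \ge 1).$$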
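Proof. We show first the estimate
$$\tilde\omega(t) \le \int_0^t e^{-\alpha|\log\xi|^\beta}\,\frac{d\xi}{\xi} \le C\,|\log t|^{1-\beta}\,e^{-\alpha|\log t|^\beta}.$$
By making the change of variables $u = |\log\xi|$, we get
$$\tilde\omega(t) \le \int_{|\log t|}^\infty e^{-\alpha u^\beta}\,du.$$
So it suffices to show that, for all sufficiently large $R>0$,
$$\int_R^\infty e^{-\alpha u^\beta}\,du \le CR^{1-\beta}e^{-\alpha R^\beta}.$$
Observe that for $z\ge1$,
$$\int_z^\infty e^{-x^2}x\,dx = \frac12e^{-z^2},\qquad \int_z^\infty e^{-x^2}x^a\,dx = \frac12z^{a-1}e^{-z^2} + \frac{a-1}{2}\int_z^\infty e^{-x^2}x^{a-2}\,dx.$$
Let $a\ge1$ and let $q$ be the smallest integer such that $a-2q\le1$. Applying the last equality $q$ times enables us to get
$$\int_z^\infty e^{-x^2}x^a\,dx \le C\Big(z^{a-1}e^{-z^2} + \int_z^\infty e^{-x^2}x^{a-2q}\,dx\Big) \le C\Big(z^{a-1}e^{-z^2} + \int_z^\infty e^{-x^2}x\,dx\Big) \le Cz^{a-1}e^{-z^2}.$$
Now, to obtain the claimed inequality, it suffices to apply the above inequality, with $a = 2/\beta-1>1$ and $z = \sqrt{\alpha R^\beta}\ge1$, to the right-hand side of the following equality:
$$\int_R^\infty e^{-\alpha u^\beta}\,du = \frac{2}{\beta\alpha^{1/\beta}}\int_{\sqrt{\alpha R^\beta}}^\infty e^{-x^2}x^{\frac2\beta-1}\,dx;$$
indeed $z^{a-1}e^{-z^2} = \alpha^{\frac1\beta-1}R^{1-\beta}e^{-\alpha R^\beta}$.
Apply Theorem 2 by choosing
$$n_0 = n_j - n_{j-1} = n^{\frac1{1+\beta}}\quad(1\le j\le\ell),\qquad \ell = n^{\frac\beta{1+\beta}} - 1.$$
Then, up to a multiplicative constant, $\|\rho^{-n}L^n\phi\|_\infty$ is bounded by
$$e^{-\alpha n^{\frac\beta{1+\beta}}} + n^{\frac\beta{1+\beta}}\,n^{\frac{1-\beta}{1+\beta}}\,e^{-\alpha n^{\frac\beta{1+\beta}}} + e^{\log\gamma\cdot n^{\frac\beta{1+\beta}}}.$$
It is clear that each of the above three terms is bounded by $Ce^{-Bn^{\beta/(1+\beta)}}$ for any fixed $0<B<\min(\alpha,|\log\gamma|)$.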
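The key analytic fact in this proof is the tail bound $\int_R^\infty e^{-\alpha u^\beta}du\le CR^{1-\beta}e^{-\alpha R^\beta}$ for $0<\beta<1$. As a hedged numerical illustration (our own, not from the paper), the substitution $w=\alpha(u^\beta-R^\beta)$ rewrites the ratio of the two sides as $\frac{1}{\alpha\beta}\int_0^\infty e^{-w}\big(1+\frac{w}{\alpha R^\beta}\big)^{1/\beta-1}dw$. This is finite, decreases in $R$ and tends to $1/(\alpha\beta)$, so any $C$ at least equal to its value at the smallest $R$ considered works. The sketch evaluates the ratio by Simpson's rule; the function name, truncation point and step count are arbitrary choices.

```python
import math


def tail_ratio(R, alpha, beta, w_max=80.0, n_steps=20000):
    """Return int_R^inf exp(-alpha*u**beta) du / (R**(1-beta) * exp(-alpha*R**beta)).

    Uses w = alpha*(u**beta - R**beta), so the ratio equals
    (1/(alpha*beta)) * int_0^inf exp(-w) * (1 + w/(alpha*R**beta))**(1/beta - 1) dw,
    truncated at w_max (the neglected tail is of order exp(-w_max))."""
    z = alpha * R ** beta
    k = 1.0 / beta - 1.0

    def integrand(w):
        return math.exp(-w) * (1.0 + w / z) ** k

    h = w_max / n_steps                   # n_steps is even: composite Simpson rule
    total = integrand(0.0) + integrand(w_max)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0 / (alpha * beta)


for R in (1.0, 1e2, 1e4, 1e8):
    print(R, tail_ratio(R, alpha=1.0, beta=0.25))   # decreases towards 1/(alpha*beta) = 4
```

For $\beta=1/2$ and $\beta=1/4$ the exponent $1/\beta-1$ is an integer, and the ratio has the closed forms $\frac2\alpha\big(1+\frac1{\alpha\sqrt R}\big)$ and $\frac4\alpha\big(1+\frac3z+\frac6{z^2}+\frac6{z^3}\big)$ with $z=\alpha R^{1/4}$, which the numerical values should reproduce.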
On Ruelle–Perron–Frobenius Operators. II. Convergence Speeds
5. Applications

5.1. Correlations. Suppose $(X,d)$ is a compact metric space and $f$ is an expanding and mixing dynamical system on $X$. Suppose $\psi$ is a potential function in $H_\omega$, where $\omega$ is a modulus of continuity satisfying the Dini condition. Let $\mu = h\nu_\psi$ be the Gibbs measure associated to $\psi$. Then $\tilde L^*\mu = \mu$ and $\mu$ is $f$-invariant (because $\tilde L(\phi\circ f) = \phi$). For a continuous function $\phi\in C$, $(\phi\circ f^n)_{n\ge0}$ is a stationary process defined on the probability space $(X,\mu)$. Its correlation is defined by
$$D(n) = \int(\phi\circ f^n)\,\phi\,d\mu - \Big(\int\phi\,d\mu\Big)^2.$$
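We have

Theorem 5. Under the same conditions as in Theorem 2,
$$|D(n)| \le C\|\phi\|_\infty\Big(\|\phi\|\,\omega(c_2\lambda^{-n_0}) + \|\phi\|_\infty\sum_{j=0}^{\ell}\omega\big(c_2\lambda^{p}\lambda^{-(n_j-n_{j-1})}\big) + \|\phi\|_\infty\,\gamma^{\ell}\Big),$$
where $1\le n_0<n_1<\cdots<n_\ell\le n$ with $n_j-n_{j-1}>p$, and $C>0$ is a constant.

Proof. Let $\tilde\phi = \phi - \langle\mu,\phi\rangle$. Then $\int\tilde\phi\,d\mu = 0$ and
$$D(n) = \int(\tilde\phi\circ f^n)\,\tilde\phi\,d\mu = \langle\mu,(\tilde\phi\circ f^n)\tilde\phi\rangle.$$
But
$$\langle\mu,(\tilde\phi\circ f^n)\tilde\phi\rangle = \langle\tilde L^{*n}\mu,(\tilde\phi\circ f^n)\tilde\phi\rangle = \langle\mu,\tilde L^n((\tilde\phi\circ f^n)\tilde\phi)\rangle = \langle\mu,\tilde\phi\,\tilde L^n\tilde\phi\rangle.$$
So $|D(n)|\le\|\tilde\phi\|_\infty\|\tilde L^n\tilde\phi\|_\infty$. Thus the claimed result follows from Theorem 2.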
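Theorem 5 can be illustrated on the simplest expanding and mixing example. The example is ours, not the paper's. Take $X$ to be the full one-sided shift on two symbols, identified with $[0,1)$ through binary expansions, $f$ the shift (the doubling map $x\mapsto2x \bmod 1$), $\mu$ the uniform Bernoulli measure (Lebesgue measure, the Gibbs measure of a constant potential), and $\phi(x)=x$, which is Lipschitz on the shift space. An elementary computation gives $\int_0^1x\{2^nx\}dx=\frac14+\frac{2^{-n}}{12}$, hence $D(n)=2^{-n}/12$: exponential decay. The sketch checks this with a midpoint rule whose grid is aligned with the jumps of $x\mapsto2^nx \bmod 1$. The function name and grid size are arbitrary choices.

```python
def doubling_map_correlation(n, grid_exp=16):
    """Midpoint-rule value of D(n) = int (phi o f^n) phi dmu - (int phi dmu)^2
    for f(x) = 2x mod 1, mu = Lebesgue measure on [0, 1) and phi(x) = x.

    The grid has 2**grid_exp cells, so for n <= grid_exp every jump of f^n
    lies on a cell boundary and phi * (phi o f^n) is a quadratic on each cell."""
    if n > grid_exp:
        raise ValueError("grid too coarse for this n")
    cells = 2 ** grid_exp
    xs = [(i + 0.5) / cells for i in range(cells)]
    mean = sum(xs) / cells                                  # int phi dmu = 1/2
    cross = sum(((2 ** n) * x % 1.0) * x for x in xs) / cells
    return cross - mean ** 2


for n in range(6):
    print(n, doubling_map_correlation(n), 2.0 ** -n / 12)   # the two columns agree closely
```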
5.2. Central limit theorem. Another way to describe the statistical properties of a dynamical system is the central limit theorem. For expanding and mixing dynamical systems, the central limit theorem holds thanks to Theorem 2.

Theorem 6. Let $\omega(t) = \frac{1}{|\log t|^{2+\varepsilon}}$ and $\omega_0(t) = \frac{1}{|\log t|^{1+\varepsilon}}$ ($\varepsilon>0$). Suppose $0<\psi\in H_\omega$ is a potential and $\mu = h\nu_\psi$ is the Gibbs measure associated to $\psi$ ($\mu$ is $f$-invariant). For any $\phi\in H_{\omega_0}$, we have
$$\lim_{n\to\infty}\mu\Big\{x : \sum_{j=0}^{n-1}\phi\circ f^j(x) - n\int\phi\,d\mu \le t\sqrt n\Big\} = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{t}\exp\Big(-\frac{x^2}{2\sigma^2}\Big)\,dx,$$
where $\sigma^2 = -E\phi^2 + 2\sum_{j=0}^\infty E(\phi\cdot\phi\circ f^j)$, $E\phi$ denoting $\langle\mu,\phi\rangle$.
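Proof. Without loss of generality, assume $\int\phi\,d\mu = 0$. Let $\mathcal B$ be the Borel σ-field. For $n\ge1$, let $\mathcal B_n = f^{-n}\mathcal B$. Define $V\phi = \phi\circ f$ for $\phi\in L^2(\nu)$. Let $V^*$ be the adjoint operator of $V: L^2\to L^2$. By Theorem 1.1 of [Li], it suffices to show the convergence of the following two series:
$$\sum_{n=0}^\infty|E(\phi V^n\phi)|<\infty,\qquad\sum_{n=0}^\infty E|V^{*n}\phi|<\infty.$$
Since $\tilde L^*\mu = \mu$,
$$E(\phi V^n\phi) = \langle\mu,\phi\cdot V^n\phi\rangle = \langle\tilde L^{*n}\mu,\phi\cdot V^n\phi\rangle = \langle\mu,\tilde L^n(\phi\cdot V^n\phi)\rangle = \langle\mu,\phi\,\tilde L^n\phi\rangle.$$
So
$$|E(\phi V^n\phi)|\le\|\phi\|_\infty\|\tilde L^n\phi\|_\infty.$$
Then, by Corollary 1, we have
$$|E(\phi V^n\phi)| = O\Big(\frac{(\log n)^{2+\varepsilon}}{n^{1+\varepsilon}}\Big).$$
Thus we get the convergence of the first series. On the other hand, observe that $V^*\phi = \tilde L\phi$. So
$$E|V^{*n}\phi|\le\|\tilde L^n\phi\|_\infty = O\Big(\frac{(\log n)^{2+\varepsilon}}{n^{1+\varepsilon}}\Big).$$
The convergence of the second series follows.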
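The variance formula of Theorem 6 can be compared with a simulation on a toy example. The illustration is ours, not the paper's. Take the doubling map on the one-sided 2-shift with the uniform Bernoulli measure and $\phi(x)=x-\frac12$, so that $\int\phi\,d\mu=0$, $E\phi^2=\frac1{12}$ and $E(\phi\cdot\phi\circ f^j)=2^{-j}/12$. Then $\sigma^2=-\frac1{12}+2\sum_{j\ge0}\frac{2^{-j}}{12}=\frac14$. Summing the covariances also gives, for the Birkhoff sum $S_n=\sum_{j<n}\phi\circ f^j$, the exact value $\mathrm{Var}(S_n)/n=\frac14-\frac1{3n}+\frac{2^{1-n}}{6n}$. The sketch draws $x$ through i.i.d. fair binary digits and evaluates $\phi(f^jx)$ from a window of 40 digits, a truncation we chose. The function names, sample sizes and seed are arbitrary.

```python
import random


def sigma2_from_series(terms=60):
    """sigma^2 = -E phi^2 + 2 * sum_{j>=0} E(phi * phi o f^j), with E(phi * phi o f^j) = 2**-j / 12."""
    corr = [2.0 ** -j / 12.0 for j in range(terms)]
    return -corr[0] + 2.0 * sum(corr)


def exact_variance_ratio(n):
    """Var(S_n) / n = 1/4 - 1/(3n) + 2**(1-n)/(6n), from summing the covariances 2**-|i-j| / 12."""
    return 0.25 - 1.0 / (3 * n) + 2.0 ** (1 - n) / (6 * n)


def simulated_variance_ratio(n=100, samples=5000, digits=40, seed=1):
    """Monte Carlo estimate of Var(S_n)/n, S_n = sum_{j<n} phi(f^j x), phi(x) = x - 1/2."""
    rng = random.Random(seed)
    width = n + digits                      # binary digits of x actually used
    mask = (1 << digits) - 1
    scale = 2.0 ** -digits
    total = total_sq = 0.0
    for _ in range(samples):
        bits = rng.getrandbits(width)       # most significant bit = first binary digit of x
        s = 0.0
        for j in range(n):
            # f^j x = 0.b_{j+1} b_{j+2} ..., truncated to `digits` digits
            window = (bits >> (width - j - digits)) & mask
            s += window * scale - 0.5
        total += s
        total_sq += s * s
    mean = total / samples
    return (total_sq / samples - mean ** 2) / n


print(sigma2_from_series(), exact_variance_ratio(100), simulated_variance_ratio())
```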
Acknowledgement. This work was started when the second author visited the Faculté de Mathématiques et Informatique at the Université de Picardie Jules Verne in Amiens, France and it was done when the first author visited the Institute of Mathematical Science (IMS) at the Chinese University of Hong Kong and when the second author visited the Institute for Mathematical Research (FIM) at ETH-Zürich in Switzerland. The authors would like to thank these institutes for hospitality and support. The authors also would like to thank Professors A.-S. Sznitman, K. S. Lau and O. Lanford III for their interest and helpful discussions. Thanks also go to A. Rivière and Y. L. Ye for valuable comments.
References
[Bo] Bowen, R.: Equilibrium states and the ergodic theory of Anosov diffeomorphisms. LNM 470, Berlin: Springer, 1975
[DF] Dooley, A.H. and Fan, A.H.: Chains of Markovian projections and (G, G)-measures. In: Trends in Probability and Related Analysis, eds. N. Kono and N.R. Shieh, Singapore: World Scientific, 1997, pp. 101–116
[Fa] Fan, A.H.: A proof of the Ruelle theorem. Reviews Math. Phys. 7, 8, 1241–1247 (1995)
[FP] Fan, A.H. and Pollicott, M.: Non-homogeneous equilibrium states and convergence speeds of averaging operators. Math. Proc. Camb. Phil. Soc. (2000), to appear
[FJ] Fan, A.H. and Jiang, Y.P.: On Ruelle–Perron–Frobenius Operators. I. Ruelle’s Theorem. Commun. Math. Phys. 223, 125–141 (2001)
[He] Hennion, H.: Sur un théorème spectral et ses applications aux noyaux lipschitziens. Proc. Am. Math. Soc. 118, 627–634 (1993)
[Ji] Jiang, Y.: A Proof of existence and simplicity of a maximal eigenvalue for Ruelle–Perron–Frobenius operators. Lett. Math. Phys. 48, 211–219 (1999)
[Li] Liverani, C.: Central limit theorem for deterministic systems. In: International Congress on Dynamical Systems, Montevideo 95, Proceedings, Research Notes in Mathematics series, London: Pitman, 1996, pp. 56–75
[PP] Parry, W. and Pollicott, M.: Zeta Functions And The Periodic Orbit Structure of Hyperbolic Dynamics. Astérisque (1990)
[Ru1] Ruelle, D.: Statistical mechanics of a one-dimensional lattice gas. Commun. Math. Phys. 9, 267–278 (1968)
[Ru2] Ruelle, D.: A measure associated with Axiom A attractors. Am. J. Math. 98, 619–654 (1976)
[Wa] Walters, P.: Invariant measures and equilibrium states for some mapping which expand distances. Trans. A.M.S. 236, 121–153 (1978)

Communicated by Ya. G. Sinai
Commun. Math. Phys. 223, 161 – 203 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Critical Behavior of the Massless Free Field at the Depinning Transition
Erwin Bolthausen1, Yvan Velenik2
1 Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland.
2 L.A.T.P., UMR-CNRS 6632, C.M.I., 39 rue Joliot Curie, 13453 Marseille Cedex 13, France.
Received: 1 November 2000 / Accepted: 15 June 2001
Abstract: We consider the $d$-dimensional massless free field localized by a $\delta$-pinning of strength $\varepsilon$. We study the asymptotics of the variance of the field (when $d=2$), and of the decay rate of its 2-point function (when $d\ge2$), as $\varepsilon$ goes to zero, for general Gaussian interactions. Physically speaking, we thus rigorously obtain the critical behavior of the transverse and longitudinal correlation lengths of the corresponding $(d+1)$-dimensional effective interface model in a non-mean-field regime. We also describe the set of pinned sites at small $\varepsilon$, for a broad class of $d$-dimensional massless models.

1. Introduction

The behavior of a two-dimensional interface at phase transitions has been much studied in the physics literature, especially regarding some models of wetting. The latter problem arises when one considers an interface above an attractive wall. Then there is a competition between attraction by the wall and repulsion due to the decrease of entropy for interfaces close to it. Often, by tuning some external parameter (the temperature, or the strength of the attraction), two behaviors are possible: either energy wins, and the interface stays localized along the wall, or entropy wins, and the interface is repelled to a distance from the wall that diverges as the size of the system grows. The corresponding transition is called the wetting transition. Usually in Nature this transition is first-order, which means here that the average height of the interface above the wall stays uniformly bounded as the parameter approaches the critical value from the localized phase, and makes a jump “to infinity” (in the thermodynamic limit) at the transition. There are, however, cases where this transition is second-order (the two-dimensional Ising model is a nice theoretical example, but this behavior can also be observed in real systems); this is the so-called critical wetting. In this case, the average height of the interface diverges continuously as the critical value is approached.
It is then of interest to characterize this divergence. We refer to [11] for references to the (non-rigorous) results which have been obtained.
E. Bolthausen, Y. Velenik
Unfortunately, very little is known rigorously about the behavior of two-dimensional interfaces at a critical wetting transition, even for simple effective interface models. There are some results on part of the so-called “mean-field” regime [10, 19], but nothing concerning the more interesting ones. In the present work, we study the critical behavior of a $d$-dimensional interface localized by a $\delta$-pinning (defined below). The main focus will be on the most difficult and physically most relevant two-dimensional case, but the other cases will also be discussed. Though this problem is clearly simpler than the wetting transition, it has the advantage of being non-mean-field while being rigorously tractable; we make some additional comments on the wetting problem at the end of Sect. 2. Let $\Lambda\subset\mathbb Z^d$ be finite. We consider the following class of massless gradient models in $\Lambda$, with 0-boundary conditions, described by the following probability measures on $\mathbb R^{\mathbb Z^d}$ ($\delta_0$ is the point mass at 0):
$$\mu_\Lambda(d\phi) \stackrel{\rm def}{=} \frac{1}{Z_\Lambda}\exp\Big(-\frac\beta2\sum_{x,y}p(x-y)\,V(\phi_x-\phi_y)\Big)\prod_{x\in\Lambda}d\phi_x\prod_{x\notin\Lambda}\delta_0(d\phi_x), \qquad (1.1)$$
where $V$ is an even and convex function, and $\beta>0$. We assume that $p(x) = p(-x)\ge0$, $\sum_{x\in\mathbb Z^d}p(x) = 1$, that for any $x\in\mathbb Z^d$ there exists a path $0\equiv x_0,x_1,\dots,x_n\equiv x$ such that $p(x_k-x_{k-1})>0$, $k=1,\dots,n$, and, at least,
$$\sum_{x\in\mathbb Z^2}p(x)\,|x|^{2+\delta}<\infty \qquad (1.2)$$
for some $\delta>0$. We denote by $\mu_\Lambda$ the (Gaussian) measure corresponding to the particular choice $V(x) = \frac12x^2$. A superscript will always be used for quadratic interactions. It is well known that for $d=2$ these measures describe a random field with unbounded fluctuations as $\Lambda\uparrow\mathbb Z^d$, diverging logarithmically with the size of $\Lambda$ if the limit is taken along a sequence of cubes, say, while for $d\ge3$ the variance stays bounded. For the Gaussian case this follows from the well-known random walk representation of the covariances
$$\mu_\Lambda(\phi_x\phi_y) = \frac1\beta\,E_x\Big[\sum_{n=0}^{T^{\rm out}_\Lambda}I(X_n=y)\Big], \qquad (1.3)$$
where $(X_n)$ is a random walk, starting at $x$ under $P_x$, with transition probabilities $P_x(X_1=y) = p(y-x)$, $T^{\rm out}_\Lambda$ is the first exit time from $\Lambda$, and $I(\cdot)$ denotes the indicator function of a set. A two-dimensional symmetric random walk satisfying (1.2) is recurrent, and so for $d=2$ the divergence of the variances as $\Lambda\uparrow\mathbb Z^2$ follows. In higher dimensions, random walks are transient, and therefore the variance stays bounded. Notice, however, that even in two dimensions a random walk satisfying $\sum_xp(x)|x|^{2-\delta}=\infty$ for some $\delta>0$ is transient. For more general convex interaction functions $V$, the corresponding results follow by an application of the Brascamp–Lieb inequality (see [4]). It turns out, however, that the addition of an arbitrarily weak self-potential breaking the continuous symmetry of the Hamiltonian, $\phi\to\phi+c$, $c\in\mathbb R$, is enough to localize
Critical Behavior of Massless Free Field at Depinning Transition
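the field. More precisely, if $a$ and $b$ are two strictly positive real numbers, then we perturb the measures $\mu_\Lambda$ with a “square well” potential:
$$\mu^{a,b}_\Lambda(\,\cdot\,) \stackrel{\rm def}{=} \frac{\mu_\Lambda\big(\,\cdot\,\exp[b\sum_{x\in\Lambda}I(|\phi_x|\le a)]\big)}{\mu_\Lambda\big(\exp[b\sum_{x\in\Lambda}I(|\phi_x|\le a)]\big)}. \qquad (1.4)$$
Another type of pinning, mathematically slightly more convenient, has also been investigated: the so-called $\delta$-pinning. It corresponds to the weak limit of the above measures when $a\to0$ and $2a(e^b-1)=\varepsilon$, for some $\varepsilon>0$, and has the following representation:
$$\mu^\varepsilon_\Lambda(d\phi) = \frac{1}{Z^\varepsilon_\Lambda}\exp\Big(-\frac\beta2\sum_{x,y}p(x-y)\,V(\phi_x-\phi_y)\Big)\prod_{x\in\Lambda}\big(d\phi_x+\varepsilon\delta_0(d\phi_x)\big)\prod_{x\notin\Lambda}\delta_0(d\phi_x). \qquad (1.5)$$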
The most natural question in two dimensions is whether a thermodynamic limit of these measures as $\Lambda\uparrow\mathbb Z^2$ exists. The answer is most probably “yes”, but we cannot prove this, except in the Gaussian case with $\delta$-pinning (Proposition 2.1 below). A somewhat simpler question is whether the variance stays bounded uniformly in $\Lambda$. This was shown for $\mu^{a,b}_\Lambda$ in the Gaussian nearest-neighbor case in [9], and was finally proved in [8] much more generally, assuming only $V''\ge\text{const.}>0$. Moreover, it was shown in [16] that the covariances $\mu^\varepsilon_\Lambda(\phi_x\phi_y)$ decay exponentially in $|x-y|$, uniformly in $\Lambda$, provided $0<\text{const.}\le V''\le\text{const.}<\infty$ (see also [2] for the Gaussian nearest-neighbor case). The discussion in [16] is restricted to the $\delta$-pinning case, but it could probably be extended to the square well case, at least for quadratic interactions. The aim of the present paper is to obtain a precise description of the behavior of the variance of the field (or equivalently, in more physical terminology, of the transverse correlation length) and of the rate of decay of the covariance (or of the longitudinal correlation length), as one approaches the depinning transition, i.e. as the strength $\varepsilon$ of the pinning potential goes to zero. The latter question is also of interest for $d\ge3$. For the Gaussian $\delta$-pinning case, we determine exactly the divergence of the variance for $d=2$ (Theorem 2.2) as a function of the pinning parameter $\varepsilon$, and the $\varepsilon$-dependence of the mass for $d\ge2$, including the correct power of the logarithmic correction to the power-law dependence in $\varepsilon$ for $d=2$ (Theorem 2.3). There are two main ingredients to our approach. By a simple expansion, namely expanding the product $\prod_{x\in\Lambda}$ in (1.5), we obtain a representation of the random field as a mixture of free measures (1.1). The mixture is given in terms of the distribution of pinned sites. For the $\delta$-pinning case, this is particularly simple: $\mu^\varepsilon_\Lambda$ generates a law on subsets $A\subset\Lambda$, the set of sites inside $\Lambda$ where the random field is 0. Conditioned on this set, the field is then just the free field (1.1) on $A^c\stackrel{\rm def}{=}\Lambda\setminus A$ with 0-boundary conditions on $(\mathbb Z^d\setminus\Lambda)\cup A$. It is therefore crucial to have information on the distribution of pinned sites, which we denote by $\nu^\varepsilon_\Lambda$ (see the precise definition in (2.3)). The main result on this problem is a domination property of this distribution by Bernoulli measures from above and below. The difficulty in dimension two (in contrast with the situation in higher dimensions) is that, strictly speaking, there is no sharp domination, i.e. with the same $\varepsilon$ behavior from above and below, but, surprisingly, correlations can be estimated as if there were such a domination. This is the content of Theorem 2.4, which is proved for general convex interactions.
The main results on the depinning properties (Theorem 2.2 and Theorem 2.3) are, however, proved for the Gaussian $\delta$-pinning case only. The restriction to the Gaussian case is mainly due to the fact that we need precise information on the behavior of various objects appearing in the random walk representation (1.3), like estimates of Green functions and ranges of the random walk. One might hope that this could be extended with the help of the Helffer–Sjöstrand representation (see [7]), which gives a representation similar to (1.3) also for the case of convex, even interactions. However, this random walk representation is a much more complicated object, and the precise information we need is not yet available in this case. The restriction to the $\delta$-pinning case, which is made here mainly for technical convenience, is more innocuous and could probably be much relaxed by replacing the simple expansion of products by the more sophisticated Brydges–Fröhlich–Spencer random walk representation, see [5] (not to be confused with the Helffer–Sjöstrand representation). The critical behavior of the 2-point correlation function has also been obtained in [10] (see also [19]) in the mean-field regime mentioned at the beginning of the introduction. We briefly describe the setting and the result in order to show the difference with the regime studied here. The measure considered in [10] is
$$\mu^U_\Lambda(d\phi) \stackrel{\rm def}{=} \frac{1}{Z^U_\Lambda}\prod_{\langle xy\rangle}e^{-\frac12(\phi_x-\phi_y)^2}\prod_{x\in\Lambda}e^{-U(\phi_x)}\,d\phi_x\prod_{y\notin\Lambda}\delta_0(d\phi_y),$$
where $\langle xy\rangle$ denotes nearest-neighbor sites, and
$$U(x) = -c\big(e^{-\frac{x^2}{2q^2}}-1\big).$$
Then, provided¹ $K\log(1+c^{-1})<\sqrt q$ and $0<c\le1$, for some sufficiently large constant $K$, it is proved that
$$\mu^U(\phi_x\phi_y)\le K\log(q/\sqrt c)\,e^{-D\frac{\sqrt c}{q}|x-y|},$$
with the constant $D\to1$ if $c$ is fixed and $q\to\infty$. The heuristics behind this result are rather clear. Under the above assumption, the quadratic approximation $U(x)\approx\frac{c}{2q^2}x^2$ holds over a huge range of values of $x$. Over this range of values the measure $\mu^U$ behaves like a massive Gaussian model with mass $m=\sqrt c/q$, and therefore, provided the interface stays mostly there, the exponential decay should be given by this mass. The main part of the proof in [10, 19] was then to show perturbatively that the interface indeed remains essentially all the time in this range. The $\delta$-pinning corresponds to an opposite regime, where instead of having a very wide and shallow potential well, one has a very narrow and deep one. It is far less clear a priori what the behavior of the correlation lengths should be in this case, since the latter cannot be read from the self-potential.

The paper is organized as follows. In the next section we state the results precisely. In Sect. 3, we prove the main domination results. Section 4 proves the results on the variance, and Sect. 5 those on the covariance. In Appendix A, we prove the existence of the mass in the Gaussian case. We will also need precise results about standard random walks, and the number of points visited by random walks. Some of these properties are standard, but others are more delicate. We collect what we need in Appendix B and Appendix C. To complete the picture, we briefly sketch the one-dimensional situation in Appendix D, but only in the nearest-neighbor case $p(\pm1)=1/2$, which is easily reduced to standard renewal theory.

¹ It is emphasized in [10] that this condition is actually too strong and that the result should be true under the weaker condition that $K\log(1+c^{-1})<q$, which characterizes the mean-field regime.

2. Results

The basic assumption is that the (symmetric) transition kernel $(p(x))_{x\in\mathbb Z^d}$ is irreducible and satisfies (1.2). Only in Theorem 2.3 do we need a stronger assumption. We write $X_0,X_1,X_2,\dots$ for a random walk with these transition probabilities, and $P_x$ for the corresponding law for a walk starting in $x$. With $X[0,n]$ we denote the set of points visited by the walk up to time $n$, and by $|X[0,n]|$ the number of points visited. If $p(0)=0$, then we remark that the interface model is not changed if we replace $p$ by its half, putting $p(0)=1/2$, and double $\beta$. We can therefore as well assume that $p$ is aperiodic, and especially that for any $x\in\mathbb Z^2$, $p_n(x)>0$ for large enough $n$, where $p_n$ is the $n$-fold convolution of $p$. We denote by $C$ or $C'$, $C''$ generic constants, not necessarily the same at different occurrences, which may depend on $p$ and the dimension $d$, but on nothing else, unless explicitly stated. Our first result complements estimates obtained in [2, 8], where it was shown for $d=2$ that, provided $V''\ge c>0$ and $p(\cdot)$ satisfies (1.2), there exists a constant $C>0$ (depending on $p$ only) such that, for small enough $e\stackrel{\rm def}{=}2a\sqrt{\beta c}\,(e^b-1)>0$,
$$\sup_\Lambda\mu^{a,b}_\Lambda(\phi_0^2)\le C\Big(\frac{1}{c\beta}\log\frac1e+a^2\Big).$$
We are going to show that this upper bound indeed corresponds to the correct behavior. Let $Q$ be the covariance matrix of $p$: $Q(i,j)\stackrel{\rm def}{=}\sum_{x\in\mathbb Z^d}x_ix_jp(x)$.

Theorem 2.1. Assume $d=2$ and let $V$ be an even $C^2$ function with $0\le V''(x)\le c$ for all $x$.
Then in the square well pinning case, there exists a constant $C>0$ (depending only on $p$) such that for any $e\stackrel{\rm def}{=}e(a,b,\beta)=2a\sqrt{\beta c}\,(e^b-1)$ small enough, and provided $a^2\le(8\beta c\pi\det Q)^{-1}|\log e|$,
$$\liminf_{\Lambda\uparrow\mathbb Z^2}\mu^{a,b}_\Lambda(\phi_0^2)\ge\frac{C}{c\beta}|\log e|.$$
This remains true for $\delta$-pinning with $e=\varepsilon\sqrt{\beta c}$ (and $a=0$). Our next two results are for the Gaussian (i.e. $V(x)=x^2/2$) case and $\delta$-pinning. For this case there is a simple proof of the existence of a thermodynamic limit.

Proposition 2.1. The thermodynamic limit
$$\mu^\varepsilon\stackrel{\rm def}{=}\lim_{\Lambda\uparrow\mathbb Z^d}\mu^\varepsilon_\Lambda$$
exists in all dimensions and is translation invariant. The limit is defined in terms of limits of integrals over bounded local functions.
Proof. This is an immediate consequence of the corresponding property for the law of the pinned sites, given in Lemma 2.1 below.

Our main result on the behavior of the variance in the Gaussian case is the following.

Theorem 2.2. Assume $d=2$. There exist $\varepsilon_0>0$ and $C>0$ such that for all $\varepsilon$ and $\beta$ satisfying $0<\varepsilon\sqrt\beta<\varepsilon_0$,
$$\Big|\mu^\varepsilon(\phi_0^2)-\frac{|\log\sqrt\beta\varepsilon|}{2\pi\beta\sqrt{\det Q}}\Big|\le C\log|\log\sqrt\beta\varepsilon|.$$
The second quantity we are interested in is the decay rate of the covariance (i.e. the mass). This is of interest also in the higher-dimensional case. It is defined, for $x$ on the unit sphere $x\in S^{d-1}\stackrel{\rm def}{=}\{x\in\mathbb R^d:|x|=1\}$, as the limit
$$m_\varepsilon(x)=-\lim_{k\to\infty}\frac1k\log\mu^\varepsilon(\phi_0\phi_{[kx]}), \qquad (2.1)$$
where $[kx]$ is the integer part of $kx$, componentwise. The existence of this limit, in the Gaussian case, is proved in Appendix A. The following theorem shows that in the Gaussian case $m_\varepsilon\sim\varepsilon^{1/2+o(1)}$ as $\varepsilon$ goes to zero, provided the coupling $p(\cdot)$ has an exponential moment.

Theorem 2.3. Consider the case of $\delta$-pinning and Gaussian interaction, and assume that there exists $a>0$ such that
$$\sum_{x\in\mathbb Z^d}p(x)\,e^{a|x|}<\infty. \qquad (2.2)$$
a) Assume $d=2$. Then there exist $\varepsilon_0>0$ and constants $0<C_1\le C_2<\infty$ (depending only on $p$) such that
$$C_1\frac{(\sqrt\beta\varepsilon)^{1/2}}{|\log\sqrt\beta\varepsilon|^{3/4}}\le m_\varepsilon(x)\le C_2\frac{(\sqrt\beta\varepsilon)^{1/2}}{|\log\sqrt\beta\varepsilon|^{3/4}}$$
for all $0<\varepsilon\sqrt\beta<\varepsilon_0$ and for any $x\in S^1$.
b) Assume $d\ge3$. Then there exist $\varepsilon_0>0$ and constants $0<C_1<C_2<\infty$ (depending only on $p$ and $d$) such that
$$C_1(\sqrt\beta\varepsilon)^{1/2}\le m_\varepsilon(x)\le C_2(\sqrt\beta\varepsilon)^{1/2}$$
for all $0<\varepsilon\sqrt\beta<\varepsilon_0$ and for any $x\in S^{d-1}$.

Remark 2.1. 1. The theorem gives much more than just the correct power-law decay $\varepsilon^{1/2}$ of the mass, since it shows that there are no logarithmic corrections when $d\ge3$, while it provides the correct power for the logarithmic correction when $d=2$. The most precise result one might expect to hold in the latter case would be
$$m_\varepsilon(x)=\frac{(\sqrt\beta\varepsilon)^{1/2}}{|\log\sqrt\beta\varepsilon|^{3/4}}\,\varphi(x)\,(1+o(1)),$$
where $\varphi$ is a positive function on $S^1$ which is bounded and bounded away from 0. Our techniques, however, do not give such precise information. 2. The assumption on the existence of an exponential moment is essentially optimal: otherwise, there is no positive mass. Indeed, it is easy to show that the decay of the covariance cannot be faster than that of $p(\cdot)$: in the random-walk representation of $\mu^\varepsilon(\phi_0\phi_x)$, see (2.17), we get a lower bound by letting the random walk jump directly from 0 to $x$. Probably this “one-jump” contribution gives the leading order of the decay of correlations correctly, but we don’t have a proof.

Remark 2.2. The temperature parameter $\beta$ enters only in a trivial way. If we replace the field $(\phi_x)$ by $(\sqrt\beta\phi_x)$, and $\varepsilon$ by $\sqrt\beta\varepsilon$, we have transformed the model to temperature parameter $\beta=1$. In the proofs, we will therefore always assume $\beta=1$.

As remarked in the introduction, the mechanism at play is that the potential will randomly pin some sites at height 0 or close to 0. The main point therefore is to find the properties of the distribution of these pinned sites. Precise information about this distribution is used in essential ways in the proofs of the previous theorems. Since these results are also interesting per se, and yield a better understanding of the reason behind the behavior described above, we discuss them in some detail, and prove more than is needed for the proofs of Theorems 2.2 and 2.3. In particular, we do not restrict to the Gaussian case. Let us start by defining precisely what we mean by the set of pinned sites, and its distribution. The starting point is the following expansion: for any bounded measurable function $f$,
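$$\mu^{a,b}_\Lambda(f)=\frac{1}{Z^{a,b}_\Lambda}\int f(\phi)\,e^{-\frac\beta2\sum_{x,y}p(x-y)V(\phi_x-\phi_y)}\prod_{x\in\Lambda}\big[(e^b-1)I(|\phi_x|\le a)+1\big]d\phi_x\prod_{x\notin\Lambda}\delta_0(d\phi_x)$$
$$=\sum_{A\subset\Lambda}(e^b-1)^{|A|}\frac{Z^a_\Lambda(A)}{Z^{a,b}_\Lambda}\,\mu_\Lambda\big(f\,\big|\,|\phi_x|\le a,\ \forall x\in A\big)=\sum_{A\subset\Lambda}\nu^{a,b}_\Lambda(A)\,\mu_\Lambda\big(f\,\big|\,|\phi_x|\le a,\ \forall x\in A\big), \qquad (2.3)$$
where
$$\nu^{a,b}_\Lambda(A)\stackrel{\rm def}{=}(e^b-1)^{|A|}\frac{Z^a_\Lambda(A)}{Z^{a,b}_\Lambda},\qquad Z^a_\Lambda(A)\stackrel{\rm def}{=}Z_\Lambda\,\mu_\Lambda(|\phi_x|\le a,\ \forall x\in A).$$
Therefore the effect of the potential can be seen as pinning, i.e. constraining to the interval $[-a,a]$, a random set of points, the pinned sites. The distribution of the latter is given by the probability measure $\nu^{a,b}_\Lambda$. We’ll denote by $A$ the corresponding random variable, taking values in the subsets of $\Lambda$. A completely similar representation is obtained in the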
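case of $\delta$-pinning by just expanding the term $\prod_{x\in\Lambda}(d\phi_x+\varepsilon\delta_0(d\phi_x))$. The result reads
$$\mu^\varepsilon_\Lambda(f)=\sum_{A\subset\Lambda}\nu^\varepsilon_\Lambda(A)\,\mu_{A^c}(f), \qquad (2.4)$$
where $A^c\stackrel{\rm def}{=}\Lambda\setminus A$ and $\nu^\varepsilon_\Lambda(A)\stackrel{\rm def}{=}\varepsilon^{|A|}Z_{A^c}/Z^\varepsilon_\Lambda$. The following lemma gives some basic properties of the distribution of pinned sites.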
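Lemma 2.1. Suppose that Griffiths’ inequalities (in the sense of [14]) hold for the measure $\mu_\Lambda$. Then
1. $\nu^{a,b}_\Lambda$ and $\nu^\varepsilon_\Lambda$ satisfy the lattice condition, i.e.
$$\nu^{a,b}_\Lambda(A\cup B)\,\nu^{a,b}_\Lambda(A\cap B)\ge\nu^{a,b}_\Lambda(A)\,\nu^{a,b}_\Lambda(B) \qquad (2.5)$$
for $A,B\subset\Lambda$, and similarly for $\nu^\varepsilon_\Lambda$. In particular, these two measures are strong FKG, see [12].
2. $\nu^\varepsilon\stackrel{\rm def}{=}\lim_{\Lambda\uparrow\mathbb Z^d}\nu^\varepsilon_\Lambda$ exists and is translation invariant.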
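Proof. Part 1 is very simple: in the square-well case (2.5) is equivalent to
$$\mu_\Lambda\big(|\phi_x|\le a,\ \forall x\in B\setminus A\,\big|\,|\phi_y|\le a,\ \forall y\in A\big)\ge\mu_\Lambda\big(|\phi_x|\le a,\ \forall x\in B\setminus A\,\big|\,|\phi_y|\le a,\ \forall y\in A\cap B\big),$$
which follows from Griffiths’ inequality. The $\delta$-pinning case is similar. Part 2 is easy, too: for any local increasing function $f$ (of the random set $A$) with support inside $\Delta\subset\mathbb Z^d$, and any finite $\Lambda'\supset\Lambda\supset\Delta$, one has
$$\nu^\varepsilon_\Lambda(f)=\nu^\varepsilon_{\Lambda'}(f\mid\Lambda'\setminus\Lambda\subset A)\ge\nu^\varepsilon_{\Lambda'}(f). \qquad (2.6)$$
Hence $\nu^\varepsilon_\Lambda(f)$ is non-increasing in $\Lambda$, and its limit as $\Lambda\uparrow\mathbb Z^d$ exists.
Translation invariance is a simple consequence of this. Indeed, let $x\in\mathbb Z^d$ and $T_xf=f(\cdot-x)$. Denoting by $\Lambda^+$ (respectively $\Lambda^-$) the biggest (respectively smallest) square box centered at $x$ contained in (respectively containing) $\Lambda$, we have
$$\nu^\varepsilon_{\Lambda^-}(T_xf)\le\nu^\varepsilon_\Lambda(T_xf)\le\nu^\varepsilon_{\Lambda^+}(T_xf),$$
provided $\Lambda$ is big enough. Taking the limit $\Lambda\uparrow\mathbb Z^d$ and using the fact that $\nu^\varepsilon_{\Lambda^-}(T_xf)=\nu^\varepsilon_{T_{-x}\Lambda^-}(f)$, and the corresponding statement for $\Lambda^+$, we get $\nu^\varepsilon(f)=\nu^\varepsilon(T_xf)$, which implies the desired result.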
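Part 1 of Lemma 2.1 can be checked by brute force in a tiny Gaussian example. The sketch below is our illustration; the one-dimensional box $\Lambda=\{1,\dots,6\}$, the kernel $p(\pm1)=\frac12$, $\beta=1$ and all function names are assumptions. It uses (2.4) in the form $\nu^\varepsilon_\Lambda(A)\propto\varepsilon^{|A|}Z_{A^c}$. Assuming the sum in (1.1) runs over ordered pairs, for $V(x)=x^2/2$ the free partition function on $D\subset\Lambda$ with zero boundary conditions is $Z_D=(2\pi/\beta)^{|D|/2}\det\big((I-P)_D\big)^{-1/2}$. The sketch then tests (2.5) for every pair $A,B\subset\Lambda$. The check does not rely on Griffiths' inequalities: in this Gaussian case (2.5) is equivalent to Koteljanskii's inequality $\det M_{S\cup T}\det M_{S\cap T}\le\det M_S\det M_T$ for the positive definite matrix $M=(I-P)_\Lambda$.

```python
import itertools
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (fine for small matrices)."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def log_z(free_sites, beta=1.0):
    """log Z_D for V(x) = x^2/2, p(+-1) = 1/2 on Z, free sites D, all other sites fixed at 0."""
    d = sorted(free_sites)
    m = [[1.0 if x == y else (-0.5 if abs(x - y) == 1 else 0.0) for y in d] for x in d]
    return 0.5 * len(d) * math.log(2 * math.pi / beta) - 0.5 * math.log(det(m))


def pinned_law(sites, eps, beta=1.0):
    """nu(A) proportional to eps**|A| * Z_(Lambda minus A), for every subset A of Lambda = sites."""
    log_w = {}
    for r in range(len(sites) + 1):
        for a in itertools.combinations(sites, r):
            free = [x for x in sites if x not in a]
            log_w[frozenset(a)] = r * math.log(eps) + log_z(free, beta)
    top = max(log_w.values())
    norm = sum(math.exp(v - top) for v in log_w.values())
    return {a: math.exp(v - top) / norm for a, v in log_w.items()}


def satisfies_lattice_condition(nu, rel_tol=1e-9):
    """Check nu(A union B) * nu(A intersect B) >= nu(A) * nu(B) for all pairs, i.e. (2.5)."""
    sets = list(nu)
    return all(nu[a | b] * nu[a & b] >= nu[a] * nu[b] * (1 - rel_tol) for a in sets for b in sets)


sites = list(range(1, 7))
for eps in (0.01, 1.0, 50.0):
    print(eps, satisfies_lattice_condition(pinned_law(sites, eps)))   # True for each eps
```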
Remark 2.3. 1. Griffiths’ inequalities are known to hold in the Gaussian case, see [14]. 2. Part 1 of the lemma is of course not specific to the cubic lattice: Griffiths’ inequality for $\mu_\Lambda$ implies the strong FKG property for the distribution of pinned sites on an arbitrary lattice. The following Theorem 2.4 is the key step for our analysis of the random fields. It states domination properties of the field of pinned sites by Bernoulli measures and is a substantial improvement on the results already present in [8, 16]. Although the main emphasis in this paper is on the (difficult) case of the two-dimensional lattice, we also include the higher-dimensional case.
Let us first introduce some standard notions. If $\nu_1$ and $\nu_2$ are two probability measures on the set $\{0,1\}^\Lambda$ of subsets of a finite set $\Lambda$, we say that $\nu_1$ dominates $\nu_2$ if for any increasing function $f:\mathcal P(\Lambda)\to\mathbb R$ we have
$$\nu_1(f)\ge\nu_2(f). \qquad (2.7)$$
We say that $\nu_1$ strongly dominates $\nu_2$ if for any $x\in\Lambda$ and any subset $C\subset\Lambda\setminus\{x\}$,
$$\nu_1(x\in A\mid A\setminus\{x\}=C)\ge\nu_2(x\in A\mid A\setminus\{x\}=C). \qquad (2.8)$$
It is evident that strong domination implies domination, and the latter implies that for any subset $B\subset\Lambda$, one has $\nu_1(A\cap B=\emptyset)\le\nu_2(A\cap B=\emptyset)$. We formulate the next theorem for the square-well case only. We set
$$\varepsilon=\varepsilon_{a,b}\stackrel{\rm def}{=}2a\big(e^b-1\big). \qquad (2.9)$$
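The $\delta$-pinning case follows either in an identical way, or by taking the limit as $a\to0$, keeping $\varepsilon$ fixed.

Theorem 2.4. Let $V$ be an even $C^2$ function.
1. Assume $d\ge2$ and suppose $0\le V''(x)\le c$, $\forall x$. Then there exists $C<\infty$, depending only on $p$ and $d$, such that for any finite $\Lambda\subset\mathbb Z^d$, the distribution $\nu^{a,b}_\Lambda$ of pinned sites is strongly dominated by the Bernoulli measure on $\{0,1\}^\Lambda$ with density $p'_-\stackrel{\rm def}{=}C\big(1\wedge(a\sqrt{\beta c})^{-1}\big)\sqrt{\beta c}\,\varepsilon$ ($\varepsilon$ given by (2.9)). In particular, for any $B\subset\mathbb Z^d$,
$$\nu^{a,b}_\Lambda(A\cap B=\emptyset)\ge(1-p'_-)^{|B|}. \qquad (2.10)$$
2. Assume $d=2$ and suppose that $V(x)=\frac12x^2$. For any $\alpha>0$, there exist $\varepsilon_0>0$ and $C(\alpha)<\infty$ such that, for $\sqrt\beta\varepsilon\le\varepsilon_0$, any finite $\Lambda\subset\mathbb Z^2$ and any $B\subset\mathbb Z^2$ with $d(B,\Lambda^c)>\varepsilon^{-\alpha}$,
$$\nu^{a,b}_\Lambda(A\cap B=\emptyset)\ge(1-p_-)^{|B|}, \qquad (2.11)$$
with
$$p_-=p_-(\alpha,\varepsilon)\stackrel{\rm def}{=}C(\alpha)\,|\log\sqrt\beta\varepsilon|^{-1/2}\sqrt\beta\varepsilon. \qquad (2.12)$$
3. Assume $d=2$ and suppose $V''(x)\ge c>0$, $\forall x$. There exist $\varepsilon_0>0$ and $C>0$ such that, for all $a,b>0$ with $\sqrt{\beta c}\,\varepsilon\le\varepsilon_0$ and $2a\sqrt{\beta c}\le|\log\sqrt{\beta c}\,\varepsilon|^{1/2}$, and for any set $B\subset\mathbb Z^2$,
$$\nu^{a,b}_\Lambda(A\cap B=\emptyset)\le(1-p_+)^{|B|}, \qquad (2.13)$$
with
$$p_+\stackrel{\rm def}{=}C\,|\log\sqrt{\beta c}\,\varepsilon|^{-1/2}\sqrt{\beta c}\,\varepsilon. \qquad (2.14)$$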
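4. For $d\ge3$ and $V''(x)\ge c>0$, there exists $C>0$, depending only on $p$ and $d$, such that $\nu^{a,b}_\Lambda$ strongly dominates a Bernoulli measure with density
$$p_+\stackrel{\rm def}{=}C\big(1\wedge(a\sqrt{\beta c})^{-1}\big)\sqrt{\beta c}\,\varepsilon. \qquad (2.15)$$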
All the statements remain true in the case of $\delta$-pinning.

Remark 2.4. 1. Part 3 of the theorem is stated for small enough $\varepsilon$ and $a$ only. An essentially identical proof yields exponential decay of $\nu^{a,b}_\Lambda(A\cap B=\emptyset)$ for any $a,b>0$. The precise $\varepsilon$ dependence given in the theorem, however, is only valid for small values of $\varepsilon$. 2. We expect that Part 2 could be generalized to more general convex interactions $V$, but a proof eludes us.

The fact that for $d\ge3$, $\nu^{a,b}_\Lambda$ can be strongly dominated from above and below by a Bernoulli measure has been observed by Dima Ioffe (oral communication). That this is not true for $d=2$ can be seen as follows: it is easy to check that
$$\nu^\varepsilon_\Lambda\big(A\ni x\,\big|\,y\notin A,\ \forall y\ne x\text{ s.t. }|x-y|<T\big)$$
is decreasing to zero as $T\to\infty$, $\Lambda\uparrow\mathbb Z^2$, since under this conditioning typical values of the field at the sites neighboring $x$ will be (at least) of order $\sqrt{\log T}$. This excludes the possibility of any strong domination of a Bernoulli measure, uniformly in $\Lambda$. This leaves open the possibility of a domination in the sense of (2.7), which might be true; note, however, that the density of the corresponding Bernoulli measure cannot be larger than $\varepsilon|\log\varepsilon|^{-1/2}$ by (2.14).

When $d=2$, it is impossible to improve on (2.10) by replacing $p'_-$ by $p_-$. Indeed, there is no strong domination by a Bernoulli process with density $o(\varepsilon)$, as the following argument shows: in the case of $\delta$-pinning,
$$\nu^\varepsilon_\Lambda(A\ni0\mid\Lambda\setminus\{0\}\subset A)=\big(1+\varepsilon^{-1}Z_{\{0\}}\big)^{-1},$$
and therefore
$$\nu^\varepsilon_\Lambda(A\ni0\mid\Lambda\setminus\{0\}\subset A)\ge C\varepsilon,$$
which is incompatible with such a strong domination. In fact, even more is true: there is no domination, even in the sense of (2.7), by a Bernoulli measure of density $o(\varepsilon)$. Indeed, it is not difficult to show that the probability of the increasing event $\{A\supset B\}$ is larger than $(C\varepsilon)^{|B|}|\log\varepsilon|^{-1}$, for any connected set $B\subset\Lambda$. This shows in particular that there must be a gap between any upper and lower dominations of $\nu^\varepsilon_\Lambda$ in dimension 2. In view of this, it is rather remarkable that, as long as we are only interested in covariances of the field, such a domination holds, as a consequence of the estimates (2.11) and (2.13):

Corollary 2.1. Assume the Gaussian $\delta$-pinning case with $\beta=1$ (which is no restriction, according to Remark 2.2). There exists $\varepsilon_0>0$ such that for $0<\varepsilon\le\varepsilon_0$ the following is true. Let $\rho_+$ be the Bernoulli measure with density (2.14) or (2.15), and $\rho_-$ the Bernoulli
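measures with density $p_-(\varepsilon)$ from (2.12) in the case $d=2$, and $p'_-(\varepsilon)$ in the case $d\ge3$; then for any $x,y\in\mathbb Z^d$,
$$\mu^\varepsilon_\Lambda(\phi_x\phi_y)\ge\rho_-\big(\mu_{A^c}(\phi_x\phi_y)\big)$$
and
$$\mu^\varepsilon_\Lambda(\phi_x\phi_y)\le\rho_+\big(\mu_{A^c}(\phi_x\phi_y)\big).$$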
Proof. We recall that the covariance of the Gaussian field can be written as
$$\mu_\Lambda(\phi_x\phi_y)=\beta^{-1}\sum_{n\ge0}P_x[X_n=y,\ T^{\rm out}_\Lambda>n], \qquad (2.16)$$
where $P_x$ is the law of the random walk in $\mathbb Z^d$, with transition probabilities $p(\cdot)$, starting at $x$. Inserting this in (2.4), we get
$$\mu^\varepsilon_\Lambda(\phi_x\phi_y)=\beta^{-1}\sum_{n\ge0}\nu^\varepsilon_\Lambda\otimes P_x[X_n=y,\ T^{\rm out}_{A^c}>n]. \qquad (2.17)$$
(Remember that $D^c=\Lambda\setminus D$.) Taking the expectation w.r.t. $\nu^\varepsilon_\Lambda$ inside, we get
$$\mu^\varepsilon_\Lambda(\phi_x\phi_y)=\beta^{-1}\sum_{n\ge0}E_x\big[I(X_n=y)\,I(T^{\rm out}_\Lambda>n)\,\nu^\varepsilon_\Lambda(A\cap X[0,n]=\emptyset)\big].$$
The corollary then follows from an application of the estimates of Theorem 2.4.
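Identity (2.16) is easy to test numerically. The sketch below is our own check under stated assumptions: simple random walk on $\mathbb Z^2$ ($p(\pm e_i)=\frac14$), $\beta=1$ and a $5\times5$ box $\Lambda$, with hypothetical function names. It computes $G(x,y)=\sum_{n\ge0}P_x[X_n=y,\,T^{\rm out}_\Lambda>n]$ in two independent ways. The first solves $g=\delta_y+P_\Lambda g$, i.e. $(I-P)_\Lambda g=\delta_y$; its solution $(I-P)_\Lambda^{-1}\delta_y$ is the covariance column of the Gaussian field when the sum in (1.1) runs over ordered pairs, so that the Hamiltonian is $\frac\beta2\langle\phi,(I-P)\phi\rangle$. The second simulates walks and counts visits to $y$ before the exit from $\Lambda$.

```python
import random


def neighbours(site):
    """Nearest neighbours of a site of Z^2; each is reached with probability 1/4."""
    i, j = site
    return [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]


def green_linear(L, x, y, iterations=300):
    """Solve g = delta_y + P_Lambda g on the box {0..L-1}^2 by fixed-point iteration; return g(x)."""
    box = [(i, j) for i in range(L) for j in range(L)]
    inside = set(box)
    g = {s: 0.0 for s in box}
    for _ in range(iterations):           # converges: P_Lambda has spectral radius < 1
        g = {s: (1.0 if s == y else 0.0)
                + 0.25 * sum(g[t] for t in neighbours(s) if t in inside)
             for s in box}
    return g[x]


def green_monte_carlo(L, x, y, walks=10000, seed=0):
    """Average number of visits to y before the walk started at x leaves the box."""
    rng = random.Random(seed)
    visits = 0
    for _ in range(walks):
        s = x
        while 0 <= s[0] < L and 0 <= s[1] < L:
            if s == y:
                visits += 1
            s = rng.choice(neighbours(s))
    return visits / walks


for target in [(2, 2), (1, 2), (0, 0)]:
    print(target, green_linear(5, (2, 2), target), green_monte_carlo(5, (2, 2), target))
```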
Notice that Corollary 2.1 can also be stated in the following two ways:
$$\sum_{n\ge0}\rho_-\otimes P_x[X_n=y,\ T^{\rm in}_A>n]\le\mu^\varepsilon_\Lambda(\phi_x\phi_y)\le\sum_{n\ge0}\rho_+\otimes P_x[X_n=y,\ T^{\rm in}_A>n], \qquad (2.18)$$
where $T^{\rm in}_B\stackrel{\rm def}{=}\min\{n\ge0:X_n\in B\}$, and, setting $\bar p_-$ equal to $p_-$ when $d=2$ and $p'_-$ when $d\ge3$,
$$\sum_{n\ge0}E_x\big[I(X_n=y)(1-\bar p_-)^{|X[0,n]|}\big]\le\mu^\varepsilon_\Lambda(\phi_x\phi_y)\le\sum_{n\ge0}E_x\big[I(X_n=y)(1-p_+)^{|X[0,n]|}\big]. \qquad (2.19)$$
The problem is therefore essentially reduced to the analysis of the asymptotics of the Green function of the random walk with transition probabilities $p(\cdot)$, in an annealed random environment of killing obstacles distributed according to Bernoulli measures, in the limit of vanishing density. Equivalently, what we need is the asymptotics of the Green function of the “Wiener sausage”,
$$\sum_{n\ge0}E_x\big[I(X_n=y)\,e^{-s|X[0,n]|}\big],$$
as $s\to0$.
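The bounds in (2.19) can be estimated by straightforward Monte Carlo. The sketch below is only an illustration under our own assumptions (simple random walk on $\mathbb Z^2$, truncation after `n_max` steps, hypothetical function names); it is not the method used in the paper's proofs. Each sample path contributes $(1-q)^{|X[0,n]|}$ whenever $X_n=y$, where the density $q$ plays the role of $p_+$ or $\bar p_-$ (equivalently $e^{-s}=1-q$ in the Wiener-sausage form). Because every walk makes exactly `n_max` steps from a shared seed, all densities reuse the same paths, so the estimates are exactly monotone in $q$.

```python
import random


def annealed_green(x, y, density, walks=500, n_max=300, seed=0):
    """Monte Carlo estimate of sum_{n>=0} E_x[ I(X_n = y) * (1 - density)**|X[0,n]| ]
    for the simple random walk on Z^2, truncated after n_max steps.

    Every call uses the same seed and exactly n_max steps per walk, so calls with
    different densities see identical sample paths (common random numbers)."""
    rng = random.Random(seed)
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    survive = 1.0 - density
    total = 0.0
    for _ in range(walks):
        pos = x
        visited = {pos}
        weight = survive                    # the starting site already belongs to X[0,0]
        for n in range(n_max + 1):
            if pos == y:
                total += weight             # weight == (1 - density)**|X[0,n]|
            if n == n_max:
                break
            dx, dy = steps[int(rng.random() * 4)]
            pos = (pos[0] + dx, pos[1] + dy)
            if pos not in visited:          # a newly visited site may carry an obstacle
                visited.add(pos)
                weight *= survive
    return total / walks


for density in (0.1, 0.3):
    print(density, [round(annealed_green((0, 0), (k, 0), density), 4) for k in range(3)])
```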
Let us conclude by making some comments on open problems. First of all, one might wonder how universal the asymptotic behavior we have found actually is. It would be very interesting to extend the analysis to a more general class of interactions $V$. As remarked in the introduction, for even, strictly convex, $C^2$ interactions a representation of the covariance similar to (2.16) also exists [7]. It was used in particular to establish exponential decay of covariances for this class of interactions [16]. It is, however, much more complicated than the standard random walk: the jump rates of the walk are random, both in space and time; they are given by the state of an independent diffusion process on $\mathbb R^{\mathbb Z^d}$ which depends on the distribution of pinned sites. So, even though the distribution of pinned sites can be treated in general (see Theorem 2.4), precise asymptotics in this situation are probably hard to obtain. Finally, there is a natural extension of this problem, which is more closely related to the issue of critical wetting discussed at the beginning of the paper: what happens in the presence of a hard-wall condition? More precisely, one considers the measure
$$\mu^{a,b,+}_\Lambda(\,\cdot\,)\stackrel{\rm def}{=}\mu^{a,b}_\Lambda(\,\cdot\mid\phi_x\ge0,\ \forall x\in\mathbb Z^d),$$
or the corresponding measure with $\delta$-pinning. In this case, attraction of the pinning potential competes with entropic repulsion due to the conditioning, which makes this a much more difficult problem. Up to now, the only rigorous results (in dimension larger than 1) concern the existence, or not, of a strictly positive critical value $\varepsilon_c$ such that for $\varepsilon>\varepsilon_c$ the interface is pinned, while it is repelled for $0<\varepsilon<\varepsilon_c$. It was shown in [3] that for quadratic interactions and dimensions 3 and higher there is no such $\varepsilon_c$: as in the pure pinning case, the interface is localized for arbitrarily weak pinning strength. On the other hand, it was shown in [6] that in dimension 2 there exists such an $\varepsilon_c$; moreover, it was shown in the latter paper that this is true in any dimension if the interaction is Lipschitz. The results of these two papers provide only information on the density of pinned sites, but give no local estimates. For example, it is even an open problem whether in the localized regime the variance of the spin at the origin is finite. To get much more, namely the critical behavior of such a quantity, therefore seems to be quite a challenge.
3. Geometry of the Pinned Sites: Proof of Theorem 2.4
Note that it is enough to consider the case $\beta=1$ and $V''\le1$, respectively $V''\ge1$, in Point 1, respectively Points 3 and 4. Indeed, say in Point 1, we can define $\tilde V(x)=\beta V(x/\sqrt{\beta c})$, and then, by an obvious change of variables, we see that
$$\nu^{a,b}_{\Lambda,\beta,V}=\nu^{a\sqrt{\beta c},\,b}_{\Lambda,1,\tilde V},\qquad(3.1)$$
and by construction $\tilde V''\le1$.
3.1. Proof of Point 1. By simple algebraic manipulations, one can write, for any $A\subset\Lambda\setminus\{x\}$,
$$\nu^{a,b}_\Lambda(\mathcal A\not\ni x\mid\mathcal A=A\text{ off }x)=\Big(1+(e^b-1)\frac{Z_\Lambda(A\cup\{x\})}{Z_\Lambda(A)}\Big)^{-1}.\qquad(3.2)$$
Critical Behavior of Massless Free Field at Depinning Transition
173
We now need the following result, which we establish below:
$$\frac{Z_\Lambda(A\cup\{x\})}{Z_\Lambda(A)}\le2a\,\frac{Z_{\Lambda\setminus\{x\}}(A)}{Z_\Lambda(A)}\le\frac{2a}{\sqrt{2\pi}}.\qquad(3.3)$$
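Of course, we also have the trivial upper bound $Z_\Lambda(A\cup\{x\})/Z_\Lambda(A)\le1$, since the ratio can be written as a conditional probability. This and (3.3) readily imply the claim, since
$$\nu^{a,b}_\Lambda(\mathcal A\not\ni x\mid\mathcal A=A\text{ off }x)\ge\big(1+C(1\wedge a^{-1})\varepsilon\big)^{-1}\ge1-C(1\wedge a^{-1})\varepsilon.$$
Let us now prove (3.3). The first inequality follows from the fact that the maximum of the density $F_{\Lambda,A}$ of $\varphi_x$ under $\mu_\Lambda(\cdot\mid|\varphi_z|\le a,\ \forall z\in A)$ is at $\varphi_x=0$. Indeed, $F'_{\Lambda,A}(t)$ is equal to
$$C_\Lambda(A,t)\Big(\sum_{y\in\Lambda}p(y-x)\,\mu_\Lambda\big(V'(\varphi_y-t)\,\big|\,\varphi_x=t,\ |\varphi_z|\le a,\ \forall z\in A\big)-\sum_{y\notin\Lambda}p(y-x)\,V'(t)\Big),$$
where $C_\Lambda(A,t)>0$. Now, $V'(s)\ge0$ for all $s\ge0$, and, for $t\ge0$,
$$\mu_\Lambda\big(V'(\varphi_y-t)\,\big|\,\varphi_x=t,\ |\varphi_z|\le a,\ \forall z\in A\big)=\mu^{-t}_\Lambda\big(V'(\varphi_y)\,\big|\,\varphi_x=0,\ |\varphi_z+t|\le a,\ \forall z\in A\big)\le\mu_\Lambda\big(V'(\varphi_y)\,\big|\,\varphi_x=0,\ |\varphi_z|\le(a-t)\vee0,\ \forall z\in A\big)=0,$$
where $\mu^{-t}_\Lambda$ denotes the measure with boundary condition $-t$ outside $\Lambda$. The inequality is a consequence of the FKG property, and the last equality follows from the fact that $V'$ is odd. Hence $F'_{\Lambda,A}(t)\le0$ for $t\ge0$; since $F_{\Lambda,A}$ is even, the claim is proven. To prove the second inequality in (3.3), we write
$$\frac{Z_\Lambda(A)}{Z_{\Lambda\setminus\{x\}}(A)}=\mu_{\Lambda\setminus\{x\}}\Big(\int_{-\infty}^{\infty}\exp\Big[-\sum_{y\in\mathbb Z^2}p(y-x)\big(V(\varphi_y-t)-V(\varphi_y)\big)\Big]dt\,\Big|\,|\varphi_z|\le a,\ \forall z\in A\Big)$$
$$\ge\int_{-\infty}^{\infty}\exp\Big[-\tfrac12\sum_{y\in\mathbb Z^2}p(y-x)\,\mu_{\Lambda\setminus\{x\}}\big(V(\varphi_y-t)+V(\varphi_y+t)-2V(\varphi_y)\,\big|\,|\varphi_z|\le a,\ \forall z\in A\big)\Big]dt\ge\int_{-\infty}^{\infty}e^{-t^2/2}\,dt=\sqrt{2\pi},$$
where the first inequality is a consequence of Jensen’s inequality and the symmetry of the measure under $\varphi\mapsto-\varphi$, and for the second inequality we used the assumption $V''\le1$.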
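The way (3.2) and (3.3) combine into the Bernoulli domination of Point 1 is elementary arithmetic. The sketch below evaluates the resulting upper bound on the conditional pinning probability and compares it with $\varepsilon\min\{(2a)^{-1},(2\pi)^{-1/2}\}$, using $\varepsilon=2a(e^b-1)$ as recalled in Step 1 of Sect. 3.3. The function names are hypothetical, and the check is only a numerical illustration of the inequality $x/(1+x)\le x$.

```python
import math


def pinning_prob_bound(a, b):
    """Upper bound on nu(x in A | A = A off x) obtained from (3.2) and (3.3).

    The ratio r = Z(A u {x}) / Z(A) satisfies r <= min(1, 2a / sqrt(2 pi)),
    and the pinning probability 1 - 1 / (1 + (e^b - 1) r) is increasing in r.
    """
    r_max = min(1.0, 2 * a / math.sqrt(2 * math.pi))
    x = math.expm1(b) * r_max  # (e^b - 1) * r_max, computed without cancellation
    return x / (1 + x)


def epsilon_bound(a, b):
    """epsilon * min(1/(2a), 1/sqrt(2 pi)) with epsilon = 2a (e^b - 1)."""
    eps = 2 * a * math.expm1(b)
    return eps * min(1 / (2 * a), 1 / math.sqrt(2 * math.pi))
```

The second function is, up to rounding, the quantity $x$ itself. The comparison therefore makes visible that the constant in $C(1\wedge a^{-1})\varepsilon$ can be taken of order one.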
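3.2. Proof of Point 2. We assume $d=2$ in this subsection. Let us write $B=\{t_1,\dots,t_{|B|}\}$, and let $B_0=\emptyset$, $B_k=\{t_1,\dots,t_k\}$. Let also $C_k\overset{\mathrm{def}}{=}\{x\in\Lambda : |x-t_k|\le\varepsilon^{-(\alpha\wedge\frac13)}\}$. We write
$$\nu^{a,b}_\Lambda(\mathcal A\cap B=\emptyset)=\prod_{k=1}^{|B|}\nu^{a,b}_\Lambda(\mathcal A\cap B_k=\emptyset\mid\mathcal A\cap B_{k-1}=\emptyset)=\prod_{k=1}^{|B|}\nu^{a,b}_\Lambda(\mathcal A\not\ni t_k\mid\mathcal A\cap B_{k-1}=\emptyset).$$
Now,
$$\nu^{a,b}_\Lambda(\mathcal A\not\ni t_k\mid\mathcal A\cap B_{k-1}=\emptyset)=\Bigg(1+(e^b-1)\frac{\sum_{A\not\ni t_k,\,A\cap B_{k-1}=\emptyset}(e^b-1)^{|A|}Z_\Lambda(A\cup\{t_k\})}{\sum_{A\not\ni t_k,\,A\cap B_{k-1}=\emptyset}(e^b-1)^{|A|}Z_\Lambda(A)}\Bigg)^{-1}$$
$$=\Bigg(1+(e^b-1)\frac{\nu^{a,b}_\Lambda\Big(I(\mathcal A\not\ni t_k)\frac{Z_\Lambda(\mathcal A\cup\{t_k\})}{Z_\Lambda(\mathcal A)}\,\Big|\,\mathcal A\cap B_{k-1}=\emptyset\Big)}{\nu^{a,b}_\Lambda(\mathcal A\not\ni t_k\mid\mathcal A\cap B_{k-1}=\emptyset)}\Bigg)^{-1}.$$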
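Strong domination by the Bernoulli measure from Part 1 of the theorem shows that
$$\nu^{a,b}_\Lambda(\mathcal A\not\ni t_k\mid\mathcal A\cap B_{k-1}=\emptyset)\ge1/2,$$
provided $\varepsilon$ is small enough. We are left with the numerator. We decompose it as follows:
$$\nu^{a,b}_\Lambda\Big(I(\mathcal A\not\ni t_k)\frac{Z_\Lambda(\mathcal A\cup\{t_k\})}{Z_\Lambda(\mathcal A)}\,\Big|\,\mathcal A\cap B_{k-1}=\emptyset\Big)=\nu^{a,b}_\Lambda\Big(I(\mathcal A\cap C_k=\emptyset)\frac{Z_\Lambda(\mathcal A\cup\{t_k\})}{Z_\Lambda(\mathcal A)}\,\Big|\,\mathcal A\cap B_{k-1}=\emptyset\Big)$$
$$+\,\nu^{a,b}_\Lambda\Big(I(\mathcal A\cap C_k\ne\emptyset,\ \mathcal A\not\ni t_k)\frac{Z_\Lambda(\mathcal A\cup\{t_k\})}{Z_\Lambda(\mathcal A)}\,\Big|\,\mathcal A\cap B_{k-1}=\emptyset\Big).\qquad(3.4)$$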
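Let us first consider the second term. We already know, see (3.3), that $Z_\Lambda(A\cup\{t_k\})/Z_\Lambda(A)\le2a/\sqrt{2\pi}$ for all $A\not\ni t_k$. Therefore, applying again the domination result from Part 1, this term is bounded from above by
$$\frac{2a}{\sqrt{2\pi}}\,\nu^{a,b}_\Lambda(\mathcal A\cap C_k\ne\emptyset\mid\mathcal A\cap B_{k-1}=\emptyset)=\frac{2a}{\sqrt{2\pi}}\Big(1-\nu^{a,b}_\Lambda(\mathcal A\cap C_k=\emptyset\mid\mathcal A\cap B_{k-1}=\emptyset)\Big)\le\frac{2a}{\sqrt{2\pi}}\Big(1-(1-p_-)^{|C_k|}\Big)\le C\,2a\,\varepsilon^{1/3}.$$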
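Let us now examine the first term in (3.4). We prove below that
$$\frac{Z_\Lambda(A\cup\{t_k\})}{Z_\Lambda(A)}=\mu_\Lambda\big(|\varphi_{t_k}|\le a\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)\le\mu_{A^c}\big(|\varphi_{t_k}|\le2a\big).\qquad(3.5)$$
This then implies the following bound:
$$\frac{Z_\Lambda(A\cup\{t_k\})}{Z_\Lambda(A)}\le C\,2a\,|\log\varepsilon|^{-1/2},$$
since², under $\mu_{A^c}$ with $A\cap C_k=\emptyset$, $\varphi_{t_k}$ is a Gaussian random variable with mean 0 and variance bounded from below by $C|\log\varepsilon|$. Putting all this together, we get
$$\nu^{a,b}_\Lambda(\mathcal A\not\ni t_k\mid\mathcal A\cap B_{k-1}=\emptyset)\ge\big(1+C|\log\varepsilon|^{-1/2}\varepsilon+C\varepsilon^{4/3}\big)^{-1}\ge e^{-C'|\log\varepsilon|^{-1/2}\varepsilon},$$
and therefore
$$\nu^{a,b}_\Lambda(\mathcal A\cap B=\emptyset)\ge e^{-C'|\log\varepsilon|^{-1/2}\varepsilon|B|}\ge\big(1-C'|\log\varepsilon|^{-1/2}\varepsilon\big)^{|B|}.$$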
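The bound following (3.5) uses only one fact: a centred Gaussian variable with variance $\sigma^2$ has density at most $(2\pi\sigma^2)^{-1/2}$. Consequently $\mu_{A^c}(|\varphi_{t_k}|\le2a)\le4a/\sqrt{2\pi\sigma^2}$, and with $\sigma^2\ge C|\log\varepsilon|$ this is $O(a|\log\varepsilon|^{-1/2})$. Here is a quick numerical check with the standard library. The function names are ours, and taking $\sigma^2=|\log\varepsilon|$ is only an illustrative choice.

```python
import math


def gaussian_small_ball(t, sigma2):
    """P(|Z| <= t) for a centred Gaussian Z with variance sigma2."""
    return math.erf(t / math.sqrt(2 * sigma2))


def density_bound(t, sigma2):
    """2t times the maximal density (2 pi sigma2)^{-1/2} of Z."""
    return 2 * t / math.sqrt(2 * math.pi * sigma2)
```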
It only remains to prove (3.5). We have
$$\mu_\Lambda\big(|\varphi_{t_k}|\le a\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)=1-2\mu_\Lambda\big(\varphi_{t_k}>a\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big).$$
We use the FKG inequality, stating that the random field $(\varphi_x)_{x\in\Lambda\setminus A}$ with boundary conditions $\{\varphi_x=\eta_x : x\in A\cup\Lambda^c\}$ depends monotonically on $(\eta_x)_{x\in A\cup\Lambda^c}$. Therefore, for $|\eta_x|\le a$, $x\in A$,
$$\mu_\Lambda\big(\varphi_{t_k}>a\,\big|\,\varphi_x=\eta_x,\ \forall x\in A\big)\ge\mu_\infty\big(\varphi_{t_k}>a\,\big|\,\varphi_x=-a,\ \forall x\in A\cup\Lambda^c\big)=\mu_{A^c}\big(\varphi_{t_k}>2a\big)=\frac{1-\mu_{A^c}\big(|\varphi_{t_k}|\le2a\big)}{2}.$$
This proves (3.5).
3.3. Proof of Point 3. We again assume $d=2$. The proof proceeds in three steps. First, we prove a statement similar to that of Theorem 2.4, but valid only for sets $B$ that are sufficiently “fat”. In the second step, we use this result to show that, with high probability, there is a high density of pinned sites at a large enough ($\varepsilon$-dependent) scale. Then, in the last step, we use this information to conclude the proof of Theorem 2.4, Part 3. We need the following definition: consider a partition of $\mathbb Z^2$ into cells by a grid of spacing $l$; the set of all cells entirely contained in a (not necessarily finite) subset $\Lambda\subset\mathbb Z^2$ is denoted by $\Lambda(l)$.
² This is the only place where we use the assumption that $V$ is quadratic. We don’t know how to estimate the probability density in the non-Gaussian case. Note that we only need to estimate it at zero, since the maximum is there.
3.3.1. Step 1: Probability of clean fat sets. This step is a variant of the proofs given in [8, 16]. Here, however, we want to keep track of the $\varepsilon$-dependence of the constants. We remind the reader that we assume $\beta=1$, $c=1$, and that $\varepsilon=\varepsilon_{a,b}=2a(e^b-1)$.
Proposition 3.1. Let $\beta=1$ and let $V$ be an even $C^2$ function with $V''(x)\ge1$. There exist constants $K>0$ and $\varepsilon_0>0$ such that, for all $\varepsilon\le\varepsilon_0$, and provided $2a\le|\log\varepsilon|^{1/2}$, the following holds: for any set $B\subset\mathbb Z^2$ composed of cells of $\Lambda(K|\log\varepsilon|^{1/4}\varepsilon^{-1/2})$,
$$\nu^{a,b}_\Lambda(\mathcal A\cap B=\emptyset)\le\exp\big[-C|\log\varepsilon|^{-1/2}\varepsilon|B|\big].$$
This statement remains true in the case of δ-pinning.
Proof. We suppose first, for simplicity, that $B$ is connected. The changes for the general case are the same as those described in [16], and we will indicate their effects on our bounds at the end of the proof.
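Let $B^0\overset{\mathrm{def}}{=}B$, and define $B^{k+1}$ as the union of $B^k$ and all its nearest-neighbor cells in $\mathbb Z^2(K|\log\varepsilon|^{1/4}\varepsilon^{-1/2})$; let $\bar k$ be the largest $k$ for which $B^k\subset\Lambda$. We then write
$$\nu^{a,b}_\Lambda(\mathcal A\cap B=\emptyset)\le\sum_{k=0}^{\bar k}\nu^{a,b}_\Lambda(\mathcal A\cap B^k=\emptyset\mid\mathcal A\cap B^{k+1}\ne\emptyset),$$
and
$$\nu^{a,b}_\Lambda(\mathcal A\cap B^k=\emptyset\mid\mathcal A\cap B^{k+1}\ne\emptyset)\le\Bigg(\inf_{\substack{A\cap B^k=\emptyset\\A\cap B^{k+1}\ne\emptyset}}\sum_{D\subset B^k}(e^b-1)^{|D|}\,\mu_\Lambda\big(|\varphi_x|\le a,\ \forall x\in D\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)\Bigg)^{-1}.\qquad(3.6)$$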
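It was proved in [8], see the proof of Proposition 4.1, that
$$\inf_{\substack{A\cap B^k=\emptyset\\A\cap B^{k+1}\ne\emptyset}}\mu_\Lambda\big(|\varphi_x|\le a,\ \forall x\in D\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)\ge\Big(C\Big(\frac{2a}{\sqrt{|\log\varepsilon|}}\wedge1\Big)\Big)^{|D|}=\Big(C\frac{2a}{\sqrt{|\log\varepsilon|}}\Big)^{|D|},$$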
for the class of sets $D$ containing exactly one point in each cell of $B^k$ (the equality uses $2a\le|\log\varepsilon|^{1/2}$). Therefore, summing only over such $D$'s in (3.6) (notice that there are $K^2\varepsilon^{-1}|\log\varepsilon|^{1/2}$ choices for which site is occupied in a given cell), we get, choosing $K^2=2/C$ ($C$ from the formula above),
$$\nu^{a,b}_\Lambda(\mathcal A\cap B^k=\emptyset\mid\mathcal A\cap B^{k+1}\ne\emptyset)\le\exp\big[-C'|\log\varepsilon|^{-1/2}\varepsilon|B^k|\big].$$
From this we easily prove the claim for the one-component case, by summing over $k$. Indeed, we can use the trivial estimate $|B^k|\ge|B|+k\varepsilon^{-1}|\log\varepsilon|^{1/2}$. To treat the case of multiple components, one proceeds as in the proof of Theorem 2 in [16]. The idea is to grow all components simultaneously in a suitable way. This procedure only modifies the value of the constant in the exponent, provided the components are all big enough. In our present situation, this is enforced automatically as soon as $\varepsilon$ is sufficiently small (the cells from which $B$ is built grow as $\varepsilon$ decreases).
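The choice $K^2=2/C$ in the proof of Proposition 3.1 makes each cell of $B^k$ contribute a factor of exactly 2 to the sum over $D$. That factor is the number of sites in a cell times $(e^b-1)\,C\,2a/\sqrt{|\log\varepsilon|}$. The arithmetic can be checked as follows. The value of $C$ used here is an arbitrary placeholder (the true constant comes from [8]), and $e^b-1$ is recovered from $\varepsilon=2a(e^b-1)$.

```python
import math


def per_cell_factor(eps, a, C):
    """(sites per cell) * (e^b - 1) * C * 2a / sqrt(|log eps|) for cells of side
    K |log eps|^{1/4} eps^{-1/2}, with the choice K^2 = 2 / C (C is a placeholder)."""
    K = math.sqrt(2 / C)
    L = abs(math.log(eps))
    sites = (K * L**0.25 * eps**-0.5) ** 2   # K^2 eps^{-1} |log eps|^{1/2}
    e_b_minus_1 = eps / (2 * a)              # from eps = 2a (e^b - 1)
    weight = e_b_minus_1 * C * 2 * a / math.sqrt(L)
    return sites * weight
```

Since $B^k$ contains $|B^k|/(K^2\varepsilon^{-1}|\log\varepsilon|^{1/2})$ cells, the restricted sum is at least $2$ to that power, which equals $\exp[(\log2/K^2)\,\varepsilon|\log\varepsilon|^{-1/2}|B^k|]$. This is the exponential rate claimed above.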
3.3.2. Step 2: Density of pinned sites at large scales. Our aim in this step is to show that any subset of $\Lambda$ has the property that many of its points are close to pinned sites. To do this, we need two partitions of $\mathbb Z^2$: first the one used in Step 1, $\mathbb Z^2(K|\log\varepsilon|^{1/4}\varepsilon^{-1/2})$, and a second one, $\mathbb Z^2(|\log\varepsilon|\varepsilon^{-1/2})$. The cells of the latter are called “big”, and are supposed to be built of cells from the finer partition (this might require some slight modification of the size of the cells, but this is a trivial point). The actual choice of the size of the big cells is not important: $|\log\varepsilon|^\alpha\varepsilon^{-1/2}$ for any $\alpha>1/4$ would do. Given an arbitrary subset $B\subset\Lambda$, we write $N_B$ for the number of big cells containing sites of $B$. If $A\subset\Lambda$ is another subset, then we write $N_B(A)$ for the number of those cells containing sites of $B$ but no site of $A$ or of $\mathbb Z^2\setminus\Lambda$. For short, we write $\mathcal N_B=N_B(\mathcal A)$ when $\mathcal A$ is our standard random subset, distributed according to $\nu^{a,b}_\Lambda$. Let
$$\rho=\frac{|\log\varepsilon|^{-2}\varepsilon|B|}{2N_B}.$$
We want to prove that
$$\nu^{a,b}_\Lambda(\mathcal N_B>\rho N_B)\le e^{-C|\log\varepsilon|^{-1/2}\varepsilon|B|},\qquad(3.7)$$
provided $\varepsilon$ is small enough (independently of $B$). Notice that $\frac12|\log\varepsilon|^{-2}\varepsilon\le\rho\le\frac12$.
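The derivation of (3.7) from Proposition 3.1 uses the exponential Chebyshev (Chernoff) inequality for a binomial sum. For $k>\rho N_B$ and $t\ge0$ one has $e^{t(k-\rho N_B)}\ge1$, so the tail of the sum is bounded by the full sum weighted by $e^{t(k-\rho N_B)}$, which factorizes. The small exact check below uses $\theta$ in place of $C|\log\varepsilon|^{3/2}$ and $N$ in place of $N_B$. All parameter values are arbitrary.

```python
import math


def clean_cells_tail(N, rho, theta):
    """sum_{k > rho N} binom(N, k) exp(-theta k)."""
    return sum(math.comb(N, k) * math.exp(-theta * k)
               for k in range(N + 1) if k > rho * N)


def chernoff_bound(N, rho, theta, t):
    """exp(-t rho N) (1 + exp(t - theta))^N, an upper bound for every t >= 0."""
    return math.exp(-t * rho * N) * (1 + math.exp(t - theta)) ** N
```

Optimizing over $t$ in `chernoff_bound` is exactly the infimum that appears in the display proving (3.7).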
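Equation (3.7) is an easy consequence of Proposition 3.1. Indeed, we can apply the latter to get
$$\nu^{a,b}_\Lambda(\mathcal N_B>\rho N_B)\le\sum_{k>\rho N_B}\binom{N_B}{k}\exp\big[-C|\log\varepsilon|^{-1/2}\varepsilon\,k\,\varepsilon^{-1}|\log\varepsilon|^2\big]=\sum_{k>\rho N_B}\binom{N_B}{k}\exp\big[-C|\log\varepsilon|^{3/2}k\big]$$
$$\le\inf_{t\ge0}e^{-t\rho N_B}\sum_{k=0}^{N_B}\binom{N_B}{k}\exp\big[(t-C|\log\varepsilon|^{3/2})k\big]=\inf_{t\ge0}\Big(e^{-t\rho}\big(1+\exp[t-C|\log\varepsilon|^{3/2}]\big)\Big)^{N_B}$$
$$\le\exp\big[-\tfrac12C|\log\varepsilon|^{3/2}\rho N_B\big]=\exp\big[-\tfrac14C|\log\varepsilon|^{-1/2}\varepsilon|B|\big].$$
3.3.3. Step 3: Arbitrary sets. Let now $B$ be an arbitrary subset of $\Lambda$. By (3.7), we know that
$$\nu^{a,b}_\Lambda(\mathcal A\cap B=\emptyset)\le\nu^{a,b}_\Lambda\Big(\mathcal A\cap B=\emptyset\,\Big|\,\mathcal N_B<\tfrac12|\log\varepsilon|^{-2}\varepsilon|B|\Big)+\exp\big[-C|\log\varepsilon|^{-1/2}\varepsilon|B|\big].$$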
In order to finish the proof of the theorem, it remains to estimate the first summand on the right-hand side. The idea is essentially to repeat the argument used in the proof of Proposition 3.1, using the fact that there are already many pinned sites close to $B$. Let us therefore suppose, without loss of generality, that
$$\Big\{A : A\cap B=\emptyset,\ N_B(A)<\tfrac12|\log\varepsilon|^{-2}\varepsilon|B|\Big\}\ne\emptyset$$
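(otherwise the conditional probability is simply 0 and there is nothing to prove). Then we have, as in (3.6),
$$\nu^{a,b}_\Lambda\Big(\mathcal A\cap B=\emptyset\,\Big|\,\mathcal N_B<\tfrac12|\log\varepsilon|^{-2}\varepsilon|B|\Big)\le\Big(\inf_A\sum_{D\subset B}(e^b-1)^{|D|}\,\mu_\Lambda\big(|\varphi_x|\le a,\ \forall x\in D\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)\Big)^{-1}$$
$$\le\Big(\inf_A\sum_{D\subset B^g(A)}(e^b-1)^{|D|}\,\mu_\Lambda\big(|\varphi_x|\le a,\ \forall x\in D\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)\Big)^{-1},$$
where the infimum is taken over sets $A$ with $A\cap B=\emptyset$ and $N_B(A)<\frac12|\log\varepsilon|^{-2}\varepsilon|B|$, and where $B^g(A)$ is the set of “good” points in $B$: those sharing a big box with at least one point of $A$ or of $\mathbb Z^2\setminus\Lambda$. It is easy to estimate the inner probability. Indeed, numbering the elements of $D=\{t_1,\dots,t_{|D|}\}$, we can write
$$\mu_\Lambda\big(|\varphi_x|\le a,\ \forall x\in D\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)=\prod_{k=0}^{|D|-1}\mu_\Lambda\big(|\varphi_{t_{k+1}}|\le a\,\big|\,|\varphi_x|\le a,\ \forall x\in A\cup\{t_1,\dots,t_k\}\big)\ge\prod_{k=0}^{|D|-1}\frac12\Big(\frac{a}{4\,\mu_{A^c\setminus\{t_1,\dots,t_k\}}(|\varphi_{t_{k+1}}|)}\wedge\frac12\Big),$$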
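where the last inequality follows from Lemmas 5.4 and 5.5 of [8]. The expected value is easily estimated using the random walk representation:
$$\mu_{A^c\setminus\{t_1,\dots,t_k\}}(|\varphi_{t_{k+1}}|)\le\sqrt{\mu_{A^c\setminus\{t_1,\dots,t_k\}}(\varphi_{t_{k+1}}^2)}\le\sqrt{\tilde\mu_{A^c\setminus\{t_1,\dots,t_k\}}(\varphi_{t_{k+1}}^2)}\le C\sqrt{|\log\varepsilon|},$$
where $\tilde\mu$ denotes the corresponding Gaussian measure, the second inequality follows from Brascamp–Lieb, and the last one follows from (B.3), since the last variance is bounded by the Green function of the random walk killed as it hits the closest site of $A$ or of $\mathbb Z^2\setminus\Lambda$ located in the same cell as $t_{k+1}$ (there is such a site since $t_{k+1}\in B^g(A)$). Therefore,
$$\mu_\Lambda\big(|\varphi_x|\le a,\ \forall x\in D\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)\ge\Big(\frac{C(2a\wedge1)}{\sqrt{|\log\varepsilon|}}\Big)^{|D|}.$$
This finally yields
$$\nu^{a,b}_\Lambda\Big(\mathcal A\cap B=\emptyset\,\Big|\,\mathcal N_B<\tfrac12|\log\varepsilon|^{-2}\varepsilon|B|\Big)\le\Bigg(\inf_A\sum_{D\subset B^g(A)}\Big(\frac{C\varepsilon}{\sqrt{|\log\varepsilon|}}\Big)^{|D|}\Bigg)^{-1}\le\exp\Big[-C'|\log\varepsilon|^{-1/2}\varepsilon\inf_A|B^g(A)|\Big].$$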
The conclusion follows easily since
$$|B^g(A)|\ge|B|-N_B(A)\,|\log\varepsilon|^2\varepsilon^{-1}\ge|B|/2\quad\text{when }N_B(A)<\tfrac12|\log\varepsilon|^{-2}\varepsilon|B|.$$
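3.4. Proof of Point 4. We assume here $d\ge3$. The desired inequality follows from (3.2) and, using Lemmas 5.4 and 5.5 of [8],
$$\frac{Z_\Lambda(A\cup\{x\})}{Z_\Lambda(A)}=\mu_\Lambda\big(|\varphi_x|\le a\,\big|\,|\varphi_y|\le a,\ \forall y\in A\big)\ge\frac12\Big(\frac{a}{4\mu_\Lambda(|\varphi_x|)}\wedge\frac12\Big)\ge\frac12\Big(\frac{a}{4\sqrt{\mu_\Lambda(|\varphi_x|^2)}}\wedge\frac12\Big)\ge C_3(2a\wedge1).$$
4. Asymptotics of the Variance
4.1. Proof of Theorem 2.1. We start with δ-pinning. Let $\Lambda$ be a square in $\mathbb Z^2$, centered at the origin, with large enough side length (the thermodynamic limit is taken at the end). Let
$$B_\varepsilon(0)\overset{\mathrm{def}}{=}\big\{x\in\mathbb Z^2 : \|x\|_\infty\le\tfrac12\varepsilon^{-1/2}|\log\varepsilon|^{-1/4}\big\}.$$
Using (2.4), we get
$$\mu^\varepsilon_\Lambda(\varphi_0^2)=\sum_{A\subset\Lambda}\nu^\varepsilon_\Lambda(A)\,\mu_{A^c}(\varphi_0^2)\ge\nu^\varepsilon_\Lambda(\mathcal A\cap B_\varepsilon(0)=\emptyset)\inf_{A\cap B_\varepsilon(0)=\emptyset}\mu_{A^c}(\varphi_0^2).$$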
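By the inverse Brascamp–Lieb inequality [7], $\mu_{A^c}(\varphi_0^2)\ge\frac1c\tilde\mu_{A^c}(\varphi_0^2)=\frac{1}{\beta c}G_{A^c}(0,0)$, where $\tilde\mu_{A^c}$ is the corresponding Gaussian measure and the last quantity is the Green function of the simple random walk killed as it enters the set $A$. Clearly, $G_{A^c}(0,0)$ is minimal when $A=\mathbb Z^2\setminus B_\varepsilon(0)$. Moreover, from Part 1 of Theorem 2.4, we know that
$$\nu^\varepsilon(\mathcal A\cap B_\varepsilon(0)=\emptyset)\ge1-C|\log\varepsilon|^{-1/2},$$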
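and the conclusion follows. The square-well potential is treated essentially in the same way. The only difference is that we use the following bound, which is a consequence of the FKG and Cauchy–Schwarz inequalities (see Sect. 5 of [8] for similar estimates):
$$\mu_\Lambda\big(\varphi_0^2\,\big|\,|\varphi_x|\le a,\ \forall x\in A\big)=\mu_\Lambda\big(\varphi_0^2I(\varphi_0\ge0)\,\big|\,|\varphi_x|\le a,\ \forall x\in A,\ \varphi_0\ge0\big)\ge\mu_\Lambda\big(\varphi_0^2I(\varphi_0\ge0)\,\big|\,\varphi_x=-a,\ \forall x\in A,\ \varphi_0\ge0\big)$$
$$\ge\mu_{A^c}\big((\varphi_0-a)^2I(\varphi_0\ge a)\,\big|\,\varphi_0\ge a\big)\ge\mu_{A^c}\big((\varphi_0-a)^2I(\varphi_0\ge a)\,\big|\,\varphi_0\ge0\big)\ge\Big(\sqrt{\mu_{A^c}(\varphi_0^2)}-a\Big)^2.\qquad(4.1)$$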
We are now back to the previous case since, when $A\cap B_\varepsilon(0)=\emptyset$, our assumption on $a$ implies that $\big(\sqrt{\mu_{A^c}(\varphi_0^2)}-a\big)^2\ge\frac12\mu_{A^c}(\varphi_0^2)$.
4.2. Proof of the lower bound in Theorem 2.2. The proof is almost identical to that of Theorem 2.1 in the δ-pinning case. The only difference is that in the Gaussian case we do not need the inverse Brascamp–Lieb inequality, and therefore we do not get the factor $\frac1c$. Moreover, using Part 2 of Theorem 2.4, we obtain the improved estimate $\nu^\varepsilon(\mathcal A\cap B_\varepsilon(0)=\emptyset)\ge1-C|\log\varepsilon|^{-1}$. Therefore, we get in this case
$$\mu^\varepsilon_\Lambda(\varphi_0^2)\ge\frac{|\log(\beta\varepsilon)|}{2\pi\beta\sqrt{\det Q}}-C\log|\log(\beta\varepsilon)|,$$
which proves the lower bound in Theorem 2.2.
4.3. Proof of the upper bound in Theorem 2.2. We apply Remark 2.2, and therefore assume $\beta=1$. Using Corollary 2.1, we have
$$\mu^\varepsilon_\Lambda(\varphi_0^2)\le\sum_{n\ge0}\rho_+\otimes P_0\big[X_n=0,\ T^{out}_{\mathcal A^c}>n\big]\le\sum_{n=0}^{n_0}P_0[X_n=0]+\sum_{n>n_0}\rho_+\otimes P_0\big[T^{out}_{\mathcal A^c}>n\big]=G_{n_0}(0,0)+\sum_{n>n_0}E_0\big[(1-p_+)^{|X_{[0,n]}|}\big],\qquad(4.2)$$
where we choose $n_0=n_0(\varepsilon)=\varepsilon^{-1}|\log\varepsilon|^\eta$, for some $\eta>0$ to be chosen later. Then the $n_0$-step Green function on the right-hand side of the last equation has the following asymptotics, see (B.4):
$$G_{n_0}(0,0)=(2\pi\sqrt{\det Q})^{-1}|\log\varepsilon|+O(\log|\log\varepsilon|).$$
The claim will be proved if we show that the second term in (4.2) does not contribute more than $O(\log|\log\varepsilon|)$; we are actually going to check that it is even $o(1)$ as $\varepsilon$ goes to zero. Indeed, introducing a small constant $\kappa>0$, it can be estimated in the following way:
$$\sum_{n>n_0}E_0\big[(1-p_+)^{|X_{[0,n]}|}\big]\le\sum_{n>n_0}(1-p_+)^{\kappa n/\log n}+\sum_{n>n_0}P_0\big[|X_{[0,n]}|\le\kappa n/\log n\big].$$
By Proposition C.2, we see that $P_0\big[|X_{[0,n]}|\le\kappa n/\log n\big]\le n^{-2}$, provided $\kappa$ is chosen small enough; this shows that the last sum is $o(1)$. To see that this is also true for the
first one, we bound it as follows (remember that $n_0\to\infty$ when $\varepsilon\to0$):
$$\sum_{n>n_0}(1-p_+)^{\kappa n/\log n}\le\sum_{n>n_0}e^{-p_+\kappa n/\log n}\le\int_{n_0-1}^\infty e^{-p_+\kappa x/\log x}\,dx\le\int_{n_0/2}^\infty e^{-\frac12p_+\kappa y}\,dy=\frac{2}{p_+\kappa}\,e^{-\frac14p_+\kappa n_0},$$
which is $o(1)$ by definition of $p_+$ and $n_0$, provided we take $\eta$ sufficiently large (depending on $\kappa$).
5. Asymptotics of the Mass: Proof of Theorem 2.3
We discuss the 2-dimensional case in detail. The simpler higher-dimensional case follows in exactly the same way by using Theorem 2.4, Parts 1 and 4, instead of Parts 2 and 3. We consider $x\in\mathbb Z^2$ sufficiently far away from 0. We take $\Lambda$ to be a finite box in $\mathbb Z^2$, and prove the estimates when $\Lambda$ is large enough, depending possibly on $x$. This then proves the estimates in the thermodynamic limit. Remember that we assume here that $(p(x))_{x\in\mathbb Z^2}$ has an exponential moment. Furthermore, we assume that $p$ is irreducible and aperiodic.
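Proof of the upper bound. We denote by $E^{(n)}_{x,y}$ the expectation for the random walk starting at $x$ and conditioned on $X_n=y$, provided the probability of the latter event is positive. Using (2.19), we have
$$\mu^\varepsilon_\Lambda(\varphi_0\varphi_x)\ge\sum_{n\ge0}E_0\Big[\exp\big(-C\varepsilon|\log\varepsilon|^{-1/2}|X_{[0,n]}|\big);\ X_n=x\Big]=\sum_{n=0}^\infty p_n(x)\,E_0\Big[\exp\big(-C\varepsilon|\log\varepsilon|^{-1/2}|X_{[0,n]}|\big)\,\Big|\,X_n=x\Big]$$
$$\ge\sum_{n=0}^\infty p_n(x)\exp\big(-C\varepsilon|\log\varepsilon|^{-1/2}E^{(n)}_{0,x}|X_{[0,n]}|\big)\ge p_m(x)\exp\big(-C\varepsilon|\log\varepsilon|^{-1/2}E^{(m)}_{0,x}|X_{[0,m]}|\big),$$
where $m=m(|x|,\varepsilon)\overset{\mathrm{def}}{=}|\log\varepsilon|^{3/4}\varepsilon^{-1/2}|x|$. We apply Proposition C.1, and use
$$p_m(x)\ge\frac Cm\exp\Big[-mI\Big(\frac xm\Big)\Big]\ge\frac Cm\exp\Big[-\rho\frac{|x|^2}m\Big],$$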
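for some positive $\rho$, see Proposition B.2. So we get
$$\sum_{n=0}^\infty E_0\Big[\exp\big(-C\varepsilon|\log\varepsilon|^{-1/2}|X_{[0,n]}|\big);\ X_n=x\Big]\ge\frac{C}{|\log\varepsilon|^{3/4}\varepsilon^{-1/2}|x|}\exp\Big[-\rho|\log\varepsilon|^{-3/4}\varepsilon^{1/2}|x|-C'\varepsilon|\log\varepsilon|^{-1/2}\frac{|\log\varepsilon|^{3/4}\varepsilon^{-1/2}|x|}{\log(|\log\varepsilon|^{3/4}\varepsilon^{-1/2})}\Big]\ge\exp\big[-C''|\log\varepsilon|^{-3/4}\varepsilon^{1/2}|x|\big]$$
for small enough $\varepsilon>0$, and then large enough $|x|$. This proves the lower bound. A trivial modification is necessary for $d\ge3$: we have to replace the use of Proposition C.1 by the completely trivial bound $|X_{[0,n]}|\le n+1$.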
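The choice $m=|\log\varepsilon|^{3/4}\varepsilon^{-1/2}|x|$ balances the two contributions to the exponent. One is the Gaussian cost $\rho|x|^2/m$. The other is the range cost $\varepsilon|\log\varepsilon|^{-1/2}\cdot m/\log A$ from Proposition C.1, with $A=m/|x|$. The sketch below divides both by $\varepsilon^{1/2}|\log\varepsilon|^{-3/4}|x|$; `rho` and `c_range` are placeholder constants, not values from the paper. The first ratio is exactly $\rho$. The second equals $c\,|\log\varepsilon|/\log A$, which stays below $2c$ and increases slowly towards it, because $\log A=\frac12|\log\varepsilon|+\frac34\log|\log\varepsilon|$.

```python
import math


def exponent_ratios(eps, x_norm, rho=1.0, c_range=1.0):
    """Both exponent contributions divided by eps^{1/2} |log eps|^{-3/4} |x|.

    rho and c_range are placeholder constants (not the paper's values).
    """
    L = abs(math.log(eps))
    m = L**0.75 * eps**-0.5 * x_norm   # the time m(|x|, eps)
    A = m / x_norm                     # so that m = A |x| as in Proposition C.1
    scale = eps**0.5 * L**-0.75 * x_norm
    gaussian_cost = rho * x_norm**2 / m                        # from the bound on p_m(x)
    range_cost = c_range * eps * L**-0.5 * m / math.log(A)     # from E|X_[0,m]| <= C m / log A
    return gaussian_cost / scale, range_cost / scale
```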
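Proof of the lower bound. We start by proving that the logarithmic asymptotics of the 2-point function $\mu^\varepsilon_\Lambda(\varphi_0\varphi_x)$ are entirely determined by the probability that the random walk reaches $x$ before dying.
Lemma 5.1. 1. The limit
$$\nu^\varepsilon\otimes P_0\big[T^{in}_{\{x\}}<T^{in}_{\mathcal A}\big]\overset{\mathrm{def}}{=}\lim_{\Lambda\uparrow\mathbb Z^2}\nu^\varepsilon_\Lambda\otimes P_0\big[T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big]$$
exists for all $x\in\mathbb Z^d$.
2. For all $x\in S^1$,
$$\limsup_{k\to\infty}\frac1k\log\mu^\varepsilon(\varphi_0\varphi_{[kx]})\le\limsup_{k\to\infty}\frac1k\log\nu^\varepsilon\otimes P_0\big[T^{in}_{\{[kx]\}}<T^{in}_{\mathcal A}\big],$$
$$\liminf_{k\to\infty}\frac1k\log\mu^\varepsilon(\varphi_0\varphi_{[kx]})\ge\liminf_{k\to\infty}\frac1k\log\nu^\varepsilon\otimes P_0\big[T^{in}_{\{[kx]\}}<T^{in}_{\mathcal A}\big].$$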
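(That these limits actually exist is proved in Appendix A.)
Proof. 1. If $\Lambda\subset\Lambda'\subset\mathbb Z^2$, the FKG property of $\nu^\varepsilon$ implies $\nu^\varepsilon_{\Lambda'}(\mathcal A\cap D=\emptyset)\ge\nu^\varepsilon_\Lambda(\mathcal A\cap D=\emptyset)$ for any set $D\subset\Lambda$, see (2.6). Therefore
$$\nu^\varepsilon_{\Lambda'}\otimes P_0\big[T^{in}_{\{x\}}<T^{out}_{\Lambda'\setminus\mathcal A}\big]=E_0\Big[I\big(T^{in}_{\{x\}}<T^{out}_{\Lambda'}\big)\,\nu^\varepsilon_{\Lambda'}\big(\mathcal A\cap X_{[0,T^{in}_{\{x\}}]}=\emptyset\big)\Big]\ge E_0\Big[I\big(T^{in}_{\{x\}}<T^{out}_{\Lambda}\big)\,\nu^\varepsilon_{\Lambda}\big(\mathcal A\cap X_{[0,T^{in}_{\{x\}}]}=\emptyset\big)\Big]=\nu^\varepsilon_\Lambda\otimes P_0\big[T^{in}_{\{x\}}<T^{out}_{\Lambda\setminus\mathcal A}\big],$$
which proves the claim, since the probabilities are increasing in $\Lambda$ and bounded by 1.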
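2. Using the expansion (2.4), we can write
$$\mu^\varepsilon_\Lambda(\varphi_0\varphi_x)=\sum_{n\ge0}\nu^\varepsilon_\Lambda\otimes P_0\big[X_n=x,\ X_{[0,n]}\subset\mathcal A^c\big]=\sum_{m\ge0}\sum_{n\ge0}\sum_{A\subset\Lambda}\nu^\varepsilon_\Lambda(A)\,P_0\big[T^{in}_{\{x\}}=m<T^{out}_{A^c}\big]\,P_x\big[X_n=x,\ X_{[0,n]}\subset A^c\big]$$
$$=\sum_{n\ge0}\sum_{A\subset\Lambda}\nu^\varepsilon_\Lambda(A)\,P_0\big[T^{in}_{\{x\}}<T^{out}_{A^c}\big]\,P_x\big[X_n=x,\ X_{[0,n]}\subset A^c\big]=\sum_{A\subset\Lambda}\nu^\varepsilon_\Lambda(A)\,P_0\big[T^{in}_{\{x\}}<T^{out}_{A^c}\big]\,G_{A^c}(x,x)$$
$$=\sum_{R\ge0}\sum_{\substack{A\subset\Lambda\\d_\infty(A,x)=R}}\nu^\varepsilon_\Lambda(A)\,P_0\big[T^{in}_{\{x\}}<T^{out}_{A^c}\big]\,G_{A^c}(x,x)$$
$$\le\nu^\varepsilon_\Lambda\otimes P_0\big[T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big]\sum_{R\ge0}\max_{y:\|x-y\|_\infty=R}G_{\mathbb Z^2\setminus\{y\}}(x,x)\;\nu^\varepsilon_\Lambda\otimes P_0\big[d_\infty(\mathcal A,x)=R\,\big|\,T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big]$$
$$\le\nu^\varepsilon_\Lambda\otimes P_0\big[T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big]\sum_{R\ge0}C\log R\;\nu^\varepsilon_\Lambda\otimes P_0\big[d_\infty(\mathcal A,x)=R\,\big|\,T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big].$$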
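We therefore have to bound the conditional probability. This can be done as follows:
$$\nu^\varepsilon_\Lambda\otimes P_0\big[d_\infty(\mathcal A,x)=R\,\big|\,T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big]\le\frac{\nu^\varepsilon_\Lambda\otimes P_0[d_\infty(\mathcal A,x)=R]}{\nu^\varepsilon_\Lambda\otimes P_0[T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}]}\wedge1\le\frac{e^{-C(\varepsilon)R^2}}{e^{-C'(\varepsilon)|x|^2}}\wedge1,$$
where we used Theorem 2.4 to bound the numerator, and the bound on the denominator follows from
$$\nu^\varepsilon_\Lambda\otimes P_0\big[T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big]\ge\sum_{n\ge0}E_0\Big[I\big(T^{in}_{\{x\}}=n\big)(1-p_-)^{|X_{[0,n]}|}\Big]\ge(1-p_-)^{|x|^2}\,P_0\big[T^{in}_{\{x\}}\le|x|^2-1\big]$$
and the local CLT. Therefore the sum over $R$ is smaller than $C(\varepsilon)(|x|\log|x|+1)$, which proves the first claim. To prove the second claim, notice that
$$\mu^\varepsilon_\Lambda(\varphi_0\varphi_x)=\sum_{R\ge0}\sum_{\substack{A\subset\Lambda\\d_\infty(A,x)=R}}\nu^\varepsilon_\Lambda(A)\,P_0\big[T^{in}_{\{x\}}<T^{out}_{A^c}\big]\,G_{A^c}(x,x)\ge\nu^\varepsilon_\Lambda\otimes P_0\big[T^{in}_{\{x\}}<T^{out}_{\mathcal A^c}\big],$$
since $G_{A^c}(x,x)\ge1$ (one can restrict the sum to sets $A$ not containing $x$, since otherwise the probability of reaching $x$ is 0).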
Let $\Delta(\varepsilon)=\varepsilon^{-1/2}|\log\varepsilon|^{3/4}$. We consider a partition of $\mathbb Z^2$ into cells of width $\Delta(\varepsilon)$, and write, for $y\in\mathbb Z^2$, $B_y$ for the cell containing $y$, and $\bar B_y$ for the square composed of $(2M+1)\times(2M+1)$ cells with middle cell $B_y$, where $M$ is a big integer to be chosen later. We introduce the following stopping times:
• $T_0=0$;
• $T_k=\min\{n>T'_{k-1} : \bar B_{X_n}\cap\bar B_{X_{T_l}}=\emptyset\ \forall l<k\}$ $(k\ge1)$;
• $T'_k=\min\{n>T_k : X_n\notin B_{M\Delta(\varepsilon)}(X_{T_k})\}$ $(k\ge0)$,
where $B_{M\Delta(\varepsilon)}(y)$ is the ball of radius $M\Delta(\varepsilon)$ and center $y$. Let also $\bar k=\max\{k\ge0 : T_k<T^{in}_{\{x\}}\}$, and let $c$ be some small constant to be chosen later. We then have (remember that $p_+=C\varepsilon|\log\varepsilon|^{-1/2}$)
$$\nu^\varepsilon\otimes P_0\big[T^{in}_{\{x\}}<T^{in}_{\mathcal A}\big]\le E_0\Big[(1-p_+)^{|X_{[0,T^{in}_{\{x\}}]}|}\Big]\le E_0\Big[(1-p_+)^{|X_{[0,T^{in}_{\{x\}}]}|}I\big(\bar k>c|x|/\Delta(\varepsilon)\big)\Big]+P_0\big(\bar k\le c|x|/\Delta(\varepsilon)\big).$$
By Proposition B.3, the last probability is bounded from above by $e^{-C|x|}$, with $C>0$ independent of $\varepsilon$, provided $c$ is chosen small enough; indeed, the total number of cells visited is certainly smaller than $(2M+1)^2(\bar k+1)$. Let us now consider the first term. Clearly,
$$E_0\Big[(1-p_+)^{|X_{[0,T^{in}_{\{x\}}]}|}I\big(\bar k>c|x|/\Delta(\varepsilon)\big)\Big]\le E_0\Big[\prod_{k=0}^{\bar k-1}(1-p_+)^{|X_{[T_k,T'_k]}|}\,I\big(\bar k\ge c|x|/\Delta(\varepsilon)\big)\Big]\le E_0\Big[\prod_{k=0}^{c|x|/\Delta(\varepsilon)}(1-p_+)^{|X_{[T_k,T'_k]}|}\Big]\le\Big(E_0\Big[(1-p_+)^{|X_{[0,T^{out}_{B_{M\Delta(\varepsilon)}(0)}]}|}\Big]\Big)^{c|x|/\Delta(\varepsilon)}.$$
The conclusion follows, since the latter expectation can easily be bounded. Choosing some $C_1>0$, we split as follows:
$$E_0\Big[(1-p_+)^{|X_{[0,T^{out}_{B_{M\Delta(\varepsilon)}(0)}]}|}\Big]\le e^{-p_+C_1/p_+}+P_0\big(T^{out}_{B_{M\Delta(\varepsilon)}(0)}<M\Delta(\varepsilon)^2\big)+P_0\big(|X_{[0,M\Delta(\varepsilon)^2]}|\le C_1/p_+\big)\le3/4.$$
We now choose first $C_1$ such that the first summand, $e^{-C_1}$, is $\le1/4$. Next, observe that by the invariance principle for the random walk, we have
$$\lim_{M\to\infty}P_0\big(T^{out}_{B_{M\Delta(\varepsilon)}(0)}<M^2\Delta(\varepsilon)^2\big)=P_0\big(\sigma_{B_1(0)}<1\big),$$
where $\sigma$ is the exit time of a Brownian motion with covariance $Q$, and therefore
$$\lim_{M\to\infty}P_0\big(T^{out}_{B_{M\Delta(\varepsilon)}(0)}<M\Delta(\varepsilon)^2\big)=0,$$
uniformly in $\varepsilon\le1/2$, say. As $M\to\infty$, the third summand also converges to 0, uniformly in $\varepsilon\le1/2$, which follows from the law of large numbers for the range of the
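random walk, see [17] (notice that $C_2/p_+=C_2C$

$$\lim_{k\to\infty}\frac{1}{|x|k}\log\nu^\varepsilon\otimes P_0\big[T^{in}_{\{kx\}}<T^{in}_{\mathcal A}\big]=\sup_k\frac{1}{|x|k}\log\nu^\varepsilon\otimes P_0\big[T^{in}_{\{kx\}}<T^{in}_{\mathcal A}\big]$$
exists for all $x\in\mathbb Z^2$.
Proof. This follows from a standard subadditivity argument, since
$$\nu^\varepsilon\otimes P_0\big[T^{in}_{\{[(k+l)x]\}}<T^{in}_{\mathcal A}\big]\ge\nu^\varepsilon\otimes P_0\big[T^{in}_{\{[kx]\}}<\tilde T^{in}_{\{[(k+l)x]\}}<T^{in}_{\mathcal A}\big]=E_0\Big[I\big(T^{in}_{\{[kx]\}}<\tilde T^{in}_{\{[(k+l)x]\}}<\infty\big)\,\nu^\varepsilon\big(\mathcal A\cap X_{[0,\tilde T^{in}_{\{[(k+l)x]\}}]}=\emptyset\big)\Big]$$
$$\ge E_0\Big[I\big(T^{in}_{\{[kx]\}}<\tilde T^{in}_{\{[(k+l)x]\}}<\infty\big)\,\nu^\varepsilon\big(\mathcal A\cap X_{[0,T^{in}_{\{[kx]\}}]}=\emptyset\big)\,\nu^\varepsilon\big(\mathcal A\cap X_{[T^{in}_{\{[kx]\}},\tilde T^{in}_{\{[(k+l)x]\}}]}=\emptyset\big)\Big]\ge C\,\nu^\varepsilon\otimes P_0\big[T^{in}_{\{[kx]\}}<T^{in}_{\mathcal A}\big]\,\nu^\varepsilon\otimes P_0\big[T^{in}_{\{[lx]\}}<T^{in}_{\mathcal A}\big],$$
where $\tilde T^{in}_{\{[(k+l)x]\}}\overset{\mathrm{def}}{=}\min\{n>T^{in}_{\{[kx]\}} : X_n=[(k+l)x]\}$, and the inequality follows from the FKG property. The constant $C$, which depends only on $p$, takes care of the possible discrepancy between $[(k+l)x]$ and $[kx]+[lx]$.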
B. Some Properties of Random Walks
We keep the assumptions on $p$ made in the introduction. In particular, we always assume the existence of a moment of order $2+\delta$ (1.2) and that the random walk is irreducible and aperiodic. We always use $C, C'$ for positive constants, not necessarily the same at different occurrences, which may depend on $p(\cdot)$ and $d$, but on nothing else.
B.1. Properties of Green functions for random walks in dimension 2. We denote by $a(x)=\sum_{n\ge0}\big(P_0(X_n=0)-P_0(X_n=x)\big)$ the potential kernel associated to the random walk. For any $B\subset\mathbb Z^2$, $G_B(x,y)\overset{\mathrm{def}}{=}E_x\sum_{n=0}^{T^{out}_B}I(X_n=y)$ is the Green function of the random walk killed as it exits $B$. For $m\ge0$, we write $G_m(x,y)\overset{\mathrm{def}}{=}E_x\sum_{n=0}^{m}I(X_n=y)$ for the $m$-step Green function.
Let $Q$ be the covariance matrix of $p$. We write $\|x\|_Q=\sqrt{(x,Q^{-1}x)}$, where $(\cdot,\cdot)$ is the inner product in $\mathbb R^2$. Observe that there exist $c>0$ and $c'<\infty$ such that $c|x|\le\|x\|_Q\le c'|x|$.
Proposition B.1. 1. There exists a constant $K>0$ depending on $p(\cdot)$ such that
$$\lim_{|x|\to\infty}\big[a(x)-(\pi\sqrt{\det Q})^{-1}\log\|x\|_Q-K\big]=0.\qquad(B.1)$$
2. Let $B$ be the box of radius $R$ centered at the origin. Then, as $R\to\infty$,
$$G_B(0,0)=(\pi\sqrt{\det Q})^{-1}\log R+O(1).\qquad(B.2)$$
3. Let $x\in\mathbb Z^2$. Then, as $|x|\to\infty$,
$$G_{\mathbb Z^2\setminus\{x\}}(0,0)=2(\pi\sqrt{\det Q})^{-1}\log|x|+O(1).\qquad(B.3)$$
4. As $n\to\infty$,
$$G_n(0,0)=(2\pi\sqrt{\det Q})^{-1}\log n+O(1).\qquad(B.4)$$
5. Let $B$ be the box of radius $R$ centered at the origin, and let $x\in B$ be such that $|x|\le\frac12R$. Then there exist $K_3>0$ and $R_0>0$ such that, for all $R\ge R_0$,
$$P_x\big[T^{in}_{\{0\}}\le T^{out}_B\big]\ge\frac{K_3}{\log R}.\qquad(B.5)$$
Proof. (B.1) is proved in [13]. (B.2) follows from (B.1) by a standard argument, see [18]. The proof there is for the nearest-neighbor random walk only, but it can easily be adapted to cover the more general case considered here. (B.3) follows from (B.1) and P11.6 in [20]. (B.4) follows from a standard local limit theorem:
$$p_n(0)=\frac{1}{2\pi\sqrt{\det Q}\,n}+O(n^{-1-\varepsilon})\qquad(B.6)$$
for some positive $\varepsilon$. Under the assumption of the existence of a third moment, this is a standard Berry–Esseen type estimate (with $\varepsilon=1/2$). We don’t know of an exact reference under the assumption of a $(2+\delta)$-moment only. The paper [15] treats the case of a one-dimensional random walk. The method there can easily be adapted to prove (B.6) on the two-dimensional lattice. Finally, (B.5) is proved in [18], Proposition 1.6.7, for the simple random walk. Again, the proof can easily be adapted to cover the more general case.
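Statement (B.4) can be observed numerically in the one case where return probabilities are explicit, the simple random walk on $\mathbb Z^2$. That walk is periodic and so lies outside the standing aperiodicity assumption; it is used here only as an illustration. For it $Q=\frac12I$, hence $(2\pi\sqrt{\det Q})^{-1}=1/\pi$, and (B.4) predicts $G_n(0,0)=\frac1\pi\log n+O(1)$. The sketch relies on the classical identity $P_0(X_{2j}=0)=\big(\binom{2j}{j}4^{-j}\big)^2$ and checks that $G_{10n}(0,0)-G_n(0,0)$ is close to $\frac1\pi\log10$. The function name is ours.

```python
import math


def green_n_srw2d(n):
    """G_n(0, 0) = sum_{k=0}^n P_0(X_k = 0) for the simple random walk on Z^2.

    Uses P_0(X_{2j} = 0) = (binom(2j, j) / 4^j)^2 and P_0(X_k = 0) = 0 for odd k.
    """
    c = 1.0       # binom(2j, j) / 4^j at j = 0
    total = 1.0   # the k = 0 term
    for j in range(1, n // 2 + 1):
        c *= (2 * j - 1) / (2 * j)  # ratio of consecutive central binomial terms
        total += c * c
    return total
```

Only differences are compared, so the unknown $O(1)$ constant in (B.4) cancels.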
B.2. Approximations for $p_n(x)$. We will need some essentially well-known facts about $p_n(x)$ for large $n$ and $x$, in case there exists an exponential moment of $p$. For the convenience of the reader, we sketch the argument, which is completely standard. The results in this subsection hold in general dimensions.
Proposition B.2. Assume
$$\sum_xp(x)\,e^{a|x|}<\infty$$
for some $a>0$. Then there exists $\eta>0$ such that for $|x/n|<\eta$,
$$p_n(x)=\exp\big[-nI(x/n)\big]\Big(\frac{1}{(2\pi n)^{d/2}\sqrt{\det Q(x/n)}}+O\Big(\frac{1}{n^{(d+1)/2}}\Big)\Big),$$
where $Q(\xi)$, $|\xi|<\eta$, are $d\times d$ matrices depending analytically on $\xi$ and satisfying $Q(0)=Q$. The function $I(\xi)$, $|\xi|<\eta$, also depends analytically on $\xi$ and satisfies $I(0)=0$, $\nabla I(0)=0$, $\nabla^2I(0)=Q^{-1}$.
Proof. We use the standard approximation of $p_n(x)$ by tilting the measure and applying a local central limit theorem with error estimate. For $\lambda\in\mathbb R^d$ in a neighborhood of 0, we consider the tilted measure
$$p^{(\lambda)}(x)\overset{\mathrm{def}}{=}\frac{p(x)\exp(\lambda,x)}{z(\lambda)},$$
where $z(\lambda)\overset{\mathrm{def}}{=}\sum_xp(x)\exp(\lambda,x)$. Clearly $\nabla\log z(0)=0$ and $\nabla^2\log z(0)=Q$. Therefore, the mapping $\lambda\mapsto\nabla\log z(\lambda)$ is an analytic diffeomorphism from a neighborhood of 0 onto a neighborhood of 0, leaving 0 fixed. Hence, for any $\xi$ in a neighborhood of 0 in $\mathbb R^d$, there exists a unique $\lambda(\xi)$ with $\nabla\log z(\lambda(\xi))=\xi$. Using this, we see that for $|x|\le\eta n$, $\eta>0$ small enough, we can write
$$p_n(x)=\exp\big[-nI(x/n)\big]\,p_n^{(\lambda(x/n))}(x),$$
where $I(\xi)\overset{\mathrm{def}}{=}(\lambda(\xi),\xi)-\log z(\lambda(\xi))$. Evidently, $I(0)=0$, $\nabla I(0)=0$, and a simple computation yields $\nabla^2I(0)=Q^{-1}$. Furthermore, $p^{(\lambda(x/n))}$ now has mean exactly $x/n$ and covariance matrix $Q(x/n)$, where $Q(\xi)$ depends analytically on $\xi$ and satisfies $Q(0)=Q$. Applying a local central limit theorem with a standard Berry–Esseen type error estimate, we get
$$\Big|p_n^{(\lambda(x/n))}(x)-\frac{1}{(2\pi n)^{d/2}\sqrt{\det Q(x/n)}}\Big|\le\frac{C}{n^{(d+1)/2}}.$$
Corollary B.1. Assume (2.2). There exist $\kappa_0$ and $K>0$ such that for $\kappa\ge\kappa_0$ and all $x\in\mathbb Z^d$ with $|x|$ large enough,
$$P_0\big[X_{[\kappa|x|]}=x\big]\ge e^{-K\kappa^{-1}|x|}.\qquad(B.7)$$
Proof. Approximate $I$ in Proposition B.2 by an appropriate quadratic function.
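The exponential tilting in the proof of Proposition B.2 can be made concrete in a fully explicit case. We take $d=1$ and the lazy walk with $p(0)=\frac12$, $p(\pm1)=\frac14$; this example is our own choice. It is aperiodic with $Q=\frac12$. Here $z(\lambda)=\cosh^2(\lambda/2)$, $\lambda(\xi)=2\,\mathrm{artanh}\,\xi$ and $Q(\xi)=\frac12(1-\xi^2)$, and $I(\xi)=(1+\xi)\log(1+\xi)+(1-\xi)\log(1-\xi)$. Since one lazy step has the law of the sum of two fair Bernoulli variables minus 1, $X_n$ is a Binomial$(2n,\frac12)$ variable minus $n$, so $p_n(x)$ is known exactly. The sketch compares it with the leading term of Proposition B.2 and checks $I''(0)=Q^{-1}=2$ by finite differences. The function names are ours.

```python
import math


def lazy_pn(n, x):
    """Exact P_0(X_n = x) for the lazy walk: X_n = Binomial(2n, 1/2) - n."""
    log_binom = math.lgamma(2 * n + 1) - math.lgamma(n + x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_binom - n * math.log(4))


def rate_I(xi):
    """I(xi) = lambda(xi) * xi - log z(lambda(xi)) with z(lambda) = cosh(lambda/2)^2."""
    lam = 2 * math.atanh(xi)              # solves (log z)'(lambda) = tanh(lambda/2) = xi
    log_z = 2 * math.log(math.cosh(lam / 2))
    return lam * xi - log_z


def tilted_approx(n, x):
    """Leading term exp(-n I(x/n)) / sqrt(2 pi n Q(x/n)) of Proposition B.2 for d = 1."""
    xi = x / n
    q_xi = 0.5 * (1 - xi**2)              # variance of the tilted step distribution
    return math.exp(-n * rate_I(xi)) / math.sqrt(2 * math.pi * n * q_xi)
```

For $n=400$ and $x=40$ the two values agree to about $10^{-3}$ in relative terms, in line with the $O(n^{-1/2})$ relative error allowed by the proposition.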
B.3. Crossing probabilities for thick shells. We start with some one-dimensional considerations. Let $(X_i)_{i\ge0}$ be a $\mathbb Z$-valued random walk, where the jumps $X_i-X_{i-1}$ are distributed according to $(q(x))_{x\in\mathbb Z}$; we assume that $\sum_xx\,q(x)=0$ and $\sum_x\exp[\alpha|x|]\,q(x)<\infty$ for some $\alpha>0$. We define the ladder epochs and ladder heights
$$\tau_0\overset{\mathrm{def}}{=}0,\quad\xi_0\overset{\mathrm{def}}{=}0,\qquad\tau_{k+1}\overset{\mathrm{def}}{=}\inf\{n>\tau_k : X_n\ge\xi_k+1\},\quad\xi_{k+1}\overset{\mathrm{def}}{=}X_{\tau_{k+1}}.$$
By the Markov property, the sequence $(\xi_k-\xi_{k-1})_{k\ge1}$ is i.i.d.
Lemma B.1. a) $E_0\exp[\alpha'\xi_1]<\infty$ for some $\alpha'>0$.
b) Let $K,n\in\mathbb N$, and define the intervals $I_j\overset{\mathrm{def}}{=}((j-1)K,jK]\subset\mathbb N$. Let also $\zeta\overset{\mathrm{def}}{=}\#\{j\le n : \exists k,\ \xi_k\in I_j\}$. Then for any $0<s<1$,
$$\limsup_{K\to\infty}\limsup_{n\to\infty}\frac{1}{Kn}\log P_0(\zeta\le sn)<0.$$
Proof. a) is well known. For the convenience of the reader, we give a crude proof, sufficient for our purpose:
$$P_0(\xi_1\ge k)\le P_0\big(\xi_1\ge k,\ \tau_1\le\exp[\lambda k]\big)+P_0\big(\tau_1>\exp[\lambda k]\big)\le\exp[\lambda k]\,P_0(X_1\ge k)+\frac{C}{\sqrt{\exp[\lambda k]}}\le\exp[-\alpha'k]$$
for some $\alpha'>0$, by choosing $\lambda>0$ appropriately, for large enough $k$.
b) Let $\sigma\overset{\mathrm{def}}{=}\min\{j : \xi_j>Kn\}$. Then, by standard large deviation estimates, $P_0(\sigma>\lambda Kn)=P_0(\xi_{\lambda Kn}\le Kn)\le\exp[-CKn]$ for $Kn$ large enough, when $\lambda$ is chosen appropriately (e.g. $\lambda=2/E_0\xi_1$). We consider the independent differences $\Delta_j\overset{\mathrm{def}}{=}\xi_j-\xi_{j-1}$. We have, for $0<s<1$,
$$P_0(\zeta\le sn,\ \sigma\le\lambda Kn)\le P_0\Big(\sum_{j=1}^{\lambda Kn}\frac{\Delta_j}{K}I(\Delta_j>K)\ge(1-s)n\Big)\le\exp[-aK(1-s)n]\Big(E_0\exp\big[a\Delta_jI(\Delta_j>K)\big]\Big)^{\lambda Kn},$$
for any $a>0$. According to a), we can choose $a$ such that $E_0\exp[a\Delta_j]<\infty$, and then, for any $\delta>0$, we may choose $K$ large enough such that $E_0\exp[a\Delta_jI(\Delta_j>K)]\le1+\delta$, i.e.
$$P_0(\zeta\le sn,\ \sigma\le\lambda Kn)\le\exp\big[-aK(1-s)n+\delta\lambda Kn\big]\le\exp\Big[-\frac{aK(1-s)n}{2}\Big]$$
if $\delta\le a(1-s)/2\lambda$. This proves the claim.
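The ladder epochs and heights of Lemma B.1 can be read off a path in a single pass, since $\xi_{k+1}$ is simply the next strict new maximum of the path. The sketch below computes them for a finite path starting at 0 and also evaluates $\zeta$ for given $K$ and $n$. The function names and the sample path are illustrative only.

```python
def ladder(path):
    """Ladder epochs tau_k and heights xi_k of a finite path with path[0] == 0:
    tau_{k+1} = inf{n > tau_k : X_n >= xi_k + 1}, xi_{k+1} = X_{tau_{k+1}}."""
    taus, xis = [0], [0]
    for n in range(1, len(path)):
        if path[n] >= xis[-1] + 1:  # the path strictly exceeds its previous ladder height
            taus.append(n)
            xis.append(path[n])
    return taus, xis


def zeta(xis, K, n):
    """Number of intervals I_j = ((j-1)K, jK], j <= n, containing some ladder height."""
    return sum(1 for j in range(1, n + 1)
               if any((j - 1) * K < h <= j * K for h in xis))
```

For the path `[0, -1, 1, 0, 3, 2, 4]` the ladder epochs are `[0, 2, 4, 6]` and the heights are `[0, 1, 3, 4]`. With $K=2$ both $I_1=(0,2]$ and $I_2=(2,4]$ are hit, so $\zeta=2$ for $n=2$.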
We now apply this to our $d$-dimensional random walk, where we again assume the existence of an exponential moment (2.2). Let $\mathbb Z^d(K)$ be the division of $\mathbb Z^d$ into square cells of side length $K$, where we take $\{1,\dots,K\}^d$ as one of the blocks. We further consider a big square $S_{n,K}\overset{\mathrm{def}}{=}\{-nK+1,-nK+2,\dots,nK\}^d$, which of course is divided into $(2n)^d$ cells of side length $K$.
Proposition B.3. Let $\eta_{n,K}$ be the number of cells in $\mathbb Z^d(K)$ which are visited by the random walk up to time $T^{out}_{S_{n,K}}$. Then for any $s\in(0,1)$,
$$\limsup_{K\to\infty}\limsup_{n\to\infty}\frac{1}{Kn}\log P_0(\eta_{n,K}\le sn)<0.$$
Proof. The proposition is an easy consequence of the one-dimensional result. Indeed, write the random walk in (dependent) components $X_n=(X_{n,1},X_{n,2},\dots,X_{n,d})$, where the $X_{n,i}$ are one-dimensional random walks possessing an exponential moment. The time $T^{out}_{S_{n,K}}$ when $(X_n)$ first leaves $S_{n,K}$ is also the first time one of the $X_{n,i}$, $1\le i\le d$, leaves the interval $\{-nK+1,-nK+2,\dots,nK\}$. Assume, for instance, that at $T^{out}_{S_{n,K}}$ the component $X_{n,1}$ leaves the above interval for the first time, to the right. (There are of course $2d-1$ other cases.) This is then the first time it surpasses $nK$. Furthermore, the number of $K$-cells visited by the $d$-dimensional walk is at least the number of intervals among $(0,K],(K,2K],\dots,((n-1)K,nK]$ visited by $X_{n,1}$. For the other $2d-1$ cases, similar statements hold, of course. From this observation, Proposition B.3 follows immediately from Lemma B.1.
C. On the Range of a Random Walk
We present here two results about the number of points visited by a two-dimensional random walk.
C.1. Tied-down expectations of $|X_{[0,n]}|$. We write $P^{(n)}_{x,y}$ for the law of a random walk starting at $x$, conditioned on $X_n=y$. Of course, we tacitly always assume that the probability of the latter is positive whenever we use this notation, which is certainly true for large enough $n$ (depending on $x,y$). We will need some information on the first return probabilities:
$$q_l\overset{\mathrm{def}}{=}P_0(X_1\ne0,\dots,X_{l-1}\ne0,\ X_l=0).$$
By recurrence of the random walk, we have $\sum_{l\ge1}q_l=1$, and the following estimate is well known.
Lemma C.1. As $l\to\infty$,
$$q_l=\frac{\gamma}{l(\log l)^2}+o\Big(\frac{1}{l(\log l)^2}\Big),\qquad(C.1)$$
where $\gamma>0$.
We need some information on $E^{(n)}_{0,x}|X_{[0,n]}|$.
Proposition C.1. There exist $A_0 > 1$ and $C > 0$ such that for all $A \ge A_0$ there exists $r_0(A) \in \mathbb{N}$ such that for $|x| \ge r_0(A)$ and $n$ defined by $n := [A|x|]$, one has
$$E^{(n)}_{0,x}|X_{[0,n]}| \le C\,\frac{n}{\log A}.$$

Proof. Remark first that under the conditions of the proposition, $P_0(X_n = x) > 0$ for $r_0(A)$ large enough. This easily follows from irreducibility and aperiodicity. Therefore, $E^{(n)}_{0,x}|X_{[0,n]}|$ is well defined. We first derive a simple exact expression for this expectation:
$$E^{(n)}_{0,x}|X_{[0,n]}| = n + 1 - \sum_{l=1}^{n} (n-l+1)\, q_l\, \frac{p_{n-l}(x)}{p_n(x)}. \qquad (C.2)$$
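This readily follows from a standard "last exit – first entrance" decomposition. We have
$$E^{(n)}_{0,x}|X_{[0,n]}| = \frac{E_0\big(|X_{[0,n]}|;\, X_n = x\big)}{p_n(x)},$$
and
$$\begin{aligned}
E_0\big(|X_{[0,n]}|;\, X_n = x\big) &= \sum_{y \in \mathbb{Z}^2} P_0\big(X_k = y \text{ for some } k \in [0,n],\ X_n = x\big)\\
&= \sum_{y \in \mathbb{Z}^2} \sum_{k=0}^{n} P_0\big(X_k = y,\ X_j \neq y \text{ for } k < j \le n,\ X_n = x\big)\\
&= \sum_{y \in \mathbb{Z}^2} \sum_{k=0}^{n} P_0(X_k = y)\, P_y\big(X_j \neq y \text{ for } 1 \le j \le n-k,\ X_{n-k} = x\big),
\end{aligned} \qquad (C.3)$$
where
$$\begin{aligned}
P_y\big(X_j \neq y \text{ for } 1 \le j \le n-k,\ X_{n-k} = x\big) &= p_{n-k}(x-y) - \sum_{l=1}^{n-k} P_y\big(X_1 \neq y, \dots, X_{l-1} \neq y,\ X_l = y,\ X_{n-k} = x\big)\\
&= p_{n-k}(x-y) - \sum_{l=1}^{n-k} q_l\, p_{n-k-l}(x-y).
\end{aligned}$$
Implementing this into (C.3) and summing over $y$ yields
$$\begin{aligned}
E_0\big(|X_{[0,n]}|;\, X_n = x\big) &= (n+1)\,p_n(x) - \sum_{k=0}^{n-1} \sum_{l=1}^{n-k} q_l\, p_{n-l}(x)\\
&= (n+1)\,p_n(x) - \sum_{l=1}^{n} (n-l+1)\, q_l\, p_{n-l}(x).
\end{aligned}$$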
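Identity (C.2) is exact, so it can be checked by brute force for short walks. The sketch below assumes the simple random walk on $\mathbb{Z}^2$ (our choice; the text allows more general walks), computes $p_k(x)$ by dynamic programming and $q_l$ from the renewal equation, and compares the right-hand side of (C.2) with a direct enumeration of all $4^n$ paths, in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

# Assumption: simple random walk on Z^2.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def step_distributions(n):
    """p[k][y] = P_0(X_k = y) for k = 0..n, as exact fractions."""
    p = [{(0, 0): Fraction(1)}]
    for _ in range(n):
        nxt = {}
        for (a, b), w in p[-1].items():
            for da, db in STEPS:
                key = (a + da, b + db)
                nxt[key] = nxt.get(key, Fraction(0)) + w / 4
        p.append(nxt)
    return p


def first_returns(p, n):
    """q[l] from the renewal equation u_l = sum_{k=1}^{l} q_k u_{l-k}."""
    u = [p[l].get((0, 0), Fraction(0)) for l in range(n + 1)]
    q = [Fraction(0)] * (n + 1)
    for l in range(1, n + 1):
        q[l] = u[l] - sum(q[k] * u[l - k] for k in range(1, l))
    return q


def range_formula(n, x):
    """Right-hand side of (C.2); requires p_n(x) > 0."""
    p = step_distributions(n)
    q = first_returns(p, n)
    s = sum((n - l + 1) * q[l] * p[n - l].get(x, Fraction(0))
            for l in range(1, n + 1))
    return (n + 1) - s / p[n][x]


def range_bruteforce(n, x):
    """E^{(n)}_{0,x} |X_[0,n]| by enumerating all 4**n paths."""
    total = count = 0
    for path in product(STEPS, repeat=n):
        pos = (0, 0)
        visited = {pos}
        for da, db in path:
            pos = (pos[0] + da, pos[1] + db)
            visited.add(pos)
        if pos == x:
            total += len(visited)
            count += 1
    return Fraction(total, count)


# The two numbers coincide, as (C.2) is an exact identity.
print(range_formula(6, (1, 1)), range_bruteforce(6, (1, 1)))
```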
Critical Behavior of Massless Free Field at Depinning Transition
191
From this, (C.2) follows. We next use this together with the information on $q_l$ in Lemma C.1 and Proposition B.2 to get the desired estimate:
$$\begin{aligned}
E^{(n)}_{0,x}|X_{[0,n]}| &\le n + 1 - \sum_{l=1}^{A} (n-l+1)\, q_l\, \frac{p_{n-l}(x)}{p_n(x)}\\
&= (n+1)\Big[1 - \sum_{l=1}^{A} q_l\, \frac{(n-l+1)\,p_{n-l}(x)}{(n+1)\,p_n(x)}\Big]\\
&= (n+1)\Big[1 - \sum_{l=1}^{A} q_l + \sum_{l=1}^{A} q_l \Big(1 - \frac{(n-l+1)\,p_{n-l}(x)}{(n+1)\,p_n(x)}\Big)\Big]\\
&= (n+1) \sum_{l=A+1}^{\infty} q_l + (n+1) \sum_{l=1}^{A} q_l \Big(1 - \frac{(n-l+1)\,p_{n-l}(x)}{(n+1)\,p_n(x)}\Big),
\end{aligned} \qquad (C.4)$$
the last equality following from the recurrence of the two-dimensional random walk. From Lemma C.1, we get
$$\sum_{l=A+1}^{\infty} q_l \le \frac{C}{\log A}, \qquad (C.5)$$
and it therefore suffices to estimate the second summand on the right-hand side of (C.4). If n is large enough (depending on A), then |x| ≤ 2 (n − l) /A whenever l ≤ A. We use Proposition B.2 and obtain for |x| → ∞, and therefore n → ∞,
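$$\frac{(n-l+1)\,p_{n-l}(x)\,\exp\big[(n-l)\,I\big(\tfrac{x}{n-l}\big)\big]}{(n+1)\,p_n(x)\,\exp\big[n\,I\big(\tfrac{x}{n}\big)\big]} = 1 + O_A\Big(\frac{1}{\sqrt{n}}\Big),$$
where $O_A(1/\sqrt{n})$ means that there is a constant $c_A$ depending on $A$ such that $|O_A(1/\sqrt{n})| \le c_A/\sqrt{n}$. Furthermore, by Taylor expansion, we get
$$\exp\Big[-(n-l)\,I\Big(\frac{x}{n-l}\Big) + n\,I\Big(\frac{x}{n}\Big)\Big] = \exp\Big[l\Big(I\Big(\frac{x}{n}\Big) - \Big\langle \frac{x}{n}, \nabla I\Big(\frac{x}{n}\Big)\Big\rangle\Big) + O_A\Big(\frac{1}{n}\Big)\Big].$$
Remark that
$$\Big|I\Big(\frac{x}{n}\Big) - \Big\langle \frac{x}{n}, \nabla I\Big(\frac{x}{n}\Big)\Big\rangle\Big| \le C\,\frac{|x|^2}{n^2}.$$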
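Combining these observations, and using $|x|/n \le 2/A$, we get
$$\begin{aligned}
\sum_{l=1}^{A} q_l \Big(1 - \frac{(n-l+1)\,p_{n-l}(x)}{(n+1)\,p_n(x)}\Big) &\le \sum_{l=1}^{A} q_l \Big[\exp\Big(C\,\frac{l}{A^2}\Big) - 1 + O_A\Big(\frac{1}{\sqrt{n}}\Big)\Big]\\
&\le \frac{C}{A^2} \sum_{l=1}^{A} l\, q_l + O_A\Big(\frac{1}{\sqrt{n}}\Big) \le \frac{C}{A} + O_A\Big(\frac{1}{\sqrt{n}}\Big) \le \frac{2C}{A},
\end{aligned}$$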
for large enough $n$ ($n \ge n_0(A)$). This is much better than required, and therefore proves the proposition.

C.2. Moderate deviations for $|X_{[0,n]}|$. We use a variant of the approach in [1] to prove the following result.

Proposition C.2. Assume (1.2). For any $R > 0$ there exists $\kappa > 0$ such that
$$P_0\Big(|X_{[0,n]}| \le \kappa\,\frac{n}{\log n}\Big) \le n^{-R} \qquad (C.6)$$
for all $n$ large enough.

In contrast to our standard convention about constants denoted by $C$, the constants $c_1, c_2, \dots$ are positive and keep the same value once they have been introduced. If these constants depend on other parameters, this will be clearly indicated. All inequalities are supposed to hold only for large enough $n$, without further notice. Let $L_n := \sqrt{n/\log n}$ and $T_n := \{0, 1, \dots, L_n - 1\}^d$. We periodize the random walk by setting
$$\hat X_n := X_n \bmod L_n,$$
coordinatewise, getting therefore a random walk on the discrete torus $T_n$. The transition probabilities are given by $\hat p(x) := \sum_{y = x \bmod L_n} p(y)$. The number of points $|\hat X_{[0,n]}|$ visited by the periodized random walk is clearly at most $|X_{[0,n]}|$. Therefore
$$P_0\Big(|X_{[0,n]}| \le \kappa\,\frac{n}{\log n}\Big) \le P_0\Big(|\hat X_{[0,n]}| \le \kappa\,\frac{n}{\log n}\Big).$$
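The comparison $|\hat X_{[0,n]}| \le |X_{[0,n]}|$ holds path by path, since reducing modulo $L_n$ can only merge visited sites. A minimal simulation sketch, assuming the simple random walk on $\mathbb{Z}^2$ and rounding $L_n$ down to an integer (both our choices):

```python
import math
import random

STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # assumption: simple random walk


def ranges(n, seed):
    """Return (|X_[0,n]| on Z^2, |X^_[0,n]| on the torus of side L, L)."""
    L = max(1, int(math.sqrt(n / math.log(n))))   # L_n = sqrt(n / log n), rounded down
    rng = random.Random(seed)
    x = y = 0
    plane = {(0, 0)}
    torus = {(0, 0)}
    for _ in range(n):
        dx, dy = rng.choice(STEPS)
        x, y = x + dx, y + dy
        plane.add((x, y))
        torus.add((x % L, y % L))   # Python's % is non-negative for L > 0
    return len(plane), len(torus), L


print(ranges(10000, seed=1))
```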
For the rest of this section, we always work with this periodized walk, but leave the hat out in the notation for the sake of notational convenience. For convenience, we also assume that $p$ is aperiodic; the general case requires only some trivial adjustments. We choose $m = m_n := \delta\, n/\log n$, where $\delta > 0$ is a (small) number to be specified later on. We also set $K = K_n := [n/m_n] \approx (\log n)/\delta$. We denote by $\mathbf{X}$ the sequence of points observed at multiples of $m$:
$$\mathbf{X} := (X_0, X_m, X_{2m}, \dots, X_{Km}).$$
The set of points (on the torus) visited during the $i$th time interval is denoted by $V_i^0$:
$$V_i^0 := \{X_{(i-1)m+1}, X_{(i-1)m+2}, \dots, X_{im}\}.$$
We introduce a truncation by defining
$$V_i := \begin{cases} V_i^0 & \text{if } d(X_{(i-1)m}, X_{im}) \le b\sqrt{m},\\ \emptyset & \text{if } d(X_{(i-1)m}, X_{im}) > b\sqrt{m}, \end{cases}$$
where $d$ is the lattice distance on the discrete torus. We also write $\mathbf{V} = (V_1, V_2, \dots, V_K)$. Remark that the $d(X_{(i-1)m}, X_{im})$ are i.i.d. random variables and
$$P_0\big(d(X_{(i-1)m}, X_{im}) > b\sqrt{m}\big) \le \exp(-c_1 b^2).$$
Let
$$H_{n,b,\delta} := \#\{i : d(X_{(i-1)m}, X_{im}) > b\sqrt{m}\}.$$
Then $H_{n,b,\delta}$ is binomially distributed, and we obtain
$$P_0\Big(H_{n,b,\delta} \ge 2\exp(-c_1 b^2)\,\frac{\log n}{\delta}\Big) \le \exp\Big(-c_2 \exp(-c_1 b^2)\,\frac{\log n}{\delta}\Big). \qquad (C.7)$$
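The text does not specify $c_2$. One admissible choice is $c_2 = 1/3$, from the multiplicative Chernoff bound $P(\mathrm{Bin}(K,p) \ge 2Kp) \le e^{-Kp/3}$, applied with $K = (\log n)/\delta$ and with $p$ replaced by its upper bound $\exp(-c_1 b^2)$ (legitimate by stochastic monotonicity of the binomial law in $p$). A quick numerical check of that bound against exact binomial tails, for a few hypothetical $(K, p)$:

```python
from math import ceil, comb, exp


def binom_upper_tail(K, p, k0):
    """P(Bin(K, p) >= k0), computed exactly term by term."""
    return sum(comb(K, k) * p ** k * (1 - p) ** (K - k) for k in range(k0, K + 1))


def chernoff_bound(K, p):
    """Multiplicative Chernoff bound: P(Bin(K, p) >= 2Kp) <= exp(-Kp/3)."""
    return exp(-K * p / 3)


for K in (50, 200, 400):
    for p in (0.01, 0.05, 0.2):
        tail = binom_upper_tail(K, p, ceil(2 * K * p))
        print(K, p, tail, chernoff_bound(K, p))
```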
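We denote by $\mathcal{P}(T_n)$ the set of subsets of $T_n$ and by $I : \mathcal{P}(T_n)^K \to \mathbb{N}$ the mapping
$$\mathbf{V} \mapsto \Big|\bigcup_{i=1}^{K} V_i\Big|.$$
Clearly, $I$ is Lipschitz in the sense
$$|I(\mathbf{V}) - I(\mathbf{U})| \le \sum_{i=1}^{K} |V_i \,\triangle\, U_i|.$$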
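The Lipschitz property holds because $\bigcup_i V_i \setminus \bigcup_i U_i \subset \bigcup_i (V_i \setminus U_i)$, and symmetrically with $\mathbf{U}$ and $\mathbf{V}$ exchanged. A quick randomized sanity check in Python, on hypothetical random subsets of a small ground set standing in for $T_n$:

```python
import random


def I(sets):
    """I(V) = |V_1 ∪ ... ∪ V_K|."""
    return len(set().union(*sets))


def lipschitz_holds(K, size, trials, seed=0):
    """Check |I(V) - I(U)| <= sum_i |V_i Δ U_i| on random subsets of {0, ..., size-1}."""
    rng = random.Random(seed)

    def random_family():
        return [set(rng.sample(range(size), rng.randint(0, size))) for _ in range(K)]

    for _ in range(trials):
        V, U = random_family(), random_family()
        lhs = abs(I(V) - I(U))
        rhs = sum(len(v ^ u) for v, u in zip(V, U))   # ^ is the symmetric difference
        if lhs > rhs:
            return False
    return True


print(lipschitz_holds(K=5, size=20, trials=2000))   # True
```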
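Using this notation, we get
$$P_0\Big(|X_{[0,n]}| \le \kappa\,\frac{n}{\log n}\Big) \le P_0\Big(I(\mathbf{V}) \le \kappa\,\frac{n}{\log n}\Big) = E_0\Big[P_{\mathbf{X}}\Big(I(\mathbf{V}) \le \kappa\,\frac{n}{\log n}\Big)\Big],$$
where $P_{\mathbf{X}}$ denotes the conditional law given the vector $\mathbf{X}$. Under $P_{\mathbf{X}}$, the sets $V_i$ are independent random subsets of the torus $T_n$. We can thus apply a general result of Talagrand. Let $\mu = \mu_{\mathbf{X}}$ be a median of the (conditional) distribution of $I$, i.e. a number with $P_{\mathbf{X}}(I(\mathbf{V}) \le \mu_{\mathbf{X}}) \ge 1/2$ and $P_{\mathbf{X}}(I(\mathbf{V}) \ge \mu_{\mathbf{X}}) \ge 1/2$. Let $f : \mathcal{P}(T_n)^K \to \mathbb{N}$ be defined by
$$f(\mathbf{V}) := \inf\Big\{\sum_{i=1}^{K} |V_i \,\triangle\, U_i| : I(\mathbf{U}) \le \mu_{\mathbf{X}}\Big\}.$$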
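Then by Theorem 2.4.1 of [21], we have for any $a > 0$ and $\lambda > 0$,
$$P_{\mathbf{X}}(f(\mathbf{V}) \ge a) \le J_{\mathbf{X}}(a, \lambda), \qquad \text{where } J_{\mathbf{X}}(a,\lambda) := 2 e^{-\lambda a} \prod_{i=1}^{K} E_{\mathbf{X}}\big(\cosh(\lambda\, |V_i \,\triangle\, U_i|)\big),$$
and where $\mathbf{U}$ is an independent copy of $\mathbf{V}$ (under the conditional law). Similarly, putting
$$\tilde f(\mathbf{V}) := \inf\Big\{\sum_{i=1}^{K} |V_i \,\triangle\, U_i| : I(\mathbf{U}) \ge \mu_{\mathbf{X}}\Big\},$$
we get
$$P_{\mathbf{X}}\big(\tilde f(\mathbf{V}) \ge a\big) \le J_{\mathbf{X}}(a, \lambda).$$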
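Combining these two estimates, we get $P_{\mathbf{X}}(|I(\mathbf{V}) - \mu_{\mathbf{X}}| \ge a) \le 2 J_{\mathbf{X}}(a,\lambda)$. Now,
$$P_{\mathbf{X}}(I(\mathbf{V}) \le a) \le P_{\mathbf{X}}(|I(\mathbf{V}) - \mu_{\mathbf{X}}| \ge a) + \mathbb{1}\big(|\mu_{\mathbf{X}} - E_{\mathbf{X}} I(\mathbf{V})| \ge 2a\big) + \mathbb{1}\big(E_{\mathbf{X}} I(\mathbf{V}) \le 4a\big).$$
Remark that $|\mu_{\mathbf{X}} - E_{\mathbf{X}} I(\mathbf{V})| \le a + |T_n|\, P_{\mathbf{X}}(|I(\mathbf{V}) - \mu_{\mathbf{X}}| \ge a)$, and therefore
$$\begin{aligned}
P_{\mathbf{X}}(I(\mathbf{V}) \le a) &\le P_{\mathbf{X}}(|I(\mathbf{V}) - \mu_{\mathbf{X}}| \ge a) + \mathbb{1}\Big(P_{\mathbf{X}}(|I(\mathbf{V}) - \mu_{\mathbf{X}}| \ge a) \ge \frac{a}{|T_n|}\Big) + \mathbb{1}\big(E_{\mathbf{X}} I(\mathbf{V}) \le 4a\big)\\
&\le 2 J_{\mathbf{X}}(a,\lambda) + \mathbb{1}\Big(J_{\mathbf{X}}(a,\lambda) \ge \frac{a}{2|T_n|}\Big) + \mathbb{1}\big(E_{\mathbf{X}} I(\mathbf{V}) \le 4a\big).
\end{aligned} \qquad (C.8)$$
We apply this inequality to $a = a_n := \kappa\, n/\log n$ and $\lambda = \lambda_n := A(\log n)^2/n$, where $A$ will be specified below. Then we have
$$J_{\mathbf{X}}\Big(\kappa\frac{n}{\log n}, A\frac{(\log n)^2}{n}\Big) = 2\exp(-A\kappa \log n) \prod_{i=1}^{K} E_{\mathbf{X}} \cosh\Big(A\frac{(\log n)^2}{n}\, |V_i \,\triangle\, U_i|\Big),$$
and, since $(\log n)^2/n \le 2\delta\,(\log m)/m$ for $n$ large,
$$E_{\mathbf{X}} \cosh\Big(A\frac{(\log n)^2}{n}\, |V_i \,\triangle\, U_i|\Big) \le E_{\mathbf{X}} \cosh\Big(2A\delta\, \frac{\log m}{m}\, |V_i \,\triangle\, U_i|\Big).$$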
We assume now
$$2A\delta < 1, \qquad (C.9)$$
and use $\cosh(xy) \le 1 + x^2 e^{y}$ for $0 \le x \le 1$, $0 \le y$. Furthermore, we have the following lemma.

Lemma C.2.
$$E_{\mathbf{X}} \exp\Big(\frac{\log m}{m}\, |V_i|\Big) \le C(b).$$

Proof. We can take $i = 1$. If $d(0, X_m) > b\sqrt{m}$, then $V_1 = \emptyset$ and there is nothing to prove. We write $P^{(m)}_{0,x}$ for the law of the random walk $(X_0, X_1, \dots, X_m)$ conditioned on $X_0 = 0$, $X_m = x$. (For simplicity, we neglect trivial parity problems.) Let $Z_T(m/2)$ be the number of points visited by $X_1, \dots, X_{m/2}$ on the torus (assuming for simplicity that $m$ is even). Then it suffices to prove, for $d(0,x) \le b\sqrt{m}$,
$$E^{(m)}_{0,x} \exp\Big(\frac{\log m}{m}\, Z_T(m/2)\Big) \le C(b). \qquad (C.10)$$
The left-hand side of this equals
$$\sum_y E_0\Big[\exp\Big(\frac{\log m}{m}\, Z_T(m/2)\Big)\, \mathbb{1}\{X_{m/2} = y\}\Big]\, \frac{p_{m/2}(x-y)}{p_m(x)} \le C(b)\, E_0 \exp\Big(\frac{\log m}{m}\, Z_T(m/2)\Big),$$
because for $d(0,x) \le b\sqrt{m}$ we have $p_m(x) \ge C(b)/m > 0$, and for all $y$, $p_{m/2}(x-y) \le C'(b)/m$. We can replace $Z_T(m/2)$ by $Z(m)$, the number of points visited by a random walk of length $m$ on $\mathbb{Z}^2$. (We replace $m/2$ by $m$ just for notational convenience.)
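$$E_0 \exp\Big(\frac{\log m}{m}\, Z(m)\Big) \le \Big[E_0 \exp\Big(\frac{\log m}{m}\, Z\Big(\frac{m}{(\log m)^3}\Big)\Big)\Big]^{(\log m)^3} \qquad (C.11)$$
by the Markov property. We write $Z$ for $Z\big(m/(\log m)^3\big)$. Then
$$\begin{aligned}
E_0 \exp\Big(\frac{\log m}{m}\, Z\Big) &\le 1 + \frac{\log m}{m}\, E_0 Z + \frac{(\log m)^2}{2m^2}\, E_0\Big[Z^2 \exp\Big(\frac{\log m}{m}\, Z\Big)\Big]\\
&\le \exp\Big(\frac{\log m}{m}\, E_0 Z + \frac{(\log m)^2}{2m^2}\, E_0\Big[Z^2 \exp\Big(\frac{\log m}{m}\, Z\Big)\Big]\Big).
\end{aligned} \qquad (C.12)$$
Moreover, using the trivial bound $Z \le m/(\log m)^3$, we have
$$E_0\Big[Z^2 \exp\Big(\frac{\log m}{m}\, Z\Big)\Big] \le C\, \frac{m^2}{(\log m)^6}.$$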
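Implementing this into (C.11) and (C.12) gives
$$E_0 \exp\Big(\frac{\log m}{m}\, Z(m)\Big) \le \exp\Big(\frac{(\log m)^4}{m}\, E_0 Z + \frac{C}{\log m}\Big).$$
As
$$E_0 Z \le C\, \frac{m/(\log m)^3}{\log\big(m/(\log m)^3\big)} \le C\, \frac{m}{(\log m)^4},$$
this proves the claim.

Using this lemma, we get
$$E_{\mathbf{X}} \cosh\Big(2A\delta\, \frac{\log m}{m}\, |V_i \,\triangle\, U_i|\Big) \le 1 + (2A\delta)^2 C(b).$$
Therefore, we obtain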
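$$J_{\mathbf{X}}\Big(\kappa\frac{n}{\log n}, A\frac{(\log n)^2}{n}\Big) \le 2\exp(-A\kappa \log n)\, \big(1 + c_3(b)(2A\delta)^2\big)^{(1/\delta)\log n} \le 2\exp\Big(-\frac{A\kappa \log n}{2}\Big),$$
if $8 c_3(b) A\delta < \kappa$. We fix
$$A(\kappa, \delta, b) := \frac{\kappa}{16\, c_3(b)\, \delta}.$$
Remark that we are then also on the safe side concerning (C.9), provided $\kappa \le \kappa_0(b)$, with $\kappa_0(b)$ small enough. With this choice, $4c_3(b)A^2\delta = A\kappa/4$, so the exponent in the first bound above is in fact at most $-\tfrac{3}{4}A\kappa\log n = -\tfrac{3\kappa^2}{64\, c_3(b)\,\delta}\log n$. Therefore, for $n$ large enough,
$$J_{\mathbf{X}}\Big(\kappa\frac{n}{\log n}, A\frac{(\log n)^2}{n}\Big) \le \exp\Big(-\frac{\kappa^2}{24\, c_3(b)\, \delta}\log n\Big).$$
This is a deterministic bound. Since $|T_n| = n/\log n$, the threshold $a/(2|T_n|)$ in (C.8) equals $\kappa/2$ for $a = \kappa\, n/\log n$, so the second summand on the right-hand side of (C.8) is zero for $n$ large enough, and therefore
$$P_0\Big(I(\mathbf{V}) \le \kappa\frac{n}{\log n}\Big) \le P_0\Big(E_{\mathbf{X}} I(\mathbf{V}) \le 4\kappa\frac{n}{\log n}\Big) + \exp\Big(-\frac{\kappa^2}{32\, c_3(b)\, \delta}\log n\Big). \qquad (C.13)$$
We now choose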
$$\delta := \kappa^3, \qquad (C.14)$$
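and so the second summand in (C.13) is fine, again if $\kappa \le \kappa_0(b)$, with $\kappa_0(b)$ small enough. The reader should keep in mind that $\mathbf{V}$ depends on our truncation parameter $b$, which we emphasize by writing $\mathbf{V}_b$. Combining what we have achieved so far, we see that it suffices to prove that for any $R > 0$ there exist $b$ (large enough) and then $\kappa > 0$ small enough (depending on $b$) such that
$$P_0\Big(E_{\mathbf{X}} I(\mathbf{V}_b) \le \kappa\frac{n}{\log n}\Big) \le n^{-R}. \qquad (C.15)$$
Now,
$$E_{\mathbf{X}} I(\mathbf{V}_b) = \sum_{x \in T_n} P_{\mathbf{X}}\Big(x \in \bigcup_{i=1}^{K} V_{i,b}\Big) = \sum_{x \in T_n} \Big(1 - \prod_{i=1}^{K} \big(1 - P_{\mathbf{X}}(x \in V_{i,b})\big)\Big) \ge \sum_{x \in T_n} \Big(1 - \exp\Big(-\sum_{i=1}^{K} P_{\mathbf{X}}(x \in V_{i,b})\Big)\Big).$$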
We now chop the torus $T_n$ into $M = 1/\delta$ subsquares $S_1, S_2, \dots, S_M$ of side length $\sqrt{\delta n/\log n}$. For notational convenience, we will assume that $1/\delta$ is an integer, which evidently is no restriction. (Remember the setting $\delta = \kappa^3$, but for the moment this will be of no importance.) We set
$$\xi_i := \#\{j \in \{1, \dots, K\} : X_{(j-1)m} \in S_i,\ d(X_{(j-1)m}, X_{jm}) < b\sqrt{m}\}$$
and
$$\bar\xi_i := \#\{j \in \{1, \dots, K\} : X_{(j-1)m} \in S_i\}.$$

Lemma C.3. Let $X_{(j-1)m} \in S_i$, $d(X_{(j-1)m}, X_{jm}) \le b\sqrt{m}$, and $x \in S_i$. Then
$$P_{\mathbf{X}}(x \in V_{j,b}) \ge \frac{c_4(b)}{\log n}.$$
Proof. We use the same notation as in the proof of Lemma C.2: $P^{(m)}_{y,z}$ denotes the law of the random walk of length $m$ (on the torus), conditioned to start in $y$ and to end in $z$. If $x, y \in S_i$ and $d(y,z) \le b\sqrt{m}$, then
$$\begin{aligned}
P^{(m)}_{y,z}\big(X_j = x \text{ for some } j \in \{1, \dots, m\}\big) &\ge P^{(m)}_{y,z}\big(X_j = x \text{ for some } j \in [m/4, m/2]\big)\\
&= \sum_{j=m/4}^{m/2} \frac{p_j(x-y)\, P_0\big(X_1 \neq 0, \dots, X_{m/2-j-1} \neq 0,\ X_{m-j} = z-x\big)}{p_m(z-y)},
\end{aligned}$$
and $p_m(z-y) \le C(b)\, m^{-1}$, $p_j(x-y) \ge C m^{-1}$ for $m/4 \le j \le m/2$. Let $r := m - j$, which for the region of summation is in $[m/2, 3m/4]$, and note that $m/2 - j - 1 \le r/2$. Then
$$\begin{aligned}
P_0\big(X_1 \neq 0, \dots, X_{m/2-j-1} \neq 0,\ X_{m-j} = z-x\big) &\ge P_0\big(X_1 \neq 0, \dots, X_{r/2} \neq 0,\ d(X_{r/2}, 0) \le \sqrt{m}\big) \inf_{u : d(u,0) \le \sqrt{m}} P_0\big(X_{r/2} = z-x-u\big)\\
&\ge \frac{C(b)}{m \log m}.
\end{aligned}$$
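Therefore, we get
$$P^{(m)}_{y,z}\big(x = X_j \text{ for some } j \in \{1, \dots, m\}\big) \ge \frac{C(b)}{\log m} \ge \frac{C(b)}{\log n}.$$
We set
$$Z_{n,\delta} := \delta\, \#\Big\{i : \xi_i \ge \frac{1}{4}\log n\Big\} \qquad\text{and}\qquad \bar Z_{n,\delta} := \delta\, \#\Big\{i : \bar\xi_i \ge \frac{1}{2}\log n\Big\}.$$
Then
$$E_{\mathbf{X}} I(\mathbf{V}_b) \ge Z_{n,\delta}\, \frac{n}{\log n}\, \Big(1 - \exp\Big(-\frac{c_4(b)}{4}\Big)\Big),$$
and therefore
$$P_0\Big(E_{\mathbf{X}} I(\mathbf{V}_b) \le \kappa\frac{n}{\log n}\Big) \le P_0\Big(Z_{n,\delta} \le \frac{\kappa}{1 - \exp[-c_4(b)/4]}\Big).$$
Remark now that if $\bar Z_{n,\delta} - Z_{n,\delta} \ge 8\exp(-c_1 b^2)$, then $H_{n,b,\delta} \ge 2\exp(-c_1 b^2)\,(\log n)/\delta$. Therefore, using (C.7), we get
$$P_0\Big(E_{\mathbf{X}} I(\mathbf{V}_b) \le \kappa\frac{n}{\log n}\Big) \le P_0\Big(\bar Z_{n,\delta} \le \frac{\kappa}{1 - \exp[-c_4(b)/4]} + 8\exp(-c_1 b^2)\Big) + \exp\Big(-c_2 \exp(-c_1 b^2)\,\frac{\log n}{\delta}\Big).$$
Choosing $b$ large enough, and then $\kappa > 0$ small enough (and correspondingly $\delta = \kappa^3$), we see that in order to finish the proof of Proposition C.2, it suffices to prove the following.

Lemma C.4. For any $R > 0$ there exists $\eta > 0$ such that for any $\delta > 0$,
$$P_0\big(\bar Z_{n,\delta} \le \eta\big) \le n^{-R}$$
for $n$ large enough.

Proof. We rescale the random walk by defining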
$$Y^{(n,\delta)}_j := X_{jm}/L_n.$$
This random walk depends on $\delta$ through $m = \delta\, n/\log n$. It takes values in $T_n/L_n$, which we regard as a (discrete) subset of the continuous torus $T := [0,1)^2$ with lattice spacing $1/L_n$. (Remember the setting $L_n = \sqrt{n/\log n}$.) The transition probabilities of the $Y$-chain are given by $\tilde p(x) := p^m(L_n x)$, $x \in T_n/L_n$ (notice that $\tilde p$ depends on $\delta$). Here $p^m$ is the $m$th matrix power. By the local central limit theorem (and our aperiodicity
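assumption) there exists $\gamma_0 > 0$ such that for $\gamma \ge \gamma_0$ and any $x \in T_n/L_n$, $n \in \mathbb{N}$ and $\delta > 0$,
$$\tilde p^{[\gamma/\delta]}(x) \le 2L_n^{-2}. \qquad (C.16)$$
We denote by $\mathcal{S}_{\delta,\eta}$ the set of unions of squares $[k_1\sqrt{\delta}, (k_1+1)\sqrt{\delta}) \times [k_2\sqrt{\delta}, (k_2+1)\sqrt{\delta}) \subset T$ with total area at most $\eta$. On the event $\{\bar Z_{n,\delta} \le \eta\}$, the rescaled squares $S_i/L_n$ with $\bar\xi_i \ge \frac12 \log n$ form a set $A \in \mathcal{S}_{\delta,\eta}$, while the remaining (at most $1/\delta$) squares together contain fewer than $(\log n)/(2\delta)$ of the points $Y^{(n,\delta)}_j$, so that $A$ essentially contains at least $(\log n)/(2\delta)$ of them. Since $\mathcal{S}_{\delta,\eta}$ has at most $2^{1/\delta}$ elements, a number independent of $n$, in order to prove the lemma it suffices to prove that for any $R > 0$,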
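$$\limsup_{n\to\infty} \frac{1}{\log n} \log P_0\Big(\sum_{j=0}^{(\log n)/\delta} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big) \ge \frac{\log n}{2\delta}\Big) \le -R \qquad (C.17)$$
for small enough $\eta$, uniformly in $\delta$ and $A \in \mathcal{S}_{\delta,\eta}$. We estimate the above probability in a standard way. For any $\lambda > 0$ we have
$$P_0\Big(\sum_{j=0}^{(\log n)/\delta} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big) \ge \frac{\log n}{2\delta}\Big) \le \exp(-\lambda \log n)\, E_0 \exp\Big(2\lambda\delta \sum_{j=0}^{(\log n)/\delta} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big)\Big). \qquad (C.18)$$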
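In order to estimate the right-hand side, we use (C.16). We split the summation on $j$ alternately into intervals of length $\gamma/\delta$ and $3\gamma/\delta$, the former being called "short" intervals, the others "long". We begin with a short interval. Remark that the contribution of all short intervals to the exponent in the expectation on the right-hand side of (C.18) is at most $\lambda \log n/2$. Therefore, we can leave this part out, replacing the first factor on the right-hand side of (C.18) by $\exp(-\lambda \log n/2)$. If we choose $\gamma := \max(\gamma_0, \log 2/\lambda)$, then each of the $(\log n)/(4\gamma)$ short intervals allows us, by (C.16), to replace the position at the start of the following long interval by a uniformly distributed one at the cost of a factor $2$, and $2^{\log n/(4\gamma)} \le \exp(\lambda \log n/4)$. Hence
$$E_0 \exp\Big(2\lambda\delta \sum_{j \in \text{long intervals}} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big)\Big) \le \exp\Big(\frac{\lambda \log n}{4}\Big) \Big[E_u \exp\Big(2\lambda\delta \sum_{j=0}^{3\gamma/\delta} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big)\Big)\Big]^{\log n/(4\gamma)},$$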
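where $E_u$ is the expectation with respect to a uniform starting distribution. We therefore get
$$\begin{aligned}
\limsup_{n\to\infty} \frac{1}{\log n} \log P_0\Big(\sum_{j=0}^{(\log n)/\delta} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big) \ge \frac{\log n}{2\delta}\Big) &\le -\frac{\lambda}{4} + \frac{1}{4\gamma} \lim_{n\to\infty} \log E_u \exp\Big(2\lambda\delta \sum_{j=0}^{3\gamma/\delta} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big)\Big)\\
&= -\frac{\lambda}{4} + \frac{1}{4\gamma} \log E_u \exp\Big(2\lambda\delta \sum_{j=0}^{3\gamma/\delta} \mathbb{1}_A\big(B_{\delta j}\big)\Big),
\end{aligned}$$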
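where $(B_t)_{t \ge 0}$ is a Brownian motion on $T$ with covariance matrix $Q$. For $x \ge 0$ we have $e^x \le 1 + x e^x$, and we therefore get
$$E_u \exp\Big(2\lambda\delta \sum_{j=0}^{3\gamma/\delta} \mathbb{1}_A(B_{\delta j})\Big) \le 1 + 2\lambda\delta \sum_{j=0}^{3\gamma/\delta} P_u(B_{\delta j} \in A)\, e^{6\lambda\gamma} = 1 + 6\lambda\gamma\, |A|\, e^{6\lambda\gamma} \le 1 + 6\lambda\gamma\, \eta\, e^{6\lambda\gamma}.$$
We therefore get
$$\limsup_{n\to\infty} \frac{1}{\log n} \log P_0\Big(\sum_{j=0}^{(\log n)/\delta} \mathbb{1}_A\big(Y^{(n,\delta)}_j\big) \ge \frac{\log n}{2\delta}\Big) \le -\frac{\lambda}{4} + \frac{3\lambda\eta}{2}\, e^{6\lambda\gamma}.$$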
Choosing λ appropriately, this proves the claim.
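The final step can be made explicit. The following is a minimal sketch with one admissible choice that is ours, not the paper's: take $\lambda = 8R$, $\gamma = \max(\gamma_0, \log 2/\lambda)$ as in the text, and $\eta = e^{-6\lambda\gamma}/12$. Then $-\lambda/4 + \tfrac{3}{2}\lambda\eta e^{6\lambda\gamma} = -\lambda/4 + \lambda/8 = -R$, and any smaller $\eta$ works too, since the bound increases with $\eta$. The value of $\gamma_0$ comes from (C.16); the values used below are hypothetical.

```python
import math


def choose_parameters(R, gamma0):
    """Pick (lambda, gamma, eta) with -lambda/4 + 1.5*lambda*eta*exp(6*lambda*gamma) <= -R.

    gamma0 is the constant from the local CLT bound (C.16); here it is an input.
    For large R * gamma the numbers get extreme (eta ~ exp(-48 R gamma)).
    """
    lam = 8.0 * R                              # then -lam/8 == -R
    gamma = max(gamma0, math.log(2) / lam)     # the choice made in the text
    eta = math.exp(-6 * lam * gamma) / 12      # makes the second term equal lam/8
    bound = -lam / 4 + 1.5 * lam * eta * math.exp(6 * lam * gamma)
    return lam, gamma, eta, bound


print(choose_parameters(R=1.0, gamma0=0.5))
```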
D. The Case d = 1

We consider the δ-pinning case only, and $p(\pm 1) = 1/2$. We can, however, easily allow more general symmetric interaction functions $V : \mathbb{R} \to \mathbb{R}_+$. We set
$$\psi(x) := \frac{e^{-\beta V(x)/2}}{\int e^{-\beta V(y)/2}\, dy}.$$
The only properties we need are
$$\int e^{-\beta V(y)/2}\, dy < \infty, \qquad \int x\, \psi(x)\, dx = 0, \qquad \int x^2\, \psi(x)\, dx = \sigma^2 < \infty.$$
By a simple rescaling, we can assume $\sigma^2 = 1$. Let $\psi_k$ be the $k$-fold convolution of $\psi$. By the local central limit theorem, we have
$$f(k) := \psi_k(0) = \frac{1}{\sqrt{2\pi k}} + o\Big(\frac{1}{\sqrt{k}}\Big),$$
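as $k \to \infty$. The distribution $\nu^\varepsilon_n$ of pinned sites on $\{-n, -n+1, \dots, n\}$ is easily described: let $A \subset \{-n, -n+1, \dots, n\}$ with $|A| = m - 1$, $A = \{k_1, k_2, \dots, k_{m-1}\}$, where $k_0 := -n-1 < k_1 < k_2 < \dots < k_{m-1} < k_m := n+1$. Then
$$\nu^\varepsilon_n(A) = \frac{1}{Z_{n,\varepsilon}}\, \varepsilon^{m-1} \prod_{j=1}^{m} f(k_j - k_{j-1}). \qquad (D.1)$$
Of course, $\sum_k f(k) = \infty$. Therefore, there exists a unique $\lambda = \lambda(\varepsilon)$ such that
$$\varepsilon \sum_k e^{-\lambda k} f(k) = 1.$$
Remark that (D.1) is not changed if we replace $f$ by $f_\lambda(k) := e^{-\lambda k} f(k)$, since $\prod_{j=1}^{m} e^{-\lambda(k_j - k_{j-1})} = e^{-\lambda(2n+2)}$ does not depend on $A$.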
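Standard renewal arguments then show that $\nu^\varepsilon = \lim_{n\to\infty} \nu^\varepsilon_n$ exists, and is simply given as the stationary renewal sequence whose inter-arrival times have distribution $\{\varepsilon f_{\lambda(\varepsilon)}(k) : k > 0\}$. For instance, if
$$\xi := \max\{m \le 0 : m \in A\}, \qquad \eta := \min\{m > 0 : m \in A\},$$
then we have the following.

Lemma D.1.
$$\nu^\varepsilon\big((\xi, \eta) = (k, l)\big) = \frac{1}{M^\varepsilon}\, f_{\lambda(\varepsilon)}(l-k) \quad \text{if } k \le 0 < l, \qquad \text{where } M^\varepsilon := \sum_j j\, f_{\lambda(\varepsilon)}(j).$$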
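The full measure $\mu^\varepsilon$ (in the thermodynamic limit) is then given as a mixture
$$\mu^\varepsilon = \sum_{A \subset \mathbb{Z}} \nu^\varepsilon(A)\, \mu_A,$$
where $\mu_A$ is the measure on $\mathbb{R}^{\mathbb{Z}}$ given by independent pieces of tied-down random walks between successive elements of $A$. For instance,
$$\int \phi_0^2\, \mu^\varepsilon(d\phi) = \frac{1}{M^\varepsilon} \sum_{k \le 0 < l} f_{\lambda(\varepsilon)}(l-k)\, E_0\big(S_{-k}^2 \,\big|\, S_{l-k} = 0\big),$$
where $S_0, S_1, S_2, \dots$ is a random walk on $\mathbb{R}$ starting at $0$ with distribution of the increments given by $\psi$. We now want to determine the $\varepsilon \to 0$ behavior of this quantity. First, remark that for small $\lambda > 0$,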
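$$\varepsilon \sum_k e^{-\lambda k} f(k) \sim \varepsilon \sum_k \frac{e^{-\lambda k}}{\sqrt{2\pi k}} = \frac{\varepsilon}{\sqrt{2\pi\lambda}} \sum_k \sqrt{\lambda}\, \frac{e^{-\lambda k}}{\sqrt{k\lambda}} \sim \frac{\varepsilon}{\sqrt{2\pi\lambda}} \int_0^\infty \frac{e^{-x}}{\sqrt{x}}\, dx = \frac{\varepsilon}{\sqrt{2\lambda}}.$$
Therefore
$$\lambda(\varepsilon) = \frac{\varepsilon^2}{2} + o(\varepsilon^2).$$
From this, we get
$$M^\varepsilon = \sum_j j\, f_{\lambda(\varepsilon)}(j) \sim \sum_j \frac{j}{\sqrt{2\pi j}}\, e^{-\varepsilon^2 j/2} = \frac{1}{\sqrt{2\pi}} \sum_j \sqrt{j}\, e^{-\varepsilon^2 j/2} \sim \frac{1}{\varepsilon^3\sqrt{2\pi}} \int_0^\infty \sqrt{x}\, e^{-x/2}\, dx = \frac{1}{\varepsilon^3}. \qquad (D.2)$$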
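The asymptotics $\lambda(\varepsilon) \sim \varepsilon^2/2$ and $M^\varepsilon \sim \varepsilon^{-3}$ can be observed numerically. This sketch assumes that $\psi$ is the standard Gaussian density, so that $f(k) = 1/\sqrt{2\pi k}$ holds exactly rather than only asymptotically. It finds $\lambda(\varepsilon)$ by bisection on the decreasing function $\lambda \mapsto \varepsilon\sum_k e^{-\lambda k} f(k) - 1$, truncating the series where $e^{-\lambda k} < e^{-60}$. Convergence is slow: the relative corrections appear to be of order $\varepsilon$, coming from the constant term in the expansion of $\sum_k e^{-\lambda k}/\sqrt{k}$. So at $\varepsilon = 0.1$ the ratio $\lambda/(\varepsilon^2/2)$ should still sit somewhat below $1$, and $M^\varepsilon\varepsilon^3$ somewhat above $1$.

```python
import math


def f(k):
    """psi_k(0) when psi is the standard Gaussian density: exactly 1/sqrt(2*pi*k)."""
    return 1.0 / math.sqrt(2.0 * math.pi * k)


def solve_lambda(eps, rel_tol=1e-10):
    """Bisection for the unique lambda with eps * sum_k exp(-lambda k) f(k) = 1."""
    lo, hi = eps ** 2 / 8, eps ** 2      # bracket with g(lo) > 0 > g(hi) for small eps
    kmax = int(60 / lo) + 1              # exp(-lambda * kmax) < exp(-60) on the whole bracket
    fk = [f(k) for k in range(1, kmax + 1)]

    def g(lam):
        return eps * math.fsum(math.exp(-lam * k) * fk[k - 1]
                               for k in range(1, kmax + 1)) - 1.0

    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:                   # g is decreasing in lambda
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), fk


def moment(lam, fk):
    """M^eps = sum_j j f_lambda(j), truncated at the same kmax."""
    return math.fsum(j * math.exp(-lam * j) * fk[j - 1] for j in range(1, len(fk) + 1))


eps = 0.1
lam, fk = solve_lambda(eps)
print(lam / (eps ** 2 / 2), moment(lam, fk) * eps ** 3)
```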
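Furthermore,
$$\begin{aligned}
\sum_{k \le 0 < l} f_{\lambda(\varepsilon)}(l-k)\, E\big(S_{-k}^2 \,\big|\, S_{l-k} = 0\big) &\sim \sum_{k \le 0 < l} \frac{1}{\sqrt{2\pi(l-k)}}\, e^{-\varepsilon^2 (l-k)/2}\, E_{l-k}\big(S_{-k}^2\big)\\
&= \sum_{n=1}^{\infty} \frac{1}{\sqrt{2\pi n}}\, e^{-\varepsilon^2 n/2} \sum_{m=0}^{n-1} E_n\big(S_m^2\big),
\end{aligned}$$
where $E_m$ stands for the expectation with respect to a random walk tied down after time $m$. The right-hand side of the above expression is
$$\sim \sum_{n=1}^{\infty} \frac{\sqrt{n}}{\sqrt{2\pi}}\, e^{-\varepsilon^2 n/2} \sum_{m=0}^{n-1} \frac{m}{n}\Big(1 - \frac{m}{n}\Big) \sim \frac{1}{6\sqrt{2\pi}} \sum_{n=1}^{\infty} n^{3/2} e^{-\varepsilon^2 n/2} \sim \frac{1}{6\sqrt{2\pi}\,\varepsilon^5} \int_0^\infty y^{3/2} e^{-y/2}\, dy = \frac{1}{2\varepsilon^5}.$$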
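Combining this with (D.2) yields the following.

Proposition D.1.
$$\int \phi_0^2\, \mu^\varepsilon(d\phi) = \frac{1}{2\varepsilon^2} + o\Big(\frac{1}{\varepsilon^2}\Big)$$
as $\varepsilon \to 0$.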
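Here is a companion check of Proposition D.1 under the same Gaussian assumption on $\psi$ (our choice). In that case $f(k) = 1/\sqrt{2\pi k}$ and $E_n(S_m^2) = m(n-m)/n$ hold exactly, so $\sum_{m=0}^{n-1} E_n(S_m^2) = (n^2-1)/6$, and $\int\phi_0^2\,\mu^\varepsilon(d\phi)$ reduces to a ratio of two one-dimensional series. As a simplification, $\lambda$ is replaced by its leading-order value $\varepsilon^2/2$; the ratio of the series is then $1/(4\lambda) = 1/(2\varepsilon^2)$ up to lower-order terms.

```python
import math
from fractions import Fraction


def bridge_variance_sum(n):
    """sum_{m=0}^{n-1} m(n-m)/n: the sum of Var(S_m | S_n = 0) for a unit-variance Gaussian walk."""
    return sum(Fraction(m * (n - m), n) for m in range(n))


def second_moment(eps):
    """(1/M) * sum_n f_lambda(n) * (n^2 - 1)/6, with the leading-order lambda = eps^2/2."""
    lam = eps ** 2 / 2
    nmax = int(60 / lam) + 1                 # truncate where exp(-lam * n) < exp(-60)
    num, den = [], []
    for n in range(1, nmax + 1):
        f_lam = math.exp(-lam * n) / math.sqrt(2 * math.pi * n)   # f_lambda(n), Gaussian psi
        num.append(f_lam * (n * n - 1) / 6)  # equals f_lam * bridge_variance_sum(n)
        den.append(n * f_lam)                # terms of M^eps
    return math.fsum(num) / math.fsum(den)


eps = 0.05
print(second_moment(eps) * 2 * eps ** 2)     # close to 1
```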
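The mass is very easy, too. For fixed $\varepsilon$, the decay of $\int \phi_0 \phi_x\, \mu^\varepsilon(d\phi)$ as $x \to \infty$ is, to leading order, the same as that of the probability under $\nu^\varepsilon$ that the interval $[0, x]$ contains no renewal point. To leading order, this is just the exponential tail behavior of the distribution $f_{\lambda(\varepsilon)}$. Therefore, we get the following.

Proposition D.2.
$$\lim_{x\to\infty} \frac{1}{x} \log \int \phi_0 \phi_x\, \mu^\varepsilon(d\phi) = -\lambda(\varepsilon) = -\frac{\varepsilon^2}{2} + o(\varepsilon^2).$$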
Acknowledgements. We thank François Dunlop for several very interesting discussions on the physical aspects of these questions, as well as for encouraging us to look at the problem investigated in the present paper. We also thank Pietro Caputo for interesting discussions. Y.V. gratefully acknowledges the warm hospitality of the Institute for Mathematics of Zürich University and of the Department of Industrial Engineering of the Technion, where part of this work was done. Y.V. is supported by a Swiss National Science Foundation Grant #8220-056599. E.B. is supported by SNSF Grant #20-55648.98.
References

1. van den Berg, M., Bolthausen, E., den Hollander, F.: Moderate deviations for the Wiener sausage. To appear in Annals of Math.
2. Bolthausen, E., Brydges, D.: Localization and decay of correlations for a pinned lattice free field in dimension two. In: State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet, IMS Lecture Notes Vol. 36 (2001), pp. 134–149
3. Bolthausen, E., Deuschel, J.-D., Zeitouni, O.: Absence of a wetting transition for lattice free fields in dimensions three and larger. J. Math. Phys. 41, 1211–1223 (2000)
4. Brascamp, H.J., Lieb, E.H.: On extensions of the Brunn–Minkowski and Prékopa–Leindler theorems. J. Funct. Anal. 22, 366–389 (1976)
5. Brydges, D.C., Fröhlich, J., Spencer, T.: The random walk representation of classical spin systems and correlation inequalities. Commun. Math. Phys. 83, 123–150 (1982)
6. Caputo, P., Velenik, Y.: A note on wetting transition for gradient field. Stoch. Proc. Appl. 87, 107–113 (2000)
7. Deuschel, J.-D., Giacomin, G., Ioffe, D.: Large deviations and concentration properties for ∇φ interface models. Probab. Theory Relat. Fields 117, 49–111 (2000)
8. Deuschel, J.-D., Velenik, Y.: Non-Gaussian surface pinned by a weak potential. Probab. Theory Relat. Fields 116, 359–377 (2000)
9. Dunlop, F., Magnen, J., Rivasseau, V., Roche, P.: Pinning of an interface by a weak potential. J. Stat. Phys. 66, 71–98 (1992)
10. Dunlop, F., Magnen, J., Rivasseau, V.: Mass generation for an interface in the mean field regime. Ann. Inst. Henri Poincaré 57, 333–360 (1992)
11. Fisher, M.E., Jin, J.: Is short-range “critical” wetting a first-order transition? Phys. Rev. Lett. 69, 792–795 (1992)
12. Fortuin, C.M., Kasteleyn, P.W., Ginibre, J.: Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22, 89–103 (1971)
13. Fukai, Y., Uchiyama, K.: Potential kernel for two-dimensional random walk. Ann. Prob. 24, 1979–1992 (1996)
14. Ginibre, J.: General formulation of Griffiths’ inequalities. Commun. Math. Phys. 16, 310–328 (1970)
15. Ibragimov, I.A.: On the accuracy of the approximation of distribution functions of sums of independent random variables by the normal distribution. Theory Prob. Appl. 11, 559–580 (1966)
16. Ioffe, D., Velenik, Y.: A note on the decay of correlations under δ-pinning. Probab. Theory Relat. Fields 116, 379–389 (2000)
17. Jain, N.C., Pruitt, W.E.: The range of random walk. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA: Univ. California Press, 1972, pp. 31–50
18. Lawler, G.F.: Intersections of random walks. Basel–Boston: Birkhäuser, 1991
19. Lemberger, P.: Large-field versus small-field expansions and Sobolev inequalities. J. Stat. Phys. 79, 525–568 (1995)
20. Spitzer, F.: Principles of random walk. Berlin–Heidelberg–New York: Springer-Verlag, 1976
21. Talagrand, M.: Concentration of measure and isoperimetric inequalities in product spaces. IHES Publications Mathématiques 81, 73–205 (1995)

Communicated by H. Spohn
Commun. Math. Phys. 223, 205 – 222 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Survival Probability in Rank-One Perturbation Problems

Alexei Poltoratski*

Department of Mathematics, Texas A&M University, College Station, TX 77843, USA

Received: 17 April 2001 / Accepted: 18 June 2001

* Supported in part by N.S.F. grant DMS 9970151

Abstract: A finite complex Borel measure $\mu$ on the unit circle or on the real line is called Rajchman if its Fourier coefficients $\hat\mu(n)$ tend to $0$ as $n \to \infty$. In quantum dynamics, the self-adjoint operators (Hamiltonians) whose spectral measures are Rajchman correspond to systems having certain scattering properties. In this paper we study how a small perturbation of the operator can affect the Rajchman property of its spectral measure. Our approach is based on the notion of the local symmetry of measures.

1. Introduction

In this paper we study rank-one perturbations of self-adjoint and unitary operators. The rank-one case has become the focus of active research in recent years (see, for instance, [5, 6] or [18] for results and further references). This interest is largely due to the fact that such perturbations often appear in applications, such as mathematical models of solid state physics. Also, as was shown in [7], all rank-one perturbation problems can be translated into the language of self-adjoint extensions of a given symmetric operator with deficiency indices (1,1), or into that of differential operators with a varying boundary condition. The general question of perturbation theory can be stated as follows: knowing the spectral properties of an operator $A$, what can be said about the spectrum of $A + B$ if $B$ belongs to a certain class? Before the classical paper by Donoghue [7], very little was known about the effect of a rank-one perturbation on the spectral measure of a self-adjoint operator. By the Kato–Rosenblum theorem, a trace-class perturbation cannot change the absolutely continuous component of the spectral measure (up to equivalence). However, the examples constructed in [7] demonstrated that the singular continuous and pure point components are extremely unstable even under rank-one perturbations, which are much “smaller” than general trace-class ones. Later examples by other authors shed more light on the behavior of pure
206
A. Poltoratski
point and singular continuous spectra, as well as other spectral characteristics such as the Hausdorff dimension of the spectral measure (see, for instance, [5]). In this paper we discuss the effect of rank-one perturbations on the asymptotics of the so-called survival probability. This notion comes from quantum dynamics. If $A$ is a cyclic self-adjoint operator (a Hamiltonian) and $\phi$ is a vector (a state), the survival probability at time $t$ is given by
$$P_A(\phi, t) = \big|\langle \phi, e^{-iAt}\phi \rangle\big|^2.$$
It represents the probability of finding the particle corresponding to $\phi$ in its initial state at time $t$. For the purposes of this paper, we will say that the state $\phi$ is transient if $P_A(\phi, t) \to 0$ as $t \to \infty$, and recurrent otherwise. In terms of the spectral measure $\mu$ of $A$ corresponding to $\phi$, we have
$$P_A(\phi, t) = |\hat\mu(t)|^2,$$
where $\hat\mu(t) = \int_{\mathbb{R}} e^{-ixt}\, d\mu(x)$ is the Fourier transform of $\mu$. Those measures satisfying $\hat\mu(t) \to 0$ as $t \to \infty$ are called Rajchman measures. As we can see, the spectral measure $\mu$ is Rajchman iff the corresponding state $\phi$ is transient. It is well known that if $\mu$ is Rajchman, then $f\mu$ is Rajchman for any $f \in L^1(\mu)$. Hence, if the spectral measure $\mu$ corresponding to a cyclic vector $\phi$ is Rajchman, then the corresponding quantum system admits only transient states. The study of such measures dates back to the 19th century, starting with the work of Riemann; see [13] or [10]. Most of the interest in Rajchman measures was generated by their connection to sets of uniqueness for trigonometric series. The relation with quantum dynamics mentioned above brings new questions to this classical area. It follows immediately from the classical results on Rajchman measures (see, for instance, [13]) that any finite Borel measure $\mu$ on $\mathbb{R}$ (or on $\mathbb{T}$) can be decomposed into Rajchman and non-Rajchman parts, $\mu = \mu_R + \mu_N$, such that $\mu_R \perp \mu_N$, $\mu_R$ is a Rajchman measure and $\mu_N$ is purely non-Rajchman, i.e. no nonzero measure $\eta \ll \mu_N$ (absolutely continuous with respect to $\mu_N$) is Rajchman. If $\mu$ is the spectral measure of a Hamiltonian, then the spectral subspace corresponding to $\mu_R$ contains only transient states, and the subspace corresponding to $\mu_N$ only recurrent states. This decomposition is closely related to the traditional Radon decomposition $\mu = \mu_c + \mu_{p.p.}$ into the continuous and pure point parts. A classical theorem of Wiener says that a measure is continuous iff the arithmetic mean of the absolute values of its first $n$ Fourier coefficients tends to $0$ as $n \to \infty$. Hence $\mu_R \ll \mu_c$ and $\mu_{p.p.} \ll \mu_N$. It follows that the spectral subspace corresponding to the continuous spectrum contains all transient states, and the discrete subspace is contained in the subspace of all recurrent states.
But, since these two decompositions do not coincide, the Rajchman/non-Rajchman version seems to be more natural in applications concerning quantum dynamics (see, for example, [11]). The focus of this paper is the stability of the Rajchman and non-Rajchman parts of the spectrum under rank-one perturbations. We will attempt to develop methods that can help track the behavior of these parts of the spectrum in various examples of perturbations. After that, as an application of our tools, we will construct several examples of rank-one perturbations of the Rajchman spectrum.
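To make the transient/recurrent dichotomy concrete, here is a small Python sketch with three hypothetical spectral measures on $\mathbb{R}$. The first is a point mass, which is non-Rajchman with $|\hat\mu(t)| \equiv 1$. The second is Lebesgue measure on $[0,1]$, which is Rajchman with $|\hat\mu(t)| \le 2/t$. The third is their equal-weight mixture, whose survival probability stays near $1/4$ because of its non-Rajchman part. The closed forms are standard and are not taken from the paper.

```python
import cmath


def ft_point_mass(x0, t):
    """Fourier transform of the point mass at x0: exp(-i x0 t)."""
    return cmath.exp(-1j * x0 * t)


def ft_uniform(t):
    """Fourier transform of Lebesgue measure on [0, 1]: (1 - e^{-it}) / (it)."""
    if t == 0:
        return 1.0 + 0j
    return (1 - cmath.exp(-1j * t)) / (1j * t)


def ft_mixture(t):
    """Half a point mass at 0.3 plus half of Lebesgue measure on [0, 1]."""
    return 0.5 * ft_point_mass(0.3, t) + 0.5 * ft_uniform(t)


def survival(ft, t):
    """|mu^(t)|^2: the survival probability of a unit vector with spectral measure mu."""
    return abs(ft(t)) ** 2


for t in (10.0, 100.0, 1000.0):
    print(t,
          survival(lambda s: ft_point_mass(0.3, s), t),   # stays 1: recurrent
          survival(ft_uniform, t),                       # tends to 0: transient
          survival(ft_mixture, t))                       # tends to 1/4
```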
Rank-One Perturbations of Rajchman Measures
207
Let $A_0$ be a cyclic self-adjoint operator and $\phi$ be its cyclic vector. Denote by $A_\lambda$, $\lambda \in \mathbb{R}$, the corresponding self-adjoint rank-one perturbations $A_\lambda = A_0 + \lambda(\cdot, \phi)\phi$. Let $\mu_\lambda$ be the spectral measure of $A_\lambda$ corresponding to $\phi$. Examples constructed in [7] presented a family $A_\lambda$ such that $\mu_0$ is singular continuous but all $\mu_\lambda$, $\lambda \neq 0$, are discrete, and a family where $\mu_0$ is discrete but all other $\mu_\lambda$ are singular continuous. Many other examples displaying the unstable character of the discrete and singular continuous spectra followed; see [18]. The main tool for all of these constructions was the following convenient “point-mass test” (see [7]): a point $x \in \mathbb{R}$ is a point mass for some $\mu_\lambda$, $\lambda \neq 0$, iff
$$\int_{\mathbb{R}} \frac{d\mu_0(y)}{(x-y)^2} < \infty. \qquad (1)$$
This observation was later developed by Simon and Wolff [19] into the following criterion: $\mu_\lambda$ is pure point for almost every $\lambda \neq 0$ iff (1) holds for almost every $x \in \mathbb{R}$. One can also control the point masses and other properties of the measures $\mu_\lambda$ using the Krein spectral shift. The results in this direction can be found in [14]. A variation of this technique was used in [16] to produce a family of operators such that $A_\lambda$ is pure point for $\lambda < 0$ and singular continuous for $\lambda \ge 0$. As we discussed above, in view of Wiener's theorem the pure point spectrum is closely related to the non-Rajchman spectrum, and the continuous spectrum is somewhat similar to the Rajchman spectrum. Hence it would be interesting to see whether the Rajchman/non-Rajchman parts show a similar lack of stability in rank-one perturbation problems. To answer this question, we first note that a singular Rajchman measure can be obtained from a non-Rajchman measure (or from practically any singular measure) by a rank-one perturbation, as follows from a general result obtained in [17]:

Theorem 1.1. Let $U$ and $V$ be singular unitary cyclic operators such that $\sigma(U) = \sigma(V) = \mathbb{T}$. Suppose that their spectral measures are mutually singular. Then $U$ and $V$ are equivalent up to a rank-one perturbation, i.e. there exist a unitary operator $W$ and a rank-one operator $R$ such that $WUW^* = V + R$.

As we can see, a rank-one perturbation can change practically any property of a singular spectral measure with dense support, including the Rajchman property. However, in applications it seems more important to look not at the individual operators but at the family $A_\lambda$ as a whole, trying to understand how the spectral perturbations occur under the variation of the coupling constant $\lambda$. In Sect. 3 we attempt to construct examples analogous to the three examples from [7] and [16] mentioned above, with “singular continuous” replaced by “Rajchman” and “pure point” by “non-Rajchman”. The analogue of the first example from [7] can be constructed right away using the existing tools and basic properties of Rajchman measures; see Sect. 3. For the other two examples we need to find a replacement for (1) and the Simon–Wolff criterion. The direct analogue of (1), a local necessary and sufficient condition for the existence of the non-Rajchman spectrum, obviously does not exist. Instead, in Sect. 2 we look for an analogue of the reverse statement. We find that the condition that the integral in (1) is infinite can be replaced, in Rajchman/non-Rajchman problems, by the condition that $\mu_0$ is locally symmetric at $x$: if $\mu_0$ is symmetric near every point of a set $E \subset \mathbb{R}$, then all $\mu_\lambda$ are Rajchman on $E$. One can also consider the following
reverse version of the Simon–Wolff criterion: $\mu_\lambda$ is singular continuous for a.e. $\lambda \neq 0$ iff (1) does not hold for a.e. $x \in \mathbb{R}$. An analogue of this is the statement that if $\mu_0$ is symmetric at a.e. point, then $\mu_\lambda$ is Rajchman for a.e. $\lambda \neq 0$. The only known meaningful necessary and sufficient condition for a measure to be Rajchman, formulated by Lyons in [12] in terms of the so-called Weyl sets, seems to be impossible to use for our purposes. We have to settle for a “workable” sufficient condition instead. Local symmetry gives us such a condition: it seems to be reasonably sharp yet easy to use in the perturbation setting. As we will see in Sect. 2, the local symmetry argument works in the following way: if $\mu_0$ is symmetric near $\mu_\lambda$-a.e. point, then (Corollary 2.4) $\mu_\lambda$ is symmetric near $\mu_\lambda$-a.e. point and therefore (Theorem 2.8) is Rajchman.

The paper is organized as follows. In Sect. 2 we introduce the notion of local symmetry of measures, show that it is invariant under rank-one perturbations and discuss its relation with the Rajchman property. In Sect. 3 we construct the announced examples of families $A_\lambda$ of rank-one perturbations of self-adjoint (unitary) operators. In our first example the spectral measure $\mu_0$ is Rajchman, but all other $\mu_\lambda$ are pure point (and therefore purely non-Rajchman). In the second example we show that for any cyclic self-adjoint operator $A$ with dense spectrum there exists a cyclic vector such that all the corresponding perturbations are Rajchman. If $A$ itself is non-Rajchman, this provides an analogue of the second example from [7]. Finally, in our third example we construct a family $A_\lambda$ such that $A_\lambda$ is pure point for all $\lambda < 0$ and Rajchman for almost all $\lambda \ge 0$. As in [16], for this type of example we need to use the Krein spectral shift. We combine the local symmetry argument from Sect. 2 with the Krein shift technique in Lemma 3.5.

2. The Local Symmetry and the Rajchman Property

Most of the constructions in this paper are done on the unit circle, which corresponds to the case of unitary rank-one perturbations. There is a standard argument that allows one to “move” such results to the real line and translate them in terms of self-adjoint operators; see for instance [16]. Throughout this paper we denote by $J$ and $J'$ adjacent open arcs of equal length on the unit circle $\mathbb{T}$. As usual, $m$ stands for the normalized Lebesgue measure on $\mathbb{T}$. Instead of $m(J)$ we often write $|J|$. We denote by $\mathcal{M}$ the space of all complex Borel measures on $\mathbb{T}$ and by $\mathcal{M}_+$ the subset consisting of positive measures. The notation $\operatorname{supp}\mu$ is used for the closed support of the measure $\mu$.

Definition 2.1. We say that a measure $\mu \in \mathcal{M}$ is symmetric near $\xi \in \mathbb{T}$ iff $\xi \in \operatorname{supp}\mu$ and for any sequence of pairs of adjacent arcs $J_k, J'_k$ such that $|J_k| \to 0$, $\mu(J'_k) \neq 0$ and
$$\sup_k \frac{\operatorname{dist}(J_k, \xi)}{|J_k|} < \infty, \qquad (2)$$
we have
$$\lim_{k\to\infty} \frac{\mu(J_k)}{\mu(J'_k)} = 1.$$
This definition is a localized version of the standard definition of a symmetric measure. Symmetric measures were studied in [4] and several other papers. Recently an effective method of construction of symmetric measures was found in [2].
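Definition 2.1 can be probed on a toy example. The sketch below works on the real line (locally the same as the circle) with the hypothetical density $d\mu(t) = |t|^a\,dt$, $a > -1$, and computes $\mu(J)/\mu(J')$ for $J = (x_0, x_0+h)$, $J' = (x_0+h, x_0+2h)$. At $x_0 = 0$ the ratio equals $1/(2^{a+1}-1)$ for every $h$, so for $a \ne 0$ the measure is not symmetric near 0, while at $x_0 \ne 0$ the ratio tends to 1 as $h \to 0$. The helper names are ours, and only one particular family of arc pairs is tested, so this illustrates the definition rather than verifying it.

```python
def mu_interval(a, lo, hi):
    """mu((lo, hi)) for d mu = |t|**a dt with a > -1 and 0 <= lo < hi."""
    return (hi ** (a + 1) - lo ** (a + 1)) / (a + 1)


def symmetry_ratio(a, x0, h):
    """mu(J) / mu(J') for the adjacent intervals J = (x0, x0 + h), J' = (x0 + h, x0 + 2h)."""
    return mu_interval(a, x0, x0 + h) / mu_interval(a, x0 + h, x0 + 2 * h)


if __name__ == "__main__":
    for h in (1e-2, 1e-4, 1e-6):
        # At x0 = 0 the ratio does not depend on h; at x0 = 0.3 it approaches 1.
        print(h, symmetry_ratio(0.5, 0.0, h), symmetry_ratio(0.5, 0.3, h))
```

For $a = 0.5$ the ratio at the origin stays near $0.547$ however small $h$ is, which is exactly the failure of $\lim \mu(J_k)/\mu(J_k') = 1$.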
Rank-One Perturbations of Rajchman Measures
209
Symmetric measures possess many important complex analytic properties. Our next goal is to formulate local analogues of these properties. Our proofs use ideas from [2]. The first important property of a measure $\mu$ symmetric near a point $\xi$ is that $\mu$ cannot vanish too fast at $\xi$. If $z \in \mathbb{D}$, we denote by $J_z$ the open arc centered at $z/|z|$ such that $|J_z| = 1 - |z|$. If $J$ is an arc on $\mathbb{T}$ and $C$ is a positive constant, we denote by $CJ$ the arc with the same center as $J$ satisfying $|CJ| = C|J|$. If $\mu \in M$, we denote by $P\mu$ its Poisson integral in the unit disk:
$$P\mu(z) = \int_{\mathbb{T}} \frac{1-|z|^2}{|\xi - z|^2}\, d\mu(\xi).$$
Lemma 2.2. If $\mu \in M_+$ is symmetric near a point $\xi \in \mathbb{T}$, then $1 - |z| = o(P\mu(z))$ as $z \stackrel{\angle}{\longrightarrow} \xi$.
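The standard notation $z \stackrel{\angle}{\longrightarrow} \xi$ means that $z$ tends to $\xi$ non-tangentially in $\mathbb{D}$, i.e. there exists $C$, $1 < C < \infty$, such that $z$ stays within the sector
$$\Gamma_\xi^C = \left\{ z : |z - \xi| \le C \operatorname{Re}(1 - \bar\xi z),\ |z| > \tfrac{1}{C} \right\}.$$
In the rest of the paper we denote by $\Gamma_\xi$ the standard sector $\Gamma_\xi^{\sqrt{2}}$. It is well known that in all statements concerning the boundary behavior of Cauchy integrals, $z \stackrel{\angle}{\longrightarrow} \xi$ can be interpreted as $z \to \xi$, $z \in \Gamma_\xi$.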
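Proof. Observe that there exists $D > 0$ such that $\operatorname{dist}(J_z, \xi)/|J_z| < D$ for any $z \in \Gamma_\xi$. Also, by the properties of the Poisson kernel,
$$\frac{1-|z|}{P\mu(z)} \le E\,\frac{|J_z|}{\mu(J_z)/|J_z|} = E\,\frac{|J_z|^2}{\mu(J_z)}$$
for some $E > 0$. Since $\mu$ is symmetric near $\xi$, $\mu(2J_z) < (2+\varepsilon)\mu(J_z)$, where $0 < \varepsilon < 1$, if $J_z$ is small enough. Iterating, we obtain
$$\frac{|J_z|^2}{\mu(J_z)} < \left(\frac{2+\varepsilon}{4}\right)^k \frac{|2^k J_z|^2}{\mu(2^k J_z)} < F\left(\frac{2+\varepsilon}{4}\right)^k$$
for small enough $J_z$ and some $F > 0$. By letting $|J_z| \to 0$ we can let $k \to \infty$, and since $(2+\varepsilon)/4 < 3/4$, the right-hand side tends to 0.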
Recall that a bounded analytic function in the unit disk $\mathbb{D}$ is called inner if its non-tangential boundary values have absolute value 1 almost everywhere on the unit circle $\mathbb{T}$. Let $\theta$ be an inner function. It is not difficult to verify that for any constant $\alpha \in \mathbb{T}$ the fraction $\frac{\alpha+\theta}{\alpha-\theta}$ has positive real part in $\mathbb{D}$. Hence there exists a measure $\sigma_\alpha \in M_+$ such that
$$P\sigma_\alpha = \operatorname{Re}\frac{\alpha+\theta}{\alpha-\theta}.$$
We denote the family of all such measures $\{\sigma_\alpha\}_{\alpha\in\mathbb{T}}$ corresponding to $\theta$ by $M_\theta$. It is well known that any such $M_\theta$ is a family of spectral measures for a suitably chosen family of unitary rank-one perturbations
$$U_\alpha = U_1 + (1-\alpha)(\cdot, U_1^*\varphi)\varphi, \qquad \alpha \in \mathbb{T},$$
210
A. Poltoratski
and vice versa, any family of spectral measures of cyclic singular unitary rank-one perturbations is equal to $M_\theta$ for some $\theta$. For self-adjoint perturbations one has to consider inner functions in the upper half-plane instead of the disk. Families $M_\theta$ have many interesting function-theoretic properties; see [1] or [15]. Their relation with perturbation theory allows one to construct examples of rank-one perturbations of spectra by producing examples of inner functions; see [16]. To combine this method with our tools, we need to translate the notion of local symmetry into the language of the boundary behavior of the underlying inner function. If $\mu \in M$, we denote by $H\mu$ its Herglotz integral
$$H\mu(z) = \int_{\mathbb{T}} \frac{1 + \bar\xi z}{1 - \bar\xi z}\, d\mu(\xi).$$
Lemma 2.3. Let $\theta$ be an inner function in $\mathbb{D}$ and $\mu = \sigma_1 \in M_\theta$. Let $\xi \in \mathbb{T}$. Then the following conditions are equivalent:
(I) $\mu$ is symmetric near $\xi$;
(II) $\displaystyle \lim_{z \stackrel{\angle}{\to} \xi} \frac{(1-|z|^2)\,\theta'(z)}{1 - |\theta(z)|^2} = 0$;
(III) $\displaystyle \lim_{z \stackrel{\angle}{\to} \xi} \frac{(1-|z|^2)\,(H\mu)'(z)}{P\mu(z)} = 0$.
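For the simplest inner functions the measures $\sigma_\alpha$ are explicit. The sketch below assumes $\theta(z) = z^n$, for which (a standard fact, not taken from the text) $\sigma_\alpha$ puts mass $1/n$ at each $n$-th root of $\alpha$. It checks numerically that $P\sigma_\alpha = \operatorname{Re}\frac{\alpha+\theta}{\alpha-\theta}$ and, because $\theta(0) = 0$, that $H\sigma_\alpha$ equals $\frac{\alpha+\theta}{\alpha-\theta}$ itself. The helper names are illustrative only.

```python
import numpy as np


def poisson(masses, atoms, z):
    """P mu(z) = sum_j m_j (1 - |z|^2) / |xi_j - z|^2 for mu = sum_j m_j delta_{xi_j}."""
    return float(np.sum(masses * (1 - abs(z) ** 2) / np.abs(atoms - z) ** 2))


def herglotz(masses, atoms, z):
    """H mu(z) = sum_j m_j (1 + conj(xi_j) z) / (1 - conj(xi_j) z)."""
    c = np.conj(atoms) * z
    return complex(np.sum(masses * (1 + c) / (1 - c)))


def clark_measure_power(n, alpha):
    """sigma_alpha for theta(z) = z**n: mass 1/n at each solution of zeta**n = alpha."""
    atoms = alpha ** (1.0 / n) * np.exp(2j * np.pi * np.arange(n) / n)
    return np.full(n, 1.0 / n), atoms


if __name__ == "__main__":
    n, alpha, z = 5, np.exp(0.7j), 0.3 + 0.4j
    m, xi = clark_measure_power(n, alpha)
    target = (alpha + z ** n) / (alpha - z ** n)
    print(poisson(m, xi, z), target.real)   # P sigma_alpha vs Re((alpha + theta) / (alpha - theta))
    print(herglotz(m, xi, z), target)       # H sigma_alpha vs (alpha + theta) / (alpha - theta)
```

The same two helpers also confirm $\operatorname{Re} H\mu = P\mu$, which is the relation between the Herglotz and Poisson integrals used throughout this section.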
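Proof. Simple computations show that (II) is equivalent to (III). At the same time, $(1-|z|^2)(H\mu)'(z) = 2P(\varphi_z\mu)(z)$, where
$$\varphi_z(\xi) = \bar\xi\,\frac{1 - \xi\bar z}{1 - \bar\xi z}$$
is a unimodular function on $\mathbb{T}$.

(I) $\Rightarrow$ (III). We will estimate $P(\varphi_z\mu)/P\mu$ in the sector $\Gamma_\xi$. Fix large $N, L \in \mathbb{N}$ and a small $\varepsilon$, $1 > \varepsilon > 0$ (all to be chosen later). Let $I$ be an arc centered at $\xi$ such that for any arc $J \subset I$ satisfying
$$\operatorname{dist}(J, \xi) \le L|J| \qquad (3)$$
we have
$$1 - \varepsilon < \frac{\mu(J)}{\mu(J')} < 1 + \varepsilon.$$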
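Consider $z$ close enough to $\xi$ so that $2^N J_z \subset I$. Then
$$P(\varphi_z\mu)(z) = \int_{\mathbb{T}\setminus I} P_z\varphi_z\, d\mu + \int_{I \setminus 2^N J_z} P_z\varphi_z\, d\mu + \int_{2^N J_z} P_z\varphi_z\, d\mu,$$
where $P_z$ denotes the Poisson kernel $P_z(\xi) = \frac{1-|z|^2}{|\xi - z|^2}$. By the properties of the Poisson kernel, the first integral is $O(1-|z|)$ and therefore $o(P\mu)$ by Lemma 2.2. The second integral can be estimated as
$$\left|\int_{I\setminus 2^N J_z} P_z\varphi_z\, d\mu\right| < \sum_{k \ge N,\ 2^k J_z \subset I} \frac{\mu(2^{k+1}J_z \setminus 2^k J_z)}{4^k(1-|z|)} < \sum_{k=N}^{\infty} \frac{\mu(2^{k+1}J_z)}{4^k(1-|z|)} < 3\sum_{k=N}^{\infty}\left(\frac{2+\varepsilon}{4}\right)^k \frac{\mu(J_z)}{|J_z|} \le 12\left(\frac{3}{4}\right)^N \frac{\mu(J_z)}{|J_z|} = O\!\left(\left(\tfrac{3}{4}\right)^N P\mu(z)\right),$$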
since for any $z$ from the sector $\Gamma_\xi$ we have $\xi \in J_z$. It is left to recall that $N$ can be made arbitrarily large. Finally, to estimate the last integral, we split $2^N J_z$ into $L$ equal arcs $I_1, I_2, \dots, I_L$. Every arc $I_k$ satisfies (3). Note that $P_z$ satisfies
$$\sup_{I_k} P_z \le \left(1 + \frac{2^N}{L}\right)^2 \inf_{I_k} P_z \le 4 \inf_{I_k} P_z \qquad (4)$$
if we choose $L > 2^N$. Also, the function $\varphi_z$ is a smooth unimodular function on $\mathbb{T}$ whose derivative is bounded by $2/(1-|z|)$. Hence, if $\zeta_k$ and $\omega_k$ are any two points from $I_k$, then
$$\begin{aligned}
|P_z(\zeta_k)\varphi_z(\zeta_k) - P_z(\omega_k)\varphi_z(\omega_k)| &\le |P_z(\zeta_k) - P_z(\omega_k)|\,|\varphi_z(\omega_k)| + P_z(\zeta_k)\,|\varphi_z(\zeta_k) - \varphi_z(\omega_k)| \\
&\le \frac{2^N}{L}\sup_{I_k} P_z + \frac{2^{N+1}}{L}\sup_{I_k} P_z < \frac{C(N)}{L}\inf_{I_k} P_z, \qquad (5)
\end{aligned}$$
where $C(N)$ depends on $N$ but not on $L$. In addition, since $\mu$ is symmetric near $\xi$, as $z$ gets closer to $\xi$ the differences between the measures of the arcs $I_k$ become small compared with the measures themselves. In particular, for $z$ close enough to $\xi$ we have $|\mu(I_k) - \frac{1}{L}\mu(2^N J_z)| < \varepsilon\mu(I_k)$ for every $k = 1, 2, \dots, L$. Therefore, for some complex numbers $a_k, b_k$ satisfying $|a_k| < \frac{C(N)}{L}\inf_{I_k} P_z$ and $|b_k| < \varepsilon$, we have
$$\int_{I_k} P_z\varphi_z\, d\mu = \mu(I_k)P_z(\zeta_k)\varphi_z(\zeta_k) + \mu(I_k)a_k = \frac{\mu(2^N J_z)}{L}P_z(\zeta_k)\varphi_z(\zeta_k) + b_k\mu(I_k)P_z(\zeta_k)\varphi_z(\zeta_k) + \mu(I_k)a_k.$$
Hence
$$\left|\int_{2^N J_z} P_z\varphi_z\, d\mu\right| \le \frac{\mu(2^N J_z)}{|2^N J_z|}\left|\sum_{k=1}^{L}\frac{|2^N J_z|}{L}P_z(\zeta_k)\varphi_z(\zeta_k)\right| + \varepsilon\sum_{k=1}^{L}\mu(I_k)P_z(\zeta_k) + \frac{C(N)}{L}\sum_{k=1}^{L}\inf_{I_k}P_z\,\mu(I_k).$$
The first factor in the first summand is $O(P\mu)$. The second factor is a Riemann sum for
$$\int_{2^N J_z} \frac{(1-|z|^2)\,\xi}{(\xi - z)^2}\, dm(\xi). \qquad (6)$$
This integral can be calculated and shown to be bounded in absolute value by $(1/2)^N$. (Or, to avoid the calculation, notice that
$$\int_{\mathbb{T}} \frac{(1-|z|^2)\,\xi}{(\xi - z)^2}\, dm(\xi) = \frac{1-|z|^2}{2}\,(Hm)'(z) = 0,$$
since $Hm \equiv 1$, and conclude that (6) must be small for large $N$.) By (5), the difference between the Riemann sum and the integral is at most
$$\sum_{k=1}^{L}\frac{|2^N J_z|}{L}\cdot\frac{C(N)}{L}\inf_{I_k}P_z \le \frac{C(N)}{L}.$$
Hence the first summand is $O\big((2^{-N} + \frac{C(N)}{L})\,P\mu(z)\big)$. By (4), the second summand is bounded from above by $4\varepsilon P\mu(z)$. Finally, the third summand is bounded by $\frac{C(N)}{L}P\mu(z)$. Putting together all the estimates, we obtain the statement.

(III) $\Rightarrow$ (I). Suppose that $I_n$ are arcs such that $|I_n| \to 0$,
$$\operatorname{dist}(I_n, \xi) < C|I_n|, \qquad (7)$$
but $|\mu(I_n) - \mu(I_n')| > c\,\mu(I_n)$ for some $c, C > 0$. Let $z_n$ and $z_n'$ be points in $\mathbb{D}$ such that $I_n = J_{z_n}$ and $I_n' = J_{z_n'}$. Then Lemma 2.5 below implies that $|P\mu(z_n) - P\mu(z_n')| > \frac{c}{2}P\mu(z_n)$ for large enough $n$. By the Mean Value Theorem this implies that on the segment $[z_n, z_n']$ there exists a point $z_n^*$ where
$$|\nabla P\mu(z_n^*)| > \frac{c}{2}\,\frac{P\mu(z_n)}{|z_n - z_n'|}.$$
But by (7) $|z_n - z_n'| > \frac{1}{2C}(1 - |z_n|)$, and we obtain a contradiction.
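The identity $(1-|z|^2)(H\mu)'(z) = 2P(\varphi_z\mu)(z)$ used at the start of the proof, and the vanishing of $\int_{\mathbb{T}} \frac{(1-|z|^2)\xi}{(\xi-z)^2}\,dm(\xi)$, are easy to check numerically. The sketch below does this for a hypothetical discrete measure with random atoms; the formula $(H\mu)'(z) = \int 2\xi/(\xi - z)^2\,d\mu(\xi)$ comes from differentiating the Herglotz kernel $\frac{1+\bar\xi z}{1-\bar\xi z} = \frac{\xi+z}{\xi-z}$. The Lebesgue integral is approximated by an equispaced average on $\mathbb{T}$, which is very accurate here because the integrand is analytic near the circle.

```python
import numpy as np


def herglotz_derivative(masses, atoms, z):
    """(H mu)'(z) = sum_j m_j * 2 xi_j / (xi_j - z)^2 for mu = sum_j m_j delta_{xi_j}."""
    return complex(np.sum(masses * 2 * atoms / (atoms - z) ** 2))


def poisson_phi(masses, atoms, z):
    """P(phi_z mu)(z), where phi_z(xi) = conj(xi) (1 - xi conj(z)) / (1 - conj(xi) z)."""
    kernel = (1 - abs(z) ** 2) / np.abs(atoms - z) ** 2
    phi = np.conj(atoms) * (1 - atoms * np.conj(z)) / (1 - np.conj(atoms) * z)
    return complex(np.sum(masses * kernel * phi))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    atoms = np.exp(2j * np.pi * rng.random(7))   # hypothetical atoms on T
    masses = rng.random(7)                        # hypothetical positive masses
    z = 0.55 - 0.2j
    print((1 - abs(z) ** 2) * herglotz_derivative(masses, atoms, z), 2 * poisson_phi(masses, atoms, z))
    # Against Lebesgue measure m the kernel integrates to 0 (equispaced average over T).
    xi = np.exp(2j * np.pi * np.arange(2048) / 2048)
    print(abs(np.mean(xi / (xi - z) ** 2)))
```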
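Corollary 2.4. Let $\theta$ be an inner function and $M_\theta = \{\mu_\alpha\}$ be the corresponding family of measures. Consider $\xi \in \mathbb{T}$. The following conditions are equivalent:
(I) there exists $\alpha \in \mathbb{T}$ such that $\mu_\alpha$ is symmetric near $\xi$;
(II) for any $\alpha \in \mathbb{T}$, $\mu_\alpha$ is symmetric near $\xi$.
Proof. Note that $\mu_\alpha = \sigma_1 \in M_{\bar\alpha\theta}$ for the inner function $\bar\alpha\theta$. Hence, by Lemma 2.3, $\mu_\alpha$ is symmetric near $\xi$ iff $\bar\alpha\theta$ satisfies Condition (II) of that lemma. Since $|\bar\alpha| = 1$, the expression in (II) has the same absolute value for $\bar\alpha\theta$ as for $\theta$, so this condition does not depend on $\alpha$.

Lemma 2.5. Let $\mu \in M_+$ and suppose that for some $\xi \in \mathbb{T}$,
$$\frac{(1-|z|^2)\,(H\mu)'(z)}{P\mu(z)} \to 0 \qquad (8)$$
as $z \stackrel{\angle}{\longrightarrow} \xi$. Then
$$\frac{\mu(J_z)}{|J_z|} = P\mu(z) + o(P\mu(z))$$
as $z \stackrel{\angle}{\longrightarrow} \xi$.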
Proof. The proof uses the ideas of [3] and [2]. For $v, w \in \mathbb{D}$ denote by $\rho(v, w)$ the pseudo-hyperbolic distance
$$\rho(v, w) = \left|\frac{v - w}{1 - \bar w v}\right|.$$
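The Mean Value Theorem and (8) imply that for any $c < 1$,
$$\sup_{\{w\,:\,\rho(z,w) < c\}} \frac{|P\mu(z) - P\mu(w)|}{P\mu(z)} \to 0 \quad \text{as } z \stackrel{\angle}{\longrightarrow} \xi.$$
By the standard argument, the limit is preserved even if one allows the radius of the hyperbolic disk to tend to 1 slowly enough, i.e. there exists a function $R(t) > 0$, $R(t) \to 1^-$ as $t \to 1$, such that
$$\sup_{\{w\,:\,\rho(z,w) < R(|z|)\}} \frac{|P\mu(z) - P\mu(w)|}{P\mu(z)} \to 0 \quad \text{as } z \stackrel{\angle}{\longrightarrow} \xi. \qquad (9)$$
Now let us show that for any $C > 1$,
$$\int_{\mathbb{T}\setminus CJ_z} P_z\, d\mu < \frac{4}{C}\int_{\mathbb{T}} P_z\, d\mu \qquad (10)$$
if $z \in \Gamma_\xi$ is close enough to $\xi$. Let $z_C$ be a point such that $J_{z_C} = CJ_z$. Then $P_{z_C} \ge \frac{C}{2}P_z$ on $\mathbb{T}\setminus CJ_z$ and
$$\int_{\mathbb{T}\setminus CJ_z} P_{z_C}\, d\mu > \frac{C}{2}\int_{\mathbb{T}\setminus CJ_z} P_z\, d\mu.$$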
At the same time, by (9), if $z$ is close enough to $\xi$ then $P\mu(z) > \frac12 P\mu(z_C)$. Thus
$$P\mu(z) > \frac12 P\mu(z_C) > \frac12\int_{\mathbb{T}\setminus CJ_z} P_{z_C}\, d\mu > \frac{C}{4}\int_{\mathbb{T}\setminus CJ_z} P_z\, d\mu,$$
which establishes (10). If $J$ is an arc on $\mathbb{T}$ and $0 < r < 1$, we denote by $J^r$ the arc $\{r\zeta \mid \zeta \in J\}$ inside the disk. Notice that there exists a function $r(t)$, $1 > r(t) > 0$, $1 - r(t) = o(1-t)$ as $t \to 1$, such that for any $w \in (J_z)^{r(|z|)}$ we have $\rho(z, w) < R(|z|)$, where $R$ is the same as in (9). Let $\varepsilon > 0$ be a constant to be chosen later. Denote $L_z = ((1-\varepsilon)J_z)^{r(|z|)}$. Then by (9),
$$|L_z|\,P\mu(z) = \frac{1}{2\pi}\int_{L_z} P\mu(w)\,|dw| + o(P\mu(z))\,|L_z|.$$
At the same time, since $1 - r(|z|) = o(1-|z|)$, by (10) for any $w \in L_z$,
$$P\mu(w) = \int_{J_z} P_w\, d\mu + o(P\mu(w)) = \int_{J_z} P_w\, d\mu + o(P\mu(z)).$$
Hence
$$|L_z|\,P\mu(z) = \frac{1}{2\pi}\int_{J_z}\int_{L_z} P_w(\zeta)\,|dw|\,d\mu(\zeta) + o(P\mu(z))\,|L_z|.$$
But for every $\zeta \in J_z$,
$$\frac{1}{2\pi}\int_{L_z} P_w(\zeta)\,|dw| < 1,$$
and therefore
$$|L_z|\,P\mu(z) < \mu(J_z) + o(P\mu(z))\,|L_z|. \qquad (11)$$
Also, for every $\zeta \in (1-2\varepsilon)J_z$,
$$\frac{1}{2\pi}\int_{L_z} P_w(\zeta)\,|dw| \to 1 \quad \text{as } z \stackrel{\angle}{\longrightarrow} \xi.$$
Hence
$$|L_z|\,P\mu(z) \ge (1-2\varepsilon)\,\mu(J_z) + o(P\mu(z))\,|L_z|. \qquad (12)$$
Since $|L_z| = r(|z|)(1-\varepsilon)|J_z|$ and $r(|z|) \to 1$, (11) and (12) give us the statement.
Lemma 2.6. Let $\mu \in M$. Then the following conditions are equivalent:
(I) $\mu$ is Rajchman;
(II) for all Riemann integrable functions $f$ on $\mathbb{T}$,
$$\int_{\mathbb{T}} f(z^n)\, d\mu(z) \to \mu(\mathbb{T})\int_{\mathbb{T}} f\, dm \qquad (13)$$
as $n \to \infty$.
Proof. (II) $\Rightarrow$ (I) is trivial. To establish (I) $\Rightarrow$ (II), notice that if $\mu$ is Rajchman then (13) obviously holds for (trigonometric) polynomials. By polynomial approximation we can pass from polynomials to continuous functions. Since all Riemann integrable functions can be uniformly approximated by linear combinations of characteristic functions of open arcs, it is enough to prove (II) for such characteristic functions. If $f$ is such a function, then there exists a continuous $g$, $\|f - g\|_\infty = 1/2$, such that $E = \{f \ne g\}$ consists of two open arcs, $\mu(E) < \varepsilon$ and $|E| < \varepsilon$. Let $\chi_E$ be the characteristic function of $E$. Let $h$ be a continuous function such that $h \ge \chi_E$ and $\int_{\mathbb{T}} h\, dm = 2\varepsilon$. Then
$$\left|\int_{\mathbb{T}} f(z^n)\, d\mu - \int_{\mathbb{T}} g(z^n)\, d\mu\right| \le \int_{\mathbb{T}} \chi_E(z^n)\, d\mu \le \int_{\mathbb{T}} h(z^n)\, d\mu \to 2\varepsilon\,\mu(\mathbb{T})$$
as $n \to \infty$. Since
$$\int_{\mathbb{T}} g(z^n)\, d\mu \to \mu(\mathbb{T})\int_{\mathbb{T}} g\, dm \quad\text{and}\quad \left|\int_{\mathbb{T}} g\, dm - \int_{\mathbb{T}} f\, dm\right| < \varepsilon,$$
we obtain the statement. The relation with symmetry is more evident from the following version of the last statement:
Lemma 2.7. Let $\mu \in M$. Fix $0 \le c \le 2\pi$ and consider the sequence of partitions $\Pi_n^c = \{I_1, I_2, \dots, I_{2n}\}$ of $\mathbb{T}$ into equal disjoint arcs,
$$I_k = \left\{ e^{i\varphi} \in \mathbb{T} \;\middle|\; c + \frac{2\pi(k-1)}{2n} < \varphi \le c + \frac{2\pi k}{2n} \right\}.$$
The measure $\mu$ is Rajchman iff for all such sequences of partitions (for all $c$) we have
$$\lim_{n\to\infty}\bigl(\mu(I_1) - \mu(I_2) + \mu(I_3) - \cdots - \mu(I_{2n})\bigr) = 0. \qquad (14)$$
Proof. Suppose $\mu$ is Rajchman. Consider the function $f$ on $\mathbb{T}$ defined as $1$ on the upper half-circle and as $-1$ on the lower half-circle. Then Lemma 2.6 applied to $f(e^{-ic}z)$ produces (14). If (14) is satisfied for all sequences of partitions $\Pi_n^c$, then by the standard argument (13) holds for any odd $f$ (satisfying $f(-z) = -f(z)$), and therefore for $f(z) = z$.

Theorem 2.8. Let $\mu \in M$ and let $A$ be a Borel subset of $\mathbb{T}$. Suppose that $\mu$ is symmetric near $\mu$-almost every point of $A$. Then the restriction of $\mu$ to $A$ is Rajchman.

Note that we do not require $\mu$ to be positive, although in the next section Theorem 2.8 will only be applied to positive measures. Hence, readers not interested in maximal generality can assume that in the following two proofs all the measures are positive, which may simplify the argument. To prove Theorem 2.8 we first need

Lemma 2.9. If $\mu$ is symmetric near $\mu$-almost every point of $A$, then $\mu_A$, the restriction of $\mu$ to $A$, is symmetric near $\mu$-almost every point of $A$.

Proof. $\mu$-a.e. point $\xi$ of $A$ is a point of density 1 for the set $A$ with respect to $\mu$, i.e.
$$\frac{\mu(A \cap I_\varepsilon)}{\mu(I_\varepsilon)} \to 1 \quad \text{as } \varepsilon \to 0, \qquad (15)$$
where $I_\varepsilon$ is the arc of length $\varepsilon$ centered at $\xi$. Now suppose that $J_k$ is a sequence of arcs satisfying (2), $\mu(J_k') \ne 0$ and $|J_k| \to 0$. Then
$$\frac{\mu(J_k)}{\mu(J_k')} = \frac{\mu(J_k \cap A) + o(\mu(J_k \cap A))}{\mu(J_k' \cap A) + o(\mu(J_k' \cap A))}.$$
Hence the ratio $\mu(J_k \cap A)/\mu(J_k' \cap A)$ tends to 1.
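Proof of Theorem 2.8. Let $\varepsilon, \varepsilon' > 0$ be small constants (to be chosen later). Denote by $\mu_A$ the restriction of $\mu$ to $A$. Since, by Lemma 2.9, $\mu_A$ is symmetric near $\mu_A$-a.e. point, there exist a set $\Omega \subset \mathbb{T}$ with $|\mu_A|(\Omega) > |\mu_A|(\mathbb{T}) - \varepsilon$ and $\delta > 0$ such that for any arc $J$ satisfying $|J| < \delta$ and $|\mu_A|(J \cap \Omega) > 0$, we have $|\mu_A(J) - \mu_A(J')| < \varepsilon'|\mu_A(J)|$. Let $\{I_1, I_2, \dots, I_{2n}\}$ be a partition of $\mathbb{T}$ into equal arcs. Denote $K = \{k : |\mu_A|((I_{2k-1} \cup I_{2k}) \cap \Omega) > 0\}$. Then, if $|I_k| < \delta$,
$$\begin{aligned}
|\mu_A(I_1) - \mu_A(I_2) + \cdots - \mu_A(I_{2n})| &\le \sum_{k\in K}|\mu_A(I_{2k-1}) - \mu_A(I_{2k})| + \varepsilon \\
&\le \varepsilon'\sum_{k\in K}|\mu_A(I_{2k-1})| + \varepsilon \le \varepsilon'\|\mu_A\| + \varepsilon.
\end{aligned}$$
Since $\varepsilon$ and $\varepsilon'$ are arbitrary, (14) holds for $\mu_A$ and every choice of $c$, so $\mu_A$ is Rajchman by Lemma 2.7.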
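Lemma 2.7 turns the Rajchman property into a statement about alternating sums, which can be evaluated directly. The sketch below (helper names hypothetical) computes the sum in (14) with $c = 0$ for two probability measures: the absolutely continuous measure with density $\varphi/(2\pi^2)$ on $(0, 2\pi]$, for which the sum equals $-1/(2n)$ exactly and so tends to 0, and a unit point mass, for which the sum is $\pm 1$ for every $n$, in line with the fact that a point mass is not Rajchman. Only finitely many $n$ and a single value of $c$ are tested, so this illustrates (14) rather than proving anything.

```python
import math


def alternating_sum(arc_measure, n, c=0.0):
    """S_n(c) = mu(I_1) - mu(I_2) + ... - mu(I_{2n}) as in (14).

    I_k = {e^{i phi}: c + 2 pi (k-1)/(2n) < phi <= c + 2 pi k/(2n)}, and
    arc_measure(a, b) must return mu({e^{i phi}: a < phi <= b}).
    """
    h = math.pi / n  # each of the 2n arcs has length pi / n
    return sum((-1) ** (k - 1) * arc_measure(c + (k - 1) * h, c + k * h)
               for k in range(1, 2 * n + 1))


def ramp(a, b):
    """Arc measure for the density phi / (2 pi^2) on (0, 2 pi], a probability measure."""
    return (b * b - a * a) / (4 * math.pi ** 2)


def dirac(a, b, phi0=1.0):
    """Arc measure of the unit point mass at e^{i phi0}, with 0 < phi0 <= 2 pi."""
    return 1.0 if a < phi0 <= b else 0.0


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, alternating_sum(ramp, n), alternating_sum(dirac, n))
```

For the ramp density the $k$-th arc has measure $(2k-1)/(8n^2)$, and pairing consecutive arcs gives $n$ differences of $-1/(4n^2)$ each, which is where $-1/(2n)$ comes from.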
Let the operators $A_\lambda$ and the spectral measures $\mu_\lambda$ be defined as in the introduction. Together, the results of this section give us the following:

Theorem 2.10. If $\mu_0$ is symmetric near every point of a Borel set $E \subset \mathbb{R}$, then all $A_\lambda$ are Rajchman on $E$ (the restrictions of $\mu_\lambda$ to $E$ are Rajchman).

Proof. The statement follows from Corollary 2.4, Theorem 2.8 and the connection between the families $\mu_\lambda$ and $M_\theta$ discussed before (see [16]).

3. Examples

Let $A$ be a cyclic self-adjoint operator, $\varphi$ its cyclic vector and $\mu$ the spectral measure of $A$ corresponding to $\varphi$. Recall that we denote by $\mu_\lambda$ the spectral measures of the rank-one perturbations of $A$,
$$A_\lambda = A + \lambda(\cdot, \varphi)\varphi, \qquad \lambda \in \mathbb{R},$$
corresponding to $\varphi$. To construct the examples announced in the introduction we need, in addition to the results of the previous section, the following two simple and probably well-known lemmas. If $\mu$ is a positive Borel measure on the real line, we denote
$$A_\mu = \left\{ x \in \mathbb{R} \;\middle|\; \frac{\mu((x-\varepsilon, x+\varepsilon))}{2\varepsilon} \to \infty \text{ as } \varepsilon \to 0 \right\}.$$
Similarly, if $\mu \in M$ we denote by $A_\mu$ the circle analogue of this set.

Lemma 3.1. Let the spectral measure $\mu_0$ be singular. Then for any $\lambda \ne 0$, $\mu_\lambda(A_{\mu_0}) = 0$.

Proof. Let $\{\sigma_\alpha\} = M_\theta$ be the family of measures corresponding to an inner function $\theta$. We need to show that $\sigma_\alpha(A_{\sigma_1}) = 0$ for all $\alpha \ne 1$. But the definition of $A_{\sigma_1}$ implies that $P\sigma_1(z) \to \infty$ as $z \stackrel{\angle}{\longrightarrow} \xi$ for every $\xi \in A_{\sigma_1}$. Since $P\sigma_1 = \operatorname{Re}\frac{1+\theta}{1-\theta} \le \frac{2}{|1-\theta|}$, this forces $\theta(z) \to 1$, so $\theta(z) \not\to \alpha$ for any $\alpha \ne 1$. It is left to notice that $\theta(z) \to \alpha$ as $z \stackrel{\angle}{\longrightarrow} \xi$ for $\sigma_\alpha$-a.e. $\xi$.
Lemma 3.2. Let $\mu \in M$ be such that $\operatorname{supp}\mu$ is a set of Lebesgue measure 0. Then there exists a positive $f \in L^1(\mu)$ such that $A_{f\mu} = \operatorname{supp}\mu$.

Proof. Denote $E = \operatorname{supp}\mu$. To construct such an $f$, for $k = 1, 2, \dots$ consider open sets $E_k \supset E$ such that $|E_k| < 1/2^k$. Define the function $f_k$ on $\mathbb{T}$ in the following way: if $E_k = \bigcup_n I_k^n$, where $I_k^1, I_k^2, \dots$ are disjoint open arcs, then $f_k$ is equal to $|I_k^n|/\mu(I_k^n)$ on $I_k^n$ and to 0 outside of $E_k$. Then $\|f_k\|_{L^1(\mu)} = |E_k| < 1/2^k$. Hence the series $\sum_k f_k$ converges in $L^1(\mu)$. If we put $f = \sum_k f_k$, then $A_{f\mu} = E$.
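Example 3.3 below starts from a singular Rajchman measure built as a Riesz product. For the lacunary frequencies $n_k = 3^k$ and $0 \le a_k \le 1$, the Fourier coefficients of $\prod_k (1 + a_k\cos(3^k x))$ are products of the numbers $a_k/2$ (a standard fact, see [9]); in particular the coefficient at $3^k$ is $a_k/2$. The sketch below computes a finite partial product and checks this, under the hypothetical choice $a_k = 1/\sqrt{k}$, for which $a_k \to 0$ while $\sum a_k^2 = \infty$. It can only illustrate the decay of the coefficients (the Rajchman side); singularity of the limiting measure is not visible in a finite computation.

```python
import math


def riesz_product_coefficients(a, ratio=3):
    """Fourier coefficients {m: c_m} of prod_{k=1}^{K} (1 + a_k cos(ratio**k x)).

    Each factor equals 1 + (a_k / 2) e^{i n_k x} + (a_k / 2) e^{-i n_k x} with n_k = ratio**k,
    so the product is obtained by repeatedly convolving coefficient dictionaries.
    """
    coeffs = {0: 1.0}
    for k, ak in enumerate(a, start=1):
        nk = ratio ** k
        new = {}
        for m, c in coeffs.items():
            for shift, w in ((0, 1.0), (nk, ak / 2), (-nk, ak / 2)):
                new[m + shift] = new.get(m + shift, 0.0) + c * w
        coeffs = new
    return coeffs


if __name__ == "__main__":
    a = [1 / math.sqrt(k) for k in range(1, 9)]   # a_k -> 0 but sum of a_k^2 diverges
    c = riesz_product_coefficients(a)
    print(c[0])                                   # total mass 1
    print([c[3 ** k] for k in range(1, 9)])       # coefficient at 3^k equals a_k / 2
```

With ratio 3 every frequency $\sum_k \epsilon_k 3^k$, $\epsilon_k \in \{-1, 0, 1\}$, arises in exactly one way, which is why each coefficient is a single product of factors $a_k/2$.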
Example 3.3. First we construct an operator $A$ whose spectral measure $\mu$ is singular Rajchman, but all $A_\lambda$, $\lambda \ne 0$, are pure point. The construction is similar to the one used by Donoghue in [7] and does not require any of the results of the previous section. All we need is to produce a positive singular Rajchman measure $\mu$ such that for any $x \notin A_\mu$ (where $A_\mu$ is defined as before Lemma 3.1) the integral
$$\int_{-\infty}^{\infty} \frac{d\mu(y)}{(x-y)^2}$$
converges. Then, by Lemma 3.1, all $\mu_\lambda$, $\lambda \ne 0$, will be concentrated on the set where this integral is finite and hence will be pure point. Let $\nu$ be an arbitrary singular Rajchman measure. (To construct such a measure one can, for example, consider a Riesz product corresponding to a positive sequence $a_n$ such that $a_n \to 0$ but $\{a_n\} \notin \ell^2$. For information on Riesz products see, for instance, [9].) Then there exists a closed set $F$ of Lebesgue measure 0 such that $\nu(F) > 0$. Denote by $\gamma$ the restriction of $\nu$ to $F$. Then $E = \operatorname{supp}\gamma$ is a subset of $F$ and therefore $|E| = 0$. By Lemma 3.2 we can choose a positive function $f \in L^1(\gamma)$ such that $A_{f\gamma} = E$. After that we put $\mu = f\gamma$. Since $\nu$ was Rajchman, so is $\mu$.

Example 3.4. In this example we will show that for any cyclic unitary operator $U$ with $\sigma(U) = \mathbb{T}$ there exists a cyclic vector $\varphi$ such that all rank-one perturbations
$$U_\alpha = U + (1-\alpha)(\cdot, U^{-1}\varphi)\varphi, \qquad \alpha \in \mathbb{T},\ \alpha \ne 1,$$
are Rajchman. (If the original operator is purely non-Rajchman, this produces an example analogous to the second one from [7].) By Theorem 2.10 and Lemma 3.1 it is enough to show that for any singular $\mu \in M_+$ with $\operatorname{supp}\mu = \mathbb{T}$ there exists $f \in L^1(\mu)$ such that $f\mu$ is symmetric near every point of $\mathbb{T} \setminus A_{f\mu}$. Let $\nu$ be an arbitrary singular symmetric probability measure on $\mathbb{T}$. (Such measures are constructed in [4] and [2].) Note that then $\operatorname{supp}\nu = \mathbb{T}$ (otherwise the measure is not symmetric in the gaps of the support).

Step 1. One can choose a collection of disjoint open arcs $I_n$, $n = 1, 2, \dots$, such that $\nu(\bigcup I_n) = \|\nu\|$, $|\bigcup I_n| \le 1/2$, and $\nu(I_n) \ge |I_n|$ for all $n$. For each arc $I_n$ consider the arcs $(1 - \frac{1}{\sqrt{k}})I_n$, $k = 1, 2, \dots$. The set $(1 - \frac{1}{\sqrt{k+1}})I_n \setminus (1 - \frac{1}{\sqrt{k}})I_n$ consists of two small arcs. Denote these arcs by $E_n^k$ and $F_n^k$. By Lemma 3.2, on the arcs $E_n^k$ and $F_n^k$ we can choose nonnegative functions $u_n^k$ and $v_n^k$ such that the measures $\beta_n^k = u_n^k\mu$ and $\gamma_n^k = v_n^k\mu$ satisfy:
(1) $|\operatorname{supp}\beta_n^k| = |\operatorname{supp}\gamma_n^k| = 0$;
(2) $\operatorname{supp}\beta_n^k = A_{\beta_n^k} \subset E_n^k$, $\operatorname{supp}\gamma_n^k = A_{\gamma_n^k} \subset F_n^k$;
(3) $\|\beta_n^k\| = \nu(E_n^k)$ and $\|\gamma_n^k\| = \nu(F_n^k)$.
Define $\sigma_1 = \sum_{k,n}(\beta_n^k + \gamma_n^k)$. Note that then the measure $\sigma_1$ satisfies
$$P\sigma_1(z) > \tfrac12 P\nu(z) \quad\text{and}\quad |(H\sigma_1)'(z)| < 2|(H\nu)'(z)|$$
when $z \to \xi$, $z \in \Gamma_\xi$, for any $\xi \in \bigcup I_n$. Before we proceed to the next step, let us denote $\bigcup I_n = \Omega_1$.

Step k. In the same way as in Step 1, we can choose an open set $\Omega_k \subset \Omega_{k-1}$ and a measure $\sigma_k \ll \mu$, $\sigma_k = \nu$ on $\Omega_k$, satisfying:
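a) $|\Omega_k| \le 1/2^k$;
b) $\Omega_k = \bigcup I_n$, where $I_n$ are disjoint open arcs such that $\sigma_k(I_n) \ge |I_n|$;
c) $|\operatorname{supp}\sigma_k \cap \Omega_k| = 0$, $\operatorname{supp}\sigma_k \cap \Omega_k = A_{\sigma_k} \cap \Omega_k$;
d) $P\sigma_k(z) > \frac12 P\nu(z)$ and $|(H\sigma_k)'(z)| < 2|(H\nu)'(z)|$ when $z \to \xi$, $z \in \Gamma_\xi$, for any $\xi \in \Omega_k$.

Put $\mu_0 = \sum_k \frac{1}{2^k}\sigma_k$. Since $\nu$ is symmetric, by Lemma 2.3,
$$\lim_{z \stackrel{\angle}{\to} \xi} \frac{(1-|z|^2)\,(H\nu)'(z)}{P\nu(z)} = 0$$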
for all ξ ∈ T. Condition d) now implies that for every point ξ ∈ @1 , (1 − |z|2 )H µ0 (z) (1 − |z|2 )H ν(z)
as z −→ ξ.
(16)
as z −→ ξ.
k
1 n=1 2n σn
as z −→ ξ.
we will also have (17)
k
1 supp σn ∩ (@k \ @k+1 ) 2n n=1
lies inside $\bigcup_{n=1}^k A_{\sigma_n}$, which in its turn is a subset of $A_{\mu_0}$. Therefore, (17) holds for all $\xi$ from $\Omega_k \setminus \Omega_{k+1}$ except possibly those from $A_{\mu_0}$. Putting $\mathbb{T}\setminus\Omega_1$ and all sets $\Omega_k \setminus \Omega_{k+1}$ together and using Lemma 2.3, we obtain that $\mu_0$ is symmetric near any $\xi$ except possibly those from $A_{\mu_0} \cup (\bigcap_k \Omega_k)$. But if $\xi \in \bigcap_k \Omega_k$, then by b) $\xi \in A_{\mu_0}$. Therefore $\mu_0$ is symmetric near every point outside $A_{\mu_0}$. Thus we have constructed the measure $\mu_0 = f_0\mu$, $\|\mu_0\| = 1$, satisfying the desired symmetry condition. The only thing that remains to correct is that $\mu(\{f_0 = 0\}) > 0$, i.e. the vector $\varphi$ corresponding to $f_0$ is not cyclic. To finish our construction, notice that Condition (2) in Step 1 (and in all the subsequent steps) implies that the restriction of $\mu$ to $\{f_0 = 0\}$ is still densely supported on $\mathbb{T}$. Also, with the proper choice of the functions $u_n^k$ and $v_n^k$, we can make $\mu(\{f_0 = 0\}) < \|\mu\|/2$. After that we can repeat the whole construction for the restriction of $\mu$ to $\{f_0 = 0\}$ in place of $\mu$. Iterating this procedure, we obtain a sequence of measures $\mu_k = f_k\mu$, each satisfying (17) (with $\mu_0$ replaced by $\mu_k$) outside $A_{\mu_k}$ and such that $\|\mu_k\| < \|\mu\|/2^k$. Now we can put $\sigma = \sum_k \mu_k$. Then $\sigma$ satisfies (17) outside of $\bigcup_k A_{\mu_k}$. Hence it is symmetric near every point outside of $A_\sigma$, and by Theorem 2.10 and Lemma 3.1 all other measures from the same family of rank-one perturbations are Rajchman. At the same time, $\sigma = f\mu$ for some positive $f \in L^1(\mu)$. Therefore, if $\mu$ was purely non-Rajchman, say, then so is $\sigma$.
For our final example we need to introduce the notion of the Krein spectral shift. Here we will only discuss it in the case of singular unitary operators. Let $U_1$ be a unitary cyclic singular operator, $U_\alpha = U_1 + (1-\alpha)(\cdot, U_1^{-1}\varphi)\varphi$ for some cyclic vector $\varphi$ of $U_1$, and let $\mu_\alpha$ be the spectral measures of these operators corresponding to $\varphi$. As was discussed in the previous section, $\{\mu_\alpha\} = M_\theta$ for some inner function $\theta$. The Krein spectral shift for the perturbation problem $U_1 \mapsto U_\alpha$ is defined as the function $u$ on the circle $\mathbb{T}$ equal to $\pi/2$ on $\theta^{-1}(\{e^{i\psi} \mid 0 < \psi < \zeta\})$ and to $-\pi/2$ on $\theta^{-1}(\{e^{i\psi} \mid \zeta < \psi < 2\pi\})$, where $\alpha = e^{i\zeta}$, $0 < \zeta < 2\pi$. It follows that
$$u = \arg\left(H\mu_1 - \frac{1+\alpha}{1-\alpha}\right)$$
a.e. on $\mathbb{T}$, where $\arg$ stands for the principal branch of the argument taking values in $(-\pi, \pi)$. Hence $u$ also satisfies
$$H\mu_1 = C\exp\left(iHu - i\int_{\mathbb{T}} u\, dm\right) = (H\mu_\alpha)^{-1} \qquad (18)$$
for some $C > 0$. The last two formulas can be viewed as a definition of $u$ as well. For any singular cyclic unitary perturbation problem $U_1 \mapsto U_\alpha$ there exists such a function $u$ and, conversely, for any function $u$ on $\mathbb{T}$ with $u(\mathbb{T}) = \{\pi/2, -\pi/2\}$ there exists a unitary cyclic singular operator and its rank-one perturbation such that $u$ is the Krein spectral shift for the corresponding perturbation problem. For a more detailed discussion of $u$, as well as of the Krein spectral shift for self-adjoint rank-one perturbation problems, see, for instance, [16]. It is not difficult to deduce from the above definitions of $u$ that
$$Pu \to \pi/2 \quad \text{as } z \stackrel{\angle}{\longrightarrow} \xi \text{ for } \mu_\beta\text{-a.e. } \xi$$
if $\beta \in \{e^{i\psi} \mid 0 < \psi < \zeta\}$, and
$$Pu \to -\pi/2 \quad \text{as } z \stackrel{\angle}{\longrightarrow} \xi \text{ for } \mu_\beta\text{-a.e. } \xi$$
if $\beta \in \{e^{i\psi} \mid \zeta < \psi < 2\pi\}$, where $\alpha = e^{i\zeta}$, $0 < \zeta < 2\pi$. The Herglotz integral $Hu$ tends to $ic$ non-tangentially at $\mu_\beta$-a.e. point, where the real constant $c = c(\beta)$ is positive if $\beta \in \{e^{i\psi} \mid 0 < \psi < \zeta\}$ and negative if $\beta \in \{e^{i\psi} \mid \zeta < \psi < 2\pi\}$; see [16]. We will use this observation in our construction. The Krein spectral shift can be used together with the tools developed in the previous section to study the behavior of the Rajchman spectrum under rank-one perturbations. We illustrate this with the following statements and example.

Lemma 3.5. Let $U_\alpha$ be a family of unitary singular cyclic rank-one perturbations and $\mu_\alpha$ the corresponding spectral measures. Consider the Krein spectral shift $u$ corresponding to the perturbation problem $U_1 \mapsto U_{-1}$. Denote $E = \{u = -\pi/2\}$ and $\bar E = \{u = \pi/2\}$. Let $\xi$ be a point of density 1 for the set $E$ such that $Hu$ has a non-zero non-tangential limit at $\xi$. The measure $(\pi/2 + u)m$ is symmetric near $\xi$ iff all $\mu_\alpha$ are symmetric near $\xi$.
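Proof. Suppose that $(\pi/2 + u)m$ is symmetric near $\xi$. By Corollary 2.4 it is enough to show that $\mu_1$ is symmetric near $\xi$. To prove this, notice that, with $Q\mu_1 = \operatorname{Im} H\mu_1$,
$$P(\pi/2 + u) = \pi/2 + \arg H\mu_1 = \pi/2 + \arctan\frac{Q\mu_1}{P\mu_1} < C_1\,\frac{P\mu_1}{|Q\mu_1|} < C_2\,\frac{P\mu_1}{|H\mu_1|}$$
for some $C_1, C_2 > 0$ as $z \stackrel{\angle}{\longrightarrow} \xi$, because $H\mu_1 \to ci$ and $P\mu_1 \to 0$. Therefore, using $(H\mu_1)' = i(Hu)'H\mu_1$, which follows from (18),
$$\frac{(1-|z|^2)\,|(H\mu_1)'|}{P\mu_1} = \frac{(1-|z|^2)\,|(Hu)'|\,|H\mu_1|}{P\mu_1} < C_3\,\frac{(1-|z|^2)\,|(Hu)'|}{P(\pi/2 + u)} \to 0$$
as $z \stackrel{\angle}{\longrightarrow} \xi$. Thus $\mu_1$ is symmetric near $\xi$.

The opposite direction is proved in the same way.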
We denote by $\mathbb{T}_\pm$ the upper and lower half-circles. By our previous discussion, all $\mu_\alpha$, $\alpha \in \mathbb{T}_\pm$, are concentrated on the set $\{u = \pm\pi/2\}$. Formula (18) together with the last lemma gives us the following

Corollary 3.6. If the measure $(\pi/2 + u)m$ (resp. $(\pi/2 - u)m$) is symmetric near almost every point of the set $\{u = -\pi/2\}$ (resp. $\{u = \pi/2\}$), then $\mu_\alpha$ are Rajchman for almost every $\alpha \in \mathbb{T}_-$ (resp. $\mathbb{T}_+$).

Example 3.7. In this example we will see that there exists a family of unitary cyclic operators $U_\alpha = U_1 + (1-\alpha)(\cdot, U_1^{-1}\varphi)\varphi$ such that their spectral measures $\mu_\alpha$ are pure point for all $\alpha = e^{i\psi}$, $\pi < \psi < 2\pi$, but singular continuous for all $\alpha = e^{i\psi}$, $0 \le \psi \le \pi$, and Rajchman for almost all $\alpha = e^{i\psi}$, $0 \le \psi \le \pi$. This implies that there also exists a self-adjoint family $A_\lambda = A + \lambda(\cdot, \varphi)\varphi$ such that the spectral measures $\mu_\lambda$ are pure point for $\lambda < 0$ but singular continuous for $\lambda \ge 0$ and Rajchman for almost all $\lambda \ge 0$. The idea is to find a Krein spectral shift $u$ satisfying certain symmetry conditions and use Lemma 3.5. To find a suitable $u$, we first construct a Cantor set $C \subset [0, 1]$ of non-zero Lebesgue measure in the following way: let $C_0 = I_1^0 = [0, 1]$, $C_1 = I_1^1 \cup I_2^1$, ..., $C_n = I_1^n \cup \cdots \cup I_{2^n}^n$, ..., where
$$I_{2k-1}^n \cup I_{2k}^n = I_k^{n-1} \setminus L_k^{n-1}$$
and $L_k^n$ is the open interval placed in the center of the interval $I_k^n$ such that $|L_k^n| = \frac{1}{n^2}|I_k^n|$. Put $C = \bigcap_{n=0}^{\infty} C_n$. Denote also $\bar C = [0, 1] \setminus C$ and let $\chi_{\bar C}$ be the characteristic function of $\bar C$.

Claim 3.8. The measure $\chi_{\bar C}m$ is symmetric near almost every point of $C$. (The definition of a measure symmetric near a point on the real line is similar to the one given in Sect. 2 for the case of the circle.)
Proof. Let $x \in C$. Suppose that $J_n$ is a sequence of intervals satisfying (2) and such that $|J_n| \to 0$. Let us first assume that there are at most finitely many intervals $L_k^n$ such that
$$\operatorname{dist}(x, L_k^n) < \sqrt{n}\,|L_k^n|. \qquad (19)$$
Then simple estimates show that, for large enough $n$,
$$\bigl|\,|J_n \setminus C| - |J_n' \setminus C|\,\bigr| < D\left(\frac{1}{l} - \frac{1}{l + \frac12\ln l}\right),$$
where $l$ is the minimal integer such that $J_n \cup J_n'$ intersects $L_k^l$. One can also notice that then $|J_n \setminus C| > d\,\frac{1}{l}$. It is left to show that the measure of the set $R$ consisting of $x \in C$ such that (19) is satisfied for infinitely many intervals $L_k^n$ is 0. Denote by $R_N$ the set of $x \in C$ such that there exists $L_k^n$ satisfying (19) with $n \ge N$, i.e. $R_N$ is the union of the $(\sqrt{n}\,|L_k^n|)$-neighborhoods of the intervals $L_k^n$ for $n \ge N$. By our construction
$$|R_N| \le \sum_{n=N}^{\infty} \frac{1}{n^{3/2}} \to 0 \quad \text{as } N \to \infty.$$
At the same time $R_1 \supset R_2 \supset R_3 \supset \dots$ and $\bigcap_n R_n \supset R$. Hence $|R| = 0$.
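That the Cantor set $C$ has positive measure can be seen directly, since the removed fractions are summable. Because the relative lengths above depend on how the steps are indexed, the sketch below assumes (hypothetically) that at step $j \ge 1$ the removed central interval has relative length $1/(j+1)^2$. Then $|C_n| = \prod_{j=1}^{n}\bigl(1 - \frac{1}{(j+1)^2}\bigr) = \frac{n+2}{2(n+1)}$, which decreases to $1/2$. Exact rational arithmetic is used so that this identity can be checked without rounding; the function names are illustrative.

```python
from fractions import Fraction


def remove_middles(intervals, frac):
    """Remove from each [a, b] the open central interval of length frac * (b - a)."""
    out = []
    for a, b in intervals:
        mid, half_gap = (a + b) / 2, frac * (b - a) / 2
        out += [(a, mid - half_gap), (mid + half_gap, b)]
    return out


def fat_cantor(n_steps):
    """Intervals of C_n when step j removes a central fraction 1/(j+1)^2 (assumed indexing)."""
    intervals = [(Fraction(0), Fraction(1))]
    for j in range(1, n_steps + 1):
        intervals = remove_middles(intervals, Fraction(1, (j + 1) ** 2))
    return intervals


def total_length(intervals):
    return sum((b - a for a, b in intervals), Fraction(0))


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, total_length(fat_cantor(n)))   # equals (n + 2) / (2 (n + 1))
```

The closed form follows from the telescoping product $\prod_{m=2}^{M}(1 - 1/m^2) = \frac{M+1}{2M}$ with $M = n+1$.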
Now we can define the Krein spectral shift $u$ on $\mathbb{T}$ to be equal to $\pi/2$ on $\{e^{i2\pi\phi} \mid \phi \in C\}$ and to $-\pi/2$ elsewhere. Let $U_1 \mapsto U_{-1}$ be the unitary rank-one perturbation problem corresponding to $u$, $U_\alpha = U_1 + (1-\alpha)(\cdot, U_1^{-1}\varphi)\varphi$. It was shown in [16], Example 6.1, that the spectral measures $\mu_\alpha$ are pure point for all $\alpha = e^{i\psi}$, $\pi < \psi < 2\pi$, but singular continuous for all $\alpha = e^{i\psi}$, $0 \le \psi \le \pi$ (the construction was performed on the real line, but all the arguments can be transferred to the circle in the standard way). Now, together with Lemma 3.5, the last claim implies that $\mu_\alpha$ are Rajchman for a.e. $\alpha = e^{i\psi}$, $0 \le \psi \le \pi$.

Remark 3.9 (Open question). As was shown in [8] and [6], if the spectrum of the original operator $U_1$ contains the whole circle $\mathbb{T}$, then $U_\alpha$ cannot have any point spectrum for a dense $G_\delta$ set of $\alpha \in \mathbb{T}$. The question is whether the point spectrum can be replaced with non-Rajchman spectrum in this result, i.e., does there exist a family of unitary (self-adjoint) singular cyclic rank-one perturbations $U_\alpha$ ($A_\lambda$) such that $\sigma(U_1) = \mathbb{T}$ ($\sigma(A_0) = \mathbb{R}$) but all the spectral measures $\mu_\alpha$ ($\mu_\lambda$) have non-trivial non-Rajchman parts?

Acknowledgement. The author is grateful to A. B. Aleksandrov for useful discussions.
References
1. Aleksandrov, A.B.: Multiplicity of boundary values of inner functions. Izv. Acad. Nauk. Arm. SSR, Matematica 22, 5, 490–503 (1987)
2. Aleksandrov, A., Anderson, J. and Nicolau, A.: Inner functions, Bloch spaces and symmetric measures. Proc. London Math. Soc. (3) 79, no. 2, 318–352 (1999)
3. Bishop, C.: Bounded functions in the little Bloch space. Pacific J. Math. 142, 209–225 (1990)
4. Carleson, L.: On mappings, conformal at the boundary. J. d'Analyse Math. 19, 1–13 (1967)
5. del Rio, R., Jitomirskaya, S., Last, Y. and Simon, B.: Operators with singular continuous spectrum, 4. Hausdorff dimension and rank one perturbations. J. Anal. Math. 69, 153–200 (1996)
6. del Rio, R., Jitomirskaya, S., Makarov, N. and Simon, B.: Singular continuous spectrum is generic. Bull. Amer. Math. Soc. (N.S.) 31, no. 2, 208–212 (1994)
7. Donoghue, W.: On the perturbation of spectra. Comm. Pure Appl. Math. 18, 559–576 (1965)
8. Gordon, A.: Pure point spectrum under 1-parameter perturbations and instability of Anderson localization. Commun. Math. Phys. 164, no. 3, 489–505 (1994)
9. Havin, V. and Jöricke, B.: The uncertainty principle in harmonic analysis. Berlin–Heidelberg–New York: Springer-Verlag, 1994
10. Kechris, A.: Set theory and uniqueness for trigonometric series. Unpublished lecture notes
11. Last, Y.: Quantum dynamics and decompositions of singular continuous spectra. J. Funct. Anal. 142, no. 2, 406–445 (1996)
12. Lyons, R.: Fourier–Stieltjes coefficients and asymptotic distribution modulo 1. Ann. of Math. 122, 155–170 (1985)
13. Lyons, R.: Seventy years of Rajchman measures. J. Fourier Anal. Appl. Kahane Special Issue, 363–377 (1995)
14. Martin, M. and Putinar, M.: Lectures on Hyponormal operators. Operator Theory: Advances and Applications, 39, 1989
15. Poltoratski, A.: On the boundary behavior of pseudocontinuable functions. St. Petersburg Math. J. 5, 389–406 (1994)
16. Poltoratski, A.: The Krein spectral shift and rank one perturbations of spectra. Algebra i Analiz 10, No. 5, 143–183 (1998), Russian; English translation to appear in St. Petersburg Math. J.
17. Poltoratski, A.: Equivalence modulo rank-one perturbation. Pacific J. Math. 194, no. 1, 175–188 (2000)
18. Simon, B.: Spectral analysis of rank one perturbations and applications. Math. quantum theory II, Schrödinger operators (Vancouver, BC, 1993), 109–149, CRM Proc. Lecture notes 8, Amer. Math. Soc., Providence, RI, 1995
19. Simon, B. and Wolff, T.: Singular continuous spectrum under rank one perturbations and localization for random Hamiltonians. Comm. Pure Appl. Math. 39, 75–90 (1986)

Communicated by B. Simon
Commun. Math. Phys. 223, 223 – 259 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Szegő Difference Equations, Transfer Matrices and Orthogonal Polynomials on the Unit Circle
Leonid Golinskii1, Paul Nevai2
1 Mathematics Division, Institute for Low Temperature Physics and Engineering, 47 Lenin Avenue,
Kharkov 61103, Ukraine. E-mail: [email protected]
2 Department of Mathematics, The Ohio State University, 231 West 18th Avenue, Columbus, OH 43210-1174,
USA. E-mail: [email protected] Received: 26 February 2001 / Accepted: 28 May 2001
Abstract: We develop the theory of orthogonal polynomials on the unit circle based on the Szegő recurrence relations written in matrix form. The orthogonality measure and the C-function arise in exactly the same way as Weyl's function in the Weyl approach to second order linear differential equations on the half-line. The main objects under consideration are the transfer matrix, which is a key ingredient in the modern theory of one-dimensional Schrödinger operators (discrete and continuous), and the notion of subordinacy from the Gilbert–Pearson theory. We study the relations between transfer matrices and the structure of orthogonality measures. The theory is illustrated by the Szegő equations with reflection coefficients of bounded variation.

1. Introduction

Let $\mu$ be a probability measure on the unit circle $\mathbb{T} = \{|\zeta| = 1\}$ with infinite support $\operatorname{supp}\mu$. The latter is defined as the smallest closed set whose complement has $\mu$-measure zero. The polynomials $\varphi_n(z) = \varphi_n(\mu, z) = \kappa_n(\mu)z^n + \dots$, orthonormal on the unit circle with respect to $\mu$, are uniquely determined by the requirement that $\kappa_n = \kappa_n(\mu) > 0$ and
$$\int_{\mathbb{T}} \varphi_n(\zeta)\,\overline{\varphi_m(\zeta)}\, d\mu(\zeta) = \delta_{n,m}, \qquad n, m = 0, 1, \dots \qquad (1)$$
The monic orthogonal polynomials $\Phi_n$ are $\Phi_n(z) = \Phi_n(\mu, z) = \kappa_n^{-1}\varphi_n = z^n + \dots$. The orthonormal polynomials satisfy the Szegő recurrences (cf. [30, Formula (11.4.7)])
$$\varphi_n(z) = \frac{\kappa_n}{\kappa_{n-1}}\bigl(z\varphi_{n-1}(z) + a_n\varphi_{n-1}^*(z)\bigr), \qquad \varphi_n^*(z) = \frac{\kappa_n}{\kappa_{n-1}}\bigl(z\bar a_n\varphi_{n-1}(z) + \varphi_{n-1}^*(z)\bigr), \qquad (2)$$
Partially supported by INTAS Grant 2000-272
Supported by the National Science Foundation under Grant DMS-9706695
224
L. Golinskii, P. Nevai
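where $n \in \mathbb{N} = \{1, 2, \dots\}$ and $\varphi_0 = \varphi_0^* = 1$. The reversed $*$-polynomial of a polynomial $p_n$ of degree $n$ is defined by $p_n^*(z) \stackrel{\text{def}}{=} z^n\,\overline{p_n(1/\bar z)}$. The numbers $a_n \stackrel{\text{def}}{=} \Phi_n(\mu, 0)$, $n \in \mathbb{N}$, $a_0 = 1$, known as reflection coefficients, describe the system of orthonormal polynomials $\varphi_n$ completely, since (cf. [9, p. 7])
$$\rho_n^2 \stackrel{\text{def}}{=} \frac{\kappa_{n-1}^2}{\kappa_n^2} = 1 - |a_n|^2, \qquad \kappa_n^{-2} = \prod_{k=1}^{n}\bigl(1 - |a_k|^2\bigr), \quad n \in \mathbb{N}, \qquad \kappa_0 = 1. \qquad (3)$$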
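The recurrences (2) and the relations (3) translate directly into code. The sketch below (function name hypothetical) propagates the pair $(\varphi_n, \varphi_n^*)$ by the $2\times2$ matrices $\frac{1}{\rho_n}\begin{pmatrix} z & a_n \\ \bar a_n z & 1\end{pmatrix}$, i.e. by the Szegő matrices $T_n(z)$, and checks $\kappa_n^{-2} = \prod_{k\le n}(1 - |a_k|^2)$ using $\kappa_n = \varphi_n^*(0)$, as well as $\Phi_n(0) = a_n$. As a further check, taken from the Bernstein–Szegő theorem rather than from this paper: if $a_k = 0$ for $k > N$, the computed $\varphi_n$ should be orthonormal for $d\mu = dm/|\varphi_N|^2$, which an equispaced quadrature on $\mathbb{T}$ confirms to high accuracy. The reflection coefficients are arbitrary illustrative values.

```python
import numpy as np


def szego_polynomials(a, z):
    """phi_n(z) and phi_n^*(z), n = 0..len(a), from (phi_0, phi_0^*) = (1, 1) and
    (phi_n, phi_n^*) = T_n(z) (phi_{n-1}, phi_{n-1}^*),
    T_n(z) = (1 / rho_n) [[z, a_n], [conj(a_n) z, 1]],  rho_n = sqrt(1 - |a_n|^2)."""
    z = np.asarray(z, dtype=complex)
    phi, phi_star = np.ones_like(z), np.ones_like(z)
    phis, phi_stars = [phi], [phi_star]
    for an in a:
        rho = np.sqrt(1.0 - abs(an) ** 2)  # rho_n = kappa_{n-1} / kappa_n, see (3)
        phi, phi_star = (z * phi + an * phi_star) / rho, (np.conj(an) * z * phi + phi_star) / rho
        phis.append(phi)
        phi_stars.append(phi_star)
    return phis, phi_stars


if __name__ == "__main__":
    a = [0.5, -0.3 + 0.2j, 0.1j]                   # hypothetical reflection coefficients, |a_n| < 1
    N = len(a)
    _, star0 = szego_polynomials(a, [0.0])
    kappa = np.array([s[0].real for s in star0])   # kappa_n = phi_n^*(0)
    print(kappa ** -2, np.cumprod([1.0] + [1 - abs(x) ** 2 for x in a]))

    # Bernstein-Szego check: with a_k = 0 for k > N, phi_n should be orthonormal for dm / |phi_N|^2.
    M = 4096
    zs = np.exp(2j * np.pi * np.arange(M) / M)
    phis, _ = szego_polynomials(a + [0.0, 0.0], zs)
    w = 1.0 / (M * np.abs(phis[N]) ** 2)
    G = np.array([[np.sum(p * np.conj(q) * w) for q in phis] for p in phis])
    print(np.allclose(G, np.eye(N + 3), atol=1e-10))
```

Evaluating the recurrence pointwise, rather than manipulating coefficient lists, mirrors the matrix viewpoint: each step is a single multiplication by $T_n(z)$.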
As all zeros of $\Phi_n$ lie inside the unit circle (cf. [9, Sect. 8, p. 9]), $|a_n|<1$ for all $n\in\mathbb N$. More to the point for our purpose, the converse is also true. More precisely, given an arbitrary sequence of complex numbers $\{a_n\}$ with the only restriction $|a_n|<1$, the polynomials $\varphi_n$ defined by (2) and (3) turn out to be orthonormal with respect to a unique probability measure µ with infinite support such that $a_n=\Phi_n(\mu,0)$ for $n\in\mathbb N$. This result is referred to as Favard's theorem for the unit circle (see [5] for a simple proof). Therefore we can view the theory of orthogonal polynomials on the unit circle as a theory of the first-order vector (matrix) difference equation (2) with $a_n$ being arbitrary complex numbers from the open unit disk $\mathbb D=\{|z|<1\}$. Our main goal here is to pursue this idea consistently (some traces of such an approach can be found in [13]). The emphasis is on the matrix nature of the problem, in which analytic matrix-valued functions play a crucial role. Consider a vector (matrix) difference equation
$$\vec X(z,n)=T_n(z)\vec X(z,n-1),\qquad X_n(z)=T_n(z)X_{n-1}(z),\qquad T_n(z)\stackrel{\text{def}}{=}\frac1{\rho_n}\begin{pmatrix}z&a_n\\ \bar a_nz&1\end{pmatrix},\quad n\in\mathbb N,$$
which is called the Szegő equation, and $T_n$ the Szegő matrix. Here $\{a_n\}$ is an arbitrary sequence of complex numbers with $|a_n|<1$. Define the transfer matrix by $\mathcal T_n=T_nT_{n-1}\cdots T_1$. In Sect. 2 we derive some general properties of the Szegő equations, including the Christoffel–Darboux formula, from the J-property of the transfer matrices. We show that the well-known algebraic relations for orthogonal polynomials are nothing but equalities for determinants of certain matrices. We also find the matrix formula for associated polynomials (cf. [22]). The two solutions $\vec\Psi(z)=\{\vec\Psi(z,n)\}_{n\ge0}$ and $\vec\Phi(z)=\{\vec\Phi(z,n)\}_{n\ge0}$ with
$$\vec\Phi(z,n)\stackrel{\text{def}}{=}\mathcal T_n(z)\begin{pmatrix}1\\1\end{pmatrix}=\begin{pmatrix}\varphi_n(z)\\ \varphi_n^*(z)\end{pmatrix},\qquad \vec\Psi(z,n)\stackrel{\text{def}}{=}\mathcal T_n(z)\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}\psi_n(z)\\ -\psi_n^*(z)\end{pmatrix}$$
are of particular concern. In Sect. 3 we follow H. Weyl in his approach to second-order linear differential equations on the half-line and show that for each $|z|<1$ there is, up to a constant factor, a unique solution of the Szegő equation
$$\vec\Phi_+(z,n)\stackrel{\text{def}}{=}\vec\Psi(z,n)+F(z)\vec\Phi(z,n)=\begin{pmatrix}\varphi_+(z,n)\\ \psi_+(z,n)\end{pmatrix}$$
which belongs to $\ell^2$, that is, $\sum_{n\ge0}\bigl(|\varphi_+(z,n)|^2+|\psi_+(z,n)|^2\bigr)<\infty$. Here $F$ is an analytic function in the unit disk with positive real part. The measure µ, which
Szegő Difference Equations
comes in quite naturally thanks to the Riesz–Herglotz Theorem
$$F(z)=\int_{\mathbb T}\frac{\zeta+z}{\zeta-z}\,d\mu(\zeta),\qquad |z|<1, \tag{4}$$
turns out to be the orthogonality measure for the polynomials $\varphi_n$. When $|z|=1$ the situation is much more complicated, and the notion of subordinacy enters the scene. In Sect. 4 we develop a unit circle analogue of the Gilbert–Pearson theory. A non-trivial solution $\vec U$ of the Szegő equation is called subordinate at the point $z$ if for every other linearly independent solution $\vec V$
$$\lim_{k\to\infty}\frac{\sum_{n=0}^k\|\vec U(z,n)\|^2}{\sum_{n=0}^k\|\vec V(z,n)\|^2}=0.$$
An intimate relation between subordinacy and the boundary behavior of Weyl's function $F$ is exhibited in the following results.
Theorem 2. Let $\sup_n|a_n|<1$ and assume that for some $\zeta\in\mathbb T$ the finite limit $F(\zeta)\stackrel{\text{def}}{=}\lim_{r\to1}F(r\zeta)$ exists and is a pure imaginary number. Then the solution
$$\vec\Phi_+(\zeta)=\{\vec\Phi_+(\zeta,n)\}_{n\ge0}\stackrel{\text{def}}{=}\vec\Psi(\zeta)+F(\zeta)\vec\Phi(\zeta)$$
is subordinate at $\zeta$. If $\lim_{r\to1}|F(r\zeta)|=+\infty$, then the solution $\vec\Phi$ is subordinate at $\zeta$.
Theorem 3. Let $\sup_n|a_n|<1$ and assume that $\vec\Psi+F\vec\Phi$ is the subordinate solution at some $\zeta\in\mathbb T$ with some complex $F$. Then $iF\in\mathbb R$ and there exists a sequence $r_k\to1$ as $k\to\infty$ such that $\lim_{k\to\infty}F(r_k\zeta)=F$. If $\vec\Phi$ is subordinate at $\zeta$, then there exists a sequence $r_k\to1$ as $k\to\infty$ such that $\lim_{k\to\infty}|F(r_k\zeta)|=+\infty$.
On the other hand, specific boundary behavior of $F$ leads to some conclusions about the structure of the measure µ in (4). Given a finite Borel measure ν on $\mathbb T$, a set $A\subset\mathbb T$ is called a carrier¹ of ν if $\nu(G)=0$ for each Borel set $G\subset A^c\stackrel{\text{def}}{=}\mathbb T\setminus A$. $A$ is said to be an essential support of ν if, in addition, $A$ has "no gaps", that is, $\nu(E)>0$ for each Borel set $E\subset A$ with $m(E)>0$. The following well-known result forms the analytic background for our investigation. The first statement is sometimes called Fatou's theorem, whereas the second one is due to de la Vallée Poussin.
Theorem A. Let $\nu=\nu_{ac}+\nu_s=\nu'\,dm+\nu_s$ be the Lebesgue decomposition of ν into the absolutely continuous (a.c.) and singular parts with respect to the normalized Lebesgue measure $m$ on $\mathbb T$, and let $F$ be Weyl's function of ν. Define
$$AC(\nu)\stackrel{\text{def}}{=}\{\zeta\in\mathbb T:\ \text{there exists a finite }\lim_{r\to1}F(r\zeta)=F(\zeta)\text{ and }\operatorname{Re}F(\zeta)>0\},$$
$$S(\nu)\stackrel{\text{def}}{=}\{\zeta\in\mathbb T:\ \lim_{r\to1}\operatorname{Re}F(r\zeta)=+\infty\}.$$
¹ The term is accepted in potential theory.
Then
(i) $AC(\nu)$ is an essential support of the a.c. part $\nu_{ac}$ and $\nu'(\zeta)=\operatorname{Re}F(\zeta)\,I_{AC(\nu)}(\zeta)$, where $I_G$ stands for the indicator of $G$;
(ii) $S(\nu)$ is a carrier of the singular part $\nu_s$.
One of the main points in our study is the observation that each sequence $\{a_n\}$ from $\mathbb D$ is embedded in a natural way into a family $\{\lambda a_n\}$, $|\lambda|=1$, which in turn leads to a family of probability measures $\{\mu_\lambda\}$ on $\mathbb T$. Such measures $\mu_\lambda$ arise in the theory of bounded analytic functions in the unit disk [1, 12, 17], unconditional bases [18] and the theory of composition operators [2, 26] (where they are known as the Aleksandrov measures). In our setting they play exactly the same role as the spectral measures for rank one perturbations of a self-adjoint operator (cf., e.g. [29, 27]). In Sect. 5 we give some formulas for orthogonal polynomials with respect to $\mu_\lambda$ and prove a version of the Aronszajn–Donoghue theorem for the Aleksandrov measures. Note that the unit circle counterpart of the Simon–Wolff theory can be developed with no effort. We also put together some results concerning general families of measures and their lower envelopes. The reason is that we need to analyze the inequality
$$\int_{\mathbb T}\|\mathcal T_n(\zeta)\|^2\,d\sigma\le4,\qquad \sigma=\min(\mu_1,\mu_{-1})$$
(cf. (68)). Finally, in Sect. 6 we present the main results of the paper, which relate the behavior of transfer matrices to the structure of the spectral measure µ. For instance, the following statement is valid (cf. [21, Theorem 1.1]).
Theorem 12. Denote by
$$B\stackrel{\text{def}}{=}\Bigl\{\zeta\in\mathbb T:\ \liminf_{k\to\infty}\frac1{k+1}\sum_{n=0}^k\|\mathcal T_n(\zeta)\|^2<\infty\Bigr\}.$$
Then $B$ is an essential support of the a.c. part of µ and $\mu_s(E)=0$ for each Borel set $E\subset B$.
It is worth emphasizing that our approach is very similar to the one used in the theory of one-dimensional Schrödinger operators (discrete or continuous), where the first-order matrix equation moves in from the wings to center stage (cf. [28, 21]).
2. Szegő Equations and Properties of Transfer Matrices
J-inner matrix functions and symmetry principle. Let $a=\{a_n\}$, $n\in\mathbb N$, be a sequence of complex numbers from $\mathbb D$, that is, $|a_n|<1$. Denote by $\mathcal B$ the set of all such sequences. Each element $a\in\mathcal B$ gives rise to a difference equation (which is the main object under consideration)
$$X_n(z)=T_n(z)X_{n-1}(z),\qquad n\in\mathbb N, \tag{5}$$
for $2\times2$ matrices $X_n(z)$, where
$$T_n(z)=T(z,a_n)=\frac1{\rho_n}\begin{pmatrix}z&a_n\\ \bar a_nz&1\end{pmatrix},\qquad \rho_n^2=1-|a_n|^2. \tag{6}$$
We call $T_n$ in (6) the Szegő matrix and Eq. (5) (as well as its vector version below) the Szegő equation. The solution $\mathcal W_n$ of (5) with the initial condition
$$\mathcal W_0(z)=I_0\stackrel{\text{def}}{=}\begin{pmatrix}1&1\\1&-1\end{pmatrix} \tag{7}$$
is our special concern. For $0\le m<n$ define fundamental or transfer matrices by
$$\mathcal T_{n,m}(z)\stackrel{\text{def}}{=}T_n(z)\cdots T_{m+1}(z),\qquad \mathcal T_n(z)\stackrel{\text{def}}{=}\mathcal T_{n,0}(z)=T_n(z)T_{n-1}(z)\cdots T_1(z),\qquad \mathcal T_0=I, \tag{8}$$
where $I$ is the identity matrix. Now (5) can be written in the form
$$X_n(z)=\mathcal T_n(z)X_0(z),\qquad \mathcal W_n(z)=\mathcal T_n(z)I_0. \tag{9}$$
It follows directly from (6) that $\det T_n(z)=z$. Hence $\det\mathcal T_{n,m}(z)=z^{n-m}$ and $\det X_n(z)=z^n\det X_0(z)$. In particular, $\det\mathcal W_n(z)=z^n\det I_0=-2z^n$.
If we write
$$\mathcal W_n(z)=\begin{pmatrix}\varphi_n(z)&\psi_n(z)\\ \varphi_n^0(z)&-\psi_n^0(z)\end{pmatrix}, \tag{10}$$
it is easily seen by induction that
$$\varphi_n(z)=\kappa_n(\varphi)z^n+\dots+\varphi_n(0),\qquad \psi_n(z)=\kappa_n(\psi)z^n+\dots+\psi_n(0),$$
with
$$\kappa_n(\varphi)=\kappa_n(\psi)=\kappa_n=\prod_{k=1}^n\rho_k^{-1}>0,\qquad \varphi_n(0)=-\psi_n(0)=\kappa_na_n.$$
The degree of the polynomials $\varphi_n^0$, $\psi_n^0$ does not exceed $n$. We proceed with the following definition. Let
$$J=\begin{pmatrix}-1&0\\0&1\end{pmatrix};\qquad J=J^*,\quad J^2=I,$$
where $A^*$ stands for the adjoint of a matrix $A$.
Definition. A $2\times2$ matrix $A$ is said to be J-expansive if $A^*JA-J\ge0$ and J-unitary if $A^*JA-J=0$.
A J-unitary matrix $A$ is clearly invertible and
$$A^{-1}=JA^*J. \tag{11}$$
It is a matter of routine computation to show that (6) implies
$$T_n^*(y)JT_n(x)-J=\begin{pmatrix}1-\bar yx&0\\0&0\end{pmatrix},\qquad T_n^*(z)JT_n(z)-J=\begin{pmatrix}1-|z|^2&0\\0&0\end{pmatrix}, \tag{12}$$
and hence each factor $T_n(z)$ is a first-order matrix polynomial which is J-expansive inside the unit disk $\mathbb D$ and J-unitary on the unit circle. Such analytic matrix functions are usually called J-inner matrix functions (cf. [4, Chapter 1]). Since the product of J-expansive [resp. J-unitary] matrices is again J-expansive [resp. J-unitary], the transfer matrices $\mathcal T_{n,m}$ and, in particular, the matrices $\mathcal T_n$ are J-inner matrix functions. If we write (11) for $\mathcal T_n(\zeta)$ on the unit circle, we get
$$\mathcal T_n^{-1}(\zeta)=J\mathcal T_n^*(\zeta)J=J\mathcal T_n^*(1/\bar\zeta)J,\qquad \zeta\in\mathbb T,$$
where $\mathcal T_n^*(w)$ stands for $[\mathcal T_n(w)]^*$. Taking into account the Uniqueness Theorem for analytic matrix functions, we come to the following symmetry principle for J-inner functions:
$$\mathcal T_n^{-1}(z)=J\mathcal T_n^*(1/\bar z)J,\qquad \mathcal T_n^*(1/\bar z)=J\mathcal T_n^{-1}(z)J,\qquad z\in\mathbb C\setminus\{0\}. \tag{13}$$
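The identities (12), which give the J-inner property, are easy to confirm numerically. In the sketch below (our illustration; the helper names `szego_matrix`, `matmul`, `adjoint`, `j_defect` and the sample values are hypothetical), $2\times2$ matrices are nested Python lists, and we evaluate $T^*(y)JT(x)-J$ for one Szegő matrix at two points of the disk and at a point of the unit circle, where it must vanish (J-unitarity).

```python
import cmath
import math

J = [[-1, 0], [0, 1]]


def szego_matrix(z, a):
    """Szego matrix (6): T(z, a) = (1/rho) [[z, a], [conj(a) z, 1]]."""
    rho = math.sqrt(1.0 - abs(a) ** 2)
    return [[z / rho, a / rho], [a.conjugate() * z / rho, 1 / rho]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def adjoint(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]


def j_defect(y, x, a):
    """Return T^*(y) J T(x) - J; by (12) this is [[1 - conj(y) x, 0], [0, 0]]."""
    M = matmul(matmul(adjoint(szego_matrix(y, a)), J), szego_matrix(x, a))
    return [[M[i][j] - J[i][j] for j in range(2)] for i in range(2)]


a = 0.4 - 0.3j
x, y = 0.2 + 0.5j, -0.6 + 0.1j
print(j_defect(y, x, a))                                # approx [[1 - conj(y) x, 0], [0, 0]]
print(j_defect(cmath.exp(0.7j), cmath.exp(0.7j), a))    # approx the zero matrix on T
```

For $x=y=z\in\mathbb D$ the only nonzero entry is $1-|z|^2>0$, which is the J-expansivity used in the text.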
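By using (9), (10) and the simple identities
$$I_0^*=I_0,\qquad I_0^2=2I,\qquad I_0JI_0=-2J_r,\qquad J_r\stackrel{\text{def}}{=}\begin{pmatrix}0&1\\1&0\end{pmatrix}, \tag{14}$$
we can display (13) for the matrix $\mathcal W_n=\mathcal T_nI_0$ as
$$\mathcal W_n^*(1/\bar z)=I_0^*\mathcal T_n^*(1/\bar z)=I_0J\mathcal T_n^{-1}(z)J=I_0JI_0\mathcal W_n^{-1}(z)J=-2J_r\mathcal W_n^{-1}(z)J,$$
or in the matrix entries
$$\begin{pmatrix}\overline{\varphi_n(1/\bar z)}&\overline{\varphi_n^0(1/\bar z)}\\ \overline{\psi_n(1/\bar z)}&-\overline{\psi_n^0(1/\bar z)}\end{pmatrix}=\frac1{z^n}\begin{pmatrix}\varphi_n^0(z)&\varphi_n(z)\\ \psi_n^0(z)&-\psi_n(z)\end{pmatrix}.$$
By comparing the first columns in the latter relation we find that $\varphi_n^0$ and $\psi_n^0$ are nothing but the reversed $*$-polynomials of $\varphi_n$ and $\psi_n$, respectively:
$$\varphi_n^0(z)=\varphi_n^*(z),\qquad \psi_n^0(z)=\psi_n^*(z).$$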
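We call $\varphi_n$ [resp. $\psi_n$] the polynomials of the first [resp. second] kind. The transfer matrix $\mathcal T_n$ now takes the form
$$\mathcal T_n(z)=\mathcal W_n(z)I_0^{-1}=\frac12\mathcal W_n(z)I_0=\frac12\begin{pmatrix}\varphi_n(z)+\psi_n(z)&\varphi_n(z)-\psi_n(z)\\ \varphi_n^*(z)-\psi_n^*(z)&\varphi_n^*(z)+\psi_n^*(z)\end{pmatrix}. \tag{15}$$
The well-known relation between the first and second kind polynomials drops out immediately upon computing the determinant of $\mathcal W_n$ (see (10)):
$$\det\mathcal W_n(z)=-\bigl(\varphi_n(z)\psi_n^*(z)+\psi_n(z)\varphi_n^*(z)\bigr)=-2z^n,$$
that is,
$$\varphi_n(z)\psi_n^*(z)+\psi_n(z)\varphi_n^*(z)=2z^n. \tag{16}$$
In particular, since $\varphi_n^*(\zeta)=\zeta^n\overline{\varphi_n(\zeta)}$ and $\psi_n^*(\zeta)=\zeta^n\overline{\psi_n(\zeta)}$ on $\mathbb T$,
$$\operatorname{Re}\bigl(\varphi_n(\zeta)\overline{\psi_n(\zeta)}\bigr)=1,\qquad \zeta\in\mathbb T. \tag{17}$$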
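A quick numerical check of (10), (16) and (17): the sketch below (our illustration; the helper names and sample coefficients are hypothetical) multiplies the Szegő matrices onto the initial matrix $I_0$ of (7), reads $\varphi_n,\psi_n,\varphi_n^*,\psi_n^*$ off the entries as in (10), and evaluates both sides of (16) and the real part in (17).

```python
import cmath
import math


def szego_matrix(z, a):
    rho = math.sqrt(1.0 - abs(a) ** 2)
    return [[z / rho, a / rho], [a.conjugate() * z / rho, 1 / rho]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def first_and_second_kind(z, a):
    """Return (phi_n, psi_n, phi_n^*, psi_n^*) at z, n = len(a), from T_n(z) I_0."""
    W = [[1, 1], [1, -1]]                      # I_0 from (7)
    for a_n in a:
        W = matmul(szego_matrix(z, a_n), W)
    # entries as in (10): [[phi_n, psi_n], [phi_n^*, -psi_n^*]]
    return W[0][0], W[0][1], W[1][0], -W[1][1]


a = [0.5, -0.3 + 0.2j, 0.1j, 0.7, -0.4j]
n = len(a)
z = 0.3 - 0.6j
phi, psi, phi_s, psi_s = first_and_second_kind(z, a)
print(phi * psi_s + psi * phi_s, 2 * z ** n)   # the two sides of (16)

zeta = cmath.exp(1.1j)
p, q, _, _ = first_and_second_kind(zeta, a)
print((p * q.conjugate()).real)                # (17): should be close to 1
```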
Let us make another remark, pertaining to the general properties of the transfer matrix. If we replace $a=\{a_n\}$ by $-a=\{-a_n\}$, we come to the equation
$$\widetilde{\mathcal W}_n(z)=\widetilde{\mathcal T}_nI_0,\qquad \widetilde{\mathcal T}_n=T(z,-a_n)T(z,-a_{n-1})\cdots T(z,-a_1). \tag{18}$$
It is clear that $T(z,-a_k)=JT(z,a_k)J$ and hence $\widetilde{\mathcal T}_n=J\mathcal T_nJ$. If we multiply (18) through by $J$ from the left and by $-J_r$ from the right and note that $I_0J_r=-JI_0$, we come to the equalities $-J\widetilde{\mathcal W}_nJ_r=\mathcal T_nI_0=\mathcal W_n$. Thus $\widetilde{\mathcal W}_n=-J\mathcal W_nJ_r$ or, in terms of matrix entries,
$$\begin{pmatrix}\tilde\varphi_n(z)&\tilde\psi_n(z)\\ \tilde\varphi_n^*(z)&-\tilde\psi_n^*(z)\end{pmatrix}=\begin{pmatrix}\psi_n(z)&\varphi_n(z)\\ \psi_n^*(z)&-\varphi_n^*(z)\end{pmatrix},$$
which means that the first and second kind polynomials simply trade roles after the sign of $a_n$ is changed. One of the main points of our investigation is the observation that both $a$ and $-a$ are embedded in a natural way into the family $a(\lambda)=\{\lambda a_n\}$, $\lambda\in\mathbb T$, of elements of $\mathcal B$. We will look into this phenomenon in more detail in Sect. 5.
Associated polynomials and finite shift formula. There are two shift operators acting on $\mathcal B$. The left shift operator is defined by
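$$S_l(a_1,a_2,\dots)\stackrel{\text{def}}{=}(a_2,a_3,\dots).$$
The $\nu$th associated polynomials $\varphi_{n,\nu}$, $\psi_{n,\nu}$ then correspond to the reflection coefficients $S_l^\nu a=(a_{\nu+1},a_{\nu+2},\dots)$. In accordance with our notation the transfer matrix for the shifted parameters is $\mathcal T_n^{(\nu)}=\mathcal T_{n+\nu,\nu}$ and $\mathcal W_n^{(\nu)}=\mathcal T_n^{(\nu)}I_0$. By the "chain identity"
$$\mathcal T_{n+\nu}(z)=\mathcal T_{n+\nu,\nu}(z)\mathcal T_\nu(z)=\mathcal T_n^{(\nu)}(z)\mathcal T_\nu(z),\qquad \mathcal W_{n+\nu}(z)=\mathcal T_n^{(\nu)}(z)\mathcal W_\nu(z)=\mathcal W_n^{(\nu)}(z)I_0^{-1}\mathcal W_\nu(z), \tag{19}$$
or in the matrix form
$$\begin{pmatrix}\varphi_{n+\nu}(z)&\psi_{n+\nu}(z)\\ \varphi_{n+\nu}^*(z)&-\psi_{n+\nu}^*(z)\end{pmatrix}=\frac12\begin{pmatrix}\varphi_{n,\nu}(z)+\psi_{n,\nu}(z)&\varphi_{n,\nu}(z)-\psi_{n,\nu}(z)\\ \varphi_{n,\nu}^*(z)-\psi_{n,\nu}^*(z)&\varphi_{n,\nu}^*(z)+\psi_{n,\nu}^*(z)\end{pmatrix}\begin{pmatrix}\varphi_\nu(z)&\psi_\nu(z)\\ \varphi_\nu^*(z)&-\psi_\nu^*(z)\end{pmatrix} \tag{20}$$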
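(cf. [22, Theorem 3.1 and Corollary 3.1]). The right shift operator is defined by
$$S_r^{\langle\alpha\rangle}(a_1,a_2,\dots)\stackrel{\text{def}}{=}(\alpha_1,\dots,\alpha_N,a_1,a_2,\dots),$$
where $\langle\alpha\rangle=(\alpha_1,\dots,\alpha_N)$ with $|\alpha_k|<1$, $k=1,2,\dots,N$. Let the new polynomials be $\varphi_n^{\langle\alpha\rangle}$ and $\psi_n^{\langle\alpha\rangle}$. It is clear that $\mathcal T_{m+N}^{\langle\alpha\rangle}=\mathcal T_m\mathcal T_N^{\langle\alpha\rangle}$ for $m=1,2,\dots$, so that
$$\begin{pmatrix}\varphi_{m+N}^{\langle\alpha\rangle}(z)&\psi_{m+N}^{\langle\alpha\rangle}(z)\\ \varphi_{m+N}^{\langle\alpha\rangle*}(z)&-\psi_{m+N}^{\langle\alpha\rangle*}(z)\end{pmatrix}=\frac12\begin{pmatrix}\varphi_m(z)+\psi_m(z)&\varphi_m(z)-\psi_m(z)\\ \varphi_m^*(z)-\psi_m^*(z)&\varphi_m^*(z)+\psi_m^*(z)\end{pmatrix}\begin{pmatrix}\varphi_N^{\langle\alpha\rangle}(z)&\psi_N^{\langle\alpha\rangle}(z)\\ \varphi_N^{\langle\alpha\rangle*}(z)&-\psi_N^{\langle\alpha\rangle*}(z)\end{pmatrix}. \tag{21}$$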
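The chain identity (19)–(20) and the role swap under $a\mapsto-a$ noted after (18) can both be checked numerically. The sketch below (our illustration; helper names and sample coefficients are hypothetical) compares $\mathcal T_{n+\nu}I_0$ with $\tfrac12\,\mathcal T_n^{(\nu)}I_0\cdot I_0\cdot\mathcal T_\nu I_0$ and compares the first row of $\mathcal T_nI_0$ for $a$ and $-a$.

```python
import math


def szego_matrix(z, a):
    rho = math.sqrt(1.0 - abs(a) ** 2)
    return [[z / rho, a / rho], [a.conjugate() * z / rho, 1 / rho]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def W(z, a):
    """T_n(z) I_0 for a = [a_1, ..., a_n]; entries [[phi_n, psi_n], [phi_n^*, -psi_n^*]]."""
    M = [[1, 1], [1, -1]]
    for a_n in a:
        M = matmul(szego_matrix(z, a_n), M)
    return M


a = [0.2 + 0.1j, -0.6, 0.3j, 0.45, -0.25 + 0.35j, 0.5]
z, n, nu = -0.7 + 0.2j, 4, 2
I0 = [[1, 1], [1, -1]]

# (20): W_{n+nu} = (1/2) W_n^{(nu)} I_0 W_nu, where W_n^{(nu)} uses a_{nu+1}, ..., a_{nu+n}
left = W(z, a[: n + nu])
right = matmul(matmul(W(z, a[nu: nu + n]), I0), W(z, a[:nu]))
print(max(abs(left[i][j] - right[i][j] / 2) for i in range(2) for j in range(2)))

# role swap: phi_n for -a equals psi_n for a, and vice versa
Wa, Wm = W(z, a), W(z, [-c for c in a])
print(abs(Wm[0][0] - Wa[0][1]), abs(Wm[0][1] - Wa[0][0]))
```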
Assume that two sequences $\{a_n\}$ and $\{b_n\}$ agree from some point on, that is, $a_n=b_n$, $n\ge N+1$. We can obtain the sequence $\{b_n\}$ from $\{a_n\}$ by composing left and right shift operators: $(b_1,b_2,\dots)=S_r^{\langle\alpha\rangle}S_l^N(a_1,a_2,\dots)$ with $\langle\alpha\rangle=(b_1,\dots,b_N)$. The corresponding transfer matrices now satisfy
$$\mathcal T_{m+N}^{(b)}=\mathcal T_{m+N,N}^{(a)}\mathcal T_N^{(b)},$$
which leads to the known relations between $\varphi_n(a_n)$, $\psi_n(a_n)$ and $\varphi_n(b_n)$, $\psi_n(b_n)$.
Christoffel–Darboux formula and zeros. By definition $\mathcal T_{k+1}=T_{k+1}\mathcal T_k$, and hence we can write
$$\mathcal T_{k+1}^*(y)J\mathcal T_{k+1}(x)-J=\mathcal T_k^*(y)\bigl(T_{k+1}^*(y)JT_{k+1}(x)-J\bigr)\mathcal T_k(x)+\mathcal T_k^*(y)J\mathcal T_k(x)-J.$$
Summing up from $k=1$ to $k=n-1$ gives
$$\mathcal T_n^*(y)J\mathcal T_n(x)-J=\sum_{k=1}^{n-1}\mathcal T_k^*(y)\bigl(T_{k+1}^*(y)JT_{k+1}(x)-J\bigr)\mathcal T_k(x)+T_1^*(y)JT_1(x)-J. \tag{22}$$
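Let us multiply the latter through from both sides by $I_0=I_0^*$, keeping in mind (9), (12), $I_0JI_0=-2J_r$ and $\varphi_0=\psi_0=1$:
$$\begin{pmatrix}\overline{\varphi_n(y)}&\overline{\varphi_n^*(y)}\\ \overline{\psi_n(y)}&-\overline{\psi_n^*(y)}\end{pmatrix}\begin{pmatrix}-\varphi_n(x)&-\psi_n(x)\\ \varphi_n^*(x)&-\psi_n^*(x)\end{pmatrix}+2J_r=\sum_{k=0}^{n-1}\begin{pmatrix}\overline{\varphi_k(y)}&\overline{\varphi_k^*(y)}\\ \overline{\psi_k(y)}&-\overline{\psi_k^*(y)}\end{pmatrix}\begin{pmatrix}1-\bar yx&0\\0&0\end{pmatrix}\begin{pmatrix}\varphi_k(x)&\psi_k(x)\\ \varphi_k^*(x)&-\psi_k^*(x)\end{pmatrix}.$$
The latter is equivalent to 4 scalar equalities; in particular, for the (1,1) entries we come to the well-known Christoffel–Darboux formula,
$$\overline{\varphi_n^*(y)}\varphi_n^*(x)-\overline{\varphi_n(y)}\varphi_n(x)=(1-\bar yx)\sum_{k=0}^{n-1}\overline{\varphi_k(y)}\varphi_k(x). \tag{23}$$
Taking, for instance, the (1,2) entries we obtain the mixed Christoffel–Darboux formula, which involves both the first and the second kind polynomials (cf. [10, Formula (8.4)]),
$$2-\overline{\varphi_n^*(y)}\psi_n^*(x)-\overline{\varphi_n(y)}\psi_n(x)=(1-\bar yx)\sum_{k=0}^{n-1}\overline{\varphi_k(y)}\psi_k(x).$$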
Let us put $x=y=z$ in (23):
$$|\varphi_n^*(z)|^2=|\varphi_n(z)|^2+(1-|z|^2)\sum_{k=0}^{n-1}|\varphi_k(z)|^2\ge(1-|z|^2)|\varphi_0(z)|^2=1-|z|^2, \tag{24}$$
whence it follows that $\varphi_n^*$ does not vanish inside $\mathbb D$. Moreover, assume that $\varphi_n^*(e^{i\omega})=0$. Then $\varphi_n^*(z)=(z-e^{i\omega})^mp(z)$ for some $m\ge1$ and $p(e^{i\omega})\ne0$. From (24) with $z=te^{i\omega}$, $0<t<1$, we have $|\varphi_n^*(te^{i\omega})|^2=(1-t)^{2m}|p(te^{i\omega})|^2\ge1-t^2$, which is impossible as $t\to1$. Thus all zeros of $\varphi_n^*$ lie outside the closed unit disk $\overline{\mathbb D}$ (or, equivalently, all zeros of $\varphi_n$ lie inside $\mathbb D$). It is sometimes advisable to deal with the vector analogue of (5),
$$\vec X(z,n)=T_n(z)\vec X(z,n-1)=\mathcal T_n(z)\vec X(z,0),\qquad \vec X(z,n)=\begin{pmatrix}x_1(z,n)\\x_2(z,n)\end{pmatrix}\in\mathbb C^2,\quad n\in\mathbb N. \tag{25}$$
As in (22) above we have
$$\vec X^*(z,n)J\vec X(z,n)=\vec X^*(z,0)J\vec X(z,0)+\sum_{k=0}^{n-1}\vec X^*(z,k)\bigl(T_{k+1}^*(z)JT_{k+1}(z)-J\bigr)\vec X(z,k)$$
or
$$|x_2(z,n)|^2-|x_1(z,n)|^2=|x_2(z,0)|^2-|x_1(z,0)|^2+(1-|z|^2)\sum_{k=0}^{n-1}|x_1(z,k)|^2. \tag{26}$$
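Formula (23) and the bound (24) can be verified numerically with the vector form (25) of the recurrence. The sketch below (our illustration; the helper names and sample coefficients are hypothetical) generates $(\varphi_k(z),\varphi_k^*(z))$ for $k=0,\dots,n$ by iterating (25) from $\vec X(z,0)=(1,1)^T$ and compares both sides of (23).

```python
import math


def phi_pairs(z, a):
    """Return [(phi_k(z), phi_k^*(z)) for k = 0..n], iterating (25) from X(z, 0) = (1, 1)."""
    out = [(1 + 0j, 1 + 0j)]
    for a_n in a:
        p, s = out[-1]
        rho = math.sqrt(1.0 - abs(a_n) ** 2)
        out.append(((z * p + a_n * s) / rho, (a_n.conjugate() * z * p + s) / rho))
    return out


def christoffel_darboux_sides(x, y, a):
    """Left- and right-hand sides of (23) for n = len(a)."""
    px, py = phi_pairs(x, a), phi_pairs(y, a)
    (pnx, snx), (pny, sny) = px[-1], py[-1]
    lhs = sny.conjugate() * snx - pny.conjugate() * pnx
    rhs = (1 - y.conjugate() * x) * sum(py[k][0].conjugate() * px[k][0] for k in range(len(a)))
    return lhs, rhs


a = [0.6, -0.2 + 0.5j, 0.3j, -0.45, 0.1 + 0.1j]
x, y = 0.4 + 0.3j, -0.5 + 0.6j
print(christoffel_darboux_sides(x, y, a))

z = 0.8j   # (24): |phi_n^*(z)|^2 >= 1 - |z|^2
print(abs(phi_pairs(z, a)[-1][1]) ** 2, 1 - abs(z) ** 2)
```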
3. Weyl's Theory and Orthogonality
$\ell^2$-solutions and Weyl's function. We adopt here reasoning similar to Weyl's approach to second-order linear differential equations on the half-line (cf. [31, Sects. 2.1–2.2], [6, Sect. 2] and [7]). Let $\vec\Psi(z)=\{\vec\Psi(z,n)\}_{n\ge0}$ and $\vec\Phi(z)=\{\vec\Phi(z,n)\}_{n\ge0}$ be two linearly independent solutions of (25) with
$$\vec\Phi(z,n)\stackrel{\text{def}}{=}\mathcal T_n(z)\begin{pmatrix}1\\1\end{pmatrix}=\begin{pmatrix}\varphi_n(z)\\ \varphi_n^*(z)\end{pmatrix},\qquad \vec\Psi(z,n)\stackrel{\text{def}}{=}\mathcal T_n(z)\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}\psi_n(z)\\ -\psi_n^*(z)\end{pmatrix}. \tag{27}$$
Theorem 1. There is, up to a constant factor, a unique solution of the Szegő equation (25),
$$\vec\Phi_+(z,n)\stackrel{\text{def}}{=}\vec\Psi(z,n)+F(z)\vec\Phi(z,n), \tag{28}$$
which belongs to $\ell^2$ for $|z|<1$. Here
$$F(z)=\lim_{n\to\infty}\frac{\psi_n^*(z)}{\varphi_n^*(z)} \tag{29}$$
uniformly on compact subsets of the unit disk, and $F$ is an analytic function with positive real part.
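To see (29) at work, the sketch below (our illustration; the helper name `weyl_ratios` and the sample coefficients are hypothetical) computes the ratios $F_n(z)=\psi_n^*(z)/\varphi_n^*(z)$ by running (25) from the initial vectors $(1,1)^T$ and $(1,-1)^T$ of (27). For $a_n\equiv0$ one has $F\equiv1$; for other sequences the ratios settle down quickly when $|z|<1$, and their real parts are positive.

```python
import math


def weyl_ratios(z, a):
    """Return [F_0(z), ..., F_N(z)] with F_n = psi_n^*(z) / phi_n^*(z), cf. (29)."""
    phi = [1 + 0j, 1 + 0j]        # Phi(z, 0) = (1, 1)
    psi = [1 + 0j, -1 + 0j]       # Psi(z, 0) = (1, -1) = (psi_0, -psi_0^*)
    ratios = [-psi[1] / phi[1]]
    for a_n in a:
        rho = math.sqrt(1.0 - abs(a_n) ** 2)
        for v in (phi, psi):      # both vectors obey the same recurrence (25)
            x1, x2 = v
            v[0] = (z * x1 + a_n * x2) / rho
            v[1] = (a_n.conjugate() * z * x1 + x2) / rho
        ratios.append(-psi[1] / phi[1])
    return ratios


a = [0.5 * math.cos(n) + 0.3j * math.sin(2 * n) for n in range(1, 61)]
z = 0.5 + 0.2j
F = weyl_ratios(z, a)
print(F[10], F[30], F[60])                # successive approximants of F(z)
print(weyl_ratios(z, [0.0] * 20)[-1])     # a_n = 0 gives F = 1
```

The convergence rate is governed by (36) in the proof below: $|F-F_n|$ is at most of order $|z|^n/(1-|z|^2)$.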
Proof. Fix $z\in\mathbb D$, $z\ne0$, and consider the Möbius transformation
$$M_z(w)\stackrel{\text{def}}{=}\frac{\psi_n^*(z)w+\psi_n(z)}{\varphi_n^*(z)w-\varphi_n(z)},\qquad w\in\mathbb C. \tag{30}$$
It takes the unit circle onto some circle $Q_n(z)$.² Denote by $C_n(z)$ the center of this circle. It is clear that $M_z(w_\infty)=\infty$ for $w_\infty=\varphi_n(z)/\varphi_n^*(z)$. By the Symmetry Principle $C_n(z)=M_z(w_c)$ with $w_c=(\overline{w_\infty})^{-1}$, that is,
$$C_n(z)=\frac{\psi_n(z)\overline{\varphi_n(z)}+\psi_n^*(z)\overline{\varphi_n^*(z)}}{|\varphi_n^*(z)|^2-|\varphi_n(z)|^2}. \tag{31}$$
Since by (24) $|w_c|>1$, the transformation $M_z$ maps the exterior of the unit disk onto the interior $Q_n^0$ of the circle $Q_n$:
$$M_z(w)\in Q_n^0(z)\iff|w|>1. \tag{32}$$
Put $l=M_z(w)$,
$$w=\frac{\varphi_n(z)l+\psi_n(z)}{\varphi_n^*(z)l-\psi_n^*(z)},$$
so that (32) can be paraphrased as
$$l\in Q_n^0(z)\iff|-\psi_n^*(z)+l\varphi_n^*(z)|<|\psi_n(z)+l\varphi_n(z)|. \tag{33}$$
Going back to (25), let us single out its solution
$$\vec X(z,n)=\vec\Psi(z,n)+l\vec\Phi(z,n),\qquad l\in\mathbb C,$$
so that $x_1(z,n)=\psi_n(z)+l\varphi_n(z)$, $x_2(z,n)=-\psi_n^*(z)+l\varphi_n^*(z)$. By (26), relation (33) is equivalent to
$$l\in Q_n^0(z)\iff(1-|z|^2)\sum_{k=0}^{n-1}|\psi_k(z)+l\varphi_k(z)|^2<|x_1(z,0)|^2-|x_2(z,0)|^2=4\operatorname{Re}l.$$
Finally,
$$l\in Q_n^0(z)\iff\sum_{k=0}^{n-1}|\psi_k(z)+l\varphi_k(z)|^2<\frac{4\operatorname{Re}l}{1-|z|^2}. \tag{34}$$
As a simple consequence of (34) we get $Q_{n+1}^0\subset Q_n^0$, that is, the disks $Q_n^0$ are nested.
² $Q_n$ is a proper circle, since $\varphi_n^*(z)\ne0$.
We can evaluate the radius $r_n$ of the disk $Q_n^0$:
$$r_n(z)=|C_n(z)-M_z(1)|=\left|\frac{\psi_n(z)\overline{\varphi_n(z)}+\psi_n^*(z)\overline{\varphi_n^*(z)}}{|\varphi_n^*(z)|^2-|\varphi_n(z)|^2}-\frac{\psi_n^*(z)+\psi_n(z)}{\varphi_n^*(z)-\varphi_n(z)}\right|=\left|\frac{\overline{\varphi_n(z)-\varphi_n^*(z)}}{\varphi_n^*(z)-\varphi_n(z)}\right|\frac{|\psi_n(z)\varphi_n^*(z)+\psi_n^*(z)\varphi_n(z)|}{|\varphi_n^*(z)|^2-|\varphi_n(z)|^2}=\frac{2|z|^n}{(1-|z|^2)\sum_{k=0}^{n-1}|\varphi_k(z)|^2},$$
where the last step uses (16) and (24); i.e., $r_n$ decays exponentially fast to zero as $n$ goes to infinity. Thus the disks $Q_n^0$ shrink to a single point $F=F(z)$. For $l=F$, (34) holds for all $n\in\mathbb N$. Therefore
$$\sum_{k=0}^\infty|\psi_k^*(z)-F(z)\varphi_k^*(z)|^2<\sum_{k=0}^\infty|\psi_k(z)+F(z)\varphi_k(z)|^2\le\frac{4\operatorname{Re}F(z)}{1-|z|^2}. \tag{35}$$
As $M_z(\infty)=\psi_n^*(z)/\varphi_n^*(z)\in Q_n^0$ for all $n$, we have
$$\left|F(z)-\frac{\psi_n^*(z)}{\varphi_n^*(z)}\right|<2r_n(z)=\frac{4|z|^n}{(1-|z|^2)\sum_{k=0}^{n-1}|\varphi_k(z)|^2}. \tag{36}$$
It is clear now that $F$ is an analytic function in the unit disk (after being extended to the origin by $F(0)=1$) with positive real part. Such functions are known as C-functions.³ We come thereby to the specific solution $\vec\Phi_+(z,n)=\vec\Psi(z,n)+F(z)\vec\Phi(z,n)$ of Eq. (25), which belongs to $\ell^2$ for $|z|<1$, $z\ne0$. The latter is obviously true for $z=0$ as well, since $\vec\Phi_+(0,n)=0$ for $n\ge1$.
It is easy to see that, up to a constant factor, $\vec\Phi_+(z,n)$ is the unique $\ell^2$-solution of (25). Indeed, suppose that there is another solution $\vec Y(z,n)$ with the same property. Then, taking appropriate linear combinations of the two, we would have $\vec\Phi(z_0,n)\in\ell^2$. The latter is false since $|\varphi_n^*(z_0)|^2\ge1-|z_0|^2>0$ by (24). It should be noted that equality actually holds in the second inequality in (35):
$$\sum_{n=0}^\infty|\psi_n(z)+F(z)\varphi_n(z)|^2=\frac{4\operatorname{Re}F(z)}{1-|z|^2}. \tag{37}$$
Indeed, the components of the $\ell^2$-solution $\vec\Phi_+(z,n)$ tend to zero as $n\to\infty$. The result now follows immediately from (26) with $\vec X(z,n)=\vec\Phi_+(z,n)$.
³ In the context of differential equations the corresponding function is called Weyl's function.
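The center (31) and the radius formula derived in the proof can be confirmed numerically: the images $M_z(e^{i\theta})$ of points of the unit circle under (30) should all lie at distance $r_n$ from $C_n$. The sketch below (our illustration; helper names and sample data are hypothetical) also checks the finite-$n$ form of (37) that follows from (26) with $l=F_n=\psi_n^*/\varphi_n^*$ (for which $x_2(z,n)=0$), namely $(1-|z|^2)\sum_{k<n}|\psi_k+F_n\varphi_k|^2+|\psi_n+F_n\varphi_n|^2=4\operatorname{Re}F_n$.

```python
import cmath
import math


def solutions(z, a):
    """Lists [phi_k], [phi_k^*], [psi_k], [psi_k^*] for k = 0..n, via (25) and (27)."""
    ph, phs, ps, pss = [1 + 0j], [1 + 0j], [1 + 0j], [1 + 0j]
    for a_n in a:
        rho = math.sqrt(1.0 - abs(a_n) ** 2)
        c = a_n.conjugate()
        p, s, q, t = ph[-1], phs[-1], ps[-1], pss[-1]
        ph.append((z * p + a_n * s) / rho)
        phs.append((c * z * p + s) / rho)
        # Psi(z, k) = (psi_k, -psi_k^*) obeys the same recurrence
        ps.append((z * q - a_n * t) / rho)
        pss.append(-(c * z * q - t) / rho)
    return ph, phs, ps, pss


def weyl_disk(z, a):
    """Center (31), radius 2|z|^n / ((1 - |z|^2) sum_{k<n} |phi_k|^2), and the map (30)."""
    n = len(a)
    ph, phs, ps, pss = solutions(z, a)
    p, s, q, t = ph[n], phs[n], ps[n], pss[n]
    center = (q * p.conjugate() + t * s.conjugate()) / (abs(s) ** 2 - abs(p) ** 2)
    radius = 2 * abs(z) ** n / ((1 - abs(z) ** 2) * sum(abs(v) ** 2 for v in ph[:n]))

    def mobius(w):
        return (t * w + q) / (s * w - p)

    return center, radius, mobius


a = [0.4, -0.3j, 0.2 + 0.5j, -0.6, 0.25]
z = -0.3 + 0.55j
center, radius, M = weyl_disk(z, a)
for theta in (0.0, 1.0, 2.5, 4.0):
    print(abs(M(cmath.exp(1j * theta)) - center), radius)   # equal up to rounding

n = len(a)
ph, phs, ps, pss = solutions(z, a)
F_n = pss[n] / phs[n]
lhs = (1 - abs(z) ** 2) * sum(abs(ps[k] + F_n * ph[k]) ** 2 for k in range(n)) + abs(ps[n] + F_n * ph[n]) ** 2
print(lhs, 4 * F_n.real)                                      # finite-n form of (37)
```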
Orthogonality relations. Let σ be a finite Borel measure on $\mathbb T$ with moments
$$c_k\stackrel{\text{def}}{=}\int_{\mathbb T}\zeta^{-k}\,d\sigma,\qquad k\in\mathbb Z.$$
It is well known (cf. [15, Chapter 1.11]) that the Toeplitz determinants $\det\|c_{k-j}\|_{k,j=0}^m$ are nonnegative for all $m$, and $\det\|c_{k-j}\|_{k,j=0}^m>0$, $m\in\mathbb N$, iff the measure σ has infinite support. According to the Riesz–Herglotz Theorem the function $F$ in Theorem 1 admits the representation
$$F(z)=\int_{\mathbb T}\frac{\zeta+z}{\zeta-z}\,d\mu(\zeta),\qquad |z|<1, \tag{38}$$
where µ is a uniquely determined probability measure on $\mathbb T$ (recall that $F(0)=1$). The power series expansion for $F$ is of the form
$$F(z)=1+2\sum_{k=1}^\infty c_kz^k,\qquad c_k=\int_{\mathbb T}\zeta^{-k}\,d\mu.$$
Similarly, $F_n(z)\stackrel{\text{def}}{=}\psi_n^*(z)/\varphi_n^*(z)$ is a rational function analytic in the unit disk and, by (16),
$$\operatorname{Re}F_n(\zeta)=\frac1{|\varphi_n(\zeta)|^2}>0,\qquad |\zeta|=1,$$
so that $F_n$ is a C-function and
$$F_n(z)=\int_{\mathbb T}\frac{\zeta+z}{\zeta-z}\,d\mu_n=1+2\sum_{k=1}^\infty c_{n,k}z^k,\qquad c_{n,k}=\int_{\mathbb T}\zeta^{-k}\,d\mu_n,$$
where $d\mu_n=|\varphi_n(\zeta)|^{-2}\,dm$ and $dm$ is the normalized Lebesgue measure on $\mathbb T$. It is clear that $\det\|c_{n,k-j}\|_{k,j=0}^s>0$ for all $s\in\mathbb N$. In view of Cauchy's estimate, (36) yields $c_{n,k}=c_k$ for $k=0,1,\dots,n-1$. Hence for $m<n$,
$$c_{m,k}=c_{n,k}=c_k,\qquad k=0,1,\dots,m-1,$$
or, in other words,
$$\int_{\mathbb T}\zeta^{-k}|\varphi_n(\zeta)|^{-2}\,dm=\int_{\mathbb T}\zeta^{-k}|\varphi_m(\zeta)|^{-2}\,dm=\int_{\mathbb T}\zeta^{-k}\,d\mu,\qquad |k|\le m-1. \tag{39}$$
In particular, $\det\|c_{k-j}\|_{k,j=0}^s>0$ for all $s$, that is, µ has infinite support. Our goal here is to show that µ comes in as the orthogonality measure for the system $\varphi_n$,
$$\int_{\mathbb T}\varphi_p(\zeta)\overline{\varphi_q(\zeta)}\,d\mu=\delta_{p,q},\qquad p,q=0,1,\dots, \tag{40}$$
which by (39) is equivalent to
$$\int_{\mathbb T}\varphi_p(\zeta)\overline{\varphi_q(\zeta)}\,\frac{dm}{|\varphi_{q+1}(\zeta)|^2}=\delta_{p,q},\qquad p\le q. \tag{41}$$
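Relation (41), combined with (39), implies that $\varphi_0,\dots,\varphi_{N-1}$ are orthonormal with respect to $|\varphi_N(\zeta)|^{-2}dm$. The sketch below (our illustration; sample coefficients and helper names are hypothetical) approximates these integrals by the equally spaced (trapezoidal) rule on $\mathbb T$, which converges very fast here because $|\varphi_N|^{-2}$ extends analytically to an annulus around $\mathbb T$ (the zeros of $\varphi_N$ lie in $\mathbb D$).

```python
import cmath
import math


def phi_values(z, a):
    """[phi_0(z), ..., phi_N(z)] from recurrence (2), N = len(a)."""
    p, s = 1 + 0j, 1 + 0j
    vals = [p]
    for a_n in a:
        rho = math.sqrt(1.0 - abs(a_n) ** 2)
        p, s = (z * p + a_n * s) / rho, (a_n.conjugate() * z * p + s) / rho
        vals.append(p)
    return vals


def gram_matrix(a, M=1024):
    """G[p][q] ~ integral of phi_p conj(phi_q) |phi_N|^{-2} dm for p, q < N, by an M-point rule."""
    N = len(a)
    G = [[0j] * N for _ in range(N)]
    for j in range(M):
        vals = phi_values(cmath.exp(2j * math.pi * j / M), a)
        w = 1.0 / (M * abs(vals[N]) ** 2)    # equal weights times the density |phi_N|^{-2}
        for p in range(N):
            for q in range(N):
                G[p][q] += vals[p] * vals[q].conjugate() * w
    return G


a = [0.5, -0.2 + 0.4j, 0.3j, -0.35, 0.15 + 0.2j]
G = gram_matrix(a)
# deviation from the identity matrix; should be at rounding-error level
print(max(abs(G[p][q] - (1.0 if p == q else 0.0)) for p in range(len(a)) for q in range(len(a))))
```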
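In computing the left-hand side in (41) we proceed in two steps.
1. Let $P$ be a polynomial of degree $p\le q$. It follows from (39) that
$$\int_{\mathbb T}P(\zeta)\overline{\varphi_q(\zeta)}\,\frac{dm}{|\varphi_q(\zeta)|^2}=\overline{\frac1{2\pi i}\int_{\mathbb T}\frac{P^*(\zeta)\,\zeta^{q-p-1}}{\varphi_q^*(\zeta)}\,d\zeta}=\begin{cases}0,&p<q,\\ \overline{P^*(0)}/\varphi_q^*(0),&p=q.\end{cases}$$
In particular, for $P=\varphi_p$,
$$\int_{\mathbb T}\varphi_p(\zeta)\overline{\varphi_q(\zeta)}\,\frac{dm}{|\varphi_q(\zeta)|^2}=\delta_{p,q},\qquad p\le q.$$
Similarly
$$\int_{\mathbb T}P(\zeta)\overline{\varphi_q^*(\zeta)}\,\frac{dm}{|\varphi_q(\zeta)|^2}=\frac{P(0)}{\varphi_q^*(0)},\qquad p\le q.$$
2. Write the Szegő equation for $\mathcal W_n$ (9) in the form
$$\mathcal W_q(z)=T_{q+1}^{-1}(z)\mathcal W_{q+1}(z),\qquad T_{q+1}^{-1}(z)=\frac1{\rho_{q+1}z}\begin{pmatrix}1&-a_{q+1}\\-\bar a_{q+1}z&z\end{pmatrix},\qquad z\ne0.$$
In particular
$$\varphi_q(z)=\frac{\varphi_{q+1}(z)-a_{q+1}\varphi_{q+1}^*(z)}{\rho_{q+1}z}.$$
Hence
$$\int_{\mathbb T}\varphi_p(\zeta)\overline{\varphi_q(\zeta)}\,\frac{dm}{|\varphi_{q+1}(\zeta)|^2}=\frac1{\rho_{q+1}}\int_{\mathbb T}P(\zeta)\,\overline{\bigl(\varphi_{q+1}(\zeta)-a_{q+1}\varphi_{q+1}^*(\zeta)\bigr)}\,\frac{dm}{|\varphi_{q+1}(\zeta)|^2}$$
with $P(z)=z\varphi_p(z)$. By step 1 (applied with $q+1$ in place of $q$, and since $P(0)=0$), the latter value is zero for $p=0,1,\dots,q-1$. When $p=q$ we have
$$\int_{\mathbb T}P(\zeta)\overline{\varphi_{q+1}(\zeta)}\,\frac{dm}{|\varphi_{q+1}(\zeta)|^2}=\frac{P^*(0)}{\varphi_{q+1}^*(0)}=\frac{\varphi_q^*(0)}{\varphi_{q+1}^*(0)}=\frac{\kappa_q}{\kappa_{q+1}}=\rho_{q+1},\qquad \int_{\mathbb T}P(\zeta)\overline{\varphi_{q+1}^*(\zeta)}\,\frac{dm}{|\varphi_{q+1}(\zeta)|^2}=\frac{P(0)}{\varphi_{q+1}^*(0)}=0,$$
which proves (40) completely.
Associated C-function. The function $F^{(\nu)}$, which corresponds to the shifted sequence $\{a_{\nu+k}\}$ via Theorem 1, is called here the associated C-function. To find the formula for $F^{(\nu)}$ we make use of the standard notation for the "right" linear fractional transformation. Given a matrix
$$A=\begin{pmatrix}a&b\\c&d\end{pmatrix}$$
we denote
$$A\{\omega\}\stackrel{\text{def}}{=}\frac{a\omega+b}{c\omega+d},\qquad \omega\in\mathbb C.$$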
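In this notation
$$F(z)=\lim_{n\to\infty}F_n(z)=\lim_{n\to\infty}\mathcal W_n^{-1}(z)\{\infty\},\qquad \mathcal W_n^{-1}(z)=\frac1{2z^n}\begin{pmatrix}\psi_n^*(z)&\psi_n(z)\\ \varphi_n^*(z)&-\varphi_n(z)\end{pmatrix},$$
and, accordingly, $F^{(\nu)}=\lim_{n\to\infty}F_n^{(\nu)}=\lim_{n\to\infty}\bigl(\mathcal W_n^{(\nu)}\bigr)^{-1}\{\infty\}$. We know from (19) that $\mathcal W_n^{(\nu)}=\mathcal W_{n+\nu}\mathcal W_\nu^{-1}I_0$. Hence
$$F^{(\nu)}(z)=\lim_{n\to\infty}I_0^{-1}\mathcal W_\nu(z)\mathcal W_{n+\nu}^{-1}(z)\{\infty\}=I_0^{-1}\mathcal W_\nu(z)\{F(z)\}=I_0^{-1}\mathcal T_\nu(z)I_0\{F(z)\}.$$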
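The formula $F^{(\nu)}=I_0^{-1}\mathcal T_\nu I_0\{F\}$ can be tested by computing $F^{(\nu)}$ directly from the shifted coefficients. In the sketch below (our illustration; helper names and coefficients are hypothetical), $F$ is replaced by a long-run approximant $F_n$ from (29); since $I_0^{-1}=I_0/2$ and a scalar factor does not change the linear fractional action $A\{\omega\}$, the matrix $I_0\mathcal T_\nu I_0$ is used.

```python
import math


def szego_matrix(z, a):
    rho = math.sqrt(1.0 - abs(a) ** 2)
    return [[z / rho, a / rho], [a.conjugate() * z / rho, 1 / rho]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def act(A, w):
    """Right linear fractional action A{w} = (a w + b) / (c w + d)."""
    return (A[0][0] * w + A[0][1]) / (A[1][0] * w + A[1][1])


def transfer(z, a):
    """Product of Szego matrices T(z, a_m) ... T(z, a_1), last coefficient leftmost."""
    M = [[1, 0], [0, 1]]
    for a_n in a:
        M = matmul(szego_matrix(z, a_n), M)
    return M


I0 = [[1, 1], [1, -1]]


def weyl_approx(z, a):
    """F_n(z) = psi_n^*(z) / phi_n^*(z), n = len(a), read off T_n(z) I_0 as in (10)."""
    W = matmul(transfer(z, a), I0)
    return -W[1][1] / W[1][0]          # second row of (10) is (phi_n^*, -psi_n^*)


a = [0.4 * math.sin(n) + 0.2j * math.cos(3 * n) for n in range(1, 81)]
z, nu = 0.35 - 0.3j, 3
direct = weyl_approx(z, a[nu:])                                  # from the shifted sequence
via_formula = act(matmul(matmul(I0, transfer(z, a[:nu])), I0), weyl_approx(z, a))
print(direct, via_formula)
```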
4. Gilbert–Pearson Theory for Szegő Equation
The Gilbert–Pearson theory [11] is known to be a key ingredient in the modern approach to Schrödinger operators (discrete and continuous) on the half-line. Following the line of reasoning from [16], we develop here a version of GP theory for the Szegő equation (25). Let
$$\vec X(z)=\bigl(\vec X(z,0),\vec X(z,1),\dots,\vec X(z,n),\dots\bigr),\qquad \vec X(z,n)=\begin{pmatrix}x_1(z,n)\\x_2(z,n)\end{pmatrix}$$
be a solution of (25). Define its k-norm by
$$\|\vec X(z)\|_k^2\stackrel{\text{def}}{=}\sum_{n=0}^k\|\vec X(z,n)\|^2=\sum_{n=0}^k\bigl(|x_1(z,n)|^2+|x_2(z,n)|^2\bigr)<\infty.$$
Definition. A non-trivial solution $\vec U$ of (25) is said to be subordinate at the point $z$ if for every other linearly independent solution $\vec V$,
$$\lim_{k\to\infty}\frac{\|\vec U(z)\|_k}{\|\vec V(z)\|_k}=0. \tag{42}$$
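For $z\in\mathbb D$ the $\ell^2$-solution $\vec\Phi_+$ of Theorem 1 is subordinate, and the k-norms make this visible. The sketch below (our illustration; helper names are hypothetical, and the sample coefficients are chosen summable, an assumption that keeps $\vec\Phi(z,n)$ bounded and avoids cancellation errors) runs (25) from $f(z)=(F+1,F-1)^T$ with $F$ replaced by a long-run approximant of (29), and prints $\|\vec\Phi_+(z)\|_k/\|\vec\Phi(z)\|_k$ for growing $k$.

```python
import math


def run(z, a, x):
    """Iterate (25) from the initial vector x; return the vectors X(z, 0), ..., X(z, len(a))."""
    out = [tuple(x)]
    for a_n in a:
        rho = math.sqrt(1.0 - abs(a_n) ** 2)
        x1, x2 = out[-1]
        out.append(((z * x1 + a_n * x2) / rho, (a_n.conjugate() * z * x1 + x2) / rho))
    return out


def k_norm(sol, k):
    """||X(z)||_k = (sum over n <= k of |x1|^2 + |x2|^2)^(1/2)."""
    return math.sqrt(sum(abs(x1) ** 2 + abs(x2) ** 2 for x1, x2 in sol[: k + 1]))


a = [0.5 * 0.7 ** n for n in range(1, 601)]        # summable reflection coefficients
z = 0.5 - 0.3j
phi_star = run(z, a, (1, 1))[-1][1]                 # phi_600^*(z)
psi_star_neg = run(z, a, (1, -1))[-1][1]            # -psi_600^*(z)
F = -psi_star_neg / phi_star                        # F_600(z), approximates F(z) by (29)
Phi = run(z, a[:400], (1, 1))
Phi_plus = run(z, a[:400], (F + 1, F - 1))          # Psi + F Phi
for k in (20, 100, 400):
    print(k, k_norm(Phi_plus, k) / k_norm(Phi, k))  # decreases toward 0
```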
The following properties of subordinate solutions can be easily checked.
1. For each $z$ there is, up to a constant factor, at most one subordinate solution of (25).
2. Let (42) hold for just one solution $\vec V$. Any other solution $\vec W$ which is not a constant multiple of $\vec U$ has the form $\vec W=b\vec U+a\vec V$, $a\ne0$. By (42), $|b|\,\|\vec U\|_k<\frac12|a|\,\|\vec V\|_k$ for large enough $k$, so that
$$\|\vec W(z)\|_k\ge|a|\,\|\vec V(z)\|_k-|b|\,\|\vec U(z)\|_k>\frac{|a|}2\|\vec V(z)\|_k,$$
and (42) is true with $\|\vec V\|_k$ replaced by $\|\vec W\|_k$. Hence $\vec U$ is the subordinate solution.
By Theorem 1, for each $z\in\mathbb D$ there exists the $\ell^2$-solution of (25),
$$\vec\Phi_+(z)=\{\vec\Phi_+(z,n)\}_{n\ge0}=\vec\Psi(z)+F(z)\vec\Phi(z)$$
(see (27) and (28)), which is a fortiori subordinate. The situation when $z=\zeta\in\mathbb T$ is much more intricate. We aim to prove two results which establish a link between subordinacy and the boundary behavior of the C-function $F$ in (38) (cf. [16, Theorems 1 and 2]). Throughout the rest of the paper our basic assumption on the Szegő equation (on reflection coefficients) is $\sup_n|a_n|<1$ or, equivalently,
$$\gamma^2\stackrel{\text{def}}{=}\sup_{n\ge1}\frac{1+|a_n|^2}{1-|a_n|^2}<\infty. \tag{43}$$
Theorem 2. Under condition (43) assume that for some $\zeta\in\mathbb T$ the finite limit $F(\zeta)\stackrel{\text{def}}{=}\lim_{r\to1}F(r\zeta)$ exists and is a pure imaginary number. Then the solution
$$\vec\Phi_+(\zeta)=\{\vec\Phi_+(\zeta,n)\}_{n\ge0}\stackrel{\text{def}}{=}\vec\Psi(\zeta)+F(\zeta)\vec\Phi(\zeta)$$
is subordinate at $\zeta$. If $\lim_{r\to1}|F(r\zeta)|=+\infty$, then the solution $\vec\Phi$ is subordinate at $\zeta$.
Theorem 3. Under condition (43) assume that $\vec\Psi+F\vec\Phi$ is a subordinate solution at some $\zeta\in\mathbb T$ with some complex $F$. Then $iF\in\mathbb R$ and there exists a sequence $r_k\to1$ as $k\to\infty$ such that $\lim_{k\to\infty}F(r_k\zeta)=F$. If $\vec\Phi$ is subordinate at $\zeta$, then there exists a sequence $r_k\to1$ as $k\to\infty$ such that $\lim_{k\to\infty}|F(r_k\zeta)|=+\infty$.
The main idea lies in comparing two solutions
$$\vec\Phi_+(z)=\{\mathcal T_n(z)f(z)\}_{n\ge0}=\vec\Psi(z)+F(z)\vec\Phi(z),\qquad f(z)\stackrel{\text{def}}{=}\begin{pmatrix}F(z)+1\\F(z)-1\end{pmatrix},$$
and
$$\vec Y(\zeta)\stackrel{\text{def}}{=}\{\vec Y(\zeta,n)\}_{n\ge0}=\{\mathcal T_n(\zeta)f(z)\}_{n\ge0}=\vec\Psi(\zeta)+F(z)\vec\Phi(\zeta),$$
where $z=r\zeta$ is specified in an appropriate way.
Lemma 4. For an arbitrary pure imaginary $F$, $0<r<1$ and $k\in\mathbb N$ the inequality
$$\|\vec Y(\zeta)-\vec\Phi_+(r\zeta)\|_k\le2\gamma(1-r)\,\|\vec\Psi(\zeta)+F\vec\Phi(\zeta)\|_k\,\|\vec\Phi(\zeta)\|_k\,\|\vec\Phi_+(r\zeta)\|_k \tag{44}$$
holds.
Proof. We have
$$\vec Y(\zeta,n)-\vec\Phi_+(z,n)=\bigl(\mathcal T_n(\zeta)-\mathcal T_n(z)\bigr)f(z).$$
We can write the difference on the right as
$$\mathcal T_n(\zeta)-\mathcal T_n(r\zeta)=\prod_{j=1}^nT_j(\zeta)-\prod_{j=1}^nT_j(r\zeta)=\sum_{p=1}^nT_n(\zeta)\cdots T_{p+1}(\zeta)\bigl(T_p(\zeta)-T_p(r\zeta)\bigr)T_{p-1}(r\zeta)\cdots T_1(r\zeta)=\sum_{p=0}^{n-1}\mathcal T_{n,p+1}(\zeta)\bigl(T_{p+1}(\zeta)-T_{p+1}(r\zeta)\bigr)\mathcal T_p(r\zeta).$$
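Since
$$T_q(\zeta)-T_q(r\zeta)=\zeta(1-r)Q_q,\qquad Q_q\stackrel{\text{def}}{=}\frac1{\rho_q}\begin{pmatrix}1&0\\ \bar a_q&0\end{pmatrix},$$
we see that
$$\vec Y(\zeta,n)=\vec\Phi_+(z,n)+\zeta(1-r)\sum_{p=0}^{n-1}\mathcal T_{n,p+1}(\zeta)Q_{p+1}\vec\Phi_+(z,p).$$
Put
$$I_0=\begin{pmatrix}1&1\\1&-1\end{pmatrix},\qquad \mathcal F\stackrel{\text{def}}{=}\begin{pmatrix}F&1\\1&0\end{pmatrix},\qquad \mathcal F^{-1}=\begin{pmatrix}0&1\\1&-F\end{pmatrix},$$
and write $\mathcal T_{n,p+1}=\mathcal T_n\mathcal T_{p+1}^{-1}=(\mathcal T_nI_0\mathcal F)(\mathcal T_{p+1}I_0\mathcal F)^{-1}$. In terms of the matrix entries (15),
$$\mathcal T_nI_0\mathcal F=\begin{pmatrix}\psi_n+F\varphi_n&\varphi_n\\-\psi_n^*+F\varphi_n^*&\varphi_n^*\end{pmatrix},\qquad (\mathcal T_lI_0\mathcal F)^{-1}=\frac1{2\zeta^l}\begin{pmatrix}\varphi_l^*&-\varphi_l\\ \psi_l^*-F\varphi_l^*&\psi_l+F\varphi_l\end{pmatrix}.$$
Therefore $2\zeta^{p+1}\mathcal T_{n,p+1}(\zeta)$ is equal to
$$\begin{pmatrix}(\psi_n+F\varphi_n)\varphi_{p+1}^*+(\psi_{p+1}^*-F\varphi_{p+1}^*)\varphi_n & -(\psi_n+F\varphi_n)\varphi_{p+1}+(\psi_{p+1}+F\varphi_{p+1})\varphi_n\\ (-\psi_n^*+F\varphi_n^*)\varphi_{p+1}^*+(\psi_{p+1}^*-F\varphi_{p+1}^*)\varphi_n^* & -(-\psi_n^*+F\varphi_n^*)\varphi_{p+1}+(\psi_{p+1}+F\varphi_{p+1})\varphi_n^*\end{pmatrix}.$$
But
$$\psi_m^*(\zeta)-F\varphi_m^*(\zeta)=\zeta^m\,\overline{\psi_m(\zeta)+F\varphi_m(\zeta)}$$
(here we use $iF\in\mathbb R$). Hence $4\|\mathcal T_{n,p+1}(\zeta)\|^2\le|c_{11}|^2+|c_{12}|^2+|c_{21}|^2+|c_{22}|^2$, where $c_{kj}$ are the entries of the latter matrix and each of the four values on the right admits the same bound,
$$|c_{kj}|^2\le2\bigl(|\varphi_{p+1}(\zeta)|^2|\psi_n(\zeta)+F\varphi_n(\zeta)|^2+|\varphi_n(\zeta)|^2|\psi_{p+1}(\zeta)+F\varphi_{p+1}(\zeta)|^2\bigr).$$
Finally, for $n\le k$,
$$\sum_{p=0}^{n-1}\|\mathcal T_{n,p+1}(\zeta)\|^2\le2|\psi_n(\zeta)+F\varphi_n(\zeta)|^2\sum_{p=0}^k|\varphi_p(\zeta)|^2+2|\varphi_n(\zeta)|^2\sum_{p=0}^k|\psi_p(\zeta)+F\varphi_p(\zeta)|^2.$$
As $\|Q_q\|^2\le\gamma^2$ for all $q$, we have by Schwarz's inequality
$$\|\vec Y(\zeta,n)-\vec\Phi_+(r\zeta,n)\|^2\le2(1-r)^2\gamma^2\|\vec\Phi_+(r\zeta)\|_k^2\sum_{p=0}^k\bigl(|\psi_n(\zeta)+F\varphi_n(\zeta)|^2|\varphi_p(\zeta)|^2+|\varphi_n(\zeta)|^2|\psi_p(\zeta)+F\varphi_p(\zeta)|^2\bigr).$$
Inequality (44) now follows by summing up over $n$.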
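The algebra at the start of the proof of Lemma 4 can be checked numerically. The sketch below (our illustration; helper names and sample data are hypothetical) compares $\mathcal T_n(\zeta)-\mathcal T_n(r\zeta)$ with $\zeta(1-r)\sum_{p=0}^{n-1}\mathcal T_{n,p+1}(\zeta)Q_{p+1}\mathcal T_p(r\zeta)$ and checks $\|Q_q\|^2\le\gamma^2$ from (43) in the Frobenius norm, which dominates the operator norm.

```python
import cmath
import math


def szego_matrix(z, a):
    rho = math.sqrt(1.0 - abs(a) ** 2)
    return [[z / rho, a / rho], [a.conjugate() * z / rho, 1 / rho]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def product(z, coeffs):
    """T(z, c_m) ... T(z, c_1) for coeffs = [c_1, ..., c_m]; identity if empty."""
    M = [[1, 0], [0, 1]]
    for c in coeffs:
        M = matmul(szego_matrix(z, c), M)
    return M


def Q(a):
    """Q_q = (1/rho_q) [[1, 0], [conj(a_q), 0]], so T_q(zeta) - T_q(r zeta) = zeta (1 - r) Q_q."""
    rho = math.sqrt(1.0 - abs(a) ** 2)
    return [[1 / rho, 0], [a.conjugate() / rho, 0]]


a = [0.3 + 0.4j, -0.5, 0.2j, 0.6, -0.1 + 0.3j]     # a[0] = a_1, ..., a[n-1] = a_n
zeta, r, n = cmath.exp(0.9j), 0.8, len(a)
lhs = [[u - v for u, v in zip(ru, rv)] for ru, rv in zip(product(zeta, a), product(r * zeta, a))]
rhs = [[0j, 0j], [0j, 0j]]
for p in range(n):
    # T_{n,p+1}(zeta) = T_n ... T_{p+2} uses a_{p+2}, ..., a_n, i.e. a[p+1:n]
    term = matmul(matmul(product(zeta, a[p + 1:n]), Q(a[p])), product(r * zeta, a[:p]))
    rhs = [[rhs[i][j] + zeta * (1 - r) * term[i][j] for j in range(2)] for i in range(2)]
print(max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2)))

gamma2 = max((1 + abs(c) ** 2) / (1 - abs(c) ** 2) for c in a)       # (43) over the sample
frob2 = [sum(abs(v) ** 2 for row in Q(c) for v in row) for c in a]
print(max(frob2) <= gamma2 + 1e-12)
```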
The following elementary lemma shows how to specify the value $r=r_k$.
Lemma 5. For $x\in(0,1)$ and $k\in\mathbb N$ let a function $h(x,k)$ be nonnegative and continuous on $(0,1)$ for each fixed $k$. Assume next that
(i) for all $x\in(0,1)$, $h(x,\cdot)$ is a sequence monotonically decreasing to zero as $k\to\infty$;
(ii) for all $k\in\mathbb N$, $h(x,k)\ge C_k\sqrt x$ on $(0,1)$ with $C_k>0$.
Then there is a sequence $x_k$ which goes to zero as $k\to\infty$ and such that $x_k=h(x_k,k)$.
Proof. Given an arbitrary $\delta\in(0,1)$, pick $k'=k'(\delta)$ to meet $h(\delta,k')<\delta$ (see (i)). Hence $h(\delta,k)<\delta$ for all $k\ge k'$. Put $g(x,k)\stackrel{\text{def}}{=}h(x,k)-x$, so that $g(\delta,k)<0$ for such $k$. On the other hand, due to (ii), $g(x,k)\ge\sqrt x(C_k-\sqrt x)>0$ for small enough positive $x$. Thus, the equation $x=h(x,k)$ is solvable on each interval $(0,\delta)$. By choosing an appropriate sequence $\{\delta_j\}$ monotonically decreasing to zero, we end up with an increasing sequence of integers $\{k_j\}$ and a sequence $x_{k_j}$ monotonically decreasing to zero such that $x_{k_j}=h(x_{k_j},k_j)$. Next, let $k_j<n<k_{j+1}$. By (i), $x_{k_j}\ge h(x_{k_j},n)$, and by (ii), $x<h(x,n)$ for small enough $x$. The latter means that there is a solution $x_n=h(x_n,n)$ with $x_n\le x_{k_j}$. The proof is complete.
In the sequel we will encounter two typical examples of functions $h$ in Lemma 5.
1. Let $H$ be the C-function of a probability measure ν. The Harnack inequality (which can be easily deduced from (38)) states that
$$\frac{1-r}{1+r}\le\operatorname{Re}H(r\zeta)\le\frac{1+r}{1-r},\qquad 0\le r<1,\quad |\zeta|=1. \tag{45}$$
Put
$$h(x,k)\stackrel{\text{def}}{=}\frac{[\operatorname{Re}H((1-x)\zeta)]^{1/2}}{V_k^2},\qquad V_k=V_k(\zeta), \tag{46}$$
where the sequence $V_k$, monotonically increasing to infinity, will be assigned later on. By (45),
$$\operatorname{Re}H((1-x)\zeta)\ge\frac x{2-x}\ge\frac x2, \tag{47}$$
so that (ii) of Lemma 5 holds with $C_k=(\sqrt2\,V_k^2)^{-1}$. Hence a sequence $r_k=1-x_k$ can be found with $\lim_{k\to\infty}r_k=1$ and
$$1-r_k=\frac{[\operatorname{Re}H(r_k\zeta)]^{1/2}}{V_k^2}. \tag{48}$$
2. We may as well take
$$h(x,k)\stackrel{\text{def}}{=}\frac1{W_k^2}\left(\frac{\operatorname{Re}H((1-x)\zeta)}{1+\operatorname{Re}H((1-x)\zeta)}\right)^{1/2},\qquad W_k=W_k(\zeta). \tag{49}$$
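Taking into account that the function $u(t)=t(1+t)^{-1}$ increases for positive $t$, we see by (47) that the conditions of Lemma 5 are met, and as above
$$1-r_k=\frac1{W_k^2}\left(\frac{\operatorname{Re}H(r_k\zeta)}{1+\operatorname{Re}H(r_k\zeta)}\right)^{1/2} \tag{50}$$
for some $r_k$ which goes to 1 as $k\to\infty$. The inequality (35) will also be crucial in what follows. We write it in the form
$$\|\vec\Phi_+(z)\|_k^2=\sum_{j=0}^k\bigl(|\psi_j^*(z)-F(z)\varphi_j^*(z)|^2+|\psi_j(z)+F(z)\varphi_j(z)|^2\bigr)<\frac{8\operatorname{Re}F(z)}{1-r},\qquad z=r\zeta. \tag{51}$$
Proof of Theorem 2. To prove the first statement, put $F=0$ in (44) and define $r_k$ by (48) with $H(z)=F(z)$ and
$$V_k^2\stackrel{\text{def}}{=}\bigl(\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k\bigr)^{1/2}\bigl(\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k\bigr). \tag{52}$$
It follows from (48) that
$$(1-r_k)\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k=\frac{\bigl(\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k\bigr)^{1/2}}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}[\operatorname{Re}F(r_k\zeta)]^{1/2}\le\frac{[\operatorname{Re}F(r_k\zeta)]^{1/2}}2.$$
By the assumptions of the theorem the right-hand side tends to zero as $k\to\infty$. We see from (44) that
$$\lim_{k\to\infty}\frac{\|\vec Y(\zeta)\|_k}{\|\vec\Phi_+(r_k\zeta)\|_k}=1. \tag{53}$$
Next, (51) implies
$$\frac{\|\vec\Phi_+(r_k\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}<\frac3{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}\left(\frac{\operatorname{Re}F(r_k\zeta)}{1-r_k}\right)^{1/2}=3\frac{\bigl(\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k\bigr)^{1/4}}{\bigl(\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k\bigr)^{1/2}}[\operatorname{Re}F(r_k\zeta)]^{1/4},$$
which also goes to zero, so that
$$\lim_{k\to\infty}\frac{\|\vec\Phi_+(r_k\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}=0. \tag{54}$$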
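From (53) and (54) we conclude
$$\lim_{k\to\infty}\frac{\|\vec Y(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}=0. \tag{55}$$
Since $\vec\Phi_+(\zeta)-\vec Y(\zeta)=(F(\zeta)-F(r_k\zeta))\vec\Phi(\zeta)$, we have
$$\|\vec\Phi_+(\zeta)\|_k\le\|\vec Y(\zeta)\|_k+|F(\zeta)-F(r_k\zeta)|\,\|\vec\Phi(\zeta)\|_k$$
and
$$\lim_{k\to\infty}\frac{\|\vec\Phi_+(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}=0. \tag{56}$$
Finally, $\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k\le\|\vec\Phi_+(\zeta)\|_k+(1+|F(\zeta)|)\|\vec\Phi(\zeta)\|_k$, which eventually leads to the conclusion
$$\lim_{k\to\infty}\frac{\|\vec\Phi_+(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k}=0,$$
as needed. To prove the second statement, we proceed in the same way with the only difference that now $H(z)=F^{-1}(z)$ in (48). Hence
$$(1-r_k)\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k=\frac{\bigl(\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k\bigr)^{1/2}}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}\bigl[\operatorname{Re}F^{-1}(r_k\zeta)\bigr]^{1/2}\le\frac1{2|F(r_k\zeta)|^{1/2}},$$
which again goes to zero by the second assumption of the theorem, so that (53) is still in effect. Let us write it as
$$\lim_{k\to\infty}\frac{\|\vec\Phi(\zeta)+\frac1{F(z)}\vec\Psi(\zeta)\|_k}{\|\vec\Phi(z)+\frac1{F(z)}\vec\Psi(z)\|_k}=1,\qquad z=r_k\zeta. \tag{57}$$
Next, by (51),
$$\frac{\|\vec\Phi(z)+\frac1{F(z)}\vec\Psi(z)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}<\frac3{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}\left(\frac{\operatorname{Re}F(r_k\zeta)}{(1-r_k)|F(r_k\zeta)|^2}\right)^{1/2}=3\frac{\bigl(\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k\bigr)^{1/4}}{\bigl(\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k\bigr)^{1/2}}\bigl[\operatorname{Re}F^{-1}(r_k\zeta)\bigr]^{1/4},$$
and again
$$\lim_{k\to\infty}\frac{\|\vec\Phi(\zeta)+\frac1{F(z)}\vec\Psi(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}=0.$$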
Finally,
$$\frac{\|\vec\Phi(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}\le\frac{\|\vec\Phi(\zeta)+\frac1{F(r_k\zeta)}\vec\Psi(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}+\frac1{|F(r_k\zeta)|}\,\frac{\|\vec\Psi(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}\to0$$
as $k\to\infty$, and hence
$$\lim_{k\to\infty}\frac{\|\vec\Phi(\zeta)\|_k}{\|\vec\Psi(\zeta)\|_k}=0,$$
as needed.
Proof of Theorem 3. We show first that $F$ is a pure imaginary number whenever
$$\vec X(\zeta)=\{\vec X(\zeta,n)\}_{n\ge0}=\vec\Psi(\zeta)+F\vec\Phi(\zeta),\qquad \vec X(\zeta,n)=\mathcal T_n(\zeta)\begin{pmatrix}F+1\\F-1\end{pmatrix},$$
is the subordinate solution of (25). To this end note that the transfer matrices are "symplectic" on the unit circle:
$$\overline{T_n(\zeta)}=\bar\zeta J_rT_n(\zeta)J_r,\qquad \overline{\mathcal T_n(\zeta)}=\bar\zeta^nJ_r\mathcal T_n(\zeta)J_r,\qquad J_r=\begin{pmatrix}0&1\\1&0\end{pmatrix},$$
where $\bar A$ stands for the complex conjugate (not transposed) matrix. Let us pick another solution $\vec Y(\zeta)=\{\vec Y(\zeta,n)\}_{n\ge0}$ with
$$\vec Y(\zeta,n)=\mathcal T_n(\zeta)\begin{pmatrix}\bar F-1\\ \bar F+1\end{pmatrix}=\mathcal T_n(\zeta)J_r\overline{\begin{pmatrix}F+1\\F-1\end{pmatrix}}=\zeta^nJ_r\overline{\vec X(\zeta,n)}.$$
It is obvious that $\|\vec Y(\zeta,n)\|=\|\vec X(\zeta,n)\|$, that is, $\vec Y(\zeta)$ is the subordinate solution. Therefore $\vec Y(\zeta)=\tau\vec X(\zeta)$, $|\tau|=1$, which implies
$$\tau(F-1)=\bar F+1,\qquad \tau(F+1)=\bar F-1,$$
and hence $\bar F=-F$, as claimed. The rest is pretty much similar to the proof of Theorem 2. To prove the first statement, consider (44) with this $F$ and take in (49) $H(z)=F(z)$,
$$W_k^2=\|\vec\Phi(\zeta)\|_k^{3/2}\,\|\vec\Psi(\zeta)+F\vec\Phi(\zeta)\|_k^{1/2}.$$
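By (50)
$$(1-r_k)\|\vec\Psi(\zeta)+F\vec\Phi(\zeta)\|_k\|\vec\Phi(\zeta)\|_k=\frac{\|\vec\Psi(\zeta)+F\vec\Phi(\zeta)\|_k^{1/2}}{\|\vec\Phi(\zeta)\|_k^{1/2}}\left(\frac{\operatorname{Re}F(r_k\zeta)}{1+\operatorname{Re}F(r_k\zeta)}\right)^{1/2}\to0$$
as $k\to\infty$ by the assumption of the theorem, so that (53) is true. Next, by (51) with $z=r_k\zeta$,
$$\frac{\|\vec\Phi_+(z)\|_k}{\|\vec\Phi(\zeta)\|_k(1+\operatorname{Re}F(z))^{1/2}}\le\frac3{\|\vec\Phi(\zeta)\|_k(1-r_k)^{1/2}}\left(\frac{\operatorname{Re}F(r_k\zeta)}{1+\operatorname{Re}F(r_k\zeta)}\right)^{1/2}=3\frac{\|\vec\Psi(\zeta)+F\vec\Phi(\zeta)\|_k^{1/4}}{\|\vec\Phi(\zeta)\|_k^{1/4}}\left(\frac{\operatorname{Re}F(r_k\zeta)}{1+\operatorname{Re}F(r_k\zeta)}\right)^{1/4}\to0$$
as $k\to\infty$, and hence
$$\lim_{k\to\infty}\frac{\|\vec Y(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k(1+\operatorname{Re}F(z))^{1/2}}=0. \tag{58}$$
Finally,
$$|F(r_k\zeta)-F|=\frac{\|(F(r_k\zeta)-F)\vec\Phi(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k}\le\frac{\|\vec Y(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k}+\frac{\|\vec\Psi(\zeta)+F\vec\Phi(\zeta)\|_k}{\|\vec\Phi(\zeta)\|_k},$$
which along with (58) and the assumption of the theorem implies
$$\lim_{k\to\infty}\frac{|F(r_k\zeta)-F|}{(1+\operatorname{Re}F(r_k\zeta))^{1/2}}=0.$$
The latter relation yields
$$\limsup_{k\to\infty}\operatorname{Re}F(r_k\zeta)<\infty,\qquad \lim_{k\to\infty}F(r_k\zeta)=F,$$
as needed. To prove the second statement, put $F=0$ in (44) and take $H(z)=F^{-1}(z)$, $W_k^2=V_k^2$ as in (52). By (50)
$$(1-r_k)\|\vec\Psi(\zeta)\|_k\|\vec\Phi(\zeta)\|_k=\frac{\bigl(\|\vec\Phi(\zeta)\|_k\|\vec\Psi(\zeta)\|_k\bigr)^{1/2}}{\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k}\left(\frac{\operatorname{Re}F^{-1}(r_k\zeta)}{1+\operatorname{Re}F^{-1}(r_k\zeta)}\right)^{1/2}=\left(\frac{\|\vec\Phi(\zeta)\|_k}{\|\vec\Psi(\zeta)\|_k}\right)^{1/2}\left(1+\frac{\|\vec\Phi(\zeta)\|_k}{\|\vec\Psi(\zeta)\|_k}\right)^{-1}\left(\frac{\operatorname{Re}F^{-1}(r_k\zeta)}{1+\operatorname{Re}F^{-1}(r_k\zeta)}\right)^{1/2}\to0$$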
as k → ∞, and (57) holds. By (51) − → − →
Φ (z) + F 1(z) Ψ (z) k − 1/2 − → →
Φ (ζ ) k + Ψ (ζ ) k 1 + F −1 (z) 1/2 3 F −1 (z) (1 − rk )−1/2 ≤ − 1/2 → − →
Φ (ζ ) k + Ψ (ζ ) k 1 + F −1 (z) 1/4 − → − → 1/2
Φ (ζ ) k Ψ (ζ ) k F −1 (z) = 1/2 − → − → 1 + F −1 (z)
Φ (ζ ) k + Ψ (ζ ) k and, as above − → − →
$$\lim_{k\to\infty}\frac{\|\vec\Phi(\zeta)+F^{-1}(z)\vec\Psi(\zeta)\|_k}{(\|\vec\Phi(\zeta)\|_k+\|\vec\Psi(\zeta)\|_k)\,(1+|F^{-1}(z)|)^{1/2}}=0.$$
Next,
$$\frac{|F^{-1}(z)|}{(1+|F^{-1}(z)|)^{1/2}}=\frac{\|F^{-1}(z)\vec\Psi(\zeta)\|_k}{\|\vec\Psi(\zeta)\|_k\,(1+|F^{-1}(z)|)^{1/2}}\le\frac{\|\vec\Phi(\zeta)+F^{-1}(z)\vec\Psi(\zeta)\|_k}{\|\vec\Psi(\zeta)\|_k\,(1+|F^{-1}(z)|)^{1/2}}+\frac{\|\vec\Phi(\zeta)\|_k}{\|\vec\Psi(\zeta)\|_k\,(1+|F^{-1}(z)|)^{1/2}},$$
whence it follows that $\lim_{k\to\infty}|F(r_k\zeta)|=+\infty$. It is clear that the sequence $r_k$ can be made monotonic by an obvious modification. The proof is complete.
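The proof of Theorem 3 opens with the conjugation symmetry $\overline{T_n(\zeta)}=\bar\zeta^{\,n}J_rT_n(\zeta)J_r$. Its consequence $\|\vec Y(\zeta,n)\|=\|\vec X(\zeta,n)\|$ can be checked numerically. The Python sketch below is an illustration under an assumption, not the authors' construction. Formula (6) is not reproduced in this section, so the one-step Szegő matrix is taken to be $(1-|a|^2)^{-1/2}\begin{pmatrix}z&-a\\-\bar az&1\end{pmatrix}$; the symmetry holds equally if $a$ and $\bar a$ are swapped. All helper names are hypothetical.

```python
import cmath
import math
import random

J_R = [[0, 1], [1, 0]]  # the flip matrix J_r


def szego_matrix(z, a):
    """ASSUMED one-step Szegő matrix (1-|a|^2)^(-1/2) [[z, -a], [-conj(a) z, 1]] (stand-in for (6))."""
    r = 1 / math.sqrt(1 - abs(a) ** 2)
    return [[r * z, -r * a], [-r * a.conjugate() * z, r]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def transfer_matrix(z, coeffs):
    """T_n(z) = S(z, a_{n-1}) ... S(z, a_0); the identity matrix for n = 0."""
    T = [[1, 0], [0, 1]]
    for a in coeffs:
        T = matmul(szego_matrix(z, a), T)
    return T


def symmetry_defect(zeta, coeffs):
    """Largest entry of |conj(T_n) - conj(zeta)^n J_r T_n J_r|; ~0 when |zeta| = 1."""
    T = transfer_matrix(zeta, coeffs)
    rhs = matmul(matmul(J_R, T), J_R)
    factor = zeta.conjugate() ** len(coeffs)
    return max(abs(T[i][j].conjugate() - factor * rhs[i][j]) for i in range(2) for j in range(2))


def paired_norms(zeta, coeffs, F):
    """Norms of X(n) = T_n (F+1, F-1)^T and Y(n) = T_n J_r conj((F+1, F-1)^T)."""
    T = transfer_matrix(zeta, coeffs)
    v = [F + 1, F - 1]
    w = matvec(J_R, [v[0].conjugate(), v[1].conjugate()])
    x, y = matvec(T, v), matvec(T, w)
    return math.hypot(abs(x[0]), abs(x[1])), math.hypot(abs(y[0]), abs(y[1]))


random.seed(1)
coeffs = [cmath.rect(random.uniform(0, 0.5), random.uniform(0, 2 * math.pi)) for _ in range(10)]
zeta = cmath.exp(0.7j)
print(symmetry_defect(zeta, coeffs), paired_norms(zeta, coeffs, 0.3 + 0.8j))
```

The identity uses $|\zeta|=1$ in an essential way (the $(2,2)$ entry of a single step produces $|\zeta|^2$), so it generally fails off the unit circle.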
5. Aleksandrov Measures and Lower Envelopes
Let us go back to Szegő equations and spectral measures. Recall that, starting from an element $a=\{a_n\}\in\mathcal B$, we end up with a probability measure $\mu$ such that the $\varphi_n$ are orthonormal with respect to $\mu$. Similarly, the family $a(\lambda)=\{\lambda a_n\}$ leads to:
- the family $\{\mu_\lambda\}_{\lambda\in\mathbb T}$ of probability measures on $\mathbb T$;
- the family $\{F(z,\lambda)\}$ of C-functions (38) in $\mathbb D$.

The polynomials $\varphi_n(z,\lambda)$ are orthonormal with respect to $\mu_\lambda$. In particular, the polynomials $\psi_n(z)\stackrel{\text{def}}{=}\varphi_n(z,-1)$ are orthonormal with respect to the measure $\mu_-=\mu_{-1}$, which is called the second kind measure. As is known (cf. [12, p. 462]), the C-functions $F(z,\lambda)$ are related by the linear fractional transformation
$$F(z,\lambda_1)=\frac{F(z,\lambda_2)-i\tau}{1-i\tau F(z,\lambda_2)},\qquad\tau=\tan\frac{\omega_1-\omega_2}{2},\qquad\lambda_j=e^{i\omega_j},\ j=1,2.\tag{59}$$
In particular, the second kind C-function is $F(z,-1)=F^{-1}(z)$, where $F(z)=F(z,1)$.
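Relation (59) can be explored numerically. In the sketch below, the map is written in the equivalent rotation form $(F\cos t-i\sin t)/(\cos t-iF\sin t)$ with $t=(\omega_1-\omega_2)/2$; dividing numerator and denominator by $\cos t$ recovers (59). This form stays finite when $\tau=\infty$, which is exactly the passage to the second kind C-function. The function names and the sample value of $F$ are illustrative choices. The checks cover three facts:
- $F(z,-1)=F^{-1}(z)$;
- composing two such maps adds the parameters $t$;
- the identity $\operatorname{Re}F(\zeta,\lambda')=(1+\tau^2)\operatorname{Re}F(\zeta,\lambda'')/|1-i\tau F(\zeta,\lambda'')|^2$ used in the sketch of proof of Theorem 6.

```python
import math


def aleksandrov_transform(F, omega1, omega2):
    """Map a value F = F(z, lambda2) to F(z, lambda1), lambda_j = exp(i omega_j), via (59).

    Written as (F cos t - i sin t) / (cos t - i F sin t), t = (omega1 - omega2) / 2,
    which equals (F - i tau) / (1 - i tau F) with tau = tan t whenever cos t != 0.
    """
    t = (omega1 - omega2) / 2
    c, s = math.cos(t), math.sin(t)
    return (F * c - 1j * s) / (c - 1j * F * s)


def real_part_via_identity(F, omega1, omega2):
    """(1 + tau^2) Re F / |1 - i tau F|^2, written in the same rotation form."""
    t = (omega1 - omega2) / 2
    return F.real / abs(math.cos(t) - 1j * F * math.sin(t)) ** 2


F = 0.7 + 0.3j  # sample value of a C-function (Re F > 0) at some point of the disk
print(aleksandrov_transform(F, math.pi, 0.0), 1 / F)  # second kind: F(z, -1) = 1/F(z)
```

In matrix language, the map corresponds to $\begin{pmatrix}\cos t&-i\sin t\\-i\sin t&\cos t\end{pmatrix}$, a one-parameter group. This is why consecutive changes of $\lambda$ compose by adding the angles.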
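For the corresponding Szegő matrices $T_n(z,\lambda)$ and transfer matrices $T_{n,m}(z,\lambda)$, the relations
$$T_n(z,\lambda)=\begin{pmatrix}\lambda&0\\0&1\end{pmatrix}T_n(z)\begin{pmatrix}\bar\lambda&0\\0&1\end{pmatrix},\qquad T_{n,m}(z,\lambda)=\begin{pmatrix}\lambda&0\\0&1\end{pmatrix}T_{n,m}(z)\begin{pmatrix}\bar\lambda&0\\0&1\end{pmatrix}\tag{60}$$
hold, where $T_n(z)=T_n(z,1)$ and $T_{n,m}(z)=T_{n,m}(z,1)$. Equality (60) yields the relation between $\varphi_n(\cdot,\lambda)$, $\psi_n(\cdot,\lambda)$ and the initial pair $\varphi_n,\psi_n$ (cf. [12, p. 461]):
$$\varphi_n(z,\lambda)=\frac{1+\lambda}{2}\varphi_n(z)+\frac{1-\lambda}{2}\psi_n(z),\qquad\psi_n(z,\lambda)=\frac{1-\lambda}{2}\varphi_n(z)+\frac{1+\lambda}{2}\psi_n(z).$$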
We begin our study of the Aleksandrov measures with the unit circle analogue of the fundamental Aronszajn–Donoghue theorem (cf. [3, Theorem 2]; [27, Theorems 2.1 and 2.2]). We only outline the proof, which relies on Theorem A.
Theorem 6. Let $\mu_\lambda=\mu_{\lambda,ac}+\mu_{\lambda,s}$ be the Lebesgue decomposition of $\mu_\lambda$. Then, for $\lambda'\ne\lambda''$:
- $\mu_{\lambda',ac}$ and $\mu_{\lambda'',ac}$ are mutually absolutely continuous;
- $\mu_{\lambda',s}$ and $\mu_{\lambda'',s}$ are mutually singular.

Sketch of the proof. It is easily seen from (59) that
$$\operatorname{Re}F(\zeta,\lambda')=\frac{1+\tau^2}{|1-i\tau F(\zeta,\lambda'')|^2}\operatorname{Re}F(\zeta,\lambda''),$$
and the first assertion follows from (i), Theorem A. The second one is based on (ii), Theorem A, and on the fact that, in view of (59), $S(\mu_{\lambda'})\cap S(\mu_{\lambda''})=\emptyset$ for $\lambda'\ne\lambda''$.
As we mentioned in the Introduction, the measures $\mu_\lambda$ play the same role as the spectral measures for rank one perturbations of a self-adjoint operator, with (59) being an analogue of the Aronszajn–Krein formula. A theory parallel to the one in [29] can be developed for the Aleksandrov measures. We do not spell out the details here (the proofs will be given elsewhere). We just state two results (cf. [29, Theorems 2 and 4]) and give two examples to illustrate them. Given a measure $\mu=\mu_1$ on the unit circle with the C-function $F$, define a function $L$ on $\mathbb T$ by
$$L(\xi)\stackrel{\text{def}}{=}\Bigl(\int_{\mathbb T}\frac{d\mu(\zeta)}{|\zeta-\xi|^2}\Bigr)^{-1}=\bigl\|(\zeta-\xi)^{-1}\bigr\|_{L^2_\mu}^{-2}.$$
Then $0\le L\le4$ (because $|\zeta-\xi|\le2$), and $L(\xi)>0$ whenever $(\zeta-\xi)^{-1}\in L^2_\mu$.
Theorem 7. Let $\lambda=e^{i\omega}\ne1$ and $\xi\in\mathbb T$. Then $\mu_\lambda\{\xi\}>0$ if and only if
(i) $L(\xi)>0$, and
(ii) $\lim_{r\to1}F(r\xi)=(i\tau)^{-1}=-i\cot\frac{\omega}{2}$, where $\tau=\tan\frac{\omega}{2}$.
Theorem 8. The following statements are equivalent:
(i) $\mu_\lambda$ are pure point measures for $m$-a.e. $\lambda$;
(ii) $L(\zeta)>0$ for $m$-a.e. $\zeta$.
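For a purely atomic measure, the function $L$ of Theorem 7 reduces to a finite sum and can be evaluated directly. The sketch below (helper names hypothetical) computes $L^{-1}(\xi)=\int_{\mathbb T}|\zeta-\xi|^{-2}\,d\mu(\zeta)$ for finitely many atoms. It then applies this to the truncations $\sum_{n=1}^{N}2^{-n}\nu_{2^n}$ of the measure in Example 1 below. Each level $n$ contributes at least $1/\pi^2$, so the partial integrals grow at least like $N/\pi^2$; this is the mechanism behind $L\equiv0$ there. The checks also illustrate the bound $L\le4$ for a probability measure.

```python
import cmath
import math


def inverse_L(atoms, xi):
    """1/L(xi) = sum of w / |zeta - xi|^2 over the atoms (zeta, w) of a finite atomic measure."""
    return sum(w / abs(zeta - xi) ** 2 for zeta, w in atoms)


def example1_truncation(N):
    """Atoms of sum_{n=1}^N 2^{-n} nu_{2^n}, nu_k = uniform distribution on the k-th roots of 1.

    A root shared by several levels is listed once per level; the integral is unaffected.
    """
    atoms = []
    for n in range(1, N + 1):
        k = 2 ** n
        for j in range(k):
            atoms.append((cmath.exp(2j * math.pi * j / k), 2.0 ** (-n) / k))
    return atoms


xi = cmath.exp(1j)  # a point of T that is not a dyadic root of unity
for N in (2, 4, 8):
    print(N, inverse_L(example1_truncation(N), xi), N / math.pi ** 2)
```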
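Example 1. Let $K_{j,k}=e^{2\pi ij/k}$, $j=0,1,\dots,k-1$, be the $k$-th roots of 1, and let
$$\nu_k\stackrel{\text{def}}{=}\frac1k\sum_{j=0}^{k-1}\delta(K_{j,k})$$
be the uniform distribution on this set. Finally, put
$$\mu=\sum_{n=1}^{\infty}2^{-n}\nu_{2^n}.$$
Since $|K_{j+1,k}-K_{j,k}|=2\sin\frac{\pi}{k}\le\frac{2\pi}{k}$, for each $\xi\in\mathbb T$ there is $j=j(\xi)$ with $|\xi-K_{j,k}|\le\pi/k$. Hence
$$\int_{\mathbb T}\frac{d\nu_{2^n}(\zeta)}{|\zeta-\xi|^2}\ge\frac{2^{2n}}{\pi^2}\,\nu_{2^n}(K_{j,2^n})=\frac{2^n}{\pi^2},$$
and
$$\int_{\mathbb T}\frac{d\mu(\zeta)}{|\zeta-\xi|^2}=\sum_{n=1}^{\infty}2^{-n}\int_{\mathbb T}\frac{d\nu_{2^n}(\zeta)}{|\zeta-\xi|^2}\ge\sum_{n=1}^{\infty}\frac{1}{\pi^2}=\infty,$$
that is, $L(\xi)=0$ everywhere on $\mathbb T$. Next, by construction $\mu$ is a pure point measure, and Theorem 6 implies that all $\mu_\lambda$ are singular. By Theorem 7, for $\lambda\ne1$ the measures $\mu_\lambda$ have no mass points, i.e., they are singular continuous.
Example 2. Let $\{\zeta_n\}_{n\ge1}$ be an arbitrary sequence of points on $\mathbb T$. Take $0<\alpha<1$ and write
$$H(\xi)\stackrel{\text{def}}{=}\sum_{n=1}^{\infty}\frac{\alpha^n}{|\zeta_n-\xi|^2}.$$
We show first that $H$ is finite $m$-a.e. Indeed, for $p\in\mathbb N$ let $|\zeta_n-\xi|>\alpha^{n/4}/\sqrt p$ for all $n$. Then
$$H(\xi)<p\sum_{n=1}^{\infty}\alpha^{n/2}=\frac{p\sqrt\alpha}{1-\sqrt\alpha}<\frac{p}{1-\sqrt\alpha}.$$
In other words,
$$E_p\stackrel{\text{def}}{=}\Bigl\{\xi:H(\xi)\ge\frac{p}{1-\sqrt\alpha}\Bigr\}\subset\bigcup_{n\ge1}\Bigl\{\xi:|\zeta_n-\xi|\le\frac{\alpha^{n/4}}{\sqrt p}\Bigr\}.$$
Hence
$$m(E_p)\le\sum_{n=1}^{\infty}m\Bigl\{\xi:|\zeta_n-\xi|\le\frac{\alpha^{n/4}}{\sqrt p}\Bigr\}\le\frac{\pi}{\sqrt p}\sum_{n=1}^{\infty}\alpha^{n/4}\to0$$
as $p\to\infty$. The rest is obvious, since $\{\xi:H(\xi)=\infty\}\subset E_p$ for all $p$.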
Let now $0<c_n\le C\alpha^n$ and $\sum_nc_n=1$. Put
$$\mu=\mu_1=\sum_{n=1}^{\infty}c_n\,\delta(\zeta_n).$$
Then
$$L^{-1}(\xi)=\int_{\mathbb T}\frac{d\mu(\zeta)}{|\zeta-\xi|^2}=\sum_{n=1}^{\infty}\frac{c_n}{|\zeta_n-\xi|^2}\le C\,H(\xi),$$
and hence $L(\xi)>0$ $m$-a.e. The measure $\mu$ is clearly pure point. By Theorem 8, so are the $\mu_\lambda$ for $m$-a.e. $\lambda$.
In the second half of the section we collect, in a form suited to our needs, some results on families of measures and their lower envelopes. Let $\{\nu_\alpha\}$ be a family of finite positive Borel measures on a measurable space $(X,\mathcal A)$. Given $A\in\mathcal A$, denote by $P_A$ the set of all partitions of $A$ into disjoint measurable sets:
$$P_A\stackrel{\text{def}}{=}\Bigl\{A=\bigcup_{j=1}^{n}A_j,\ A_p\cap A_q=\emptyset\ (p\ne q)\Bigr\}.$$
Definition. The lower envelope $\nu=\inf_\alpha\nu_\alpha$ of $\{\nu_\alpha\}$ is defined by
$$\nu(A)\stackrel{\text{def}}{=}\inf_{P_A}\sum_{j=1}^{n}\inf_\alpha\nu_\alpha(A_j),$$
where the outer infimum is taken over all partitions of the set $A$. The function $\nu$ is known to be a finite positive Borel measure on $(X,\mathcal A)$.

The basic (and obvious) property of $\nu$ is that $\nu\le\nu_\alpha$ for all $\alpha$, that is, $\nu(A)\le\nu_\alpha(A)$ for all $A\in\mathcal A$. Conversely, if $\sigma$ is a measure on $(X,\mathcal A)$ such that $\sigma\le\nu_\alpha$ for all $\alpha$, then $\sigma\le\nu$.

Let $\sigma',\sigma''$ be measures on $(X,\mathcal A)$. We write $\sigma'\prec\sigma''$ ($\sigma'\perp\sigma''$) when $\sigma'$ is a.c. (singular) with respect to $\sigma''$. As a straightforward consequence of the basic property, $\inf_\alpha\nu_\alpha\prec\sigma$ ($\inf_\alpha\nu_\alpha\perp\sigma$) for a measure $\sigma$ as long as $\nu_\alpha\prec\sigma$ ($\nu_\alpha\perp\sigma$) for at least one value of $\alpha$.

For the rest of the paper we focus on the case when the family contains two elements:
$$\nu(A)=\min(\nu_1,\nu_2)(A)=\inf_{P_A}\sum_{j=1}^{n}\min\bigl(\nu_1(A_j),\nu_2(A_j)\bigr).$$
If we denote
$$(\nu_1,\nu_2)(A,\mathcal P)\stackrel{\text{def}}{=}\sum_{j=1}^{n}\min\bigl(\nu_1(A_j),\nu_2(A_j)\bigr),\qquad\mathcal P:\ A=\bigcup_{j=1}^{n}A_j,$$
then
$$\min(\nu_1,\nu_2)(A)=\inf_{\mathcal P}\,(\nu_1,\nu_2)(A,\mathcal P).$$
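Take a finite set $X$ with $\mathcal A$ consisting of all subsets. There, the definition of $\min(\nu_1,\nu_2)$ can be evaluated by brute force over all partitions of $A$. The sketch below (names hypothetical, exact rational arithmetic) does this. By Proposition 1 below, the finest partition into singletons attains the infimum. The result therefore coincides with $\sum_{x\in A}\min(\nu_1\{x\},\nu_2\{x\})$. The checks also illustrate Propositions 2 and 3 on small examples.

```python
from fractions import Fraction


def set_partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a block of its own
        yield [[first]] + partition


def mass(nu, block):
    """nu(block) for a measure given by its point masses {x: nu({x})}."""
    return sum((nu[x] for x in block), Fraction(0))


def lower_envelope(nu1, nu2, A):
    """min(nu1, nu2)(A): infimum over partitions of A of sum_j min(nu1(A_j), nu2(A_j))."""
    return min(
        sum((min(mass(nu1, B), mass(nu2, B)) for B in P), Fraction(0))
        for P in set_partitions(sorted(A))
    )


def pointwise_min(nu1, nu2, A):
    """Value of the same sum on the finest partition (singletons)."""
    return sum((min(nu1[x], nu2[x]) for x in A), Fraction(0))


half, quarter = Fraction(1, 2), Fraction(1, 4)
nu1 = {0: half, 1: quarter, 2: quarter}
nu2 = {0: Fraction(0), 1: half, 2: half}
print(lower_envelope(nu1, nu2, {0, 1, 2}))  # prints 1/2
```

The number of partitions grows like the Bell numbers, so this brute force is only meant for tiny sets. Its purpose is to make the partition-based definition concrete.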
We continue with a number of other (less trivial) properties of the lower envelope. Given $A\in\mathcal A$, the set $P_A$ of all partitions of $A$ is endowed with a partial order. More precisely, let $\mathcal P:A=\bigcup_{j=1}^{n}A_j$ and $\mathcal P':A=\bigcup_{k=1}^{n'}A'_k$ be two partitions of $A$. We say that $\mathcal P'\ge\mathcal P$ ($\mathcal P'$ is a refinement of $\mathcal P$) if
$$A_j=\bigcup_{k\in I_j}A'_k,\qquad j=1,2,\dots,n.$$
Proposition 1. If $\mathcal P'\ge\mathcal P$, then $(\nu_1,\nu_2)(A,\mathcal P')\le(\nu_1,\nu_2)(A,\mathcal P)$.
Proof. We make use of the elementary inequality
$$\min\Bigl(\sum_{k=1}^{n}a_k,\sum_{k=1}^{n}b_k\Bigr)\ge\sum_{k=1}^{n}\min(a_k,b_k),\qquad a_k,b_k\ge0.\tag{61}$$
Hence
$$(\nu_1,\nu_2)(A,\mathcal P')=\sum_{k=1}^{n'}\min\bigl(\nu_1(A'_k),\nu_2(A'_k)\bigr)=\sum_{j=1}^{n}\sum_{k\in I_j}\min\bigl(\nu_1(A'_k),\nu_2(A'_k)\bigr)\le\sum_{j=1}^{n}\min\Bigl(\sum_{k\in I_j}\nu_1(A'_k),\sum_{k\in I_j}\nu_2(A'_k)\Bigr)=\sum_{j=1}^{n}\min\bigl(\nu_1(A_j),\nu_2(A_j)\bigr)=(\nu_1,\nu_2)(A,\mathcal P),$$
as claimed.
We need another elementary inequality for nonnegative numbers,
$$\min(a+b,c)\le\min(a,c)+\min(b,c),\tag{62}$$
which turns out to hold for measures as well.
Proposition 2. Let $\nu_1$, $\nu_2$ and $\sigma$ be measures on $(X,\mathcal A)$. Then $\min(\nu_1+\sigma,\nu_2)\le\min(\nu_1,\nu_2)+\min(\sigma,\nu_2)$.
Proof. By the definition, for every $\varepsilon>0$ there are partitions $\mathcal P:A=\bigcup_{j=1}^{n}E_j$ and $\mathcal Q:A=\bigcup_{i=1}^{m}G_i$ such that
$$(\nu_1,\nu_2)(A,\mathcal P)\le\min(\nu_1,\nu_2)(A)+\varepsilon,\qquad(\sigma,\nu_2)(A,\mathcal Q)\le\min(\sigma,\nu_2)(A)+\varepsilon.$$
Let us make up another partition
$$\mathcal L:\ A=\bigcup_{k=1}^{nm}A_k=\bigcup_{i,j}(E_j\cap G_i),\tag{63}$$
so that $\mathcal L\ge\mathcal P$ and $\mathcal L\ge\mathcal Q$. By (62),
$$\min\bigl(\nu_1(A_k)+\sigma(A_k),\nu_2(A_k)\bigr)\le\min\bigl(\nu_1(A_k),\nu_2(A_k)\bigr)+\min\bigl(\sigma(A_k),\nu_2(A_k)\bigr).$$
Summing up over $k$ gives
$$(\nu_1+\sigma,\nu_2)(A,\mathcal L)\le(\nu_1,\nu_2)(A,\mathcal L)+(\sigma,\nu_2)(A,\mathcal L).$$
It now follows from (63) and Proposition 1 that
$$(\nu_1+\sigma,\nu_2)(A,\mathcal L)\le\min(\nu_1,\nu_2)(A)+\min(\sigma,\nu_2)(A)+2\varepsilon.$$
The result follows by taking the infimum over $\mathcal L$ on the left and letting $\varepsilon\to0$ on the right.
There is a nice description of mutual singularity of two measures in terms of their minimum.
Proposition 3. For two measures $\nu_1,\nu_2$ on $(X,\mathcal A)$, $\nu_1\perp\nu_2\iff\min(\nu_1,\nu_2)=0$.
Proof. Let first $\nu_1\perp\nu_2$. Then there exists a partition $X=G_1\cup G_2$ such that $\nu_1(A\cap G_1)=\nu_2(A\cap G_2)=0$ for each measurable set $A$. Hence, for the partition $A=A_1\cup A_2$ with $A_i=A\cap G_i$, $i=1,2$, the equality $\min(\nu_1(A_i),\nu_2(A_i))=0$ holds for $i=1,2$, and we are done.

The converse is more delicate. Let $\nu_2=\nu_{2,ac}+\nu_{2,s}$ be the Lebesgue decomposition of $\nu_2$ with respect to $\nu_1$. Then, by Proposition 2,
$$\min(\nu_2,\nu_1)=\min(\nu_{2,ac}+\nu_{2,s},\nu_1)\le\min(\nu_{2,ac},\nu_1)+\min(\nu_{2,s},\nu_1).$$
By the first part, the second term on the right is zero, so that $\min(\nu_2,\nu_1)\le\min(\nu_{2,ac},\nu_1)$. Since the reverse inequality is obvious, equality holds: $\min(\nu_2,\nu_1)=\min(\nu_{2,ac},\nu_1)$. Note that this is true for an arbitrary pair $\nu_1,\nu_2$.

Next, let $\nu_{2,ac}=f\nu_1$ with some nonnegative $\nu_1$-integrable function $f$ (the Radon–Nikodym derivative of $\nu_2$ with respect to $\nu_1$). Denote $A^{(k)}\stackrel{\text{def}}{=}\{x\in X:f(x)\ge k^{-1}\}$. Then
$$(f\nu_1,\nu_1)(A^{(k)},\mathcal P)=\sum_{j=1}^{n}\min\bigl((f\nu_1)(A^{(k)}_j),\nu_1(A^{(k)}_j)\bigr)\ge\frac{\nu_1(A^{(k)})}{k}$$
for all partitions $\mathcal P$ of the set $A^{(k)}$. Hence $0=\min(f\nu_1,\nu_1)(A^{(k)})\ge\nu_1(A^{(k)})/k$ for all $k$, which yields $f=0$ $[\nu_1]$. The proof is complete.
We are now in a position to prove the main results on lower envelopes of measures.
Proposition 4. Let $\nu_j=\nu_{j,ac}+\nu_{j,s}$ be the Lebesgue decompositions of $\nu_j$ with respect to some measure $m$, $j=1,2$. Assume that $\nu_{1,s}\perp\nu_{2,s}$. Then $\min(\nu_1,\nu_2)=\min(\nu_{1,ac},\nu_{2,ac})$.
Proof. It is clear that $\min(\nu_1,\nu_2)\ge\min(\nu_{1,ac},\nu_{2,ac})$. Conversely, by Proposition 2,
$$\min(\nu_1,\nu_2)\le\min(\nu_{1,ac},\nu_{2,ac})+\min(\nu_{1,ac},\nu_{2,s})+\min(\nu_{1,s},\nu_{2,ac})+\min(\nu_{1,s},\nu_{2,s}).$$
It remains only to note that the last three terms on the right are zero by Proposition 3.
Proposition 5. Let $\nu_1$ and $\nu_2$ be mutually a.c., $\nu_1\sim\nu_2$. Then $\min(\nu_1,\nu_2)\sim\nu_j$ for $j=1,2$.
Proof. Since $\min(\nu_1,\nu_2)\le\nu_j$, a fortiori $\min(\nu_1,\nu_2)\prec\nu_j$. To prove the converse, assume, on the contrary, that $\min(\nu_1,\nu_2)(A)=0$ for some $A\in\mathcal A$, but $\nu_j(A)>\delta>0$, $j=1,2$. Consider a sequence of partitions $\{\mathcal P_m\}$ of the set $A$, $\mathcal P_m:A=\bigcup_{j=1}^{n_m}A_{j,m}$, with
$$\sum_{j=1}^{n_m}\min\bigl(\nu_1(A_{j,m}),\nu_2(A_{j,m})\bigr)<\frac{\delta}{2^m},\qquad m\in\mathbb N.$$
Put $I_{1,m}\stackrel{\text{def}}{=}\{j:\nu_1(A_{j,m})\le\nu_2(A_{j,m})\}$ and $I_{2,m}\stackrel{\text{def}}{=}\{1,2,\dots,n_m\}\setminus I_{1,m}$. Then
$$\sum_{j\in I_{1,m}}\nu_1(A_{j,m})+\sum_{j\in I_{2,m}}\nu_2(A_{j,m})=\nu_1\Bigl(\bigcup_{j\in I_{1,m}}A_{j,m}\Bigr)+\nu_2\Bigl(\bigcup_{j\in I_{2,m}}A_{j,m}\Bigr)<\frac{\delta}{2^m}.$$
In other words, there is a sequence of partitions $A=F_m\cup G_m$ such that $\nu_1(F_m)+\nu_2(G_m)<\delta2^{-m}$, $m\in\mathbb N$. Set $F\stackrel{\text{def}}{=}\bigcap_mF_m$. Then $\nu_1(F)=0$ and, by the assumption, $\nu_2(F)=0$. On the other hand, $G\stackrel{\text{def}}{=}A\setminus F=\bigcup_mG_m$ and $\nu_2(G)\le\sum_m\nu_2(G_m)<\delta$. Thus $\nu_2(A)=\nu_2(F)+\nu_2(G)<\delta$. The contradiction completes the proof.
Let us sum up the results obtained above for the Aleksandrov measures, taking into account Theorem 6.
Theorem 9. For the Aleksandrov measures $\{\mu_\lambda\}_{\lambda\in\mathbb T}$,
$$\min(\mu_{\lambda'},\mu_{\lambda''})=\min(\mu_{\lambda',ac},\mu_{\lambda'',ac})\sim\mu_{\lambda',ac},\qquad\lambda'\ne\lambda''.$$
6. Transfer Matrices and Spectral Measures
We now come to the main subject. Namely, we show that the behavior of the norms of transfer matrices is closely tied to the absolute continuity of spectral measures, with subordinacy serving as a bridge between the two. We follow here the line of reasoning from [21]. Note that condition (43) is still in effect. Let $\mu=\mu(\{a_n\})$ be the spectral measure of the Szegő equation (25) with the C-function $F$. Recall that $AC(\mu)$ is defined in Theorem A as
$$AC(\mu)=\Bigl\{\zeta\in\mathbb T:\ \text{there exists a finite }\lim_{r\to1}F(r\zeta)=F(\zeta)\text{ and }\operatorname{Re}F(\zeta)>0\Bigr\}.$$
Put
$$S(\{a_n\})\stackrel{\text{def}}{=}\{\zeta\in\mathbb T:\ \text{there are no subordinate solutions of (25) at }\zeta\}.$$
Theorem 10. The following hold:
- $S\supset AC$;
- $S$ is an essential support of the a.c. part of $\mu$;
- $\mu_s(E)=0$ for each Borel $E\subset S$.

Proof. Let $\xi\in S^c$, the complement of $S$ on $\mathbb T$. In other words, there exists a subordinate solution of (25) at $\xi$. By Theorem 3, for some sequence $r_k\to1$ we have either $\lim_kF(r_k\xi)=F\in i\mathbb R$ or $\lim_k|F(r_k\xi)|=+\infty$. In each case $\xi\in AC^c$, and the inclusion follows. This means that $S$ is a carrier of $\mu_{ac}$.

Next, put $A=A(\mu)\stackrel{\text{def}}{=}\{\zeta\in\mathbb T:\ \text{there exists a finite }\lim_{r\to1}F(r\zeta)=F(\zeta)\}$. As is well known, $m(A)=1$. It is not hard to check that $m(S\setminus AC)=0$. Indeed, if $\xi\in A\setminus AC$, then $F(\xi)=\lim_{r\to1}F(r\xi)$ is a purely imaginary number. Hence, by Theorem 2, $\xi\in S^c$, that is, $S\cap(A\setminus AC)=A\cap(S\setminus AC)=\emptyset$, as needed.

Let now $G\subset S$ and $m(G)>0$. Decompose $G$ as $G=(G\cap AC)\cup(G\setminus AC)=G_1\cup G_2$. As we have just proved, $m(G_2)=0$, and hence $m(G_1)>0$ with $G_1\subset AC$. But $AC$ is known to be an essential support of $\mu_{ac}$ (see (i), Theorem A), and thereby $\mu_{ac}(G)\ge\mu_{ac}(G_1)>0$, which proves the second statement.

Finally, by (ii), Theorem A, $S(\mu)$ is a carrier of $\mu_s$, that is, $\mu_s(S(\mu)^c)=0$. But Theorem 2 claims that $S(\mu)\subset S^c$, or $S\subset S(\mu)^c$, and we are done.
Theorem 11. Let $\zeta\in S^c$. Then
$$\lim_{k\to\infty}\frac{1}{k+1}\sum_{n=0}^{k}\|T_n(\zeta)\|^2=+\infty.$$
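Proof. We proceed as in [21, Theorem 3.2]. It is clear that each solution of (25) is, up to a constant factor, of the form $\vec X(z)=\{\vec X(z,n)\}_{n\ge0}$ with
$$\vec X(z,n)=\vec X(z,n;\lambda,\vartheta)=T_n(z)\begin{pmatrix}\lambda&0\\0&1\end{pmatrix}\begin{pmatrix}\sin\vartheta\\\cos\vartheta\end{pmatrix},\qquad|\lambda|=1,\ \vartheta\in\mathbb R.\tag{64}$$
In particular,
$$\vec X\Bigl(z,n;1,\frac{\pi}{4}\Bigr)=\frac{1}{\sqrt2}\vec\Phi(z,n),\qquad\vec X\Bigl(z,n;1,\frac{3\pi}{4}\Bigr)=\frac{1}{\sqrt2}\vec\Psi(z,n).$$
By the condition, there exists a subordinate solution $\vec X(\zeta,\lambda,\vartheta)$ at the point $\zeta\in\mathbb T$. Pick another (linearly independent) solution $\vec X(\zeta,\lambda,\pi-\vartheta)$, and recall that $T_n(\zeta)$ is $J$-unitary, that is, $T_n^*(\zeta)JT_n(\zeta)=J$. Hence, by (64),
$$\vec X^*(\zeta,n;\lambda,\vartheta)\,J\,\vec X(\zeta,n;\lambda,\pi-\vartheta)=(\sin\vartheta,\cos\vartheta)\begin{pmatrix}\bar\lambda&0\\0&1\end{pmatrix}T_n^*JT_n\begin{pmatrix}\lambda&0\\0&1\end{pmatrix}\begin{pmatrix}\sin\vartheta\\-\cos\vartheta\end{pmatrix}=(\sin\vartheta,\cos\vartheta)\begin{pmatrix}\bar\lambda&0\\0&1\end{pmatrix}J\begin{pmatrix}\lambda&0\\0&1\end{pmatrix}\begin{pmatrix}\sin\vartheta\\-\cos\vartheta\end{pmatrix}=-1,$$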
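and we obtain
$$1\le\|\vec X(\zeta,n;\lambda,\vartheta)\|\,\|\vec X(\zeta,n;\lambda,\pi-\vartheta)\|.\tag{65}$$
Next, it is clear that $\|\vec X(\zeta,n;\lambda,\pi-\vartheta)\|\le\|T_n(\zeta)\|$, and hence
$$\frac{\|\vec X(\zeta,\lambda,\pi-\vartheta)\|_k^2}{k+1}=\frac{1}{k+1}\sum_{n=0}^{k}\|\vec X(\zeta,n;\lambda,\pi-\vartheta)\|^2\le\frac{1}{k+1}\sum_{n=0}^{k}\|T_n(\zeta)\|^2.$$
By (65) and the Schwarz inequality,
$$1\le\Bigl(\frac{1}{k+1}\sum_{n=0}^{k}\|\vec X(\zeta,n;\lambda,\vartheta)\|\,\|\vec X(\zeta,n;\lambda,\pi-\vartheta)\|\Bigr)^2\le\frac{\|\vec X(\zeta,\lambda,\vartheta)\|_k^2}{k+1}\cdot\frac{\|\vec X(\zeta,\lambda,\pi-\vartheta)\|_k^2}{k+1},$$
so that
$$\Bigl(\frac{\|\vec X(\zeta,\lambda,\vartheta)\|_k^2}{k+1}\Bigr)^{-1}\le\frac{\|\vec X(\zeta,\lambda,\pi-\vartheta)\|_k^2}{k+1}.$$
Thus we end up with the relation
$$\frac{\|\vec X(\zeta,\lambda,\pi-\vartheta)\|_k}{\|\vec X(\zeta,\lambda,\vartheta)\|_k}\le\frac{1}{k+1}\sum_{n=0}^{k}\|T_n(\zeta)\|^2.\tag{66}$$
The desired conclusion now follows from the subordinacy of $\vec X(\zeta,\lambda,\vartheta)$.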
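Inequality (65) and the Remark that follows rest on two facts, which can be checked numerically:
- $J$-unitarity, $T_n^*JT_n=J$;
- $\|T_n^{-1}(\zeta)\|=\|T_n(\zeta)\|$, a consequence of $|\det T_n(\zeta)|=1$ for $2\times2$ matrices.

This is a sketch under assumptions. The one-step Szegő matrix is taken as $(1-|a|^2)^{-1/2}\begin{pmatrix}z&-a\\-\bar az&1\end{pmatrix}$ and $J=\operatorname{diag}(1,-1)$. With this sign choice the pairing $\vec X^*(\vartheta)J\vec X(\pi-\vartheta)$ comes out as $+1$ rather than $-1$; flipping the sign of $J$ flips it, and only its modulus enters (65). Helper names are hypothetical.

```python
import cmath
import math
import random

J = [[1, 0], [0, -1]]  # sign convention ASSUMED here; the text's J may differ by a sign


def szego_matrix(z, a):
    """ASSUMED one-step Szegő matrix (1-|a|^2)^(-1/2) [[z, -a], [-conj(a) z, 1]] (stand-in for (6))."""
    r = 1 / math.sqrt(1 - abs(a) ** 2)
    return [[r * z, -r * a], [-r * a.conjugate() * z, r]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def adjoint(A):
    """Conjugate transpose of a 2x2 matrix."""
    return [[A[j][i].conjugate() for j in range(2)] for i in range(2)]


def inverse(A):
    d = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]


def transfer_matrix(z, coeffs):
    T = [[1, 0], [0, 1]]
    for a in coeffs:
        T = matmul(szego_matrix(z, a), T)
    return T


def op_norm(A):
    """Largest singular value of a 2x2 matrix: s1^2 + s2^2 = ||A||_F^2 and s1 * s2 = |det A|."""
    fro2 = sum(abs(x) ** 2 for row in A for x in row)
    det = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    return math.sqrt((fro2 + math.sqrt(max(fro2 ** 2 - 4 * det ** 2, 0.0))) / 2)


def solution(T, lam, theta):
    """X(n; lambda, theta) = T_n diag(lambda, 1) (sin theta, cos theta)^T, as in (64)."""
    v = [lam * math.sin(theta), math.cos(theta)]
    return [T[0][0] * v[0] + T[0][1] * v[1], T[1][0] * v[0] + T[1][1] * v[1]]


def j_pairing(x, y):
    """x^* J y for the diagonal J above."""
    return x[0].conjugate() * y[0] - x[1].conjugate() * y[1]


random.seed(2)
coeffs = [cmath.rect(random.uniform(0, 0.6), random.uniform(0, 2 * math.pi)) for _ in range(8)]
T = transfer_matrix(cmath.exp(2.1j), coeffs)
lam, theta = cmath.exp(0.4j), 0.9
x, y = solution(T, lam, theta), solution(T, lam, math.pi - theta)
print(abs(j_pairing(x, y)), op_norm(T), op_norm(inverse(T)))
```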
Remark. Let us single out an important step in the argument above. For an arbitrary solution (64) we have
$$\begin{pmatrix}\lambda&0\\0&1\end{pmatrix}\begin{pmatrix}\sin\vartheta\\\cos\vartheta\end{pmatrix}=T_n^{-1}(\zeta)\vec X(\zeta,n).$$
Since $|\det T_n(\zeta)|=1$, we have $\|T_n^{-1}(\zeta)\|=\|T_n(\zeta)\|$, and hence
$$1\le\|T_n(\zeta)\|\,\|\vec X(\zeta,n)\|,\qquad\|T_n(\zeta)\|^{-2}\le\|\vec X(\zeta,n)\|^2.$$
In particular, if $\sum_n\|T_n(\zeta)\|^{-2}=+\infty$, the Szegő equation has no $\ell^2$-solutions (cf. [21, Theorem 1.6]).
The following statement relates the behavior of transfer matrices to the fine structure of the spectral measure (cf. [21, Theorem 1.1]).
Theorem 12. Denote by
$$B\stackrel{\text{def}}{=}\Bigl\{\zeta\in\mathbb T:\ \liminf_{k\to\infty}\frac{1}{k+1}\sum_{n=0}^{k}\|T_n(\zeta)\|^2<\infty\Bigr\}.\tag{67}$$
Then $B$ is an essential support of the a.c. part of $\mu$, and $\mu_s(E)=0$ for each Borel $E\subset B$.
Proof. By Theorem 11, $B\subset S$. Hence, in view of Theorem 10, we only have to make sure that $\mu_{ac}(B^c)=0$. Recall (see Sect. 2) that
$$T_n(z)I_0=\begin{pmatrix}\varphi_n(z)&\psi_n(z)\\\varphi_n^*(z)&-\psi_n^*(z)\end{pmatrix},\qquad I_0=\begin{pmatrix}1&1\\1&-1\end{pmatrix},\qquad I_0^2=2I,$$
where $\varphi_n$ ($\psi_n$) are orthonormal with respect to $\mu$ ($\mu_-$), respectively. For $\zeta\in\mathbb T$ we then have
$$\|T_n(\zeta)\|^2\le\|T_n(\zeta)I_0\|^2\le2\bigl(|\varphi_n(\zeta)|^2+|\psi_n(\zeta)|^2\bigr).$$
Take $\sigma=\min(\mu,\mu_-)$ and integrate the latter inequality over $\mathbb T$:
$$\int_{\mathbb T}\|T_n(\zeta)\|^2\,d\sigma\le2\Bigl(\int_{\mathbb T}|\varphi_n(\zeta)|^2\,d\mu+\int_{\mathbb T}|\psi_n(\zeta)|^2\,d\mu_-\Bigr)=4,\tag{68}$$
so that
$$\int_{\mathbb T}\frac{1}{k+1}\sum_{n=0}^{k}\|T_n(\zeta)\|^2\,d\sigma\le4.$$
By Fatou's lemma,
$$\liminf_{k\to\infty}\frac{1}{k+1}\sum_{n=0}^{k}\|T_n(\zeta)\|^2<\infty\tag{69}$$
a.e. with respect to $\sigma$. Now Theorem 9 comes into play: $\sigma=\min(\mu,\mu_-)\sim\mu_{ac}$, that is, (69) holds a.e. with respect to $\mu_{ac}$. The proof is complete.
Remark. A more general result is actually proved in [21, Theorem 3.10]. Let $(X,\mathcal A,\nu)$ be a measurable space with a finite Borel measure $\nu$, and let $\{f_n\}_{n\ge0}$ be a bounded sequence in $L^2(X,\mathcal A,\nu)$. Then, for each $\delta>0$,
$$\limsup_{k\to\infty}\frac{1}{(k+1)(\log(k+1))^{1+\delta}}\sum_{n=0}^{k}|f_n(x)|^2<\infty$$
holds $\nu$-a.e. Inequality (68) now shows that
$$\limsup_{k\to\infty}\frac{1}{(k+1)(\log(k+1))^{1+\delta}}\sum_{n=0}^{k}\|T_n(\zeta)\|^2<\infty$$
holds $\mu_{ac}$-a.e. In particular, if $\|T_n\|$ (or $|\varphi_n|$) grows exponentially fast on a Borel set $E$, then $\mu$ is singular on $E$.
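The proof of Theorem 12 uses the pointwise inequality $\|T_n(\zeta)\|^2\le\|T_n(\zeta)I_0\|^2\le2(|\varphi_n(\zeta)|^2+|\psi_n(\zeta)|^2)$. It can be sanity-checked by reading $\varphi_n,\varphi_n^*,\psi_n,-\psi_n^*$ off the columns of $T_n(\zeta)I_0$. The sketch assumes the hypothetical one-step normalization $(1-|a|^2)^{-1/2}\begin{pmatrix}z&-a\\-\bar az&1\end{pmatrix}$ as a stand-in for (6). It also confirms $|\varphi_n^*(\zeta)|=|\varphi_n(\zeta)|$ on $\mathbb T$. This is the fact that turns the Frobenius norm of $T_nI_0$ into $2(|\varphi_n|^2+|\psi_n|^2)$.

```python
import cmath
import math
import random

I0 = [[1, 1], [1, -1]]


def szego_matrix(z, a):
    """ASSUMED one-step Szegő matrix (1-|a|^2)^(-1/2) [[z, -a], [-conj(a) z, 1]] (stand-in for (6))."""
    r = 1 / math.sqrt(1 - abs(a) ** 2)
    return [[r * z, -r * a], [-r * a.conjugate() * z, r]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transfer_matrix(z, coeffs):
    T = [[1, 0], [0, 1]]
    for a in coeffs:
        T = matmul(szego_matrix(z, a), T)
    return T


def op_norm(A):
    """Largest singular value of a 2x2 matrix: s1^2 + s2^2 = ||A||_F^2 and s1 * s2 = |det A|."""
    fro2 = sum(abs(x) ** 2 for row in A for x in row)
    det = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    return math.sqrt((fro2 + math.sqrt(max(fro2 ** 2 - 4 * det ** 2, 0.0))) / 2)


def theorem12_bound(zeta, coeffs):
    """Return (||T_n||^2, 2(|phi_n|^2 + |psi_n|^2), |phi_n|, |phi_n^*|) at zeta.

    phi_n, phi_n^*, psi_n are read off T_n I_0 = [[phi_n, psi_n], [phi_n^*, -psi_n^*]].
    """
    T = transfer_matrix(zeta, coeffs)
    M = matmul(T, I0)
    phi, phi_star, psi = M[0][0], M[1][0], M[0][1]
    return op_norm(T) ** 2, 2 * (abs(phi) ** 2 + abs(psi) ** 2), abs(phi), abs(phi_star)


random.seed(4)
coeffs = [cmath.rect(random.uniform(0, 0.8), random.uniform(0, 2 * math.pi)) for _ in range(6)]
print(theorem12_bound(cmath.exp(1.3j), coeffs))
```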
We could equally well have considered the transfer matrices $T_{n,m}$. It is clear from the chain identity $T_n=T_{n,m}T_m$, $n>m$, and from $\|T_n^{-1}\|=\|T_n\|$ that
$$B(m)\stackrel{\text{def}}{=}\Bigl\{\zeta\in\mathbb T:\ \liminf_{k\to\infty}\frac{1}{k+1}\sum_{n=m}^{m+k}\|T_{n,m}(\zeta)\|^2<\infty\Bigr\}=B.$$
Next, by (68),
$$\int_{\mathbb T}\|T_{n,m}(\zeta)\|\,d\sigma\le\Bigl(\int_{\mathbb T}\|T_n(\zeta)\|^2\,d\sigma\Bigr)^{1/2}\Bigl(\int_{\mathbb T}\|T_m(\zeta)\|^2\,d\sigma\Bigr)^{1/2}\le4,\tag{70}$$
and again Fatou's lemma leads to the following conclusion (cf. [21, Theorem 1.2]).
Theorem 13. Let $n_j$, $m_j$ be arbitrary sequences of positive integers which tend to infinity, and let
$$B_1\stackrel{\text{def}}{=}\{\zeta\in\mathbb T:\ \liminf_{j\to\infty}\|T_{n_j,m_j}(\zeta)\|<\infty\}.$$
Then $B_1$ is a carrier of $\mu_{ac}$.

It may be worth comparing the latter result with Rakhmanov's well-known lemma. It is easy to see (by computing the eigenvalues of the matrix $\mathcal T_n^*\mathcal T_n$) that the one-step Szegő matrices satisfy
$$\|\mathcal T_n(\zeta)\|^2=\frac{1+|a_n|}{1-|a_n|},\qquad\zeta\in\mathbb T.$$
Hence $\limsup_n\|\mathcal T_n\|=+\infty$ is equivalent to Rakhmanov's condition $\limsup_n|a_n|=1$.⁴ By taking an appropriate sequence $n_j$ and $m_j=n_j-1$ in Theorem 13, the set $B_1$ should be empty. This is quite consistent with Rakhmanov's lemma, which states that under Rakhmanov's condition the spectral measure $\mu$ is singular. Unfortunately, Rakhmanov's lemma does not follow from Theorem 13, which is proved under the opposite assumption (43).

Theorem 12 can be applied to the study of Szegő equations with reflection coefficients that have regular behavior at infinity.
Definition. A sequence $\{a_n\}\in\mathcal B$ has a right limit $\{\tilde a_n\}\in\mathcal B$ if there exists a sequence of positive integers $m_n\in\mathbb N$ such that
$$\lim_{n\to\infty}a_{m_n+j}=\tilde a_j,\qquad j=1,2,\dots.$$
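The identity $\|T_n(\zeta)\|^2=(1+|a_n|)/(1-|a_n|)$ for the one-step Szegő matrix, $\zeta\in\mathbb T$, is quoted above just before Rakhmanov's condition. It is easy to confirm numerically. The sketch assumes the one-step matrix $(1-|a|^2)^{-1/2}\begin{pmatrix}z&-a\\-\bar az&1\end{pmatrix}$, a hypothetical stand-in for (6). It computes the largest singular value of a $2\times2$ matrix from its Frobenius norm and $|\det|$, which avoids any linear-algebra dependency. The right-hand side does not depend on $\zeta$, which is why $\limsup_n\|T_n\|=\infty$ for the one-step matrices is the same as $\limsup_n|a_n|=1$.

```python
import cmath
import math
import random


def szego_matrix(z, a):
    """ASSUMED one-step Szegő matrix (1-|a|^2)^(-1/2) [[z, -a], [-conj(a) z, 1]] (stand-in for (6))."""
    r = 1 / math.sqrt(1 - abs(a) ** 2)
    return [[r * z, -r * a], [-r * a.conjugate() * z, r]]


def op_norm(A):
    """Largest singular value of a 2x2 matrix: s1^2 + s2^2 = ||A||_F^2 and s1 * s2 = |det A|."""
    fro2 = sum(abs(x) ** 2 for row in A for x in row)
    det = abs(A[0][0] * A[1][1] - A[0][1] * A[1][0])
    return math.sqrt((fro2 + math.sqrt(max(fro2 ** 2 - 4 * det ** 2, 0.0))) / 2)


def norm_identity_gap(z, a):
    """Relative gap between ||S(z, a)||^2 and (1 + |a|) / (1 - |a|)."""
    expected = (1 + abs(a)) / (1 - abs(a))
    return abs(op_norm(szego_matrix(z, a)) ** 2 - expected) / expected


random.seed(0)
worst = max(
    norm_identity_gap(cmath.exp(1j * random.uniform(0, 2 * math.pi)),
                      cmath.rect(random.uniform(0, 0.95), random.uniform(0, 2 * math.pi)))
    for _ in range(1000)
)
print(worst)
```

For $|z|\ne1$ the identity fails. For instance, at $z=0.5$ and $a=0.3$ the relative gap is far from zero, so the restriction to $\zeta\in\mathbb T$ matters.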
An asymptotically periodic sequence $\{a_n\}$, which satisfies $\lim_{n\to\infty}a_{nN+j}=\tilde a_j$, $j=1,2,\dots,N$, clearly has a (periodic) right limit. Let $\{a_n\}$ have a right limit $\{\tilde a_n\}$. Denote by $\mu$ ($\tilde\mu$) and $T_n$ ($\tilde T_n$) the spectral measure and transfer matrices for $\{a_n\}$ ($\{\tilde a_n\}$), respectively. The following result is obtained in [21, Theorem 1.4] in the setting of Schrödinger operators.
Theorem 14. Let $\tilde E$ be a Borel set of positive Lebesgue measure on $\mathbb T$ such that $\tilde\mu$ is purely singular on $\tilde E$, that is, $\tilde\mu_{ac}(\tilde E)=0$. Then so is the measure $\mu$.
⁴ In the theory of Schrödinger operators, a similar condition is known as high barriers.
Proof. Let $\mathcal T_m$ denote the one-step Szegő matrices. It is clear by the definition of the right limit that
$$\lim_{n\to\infty}T_{m_n+j,m_n}(\zeta)=\lim_{n\to\infty}\mathcal T_{m_n+j}(\zeta)\mathcal T_{m_n+j-1}(\zeta)\cdots\mathcal T_{m_n+1}(\zeta)=\tilde T_j(\zeta)$$
and
$$\lim_{n\to\infty}\frac{1}{k+1}\sum_{j=0}^{k}\|T_{m_n+j,m_n}(\zeta)\|^2=\frac{1}{k+1}\sum_{j=0}^{k}\|\tilde T_j(\zeta)\|^2.$$
By Fatou's lemma and (70) we have
$$\int_{\mathbb T}\frac{1}{k+1}\sum_{j=0}^{k}\|\tilde T_j(\zeta)\|^2\,d\sigma\le4.$$
Applying Fatou's lemma once more, as in Theorem 12, gives $\mu_{ac}(\tilde B^c)=0$, where $\tilde B$ is the set (67) for the measure $\tilde\mu$. Next,
$$\mu_{ac}(\tilde E)=\mu_{ac}(\tilde E\cap\tilde B)+\mu_{ac}(\tilde E\setminus\tilde B).$$
The second term on the right has just been shown to be zero. As for the first one, note that $m(\tilde E\cap\tilde B)=0$ by the assumption of the theorem and Theorem 12 applied to $\tilde\mu$. Hence the first term is also zero, as was to be proved.
Remark. Theorem 14 says that a Borel set $G$ is a carrier of $\mu_{ac}$ as long as it is one for $\tilde\mu_{ac}$. As far as the measures $\mu$ and $\tilde\mu$ themselves are concerned, the situation is, in a sense, the opposite. More precisely, for the derived sets of their supports the inclusion $(\operatorname{supp}\tilde\mu)'\subset(\operatorname{supp}\mu)'$ holds.
Example. Let $\lim_{n\to\infty}\tilde a_n=a$, $0<|a|<1$. It is well known (cf. [8, Theorem 1′]) that the arc $M_a\stackrel{\text{def}}{=}\{e^{it}:|t|<\alpha=2\arcsin|a|\}$ is free from the a.c. part of the spectral measure $\tilde\mu$. It turns out that the same conclusion regarding $\mu$ remains valid if a sequence $\{a_n\}$ tends to $a$ along a sufficiently large set of indices. More precisely, let $M=\bigcup_{j\ge1}[p_j,p_j+q_j]\subset\mathbb N$ be a set of indices with $p_{j+1}>p_j+q_j$ and $q_j\to\infty$. Assume that $a_n\to a$ as $n\to\infty$, $n\in M$. Then it is easy to check that the constant sequence $\{a,a,\dots\}$ is a right limit of $\{a_n\}$, and $\mu_{ac}(M_a)=0$ thanks to Theorem 14.
We complete the section with a result concerning integral bounds of transfer matrices (cf. [21, Theorem 1.3]). Suppose that
$$\sup_n\int_{\mathbb T}\|T_n(\zeta)\|^2\,dm<\infty.$$
Then Fatou's lemma implies $m(B)=1$. Nevertheless, $\mu$ need not be purely a.c., since the set $B^c$ of Lebesgue measure zero may carry the singular part of $\mu$.
Theorem 15. Suppose that for some $p>2$ and for an arc $\mathcal N$,
$$\sup_n\int_{\mathcal N}\|T_n(\zeta)\|^p\,dm<\infty.\tag{71}$$
Then all Aleksandrov measures $\mu_\lambda$ are purely absolutely continuous on each compact subset of $\mathcal N$.
Proof. We see from (60) that $\|T_n(\zeta)\|=\|T_n(\zeta,\lambda)\|$, so it suffices to deal with one measure, say $\mu$. By definition,
$$\begin{pmatrix}\varphi_n(\zeta)\\\varphi_n^*(\zeta)\end{pmatrix}=T_n(\zeta)\begin{pmatrix}1\\1\end{pmatrix}.$$
Since $\|T_n^{-1}(\zeta)\|=\|T_n(\zeta)\|$ and $|\varphi_n^*(\zeta)|=|\varphi_n(\zeta)|$ on $\mathbb T$, it follows that
$$2=\Bigl\|\begin{pmatrix}1\\1\end{pmatrix}\Bigr\|^2=\Bigl\|T_n^{-1}(\zeta)\begin{pmatrix}\varphi_n(\zeta)\\\varphi_n^*(\zeta)\end{pmatrix}\Bigr\|^2\le\|T_n(\zeta)\|^2\Bigl\|\begin{pmatrix}\varphi_n(\zeta)\\\varphi_n^*(\zeta)\end{pmatrix}\Bigr\|^2=2\|T_n(\zeta)\|^2|\varphi_n(\zeta)|^2,$$
and hence
$$\frac{1}{|\varphi_n(\zeta)|^p}\le\|T_n(\zeta)\|^p.$$
By the assumption, there is a sequence $n_j$ of integers such that
$$\sup_j\int_{\mathcal N}\frac{dm}{|\varphi_{n_j}(\zeta)|^p}<\infty,\qquad p>2.\tag{72}$$
Without loss of generality, assume that the endpoints of $\mathcal N$ are not mass points of $\mu$. Then, by Rakhmanov's theorem,
$$w^*\text{-}\lim_{j}\,|\varphi_{n_j}(\zeta)|^{-2}\,dm=\mu\quad\text{on }\mathcal N.$$
The latter, coupled with (72), yields the desired absolute continuity of the limit measure by a standard result from real analysis (see, e.g., [20, Chapter IV.D]).
7. Sequences of Bounded Variation
The theory of orthogonal polynomials on the unit circle with asymptotically periodic reflection coefficients was initiated by Ya. L. Geronimus in the forties. It was considerably extended by F. Peherstorfer and R. Steinbauer in the nineties (cf. [23, 24]). Our goal here is to demonstrate the transfer matrix method on an important subclass of reflection coefficients, namely those of bounded variation. This section is intended for illustration only, so the exposition is somewhat fragmentary.
Definition. Given $N\in\mathbb N$, we say that a sequence $\{a_n\}\in\mathcal B$ has $N$-type bounded variation if
$$\sum_{n=0}^{\infty}|a_n-a_{n+N}|<\infty.\tag{73}$$
It is clear that $\{a_n\}$ is asymptotically periodic, that is, $\lim_n a_{nN+j}=\tilde a_j$ exists for $j=1,2,\dots,N$, and $|\tilde a_j|\le1$. We assume that (43) holds, so that $0\le|\tilde a_j|<1$ for all $j$. The reasoning below relies heavily on the following result due to R. Kooman [19, Theorem 1.3], adapted to our setting.
Theorem. Let $A_n(\zeta)$ and $A(\zeta)$ be $m\times m$ matrix valued functions on $\mathbb T$, and let $\{r_k(\zeta,n)\}$, $\{r_k(\zeta)\}$ be the eigenvalues of $A_n$ and $A$, respectively. Assume that:
- $A_n\to A$ uniformly on $\mathbb T$;
- $\sup_{\zeta\in\mathbb T}\sum_{n=0}^{\infty}\|A_n(\zeta)-A_{n+1}(\zeta)\|<\infty$;
- for an open set $E\subset\mathbb T$ and $\zeta\in E$, all $r_k$ are distinct and unimodular, and $|r_k(\cdot,n)|=1$ for large enough $n$ and all $k=1,2,\dots,m$.

Then each solution of the matrix equation $X_n=A_nX_{n-1}$ is uniformly bounded on compact subsets of $E$.
Note that in Kooman's original theorem $A_n$ and $A$ are constant matrices, but the uniform dependence on a parameter can easily be traced from the proof. We begin with the simple case $\tilde a_1=\dots=\tilde a_N=a$, $0\le|a|<1$. Fix some $j$, $1\le j\le N$, and put
$$A_n(\zeta)=A_n^{(j)}(\zeta)=T_{nN+j,(n-1)N+j}(\zeta),\qquad X_n(\zeta)=X_n^{(j)}(\zeta)=X_{nN+j}(\zeta).\tag{74}$$
We wish to show that all solutions of the matrix equations $X_n^{(j)}=A_n^{(j)}X_{n-1}^{(j)}$ are uniformly bounded inside a certain set on $\mathbb T$. It is clear from the expression (6) for the Szegő matrices that now
$$A(\zeta)\stackrel{\text{def}}{=}\lim_{n\to\infty}A_n^{(j)}(\zeta)=\mathcal T^N(\zeta,a)$$
exists for each $j=1,2,\dots,N$ and does not depend on $j$. Here $\mathcal T(\zeta,a)$ is the one-step Szegő matrix with reflection coefficient $a$. Moreover,
$$\sup_{\zeta\in\mathbb T}\sum_{n=0}^{\infty}\|A_n^{(j)}(\zeta)-A_{n+1}^{(j)}(\zeta)\|<\infty$$
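under conditions (43) and (73). The spectrum $\{r_1,r_2\}$ of $\mathcal T$ (and hence of $\mathcal T^N$) is known explicitly (cf. [14, Sect. 2]). For the arc
$$\zeta=e^{i\vartheta}\in\mathcal N\stackrel{\text{def}}{=}\{\zeta:\alpha\le\vartheta\le2\pi-\alpha\},\qquad\sin(\alpha/2)=|a|,$$
we have
$$r_{1,2}(\zeta)=\frac{e^{i\vartheta/2}}{\sqrt{1-|a|^2}}\Bigl(\cos\frac{\vartheta}{2}\pm i\sqrt{\sin\frac{\vartheta+\alpha}{2}\,\sin\frac{\vartheta-\alpha}{2}}\Bigr)=e^{i(\frac{\vartheta}{2}\pm\omega)},\qquad\cos\omega\stackrel{\text{def}}{=}\frac{\cos\frac{\vartheta}{2}}{\cos\frac{\alpha}{2}},\quad0\le\omega\le\pi.$$
Hence $|r_1|=|r_2|=1$, and $r_1\ne r_2$ off the endpoints of this arc. For the spectrum $\{r_1^N,r_2^N\}$ of $\mathcal T^N$ it follows that
$$r_1^N(\zeta)=r_2^N(\zeta)\iff\omega=\frac{l\pi}{N}\iff\cos\frac{\vartheta_l}{2}=\cos\frac{\alpha}{2}\cos\frac{l\pi}{N},\qquad l=0,1,\dots,N.$$
Thus
$$|r_1^N|=|r_2^N|=1,\qquad r_1^N\ne r_2^N,\qquad\zeta\in\mathcal N^{(N)}\stackrel{\text{def}}{=}\mathcal N\setminus\{e^{i\vartheta_l}:l=0,1,\dots,N\}.\tag{75}$$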
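Two facts lend themselves to a quick numerical check:
- for constant coefficients, the explicit spectrum $r_{1,2}=e^{i(\vartheta/2\pm\omega)}$;
- the real-valuedness of $\tau_n(\zeta)=\zeta^{-N/2}\operatorname{tr}A_n(\zeta)$.

The sketch below assumes the hypothetical one-step normalization $(1-|a|^2)^{-1/2}\begin{pmatrix}z&-a\\-\bar az&1\end{pmatrix}$. Its trace $(z+1)/\sqrt{1-|a|^2}$ and determinant $z$ do not depend on where $a$ and $\bar a$ are placed, so the eigenvalues are the same for either convention. Eigenvalues are computed from the characteristic equation $w^2-(\operatorname{tr})\,w+\det=0$. Helper names are illustrative.

```python
import cmath
import math
import random


def szego_matrix(z, a):
    """ASSUMED one-step Szegő matrix (1-|a|^2)^(-1/2) [[z, -a], [-conj(a) z, 1]] (stand-in for (6))."""
    r = 1 / math.sqrt(1 - abs(a) ** 2)
    return [[r * z, -r * a], [-r * a.conjugate() * z, r]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def eigenvalues(A):
    """Both roots of w^2 - tr(A) w + det(A) = 0."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def predicted_eigenvalues(theta, a):
    """e^{i(theta/2 +- omega)} with cos(omega) = cos(theta/2) / cos(alpha/2), sin(alpha/2) = |a|.

    Meant for alpha <= theta <= 2*pi - alpha; the clamp only guards against rounding.
    """
    alpha = 2 * math.asin(abs(a))
    c = max(-1.0, min(1.0, math.cos(theta / 2) / math.cos(alpha / 2)))
    omega = math.acos(c)
    return cmath.exp(1j * (theta / 2 + omega)), cmath.exp(1j * (theta / 2 - omega))


def tau_n(theta, coeffs):
    """zeta^{-N/2} tr(S(zeta, a_N) ... S(zeta, a_1)) at zeta = e^{i theta}, N = len(coeffs)."""
    zeta = cmath.exp(1j * theta)
    A = [[1, 0], [0, 1]]
    for a in coeffs:
        A = matmul(szego_matrix(zeta, a), A)
    return cmath.exp(-0.5j * len(coeffs) * theta) * (A[0][0] + A[1][1])


a = 0.5 + 0j  # alpha = 2 arcsin(1/2) = pi/3
theta = 2.0
print(eigenvalues(szego_matrix(cmath.exp(1j * theta), a)), predicted_eigenvalues(theta, a))
```

Inside the gap $|\vartheta|<\alpha$, one eigenvalue leaves the unit circle. This matches the arc $M_a$ in the Example after Theorem 14.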
To be able to apply Kooman's theorem, let us examine the spectrum $\{r_1(n),r_2(n)\}$ of $A_n$ (74). The characteristic equation for $A_n$ is
$$w^2-\operatorname{tr}A_n(\zeta)\,w+\det A_n(\zeta)=w^2-\operatorname{tr}A_n(\zeta)\,w+\zeta^N=0.$$
The matrix $A_n$ is known to be $J$-unitary on $\mathbb T$ (see Sect. 2), that is, $A_n^*=JA_n^{-1}J$. Hence
$$\overline{\operatorname{tr}A_n(\zeta)}=\operatorname{tr}A_n^{-1}(\zeta)=\frac{\operatorname{tr}A_n(\zeta)}{\det A_n(\zeta)}=\bar\zeta^N\operatorname{tr}A_n(\zeta).$$
We see that the function $\tau_n(\zeta)=\zeta^{-N/2}\operatorname{tr}A_n(\zeta)$ is real valued on $\mathbb T$. Put
$$h_k(z,n)\stackrel{\text{def}}{=}z^{-N/2}r_k(z,n)=\frac{\tau_n(z)\pm\sqrt{\tau_n^2(z)-4}}{2}.$$
On the set $G(n)\stackrel{\text{def}}{=}\{\zeta:\tau_n^2<4\}\subset\mathbb T$ the equalities
$$h_2(\zeta,n)=\overline{h_1(\zeta,n)},\qquad r_2(\zeta,n)=\zeta^N\,\overline{r_1(\zeta,n)},\qquad|r_k(\zeta,n)|=1$$
hold (recall that $|r_1(\zeta,n)r_2(\zeta,n)|=|\det A_n|=1$). As $n\to\infty$, the sets $G(n)$ tend to $G\stackrel{\text{def}}{=}\{\zeta:\tau^2<4\}$, where $\tau(\zeta)=\zeta^{-N/2}\operatorname{tr}A(\zeta)$. Note that, unlike $G(n)$ and $\tau_n$, the limits $G$ and $\tau$ are the same for all $j=1,2,\dots,N$. It remains to observe that $G$ agrees with $\mathcal N^{(N)}$. By Kooman's theorem and Theorem 15, the spectral measure is absolutely continuous on each compact subset of $\mathcal N^{(N)}$.
In the general situation we do not have an explicit description of the limit set (such as $\mathcal N^{(N)}$ above). Extend the limit sequence $\tilde a_1,\tilde a_2,\dots,\tilde a_N$ to an $N$-periodic sequence. We end up with the periodic sequence of Szegő matrices $\{\tilde{\mathcal T}_m\}$, $\tilde{\mathcal T}_{N+m}=\tilde{\mathcal T}_m$, and
$$\lim_{n\to\infty}A_n^{(j)}(\zeta)=A^{(j)}(\zeta)=\tilde T_{N+j,j}(\zeta),\qquad j=1,2,\dots,N.$$
Although the $A^{(j)}$ now depend on $j$, periodicity shows that they are cyclic permutations of the factors of the product $A=\tilde T_N$. Since all matrices are nonsingular, the spectrum of $A^{(j)}$ does not depend on $j$. In particular, $\operatorname{tr}A^{(j)}=\operatorname{tr}A$.
As above, we can define the set $G\stackrel{\text{def}}{=}\{\zeta:\tau^2<4\}$, which is a finite union of open arcs. For the eigenvalues $\{r_1,r_2\}$ of $A$, the relations
$$|r_1(\zeta)|=|r_2(\zeta)|=1,\qquad r_1(\zeta)\ne r_2(\zeta)$$
hold on $G$. Again, the sets $G(n)$ tend to $G$ as $n\to\infty$, and the conditions of Kooman's theorem are satisfied. Now Theorem 15 shows that the spectral measure is absolutely continuous on each compact subset of $G$.
References
1. Aleksandrov, A.B.: Multiplicity of boundary values of inner functions. Izv. AN Arm SSR 22, 490–503 (1987)
2. Cima, J.A. and Matheson, A.L.: Essential norms of composition operators and Aleksandrov measures. Pacific J. Math. 179, 59–64 (1997)
3. Donoghue, W.F.: On the perturbation of spectra. Comm. Pure Appl. Math. 18, 559–579 (1965)
4. Dym, H.: J Contractive Matrix Functions, Reproducing Kernel Hilbert Spaces and Interpolation. In: Am. Math. Soc. Regional Conference Series, Vol. 71, Providence, RI: Am. Math. Soc., 1989
5. Erdélyi, T., Geronimo, J.S., Nevai, P., and Zhang, J.: A simple proof of “Favard's Theorem” on the unit circle. Atti. Sem. Mat. Fis. Univ. Modena 29, 41–46 (1991)
6. Geronimo, J.S.: Polynomials orthogonal on the unit circle with random recurrence coefficients. Lecture Notes in Math. 1550, Berlin–Heidelberg–New York: Springer, 1992, pp. 43–61
7. Geronimo, J.S. and Teplyaev, A.: A difference equation arising from the Trigonometric moment problem having random reflection coefficients - An operator Theoretic Approach. J. Funct. Anal. 123, 12–45 (1994)
8. Geronimus, Ya.L.: On the character of the solutions of the moment problem in the case of limit-periodic associated fraction. Izv. Akad. Nauk SSSR Ser. Mat. 5, 203–210 (1941) (Russian)
9. Geronimus, Ya.L.: Orthogonal Polynomials. New York: Consultants Bureau, 1961
10. Geronimus, Ya.L.: Polynomials orthogonal on a circle and their applications. In: Series and Approximations, Providence, RI: Am. Math. Soc. Transl. (1) 3, 1962, pp. 1–78
11. Gilbert, D.J. and Pearson, D.B.: On subordinacy and analysis of the spectrum of one-dimensional Schrödinger operators. J. Math. Anal. Appl. 128, 30–56 (1987)
12. Golinskii, L.: Schur functions, Schur parameters and orthogonal polynomials on the unit circle. Zeit. für Anal. Anwend. 12, 457–469 (1993)
13. Golinskii, L., Nevai, P. and Van Assche, W.: Perturbation of orthogonal polynomials on an arc of the unit circle. J. Approx. Theory 83, 392–422 (1995)
14. Golinskii, L., Nevai, P., Pinter, F. and Van Assche, W.: Perturbation of orthogonal polynomials on an arc of the unit circle II. J. Approx. Theory 96, 1–32 (1999)
15. Grenander, U. and Szegő, G.: Toeplitz Forms and Their Applications. Berkeley: University of California Press, 1958; 2nd edition: New York: Chelsea Publishing Company, 1984
16. Khan, S. and Pearson, D.B.: Subordinacy and spectral theory for infinite matrices. Helv. Phys. Acta 65, 505–527 (1992)
17. Khrushchev, S.: Schur's algorithm, orthogonal polynomials and convergence of Wall's continued fractions in L²(T). J. Approx. Theory 108, 161–248 (2001)
18. Khrushchev, S.V., Nikol'skii, N.K. and Pavlov, B.S.: Unconditional bases of exponentials and of reproducing kernels. Lect. Notes Math. 864, Berlin–Heidelberg–New York: Springer, 1981, pp. 214–335
19. Kooman, R.J.: Asymptotic behaviour of solutions of linear recurrences and sequences of Möbius transformations. J. Approx. Theory 93, 1–58 (1998)
20. Koosis, P.: Introduction to H^p Spaces. London–New York: Cambridge University Press, 1980
21. Last, Y. and Simon, B.: Eigenfunctions, transfer matrices and absolutely continuous spectrum of one-dimensional Schrödinger operators. Invent. Math. 135, 329–367 (1999)
22. Peherstorfer, F.: A special class of polynomials orthogonal on the unit circle including the associated polynomials. Constr. Approx. 12, 161–185 (1996)
23. Peherstorfer, F. and Steinbauer, R.: Asymptotic behaviour of orthogonal polynomials on the unit circle with asymptotically periodic recurrence coefficients. J. Approx. Theory 88, 316–353 (1997)
24. Peherstorfer, F. and Steinbauer, R.: Orthogonal polynomials on the circumference and arcs of the circumference. J. Approx. Theory 102, 96–119 (2000)
25. Rahmanov, E.A.: On the asymptotics of the ratio of orthogonal polynomials, II. Math. USSR-Sb. 46, 105–117 (1983); Russian original: Mat. Sb. 118 (160), 104–117 (1982)
26. Shapiro, J.E.: Aleksandrov measures used in essential norm inequalities for composition operators. J. Operator Theory 40, 133–146 (1998)
27. Simon, B.: Spectral analysis of rank one perturbations and application. Proc. Mathematical Quantum Theory II: Schrödinger Operators, J. Feldman, R. Froese, L.M. Rosen, Eds. CRM Proc. Lecture Notes 8, Providence, RI: Am. Math. Soc., 1995, pp. 109–149
28. Simon, B.: Bounded eigenfunctions and absolutely continuous spectra for one-dimensional Schrödinger operators. Proc. Am. Math. Soc. 124, 3361–3369 (1996)
29. Simon, B. and Wolff, T.: Singular continuous spectrum under rank one perturbation and localization for random Hamiltonians. Comm. Pure Appl. Math. 39, 75–90 (1986)
30. Szegő, G.: Orthogonal Polynomials. (4th edition), Am. Math. Soc. Colloq. Publ. 23, Providence, RI: Am. Math. Soc., 1975
31. Titchmarsh, E.C.: Eigenfunction Expansions Associated with Second-Order Differential Equations. Oxford: Oxford Clarendon Press, 1946
Communicated by B. Simon
Commun. Math. Phys. 223, 261 – 288 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
A Spin-Statistics Theorem for Quantum Fields on Curved Spacetime Manifolds in a Generally Covariant Framework
Rainer Verch
Institut für Theoretische Physik, Universität Göttingen, Bunsenstr. 9, 37073 Göttingen, Germany. E-mail: [email protected]
Received: 1 March 2001 / Accepted: 28 May 2001
Abstract: A model-independent, locally generally covariant formulation of quantum field theory over four-dimensional, globally hyperbolic spacetimes will be given which generalizes similar, previous approaches. Here, a generally covariant quantum field theory is an assignment of quantum fields to globally hyperbolic spacetimes with spin-structure where each quantum field propagates on the spacetime to which it is assigned. Imposing very natural conditions such as local general covariance, existence of a causal dynamical law, fixed spinor- or tensor-type for all quantum fields of the theory, and that the quantum field on Minkowski spacetime satisfies the usual conditions, it will be shown that a spin-statistics theorem holds: If for some of the spacetimes the corresponding quantum field obeys the “wrong” connection between spin and statistics, then all quantum fields of the theory, on each spacetime, are trivial.
1. Introduction
The spin-statistics theorem of quantum field theory in Minkowski spacetime asserts that elementary particles with integer spin must obey Bose-statistics (“spacelike commutativity”), while those of half-integer spin must obey Fermi-statistics (“spacelike anti-commutativity”). Although this behaviour of elementary particles is often taken as an experimental fact of life, it is remarkable that in quantum field theory such a connection between two seemingly unrelated properties of particles can be deduced from a few very basic principles: (1) relativistic covariance, (2) stability of matter (spectrum condition and existence of a vacuum state), (3) localization properties of charges and (4) locality (spacelike commutativity of observable quantities).
This deeply rooted connection between the covariance properties of elementary particles and the behaviour under exchange of their positions has attracted the attention of numerous researchers in quantum field theory, and has a long history with a fair number of general and rigorous results. Among the first are the investigations by Pauli [38] and by Fierz [20] who proved the spin-statistics theorem for quantum fields of
arbitrary spin obeying linear hyperbolic wave-equations in Minkowski-spacetime. The first results on the connection between spin and statistics in quantum field theory in a completely general, model-independent approach (for quantum fields in the Wightman framework) were then obtained by Burgoyne [11] and by Lüders and Zumino [36]. They have subsequently been further extended and refined, particularly to cover the situation of having several fields of different spinor types in a quantum field theory; these theorems are presented in the textbooks by Jost [33], by Streater and Wightman [44], and by Bogoliubov, Logunov, Todorov and Oksak [5], to which we refer the reader for further discussion and references. The Wightman-framework takes as fundamental objects pointlike quantum fields which may be charge-carrying and need not represent observable quantities. The operator-algebraic approach to quantum field theory [30, 29] uses, instead, observable quantities as the basic objects describing a theory of elementary particles and, at the same time, abandons their pointlike localizability. The charge-carrying objects and the global gauge group are, in this approach, not put in by hand, but can be reconstructed from the observables together with sets of states distinguished by certain localization properties (representing the localization properties of the charges in a quantum field theory). This is a deep result by Doplicher and Roberts [16] arising from the profound analysis of the charge superselection structure by Doplicher, Haag and Roberts (see [15, 16, 29] and references given therein). Spin-statistics theorems have also been derived in the operator-algebraic approach to quantum field theory, beginning with works by Epstein [19] and by Doplicher, Haag and Roberts [15] for the case of strictly localizable charges. Generalizations of spin-statistics theorems to the case of charges that can be localized in spacelike cones have been obtained by Buchholz and Epstein [10]. 
A new line of development has been introduced by the Tomita–Takesaki modular theory of von Neumann algebras [46] and its connection to Lorentz-transformations which was first established in two articles by Bisognano and Wichmann [4]; see the recent review by Borchers [6] for more information on this nowadays very important area of activity in algebraic quantum field theory. In this context, there are spin-statistics theorems by Guido and Longo [26] and by Kuckert [35] in algebraic quantum field theory which take a certain geometric action of the Tomita–Takesaki modular objects associated with the vacuum state and distinguished algebras of quantum field observables as the starting point. The results just summarized concern quantum field theory on four-dimensional Minkowski spacetime. The present article focuses on quantum field theory on four-dimensional curved spacetimes, but before turning to that topic, we just mention that spin-statistics connections have also been investigated in other settings. Among those are, in particular, quantum field theories on flat two-dimensional spacetime and chiral conformal quantum field theories on one-dimensional spacetimes (e.g. the circle S¹), see e.g. the articles [40] for the case of two dimensions and [27] for chiral conformal quantum field theory. A spin-statistics connection for so-called “topological geons” has been investigated within a diffeomorphism-covariant approach to quantum gravity [17, 2] which is not directly related to the quantum field theoretical framework. For the sake of completeness we mention that the spin-statistics connection may also be violated e.g. for quantum fields having infinitely many components; at this point we refer to [5] and references cited there. While the spin-statistics connection is well-explored in quantum field theory on flat spacetime, offering a wealth of results, little of the kind has been found so far for quantum field theory on curved spacetime manifolds.
We recall that in quantum field theory on curved spacetime one considers quantum fields propagating on a curved, classically described spacetime background; the standard references on that subject, from a more mathematical point of view, include [21, 52]. Clearly, the reason for the lack of results on the spin-statistics connection in curved spacetime is that the spin-statistics theorem on Minkowski spacetime rests significantly on Poincaré-covariance, which has no counterpart in generic curved spacetimes. In general, the isometry group of a curved spacetime will even be trivial. Thus it is not at all clear whether a spin-statistics theorem can be established on curved spacetime in a model-independent quantum field theoretical framework. The situation is, of course, better when the spacetimes on which quantum fields propagate still possess sufficiently large isometry groups. Such a setting has been considered recently in [28]. In that article, the charge superselection theory in the operator-algebraic approach to quantum field theory has been generalized from the familiar case of Minkowski spacetime to arbitrary, globally hyperbolic spacetimes. Moreover, if a spacetime admits a spatial rotation-symmetry with isometry group SO(3), and also a certain time-space reflection symmetry, then a spin-statistics theorem has been shown to hold for covariant charges, where the spin is defined via the SU(2)-covering of the spatial rotation group SO(3). A certain geometric action of Tomita–Takesaki modular objects associated with an isometry-invariant state and distinguished algebras of observables has been taken as input. (We refer to [28] for further details and discussion.) Such a spin-statistics theorem applies, e.g., to quantum field theories on Schwarzschild–Kruskal black hole spacetimes. However, when one asks whether there is a connection between spin and statistics for quantum fields on general spacetime manifolds, one finds scarcely any results.
The only results known to us have been obtained in papers by Parker and Wang [37], and by Wald [50], and they apply to the case of quantum fields obeying linear equations of motion. The situation considered in these two papers is, roughly speaking, as follows: A linear quantum field propagates in the background of a (globally hyperbolic) spacetime consisting of three regions: A “past” region and a “future” region, both of which are isomorphic to flat Minkowski spacetime, and an intermediate region lying between the two (i.e. lying to the future of the “past” region, and to the past of the “future” region) which is assumed to be non-flat. (Actually, only particular types of spacetimes of this form are considered in [37] and [50].) Then it is shown in the mentioned articles that a quantum field of integer spin (≤ 2) obeying a linear wave-equation won’t satisfy canonical anti-commutation relations in the “future” region if canonical anti-commutation relations were fulfilled in the “past” region. In other words, the “wrong” commutation relations are unstable under the dynamical evolution of the quantum field in the presence of a curved spacetime background. Likewise, a quantum field of half-integer spin (≤ 3/2) will no longer satisfy canonical commutation relations in the “future” region if it did so in the “past” region. It should be noted that these results don’t make reference to states (e.g., the vacuum state in any of the flat regions), so that it is really the non-trivial spacetime curvature in the intermediate region inducing dynamical instability of the “wrong” connection between spin and statistics at the level of the commutation relations. In that respect, the line of argument in [37] and [50] seems to be restricted to free fields. Nevertheless, there are some aspects of it which are worth pointing out since they can be generalized to model-independent quantum field theoretical settings.
So one notes that the quantum field theories in the flat, “past” and “future” regions are “the same” regarding field content and dynamics; otherwise it would be difficult to formulate that their commutation relations are unstable under the dynamical evolution. Another aspect is the well-posedness of the Cauchy-problem for linear fields in globally hyperbolic spacetime, entailing that field operators located in the “future” region are dynamically determined by the field operators located in the “past” region. This property is sometimes referred to as strong Einstein causality, or existence of a causal dynamical law, and it is not restricted to free field theories. Thus one may extract from the setting investigated by Parker and Wang, and by Wald, the following two important ingredients for a quantum field theory on curved spacetime: The parts of the theory restricted to isomorphic spacetime regions should themselves be isomorphic (i.e., copies of each other), and there should exist a causal dynamical law. One may then interpret the results of [37] and [50] as saying that, for a certain class of curved spacetimes and for a certain class of quantum field theories, these two ingredients are incompatible with the “wrong” connection between spin and statistics. On the basis of the mentioned ingredients, we can now abstract from the setting of [37] and [50]. We shall consider families $\{\Phi_{\mathbf M}\}_{\mathbf M \in G}$ of quantum field theories indexed by the elements of G, the set of all four-dimensional, globally hyperbolic spacetimes with spin-structures $\mathbf M$. Each $\Phi_{\mathbf M}$ is a quantum field propagating on the background spacetime $\mathbf M$, and it is assumed that for each $\mathbf M$, the quantum field $\Phi_{\mathbf M}$ is of a specific spinor- or tensor-type (the same for all $\mathbf M$). The picture is that one can, for each spinor- or tensor-type, formulate field equations that depend on the spacetime metrics in a covariant manner. (A very simple example is $(\Box_g + m^2)\Phi_{\mathbf M} = 0$ for a scalar field $\Phi_{\mathbf M}$ on $\mathbf M = (M, g)$, where $\Box_g$ is the d’Alembertian associated with the metric g on the spacetime-manifold M.)
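For orientation, the scalar example can be written in local coordinates $x^\mu$ on M using the standard coordinate expression of the d’Alembertian. This is a textbook formula, not something specific to the present framework. With the signature (+, −, −, −) used here, it reduces on Minkowski spacetime to $\partial_t^2 - \Delta$, so that the example becomes the usual Klein–Gordon equation:

```latex
\Box_g \Phi \;=\; \frac{1}{\sqrt{|\det g|}}\,\partial_\mu\Bigl(\sqrt{|\det g|}\; g^{\mu\nu}\,\partial_\nu \Phi\Bigr),
\qquad
(\Box_g + m^2)\,\Phi = 0,
\qquad
\Box_\eta = \partial_t^2 - \sum_{k=1}^{3}\partial_{x^k}^2
\quad\text{for } g = \eta = \mathrm{diag}(+,-,-,-).
```

Only the metric enters this operator, which is why the same equation can be posed on every spacetime in G.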
Then there should be an isomorphism $\alpha_\Psi$ between the algebras $F_{\mathbf M_1}(O_1)$ and $F_{\mathbf M_2}(O_2)$ formed by the field operators $\Phi_{\mathbf M_1}(f_1)$ and $\Phi_{\mathbf M_2}(f_2)$ with $\mathrm{supp}\, f_j \subset O_j$ (j = 1, 2), respectively,¹ as soon as the subregions $O_j \subset M_j$ are isomorphic, i.e. whenever there is a local isomorphism (of metrics and spin-structures)
$\Psi : \mathbf M_1 \supset O_1 \to O_2 \subset \mathbf M_2$.
Moreover, $\alpha_\Psi$ should be a net-isomorphism in the sense that it respects localized inclusions, meaning that $\alpha_\Psi(F_{\mathbf M_1}(O)) = F_{\mathbf M_2}(\Psi(O))$ holds for all $O \subset O_1$. This is the principle of general covariance. It is worth noting that our concept of general covariance is a “local” one, in contrast to a similar, but global notion of general covariance for quantum field theories which has been developed by Dimock [13, 14]. Apart from that (and apart from the fact that we need the net-isomorphisms at the level of von Neumann algebras, while in the existing literature they have been treated as C*-algebraic net-isomorphisms), our concept of general covariance is very close to that suggested by Dimock, and also similar to ideas in [3, 34, 32]. The principle of existence of a causal dynamical law can then be expressed by demanding that, for each $\mathbf M$, $F_{\mathbf M}(O_1) \subset F_{\mathbf M}(O)$ holds whenever the subregion $O_1$ of M lies in the domain of dependence of the subregion O of M (that is, $O_1$ is causally determined by O, see Sect. 2 for details).
¹ The precise mathematical sense in which the algebras are formed by the field operators will be explained in Sect. 4. The $\Phi_{\mathbf M}$ are viewed as operator-valued distributions and the $f_j$ are test-spinors or test-tensors (smooth sections of compact support in an appropriate spinor-bundle or tensor-bundle).
There is another principle that is also most naturally imposed. Minkowski spacetime $\mathbf M_0$ is also a member of G, and clearly the quantum field theory $\Phi_{\mathbf M_0}$ should satisfy the
usual properties assumed for a quantum field theory (e.g., in the Wightman framework), like Poincaré-covariance, spectrum condition, existence of a vacuum state and, in order that a spin-statistics theorem can be expected, the Bose–Fermi alternative. If these conditions (fixed spinor- or tensor-type, general covariance, existence of a causal dynamical law and the usual properties for the theory $\Phi_{\mathbf M_0}$ on Minkowski spacetime) are satisfied, we call the family $\{\Phi_{\mathbf M}\}_{\mathbf M \in G}$ a generally covariant quantum field theory over G. For such generally covariant quantum field theories over G we shall establish in the present article a spin-statistics theorem. Roughly speaking, the contents of that theorem are as follows (see Thm. 5.1 for the precise statement): If there is some $\mathbf M \in G$ and a pair of causally separated regions $O_1$ and $O_2$ in M so that pairs of field operators of the quantum field $\Phi_{\mathbf M}$ localized in $O_1$ and $O_2$, respectively, fulfill the “wrong” connection between spin and statistics (i.e. they anti-commute if $\Phi_{\mathbf M}$ is of integer spin-type (tensorial), or they commute if $\Phi_{\mathbf M}$ is of half-integer spin-type (spinorial)), then all field operators $\Phi_{\tilde{\mathbf M}}$ are multiples of the unit operator for all $\tilde{\mathbf M} \in G$; thus the theory is trivial. Our method of proof is to show with the help of a spacetime deformation argument (Lemma 2.1) that under the said assumptions the “wrong” connection between spin and statistics in any of the theories $\Phi_{\mathbf M}$ leads to the “wrong” spin-statistics connection for the theory $\Phi_{\mathbf M_0}$ on Minkowski spacetime; hence the known spin-statistics theorem for quantum field theory on Minkowski spacetime shows that $\Phi_{\mathbf M_0}$ must be trivial. Using the spacetime deformation argument once more, this will then be shown to imply that all theories $\Phi_{\tilde{\mathbf M}}$ are trivial. The framework we use is in a sense a mixture of the Wightman-type quantum field theoretical setting and of the operator-algebraic approach to quantum field theory. This seems to have some technical advantages.
Upon making some changes, one could reformulate the arguments so that they apply either to a purely Wightman-type quantum field theoretical setting, or to a purely operator-algebraic approach; however, in the latter case it wouldn’t be so clear how to assign to a theory a spinor- or tensor-type on a curved spacetime. This has resulted in the framework we shall be employing here. We should like to point out that the assumptions imposed on a generally covariant quantum field theory $\{\Phi_{\mathbf M}\}_{\mathbf M \in G}$ over G are quite general. They are fulfilled for free field theories on curved spacetimes in representations induced by Hadamard states, as we will indicate by sketching some examples in Sect. 6. Our current understanding is, however, that these assumptions aren’t restricted to the case of free field theories but apply in fact to a larger class of quantum field theories. At any rate, they reflect a few very natural and general principles. Our work is organized as follows. In Sect. 2 we summarize a few properties of globally hyperbolic spacetimes. Lemma 2.1 will be of importance later for proving the spin-statistics theorem; it states that one can deform a globally hyperbolic spacetime into another globally hyperbolic spacetime which is partially flat, and partially isomorphic to the original spacetime. Section 3 contains the technical definition of local isomorphisms between spacetimes with spin structures. In Sect. 4 we give the full definition of a generally covariant quantum field theory over G. The main result on the connection between spin and statistics for such generally covariant quantum field theories over G is presented in Sect. 5. In Sect. 6 we sketch the construction of three theories that provide examples for generally covariant quantum field theories over G: The free scalar Klein–Gordon field, the Proca field and the Majorana–Dirac field in representations induced by quasifree Hadamard states.
There are three appendices. Appendix A contains the proof of Lemma 2.1, and in Appendix B we summarize the standard assumptions for a quantum field theory on Minkowski spacetime and quote the corresponding spin-statistics theorem from the literature. In Appendix C we briefly indicate (generalizing similar ideas in [14]) that generally covariant quantum field theories over G may be viewed as covariant functors from the category G of globally hyperbolic spacetimes with a spin-structure to the category N of nets of von Neumann algebras over manifolds, both categories being equipped with suitable local isomorphisms as morphisms. (See also the “Note added in proof” at the end of the article.)
2. Globally Hyperbolic Spacetimes
We begin the technical discussion by collecting some basics on globally hyperbolic spacetimes. This section will be brief, and serves mainly for fixing our notation. The reader is referred to the monographs [31, 51] for further explanations and proofs. A spacetime is a pair (M, g) where M is a four-dimensional smooth manifold (connected, Hausdorff, paracompact, without boundary) and g is a Lorentzian metric with signature (+, −, −, −) on M. It will be assumed that (M, g) is orientable and time-orientable, the latter meaning that there exists a smooth timelike vectorfield v on M. (Then g(v, v) > 0 everywhere on M, so v is nowhere vanishing.) A continuous, piecewise smooth causal curve $\mathbb R \supset (a, b) \ni t \mapsto \gamma(t)$ is future-directed (past-directed) if $g(\dot\gamma, v) > 0$ ($g(\dot\gamma, v) < 0$), where $\dot\gamma = \frac{d}{dt}\gamma$ is the tangent vector. Henceforth, it will be assumed that an orientation and a time-orientation have been chosen. Then one defines the following regions of causal dependence for any given set O ⊂ M:
(i) $J^\pm(O)$ is the set of all points lying on future(+)/past(−)-directed causal curves emanating from O,
(ii) $J(O) = J^+(O) \cup J^-(O)$,
(iii) $D^\pm(O)$ is the set of all points p in $J^\pm(O)$ such that each past(+)/future(−)-directed causal curve starting at p passes through O unless it has a past/future endpoint,
(iv) $D(O) = D^+(O) \cup D^-(O)$,
(v) $O^\perp = M \setminus J(O)$ is the causal complement of O.
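To make definitions (i)–(v) concrete, here is a minimal Python sketch for the special case of flat Minkowski spacetime with the signature (+, −, −, −). There, a point p lies in $J^+(\{q\})$ exactly when p − q is a causal, future-directed vector (or zero). This shortcut is an assumption that holds only in flat spacetime: in a curved spacetime, $J^\pm$ cannot be computed from straight segments. For illustration only, regions O are modelled as finite sets of points, and all function names are hypothetical.

```python
import numpy as np

# Minkowski metric with the signature (+, -, -, -) used in the text.
ETA = np.diag([1.0, -1.0, -1.0, -1.0])


def eta(x, y):
    """Minkowski product eta(x, y) of 4-vectors written as (t, x1, x2, x3)."""
    return float(np.asarray(x, float) @ ETA @ np.asarray(y, float))


def in_J_plus(p, q):
    """True if p lies in J^+({q}): p - q is causal (or zero) and future-directed."""
    d = np.asarray(p, float) - np.asarray(q, float)
    return bool(eta(d, d) >= 0.0 and d[0] >= 0.0)


def in_J_minus(p, q):
    """True if p lies in J^-({q}), i.e. q lies in J^+({p})."""
    return in_J_plus(q, p)


def in_J(p, points):
    """True if p lies in J(O) = J^+(O) ∪ J^-(O) for a finite set O of points."""
    return any(in_J_plus(p, q) or in_J_minus(p, q) for q in points)


def in_causal_complement(p, points):
    """True if p lies in the causal complement O^⊥ = M minus J(O)."""
    return not in_J(p, points)


if __name__ == "__main__":
    p1, p2 = (0, 0, 0, 0), (0, 2, 0, 0)
    print("p1 in {p2}^perp:", in_causal_complement(p1, [p2]))
    print("(1,1,0,0) in J^+(p1):", in_J_plus((1, 1, 0, 0), p1))
```

For example, the points (0, 0, 0, 0) and (0, 2, 0, 0) are spacelike separated, so each lies in the causal complement of the other. This is the kind of pair $p_1, p_2$ assumed in Lemma 2.1.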
The set D(O) is called the domain of dependence of O. If $O_1 \subset \mathrm{int}\,D(O)$, then we say that $O_1$ is causally determined by O, and denote this by $O_1 \lhd O$. A time-orientable spacetime (M, g) is called globally hyperbolic if M possesses a smooth hypersurface which is intersected exactly once by each inextendible causal curve. Such a hypersurface is called a Cauchy-surface. It is known that globally hyperbolic spacetimes possess $C^\infty$-foliations into Cauchy-surfaces; in other words, for each globally hyperbolic spacetime (M, g) there exists a smooth 3-dimensional manifold $\Sigma_0$ together with a diffeomorphism $F : \mathbb R \times \Sigma_0 \to M$ such that for all $t \in \mathbb R$, $F(\{t\} \times \Sigma_0)$ is a Cauchy-surface in (M, g) and such that, for each $x \in \Sigma_0$, $\mathbb R \ni t \mapsto F(t, x)$ is an endpointless timelike curve. While this may at first sight appear to be quite restrictive, it is known that the set of globally hyperbolic spacetimes is quite large and contains many spacetimes of physical interest. Moreover, it should be noted that global hyperbolicity does not require the existence of spacetime symmetries. When N is an open, connected subset of M, then $(N, g|_N)$ is again an oriented and time-oriented spacetime. We call it a globally hyperbolic sub-spacetime of (M, g) if the following conditions are satisfied (cf. [31, Sect. 6.6]): (1) the strong causality assumption holds on $(N, g|_N)$, (2) for any two points p, q ∈ N, the set $J^+(p) \cap J^-(q)$, if
non-empty, is compact and contained in N. This entails that $(N, g|_N)$ is a globally hyperbolic spacetime in its own right, but also when seen as embedded into (M, g). We give two types of examples for subsets N of M so that $(N, g|_N)$ is a globally hyperbolic sub-spacetime: First, if p, q ∈ M with $p \in \mathrm{int}\,J^+(q)$, then the “double cone” $N = \mathrm{int}(J^-(p) \cap J^+(q))$ gives rise to a globally hyperbolic sub-spacetime. Secondly, suppose that $C_1, C_2, C_3$ are three Cauchy-surfaces in (M, g) with $C_2 \subset \mathrm{int}\,J^+(C_1)$ and $C_3 \subset \mathrm{int}\,J^+(C_2)$, and let G be a connected open subset of $C_2$. Then the “truncated diamond” $N = \mathrm{int}(D(G) \cap J^+(C_1) \cap J^-(C_3))$, equipped with the appropriate restriction of g, again yields a globally hyperbolic sub-spacetime of (M, g). For the purposes of the present paper, a particularly important property of globally hyperbolic spacetimes is the following: A globally hyperbolic spacetime (M, g) can be “deformed” into another globally hyperbolic spacetime $(M, \hat g)$ in such a way that certain regions of (M, g) remain unchanged in $(M, \hat g)$, while other regions in $(M, \hat g)$ are isomorphic to parts of flat Minkowski spacetime. This will be made more precise in the subsequent statement, whose proof, given in Appendix A, is an extension of methods used in [22].
Lemma 2.1. Let (M, g) be a globally hyperbolic spacetime and let $p_1, p_2 \in M$ be a pair of causally separated points (i.e. $p_1 \in \{p_2\}^\perp$). Then there is a globally hyperbolic spacetime $(M, \hat g)$, together with a collection of subsets $U_j, \hat U_j, \tilde U_j$ (j = 1, 2) and $G, \tilde G$, with the following properties:
(a) There are Cauchy-surfaces Σ in (M, g), and $\hat\Sigma$ in $(M, \hat g)$, so that with $N_+ = \mathrm{int}\,J^+(\Sigma) \subset M$ and $\hat N_+ = \mathrm{int}\,J^+(\hat\Sigma)$, $(N_+, g|_{N_+})$ is isomorphic to $(\hat N_+, \hat g|_{\hat N_+})$.
(b) $p_1, p_2 \in N_+$. The isomorphic images of $p_1$ and $p_2$ in $\hat N_+$ will be denoted by $\hat p_1$ and $\hat p_2$.
(c) $\tilde G \subset \hat N_- = \mathrm{int}\,J^-(\hat\Sigma)$ is simply connected, and $(\tilde G, \hat g|_{\tilde G})$ is a globally hyperbolic sub-spacetime of $(M, \hat g)$ isomorphic to a globally hyperbolic sub-spacetime $(G_0, \eta|_{G_0})$ of flat Minkowski-spacetime $(M_0, \eta) \simeq (\mathbb R^4, \mathrm{diag}(+,-,-,-))$.
(d) $G \subset \hat N_+$ is simply connected and $(G, \hat g|_G)$ is a globally hyperbolic sub-spacetime of $(M, \hat g)$ containing $\hat p_1$ and $\hat p_2$.
(e) The sets $U_j, \hat U_j, \tilde U_j$ are, when equipped with the appropriate restrictions of $\hat g$ as a metric, globally hyperbolic, relatively compact sub-manifolds of $(M, \hat g)$ which are, respectively, causally separated for different indices, and $\hat p_j \in U_j \subset G$, $\hat U_j, \tilde U_j \subset \tilde G$ (j = 1, 2).
(f) $\tilde U_j$ is causally determined by $U_j$, and $U_j$ is causally determined by $\hat U_j$ (j = 1, 2).
Figure 2.1 may help to illustrate the relations between the sets involved in Lemma 2.1.
[Fig. 2.1. Sketch of the causal relations of the sets $U_j$, $\hat U_j$, $\tilde U_j$, $G$, $\tilde G$]
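The double-cone example given earlier in this section can be checked concretely in flat spacetime. In Minkowski spacetime, the double cone $\mathrm{int}(J^-(p) \cap J^+(q))$ with tips p = (r, c) and q = (−r, c) coincides with the interior of the domain of dependence of the open ball of radius r about c in the Cauchy surface {t = 0}. That is, it is the set of (t, x) with |t| + |x − c| < r. The following Python sketch tests membership in both sets on random sample points. It works in flat spacetime only, uses hypothetical function names, and does not attempt to illustrate the deformation of Lemma 2.1 itself.

```python
import numpy as np

# Minkowski metric with the signature (+, -, -, -) used in the text.
ETA = np.diag([1.0, -1.0, -1.0, -1.0])


def timelike_future(d):
    """True if the 4-vector d = (t, x1, x2, x3) is timelike and future-directed."""
    d = np.asarray(d, float)
    return bool(d @ ETA @ d > 0.0 and d[0] > 0.0)


def in_double_cone(x, p, q):
    """True if x lies in the double cone int(J^-(p) ∩ J^+(q)) (flat spacetime only)."""
    x, p, q = (np.asarray(v, float) for v in (x, p, q))
    return timelike_future(p - x) and timelike_future(x - q)


def in_dependence_domain_of_ball(x, center, radius):
    """True if x lies in int D(B), B the open ball of given centre and radius in {t = 0}."""
    x = np.asarray(x, float)
    distance = np.linalg.norm(x[1:] - np.asarray(center, float))
    return bool(abs(x[0]) + distance < radius)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, q = (1.0, 0.0, 0.0, 0.0), (-1.0, 0.0, 0.0, 0.0)  # tips of the double cone, r = 1
    sample = rng.uniform(-1.5, 1.5, size=(1000, 4))
    agree = all(
        in_double_cone(x, p, q) == in_dependence_domain_of_ball(x, (0.0, 0.0, 0.0), 1.0)
        for x in sample
    )
    print("double cone == int D(ball):", agree)
```

Strict inequalities are used throughout because both sets are interiors: points on the light cones through p or q are excluded.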
3. Spacetimes with Spin-Structures
Let (M, g) be a globally hyperbolic spacetime where an orientation and a time-orientation have been chosen. Then let F(M, g) be the bundle of oriented and time-oriented (and future-directed) g-orthonormal frames on M. That is, an element $e = (e_0, \dots, e_3)$ in F(M, g) is a collection of four vectors in $T_pM$, p ∈ M, with $g(e_a, e_b) = \eta_{ab}$, where $(\eta_{ab}) = \mathrm{diag}(+,-,-,-)$ is the Minkowski metric, $e_0$ is a future-directed timelike vector, and the frame $(e_0, \dots, e_3)$ is oriented according to the chosen orientation on M. The bundle projection $\pi_F : F(M, g) \to M$ assigns to e the base point p to which the vectors $e_0, \dots, e_3$ are affixed. The proper orthochronous Lorentz group $L^\uparrow_+$ operates smoothly on the right on F(M, g) by $(R_\Lambda e)_a = e_b \Lambda^b{}_a$, and thus F(M, g) is a principal fibre bundle with fibre group $L^\uparrow_+$ over M. A spin structure for (M, g) is a pair (S(M, g), ψ), where S(M, g) is an SL(2, C)-principal fibre bundle over M and ψ : S(M, g) → F(M, g) is a base-point preserving bundle homomorphism (that is, $\pi_F \circ \psi = \pi_S$ where $\pi_S$ is the base projection of S(M, g)) with the property $\psi \circ R_s = R_{\Lambda(s)} \circ \psi$. Here, $R_s$ denotes the right action of s ∈ SL(2, C) on S(M, g), and $SL(2, \mathbb C) \ni s \mapsto \Lambda(s) \in L^\uparrow_+$ is the covering projection; recall that SL(2, C) is the universal covering group of $L^\uparrow_+$. Two spin-structures $(S^{(1)}(M, g), \psi^{(1)})$ and $(S^{(2)}(M, g), \psi^{(2)})$ are called (globally) equivalent if there is a base-point preserving bundle-isomorphism from $S^{(1)}(M, g)$ onto $S^{(2)}(M, g)$ which, followed by $\psi^{(2)}$, yields $\psi^{(1)}$. It is known that each 4-dimensional globally hyperbolic spacetime admits spin-structures and that all such spin-structures are equivalent if the spacetime manifold is simply connected (cf. [25]).
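The covering projection $SL(2, \mathbb C) \ni s \mapsto \Lambda(s) \in L^\uparrow_+$ can be made explicit through the standard identification of a 4-vector x with the Hermitian matrix $x^\nu \sigma_\nu$, whose determinant is $\eta(x, x)$. This gives $\Lambda(s)^\mu{}_\nu = \tfrac12 \mathrm{tr}(\sigma_\mu s \sigma_\nu s^\dagger)$. The Python sketch below is an illustration under stated assumptions. It takes $\sigma_0 = 1$ together with the Pauli matrices $\sigma_1, \sigma_2, \sigma_3$ (a convention the text does not fix), and all helper names are hypothetical. With it one can check numerically that:
- Λ(s) preserves η;
- Λ(−s) = Λ(s);
- Λ is a homomorphism;
- the right action $(R_\Lambda e)_a = e_b \Lambda^b{}_a$ maps orthonormal frames to orthonormal frames.

```python
import numpy as np

# sigma_0 = identity, sigma_1..3 = Pauli matrices (convention assumed, not fixed by the text).
SIGMA = (
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
)
ETA = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric, signature (+, -, -, -)


def covering_map(s):
    """Return Lambda(s) with Lambda(s)[mu, nu] = (1/2) tr(sigma_mu s sigma_nu s^dagger).

    This is the matrix of the map x -> x' defined by
    x'^mu sigma_mu = s (x^nu sigma_nu) s^dagger, for s in SL(2, C).
    """
    s = np.asarray(s, dtype=complex)
    s_dag = s.conj().T
    lam = np.empty((4, 4))
    for mu in range(4):
        for nu in range(4):
            lam[mu, nu] = 0.5 * np.trace(SIGMA[mu] @ s @ SIGMA[nu] @ s_dag).real
    return lam


def random_sl2c(rng):
    """Random element of SL(2, C): rescale a random complex 2x2 matrix to determinant 1."""
    a = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    return a / np.sqrt(np.linalg.det(a))


def frame_right_action(frame, lam):
    """(R_Lambda e)_a = e_b Lambda^b_a, with the frame vectors e_a stored as columns."""
    return frame @ lam


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    s = random_sl2c(rng)
    lam = covering_map(s)
    print("Lorentz matrix:", np.allclose(lam.T @ ETA @ lam, ETA))
    print("same image for s and -s:", np.allclose(covering_map(-s), lam))
```

Since Λ(s) = Λ(−s), the kernel of the projection is {±1}. This is the two-to-one character of the covering with which the map ψ of a spin structure has to be compatible via $\psi \circ R_s = R_{\Lambda(s)} \circ \psi$.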
From now on, we will abbreviate by $\mathbf M = ((M, g), S(M, g), \psi)$ an oriented and time-oriented globally hyperbolic spacetime endowed with a spin-structure, and we shall also use the notation $\mathbf M_j = ((M_j, g_j), S_j(M_j, g_j), \psi_j)$ if we have labels j distinguishing several such objects. We denote by G the set of all 4-dimensional, oriented and time-oriented globally hyperbolic spacetimes with a spin-structure. One may view G as a category; of interest are then “local morphisms” between its objects, or more properly, morphisms between sub-objects. We will introduce the “local morphisms” as follows. For more details, see Appendix C.
Definition 3.1. Let $\mathbf M_1$ and $\mathbf M_2$ be in G. Then we say that $\boldsymbol\Psi = (\Psi, \vartheta)$ is a local isomorphism between $\mathbf M_1$ and $\mathbf M_2$ if:
(a) There are simply connected, oriented and time-oriented globally hyperbolic sub-spacetimes $(N_j, g_j|_{N_j})$ of $(M_j, g_j)$ (j = 1, 2) so that $\vartheta : (N_1, g_1|_{N_1}) \to (N_2, g_2|_{N_2})$ is an orientation and time-orientation preserving isomorphism. Then $N_1$ will be called the initial localization of $\boldsymbol\Psi$, denoted by *ini($\boldsymbol\Psi$), and $N_2$ will be called the final localization of $\boldsymbol\Psi$, denoted by *fin($\boldsymbol\Psi$).
(b) When denoting by $S_j(N_j, g_j)$ the restriction of $S_j(M_j, g_j)$ in its base set (that is, $S_j(N_j, g_j) = \pi_{S_j}^{-1}(N_j)$), then
$\Psi : S_1(N_1, g_1) \to S_2(N_2, g_2)$
is a principal fibre bundle isomorphism (so it intertwines the corresponding right actions of the fibre groups) with the following properties:
(i) $\vartheta \circ \pi_{S_1} = \pi_{S_2} \circ \Psi$ on $S_1(N_1, g_1)$,
(ii) $\vartheta_F \circ \psi_1 = \psi_2 \circ \Psi$ on $S_1(N_1, g_1)$.
Here, $\vartheta_F : F(N_1, g_1) \to F(N_2, g_2)$ is induced by the tangent map corresponding to $\vartheta : N_1 \to N_2$.
Remark. In [14], Dimock has introduced the category G, and global isomorphisms between pairs of objects in G as morphisms. Since each globally hyperbolic sub-spacetime of a globally hyperbolic spacetime with spin-structure is itself a member of G, the definition of local isomorphisms can be regarded as introducing morphisms between sub-objects of objects in G. It should be noted that the class of local isomorphisms between elements of G is clearly larger than the class of global isomorphisms as considered in [14], and therefore covariance properties imposed on quantum systems with respect to the class of local isomorphisms are more restrictive than those using only global isomorphisms. Further below we will see the implications of that.
Let ρ be a linear representation of SL(2, C) on some finite-dimensional vector space $V_\rho$ (which may be real or complex). Then, given a spacetime-manifold with spin-structure $\mathbf M = ((M, g), S(M, g), \psi) \in G$, one can form the vector bundle $\mathcal V_\rho = S(M, g) \times_\rho V_\rho$ associated with the principal fibre bundle S(M, g) and the representation ρ. $\mathcal V_\rho$ is a vector bundle over the base-manifold M, and we recall that the elements of $(\mathcal V_\rho)_p$, the fibre of $\mathcal V_\rho$ at a base point p ∈ M, are the orbits $\{(R_{s^{-1}} s_p, \rho(s)v) : s \in SL(2, \mathbb C)\}$ of pairs $(s_p, v) \in S(M, g)_p \times V_\rho$ under the action
$$ s \mapsto (R_{s^{-1}} s_p, \rho(s) v) \tag{3.1} $$
of the structure group SL(2, C) of S(M, g). This action induces a linear representation $\check\rho$ of SL(2, C) on each $(\mathcal V_\rho)_p$. We say that $\mathcal V_\rho$ is the vector bundle of (spin-) representation type ρ. Now let $\mathbf M_1$ and $\mathbf M_2$ be in G and let $\mathcal V_1$ and $\mathcal V_2$ be associated vector bundles of representation type $\rho_1$ and $\rho_2$, respectively. Suppose that $\rho_1$ and $\rho_2$ are equivalent, i.e. there is some bijective linear map $T : V_1 \to V_2$ so that
$$ T\rho_1(\cdot)T^{-1} = \rho_2(\cdot). \tag{3.2} $$
One finds from these assumptions that any local isomorphism $\boldsymbol\Psi = (\Psi, \vartheta)$ between $\mathbf M_1$ and $\mathbf M_2$ lifts to a local isomorphism $\check\Psi$ between $\mathcal V_1$ and $\mathcal V_2$ in a way we shall now indicate. Let $\check\pi_j$ denote the base projections of $\mathcal V_j$ (j = 1, 2) and, with $N_1 =$ *ini($\boldsymbol\Psi$), $N_2 =$ *fin($\boldsymbol\Psi$), let $\mathcal V_j(N_j) = \check\pi_j^{-1}(N_j)$ denote the restrictions of the vector bundles in the base sets. Then define $\check\Psi : \mathcal V_1(N_1) \to \mathcal V_2(N_2)$ by assigning to any element $(s_p, v)$ in $S(M_1, g_1)_p \times V_1$, with $p \in N_1$, the element $(\Psi(s_p), Tv)$ in $S(M_2, g_2)_{\vartheta(p)} \times V_2$, and form the orbits under the corresponding structure group actions (3.1). It is not difficult to check that this assignment indeed induces a well-defined map between $\mathcal V_1(N_1)$ and $\mathcal V_2(N_2)$ which is linear in the fibres and fulfills
$$ \vartheta \circ \check\pi_1 = \check\pi_2 \circ \check\Psi \quad \text{on } \mathcal V_1(N_1). $$
Moreover, $\check\Psi$ intertwines the representations $\check\rho_j$ in the sense that
$$ \check\Psi \circ \check\rho_1(s) = \check\rho_2(s) \circ \check\Psi \quad \text{for all } s \in SL(2, \mathbb C). $$
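The well-definedness claim can be spelled out in a few lines. The notation in the following sketch is assumed here rather than taken from the text:
- $[s_p, v]$ denotes the orbit of $(s_p, v)$ under the action (3.1);
- Ψ denotes the principal fibre bundle isomorphism in the local isomorphism, so that $\Psi \circ R_s = R_s \circ \Psi$ because it intertwines the right actions;
- T is the intertwiner from (3.2).

Replacing the representative $(s_p, v)$ by another point of the same orbit produces an image in the same orbit:

```latex
\begin{aligned}
\bigl(R_{s^{-1}} s_p,\ \rho_1(s)v\bigr)
  &\longmapsto \bigl(\Psi(R_{s^{-1}} s_p),\ T\rho_1(s)v\bigr) \\
  &= \bigl(R_{s^{-1}}\Psi(s_p),\ \rho_2(s)\,Tv\bigr)
  \qquad\text{(right actions intertwined, and (3.2))},
\end{aligned}
\qquad\text{hence}\qquad
\check\Psi[s_p, v] := [\Psi(s_p),\ Tv]
```

is independent of the chosen representative. In addition, property (i) of Definition 3.1 gives $\pi_{S_2}(\Psi(s_p)) = \vartheta(\pi_{S_1}(s_p)) = \vartheta(p)$, which is the relation $\vartheta \circ \check\pi_1 = \check\pi_2 \circ \check\Psi$. Linearity in the fibres follows from the linearity of T, because $[s_p, v] + [s_p, w] = [s_p, v + w]$ is mapped to $[\Psi(s_p), T(v + w)]$.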
4. Generally Covariant Quantum Fields
In the present section we introduce a concept of generally covariant quantum field theories on curved spacetimes with spin-structures. Moreover, we will make the assumption that these quantum field theories fulfill the condition of strong Einstein causality, or synonymously, that there exists a causal dynamical law. The combination of these two assumptions (general covariance and existence of a causal dynamical law) will lead to the connection between spin and statistics shown in the subsequent section. It should be remarked that there are several possible formulations of these two assumptions at the technical level. Here, we have chosen to use a framework which is in a sense a mixture of the Wightman-approach to “pointlike” quantum fields (operator-valued distributions) and the Haag–Kastler approach which emphasizes local algebras of bounded operators. Therefore, some technical assumptions have to be made in order to match these two approaches; yet we feel that the resulting framework is more general and more flexible than e.g. a framework using only Wightman fields, since then we would have to make even more stringent technical assumptions, for instance fairly detailed assumptions on the domains of field operators, or we would have to impose a very restrictive form of general covariance and strong Einstein causality. Since we don’t wish to impose conditions of this kind, we regard the approach to be presented in this section as reasonable and fairly general. The relevant assumptions will be listed next.
(a) Quantum fields of a spin representation type and their (local) von Neumann algebras. Let $\mathbf M = ((M, g), S(M, g), \psi) \in G$ be a globally hyperbolic spacetime with spin-structure. Moreover, let ρ be a representation of SL(2, C) on the finite-dimensional vector-space $V_\rho$.
We will say that a triple of objects (Φ, D, H) is a quantum field of spin-representation type ρ on M if: H is a Hilbert space, D is a dense linear subspace of H, and Φ is a linear map taking elements f ∈ Γ0(Vρ), the space of C∞-sections in Vρ with compact support, to closable operators Φ(f) in H having domain D. In addition, it will be assumed that D is invariant under application of the operators Φ(f), and that D is also an invariant domain for the adjoint field operators Φ(f)∗. It will also be assumed that there are cyclic vectors in D, where χ ∈ D is called cyclic if the space generated by χ and all F1 · · · Fnχ, n ∈ N, where Fj ∈ {Φ(fj), Φ(fj)∗}² with fj ∈ Γ0(Vρ), is dense in H. We write orc(M) to denote the set of open, relatively compact subsets of M. Let O ∈ orc(M); then denote by F(O) the von Neumann algebra which is generated by all exp(iλ|Φ(f)|), λ ∈ R, and Jf, with supp f ⊂ O, where Φ(f) = Jf|Φ(f)| denotes the polar decomposition of a field operator's closure. Thus the quantum field (Φ, D, H) induces a net of von Neumann algebras {F(O)}O∈orc(M) fulfilling the isotony condition O1 ⊂ O2 ⇒ F(O1) ⊂ F(O2). In the following, we shall abbreviate a quantum field (Φ, D, H) by the symbol Φ.
² {Φ(fj), Φ(fj)∗} denotes the set containing the operators in the curly brackets, and not their anti-commutator. In this work, we will never use curly brackets to denote anti-commutators.
(b) Existence of a causal dynamical law. Let Φ be a quantum field of some spin-representation type ρ on M. We say that there exists a causal dynamical law for the quantum field (or that the quantum field fulfills strong Einstein causality) if for the net {F(O)}O∈orc(M) of local von Neumann algebras it holds that O1 ◁ O2 ⇒ F(O1) ⊂ F(O2).
(c) Local morphisms. Assume that we have two representations ρ1 and ρ2 on finite-dimensional vector spaces V1 and V2, respectively, and suppose that these representations are isomorphic, i.e. (3.2) holds with some bijective linear map T : V1 → V2. Let Φ1 and Φ2 be quantum fields of spin-representation type ρ1 and ρ2 on M1 and M2, respectively, where Mj ∈ G (j = 1, 2). Moreover, suppose that there is a local isomorphism Ψ, with underlying isometric embedding ϑ, between M1 and M2. Then we say that the local isomorphism Ψ between M1 and M2 is covered by local isomorphisms between the quantum field theories Φ1 and Φ2 if the following holds: Given any relatively compact subset Ni ⊂ Ξini(Ψ), writing Nf = ϑ(Ni), and denoting by {F1(Oi)}Oi∈orc(Ni) and {F2(Of)}Of∈orc(Nf) the von Neumann algebraic nets induced by the quantum fields Φ1 and Φ2 restricted to Ni and Nf, respectively, there is a von Neumann algebraic isomorphism αΨ,Ni : F1(Ni) → F2(Nf) fulfilling the covariance property
αΨ,Ni(F1(Oi)) = F2(ϑ(Oi)),   Oi ∈ orc(Ni).   (4.1)
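The local algebras F(O) in (a) are generated by the unitaries exp(iλ|Φ(f)|) and the partial isometries Jf from the polar decomposition Φ(f) = Jf|Φ(f)|. In the text the field operators are unbounded operators on an infinite-dimensional Hilbert space; purely as a finite-dimensional toy illustration (an assumption, not part of the framework), the following sketch computes the polar decomposition of a square matrix and the unitary group generated by its modulus. The function names are hypothetical.

```python
import numpy as np


def polar_decomposition(a: np.ndarray, tol: float = 1e-12):
    """Return (j, abs_a) with a = j @ abs_a for a square matrix a.

    abs_a = (a^* a)^(1/2) is positive semi-definite and j is a partial
    isometry whose initial space is the range of abs_a (toy analogue of
    Phi(f) = J_f |Phi(f)|).
    """
    u, s, vh = np.linalg.svd(a)              # a = u @ diag(s) @ vh
    abs_a = vh.conj().T @ np.diag(s) @ vh    # |a| = V diag(s) V^*
    keep = np.diag((s > tol).astype(float))  # drop the kernel of |a|
    j = u @ keep @ vh
    return j, abs_a


def unitary_group(abs_a: np.ndarray, lam: float) -> np.ndarray:
    """exp(i * lam * |a|), computed from the spectral decomposition of |a|."""
    w, v = np.linalg.eigh(abs_a)
    return v @ np.diag(np.exp(1j * lam * w)) @ v.conj().T


# Example: a nilpotent shift has a non-trivial kernel, so J is only a
# partial isometry (J^* J is the projection onto the range of |a|).
shift = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=complex)
j_shift, abs_shift = polar_decomposition(shift)
```

The shift example is chosen because its kernel is non-trivial: Jf is then a genuine partial isometry rather than a unitary, which is the generic situation for field operators.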
Comments and Remarks. (i) In (a), the property of a quantum field to be a spinor field of a certain type is specified just by requiring that it acts linearly on the test-spinors of the corresponding type. This is a quite common approach to defining spinor fields on curved spacetime. An algebraic transformation property, e.g. that a (local) spinor transformation ρ(s) on Vρ induces an endomorphism of the ∗-algebra of quantum field operators, holds in general only when the underlying spacetime has a flat metric. One may regard the properties of Def. 4.1 below as a weak replacement of such an algebraic transformation property.
(ii) Existence of a causal dynamical law is a typical feature of quantum fields obeying linear hyperbolic equations of motion, but is expected to hold also for interacting quantum field theories as long as the mass spectrum behaves moderately. For free field theories, the existence of a causal dynamical law is commonly fulfilled in the following stricter form (see [13] for the case of the scalar field; the argument generalizes to more general types of fields, cf. e.g. [42]): Given O1 ◁ O2, then for each f1 ∈ Γ0(Vρ) with supp f1 ⊂ O1 there is f2 ∈ Γ0(Vρ) with supp f2 ⊂ O2 such that Φ(f2) = Φ(f1). Our formulation given in (b) is more general.
(iii) It is of some importance in (c) that Ni and Nf are assumed to be relatively compact subsets of Ξini(Ψ) and Ξfin(Ψ), respectively, as otherwise it is known from free field examples that a von Neumann algebraic isomorphism αΨ,Ni : F1(Ni) → F2(Nf) with the covariance property (4.1) cannot be expected to exist. In typical cases, the von Neumann algebras Fj(O) are of properly infinite type, and then αΨ,Ni is implemented by a unitary operator UΨ,Ni : H1 → H2.
The subsequent definition fixes the notion of general covariance for quantum fields on curved spacetimes.
Definition 4.1. Let ρ be a linear representation of SL(2, C) on a finite-dimensional vector space V. By G we denote, as before, the set of all oriented and time-oriented, 4-dimensional, globally hyperbolic spacetimes equipped with a spin-structure. A family {ΦM}M∈G will be called a generally covariant quantum field theory over G of spin-representation type ρ if the following properties are fulfilled:
(A) For each M ∈ G, ΦM = (ΦM, DM, HM) is a quantum field theory on M of spin-representation type ρ (the same for all M) such that the properties (a) and (b) stated above are satisfied.
(B) For the case that M = M0 is Minkowski spacetime with its usual spin-structure, we demand that the corresponding quantum field theory ΦM0 fulfills the Wightman axioms, including the Bose–Fermi alternative (or normal commutation relations); see Appendix B for details.
(C) If for a pair M1 and M2 in G there is a local isomorphism Ψ between M1 and M2, then it is covered by local isomorphisms between the corresponding quantum field theories ΦM1 and ΦM2.
Let us discuss some features of that definition in a further set of Comments and Remarks. (iv) Readers familiar with the articles of Dimock [13, 14] will notice that our definition is very much inspired by the concept of general covariance introduced in those works for quantum field theories on curved spacetimes. The main difference, as we have mentioned already in the Remark below Def. 3.1, is that the isomorphisms between the spacetimes with spin-structures, and accordingly between the corresponding quantum field theories, are here assumed to be local, whereas in [13, 14] they are assumed to be global. Allowing local isomorphisms in the condition of general covariance (C) leads, in combination with the conditions (A) and (B), to restrictions which apparently are not present when using only global isomorphisms. The significance of that point has, in a somewhat different context, been noted by Kay [34]. Our definition of a generally covariant quantum field theory resembles an approach taken by Kay in his investigation of “F-locality” in [34].
The main difference (apart from differences of technical detail) is that Kay considers a much larger class G of spacetimes which need not be globally hyperbolic, and he essentially investigates the question of what the largest class G of spacetimes might be so that a quantum field theory over G is compatible with the covariance property (C) once certain properties are assumed for the quantum fields on the individual spacetimes in G. For the case of the scalar Klein–Gordon field, he finds that restrictions on the class of spacetimes G arise in order to obtain compatibility, see [34] for further discussion.
(v) Given a local isomorphism Ψ between M1 and M2 in G, it is known for free fields that typically the identification
ΦM2(Ψ̌∗f) = ΦM1(f),   supp f ⊂ Ξini(Ψ),   with Ψ̌∗f = Ψ̌ ◦ f ◦ ϑ−1,
preserves CAR or CCR and thereby gives rise to a (C∗-algebraic) local isomorphism αΨ covering Ψ between the quantum field theories. In [52] (pp. 89–91 of that reference), such a covariance property has been proposed as a condition on the (renormalized) stress-energy tensor of a quantum field on curved spacetimes, and more recently, Hollands and Wald have defined the notion of a local, covariant quantum field by means of such a covariance behaviour of the quantum field and have shown that one may construct, essentially uniquely, Wick polynomials of the free scalar field in such a way that they become local, covariant quantum fields [32]. Our conditions on a local isomorphism between quantum field theories are much less detailed; indeed, the slightly complicated definition of a local isomorphism between quantum field theories serves the purpose of keeping this notion as general as possible while still transferring enough algebraic information
for making it a useful (i.e. sufficiently restrictive) concept in combination with the existence of a causal dynamical law formulated in (b).
(vi) We have required that the spin-representation ρ be the same for all members ΦM of a generally covariant quantum field theory over G, expressing that all these quantum field theories on the various spacetimes have the same field content. (Of course, it would be sufficient just to require that the various ρM be isomorphic; demanding equality is just a simplification of notation.) We think that this is necessary in order that (C) can be fulfilled, but a proof of that remains to be given.
(vii) It should be noted that each element M ∈ G comes equipped with an orientation and a time-orientation. The local isomorphisms have been assumed to preserve orientation and time-orientation, so the condition of general covariance imposes no restrictions on quantum field theories ΦM1 and ΦM2 when M1 and M2 are connected by a local isomorphism that reverses orientation and time-orientation. In fact, if Ψ is an (appropriately defined) local isomorphism between M1 and M2 reversing both time-orientation and orientation, one would expect that for any relatively compact Ni ⊂ Ξini(Ψ), writing Nf = ϑ(Ni), there is an anti-linear von Neumann algebraic isomorphism αΨ,Ni having the covariance property (4.1). It would be quite interesting to see whether one could deduce the existence of such anti-linear local von Neumann algebraic isomorphisms, at least for a distinguished class of time-orientation and orientation reversing local isomorphisms, from the assumptions on {ΦM}M∈G of Def. 4.1. That would correspond to a PCT-theorem in the present general setting.
(viii) The assignment of quantum field theories ΦM to each M ∈ G fulfilling the condition of general covariance allows a functorial description which will be indicated in Appendix C.
5. Spin and Statistics
In the present section we state and prove a spin-statistics theorem for generally covariant quantum field theories over G. Before we can formulate the result, it is in order to briefly recapitulate the terminology referring to “integer” and “half-integer” spin. Let ⊙ᵏC² denote the k-fold symmetrized tensor product of C². Then an irreducible complex linear representation D(k,l) of SL(2, C) for k, l ∈ N0 is given on the vector space Vk,l = (⊙ᵏC²) ⊗ (⊙ˡC²) by D(k,l)(s) = (⊙ᵏs) ⊗ (⊙ˡs̄), where s ∈ SL(2, C) acts like a matrix on column vectors in C², and s̄ is the matrix with complex conjugate entries.³ All finite-dimensional complex linear irreducible representations of SL(2, C) arise in this way. Such an irreducible representation is said to be of integer type (or simply integer) if k + l is even and of half-integer type (or simply half-integer) if k + l is odd. There are also the (finite-dimensional) real linear irreducible representations D(k,l) ⊕ D(l,k) for k ≠ l, and D(l,l). They are called real-linear irreducible because it is possible to select real-linear subspaces in Vk,l ⊕ Vl,k and in Vl,l, respectively, on which these representations act irreducibly as real-linear representations. As complex linear representations they are, however, reducible except for the case D(l,l). The classification of these representations as being of “integer” or “half-integer” type is analogous to that of complex linear irreducible representations.
³ By convention, the case k = l = 0 corresponds to a scalar field, with the trivial one-dimensional representation of SL(2, C).
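The integer/half-integer distinction can be read off from how the central element −1 ∈ SL(2, C) acts: it acts as (−1)ᵏ on ⊙ᵏC² and as (−1)ˡ on ⊙ˡC², so D(k,l)(−1) = (−1)^(k+l) times the identity. The small sketch below tabulates dimension, spin labels (k/2, l/2), type and this sign for given (k, l); the helper names are hypothetical and the code only encodes the classification stated above.

```python
from fractions import Fraction


def irrep_data(k: int, l: int):
    """Return (dimension, spin labels, type, sign of D^(k,l)(-1)) for D^(k,l)."""
    if k < 0 or l < 0:
        raise ValueError("k and l must be non-negative integers")
    # the k-fold symmetrized tensor power of C^2 has dimension k + 1
    dim = (k + 1) * (l + 1)
    spins = (Fraction(k, 2), Fraction(l, 2))
    spin_type = "integer" if (k + l) % 2 == 0 else "half-integer"
    # -1 in SL(2, C) acts as (-1)^k on the first factor and (-1)^l on the
    # second, so D^(k,l)(-1) = (-1)^(k+l) times the identity
    central_sign = (-1) ** (k + l)
    return dim, spins, spin_type, central_sign


def real_irrep_type(k: int, l: int) -> str:
    """Type of the real-linear irreducible D^(k,l) + D^(l,k) (or D^(l,l))."""
    # k + l is symmetric in k and l, so both summands share the same type
    return irrep_data(k, l)[2]
```

For instance, D(1,1) (used for the Proca field in Sect. 6) comes out as 4-dimensional and of integer type, while D(1,0) ⊕ D(0,1) (the Majorana–Dirac case) is of half-integer type.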
Theorem 5.1. Let {ΦM}M∈G be a generally covariant quantum field theory over G of spin-representation type ρ, where ρ is assumed to be a complex linear irreducible, or real linear irreducible, finite-dimensional representation of SL(2, C).
(I) If ρ is of half-integer type, and if there exist an M ∈ G and a pair of non-empty O1, O2 ∈ orc(M) with O1 ⊂ O2⊥ so that FM(O1) ⊂ FM(O2)′ (where by FM(O) we denote the local von Neumann algebras generated by ΦM and by FM(O)′ the commutant algebras⁴), then it follows for all M̂ ∈ G that ΦM̂(f) = cf · 1 for some cf ∈ C, i.e. the quantum field operators of all quantum fields of the generally covariant theory are multiples of the unit operator.
(II) If ρ is of integer type, and if there exist an M ∈ G, a pair of causally separated points p1 and p2 in M and, for each pair of open neighbourhoods Oj of pj with O1 ⊂ O2⊥, a pair fj ∈ Γ0(Vρ) with supp fj ⊂ Oj and ΦM(fj) ≠ 0 (j = 1, 2) so that
ΦM(f1)ΦM(f2) + ΦM(f2)ΦM(f1) = 0   or   ΦM(f1)ΦM(f2)∗ + ΦM(f2)∗ΦM(f1) = 0,   (5.1)
then it follows again for all M̂ ∈ G that all field operators ΦM̂(f) are multiples of the unit operator.
We note that FM(O1) ⊂ FM(O2)′ means that the field operators ΦM(f1) and ΦM(f2) for supp fj ⊂ Oj commute strongly, in the sense that the operators appearing in their polar decompositions commute strongly. This stronger form of commutativity at causal separation is expected to hold in physically relevant theories. In Appendix B we give a few more comments on this point. If the stronger, field-level forms of the causal dynamical law and of general covariance indicated in Remarks (ii) and (v) of Sect. 4 were assumed, the statement for the half-integer case could be strengthened to resemble the integer case more closely; namely, one would then conclude for the half-integer case that the relations ΦM(f1)ΦM(f2) − ΦM(f2)ΦM(f1) = 0 or ΦM(f1)ΦM(f2)∗ − ΦM(f2)∗ΦM(f1) = 0, for some M and a pair of test-spinors f1 and f2 with causally separated supports so that ΦM(fj) ≠ 0, already imply that the field operators ΦM̂(f) are multiples of unity for all M̂ ∈ G.
Proof of Theorem 5.1. We begin with part (I) of the statement involving a theory of half-integer type, and we suppose that F(O1) ⊂ F(O2)′ for a pair of causally separated O1, O2 ∈ orc(M), where we use the notation F(O) = FM(O). Then let pj ∈ Oj, and choose for this pair of causally separated points in M a globally hyperbolic spacetime (M, g̃) with neighbourhoods Uj, Ũj, G, G̃ as in Lemma 2.1, which can be done in such a way that ϑ−1(Uj) ⊂ Oj, where ϑ is the isomorphism M ⊃ N → Ñ ⊂ M of Lemma 2.1. Now we equip (M, g̃) with any spin-structure and denote the resulting spacetime with spin-structure by M̃. The neighbourhoods G and G̃ are simply connected. Thus, since all spin-structures over simply connected globally hyperbolic spacetimes are equivalent, there is a local isomorphism Ψ between M and M̃ with Ξfin(Ψ) = G, and also a local isomorphism Ψ0 between M̃ and M0, where M0 is Minkowski spacetime with its standard spin-structure. This is due to the fact that G is isomorphic to the subset ϑ−1(G) in M and G̃ is isomorphic to a subset of Minkowski spacetime M0, cf. Lemma 2.1. Let us now introduce the notation F̃(U) = FM̃(U) and F0(U) = FM0(U) for the local von Neumann algebras corresponding to the theories ΦM̃ and ΦM0, respectively.
⁴ I.e. FM(O)′ = {A′ ∈ B(HM) : A′A = AA′ for all A ∈ FM(O)}.
Then choose two globally hyperbolic, relatively compact submanifolds Nf and Ñi of G and G̃, respectively, with the additional property that Uj ⊂ Nf and Uj, Ũj ⊂ Ñi (j = 1, 2). Denote Ni = ϑ−1(Nf). According to the general covariance assumption (C) there are local isomorphisms αΨ,Ni between ΦM and ΦM̃ and αΨ0,Ñi between ΦM̃ and ΦM0 so that
αΨ,Ni(F(ϑ−1(U))) = F̃(U),   U ∈ orc(Nf),   (5.2)
αΨ0,Ñi(F̃(U)) = F0(ϑ0(U)),   U ∈ orc(Ñi),   (5.3)
where ϑ0 is the isomorphism embedding G̃ into M0. Since we have supposed initially that F(O1) ⊂ F(O2)′, and since ϑ−1(Uj) ⊂ Oj, relation (5.2) implies that F̃(U1) ⊂ F̃(U2)′. Moreover, Uj ▷ Ũj and hence, by the existence of a causal dynamical law, it follows that F̃(Ũ1) ⊂ F̃(Ũ2)′. Exploiting also (5.3), one obtains
F0(ϑ0(Ũ1)) ⊂ F0(ϑ0(Ũ2))′,   (5.4)
where ϑ0(Ũ1) and ϑ0(Ũ2) are a pair of open, causally separated subsets of Minkowski spacetime. Since the quantum field theory ΦM0 on Minkowski spacetime has been assumed to fulfill the usual assumptions, and is, by assumption, of half-integer spin-type, relation (5.4) implies, by the known spin-statistics theorem for quantum field theories on Minkowski spacetime, that F0(U0) = C · 1 holds for all U0 ∈ orc(M0). (See Appendix B for details.)
In a next step we will show how that conclusion implies that all other quantum field theories ΦM̂ are likewise trivial. Let M̂ = ((M̂, ĝ), S(M̂, ĝ), ψ̂) ∈ G and choose any point p1 ∈ M̂ (and any other causally separated point p2 ∈ M̂, which actually plays no role). Then choose a spacetime (M̂, g̃) with subsets Uj, Ũj, G, G̃ as in Lemma 2.1 for these data, (M̂, ĝ) now playing the role of (M, g). Identifying F(O) = FM̂(O) and making similar adaptations, Eqs. (5.2) and (5.3) hold accordingly. Then F0(ϑ0(Ũ1)) = C · 1 implies, by (5.3), F̃(Ũ1) = C · 1, and since Ũ1 ▷ U1 it follows that F̃(U1) = C · 1. Hence (5.2) leads to F(ϑ−1(U1)) = C · 1, implying that ΦM̂(f) is a multiple of the unit operator for all f with supp f ⊂ ϑ−1(U1). As ϑ−1(U1) is an open neighbourhood of an arbitrary point p1 ∈ M̂, and since the quantum field f ↦ ΦM̂(f) is linear, a partition of unity argument shows that one must have ΦM̂(f) = cf · 1 with suitable cf ∈ C for all test-spinors f on M̂.
Now we turn to the proof of statement (II) of the theorem. According to the assumptions, there are two points p1 and p2 in M which are causally separated, and moreover, when choosing a deformation (M, g̃) of (M, g) with neighbourhoods Uj, Ũj, G, G̃ as in Lemma 2.1, there is a pair of test-spinors fj supported in ϑ−1(Uj) so that ΦM(fj) ≠ 0 and such that one of the relations (5.1) holds. For simplicity of notation, we shall assume that
ΦM(f1)ΦM(f2) + ΦM(f2)ΦM(f1) = 0   (5.5)
holds, and we will show that these properties are in conflict with Bosonic commutation relations for the theory ΦM0 on Minkowski spacetime. The other case of (5.1) can be treated by similar arguments. The proof proceeds indirectly, so we suppose that ΦM0
possesses Bosonic commutation relations. As before in the proof of (I), we can find local isomorphisms αΨ,Ni and αΨ0,Ñi fulfilling the relations (5.2) and (5.3) for the von Neumann algebraic nets corresponding to the quantum field theories on M, M̃ and M0. Having supposed Bosonic commutation relations for the quantum field theory on Minkowski spacetime, it follows by (5.3) that F̃(Ũ1) ⊂ F̃(Ũ2)′. Now Uj ◁ Ũj and thus, by the existence of a causal dynamical law, it holds that F̃(U1) ⊂ F̃(U2)′. By (5.2) we obtain F(ϑ−1(U1)) ⊂ F(ϑ−1(U2))′. Since the operators ΦM(fj) are affiliated to the von Neumann algebras F(ϑ−1(Uj)), one concludes that
ΦM(f1)ΦM(f2) − ΦM(f2)ΦM(f1) = 0.   (5.6)
Adding (5.5) and (5.6) yields 2ΦM(f1)ΦM(f2) = 0, i.e. ΦM(f1)ΦM(f2) = 0. Clearly, this relation entails ΦM(f1)∗ΦM(f1)ΦM(f2)ΦM(f2)∗ = 0. The operators A1 = ΦM(f1)∗ΦM(f1) and A2 = ΦM(f2)ΦM(f2)∗ are positive and possess self-adjoint extensions affiliated to F(ϑ−1(U1)) and F(ϑ−1(U2)), respectively. Denoting by Ej(a) their spectral projections corresponding to the spectral interval (−a, a), the operators Aj(a) = Ej(a)Aj are contained in F(ϑ−1(Uj)), and it holds that A1(a)A2(a) = 0 for all a > 0. Repeating the arguments that led to Eq. (5.6), one can see that the Aj(a) possess isomorphic images Ãj(a) in F0(ϑ0(Ũj)) so that Ã1(a)Ã2(a) = 0 for all a > 0. But since the net {F0(U)}U∈orc(M0) was assumed to fulfill Bosonic commutation relations, and since it fulfills the usual assumptions for a quantum field theory on Minkowski spacetime, including the spectrum condition and the existence of a vacuum state, it follows that the Schlieder property [43] holds for this net. This property states that Ãj(a) ∈ F0(ϑ0(Ũj)), cl ϑ0(Ũ1) ⊂ ϑ0(Ũ2)⊥ and Ã1(a)Ã2(a) = 0 imply the relations Ã1(a) = 0 or Ã2(a) = 0. Hence one obtains that, for all a > 0, A1(a) = 0 or A2(a) = 0. Since Ej(a) → 1 strongly as a → ∞, and for at least one j the relation Aj(a) = 0 holds for arbitrarily large a, this entails A1 = 0 or A2 = 0, which in turn enforces ΦM(f1) = 0 or ΦM(f2) = 0 (note that ‖ΦM(f1)χ‖² = ⟨χ, A1χ⟩ and ‖ΦM(f2)∗χ‖² = ⟨χ, A2χ⟩ for χ ∈ DM). Thus one arrives at a contradiction since both operators ΦM(f1) and ΦM(f2) are by assumption different from 0. One concludes that Bosonic commutation relations are an impossible option for the theory ΦM0 on Minkowski spacetime and thus, due to the Bose–Fermi alternative, that theory must fulfill Fermionic commutation relations. Since the theory is of integer spin-type, this implies that the von Neumann algebras F0(U0) of the theory on Minkowski spacetime consist only of multiples of the unit operator because of the spin-statistics theorem on flat spacetime (cf. Appendix B). Repeating the argument given for part (I) above, it follows that for each M̂ ∈ G the quantum field operators ΦM̂(f) are multiples of the unit operator for all test-tensors f. □
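The spectral cut-off Aj(a) = Ej(a)Aj used in the proof turns a positive, possibly unbounded operator into a bounded element of the local algebra with norm at most a, and recovers Aj as a → ∞. The following finite-dimensional NumPy sketch illustrates only this truncation step; the matrix and the function name are hypothetical. A finite-dimensional model cannot reproduce the Schlieder property itself: commuting projections P and 1 − P have product zero although neither vanishes, which is why that step genuinely needs the field-theoretic assumptions.

```python
import numpy as np


def spectral_truncation(a_op: np.ndarray, cutoff: float) -> np.ndarray:
    """A(a) = E(a) A, with E(a) the spectral projection of the Hermitian
    matrix A onto the open interval (-cutoff, cutoff)."""
    w, v = np.linalg.eigh(a_op)
    inside = np.abs(w) < cutoff
    e_a = v[:, inside] @ v[:, inside].conj().T   # spectral projection E(a)
    return e_a @ a_op


# Toy positive operator A1 = Phi^* Phi built from a random 4 x 4 matrix.
rng = np.random.default_rng(1)
phi = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
a1 = phi.conj().T @ phi

for cutoff in (0.5, 2.0, 10.0):
    a1_cut = spectral_truncation(a1, cutoff)
    # E(a) commutes with A, so A(a) is Hermitian and bounded by the cutoff
    assert np.allclose(a1_cut, a1_cut.conj().T)
    assert np.linalg.norm(a1_cut, 2) <= cutoff + 1e-9

# Once the cutoff exceeds the spectrum, E(a) = 1 and A(a) = A.
top = np.linalg.eigvalsh(a1).max()
assert np.allclose(spectral_truncation(a1, top + 1.0), a1)
```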
6. Examples
In this section we briefly indicate examples of linear quantum field theories which fulfill the properties required for a generally covariant quantum field theory over G in Sect. 4.
1. The free scalar field. The simplest example is the free scalar field, although its significance for a spin-statistics theorem is, naturally, quite limited.
For each globally hyperbolic spacetime M = (M, g) ∈ G (endowed with a spin-structure whose explicit appearance is now suppressed since it is irrelevant for the scalar field) we consider the scalar Klein–Gordon equation (□g + m²)ϕ = 0 for real-valued functions ϕ on M, where m ≥ 0 is a constant independent of M and □g is the scalar d'Alembertian for (M, g). Following Dimock [13], one can construct a C∗-algebraic quantization of this field as follows. There are uniquely determined, continuous linear maps E±M : C0∞(M, R) → C∞(M, R) with the properties
E±M(□g + m²)f = f = (□g + m²)E±M f   and   supp E±M f ⊂ J±(supp f),   f ∈ C0∞(M, R).
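On a general curved spacetime E±M are only characterized abstractly, but in flat two-dimensional Minkowski spacetime with metric diag(1, −1), so that □ = ∂t² − ∂x², their integral kernels are explicit: the retarded one (support in J+) is ½ θ(t − |x|) J0(m√(t² − x²)), the advanced one is its time reflection, and (E±f)(t, x) = ∫ kernel(t − s, x − y) f(s, y) ds dy. The sketch below, an illustration under exactly these assumptions (flat 1+1 dimensions, hypothetical helper names), evaluates the kernels, checks that their difference vanishes at spacelike separation, and verifies by finite differences that the retarded kernel solves the homogeneous Klein–Gordon equation inside the light cone.

```python
import math


def bessel_j0(z: float, terms: int = 60) -> float:
    """J0(z) via its power series sum_k (-1)^k (z/2)^(2k) / (k!)^2."""
    q = (z / 2.0) ** 2
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -q / (k + 1) ** 2
    return total


def retarded_kernel(t: float, x: float, m: float) -> float:
    """Kernel of E^+: supported in the future light cone t >= |x|."""
    if t < abs(x):
        return 0.0
    return 0.5 * bessel_j0(m * math.sqrt(t * t - x * x))


def advanced_kernel(t: float, x: float, m: float) -> float:
    """Kernel of E^-: time reflection of the retarded kernel."""
    return retarded_kernel(-t, x, m)


def causal_propagator(t: float, x: float, m: float) -> float:
    """Kernel of E = E^+ - E^-; vanishes at spacelike separation."""
    return retarded_kernel(t, x, m) - advanced_kernel(t, x, m)


def kg_residual(t: float, x: float, m: float, h: float = 1e-3) -> float:
    """Central-difference value of (d_t^2 - d_x^2 + m^2) E^+ at (t, x)."""
    def f(tt, xx):
        return retarded_kernel(tt, xx, m)
    d_tt = (f(t + h, x) - 2.0 * f(t, x) + f(t - h, x)) / h ** 2
    d_xx = (f(t, x + h) - 2.0 * f(t, x) + f(t, x - h)) / h ** 2
    return d_tt - d_xx + m ** 2 * f(t, x)
```

The power series for J0 is adequate only for moderate arguments; it is used here to keep the sketch free of external dependencies.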
Their difference EM = E+M − E−M, called the (causal) propagator, induces a symplectic form
κM([f], [h]) = ∫M dη f · EM h,   [f], [h] ∈ KM,
on KM = C0∞(M, R)/ker EM, where f ↦ [f] = [f]M denotes the quotient map and dη is the metric-induced volume form on (M, g). To the resulting symplectic space (KM, κM) there corresponds the CCR-Weyl algebra A[KM, κM], defined as the (up to C∗-isomorphisms unique) C∗-algebra generated by unitary elements WM(x), x ∈ KM, fulfilling the Weyl relations, or “exponentiated” canonical commutation relations (see [9]),
WM(x)WM(y) = exp(−iκM(x, y)/2) WM(x + y),   WM(x)∗ = WM(−x),   x, y ∈ KM.
Dimock has shown that any isometry θ : M1 → M2 induces a C∗-algebraic isomorphism αθ : A[KM1, κM1] → A[KM2, κM2] with the property that
αθ(WM1([f]M1)) = WM2([θ∗f]M2),   f ∈ C0∞(M1, R),   (6.1)
where θ∗f = f ◦ θ−1. If M1 ⊂ M1′ and M2 ⊂ M2′ are globally hyperbolic sub-spacetimes of a pair of globally hyperbolic spacetimes M1′ and M2′, then WMj = WMj′ ↾ KMj (j = 1, 2) holds up to C∗-isomorphisms as a consequence of the uniqueness of the causal propagators; thus there is always a C∗-algebraic Weyl-algebra isomorphism covering a local isomorphism between members of G. Furthermore, Dimock has also shown in [13] that, upon denoting by AM(O) the C∗-subalgebra of A[KM, κM] generated by all WM([f]M), supp f ⊂ O, there holds
O1 ◁ O2 ⇒ AM(O1) ⊂ AM(O2)   (6.2)
for all O1, O2 ⊂ M. Now let ωM be an arbitrary quasifree Hadamard state on A[KM, κM]. Such a state is determined by its two-point correlation function, which here is required to be of “Hadamard form”. The Hadamard form specifies the singular short-distance behaviour in a particular way, see [21, 52] and references cited therein for discussion. Equivalently, the Hadamard form of a two-point function can be characterized by a certain form of
its wavefront set (see [39, 42] for details). It has been shown in [22] that there exists an abundance of Hadamard states on A[KM, κM]. To such a quasifree Hadamard state ωM there corresponds its GNS representation (πM, HM, CM), cf. e.g. [8]. In that representation, we define the local von Neumann algebras FM(O) = πM(AM(O))′′ for each O ∈ orc(M). Then (6.2) clearly implies the existence of a causal dynamical law O1 ◁ O2 ⇒ FM(O1) ⊂ FM(O2). A vector χ ∈ HM is defined to be in DM if for each choice of x = (x1, . . . , xn) ∈ (KM)ⁿ the map
t ↦ πM(WM(t1x1)) · · · πM(WM(tnxn))χ,   t = (t1, . . . , tn) ∈ Rⁿ,
is C∞. One can show that DM is a dense domain in HM (cf. [9]). One can define for each f ∈ C0∞(M, R) the quantum field operator ΦM(f) by
ΦM(f)χ = −i (d/dt)|t=0 πM(WM(t[f]M))χ,   χ ∈ DM.
One can also show that DM is left invariant under the action of ΦM(f) and that ΦM(f) is essentially self-adjoint [9]. It is also obvious that ΦM(f) is affiliated to FM(O) as soon as supp f ⊂ O. Moreover, the results of [48] show that the C∗-algebraic isomorphism αθ in (6.1) can be extended, in representations induced by quasifree Hadamard states, to von Neumann algebraic isomorphisms in the following way. Suppose that between M1 and M2 in G there is a local isomorphism θ, and let Ni ⊂ Ξini(θ) be a relatively compact subset. Then, writing Nf = θ(Ni), the Weyl-algebra isomorphism αθ in (6.1) extends to an isomorphism αθ,Ni : FM1(Ni) → FM2(Nf) between von Neumann algebras. Consequently, there holds the covariance property
αθ,Ni(FM1(Oi)) = FM2(θ(Oi)),   Oi ∈ orc(Ni).
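The Weyl relations stated above fix the product of generators completely through the phase exp(−iκ(x, y)/2). As a toy model (an assumption for illustration), the sketch replaces (KM, κM) by R²ⁿ with its standard symplectic form and multiplies formal elements c · W(x) according to those relations; it does not model the C∗-completion. The class and function names are hypothetical.

```python
import cmath


def symplectic_form(x, y):
    """Standard symplectic form on R^(2n), x = (q_1..q_n, p_1..p_n)."""
    n = len(x) // 2
    return sum(x[i] * y[n + i] - x[n + i] * y[i] for i in range(n))


class WeylElement:
    """A formal multiple c * W(x) of a Weyl generator."""

    def __init__(self, coeff, vec):
        self.coeff = complex(coeff)
        self.vec = tuple(float(v) for v in vec)

    def __mul__(self, other):
        # W(x) W(y) = exp(-i kappa(x, y) / 2) W(x + y)
        phase = cmath.exp(-0.5j * symplectic_form(self.vec, other.vec))
        vec = tuple(a + b for a, b in zip(self.vec, other.vec))
        return WeylElement(self.coeff * other.coeff * phase, vec)

    def adjoint(self):
        # (c W(x))^* = conj(c) W(-x)
        return WeylElement(self.coeff.conjugate(), tuple(-v for v in self.vec))

    def close_to(self, other, tol=1e-12):
        return abs(self.coeff - other.coeff) < tol and all(
            abs(a - b) < tol for a, b in zip(self.vec, other.vec)
        )
```

Two consequences can be checked directly with this model: the product is associative (the phases add up to κ(x, y) + κ(x, z) + κ(y, z) in either bracketing), and W(x)W(y) = exp(−iκ(x, y)) W(y)W(x), which is the exponentiated form of the canonical commutation relations.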
Finally, if M0 is Minkowski spacetime, we take ωM0 to be the vacuum state, which is known to be a quasifree Hadamard state. In conclusion, the family {ΦM}M∈G of Klein–Gordon quantum fields just constructed satisfies all the assumptions required for a generally covariant quantum field theory over G.
2. The Proca field. The Proca field is a co-vector field, i.e. of tensorial type, corresponding to the D(1,1) irreducible representation of SL(2, C). For each globally hyperbolic spacetime M = (M, g) ∈ G (where again we suppress the spin-structure in our notation since it is presently not relevant), we denote by d the exterior derivative of differential forms, by ∗ the Hodge star operator corresponding to the metric g, and define the co-differential δ = ∗d∗. Then the Proca equation reads, for ϕ ∈ Γ(T∗M), (δd + m²)ϕ = 0, where m > 0 is a constant independent of M. (Note that δd depends on the metric g.) A C∗-algebraic quantization has recently been given by Furlani [23] (cf. also [45], whose
notation we follow here). To this end one constructs advanced and retarded fundamental solutions F±M : Γ0(T∗M) → Γ(T∗M) uniquely determined by
F±M(δd + m²)f = f = (δd + m²)F±M f,   supp F±M f ⊂ J±(supp f),   f ∈ Γ0(T∗M).
As in the case of the scalar Klein–Gordon field, one defines the (causal) propagator FM = F+M − F−M and a symplectic space (KM, κM), where
κM([f], [h]) = ∫M f ∧ ∗FM h,   [f], [h] ∈ KM,
on KM = Γ0(T∗M)/ker FM and f ↦ [f] = [f]M is the quotient map. From here onwards, all the arguments leading to the construction of a generally covariant theory {ΦM}M∈G can be taken over almost literally, apart from obvious modifications, from the previous case of the scalar Klein–Gordon field to the present case of the Proca field. There are some provisions which should nevertheless be recorded. Firstly, the existence of Hadamard states for the Proca field has not been demonstrated. However, as mentioned towards the end of Sect. 5.1 in [42], the existence of Hadamard states could be established by using the existence of a ground state for the Proca field on ultrastatic spacetimes [24] in combination with results in [41] and [22], so as to prove that there exists a large set of quasifree Hadamard states for the Proca field. Secondly, the arguments given in [48] showing that the C∗-algebraic isomorphism (6.1) can be extended to a von Neumann algebraic isomorphism in the above way apply to the case of the free scalar Klein–Gordon field. But those arguments can obviously be generalized to apply to a far more general class of free fields, including the Proca field. Thus, one may conclude that the Proca field also gives rise to a generally covariant quantum field theory {ΦM}M∈G.
3. The Dirac field. Our last example is the Dirac field, which is a spinorial field of spin 1/2. We consider it in a Majorana representation; our presentation follows [14] to a large extent, with some alterations specific to Majorana representations, see [42] for details. The Majorana representation corresponds to the real linear irreducible representation D(1,0) ⊕ D(0,1) of SL(2, C). This Majorana–Dirac representation will be denoted by ρ. Its representation space is Vρ = C⁴. Let M = ((M, g), S(M, g), ψ) ∈ G be a globally hyperbolic spacetime with spin-structure.
The vector bundle S(M, g) ×ρ C⁴ associated with S(M, g) and the representation ρ will be denoted by DρM; its sections are called spinors, or spinor fields. The metric-induced connection ∇ on TM lifts to a connection on the frame bundle F(M, g), which in turn lifts to a connection on S(M, g), and this induces also a connection on DρM. The corresponding covariant derivative operator will be denoted by ∇. One can then introduce the spinor-tensor γ ∈ Γ(T∗M ⊗ DρM ⊗ Dρ∗M) by requiring that its components γaAB in (appropriate, dual) local frames are equal to the matrix elements (γa)AB of the gamma-matrices in the Majorana representation. This is a set of four 4 × 4 matrices γ0, γ1, γ2, γ3 obeying the relations
γaγb + γbγa = 2ηab,   γ0∗ = γ0,   γk∗ = −γk (k = 1, 2, 3),   γ̄a = −γa.
Here, γa∗ means the Hermitian conjugate of γa, γ̄a is the transpose of γa∗ (i.e. the matrix with complex conjugate entries), and (ηab) = diag(1, −1, −1, −1) is the Minkowskian metric. Then the Dirac operator ∇̸ is defined by setting in frame components, for any local section f = fA EA ∈ Γ0(DρM),
(∇̸f)A = ηab γaAB (∇b f)B.
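With η = diag(1, −1, −1, −1), the Clifford relations together with the stated (anti-)hermiticity conditions admit no real 4 × 4 solution, so a Majorana representation in this signature consists of purely imaginary matrices: complex conjugation sends γa to −γa. This is consistent with the charge conjugation C acting by complex conjugation of frame components. The NumPy sketch below checks all of these relations for one such purely imaginary choice built from Pauli matrices; the explicit matrices are an assumption for illustration and need not coincide with those used in [14, 42].

```python
import numpy as np

# Pauli matrices
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
z2 = np.zeros((2, 2), dtype=complex)

# One purely imaginary (Majorana) choice; an assumption for illustration.
gammas = [
    np.block([[z2, s2], [s2, z2]]),                 # gamma_0
    np.block([[1j * s3, z2], [z2, 1j * s3]]),       # gamma_1
    np.block([[z2, -s2], [s2, z2]]),                # gamma_2
    np.block([[-1j * s1, z2], [z2, -1j * s1]]),     # gamma_3
]
eta = np.diag([1.0, -1.0, -1.0, -1.0])


def check_relations(gs) -> bool:
    """Check the Clifford, (anti-)hermiticity and conjugation relations."""
    ident = np.eye(4)
    clifford = all(
        np.allclose(gs[a] @ gs[b] + gs[b] @ gs[a], 2 * eta[a, b] * ident)
        for a in range(4) for b in range(4)
    )
    hermitian_0 = np.allclose(gs[0].conj().T, gs[0])
    antihermitian_k = all(np.allclose(g.conj().T, -g) for g in gs[1:])
    imaginary = all(np.allclose(g.conj(), -g) for g in gs)  # bar(gamma_a) = -gamma_a
    return clifford and hermitian_0 and antihermitian_k and imaginary


print(check_relations(gammas))
```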
(At this point, we refer to [14, 42] for details.) There is a charge conjugation C which operates by complex conjugation of the frame components in any frame, i.e. (Cu)A = ūA for the components of u ∈ DρM. There is also the Dirac adjoint u ↦ u+, mapping DρM anti-linearly and base-point preserving onto its dual bundle Dρ∗M; in dual frame components it is defined as (u+)B = ūA γ0AB. The Dirac equation on M is the differential equation (∇̸ + im)ϕ = 0 for ϕ ∈ Γ(DρM), where m ≥ 0 is a constant independent of M. As in the cases considered before, there are uniquely determined advanced and retarded fundamental solutions S±M : Γ0(DρM) → Γ(DρM) distinguished by the properties
S±M(∇̸ + im)f = f = (∇̸ + im)S±M f,   supp S±M f ⊂ J±(supp f),   f ∈ Γ0(DρM).
Hence one obtains a distinguished causal propagator $S_M = S_M^+ - S_M^-$. It gives rise to a pre-Hilbert space $(H_M, s_M)$, where $H_M = \Gamma_0(D_\rho M)/\ker S_M$, with scalar product
$s_M([f], [h]) = \int_M d\eta\, (S_M f)^+ (h), \qquad [f], [h] \in H_M,$
where we have denoted the metric-induced measure on $M$ by $d\eta$ and the quotient map by $f \mapsto [f] = [f]_M$. The charge conjugation $C$ can be shown to induce a conjugation on $(H_M, s_M)$, which will be denoted by the same symbol. We shall also notationally identify $H_M$ with its completion to a Hilbert space. To the Hilbert space $(H_M, s_M)$ with complex conjugation $C$ there corresponds (uniquely, up to $C^*$-algebraic equivalence) the self-dual CAR-algebra $B[H_M, s_M, C]$ (cf. [1]), which is a $C^*$-algebra generated by elements $B_M(v)$ depending linearly on $v \in H_M$ and fulfilling the canonical anti-commutation relations
$B_M(v)^* B_M(w) + B_M(w) B_M(v)^* = s_M(v, w), \qquad B_M(v)^* = B_M(Cv), \qquad v, w \in H_M.$
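As a finite-dimensional sanity check of these relations, one can realize a self-dual CAR-algebra on a toy two-dimensional space. The sketch below is an illustration under stated assumptions, not the construction of [1] or of this section: it takes $H = \mathbb{C}^2$ with the standard scalar product (antilinear in the first argument), the conjugation $C(f_1, f_2) = (\overline{f_2}, \overline{f_1})$, a single fermionic mode $a$ with $aa^* + a^*a = 1$, and sets $B(f) = f_1 a^* + f_2 a$. The names `B`, `conj_C`, `s` and `check_car` are hypothetical.

```python
import numpy as np

# Single fermionic mode acting on C^2: a maps |1> to |0> and annihilates |0>.
a = np.array([[0, 1], [0, 0]], dtype=complex)
a_dag = a.conj().T


def B(f):
    """Toy generator B(f) = f1 a* + f2 a; complex linear in f = (f1, f2)."""
    return f[0] * a_dag + f[1] * a


def conj_C(f):
    """Toy conjugation C(f1, f2) = (conj(f2), conj(f1)) on H = C^2."""
    return np.array([np.conj(f[1]), np.conj(f[0])])


def s(f, g):
    """Scalar product on H, antilinear in the first argument."""
    return np.vdot(f, g)


def check_car(f, g, tol=1e-12):
    """Check B(f)* B(g) + B(g) B(f)* = s(f, g) 1 and B(f)* = B(C f)."""
    Bf_star = B(f).conj().T
    anticomm = Bf_star @ B(g) + B(g) @ Bf_star
    ok_car = np.allclose(anticomm, s(f, g) * np.eye(2), atol=tol)
    ok_conj = np.allclose(Bf_star, B(conj_C(f)), atol=tol)
    return ok_car and ok_conj


rng = np.random.default_rng(0)
f = rng.normal(size=2) + 1j * rng.normal(size=2)
g = rng.normal(size=2) + 1j * rng.normal(size=2)
print(check_car(f, g))  # True
```

Expanding by hand, $\{B(f)^*, B(g)\} = \overline{f_1} g_1 + \overline{f_2} g_2 = s(f, g)$, because $a^2 = (a^*)^2 = 0$; the numerical check merely confirms this for random vectors.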
In [14], Dimock has proven that each (global) isomorphism $\Psi = (\psi, \vartheta)$ between members $M_1$ and $M_2$ of $G$ induces a $C^*$-algebraic isomorphism $\alpha_\Psi : B[H_{M_1}, s_{M_1}, C] \to B[H_{M_2}, s_{M_2}, C]$ satisfying
$\alpha_\Psi(B_{M_1}([f]_{M_1})) = B_{M_2}([\check\psi_* f]_{M_2}), \qquad f \in \Gamma_0(D_\rho M_1), \qquad (6.3)$
where $\check\psi_* f = \check\psi \circ f \circ \vartheta^{-1}$, $\check\psi$ being the map $D_\rho M_1 \to D_\rho M_2$ induced by $\psi$. As in the cases discussed before, this statement has a local version to the effect that for each local isomorphism between members of $G$ there is a $C^*$-algebraic isomorphism between the corresponding CAR-algebras covering it. Moreover, it was shown in [49] that strong Einstein causality,
$O_1 \lhd O_2 \;\Rightarrow\; B_M(O_1) \subset B_M(O_2), \qquad (6.4)$
holds for the local $C^*$-subalgebras $B_M(O)$ of $B[H_M, s_M, C]$ which are generated by all $B_M([f]_M)$ with $\operatorname{supp} f \subset O$. Now let $\omega_M$ be any quasifree Hadamard state on $B[H_M, s_M, C]$, and $(\pi_M, \mathcal{H}_M, \Omega_M)$ the corresponding GNS-representation; then the local von Neumann algebras are defined via
$F_M(O) = \pi_M(B_M(O))'', \qquad O \in \mathrm{orc}(M),$
Spin-Statistics Theorem for Quantum Fields on Curved Spacetime
whereas the field operators are now given as
$\Phi_M(f) = \pi_M(B_M([f]_M)), \qquad f \in \Gamma_0(D_\rho M).$
Owing to the canonical anti-commutation relations, these field operators are bounded, and one may take their domain $D_M$ to be equal to $\mathcal{H}_M$. The existence of a causal dynamical law at the level of the local von Neumann algebras is then guaranteed by (6.4). It is to be expected that the arguments of [48], which show that the $C^*$-algebraic Weyl-algebra isomorphisms (6.1) (when appropriately localized, see above) extend to von Neumann algebraic isomorphisms for the scalar Klein–Gordon field, have generalizations allowing one to conclude that the $C^*$-algebraic CAR-algebra isomorphisms (6.3) extend, in a similar manner, to von Neumann algebraic isomorphisms, so that general covariance is fulfilled. Another provision is that, as in the case of the Proca field, the existence of quasifree Hadamard states for the Dirac field has not yet been demonstrated. However, the same comment as given above for the Proca field applies here. Anticipating, therefore, that these provisions are lifted, the family of Dirac quantum fields just constructed, one for each $M \in G$, yields another example of a generally covariant quantum field theory over $G$ upon choosing $\omega_{M_0}$ as the vacuum state (being quasifree and Hadamard) on Minkowski spacetime $M_0$. (See also the "Note added in proof" at the end of the article.)

Appendix A

Proof of Lemma 2.1. Let two causally separated points $p_1$ and $p_2$ be given; hence we may form the manifold $M^\vee = M \setminus (J^+(p_1) \cup J^+(p_2))$. Then $(M^\vee, g|_{M^\vee})$ is again a globally hyperbolic spacetime. This globally hyperbolic spacetime may be smoothly foliated into Cauchy surfaces, and thus one can move Cauchy surfaces for $(M^\vee, g|_{M^\vee})$ arbitrarily close to $p_1$ and $p_2$. We will use this property in order to construct a Cauchy surface $\Sigma$ in $(M, g)$ having the following properties:
(i) $\Sigma \subset M^\vee$.
(ii) There is an open, simply connected neighbourhood $W \subset \Sigma$ which is contained in a coordinate chart (for $\Sigma$), and $J^-(p_j) \cap \Sigma \subset W$ ($j = 1, 2$).
To this end, let $F : \mathbb{R} \times \Sigma_0 \to M$ be a $C^\infty$-foliation of $(M, g)$ in Cauchy surfaces. If $C$ is any Cauchy surface in $(M, g)$, then there is a diffeomorphism $H_C : \Sigma_0 \to C$ which is defined by assigning to $x \in \Sigma_0$ the point $q_x \in C$ such that $F(t_x, x) = q_x$ for some (uniquely determined) $t_x \in \mathbb{R}$. Now let $(t_j, x_j) \in \mathbb{R} \times \Sigma_0$ be such that $F(t_j, x_j) = p_j$, $j = 1, 2$. Then there is clearly a pair $S_1, S_2$ of open neighbourhoods of $x_1, x_2$, respectively, in $\Sigma_0$ lying in a simply connected chart domain $W_0$ (of $\Sigma_0$), cf. [12], Prop. 16.26.9. Thus, whenever $C$ is a Cauchy surface in $(M, g)$, the sets $H_C(S_1)$ and $H_C(S_2)$ are contained in the simply connected chart domain $H_C(W_0)$ of $C$. On the other hand, $H_C(S_j)$ is the intersection of $C$ with the "tube" $T_j = \{F(t, x) : t \in \mathbb{R},\ x \in S_j\}$. It is now fairly easy to see that, if $B_j$ denotes the unit ball in $T_{p_j}M$ with respect to arbitrarily given coordinates, then the sets
$V_j(\tau) = \{\exp_{p_j}(v) : v \in \tau \cdot B_j,\ v \text{ past-directed and causal}\}$
of segments of "causal rays" emanating to the past from $p_j$ will be contained in $T_j$ if $\tau > 0$ is small enough. Choosing such a $\tau$ and using that $(M^\vee, g|_{M^\vee})$ is globally hyperbolic, one can thus find a Cauchy surface $\Sigma$ in $(M^\vee, g|_{M^\vee})$ with $V_j(\tau) \setminus V_j(\tau/2) \subset \operatorname{int} J^-(\Sigma)$; this implies that the intersection of $J^-(p_j)$ with $\Sigma$ is contained in $T_j \cap \Sigma = H_\Sigma(S_j)$, and since $\Sigma$ is also a Cauchy surface
for $(M, g)$, one realizes that it has the desired properties (i) and (ii) upon choosing $W = H_\Sigma(W_0)$. In a next step we note that, since the sets $J^-(p_j) \cap \Sigma$ are closed and contained in the open set $W$, the closures of sufficiently small open neighbourhoods of these sets are also contained in $W$. Thus we can choose two sufficiently small sets $U_j = \operatorname{int}(J^-(p_j^+) \cap J^+(p_j^-))$, where $p_j^\pm \in \operatorname{int} J^\pm(p_j)$, i.e. "double cones" surrounding the points $p_j$, with $J^-(U_j) \cap \Sigma \subset W$. [Note that in Fig. 2.1 we have represented the sets $U_j$ as truncated double cones since this turned out to be easier graphically.] Obviously one may choose the $U_j$ so that they are contained in $N_+ = \operatorname{int} J^+(\Sigma)$. Moreover, $J^-(U_j) \cap \Sigma$ will be contained in an open, simply connected subset $W_1$ of $\Sigma$ with $W_1 \subset W$. Then $\operatorname{int} D^+(W_1)$ is a simply connected neighbourhood of $U_1$ and $U_2$, and is globally hyperbolic when endowed with the metric $g$. Since $(N_+, g|_{N_+})$ is a globally hyperbolic spacetime, one can choose a Cauchy surface $\Sigma_+$ in $(N_+, g|_{N_+})$ "sufficiently close to $\Sigma$" so that the set $G = \operatorname{int} D^+(W_1) \cap \operatorname{int} J^+(\Sigma_+) \subset N_+$ is still an open, simply connected neighbourhood of $U_1$ and $U_2$ which is globally hyperbolic when supplied with $g$ as metric. The remaining part of the argument proceeds similarly to the proof in Appendix C of [22]. We can cover $\Sigma$ with a system $\{X_\alpha\}_\alpha$ of coordinate patches, choosing one of them, say $X_1$, to have the property
$\overline{W_1} \subset X_1, \qquad \overline{X_1} \subset W. \qquad (A.1)$
Using Gaussian normal coordinates for $\Sigma$, one may introduce coordinate patches $(-\varepsilon_\alpha, \varepsilon_\alpha) \times X_\alpha$ covering a neighbourhood $N_0$ of $\Sigma$, on each of which the metric $g$ assumes the form $dt^2 - g_{ij}(t, x)\,dx^i dx^j$, where $t \in (-\varepsilon_\alpha, \varepsilon_\alpha)$ and $x = (x^i)_{i=1}^3$ are coordinates on $X_\alpha$; $(g_{ij}(t, x))$ are the components of the 3-dimensional Riemannian metric induced by $g$ on the slices of constant $t$. Here, the coordinatization is assumed to be such that $(t, x)$ represents a point in $N_+$ for $t > 0$ and a point in $N_- = \operatorname{int} J^-(\Sigma)$ for $t < 0$. Moreover, $N_0$ may be chosen so that, with $g|_{N_0}$ as metric, it is a globally hyperbolic sub-spacetime of $(M, g)$; assuming now that $N_0$ has been chosen in that way, $N_0 \cap N_-$ is also a globally hyperbolic sub-spacetime with the appropriate restriction of $g$ as metric. After a moment of reflection one can see that this implies the existence of a Cauchy surface $\Sigma_1$ in $N_0 \cap N_-$ such that $J^-(\overline{W_1}) \cap J^+(\Sigma_1) \subset (-\varepsilon_1, 0) \times X_1$, by "moving $\Sigma_1$ sufficiently close to $\Sigma$". Upon moving $\Sigma_1$, if necessary, "still closer" to $\Sigma$, it is also possible to ensure that the parts of $J^-(\overline{U_1})$ and $J^-(\overline{U_2})$ lying in $J^+(\Sigma_1)$ are causally separated. With $\Sigma_1$ chosen in that manner, one can now pick a pair of small neighbourhoods $\hat U_j$ lying relatively compact in $\operatorname{int}(J^+(\Sigma_1) \cap J^-(U_j))$ ($j = 1, 2$). We may then also select another Cauchy surface $\Sigma_2$ in $N_0 \cap N_-$ with
$\operatorname{cl} \hat U_j \subset \operatorname{int} J^-(\Sigma_2), \qquad \Sigma_2 \subset \operatorname{int} J^+(\Sigma_1).$
In the next step, we endow $\Sigma$ with a complete Riemannian metric $\gamma$, which we prescribe to be a flat Euclidean metric on $X_1$ (this is possible because of (A.1), in view of the fact that $W$ is a coordinate patch). We shall, furthermore, choose $\gamma$ so that the flat Lorentzian metric $\eta$ on $(-\varepsilon_1, 0) \times X_1$ given by $\eta = dt^2 - \gamma_{ij}\,dx^i dx^j$ has, for $(t, x) \in (-\varepsilon_1, 0) \times X_1$, the property that each causal curve for $\eta$ is also a causal curve for $g$, i.e. $J_\eta(q) \subset J_g(q)$ on $(-\varepsilon_1, 0) \times X_1$. This may always be realized by rescaling $\gamma$ by a constant factor. Now define $\tilde M$ by $\tilde M = \operatorname{int} J^+(\Sigma_1)$. Let $f \in C^\infty(\tilde M, \mathbb{R}_+)$ have the following properties: $0 \le f \le 1$, $f \equiv 0$ on $J^+(\Sigma)$, $f \equiv 1$ on $J^-(\Sigma_2)$. Then define a metric $\tilde g$ on $N_0 \cap \tilde M$ by setting its coordinate expression equal to
$b(t, x)\,dt^2 - \big[f(t, x)\,\gamma_{ij} + (1 - f(t, x))\,g_{ij}(t, x)\big]\,dx^i dx^j$
on each coordinate patch $(-\varepsilon_\alpha, \varepsilon_\alpha) \times X_\alpha$. Here, $b$ is a smooth function on $N_0 \cap \tilde M$ with $0 < b \le 1$, sufficiently small so that, with the new metric $\tilde g$, $N_0$ is globally hyperbolic; from the properties of $\gamma$ mentioned before it is obvious that one can choose such a $b$ so that $b \equiv 1$ on $N_+$ and $b \equiv 1$ on the set
$Y = \operatorname{int} \tilde M \cap J^-(\Sigma_2) \cap (-\varepsilon_1, 0) \times X_1 .$
With this choice of $b$, it is moreover clear that $\tilde g$ coincides on $N_+$ with the metric $g$, and so $\tilde g$ may be extended from $N_0 \cap \tilde M$ to all of $\tilde M$ by defining $\tilde g$ as $g$ on $N_+$. Moreover, $\tilde g$ is a flat Lorentzian metric on $Y$, and, viewing $U_j$, $j = 1, 2$, canonically as subsets of $\tilde M$, the previous constructions entail that there are two globally hyperbolic sub-spacetimes $\tilde U_j$ (with metric $\tilde g$) which are relatively compact in $Y$ and have the property that $\tilde U_j \rhd U_j$ with respect to the metric $\tilde g$. Finally, one can make $Y$ slightly smaller in order to obtain a globally hyperbolic sub-spacetime $\tilde G$ of $(\tilde M, \tilde g)$ which is simply connected and still contains $\tilde U_j$ and $\hat U_j$ (if necessary, by making the $\hat U_j$ slightly smaller as well), and $\tilde g$ is flat on $\tilde G$. Therefore we have now constructed the required $(\tilde M, \tilde g)$ and the subsets $U_j, \hat U_j, \tilde U_j$ ($j = 1, 2$) and $G, \tilde G$ with the properties claimed in Lemma 2.1.
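The step "this may always be realized by rescaling $\gamma$ by a constant factor" can be made concrete pointwise. In the sketch below, which assumes that $\gamma$ starts out as the Euclidean metric $\delta_{ij}$ on $X_1$ and that the spatial components $g_{ij}$ are known at finitely many sample points (a stand-in for a uniform bound on the region), the factor $\lambda$ is the largest eigenvalue of $g_{ij}$ over the samples. Then $v_0^2 \ge \lambda |v|^2 \ge g_{ij} v^i v^j$, so every causal vector for $\eta = dt^2 - \lambda\,\delta_{ij}\,dx^i dx^j$ is causal for $g = dt^2 - g_{ij}\,dx^i dx^j$. Since positive-definite matrices form a convex set, the interpolated spatial part $f\gamma_{ij} + (1-f)g_{ij}$ also stays positive definite for $0 \le f \le 1$. The function names are hypothetical, and the sketch says nothing about smoothness or global hyperbolicity.

```python
import numpy as np


def rescale_factor(g_samples):
    """Largest eigenvalue of the sampled spatial metrics g_ij.

    Taking gamma = lam * delta_ij with lam >= this value gives
    lam * |v|^2 >= g_ij v^i v^j at every sample point.
    """
    return max(np.linalg.eigvalsh(gs).max() for gs in g_samples)


def is_causal(v, spatial, tol=1e-12):
    """Is v = (v0, v1, v2, v3) causal for dt^2 - spatial_ij dx^i dx^j?"""
    return v[0] ** 2 - v[1:] @ spatial @ v[1:] >= -tol


def random_spd(rng, n=3):
    """Random symmetric positive-definite n x n matrix, standing in for g_ij(t, x)."""
    A = rng.normal(size=(n, n))
    return A @ A.T + 0.1 * np.eye(n)


rng = np.random.default_rng(1)
g_samples = [random_spd(rng) for _ in range(20)]
lam = rescale_factor(g_samples)
eta_spatial = lam * np.eye(3)  # flat metric eta = dt^2 - lam * delta_ij dx^i dx^j

all_eta_causal, all_g_causal = True, True
for gs in g_samples:
    for _ in range(100):
        w = rng.normal(size=3)
        v0 = np.sqrt(lam * (w @ w)) * (1.0 + rng.uniform())  # eta-causal by construction
        v = np.concatenate(([v0], w))
        all_eta_causal &= is_causal(v, eta_spatial)
        all_g_causal &= is_causal(v, gs)
print(all_eta_causal, all_g_causal)  # True True
```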
$\Box$

Appendix B

In this appendix we collect the assumptions about a quantum field theory on Minkowski spacetime equipped with its standard spin structure, and quote the spin-statistics theorem for this setting. The assumptions are those given in the book by Streater and Wightman [44], except that in formulating the Bose–Fermi alternative (normal commutation relations), we will posit that Bosonic commutation relations hold in the strong sense, similarly to the statement of Thm. 4.1. See below for details.
To begin with, write $(M_0, \eta) = (\mathbb{R}^4, \mathrm{diag}(+, -, -, -))$ for Minkowski spacetime. A Lorentzian coordinate frame $(e_0, \ldots, e_3)$ has been chosen by which $M_0$ is identified with $\mathbb{R}^4$, and which also serves to fix orientation and time-orientation. The frame bundle $F(M_0, \eta)$ is isomorphic to $\mathbb{R}^4 \times L_+^\uparrow$, and for each $x \in \mathbb{R}^4$, $(x, (e_0, \ldots, e_3))$ represents an element in $F(M_0, \eta)$. Then the spin bundle $S(M_0, \eta)$ is isomorphic to $\mathbb{R}^4 \times SL(2, \mathbb{C})$, and one obtains a spin structure $\psi_0 : S(M_0, \eta) \to F(M_0, \eta)$ by assigning to $(x, s) \in S(M_0, \eta)$ the element $\psi_0(x, s) = (x, (e_0(s), \ldots, e_3(s)))$ in $F(M_0, \eta)$ with
$e_b(s) = e_a \Lambda^a{}_b(s),$
where $SL(2, \mathbb{C}) \ni s \mapsto \Lambda(s) \in L_+^\uparrow$ is the covering projection. Explicitly, the matrix components of $\Lambda(s)$ are given by
$\Lambda^a{}_b(s) = \tfrac{1}{2} \operatorname{Tr}(s^* \sigma_a s \sigma_b),$
where $\sigma_0, \ldots, \sigma_3$ are the Pauli matrices.
Now let $\rho$ denote any of the complex linear irreducible representations $D^{(k,l)}$, or of the real linear irreducible representations $D^{(k,l)} \oplus D^{(l,k)}$ ($k \neq l$), where $k, l \in \mathbb{N}_0$. The corresponding representation space will be denoted by $V_\rho$. Then we require that the quantum field theory $(\Phi_{M_0}, D_{M_0}, \mathcal{H}_{M_0})$, abbreviated in the following by $(\Phi_0, D_0, \mathcal{H}_0)$, has the following properties:
(1) $\mathcal{H}_0$ is a Hilbert space and $D_0 \subset \mathcal{H}_0$ is a dense linear subspace.
(2) $\Phi_0$ is a linear map taking elements $f$ in $S(\mathbb{R}^4, V_\rho)$ to closable operators $\Phi_0(f)$, all having the common, dense and invariant domain $D_0$. Here, $S(\mathbb{R}^4, V_\rho)$ is the set of Schwartz functions on $\mathbb{R}^4$ taking values in the finite-dimensional representation space $V_\rho$.⁵
(3) For each pair of vectors $\chi, \chi' \in D_0$, the map $S(\mathbb{R}^4, V_\rho) \ni f \mapsto (\chi, \Phi_0(f)\chi')$ is continuous, hence an element in $S'(\mathbb{R}^4, V_\rho)$.
(4) There is a strongly continuous representation $P_+^\uparrow \ni (a, s) \mapsto U(a, s)$ of $P_+^\uparrow = \mathbb{R}^4 \rtimes SL(2, \mathbb{C})$ (the covering group of the proper orthochronous Poincaré group) by unitary operators on $\mathcal{H}_0$; $D_0$ is left invariant under the action of the $U(a, s)$.
(5) The spectrum of the translation subgroup $a \mapsto U(a, 1)$ is contained in the closed forward lightcone $\overline{V^+}$, i.e. the relativistic spectrum condition holds. Moreover, there is a unit vector $\Omega \in \mathcal{H}_0$, unique up to a phase, the vacuum vector, fulfilling $U(a, s)\Omega = \Omega$ for all $(a, s) \in P_+^\uparrow$. This vector is assumed to be contained in $D_0$ and to be cyclic for the algebra generated by the field operators in the sense that $D_0$ coincides with the vector space spanned by $\Omega$ and all vectors of the form $F_1 \cdots F_n \Omega$, $n \in \mathbb{N}$, $F_j \in \{\Phi_0(f_j), \Phi_0(f_j)^*\}$, $f_1, \ldots, f_n \in S(\mathbb{R}^4, V_\rho)$.
(6) The quantum field possesses the covariance property $U(a, s)\Phi_0(f)U(a, s)^{-1} = \Phi_0(\rho_{a*}(s)f)$, where
$\rho_{a*}(s)f(y) = \rho(s)\big(f(\Lambda(s)^{-1}(y - a))\big)$
for all $a \in \mathbb{R}^4$, $s \in SL(2, \mathbb{C})$, $f \in S(\mathbb{R}^4, V_\rho)$.
(7) Spacelike clustering holds on the vacuum, i.e. if $a$ is any non-zero spacelike vector, then
$(\Omega, F_1 \cdots F_k U(ta, 1) F_{k+1} \cdots F_n \Omega) \xrightarrow[t \to \infty]{} (\Omega, F_1 \cdots F_k \Omega)(\Omega, F_{k+1} \cdots F_n \Omega)$
for all $F_j \in \{\Phi_0(f_j), \Phi_0(f_j)^*\}$ with $f_1, \ldots, f_n \in S(\mathbb{R}^4, V_\rho)$, $n \in \mathbb{N}$.
⁵ In the case of flat Minkowski spacetime, $S(M_0, \eta) = \mathbb{R}^4 \times SL(2, \mathbb{C})$, and one can canonically identify $D_\rho M_0$ with $\mathbb{R}^4 \times V_\rho$ and $\check\rho$ with $\mathrm{id} \times \rho$.
(8) Finally, the Bose–Fermi alternative is required to hold in the following form. The quantum field fulfills either
Bosonic commutation relations: given any pair of causally separated subsets $O_1, O_2 \in \mathrm{orc}(\mathbb{R}^4)$, it holds that $F_0(O_1) \subset F_0(O_2)'$,
or
Fermionic commutation relations: given any pair $f_1, f_2 \in S(\mathbb{R}^4, V_\rho)$ with spacelike separated supports, it holds that $\Phi_0(f_1)\Phi_0(f_2) + \Phi_0(f_2)\Phi_0(f_1) = 0$ and $\Phi_0(f_1)\Phi_0(f_2)^* + \Phi_0(f_2)^*\Phi_0(f_1) = 0$.
In formulating the statement of Bosonic commutation relations (or locality, as it is also called), $F_0(O)$ denotes the von Neumann algebra generated via the polar decomposition of the closed field operators $\Phi_0(f)$ with $\operatorname{supp} f \subset O$, as described in assumption (a) of Sect. 4. The above statement of Bosonic commutation relations is thus equivalent to saying that the field operators $\Phi_0(f_1)$ and $\Phi_0(f_2)$ commute strongly for spacelike separated supports of $f_1$ and $f_2$; here we say that a pair of closable operators $X_j$ ($j = 1, 2$) commutes strongly if $J_1$ and $e^{is|X_1|}$ commute with $J_2$ and $e^{it|X_2|}$, $s, t \in \mathbb{R}$, where $X_j = J_j|X_j|$ denotes the polar decomposition. Clearly, strong commutativity of field operators at spacelike separation implies their spacelike commutativity in the ordinary sense,
$\Phi_0(f_1)\Phi_0(f_2) - \Phi_0(f_2)\Phi_0(f_1) = 0$ and $\Phi_0(f_1)\Phi_0(f_2)^* - \Phi_0(f_2)^*\Phi_0(f_1) = 0$
whenever the supports of $f_1$ and $f_2$ are spacelike separated. Without further information, however, one cannot in general conclude the converse, i.e. that this last relation implies spacelike commutativity of the field operators in the strong sense, since the field operators will usually be unbounded. The question of when this conclusion may nevertheless be drawn for field operators in quantum field theory is a longstanding one; however, several criteria are known. We refer the reader to [7, 18] for further discussion and references. Suffice it to say here that ordinary spacelike commutativity is expected to imply strong spacelike commutativity of field operators in physically relevant theories. We also mention that in Def. 4.1 the quantum field $\Phi_0 = \Phi_{M_0}$ has only been assumed to be an operator-valued distribution defined on test spinors of compact support, which would correspond to elements in $D(\mathbb{R}^4, V_\rho)$. Thus, we assume here that $\Phi_0$ can be extended to an operator-valued distribution on $S(\mathbb{R}^4, V_\rho)$ with the properties stated above. Now we quote the spin-statistics theorem for a quantum field theory on Minkowski spacetime, which is proved in [44] for complex linear irreducible $\rho$ and in [33] for real linear irreducible $\rho$. (In fact, the results in [44, 33] are slightly more general, since Bosonic commutation relations are only required in the ordinary sense there.)
Theorem 2.1. Suppose that the quantum field theory $(\Phi_0, D_0, \mathcal{H}_0)$ on Minkowski spacetime fulfills the above listed Conditions (1)–(8). Then each of the following two cases implies that $\Phi_0(f) = 0$, $f \in S(\mathbb{R}^4, V_\rho)$, and hence that $F_0(O) = \mathbb{C} \cdot 1$ holds for all bounded open regions $O$ in Minkowski spacetime:
(α) Bosonic commutation relations hold and the field is of half-integer spin type ($k + l$ is odd).
(β) Fermionic commutation relations hold and the field is of integer spin type ($k + l$ is even).
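The covering projection $s \mapsto \Lambda(s)$ of Appendix B can be checked numerically. The sketch below (NumPy; the helper names are hypothetical) evaluates $\Lambda^a{}_b(s) = \tfrac12 \operatorname{Tr}(s^* \sigma_a s \sigma_b)$, taking $\sigma_0$ to be the $2 \times 2$ unit matrix, and checks for random $s \in SL(2, \mathbb{C})$ that $\Lambda(s)^T \eta \Lambda(s) = \eta$, that $\Lambda$ is multiplicative, and that $\Lambda(-s) = \Lambda(s)$, consistent with the map being a double cover. By cyclicity of the trace, $\Lambda^a{}_b(s)$ is the $\sigma_a$-coefficient of $s\sigma_b s^*$, so $\Lambda(s)$ implements $X \mapsto sXs^*$ on $X = x^b\sigma_b$, which preserves $\det X = \eta(x, x)$.

```python
import numpy as np

# sigma_0 = unit matrix, followed by the three Pauli matrices
SIGMA = [
    np.eye(2, dtype=complex),
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
ETA = np.diag([1.0, -1.0, -1.0, -1.0])


def Lam(s):
    """Matrix Lambda^a_b(s) = (1/2) Tr(s^* sigma_a s sigma_b); the trace is real."""
    L = np.empty((4, 4))
    for a in range(4):
        for b in range(4):
            L[a, b] = 0.5 * np.trace(s.conj().T @ SIGMA[a] @ s @ SIGMA[b]).real
    return L


def random_sl2c(rng):
    """Random element of SL(2, C), obtained by normalizing a random complex matrix."""
    M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    return M / np.sqrt(np.linalg.det(M))


rng = np.random.default_rng(2)
s1, s2 = random_sl2c(rng), random_sl2c(rng)
L1 = Lam(s1)
print(np.allclose(L1.T @ ETA @ L1, ETA))        # True: Lambda(s1) is a Lorentz matrix
print(np.allclose(Lam(s1 @ s2), L1 @ Lam(s2)))  # True: group homomorphism
print(np.allclose(Lam(-s1), L1))                # True: s and -s have the same image
```

Orthochronicity also follows directly: $\Lambda^0{}_0(s) = \tfrac12 \operatorname{Tr}(s^* s) \ge 1$ because the two singular values of $s$ have product $1$.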
Appendix C

In this appendix we will explain how a generally covariant quantum field theory over $G$ may be viewed as a covariant functor between the category $G$ and a category $N$ of nets of von Neumann algebras over manifolds (more generally, one could consider $N$ as the category of isotonous families of von Neumann algebras indexed by directed index sets, but we do not need that generality here). A similar functorial description has been given by Dimock [14] for the case that the morphisms of $G$ are global isomorphisms and that $N$ is a category of $C^*$-algebraic nets. Here, we take the morphisms of $G$ to be the local isomorphisms, and correspondingly we have to consider local morphisms for $N$.
We now consider $G$ as a category whose objects are the four-dimensional, globally hyperbolic spacetimes with a spin structure. Given $M_1$ and $M_2$ in $G$, we define the set of morphisms $\hom(M_1, M_2)$ to consist of the local isomorphisms between $M_1$ and $M_2$. We also add to $\hom(M_1, M_2)$ a trivial morphism $0$. (Strictly, $0$ should be indexed by $M_1$ and $M_2$, but that is inconvenient and will be skipped, as there is no danger of confusion.) The composition of two morphisms $\Psi_a \in \hom(M_1, M_2)$ and $\Psi_b \in \hom(M_2, M_3)$ is defined according to the following rules: If $\Psi_a = 0$ or $\Psi_b = 0$, then $\Psi_b\Psi_a = 0$. If both $\Psi_a$ and $\Psi_b$ are non-trivial, but $*_{\mathrm{ini}}(\Psi_b) \cap *_{\mathrm{fin}}(\Psi_a) = \emptyset$, then also $\Psi_b\Psi_a = 0$. Otherwise, we declare $\Psi_b\Psi_a$ to be the local isomorphism between $M_1$ and $M_3$ obtained by composing the bundle maps and isometries on their natural domains, so that $*_{\mathrm{ini}}(\Psi_b\Psi_a) = \vartheta_a^{-1}(*_{\mathrm{ini}}(\Psi_b) \cap *_{\mathrm{fin}}(\Psi_a))$. This is reasonable because it is not difficult to show that the intersection of two globally hyperbolic submanifolds of a globally hyperbolic spacetime is again a globally hyperbolic submanifold. The identical bundle map gives the unit element in $\hom(M, M)$, and one can straightforwardly check that composition of morphisms is also associative.
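The composition rule for morphisms of $G$ can be mimicked in a toy model in which spacetimes are replaced by finite point sets and local isomorphisms by partial bijections between them. This is only a hypothetical illustration: it ignores the bundle maps, the isometry condition and global hyperbolicity, and none of the names below come from the text. The trivial morphism $0$ is represented by the empty partial map, so the rules "$\Psi_b\Psi_a = 0$ if either factor is trivial, or if the initial domain of $\Psi_b$ misses the final domain of $\Psi_a$" come out automatically, and the composite is defined exactly on $\vartheta_a^{-1}$ of the overlap.

```python
from typing import Dict

# A toy "local isomorphism": a partial bijection {point of M1: point of M2}.
# Its keys play the role of the initial domain, its values the final domain.
# The trivial morphism 0 is the empty dict.
PartialIso = Dict[str, str]


def compose(b: PartialIso, a: PartialIso) -> PartialIso:
    """Return b∘a, defined on a^{-1}(ini(b) ∩ fin(a)); empty (= 0) if that set is empty."""
    return {x: b[y] for x, y in a.items() if y in b}


def identity(points) -> PartialIso:
    """Unit element of hom(M, M) in the toy model."""
    return {p: p for p in points}


a = {"p": "q", "r": "s"}  # M1 -> M2, defined on {p, r}
b = {"q": "u", "t": "v"}  # M2 -> M3, defined on {q, t}
c = {"u": "w"}            # M3 -> M4

print(compose(b, a))                    # {'p': 'u'}: restricted to a^{-1}({q})
print(compose({"t": "v"}, {"p": "q"}))  # {}: disjoint domains give the trivial morphism
```

Associativity of this toy composition is inherited from the associativity of composing partial maps, which mirrors the remark at the end of the paragraph above.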
The objects of the category $N$ are families $F = \{F(O)\}_{O \in \mathrm{orc}(X)}$ of von Neumann algebras which are indexed by the open, relatively compact subsets of a manifold $X$ and which are subject to the condition of isotony (cf. Sect. 4, item (a)). The morphisms in $\hom(F_1, F_2)$ are local net isomorphisms. A local net isomorphism is a pair $(\{\alpha_{N_i}\}, \phi)$ with the following properties: $\phi : X_1 \supset N_1 \to N_2 \subset X_2$ is a diffeomorphism between open subsets of the manifolds $X_1$ and $X_2$, which relate to the indexing sets of $F_1$ and $F_2$ in the obvious manner, and $\{\alpha_{N_i}\}_{N_i \in \mathrm{orc}(N_1)}$ is a family of von Neumann algebraic isomorphisms $\alpha_{N_i} : F_1(N_i) \to F_2(N_f)$, with $N_f = \phi(N_i)$, obeying the covariance property
$\alpha_{N_i}(F_1(O)) = F_2(\phi(O)), \qquad O \in \mathrm{orc}(N_i).$
As before, we add to the local net isomorphisms in $\hom(F_1, F_2)$ a trivial morphism $0$ (which may here be concretely thought of as the map which sends each algebra element in the net $F_1$ to the algebraic zero element in the net $F_2$). The composition rule for morphisms is then analogous to the one given before; we only have to specify the case of two net isomorphisms $(\{\alpha_{N_i}\}, \phi) \in \hom(F_1, F_2)$ and $(\{\beta_{N_i'}\}, \phi') \in \hom(F_2, F_3)$ when $*_{\mathrm{ini}}(\phi') \cap *_{\mathrm{fin}}(\phi) \neq \emptyset$. In this situation, we define the composition of the two morphisms as the element $(\{\gamma_{N_i}\}, \psi)$ in $\hom(F_1, F_3)$, where $\psi$ is $\phi' \circ \phi$ restricted to $\phi^{-1}(*_{\mathrm{ini}}(\phi') \cap *_{\mathrm{fin}}(\phi))$, and for any open, relatively compact subset $N_i$ in $*_{\mathrm{ini}}(\psi)$ we define $\gamma_{N_i} = \beta_{\phi(N_i)} \circ \alpha_{N_i}$. Again, each $\hom(F, F)$ contains the identical map as an identity, and one may check the associativity of the composition rule. Then the covariance structure (Condition (C) of Def. 4.1) of a generally covariant quantum field theory is that of a covariant functor $F : G \to N$ which assigns to each
object $M \in G$ an object $F(M) = \{F(O)\}_{O \in \mathrm{orc}(M)}$ in $N$, and which assigns to each (non-trivial) morphism $\Psi = (\psi, \vartheta)$ of $G$ a morphism $F(\Psi) = (\{\alpha_{\Psi, N_i}\}, \vartheta)$ of $N$. Moreover, $F$ maps trivial morphisms to trivial morphisms. Diagrammatically, one has
$$\begin{array}{ccc} M_1 & \xrightarrow{\;F\;} & \{F_1(O)\}_{O \in \mathrm{orc}(M_1)} \\ \Big\downarrow{\scriptstyle \Psi} & & \Big\downarrow{\scriptstyle (\{\alpha_{\Psi, N_i}\}, \vartheta)} \\ M_2 & \xrightarrow{\;F\;} & \{F_2(U)\}_{U \in \mathrm{orc}(M_2)} \end{array}$$
Note added in proof.
• A more general and concise functorial description of the principle of general covariance will appear in [53].
• The required properties concerning Hadamard states mentioned at the end of Sect. 6 have recently been discussed in a preprint by D'Antoni and Hollands [54].
References
1. Araki, H.: On quasifree states of CAR and Bogoliubov transformations. Publ. RIMS 6, 385 (1970/71)
2. Balachandran, A.P., Batista, E., Costa e Silva, I.P., Teotonio-Sobrinho, P.: The spin-statistics connection in quantum gravity. Nucl. Phys. B 566, 441 (2000)
3. Bannier, U.: On generally covariant quantum field theory and generalized causal and dynamical structures. Commun. Math. Phys. 118, 163 (1988)
4. Bisognano, J.J., Wichmann, E.H.: On the duality condition for a Hermitian scalar field. J. Math. Phys. 16, 985 (1975); On the duality condition for quantum fields. J. Math. Phys. 17, 303 (1976)
5. Bogoliubov, N.N., Logunov, A.A., Todorov, I.T., Oksak, A.I.: General principles of quantum field theory. Dordrecht: Kluwer Academic Publishers, 1990
6. Borchers, H.-J.: On revolutionizing quantum field theory with Tomita's modular theory. J. Math. Phys. 41, 3604 (2000)
7. Borchers, H.-J., Yngvason, J.: From quantum fields to local von Neumann algebras. Rev. Math. Phys. Special Issue, 15 (1992)
8. Bratteli, O., Robinson, D.W.: Operator algebras and quantum statistical mechanics, Vol. 1, 2nd edn. Berlin: Springer-Verlag, 1987
9. Bratteli, O., Robinson, D.W.: Operator algebras and quantum statistical mechanics, Vol. 2, 2nd edn. Berlin: Springer-Verlag, 1997
10. Buchholz, D., Epstein, H.: Spin and statistics of quantum topological charges. Fizika 17, 329 (1985)
11. Burgoyne, N.: On the connection of spin with statistics. Nuovo Cim. 8, 607 (1958)
12. Dieudonné, J.: Foundations of Analysis, Vol. 3. New York: Academic Press, 1972
13. Dimock, J.: Algebras of local observables on a manifold. Commun. Math. Phys. 77, 219 (1980)
14. Dimock, J.: Dirac quantum fields on a manifold. Trans. Am. Math. Soc. 269, 133 (1982)
15. Doplicher, S., Haag, R., Roberts, J.E.: Local observables and particle statistics, I. Commun. Math. Phys. 23, 199 (1971); II. Commun. Math. Phys. 35, 49 (1974)
16. Doplicher, S., Roberts, J.E.: Why there is a field algebra with a compact gauge group describing the superselection structure in particle physics? Commun. Math. Phys. 131, 51 (1990)
17. Dowker, H.F., Sorkin, R.D.: A spin-statistics theorem for certain topological geons. Class. Quantum Grav. 15, 1153 (1998)
18. Driessler, W., Summers, S.J., Wichmann, E.H.: On the connection between quantum fields and von Neumann algebras of local operators. Commun. Math. Phys. 105, 49 (1986)
19. Epstein, H.: CTP invariance in a theory of local observables. J. Math. Phys. 8, 750 (1967)
20. Fierz, M.: Über die relativistische Theorie kräftefreier Teilchen mit beliebigem Spin. Helv. Phys. Acta 12, 3 (1939)
21. Fulling, S.A.: Aspects of quantum field theory in curved spacetime. Cambridge: Cambridge University Press, 1989
22. Fulling, S.A., Narcowich, F.J., Wald, R.M.: Singularity structure of the two-point function in quantum field theory in curved spacetime, II. Ann. Phys. (N.Y.) 136, 243 (1981)
23. Furlani, E.P.: Quantization of massive vector fields in curved space-time. J. Math. Phys. 40, 2611 (1999)
24. Furlani, E.P.: Quantization of massive vector fields on ultrastatic spacetimes. Class. Quantum Grav. 14, 1665 (1997)
25. Geroch, R.: Spinor structure of space-times in general relativity. J. Math. Phys. 9, 1739 (1968)
26. Guido, D., Longo, R.: An algebraic spin and statistics theorem. Commun. Math. Phys. 172, 517 (1995)
27. Guido, D., Longo, R.: The conformal spin-statistics theorem. Commun. Math. Phys. 181, 11 (1996)
28. Guido, D., Longo, R., Roberts, J.E., Verch, R.: Charged sectors, spin and statistics in quantum field theory on curved spacetimes. Rev. Math. Phys. 13, 125 (2001)
29. Haag, R.: Local quantum physics, 2nd edn. Berlin: Springer-Verlag, 1996
30. Haag, R., Kastler, D.: An algebraic approach to quantum field theory. J. Math. Phys. 5, 848 (1964)
31. Hawking, S.W., Ellis, G.F.R.: The large scale structure of space-time. Cambridge: Cambridge University Press, 1973
32. Hollands, S., Wald, R.M.: Local Wick polynomials and time ordered products of quantum fields in curved spacetime. Preprint gr-qc/0103074
33. Jost, R.: The general theory of quantized fields. Lectures in applied mathematics, Vol. 4. Providence, RI: American Mathematical Society, 1965
34. Kay, B.S.: Quantum fields in curved spacetime: Non global hyperbolicity and locality. In: The Proceedings of the Conference Operator algebras and quantum field theory held in Rome, July 1996, S. Doplicher, R. Longo, J.E. Roberts, L. Zsido, eds. Cambridge, MA: International Press, 1997
35. Kuckert, B.: A new approach to spin and statistics. Lett. Math. Phys. 35, 319 (1995)
36. Lüders, G., Zumino, B.: Connection between spin and statistics. Phys. Rev. 110, 1450 (1958)
37. Parker, L., Wang, Y.: Statistics from dynamics in curved spacetime. Phys. Rev. D 39, 3596 (1989)
38. Pauli, W.: On the connection between spin and statistics. Phys. Rev. 58, 716 (1940)
39. Radzikowski, M.J.: Micro-local approach to the Hadamard condition in quantum field theory in curved spacetime. Commun. Math. Phys. 179, 529 (1996)
40. Rehren, K.-H.: Spin-statistics and CPT for solitons. Lett. Math. Phys. 46, 95 (1998)
41. Sahlmann, H., Verch, R.: Passivity and microlocal spectrum condition. Commun. Math. Phys. 214, 705 (2000)
42. Sahlmann, H., Verch, R.: Microlocal spectrum condition and Hadamard form for vector-valued quantum fields in curved spacetime. Preprint math-ph/0008029, to appear in Rev. Math. Phys.
43. Schlieder, S.: Einige Bemerkungen über Projektionsoperatoren. Commun. Math. Phys. 13, 216 (1969)
44. Streater, R.F., Wightman, A.S.: PCT, spin and statistics, and all that. New York: Benjamin, 1964
45. Strohmaier, A.: The Reeh–Schlieder property for quantum fields on stationary spacetimes. Commun. Math. Phys. 215, 105 (2000)
46. Takesaki, M.: Tomita's theory of modular Hilbert algebras and its applications. Lecture Notes in Mathematics, Vol. 128. Berlin–Heidelberg–New York: Springer-Verlag, 1970
47. Taylor, M.E.: Pseudodifferential operators. Princeton, NJ: Princeton University Press, 1981
48. Verch, R.: Local definiteness, primarity and quasiequivalence of quasifree Hadamard quantum states in curved spacetime. Commun. Math. Phys. 160, 507 (1994)
49. Verch, R.: Scaling analysis and ultraviolet behaviour of quantum field theories in curved spacetime. Dissertation, Hamburg University, 1996
50. Wald, R.M.: Existence of the S-matrix in quantum field theory in curved space-time. Ann. Phys. (N.Y.) 118, 490 (1979)
51. Wald, R.M.: General relativity. Chicago, IL: University of Chicago Press, 1984
52. Wald, R.M.: Quantum field theory in curved spacetime and black hole thermodynamics. Chicago, IL: University of Chicago Press, 1994
53. Brunetti, R., Fredenhagen, K., Verch, R.: The generally covariant locality principle: A new paradigm for local quantum field theory. In preparation
54. D'Antoni, C., Hollands, S.: Nuclearity, local quasiequivalence and split property for Dirac quantum fields in curved spacetime. Preprint math-ph/0106028
Communicated by H. Nicolai
Commun. Math. Phys. 223, 289 – 326 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Local Wick Polynomials and Time Ordered Products of Quantum Fields in Curved Spacetime
Stefan Hollands, Robert M. Wald
Enrico Fermi Institute, Department of Physics, University of Chicago, 5640 Ellis Ave., Chicago, IL 60637, USA
Received: 27 March 2001 / Accepted: 6 June 2001
Abstract: In order to have well defined rules for the perturbative calculation of quantities of interest in an interacting quantum field theory in curved spacetime, it is necessary to construct Wick polynomials and their time ordered products for the noninteracting theory. A construction of these quantities has recently been given by Brunetti, Fredenhagen, and Köhler, and by Brunetti and Fredenhagen, but they did not impose any “locality” or “covariance” condition in their constructions. As a consequence, their construction of time ordered products contained ambiguities involving arbitrary functions of the spacetime point rather than arbitrary parameters. In this paper, we construct an “extended Wick polynomial algebra” – large enough to contain the Wick polynomials and their time ordered products – by generalizing a construction of Dütsch and Fredenhagen to curved spacetime. We then define the notion of a local, covariant quantum field, and seek a definition of local Wick polynomials and their time ordered products as local, covariant quantum fields. We introduce a new notion of the scaling behavior of a local, covariant quantum field, and impose scaling requirements on our local Wick polynomials and their time ordered products as well as certain additional requirements – such as commutation relations with the free field and appropriate continuity properties under variations of the spacetime metric. For a given polynomial order in powers of the field, we prove that these conditions uniquely determine the local Wick polynomials and their time ordered products up to a finite number of parameters. (These parameters correspond to the usual renormalization ambiguities occurring in Minkowski spacetime together with additional parameters corresponding to the coupling of the field to curvature.) We also prove existence of local Wick polynomials. However, the issue of existence of local time ordered products is deferred to a future investigation.
1. Introduction
Despite some important differences from quantum field theory in Minkowski spacetime caused by the lack of a “preferred vacuum state”, the theory of a linear quantum field in a
S. Hollands, R. M. Wald
globally hyperbolic, curved spacetime is entirely well formulated (see, e.g., [14,20] for a review). However, even in Minkowski spacetime, the theory of a nonlinear (i.e., self-interacting) quantum field is not, in general, well formulated. Nevertheless, in Minkowski spacetime there are well defined rules for obtaining perturbation series expressions for all quantities of interest for a nonlinear field (and in particular the interacting field itself). These perturbation expressions are defined up to certain, well specified “renormalization ambiguities”. It is of interest to know if a similar perturbative definition of nonlinear quantum fields can be given in curved spacetime and, if so, whether the renormalization ambiguities in curved spacetime are of the same nature as those in Minkowski spacetime. This issue was analyzed by Bunch and collaborators [3, 4], but the key steps in this analysis were done in the context of Riemannian spaces rather than Lorentzian spacetimes. Now, Minkowski spacetime can be viewed as a real section of a complex 4-dimensional space that also contains a 4-dimensional, real Euclidean section. It is well known that a suitable definition of a field theory on this Euclidean section gives rise (via analytic continuation) to the definition of a field theory in Minkowski spacetime. However, no such connection between Riemannian and Lorentzian field theory holds for curved spacetimes, since (apart from a few special classes of spacetimes, such as static spacetimes) a general Lorentzian spacetime cannot be expressed as a section of a complex spacetime that also contains a real Riemannian section. Furthermore, the techniques used by Bunch cannot readily be generalized to the Lorentzian case because of the very significant mathematical differences in the nature of the divergences occurring in the Riemannian and Lorentzian cases.
For example, in the Riemannian case, it follows from elliptic regularity that Green’s functions for the free theory are unique up to addition of smooth functions. However, no such result holds in the Lorentzian case, as exemplified by the very different properties of the advanced, retarded, and Feynman propagators. Furthermore, singularities in the Green’s function occur only in the coincidence limit in the Riemannian case, but they occur also for non-coincident, lightlike related events in the Lorentzian case. As a result, formulas like (2.14) of [3], which play a crucial role in the Riemannian analysis, cannot be readily taken over to the Lorentzian case. In addition, dimensional regularization and other renormalization techniques used in Riemannian spaces are not well defined in Lorentzian spacetimes. Recently, significant progress in the definition of perturbative quantum field theory in Lorentzian spacetimes was made by Brunetti and Fredenhagen [1, 2], who used the methods of “microlocal analysis” [11, 7] to analyze the nature of the divergences occurring in the Lorentzian theory. In [1], these authors considered the Fock space arising (via the GNS construction) from a choice of quasi-free Hadamard state ω. They showed that on this Hilbert space, the Wick polynomials – generated by the (formally infinite) products of field operators and their derivatives evaluated at the same spacetime point – can be given a well-defined meaning as operator-valued distributions via a normal ordering prescription with respect to ω. In [2], they then used an adaptation of the Epstein–Glaser method [8] of renormalization in Minkowski spacetime to analyze time ordered products of Wick polynomials, which are the quantities needed for a perturbative construction of the interacting field theory.
They thereby showed that quantum field theories in curved spacetime could be given the same “perturbative classification” as in Minkowski spacetime, i.e., that all of the “ultraviolet divergences” of the theory in curved spacetime are of the same nature as in Minkowski spacetime. Nevertheless, their analysis in curved spacetime left open a much greater renormalization ambiguity than in Minkowski spacetime: In essence, quantities that appear at each perturbation order in Minkowski spacetime
Local Wick Polynomials and Time Ordered Products Curved Spacetime
as renormalized coupling constants now appear in curved spacetime as renormalized coupling functions, whose dependence upon the spacetime point can be arbitrary. It seems clear that the missing ingredient in the analysis of [2] is the imposition of a suitable requirement of covariance/locality on the renormalization prescription, as was previously given for the definition of the stress-energy tensor of a free quantum field (see pp. 89–91 of [20]). The imposition of such a condition should provide an appropriate replacement for the imposition of Poincaré invariance in Minkowski spacetime. When such a condition is imposed, one would expect that the renormalized coupling functions would no longer be arbitrary functions of the spacetime point but would be locally constructed out of the metric in a covariant manner. Furthermore, one might expect that when suitable continuity and scaling requirements are also imposed, the ambiguities should be reduced to finitely many free parameters at each order rather than free functions. The renormalization ambiguities would then correspond to the renormalization ambiguities in Minkowski spacetime together with the renormalization of some additional parameters associated with couplings of the quantum field to curvature. The main purpose of this paper is to show that these expectations are correct with regard to the uniqueness (though not necessarily the existence) of the perturbatively defined theory. A key step in our analysis is to define the notion of a local¹, covariant quantum field. The basic idea behind this notion is to consider a situation wherein one changes the metric outside of some region O and, in essence, demands that the local, covariant quantum field not change within O. A precise definition of this notion will be given in Sect. 3 below (see Def. 3.2). In Sect.
3, we will also explicitly see that the Wick polynomials as defined in [1] fail to be local, covariant quantum fields (no matter how ω is chosen); consequently, neither are the time ordered products of these fields constructed in [2]. These quantities must therefore not be used for the definition of the local observables in the interacting theory; their definition depends on a choice of a reference state ω, which is itself a highly nonlocal quantity. Our analysis will proceed as follows: First, we will obtain, for any given globally hyperbolic spacetime (M, g), an abstract “extended Wick polynomial algebra”, W(M, g), via a normal ordering prescription with respect to a quasi-free Hadamard state, ω. (We refer to our algebra W(M, g) as “extended”, because it is actually enlarged beyond the usual Wick polynomial algebra so as to already include elements corresponding to the time ordered products of Wick polynomials.) Our construction of this algebra is essentially a straightforward generalization to curved spacetime (using the methods of [1]) of a construction previously given by [6] in the context of Minkowski space. We then note that the resulting operator algebra – viewed as an abstract algebra – is independent of the choice of ω. Next, we will seek to identify the elements of this abstract algebra that merit the interpretation of representing the various Wick polynomials and time ordered products. As indicated above, the crucial requirement that we shall place on these elements is that they be local, covariant quantum fields. We shall refer to these elements as “local Wick polynomials” and “local time ordered products”. Some other “specific properties” – such as commutation relations with the free field – will also be imposed as requirements on the definitions of these quantities. It is worth emphasizing that, unlike in Minkowski space, we will find that some ambiguities necessarily arise in defining local Wick polynomials. 
¹ In quantum field theory, the terminology “local field” is commonly used to mean a field that commutes with itself at spacelike separated events. Our use of the terminology “local, covariant field” here is not related to this notion. Rather, we use this terminology to express the idea that the field is constructed in a local and covariant way from the spacetime metric, as precisely defined in Sect. 3 below.
Consequently, renormalization ambiguities in defining perturbative quantum field theory in curved spacetime arise not only from the definition of time ordered products of Wick polynomials but also from the definition of the local Wick polynomials themselves. As indicated above, after our locality/covariance requirement and our other specific properties have been imposed on the definition of Wick polynomials and their time ordered products, we will find that the ambiguities in the definitions of these quantities will be reduced from arbitrary functions of the spacetime point to functions that are locally constructed from the metric (as well as parameters that appear in the classical theory) in a covariant manner. However, in order to further reduce the ambiguities to the renormalization of finitely many parameters at each order, there are two other conditions we must impose: (i) a suitable continuous/analytic dependence of the local Wick polynomials and their time ordered products on the metric, g, and coupling constants, p, and (ii) a suitable scaling behavior of these quantities. However, neither of these notions is straightforward to define. The difficulty with defining a suitable notion of the continuous dependence of an element in W(M, g) on the metric and parameters occurring in the classical theory arises from the fact that the Wick polynomial algebra W(M, g) for a spacetime (M, g) is not naturally isomorphic to the Wick polynomial algebra W(M, g′) for a different spacetime (M, g′), so it is far from clear what it means for an element of the Wick polynomial algebra to vary continuously as g is continuously varied to g′. Fortunately, the task of defining this notion is made much easier by the fact that we are concerned only with local, covariant quantum fields, so we may restrict attention to metric variations that occur in some spacetime region O with compact closure.
In order to make use of a similar simplification with regard to variations of the parameters, p, appearing in the classical theory, it is convenient to allow these parameters to become functions of the spacetime point and then also to restrict attention to variations that occur only within O. If g′ agrees with g and p′ agrees with p outside of O, we can identify an element of $W_p(M, g)$ with the element of $W_{p'}(M, g')$ which, say, agrees with it outside of the future² of O (where we have put a subscript p on the algebras to indicate their dependence on the coupling parameters). With this identification of elements of the different algebras, we require that if (g(s), p(s)) vary smoothly with s in a suitable sense, then within O each local Wick polynomial and time ordered product of local Wick polynomials must vary continuously with s. A precise formulation of this requirement will be given in Sect. 4.2 below. The above requirement that the local Wick polynomials and their time ordered products depend continuously on the metric would not suffice to eliminate non-analytic local curvature ambiguities of the sort considered in [19]. We therefore shall impose an additional analyticity requirement that states that if g(s) is a one-parameter analytic family of analytic metrics, then each local Wick polynomial and time ordered product of local Wick polynomials must vary analytically with s; we similarly require analytic variation of local Wick polynomials and their time ordered products under analytic variation of the parameters p. However, for analytic spacetimes, we cannot use the above method to identify algebras of different spacetimes, since one can no longer make local variations of the metric.
Instead, we proceed by introducing a notion of an analytic family, ω(s), of quasi-free Hadamard states on (M, g(s)), and we require that the distributions obtained by acting with ω(s) on the local Wick polynomials and their time ordered products vary analytically with s in a suitable sense. A precise formulation of these requirements will be given in Sect. 4.2.
² We would obtain a different identification of the algebras by demanding agreement outside the past of O, but this would give rise to an equivalent notion of continuous dependence.
In Minkowski spacetime, scaling behavior is usually formulated in terms of how fields behave under the transformation x → λx. Such a formulation would be highly coordinate-dependent in curved spacetime and thus would be very awkward to implement. Our notion of local, covariant quantum fields allows us to formulate a notion of scaling in terms of the behavior of these fields under the scaling of the spacetime metric, g → λ²g (where λ is a constant), together with associated scalings of the parameters, p, occurring in the theory. Note that in Minkowski spacetime, consideration of the behavior of a local, covariant quantum field under scaling of the spacetime metric, g → λ²g, is equivalent to considering the behavior of these fields under x → λx, since this diffeomorphism is a conformal isometry with constant conformal factor λ²; thus x → λx with fixed metric is equivalent via a diffeomorphism to g → λ²g at each fixed x. If we consider a classical field theory that is invariant under g → λ²g together with corresponding scaling transformations on the field and on the parameters, p → p(λ), appearing in the theory, then the corresponding field algebras, $W_{p(\lambda)}(M, \lambda^2 g)$, will be naturally isomorphic to each other. It might appear natural to require that our definition of local Wick polynomials and their time ordered products be such that they are preserved under this isomorphism of the algebras. However, even in quantum field theory in Minkowski spacetime, it is well known that such a requirement cannot be imposed on time ordered products. In curved spacetime, we shall show that such a scaling requirement cannot be imposed upon the local Wick polynomials either. However, it is possible to require that the failure of the local Wick polynomials and their time ordered products to scale like their classical counterparts is given by terms with only logarithmic dependence upon λ. This notion is made precise in Sect. 4.3. The main results of this paper may now be summarized.
First, we shall construct the algebra W(M, g) for an arbitrary globally hyperbolic spacetime. We then define the notion of a “local, covariant quantum field” and provide an axiomatic characterization of “local Wick polynomials” and their time ordered products. We shall then prove the existence of local Wick polynomials via an explicit construction, and we shall give a precise characterization of their non-uniqueness. Next, we consider the time ordered products of local Wick polynomials. We shall obtain a precise characterization of the non-uniqueness of these time ordered products in a manner similar to our analysis of the non-uniqueness of the local Wick polynomials. However, the existence of time ordered products that satisfy our covariance/locality requirement cannot be readily proven because the Epstein–Glaser prescription does not manifestly preserve covariance/locality. Consequently, we shall defer the question of the existence of such time ordered products to a future investigation. For simplicity and definiteness, we shall restrict consideration in this paper to the theory of a real scalar field. However, the generalization of our definitions and conclusions to other fields should be straightforward.
Notations and conventions. Throughout, (M, g) denotes a globally hyperbolic, time-oriented spacetime. The manifold structure of M is assumed to be real analytic, and the metric tensor $g \equiv g_{ab}$ is assumed to be smooth (but not necessarily analytic). Our conventions regarding the spacetime geometry are those of [21]. $V_x^\pm$ denote the closed future and past light cones at a point x, respectively. $\Box_g = g^{ab}\nabla_a\nabla_b$ is the wave operator in curved space and $\mu_g = |\det g|^{1/2}\, d^4x$. D(M) is the space of smooth complex-valued functions on M with compact support and D′(M) is the corresponding dual space of distributions. Our convention for the Fourier transform in $\mathbb{R}^n$ is $\hat u(k) = (2\pi)^{-n/2}\int e^{+ikx}\, u(x)\, d^n x$.
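A minimal numerical sketch (not part of the original) of the sign convention just fixed: in one dimension, the shifted Gaussian $u(x) = e^{-(x-a)^2/2}$ has $\hat u(k) = e^{ika}e^{-k^2/2}$ under the stated convention, whereas the opposite sign in the exponent would give the phase $e^{-ika}$. The snippet approximates the integral by a Riemann sum on a uniform grid with NumPy; the grid, the shift `a`, and the helper name `fourier_transform` are illustrative assumptions.

```python
import numpy as np


def fourier_transform(u_vals, x, k):
    """Riemann-sum approximation of u_hat(k) = (2*pi)**(-1/2) * integral of exp(+i k x) u(x) dx, n = 1.

    Assumes x is a uniform grid and u is negligible near both ends of it.
    """
    dx = x[1] - x[0]
    phase = np.exp(1j * np.outer(k, x))  # rows: values of k, columns: grid points x
    return (2 * np.pi) ** -0.5 * (phase @ u_vals) * dx


a = 1.5                                   # illustrative shift of the Gaussian
x = np.linspace(-20.0, 20.0, 4001)
u = np.exp(-(x - a) ** 2 / 2)
k = np.linspace(-3.0, 3.0, 13)

approx = fourier_transform(u, x, k)
exact = np.exp(1j * k * a - k ** 2 / 2)   # the e^{+ikx} convention produces the phase e^{+ika}
print(np.max(np.abs(approx - exact)))      # maximum deviation from the closed form
```

Because the integrand is a rapidly decaying smooth function, the plain Riemann sum on this grid is already very accurate; no quadrature library is needed.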
2. Definition of the Extended Wick-Polynomial Algebra
2.1. Definition of the fundamental algebra of observables associated with a quantized Klein–Gordon field. The theory of a free classical Klein–Gordon field on a spacetime (M, g) with mass m and curvature coupling ξ is described by the action
$$S = \int_M L_0\,\mu_g = \int_M \left(g^{ab}\nabla_a\varphi\,\nabla_b\varphi + \xi R\varphi^2 + m^2\varphi^2\right)\mu_g. \tag{1}$$
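For the reader's convenience, here is the short variational step (not spelled out in the original) connecting the action (1) to the Klein–Gordon relation imposed on A(M, g) below. Varying ϕ by a compactly supported δϕ and integrating by parts, so that no boundary terms arise,

$$\delta S = 2\int_M \left(g^{ab}\nabla_a\varphi\,\nabla_b\delta\varphi + \xi R\,\varphi\,\delta\varphi + m^2\varphi\,\delta\varphi\right)\mu_g = -2\int_M \left[(\Box_g - \xi R - m^2)\varphi\right]\delta\varphi\;\mu_g .$$

Hence δS = 0 for all such δϕ exactly when $(\Box_g - \xi R - m^2)\varphi = 0$; the overall normalization of S does not affect this field equation.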
The theory of a free quantized Klein–Gordon field in curved spacetime can be formulated in various ways. For our purposes, it is essential to formulate the theory within the so-called “algebraic approach” (see, for example, [14, 20]). In this approach, one starts from an abstract *-algebra A(M, g) (with unit), which is generated by certain expressions in the smeared quantum field, ϕ(f), where f is a test function. In [14, 20], expressions of the form $e^{i\varphi(f)}$ were considered. The main advantage of working with such expressions is that the algebra so obtained then has a norm (in technical terms, it is a C*-algebra). Defining the algebra A(M, g) in that way would, however, be inconvenient for our purposes. Instead, we shall take A(M, g) to be the *-algebra generated by the identity and the smeared field operators ϕ(f) themselves, subject to the following relations:
Linearity: $D(M) \ni f \mapsto \varphi(f) \in A(M, g)$ is complex linear.
Klein–Gordon: $\varphi((\Box_g - \xi R_g - m^2)f) = 0$ for all f ∈ D(M).
Hermiticity: $\varphi(f)^* = \varphi(\bar f)$.
Commutation Relations: $[\varphi(f_1), \varphi(f_2)] = i\Delta_g(f_1 \otimes f_2)\,1$, where $\Delta_g = \Delta_g^{\rm adv} - \Delta_g^{\rm ret}$ is the causal propagator for the Klein–Gordon operator.
The algebra A(M, g) so obtained is no longer a C*-algebra, because of the unbounded nature of the smeared quantum fields ϕ(f). This will, however, not be relevant in the following. A state in the algebraic framework is a linear functional ω : A(M, g) → ℂ which is normalized so that ω(1) = 1 and positive in the sense that ω(a*a) ≥ 0 for all a ∈ A(M, g). The algebraic notion of a state is related to the usual Hilbert-space notion of a state by the GNS theorem. This says that for any algebraic state ω, one can construct a Hilbert space $H_\omega$ containing a distinguished “vacuum” vector |ω⟩, and a representation $\pi_\omega$ of the algebraic elements a ∈ A(M, g) as linear operators on a dense invariant subspace $D_\omega \subset H_\omega$, such that $\omega(a) = \langle\omega|\pi_\omega(a)|\omega\rangle$ for all a ∈ A(M, g). The multilinear functionals on D(M) defined by
$$\omega(f_1 \otimes \cdots \otimes f_n) \stackrel{\rm def}{=} \omega(\varphi(f_1)\cdots\varphi(f_n)) \tag{2}$$
are called n-point functions. Every state on A(M, g) is uniquely determined by the collection of its n-point functions. A quasi-free state is by definition one which satisfies
$$\omega(e^{i\varphi(f)}) = e^{-\frac{1}{2}\omega(f\otimes f)}. \tag{3}$$
Note that the elements $e^{i\varphi(f)}$ do not actually belong to the algebra A(M, g). What is meant by Eq. (3) is the set of identities obtained by functionally differentiating this equation with respect to f. The identities so obtained then express the n-point functions of the state ω in terms of its two-point function. For quasi-free states, the GNS construction gives the usually considered representation of the fields on Fock space, with |ω⟩ the Fock vacuum and with the field given in terms of creation and annihilation operators [14].
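For a quasi-free state, the identities encoded in Eq. (3) lead to the standard pairing formula: odd n-point functions vanish, and each even one is a sum over all pairings of the n smeared fields, every pair contributing a two-point function with its two entries kept in their original operator order (the order matters because ω(f_1 ⊗ f_2) and ω(f_2 ⊗ f_1) differ by the commutator term in general). The Python sketch below evaluates this sum recursively. It assumes, as a hypothetical input format, that the two-point function has already been evaluated on the test functions and stored as a matrix `w[i][j]` = ω(f_i ⊗ f_j); the numbers in the example are arbitrary and are not meant to come from any physical state.

```python
from typing import List, Sequence


def quasi_free_n_point(w: Sequence[Sequence[complex]], indices: List[int]) -> complex:
    """n-point function omega(phi(f_{i1}) ... phi(f_{in})) of a quasi-free state.

    w[i][j] holds the two-point function omega(f_i (x) f_j); it need not be symmetric.
    Each pairing contributes the product of w[a][b] over its pairs, where a stands to
    the left of b in the original operator ordering.
    """
    if not indices:
        return 1 + 0j
    if len(indices) % 2 == 1:
        return 0j  # odd n-point functions of a quasi-free state vanish
    first, rest = indices[0], indices[1:]
    total = 0j
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]  # keeps the original order
        total += w[first][partner] * quasi_free_n_point(w, remaining)
    return total


# Arbitrary illustrative two-point data for four test functions f_0, ..., f_3.
w = [[0, 1 + 2j, 3, 4j],
     [1 - 2j, 0, 5, 6],
     [3, 5, 0, 7],
     [-4j, 6, 7, 0]]
print(quasi_free_n_point(w, [0, 1, 2, 3]))  # w01*w23 + w02*w13 + w03*w12
```

The recursion always pairs the leftmost remaining field first, so each of the (n − 1)!! pairings is generated exactly once.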
In our subsequent constructions, we will consider quasi-free states which are, in addition, of “global Hadamard type”. These are states whose two-point function has no spacelike singularities, and whose symmetrized two-point functions are given locally, modulo a smooth function, by a Hadamard fundamental solution [10], H, defined as
$$H(x, y) = u(x, y)\,\mathcal{P}(\sigma^{-1}) + v(x, y)\ln|\sigma|. \tag{4}$$
Here, σ is the squared geodesic distance between the points x and y in the spacetime (M, g), u and v are certain real and symmetric smooth functions constructed from the metric and the couplings, and “P” denotes the principal value. Strictly speaking, H is well defined only in analytic spacetimes (we will come back to this issue in Sect. 5.2), so the above definition needs to be modified in spacetimes that are only smooth. For a detailed discussion of this and of the statement that “there are no spacelike singularities”, see [14]. An immediate consequence of the definition of Hadamard states is that if ω and ω′ are Hadamard states, then ω(x, y) − ω′(x, y) is a smooth function on M × M. There exists an alternative, equivalent characterization of global Hadamard states due to Radzikowski [18, Thm. 5.1], involving the notion of the “wave front set” [11, 7] of a distribution, which will play a crucial role in our subsequent constructions. (A definition of the wave front set and some of its elementary properties are given in the Appendix.) Namely, the globally Hadamard states in the sense of [14] are precisely those states whose two-point function is a bidistribution with wave front set
$$\mathrm{WF}(\omega) = \{(x_1, k_1, x_2, -k_2) \in (T^*M)^2\setminus\{0\} \mid (x_1, k_1) \sim (x_2, k_2),\ k_1 \in V^+_{x_1}\}. \tag{5}$$
Here, the following notation has been used: We write (x1, k1) ∼ (x2, k2) if x1 and x2 can be joined by a null geodesic and if k1 and k2 are cotangent and coparallel to that null geodesic.
2.2. Definition and properties of the algebra W(M, g). In the previous subsection, we reviewed the algebraic construction of a free quantum field theory. However, the algebra A(M, g) used in that construction includes only observables corresponding to the smeared n-point functions of the free field. If we wish to define a nonlinear quantum field theory via a perturbative construction off the free field theory, we must consider additional observables, namely Wick polynomials and their time ordered products. Our strategy for doing so is to define an enlarged algebra of observables, W(M, g), that contains A(M, g) and also contains, among others, elements corresponding to (smeared) Wick polynomials of free fields and (smeared) time ordered products of these fields. The construction of W(M, g) is essentially a straightforward generalization of [6], using ideas of [1, 2]. The construction initially depends on the choice of an arbitrary quasi-free Hadamard state ω on A(M, g). However, we will show below that different choices for ω give rise to isomorphic algebras. In that sense the algebras W(M, g) do not depend on the choice of a particular quasi-free Hadamard state. We note that, in particular, the construction of W(M, g) achieves the goal stated on p. 86 of [20], namely, to define an enlarged algebra of observables that includes the smeared stress-energy tensor. Once we have properly identified the elements in W(M, g) corresponding to local Wick products and local time ordered products, the standard rules of perturbative quantum field theory will allow us to obtain perturbative expressions for the interacting field observables. These perturbative quantities – such as for example the interacting field itself – are given by formal power series in the coupling constants.
The infinite sums
occurring in these formal power series do not, of course, define elements of our algebra W(M, g). However, the expressions obtained by truncating these power series at some arbitrary order in perturbation theory will be elements in W(M, g). In that sense W(M, g) contains the observables (to arbitrarily high order in perturbation theory) of the interacting theory. The “renormalization ambiguities” occurring in these perturbative expressions arise from the ambiguities in the definition of the local Wick products and local time ordered products. The main goal in this paper is to give a precise characterization of these ambiguities. It should be noted that since A(M, g) ⊂ W(M, g), the notion of states for the nonlinear theory will be more restrictive than the notion of states for the free theory given in the previous subsection, but the states on W(M, g) will include a dense set of vectors in the GNS representation of any quasi-free Hadamard state. Indeed, it will follow from our results below that all Hadamard states on A(M, g) whose truncated n-point functions (other than the two-point function) are smooth can be extended to W(M, g). We conjecture that these are the only states on A(M, g) that can be extended to W(M, g), i.e., that the states on W(M, g) are in 1–1 correspondence with Hadamard states on A(M, g) with smooth truncated n-point functions.³ (See the “Note added in proof” at the end of the article.) To begin our construction of W(M, g), choose a quasi-free Hadamard state ω on A(M, g). Via the GNS construction, one obtains from this a representation of the field operators ϕ(f) as linear operators on a Hilbert space $H_\omega$ with dense, invariant domain $D_\omega$, where we use the same symbol for the algebraic element ϕ(f) and its representative on $H_\omega$. Next, define the symmetric operator-valued distributions
$$W_n(x_1, \dots, x_n) = {:}\varphi(x_1)\cdots\varphi(x_n){:}_\omega \stackrel{\rm def}{=} \frac{1}{i^n}\,\frac{\delta^n}{\delta f(x_1)\cdots\delta f(x_n)}\exp\left(\tfrac{1}{2}\,\omega(f\otimes f) + i\varphi(f)\right)\Big|_{f=0} \tag{6}$$
for n ≥ 1 and $W_0 \equiv 1$. The operators $W_n(t)$ obtained by smearing with a test function $t = f_1 \otimes \cdots \otimes f_n \in D(M^n)$ are elements of the algebra A(M, g). The product of two operators $W_n(t)$ and $W_m(t')$ is given by the following formula (which is just a reformulation of Wick’s theorem):
$$W_n(t)\,W_m(t') = \sum_k W_{n+m-2k}(t \otimes_k t') \qquad \forall\, t \in D(M^n),\ t' \in D(M^m). \tag{7}$$
The expression $t \otimes_k t'$ is the symmetrized, k times contracted tensor product, defined for m, n ≥ k by
$$(t \otimes_k t')(x_1, \dots, x_{n+m-2k}) \stackrel{\rm def}{=} \frac{n!\,m!}{(n-k)!\,(m-k)!\,k!}\; S\int_{M^{2k}} t(y_1, \dots, y_k, x_1, \dots, x_{n-k})\, t'(y_{k+1}, \dots, y_{2k}, x_{n-k+1}, \dots, x_{n+m-2k}) \prod_{i=1}^{k} \omega(y_i, y_{k+i})\,\mu_g(y_i)\,\mu_g(y_{k+i}), \tag{8}$$
³ Kay (unpublished) has shown that in the vacuum representation of A in Minkowski spacetime, these states include all n-particle states with smooth mode functions. More generally, he also showed that on a globally hyperbolic spacetime, these states include all n-particle states with smooth mode functions in the GNS representation of any quasi-free Hadamard state.
where S means symmetrization in $x_1, \dots, x_{n+m-2k}$. If either m < k or n < k, then the contracted tensor product is defined to be zero. In order to obtain more general operators such as normal ordered Wick powers, we would like to be able to smear the operator-valued distributions $W_n$ not only with smooth test functions, but also with certain compactly supported test distributions t. That this is indeed possible can be seen by means of a microlocal argument, which is based on the following observation [2]: The domain $D_\omega$ contains a dense invariant subspace of vectors |ψ⟩ (the so-called “microlocal domain of smoothness”, see [2, Eq. (11)]) having the property that the wave front set of the vector-valued distributions $t \mapsto W_n(t)|\psi\rangle$ is contained in the set $F_n(M, g)$, defined as
$$F_n(M, g) = \{(x_1, k_1, \dots, x_n, k_n) \in (T^*M)^n\setminus\{0\} \mid k_i \in V^-_{x_i},\ i = 1, \dots, n\}. \tag{9}$$
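The combinatorics of Eqs. (6)–(8) can be checked in a deliberately oversimplified toy model (an illustration only, not the construction of the paper): replace the field by a single commuting Gaussian variable x and the two-point function by a number, ω(f ⊗ f) → σ², so that all integrals and the symmetrization S in Eq. (8) become trivial. Expanding $\exp(\sigma^2 t^2/2 + itx)$ in t as in Eq. (6) gives $:x^n: = \sum_k \frac{n!}{(n-2k)!\,k!}(-\sigma^2/2)^k x^{n-2k}$ (the probabilists' Hermite polynomials rescaled by σ), and Eq. (7) with the coefficient of Eq. (8) predicts $:x^n:\,:x^m: = \sum_k \frac{n!\,m!}{(n-k)!\,(m-k)!\,k!}\,\sigma^{2k}\,:x^{n+m-2k}:$. The Python sketch below verifies both statements with exact rational arithmetic; it ignores non-commutativity and all spacetime structure, and the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def normal_ordered_power(n, sigma2):
    """Coefficients (index = power of x) of :x^n: read off from the generating function in Eq. (6).

    Toy model: one commuting Gaussian variable with omega(f (x) f) -> sigma2.
    """
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n // 2 + 1):
        c = Fraction(factorial(n), factorial(n - 2 * k) * factorial(k))
        coeffs[n - 2 * k] += c * (-Fraction(sigma2) / 2) ** k
    return coeffs


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q, scale=1):
    """Return p + scale * q for coefficient lists of possibly different lengths."""
    size = max(len(p), len(q))
    p = p + [Fraction(0)] * (size - len(p))
    q = q + [Fraction(0)] * (size - len(q))
    return [a + scale * b for a, b in zip(p, q)]


def wick_product_rhs(n, m, sigma2):
    """Right side of Eq. (7) in the toy: sum_k (Eq. (8) coefficient) * sigma2^k * :x^(n+m-2k):."""
    total = [Fraction(0)]
    for k in range(min(n, m) + 1):
        c = Fraction(factorial(n) * factorial(m),
                     factorial(n - k) * factorial(m - k) * factorial(k))
        total = poly_add(total, normal_ordered_power(n + m - 2 * k, sigma2),
                         scale=c * Fraction(sigma2) ** k)
    return total


def gaussian_mean(coeffs, sigma2):
    """Expectation of a polynomial in x ~ N(0, sigma2), using E[x^(2j)] = sigma2^j (2j-1)!!."""
    mean = Fraction(0)
    for power, c in enumerate(coeffs):
        if power % 2 == 0:
            double_factorial = 1
            for odd in range(1, power, 2):
                double_factorial *= odd
            mean += c * Fraction(sigma2) ** (power // 2) * double_factorial
    return mean


s = Fraction(3, 2)  # arbitrary illustrative variance
for n in range(6):
    for m in range(6):
        lhs = poly_mul(normal_ordered_power(n, s), normal_ordered_power(m, s))
        assert lhs == wick_product_rhs(n, m, s)
print([gaussian_mean(normal_ordered_power(n, s), s) for n in range(6)])
```

The printed list shows that the toy state has vanishing expectation value on every normal ordered power with n ≥ 1, which is the defining feature of normal ordering with respect to ω.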
Now, smearing the above vector-valued distributions with a distributional test function t involves taking the pointwise product of two distributions. As is well known, the pointwise product of two distributions is in general ill-defined. However, a theorem by Hörmander [11, Thm. 8.2.10] states that if the wave front sets of two distributions u and v are such that $\{0\} \notin \mathrm{WF}(u) + \mathrm{WF}(v)$, then the pointwise product of u and v can be unambiguously defined. In the case at hand, we are thus allowed to smear $W_n$ with any compactly supported distribution t such that $\{0\} \notin \mathrm{WF}(t) + F_n(M, g)$. We shall consider here a subclass of the set of all such n-point distributions t, namely the class
$$E_n(M, g) \stackrel{\rm def}{=} \{t \in D'(M^n) \mid t \text{ is symmetric},\ \mathrm{supp}(t) \text{ is compact},\ \mathrm{WF}(t) \subset G_n(M, g)\}, \tag{10}$$
where
$$G_n(M, g) \stackrel{\rm def}{=} (T^*M)^n \setminus \Big(\bigcup_{x \in M} (V_x^+)^n \cup \bigcup_{x \in M} (V_x^-)^n\Big). \tag{11}$$
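To see how Hörmander's criterion quoted above operates, consider two standard elementary examples on $\mathbb{R}$ and $\mathbb{R}^2$ (added here for illustration; they are not taken from the original). For the delta function on $\mathbb{R}$, $\mathrm{WF}(\delta) = \{(0, \xi) : \xi \neq 0\}$, so the criterion fails for the square $\delta \cdot \delta$:

$$(0, \xi) + (0, -\xi) = (0, 0) \in \mathrm{WF}(\delta) + \mathrm{WF}(\delta).$$

On $\mathbb{R}^2$, by contrast, take $u = \delta(x_1) \otimes 1$ and $v = 1 \otimes \delta(x_2)$. Then

$$\mathrm{WF}(u) = \{(0, x_2;\, \xi_1, 0) : \xi_1 \neq 0\}, \qquad \mathrm{WF}(v) = \{(x_1, 0;\, 0, \xi_2) : \xi_2 \neq 0\},$$

so at the only common base point (0, 0) every sum has the form $(\xi_1, \xi_2)$ with $\xi_1 \neq 0$ and never vanishes. The criterion is met, and $uv = \delta(x_1)\,\delta(x_2)$ is the delta function at the origin of $\mathbb{R}^2$.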
Smearing $W_n$ with test distributions $t \in E_n(M, g)$ therefore gives well-defined operators on the microlocal domain of smoothness. (For notational simplicity, we denote this domain again by $D_\omega$.)
Definition 2.1. W(M, g) is the *-algebra of operators on $H_\omega$ generated by 1 and elements of the form $W_n(t)$, where n ≥ 1 and $t \in E_n(M, g)$.
Theorem 2.1. The product in the algebra W(M, g) can be computed by Eq. (7), and the *-operation is given by $W_n(t)^* = W_n(\bar t)$. Furthermore, $W_n(t) = 0$ whenever t is of the form $t(x_1, \dots, x_n) = (\Box_g - \xi R_g - m^2)_{x_i}\, s(x_1, \dots, x_n)$ for some $s \in E_n(M, g)$.
Proof. The statement concerning the *-operation follows because the free field is Hermitian. In order to show that the algebra product can be calculated by Eq. (7), we first show that if $t \in E_n(M, g)$ and $t' \in E_m(M, g)$, then $t \otimes_k t' \in E_{n+m-2k}(M, g)$. Clearly $t \otimes_k t'$ is compactly supported and symmetric. We must show that in addition $\mathrm{WF}(t \otimes_k t') \subset G_{n+m-2k}(M, g)$. This can be seen by an application of [11, Thm. 8.2.13],
which yields, in combination with Eq. (5) for WF(ω),
$$\begin{aligned}\mathrm{WF}(t \otimes_k t') \subset \big\{&(x_1, k_1, \dots, x_{n+m-2k}, k_{n+m-2k}) \in (T^*M)^{n+m-2k} \mid \exists\ (x_1, k_1, \dots, x_{n-k}, k_{n-k}, y_1, p_1, \dots, y_k, p_k) \in \mathrm{WF}(t) \\ &\text{and } (x_{n-k+1}, k_{n-k+1}, \dots, x_{n+m-2k}, k_{n+m-2k}, y_{k+1}, p_{k+1}, \dots, y_{2k}, p_{2k}) \in \mathrm{WF}(t') \\ &\text{such that either } (y_j, p_j) \sim (y_{j+k}, -p_{j+k}) \text{ and } p_j \in V^-_{y_j}\setminus\{0\}, \text{ or } p_j = p_{j+k} = 0, \text{ for all } j = 1, \dots, k\big\}.\end{aligned} \tag{12}$$
It is not difficult to see that the set on the right side of the above inclusion is in fact contained in $G_{n+m-2k}(M, g)$, thereby showing that $t \otimes_k t'$ is in the class $E_{n+m-2k}(M, g)$, as we wanted to show.
We finish the proof by showing that Eq. (7) holds not only for smooth test functions, but also for our admissible test distributions $t \in E_n(M, g)$ and $t' \in E_m(M, g)$. To see this, we consider sequences of test functions $\{t_\alpha\}$ and $\{t'_\alpha\}$ converging to t and t′ in the sense of $D'_{\Gamma_n}(M^n)$ and $D'_{\Gamma_m}(M^m)$, respectively (for a definition of these spaces and their pseudo-topology, the so-called “Hörmander pseudo-topology”, see the Appendix), where $\Gamma_n$ and $\Gamma_m$ are closed conic sets in $G_n(M, g)$ and $G_m(M, g)$, respectively, with the property that $\mathrm{WF}(t) \subset \Gamma_n$ and $\mathrm{WF}(t') \subset \Gamma_m$. Now the operation of composing distributions, which forms the basis of the definition of the contracted tensor product, Eq. (8), is continuous in the Hörmander pseudo-topology. Therefore $t_\alpha \otimes_k t'_\alpha \to t \otimes_k t'$ in the space $D'_{\Gamma_{n+m-2k}}(M^{n+m-2k})$, where $\Gamma_{n+m-2k}$ is a certain closed conic set in $G_{n+m-2k}(M, g)$ which is calculable from $\Gamma_n$ and $\Gamma_m$ using Eq. (12). Now expressions of the sort $W_n(t)|\psi\rangle$ arise from the pointwise product of distributions. This product is continuous in the Hörmander pseudo-topology. Therefore we conclude that $W_{n+m-2k}(t_\alpha \otimes_k t'_\alpha)|\psi\rangle \to W_{n+m-2k}(t \otimes_k t')|\psi\rangle$. By a similar argument, it also follows that $W_n(t_\alpha)W_m(t'_\alpha)|\psi\rangle \to W_n(t)W_m(t')|\psi\rangle$. Equation (7), applied to some vector $|\psi\rangle \in D_\omega$, is already known to hold for $t_\alpha$ and $t'_\alpha$, since these are smooth test functions. It follows that Eq. (7) must also hold for our admissible test distributions. The last statement of the theorem is obvious from the definition of $W_n$ when t and s are smooth functions. By a continuity argument similar to the one above, it also holds for distributional t and s. □
Since $E_n(M, g)$ is a vector space and since Eq. (7) holds, it follows immediately that any a ∈ W(M, g) can be written in the form
$$a = t_0 1 + \sum_{n=1}^{N} W_n(t_n), \tag{13}$$
with $t_0 \in \mathbb{C}$ and $t_n \in E_n(M, g)$. Furthermore, the following proposition holds, which will be needed in Sect. 5:
Proposition 2.1. Let k ≥ 0 and let a ∈ W(M, g) be such that $[\dots[[a, \varphi(f_1)], \varphi(f_2)], \dots, \varphi(f_{k+1})] = 0$ for all $f_1, \dots, f_{k+1} \in D(M)$. Then a is of the form
$$a = t_0 1 + \sum_{n=1}^{k} W_n(t_n), \tag{14}$$
where $t_0 \in \mathbb{C}$ and $t_n \in E_n(M, g)$.
Proof. a must be of the form (13), where N is some natural number. We must show that N ≤ k. Let us assume that N > k and that $W_N(t_N) \neq 0$. We show that this leads to a contradiction. Since N ≥ k + 1, the assumption implies $[\dots[a, \varphi(f_1)], \dots, \varphi(f_N)] = 0$ for all test functions. Using Eq. (7) (and recalling that $\varphi(f) = W_1(f)$), this gives us
$$(\Delta_g \otimes \cdots \otimes \Delta_g)\, t_N(x_1, \dots, x_N) \equiv 0. \tag{15}$$
Using the relation $\Delta_g = \Delta_g^{\rm adv} - \Delta_g^{\rm ret}$, the support properties of the advanced and retarded fundamental solutions, and the fact that $t_N$ is compactly supported, one finds from Eq. (15) that the distribution $s = (\Delta_g^{\rm ret} \otimes \cdots \otimes \Delta_g^{\rm ret})\, t_N$ must be of compact support. In combination with a microlocal argument similar to the one given in the proof of Thm. 2.1, one finds moreover that $s \in E_N(M, g)$. Since $t_N(x_1, \dots, x_N) = \prod_{i=1}^{N} (\Box_g - \xi R_g - m^2)_{x_i}\, s(x_1, \dots, x_N)$, it follows from Thm. 2.1 that $W_N(t_N) = 0$, which contradicts our hypothesis. □
That the algebra W(M, g) contains normal ordered Wick products can be seen as follows. Let
$$t(x_1, \dots, x_k) = f(x_1)\,\delta(x_1, \dots, x_k), \qquad f \in D(M). \tag{16}$$
The distribution t is in $E_k(M, g)$, because
$$\mathrm{WF}(t) = \Big\{(x, k_1, \dots, x, k_k) \in (T^*M)^k\setminus\{0\} \;\Big|\; \sum_i k_i = 0\Big\} \subset G_k(M, g).$$
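The inclusion in the last display rests on an elementary fact about cones: a closed future cone is convex and contains no line, so if covectors $k_1, \dots, k_k$ at a common point all lie in $V_x^+$ and sum to zero, then $k_1 = -\sum_{i>1} k_i \in V_x^+ \cap (-V_x^+) = \{0\}$, and likewise for every $k_i$ (the same argument works for $V_x^-$). Since the displayed set excludes the zero element, none of its points lies in $(V_x^+)^k$ or $(V_x^-)^k$. The Python sketch below illustrates the membership test for $G_k$ in 1+1-dimensional Minkowski space; it assumes, purely for illustration, the sign convention that the closed future cone of covectors is $\{(k_0, k_1) : k_0 \geq |k_1|\}$, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Covector = Tuple[float, float]  # (k0, k1) components in 1+1-dimensional Minkowski space


def in_closed_future_cone(k: Covector) -> bool:
    """Closed future cone under the illustrative convention V^+ = {k0 >= |k1|}; contains k = 0."""
    return k[0] >= abs(k[1])


def in_closed_past_cone(k: Covector) -> bool:
    """Closed past cone V^- = -V^+ under the same convention."""
    return -k[0] >= abs(k[1])


def in_G(covectors: Sequence[Covector]) -> bool:
    """Membership in G_k of Eq. (11) for covectors based at one common point x.

    A tuple is excluded exactly when all covectors lie in V_x^+ or all lie in V_x^-;
    the all-zero tuple is therefore excluded as well. Tuples based at different
    points are never excluded by Eq. (11) and are not modelled here.
    """
    all_future = all(in_closed_future_cone(k) for k in covectors)
    all_past = all(in_closed_past_cone(k) for k in covectors)
    return not (all_future or all_past)


# A sample element of WF(t) for t = f(x1) delta(x1, ..., xk) with k = 3:
# covectors at a common point whose sum vanishes.
wf_element = [(1.0, 0.5), (-2.0, 0.25), (1.0, -0.75)]
print(in_G(wf_element))                   # True
print(in_G([(1.0, 0.5), (2.0, -1.0)]))    # False: both covectors lie in the closed future cone
```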
The algebraic element $W_k(t)$ with t as in Eq. (16) is then just the k-th normal ordered Wick power of a free field operator, as previously defined in [1],
$${:}\varphi^k(f){:}_\omega = W_k(t). \tag{17}$$
More generally, we may take t to be
$$t(x_1, \dots, x_r) = \delta(x_{i_1}, \dots)\cdots\delta(x_{i_n}, \dots)\, f_1(x_{i_1})\cdots f_n(x_{i_n}), \tag{18}$$
where $I_1 = \{i_1, \dots\}, \dots, I_n = \{i_n, \dots\}$ is a partition of $\{1, \dots, r\}$ into n pairwise disjoint subsets with $|I_j| = k_j$. This gives us the generalized Wick product
$${:}\varphi^{k_1}(f_1)\cdots\varphi^{k_n}(f_n){:}_\omega = W_r(t). \tag{19}$$
As was shown in [2], $\mathcal{W}(M, g)$ also contains time ordered products of Wick powers of free fields. We next discuss the dependence of the algebra $\mathcal{W}(M, g)$ on our choice of a reference state $\omega$. Suppose we had started with another quasi-free Hadamard state $\omega'$. We would then have obtained another algebra $\mathcal{W}'(M, g)$, generated by corresponding operators acting on the GNS Hilbert space constructed from $\omega'$. If the GNS representations of $\omega$ and $\omega'$ were unitarily equivalent, the Bogoliubov transformation implementing that unitary equivalence would induce a canonical isomorphism between $\mathcal{W}(M, g)$ and $\mathcal{W}'(M, g)$. However, even if the GNS representations of $\omega$ and $\omega'$ fail to be unitarily equivalent, there is nevertheless a canonical isomorphism at the algebraic level:
300
S. Hollands, R. M. Wald
Lemma 2.1. There is a canonical *-isomorphism $\alpha: \mathcal{W}'(M, g) \to \mathcal{W}(M, g)$, which acts on the generators $W'_n$ of $\mathcal{W}'(M, g)$ by
$$\alpha(W'_n(t)) \stackrel{\mathrm{def}}{=} \sum_k W_{n-2k}(\langle d^{\otimes k}, t\rangle), \qquad (20)$$
where $W_n$ denote the generators in $\mathcal{W}(M, g)$. We use the notation $d(x_1, x_2) = \omega(x_1, x_2) - \omega'(x_1, x_2)$ and
$$\langle d^{\otimes k}, t\rangle(x_1, \dots, x_{n-2k}) \stackrel{\mathrm{def}}{=} \frac{n!}{(2k)!\,(n-2k)!} \int_{M^{2k}} t(y_1, \dots, y_{2k}, x_1, \dots, x_{n-2k}) \prod_{i=1}^{k} d(y_{2i-1}, y_{2i})\, \mu_g(y_{2i-1})\, \mu_g(y_{2i}) \qquad (21)$$
for $2k \le n$, and $\langle d^{\otimes k}, t\rangle = 0$ for $2k > n$.

Proof. To show that the right-hand side of Eq. (20) represents an element in $\mathcal{W}(M, g)$, we must show that $\langle d^{\otimes k}, t\rangle \in \mathcal{E}_{n-2k}(M, g)$. We first note that $d$ is smooth, since $\omega$ and $\omega'$ are Hadamard states. By [11, Thm. 8.2.13] we therefore find
$$\mathrm{WF}(\langle d^{\otimes k}, t\rangle) \subset \{(x_1, k_1, \dots, x_{n-2k}, k_{n-2k}) \in (T^*M)^{n-2k} \setminus \{0\} \mid \exists (x_1, k_1, \dots, x_{n-2k}, k_{n-2k}, y_1, 0, \dots, y_{2k}, 0) \in \mathcal{G}_n(M, g)\} \subset \mathcal{G}_{n-2k}(M, g). \qquad (22)$$
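The distribution $\langle d^{\otimes k}, t\rangle$ is by definition symmetric and of compact support. Therefore $\langle d^{\otimes k}, t\rangle \in \mathcal{E}_{n-2k}(M, g)$, which gives us $\alpha(W'_n(t)) \in \mathcal{W}(M, g)$. Every element in $\mathcal{W}'(M, g)$ can be written as a sum of elements of the form $W'_n(t)$ with $t \in \mathcal{E}_n(M, g)$. We may therefore take Eq. (20) as the definition of a linear map from $\mathcal{W}'(M, g)$ to $\mathcal{W}(M, g)$. That this map is a homomorphism is demonstrated by the following calculation:
$$\begin{aligned}
\alpha(W'_n(t))\,\alpha(W'_m(t')) &= \sum_{k,l} W_{n-2k}(\langle d^{\otimes k}, t\rangle)\, W_{m-2l}(\langle d^{\otimes l}, t'\rangle) \\
&= \sum_{k,l} \sum_i W_{n+m-2(k+l+i)}\big(\langle d^{\otimes k}, t\rangle \otimes_i \langle d^{\otimes l}, t'\rangle\big) \\
&= \sum_r \sum_i W_{n+m-2(r+i)}\Big(\sum_{k=0}^{r} \langle d^{\otimes k}, t\rangle \otimes_i \langle d^{\otimes (r-k)}, t'\rangle\Big) \\
&= \sum_r \sum_i W_{n+m-2(r+i)}\big(\langle d^{\otimes r}, t \otimes_i t'\rangle\big) \\
&= \alpha\big(W'_n(t)\, W'_m(t')\big),
\end{aligned} \qquad (23)$$
where we have used the identity
$$\sum_{k=0}^{r} \langle d^{\otimes k}, t\rangle \otimes_i \langle d^{\otimes (r-k)}, t'\rangle = \langle d^{\otimes r}, t \otimes_i t'\rangle. \qquad (24)$$
Here the contractions $\otimes_i$ on the left-hand side of (24) are formed with $\omega$, and the contraction in $t \otimes_i t'$ is formed with $\omega'$.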
That $\alpha$ preserves the *-operation follows because $d$ is real. This in turn is a consequence of the fact that $\mathrm{Im}\,\omega = \mathrm{Im}\,\omega' = \tfrac{1}{2}\Delta_g$. That $\alpha$ is one-to-one can be seen from an explicit construction of its inverse, given by the same formula as (20) but with $d$ replaced by $-d$. $\square$
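As an illustration of (20) and (21) in the simplest nontrivial case (our own worked example), take $n = 2$ and $t(x_1, x_2) = f(x_1)\delta(x_1, x_2)$ as in Eq. (16). By Eq. (17), $W'_2(t) = {:}\varphi^2(f){:}_{\omega'}$ and $W_2(t) = {:}\varphi^2(f){:}_\omega$. Only $k = 0$ and $k = 1$ contribute. For $k = 1$ the prefactor in (21) is $2!/(2!\,0!) = 1$ and the integral collapses onto the diagonal. With the convention that the $n = 0$ term is a multiple of $\mathbf{1}$, as in Eq. (14), this gives
$$\alpha\big({:}\varphi^2(f){:}_{\omega'}\big) = {:}\varphi^2(f){:}_\omega + \Big(\int_M f(y)\, d(y, y)\, \mu_g(y)\Big)\mathbf{1}, \qquad d(y, y) = \lim_{x \to y}\big[\omega(x, y) - \omega'(x, y)\big].$$
The shift is a finite c-number because $d$ is smooth. The inverse map, obtained by replacing $d$ with $-d$, subtracts the same c-number again.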
It should be noted here that the abstract algebra $\mathcal{W}(M, g)$ could be defined more simply and directly as the algebra of expressions of the form Eq. (13), with a product defined by Eq. (7) and a *-operation defined by $W_n(t)^* = W_n(\bar t)$. These expressions are required to satisfy $W_n(t) = 0$ whenever $t$ is of the form $t(x_1, \dots, x_n) = (\Box_g - \xi R_g - m^2)_{x_i}\, s(x_1, \dots, x_n)$. (Note, however, that the definition of the product (7) requires a choice of Hadamard state $\omega$; see Eq. (8).) Our explicit construction of $\mathcal{W}(M, g)$ as an operator algebra on the GNS representation of a quasi-free state $\omega$ on $\mathcal{A}(M, g)$ is nevertheless useful for establishing that a suitably wide class of states exists on $\mathcal{W}(M, g)$. In addition, the concrete realization of $\mathcal{W}(M, g)$ will be useful in our explicit construction of local Wick products.

For later purposes, we also need to define a notion of convergence within the algebra $\mathcal{W}(M, g)$. In particular, we would like a notion of convergence that is preserved under taking products in our algebra and that is independent of the quasi-free Hadamard state $\omega$ by which this algebra is defined. Such a notion can be defined as follows. Let $\{t_\alpha\}$ be a sequence of distributions in $\mathcal{E}_n(M, g)$ with $\mathrm{WF}(t_\alpha) \subset \Gamma_n$ for all $\alpha$, where $\Gamma_n$ is some closed conic set contained in $\mathcal{G}_n(M, g)$. Then we say that
$$a_\alpha = W_n(t_\alpha) \to a = W_n(t) \ \text{ in } \mathcal{W}(M, g) \quad \text{if} \quad t_\alpha \to t \ \text{ in } \mathcal{D}'_{\Gamma_n}(M^n),$$
i.e., if $t_\alpha \to t$ in the sense of the Hörmander pseudo-topology associated with the cone $\Gamma_n$. For the definition of this pseudo-topology and of the spaces $\mathcal{D}'_{\Gamma_n}(M^n)$ we refer to the Appendix. Convergence in the Hörmander pseudo-topology guarantees that $t \in \mathcal{E}_n(M, g)$. Therefore our algebra is closed with respect to the above notion of convergence. Clearly, that notion is also independent of the particular quasi-free Hadamard state chosen to define $\mathcal{W}(M, g)$. Finally, let $a_\alpha \to a$ and $b_\alpha \to b$ be two convergent sequences in $\mathcal{W}(M, g)$ in the above sense. Then, by an argument almost identical to the one given towards the end of the proof of Thm. 2.1, we also have $a_\alpha b_\alpha \to ab$. Hence, the element-wise product of two convergent sequences of algebraic elements is again a convergent sequence.

3. Mathematical Formulation of the Notion of a Local, Covariant Quantum Field

The field quantities of interest in quantum field theory in curved spacetime, such as the stress energy tensor of free fields or the quantity “$\lambda\varphi^4$”, should be local and covariant. That is, their definition should not depend on structures that are only globally defined (such as a preferred vacuum state), nor should it depend on non-covariant structures (such as a preferred coordinate system). The aim of this section is to explain precisely what we mean by the statement that an element in $\mathcal{W}(M, g)$ is “locally defined” and “transforms covariantly under diffeomorphisms”. This notion requires the consideration of a given operator on spacetimes $(M, g)$ and $(M', g')$ that have isometric regions but are not globally isometric. The basic problem is that operators living on $(M, g)$ and $(M', g')$ belong to different algebras and therefore cannot be compared directly. We must first provide a natural and consistent identification of the corresponding algebras (see Lem. 3.1).
For this purpose, we consider “causality preserving isometric embeddings”. These are isometric embeddings $\chi: N \to M$ from a spacetime $(N, g')$ to another spacetime $(M, g)$ such that the causal structure on $\chi(N)$ induced from $(N, g')$ coincides with that induced from $(M, g)$. (This is equivalent to the condition that $\chi$ preserves the time orientation and that $J^+(x) \cap J^-(y) \subset \chi(N)$ for all $x, y \in \chi(N)$.)
Lemma 3.1. Let $\chi: N \to M$ be a causality preserving isometric embedding of some globally hyperbolic spacetime $(N, g')$ into another globally hyperbolic spacetime $(M, g)$ (so that in fact $g' = \chi^* g$). Denote by $\mathcal{W}(N, g')$ and $\mathcal{W}(M, g)$ the corresponding extended Wick-polynomial algebras, viewed as abstract algebras. Then there is a natural injective *-homomorphism $\iota_\chi: \mathcal{W}(N, g') \to \mathcal{W}(M, g)$ with the following property. If $\omega$ is a quasi-free Hadamard state on $(M, g)$ and $\omega'(x, y) = \omega(\chi(x), \chi(y))$, then
$$\iota_\chi(W'_n(t)) = W_n(t \circ \chi^{-1}) \qquad \forall t \in \mathcal{E}_n(N, g'), \qquad (25)$$
where $W_n$ and $W'_n$ are given by Eq. (6) in the GNS representations of $\omega$ and $\omega'$, respectively. Here $\chi^{-1}: \chi(N) \to N$ is the inverse of $\chi$, defined on the image of $N$ under $\chi$.

Proof. Let $\omega$ be a quasi-free Hadamard state for the spacetime $(M, g)$ and let $\omega'(x, y) = \omega(\chi(x), \chi(y))$. Then $\omega'(x, y)$ is the two-point function of a quasi-free Hadamard state $\omega'$ on $(N, g')$. (Here we are using the assumption that our isometry $\chi$ is causality preserving.) By Lem. 2.1, we may assume that the abstract algebras $\mathcal{W}(N, g')$ and $\mathcal{W}(M, g)$ are concretely realized as linear operators on the GNS constructions of the quasi-free Hadamard states $\omega'$ and $\omega$. Every element in $\mathcal{W}(N, g')$ can be written as a sum of elements of the form $W'_n(t)$, so by linearity the above formula gives a map from $\mathcal{W}(N, g')$ to $\mathcal{W}(M, g)$. That this map is a *-homomorphism can easily be seen from the formulas (7) and (8), together with the relation $\omega'(x, y) = \omega(\chi(x), \chi(y))$. That $\iota_\chi$ is injective follows from the definition. $\square$

Remarks. (1) Let $\omega'$ be an arbitrary quasi-free Hadamard state on $(N, g')$. In terms of the generators $W'_n(t)$ of $\mathcal{W}(N, g')$ in the GNS representation of $\omega'$, we have
$$\iota_\chi(W'_n(t)) = \sum_k W_{n-2k}\big(\langle d_\chi^{\otimes k}, t\rangle \circ \chi^{-1}\big), \qquad (26)$$
where $d_\chi(x, y) = \omega(\chi(x), \chi(y)) - \omega'(x, y)$ and where $\langle d_\chi^{\otimes k}, t\rangle$ is given by Eq. (21). (2) The identifications provided by the maps $\iota_\chi$ are consistent in the following sense. Let $\chi_{1,2}: M_1 \to M_2$ and $\chi_{2,3}: M_2 \to M_3$ be causality preserving isometric embeddings, and let $\chi_{1,3} = \chi_{2,3} \circ \chi_{1,2}$. Then the corresponding homomorphisms satisfy (in the obvious notation) $\iota_{1,3} = \iota_{2,3} \circ \iota_{1,2}$.

Definition 3.1. A quantum field $\Phi$ (in one variable) is an assignment which associates with every globally hyperbolic spacetime $(M, g)$ a distribution $\Phi[g]$ taking values in the algebra $\mathcal{W}(M, g)$, i.e., a continuous linear map $\Phi[g]: \mathcal{D}(M) \to \mathcal{W}(M, g)$.

Using the identifications provided by Lemma 3.1, we can now state what we mean by $\Phi$ being a “local, covariant quantum field”.

Definition 3.2. A quantum field $\Phi$ (in one variable) is said to be local and covariant if it satisfies the following property. Let $\chi$ be a causality preserving isometric embedding map from a spacetime $(N, g')$ into another spacetime $(M, g)$ (so that in fact $g' = \chi^* g$). Let $\iota_\chi: \mathcal{W}(N, g') \to \mathcal{W}(M, g)$ be the corresponding homomorphism, defined in Lem. 3.1. Then
$$\iota_\chi(\Phi[\chi^* g](f)) = \Phi[g](f \circ \chi^{-1}) \qquad \text{for all } f \in \mathcal{D}(N). \qquad (27)$$
Local fields in n variables are defined in a similar manner. We will sometimes omit the explicit dependence of the fields on the metric.
Remarks. (1) The above type of algebraic formulation of the locality/covariance property was suggested to us by K. Fredenhagen [9]. It is closely related to a formulation of “locality” previously given in [20, pp. 89–91] for the stress energy operator. Antecedents to this idea can be found in [22] and [15].

(2) The above definition actually involves two logically distinct requirements: (a) that the quantum field $\Phi[g]$ under consideration be given by a diffeomorphism covariant expression, and (b) that it be locally constructed from the metric. The second requirement is incorporated through the possibility of considering isometries $\chi$ which map a spacetime $N$ into a portion of a “larger” spacetime $M$. This allows one to contemplate a situation in which “the metric is varied outside some globally hyperbolic subset $N$ of a spacetime $M$”. The “covariance” axiom of Dimock [5] effectively corresponds to property (a). Since his axiom applies only to global isometries, however, it does not impose the requirement that the field depend only locally on the metric (property (b)).

(3) To illustrate our notion of local, covariant fields and to show that locality is not a trivial requirement, we now display an example of a field which fails to be local. For every spacetime $(M, g)$, consider the operator-valued distribution $\Phi[g] = {:}\varphi^2{:}_{\omega_{(M,g)}}$, viewed now as an element of the abstract algebra $\mathcal{W}(M, g)$, where $\omega_{(M,g)}$ is a quasi-free Hadamard state. We claim that the field $\Phi$ is not a local, covariant field, no matter how one assigns states $\omega_{(M,g)}$ to globally hyperbolic spacetimes $(M, g)$. The crucial observation needed to prove this is that the locality requirement, Def. 3.2, would imply the following consistency relation between the two-point functions of the given family of quasi-free Hadamard states:
$$\omega_{(M,g)}(\chi(x), \chi(y)) = \omega_{(N,g')}(x, y) \qquad \forall (x, y) \in N \times N, \qquad (28)$$
whenever $\chi: N \to M$ is a causality preserving isometric embedding map of a spacetime $(N, g')$ into a spacetime $(M, g)$ (so that in fact $g' = \chi^* g$). To see that it is impossible to satisfy this constraint, consider spacetimes $(M, g)$ and $(M, g')$ such that $g \equiv g'$ everywhere outside some region $O$ with compact closure. Let $\omega_{(M,g)}$ and $\omega_{(M,g')}$ be the quasi-free Hadamard states associated with those spacetimes. Choose a Cauchy surface $\Sigma_+$ to the future of $O$ and a Cauchy surface $\Sigma_-$ to the past of $O$. Furthermore, choose globally hyperbolic neighborhoods $N_\pm$ of $\Sigma_\pm$ which do not intersect $O$. Apply the consistency requirement, Eq. (28), to the embeddings of $(N_\pm, g)$ into the spacetimes $(M, g)$ and $(M, g')$, respectively. This immediately gives $\omega_{(M,g)}(x, y) = \omega_{(N_\pm,g)}(x, y)$ and $\omega_{(M,g')}(x, y) = \omega_{(N_\pm,g)}(x, y)$ for all $(x, y) \in N_\pm \times N_\pm$. From this we get
$$\omega_{(M,g)}(x, y) = \omega_{(M,g')}(x, y) \qquad \forall (x, y) \in N_+ \times N_+ \text{ and } \forall (x, y) \in N_- \times N_-. \qquad (29)$$
This means that the two-point functions of the states $\omega_{(M,g)}$ and $\omega_{(M,g')}$ have the same initial data both on $\Sigma_+$ and on $\Sigma_-$. But they do not obey the same field equation, since the metrics $g$ and $g'$ differ inside $O$. From this one can easily obtain a contradiction. The above argument can be applied to any normal ordered operator, in particular to the normal ordered stress energy tensor. Our argument therefore gives a precise meaning to the common statement that normal ordering is not a valid procedure for defining the quantum stress-energy tensor in curved spacetime: the normal ordered stress tensor is not a local, covariant field. For later purposes, we also find it useful to make the following definition.
Definition 3.3. Let $\Phi(x_1, \dots, x_n)$ be a local, covariant field in $n$ variables. For any globally hyperbolic spacetime $(M, g)$, we define a conic subset $\Gamma^\Phi(M, g) \subset (T^*M)^n \setminus \{0\}$ associated with $\Phi$ by
$$\Gamma^\Phi(M, g) \stackrel{\mathrm{def}}{=} \overline{\bigcup_\omega \mathrm{WF}\big(\omega(\Phi[g](\,\cdot\,))\big)}, \qquad (30)$$
where the closure is taken in $(T^*M)^n \setminus \{0\}$ and the union runs over all quasi-free Hadamard states.

Remark. If $\chi$ is a causality preserving isometric embedding from $(N, g')$ to $(M, g)$ (so that in fact $g' = \chi^* g$), then $\Gamma^\Phi(N, \chi^* g) = \chi^* \Gamma^\Phi(M, g)$. This is a straightforward consequence of our notion of local, covariant fields.

4. Additional Properties of Local Wick Polynomials and Their Time Ordered Products

As we have seen, normal ordering is a mathematically well defined prescription for defining powers of field operators. However, it does not define a local, covariant field and is therefore not of any particular physical interest. The same applies to time ordered products of normal ordered Wick powers. In particular, the latter should not be used for the perturbative definition of an interacting field theory, since this field theory would then depend on nonlocal information, namely the global properties of the state chosen for the normal ordering prescription. We therefore seek to define a notion of local Wick polynomials and local time ordered products in the algebras $\mathcal{W}(M, g)$. In the present section, we specify these fields axiomatically (but not uniquely, as we shall see) by certain properties, which can heuristically be stated as follows:

(i) Locality: The sought-for Wick products and time ordered products are local, covariant fields in the sense of Def. 3.2.
(ii) Specific properties: They have properties analogous to certain properties known to hold for the normal ordered Wick products and their time ordered products, such as a specific expression for their commutator with a free field.
(iii) Continuity and analyticity: The fields vary analytically (continuously) under analytic (smooth) variations of the metric and the coupling parameters.
(iv) Scaling: The fields scale homogeneously “up to logarithmic terms” under a rescaling of the metric and the coupling parameters.

We gave a precise definition of requirement (i) in the previous section. A mathematically precise formulation of conditions (ii)–(iv) is given in the following three subsections.

4.1. Specific properties. We first consider local Wick powers of the free field without derivatives. These are denoted by $\varphi^k$, where $k \in \mathbb{N}$. We make the obvious requirement that $\varphi^1$ be identical with the free field $\varphi$ (which is easily checked to be a local, covariant field). For later convenience we also set $\varphi^0 = \mathbf{1}$. We impose the following conditions on $\varphi^k$:
Expansion. $[\varphi^k(x), \varphi(y)] = ik\,\Delta_g(x, y)\, \varphi^{k-1}(x)$.

Hermiticity. $\varphi^k(f)^* = \varphi^k(\bar f)$ for all $f \in \mathcal{D}(M)$.

Microlocal spectrum condition. Let $\omega$ be a quasi-free Hadamard state. Then $\omega(\varphi^k(x))$ is a smooth function of $x$.

Local Wick powers of differentiated fields are required to satisfy suitably generalized versions of the above requirements. The modifications are straightforward and are therefore left to the reader. For notational simplicity we explicitly consider only the undifferentiated Wick powers in what follows. Our existence and uniqueness arguments and results nevertheless apply to the differentiated Wick powers as well as to the undifferentiated ones.

Remark. For the local Wick products of differentiated fields it would also be reasonable to impose an additional requirement: any local Wick product containing $(\Box - \xi R - m^2)\varphi$ as a factor should vanish. The explicit construction of local Wick products given in Sect. 5.2 does not satisfy that requirement. (A related difficulty with our prescription in Sect. 5.2 is that it gives a stress energy operator which is not conserved.) We believe that local Wick products of differentiated fields satisfying this additional condition can be constructed using the local vacuum concept introduced by Kay [16] (see also [12, Ch. 6]). We defer the consideration of this issue to a future investigation.

We next consider local time ordered products of undifferentiated local Wick powers. These are denoted by $T(\varphi^{k_1} \cdots \varphi^{k_n})$. We make the obvious requirement that $T(\varphi^k)$ be equal to the local Wick power $\varphi^k$ considered above. Our further requirements are the following:

Symmetry. Any time ordered product is symmetric under a permutation of the operators under the time-ordering symbol.

Causal factorization. Consider any set of points $(x_1, \dots, x_n) \in M^n$ and a partition of $\{1, \dots, n\}$ into two non-empty subsets $I$ and $I^c$, such that no point $x_i$ with $i \in I$ is in the past of any of the points $x_j$ with $j \in I^c$, i.e., $x_i \notin J^-(x_j)$ for all $i \in I$ and $j \in I^c$. Then the time ordered products factorize in the following sense:
$$T(\varphi^{k_1}(x_1) \cdots \varphi^{k_n}(x_n)) = T\Big(\prod_{i \in I} \varphi^{k_i}(x_i)\Big)\, T\Big(\prod_{j \in I^c} \varphi^{k_j}(x_j)\Big).$$
Expansion.
$$[T(\varphi^{k_1}(x_1) \cdots \varphi^{k_n}(x_n)), \varphi(y)] = i \sum_{i=1}^{n} k_i\, \Delta_g(x_i, y)\, T\big(\varphi^{k_1}(x_1) \cdots \varphi^{k_i - 1}(x_i) \cdots \varphi^{k_n}(x_n)\big).$$
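For $n = 1$ the expansion property of the time ordered products reduces to the one imposed on $\varphi^k$ above. As a consistency check (our own illustration), one can verify that the normal ordered square has this property with $k = 2$. Formally, ${:}\varphi^2(x){:}_\omega$ differs from $\varphi(x)\varphi(x)$ only by a (divergent) multiple of $\mathbf{1}$, which drops out of commutators. The case $k = 1$ of the expansion property reads $[\varphi(x), \varphi(y)] = i\Delta_g(x, y)\mathbf{1}$. Hence
$$[{:}\varphi^2(x){:}_\omega, \varphi(y)] = \varphi(x)[\varphi(x), \varphi(y)] + [\varphi(x), \varphi(y)]\varphi(x) = 2i\,\Delta_g(x, y)\,\varphi(x),$$
which is $ik\Delta_g(x, y)\varphi^{k-1}(x)$ for $k = 2$. This is in line with requirement (ii): normal ordering fails locality, not this commutator property.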
Unitarity.
$$T\big(\varphi^{k_1}(x_1) \cdots \varphi^{k_n}(x_n)\big)^* = \sum_{P = I_1 \sqcup \dots \sqcup I_j} (-1)^{n+j} \prod_{I \in P} T\Big(\prod_{i \in I} \varphi^{k_i}(x_i)\Big).$$
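The combinatorics behind the sign $(-1)^{n+j}$ can be checked mechanically. The Python sketch below uses a toy model of our own; it is not part of the axioms. The points are assumed mutually timelike, with $x_i$ later than $x_j$ whenever $i > j$, so causal factorization forces each $T$ to be the plain operator product with later factors on the left. The encoding is as follows:

- Each factor $\varphi^{k_i}(x_i)$ is replaced by an abstract Hermitian letter $i$, and an operator is a word of letters.
- A linear combination of operators is a dictionary from words to integer coefficients.
- The adjoint $(\cdot)^*$ reverses a word.

The sum over $P$ is read as a sum over ordered partitions: every ordering of the blocks $I_1, \dots, I_j$ is counted separately, and the factors are multiplied in that order. This reading is an assumption of the sketch. It is chosen because it is what the expansion $S^{-1} = \sum_j (-1)^j (S - \mathbf{1})^j$ of a unitary S-matrix produces. All helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def ordered_set_partitions(items):
    """Yield every ordered partition of `items` into non-empty blocks.

    The first block is any non-empty subset; the remaining items are then
    partitioned recursively, so each sequence of blocks appears exactly once.
    """
    items = tuple(items)
    if not items:
        yield ()
        return
    for size in range(1, len(items) + 1):
        for block in combinations(items, size):
            rest = tuple(x for x in items if x not in block)
            for tail in ordered_set_partitions(rest):
                yield (block,) + tail


def time_ordered(block):
    """Toy T-product: letter i sits at time i, and later letters stand to the left."""
    return tuple(sorted(block, reverse=True))


def adjoint(word):
    """Every letter is Hermitian, so the adjoint of a word is the reversed word."""
    return tuple(reversed(word))


def unitarity_rhs(n):
    """Right-hand side of the unitarity condition as {word: integer coefficient}."""
    total = defaultdict(int)
    for partition in ordered_set_partitions(range(n)):
        # Multiply the T-products of the blocks in the order the blocks appear.
        word = tuple(letter for block in partition for letter in time_ordered(block))
        total[word] += (-1) ** (n + len(partition))
    return {word: c for word, c in total.items() if c != 0}


def unitarity_holds(n):
    """Compare T(all factors)^* with the signed sum over ordered partitions."""
    lhs = {adjoint(time_ordered(range(n))): 1}
    return unitarity_rhs(n) == lhs


for n in range(1, 6):
    print(n, unitarity_holds(n))
```

In this toy model the check succeeds for every $n$ tried. Suppose instead that each partition is counted once, with a single fixed ordering of its blocks. Then the case $n = 2$ already fails: both words $(0, 1)$ and $(1, 0)$ must appear with coefficient $+1$ before the single-block term cancels one of them.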
Here $P = I_1 \sqcup \dots \sqcup I_j$ denotes a partition of the set $\{1, \dots, n\}$ into $j$ pairwise disjoint, nonempty subsets $I_i$. The unitarity condition is equivalent to requiring that the S-matrix be unitary in the sense of formal power series of operators.

Microlocal spectrum condition. Let $\Gamma^T(M, g) \subset (T^*M)^n \setminus \{0\}$ be the conic set associated with the time ordered product $T(\varphi^{k_1}(x_1) \cdots \varphi^{k_n}(x_n))$ as in Def. 3.3. Then any point $(x_1, k_1, \dots, x_n, k_n)$ in $\Gamma^T(M, g)$ satisfies the following:
(a) there exist null geodesics $\gamma_1, \dots, \gamma_m$ which connect any point $x_j$ in the set $\{x_1, \dots, x_n\}$ to some other point in that set;
(b) there exist coparallel, cotangent covector fields $p_1, \dots, p_m$ along these geodesics such that $p_i \in V^+$ if the starting point of $\gamma_i$ is not in the causal past of the end point of $\gamma_i$;
(c) the covector $k_j$ over the point $x_j$ satisfies $k_j = \sum_e p_e(x_j) - \sum_s p_s(x_j)$, where $e$ runs through all null geodesics ending at $x_j$ and $s$ runs through all null geodesics starting at $x_j$.

The microlocal spectrum condition may be viewed as a microlocal analogue of translation invariance in Minkowski space. It was shown to hold for time ordered products of normal ordered Wick powers in [2]. In the case $n = 1$, it reduces to the requirement that $\omega(\varphi^k(x))$ be smooth. Again, time ordered products of differentiated Wick powers would satisfy suitable generalizations of the above requirements. Our uniqueness arguments of Sect. 5.3 would also apply to such time ordered products, but for notational simplicity we explicitly consider only the undifferentiated products below.

For later purposes, we also wish to impose a sharpened version of the microlocal spectrum condition on the local Wick polynomials and their time ordered products. This applies when the metric $g$ is not only smooth but, in addition, real analytic in some convex normal neighborhood $O \subset M$. For this purpose, we consider “analytic” quasi-free Hadamard states. These are quasi-free states $\omega$ for which $\omega(x, y) - H(x, y)$ is not only smooth but also analytic in $O \times O$, where $H$ is the Hadamard fundamental solution defined by Eq. (4). We then impose a sharpened constraint on the singular behavior of the expectation values of a local time ordered product in such a state. To do so, we use the so-called “analytic wave front set” [11] instead of the ordinary “smooth wave front set” used in the above microlocal spectrum condition (compare Def. 3.3).
The analytic wave front set $\mathrm{WF}_A(u)$ of a distribution $u$ characterizes the points and directions in which $u$ fails to be analytic. It does so in much the same way as the ordinary wave front set $\mathrm{WF}(u)$ characterizes the points and directions in which $u$ is not smooth.⁴ We want a formulation of the microlocal spectrum condition in the analytic case that parallels the one given above in the smooth case. To this end, we first introduce, for every local, covariant field $\Phi(x_1, \dots, x_n)$, a conic set $\Gamma^\Phi_A(O, g) \subset (T^*O)^n \setminus \{0\}$. It is defined as in Def. 3.3, with two differences: the union in Eq. (30) now runs over all analytic Hadamard states in $O \times O$, and WF is replaced by $\mathrm{WF}_A$. When $\Phi(x_1, \dots, x_n)$ is a local time ordered product, we denote this conic set by $\Gamma^T_A(O, g)$. Our analytic microlocal spectrum condition is then the following:

Analytic microlocal spectrum condition. Let $O$ be a convex normal neighborhood of $M$. Then any point $(x_1, k_1, \dots, x_n, k_n) \in \Gamma^T_A(O, g)$ has the properties stated in the microlocal spectrum condition for the smooth case.

Remark. For a local Wick product (the case $n = 1$), this condition implies that $\omega(\varphi^k(x))$ is analytic in $O$ for any analytic Hadamard state.

⁴ We note that for any distribution $u$ it holds that $\mathrm{WF}(u) \subset \mathrm{WF}_A(u)$.
4.2. Continuity and analyticity. The basic difficulty in defining notions of continuous and analytic dependence of a local, covariant field on the metric and the parameters is the following. The fields corresponding to different metrics and parameters are elements of different algebras and hence cannot be compared directly. It is therefore necessary to first provide a suitable identification of these elements. To simplify the discussion, we first consider only variations of the spacetime metric and keep the coupling constants fixed. At the end of this subsection we comment on how to generalize the discussion to include variations of the parameters.

We first give a notion of the continuous dependence of a local, covariant quantum field on the metric. Here we consider a family of metrics $g^{(s)}$ depending smoothly on some real parameter $s$ and differing from each other only within some compact region $O$ in the spacetime $M$. Under these circumstances, we will show in Lem. 4.1 that isomorphisms between the algebras corresponding to different values of $s$ can be constructed by identifying the observables in the past (or future) of $O$. A local, covariant field $\Phi$ depending continuously on the metric under smooth variations will then be defined as one for which the family $\Phi[g^{(s)}]$ depends continuously on $s$ under this identification of the algebras, for all smooth families of metrics $g^{(s)}$.

A notion of the analytic dependence of a local, covariant field on the metric is given next. Here we consider an analytic family $g^{(s)}$ of real analytic metrics in some open neighborhood $O$ in $M$. Unlike the smooth case considered above, we cannot now demand that the metrics coincide outside some compact region, because there are no nonzero analytic functions with compact support.
Consequently, we cannot identify the algebras for different values of $s$ in the same manner as in the smooth case. We therefore have no obvious means of directly comparing a given field for the different metrics $g^{(s)}$, since these fields belong to different algebras. We avoid this problem by considering instead a notion of analytic dependence of a field on the metric via its expectation values in an analytic family of quasi-free Hadamard states $\omega^{(s)}$ corresponding to the metrics $g^{(s)}$. We shall say that a local, covariant field $\Phi$ depends analytically on the metric if the family of expectation values $\omega^{(s)}(\Phi[g^{(s)}](x_1, \dots, x_n))$ depends, in a suitable sense, analytically on $s$. This must hold for all possible choices of analytic families of metrics $g^{(s)}$ and states $\omega^{(s)}$.

Lemma 4.1. Consider two globally hyperbolic spacetimes $(M, g)$ and $(M, g')$ such that $g \equiv g'$ everywhere outside some region $O$ with compact closure. Then there exists a *-isomorphism $\tau_{\mathrm{ret}}: \mathcal{W}(M, g') \to \mathcal{W}(M, g)$ whose restriction to the subalgebra $\mathcal{W}(M_-, g')$, with $M_- = M \setminus J^+(O)$, is the identity. Similarly, there exists a *-isomorphism $\tau_{\mathrm{adv}}: \mathcal{W}(M, g') \to \mathcal{W}(M, g)$ whose restriction to the subalgebra $\mathcal{W}(M_+, g')$, with $M_+ = M \setminus J^-(O)$, is the identity.

Remark. The isomorphism $\tau_{\mathrm{ret}}$ is constructed by a suitable identification of the fields in both algebras on a Cauchy surface $\Sigma_-$ not intersecting the future of $O$. Likewise, $\tau_{\mathrm{adv}}$ is constructed by identifying them on a Cauchy surface $\Sigma_+$ not intersecting the past of $O$. The particular choice of those Cauchy surfaces is irrelevant for the constructions, so in that sense $\tau_{\mathrm{ret}}$ and $\tau_{\mathrm{adv}}$ are canonical. In the following proof we only construct $\tau_{\mathrm{ret}}$; the construction of $\tau_{\mathrm{adv}}$ is completely analogous.

Proof. Let $\Sigma_-$ be a Cauchy surface not intersecting the future of $O$, and let $\Sigma_+$ be a Cauchy surface not intersecting the past of $O$. Define a bidistribution $S$ on $M$ by
$$S(f_1 \otimes f_2) = \int_{\Sigma_-} (F_1 \nabla_a F_2 - F_2 \nabla_a F_1)\, n^a\, d\sigma, \qquad (31)$$
where
$$F_1(x) = \int_M \Delta_g(x, y)\, f_1(y)\, \mu_g(y), \qquad F_2(x) = \int_M \Delta_{g'}(x, y)\, f_2(y)\, \mu_{g'}(y). \qquad (32)$$
By a standard argument based on Gauss’ law (see e.g. [20]), one can see that $S$ does not depend on the particular choice of $\Sigma_-$. Let $\chi$ be an arbitrary smooth function on $M$ satisfying $\chi(x) = 0$ for all $x \in J^+(\Sigma_+)$ and $\chi(x) = 1$ for all $x \in J^-(\Sigma_-)$. We then define a linear map $A_{\mathrm{ret}}: \mathcal{D}(M) \to \mathcal{D}'(M)$ by
$$A_{\mathrm{ret}} f \stackrel{\mathrm{def}}{=} -(\Box_g - \xi R_g - m^2)(\chi\, Sf).$$
The distribution $A_{\mathrm{ret}} f$ has the following properties:
(a) $A_{\mathrm{ret}} f$ is of compact support, with $\mathrm{supp}(A_{\mathrm{ret}} f) \subset J^+(\Sigma_-) \cap J^-(\Sigma_+)$;
(b) $\Delta_g A_{\mathrm{ret}} f(x) = \Delta_{g'} f(x)$ for all $x \in J^-(\Sigma_-)$ and $f \in \mathcal{D}(M)$.
Item (a) follows immediately from two facts: $(\Box_g - \xi R_g - m^2)\, Sf(x) = 0$ for all $x \in J^-(\Sigma_-)$, and $\chi(x) = 0$ for all $x \in J^+(\Sigma_+)$. Item (b) holds since
$$\Delta_g A_{\mathrm{ret}} f(x) = \Delta^{\mathrm{ret}}_g (\Box_g - \xi R_g - m^2)(\chi\, Sf)(x) = Sf(x) = \Delta_{g'} f(x) \qquad \forall x \in J^-(\Sigma_-). \qquad (33)$$
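The Gauss-law argument for the independence of $S$ from $\Sigma_-$ can be spelled out as follows. This is our own sketch, not taken from [20]. Outside $O$ we have $g = g'$, so there both $F_1$ and $F_2$ of Eq. (32) solve $(\Box_g - \xi R_g - m^2)F_i = 0$. Consider the current $j_a = F_1 \nabla_a F_2 - F_2 \nabla_a F_1$. Its gradient terms cancel, and
$$\nabla^a j_a = F_1 \Box_g F_2 - F_2 \Box_g F_1 = F_1 (\xi R_g + m^2) F_2 - F_2 (\xi R_g + m^2) F_1 = 0.$$
For simplicity, take a second Cauchy surface $\tilde\Sigma_-$ that does not intersect $J^+(O)$ and lies to the future of $\Sigma_-$. The region $U$ between the two surfaces does not meet $O$: a point of $O$ in $U$ would force $\tilde\Sigma_-$ to meet $J^+(O)$. Since the $F_i$ have spatially compact support, Stokes' theorem gives
$$\int_{\tilde\Sigma_-} j_a n^a\, d\sigma - \int_{\Sigma_-} j_a n^a\, d\sigma = \pm \int_U \nabla^a j_a\, \mu_g = 0,$$
where the overall sign depends only on orientation conventions.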
We wish to show that the $n$th tensor power of $A_{\mathrm{ret}}$ gives a map $A_{\mathrm{ret}}^{\otimes n}: \mathcal{E}_n(M, g') \to \mathcal{E}_n(M, g)$.
We begin by showing that $S$ has the following wave front set:
$$\mathrm{WF}(S) \subset \{(x_1, k_1, x_2, -k_2) \in (T^*M)^2 \setminus \{0\} \mid \exists\, y \in M \setminus J^+(O) \text{ and } (y, p) \in T^*_y M \text{ such that } (x_1, k_1) \sim (y, p) \text{ with respect to } g \text{ and } (x_2, k_2) \sim (y, p) \text{ with respect to } g'\}. \qquad (34)$$
To see this, we note that by definition
$$(\Box_g - \xi R_g - m^2)_x\, S(x, y) = (\Box_{g'} - \xi R_{g'} - m^2)_y\, S(x, y) = 0. \qquad (35)$$
We are thus in a position to apply the “propagation of singularities theorem” [7, Thm. 6.1.1] to $S$. This theorem tells us that an element $(x_1, k_1, x_2, k_2)$ is in $\mathrm{WF}(S)$ if and only if every element of the form $(y_1, p_1, y_2, p_2)$ is in $\mathrm{WF}(S)$, where $(y_1, p_1) \sim (x_1, k_1)$ with respect to $g$ and $(y_2, p_2) \sim (x_2, k_2)$ with respect to $g'$. Moreover, by definition of $S$, we have $S(x, y) = \Delta_g(x, y) = \Delta_{g'}(x, y)$ for all $x, y \in M \setminus J^+(O)$. The wave front set of $\Delta_g$ is known to be
$$\mathrm{WF}(\Delta_g) = \{(x_1, k_1, x_2, -k_2) \in (T^*M)^2 \setminus \{0\} \mid (x_1, k_1) \sim (x_2, k_2) \text{ with respect to } g\}. \qquad (36)$$
Combining these two pieces of information gives the above wave front set for $S$.
Since differentiating a distribution or multiplying it by a smooth function does not enlarge its wave front set, we have $\mathrm{WF}(A_{\mathrm{ret}}) \subset \mathrm{WF}(S)$. By the rules [11] for calculating the wave front set of a tensor product of distributions, it follows that
$$\mathrm{WF}(A_{\mathrm{ret}}^{\otimes n}) \subset \{(x_1, k_1, \dots, x_n, k_n, y_1, p_1, \dots, y_n, p_n) \in (T^*M)^{2n} \setminus \{0\} \mid (x_i, k_i, y_i, p_i) \in \mathrm{WF}(S) \cup \{0\} \text{ for all } i = 1, \dots, n\}. \qquad (37)$$
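Let $t \in \mathcal{E}_n(M, g')$, that is, $t$ is a symmetric, compactly supported $n$-point distribution with $\mathrm{WF}(t) \subset \mathcal{G}_n(M, g')$. Then it follows from the above form of $\mathrm{WF}(A_{\mathrm{ret}}^{\otimes n})$ that
$$\{(y_1, p_1, \dots, y_n, p_n) \in (T^*M)^n \setminus \{0\} \mid \exists (x_1, 0, \dots, x_n, 0, y_1, -p_1, \dots, y_n, -p_n) \in \mathrm{WF}(A_{\mathrm{ret}}^{\otimes n})\} \cap \mathrm{WF}(t) = \emptyset. \qquad (38)$$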
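Therefore [11, Thm. 8.2.13] applies, and we conclude from that theorem that the linear operator $A_{\mathrm{ret}}^{\otimes n}$ has a well-defined action on distributions $t \in \mathcal{E}_n(M, g')$. The wave front set of the distribution $A_{\mathrm{ret}}^{\otimes n} t$ can be calculated from [11, Thm. 8.2.13] using our knowledge of $\mathrm{WF}(A_{\mathrm{ret}}^{\otimes n})$ and $\mathrm{WF}(t)$:
$$\begin{aligned}
\mathrm{WF}(A_{\mathrm{ret}}^{\otimes n} t) \subset {} & \{(x_1, k_1, \dots, x_n, k_n) \in (T^*M)^n \setminus \{0\} \mid \exists (x_i, k_i, y_i, -p_i) \in \mathrm{WF}(S) \cup \{0\},\ i = 1, \dots, n, \text{ such that } (y_1, p_1, \dots, y_n, p_n) \in \mathcal{G}_n(M, g')\} \\
& \cup \{(x_1, k_1, \dots, x_n, k_n) \in (T^*M)^n \setminus \{0\} \mid \exists (x_i, k_i, y_i, 0) \in \mathrm{WF}(S) \cup \{0\} \text{ for all } i = 1, \dots, n\} \\
\subset {} & \mathcal{G}_n(M, g).
\end{aligned} \qquad (39)$$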
Since the distribution $A_{\mathrm{ret}}^{\otimes n} t$ is of compact support by (a), we have thus shown that the $n$th tensor power of $A_{\mathrm{ret}}$ gives a map from $\mathcal{E}_n(M, g')$ to $\mathcal{E}_n(M, g)$, as claimed. The algebras $\mathcal{W}(M, g)$ and $\mathcal{W}(M, g')$ are faithfully represented on the GNS Hilbert spaces of any quasi-free Hadamard states $\omega$ and $\omega'$ on the subalgebras $\mathcal{A}(M, g)$ and $\mathcal{A}(M, g')$, respectively. We may choose these quasi-free states (or rather their two-point functions) to have identical initial data on $\Sigma_-$. In view of item (b), this amounts to saying that
$$\omega(A_{\mathrm{ret}} f_1 \otimes A_{\mathrm{ret}} f_2) = \omega'(f_1 \otimes f_2) \qquad (40)$$
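for all compactly supported test functions $f_1, f_2$. We now define $\tau_{\mathrm{ret}}: \mathcal{W}(M, g') \to \mathcal{W}(M, g)$ by
$$\tau_{\mathrm{ret}}(W'_n(t)) \stackrel{\mathrm{def}}{=} W_n(A_{\mathrm{ret}}^{\otimes n} t), \qquad (41)$$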
where the $W'_n$ are the generators of $\mathcal{W}(M, g')$ and the $W_n$ are the generators of $\mathcal{W}(M, g)$. We must show that this is indeed a *-isomorphism. That $\tau_{\mathrm{ret}}$ respects the product, Eq. (7), in both algebras follows from
$$A_{\mathrm{ret}}^{\otimes(n+m-2k)}(t \otimes_k t') = (A_{\mathrm{ret}}^{\otimes n} t) \otimes_k (A_{\mathrm{ret}}^{\otimes m} t'), \qquad (42)$$
where $\omega'$ is used for the contractions in $\otimes_k$ on the left side and $\omega$ is used for those on the right side. This identity is easily verified using Eq. (40) and the definition of the contracted tensor product. That $\tau_{\mathrm{ret}}$ respects the *-operation follows because $A_{\mathrm{ret}}$ is real. That $\tau_{\mathrm{ret}}$ is invertible can be seen by an explicit construction of its inverse, given by the same construction as above but with the spacetimes $(M, g)$ and $(M, g')$ interchanged. The definition of $A_{\mathrm{ret}}$ does not depend on the specific choice of $\Sigma_-$, but it does depend on a choice of $\chi$. It is, however, not difficult to see that the isomorphism $\tau_{\mathrm{ret}}$ itself is independent of that choice. We finally prove that the restriction of $\tau_{\mathrm{ret}}$ to $\mathcal{W}(M_-, g')$ is the identity. By item (b) above we have
$$\Delta_g(t - A_{\mathrm{ret}} t) = \Delta_g t - \Delta_{g'} t \quad \text{in } J^-(\Sigma_-) \qquad (43)$$
for any $t \in \mathcal{E}_1(M, g')$. Now if the support of $t$ is in $M_-$ (so that $\mathrm{supp}(t) \cap J^+(O) = \emptyset$), the above expression vanishes on $J^-(\Sigma_-)$. Since this expression is moreover a solution of the Klein–Gordon equation, it must in fact vanish everywhere. Therefore, by the same argument as in the proof of Prop. 2.1, there is an $s \in \mathcal{E}_1(M, g)$ such that $t - A_{\mathrm{ret}} t = (\Box_g - \xi R_g - m^2)s$. Since $W_1((\Box_g - \xi R_g - m^2)s) = 0$, this implies $\tau_{\mathrm{ret}}(W'_1(t)) = W_1(A_{\mathrm{ret}} t) = W_1(t)$ for all $t \in \mathcal{E}_1(M_-, g')$. This argument can be generalized to show that $\tau_{\mathrm{ret}}(W'_n(t)) = W_n(t)$ for all $t \in \mathcal{E}_n(M_-, g')$ and arbitrary $n$, which proves our claim. $\square$

Using the above lemma, we can now say precisely what we mean by the statement that a “local field varies continuously under a smooth variation of the metric”. Let $g^{(s)}$ be a family of metrics on $M$ such that $g^{(s)} \equiv g$ outside a compact region $O$. Assume that the family depends smoothly on $s$, in the sense that the five-dimensional metric $g^{(s)}_{ab} + (ds)_a (ds)_b$ is smooth on $M \times \mathbb{R}$. From the above lemma we then obtain, for each value of $s$, an isomorphism $\tau_{\mathrm{ret}}: \mathcal{W}(M, g^{(s)}) \to \mathcal{W}(M, g)$.

Continuity. A local, covariant quantum field $\Phi$ is said to depend continuously on the metric if the algebra-valued function $\mathbb{R} \ni s \mapsto \tau_{\mathrm{ret}}(\Phi[g^{(s)}](f)) \in \mathcal{W}(M, g)$ is continuous for all families of metrics as described above and all test functions $f$.

Remarks. (1) A notion of continuous dependence of the fields on the metric could also be based on the isomorphisms $\tau_{\mathrm{adv}}$. It can be seen (although we do not demonstrate this here) that both notions coincide. (2) In certain cases the isomorphisms $\tau_{\mathrm{adv}}$ and $\tau_{\mathrm{ret}}$ can be used to describe in a meaningful way the advanced and retarded response of a local, covariant quantum field to an infinitesimal perturbation of the metric.
Namely, for a local, covariant field Φ which has not only a continuous but, in addition, a once differentiable dependence on the metric, one can define its advanced response, $(\delta\Phi/\delta g_{ab})_{\rm adv}$, to a metric perturbation by
$$\int_{M^{n+1}} \left(\frac{\delta\Phi(x_1, \dots, x_n)}{\delta g_{ab}(y)}\right)_{\rm adv} h_{ab}(y)\, f(x_1, \dots, x_n)\, \mu_g(y)\mu_g(x_1)\cdots\mu_g(x_n) \stackrel{\rm def}{=} \frac{d}{ds}\,\tau_{\rm adv}\big(\Phi[g + sh](f)\big)\Big|_{s=0}, \tag{44}$$
where $h \equiv h_{ab}$ is of compact support. In the same way one can define the retarded response, $(\delta\Phi/\delta g_{ab})_{\rm ret}$, of a local, covariant field Φ to a metric perturbation.
Local Wick Polynomials and Time Ordered Products Curved Spacetime
311
We next explain what we mean by the statement that a “local field varies analytically under an analytic variation of the metric”. Let $g^{(s)}$ be a family of metrics on $M$ which is analytic in some convex normal neighborhood $O \subset M$ in the sense that the five-dimensional metric $g^{(s)}_{ab} + (ds)_a(ds)_b$ is analytic on $O \times I$, where $I$ is an open interval. We consider a family of quasi-free Hadamard states, $\omega^{(s)}$, on the algebras $W(M, g^{(s)})$ that is analytic in $s$ in the following sense: let $H^{(s)}$ be the Hadamard parametrices, given by Eq. (4), constructed from the metrics $g^{(s)}$, and assume that $O$ is small enough that $H^{(s)}$ is well defined on $O \times O$ for all $s$. We say that $\omega^{(s)}$ is an analytic one-parameter family of states if the difference $\omega^{(s)}(x, y) - H^{(s)}(x, y)$ is jointly analytic in $(x, y, s)$ on $O \times O \times I$. We would like to define a notion of analytic dependence of a local field on the metric by demanding that the expectation values $\omega^{(s)}(\Phi[g^{(s)}](x_1, \dots, x_n))$ depend analytically on $s$ for any analytic family of metrics and any corresponding analytic family of quasi-free Hadamard states. However, since these expectation values are in fact distributions in $x_1, \dots, x_n$, it is not clear a priori what is actually meant by “analytic dependence on $s$”. To give this statement a precise meaning, we must characterize the extent to which the above expectation values, viewed as distributions jointly in $(x_1, \dots, x_n, s)$, “fail to be analytic”. We do so by means of the analytic wave front set of these expectation values, viewed as distributions jointly in $(x_1, \dots, x_n, s)$.

Analytic dependence. Let $g^{(s)}$ be an analytic family of metrics in $O \subset M$ and let $\omega^{(s)}$ be a corresponding analytic family of quasi-free Hadamard states. Let Φ be a local, covariant field in $n$ variables, and let $\Gamma^\Phi_A(O, g) \subset (T^*O)^n \setminus \{0\}$ be the associated conic set as introduced in Subsect. 4.1. Consider the family of expectation values
$$E^\Phi_\omega(x_1, \dots, x_n, s) \stackrel{\rm def}{=} \omega^{(s)}\big(\Phi[g^{(s)}](x_1, \dots, x_n)\big), \tag{45}$$
viewed as a distribution on $O^n \times I$. Then we demand that
$$\mathrm{WF}_A(E^\Phi_\omega) \subset \big\{(x_1, k_1, \dots, x_n, k_n, s, \rho) \in T^*(O^n \times I)\setminus\{0\} \;\big|\; (x_1, k_1, \dots, x_n, k_n) \in \Gamma^\Phi_A(O, g^{(s)})\big\} \tag{46}$$
for all analytic families of metrics and all corresponding analytic families of states.

Remarks. (1) The above condition on the analytic wave front set can be understood as follows. Consider first an open neighborhood $U \subset O^n$ such that $E^\Phi_\omega$ is non-singular for all $(x_1, \dots, x_n) \in U$ at a given value $s = s_0$. Then the condition on $\mathrm{WF}_A(E^\Phi_\omega)$ implies that $E^\Phi_\omega$ varies analytically in $(x_1, \dots, x_n)$ and $s$ in a neighborhood of the form $U \times (s_0 - \delta, s_0 + \delta)$ for some $\delta > 0$. On the other hand, if $(x_1, \dots, x_n)$ is a singular point of the local, covariant field Φ at a given $s$, then the condition on $\mathrm{WF}_A(E^\Phi_\omega)$ demands that the singular “x-directions” of $E^\Phi_\omega$ in momentum space be the same as those of the field $\Phi[g^{(s)}](x_1, \dots, x_n)$, considered as a distribution in the x-variables at fixed $s$. (2) The above definition assumes the existence of an analytic family of states for any given analytic family of metrics. While we do not have an argument proving the existence of such a family, we remark that, for the purposes of our definition of analytic dependence, it would be entirely sufficient to have a suitable family, $\psi^{(s)}$, of normalized, linear (but not necessarily positive) functionals on the algebras $W(M, g^{(s)})$. We now briefly indicate how such a family can be constructed. First, using the results of [12, Ch. 6], one can obtain families of bidistributions $\psi^{(s)}(x, y)$ which have the same properties as
312
S. Hollands, R. M. Wald
$\omega^{(s)}(x, y)$, except possibly for positivity. These bidistributions can then be promoted, by the same formula as Eq. (3), to normalized linear functionals on the algebras $A(M, g^{(s)})$ of free fields. It is then not difficult to see that these can be extended (via normal ordering of elements of $W(M, g^{(s)})$ with respect to $\psi^{(s)}$) to functionals on the algebras $W(M, g^{(s)})$.

The analyticity of local, covariant fields under corresponding variations of the coupling parameters can be formulated in a very similar way. To obtain a corresponding notion of continuous dependence, however, it is necessary to allow the coupling parameters $p$ ($\equiv (\xi, m^2)$ in the case of a real scalar field, Eq. (1)) to be arbitrary smooth functions on spacetime rather than constants. One can then consider two coupling functions $p_1$ and $p_2$ which differ only within some compact region. In such a situation, it is possible to find an identification of the algebras corresponding to $p_1$ and $p_2$ which is analogous to the one established in Lem. 4.1. Based on such an identification, one can give a notion of continuity of local, covariant fields under smooth variations of the coupling parameters which is completely analogous to the above notion of continuity under smooth variations of the metric. It should also be noted that considering different coupling parameters involves a slight generalization of our notion of local, covariant fields (Def. 3.2). This generalization is, however, rather obvious and is therefore left to the reader.

4.3. Scaling. The scaling requirement involves the comparison of a given local, covariant field at different scales, i.e., its behavior under a rescaling $g \to \lambda^{-2}g$ and under corresponding rescalings of the coupling parameters $m^2$, ξ and of the field ϕ, chosen so as to leave the action $S$ invariant. For the action (1), the unique rescalings of $m^2$, ξ and ϕ leaving $S$ invariant are $m^2 \to \lambda^2 m^2$, $\xi \to \xi$ and $\varphi \to \lambda\varphi$.
We will refer to the various exponents of λ as the “engineering dimensions” of the corresponding quantities (and similarly for quantities derived from them). In order to compare an arbitrary local, covariant field Φ in the algebras $W(M, g)$ at different scales, we first show that the algebras constructed from the rescaled quantities are naturally isomorphic for all values of $\lambda > 0$.

Lemma 4.2. There are natural *-isomorphisms $\sigma_\lambda: W_{p(\lambda)}(M, \lambda^{-2}g) \to W_p(M, g)$ for all $\lambda > 0$, where the subscripts on the algebras indicate the dependence on the parameters, $p = (\xi, m^2)$ and $p(\lambda) = (\xi, \lambda^2 m^2)$.

Proof. Let ω be a quasi-free Hadamard state for the theory at $\lambda = 1$. For all $\lambda > 0$, let
$$\omega^{(\lambda)}(x, y) = \lambda^2\,\omega(x, y). \tag{47}$$
Then $\omega^{(\lambda)}$ is the two-point function of a quasi-free Hadamard state of the theory scaled by λ. (Note that Eq. (47) is equivalent to the relation $\omega^{(\lambda)}(f_1 \otimes f_2) = \lambda^{-6}\,\omega(f_1 \otimes f_2)$ between the smeared two-point functions, because the metric volume element transforms as $\mu_{\lambda^{-2}g} = \lambda^{-4}\mu_g$.) We use $\omega^{(\lambda)}$ to give a concrete realization of the algebra $W_{p(\lambda)}(M, \lambda^{-2}g)$. We then define (using the same symbol for the generators $W_n$ in both algebras)
$$\sigma_\lambda: W_{p(\lambda)}(M, \lambda^{-2}g) \ni W_n(t) \mapsto \lambda^{-3n}\,W_n(t) \in W_p(M, g).$$
The map $\sigma_\lambda$ is well defined for all $\lambda > 0$ because $E_n(M, g) = E_n(M, \lambda^{-2}g)$. Using Eq. (47), it is also easily checked to be a *-homomorphism. ∎
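The exponent bookkeeping of Sect. 4.3 and of Lemma 4.2 can be checked mechanically. The Python sketch below is our own illustration and tracks only powers of λ in four spacetime dimensions. It assumes that the action (1) has the standard form, with Lagrangian density proportional to $(g^{ab}\nabla_a\varphi\nabla_b\varphi + (m^2 + \xi R)\varphi^2)\mu_g$, and the table of weights is entered by hand from the rules stated in the text. It confirms that each term is invariant and that a factor $\lambda^{-3}$ per generator in $\sigma_\lambda$ matches the factor $\lambda^{-6}$ of the smeared two-point function.

```python
from fractions import Fraction

# Powers of lambda picked up by each quantity under the rescaling of Sect. 4.3:
# g -> lambda^{-2} g, m^2 -> lambda^2 m^2, xi -> xi, phi -> lambda phi, in D = 4.
# These weights are entered by hand (an assumption of this sketch).
D = 4
WEIGHT = {
    "g_inv": 2,   # inverse metric g^{ab} -> lambda^2 g^{ab}
    "R": 2,       # scalar curvature R -> lambda^2 R
    "m2": 2,      # mass parameter m^2 -> lambda^2 m^2
    "xi": 0,      # curvature coupling is unchanged
    "phi": 1,     # field phi -> lambda phi
    "mu": -D,     # volume element mu_g -> lambda^{-4} mu_g
    "omega": 2,   # two-point function, Eq. (47)
}


def weight(*factors):
    """Total power of lambda carried by a product of the named factors."""
    return sum(WEIGHT[name] for name in factors)


# Terms of the Lagrangian density (coordinate derivatives carry no weight).
lagrangian_terms = {
    "kinetic": weight("g_inv", "phi", "phi", "mu"),
    "mass": weight("m2", "phi", "phi", "mu"),
    "curvature": weight("xi", "R", "phi", "phi", "mu"),
}

# Smeared two-point function: omega(x, y) integrated against mu(x) mu(y).
smeared_two_point = weight("omega", "mu", "mu")

# If sigma_lambda rescales each generator W_1 by lambda^a, a two-point
# function picks up lambda^{2a}; matching lambda^{-6} requires a = -3.
sigma_factor_per_field = Fraction(smeared_two_point, 2)

if __name__ == "__main__":
    print(lagrangian_terms)        # {'kinetic': 0, 'mass': 0, 'curvature': 0}
    print(smeared_two_point)       # -6
    print(sigma_factor_per_field)  # -3
```

A weight table is used instead of symbolic algebra because, for pure power laws, only the exponents matter and integer addition keeps the check exact.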
Using the above lemma, we are now in a position to consider a given local, covariant field at different scales. Let Φ be a local, covariant field in $n$ variables. We define a rescaled field, $S_\lambda\Phi$, by
$$S_\lambda\Phi[g, p](f) \stackrel{\rm def}{=} \lambda^{4n}\,\sigma_\lambda\big(\Phi[g(\lambda), p(\lambda)](f)\big), \tag{48}$$
where $p(\lambda) = (\xi, \lambda^2 m^2)$, $g(\lambda) = \lambda^{-2}g$ and $\lambda > 0$. The crucial points to note about the isomorphism $\sigma_\lambda$ are that (a) it ensures that the field Φ and the rescaled field $S_\lambda\Phi$ live in the same algebra (so that they may be compared), and (b) it is constructed in such a way that the rescaled field $S_\lambda\Phi$ is again local in the sense of Def. 3.2. The factor $\lambda^{4n}$ has been included in the definition of the scaling map $S_\lambda$ to compensate for the fact that quantum fields are distributions and therefore transform as densities under rescalings of the metric. The action of $S_\lambda$ on some simple local, covariant fields is given below. Next, we introduce the notion of the scaling dimension of a local, covariant field.

Definition 4.1. The scaling dimension $d_\Phi$ of a local, covariant field Φ is defined by
$$d_\Phi = \sup\Big\{\delta \in \mathbb{R} \;\Big|\; \lim_{\lambda \to 0^+} \lambda^{-\delta} S_\lambda\Phi = 0\Big\}, \tag{49}$$
where the limit is understood to mean that
$$\lim_{\lambda \to 0^+} \lambda^{-\delta} S_\lambda\Phi[g, p](f) = 0$$
for all metrics $g$, all values of the parameters $p$ and all test functions $f$. It is easy to see from the definition that the free field indeed scales as $S_\lambda\varphi = \lambda\varphi$. The local c-number field $C = m^2 R\,1$ scales as $S_\lambda C = \lambda^4 C$, so it has scaling dimension four. The fields in the above examples scale homogeneously. However, this is clearly not always so, as may be seen from the elementary example $(1 + R^2)^{-1}1$, which is local and has scaling dimension zero, but which does not scale homogeneously (and which also has no well-defined engineering dimension). We would like to require that our local Wick powers and local time ordered products scale homogeneously, the basic idea being that we wish our fields to have a well-defined engineering dimension. However, as is well known in quantum field theory (and as we shall see in more detail for the local Wick products below), logarithmic terms cannot be avoided in general (with the exception of the free field). Consequently, we will instead require that the local Wick powers and their local time ordered products scale “homogeneously up to logarithmic terms”. This requirement is formulated precisely as follows. We say that an element $a \in W_p(M, g)$ has order $k$ if its $(k + 1)$-fold repeated commutator with a free field vanishes. (Proposition 2.1 provides a characterization of such elements.) By the expansion requirement, we know that the time ordered products
$T(\varphi^{k_1}\cdots\varphi^{k_n})$ have order $\sum_i k_i$. It is also clear that the order is additive under the multiplication of two operators. Using the notion of the order of an operator, we now give a recursive definition of a local, covariant field with “almost homogeneous scaling”.

Definition 4.2. A local, covariant field Φ of order zero (i.e., a local c-number field) is said to have “almost homogeneous scaling” if it in fact scales exactly homogeneously,
$$\lambda^{-d_\Phi} S_\lambda\Phi = \Phi. \tag{50}$$
A local, covariant field Φ of order $k > 0$ is said to scale almost homogeneously if
$$\lambda^{-d_\Phi} S_\lambda\Phi = \Phi + \sum_i \ln^i\lambda \cdot C_i \quad \text{for all } \lambda > 0, \tag{51}$$
where the $C_i$ are finitely many local, covariant fields of order $\le k - 1$ with $d_{C_i} = d_\Phi$ and almost homogeneous scaling. Our requirement concerning the scaling of local Wick products and time ordered products is then the following.

Scaling. The local time ordered products $\Phi = T(\varphi^{k_1}\cdots\varphi^{k_n})$ have almost homogeneous scaling with $d_\Phi = \sum_i k_i$ = order of Φ.
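As a check on the examples given after Definition 4.1, these scaling laws can be derived directly from Eq. (48). The computation below is a sketch resting on two assumptions of ours, not statements of the text: that a c-number field is smeared as $C[g, p](f) = \int_M C[g, p](x)\,f(x)\,\mu_g(x)\,1$ with $\sigma_\lambda(1) = 1$, and that, as in the proof of Lemma 4.2, the free field is the generator $\varphi[g, p](f) = W_1(f)$ smeared with the volume element of the respective metric. Using $\mu_{\lambda^{-2}g} = \lambda^{-4}\mu_g$ and $R[\lambda^{-2}g] = \lambda^2 R[g]$,
$$\begin{aligned}
S_\lambda\varphi[g, p](f) &= \lambda^4\,\sigma_\lambda\big(W_1(f)\big) = \lambda^4\cdot\lambda^{-3}\,W_1(f) = \lambda\,\varphi[g, p](f),\\
S_\lambda C[g, p](f) &= \lambda^4 \int_M C[\lambda^{-2}g, p(\lambda)](x)\,f(x)\,\lambda^{-4}\mu_g(x)\,1 = \int_M C[\lambda^{-2}g, p(\lambda)](x)\,f(x)\,\mu_g(x)\,1 .
\end{aligned}$$
For $C = m^2 R\,1$ the integrand becomes $(\lambda^2 m^2)(\lambda^2 R) = \lambda^4 m^2 R$, so $S_\lambda C = \lambda^4 C$. For $C' = (1 + R^2)^{-1}1$ it becomes $(1 + \lambda^4 R^2)^{-1}$, which tends to $1$ as $\lambda \to 0^+$; hence $\lambda^{-\delta} S_\lambda C' \to 0$ precisely for $\delta < 0$, consistent with scaling dimension zero, while no single power of λ reproduces $S_\lambda C'$ wherever $R \neq 0$.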
5. Analysis of the Renormalization Ambiguity for Local Wick Products and Their Time Ordered Products

5.1. Uniqueness of local Wick products. We now analyze the ambiguity in defining local Wick powers with the properties stated in the previous section. As previously mentioned, we will explicitly consider only undifferentiated Wick powers here, but our results can be straightforwardly extended to differentiated Wick powers (modulo the remark in Sect. 4.1 above).

Theorem 5.1. Suppose we are given two sets of local Wick products $\varphi^k(x)$ and $\tilde\varphi^k(x)$ satisfying the requirements formulated in the previous section (for all $k$). Then
$$\tilde\varphi^k(x) = \varphi^k(x) + \sum_{i=0}^{k-2}\binom{k}{i}\,C_{k-i}(x)\,\varphi^i(x). \tag{52}$$
Here,
$$C_k(x) \equiv C_k\big[g_{ab}(x), R_{abcd}(x), \dots, \nabla_{(e_1}\cdots\nabla_{e_{k-2})}R_{abcd}(x), \xi, m^2\big] \quad (k \in \mathbb{N}) \tag{53}$$
are polynomials (with real coefficients depending analytically on ξ) in the metric, the curvature and the mass parameter, which scale as $C_k \to \lambda^k C_k$ under rescalings $g_{ab} \to \lambda^{-2}g_{ab}$, $m^2 \to \lambda^2 m^2$, $\xi \to \xi$.

Remark. The space of possible curvature terms $C_k$ described in the theorem is finite dimensional for every $k$. For example, $C_2$ must be a real linear combination of $R$ and $m^2$, since these are the only terms with the required properties. Therefore the ambiguity in defining $\varphi^2$ is given by $\tilde\varphi^2 = \varphi^2 + (Z_1 R + Z_2 m^2)1$, where $Z_1, Z_2$ are undetermined real constants depending analytically on ξ.

Proof of Theorem 5.1. The proof is divided into two steps. We first show that there exist local, covariant, Hermitian c-number fields $C_k$ such that Eq. (52) holds and such that each $C_k$ depends continuously (analytically) on the metric and scales homogeneously up to logarithmic terms with dimension $d_{C_k} = k$. The second step is then to show that the $C_k$ are polynomials in the metric, the Riemann tensor, its derivatives and the coupling constants, and that they in fact scale exactly as $C_k \to \lambda^k C_k$ under a rescaling of the metric and the mass parameter.
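The combinatorial structure of Eq. (52) is easy to tabulate. The Python sketch below is only an illustration (the function names `ambiguity_terms` and `total_dimension` are ours): for a given $k$ it lists the binomial coefficient, the index of the c-number coefficient $C_{k-i}$ and the power $i$ of the Wick monomial in each correction term, and checks that every term has total dimension $k$, since $C_{k-i}$ has dimension $k - i$ and $\varphi^i$ has dimension $i$.

```python
from math import comb


def ambiguity_terms(k):
    """Correction terms of Eq. (52) for the Wick power phi^k.

    Each entry (b, j, i) stands for b * C_j * phi^i with b = comb(k, i) and
    j = k - i, for i = 0, ..., k - 2, exactly as in the sum of Eq. (52).
    (A would-be i = k - 1 term would involve C_1, which vanishes because
    Eq. (52) holds for k = 1 with C_1 = 0.)
    """
    return [(comb(k, i), k - i, i) for i in range(k - 1)]


def total_dimension(term):
    """Dimension of C_j * phi^i: C_j has dimension j and phi^i has dimension i."""
    _, j, i = term
    return j + i


if __name__ == "__main__":
    for k in range(1, 6):
        terms = ambiguity_terms(k)
        assert all(total_dimension(t) == k for t in terms)
        print(k, terms)
```

For $k = 4$ the sketch yields `[(1, 4, 0), (4, 3, 1), (6, 2, 2)]`, i.e. $\tilde\varphi^4 = \varphi^4 + C_4\,1 + 4C_3\varphi + 6C_2\varphi^2$.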
The first step is accomplished by a simple induction argument in $k$. Clearly, Eq. (52) holds for $k = 1$ with $C_1 = 0$, since there is no ambiguity in the definition of the free field. Suppose we have found Hermitian local c-number fields $C_i$, $i = 2, 3, \dots, k - 1$, such that Eq. (52) holds up to order $k - 1$ and which, furthermore, have the properties that (a) they are continuous (analytic) under corresponding variations of the metric and the parameters and (b) they have almost homogeneous scaling with dimension $d_{C_i} = i$. We define a local, covariant field $\Phi_k$ by
$$\Phi_k(x) \stackrel{\rm def}{=} \tilde\varphi^k(x) - \Big(\varphi^k(x) + \sum_{i=1}^{k-2}\binom{k}{i}\,C_{k-i}(x)\,\varphi^i(x)\Big). \tag{54}$$
By the induction assumption it follows that the local, covariant field $\Phi_k$ is Hermitian, that it is continuous (analytic) under corresponding variations of the metric and the parameters, and that it has almost homogeneous scaling with $d_{\Phi_k} = k$, because $\Phi_k$ arises as a sum of local, covariant fields with these properties. Using the expansion requirement for the local Wick powers and the inductive assumption, one easily gets
$$[\Phi_k(x), \varphi(y)] = 0 \quad \text{for all } x, y \in M. \tag{55}$$
Using Prop. 2.1 we therefore get that $\Phi_k = C_k 1$, where $C_k \equiv C_k[g, p]$ is some Hermitian local, covariant c-number field with properties (a) and (b). Using the microlocal spectrum condition for the local Wick monomials, we moreover immediately get that $C_k$ is actually a smooth function of $x$. This completes the first step, and we come to the second step. The locality requirement, Def. 3.1, implies that⁵
$$\chi^* C_k[g, p] = C_k[\chi^* g, p] \tag{56}$$
for any diffeomorphism χ of $M$, and that $C_k[g, p](x) = C_k[g', p](x)$ holds whenever $g = g'$ in some open neighborhood of the point $x$. The first condition means that $C_k[g, p](x)$ is given by a diffeomorphism covariant expression, and the second means that it depends only on the germ of $g$ at $x$. In order to proceed, we now consider the subspace of all metrics $g$ which are real analytic in some neighborhood of $x$, and we view $C_k$ as a functional on that subspace. Since the germ at $x$ of a real analytic metric $g$ is determined by the metric and all its derivatives at $x$, this functional must be of the form
$$C_k[g, p](x) \equiv C_k\big[g_{\mu\nu}(x), \mathring\partial_\sigma g_{\mu\nu}(x), \mathring\partial_\sigma\mathring\partial_\rho g_{\mu\nu}(x), \dots, p\big] \tag{57}$$
for all real analytic metrics $g$. Here, $\mathring\partial_\mu$ is the coordinate derivative operator in some fixed analytic coordinate system around $x$, and Greek indices denote components in these coordinates. For convenience, we take the values of all the coordinates of $x$ to be zero. Consider now the one-parameter family of coupling parameters $p^{(s)} = (\xi, s^2 m^2)$ and the following one-parameter family of real analytic metrics:
$$g^{(s)} = s^{-2}\chi_s^* g. \tag{58}$$
Here, $\chi_s$ is the diffeomorphism which, in our coordinates around $x$, acts by rescaling the coordinates by a factor $s$. Let $y^\alpha$ denote the coordinates of a point $y$ in a sufficiently small neighborhood of $x$. In terms of components in our fixed coordinate system, we have
$$g^{(s)}_{\mu\nu}(y^\alpha) = g_{\mu\nu}(s\,y^\alpha). \tag{59}$$
⁵ Note that the role played by $\iota_\chi$ in the locality requirement is trivial in the case at hand, since $C_k$ is a c-number.
It follows immediately from (59) that $g^{(s)}$ is an analytic family of metrics in a neighborhood of $x$ and $s = 0$. By the analyticity and analytic microlocal scaling degree requirements, $C_k[g^{(s)}, p^{(s)}](x)$ is analytic in $s$ in a neighborhood of $s = 0$, and we may thus expand it in a convergent power series about $s = 0$. It also follows immediately from (59) that $\mathring\partial_{\sigma_1}\cdots\mathring\partial_{\sigma_k} g^{(0)}_{\mu\nu}(y) = 0$ ($k \ge 1$) for all $y$ in a neighborhood of $x$, and that
$$g^{(s)}_{\mu\nu}(x) = g_{\mu\nu}(x), \qquad \mathring\partial_{\sigma_1}\cdots\mathring\partial_{\sigma_k} g^{(s)}_{\mu\nu}(x) = s^k\,\mathring\partial_{\sigma_1}\cdots\mathring\partial_{\sigma_k} g_{\mu\nu}(x). \tag{60}$$
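We find from this the power series expansion
$$C_k[g^{(s)}, p^{(s)}](x) = \sum_{n=0}^\infty s^n \sum_{2j_0 + j_1 + 2j_2 + \cdots + r j_r = n} \frac{1}{j_0!\,j_1!\cdots j_r!}\,\frac{\partial^{\,j_0 + j_1 + \cdots + j_r} C_k[\dots]}{(\partial m^2)^{j_0}\,[\partial(\mathring\partial g(x))]^{j_1}\cdots[\partial(\mathring\partial\cdots\mathring\partial g(x))]^{j_r}} \times m^{2j_0}\,[(\mathring\partial g)(x)]^{j_1}\cdots[(\mathring\partial\cdots\mathring\partial g)(x)]^{j_r}, \tag{61}$$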
where the spacetime indices have been omitted for simplicity and where $[\dots] = [g_{\mu\nu}(x), 0, \dots, 0, \xi, m^2 = 0]$. Applying Eq. (56) to the diffeomorphism $\chi_s$ and using that $\chi_s(x) = x$, we get
$$C_k[g^{(s)}, p^{(s)}](x) = C_k[s^{-2}g, \xi, s^2 m^2](x). \tag{62}$$
Let us define $K_n[g_{\mu\nu}(x), \dots, \mathring\partial_{\sigma_1}\cdots\mathring\partial_{\sigma_n} g_{\mu\nu}(x), \xi, m^2]$ (which we shall simply denote by $K_n[g, \xi, m^2](x)$) as the coefficient of $s^n$ in the above power series expansion,
$$C_k[s^{-2}g, \xi, s^2 m^2](x) \equiv \sum_{n=0}^\infty s^n K_n[g, \xi, m^2](x). \tag{63}$$
(Note that $K_n$ is a polynomial in $m^2$ and the derivatives of the metric, whose coefficients depend analytically on ξ.) The left side of this identity is covariant under diffeomorphisms for all $s$. It therefore follows that each individual term in the series on the right side of this equation must also have this property, i.e., for any analytic diffeomorphism χ,
$$\chi^* K_n[g, \xi, m^2] = K_n[\chi^* g, \xi, m^2] \quad \text{for all } n \ge 0. \tag{64}$$
Since $K_n[g, \xi, m^2](x)$ in addition depends polynomially on the metric and its derivatives at $x$, for all $x \in M$, it follows from the “Thomas replacement theorem” (see [13, Lem. 2.1]) that $K_n[g, \xi, m^2]$ can be written in a “manifestly covariant form”, i.e., as a polynomial in the metric, the Riemann tensor, a finite number of its (symmetrized) covariant derivatives and $m^2$, whose coefficients depend analytically on ξ. In other words,
$$K_n[g, \xi, m^2](x) \equiv K_n\big[g_{ab}(x), R_{abcd}(x), \dots, \nabla_{(e_1}\cdots\nabla_{e_{n-2})}R_{abcd}(x), \xi, m^2\big]. \tag{65}$$
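The constraint $2j_0 + j_1 + 2j_2 + \cdots + r j_r = n$ in Eq. (61) records that $m^2$ carries a factor $s^2$ (because $p^{(s)} = (\xi, s^2 m^2)$) and that an $l$-th coordinate derivative of the metric carries $s^l$ by Eq. (60). The Python sketch below (the function name `exponent_tuples` is hypothetical) enumerates the admissible exponent tuples for a given order $n$, allowing derivatives up to order $r = n$. At $n = 2$ it returns $m^2$, the square of a first derivative of $g$, and a single second derivative of $g$, which are the ingredients of $m^2$ and of the scalar curvature $R$, in line with the Remark following Theorem 5.1.

```python
def exponent_tuples(n, r=None):
    """All (j0, j1, ..., jr) with 2*j0 + 1*j1 + 2*j2 + ... + r*jr == n.

    j0 counts factors of m^2, each carrying s^2 because p^(s) = (xi, s^2 m^2);
    j_l counts factors of an l-th coordinate derivative of the metric, each
    carrying s^l by Eq. (60). Derivatives of order > n cannot appear at order
    s^n, so r defaults to n.
    """
    if r is None:
        r = n
    weights = [2] + list(range(1, r + 1))  # s-weights of m^2, dg, ddg, ...
    results = []

    def build(prefix, remaining, idx):
        # Choose how many factors of the idx-th kind to use, then recurse.
        if idx == len(weights):
            if remaining == 0:
                results.append(tuple(prefix))
            return
        w = weights[idx]
        for j in range(remaining // w + 1):
            build(prefix + [j], remaining - j * w, idx + 1)

    build([], n, 0)
    return results


if __name__ == "__main__":
    for n in range(4):
        print(n, exponent_tuples(n))
```

For instance, `exponent_tuples(2)` returns `[(0, 0, 1), (0, 2, 0), (1, 0, 0)]`, in that order.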
We now use the scaling properties of $C_k$ to find out more about its functional dependence on the metric and the coupling parameters. First, since the scaling dimension of $C_k$ is $k$, we immediately find that $K_n = 0$ for all $n < k$. By Eq. (63), this means that the map $\lambda \mapsto \lambda^{-k} C_k[\lambda^{-2}g, \xi, \lambda^2 m^2](x)$ is analytic at $\lambda = 0$. Furthermore, we know that $C_k$ is a local, covariant field which scales almost homogeneously. By definition, this means that
$$\lambda^{-k} C_k[\lambda^{-2}g, p(\lambda)] - C_k[g, p] = \sum_i \ln^i\lambda \cdot C_i[g, p], \qquad p(\lambda) = (\xi, \lambda^2 m^2), \tag{66}$$
for a finite number of local, covariant fields $C_i$. Since the left side of this equation is analytic at $\lambda = 0$ and the logarithms are not, this is only possible if in fact $C_i = 0$ for all $i$. Therefore only the $k$th term in the series (63) can be nonzero, which means that
$$C_k[g, \xi, m^2](x) \equiv K_k\big[g_{ab}(x), R_{abcd}(x), \dots, \nabla_{(e_1}\cdots\nabla_{e_{k-2})}R_{abcd}(x), \xi, m^2\big] \tag{67}$$
for all analytic metrics $g$; that is, $C_k$ is a polynomial in the metric, the curvature and the mass parameter, whose coefficients depend analytically on ξ. Since we already know that $C_k$ is Hermitian, the coefficients of this polynomial must be real. Moreover, we can directly read off from the expansion (63) that
$$C_k[\lambda^{-2}g, \xi, \lambda^2 m^2](x) = \lambda^k\,C_k[g, \xi, m^2](x). \tag{68}$$
This proves the theorem for analytic metrics $g$. But we already know that $C_k[g, p]$ depends continuously on the metric. By approximating a smooth metric by a sequence of metrics which are real analytic in a neighborhood of $x$, we conclude that Eq. (67) must also hold for metrics which are only smooth, thus proving the theorem. ∎
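The step from (66) to (67) compresses a short argument, which can be written out as follows. This is merely an expanded restatement of the proof above, reading the sum in (66) as running over $i \ge 1$. Inserting (63), with $K_n = 0$ for $n < k$, into the left side of (66) gives
$$\sum_{n \ge k} \lambda^{n-k} K_n[g, \xi, m^2](x) - C_k[g, \xi, m^2](x) = \sum_{i \ge 1} \ln^i\lambda \; C_i[g, p](x).$$
The left side has the finite limit $K_k - C_k$ as $\lambda \to 0^+$, whereas the right side is a polynomial in $\ln\lambda$ without constant term, which diverges unless all $C_i$ vanish. Hence both sides vanish identically:
$$\sum_{n > k} \lambda^{n-k} K_n(x) = C_k(x) - K_k(x) \quad \text{for all } \lambda > 0 .$$
Letting $\lambda \to 0^+$ gives $C_k = K_k$, after which uniqueness of power series coefficients forces $K_n = 0$ for all $n > k$, which is Eq. (67).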
5.2. Existence of local Wick products. We next sketch how to construct local Wick powers with the desired properties. The construction is very similar to the construction of the renormalized stress-energy operator given in [20]. The main ingredient in our construction is the local “Hadamard parametrix”, given by Eq. (4). $H$ is not defined globally but only for $x, y$ contained in a sufficiently small convex normal neighborhood.⁶ In the following we therefore restrict attention to such a neighborhood in all expressions involving $H$. (This does not create any problems for our construction of local Wick powers, since only coincidence limits of quantities involving $H$ need to be considered.) A technical complication arises from the fact that, while $u$ is (at least locally) unambiguously defined for arbitrary smooth spacetimes, the same does not apply to $v$, which is unambiguously defined only for real analytic spacetimes. In the latter case, $v$ can be expanded as
$$v(x, y) = \sum_{n=0}^\infty v_n(x, y)\,\sigma^n, \tag{69}$$
where the $v_n$ are certain real and symmetric [17] smooth functions constructed from the metric and $\xi, m^2$. In principle, one would like to define $v$ by the above formula also for spacetimes which are only smooth. However, it is well known that the above series does not in general converge in this case. This difficulty can be overcome by replacing the coefficients $v_n(x, y)$ in the above expansion by $v_n(x, y)\,\psi(\sigma/\alpha_n)$, where $\psi: \mathbb{R} \to \mathbb{R}$ is some smooth function with $\psi(x) \equiv 1$ for $|x| < \tfrac12$ and $\psi(x) \equiv 0$ for $|x| > 1$. If the $\alpha_n$ tend to zero sufficiently fast, then the series with the above modified coefficients converges to a smooth function $V$. The coincidence limit of $V$ and of all its derivatives does not depend on the choice of the $\alpha_n$ and ψ, and it is only through these that $V$ enters our definition of local Wick products. These choices therefore do not affect our definition.

⁶ The reason for considering convex normal neighborhoods is that even σ is only defined for points that can be joined by a unique geodesic.

We choose a quasi-free state ω on $A(M, g)$ and represent $W(M, g)$ as operators in the GNS representation of ω. Next, we define operator-valued distributions $:\varphi(x_1)\cdots\varphi(x_n):_H$ by a formula identical to Eq. (6), except that ω is replaced by $H$ in that formula. Now, by the very definition of Hadamard states, $H$ is equal, modulo a smooth function, to the symmetrized two-point function of ω. Consequently, $:\varphi(x_1)\cdots\varphi(x_n):_H$ can be smeared with distributions $t \in E_n(M, g)$ (supported sufficiently close to the total diagonal in $M^n$), and the expressions so obtained belong to $W(M, g)$. By analogy with our definition of a normal ordered field operator, Eq. (17), we are thus allowed to define
$$:\varphi^k(f):_H \stackrel{\rm def}{=} \int_{M^k} :\varphi(x_1)\cdots\varphi(x_k):_H\, f(x_1)\,\delta_g(x_1, \dots, x_k) \prod_i \mu_g(x_i). \tag{70}$$
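The role of the cutoff in Subsect. 5.2 can be illustrated in a toy model. The Python sketch below is not the construction of [20]: it builds one standard smooth function ψ with the stated properties (via the usual $e^{-1/t}$ transition function) and evaluates the modified series $\sum_n v_n\,\sigma^n\,\psi(\sigma/\alpha_n)$ for a single real variable σ. The coefficients $v_n = n!$ (chosen so that the unmodified series diverges for every $\sigma \neq 0$) and the radii $\alpha_n = 1/(n+1)!$ are hypothetical. For fixed $\sigma \neq 0$ only the finitely many terms with $\alpha_n > |\sigma|$ survive, while near $\sigma = 0$ each term coincides with $v_n\sigma^n$, so the value and Taylor coefficients at $\sigma = 0$ are the $v_n$, independently of the cutoff.

```python
import math


def _transition(t):
    """Smooth on R: 0 for t <= 0 and exp(-1/t) for t > 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0


def smooth_step(t):
    """Smooth function equal to 0 for t <= 0 and to 1 for t >= 1."""
    a = _transition(t)
    return a / (a + _transition(1.0 - t))


def psi(x):
    """Smooth cutoff: psi = 1 for |x| <= 1/2 and psi = 0 for |x| >= 1."""
    return 1.0 - smooth_step(2.0 * abs(x) - 1.0)


# Hypothetical data for illustration only: the divergent series sum n! sigma^n
# and cutoff radii alpha_n = 1/(n+1)! that decrease monotonically to zero.
def v_coeff(n):
    return math.factorial(n)


def alpha(n):
    return 1.0 / math.factorial(n + 1)


def modified_series(sigma):
    """Evaluate sum_n v_n sigma^n psi(sigma/alpha_n).

    Because alpha_n decreases, psi(sigma/alpha_n) = 0 as soon as
    alpha_n <= |sigma|, so for sigma != 0 the sum has finitely many terms.
    Intended for moderate |sigma|; extremely small values would overflow
    the float conversion of the factorials.
    """
    if sigma == 0.0:
        return float(v_coeff(0))  # only the n = 0 term survives at sigma = 0
    total = 0.0
    n = 0
    while alpha(n) > abs(sigma):
        total += v_coeff(n) * sigma ** n * psi(sigma / alpha(n))
        n += 1
    return total


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.4, -0.25):
        print(s, modified_series(s))
```

For example, at $\sigma = 0.25$ only the terms $n = 0, 1$ contribute and both cutoff factors equal one, so the sum is $1 + 0.25 = 1.25$.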
Although it will not be needed until the next subsection, we find it convenient to define, by analogy with Eq. (19), also multi-local Wick products of the form $:\varphi^{k_1}(f_1)\cdots\varphi^{k_n}(f_n):_H$. Local Wick products involving derivatives of the field can also be defined in a similar manner, although, as mentioned in the remark in Sect. 4.1, the definition fails to satisfy an additional condition that one may want to impose. We claim that the fields $:\varphi^k:_H$ are local Wick monomials in the sense of the criteria given in Sects. 3 and 4. We will not give a detailed proof of this claim here but merely indicate the main arguments. That $:\varphi^k:_H$ is a local, covariant field follows immediately from the fact that the Hadamard parametrix is locally and covariantly defined in terms of the metric. The expansion property can be seen in exactly the same way as the corresponding property for normal ordered Wick monomials. It seems clear that the construction yields continuous (analytic) dependence of our Wick monomials under corresponding variations of the metric and the parameters, although we have not attempted to give a complete proof of this result. Finally, in order to verify the scaling axiom, we first restrict our attention to real analytic spacetimes $(M, g)$, so that the function $v$, Eq. (69), is well defined. In that case one finds from the definitions of $u$ and $v$ that
$$\lambda^{-2} H[\lambda^{-2}g, \xi, \lambda^2 m^2] = H[g, \xi, m^2] + v[g, \xi, m^2]\,\ln\lambda^2. \tag{71}$$
The appearance of the $v\ln\lambda^2$ term is due to the fact that the definition of $H$ implicitly depends on a choice of length scale in the argument of the logarithm.⁷ Using Eq. (71) and the definition of the scaling map $S_\lambda$, Eq. (48), we find that $:\varphi^k:_H$ has dimension $k$ and that it scales almost homogeneously in the sense of Def. 4.2. The same also holds for smooth spacetimes, by the continuity of the local Wick monomials. Thus we have demonstrated the existence of local Wick products satisfying all of our requirements.

⁷ This becomes more apparent by writing the logarithmic term in $H$ as $v\ln(\sigma\mu^2)$, where μ has the dimension of a mass.
Although $:\varphi^k:_H$ scales almost homogeneously, it should be noted that the presence of the $\ln\lambda^2$ term in Eq. (71) implies that it fails to scale exactly homogeneously. The local, covariant fields $C_i$ in Eq. (51) are given by lower order local Wick monomials times curvature terms of the appropriate dimension. Now, by Eq. (52), any other prescription for the local Wick products, $\tilde\varphi^k$, will be related to $:\varphi^k:_H$ by
$$\tilde\varphi^k(x) = \,:\varphi^k(x):_H + \sum_{i=0}^{k-2}\binom{k}{i}\,C_{k-i}(x)\,:\varphi^i(x):_H, \tag{72}$$
where each $C_{k-i}$ scales exactly homogeneously. It follows that $\tilde\varphi^k$ also fails to scale exactly homogeneously. Consequently, by an argument given on pp. 98–99 of [20], there is an inherent ambiguity in the definition of $\tilde\varphi^k$ that cannot be removed within the context of quantum field theory in curved spacetime. Thus, in quantum field theory in curved spacetime, renormalization ambiguities arise not only from the definition of the time ordered products of Wick polynomials, but also from the local Wick polynomials themselves.
5.3. Uniqueness of local time ordered products. The analysis of the ambiguity in the definition of local time ordered products of local Wick monomials differs less in substance than in combinatorial complexity from the corresponding analysis for the local Wick products. Since the combinatorial side is rather well known, we only sketch the proof of the result, Thm. 5.2. Both the presentation and the proof of our result are simplified by comparing an arbitrary prescription for the time ordered products to a prescription based on the local Wick products $:\varphi^k(x):_H$ defined in the previous subsection. Again, for notational simplicity, we explicitly consider only time ordered products of undifferentiated local Wick products, but our arguments and results would apply to time ordered products of differentiated Wick products as well (modulo the remark of Sect. 4.1). We find it convenient to use multi-index notation, i.e., $k \in \mathbb{N}^n$ means a multi-index $k = (k_1, \dots, k_n)$, together with standard abbreviations for multi-indices such as $\binom{k}{i} = \prod_j \frac{k_j!}{i_j!\,(k_j - i_j)!}$. $P = I_1 \sqcup \cdots \sqcup I_s$ denotes a collection of pairwise disjoint subsets of $\{1, \dots, n\}$.

Theorem 5.2. Consider a prescription $T$ for defining local time ordered products based on the local Wick products $:\varphi^k:_H$, and another prescription, $\tilde T$, based on another prescription $\tilde\varphi^k$ for defining local Wick products. Assume that both prescriptions for defining local time ordered products satisfy all the requirements of Sect. 4. Then
$$\tilde T\Big(\prod_{i=1}^n \tilde\varphi^{k_i}(x_i)\Big) = T\Big(\prod_{i=1}^n :\varphi^{k_i}(x_i):_H\Big) + \sum_{\substack{P = I_1 \sqcup \cdots \sqcup I_s \\ \text{not all } I_j = \emptyset}} T\Big(\prod_{I = \{i_1, \dots, i_{|I|}\} \in P} :O_{k_I}(x_I):_H \prod_{i \notin I\ \forall I \in P} :\varphi^{k_i}(x_i):_H\Big), \tag{73}$$
where $x_I = (x_{i_1}, \dots, x_{i_{|I|}})$ and $k_I = (k_{i_1}, \dots, k_{i_{|I|}})$. For $n \ge 2$, the $:O_k(x_1, \dots, x_n):_H$ ($k \in \mathbb{N}^n$) are local, covariant quantum fields of the form
$$:O_k(x_1, \dots, x_n):_H \;\equiv\; \sum_{i \le k}\binom{k}{i}\,C_{k-i}(x_1)\,\delta(x_1, \dots, x_n)\,:\varphi^{i_1}(x_1)\cdots\varphi^{i_n}(x_n):_H, \tag{74}$$
where the $C_k$ are real c-number polynomials in $g_{ab}, R_{abcd}, \dots, \nabla_{(e_1}\cdots\nabla_{e_{d-2})}R_{abcd}$, $m^2$ and the covariant derivative operators $\nabla^{x_i}_a$, with scaling (= engineering) dimension $d = \sum_i k_i - 4(n - 1)$, whose coefficients depend analytically on ξ. For $n = 1$, the quantum fields $:O_k(x):_H$ ($k \in \mathbb{N}$) are given by the same kind of expression as above, but with no delta-functions and no covariant derivatives.

Remarks. (1) The multi-local covariant quantum fields $:O_k(x_1, \dots, x_n):_H$ can alternatively be written as a sum of, possibly differentiated, mono-local Wick powers (i.e., depending only on one argument, say, the point $x_1$), multiplied by suitable differentiated delta-functions. In formulas, with $a_i$ denoting a four-dimensional spacetime multi-index and $(a) = (a_1, \dots, a_n)$,
$$:O_k(x_1, \dots, x_n):_H \;=\; \sum_{(a)} :C_k^{(a)}(x_1):_H\,\nabla^{x_1}_{a_1}\cdots\nabla^{x_n}_{a_n}\,\delta(x_1, \dots, x_n), \tag{75}$$
where the $:C_k^{(a)}:_H$ are local Wick polynomials, possibly with derivatives (all spacetime indices are assumed to be raised), whose coefficients are polynomials in the metric, the curvature, its covariant derivatives and the mass. These polynomials scale almost homogeneously with dimension $\sum_i k_i - 4(n - 1)$. The time ordered products appearing in the sum on the right side of Eq. (73) are to be understood as the expressions obtained by inserting the above expression for the fields $:O_k:_H$ and pulling the delta-function type terms out of the time ordered product. The disadvantage of writing Eq. (73) explicitly in terms of these monolocal Wick powers is that the relation between the ambiguities for different $k$ and fixed order $n$ (due to the expansion property of the time ordered products) then becomes a rather complicated-looking constraint on the possible delta-function type terms. A formulation of Thm. 5.2 not involving the specific prescription $:\varphi^k(x):_H$, but instead some other arbitrary prescription, would consist in writing all the generalized multilocal Wick products in expression (73) in terms of ordinary, monolocal ones, and then replacing these by that arbitrary prescription for those fields. (2) The collection of local, covariant fields $:O_k(x_1, \dots, x_i):_H$ with $i \le n$ represents the finite renormalization ambiguity in defining time ordered products with $n$ factors. The crucial point of the theorem is that the form of these ambiguities is severely restricted. Our uniqueness result for the Wick monomials, Thm. 5.1, is a special case of the above theorem, corresponding to $n = 1$.

Sketch of the proof of Thm. 5.2. One proceeds by a double induction in the order $n$ in perturbation theory and in the scaling dimension $d = \sum_i k_i$ of the time ordered products. Assuming the validity of the theorem up to order $n - 1$, one finds, using the causal factorization of the time ordered products, that Eq. (73) also holds at order $n$, up to an unknown local, covariant field $\Phi_k(x_1, \dots, x_n)$ which is nonzero only for points such that $x_1 = \cdots = x_n$. Assuming now that this field has the form of Eq. (74) for all multi-indices $k$ with $\sum_i k_i \le d - 1$, one finds that it also has this form for dimension $d$, up to a c-number field of the form $c_k(x_1, \dots, x_n) = C_k(x_1)\,\delta(x_1, \dots, x_n)$, where $C_k$ is a polynomial in the covariant derivative operators with bounded coefficients. By locality, $C_k$ is locally
constructed out of the metric and the coupling parameters. The task is then to show that it can be written as a polynomial in $g_{ab}, R_{abcd}, \dots, m^2, \nabla^{x_i}_a$, whose coefficients are analytic functions of ξ, and which scales as $C_k \to \lambda^d C_k$ under a corresponding rescaling of the parameters. In order to find out more about the functional dependence of $C_k$ on the metric, we now use the continuous and analytic dependence of the time ordered products under corresponding variations of the metric and the parameters, and their scaling behavior. This is done in essentially the same way as in our uniqueness proof for the local Wick products, so we only sketch the main arguments here, focusing on the differences compared to the case of the Wick monomials. For simplicity, let us first assume that $C_k$ contains no derivatives. Consider an analytic family, $g^{(s)}$, of analytic metrics in a neighborhood $O$ in $M$, and an analytic family, $p^{(s)}$, of coupling parameters. We would like to show that the distribution $C_k^{(s)}(x)$ is analytic in $s$ and $x$. (Here and in the following, the superscript $s$ indicates that we mean the quantity associated with the metric $g^{(s)}$ and the coupling parameters $p^{(s)}$.) In order to show this, we look at the analytic wave front set of $c_k^{(s)}(x_1, \dots, x_n)$, viewed as a distribution jointly in $s$ and $x_1, \dots, x_n$. Now, this distribution arises as a sum of products of distributions of the form $c_j^{(s)}(x_1, \dots, x_m)$, with $m \le n - 1$ and $j = (j_1, \dots, j_m)$, and of time ordered products $T^{(s)}(\dots)$. The analytic wave front sets of the $c_j^{(s)}$ (viewed as distributions in $s$ and the x-variables) are known by the inductive assumption; they have the same form as the wave front set of a delta distribution.
The analytic wave front set of the time ordered products (or rather of their expectation value in some analytic family of states, viewed as a distribution in $s$ and the $x$-variables) is known by the analyticity requirement combined with the analytic microlocal spectrum condition. One can use this information to infer that $c_k^{(s)}(x_1,\dots,x_n)$ (viewed as a distribution in $s$ and the $x$-variables) has analytic wave front set
$$\mathrm{WF}_A(c_k) \subset \{(x_1,p_1,\dots,x_n,p_n,s,\rho) \in T^*(O^n \times I)\setminus\{0\} \mid (x_1,p_1,\dots,x_n,p_n) \in \Gamma^T_A(O, g^{(s)})\}, \qquad (76)$$
where the conic set $\Gamma^T_A(O, g^{(s)})$ is specified in the analytic microlocal spectrum condition. But we already know that $c_k$ has support on the set of points such that $x_1 = \dots = x_n$. Using this, we therefore find
$$\mathrm{WF}_A(c_k) \subset \{(x_1,p_1,\dots,x_n,p_n,s,\rho) \in T^*(O^n \times (-H,H))\setminus\{0\} \mid x_1 = \dots = x_n,\ \textstyle\sum_i p_i = 0,\ \text{not all } p_i = 0\}. \qquad (77)$$
Now, we can trivially write
$$C_k^{(s)}(x) = \int_{M^{n-1}} c_k^{(s)}(x, y_1,\dots,y_{n-1})\, f(y_1,\dots,y_{n-1}) \prod_{i=1}^{n-1} \mu^{(s)}(y_i), \qquad (78)$$
where $f \in D(O)$ is equal to one near $x$. By [11, Thm. 8.5.4'] we can conclude from this that $C_k^{(s)}(x)$, viewed as a distribution jointly in $s$ and $x$, has analytic wave front set
$$\mathrm{WF}_A(C_k) = \{(x,p,s,\rho) \mid (x,p,y_1,0,\dots,y_{n-1},0,s,\rho) \in \mathrm{WF}_A(c_k)\} = \emptyset$$
S. Hollands, R. M. Wald
near $x$. Since $x$ was arbitrary, this then shows that $C_k^{(s)}(x)$ is jointly analytic in $x$ and $s$. We can now proceed as in the uniqueness proof for the local Wick products, by considering the particular family of metrics $g^{(s)}$ (defined in (58)) and parameters $p^{(s)} = (\xi, s^2 m^2)$, and following through the same steps as there. This then shows us that $C_k$ is indeed a polynomial in the metric, the curvature and the mass with engineering dimension $d$, whose coefficients depend analytically on $\xi$. The case when $C_k(x)$ also contains derivatives, $\nabla_a^{x_i}$, can be treated essentially in the same way as above. The only difference in the argument is that one has to consider more general functions $f$ in Eq. (78). ∎
An important direct consequence of Thm. 5.2 is the renormalizability of $\varphi^4$-theory in curved spacetime, i.e., the perturbative quantum field theory corresponding to the classical theory given by the Lagrangian $L_0 + L_1$, where $L_0$ is the free-field Lagrangian in Eq. (1), and where $L_1 = f\varphi^4$. Observables in this interacting quantum field theory can be obtained from the S-matrix, given by
$$S(L_1) = 1 + \sum_{n\ge 1} \frac{i^n}{n!} \int_{M^n} T\big(L_1(x_1)\cdots L_1(x_n)\big)\,\mu_g(x_1)\cdots\mu_g(x_n), \qquad (79)$$
viewed here as a formal power series in the coupling constant $f$. We note that the above integrals would not in general make sense if $f$ were taken to be a constant, so we instead take it to be an element in $D(M)$ which is constant in some region, $O$, of spacetime, where we wish to define local observables. Choosing $f$ in this way makes the series for $S(L_1)$, truncated at some $N$, an element in $W(M, g)$. Now $S(L_1)$ clearly depends on what prescription for the local time ordered products one chooses in (79). So consider two different prescriptions, $T$ and $\tilde T$, for the time ordered products and denote the corresponding S-matrices by $S(L_1)$ and $\tilde S(L_1)$. Now if it were true that $\tilde S(L_1) = S(L_1 + \delta L_1)$ for some local, covariant field $\delta L_1$ which had the same form as the original Lagrangian, then the theories based on different prescriptions for the time ordered products would actually be equivalent, the effect of $\delta L_1$ being merely a redefinition of the coupling constants of the theory and of the field strength. Theories with this property are called “renormalizable”. It is well known that $\varphi^4$-theory in Minkowski space belongs to this class of theories. We now show that Thm. 5.2 implies that this is also the case in curved spacetime. Without loss of generality, we assume that one of the prescriptions for the time ordered products, say the “non-tilde” one, is based on a local normal ordering prescription defined in the previous section. Since $:L_1:_H = f\,:\varphi^4:_H$, we must investigate the possible form of the fields $:O_k(x_1,\dots,x_n):_H$ in the case that all $k_i = 4$, because these govern the ambiguities in defining the time ordered products appearing in Eq. (79). Since these fields scale (almost) homogeneously with dimension $\sum_i k_i - 4(n-1)$, for $k_i = 4$ this dimension is $4n - 4(n-1) = 4$ at every order $n$. Let us define a field $:\delta L_1:_H$ by
$$\int_M :\delta L_1(x):_H\,\mu_g(x) \stackrel{\mathrm{def}}{=} \sum_{n\ge 1} \int_{M^n} :O_k(x_1,\dots,x_n):_H \prod_{i=1}^{n} f(x_i)\,\mu_g(x_i), \qquad (80)$$
where all ki = 4, viewed as a formal power series in f . (When this series is truncated at some order N, the above equation defines a field in W(M, g).) It then follows from the properties of the fields : Ok :H stated in Thm. 5.2 (applied to the case ki = 4), that
$:\delta L_1:_H$ is given by
$$:\delta L_1:_H = \sum_{n\ge 1} f^n \Big[ Z_{0,n} :g^{ab}\nabla_a\varphi\nabla_b\varphi:_H + (Z_{1,n}R + Z_{2,n}m^2) :\varphi^2:_H + Z_{3,n} :\varphi^4:_H + \big(Z_{4,n}R^2 + Z_{5,n}R_{ab}R^{ab} + Z_{6,n}R_{abcd}R^{abcd} + Z_{7,n}\Box R + Z_{8,n}m^2 R + Z_{9,n}m^4\big)\mathbf{1} \Big] + \dots, \qquad (81)$$
where the dots denote terms containing derivatives of $f$, and where the $Z_{i,n}$ are real constants. One finds from Eq. (73) that
$$\tilde S(L_1) = S(:L_1:_H + :\delta L_1:_H) \qquad (82)$$
in the sense of formal power series of operators. Now $:\delta L_1:_H$ has the same form as the original Lagrangian, $:L_0:_H + :L_1:_H$, apart from the terms proportional to the identity operator in the square brackets, and apart from the terms involving the derivatives of $f$. The terms proportional to the identity contribute only an overall phase to the S-matrix and therefore do not affect the definition of the interacting quantum fields derived from the S-matrix. The terms containing derivatives of $f$ vanish in the formal limit $f \to$ const., but for non-constant $f$ they do affect the definition of the observables in the interacting theory. Nevertheless, it can be shown, using the arguments given in Sect. 8 of [2], that the interacting theory obtained from the interaction Lagrangian $:L_1:_H + :\delta L_1:_H$ locally (i.e., in the region $O$ where $f$ is constant) does not depend on the terms in $:\delta L_1:_H$ involving derivatives of $f$. This then proves renormalizability of $\varphi^4$-theory in curved spacetime, provided of course that time ordered products satisfying our assumptions do indeed exist.

6. Conclusions and Outlook

We have constructed, for every globally hyperbolic spacetime $(M, g)$, an algebra $W(M, g)$ containing normal ordered Wick products and time ordered products thereof. We then gave a notion of what it means for a field in that algebra to be “locally constructed out of the metric” in a covariant manner. Furthermore, we gave notions of analytic, respectively continuous, dependence of a local, covariant field under corresponding variations of the metric, and we gave a notion of “essentially homogeneous” scaling of a local, covariant field under suitable rescalings of the metric and the parameters of the theory. We then axiomatically characterized local Wick polynomials and local time ordered products by demanding that they satisfy the above requirements together with certain other, natural properties expected from a reasonable definition of these quantities.
The imposition of these requirements was shown to reduce the ambiguities in defining these quantities to a finite number of real parameters. The nature of these ambiguities was shown to imply the renormalizability of a self-interacting quantum field theory in curved space. By an explicit construction, the existence of local Wick products with the desired properties was demonstrated. However, the issue of the existence of local time ordered products is beyond the scope of this paper and will be treated elsewhere. We mention that our notion of the scaling of a local, covariant field makes possible a renormalization group analysis of the quantum observables in the interacting theory (posed as an open problem in [2]), i.e. an analysis of the behavior of an observable in the interacting theory under a change of scale. Namely, the “action of a renormalization group transformation” on an observable in the interacting theory is implemented in our
framework by the scaling map, $S_\lambda$, defined in Eq. (48). The task is then to analyse the action of this map on observables in the interacting theory. Now, the observables in the interacting theory are defined in terms of perturbative expressions involving local time ordered products, and hence one only has to analyse the action of $S_\lambda$ on the local time ordered products. Consider an expression of the form $T_\lambda(\dots) = \lambda^{-d} S_\lambda T(\dots)$, where $T(\dots)$ is a local time ordered product with scaling dimension $d$. The rescaled time ordered product $T_\lambda(\dots)$ is in general not equal to the unscaled time ordered product. However, by our uniqueness theorem, Thm. 5.2, the scaled time ordered products differ from the unscaled ones by well-specified renormalization ambiguities, given by certain real parameters (depending on $\lambda$). As explained in the previous section, these parameters correspond to a finite renormalization of the coupling parameters in the theory. The action of $S_\lambda$ (i.e., a renormalization group transformation) therefore translates directly into a flow of the coupling parameters (and a multiplicative rescaling of the field strength). A detailed calculation of this flow can of course only be done based on a concrete prescription for the local time ordered products.

Acknowledgements. We wish to thank Klaus Fredenhagen and Bernard Kay for helpful discussions. This research was supported in part by NSF grants PHY95-14726 and PHY00-90138 to the University of Chicago.
7. Appendix

It is well known that the regularity properties of a distribution $u \in D'(\mathbb{R}^n)$ are in correspondence with the decay properties of its Fourier transform. This can be made more precise by introducing the concept of the “wave front set” of a distribution [11], which we shall define now. Let $u$ be a distribution of compact support. We define $\Sigma(u)$ to be the set of all $k \in \mathbb{R}^n\setminus\{0\}$ which have no conical⁸ neighborhood $V$ such that $|\hat u(p)| \le C_N (1+|p|)^{-N}$ for all $p \in V$ and all $N = 1, 2, \dots$. $\Sigma(u)$ may be thought of as describing the “singular directions” of $u$. The wave front set provides a more detailed description of the singularities of a distribution by localizing these singular directions. If $u \in D'(X)$, with $X$ an open subset of $\mathbb{R}^n$, then we define $\Sigma_x(u) = \bigcap_f \Sigma(fu)$, where the intersection is taken over all $f \in D(X)$ such that $f(x) \ne 0$. The wave front set of $u$ is now defined as
$$\mathrm{WF}(u) \stackrel{\mathrm{def}}{=} \{(x,k) \in X \times (\mathbb{R}^n\setminus\{0\}) \mid k \in \Sigma_x(u)\}.$$
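As a concrete illustration of $\Sigma(u)$ (an example added here, not taken from the text), consider in $\mathbb{R}^2$ the distribution $u(x,y) = \delta(x)\,e^{-y^2/2}$, supported on the line $x = 0$. With the convention $\hat u(p) = \int e^{-ip\cdot x} u(x)\,dx$ one finds $\hat u(p_1,p_2) = \sqrt{2\pi}\,e^{-p_2^2/2}$, which does not decay at all along the $p_1$-axis and decays rapidly in every other direction, so $\Sigma(u) = \{(\xi, 0) : \xi \ne 0\}$: the singular directions are conormal to the line. The Gaussian in $y$ is an assumption of this sketch, used in place of a compactly supported cut-off only because its Fourier transform has a closed form. The Python snippet probes single rays numerically; it is a heuristic illustration of the definition, not a procedure that could prove membership in $\Sigma(u)$.

```python
import math


def u_hat(p1: float, p2: float) -> float:
    """|u_hat(p1, p2)| for u(x, y) = delta(x) * exp(-y^2 / 2) on R^2.

    The delta(x) factor transforms to the constant 1 (no decay in p1) and the
    Gaussian to sqrt(2*pi) * exp(-p2^2 / 2), so u_hat does not depend on p1.
    """
    return math.sqrt(2.0 * math.pi) * math.exp(-0.5 * p2 * p2)


def looks_singular(theta: float, N: int = 5, t: float = 1.0e3) -> bool:
    """Heuristic probe of whether k = (cos theta, sin theta) lies in Sigma(u).

    A non-singular direction admits bounds |u_hat(p)| <= C_N (1 + |p|)^(-N) in a
    cone around k, so (1 + t)^N |u_hat(t k)| should be small for large t. Only one
    ray and one value of t are examined, so directions extremely close to the
    p1-axis are misclassified; this illustrates the definition, it proves nothing.
    """
    k1, k2 = math.cos(theta), math.sin(theta)
    return (1.0 + t) ** N * u_hat(t * k1, t * k2) > 1.0


for name, theta in [("(1, 0)", 0.0), ("(0, 1)", math.pi / 2),
                    ("(1, 1)/sqrt(2)", math.pi / 4), ("(-1, 0)", math.pi)]:
    print(name, "singular" if looks_singular(theta) else "rapid decay")
```

Run as is, it reports the two directions $(\pm 1, 0)$ as singular and the other two as rapidly decaying, in line with $\Sigma(u)$ computed above.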
If $(x,k) \in \mathrm{WF}(u)$, then $x$ is a singular point of $u$, i.e., there is no neighborhood of $x$ in which $u$ can be written as a smooth function. Conversely, if $x$ is a point such that $(x,k) \notin \mathrm{WF}(u)$ for all $k$, then $x$ is a regular point. Differentiation does not increase the wave front set, $\mathrm{WF}(\partial u) \subset \mathrm{WF}(u)$. The wave front set of a distribution is an entirely local concept, and it can be shown to transform covariantly under a change of coordinates, in the sense that $\mathrm{WF}(\chi^* u) = (d\chi)^t \circ \mathrm{WF}(u)$ for any diffeomorphism $\chi$. This makes it possible to define in an invariant way the wave front set of distributions $u$ on a manifold $X$. The above transformation property then shows that $\mathrm{WF}(u)$ is intrinsically a (conic) subset of $T^*X\setminus\{0\}$, where $T^*X$ denotes the cotangent bundle of $X$, and where $\{0\}$ means the zero section in $T^*X$. (In this paper, $X$ is typically a product manifold $M \times \dots \times M$.)

⁸ A cone in $\mathbb{R}^n$ is a subset $V$ with the property that if $k \in V$, then also $\lambda k \in V$ for all $\lambda > 0$.
In this paper we often use the notion of the wave front set to ensure that the pointwise product of certain distributions exists, or, more generally, to ensure that certain linear maps with distributional kernel have a well-defined action on certain distributions (cf. Thms. 8.2.10 and 8.2.13 of ref. [11]). The above operations with distributions are not continuous (even if they are well defined) in the usual distribution topology. However, they are continuous in the so-called “Hörmander pseudo-topology”, which is defined as follows: Let $\Gamma$ be a closed conic set⁹ in $\mathbb{R}^n \times \mathbb{R}^n$, and let $D'_\Gamma(\mathbb{R}^n)$ be the set of all distributions $u$ on $\mathbb{R}^n$ with $\mathrm{WF}(u) \subset \Gamma$. We say that a sequence $\{u_\alpha\} \subset D'_\Gamma(\mathbb{R}^n)$ converges to $u$ in the Hörmander pseudo-topology if $u_\alpha \to u$ in the usual sense of distributions and if, for any open neighborhood $O \subset \mathbb{R}^n$, any cone $V \subset \mathbb{R}^n$ such that $\Gamma_x \subset V$ for all $x \in O$, and any $f \in D(O)$, there holds
$$\sup_{k \notin V} \big|\widehat{(f u_\alpha - f u)}(k)\big|\,(1+|k|)^N \to 0 \qquad \forall N \in \mathbb{N}.$$
This notion can be generalized in an invariant manner to smooth manifolds $X$, where $\Gamma$ is now a closed conic subset of $T^*X$.

⁹ By this we mean a set of the form $\Gamma = \{(x,k) \in U \times \mathbb{R}^n \mid k \in \Gamma_x\}$, where $U$ is a closed set and where $\Gamma_x$ is a closed cone in $\mathbb{R}^n$ for all $x \in U$.

Note added in proof. This has now been proven for continuous states by S. Hollands and W. Ruan, [gr-qc/0108032].

References
1. Brunetti, R., Fredenhagen, K. and Köhler, M.: The microlocal spectrum condition and Wick polynomials on curved spacetimes. Commun. Math. Phys. 180, 633–652 (1996)
2. Brunetti, R. and Fredenhagen, K.: Microlocal Analysis and Interacting Quantum Field Theories: Renormalization on physical backgrounds. Commun. Math. Phys. 208, 623–661 (2000)
3. Bunch, T.S.: BPHZ renormalization of λφ⁴ field theory in curved space-times. Ann. Phys. 131, 118 (1981)
4. Bunch, T.S., Panangaden, P. and Parker, L.: On renormalization of λφ⁴ in curved space-time I. J. Phys. A: Math. Gen. 13, 901–918 (1980); On renormalization of λφ⁴ in curved space-time II. J. Phys. A: Math. Gen. 13, 919–932 (1980)
5. Dimock, J.: Algebras of Local Observables on a Manifold. Commun. Math. Phys. 77, 219–228 (1980)
6. Dütsch, M. and Fredenhagen, K.: Algebraic quantum field theory, perturbation theory, and the loop expansion. [hep-th/0001129]; Perturbative algebraic field theory, and deformation quantization. [hep-th/0101079]
7. Duistermaat, J.J. and Hörmander, L.: Fourier integral operators II. Acta Math. 128, 183–269 (1972)
8. Epstein, H. and Glaser, V.: The role of locality in perturbation theory. Ann. Inst. H. Poincaré Sec. A XIX, 211–295 (1973)
9. Fredenhagen, K.: Private communication at Oberwolfach meeting, September 2000
10. Garabedian, P.R.: Partial Differential Equations. New York: Wiley, 1964
11. Hörmander, L.: The Analysis of Linear Partial Differential Operators I. Berlin: Springer-Verlag, 1985
12. Hollands, S.: Aspects of Quantum Field Theory in Curved Spacetimes. PhD thesis, University of York, September 2000
13. Iyer, V. and Wald, R.M.: A Comparison of Noether charge and Euclidean methods for computing the entropy of stationary black holes. Phys. Rev. D 52, 4430 (1995) [gr-qc/9503052]
14. Kay, B.S. and Wald, R.M.: Theorems on the uniqueness and thermal properties of stationary, nonsingular, quasifree states on spacetimes with a bifurcate Killing horizon. Phys. Rep. 207, 49 (1991)
15. Kay, B.S.: Casimir Effect in Quantum Field Theory. Phys. Rev. D 20, 3052–3062 (1979)
16. Kay, B.S.: Application of linear hyperbolic PDE to linear quantum fields in curved spacetimes: Especially black holes time machines and a new semilocal vacuum concept. In: Proceedings Journées Équations aux derivées partielles, GDR 1151 (CNRS), Nantes 2000, available at: http://www.math.sciences.univnantes.fr/edpa/2000/html and [gr-qc/0103056]
17. Moretti, V.: Proof of the symmetry of the off-diagonal Hadamard/Seeley-deWitt’s coefficients in C∞ Lorentzian manifolds by a local Wick rotation. Commun. Math. Phys. 212, 165–189 (2000) [gr-qc/9908068]
18. Radzikowski, M.J.: Micro-Local Approach to the Hadamard condition in QFT on Curved Space-Time. Commun. Math. Phys. 179, 529–553 (1996)
19. Tichy, W. and Flanagan, E.: How unique is the expected stress energy tensor of a massive scalar field? Phys. Rev. D 58, 124007 (1998) [gr-qc/9807015]
20. Wald, R.M.: Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics. Chicago: The University of Chicago Press, 1994
21. Wald, R.M.: General Relativity. Chicago: The University of Chicago Press, 1984
22. Wald, R.M.: The Back Reaction Effect in Particle Creation in Curved Spacetime. Commun. Math. Phys. 54, 1–19 (1977)

Communicated by H. Nicolai
Commun. Math. Phys. 223, 327 – 362 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics: Return to Equilibrium
Marco Merkli
Department of Mathematics, University of Toronto, Toronto, Ontario, M5S 3G3, Canada
Received: 27 December 2000 / Accepted: 21 June 2001
Abstract: The method of positive commutators, developed for zero temperature problems over the last twenty years, has been an essential tool in the spectral analysis of Hamiltonians in quantum mechanics. We extend this method to positive temperatures, i.e. to non-equilibrium quantum statistical mechanics. We use the positive commutator technique to give an alternative proof of a fundamental property of a certain class of large quantum systems, called Return to Equilibrium. This property says that equilibrium states are (asymptotically) stable: if a system is slightly perturbed from its equilibrium state, then it converges back to that equilibrium state as time goes to infinity.
1. Introduction

In this paper, we study a class of open quantum systems consisting of two interacting subsystems: a finite system, called the particle system, coupled to a reservoir (heat bath) described by the spatially infinitely extended photon field (a massless Bose field). The dynamics of the coupled system on the von Neumann algebra of observables is generated by a Liouville operator, also called Liouvillian or thermal Hamiltonian, acting on a positive temperature Hilbert space. Many key properties of the system, such as return to equilibrium (RTE), i.e. asymptotic stability of the equilibrium state, can be expressed in terms of the spectral characteristics of this operator. Applying the positive commutator (PC) method to the Liouville operator of the systems in question, we obtain rather detailed information on the spectrum of these operators. This allows us to recover, with a partial improvement, a recent fundamental result by several authors on RTE.

This work is part of the author’s PhD requirement.
Present address: Departement Mathematik, ETH Zürich, 8092 Zürich, Switzerland.
M. Merkli
Our main technical result is a positive commutator estimate (also called a Mourre estimate) for the Liouville operator. This result holds for a wider class of systems than previously considered. Spectral information on the Liouville operator, and hence the property of RTE, is extracted from the PC estimate through Virial Theorem type arguments. It turns out that the existing Virial Theorem techniques are too restrictive to apply to positive temperature systems, and we need to extend them beyond their traditional range of application. There is a restriction on the class of systems for which we prove RTE, due to our Virial Theorem type result mentioned above. This is the first result of this kind, and we expect that it will be improved to yield the RTE result for a considerably wider class of systems.
1.1. A class of open quantum systems. The choice of the class of systems we analyze is motivated by the quantum mechanical models of nonrelativistic matter coupled to the radiation field, or matter interacting with a phonon field (quantized modes of a lattice), or a generalized spin-boson system. For notational convenience, we consider only scalar Bosons. A good review of physical models leading to the class of Hamiltonians considered here is found in [HSp].

1.1.1. The non-interacting system. The algebra of observables of the uncoupled system is the C*-algebra $\mathfrak{A} = B(\mathcal{H}_p) \otimes W(\mathcal{H}_0)$, where $B(\mathcal{H}_p)$ denotes the bounded operators on the particle Hilbert space $\mathcal{H}_p$ and $W(\mathcal{H}_0)$ is the Weyl CCR algebra over the one-particle space
$$\mathcal{H}_0 = \Big\{ f \in L^2(\mathbb{R}^3, d^3k) : \int |k|^{-1} |f(k)|^2\, d^3k < \infty \Big\}.$$
The restriction to $f \in \mathcal{H}_0$ comes from the fact that we will work in the Araki–Woods representation of the CCR algebra, which is only defined for Weyl operators $W(f)$ with $f \in \mathcal{H}_0$ (see [AW, JP1, JP2, BFS4]). The dynamics of the non-interacting system is given by the automorphism group $\mathbb{R} \ni t \mapsto \alpha_{t,0} \in \mathrm{Aut}(\mathfrak{A})$, $\alpha_{t,0}(A) = e^{itH_0} A e^{-itH_0}$, where $H_0 = H_p \otimes 1_f + 1_p \otimes H_f$ is the sum of the particle and free field Hamiltonians. $H_0$ acts on the Hilbert space $\mathcal{H}_p \otimes \mathcal{H}_f$, where $\mathcal{H}_f = \bigoplus_{n=0}^{\infty} \mathcal{H}_0^{\otimes_{\mathrm{sym}} n}$ is the Fock space over $\mathcal{H}_0$, and $H_f$ is the free field Hamiltonian, i.e. the second quantization of the multiplication operator by $\omega = |k|$, $H_f = d\Gamma(\omega)$; if $a^*(k), a(k)$ denote the (distribution valued) creation and annihilation operators, then we can express it equivalently as $H_f = \int \omega(k)\, a^*(k) a(k)\, d^3k$. The particle Hamiltonian is assumed to be a selfadjoint operator on $\mathcal{H}_p$ which has purely discrete spectrum:
$$\sigma(H_p) = \{E_j\}_{j=0}^{\infty}, \qquad (1)$$
(where multiplicities are included, i.e. for a degenerate eigenvalue $E_i$, we have $E_i = E_j$ for some $j \ne i$), and we denote the orthonormal basis diagonalizing $H_p$ by $\{\varphi_j\}$. Let tr denote the trace on $B(\mathcal{H}_p)$; then we further assume that
$$Z_p(\beta) := \mathrm{tr}\, e^{-\beta H_p} < \infty, \qquad \forall \beta > 0. \qquad (2)$$
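Assumption (2) requires $e^{-\beta H_p}$ to be trace class for every $\beta > 0$. As a hypothetical example (not from the text), a harmonic oscillator with levels $E_j = j + \tfrac12$ satisfies it, with $Z_p(\beta) = \sum_{j\ge 0} e^{-\beta(j+1/2)} = 1/(2\sinh(\beta/2))$. The sketch below compares a truncated trace with this closed form and computes the diagonal of the Gibbs density matrix $e^{-\beta H_p}/Z_p(\beta)$ in the eigenbasis $\{\varphi_j\}$; the truncation length and $\beta$ are arbitrary choices.

```python
import math


def partition_function(energies, beta):
    """Z_p(beta) = tr exp(-beta H_p) = sum_j exp(-beta E_j), cf. (2)."""
    return math.fsum(math.exp(-beta * e) for e in energies)


def gibbs_weights(energies, beta):
    """Diagonal of exp(-beta H_p) / Z_p(beta) in the eigenbasis {phi_j}."""
    z = partition_function(energies, beta)
    return [math.exp(-beta * e) / z for e in energies]


# Hypothetical particle system: harmonic oscillator levels E_j = j + 1/2.
beta = 2.0
levels = [j + 0.5 for j in range(200)]   # neglected tail is of order exp(-400)
z_truncated = partition_function(levels, beta)
z_exact = 1.0 / (2.0 * math.sinh(beta / 2.0))
print(z_truncated, z_exact)
```

A Hamiltonian with continuous spectrum would violate (2), which is one reason the particle system is required to have purely discrete spectrum (1).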
We do not need to further specify the particle system. As a concrete example, one may think of a system of finitely many Schrödinger particles in a box (hence the name particle system), or a spin system. In some of our results (see Theorem 2.4 on the Fermi Golden Rule Condition), we shall assume that the spectrum of $H_p$ is finite ($N$-level system). The equilibrium state at temperature $T = 1/\beta > 0$ for the non-interacting system is given by the product $\omega_{\beta,0} = \omega^p_\beta \otimes \omega^f_\beta \in \mathfrak{A}^*$. Here, $\omega^p_\beta(\cdot) = \mathrm{tr}(e^{-\beta H_p}\,\cdot)/\mathrm{tr}\, e^{-\beta H_p}$ is the particle Gibbs state at inverse temperature $\beta$, and $\omega^f_\beta$ is the field $\beta$-KMS state that describes the infinitely extended field in the state of black body radiation, i.e. its two-point function is given according to Planck’s law by
$$\omega^f_\beta\big(a^*(k)\, a(k')\big) = \frac{\delta(k - k')}{e^{\beta|k|} - 1}.$$
The GNS construction for $(\mathfrak{A}, \alpha_{t,0}, \omega_{\beta,0})$ yields the (up to unitary equivalence) unique data $(\mathcal{H}, L_0, \Omega_{\beta,0}, \pi)$ (dependent on $\beta$). Here, $\mathcal{H}$ is the GNS Hilbert space with inner product $\langle \cdot, \cdot \rangle$, $\Omega_{\beta,0}$ is a cyclic vector for the $*$-morphism $\pi : \mathfrak{A} \to B(\mathcal{H})$ (the representation map), and the Liouvillian $L_0$ is the selfadjoint operator on $\mathcal{H}$ implementing the dynamics, i.e. satisfying $L_0 \Omega_{\beta,0} = 0$ and
$$\omega_{\beta,0}(\alpha_{t,0}(A)) = \langle \Omega_{\beta,0}, e^{itL_0}\, \pi(A)\, e^{-itL_0}\, \Omega_{\beta,0} \rangle, \qquad \forall A \in \mathfrak{A}.$$
This GNS construction has been carried out in [AW] for the field (the particle part is standard since it is a finite system); see also [JP1, JP2, BFS4]. We shall not explicitly use the representation map $\pi$ here and thus omit its presentation, which can be found in the above references. The GNS Hilbert space and cyclic vector are given by
$$\mathcal{H} = \mathcal{H}_p \otimes \mathcal{H}_p \otimes F(L^2(\mathbb{R} \times S^2)), \qquad (3)$$
$$\Omega_{\beta,0} = \Omega^p_\beta \otimes \Omega, \qquad (4)$$
where $\Omega^p_\beta$ is the particle Gibbs state at inverse temperature $\beta$ given in (21). $F(L^2(\mathbb{R} \times S^2))$ is the Fock space over $L^2(\mathbb{R} \times S^2)$ with vacuum $\Omega$, which we call the Jakšić–Pillet glued space. It was introduced by Jakšić and Pillet in [JP1] and is isomorphic to $\mathcal{H}_f \otimes \mathcal{H}_f$, the field GNS Hilbert space constructed in [AW]. It is easily verified that the Liouvillian is given by $L_0 = L_p + L_f$ (see also [JP1, JP2]). We write simply $L_p$ instead of $L_p \otimes 1_{F(L^2(\mathbb{R}\times S^2))}$ and similarly for $L_f$. Here, $L_p = H_p \otimes 1_p - 1_p \otimes H_p$, $L_f = d\Gamma(u)$, and $u$ is the first (the radial) variable in $\mathbb{R} \times S^2$. It is clear that the spectrum of $L_p$ is the set $\{e = E_i - E_j : E_i, E_j \in \sigma(H_p)\}$ and the spectrum of $L_f$ is the entire real axis (continuous spectrum) with an embedded eigenvalue at 0 (corresponding to the vacuum eigenvector $\Omega$). Consequently, $L_0$ has continuous spectrum covering the whole real line and embedded eigenvalues given by the eigenvalues of $L_p$.

1.1.2. The interacting system. We now describe the interacting system by defining an interacting Hamiltonian acting on $\mathcal{H}_p \otimes \mathcal{H}_f$:
(5)
where the coupling constant λ is a small real number, and v = G ⊗ (a(g) + a ∗ (g)).
(6)
Here, $G$ is a bounded selfadjoint operator on $\mathcal{H}_p$. The function $g \in \mathcal{H}_0$ is called the form factor, and the smoothed out creator is given by $a^*(g) = \int d^3k\, g(k)\, a^*(k)$. We assume $g$ to be a bounded $C^1$-function satisfying the following infrared (IR) and ultraviolet (UV) conditions (recall that $\omega = |k|$):
IR: $|g(k)| \le C\omega^p$ as $\omega \to 0$, for some $p > 0$ (for some results, we assume $p > 2$),
UV: $|g(k)| \le C\omega^{-q}$ as $\omega \to \infty$, for some $q > 5/2$. (7)
In addition, we assume that conditions (7) hold for the derivative $\partial_\omega g$, if $p, q$ are replaced by $p-1, q+1$. We point out that the value coming from the model of an atom coupled to the radiation field in the dipole approximation is $p = 1/2$ (without this approximation, $p = -1/2$). From now on we will refer to $p = 1/2$ as the physical case. The interacting Hamiltonian (which describes the coupled system at zero temperature) corresponds to an interacting Liouvillian (positive temperature Hamiltonian) which is given by (cf. [JP1, JP2, BFS4]):
$$L = L_0 + \lambda I, \qquad (8)$$
$$I = G_l \otimes \big(a^*(g_1) + a(g_1)\big) - G_r \otimes \big(a^*(g_2) + a(g_2)\big). \qquad (9)$$
Here, $G_l := G \otimes 1_p$, $G_r := 1_p \otimes CGC$, where $C$ is the antilinear map on $\mathcal{H}_p$ that, in the basis that diagonalizes $H_p$, has the effect of complex conjugation of coordinates. The origin of $C$ is the identification of the Hilbert–Schmidt operators on $\mathcal{H}_p$ with $\mathcal{H}_p \otimes \mathcal{H}_p$ via the isomorphism $|\varphi\rangle\langle\psi| \leftrightarrow \varphi \otimes C\psi$ (see also [JP2, BFS4]). Moreover, we have defined, for $g \in L^2(\mathbb{R}_+ \times S^2)$:
$$g_1(u,\alpha) = \begin{cases} \sqrt{1+\mu(u)}\; u\, g(u,\alpha), & u \ge 0, \\ \sqrt{\mu(-u)}\; u\, g(-u,\alpha), & u < 0, \end{cases} \qquad (10)$$
and $g_2(u,\alpha) = -g_1(-u,\alpha)$, where the function $\mu = \mu(k)$ is the momentum density distribution, given by Planck’s law describing black body radiation: $\mu(k) = (e^{\beta\omega} - 1)^{-1}$, $\omega = |k|$. The structure of $g_1$ in (10) comes from the Jakšić–Pillet gluing, which identifies $L^2(\mathbb{R}^3) \oplus L^2(\mathbb{R}^3)$ with $L^2(\mathbb{R} \times S^2)$ via the isometric isomorphism $(f_1, f_2) \mapsto f$, $f(u,\alpha) = u f_1(u,\alpha)$ for $u \ge 0$ and $f(u,\alpha) = u f_2(-u,\alpha)$ for $u < 0$. For more detail, we refer to [JP1, JP2]. For $\lambda \ne 0$, one can construct a vector $\Omega_{\beta,\lambda} \in \mathcal{H}$ such that the vector state defined by $\omega_{\beta,\lambda}(A) = \langle \Omega_{\beta,\lambda}, A\,\Omega_{\beta,\lambda} \rangle$ is a $\beta$-KMS state w.r.t. the coupled dynamics $\alpha_t(A) = e^{itL} A e^{-itL}$, where $A$ is an element in the von Neumann algebra $M := B(\mathcal{H}_p) \otimes B(\mathcal{H}_p) \otimes \pi(W(\mathcal{H}_0))$ (weak closure in $B(F(L^2(\mathbb{R} \times S^2)))$). An extension of the algebra of observables to this weak closure is necessary since the full dynamics does not leave $B(\mathcal{H}_p) \otimes B(\mathcal{H}_p) \otimes \pi(W(\mathcal{H}_0))$ invariant. It is not difficult to show that $(M, \alpha_t)$ is a $W^*$-dynamical system (compare also to [FNV, JP2]). Notice in particular that $L\,\Omega_{\beta,\lambda} = 0$. The construction of $\Omega_{\beta,\lambda}$ goes under the name structural stability of KMS states; see [BFS4] for this specific model, but also [A, FNV, BRII]. For $\beta|\lambda|$ small, one has the estimate (for the O-notation, see after (20))
$$\|\Omega_{\beta,\lambda} - \Omega_{\beta,0}\| = O(\beta|\lambda|). \qquad (11)$$
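The square-root factors in (10) fix the relative weight of the two halves of the glued variable $u$: for $u > 0$ one has $|g_1(-u,\alpha)|^2 / |g_1(u,\alpha)|^2 = \mu(u)/(1+\mu(u)) = e^{-\beta u}$, the familiar Boltzmann (detailed-balance) factor, writing $\mu(u) = (e^{\beta u}-1)^{-1}$ as in (10). The sketch below checks this numerically. It assumes, for simplicity and only for illustration, a real form factor depending on the radial variable alone (the angle $\alpha$ is suppressed), and the particular $g$ and $\beta$ are hypothetical choices compatible with (7).

```python
import math


def planck_density(omega, beta):
    """Planck occupation number mu = 1 / (exp(beta * omega) - 1), for omega > 0."""
    return 1.0 / math.expm1(beta * omega)


def glued_form_factor(u, beta, g):
    """g_1(u) from (10) for a real, purely radial form factor g (alpha suppressed).

    g_1 vanishes at u = 0 because of the factor u, so that point is handled
    separately (planck_density diverges there).
    """
    if u == 0:
        return 0.0
    if u > 0:
        return math.sqrt(1.0 + planck_density(u, beta)) * u * g(u)
    return math.sqrt(planck_density(-u, beta)) * u * g(-u)


def g(w):
    """Hypothetical form factor: IR exponent p = 1/2, exponential UV decay."""
    return math.sqrt(w) * math.exp(-w)


beta = 1.3
for u in (0.1, 1.0, 5.0):
    ratio = glued_form_factor(-u, beta, g) ** 2 / glued_form_factor(u, beta, g) ** 2
    print(u, ratio, math.exp(-beta * u))  # agree up to rounding
```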
We show in Appendix A.1 that $L$ is essentially selfadjoint (Theorem A.2).

1.2. Spectral characterization of RTE. We define the equilibrium states at temperature $T = 1/\beta > 0$ to be the $\beta$-KMS states. Hence the equilibrium state of the coupled system at inverse temperature $\beta > 0$ is given by the above constructed $\omega_{\beta,\lambda} \in M_*$. A conjectured property of KMS states is their dynamical stability (which should be a natural property of equilibrium states). In our case, this means that $\omega \circ \alpha_t \to \omega_{\beta,\lambda}$ as $t \to \infty$, for states $\omega$ that are close to $\omega_{\beta,\lambda}$. This is called the property of return to
equilibrium. Apart from specifying the mode of convergence, it remains to say what we mean by $\omega$ being close to $\omega_{\beta,\lambda}$. There is a natural neighbourhood of states around $\omega_{\beta,\lambda}$ in which the dynamics is also determined by $L$: the set of all normal states $\omega$ w.r.t. $\omega_{\beta,\lambda}$. By definition, $\omega$ is normal w.r.t. $\omega_{\beta,\lambda}$ iff
$$\forall A \in M: \quad \omega(A) = \mathrm{tr}(\rho A), \qquad (12)$$
where $\mathrm{tr}(\cdot)$ is the trace on the GNS Hilbert space $\mathcal{H}$ given in (3) and $\rho$ is a trace class operator on $\mathcal{H}$, normalized as $\mathrm{tr}\,\rho = 1$.

Proposition 1.1 (Spectral Characterization of RTE). Let $M \subset B(\mathcal{H})$ be a von Neumann algebra and suppose that $\omega_\beta(\cdot) = \langle \Omega_\beta, \cdot\;\Omega_\beta \rangle : M \to \mathbb{C}$ is a $\beta$-KMS state with respect to the dynamics $\alpha_t \in \mathrm{Aut}(M)$. Suppose that the Liouvillian $L$ generating the dynamics on $\mathcal{H}$ has no eigenvalues except for a simple one at zero, so that the only eigenvector of $L$ is $\Omega_\beta$. Then, for any normal state $\omega$ w.r.t. $\omega_\beta$, and for any observable $A \in M$, we have
$$\lim_{T \to \infty} \frac{1}{T} \int_0^T \omega(\alpha_t(A))\, dt = \omega_\beta(A). \qquad (13)$$
This means that the system exhibits return to equilibrium in an ergodic mean sense. The proof is given e.g. in [JP2, BFS4, M]. Better information on the spectrum of $L$ yields stronger convergence; if $L$ has absolutely continuous spectrum, except a simple eigenvalue at 0, then (13) can be replaced by $\lim_{t\to\infty} \omega(\alpha_t(A)) = \omega_\beta(A)$.

1.3. The PC method. This section introduces the general idea of the PC method. As we have seen above, the Liouville operators in the class of systems we consider consist of two parts: $L = L_0 + \lambda I$, where $L_0$ is the uncoupled Liouville operator, describing the two subsystems (particles and field) when they do not interact, $I$ is the interaction, and $\lambda$ is a real (small) coupling parameter. The spectrum of $L_0$ consists of a continuum covering the whole real axis, and it has embedded eigenvalues, arranged symmetrically w.r.t. zero. Moreover, zero is a degenerate eigenvalue. We would like to show that for $\lambda \ne 0$, the spectrum of $L$ has no eigenvalues, except for a simple one at zero, because then Proposition 1.1 tells us that the system exhibits RTE. In other words, we want to show that all nonzero eigenvalues of $L_0$ are unstable under the perturbation $\lambda I$, and that this perturbation removes the degeneracy of the zero eigenvalue (see Fig. 1). We know that $L$ has a zero eigenvalue with eigenvector $\Omega_{\beta,\lambda}$,
[Fig. 1. Spectra of the unperturbed and perturbed Liouvillians: σ(L_0), with embedded eigenvalues and a degenerate eigenvalue at 0; σ(L) for λ ≠ 0, with a non-degenerate eigenvalue at 0.]
the perturbed KMS state. This means that our task reduces to showing instability of all nonzero eigenvalues, and that the dimension of the nullspace of $L$ is at most one. It is conventional wisdom that embedded eigenvalues are unstable under generic perturbations, turning into resonances. We now outline the technique we use to show instability of embedded eigenvalues: the PC technique. To do so, we concentrate first on a nonzero (isolated) eigenvalue $e$ of $L_0$ whose instability we want to show. The main idea is to construct an anti-selfadjoint operator $A$, called the adjoint operator (to $L$), such that we have the following PC estimate:
$$E_\Delta(L)\,[L, A]\,E_\Delta(L) \ge \theta\, E_\Delta(L), \qquad (14)$$
where $\theta > 0$ is a strictly positive number, $E_\Delta(L)$ denotes the spectral projector of $L$ onto the interval $\Delta$, and $[\cdot,\cdot]$ is the commutator. Here, $\Delta$ is chosen to contain the eigenvalue $e$ but no other eigenvalues of $L_0$. Equation (14) is also called a (strict) Mourre estimate. If it is satisfied, then one sees that $L$ has no eigenvalues in $\Delta$ by the following argument by contradiction: suppose that $L\psi = e'\psi$, with $e' \in \Delta$ and $\|\psi\| = 1$. Then we have $E_\Delta(L)\psi = \psi$, and the PC estimate (14) gives, on the one hand, $\langle\psi, [L, A]\psi\rangle \ge \theta$. On the other hand, formally expanding the commutator and using $A^* = -A$ yields
$$\langle\psi, [L, A]\psi\rangle = \langle\psi, [L - e', A]\psi\rangle = 2\,\mathrm{Re}\,\langle (L - e')\psi, A\psi\rangle = 0, \qquad (15)$$
which leads to the contradiction $\theta \le 0$, hence showing that there cannot be any eigenvalue of $L$ in $\Delta$. This formal argument is, in general, not rigorous. Indeed, both operators $L$ and $A$ are unbounded, and one has to take great care of domain questions, including the very definition of the commutator $[L, A]$. Relation (15) is called the Virial Theorem, and in many concrete cases it can be made rigorous by approximating the hypothetical eigenfunction $\psi$ by “nice” vectors. The situation in which this works is quite generally the case where $[L, A]$ is bounded relative to $L$, which is in particular satisfied for $N$-body Schrödinger systems and for systems of particles coupled to a field at zero temperature. However, in our case this condition is not satisfied, and as mentioned above, we have to develop a more general argument of this type. The treatment of the zero eigenvalue is similar, except that we prove (14) only on $\mathrm{Ran}\, E_\Delta(L)P^\perp$, where $P$ is the rank-one projector onto the known zero eigenvector $\Omega_{\beta,\lambda}$ of $L$, and $P^\perp$ is its orthogonal complement.

2. Main Results

Our main technical result is the abstract PC estimate, Theorem 2.1. This result is the basis for the spectral analysis of the Liouvillian, as explained above.
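In finite dimensions every vector lies in every domain, so the formal Virial computation (15) of Sect. 1.3 is literally correct: for a selfadjoint matrix $L$, an anti-selfadjoint matrix $A$ and a normalized eigenvector $\psi$, the expectation $\langle\psi, [L, A]\psi\rangle$ vanishes, and no strict estimate of the form (14) can hold on a spectral subspace containing an eigenvector. The NumPy sketch below checks this with random matrices; these are toy stand-ins chosen for illustration, not the Liouvillian or conjugate operator of the text, where the point of the analysis is precisely that $L$ and $A$ are unbounded and this naive computation is not justified.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 6

# Random selfadjoint L and anti-selfadjoint A (A* = -A) on C^n: toy stand-ins only.
X = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
L = (X + X.conj().T) / 2
Y = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (Y - Y.conj().T) / 2


def commutator(P, Q):
    return P @ Q - Q @ P


# Any eigenpair of L: L psi = e' psi with ||psi|| = 1.
evals, evecs = np.linalg.eigh(L)
e_prime, psi = evals[0], evecs[:, 0]

# Left- and right-hand sides of (15); np.vdot conjugates its first argument.
lhs = np.vdot(psi, commutator(L, A) @ psi)
rhs = 2 * np.real(np.vdot((L - e_prime * np.eye(n)) @ psi, A @ psi))
print(abs(lhs), abs(rhs))  # both vanish up to rounding errors

# The identity <phi, [L, A] phi> = 2 Re <L phi, A phi> holds for every phi.
phi = rng.normal(size=n) + 1j * rng.normal(size=n)
assert np.isclose(np.vdot(phi, commutator(L, A) @ phi),
                  2 * np.real(np.vdot(L @ phi, A @ phi)))
```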
We point out that the PC estimate holds for infrared behaviour of the form factor (see (7)) characterized by $p > 0$, which covers the physical case $p = 1/2$. Theorem 2.2 characterizes the spectrum of the Liouvillian as needed for the RTE property. To prove this result, we combine the PC estimate with a Virial Theorem type argument. It is for the latter that we presently need the more restrictive infrared behaviour $p > 2$. We think that this part of our method can be improved. A direct consequence of Theorem 2.2 is Corollary 2.3, which says that the system exhibits RTE (recall also Proposition 1.1).
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
All the results hold under the assumption of the Fermi Golden Rule Condition, (18) and (19). In Theorem 2.4, we give explicit conditions on the operator $G$ and the form factor $g$ under which the Fermi Golden Rule Condition holds. We start by explaining this condition. In the language of quantum resonances, it expresses the fact that the bifurcation of complex eigenvalues (resonance poles) of the spectrally deformed Liouvillian takes place at second order in the perturbation (i.e. the lifetime of the resonance is of order $\lambda^{-2}$). As mentioned above, the Liouvillian of the particle system at positive temperature is $L_p = H_p\otimes1 - 1\otimes H_p$, acting on the Hilbert space $\mathcal H_p\otimes\mathcal H_p$. Hence $L_p$ has discrete spectrum $\sigma(L_p) = \{e = E_i - E_j : E_i, E_j\in\sigma(H_p)\}$. For every eigenvalue $e$ of $L_p$, we define an operator $\Gamma(e)$ acting on the corresponding eigenspace $\mathrm{Ran}\,P(L_p = e)\subset\mathcal H_p\otimes\mathcal H_p$ by
$$\Gamma(e) = \int_{\mathbb R\times S^2} m^*(u,\alpha)\,P(L_p = e)\,\delta(L_p - e + u)\,m(u,\alpha), \tag{16}$$
where $\delta$ denotes the Dirac delta distribution, and where the operator $m$ is given by
$$m(u,\alpha) = G_l\,g_1(u,\alpha) - G_r\,g_2(u,\alpha). \tag{17}$$
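As a concrete check of the structure of $L_p$, here is a small NumPy sketch with a hypothetical three-level $H_p$. The levels $0$, $1$, $2.5$ and the value $\beta = 2$ are arbitrary illustrative choices. The sketch confirms three things:

- $\sigma(L_p)$ consists of the differences $E_i - E_j$;
- the zero eigenvalue is degenerate, with one copy for each level $E_i$ when $H_p$ is nondegenerate;
- the particle Gibbs vector of the form (21) lies in the kernel of $L_p$.

```python
import numpy as np

# Hypothetical particle Hamiltonian: diagonal H_p with illustrative levels E_i.
E = np.array([0.0, 1.0, 2.5])
beta = 2.0
d = len(E)
Id = np.eye(d)
H_p = np.diag(E)

# L_p = H_p (x) 1 - 1 (x) H_p acting on H_p (x) H_p.
L_p = np.kron(H_p, Id) - np.kron(Id, H_p)

spectrum = np.sort(np.linalg.eigvalsh(L_p))
differences = np.sort([Ei - Ej for Ei in E for Ej in E])

# Particle Gibbs vector: Z_p^{-1/2} sum_i e^{-beta E_i / 2} phi_i (x) phi_i,
# with phi_i the standard basis vectors (eigenvectors of the diagonal H_p).
Z_p = np.exp(-beta * E).sum()
omega = sum(np.exp(-beta * E[i] / 2) * np.kron(Id[i], Id[i]) for i in range(d)) / np.sqrt(Z_p)

zero_multiplicity = int(np.sum(np.isclose(spectrum, 0.0)))
print(spectrum, zero_multiplicity)
```

The zero eigenspace of $L_p$ is therefore multi-dimensional. Condition (19) asks $\Gamma(0)$ to be strictly positive on all of it except the direction $\Omega_\beta^p$.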
Recall that $g_{1,2}$ and $G_{l,r}$ were defined in and just before Eq. (10). It is clear from (16) that $\Gamma(e)$ is a non-negative selfadjoint operator. The Fermi Golden Rule Condition is used to show the instability of embedded eigenvalues. For nonzero eigenvalues, the condition says that $\Gamma(e)$ is strictly positive: for $e\ne0$,
$$\gamma_e := \inf\sigma\big(\Gamma(e)\restriction_{\mathrm{Ran}\,P(L_p = e)}\big) > 0. \tag{18}$$
We show in Theorem 2.4 that $\Gamma(0)$ has a simple eigenvalue at zero, the eigenvector being the Gibbs state $\Omega_\beta^p$ of the particle system (see (21)). This reflects the fact that the zero eigenvalue of $L_0$ survives the perturbation but its degeneracy is removed, i.e. the zero eigenvalue of $L$ is simple. The Fermi Golden Rule Condition for $e = 0$ requires strict positivity on the complement of the zero eigenspace of $\Gamma(0)$, i.e.
$$\gamma_0 := \inf\sigma\big(\Gamma(0)\restriction_{\mathrm{Ran}\,P(L_p = 0)P^\perp_{\Omega_\beta^p}}\big) > 0. \tag{19}$$
Here $P_{\Omega_\beta^p}$ is the projection onto $\mathbb C\Omega_\beta^p$, and $P^\perp_{\Omega_\beta^p} = 1 - P_{\Omega_\beta^p}$. In Theorem 2.4 below, we give explicit conditions on $G$ and $g(k)$ under which (18) and (19) hold. Here is our main result.
Theorem 2.1 (Positive Commutator Estimate). Assume the IR and UV behaviour (7), with $p > 0$. Let $\Delta$ be an interval containing exactly one eigenvalue $e$ of $L_0$, and let $h\in C_0^\infty$ be a smooth function such that $h = 1$ on $\Delta$ and $\mathrm{supp}\,h\cap\sigma(L_p) = \{e\}$. Assume the Fermi Golden Rule Condition (18) (or (19)) holds. Let $\beta\ge\beta_0$, for any fixed $0 < \beta_0 < \infty$. Then there is a $\lambda_0 > 0$ (depending on $\beta_0$) such that, if $0 < |\lambda| < \lambda_0$, the following holds for some explicitly constructed anti-selfadjoint operator $A$, in the sense of quadratic forms on $\mathcal D(N^{1/2})$ (see Remark 1 below):
$$h(L)[L,A]h(L) \ge \tfrac12\lambda^{91/50}\,h(L)\Big(\gamma_e\big(1 - 5\delta_{e,0}P_{\Omega_{\beta,0}}\big) - O(\lambda^{1/200})\Big)h(L). \tag{20}$$
M. Merkli
Notation. Let $s$ be a real variable. Then $O(s)$ stands for a family $T_s$ of bounded operators depending on $s$ and satisfying $\lim_{s\to0}\|T_s\|/s = C < \infty$. In (20), $s = \lambda^{1/200}$.

Remarks. 1. $N = d\Gamma(1)$ is the number operator in the positive temperature Hilbert space (see also (3) and (89)). $P_{\Omega_{\beta,0}}$ is the projector onto the span of $\Omega_{\beta,0}$, the $\beta$-KMS state of the uncoupled system (see (4)). Also, $\delta_{e,0}$ is the Kronecker symbol, equal to one if $e = 0$ and zero otherwise. 2. We show in Theorem A.2 that $L$ is essentially selfadjoint on a dense domain in the positive temperature Hilbert space. 3. By construction, the commutator $[L,A]$ is equal to $N$ in first approximation (see Sect. 4). Moreover, $h(L)$ leaves the domain $\mathcal D(N^{1/2})$ invariant (see e.g. [M]), so (20) is well defined. 4. There is no smallness condition on the interval $\Delta$ (apart from it containing only one eigenvalue of $L_0$).

Theorem 2.2 (Spectrum of $L$). Assume the IR condition $p > 2$ (see (7)). Let $\beta_0\le\beta<\infty$, where $0 < \beta_0 < \infty$ is arbitrary but fixed. Then the Liouvillian $L$ has the following spectral properties:
1) Let $e\ne0$ be an eigenvalue of $L_0$, and suppose that the Fermi Golden Rule Condition (18) holds for $e$. Then there is a $\lambda_0 > 0$ (depending on $\beta_0$) such that, for $0 < |\lambda| < \lambda_0$, $L$ has no eigenvalues in the open interval $(e_-, e_+)$. Here $e_-$ is the largest eigenvalue of $L_0$ smaller than $e$, and $e_+$ is the smallest eigenvalue of $L_0$ larger than $e$.
2) Assume the Fermi Golden Rule Condition (19) holds for $e = 0$. Then there is a $\lambda_0 > 0$ (depending on $\beta_0$) such that, if $0 < |\lambda| < \lambda_0$ and $0 < \beta|\lambda| < \lambda_0$, $L$ has a simple eigenvalue at zero.

Remark. Theorem 2.2 shows that if the Fermi Golden Rule Condition holds for all eigenvalues of $L_0$, then $L$ has no eigenvalues except a simple one at zero.

Corollary 2.3 (Return to Equilibrium). Suppose that the IR condition and the condition on $\beta$ are as in Theorem 2.2, and that the Fermi Golden Rule Condition is satisfied for all eigenvalues of $L_0$.
If $|\lambda| > 0$ is small (in the sense of Theorem 2.2, 2)), then every normal state with respect to the $\beta$-KMS state $\Omega_{\beta,\lambda}$ (the zero eigenvector of $L$) exhibits return to equilibrium in an ergodic mean sense.

The corollary follows immediately from Theorem 2.2 and Proposition 1.1, where the ergodic mean convergence is defined by (13).

Theorem 2.4 (Spectrum of $\Gamma(e)$). Set $\Gamma_p(e) := P(L_p = e)\Gamma(e)P(L_p = e)$ and, for $E_i, E_j\in\sigma(H_p)$, let $E_{ij} := E_i - E_j$.
1) Let $e\ne0$. Then there is a non-negative number $\delta_0 = \delta_0(G)$ (independent of $\beta$, $\lambda$), whose value is given in Appendix A.2 (see before (97)), such that
$$\Gamma_p(e) \ge \delta_0\inf_{i,j:\,E_{ij}\ne0}|E_{ij}|\int_{S^2}dS(\omega,\alpha)\,|g(|E_{ij}|,\alpha)|^2\;P(L_p = e).$$
In particular, the Fermi Golden Rule Condition (18) is satisfied if the r.h.s. is not zero.
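2) $\Gamma_p(0)$ has an eigenvalue at zero, with the particle Gibbs state $\Omega_\beta^p$ as eigenvector:
$$\Omega_\beta^p = Z_p(\beta)^{-1/2}\sum_ie^{-\beta E_i/2}\,\varphi_i\otimes\varphi_i, \tag{21}$$
where we recall that $Z_p(\beta)$ was defined in (2). Moreover, suppose that
$$g_0 := \inf_{m,n:\,E_{mn}<0}\frac{e^{\beta E_n}}{e^{-\beta E_{mn}} - 1}\,|\langle\varphi_n,G\varphi_m\rangle|^2\int_{\mathbb R^3}\delta(E_{mn} + \omega)|g|^2 \ge 0$$
is strictly positive. Then zero is a simple eigenvalue of $\Gamma_p(0)$ with unique eigenvector $\Omega_\beta^p$, and the spectrum of $\Gamma_p(0)$ has a gap at zero: $(0, 2g_0Z_p)\cap\sigma(\Gamma_p(0)) = \emptyset$. In particular, the Fermi Golden Rule Condition (19) holds.

Remarks. 1. If $e\ne0$ is nondegenerate, i.e. if $e = E_{m_0n_0}$ for a unique pair $(m_0,n_0)$, then (see before (97))
$$\delta_0 = \sum_{n\ne n_0}|\langle\varphi_n,G\varphi_{n_0}\rangle|^2 + \sum_{m\ne m_0}|\langle\varphi_m,G\varphi_{m_0}\rangle|^2.$$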
2. If $H_p$ is unbounded, then $g_0 = 0$. Indeed, fix $m$ and take $n\to\infty$. Then $E_{mn} < 0$ and $\langle\varphi_n,G\varphi_m\rangle\to0$, since $\varphi_n$ converges weakly to zero. Notice, though, that $g_0 > 0$ is only a sufficient condition for the Fermi Golden Rule Condition to hold at zero. 3. For $g_0 > 0$, the size of the gap, $2g_0Z_p$, is bounded away from zero uniformly in $\beta\ge\beta_0$, since
$$\lim_{\beta\to\infty}\inf_{m,n:\,E_m<E_n}\frac{\mathrm{tr}\,e^{-\beta H_p}}{e^{-\beta E_m} - e^{-\beta E_n}} = \lim_{\beta\to\infty}\inf_{m,n:\,\hat E_m<\hat E_n}\frac{\mathrm{tr}\,e^{-\beta\hat H_p}}{e^{-\beta\hat E_m} - e^{-\beta\hat E_n}},$$
where $\hat E_i := E_i - E_0\ge0$ ($E_0$ is the smallest eigenvalue of $H_p$) and $\hat H_p := H_p - E_0\ge0$ (the smallest eigenvalue of $\hat H_p$ is zero).

3. Review of Previous Results

Proving the RTE property is one of the key problems of non-equilibrium statistical mechanics. Until recently, this property had been proven only for specially designed abstract models (see [BRII]). The first result for realistic systems came in the pioneering work of Jakšić and Pillet [JP1, JP2] in 1996. There, Jakšić and Pillet prove return to equilibrium, with an exponential rate of convergence in time, for the spin-boson system at sufficiently high temperatures. The spin-boson system is an $N$-level system with $N = 2$ coupled to the free massless bosonic field; their work easily extends to general finite $N$. Their work introduces the spectral approach to RTE. The analysis is done in the spirit of the theory of quantum resonances, using spectral deformation techniques, where the deformation is generated by energy translation. The IR condition on the form factor is $g(\omega)\sim\omega^p$ as $\omega\to0$, with $p > -1/2$, and hence includes the physical case $p = 1/2$. However, there is a restriction on the temperature: $|\lambda| < 1/\beta$. The spectral deformation technique imposes certain analyticity conditions on the form factor. The $N$-level system coupled to the free massless bosonic field is also treated in [BFS4], but there the spectrum of the Liouvillian is analyzed using complex dilation instead
of translation. RTE with an exponentially fast rate of convergence in time is established there for a small coupling constant $\lambda$ independent of $\beta$. In this work, Bach, Fröhlich and Sigal adapt their Renormalization Group method, developed in [BFS1–BFS3], to the positive temperature case. The IR condition is $p > 0$, which includes the physical case. In a recent work, Dereziński and Jakšić [DJ] consider the Liouvillian of the $N$-level system interacting with the free massless bosonic field. Their analysis of the spectrum of the Liouvillian is based on the Feshbach method. The method is justified with the help of Mourre theory applied to the reduced Liouvillian (away from the vacuum sector). The Mourre theory, in turn, is based on a global positive commutator estimate for the reduced Liouvillian. The IR condition for the instability of nonzero eigenvalues is $p > 0$, and for the lifting of the degeneracy of the zero eigenvalue it is $p > 1$. Our method for the spectral analysis of the Liouvillian employs the energy-translation generator in the Jakšić–Pillet glued positive temperature Hilbert space, as in [JP1, JP2] and [DJ]. We prove a Mourre estimate (PC estimate) for the original Liouvillian with a conjugate operator that is a deformation of the energy-shift generator mentioned above. This method was developed in the zero-temperature case in [BFSS], though for the dilation generator. Our construction of the PC works for the IR condition $p > 0$, which includes the physical case. In order to conclude the absence of eigenvalues from the PC estimate, the Virial Theorem is needed. So far, the systems to which the Virial Theorem has been applied have always satisfied the condition that $[L,A]$ is relatively bounded with respect to $L$. For that case a general theory has been developed, see [ABG]; for specific systems, see also [BFSS] for particle–field systems at zero temperature and [HS1] for $N$-body systems.
We remark, though, that in [S] Skibsted extends the abstract Mourre theory to certain systems where $[L,A]$ is not relatively bounded (but $[[L,A],A]$ is). In this work, we develop a Virial Theorem type argument for the case where the commutator $[L,A]$ is not relatively $L$-bounded. This comes at the price that our estimates involve the triple commutator $[[[L,A],A],A]$. Consequently, we need a restrictive IR behaviour of the form factor, namely $p > 2$. This restriction comes from the part of the proof using the Virial Theorem, not from the PC estimate, and we think it can be relaxed through a better understanding of the Virial Theorem. It should be pointed out that the Virial Theorem is an important tool of interest in its own right and is still the subject of current research, see e.g. [GG]. Let us mention that, in order to show RTE, we need the condition $0 < |\lambda| < \lambda_0/\beta$ (Corollary 2.3). Our RTE result is therefore not uniform in the temperature as $T = 1/\beta\to0$. The same situation occurs in [JP1, JP2]. Uniformity in temperature is obtained in [BFS]. We finish this brief review by comparing our approach with that of [DJ], which is the closest to ours in the literature on the subject. The main difference is that [DJ] first develop the Mourre theory for a reduced Liouville operator, starting from a global PC estimate on the radiation sector. Using the Feshbach method, they then show the limiting absorption principle for the Liouvillian acting on the full space. [DJ] use the fact that the system has a global PC estimate (i.e. for positive temperatures, one cannot avoid using the generator of translations as the adjoint operator), and we do not see how to modify that technique for a different adjoint operator. Using an adjoint operator other than the Jakšić–Pillet translation generator might be desirable, for instance in order to remove restrictive assumptions on the coupling functions.
In our method, we modify the bare adjoint operator so as to obtain a local PC estimate right from the start for the full (i.e. not a reduced) Liouvillian. This method has the advantage that it works for various choices of the adjoint operator. In fact, it was first developed (at zero temperature) for the dilation generator in [BFSS]. It is true, though, that the use of the translation generator greatly reduces the number of estimates to be performed, and this is why we use it here. Let us also mention that in proving our PC estimate we do not need a smallness condition on $|\Delta|$, except that $\Delta$ should contain only one eigenvalue of $L_p$. In Mourre theory, by contrast, it is usually necessary to assume that $|\Delta|$ is small. We do not claim that either of the two methods is better; in our view, both have advantages and disadvantages. We do believe that our approach gives new insights and can open doors to new techniques for the problem of RTE and related spectral problems.

4. Proof of Theorem 2.1: Step 1

In this section we prove the PC estimate with respect to spectral localization in the uncoupled Liouvillian $L_0$, see Theorem 4.3. Step 2 consists in passing from this estimate to one localized with respect to the full Liouvillian $L$; it is carried out in the next section. Our estimates are uniform in $\beta\ge\beta_0$ (for any fixed $0 < \beta_0 < \infty$). For notational convenience, we set $\beta_0 = 1$; see also the remark after Proposition A.1 in Appendix A.1.

4.1. PC with respect to spectral localization in $L_0$. We construct an operator $B$ (see (27)) which is positive on spectral subspaces of $L_0$, see Theorem 4.3 (the main result of this section). On $L^2(\mathbb R\times S^2)$ and for $t\in\mathbb R$, we define the unitary transformation
$$(\tilde U_t\psi)(u,\alpha) = \psi(u - t,\alpha).$$
This induces a unitary transformation $U_t = \Gamma(\tilde U_t)$ on the Fock space $\mathcal F = \mathcal F(L^2(\mathbb R\times S^2))$. That is, for $\psi\in\mathcal F$, the $n$-particle component of $U_t\psi$ is given by $(U_t\psi)_n(u_1,\dots,u_n) = \psi_n(u_1 - t,\dots,u_n - t)$. Here and often in what follows, we do not display the angular variables $\alpha_1,\dots,\alpha_n$ in the argument of $\psi_n$. $U_t$ is a strongly continuous one-parameter ($t\in\mathbb R$) unitary group on $\mathcal F$. Its anti-selfadjoint generator $A_0$, defined in the strong sense by $\partial_t|_{t=0}U_t = A_0$, is $A_0 = -d\Gamma(\partial_u)$. The domain of the unbounded operator $A_0$, $\mathcal D(A_0) = \{\psi\in\mathcal F : \partial_t|_{t=0}U_t\psi\in\mathcal F\}$, is dense in $\mathcal F$. This simply follows from the fact that $A_0$ is the generator of a strongly continuous group. From now on, we write $U_t = e^{tA_0}$, $t\in\mathbb R$. The following result motivates the definition of an operator denoted by $[L,A_0]$ (see (23) below). The proof is not difficult and can be found in [M].

Proposition 4.1. On the dense set $\mathcal D(L_0)\cap\mathcal D(N)$, we have $e^{-tA_0}Le^{tA_0} = L_0 + tN + \lambda I_t$. Here $I_t$ is obtained from $I$ by replacing the form factor $g$ by its translate $g^t$, where $g^t(u,\alpha) = g(u + t,\alpha)$. We therefore obtain
$$\partial_t|_{t=0}\,e^{-tA_0}Le^{tA_0} = N + \lambda\tilde I, \tag{22}$$
where $\tilde I = G_l\otimes\big(a^*(\partial_ug_1) + a(\partial_ug_1)\big) - G_r\otimes\big(a^*(\partial_ug_2) + a(\partial_ug_2)\big)$. The derivative in (22) is understood in the strong topology.
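For orientation, the one-particle computation behind $A_0 = -d\Gamma(\partial_u)$ and Proposition 4.1 can be written out as follows. This is a formal sketch. It assumes, as the pull-through formula used for (45) indicates, that the free field part is $L_f = d\Gamma(u)$ and that $L_p$ acts on a separate tensor factor, so that it commutes with $A_0$. We write $a^\#$ for either $a$ or $a^*$.

```latex
\begin{aligned}
&\partial_t\big|_{t=0}(\tilde U_t\psi)(u,\alpha) = -\partial_u\psi(u,\alpha)
  &&\Longrightarrow\quad A_0 = d\Gamma(-\partial_u) = -d\Gamma(\partial_u),\\
&(\tilde U_{-t}\,u\,\tilde U_t\psi)(u,\alpha) = (u+t)\,\psi(u,\alpha)
  &&\Longrightarrow\quad e^{-tA_0}\,d\Gamma(u)\,e^{tA_0} = d\Gamma(u) + t\,d\Gamma(1) = L_f + tN,\\
&(\tilde U_{-t}\,g)(u,\alpha) = g(u+t,\alpha) = g^t(u,\alpha)
  &&\Longrightarrow\quad e^{-tA_0}\,a^{\#}(g)\,e^{tA_0} = a^{\#}(g^t).
\end{aligned}
```

Together these give $e^{-tA_0}Le^{tA_0} = L_0 + tN + \lambda I_t$, and differentiating at $t = 0$ yields (22).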
On a formal level, we have $\partial_t|_{t=0}\,e^{-tA_0}Le^{tA_0} = -A_0L + LA_0 = [L,A_0]$. This suggests defining the unbounded operator $[L,A_0]$ with domain $\mathcal D([L,A_0]) = \mathcal D(N)$ by
$$[L,A_0] := N + \lambda\tilde I. \tag{23}$$
We point out that the operator $[L,A_0]$ is defined as the r.h.s. of (23), and not as a commutator in the sense of $LA_0 - A_0L$. Remark that $[L,A_0]$ is positive on $\mathcal D(N)\cap\mathrm{Ran}\,P_\Omega^\perp$, where $\Omega$ is the vacuum in $\mathcal F$. Indeed, it follows from Proposition A.1 (take e.g. $c = 1/4$) that $[L,A_0]\ge\frac34N - O(\lambda^2)$. Hence $P_\Omega^\perp[L,A_0]P_\Omega^\perp\ge\big(\frac34 - O(\lambda^2)\big)P_\Omega^\perp$. On the other hand, $P_\Omega[L,A_0]P_\Omega = 0$. So if we want an operator that is positive also on $\mathbb C\Omega$, we need to modify $A_0$. For a fixed eigenvalue $e\in\sigma(L_p)$, define
$$b(e) = \theta\lambda\big(\bar QR_A^2IQ - QIR_A^2\bar Q\big),\qquad R_A = \big((L_0 - e)^2 + A^2\big)^{-1/2}. \tag{24}$$
Here, $\theta$ and $A$ are positive parameters, and $Q$, $\bar Q$ are the projection operators on $\mathcal H$ defined as
$$Q = P(L_p = e)\otimes P_\Omega,\qquad \bar Q = 1 - Q. \tag{25}$$
In what follows, we denote $\bar R_A := \bar QR_A$.

Proposition 4.2. The operator $b = b(e)$ is bounded. Moreover, $[L,b] = Lb - bL$ is well defined on $\mathcal D_0$ and extends to a bounded operator on the whole space. We denote the extended operator again by $[L,b]$.

Proof. The operator $b$ is bounded since both $IQ$ and $QI$ are bounded. Furthermore, since $\|L_0R_A\|\le1 + |e|/A$ and $\|L_0Q\| = |e|$, the commutator $[L_0,b]$ is bounded. Moreover, $\|IQ\|\le C$ and $\|I\bar R_A^2IQ\|\le CA^{-2}\|(N+1)IQ\|\le2CA^{-2}\|IQ\|\le CA^{-2}$, so we also have $\|[I,b]\| < \infty$. Here we used the fact that $\mathrm{Ran}\,IQ\subset\mathrm{Ran}\,P(N\le1)$, since $I$ is linear in creation and annihilation operators and $NQ = 0$. ∎

We define the operator $[L,A]$ by $\mathcal D([L,A]) = \mathcal D(N)$ and
$$[L,A] := [L,A_0] + [L,b] = N + \lambda\tilde I + [L,b]. \tag{26}$$
Again, we point out that $[L,A]$ is to be understood as the r.h.s. of (26), with $[L,b]$ defined in Proposition 4.2. The commutator notation $[L,A]$ is chosen because one has $\langle\varphi,[L,A]\varphi\rangle = 2\,\mathrm{Re}\,\langle L\varphi,A\varphi\rangle$ with $A = A_0 + b$, in the sense of quadratic forms on $\mathcal D(L_0)\cap\mathcal D(N)\cap\mathcal D(A_0)$. Now define the operator $B$ by $\mathcal D(B) = \mathcal D(N)$ and
$$B := [L,A] - \frac{1}{10}N = \frac{9}{10}N + \lambda\tilde I + [L,b]. \tag{27}$$
Here is the main result of this section.

Theorem 4.3. Let $e\in\sigma(L_p)$, and let $\Delta$ be an interval around $e$ not containing any other eigenvalue of $L_p$. Let $E_\Delta$ be the (sharp) indicator function of $\Delta$ and set $E^0_\Delta = E_\Delta(L_0)$. Assume that the Fermi Golden Rule Condition (18) (or (19)) holds. Then there is a number $s > 0$ such that, if $0 < \theta, A, A\theta^{-1}, \theta\lambda^2A^{-3} < s$, the following holds on $\mathcal D(N^{1/2})$ in the sense of quadratic forms:
$$E^0_\Delta BE^0_\Delta \ge \frac{\theta\lambda^2}{A}\gamma_e\,E^0_\Delta\big(1 - \tfrac52\delta_{e,0}P_{\Omega_{\beta,0}}\big)E^0_\Delta, \tag{28}$$
where $P_{\Omega_{\beta,0}}$ is the projector onto the span of $\Omega_{\beta,0}$ defined in (4).
An essential ingredient of the proof of Theorem 4.3 is the Feshbach method, which we explain now.

4.2. The Feshbach method. The main idea of the Feshbach method is to use an isospectral correspondence between operators acting on a Hilbert space and operators acting on a subspace. We explain this method as adapted to our case; for a more general exposition, see e.g. [BFS2] and [DJ]. Consider the Hilbert spaces $\mathcal H_e = \mathrm{Ran}\,\chi_\nu E^0_\Delta$, where $\chi_\nu = \chi(N\le\nu)$ is a cutoff in $N$ and $\nu$ is a positive integer. With our definitions of $Q$, $\bar Q$ (see (25)), we have
$$\mathcal H_e = \mathrm{Ran}\,\chi_\nu E^0_\Delta Q\oplus\mathrm{Ran}\,\chi_\nu E^0_\Delta\bar Q. \tag{29}$$
Define $Q_1 = \chi_\nu E^0_\Delta Q$ and $Q_2 = \chi_\nu E^0_\Delta\bar Q$, and set $B_{ij} = Q_iBQ_j$, $i,j = 1,2$. The operators $B_{ij}$ are bounded due to the cutoff in $N$. Notice that $Q_{1,2}$ are projection operators (i.e. $Q_{1,2}^2 = Q_{1,2}$), since $\chi_\nu$ commutes with $E^0_\Delta$ and $Q$. The main ingredient of the Feshbach method is the following observation:
Proposition 4.4 (Isospectrality of the Feshbach map). Suppose that $z$ is in the resolvent set of $B_{22}$, i.e. $(B_{22} - z)^{-1}\restriction_{\mathrm{Ran}\,Q_2}$ exists as a bounded operator. Suppose also that
$$\|Q_2(B_{22} - z)^{-1}Q_2BQ_1\| < \infty,\qquad \|Q_1BQ_2(B_{22} - z)^{-1}Q_2\| < \infty. \tag{30}$$
Then $z\in\sigma_\#(B)\iff z\in\sigma_\#(\mathcal E_z)$, where the Feshbach map $\mathcal E_z = \mathcal E_z(B)$ is defined by
$$B\mapsto\mathcal E_z = B_{11} - B_{12}(B_{22} - z)^{-1}B_{21},$$
and $\sigma_\#$ stands for $\sigma$ or $\sigma_{pp}$ (spectrum or pure point spectrum). A proof of Proposition 4.4 in a more general setting is given e.g. in [BFS2, DJ]; we do not repeat it here. We use the isospectrality of the Feshbach map to show positivity of $B$ in the following way (see also [BFSS]).

Corollary 4.5. Let $\vartheta_0 = \inf\sigma(B\restriction_{\mathcal H_e})$. Suppose that $B_{22}\ge\tilde\vartheta Q_2$ for some $\tilde\vartheta > -\infty$, and that $\inf\sigma(\mathcal E_\vartheta)\ge F_0$ uniformly in $\vartheta\le\vartheta_1$, where $F_0$ and $\vartheta_1$ are two fixed (finite) numbers. Then $\vartheta_0\ge\min\{\tilde\vartheta,\inf\sigma(\mathcal E_{\vartheta_0})\}$.

Remarks. 1. All our estimates in this section will be independent of the $N$-cutoff introduced in (29). In particular, $\tilde\vartheta$, $\vartheta_0$, $\vartheta_1$, $F_0$ are independent of $\nu$. This will allow us to obtain inequality (28) on $\mathcal D(N^{1/2})$ from the corresponding estimate on $\mathrm{Ran}\,\chi(N\le\nu)$ by letting $\nu\to\infty$ (see (50) below). 2. The condition $\inf\sigma(\mathcal E_\vartheta)\ge F_0$ uniformly in $\vartheta\le\vartheta_1$ implies that $\vartheta_0 > -\infty$.
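Proof of Corollary 4.5. If $\vartheta_0\ge\tilde\vartheta$, the assertion is clearly true. If $\vartheta_0 < \tilde\vartheta$, then $\vartheta_0$ is in the resolvent set of $B_{22}$, and it is easy to show that (30) holds for $z = \vartheta_0$. Hence $\vartheta_0\in\sigma(\mathcal E_{\vartheta_0})$, i.e. $\vartheta_0\ge\inf\sigma(\mathcal E_{\vartheta_0})$. ∎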
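The mechanism of Proposition 4.4 and Corollary 4.5 is easy to test in finite dimensions, where all the boundedness conditions (30) hold automatically. The following NumPy sketch uses a hypothetical $6\times6$ symmetric matrix whose blocks of sizes 2 and 4 play the roles of $\mathrm{Ran}\,Q_1$ and $\mathrm{Ran}\,Q_2$. The shift of the lower block is an arbitrary choice that loosely mimics the lower bound (33). The sketch checks that $\vartheta_0 = \inf\sigma(B)$ is an eigenvalue of $\mathcal E_{\vartheta_0}$. It also checks that, for $z$ below $\inf\sigma(B)$, the operator $\mathcal E_z - z$ stays positive.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 2, 4          # dims of Ran Q1 and Ran Q2 (hypothetical toy sizes)
n = n1 + n2
X = rng.normal(size=(n, n))
# Hypothetical selfadjoint B whose Q2-block is pushed up.
B = (X + X.T) / 2 + np.diag([0.0] * n1 + [5.0] * n2)

B11, B12 = B[:n1, :n1], B[:n1, n1:]
B21, B22 = B[n1:, :n1], B[n1:, n1:]


def feshbach(z):
    """Feshbach map E_z = B11 - B12 (B22 - z)^{-1} B21, for z not in sigma(B22)."""
    return B11 - B12 @ np.linalg.solve(B22 - z * np.eye(n2), B21)


theta0 = np.linalg.eigvalsh(B).min()          # inf sigma(B), cf. Corollary 4.5
theta_tilde = np.linalg.eigvalsh(B22).min()   # lower bound on B22

# Isospectrality: theta0 lies below sigma(B22), so it is an eigenvalue of E_{theta0}.
gap_at_theta0 = np.min(np.abs(np.linalg.eigvalsh(feshbach(theta0)) - theta0))

# Below inf sigma(B), E_z - z stays positive (z is not in the spectrum of B).
z_off = theta0 - 0.5
gap_off = np.linalg.eigvalsh(feshbach(z_off)).min() - z_off

print(theta0, theta_tilde, gap_at_theta0, gap_off)
```

The second check reflects the Schur-complement fact that $B - z\ge c > 0$ implies $\mathcal E_z - z\ge c$. This is a finite-dimensional analogue of how positivity of $\mathcal E_\vartheta$ is transferred back to $B$ in the proof of Theorem 4.3.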
4.3. Proof of Theorem 4.3 (using the Feshbach method). We apply Corollary 4.5 to the operator
$$B' = B - \delta_{e,0}\,\delta\,P^\perp_{\Omega_{\beta,0}}, \tag{31}$$
where $\delta_{e,0}$ is the Kronecker symbol, i.e. $\delta_{e,0}$ is one if $e = 0$ and zero otherwise. The positive number $\delta$ will be chosen appropriately below, see after (48). First, we show that $B'_{22}\ge(3/4 - \delta_{e,0}\delta)Q_2$ (see (33)). Then we show that $\mathcal E_\vartheta\ge-1 - \delta_{e,0}\delta =: F_0$ uniformly in $\vartheta\le1/2 - \delta_{e,0}\delta$ (see Proposition 4.6). Invoking Corollary 4.5 then yields the result. Notice that, due to the cutoff $\chi_\nu$ in (29), the $B'_{ij}$, $i,j\in\{1,2\}$, are bounded operators. All the following estimates are independent of $\nu$.

We first calculate $B'_{22} = Q_2B'Q_2$. Using $Q\bar Q = 0$ and $\delta_{e,0}P^\perp_{\Omega_{\beta,0}}Q_2 = \delta_{e,0}Q_2$, we obtain from (31) and (27)
$$B'_{22} = Q_2\Big(\frac{9}{10}N + \lambda\tilde I - \theta\lambda^2\big(\bar R_A^2IQI + IQI\bar R_A^2\big) - \delta_{e,0}\delta\Big)Q_2. \tag{32}$$
Proceeding as in the proof of Proposition A.1, one shows that, for all $c > 0$,
$$|\langle\psi,\lambda\tilde I\psi\rangle|\le c\|N^{1/2}\psi\|^2 + C\frac{\lambda^2}{c}\|\partial_ug_1\|^2_{L^2}\|\psi\|^2.$$
With our assumptions on $g$, $\|\partial_ug_1\|_{L^2} < \infty$ uniformly in $\beta\ge1$. Using the inequality above with $c = 1/10$ and $\|\bar R_A^2IQI\|\le CA^{-2}$, we obtain
$$B'_{22}\ge Q_2\Big(\frac{8}{10}N - O(\lambda^2 + \theta\lambda^2A^{-2}) - \delta_{e,0}\delta\Big)Q_2.$$
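As can easily be checked, $Q_2 = Q_2P_\Omega^\perp$, so $NQ_2\ge Q_2$. We conclude that there is an $s_1 > 0$ such that, if $\lambda^2 + \theta\lambda^2A^{-2}\le s_1$, then
$$B'_{22}\ge\Big(\frac{8}{10} - \delta_{e,0}\delta - O(\lambda^2 + \theta\lambda^2A^{-2})\Big)Q_2\ge\Big(\frac34 - \delta_{e,0}\delta\Big)Q_2. \tag{33}$$
In the language of Corollary 4.5, this means we can take $\tilde\vartheta = 3/4 - \delta_{e,0}\delta$. In the next step, we calculate a lower bound on $\mathcal E_\vartheta$ for $\vartheta\le1/2 - \delta_{e,0}\delta$.

Proposition 4.6. Uniformly in $\vartheta\le1/2 - \delta_{e,0}\delta$, we have
$$\mathcal E_\vartheta\ge2\pi\frac{\theta\lambda^2}{A}\,Q_1\Big((1 - 5\theta)\Gamma(e) - \frac{A\delta_{e,0}\delta}{2\theta\lambda^2}P^\perp_{\Omega_\beta^p} - O\big(A^{1/4} + A\theta^{-1} + \theta\lambda^2A^{-3}\big)\Big)Q_1, \tag{34}$$
where the error term is independent of $\delta$. Recall that $\Omega_\beta^p$ is the particle Gibbs state defined in (21).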
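Proof of Proposition 4.6. By definition, $\mathcal E_\vartheta = B'_{11} - B'_{12}(B'_{22} - \vartheta)^{-1}B'_{21}$. We show that $B'_{11}$ is positive and that $B'_{12}(B'_{22} - \vartheta)^{-1}B'_{21}$ is small compared to $B'_{11}$. We use $\bar QQ_1 = 0$, $QQ_1 = Q_1$ and $\delta_{e,0}P^\perp_{\Omega_{\beta,0}}Q_1 = \delta_{e,0}P^\perp_{\Omega_\beta^p}Q_1$. With these, (31) and (27) give
$$B'_{11}\ge2\theta\lambda^2Q_1\Big(I\bar R_A^2I - \frac{\delta_{e,0}\delta}{2\theta\lambda^2}P^\perp_{\Omega_\beta^p}\Big)Q_1 - O(\lambda^2), \tag{35}$$
where we used $\lambda\tilde I\ge-\frac{1}{10}N - O(\lambda^2)$ and $Q_1N = 0$. Let us now examine $B'_{12}(B'_{22} - \vartheta)^{-1}B'_{21}$. Notice that from (32) we get
$$Q_2(B'_{22} - \vartheta)Q_2 = \frac{9}{10}Q_2N^{1/2}\Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1} + K_1\Big)N^{1/2}Q_2, \tag{36}$$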
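where we defined the bounded selfadjoint operator $K_1$ acting on $\mathrm{Ran}\,Q_2$ as
$$K_1 = \frac{10}{9}N^{-1/2}\Big(\lambda\tilde I - \theta\lambda^2\big(\bar R_A^2IQI + IQI\bar R_A^2\big)\Big)N^{-1/2}. \tag{37}$$
Since $\|Q_2N^{-1/2}\|\le1$ and $\|\tilde I(N+1)^{-1/2}\|\le C$, we get $\|K_1\|\le C(\lambda + \theta\lambda^2A^{-2})$. Now, on $\mathrm{Ran}\,P_\Omega^\perp$ we have $N\ge1$. Since we consider $\vartheta$ with $\vartheta + \delta_{e,0}\delta\le1/2$, we therefore obtain
$$1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\ge1 - \frac{10}{9}\cdot\frac12 = \frac49. \tag{38}$$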
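Therefore we can rewrite (36) as
$$Q_2(B'_{22} - \vartheta)Q_2 = \frac{9}{10}Q_2N^{1/2}\Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\Big)^{1/2}(1 + K_2)\Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\Big)^{1/2}N^{1/2}Q_2, \tag{39}$$
where
$$K_2 = \Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\Big)^{-1/2}K_1\Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\Big)^{-1/2}$$
and $\|K_2\|\le\frac94\|K_1\| = O(\lambda + \theta\lambda^2A^{-2})\ll1$.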
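From (39) we thus have
$$Q_2(B'_{22} - \vartheta)^{-1}Q_2 = \frac{10}{9}Q_2N^{-1/2}\Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\Big)^{-1/2}K^2\Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\Big)^{-1/2}N^{-1/2}Q_2, \tag{40}$$
where $K = (1 + K_2)^{-1/2}$ is bounded and selfadjoint with $\|K\|^2 = \|K^2\| = \|(1 + K_2)^{-1}\|\le\frac{1}{1 - \|K_2\|} < 2$. Therefore, from (40) and (38), uniformly in $\vartheta\le1/2 - \delta_{e,0}\delta$,
$$\langle\psi,B'_{12}(B'_{22} - \vartheta)^{-1}B'_{21}\psi\rangle = \frac{10}{9}\Big\|K\Big(1 - \frac{10}{9}(\vartheta + \delta_{e,0}\delta)N^{-1}\Big)^{-1/2}N^{-1/2}B'_{21}\psi\Big\|^2\le\frac{10}{9}\cdot2\cdot\frac94\|N^{-1/2}B'_{21}\psi\|^2 = 5\|N^{-1/2}B'_{21}\psi\|^2. \tag{41}$$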
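The constant 5 in (41) and the bound $\|K\|^2 < 2$ come from elementary estimates. The following short Python sketch double-checks them in two ways. It uses exact rational arithmetic for $\frac{10}{9}\cdot2\cdot\frac94$. It also tests the Neumann-series bound $\|(1 + K_2)^{-1}\|\le(1 - \|K_2\|)^{-1}$ on a hypothetical random selfadjoint $K_2$, scaled so that $\|K_2\| = 0.4$; any norm below $1/2$ gives a bound below 2.

```python
import numpy as np
from fractions import Fraction

# Constant in (41): (10/9) * ||K||^2 * ||(1 - (10/9)(theta + delta) N^{-1})^{-1}||,
# with ||K||^2 < 2 and the last factor <= (4/9)^{-1} = 9/4 by (38).
constant = Fraction(10, 9) * 2 * Fraction(9, 4)

# Neumann-series bound ||(1 + K2)^{-1}|| <= 1/(1 - ||K2||), checked on a
# hypothetical selfadjoint K2 scaled to ||K2|| = 0.4 < 1/2.
rng = np.random.default_rng(2)
X = rng.normal(size=(5, 5))
K2 = (X + X.T) / 2
K2 *= 0.4 / np.linalg.norm(K2, 2)

norm_K_squared = np.linalg.norm(np.linalg.inv(np.eye(5) + K2), 2)  # ||K||^2 = ||(1+K2)^{-1}||
neumann_bound = 1.0 / (1.0 - np.linalg.norm(K2, 2))

print(constant, norm_K_squared, neumann_bound)
```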
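Notice that $B'_{12} = B_{12}$ and $B'_{21} = B_{21}$. Now remember (27), and recall that $NQ_1 = 0$ and $Q_2Q = 0 = \bar QQ_1$. Then
$$N^{-1/2}B_{21} = N^{-1/2}Q_2\Big(\lambda\tilde I + \theta\lambda(L_0 - e)\bar R_A^2I - \theta\lambda\bar R_A^2I(L_0 - e) + \theta\lambda^2\big(I\bar R_A^2I - \bar R_A^2IQI\big)\Big)Q_1.$$
We now use $\|N^{-1/2}Q_2\|\le1$, $\|\tilde IQ_1\|\le C$, $\|IQ_1\|\le C$, $\|N^{-1/2}I\|\le C$, $(L_0 - e)Q_1 = 0$ and $\|(L_0 - e)R_A\|\le1$. This gives
$$\|N^{-1/2}B_{21}\psi\|^2\le C(\lambda^2 + \theta^2\lambda^4A^{-4})\|\psi\|^2 + 2\theta^2\lambda^2\|\bar R_AIQ_1\psi\|^2.$$
Thus, with (41), we obtain
$$-\langle\psi,B'_{12}(B'_{22} - \vartheta)^{-1}B'_{21}\psi\rangle\ge-10\theta^2\lambda^2\langle\psi,Q_1I\bar R_A^2IQ_1\psi\rangle - O(\lambda^2 + \theta^2\lambda^4A^{-4})\|\psi\|^2,$$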
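and so, together with (35), we get, uniformly in $\vartheta\le1/2 - \delta_{e,0}\delta$,
$$\mathcal E_\vartheta\ge2\theta\lambda^2Q_1\Big((1 - 5\theta)I\bar R_A^2I - \frac{\delta_{e,0}\delta}{2\theta\lambda^2}P^\perp_{\Omega_\beta^p}\Big)Q_1 - O(\lambda^2 + \theta^2\lambda^4A^{-4}). \tag{42}$$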
We point out that the error term in the last inequality does not depend on $\delta$. With the choice of parameters we will make (see (68)), (42) shows that $\mathcal E_\vartheta\ge-1 - \delta_{e,0}\delta$ uniformly in $\vartheta\le1/2 - \delta_{e,0}\delta$. In the language of Corollary 4.5, this means $F_0 = -1 - \delta_{e,0}\delta$. The remaining part of the proof consists in relating the strict positivity of the non-negative operator $Q_1I\bar R_A^2IQ_1$ to the Fermi Golden Rule Condition. Let $I_a$ and $I_c = I_a^*$ denote the parts of $I$ containing only annihilators and only creators, respectively, so that $I = I_a + I_c$. Thus
$$Q_1I\bar R_A^2IQ_1 = Q_1I_a\bar R_A^2I_cQ_1 = Q_1I_aR_A^2I_cQ_1. \tag{43}$$
In the first step we used $I_aQ_1 = 0$ and $Q_1I_c = 0$ (since $I_aP_\Omega = 0$). In the second step we used $Q_1I_a\bar Q = Q_1I_a$ (since $I_aQ = 0$). Now write
$$Q_1I_aR_A^2I_cQ_1 = Q_1\int\!\!\int m^*(u,\alpha)\,a(u,\alpha)R_A^2(e)\,a^*(u',\alpha')\,m(u',\alpha')\;Q_1, \tag{44}$$
where $m$ is defined in (17), and where we display the dependence of $R_A^2$ on $e$. The operator-valued distributions $a$ and $a^*$ satisfy the canonical commutation relations $[a(u,\alpha),a^*(u',\alpha')] = \delta(u - u')\delta(\alpha - \alpha')$. Next, we notice that the pull-through formula $a(u,\alpha)L_f = (L_f + u)a(u,\alpha)$ implies
$$a(u,\alpha)R_A^2(e) = R_A^2(e - u)\,a(u,\alpha). \tag{45}$$
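The step from the pull-through formula to (45) can be spelled out as follows. This is a formal sketch for bounded Borel functions $F$. It assumes, as elsewhere in this section, that $L_0 = L_p + L_f$ and that $L_p$ commutes with $a(u,\alpha)$ because it acts on a different tensor factor. The definition $R_A^2(e) = ((L_0 - e)^2 + A^2)^{-1}$ is taken from (24).

```latex
\begin{aligned}
a(u,\alpha)\,L_f &= (L_f + u)\,a(u,\alpha)
  \;\Longrightarrow\; a(u,\alpha)\,F(L_0) = F(L_0 + u)\,a(u,\alpha),\\
a(u,\alpha)\,R_A^2(e) &= \big((L_0 + u - e)^2 + A^2\big)^{-1}a(u,\alpha)
  = \big((L_0 - (e - u))^2 + A^2\big)^{-1}a(u,\alpha)
  = R_A^2(e - u)\,a(u,\alpha).
\end{aligned}
```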
We now use the CCR and formula (45), together with the fact that $a(u,\alpha)Q_1 = 0$. Commuting $a(u,\alpha)$ in (44) to the right, we arrive at
$$(44) = Q_1\int m^*(u,\alpha)\,R_A^2(e - u)\,m(u,\alpha)\;Q_1. \tag{46}$$
We can pull a factor $P_\Omega$ out of $Q_1$ and place it inside the integral next to $R_A^2(e - u)$. This allows us to replace $R_A^2(e - u)$ by $\big((L_p - e + u)^2 + A^2\big)^{-1}$. Notice that $A\big((L_p - e + u)^2 + A^2\big)^{-1}\to\pi\delta(L_p - e + u)$ as $A\to0$. More precisely, we have
Proposition 4.7. There is an $s_2 > 0$ such that, for $0 < A < s_2$,
$$Q_1\int m^*(u,\alpha)\big((L_p - e + u)^2 + A^2\big)^{-1}m(u,\alpha)\;Q_1\ge\frac{\pi}{A}\Big(Q_1\Gamma(e) - O(A^{1/4})Q_1\Big).$$
We prove Proposition 4.7 in Appendix A.3. Together with (42)–(44) and (46), it yields (34), which proves Proposition 4.6. ∎

Now we finish the proof of Theorem 4.3. If the Fermi Golden Rule Condition (18) holds, then for $e\ne0$ we have $\Gamma(e)\ge\gamma_e > 0$ on $\mathrm{Ran}\,Q_1$. Under the conditions on the parameters stated in Theorem 4.3, (34) then gives $\mathcal E_\vartheta\ge\pi\frac{\theta\lambda^2}{A}\gamma_e$, and hence, by Corollary 4.5,
$$\inf\sigma(B\restriction_{\mathcal H_e})\ge\min\{1/2,\ \pi\theta\lambda^2A^{-1}\gamma_e\} = \pi\frac{\theta\lambda^2}{A}\gamma_e, \tag{47}$$
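since, by our choice of the parameters (see (68)), we will have $\theta\lambda^2/A < (2\pi\gamma_e)^{-1}$. For $e = 0$, we have $\Gamma(0) = \Gamma(0)P^\perp_{\Omega_\beta^p}$, since $\Gamma(0)\Omega_\beta^p = 0$ (see Theorem 2.4). Proposition 4.6 therefore gives
$$\mathcal E_\vartheta\ge\pi\frac{\theta\lambda^2}{A}Q_1\Big(\Big(\gamma_0 - \frac{A\delta}{2\theta\lambda^2}\Big)P^\perp_{\Omega_\beta^p} - O\big(A^{1/4} + A\theta^{-1} + \theta\lambda^2A^{-3}\big)\Big)Q_1. \tag{48}$$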
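Fix some $0 < a < \frac{\gamma_0}{2(\pi - 1)}$ (independent of $\theta$, $\lambda$, $A$). Then there is an $s_3 > 0$ such that, if $0 < \theta\lambda^2A^{-1} < s_3$, we have $\gamma_0 - \frac{A\delta}{2\theta\lambda^2} > -a$. Together with (48), this gives
$$\mathcal E_\vartheta\ge\pi\frac{\theta\lambda^2}{A}Q_1\Big(-aP^\perp_{\Omega_\beta^p} - O\big(A^{1/4} + A\theta^{-1} + \theta\lambda^2A^{-3}\big)\Big)Q_1\ge\pi\frac{\theta\lambda^2}{A}\Big(-a - O\big(A^{1/4} + A\theta^{-1} + \theta\lambda^2A^{-3}\big)\Big)Q_1\ge-2\pi a\frac{\theta\lambda^2}{A}Q_1.$$
The last step holds provided $A^{1/4} + A\theta^{-1} + \theta\lambda^2A^{-3} < s_4$ for some small $s_4 > 0$. Remembering that $B' = B - \delta P^\perp_{\Omega_{\beta,0}}$, we obtain from Corollary 4.5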
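$$\inf\sigma\big((B - \delta P^\perp_{\Omega_{\beta,0}})\restriction_{\mathcal H_0}\big)\ge\min\{1/2,\ -2\pi a\theta\lambda^2/A\} = -2\pi a\frac{\theta\lambda^2}{A}.$$
We conclude that, if the condition on the parameters given in Theorem 4.3 is satisfied with $s = \min(s_1,s_2,s_3,s_4)$, then
$$\begin{aligned}
\chi_\nu E^0_\Delta BE^0_\Delta\chi_\nu &\ge \chi_\nu E^0_\Delta\Big(-2\pi a\frac{\theta\lambda^2}{A} + \delta P^\perp_{\Omega_{\beta,0}}\Big)E^0_\Delta\chi_\nu\\
&= 2\frac{\theta\lambda^2}{A}\gamma_0\,\chi_\nu E^0_\Delta\Big(1 - a(\pi - 1)/\gamma_0 - (1 + a/\gamma_0)P_{\Omega_{\beta,0}}\Big)E^0_\Delta\chi_\nu\\
&\ge \frac{\theta\lambda^2}{A}\gamma_0\,\chi_\nu E^0_\Delta\big(1 - \tfrac52P_{\Omega_{\beta,0}}\big)E^0_\Delta\chi_\nu,
\end{aligned}\tag{49}$$
where we used $a/\gamma_0\le\frac{1}{2(\pi - 1)}$.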
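The last inequality in (49) is a scalar statement in disguise. $P_{\Omega_{\beta,0}}$ is a projection commuting with $\chi_\nu$ and $E^0_\Delta$, because $\Omega_{\beta,0}$ is an eigenvector of both $N$ and $L_0$. It therefore suffices to divide by $2\theta\lambda^2\gamma_0/A$ and compare eigenvalues on $\mathrm{Ran}\,P^\perp_{\Omega_{\beta,0}}$ and on $\mathrm{Ran}\,P_{\Omega_{\beta,0}}$. Writing $x = a/\gamma_0$, the two operators to compare are $1 - x(\pi - 1) - (1 + x)P_{\Omega_{\beta,0}}$ and $\frac12\big(1 - \frac52P_{\Omega_{\beta,0}}\big)$. The following Python sketch checks this on a grid of $x\in[0,\frac{1}{2(\pi - 1)}]$; the worst case reduces to $\pi\ge3$.

```python
import math

# Bound on x = a / gamma_0 used in (49).
x_max = 1 / (2 * (math.pi - 1))


def coefficients(x):
    """Eigenvalues of 1 - x(pi - 1) - (1 + x) P on Ran P^perp and on Ran P."""
    on_complement = 1 - x * (math.pi - 1)
    on_range = on_complement - (1 + x)
    return on_complement, on_range


# Target operator (1/2)(1 - (5/2) P): eigenvalue 1/2 on Ran P^perp, -3/4 on Ran P.
tol = 1e-12
samples = [x_max * k / 1000 for k in range(1001)]
holds = all(
    c_perp >= 0.5 - tol and c_range >= -0.75 - tol
    for c_perp, c_range in map(coefficients, samples)
)
worst_case = math.pi / (2 * (math.pi - 1))  # |on_range| at x = x_max equals x_max * pi

print(holds, worst_case)
```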
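Estimates (47) and (49) yield, for all $\psi$,
$$\langle\psi,\chi_\nu E^0_\Delta BE^0_\Delta\chi_\nu\psi\rangle\ge\frac{\theta\lambda^2}{A}\gamma_e\,\big\langle\psi,\chi_\nu E^0_\Delta\big(1 - \tfrac52\delta_{e,0}P_{\Omega_{\beta,0}}\big)E^0_\Delta\chi_\nu\psi\big\rangle. \tag{50}$$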
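Suppose now that $\psi\in\mathcal D(N^{1/2})$. The operator $(N + 1)^{-1/2}B(N + 1)^{-1/2}$ is bounded (see the definition (27) of $B$), and $\chi_\nu\to1$ strongly as $\nu\to\infty$. We conclude that, for all $\psi\in\mathcal D(N^{1/2})$,
$$\langle\psi,E^0_\Delta BE^0_\Delta\psi\rangle\ge\frac{\theta\lambda^2}{A}\gamma_e\,\big\langle\psi,E^0_\Delta\big(1 - \tfrac52\delta_{e,0}P_{\Omega_{\beta,0}}\big)E^0_\Delta\psi\big\rangle,$$
which proves Theorem 4.3. ∎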
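The limit behind Proposition 4.7, $\frac{A}{\pi}(x^2 + A^2)^{-1}\to\delta(x)$ as $A\to0$, can be seen numerically. The sketch below (Python with NumPy) pairs the normalized Lorentzian with the hypothetical test function $f(x) = e^{-x^2}$. For this $f$ the exact value $e^{A^2}\operatorname{erfc}(A)$ is known, and the sketch shows convergence to $f(0) = 1$. The linear rate seen here is specific to this smooth test function. It says nothing about the $O(A^{1/4})$ remainder in Proposition 4.7, which is proved in Appendix A.3 under the paper's assumptions.

```python
import math
import numpy as np


def lorentzian_pairing(f, A, x_max=50.0, n=400_001):
    """Trapezoid approximation of (A/pi) * int f(x) / (x^2 + A^2) dx over [-x_max, x_max]."""
    x = np.linspace(-x_max, x_max, n)
    y = (A / math.pi) / (x**2 + A**2) * f(x)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))


def gaussian(x):
    return np.exp(-x**2)


# For f(x) = exp(-x^2), the pairing over the whole line equals exp(A^2) * erfc(A),
# which tends to f(0) = 1 as A -> 0.
results = {A: lorentzian_pairing(gaussian, A) for A in (0.5, 0.1, 0.01)}
exact = {A: math.exp(A**2) * math.erfc(A) for A in results}
print(results)
```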
5. Proof of Theorem 2.1: Step 2

We pass from the positive commutator estimate with respect to $L_0$ given in Theorem 4.3 to one with respect to the full Liouvillian $L$, thereby proving Theorem 2.1. The essential ingredient of this procedure is the IMS localization formula, which we apply to a partition of unity with respect to $N$. We then carry out the estimates on each piece of the partition separately.

5.1. PC with respect to spectral localization in $L$. Let $1 = \hat\chi_1^2(x) + \hat\chi_2^2(x)$, $x\in\mathbb R_+$, $\hat\chi_1^2\in C_0^\infty([0,1])$, be a $C^\infty$ partition of unity. For some scaling parameter $\sigma\gg1$, define $\chi_i = \chi_i(N) = \hat\chi_i(N/\sigma)$, $i = 1,2$. We introduce the partition of unity because $I\chi_1$ is bounded, with $\|I\chi_1\| = O(\sigma^{1/2})$. Since the $\chi_i$ leave $\mathcal D(N^{1/2})$ invariant, the double commutator $[\chi_i,[\chi_i,B]] = \chi_i^2B - 2\chi_iB\chi_i + B\chi_i^2$ is well defined on $\mathcal D(N^{1/2})$ in the sense of quadratic forms. Summing over $i = 1,2$, we get the so-called IMS localization formula (see also [CFKS]):
$$B = \sum_{i=1,2}\Big(\chi_iB\chi_i + \tfrac12[\chi_i,[\chi_i,B]]\Big). \tag{51}$$
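The IMS formula (51) is purely algebraic: it only uses $\chi_1^2 + \chi_2^2 = 1$. It can therefore be verified on matrices. The following NumPy sketch builds one hypothetical smooth partition of unity with $\hat\chi_1 = 1$ on $[0,\frac12]$ and $\hat\chi_1 = 0$ on $[1,\infty)$. It applies the partition to a truncated number operator and checks (51) for a random symmetric stand-in for $B$. The value $\sigma = 8$ and the truncation at 40 levels are arbitrary.

```python
import numpy as np


def smooth_step(t):
    """C^infinity step function: 0 for t <= 0, 1 for t >= 1."""
    t = np.asarray(t, dtype=float)

    def h(s):
        out = np.zeros_like(s)
        pos = s > 0
        out[pos] = np.exp(-1.0 / s[pos])
        return out

    return h(t) / (h(t) + h(1.0 - t))


def partition(x):
    """Hypothetical chi_hat_1, chi_hat_2 with chi_hat_1^2 + chi_hat_2^2 = 1.

    chi_hat_1 = 1 on [0, 1/2] and chi_hat_1 = 0 on [1, infinity).
    """
    s = smooth_step(2.0 * x - 1.0)
    return np.cos(0.5 * np.pi * s), np.sin(0.5 * np.pi * s)


sigma = 8.0                        # scaling parameter (illustrative value)
levels = np.arange(40)             # eigenvalues 0, 1, ..., 39 of a truncated N
c1, c2 = partition(levels / sigma)
chi1, chi2 = np.diag(c1), np.diag(c2)   # chi_i = chi_hat_i(N / sigma)

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 40))
B = (X + X.T) / 2                  # stand-in for a selfadjoint B


def comm(P, Q):
    return P @ Q - Q @ P


# IMS localization formula (51).
ims = sum(c @ B @ c + 0.5 * comm(c, comm(c, B)) for c in (chi1, chi2))
print(np.abs(ims - B).max())
```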
Furthermore, we obtain from (51) and (27), in the sense of quadratic forms on $\mathcal D(N^{1/2})$,
$$h(L)[L,A]h(L) = \frac{1}{10}h(L)Nh(L) + \sum_{i=1,2}h(L)\chi_iB\chi_ih(L) + \frac12\sum_{i=1,2}h(L)[\chi_i,[\chi_i,B]]h(L). \tag{52}$$
In Propositions 5.1–5.3 below, we estimate the different terms on the r.h.s. of (52). We then complete the proof of Theorem 2.1 by choosing suitable relations among the parameters $\theta$, $\lambda$, $A$, $\sigma$ (see (68)).

Proposition 5.1. There is an $s_5 > 0$ such that, if $\lambda^2\sigma^{-1} < s_5$, then
$$h\chi_2B\chi_2h\ge\frac{\sigma}{2}h\chi_2^2h. \tag{53}$$
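Proof. Recall that $B = \frac{9}{10}N + \lambda\tilde I + [L,b]$. Since $Q\chi_2 = 0$ and $QI\chi_2 = 0$ (see also the end of the proof of Proposition 4.2), we have $\langle\psi,\chi_2[L,b]\chi_2\psi\rangle = 0$ for all $\psi$. Furthermore, Proposition 6.1 gives $\lambda\tilde I\ge-cN - O(\lambda^2/c)$ for all $c > 0$, so
$$\langle\psi,\chi_2(9N/10 + \lambda\tilde I)\chi_2\psi\rangle\ge\big\langle\psi,\chi_2\big((\tfrac{9}{10} - c)N - O(\lambda^2/c)\big)\chi_2\psi\big\rangle\ge\tfrac34\sigma\langle\psi,\chi_2^2\psi\rangle,$$
provided $\lambda^2\sigma^{-1} < s_5$. Here we picked the value $c = 1/10$ and used $\chi_2N\chi_2\ge\sigma\chi_2^2$. ∎

Proposition 5.2. We have
$$h\chi_1B\chi_1h + \frac{1}{10}hNh\ge\frac{\theta\lambda^2}{A}\gamma_e\big(1 - O(\lambda\sigma^{1/2})\big)h\chi_1^2h - \frac52\frac{\theta\lambda^2}{A}\gamma_0\delta_{e,0}\,hP_{\Omega_{\beta,0}}h - \frac{\theta\lambda^2}{A}O\big(A\theta^{-1} + A\sigma^{1/2} + \lambda\sigma A^{-1}\big)\|h\|^2.$$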
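Proof. Let $F^0_{\Delta'} := F_{\Delta'}(L_0)$, where $\Delta'$ is an interval whose interior contains the closure of $\Delta$. Here $F_{\Delta'}$ is a smooth characteristic function with support in $\Delta'$ such that $E_\Delta(L_0)\bar F^0_{\Delta'} = 0$, where we denote $\bar F^0_{\Delta'} := 1 - F^0_{\Delta'}$. We take $\Delta'$ to contain only one eigenvalue of $L_0$, namely $e$, so that (28) in Theorem 4.3 holds with $E^0_\Delta$ replaced by $E^0_{\Delta'}$. We have
\begin{align}
h\chi_1B\chi_1h + \tfrac{1}{10}hNh = {}& h\chi_1F^0_{\Delta'}BF^0_{\Delta'}\chi_1h \tag{54}\\
&+ \tfrac{1}{20}hNh + h\chi_1F^0_{\Delta'}B\bar F^0_{\Delta'}\chi_1h + \text{adjoint} \tag{55}\\
&+ h\chi_1\bar F^0_{\Delta'}B\bar F^0_{\Delta'}\chi_1h. \tag{56}
\end{align}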
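First, we show that (55) and (56) are bounded below by small terms. To treat (55), notice that
$$\chi_1F^0_{\Delta'}B\bar F^0_{\Delta'}\chi_1 = \chi_1F^0_{\Delta'}(9N/10 + \lambda\tilde I + [L,b])\bar F^0_{\Delta'}\chi_1 = \tfrac{9}{10}\chi_1^2F^0_{\Delta'}\bar F^0_{\Delta'}N + \chi_1F^0_{\Delta'}(\lambda\tilde I + [L,b])\bar F^0_{\Delta'}\chi_1\ge\chi_1F^0_{\Delta'}(\lambda\tilde I + [L,b])\bar F^0_{\Delta'}\chi_1.$$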
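Now, for $\varphi_{1,2}\in\mathcal D(N^{1/2})$ and any $c > 0$, we have (see Proposition A.1)
$$\begin{aligned}
|\langle\varphi_1,\lambda\tilde I\varphi_2\rangle| &\le \lambda\big(|\langle\varphi_1,\tilde I_a\varphi_2\rangle| + |\langle\varphi_2,\tilde I_a\varphi_1\rangle|\big)\\
&\le C\lambda\big(\|\varphi_1\|\,\|N^{1/2}\varphi_2\| + \|\varphi_2\|\,\|N^{1/2}\varphi_1\|\big)\\
&\le C\lambda^2c^{-1}\big(\|\varphi_1\|^2 + \|\varphi_2\|^2\big) + c\big(\|N^{1/2}\varphi_1\|^2 + \|N^{1/2}\varphi_2\|^2\big).
\end{aligned}$$
With $\varphi_1 = F^0_{\Delta'}\chi_1\psi$ and $\varphi_2 = \bar F^0_{\Delta'}\chi_1\psi$, this yields, for all $c > 0$,
$$|\langle\psi,\chi_1F^0_{\Delta'}\lambda\tilde I\bar F^0_{\Delta'}\chi_1\psi\rangle|\le C\frac{\lambda^2}{c}\,2\|\chi_1\psi\|^2 + 2c\|N^{1/2}\chi_1\psi\|^2, \tag{57}$$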
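so $\chi_1F^0_{\Delta'}\lambda\tilde I\bar F^0_{\Delta'}\chi_1 + \text{adjoint}\ge-4\big(C\frac{\lambda^2}{c}\chi_1^2 + cN\big)$. Taking $c < \frac{1}{40}$ then gives
$$\begin{aligned}
\tfrac{1}{20}hNh + h\chi_1F^0_{\Delta'}\lambda\tilde I\bar F^0_{\Delta'}\chi_1h + \text{adjoint} &\ge \big(\tfrac{1}{10} - 4c\big)hNh - C\lambda^2h\chi_1^2h\\
&\ge -C\lambda^2h\chi_1^2h.
\end{aligned}\tag{58}$$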
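Next, using $Q\bar F^0_{\Delta'} = 0$ and $(L_0 - e)Q = 0$, we calculate
$$\begin{aligned}
\chi_1F^0_{\Delta'}[L,b]\bar F^0_{\Delta'}\chi_1 &= \chi_1F^0_{\Delta'}[L_0 - e,b]\bar F^0_{\Delta'}\chi_1 + \lambda\chi_1F^0_{\Delta'}[I,b]\bar F^0_{\Delta'}\chi_1\\
&= \theta\lambda\,\chi_1F^0_{\Delta'}QI\bar R_A^2(L_0 - e)\bar F^0_{\Delta'}\chi_1\\
&\quad + \theta\lambda^2\chi_1F^0_{\Delta'}\big(-\bar R_A^2IQI - IQI\bar R_A^2 + QI\bar R_A^2I\big)\bar F^0_{\Delta'}\chi_1\\
&= O(\theta\lambda + \theta\lambda^2A^{-2}\sigma^{1/2}),
\end{aligned}\tag{59}$$
where we used $\|R_A\bar F^0_{\Delta'}\|\le|\Delta'|^{-1}\le C$ and $\|I\chi_1\|\le C\sigma^{1/2}$. Next, since $\mathrm{supp}\,h\cap\mathrm{supp}\,\bar F_{\Delta'} = \emptyset$, we have $\chi_1\bar F^0_{\Delta'}h(L) = \chi_1\bar F^0_{\Delta'}(h(L) - h(L_0))$. Using the operator calculus introduced in Appendix A.4, we therefore obtain
$$\chi_1\bar F^0_{\Delta'}h(L) = \chi_1\int d\tilde F_{\Delta'}(z)\,(L_0 - z)^{-1}\lambda I(L - z)^{-1}h(L) = O(\lambda\sigma^{1/2}). \tag{60}$$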
From (59), we then have hχ1 F40 [L, b]F40 χ1 h ≥ −C θλA (Aσ 1/2 + λσ A −1 )h2 , which, together with (58) and (57) yields 2
(55) ≥ −C
θ λ2 (Aθ −1 + Aσ 1/2 + λσ A −1 )h2 . A
(61)
Our next step is estimating (56). Again, using QF40 = 0, we get χ1 F40 BF40 χ1
2 2 = χ1 F40 (9N/10 + λI˜)F40 χ1 − θλ2 χ1 F40 R A I QI + I QIR A F40 χ1 ≥ −C(λ2 + θ λ2 ),
where we used λI˜ ≥ −cN − O(λ2 /c) and F40 RA2 ≤ |4 |−2 ≤ C. We thus obtain, since θ << 1: (56) = hχ1 F40 BF40 χ1 h ≥ −C
θλ2 A 2 h . A θ
(62)
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
347
Finally, we investigate the positive term (54). By sandwiching (28) in Theorem 4.3 (with 0 replaced by E 0 ) with F 0 , and noticing that F 0 E 0 = F 0 , we arrive at E4 4 4 4 4 4
θλ2 γe hχ1 F40 1 − 25 δe,0 P#β,0 F40 χ1 h A
θλ2 γe h χ12 (F40 )2 − 25 δe,0 P#β,0 h A 2 θλ2 γe h χ12 1 − F40 − 25 δe,0 P#β,0 h A
θλ2 γe h χ12 1 − 2F40 − 25 δe,0 P#β,0 h A
θλ2 γe h χ12 (1 − Cλσ 1/2 ) − 25 δe,0 P#β,0 h, A
hχ1 F40 BF40 χ1 h ≥ π ≥ = ≥ ≥
(63)
where we used (60) in the last step once again, and −2χ12 (F40 )2 P#β,0 ≥ −2P#β,0 in the second step. Combining (63) with (61) and (62) yields Proposition 5.2. & ' 2 Proposition 5.3. We have 1,2 h[χi , [χi , B]]h = θλA O(Aθ −1 λ−1 σ −3/2 )h2 . Proof. Notice that χ1 and 1 − χ2 have compact supports contained in [0, 2]. Now in the double commutator, we can replace χ2 by 1 − χ2 without changing its value. So it suffices to estimate [χ (N/σ ), [χ (N/σ ), B]], where χ ∈ C0∞ ([0, 2]). We have [χ (N/σ ), [χ (N/σ ), B]] = [χ (N/σ ), [χ (N/σ ), λI˜ + [L, b]]]. It is not difficult to see that we have in the sense of operators on D(N 1/2 ):
λ [χ (N/σ ), [χ (N/σ ), λI˜]] = 2 d χ˜ (z) d χ˜ (ζ )(N/σ − z)−1 (N/σ − ζ )−1 σ × I˜(N/σ − z)−1 (N/σ − ζ )−1 . (64) We used the operator calculus introduced in Appendix A.4. Now since I˜(N/σ − z)−1/2 ≤ C(N + 1)1/2 (N/σ − z)−1 ≤ Cσ 1/2 | Im z|−1 , which follows from √ x+1 ≤ Cσ 1/2 |Imz|−1 , sup |x/σ − z| x≥0 we conclude that
[χ (N/σ ), [χ (N/σ ), λI˜]] ≤ qCλσ −3/2 .
(65)
Next, write for simplicity χ instead of χ (N/σ ), and look at 2
[χ , [χ , [L, b]]] = θλ[χ , [χ , [L,R A I Q]]] + adjoint. We claim that 2
[χ , [L,R A I Q]] = 0. 2
2
(66)
2
Write first [L,R A I Q] = R A [L0 , I ]Q + λ[I,R A I Q]. Then 2
2
2
2
[χ ,R A [L0 , I ]Q] = [χ ,R A [L0 , I ]Q] = χR A [L0 , I ]Q −R A [L0 , I ]Qχ .
348
M. Merkli 2
Here, χ = 1 − χ . Notice that Qχ = 0, and since RanR A [L0 , I ]Q ⊂ Ran P (N = 1), we 2 2 have also χR A [L0 , I ]Q = 0, for σ > 2. Similarly, [χ , [I,R A I Q]] = 0, so (66) follows. We obtain thus from (65): [χ , [χ , B]] = O(λσ −3/2 ), which proves the proposition. ' & Now we finish the proof of Theorem 2.1. The IMS localization formula (52) together with Propositions 5.1–5.3 yields θ λ2 σ 5 θλ2 h[L, A]h ≥ γe 1 − O(λσ 1/2 ) hχ12 h + hχ22 h − γ0 δe,0 hP#β,0 h A 2 2 A θ λ2 −1 − O Aθ + Aσ 1/2 + λσ A −1 + Aθ −1 λ−1 σ −3/2 h2 . A The sum of the first two terms on the r.h.s. is bounded below by θ λ2 γe 1 − O(λσ 1/2 ) h2 , A so we get θ λ2 h γe 1 − 25 δe,0 P#β,0 − O(λσ 1/2 ) h[L, A]h ≥ A (67) − O Aθ −1 + Aσ 1/2 + λσ A −1 + Aθ −1 λ−1 σ −3/2 h. ˆ
Finally, we choose our parameters. Let A = λAˆ /100 , σ = λ−σˆ /100 , θ = λθ/100 , and choose ˆ = (44, 55, 26). (ˆA , σˆ , θ)
(68)
It is then easily verified that for small λ, the conditions on the parameters given in Theorem 4.3 and Proposition 5.1 hold, and furthermore, (67) becomes h[L, A]h ≥ λ182/100 h γe 1 − 25 δe,0 P#β,0 − O(λ145/200 ) − O(λ1/200 ) h
γ e ≥ λ91/50 h ' & (1 − 5δe,0 P#β,0 ) − O(λ1/200 ) h. 2 6. Proof of Theorem 2.2 We follow the idea of the Virial Theorem, as explained in Subsect. 1.3: Assume ψ is a normalized eigenvector of L with eigenvalue e. If e = 0, we assume in addition that ψ ∈ Ran P#⊥β,λ . Let α > 0 and set fα := α −1 f (iαA0 ), where f is a bounded C ∞ function, such that the derivative f is positive and s.t. f (0) = 1 (take e.g. f = Arctan). Set fα := f (iαA0 ), and hα := fα . Furthermore, set fα := f (iαA0 ). For ν > 0 and g ∈ C0∞ (−1, 1), define ψν = g(νN)ψ. Here, α, ν will be chosen small in an appropriate way. We define the regularized eigenfunction ψα,ν = hα ψν . Notice that ψα,ν → ψ, as α, ν → 0.
(69)
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
349
Set for notational convenience in this section K := [L, A0 ] = N + λI˜. The strategy is to show that Kψα,ν := ψα,ν , Kψα,ν → 0, as α, ν → 0 (see the next subsection, (74)). For this estimate, we need the restrictive IR behaviour p > 2, see after Proposition 6.1. Using the PC estimate, Theorem 2.1, we also show that Kψα,ν is strictly positive (as α, ν → 0), see Subsect. 6.2, (86). The combination of these two estimates yields a contradiction, hence showing that the eigenfunction ψ of L we started off with cannot exist. In the case e = 0, we need to use that the product P#β,0 P#⊥β,λ is small, which is satisfied provided β|λ| < C, see (11). 6.1. Upper bound on Kψα,ν . Using (L − e)ψ = 0 and that [N, I ] is N 1/2 -bounded, we find that fα (L − e)ψν = gν fα (L − e)gν ψ = fα gν [λI, gν ]ψ = O(λα −1 ν 1/2 ).
(70)
Next, observe that 2 Im fα (L − e)ψν = [L, ifα ]ψν = Re [L, ifα ]ψν = Re fα N + λ[I, ifα ] ψ ,
(71)
ν
where we used in the last step
[L0 , ifα ] = d f˜(z)(iαA0 − z)−1 [L0 , A0 ](iαA0 − z)−1
= d f˜(z)(iαA0 − z)−2 N = fα N, since A0 and N commute (second step) and we made use of (113) with p = 1 in the last step. The commutator [I, ifα ] is examined in Proposition 6.1. The following equality holds in the sense of operators on D(N 1/2 ) or in the sense of quadratic forms on D(N 1/4 ): i [I, ifα ] = fα adA1 0 (I ) − αfα adA2 0 (I ) + R, 2
(72)
where we assume that the k-fold commutator adAk 0 (I ) := [· · · [I, A0 ], A0 , · · · , A0 ] is N 1/2 -bounded (or N 1/4 -form bounded) for k = 1, 2, 3. The term R satisfies the estimate RN −1/2 , N −1/4 RN −1/4 = O(α 2 ). Proof. Using the operator calculus introduced in Appendix A.4, we write [I, ifα ]
= d f˜(z)(iαA0 − z)−1 [I, A0 ](iαA0 − z)−1
1 = fα adA0 (I ) − iα d f˜(z)(iαA0 − z)−2 adA2 0 (I )(iαA0 − z)−1
i = fα adA1 0 (I ) − αfα adA2 0 (I ) − α 2 d f˜(z)(iαA0 − z)−3 adA3 0 (I )(iαA0 − z)−1 . 2
350
M. Merkli
The last integral is defined to be R, and the estimates follow by noticing that A0 and N commute. & ' Notice that it is here that we need adAk 0 (I )N 1/2 ≤ C, k = 2, 3, hence the more restrictive IR behaviour p > 2. We obtain from (72) and recalling that I˜ = [I, A0 ]: λ (71) = Re fα K ψ − Re iαfα adA2 0 (I ) + O(λα 2 ν −1/2 ) ν ψν 2 i (73) = Kψα,ν + λ Re hα [hα , λI˜] − αfα adA2 0 (I ) + O(λα 2 ν −1/2 ) 2 ψν = Kψα,ν + O(λα 2 ν −1/2 ). We used in the last step that the real part in the second term above is i 2 ˜ = O(α 2 ν −1/2 ), [hα , [hα , I ]] − α[fα , adA0 (I )] 2 ψν since adA3 0 (I ) is N 1/2 -bounded. Combining (73) and (70), we obtain 1/2 α2 ν Kψα,ν ≤ Cλ + 1/2 ψ2 . α ν
(74)
6.2. Lower bound on Kψα,ν . Let 4 be an interval containing exactly one eigenvalue, e, of Lp . We introduce two partitions of unity. The first one is given by 2 + χ 24 = 1, χ4
where χ4 ∈ C ∞ (4), χ4 (e) = 1. We localize in L, i.e. we set χ4 = χ4 (L). The second partition of unity is given by χ 2 + χ 2 = 1, where χ ∈ C ∞ is a “smooth Heaviside function”, i.e. χ (x) = 0 if x ≤ 0 and χ (x) = 1 if x ≥ 1. We set for n > 0: χn = χ (N/n), χ 2n = 1 − χn2 . We will choose n < 1/ν, so that χn ψν = χn ψ. The last equation will be used freely in what follows. We are going to use the IMS localization formula (51) with respect to both partitions of unity, and we start with the one localizing in N: 1 1 Kψα,ν = χn Kχn + χ n Kχ n + [χn , [χn , K]] + [χ n , [χ n , K]] 2 2 ψα,ν (75) n 2 −3/2 ≥ Kχn ψα,ν + χ n ψα,ν − O(λn ), 2 where we used that K ≥ n/2 on Ran P#⊥ , and the estimate (65) with σ replaced by n. Next, from the IMS localization formula for the partition of unity w.r.t. L, we have Kχn ψα,ν = χ4 Kχ4 + χ 4 Kχ 4 + R χ ψ n α,ν ≥ χ4 (K + [L, b])χ4 + χ 4 Kχ 4 + R χ ψ − λ19/50 O(αn + λn−1/2 ) n α,ν 2 ≥ θχ4 χn ψα,ν − Cθδe,0 P#β,0 χ4 χn ψα,ν 2 + χ 4 Kχ 4 + R χ ψ n α,ν
−λ
19/50
O(αn + λn
−1/2
).
(76)
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
351
Here, several remarks are in order. First, we have set 2R = [χ 4 , [χ 4 , K]] + [χ4 , [χ4 , K]], and we have used in the second step the fact that [L, b]χ4 χn ψα,ν = [L − e, b]χ4 χn hα ψ = 2 Re χ4 (L − e)hα χn ψ, bχ4 χn hα ψ = λ19/50 O(αn + λn−1/2 ). We recall that b is a bounded operator (see Proposition 4.2), with b = O(λ19/50 ). In the last step in (76), we used the positive commutator estimate, Theorem 2.1, in the 2, following way. For e = 0, Theorem 2.1 gives right away χ4 (K + [L, b])χ4 ≥ θχ4 where we recall that [L, A] = [L, A0 ] + [L, b], and b is defined in (24). We have set θ = Cλ91/50 . In the zero eigenvalue case, e = 0, we have λ91/50 γ0 (1 − 5P#β,0 ) − O(λ1/200 ) χ4 χn ψα,ν 2 91/50 91/50 λ 5λ ≥ γ0 χ4 χn ψα,ν 2 − γ0 P#β,0 χ4 χn ψα,ν 2 . 4 2
K + [L, b]χ4 χn ψα,ν ≥
Setting again θ = Cλ91/50 yields (76). We now estimate the remainder term R. Notice that the same observation as at the beginning of the proof of Proposition 5.3 shows that we have the estimate Rχn ψα,ν = 2i Im χ 4 χn ψα,ν , [χ 4 , K]χn ψα,ν . Therefore, Rχ ψ ≤ Cχ 4 χn hα ψ [χ 4 , K]χn hα ψ. (77) n α,ν Now we have on D(N ): [χ 4 , K] = d χ˜ 4 (z)(L − z)−1 [K, L](L − z)−1 , where we recall that (L − z)−1 leaves D(N ) invariant. Furthermore, [K, L] = λ[N, I ] + λ[I˜, L0 ] + λ2 [I˜, I ] = λ[N, I ] + λI (u∂u g) + λ2 [I˜, I ],
(78)
where I (u∂u g) is obtained from I by replacing the form factor g by u∂u g. The last commutator in (78) is bounded, and the other two are N 1/2 -bounded, so we obtain [χ 4 , K]χn hα ψ = O(λn1/2 )χn ψα,ν . (79) Next, we estimate the first term on the r.h.s. of (77): χ 4 χn hα ψ = (L − e)−1 χ 4 (L − e)χn hα ψ ≤ C(L − e)χn hα ψ ≤ Cn−1 λ[N, I ]χn hα ψ + O(λn−3/2 ) + Cχn (L − e)hα ψ
(80)
≤ Cλn−1/2 χn ψα,ν + O(λn−3/2 + αn).
Combining this with (79) and (77), we arrive at the estimate Rχ ψ ≤ Cλ2 χ ψα,ν χn ψα,ν + O(λ2 n−1 + λαn3/2 ). (81) n n α,ν There is one more term in (76) we have to estimate: χ 4 Kχ 4 χ ψ . Since P#⊥ (N + n α,ν λI˜)P#⊥ ≥ 0 and since P# I˜P# = 0, we have the bound K ≥ P#⊥ λI˜P# + adj. ≥ −Cλ, which implies χ 4 Kχ 4 χ ψ ≥ −Cλχ 4 χn ψα,ν 2 . (82) n α,ν
352
M. Merkli
Using (82) and (81), we obtain from (76) Kχn ψα,ν ≥ θχn ψα,ν 2 − (θ + Cλ)χ 4 χn ψα,ν 2 − Cθ δe,0 P#β,0 χ4 χn ψα,ν 2 − Cλ2 χn ψα,ν χn ψα,ν − λ19/50 O(αn + λn−1/2 ) − λO(αn3/2 + λn−1 ).
(83)
Next, we have for any η, A > 0: χn ψα,ν χn ψα,ν ≤ ηχn ψα,ν 2 + η−1 χn ψα,ν 2
≤ (Aη−1 + η)χn ψα,ν 2 + η−1 A −2 χ n ψα,ν 2 .
In the second step, we used the standard fact that we can choose the partition of unity s.t. χn ψ2 ≤ Aχn ψ2 + A −2 χ n ψ2 , for any A > 0. Combining this with (83), we obtain from (75): Kψα,ν ≥ (θ − Cλ2 (Aη−1 + η))χn ψα,ν 2 + (n/2 − Cλ2 η−1 A −2 )χ n ψα,ν 2 − Cθ δe,0 P#β,0 χ4 χn ψα,ν 2 − (θ + Cλ)χ 4 χn ψα,ν 2 − O(λαn3/2 + λ19/50 αn + λ69/50 n−1/2 ). Consider λ small and fixed. Then if n − Cη−1 A −2 ≥ θ, 2
(84)
we obtain Khα ψν ≥ θhα ψν 2 − Cθ δe,0 P#β,0 χ4 χn hα ψν 2 − O(Aη−1 + η + αn3/2 + n−1/2 ) − Cθ(n−1 + n−3 + α 2 n2 ).
(85)
On the last line, we used (80). Let us choose the parameters as follows: A = α 1/10 ,
η = α 1/20 ,
n = α −1/2 ,
then (84) is verified, and furthermore, (85) reduces to Kψα,ν ≥ θψα,ν 2 − Cθδe,0 P#β,0 χ4 χn ψα,ν 2 − O(α 1/20 ).
(86)
On the other hand, recalling (74), we obtain by choosing the parameters ν and α as ν = α3: Kψα,ν ≤ Cα 1/2 .
(87)
Since ψα,ν → ψ = 1 as α, ν → 0, and since −Cθ δe,0 P#β,0 χ4 χn ψα,ν 2 → −Cθ δe,0 P#β,0 P#⊥β,λ ψ2 (recall that ψ = P#⊥β,λ ψ if e = 0), we obtain thus for small α from (86) and (87) the inequality θ 1 − Cδe,0 P#β,0 P#⊥β,λ ψ2 ≤ Cα 1/2 . (88) 2
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
353
For e = 0, this is a contradiction, and it shows that there can not be any eigenvalues of L in the interval 4. Remark that there is no smallness condition on the size of 4, except that it must not contain more than one eigenvalue of L0 , so we can choose 4 = (e− , e+ ). Let us look now at the case e = 0. Again, we reach a contradiction from (88), provided P#β,0 P#⊥β,λ ψ2 << 1. In this case, we conclude that zero is a simple eigenvalue of L. Now the fact that P#β,0 P#⊥β,λ = O(β|λ|) follows immediately from (11), so taking β|λ| small enough finishes the proof of Theorem 2.2. & '
Acknowledgements. The author thanks I. M. Sigal for his support and advice. Many thanks also go to J. Fröhlich and R. Froese for stimulating discussions and to the referee for helpful comments. During the writing up of this work, the author has been supported by an NSERC PDF (Natural Sciences and Engineering Council of Canada Postdoctoral Fellowship), which is gratefully acknowledged.
A. Appendix A.1. Selfadjointness of L and some relative bounds. We introduce the positive operator M = d(|u|) with domain D(M) = {ψ ∈ H : Mψ < ∞} and the number operator N = d(1)
(89)
with natural domain D(N ) = {ψ ∈ H : N ψ < ∞}. Proposition A.1 (Relative Bounds). Set L2 = L2 (R × S 2 ), and let 0 < β0 < ∞ be a fixed number. 1) If f ∈ L2 , then a(f )N −1/2 ≤ f L2 . 2) If |u|−1/2 f ∈ L2 , then a(f )M−1/2 ≤ |u|−1/2 f L2 . 3) For ψ ∈ D(N 1/2 ) and ψ ∈ D(M1/2 ) respectively, we have the following bounds, uniformly in β ≥ β0 :
I ψ2 ≤ CG N 1/2 ψ2 + ψ2 ,
I ψ2 ≤ CG M1/2 ψ2 + ψ2 . Here, C ≤ C (1 + β0−1 ), where C is independent of β, β0 . 4) For ψ ∈ D(N 1/2 ), any c > 0, and uniformly in β ≥ β0 , we have
16λ2 |ψ, λI ψ| ≤ cN 1/2 ψ2 + (1 + β0−1 ω−1 )|g|2 d 3 k. G2 ψ2 3 c R 5) For ψ ∈ D(M1/2 ), any c > 0, and uniformly in β ≥ β0 , we have
32λ2 |g|2 3 |ψ, λI ψ| ≤ cM1/2 ψ2 + G2 ψ2 d k. (1 + β0−1 ω−1 ) c ω R3 Remarks. 1. The parameter β0 gives the highest temperature, T0 = 1/β0 , s.t. our estimates 3)–5) are valid uniformly in T ≤ T0 . T0 can be fixed at any arbitrary large value. Since we are not interested in the large temperature limit T → ∞, we set from now on for notational convenience T0 = 1.
354
M. Merkli
2. Notice that 4) and 5) tell us that ∀ c > 0 (with the O-notation introduced after Theorem 2.1), |λI | ≤ cN + O(λ2 /c),
|λI | ≤ cM + O(λ2 /c),
where we understand these inequalities holding in a sense of quadratic forms on D(N 1/2 ) and D(M1/2 ) respectively. Proof of Proposition A.1. The proof is standard (see e.g. [BFS4, JP1, JP2]); we only 2 presentthe proof of 3), as an example of how to keep track of β. From I ψ ≤ 2 ∗ 2 ∗ 2 2 2 4G a (g1 )ψ + a (g2 )ψ + a(g1 )ψ + a(g2 )ψ | , and using the CCR [a ∗ (f ), a(g)] = f, g, we get a ∗ (g1,2 )ψ2 = ψ, a(g1,2 )a ∗ (g1,2 )ψ = a(g1,2 )ψ2 + g1,2 2L2 ψ2 ,
so I ψ2 ≤ 8G2 a(g1 )ψ2 + a(g2 )2 + 2g1 2L2 ψ2 , where we used g1 L2 = g2 L2 , since g1 (u, α) = −g2 (−u, α). Using 1) and 2) above, we get
I ψ2 ≤ 16G2 g1 2L2 N 1/2 ψ2 + ψ2 , 2 I ψ2 ≤ 16G2 |u|−1/2 g1 2 M1/2 ψ2 + ψ2 . L
Next, we show that g1 L2 ≤ C and |u|−1/2 g1 L2 ≤ C, uniformly in β ≥ β0 . Indeed, notice that g1 2L2 = R3 (1 + 2µ)|g(ω, α)|2 dωdS(α) = g2 2L2 , where we represented g in the integral in spherical coordinates. Since we have 1 + 2µ = 1 + 2(eβω − 1)−1 ≤ 1 + 2β −1 ω−1 ≤ 1 + 2β0−1 ω−1 , uniformly in β ≥ β0 , we get with (7) (for p > 0) the following uniform bound in β ≥ β0 :
(1 + β0−1 ω−1 )|g(k)|2 d 3 k = C < ∞. (90) g1,2 2L2 ≤ 2 R3
Similarly, |u|−1/2 g1 2L2 ≤ 2 R3 (1 + β0−1 ω−1 )ω−1 |g(ω, α)|2 d 3 k = C < ∞, uniformly in β ≥ β0 . It is clear from the last two estimates that C satisfies the bound indicated in the proposition. & ' These relative bounds and Nelson’s commutator theorem yield essential selfadjointness of the Liouvillian (cf. also Theorem 5.1 in [DJ]): Theorem A.2 ( Selfadjointness of the Liouvillian). Since Hp is bounded below, there −1/2 is bounded in the sense is a C > 0 s.t. Hp > −C. Suppose that [G, Hp ](H p + C) that the quadratic form ψ → 2i Im Gψ, Hp ψ , defined on D(Hp ), is represented by an operator denoted [G, Hp ]o , s.t. [G, Hp ]o (Hp + C)−1/2 is bounded. Then ∀ λ ∈ R, L is essentially selfadjoint on D0 := D(Hp ) ⊗ D(Hp ) ⊗ D(M) ⊂ Hp ⊗ Hp ⊗ F(L2 (R × S 2 )). Proof. The proof uses Nelson’s commutator theorem (see [RS], Theorem X.37). Let N = (Hp + C) ⊗ 1p + 1p ⊗ (Hp + C) + M + 1, then N is selfadjoint on D0 and N ≥ 1. Also, L is defined and symmetric on D0 .
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
355
According to Nelson’s commutator theorem, in order to prove Theorem 1.2, we have to show that ∀ ψ ∈ D0 and some constant d > 0, Lψ ≤ dN ψ, |Lψ, N ψ − N ψ, Lψ| ≤ dN
1/2
(91)
ψ . 2
(92)
Estimate (91) easily follows from Lp N −1 ≤ 1, Lf N −1 ≤ 1 and I N −1 ≤ I (M + 1)−1/2 (M + 1)1/2 (M + 1)−1 ≤ d (by 3) of Proposition 6.1). To show (92), notice that L0 commutes with N , so the l.h.s. of (92) reduces to |I ψ, N ψ − N ψ, I ψ| ≤ |I ψ, Mψ − Mψ, I ψ| + K,
(93)
K = I ψ, ((Hp + C) ⊗ 1 + 1 ⊗ (Hp + C))ψ − ((Hp + C) ⊗ 1 + 1 ⊗ (Hp + C))ψ, I ψ .
(94)
where
Let us examine the first term on the r.h.s. of (93). It is easily shown that since |u|g1,2 ∈ L2 (R × S 2 ), then a ∗ (g1,2 )M = Ma ∗ (g1,2 ) + a ∗ (|u|g1,2 ) on D(M). This shows that a # (g1,2 ) leave D(M) invariant and so we have ∀ ψ ∈ D0 : I ψ, Mψ − Mψ, I ψ = |ψ, (I M − MI )ψ| = ψ, Gl ⊗ (a ∗ (|u|g1 ) − a(|u|g1 )) − Gr ⊗ (a ∗ (|u|g2 ) − a(|u|g2 )) ψ ≤ cψ (M + 1)1/2 ψ ≤ cN 1/2 ψ2 , where we used Proposition 6.1 in the third step. Now we look at K given in (94). Using the specific form of I (see (9)), we can write K ≤ |K1 | + |K2 |, where K1 = Gl ⊗ (a(g1 ) + a ∗ (g1 ))ψ, (Hp + C) ⊗ 1ψ − (Hp + C) ⊗ 1ψ, Gl ⊗ (a(g1 ) + a ∗ (g1 ))ψ , K2 = Gr ⊗ (a(g2 ) + a ∗ (g2 ))ψ, 1 ⊗ (Hp + C)ψ − 1 ⊗ (Hp + C)ψ, Gr ⊗ (a(g2 ) + a ∗ (g2 ))ψ . We examine K1 . Let ψ ∈ D0 , then (Hp + C)1/2 ψ ∈ H, and so K1 = 2i Im Gl ⊗ (a(g1 ) + a ∗ (g1 )), (Hp + C) ⊗ 1ψ = 2i Im (a(g1 ) + a ∗ (g1 ))ψ, [G, Hp ]o ψ , so we obtain |K1 | ≤ c(M + 1)1/2 ψ (Hp + C)1/2 ⊗ 1ψ ≤ cN 1/2 ψ2 . The same estimate is obtained for |K2 | in a similar way. This shows (92) and completes the proof. ' &
356
M. Merkli
A.2. Proof of Theorem 2.4. For a fixed eigenvalue e = 0 of L0 , define the subsets of N: Nr(i) := {j |Ei − Ej = e}, (j )
Nl
:= {i|Ei − Ej = e},
Nr := ∪i Nr(i) = {j |Ei − Ej = e for some i}, (j )
Nl := ∪j Nl
= {i|Ei − Ej = e for some j }.
We also let Pi denote the rank-one projector onto Cϕi , where we recall that {ϕi } is the orthonormal basis diagonalizing Hp . For any nonempty subset N ⊂ N, put PN :=
Pj , and PN := 0 if N is empty.
j ∈N
Set Emn := Em − En , and for e ∈ σ (Lp )\{0}, m ∈ Nl and n ∈ Nr , define:
δm := inf σ PN (m) GPNrc GPN (m) PN (m) ≥ 0, r r r
δn := inf σ PN (n) GPNlc GPN (n) PN (n) ≥ 0. l
l
l
(95) (96)
Here, the superscript c denotes the complement. Notice that if e = 0, then Nrc = Nlc are empty, and δm , δn = 0. We define also δ0 := inf m∈Nl {δm } + inf n∈Nr {δn }. From P (Lp = e) = {i,j :Eij =e} Pi ⊗ Pj , we obtain together with the definition of (e) given in (16):
p (e) = 1 − δEmn ,e δ(Emn − e + u)Pij m∗ Pmn m Pkl . m,n
{i,j :Eij =e} {k,l:Ekl =e}
(97) The idea here is to get a lower bound on the sum over (m, n) ∈ N × N by summing only over a convenient subset of N × N (notice that every term in the sum is positive). That subset is chosen such that the summands reduce to simpler expressions. Using the definition of m (see (17)), we obtain Pij m∗ Pmn mPkl = Pij Gl g 1 − Gr g 2 Pmn (Gl g1 − Gr g2 ) Pkl = Pi GPm GPk ⊗ Pn δj n δnl |g1 |2 − Pi GPm ⊗ Pn CGCPl δj n δmk g 1 g2 − Pm GPk ⊗ Pj CGCPn δim δnl g 2 g1 + Pm ⊗ Pj CGCPn CGCPl δim δmk |g2 |2 . Summing over i, j and k, l according to (97) yields
{i,j :Eij =e} {k,l:Ekl =e}
Pij m∗ Pmn mPkl
= g 1 PN (n) GPm ⊗ Pn − g 2 Pm ⊗ PN (m) CGCPn · adjoint. l
r
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
357
For (m, n) ∈ Nl × Nrc , we have PN (n) = 0 and PN (m) = 0, and for (m, n) ∈ Nlc × Nr , r l we have PN (n) = 0 and PN (m) = 0. As explained above, we now get a lower bound on r l the sum (97) by summing only over the disjoint union ˙ Nlc × Nr . (m, n) ∈ Nl × Nrc ∪ An easy calculation shows that
2 dS g2 (Eij , α) Pm ⊗ CPN (m) G PNrc GPN (m) C p (e) ≥ inf r r i,j :Eij =0 S2 m∈Nl
+
inf
i,j :Eij =0
S2
2 dS g1 (Eij , α)
n∈Nr
PN (n) G PNlc GPN (n) ⊗ Pn . l l
Next, we investigate the integrals. From (10), we have
2 dS|g1,2 (Eij , α)| ≥ |Eij | dS|g(|Eij |, α)|2 , S2
S2
uniformly in β ≥ 1. With (95), (96) and remarking that σ (CT C) = σ (T ) for any selfadjoint T , this yields
|Eij | inf {δm } + inf {δn } P (Lp = e), dS|g(Eij , α)|2 p (e) ≥ inf i,j :Eij =0
m∈Nl
S2
n∈Nr
since m∈Nl Pm ⊗ PN (m) = n∈Nr PN (n) ⊗ Pn = P (Lp = e). This shows 1) of r l Theorem 2.4. Now we look at the zero eigenvalue. A general normalized element of Ran P (Lp = 0) is of the form φ = i ci ϕi ⊗ ϕi , with i |ci |2 = 1, so
φ, (0)φ = 1 − δEmn ,0 ci cj δ(Emn + u) ϕi ⊗ ϕi , m∗ Pmn mϕj ⊗ ϕj . m,n
i,j
Using again the explicit form of m given in (17) and ϕm , CGCϕn = ϕm , Gϕn , we obtain
φ, (0)φ = 1 − δEmn ,0 δ(Emn + u) |ϕn , Gϕm |2 |cn g1 − cm g2 |2 . (98) m,n
˙ R− × S 2 and using (10) and We split the domain of integration R × S 2 into R+ × S 2 ∪ g2 (u, α) = −g1 (−u, α), arrive at
2 √ δ(Emn + ω) 1 + µcn g − µcm g δ(Emn + u)|cn g1 − cm g2 |2 = R3 √ 2 + δ(Emn − ω) µcn g − 1 + µcm g .
358
M. Merkli
This together with (98) gives
e betaEn e−βEmn − 1 {m,n:Emn <0} 2
−βEm /2 −βEn /2 δ(Emn + ω)|g|2 , × e cn − e cm
φ, (0)φ = 2
|ϕn , Gϕm |2
(99)
where we used δ(Emn + ω)µ = δ(Emn + ω)(e−βEmn − 1)−1 . Equation (99) shows that −1/2 if we choose cn = Zp e−βEn /2 , then each term in the sum is zero. Recall now that p
p
the particle Gibbs state is given by (21), so #β , (0)#β = 0. Since (0) ≥ 0, this p
implies that #β is a zero eigenvector of (0). Finally we show that there is a gap in the spectrum of (0) at zero. Indeed, from (99), we get by the definition of g0 (see statement of Theorem 2.4): φ, (0)φ ≥ 2g0 |e−βEm /2 cn − e−βEn /2 cm |2 {m,n:Emn <0}
= g0
|e−βEm /2 cn − e−βEn /2 cm |2
m,n
= g0
e−βEm |cn |2 + e−βEn |cm |2 − e−β(Em +En )/2 (cn cm + cn cm )
m,n
2 = g0 Zp (β) + Zp (β) − 2 e−βEm /2 cm
m
2 p = 2g0 Zp (β) 1 − #β , φ , where we used
2 n |cn |
= 1. Therefore, we obtain on Ran P#⊥p : (0) ≥ 2g0 Zp (β). β
This proves that if g0 > 0, then we have a gap at zero and zero is a simple eigenvalue. ' & A.3. Proof of Proposition 4.7. We denote the spectrum of Lp by σ (Lp ) = {ej }, where we include multiplicities, i.e. for degenerate eigenvalues, we have ej = ek for different j = k. Let Pj denote the rank one projector onto span{ϕi }, where ϕj ∈ Hp ⊗ Hp is the unique eigenvector corresponding to ej . Let e be a fixed eigenvalue of Lp . Setting mj = Pj m, we have
−1 ψ, Q1 m∗ (Lp − e + u)2 + A 2 m Q1 ψ
= ψ, Q1 m∗j mj ((ej − e + u)2 + A 2 )−1 Q1 ψ . (100) ej ∈σ (Lp )
First, we estimate the term in the sum coming from {j : ej = e}:
ψ, Q1 m∗j mj (u2 + A 2 )−1 Q1 ψ ≤ u−2 mj Q1 ψ2 . {ej =e}
{ej =e}
(101)
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
Now
359
mj Q1 ψ2 = P (Lp = e)(Gl g1 − Gr g2 )Q1 ψ2
{ej =e}
≤ 2G2 (|g1 |2 + |g2 |2 )ψ2 ,
so (101) ≤ 2G2 ψ2 g1 /u2L2 + g2 /u2L2 = 4G2 |g1 /u2L2 ψ2 . From our assumptions on g (see (7)) and (10), it is clear that g1 /uL2 = C < ∞, uniformly in β ≥ 1, and we conclude that (101) ≤ Cψ2 . Next, we estimate the sum of the terms in (100) with ej = e and write it as
du((ej − e + u)2 + A 2 )−1 m ˜ j (u, ψ), ej =e R
(102)
(103)
where we put m ˜ j (u, ψ) = S 2 dSmj (u, α)Q1 ψ2 . ∀ ξ > 0, we have
du ((ej − e + u)2 + A 2 )−1 m ˜ j (u, ψ) ej =e {|u−(e−ej )|≥ξ }
≤ξ
−2
ej =e R
≤ ξ −2
du m ˜ j (u, ψ)
m(u, α)Q1 ψ2 ≤ 4ξ −2 G2 g1 /u2L2 ψ2 ≤ Cξ −2 ψ2 .
(104)
Next, with the changes of variables y = u − (e − ej ), we arrive at
du((ej − e + u)2 + A 2 )−1 m ˜ j (u, ψ) ej =e {|u−(e−ej )|≤ξ }
=
ξ −ξ
+
2 −1
dy (y + A ) 2
ξ −ξ
m ˜ j (e − ej , ψ)
ej =e −1
dy(y 2 + A 2 )
m ˜ j (y + e − ej , ψ) − m ˜ j (e − ej , ψ) .
(105)
ej =e
The mean value theorem yields for the last sum: y ∂y |y∈(−ξ,ξ m ˜ j (y + e − ej , ψ). ˜ )
(106)
ej =e
Now ∂y
ej =e
=2
m ˜ j (y + e − ej , ψ)
2 ej =e S
dS Re Pj (∂u m)(y + e − ej , α)Q1 ψ, Pj m(y + e − ej , α)Q1 ψ .
360
M. Merkli
Using the Schwarz inequality for sums, we bound the modulus of the r.h.s. from above by
2 dS Pj (∂u m)(y + e − ej , α)Q1 ψ2 S2
ej =e
·
(107) Pj m(y + e − ej , α)Q1 ψ2 .
ej =e
Now m(y + e − ej , α) = Gl g1 (y + e − ej , α) − Gr g2 (y + e − ej , α), so Pj m(y + e − ej , α)Q1 ψ2
(108)
≤ 2|g1 (y + e − ej , α)| Pj Gl Q1 ψ + 2|g2 (y + e − ej , α)| Pj Gr Q1 ψ2 . 2
2
2
˜ ≥ |e − ej | − |y| ˜ > We have to evaluate this at y = y˜ ∈ (−ξ, ξ ). Clearly, |e − ej + y| d0 − ξ ≥ d0 /2, if we choose ξ ≤ d0 /2, where d0 := inf |ei − ej | > 0. ei =ej
The r.h.s. of (108) can thus be estimated from above by 2 sup |g1 (u, α)|2 Pj Gl Q1 ψ2 + 2 sup |g2 (u, α)|2 Pj Gr Q1 ψ2 , |u|>d0 /2
|u|>d0 /2
hence we arrive at
|(106)| ≤ 32|y| G2 ψ2
S2
dS
sup |∂u g1 | + sup |g1 | .
|u|>d0 /2
|u|>d0 /2
(109)
Using the conditions (7) with p > 0, one shows that the suprema are bounded, uniformly in β ≥ 1, and so is |g1 |, thus (109) gives |(106)| ≤ C|y| ψ2 .
(110)
p−1/2
Remark that the constant here depends on d0 , C ∼ d0 . This argument is valid for any p. Going back to the second term on the r.h.s. of (105), we have shown: ξ dy m ˜ j (y + e − ej , ψ) − m ˜ j (e − ej , ψ) 2 2 −ξ y + A ej =e
ξ |y| |ξ | dy ≤ C ψ2 . ≤ Cψ2 2 + A2 y A −ξ Now we consider the first term on the r.h.s. of (105). We see that, as A/ξ → 0,
ξ
−ξ
2 π dy 2 η , Arctan(ξ/A) = + o (A/ξ ) = y2 + A2 A A 2
(111)
Positive Commutators in Non-Equilibrium Quantum Statistical Mechanics
361
for any 0 < η < 1. This simply follows from the fact that for any such η, we have limx→∞ x η (Arctan(x) − π/2) = 0. Also,
m ˜ j (e − ej , ψ) = ψ, Q1 m∗ δ(u − e + Lp )P (Lp = e)mQ1 ψ . ej =e
We conclude that (103) is equal to
π (1−O(A/ξ )) ψ, Q1 m∗ δ(u−e+Lp )P (Lp = e)mQ1 ψ −O(ξ +Aξ −2 )ψ2 . A Choose e.g. ξ = A 1/4 and η close to 1, then we arrive at
π (103) = ψ, Q1 m∗ δ(u − e + Lp )P (Lp = e)mQ1 ψ − O(A 1/4 )ψ2 . A This together with (102) yields
Q1 m∗ ((Lp − e + u)2 + A 2 )−1 mQ1
π ∗ 1/4 ≥ Q1 m P (Lp = e)δ(Lp − e + u)m − O(A ) Q1 . A
' &
A.4. Operator calculus. We outline an operator calculus for functions of selfadjoint operators, used extensively in this work. For a detailed exposition and more references, we refer to [HS3]. Let f ∈ C0k (R), k ≥ 2, and define the compactly supported complex measure 1 d f˜(z) = − 2π ∂x + i∂y f˜(z)dxdy, where z = x + iy and f˜ is an almost analytic complex extension of f in the sense that ∂x + i∂y f˜(z) = 0, z ∈ R. Then, for a selfadjoint operator A, one shows that
f (A) = d f˜(z)(A − z)−1 , where the integral is absolutely convergent. Given f , one can construct explicitly an almost analytic extension f˜ supported in a complex neighbourhood of the support of f . One shows that for p ≤ k − 2,
k d f˜(z) | Im z|−p−1 ≤ C f (j ) j −p−1 , (112)
j =0
dxxn |f (x)|,
where f n = f (A) are given by
f
(p)
and x = (1 + x 2 )1/2 . Furthermore, the derivatives of
(A) = p!
d f˜(A)(A − z)−p−1 .
(113)
We finish this outline by mentioning that these results extend by a limiting argument to functions f that do not have compact support, as long as the norms in the r.h.s. of (112) are finite.
362
M. Merkli
References [A]
Araki, H.: Relative Hamiltonian for faithful normal states of a von Neumann algebra. Pub. R.I.M.S., Kyoto Univ. 9, 165–209 (1973) [ABG] Amrein, W., Boutet de Monvel, A., Georgescu, V.: C0 -Groups, Commutator Methods and Spectral Theory of N -Body Hamiltonians. Basel–Boston–Berlin: Birkhäuser, 1996 [AW] Araki, H., Woods, E.: Representations of the canonical commutation relations describing a nonrelativistic infinite free bose gas. J. Math. Phys. 4, 637–662 (1963) [BRI,II] Bratteli, O., Robinson, D.: Operator Algebras and Quantum Statistical Mechanics 1,2. Texts and Monographs in Physics, Berlin: Springer-Verlag, 2nd edition, 1987 [BFS1] Bach, V., Fröhlich, J., Sigal, I.M.: Mathematical Theory of Nonrelativistic Matter and Radiation. Lett. Math. Phys. 34, 183–201 (1995) [BFS2] Bach, V., Fröhlich, J., Sigal, I.M.: Quantum electrodynamics of confined nonrelativistic particles. Adv.Math. 137, no. 2, 299–395 (1998) [BFS3] Bach, V., Fröhlich, J., Sigal, I.M.: Renormalization group analysis of spectral problems in quantum field theory. Adv. Math. 137, no. 2, 205–298 (1998) [BFS4] Bach, V., Fröhlich, J., Sigal, I.M.: Return to Equilibrium. J. Math. Phys. 41, no 6, 3985–4061 (2000) [BFSS] Bach, V., Fröhlich, J., Sigal, I.M., Soffer, A.: Positive Commutators and the spectrum of Pauli–Fierz hamiltonian of atoms and molecules. Commun. Math. Phys. 207, no. 3, 557–587 (1999) [CFKS] Cycon, H.L., Froese, R., Kirsch, W., Simon, B.: Schrödinger Operators with applications to Quantum Mechanics and Global Geometry. Berlin–Heidelberg–New York: Springer, 1987 [DJ] Derezi´nski, J., Jak˘si´c, V.: Spectral theory of Pauli–Fierz operators. Preprint (2000) [FNV] Fannes, M., Nachtergaele, B., Verbeure, A.: The equilibrium states of the spin-boson model. Commun. Math. Phys. 114, 963 (1988) [GG] Georgescu, V., Gérard, C.: On the Virial Theorem in Quantum Mechanics. Commun. Math. Phys. 208, 275–281 (1999) [H] Haag, R.: Local Quantum Physics. Fields, Particles, Algebras. 
Text and Monographs in Physics. Berlin; Springer-Verlag, 1992 [HS1] Hunziker, W., Sigal, I.M.: The general theory of N-body quantum systems. CRM Proc. Lecture Notes 8, 35–72 [HS2] Hunziker, W., Sigal, I.M.: The quantum N -body problem. J. Math. Phys. 41, no. 6, 3448–3511 (2000) [HS3] Hunziker, W., Sigal, I.M.: Time-dependent scattering theory of N-body quantum systems. Rev. Math. Phys. 8, 1033–1084 (2000) [HSp] Hübner, M., Spohn, H.: Radiative decay: Nonperturbative approaches. Rev. Math. Phys. 7, 363–387 (1995) [JP1] Jak˘si´c, V., Pillet, C.A.: On a Model for Quantum Friction II. Fermi’s Golden Rule and Dynamics at Positive Temperature. Commun. Math. Phys. 176, 619–644 (1996) [JP2] Jak˘si´c, V., Pillet, C.A.: On a Model for Quantum Friction III. Ergodic Properties of the Spin-Boson System. Commun. Math. Phys. 178, 627–651 (1996) [M] Merkli, M.: Positive Commutator Method in Non-Equilibrium Statistical Mechanics. Ph.D. thesis, Department of Mathematics, University of Toronto, 2000 [RS] Reed, M., Simon, B.: Fourier Analysis, Self-Adjointness. Methods of Modern Mathematical Physics, Vol. II. New York: Academic Press, 1975 [S] Skibsted, E.: Spectral analysis of N -body systems coupled to a bosonic field. Rev. Math. Phys. 10, no. 7, 989–1026 (1998) Communicated by A. Kupiainen
Commun. Math. Phys. 223, 363 – 382 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Characteristic Polynomials of Real Symmetric Random Matrices E. Brézin1 , S. Hikami2 1 Laboratoire de Physique Théorique, Ecole Normale Supérieure , 24 rue Lhomond 75231,
Paris Cedex 05, France
2 Department of Pure and Applied Sciences, CREST of JST, University of Tokyo, Meguro-ku, Komaba,
Tokyo 153, Japan Received: 19 March 2001 / Accepted: 21 June 2001
Abstract: The correlation functions of the random variables det(λ − X), in which X is an hermitian N × N random matrix, are known to exhibit universal local statistics in the large N limit. We study here the correlation of those same random variables for real symmetric matrices (GOE). The derivation relies on an exact dual representation of the problem: the k-point functions are expressed in terms of finite integrals over (quaternionic) k × k matrices. However the control of the Dyson limit, in which the distance of the various parameters λ’s is of the order of the mean spacing, requires an integration over the symplectic group. It is shown that a generalization of the Itzykson– Zuber method holds for this problem, but contrary to the unitary case, the semi-classical result requires a finite number of corrections to be exact. We have also considered the problem of an external matrix source coupled to the random matrix, and obtain explicit integral formulae, which are useful for the analysis of the large N limit. 1. Introduction The spectrum of eigenvalues of complex Hamiltonians are often modelled by a random matrix theory, in which the random matrices belong to various ensembles according to the symmetries of the physical problem. The most common space-time symmetries of the Hamiltonian lead to the consideration of ensembles of real, complex or quaternionic random matrices. In the simplest case one considers Gaussian probability distributions. This simple choice is in many cases sufficient since it is now understood that the local statistics of the eigenvalues are universal, i.e. largely independent of the probability distribution. The most commonly studied Gaussian ensembles, called GOE, GUE and GSE, are invariant under the orthogonal, unitary or symplectic groups, respectively, and they all have important applications [1–3]. Unité Mixte de Recherche 8549 du Centre National de la Recherche Scientifique et de l’École Normale Supérieure
In this article we follow our previous study of the characteristic polynomials of random matrices in the GUE case [4]. If X is an N × N random matrix with characteristic polynomial det(λ − X), we consider the average of products of such characteristic polynomials, defined by

$$F_k(\lambda_1,\ldots,\lambda_k)=\Big\langle\prod_{l=1}^{k}\det(\lambda_l-X)\Big\rangle. \tag{1}$$
In a previous article we derived explicit formulae for those correlation functions in the GUE case. We then found their asymptotic behavior for large N, and proved their universality in the short-distance limit, in which the differences λ_i − λ_j are of the order of the mean spacing of the eigenvalues of X. As usual, the orthogonal and symplectic cases are more difficult to handle. In the GUE case there turns out to be a hidden duality between N, the size of the matrices, and k, the number of points in the correlation functions: we may turn the integrals over N × N matrices into integrals over k × k matrices. Since we are interested in the large-N, finite-k limit, this is the required tool for obtaining the large N limit by the saddle-point method. We return below to the GUE case and exhibit this duality. This N − k duality is present in the GOE case as well. In both cases it follows from the standard method of integration over Grassmann variables. One thereby easily obtains exact expressions for the correlation functions of characteristic polynomials in terms of integrals over a finite number of variables: k² for the GUE and 2k² − k for the GOE. Explicit formulae were derived earlier by Forrester [8] for the related problem of generalized Selberg integrals, which is known to correspond to random matrix theory in an appropriate limit of some parameters. In fact it has been known for some time, from the work of Aomoto [9] and Kaneko [10], that these Selberg integrals may be expressed explicitly in terms of multivariable generalized hypergeometric functions, themselves related to Jack polynomials. Unfortunately, having explicit representations still does not serve our purpose, which is to obtain explicit asymptotic expressions for large N. Indeed our goal is to understand the Dyson limit, in which N is large and the distances between the λ's are of the order of the mean level spacing.
This difficulty seems to be present in those earlier approaches as well. In our case, in spite of the finiteness of the number of integrations, a straightforward use of the saddle-point method does not yield the large N limit, since the corrections to the saddle-point integration are of order one. However, the GOE case turns out to be dual to the GSE. This duality is already present in earlier work based on integrals over the eigenvalues [8, 7, 11]. The use of this symplectic symmetry allows us to reduce the number of integrations further, to an integral over k variables only. This is done through an extension of the Harish-Chandra–Itzykson–Zuber formula (HIZ) [19, 20]. In our case we are dealing with an integral over the symplectic group, which turns out to be exact if one includes a finite number of corrections to the leading WKB result. This fact has also been noticed in recent work of Guhr and Kohler [12]. The remaining integrals over k variables are then well adapted to the saddle-point method, and we give explicit expressions for the large N limit in a few cases. Our method can also be generalized to include an external matrix source in the probability measure. Such a source breaks the invariance of the measure under the orthogonal group, so the problem can no longer be studied in terms of the eigenvalues of the random matrices. Among the formulae that we derive through these methods, let us quote for instance the moments of the characteristic polynomials. In the GUE case we had found earlier [4] that in the
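large N limit

$$\exp(-Nk\lambda^2)\,\langle F_{2k}\rangle^{(\mathrm{GUE})}=\prod_{l=0}^{k-1}\frac{l!}{(k+l)!}\,\big(2N\pi\rho(\lambda)\big)^{k^2},$$

in which ρ(λ) is the asymptotic density of eigenvalues. For the GOE case we derive below that

$$\exp(-2Nk\lambda^2)\,\langle F_{2k}\rangle^{(\mathrm{GOE})}=\prod_{l=1}^{k}\frac{(2l-1)!}{(2k+2l-1)!}\,\big(2\pi\rho(\lambda)\big)^{2k^2+k}.$$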
To make the role of the generalized HIZ formula more transparent, we shall begin by revisiting the unitary case in the light of the N − k duality and of the HIZ formula. We shall then proceed to the GOE ensemble.

2. Survey of the Unitary Ensemble

For the Gaussian unitary ensemble (GUE), the random matrix X is a complex N × N Hermitian matrix, with probability distribution function

$$P(X)=\frac{1}{Z}\exp\Big(-\frac{N}{2}\,\mathrm{tr}\,X^2\Big). \tag{2}$$
The average ⟨· · ·⟩ means integration with the normalized weight P(X), with the Euclidean measure $\prod_i dX_{ii}\prod_{i<j}d\,\mathrm{Re}\,X_{ij}\,d\,\mathrm{Im}\,X_{ij}$. It is easily shown that F_2(λ_1, λ_2) reduces, up to a trivial factor, to the kernel K_N(λ_1, λ_2) [3] which characterizes the correlation functions of the eigenvalues of X. Suppose all the λ_j's are nearby, i.e. we are in the short-distance scaling region where N is large and the products N(λ_i − λ_j) are finite. Then F_{2k}(λ_1, . . . , λ_{2k}) becomes, after an appropriate scaling, a universal function, i.e. independent of the specific distribution P(X). When all the λ_j are equal,

$$F_{2k}(\lambda)=F_{2k}(\lambda,\ldots,\lambda)=\big\langle[\det(\lambda-X)]^{2k}\big\rangle \tag{3}$$

is the 2k-th moment of the characteristic polynomial. In the large N limit, we have derived earlier [4, 5]

$$\exp(-Nk\lambda^2/2)\,F_{2k}(\lambda)=\gamma_k\,[2\pi N\rho(\lambda)]^{k^2}, \tag{4}$$
in which ρ(λ) is the density of eigenvalues and γ_k is a universal factor. This number had been computed for the circular unitary ensemble by Keating and Snaith, who used the Selberg integral formula [13]. There are several different derivations of those results. Let us here exhibit the duality which was mentioned in the introduction. We introduce Grassmann variables c and c̄, normalized to

$$\int d\bar c\,dc\;e^{iN\bar c c}=1. \tag{5}$$
Then the characteristic polynomial may be written as

$$\det(\lambda-X)=\int\prod_{a=1}^{N}d\bar c_a\,dc_a\;e^{iN\sum_{a,b}\bar c_a(\lambda\delta_{ab}-X_{ab})c_b}. \tag{6}$$
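Repeating this k times,

$$\prod_{\alpha=1}^{k}\det(\lambda_\alpha-X)=\int\prod_{a=1}^{N}\prod_{\alpha=1}^{k}d\bar c_{a\alpha}\,dc_{a\alpha}\;\exp\Big(iN\sum_{a,b=1}^{N}\sum_{\alpha=1}^{k}\bar c_{a\alpha}(\lambda_\alpha\delta_{ab}-X_{ab})c_{b\alpha}\Big). \tag{7}$$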
The integration over X, in the presence of a matrix source Y, yields

$$\int dX\;e^{-\frac N2\mathrm{Tr}\,X^2+iN\,\mathrm{Tr}\,XY}=e^{-\frac N2\mathrm{Tr}\,Y^2} \tag{8}$$
(by convention we include from now on a normalization in the euclidean measure dX such that (8) holds as stated).
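We may now apply this to the matrix $Y_{ab}=-\sum_{\alpha=1}^{k}\bar c_{a\alpha}c_{b\alpha}$ generated by (7). One then easily finds that

$$\mathrm{Tr}(Y^2)=-\sum_{\alpha,\beta=1}^{k}\sum_{a=1}^{N}\bar c_{a\alpha}c_{a\beta}\sum_{b=1}^{N}\bar c_{b\beta}c_{b\alpha}=-\mathrm{tr}(\gamma^2) \tag{9}$$

with

$$\gamma_{\alpha\beta}=\sum_{a=1}^{N}\bar c_{a\alpha}c_{a\beta}. \tag{10}$$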
Our notation is as follows: “Tr” refers here to the N-dimensional space, whereas “tr” refers to matrices acting in the k-dimensional space. We next introduce an auxiliary k × k Hermitian matrix B, such that

$$e^{\frac N2\mathrm{tr}\,\gamma^2}=\int dB\,\exp\Big(-\frac N2\mathrm{tr}\,B^2+N\,\mathrm{tr}\,\gamma B\Big). \tag{11}$$

We then integrate over the Grassmann variables, which are now decoupled in the original N-dimensional space, and end up with

$$F_k(\lambda_1\cdots\lambda_k)=\int dB\,\det(\Lambda-iB)^N\exp\Big(-\frac N2\mathrm{tr}\,B^2\Big)=e^{\frac N2\mathrm{tr}\,\Lambda^2}\int dB\,(\det B)^N\exp\Big(-\frac N2\mathrm{tr}\,B^2+iN\,\mathrm{tr}\,B\Lambda\Big), \tag{12}$$

in which Λ is the diagonal matrix (λ_1, · · · , λ_k). The problem is thus mapped onto Gaussian integrals over k × k Hermitian matrices, as announced. This dual representation is of course well adapted to the fixed-k, large-N limit that we are considering, since (12) contains k² variables instead of the N² of our starting point. It is not difficult to proceed from (12) and derive the scaling results which were given in [4].
However it turns out that it is simpler, and necessary in view of what comes next for the GOE problem, to integrate out the unitary degrees of freedom in (12). This is done through the HIZ formula [19, 20], which gives the integral over the unitary group U(k):

$$\int dU\,e^{iN\,\mathrm{tr}\,UXU^\dagger Y}=C_N\,\frac{\det_{1\le i,j\le k}\,e^{iNx_iy_j}}{\Delta(x_1,\cdots,x_k)\,\Delta(y_1,\cdots,y_k)}, \tag{13}$$
in which the x_i's and y_j's are the eigenvalues of the Hermitian X and Y respectively, and Δ(x_1, · · · , x_k) is the Vandermonde determinant

$$\Delta(x_1,\cdots,x_k)=\prod_{i<j}(x_i-x_j). \tag{14}$$
It is well known that the formula (13) happens to be exact semi-classically. In other words, it suffices to retain the sum over the k! stationary points in the space of unitary matrices, each weighted by the Gaussian fluctuations around it. Higher corrections happen to cancel exactly. This leads immediately to an integral over the k eigenvalues b_l of B, rather than over the k² matrix elements:

$$F_k(\lambda_1,\cdots,\lambda_k)=C\,e^{\frac N2\mathrm{tr}\,\Lambda^2}\int\prod_{l=1}^{k}db_l\,b_l^N\;e^{-\frac N2\sum_{l=1}^{k}(b_l^2-2ib_l\lambda_l)}\prod_{l<l'}\frac{b_l-b_{l'}}{\lambda_l-\lambda_{l'}}. \tag{15}$$
When we consider simply the k-th moment of the characteristic polynomial, namely the case λ = λ_1 = · · · = λ_k, the previous formula reduces to

$$F_k(\lambda)=F_k(\lambda,\cdots,\lambda)=C\,e^{\frac{Nk}{2}\lambda^2}\int\prod_{l=1}^{k}db_l\,b_l^N\;e^{-\frac N2\sum_{l=1}^{k}(b_l^2-2ib_l\lambda)}\prod_{l<l'}(b_l-b_{l'})^2. \tag{16}$$

Note that the above representations of the correlation functions of the characteristic polynomials, in terms of integrals over k variables, are exact for any size N × N of the random matrices. It is then simple to find the large-N limit of those functions by saddle-point integration. We restrict ourselves to even values of k and substitute 2k for k (the odd case is doable of course, but it leads to an oscillatory behavior). The saddle-point equation for each b_l is then

$$b_l^2-i\lambda_l b_l-1=0, \tag{17}$$

whose roots are $b_l^{\pm}=\big(i\lambda_l\pm\sqrt{4-\lambda_l^2}\big)/2$. In the scaling limit, in which the λ_l − λ_{l'} are of order 1/N, the eigenvalues b_j are very close, and one must pay attention to the Vandermonde determinant in the integration measure. Finally, the leading saddle points correspond to equal numbers of b_l close to either b_+ or b_−, with $b_\pm=\big(i\lambda\pm\sqrt{4-\lambda^2}\big)/2$. There are thus $\binom{2k}{k}=(2k)!/(k!\,k!)$ saddle points of equal weight. The combinatorial factor
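γ_k of (4) is then simply

$$\gamma_k=\binom{2k}{k}\frac{(h_k)^2}{h_{2k}}=\prod_{l=0}^{k-1}\frac{l!}{(k+l)!}, \tag{18}$$

where we have used

$$h_k=\frac{1}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\prod_{j=1}^{k}dx_j\;e^{-\frac12\sum_{i=1}^{k}x_i^2}\prod_{l<l'}(x_l-x_{l'})^2=\prod_{l=0}^{k}l!. \tag{19}$$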
(This formula is used when we consider the Gaussian fluctuations near the saddle point in which k of the b_j's are near b_+ and the other half are close to the saddle point b_−.) Finally, this representation through an integral over a finite matrix B may be generalized to the case of an external matrix source A coupled to the random matrix X [6, 15, 17]:

$$\int dX\prod_{l=1}^{2k}\det(\lambda_l-X)\,e^{-\frac N2\mathrm{tr}\,X^2+N\,\mathrm{tr}\,AX}=\frac{\exp\big(\frac N2\sum_l\lambda_l^2\big)}{\Delta(\lambda)}\int\prod_{l=1}^{2k}db_l\prod_{i=1}^{N}\prod_{l=1}^{2k}(a_i-ib_l)\prod_{l<l'}(b_l-b_{l'})\;e^{-\frac N2\sum_l b_l^2+iN\sum_l\lambda_l b_l}. \tag{20}$$
We shall now transpose these techniques to real symmetric random matrices.

3. Real Symmetric Matrices and Characteristic Polynomials

Again we consider

$$F_k(\lambda_1,\ldots,\lambda_k)=\int dX\,e^{-\frac N2\mathrm{tr}\,X^2}\prod_{i=1}^{k}\det(\lambda_i-X), \tag{21}$$
in which X is a real symmetric N × N matrix, X = X^T. It is worth remembering that real symmetric matrices form the Lie algebra of the symmetric space U(N)/O(N). The Lie algebra of U(N) consists of the N² complex Hermitian matrices: their imaginary parts are the N(N − 1)/2 antisymmetric generators of O(N), and their real parts are the N(N + 1)/2 real symmetric generators of the coset. Using Grassmann variables again, together with the representation (6) of the characteristic determinants, we are led once more to an integration over real symmetric matrices in the presence of the matrix source $Y_{ab}=-\sum_{\alpha=1}^{k}\bar c_{a\alpha}c_{b\alpha}$. This gives

$$\int dX\,e^{-\frac N2\mathrm{Tr}\,X^2+iN\,\mathrm{Tr}\,XY}=e^{-\frac N4\mathrm{Tr}(Y^2+YY^T)}. \tag{22}$$
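For commuting real entries, the exponent in (22) follows from the variances of the independent matrix elements. With the weight $e^{-\frac N2\mathrm{Tr}\,X^2}$, a diagonal element X_ii has variance 1/N and an off-diagonal element X_ij (i < j) has variance 1/(2N), so that $\langle e^{iN\,\mathrm{Tr}\,XY}\rangle=e^{-N^2\mathrm{Var}(\mathrm{Tr}\,XY)/2}$. The short sketch below checks that this reproduces (N/4)[Tr(Y²) + Tr(YY^T)]. The function names are ours, and the sketch uses ordinary real matrices Y rather than the Grassmann bilinears of the text.

```python
import random

def exponent_from_variances(Y, N):
    """N^2 Var(Tr XY) / 2 for real symmetric X with weight exp(-(N/2) Tr X^2)."""
    n = len(Y)
    # Tr XY = sum_i X_ii Y_ii + sum_{i<j} X_ij (Y_ij + Y_ji), with independent entries
    var = sum(Y[i][i] ** 2 for i in range(n)) / N  # Var X_ii = 1/N
    var += sum((Y[i][j] + Y[j][i]) ** 2
               for i in range(n) for j in range(i + 1, n)) / (2 * N)  # Var X_ij = 1/(2N)
    return N ** 2 * var / 2

def exponent_closed_form(Y, N):
    """(N/4)[Tr(Y^2) + Tr(Y Y^T)], the exponent on the right-hand side of (22)."""
    n = len(Y)
    tr_y2 = sum(Y[i][j] * Y[j][i] for i in range(n) for j in range(n))
    tr_yyt = sum(Y[i][j] ** 2 for i in range(n) for j in range(n))
    return N / 4 * (tr_y2 + tr_yyt)

random.seed(0)
Y = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
print(exponent_from_variances(Y, 7), exponent_closed_form(Y, 7))  # equal up to round-off
```

The Grassmann case works the same way, because the entries Y_ab are even elements and therefore commute.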
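We have dealt earlier with

$$\mathrm{Tr}(Y^2)=-\mathrm{tr}(\gamma^2), \tag{23}$$

which led to the integral (11) over a Hermitian k × k matrix. In addition, we have here

$$\mathrm{Tr}(YY^T)=\mathrm{tr}(UV), \tag{24}$$

with the matrices U and V defined by

$$U_{\alpha\beta}=\sum_{a=1}^{N}\bar c_{a\alpha}\bar c_{a\beta}, \tag{25}$$

$$V_{\alpha\beta}=\sum_{a=1}^{N}c_{a\alpha}c_{a\beta}. \tag{26}$$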
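The signs in (23) and (24) come only from reordering anticommuting variables, so they can be checked mechanically for small N and k. The sketch below is a minimal Grassmann-algebra class written for this purpose; it is hypothetical and not a library API. It builds Y, γ, U and V from their definitions (10), (25) and (26), then confirms that Tr(Y²) = −tr(γ²) and Tr(YY^T) = tr(UV).

```python
from itertools import product

class Grassmann:
    """Grassmann-algebra element stored as {sorted tuple of generator ids: coefficient}."""

    def __init__(self, terms=None):
        self.terms = {m: c for m, c in (terms or {}).items() if c != 0}

    @classmethod
    def generator(cls, i):
        return cls({(i,): 1})

    def __add__(self, other):
        out = dict(self.terms)
        for m, c in other.terms.items():
            out[m] = out.get(m, 0) + c
        return Grassmann(out)

    def __neg__(self):
        return Grassmann({m: -c for m, c in self.terms.items()})

    def __mul__(self, other):
        out = {}
        for (m1, c1), (m2, c2) in product(self.terms.items(), other.terms.items()):
            if set(m1) & set(m2):
                continue  # a repeated generator squares to zero
            word = m1 + m2
            # every transposition of two distinct generators flips the sign
            inversions = sum(word[i] > word[j]
                             for i in range(len(word)) for j in range(i + 1, len(word)))
            key = tuple(sorted(word))
            out[key] = out.get(key, 0) + (-1) ** inversions * c1 * c2
        return Grassmann(out)

    def is_zero(self):
        return not self.terms

def total(items):
    s = Grassmann()
    for x in items:
        s = s + x
    return s

def check(N, k):
    """Return (Tr Y^2 == -tr gamma^2, Tr Y Y^T == tr U V) for N x k Grassmann pairs."""
    cbar = lambda a, al: Grassmann.generator(2 * (a * k + al))
    c = lambda a, al: Grassmann.generator(2 * (a * k + al) + 1)
    Y = [[-total(cbar(a, al) * c(b, al) for al in range(k)) for b in range(N)] for a in range(N)]
    gam = [[total(cbar(a, al) * c(a, be) for a in range(N)) for be in range(k)] for al in range(k)]
    U = [[total(cbar(a, al) * cbar(a, be) for a in range(N)) for be in range(k)] for al in range(k)]
    V = [[total(c(a, al) * c(a, be) for a in range(N)) for be in range(k)] for al in range(k)]
    tr_Y2 = total(Y[a][b] * Y[b][a] for a in range(N) for b in range(N))
    tr_g2 = total(gam[al][be] * gam[be][al] for al in range(k) for be in range(k))
    tr_YYT = total(Y[a][b] * Y[a][b] for a in range(N) for b in range(N))
    tr_UV = total(U[al][be] * V[be][al] for al in range(k) for be in range(k))
    return (tr_Y2 + tr_g2).is_zero(), (tr_YYT + -tr_UV).is_zero()

print(check(N=3, k=2))  # (True, True)
```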
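Defining the complex conjugation of Grassmann variables as $(\bar c_1c_2)^*=\bar c_2c_1$, we have γ = γ^† and V^† = U. Therefore we may again decompose the remaining quartic terms in the c's and c̄'s as

$$e^{-\frac N4\mathrm{tr}(UV)}=\int dD\,e^{-N\,\mathrm{tr}\left(D^\dagger D+\frac i2VD^\dagger+\frac i2DV^\dagger\right)}, \tag{27}$$

where D is a complex k × k antisymmetric matrix, D = −D^T. Then we have

$$F_k(\lambda_1,\ldots,\lambda_k)=\int\prod_{\alpha=1}^{k}\prod_{a=1}^{N}d\bar c_{a\alpha}\,dc_{a\alpha}\;e^{iN\sum_{\alpha,a}\lambda_\alpha\bar c_{a\alpha}c_{a\alpha}}\int dB\,dD\;e^{-N\,\mathrm{tr}(B^2+D^\dagger D)+N\sum\bar c_{a\alpha}c_{a\beta}B_{\beta\alpha}}\;e^{-\frac{iN}{2}\sum\left(D_{\alpha\beta}c_{a\beta}c_{a\alpha}+D^\dagger_{\beta\alpha}\bar c_{a\alpha}\bar c_{a\beta}\right)}. \tag{28}$$

The auxiliary matrices B and D allow us to integrate over each pair c̄_a, c_a independently of the other pairs. It is convenient to define

$$\psi_a=\begin{pmatrix}\bar c_a\\ c_a\end{pmatrix}. \tag{29}$$

For a given antisymmetric matrix M (M = −M^T), we then have the following formula: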
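$$\int\prod_{a,\alpha}d\bar c_{a\alpha}\,dc_{a\alpha}\;e^{\frac{iN}{2}\psi_{\alpha a}M_{\alpha\beta}\psi_{\beta a}}=\Big[\int\prod_{\alpha}d\bar c_\alpha\,dc_\alpha\;e^{\frac{iN}{2}\psi_\alpha M_{\alpha\beta}\psi_\beta}\Big]^N=[-\mathrm{Pf}\,M]^N=[\det M]^{N/2}, \tag{30}$$

where Pf is the Pfaffian of the antisymmetric matrix M. Applying this to our problem, we deal here with

$$M=\begin{pmatrix}D&\Lambda-iB^T\\-(\Lambda-iB)&D^\dagger\end{pmatrix}, \tag{31}$$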
in which Λ = diag(λ_1, . . . , λ_k). Thus we finally obtain

$$F_k(\lambda_1,\ldots,\lambda_k)=\int dB\,dD\;e^{-N\,\mathrm{tr}(B^2+D^\dagger D)}\,[-\mathrm{Pf}\,M]^N. \tag{32}$$
This representation (32) of F_k expresses it through a finite number of integrals, here 2k² − k of them (one k × k Hermitian matrix B and one complex antisymmetric k × k matrix D). It is again exact for any N, and it is a solution to the problem. However, contrary to the GUE case, a direct use of the saddle-point equations turns out to fail in the scaling limit. In other words, every term of the perturbative expansion around the saddle point turns out to be relevant in the regime in which the products N(λ_i − λ_j) are finite. One may use the remaining invariances of this representation to reduce the number of integrations further. Consider a unitary transformation U among the c_a which diagonalizes the Hermitian matrix B, one of the block matrices of M. It transforms D into D → U^*DU^T; in other words, one can diagonalize B and keep an antisymmetric matrix for D. We can therefore apply the Harish-Chandra–Itzykson–Zuber formula [19, 20] to the integration over the unitary group, i.e. over the relative unitary transformation between the diagonal matrix Λ and the eigenbasis of B. We obtain

$$F_k(\lambda_1,\ldots,\lambda_k)=\frac{e^{N\,\mathrm{tr}\,\Lambda^2}}{\Delta(\Lambda)}\int\prod_{l=1}^{k}db_l\,\Delta(b)\int dD\;e^{-N\sum_\alpha b_\alpha^2+2iN\sum_\alpha\lambda_\alpha b_\alpha-N\,\mathrm{tr}\,D^\dagger D}\,(-\mathrm{Pf}\,M)^N, \tag{33}$$

where now the matrix B in M is diagonal; we have reduced the integrations to k² variables, instead of 2k² − k. However, this turns out to be still insufficient: if we proceed from (33), the saddle-point method still cannot be applied in the scaling limit. It is thus necessary to return to the underlying geometry of the space of matrices M in the representation (31). In order to make the quaternionic structure more apparent, we return to (28) and define the spinor
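$$\psi_{a\alpha}=\begin{pmatrix}\bar c_{a\alpha}\\ c_{a\alpha}\end{pmatrix} \tag{34}$$

and the adjoint

$$\bar\psi_{a\alpha}=\begin{pmatrix}-c_{a\alpha}&\bar c_{a\alpha}\end{pmatrix}. \tag{35}$$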
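Then the quadratic form in the Grassmann variables of (28) takes the following form (it is repeated N times, once for each index a, which we drop):

$$\sum_{\alpha,\beta=1}^{k}\Big(\bar c_\alpha c_\beta B_{\beta\alpha}-\frac i2D_{\alpha\beta}c_\beta c_\alpha-\frac i2D^\dagger_{\beta\alpha}\bar c_\alpha\bar c_\beta+i\lambda_\alpha\delta_{\alpha\beta}\bar c_\alpha c_\beta\Big)=\frac12\sum_{\alpha,\beta=1}^{k}\bar\psi_\alpha\big(q_{\alpha\beta}+i\lambda_\alpha\delta_{\alpha\beta}\big)\psi_\beta, \tag{36}$$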
in which the q_αβ are quaternionic matrix elements, i.e. linear combinations of the Pauli matrices. The identification in terms of 2 × 2 matrices is thus

$$q_{\alpha\beta}=\begin{pmatrix}B_{\alpha\beta}&-iD^*_{\alpha\beta}\\-iD_{\alpha\beta}&B^*_{\alpha\beta}\end{pmatrix}. \tag{37}$$

This defines a self-dual quaternion matrix [1, 3], i.e.

$$q^\dagger_{\alpha\beta}=q_{\beta\alpha}, \tag{38}$$

and q_αβ q_βα is a multiple of the identity. Let M̃_0 be the quaternionic matrix whose elements are the quaternions q_αβ, and M̃ the quaternionic matrix with elements

$$\tilde M_{\alpha\beta}=q_{\alpha\beta}+i\lambda_\alpha\delta_{\alpha\beta}. \tag{39}$$
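The Grassmannian integration leads to

$$\int\prod_\alpha d\bar c_\alpha\,dc_\alpha\;\exp\Big(-\frac N2\sum_{\alpha,\beta=1}^{k}\bar\psi_\alpha\big(q_{\alpha\beta}+i\lambda_\alpha\delta_{\alpha\beta}\big)\psi_\beta\Big)=\mathrm{Q}\det\tilde M=-\mathrm{Pf}\,M, \tag{40}$$

in which Q det denotes the quaternionic determinant [1, 3]. In addition,

$$\mathrm{tr}(B^2+DD^\dagger)=\mathrm{tr}(\tilde M_0^2)=\sum_{\alpha\beta}q_{\alpha\beta}q_{\beta\alpha}. \tag{41}$$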
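It may be clarifying to show this quaternionic construction explicitly for the case k = 2. There one has

$$M=\begin{pmatrix}0&d&\lambda_1-iB_{11}&-iB_{21}\\-d&0&-iB_{12}&\lambda_2-iB_{22}\\-\lambda_1+iB_{11}&iB_{12}&0&-d^*\\iB_{21}&-\lambda_2+iB_{22}&d^*&0\end{pmatrix} \tag{42}$$

and

$$-\mathrm{Pf}\,M=|d|^2+(\lambda_1-iB_{11})(\lambda_2-iB_{22})+|B_{12}|^2. \tag{43}$$

The equivalent quaternionic construction is

$$\tilde M=\begin{pmatrix}q_{11}+i\lambda_1&q_{12}\\q_{21}&q_{22}+i\lambda_2\end{pmatrix} \tag{44}$$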
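with

$$q_{11}=B_{11}\mathbf 1,\quad q_{12}=(\mathrm{Re}\,B_{12})\mathbf 1+i(\mathrm{Im}\,B_{12})\sigma_3+i(\mathrm{Re}\,d)\sigma_1+i(\mathrm{Im}\,d)\sigma_2,\quad q_{21}=q_{12}^\dagger,\quad q_{22}=B_{22}\mathbf 1, \tag{45}$$

and

$$\mathrm{Q}\det(\tilde M)=(q_{11}+i\lambda_1)(q_{22}+i\lambda_2)-q_{12}q_{12}^\dagger=(B_{11}+i\lambda_1)(B_{22}+i\lambda_2)+|B_{12}|^2+|d|^2=-\mathrm{Pf}\,M. \tag{46}$$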
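The k = 2 formulas can be checked numerically. The following sketch uses hypothetical helper names and random test values. It fills (42) with a random Hermitian B, a complex d and real λ's, computes the Pfaffian by expansion along the first row, and compares it with (43) and with the relation (Pf M)² = det M behind (30). It also checks that the quaternion q12 of (45) satisfies q12 q12† = (|B12|² + |d|²)·1, i.e. that q_αβ q_βα is a multiple of the identity.

```python
import random

def pfaffian(A):
    """Pfaffian of an even-dimensional antisymmetric matrix, expanded along row 0."""
    n = len(A)
    if n == 0:
        return 1
    result = 0
    for j in range(1, n):
        keep = [r for r in range(n) if r not in (0, j)]
        minor = [[A[r][s] for s in keep] for r in keep]
        result += (-1) ** (j + 1) * A[0][j] * pfaffian(minor)
    return result

def det(A):
    """Determinant by Laplace expansion along row 0 (adequate for 4 x 4)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def k2_residuals(seed):
    random.seed(seed)
    cplx = lambda: complex(random.gauss(0, 1), random.gauss(0, 1))
    l1, l2, B11, B22 = (random.gauss(0, 1) for _ in range(4))  # Hermitian B: real diagonal
    B12, d = cplx(), cplx()
    B21, dc = B12.conjugate(), d.conjugate()
    M = [[0, d, l1 - 1j * B11, -1j * B21],  # eq. (42)
         [-d, 0, -1j * B12, l2 - 1j * B22],
         [-l1 + 1j * B11, 1j * B12, 0, -dc],
         [1j * B21, -l2 + 1j * B22, dc, 0]]
    pf = pfaffian(M)
    rhs_43 = abs(d) ** 2 + (l1 - 1j * B11) * (l2 - 1j * B22) + abs(B12) ** 2
    # quaternion q12 of (45), built from the Pauli matrices
    s1, s2, s3 = [[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]
    q = [[B12.real * (i == j) + 1j * (B12.imag * s3[i][j] + d.real * s1[i][j]
                                      + d.imag * s2[i][j]) for j in range(2)] for i in range(2)]
    qq = [[sum(q[i][m] * q[j][m].conjugate() for m in range(2)) for j in range(2)]
          for i in range(2)]  # q12 q12^dagger
    norm = abs(B12) ** 2 + abs(d) ** 2
    return (abs(-pf - rhs_43),  # eq. (43)
            abs(pf ** 2 - det(M)),  # Pf^2 = det, as used in (30)
            max(abs(qq[0][0] - norm), abs(qq[1][1] - norm), abs(qq[0][1]), abs(qq[1][0])))

print(k2_residuals(1))  # three residuals at round-off level
```

A recursive Pfaffian is exponential in the matrix size, which is harmless for the 4 × 4 case shown here.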
Therefore we end up, for the correlation functions, with the following duality:

$$F_k(\lambda_1,\ldots,\lambda_k)=\int d\tilde M_0\,(\mathrm{Q}\det\tilde M)^N\,e^{-N\,\mathrm{tr}(\tilde M_0^2)}. \tag{47}$$
The original integral over real symmetric N × N matrices is replaced by integrals over quaternionic matrices which depend upon 2k² − k degrees of freedom. Those matrices are the generators of the symmetric space U(2k)/Sp(k). This representation (47), in terms of a finite number of integration variables, is a priori well adapted to the large N limit. However, in the scaling limit of interest, the contributions of the non-Gaussian fluctuations around the saddle points turn out to be all relevant. Therefore it is necessary to eliminate the “angular” degrees of freedom first. When all the λ_i are equal, namely if we consider the moments of the characteristic polynomials, one can simply diagonalize the symplectic matrices in terms of k eigenvalues and then proceed to the large N limit. This is done in the next section. However, if the λ_i's are unequal we need some equivalent of the HIZ formalism, which will be described afterwards.

4. Moments of the Characteristic Polynomials

We first note the trivial case k = 1: F_1(λ) = ⟨det(λ − X)⟩ is simply

$$F_1(\lambda)=\int_{-\infty}^{\infty}db\;e^{-Nb^2}(\lambda-ib)^N, \tag{48}$$
which, up to a trivial factor, is the Hermite polynomial $H_N(\sqrt N\lambda)$. It has an oscillatory behavior for large N when λ belongs to the support of Wigner's semicircle. Therefore we consider from now on the more interesting even correlation functions. When all the λ_i's are equal, the matrix Λ is a multiple of the identity, and one can diagonalize the quaternionic matrix M̃ through a transformation belonging to the symplectic group Sp(2k). The transformation of M̃ into the diagonal matrix T = diag(t_1, . . . , t_k) yields the Jacobian J = [Δ(t)]⁴, where Δ(t) is the Vandermonde determinant $\prod_{i<j}(t_i-t_j)$. This Jacobian is characteristic of the β = 4 problem, as found earlier in [11]. This gives simply

$$F_{2k}(\lambda)=\big\langle\det(\lambda-X)^{2k}\big\rangle=C\exp\big(2Nk\lambda^2\big)\int\prod_{l=1}^{2k}dt_l\,t_l^N\prod_{l<l'}(t_l-t_{l'})^4\;e^{-N\sum_{l=1}^{2k}t_l^2+2iN\lambda\sum_{l=1}^{2k}t_l}. \tag{49}$$
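The Hermite-polynomial statement about (48) can be made explicit. We normalize the b-integral to a Gaussian average of variance 1/(2N); this normalization is our choice and fixes the "trivial factor". Only even powers of b survive the average, so $\langle(\lambda-ib)^N\rangle=\sum_j\binom{N}{2j}\lambda^{N-2j}(-1)^j(2j-1)!!/(2N)^j$. Term by term, this equals $H_N(\sqrt N\lambda)/(2^NN^{N/2})$, with physicists' Hermite polynomials. The sketch below, which uses hypothetical helper names, compares the two sides numerically.

```python
import math

def f1_from_moments(lam, N):
    """<(lam - i b)^N> for b Gaussian with weight exp(-N b^2), i.e. variance 1/(2N)."""
    total = 0.0
    for j in range(N // 2 + 1):
        double_fact = math.prod(range(1, 2 * j, 2))  # (2j - 1)!!, equal to 1 for j = 0
        total += (math.comb(N, 2 * j) * lam ** (N - 2 * j)
                  * (-1) ** j * double_fact / (2 * N) ** j)
    return total

def hermite(n, x):
    """Physicists' Hermite polynomial from H_{m+1} = 2x H_m - 2m H_{m-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for m in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * m * h_prev
    return h

N, lam = 4, 0.5
print(f1_from_moments(lam, N), hermite(N, math.sqrt(N) * lam) / (2 ** N * N ** (N / 2)))
# -0.078125 -0.078125
```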
The integral representation (49) is well suited to the study of the large N limit. Exponentiating the $t_l^N$ term as $e^{N\log t_l}$, the integrand is of the form $\exp\big(-N\sum_{l=1}^{2k}f(t_l)\big)$ with

$$f(t)=t^2-2i\lambda t-\log t. \tag{50}$$

The saddle points for every t_l are solutions of f'(t_c) = 0, i.e.

$$2t^2-2i\lambda t-1=0. \tag{51}$$
The two solutions are given by

$$t^\pm=\frac12\Big(i\lambda\pm\sqrt{2-\lambda^2}\Big). \tag{52}$$

The difference t⁺ − t⁻ is proportional to the semicircular density of eigenvalues of the GOE:

$$t^+-t^-=\pi\rho(\lambda), \tag{53}$$

where $\rho(\lambda)=\sqrt{2-\lambda^2}/\pi$. Expanding t_l around either t⁺ or t⁻, we find that the leading saddle points are those in which half of the t_l (l = 1, . . . , 2k) are near t⁺ and the remaining half near t⁻. (Other choices give oscillatory contributions in $\exp\big(-N\sum_{l=1}^{2k}f(t_l)\big)$, which are damped in the large N limit.) Therefore we have to add the (2k)!/(k!k!) leading saddle points corresponding to the distribution of half of the t_l's near t⁺ and the other half near t⁻. The measure term, given by the 4th power of the Vandermonde determinant, yields a factor $(\pi\rho(\lambda))^{4k^2}$ from the k variables near t⁺ and the k near t⁻. The exponent f is then expanded around t⁺ or t⁻, and the remaining integral factorizes into an integration around t⁺ and another one around t⁻. The integration around t⁺ is
$$\int_{-\infty}^{\infty}\prod_{l=1}^{k}dt_l\;e^{-\frac N2f''(t^+)\sum_{l=1}^{k}(t_l-t^+)^2}\prod_{i<j}(t_i-t_j)^4=\Big(\frac{1}{\sqrt{Nf''(t^+)}}\Big)^{2k^2-k}\prod_{l=1}^{k}(2l)!. \tag{54}$$
Noting t⁺t⁻ = −1 and t⁺ − t⁻ = πρ(λ), we find f''(t⁺)f''(t⁻) = (πρ(λ))². We need to fix the normalization constant C in (49). It is obtained from the integral

$$\int\prod_{l=1}^{2k}dt_l\;e^{-\frac N2\sum_l t_l^2}\prod_{i<j}(t_i-t_j)^4=\Big(\frac1N\Big)^{4k^2-k}\prod_{l=1}^{2k}(2l)!. \tag{55}$$
The constant C in (49) is thus the inverse of this number. This constant appears as a normalization for the n-point correlation function of the Gaussian symplectic ensemble [3]. Thus, including this normalization constant, F_{2k}(λ) becomes in the large N limit

$$\exp\big(-2Nk\lambda^2\big)F_{2k}^{(\mathrm{GOE})}=\gamma_k\,N^{2k^2}\big(2\pi\rho(\lambda)\big)^{2k^2+k}, \tag{56}$$

$$\gamma_k=\frac{(2k)!}{k!\,k!}\,\frac{\Big(\prod_{l=1}^{k}(2l)!\Big)^2}{\prod_{l=1}^{2k}(2l)!}=\prod_{l=1}^{k}\frac{(2l-1)!}{(2k+2l-1)!}. \tag{57}$$

For example, in the case of $F_2(\lambda)=\langle(\det(\lambda-X))^2\rangle$, it gives

$$\exp\big(-2N\lambda^2\big)F_2(\lambda)^{(\mathrm{GOE})}=\frac16N^2\big(2\pi\rho(\lambda)\big)^3. \tag{58}$$
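The two closed forms of γ_k can be confirmed with exact rational arithmetic. In the sketch below (the helper names are ours), the GUE combination uses (18) with h_k from (19). The GOE combination is the first expression in (57). Each is compared with its product formula, and for k = 1 the GOE value reproduces the 1/6 of (58).

```python
from fractions import Fraction
from math import comb, factorial, prod

def gamma_gue(k):
    """binom(2k, k) h_k^2 / h_{2k}, with h_k = prod_{l=0}^{k} l!  (eqs. (18)-(19))."""
    h = lambda m: prod(factorial(l) for l in range(m + 1))
    return Fraction(comb(2 * k, k) * h(k) ** 2, h(2 * k))

def gamma_gue_product(k):
    return prod((Fraction(factorial(l), factorial(k + l)) for l in range(k)),
                start=Fraction(1))

def gamma_goe(k):
    """binom(2k, k) [prod_{l=1}^{k} (2l)!]^2 / prod_{l=1}^{2k} (2l)!  (first form of (57))."""
    g = lambda m: prod(factorial(2 * l) for l in range(1, m + 1))
    return Fraction(comb(2 * k, k) * g(k) ** 2, g(2 * k))

def gamma_goe_product(k):
    return prod((Fraction(factorial(2 * l - 1), factorial(2 * k + 2 * l - 1))
                 for l in range(1, k + 1)), start=Fraction(1))

for k in range(1, 6):
    assert gamma_gue(k) == gamma_gue_product(k)
    assert gamma_goe(k) == gamma_goe_product(k)
print(gamma_gue(2), gamma_goe(1), gamma_goe(2))  # 1/12 1/6 1/100800
```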
This result agrees with what one would deduce from lim ρ(λ_1, λ_2)/(λ_1 − λ_2), where the limit means λ_1 → λ and λ_2 → λ. The value of γ_k agrees with the result for the COE (circular orthogonal ensemble) found by Keating and Snaith [13] through the Selberg integral formula. In the COE case, however, the density of states is a constant, and the factor ρ(λ) is absent. This result is to be compared with the earlier result for the GUE,

$$\exp\big(-Nk\lambda^2\big)F_{2k}^{(\mathrm{GUE})}=\gamma_k\big(2N\pi\rho(\lambda)\big)^{k^2}, \tag{59}$$

where γ_k is given by (18). This universal constant γ_k also appears in the average of the moments of the Riemann ζ-function, and it has a number-theoretical meaning [4, 13, 14].

5. Correlations of Characteristic Polynomials

The integral representation (12) of the correlation functions F_k(λ_1, . . . , λ_k) is not unitary invariant unless the λ_i's are all equal. Therefore, if we parametrize the matrix B as B = U^†bU, in which b is a diagonal matrix, we have to consider the HIZ integral

$$\int dU\,\exp\big(i\,\mathrm{tr}\,U^\dagger bU\Lambda\big)=\frac{\det\exp(i\lambda_ib_j)}{\Delta(b)\,\Delta(\lambda)}, \tag{60}$$

which is well known to be WKB exact. This explains why, in the GUE case, it is equally possible to apply the saddle-point method with or without integrating out the unitary group. In the symplectic case we are not aware of any similar explicit result. However, it will be shown now, at least for the lowest values of k, that the integral over the symplectic group can be performed exactly. Remarkably, in this case WKB plus a finite number of corrections turns out to be exact. In the case k = 2, we have

$$F_2(\lambda_1,\lambda_2)=\big\langle\det(\lambda_1-X)\det(\lambda_2-X)\big\rangle. \tag{61}$$

As shown in the previous section, it is given by the integral (49). We first evaluate the angular integral explicitly. To do so, we diagonalize B by a unitary transformation, and then write the eigenvalues b_1 and b_2 in terms of new parameters t and c,

$$b_1=(1-c)t_1+ct_2,\qquad b_2=ct_1+(1-c)t_2. \tag{62}$$
Then we have b_1 + b_2 = t_1 + t_2, and b_1 − b_2 = (1 − 2c)(t_1 − t_2). Since the integrand is a function of |d|², we change variable to |d|² = c(1 − c)(t_1 − t_2)²; then we have b_1² + b_2² + 2|d|² = t_1² + t_2², and b_1b_2 − |d|² = t_1t_2. Finally, since the parameter c is restricted to the interval 0 < c < 1, we replace it by c = sin²θ. This leads to

$$F_2(\lambda_1,\lambda_2)=e^{-N(\lambda_1^2+\lambda_2^2)}\int_0^1dc\int_{-\infty}^{\infty}dt_1\,dt_2\,(t_1t_2)^N(1-2c)\frac{(t_1-t_2)^3}{\lambda_1-\lambda_2}\;e^{-N(t_1^2+t_2^2)+2iN(t_1\lambda_1+t_2\lambda_2)-2iNc(t_1-t_2)(\lambda_1-\lambda_2)}. \tag{63}$$
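The change of variables (62) can be sanity-checked numerically. With |d|² = c(1 − c)(t_1 − t_2)², four combinations should reduce to simple expressions in t for every c in (0, 1):

- b_1 + b_2 should equal t_1 + t_2.
- b_1 − b_2 should equal (1 − 2c)(t_1 − t_2).
- b_1² + b_2² + 2|d|² should equal t_1² + t_2².
- b_1b_2 − |d|² should equal t_1t_2.

The sketch below is a hypothetical helper that evaluates the four residuals at random points.

```python
import random

def residuals(t1, t2, c):
    """Residuals of the four identities that follow from (62)."""
    b1 = (1 - c) * t1 + c * t2
    b2 = c * t1 + (1 - c) * t2
    d2 = c * (1 - c) * (t1 - t2) ** 2  # |d|^2 as chosen in the text
    return (b1 + b2 - (t1 + t2),
            b1 - b2 - (1 - 2 * c) * (t1 - t2),
            b1 ** 2 + b2 ** 2 + 2 * d2 - (t1 ** 2 + t2 ** 2),
            b1 * b2 - d2 - t1 * t2)

random.seed(2)
worst = max(abs(r) for _ in range(1000)
            for r in residuals(random.uniform(-3, 3), random.uniform(-3, 3), random.random()))
print(worst)  # round-off level
```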
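The integration over c yields

$$F_2(\lambda_1,\lambda_2)=e^{-N(\lambda_1^2+\lambda_2^2)}\int_{-\infty}^{\infty}dt_1\,dt_2\,(t_1t_2)^N\,e^{-N(t_1^2+t_2^2)+2iN(t_1\lambda_1+t_2\lambda_2)}\times\left[\frac1N\Big(\frac{t_1-t_2}{\lambda_1-\lambda_2}\Big)^2+\frac{i}{N^2}\,\frac{t_1-t_2}{(\lambda_1-\lambda_2)^3}\right]. \tag{64}$$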
When λ_1 = λ_2 = λ, it reduces as expected to (49). This formula may easily be checked for finite values of N, since it reduces to Gaussian integrals over t_1 and t_2. For instance, in the simplest case N = 1 it gives F_2(λ_1, λ_2) ∼ λ_1λ_2 + 1, which agrees with the direct calculation (in this case the trivial integral over the real axis $\int dx\,(\lambda_1-x)(\lambda_2-x)\,e^{-\frac12x^2}$).
However, the representation (64), which is exact for any N, makes two points clear:

i) the large N limit may be found through a saddle-point integration over t_1 and t_2; this will be done below;

ii) in the universal local limit of interest, in which N goes to infinity, λ_1 − λ_2 goes to zero and N(λ_1 − λ_2) remains finite, the large N limit could not have been taken earlier.

If, for instance, we had used the saddle-point method at the level of (33), we would have missed the second term in the bracket of (64). Suppose instead that, at the early level of (33), we had recognized that the regime of interest requires an expansion beyond the Gaussian approximation to the saddle point. It would then have appeared, unexpectedly, that the expansion stops after the first correction. Therefore (64) could have been obtained by a semi-classical approximation with a finite number of corrections, here just one. This is analogous to, although not as simple as, the Harish-Chandra–Itzykson–Zuber formula for the GUE case, which is semi-classically exact without any correction term. The above integration over c is therefore, for k = 2, the corresponding HIZ formula for the symplectic group. For higher values of k we need a more elaborate strategy. The HIZ formula may easily be derived by considering the Laplacian operator [21]

$$L=-\sum_{i,j}\frac{\partial^2}{\partial X_{ij}^2}. \tag{65}$$
Its eigenfunctions are plane waves:

$$L\,e^{iN\,\mathrm{tr}\,\Lambda X}=\big(N^2\,\mathrm{tr}\,\Lambda^2\big)\,e^{iN\,\mathrm{tr}\,\Lambda X}. \tag{66}$$

One can construct a unitary invariant eigenfunction of L, for the same energy N² tr Λ², by the superposition

$$I=\int dU\,e^{iN\,\mathrm{tr}\,UXU^\dagger\Lambda}, \tag{67}$$

which is nothing but the HIZ integral. Since the integral is unitary invariant, it is a function of the k eigenvalues t_i of X. The same considerations hold for the three ensembles β = 1, 2 and 4, corresponding to the orthogonal, unitary and symplectic ensembles, with

$$I=\int dg\;e^{iN\,\mathrm{tr}\,gXg^{-1}\Lambda}. \tag{68}$$
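The Laplacian, expressed as a differential operator on the eigenvalues t_i, reads

$$\left[\sum_{i=1}^{k}\frac{\partial^2}{\partial t_i^2}+\beta\sum_{\substack{i,j=1\\ i\neq j}}^{k}\frac{1}{t_i-t_j}\frac{\partial}{\partial t_i}\right]I=-\varepsilon I, \tag{69}$$

with the eigenvalue

$$\varepsilon=N^2\sum_{i=1}^{k}\lambda_i^2. \tag{70}$$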
The t-dependent eigenfunctions of this Schrödinger operator have a scalar product given by the measure

$$\langle\varphi_1|\varphi_2\rangle=\int dt_1\cdots dt_k\,|\Delta(t_1\cdots t_k)|^\beta\,\varphi_1^*(t_1\cdots t_k)\,\varphi_2(t_1\cdots t_k). \tag{71}$$

The measure becomes trivial if one multiplies the wave function by |Δ|^{β/2}. Thus if one changes I(t) to

$$\psi(t_1\cdots t_k)=|\Delta(t_1\cdots t_k)|^{\beta/2}\,I(t_1\cdots t_k), \tag{72}$$

one obtains the Hamiltonian

$$\left[\sum_{i=1}^{k}\frac{\partial^2}{\partial t_i^2}-\beta\Big(\frac\beta2-1\Big)\sum_{i<j}\frac{1}{(t_i-t_j)^2}\right]\psi=-\varepsilon\psi. \tag{73}$$
The relation between matrix quantum mechanics and many-body problems with 1/r² pair potentials was already present in [16], and it has also been used for the study of Selberg integrals by Forrester [8]. This problem differs from the Calogero–Sutherland model by the absence of a confining potential. For β = 2, the solution is again given by plane waves in the t_i, and taking into account the symmetry of I under permutations one obtains the HIZ formula. For β = 4 the problem is less trivial, but simple for finite values of k. Indeed it turns out to possess simple solutions in the form of symmetrized sums of plane waves multiplied by polynomials in the variables 1/[(t_i − t_j)(λ_i − λ_j)]. These provide a complete explicit solution for the group integration. Let us exhibit those solutions, starting with k = 2. The solution of (73) which satisfies the proper boundary conditions is

$$\psi_0=e^{iN(\lambda_1t_1+\lambda_2t_2)}\left(1+\frac{2i}{N(t_1-t_2)(\lambda_1-\lambda_2)}\right). \tag{74}$$

The symmetry of I under permutation of the t_i's then leads to the solution

$$\psi=e^{iN(\lambda_1t_1+\lambda_2t_2)}\left(1+\frac{2i}{N(t_1-t_2)(\lambda_1-\lambda_2)}\right)+e^{iN(\lambda_1t_2+\lambda_2t_1)}\left(1+\frac{2i}{N(t_2-t_1)(\lambda_1-\lambda_2)}\right). \tag{75}$$
Then, after division by the Vandermonde factors, we obtain the required symplectic HIZ formula (for k = 2),

$$I=\frac{1}{(t_1-t_2)^2(\lambda_1-\lambda_2)^2}\,\psi. \tag{76}$$
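That (74) solves (73) for β = 4 can be checked by finite differences. The sketch below uses hypothetical helper names, and N, λ_i and t_i are arbitrary test values. It applies $\sum_i\partial^2/\partial t_i^2-4/(t_1-t_2)^2$ to ψ_0 and compares the result with −εψ_0, where ε = N²(λ_1² + λ_2²) as in (70). It returns the relative residual.

```python
import cmath

def psi0(t1, t2, lam1, lam2, N):
    """Eq. (74): plane wave times 1 + 2i / (N (t1 - t2)(lam1 - lam2))."""
    return cmath.exp(1j * N * (lam1 * t1 + lam2 * t2)) * (
        1 + 2j / (N * (t1 - t2) * (lam1 - lam2)))

def residual(t1, t2, lam1, lam2, N, h=1e-4):
    """Relative residual of (73) with beta = 4, using central second differences."""
    f = lambda a, b: psi0(a, b, lam1, lam2, N)
    centre = f(t1, t2)
    lap = ((f(t1 + h, t2) - 2 * centre + f(t1 - h, t2))
           + (f(t1, t2 + h) - 2 * centre + f(t1, t2 - h))) / h ** 2
    lhs = lap - 4 / (t1 - t2) ** 2 * centre  # beta (beta/2 - 1) = 4 for beta = 4
    eps = N ** 2 * (lam1 ** 2 + lam2 ** 2)  # eq. (70)
    return abs(lhs + eps * centre) / abs(eps * centre)

print(residual(0.7, -0.4, 0.3, -0.5, N=3))  # small, limited by the finite-difference step
```

Dropping the 2i/(N(t_1 − t_2)(λ_1 − λ_2)) correction from `psi0` makes the residual of order one, which shows that the correction term is what makes (74) an exact eigenfunction.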
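For general k (β = 4), the solution of (73) is of the form

$$\psi_0=e^{iN(\lambda_1t_1+\cdots+\lambda_kt_k)}\,\chi, \tag{77}$$

where χ satisfies

$$\left[\sum_{i=1}^{k}\frac{\partial^2}{\partial t_i^2}+2iN\sum_{i=1}^{k}\lambda_i\frac{\partial}{\partial t_i}-\sum_{i<j}\frac{4}{(t_i-t_j)^2}\right]\chi=0. \tag{78}$$

The operator $\sum_{i=1}^{k}\frac{\partial^2}{\partial t_i^2}-\sum_{i<j}\frac{4}{(t_i-t_j)^2}$ annihilates the function Δ^{−1}(t_1, · · · , t_k). Consequently the solution of (78) may be written

$$\chi(t_1\cdots t_k)=\frac{f(t_1\cdots t_k)}{\Delta(t_1\cdots t_k)}, \tag{79}$$

in which f(t_1 · · · t_k) is a polynomial of degree k(k − 1)/2 in the t_a's. Defining

$$\tau_{ij}=N(\lambda_i-\lambda_j)(t_i-t_j), \tag{80}$$

one finds, for k = 3,

$$\chi=1+2i\left(\frac1{\tau_{12}}+\frac1{\tau_{23}}+\frac1{\tau_{31}}\right)-4\left(\frac1{\tau_{12}\tau_{23}}+\frac1{\tau_{23}\tau_{31}}+\frac1{\tau_{31}\tau_{12}}\right)-\frac{12i}{\tau_{12}\tau_{23}\tau_{31}}. \tag{81}$$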
Again, as for k = 2, one sees that the successive terms in the r.h.s. of (81) are of the same order in the limit of interest, and again they could have been obtained through a finite number of corrections to a semi-classical calculation. It is remarkable that the series of χ stops at the order of the inverse of the Vandermonde; thus the symplectic HIZ integral is expressed as the sum of a finite number of terms. The successive coefficients of each term are determined by Eq. (73).
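The k = 3 solution (81) can be verified in the same way. The sketch below uses hypothetical helper names and arbitrary test values. It applies the operator of (78) to χ by central finite differences. It also checks the statement that $\sum_i\partial^2/\partial t_i^2-\sum_{i<j}4/(t_i-t_j)^2$ annihilates Δ^{−1}. Both results should vanish up to discretization error.

```python
import itertools

PAIRS = [(0, 1), (1, 2), (2, 0)]  # the pairs (12), (23), (31) of eq. (81)

def chi3(t, lam, N):
    """Eq. (81), with tau_ij = N (lam_i - lam_j)(t_i - t_j)."""
    x = [1 / (N * (lam[i] - lam[j]) * (t[i] - t[j])) for i, j in PAIRS]
    s1 = x[0] + x[1] + x[2]
    s2 = x[0] * x[1] + x[1] * x[2] + x[2] * x[0]
    s3 = x[0] * x[1] * x[2]
    return 1 + 2j * s1 - 4 * s2 - 12j * s3

def inv_vandermonde(t):
    return 1 / ((t[0] - t[1]) * (t[0] - t[2]) * (t[1] - t[2]))

def apply_operator(f, t, lam, N, h=1e-4):
    """sum_i f_ii + 2iN sum_i lam_i f_i - sum_{i<j} 4 f / (t_i - t_j)^2, by central differences."""
    centre = f(t)
    total = 0
    for i in range(3):
        up, dn = list(t), list(t)
        up[i] += h
        dn[i] -= h
        fu, fd = f(up), f(dn)
        total += (fu - 2 * centre + fd) / h ** 2 + 2j * N * lam[i] * (fu - fd) / (2 * h)
    total -= sum(4 / (t[i] - t[j]) ** 2
                 for i, j in itertools.combinations(range(3), 2)) * centre
    return total

t, lam, N = [0.7, -0.4, 1.9], [0.3, -0.5, 1.1], 1
print(abs(apply_operator(lambda s: chi3(s, lam, N), t, lam, N)))  # ~ 0
print(abs(apply_operator(inv_vandermonde, t, [0, 0, 0], N)))  # ~ 0
```

Setting λ = 0 in `apply_operator` removes the first-order term, so the second line tests the pure operator acting on Δ^{−1}.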
Using this modified HIZ formula for the symplectic case, we obtain for k = 3

$$\begin{aligned}F_3(\lambda_1,\lambda_2,\lambda_3)={}&e^{-N(\lambda_1^2+\lambda_2^2+\lambda_3^2)}\int dt_1\,dt_2\,dt_3\left(\frac{\Delta(t)}{\Delta(\lambda)}\right)^2(t_1t_2t_3)^N\,e^{-N(t_1^2+t_2^2+t_3^2)+2iN(\lambda_1t_1+\lambda_2t_2+\lambda_3t_3)}\\&\times\Bigg[1+i\left(\frac{1}{N(\lambda_1-\lambda_2)(t_1-t_2)}+\frac{1}{N(\lambda_2-\lambda_3)(t_2-t_3)}+\frac{1}{N(\lambda_3-\lambda_1)(t_3-t_1)}\right)\\&\quad-\frac{1}{N^2(\lambda_1-\lambda_2)(\lambda_2-\lambda_3)(t_1-t_2)(t_2-t_3)}-\frac{1}{N^2(\lambda_2-\lambda_3)(\lambda_3-\lambda_1)(t_2-t_3)(t_3-t_1)}\\&\quad-\frac{1}{N^2(\lambda_3-\lambda_1)(\lambda_1-\lambda_2)(t_3-t_1)(t_1-t_2)}-\frac{3i}{2}\,\frac{1}{N^3\Delta(\lambda)\Delta(t)}\Bigg],\end{aligned}\tag{82}$$

where Δ(t) = (t_1 − t_2)(t_2 − t_3)(t_3 − t_1). (Note that the symmetries of the integrand allowed us to keep only the single solution (81), without adding permutations.) For low values of N (i.e. N = 1 or 2), one easily verifies this result by a direct integration over 1 × 1 or 2 × 2 matrices. Higher values of k may be handled in a similar way, but the combinatorics becomes quite heavy. For instance, the solution for the case k = 4 is given explicitly in an appendix; although it again consists of a finite number of terms, it is quite cumbersome. As is now clear, those integral representations make it easy to find the scaling limit (large N, finite N(λ_i − λ_j)). For instance, for k = 2 one finds from (64) the saddle-point values of t_1 and t_2 in the large N limit,

$$t_i=\frac12\Big(i\lambda_i+\varepsilon\sqrt{2-\lambda_i^2}\Big), \tag{83}$$

where i = 1, 2 and ε = ±1. We use the parametrization $\lambda_i=\sqrt2\cos\theta_i$. There are a priori four saddle points given by (83), but the two dominant ones are $t_1=\frac{i}{\sqrt2}e^{-i\varepsilon\theta_1}$, $t_2=\frac{i}{\sqrt2}e^{i\varepsilon\theta_2}$ for ε = ±1. In the short-distance limit, with N large and N(θ_1 − θ_2) = Nθ_12 finite, we obtain

$$F_2(\lambda_1,\lambda_2)=e^{-N(\lambda_1^2+\lambda_2^2)}\sum_{\varepsilon=\pm1}e^{-2i\varepsilon N\theta_{12}\sin^2\theta}\left[\frac{1}{(N\theta_{12})^2}+\frac{i\varepsilon}{2(N\theta_{12})^3\sin^2\theta}\right], \tag{84}$$
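where θ = (θ_1 + θ_2)/2. The semicircle density of states is given by $\rho(\lambda)=\frac1\pi\sqrt{2-\lambda^2}=\frac{\sqrt2}{\pi}\sin\theta$, and $\theta_{12}=-\frac{1}{\sqrt2}\,\frac{\lambda_{12}}{\sin\theta}$ with λ_12 = λ_1 − λ_2. Thus we obtain, in the scaling short-distance limit,

$$F_2(\lambda_1,\lambda_2)=C\,e^{-2N\lambda^2}\left(\frac{\cos x}{x^2}-\frac{\sin x}{x^3}\right), \tag{85}$$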
where x = πN(λ_1 − λ_2)ρ(λ). It is interesting to note that this function may be expressed as a half-integer Bessel function, since

$$\frac{\cos x}{x^2}-\frac{\sin x}{x^3}=-\sqrt{\frac{\pi}{2x^3}}\,J_{3/2}(x).$$

In the unitary case, the sine kernel is similarly a half-integer Bessel function, since $\frac{\sin x}{x}=\sqrt{\frac{\pi}{2x}}\,J_{1/2}(x)$. This two-point correlation function in the large N limit has also been derived by Aomoto [9] through the generalized Selberg integral, but further extension to the higher correlations seems to be difficult. Our present method is suited to the analysis of the scaling limit of the higher correlations.

6. Extension to an External Matrix Source
In the GUE case, when an external matrix source A is coupled to a Hermitian random matrix X, F_{2k}(λ_1, . . . , λ_{2k}) is given by (20), as discussed earlier. The degrees of freedom provided by the eigenvalues of A are useful for studying a number of new universality classes [6]. For instance, by tuning the eigenvalues a_i of the external source matrix A, we can study the problem of a closing gap in the spectrum of random Hermitian matrices [6]. Thus it is interesting to consider this external source problem for real symmetric matrices as well. One can always assume that the external source matrix A is diagonal. In the method of integration over Grassmann variables used in Sect. 3, it is simple to include the external matrix A:

$$\int dX\,e^{-\frac N2\mathrm{tr}\,X^2+N\,\mathrm{tr}\,AX+iN\,\mathrm{tr}\,XY}=e^{-\frac N4\mathrm{tr}\left[(Y-iA)^2+(Y-iA)(Y-iA)^T\right]}, \tag{86}$$

where $Y_{ab}=-\sum_\alpha\bar c_{a\alpha}c_{b\alpha}$. Since A is diagonal, the term tr AY simply gives the extra factor $\exp\big[iN\sum_n a_n\bar c_{n\alpha}c_{n\alpha}\big]$ in the integrand of F_k in (28). Therefore, repeating the calculations of Sect. 2, we find that Eq. (30) is modified as follows:
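$$\int\prod_{j,\alpha}d\bar c_{j\alpha}\,dc_{j\alpha}\;e^{\frac{iN}{2}\psi_{\alpha j}M^{(j)}_{\alpha\beta}\psi_{\beta j}}=\prod_{j=1}^{N}\int\prod_\alpha d\bar c_\alpha\,dc_\alpha\;e^{\frac{iN}{2}\psi_\alpha M^{(j)}_{\alpha\beta}\psi_\beta}=\prod_{j}\big[-\mathrm{Pf}\,M^{(j)}\big]=\prod_{j}\big[\det M^{(j)}\big]^{1/2}, \tag{87}$$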
where PfM (a) is the pfaffian of the antisymmetrix matrix M (a) given by
D + aj 1 − iB T (j ) M = −( + aj 1 − iB) D† in which
(88)
= diag(λ1 , . . . λk ) and aj 1 = diag(aj , . . . , aj ). Thus we finally obtain Fk (λ1 , . . . ., λk ) =
dBdDe−N tr(B
2 +D † D)
N %
& − PfM (j ) .
(89)
j =1
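The last equality in (87) rests on the classical identity $(\operatorname{Pf} M)^2 = \det M$ for an antisymmetric matrix of even size. The sign of the square root is a matter of convention. A minimal numerical sketch in Python can confirm this on small integer matrices. It assumes nothing from the paper beyond antisymmetry. The helper names are illustrative only, and the Pfaffian is computed by naive recursive expansion along the first row, whose cost grows factorially. That is fine for the small sizes $2k$ relevant here.

```python
import random
from fractions import Fraction


def pfaffian(m):
    """Pfaffian of an antisymmetric matrix by expansion along the first row."""
    n = len(m)
    if n == 0:
        return 1
    if n % 2 == 1:
        return 0
    total = 0
    for j in range(1, n):
        # Delete rows/columns 0 and j, then recurse on the remaining block.
        keep = [r for r in range(n) if r not in (0, j)]
        minor = [[m[r][c] for c in keep] for r in keep]
        sign = 1 if (j - 1) % 2 == 0 else -1  # (-1)^(j+1) in 1-based indexing
        total += sign * m[0][j] * pfaffian(minor)
    return total


def determinant(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def random_antisymmetric(n, seed=0):
    """Random integer antisymmetric n x n matrix (illustrative test data)."""
    rng = random.Random(seed)
    m = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            v = rng.randint(-5, 5)
            m[i][j], m[j][i] = v, -v
    return m


for size in (2, 4, 6):
    m = random_antisymmetric(size, seed=size)
    assert pfaffian(m) ** 2 == determinant(m)
```

The matrices $M^{(j)}$ in (88) are complex, but the identity is purely algebraic. Integer test data is therefore enough to illustrate it.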
This integral can be expressed in terms of a quaternion matrix $Q$, which can be diagonalized by the symplectic group $Sp(k)$. When all the $\lambda_i$'s are equal to a single $\lambda$, we get

$$F_k(\lambda, \dots, \lambda) = e^{-N\sum_l \lambda_l^2} \int \prod_{l=1}^{k}\prod_{j=1}^{N} (t_l - i a_j) \prod_{l<l'} (t_l - t_{l'})^4\; e^{-N\sum_l t_l^2 + 2iN\lambda \sum_l t_l} \prod_{l=1}^{k} dt_l. \tag{90}$$
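The next paragraph uses a two-valued source: half of the $a_j$ equal $+c$ and the other half equal $-c$. With that choice, the factor $\prod_j (t_l - i a_j)$ in (90) pairs up into $(t_l^2 + c^2)^{N/2}$.

The Python sketch below expands $\frac{1}{2}\log(c^2 + u) - u$ in powers of $u = t^2$, using exact rational arithmetic. This is the $t$-dependent part of the exponent divided by $N$, at $\lambda = 0$ as in (91). The function name is hypothetical. The sketch shows two things:

- the $u$ coefficient, $\frac{1}{2c^2} - 1$, vanishes at $c^2 = 1/2$;
- the $u^2$ coefficient is then $-1$, which is the source of the quartic weight $e^{-N \operatorname{tr} Q^4}$.

```python
from fractions import Fraction


def exponent_coefficients(c_squared, order=3):
    """Taylor coefficients of (1/2)*log(c^2 + u) - u for u^1, ..., u^order.

    Here u stands for t^2, and the constant (1/2)*log(c^2) is dropped.
    """
    coeffs = []
    for n in range(1, order + 1):
        # log(c^2 + u) = log(c^2) + sum_{n>=1} (-1)^(n+1) u^n / (n c^(2n))
        log_coeff = Fraction((-1) ** (n + 1), n) / c_squared ** n
        coeff = Fraction(1, 2) * log_coeff
        if n == 1:
            coeff -= 1  # contribution of the Gaussian weight exp(-N t^2)
        coeffs.append(coeff)
    return coeffs


# At the critical point c^2 = 1/2 the u (= t^2) coefficient vanishes.
print(exponent_coefficients(Fraction(1, 2)))
# [Fraction(0, 1), Fraction(-1, 1), Fraction(4, 3)]
```

With $t$ of order $N^{-1/4}$, the surviving term $-N t^4$ is of order one. The next term, $\frac{4}{3} N t^6$, is of order $N^{-1/2}$. This is why only the quartic term remains in the large-$N$ limit.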
For the case of different $\lambda_i$'s, this formula is modified by an extra factor, as in the previous section. These explicit representations may be used to study the expected universality of the previous sections with respect to the external matrix source. As an example of the usefulness of the above representation, we choose an external source with only two opposite eigenvalues $\pm c$, with half of the eigenvalues equal to $+c$ and the other half to $-c$. This gives a factor $(t_l^2 + c^2)^{N/2} = \exp\left[\frac{N}{2}\log(t_l^2 + c^2)\right]$ in the integrand. Expanding it in powers of $t^2$, the total coefficient of $t^2$ in the exponent vanishes for $c = 1/\sqrt{2}$. Therefore, in the large-$N$ limit, we obtain at this new critical point

$$\left\langle [\det(X)]^k \right\rangle = \int e^{-N \operatorname{tr} Q^4}\, dQ, \tag{91}$$

where $Q$ is a $k \times k$ symmetric quaternionic matrix, and the average $\langle \dots \rangle$ is evaluated with the distribution in the presence of the external source whose eigenvalues are $\pm c = \pm 1/\sqrt{2}$. One could make other choices for the eigenvalues of the external source matrix $A$, and thereby obtain higher multicritical points with terms such as $\operatorname{tr} Q^n$ in the exponent, in analogy with the GUE case in an external matrix source [17].

7. Summary
In this article, an exact representation of the $k$-point functions $\left\langle \prod_{a=1}^{k} \det(\lambda_a - X) \right\rangle$, averaged over $N \times N$ real symmetric random matrices, has been derived in terms of an integral over quaternionic $k \times k$ matrices, invariant under the unitary symplectic group. This representation leads to an easy calculation of the moments of the characteristic polynomials ($\lambda_1 = \dots = \lambda_k$). In the large-$N$ limit one finds

$$F_{2k}^{(\mathrm{GOE})} = e^{-Nk\lambda^2} \prod_{l=1}^{k} \frac{(2l-1)!}{(2k+2l-1)!}\; N^{2k^2} \left(2\pi\rho(\lambda)\right)^{2k^2+k}, \tag{92}$$

to be compared to the earlier result for the GUE,

$$F_{2k}^{(\mathrm{GUE})} = e^{-Nk\lambda^2/2} \prod_{l=0}^{k-1} \frac{l!}{(k+l)!}\; \left(2N\pi\rho(\lambda)\right)^{k^2}. \tag{93}$$
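The $N$-independent constants in (92) and (93) are easy to tabulate exactly. The Python sketch below computes both products with rational arithmetic; its function names are hypothetical.

It also checks the GUE product against the Barnes $G$-function form $G(k+1)^2/G(2k+1)$, using $G(n+1) = \prod_{j=0}^{n-1} j!$ at integer arguments. This is the same constant that appears in the leading-order moments of characteristic polynomials of Keating and Snaith [13].

```python
from fractions import Fraction
from math import factorial


def goe_constant(k):
    """prod_{l=1}^{k} (2l-1)! / (2k+2l-1)!, the constant in (92)."""
    result = Fraction(1)
    for l in range(1, k + 1):
        result *= Fraction(factorial(2 * l - 1), factorial(2 * k + 2 * l - 1))
    return result


def gue_constant(k):
    """prod_{l=0}^{k-1} l! / (k+l)!, the constant in (93)."""
    result = Fraction(1)
    for l in range(k):
        result *= Fraction(factorial(l), factorial(k + l))
    return result


def barnes_g(n):
    """Barnes G-function at a positive integer n: G(n) = prod_{j=0}^{n-2} j!."""
    result = 1
    for j in range(n - 1):
        result *= factorial(j)
    return result


for k in range(1, 6):
    assert gue_constant(k) == Fraction(barnes_g(k + 1) ** 2, barnes_g(2 * k + 1))
    print(k, goe_constant(k), gue_constant(k))
```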
For unequal $\lambda_a$'s, in spite of the fact that the integral representation involves a finite number of variables, the corrections to the saddle point in the large-$N$ limit are not negligible in the scaling regime where $N(\lambda_i - \lambda_j)$ is finite. A generalization of the Harish-Chandra–Itzykson–Zuber formula is shown to solve the problem. Remarkably, this formula is “nearly” semi-classical, in the sense that the semi-classical expansion terminates after a few terms, a number which increases with $k$ but not with $N$. The saddle-point method may then easily be applied for large $N$, and this leads to explicit asymptotic formulae for the correlation functions of the characteristic polynomials. Finally, this may be generalized to include an external matrix source in the probability measure. Real symmetric random matrices appear as models of numerous physical time-reversal invariant Hamiltonians. For instance, the orthogonal matrix model with an external source has been investigated as a model of glassy behavior [18]. The results of the present work for the moments and for the correlation functions in an external source may be of interest for such problems.
Appendix A: The Solution for k = 4 From (78) and (79), the polynomial f satisfies k k k ∂ 1 ∂f ∂2 ∂f f + 2 + iN λ f % λi = 0. + 2iN i 2 ∂t ∂t % ∂t ∂ti i i i i=1 i=1 i=1
(A.1)
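The solution of this equation is obtained by expanding in powers of the $\lambda_i$'s, but the expansion ends at the level of the Vandermonde $\Delta(\lambda_1, \dots, \lambda_4)$. Using the notation of (80), $\tau_{ij} = N(t_i - t_j)(\lambda_i - \lambda_j)$, we obtain

$$\begin{aligned}
f = 1 &- \frac{i}{4}\left(\tau_{12} + \tau_{13} + \tau_{14} + \tau_{23} + \tau_{24} + \tau_{34}\right) \\
&- \frac{1}{12}\big(\tau_{12}\tau_{13} + \tau_{12}\tau_{14} + \tau_{13}\tau_{14} + \tau_{12}\tau_{23} + \tau_{23}\tau_{24} + \tau_{12}\tau_{24} \\
&\qquad + \tau_{14}\tau_{34} + \tau_{14}\tau_{24} + \tau_{24}\tau_{34} + \tau_{23}\tau_{34} + \tau_{13}\tau_{34} + \tau_{13}\tau_{23}\big) \\
&- \frac{1}{18}\left(\tau_{12}\tau_{34} + \tau_{13}\tau_{24} + \tau_{14}\tau_{23}\right) \\
&+ \frac{i}{24}\left(\tau_{12}\tau_{13}\tau_{14} + \tau_{12}\tau_{23}\tau_{24} + \tau_{13}\tau_{23}\tau_{34} + \tau_{14}\tau_{24}\tau_{34}\right) \\
&+ \frac{i}{36}\big(\tau_{12}\tau_{13}\tau_{23} + \tau_{12}\tau_{14}\tau_{24} + \tau_{13}\tau_{14}\tau_{34} + \tau_{23}\tau_{24}\tau_{34} \\
&\qquad + \tau_{14}\tau_{34}\tau_{23} + \tau_{14}\tau_{24}\tau_{23} + \tau_{12}\tau_{24}\tau_{34} + \tau_{12}\tau_{23}\tau_{34} + \tau_{12}\tau_{14}\tau_{23} + \tau_{13}\tau_{14}\tau_{23} \\
&\qquad + \tau_{12}\tau_{13}\tau_{34} + \tau_{12}\tau_{14}\tau_{34} + \tau_{13}\tau_{34}\tau_{24} + \tau_{13}\tau_{24}\tau_{23} + \tau_{14}\tau_{24}\tau_{13} + \tau_{13}\tau_{12}\tau_{24}\big) \\
&+ \frac{1}{72}\big(\tau_{12}\tau_{23}\tau_{34}\tau_{14} + \tau_{12}\tau_{13}\tau_{24}\tau_{34} + \tau_{13}\tau_{14}\tau_{24}\tau_{23} + \tau_{12}\tau_{14}\tau_{24}\tau_{34} + \tau_{12}\tau_{14}\tau_{24}\tau_{23} \\
&\qquad + \tau_{12}\tau_{14}\tau_{24}\tau_{13} + \tau_{12}\tau_{13}\tau_{23}\tau_{34} + \tau_{12}\tau_{13}\tau_{23}\tau_{14} + \tau_{12}\tau_{13}\tau_{23}\tau_{24} + \tau_{12}\tau_{24}\tau_{23}\tau_{34} \\
&\qquad + \tau_{14}\tau_{24}\tau_{23}\tau_{34} + \tau_{13}\tau_{24}\tau_{23}\tau_{34} + \tau_{12}\tau_{14}\tau_{13}\tau_{34} + \tau_{14}\tau_{13}\tau_{23}\tau_{34} + \tau_{14}\tau_{13}\tau_{34}\tau_{24}\big) \\
&- \frac{i}{144}\big(\tau_{12}\tau_{13}\tau_{24}\tau_{23}\tau_{34} + \tau_{12}\tau_{14}\tau_{24}\tau_{13}\tau_{23} + \tau_{12}\tau_{14}\tau_{24}\tau_{13}\tau_{34} \\
&\qquad + \tau_{14}\tau_{13}\tau_{24}\tau_{23}\tau_{34} + \tau_{12}\tau_{14}\tau_{13}\tau_{34}\tau_{23} + \tau_{12}\tau_{14}\tau_{24}\tau_{23}\tau_{34}\big) \\
&- \frac{1}{288}\,\tau_{12}\tau_{13}\tau_{14}\tau_{23}\tau_{24}\tau_{34}.
\end{aligned} \tag{A.2}$$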
The HIZ integral is obtained by requiring the symmetry under permutation of the $t_i$'s in the final expression for $I$; thus, from (79) (and a replacement of $\lambda$ by $2\lambda$),

$$I = C\, \frac{1}{[\Delta(t)\Delta(\lambda)]^3} \left[ e^{i\sum_i \lambda_i t_i} f(t_1, \dots, t_k) + \text{perm.} \right], \tag{A.3}$$

where the last term means that one adds the terms in which one permutes the $t_i$'s for fixed $\lambda$'s.
References

1. Dyson, F.J.: Commun. Math. Phys. 19, 235 (1970)
2. Dyson, F.J.: J. Math. Phys. 13, 90 (1972)
3. Mehta, M.L.: Random Matrices. 2nd ed., New York: Academic, 1991
4. Brézin, E. and Hikami, S.: Commun. Math. Phys. 214, 111 (2000)
5. Brézin, E. and Hikami, S.: Physica A 279, 333 (2000)
6. Brézin, E. and Hikami, S.: Phys. Rev. E 62, 3558 (2000)
7. Mehta, M.L. and Normand, J.-M.: Preprint (2001)
8. Forrester, P.J.: Nucl. Phys. B 388, 671 (1992)
9. Aomoto, K.: In: Advanced Studies in Pure Mathematics 16, edited by H. Morikawa, Tokyo: Kinokuniya Company, 1988, pp. 1–16
10. Kaneko, J.: SIAM J. Math. Anal. 24, 1086 (1993)
11. Baker, T.H. and Forrester, P.J.: Commun. Math. Phys. 188, 175 (1997)
12. Guhr, T. and Kohler, H.: Preprint math-ph/0011007
13. Keating, J. and Snaith, N.: Commun. Math. Phys. 214, 57 (2000)
14. Conrey, J.B. and Farmer, D.W.: Int. Math. Res. Notices 17, 883 (2000)
15. Brézin, E. and Hikami, S.: Phys. Rev. E 56, 264 (1997)
16. Brézin, E., Itzykson, C., Parisi, G. and Zuber, J.-B.: Commun. Math. Phys. 59, 35 (1978)
17. Brézin, E. and Hikami, S.: Phys. Rev. E 57, 4140 (1998)
18. Marinari, E., Parisi, G. and Ritort, F.: J. Phys. A 27, 7647 (1994); Parisi, G. and Potters, M.: J. Phys. A 28, 5267 (1995)
19. Harish-Chandra: Proc. Nat. Acad. Sci. 42, 252 (1956)
20. Itzykson, C. and Zuber, J.-B.: J. Math. Phys. 21, 411 (1980)
21. Brézin, E.: In: Two Dimensional Quantum Gravity and Random Surfaces, edited by D.J. Gross, T. Piran and S. Weinberg, Singapore: World Scientific, 1992, p. 37
22. Brézin, E., Hikami, S. and Larkin, A.I.: Phys. Rev. B 60, 3589 (1999)
23. Duistermaat, J.J. and Heckman, G.J.: Invent. Math. 69, 259 (1982)
Communicated by H. Spohn
Commun. Math. Phys. 223, 383 – 408 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Differential Geometry from Differential Equations
Simonetta Frittelli1,2, Carlos Kozameh3, Ezra T. Newman2
1 Department of Physics and Astronomy, Duquesne University, Pittsburgh, PA 15282, USA
2 Department of Physics, University of Pittsburgh, Pittsburgh, PA 15260, USA
3 FaMAF, Universidad Nacional de Cordoba, 5000 Cordoba, Argentina
Received: 10 October 2000 / Accepted: 26 June 2001
Abstract: We first show how, from the general 3rd order ODE of the form $z''' = F(z, z', z'', s)$, one can construct a natural Lorentzian conformal metric on the four-dimensional space $(z, z', z'', s)$. When the function $F(z, z', z'', s)$ satisfies a special differential condition, the conformal metric possesses a conformal Killing field, $\xi = \partial/\partial s$, which in turn allows the conformal metric to be mapped into a three-dimensional Lorentzian metric on the space $(z, z', z'')$ or, equivalently, on the space of solutions of the original differential equation. This construction is then generalized to the pair of differential equations $z_{ss} = S(z, z_s, z_t, z_{st}, s, t)$ and $z_{tt} = T(z, z_s, z_t, z_{st}, s, t)$, with $z_s$ and $z_t$ the derivatives of $z$ with respect to $s$ and $t$. In this case, from $S$ and $T$, one can again, in a natural manner, construct a Lorentzian conformal metric on the six-dimensional space $(z, z_s, z_t, z_{st}, s, t)$. When $S$ and $T$ satisfy differential conditions analogous to those of the 3rd order ODE, the 6-space then possesses a pair of conformal Killing fields, $\xi = \partial/\partial s$ and $\eta = \partial/\partial t$, which allow, via the mapping to the four-space of $(z, z_s, z_t, z_{st})$ and a choice of conformal factor, the construction of a four-dimensional Lorentzian metric. In fact, all four-dimensional Lorentzian metrics can be constructed in this manner. This construction, with further conditions on $S$ and $T$, thus includes all (local) solutions of the Einstein equations.

1. Introduction

It is intuitively clear that with enough information about the null (or characteristic) surfaces (or, equivalently, the null geodesics) of a four-dimensional Lorentzian manifold, one could reconstruct the metric tensor, up to multiplication by a conformal factor. The issue is how one could explicitly carry out this reconstruction process in some simple and useful fashion.
In other words, is there a simple way in which the information about the null surfaces could be given or coded, and is there then a simple way in which knowledge about the metric could be extracted from this information? It is the solution to this problem that we present here. Though it is not at all obvious, this problem can be
rephrased and solved in a rather surprising manner. By beginning with families of special pairs of 2nd order overdetermined partial differential equations for a single function of two independent variables, any conformal Lorentzian metric can be constructed from the solutions. The information about the null surfaces is stored in the solutions of the pair of equations. It is in this manner that our problem is solved. The present work is a detailed exposition of these ideas. We have developed and studied a reformulation of General Relativity where the primary objects of study are families of 3-dimensional surfaces in a 4-space. They are used to define a conformal Lorentzian metric via the requirement that the surfaces be null or characteristic surfaces of that conformal metric. In addition, a choice of a conformal factor is needed to make the conformal metric into an Einstein metric. From this point of view the metric tensor is a derived concept, and the Einstein equations appear as equations for the surfaces and the conformal factor, with no mention of a metric. In this study it had on occasion been useful to simplify some of the equations by assuming we were studying conformal geometry in a three-dimensional Lorentzian manifold. Much to our surprise, we recently discovered that this 3-dimensional problem – with a totally different motivation – had been studied in some classical papers by Cartan and Chern. One major purpose of this work is to examine the relationship of our version of 3-dimensional Lorentzian conformal geometry with that of Cartan and Chern. Of perhaps greater importance is our generalization of these ideas to the case of 4-dimensional conformal Lorentzian geometries. This much more complicated problem is discussed after an exposition of the 3-dimensional problem.
Though one might consider these investigations to be mainly in the realm of the study of certain classes of differential equations, our main motivation has been the investigation of the Einstein equations of general relativity. Already the theory of self- (or anti-self-) dual vacuum Einstein metrics has arisen as a natural special case. The emphasis now is on the case of the full vacuum Einstein equations. In the late 1930’s Cartan and Chern [1–4], while studying the invariance properties of differential equations, showed that there was a natural geometric structure that can be associated with ordinary differential equations [ODE’s] of the form $z'' = E(z, z', s)$ or $z''' = F(z, z', z'', s)$, where the prime denotes differentiation with respect to the independent variable $s$. This geometric structure, which is quite rich, involving a wide variety of connections (projective, conformal and metric, with and without torsion), is given on the solution spaces of the equations. More specifically, if the solutions are denoted by $z = z(x^a, s)$, where the $x^a$ are the arbitrary constants of integration (two of them for the first equation and three for the second), the geometric structures often (or usually) live on the space of the constants of integration, but are sometimes augmented with an extra dimension by adding the independent variable $s$ as a fiber coordinate. In Sect. 2, we first review, from a new perspective, these results of Cartan and Chern applied to the third order equation, and then, in Sect. 3, we generalize them to a new system of equations. For the new system we will consider the dependent variable $z$ to be a function of two independent variables, $s$ and $t$, i.e., $z = z(s, t)$, that satisfies the system of equations $z_{ss} = S(z, z_s, z_t, z_{st}, s, t)$ and $z_{tt} = T(z, z_s, z_t, z_{st}, s, t)$, with $z_s$ and $z_t$ the derivatives of $z$ with respect to $s$ and $t$. The integrability conditions, $S_{tt} = T_{ss}$, are assumed to be satisfied.
The solution space of these equations, which is four-dimensional and is again augmented by the fiber coordinates $s$ and $t$, possesses a natural six-dimensional conformal metric. By a judicious choice of the two functions $S$ and $T$ (i.e., by their being solutions of a complicated differential equation), the six-dimensional
conformal metric possesses two conformal Killing fields and (via them and a special choice of conformal factor) maps to a family of conformal Lorentzian metrics on the four-dimensional solution space. All four-dimensional Lorentzian metrics can be obtained in this manner. It follows that, by a further restriction in the choice of $S$ and $T$ and the choice of conformal factor, all Einstein spaces can be so obtained. In Sect. 4 we discuss the relationship of this work to the earlier work on the null surface reformulation of GR. For the sake of completeness, in Appendix A, we outline the Cartan geometry associated with the second order equation $z'' = E(z, z', s)$.

2. The Differential Geometry of $z''' = F(z, z', z'', s)$

We will study the geometry associated with the differential equation $z''' = F(z, z', z'', s)$, assuming that $F(z, z', z'', s)$ is a smooth function of all its variables. We will only be interested in the local behavior of the solutions. There will be several different (but related) spaces that will be of interest to us. First of all, we mention the two-dimensional space of $(z, s)$; Cartan and Chern studied the problem of the equivalence classes of differential equations under the diffeomorphisms in this two-space. This problem, though of considerable interest, will not concern us. The next space is the three-dimensional solution space of the differential equation. The solutions of the third order ODE are given in terms of three constants of integration, $x^a$, so that $z = z(x^a, s)$ is the general solution and the space of the $x^a$ is the solution space. For any fixed value of the independent variable $s$, the relations

$$z = z(x^a, s), \qquad z' = z'(x^a, s), \qquad z'' = z''(x^a, s) \tag{1}$$
can be considered as a coordinate transformation (that depends on the parameter $s$) between the three $x^a$ and the “coordinates” $(z, z', z'')$, i.e., they define a one-parameter family of coordinate transformations. This leads naturally to the idea of a four-dimensional space coordinatized either by $(x^a, s)$ or by $(z, z', z'', s)$. The first choice suggests that this four-space should be thought of as a base three-space augmented by the one-dimensional fibers coordinatized by $s$. Different constructions or applications lead naturally to one or the other of the coordinatizations. Remark 1. For most applications that are of interest to us, the independent variable $s$ is taken to be the angle $\phi$ on the circle, and the fiber would thus be thought of as $S^1$. This circle, in our applications to 3-dimensional Lorentzian spaces, is simply the “circle” of null directions at each space-time point. At this point in the exposition, however, this is not easily seen. Here, for the moment, we are only interested in the local behavior. On the four-dimensional space of $(z, z', z'', s)$, we consider the four one-forms $\beta^a$,

$$\beta^1 = dz - z'\,ds, \qquad \beta^2 = dz' - z''\,ds, \qquad \beta^3 = dz'' - F(z, z', z'', s)\,ds, \qquad \beta^4 = ds.$$

From (1), we can write

$$dz = \partial_a z\, dx^a + z'\,ds, \qquad dz' = \partial_a z'\, dx^a + z''\,ds, \qquad dz'' = \partial_a z''\, dx^a + F(z, z', z'', s)\,ds, \tag{2}$$
so that we have the alternative version of the forms

$$\beta^1 = z_a\, dx^a, \qquad \beta^2 = z'_a\, dx^a, \qquad \beta^3 = z''_a\, dx^a, \qquad \beta^4 = ds. \tag{3}$$

The following four linear combinations of the $\beta$'s will play a central role, though for the moment only the first three will be used:

$$\omega^1 = \beta^1, \qquad \omega^2 = \beta^2, \qquad \omega^3 = \beta^3 + a\beta^1 + b\beta^2, \qquad \omega^4 = C\beta^4. \tag{4}$$
The $(a, b, C)$ are three functions of $(z, z', z'', s)$ that are to be determined. From the $\omega^i = (\omega^1, \omega^2, \omega^3)$ we construct the following one-parameter family of Lorentzian 3-metrics, parametrized by the values of $s$:

$$g(z, z', z'', s) = \omega^1 \otimes \omega^3 + \omega^3 \otimes \omega^1 - \omega^2 \otimes \omega^2. \tag{5}$$
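Here is a minimal Python sketch that checks two things: the signature of (5), and the null-rotation freedom mentioned in the Note that follows. It assumes, purely for illustration, a constant orthonormal coframe $(e^0, e^1, e^2)$ with $\omega^1 = (e^0 + e^1)/\sqrt{2}$, $\omega^3 = (e^0 - e^1)/\sqrt{2}$ and $\omega^2 = e^2$.

The sketch builds the component matrix of (5), which should come out as $\operatorname{diag}(1, -1, -1)$. It then verifies that the null rotation $\omega^2 \to \omega^2 + A\omega^1$, $\omega^3 \to \omega^3 + A\omega^2 + \frac{1}{2}A^2\omega^1$ leaves the metric unchanged for arbitrary $A$. The helper names are hypothetical.

```python
import math


def metric_from_coframe(w1, w2, w3):
    """Components g_ab of w1(x)w3 + w3(x)w1 - w2(x)w2 for covectors given as lists."""
    n = len(w1)
    return [[w1[a] * w3[b] + w3[a] * w1[b] - w2[a] * w2[b] for b in range(n)]
            for a in range(n)]


def null_rotation(w1, w2, w3, A):
    """Apply w2 -> w2 + A w1 and w3 -> w3 + A w2 + (A^2/2) w1, leaving w1 fixed."""
    new_w2 = [x2 + A * x1 for x1, x2 in zip(w1, w2)]
    new_w3 = [x3 + A * x2 + 0.5 * A * A * x1 for x1, x2, x3 in zip(w1, w2, w3)]
    return w1, new_w2, new_w3


r = 1 / math.sqrt(2)
w1 = [r, r, 0.0]     # (e0 + e1)/sqrt(2), null
w3 = [r, -r, 0.0]    # (e0 - e1)/sqrt(2), null
w2 = [0.0, 0.0, 1.0]  # e2, spacelike

g = metric_from_coframe(w1, w2, w3)
# Expected: diag(1, -1, -1), i.e. Lorentzian signature (+, -, -).
print([[round(x, 12) for x in row] for row in g])

for A in (-2.0, 0.3, 5.0):
    g_rot = metric_from_coframe(*null_rotation(w1, w2, w3, A))
    assert all(abs(g_rot[i][j] - g[i][j]) < 1e-12 for i in range(3) for j in range(3))
```

The invariance is exact algebra. The cross terms $2A\,\omega^1\omega^2$ and $A^2(\omega^1)^2$ generated by the two shifted forms cancel, so the numerical check only confirms it up to rounding.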
At this point we are simply defining a one-parameter family of metrics constructed from $(\omega^1, \omega^2, \omega^3)$. Later we will see that this definition is justified by the results. Note. A more general version of (4) could have been used. The $\omega^2$ could have included another term, so that it had the form $\omega^2 = \beta^2 + A\beta^1$, with the further modification $\omega^3 = \beta^3 + (a + \frac{1}{2}A^2)\beta^1 + (b + A)\beta^2$. These extra terms, however, play no role; they form a “null” rotation, leaving the metric (5) invariant for arbitrary $A$. Remark 2. In order to give some perspective and motivation, we remark that from another point of view, described in detail later, the metric (5) arose from physical considerations where the function $u = z(x^a, s) = \text{const}$ represented a one-parameter family of null foliations of a Lorentzian three-dimensional space-time. The three $\omega$'s are chosen to form a null triad, with $\omega^1$ being the gradient of $z(x^a, s)$ and with $\omega^3$ as the second null covector of the triad. The (so far) arbitrary functions $a$ and $b$ will be uniquely determined by a requirement of “minimal” dependence of the metric, Eq. (5), on the parameter $s$. The precise meaning of “minimal” dependence will be given shortly. Remark 3. We emphasize that, though our choice of the form of the metric, Eq. (5), appears to be arbitrary, it in fact appears to be the only choice that allows the following construction. Later we will give, from the other point of view, an alternate justification of its “naturalness”. Definition 1. For any function $H(z, z', z'', s)$, $\dot H$ is the total $s$-derivative:

$$\frac{dH}{ds} \equiv \dot H \equiv H_z z' + H_{z'} z'' + H_{z''} F + H_{,s}.$$
Our plan is to first take the $s$-derivative of $g(z, z', z'', s)$, i.e.,

$$\dot g \equiv \frac{dg(z, z', z'', s)}{ds},$$

and then, by a judicious choice of $a$ and $b$, to make $\dot g$ as close as possible to being proportional to $g$ itself. Remark 4. This derivative is actually the Lie derivative of the metric along the vector field

$$\frac{d}{ds} \equiv \frac{\partial}{\partial s} + z'\frac{\partial}{\partial z} + z''\frac{\partial}{\partial z'} + F\frac{\partial}{\partial z''}$$

in the $(z, z', z'', s)$ space. It is, perhaps, simpler to think of it as the total $s$-derivative of $g$ in the $(z, z', z'', s)$ coordinate system. It turns out that making $\dot g = \lambda g$ exactly requires a restriction on the $F$ of the starting differential equation, $z''' = F(z, z', z'', s)$. Nevertheless, it is the “as close as possible” condition that will constitute our “minimal” dependence condition. Note that when $\dot g = \lambda g$ there is a conformal structure naturally defined on the solution space. Explicitly, the $s$-derivative of $g(z, z', z'', s)$ is

$$\dot g = \dot\omega^1 \otimes \omega^3 + \omega^1 \otimes \dot\omega^3 + \dot\omega^3 \otimes \omega^1 + \omega^3 \otimes \dot\omega^1 - \dot\omega^2 \otimes \omega^2 - \omega^2 \otimes \dot\omega^2. \tag{6}$$
From the definition of the forms we have, for the first two $\omega$'s,

$$\dot\omega^1 = \omega^2, \qquad \dot\omega^2 = \omega^3 - a\omega^1 - b\omega^2. \tag{7}$$
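Using

$$\dot\omega^3 = \dot\beta^3 + \dot a\,\omega^1 + a\,\dot\omega^1 + \dot b\,\omega^2 + b\,\dot\omega^2, \qquad \dot\beta^3 = z'''_a\, dx^a = F_{,a}\, dx^a = F_z\,\omega^1 + F_{z'}\,\omega^2 + F_{z''}(\omega^3 - a\omega^1 - b\omega^2),$$

we have

$$\dot\omega^3 = (F_z - aF_{z''} + \dot a - ab)\,\omega^1 + (F_{z'} - bF_{z''} + a + \dot b - b^2)\,\omega^2 + (F_{z''} + b)\,\omega^3,$$

or

$$\dot\omega^3 = U\omega^1 + V\omega^2 + W\omega^3, \tag{8}$$

with

$$U = F_z - aF_{z''} + \dot a - ab, \qquad V = F_{z'} - bF_{z''} + a + \dot b - b^2, \qquad W = F_{z''} + b. \tag{9}$$

Substituting Eqs. (7) and (8) into Eq. (6), we obtain, after collecting terms,

$$\dot g = 2U\,\omega^1 \otimes \omega^1 + 2(V + a)\,\omega^{(1} \otimes \omega^{2)} + 2W\,\omega^{(1} \otimes \omega^{3)} + 2b\,\omega^2 \otimes \omega^2. \tag{10}$$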
We can now precisely state our condition of “minimal $s$ dependence” of the metric:

1. We require that, in Eq. (10), the coefficient of $\omega^{(1} \otimes \omega^{2)}$ vanishes, i.e.,
$$a = -V, \tag{11}$$
2. We require the two terms $2W\,\omega^{(1} \otimes \omega^{3)} + 2b\,\omega^2 \otimes \omega^2$ to combine so that they are proportional to the metric, Eq. (5), i.e.,
$$2b = -W. \tag{12}$$

This leads to the unique algebraic determination of $a$ and $b$ in terms of $F$ and its derivatives:

$$b = -\frac{1}{3}F_{z''}, \qquad 2a = -F_{z'} - \frac{2}{9}(F_{z''})^2 + \frac{1}{3}\frac{d}{ds}(F_{z''}). \tag{13}$$

Using, from Eqs. (9) and (13),

$$U[F] \equiv F_z - a[F]F_{z''} + \dot a[F] - a[F]\,b[F], \tag{14}$$

this leads to the final form of $\dot g$:

$$\dot g[F] = 2U[F]\,\omega^1 \otimes \omega^1 + \lambda g, \tag{15}$$

with $\lambda(x^a, s) = \frac{2}{3}F_{z''}$. Our “minimal $s$ dependence” leads to a unique determination of $a$ and $b$ and to unique differential expressions for both $U$ and $\lambda$ in terms of $F$.
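To see Eqs. (13) and (14) at work, one can evaluate $U[F]$ for a linear equation $z''' = p(s)z + q(s)z' + r(s)z''$. This is an illustrative choice that is not made in the text. Then $F_z = p$, $F_{z'} = q$ and $F_{z''} = r$ depend on $s$ only, so:

- (13) gives $b = -r/3$ and $2a = -q - \frac{2}{9}r^2 + \frac{1}{3}r'$;
- (14) reduces to $U = p - \frac{2}{3}ar + a'$.

The Python sketch below evaluates this with central finite differences. The step size and the helper names are assumptions.

A useful special case is $q = -4Q$, $p = -2Q'$, $r = 0$, i.e. $z''' + 4Qz' + 2Q'z = 0$. This is the equation satisfied by products of solutions of $y'' + Qy = 0$. For it the sketch returns $U \approx 0$, so the condition $U[F] = 0$ of Proposition 2 holds.

```python
def derivative(f, s, h=1e-4):
    """Central finite-difference approximation of f'(s)."""
    return (f(s + h) - f(s - h)) / (2 * h)


def wunschmann_linear(p, q, r, s):
    """U[F] from Eqs. (13)-(14) for F = p(s) z + q(s) z' + r(s) z''.

    p, q, r are callables of s. Since F_z = p, F_z' = q and F_z'' = r depend
    on s only, the total derivative d/ds reduces to an ordinary s-derivative.
    """
    def b(t):
        return -r(t) / 3.0

    def a(t):
        return 0.5 * (-q(t) - (2.0 / 9.0) * r(t) ** 2 + derivative(r, t) / 3.0)

    # U = F_z - a F_z'' + a_dot - a b
    return p(s) - a(s) * r(s) + derivative(a, s) - a(s) * b(s)


# Products of solutions of y'' + Q y = 0, with the sample choice Q(s) = 1 + s^2.
Q = lambda s: 1.0 + s * s
dQ = lambda s: 2.0 * s
p = lambda s: -2.0 * dQ(s)
q = lambda s: -4.0 * Q(s)
r = lambda s: 0.0
print([wunschmann_linear(p, q, r, s) for s in (-1.0, 0.5, 2.0)])
```

For constant coefficients the same formula gives $U = p + \frac{qr}{3} + \frac{2r^3}{27}$. For example, $p = 1$, $q = 2$, $r = 3$ yields $U = 5$.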
Proposition 2. The metrics of our one-parameter family are all conformally related if $F$ is restricted by the condition $U[F] = 0$, so that

$$\dot g[F] = \frac{2}{3}F_{z''}\, g.$$

In this case there exists a conformal factor $\Omega$, with $\dot\Omega = \frac{1}{3}F_{z''}\,\Omega$, such that, for all values of $s$, the metric $\tilde g = \Omega^{-2} g$ satisfies $\frac{d}{ds}\tilde g = 0$. In the general case, $U[F] \neq 0$, we can extend the metric $g$ to a four-dimensional metric by

$$g^{(4)} = g - \omega^4 \otimes \omega^4 = g - C^2\, ds \otimes ds, \tag{16}$$

so that

$$\dot g^{(4)} = \dot g - 2C\dot C\, ds \otimes ds = 2U\,\omega^1 \otimes \omega^1 + \frac{2}{3}F_{z''}\, g - 2C\dot C\, ds \otimes ds. \tag{17}$$

If the unknown $C$ is chosen such that $\dot C = \frac{1}{3}F_{z''}\, C$, then

$$\dot g^{(4)} = 2U\,\omega^1 \otimes \omega^1 + \frac{2}{3}F_{z''}\, g^{(4)}. \tag{18}$$
Proposition 3. For the special case of $U[F] = 0$, we see that $\xi = d/ds$ is a conformal Killing field of the four-space, so that each of the “three-slices”, $s = \text{const}$, yields three-metrics that are conformally related. If $g^{(4)}$ is conformally rescaled by $\tilde g^{(4)} = \Omega^{-2} g^{(4)}$, with $\dot\Omega = \frac{1}{3}F_{z''}\,\Omega$, the conformal Killing vector field becomes a Killing field and the three-slices are all isometric. An alternate point of view [1] towards the geometry of $z''' = F(z, z', z'', s)$ is via the first of Cartan's structure equations for the three one-forms $\omega^i = (\omega^1, \omega^2, \omega^3)$; we have

$$d\omega^i = \omega^i{}_j \wedge \omega^j + T^i. \tag{19}$$
The indices are raised and lowered with the Lorentzian metric (from Eq. (5)) $\eta_{ij}$, with $\eta_{13} = -\eta_{22} = 1$ and all other independent components vanishing. The basis one-forms are taken as $dx^a$ and $ds$, so that, though the $\omega^i$ contain only $dx^a$, the other forms, i.e., $d\omega^i$, $\omega^i{}_j$ and $T^i$, will in general contain $ds$. The connection one-forms $\omega^i{}_j$ do not form a metric connection but rather a conformal connection. Written as $\omega_{ij} = \eta_{ik}\omega^k{}_j$, they are given by

$$\omega_{ij} = w_{ij} + \omega\,\eta_{ij}, \tag{20}$$
$$w_{ij} = -w_{ji}, \tag{21}$$
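i.e., they are taken as a metric connection plus a trace term, $\omega = \frac{1}{3}\omega^i{}_i$. Remark 5. Note that the use of the trace term in the connection $\omega_{ij}$ is a variant of a (Weyl) connection via the equation $\nabla_c g_{ab}(x^a) = 2\omega_c g_{ab}$. It is not exactly the same as a Weyl connection, but is a variant of it, because here we have the extra degree of freedom, namely the variable $s$. Writing out Eq. (19) we have

$$\begin{aligned}
d\omega^1 &= (w_{[31]} + \omega) \wedge \omega^1 + w_{[32]} \wedge \omega^2 + T^1, \\
d\omega^2 &= -w_{[21]} \wedge \omega^1 + \omega \wedge \omega^2 - w_{[23]} \wedge \omega^3 + T^2, \\
d\omega^3 &= w_{[12]} \wedge \omega^2 + (w_{[13]} + \omega) \wedge \omega^3 + T^3,
\end{aligned} \tag{22}$$

which, to determine the structure and torsion forms, can be compared with the direct calculation of $d\omega^i$, namely

$$\begin{aligned}
d\omega^1 &= ds \wedge \omega^2, \\
d\omega^2 &= ds \wedge (\omega^3 - a\omega^1 - b\omega^2), \\
d\omega^3 &= ds \wedge (U[F]\,\omega^1 - a\omega^2 - 2b\omega^3) + (a_{z'} - b\,a_{z''} - b_z + a\,b_{z''})\,\omega^2 \wedge \omega^1 + a_{z''}\,\omega^3 \wedge \omega^1 + b_{z''}\,\omega^3 \wedge \omega^2.
\end{aligned} \tag{23}$$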
When the comparison is made, we see that there are far more variables than equations, and thus there are ambiguities in the algebraic solution for $w_{ij}$, $\omega$ and $T^i$. If, however, we require that the skew part of the connection, when pulled back to the constant-$s$ surfaces, is precisely the metric connection of Eq. (5), then we have a unique solution for the connection and torsion:

$$\begin{aligned}
w_{[32]} &= ds - \tfrac{1}{2}b_{z''}\,\omega^1, \\
w_{[31]} &= -a_{z''}\,\omega^1 - \tfrac{1}{2}b_{z''}\,\omega^2 + b\,ds, \\
w_{[12]} &= -a\,ds + (a_{z'} - b\,a_{z''} - b_z + a\,b_{z''})\,\omega^1 - \tfrac{1}{2}b_{z''}\,\omega^3, \\
\omega &= -b\,ds, \qquad T^1 = 0, \qquad T^2 = 0, \qquad T^3 = U[F]\,ds \wedge \omega^1.
\end{aligned} \tag{24}$$

Once again we see the geometric role of $U[F]$: when it vanishes, the “conformal” connection has zero torsion. We have thus seen that the general third order differential equation $z''' = F(z, z', z'', s)$ induces a variety of geometric structures. These are a “conformal” connection on the solution space $x^a$, and a four-dimensional Lorentzian metric on the space $(z, z', z'', s)$ such that, when the space is foliated by the constant-$s$ three-surfaces, they possess a one-parameter family of closely related three-metrics satisfying $\dot g = 2U[F]\,\omega^1 \otimes \omega^1 + \frac{2}{3}F_{z''}\, g$. When the special condition $U[F] = 0$ is satisfied, all the three-metrics are conformally equivalent. Cartan studied the connection associated with the full conformal equivalence class. We, instead, worked out the metric connections, Eq. (24), associated with the one-parameter family of metrics, Eq. (5). Remark 6. The study [1–5] of this third order ODE had its origin in the classical question of the equivalence of ODE’s under transformations in the plane, $(z, s) \Leftrightarrow (z^*, s^*)$. Cartan studied the equivalence classes (with their invariants) of 3rd order ODEs under point transformations, $z^* = Z(z, s)$, $s^* = S(z, s)$, while Chern studied the same problems under a larger group of transformations, the group of contact transformations. The functional $U[F]$, often referred to as the Wünschmann invariant, is a relative invariant of 3rd order ODEs under contact transformations. We return briefly to this issue in the Discussion section. To conclude this section, we summarize [13, 14] what appears to be a completely different problem. It in fact turns out to be virtually identical to, or more correctly the inverse of, the problem just addressed, namely the geometry of 3rd order ODEs.
Roughly speaking, we begin with a 3-dimensional conformal Lorentzian metric and find a complete integral of the associated eikonal equation; i.e., we find a one-parameter (“$s$”) family of characteristic surfaces of sufficient generality. By taking three derivatives with respect to the parameter, the three space-time coordinates can be eliminated from the eikonal, resulting in a 3rd order ODE with “$s$” being the independent variable. Automatically, the Wünschmann invariant vanishes. Actually (along with the problem of the four-dimensional solution space of the next section), this inverse point of view was how we first addressed the issues of this work. More precisely, we begin with a three-manifold $M$, locally coordinatized by $x^a$, and require that there be a Lorentzian (conformal) metric determined in the following manner. There is to exist a one-parameter family of foliations of $M$, of sufficient generality (referred to as a complete integral), such that every member of the foliation is a null surface of the unknown metric. If the level surfaces of the one-parameter family of foliations (parametrized by $s$) are given by $u = z(x^a, s)$, then the condition that they be null, for all values of $s$, with respect to the unknown metric $g^{ab}(x^a)$, is

$$g^{ab}\,\partial_a z\,\partial_b z \equiv g^{ab} z_a z_b = 0. \tag{25}$$
By now taking a series (four) of $s$-derivatives of Eq. (25), we have

$$g^{ab} z'_a z_b = 0, \tag{26}$$
$$g^{ab} z''_a z_b + g^{ab} z'_a z'_b = 0, \tag{27}$$
$$g^{ab} z'''_a z_b + 3g^{ab} z''_a z'_b = 0, \tag{28}$$
$$g^{ab} z''''_a z_b + 4g^{ab} z'''_a z'_b + 3g^{ab} z''_a z''_b = 0. \tag{29}$$

Then, by considering the set

$$u = z(x^a, s), \qquad w = z'(x^a, s), \qquad R = z''(x^a, s), \qquad F = z'''(x^a, s), \tag{30}$$

the three $x^a$ can be eliminated from the last expression via the first three expressions, yielding

$$z''' = F(z, z', z'', s). \tag{31}$$
From the form of the metric (5), we have that $\omega^1 = z_a\, dx^a$ is a null covector. This observation thus establishes the connection with the first approach. [Note that by “a one-parameter family of sufficient generality” we mean that the three one-forms $(z_a\, dx^a, z'_a\, dx^a, z''_a\, dx^a)$ are linearly independent for all $s$. This implies that one can invert (30), i.e., obtain $x^a = X^a(z, z', z'', s)$.] We see that, via Eq. (31), $z'''_a$ and $z''''_a$ can be expressed in terms of the gradient basis $(z_a, z'_a, z''_a)$. The five expressions, Eqs. (25)–(29), yield the five independent components of a conformal metric that depend on $s$. This conformal metric (though described in the gradient basis) is identical to the conformal metric of Eq. (5), which is described in a null basis. The fifth derivative of Eq. (25), namely

$$g^{ab} z^{(5)}_a z_b + 5g^{ab} z^{(4)}_a z'_b + 10g^{ab} z'''_a z''_b = 0,$$

when expressed in terms of $F$ and its derivatives, is identical to Eq. (14), i.e., $U[F] = 0$. This completes the display of the equivalence of the two approaches. It also gives the justification for the apparently arbitrary choices of the one-forms (4) and metric (5): linear combinations of the three gradient one-forms $(z_a\, dx^a, z'_a\, dx^a, z''_a\, dx^a)$ form the null triad $(\omega^1, \omega^2, \omega^3)$. In addition to the condition $U[F] = 0$, Tod [6], following Cartan [2], in a continuing study of Eq. (31), imposes further restrictions on $F$ so that the resulting metrics contain all three-dimensional Einstein–Weyl spaces.
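The numerical coefficients in (26)–(29), and in the fifth-derivative relation above, come from the Leibniz rule. It is applied to the symmetric bilinear form $B(z, z) = g^{ab}z_a z_b$, with $g^{ab}$ independent of $s$:

$$\frac{d^n}{ds^n}B(z,z) = \sum_{k=0}^{n}\binom{n}{k}B(z^{(k)}, z^{(n-k)}).$$

A short Python sketch reproduces the coefficients. It collects symmetric pairs and normalizes the coefficient of $B(z^{(n)}, z)$ to 1; the helper name is hypothetical.

```python
from math import comb


def null_condition_coefficients(n):
    """Coefficients of g^{ab} z^(k)_a z^(n-k)_b (k >= n - k) in d^n/ds^n of g^{ab} z_a z_b.

    The raw Leibniz expansion sum_k C(n, k) B(z^(k), z^(n-k)) is grouped into
    symmetric pairs (B is symmetric) and divided by 2, so that the coefficient
    of B(z^(n), z) equals 1, as in Eqs. (26)-(29).
    """
    coeffs = {}
    for k in range(n + 1):
        pair = (max(k, n - k), min(k, n - k))
        coeffs[pair] = coeffs.get(pair, 0) + comb(n, k)
    return {pair: c // 2 for pair, c in sorted(coeffs.items(), reverse=True)}


for n in range(1, 6):
    print(n, null_condition_coefficients(n))
```

For $n = 4$ this gives the pattern $1, 4, 3$ of (29). For $n = 5$ it gives $1, 5, 10$, the fifth-derivative relation.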
3. Pairs of Partial Differential Equations

The discussion of the previous section is really a variant of the work of Cartan and Chern, seen from our point of view. In this section we will discuss a new situation. We want to find differential equations whose solution space is four-dimensional and, in addition, possesses a Lorentzian structure. This four-dimensional solution space is to be the four-dimensional manifold $M$ of physical space-time. Our goal, eventually, is to impose the Einstein vacuum equations on this space. This issue, however, will not be addressed here. After the consideration of the equation $z''' = F(z, z', z'', s)$, one might have thought that the generalization from three to four dimensions should be to an equation of the form $z'''' = G(z, z', z'', z''', s)$, whose solution space is four-dimensional. This case was studied by Bryant [7], who found a further rich variety of geometric structures, e.g., a quartic metric $g_{abcd}$, but it does not include a four-dimensional Lorentzian structure. We have taken a different direction for the creation of a four-dimensional solution space: we consider and study the geometry of the pair of equations

$$Z_{ss} = P(Z, Z_s, Z_t, Z_{st}, s, t), \qquad Z_{tt} = Q(Z, Z_s, Z_t, Z_{st}, s, t), \tag{32}$$
where $P$ and $Q$ satisfy the integrability conditions, for all $Z$,

$$D_t^2 P = D_s^2 Q, \tag{33}$$

and the weak inequality, needed for the four-dimensionality of the solution space,

$$1 > \frac{\partial P}{\partial Z_{ts}}\,\frac{\partial Q}{\partial Z_{ts}}. \tag{34}$$

We have used the notation $D_t$ or $D_s$ for the total derivatives, meaning, respectively, the $t$ and $s$ derivatives acting on all the variables while holding, respectively, $s$ or $t$ constant. For example, if $H = H(Z, Z_s, Z_t, Z_{st}, s, t)$, then

$$D_t H \equiv \frac{\partial H}{\partial Z} Z_t + \frac{\partial H}{\partial Z_s} Z_{ts} + \frac{\partial H}{\partial Z_t} Z_{tt} + \frac{\partial H}{\partial Z_{ts}} Z_{tts} + \frac{\partial H}{\partial t}. \tag{35}$$
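Formula (35) is the chain rule for $H$ evaluated along a function $Z(s, t)$ at fixed $s$. The Python sketch below checks it numerically. For illustration only, it assumes the sample function $Z(s, t) = \sin s\, e^{t} + s t^2$ and the sample $H(Z, Z_s, Z_t, Z_{st}, s, t) = Z Z_s + Z_t^2 \sin Z_{st} + s t$; neither comes from the text, and $Z$ is not required to solve (32).

The right-hand side of (35) is assembled from hand-written partial derivatives of $H$. It is then compared with a central finite-difference $t$-derivative of $H$ evaluated along $Z$. All helper names are hypothetical.

```python
import math


def jets(s, t):
    """Z and the derivatives needed in (35) for the sample Z(s, t) = sin(s) e^t + s t^2."""
    e = math.exp(t)
    return {
        "Z": math.sin(s) * e + s * t * t,
        "Zs": math.cos(s) * e + t * t,
        "Zt": math.sin(s) * e + 2 * s * t,
        "Zst": math.cos(s) * e + 2 * t,
        "Ztt": math.sin(s) * e + 2 * s,
        "Ztts": math.cos(s) * e + 2,
    }


def H(Z, Zs, Zt, Zst, s, t):
    """Sample function on the six-space (Z, Zs, Zt, Zst, s, t)."""
    return Z * Zs + Zt ** 2 * math.sin(Zst) + s * t


def total_t_derivative(s, t):
    """Right-hand side of (35), with the partial derivatives of H written out by hand."""
    j = jets(s, t)
    dH_dZ = j["Zs"]
    dH_dZs = j["Z"]
    dH_dZt = 2 * j["Zt"] * math.sin(j["Zst"])
    dH_dZst = j["Zt"] ** 2 * math.cos(j["Zst"])
    dH_dt = s
    return (dH_dZ * j["Zt"] + dH_dZs * j["Zst"] + dH_dZt * j["Ztt"]
            + dH_dZst * j["Ztts"] + dH_dt)


def H_along_Z(s, t):
    """H evaluated on the jets of the sample function Z at (s, t)."""
    j = jets(s, t)
    return H(j["Z"], j["Zs"], j["Zt"], j["Zst"], s, t)


s0, t0, h = 0.4, -0.3, 1e-5
finite_diff = (H_along_Z(s0, t0 + h) - H_along_Z(s0, t0 - h)) / (2 * h)
print(total_t_derivative(s0, t0), finite_diff)
```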
$D_t$ and $D_s$ should also be thought of as vector fields on the six-space $(Z, Z_s, Z_t, Z_{st}, s, t)$. For notational reasons and for comparison with earlier work, but without changing anything essential, we will consider $P$ and $Q$ to be complex conjugates of each other, and $s$ and $t$ also to be complex conjugates of each other. We will adopt the notation $P = S$, $Q = S^*$ and $t = s^*$, with $D_s \equiv D$ and $D_t \equiv D^*$. (For example, $s$ and $s^*$ can be considered as the complex stereographic coordinates on $S^2$.) The solution space of Eqs. (32) is four-dimensional [8]: it is the space of constants of integration $(x^a)$, and solutions can be written as $Z = Z(x^a, s, s^*)$. We will be interested in several different spaces: the four-dimensional space of the $(x^a)$; the space of

$$(Z, Z_s, Z_{s^*}, Z_{ss^*}) \equiv (Z, DZ, D^*Z, DD^*Z) \equiv (Z, W, W^*, R) \tag{36}$$

(defining $Z, W, W^*, R$); and the six-dimensional space of $(Z, W, W^*, R, s, s^*)$.
Differential Geometry from Differential Equations
Our starting equations are then rewritten as

$$D^2 Z = S(Z, DZ, D^*Z, DD^*Z, s, s^*), \qquad D^{*2} Z = S^*(Z, DZ, D^*Z, DD^*Z, s, s^*). \tag{37}$$
We identify the spaces

$$(x^a) \Leftrightarrow (Z, W, W^*, R) \tag{38}$$
for any fixed values of $(s, s^*)$, treating the relationship (38) as a coordinate transformation between the two sets, parametrized by $(s, s^*)$. The six-space can then be coordinatized either by $(x^a, s, s^*)$ or by $(Z, W, W^*, R, s, s^*)$. It is useful to think of the larger space as a two-dimensional bundle over the four-space $x^a$. In our applications it is taken to be the sphere bundle, physically the bundle of null directions at each space-time point; this point of view will not be emphasized here. We begin with six one-forms. The first four are the gradient forms $\theta^i = (\theta^0, \theta^+, \theta^-, \theta^1) \equiv \partial_a(Z, W, W^*, R)\,dx^a$, given by

$$\begin{aligned}
\theta^0 &\equiv dZ - W\,ds - W^*\,ds^* = Z_a\,dx^a,\\
\theta^+ &\equiv dW - D^2Z\,ds - DD^*Z\,ds^* = W_a\,dx^a,\\
\theta^- &\equiv dW^* - R\,ds - D^{*2}Z\,ds^* = W^*_a\,dx^a,\\
\theta^1 &\equiv dR - D^*D^2Z\,ds - DD^{*2}Z\,ds^* = R_a\,dx^a,
\end{aligned} \tag{39}$$

and the remaining two are

$$\theta \equiv ds, \qquad \theta^* \equiv ds^*. \tag{40}$$
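We then form the combinations $\omega^i = (\omega^0, \omega^+, \omega^-, \omega^1)$,

$$\begin{aligned}
\omega^0 &= \theta^0,\\
\omega^+ &= \alpha(\theta^+ + b\,\theta^-),\\
\omega^- &= \alpha(\theta^- + b^*\theta^+),\\
\omega^1 &= \theta^1 + a\,\theta^+ + a^*\theta^- + c\,\theta^0,
\end{aligned} \tag{41}$$

and

$$\omega = C\theta, \qquad \omega^* = C^*\theta^*, \tag{42}$$

where $(\alpha, a, b, c)$ and C are to be determined.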
S. Frittelli, C. Kozameh, E. T. Newman
3.1. Four-dimensional Lorentzian metrics. From the four $\omega^i$ we form the two-parameter, $(s, s^*)$, family of Lorentzian four-metrics

$$g(x^a, s, s^*) = \omega^0\otimes\omega^1 + \omega^1\otimes\omega^0 - \omega^+\otimes\omega^- - \omega^-\otimes\omega^+ = \eta_{ij}\,\omega^i\otimes\omega^j. \tag{43}$$

This defines a metric for each value of s and $s^*$ such that the ω's form a null tetrad. We wish to know how $(\alpha, a, b, c)$ should be specified for the $(s, s^*)$-dependent metrics to be "almost" conformally equivalent for all $(s, s^*)$. The metrics are said to be "almost" conformally equivalent if

$$\begin{aligned}
Dg &= U_{ij}[M[S, S^*]]\,\omega^i\otimes\omega^j + \Lambda[S, S^*]\,g,\\
D^*g &= U^*_{ij}[M^*[S, S^*]]\,\omega^i\otimes\omega^j + \Lambda^*[S, S^*]\,g,
\end{aligned} \tag{44}$$

where:

(1) Λ and $\Lambda^*$ are explicit functions of $(S, S^*)$;
(2) $M[S, S^*]$ and $M^*[S, S^*]$ are specific nonlinear functions of $(S, S^*)$ and their derivatives (the "metricity expressions", or generalized Wunschmann conditions);
(3) $U_{ij}[M]$ are functions of M, DM and $D^2M$ that all vanish when M = 0.

S and $S^*$ are still arbitrary functions of $(Z, DZ, D^*Z, DD^*Z, s, s^*)$. The $(s, s^*)$-dependent metrics are conformally equivalent when S and $S^*$ are such that $M[S, S^*]$ vanishes. For arbitrary S, however, they are only "almost" conformally related. We refer to (44) as the "minimal dependence conditions".

In this section we display the values of $(\alpha, a, b, c)$, in terms of S, that satisfy Eqs. (44). In principle this could be done by a "simplicity" argument (i.e., by trying to do the simplest thing possible). Such an argument would consist of setting to zero certain components of $Dg - \lambda g$ (for some λ), namely those that allow us to solve for $(\alpha, a, b, c)$ algebraically in terms of $(S, S^*)$ and their derivatives. The remaining components of $Dg - \lambda g$ would then have to be shown to be of the form $U_{ij}[M[S, S^*]]$, depending on a single function $M[S, S^*]$ and hence vanishing when $M[S, S^*]$ vanishes. In this manner the unknown functions $(\alpha, a, b, c)$ would be uniquely determined.

In fact we did not do this. We did start this calculation, and in this manner we determined $(\alpha, a, b)$ (see below). However, the complexity of the algebraic expressions and manipulations soon became unmanageable, and we could not determine c and $M[S, S^*]$ directly. An alternative approach (see Sect. 4) did allow us to finish the task. In the following we first state the main results (partially obtained by both methods) and then outline the "simplicity" argument. The results will then be discussed in Subsect. 3.2. Finally, in Sect. 4, the alternative method is described in detail.
The equivalence of both methods is then shown. The main results are the following determination of the unknown functions $(\alpha, a, b, c)$:

$$b = \frac{1}{S^*_R}\Big(\sqrt{1 - S_R S^*_R} - 1\Big), \qquad b^* = \frac{1}{S_R}\Big(\sqrt{1 - S_R S^*_R} - 1\Big), \tag{45}$$

$$\alpha^2 = \frac{\sqrt{1 - S_R S^*_R} + 1}{2(1 - S_R S^*_R)} = \frac{1 + bb^*}{(1 - bb^*)^2}, \tag{46}$$
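$$a = (1 - S_R S^*_R)^{-1}\Big(1 - \tfrac14 S_R S^*_R\Big)^{-1}\Big\{\tfrac12\Big(1 + \tfrac12 S_R S^*_R\Big)\big(S^*_{W^*} + S^*_W S_R - T^*_R\big) - \tfrac34 S^*_R\big[S_W + S_{W^*}S^*_R - T_R\big]\Big\}, \tag{47}$$

$$c = -\tfrac12 G - (a - a^*b^*)(a^* - ab)(1 + bb^*)^{-1}, \tag{48}$$

where $T \equiv D^*S$ and $U \equiv D^*T = D^{*2}S = D^2S^*$, and the subscripts on S, T, U denote partial derivatives. G is defined by

$$\begin{aligned}
G\Big(1 + \tfrac12 S_R S^*_R\Big) ={}& T_W + T_{W^*}S^*_R + T^*_{W^*} + T^*_W S_R - \tfrac12 U_R\\
&+ \tfrac12\big(S_W S^*_W S_R + S_W S^*_{W^*} + S^*_{W^*}S_{W^*}S^*_R + S_{W^*}S^*_W - S^*_R S_Z - S_R S^*_Z\big)\\
&- \tfrac12\big(S_W S^*_R + S_R S^*_W + 2T^*_R\big)\frac{g^{1+}}{g^{01}} - \tfrac12\big(S_R S^*_{W^*} + S_{W^*}S^*_R + 2T_R\big)\frac{g^{1-}}{g^{01}},
\end{aligned} \tag{49}$$

with

$$\begin{aligned}
\frac{g^{1+}}{g^{01}}\Big(1 - \tfrac14 S_R S^*_R\Big) &= -\tfrac12\big[T_R - S_W - S_{W^*}S^*_R\big] + \tfrac14 S_R\big[T^*_R - S^*_{W^*} - S^*_W S_R\big],\\
\frac{g^{1-}}{g^{01}}\Big(1 - \tfrac14 S_R S^*_R\Big) &= -\tfrac12\big[T^*_R - S^*_{W^*} - S^*_W S_R\big] + \tfrac14 S^*_R\big[T_R - S_W - S_{W^*}S^*_R\big].
\end{aligned}$$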
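The two forms of $\alpha^2$ in Eq. (46) can be compared numerically once b and $b^*$ are taken from Eq. (45). This is a minimal sketch. It assumes that $S_R$ and $S^*_R$ are complex conjugates with $0 < |S_R| < 1$, so that $1 - S_R S^*_R > 0$, and the sample value of $S_R$ is arbitrary.

```python
import cmath

def b_of(SR, SR_star):
    """Eq. (45): b = (sqrt(1 - S_R S*_R) - 1) / S*_R  (requires S*_R != 0)."""
    return (cmath.sqrt(1 - SR * SR_star) - 1) / SR_star

def alpha_sq_from_SR(SR, SR_star):
    """First form of Eq. (46): (sqrt(J) + 1) / (2 J), with J = 1 - S_R S*_R."""
    J = 1 - SR * SR_star
    return (cmath.sqrt(J) + 1) / (2 * J)

def alpha_sq_from_b(SR, SR_star):
    """Second form of Eq. (46): (1 + b b*) / (1 - b b*)^2."""
    b = b_of(SR, SR_star)
    b_star = b_of(SR_star, SR)  # Eq. (45) for b*: the roles of S_R and S*_R swap
    return (1 + b * b_star) / (1 - b * b_star) ** 2

SR = 0.3 + 0.4j  # hypothetical sample value with |S_R| < 1
print(alpha_sq_from_SR(SR, SR.conjugate()), alpha_sq_from_b(SR, SR.conjugate()))
```

The same check confirms that, under this assumption, the $b^*$ of (45) is the complex conjugate of b.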
The metricity expression is given by

$$M[S, S^*] = \frac{-2}{1 - bb^*}\big\{Db + S_{W^*} - bS_W - (S_R + b)(a^* - ab)\big\}, \tag{50}$$

with b and $b^*$ given by Eq. (45).

Explicitly, the simplicity argument is carried out as follows. We begin by constructing

$$Dg = \eta_{ij}\,D\omega^i\otimes\omega^j + \eta_{ij}\,\omega^i\otimes D\omega^j. \tag{51}$$

Working out (see Appendix), via Eqs. (41) and (39), all the

$$D\omega^i = A^i{}_j\,\omega^j, \tag{52}$$

with $A^i{}_j$ explicit functions of the derivatives of S and $S^*$ and of the unknown functions $(\alpha, a, b, c)$, we obtain

$$Dg = \big[\eta_{kj}A^k{}_i + \eta_{ik}A^k{}_j\big]\,\omega^i\otimes\omega^j \equiv G_{ij}\,\omega^i\otimes\omega^j, \tag{53}$$

with symmetric $G_{ij}$. The $G_{ij}$ are thus also explicitly known functions (see Appendix) of S and $S^*$, their derivatives, and $(\alpha, a, b, c)$. (Dg should be thought of as the Lie derivative of the metric along the vector field defined by (35).)
We now rewrite Dg by adding and subtracting a term $2G_{01}\,\omega^{(+}\otimes\omega^{-)}$, obtaining

$$\begin{aligned}
Dg ={}& G_{01}\,g + G_{11}\,\omega^1\otimes\omega^1 + 2G_{1+}\,\omega^{(1}\otimes\omega^{+)} + 2G_{1-}\,\omega^{(1}\otimes\omega^{-)}\\
&+ G_{--}\,\omega^-\otimes\omega^- + G_{++}\,\omega^+\otimes\omega^+ + 2(G_{01} + G_{+-})\,\omega^{(+}\otimes\omega^{-)}\\
&+ 2G_{0+}\,\omega^{(0}\otimes\omega^{+)} + 2G_{0-}\,\omega^{(0}\otimes\omega^{-)} + G_{00}\,\omega^0\otimes\omega^0.
\end{aligned} \tag{54}$$

A simple inspection of the explicit expressions for $G_{ij} = \eta_{kj}A^k{}_i + \eta_{ik}A^k{}_j$ and its conjugate (see the Appendix) reveals that, contrary to the 2+1 case of Sect. 2, it is not at all obvious which combinations of them should vanish. However, guided by the procedure that leads to the Null Surface reformulation of GR, as explained in Sect. 4, we take the following steps:

1. We observe that $G_{11} \equiv 0$.
2. Setting
$$G_{1+}[S, b, \alpha] = G_{1-}[S, b, \alpha] = 0 \tag{55}$$
determines b(S) and α(S) algebraically, as given in Eqs. (45) and (46).
3. If b(S) is given as in Eq. (45), then setting
$$G_{++}[S, b(S), \alpha(S), a] = b^{*2}(S)\,G_{--}[S, b(S), \alpha(S), a] \tag{56}$$
allows us to determine the function a(S) algebraically, as in Eq. (47).
4. Using (56) with (45) and (46), the quantities $G_{01}[S, b(S), \alpha(S), a(S)]$, $G_{-+}[S, b(S), \alpha(S), a(S)]$ and $G_{--}[S, b(S), \alpha(S), a(S)]$ are now explicit functions of S. Moreover, they satisfy
$$G_{01}[S, b(S), \alpha(S), a(S)] + G_{-+}[S, b(S), \alpha(S), a(S)] = b^*(S)\,G_{--}[S, b(S), \alpha(S), a(S)]$$
as an identity. Up to this point, i.e., after imposing (55) and (56), Dg is reduced to
$$\begin{aligned}
Dg ={}& G_{01}\,g + G_{--}\big(\omega^-\otimes\omega^- + b^{*2}(S)\,\omega^+\otimes\omega^+ + 2b^*(S)\,\omega^{(+}\otimes\omega^{-)}\big)\\
&+ 2G_{0+}\,\omega^{(0}\otimes\omega^{+)} + 2G_{0-}\,\omega^{(0}\otimes\omega^{-)} + G_{00}\,\omega^0\otimes\omega^0,
\end{aligned} \tag{57}$$
where
$$\begin{aligned}
G_{01} &= G_{01}[S, a(S)],\\
G_{--} &= G_{--}[S, b(S), \alpha(S), a(S)],\\
G_{0+} &= G_{0+}[S, b(S), \alpha(S), a(S), c],\\
G_{0-} &= G_{0-}[S, b(S), \alpha(S), a(S), c],\\
G_{00} &= G_{00}[S, b(S), \alpha(S), a(S), c, Dc].
\end{aligned} \tag{58}$$
5. We now need to extract from our minimal dependence condition (44) a suitable linear combination of the remaining components of $Dg - G_{01}g$. When this combination vanishes, it must allow us to obtain c algebraically and simultaneously force $G_{0+}$, $G_{0-}$ and $G_{00}$ to vanish
when $G_{--}[S, b(S), \alpha(S), a(S)] = 0$. The linear combination of the G's that determines c in this way is
$$(2 - bS^*_R)\,G_{0+} + (S^*_R - 2b^*)\,G_{0-} + \alpha^{-1}(1 - bb^*)(aS^*_R - S^*_W)\,G_{--} = 0, \tag{59}$$
with $G_{0+}$, $G_{0-}$ and $G_{--}$ given by (58). This result is extremely difficult to see by simple inspection of the equations; instead, one must turn to the methods of Sect. 4. That analysis shows that c is given by (48) and that $G_{0+}$, $G_{0-}$ and $G_{00}$ all vanish when $G_{--} = 0$. Based on this, we promote $G_{--}[S, b(S), \alpha(S), a(S)]$ to the metricity condition and make the following identifications, with Λ denoting the factor that multiplies g in (44):
$$G_{01}(S, a(S)) \equiv \Lambda(S) \tag{60}$$
and
$$G_{--}[S, b(S), \alpha(S), a(S)] \equiv M(S). \tag{61}$$
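The bookkeeping that takes (54) to (57) can be checked mechanically. The sketch below is a hypothetical illustration with arbitrary sample numbers. It represents a symmetric tensor $G_{ij}\,\omega^i\otimes\omega^j$ by its 4×4 coefficient matrix, using the index order (0, +, −, 1), which is a choice made only for this sketch. It then imposes $G_{11} = G_{1+} = G_{1-} = 0$, condition (56) and the identity $G_{01} + G_{+-} = b^*G_{--}$, and compares the result entry by entry with the right-hand side of (57).

```python
ORDER = {"0": 0, "+": 1, "-": 2, "1": 3}  # index order (0, +, -, 1), an assumption of this sketch
ETA = [[0, 0, 0, 1],
       [0, 0, -1, 0],
       [0, -1, 0, 0],
       [1, 0, 0, 0]]                      # g = eta_ij w^i (x) w^j, Eq. (43)

def zero():
    return [[0j] * 4 for _ in range(4)]

def outer(i, j):
    """Coefficient matrix of w^i (x) w^j."""
    m = zero()
    m[ORDER[i]][ORDER[j]] = 1
    return m

def sym(i, j):
    """Coefficient matrix of w^i (x) w^j + w^j (x) w^i = 2 w^(i (x) w^j)."""
    a, b = outer(i, j), outer(j, i)
    return [[a[r][c] + b[r][c] for c in range(4)] for r in range(4)]

def combine(*terms):
    """Sum of coef * matrix over the given (coef, matrix) pairs."""
    out = zero()
    for coef, m in terms:
        for r in range(4):
            for c in range(4):
                out[r][c] += coef * m[r][c]
    return out

def full_Dg(G01, Gmm, G0p, G0m, G00, bs):
    """G_ij after imposing G_11 = G_1+ = G_1- = 0, Eq. (56), and G_01 + G_+- = b* G_--."""
    G = zero()
    G[0][0] = G00
    G[0][1] = G[1][0] = G0p
    G[0][2] = G[2][0] = G0m
    G[0][3] = G[3][0] = G01
    G[1][1] = bs ** 2 * Gmm              # G_++ = b*^2 G_--
    G[2][2] = Gmm                        # G_--
    G[1][2] = G[2][1] = bs * Gmm - G01   # G_+- = b* G_-- - G_01
    return G

def rhs_57(G01, Gmm, G0p, G0m, G00, bs):
    """Right-hand side of Eq. (57)."""
    return combine((G01, ETA),
                   (Gmm, outer("-", "-")),
                   (Gmm * bs ** 2, outer("+", "+")),
                   (Gmm * bs, sym("+", "-")),
                   (G0p, sym("0", "+")),
                   (G0m, sym("0", "-")),
                   (G00, outer("0", "0")))

sample = (0.7, 1.3 - 0.2j, 0.4j, -0.5, 2.0, 0.2 + 0.1j)  # hypothetical values
print(full_Dg(*sample) == rhs_57(*sample))
```

The printed comparison uses exact equality, so it may be sensitive to rounding. The test below uses a tolerance instead.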
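With these identifications, the $G_{ij}$ take the form required by our minimal dependence condition (44). When M(S) = 0 we have $Dg = \Lambda[S]\,g$, where $\Lambda \equiv G_{01}$ is the factor that multiplies g in (44).

3.2. Six-dimensional metrics. We now extend the metric g to a six-dimensional metric by
$$g^{(6)}(x^a, s, s^*) = g - \omega\otimes\omega^* = g - CC^*\,ds\otimes ds^*, \tag{62}$$
so that
$$\begin{aligned}
Dg^{(6)} &= Dg - (C^*DC + C\,DC^*)\,ds\otimes ds^*\\
&= U_{ij}[M[S, S^*]]\,\omega^i\otimes\omega^j + \Lambda[S, S^*]\,g - (C^*DC + C\,DC^*)\,ds\otimes ds^*.
\end{aligned} \tag{63}$$
If the unknown C is chosen so that $DC = \tfrac12\Lambda C$ and $DC^* = \tfrac12\Lambda C^*$, then
$$Dg^{(6)} = U_{ij}[M[S, S^*]]\,\omega^i\otimes\omega^j + \Lambda\,g^{(6)}. \tag{64}$$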
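The passage from (63) to (64) is a one-line computation. Write Λ for the factor that multiplies g in the minimal dependence condition (44). Assume, as the construction implies, that D annihilates ds and $ds^*$: D is the Lie derivative along (35), under which s changes at unit rate and $s^*$ not at all. Under these assumptions, the choice $DC = \tfrac12\Lambda C$, $DC^* = \tfrac12\Lambda C^*$ gives

```latex
\begin{aligned}
C^{*}DC + C\,DC^{*} &= \tfrac12\Lambda\,C^{*}C + \tfrac12\Lambda\,C\,C^{*} = \Lambda\,CC^{*},\\
Dg^{(6)} &= U_{ij}[M[S,S^{*}]]\,\omega^i\otimes\omega^j + \Lambda\,g - \Lambda\,CC^{*}\,ds\otimes ds^{*}\\
         &= U_{ij}[M[S,S^{*}]]\,\omega^i\otimes\omega^j + \Lambda\,\big(g - CC^{*}\,ds\otimes ds^{*}\big)
          = U_{ij}[M[S,S^{*}]]\,\omega^i\otimes\omega^j + \Lambda\,g^{(6)} .
\end{aligned}
```

When M = 0, the first term drops out, and $\partial/\partial s$ acts on $g^{(6)}$ purely by rescaling.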
Equation (64) leads to the following proposition.

Proposition 4. If the class of differential equations is restricted to those S that satisfy the conditions
$$M[S, S^*] = 0, \qquad M^*[S, S^*] = 0, \tag{65}$$
then there exists on the six-space a pair of conformal Killing fields, $\xi = d/ds$ and $\xi^* = d/ds^*$.

Conformal factors can then easily be found for the six-metric so that the conformal Killing fields become Killing fields. The six-space can then be foliated by four-dimensional subspaces (s and $s^*$ constant), with the induced four-metrics all isometric. These four-metrics then map down to a unique conformal class of Lorentzian metrics on the four-space of the $x^a$.
4. Relationship with the Null Surface Formulation of GR

It is not hard to see that all Lorentzian four-metrics are (locally) included in this construction. This follows from its equivalence to the Null Surface reformulation of General Relativity. In that work [9–12], one begins with a four-manifold M with an unknown, but to be determined, conformal Lorentzian metric. One then asks that there be a two-parameter, $(s, s^*)$, family of (local) null surface foliations of M of sufficient generality, whose level surfaces are given by $u = Z(x^a, s, s^*)$. This requires that the unknown conformal metric satisfy
$$g^{ab}\,\partial_a Z\,\partial_b Z = 0 \tag{66}$$
for all $(s, s^*)$. The arbitrary conformal factor can depend on $(s, s^*)$. We apply to Eq. (66) the following eight $(s, s^*)$ derivatives,
$$D,\; D^*,\; D^2,\; D^{*2},\; DD^*,\; D^*D^2,\; DD^{*2},\; D^2D^{*2}.$$
Together with Eq. (66) itself, this yields nine relations. A conformal metric in four dimensions has nine independent components (ten for $g^{ab}$, minus one for the conformal factor). The unknown conformal metric $g^{ab}$ can therefore be given completely in terms of a function $S(Z, DZ, D^*Z, DD^*Z, s, s^*)$ defined by $D^2Z = S$, $D^{*2}Z = S^*$, and we are back to our starting point. These metrics satisfy our minimal dependence condition, Eq. (44), for the $(s, s^*)$ dependence. If we continue and take the derivatives $D^3$ and $D^{*3}$ of Eq. (66), we finally obtain the metricity conditions $M[S, S^*] = 0$, $M^*[S, S^*] = 0$. Since we began with an arbitrary Lorentzian space, for any such space there exists an S satisfying the metricity conditions that yields, via Eqs. (37), that metric up to a conformal factor. It is now clear that the vacuum Einstein equations, with a specific choice of conformal factor, can be obtained by a further restriction on the class of functions $(S, S^*)$.

The procedure just outlined can be carried out explicitly, with considerable calculational effort, as follows. Start with a function $Z = Z(x^a, s, s^*)$ and its D, $D^*$ and $DD^*$ derivatives as "primary" functions, i.e., $\theta^i = (\theta^0, \theta^+, \theta^-, \theta^1) \equiv (Z, W, W^*, R)$:
$$\begin{aligned}
Z &= Z(x^a, s, s^*),\\
W &= DZ(x^a, s, s^*),\\
W^* &= D^*Z(x^a, s, s^*),\\
R &= D^*DZ(x^a, s, s^*) = D^*W = DW^*.
\end{aligned}$$
It is assumed that these four relations can be inverted as
$$x^a = X^a(\theta^i, s, s^*). \tag{67}$$
We then introduce the set of "secondary" functions
$$\begin{aligned}
S &\equiv D^2Z, \qquad S^* \equiv D^{*2}Z,\\
T &\equiv DR = D^*S = D^*D^2Z, \qquad T^* \equiv D^*R = DS^* = DD^{*2}Z,\\
U &\equiv D^2S^* = D^{*2}S = D^*T = DT^* = D^2D^{*2}Z,
\end{aligned} \tag{68}$$
all of which can be regarded as functions of $(Z, W, W^*, R, s, s^*)$, the $x^a$ having been eliminated via the inversion (67). The exterior derivatives (the space-time gradients, holding $(s, s^*)$ constant) of the "primary" functions are
$$d\theta^i = \partial_a\theta^i\,dx^a, \tag{69}$$
and for the "secondary" functions S, T and U they are given by
$$\begin{aligned}
dS &= S_Z\,dZ + S_W\,dW + S_{W^*}\,dW^* + S_R\,dR = S_{\theta^i}\,d\theta^i,\\
dT &= T_Z\,dZ + T_W\,dW + T_{W^*}\,dW^* + T_R\,dR = T_{\theta^i}\,d\theta^i,\\
dU &= U_Z\,dZ + U_W\,dW + U_{W^*}\,dW^* + U_R\,dR = U_{\theta^i}\,d\theta^i.
\end{aligned} \tag{70}$$
We write the unknown, but to be determined, inverse of the space-time metric as $g^{ab}(x^a, s, s^*) = \hat g^{ab}(x^a)\,\omega^2(s, s^*)$, i.e., the $(s, s^*)$ behavior appears only in the conformal factor, and set $g^I \equiv g^{ab}\,\partial_a\otimes\partial_b$. We can then define the metric components in the gradient basis $\theta^i_a \equiv \partial_a\theta^i$ by
$$g^{ij}(x^a, s, s^*) \equiv g^I(d\theta^i, d\theta^j) = g^{ab}\,\theta^i_a\,\theta^j_b. \tag{71}$$
Note the very important point that, since $g^{ab} = \hat g^{ab}(x^a)\,\omega^2(s, s^*)$, we have
$$Dg^I \equiv Dg^{ab}\,\partial_a\otimes\partial_b = 2\omega^{-1}D\omega\; g^I \equiv \lambda\, g^I. \tag{72}$$
We will not, however, use Eq. (72) in full until the end of the calculation. Along the way we will only use specific components of Eq. (72), i.e., specific components of
$$Dg^I(d\theta^i, d\theta^j) \equiv Dg^{ab}\,\theta^i_a\,\theta^j_b = \lambda\, g^{ij}. \tag{73}$$
Only at the end, with the metricity condition, will the full Eq. (72) be used. Until this last condition is imposed, the conditions on $Dg^I$ are precisely our "minimal dependence conditions". We start with the condition that the level surfaces $Z(x^a, s, s^*) = \text{const}$ (for each value of $(s, s^*)$) are null surfaces of the metric:
$$g^{00} = g^I(dZ, dZ) = g^{ab}Z_{,a}Z_{,b} = 0. \tag{74}$$
By applying D and $D^*$ to Eq. (74), we have
$$Dg^{ab}Z_{,a}Z_{,b} + 2g^{ab}W_{,a}Z_{,b} = 0, \tag{75}$$
$$D^*g^{ab}Z_{,a}Z_{,b} + 2g^{ab}W^*_{,a}Z_{,b} = 0. \tag{76}$$
Thus from Eq. (74), using one component of Eq. (73),
$$Dg^{ab}Z_{,a}Z_{,b} = \lambda g^{ab}Z_{,a}Z_{,b} = 0, \tag{77}$$
we have
$$g^{0+} = g^I(dDZ, dZ) = g^I(dW, dZ) = 0, \tag{78}$$
$$g^{0-} = g^I(dD^*Z, dZ) = g^I(dW^*, dZ) = 0. \tag{79}$$
Next, applying D to Eq. (79) yields
$$Dg^{ab}W^*_{,a}Z_{,b} + g^{ab}\big(R_{,a}Z_{,b} + W^*_{,a}W_{,b}\big) = 0. \tag{80}$$
Thus from Eq. (79), again with one component of Eq. (73),
$$Dg^{ab}W^*_{,a}Z_{,b} = \lambda g^{ab}W^*_{,a}Z_{,b} = 0, \tag{81}$$
we have
$$g^I(dR, dZ) + g^I(dW, dW^*) = 0 \quad\Leftrightarrow\quad g^{-+} + g^{01} = 0. \tag{82}$$
If D and $D^*$ are applied, respectively, to Eqs. (78) and (79), we have
$$Dg^{ab}W_{,a}Z_{,b} + g^{ab}\big(S_{,a}Z_{,b} + W_{,a}W_{,b}\big) = 0, \tag{83}$$
$$D^*g^{ab}W^*_{,a}Z_{,b} + g^{ab}\big(S^*_{,a}Z_{,b} + W^*_{,a}W^*_{,b}\big) = 0. \tag{84}$$
Therefore, from Eq. (78), again using one component of Eq. (73),
$$Dg^{ab}W_{,a}Z_{,b} = \lambda g^{ab}W_{,a}Z_{,b} = 0 \tag{85}$$
(which implies its complex conjugate as well), and then, using (70),
$$g^I(dS, dZ) + g^I(dW, dW) = 0 \quad\Leftrightarrow\quad g^{++} = -S_R\,g^{01}, \tag{86}$$
$$g^I(dS^*, dZ) + g^I(dW^*, dW^*) = 0 \quad\Leftrightarrow\quad g^{--} = -S^*_R\,g^{01}. \tag{87}$$
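The step from (85) to (86) uses only the expansion (70) of dS in the gradient basis, together with $g^{00} = g^{0+} = g^{0-} = 0$ from (74), (78) and (79). This minimal sketch uses arbitrary sample values in place of the partial derivatives of S and of the still undetermined components $g^{ij}$. It shows the bookkeeping: against dZ, only the R-component of dS survives.

```python
# Positions of the gradient basis (dZ, dW, dW*, dR) = (theta^0, theta^+, theta^-, theta^1).
Z, W, WS, R = 0, 1, 2, 3

def contract(df, dh, g_up):
    """g^I(dF, dH) = sum_ij F_{theta^i} H_{theta^j} g^{ij}, cf. Eqs. (70)-(71)."""
    return sum(df[i] * dh[j] * g_up[i][j] for i in range(4) for j in range(4))

def gradient_metric(g01, gpp, gmm, gpm, gp1, gm1, g11):
    """Symmetric g^{ij} with g^{00} = g^{0+} = g^{0-} = 0, as required by (74), (78), (79)."""
    g = [[0j] * 4 for _ in range(4)]
    g[Z][R] = g[R][Z] = g01
    g[W][W] = gpp
    g[WS][WS] = gmm
    g[W][WS] = g[WS][W] = gpm
    g[W][R] = g[R][W] = gp1
    g[WS][R] = g[R][WS] = gm1
    g[R][R] = g11
    return g

# Hypothetical sample values: dS = S_Z dZ + S_W dW + S_{W*} dW* + S_R dR.
dS = [0.3, -1.1, 0.4j, 0.6 + 0.2j]
dZ = [1, 0, 0, 0]
dW = [0, 1, 0, 0]
g01 = 1.7
# g^{++} is set from (86); g^{-+} = -g^{01} from (82); the other entries are arbitrary.
g = gradient_metric(g01, -dS[R] * g01, 0.2, -g01, 0.5j, -0.5j, 0.9)

print(contract(dS, dZ, g), dS[R] * g01)
```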
Continuing this process, i.e., applying D and $D^*$ to Eq. (82), yields
$$Dg^{ab}W_{,a}W^*_{,b} + Dg^{ab}Z_{,a}R_{,b} + g^{ab}\big(S_{,a}W^*_{,b} + 2W_{,a}R_{,b} + Z_{,a}T_{,b}\big) = 0, \tag{88}$$
$$D^*g^{ab}W_{,a}W^*_{,b} + D^*g^{ab}Z_{,a}R_{,b} + g^{ab}\big(W_{,a}S^*_{,b} + 2W^*_{,a}R_{,b} + Z_{,a}T^*_{,b}\big) = 0. \tag{89}$$
Therefore, from Eq. (82), using two components of Eq. (73), we have
$$Dg^{ab}W_{,a}W^*_{,b} + Dg^{ab}Z_{,a}R_{,b} = \lambda g^{ab}\big(W_{,a}W^*_{,b} + Z_{,a}R_{,b}\big) = 0, \tag{90}$$
which (together with its complex conjugate) leads, respectively, to
$$g^I(dT, dZ) + g^I(dS, dW^*) + g^I(dR, dW) + g^I(dW, dR) = 0, \tag{91}$$
$$g^I(dT^*, dZ) + g^I(dR, dW^*) + g^I(dR, dW^*) + g^I(dW, dS^*) = 0. \tag{92}$$
Using Eqs. (70), (74), (78), (79) and (86) in (91) and (92), we obtain, respectively,
$$\begin{aligned}
2g^{+1} &= -T_R\,g^{01} + S_W\,g^{01} + S_{W^*}S^*_R\,g^{01} - S_R\,g^{-1},\\
2g^{-1} &= -T^*_R\,g^{01} + S^*_{W^*}\,g^{01} + S^*_W S_R\,g^{01} - S^*_R\,g^{+1},
\end{aligned} \tag{93}$$
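which are easily solved for $g^{+1}$ and $g^{-1}$:
$$g^{+1}\Big(1 - \tfrac14 S_R S^*_R\Big) = -\tfrac12\big[T_R - S_W - S_{W^*}S^*_R\big]g^{01} + \tfrac14 S_R\big[T^*_R - S^*_{W^*} - S^*_W S_R\big]g^{01}, \tag{94}$$
$$g^{-1}\Big(1 - \tfrac14 S_R S^*_R\Big) = -\tfrac12\big[T^*_R - S^*_{W^*} - S^*_W S_R\big]g^{01} + \tfrac14 S^*_R\big[T_R - S_W - S_{W^*}S^*_R\big]g^{01}. \tag{95}$$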
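The passage from (93) to (94)-(95) is the solution of a 2×2 linear system. The sketch below solves (93) by Cramer's rule and compares the result with the closed forms (94)-(95), using arbitrary sample values. Here `A` stands for $T_R - S_W - S_{W^*}S^*_R$ and `A_star` for its conjugate partner $T^*_R - S^*_{W^*} - S^*_W S_R$; both names are introduced only for this illustration.

```python
def solve_93(SR, SR_star, A, A_star, g01=1.0):
    """Solve Eq. (93), written as
         2 g^{+1} + S_R g^{-1}  = -A  g^{01},
         S*_R g^{+1} + 2 g^{-1} = -A* g^{01},
    by Cramer's rule (requires S_R S*_R != 4)."""
    det = 4 - SR * SR_star
    r1, r2 = -A * g01, -A_star * g01
    return (2 * r1 - SR * r2) / det, (2 * r2 - SR_star * r1) / det

def closed_form_94_95(SR, SR_star, A, A_star, g01=1.0):
    """Eqs. (94)-(95), divided through by (1 - S_R S*_R / 4)."""
    k = 1 - SR * SR_star / 4
    gp1 = (-A / 2 + SR * A_star / 4) * g01 / k
    gm1 = (-A_star / 2 + SR_star * A / 4) * g01 / k
    return gp1, gm1

SR, A = 0.3 + 0.1j, 0.5 - 0.2j  # hypothetical sample values
print(solve_93(SR, SR.conjugate(), A, A.conjugate()))
print(closed_form_94_95(SR, SR.conjugate(), A, A.conjugate()))
```

Cramer's rule is used only because it gives the closed form directly. Any linear solver would reproduce (94)-(95).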
Notice that Eqs. (91) and (92) are obtained just as well by taking $D^*$ of Eq. (86) and D of Eq. (87). Using Eq. (87) and two components of Eq. (73), we have
$$Dg^{ab}W^*_{,a}W^*_{,b} + Dg^{ab}S^*_{,a}Z_{,b} = 0 \tag{96}$$
(together with its complex conjugate). Finally, by applying $D^*$ to Eq. (91) (or D to Eq. (92)), we obtain the last metric component, $g^{11}$:
$$\begin{aligned}
-2\Big(1 + \tfrac12 S_R S^*_R\Big)g^{11} ={}& \big(2T^*_W + S_W S^*_W\big)g^{++} + \big(2T_{W^*} + S_{W^*}S^*_{W^*}\big)g^{--}\\
&+ g^{01}\big(U_R + S_Z S^*_R + S_R S^*_Z - S_{W^*}S^*_W - S_W S^*_{W^*} - 2T^*_{W^*} - 2T_W\big)\\
&+ g^{-1}\big(S_R S^*_{W^*} + S_{W^*}S^*_R + 2T_R\big) + g^{+1}\big(S_R S^*_W + S_W S^*_R + 2T^*_R\big),
\end{aligned} \tag{97}$$
using the equation
$$Dg^{ab}\big(T^*_{,a}Z_{,b} + S^*_{,a}W_{,b} + 2R_{,a}W^*_{,b}\big) = 0 \tag{98}$$
that arises from similar considerations, from Eq. (73), as before. We emphasize that at this point we have not yet used the full set of components of Eq. (73). Consequently, this construction does not yet give a single conformal metric, but an $(s, s^*)$-dependent family of conformal metrics. In other words, modulo an overall (conformal) factor, namely $g^{01} = \omega^2(s, s^*)$, we have obtained a two-parameter family of metrics $g^{ij}(x^a, s, s^*)$. It follows from requiring that a series of D and $D^*$ derivatives applied to the null surface condition, Eq. (74), remain zero for all $(s, s^*)$; this is the meaning of imposing Eqs. (77), (81), (85), (90), (96) and (98). All the components in the gradient basis $\theta^i_a$ have been expressed in terms of derivatives of S and $S^*$. This $(s, s^*)$-dependent metric satisfies the minimal dependence condition, Eq. (44). It is, in fact, identical (up to a conformal rescaling) to the metric of Eq. (43), but expressed in a gradient basis rather than in the null tetrad basis. We now use one further component of Eq. (73): applying D to Eq. (86) and using
$$Dg^{ab}\big(W_{,a}W_{,b} + S_{,a}Z_{,b}\big) = 0, \tag{99}$$
we obtain the metricity condition, the only condition on the functions S and $S^*$ (given below). In this case the $(s, s^*)$-dependent metrics are all conformal to each other.
We notice that all the components of the metric $g^{ij}$ are determined up to a single overall undetermined factor, namely $g^{01}$, i.e., $g^{ij} = g^{01}h^{ij}[S, S^*]$. To make the explicit comparison between this conformal metric and the metric (43), we choose the special conformal gauge $g^{01} = 1$. We take the null tetrad $\omega^i_a$ as a linear combination of the gradient basis $\theta^j_{,b}$. Re-expressing the metric in the null tetrad system then allows us to read off the coefficients $(\alpha, a, b, c)$ of Eq. (41). Explicitly,
$$g_{ab} = g_{ij}\,\theta^i_{,a}\,\theta^j_{,b} = \eta_{ij}\,\omega^i_a\,\omega^j_b = \eta_{ij}\,K^i{}_k K^j{}_l\,\theta^k_{,a}\,\theta^l_{,b}, \tag{100}$$
or
$$\eta_{kl}\,K^k{}_i K^l{}_j = g_{ij}, \tag{101}$$
where the coefficients $K^i{}_j$ are given by $\omega^i_a \equiv K^i{}_j\,\theta^j_{,a}$ and are shown explicitly in Eqs. (41). In particular, $\alpha = K^+{}_+$, $a = K^1{}_+$, $b = K^+{}_-/K^+{}_+$ and $c = K^1{}_0$. From (101), using the special conformal frame ($g^{01} = 1 \Rightarrow g^{+-} = -1$), we obtain
$$\begin{aligned}
b &= -\frac{\sqrt{J} - 1}{g^{--}}, \qquad b^* = -\frac{\sqrt{J} - 1}{g^{++}},\\
\alpha^2 &= -\frac{g^{++}g^{--}}{2J\big(\sqrt{J} - 1\big)},\\
a &= \frac{g^{1+}g^{--} + g^{1-}}{J}, \qquad a^* = \frac{g^{1-}g^{++} + g^{1+}}{J},\\
c &= -\tfrac12 g^{11} + \frac{\big[g^{--}g^{1+} + g^{1-} + g^{1-}\sqrt{J}\big]\big[g^{++}g^{1-} + g^{1+} + g^{1+}\sqrt{J}\big]}{2J\big(\sqrt{J} - 1\big)},
\end{aligned}$$
with
$$J = (g^{-+})^2 - g^{++}g^{--} = 1 - S_R S^*_R.$$
Since the components of the metric are functions of S as obtained above, the parameters $(\alpha, a, b, c)$ are expressed in terms of S, as desired.

We can now see how this procedure justifies Eqs. (55), (56) and (59) used in Sect. 3. Straightforward algebra shows that
$$Dg^{ab}\,\theta^i_a\,\theta^j_b = -\eta^{kl}G_{km}\eta^{mn}\,(K^{-1})^i{}_l\,(K^{-1})^j{}_n. \tag{102}$$
Using (102) to translate Eqs. (77), (81), (85), (90), (96) and (98) in terms of $G_{ij}$, we obtain the following propositions.
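Proposition 5. The vanishing of $Dg^{ab}\theta^0_a\theta^0_b$ is equivalent to the vanishing of $G_{11}$. Explicitly,
$$Dg^{ab}\theta^0_a\theta^0_b = 0 \;\Leftrightarrow\; G_{11} = 0. \tag{103}$$

Proposition 6. The vanishing of $Dg^{ab}\theta^-_a\theta^0_b$ and $Dg^{ab}\theta^+_a\theta^0_b$ (with $G_{11} = 0$) is equivalent to the vanishing of $G_{1+}$ and $G_{1-}$. Thus
$$Dg^{ab}\theta^-_a\theta^0_b = 0 = Dg^{ab}\theta^+_a\theta^0_b \;\Leftrightarrow\; G_{1+} = 0 = G_{1-}. \tag{104}$$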
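Proposition 7. The vanishing of $Dg^{ab}(\theta^+_a\theta^-_b + \theta^0_a\theta^1_b)$ and $Dg^{ab}(\theta^-_a\theta^-_b + S^*_R\theta^0_a\theta^1_b)$ (with $G_{11} = G_{1+} = G_{1-} = 0$) is equivalent to the vanishing of $[G_{++} - b^{*2}G_{--}]$ and $[G_{+-} + G_{01} - b^*G_{--}]$:
$$Dg^{ab}(\theta^+_a\theta^-_b + \theta^0_a\theta^1_b) = Dg^{ab}(\theta^-_a\theta^-_b + S^*_R\theta^0_a\theta^1_b) = 0 \tag{105}$$
$$\Leftrightarrow\; [G_{++} - b^{*2}G_{--}] = [G_{+-} + G_{01} - b^*G_{--}] = 0. \tag{106}$$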
This justifies our choices of vanishing combinations of $G_{ij}$ used to obtain $(\alpha, a, b, c)$ in terms of S in Sect. 3. From this point of view, the metricity condition is obtained simply by applying D to Eq. (86), i.e., to $g^I(dS, dZ) + g^I(dW, dW) = 0$. Together with
$$Dg^{ab}\big(W_{,a}W_{,b} + S_{,a}Z_{,b}\big) = 0, \tag{107}$$
which follows from the last component of (72), this leads to $g^I(dDS, dZ) + 3g^I(dS, dW) = 0$, or
$$M(S, S^*) = \frac13 (DS)_R + S_W\frac{g^{++}}{g^{01}} + S_{W^*}\frac{g^{-+}}{g^{01}} + S_R\frac{g^{1+}}{g^{01}} = 0.$$
Furthermore, with the use of (102) we can prove the following.

Proposition 8. The vanishing of $Dg^{ab}(\theta^+_a\theta^+_b + S_R\theta^0_a\theta^1_b)$ (with the earlier conditions) is equivalent to the vanishing of $G_{--}$. Thus, if $G_{11} = G_{1+} = G_{1-} = 0$, $G_{++} = b^{*2}G_{--}$ and $G_{+-} + G_{01} = b^*G_{--}$, then
$$Dg^{ab}(\theta^+_a\theta^+_b + S_R\theta^0_a\theta^1_b) = 0 \;\Leftrightarrow\; G_{--} = 0. \tag{108}$$

Note that $G_{--} = 0$ implies $G_{+-} + G_{01} = 0$. This allows us to promote $G_{--}$ to the status of metricity condition if $(\alpha, a, b, c)$ are given in terms of S, as argued in Sect. 3. When the metricity condition, Eq. (108), is imposed, the expressions for the remaining components of Eq. (73) are identically satisfied, and $G_{0-}$ and $G_{00}$ vanish. When $G_{--} = 0$, the expression $U_{ij}[M[S, S^*]]$ from Eq. (44) is a linear combination of $M(S, S^*)$, $D^*M(S, S^*)$ and $D^{*2}M(S, S^*)$, all of which therefore vanish when $M(S, S^*)$ vanishes.
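As a consistency check between Sect. 3 and Sect. 4, the sketch below evaluates the Sect. 4 expressions for b, $b^*$ and $\alpha^2$ in the gauge $g^{01} = 1$ and compares them with (45) and (46). In this gauge, (86)-(87) give $g^{++} = -S_R$ and $g^{--} = -S^*_R$. The overall sign of the $\alpha^2$ expression is taken here so that $\alpha^2$ is positive and matches (46); treat that sign as an assumption of this sketch. The sample $S_R$ is arbitrary, with $0 < |S_R| < 1$.

```python
import cmath

def section4_values(SR, SR_star):
    """b, b*, alpha^2 from the Sect. 4 formulas in the gauge g^{01} = 1 (so g^{-+} = -1)."""
    gpp, gmm = -SR, -SR_star                    # Eqs. (86), (87) with g^{01} = 1
    J = (-1) ** 2 - gpp * gmm                   # J = (g^{-+})^2 - g^{++} g^{--}
    rJ = cmath.sqrt(J)
    b = -(rJ - 1) / gmm
    b_star = -(rJ - 1) / gpp
    alpha_sq = -gpp * gmm / (2 * J * (rJ - 1))  # sign chosen to agree with Eq. (46)
    return b, b_star, alpha_sq

def section3_values(SR, SR_star):
    """b, b*, alpha^2 from Eqs. (45)-(46)."""
    rJ = cmath.sqrt(1 - SR * SR_star)
    return (rJ - 1) / SR_star, (rJ - 1) / SR, (rJ + 1) / (2 * (1 - SR * SR_star))

SR = -0.2 + 0.5j  # hypothetical sample value
print(section4_values(SR, SR.conjugate()))
print(section3_values(SR, SR.conjugate()))
```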
5. Discussion

In this work we have extended Cartan's beautiful construction of differential geometric structures naturally associated with ordinary differential equations to a pair of overdetermined partial differential equations. The resulting geometric structures form a rich set of mathematical constructions. As a special case, they include all Lorentzian space-times and therefore all solutions of Einstein's theory of general relativity. The study of the Einstein equations via this approach had already begun in an earlier series of papers [9–12], long before we connected it with Cartan's view. This work on general relativity is continuing, with hopefully many applications, but its point of view diverges from that of the present paper. Here we feel that the issues are more closely related to the study of equivalence classes of the starting equations under some set or class of transformations.

An immediate question that we wish to investigate is the following. Consider our set of equations
$$D^2Z = S, \qquad D^{*2}Z = S^*,$$
where $(S, S^*)$ satisfy the metricity conditions. The level surfaces of the (local) solutions, $u = Z(x^a, s, s^*)$, then define a two-parameter family of null surfaces (a complete integral) in the space-time of the solution space, $x^a$, with the (conformal) metric of Eq. (43). In general, null surfaces develop caustics or wavefront singularities. At these points the function $Z(x^a, s, s^*)$ no longer satisfies the differential equations, and the local solutions break down. Nevertheless, the space-time and its metric could be completely smooth there. Other families of null surfaces would exist that satisfy similar equations but with different $(S, S^*)$. We define the "restricted equivalence" problem as the problem of finding all pairs of functions $(S, S^*)$ that yield the same space-time metric. As a matter of fact, this "restricted problem" is intimately related to the equivalence problem under general contact transformations. A paper on this issue is being prepared [15]. Another issue that we have left untreated is how the Cartan structure equations, Eq. (19), behave in the case of the pair of PDEs. It seems very likely that we will obtain results similar to those for the third-order ODE of Sect. 2. There, the metricity function plays the role of a torsion tensor, and we obtain a conformal connection rather than a metric connection. This, however, remains to be analyzed, the calculations being quite lengthy.

Appendix

5.1. Geometry of $\frac{d^2z}{ds^2} = E\big(z, \frac{dz}{ds}, s\big)$. We begin with an arbitrary function of four variables, set equal to zero,
$$\Phi(z, s, u, v) = 0, \tag{109}$$
and assume that it can be solved locally for any one of the variables. Considering the two two-spaces of (z, s) and (u, v), we see that a point (z, s) in the first corresponds to a specific curve in the second, and conversely. If we solve Eq. (109) for
$$z = z(s, u, v) \equiv z(s, x^A), \tag{110}$$
then, by differentiating with respect to s, (u, v) can be eliminated from the second derivative, leaving $z'' = E(z, z', s)$. The same thing can be done with the variables (u, v), resulting in the second-order differential equation
$$\frac{d^2u}{dv^2} = U\Big(u, \frac{du}{dv}, v\Big) \tag{111}$$
for the curve in the (u, v) space. Cartan then asks for the conditions on E(z, z , s) such that Eq. (111) is a geodesic for some (projective) symmetric connection – which is determined by the form of E. He finds that E(z, z , s) must satisfy d2 d d Ez z − Ezz − Ez Ez z + 2Ezz + Ez Ezz − 2Ez Ez z = 0, ds 2 ds ds
(112)
and the connection is given, in the gradient basis $z_A \equiv \partial_A z$ and $z'_A \equiv \partial_A z'$, by
∇B zA = 0, ∇B z A = 2α(A zB) , 1 d = (Ezz − Ez z )zA + Ez z zA , αA 2 ds
(113) (114) (115)
remembering the projective equivalence $\Gamma^A{}_{BC} \sim \Gamma^A{}_{BC} + 2\delta^A{}_{(B}\Upsilon_{C)}$. This arises from the following argument. If the curve determined in Eq. (110) by (z, s) = const has a tangent vector $t^A$, then $t^A$ has a vanishing contraction with the gradient of $z = z(s, x^A)$, i.e.,
$$t^A z_A = 0. \tag{116}$$
If $t^A$ is tangent to a geodesic, then
$$t^B\nabla_B t^A = \beta t^A \tag{117}$$
and
$$t^B\nabla_B(z_A t^A) = t^A t^B\nabla_B z_A + z_A t^B\nabla_B t^A = t^A t^B\nabla_B z_A = 0.$$
Now, since $z_A$ and $z'_A$ form a basis for the covectors, we have
$$\nabla_B z_A = a\,z_{(B}z_{A)} + b\,z_{(B}z'_{A)} + c\,z'_{(B}z'_{A)},$$
but from Eqs. (116) and (117) we have c = 0, and hence
$$\nabla_B z_A = a\,z_{(B}z_{A)} + b\,z_{(B}z'_{A)} = \alpha_{(A}z_{B)},$$
and immediately
$$\nabla_B z'_A = \alpha'_{(A}z_{B)} + \alpha_{(A}z'_{B)}.$$
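The step "c = 0" can be spelled out. Contract the general expansion of $\nabla_B z_A$ twice with the tangent vector. Using $t^A z_A = 0$ from (116), together with $t^A t^B\nabla_B z_A = 0$ (the consequence of (117) displayed above), only the last term survives:

```latex
0 = t^A t^B \nabla_B z_A
  = a\,(t^B z_B)(t^A z_A) + b\,(t^B z_B)(t^A z'_A) + c\,(t^B z'_B)(t^A z'_A)
  = c\,\big(t^A z'_A\big)^2 .
```

Since $t^A \neq 0$ is annihilated by $z_A$, and $\{z_A, z'_A\}$ is a basis of covectors, $t^A z'_A \neq 0$. Hence c = 0.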
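But via the projective equivalence these are the same as
$$\nabla_B z_A = 0, \qquad \nabla_B z'_A = \alpha'_{(A}z_{B)}. \tag{118}$$
By taking another s (or prime) derivative,
$$\nabla_B z''_A = \alpha''_{(A}z_{B)} + \alpha'_{(A}z'_{B)},$$
and comparing it with
$$\nabla_B z''_A = \big(E_{zz}\nabla_A z + E_{zz'}\nabla_A z'\big)z_B + \big(E_{z'z}\nabla_A z + E_{z'z'}\nabla_A z'\big)z'_B + E_{z'}\,2\alpha_{(A}z_{B)},$$
obtained from $z'' = E(z, z', s)$ and $z''_A = E_z z_A + E_{z'}z'_A$,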
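we recover Eqs. (112) and (115). We are indebted to Paul Tod for explaining this construction, due to Cartan, to us.

5.2. $D\omega = A\omega$. Via a lengthy calculation, we find the $A^i{}_j$, defined by $D\omega^i = A^i{}_j\,\omega^j$, or
$$\begin{aligned}
D\omega^0 &= A^0{}_0\,\omega^0 + A^0{}_+\,\omega^+ + A^0{}_-\,\omega^- + A^0{}_1\,\omega^1,\\
D\omega^+ &= A^+{}_0\,\omega^0 + A^+{}_+\,\omega^+ + A^+{}_-\,\omega^- + A^+{}_1\,\omega^1,\\
D\omega^- &= A^-{}_0\,\omega^0 + A^-{}_+\,\omega^+ + A^-{}_-\,\omega^- + A^-{}_1\,\omega^1,\\
D\omega^1 &= A^1{}_0\,\omega^0 + A^1{}_+\,\omega^+ + A^1{}_-\,\omega^- + A^1{}_1\,\omega^1,
\end{aligned} \tag{119}$$
to be
$$\begin{aligned}
D\omega^0 ={}& (1 - bb^*)^{-1}\alpha^{-1}(\omega^+ - b\,\omega^-),\\
D\omega^+ ={}& \alpha\{S_Z - c(S_R + b)\}\,\omega^0 + \alpha\{S_R + b\}\,\omega^1\\
&+ (1 - bb^*)^{-1}\{(1 - b^*b)D\ln\alpha + S_W - b^*(Db + S_{W^*}) - (S_R + b)(a - a^*b^*)\}\,\omega^+\\
&+ (1 - bb^*)^{-1}\{Db + S_{W^*} - bS_W - (S_R + b)(a^* - ab)\}\,\omega^-,\\
D\omega^- ={}& \alpha\{b^*[S_Z - cS_R] - c\}\,\omega^0 + \alpha\{1 + b^*S_R\}\,\omega^1\\
&+ (1 - bb^*)^{-1}\{(1 - bb^*)D\ln\alpha - (a^* - ab) - bDb^* + b^*[S_{W^*} - bS_W - S_R(a^* - ab)]\}\,\omega^-\\
&+ (1 - bb^*)^{-1}\{Db^* - (a - a^*b^*) + b^*[S_W - S_{W^*}b^* - S_R(a - a^*b^*)]\}\,\omega^+,\\
D\omega^1 ={}& \{Dc + T_Z + aS_Z - c(aS_R + a^* + T_R)\}\,\omega^0 + \{aS_R + a^* + T_R\}\,\omega^1\\
&+ (1 - bb^*)^{-1}\alpha^{-1}\{T_W + c + Da + aS_W - b^*(Da^* + aS_{W^*} + T_{W^*}) - (aS_R + a^* + T_R)(a - a^*b^*)\}\,\omega^+\\
&+ (1 - bb^*)^{-1}\alpha^{-1}\{(Da^* + aS_{W^*} + T_{W^*}) - (a^* - ab)(aS_R + a^* + T_R) - b(T_W + c + Da + aS_W)\}\,\omega^-.
\end{aligned} \tag{120}$$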
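5.3. $Dg = G_{ij}\,\omega^i\otimes\omega^j$. The calculation of the $G_{ij}$, defined by $Dg = G_{ij}\,\omega^i\otimes\omega^j$, begins with $D\omega^i = A^i{}_j\,\omega^j$ and $g = \eta_{ij}\,\omega^i\otimes\omega^j$:
$$Dg = \eta_{ij}\,D\omega^i\otimes\omega^j + \eta_{ij}\,\omega^i\otimes D\omega^j = \{\eta_{jk}A^k{}_i + \eta_{ik}A^k{}_j\}\,\omega^i\otimes\omega^j \equiv G_{ij}\,\omega^i\otimes\omega^j.$$
Then, by direct substitution, we have
$$\begin{aligned}
Dg(x^a, s, s^*) ={}& 2(A^0{}_+ - A^-{}_1)\,\omega^{(+}\otimes\omega^{1)} + 2(A^0{}_- - A^+{}_1)\,\omega^{(-}\otimes\omega^{1)} + 2A^1{}_0\,\omega^0\otimes\omega^0\\
&+ 2(A^1{}_+ - A^-{}_0)\,\omega^{(0}\otimes\omega^{+)} + 2(A^1{}_- - A^+{}_0)\,\omega^{(0}\otimes\omega^{-)} + 2A^1{}_1\,\omega^{(0}\otimes\omega^{1)}\\
&- 2(A^+{}_+ + A^-{}_-)\,\omega^{(+}\otimes\omega^{-)} - 2A^-{}_+\,\omega^+\otimes\omega^+ - 2A^+{}_-\,\omega^-\otimes\omega^-,
\end{aligned}$$
with (rows and columns ordered as 0, +, −, 1)
$$G_{ij} = \begin{pmatrix}
2A^1{}_0 & A^1{}_+ - A^-{}_0 & A^1{}_- - A^+{}_0 & A^1{}_1\\
A^1{}_+ - A^-{}_0 & -2A^-{}_+ & -(A^+{}_+ + A^-{}_-) & A^0{}_+ - A^-{}_1\\
A^1{}_- - A^+{}_0 & -(A^+{}_+ + A^-{}_-) & -2A^+{}_- & A^0{}_- - A^+{}_1\\
A^1{}_1 & A^0{}_+ - A^-{}_1 & A^0{}_- - A^+{}_1 & 0
\end{pmatrix}.$$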
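The table above follows mechanically from $G_{ij} = \eta_{jk}A^k{}_i + \eta_{ik}A^k{}_j$. In this minimal sketch, the index order (0, +, −, 1) and the use of random integer entries for $A^k{}_i$ are choices made only for the illustration. It builds G from an arbitrary A and checks several entries of the table, as well as the symmetry of G. The (0, 1) entry reduces to $A^1{}_1$ and the (1, 1) entry to 0 once $A^0{}_0 = A^0{}_1 = 0$, as in (120).

```python
import random

# eta_ij in the index order (0, +, -, 1): g = w0 w1 + w1 w0 - w+ w- - w- w+, Eq. (43)
ETA = [[0, 0, 0, 1],
       [0, 0, -1, 0],
       [0, -1, 0, 0],
       [1, 0, 0, 0]]

def G_from_A(A):
    """G_ij = eta_jk A^k_i + eta_ik A^k_j, where A[k][i] stands for A^k_i."""
    return [[sum(ETA[j][k] * A[k][i] + ETA[i][k] * A[k][j] for k in range(4))
             for j in range(4)] for i in range(4)]

rng = random.Random(0)  # fixed seed so the example is reproducible
A = [[rng.randint(-9, 9) for _ in range(4)] for _ in range(4)]
G = G_from_A(A)
for row in G:
    print(row)
```

Integer entries keep every comparison exact, so the checks need no tolerance.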
Explicitly,
$$\begin{aligned}
G_{11} &= 0,\\
G_{-1} &= -b(1 - bb^*)^{-1}\alpha^{-1} - \alpha(S_R + b),\\
G_{+1} &= (1 - bb^*)^{-1}\alpha^{-1} - \alpha\{1 + b^*S_R\},\\
G_{--} &= -2(1 - bb^*)^{-1}\{Db + S_{W^*} - bS_W - (S_R + b)(a^* - ab)\} + \alpha(S_R + b),\\
G_{+-} &= (1 - bb^*)^{-1}\{2(1 - b^*b)D\ln\alpha + (1 - bb^*)S_W - D(bb^*) - (S_R + b)(a - a^*b^*) - (a^* - ab)[1 + S_Rb^*]\},\\
G_{++} &= -2(1 - bb^*)^{-1}\{Db^* - (a - a^*b^*) + b^*[S_W - S_{W^*}b^* - S_R(a - a^*b^*)]\},\\
G_{01} &= aS_R + a^* + T_R,\\
G_{0-} &= (1 - bb^*)^{-1}\alpha^{-1}\{(Da^* + aS_{W^*} + T_{W^*}) - (a^* - ab)(aS_R + a^* + T_R) - b(T_W + c + Da + aS_W)\} - \alpha\{S_Z - c(S_R + b)\},\\
G_{0+} &= (1 - bb^*)^{-1}\alpha^{-1}\{T_W + c + Da + aS_W - b^*(Da^* + aS_{W^*} + T_{W^*}) - (aS_R + a^* + T_R)(a - a^*b^*)\} - \alpha\{b^*[S_Z - cS_R] - c\},\\
G_{00} &= 2\{Dc + T_Z + aS_Z - c(aS_R + a^* + T_R)\}.
\end{aligned}$$

Acknowledgements. The authors thank the NSF for support under research grants # PHY 92-05109, PHY 97-22049 and PHY 98-03301. Carlos Kozameh thanks CONICET for support. ETN is indebted to Peter Vassiliou, Niky Kamran and Robert Bryant for enlightening discussions and a detailed proof of the four-dimensionality of the solution space for our pair of PDEs. We thank Paul Tod for pointing out to us the connection of our earlier work to the work of Cartan and Chern that got us started on this project. We also thank him for teaching us Cartan's construction for second-order ODEs and for his helpful comments on an early version of this manuscript.
References

1. Cartan, E.: Les Espaces Généralisés et l'Intégration de Certaines Classes d'Équations Différentielles. C. R. Acad. Sci. 206, 1425–1429 (1938)
2. Cartan, E.: La Geometria de las Ecuaciones Diferenciales de Tercer Orden. Rev. Mat. Hispano-Amer. 4, 1–31 (1941)
3. Cartan, E.: Sur une Classe d'Espaces de Weyl. Ann. Sc. Ec. Norm. Sup. 3e série 60, 1–16 (1943)
4. Chern, S.-S.: The Geometry of the Differential Equation y''' = F(x, y, y', y''). In: Selected Papers. Berlin–Heidelberg–New York: Springer-Verlag, 1978 (original 1940)
5. Wunschmann, K.: Über Berührungsbedingungen bei Integralkurven von Differentialgleichungen. Inaug. Dissert., Leipzig: Teubner
6. Tod, K.P.: Einstein-Weyl Spaces and Third Order Differential Equations. J. Math. Phys. 41, 5572 (2000)
7. Bryant, R.L.: Proc. Symp. Pure Math. Vol. 53, 33–88 (1991)
8. Though there is a simple heuristic argument for the four-dimensionality of the solution space, Peter Vassiliou and Niky Kamran have recently given a rigorous proof of this (private communication).
9. Frittelli, S., Kozameh, C., Newman, E.: GR Via Characteristic Surfaces. J. Math. Phys. 36, 4984 (1995)
10. Frittelli, S., Kozameh, C., Newman, E.: Lorentzian Metrics from Characteristic Surfaces. J. Math. Phys. 36, 4975 (1995)
11. Frittelli, S., Kozameh, C., Newman, E.: Dynamics of Light Cone Cuts of Null Infinity. Phys. Rev. D 56, 4729 (1997)
12. Frittelli, S., Kozameh, C., Newman, E.: Linearized Einstein Theory Via Null Surfaces. J. Math. Phys. 36, 5005 (1995)
13. Forni, D., Iriondo, M., Kozameh, C.: Null Surface Formulation in 3D. J. Math. Phys. 41, 5517 (2000)
14. Tanimoto, M.: On the Null Surface Formalism. gr-qc/9703003 (1997)
15. Frittelli, S., Kamran, N., Newman, E.: Differential Equations and Conformal Geometry. Preprint (2001)

Communicated by M. Aizenman
Commun. Math. Phys. 223, 409 – 435 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Finite Gap Potentials and WKB Asymptotics for One-Dimensional Schrödinger Operators

Thomas Kriecherbauer1, Christian Remling2

1 Universität München, Mathematisches Institut, Theresienstr. 39, 80333 München, Germany.
2 Universität Osnabrück, Fachbereich Mathematik/Informatik, 49069 Osnabrück, Germany.
Received 14 March 2001 / Accepted: 27 June 2001
Abstract: Consider the Schrödinger operator $H = -d^2/dx^2 + V(x)$ with power-decaying potential $V(x) = O(x^{-\alpha})$. We prove that a previously obtained dimensional bound on exceptional sets of the WKB method is sharp in its whole range of validity. The construction relies on pointwise bounds on finite gap potentials. These bounds are obtained by an analysis of the Jacobi inversion problem on hyperelliptic Riemann surfaces.

1. Introduction

We are interested in one-dimensional Schrödinger equations,
$$-y''(x) + V(x)y(x) = Ey(x), \tag{1}$$
and the spectra of the corresponding self-adjoint operators $H_\beta = -d^2/dx^2 + V(x)$ on $L^2(0,\infty)$, say. The index $\beta \in [0,\pi)$ refers to the boundary condition $y(0)\cos\beta + y'(0)\sin\beta = 0$. The spectral properties of the operators $H_\beta$ give information on the large time behavior of the quantum mechanical system described by (1).

In this paper, we present an alternative approach to an earlier result of one of us [21]. With this new approach, we can remove a technical condition and thus prove that a previously obtained bound on the embedded singular spectrum of $H_\beta$ is sharp in its whole range of validity. We will describe this result shortly; let us first point out that the new idea of this paper is to use finite gap potentials in the construction of [21]. The main difficulty is to obtain good pointwise bounds on these potentials, and a substantial part of this paper is devoted to this problem. More specifically, we will have to study in some detail the Jacobi inversion problem on hyperelliptic Riemann surfaces. The result of this analysis is formulated as Theorem 3.1. Actually, our proof gives more than stated: we obtain a whole sequence of good pointwise approximations to finite gap potentials (where, very roughly, "good" means better than expected, due to cancellations). While
our motivation for proving Theorem 3.1 is to provide tools for the proof of Theorem 1.1 below, this discussion is perhaps of independent interest.

Let us now return to (1); suppose that the potential $V$ is bounded by a decaying power, that is,
$$|V(x)| \le \frac{C}{(1+x)^{\alpha}} \qquad (\alpha > 0). \tag{2}$$
Then, if $\alpha > 1/2$, the operators $H_\beta$ have absolutely continuous spectrum essentially supported by $(0,\infty)$, as was first proved in [1, 19]. Embedded singular spectrum can occur if $\alpha \le 1$ (see [18, 26, 30]), but there are restrictions on the dimension of the singular part of the spectral measure. This is intimately related to the problem of solving (1) asymptotically (for large $x$). We say that a solution $y(x,E)$ of the Schrödinger equation satisfies the WKB asymptotic formulae if
$$\begin{pmatrix} y(x,E) \\ y'(x,E) \end{pmatrix} = \begin{pmatrix} 1 \\ i\sqrt{E} \end{pmatrix} \exp\left( i\int_0^x \sqrt{E - V(t)}\, dt \right) + o(1) \qquad (x\to\infty). \tag{3}$$
It is well known that there exist solutions of (1) satisfying (3) for all $E > 0$ if the potential $V$ decays and is slowly varying in a suitable sense (see, for instance, [7, Chapter 2]). Obviously, this latter assumption need not hold if $V$ only satisfies (2). Nevertheless, recent work [1–3, 19, 20] has shown that (3) continues to hold off a small exceptional set of energies $E$ as long as $\alpha > 1/2$. Call this exceptional set $S$; in other words,
$$S = \{E > 0 : \text{no solution of (1) satisfies (3)}\}. \tag{4}$$
General criteria [11, 25, 29] show that if there is some embedded singular spectrum on $(0,\infty)$, then the corresponding parts of the spectral measures are supported by $S$. In other words, if $\rho^{(\beta)}$ denotes the spectral measure of $H_\beta$, then $\rho^{(\beta)}_{\mathrm{sing}}((0,\infty)\setminus S) = 0$ for all $\beta$. Therefore, it is interesting to study $S$ in detail. We know from [1, 19] that $S$ is of Lebesgue measure zero if $\alpha > 1/2$; this was subsequently strengthened in [20], where it was proved that the Hausdorff dimension of $S$ satisfies $\dim S \le 2(1-\alpha)$. Formally, this result is valid for all $\alpha \in \mathbb{R}$ (if one defines $\dim\emptyset = -\infty$), but it gives nontrivial information only if $1/2 < \alpha \le 1$. We will show that this bound is sharp and is even attained for suitable potentials:

Theorem 1.1. For every $\alpha \in (1/2, 1]$, there exist potentials $V(x)$ satisfying (2) such that $\dim S = 2(1-\alpha)$.

If $\alpha \notin (1/2, 1]$, the whole picture is different. More precisely, if $\alpha \le 1/2$, then $S$ can have full Lebesgue measure in $(0,\infty)$ [14, 15, 24] and the spectrum can be purely singular. On the other hand, it is easy to prove that $S = \emptyset$ if $\alpha > 1$ (see, e.g., [7]).

In [21], Theorem 1.1 was proved for $\alpha > 2/3$. Things get more difficult as $\alpha$ approaches $1/2$. In particular, we really need the full force of Theorem 3.1, in that the exponent $N$ there gets larger and larger as $\alpha$ decreases to $1/2$. Actually, here, too, we show more than stated: for any given function $\epsilon(x)$ with $\epsilon(x) \to 0$ as $x \to \infty$ (no matter how slowly), we can construct a potential $V(x) = O(x^{-\alpha-\epsilon(x)})$ such that $\dim S = 2(1-\alpha)$.

There are extensions of the results quoted above to far more general settings. Deift and Killip [5] have proved that there is absolutely continuous spectrum essentially supported by $(0,\infty)$ already if $V \in L^1 + L^2$; very recently, Killip has obtained even stronger results in this direction [13]. WKB asymptotics off exceptional sets have been established by
Christ and Kiselev [2, 3] under very general conditions, including $V \in L^1 + L^p$ for some $p < 2$ (but not in the borderline case $p = 2$, which remains open). A major open question in this context is Simon's problem no. 7 [27]: Are there potentials satisfying (2) with $\alpha > 1/2$ such that, for some boundary condition $\beta$, the operator $H_\beta$ has some singular continuous spectrum?

We organize this paper as follows. In Sect. 2, we discuss the construction of the so-called finite gap potentials, that is, of quasi-periodic potentials with finitely many prescribed gaps in the spectrum. Since this material is rather classical, we concentrate on those aspects of the theory that are needed later. The following section introduces the problem of obtaining pointwise bounds on finite gap potentials. We state our main result on finite gap potentials (Theorem 3.1) and discuss some general features of this result. The proof is given in Sects. 4, 5, 6; this analysis is perhaps the central part of this paper. It depends on a study of the Jacobi inversion problem in cases where a large number of small gaps is present. A major role will be played by a graphical representation of the terms of a perturbation series, which we introduce in Sect. 5. With Theorem 3.1 as new input, we can then obtain Theorem 1.1, relying mainly on the ideas already contained in [21]. This is done in Sect. 7. In fact, with our new approach, the treatment becomes more transparent.

2. Finite Gap Potentials

In this section, we briefly review the construction of, and some results on, finite gap potentials. We will more or less follow the presentation given in [16]. For further information on this many-faceted topic (for example, the connections to equations of the KdV hierarchy), see [9, 10]. The needed facts from the theory of compact Riemann surfaces can be found in [8, 28]. Let energies $E_0 < E_1 < \cdots < E_{2g}$ be given.
The aim is to construct a family of (quasi-periodic) potentials $V \in C^\infty(\mathbb{R})$ so that the corresponding operators $H = -d^2/dx^2 + V(x)$ on $L^2(\mathbb{R})$ have purely absolutely continuous spectrum with precisely the prescribed gaps:
$$\sigma(H) = \mathbb{R} \setminus \bigcup_{n=0}^{g} (E_{2n-1}, E_{2n}) \qquad (E_{-1} := -\infty). \tag{5}$$
To this end, consider the Riemann surface $S$ of
$$R(z) = \Big( \prod_{n=0}^{2g} (E_n - z) \Big)^{1/2}.$$
$S$ is compact and hyperelliptic, and its genus is equal to $g$. A topological model of $S$ can be obtained by gluing together two copies of the extended complex plane cut along the gaps $(-\infty, E_0), (E_1, E_2), \ldots, (E_{2g-1}, E_{2g})$. The points of $S$ may thus be viewed as pairs $\hat z = (z, R(z))$. Here, $z \in \mathbb{C} \cup \{\infty\}$ is the canonical projection of $\hat z \in S$ onto $\mathbb{C} \cup \{\infty\}$, and $R(z)$, which is of course already determined by $z$ up to a sign, shows on which sheet of the surface $\hat z$ lies. Put differently, the canonical projection $\hat z \mapsto z$ gives a two-sheeted branched covering of the Riemann sphere $\mathbb{C} \cup \{\infty\}$ by $S$; the preimages of $\infty, E_0, \ldots, E_{2g}$ are branch points of order one.
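A quick numerical sanity check of how the surface relates to (5), not taken from the paper: on the real axis, $R(z)^2 = \prod_{n=0}^{2g}(E_n - z)$ is negative exactly on the open bands of (5) and positive on the gaps, because the number of negative factors is odd on a band and even on a gap. So $R$ is purely imaginary on the spectrum and real on the gaps along which the sheets are cut. The sketch assumes a hypothetical genus-two example, and the helper names are illustrative only.

```python
import numpy as np

def r_squared(z, E):
    """R(z)^2 = prod_{n=0}^{2g} (E_n - z) for real z; E lists E_0 < E_1 < ... < E_{2g}."""
    return float(np.prod([e - z for e in E]))

def in_spectrum(z, E):
    """Membership in sigma(H) as prescribed by (5): remove (-inf, E_0) and the
    open gaps (E_{2n-1}, E_{2n}), n = 1, ..., g, from the real line."""
    g = (len(E) - 1) // 2
    if z < E[0]:
        return False
    return not any(E[2 * n - 1] < z < E[2 * n] for n in range(1, g + 1))

# Hypothetical genus-two example: bands [0, 1], [1.3, 2], [2.1, inf)
E = [0.0, 1.0, 1.3, 2.0, 2.1]
# Sample the real line, skipping the branch points themselves (where R = 0)
grid = [z for z in np.linspace(-1.0, 4.0, 997) if min(abs(z - e) for e in E) > 1e-9]
agree = all((r_squared(z, E) < 0) == in_spectrum(z, E) for z in grid)
print(agree)  # True
```

The parity argument behind the check is what makes the cut-and-glue model consistent: crossing a branch point flips the sign of $R^2$.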
Standard coordinates $\zeta$ on $S$ can be defined as follows. If $\hat z$ is not a branch point, put $\zeta = z$ in a neighborhood of $\hat z$; near a finite branch point $\hat E_j$, use $\zeta = (E_j - z)^{1/2}$, the sign being determined by the sign of $R$; and near infinity, use similarly $\zeta = (-z)^{-1/2}$. There are precisely $g$ linearly independent holomorphic differentials (also known as Abelian differentials of the first kind) on $S$. One can obtain unique basis elements by prescribing certain periods. We will work with the standard normalization, which amounts to demanding that for $i, j = 1, \ldots, g$,
$$2\int_{E_{2i-1}}^{E_{2i}} \omega_j = \delta_{ij}. \tag{6}$$
(The notation is a little sloppy: the path of integration projects onto $[E_{2i-1}, E_{2i}]$, and $(-1)^i R \le 0$ on this path.) Note that the left-hand side is just the integral of $\omega_j$ over the cycle $a_i$ of a standard homology basis (see, e.g., [10, p. 109ff]). One can then show that the $\omega_j$ thus defined are of the form
$$\omega_j = \frac{p_j(z)}{R(z)}\, dz, \qquad p_j(z) = c_j \prod_{i \ne j} \big(\lambda_i^{(j)} - z\big), \tag{7}$$
with $\lambda_i^{(j)} \in (E_{2i-1}, E_{2i})$ and $c_j > 0$. Of course, this representation refers to the coordinate maps $\hat z \mapsto z$ discussed above.

The Abel–Jacobi map $\alpha$ sends positive divisors of degree $g$ (that is, unordered collections of $g$ points from $S$) to the Jacobi variety of $S$, which is the complex torus equal to $\mathbb{C}^g$ modulo the period lattice of the holomorphic differentials. This map is onto; in other words, the Jacobi inversion problem can be solved. We will need the Abel–Jacobi map only for divisors of the form $(\hat\mu_1, \ldots, \hat\mu_g)$ with $\mu_i \in [E_{2i-1}, E_{2i}]$. The Abel–Jacobi map is then given by
$$\alpha_i(\hat\mu_1, \ldots, \hat\mu_g) = 2\pi \sum_{j=1}^{g} \int_{E_{2j-1}}^{\mu_j} \omega_i \mod 2\pi; \tag{8}$$
here, we take paths of integration whose projections lie entirely in the corresponding gaps $[E_{2j-1}, E_{2j}]$. It follows from classical theorems of Abel and Jacobi [28, Chapter 10] that $\alpha = (\alpha_1, \ldots, \alpha_g)$ is a bijection from the set of divisors specified above onto the real part of the Jacobi variety $T^g = [0, 2\pi)^g$. Alternatively, this fact may be verified directly, using a representation of the Abel–Jacobi map that will be derived below (see Eq. (16)). Actually, (8) differs from the standard definition of the Abel–Jacobi map by an additive constant vector and the factor $2\pi$; the choice (8) is more convenient here.

Now the stage has been set for the actual construction of the finite gap potentials. Consider the following linear flow on $T^g$: $\phi_x\alpha_0 = \alpha_0 + \nu x$, where the frequency vector $\nu$ is given by
$$\nu_j = 2\pi \mathop{\mathrm{res}}_{z=\infty} (-z)^{1/2}\omega_j.$$
Using the coordinate $\zeta = (-z)^{-1/2}$ at $z = \infty$, we can easily evaluate the residue to obtain $\nu_j = 4\pi c_j$, where $c_j$ is the normalization constant of the polynomial $p_j$ (see (7)). Now pull back the flow $\phi_x$ to the set of divisors $(\hat\mu_1, \ldots, \hat\mu_g)$, using the Abel–Jacobi map. In other words, define the functions $\hat\mu_j(x) \in S$ (with $\mu_j(x) \in [E_{2j-1}, E_{2j}]$) by requiring that $\alpha(\hat\mu_1(x), \ldots, \hat\mu_g(x)) = \alpha_0 + \nu x$. Next, introduce a potential $V$ by the following trace formula:
$$V_{\alpha_0}(x) = E_0 + \sum_{n=1}^{g} \big(E_{2n-1} + E_{2n} - 2\mu_n(x)\big). \tag{9}$$
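The residue evaluation behind $\nu_j = 4\pi c_j$, stated just before the trace formula (9), can be written out in the coordinate $\zeta = (-z)^{-1/2}$. This is a sketch under the assumption that the branch of $R$ near $\infty$ is the one with $R(z) = \zeta^{-(2g+1)}(1 + O(\zeta^2))$; the other branch only flips the sign of the residue.

$$\begin{aligned}
z &= -\zeta^{-2}, \qquad dz = 2\zeta^{-3}\,d\zeta, \qquad (-z)^{1/2} = \zeta^{-1},\\
p_j(z) &= c_j \prod_{i\ne j}\big(\lambda_i^{(j)} - z\big) = c_j(-z)^{g-1}\big(1 + O(\zeta^2)\big) = c_j\,\zeta^{-2(g-1)}\big(1 + O(\zeta^2)\big),\\
R(z) &= \Big(\prod_{n=0}^{2g}(E_n - z)\Big)^{1/2} = \zeta^{-(2g+1)}\big(1 + O(\zeta^2)\big),\\
(-z)^{1/2}\,\omega_j &= \zeta^{-1}\cdot\frac{c_j\,\zeta^{-2g+2}}{\zeta^{-2g-1}}\cdot 2\zeta^{-3}\big(1 + O(\zeta^2)\big)\,d\zeta = \frac{2c_j}{\zeta}\big(1 + O(\zeta^2)\big)\,d\zeta.
\end{aligned}$$

Hence the residue at $\infty$ is $2c_j$, and $\nu_j = 2\pi\cdot 2c_j = 4\pi c_j$. Only the leading coefficients of $p_j$ (degree $g-1$) and of $R^2$ (degree $2g+1$) enter, which is why the location of the zeros $\lambda_i^{(j)}$ plays no role here.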
One can then show that this family of potentials solves the inverse problem stated at the beginning of this section. Namely, the operators $H = -d^2/dx^2 + V_{\alpha_0}(x)$ on $L^2(\mathbb{R})$ have purely absolutely continuous spectrum equal to the set given in (5). This follows from the following representation of the diagonal of the Green function of $H$:
$$G(x, x; z) = \frac{\prod_{n=1}^{g} (\mu_n(x) - z)}{2R(z)}. \tag{10}$$
This important formula is derived with the aid of the so-called Baker–Akhiezer function, which gives explicit expressions for the solutions $y$ of the differential equation $Hy = zy$. See [16] for the details. Actually, the right-hand side of (10) defines a meromorphic function on $S$ (with simple poles precisely at the finite branch points) for every fixed $x \in \mathbb{R}$. The Green function, however, depends on $z \in \mathbb{C}$; therefore, we must complement (10) by recalling that $\operatorname{Im} G(z)\,\operatorname{Im} z > 0$ for $\operatorname{Im} z \ne 0$.

It is useful (and probably also more natural) to interpret the above recipe in a slightly different way. Namely, define $f: T^g \to \mathbb{R}$ implicitly by the trace formula
$$f(\beta) = E_0 + \sum_{n=1}^{g} \big(E_{2n-1} + E_{2n} - 2\mu_n\big), \tag{11}$$
where $\alpha(\hat\mu_1, \ldots, \hat\mu_g) = \beta$. Then the finite gap potential $V_\alpha(x)$ is obtained by evaluating $f$ along the trajectory of $\alpha$ under the flow $\phi_x\alpha = \alpha + \nu x$. So, if $\nu$ is known, then $V_\alpha$ is computed by inverting the Abel–Jacobi map. We remark parenthetically that there is an "explicit" solution to this problem which uses Riemann theta functions [9, 10, 17], but these formulae are not of much use here.

3. Pointwise Estimates on Finite Gap Potentials

Since $\mu_j \in [E_{2j-1}, E_{2j}]$, the trace formula (9) immediately implies the following bound on $V_\alpha(x)$:
$$\sup_{x\in\mathbb{R}} |V_\alpha(x) - E_0| \le \sum_{n=1}^{g} (E_{2n} - E_{2n-1}). \tag{12}$$
In other words, $\|V_\alpha - E_0\|_\infty$ is bounded by the $\ell^1$-norm of the sequence of the gap lengths $E_{2n} - E_{2n-1}$. It is also obvious that nothing more can be said in general: indeed, if the components of the frequency vector $\nu$ are rationally independent, every trajectory
$\{\phi_x\alpha : x \in \mathbb{R}\}$ is dense in the torus $T^g$, and thus the $L^\infty$-norm of $V_\alpha(x) - E_0$ is equal to the maximum of $|f - E_0|$ over the torus, so (12) holds with equality in this case. However, there is still hope that (12) can be improved if the supremum is only taken over a bounded (but large) interval $0 \le x \le L$. Then the problem is to choose a trajectory whose initial piece avoids those points of the torus where $|f - E_0|$ is large. Our next major goal is to confirm this hope. This will occupy us for the following four sections.

It will be convenient to use the centers and the half-lengths of the gaps as new parameters. So, define
$$m_n = \frac{E_{2n-1} + E_{2n}}{2}, \qquad l_n = \frac{E_{2n} - E_{2n-1}}{2}.$$
The finite gap potentials that are needed in the construction underlying Theorem 1.1 have gaps which are small compared to the bands $[E_{2n}, E_{2n+1}]$. Therefore, from now on we concentrate on this situation. To make this condition precise, we introduce
$$l = \max_{n=1,\ldots,g} l_n, \qquad d = \min_{n=2,\ldots,g} (m_n - m_{n-1});$$
our condition on the parameters of the construction will be that $(l/d)\ln g$ is sufficiently small.

The following theorem is our main result on finite gap potentials. It says that the family of finite gap potentials $\{V_\alpha : \alpha \in T^g\}$ contains functions which, over long intervals, are "almost" bounded by the $\ell^2$-norm of the gap lengths (rather than the $\ell^1$-norm). According to the above remarks, we now use the following parameters to describe finite gap potentials: $g \in \mathbb{N}$ ($g \ge 2$) is the number of gaps; $E_0 < m_1 < \cdots < m_g$ and $l_1, \ldots, l_g > 0$ describe the locations and the lengths of the gaps, respectively. We require that the gaps do not touch or overlap. Clearly, this amounts to demanding that $E_0 < m_1 - l_1$ and $m_n + l_n < m_{n+1} - l_{n+1}$ for $n = 1, \ldots, g-1$.

Theorem 3.1. Let $C_1, C_2 > 0$ be constants so that $C_1 \le m_1 - E_0$, $m_g - E_0 \le C_2$, and let $N \in \mathbb{N}_0$. Then there exists a constant $C$, depending only on $C_1$, $C_2$, and $N$ (but not on the parameters of the finite gap potentials), such that the following holds. For every $L \ge 1$, there exists an $\alpha \in T^g$ so that
$$\sup_{0\le x\le L} \big| f(\alpha + \nu x) - f_0 \big| \le C\Big( g^{1/2}\, l\, (\ln(gL))^{1/2} + gl\,(ld^{-1}\ln g)^{N+1} \Big),$$
where $f$ was defined in (11) and
$$f_0 = \int_{T^g} f(\beta)\, \frac{d\beta}{(2\pi)^g}.$$
Recall from Sect. 2 that $f(\alpha + \nu x)$ is just the finite gap potential $V_\alpha(x)$. In the proof of Theorem 3.1, we will in fact show that the assertion holds with large probability if $\alpha \in T^g$ is chosen at random. The first term of the bound is the $\ell^2$-norm of the gap lengths (as promised, with each $l_n$ replaced by $l$), times a logarithmic factor. Of course, the point is that this factor grows only slowly with $L$, so we can still take a relatively large $L$. Note, however, that we no longer get an improvement over the trivial bound $gl$ if $L$ is of the order $e^g$. In the application of Theorem 3.1 in this paper, we will have $L \le g^\gamma$, and then Theorem 3.1 indeed gives a good bound.

From a theoretical point of view, a particularly neat situation arises when the flow $\phi_x$, and thus also the finite gap potentials, are periodic with period $p$. In that case, one
can take $L = p$ to obtain a bound which is valid for all $x \in \mathbb{R}$. This remark is not as academic as it may seem, because one can show, using topological arguments, that in situations with small gaps one can get a periodic $\phi_x$ by slightly moving the centers $m_n$. The period will be of the order $p \approx d^{-1}$. See also [6, Appendix C.2] for statements of this type.

The second term of the above bound contains the $\ell^1$-norm $gl$, but multiplied by an arbitrarily high power of $ld^{-1}\ln g$. So Theorem 3.1 is interesting only if this combination is small, but this is the case in our construction for proving Theorem 1.1. What exactly "small" means obviously depends on $C$ and thus on $C_1, C_2, N$, but on nothing else. This will be very important in the proof of Theorem 1.1, where we will apply Theorem 3.1 to a whole sequence of finite gap potentials.

We would like to emphasize the fact that we do not subtract $E_0$ from $V_\alpha(x)$ (which is perhaps the constant that comes to mind first), but rather the average of $f$ over the real part of the Jacobi variety. This may be viewed as a renormalization, due to higher order terms. Indeed, $E_0$ is the limiting value of $f$ at $l = 0$; now the theorem says that the zeroth Fourier coefficient $f_0$ (which also contains terms of higher order in the small parameter $l/d$) gives a better constant approximation to $V_\alpha(x)$. This remark is actually true for approximations by trigonometric polynomials of arbitrarily high degree; we will comment on this point again after having discussed the proof of Theorem 3.1.

We will give this proof in the following three sections. This is the plan of attack: we will first solve the Jacobi inversion problem up to order $N$ in the small parameter $ld^{-1}\ln g$. (As explained above, the problem of computing finite gap potentials basically is the Jacobi inversion problem, that is, the problem of inverting the Abel–Jacobi map.) This will be done by expanding in Fourier and Taylor series and solving the equations by iteration.
The expressions obtained in this way rapidly get out of hand as $N$ increases. However, things become surprisingly transparent if a graphical representation of the perturbation series is introduced. This will be developed in Sect. 5, after some preparatory material in Sect. 4. Then, in Sect. 6, we extend classical methods, due to Salem and Zygmund [23], for bounding random trigonometric polynomials to finish the proof. In its original version, this argument shows, for example, that for a random choice of signs, $p(x) = \sum_{n=1}^{N} \pm a_n \cos nx$ is almost bounded by the $\ell^2$-norm of its coefficients: $\|p\|_\infty \le C\|a\|_2 (\ln N)^{1/2}$. Theorem 3.1 can perhaps be viewed as a nonlinear version of this result.

4. Proof of Theorem 3.1: Basic Estimates

We want to analyze the Abel–Jacobi map (8). We can parametrize the divisors $(\hat\mu_1, \ldots, \hat\mu_g)$ (as usual, $\mu_j \in [E_{2j-1}, E_{2j}]$) by the points $(\psi_1, \ldots, \psi_g)$ of another copy of the torus $T^g = [0, 2\pi)^g$ (not to be confused with the real part of the Jacobi variety) as follows: write
$$\mu_j = m_j - l_j\cos\psi_j, \tag{13}$$
$$R(\hat\mu_j) = R_j(\mu_j)\, il_j\sin\psi_j, \tag{14}$$
where $R_j(z) = R(z)/\big((E_{2j-1} - z)(E_{2j} - z)\big)^{1/2}$. This definition is not yet complete, since the sign of $R_j(\mu_j)$ on the right-hand side of (14) also needs to be specified. Note that $iR_j(\mu)$ is real and non-zero for $E_{2j-1} < \mu < E_{2j}$. Therefore, it makes sense to require that $iR_j(\mu)$ be positive for odd $j$ and negative for even $j$ (for $\mu$ as above). So, for a given
$\psi_j \in [0, 2\pi)$, Eq. (13) tells us what the projection $\mu_j$ of $\hat\mu_j$ is, while (14) determines the sheet on which $\hat\mu_j$ lies. In particular, if $\psi_j + \psi_j' = 2\pi$, then the corresponding points $\hat\mu_j, \hat\mu_j' \in S$ have the same projections but lie on different sheets.

The substitution (13), (14) allows us to write integrals involving the $\omega_j$ in a particularly convenient way. Indeed, on the $n$th gap we have $dz = l_n\sin\psi\,d\psi$ and $R = iR_n\, l_n\sin\psi$, so recalling (7), $\omega_j = p_j/(iR_n)\,d\psi$ there, and the normalization condition (6) takes the form
$$2\int_0^{\pi} \left.\frac{p_j(\mu)}{iR_n(\mu)}\right|_{\mu = m_n - l_n\cos\psi} d\psi = \delta_{jn}. \tag{15}$$
Similarly, the Abel–Jacobi map, now viewed as a map from $T^g$ to $T^g$ (but still denoted by $\alpha$), can be written as
$$\alpha_j(\psi_1, \ldots, \psi_g) = 2\pi\sum_{n=1}^{g} \int_0^{\psi_n} \left.\frac{p_j(\mu)}{iR_n(\mu)}\right|_{\mu = m_n - l_n\cos\psi} d\psi. \tag{16}$$
Finally, the function $f$ from the trace formula for $V$ (see (11)) takes the following form when expressed in terms of the new variables:
$$f = E_0 + 2\sum_{n=1}^{g} l_n\cos\psi_n. \tag{17}$$
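The mechanism behind the first term in Theorem 3.1 can be previewed with a rough numerical sketch. Assumptions, none of which come from the paper: the corrections $\theta_n$ are ignored, so that by (17) $f(\alpha + \nu x) - E_0 \approx 2\sum_n l_n\cos(\alpha_n + \nu_n x)$; the frequencies are taken at the leading-order values $\nu_n \approx 2(m_n - E_0)^{1/2}$ suggested by $\nu_n = 4\pi c_n$ and Lemma 4.1 below; all gap parameters are hypothetical. With all phases equal, the trivial bound $2gl$ of (12) is attained at $x = 0$, whereas a random choice of phases typically gives a supremum over $[0, L]$ of the order of $g^{1/2}l(\ln(gL))^{1/2}$, in the spirit of the Salem–Zygmund estimate.

```python
import numpy as np

def leading_order_sup(alpha, nu, l, L, samples=4001):
    """sup over 0 <= x <= L of |2 * sum_n l_n cos(alpha_n + nu_n x)|.

    Assumption: theta_n is dropped, i.e. psi_n(x) is replaced by alpha_n + nu_n x
    in (17); this is only the zeroth-order picture of f(alpha + nu x) - E_0.
    """
    x = np.linspace(0.0, L, samples)
    values = 2.0 * np.cos(alpha[None, :] + np.outer(x, nu)) @ l
    return np.max(np.abs(values))

g, L = 400, 50.0
l = np.full(g, 1e-3)                      # equal half-lengths l_n = l (hypothetical)
m = 1.0 + np.arange(1, g + 1) / g         # gap centres m_n, with E_0 = 0 (hypothetical)
nu = 2.0 * np.sqrt(m)                     # leading order of nu_n = 4 pi c_n
trivial = 2.0 * l.sum()                   # the l^1 bound (12), i.e. 2 g l

aligned = leading_order_sup(np.zeros(g), nu, l, L)      # equal phases
rng = np.random.default_rng(0)
random_sup = leading_order_sup(rng.uniform(0, 2 * np.pi, g), nu, l, L)
scale = np.sqrt(g) * l[0] * np.sqrt(np.log(g * L))      # g^{1/2} l (ln gL)^{1/2}
print(aligned / trivial, random_sup / trivial, random_sup / scale)
```

The first ratio is 1, the second is far below 1, and the third is of order one, which is the gain the theorem quantifies for the true, nonlinear potential.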
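As already discussed, Theorem 3.1 is vacuous if $ld^{-1}\ln g$ is not small: since $|f - E_0| \le 2gl$ everywhere on the torus and hence $|f - f_0| \le 4gl$, if $ld^{-1}\ln g \ge \epsilon$ we can just take $C = 4\epsilon^{-N-1}$. Thus only the case where
$$ld^{-1}\ln g < \epsilon \tag{18}$$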
needs proof; here, $\epsilon > 0$ can be chosen according to our needs and may depend on $C_1$, $C_2$, and $N$ (but on nothing else). In addition to the hypotheses of Theorem 3.1, we will therefore assume (18) with a sufficiently small $\epsilon$ from now on. In particular, the reader should keep in mind that (18) with a suitable $\epsilon = \epsilon(C_1, C_2, N)$, as well as the hypotheses of Theorem 3.1, are (tacit) assumptions in all lemmas of Sects. 4–6.

Notational remark. In the sequel, we will use the following conventions. A "constant" (usually denoted by $C$) is a number that only depends on $C_1$, $C_2$, and $N$. In particular, the constants which are implicit in the Landau notation $O(\cdots)$ may only depend on $C_1$, $C_2$, and $N$. We will sometimes write $a \lesssim b$ instead of $a \le Cb$ (or $a = O(b)$); here, $C$ is a constant in the sense just explained. Similarly, $a \approx b$ is shorthand for two-sided estimates. Finally, the value of $C$ may change from one formula to the next, so there is nothing wrong with an inequality like $C + 1 \le C$ (to give a blatant example).

Assuming (18), we can analyze (15), (16) in some detail by using Taylor expansions. The following lemma will get us started.

Lemma 4.1. For all $j, n = 1, \ldots, g$, the function $p_j(z)/R_n(z)$ is holomorphic in a neighborhood of $[E_{2n-1}, E_{2n}]$, and for all $s \in \mathbb{N}_0$,
$$\max_{z\in[E_{2n-1},E_{2n}]} \left|\frac{d^s}{dz^s}\frac{p_j(z)}{R_n(z)}\right| \le \left(\frac{C}{d}\right)^{s}\frac{C\,s!}{d^{-1}|m_j - m_n| + 1}.$$
Moreover,
$$c_j = \frac{(m_j - E_0)^{1/2}}{2\pi}\Big(1 + O\big((l/d)^2\ln g\big)\Big).$$
Proof. The first assertion is obvious; in fact, it holds on any simply connected neighborhood of $[E_{2n-1}, E_{2n}]$ that avoids the other branch points. Thus, for $z \in [E_{2n-1}, E_{2n}]$, we can use the Cauchy formula to represent the derivatives:
$$f^{(s)}(z) = \frac{s!}{2\pi i}\oint_K \frac{f(\zeta)}{(\zeta - z)^{s+1}}\, d\zeta. \tag{19}$$
Here, we integrate over the contour $K = \{\zeta = m_n + (d/2)e^{i\varphi} : 0 \le \varphi \le 2\pi\}$ in the counter-clockwise direction. Note that $K$ is well separated from all gaps $[E_{2i-1}, E_{2i}]$. In particular, if $l/d$ is sufficiently small (for instance, $l/d \le 1/4$ will do), then $|\zeta - z| \gtrsim d$ for all $\zeta \in K$, $z \in [E_{2n-1}, E_{2n}]$, so (19) implies that
$$\max_{z\in[E_{2n-1},E_{2n}]} \big|f^{(s)}(z)\big| \le C(C/d)^s s!\,\max_{\zeta\in K}|f(\zeta)|. \tag{20}$$
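We want to apply this to $f = p_j/R_n$, so we need to estimate $p_j$ and $R_n$. We have that, for $\zeta \in K$,
$$|R_n(\zeta)| = |\zeta - E_0|^{1/2}\prod_{i\ne n}\big(|m_i - l_i - \zeta|\,|m_i + l_i - \zeta|\big)^{1/2} \approx \prod_{i\ne n}|m_i - \zeta|\left(1 + O\Big(\frac{l^2}{(m_i - m_n)^2}\Big)\right).$$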
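Here we used the fact that $m_n - E_0 \approx 1$ by the hypotheses of Theorem 3.1. Similarly, since the unknown zeros $\lambda_i^{(j)}$ of $p_j$ satisfy $\lambda_i^{(j)} \in (E_{2i-1}, E_{2i})$ and since $c_j > 0$, we obtain
$$|p_j(\zeta)| = c_j\prod_{i\ne j}\big|\lambda_i^{(j)} - \zeta\big| = c_j\prod_{i\ne j}|m_i - \zeta|\left(1 + O\Big(\frac{l}{|m_i - m_n| + d}\Big)\right).$$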
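Now $|m_i - m_j| \ge d|i - j|$, so, taking logarithms and using $\sum_{i\ne n}(|i - n| + 1)^{-1} \lesssim \ln g$, we see that for small $ld^{-1}\ln g$,
$$\prod_{i\ne j}\left(1 + O\Big(\frac{l}{|m_i - m_n| + d}\Big)\right) = 1 + O(ld^{-1}\ln g), \qquad \prod_{i\ne n}\left(1 + O\Big(\frac{l^2}{(m_i - m_n)^2}\Big)\right) = 1 + O\big((l/d)^2\big).$$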
Estimates of this type will be used quite often in the sequel. Combining the bounds just proved, we get
$$\left|\frac{p_j}{R_n}(\zeta)\right| \le \frac{Cc_j}{d^{-1}|m_j - m_n| + 1},$$
and the claim on the derivatives of $p_j/R_n$ would follow from (20) if we already knew the asserted formula for the $c_j$'s.
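The two product estimates above are where the factor $\ln g$ in the small parameter $ld^{-1}\ln g$ comes from. The following sketch evaluates both products at the upper end of their $O(\cdot)$ terms for a hypothetical configuration of equally spaced centers $m_i = id$ (taking $j = n$ for simplicity, so both products run over $i \ne n$), and compares the logarithms with $2(l/d)\ln g$ and $(\pi^2/3)(l/d)^2$, the bounds that follow from $\log(1+x) \le x$, $\sum_{k=1}^{g-1}(k+1)^{-1} \le \ln g$ and $\sum_k k^{-2} \le \pi^2/6$. The function name is illustrative.

```python
import math

def log_products(g, n, ratio):
    """Logs of the two products from the proof of Lemma 4.1 for equally spaced
    centres m_i = i * d (hypothetical), so |m_i - m_n| = d |i - n| exactly.
    `ratio` stands for l/d; the O(...) terms are replaced by their upper ends."""
    others = [i for i in range(1, g + 1) if i != n]
    log_p1 = sum(math.log1p(ratio / (abs(i - n) + 1)) for i in others)   # l/(|m_i-m_n|+d)
    log_p2 = sum(math.log1p(ratio**2 / (i - n) ** 2) for i in others)    # l^2/(m_i-m_n)^2
    return log_p1, log_p2

ratio = 0.01                                                 # l/d (hypothetical)
results = {g: log_products(g, g // 2, ratio) for g in (10, 100, 1000, 10000)}
for g, (p1, p2) in results.items():
    # first column stays below 1 (bound 2 (l/d) ln g); second stays below pi^2/3
    print(g, p1 / (2 * ratio * math.log(g)), p2 / ratio**2)
```

The first product keeps growing with $g$ (like $\ln g$), while the second saturates; this is why only the linear term forces the combination $(l/d)\ln g$.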
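So, it only remains to prove the estimate on $c_j$ stated in Lemma 4.1. Taylor's theorem with remainder gives
$$\left.\frac{p_j(\mu)}{R_n(\mu)}\right|_{\mu = m_n - l_n\cos\psi} = \frac{p_j(m_n)}{R_n(m_n)} - \frac{d}{dz}\left(\frac{p_j}{R_n}\right)(m_n)\, l_n\cos\psi + O\left(c_j\frac{(l/d)^2}{d^{-1}|m_j - m_n| + 1}\right).$$
Plug this into (15). The first order term integrates to zero. Also,
$$R_n(m_n) = -i\sqrt{m_n - E_0}\,\big(1 + O((l/d)^2)\big)\prod_{i\ne n}(m_i - m_n),$$
so we obtain
$$\frac{2\pi c_j\prod_{i\ne j}\big(\lambda_i^{(j)} - m_n\big)}{\sqrt{m_n - E_0}\,\prod_{i\ne n}(m_i - m_n)}\big(1 + O((l/d)^2)\big) + O\left(c_j\frac{(l/d)^2}{d^{-1}|m_j - m_n| + 1}\right) = \delta_{jn}. \tag{21}$$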
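For $j \ne n$, (21) leads to
$$\frac{\lambda_n^{(j)} - m_n}{m_j - m_n}\big(1 + O(ld^{-1}\ln g)\big) = O\left(\frac{l^2/d}{|m_j - m_n|}\right),$$
thus $\lambda_n^{(j)} = m_n + O(l^2/d)$. Using this in (21) with $j = n$, we finally obtain
$$\frac{2\pi c_n}{\sqrt{m_n - E_0}}\prod_{i\ne n}\left(1 + O\Big(\frac{l^2d^{-1}}{|m_i - m_n|}\Big)\right) = 1 + O\big(c_n(l/d)^2\big),$$
and the lemma follows, since the product on the left-hand side is $1 + O((l/d)^2\ln g)$.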
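In genus one ($g = 1$, which lies outside the setting $g \ge 2$ of Theorem 3.1) everything is explicit enough to check numerically: $p_1 = c_1$ is a constant and, with the sign convention for odd $j$, $iR_1(\mu) = (\mu - E_0)^{1/2}$ on the gap. The sketch below, with hypothetical parameter values, computes $c_1$ from (15), $\alpha'(\psi)$ from (16), and $f_0 = (2\pi)^{-1}\int f\,d\alpha$ by changing variables from $\alpha$ to $\psi$. It illustrates the leading term of $c_j$ in Lemma 4.1 and the remark in Sect. 3 that $f_0$ differs from $E_0$ by higher order terms; the comparison value $l^2/(2(m - E_0))$ is obtained by Taylor-expanding the two integrals and is an assumption of this sketch, not a formula from the text.

```python
import numpy as np

def genus_one(E0, m, l, K=2048):
    """Genus-one illustration: c_1 from (15), alpha'(psi) from (16), f_0 via (17).

    All integrands are smooth and 2 pi-periodic in psi, so the plain average over
    an equispaced grid (periodic trapezoidal rule) is very accurate.
    """
    psi = 2 * np.pi * np.arange(K) / K
    w = 1.0 / np.sqrt(m - l * np.cos(psi) - E0)   # 1/(i R_1(mu)) at mu = m - l cos psi
    c1 = 1.0 / (2 * np.pi * w.mean())             # (15): int_0^{2pi} c_1 w dpsi = 1
    dalpha = 2 * np.pi * c1 * w                   # alpha'(psi), from (16)
    f = E0 + 2 * l * np.cos(psi)                  # (17) with g = 1
    f0 = np.mean(f * dalpha)                      # (1/2pi) int f(psi) alpha'(psi) dpsi
    return c1, f0, np.mean(dalpha)

E0, m, l = 0.0, 1.0, 0.1
c1, f0, mean_dalpha = genus_one(E0, m, l)
print(c1 * 2 * np.pi / np.sqrt(m - E0))      # close to 1, as in Lemma 4.1
print((f0 - E0) / (l**2 / (2 * (m - E0))))   # close to 1: f_0 - E_0 is of order l^2
print(mean_dalpha)                           # 1 up to rounding, i.e. alpha(2 pi) = 2 pi
```

The shift $f_0 - E_0 > 0$ is exactly the kind of "renormalization" that subtracting $f_0$ rather than $E_0$ accounts for.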
We now expand the integrands of the Abel–Jacobi map (16) in a Fourier series. This, and not a Taylor series, is the appropriate choice here, because it gives the correct "renormalized" constant term immediately, without contributions from higher order terms. So write
$$2\pi\left.\frac{p_j(\mu)}{iR_n(\mu)}\right|_{\mu = m_n - l_n\cos\psi} = \sum_{m\in\mathbb{Z}} a_m(j,n)\,e^{im\psi}. \tag{22}$$
Since the left-hand side is in $C^\infty(T)$ as a function of $\psi$, this expansion converges uniformly. Moreover,
$$a_m(j,n) = \int_0^{2\pi}\left.\frac{p_j(\mu)}{iR_n(\mu)}\right|_{\mu = m_n - l_n\cos\psi} e^{-im\psi}\,d\psi, \tag{23}$$
and, as a consequence, $a_0(j,n) = \delta_{jn}$ (by (15), since the integrand depends on $\psi$ only through $\cos\psi$).
Lemma 4.2.
$$|a_m(j,n)| \le \frac{(Cl/d)^{|m|}}{d^{-1}|m_j - m_n| + 1}.$$
Proof. This is trivially satisfied if $m = 0$, so we suppose that $m \ne 0$. Then, by Taylor's theorem and Lemma 4.1,
$$\frac{p_j}{R_n}(m_n - l_n\cos\psi) = \sum_{k=0}^{|m|-1} b_k(j,n)(-l_n\cos\psi)^k + \rho_{|m|}(\psi),$$
where the remainder satisfies the estimate
$$|\rho_{|m|}(\psi)| \le \frac{(Cl/d)^{|m|}}{d^{-1}|m_j - m_n| + 1}.$$
Since $\int_0^{2\pi}\cos^k\psi\,e^{-im\psi}\,d\psi = 0$ for $|m| > k$, the claim now follows from (23).
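The geometric decay in Lemma 4.2 can be observed directly on a model function. As a hypothetical stand-in for $p_j/R_n$, take $F(\mu) = 1/(D - \mu)$, which is analytic except at distance $D$ from the gap center $m_n = 0$. Along $\mu = -l\cos\psi$ its Fourier coefficients are known in closed form, $c_k = (-\rho)^{|k|}/(D^2 - l^2)^{1/2}$ with $\rho = (D - (D^2 - l^2)^{1/2})/l \approx l/(2D)$, and a discrete Fourier transform reproduces them. Here $\rho$ plays the role of $Cl/d$; the paper's $a_m$ corresponds to $2\pi$ times such a coefficient.

```python
import numpy as np

D, l, K = 1.0, 0.2, 64
psi = 2 * np.pi * np.arange(K) / K
# Hypothetical stand-in for p_j/R_n: F(mu) = 1/(D - mu), sampled on mu = -l cos psi
h = 1.0 / (D + l * np.cos(psi))
# c[k] approximates (1/2pi) int_0^{2pi} h(psi) e^{-ik psi} dpsi; index K - k holds c_{-k}.
# h is even in psi, so the coefficients are real.
c = (np.fft.fft(h) / K).real
rho = (D - np.sqrt(D**2 - l**2)) / l          # ~ l/(2D) when l/D is small
exact = (-rho) ** np.arange(K // 2) / np.sqrt(D**2 - l**2)
print(np.max(np.abs(c[: K // 2] - exact)))    # rounding-level agreement
print(abs(c[5] / c[4]), rho)                  # successive coefficients shrink by rho
```

Aliasing is harmless here because the neglected coefficients are of size $\rho^{K}$; for a general analytic $F$ the decay rate is set by the distance to the nearest singularity, which is what the Cauchy estimate (20) encodes.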
Using $a_0(j,n) = \delta_{jn}$, we can now plug (22) into (16) to write the Abel–Jacobi map in the form
$$\alpha_j = \psi_j + \sum_{m\in\mathbb{Z}}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\big(e^{im\psi_n} - 1\big). \tag{24}$$
Here and in the sequel, the prime at the sum sign indicates omission of the term with $m = 0$. To obtain (24), we have integrated (22) term by term, which is allowed because of the uniform convergence. We want to solve the system of equations (24) for $\psi_1, \ldots, \psi_g$. It is useful to separate the leading term, which, due to the smallness of the $a_m(j,n)$'s expressed by Lemma 4.2, is $\alpha_j$. So, introduce $\theta_j$ by writing $\psi_j = \alpha_j + \theta_j$; then (24) becomes
$$\theta_j + \sum_{m}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\big(e^{im\alpha_n}e^{im\theta_n} - 1\big) = 0, \tag{25}$$
and these equations must now be solved for the $\theta_j$'s. Actually, we will compute the $\theta_j$'s only up to an error of order $O((ld^{-1}\ln g)^{N+1})$. Note that by Lemma 4.2 and (25),
$$|\theta_j| \le 2\sum_{m}{}'\sum_{n=1}^{g}\frac{(Cl/d)^{|m|}}{d^{-1}|m_j - m_n| + 1} \lesssim ld^{-1}\ln g, \tag{26}$$
since $d^{-1}|m_j - m_n| \ge |j - n|$. Now we keep only those terms of (25) which are of order $\le N$ in the small parameter $ld^{-1}\ln g$, and we iterate these new equations $N$ times. The following lemma justifies this procedure; we do indeed get a good approximation to $\theta_j$.

Lemma 4.3. Define $\theta_j^{(0)} = 0$,
$$\theta_j^{(s+1)} = -\sum_{|m|\le N}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\big(e^{im\alpha_n} - 1\big) - \sum_{|m|\le N}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\,e^{im\alpha_n}\sum_{t=1}^{N-|m|}\frac{\big(im\theta_n^{(s)}\big)^t}{t!}, \tag{27}$$
$s = 0, 1, \ldots, N-1$. Then $\big|\theta_j - \theta_j^{(N)}\big| \le C(ld^{-1}\ln g)^{N+1}$.
Proof. We will prove by induction that
$$\big|\theta_j - \theta_j^{(s)}\big| \le C(ld^{-1}\ln g)^{s+1} \tag{28}$$
for $s = 0, 1, \ldots, N$. For $s = 0$, this is just (26). Now assume (28) holds for some $s \ge 0$. We claim that then
$$\theta_j^{(s+1)} = -\sum_{m\in\mathbb{Z}}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\Big(e^{im\alpha_n}e^{im\theta_n^{(s)}} - 1\Big) + O\big((ld^{-1}\ln g)^{N+1}\big). \tag{29}$$
Indeed, comparison with (27) shows that the error in (29), which we want to bound by $C(ld^{-1}\ln g)^{N+1}$, is equal to
$$\sum_{|m|\le N}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\,e^{im\alpha_n}\sum_{t=N+1-|m|}^{\infty}\frac{\big(im\theta_n^{(s)}\big)^t}{t!} + \sum_{|m|>N}\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\Big(e^{im\alpha_n}e^{im\theta_n^{(s)}} - 1\Big).$$
The induction hypothesis implies that $|\theta_n^{(s)}| \le |\theta_n^{(s)} - \theta_n| + |\theta_n| = O(ld^{-1}\ln g)$, so, by Lemma 4.2, the first contribution to the error is bounded by a constant times
$$\sum_{|m|\le N}{}'\sum_{n=1}^{g}\frac{(l/d)^{|m|}(ld^{-1}\ln g)^{N+1-|m|}}{d^{-1}|m_j - m_n| + 1} \lesssim \sum_{|m|\le N}{}'(l/d)^{N+1}(\ln g)^{N+2-|m|} \lesssim (ld^{-1}\ln g)^{N+1},$$
as desired. Similarly, the second contribution to the error term can be estimated by
$$\sum_{|m|>N}\sum_{n=1}^{g}\frac{(Cl/d)^{|m|}}{d^{-1}|m_j - m_n| + 1} \lesssim \sum_{|m|>N}(Cl/d)^{|m|}\ln g \lesssim (l/d)^{N+1}\ln g.$$
This concludes the proof of (29). Adding (29) and (25), we obtain
$$\theta_j^{(s+1)} = \theta_j - \sum_{m\in\mathbb{Z}}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\,e^{im\alpha_n}\Big(e^{im\theta_n^{(s)}} - e^{im\theta_n}\Big) + O\big((ld^{-1}\ln g)^{N+1}\big).$$
Lemma 4.2, together with
$$\Big|e^{im\theta_n^{(s)}} - e^{im\theta_n}\Big| \lesssim |m|(ld^{-1}\ln g)^{s+1},$$
which follows from the induction hypothesis, now yields the induction statement (28) for $s + 1$.
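Lemma 4.3 can be tried out on a toy model. The sketch below is not the paper's system: it replaces the $a_m(j,n)$ by hypothetical real, even coefficients $\epsilon^{|m|}/(|j-n|+1)$ that mimic the shape of Lemma 4.2, solves (25) to machine precision by fixed-point iteration, and compares the result with $N$ steps of the truncated recursion (27). The error should shrink roughly like $\epsilon^{N+1}$, mirroring the bound $C(ld^{-1}\ln g)^{N+1}$ (the toy model has no $\ln g$ factor). All function names are illustrative.

```python
import math
import numpy as np

def toy_coefficients(g, eps, M):
    """Hypothetical stand-in for a_m(j, n), m = 1..M: real, with a_{-m} = a_m,
    of size eps**|m| / (|j - n| + 1), mimicking the shape of Lemma 4.2."""
    idx = np.arange(g)
    base = 1.0 / (np.abs(idx[:, None] - idx[None, :]) + 1.0)
    return {m: eps**m * base for m in range(1, M + 1)}

def solve_exact(alpha, a, iters=200):
    """Solve (25) by fixed-point iteration. Pairing m with -m turns (25) into
    theta_j = -sum_{m>=1} sum_n (2 a_m(j,n)/m) sin(m (alpha_n + theta_n))."""
    theta = np.zeros_like(alpha)
    for _ in range(iters):
        psi = alpha + theta
        theta = -sum((2.0 / m) * (a[m] @ np.sin(m * psi)) for m in a)
    return theta

def iterate_truncated(alpha, a, N):
    """N steps of the truncated recursion (27), starting from theta^(0) = 0."""
    theta = np.zeros_like(alpha)
    for _ in range(N):
        new = np.zeros(alpha.shape, dtype=complex)
        for m in [k for k in range(-N, N + 1) if k != 0]:
            phase = np.exp(1j * m * alpha)
            taylor = sum((1j * m * theta) ** t / math.factorial(t)
                         for t in range(1, N - abs(m) + 1))
            new -= a[abs(m)] @ ((phase - 1) + phase * taylor) / (1j * m)
        theta = new.real   # the +m and -m terms are complex conjugates
    return theta

g, eps = 6, 0.01
rng = np.random.default_rng(1)
alpha = rng.uniform(0, 2 * np.pi, g)
a = toy_coefficients(g, eps, M=12)
exact = solve_exact(alpha, a)
errors = [np.max(np.abs(exact - iterate_truncated(alpha, a, N))) for N in (1, 2, 3)]
print(errors)   # shrinks by roughly a factor of order eps per step in N
```

The two truncations in (27), $|m| \le N$ and $t \le N - |m|$, drop exactly the terms of total order $\epsilon^{|m|+t}$ beyond $\epsilon^N$, which is why increasing $N$ by one buys one more power of the small parameter.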
5. The Feynman Rules

We now introduce, as announced above, a graphical representation of the terms obtained from the recursion (27). We have $\theta_j^{(0)} = 0$ and
$$\theta_j^{(1)} = -\sum_{|m|\le N}{}'\sum_{n=1}^{g}\frac{a_m(j,n)}{im}\big(e^{im\alpha_n} - 1\big).$$
This latter expression can be represented by the following graph:

[Graph: a vertex j joined by an edge labeled m to a circled (final) vertex n, drawn as j --m--> (n).]

Here is the recipe to recover $\theta_j^{(1)}$ from this graph: associate the factor $a_m(j,n)$ with the edge $m$ with vertices $j$ and $n$. The circled vertex $n$ contributes a factor $e^{im\alpha_n} - 1$, where $m$ is the parameter of the incoming edge. Finally, multiply by $i/m$ and sum over $m = \pm1, \ldots, \pm N$ and $n = 1, \ldots, g$.

These rules, suitably generalized, also work for larger values of $s$. At first sight, the formula for $\theta_j^{(2)}$ looks considerably more complicated than that for $\theta_j^{(1)}$ because now the second sum in (27) also contributes. However, it is not hard to convince oneself that $\theta_j^{(2)}$ can actually be computed by evaluating the following graphs.
[Graphs, summed (circled final vertices shown in parentheses): j --m--> (n)  +  j --m1--> n1 --m2--> (n2)  +  j --m1--> n1, with two edges n1 --m2--> (n2) and n1 --m3--> (n3)  +  ···]
More precisely, there are $N$ such graphs; they have the common property that every edge except the first one emanates from the second vertex. Again, edges contribute factors of the form $a_m(n, n')$, and for each vertex $n \ne j$ there is a factor $e^{im\alpha_n}$ ($e^{im\alpha_n} - 1$ if the vertex is marked by a circle), where $m$ is the incoming edge. Then one has to multiply by a factor that depends on the edge indices $m_i$ and also on the graph, and finally sum over all parameters except $j$. (Explicitly, this factor is
$$i(-1)^{E+1}\,\frac{m_1^{E-2}}{(E-1)!}\prod_{k=2}^{E} m_k^{-1},$$
where $E$ is the number of edges.) We are now ready to formulate the rules for computing $e^{i(\alpha_j+\theta_j)}$ from graphs of this type. The quantity $e^{i(\alpha_j+\theta_j)} = e^{i\psi_j}$ is of special interest here because the function $f$ from (17) depends on exactly this combination.
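The explicit factor quoted above for the graphs of $\theta_j^{(2)}$ can be checked against the recursion (27). In such a graph with $E \ge 2$ edges, the edge $m_1$ comes from the Taylor term $t = E - 1$ of the second sum of (27), which contributes $-(im_1)^{t-1}/t!$, and each of the $t$ inserted copies of $\theta^{(1)}$ contributes $i/m_k$; for $E = 1$ the factor is $-1/(im_1) = i/m_1$ from the first sum. The sketch compares this bookkeeping with the closed form for a few hypothetical edge labels; the function names are illustrative.

```python
import math

def factor_from_recursion(ms):
    """Factor of the graph whose edge m_1 leaves j and whose edges m_2..m_E all
    leave the second vertex, read off from (27): -(1/(i m_1)) (i m_1)^t / t!
    with t = E - 1, times i/m_k for each inserted copy of theta^(1)."""
    m1, rest = ms[0], ms[1:]
    t = len(rest)
    coeff = -(1j * m1) ** t / (math.factorial(t) * 1j * m1)
    for mk in rest:
        coeff *= 1j / mk
    return coeff

def factor_closed_form(ms):
    """i (-1)^(E+1) m_1^(E-2) / (E-1)! * prod_{k=2}^{E} 1/m_k, as quoted in the text."""
    E = len(ms)
    value = 1j * (-1) ** (E + 1) * float(ms[0]) ** (E - 2) / math.factorial(E - 1)
    for mk in ms[1:]:
        value /= mk
    return value

cases = [(1,), (-3,), (2, -1), (3, 1, -2), (-1, 2, 2, -3)]
print(all(abs(factor_from_recursion(ms) - factor_closed_form(ms)) < 1e-12 for ms in cases))  # True
```

Expanding $(\theta^{(1)})^t$ produces one term per ordered choice of $(m_2, n_2), \ldots, (m_E, n_E)$, so the factor is attached to each labeled graph separately, consistent with summing over all labelings.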
Feynman rules for $e^{i(\alpha_j+\theta_j)}$

1. Draw all directed trees with at most $N$ edges. By a "directed tree", we mean a connected graph with the property that there is precisely one vertex with only outgoing edges, while every other vertex has exactly one incoming edge. The vertices without outgoing edges are called final vertices (the trivial graph consisting of just one vertex is excluded in this definition); they are marked by circles. Formally, such a graph (with $E$ edges, say) may be represented by $E + 1$ symbols $V_1, \ldots, V_{E+1}$ ("vertices") and a collection of $E$ ordered pairs $(V_i, V_j)$ with $i \ne j$ ("edges"). Two graphs are equal if there is a bijection from one set of vertices to the other which preserves the edges. The figure below illustrates the case $N = 3$.

[Figure: the directed trees with at most three edges; final vertices are drawn as circles.]
2. For every graph, label the (unique) vertex without incoming edge $j$. Then, attach the indices $n_1, \ldots, n_E$ to the remaining vertices, and label the edges $m_1, \ldots, m_E$. It is of no significance how the indices $n_1, \ldots, n_E$ and $m_1, \ldots, m_E$ are assigned to the vertices and edges, respectively, but once a graph has been labeled, this particular labeling is fixed once and for all.
3. These labeled graphs are translated into formulae as follows. An edge labeled $m$ pointing from vertex $n$ to vertex $n'$ stands for a factor $a_m(n, n')$. A non-final vertex with index $n \ne j$ contributes $e^{im\alpha_n}$, where $m$ is the (unique) incoming edge. In case $n$ is a final vertex, the rule is similar except that the factor now is $e^{im\alpha_n} - 1$. The vertex $j$ always carries the factor $e^{i\alpha_j}$. Finally, the result is multiplied by a number $c_G(m_1, \ldots, m_E)$ which depends on the graph and the edge indices (and on $N$, but this is fixed throughout). In principle, $c_G$ can be computed, as the discussion below will show, but we do not need to know the precise values of the $c_G$'s here.
4. Sum over $m_i \ne 0$ with $\sum_{i=1}^{E}|m_i| \le N$, and over $n_i = 1, \ldots, g$. Finally, sum over all graphs.

Carrying out these instructions produces a (complicated) function of the $\alpha_n$'s. The claim is that, up to an error $O((ld^{-1}\ln g)^{N+1})$, this function coincides with $e^{i(\alpha_j+\theta_j)}$. We will now prove this assertion, which is the central result of this section. This proof, though not really difficult, is not easy to formulate; to get a feeling for the underlying principles, it is advisable to try things out by iterating (27) a few times and drawing some pictures. Our verbal description will thus be somewhat sketchy.

The strategy of the proof, however, is straightforward. First of all, we show that the approximations $\theta_j^{(s)}$ from (27) admit a representation by diagrams. We have demonstrated this already for $s = 1, 2$, and the general case is hardly more difficult. Then, we use this knowledge to formulate similar rules for $e^{i(\alpha_j+\theta_j)}$.
Finally, terms of order $(ld^{-1}\ln g)^{N+1}$ or higher can be dropped on the way.

So, our first claim is the following statement: the $\theta_j^{(s)}$ from Lemma 4.3 can be calculated by evaluating certain graphs according to rules similar to the ones given above. There are a number of differences: all graphs have exactly one edge emanating
Finite Gap Potentials and WKB Asymptotics for 1D-Schrödinger Operators
from $j$, there is no factor $e^{i\alpha_j}$ attached to $j$, the factors $c_G$ are different, the $m_i$'s are summed over the range $|m_i|\le N$, $m_i\ne0$, and there may be graphs with more than $N$ edges. In fact, we know this already for $s=1,2$, and the proof of the general case is by induction on $s$. By its definition, $\theta_j^{(s+1)}$ is obtained by inserting $\theta_n^{(s)}$ on the right-hand side of (27). By the induction hypothesis, $\theta_n^{(s)}$ is a sum of many terms, each of which corresponds to a graph with certain parameters. We now multiply out the right-hand side of (27) and only then take the various sums. To prove our claim, it suffices to make the following observations: graphs are multiplied together by attaching them to one another at the "initial" vertex $j$. Similarly, multiplying a graph by $a_m(j,n)e^{im\alpha_n}$, as in the second line of (27), amounts to attaching this graph to the single-edge graph in such a way that the final vertex of this single-edge graph and the initial vertex of the other graph combine into one new vertex. Also, we may restrict ourselves to graphs with at most $N$ edges and to parameters $m_i$ with $\sum_{i=1}^E|m_i|\le N$. Indeed, since each edge carries a factor $a_m(n,n')$, Lemma 4.2 implies that the omitted contributions are $O((ld^{-1}\ln g)^{N+1})$. Here, the logarithmic factors come from the denominators of the bound of Lemma 4.2, when the vertex indices are summed over. Note also in this context that summing over the $m_i$'s is never dangerous because the restrictions $|m_i|\le N$ imply that there is an a priori bound (depending on $N$ only) on the number of summands. Next, we have that
$$e^{i(\alpha_j+\theta_j)}=e^{i\alpha_j}\sum_{t=0}^{N}\frac{\big(i\tilde\theta_j^{(N)}\big)^t}{t!}+O\big((ld^{-1}\ln g)^{N+1}\big).$$
The tilde on the right-hand side indicates the omission of higher order terms, as discussed in the preceding paragraph. Again, the task is to multiply this out. The $\tilde\theta_j^{(N)}$ have graphical representations, as we have just seen, and the above remarks about multiplying together different graphs are still relevant here. The asserted rules follow from this. The additional factor $e^{i\alpha_j}$ has simply been attached to the vertex $j$. Note also that the same graph may arise many times when the process of multiplying out is performed, but then we can simply combine these contributions into a single one. This will only affect the numbers $c_G(m_1,\dots,m_E)$.

6. Bounds Along a Random Trajectory

This last part of the proof of Theorem 3.1 deals with the problem of bounding $f(\alpha)-f_0$ along trajectories $\alpha=\alpha_0+\nu x$, given the information obtained in the preceding section. First of all, recall from (17) that
$$f(\alpha)=E_0+2\sum_{j=1}^g l_j\cos\psi_j=E_0+2\,\mathrm{Re}\sum_{j=1}^g l_je^{i(\alpha_j+\theta_j)}. \tag{30}$$
We will first convince ourselves that $f$ is of the form
$$f(\alpha)=f_0+{\sum_{|m|_1\le N+1}}'\,b(m)\sin(m\cdot\alpha+\varphi_m)+O\big(lg(ld^{-1}\ln g)^{N+1}\big). \tag{31}$$
T. Kriecherbauer, C. Remling
We use a slightly different notation in this section in that now $m=(m_1,\dots,m_g)$ with $m_i\in\mathbb Z$. Also, $|m|_1=\sum|m_i|$ and $m\cdot\alpha=\sum m_i\alpha_i$; finally, the prime at the sum sign now means omission of the summand with $m=(0,\dots,0)$. To prove (31), use (30) and think of the exponentials $e^{i(\alpha_j+\theta_j)}$ as being evaluated according to the Feynman rules. Then $\alpha$-dependent factors come in only through the vertices of the graphs; more precisely, vertices contribute factors of the form $e^{im\alpha_n}$ (or $e^{im\alpha_n}-1$ if the vertex is final), where $m$ is the index of the incoming edge. The vertex $j$ always contributes a factor $e^{i\alpha_j}$, so each graph yields a sum of $\alpha$-independent factors times exponentials of the form $\exp\big(i(\alpha_j+\sum m_i\alpha_{n_i})\big)$. Since rule 4 imposes the restriction $\sum|m_i|\le N$, a rearrangement of terms gives (31), as asserted. Clearly, this argument has not only established (31), but it has also indicated how the coefficients $b(m)$ can be computed, at least in principle, using the graphs introduced in Sect. 5. This will become very important in a moment. (Just proving (31) is easy and does not require the Feynman rules.)

To prove Theorem 3.1, we need to estimate the second term on the right-hand side of (31). Call this sum $f_N(\alpha)$. The main step will be to prove the following estimate. Given Lemma 6.1, we will then be able to apply the methods of [23].

Lemma 6.1. There is a constant $C$ so that for every $\lambda\in\mathbb R$,
$$\int_{\mathbb T^g}e^{\lambda f_N(\alpha)}\,\frac{d\alpha}{(2\pi)^g}\le e^{C\lambda^2l^2g}.$$

Remark. Our “definition” of $f_N$ is not quite complete, since (31) does not uniquely determine $f_N$, given $f$. Lemma 6.1 really asserts that for some fixed choice of $f_N$, consistent with (31), the stated estimate holds. More precisely, $f_N$ is obtained by going from (30) to (31) in exactly the way described above. The following proof will also clarify this.

Proof. We will further decompose $f_N$ and then analyze the individual terms separately. To this end, we first introduce equivalence classes of indices $m$.
Namely, we say that $m$ and $m'$ are equivalent if they have the same non-zero entries, taking the order into account. To put this into more formal language, write
$$m=(0,\dots,0,k_1,0,\dots,0,k_2,0,\dots,0,k_r,0,\dots,0),$$
with $r\in\mathbb N$ and $k_i\ne0$ for all $i=1,\dots,r$. Then $m$ and $m'$ are equivalent precisely if $r=r'$ and $k_i=k_i'$ for all $i$. This definition may not look very useful at first sight, but recall that $N$ (which bounds the $\ell^1$-norm of $m$) is fixed while $g$ (which is the length of the vectors $m$) is typically large, so the vectors $m$ indeed have only relatively few non-zero entries. The number of equivalence classes in the set of indices $\{m\in\mathbb Z^g:|m|_1\le N+1\}$ only depends on $N$, not on $g$. (Note, however, that the cardinality of the equivalence classes themselves does go to infinity as $g$ increases.) Now fix an equivalence class $(m_0)$ and consider
$$\sum_{m\in(m_0)}b(m)\sin(m\cdot\alpha+\varphi_m).$$
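The claim that the number of equivalence classes depends only on $N$, while each class with $r$ non-zero entries has $\binom{g}{r}$ elements, can be checked by brute force for small $g$. The following Python sketch (function names are ours, for illustration only) lists the patterns $(k_1,\dots,k_r)$ directly and compares them with the classes found among all $m\in\mathbb Z^g$ with $1\le|m|_1\le N+1$.

```python
from itertools import product
from math import comb


def class_patterns(N):
    """All patterns (k_1, ..., k_r) with r >= 1, k_i != 0 and sum |k_i| <= N + 1.

    Each pattern labels one equivalence class of nonzero indices m with |m|_1 <= N + 1.
    """
    patterns = []

    def extend(prefix, budget):
        if prefix:
            patterns.append(tuple(prefix))
        for size in range(1, budget + 1):
            for k in (size, -size):
                extend(prefix + [k], budget - size)

    extend([], N + 1)
    return patterns


def brute_force_classes(g, N):
    """Group all m in Z^g with 1 <= |m|_1 <= N + 1 by their ordered nonzero entries."""
    classes = {}
    for m in product(range(-(N + 1), N + 2), repeat=g):
        if 1 <= sum(abs(x) for x in m) <= N + 1:
            key = tuple(x for x in m if x != 0)
            classes[key] = classes.get(key, 0) + 1
    return classes


if __name__ == "__main__":
    N = 2
    print(len(class_patterns(N)))  # 26 classes for N = 2, whatever g >= N + 1 is
    for g in (4, 5):
        classes = brute_force_classes(g, N)
        # same number of classes for every g; each class has comb(g, r) members
        print(g, len(classes), all(size == comb(g, len(k)) for k, size in classes.items()))
```

The class count stays fixed as $g$ grows, while the class sizes $\binom{g}{r}$ grow polynomially in $g$; this is why the proof treats each class as a single sum over positions $n_1<\dots<n_r$.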
Denote the positions of the non-zero entries $k_i$ of $m\in(m_0)$ by $n_i$. Then, if we vary the $n_i$'s (respecting the obvious restrictions $1\le n_1<n_2<\dots<n_r\le g$), but keep $r$ and
the $k_i$'s fixed, we get exactly all elements of the equivalence class under consideration. Thus the above sum is equal to
$$\sum_{1\le n_1<\dots<n_r\le g}b(m)\sin\big(k_1\alpha_{n_1}+\dots+k_r\alpha_{n_r}+\varphi_m\big). \tag{32}$$
In this formula, $m$ is defined by $m_{n_i}=k_i$ and $m_i=0$ otherwise. Using the addition laws for sine and cosine $r-1$ times, we can write (32) as a sum of $2^{r-1}$ terms of the form
$$\sum_{1\le n_1<\dots<n_r\le g}b(m)\sin(k_1\alpha_{n_1}+\gamma_1)\cdots\sin(k_r\alpha_{n_r}+\gamma_r). \tag{33}$$
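The reduction of (32) to $2^{r-1}$ sums of the form (33) uses only the identity $\sin(A+B)=\sin A\,\sin(B+\pi/2)+\sin(A+\pi/2)\sin B$, applied $r-1$ times. The minimal Python sketch below, with $x_i$ standing in for $k_i\alpha_{n_i}$ (our notation), generates one admissible family of phase vectors $(\gamma_1,\dots,\gamma_r)$ and checks the identity numerically; many other choices of the $\gamma_i$'s would do equally well.

```python
import math
import random

HALF_PI = math.pi / 2


def phase_vectors(phi, r):
    """Phase tuples (gamma_1, ..., gamma_r) such that
    sin(x_1 + ... + x_r + phi) = sum over tuples of prod_i sin(x_i + gamma_i).
    """
    if r == 1:
        return [(phi,)]
    # sin(P + x_r) = sin(P) sin(x_r + pi/2) + sin(P + pi/2) sin(x_r),
    # where P = x_1 + ... + x_{r-1} + phi is expanded recursively.
    first = [gammas + (HALF_PI,) for gammas in phase_vectors(phi, r - 1)]
    second = [gammas + (0.0,) for gammas in phase_vectors(phi + HALF_PI, r - 1)]
    return first + second


def expanded_sine(xs, phi):
    """Evaluate the expanded right-hand side: a sum of 2**(r-1) products of sines."""
    return sum(
        math.prod(math.sin(x + g) for x, g in zip(xs, gammas))
        for gammas in phase_vectors(phi, len(xs))
    )


if __name__ == "__main__":
    xs = [random.uniform(-3, 3) for _ in range(4)]
    phi = 0.7
    print(len(phase_vectors(phi, len(xs))))                       # 2**(r-1) = 8 products
    print(abs(math.sin(sum(xs) + phi) - expanded_sine(xs, phi)))  # rounding error only
```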
Here, the dependence of the phases $\gamma_i$ on the index $m$ has not been made explicit because the precise values of the $\gamma_i$'s will not matter anyway. (On top of that, we of course have a lot of freedom in the choice of the $\gamma_i$'s, given $\varphi_m$ from (32).) Since $r\le N+1$, we still have a universal bound (depending on $N$ only) on the number of different sums of the form (33) that arise in the decomposition of $f_N$ just performed. Fix such a sum and call it $F=F(\alpha_1,\dots,\alpha_g)$ for easier reference. It suffices to establish the lemma with $F$ in place of $f_N$. Indeed, if this is proved, then, since $f_N=\sum F$ with an a priori bound on the number of summands $F$, the claimed estimate follows from Hölder's inequality. Now, to bound $\int_{\mathbb T^g}e^{\lambda F}\,d\alpha$, we first do the integration with respect to the last variable $\alpha_g$. The sum defining $F$ contains many terms that do not depend on $\alpha_g$. More precisely, we have that
$$F(\alpha_1,\dots,\alpha_g)=F_1(\alpha_1,\dots,\alpha_{g-1})\sin(k_r\alpha_g+\gamma_r)+F_2(\alpha_1,\dots,\alpha_{g-1}),$$
with
$$F_1=\sum_{\substack{1\le n_1<\dots<n_{r-1}\le g-1\\ n_r=g}}b(m)\sin(k_1\alpha_{n_1}+\gamma_1)\cdots\sin(k_{r-1}\alpha_{n_{r-1}}+\gamma_{r-1}),$$
$$F_2=\sum_{1\le n_1<\dots<n_r\le g-1}b(m)\sin(k_1\alpha_{n_1}+\gamma_1)\cdots\sin(k_r\alpha_{n_r}+\gamma_r).$$
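The one-variable estimate used in the next step, $\int_0^{2\pi}e^{c\sin\beta}\,d\beta/2\pi\le e^{c^2/4}$, is easy to check numerically. The left-hand side is the modified Bessel function $I_0(c)$, whose power series is the middle expression of the computation below. The Python sketch (names ours) compares a periodic trapezoid-rule quadrature with that series and with the Gaussian bound; with $c=\lambda F_1$ and $|F_1|\le\sum|b(m)|$, this is the mechanism behind (34).

```python
import math


def mean_exp_sin(c, n_points=256):
    """(1/2pi) * integral over [0, 2pi] of exp(c sin beta), periodic trapezoid rule."""
    h = 2 * math.pi / n_points
    return sum(math.exp(c * math.sin(j * h)) for j in range(n_points)) / n_points


def bessel_series(c, n_terms=50):
    """sum_n (c/2)^(2n) / (2n)! * binom(2n, n), which equals sum_n (c/2)^(2n) / (n!)^2."""
    return sum(
        (c / 2) ** (2 * n) / math.factorial(2 * n) * math.comb(2 * n, n)
        for n in range(n_terms)
    )


if __name__ == "__main__":
    for c in (0.0, 0.5, 2.0, 5.0):
        avg = mean_exp_sin(c)
        print(c, avg, bessel_series(c), math.exp(c * c / 4), avg <= math.exp(c * c / 4))
```

The trapezoid rule is used because it converges exponentially fast for smooth periodic integrands, so a few hundred nodes already agree with the series to rounding accuracy.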
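So, we now have to evaluate $(2\pi)^{-1}\int_0^{2\pi}\exp[\lambda F_1\sin(k_r\alpha+\gamma_r)]\,d\alpha$. The substitution $\beta=k_r\alpha+\gamma_r$ together with the computation
$$\int_0^{2\pi}e^{c\sin\beta}\,\frac{d\beta}{2\pi}=\sum_{n=0}^\infty\frac{c^n}{n!}\int_0^{2\pi}\sin^n\beta\,\frac{d\beta}{2\pi}=\sum_{n=0}^\infty\frac{(c/2)^{2n}}{(2n)!}\binom{2n}{n}\le e^{c^2/4}$$
shows that
$$\int_0^{2\pi}e^{\lambda F}\,\frac{d\alpha_g}{2\pi}=e^{\lambda F_2}\int_0^{2\pi}e^{\lambda F_1\sin(k_r\alpha_g+\gamma_r)}\,\frac{d\alpha_g}{2\pi}\le e^{\lambda F_2}\exp\left(\frac{\lambda^2}{4}\Big(\sum|b(m)|\Big)^2\right), \tag{34}$$
where the sum is over $1\le n_1<\dots<n_{r-1}\le g-1$, and $n_r=g$. We now estimate this sum $\sum|b(m)|$. By the discussion following (31), the coefficients $b(m)$ can be obtained with the help of the Feynman rules of the preceding section by collecting those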
contributions which depend on $\alpha$ in exactly the way described by $m$. (More precisely, collect everything that comes with a factor $e^{im\cdot\alpha}$, multiply by the corresponding $l_j$'s, and then take the real part and read off $b(m)$.) By the triangle inequality, we can estimate $|b(m)|$ by bounding the individual contributions associated with certain fixed graphs with fixed labelings and then taking the various sums at the very end. So, fix a graph that contributes to some $b(m)$ occurring in the sum $\sum|b(m)|$. To avoid confusion with the numbers $m_i$, $n_i$ introduced in this section, the parameters labeling the edges and vertices of the graph will now be called $m_i'$ and $n_i'$, respectively. Since the factors $e^{im'\alpha_{n'}}$ are attached to the vertices of the graph and since the $m$'s under consideration have $r$ non-zero entries, the graph fixed above must have at least $r$ vertices and hence $E\ge r-1$ edges. Moreover, since $n_r$ is set equal to $g$ in the sum $\sum|b(m)|$ we are trying to estimate, at least one vertex of the graph must have its parameter equal to $g$. In other words, $j=g$ or $n_i'=g$ for some $i$. (There is the additional restriction that $\sum|m_i'|\ge\sum_{i=1}^r|k_i|-1$, but this will not be used.) Armed with these observations, we are now ready to do the estimates. By the Feynman rules, the contribution coming from the fixed graph admits a bound of the form
$$Cl\,\big|a_{m_1'}(\dots)\cdots a_{m_E'}(\dots)\big| \tag{35}$$
if the parameters $j,n_1',\dots,n_E',m_1',\dots,m_E'$ are all kept fixed. The factor $l$ allows for the fact that the $l_j$'s from (30) have been absorbed by the $b(m)$'s when passing to (31). The arguments of the factors $a_{m_i'}$ depend on the particular form of the graph and also on the way the graph was labeled. Note that the (unknown) factors $c_G$ from rule 3 can be absorbed by $C$ because there are only finitely many different values of $c_G(m_1',\dots,m_E')$ and we can thus simply estimate these numbers by their maximum.
We now use Lemma 4.2 to estimate (35), and we want to sum these bounds over those values of the parameters $j,n_i',m_i'$ which satisfy the restrictions obtained above. (In contrast to the rules from Sect. 5, there is now a sum over $j=1,\dots,g$ also; this is simply the sum from (30).) In particular, $j$ or one of the $n_i'$'s is held fixed (equal to $g$). This implies that we can sum over the indices of the remaining vertices in the following way: First of all, delete the vertex corresponding to the fixed index. From the remaining graph, pick a vertex which is connected to just one edge and perform the corresponding sum $n_i'=1,\dots,g$. By the choice of the vertex, $n_i'$ appears in precisely one of the factors $a_{m_i'}$ as the argument. The denominator of the bound of Lemma 4.2 thus yields a factor $\ln g$ when summed in the way just described. As a reminder that the corresponding sum has been performed, delete the chosen vertex together with the edge connected to it. Then repeat the whole procedure with the modified graph to determine the next index to be summed over. Again, just one $a_{m_i'}$ is involved in the sum, and thus another factor $\ln g$ results. Since at each stage there are equally many vertices and edges, this process can only stop after the whole graph has disappeared. There are only $E+1\le N+1$ possible choices for the vertex whose parameter is set equal to $g$, so the net result is that after summation over the vertices, (35) can be estimated by
$$Cl\,(l/d)^{|m_1'|+\dots+|m_E'|}(\ln g)^E. \tag{36}$$
Indeed, the numerators from Lemma 4.2 contribute the factor $(l/d)^{|m_1'|+\dots+|m_E'|}$, and by the argument just given, each of the $E$ sums over the vertices accounts for a factor $\ln g$. We can further estimate (36). By rule 4, $m_i'\ne0$ for all $i=1,\dots,E$, hence $\sum|m_i'|\ge E$, so that (36) is at most $Cl\,(ld^{-1}\ln g)^E$ as long as $l\le d$. Since $E\ge0$ and $ld^{-1}\ln g\le1$ in the regime considered here, we can thus bound (36) simply by $Cl$.
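Both summations in this argument are purely combinatorial and can be sketched in code. Below, `pruning_order` (a hypothetical helper, not taken from the paper) carries out the vertex-by-vertex summation just described on a labeled tree with $E+1$ vertices: after the fixed vertex is deleted, each summed vertex meets exactly one remaining edge, so each of the $E$ sums involves a single factor $a_{m_i'}$ and contributes one $\ln g$. The helper `count_edge_labelings` counts the admissible edge labels $m_i'\in\{\pm1,\dots,\pm N\}$ with $\sum|m_i'|\le N$; this number does not involve $g$ and is at most $(2N)^N$, the factor used in the next paragraph.

```python
from itertools import product


def pruning_order(num_vertices, edges, fixed):
    """Order of summation over the vertex indices of a tree, following the text.

    edges: list of (u, v) pairs forming a tree on vertices 0..num_vertices-1.
    fixed: the vertex whose index is held equal to g; it is deleted first.
    Returns (vertex, edge_index) pairs: each summed vertex together with the
    single remaining edge (factor a_m) in which its index appears.
    """
    remaining_edges = set(range(len(edges)))
    remaining_vertices = set(range(num_vertices)) - {fixed}
    order = []
    while remaining_vertices:
        for v in sorted(remaining_vertices):
            incident = [e for e in remaining_edges if v in edges[e]]
            if len(incident) == 1:
                break
        else:
            raise ValueError("no vertex with exactly one edge: input is not a tree")
        order.append((v, incident[0]))
        remaining_vertices.remove(v)         # the index of v has been summed ...
        remaining_edges.remove(incident[0])  # ... together with its one factor a_m
    return order


def count_edge_labelings(E, N):
    """Number of (m_1', ..., m_E') with m_i' in {+-1, ..., +-N} and sum |m_i'| <= N."""
    values = [k for k in range(-N, N + 1) if k != 0]
    return sum(1 for ms in product(values, repeat=E) if sum(abs(m) for m in ms) <= N)


if __name__ == "__main__":
    # path 0-1-2-3 with the index of vertex 1 fixed to g: three sums, one ln g each
    print(pruning_order(4, [(0, 1), (1, 2), (2, 3)], fixed=1))
    print([count_edge_labelings(E, 3) for E in range(5)])
```

The procedure never gets stuck: after the fixed vertex is removed, each remaining component is a tree plus exactly one edge to the fixed vertex, so it always has a vertex of degree one.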
Now the rest is easy. First of all, each of the at most $N$ parameters $m_i'$ takes values in $\{\pm1,\dots,\pm N\}$, so summing over the $m_i'$'s just increases the constant $C$ (by at most a factor $(2N)^N$). Then, the total number of graphs also depends on $N$ only, so we can finally sum over those graphs which could in principle contribute to $\sum|b(m)|$, and we still have the bound $Cl$ (again, with a possibly larger constant $C$). Returning to (34) now, we have thus proved that
$$\int_{\mathbb T^g}e^{\lambda F(\alpha)}\,\frac{d\alpha}{(2\pi)^g}\le e^{C\lambda^2l^2}\int_{\mathbb T^{g-1}}e^{\lambda F_2(\alpha')}\,\frac{d\alpha'}{(2\pi)^{g-1}},$$
where $\alpha=(\alpha_1,\dots,\alpha_g)$ and $\alpha'=(\alpha_1,\dots,\alpha_{g-1})$. The integral on the right-hand side has the same structure as the original one, except that $g$ has been replaced by $g-1$. We can therefore repeat the whole argument; the second step would be to carry out the integration with respect to $\alpha_{g-1}$ in the same way as discussed above. We need at most $g$ steps to do the integral completely, and at each step, we get a factor $\exp(C\lambda^2l^2)$. As a result, we obtain the estimate $(2\pi)^{-g}\int_{\mathbb T^g}e^{\lambda F}\,d\alpha\le e^{C\lambda^2l^2g}$, and, as already explained, the lemma follows.

Remarks. 1. The key point of this proof was the observation that the summation over the vertices only gives logarithmic factors $(\ln g)^E$, but no powers of $g$. Note that to establish this, in turn, we only used some structural information contained in the Feynman rules and Lemma 4.2; the precise form of the underlying iteration was largely irrelevant. 2. The estimate $Cl$ on (36) is of course crude unless we are in the extreme case $E=0$ (which, it turns out now, gives the dominant contributions to $f_N$). Indeed, to prove the extension of Theorem 3.1 mentioned in the beginning of the Introduction and in Sect. 3, one has to keep (36) as it stands.

We are now ready to finish the proof of Theorem 3.1. This final part of the argument uses methods developed in [23]. We will follow the presentation given in [12, Chapter 6]. We want to bound $f_N$ along a trajectory $\phi_x\alpha=\alpha+\nu x$. So, let
$$M(\alpha)=\max_{0\le x\le L}|f_N(\alpha+\nu x)|.$$
To run the argument from [12, 23], we need a bound on $(d/dx)f_N(\phi_x\alpha)$. Write $f_N$ as a sum of terms of the form (33), as in the proof of Lemma 6.1, and fix again one of the summands $F$. Then, by the argument presented in that proof, the sum over the corresponding coefficients $b(m)$ satisfies
$$\sum_{1\le n_1<\dots<n_r\le g}|b(m)|\le Cgl.$$
Indeed, to show this, it just remains to sum the bound $Cl$ also over the vertex index that had been fixed, and this gives an additional factor $g$. Since
$$\frac{d}{dx}F(\alpha+\nu x)=\sum b(m)\sum_{j=1}^r\Pi_j\,k_j\nu_{n_j}\cos\big(k_j(\alpha_{n_j}+\nu_{n_j}x)+\gamma_j\big),$$
where $\Pi_j$ is shorthand for the product of sines with the $j$th factor omitted, it now follows that
$$\left|\frac{d}{dx}F(\alpha+\nu x)\right|\le C\sum|b(m)|\le Cgl.$$
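Here we also used the fact that the $\nu_n$'s are bounded; this follows from Lemma 4.1 since $\nu_n=4\pi c_n$. Summing the above bounds, we see that also $|(d/dx)f_N(\phi_x\alpha)|\le Cgl$. The maximum $M(\alpha)$ is attained at some point $x_0$, and thus by the mean value theorem, there is a constant $C_0$ (as usual, depending only on $C_1$, $C_2$, and $N$) together with an interval $I=I(\alpha)\subset[0,L]$ of length $|I|=\min\{C_0M(\alpha)/(gl),L\}$, so that $|f_N(\alpha+\nu x)|\ge M(\alpha)/2$ for all $x\in I$. We may assume that $M(\alpha)\ge C_0^{-1}l$ for all $\alpha\in\mathbb T^g$, since in the opposite case we have for free a better bound than the one we are trying to prove. We then have that $g|I|\ge1$, and it follows that
$$\begin{aligned}
\int_{\mathbb T^g}e^{\lambda M(\alpha)/2}\,\frac{d\alpha}{(2\pi)^g}
&\le g\int_{\mathbb T^g}|I(\alpha)|\,e^{\lambda M(\alpha)/2}\,\frac{d\alpha}{(2\pi)^g}\\
&\le g\int_{\mathbb T^g}\frac{d\alpha}{(2\pi)^g}\int_{I(\alpha)}dx\,\Big(e^{\lambda f_N(\alpha+\nu x)}+e^{-\lambda f_N(\alpha+\nu x)}\Big)\\
&\le g\int_{\mathbb T^g}\frac{d\alpha}{(2\pi)^g}\int_0^L dx\,\Big(e^{\lambda f_N(\alpha+\nu x)}+e^{-\lambda f_N(\alpha+\nu x)}\Big)\\
&=gL\int_{\mathbb T^g}\Big(e^{\lambda f_N(\alpha)}+e^{-\lambda f_N(\alpha)}\Big)\frac{d\alpha}{(2\pi)^g}\\
&\le2gLe^{C\lambda^2l^2g}.
\end{aligned}$$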
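The passage from this exponential-moment bound to the bound on $M(\alpha)$ (the Chebyshev step and the choice of $\lambda$ carried out in the following paragraph) is elementary arithmetic. The Python sketch below, with function names and numerical values chosen by us purely for illustration, evaluates the threshold $2C\lambda gl^2+(2/\lambda)\ln(4gL)$, checks that $\lambda=(\ln(4gL)/(Cgl^2))^{1/2}$ minimizes it, and compares the minimum with $4C^{1/2}g^{1/2}l(\ln(4gL))^{1/2}$.

```python
import math


def chebyshev_threshold(lam, C, g, l, L):
    """Level 2*C*lam*g*l^2 + (2/lam)*ln(4gL) below which M(alpha) stays with probability >= 1/2."""
    return 2 * C * lam * g * l ** 2 + 2 * math.log(4 * g * L) / lam


def optimal_lambda(C, g, l, L):
    """Minimizer of the threshold: p*lam + q/lam is smallest at lam = sqrt(q/p)."""
    return math.sqrt(math.log(4 * g * L) / (C * g * l ** 2))


if __name__ == "__main__":
    C, g, l, L = 2.0, 1000, 1e-3, 1e4  # illustrative values only
    lam = optimal_lambda(C, g, l, L)
    best = chebyshev_threshold(lam, C, g, l, L)
    closed_form = 4 * math.sqrt(C * g) * l * math.sqrt(math.log(4 * g * L))
    print(best, closed_form)
    print(chebyshev_threshold(0.5 * lam, C, g, l, L) > best,
          chebyshev_threshold(2 * lam, C, g, l, L) > best)
```

At the optimum the two terms of the threshold are equal, each $2C^{1/2}g^{1/2}l(\ln(4gL))^{1/2}$, which is where the factor $4$ in the final bound comes from.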
The last step is by Lemma 6.1. For $\lambda>0$, we can write this inequality in the form
$$\mathbb E\left[\exp\left(\frac{\lambda}{2}\Big(M-2C\lambda l^2g-\frac{2}{\lambda}\ln(4gL)\Big)\right)\right]\le\frac12,$$
with $\mathbb E(\cdots)$ denoting the expectation taken with respect to the probability measure $(2\pi)^{-g}d\alpha$ on the torus $\mathbb T^g$. By a Chebyshev estimate, the inequality
$$M(\alpha)\le2C\lambda gl^2+\frac{2}{\lambda}\ln(4gL)$$
holds with probability $\ge1/2$. The parameter $\lambda>0$ is still at our disposal, the optimal choice being
$$\lambda=\left(\frac{\ln(4gL)}{Cgl^2}\right)^{1/2}.$$
Then the bound becomes $M(\alpha)\le4C^{1/2}g^{1/2}l(\ln(4gL))^{1/2}$, and this holds for $\alpha$'s from a set of $(2\pi)^{-g}d\alpha$ measure at least $1/2$. The proof of Theorem 3.1 is complete.

Moreover, by re-examining the reasoning of this section, we see that we can also prove the more general result already mentioned. We can obtain a whole series of pointwise approximations to $V_\alpha(x)$. More specifically, the difference between $f(\phi_x\alpha)$ and those terms of ${\sum}'b(m)\sin(m\cdot\phi_x\alpha+\varphi_m)$ for which $|m|_1\le M$ is $O\big(g^{1/2}l(ld^{-1}\ln g)^M(\ln(gL))^{1/2}+gl(ld^{-1}\ln g)^{N+1}\big)$ for suitable $\alpha$. In other words, the Fourier series of $f$ (viewed as a function on the Jacobi variety), up to some order, gives a very good pointwise approximation to $V_\alpha(x)$ with
positive probability (in fact, with as large probability as we please) if $\alpha$ is chosen at random. With $M=0$, Theorem 3.1 is recovered. We do not need these more refined statements to prove Theorem 1.1.

7. Proof of Theorem 1.1

The basic idea of the construction of [21] was to glue together suitably chosen periodic potentials. In this paper, we will instead use finite gap potentials with gaps of equal length. Roughly speaking, the construction runs as follows. We will choose the first finite gap potential $V_1$ so that all gaps lie in, let us say, $[1,2]$. $V_2$ will have much smaller gaps; also, these new gaps will be contained in the gaps of $V_1$. If we continue in this way, the intersection over all $n$ of the unions of the gaps of $V_n$ will be a Cantor type set whose dimension is easily controlled, provided there is an appropriate scaling. Moreover, the set $S$ defined in (4) will contain this Cantor type set because if the energy $E$ is in a gap of $V_n$, the solutions to the Schrödinger equation are on average exponentially increasing or decreasing and hence do not satisfy (3). Of course, we must also take care of the required decay of $V(x)$, that is, $V_n$ must be sufficiently small for large $n$. The bounds on $V_n$ will be established with the aid of Theorem 3.1. We start by investigating the solutions of (1) for finite gap potentials $V$ and energies $E$ which lie in some gap of $V$.

Lemma 7.1. Let $V(x)$ be a finite gap potential whose parameters satisfy the assumptions of Theorem 3.1. Then there exist an $\epsilon=\epsilon(C_1,C_2,N)>0$ and a constant $C=C(C_1,C_2,N)$ such that for $ld^{-1}\ln g<\epsilon$, the following holds. If $|E-m_n|\le l_n/2$ for some $n\in\{1,\dots,g\}$, then there is a solution $y(x)$ of the Schrödinger equation (1) with $y(x_0)=1$ for some $x_0\in[0,1]$ and
$$\int_{x_0}^\infty|y(x)|^2\,dx\le C/l_n.$$
Remark. This statement cannot, in general, hold with a fixed, prescribed $x_0$ because the decaying solution has zeros. Roughly speaking, the lemma says that there is a solution which has some decay over intervals of length $l_n^{-1}$.

Proof. Our starting point is the following formula (see, for example, [4, Chapter 9]):
$$\int_x^\infty|f(t,z)|^2\,dt=\frac{\mathrm{Im}\,m_x(z)}{\mathrm{Im}\,z}. \tag{37}$$
Here, $m_x$ is the $m$-function of $-d^2/dt^2+V(t)$ on $[x,\infty)$ with Dirichlet boundary conditions at $t=x$. More specifically, let $u,v$ be the solutions of $-y''+Vy=zy$ with the initial values $u(x,z)=v'(x,z)=1$, $u'(x,z)=v(x,z)=0$ and write $f(t,z)=u(t,z)+m_x(z)v(t,z)$; then $m_x(z)$ is defined by requiring that $f\in L^2(x,\infty)$. The Green function of the whole line problem is related to the $m$-function by $G(x,x;z)=(m_x^-(z)-m_x(z))^{-1}$, where $m_x^-$ is the $m$-function of the operator on $L^2(-\infty,x)$ (see again [4]). Since the imaginary parts of $m_x^-$ and $m_x$ have opposite
signs, the right-hand side of (37) is less than $-\mathrm{Im}\big(G(x,x;z)^{-1}\big)/\mathrm{Im}\,z$. So, if we use (10) and abbreviate $\prod_j(\mu_j(x)-z)=U_x(z)$, then (37) becomes
$$\int_x^\infty|f(t,z)|^2\,dt<-\frac{2}{\mathrm{Im}\,z}\,\mathrm{Im}\frac{R(z)}{U_x(z)}. \tag{38}$$
Here, the sign of $R(z)$ is determined by the fact that $\mathrm{Im}\,G(x,x;z)$ has the same sign as $\mathrm{Im}\,z$ for $\mathrm{Im}\,z\ne0$ (compare the discussion following (10)). Now let $E$ be as in the hypothesis, and put $z=E+i\delta$ with $\delta>0$. By slightly changing $x$ if necessary, we may assume that $\mu_n(x)\ne E$. Then $E$ is not in the spectrum of the operator on $L^2(x,\infty)$ with Dirichlet boundary conditions. This is so simply because $\mu_n(x)$ is the only eigenvalue in the gap $(E_{2n-1},E_{2n})$. Thus $m_x(z)$ and $R(z)/U_x(z)$ are holomorphic in a neighborhood of $z=E$. For the latter function, this may of course be seen by direct inspection. Moreover, $R(E)/U_x(E)$ is real. Therefore, the right-hand side of (38) converges to $-2(R/U_x)'(E)$ as $\delta\to0+$, while the function $f(t,E+i\delta)$ tends to $f(t,E)=u(t,E)+m_x(E)v(t,E)$. Fatou's Lemma together with (38) implies
$$\int_x^\infty|f(t,E)|^2\,dt\le-2\,\frac{d}{dz}\left(\frac{R}{U_x}\right)(E).$$
We have that $f(x,E)=1$, so it remains to evaluate $(R/U_x)'(E)$. To this end, note that
$$\left(\ln\frac{R}{U_x}\right)'(E)=\frac{1}{2(E-E_0)}+\sum_{j=1}^g\left(\frac{1}{\mu_j(x)-E}-\frac{1}{2(E_{2j-1}-E)}-\frac{1}{2(E_{2j}-E)}\right).$$
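This formula can be sanity-checked numerically by differencing $\ln|R/U_x|$. The sketch below assumes (our reading of (10), which is not restated in this section) that $R(z)^2=\prod_{k=0}^{2g}(z-E_k)$ up to a constant factor, which does not affect the logarithmic derivative, and uses $U_x(z)=\prod_j(\mu_j(x)-z)$; the band edges and Dirichlet eigenvalues are made-up sample values with one $\mu_j$ in each gap.

```python
import math


def log_abs_ratio(E, edges, mus):
    """ln|R(E)/U_x(E)|, assuming R(E)^2 = prod_k (E - E_k) and U_x(E) = prod_j (mu_j - E)."""
    log_R = 0.5 * sum(math.log(abs(E - Ek)) for Ek in edges)
    log_U = sum(math.log(abs(mu - E)) for mu in mus)
    return log_R - log_U


def log_derivative(E, edges, mus):
    """Right-hand side of the displayed formula for (ln R/U_x)'(E)."""
    E0, gap_edges = edges[0], edges[1:]
    total = 1.0 / (2 * (E - E0))
    for j, mu in enumerate(mus):
        lower, upper = gap_edges[2 * j], gap_edges[2 * j + 1]  # E_{2j-1}, E_{2j}
        total += 1.0 / (mu - E) - 1.0 / (2 * (lower - E)) - 1.0 / (2 * (upper - E))
    return total


if __name__ == "__main__":
    edges = [0.0, 1.0, 1.2, 2.0, 2.3, 3.0, 3.4]  # E_0 < E_1 < ... < E_6, so g = 3
    mus = [1.1, 2.1, 3.25]                        # one mu_j(x) in each gap
    E, h = 2.2, 1e-6                              # E in the second gap, E != mu_2
    numeric = (log_abs_ratio(E + h, edges, mus) - log_abs_ratio(E - h, edges, mus)) / (2 * h)
    print(numeric, log_derivative(E, edges, mus))
```

In this example the $j=2$ term $1/(\mu_2(x)-E)$ is the largest single contribution, in line with the estimate that follows.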
Estimating as in the proof of Lemma 4.1, we see that
$$\left|\sum_{j\ne n}\left(\frac{1}{\mu_j(x)-E}-\frac{1}{2(E_{2j-1}-E)}-\frac{1}{2(E_{2j}-E)}\right)\right|\lesssim ld^{-2}.$$
Furthermore, the term with $j=n$ can be bounded by $C/|\mu_n(x)-E|$. Since this is $\gtrsim l_n^{-1}$, which is much larger than $ld^{-2}$, we can in fact estimate the whole logarithmic derivative by $C/|\mu_n(x)-E|$. Finally, similar arguments show that $|(R/U_x)(E)|\lesssim l_n/|\mu_n(x)-E|$, so we conclude that
$$\int_x^\infty|f(t,E)|^2\,dt\le\frac{Cl_n}{(\mu_n(x)-E)^2}.$$
The proof is finished by observing that $\mu_n(x)$ moves by an amount of order $l_n$ if $x$ varies over an interval of length one. Indeed, if we again use the variables $\psi_j$ (see (13), (14)), then the $\psi_j$'s evolve according to the differential equations
$$\frac{d\psi_n}{dx}=\frac{2iR_n(m_n-l_n\cos\psi_n)}{\prod_{j\ne n}(m_j-m_n-l_j\cos\psi_j+l_n\cos\psi_n)},$$
and the right-hand sides are $\approx1$, independently of the positions of the $\psi_j(x)$'s.
Now let $a_1=0$, $a_{n+1}=a_n+L_n$, where $L_n>0$ will be chosen later. Then $V$ will be of the form
$$V(x)=\sum_{n=1}^\infty\chi_{(a_n,a_{n+1})}(x)V_n(x-a_n);$$
the building blocks $V_n$ are finite gap potentials. We now pick these $V_n$'s. We basically keep the notation of the preceding sections, except that there is now an additional index $n$. The gaps of $V_n$ are taken to be of equal length $l_n$, and $g_n$ denotes the number of gaps of $V_n$. Let $\alpha\in(1/2,1)$ be the exponent from (2) (if $\alpha=1$, there is nothing to prove). We abbreviate $2(1-\alpha)=D$, so $D\in(0,1)$, and $D$ is the dimension the set $S$ from (4) must have. Fix a number $a>(1-D)^{-1}$ and put $l_n=\exp(-a^n)$. A Cantor type set with $g_n$ intervals of length $l_n$ as its $n$th approximation has dimension $D$ if there is a scaling of the type $g_nl_n^D\sim1$. This suggests taking $g_n\sim\exp(Da^n)$, but for technical reasons, the actual definition is slightly different. First of all, choose a sequence $\epsilon_n>0$ which tends to zero, but so slowly that $\epsilon_na^n-\epsilon_{n-1}a^{n-1}\to\infty$ and $a^n\exp(-\epsilon_na^n)\to0$. (In fact, we could take $\epsilon_n=q^n$ with $a^{-1}<q<1$ right away, but if we leave $\epsilon_n$ unspecified, we can improve the decay rate of $V$.) Now put $g_{n_0-1}=1$, where $n_0\in\mathbb N$ must be sufficiently large (this will be made precise below), and then define inductively for $n\ge n_0$
$$g_n=\exp\big((D-\epsilon_n)a^n\big)+\theta_ng_{n-1}.$$
Here, we require that $g_n/g_{n-1}\in\mathbb N$, and this determines $\theta_n\in[0,1)$ uniquely. The parameter $d_n$ describes the spacing between adjacent gaps of $V_n$. Since we want these gaps to lie in some gap of $V_{n-1}$, we get $g_n\lesssim g_{n-1}l_{n-1}/d_n$. If we disregard the $\epsilon_n$'s and $\theta_n$'s for the moment and take $d_n$ about as large as possible, this motivates the choice
$$d_n=\exp\Big(\big(-D+(D-1)a^{-1}\big)a^n\Big).$$
A computation then shows that $l_nd_n^{-1}\ln g_n\to0$, so Theorem 3.1 can indeed be used to bound $V_n$ for sufficiently large $n$. Finally, Lemma 7.1 suggests that we put $L_n=A/l_n$, with a large constant $A$ (how large $A$ has to be will become clear later on). It remains to choose the locations of the gaps, that is, we must pick, for each $n$, the parameters $E_0^{(n)}<m_n(1)<\dots<m_n(g_n)$. Let $G_{n_0-1}=[1,2]$, say (any compact subinterval of $(0,\infty)$ will do). For the time being, set $E_0^{(n_0)}=0$. Pick $g_{n_0}$ different centers $m_{n_0}(1)<\dots<m_{n_0}(g_{n_0})$ that lie well inside $[1,2]$ and satisfy
$$\min_k\big(m_{n_0}(k)-m_{n_0}(k-1)\big)\ge d_{n_0}.$$
By the first requirement we mean that $|m_{n_0}(k)-3/2|\le1/4$ for all $k=1,\dots,g_{n_0}$. Since $d_{n_0}g_{n_0}\to0$ as $n_0\to\infty$, this can certainly be done, provided $n_0$ is large enough.
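Because $l_n$, $d_n$, $g_n$ are doubly exponential in $n$, the asymptotic claims behind these choices are easiest to inspect on a logarithmic scale. The sketch below uses sample parameters chosen by us ($\alpha=0.7$, so $D=0.6$; $a=3>(1-D)^{-1}$; $\epsilon_n=q^n$ with $q=1/2$) and the approximation $\ln g_n\approx(D-\epsilon_n)a^n$, which ignores the $\theta_ng_{n-1}$ correction. It tabulates the logarithms of $l_nd_n^{-1}\ln g_n$, of $d_ng_n$, of $g_nl_n/l_{n-1}$ (the size of the shift $E_0^{(n)}$ relative to the previous gap length), and of the ratio between the number of centers needed per gap, $g_n/g_{n-1}$, and the number available, $l_{n-1}/d_n$; all of them should decrease to $-\infty$.

```python
import math


def log_scales(n, alpha=0.7, a=3.0, q=0.5):
    """Natural logarithms of quantities the construction needs to tend to 0.

    Illustrative parameters: D = 2(1 - alpha), a > 1/(1 - D), epsilon_k = q**k.
    ln g_k is approximated by (D - epsilon_k) a^k, ignoring the theta_k term.
    """
    D = 2 * (1 - alpha)

    def log_l(k):  # l_k = exp(-a^k)
        return -a ** k

    def log_d(k):  # d_k = exp((-D + (D - 1)/a) a^k)
        return (-D + (D - 1) / a) * a ** k

    def log_g(k):  # g_k ~ exp((D - epsilon_k) a^k)
        return (D - q ** k) * a ** k

    return {
        "l_n/d_n * ln g_n": log_l(n) - log_d(n) + math.log(log_g(n)),
        "d_n * g_n": log_d(n) + log_g(n),
        "g_n l_n / l_(n-1)": log_g(n) + log_l(n) - log_l(n - 1),
        "needed / available centers": (log_g(n) - log_g(n - 1)) - (log_l(n - 1) - log_d(n)),
    }


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, {name: round(value, 2) for name, value in log_scales(n).items()})
```

The last column decreases only like $-(\epsilon_na^n-\epsilon_{n-1}a^{n-1})$, which is exactly why the text requires this difference to tend to infinity.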
Let $G_{n_0}$ be the union of (the middle halves of) the corresponding gaps:
$$G_{n_0}=\bigcup_{k=1}^{g_{n_0}}\big[m_{n_0}(k)-l_{n_0}/2,\ m_{n_0}(k)+l_{n_0}/2\big]. \tag{39}$$
The general step is similar. Suppose $G_{n-1}$ has been constructed for some $n\ge n_0+1$. More specifically, $G_{n-1}$ is then a union of (half the) gaps of $V_{n-1}$. For every such interval $[m_{n-1}(j)-l_{n-1}/2,\,m_{n-1}(j)+l_{n-1}/2]\subset G_{n-1}$, pick $g_n/g_{n-1}$ centers $m_n(k)$ with $|m_n(k)-m_{n-1}(j)|\le l_{n-1}/4$, and also so that the $m_n(k)$ are separated from one another by a distance $\ge d_n$. Again, $m_n(k)$'s with these properties exist in sufficiently large supply: indeed, we can find $\approx l_{n-1}/d_n$ centers $m_n(k)$ with the required properties in each subinterval of $G_{n-1}$. On the other hand, by our assumptions on the sequence $\epsilon_n$, the desired number of centers
$$g_n/g_{n-1}=(l_{n-1}/d_n)\,e^{\epsilon_{n-1}a^{n-1}-\epsilon_na^n}+\theta_n$$
is much smaller than $l_{n-1}/d_n$ for all $n\ge n_0+1$, provided $n_0$ is large enough. So the construction is possible. We then get a total of $(g_n/g_{n-1})g_{n-1}=g_n$ new gaps, and we would like to define $G_n$ as the union of these gaps, as in (39). Unfortunately, there is an additional complication: if we proceed as above, we have no control on $f_0$, so the bound of Theorem 3.1 is useless. The problem is that we have not renormalized correctly. So it is necessary to also adjust $E_0^{(n)}$. We will add $E_0^{(n)}$ to all the previously chosen centers $m_n(k)$ ($k=1,\dots,g_n$), while keeping the gap length $l_n$ fixed. This just amounts to adding $E_0^{(n)}$ to the function $f(\alpha)$. In particular, the original mean value $f_0$ will be replaced by $f_0+E_0^{(n)}$. Now (30) implies that (for $E_0^{(n)}=0$) $f_0=O(g_nl_n)$, so for a suitable $E_0^{(n)}=O(g_nl_n)$, we have $f_0=0$, as desired. On the other hand, since we were cautious enough to take the gaps of $V_n$ well inside the gaps of $V_{n-1}$, the new centers $E_0^{(n)}+m_n(k)$ will still satisfy, let us say, $|E_0^{(n)}+m_n(k)-m_{n-1}(j)|\le l_{n-1}/3$. This follows from the fact that the shift $E_0^{(n)}=O(g_nl_n)$ is much smaller than $l_{n-1}$ for large $n$ (here we use the inequality $a>(1-D)^{-1}$).
Once $E_0^{(n)}$ has been picked, $G_n$ can again be defined as in (39), but with the shifted centers $E_0^{(n)}+m_n(k)$ taking the role of $m_n(k)$. This two-step procedure (first choose the $m_n(k)$'s, then shift by an appropriate $E_0^{(n)}$ to make $f_0=0$) can now be used to pick the $m_n(k)$'s and $E_0^{(n)}$ (inductively) for all $n\ge n_0$. Note that the construction ensures that $G_n\subset G_{n-1}$. We must still choose, for every $n\ge n_0$, a particular potential from the corresponding family $V_{\alpha_0}$ of finite gap potentials. Fortunately, this choice is easy: we fix once and for all a sufficiently large $N\in\mathbb N$ (where “sufficiently large” will be made precise at the end of the proof) and then simply take a $V_n$ that satisfies the conclusion of Theorem 3.1 for $L=L_n$. Note also that the assumptions of Theorem 3.1 on the location of the gaps (that is, $C_1\le m_n(k)-E_0^{(n)}\le C_2$ for all $k=1,\dots,g_n$) hold with $n$-independent constants $C_1,C_2>0$ because $E_0^{(n)}$ lies in a small interval centered at zero while all gaps are in $[1,2]$. Therefore, the constant $C$ from the statement of Theorem 3.1 is also independent of $n$. Finally, for $n<n_0$, we can take an arbitrary bounded (and measurable) function as $V_n$; for instance, we can put $V_n\equiv0$ for $n<n_0$.

Let $T=\bigcap_nG_n$. We will now show that $S\supset T$, where $S$ is the set defined in (4). Suppose $E\in T$. Then $E$ satisfies the assumptions of Lemma 7.1 for the potentials
$V(x)=V_n(x)$ for all sufficiently large $n$. Assume, to obtain a contradiction, that $E\notin S$. Then there is a solution $y(x)$ of the form (3), and $y,\bar y$ span the space of solutions of (1). In particular, the solution $f_n$ from Lemma 7.1 must be a linear combination of $y$ and $\bar y$, that is,
$$f_n(x)=A_ne^{i\omega(x)}+B_ne^{-i\omega(x)}+r_n(x), \tag{40}$$
with $\omega(x)=\int_0^x\sqrt{E-V(t)}\,dt$ and $|r_n(x)|\le(|A_n|+|B_n|)\rho(x)$, where $\rho(x)\to0$. Of course, since $V=V_n$ only on the interval $(a_n,a_{n+1})$, Eq. (40) holds for $a_n\le x\le a_{n+1}$. Lemma 7.1 shows that $f_n(x_0^{(n)})=1$ for some $x_0^{(n)}\in[a_n,a_n+1]$ and
$$\int_{x_0^{(n)}}^{a_{n+1}}|f_n(x)|^2\,dx\le C/l_n \tag{41}$$
for large values of $n$. We also take $n$ so large that $\rho(x)<\delta$ for $x\ge a_n$ and $\|V_n\|_\infty<\delta E$, which clearly is possible since $g_nl_n\to0$ (of course, Theorem 3.1 would give a better bound on $V_n$, but this is not needed here). Then, since $f_n(x_0^{(n)})=1$, we must have that $|A_n|+|B_n|>(1+\delta)^{-1}$. Now routine estimates show that if $\delta>0$ was chosen sufficiently small, then (40) implies that
$$\int_{x_0^{(n)}}^{a_{n+1}}|f_n(x)|^2\,dx\ge C_0L_n=AC_0/l_n.$$
This inequality contradicts (41) if $A$ is sufficiently large, so $T\subset S$, as claimed.

The next step is to prove that $\dim T=D$. To this end, we introduce a Borel measure $\mu$ that reflects the self-similar scaling structure of $T$. More specifically, $\mu$ gives equal weight to the intervals of $G_n$ for every $n$: $\mu(I_n)=g_n^{-1}$ if $I_n$ is one of the intervals $[m_n(k)-l_n/2,\,m_n(k)+l_n/2]$. Moreover, we also demand that $\mu$ be supported by $T$: $\mu(\mathbb R\setminus T)=0$. It is not hard to show (for instance, by considering approximations $\mu_n$ supported by $G_n$) that there indeed exists a unique Borel (probability) measure $\mu$ satisfying these requirements. We will now establish the following property of the generalized derivatives of $\mu$: for every fixed $\gamma<D$, we have that
$$\lim_{\delta\to0+}\sup_{|I|\le\delta}\frac{\mu(I)}{|I|^\gamma}=0. \tag{42}$$
The supremum is over all intervals $I\subset\mathbb R$ of length at most $\delta$. If (42) holds, then, by general facts on Hausdorff measures [22, Sect. 3.4, Theorem 67], $\mu$ gives zero weight to sets of dimension strictly less than $D$, and therefore $\dim T\ge D$, as desired. The converse inequality $\dim T\le D$ does not need explicit proof (although that would actually be easy to do) because we know that always $\dim S\le2(1-\alpha)$ (this is the result whose optimality we are about to prove), and thus $\dim T\le\dim S\le D$ will follow automatically once we have established that $V(x)=O(x^{-\alpha})$. So let us prove (42): fix $\gamma$, and let $I$ be an interval with $|I|\le\delta$, where $\delta>0$ is small. Then define $n\in\mathbb N$ by requiring that $l_n<|I|\le l_{n-1}$. Clearly, $n$ is large if $\delta$ is small. We first treat the case when $|I|\le d_n$. Recall that $d_n$ is the minimal distance
between adjacent gaps of $V_n$. So the above assumption implies that $I$ intersects at most two of the intervals that build up $G_n$. Each of these intervals has measure $g_n^{-1}$, hence
$$\frac{\mu(I)}{|I|^\gamma}\le\frac{2}{g_n|I|^\gamma}\le\frac{2}{g_nl_n^\gamma}.$$
On the other hand, if $|I|>d_n$, then the number of subintervals of $G_n$ intersecting $I$ is $\le3|I|/d_n$; thus in this case,
$$\frac{\mu(I)}{|I|^\gamma}\le\frac{3|I|^{1-\gamma}}{d_ng_n}\le3\,\frac{g_{n-1}l_{n-1}}{d_ng_n}\cdot\frac{1}{g_{n-1}l_{n-1}^\gamma}.$$
Now $g_nl_n^\gamma\gtrsim\exp(\sigma a^n)$, where $\sigma>0$ depends on $\gamma$, and $g_{n-1}l_{n-1}/(d_ng_n)\lesssim\exp(\epsilon_na^n)$; indeed, relations of this type motivated our definition of $g_n$ and $d_n$. Since, as noted above, $n\to\infty$ as $\delta\to0+$, (42) now follows.

It remains to show that $V$ satisfies the bound (2). So, let $x\in(a_n,a_{n+1})$ with large $n$. Recall that $f_0=0$, where $f$ is the function from the trace formula for $V_n$. Theorem 3.1 therefore implies that
$$|V(x)|\le C\Big(g_n^{1/2}l_n\big(\ln(g_nL_n)\big)^{1/2}+g_nl_n\big(l_nd_n^{-1}\ln g_n\big)^{N+1}\Big)$$
for these $x$. On the other hand,
$$x\le a_{n+1}=\sum_{m=1}^nL_m\lesssim\sum_{m=1}^nl_m^{-1}\lesssim l_n^{-1},$$
and (2) indeed follows, provided we took
$$N+1\ge\frac{1-\alpha}{2\alpha-1}\cdot\frac{a}{a-1}.$$
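The threshold for $N$ comes from comparing powers of $l_n$. Ignoring $\epsilon_n$ and logarithmic factors, $g_n\approx l_n^{-D}$ and $l_n/d_n=l_n^{(1-D)(1-1/a)}$, so the first term in the bound on $|V(x)|$ is about $l_n^{1-D/2}=l_n^\alpha$, and the second is about $l_n^\beta$ with $\beta=(1-D)+(N+1)(1-D)(1-1/a)$, while $x\lesssim l_n^{-1}$ gives $x^{-\alpha}\gtrsim l_n^\alpha$. Requiring $\beta\ge\alpha$ and using $1-D=2\alpha-1$ gives exactly the stated condition. The Python sketch below (names and sample values ours) computes the smallest admissible $N$ and checks this bookkeeping.

```python
import math


def min_N(alpha, a):
    """Smallest integer N >= 0 with N + 1 >= (1 - alpha)/(2 alpha - 1) * a/(a - 1)."""
    threshold = (1 - alpha) / (2 * alpha - 1) * a / (a - 1)
    return max(0, math.ceil(threshold) - 1)


def second_term_exponent(alpha, a, N):
    """beta with g_n l_n (l_n/d_n)^(N+1) ~ l_n^beta, ignoring epsilon_n and log factors."""
    D = 2 * (1 - alpha)
    return (1 - D) + (N + 1) * (1 - D) * (1 - 1 / a)


if __name__ == "__main__":
    for alpha in (0.9, 0.75, 0.6, 0.55, 0.51):
        a = 1.5 / (2 * alpha - 1)  # any a > 1/(1 - D) = 1/(2 alpha - 1) is allowed
        N = min_N(alpha, a)
        print(alpha, N, second_term_exponent(alpha, a, N) >= alpha)
```

Running it shows the required $N$ growing rapidly as $\alpha$ approaches $1/2$.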
(As expected, $N\to\infty$ as $\alpha\to1/2+$.) Actually, doing these final estimates carefully, we obtain a stronger bound of the form $x^{-\alpha-\epsilon_n/2}(\ln x)^{1/2}$. The strengthening of Theorem 1.1 mentioned in Sect. 1 follows from this by taking a sequence $\epsilon_n$ that tends to zero sufficiently slowly.

Acknowledgement. C.R. acknowledges financial support by the Heisenberg program of the Deutsche Forschungsgemeinschaft.
References
1. Christ, M. and Kiselev, A.: Absolutely continuous spectrum for one-dimensional Schrödinger operators with slowly decaying potentials: Some optimal results. J. Am. Math. Soc. 11, 771–797 (1998)
2. Christ, M. and Kiselev, A.: WKB asymptotic behavior of almost all generalized eigenfunctions for one-dimensional Schrödinger operators with slowly decaying potentials. Preprint (2000)
3. Christ, M. and Kiselev, A.: WKB and spectral analysis of one-dimensional Schrödinger operators whose potentials have slowly decaying derivatives. Preprint (2000)
4. Coddington, E.A. and Levinson, N.: Theory of Ordinary Differential Equations. New York: McGraw-Hill, 1955
5. Deift, P. and Killip, R.: On the absolutely continuous spectrum of one-dimensional Schrödinger operators with square summable potentials. Commun. Math. Phys. 203, 341–347 (1999)
6. Deift, P., Kriecherbauer, T. and Venakides, S.: Forced lattice vibrations II. Commun. Pure Appl. Math. 48, 1251–1298 (1995)
Finite Gap Potentials and WKB Asymptotics for 1D-Schrödinger Operators
7. Eastham, M.S.P.: The Asymptotic Solution of Linear Differential Systems. London Math. Soc. Monographs New Series 4. Oxford: Clarendon Press, 1989
8. Farkas, H.M. and Kra, I.: Riemann Surfaces. New York: Springer, 1980
9. Gesztesy, F., Ratnaseelan, R. and Teschl, G.: The KdV hierarchy and associated trace formulas. In: Gohberg, I. (ed.) et al., Oper. Theory: Advances and Applications 87, 125–163 (1996)
10. Gesztesy, F. and Weikard, R.: Spectral deformations and soliton equations. In: Ames, W.F. (ed.) et al., Differential equations with applications to mathematical physics. Boston: Academic Press, 1993, pp. 101–139
11. Gilbert, D.J. and Pearson, D.B.: On subordinacy and analysis of the spectrum of one-dimensional Schrödinger operators. J. Math. Anal. Appl. 128, 30–56 (1987)
12. Kahane, J.-P.: Some Random Series of Functions. Cambridge: Cambridge University Press, 1985
13. Killip, R.: Perturbations of one-dimensional Schrödinger operators preserving the absolutely continuous spectrum. Ph.D. thesis, Caltech, 2000; electronically available at http://www.ma.utexas.edu/mp_arcbin/mpa?yn=00-326
14. Kiselev, A., Last, Y. and Simon, B.: Modified Prüfer and EFGP transforms and the spectral analysis of one-dimensional Schrödinger operators. Commun. Math. Phys. 194, 1–45 (1998)
15. Kotani, S. and Ushiroya, N.: One-dimensional Schrödinger operators with random decaying potentials. Commun. Math. Phys. 115, 247–266 (1988)
16. McKean, H.P.: Variation on a theme of Jacobi. Commun. Pure Appl. Math. 38, 669–678 (1985)
17. Mumford, D.: Tata Lectures on Theta 2. Basel: Birkhäuser-Verlag, 1984
18. Naboko, S.N.: Dense point spectra of Schrödinger and Dirac operators. Theor. and Math. Phys. 68, 646–653 (1986)
19. Remling, C.: The absolutely continuous spectrum of one-dimensional Schrödinger operators with decaying potentials. Commun. Math. Phys. 193, 151–170 (1998)
20. Remling, C.: Bounds on embedded singular spectrum for one-dimensional Schrödinger operators. Proc. Amer. Math. Soc. 128, 161–171 (2000)
21. Remling, C.: Schrödinger operators with decaying potentials: some counterexamples. Duke Math. J. 105, 463–496 (2000)
22. Rogers, C.A.: Hausdorff Measures. Cambridge: Cambridge University Press, 1970
23. Salem, R. and Zygmund, A.: Some properties of trigonometric series whose terms have random signs. Acta Math. 91, 245–301 (1954)
24. Simon, B.: Some Jacobi matrices with decaying potential and dense point spectrum. Commun. Math. Phys. 87, 253–258 (1982)
25. Simon, B.: Bounded eigenfunctions and absolutely continuous spectra for one-dimensional Schrödinger operators. Proc. Amer. Math. Soc. 124, 3361–3369 (1996)
26. Simon, B.: Some Schrödinger operators with dense point spectrum. Proc. Amer. Math. Soc. 125, 203–208 (1997)
27. Simon, B.: Schrödinger operators in the twenty-first century. In: Fokas, A. et al. (eds.), Mathematical Physics 2000. London: Imperial College, 2000, pp. 283–288
28. Springer, G.: Introduction to Riemann Surfaces. Reading, MA: Addison-Wesley, 1957
29. Stolz, G.: Bounded solutions and absolute continuity of Sturm-Liouville operators. J. Math. Anal. Appl. 169, 210–228 (1992)
30. von Neumann, J. and Wigner, E.: Über merkwürdige diskrete Eigenwerte. Phys. Z. 30, 465–467 (1929)
Communicated by B. Simon
Commun. Math. Phys. 223, 437 – 450 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Characterizations of the Automorphisms of Hilbert Space Effect Algebras Lajos Molnár Institute of Mathematics and Informatics, University of Debrecen, P.O.Box 12, 4010 Debrecen, Hungary E-mail: [email protected] Received: 3 April 2001 / Accepted: 27 June 2001
Abstract: In this paper we characterize the automorphisms of Hilbert space effect algebras by means of their preserving properties which concern certain relations and quantities appearing in quantum measurement theory. 1. Introduction and Statement of the Results The concept of effects plays a fundamental role in the mathematical description of quantum measurement (for detailed explanations see the Introduction in [2] or § 1 in [11]). In the Hilbert space framework, the set E(H ) of all effects on a complex Hilbert space H is just the operator interval [0, I ] of all positive operators on H which are bounded by the identity I . The set E(H ) can be equipped with several algebraic operations and relations which all have physical content. Hence we have different algebraic structures on the same set E(H ). The investigation of the morphisms of these structures was initiated by Ludwig (for an explanation and results see Chapters V and VI in [10]). We recall one of his fundamental results in this direction as follows. First, there is a natural partial order ≤ on E(H ) which is induced by the usual order between selfadjoint operators on H . Next, there is a kind of orthocomplementation ⊥: E → I − E on E(H ) (cf. [2], p. 25). Now, Ludwig’s result [10, Sect. V.5.] (also see [5]) describes the ortho-order automorphisms of E(H ) (that is, the automorphisms of E(H ) with respect to the relation ≤ and the operation ⊥) in the following way: If dim H ≥ 3, then every ortho-order automorphism φ of E(H ) is of the form φ(E) = U EU ∗
(E ∈ E(H ))
for some either unitary or antiunitary operator U on H. Clearly, this result is intimately connected with the fundamental theorem of projective geometry, which determines the form of the ortho-order automorphisms of the orthoposet P(H) of all projections on H. (In our recent paper [13] we have shown that, unlike the fundamental theorem of projective
geometry, the conclusion in Ludwig’s result remains valid also in the two-dimensional case.) It is an exciting problem to characterize the automorphisms of algebraic structures of any kind by means of their preservation properties concerning certain relevant relations, sets, quantities, etc. connected with the underlying structures. To mention only one such area of investigation, we refer to linear preserver problems, which in recent decades have been one of the most extensively studied research areas in matrix theory (see, for example, the survey paper [9]). In what follows we present three results of this kind concerning the automorphisms of the Hilbert space effect algebra. Our aim in this paper is to draw the attention of people working on the foundations of quantum mechanics, and dealing with the algebraic structures appearing there, to such problems. We believe that, just as in certain parts of pure mathematics, such investigations can give new insight into the behaviour of the automorphisms and might help to better understand the underlying algebraic structures. We now turn to our results. Let us begin with the following small, innocent remark. Remark. Clearly, Ludwig’s theorem describes those bijections of the effect algebra which preserve the relation ≤ and the operation ⊥. However, these two properties can be expressed by the preservation of a single relation, namely orthogonality. The effects E, F are said to be orthogonal if E ≤ I − F (or, equivalently, if E + F ≤ I) (see, for example, [7]). Our assertion is that a bijective map φ : E(H) → E(H) is an ortho-order automorphism of E(H) if and only if φ preserves orthogonality in both directions. Indeed, the necessity is obvious. Conversely, suppose that φ preserves orthogonality in both directions.
It is easy to see that for any effects A, B we have A ≤ B if and only if for every C ∈ E(H ), the orthogonality of B and C implies the orthogonality of A and C. This characterization of the order gives us that φ preserves the order in both directions. Next, for any effect A ∈ E(H ), the effect A⊥ can easily be characterized as the supremum of all effects which are orthogonal to A. We now easily get that φ preserves the operation ⊥. This proves our assertion. We point out that the orthogonality preserving property appears in the definition of the so-called effect-automorphisms [10, D 4.2.1] (in [4] they were called E-automorphisms). These are bijective maps φ : E(H ) → E(H ) with the property that for every E, F ∈ E(H ) we have E + F ∈ E(H ) ⇐⇒ φ(E) + φ(F ) ∈ E(H ),
(1)
and in this case φ(E + F) = φ(E) + φ(F) holds. Since (1) says precisely that φ preserves orthogonality in both directions, it follows from the Remark that property (1) in the definition of effect automorphisms characterizes exactly the ortho-order automorphisms. Observe that it follows from Ludwig’s theorem if dim H ≥ 3, and from [13] if dim H = 2, that the ortho-order automorphisms are additive, so in those cases these two kinds of automorphisms are the same. This is trivially not true if dim H = 1. We now turn to the nontrivial results of the paper. Besides order and orthogonality there is another important relation on E(H), namely coexistency (see, for example, [2, II.2.2.] or [11, § 1]). A set of effects is called coexistent if its members are in the range of a single unsharp observable, i.e. a POV (positive operator valued) measure. In the
case of two effects E, F this is well-known to be equivalent to the following: there exist effects A, B, C ∈ E(H ) such that E = A + C,
F = B + C, and A + B + C ∈ E(H ).
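The two-effect form of coexistence above is easy to test once a candidate triple is given. The following Python sketch is illustrative only: it assumes H = R², represents effects as real symmetric 2×2 matrices stored as nested lists, and the helper names (`is_effect`, `is_coexistence_witness`) are hypothetical. It verifies a proposed witness (A, B, C); it does not decide coexistence in general, which would require a search over all possible C.

```python
# Sketch (assumption: H = R^2, effects are real symmetric 2x2 matrices stored
# as nested lists). It checks a *candidate* witness (A, B, C) for the
# coexistence of two effects E, F; it does not search for one.

I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]


def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]


def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]


def is_psd(X, tol=1e-12):
    # A real symmetric 2x2 matrix is positive semidefinite iff both diagonal
    # entries and the determinant are nonnegative.
    a, b, d = X[0][0], X[0][1], X[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


def is_effect(X):
    # X is an effect iff 0 <= X <= I.
    return is_psd(X) and is_psd(sub(I2, X))


def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) <= tol for i in range(2) for j in range(2))


def is_coexistence_witness(E, F, A, B, C):
    # E = A + C, F = B + C, with A, B, C and A + B + C all effects.
    return (all(is_effect(X) for X in (A, B, C))
            and close(E, add(A, C))
            and close(F, add(B, C))
            and is_effect(add(add(A, B), C)))


# Two commuting (diagonal) effects; C is their entrywise minimum.
E = [[0.7, 0.0], [0.0, 0.2]]
F = [[0.5, 0.0], [0.0, 0.6]]
C = [[0.5, 0.0], [0.0, 0.2]]
print(is_coexistence_witness(E, F, sub(E, C), sub(F, C), C))   # True
# Here A + B + C = 2I is not an effect, so this triple is not a witness.
print(is_coexistence_witness(I2, I2, I2, I2, Z2))               # False
```

For commuting effects the entrywise minimum in a common eigenbasis always gives such a witness, as in the example; the proof of Lemma 2 below shows that for two rank-1 effects with different ranges only C = 0 can work.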
Our first theorem shows that the preservation of the two binary relations of order and coexistency characterizes the ortho-order automorphisms of E(H). Theorem 1. Let H be a Hilbert space with dim H ≥ 3. Let φ : E(H) → E(H) be a bijective map with the properties that E ≤ F ⇐⇒ φ(E) ≤ φ(F) and E and F are coexistent ⇐⇒ φ(E) and φ(F) are coexistent for every E, F ∈ E(H). Then there exists an either unitary or antiunitary operator U on H such that φ is of the form φ(E) = UEU∗
(E ∈ E(H )).
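The easy direction of Theorem 1, and of Theorem 2 below, is that maps of the form E ↦ UEU∗ do preserve the order and, with ψ = Uϕ, the probabilities ⟨Eϕ, ϕ⟩. A minimal Python sketch, assuming H = R² and a real rotation matrix U (so U∗ is the transpose); this toy setting ignores the hypothesis dim H ≥ 3 and only illustrates the converse statement, not the theorem itself. The chosen matrices and vectors are arbitrary.

```python
import math

# Sketch of the easy direction only. Assumptions: H = R^2, U a real rotation
# (so U^* is the transpose); E, F and varphi are arbitrary illustrative data.


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]


def is_psd(X, tol=1e-12):
    a, b, d = X[0][0], X[0][1], X[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


def conjugate(U, X):
    # phi(X) = U X U^*
    return matmul(matmul(U, X), transpose(U))


def apply(X, v):
    return [X[0][0] * v[0] + X[0][1] * v[1], X[1][0] * v[0] + X[1][1] * v[1]]


def expectation(X, v):
    # <X v, v> for a real vector v
    w = apply(X, v)
    return w[0] * v[0] + w[1] * v[1]


theta = 0.3
U = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]
E = [[0.3, 0.1], [0.1, 0.2]]
F = [[0.6, 0.1], [0.1, 0.5]]        # F - E = 0.3 I, so E <= F
varphi = [0.6, 0.8]                 # a unit vector
psi = apply(U, varphi)              # psi = U varphi

order_kept = is_psd(sub(conjugate(U, F), conjugate(U, E)))
prob_kept = abs(expectation(conjugate(U, E), psi) - expectation(E, varphi)) < 1e-12
print(order_kept, prob_kept)        # True True
```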
We remark that neither the preservation of the order nor the preservation of coexistency alone characterizes the automorphisms of E(H). As for order, see the remark after the proof of Theorem 1. As for coexistency, consider the transformation φ : E(H) → E(H) defined by φ(0) = I, φ(I) = 0 and φ(E) = E otherwise; since 0 and I are coexistent with every effect (Lemma 1 below), φ preserves coexistency in both directions, but it does not preserve the order. If ϕ is a pure state (i.e. a unit vector in H), then the probability of an effect E ∈ E(H) in this state is ⟨Eϕ, ϕ⟩. Our second result asserts that if a bijective map φ on E(H) preserves the order and there are two pure states ϕ, ψ ∈ H with respect to which φ preserves the probability, then φ is an automorphism of E(H). More explicitly, we have the following result. Theorem 2. Assume dim H ≥ 3. Let φ : E(H) → E(H) be a bijective map for which E ≤ F ⇐⇒ φ(E) ≤ φ(F)
(E, F ∈ E(H ))
and suppose that there are unit vectors ϕ, ψ ∈ H such that ⟨φ(E)ψ, ψ⟩ = ⟨Eϕ, ϕ⟩
(E ∈ E(H )).
Then there exists an either unitary or antiunitary operator U on H such that φ(E) = U EU ∗
(E ∈ E(H ))
and Uϕ = ψ. (For further results in the same spirit see the remark after the proof of Theorem 2.) As with our first theorem, we remark that the preservation of the probability appearing above is not by itself sufficient to characterize the automorphisms of E(H). Indeed, choosing pure states ϕ = ψ and defining φ as the identity on the set N of all effects E for which ⟨Eϕ, ϕ⟩ ≠ 0 and as an arbitrary permutation of E(H) \ N, we easily obtain an appropriate example. The set E(H) of all effects is clearly convex, so it is natural to equip it with the operation of convex combinations. The automorphisms of effect algebras with respect
to this operation, which are called mixture automorphisms, were studied, for example, in [8]. These automorphisms of E(H) were determined in full generality in [12]. The result [12, Corollary 2] says that every mixture automorphism φ of E(H) is either of the form φ(E) = UEU∗
(E ∈ E(H ))
or of the form φ(E) = U (I − E)U ∗
(E ∈ E(H )),
where U is an either unitary or antiunitary operator on H . An effect A is called a mixture of the effects B, C if A is a convex combination of B and C, that is, if there is a scalar λ ∈ [0, 1] such that A = λB + (1 − λ)C. Our third result states that the preservation of mixtures characterizes the mixture automorphisms of E(H ). Theorem 3. Assume dim H ≥ 2. Let φ : E(H ) → E(H ) be a bijective function with the property that A is a mixture of B and C ⇐⇒ φ(A) is a mixture of φ(B) and φ(C) holds for all A, B, C ∈ E(H ). Then there exists an either unitary or antiunitary operator U on H such that either φ(A) = U AU ∗
(A ∈ E(H ))
or φ(A) = U (I − A)U ∗
(A ∈ E(H )).
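One half of Theorem 3 is elementary: the maps A ↦ UAU∗ and A ↦ U(I − A)U∗ send a mixture with weight λ to the mixture of the images with the same weight. The Python sketch below checks this for A ↦ I − A (that is, U = I) under the illustrative assumption H = R², with arbitrarily chosen effects B, C; it says nothing about the harder converse proved below.

```python
# Sketch (assumption: H = R^2, 2x2 real symmetric nested lists). If
# A = lam*B + (1 - lam)*C, then I - A = lam*(I - B) + (1 - lam)*(I - C),
# i.e. A -> I - A maps a mixture to the mixture of the images.

I2 = [[1.0, 0.0], [0.0, 1.0]]


def mix(lam, X, Y):
    return [[lam * X[i][j] + (1 - lam) * Y[i][j] for j in range(2)] for i in range(2)]


def complement(X):
    return [[I2[i][j] - X[i][j] for j in range(2)] for i in range(2)]


def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) <= tol for i in range(2) for j in range(2))


B = [[0.9, 0.1], [0.1, 0.4]]
C = [[0.2, 0.0], [0.0, 0.7]]
results = []
for lam in (0.0, 0.25, 0.5, 0.8, 1.0):
    A = mix(lam, B, C)
    results.append(close(complement(A), mix(lam, complement(B), complement(C))))
print(all(results))   # True
```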
2. Proofs This section is devoted to the proofs of our results. We begin with some lemmas. Lemma 1. The effect A ∈ E(H ) is coexistent with every E ∈ E(H ) if and only if A = λI holds with some scalar λ ∈ [0, 1]. Proof. If A is coexistent with every effect, then it is coexistent with every projection P on H . Since coexistence with a projection means commutativity with that projection (see, for example, [11, p. 120]), it follows that A commutes with every projection which implies that A commutes with every operator B ∈ B(H ) (B(H ) denotes the algebra of all bounded linear operators on H ). It is well-known that this implies that A is a scalar. Conversely, suppose that A = λI with some λ ∈ [0, 1]. If E ∈ E(H ) is arbitrary, then we can write λI = λ(I − E) + λE,
E = (1 − λ)E + λE.
Since the three effects λ(I − E), (1 − λ)E and λE sum to λI + (1 − λ)E ≤ I, it follows that λI and E are coexistent.
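The witness used in the converse half of Lemma 1 can be checked mechanically. This Python sketch assumes H = R² and a handful of arbitrarily chosen effects; the function name `lemma1_witness_ok` is hypothetical. It builds A = λ(I − E), B = (1 − λ)E, C = λE and verifies the coexistence conditions for λI and E on a grid of λ values.

```python
# Sketch of the converse half of Lemma 1. Assumption: H = R^2, with effects
# stored as 2x2 real symmetric nested lists. For an effect E and lam in [0, 1]
# it builds the triple from the proof,
#     A = lam (I - E),   B = (1 - lam) E,   C = lam E,
# and checks that it witnesses the coexistence of lam I and E.

I2 = [[1.0, 0.0], [0.0, 1.0]]


def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]


def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]


def scale(c, X):
    return [[c * X[i][j] for j in range(2)] for i in range(2)]


def is_psd(X, tol=1e-12):
    a, b, d = X[0][0], X[0][1], X[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


def is_effect(X):
    return is_psd(X) and is_psd(sub(I2, X))


def close(X, Y, tol=1e-12):
    return all(abs(X[i][j] - Y[i][j]) <= tol for i in range(2) for j in range(2))


def lemma1_witness_ok(lam, E):
    A, B, C = scale(lam, sub(I2, E)), scale(1 - lam, E), scale(lam, E)
    return (all(is_effect(X) for X in (A, B, C))
            and close(scale(lam, I2), add(A, C))     # lam I = A + C
            and close(E, add(B, C))                  # E = B + C
            and is_effect(add(add(A, B), C)))        # A + B + C = lam I + (1 - lam) E <= I


effects = [
    [[0.3, 0.1], [0.1, 0.2]],
    [[0.5, 0.5], [0.5, 0.5]],    # rank-1 projection onto (1, 1)/sqrt(2)
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
checks = [lemma1_witness_ok(lam, E) for lam in (0.0, 0.25, 0.5, 0.75, 1.0) for E in effects]
print(all(checks))   # True
```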
Lemma 2. Let E, F ∈ E(H ) be of rank 1. Suppose that the ranges of E and F are different. Then E, F are coexistent if and only if E + F ∈ E(H ).
Proof. Only the necessity requires proof. Suppose that E, F are coexistent. Then there are effects A, B, C such that A + B + C is an effect and E = A + C, F = B + C. As E, F are of rank 1 and C ≤ E, F, it follows that the range rng C of C is included in the ranges of both E and F; since these ranges are different one-dimensional subspaces, we obtain rng C = {0}, that is, C = 0. This shows that E + F = A + B + 2C = A + B + C ∈ E(H). In what follows we need the concept of the strength of effects along rays (rank-1 projections) defined in [6]. Let E ∈ E(H) and consider an arbitrary rank-1 projection P on H. The strength of E along P is defined by λ(E, P) = sup{λ ∈ [0, 1] : λP ≤ E}. If ϕ ∈ H is any unit vector, then let Pϕ denote the rank-1 projection onto the linear subspace generated by ϕ. In the sequel we shall use the following nice result of Busch and Gudder [6, Theorem 3]: for any effect E ∈ E(H) and unit vector ϕ ∈ H we have ∃λ > 0 : λPϕ ≤ E ⇐⇒ ϕ ∈ rng E^{1/2}. Lemma 3. Let E ∈ E(H) and 0 < λ < µ ≤ 1. Suppose that λI ≤ E ≤ µI. If ϕ, ψ ∈ H are unit vectors such that λ(E, Pϕ) = λ,
λ(E, Pψ ) = µ,
then ϕ, ψ are eigenvectors of E and the corresponding eigenvalues are λ, µ, respectively. Proof. It follows from λI ≤ E ≤ µI that the spectrum σ(E) of E satisfies σ(E) ⊂ [λ, µ]. We assert that λ, µ ∈ σ(E). Indeed, suppose, for example, that the effect E − λI is invertible. Then its square root is also invertible, so ϕ ∈ rng (E − λI)^{1/2}, and the above-mentioned result of Busch and Gudder ([6, Theorem 3]) tells us that there exists a positive number ε for which εPϕ ≤ E − λI. This implies that (ε + λ)Pϕ ≤ E, which means that the strength of E along Pϕ is greater than λ. But this is a contradiction. So, we have λ ∈ σ(E). One can prove in a similar fashion that µ ∈ σ(E). Therefore, the convex hull of σ(E) is exactly [λ, µ]. Now, one can follow the proofs of the statements (a), (b) in [6, Theorem 5] to verify that Eϕ = λϕ and Eψ = µψ. Remark. It is easy to see that in the previous lemma λ = 0 cannot be allowed. To show this, pick two different rank-1 projections P, Q such that PQ ≠ 0. Let ϕ ∈ H be a unit vector such that Q = Pϕ. Clearly, we have 0 ≤ P ≤ I and λ(P, Pϕ) = 0, but Pϕ ≠ 0 · ϕ. Lemma 4. Let P, Q be different projections on the Hilbert space H. Then P and Q are mutually orthogonal (that is, PQ = 0) if and only if every subprojection of P commutes with every subprojection of Q.
Proof. This follows easily from the following observation: the projections P, Q commute if and only if there are mutually orthogonal projections P0, Q0, R such that P = P0 + R and Q = Q0 + R. Now we are in a position to prove our first theorem. Proof of Theorem 1. Any bijection of E(H) which preserves the order in both directions also preserves the projections in both directions. This important observation was made in [10, Theorem 5.8., p. 219]. By the order preserving property of φ, one can deduce that the operator φ(P) is a rank-1 projection if and only if P is a rank-1 projection. More generally, one can prove that φ(P) is a rank-n projection if and only if P is a rank-n projection. Indeed, this follows from the following characterization of rank-n projections: the projection P is of rank n if and only if there is a chain P1, . . . , Pn−1 of n − 1 projections such that $0 < P_1 < P_2 < \dots < P_{n-1} < P$, but there is no such chain of n members. Since, as we have already mentioned, the coexistence of projections is equivalent to their commutativity, it follows from Lemma 4 that φ preserves the orthogonality of projections. So, φ is a bijection of the set of all projections on H which preserves the order and the orthogonality in both directions. It follows from the fundamental theorem of projective geometry (see the introduction) that there exists an either unitary or antiunitary operator U on H such that φ(P) = UPU∗ holds for every projection P on H. Considering the transformation E → U∗φ(E)U if necessary, we can clearly assume without any loss of generality that φ(P) = P holds for every projection P. It remains to prove that we have φ(E) = E for every effect E as well. By Lemma 1 we find that there is a bijective strictly monotone increasing function f : [0, 1] → [0, 1] such that φ(λI) = f(λ)I
(λ ∈ [0, 1]).
Let P be a rank-1 projection. Since φ(P ) is also of rank 1, it follows from 0 ≤ φ(λP ) ≤ φ(P ) that φ(λP ) is a scalar multiple of φ(P ). Therefore, we have a bijective strictly monotone increasing function fP : [0, 1] → [0, 1] such that φ(λP ) = fP (λ)φ(P ) (λ ∈ [0, 1]). The strength of φ(λI ) = f (λ)I along φ(P ) is obviously f (λ). On the other hand, we have fP (µ)φ(P ) = φ(µP ) ≤ φ(λI ) = f (λ)I if and only if µ ≤ λ, which shows that the strength of φ(λI ) along φ(P ) is fP (λ). Therefore, we have fP = f . Consequently, φ(λP ) = f (λ)φ(P )
(λ ∈ [0, 1])
holds for every rank-1 projection P on H . Let P1 , . . . , Pn be pairwise orthogonal rank-1 projections and λ1 , . . . , λn ∈ [0, 1]. Set P = P1 + . . . + Pn and E = λ1 P1 + . . . + λn Pn . Since φ(E) ≤ φ(P ), we deduce that φ(E) acts on the n-dimensional subspace rng φ(P ) (this means that φ(E) sends the range of φ(P ) into itself and φ(E) is zero on the orthogonal complement of rng φ(P )). Since each Pi commutes with E, it follows from the coexistence preserving property of φ that φ(Pi ) commutes with φ(E). As the sum of the φ(Pi )’s is φ(P ), we readily obtain that φ(E) = µ1 φ(P1 ) + . . . + µn φ(Pn )
(2)
holds for some scalars µi ∈ [0, 1]. Since the strength of E along Pi is λi, by the order preserving property of φ we infer that the strength of φ(E) along φ(Pi) is f(λi). On the other hand, it follows from the equality (2) that the strength of φ(E) along φ(Pi) is µi. Therefore, we have φ(λ1P1 + . . . + λnPn) = φ(E) = f(λ1)φ(P1) + . . . + f(λn)φ(Pn). This gives us that for any finite rank operator A ∈ E(H) we have φ(A) = f(A), where f(A) denotes the image of f under the continuous functional calculus of the self-adjoint operator A. (Observe that, as f : [0, 1] → [0, 1] is a strictly monotone increasing bijection, it is a continuous function.) It follows from the spectral theorem and the properties of the spectral integral that for any effect A ∈ E(H) there is a net (Aα) of finite rank effects such that Aα ≤ A and Aα → A in the strong operator topology. Since multiplication is continuous on bounded sets of operators with respect to the strong operator topology, we obtain that p(Aα) → p(A) strongly for every polynomial p. As, by Weierstrass’s theorem, f can be approximated by polynomials in the uniform norm, we find that f(Aα) → f(A) strongly. Since f(Aα) = φ(Aα) ≤ φ(A), we obtain that f(A) ≤ φ(A)
(A ∈ E(H )).
(3)
To see the reverse inequality, observe that we have φ −1 (λI ) = f −1 (λ)I . Therefore, considering φ −1 in the place of φ, we get that f −1 (A) ≤ φ −1 (A).
(4)
It follows from (3) and (4) that A = f⁻¹(f(A)) ≤ φ⁻¹(f(A)) ≤ φ⁻¹(φ(A)) = A. This implies that φ(A) = f(A) holds for every A ∈ E(H). We show that f(λ) + f(1 − λ) = 1 (λ ∈ [0, 1]). Let 0 < λ < 1 and let P be a rank-1 projection. Pick any 0 < ε < 1 − λ. Clearly, the spectrum of λP + εP is {0, λ + ε}. Let Q be a rank-1 projection such that rng Q ∩ rng P = {0}. If δ denotes the largest eigenvalue of the positive operator λP + εQ, then by Weyl’s perturbation theorem (see, for example, [1, Corollary III.2.6]) we have |(λ + ε) − δ| ≤ ‖(λP + εP) − (λP + εQ)‖ = ε‖P − Q‖. (We mention that, as the referee remarked in his/her report, the above inequality could in our particular case also be verified by direct computation.) So, if Q is close enough to P, then the largest eigenvalue of the operator λP + εQ is close enough to λ + ε, and hence it is less than 1, which shows that λP + εQ ∈ E(H). This implies that the effects λP and εQ are coexistent, and by the properties of φ it follows that the same must hold true for f(λ)P = f(λ)φ(P) = φ(λP) and f(ε)Q = f(ε)φ(Q) = φ(εQ). By Lemma 2 we infer that f(λ)P + f(ε)Q ≤ I. If we let Q converge to P, we get f(λ)P + f(ε)P ≤ I. This gives us that f(λ) + f(ε) ≤ 1. Therefore, letting ε tend to 1 − λ and using the continuity of f, we obtain f(λ) + f(1 − λ) ≤ 1 or, equivalently, f(1 − λ) ≤ 1 − f(λ).
(5)
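The Weyl-type estimate |(λ + ε) − δ| ≤ ε‖P − Q‖, where δ is the largest eigenvalue of λP + εQ for rank-1 projections P, Q, can be observed numerically. This Python sketch assumes H = R² and projections onto (cos t, sin t); the values λ = 0.5, ε = 0.3 and the angles are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math

# Numerical illustration (assumption: H = R^2, rank-1 projections onto
# (cos t, sin t)) of |(lam + eps) - delta| <= eps * ||P - Q||, where delta is
# the largest eigenvalue of lam*P + eps*Q.


def projection(t):
    c, s = math.cos(t), math.sin(t)
    return [[c * c, c * s], [c * s, s * s]]


def eigenvalues(X):
    # Eigenvalues (largest first) of a real symmetric 2x2 matrix.
    a, b, d = X[0][0], X[0][1], X[1][1]
    m, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    return m + r, m - r


def spectral_norm(X):
    return max(abs(e) for e in eigenvalues(X))


lam, eps = 0.5, 0.3                 # 0 < eps < 1 - lam
P = projection(0.0)
rows = []
for angle in (0.5, 0.1, 0.01):
    Q = projection(angle)
    M = [[lam * P[i][j] + eps * Q[i][j] for j in range(2)] for i in range(2)]
    delta = eigenvalues(M)[0]
    bound = eps * spectral_norm([[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)])
    # (angle, delta, Weyl bound holds, lam*P + eps*Q is an effect)
    rows.append((angle, delta, abs(lam + eps - delta) <= bound + 1e-12, delta < 1))
for row in rows:
    print(row)
```

For rank-1 projections at angle t one has ‖P − Q‖ = |sin t|, so the bound shrinks as Q approaches P, which is the mechanism the argument relies on.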
Applying the above argument for φ −1 instead of φ, we find that f −1 (1 − λ) ≤ 1 − f −1 (λ).
(6)
Using the monotonicity of f −1 and the inequalities (5), (6), we have λ = f −1 (f (λ)) ≤ f −1 (1 − f (1 − λ)) ≤ 1 − f −1 (f (1 − λ)) = λ. Therefore, we deduce f −1 (f (λ)) = f −1 (1 − f (1 − λ)) which implies that f (λ) + f (1 − λ) = 1 for every 0 < λ < 1. It is trivial that the equality is valid for λ = 0, 1 as well. This implies that f (I − A) = I − f (A) holds for every effect A ∈ E(H ) which yields that φ satisfies φ(I − A) = I − φ(A)
(A ∈ E(H )).
Therefore, φ is an ortho-order automorphism of E(H ). Applying Ludwig’s theorem on the form of those automorphisms we have φ(E) = E (E ∈ E(H )). This completes the proof of the theorem. Remark. A careful examination of the proof of Theorem 1 shows that if φ : E(H ) → E(H ) is a bijective map which preserves the order and the commutativity in both directions, then there exists an either unitary or antiunitary operator U on H and a strictly monotone increasing bijection f : [0, 1] → [0, 1] such that φ is of the form φ(A) = Uf (A)U ∗
(A ∈ E(H )).
(7)
Clearly, it follows from the order preserving property of φ that f as well as f⁻¹ are operator monotone on [0, 1]. (Recall that a continuous real function g on an interval is called operator monotone if for arbitrary selfadjoint operators A, B with spectrum in the domain of g, the relation A ≤ B implies g(A) ≤ g(B).) It is easy to see that, conversely, if U and f are as above, then formula (7) defines a bijection of E(H) which preserves the order and commutativity in both directions. (This last property follows from the fact that if A, B commute, then so do their polynomials; since every continuous function on a compact subset of the real line can be uniformly approximated by polynomials, any continuous functions of A and B also commute.) Now the question is whether there exist nontrivial continuous functions f : [0, 1] → [0, 1] such that both f and f⁻¹ are operator monotone. The answer to this question is affirmative. Indeed, consider, for example, the function f(λ) = 2λ/(1 + λ) (λ ∈ [0, 1]). We have f(λ) = 2 − 2/(1 + λ) and f⁻¹(µ) = µ/(2 − µ) = 2/(2 − µ) − 1, and both functions are operator monotone because inversion is order-reversing on positive invertible operators. From a mathematical point of view it is remarkable that the preservation of both the order and the coexistency does characterize the automorphisms of E(H), while the preservation of the order and the commutativity (which is a much more widely used property than coexistency) does not. We continue with the proof of our second theorem. We shall need the following observation.
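The operator monotonicity of f(λ) = 2λ/(1 + λ) and of its inverse, asserted in the Remark after the proof of Theorem 1, can be spot-checked numerically, in contrast with t ↦ t², which is monotone on [0, 1] but not operator monotone. This is numerical evidence only, not a proof. The Python sketch assumes H = R², generates random pairs of effects A ≤ B, and applies the continuous functional calculus through the two spectral projections of a symmetric 2×2 matrix; the helper names and the specific counterexample matrices are illustrative choices.

```python
import math
import random

# Numerical spot check, not a proof. Assumptions: H = R^2, and random pairs of
# 2x2 effects A <= B are generated as A = X/c, B = (X + Y)/c with X, Y >= 0.
# The helper names are hypothetical.


def eig(X):
    # Eigenvalues (largest first) of a real symmetric 2x2 matrix.
    a, b, d = X[0][0], X[0][1], X[1][1]
    m, r = (a + d) / 2, math.hypot((a - d) / 2, b)
    return m + r, m - r


def func_calc(h, X):
    # h(X) = h(l1) P1 + h(l2) (I - P1), with spectral projection P1 = (X - l2 I)/(l1 - l2).
    l1, l2 = eig(X)
    if l1 - l2 < 1e-12:
        return [[h(l1), 0.0], [0.0, h(l1)]]  # X is (numerically) a multiple of I
    P1 = [[(X[i][j] - (l2 if i == j else 0.0)) / (l1 - l2) for j in range(2)] for i in range(2)]
    return [[h(l1) * P1[i][j] + h(l2) * ((1.0 if i == j else 0.0) - P1[i][j])
             for j in range(2)] for i in range(2)]


def is_psd(X, tol=1e-9):
    a, b, d = X[0][0], X[0][1], X[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


def diff(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]


def random_psd(rng):
    u = [rng.uniform(-1, 1) for _ in range(2)]
    v = [rng.uniform(-1, 1) for _ in range(2)]
    return [[u[i] * u[j] + v[i] * v[j] for j in range(2)] for i in range(2)]


def keeps_order(h, trials=2000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        X, Y = random_psd(rng), random_psd(rng)
        S = [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]
        c = eig(S)[0]                        # rescale so that 0 <= A <= B <= I
        A = [[x / c for x in row] for row in X]
        B = [[s / c for s in row] for row in S]
        if not is_psd(diff(func_calc(h, B), func_calc(h, A))):
            return False
    return True


def f(t):
    return 2 * t / (1 + t)


def f_inv(s):
    return s / (2 - s)                       # inverse of f on [0, 1]


def square(t):
    return t * t


A0 = [[1 / 3, 0.0], [0.0, 0.0]]
B0 = [[2 / 3, 1 / 3], [1 / 3, 1 / 3]]        # 0 <= A0 <= B0 <= I
square_fails = not is_psd(diff(func_calc(square, B0), func_calc(square, A0)))
print(keeps_order(f), keeps_order(f_inv), square_fails)   # True True True
```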
Lemma 5. Let E ∈ E(H) and let D be a dense subset of the set of all unit vectors in H. Pick 0 < λ ≤ 1. If λ(E, Pϕ) = λ for every ϕ ∈ D, then E = λI. Proof. As λPϕ ≤ E for every ϕ ∈ D and D is dense in the set of all unit vectors, it follows that λPϕ ≤ E holds for every unit vector ϕ in H. According to the result [6, Theorem 3], we deduce that the square root of E is surjective, which gives us that E^{1/2} and E are invertible. In [6, Theorem 4], Busch and Gudder gave an explicit formula for the strength of an arbitrary effect along an arbitrary ray. It follows from that result that $\|E^{-1/2}\phi\|^{-2} = \lambda$ for every ϕ ∈ D, which implies that we have the same equality for every unit vector ϕ ∈ H. This gives us that
$$\langle E^{-1}\phi, \phi\rangle = \frac{1}{\lambda}\langle \phi, \phi\rangle \qquad (\phi \in H).$$
Hence, we have E = λI.
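The strength λ(E, Pϕ) = sup{λ ∈ [0, 1] : λPϕ ≤ E} used in Lemmas 3 and 5 can be computed directly in small dimensions. The following Python sketch assumes H = R² and compares a bisection on λ with the closed form $1/\langle E^{-1}\phi,\phi\rangle = \|E^{-1/2}\phi\|^{-2}$, which is valid for invertible E; the helper names and the test matrices are illustrative assumptions, not taken from [6].

```python
import math

# Sketch (assumption: H = R^2, effects as 2x2 real symmetric nested lists) of
# the strength lambda(E, P_v) = sup{lam in [0, 1] : lam P_v <= E}, computed
# (i) by bisection on lam and (ii) by the closed form 1/<E^{-1} v, v>,
# valid for invertible E.


def is_psd(X, tol=1e-12):
    a, b, d = X[0][0], X[0][1], X[1][1]
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


def strength_bisect(E, v, iters=60):
    # The set {lam : lam P_v <= E} is an interval [0, s]; bisect for s.
    P = [[v[i] * v[j] for j in range(2)] for i in range(2)]

    def feasible(lam):
        return is_psd([[E[i][j] - lam * P[i][j] for j in range(2)] for i in range(2)])

    if feasible(1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def strength_formula(E, v):
    # 1 / <E^{-1} v, v> for an invertible E (2x2 inverse via the adjugate).
    a, b, d = E[0][0], E[0][1], E[1][1]
    det = a * d - b * b
    q = (d * v[0] * v[0] - 2 * b * v[0] * v[1] + a * v[1] * v[1]) / det
    return 1 / q


E = [[0.2, 0.0], [0.0, 0.8]]
diag_dir = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(strength_bisect(E, [1.0, 0.0]), strength_bisect(E, [0.0, 1.0]))  # approximately 0.2 and 0.8
print(strength_bisect(E, diag_dir), strength_formula(E, diag_dir))      # both approximately 0.32
```

Along the eigenvectors of E the strength equals the eigenvalue, as in Lemma 3, and for E = λI it equals λ along every ray, which is the situation characterized by Lemma 5.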
We can now prove Theorem 2. Proof of Theorem 2. Just as in the proof of Theorem 1, we obtain that φ preserves the projections in both directions as well as their rank, and that for every rank-1 projection P there is a strictly monotone increasing bijection fP : [0, 1] → [0, 1] such that φ(λP) = fP(λ)φ(P) (λ ∈ [0, 1]). Let ϕ, ψ be as in the theorem. If the range of P is not orthogonal to ϕ, then we compute fP(λ)⟨φ(P)ψ, ψ⟩ = ⟨φ(λP)ψ, ψ⟩ = ⟨λPϕ, ϕ⟩ = λ⟨Pϕ, ϕ⟩. This gives us that fP(λ) = cλ (λ ∈ [0, 1]) for some constant c. Since fP is a bijection of [0, 1] onto itself, it follows that c = 1. So, in the present case we have φ(λP) = λφ(P)
(λ ∈ [0, 1]).
Since ‖φ(Pϕ)ψ‖² = ⟨φ(Pϕ)ψ, ψ⟩ = ⟨Pϕϕ, ϕ⟩ = 1, we deduce that ‖φ(Pϕ)ψ‖ = 1 = ‖ψ‖, which implies that ψ is in the range of the projection φ(Pϕ). Hence we have φ(Pϕ) = Pψ. Suppose now that PPϕ = 0. As ⟨φ(P)ψ, ψ⟩ = ⟨Pϕ, ϕ⟩ = 0, we have φ(P)ψ = 0, implying that φ(P)Pψ = 0. Therefore, φ(P) is orthogonal to Pψ = φ(Pϕ). As φ⁻¹ has the same properties as φ (with the roles of ϕ and ψ interchanged), we obtain that a rank-1 projection P is orthogonal to Pϕ if and only if φ(P) is orthogonal to φ(Pϕ). Let P be a rank-1 projection orthogonal to Pϕ. Let 0 < λ ≤ 1 be arbitrary but fixed and consider the operator A = φ(λ(P + Pϕ)). Set Q = φ(P + Pϕ). Since A ≤ Q, it follows that A acts on the range of the rank-2 projection Q. Let Q0 be any rank-1 subprojection of Q which is not orthogonal to φ(Pϕ). Then φ⁻¹(Q0) is a rank-1 subprojection of P + Pϕ which is not orthogonal to Pϕ. Clearly, the strength of λ(P + Pϕ) along φ⁻¹(Q0) is λ. It follows from what we have proved above for rank-1 projections not orthogonal to Pϕ that the strength of A along Q0
is also λ. Since Q0 runs through the set of all rank-1 subprojections of Q which are not orthogonal to φ(Pϕ), Lemma 5 (applied on the two-dimensional space rng Q) yields A = λQ. Therefore, we have fP(λ)φ(P) = φ(λP) ≤ φ(λ(P + Pϕ)) = A = λQ. This gives us that fP(λ) ≤ λ (λ ∈ [0, 1]). (Observe that in fact we have the above inequality only for positive λ’s, but fP(0) = 0 is trivial because of the definition of fP.) Applying the above argument to φ⁻¹ in place of φ, we find that fP⁻¹(λ) ≤ λ (λ ∈ [0, 1]). Since, due to the order preserving property of φ, fP is monotone increasing, it follows that fP(λ) = λ. To sum up what we have already proved, we have φ(λP) = λφ(P)
(8)
for every λ ∈ [0, 1] and rank-1 projection P on H, no matter whether P is orthogonal to Pϕ or not. By (8) the strength of φ(λI) along every rank-1 projection is λ and hence, by Lemma 5, we have φ(λI) = λI. Now we are in a position to prove that φ preserves the orthogonality between projections. Let P, Q be rank-1 projections with PQ = 0. Choose a projection R such that P ≤ R and Q ≤ I − R. Let 0 < λ < µ ≤ 1. Set E = λR + µ(I − R). We have λI = φ(λI) ≤ φ(E) ≤ φ(µI) = µI. Since the strength of E along P is λ, it follows from (8) that the strength of φ(E) along φ(P) is also λ. Similarly, we obtain that the strength of φ(E) along φ(Q) is µ. Lemma 3 shows that rng φ(P), rng φ(Q) are eigensubspaces of φ(E) and the corresponding eigenvalues are λ, µ, respectively. Since the eigensubspaces of a self-adjoint operator corresponding to different eigenvalues are mutually orthogonal, it follows that φ(P)φ(Q) = 0. Since φ is order-preserving and every projection is the supremum of all rank-1 projections which are included in it, this implies that φ preserves the orthogonality between arbitrary projections. So, for any projections P, Q on H we have PQ = 0 if and only if φ(P)φ(Q) = 0. It follows that φ is a bijection of the set of all projections on H which preserves the order and the orthogonality in both directions. By the fundamental theorem of projective geometry there exists an either unitary or antiunitary operator U on H such that φ(P) = UPU∗ holds for every projection P on H. By (8) it follows that φ(λP) = U(λP)U∗
(9)
for every λ ∈ [0, 1] and rank-1 projection P . The operators of the form λP are the weak atoms in E(H ) [6, Lemma 2]. The statement [6, Corollary 3] says that every effect E is the supremum of all weak atoms which are less than or equal to E. It follows from (9) that φ(A) = U AU ∗
(A ∈ E(H )).
If U above is unitary, then we have ⟨AU∗ψ, U∗ψ⟩ = ⟨φ(A)ψ, ψ⟩ = ⟨Aϕ, ϕ⟩
for every A ∈ E(H). It is easy to see that this implies U∗ψ = εϕ for some complex number ε of modulus 1. This yields (εU)ϕ = ψ. Replacing U by εU, which leaves φ(A) = UAU∗ unchanged, we obtain the last assertion of our theorem. If U is antiunitary, then one can argue in a similar way. This completes the proof of the theorem. Remark. We note that the same conclusion as in Theorem 2 holds true if φ : E(H) → E(H) is a bijective map which preserves the order in both directions as well as the spectrum (or the numerical range, or, more generally, the numerical radius = the spectral radius = the norm of effects). We omit the proofs since those results have no real physical content (with the only possible exception of the numerical range, which can be interpreted as the set of all probabilities of an effect corresponding to pure states). We now turn to the proof of our last theorem. Just as before, we need some auxiliary results. Lemma 6. Let f be a linear functional on the real linear space Bs(H) of all bounded self-adjoint operators on H. If f is bounded from below on the operator interval [0, I], then f is a bounded linear functional. Proof. Let K be a real number such that K ≤ f(A) (A ∈ [0, I]). We show that f is bounded also from above on [0, I]. Suppose on the contrary that for every n ∈ N there exists an operator $A_n \in [0, I]$ such that $f(A_n) \ge 2^n$. Let $B = \sum_{n=1}^{\infty} A_n/2^n \in [0, I]$ and $B_n = \sum_{k=1}^{n} A_k/2^k \in [0, I]$. Clearly, we have $B - B_n \in [0, I]$, and hence $K \le f(B - B_n)$. Since $f(B_n) = \sum_{k=1}^{n} f(A_k)/2^k \ge n$, this implies that $K + n \le K + f(B_n) \le f(B)$. Since this holds for every n ∈ N, we arrive at a contradiction. This gives us that f is bounded on [0, I]. Since every self-adjoint operator of norm not greater than 1 is the difference of two elements of [0, I], we obtain the boundedness of f. Lemma 7. Assume dim H ≥ 2. Let f be a bounded linear functional on Bs(H). Suppose that f(P) = c for every nonzero projection P, where c is a fixed scalar. Then we have f = c = 0. Proof.
Clearly, every projection P ≠ I is the difference of two nonzero projections, namely P = I − (I − P). Since f takes the value c at both of them, f vanishes on the set of all such projections; as dim H ≥ 2, nonzero projections P ≠ I exist, and hence c = 0. Since I is the sum of two projections different from I, we obtain that f vanishes on the whole set of projections. By the spectral theorem, the linear span of all projections is norm-dense in $B_s(H)$, and hence f = 0.
Proof of Theorem 3. There is a beautiful result due to Páles [14] on segment preserving maps between general convex sets in linear spaces. Its assertion can be translated to our situation in the following way: if K is a noncollinear convex set in a real linear space X and φ : K → K is a bijective function with the property that x is a mixture (i.e., a convex combination) of y, z if and only if φ(x) is a mixture of φ(y), φ(z) (x, y, z ∈ K), then φ can be written in the form
$$\phi(x) = \frac{\psi(x) + b}{f(x) + c} \qquad (x \in K), \tag{10}$$
where ψ : X → X is a linear transformation, b ∈ X is fixed, f : X → R is a linear functional, c ∈ R is fixed, and the denominator in (10) is everywhere positive on K.
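The form (10) is compatible with mixture preservation for a simple reason: writing $w_y = f(y) + c > 0$ and $w_z = f(z) + c > 0$, one has $\phi(\lambda y + (1-\lambda)z) = \mu\phi(y) + (1-\mu)\phi(z)$ with $\mu = \lambda w_y/(\lambda w_y + (1-\lambda)w_z) \in [0, 1]$. The Python sketch below checks this identity in exact rational arithmetic for a map of the form (10) on $\mathbb{R}^2$. The particular Psi, b, f and c are arbitrary illustrative choices (hypothetical, not from the text), and the helper names are ours.

```python
from fractions import Fraction as Fr

def mat_vec(M, x):
    """Apply a linear map, given as a list of rows, to a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def make_phi(Psi, b, f, c):
    """Return phi(x) = (Psi x + b) / (f(x) + c), the form (10); f is a covector."""
    def phi(x):
        denom = dot(f, x) + c
        if denom <= 0:
            raise ValueError("f(x) + c must be positive on the convex set")
        return [(p + bi) / denom for p, bi in zip(mat_vec(Psi, x), b)]
    return phi

def image_weight(f, c, y, z, lam):
    """Weight mu with phi(lam*y + (1-lam)*z) = mu*phi(y) + (1-mu)*phi(z)."""
    wy, wz = dot(f, y) + c, dot(f, z) + c
    return lam * wy / (lam * wy + (1 - lam) * wz)

# Hypothetical data: any Psi, b, f, c with f + c > 0 at the points used.
Psi = [[Fr(2), Fr(1)], [Fr(0), Fr(1)]]
b = [Fr(1), Fr(-1)]
f = [Fr(1, 2), Fr(1, 4)]
c = Fr(1)
phi = make_phi(Psi, b, f, c)

y, z, lam = [Fr(1, 5), Fr(3, 10)], [Fr(9, 10), Fr(1, 10)], Fr(2, 5)
x = [lam * yi + (1 - lam) * zi for yi, zi in zip(y, z)]
mu = image_weight(f, c, y, z, lam)
rhs = [mu * p + (1 - mu) * q for p, q in zip(phi(y), phi(z))]
print(phi(x) == rhs, 0 <= mu <= 1)  # True True
```

Note that μ differs from λ unless f(y) = f(z): maps of the form (10) send mixtures to mixtures but do not, in general, preserve the mixing coefficients.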
Adapting this result for E(H), we have a linear transformation ψ on $B_s(H)$, an operator $B \in B_s(H)$, a linear functional $f : B_s(H) \to \mathbb{R}$ and a constant $c \in \mathbb{R}$ such that f + c is positive on E(H) and
$$\phi(A) = \frac{\psi(A) + B}{f(A) + c} \qquad (A \in E(H)). \tag{11}$$
By Lemma 6, f is a bounded linear functional on $B_s(H)$. Since 0 ≤ φ(A) ≤ I for every A ∈ E(H), it follows that 0 ≤ ψ(A) + B ≤ (f(A) + c)I (A ∈ E(H)). If M > 0 denotes an upper bound of the values of f + c on E(H), then we have
$$-B \le \psi(A) \le MI - B \qquad (A \in E(H)).$$
This implies that
$$-\|B\| I \le \psi(A) \le (M + \|B\|) I \qquad (A \in E(H)),$$
which yields that the numerical range of the operator ψ(A) (A ∈ E(H)) is contained in the interval $[-\|B\|, M + \|B\|]$. Since the numerical radius and the norm of a self-adjoint operator coincide, we obtain that
$$\|\psi(A)\| \le M + \|B\| \qquad (A \in E(H)),$$
that is, ψ is bounded on E(H). Just as in the proof of Lemma 6, this implies the boundedness of the linear transformation ψ. By the continuity of ψ and f we obtain that φ is norm-continuous and, as $\phi^{-1}$ has the same properties as φ, we deduce that $\phi^{-1}$ is also continuous, that is, φ is a homeomorphism of E(H). By the preserving property of φ it follows that φ preserves the extreme points of the convex set E(H) in both directions. It is well known that the extreme points of E(H) are exactly the projections, hence we obtain that φ preserves the projections in both directions. The two trivial projections 0, I are distinguished in the set P(H) of all projections by the following property: 0, I are the only projections which cannot be connected to a different projection via a continuous curve inside the set of all projections (see the proof of [12, Theorem 1]). By the known properties of φ we infer that φ permutes the projections 0, I, that is, it maps 0 either to 0 or to I. Considering the transformation A ↦ φ(I − A) if necessary, we can assume that φ sends 0 to 0 and I to I. Since ψ, being a linear transformation, maps 0 to 0, it now follows from (11) that the operator B is 0. So, we have
$$\phi(A) = \frac{\psi(A)}{f(A) + c} \qquad (A \in E(H)). \tag{12}$$
Since φ sends projections to projections, it follows from (12) that the linear transformation ψ sends every projection to a scalar multiple of a projection, where the scalar might, of course, depend on the projection in question. We show that in fact there is no such dependence. Let P be a nontrivial projection. We have nonzero projections P′, Q′ and nonzero scalars λ, µ, ν such that
$$\psi(P) = \lambda P', \qquad \psi(I - P) = \mu Q', \qquad \psi(I) = \nu I.$$
Observe that ν does not depend on P. By the additivity of ψ we obtain νI = λP′ + µQ′, and this implies
$$\lambda P' = \nu I - \mu Q' = \nu (I - Q') + (\nu - \mu) Q'. \tag{13}$$
Clearly, Q′ ≠ I. Since I − Q′ is a nonzero projection orthogonal to Q′, Eq. (13) shows that ν is an eigenvalue of λP′; as ν ≠ 0, we infer that λ = ν. This shows that (1/ν)ψ(P) is a projection for every nonzero projection P. Since φ sends projections to projections, we easily obtain from (12) that ν/(f(P) + c) = 1, that is, f(P) = ν − c, for any nonzero projection P. Since ν, c are constants, we infer from Lemma 7 that f = 0 and c = ν. Therefore, by (12) we conclude that φ = (1/ν)ψ holds on E(H), which shows that φ extends to a linear transformation on $B_s(H)$. Therefore, φ is a mixture automorphism of E(H), and [12, Corollary 2] applies to complete the proof.
Remark. We mention that Theorem 3 can be generalized to the case of von Neumann algebras. Namely, one can easily modify the proofs of Lemma 6 and Lemma 7 as well as Theorem 3 (one should also consult the proof of [12, Theorem 1]) to obtain the following statement: If $\mathcal{A} \neq \mathbb{C}I$ is a von Neumann factor on the Hilbert space H and φ is a bijection of the effect algebra of $\mathcal{A}$ (which is the convex set $[0, I] \cap \mathcal{A}$) that preserves mixtures in both directions, then φ is a mixture automorphism. As for Theorem 1 and Theorem 2, the presented proofs heavily depend on the fact that every projection on the underlying Hilbert space belongs to the effect algebra. In our opinion, it would be a nice achievement if one could find an extension of those theorems to the case of effect algebras of von Neumann algebras.
Acknowledgements. This research was supported by the following sources: (1) Hungarian National Foundation for Scientific Research (OTKA), Grant No. T030082, T031995, (2) A grant from the Ministry of Education, Hungary, Reg. No. FKFP 0349/2000. The author is very grateful to the referee whose valuable comments helped to improve the presentation of the paper.
References
1. Bhatia, R.: Matrix Analysis. Berlin–Heidelberg–New York: Springer-Verlag, 1997
2. Busch, P., Grabowski, M. and Lahti, P.J.: Operational Quantum Physics. Berlin–Heidelberg–New York: Springer-Verlag, 1995
3. Busch, P., Lahti, P.J. and Mittelstaedt, P.: The Quantum Theory of Measurement. Berlin–Heidelberg–New York: Springer-Verlag, 1991
4. Cassinelli, G., De Vito, E., Lahti, P. and Levrero, A.: Symmetry groups in quantum mechanics and the theorem of Wigner on the symmetry transformations. Rev. Math. Phys. 8, 921–941 (1997)
5. Cassinelli, G., De Vito, E., Lahti, P. and Levrero, A.: A theorem of Ludwig revisited. Found. Phys. 30, 1755–1761 (2000)
6. Busch, P. and Gudder, S.P.: Effects as functions on projective Hilbert spaces. Lett. Math. Phys. 47, 329–337 (1999)
7. Gudder, S.P.: Sharp and unsharp quantum effects. Adv. Appl. Math. 20, 169–187 (1998)
8. Gudder, S.P. and Pulmannová, S.: Representation theorem for convex effect algebras. Comment. Math. Univ. Carolinae 39, 645–659 (1998)
9. Li, C.K. and Tsing, N.K.: Linear preserver problems: A brief introduction and some special techniques. Linear Algebra Appl. 162–164, 217–235 (1992)
10. Ludwig, G.: Foundations of Quantum Mechanics, Vol. I. Berlin–Heidelberg–New York: Springer-Verlag, 1983
11. Kraus, K.: States, Effects and Operations. Lecture Notes in Physics, Vol. 190. Berlin–Heidelberg–New York: Springer-Verlag, 1983
12. Molnár, L.: On some automorphisms of the set of effects on Hilbert space. Lett. Math. Phys. 51, 37–45 (2000)
13. Molnár, L. and Páles, Zs.: ⊥-order automorphisms of effect algebras: The two-dimensional case. J. Math. Phys. 42, 1907–1912 (2001)
14. Páles, Zs.: Characterization of segment preserving maps. Preprint
Communicated by H. Araki
Commun. Math. Phys. 223, 451 – 464 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Total Convergence or General Divergence in Small Divisors
R. Pérez-Marco^{1,2,⋆}
1 UCLA, Dept. of Mathematics, 405, Hilgard Ave., Los Angeles, CA 90095-1555, USA. E-mail: [email protected]
2 CNRS UMR 8628, Université Paris-Sud, Mathématiques, 91405 Orsay, France
⋆ Research supported by NSF
Received: 19 September 2000 / Accepted: 27 February 2001
Dedicated to the memory of M. R. Herman
Abstract: We study generic holomorphic families of dynamical systems presenting problems of small divisors with fixed arithmetic. The characteristic features are delicate problems of convergence of formal power series due to Small Divisors. We prove the following dichotomy: we have convergence for all parameter values, or divergence everywhere except for an exceptional pluri-polar set of parameters. We illustrate this general principle in different problems of Small Divisors. As an application we obtain new, richer families of non-linearizable examples in the Siegel problem when the Bruno condition is violated, generalizing and extending to higher dimension previous results of Yoccoz and the author.
Introduction
In this article we study generic (polynomial) holomorphic families of dynamical systems presenting problems of small divisors with fixed arithmetic. Generally speaking, the characteristic feature is the existence of a formal solution to a functional equation whose convergence is problematic due to the presence of small divisors. The principle our theorems illustrate is: there is total convergence for all parameter values, or general divergence except possibly for a very small exceptional set of parameter values.
The germinal idea can be traced back to Y. Ilyashenko, who in [Il] studies divergence in problems of small divisors from divergence of the homological (or linearized) equation. Ilyashenko's paper contains a remarkable idea: we find there, for the first time in Small Divisors, the study of linear deformations of the system and the use of the polynomial dependence of the new formal linearizations. A similar idea, but not quite in the same problem, was used by H. Poincaré to show that linear deformations of completely
integrable hamiltonians are not generally completely integrable with analytic first integrals depending analytically on the parameter ([Poi] Vol. I, Chap. V). This is the key preliminary step in his difficult proof of the non-existence of non-trivial local analytic first integrals in the three body problem (for some particular configurations of masses). Linear deformations have been fruitfully used by J.-C. Yoccoz ([Yo] p. 58). He proves that in the Siegel problem the quadratic polynomial is the worst linearizable holomorphic germ. The only ingredient in this proof that is not in Ilyashenko's is the classical Douady–Hubbard straightening theorem for polynomial-like mappings. Yoccoz simplifies Ilyashenko's argument by replacing Nadirashvili's lemma with the maximum principle. In doing so he loses the strength of the original approach, in particular its potential theoretic aspects. Non-linear polynomial perturbations were used by the author in [PM1] to generalize Yoccoz's result to higher degree polynomials. In this article we clarify and strengthen the role played by potential theory in parameter space. A key point is the observation that Nadirashvili's lemma can be improved by using the Bernstein–Walsh lemma from approximation theory. In this way we identify the thinnest notion of exceptional set: in parameter space the exceptional set is pluri-polar (i.e. there is a pluri-sub-harmonic function identically −∞ on this set), which is much stronger than the original measure 0 condition. The techniques in this article are applicable to virtually any holomorphic problem in small divisors where the coefficients of the divergent series depend polynomially on the parameters (as we will see, this happens in most of the problems). We have selected a few illustrative ones, guided mainly by our personal taste. We only consider here polynomial families. The same proof extends easily to more general holomorphic families (see Remark 4).
As far as the author knows, the first person who studied small divisors problems using ingredients from potential theory in parameter space is M. Herman (see [He1] and [He2]).
Linearization.
Theorem 1. Let n, m ≥ 1 and d ≥ 0. For a multi-index $i = (i_1, \dots, i_m) \in \mathbb{N}^m$ with $0 \le i_1 + \dots + i_m \le d$, let $\varphi_i$ be a germ of holomorphic map $\varphi_i : (\mathbb{C}^n, 0) \to (\mathbb{C}^n, 0)$ of order larger than or equal to 2 (i.e. $\varphi_i(z) = O(z^2)$). For $t = (t_1, \dots, t_m) \in \mathbb{C}^m$ we consider the holomorphic family of germs of holomorphic diffeomorphisms, $z \in \mathbb{C}^n$,
$$f_t(z) = Az + \sum_{\substack{i = (i_1, \dots, i_m) \\ 0 \le i_1 + \dots + i_m \le d}} t^i \varphi_i(z),$$
where $A \in GL_n(\mathbb{C})$ is a fixed linear map, $A = D_0 f_t$, with non-resonant eigenvalues. Then all maps $f_t$, $t \in \mathbb{C}^m$, are formally linearizable, i.e. there exists a unique formal map $h_t$ with $h_t(0) = 0$ and $D_0 h_t = I$ such that the formal equation $h_t \circ f_t = A \circ h_t$ is satisfied. We have the following dichotomy:
1) The holomorphic family $(f_t)_{t \in \mathbb{C}^m}$ is holomorphically linearizable, that is, for all $t \in \mathbb{C}^m$, $h_t$ defines a germ of holomorphic diffeomorphism. Moreover, the radius $R(h_t)$ of convergence of the linearization $h_t$ is bounded from below on compact sets and, more precisely, for some $C_0 > 0$ and any $t \in \mathbb{C}^m$,
$$R(h_t) \ge \frac{C_0}{1 + \|t\|^d}.$$
2) Except for an exceptional pluri-polar set $E \subset \mathbb{C}^m$ of values of t, $f_t$ is not holomorphically linearizable.
Recall that $E \subset \mathbb{C}^m$ is a pluri-polar set if for each $z \in E$ there is a neighborhood U of z and a pluri-sub-harmonic function u such that $E \cap U \subset u^{-1}(-\infty)$. This implies that the set has Lebesgue measure 0 and is small in a strong sense. For example, there are $C^\infty$ smooth arcs in $\mathbb{C}^m$ which are not pluri-polar. We refer to [Kl] for basic notions of pluri-potential theory. In dimension 1 (m = 1) this means that the set is polar in the usual sense of potential theory, that is, it has logarithmic capacity 0. Such a set has not only measure 0 but even Hausdorff dimension 0 (see [Ra] or [Tsu]).
Remarks. 1) We recall that the eigenvalues $(\lambda_1, \lambda_2, \dots, \lambda_n)$ of A, counted with multiplicity, are non-resonant if $\lambda_i - \lambda_1^{i_1} \cdots \lambda_n^{i_n} \neq 0$ for all $(i_1, \dots, i_n) \in \mathbb{N}^n$ with $i_1 + \dots + i_n \ge 2$ and $i = 1, \dots, n$. We state later a theorem for holomorphic germs with resonant linear parts (proved in [PM3]).
2) The linear part A is in the Poincaré domain if
$$\min\Big(\max_i |\lambda_i|, \max_i |\lambda_i^{-1}|\Big) < 1.$$
In that case it is well known that we are always in Case (1). Otherwise the linear part A belongs to the Siegel domain.
3) In general the exceptional set $E \subset \mathbb{C}^m$ is not empty. For example, if $\varphi_0 = 0$ (so that $f_0 = A$ is linear), then $0 \in E$ when we are in the second case.
4) With the same type of proof, one can prove the same result (but with a weaker estimate on the radius of convergence in (1)) for holomorphic non-polynomial families of the form
$$f_t(z) = Az + \sum_{\substack{i = (i_1, \dots, i_m) \\ |i| = i_1 + \dots + i_m \ge 0}} t^i \varphi_i(z),$$
where the holomorphic germs $(\varphi_i(z))$ have increasing orders such that $\varphi_i(z) = O(z^{\varepsilon_0 |i|})$ for some $\varepsilon_0 > 0$.
Some illustrative corollaries follow. For n = 1 and the special case of entire functions we have directly from Theorem 1:
Corollary 1. Let $(f_t)_{t \in \mathbb{C}^m}$ be a finite dimensional holomorphic family of entire functions as above with $f_t'(0) = \lambda$, where $\lambda = e^{2\pi i\alpha}$ with $\alpha \in \mathbb{R} - \mathbb{Q}$. Then the family is linearizable or, except for an exceptional polar set $E \subset \mathbb{C}$ of values of t, all $f_t$ are non-linearizable.
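Remarks 1) and 2) after Theorem 1 are conditions on the eigenvalues of A that are easy to explore numerically. The Python sketch below uses two helpers of our own (`resonances` and `is_poincare_domain`, hypothetical names): the first searches for resonances $\lambda_k = \lambda_1^{i_1}\cdots\lambda_n^{i_n}$ with $2 \le |i| \le$ `max_order`, and the second tests the Poincaré-domain inequality. Floating-point comparisons with a tolerance can only suggest non-resonance, never prove it, and a finite search says nothing about orders beyond `max_order`.

```python
import cmath
import itertools

def is_poincare_domain(eigs, tol=1e-12):
    """Remark 2): min(max_i |lambda_i|, max_i |lambda_i^{-1}|) < 1, up to a float tolerance."""
    largest = max(abs(lam) for lam in eigs)
    largest_inverse = max(1 / abs(lam) for lam in eigs)
    return min(largest, largest_inverse) < 1 - tol

def resonances(eigs, max_order, tol=1e-12):
    """List pairs (k, i) with |lambda_k - lambda^i| < tol for multi-indices 2 <= |i| <= max_order.

    An empty list only means that no resonance was detected up to max_order."""
    found = []
    for i in itertools.product(range(max_order + 1), repeat=len(eigs)):
        if not 2 <= sum(i) <= max_order:
            continue
        monomial = 1
        for lam, e in zip(eigs, i):
            monomial *= lam ** e
        for k, lam_k in enumerate(eigs):
            if abs(lam_k - monomial) < tol:
                found.append((k, i))
    return found

golden = cmath.exp(2j * cmath.pi * (5 ** 0.5 - 1) / 2)  # rotation by the golden mean
cube_root = cmath.exp(2j * cmath.pi / 3)                 # a root of unity
print(resonances([golden], 20))    # []
print(resonances([cube_root], 4))  # [(0, (4,))], since cube_root**4 == cube_root
print(resonances([2, 4], 3))       # [(1, (2, 0))], since 4 == 2**2
print(is_poincare_domain([0.5, 0.3j]), is_poincare_domain([golden]))  # True False
```

The last example shows a unimodular eigenvalue, which lies in the Siegel domain; this is the situation where small divisors, and hence the dichotomy of Theorem 1, matter.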
Assuming that the family contains a non-linearizable structurally stable polynomial (for example a quadratic polynomial), we can break the dichotomy. This just follows from the observation that in a neighborhood of this polynomial all elements of the family are quasi-conformally conjugate (by the Douady–Hubbard straightening theorem), thus they are linearizable or not simultaneously.
Corollary 2. Let $(f_t)_{t \in \mathbb{C}^m}$ be a finite dimensional holomorphic family of entire functions as above with $f_t'(0) = \lambda$, where $\lambda = e^{2\pi i\alpha}$ with $\alpha \in \mathbb{R} - \mathbb{Q}$. We assume that for some value $t_0$, $f_{t_0}$ is a structurally stable polynomial in the space of polynomials with fixed point at 0 and multiplier λ. Then, if α is not a Bruno number, all entire functions $f_t$, except possibly for an exceptional polar set $E \subset \mathbb{C}$ of values of t, are not linearizable.
When $\alpha \in \mathbb{R} - \mathbb{Q}$ is not a Bruno number, no examples were previously known of non-linearizable entire functions not quasi-conformally conjugate to polynomials in a neighborhood of 0. This was due to the shortcomings of Yoccoz's maximum principle approach [Yo]. A particular case of this corollary is the theorem proved in [PM1] about polynomial germs. The author showed, generalizing Yoccoz's result for the quadratic polynomial, that if α is not a Bruno number, in the space
$$P_{\lambda,d} = \{P(z) = \lambda z + a_2 z^2 + \dots + a_d z^d;\ (a_2, \dots, a_d) \in \mathbb{C}^{d-1}\}$$
the polynomials that are of degree d and structurally stable (this is an open dense set of pluri-polar complement) are not linearizable. It is worth mentioning that the question of whether the exceptional set $E_{\lambda,d}$ is trivial (i.e. reduced to 0) for the polynomial family $P_{\lambda,d}$ when $\alpha \in \mathbb{R} - \mathbb{Q}$ is not a Bruno number is still open, even for the cubic family $P_b(z) = \lambda z + b z^2 + z^3$. Contrary to widespread belief, the author would not be surprised if $E_{\lambda,d}$ turned out to be non-trivial for appropriate values of λ and d.
For Liouville numbers α with extremely good rational approximations, an argument of Cremer (see [Cr] and [PM1]) shows that $E_{\lambda,d}$ is reduced to 0. To illustrate the strength of the preceding theorem, we present the following variations.
Corollary 3. Let $\alpha \in \mathbb{R} - \mathbb{Q}$ be a non-Bruno number.
1) Let $f(z) = e^{2\pi i\alpha} z + O(z^2)$ be non-linearizable. Any polynomial family $(f_t)_{t \in \mathbb{C}}$ as above containing f has all of its members $f_t$ non-linearizable, except for an exceptional polar set of parameters t.
2) For an arbitrary holomorphic germ $\varphi(z) = O(z^2)$ and for all values $t \in \mathbb{C}$ outside a polar set E, the germ $f_t(z) = e^{2\pi i\alpha} z + z^2 + t\varphi(z)$ is not linearizable.
3) Let
$$f(z) = e^{2\pi i\alpha} z + \sum_{n \ge 2} f_n z^n$$
be an arbitrary entire function. Keeping all coefficients fixed except $f_2$, there is a polar set E such that if $f_2 \in \mathbb{C} - E$, then f is not linearizable.
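Corollaries 2 and 3 hinge on whether α is a Bruno number. Recall the standard definition: if $p_n/q_n$ are the convergents of the continued fraction expansion of α, then α is a Bruno number when $\sum_n \log(q_{n+1})/q_n < \infty$. The Python sketch below computes the denominators $q_n$ from given partial quotients and the terms of the Bruno series. It only inspects finitely many terms, so it illustrates the condition rather than decides it. The fast-growing choice $a_{n+1} = 2^{q_n}$ is a hypothetical example for which every term is at least log 2, so the series diverges and α is not Bruno.

```python
import math

def convergent_denominators(partial_quotients):
    """q_0, q_1, ... for alpha = [0; a_1, a_2, ...], via q_{-1} = 0, q_0 = 1,
    q_n = a_n * q_{n-1} + q_{n-2}."""
    q_prev, q = 0, 1
    qs = [q]
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev
        qs.append(q)
    return qs

def bruno_terms(qs):
    """Terms log(q_{n+1}) / q_n of the Bruno series for n = 0, ..., len(qs) - 2."""
    return [math.log(qs[n + 1]) / qs[n] for n in range(len(qs) - 1)]

def fast_growing_denominators(steps):
    """Hypothetical non-Bruno example: choose a_{n+1} = 2**q_n at each step."""
    q_prev, qs = 0, [1]
    for _ in range(steps):
        q = qs[-1]
        a = 2 ** q
        q_prev, q_next = q, a * q + q_prev
        qs.append(q_next)
    return qs

golden_qs = convergent_denominators([1] * 30)  # golden mean: Fibonacci denominators
print(golden_qs[:8])                           # [1, 1, 2, 3, 5, 8, 13, 21]
print(sum(bruno_terms(golden_qs)))             # partial sums stay bounded
wild = fast_growing_denominators(4)
print(wild[:4])                                # [1, 2, 9, 4610]
print(all(term >= math.log(2) for term in bruno_terms(wild)))  # True
```

Python integers are unbounded, so `math.log` can be applied to the very large $q_n$ of the second example; only a handful of steps are feasible, since $a_{n+1} = 2^{q_n}$ explodes.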
We also have the same type of results for rational functions (in the proof we use Remark 4).
Corollary 4. Let $R_{\lambda,d} = \{f \in \mathbb{C}(z);\ f(0) = 0;\ f'(0) = \lambda;\ \deg f = d\}$. When $\alpha \in \mathbb{R} - \mathbb{Q}$ is not a Bruno number, except for an exceptional set, all rational functions in $R_{\lambda,d}$ are not linearizable.
The corollaries presented here are by no means restricted to dimension 1. Here is just one example of a new result.
Corollary 5. We consider the space $P_{A,d}$ of polynomial germs of holomorphic diffeomorphisms with non-resonant linear part $A \in GL_n(\mathbb{C})$ and total degree d. The existence of one non-linearizable example forces all the others, except possibly for a pluri-polar exceptional set, to be non-linearizable. This happens for instance when one eigenvalue of A does not satisfy Bruno's condition.
We can also prove a version of Theorem 1 for resonant linear parts A, which is of independent interest (for example when applied to symplectic holomorphic mappings). When the linear part is resonant, the linearization is not uniquely determined. Nevertheless, given a polynomial family $(f_t)$ as in Theorem 1 whose elements are all formally linearizable, there always exists a canonical family of linearizations $(h_t)$ whose coefficients depend polynomially on t (see [PM3]). The complete treatment of this situation requires some algebraic preliminaries, which we do not develop in this article; we refer to [PM3] for a complete treatment. We are content to prove here the following:
Theorem 2. We consider a family $(f_t)_{t \in \mathbb{C}^m}$ as in Theorem 1, but we allow $A \in GL_n(\mathbb{C})$ to be resonant. We are also given a family of formal linearizations $(h_t)_{t \in \mathbb{C}^m}$ whose coefficients depend polynomially on t. We assume that the monomial of order l has as coefficient a polynomial of degree bounded above by $C_0 + C_1 l$ for some $C_0, C_1 > 0$. We have the following dichotomy:
1) The family $(f_t)_{t \in \mathbb{C}^m}$ is holomorphically linearizable by the family $(h_t)_{t \in \mathbb{C}^m}$.
2) For all t ∈ ℂ^m except for an exceptional set E of $-capacity 0, h_t diverges.
One can also prove a statement similar to Theorem 2 when $(f_t)$ is not formally linearizable but the family $(h_t)$ conjugates the family to a formal normal form ([PM3]). A particularly relevant case is that of a symplectic holomorphic diffeomorphism with an elliptic fixed point. The formal conjugacy to Birkhoff's normal form is then in general divergent (see [Si-Mo], Sect. 30). The formal normal form situation is also relevant when A is not invertible.
Central manifolds. In situations where the dynamics is not linearizable, one can still have invariant manifolds through the fixed point (see for example [Pos] and [St]). Usually one has a formal equation whose coefficients depend polynomially on the coefficients of $f_t$, and thus on t. In these situations the following theorem applies.
Theorem 3. Under the same assumptions as in Theorem 1, we assume the existence of a formal invariant submanifold through 0 with equation $F_t(z) = 0$
where $F_t : \mathbb{C}^n \to \mathbb{C}^p$ is a formal mapping whose coefficients depend polynomially on $t \in \mathbb{C}^m$. More precisely, the coefficient of the monomial of valuation l is a polynomial in t of degree less than $C_0 + C_1 l$, where $C_0, C_1 > 0$ are constants. We have the dichotomy:
1) $F_t$ converges and defines an invariant submanifold for all $t \in \mathbb{C}^m$.
2) Except for a pluri-polar exceptional set of parameter values $t \in \mathbb{C}^m$, $F_t$ diverges.
We have the same theorem for holomorphic vector fields. To be more specific, consider the situation treated by L. Stolovitch [Sto]: for $1 \le j \le n$,
$$\dot z_j = \lambda_j z_j + \sum_{i=1}^{d} t^i f_{j,i}(z),$$
where $f_{j,i} = O(2)$. We assume that the linear part (which does not depend on t) is in the Siegel domain, that is, 0 belongs to the convex hull of $\{\lambda_1, \dots, \lambda_n\}$. We assume that the linear part is resonant, and that the resonances, i.e. the relations
$$\sum_{i=1}^{n} n_i \lambda_i - \lambda_j = 0$$
with $n_1, \dots, n_n \ge 1$ and $1 \le j \le n$, are generated by a finite number of resonances $r_j = (r_1, \dots, r_n) \neq 0$, $r_j \in \mathbb{N}^n$, $(r_j, \lambda) = 0$, $1 \le j \le l$. Then there exists a formal change of variables $w = h_t(z)$ with $h_t(0) = 0$ and $D_0 h_t = I$ which transforms the system into $\dot w_i = \lambda_i w_i + g_{i,t}(w)$ with $g_{i,t}(w) = \sum_{j=1}^{l} g_{i,j,t}\, y^{r_j}$, and if $\|r_j\| = 1$ then $g_{i,j,t}(0) = 0$. As constructed in [Sto], the coefficients of the formal normalization depend polynomially on t.
Theorem 4. Under the previous assumptions, we have the following dichotomy:
1) For all values of $t \in \mathbb{C}^m$ the formal normalization $h_t$ converges, thus the submanifold $\{w^{r_1} = 0, \dots, w^{r_n} = 0\}$ is invariant.
2) Except for an exceptional pluri-polar set of values of t, the normalization mappings $h_t$ diverge.
According to [Sto], and assuming that the higher dimensional resonant Bruno condition on $(\lambda_1, \dots, \lambda_n)$ holds, we are always in Case (1).
Singularities of holomorphic vector fields. We consider a polynomial family of germs of holomorphic vector fields as before, but we assume here that the linear part is non-resonant, that is, for any $n_1, \dots, n_n \ge 1$ and any $1 \le j \le n$,
$$\lambda_j - \sum_{i=1}^{n} n_i \lambda_i \neq 0.$$
Theorem 5. Under the above hypotheses, we have the dichotomy:
1) The family of holomorphic vector fields is linearizable for all t.
2) Except for an exceptional pluri-polar set of values of t, the holomorphic vector fields are non-linearizable.
In the case n = 2 there is a complete correspondence between the problem of linearization of holomorphic vector fields as above and the problem of linearization of germs of holomorphic diffeomorphisms of (ℂ, 0). This was established in [PM-Yo] where, as a corollary, Yoccoz and the author proved that the Bruno condition is optimal for the problem of linearization.
Centralizers. We discuss here the situation of one complex variable. The analysis generalizes similarly to higher dimension. The study of centralizers of holomorphic germs generalizes the problem of linearization. We refer to [PM2] for proofs and references. In the group of holomorphic diffeomorphisms $G = (\mathrm{Diff}(\mathbb{C}, 0), \circ)$, consisting of holomorphic germs f with $f(0) = 0$ and $f'(0) \neq 0$, with group operation given by composition ∘, we consider the centralizer of f,
$$\mathrm{Cent}(f) = \{g \in \mathrm{Diff}(\mathbb{C}, 0);\ g \circ f = f \circ g\}.$$
It is a subgroup of G that can be interpreted as the group of symmetries of f (i.e. those changes of variables conjugating f to itself). We have the following cases:
1) For germs with an attracting or repelling fixed point at 0, i.e. $f'(0) = e^{2\pi i\alpha}$ with $\alpha \notin \mathbb{R}$, the centralizer is a complex flow of dimension 1.
2) For germs with an indifferent rational fixed point at 0, i.e. $f'(0) = e^{2\pi i\alpha}$ with $\alpha \in \mathbb{Q}$, the centralizer is either generated by a root (for composition) of the germ (and then it is discrete), or it is a one-dimensional complex flow.
These cases are well understood. We discuss the last case in what follows.
3) For germs with an indifferent irrational fixed point at 0, $f'(0) = e^{2\pi i\alpha}$ with $\alpha \in \mathbb{R} - \mathbb{Q}$, the centralizer can be a one-dimensional real flow (the linearizable case), discrete or uncountable.
The existence of examples where the last possibility holds was only proved recently in [PM2]. In this case the centralizer is abelian and isomorphic to a subgroup of the circle $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ by the rotation number morphism
$$\rho : G \to \mathbb{T}, \qquad f \mapsto \frac{1}{2\pi i} \log f'(0).$$
We denote
$$G(f) = \rho(\mathrm{Cent}(f)).$$
Note that $\mathbb{Z}\alpha \subset G(f)$. The holomorphic germ f is linearizable if and only if the centralizer is full, $G(f) = \mathbb{T}$; otherwise $G(f)$ is a dense $F_\sigma$ subset of $\mathbb{T}$ of measure 0 (and indeed of capacity 0). Moreover, all elements $g \in \mathrm{Cent}(f)$ are non-linearizable. Thus how small G(f) is can be thought of as a measure of how far f is from being linearizable. Hence the study of centralizers (apart from the motivation coming from the theory of foliations, see [PM2]) is motivated as a finer study of linearization. The
question of determining whether $\beta \in G(f)$ is intimately connected with the common rational approximations of α and β, as the following theorem of J. Moser shows ([Mo]). Let f be non-linearizable. If there exist $\gamma, \tau > 0$ such that for any $q \ge 1$ and $p_1, p_2 \in \mathbb{Z}$,
$$\min(|q\alpha - p_1|, |q\beta - p_2|) \ge \frac{\gamma}{q^\tau},$$
then $\beta \notin G(f)$. The necessity of an arithmetic condition in Moser's theorem is proved in [PM2]. Using the techniques developed in this article, we prove:
Theorem 6. Let $f_t$ be a family of holomorphic germs as in Theorem 1, with fixed linear part $f_t'(0) = e^{2\pi i\alpha}$, $\alpha \in \mathbb{R} - \mathbb{Q}$. For any $\beta \in \mathbb{T}$, we have the following dichotomy:
1) For all $t \in \mathbb{C}$, $\beta \in G(f_t)$.
2) Except for an exceptional polar set $E \subset \mathbb{C}$, $\beta \notin G(f_t)$.
Further applications. One can use the argument presented here to improve on the original result of Ilyashenko in [Il], replacing one parameter by several, and sets of measure 0 by pluri-polar sets. The same remark applies to all results derived from Ilyashenko's argument, such as those in [He2] and [Yo]. A complete treatment of the problem of linearization of resonant holomorphic germs is given in [PM3]. These techniques also apply to other small divisors problems. Behind the technique used here, there is an abstract elementary theorem on holomorphic extension of Rothstein type for a certain type of power series (see [PM4]).
1. Proof of Theorem 1
1.1. Nadirashvili and Bernstein lemmata. We first present the potential theory tools in dimension 1, which are probably more familiar to the readers and are enough to prove the theorems for one-dimensional parameter families. For the definition of Green function, polar sets and other notions in potential theory we refer the reader to [Ra], for example (for a more encyclopedic treatment see [Tsu]). Y. Ilyashenko in his article [Il] makes use of the following lemma attributed to N. S. Nadirashvili ([Na]).
Lemma (Nadirashvili). Let $E \subset \mathbb{C}$ be a compact set with positive measure in the disk $D_R$ of center 0 and radius $R > 0$.
Let P be a polynomial of degree n such that for some $M > 0$, $\|P\|_{C^0(E)} \le M^n$. Then there exists a constant C depending only on the measure of E and on R > 0 such that $\|P\|_{D_R} \le C^n M^n$.
The key idea of this lemma is that it is enough to control a polynomial on a set E of positive measure to get a bound in any bounded domain. Note that this idea is very different in nature from the maximum principle. We improve on [Il] by observing that the measure of E is not the relevant quantity. Nadirashvili's lemma is a direct corollary of the classical Bernstein (or Bernstein–Walsh) lemma in approximation theory and classical potential theory (see [Ra], p. 156) and of the fact that a set of positive measure is non-polar:
Lemma (Bernstein). Let $E \subset \mathbb{C}$ be a non-polar compact set (i.e. cap(E) > 0). Let Ω be the connected component of $\hat{\mathbb{C}} - E$ containing ∞. Then for any polynomial P of degree n, we have for $t \in \mathbb{C}$,
$$|P(t)| \le e^{n g_\Omega(t, \infty)} \|P\|_{C^0(E)},$$
where $g_\Omega$ denotes the Green function of Ω (extended by 0 outside Ω).
We recall that the non-polarity of E implies the existence of a Green function $g_\Omega(z, \infty) = g_\Omega(z)$ such that $g_\Omega(\infty) = \infty$, $g_\Omega$ is harmonic in $\Omega - \{\infty\}$, $g_\Omega(z) - \log|z|$ is bounded as $z \to \infty$, and $g_\Omega(z) \to 0$ as $z \to z_0$ for every regular point $z_0$ of E. These properties determine $g_\Omega$ uniquely. The proof of this lemma is quite simple.
Proof. We can assume the polynomial monic. Then
$$u(t) = \frac{1}{n} \log|P(t)| - \frac{1}{n} \log\|P\|_{C^0(E)} - g_\Omega(t, \infty)$$
is sub-harmonic in $\Omega - \{\infty\}$, is bounded above near ∞ (because $g_\Omega(t, \infty) = \log|t| - \log \mathrm{cap}(E) + o(1)$, while $\frac{1}{n}\log|P(t)| = \log|t| + o(1)$ for monic P), and $\limsup u(t) \le 0$ as t tends to a regular point of E. Since the irregular points of E form a polar set, the (extended) maximum principle concludes the proof. For future reference we recall here that a countable union of polar sets is polar.
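For the interval E = [−1, 1] everything in the Bernstein lemma is explicit: cap(E) = 1/2, and the Green function of Ω with pole at ∞ is $g_\Omega(t) = \log|t + \sqrt{t^2 - 1}|$, with the branch chosen so that the modulus is at least 1. The Chebyshev polynomials $T_n$, which have sup norm 1 on E, are the extremal case. The Python sketch below (an illustration only, with helper names of our own) evaluates $g_\Omega$, checks $|T_n(t)| \le e^{n g_\Omega(t)}$ at a few sample points, and exhibits the expansion $g_\Omega(t) = \log|t| - \log \mathrm{cap}(E) + o(1)$ used in the proof.

```python
import cmath
import math

def green_interval(t):
    """Green function of the complement of [-1, 1] with pole at infinity.

    The two solutions w of w + 1/w = 2t have product 1; the Green function is
    log|w| for the one with |w| >= 1, and it vanishes on [-1, 1] itself."""
    s = cmath.sqrt(t * t - 1)
    return math.log(max(abs(t + s), abs(t - s)))

def chebyshev(n, t):
    """T_n(t) via T_{k+1} = 2 t T_k - T_{k-1}; its sup norm on [-1, 1] is 1."""
    if n == 0:
        return 1
    t_prev, t_cur = 1, t
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, 2 * t * t_cur - t_prev
    return t_cur

points = [2.0, 1.5 + 0.5j, 3j, 0.3 + 0.01j]
ok = all(abs(chebyshev(n, t)) <= math.exp(n * green_interval(t)) * (1 + 1e-9)
         for t in points for n in range(1, 15))
print(ok)  # True

big = 1e6
print(green_interval(big) - math.log(big))  # close to log 2 = -log cap([-1, 1])
```

Writing $T_n(t) = (w^n + w^{-n})/2$ with $|w| = e^{g_\Omega(t)} \ge 1$ shows why the inequality holds and why it is sharp up to a factor of at most 2 far from E.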
1.2. Pluri-potential theory. There is a relatively recent extension to $\mathbb{C}^m$ of the classical potential theory on $\mathbb{C}$. We refer to [Kli], Chap. 5 for proofs. We consider the set $\mathcal{L}$ of pluri-subharmonic functions u defined in $\mathbb{C}^m$ and of minimal growth, i.e. $u(z) - \log\|z\|$ is bounded above when $\|z\| \to \infty$. Given a subset $E \subset \mathbb{C}^m$, we define
$$V_E(z) = \sup\{u(z);\ u \in \mathcal{L},\ u|_E \le 0\}.$$
The upper semi-continuous regularization $V_E^*$ of $V_E$ is called the pluri-sub-harmonic Green function of E. This function $V_E^*$ is either pluri-sub-harmonic or identically +∞. We are in the former case when E is not pluri-polar; then $V_E^*$ has logarithmic growth at ∞, that is, $V_E^*(z) - \log\|z\|$ is bounded above when $z \to \infty$. As in one dimension we immediately prove:
Lemma. If E is not pluri-polar and P is a polynomial of degree d, then we have for $z \in \mathbb{C}^m$,
$$|P(z)| \le \|P\|_{C^0(E)}\, e^{d V_E(z)}.$$
Indeed, $u = \frac{1}{d} \log(|P| / \|P\|_{C^0(E)})$ belongs to $\mathcal{L}$ and satisfies $u|_E \le 0$, hence $u \le V_E$. We also have that a countable union of pluri-polar sets is pluri-polar.
1.3. Proof of Theorem 1. We start with the following elementary lemma.
Lemma A. The coefficient vectors $h_i(t)$ of the formal linearization
$$h_t(z) = z + \sum_{\substack{i = (i_1, \dots, i_n) \\ i_1 + \dots + i_n \ge 2}} h_i(t) z^i$$
have coordinates that are polynomials in the parameter $t = (t_1, \dots, t_m)$ of degree less than $d(i_1 + \dots + i_n)$.
Proof. We can assume that A is in upper triangular Jordan normal form. We solve the functional equation $A \circ h_t = h_t \circ f_t$ by identifying coordinates and expanding in homogeneous vector monomials. By induction on $|i| = i_1 + \dots + i_n$ we determine successively the vectors $h_i(t)$, which depend on the coefficients of $f_t$ and on the lower order $h_j(t)$, $|j| < |i|$. By induction, the linear equations determining $h_i(t)$ have the form
$$(A - M_i) h_i(t) = \sum_{|j| < |i|} c_j(t) h_j(t),$$
where the matrix $M_i$ is upper triangular, depends only on A and i (but not on the parameter t), and has as diagonal coefficients products of eigenvalues of A (thus $A - M_i$ is invertible by the non-resonance assumption), and where in the right-hand side the coefficients $c_j(t)$ are polynomials in t of total degree at most $(|i| - |j|)d$. To see this, note that $c_j(t)$ is obtained by collecting the coefficient of $z^i$ in the expansion of $(f_{t,1}(z))^{j_1} \cdots (f_{t,n}(z))^{j_n}$, where $f_{t,1}, \dots, f_{t,n}$ are the coordinates of $f_t$. Expanding each coordinate into homogeneous parts, every term is a product of $|j|$ homogeneous factors whose degrees $k \ge 1$ add up to $|i|$. Factors of degree $k = 1$ come from Az and do not depend on t, while factors of degree $k \ge 2$ are polynomials in t of degree at most d. Since
$$|j| + |\{k;\ k \ge 2\}| \le |\{k;\ k = 1\}| + 2\,|\{k;\ k \ge 2\}| \le \sum k = |i|,$$
we get $\deg_t c_j(t) \le d\,|\{k;\ k \ge 2\}| \le d(|i| - |j|)$. By induction the result follows.
Remark. In the case of a more general family such as those in Remark 4 after the theorem, we get, by the same proof, that the degree of $c_j(t)$ is bounded by $\varepsilon_0^{-1}(|i| - |j|)$.
Proof of Theorem 1. Let $E = \{t \in \mathbb{C}^m;\ f_t \text{ is linearizable}\}$. We want to show that E is either pluri-polar or the whole of $\mathbb{C}^m$. We have
$$E = \bigcup_{j \ge 1} E_j,$$
where $E_j$ is the set of parameters t such that $h_t$ has radius of convergence larger than or equal to 1/j and $h_t$ is holomorphic and bounded by 1 in the ball of center 0 and radius 1/j. It is clear that every t for which $h_t$ converges belongs to some $E_j$, since $h_t(0) = 0$. The sets $E_j$ are
clearly closed. If E is not pluri-polar, we have that for some j0 ≥ 1, Ej is not pluri-polar. Thus, by Cauchy, there exists ρ0 > 0 such that for all t ∈ Ej0 , ϕ(t) =
−|i|
sup ||hi (t)||ρ0
|i|→+∞
< +∞ .
The function $\varphi$ is lower semicontinuous (as a supremum of continuous functions), and $E_{j_0} = \bigcup_p L_p$, where $L_p = \{t \in E_{j_0};\ \varphi(t) \le p\}$ is closed. Again some $L_{p_0}$ is not pluri-polar. Finally, we find a non-pluri-polar closed set $C = L_{p_0}$ for which there exists $\rho_1 > 0$ such that for all $i \in \mathbb{N}^n$,
$\|h_i\|_{C^0(C)} \le \rho_1^{|i|}$.
Using Bernstein's lemma and Lemma A we get, for any $t \in \mathbb{C}^m$,
$|h_i(t)| \le e^{d|i| V_C(t)}\, \rho_1^{|i|}$.
Thus $h_t$ has non-zero radius of convergence and $f_t$ is linearizable for any $t \in \mathbb{C}^m$. Using the precise form of Bernstein's lemma and $V_C(t) \sim \log\|t\|$ as $t \to \infty$, the radius of convergence can be estimated by
$R(h_t) \ge \frac{C_0}{1 + \|t\|^d}$
for some constant $C_0 > 0$. Now, the proof also goes through in the case of Remark 4. In Lemma A the coefficient $h_i(t)$ is now a polynomial of degree at most $\varepsilon_0^{-1}|i|$, because $h_i(t)$ only depends on the coefficients of $f_t$ of degree $\le |i|$ (which are polynomials in t of degree $O(\varepsilon_0^{-1}|i|)$). This only affects the bound on the radius of convergence (just replace d by $\varepsilon_0^{-1}$).
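Both ingredients of the proof can be seen on a toy case, n = m = d = 1 with $f_t(z) = \lambda z + t z^2$ (our choice of example, not one taken from the text). The sketch below computes the coefficients $h_k(t)$ of the formal linearization from $\lambda h_t(z) = h_t(f_t(z))$ and checks the degree bound of Lemma A; it then compares $|h_k(t)|$ with the Bernstein–Walsh bound for the closed unit disk D, whose extremal function is $V_D(t) = \log^+|t|$. The helper names, the golden-mean rotation number and the sampling of the unit circle to approximate the sup norm are all assumptions of the sketch.

```python
import cmath
import math
from math import comb

# Polynomials in the parameter t are coefficient lists: p[e] is the coefficient of t**e.

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[e] if e < len(p) else 0) + (q[e] if e < len(q) else 0) for e in range(n)]

def poly_eval(p, t):
    acc = 0j
    for a in reversed(p):  # Horner's rule
        acc = acc * t + a
    return acc

def degree(p, tol=1e-12):
    nonzero = [e for e, a in enumerate(p) if abs(a) > tol]
    return max(nonzero) if nonzero else -1

def linearization_coeffs(lam, N):
    """Return {k: h_k(t)} for 1 <= k <= N, where h_t(z) = sum_k h_k(t) z**k solves
    lam * h_t(z) = h_t(f_t(z)) for the toy family f_t(z) = lam*z + t*z**2.
    Assumes lam is not a root of unity, so that lam - lam**k != 0."""
    h = {1: [1 + 0j]}
    for k in range(2, N + 1):
        rhs = [0j]
        for j in range(1, k):
            m = k - j  # number of factors t*z**2 taken from (lam*z + t*z**2)**j
            if m > j:
                continue
            c = comb(j, m) * lam ** (j - m)  # coefficient of t**m z**k in (lam*z + t*z**2)**j
            rhs = poly_add(rhs, [0j] * m + [c * a for a in h[j]])  # c * t**m * h_j(t)
        h[k] = [a / (lam - lam ** k) for a in rhs]
    return h

def sup_on_unit_disk(p, samples=2048):
    """Sampled sup of |p| over the closed unit disk (by the maximum principle, over |t| = 1)."""
    return max(abs(poly_eval(p, cmath.exp(2j * math.pi * s / samples))) for s in range(samples))

def bernstein_walsh_bound(p, t, norm):
    """norm * exp(deg(p) * V_D(t)) with V_D(t) = log+|t|, the extremal function of the unit disk."""
    V = math.log(abs(t)) if abs(t) > 1 else 0.0
    return norm * math.exp(max(degree(p), 0) * V)

lam = cmath.exp(2j * math.pi * (math.sqrt(5) - 1) / 2)
h = linearization_coeffs(lam, 10)
t = 3 + 4j
for k in range(2, 11):
    bound = bernstein_walsh_bound(h[k], t, sup_on_unit_disk(h[k]))
    print(k, degree(h[k]), abs(poly_eval(h[k], t)) <= bound * (1 + 1e-9))
```

In this toy family, conjugating by $z \mapsto tz$ shows that $h_k(t) = h_k(1)\, t^{k-1}$, so the Bernstein–Walsh inequality is attained for $|t| \ge 1$ and the radius of convergence behaves like $R(h_1)/|t|$, in line with the estimate $R(h_t) \ge C_0/(1 + \|t\|^d)$ for d = 1.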
1.4. Proof of the corollaries. Corollary 1 is just a particular case of Theorem 1, and so is part (1) of Corollary 3. Now, using part (1) of Corollary 3, we prove:
Corollary (J.-C. Yoccoz [Yo]). The quadratic polynomial $P_\alpha(z) = e^{2\pi i\alpha} z + z^2$ is non-linearizable when α is not a Bruno number.
Proof. Pick a non-linearizable germ f with linear part $e^{2\pi i\alpha} z$ (such an f exists by [Yo]) and consider $f_t(z) = t f(z) + (1 - t) P_\alpha(z)$. Since $f_1 = f$ is not linearizable, all $f_t$, except for a polar set E of values of t, are non-linearizable. By the Douady–Hubbard straightening theorem, $\mathbb{C} - E$ is a neighborhood of 0, and so $f_0 = P_\alpha$ is not linearizable. By the same argument, parts (2) and (3) of Corollary 3 follow. We prove:
Corollary (R. Pérez-Marco [PM1]). If P is a structurally stable polynomial in the space $\mathcal{P}_{\lambda,d} = \{P(z) = \lambda z + a_2 z^2 + \dots + a_d z^d;\ (a_2, \dots, a_d) \in \mathbb{C}^{d-1}\}$, then P is not linearizable.
Proof. Just consider
$f_t(z) = t P(z) + (1 - t) P_\alpha(z)$
and do the same proof. Now Corollary 2 follows from part (1) of Corollary 3. Corollary 4 is not a strict corollary of Theorem 1 but of the improvement using Remark 4. One should observe that the coefficients of the linearization are polynomial functions of the coefficients of the rational function, with the appropriate degree. To see this, since 0 is not a pole of the rational function R, we can assume, conjugating by a linear dilation, that the constant coefficient of the denominator is 1, i.e.
$R(z) = \frac{P(z)}{1 - Q(z)}$.
Now expanding
$\frac{1}{1 - Q(z)} = \sum_{i} Q(z)^i$,
we see that the coefficients of the power series of R depend, as power series, on the coefficients of P and Q, with order bounded from below by a linear function. For Corollary 5 only the last assertion is not immediate. If one of the eigenvalues of A violates the Bruno condition, $\lambda_1$ for example, then $(z_1, \dots, z_n) \mapsto (\lambda_1 z_1 + z_1^2, \lambda_2 z_2, \dots, \lambda_n z_n)$ is not linearizable, so the first part applies, giving a rich family of polynomial non-linearizable examples.
2. Proof of the Other Theorems
2.1. Formal linearizations and Theorem 2. The proof of Theorem 2 is similar to the proof of Theorem 1 given in the previous section. For a complete study of the resonant case we refer to [PM3]. We just mention here the new difficulties that appear. Assume that we have a germ of holomorphic diffeomorphism $f\colon (\mathbb{C}^n, 0) \to (\mathbb{C}^n, 0)$ with resonant linear part $A = D_0 f \in GL_n(\mathbb{C})$. The formal linearization h is not always unique when the linear part A is resonant or not invertible. For example, for n = 2 and
$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$,
if h is a formal linearization then so is $l \circ h$, where $l(z_1, z_2) = (z_1 + k(z_2), z_2)$. If we consider a polynomial family $(f_t)_{t \in \mathbb{C}^m}$, requiring that $h_t$ have coefficients depending polynomially on t does not help, since one can then take various $k_t$ depending polynomially on t. Thus the family $(h_t)$ is not unique even with this further restriction.
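The non-uniqueness claim can be checked in one line. The derivation below assumes the convention $A \circ h = h \circ f$ of Lemma A and takes k to be any formal power series in $z_2$ with $k(0) = k'(0) = 0$ (an assumption made here so that $l \circ h$ remains tangent to the identity): l commutes with A, hence $l \circ h$ satisfies the same functional equation as h.

```latex
A \circ l\,(z_1, z_2) = \bigl(z_1 + k(z_2) + z_2,\; z_2\bigr) = l\,(z_1 + z_2,\; z_2) = l \circ A\,(z_1, z_2),
\qquad\text{hence}\qquad
A \circ (l \circ h) = l \circ A \circ h = (l \circ h) \circ f .
```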
This presents a problem for proving non-linearizability. Considering a polynomial parameter family $(h_t)$ of formal linearizations of a fixed map f, we may be in the second case, but this does not mean that f is not linearizable. For instance, if the exceptional set E is not empty, then f will be linearizable! The question of non-linearizability is harder to answer. In [PM3] we show that if the polynomial family of linearizations is chosen in a natural way, this difficulty does not arise.
2.2. Other theorems. The proofs are similar to those of Sect. 1. We just comment on the particularities of each problem. For an explicit example where Theorem 3 applies, one can work out the example of J. Poschel [Pos]. The polynomial dependence, with the appropriate bound on the degrees, follows from the formal computation of the equation of the invariant manifold. Theorem 4 is proved in a similar way. We refer to [Sto] for the formal computation of a normalizing map with polynomial dependence on the parameter t with the appropriate degrees. One can work out in this situation results similar to those in [PM3]. The linearization in Theorem 5 is unique, and it is well known ([Ar]) that it depends polynomially on t with the appropriate degrees. Thus the same proof applies. Note that in $\mathbb{C}^2$, by [PM-Yo], one can realize any germ of holomorphic diffeomorphism in $(\mathbb{C}, 0)$ as the holonomy of a singularity of a holomorphic vector field of the type considered. For the proof of Theorem 6, we give the induction formulas for the coefficients of $g_\beta$. Let $\mu = e^{2\pi i\alpha} = f_1$ and $\lambda = e^{2\pi i\beta} = g_1$, and
$f(z) = \sum_{n=1}^{+\infty} f_n z^n, \qquad g(z) = \sum_{n=1}^{+\infty} g_n z^n$.
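Identifying terms of degree n ≥ 2 in the equation $g \circ f = f \circ g$, and collecting separately the terms with p = 1 and p = n (which contain $f_n$ and $g_n$ linearly), we get, for n ≥ 2,
$$(\mu^n - \mu)\, g_n = (\lambda^n - \lambda)\, f_n + \sum_{p=2}^{n-1} f_p \sum_{\substack{i_1+\dots+i_p=n\\ i_j\ge 1}} g_{i_1}\cdots g_{i_p} - \sum_{p=2}^{n-1} g_p \sum_{\substack{i_1+\dots+i_p=n\\ i_j\ge 1}} f_{i_1}\cdots f_{i_p}.$$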
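The recursion can be checked numerically. The sketch below is our own illustration: the test germ $f(z) = \mu z + z^2$, the rotation numbers $\alpha = (\sqrt5 - 1)/2$ and $\beta = \sqrt2 - 1$, and the helper names are assumptions, not taken from the text. It solves for $g_n$ degree by degree and then confirms that $g \circ f$ and $f \circ g$ agree up to the truncation degree. It relies on μ not being a root of unity, so that $\mu^n - \mu \ne 0$; the inner sums over $i_1 + \dots + i_p = n$ are read off as the coefficient of $z^n$ in $f^p$ or $g^p$.

```python
import cmath
import math

def mul(a, b, N):
    """Product of two power series (coefficient lists of length N + 1), truncated at degree N."""
    c = [0j] * (N + 1)
    for i in range(N + 1):
        if a[i] == 0:
            continue
        for j in range(N + 1 - i):
            c[i + j] += a[i] * b[j]
    return c

def compose(outer, inner, N):
    """Coefficients of outer(inner(z)) up to degree N; both series have zero constant term."""
    result = [0j] * (N + 1)
    power = [1 + 0j] + [0j] * N  # inner**0
    for p in range(1, N + 1):
        power = mul(power, inner, N)  # inner**p
        for k in range(N + 1):
            result[k] += outer[p] * power[k]
    return result

def commuting_germ(f, lam, N):
    """Solve g o f = f o g degree by degree with g_1 = lam, using the recursion for g_n.
    Assumes mu = f[1] is not a root of unity, so that mu**n - mu != 0."""
    mu = f[1]
    g = [0j] * (N + 1)
    g[1] = lam
    fpow = [[1 + 0j] + [0j] * N]
    for p in range(1, N + 1):
        fpow.append(mul(fpow[-1], f, N))  # fpow[p][n] = sum over i_1+...+i_p = n of f_{i_1}...f_{i_p}
    for n in range(2, N + 1):
        rhs = (lam ** n - lam) * f[n]
        gpow = g[:]  # g**1; g[n] is still 0 and does not enter the p >= 2 terms
        for p in range(2, n):
            gpow = mul(gpow, g, N)  # g**p
            rhs += f[p] * gpow[n] - g[p] * fpow[p][n]
        g[n] = rhs / (mu ** n - mu)
    return g

N = 10
mu = cmath.exp(2j * math.pi * (math.sqrt(5) - 1) / 2)
f = [0j] * (N + 1)
f[1], f[2] = mu, 1 + 0j  # f(z) = mu*z + z**2
g = commuting_germ(f, cmath.exp(2j * math.pi * (math.sqrt(2) - 1)), N)
error = max(abs(x - y) for x, y in zip(compose(g, f, N), compose(f, g, N)))
print(error)
```

As a sanity check, passing λ = μ must return f itself, since f commutes with itself and the recursion has a unique solution once $g_1$ is fixed.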
And by induction the coefficients of $g_\beta$ depend polynomially on t and have the appropriate degrees.
Acknowledgements. This paper has benefited from remarks of several people. Many thanks to Rafael de La Llave, who pointed out corrections and improvements, and for all the friendly and instructive discussions on Small Divisors. Thanks also to the specialists in several complex variables, Nessim Sibony and, independently, Eric Bedford and Norm Levenberg, for pointing out very quickly that the natural notion of smallness is pluri-polarity and not $-capacity 0 as used in the first version. Special thanks to Eric Bedford for explaining to me some basic notions of pluri-potential theory. Thanks to Giovanni Forni and Jean-Christophe Yoccoz for pointing out corrections, and also to Yulij Ilyashenko for exchanges on the subject.
References
[Ar] Arnold, V.I.: Chapitres supplémentaires de la théorie des équations différentielles ordinaires. Moscow: Éditions MIR, 1980
[Cr] Cremer, H.: Über die Häufigkeit der Nichtzentren. Math. Ann. 115, 573–580 (1938)
[He1] Herman, M.R.: Une méthode pour minorer les exposants de Lyapounov et quelques exemples montrant le caractère local d'un théorème d'Arnol'd et Moser sur le tore en dimension 2. Comment. Math. Helv. 58, 453–502 (1983)
[He2] Herman, M.R.: Recent results and some open questions on Siegel's linéarisation theorem of germs of complex analytic diffeomorphisms of C^n near a fixed point. Proceedings VIIIth Int. Conf. Math. Phys., Singapore: World Scientific Publishers, 1987
[Il] Ilyashenko, Y.: Divergence of series that reduce an analytic differential equation to linear normal form at a singular point. Funct. Anal. and Appl. 13, 87–88 (1979)
[Kl] Klimek, M.: Pluripotential theory. London Mathematical Society Monographs, new series, 6, 1991
[Na] Nadirashvili, N.S.: Vest. Mosk. Gos. Univ. Ser. Mat. 3, 39–42 (1976) (from [Il])
[Poi] Poincaré, H.: Les méthodes nouvelles de la mécanique céleste. Volume 1, Paris, 1893
[Pos] Poschel, J.: On invariant manifolds of complex analytic mappings near fixed points. Exposition. Math. 4, 2, 97–109 (1986)
[PM1] Pérez-Marco, R.: Sur les dynamiques holomorphes non linéarisables et une conjecture de V. I. Arnold. Ann. Scient. Éc. Norm. Sup. 4e série, t. 26, 565–644 (1993)
[PM2] Pérez-Marco, R.: Non linearizable holomorphic dynamics having an uncountable number of symmetries. Invent. Math. 119, 67–127 (1995)
[PM3] Pérez-Marco, R.: Linearization of holomorphic germs with resonant linear part. Preprint xxx.lanl.gov/abs/math.DS/0009030, 2000
[PM4] Pérez-Marco, R.: A note on holomorphic extensions. Preprint xxx.lanl.gov/abs/math.DS/0009031, 2000
[PM-Yo] Yoccoz, J.-C., Pérez-Marco, R.: Germes de feuilletages holomorphes à holonomie prescrite. In: Complex methods in dynamical systems, Astérisque 222, 345–371 (1994)
[Ra] Ransford, T.: Potential theory in the complex plane. London Mathematical Society, Student Texts 28, Cambridge: Cambridge University Press, 1995
[Si-Mo] Siegel, K.L., Moser, J.: Lectures on celestial mechanics. Berlin–Heidelberg–New York: Springer-Verlag, 187, 1971
[Sto] Stolovitch, L.: Sur un théorème de Dulac. Ann. Inst. Fourier 44, 5, 1397–1433 (1994)
[Tsu] Tsuji, M.: Potential Theory in Modern Function Theory. Tokyo: Maruzen, 1959
[Yo] Yoccoz, J.-C.: Théorème de Siegel, polynômes quadratiques et nombres de Brjuno. Astérisque 231, 1995
Communicated by Ya. G. Sinai
Commun. Math. Phys. 223, 465 – 474 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Isophasal Scattering Manifolds in Two Dimensions Robert Brooks1,*, Peter A. Perry2,** 1 Department of Mathematics, Technion-Israel Institute of Technology, Haifa, Israel.
E-mail: [email protected]
2 Department of Mathematics, University of Kentucky, Lexington, KY 40506-0027, USA.
E-mail: [email protected] Received: 25 August 2000 / Accepted: 1 June 2001
Abstract: We construct pairs of nonisometric, two-dimensional, asymptotically Euclidean manifolds X1 and X2 with the same scattering phase.
1. Introduction
In this note, we construct non-isometric pairs of complete Riemannian manifolds in two dimensions which are isometric to Euclidean R² outside a compact set and have the same scattering phase. Our work provides a counterpoint to the recent result of Hassell and Zelditch [7], which shows that the scattering phase of an exterior domain in R² with smooth boundary determines the obstacle up to a compact set of deformations. The examples are constructed using the Sunada construction [11] and ideas of Brooks and Tse [5].
To understand the physical intuition behind these examples, it is useful to recall a special example of exterior obstacle scattering in R^n, the Helmholtz resonator (see, for example, [1, 8]). For convenience, we consider n = 2. Let O be a bounded open subset of R² (the "obstacle") of the form O = S\Z(ε), where S is diffeomorphic to a thin spherical shell and Z(ε) is a thin tube with diameter of order ε which connects the interior of S to its exterior. Thus O divides R² into a cavity C and an exterior domain E, connected by the thin tube Z(ε). We consider the wave equation with Dirichlet boundary conditions on Ω = R²\O, i.e.,
$\Delta u = u_{tt}, \quad u|_{\partial\Omega} = 0$.
* Partially supported by the Israel Science Foundation, founded by the Israel Academy of Arts and Sciences, and by the Fund for the Promotion of Research and the Fund for the Promotion of Sponsored Research at the Technion.
** Supported in part by NSF grant DMS-9707051 and a University Research Professorship from the University of Kentucky.
If the thin tube were closed, stable oscillating modes with frequencies determined by the eigenvalues of the Laplacian in the cavity would be decoupled from the exterior domain; since the tube is open, initial conditions chosen to excite these cavity modes result in a solution which "looks like" a stable cavity mode initially but all of whose energy leaks out of the cavity over time. If Δ denotes the Laplacian on Ω with Dirichlet boundary conditions on ∂Ω, the existence of such resonant solutions corresponds to the existence of poles of the resolvent operator $R(k) = (-\Delta - k^2)^{-1}$, meromorphically continued to the logarithmic plane. Such poles are referred to as scattering resonances and have nonzero imaginary part. For each eigenvalue $\lambda_m$ of the closed cavity, there is a pole $\rho_m$ of R(k) with nonzero imaginary part and $\rho_m^2 = \lambda_m + O(\varepsilon)$. The remaining scattering resonances are similarly "close" to resonances of an exterior obstacle problem for E. Given two isospectral cavities, one might hope that the corresponding Helmholtz resonators have the same scattering resonances and are therefore "isoscattering", although it is hard to make this hope precise. Our construction is a variant of this idea, and involves gluing isospectral compact manifolds to the Euclidean plane along a small disc. Thus, the isospectral manifold (with a disc removed) plays the role of the resonant cavity, and the Euclidean plane with the disc removed plays the role of the exterior E. The notion of "isoscattering" that we actually use involves the scattering phase, defined and discussed below. Two manifolds with the same scattering phase have the same scattering poles, so this notion of isoscattering is stronger than equality of the sets of scattering poles.
In what follows, we will say that a complete, two-dimensional Riemannian manifold X is Euclidean near infinity if, for some compact set K ⊂ X, X\K is isometric to R²\B(0, R), where B(0, R) denotes the closed ball of radius R, for some fixed R > 0. Being Euclidean near infinity is stronger than having one end which carries a flat metric near infinity: a surface with one flat end is Euclidean at infinity if every closed curve of constant curvature 1/r which stays in the flat region has length 2πr. We will prove:
Theorem 1. There exist pairs of complete, two-dimensional Riemannian manifolds (X1, g1) and (X2, g2) with the following properties: (1) X1 and X2 are Euclidean near infinity, (2) X1 is not isometric to X2, and (3) X1 and X2 have the same scattering phase.
To define the scattering phase, let us recall some basic facts about spectral and scattering theory for compact perturbations of Euclidean space (see [9] for a detailed discussion). Let X be Euclidean near infinity. Then the positive Laplacian $\Delta_X$ has empty point spectrum and continuous spectrum [0, ∞), and is unitarily equivalent to the Euclidean Laplacian on R². The resolvent $(\Delta_X - \lambda)^{-1}$ is therefore analytic in the cut plane C\[0, ∞) as an operator-valued function on L²(X); if we define $R_X(k) = (\Delta_X - k^2)^{-1}$, initially for Im(k) > 0, and view it as a map from $C_0^\infty(X)$ to $C^\infty(X)$, then $R_X(k)$ admits a meromorphic continuation to the logarithmic plane with discrete poles. These poles are the scattering resonances of X. To further describe scattering on X, it is useful to compactify X to a manifold with boundary, X̄, by adding a circle S¹ at infinity. If (r, θ) are polar coordinates on R²\B(0, R), then $x = r^{-1}$ is a defining function for ∂X̄, and (x, θ) give coordinates for X̄ in a neighborhood of ∂X̄. The absolute scattering operator for X is a mapping
$S_X(k)\colon C^\infty(S^1) \to C^\infty(S^1)$ defined as follows. For each k > 0 and $a_- \in C^\infty(S^1)$, there exists a unique solution $u \in C^\infty(X)$ of the eigenvalue equation
$(\Delta_X - k^2)\, u = 0$ (1.1)
having the asymptotic form
$u(x, \theta) \sim x^{-1/2} a_+(\theta)\, e^{ik/x} + x^{-1/2} a_-(\theta)\, e^{-ik/x} + O(x^{1/2})$. (1.2)
The absolute scattering operator is the map $S_X(k)\colon C^\infty(S^1) \to C^\infty(S^1)$ given by $S_X(k)\, a_- = a_+$. Clearly $S_X(k)\, S_X(-k) = I$; moreover, $S_X(k)$ extends to a unitary operator on $L^2(S^1)$. If $J\colon C^\infty(S^1) \to C^\infty(S^1)$ is given by $(Jf)(\theta) = f(-\theta)$, then the relative scattering operator (relative to wave propagation on R² with no "obstacle") is given by $\mathcal{S}_X(k) = S_X(k) \circ J$. If I denotes the identity operator on $L^2(S^1)$, then $\mathcal{S}_X(k) - I$ has smooth kernel and therefore extends to a trace-class operator on the Hilbert space $L^2(S^1)$. It follows that $\det \mathcal{S}_X(k)$ is well-defined. The scattering phase $\sigma_X(k)$ is given by $\sigma_X(k) = \log \det \mathcal{S}_X(k)$. We have:
Theorem 2. Let X be Euclidean near infinity. Then the scattering phase $\sigma_X(k)$ extends to a meromorphic function on the logarithmic plane whose poles coincide with the poles of the meromorphically continued resolvent $R_X(k)$.
Thus, isophasal manifolds also have the same scattering poles. Clearly, to prove Theorem 1, it suffices to exhibit pairs of manifolds X1 and X2 for which there is an invertible linear mapping of $L^2(S^1)$ onto itself which intertwines the scattering operators for X1 and X2.
The plan of this paper is as follows. In Sect. 2, we discuss orbifolds and orbifold coverings. In Sect. 3, we adapt the Sunada construction to certain non-compact manifolds and give sufficient conditions for Sunada pairs of non-compact manifolds to have scattering operators which are intertwined by an invertible linear map. This material is very similar to results in [2, 4, 6, 13] but is included to make the paper self-contained. In Sect. 4, we describe the construction of the isophasal pairs.
2. Orbifolds
Let M̃ be a complete, simply-connected Riemannian manifold, and Γ a group of isometries of M̃ acting properly discontinuously on M̃. This means that for every x ∈ M̃, there is a neighborhood U of x such that the intersection U ∩ γ(U) is non-empty for only finitely many γ ∈ Γ.
Then the quotient M = M̃/Γ carries a metric which is a Riemannian metric away from the orbits of fixed points of elements of Γ. Every point x ∈ M has a neighborhood which is isometric to the quotient of a neighborhood of a point in M̃ by a finite group. We call Γ the orbifold fundamental group of M. As an example, consider the action of the group Z/kZ on the Euclidean space R² given by rotation through angle 2π/k. The quotient space C(k) is topologically R²,
but metrically is a cone, such that the circles of radius r about the cone point (the orbit of 0) have length 2πr/k. We will call the orbifold C(k) the cone with cone angle 2π/k. If S is a smooth surface such that S is isometric to C(k) outside a compact set, then we will say that S is a truncated cone with cone angle 2π/k. As a further example, consider a triangle T in the hyperbolic plane H² whose angles are π/n1, π/n2, π/n3. Such a triangle exists, and is unique up to isometry, provided 1/n1 + 1/n2 + 1/n3 < 1. Let Γ̃ be the group generated by reflections in the sides of T, and Γ the subgroup of index 2 consisting of the elements of Γ̃ which preserve the orientation of H². Then M = H²/Γ may be identified with two copies of T glued together along the boundary, and so is a sphere with three singular points p1, p2, p3 such that the length l(r, p_i) of the circle of radius r about p_i satisfies $l(r, p_i)/r \to 2\pi/n_i$ as r → 0. The orbifold fundamental group of M has the presentation
$\pi = \{X_1, X_2, X_3 : X_1^{n_1} = X_2^{n_2} = X_3^{n_3} = X_1 X_2 X_3 = \mathrm{Id}\}$.
The orbifold fundamental group enjoys the usual properties of the fundamental group. In particular, finite orbifold coverings of M are classified by subgroups of the orbifold fundamental group.
We may now specialize our discussion to surfaces. Let M̃ be a complete simply connected surface and M = M̃/Γ an orbifold quotient of M̃ by a group of orientation-preserving isometries of M̃. We will say that the point p has order k if the subgroup fixing a point p̃ lying over p is cyclic of order k. The order of p can be read off from the lengths of circles of small radius about p, as above. The point p is singular if k > 1. If M is topologically a surface of genus g, and has singular points p1, ..., pk of orders n1, ..., nk, then the orbifold fundamental group π1(M) of M is given by
$\pi_1(M) = \{A_1, B_1, \dots, A_g, B_g, X_1, \dots, X_k : \prod_{i=1}^{g} [A_i, B_i] = \prod_j X_j,\ X_1^{n_1} = \dots = X_k^{n_k} = \mathrm{Id}\}$.
Here the A_i's and B_i's are the usual representatives of a basis of loops in a surface of genus g, and the X_i's correspond to loops going around the singular points p_i.
If M is an orbifold, and M′ a finite covering of M, it may well happen that M′ is a smooth surface. To describe when this happens, we observe that the coset space π1(M)/π1(M′) carries an action of the group π1(M). The condition that M′ be smooth is that the action of each X_i on the coset space has all orbits of length n_i. Roughly speaking, this says that, in a neighborhood of any point p′ covering p_i, n_i copies of a neighborhood of p_i come together to fill out a neighborhood of p′. Note that this implies that the degree of the covering is a multiple of n_i.
Let M be an orbifold surface, and let M^{p_1} be obtained from M by removing the point p_1 and changing the metric in a neighborhood of p_1 so that it is complete. Then the orbifold fundamental group π1(M^{p_1}) may be obtained by replacing the relation $X_1^{n_1} = \mathrm{Id}$ with the empty relation $X_1^0 = \mathrm{Id}$. In particular, the obvious map π1(M^{p_1}) → π1(M) allows us to pull back coverings of M to coverings of M^{p_1}.
Now let M be an orbifold surface with a singular point p of order n (and possibly other singular points as well), and M′ a covering of M which is smooth. The degree of the covering is then nk for some k. We may now delete a neighborhood of p and replace it with a truncated cone of cone angle 2π/n to obtain M^p. If we then pull back the covering M′ of M to M^p, the resulting surface M′^p has k Euclidean ends. In particular, if k = 1, the surface M′^p is Euclidean at infinity.
3. Sunada's Construction
In this section, we discuss a version of the Sunada construction which is well suited to the scattering problem at hand. We follow closely the discussion in [4]; compare Zelditch [12], Sect. 5, where a similar intertwining map is written down to verify Fourier isospectrality of isospectral manifolds generated by the Sunada construction. We consider triples of finite groups (G, H1, H2) and corresponding triples of Riemannian orbifolds (S0, S1, S2) with the following properties:
(1) (G, H1, H2) satisfies the Sunada condition: H1 and H2 are subgroups of G and for every conjugacy class [g] of G,
$\#([g] \cap H_1) = \#([g] \cap H_2)$, (3.1)
but H1 is not conjugate to H2 in G,
(2) there is a surjective homomorphism f: π1(S0) → G, and S1 and S2 are Riemannian coverings of S0 with π1(S_i) = f^{-1}(H_i), i = 1, 2, and
(3) S1 and S2 are smooth and Euclidean near infinity, and there is a compact subset K0 of S0 such that S0\K0 is isometric to a truncated cone.
In order to obtain (3), we need triples of finite groups (G, H1, H2) such that the order of at least one singular point on S0 is exactly equal to the index of H_i in G. In terms of group theory, what is required is a Sunada triple (G, H1, H2) with an element g of G whose order equals the index of the H_i in G and which acts transitively on the coset spaces G/H1 and G/H2. Such Sunada triples appear to be rare, but we will provide examples below.
As in [4], Sect. 4, we note that the Sunada condition (3.1) is equivalent to the condition that L²(G/H1) and L²(G/H2) are equivalent as G-modules. Observing that G acts on the space L²(G) both on the left via $[g \cdot \psi](x) = \psi(g^{-1}x)$ and on the right via $[\psi \cdot g](x) = \psi(xg)$, we may identify L²(G/H_i) with the subspace $L^2(G)^{H_i}$ of functions invariant under the right action of H_i. The Sunada condition is then in turn equivalent to the existence of a left G-module isomorphism T between $L^2(G)^{H_1}$ and $L^2(G)^{H_2}$.
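A minimal sketch of such an isomorphism for the triple used in Sect. 4, assuming the standard identification of G/H1 with the 7 points and of G/H2 with the 7 lines of the Fano plane over Z/2 (this geometric picture is not spelled out in the text; all helper names are ours). The point-line incidence matrix plays the role of T: it commutes with the action of every g ∈ G and is invertible, even though no G-equivariant bijection between points and lines exists, since the stabilizer H1 of a point fixes no line. The script also checks that every element acting on the points by a 7-cycle acts on the lines by a 7-cycle, which is the transitivity required for (3).

```python
from fractions import Fraction
from itertools import product

def mat_vec(g, v):
    """Apply g (a 3x3 matrix over Z/2 stored row-major as 9 bits) to the column vector v."""
    return tuple(sum(g[3 * i + k] * v[k] for k in range(3)) % 2 for i in range(3))

def det_mod2(m):
    a, b, c, d, e, f, g, h, i = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % 2

def det_rational(M):
    """Exact determinant over Q by Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    n, det = len(M), Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det
        det *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return det

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

G = [m for m in product((0, 1), repeat=9) if det_mod2(m) == 1]  # GL(3, Z/2) = PSL(3, Z/2)
points = [v for v in product((0, 1), repeat=3) if any(v)]       # plays the role of G/H1
lines = [frozenset(v for v in points if sum(a * b for a, b in zip(l, v)) % 2 == 0)
         for l in points]                                       # plays the role of G/H2

def move_line(g, L):
    return frozenset(mat_vec(g, v) for v in L)

def perm_points(g):  # column v has its single 1 in row g(v)
    return [[int(mat_vec(g, v) == w) for v in points] for w in points]

def perm_lines(g):
    return [[int(move_line(g, L) == L2) for L in lines] for L2 in lines]

def orbit_length(step, x):
    y, n = step(x), 1
    while y != x:
        y, n = step(y), n + 1
    return n

# T maps functions on points to functions on lines: (T f)(L) = sum of f over the points of L.
T = [[int(v in L) for v in points] for L in lines]
intertwines = all(matmul(T, perm_points(g)) == matmul(perm_lines(g), T) for g in G)
invertible = det_rational(T) != 0
H1 = [g for g in G if mat_vec(g, (1, 0, 0)) == (1, 0, 0)]  # stabilizer of a point
fixes_no_line = all(any(move_line(h, L) != L for h in H1) for L in lines)
seven_cycles = [g for g in G if orbit_length(lambda v: mat_vec(g, v), (1, 0, 0)) == 7]
transitive_on_lines = all(orbit_length(lambda L: move_line(g, L), lines[0]) == 7
                          for g in seven_cycles)
print(intertwines, invertible, fixes_no_line, transitive_on_lines)
```

The incidence matrix is only one convenient choice of intertwiner; any nonzero multiple of it, or its combination with the identity-like maps coming from other G-invariant correspondences, would serve equally well in this sketch.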
We wish to show:
Lemma 1. There are invertible linear maps $T\colon C^\infty(X_1) \to C^\infty(X_2)$ and $T_\partial\colon C^\infty(\partial X_1) \to C^\infty(\partial X_2)$ with the following properties:
(a) $\Delta_{X_2} \circ T = T \circ \Delta_{X_1}$.
(b) If $u \in C^\infty(X_1)$ solves $(\Delta_{X_1} - k^2)\, u = 0$ with
$u(x, \theta) = x^{-1/2} a_+(\theta) \exp(ik/x) + x^{-1/2} a_-(\theta) \exp(-ik/x) + O(x^{1/2})$ (3.2)
for functions $a_\pm \in C^\infty(\partial X_1)$, then Tu solves $(\Delta_{X_2} - k^2)\, Tu = 0$ with
$(Tu)(x, \theta) = x^{-1/2} (T_\partial a_+)(\theta) \exp(ik/x) + x^{-1/2} (T_\partial a_-)(\theta) \exp(-ik/x) + O(x^{1/2})$. (3.3)
Proof. We proceed very much as in the proof of Theorem 4.4 in [4]. Let $X_e$ be the covering of $X_0$ with $\pi_1(X_e) = f^{-1}(e)$, where e is the identity element. Then $X_e$ is a Galois covering of X1 and X2 with respective covering groups H1 and H2, and the $H_i$ act on the left on $C^\infty(X_e)$ by isometries. Note that $X_e$ is the union of a compact manifold with boundary and finitely many copies of R²\B(0, R). We may identify functions on $X_i$ with $H_i$-invariant functions on $X_e$; we denote by $C^\infty(X_e)^{H_i}$ the smooth $H_i$-invariant functions on $X_e$. If x is the canonical defining function for $X_0$ (chosen using the isometry of $X_0\backslash K_0$ and a Euclidean cone), we lift x to a defining function on $X_e$, which then projects to the canonical defining function on each of the $X_i$. The construction of a mapping $T\colon C^\infty(X_e)^{H_1} \to C^\infty(X_e)^{H_2}$ which intertwines the Laplacians proceeds exactly as in the proof of Theorem 4.4 in [4]; one obtains
$(Tu)(z) = (\#H_1)^{-1} \sum_{g \in G} c(g)\, u(gz)$
for a certain right $H_2$-invariant function c on G. From the form of $X_e$ and the fact that G acts by isometries, it is clear that any smooth function of the form (3.2) is mapped to a function of the form (3.3). Note that elements g ∈ G act as Euclidean isometries "near infinity" and so leave the defining function invariant; moreover, they act on the circle boundaries either as rotations or permutations. Thus, we may define
$(T_\partial a)(m) = (\#H_1)^{-1} \sum_{g \in G} c(g)\, a(gm)$,
and it is clear that
$(Tu)(x, \theta) = x^{-1/2} (T_\partial a_+)(\theta) \exp(ik/x) + x^{-1/2} (T_\partial a_-)(\theta) \exp(-ik/x) + O(x^{1/2})$.
This simple lemma implies:
Proposition 1. The mapping $T_\partial\colon C^\infty(\partial X_1) \to C^\infty(\partial X_2)$ satisfies $S_{X_2}(k) \circ T_\partial = T_\partial \circ S_{X_1}(k)$ for all k > 0.
Proof. Let $a_- \in C^\infty(\partial X_1)$ be given, and let u be the unique solution of (1.1) (with X = X1) having the asymptotic form (1.2), where $a_+ = S_{X_1}(k)\, a_-$. Mapping u to Tu and computing its asymptotic expansion yields coefficients $b_\pm$ with $b_+ = S_{X_2}(k)\, b_-$. By construction $b_\pm = T_\partial a_\pm$, so that
$T_\partial S_{X_1}(k)\, a_- = T_\partial a_+ = b_+ = S_{X_2}(k)\, b_- = S_{X_2}(k)\, T_\partial a_-$.
Note that the mapping $T_\partial$ is independent of k, so that the intertwining relation is also valid for other k by meromorphic continuation.
4. Sunada Pairs of Manifolds Euclidean at Infinity
We now consider the following construction of isospectral surfaces of small genus, taken from [5]. Let G be the group PSL(3, Z/2), and let H1 and H2 be the subgroups
$H_1 = \begin{pmatrix} * & * & * \\ 0 & * & * \\ 0 & * & * \end{pmatrix}$ and $H_2 = \begin{pmatrix} * & 0 & 0 \\ * & * & * \\ * & * & * \end{pmatrix}$.
Note that H1 and H2 have index 7 in G. As is discussed, for instance, in [3], the triple (G, H1, H2) satisfies the Sunada condition. It was shown in [5] that:
Lemma 2. (a) There are elements A and B of G such that A, B, and AB are all of order 7, and A and B generate G. (b) There are elements A′ and B′ of G such that A′ and B′ generate G, and the commutator [A′, B′] is of order 7.
We observe that, since 7 is prime, A, B, and AB each have one cyclic orbit on the coset space G/H1 and on the coset space G/H2. Now if we let S0 be the sphere with three singular points of order 7, the calculation of the orbifold fundamental group π1(S0) gives us a homomorphism φ: π1(S0) → G given by φ(X1) = A, φ(X2) = B, φ(X3) = (AB)^{-1}.
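The group-theoretic claims above can be checked by brute force. The sketch below (helper names are ours) enumerates the 168 invertible 3 × 3 matrices over Z/2, builds H1 and H2 from the displayed patterns, verifies the Sunada condition (3.1) class by class, confirms that H1 and H2 are not conjugate, and searches for a pair (A, B) as in Lemma 2(a). It is an illustration only and does not reproduce the argument of [5].

```python
from collections import Counter
from itertools import product

IDENTITY = (1, 0, 0, 0, 1, 0, 0, 0, 1)

def mat_mul(a, b):
    """Product of 3x3 matrices over Z/2 stored row-major as tuples of 9 bits."""
    return tuple(sum(a[3 * i + k] * b[3 * k + j] for k in range(3)) % 2
                 for i in range(3) for j in range(3))

def det_mod2(m):
    a, b, c, d, e, f, g, h, i = m
    return (a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)) % 2

def order(g):
    k, x = 1, g
    while x != IDENTITY:
        x, k = mat_mul(x, g), k + 1
    return k

G = [m for m in product((0, 1), repeat=9) if det_mod2(m) == 1]  # PSL(3, Z/2) = GL(3, Z/2)
INV = {a: next(b for b in G if mat_mul(a, b) == IDENTITY) for a in G}
H1 = [m for m in G if m[3] == 0 and m[6] == 0]  # zeros below the (1,1) entry
H2 = [m for m in G if m[1] == 0 and m[2] == 0]  # zeros to the right of the (1,1) entry

def conj(x, g):
    return mat_mul(mat_mul(x, g), INV[x])

# Label every element of G by its conjugacy class.
class_of, n_classes = {}, 0
for g in G:
    if g not in class_of:
        for y in {conj(x, g) for x in G}:
            class_of[y] = n_classes
        n_classes += 1

sunada = Counter(class_of[h] for h in H1) == Counter(class_of[h] for h in H2)
not_conjugate = all(frozenset(conj(x, h) for h in H1) != frozenset(H2) for x in G)

def generated(gens):
    """Subgroup generated by gens (closure under right multiplication; G is finite)."""
    seen, todo = {IDENTITY}, [IDENTITY]
    while todo:
        x = todo.pop()
        for s in gens:
            y = mat_mul(x, s)
            if y not in seen:
                seen.add(y)
                todo.append(y)
    return seen

def lemma_2a_pair():
    """Search for A, B of order 7 with AB of order 7 and <A, B> = G."""
    sevens = [g for g in G if order(g) == 7]
    for A in sevens:
        for B in sevens:
            if order(mat_mul(A, B)) == 7 and len(generated([A, B])) == len(G):
                return A, B
    return None

print(len(G), len(H1), len(H2), n_classes, sunada, not_conjugate, lemma_2a_pair())
```

The search stops at the first admissible pair; the homomorphism φ of the text then sends X1 to A, X2 to B and X3 to the inverse of their product.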
Fig. 1. The first surface of genus 3
Fig. 2. The second surface of genus 3
It follows that the orbifold coverings S1 and S2, as given in Sect. 2, are smooth. A simple calculation of the Euler characteristic shows that they are surfaces of genus 3 (see Figs. 1 and 2). We may now pick one of the singular points of S0 and replace it with a truncated cone of cone angle 2π/7, as in Sect. 2. The resulting coverings S1^p and S2^p will then be Euclidean at infinity and isophasal. In order to show that they are not isometric, we may, as in [5], choose a sufficiently asymmetric metric on S0. If we now let T0 be the torus with one singular point of order 7, then we may use the generators of part (b) above to find a homomorphism π1(T0) → G. The surfaces T1 and T2 will then be surfaces of genus 4, and are shown to be non-isometric by a direct argument in [5]. Replacing a neighborhood of the singular point with a truncated cone of cone angle 2π/7 then gives us surfaces T1^p and T2^p which are isophasal and Euclidean at infinity. The argument of [5] carries over immediately to show that T1^p and T2^p are not isometric (see Figs. 3 and 4).
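The genus count can be sketched with the orbifold Riemann–Hurwitz formula, a standard fact assumed here rather than stated in the text: the orbifold Euler characteristic 2 − 2g − Σ(1 − 1/n_i) of a closed orientable orbifold surface is multiplied by the degree under a finite covering. A smooth degree-7 cover of S0 (genus 0, three cone points of order 7) therefore has χ = −4, and one of T0 (genus 1, one cone point of order 7) has χ = −6. The function names are ours.

```python
from fractions import Fraction

def orbifold_euler_characteristic(genus, cone_orders):
    """chi = 2 - 2g - sum(1 - 1/n_i) for a closed orientable orbifold surface."""
    return 2 - 2 * genus - sum(1 - Fraction(1, n) for n in cone_orders)

def genus_of_smooth_cover(base_genus, cone_orders, degree):
    """Genus of a smooth closed orientable cover of the given degree, via chi(cover) = 2 - 2g."""
    chi = degree * orbifold_euler_characteristic(base_genus, cone_orders)
    if chi.denominator != 1 or chi.numerator % 2:
        raise ValueError("no smooth closed orientable cover of this degree")
    return 1 - chi.numerator // 2

print(genus_of_smooth_cover(0, [7, 7, 7], 7), genus_of_smooth_cover(1, [7], 7))
```

Replacing a cone point by a truncated cone, as in Sect. 2, changes only the metric near that point and does not alter the genus of the covering surfaces.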
Fig. 3. The first surface of genus 4
Fig. 4. The second surface of genus 4
Acknowledgements. Peter Perry gratefully acknowledges the support of the University of Kentucky through its Research Professorship program.
References
1. Beale, J.T.: Scattering frequencies of resonators. C.P.A.M. 26, 549–563 (1973)
2. Bérard, P.: Transplantation et isospectralité. Math. Ann. 292, 547–560 (1992)
3. Brooks, R.: Constructing isospectral manifolds. Am. Math. Monthly 95, no. 9, 823–839 (1988)
4. Brooks, R., Gornet, R., Perry, P.: Isoscattering Schottky manifolds. G.A.F.A. 10, 307–326 (2000)
5. Brooks, R., Tse, R.: Isospectral surfaces of small genus. Nagoya Math. J. 107, 13–24 (1987)
6. Buser, P.: Isospectral Riemann surfaces. Ann. Inst. Fourier Grenoble 36, 167–192 (1986)
7. Hassell, A., Zelditch, S.: Determinants of Laplacians in exterior domains. Int. Math. Res. Not. 18, 971–1004 (1999)
8. Hislop, P., Martinez, A.: Scattering resonances of a Helmholtz resonator. Indiana Univ. Math. J. 40, 767–788 (1991)
9. Melrose, R.B.: Geometric Scattering Theory. New York, Melbourne: Cambridge University Press, 1995
10. Melrose, R.B., Zworski, M.: Scattering metrics and geodesic flow at infinity. Inventiones Math. 124, 399–436 (1996)
11. Sunada, T.: Riemannian coverings and isospectral manifolds. Ann. Math. 121, 169–186 (1985)
12. Zelditch, S.: Isospectrality in the FIO category. J. Differential Geom. 35, no. 3, 689–710 (1992)
13. Zelditch, S.: Kuznecov sum formulae and Szegö limit formulas on manifolds. Comm. P.D.E. 17, 221–260 (1992)
Communicated by P. Sarnak
Commun. Math. Phys. 223, 475 – 507 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Wu’s Equations and Quasi-Hypergeometric Functions Kazuhiko Aomoto1 , Kazumoto Iguchi2 1 Graduate School of Mathematics, Nagoya University, 1 Furo-cho, Chikusa-ku, Nagoya 464-8602, Japan.
E-mail: [email protected]
2 70-3 Shinhari, Hari, Anan, Tokushima 774-0003, Japan. E-mail: [email protected]
Received: 22 October 1999 / Accepted: 28 June 2001
Abstract: We investigate analytic solutions to Wu's equations and their symmetry properties, singularities and monodromy, relating them to quasi-hypergeometric functions. We finally give the monodromy theorem for them in the one-, two- and arbitrary-dimensional cases (see Theorems 1, 2 and 3).
1. Quasi-Hypergeometric Functions Associated with Wu's Equations
Let $\beta = (\beta_{i,j})_{i,j=1}^{n}$ be an n × n matrix with real entries $\beta_{i,j}$. Given a point $z = (z_1, \dots, z_n) \in \mathbb{C}^n$, Wu's equations with respect to $w_1, \dots, w_n$ are
$w_i - 1 = z_i\, w_1^{\beta_{1,i}} \cdots w_n^{\beta_{n,i}}, \quad 1 \le i \le n$. (1.1)
Wu's equations have appeared as the fundamental equations of mutual fractional exclusion statistics in statistical mechanics (see [25]). By applying the multivariable Lagrange inversion formula, the second author has shown that the unique holomorphic solution to (1.1) can be explicitly expressed as a special type of quasi-hypergeometric function (see [15]). It is important, both in mathematics and in theoretical physics, to study the singularities of this solution, because they reflect the global nature of the function and are also related to critical phenomena in statistical physics. However, we do not discuss any physical aspect here.
The purpose of this note is to give a monodromy formula for the solutions w(z) to (1.1). Under a certain non-degeneracy condition, we first give $3^n$ kinds of local solutions to (1.1) near the points $w = (w_1, \dots, w_n)$ such that $w_j = 0, 1, \infty$ in the compactified space $(\mathbb{CP}^1)^n$.
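For n = 1, (1.1) reads $w - 1 = z w^\beta$. Setting u = w − 1, so that $u = z(1 + u)^\beta$, the one-variable Lagrange inversion formula (a standard special case, used here in place of the authors' multivariable computation) gives $w(z) = 1 + \sum_{k \ge 1} \frac{1}{k}\binom{\beta k}{k-1} z^k$ with generalized binomial coefficients. The sketch below compares this series with a naive fixed-point solve; the sample values of β and z and the function names are illustrative assumptions, chosen so that z lies inside the disk of convergence and the iteration contracts.

```python
def gen_binom(x, m):
    """Generalized binomial coefficient x(x-1)...(x-m+1)/m! for real x and integer m >= 0."""
    out = 1.0
    for r in range(m):
        out *= (x - r) / (r + 1)
    return out

def wu_series(beta, z, terms=60):
    """Partial sum of w(z) = 1 + sum_{k>=1} binom(beta*k, k-1) z**k / k,
    the Lagrange-inversion solution of w - 1 = z * w**beta with w(0) = 1."""
    return 1.0 + sum(gen_binom(beta * k, k - 1) * z ** k / k for k in range(1, terms + 1))

def wu_fixed_point(beta, z, iterations=200):
    """Solve w = 1 + z * w**beta by iteration from w = 1 (assumes small real z, so it contracts)."""
    w = 1.0
    for _ in range(iterations):
        w = 1.0 + z * w ** beta
    return w

for beta, z in [(2.5, 0.05), (-0.5, 0.1), (1.0, 0.3)]:
    print(beta, z, wu_series(beta, z), wu_fixed_point(beta, z))
```

For β = 1 the series reduces to the geometric series and w = 1/(1 − z); for β = 0 it collapses to w = 1 + z, two quick consistency checks of the coefficient formula.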
476
K. Aomoto, K. Iguchi
To do that, we consider the n dimensional real variety in the complex affine space Cn ,
XC1 ,... ,Cn
arg β w1 −1β = C1 w1 1,1 ···wn n,1 ··· : w −1 arg n = Cn , β
(1.2)
β
w1 1,n ···wn n,n
depending on certain real constants C1 , · · · , Cn . The analytic continuation of w(z) can be done, in a concrete way, along special paths in XC1 ,... ,Cn . The connection among the local solutions can be expressed as some congruence identities among the system of n integers (ν1 , . . . , νn ). Because of the symmetry of Wu’s equations, these functions also have a symmetry. At least in the case of a single variable case, this symmetry has already been observed in [23, 11, 14], etc. We show that this symmetry plays a crucial part to get their global property like analytic continuation, singularity and monodromy. The general theory on quasi-hypergeometric functions has been investigated in our previous paper (see [3, 4]). See also [9] where the authors investigate the very general category of functions called “GG functions” which contain our quasi-hypergeometric functions as a special case. We denote by fi (w1 , . . . , wn ) the functions
fi (w) = wi − 1 − zi w1 β1i · · · wn βni . Then the Jacobian J = hold
∂(f1 ,... ,fn ) ∂(w1 ,... ,wn )
(1.3)
at a point w = (w1 , . . . , wn ) where the equalities
f1 (w) = f2 (w) = · · · = fn (w) = 0 is given by the formula J =
ϕ(w1 , . . . , wn ) . w1 · · · wn
Here ϕ(w1 , . . . , wn ) denotes the determinant of the nth order matrix (δi,j wj − − 1))ni,j =1 .
(w βi,j j
ϕ(w1 , . . . , wn ) is a polynomial of nth degree which can be expressed as ϕ(w) = B +
n
Ci1 ···ir wi1 · · · wir ,
(1.4)
r=1 1≤i1 <···
where B and G( = C1,2,... ,n ) denote the determinant of the matrices β and 1 − β respectively. Let α1 , . . . , αn be arbitrary complex numbers. We denote by F (z) = Fβ (α1 , . . . , αn ; z1 , . . . , zn ) the holomorphic function of z at the origin (we call it the “Quasi-Hypergeometric Function”) which is by definition expressed as
Wu’s Equations and Quasi-Hypergeometric Functions
477
F (z) = Fβ (α1 , . . . , αn ; z1 , . . . , zn ) n n νn ν1 i=1 (αi + k=1 βi,k νk ) z1 · · · zn n n = i=1 (αi + k=1 βi,k νk ) ν1 ! · · · νn !
(1.5)
ν1 ,... ,νn ≥0
− 1 ,β for βi,i = βi,i i,k = βi,k (i = k). As was proved in [15] (see also [1]) the following identity holds: α1 w1 · · · wnαn . F (z) = ϕ(w) f1 =···=fn =0
(1.6)
The monomial w1α1 · · · wnαn itself has the power series w1α1 · · · wnαn = KFβ (α1 , . . . , αn ; z1 , . . . , zn ),
(1.7)
where KFβ (α1 , . . . , αn ; z1 , . . . , zn ) = BFβ (α1 , . . . , αn ; z1 , . . . , zn ) +
n
r=1 1≤i1 <···
Ci1 ,... ,ir Fβ (α1 , . . . , αi1 + 1, . . . , αir + 1, . . . , αn ; z).
(1.8)
Let us investigate the symmetry property of Eqs. (1.1). We define first a finite number )n of transformations β˜ = (β˜i,j i,j =1 : ρk β , (1 ≤ k ≤ n − 1), σk β (1 ≤ k ≤ n) and τk β (1 ≤ k ≤ n) respectively applied to β . Let ρk be the transposition between the arguments k and k + 1: β˜i,j = βρk (i),ρk (j ) . σk β is defined by the substitution β˜ = 1−β , β˜ = −β (j = k), and β˜ = β (i = k). Finally τk β is defined as k,k
k,j β = β , β˜k,j = βk,j k,k k,k 2 2 have ρi = σj = τk2
β˜k,k
k,k
1
k,j
(j =
k), β˜i,j
=
i,j i,j β −β β βk,k i,j k,j i,k βk,k
β
= − i,k (i = k). (i, j = k), β˜i,k β k,k
= e for the identical transformation e. We The system of generators {ρk (1 ≤ k ≤ n − 1), σk , τk (1 ≤ k ≤ n)} forms a finite group G of order 6n · n!. G is isomorphic to the semi-direct product of n pieces of S3 and Sn , G∼ = S3n Sn , where Sm denotes the symmetric group of mth degree. {ρ1 , . . . , ρn−1 } with the relations ρi ρi+1 ρi = ρi+1 ρi ρi+1 generate Sn . Each {σi , τi } generates the subgroup S3 with the relation τi σi τi = σi τi σi . σi , τi commute with σj , τj for i = j . We denote by G0 the subgroup of order 2n ·n! generated by {ρ1 , . . . , ρn−1 , σ1 , . . . , σn }. The right coset G0 \G is finite and has the cardinality |G0 \G| = 3n . In particular we have τ1 · · · τn β = β
−1
,
τ1 · · · τn σ1 · · · σn β = (1 − β )−1 . i1 , . . . , ip We denote by β the subdeterminant of the (i1 , . . . , ip )th lines and the j1 , . . . , jp (j1 , . . . , jp )th columns of β and abbreviate it by β (i1 , . . . , ip ) if i1 = j1 , . . . , ip = jp . We assume now the following non-degeneracy condition for β :
478
K. Aomoto, K. Iguchi
σi1 · · · σip β (j1 , . . . , jq ) = 0,
(N D)
1 ≤ i1 < · · · < ip ≤ n(0 ≤ p ≤ n), 1 ≤ j1 < · · · < jq ≤ n(1 ≤ q ≤ n). In particular we have · · · βn,n (β1,1 − 1) · · · (βn,n − 1) = 0. BGβ1,1
The condition (N D) is invariant under the action of G. Lemma 1. Under (N D) Eqs. (1.1) admit the action of G as follows: We have β˜
β˜
w˜ i − 1 = z˜ i w˜ 1 1,i · · · w˜ n n,i 1 ≤ i ≤ n,
(1.9)
ρk : w˜ i = wi (i = k, k + 1), w˜ k+1 = wk , w˜ k = wk+1 , z˜ i = zi (i = k, k + 1), z˜ k+1 = zk , z˜ k = zk+1 , σk : w˜ i = wi (i = k), w˜ k = wk−1 ,
β
β
τk : w˜ i = wi (i = k), w˜ k = 1 − wk = −zk w1 1,k · · · wn n,k , z˜ i = zi (−zk )
−
βki βk,k
(i = k), z˜ k = −(−zk )
−
1 βk,k
.
Each representative σ of the right coset G0 \G gives rise to the transformation F = Fβ (α; z) → Tσ F satisfying the identity Tσ σ F = Tσ (Tσ F ) for arbitrary σ, σ ∈ G, such that
σ w1α1 · · · wnαn = KTσ Fβ (α1 , . . . , αn ; z1 , . . . , zn ) σ ∈ G.
(1.10)
In fact, Tρk Fβ (α; z) = Fρk β (α1 , . . . , αk+1 , αk , . . . , αn ; z1 , . . . , zk+1 , zk , . . . , zn ),
(1.11)
Tσk Fβ (α; z) = Fσk β (α1 , . . . , 1 − αk , . . . , αn ; z1 , . . . , −zk , . . . , zn ),
(1.12)
Tτk Fβ (α; z) = where α˜ i = αi − In particular
αk βi,k , α˜k βk,k
=
αk ˜k ,z βk,k
Tτ1 ···τn β Fβ (α; z) =
1 βk,k
(−zk )
= −(−zk )
−
1 βk,k
−
αk βk,k
Fβ (α; ˜ z˜ ),
, z˜ i = zi (−zk )
−
(1.13) βk,i βk,k
.
1 (−z1 )−α˜1 · · · (−zn )−α˜n Fτ1 ···τn β (α; ˜ z˜ ), B
(1.14)
˜ ˜ α for β ˜ = β −1 and z˜ i = −(−z1 )−β1,i where α˜ i = nj=1 β˜i,j · · · (−zn )−βn,i . j Similarly we have Tσ1 ···σn τ1 ···τn Fβ (α; z) = for α˜ i =
n
˜ j =1 βi,j αj
1 (z1 )−α˜1 · · · (zn )−α˜n Fτ1 ···τn σ1 ···σn β (α; ˜ z˜ ) G −1
for β˜ = (1 − β )
˜
˜
and z˜ i = −(z1 )−β1,i · · · (zn )−βn,i .
(1.15)
Wu’s Equations and Quasi-Hypergeometric Functions
479
2. A Family of Local Solutions to Wu’s Equations Consider Eqs. (1.1) in the compactified space (CP 1 )n . We fix non negative integers p, q such that p + q ≤ n. Put −1 w1 = Z1 W1 , ...., wp = Zp Wp , wp+1 = Zp+1 Wp+1 , . . . , wp+q −1 = Zp+q Wp+q , wp+q+1 = Wp+q+1 , . . . , wn = Wn .
Then (1.1) are transformed into β
β
Zi Wi − 1 = −W1 1,i · · · Wn n,i , β1,i
Wi − Zi = W1 Wi − 1 =
βn,i
· · · Wn ,
β Zi W1 1,i
1 ≤ i ≤ p, p + 1 ≤ i ≤ p + q,
β · · · Wn n,i ,
(2.1)
p + q + 1 ≤ i ≤ n,
respectively, where Z1 , . . . , Zn are uniquely defined by the equations β
β
−β
−β
zi Z1 1i · · · Zpp,i Zp+1p+1,i · · · Zp+qp+q,i = −1, βp,i
β1i
−βp+1,i
zi Z1 · · · Zp Zp+1 β1i
zi Z1
1−βi,i
· · · Zi
β −β · · · Zpp,i Zp+1p+1,i
−βp+q,i
· · · Zp+q
= 1,
−β · · · Zp+qp+q,i
= Zi ,
1 ≤ i ≤ p, p + 1 ≤ i ≤ p + q, p + q + 1 ≤ i ≤ n.
Under the condition (N D), (2.1) has the power series solutions in Z1 , . . . , Zn in a small neighborhood of the origin |Z1 | < δ1 , . . . , |Zn | < δn for suitable positive numbers δ1 , . . . , δn , Wi = 1 +
n
Zk ck,i + · · · .
(2.2)
k=1
ck,i are uniquely determined by the relations n j =1 n j =1
ci,j βj,s = −δi,s 1 ≤ i ≤ p, 1 ≤ s ≤ n,
ci,j βj,s − ci,s = −δi,s , p + 1 ≤ i ≤ p + q, 1 ≤ s ≤ n,
ci,s = δi,s , p + q + 1 ≤ i ≤ n, 1 ≤ s ≤ n. We can choose an arbitrary subset of indices i1 , . . . , ip , ip+1 , . . . , ip+q from {1, 2, . . . , n} instead of the subset {1, 2, . . . , p; p + 1, . . . , p + q}. We have then a similar solution to (2.2). We denote by )1 , . . . , )n the local solution to (1.1) such that )j = 0, ∞, 1 as wj takes 0, ∞, 1 at Z1 = Z2 = ... = Zn = 0, in other words, as j = iν (1 ≤ ν ≤ p), j = iν (p + 1 ≤ ν ≤ p + q) and j otherwise. The total number of the local solutions thus obtained is 3n . In particular, when q = n, we have the local solution 0n such that
n ˜ (1 ≤ i ≤ n), (2.3) βi,j Zj + · · · wi = Zi 1 − j =1
480
K. Aomoto, K. Iguchi
where ˜
˜
Zi = (−z1 )−β1,i · · · (−zn )−βn,i and the matrix β˜i,j denotes the inverse of β .
1 ≤ i ≤ n,
(2.4)
Definition 1. In the complex plane zj near the infinity, 0n is multivalued and has new branches by counterclockwise rotations Sj zj → zj e2πi . We denote the branches thus obtained by wν(∞) 1 ,... ,νn
=
S1ν1
· · · Snνn 0n
= exp −2π i
n k=1
˜ βk,i νk Zi (1 − · · · ).
(2.5)
3. Case N = 1 Let us consider first the case n = 1 which plays a basic role in many variable cases. Let α ∈ C and β ∈ R be given constants. Equation (1.1) reduces to w − 1 = zw β
(3.1)
and the QHGF defined in (1.4) reduces to the Lambert series Fβ (α; z) =
∞ ν=0
(α + β ν) zν . (α + (β − 1)ν) ν!
(3.2)
We have the identities wα , (1 − β )w + β w α = β Fβ (α; z) + (1 − β )Fβ (α + 1; z)
Fβ (α; z) =
(3.3) (3.4)
for the holomorphic solution w to (1.1) at the origin. In particular, w = β Fβ (1; z) + (1 − β )Fβ (2; z).
(3.5)
(0)
We shall denote this holomorphic solution w by w0 . The condition (N D) simply means β = 0, 1. Equation (3.1) has the symmetry with respect to σ1 : β → 1 − β and τ1 : β →
1 . β
In the sequel we shall only consider the case β > 1, without losing generality. Fβ (α; z) is holomorphic in the disc |z| < c (c = β −β (β − 1)β −1 ) and has a branch at z = c. We have Tσ1 Fβ (α; z) = F1−β (1 − α; −z) = Fβ (α; z)
(3.6)
Wu’s Equations and Quasi-Hypergeometric Functions
481
for |z| < c. On the other hand, Tτ1 Fβ (α; z) =
1 −α (−z) β F 1 β β
α −1 ; −(−z) β β
(3.7)
which is defined in a neighborhood of z = ∞. In the same way
α 1 − βα − β1 Tτ1 σ1 Fβ (α; z) = (−z) F1− 1 1 − ; (−z) , β β β 1−α 1 − 1−β Tσ1 τ1 Fβ (α; z) = z F 1 1−β 1 − β
(3.8)
1 1−α −1 β , ; −z 1 − β
(3.9)
1−α 1 α − β 1 − 1−β −1 β Tτ1 σ1 τ1 Fβ (α; z) = . z F β ; −z 1 − β 1 − β β −1
(3.10)
The map w−1 wβ
T :w→
(3.11)
from R>0 to R has critical point w(c) = β β−1 > 1 and critical value c. The inverse image by T of the interval (0, ∞) consists of the real positive line l 1 and the convex loop l 2 starting from and ending in the origin, which intersects l 1 at w(c) in normal crossing. The arguments of its tangents at the origin are equal to ± βπ . l 1 and l 2 form the separatrices of the phase defined by the family of curves XC : arg
w−1 =C wβ
(C denotes a constant). (0)
(0)
Lemma 2. Equation (3.1) has the local solution wˆ 0 at z = 0 besides w0 ,
1 (0) wˆ 0 = Z −1 1 − Z + ··· , β −1 1
(3.12)
(0)
which is a Laurent series with respect to Z for Z = z β −1 . wˆ 0 can also be expressed by using (3.9) as (0)
wˆ 0 = β Tσ1 τ1 Fβ (1; z) + (1 − β )Tσ1 τ1 Fβ (2; z)
(3.13)
from the identity (3.4). 1
(0)
Proof. By substitution w = Z −1 W , Z = z β −1 , wˆ 0 can be obtained as the unique solution to the equation W − Z = Wβ such that W (0) = 1.
482
K. Aomoto, K. Iguchi
In the same way, we have (∞)
Lemma 3. Equation (3.1) has the local solution w0 (∞) w0 − β1
for Z = (−z)
in a neighborhood of z = ∞,
1 = Z 1 − Z + ··· β
(3.14)
, and (∞)
w0
= β Tτ1 Fβ (1; z) + (1 − β )Tτ1 Fβ (2; z). (0)
(3.15) (0)
Lemma 4. Tσ1 τ1 Fβ (α; z) (or wˆ 0 ) is the analytic continuation of Fβ (α; z) (or w0 ) along a closed curve starting from the origin, turning around c counterclockwise in a small circle and going back to the origin. Let S be the shift operator defined by the analytic continuation by the rotation z → (0) (∞) ze2πi , and hence by S ν the ν-times rotations for an integer ν. We denote S ν wˆ 0 , S ν w0 (0) (∞) by wˆ ν , wν respectively. Lemma 5. Let l + be the oriented curve consisting of the interval [1, w(c)] combined by the lower half of l 2 which starts from 1 and ends in w = 0. Then Tτ1 Fβ (α; z) (or (∞) (0) w0 ) is the analytic continuation of Fβ (α; z) (or w0 ) when z moves from 0 to +∞ along the real axis z ≥ 0 detoured around c in the lower half plane. Lemma 6. Let l − be the oriented curve consisting of the interval [w(c), ∞] and the (∞) upper half of l 2 which starts from w = +∞ and ends in w = 0. Then w−1 is the (0)
analytic continuation of wˆ 0 along l − when z moves starting from z = 0 and ending in z = +∞ along the real axis detoured around c in the lower half plane.
We have X0 = l 1 ∪ l 2 = l + ∪ l − . Hence X0 contains two paths from 1 to 0 and from (0) (∞) ∞ to 0 along which we have the analytic continuations from w0 to w0 and from (0) (∞) wˆ 0 to w−1 respectively. We also consider the following equations which are obtained from (3.1) by the rotation w → we2πνi (ν ∈ Z):
w − 1 = e2πνiβ w β .
(3.16)
(0)
Equation (3.16) has the holomorphic solution wν at z = 0,
wν(0) = β Fβ (1; e2πνβ i z) + (1 − β )Fβ (2; e2πνβ i z).
(3.17)
(∞)
Proposition 1. (1) Suppose that wµ is the local solution at z = ∞ obtained by the (0) analytic continuation of wν along the real curve X0 from the origin z = 0 to z = +∞. π+2πµ Let 2πνβ and − β be expressed as 2π νβ = ϕ + 2mπ
(3.18)
Wu’s Equations and Quasi-Hypergeometric Functions
483
(0 < ϕ ≤ 2π, m ∈ Z) and π + 2π µ = θ + 2nπ β (|θ | < π, n ∈ Z.) Then we get the equalities −
(3.19)
1 + µ = −m, ν = n.
(3.20)
2π ν = ϕ + 2mπ 1 − β
(3.21)
(2) Put
(∞)
for 0 ≤ ϕ < 2π . wµ if
(0)
is the analytic continuation of wˆ ν along the curve l 2 if and only µ + 1 = ν − m.
(3.22)
Proof. (1) The curve X0 lies in the region surrounded by l 2 . We have C = ϕ = π − β θ.
(3.23)
This implies (3.20). (2) In this case the curve X0 lies outside of the curve l 2 . We have C = ϕ − β (ϕ + 2mπ ) = π − β (θ + 2nπ ).
(3.24)
This occurs if and only if (3.22) holds. (0)
(0)
This proposition shows that the local solutions wν or wˆ ν at the origin are analyti(∞) cally continued to the ones wµ at z = ∞ along the curves X0 . Lemma 7. As z moves from 0 to c on the real axis and turn around c, in a detoured (0) (0) way and moves back to the origin, w0 and wˆ 0 meet each other and are transposed at (0) (0) z = c, while each of any other branches wµ or wˆ µ remains the same. (∞) (∞) In the same way w0 and w−1 meet each other and are transposed while each of (∞)
the other wµ (µ = 0, −1) remains the same. In other words, we have the monodromy which is a permutation among (∞) wµ −∞<µ<∞ (∞) (∞) w → w−1 , 0 (∞) (∞) Mc : w−1 → w0 , (∞) (∞) wµ → wµ (µ = 0, −1). On the other hand by the rotation S, we have (∞)
S : wµ(∞) → wµ+1 .
(3.25)
As a conclusion, we may state Theorem 1. Assume that β > 1. The monodromy corresponding to the group generated by the transformations Mc and S gives rise to permutations among the local solutions (∞) wµ (−∞ < µ < ∞). It contains every finite permutation and the shifts S ν (−∞ < ν < ∞): (∞)
S ν : wµ(∞) → wµ+ν .
(3.26)
484
K. Aomoto, K. Iguchi
4. Case n = 2 Equations (1.1) are written as β
β
β
β
w1 − 1 = z1 w1 1,1 w2 2,1 , w2 − 1 = z2 w1 1,2 w2 2,2 .
(4.1)
The corresponding QHGF is
=
ν1 ,ν2 ≥0
ν (α1 + β1,1 1
Fβ (α1 , α2 ; z1 , z2 )
ν )(α + β ν + β ν ) ν1 ν2 + β1,2 2 2 2,1 1 2,2 2 z1 z2
(α1 + β1,1 ν1 + β1,2 ν2 )(α2 + β2,1 ν1 + β2,2 ν2 ) ν1 !ν2 !
(4.2)
−1 = β for β1,1 1,1 β2,2 − 1 = β2,2 and β1,2 = β1,2 β2,1 = β2,1 . Equations (1.3) and (1.5) are valid for ϕ(w1 , w2 ) = Gw1 w2 − (B − β1,1 )w2 − (B − β2,2 )w1 + B,
(4.3)
or equivalently, w1α1 w2α2 = KFβ (α1 , α2 ; z1 , z2 ),
(4.4)
where K denotes the operator KFβ (α1 , α2 ; z1 , z2 )
= GFβ (α1 + 1, α2 + 1; z1 , z2 ) − (B − β1,1 )Fβ (α1 , α2 + 1; z1 , z2 ) −(B − β2,2 )Fβ (α1 + 1, α2 ; z1 , z2 ) + BFβ (α1 , α2 ; z1 , z2 ).
(4.5)
Here B and G represent the determinants B = β1,1 β2,2 − β1,2 β2,1 , )(1 − β2,2 ) − β1,2 β2,1 . G = (1 − β1,1
In particular w1 = KFβ (1, 0; z1 , z2 ), w2 = KFβ (0, 1; z1 , z2 ).
(4.6)
The condition (N D) reduces to the inequality BGβ1,1 β2,2 (β1,1 − 1)(β2,2 − 1)(B − β1,1 )(B − β2,2 ) = 0.
(4.7)
The group G is of order 72 and is generated by < ρ1 , σ1 , σ2 , τ1 , τ2 > with relations ρ12 = σ12 = σ22 = τ12 = τ22 = e, σ1 σ2 = σ2 σ1 , τ1 τ2 = τ2 τ1 , σ1 τ1 σ1 = τ1 σ1 τ1 , σ2 τ2 σ2 = τ2 σ2 τ2 . The representatives of the coset G0 \G can be chosen as {e, τ1 , τ2 , τ1 τ2 , τ1 σ1 , τ2 σ2 , τ1 σ1 τ2 , τ2 σ2 τ1 , τ1 τ2 σ1 σ2 } (see [13]).
Wu’s Equations and Quasi-Hypergeometric Functions
The corresponding matrices are given as , −β 1 − β1,1 1,2 σ1 β = , , β β2,1 2,2 1 β1,2 , β β1,1 1,1 , τ1 β = β2,1 B − , β β1,1 1,1 β2,2 β1,2 B ,− B τ1 τ2 β = , β 2,1 β1,1 , − B B β1,2 1 1 − β , − 1 − β 1,1 1,1 τ1 σ1 β = − B , β2,1 β2,2 − , 1 − β1,1 1 − β1,1 1 − β2,2 β1,2 β − B , − β − B 1,1 τ2 σ2 τ1 β = 1,1 , β2,1 β1,1 − B , β − B β1,1 1,1 1 − β2,2 β1,2 , G G τ1 τ2 σ1 σ2 β = . β 2,1 1 − β1,1 , G G
485
, β β1,1 1,2
σ2 β =
, 1 − β −β2,1 2,2 B β1,2 , − β2,2 β2,2 , τ2 β = β 2,1 1 , β β2,2 2,2
−B β1,1
,
β1,2
1 − β , − 1 − β 2,2 2,2 , τ2 σ2 β = β2,1 1 − , 1 − β2,2 1 − β2,2 β2,2 β1,2 β − B , β − B 2,2 2,2 τ1 σ1 τ2 β = , β2,1 1 − β1,1 − , β2,2 − B β2,2 −B
β = 0 is transformed by a suitable Lemma 8. An arbitrary matrix β such that β1,2 2,1 > 0, β˜ > 0. element of G into a matrix β˜ such that β˜1,2 2,1 > 0, β > 0, belongs to one of the following Lemma 9. Every matrix β satisfying β1,2 2,1 , β , β − 1, β − 1, B − β , B − 4 orbits of G, according to the signs of G, B, β1,1 2,2 1,1 2,2 1,1 : β2,2
(a) (+ + + + + + ++) (+ + + + − − −−) (− + + + + − −+) (− + + + − + +−) (+ − − + − − +−) (+ − + − − − −+) (− − − + − + −+) (− − + − + − +−) (+ + − − − − ++),
(b) (+ − + + − − −−) (− + + + + + +−) (− + + + + + −+) (− + + + + + ++) (− − − − − − −−) (+ − − + − − −−) (+ − + − − − −−) (− − + − − − −−) (− − − + − − −−),
486
K. Aomoto, K. Iguchi
(c) (− − + + + + −−) (− − + + + − −−) (− − + + − + −−) (− + + + − + −−) (− + + + + − −−) (− + + + − − −−) (+ − − − − − ++) (+ − − − − − −+) (+ − − − − − +−),
(d) (− − + + − − −−) (− + + + + + −−) (+ − − − − − −−).
As a consequence, we have Corollary 1. An arbitrary matrix β satisfying N D such that β1,2 > 0, β2,1 > 0 can be transformed by an element of G into the following four cases: > 0, β > 0, B > 0, G > 0, β > 1, β > 1, B −β > 0, B −β > 0; (a) β1,2 2,1 1,1 2,2 1,1 2,2 > 0, β2,1 > 0; B < 0, G > 0, 1 > β1,1 > 0, 1 > β2,2 > 0, B − β1,1 < (b) β1,2 < 0; 0, B − β2,2 > 0, β > 0, B < 0, G < 0, β > 1, β > 1, B −β < 0, B −β < 0; (c) β1,2 2,1 1,1 2,2 1,1 2,2 > 0, β2,1 > 0, B < 0, G < 0, 1 > β1,1 > 0, 1 > β2,2 > 0, B − β1,1 < (d) β1,2 0, B − β2,2 < 0.
In this section, because of simplicity, we only consider the case (a) in the corollary, namely we assume the following: > 0, β > 0, B > 0, G > 0, β > 1, β > 1, B −β > 0, B −β > 0. (C1 ) β1,2 2,1 1,1 2,2 1,1 2,2
The transformation formula for Fβ (α1 , α2 ; z1 , z2 ) corresponding to each representative of G0 \G is given as follows: (1) Tσ1 Fβ (α1 , α2 ; z1 , z2 ) = Fσ1 β (1 − α1 , α2 ; −z1 , z2 ), Tσ2 Fβ (α1 , α2 ; z1 , z2 ) = Fσ2 β (α1 , 1 − α2 ; z1 , −z2 ). Both coincide with Fβ (α1 , α2 ; z1 , z2 ) itself. (2) Tτ1 Fβ (α1 , α2 ; z1 , z2 ) = Fτ1 β for z˜ 1 = −(−z1 )
α1 , α2 β1,1
β
− α1 β2,1 ˜ 1 , z˜ 2 ;z 1,1
1
β1,1
(−z1 )
−
α1 β1,1
β − 1,2 β1,1
− 1 β1,1
, z˜ 2 = z2 (−z1 ) . α − 2 β β2,2 α2 1 (3) Tτ2 Fβ (α1 , α2 ; z1 , z2 ) = Fτ2 β α1 − α2 β1,2 , ; z ˜ , z ˜ (−z ) 1 2 β 2 β 2,2
for z˜ 1 = z1 (−z2 )
β − 2,1 β2,2
, z˜ 2 = −(−z2 )
(4) Tτ1 τ2 Fβ (α1 , α2 ; z1 , z2 ) = for α˜ 1 =
−α β α1 β2,2 2 1,2 , B
z˜ 2 = −(−z1 )
β1,2 B
(−z2 )
.
1 β2,2
2,2
.
1 −α˜1 (−z )−α˜2 F ˜ 1 , α˜ 2 ; z˜ 1 , z˜ 2 ) 2 τ1 τ2 β (α B (−z1 )
α˜ 2 =
β − 1,1 B
−
2,2
+α β −α1 β2,1 2 1,1 B
and z˜ 1 = −(−z1 )−
β2,2 B
(−z2 )
β2,1 B
,
Wu’s Equations and Quasi-Hypergeometric Functions
(5) Tσ1 τ1 Fβ (α1 , α2 ; z1 , z2 ) = Fτ1 σ1 β −
for z˜ 1 = −z1
1 1−β1,1
β1,2 1−β1,1
, z˜ 2 = z1
487
−
, z˜ 2 = −z2
1 1−β2,2
2 − 1−β2,2 (1−α2 )β1,2 1−α2 1 , ; z ˜ , z ˜ z 1 2 1−β2,2 1−β2,2 1−β2,2 2 1−α
(6) Tσ2 τ2 Fβ (α1 , α2 ; z1 , z2 ) = Fτ2 σ2 β α1 −
for z˜ 1 = z1 z2
−
z2 .
β2,1 1−β2,2
1 − 1−β1,1 (1−α1 )β2,1 1 ; z˜ 1 , z˜ 2 1−β z1 1−β1,1 1,1 1−α
1−α1 , α2 1−β1,1
.
(7) Tτ1 σ2 τ2 Fβ (α1 , α2 ; z1 , z2 ) = Fτ2 σ2 τ1 β (α˜ 1 , α˜ 2 ; z˜ 1 , z˜ 2 ) β
1
1,1 −B
for α˜ 1 = (z2 )
β − 2,1 β1,1 −B
)−(1−α )β α1 (1−β2,2 2 1,2 , −B β1,1
, z˜ 2 = −(−z1 )
β1,2 −B β1,1
+(1−α )β α1 β2,1 2 1,1 −B β1,1
α˜ 2 =
(z2 )
β − 1,1 β1,1 −B
and z˜ 1 = −(−z1 )
1
2,2 −B
for α˜ 1 = (−z2 )
β2,1 −B β2,2
, z˜ 2 = −(z1 )
α˜ 2 =
β − 1,2 β2,2 −B
(−z2 )
−
1−β2,2 −B β1,1
.
(8) Tτ2 σ1 τ1 Fβ (α1 , α2 ; z1 , z2 ) = Fτ1 σ1 τ2 β (α˜ 1 , α˜ 2 ; z˜ 1 , z˜ 2 ) β +α β (1−α1 )β2,2 2 1,2 , −B β2,2
(−z1 )−α˜ 1 (z2 )−α˜ 2
+α (1−β ) −(1−α1 )β2,1 2 1,1 −B β2,2
1−β − 1,1 β2,2 −B
(z1 )−α˜ 1 (−z2 )−α˜ 2
and z˜ 1 = −(z1 )
−
β2,2 −B β2,2
.
(9) Tσ1 σ2 τ1 τ2 Fβ (α1 , α2 ; z1 , z2 ) = Fτ2 τ1 σ2 σ1 β (α˜ 1 , α˜ 2 ; z˜ 1 , z˜ 2 ) G1 z1−α˜ 1 z2−α˜ 2 for α˜ 1 = β − 2,1 G
z2
)+(1−α )β (1−α1 )(1−β2,2 2 1,2 , G β − 1,2 G
, z˜ 2 = −z1
1−β − G1,1
z2
α˜ 2 =
+(1−α )(1−β ) (1−α1 )β2,1 2 1,1 G
and z˜ 1 =
− −z1
1−β2,2 G
.
Proposition 2. Corresponding to each of the coset G0 \G there are 9 local solutions w = (w1 , w2 ) to Eq. (4.1). They are given as follows: (1) w = (w1 , w2 ) are holomorphic in a neighborhood U1 (δ1 , δ2 ) : |z1 | < δ1 , |z2 | < δ2 for small positive numbers δ1 , δ2 . w1 = KFβ (1, 0; z1 , z2 ) = 1 + z1 + · · · , w2 = KFβ (0, 1; z1 , z2 ) = 1 + z2 + · · · .
(4.8)
(2) w1 = KTτ1 Fβ (1, 0; z1 , z2 ) = Z1
β2,1 1 1 − Z 1 − Z2 + · · · , β1,1 β1,1
w2 = KTτ1 Fβ (0, 1; z1 , z2 ) = 1 + Z2 + · · ·
(4.9)
488
K. Aomoto, K. Iguchi
for Z1 = (−z1 ) δ1 , |Z2 | < δ2 .
−
1 β1,1
, Z2 = z2 (−z1 )
−
β1,2 β1,1
in a neighborhood U2 (δ1 , δ2 ) : |Z1 | <
(3) w1 = KTτ2 Fβ (1, 0; z1 , z2 ) = 1 + Z1 + · · · ,
β1,2 1 w2 = KTτ2 Fβ (0, 1; z1 , z2 ) = Z2 1 − Z1 − Z2 + · · · β2,2 β2,2 for Z1 = z1 (−z2 ) δ1 , |Z2 | < δ2 .
−
β2,1 β2,2
, Z2 = (−z2 )
−
1 β2,2
in a neighborhood U3 (δ1 , δ2 ) : |Z1 | <
β2,2 β2,1 Z1 + Z2 + · · · , (4) w1 = KTτ1 τ2 Fβ (1, 0; z1 , z2 ) = Z1 1 − B B
β1,2 β1,1 − Z2 + · · · w2 = KTτ1 τ2 Fβ (0, 1; z1 , z2 ) = Z2 1 + B B for Z1 = (−z1 )−
β2,2 B
(−z2 )
β2,1 B
, Z2 = (−z1 )
β1,2 B
, (−z2 )−
β1,1 B
in a neighborhood
U4 (δ1 , δ2 ) : |Z1 | < δ1 , |Z2 | < δ2 . (5) w1 = KTσ1 τ1 Fβ (1, 0; z1 , z2 ) =
Z1−1
β2,1 1 1+ Z1 + 1 − β Z2 + · · · , 1 − β1,1 1,1
w2 = KTσ1 τ1 Fβ (0, 1; z1 , z2 ) = 1 + Z2 + · · · 1 −1 β1,1
for Z1 = z1
β1,2 1−β1,1
, Z2 = z1
(4.10)
z2 in a neighborhood U5 (δ1 , δ2 ) : |Z1 | < δ1 , |Z2 | < δ2 .
(6) w1 = KTσ2 τ2 Fβ (1, 0; z1 , z2 ) = 1 + Z1 + · · · ,
β1,2 1 −1 w2 = KTσ2 τ2 Fβ (0, 1; z1 , z2 ) = Z2 1 + Z1 + 1 − β Z2 + · · · 1 − β2,2 2,2 (4.11) β2,1 1 1−β2,2
for Z1 = z1 z2
β
, Z2 = z2 2,2
−1
in a neighborhood U6 (δ1 , δ2 ) : |Z1 | < δ1 , |Z2 | < δ2 .
−1 β2,2 β2,1 Z1 − Z2 + · · · , (7) w1 = KTτ1 σ2 τ2 Fβ (1, 0; z1 , z2 ) = Z1 1 + β1,1 − B β1,1 − B
β1,2 β1,1 w2 = KTτ1 σ2 τ2 Fβ (0, 1; z1 , z2 ) = Z2−1 1 − Z1 + Z2 + · · · β1,1 − B β1,1 − B (4.12) for Z1 = (−z1 )
−1 β2,2 −B β1,1
−
z2
β2,1 β1,1 −B
U7 (δ1 , δ2 ) : |Z1 | < δ1 , |Z2 | < δ2 .
, Z2 = (−z1 )
β1,2 −B β1,1
−
z2
β1,1 β1,1 −B
in a neighborhood
Wu’s Equations and Quasi-Hypergeometric Functions
(8) w1 = KTτ2 σ1 τ1 Fβ (1, 0; z1 , z2 ) =
Z1−1
489
1+
w2 = KTτ2 σ1 τ1 Fβ (0, 1; z1 , z2 ) = Z2 1 −
β2,2
−B β2,2 β2,1
−B β2,2
Z1 −
Z1 +
β1,2
−B β2,2
−1 β1,1
−B β2,2
Z2 + · · · ,
Z2 + · · · (4.13)
−
for Z1 = z1
β2,2 −B β2,2
(−z2 )
β2,1 −B β2,2
β1,2 −B β2,2
−
, Z2 = z1
(−z2 )
−1 β1,1 −B β2,2
in a neighborhood
U7 (δ1 , δ2 ) : |Z1 | < δ1 , |Z2 | < δ2 . (9) w1 = KTσ1 σ2 τ1 τ2 Fβ (1, 0; z1 , z2 ) =
Z1−1
1+
1 − β2,2
G
Z1 +
β2,1
G
Z2 + · · · ,
β1,2 1 − β1,1 Z1 + Z2 + · · · w2 = KTσ1 σ2 τ1 τ2 Fβ (0, 1; z1 , z2 ) = Z2−1 1 + G G (4.14) −1 β2,2 G
for Z1 = z1
− z2
β2,1 G
, Z2 =
− z1
β1,2 G
−1 β1,1 G
z2
in a neighborhood U9 (δ1 , δ2 ) : |Z1 | <
δ1 , |Z2 | < δ2 . Generally Eqs. (4.1) have local solutions at each point w where w1 , w2 equal 0, 1 or ∞ such that its restriction to a family of special curves in the space (z1 , z2 ) can be expressed as quasi Puiseux expansions, as follows: Let us fix positive constants v, λ. Consider the family of curves γλ (or γλ∗ ) : z1 = t, z2 = vt λ (or z1 = −t −1 , z2 = −vt −λ ), where t moves over the set 0 < t ≤ δ, δ being a small positive constant. On each curve γλ (or γλ∗ ) Eqs. (4.1) with respect to w1 , w2 have a finite number of solutions which are expressed as quasi Puiseux expansions, w1 = u1,0 t κ1,0 + u1,1 t κ1,1 + · · · , w2 = u2,0 t κ2,0 + u2,1 t κ2,1 + · · · ,
(4.15)
such that κ1,0 < κ1,1 < · · · , κ2,0 < κ2,1 < · · · and u1,0 > 0, u2,0 > 0. Let us call these solutions “admissible”. Then Proposition 3. (a) (1), (5), (6) and (9) are all admissible solutions to (4.1) which are holomorphic in the neighborhood |z1 |
−
β1,2 β1,1 −1
|z2 | < δ1 , |z1 ||z2 |
−
β2,1 β2,2 −1
< δ2 , |z1 | < δ3 , |z2 | < δ4
for small positive numbers δ1 , δ2 , δ3 , δ4 , having an expression (4.15) on γλ.
(4.16)
490
K. Aomoto, K. Iguchi
(b) (4) is the only admissible solution to (4.1) which is holomorphic in the neighborhood −
|z1 |
β1,2 β1,1
β2,1
− β 1 1 1 1 |z2 | > , |z1 ||z2 1,1 | > , |z1 | > , |z2 | > δ1 δ2 δ3 δ4
(4.17)
having an expression (4.15) on γλ∗ . Proof. (a) λ satisfies the inequalities β1,2
−1 β1,1
<λ<
−1 β2,2 β2,1
.
Once the exponents (κ1,0 , κ2,0 ) are determined, (4.15) is uniquely expressed in a recursive way. There are only 4 such exponents satisfying (4.1) which correspond to (1), (5), (6), (9). (b) In this case we have the inequalities β1,2
<λ<
β1,1
β2,2
β2,1
.
It can be proved similarly as above. 5. Critical Set, Branch Loci and Analytic Continuation Equations (4.1) are equivalent to considering the map T : w1 − 1 z = β β 1 w1 1,1 w2 2,1 T : w2 − 1 z2 = β1,2 β w1 w2 2,2 2 into the target space R 2 . T has the critical set 8 defined by from the source space R>0 the equation
8 : ϕ(w1 , w2 ) = 0.
(5.1)
8 is the hyperbola with two asymptotic lines w1 = a1 , and w2 = a2 , centered at B−β B−β 1,1 2,2 (a1 , a2 )= . It has two components 81 = 8 ∩ {w1 < a1 , w2 < a2 }, G , G 82 = 8 ∩ {w1 > a1 , w2 > a2 }. 8 is parametrized by w1 − a 1 =
β2,1
G
t, w2 − a2 =
β1,2
Gt
.
(5.2)
It is known since H.Whitney that the local singularities of a 2 dimensional mapping generically consist of folds, cusps and nodes (see [17, 19, 23].) Consider the restriction of T to 8 (denoted by T ). T has a fold at a point of 8 where T is not singular. The singular points of T corresponding to cusps of T (more precisely 8 1,1 -type Boardman singularity) is given by the equations dz1 dz2 ≡ ≡0 z1 z2
mod dϕ.
(5.3)
Wu’s Equations and Quasi-Hypergeometric Functions
491
This means the following cubic equation in t: − where t1 = −
B−β1,1 , t2 β2,1
c2 1 1 c1 + − + = 0, t − t1 t − t2 t − t3 t
=−
−1 β2,2 , t3 β2,1
β
1,2 = − B−β
2,2
and c1 =
(5.4)
β1,1 , c2 β2,1
=
1 . β2,1
The critical sets are illustrated in Figs. 1(a)–(d) according to the 4 cases (a)–(d) in Corollary 1 to Lemma 9. From now on we assume the condition (C1 ), i.e., only consider the case (a). Equation (5.4) has at least one positive solution which we denote by t0 . It has 3 different real solutions if and only if its discriminant is positive. This is also equivalent , β , β , β to positivity of the following polynomial : in β1,1 1,2 2,1 2,2 appearing in the irreducible factor of the discriminant: : = :0,0 + :1,1 β1,2 β2,1 + :2,1 β1,2 β2,1 + :1,2 β1,2 β2,1 + :2,2 β1,2 β2,1 2
2
2
+ :3,2 β1,2 β2,1 + :2,3 β1,2 β2,1 + :4,2 β1,2 β2,1 + :2,4 β1,2 β2,1 3
2
2
3
4
2
2
2
4
+ :3,3 β1,2 β2,1 + :4,3 β1,2 β2,1 + :3,4 β1,2 β2,1 + β1,2 β2,1 , 3
3
4
3
3
4
4
4
where :0,0 = − 27β1,1 β1,1 (β2,2 − 1)2 (β2,2 − 1)2 , 2
2
− 1)(β2,2 − 1)(2β1,1 − 1)(2β2,2 − 1), :1,1 = 18(β1,1 (β1,1 − 1)(β2,2 + 1)(2β2,2 − 1)(β2,2 − 2), :2,1 = 2β1,1 (β2,2 − 1)(β1,1 + 1)(2β1,1 − 1)(β1,1 − 2), :1,2 = 2β2,2 :2,2 = − 62β1,1 β2,2 (1 − β1,1 )(1 − β2,2 ) + 8(β1,1 + β2,2 − β1,1 − β1,1 ) + 1, 2
2
:3,2 = 2(2β2,2 − 2β2,2 − 1)(1 − 2β1,1 ), 2
:2,3 = 2(2β1,1 − 2β1,1 − 1)(1 − 2β2,2 ), 2
)(1 − 2β2,2 ), :4,2 = :2,4 = 2, :3,3 = 4(1 − 2β1,1 − 1), :3,4 = 2(2β1,1 − 1). :4,3 = 2(2β2,2
The component 82 is separated into two parts 82+ and 82− according as t ≥ t0 or t ≤ t0 . The image T 8 in the target space R2 , being a branch locus of the function Fβ (α1 , α2 ; z1 , z2 ), consists of 2 disjoint curves T 81 and T 82 (see Figs. 2 and 3). The 2 − 8 consist of three cells components of the complement R>0 ;1 : ϕ(w) > 0, ;2 : ϕ(w) < 0, ;3 : ϕ(w) > 0,
a1 > w1 > 0, a2 > w2 > 0, w1 > 0, w2 > 0, w1 > a1 , w2 > a2 .
T ;1 is the convex domain surrounded by T 81 including the origin. T ;3 is the domain surrounded by T 82 . T ;2 is not planar and the domain overlapping T ;3 surrounded by T 81 , T 82 , the negative z1 axis and the negative z2 axis. Consider the exterior domain surrounded by T 81 in the 1st octant z1 ≥ 0, z2 ≥ 0 in the target space. We denote by ;4 its inverse image by T in the complexification C 2 of the source space, such that 81 is a boundary of ;4 .
492
K. Aomoto, K. Iguchi 0.1 "type 1(i)"
0.08
0.06
0.04
0.02
0
0
0.02
0.04
0.06
Fig. 1(a). β :
52
0.08
0.1
24
-1.21 "t2-4a.p050" -1.215
-1.22
-1.225
-1.23
-1.235
-1.24
-1.245
-1.25 -1.34
-1.335
-1.33
-1.325
1 1
-1.32
+ s (−0.4 ≤ s ≤ 0.5),
20 2 Fig. 1(b)-1. β : 1 1 2
-1.315
-1.31
20
s = 0.5
-1.305
-1.3
Wu’s Equations and Quasi-Hypergeometric Functions
493
-1.26 "t2-4a.p015"
-1.27
-1.28
-1.29
-1.3
-1.31 -1.34
-1.335
-1.33
-1.325
-1.32
-1.315
1 1
+ s (−0.4 ≤ s ≤ 0.5),
20 2 Fig. 1(b)-2. β : 1 1 2
-1.31
-1.305
-1.3
s = 0.15
20
-1.26 "t2-4a.p010"
-1.27
-1.28
-1.29
-1.3
-1.31 -1.34
-1.335
-1.33
-1.325
1 1
-1.32
+ s (−0.4 ≤ s ≤ 0.1),
20 2 Fig. 1(b)-3. β : 1 1 2
-1.315
-1.31
20
s = 0.1
-1.305
-1.3
494
K. Aomoto, K. Iguchi -1.28 "t2-4a.p005" -1.285
-1.29
-1.295
-1.3
-1.305
-1.31
-1.315
-1.32 -1.33
-1.325
-1.32
-1.315
1 1
-1.31
-1.305
+ s (−0.4 ≤ s ≤ 0.5),
20 2 Fig. 1(b)-4. β : 1 1 2
-1.3
-1.295
-1.29
s = 0.05
20
-1.29 "t2-4a.p000" -1.295
-1.3
-1.305
-1.31
-1.315
-1.32
-1.325
-1.33 -1.33
-1.325
-1.32
-1.315
-1.31
-1.305
-1.3
1 1 + s 20 2 (−0.4 ≤ s ≤ 0.5), Fig. 1(b)-5. β : 1 1 2 20
s = 0.0
-1.295
-1.29
Wu’s Equations and Quasi-Hypergeometric Functions
495
-1.29 "t2-4a.m005" -1.295
-1.3
-1.305
-1.31
-1.315
-1.32
-1.325
-1.33 -1.32
-1.315
-1.31
-1.305
-1.3
-1.295
1 1
+ s (−0.4 ≤ s ≤ 0.5),
20 2 Fig. 1(b)-6. β : 1 1 2
-1.29
-1.285
-1.28
s = −0.05
20
-1.29 "t2-4a.m010" -1.295
-1.3
-1.305
-1.31
-1.315
-1.32
-1.325
-1.33 -1.32
-1.315
-1.31
-1.305
1 1
-1.3
+ s (−0.4 ≤ s ≤ 0.5),
20 2 Fig. 1(b)-7. β : 1 1 2
-1.295
-1.29
20
s = −0.1
-1.285
-1.28
496
K. Aomoto, K. Iguchi
-1.3 "t2-4a.m015" -1.305
-1.31
-1.315
-1.32
-1.325
-1.33
-1.335
-1.34 -1.31
-1.305
-1.3
-1.295
-1.29
-1.285
-1.28
-1.275
-1.27
1 1 + s 20 2 (−0.4 ≤ s ≤ 0.5), Fig. 1(b)-8. β : 1 1 2 20
s = −0.15
-1.31 "t2-4a.m040" -1.315
-1.32
-1.325
-1.33
-1.335
-1.34
-1.345
-1.35 -1.28
-1.275
-1.27
-1.265
-1.26
-1.255
-1.25
1 1 + s 20 2 (−0.4 ≤ s ≤ 0.5), Fig. 1(b)-9. β : 1 1 2 20
s = −0.4
-1.245
-1.24
Wu’s Equations and Quasi-Hypergeometric Functions
497
0.03 "type 4(i)" 0.025 0.02 0.015 0.01 0.005 0 -0.005 -0.01 -0.015 -0.02 -0.02
-0.015
-0.01
-0.005
0
Fig. 1(c). β :
0.005
0.01
0.015
50 + 5 × 10
50 50 + 5 × 10−16
0.02
0.025
0.03
−16
50
4 "type 4(iv)" 3 2 1 0 -1 -2 -3 -4 -5 -5
-4
-3
-2
-1
0
Fig. 1(d). β :
1
0.5 1
1 0.5
2
3
4
498
K. Aomoto, K. Iguchi
Fig. 2.
Fig. 3.
Lemma 10. $\Omega_4$ is homeomorphic to its image $T\Omega_4$, which is itself a cell. It contains both real curves $l^+ \times \{w_2 = 1\}$ and $\{w_1 = 1\} \times l^+$, lying in the lines $w_2 = 1$ and $w_1 = 1$ respectively. $\Omega_4$ is contained in the region $\varphi(w_1, w_2) < 0$ in a neighborhood of the locus $\Phi_1$.
Proof. The Jacobian $J$ of $T$ is different from $0$ inside $\Omega_4$. Hence $\Omega_4$ is homeomorphic to its image $T\Omega_4$. Let $z = (\eta_1 t, \eta_2 t)$, $t > 0$, be a parametrization of the point $z$ in the target space $\mathbb{R}^2$, where $\eta_1, \eta_2$ are nonnegative numbers such that $\eta_1 + \eta_2 = 1$. We denote by $t_c$ the unique value such that $(\eta_1 t_c, \eta_2 t_c)$ lies in $T\Phi_1$. Then near $t = t_c$ we have the Puiseux expansions of the solution to (4.1),
$$w_1 = \xi_1 - \sqrt{t_c - t}\,\xi_1' + \cdots, \qquad w_2 = \xi_2 - \sqrt{t_c - t}\,\xi_2' + \cdots,$$
where $(\xi_1, \xi_2)$ is a point in $\Phi_1$, and $(\xi_1', \xi_2')$ ($\xi_1' > 0$, $\xi_2' > 0$) satisfies the linear system
$$\bigl(\xi_1 + \beta_{1,1}(1 - \xi_1)\bigr)\frac{\xi_1'}{\xi_1} + \beta_{2,1}(1 - \xi_1)\frac{\xi_2'}{\xi_2} = 0, \qquad \beta_{1,2}(1 - \xi_2)\frac{\xi_1'}{\xi_1} + \bigl(\xi_2 + \beta_{2,2}(1 - \xi_2)\bigr)\frac{\xi_2'}{\xi_2} = 0,$$
whose determinant vanishes. Hence $\varphi(w_1, w_2) > 0$ if $t < t_c$, and $\varphi(w_1, w_2) < 0$ if $t > t_c$ and $t$ is near $t_c$, since $\xi_1 + \beta_{1,1}(1 - \xi_1) > 0$, $\xi_2 + \beta_{2,2}(1 - \xi_2) > 0$ and $\xi_1 > 1$, $\xi_2 > 1$. Moreover, $w$ lies in $\Omega_4$ if and only if $t > t_c$. This proves Lemma 10. □
We now want to define several paths from the point $w = (1, 1)$ in the complex affine space of $w$, along which the local solution (1) at $z = (0, 0)$ is continued analytically to the other local solutions (2)–(9) listed in Proposition 2 (see Fig. 4). In the complex line $w_2 = 1$ for $z_2 = 0$, Eq. (4.1) reduces to
$$w_1 - 1 = z_1 w_1^{\beta_{1,1}}. \tag{5.5}$$
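For $n = 1$ the reduced equation (5.5) is a trinomial equation (compare Herglotz [12]). As a numerical aside that is not part of the authors' argument, its holomorphic solution with $w_1(0) = 1$ can be written as Lambert's generalized binomial series $\sum_k \binom{\beta k + 1}{k}\frac{z^k}{\beta k + 1}$, which converges for $|z| < (\beta - 1)^{\beta - 1}/\beta^{\beta}$ when $\beta > 1$. The helper names below are hypothetical, and the sketch only checks the series against the equation.

```python
import math


def trinomial_coeffs(beta, K):
    """First K coefficients c_k of w(z) = sum_k c_k z**k solving
    w - 1 = z * w**beta with w(0) = 1 (Lambert's generalized binomial series).

    c_k = binom(beta*k + 1, k) / (beta*k + 1); the binomial is written via
    log-Gamma so that non-integer beta > 1 is allowed.
    """
    coeffs = []
    for k in range(K):
        log_binom = (math.lgamma(beta * k + 2) - math.lgamma(k + 1)
                     - math.lgamma((beta - 1) * k + 2))
        coeffs.append(math.exp(log_binom) / (beta * k + 1))
    return coeffs


def trinomial_solution(beta, z, K=100):
    """Truncated series; meant for z well inside the radius of convergence."""
    return sum(c * z ** k for k, c in enumerate(trinomial_coeffs(beta, K)))


# beta = 2 reproduces the Catalan numbers
print([round(c) for c in trinomial_coeffs(2.0, 5)])   # [1, 1, 2, 5, 14]
w = trinomial_solution(2.5, 0.05)
print(w - 1 - 0.05 * w ** 2.5)   # residual of (5.5), expected at round-off level
```

Summing the series term by term is the simplest choice here; near the radius of convergence one would rather solve (5.5) by Newton's method.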
Fig. 4.
We take the real path $l^+$ in the $w_1$ plane, which is denoted by $\omega_2$. Then the solution (1) goes to (2) owing to Lemma 5, in view of $\beta_{1,1} > 1$. In the same way we can define the real path $\omega_3$ in the complex line $w_1 = 1$, along which (1) goes to (3). In a neighborhood $U_2(\delta_1, \delta_2)$ of $w = (0, 1)$ for $Z_1 = 0$, Eq. (4.1) reduces to
$$w_2 - 1 = Z_2\, w_2^{B/\beta_{1,1}}. \tag{5.6}$$
Hence the real path $\omega_{2,4} = \{w_1 = 0\} \times l^+$ lying in the complex line $w_1 = 0$ goes from (2) to (4), since $B/\beta_{1,1} > 1$.
We also obtain (4) from (3) by interchanging the coordinates $w_1, w_2$. We have thus defined the path $\omega_4$ going from (1) to (4) as the composite $\omega_{1,2} \circ \omega_{2,4}$. Let $\omega_5$ denote the path from $w = (1, 1)$ to $w = (+\infty, 1)$ lying in the line $w_2 = 1$ with $w_1 \in [1, \infty)$ (see Lemma 4). Then the local solution (1) goes to (5) along $\omega_5$. The path $\omega_6$ from (1) to (6) can be defined similarly. We now construct paths from (5) to (8) and from (5) to (9). When $w_1$ tends to $+\infty$ with $Z_2$ fixed, $Z_1$ tends to $0$, and therefore $w_2$ in the solution (5) satisfies, in $U_5(\delta_1, \delta_2)$,
$$w_2 - 1 = Z_2\, w_2^{(\beta_{2,2} - B)/(1 - \beta_{1,1})}. \tag{5.7}$$
Note that $(\beta_{2,2} - B)/(1 - \beta_{1,1}) > 1$. We take the path $\omega_{5,8}$ as $\{w_1 = \infty\} \times l^+$ going from (5) to (8),
and the path $\omega_{5,9}$ as $\{w_1 = +\infty\} \times [1, +\infty)$, along which we obtain the solution (9) from (5). We can also define the paths $\omega_{6,7}$ and $\omega_{6,9}$, giving (7) and (9) respectively from (6), by interchanging $w_1$ and $w_2$. The composite path $\omega_8 = \omega_5 \circ \omega_{5,8}$ (or $\omega_9 = \omega_5 \circ \omega_{5,9}$) goes from (1) to (8) (or (9)). We can similarly define $\omega_9' = \omega_6 \circ \omega_{6,9}$ going from (1) to (9).
Lemma 11. (1) $F_\beta$ is continued analytically to $T_{\sigma_1\tau_1}F_\beta$ or $T_{\sigma_2\tau_2}F_\beta$ along $T\omega_5$ or $T\omega_6$.
(2) $F_\beta$ is continued analytically to $T_{\sigma_1\sigma_2\tau_1\tau_2}F_\beta$ along $T\omega_9$ or $T\omega_9'$.
(3) $F_\beta$ is continued analytically to $T_{\tau_1}F_\beta$, $T_{\tau_2}F_\beta$, $T_{\tau_1\sigma_2\tau_2}F_\beta$, $T_{\tau_2\sigma_1\tau_1}F_\beta$ along $T\omega_1$, $T\omega_2$, $T\omega_7$, $T\omega_8$ respectively, where the paths should be taken in a detoured way around $\Phi$ in the complex affine space $\mathbb{C}^2$, such that $\arg\varphi(w)$ increases when they cross $\Phi_1$ or $\Phi_2$.
(4) $F_\beta$ is continued analytically to $T_{\tau_1\tau_2}F_\beta$ along $T\omega_4$.
We denote by $w^{(\infty)}_{0,0}$ the function (4) defined in $U_4(\delta_1, \delta_2)$. We denote further by $S_1$ and $S_2$ the shift operators obtained by analytic continuation along the rotations $z_1 \to e^{2\pi i} z_1$, $z_2 \to e^{2\pi i} z_2$ in the complex plane, and put
$$w^{(\infty)}_{\mu_1,\mu_2} = S_1^{\mu_1} S_2^{\mu_2}\, w^{(\infty)}_{0,0}.$$
Crossing $\Phi$ in the source space corresponds to a reflection on $T\Phi$ in the target space. Hence we have
Lemma 12. The reflection on $T\Phi_1$ in the target space gives rise to the transposition between (1) and (5) or the one between (1) and (6). The reflection on $T\Phi_2^+$ (or $T\Phi_2^-$) gives rise to the transposition between (5) and (9) (or the one between (6) and (9)).
We further define the path $T\omega_1^*$ (or $T\omega_2^*$) as a path in the target space starting from the origin, going through $T\Omega_1$, reflecting on $T\Phi_2^+$ (or $T\Phi_2^-$) and going back to the origin. The above lemma shows that the monodromy group generated by the paths $\omega_1^*, \omega_2^*, \omega_5, \omega_6$ gives rise to all the permutations among (1), (5), (6), (9), or equivalently among $F_\beta$, $T_{\sigma_1\tau_1}F_\beta$, $T_{\sigma_2\tau_2}F_\beta$, $T_{\sigma_1\sigma_2\tau_1\tau_2}F_\beta$. Hence the monodromy group also gives rise to all the permutations among $w^{(\infty)}_{0,0}$, $w^{(\infty)}_{-1,0}$, $w^{(\infty)}_{0,-1}$, $w^{(\infty)}_{-1,-1}$. On the other hand, $S_1$, $S_2$ give the shifts for each component:
$$S_1 : w^{(\infty)}_{\mu_1,\mu_2} \to w^{(\infty)}_{\mu_1+1,\mu_2}, \tag{5.8}$$
$$S_2 : w^{(\infty)}_{\mu_1,\mu_2} \to w^{(\infty)}_{\mu_1,\mu_2+1}. \tag{5.9}$$
From now on we assume the following condition:
(C2) Let $\tilde\beta = (\tilde\beta_{i,j})$ be an arbitrary transform of $\beta$ by an element of $G$. Then both triples $\{\tilde\beta_{1,1}, \tilde\beta_{2,1}, 1\}$ and $\{\tilde\beta_{1,2}, \tilde\beta_{2,2}, 1\}$ are linearly independent over the field of rationals.
Let $w^{(0)}_{\nu_1,\nu_2}$ be the unique holomorphic solution at the origin to the following equations, corresponding to (1) in Proposition 2:
$$w_1 - 1 = z_1 e^{2\pi i(\nu_1\beta_{1,1} + \nu_2\beta_{2,1})} w_1^{\beta_{1,1}} w_2^{\beta_{2,1}}, \qquad w_2 - 1 = z_2 e^{2\pi i(\nu_1\beta_{1,2} + \nu_2\beta_{2,2})} w_1^{\beta_{1,2}} w_2^{\beta_{2,2}}. \tag{5.10}$$
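The function $w = w^{(\infty)}_{\mu_1,\mu_2}$ has an expansion similar to (4.11):
$$w_1 = e^{i\left(\frac{\beta_{2,1} - \beta_{2,2}}{B}\pi - 2\frac{\beta_{2,2}}{B}\mu_1\pi + 2\frac{\beta_{2,1}}{B}\mu_2\pi\right)} Z_1(1 + \cdots), \qquad w_2 = e^{i\left(\frac{\beta_{1,2} - \beta_{1,1}}{B}\pi + 2\frac{\beta_{1,2}}{B}\mu_1\pi - 2\frac{\beta_{1,1}}{B}\mu_2\pi\right)} Z_2(1 + \cdots).$$
To establish the relation of analytic continuation between $w^{(0)}_{\nu_1,\nu_2}$ and $w^{(\infty)}_{\mu_1,\mu_2}$ for arbitrary $(\nu_1, \nu_2)$ and $(\mu_1, \mu_2) \in \mathbb{Z}^2$, consider the real surface $X_{C_1,C_2}$ (abbreviated by $X$) defined by (1.2).
Lemma 13. Suppose that
$$2\pi\beta_{1,1}\nu_1 + 2\pi\beta_{2,1}\nu_2 = \varphi_1 + 2m_1\pi, \tag{5.11}$$
$$2\pi\beta_{1,2}\nu_1 + 2\pi\beta_{2,2}\nu_2 = \varphi_2 + 2m_2\pi, \tag{5.12}$$
with $0 < \varphi_1 \le 2\pi$, $0 < \varphi_2 \le 2\pi$, $(m_1, m_2) \in \mathbb{Z}^2$, and
$$\frac{\beta_{2,1} - \beta_{2,2}}{B}\pi - 2\frac{\beta_{2,2}}{B}\mu_1\pi + 2\frac{\beta_{2,1}}{B}\mu_2\pi = \theta_1 + 2l_1\pi, \qquad \frac{\beta_{1,2} - \beta_{1,1}}{B}\pi + 2\frac{\beta_{1,2}}{B}\mu_1\pi - 2\frac{\beta_{1,1}}{B}\mu_2\pi = \theta_2 + 2l_2\pi, \tag{5.13}$$
with $(l_1, l_2) \in \mathbb{Z}^2$, $-\pi < \theta_1 < \pi$, $-\pi < \theta_2 < \pi$. We can put $C_1 = 2\pi\beta_{1,1}\nu_1 + 2\pi\beta_{2,1}\nu_2$ and $C_2 = 2\pi\beta_{1,2}\nu_1 + 2\pi\beta_{2,2}\nu_2$. Then $w^{(0)}_{\nu_1,\nu_2}$ is analytically continued to $w^{(\infty)}_{\mu_1,\mu_2}$ along a path in the surface $X$ if and only if
$$C_1 - 2m_1\pi = \varphi_1 = \pi - \beta_{1,1}\theta_1 - \beta_{2,1}\theta_2, \qquad C_2 - 2m_2\pi = \varphi_2 = \pi - \beta_{1,2}\theta_1 - \beta_{2,2}\theta_2, \tag{5.14}$$
i.e., if and only if
$$\mu_1 + 1 = -m_1, \quad \mu_2 + 1 = -m_2, \quad \nu_1 = l_1, \quad \nu_2 = l_2. \tag{5.15}$$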
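Lemma 14. Let $w_{\nu_1,\nu_2}$ be the unique solution to the equations corresponding to (5) in Proposition 2,
$$w_1 - 1 = z_1 e^{2\pi i\nu_2\beta_{2,1}} w_1^{\beta_{1,1}} w_2^{\beta_{2,1}}, \qquad w_2 - 1 = z_2 e^{2\pi i\nu_2\beta_{2,2}} w_1^{\beta_{1,2}} w_2^{\beta_{2,2}}, \tag{5.16}$$
such that $w_1, w_2$ have the expansions at the origin
$$w_1 = Z_1^{-1} e^{\frac{2\pi(\nu_1 + \beta_{2,1}\nu_2)}{1 - \beta_{1,1}}i}(1 + \cdots), \qquad w_2 = 1 + Z_2\, e^{2\pi i\nu_2\beta_{2,2} + \frac{2\pi i(\nu_1 + \beta_{2,1}\nu_2)\beta_{1,2}}{1 - \beta_{1,1}}} + \cdots. \tag{5.17}$$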
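We put
$$\frac{2\pi(\nu_1 + \beta_{2,1}\nu_2)}{1 - \beta_{1,1}} = \varphi_1 + 2m_1\pi, \quad 0 \le \varphi_1 < 2\pi,$$
and
$$\frac{2\pi(\nu_1 + \beta_{2,1}\nu_2)\beta_{1,2}}{1 - \beta_{1,1}} + 2\pi\nu_2\beta_{2,2} = \varphi_2 + 2m_2\pi, \quad 0 < \varphi_2 \le 2\pi.$$
Then $w_{\nu_1,\nu_2}$ is analytically continued to $w^{(\infty)}_{\mu_1,\mu_2}$ along a path in the surface $X$ if and only if
$$C_1 - 2m_1\pi = \varphi_1 - \beta_{1,1}(\varphi_1 + 2m_1\pi) = \pi - \beta_{1,1}(\theta_1 + 2l_1\pi) - \beta_{2,1}\theta_2, \qquad C_2 - 2m_2\pi = \varphi_2 - \beta_{1,2}(\varphi_1 + 2m_1\pi) = \pi - \beta_{1,2}(\theta_1 + 2l_1\pi) - \beta_{2,2}\theta_2, \tag{5.18}$$
i.e.,
$$\nu_1 - m_1 = \mu_1 + 1, \quad -m_2 = \mu_2 + 1, \quad l_2 = \nu_2. \tag{5.19}$$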
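In the same way, we have
Lemma 15. Let $w_{\nu_1,\nu_2}$ be the unique solution to the equations corresponding to (6) in Proposition 2,
$$w_1 - 1 = z_1 e^{2\pi i\nu_1\beta_{1,1}} w_1^{\beta_{1,1}} w_2^{\beta_{2,1}}, \qquad w_2 - 1 = z_2 e^{2\pi i\nu_1\beta_{1,2}} w_1^{\beta_{1,2}} w_2^{\beta_{2,2}}, \tag{5.20}$$
such that $w_1, w_2$ have the expansions at the origin
$$w_1 = 1 + Z_1\, e^{2\pi i\nu_1\beta_{1,1} + \frac{2\pi i(\nu_2 + \beta_{1,2}\nu_1)\beta_{2,1}}{1 - \beta_{2,2}}} + \cdots, \qquad w_2 = Z_2^{-1} e^{\frac{2\pi(\nu_2 + \beta_{1,2}\nu_1)}{1 - \beta_{2,2}}i}(1 + \cdots). \tag{5.21}$$
Put
$$\frac{2\pi(\nu_2 + \beta_{1,2}\nu_1)\beta_{2,1}}{1 - \beta_{2,2}} + 2\pi\nu_1\beta_{1,1} = \varphi_1 + 2m_1\pi, \quad 0 < \varphi_1 \le 2\pi,$$
and
$$\frac{2\pi(\nu_2 + \beta_{1,2}\nu_1)}{1 - \beta_{2,2}} = \varphi_2 + 2m_2\pi, \quad 0 \le \varphi_2 < 2\pi.$$
Let $w_{\nu_1,\nu_2}$ be analytically continued to $w^{(\infty)}_{\mu_1,\mu_2}$ along a path in the surface $X$. Then we have
$$-m_1 = \mu_1 + 1, \quad \nu_2 - m_2 = \mu_2 + 1, \quad l_1 = \nu_1. \tag{5.22}$$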
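Lemma 16. Let $\hat w^{(0)}_{\nu_1,\nu_2} = S_1^{\nu_1} S_2^{\nu_2}\hat w^{(0)}_{0,0}$ be the unique solution to Eq. (4.1) corresponding to (9) in Proposition 2, such that we have the following expansions in $U_9(\delta_1, \delta_2)$:
$$w_1 = e^{\left(2\pi\nu_1\frac{1 - \beta_{2,2}}{G} + 2\pi\nu_2\frac{\beta_{2,1}}{G}\right)i} Z_1(1 + \cdots), \qquad w_2 = e^{\left(2\pi\nu_1\frac{\beta_{1,2}}{G} + 2\pi\nu_2\frac{1 - \beta_{1,1}}{G}\right)i} Z_2(1 + \cdots). \tag{5.23}$$
Put
$$\frac{2\pi(1 - \beta_{2,2})\nu_1}{G} + \frac{2\pi\beta_{2,1}\nu_2}{G} = \varphi_1 + 2m_1\pi, \qquad \frac{2\pi\beta_{1,2}\nu_1}{G} + \frac{2\pi(1 - \beta_{1,1})\nu_2}{G} = \varphi_2 + 2m_2\pi,$$
for $0 \le \varphi_1 < 2\pi$, $0 \le \varphi_2 < 2\pi$, $(m_1, m_2) \in \mathbb{Z}^2$. Then $\hat w^{(0)}_{\nu_1,\nu_2}$ is analytically continued to $w^{(\infty)}_{\mu_1,\mu_2}$ along a path in the surface $X$ if and only if
$$C_1 - 2m_1\pi = \varphi_1 - \beta_{1,1}(\varphi_1 + 2m_1\pi) - \beta_{2,1}(\varphi_2 + 2m_2\pi) = \pi - \beta_{1,1}\cdot 2\pi(\theta_1 + 2l_1\pi) - \beta_{2,1}\cdot 2\pi(\theta_2 + 2l_2\pi),$$
$$C_2 - 2m_2\pi = \varphi_2 - \beta_{1,2}(\varphi_1 + 2m_1\pi) - \beta_{2,2}(\varphi_2 + 2m_2\pi) = \pi - \beta_{1,2}\cdot 2\pi(\theta_1 + 2l_1\pi) - \beta_{2,2}\cdot 2\pi(\theta_2 + 2l_2\pi), \tag{5.24}$$
i.e.,
$$\nu_1 - m_1 = \mu_1 + 1, \quad \nu_2 - m_2 = \mu_2 + 1. \tag{5.25}$$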
Summing up these lemmas, we have shown that every local solution to (5.11), (5.17), (5.21) and (4.1) in the neighborhoods $U_1(\delta_1, \delta_2)$, $U_5(\delta_1, \delta_2)$, $U_6(\delta_1, \delta_2)$, $U_9(\delta_1, \delta_2)$ of Proposition 2, respectively, can be analytically continued to one of the $w^{(\infty)}_{\mu_1,\mu_2}$ defined in the neighborhood $U_4(\delta_1, \delta_2)$. As a result we can conclude
Theorem 2. Under the conditions (C1) and (C2), the monodromy group generated by $T\omega_1^*$, $T\omega_2^*$, $T\omega_5$, $T\omega_6$ and $S_1$, $S_2$ contains every finite permutation among $\{w^{(\infty)}_{\mu_1,\mu_2}\}$ $(-\infty < \mu_1, \mu_2 < \infty)$ and the shifts
$$S_1 : w^{(\infty)}_{\mu_1,\mu_2} \to w^{(\infty)}_{\mu_1+1,\mu_2}, \qquad S_2 : w^{(\infty)}_{\mu_1,\mu_2} \to w^{(\infty)}_{\mu_1,\mu_2+1}. \tag{5.26}$$
The subgroup of finite permutations is normal in the group of all permutations, so that the monodromy group is isomorphic to the semi-direct product of the subgroup of all finite permutations and the 2-dimensional lattice $\mathbb{Z}^2$.
Remark 1. Our condition (ND) is not necessarily satisfied by examples arising in physical problems (for example, Laughlin's incompressible 1/m fluid explained in [15] or [26]).
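The semi-direct product structure rests on the fact that conjugating a finite permutation by a lattice shift gives another finite permutation. The toy sketch below is our own bookkeeping, not taken from the paper. It indexes the branches $w^{(\infty)}_{\mu_1,\mu_2}$ by $(\mu_1, \mu_2) \in \mathbb{Z}^2$, lets $S_1^{a_1}S_2^{a_2}$ act as translation by $(a_1, a_2)$ as in (5.26), and stores a finite permutation on its support.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]          # index (mu1, mu2) of the branch w^(inf)_{mu1, mu2}
Perm = Dict[Point, Point]        # finite permutation stored on its (finite) support


def shift(p: Point, a: Point) -> Point:
    """Translation (mu1, mu2) -> (mu1 + a1, mu2 + a2), modelling S1^a1 S2^a2."""
    return (p[0] + a[0], p[1] + a[1])


def apply(perm: Perm, p: Point) -> Point:
    """Points outside the support are fixed."""
    return perm.get(p, p)


def conjugate(perm: Perm, a: Point) -> Perm:
    """s_a o perm o s_a^{-1}: again a finite permutation, with support shifted by a."""
    return {shift(x, a): shift(y, a) for x, y in perm.items()}


# transposition of w_{0,0} and w_{-1,0}, conjugated by S1
swap = {(0, 0): (-1, 0), (-1, 0): (0, 0)}
print(conjugate(swap, (1, 0)))   # {(1, 0): (0, 0), (0, 0): (1, 0)}
```

Because conjugation only moves the support, the finite permutations form a normal subgroup on which $\mathbb{Z}^2$ acts, which is the action underlying the semi-direct product.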
6. Case Where n is Arbitrary
The critical set $\Phi$ of the map (1.1) is the complex algebraic hypersurface defined by
$$\Phi : \varphi(w_1, \ldots, w_n) = 0. \tag{6.1}$$
The set $\Phi$ is in general very complicated. To make it manageable, we now assume that the matrix $\beta$ has the following property:
(C3) The matrix $\beta - 1$ is oscillatory, i.e., all the minors of $\beta - 1$ are nonnegative and there exists a positive integer $k$ such that the $k$th power $(\beta - 1)^k$ is totally positive.
Let $H = \mathrm{diag}(\eta_1, \ldots, \eta_n)$ be a diagonal matrix with all $\eta_j > 0$. Then we have
Lemma 17. The equation
$$\varphi(\eta_1 t, \ldots, \eta_n t) = 0 \tag{6.2}$$
with respect to $t$ has $n$ distinct positive roots.
Proof. We have
$$\varphi(\eta_1 t, \ldots, \eta_n t) = \det\bigl(tH(1 - \beta) + \beta\bigr) = \det(1 - \beta)\cdot\det\bigl(tH - 1 - (\beta - 1)^{-1}\bigr).$$
Since $\beta - 1$ is oscillatory, there exists a diagonal matrix $J$ with $\pm 1$ entries such that $J(\beta - 1)^{-1}J$ is oscillatory. Hence $J(1 + (\beta - 1)^{-1})J$ is also oscillatory. Since $H$ is a positive diagonal matrix, $H^{-1}J(1 + (\beta - 1)^{-1})J = JH^{-1}(1 + (\beta - 1)^{-1})J$ is also oscillatory. This means that $H^{-1}(1 + (\beta - 1)^{-1})$ has positive eigenvalues, all different from each other.
We denote by $t_1, t_2, \ldots, t_n$ the $n$ roots of (6.2), ordered so that $0 < t_1 < \cdots < t_n$, and by $\Phi_j$ the set of all points $(\eta_1 t_j, \ldots, \eta_n t_j)$, where $\eta_1 \ge 0, \ldots, \eta_n \ge 0$ and $\sum_{k=1}^{n}\eta_k = 1$. Then
Corollary 2. In $\mathbb{R}^n_{\ge 0}$ the hypersurface $\Phi$ consists of $n$ connected components $\Phi_j$ $(1 \le j \le n)$, which are non-singular.
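To see Lemma 17 in a small case, the sketch below is a numerical check and not part of the proof. It takes the factorization in the proof at face value, computes the roots $t_j$ as eigenvalues of $H^{-1}(1 + (\beta - 1)^{-1})$, and evaluates $\det(tH(1 - \beta) + \beta)$ at them. The example matrices and the helper names are our own choices. For the $2 \times 2$ example the determinant works out to $6t^2 - 15t + 8$.

```python
import numpy as np


def ray_roots(beta, eta):
    """Roots t of det(t*H*(1 - beta) + beta) = 0 with H = diag(eta), computed as
    the eigenvalues of H^{-1} (1 + (beta - 1)^{-1}) (the factorization used in
    the proof of Lemma 17). Returned sorted by real part."""
    n = beta.shape[0]
    I = np.eye(n)
    M = I + np.linalg.inv(beta - I)
    roots = np.linalg.eigvals(np.diag(1.0 / np.asarray(eta, dtype=float)) @ M)
    return roots[np.argsort(roots.real)]


def phi_on_ray(beta, eta, t):
    """det(t*H*(1 - beta) + beta), i.e. phi(eta_1 t, ..., eta_n t) as in the proof."""
    n = beta.shape[0]
    return np.linalg.det(t * np.diag(eta) @ (np.eye(n) - beta) + beta)


# beta - 1 = [[2, 1], [1, 2]] is totally positive, hence oscillatory as in (C3)
beta = np.array([[3.0, 1.0], [1.0, 3.0]])
eta = [1.0, 2.0]
t = ray_roots(beta, eta)
print(t.real)                                      # roots of 6t^2 - 15t + 8
print([phi_on_ray(beta, eta, s) for s in t.real])  # should vanish up to round-off
```

Computing eigenvalues avoids root-finding on a determinant and matches the structure of the argument in the proof.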
We consider a pseudo-real $n$-cube $K$ with $2^n$ vertices $(v_1, \ldots, v_n)$ in $(\mathbb{CP}^1)^n$ such that each $v_j$ is equal to $1$ or $+\infty$. The edges of $K$ are the paths of the form $v_1 \times \cdots \times v_{j-1} \times [1, +\infty] \times v_{j+1} \times \cdots \times v_n$, where $[1, +\infty]$ denotes the interval from $1$ to $+\infty$ in the $j$th coordinate complex plane. Each such edge meets $\Phi$ at exactly one point, and it meets $\Phi_j$ if and only if the number of $v_k$ with $v_k = 1$ is equal to $j - 1$. In the complex line defined by $w_1 = \cdots = w_p = +\infty$ and $w_{p+2} = \cdots = w_n = 1$, the corresponding solution to (1.1) is given by $(w_1, \ldots, w_n)$ with $w_1 = \cdots = w_p = +\infty$ and $w_{p+2} = \cdots = w_n = 1$, obtained by putting $Z_1 = \cdots = Z_p = Z_{p+2} = \cdots = Z_n = 0$. From (2.1) and (2.2) we obtain the equation
$$w_{p+1} - 1 = Z_{p+1}\, w_{p+1}^{\tilde\beta_{p+1,p+1}}, \tag{6.3}$$
where $\tilde\beta_{p+1,p+1}$ is defined as follows:
$$\tilde\beta_{p+1,p+1} = \beta_{p+1,p+1} - (\beta_{p+1,1}, \ldots, \beta_{p+1,p})\cdot(\beta^p - 1)^{-1}\cdot{}^t(\beta_{1,p+1}, \ldots, \beta_{p,p+1}),$$
where $\beta^p$ denotes the submatrix $(\beta_{i,j})_{i,j=1}^{p}$. We see that $\tilde\beta_{p+1,p+1} > 1$. This has the same form as (3.1), whence we can apply the case $n = 1$ to (6.3). We call a path “admissible” if it has one of the following forms:
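Note that $\tilde\beta_{p+1,p+1} - 1$ is the Schur complement of the block $\beta^p - 1$ in $\beta^{p+1} - 1$. Hence $\tilde\beta_{p+1,p+1} - 1 = \det(\beta^{p+1} - 1)/\det(\beta^p - 1)$, which is one way to see why it is positive when the principal minors of $\beta - 1$ are. The sketch below computes $\tilde\beta_{p+1,p+1}$ both ways. For $n = 2$, $p = 1$ it also compares the result with the exponent in (5.7). That comparison assumes that $B$ there denotes $\det\beta$, which is our reading of the elimination leading to (5.7) and is not stated in this section. The helper names are hypothetical.

```python
import numpy as np


def reduced_exponent(beta, p):
    """tilde beta_{p+1,p+1} = beta_{p+1,p+1}
         - (beta_{p+1,1}, ..., beta_{p+1,p}) (beta^p - 1)^{-1} t(beta_{1,p+1}, ..., beta_{p,p+1}).
    Indices in the formula are 1-based; numpy slices below are 0-based."""
    if p == 0:
        return float(beta[0, 0])
    Bp = beta[:p, :p]
    row = beta[p, :p]            # (beta_{p+1,1}, ..., beta_{p+1,p})
    col = beta[:p, p]            # (beta_{1,p+1}, ..., beta_{p,p+1})
    return float(beta[p, p] - row @ np.linalg.solve(Bp - np.eye(p), col))


def reduced_exponent_via_minors(beta, p):
    """Same quantity from tilde beta - 1 = det(beta^{p+1} - 1) / det(beta^p - 1)."""
    num = np.linalg.det(beta[:p + 1, :p + 1] - np.eye(p + 1))
    den = np.linalg.det(beta[:p, :p] - np.eye(p)) if p > 0 else 1.0
    return float(1.0 + num / den)


beta = np.array([[3.0, 1.0], [2.0, 4.0]])
B = np.linalg.det(beta)                          # assumption: B = det(beta)
print(reduced_exponent(beta, 1), (beta[1, 1] - B) / (1 - beta[0, 0]))   # both 3.0 up to round-off
```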
$$v_1 \times \cdots \times v_{j-1} \times l_1 \times v_{j+1} \times \cdots \times v_n, \quad v_1 \times \cdots \times v_{j-1} \times l^+ \times v_{j+1} \times \cdots \times v_n, \quad v_1 \times \cdots \times v_{j-1} \times l^- \times v_{j+1} \times \cdots \times v_n.$$
These paths will be abbreviated by $e^{j-1} \times l_1 \times e^{n-j}$, $e^{j-1} \times l^+ \times e^{n-j}$, $e^{j-1} \times l^- \times e^{n-j}$ respectively ($e^k$ denotes the product of $k$ copies of the trivial path $e$). They are all contained in $X_{0,\ldots,0}$.
Lemma 18. $(1^p, \infty^{n-p})$ can be analytically continued to $(1^{p-1}, \infty^{n-p+1})$ along the path $e^{p-1} \times l_1 \times e^{n-p}$.
For the proof see Lemma 4. In the same way we have
Lemma 19. $(0^p, 1^q, \infty^{n-p-q})$ is analytically continued to $(0^{p+1}, 1^{q-1}, \infty^{n-p-q})$ along the path $e^p \times l^+ \times e^{n-p-q}$.
For the proof see Lemma 5.
Lemma 20. $(0^p, 1^q, \infty^{n-p-q})$ is analytically continued to $S_{p+q+1}^{-1}(0^p, 1^q, 0, \infty^{n-p-q-1})$ along the path $e^{p+q} \times l^- \times e^{n-p-q-1}$.
For the proof see Lemma 6. The above three lemmas show the following:
Proposition 4. An arbitrary local solution $(0^p, 1^q, \infty^{n-p-q})$ is an analytic continuation of $(1^n)$ along successive admissible paths in $X_{0,\ldots,0}$. Moreover, $(0^p, 1^q, \infty^{n-p-q})$ can be analytically continued to $w^{(\infty)}_{\nu_1,\ldots,\nu_n} = S_{p+q+1}^{-1}\cdots S_n^{-1}(0^n)$ along successive admissible paths in $X_{0,\ldots,0}$, where $\nu_j = 0$ $(1 \le j \le p + q)$ and $\nu_j = -1$ $(p + q + 1 \le j \le n)$.
On the other hand, in the complex line $w_1 = \cdots = w_p = 0$, $w_{p+1} = \cdots = w_{p+q-1} = 1$, $w_{p+q+1} = \cdots = w_n = \infty$, i.e., for $Z_1 = \cdots = Z_{p+q-1} = Z_{p+q+1} = \cdots = Z_n = 0$, Eq. (1.1) has a singularity at a point $Z_{p+q} = c$ $(c > 1)$. If we take, in the $Z_{p+q}$ plane, the analytic continuation from $0$ to $0$ turning around $c$ counterclockwise, then $(0^p, 1^q, \infty^{n-p-q})$ and $(0^p, 1^{q-1}, \infty^{n-p-q+1})$ are interchanged. Since $(0^p, 1^q, \infty^{n-p-q})$ goes to $w^{(\infty)}_{\underbrace{0,\ldots,0}_{p+q},\underbrace{-1,\ldots,-1}_{n-p-q}}$ by successive admissible paths, this is the same as interchanging
$$w^{(\infty)}_{\underbrace{0,\ldots,0}_{p+q},\,\underbrace{-1,\ldots,-1}_{n-p-q}} \quad\text{and}\quad w^{(\infty)}_{\underbrace{0,\ldots,0}_{p+q-1},\,\underbrace{-1,\ldots,-1}_{n-p-q+1}}.$$
Since this occurs also after any permutation of the coordinates $w_j$, we obtain an arbitrary permutation among the $w^{(\infty)}_{\nu_1,\ldots,\nu_n}$ with $\nu_j \in \{0, -1\}$ by the above analytic continuation. Now we assume the following condition:
(C4) Let $\tilde\beta = (\tilde\beta_{i,j})$ be an arbitrary transform of $\beta$ by an element of $G$. Then for each $k$, the $n + 1$ entries of the vector $(1, \tilde\beta_{1,k}, \ldots, \tilde\beta_{n,k})$ are linearly independent over the field of rationals.
Then we can conclude the following theorem in the same way as in the case $n = 2$.
Theorem 3. Under (C3) and (C4), the analytic continuation along the set of admissible closed paths in $X_{0,\ldots,0}$ and the shifts $S_j$ $(1 \le j \le n)$ generate the monodromy group for the solution $w(z)$ to (1.1). This group is isomorphic to the semi-direct product of the group of all finite permutations among the $w^{(\infty)}_{\nu_1,\ldots,\nu_n}$ $(-\infty < \nu_1, \ldots, \nu_n < \infty)$ and the lattice group $\mathbb{Z}^n$.
Remark 2. The real variety $X = X_{C_1,\ldots,C_n}$ appearing in (1.2) plays an important part in studying the global nature of the solutions to Wu's equations. Recently Kyoji Saito has defined “real twisted forms” associated with real Coxeter arrangements and has clarified their relationship to Coxeter groups (see [21]). In our investigation, $X$ and the monodromy group in Theorem 2 seem to have a similar relation.
Acknowledgements. The authors appreciate the help of Prof. Takashi Sakajo in drawing the figures, and useful advice from Prof. Takuo Fukuda about 2-dimensional mapping singularities. K. I. would like to thank the Mitsubishi Foundation for a scientific grant and Kazuko Iguchi for her continuous financial support and encouragement.
References
1. Aomoto, K.: Integral representations of quasi-hypergeometric functions. Talk at Hong Kong workshop (June 1999) on Special Functions
2. Aomoto, K. and Iguchi, K.: On quasi-hypergeometric functions. Methods and Appl. of Anal. 6 (1), 55–66 (1998)
3. Aomoto, K. and Iguchi, K.: Singularity and monodromy of quasi-hypergeometric functions. In: q-Series from a Contemporary Perspective, M. E. H. Ismail and D. W. Stanton (eds). Contemporary Mathematics 254, Providence, RI: AMS, 2000
4. Aomoto, K. and Kita, M.: Theory of Hypergeometric Functions (in Japanese). Tokyo: Springer-Verlag, 1994
5. Appell, P. et Kampé de Fériet, J.: Fonctions hypergéométriques et hypersphériques. Polynomes d'Hermite. Paris: Gauthier-Villars, 1960
6. Belardinelli, G.: Fonctions Hypergéométriques de Plusieurs Variables et Résolutions Analytiques des Équations Algébriques Générales. Mém. des Sci. Math., Paris: Gauthier-Villars, 1960
7. Berndt, B.C.: Ramanujan's Notebooks, Part I, Chap. 3. New York: Springer-Verlag, 1985
8. Elfving, G.: The History of Mathematics in Finland 1828–1918. “R. H. Mellin”. Helsinki, 1981
9. Gelfand, I.M. and Graev, M.I.: GG-functions and their relation to general hypergeometric functions. Russ. Math. Surveys 52, 4, 639–684 (1997)
10. Haldane, F.D.M.: “Fractional statistics” in arbitrary dimensions: A generalization of the Pauli principle. Phys. Rev. Lett. 67, 937–940 (1991)
11. Ha, Z.N.C.: Quantum Many Body Systems in One Dimension. Singapore: World Scientific, 1996
12. Herglotz, G.: Über die Wurzeln trinomischer Gleichungen. Göttingen: Gesammelte Schriften, 1979, pp. 423–428
13. Humphreys, J.: Reflection Groups and Coxeter Groups. Cambridge: Cambridge Univ. Press, 1990
14. Iguchi, K.: Equation of state and virial coefficients of an ideal gas with fractional exclusion statistics in arbitrary dimensions. Mod. Phys. Lett. B11, 765–772 (1997)
15. Iguchi, K.: Generalized Lagrange theorem and thermodynamics of a multispecies quasiparticle gas with mutual fractional exclusion statistics. Phys. Rev. B58, 6892–6911 (1998)
16. Iguchi, K. and Aomoto, K.: Quasi modular symmetry and quasi hypergeometric functions in quantum statistical mechanics of fractional exclusion statistics. Mod. Phys. Lett. B13, 1039–1046 (1999); Integral representation for the grand partition function in quantum statistical mechanics of exclusion statistics. Int. J. Mod. Phys. B13, 485–506 (1999)
17. Ishikawa, T. and Izumiya, S.: Applied Singularity Theory (in Japanese). Tokyo: Kyouritu Shuppan, 1998
18. Kuniba, A., Nakanishi, T., and Tsuboi, Z.: Notes on general Q-systems. Preprint, 2000
19. Morin, B.: Formes canoniques des singularités d'une application différentiable. C. R. Acad. Sci. Paris, t. 260, Groupe 1, 5662–5665, et 6503–6506 (1965)
20. Pólya, G. and Szegő, G.: Problems and Theorems in Analysis I, Part III, Chap. 5, Problems 211–216. Berlin: Springer-Verlag, Bd. 193, 1972
21. Saito, K.: The polyhedron dual to the chamber decomposition for a finite Coxeter group. Preprint, 1998
22. Srivastava, H.M. and Daoust, M.C.: Certain generalized Neumann expansions associated with the Kampé de Fériet function. Indag. Math. 31, 449–457 (1969)
23. Sutherland, B.: Quantum many-body problem in one dimension: Thermodynamics. Jour. Math. Phys. 12, 251–256 (1971)
24. Whitney, H.: On singularities of mappings of Euclidean spaces I. Mappings of the plane into the plane. Annals of Math. 62, 374–410 (1955)
25. Wu, Y.-S.: Statistical distribution for generalized ideal gas of fractional statistical particles. Phys. Rev. Lett. 73, 922–925 (1994)
26. Wu, Y.S., Yu, Y., Hatsugai, Y. and Kohmoto, M.: Exclusonic quasiparticles and thermodynamics of fractional quantum Hall liquids. Phys. Rev. B 57, 9907–9919 (1998)
Communicated by T. Miwa
Commun. Math. Phys. 223, 509 – 532 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Schrödinger Operators with Sparse Potentials: Asymptotics of the Fourier Transform of the Spectral Measure
Denis Krutikov1, Christian Remling2
1 Universität Essen, Fachbereich Mathematik/Informatik, 45117 Essen, Germany.
2 Universität Osnabrück, Fachbereich Mathematik/Informatik, 49069 Osnabrück, Germany.
Received: 17 May 2001 / Accepted: 28 June 2001
Abstract: We study the pointwise behavior of the Fourier transform of the spectral measure for discrete one-dimensional Schrödinger operators with sparse potentials. We find a resonance structure which admits a physical interpretation in terms of a simple quasiclassical model. We also present an improved version of known results on the spectrum of such operators.
1. Introduction
Let $H$ be the Hamiltonian of a quantum mechanical system, acting on a Hilbert space $\mathcal{H}$. If the initial state is denoted by $\psi$ (so $\psi \in \mathcal{H}$ and $\|\psi\| = 1$), then $|\langle\psi, e^{-itH}\psi\rangle|^2$ is the probability of finding the system again in the state $\psi$ at time $t$. Clearly, $\langle\psi, e^{-itH}\psi\rangle = \hat\rho_\psi(t)$, where $\rho_\psi$ is the spectral measure of $\psi$ and the hat denotes the Fourier transform. It is therefore interesting to study the Fourier transform of the spectral measures of $H$.
Usually, one does not analyze dynamical properties directly, but rather tries to connect them to the spectral properties of $H$. For instance, the time average $(1/2T)\int_{-T}^{T}|\hat\rho(t)|^2\,dt$ is related to the continuity properties of $\rho$ with respect to Hausdorff measures [7]. These properties, in turn, can be (and have been) studied successfully for many interesting models. In this paper, however, we are interested in the pointwise behavior of $\hat\rho(t)$ as $t \to \pm\infty$. Clearly, this quantity carries additional information which gets lost in the averaging process. In particular, it is often interesting to know whether $\lim_{t\to\pm\infty}\hat\rho(t) = 0$ (the measures $\rho$ with this property are called Rajchman measures). On the other hand, the pointwise behavior of $\hat\rho(t)$ is usually difficult to analyze, and it may depend in a subtle way on number theoretic properties of $\rho$. For example, a classical result of Salem says that a Cantor set with ratio of dissection $\theta > 2$ does not support non-zero Rajchman measures precisely if $\theta$ is a Pisot number, that is, if $\theta$ is an algebraic integer whose conjugates are strictly less than one in absolute value (see [10, Chapter III]). Furthermore, Lyons [9] characterized the Rajchman measures as the measures annihilating all Weyl sets, and the
property of being a Weyl set again depends on arithmetic properties. However, there are also two obvious remarks that can be made: an absolutely continuous measure is Rajchman (by the Riemann–Lebesgue Lemma), while a point measure is not Rajchman (by Wiener's Theorem). So the distinction between Rajchman and non-Rajchman measures really concerns the singular continuous part of a measure. In this paper, we will discuss one specific model where the pointwise behavior of $\hat\rho(t)$ can be analyzed rather completely. Indeed, the estimates we will prove below cannot be substantially improved, as this would be inconsistent with the spectral properties (compare the discussion following Theorem 1.2). We will study discrete one-dimensional Schrödinger operators with sparse potentials. These potentials can lead to singular continuous spectra, as was first shown by Pearson in his celebrated paper [12]. Pearson's results were recently improved and extended in [4, 11, 14, 15]. The discrete Schrödinger equation reads
$$y(n-1) + y(n+1) + V(n)y(n) = Ey(n) \quad (n \in \mathbb{N}); \tag{1}$$
let $H : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$ be the associated Schrödinger operator, that is, $(Hy)(n)$ equals the left-hand side of (1) (where we put $y(0) := 0$). The potential $V$ will have the form
$$V(n) = \sum_{m=1}^{\infty} g_m\,\delta_{n,x_m}, \tag{2}$$
where the $g_n$ are bounded and $x_1 < x_2 < \cdots$ is a rapidly increasing sequence of natural numbers. It is easy to see that the essential spectrum of $H$ contains the interval $[-2, 2]$ if $x_n - x_{n-1} \to \infty$. There may also be essential spectrum outside $[-2, 2]$; in fact, this part of $\sigma_{ess}$ also admits a rather explicit description along the lines of [11]. In this paper, however, we are only interested in the part of the spectrum in $(-2, 2)$. In a sense, $\hat\rho(t)$ contains more information on the dynamics of the quantum system than the spectral properties of $H$. Still, it is comforting to know that in the situations we will analyze below, it is also possible to determine the spectral properties of $H$.
Theorem 1.1. Suppose $x_{n-1}/x_n \to 0$ and $\sup|g_n| < \infty$. Then:
a) If $\sum g_n^2 < \infty$, then $H$ is purely absolutely continuous on $(-2, 2)$.
b) If $\sum g_n^2 = \infty$, then $H$ is purely singular continuous on $(-2, 2)$.
This dichotomy was already observed by Pearson [12], but under much stronger assumptions on the rate of growth of the $x_n$'s. Part a) of Theorem 1.1 is due to Kiselev, Last, and Simon [4]; they also proved the statement of part b) under the additional assumption that $g_n \to 0$. In the generality stated, part b) is new; probably, it can be extended even further to situations where each barrier is supported by a finite number of sites and these numbers are bounded. Note, however, that new phenomena (like spectra of mixed type) occur if the supports are allowed to grow [11, 15]. The proof of Theorem 1.1b) combines ideas from [4, 11, 12, 14, 15]. We will prove below general estimates on $\hat\rho(t)$ under the sole assumption that $x_{n-1}/x_n \to 0$ and $\sup|g_n| < \infty$ (see Theorems 4.3 and 5.1). However, for the discussion of these results, it is better to specialize and draw some conclusions whose relevance is more obvious. The following theorem contains three such conclusions.
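For readers who want to look at $\hat\rho(t)$ numerically, here is a crude sketch of our own, not the authors' method. It truncates $H$ to $\{1, \ldots, N\}$ with a Dirichlet condition at $N + 1$, diagonalizes it, and uses $\hat\rho(t) = \sum_j |\varphi_j(1)|^2 e^{-itE_j}$ over the normalized eigenvectors $\varphi_j$. Since free propagation has speed at most $2$, reflections from the artificial boundary should not be felt roughly for $|t| < N$. The helper names, barrier positions and couplings are hypothetical choices for illustration.

```python
import numpy as np


def sparse_potential(N, sites, couplings):
    """V(n) = sum_m g_m delta_{n, x_m} on {1, ..., N}, stored 0-based as V[n - 1]."""
    V = np.zeros(N)
    for x, g in zip(sites, couplings):
        if 1 <= x <= N:
            V[x - 1] = g
    return V


def return_amplitude(V, times):
    """hat rho(t) = <delta_1, exp(-itH) delta_1> for H cut off to {1, ..., N}
    (Dirichlet condition at N + 1), computed from the eigendecomposition of H."""
    N = len(V)
    off = np.ones(N - 1)
    H = np.diag(V) + np.diag(off, 1) + np.diag(off, -1)
    E, U = np.linalg.eigh(H)
    weights = U[0, :] ** 2          # point masses of the truncated spectral measure of delta_1
    return np.exp(-1j * np.outer(times, E)) @ weights


# toy sparse potential with barriers at 10, 50, 300; times stay below N = 800
V = sparse_potential(800, sites=[10, 50, 300], couplings=[1.0, 1.0, 1.0])
t = np.linspace(0.0, 700.0, 1401)
amp = np.abs(return_amplitude(V, t))   # one can look for bumps near the barrier positions
```

A full diagonalization is the simplest option for moderate $N$; for long times a Krylov or Chebyshev propagator would be cheaper.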
Theorem 1.2. Suppose that $\sup|g_n| < \infty$.
a) If $\frac{1}{n}\ln\frac{x_n}{x_{n-1}} \to \infty$, then $\lim_{t\to\pm\infty}\widehat{f\,d\rho}(t) = 0$ for all $f \in C_0^\infty(-2, 2)$.
b) Fix $\epsilon > 0$ (arbitrarily small) and define the resonant set $R$ by
$$R = \bigcup_{n\in\mathbb{N}}\bigl[(1 - \epsilon)x_n,\; x_n(\ln x_n)^{1+\epsilon}\bigr].$$
Suppose that for some $C > 0$, $\mu > 0$, we have $x_n \le C x_{n+1}^{1-\mu}$ for all $n \in \mathbb{N}$. Then:
(i) For every $m \in \mathbb{N}$ and every $f \in C_0^\infty(-2, 2)$, there exists a constant $C$ so that $|\widehat{f\,d\rho}(t)| \le C(1 + |t|)^{-m}$ for all $t$ with $|t| \notin R$.
(ii) For every $\gamma < \min\{1/2, \mu\}$ and every $f \in C_0^\infty(-2, 2)$ with $0 \notin \operatorname{supp} f$, there exists a constant $C$ so that $|\widehat{f\,d\rho}(t)| \le C(1 + |t|)^{-\gamma}$ for all $t$.
Here, $\rho$ is the spectral measure associated with the vector $\delta_1 \in \ell^2$ ($\delta_1(1) = 1$ and $\delta_1(n) = 0$ if $n \ne 1$). Since $\delta_1$ is a cyclic vector for $H$, any other spectral measure $\rho_\psi$ is absolutely continuous with respect to $\rho$.
Some comments on Theorem 1.2 are in order. First of all, Killip and one of us have shown [3] that $\mathcal{H}_{Raj} := \{\psi \in \ell^2 : \rho_\psi \text{ is a Rajchman measure}\}$ is a reducing subspace for $H$. So, since $C_0^\infty(-2, 2)$ is dense in $L^2((-2, 2), d\rho)$, part a) of Theorem 1.2 tells us that the Schrödinger operator $H$ is purely Rajchman on $(-2, 2)$, that is, $E((-2, 2))\ell^2 \subset \mathcal{H}_{Raj}$ (where $E$ denotes the spectral projection of $H$). Simon [16] has obtained earlier a very general result which goes in the same direction. Roughly speaking, it states that for many models with singular continuous spectrum, one can achieve that $\mathcal{H}_{sc} = \mathcal{H}_{Raj}$ (and, in fact, $\hat\rho(t) = O(|t|^{-1/2}\ln|t|)$) by making the potential sufficiently sparse. However, there is little control on the rate with which the barrier separations have to increase. Simon's techniques are quite different from ours. Theorem 1.2b) shows that under a stronger assumption on the $x_n$'s, we also get information on the rate with which $\widehat{f\,d\rho}$ goes to zero. Namely, according to part (i), the Fourier transform decays very rapidly off the resonant set $R$. Part (ii) is especially interesting if the $x_n$ grow so rapidly that $x_n \le C x_{n+1}^{1/2}$. Then $\mu = 1/2$, and Theorem 1.2b) says that for arbitrary $m \in \mathbb{N}$, $\delta > 0$,
$$|\widehat{f\,d\rho}(t)| \le \begin{cases} C(1 + |t|)^{-m} & |t| \notin R, \\ C(1 + |t|)^{-1/2+\delta} & |t| \in R. \end{cases} \tag{3}$$
This conclusion can also be proved under weaker assumptions on the increase of $x_n$ if there is some regularity in the way in which the $x_n$'s tend to infinity. For example, if $x_n = [\exp(a^n)]$ with $a > 1$, then (3) also holds. These estimates must be rather accurate, at least if $\sum g_n^2 = \infty$. Indeed, Theorem 1.1b) then shows that the spectral measure is purely singular on $(-2, 2)$, so $\widehat{f\,d\rho} \notin L^2$. This
means, first of all, that on the resonant set, the exponent of $(1 + |t|)$ cannot be smaller than $-1/2$. By the same token, our definition of the resonant set is close to optimal in that it cannot be true that for all large $n$, the interval containing $x_n$ is smaller than $Cx_n^{1-\epsilon}$, with $\epsilon > 0$. Indeed, if such an estimate held, then (writing $I_n = [x_n - Cx_n^{1-\epsilon}, x_n + Cx_n^{1-\epsilon}]$)
$$\int_{I_n}|\widehat{f\,d\rho}(t)|^2\,dt \le C_0 x_n^{1-\epsilon}\bigl(x_n^{-1/2+\delta}\bigr)^2 = C_0 x_n^{2\delta-\epsilon}.$$
Hence by taking $\delta < \epsilon/2$, we see that $\widehat{f\,d\rho} \in L^2$ (the bounds $C_0x_n^{2\delta-\epsilon}$ are summable because the $x_n$ grow rapidly, and off the resonant set the decay is rapid anyway). As mentioned above, this conclusion contradicts the fact that $\rho$ is singular. Since our intervals have a size of $\approx x_n(\ln x_n)^{1+\epsilon}$, we may be off by at most a factor which is $o(x_n^{\epsilon})$ for all $\epsilon > 0$. Note also that the intervals contained in $R$ are disjoint and large for large $n$, but there are also huge gaps between them, so that the complementary set of non-resonant times covers a considerable portion of the real line.
Theorem 1.2b) very neatly supports a naive quasiclassical picture of quantum motion under the influence of a sparse potential. Namely, play the following game: Start with a particle localized at the origin $n = 1$ at time $t = 0$, and let it move towards the first barrier (which is at $x_1$). When the particle hits the first barrier, it is either reflected or transmitted (the corresponding probabilities should presumably be determined from the reflection and transmission coefficients from stationary scattering theory, but this is quite irrelevant here). In the case of reflection, the particle returns to the origin, while in the case of transmission, it moves on to the second barrier, where it is again either transmitted or reflected. Recalling that $|\hat\rho(t)|^2$ is the probability of finding the particle again at $n = 1$ at time $t$ if it was initially at $n = 1$, we see that the above model suggests that $\hat\rho$ should have a resonance structure, since return to the origin is possible only at certain times. Because of the spreading of the wave packets, we should not expect very sharp resonances. Of course, mathematically speaking, there is little reason to have much confidence in this simplistic model, and indeed the actual analysis proceeds along different lines. Still, the final result (compare Eq. (3)) is exactly what the model predicts!
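The resonant set is easy to tabulate. The sketch below is illustrative only. The choice $a = 2$, $\epsilon = 0.1$ and the helper names are ours, and it uses the example sequence $x_n = [\exp(a^n)]$ mentioned after (3). It lists the intervals of $R$ and tests whether a time $t$ is resonant. For this sequence the first intervals are disjoint and separated by long non-resonant gaps.

```python
import math


def resonant_intervals(xs, eps):
    """Intervals [(1 - eps) x_n, x_n (ln x_n)^(1 + eps)] whose union is R (needs x_n > 1)."""
    return [((1 - eps) * x, x * math.log(x) ** (1 + eps)) for x in xs]


def is_resonant(t, xs, eps):
    """True if |t| lies in one of the intervals making up R."""
    return any(a <= abs(t) <= b for a, b in resonant_intervals(xs, eps))


# x_n = [exp(a^n)] with a = 2
xs = [math.floor(math.exp(2.0 ** n)) for n in range(1, 5)]   # [7, 54, 2980, 8886110]
for (a, b), x in zip(resonant_intervals(xs, 0.1), xs):
    print(x, round(a, 1), round(b, 1))
```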
We can now also understand the role of the assumption $0 \notin \operatorname{supp} f$ in Theorem 1.2b)(ii): namely, the spreading of wave packets under the free evolution is slower for wave packets localized (in energy) around $E = 0$. Our methods also work if $0 \in \operatorname{supp} f$ is allowed, but one obtains weaker estimates. In particular, under the same assumptions as above ($\mu = 1/2$), one can prove that $\widehat{f\,d\rho}(t) = O(|t|^{-1/6+\epsilon})$ for every $\epsilon > 0$. See [6] for details on this. Our approach for proving Theorem 1.2 depends on a representation of the Fourier transform of the spectral measure as a rather complicated looking limit of (an increasing number of) series of integrals (Theorem 2.3). This formula is completely general, but if (and probably only if) the potential is sparse, it is also useful because most of the integrals are oscillatory and hence small. These terms will be estimated in Sect. 4, the result being Theorem 4.3. There are other terms which cannot be treated in this way; these contributions are discussed in Sect. 5. Armed with these estimates, we can then prove Theorem 1.2 in Sect. 6; in fact, this result is a rather straightforward consequence of Theorems 4.3 and 5.1. Finally, in Sect. 7, we prove Theorem 1.1. It is also possible to treat the case of unbounded $g_n$'s with our methods, although the technical difficulties increase and the results are somewhat less satisfactory. See again [6] for further information.
2. Preliminaries
In this section, we collect some basic material that will be needed in the sequel. First of all, we will use a Prüfer type transformation (compare [4, 5]) to rewrite the Schrödinger equation (1). So, suppose that $E \in (-2, 2)$, and let $y$ be the solution of (1) with initial values $y(0) = 0$, $y(1) = 1$ (say). Write $E = 2\cos k$ with $k \in (0, \pi)$ and define $R(n) > 0$, $\psi(n)$ by
$$\begin{pmatrix} y(n-1)\sin k \\ y(n) - y(n-1)\cos k \end{pmatrix} = R(n)\begin{pmatrix} \sin(\psi(n)/2 - k) \\ \cos(\psi(n)/2 - k) \end{pmatrix}.$$
In fact, the angle $\psi(n)$ is defined only modulo $4\pi$. One then checks that $R$ and $\psi$ obey the equations
$$\frac{R(n+1)^2}{R(n)^2} = 1 - \frac{V(n)}{\sin k}\sin\psi(n) + \frac{V(n)^2}{\sin^2 k}\sin^2(\psi(n)/2),$$
$$\cot\bigl(\psi(n+1)/2 - k\bigr) = \cot(\psi(n)/2) - \frac{V(n)}{\sin k}.$$
There is no problem with the singularities of $\cot$ because we can as well use a similar equation with $\tan$ instead of $\cot$. Actually, a tiny bit of information got lost when we passed from (1) to these new equations. This is reflected in the fact that now $\psi(n + 1)$ is only determined modulo $2\pi$ by the equations. We must in fact impose the additional requirement that $\sin(\psi(n)/2)$ and $\sin(\psi(n + 1)/2 - k)$ have the same sign (and if $\sin(\psi(n)/2) = 0$, then $\cos(\psi(n + 1)/2 - k) = \cos(\psi(n)/2)$). Fortunately, these points will not cause any inconvenience. Note that the evolution of $R, \psi$ is especially simple if $V = 0$: $R$ is constant and $\psi(n + 1) = \psi(n) + 2k$. If the potential is sparse (that is, of the form (2)), we use a slightly different notation in that we write $R_n = R(x_n)$ and $\psi_n = \psi(x_n)$; also, it is often useful to make the dependence on $k$ explicit. We then have that $R(m) = R_n$ for $x_{n-1} < m \le x_n$ and
gn gn2 sin2 (ψn /2), sin ψn + sin k sin2 k ψn = ψ(xn−1 + 1) + 2k(xn − xn−1 − 1), gn−1 . cot (ψ(xn−1 + 1)/2 − k) = cot(ψn−1 /2) − sin k Rn2
= 1−
(4) (5) (6)
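The Prüfer relations above can be sanity-checked numerically. The following Python sketch is an illustration added here, not part of the original argument; it assumes that (1) is the discrete equation $y(n+1) + y(n-1) + V(n)y(n) = Ey(n)$ (the form compatible with the Wronskian and Green's identity used below), and the helper names `solve`, `prufer` and `max_defect` are hypothetical. It recovers $R(n)$ and $\psi(n)/2 - k$ from a computed solution and measures how far they violate the equation for $R(n+1)^2/R(n)^2$ and the $\cot$ relation (written in division-free form, including the sign requirement). At sites with $V(n) = 0$ the same test confirms that $R$ is constant and $\psi(n+1) = \psi(n) + 2k$.

```python
import math
import random

# Assumption: (1) reads y(n+1) + y(n-1) + V(n) y(n) = E y(n).

def solve(E, V, n_max):
    """Return [y(0), ..., y(n_max)] with y(0) = 0, y(1) = 1; V maps sites to values."""
    y = [0.0, 1.0]
    for n in range(1, n_max):
        y.append((E - V.get(n, 0.0)) * y[n] - y[n - 1])
    return y

def prufer(y, k, n):
    """Return (R(n), psi(n)/2 - k) from the defining relation of the Pruefer variables."""
    s = y[n - 1] * math.sin(k)            # = R(n) sin(psi(n)/2 - k)
    c = y[n] - y[n - 1] * math.cos(k)     # = R(n) cos(psi(n)/2 - k)
    return math.hypot(s, c), math.atan2(s, c)

def max_defect(E, seed=0, n_max=60):
    """Largest violation of the R- and psi-equations along one solution."""
    k = math.acos(E / 2)                  # E = 2 cos k, k in (0, pi)
    rng = random.Random(seed)
    V = {n: rng.uniform(-1, 1) for n in range(1, n_max) if n % 5 == 0}
    y = solve(E, V, n_max)
    worst = 0.0
    for n in range(1, n_max):
        R0, a0 = prufer(y, k, n)
        R1, a1 = prufer(y, k, n + 1)
        half = a0 + k                     # psi(n)/2 (mod 2 pi)
        u = V.get(n, 0.0) / math.sin(k)
        ratio = 1 - u * math.sin(2 * half) + u * u * math.sin(half) ** 2
        worst = max(worst, abs(R1 ** 2 / R0 ** 2 - ratio))
        # cot relation without division: (sin a1, cos a1) must be a positive
        # multiple of (sin(psi/2), cos(psi/2) - u sin(psi/2)).
        p, q = math.sin(half), math.cos(half) - u * math.sin(half)
        worst = max(worst, abs(math.sin(a1) * q - math.cos(a1) * p))
        if math.sin(a1) * p + math.cos(a1) * q <= 0:
            worst = max(worst, 1.0)       # sign requirement violated
    return worst

if __name__ == "__main__":
    print(max_defect(0.7), max_defect(-1.3, seed=5))
```

Both printed defects should be at the level of rounding errors.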
As a second tool, we need a representation of the spectral measure as a weak star limit of absolutely continuous measures involving the solutions of (1). We again use the spectral measure associated with $\delta_1$, and we denote this measure by $\rho$. In other words, $\rho(M) = \|E(M)\delta_1\|^2$, where $E(\cdot)$ is the spectral resolution of $H$.
Proposition 2.1. Let $w$ be a Herglotz function (that is, a holomorphic mapping from $\mathbb{C}^+ = \{z \in \mathbb{C} : \operatorname{Im} z > 0\}$ to itself), and let $I \subset \mathbb{R}$ be a bounded, open interval. Suppose that $w$ extends continuously to $\mathbb{C}^+ \cup I$ and that $\operatorname{Im} w(E) > 0$ for all $E \in I$. Then
$$\int f(E)\,d\rho(E) = \lim_{n\to\infty}\frac{1}{\pi}\int f(E)\,\frac{\operatorname{Im} w(E)}{|y(n,E) - w(E)y(n+1,E)|^2}\,dE$$
for all continuous functions $f$ with support in $I$. Here, $y$ is the solution of (1) with the initial values $y(0,E) = 0$, $y(1,E) = 1$.
D. Krutikov, C. Remling
Basically, this result is from [13]; the special case $w \equiv i$ has been known before [1, 8]. The proof we give below does not depend on the methods used in these papers; it is based on an idea of Atkinson (unpublished manuscript).
Proof of Proposition 2.1. Let $y$ be as above, and also introduce $v$ as the solution of (1) with the initial values $v(0,E) = 1$, $v(1,E) = 0$. In fact, the spectral parameter $E$ will also take complex values in this proof, and in that case we usually denote it by $z$ instead of $E$. Fix $N \in \mathbb{N}$, write $f(n,z) = v(n,z) - M_N(z)y(n,z)$ and determine $M_N$ from the (non-selfadjoint) boundary condition $f(N,z) = w(z)f(N+1,z)$ ($z \in \mathbb{C}^+$). A brief computation shows that
$$M_N(z) = \frac{v(N,z) - v(N+1,z)w(z)}{y(N,z) - y(N+1,z)w(z)}. \tag{7}$$
Moreover, there is Green's identity
$$\sum_{n=1}^{N}\left(g(n)(\tau h)(n) - (\tau g)(n)h(n)\right) = \Big[g(n)h(n+1) - g(n+1)h(n)\Big]_{n=0}^{n=N}.$$
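Here, $g$, $h$ are arbitrary functions from $\mathbb{N}_0$ to $\mathbb{C}$, and $(\tau y)(n)$ is short-hand for the left-hand side of (1). If we apply this to
$$\sum_{n=1}^{N}|f(n,z)|^2 = \frac{1}{z - \bar z}\sum_{n=1}^{N}\left(\overline{f(n,z)}\,(\tau f)(n,z) - \overline{(\tau f)(n,z)}\,f(n,z)\right)$$
with the function $f$ from above, we obtain
$$\sum_{n=1}^{N}|f(n,z)|^2 = \frac{\operatorname{Im} M_N(z)}{\operatorname{Im} z} - \frac{\operatorname{Im} w(z)}{\operatorname{Im} z}\,|f(N+1,z)|^2.$$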
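As an illustration of Green's identity and of the resulting formula for $\sum|f(n,z)|^2$, the following Python sketch (added here, not part of the proof) computes $M_N(z)$ from (7) for a random potential and compares both sides. It assumes that $(\tau u)(n) = u(n+1) + u(n-1) + V(n)u(n)$, freezes $w$ to a hypothetical constant with $\operatorname{Im} w > 0$, and uses hypothetical helper names. It also confirms $\operatorname{Im} M_N(z) > 0$, the Herglotz property used next.

```python
import random

# Assumption: (1) reads y(n+1) + y(n-1) + V(n) y(n) = z y(n), so that
# (tau u)(n) = u(n+1) + u(n-1) + V(n) u(n).

def solutions(z, V, N):
    """y (y(0)=0, y(1)=1) and v (v(0)=1, v(1)=0) at indices 0..N+1."""
    y, v = [0j, 1 + 0j], [1 + 0j, 0j]
    for n in range(1, N + 1):
        y.append((z - V[n]) * y[n] - y[n - 1])
        v.append((z - V[n]) * v[n] - v[n - 1])
    return y, v

def green_defect(N=10, seed=4):
    """|sum_{n=1}^N (g tau h - tau g h) - [W(N) - W(0)]| for random g, h, V."""
    rng = random.Random(seed)
    V = [0.0] + [rng.uniform(-1, 1) for _ in range(N)]
    g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N + 2)]
    h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N + 2)]

    def tau(u, n):
        return u[n + 1] + u[n - 1] + V[n] * u[n]

    def W(n):
        return g[n] * h[n + 1] - g[n + 1] * h[n]

    left = sum(g[n] * tau(h, n) - tau(g, n) * h[n] for n in range(1, N + 1))
    return abs(left - (W(N) - W(0)))

def weyl_identity(z, w, N=25, seed=3):
    """Both sides of sum |f(n,z)|^2 = (Im M_N - Im w |f(N+1,z)|^2) / Im z, and M_N."""
    rng = random.Random(seed)
    V = [0.0] + [rng.uniform(-1, 1) for _ in range(N)]    # V[1..N]
    y, v = solutions(z, V, N)
    M = (v[N] - v[N + 1] * w) / (y[N] - y[N + 1] * w)     # equation (7)
    f = [v[n] - M * y[n] for n in range(N + 2)]           # f(N) = w f(N+1)
    lhs = sum(abs(f[n]) ** 2 for n in range(1, N + 1))
    rhs = (M.imag - w.imag * abs(f[N + 1]) ** 2) / z.imag
    return lhs, rhs, M
```

For example, `weyl_identity(0.4 + 0.3j, 0.2 + 0.9j)` returns two numbers that agree up to rounding, together with an $M_N$ in the upper half-plane.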
This equation together with (7) shows that $M_N$ is a Herglotz function. Clearly, $\operatorname{Im} M_N(z) \ge \operatorname{Im} z\sum_{n=1}^{N}|f(n,z)|^2$, which is precisely the condition for $M_N$ to lie inside the Weyl circle $K_N(z)$ (see, for example, [2, Sect. 9.2] and [18, Sect. 2.4]). By standard Weyl theory, the Weyl circles shrink to a point as $N \to \infty$, and this point is nothing but the $m$-function of the half-line problem: $m(z) = \langle\delta_1, (H - z)^{-1}\delta_1\rangle$. In particular, we have that $M_N(z) \to m(z)$ for fixed $z \in \mathbb{C}^+$. It now follows that the measures associated with $M_N$ converge (in a sense that will be made precise shortly) to $\rho$. This part of the argument is similar to the construction of the spectral measure $\rho$ in standard Weyl theory (compare the discussion in [2, Sect. 9.3]) and will thus only be sketched. Write down the Herglotz representation of $M_N$:
$$M_N(z) = a_N + b_Nz + \int_{\mathbb{R}}\left(\frac{1}{t - z} - \frac{t}{t^2+1}\right)d\rho_N(t).$$
Here $a_N \in \mathbb{R}$, $b_N \ge 0$, and $\rho_N$ is a positive Borel measure with $\int d\rho_N(t)/(t^2+1) < \infty$. By analyzing the asymptotics of $M_N(iy)$ as $y \to \infty$, one can in fact show that $b_N = 0$. It is nice to have finite measures, so we introduce $d\mu_N(t) = d\rho_N(t)/(t^2+1)$ and write $M_N$ as
$$M_N(z) = a_N + \int_{\mathbb{R}}\frac{tz + 1}{t - z}\,d\mu_N(t).$$
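Note that $\operatorname{Im} M_N(i) = \mu_N(\mathbb{R})$; since this sequence is bounded (even convergent), the Banach-Alaoglu Theorem shows that the $\mu_N$ converge on a subsequence to a limit measure $\mu$ in the weak star topology (where the finite, complex Borel measures on $\mathbb{R}$ are viewed as the dual of $C_0(\mathbb{R})$). By passing to the limit in the equation
$$\frac{\operatorname{Im} M_N(z)}{\operatorname{Im} z} - \operatorname{Im} M_N(i) = \int_{\mathbb{R}}\left(\frac{t^2+1}{|t - z|^2} - 1\right)d\mu_N(t),$$
we thus see that
$$\frac{\operatorname{Im} m(z)}{\operatorname{Im} z} - \operatorname{Im} m(i) = \int_{\mathbb{R}}\left(\frac{t^2+1}{|t - z|^2} - 1\right)d\mu(t).$$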
Since the measure associated with a Herglotz function is already determined by the imaginary part of that function, we must have that $d\mu(t) = d\rho(t)/(t^2+1)$. In particular, this measure is the only possible weak star limit point of the $\mu_N$'s, and thus it was not necessary to pass to a subsequence. Rather, we have $d\rho_N(t)/(t^2+1) \to d\rho(t)/(t^2+1)$ in the weak star topology. Finally, a computation using (7) and constancy of the Wronskian $W(n) = v(n)y(n+1) - v(n+1)y(n)$ shows that for all $E \in I$, the limit $M_N(E) \equiv \lim_{\epsilon\to 0+}M_N(E + i\epsilon)$ exists and
$$\operatorname{Im} M_N(E) = \frac{\operatorname{Im} w(E)}{|y(N,E) - y(N+1,E)w(E)|^2}.$$
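The boundary-value formula for $\operatorname{Im} M_N(E)$ rests only on (7) and the constancy of the Wronskian ($W(0) = 1$ by the initial conditions). The Python sketch below is an added illustration: $w$ is frozen to a hypothetical constant with positive imaginary part, (1) is assumed to be $y(n+1) + y(n-1) + V(n)y(n) = Ey(n)$, and the helper names are hypothetical. It checks both facts for a real energy and a random potential.

```python
import random

# Assumption: (1) reads y(n+1) + y(n-1) + V(n) y(n) = E y(n); w is frozen to a
# constant with Im w > 0 (any such constant is a Herglotz function).

def real_solutions(E, V, N):
    """Real solutions y (y(0)=0, y(1)=1) and v (v(0)=1, v(1)=0) at indices 0..N+1."""
    y, v = [0.0, 1.0], [1.0, 0.0]
    for n in range(1, N + 1):
        y.append((E - V[n]) * y[n] - y[n - 1])
        v.append((E - V[n]) * v[n] - v[n - 1])
    return y, v

def boundary_value_check(E, w, N=30, seed=7):
    """Wronskians W(0..N), Im M_N(E) from (7), and Im w / |y(N) - y(N+1) w|^2."""
    rng = random.Random(seed)
    V = [0.0] + [rng.uniform(-1, 1) for _ in range(N)]
    y, v = real_solutions(E, V, N)
    wronskians = [v[n] * y[n + 1] - v[n + 1] * y[n] for n in range(N + 1)]
    M = (v[N] - v[N + 1] * w) / (y[N] - y[N + 1] * w)
    predicted = w.imag / abs(y[N] - y[N + 1] * w) ** 2
    return wronskians, M.imag, predicted
```

All Wronskians equal 1 up to rounding, and the two values of $\operatorname{Im} M_N(E)$ coincide.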
By general facts on Herglotz functions, the measures $\rho_N$ are therefore purely absolutely continuous in $I$ with density $(1/\pi)\operatorname{Im} M_N(E)$.
Corollary 2.2. Suppose $f$ is a continuous function with support contained in $(-2, 2)$. Then
$$\int f(E)\,d\rho(E) = \lim_{n\to\infty}\frac{2}{\pi}\int_0^{\pi}f(2\cos k)\,\frac{\sin^2 k}{R^2(n,k)}\,dk.$$
Proof. We want to apply Proposition 2.1 with $I = (-2, 2)$ and
$$w(z) = \frac{z}{2} + i\sqrt{1 - \frac{z^2}{4}},$$
but we first have to check that this is a Herglotz function. More precisely, we will choose the square root on $z \in (-2, 2)$ so that $\operatorname{Im} w > 0$ there and then continue holomorphically to the upper half-plane. The continuation is possible because the branch points of $(w - z/2)^2 = z^2/4 - 1$ are $z = \pm 2$, neither of which is in the upper half-plane. By the monodromy theorem, the continuation is also unique. Moreover, $w(z)$ extends continuously to the closure of $\mathbb{C}^+$ (in the Riemann sphere $\mathbb{C}_\infty$), and then the image of $\mathbb{R} \cup \{\infty\}$ is the closed curve (note that $w(2\cos k) = e^{ik}$ for $0 \le k \le \pi$, while $w$ maps $(2, \infty)$ onto $(1, \infty)$ and $(-\infty, -2)$ onto $(-\infty, -1)$)
$$(-\infty, -1) \cup \{e^{i\varphi} : \pi \ge \varphi \ge 0\} \cup (1, \infty) \cup \{\infty\}. \tag{8}$$
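The branch of $w$ and the curve (8) can be inspected numerically. In the following Python sketch (an added illustration), $w$ is evaluated with the principal square root; for $\operatorname{Im} z > 0$ the number $1 - z^2/4$ never lies on $(-\infty, 0]$, so this is the holomorphic branch described above. The checks confirm $w(2\cos k) = e^{ik}$, boundary values in $(1, \infty)$ and $(-\infty, -1)$ beyond $\pm 2$, $\operatorname{Im} w > 0$ at sample points of $\mathbb{C}^+$, and the identity $|y(n) - w\,y(n+1)|^2 = R(n+1)^2$ at $E = 2\cos k$ which, together with $dE = 2\sin k\,dk$, turns Proposition 2.1 into Corollary 2.2. The sample points and the helper `r_squared_defect` are hypothetical.

```python
import cmath
import math
import random

def w(z):
    """w(z) = z/2 + i sqrt(1 - z^2/4), principal square root (holomorphic on C+)."""
    return z / 2 + 1j * cmath.sqrt(1 - z * z / 4)

def r_squared_defect(k, y0, y1):
    """| |y0 - w y1|^2 - R^2 | at E = 2 cos k, where R^2 = (y0 sin k)^2 + (y1 - y0 cos k)^2."""
    lhs = abs(y0 - w(2 * math.cos(k)) * y1) ** 2
    rhs = (y0 * math.sin(k)) ** 2 + (y1 - y0 * math.cos(k)) ** 2
    return abs(lhs - rhs)

ks = [j * math.pi / 50 for j in range(1, 50)]
rng = random.Random(0)
samples = [(rng.uniform(0.1, 3.0), rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(100)]
```

Here `w(3 + 1e-12j)` is close to $1.5 + \sqrt{1.25} \approx 2.618$ and `w(-3 + 1e-12j)` is close to $-2.618$, in line with (8).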
Therefore, the set $\{w(z) : z \in \mathbb{C}^+\}$ must be contained in one of the two regions into which the sphere is divided by (8). It now follows easily that this image must actually be contained in the region that lies in the upper half-plane, so $w(z)$ is a Herglotz function, as required. Now the claim follows from Proposition 2.1 together with the substitution $E = 2\cos k$.
We now use Corollary 2.2 to derive a formula for the Fourier transform of $\rho$. Since we are interested only in the part of the operator on $(-2, 2)$, we will study
$$\widehat{(f\,d\rho)}(t) = \int_{-\infty}^{\infty}f(E)e^{-itE}\,d\rho(E),$$
with $f \in C_0^\infty(-2, 2)$.
Theorem 2.3.
$$\widehat{(f\,d\rho)}(t) = \lim_{N\to\infty}\sum_{n_1,\dots,n_N=-\infty}^{\infty}\int_0^{\pi}g(k)\prod_{j=1}^{N}c(n_j, g_j/\sin k)\,e^{i\left(\sum_{l=1}^{N}n_l\psi_l(k) - 2t\cos k\right)}\,dk, \tag{9}$$
where $g \in C_0^\infty(0, \pi)$ is obtained from $f$ (see the proof) and
$$c(0, a) = 1, \qquad c(n, a) = \left(1 + \frac{2i}{a}\,\frac{n}{|n|}\right)^{-|n|} \quad (n \ne 0).$$
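Proof. By Corollary 2.2 and (4), we have
$$\widehat{(f\,d\rho)}(t) = \lim_{N\to\infty}\frac{2}{\pi}\int_0^{\pi}\frac{f(2\cos k)\sin^2 k}{R_1^2(k)}\,e^{-2it\cos k}\prod_{j=1}^{N}\left(1 - \frac{g_j}{\sin k}\sin\psi_j(k) + \frac{g_j^2}{\sin^2 k}\sin^2(\psi_j(k)/2)\right)^{-1}dk.$$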
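The factors in the product can be expanded in a Fourier series:
$$\frac{1}{1 - a\sin\psi + a^2\sin^2(\psi/2)} = \sum_{n=-\infty}^{\infty}c(n, a)e^{in\psi},$$
with the coefficients $c(n,a)$ defined in the statement of the theorem. This can be checked by summing the series (alternatively, write the denominator as $|1 + \tfrac{ia}{2} - \tfrac{ia}{2}e^{-i\psi}|^2$ and expand $|1 - qe^{-i\psi}|^{-2}$, $q = ia/(2 + ia)$, in geometric series). As the convergence is uniform in $\psi$, we may interchange the order of integration and summation. Finally, the factor $(2/\pi)\sin^2 k\,R_1^{-2}(k)$ can be absorbed by $g$, that is, we put $g(k) = (2/\pi)f(2\cos k)\sin^2 k\,R_1^{-2}(k)$, and the claim now follows.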
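The Fourier expansion and the closed form of $c(n, a)$ are easy to confirm numerically. The Python sketch below is an added illustration with hypothetical helper names. It compares $c(n, a)$ with the Fourier coefficients of $1/(1 - a\sin\psi + a^2\sin^2(\psi/2))$ computed by the trapezoid rule, checks $|c(n, a)| = (|a|/\sqrt{4 + a^2})^{|n|}$ (the exponential decay used in Sect. 4), and checks the formula for $\frac{d}{dk}c(n, g/\sin k)$ quoted there.

```python
import cmath
import math

def c(n, a):
    """Fourier coefficient c(n, a) from Theorem 2.3 (a real, a != 0)."""
    if n == 0:
        return 1 + 0j
    sign = 1 if n > 0 else -1
    return (1 + sign * 2j / a) ** (-abs(n))

def c_numeric(n, a, M=256):
    """n-th Fourier coefficient of 1/(1 - a sin psi + a^2 sin^2(psi/2)) by the trapezoid rule."""
    total = 0j
    for j in range(M):
        psi = 2 * math.pi * j / M
        F = 1 / (1 - a * math.sin(psi) + a * a * math.sin(psi / 2) ** 2)
        total += F * cmath.exp(-1j * n * psi)
    return total / M

def dc_dk(n, g, k):
    """Closed form of d/dk c(n, g/sin k); upper signs for n > 0, lower for n < 0."""
    s = 1 if n > 0 else -1
    return c(n, g / math.sin(k)) * abs(n) * (-s * 2j * math.cos(k)) / (g + s * 2j * math.sin(k))
```

The trapezoid rule is used because the integrand is analytic and periodic, so its error decays geometrically in the number of nodes.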
3. Estimates on the Prüfer Angle
The integrals from (9) contain rapidly oscillating exponentials. As usual, we will exploit this by integrating by parts. We will then need the following estimates on the derivatives of the Prüfer angles $\psi_n$. From now on and throughout the rest of this paper, we assume that the potential is given by (2) and that $x_{n-1}/x_n \to 0$ and $\sup|g_n| < \infty$.
Lemma 3.1.
$$\psi_n'(k) = 2x_n\left(1 + O(x_{n-1}/x_n)\right), \qquad \left|\psi_n^{(j)}(k)\right| \le C_jx_{n-1}^j \quad (j \ge 2).$$
These estimates hold uniformly for $k$ from a compact subset of $(0, \pi)$.
The estimates on the first two derivatives were also proved in [4]. Since we will integrate by parts many times (not only once, as in [4]), we really need Lemma 3.1 in full generality. Actually, in Sect. 7, we will also need a slightly different version of the first statement (which will be more accurate for small $g_n$'s), but this will be discussed later.
Proof. Let $\theta_n = \psi(x_{n-1}+1)$. Then (6) says that
$$\cot\left(\frac{\theta_n}{2} - k\right) = \cot\frac{\psi_{n-1}}{2} - \frac{g_{n-1}}{\sin k}.$$
We differentiate this equation and solve for $\theta_n'$ to obtain
$$\theta_n' = 2 + \frac{\psi_{n-1}' - 2\,\dfrac{g_{n-1}\cos k}{\sin^2 k}\sin^2\dfrac{\psi_{n-1}}{2}}{\sin^2\dfrac{\psi_{n-1}}{2} + \left(\cos\dfrac{\psi_{n-1}}{2} - \dfrac{g_{n-1}}{\sin k}\sin\dfrac{\psi_{n-1}}{2}\right)^2}.$$
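The formula for $\theta_n'$ can be tested against a finite-difference derivative. In this Python sketch (an added illustration), $\psi_{n-1}$ is replaced by the hypothetical smooth model $\varphi(k) = 2xk + \text{shift}$, $\theta$ is computed from (6) with a continuous choice of the angle, and the result is compared with the closed form above; the helper names are hypothetical.

```python
import math

def theta(k, phi, g):
    """theta = psi(x_{n-1}+1) from phi = psi_{n-1} via (6), with a continuous angle."""
    u = g / math.sin(k)
    half = phi / 2
    alpha = math.atan2(math.sin(half), math.cos(half) - u * math.sin(half))
    # pick the representative of alpha closest to half (they differ by less than pi)
    alpha = half + (alpha - half + math.pi) % (2 * math.pi) - math.pi
    return 2 * (alpha + k)                # cot(theta/2 - k) = cot(phi/2) - u

def theta_prime_formula(k, phi, dphi, g):
    """Right-hand side of the displayed formula for theta_n'."""
    s, c = math.sin(phi / 2), math.cos(phi / 2)
    u = g / math.sin(k)
    D = s * s + (c - u * s) ** 2
    return 2 + (dphi - 2 * g * math.cos(k) / math.sin(k) ** 2 * s * s) / D

def compare_theta_prime(k, g, x=37, shift=0.4, h=1e-6):
    """Central difference of theta versus the closed form, for phi(k) = 2 x k + shift."""
    def phi(kk):
        return 2 * x * kk + shift         # hypothetical stand-in for psi_{n-1}(k)
    fd = (theta(k + h, phi(k + h), g) - theta(k - h, phi(k - h), g)) / (2 * h)
    return fd, theta_prime_formula(k, phi(k), 2 * x, g)
```

For instance, `compare_theta_prime(1.1, -0.9)` returns two values that agree to many digits.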
Now the $g_n$'s are bounded and $\sin k$ is bounded away from zero (since $k$ varies over a compact subset of $(0, \pi)$). Taking (5) into account, we therefore obtain
$$\psi_n' = 2(x_n - x_{n-1}) + O(1)\psi_{n-1}' + O(1),$$
where the constants implicit in $O(1)$ only depend on $\sup|g_n|$ and $\inf\sin k$. The $x_n$'s grow more rapidly than exponentially, so the claim on $\psi_n'$ follows by iterating this equation. To prove the assertion on the higher derivatives, we note that $\psi_n^{(j)} = \theta_n^{(j)}$ for $j \ge 2$. Thus, for these $j$,
$$\psi_n^{(j)} = \left(\frac{\psi_{n-1}' - 2\,\dfrac{g_{n-1}\cos k}{\sin^2 k}\sin^2\dfrac{\psi_{n-1}}{2}}{\sin^2\dfrac{\psi_{n-1}}{2} + \left(\cos\dfrac{\psi_{n-1}}{2} - \dfrac{g_{n-1}}{\sin k}\sin\dfrac{\psi_{n-1}}{2}\right)^2}\right)^{(j-1)}.$$
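Denote the denominator by $D$, that is,
$$D = \sin^2\frac{\psi_{n-1}}{2} + \left(\cos\frac{\psi_{n-1}}{2} - \frac{g_{n-1}}{\sin k}\sin\frac{\psi_{n-1}}{2}\right)^2.$$
Note that $D \ge 1/(2 + u^2)$ with $u = g_{n-1}/\sin k$: $D$ is a positive definite quadratic form in $(\sin(\psi_{n-1}/2), \cos(\psi_{n-1}/2))$ with determinant $1$ and trace $2 + u^2$, so $D$ is bounded away from zero. If the derivatives are evaluated using the product rule $j - 1$ times, we get a sum of many terms. Fortunately, it suffices to observe the following facts:
(i) The only term containing $\psi_{n-1}^{(j)}$ is $\psi_{n-1}^{(j)}/D$.
(ii) Everything else is of the form
$$D^{-m}\prod_i\left(\psi_{n-1}^{(r_i)}\right)^{p_i}f(\psi_{n-1}, k),$$
where $f$ is a bounded function, $m \le j$, and the numbers $r_i$, $p_i$ satisfy $\sum_i r_ip_i \le j$.
We can now complete the proof by induction on $j$. By the induction hypothesis (and a direct argument for $j = 2$), the above remarks imply that
$$\left|\psi_n^{(j)}\right| \le C_j\left(\left|\psi_{n-1}^{(j)}\right| + x_{n-1}^j\right).$$
The claimed estimates follow by iterating this.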
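Lemma 3.1 can also be observed on an example. The Python sketch below is an added illustration: it assumes the form $y(n+1) + y(n-1) + V(n)y(n) = Ey(n)$ of (1), uses a hypothetical sparse potential with three bumps, and its helper names are hypothetical. It computes $\psi_n(k)$ from (5) and (6), starting from $\psi(1) = 2k$ (which follows from $y(0) = 0$, $y(1) = 1$), cross-checks the angles modulo $4\pi$ against a direct solution, and compares a finite-difference value of $\psi_3'(k)$ with $2x_3$.

```python
import math

def sparse_prufer_angles(k, xs, gs):
    """psi_n(k) = psi(x_n) for V = sum_n g_n delta_{x_n}, via (5) and (6), continuously lifted."""
    psi, site, out = 2.0 * k, 1, []       # psi(1) = 2k because y(0) = 0, y(1) = 1
    for x, g in zip(xs, gs):
        psi += 2.0 * k * (x - site)         # free evolution up to x_n, cf. (5)
        out.append(psi)
        u = g / math.sin(k)
        half = psi / 2.0
        alpha = math.atan2(math.sin(half), math.cos(half) - u * math.sin(half))
        alpha = half + (alpha - half + math.pi) % (2 * math.pi) - math.pi
        psi, site = 2.0 * (alpha + k), x + 1  # psi(x_n + 1), cf. (6)
    return out

def direct_check(k, xs, gs):
    """Worst misfit between psi_n/2 - k and the angle read off from the solution of (1)."""
    V, E = dict(zip(xs, gs)), 2 * math.cos(k)
    y = [0.0, 1.0]
    for n in range(1, xs[-1]):
        y.append((E - V.get(n, 0.0)) * y[n] - y[n - 1])
    worst = 0.0
    for x, psi in zip(xs, sparse_prufer_angles(k, xs, gs)):
        s, c = y[x - 1] * math.sin(k), y[x] - y[x - 1] * math.cos(k)
        R, a = math.hypot(s, c), psi / 2 - k
        worst = max(worst, abs(s / R - math.sin(a)), abs(c / R - math.cos(a)))
    return worst

def dpsi_dk(k, xs, gs, h=1e-6):
    """Central difference for the derivative of the last angle psi_N(k)."""
    up = sparse_prufer_angles(k + h, xs, gs)[-1]
    down = sparse_prufer_angles(k - h, xs, gs)[-1]
    return (up - down) / (2 * h)
```

With `xs = [10, 200, 8000]` and `gs = [0.5, 0.5, 0.5]`, the ratio `dpsi_dk(1.0, xs, gs) / (2 * 8000)` is close to 1, as the first part of Lemma 3.1 predicts for $x_2/x_3 = 1/40$.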
4. Non-resonant Terms
The heading of this section refers to those terms from (9) for which the exponential is rapidly oscillating as a function of $k$. It is useful to first make explicit in the notation the largest index $j$ with $n_j \ne 0$. To this end, we denote the expression from the right-hand side of (9), with no limit taken, by $I_N(t)$ (so $\widehat{(f\,d\rho)}(t) = \lim_{N\to\infty}I_N(t)$). Also, let
$$J_N(t) = \sum_{\substack{n_1,\dots,n_N\in\mathbb{Z}\\ n_N\ne 0}}\int_0^{\pi}g(k)\prod_{j=1}^{N}c(n_j, g_j/\sin k)\,e^{i\left(\sum_{l=1}^{N}n_l\psi_l(k) - 2t\cos k\right)}\,dk.$$
Then $I_N(t) = J_N(t) + I_{N-1}(t)$ (the terms with $n_N = 0$ reproduce $I_{N-1}(t)$ because $c(0, a) = 1$). We can now describe our general strategy for estimating (9). By Lemma 3.1, the derivative of the phase is roughly equal to
$$\sum_{j=1}^{N}n_j\psi_j'(k) + 2t\sin k \approx 2\sum_{j=1}^{N}n_jx_j + 2t\sin k.$$
Since the $x_j$'s are rapidly increasing, we may expect this to be of the order $2n_Nx_N + 2t\sin k$. So if $|t|$ is either much larger or much smaller than $x_N$ (and if $N$ is not too small), the exponential will be heavily oscillating and the corresponding contribution to (9) will be small. If $|t|$ is of the order of $x_N$ ("resonance"), a different treatment is necessary (see the next section). Of course, the above reasoning is not literally true because the $n_j$'s with $j < N$ can be so large in absolute value that, due to cancellations, $\left|\sum_{j=1}^{N}n_jx_j\right|$ is much smaller than $|n_N|x_N$. This difficulty is overcome by suitably cutting off the series over the $n_j$'s.
So, everything depends on the relative size of $|t|$ and $x_N$. Let
$$a = \max_{k\in\operatorname{supp} g}\sin k,$$
and fix $\epsilon > 0$ (arbitrarily small). We first study the case when
$$|t| \le \frac{1 - \epsilon}{a}\,x_N.$$
More specifically, we will analyze $J_N(t)$, assuming this inequality. The series will be cut off at $M = [bx_N/x_{N-1}]$, where $[x]$ denotes the largest integer $\le x$, and $b > 0$ will be chosen later. So we have to distinguish two (sub-)cases:
a) $|n_j| \le M$ for all $j \in \{1, 2, \dots, N-1\}$;
b) $|n_j| > M$ for some $j \in \{1, 2, \dots, N-1\}$.
Before we go on, a general remark on the notation we will use may be helpful. Namely, the term "constant" will refer to a number that is independent of $t$, $N$, and the $n_j$'s (later, we will sum over these latter parameters, anyway). It may depend, however, on the other parameters of the problem, which are $\sup|g_n|$, the $x_n$'s and the function $g \in C_0^\infty(0, \pi)$. It may also depend on additional parameters we introduce like the $\epsilon$ from above. A constant is usually denoted by $C$; the actual value of $C$ may change from one formula to the next. Also, we sometimes write $a \lesssim b$ instead of $a \le Cb$.
Now let us start with case a). Abbreviate
$$\varphi(k) = \sum_{j=1}^{N}n_j\psi_j(k) - 2t\cos k.$$
Using Lemma 3.1, we then see that
$$|\varphi'| \ge |n_N\psi_N'| - \sum_{j=1}^{N-1}|n_j\psi_j'| - 2(1 - \epsilon)x_N \ge 2(|n_N| - 1 + \epsilon)x_N - C|n_N|x_{N-1} - 2Cb\,(x_N/x_{N-1})\sum_{j=1}^{N-1}x_j.$$
If $N$ is sufficiently large and if $b$ is chosen sufficiently small, then (recalling that $\sum_{j=1}^{N-1}x_j \lesssim x_{N-1}$ and that $|n_N| - 1 + \epsilon \ge \epsilon|n_N|$ for $n_N \ne 0$) we may further estimate this by, let us say,
$$|\varphi'| \gtrsim |n_N|x_N. \tag{10}$$
In order to obtain good estimates, we must now integrate by parts sufficiently many times. To do this, we introduce the differential expression
$$L = \frac{-i}{\varphi'(k)}\frac{d}{dk}.$$
Note that $L(e^{i\varphi}) = e^{i\varphi}$. Therefore, we can manipulate the integrals from the expression for $J_N(t)$ as follows:
$$\int g\prod c\;e^{i\varphi}\,dk = \int g\prod c\;L^me^{i\varphi}\,dk = \int e^{i\varphi}\,(L^t)^m\Big(g\prod c\Big)\,dk.$$
Here, $m \in \mathbb{N}$ may still be chosen and
$$L^t = i\frac{d}{dk}\frac{1}{\varphi'(k)}$$
is the transpose of $L$. There are no boundary terms because $g$ has compact support. We obtain the estimate
$$\left|\int g\prod c\;e^{i\varphi}\,dk\right| \le \pi\max_{k\in\operatorname{supp} g}\left|(L^t)^m\Big(g\prod c\Big)\right|; \tag{11}$$
we expect the right-hand side to be small because $\varphi'$ is large by (10).
So, our next task is to control $(L^t)^m(g\prod c)$. Each of the $m$ derivatives contained in $(L^t)^m$ can act either on $g$ or on some $c(n_j, g_j/\sin k)$ or on one of the factors $1/\varphi'$. The function $g$ is smooth, so $|g^{(j)}| \le C_m$ for $j \le m$. Next, note that
$$\frac{d}{dk}c(n, g/\sin k) = c(n, g/\sin k)\,\frac{\mp 2i\cos k}{g \pm 2i\sin k}\,|n|,$$
where the signs depend on the sign of $n$. Since $c$ itself decays exponentially, $|c(n, g/\sin k)| \le e^{-\gamma|n|}$ (indeed, $|c(n, a)| = (|a|/\sqrt{4 + a^2})^{|n|}$), where $\gamma > 0$ depends only on $\sup|g_n|$ and $\inf\sin k$, we obtain the bound
$$\left|\frac{d^j}{dk^j}c(n, g/\sin k)\right| \le C_j|n|^je^{-\gamma|n|}. \tag{12}$$
Finally, $(1/\varphi')^{(T)}$ is a sum of terms of the form
$$C\,\frac{\varphi^{(r_1)}\cdots\varphi^{(r_s)}}{(\varphi')^q}, \tag{13}$$
where $r_i \ge 2$ and
$$\sum_{i=1}^{s}r_i = q + T - 1; \tag{14}$$
the $r_i$'s need not be distinct. To bound these expressions, we use Lemma 3.1 which implies that (for $2 \le r \le m$)
$$\left|\varphi^{(r)}\right| \le C_m\left(\sum_{j=1}^{N}|n_j|x_{j-1}^r + 2|t|\right) \lesssim \frac{x_N}{x_{N-1}}\,x_{N-2}^r + |n_N|x_{N-1}^r + x_N. \tag{15}$$
We introduce the abbreviation $A_N(r)$ for this latter bound. Recalling that $|\varphi'| \gtrsim |n_N|x_N$ (by (10)), we can thus bound (13) by $(|n_N|x_N)^{-q}\prod_{i=1}^{s}A_N(r_i)$.
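The integration by parts behind (11) can be illustrated with $m = 1$. In the Python sketch below (an added illustration), $g$ is a hypothetical smooth bump, $\varphi(k) = \lambda k + 0.3\sin k$ is a hypothetical phase with $\varphi' \ge \lambda - 0.3 > 0$, and both integrals are evaluated by the trapezoid rule, which is highly accurate here because the integrands vanish to infinite order at the ends of the support. The two integrals agree, and $|\int g\,e^{i\varphi}|$ obeys the bound (length of the support) $\times \max|L^tg|$, the analogue of (11).

```python
import cmath
import math

CENTER, WIDTH, LAM = 1.2, 0.5, 60.0   # hypothetical bump position/size and frequency

def g(k):
    s = (k - CENTER) / WIDTH
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1 else 0.0

def dg(k):
    s = (k - CENTER) / WIDTH
    if abs(s) >= 1:
        return 0.0
    return g(k) * (-2.0 * s / (1.0 - s * s) ** 2) / WIDTH

def phi(k):
    return LAM * k + 0.3 * math.sin(k)

def dphi(k):
    return LAM + 0.3 * math.cos(k)

def d2phi(k):
    return -0.3 * math.sin(k)

def lt_g(k):
    """(L^t g)(k) = i d/dk (g / phi'), the transpose of L = (-i / phi') d/dk."""
    return 1j * (dg(k) / dphi(k) - g(k) * d2phi(k) / dphi(k) ** 2)

def integrate(F, n=20000):
    """Trapezoid rule on supp g; F vanishes to infinite order at both ends."""
    a, b = CENTER - WIDTH, CENTER + WIDTH
    h = (b - a) / n
    return h * sum(F(a + j * h) for j in range(1, n))

I0 = integrate(lambda k: g(k) * cmath.exp(1j * phi(k)))
I1 = integrate(lambda k: lt_g(k) * cmath.exp(1j * phi(k)))
bound = 2 * WIDTH * max(abs(lt_g(CENTER - WIDTH + 2 * WIDTH * j / 2000)) for j in range(2001))
```

Repeating the step $m$ times gains a factor of order $1/\min|\varphi'|$ each time, which is what (11) exploits.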
The above considerations show that $(L^t)^m(g\prod c)$ is a sum of many terms each of which admits a bound of the form
$$C_m(|n_N|x_N)^{-P}\prod_{i=1}^{s}A_N(r_i)\prod_{j=1}^{N}|n_j|^{p_j}e^{-\gamma|n_j|}. \tag{16}$$
More precisely, such a bound results if $p_j$ derivatives act on $c(n_j, g_j/\sin k)$. Consequently, the remaining derivatives (if any) act on some factor $1/\varphi'$ or on $g$. For later use, we record the fact that the number of different terms of the form (16) admits a bound of the form $CN^m$, where $C$ depends on $m$ only. To prove this, observe that the product rule, applied to $\left(\prod_{j=1}^{N}c\right)^{(l)}$ with $0 \le l \le m$, produces at most $N^l \le N^m$ terms. Furthermore, the number of possibilities of distributing the remaining $m - l$ derivatives among $g$ and the factors $1/\varphi'$ does not depend on $N$. We now claim that there are the following restrictions on the parameters: $P \ge m$, $s \ge 0$, $r_i \ge 2$, $p_j \ge 0$ and
$$\sum_{i=1}^{s}r_i + \sum_{j=1}^{N}p_j \le P.$$
The first inequality just says that the number of factors $1/\varphi'$ increases when derivatives act on them, and the following three relations are obvious. The last inequality is obtained as follows. $\sum p_j$ is the number of derivatives acting on $\prod c$, thus if $T$ denotes the number of derivatives that act on some factor $1/\varphi'$, then $T \le m - \sum p_j$. Assume for the moment that these $T$ derivatives all act on the same factor $1/\varphi'$. Then expressions of the form (13) result, and the exponent $q$ must be related to $P$ by $P = q + m - 1$. Hence (14) gives
$$\sum_{i=1}^{s}r_i = P - m + 1 + T - 1 \le P - \sum_{j=1}^{N}p_j,$$
as claimed. We need not pay special attention to the case where the $T$ derivatives act on different factors $1/\varphi'$ because only terms of the type already handled can arise in this way. To simplify (16), we observe that
$$\frac{A_N(r)}{(|n_N|x_N)^r} \lesssim \frac{1}{|n_N|^r}\,\frac{x_{N-2}^r}{x_{N-1}x_N^{r-1}} + \frac{1}{|n_N|^{r-1}}\,\frac{x_{N-1}^r}{x_N^r} + \frac{1}{|n_N|^rx_N^{r-1}} \lesssim \left(\frac{x_{N-1}}{|n_N|x_N}\right)^{r-1}.$$
Hence
$$(16) \lesssim \left(\frac{x_{N-1}}{|n_N|x_N}\right)^{\sum(r_i-1)}\left(\frac{1}{|n_N|x_N}\right)^{P-\sum r_i}\prod_{j=1}^{N}|n_j|^{p_j}e^{-\gamma|n_j|},$$
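and these bounds can now be summed over the range $n_i \in \mathbb{Z}$, $n_N \ne 0$, $|n_i| \le M$ (actually, this latter restriction is not needed at this point). So, let
$$D_p = \sum_{n\in\mathbb{Z}}|n|^pe^{-\gamma|n|},$$
and use the conditions on the various exponents (see the discussion following (16)); we obtain
$$\sum_{\substack{n_1,\dots,n_N\\ n_N\ne 0}}(16) \le C_m\left(\frac{x_{N-1}}{x_N}\right)^{\sum(r_i-1)}\left(\frac{1}{x_N}\right)^{P-\sum r_i}\prod_{j=1}^{N}D_{p_j} = C_m\,\frac{x_{N-1}^{\sum(r_i-1)}}{x_N^{P-s}}\prod_{j=1}^{N}D_{p_j}$$
$$\le C_m\left(\frac{x_{N-1}}{x_N}\right)^{P-s}\prod_{j=1}^{N}D_{p_j}x_{N-1}^{-p_j} \le C_m\left(\frac{x_{N-1}}{x_N}\right)^{m/2}\prod_{j=1}^{N}D_{p_j}x_{N-1}^{-p_j}.$$
The last inequality holds because $r_i \ge 2$ and $\sum_{i=1}^{s}r_i \le P$, hence $s \le P/2$, and thus $P - s \ge P/2 \ge m/2$. We can now find an $N_0 = N_0(m)$ so that $D_p \le D_0x_{N-1}^p$ for all $N \ge N_0$, $p = 0, 1, \dots, m$. We use this observation and also replace $m/2$ by $m$ to obtain
$$\sum_{\substack{n_1,\dots,n_N\\ n_N\ne 0}}(16) \le C_mD_0^N\left(\frac{x_{N-1}}{x_N}\right)^m \quad (N \ge N_0).$$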
Up to now, we have estimated only the typical term from the decomposition of $(L^t)^m(g\prod c)$ performed above, but, as already noted, the number of such terms is bounded by $CN^m$, so $\sum|(L^t)^m(g\prod c)|$ satisfies the same estimate (with a possibly larger constant and $D_0$ replaced by, let us say, $2D_0$). Because of (11), the discussion of case a) is thus complete.
Case b) is much easier. Now $|n_j| > M$ for some $j \in \{1, \dots, N-1\}$, where $M = [bx_N/x_{N-1}]$. Use (12) (with $j = 0$) and sum over all $n_1, \dots, n_N$ for which we are in case b). This gives
$$\sum_{\text{case b)}}\left|\int g\prod c\;e^{i\varphi}\right| \lesssim \sum_{j=1}^{N-1}\sum_{n_1\in\mathbb{Z}}\cdots\sum_{|n_j|>M}\cdots\sum_{n_N\in\mathbb{Z}}e^{-\gamma|n_1|}\cdots e^{-\gamma|n_j|}\cdots e^{-\gamma|n_N|} \lesssim ND_0^Ne^{-\gamma bx_N/x_{N-1}} \le (2D_0)^Ne^{-\gamma bx_N/x_{N-1}}.$$
We summarize:
Lemma 4.1. Suppose that $|t| \le (1/a - \epsilon)x_N$ ($\epsilon > 0$). Then, for any $m \in \mathbb{N}$, there are constants $C_m$, $D$, not depending on $t$ or $N$, so that
$$|J_N(t)| \le C_mD^N(x_{N-1}/x_N)^m.$$
Moreover, $D$ is also independent of $m$.
Proof. It suffices to prove this for large $N$ because then validity of the bound for all $N$ is achieved by simply adjusting the constant. By combining the above estimates, we obtain
$$|J_N(t)| \le C_mD^N\left(\left(\frac{x_{N-1}}{x_N}\right)^m + e^{-\gamma bx_N/x_{N-1}}\right) \quad (N \ge N_0(m)),$$
and the second term is much smaller than the first one for large $N$ and can thus be dropped.
The opposite case ($|t|$ much larger than $x_N$) can be treated using similar ideas. It will thus suffice to provide a sketch of the argument. We fix once and for all a sequence $B_N \le \ln x_N$ (say) that tends to infinity. In fact, the point is that $B_N$ may go to infinity arbitrarily slowly (for instance, $B_N = (\ln x_N)^\epsilon$ is a reasonable choice). We now assume that $|t| \ge B_Nx_N\ln x_N$. We can again prescribe an arbitrarily large exponent $m \in \mathbb{N}$, and we again distinguish two subcases:
a) $|n_j| \le (m/\gamma)\ln|t|$ (where $\gamma$ is from (12)) for $j = 1, \dots, N$. We will estimate $I_N$ (not $J_N$), so we do not assume that $n_N \ne 0$.
b) $|n_j| > (m/\gamma)\ln|t|$ for some $j \in \{1, \dots, N\}$.
In case a), we have that for sufficiently large $N$,
$$|\varphi'| \ge 2a_0|t| - \sum_{j=1}^{N}2x_j\left(1 + O(x_{j-1}/x_j)\right)\frac{m}{\gamma}\ln|t| \ge 2a_0|t| - 3x_N\,\frac{m}{\gamma}\ln|t|,$$
where $a_0 = \min_{k\in\operatorname{supp} g}\sin k > 0$. Now $x/\ln x$ is an increasing function of $x$ for $x > e$, so
$$\frac{|t|}{\ln|t|} \ge \frac{B_Nx_N\ln x_N}{\ln x_N + \ln(B_N\ln x_N)},$$
which, for large $N$, is bigger than $(B_N/2)x_N$, say. Hence
$$|\varphi'| \ge 2a_0|t| - \frac{6m}{\gamma B_N}\,|t| \ge a_0|t|$$
for large $N$. We now integrate by parts sufficiently many times (the exact number of integrations depends on $m$), as above. Lemma 3.1 now gives
$$\left|\varphi^{(r)}\right| \le C_m\left(\sum_{j=1}^{N}|n_j|x_{j-1}^r + 2|t|\right) \lesssim x_{N-1}^r\ln|t| + |t|,$$
and this estimate replaces (15). If this bound is again denoted by $A_N(r)$, then one shows that $A_N(r)/|t|^r \lesssim (x_{N-1}/|t|)^{r-1}$. It is this combination, with $|t|$ in the denominator, that is of interest here because now $|\varphi'| \gtrsim |t|$. Having made these adjustments, the argument now proceeds as above; the final result is the bound
$$\sum_{\text{case a)}}\left|\int g\prod c\;e^{i\varphi}\right| \le C_mD^N\left(\frac{x_{N-1}}{|t|}\right)^m.$$
As usual, the constant $C_m$ depends on $m$ and the sequence $B_N$, but of course not on $t$ or $N$. Moreover, the constant $D$ is also independent of $m$. In case b), we can argue as in case b) above to obtain
$$\sum_{\text{case b)}}\left|\int g\prod c\;e^{i\varphi}\right| \le CND_0^Ne^{-\gamma(m/\gamma)\ln|t|} = CND_0^N|t|^{-m}.$$
Putting things together, this gives:
Lemma 4.2. Suppose that $|t| \ge B_Nx_N\ln x_N$. Then, for any $m \in \mathbb{N}$, there are constants $C_m$, $D$, independent of $t$, $N$, so that
$$|I_N(t)| \le C_mD^N(x_{N-1}/|t|)^m.$$
Moreover, $D$ is also independent of $m$.
Proof. Combine the above estimates, just as in the proof of Lemma 4.1.
For a large set of times $t$, we are in one of the two situations treated by Lemmas 4.1 and 4.2, respectively, for every $N \in \mathbb{N}$. In view of the physical interpretation attempted in the Introduction, we call this set the set of non-resonant times. More precisely, define the resonant set $R$ by
$$R = \bigcup_{n\in\mathbb{N}}\left[\left(\frac{1}{a} - \epsilon\right)x_n,\; B_nx_n\ln x_n\right]. \tag{17}$$
For $a = 1$ and $B_n = (\ln x_n)^\epsilon$, this reduces to the definition given in the formulation of Theorem 1.2.
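The definition (17) and the hypothesis of Theorem 4.3 translate directly into code. The Python helpers below are hypothetical illustrations: `B(n, x)` stands for $B_n$, only finitely many $x_n$ are inspected, and the sample sequence, $a$, $\epsilon$ and $B_n = (\ln x_n)^{0.1}$ are arbitrary choices.

```python
import math

def resonant(t, xs, a, eps, B):
    """True if |t| lies in one of the intervals [(1/a - eps) x_n, B_n x_n ln x_n] of (17).

    Only the finitely many x_n supplied are inspected; B(n, x) plays the role of B_n.
    """
    s = abs(t)
    return any((1 / a - eps) * x <= s <= B(n, x) * x * math.log(x)
               for n, x in enumerate(xs, start=1))

def theorem_4_3_index(t, xs, a, eps, B):
    """The N with B_N x_N ln x_N < |t| < (1/a - eps) x_{N+1}, or None if there is none."""
    s = abs(t)
    for N in range(1, len(xs)):
        x, x_next = xs[N - 1], xs[N]
        if B(N, x) * x * math.log(x) < s < (1 / a - eps) * x_next:
            return N
    return None

# hypothetical sample data
XS, A, EPS = [10, 10**3, 10**7], 1.0, 0.1

def B(n, x):
    return math.log(x) ** 0.1
```

For these sample values, $|t| = 100$ and $|t| = 10^5$ are non-resonant, with $N = 1$ and $N = 2$ respectively, while $|t| = 5000$ falls into the second interval of $R$.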
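Theorem 4.3. For any $m \in \mathbb{N}$, the following holds. If $|t| \notin R$ and if $N \in \mathbb{N}$ is such that $B_Nx_N\ln x_N < |t| < (1/a - \epsilon)x_{N+1}$, then
$$\left|\widehat{(f\,d\rho)}(t)\right| \le C_m\left(D^N\left(\frac{x_{N-1}}{|t|}\right)^m + \sum_{n=N+1}^{\infty}D^n\left(\frac{x_{n-1}}{x_n}\right)^m\right). \tag{18}$$
The constant $D$ is independent of $m$.
Remark. Of course, since we only assumed that $x_{n-1}/x_n \to 0$, the series can diverge, in which case Theorem 4.3 is vacuous.
Proof. By (9) and the definition of $I_N$, $J_n$, we can write
$$\widehat{(f\,d\rho)}(t) = I_N(t) + \sum_{n=N+1}^{\infty}J_n(t),$$
where we use the $N$ from (18). We now apply Lemma 4.2 to estimate $I_N(t)$ and Lemma 4.1 to bound the $J_n(t)$ ($n \ge N + 1$); the latter applies because $|t| < (1/a - \epsilon)x_{N+1} \le (1/a - \epsilon)x_n$ for all $n \ge N + 1$.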
5. Resonant Terms
It remains to analyze the case when $|t| \in R$. So suppose that $(1/a - \epsilon)x_N \le |t| \le B_Nx_N\ln x_N$. The point $k = \pi/2$ (which corresponds to the energy $E = 0$) plays a special role now because the second derivative of $\cos k$ is zero there. Therefore, we also assume that $\pi/2 \notin \operatorname{supp} g$. We introduce the new phase
$$\theta(k) = 2k\sum_{j=1}^{N}n_jx_j - 2t\cos k.$$
Then, using the notation from the preceding section, we have that $\varphi = \theta + \eta$, where
$$\eta(k) = \sum_{j=1}^{N}n_j\left(\psi_j(k) - 2x_jk\right).$$
As usual, we need information on the derivatives. By Lemma 3.1,
$$|\eta'| \lesssim \sum_{j=1}^{N}|n_j|x_{j-1}$$
(where we put $x_0 := 1$). Also,
$$\theta' = 2\sum_{j=1}^{N}n_jx_j + 2t\sin k, \qquad \theta'' = 2t\cos k.$$
In particular, our assumption $\pi/2 \notin \operatorname{supp} g$ ensures that $|\theta''| \approx |t|$. We regard $\eta$ as a perturbation of $\theta$. Resonance is possible now, that is, $\theta'(k)$ can be small, but since $|\theta''|$ is large, this can only happen for a small set of $k$'s, and outside this set, we still have oscillatory integrals. To make these ideas precise, introduce the sets
$$S_0 = \operatorname{supp} g, \quad S_1 = \{k \in S_0 : |\theta'(k)| \le \delta_1x_N\}, \quad S_2 = \{k \in S_1 : |\theta'(k)| \le \delta_2x_N\}, \quad \dots.$$
The numbers $\delta_j > 0$ will be chosen later; they will satisfy $1 =: \delta_0 \gg \delta_1 \gg \delta_2 \gg \dots$. Clearly, $S_0 \subset [\epsilon, \pi/2 - \epsilon] \cup [\pi/2 + \epsilon, \pi - \epsilon]$ for some $\epsilon > 0$. By treating these two parts of the support of $g$ separately and replacing the actual support with the corresponding interval, we may assume that $S_0$ is an interval. Then $\theta''$ does not change sign on $S_0$ (so $\theta'$ is monotone there), and hence all the sets $S_n$ are intervals. Clearly, $S_0 \supset S_1 \supset S_2 \supset \cdots$. It also follows that
$$|S_n| \lesssim \delta_n\,\frac{x_N}{|t|} \lesssim \delta_n. \tag{19}$$
Note also that the sets $S_l$ depend on the $n_j$'s.
Our goal is to estimate $I_N(t)$. The integrals $J_n(t)$ ($n > N$) do not contain resonant terms, and we can use the results of Sect. 4. We must estimate $\int g\prod c\;e^{i(\theta+\eta)}$. Using the sets $S_n$, we can split the integrals as follows:
$$\int_{S_0}\cdots = \int_{S_m}\cdots + \sum_{l=0}^{m-1}\int_{S_l\setminus S_{l+1}}\cdots.$$
The number $m$ is a parameter which we leave unspecified for the time being. The integrals over $S_l \setminus S_{l+1}$ are again handled by integrating by parts. More precisely, we have that
$$\left|\int_{S_l\setminus S_{l+1}}g\prod c\;e^{i(\theta+\eta)}\right| = \left|\int_{S_l\setminus S_{l+1}}(e^{i\theta})'\,\frac{g\prod c\;e^{i\eta}}{i\theta'}\right| \le |\text{boundary terms}| + |S_l|\sup_{k\in S_l\setminus S_{l+1}}\left|\left(\frac{g\prod c\;e^{i\eta}}{\theta'}\right)'\right|. \tag{20}$$
Since $S_l \setminus S_{l+1}$ consists of at most two disjoint intervals, the boundary terms are obtained by inserting the endpoints of these intervals into $g\prod c\;e^{i\eta}/\theta'$. As a result, these boundary terms may be estimated by
$$|\text{boundary terms}| \lesssim \frac{\prod_je^{-\gamma|n_j|}}{\delta_{l+1}x_N}.$$
For the second term from the right-hand side of (20), we use the by now familiar arguments from the preceding section. We obtain the bound
$$\left(\frac{x_{N-1}\sum_j|n_j|}{\delta_{l+1}x_N} + \frac{|t|}{\delta_{l+1}^2x_N^2}\right)\delta_l\prod_je^{-\gamma|n_j|}.$$
We have used (19) here. The numerator of the first term in parentheses is a bound on $|\eta'|$, the second ratio bounds the contribution where the derivative acts on $1/\theta'$. Finally, the derivative may also act on $c$ or $g$, but this leads to contributions which are smaller than the ones already obtained. As usual, these bounds will now be summed over the $n_j$'s. This gives
$$\sum_{n_1,\dots,n_N\in\mathbb{Z}}\left|\int_{S_l\setminus S_{l+1}}g\prod c\;e^{i\varphi}\right| \le CD^N\left(\frac{\delta_lx_{N-1}}{\delta_{l+1}x_N} + \frac{\delta_lB_N\ln x_N}{\delta_{l+1}^2x_N}\right).$$
The bound $CD^N/(\delta_{l+1}x_N)$ on the boundary terms does not occur here because it is dominated by the second term from the right-hand side of the above inequality. We also need an estimate on $S_m$, but this is easy, since we clearly have that
$$\left|\int_{S_m}g\prod c\;e^{i\varphi}\right| \lesssim \delta_m\prod_je^{-\gamma|n_j|}.$$
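After summing over the $n_j$'s, we thus get the bound $CD^N\delta_m$. Combining the facts just established, we see that
$$\sum_{n_1,\dots,n_N\in\mathbb{Z}}\left|\int_{S_0}g\prod c\;e^{i\varphi}\right| \le CD^N\left(\frac{x_{N-1}}{x_N}\sum_{l=0}^{m-1}\frac{\delta_l}{\delta_{l+1}} + \frac{B_N\ln x_N}{x_N}\sum_{l=0}^{m-1}\frac{\delta_l}{\delta_{l+1}^2} + \delta_m\right). \tag{21}$$
Theorem 5.1. Suppose that $0 \notin \operatorname{supp} f$ and $(1/a - \epsilon)x_N \le |t| \le B_Nx_N\ln x_N$.
a) Then for arbitrary $\sigma > 0$, $m \in \mathbb{N}$, there exist constants $C$, $D$, independent of $N$, $t$, so that
$$\left|\widehat{(f\,d\rho)}(t)\right| \le C\sum_{n=N+1}^{\infty}D^n\left(\frac{x_{n-1}}{x_n}\right)^m + CD^N\left(\left(\frac{x_{N-1}}{x_N}\right)^{1/2} + B_N\ln x_N\left(\frac{x_N}{x_{N-1}}\right)^{\sigma}\frac{1}{(x_{N-1}x_N)^{1/2}}\right).$$
The constant $D$ is also independent of $m$ and $\sigma$.
b) We also have the estimate
$$\left|\widehat{(f\,d\rho)}(t)\right| \le C\sum_{n=N+1}^{\infty}D^n\left(\frac{x_{n-1}}{x_n}\right)^m + CD^N\left(\frac{x_{N-1}}{x_N^{1-\sigma}} + \frac{B_N\ln x_N}{x_N^{1/2-\sigma}}\right).$$
Proof. a) Here, we take $\delta_l = (x_{N-1}/x_N)^{\sigma l}$. Then (21) yields
$$\sum_{n_1,\dots,n_N\in\mathbb{Z}}\left|\int_{S_0}g\prod c\;e^{i\varphi}\right| \le CD^N\left(\left(\frac{x_{N-1}}{x_N}\right)^{\alpha} + \left(\frac{x_{N-1}}{x_N}\right)^{1-\sigma} + B_N\ln x_N\left(\frac{x_N}{x_{N-1}}\right)^{\sigma}\frac{1}{x_{N-1}^{\alpha}x_N^{1-\alpha}}\right), \tag{22}$$
where $\alpha = \sigma m$. The constant $D$ is independent of $m$ and $\sigma$. But, as in the proof of Theorem 4.3,
$$\widehat{(f\,d\rho)}(t) = I_N(t) + \sum_{n=N+1}^{\infty}J_n(t);$$
$I_N(t)$ has just been estimated in (22), and the $J_n(t)$ can be bounded using Lemma 4.1. So $|J_n(t)| \le CD^n(x_{n-1}/x_n)^m$; also, in (22), we specialize to $\alpha = 1/2$. The claim now follows since we may clearly assume that $\sigma \le 1/2$.
b) Proceed as in the proof of part a), but with $\delta_l = x_N^{-\sigma l}$ (and again $\alpha = 1/2$).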
6. Proof of Theorem 1.2
a) The hypothesis says that $x_n/x_{n-1} = e^{a_nn}$, where $a_n \to \infty$. It is now straightforward to check that the bounds of Theorems 4.3 and 5.1a) tend to zero as $N \to \infty$, provided the parameters are chosen appropriately. For instance, we can take $B_N = \ln x_N$ and $\sigma \in (0, 1/2)$. (In fact, Theorem 5.1 has the additional hypothesis that $0 \notin \operatorname{supp} f$, but this causes no problems since $C_0^\infty$ functions with this property are still dense in $L^2((-2, 2), d\rho)$.)
b) Here, we put $B_N = (\ln x_N)^\epsilon$. Note also that $a \le 1$, so the set $R$ defined in Theorem 1.2b) contains the set $R$ from (17). So, if $|t| \notin R$, Theorem 4.3 applies. We will now further estimate the bound from the statement of this theorem. First of all,
$$\left(\frac{x_{N-1}}{|t|}\right)^m \le \left(\frac{x_{N-1}}{|t|}\right)^m\left(\frac{|t|}{x_N}\right)^{m(1-\mu)} \le C_m|t|^{-m\mu}.$$
As for the second term, we observe that
$$\sum_{n=N+1}^{\infty}D^n\left(\frac{x_{n-1}}{x_n}\right)^m \le C_m\sum_{n=N+1}^{\infty}\frac{D^n}{x_n^{m\mu}} = \frac{C_mD^{N+1}}{x_{N+1}^{m\mu}}\sum_{n=0}^{\infty}D^n\left(\frac{x_{N+1}}{x_{N+1+n}}\right)^{m\mu}.$$
Now for sufficiently large $N$, we have $x_{N+1}/x_{N+1+n} \le 2^{-n}$ (say) for all $n \ge 0$, so the series converges for large $m$ and the sum may be estimated by a number that does not depend on $N$. Thus
$$\sum_{n=N+1}^{\infty}D^n\left(\frac{x_{n-1}}{x_n}\right)^m \le C_mD^Nx_{N+1}^{-m\mu} \le C_mD^N|t|^{-m\mu}.$$
Finally, $D^N \lesssim x_N \le |t|$, so (i) follows by taking $m$ large enough. Part (ii) follows in a similar way from Theorem 5.1b), so we will only sketch the argument. Fix a sufficiently small $\sigma > 0$. Then, for instance,
$$\frac{x_{N-1}}{x_N^{1-\sigma}} \lesssim x_N^{-\mu+\sigma} \lesssim \left(\frac{(\ln|t|)^{1+\epsilon}}{|t|}\right)^{\mu-\sigma}.$$
The last term from the bound of Theorem 5.1b) is treated similarly, and the first term has already been discussed above. The additional factors $D^N$ and $D^N(\ln x_N)^{1+\epsilon}$ are $O(|t|^\delta)$ for arbitrary $\delta > 0$, so they do not spoil these estimates.
7. Proof of Theorem 1.1
Since, as noted above, part a) is actually a result from [4], we only need to prove part b). First of all, absence of point spectrum is easy: the $g_n$ are bounded, so (4) shows that for every $k \in (0, \pi)$, there exists $q > 0$ so that $R_n \ge q^n$. But then
$$\sum_{m=1}^{\infty}R(m)^2 = \sum_{n=1}^{\infty}R_n^2(x_n - x_{n-1})$$
diverges, which implies that there are no $\ell^2$ solutions to (1). Hence $\sigma_{pp} \cap (-2, 2) = \emptyset$. Now as in [4], the main part of the proof will depend on a general criterion for absence of absolutely continuous spectrum from [8]. Namely, if $I \subset (-2, 2)$ is an open interval and if we can find a sequence $N_m \to \infty$ so that for almost all $E \in I$ (with respect to Lebesgue measure), $\lim_{m\to\infty}R(N_m, E) = \infty$, then it will follow that $\sigma_{ac} \cap I = \emptyset$. We will again work with $k$ instead of $E$. Fix a compact subinterval $I$ of $(0, \pi)$. According to what has been said above, we want to find a sequence $N_m \to \infty$ so that $R_{N_m}(k) \to \infty$ for almost all $k \in I$. By (4) and the fact that $R_1 = 1$,
$$\ln R_{N+1}(k) = \sum_{n=1}^{N}X_n(k, \psi_n(k)),$$
where (writing $u_n(k) = g_n/\sin k$)
$$X_n(k, \psi) = \frac{1}{2}\ln\left(1 - u_n(k)\sin\psi + u_n^2(k)\sin^2(\psi/2)\right).$$
For every $n \in \mathbb{N}$, we subdivide $I$ into subintervals $I_0^{(n)}, I_1^{(n)}, \dots, I_{N_n}^{(n)}$, so that for $l > 0$, $\psi_n(k)$ runs over an interval of length $2\pi$ if $k$ varies through $I_l^{(n)}$. We start this process of subdividing $I$ at the right endpoint of $I$, so we end up with an interval $I_0^{(n)}$ at the left endpoint of $I$ which has the property that $\psi_n(I_0^{(n)})$ is an interval of length less than or equal to $2\pi$. Since $\psi_n' \sim 2x_n$ by Lemma 3.1, we have the estimate $|I_l^{(n)}| \lesssim 1/x_n$. We introduce
$$\gamma_{n,l} = \frac{1}{|I_l^{(n)}|}\int_{I_l^{(n)}}X_n(k, \psi_n(k))\,dk$$
and $Y_n(k) = X_n(k, \psi_n(k)) - \gamma_{n,l}$ ($k \in I_l^{(n)}$). So, in particular,
$$\int_{I_l^{(n)}}Y_n(k)\,dk = 0.$$
Let us now compute the second moments of $Y_n$ with respect to the probability measure $dP(k) = |I|^{-1}dk$ on $I$. We first consider $\mathbb{E}Y_mY_n$ with $m < n$. Let $k_l^{(n)}$ denote an arbitrary (but fixed) point in $I_l^{(n)}$, and note that $|Y_n| \lesssim |g_n|$. Also, by inspection and Lemma 3.1 again, $|dY_n/dk| \lesssim |g_n|x_n$ (except, of course, at the endpoints of the $I_l^{(n)}$, where $Y_n$ need not be differentiable). It follows that
$$\mathbb{E}Y_mY_n = \frac{1}{|I|}\sum_{l=0}^{N_n}\int_{I_l^{(n)}}Y_m(k)Y_n(k)\,dk = \frac{1}{|I|}\sum_{l=0}^{N_n}\int_{I_l^{(n)}}\left(Y_m(k_l^{(n)}) + O(|g_m|x_m/x_n)\right)Y_n(k)\,dk = O(|g_mg_n|x_m/x_n).$$
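Next, we have that
$$\mathbb{E}Y_n^2 = \frac{1}{|I|}\sum_{l=0}^{N_n}\int_{I_l^{(n)}}Y_n^2(k)\,dk = \frac{1}{|I|}\sum_{l=0}^{N_n}\left(\int_{I_l^{(n)}}X_n^2(k, \psi_n(k))\,dk - \gamma_{n,l}^2\,|I_l^{(n)}|\right) \lesssim \sum_{l=0}^{N_n}|I_l^{(n)}|\,g_n^2 \lesssim g_n^2.$$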
Finally, we must take a closer look at $\gamma_{n,l}$ for $l \ge 1$. To do this, we need the following improved version of (the first part of) Lemma 3.1.
Lemma 7.1.
$$\psi_n'(k) = 2x_n + O\left(\sum_{i=1}^{n-1}|g_i|x_i\right).$$
This estimate holds uniformly on compact subsets of $(0, \pi)$.
Proof. Proceeding as in the proof of Lemma 3.1, we obtain a recursion for $\psi_n'$ of the form
$$\psi_n' = 2(x_n - x_{n-1}) + \left(1 + O(|g_{n-1}|)\right)\psi_{n-1}' + O(|g_{n-1}|).$$
We know already that $\psi_n' = 2x_n + O(x_{n-1})$, so if we let $\delta_n = \psi_n' - 2x_n$, then
$$\delta_n = \delta_{n-1} + a_{n-1}x_{n-1}, \quad \text{where } a_n = O(|g_n|).$$
Since $\psi_1(k) = 2kx_1$, we have $\delta_1 = 0$, and summing this recursion gives the claim.
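We also need the evaluation
$$\int_0^{2\pi}X_n(k, \psi)\,d\psi = \pi\ln\left(1 + \frac{u_n^2(k)}{4}\right)$$
(this is a crucial formula in this context and was already used in [12]) and the estimate $|\partial X_n(k, \psi)/\partial k| \lesssim |g_n|$. Lemma 7.1 shows that (for $l \ge 1$)
$$|I_l^{(n)}| = \frac{\pi}{x_n}\left(1 + O\left(\sum_{i=1}^{n-1}|g_i|x_i/x_n\right)\right).$$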
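The evaluation of $\int_0^{2\pi}X_n\,d\psi$, which supplies the main term $\frac{1}{2}\ln(1 + u_{n,l}^2/4)$ of $\gamma_{n,l}$, can be checked numerically. The Python sketch below (an added illustration; the helper names are hypothetical) writes $X_n$ in terms of $u = u_n(k)$ and integrates it over one period by the trapezoid rule; the integrand is analytic and $2\pi$-periodic, so the rule converges geometrically.

```python
import math

def X(u, psi):
    """X_n(k, psi) from Sect. 7, written in terms of u = u_n(k) = g_n / sin k."""
    return 0.5 * math.log(1 - u * math.sin(psi) + u * u * math.sin(psi / 2) ** 2)

def period_integral(u, M=512):
    """Trapezoid rule over one period of the analytic, 2*pi-periodic integrand."""
    h = 2 * math.pi / M
    return h * sum(X(u, j * h) for j in range(M))
```

Dividing `period_integral(u)` by $2\pi$ reproduces $\frac{1}{2}\ln(1 + u^2/4)$.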
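We are now ready to approximately compute $\gamma_{n,l}$ ($l \ge 1$). Fixing, as above, $k_l^{(n)} \in I_l^{(n)}$ and writing $u_{n,l} = u_n(k_l^{(n)})$, we have
$$\gamma_{n,l} = \frac{1}{|I_l^{(n)}|}\int_{I_l^{(n)}}X_n\,dk = \frac{1}{|I_l^{(n)}|}\int_0^{2\pi}\frac{X_n}{\psi_n'}\,d\psi_n = \frac{1}{2x_n|I_l^{(n)}|}\int_0^{2\pi}\left(1 + O\left(\sum_{i=1}^{n-1}|g_i|x_i/x_n\right)\right)X_n\,d\psi_n$$
$$= \frac{1}{2\pi}\int_0^{2\pi}\left(X_n(k_l^{(n)}, \psi) + O(|g_n|/x_n)\right)d\psi + O\left(\sum_{i=1}^{n-1}|g_ng_i|x_i/x_n\right) = \frac{1}{2}\ln\left(1 + \frac{u_{n,l}^2}{4}\right) + O\left(\sum_{i=1}^{n-1}|g_ng_i|x_i/x_n\right). \tag{23}$$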
To conclude the proof, we use an elementary probabilistic argument. (In fact, it is possible to obtain more detailed information on $Y_n$ by using a more sophisticated result like [17, Theorem 3.7.2], but the simple approach presented below suffices for our purposes.) Namely, using the above results, we estimate
$$\mathbb{E}\left(\sum_{n=1}^{N}Y_n\right)^2 = \sum_{n=1}^{N}\mathbb{E}Y_n^2 + 2\sum_{1\le m<n\le N}\mathbb{E}Y_mY_n \lesssim \sum_{n=1}^{N}g_n^2 + \sum_{1\le m<n\le N}|g_mg_n|\frac{x_m}{x_n} \lesssim \sum_{n=1}^{N}g_n^2.$$
To pass to the last line, we use the fact that if $m < n$, then $x_m/x_n \le C2^{m-n}$ (say); thus we can estimate the double sum with the help of the Cauchy-Schwarz inequality (writing $|g_mg_n|x_m/x_n \lesssim |g_m|2^{(m-n)/2}\cdot|g_n|2^{(m-n)/2}$). For later use, we note that this estimate can in fact be carried out more carefully. Namely, given an $\epsilon > 0$, no matter how small, we can find an $N_0 = N_0(\epsilon)$ so that $x_m/x_n < \epsilon^{n-m}$ if $n > m \ge N_0$. Taking this into account, we find that
$$\sum_{1\le m<n\le N}|g_mg_n|\frac{x_m}{x_n} = o\left(\sum_{n=1}^{N}g_n^2\right) \quad (N \to \infty). \tag{24}$$
The Chebysheff inequality yields N 3/4 N −1/2 N 2 2 P Yn ≥ gn gn , n=1
n=1
n=1
and since the right-hand side tends to zero as N → ∞, we can extract a subsequence Nm → ∞ so that the corresponding probabilities are summable (over m). Now the BorelCantelli Lemma guarantees that for almost all k ∈ I , there exists m0 = m0 (k) ∈ N, so
532
D. Krutikov, C. Remling
that
$$\Bigl|\sum_{n=1}^{N_m} Y_n(k)\Bigr| \le \Bigl(\sum_{n=1}^{N_m} g_n^2\Bigr)^{3/4} \qquad (25)$$
for all $m\ge m_0$. Since the intervals $I_0^{(n)}$ shrink to the left endpoint of $I$ as $n\to\infty$, we also have that almost surely, eventually $k\notin I_0^{(n)}$. So, recalling that $u_n = g_n/\sin k$, we now deduce from (23), (24), and (25) that for almost every $k\in I$,
$$\ln R_{N_m+1}(k) = \sum_{n=1}^{N_m} X_n(k) \ge C\sum_{n=1}^{N_m} g_n^2 - o\Bigl(\sum_{n=1}^{N_m} g_n^2\Bigr) \to \infty \qquad (m\to\infty).$$
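The subsequence extraction used with Borel–Cantelli can be made concrete. Here is a minimal sketch, assuming only the Chebysheff bound above and $\sum g_n^2 = \infty$. If $N_m$ is chosen so that $s_{N_m} = \sum_{n\le N_m} g_n^2 \ge m^4$, then the $m$-th probability is $\lesssim s_{N_m}^{-1/2} \le m^{-2}$, which is summable. The threshold $m^4$, the function name and the constant couplings in the usage example are illustrative choices, not the paper's.

```python
import math


def borel_cantelli_subsequence(g_squared, num_terms):
    """Choose N_1 < N_2 < ... with s_{N_m} >= m**4, where s_N = sum_{n<=N} g_n^2.

    The Chebysheff bound s_{N_m}**(-1/2) is then at most 1/m**2, which is summable.
    Returns the list of pairs (N_m, s_{N_m}) found among the first num_terms terms.
    """
    picks = []
    partial_sum = 0.0
    m = 1
    for N in range(1, num_terms + 1):
        partial_sum += g_squared(N)
        if partial_sum >= m ** 4:
            picks.append((N, partial_sum))
            m += 1
    return picks


if __name__ == "__main__":
    # Hypothetical constant couplings g_n^2 = 1, so s_N = N and N_m = m**4.
    picks = borel_cantelli_subsequence(lambda n: 1.0, 10_000)
    bounds = [s ** -0.5 for _, s in picks]
    print([N for N, _ in picks])
    print(sum(bounds), "<=", math.pi ** 2 / 6)
```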
The proof of Theorem 1.1 is complete.
Acknowledgements. C.R. acknowledges financial support by the Heisenberg program of the Deutsche Forschungsgemeinschaft.
References
1. Carmona, R.: One-dimensional Schrödinger operators with random or deterministic potentials: New spectral types. J. Funct. Anal. 51, 229–258 (1983)
2. Coddington, E.A. and Levinson, N.: Theory of Ordinary Differential Equations. New York: McGraw-Hill, 1955
3. Killip, R. and Remling, C.: Reducing subspaces. To appear in J. Funct. Anal.
4. Kiselev, A., Last, Y. and Simon, B.: Modified Prüfer and EFGP transforms and the spectral analysis of one-dimensional Schrödinger operators. Commun. Math. Phys. 194, 1–45 (1998)
5. Kiselev, A., Remling, C. and Simon, B.: Effective perturbation methods for one-dimensional Schrödinger operators. J. Differential Eq. 151, 290–312 (1999)
6. Krutikov, D.: Über die Fouriertransformationen der Spektralmaße von diskreten Schrödingeroperatoren mit dünn besetzten Potentialen (in German). Ph.D. thesis, Osnabrück, 2001
7. Last, Y.: Quantum dynamics and decompositions of singular continuous spectra. J. Funct. Anal. 142, 406–445 (1996)
8. Last, Y. and Simon, B.: Eigenfunctions, transfer matrices, and absolutely continuous spectrum of one-dimensional Schrödinger operators. Invent. Math. 135, 329–367 (1999)
9. Lyons, R.: Fourier–Stieltjes coefficients and asymptotic distribution modulo 1. Ann. Math. 122, 155–170 (1985)
10. Meyer, Y.: Algebraic Numbers and Harmonic Analysis. Amsterdam: North-Holland Publishing, 1972
11. Molchanov, S.: Multiscattering on sparse bumps. In: E. Carlen (ed.) et al., Advances in Differential Equations and Mathematical Physics, Contemp. Math. 217, Providence, RI: Amer. Math. Soc., 1997, pp. 157–181
12. Pearson, D.B.: Singular continuous measures in scattering theory. Commun. Math. Phys. 60, 13–36 (1978)
13. Pearson, D.B.: Value distribution and spectral analysis of differential operators. J. Phys. A 26, 4067–4080 (1993)
14. Remling, C.: A probabilistic approach to one-dimensional Schrödinger operators with sparse potentials. Commun. Math. Phys. 185, 313–323 (1997)
15. Remling, C.: Embedded singular continuous spectrum for one-dimensional Schrödinger operators. Trans. Am. Math. Soc. 351, 2479–2497 (1999)
16. Simon, B.: Operators with singular continuous spectrum, VII. Examples with borderline time decay. Commun. Math. Phys. 176, 713–722 (1996)
17. Stout, W.F.: Almost Sure Convergence. New York: Academic Press, 1974
18. Teschl, G.: Jacobi Operators and Completely Integrable Nonlinear Lattices. Mathematical Surveys and Monographs 72. Providence, RI: American Mathematical Society, 2000
Communicated by B. Simon
Commun. Math. Phys. 223, 533 – 582 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Calogero–Moser Systems and Hitchin Systems
J. C. Hurtubise¹, E. Markman²
¹ Centre de Recherches Mathématiques, Université de Montréal and Department of Mathematics, McGill University, Montréal, Quebec H3C 3P8, Canada. E-mail: [email protected]
² Department of Mathematics, University of Massachusetts, Amherst, MA 01003, USA. E-mail: [email protected]
Received: 8 May 2000 / Accepted: 2 July 2001
The first author of this article would like to thank NSERC and FCAR for their support.
The second author was partially supported by NSF grant DMS-9802532.
Abstract: We exhibit the elliptic Calogero–Moser system as a Hitchin system of G-principal Higgs pairs. The group G, though naturally associated to any root system, is not semi-simple. We then interpret the Lax pairs with spectral parameter of d’Hoker and Phong [dP1] and Bordner, Corrigan and Sasaki [BCS1] in terms of equivariant embeddings of the Hitchin system of G into that of GL(N).
1. Introduction
The Calogero–Moser Hamiltonian system must be one of the most thoroughly studied Hamiltonian systems, yet many aspects of its geometry remain quite mysterious. One can associate Calogero–Moser systems to any root system R on the Lie algebra $\mathfrak h$ of a torus H of dimension r and to any elliptic curve (as well as to the limiting cases of rational nodal or cusp curves, the “trigonometric” and “rational” cases respectively). In canonical coordinates $(x, p)\in\mathfrak h\times\mathfrak h^*$ the system is very simple, and is given by the Hamiltonian
$$H_{CM} = p\cdot p + \sum_{\alpha\in R} m_{|\alpha|}\,\wp(\alpha(x)), \qquad (1.1)$$
where $\wp(x)$ is the Weierstrass ℘-function, and the $m_{|\alpha|}$ are constants depending only on the norm of the root $\alpha$. In the rational case, one replaces $\wp$ by the function $x^{-2}$, and in the trigonometric case, by the function $\sin(x)^{-2}$. The system in its full generality was of course obtained in a step-by-step fashion from the rational and trigonometric Sl(n) case, by various people (for a survey, see [OP]). In particular, they noticed that one could replace the linear functions $(x_i - x_j)$ occurring in the original SL(N) case by the roots of more general root systems, while maintaining
integrability. The presence of root systems naturally suggests that the Calogero–Moser systems have some geometric origin, tied to Lie groups. In particular, when one is discussing ties between Lie groups or algebras and integrable systems, one is immediately led to look for Lax pairs $\dot L = [M, L]$. Indeed, this is the way much of the work on the Calogero–Moser system has progressed, e.g. with [OP, K] and more recently [dP1, dP2, dP3, BCS1, BCS2], so that one now has Lax pairs with a spectral parameter for all of the Calogero–Moser systems. Nevertheless, there are several mysterious aspects to many of these Lax pairs, and in general, there is a lack of concordance between the Lax pairs and the geometry of the group:
– The first is that, for the most part, the Lax matrices L are not in the Lie algebra of the root systems, though they occasionally occur in a symmetric space construction associated to the Lie algebra [OP]. Often they are in some Gl(N), where N is not even a dimension of a non-trivial irreducible representation of the group.
– As was pointed out in [Do], the Calogero–Moser Hamiltonian is manifestly invariant under the action of the Weyl group. On the other hand, most semisimple Lie groups have no Weyl invariant coadjoint orbits. Such orbits would usually be an essential ingredient for a suitable geometric version of the Calogero–Moser systems. Also, in general, there are no suitable orbits of dimension twice the dimension of the torus, which is the dimension one wants.
– Finally, one has a Calogero–Moser system for the root systems $BC_n$, and these do not even correspond to groups.
There is one case for which a satisfactory geometric version of the Calogero–Moser system exists, that of SL(N, ℂ). In this case, one finds that the Calogero–Moser system is a generalized Hitchin system over a moduli space of stable pairs over the elliptic curve.
(This is, in some sense, already implicit in [K], but has been made more explicit by [GN, N, Do].) One considers the moduli space of pairs $(E, \phi)$, where
– $E$ is a rank $N$ degree 0 bundle on $\Sigma$ with trivial determinant, and
– $\phi$ is a section of $\mathrm{End}(E)\otimes K$ with a simple pole at the origin whose residue is a conjugate of $m\cdot\mathrm{diag}(1, 1, \dots, 1, -N+1)$.
This realises the phase space in a natural way. The Hamiltonians are the coefficients of the equation of the “spectral curve” of $\phi$, and one has natural compactifications of the level sets of the Calogero–Moser Hamiltonians as Jacobians of the spectral curves.
The following note attempts to explain some of the geometry of the Calogero–Moser systems for an arbitrary root system R in terms of the geometry of a modified Hitchin system. The departure from previous work is that we do not use as structure group the semi-simple group associated to the root system. Instead, we use a group which one can construct for any root system, whose connected component of the identity is the semidirect product of the torus with the sum of the root spaces. The Weyl group acts on these. This allows one to construct, for any root system, some Weyl invariant coadjoint orbits of the correct dimension in a natural way. To these orbits one can associate Hitchin systems, over which the Calogero–Moser Hamiltonian appears naturally. The Lax pairs with spectral parameter of [dP1, BCS2] appear in a natural way from embeddings of the Lie algebras of our groups into Gl(V), where V is a sum of weight spaces invariant under the Weyl group W of the root system. These embeddings are not homomorphisms, but are invariant under the torus and the Weyl group action.
While this in some way clears up some of the mystery surrounding the Calogero–Moser systems, and in particular addresses the three facts outlined above, there are several aspects that are still unexplained: the first is that while the Calogero–Moser Hamiltonian
occurs naturally, the other commuting Hamiltonians have no easy interpretation in our setup and only seem to occur naturally after embedding into the Hitchin system for Gl(V). Another remaining, and probably related, question is that of understanding the compactifications of the level sets of the Hamiltonians (as Abelian varieties). This would require an enlargement of our phase space.
Section 2 of this paper is devoted to recalling certain facts about elliptic curves; Sect. 3 is similarly devoted to the required facts on generalized Hitchin systems. In Sect. 4, after discussing extensions of the Weyl group by the torus, we introduce the group which interests us, and discuss its properties. The next section is devoted to Hitchin systems with this group as structure group, and we show how the Calogero–Moser systems arise. The sixth section is devoted to a discussion of how these systems embed into the Hitchin systems for Gl(V), giving the Lax pairs of [dP1], [BCS2].
2. Line Bundles on an Elliptic Curve
Let $\Lambda$ be a non-degenerate lattice in $\mathbb C$ with generators $2\omega_1, 2\omega_2$ and let $\Sigma = \mathbb C/\Lambda$ be the corresponding elliptic curve. Denote the origin by $p_0$. We have on $\mathbb C$ the standard elliptic functions $\sigma(z), \zeta(z)$ with expansions at $z=0$,
$$\sigma(z) = z + O(z^5), \qquad \zeta(z) = \frac1z + O(z^3), \qquad (2.1)$$
and periodicity relations
$$\sigma(z+2\omega_i) = -\sigma(z)\exp\bigl(2\eta_i(z+\omega_i)\bigr), \qquad \zeta(z+2\omega_i) = \zeta(z)+2\eta_i, \qquad (2.2)$$
with $\eta_i = \zeta(\omega_i)$. We have
$$\frac{d}{dz}\log\sigma(z) = \zeta(z), \qquad \frac{d}{dz}\zeta(z) = -\wp(z),$$
where $\wp(z)$ is the standard Weierstrass ℘-function. From the periodicity relations, one has that the function
$$\rho^1(x,z) = \frac{\sigma(z-x)}{\sigma(z)\sigma(x)}\,e^{x\zeta(z)} \qquad (2.3)$$
is well defined as a function of $z$ on the elliptic curve, with an essential singularity at the origin, and a single zero at $z = x$. If we set
$$\rho^0(x,z) = e^{-x\zeta(z)}\rho^1(x,z), \qquad (2.4)$$
we find that $\rho^0$ has a single pole in $z$ at the origin. Covering $\Sigma$ by $U_1 = \Sigma - \{\text{origin}\}$ and $U_0$ = disk around origin, we can reinterpret the relation (2.4) as saying that $\rho^1, \rho^0$ define a section of the line bundle $L_x$ with transition function $e^{-x\zeta(z)}$; this section has a single pole at the origin. Let $p_x$ be the point $z = x$ on $\Sigma$; $L_x$ corresponds to the divisor $p_x - p_0$. There is another way of representing sections of the line bundle $L_x$, which is as functions $f$ on $\mathbb C$ satisfying automorphy relations:
$$f(z+2\omega_i) = f(z)\exp(-2\eta_i x). \qquad (2.5)$$
In this way, the function $\rho^0(x,z)$ also represents a section of $L_x$ with a simple pole at the origin. For use later on, we note that
$$\zeta(z)\rho^0(x,z) = e^{-x\zeta(z)}\frac{\partial\rho^1(x,z)}{\partial x} - \frac{\partial\rho^0(x,z)}{\partial x}. \qquad (2.6)$$
As a function of $z$, we have expansions
$$\rho^0(x,z) = -\frac1z + \frac{\sigma'(x)}{\sigma(x)} + O(z), \qquad \frac{d}{dx}\rho^0(x,z) = \frac{d^2}{dx^2}\log\sigma(x) + O(z). \qquad (2.7)$$
We have
$$\rho^0(x,z)\rho^0(-x,z) = \rho^1(x,z)\rho^1(-x,z) = \wp(z) - \wp(x). \qquad (2.8)$$
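In the rational limit mentioned in the introduction ($\wp$ replaced by $x^{-2}$, which corresponds to $\sigma(z) = z$), the identities (2.7) and (2.8) can be checked exactly with rational arithmetic. The sketch below does this. It is a sanity check of the algebra under that degenerate assumption only, not a verification of the elliptic identities themselves. The function names are illustrative.

```python
from fractions import Fraction

# Rational degeneration (an assumption for this exact check, not the elliptic case):
# sigma(z) = z, zeta(z) = 1/z, wp(z) = 1/z**2.


def sigma(z):
    return z


def wp(z):
    return 1 / (z * z)


def rho0(x, z):
    """rho^0(x, z) = sigma(z - x) / (sigma(z) sigma(x)), i.e. (2.3) and (2.4) combined."""
    return sigma(z - x) / (sigma(z) * sigma(x))


def check_2_8(x, z):
    """(2.8): rho^0(x, z) rho^0(-x, z) = wp(z) - wp(x)."""
    return rho0(x, z) * rho0(-x, z) == wp(z) - wp(x)


def check_2_7(x, z):
    """(2.7) with sigma'(x)/sigma(x) = 1/x; in this limit the O(z) remainder vanishes."""
    return rho0(x, z) + 1 / z == 1 / x


if __name__ == "__main__":
    pts = [Fraction(1, 3), Fraction(-2, 5), Fraction(7, 2)]
    print(all(check_2_8(x, z) and check_2_7(x, z) for x in pts for z in pts))
```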
Finally, consider
$$I_n: \Sigma\to\Sigma, \qquad z\mapsto nz, \qquad (2.9)$$
and let $I_n^*$ denote the induced action on line bundles; we note that $I_n^*(L_x) = L_{nx}$. The pull-back $I_n^*(\rho(x,z)) = \rho(x,nz)$ has automorphy factors $\exp(-2n\eta_i x)$, and so one can represent a section by $\rho^0(x,nz)$ in the trivialization given above:
$$\rho^0(x,nz) = e^{-nx\zeta(z)}\hat\rho^1(x,z) \qquad (2.10)$$
for a suitable function $\hat\rho^1$ on $U_1$; one has
$$n\zeta(z)\rho^0(x,nz) = e^{-nx\zeta(z)}\frac{\partial\hat\rho^1}{\partial x}(x,z) - \frac{\partial\rho^0}{\partial x}(x,nz). \qquad (2.11)$$
3. Generalised Hitchin Systems on an Elliptic Curve
Let $G$ be a complex Lie group. Following [Ma], we will consider the moduli space $M_G$ of pairs ($G$-bundle $P_G$ on $\Sigma$, trivialisation tr of $P_G$ at $p_0$) and its cotangent bundle $T^*M_G$. For $P_G\in M_G$, let $P_{\mathfrak g}$ be the adjoint bundle associated to $P_G$, and $P_{\mathfrak g}^*$ the associated coadjoint bundle. For any vector bundle $V$, set $V(p_0) = V\otimes\mathcal O(p_0)$, $V(-p_0) = V\otimes\mathcal O(-p_0)$. The fibre of $T^*M_G\to M_G$ at $P_G$ is the vector space $H^0(\Sigma, P_{\mathfrak g}^*(p_0))$ (note that the canonical bundle of $\Sigma$ is trivial). Thus $T^*M_G$ is a space of triples ($G$-bundle $P_G$ on $\Sigma$, trivialisation tr at $p_0$, section $\phi$ of $P_{\mathfrak g}^*$ with a pole at $p_0$). Dually, the tangent space to $M_G$ is $H^1(\Sigma, P_{\mathfrak g}(-p_0))$. We have, at $(P_G, T)$, the exact sequence for $T(T^*M_G)$:
$$0\to H^0(\Sigma, P_{\mathfrak g}^*(p_0))\to T(T^*M_G)\to H^1(\Sigma, P_{\mathfrak g}(-p_0))\to 0. \qquad (3.1)$$
The group $G$ acts naturally on the trivialisations, and so acts symplectically on $T^*M_G$. The moment map for this action is simply the residue of $\phi$ at $p_0$, expressed in the trivialisation tr. One can take symplectic reductions of $T^*M_G$ under this action, and obtain reduced moduli spaces $(T^*M_G)_{\rm red}$. This amounts to fixing the coadjoint orbits $\mathcal O$ of the residue of $\phi$, and forgetting the trivialisation tr. Let $F$ be a homogeneous invariant function on $\mathfrak g^*$ of degree $n$, and $\omega$ an element of $H^1(\Sigma, K^{-n+1}(-np_0)) \cong H^1(\Sigma, \mathcal O(-np_0))$. Applying $F$ to $\phi$, one obtains an element $F(\phi)$ of $H^0(\Sigma, \mathcal O(np_0))$. One can therefore define the Hamiltonian $F_\omega$ on $T^*M_G$ by $F_\omega(P_G,\phi) = \{F(\phi),\omega\}$, where $\{\,,\,\}$ denotes the Serre duality pairing. These Hamiltonians descend to the reduced spaces $(T^*M_G)_{\rm red}$, and, for $G$ reductive, define there an integrable system, the generalized Hitchin system.
Explicit formulae. We can cover $\Sigma$ by the opens $U_0, U_1$ as above, and choose trivialisations on these opens, with the one on $U_0$ compatible with the trivialisation tr. Let $T = T_{1,0}$ be the corresponding transition function $U_0\cap U_1\to G$ for $P_G$. Sections of $H^0(\Sigma, P_{\mathfrak g}^*(p_0))$ can be represented as functions $\phi^i: U_i\to\mathfrak g^*$, with $\phi^1$ holomorphic on $U_1$, $\phi^0$ meromorphic on $U_0$ with only one simple pole at the origin, and $\phi^1 = \mathrm{Ad}^*(T)\phi^0$ on $U_0\cap U_1$.
We would like to split the sequence (3.1). Represent a one-parameter family of elements $(P_G(t), \mathrm{tr}(t), \phi(t))$ of $T^*M_G$ by $(T(t), \phi^0(t), \phi^1(t))$, with $\phi^1(t) = \mathrm{Ad}^*(T(t))\phi^0(t)$. At $t=0$, the corresponding tangent vectors are given by $(\dot v = T^{-1}\dot T, \dot\phi^0, \dot\phi^1)$, with
$$\dot\phi^1 = \mathrm{Ad}^*(T)\bigl(\mathrm{ad}^*(\dot v)(\phi^0) + \dot\phi^0\bigr).$$
Let $\langle\,,\,\rangle: H^0(\Sigma, P_{\mathfrak g}^*(p_0))\times H^1(\Sigma, P_{\mathfrak g}(-p_0))\to\mathbb C$ denote the Serre duality pairing; explicitly, it is defined by $\langle\dot\phi,\dot v\rangle = \mathrm{res}_{p_0}(\dot\phi^0\cdot\dot v)$. At a point $P_G$ of $M_G$, choose a transition function $T = T_{10}: U_0\cap U_1\to G$. Let us choose a vector space $V$ of cocycles mapping isomorphically to $H^1(\Sigma, P_{\mathfrak g}(-p_0))$. One can split (3.1) by taking for each $\dot v$ in $V\cong H^1(\Sigma, P_{\mathfrak g}(-p_0))$ the vector $(\dot v, \dot\phi_{\rm norm})$ such that the pairing of $\dot\phi_{\rm norm}$ with elements of $V$ is zero.
More generally, for any section $a$ of $P_{\mathfrak g}^*(p_0)$ over $U_0$, let $\hat a$ denote the element of $H^0(\Sigma, P_{\mathfrak g}^*(p_0))$ whose pairing with elements of $V$ is the same as that of $a$. One then has the isomorphism
$$T(T^*M_G)\to H^1(\Sigma, P_{\mathfrak g}(-p_0))\oplus H^0(\Sigma, P_{\mathfrak g}^*(p_0)), \qquad (\dot v,\dot\phi)\mapsto(\dot v, \widehat{\dot\phi}). \qquad (3.2)$$
Proposition 3.1. Under this isomorphism, the symplectic form on $T^*M_G$ becomes, at $(P_G, \mathrm{tr}, \phi)$:
$$\Omega\bigl((\dot v_1, \widehat{\dot\phi}_1), (\dot v_2, \widehat{\dot\phi}_2)\bigr) = \langle\dot v_2, \widehat{\dot\phi}_1\rangle - \langle\dot v_1, \widehat{\dot\phi}_2\rangle + \langle[\dot v_2,\dot v_1],\phi\rangle. \qquad (3.3)$$
Proof. One can parametrise $M_G$ locally by $V$. Indeed, if $T$ is a transition matrix for $P$, one has in a neighbourhood of the origin a map $V\to M_G$ obtained by associating to the cocycle $v$ the transition matrix $T\cdot\exp(v)$. This in turn defines a map $\rho: V\times V^* = T^*V\to T^*M_G$, which preserves the symplectic form. With respect to the splitting (3.2), the differential $d\rho$ at the origin is
$$d\rho: (\dot v,\dot\phi)\mapsto\Bigl(\dot v,\ \widehat{\dot\phi} + \tfrac12\,\widehat{(\mathrm{ad}^*_{\dot v}\phi)}\Bigr). \qquad (3.4)$$
Substituting in the standard expression for the symplectic form on a cotangent bundle gives (3.3).
The explicit action of the group $G$ on $T^*(M_G)$ is given by $g(T,\phi^0,\phi^1) = (\mathrm{Ad}_g(T), \mathrm{Ad}^*_{g^{-1}}(\phi^0), \phi^1)$, and the moment map for this action is $\mathrm{res}_{p_0}(\phi^0)$. From (3.3), we can compute the Hamiltonian vector fields associated to $F_\omega$:
$$(\dot v,\dot\phi) = (dF(\phi)\cdot\omega, 0). \qquad (3.5)$$
In other words, the Higgs field part stays as is, but the bundle varies. That this is possible is due to the invariance of the function: $\mathrm{ad}^*(dF)(\phi) = 0$. We can write an equivalent version of the flow by modifying the transition function by a coboundary. That is, we are allowed to modify our trivialisations of the bundle over the $U_i$, as long as on $U_0$ the trivialisation is not changed over $p_0$. Thus, if $g_i\in H^0(U_i, P_{\mathfrak g})$, with $g_0(0) = 0$, we have the equivalent version of the flow:
$$(\dot v, \dot\phi^0, \dot\phi^1) = \bigl(dF\cdot\omega + \mathrm{Ad}(T^{-1})(g_1) - g_0,\ \mathrm{ad}^*(g_0)\phi^0,\ \mathrm{ad}^*(g_1)\phi^1\bigr). \qquad (3.6)$$
Similarly, on the reduced space, one can modify the flow by a coboundary, but now with $g_0(0)$ arbitrary.
4. A Group Associated to a Root System
In this section, we will define the structure group which we will use for our Calogero–Moser systems. It will be associated to the root system $R$ acting on the Lie algebra $\mathfrak h$ of the torus $H = (\mathbb C^*)^r$. We begin however with a discussion which shows that in some sense, passing to a new group is necessary. The symplectic reduction leading to the generalized Hitchin system $(T^*M_G)_{\rm red}$ depends on a choice of a coadjoint orbit of $G$. In our case of an elliptic base curve, the dimension of $(T^*M_G)_{\rm red}$ is equal to the dimension of the coadjoint orbit. We are thus looking for a group, related to the rank $r$ root system, and admitting a $2r$-dimensional coadjoint orbit; $2r$ is the dimension of the Calogero–Moser system. When the semi-simple group is $SL_n$, the coadjoint orbit of $\mathrm{diag}(1,1,\dots,1,1-n)$ is $(2n-2)$-dimensional. A general semi-simple group $G_{ss}$ does not have a Weyl invariant $2r$-dimensional coadjoint orbit. There is however a group $G_0$, naturally associated to $G_{ss}$, which does admit $2r$-dimensional coadjoint orbits. We consider the group $\mathcal O_0(G_{ss})$ of germs at the origin of maps $\mathbb C\to G_{ss}$. Let $V$ be the subgroup of germs of the form $g(z) = h + zg_1 + z^2g_2+\dots$, $h\in H$, and let $V'$ be the subgroup of $V$ of germs of the form $g(z) = \mathrm{Id} + zh_1 + z^2g_2+\dots$, $h_1\in\mathfrak h$. Then set $G_0 = V/V'$. $G_0$ is the semi-direct product of $H$ with the direct sum of the root spaces. The coadjoint orbits of $G_0$ are analyzed in Proposition (4.14) below.
There is a natural extension $N(G_0)$ of the Weyl group $W$ by $G_0$. Simply consider germs with a leading coefficient in the normalizer $N$ of $H$ in $G_{ss}$. $N(G_0)$ acts on $\mathfrak g_0^*$ via the coadjoint action. We encounter another difficulty: for a general semi-simple group, there does not exist any $W$-invariant $2r$-dimensional coadjoint orbit in $\mathfrak g_0^*$ (i.e., one which is also an $N(G_0)$ orbit). Proposition (4.14) shows that for such an orbit to exist, we need a non-trivial $W$-invariant $H$-orbit in the direct sum of the root spaces. When the group $G$ is $SL_n$, such an orbit is obtained by intersecting the direct sum of the root spaces with the coadjoint orbit of $\mathrm{diag}(1,1,\dots,1,1-n)$. More generally, we relate the existence of $W$-invariant $H$-orbits to splittings of the short exact sequence
$$0\to H\to N\to W\to 0, \qquad (4.1)$$
where $N$ is any extension of $W$ by $H$ such that the conjugation in $N$ induces on $H$ the standard $W$-module structure. Let $V$ be an $N$ representation and $R$ a non-zero $W$-invariant $H$-orbit in $V$. Given a character $\alpha$ of $H$, denote by $N_\alpha$ and $W_\alpha$ its stabilizers in $N$ and $W$.
Lemma 4.1. (1) The stabilizer $\mathrm{Stab}(\xi)$ of every element $\xi\in R$ intersects $H$ in a fixed normal subgroup $\mathrm{Stab}_0\subset H$.
(2) Let
$$0\to\bar H\to\bar N\to W\to 0 \qquad (4.2)$$
be the quotient of (4.1) by $\mathrm{Stab}_0$. Then the stabilizer $\mathrm{Stab}_{\bar N}(\xi)$ of every $\xi\in R$ projects isomorphically onto $W$. In particular, (4.2) splits.
(3) Suppose $V$ is an irreducible representation of $N$, the $W$-invariant $H$-orbit $R$ is not the zero orbit, and $\iota_\xi: W\hookrightarrow\bar N$ is the splitting provided by $\xi\in R$. Then the $W$ representation $\iota_\xi^{-1}(V)$ of $W$ is equivalent to $\mathrm{Ind}_{W_\alpha}^W(1)$ for any character $\alpha$ of $H$ in $V$. Consequently, we obtain a characterization of $V$ as a representation of $N$: $V$ is equivalent to the pullback to $N$ of $\mathrm{Ind}_{\bar N_\alpha}^{\bar N}\tilde\alpha$, where $\tilde\alpha$ is the unique character of $\bar N_\alpha$ which restricts to the trivial character of $W_\alpha$ and to the character $\alpha$ of $H$.
Proof. (1) We have the equality $\mathrm{Stab}(n\cdot\xi) = n\cdot\mathrm{Stab}(\xi)\cdot n^{-1}$. Since $R$ is an $H$-orbit, every two stabilizers of elements in $R$ are conjugate by an element of $H$. $\mathrm{Stab}(\xi)\cap H$ is a fixed group $\mathrm{Stab}_0$ as $H$ is commutative. $\mathrm{Stab}_0$ is a normal subgroup of $N$ since $n\cdot\mathrm{Stab}_0\cdot n^{-1} = n\cdot\mathrm{Stab}(\xi)\cdot n^{-1}\cap H = \mathrm{Stab}(n\cdot\xi)\cap H = \mathrm{Stab}_0$.
(2) Let $n_w$ be an element of $\bar N$ mapping to $w\in W$. Choose an element $\xi\in R$. Since $R$ is also an $\bar N$-orbit, there exists an element $a\in\bar H$ such that $n_w\cdot\xi = a\cdot\xi$. Thus, $\mathrm{Stab}(\xi)$ maps onto $W$. It follows that the stabilizer $\mathrm{Stab}_{\bar N}(\xi)$ in $\bar N$ surjects onto $W$. The homomorphism $\mathrm{Stab}_{\bar N}(\xi)\to W$ is injective because $\mathrm{Stab}_{\bar N}(\xi)\cap\bar H = (1)$.
(3) Let $\alpha$ be any character of $H$ with positive multiplicity in $V$, $V_\alpha$ the corresponding subspace, and $\xi$ an element of $R$. Since $V$ is irreducible, $\xi_\alpha\neq 0$. Here $\xi_\alpha$ is the component of $\xi$ in $V_\alpha$ with respect to the decomposition $V = \oplus_\alpha V_\alpha$. Since $\iota_\xi(W)$ is the stabilizer of $\xi$, the line spanned by $\xi_\alpha$ in $V_\alpha$ is the trivial character of $W_\alpha$. It follows that the direct sum $V'$ of all the translates of $\mathrm{span}\{\xi_\alpha\}$ by $\iota_\xi(W)$ is a sub-representation of $V$. The irreducibility of $V$ implies that $V_\alpha = \mathrm{span}\{\xi_\alpha\}$ and $V' = V$. It follows that, as an $\bar N$-module, $V$ is the induced representation $\mathrm{Ind}_{\bar N_\alpha}^{\bar N}V_\alpha$. The equivalence $V\cong\mathrm{Ind}_{W_\alpha}^W(1)$ of $W$-modules follows. Note that $V$ need not be irreducible as a $W$-representation.
The lemma specifies two obstructions to the existence of a non-trivial $W$-invariant $H$-orbit in $\mathfrak g^*$ for a simple Lie algebra. The first obstruction is the extension class of (4.1). In particular, considering the (co)adjoint representation, the list of simple groups of adjoint type for which the exact sequence (4.1) for the normaliser does not split is: $Sp(n)$, $F_4$, $E_6$, $E_7$, $E_8$ (mod centers). The second obstruction, condition (3) in the lemma, rules out the existence of a $W$-invariant $H$-orbit in $\mathfrak g^*$ for Lie algebras of type $D_n$ (and in the long root representation of type $B_n$) even though (4.1) splits.
Example. The exact sequence (4.1) splits for $SO(2n)$ and $W$ embeds in $N$ as a subgroup of the group of permutation matrices. Identify the Lie algebra
$$\mathfrak{so}(2n) = \Bigl\{\begin{pmatrix} m & n\\ p & q\end{pmatrix} : q = -m^t,\ n^t = -n,\ \text{and}\ p^t = -p\Bigr\},$$
consider the Cartan $\mathrm{span}\{e_{i,i} - e_{n+i,n+i}\}$, and let $\alpha$ be the root $\epsilon_i - \epsilon_j$ corresponding to $\mathfrak g_\alpha = \mathrm{span}\{e_{i,j} - e_{n+j,n+i}\}$, $i\neq j$. The matrix of the permutation $\sigma := (i, n+j)(j, n+i)$ belongs to the stabilizer $W_\alpha$ of the point $\alpha$ in the root lattice. However, $\sigma$ acts by multiplication by $-1$ on $\mathfrak g_\alpha$. In particular, $\mathfrak g_\alpha$ is not the trivial character of $W_\alpha$ and condition (3) of the lemma is not satisfied.
In order to circumvent the first obstruction, instead of the normaliser $N(H)$ of the torus, we will consider the semi-direct product $N$ of the torus and the Weyl group:
$$0\to H\to N\to W\to 0. \qquad (4.3)$$
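The $SO(2n)$ example can be checked by direct matrix computation. Here is a minimal sketch for the smallest case $n = 2$, $i = 1$, $j = 2$ (0-based in the code), using plain Python lists. The choice of $n, i, j$ and all helper names are ours. The sketch also assumes, as the block description of $\mathfrak{so}(2n)$ suggests, that the invariant form has Gram matrix $J = \begin{pmatrix}0 & I\\ I & 0\end{pmatrix}$. It checks that $\sigma$ preserves $J$, fixes $\alpha = \epsilon_i - \epsilon_j$ on the Cartan, and acts by $-1$ on $\mathfrak g_\alpha$.

```python
def zeros(size):
    return [[0] * size for _ in range(size)]


def unit(a, b, size):
    """Elementary matrix e_{a,b}."""
    E = zeros(size)
    E[a][b] = 1
    return E


def add(X, Y, c=1):
    """Return X + c*Y."""
    return [[x + c * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matmul(X, Y):
    return [[sum(X[r][k] * Y[k][s] for k in range(len(Y))) for s in range(len(Y[0]))]
            for r in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def perm_matrix(perm):
    """Permutation matrix P with P e_k = e_{perm[k]}."""
    P = zeros(len(perm))
    for k, pk in enumerate(perm):
        P[pk][k] = 1
    return P


n, i, j = 2, 0, 1                  # so(4); i, j are 0-based indices with i != j
size = 2 * n
J = zeros(size)                    # assumed Gram matrix [[0, I], [I, 0]]
for k in range(n):
    J[k][n + k] = J[n + k][k] = 1

perm = list(range(size))           # sigma = (i, n+j)(j, n+i)
perm[i], perm[n + j] = n + j, i
perm[j], perm[n + i] = n + i, j
P = perm_matrix(perm)


def conj(X):
    """Ad(sigma) X = P X P^T (P is orthogonal)."""
    return matmul(matmul(P, X), transpose(P))


E_alpha = add(unit(i, j, size), unit(n + j, n + i, size), -1)   # spans g_alpha
cartan = [add(unit(k, k, size), unit(n + k, n + k, size), -1) for k in range(n)]


def alpha(X):
    """The root eps_i - eps_j evaluated on a diagonal matrix."""
    return X[i][i] - X[j][j]


if __name__ == "__main__":
    print(add(matmul(transpose(E_alpha), J), matmul(J, E_alpha)) == zeros(size))
    print(matmul(matmul(transpose(P), J), P) == J)
    print(all(alpha(conj(h)) == alpha(h) for h in cartan))
    print(conj(E_alpha) == add(zeros(size), E_alpha, -1))
```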
We now define our structure group. For any set of weights $w$ which is invariant under the Weyl group, part (3) of Lemma 4.1 determines a representation of $N$ on the associated sum of weight spaces $V = \oplus\mathbb C_w$. Indeed, if we choose a basis element for each weight space $\mathbb C_w$, the Weyl group acts simply by permuting these basis elements, while the torus acts in the natural way on each weight space. This holds in particular for the roots $\alpha: \mathfrak h\to\mathbb C$. We define $G$ to be the semi-direct product
$$\oplus_{\alpha=1}^n\mathbb C_\alpha\to G\to N. \qquad (4.4)$$
The connected component of the identity is the group $G_0$ discussed above. $G_0$ is the semi-direct product
$$\oplus_{\alpha=1}^n\mathbb C_\alpha\to G_0\to H. \qquad (4.5)$$
Given any element of the Lie algebra $\mathfrak g$, we can decompose it into its torus and root space components; write this decomposition as $\xi = \xi_{\mathfrak h} + \xi_r$. Similarly, we can decompose an element $a$ of $\mathfrak g^*$ as $a_{\mathfrak h} + a_r$. The choice of the group $G$ is motivated by the following:
Proposition 4.2. The $G$-$\mathrm{Ad}^*$-invariant functions on $\mathfrak g^*$ only depend on the root space components, and correspond to the $N$-invariant functions on $\oplus_\alpha\mathbb C_\alpha$. The generic coadjoint orbit is $2r$-dimensional, where $r = \dim(N)$, and is of the form
$$\bigl(N\text{-orbit in }\oplus_\alpha\mathbb C_\alpha\bigr)\times\mathfrak h^*.$$
Moreover, $\mathfrak g^*$ has a $2r$-dimensional connected ($W$-invariant) coadjoint orbit.
Proposition (4.2) follows from Proposition (4.3). The rest of this section is dedicated to the proof of these two propositions. Let us fix a basis element for each root space $\mathbb C_\alpha$, in a $W$-invariant way. The components of a vector $C\in\oplus_{\alpha=1}^n\mathbb C_\alpha$ are then well defined, and naturally indexed by the roots themselves: $C = (C_\alpha)$. Let $\tilde\alpha: H\to\mathbb C^*$ denote the character corresponding to $\alpha$, so that $H$ acts on $\mathbb C_\alpha$ by $(h, v)\mapsto\tilde\alpha(h)\cdot v$. Let $C^t\cdot D$ denote the natural scalar product of two vectors in $\oplus_{\alpha=1}^n\mathbb C_\alpha$, and let $C\circ D$ denote the componentwise product: $(C\circ D)_\alpha = C_\alpha D_\alpha$. We denote by $I$ the permutation matrix acting on the root spaces which permutes the $\alpha$th and the $-\alpha$th root spaces. Finally, we write the action of $\mathfrak h$ on $\oplus_{\alpha=1}^n\mathbb C_\alpha$ as a matrix:
$$\sum_i\tau_iA_{i,\alpha} = \alpha(\tau),$$
so that the action of $\tau$ on $C$ can be written as $(\tau^t\cdot A)\circ C$. As a manifold, $G_0 = (\oplus_{\alpha=1}^n\mathbb C_\alpha)\times H$. The product is given by
$$(C, h)(C', h') = \bigl(C + (\exp(\log(h)\cdot A))\circ C',\ hh'\bigr). \qquad (4.6)$$
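One quick way to see that (4.6) is a genuine semi-direct product law is to check associativity and inverses exactly. The sketch below does this for the root system $A_2$ in hypothetical simple-root coordinates. It reads $\exp(\log(h)\cdot A)$ as the vector of characters $\tilde\alpha(h) = \prod_i h_i^{A_{i,\alpha}}$. The coordinates, the test elements and the function names are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical coordinates for A2: each root alpha is an integer vector a with
# alpha(tau) = a . tau, so for h = exp(tau) the character is prod_i h_i ** a_i.
ROOTS = [(1, 0), (0, 1), (1, 1), (-1, 0), (0, -1), (-1, -1)]


def character(h, a):
    """alpha~(h) = prod_i h_i ** a_i, computed exactly."""
    value = Fraction(1)
    for h_i, a_i in zip(h, a):
        value *= Fraction(h_i) ** a_i
    return value


def mul(g1, g2):
    """Product (4.6): (C, h)(C', h') = (C + alpha~(h) o C', h h')."""
    (C1, h1), (C2, h2) = g1, g2
    C = tuple(c1 + character(h1, a) * c2 for c1, c2, a in zip(C1, C2, ROOTS))
    h = tuple(x * y for x, y in zip(h1, h2))
    return C, h


def inverse(g):
    """(C, h)^{-1} = (-alpha~(h^{-1}) o C, h^{-1})."""
    C, h = g
    h_inv = tuple(1 / Fraction(x) for x in h)
    return tuple(-character(h_inv, a) * c for c, a in zip(C, ROOTS)), h_inv


IDENTITY = ((0,) * len(ROOTS), (Fraction(1), Fraction(1)))

if __name__ == "__main__":
    a = ((1, 2, 3, 4, 5, 6), (2, 3))
    b = ((0, -1, 2, 0, 1, 1), (Fraction(1, 2), 5))
    c = ((3, 3, 0, -2, 1, 7), (-1, Fraction(2, 3)))
    print(mul(mul(a, b), c) == mul(a, mul(b, c)), mul(a, inverse(a)) == IDENTITY)
```

Associativity holds because $\tilde\alpha$ is multiplicative, so both bracketings give $C_1 + \tilde\alpha(h_1)\circ C_2 + \tilde\alpha(h_1h_2)\circ C_3$.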
The corresponding Lie bracket on $(\oplus_{\alpha=1}^n\mathbb C_\alpha)\oplus\mathfrak h$ is
$$[(A,\tau), (A',\tau')] = \bigl((\tau^t\cdot A)\circ A' - (\tau'^t\cdot A)\circ A,\ 0\bigr). \qquad (4.7)$$
There is a pairing on the Lie algebra, identifying $\mathfrak g$ with $\mathfrak g^*$:
$$\langle(A,\tau), (A',\tau')\rangle = A^t\cdot I\cdot A' + \tau^t\cdot\tau'. \qquad (4.8)$$
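We will use this pairing to describe the coadjoint action: one then has
$$\begin{aligned}\langle[(B,\sigma),(A,\tau)], (A',\tau')\rangle &= \bigl((\sigma^t\cdot A)\circ A - (\tau^t\cdot A)\circ B\bigr)^t\cdot I\cdot A'\\ &= -A^t\cdot I\cdot\bigl((\sigma^t\cdot A)\circ A'\bigr) - \tau^t\cdot A\cdot\bigl(B\circ(I\cdot A')\bigr),\end{aligned} \qquad (4.9)$$
remembering that $A_{i,-\alpha} = -A_{i,\alpha}$. Therefore
$$\mathrm{ad}^*_{(B,\sigma)}(A',\tau') = \bigl((-\sigma^t\cdot A)\circ A',\ -A\cdot(B\circ(I\cdot A'))\bigr). \qquad (4.10)$$
Similarly, the coadjoint action of an element of the group can be written as
$$\mathrm{Ad}^*_{(D,\exp(\sigma))}(A',\tau') = \bigl(\exp(-\sigma^t\cdot A)\circ A',\ \tau' - A\cdot(D\circ(I\cdot A'))\bigr). \qquad (4.11)$$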
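The identity (4.9), and hence the formula (4.10) for $\mathrm{ad}^*$, can be tested numerically once a root system is fixed. The sketch below assumes the $A_2$ root system in hypothetical integer coordinates, with $A_{i,\alpha}$ the $i$th coordinate of $\alpha$. It uses integer test vectors, so the comparison is exact. It checks that the pairing (4.8) of the bracket (4.7) agrees with the pairing against (4.10). This agreement relies on $A_{i,-\alpha} = -A_{i,\alpha}$, which the chosen coordinates satisfy.

```python
import random

# Hypothetical A2 data: roots as integer vectors, A[i][alpha] = ROOTS[alpha][i].
ROOTS = [(1, 0), (0, 1), (1, 1), (-1, 0), (0, -1), (-1, -1)]
NEG = [ROOTS.index(tuple(-c for c in a)) for a in ROOTS]  # the permutation I
RANK = 2


def ev(a, tau):
    """alpha(tau) = sum_i tau_i A_{i,alpha} (also used for tau^t tau')."""
    return sum(a_i * t_i for a_i, t_i in zip(a, tau))


def pairing(X, Y):
    """(4.8): <(A, tau), (A', tau')> = A^t I A' + tau^t tau'."""
    (A, tau), (Ap, taup) = X, Y
    return sum(A[k] * Ap[NEG[k]] for k in range(len(ROOTS))) + ev(tau, taup)


def bracket(X, Y):
    """(4.7): [(A, tau), (A', tau')] = ((tau^t A) o A' - (tau'^t A) o A, 0)."""
    (A, tau), (Ap, taup) = X, Y
    root = [ev(a, tau) * Ap[k] - ev(a, taup) * A[k] for k, a in enumerate(ROOTS)]
    return root, [0] * RANK


def ad_star(X, Y):
    """(4.10): ad*_{(B, sigma)}(A', tau') = ((-sigma^t A) o A', -A (B o I A'))."""
    (B, sigma), (Ap, _taup) = X, Y
    root = [-ev(a, sigma) * Ap[k] for k, a in enumerate(ROOTS)]
    torus = [-sum(a[i] * B[k] * Ap[NEG[k]] for k, a in enumerate(ROOTS))
             for i in range(RANK)]
    return root, torus


def random_element(rng):
    return ([rng.randint(-5, 5) for _ in ROOTS], [rng.randint(-5, 5) for _ in range(RANK)])


def check_4_9(trials=200, seed=0):
    """Check <[(B,s),(A,t)], (A',t')> == <(A,t), ad*_{(B,s)}(A',t')> on random data."""
    rng = random.Random(seed)
    for _ in range(trials):
        Bs, At, Apt = (random_element(rng) for _ in range(3))
        if pairing(bracket(Bs, At), Apt) != pairing(At, ad_star(Bs, Apt)):
            return False
    return True


if __name__ == "__main__":
    print(check_4_9())
```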
From (4.11), one has:
Proposition 4.3. The $G_0$-$\mathrm{Ad}^*$-invariant functions on $\mathfrak g^*$ only depend on the root space components, and correspond to the $H$-invariant functions on $\oplus_\alpha\mathbb C_\alpha$. The generic coadjoint orbit is $2r$-dimensional, where $r = \dim(H)$, and is of the form $(H\text{-orbit in }\oplus_\alpha\mathbb C_\alpha)\times\mathfrak h^*$.
The pairing (4.8) is part of a more general family of invariant inner products on $\mathfrak g$:
Lemma 4.4. Let $D$ be a diagonal matrix such that $D_{\alpha,\alpha} = D_{\alpha',\alpha'}$ if $\alpha, \alpha'$ lie in the same Weyl group orbit, and let $\delta$ be a constant. The $N$-invariant pairings on $\mathfrak g$ are given by
$$\langle(A,\tau), (A',\tau')\rangle_{D,\delta} = A^t\cdot D\cdot I\cdot A' + \delta\,\tau^t\cdot\tau'. \qquad (4.12)$$
When there is a single Weyl orbit of roots, there is then a two-parameter family of pairings; when there are two orbits, there is a three-parameter family. Proof. The invariance under $H$ forces the $\alpha$th root space to be paired only with the $-\alpha$th, and $\mathfrak h$ to be paired only with itself. Further invariance under the Weyl group reduces one to a single choice up to scale for $\mathfrak h$ and for each Weyl orbit of roots.
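The parameter count in Lemma 4.4 is one scale $\delta$ for $\mathfrak h$ plus one constant per Weyl orbit of roots. It can be computed mechanically by generating the orbits under reflections. The sketch below makes two assumptions. Roots are given as integer vectors in an orthonormal basis, and the Weyl group is generated by the reflections $s_\beta(v) = v - \frac{2(v,\beta)}{(\beta,\beta)}\beta$. The example root systems $A_2$ and $B_2$ are our choices.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reflect(v, beta):
    """Weyl reflection s_beta(v) = v - 2(v, beta)/(beta, beta) * beta."""
    num, den = 2 * dot(v, beta), dot(beta, beta)
    assert num % den == 0, "Cartan integers of a root system are integers"
    c = num // den
    return tuple(v_i - c * b_i for v_i, b_i in zip(v, beta))


def weyl_orbits(roots):
    """Partition the roots into orbits of the group generated by all s_beta."""
    roots = set(roots)
    remaining, orbits = set(roots), []
    while remaining:
        start = remaining.pop()
        orbit, frontier = {start}, [start]
        while frontier:
            v = frontier.pop()
            for beta in roots:
                w = reflect(v, beta)
                if w not in orbit:
                    orbit.add(w)
                    frontier.append(w)
        remaining -= orbit
        orbits.append(orbit)
    return orbits


def num_pairing_parameters(roots):
    """Lemma 4.4: one scale delta for h plus one constant per Weyl orbit of roots."""
    return len(weyl_orbits(roots)) + 1


A2 = [(1, -1, 0), (-1, 1, 0), (1, 0, -1), (-1, 0, 1), (0, 1, -1), (0, -1, 1)]
B2 = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (-1, -1), (1, -1), (-1, 1)]

if __name__ == "__main__":
    print(num_pairing_parameters(A2), num_pairing_parameters(B2))
```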
5. Hitchin Systems for G
We now turn to studying (modified) Hitchin systems for our group $G$, over an elliptic curve $\Sigma$. We begin with $M_G = \{G\text{-bundles of degree zero, trivialized at } p_0\}$, then take the cotangent bundle of this space. We then reduce by the action of the group $G$, at a $W$-invariant element of $\mathfrak g^*$; this element must then lie in $\mathfrak g_r^*$. We will see that this is essentially equivalent to reducing by the action of the subgroup $\oplus_\alpha\mathbb C_\alpha$. The Calogero–Moser Hamiltonians are then expressed naturally in terms of the scalar product of (4.8) on the reduced space.
5.1. G-bundles on Σ. We begin by giving an explicit description of the moduli of framed $G$-bundles of degree 0 on $\Sigma$. We first note that under the projection of $G$ to $W$, any $G$-bundle defines a $W$-bundle. We will consider only the component $M_G^0$ of moduli corresponding to trivial $W$-bundles, so that we can represent our bundles as $G_0$-bundles. The subspace of $G$-bundles which can be represented as $G_0$-bundles is a quotient of the moduli of $G_0$-bundles: one must quotient out by the action of $W$, since different $G_0$-bundles can be the same as $G$-bundles.
$$M_G^0 = M_{G_0}/W. \qquad (5.1)$$
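Let $\overline M_{G_0}$ be the moduli of $G_0$-bundles (without framing). To analyse the moduli $\overline M_{G_0}$, we use the fact that the group $G_0$ maps to $H$, and so one has maps
$$F: \overline M_{G_0}\to M_H = \mathrm{Pic}^0(\Sigma)^r = \Sigma^r, \qquad (5.2)$$
$$F_W: \overline M_{G_0}/W\to M_H/W = \Sigma^r/W. \qquad (5.3)$$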
By a theorem of Looijenga [Lo], the space $\Sigma^r/W$ is a weighted projective space. The fiber of (5.2) at $\chi\in\Sigma^r$ is $\oplus_\alpha H^1(\Sigma, L_{\tilde\alpha(\chi)})$, where $L_{\tilde\alpha(\chi)}$ is the line bundle associated to $\tilde\alpha(\chi)$. This can be seen by writing out a cocycle explicitly in the semi-direct product. Each $H^1(\Sigma, L_{\tilde\alpha(\chi)})$ is isomorphic to $\mathbb C$ if $L_{\tilde\alpha(\chi)}$ is trivial and is $(0)$ otherwise. Consequently, one has an open set $\overline M^{\circ}_{G_0}\subset\overline M_{G_0}$ isomorphic to the open set of $\Sigma^r/W$ corresponding to $H$-bundles for which none of the $L_{\tilde\alpha(T_h)}$ are trivial. Putting the framings back in, one has that the moduli of framed $H$-bundles is the same as the moduli of unframed $H$-bundles, as the automorphisms act transitively on framings. Consequently, one has
$$F: M_{G_0}\to M_H = \mathrm{Pic}^0(\Sigma)^r = \Sigma^r, \qquad (5.4)$$
$$F_W: M_{G_0}/W\to M_H/W = \Sigma^r/W. \qquad (5.5)$$
This time, the fibre is $\oplus_\alpha H^1(\Sigma, L_{\tilde\alpha(\chi)}(-p_0))$. Each $H^1(\Sigma, L_{\tilde\alpha(\chi)}(-p_0))$ is isomorphic to $\mathbb C$.
Definition. We say that a $G$-bundle $P_G$ is special if $L_{\tilde\alpha(T_h)}$ is trivial for some root $\alpha$.
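Both dimension counts in this subsection are instances of Riemann–Roch on a curve of genus one, where it reads $h^0(L) - h^1(L) = \deg L$. The line bundles $L_{\tilde\alpha(\chi)}$ have degree 0 because $\chi\in\mathrm{Pic}^0(\Sigma)^r$. The short derivation below spells this out.

```latex
% Riemann--Roch on the genus-one curve \Sigma: h^0(L) - h^1(L) = \deg L.
\begin{aligned}
\deg L_{\tilde\alpha(\chi)} = 0:&\quad
  h^1(\Sigma, L_{\tilde\alpha(\chi)}) = h^0(\Sigma, L_{\tilde\alpha(\chi)})
  = \begin{cases} 1 & L_{\tilde\alpha(\chi)}\ \text{trivial},\\ 0 & \text{otherwise};\end{cases}\\
\deg L_{\tilde\alpha(\chi)}(-p_0) = -1:&\quad
  h^0 = 0 \ \Longrightarrow\ h^1(\Sigma, L_{\tilde\alpha(\chi)}(-p_0)) = 1 .
\end{aligned}
```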
One has an open set $M^{\circ}_G\subset M_G$
of framed non-special $G$-bundles. We will take a reduction of $M^{\circ}_G$, which will be the space over which the Calogero–Moser systems are defined. Indeed, we shall see that the reduction does not extend to the locus where one of the $L_{\tilde\alpha(T_h)}$ is trivial. Explicitly, cover the elliptic curve by $U_0$ = disk around $p_0$ and $U_1 = \Sigma - p_0$. The torus part $T_h$ of the transition function $T$ is then a function $T_h: U_0\cap U_1\to H$. We can choose these functions to be of the form
$$T_h = \exp(x\zeta(z)), \qquad x\in\mathfrak h. \qquad (5.6)$$
The root space part $T_r$ can be represented by a vector $M$ of cocycles $M_\alpha$ representing elements of $H^1(\Sigma, L_{\tilde\alpha(T_h)}(-p_0))$. For $L_{\tilde\alpha(T_h)}$ non-trivial, these cocycles can be taken to be constant functions on $U_0\cap U_1$. When $L_{\tilde\alpha(T_h)}$ is trivial, the constant functions correspond to trivial classes, and one must choose a function with a simple pole at $p_0$ as generator, for example $\zeta(z)$. One has as cotangent space to $M^{\circ}_G$ at a bundle $P_G$ the set of Higgs fields $\phi$ in $H^0(\Sigma, P_{\mathfrak g}^*(p_0))$ (fields in the associated coadjoint bundle with poles at $p_0$). Splitting $\phi$ into a root space component and a torus component,
$$\phi = \phi_{\mathfrak h} + \phi_r, \qquad (5.7)$$
one has that the components $\phi_\alpha$ of $\phi_r$ have poles at the origin only when the line bundle is not trivial. Explicitly, take a bundle with transition functions $(T_r, T_h = \exp(x\zeta(z)))$. One represents $(\phi_r, \phi_{\mathfrak h})$ in the $U_0$-trivialisation by $(\phi_r^0, \phi_{\mathfrak h}^0)$, and in the $U_1$-trivialisation by $(\phi_r^1, \phi_{\mathfrak h}^1)$, with
$$(\phi_r^1, \phi_{\mathfrak h}^1) = \bigl(\exp(-x^t\cdot A\,\zeta(z))\circ\phi_r^0,\ \phi_{\mathfrak h}^0 - A\cdot T_r\circ(I\cdot\phi_r^0)\bigr).$$
Here $\phi_r^0$ has simple poles at the origin; its components $\phi_\alpha^0$ are simply multiples of the functions $\rho^0(\alpha(x), z)$ of (2.4). Similarly, the components $\phi_\alpha^1$ of $\phi_r^1$ are multiples of the functions $\rho^1(\alpha(x), z)$.
5.2. Reduction. There is, as in Sect. 3, an action of $G$ on $T^*M_G$ by changing the trivialisation at $p_0$. The moment map for this action is, as we saw in Sect. 3, simply the residue of the Higgs field $\phi$ at $p_0$, expressed in the trivialization. We want to reduce at an element $C$ of $\mathfrak g^*$ which is $W$-invariant. Being $W$-invariant, $C$ must lie in $\mathfrak g_r^*$. $W$-invariance implies that reduction of $T^*M_G$ by $G$ at $C$ is equivalent to the reduction of $T^*M_{G_0}$ by $G_0$ at $C$, then quotienting by $W$. The set of root vectors $\alpha$ splits up into $W$-orbits according to their lengths $|\alpha|$. We choose constants $c_{|\alpha|}$ for each length, and set $C = (c_{|\alpha_1|}, c_{|\alpha_2|}, \dots, c_{|\alpha_n|})$. Note that the coadjoint orbit $G_0\cdot C$ is also a $G$-orbit, the connected orbit of Proposition (4.7). If the constants $c_{|\alpha_i|}$ are non-zero, the coadjoint orbit of the element $C$ is of the form $(H\text{-orbit in }\mathfrak g_r^*)\times\mathfrak h^*$. Also, the stabiliser of the element $C$ lies in the root space part of the group. From these facts and from the expressions for the actions, it follows that taking the symplectic quotient by $G_0$ or the quotient by its subgroup $\oplus_\alpha\mathbb C_\alpha$ gives exactly the same result.
J. C. Hurtubise, E. Markman
The action of an element $V \in \oplus_\alpha \mathbb C_\alpha$ on $(T, \phi^0) \in T^*\mathcal M_G$ is given explicitly by
$$(T, \phi^0) = \big((T_r, T_h), (\phi_r^0, \phi_h^0)\big) \mapsto \big((T_r + V, T_h), (\phi_r^0,\ \phi_h^0 - A \cdot V \circ (I \cdot \phi_r^0))\big). \tag{5.8}$$
The moment map for this action is $\operatorname{res}(\phi_r^0)$. Reducing at $C$, we fix the residues of $\phi_r^0$ to be $C$, and quotient by the group. Referring to the explicit form of the action, this means that we can normalise to $T_r = 0$, thus reducing the bundle to the torus, over the locus $\mathcal M_G$. Our reduced space $(T^*\mathcal M_G)_{red}$ can thus be thought of as a subspace of the unreduced one. This subspace $(T^*\mathcal M_G)_{red}$ of $T^*\mathcal M_G$ is characterised by:
$$T_r = 0, \qquad \operatorname{res}(\phi_r^0) = \text{a } W\text{-invariant constant } C. \tag{5.9}$$
Note that we still have the torus part of the framing in this description of our reduced space. Explicitly, the elements of $(T^*\mathcal M_G)_{red}$ are then $H$-bundles with transition functions
$$T = (T_r, T_h) = (0, \exp(x\zeta(z))),$$
along with Higgs fields whose root space components are, in the $U_0$-trivialisation, $\phi_\alpha^0 = c_{|\alpha|}\,\rho^0(\alpha(x), z)$, and whose torus components are constants $\phi_h^0 = \phi_h^1 = p$. By (3.3) the functions $x \in \mathfrak h$, $p \in \mathfrak h^*$ provide canonical coordinates on $(T^*\mathcal M_G)_{red}$; one must remember that we are restricting to the set $\alpha(x) \neq 0$, $\alpha \in R$, and quotienting out by the action of the affine Weyl group on $\mathfrak h \times \mathfrak h^*$.

Remarks. (1) In the above discussion we have excluded a Higgs pair $(P_G, \phi)$ if the bundle $P_G$ is special (see Sect. 5.1). This exclusion is automatically imposed by the reduction along the coadjoint orbit $G \cdot C$. The coadjoint bundle $P_{\mathfrak g^*}$ fits in the short exact sequence
$$0 \to \mathfrak h^* \to P_{\mathfrak g^*} \to \big(\oplus_{\tilde\alpha} L_{\tilde\alpha}\big)^* \to 0.$$
The moment map sends $(P_G, \phi)$ into $G \cdot C$ if and only if the residue of $\phi$ projects into the $H$-orbit of $C$ in the fiber of $(\oplus_{\tilde\alpha} L_{\tilde\alpha})^*(p_0)$ at $p_0$. The triviality of one of the $L_{\tilde\alpha}$ rules out the existence of such sections in $H^0(\Sigma, P_{\mathfrak g^*}(p_0))$.
(2) All the non-special principal $G$-bundles considered have a canonical reduction to a principal $N$-bundle (equivalently, a $W$-orbit of reductions to an $H$-bundle). This follows from the fact that the global sections of $P_{\mathfrak g}$ generate a sub-bundle of commutative subalgebras isomorphic to $\mathfrak h$.
Calogero–Moser Systems and Hitchin Systems
5.3. The Calogero–Moser Hamiltonian. Let $\omega$ be the class in $H^1(\Sigma, \mathcal O(-2p_0))$ with representative, with respect to the cover $U_0, U_1$,
$$\omega = \zeta(z). \tag{5.10}$$
Our Hamiltonian on $(T^*\mathcal M_G)_{red}$ will then be the $W$-invariant function on the reduced space
$$CM = \operatorname{res}_{p_0}\big(\omega\,\langle\phi, \phi\rangle\big). \tag{5.11}$$
Note that the bilinear form of Lemma 4.4 gives rise to a canonical bilinear form on $P_{\mathfrak g}$, because all the non-special principal $G$-bundles we consider have a canonical reduction to $N$. The bilinear form sends a Higgs pair $(P_G, \phi)$ to the element $\langle\phi, \phi\rangle$ of $H^0(\Sigma, K^{\otimes 2}(2p_0))$; this space is two-dimensional. If the pair $(P_G, \phi)$ belongs to $(T^*\mathcal M_G)_{red}$, then the residue of $\phi$ belongs to our coadjoint orbit $G \cdot C$. Hence the quadratic residue of $\langle\phi, \phi\rangle$ is fixed, and $\langle\phi, \phi\rangle$ lies in a marked affine line $I$ in $H^0(\Sigma, K^{\otimes 2}(2p_0))$. We see that the Calogero–Moser Hamiltonian is determined canonically up to a choice of an affine linear isomorphism $I \cong \mathbb C$. The restriction of the linear functional (5.11) on $H^0(\Sigma, K^{\otimes 2}(2p_0))$ provides such an isomorphism.

One can split $CM$ into a sum $CM_r + CM_h$ of a root space piece $CM_r = \{\omega, \langle\phi_r, \phi_r\rangle\} = \sum_\alpha \{\omega, \phi_\alpha^0\phi_{-\alpha}^0\}$ and a torus piece $CM_h = \{\omega, \langle\phi_h, \phi_h\rangle\}$. The relation (2.8) tells us that $\phi_\alpha^0\phi_{-\alpha}^0 = c_{|\alpha|}^2\big(\wp(z) - \wp(\alpha(x))\big)$, and so the Hamiltonian is
$$CM = p \cdot p - \sum_\alpha c_{|\alpha|}^2\,\wp(\alpha(x)), \tag{5.12}$$
which is indeed the Calogero–Moser Hamiltonian, setting $m_{|\alpha|} = -c_{|\alpha|}^2$.

Next we provide an explicit formula for the vector field of the Calogero–Moser Hamiltonian. The formula (5.18) will be needed in Sect. 6. The function $CM_r$ is $\mathrm{ad}^*$-invariant, and its differential on the $H^0(\Sigma, P_{\mathfrak g^*}(p_0))$-component of the tangent space to $T^*\mathcal M_G$ is then
$$dCM_r = (\omega\phi)_r^0, \tag{5.13}$$
where one thinks of $dCM_r$ as an element of $H^1(\Sigma, P_{\mathfrak g}(-p_0))$ acting on $H^0(\Sigma, P_{\mathfrak g^*}(p_0))$. With respect to the splitting (3.2), the action of $dCM_r$ on the $H^1(\Sigma, P_{\mathfrak g}(-p_0))$-component of the tangent space is trivial. By the considerations of Sect. 2, the Hamiltonian vector field in $T^*\mathcal M_G$ of $CM_r$ at $(T, \phi^0, \phi^1)$, where $T$ is the (torus) transition matrix and $\phi^1 = \mathrm{Ad}^*(T)(\phi^0)$ is the Higgs field, is given by
$$T^{-1}\dot T = \omega\phi_r^0, \qquad \dot\phi^i = 0. \tag{5.14}$$
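As a numerical illustration of the Hamiltonian (5.12) and of its explicit flow (5.20) derived below, here is a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not a choice made in the text: the curve is taken to be $\mathbb C/(\mathbb Z + i\mathbb Z)$, $\wp$ is approximated by a truncated symmetric lattice sum, the root system is $A_{n-1}$ (roots $e_i - e_j$, so $\alpha(x) = x_i - x_j$) with a single coupling $m = -c^2$, and the flow is integrated with a velocity-Verlet (leapfrog) scheme. The sketch checks two consequences of the text: $CM$ is invariant under the Weyl group (here, simultaneous permutations of $x$ and $p$), and it is conserved along $\dot x_i = p_i$, $\dot p_i = \frac12\sum_\alpha c_{|\alpha|}^2\,\partial_{x_i}\wp(\alpha(x))$, which is the Hamiltonian flow of $\tfrac12 CM$.

```python
import numpy as np

# Illustrative assumptions (not fixed by the text): the curve is C/(Z + iZ),
# the root system is A_{n-1} (roots e_i - e_j), and all roots share one
# coupling m = -c^2, following the identification m_{|alpha|} = -c_{|alpha|}^2.
N_CUT = 30  # lattice points a + b*i with |a|, |b| <= N_CUT enter the sums
_a, _b = np.meshgrid(np.arange(-N_CUT, N_CUT + 1), np.arange(-N_CUT, N_CUT + 1))
LATTICE = (_a + 1j * _b).ravel()
LATTICE = LATTICE[LATTICE != 0]  # nonzero lattice points only


def wp(z):
    """Truncated, symmetric lattice sum approximating Weierstrass wp(z)."""
    z = np.asarray(z, dtype=complex)
    tail = np.sum(1.0 / (z[..., None] - LATTICE) ** 2 - 1.0 / LATTICE**2, axis=-1)
    return (1.0 / z**2 + tail).real  # real on the real axis for this lattice


def wp_prime(z):
    """Exact derivative of the truncated sum used in wp()."""
    z = np.asarray(z, dtype=complex)
    tail = np.sum(1.0 / (z[..., None] - LATTICE) ** 3, axis=-1)
    return (-2.0 / z**3 - 2.0 * tail).real


def cm_hamiltonian(x, p, m):
    """(5.12) for A_{n-1}: CM = p.p + m * sum_{i != j} wp(x_i - x_j)."""
    i, j = np.nonzero(~np.eye(len(x), dtype=bool))  # all roots alpha = e_i - e_j
    return p @ p + m * np.sum(wp(x[i] - x[j]))


def force(x, m):
    """Right-hand side of (5.20): p_i' = -m * sum_{j != i} wp'(x_i - x_j)."""
    mask = ~np.eye(len(x), dtype=bool)
    dwp = np.zeros((len(x), len(x)))
    dwp[mask] = wp_prime((x[:, None] - x[None, :])[mask])
    return -m * dwp.sum(axis=1)


def leapfrog(x, p, m, dt, steps):
    """Velocity-Verlet steps for x_i' = p_i, p_i' = force(x)_i."""
    for _ in range(steps):
        p = p + 0.5 * dt * force(x, m)
        x = x + dt * p
        p = p + 0.5 * dt * force(x, m)
    return x, p


x0 = np.array([0.10, 0.42, 0.71])
p0 = np.array([0.5, -0.2, -0.3])
m = 1.0
E0 = cm_hamiltonian(x0, p0, m)
perm = np.array([2, 0, 1])
print(np.isclose(cm_hamiltonian(x0[perm], p0[perm], m), E0))  # True: S_3 invariance
x1, p1 = leapfrog(x0, p0, m, dt=1e-3, steps=500)
print(abs(cm_hamiltonian(x1, p1, m) - E0) / abs(E0))  # small relative drift
```

With $m > 0$ (purely imaginary $c$) the pair potential is repulsive, which keeps the particles apart over this short run; leapfrog is used only because its energy error stays bounded, and any accurate integrator would serve.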
This takes us out of the normalised form for the reduced space, since the transition function no longer lies in the torus. Remembering that we are on an elliptic curve, the root space components $\omega\phi_\alpha$ in $H^1(\Sigma, L_\alpha)$ (in the reduced space we have forgotten the root space component of the framings, and so we are in $H^1(\Sigma, L_\alpha)$ instead of $H^1(\Sigma, L_\alpha(-p_0))$) can be written as coboundaries:
$$(\omega\phi)_r = (\omega\phi)_r^0 + \mathrm{Ad}_{T^{-1}}(\omega\phi)_r^1, \tag{5.15}$$
and the flow of $CM_r$ can be written
$$T^{-1}\dot T = 0, \qquad \dot\phi^0 = \mathrm{ad}^*_{(\omega\phi)_r^0}\,\phi^0. \tag{5.16}$$
The vector field on $(T^*\mathcal M_G)_{red}$ corresponding to $CM_h = p \cdot p$, with respect to the splitting (3.2), is simply
$$T^{-1}\dot T = \omega\phi_h^0 = \omega p, \qquad \dot\phi^0 = 0, \tag{5.17}$$
and so, combining (5.16) and (5.17), one has for the flow of $CM$
$$T^{-1}\dot T = \omega\phi_h^0, \qquad \dot\phi^0 = \mathrm{ad}^*_{(\omega\phi^0)_r}\,\phi^0. \tag{5.18}$$
Let us check that the vector field (5.18) is indeed the vector field that one obtains from the explicit parametrisation. From (2.6), using $\omega = \zeta(z)$, we find that the $\alpha$-th components of the coboundary decomposition (5.15) satisfy
$$(\omega\phi)^0_\alpha A_{i,\alpha} = \frac{d}{dx_i}(\phi^0)_\alpha = c_{|\alpha|}\frac{d}{dx_i}\rho^0(\alpha(x), z), \tag{5.19}$$
where we decompose $x \in \mathfrak h$ into components $x_1, \ldots, x_r$, and $A_{i,\alpha} = \frac{d\alpha}{dx_i}$ is the corresponding component of the root $\alpha$. Referring to the formulae (4.10) for the coadjoint action and to (2.8), we have for the flows:
$$\dot p_i = \sum_\alpha c_{|\alpha|}\frac{d}{dx_i}\rho^0(\alpha(x), z)\cdot c_{|\alpha|}\rho^0(-\alpha(x), z) = \frac12\sum_\alpha c_{|\alpha|}^2\frac{d}{dx_i}\wp(\alpha(x)), \qquad \dot x_i = p_i. \tag{5.20}$$

6. Embeddings in Gl(N, C)

We now give two embeddings of our system into the Hitchin systems for $Gl(N, \mathbb C)$ over $\Sigma$, one corresponding to the Lax pairs of [dP1], the other to those of [BCS2]. Let $V = \mathbb C^N$ be a sum of (integral) weight spaces $\mathbb C_{w_i}$, $i = 1, \ldots, N$, for the torus $H$, such that the set of weights is Weyl invariant. The weights $w_i$ are maps of $\mathfrak h$ to $\mathbb C$; denote the corresponding homomorphisms $H \to \mathbb C^*$ by $\tilde w_i$. As for the root spaces, each of these weight spaces should be thought of as having a preferred basis, and the bases are invariant under the Weyl group. One has an embedding $J$ of the torus $H$ into the diagonal subgroup $D$ of $Gl(N, \mathbb C)$; it is given by
$$J(h) = \mathrm{diag}\big(\tilde w_1(h), \ldots, \tilde w_N(h)\big). \tag{6.1}$$
Let $\xi$ denote the corresponding Lie algebra homomorphism. The homomorphism $J$ induces a map $\hat J$ from the space of $H$-bundles of degree 0 over $\Sigma$ to the space of $D$-bundles of degree 0 over $\Sigma$, where $D$ is the diagonal subgroup of $Gl(N)$. The space of $H$-bundles, as we saw, can be parametrised with some redundancy by $\mathfrak h$; for $h \in \mathfrak h$, the corresponding $D$-bundle $\hat J(E)$ is a sum of line bundles $\oplus_i L_{w_i(h)}$. The bundle $\mathrm{End}(\hat J(E))$ is then a sum of line bundles $\oplus_{i,j} L_{w_i(h) - w_j(h)}$. The differences $w_i - w_j$ are sums of roots, for $w_i, w_j$ in the same orbit. We note that the space of $Gl(N, \mathbb C)$-bundles of degree zero is essentially the finite quotient of the space of $D$-bundles of degree zero by the Weyl group of $Gl(N)$, as the generic $Gl(N, \mathbb C)$-bundle of degree zero reduces to the torus.
The d’Hoker–Phong embedding. The map $\hat J$ extends to a map
$$J_M : (T^*\mathcal M_G)_{red} \to T^*\mathcal M_{GL(N)} \tag{6.2}$$
of the reduced moduli space $(T^*\mathcal M_G)_{red}$ into the cotangent bundle $T^*\mathcal M_{GL(N)}$ of the space of $Gl(N)$-bundles with level structure at the point $p_0$ of $\Sigma$. Conceptually, the map $J_M$ is determined by an $N$-equivariant extension of $\xi^*$ to a linear map from $\mathfrak g^*$ to $\mathfrak{gl}(N, \mathbb C)$. Recall that a non-special $G$-bundle admits a canonical reduction to an $N$-bundle. Thus $(T^*\mathcal M_G)_{red}$ is also a moduli space of pairs $(P, \phi)$, where $P$ is a principal $N$-bundle, $P_{\mathfrak g^*}$ is the vector bundle associated via the map of $\mathfrak g^*$ into $\mathfrak{gl}(N, \mathbb C)$, and $\phi$ is a section of $P_{\mathfrak g^*} \otimes K(p_0)$. The homomorphism $J$ extends to a homomorphism $J : N \to N(D)$, realizing $\mathfrak{gl}(N, \mathbb C)$ as an $N$-representation. Thus, an $N$-equivariant linear map from $\mathfrak g^*$ to $\mathfrak{gl}(N, \mathbb C)$ gives rise to a map (6.2).

More explicitly, $(T^*\mathcal M_G)_{red}$ is a space of pairs ($H$-bundle $E$ with $H$-level structure at $p_0$, section $\phi_G$ of $E((\mathfrak h^* \oplus (\oplus_\alpha \mathbb C_\alpha)) \otimes K(p_0))$). To this, $J_M$ will associate an element of $T^*\mathcal M_{GL(N)}$. Such an element is a pair (rank $N$ bundle $E_{Gl(N)}$ with level structure at $p_0$, section $\phi_{Gl(N)}$ of $\mathrm{End}(E_{Gl(N)}) \otimes K(p_0)$). The $Gl(N, \mathbb C)$-bundle $E_{Gl(N)}$ associated by $J_M$ to $(E, \phi)$ is simply $\hat J(E)$. We then define the corresponding $\phi_{Gl(N)}$. We choose for each pair $(w, w')$ of weights a constant $C_{w,w'}$, in such a way that it is invariant under the Weyl group and $C_{w,w'} = C_{w',w}$. We then define a “shift” operator for each root $\alpha$,
$$(Sh_\alpha)_{w,w'} = \delta_{w-w',\alpha}\,C_{w,w'}, \tag{6.3}$$
where we index the entries of the matrix by the weights themselves; the coefficient $\delta_{w-w',\alpha}$ is the Kronecker $\delta$. We then set
$$\phi_{Gl(N)} = \xi\big((\phi_G)_h\big) + \sum_{\alpha \in R}(\phi_G)_\alpha\,Sh_\alpha. \tag{6.4}$$
Let $CM$ denote the image $J_M((T^*\mathcal M_G)_{red})$. The space $\mathcal M_{GL(N)}$ has dimension $N^2$. Indeed, the space of bundles is of dimension $N$: the generic $Gl(N)$-bundle on $\Sigma$ reduces to the subgroup $D$ of diagonal matrices. The bundles have, generically, the group $D$ as automorphisms. When one adds in the level structure, one adds in $N^2$ parameters, on which the automorphisms act, reducing one to $N^2 - N$ parameters, giving $N + N^2 - N = N^2$ parameters in all. When one considers the Higgs fields $\phi_{Gl(N)}$ in $H^0(\Sigma, \mathrm{End}(E_{Gl(N)}) \otimes K(p_0))$, one has similarly $N^2$ parameters, giving $2N^2$ parameters for $T^*\mathcal M_{GL(N)}$. The Calogero–Moser locus $CM$ lies inside $T^*\mathcal M_{GL(N)}$, and is of dimension $2r$. It is characterised by the fact that the framing is compatible with the reduction to the diagonal subgroup $J(H) \subset D$ (so that transition functions respecting the trivialisation can be chosen diagonal), and the polar parts of the Higgs field $\phi_{Gl(N)}$ are fixed, while its diagonal parts lie in $\xi(\mathfrak h)$.
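The shift operators (6.3) and the Higgs field (6.4) are concrete enough to code directly. The Python sketch below builds $Sh_\alpha$ from a list of weight vectors and assembles $\phi_{Gl(N)}$ from given values of $(\phi_G)_h$ and $(\phi_G)_\alpha$ at one point; it does not construct the bundle or its sections. Assumptions made only for this illustration: weights and roots are real coordinate vectors, $\xi((\phi_G)_h)$ is taken to be $\mathrm{diag}(w_1(\phi_h), \ldots, w_N(\phi_h))$ with $\mathfrak h$ identified with $\mathfrak h^*$ by the dot product, and the default $C_{w,w'} \equiv 1$ (trivially Weyl invariant and symmetric). For the vector representation of $A_2$ (0-based indices, as in the code), $Sh_{e_0 - e_1}$ is the elementary matrix $E_{01}$, and $\phi_{Gl(N)}$ carries the torus part on the diagonal and the root components off it.

```python
import numpy as np


def shift_operator(weights, alpha, C=None):
    """(Sh_alpha)_{w,w'} = delta_{w - w', alpha} * C_{w,w'}, as in (6.3).

    Rows and columns are indexed by the weights, in the order given.
    C defaults to the all-ones matrix (a trivially Weyl-invariant, symmetric choice).
    """
    W = np.asarray(weights, dtype=float)
    n = len(W)
    C = np.ones((n, n)) if C is None else np.asarray(C)
    diff = W[:, None, :] - W[None, :, :]            # diff[a, b] = w_a - w_b
    hit = np.all(np.isclose(diff, alpha), axis=-1)  # the Kronecker delta
    return np.where(hit, C, 0.0)


def dp_higgs_field(weights, roots, phi_h, phi_roots, C=None):
    """phi_{Gl(N)} = xi(phi_h) + sum_alpha (phi_G)_alpha Sh_alpha, as in (6.4).

    xi(phi_h) is taken to be diag(w_1(phi_h), ..., w_N(phi_h)), identifying h with
    h* through the dot product (an assumption of this sketch).
    """
    W = np.asarray(weights, dtype=float)
    out = np.diag(W @ np.asarray(phi_h, dtype=float)).astype(complex)
    for alpha, value in zip(roots, phi_roots):
        out += value * shift_operator(W, alpha, C)
    return out


e = np.eye(3)
weights = [e[0], e[1], e[2]]  # vector representation of A_2
roots = [e[a] - e[b] for a in range(3) for b in range(3) if a != b]
values = [10 * a + b for a in range(3) for b in range(3) if a != b]  # stand-in (phi_G)_alpha
phi = dp_higgs_field(weights, roots, phi_h=[1.0, 2.0, -3.0], phi_roots=values)
print(shift_operator(weights, e[0] - e[1]))  # the elementary matrix E_01
print(phi.real)
```

The stand-in values $10a + b$ only make it easy to see which root lands in which entry; in the text they are the functions $c_{|\alpha|}\rho^0(\alpha(x), z)$.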
More explicitly, for any matrix $A$, let
$$A = A_d + A_{od} \tag{6.5}$$
denote the splitting of $A$ into diagonal and off-diagonal matrices. One can choose the transition matrices $M$ for a $GL(N)$-bundle, at least generically, to be diagonal, so that
$$M_d = \mathrm{diag}\big(\exp(y_i\zeta(z))\big), \qquad M_{od} = 0, \tag{6.6}$$
where the $y_i$ are constant on $\Sigma$. In turn, one represents the Higgs fields, which decompose into a sum of sections of line bundles, by
$$(\phi^0_{Gl(N)})_d = \mathrm{diag}(q_i), \qquad \big((\phi^0_{Gl(N)})_{od}\big)_{w,w'} = \sum_\alpha C_{w,w'}\,\delta_{w-w',\alpha}\,c_{|\alpha|}\,\rho^0(\alpha(x), z), \tag{6.7}$$
where the $q_i$ are constant functions, and $\rho^0$ are the functions of (2.4). Note that the $K_{w,w'}$ are the residues of the section at the origin. The Calogero–Moser locus $CM$ is given by the constraints
$$(1)\ \mathrm{diag}(y_i) \in \mathfrak h, \quad (2)\ M_{od} = 0, \quad (3)\ \mathrm{diag}(q_i) \in \mathfrak h, \quad (4)\ \big(\operatorname{res}_0\phi^0_{Gl(N)}\big)_{w,w'} = \sum_\alpha C_{w,w'}\,\delta_{w-w',\alpha}\,c_{|\alpha|}. \tag{6.8}$$
Referring to the explicit form (3.3) of the symplectic form on $T^*\mathcal M_{GL(N)}$, it follows:

Proposition 6.1. The embedding $J_M$ is symplectic, and the Calogero–Moser Hamiltonian corresponds under the embedding to a multiple of $\operatorname{res}\big(\omega\,\mathrm{tr}(\phi^2_{Gl(N)})\big)$.

We need to compare the flow in the reduced space of $G$-bundles and the flow in the cotangent space of $Gl(N)$-bundles with level structure. In our $Gl(N)$ moduli, the Hamiltonian is given, up to a constant, by pairing $\mathrm{tr}(\phi^2_{Gl(N)})$ with the cocycle $\omega$ of (5.7); the corresponding flow of $(M, \phi_{Gl(N)})$ is, by (3.5),
$$M^{-1}\dot M = \omega\phi^0_{Gl(N)}, \qquad \dot\phi^0_{Gl(N)} = 0. \tag{6.9}$$
Again we split $\omega\phi^0_{Gl(N)}$, first into its diagonal and off-diagonal components, and then write the off-diagonal term as a coboundary (function on $U_0$ vanishing at the origin, minus $M^{-1}$(function on $U_1$)$M$) plus a constant cocycle:
$$\omega\phi^0_{Gl(N)} = (\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od} - M^{-1}(\omega\phi^0_{Gl(N)})^1_{od}\,M + (\omega\phi^0_{Gl(N)})_d; \tag{6.10}$$
referring to (3.7), this transforms the flow (6.9) into the equivalent one:
$$M^{-1}\dot M = (\omega\phi^0_{Gl(N)})^{cst}_{od} + (\omega\phi^0_{Gl(N)})_d, \qquad \dot\phi^0_{Gl(N)} = \big[(\omega\phi^0_{Gl(N)})^0_{od}, \phi^0_{Gl(N)}\big]. \tag{6.11}$$
This is not necessarily tangent to the embedded Calogero–Moser locus $CM$: it does not satisfy the constraints (1), (2) and (3) of (6.7), but does satisfy the constraint (4). When one has a symplectic subvariety $V$ of a larger subvariety $W$, one can split the tangent space of $W$ along $V$ into $TV \oplus (TV)^\perp$, using the symplectic form. The Hamiltonian vector field of $H$ along $V$ is simply the projection of the corresponding field in $W$ with respect to this splitting. Referring to the formula (3.3), splitting the diagonal matrices as $\mathfrak d = \mathfrak h \oplus \mathfrak h^\perp$, and letting $\pi_{\mathfrak h} : \mathfrak{gl}(N) \to \mathfrak h$, $\pi_{\mathfrak h^\perp} : \mathfrak{gl}(N) \to \mathfrak h^\perp$ be the ensuing projections, the bundle $(T\,CM)^\perp$ is given by
$$\pi_{\mathfrak h}(M^{-1}\dot M) = 0, \qquad \pi_{\mathfrak h}\big(\dot\phi^0_{Gl(N)} + [M^{-1}\dot M, \phi^0_{Gl(N)}]\big) = 0. \tag{6.12}$$
One can make the $\mathcal M_{GL(N)}$-flow tangent to the Calogero–Moser locus $V$ by adding to it the following vector field, which lies in $(T\,CM)^\perp$:
$$M^{-1}\dot M = -(\omega\phi^0_{Gl(N)})^{cst}_{od}, \qquad \dot\phi^0_{Gl(N)} = a + \big[(\omega\phi^0_{Gl(N)})^{cst}_{od}, (\phi^0_{Gl(N)})_d\big], \tag{6.13}$$
where $a = a(x)$ is a suitable constant (in $z$) in $\mathfrak h^\perp$, giving the Calogero–Moser flow:
$$\begin{aligned} M^{-1}\dot M &= (\omega\phi^0_{Gl(N)})_d, \\ \dot\phi^0_{Gl(N)} &= a(x) + \big[(\omega\phi^0_{Gl(N)})^0_{od}, \phi^0_{Gl(N)}\big] + \big[(\omega\phi^0_{Gl(N)})^{cst}_{od}, (\phi^0_{Gl(N)})_d\big] \\ &= a(x) + \big[(\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od}, \phi^0_{Gl(N)}\big] - \big[(\omega\phi^0_{Gl(N)})^{cst}_{od}, (\phi^0_{Gl(N)})_{od}\big]. \end{aligned} \tag{6.14}$$
Indeed, this satisfies the constraints (1), (2), (4) of (6.8), the third constraint being given by an appropriate choice of $a \in \mathfrak h^\perp \subset \mathfrak d$:
$$a(x) = -\pi_{\mathfrak h^\perp}\big[(\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od}, \phi^0_{Gl(N)}\big]. \tag{6.15}$$
Let $D'$ be the group of automorphisms of the bundle given by $M$; with respect to our trivialisations, the automorphisms are represented by constant matrices. $D'$ includes the group $D$ of diagonal matrices. The action of $D'$ on $T^*\mathcal M_{GL(N)}$ is represented by the vector field
$$M^{-1}\dot M = 0, \qquad \dot\phi^0_{Gl(N)} = [d', \phi^0_{Gl(N)}], \tag{6.16}$$
for $d' \in \mathrm{Lie}(D')$. If $d'$ is diagonal, this vector field lies in $CM^\perp$. We would like to use this vector field to rewrite the flows (6.15) as
$$M^{-1}\dot M = (\omega\phi^0_{Gl(N)})_d, \qquad \dot\phi^0_{Gl(N)} = \big[(\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od} + d'(x), \phi^0_{Gl(N)}\big], \tag{6.17}$$
giving a Lax pair with spectral parameter for the flow. This gives the constraint
$$\big[d'(x), (\phi^0_{Gl(N)})_{od}\big] = a - \big[(\omega\phi^0_{Gl(N)})^{cst}_{od}, (\phi^0_{Gl(N)})_{od}\big]. \tag{6.18}$$
We have, referring to (2.6)–(2.8),
$$(\phi^0)_{w,w+\alpha}(x, z) = C_{w,w+\alpha}c_\alpha\big(-z^{-1} + \zeta(\alpha(x)) + O(z)\big) \stackrel{\mathrm{def}}{=} R_{w,w+\alpha}z^{-1} + Q_{w,w+\alpha}(x) + O(z), \tag{6.19}$$
$$\big((\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od}\big)_{w,w+\alpha} = C_{w,w+\alpha}c_\alpha\frac{d}{d\alpha(x)}\rho^0(\alpha(x), z) = R_{w,w+\alpha}\,\wp(\alpha(x)) + O(z) \stackrel{\mathrm{def}}{=} P_{w,w+\alpha}(x) + O(z). \tag{6.20}$$
Recall that we are all along dealing with flows of bundles and of sections $\phi_{Gl(N)}$, and in particular that sections are determined by their leading order terms at $z = 0$. This gives necessary and sufficient algebraic constraints for $d' = d'(x)$:
$$[P(x), R]_{od} = [d'(x), R]_{od}, \tag{6.21}$$
$$0 = [d'(x), R]_d, \tag{6.22}$$
$$a(x) = [d'(x), Q(x)]_d. \tag{6.23}$$
Relation (6.22) is automatically satisfied and, when $d'$ is diagonal, (6.23) forces $a = 0$. These algebraic constraints are essentially the ones of Theorems 1 and 2 of [dP1]. Indeed, their Theorem 1 gives an ansatz for a Lax pair which contains our solution: they have three constraints, labelled there (3.7), (3.8) and (3.9); they then particularise their ansatz in Theorem 2 to what is in essence our case, with $d'$ diagonal; their conditions then particularise to their (3.17), (3.18), (3.19). The first of their conditions follows automatically from Weyl invariance; their second, (3.18), essentially tells us that $a = 0$; their third is condition (6.22). By choosing a suitable representation (which is strongly constrained by the conditions), they then ensure that these conditions can be satisfied. The flow then has the Lax form (6.18) on the unreduced space $T^*\mathcal M_{GL(N)}$. Projecting to the reduced space, one quotients out by the action of the automorphisms of the bundle, and so omits the $d'(x)$, giving simply
$$M^{-1}\dot M = (\omega\phi^0_{Gl(N)})_d, \qquad \dot\phi^0_{Gl(N)} = \big[(\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od}, \phi^0_{Gl(N)}\big], \tag{6.24}$$
which is precisely the flow of the Hitchin system. In particular, one has a full set of commuting flows.

The Bordner–Corrigan–Sasaki embedding. We keep our representation space $V = \mathbb C^N$, a sum of weight spaces $\mathbb C_w$ invariant under the Weyl group, and still have our embedding $\hat J$, turning our $H$-bundles $E(h)$ into bundles $E_{Gl(N)} = \hat J(E(h)) = \oplus_i L_{w_i(h)}$. We now extend the embedding $\hat J$ of the space of $H$-bundles to $(T^*\mathcal M_G)_{red}$ in a different way, corresponding to the Lax pairs of Bordner–Corrigan–Sasaki [BCS2].
This involves the construction of a different $Gl(N)$ Higgs field. We first define some sections of $\mathrm{End}(E_{Gl(N)})$ associated to sections of the bundles $L_{\alpha(h)}$. We note that to each root $\alpha$ we have an associated reflection $R_\alpha(v) = v - \langle\hat\alpha, v\rangle\alpha$ of the Lie algebra $\mathfrak h$, and in turn a permutation of the weight spaces $\mathbb C_w$, which can be represented by a matrix $(s_\alpha)_{w,w'} \in Gl(V)$, where, as usual, we index the entries of the matrix by the weights themselves. Note that the non-zero entries of $s_\alpha$ must have $w - w' = n\alpha$ for some integer $n$. For a section $f$ of $L_{\alpha(h)}$, we define
$$\tilde s_\alpha(f)_{w,w'} = \sum_n (s_\alpha)_{w,w'}\,\delta_{w-w',n\alpha}\,n\cdot I_n^*f. \tag{6.25}$$
Represent the section $\phi$, which is a section of the associated bundle $E((\oplus_\alpha \mathbb C_\alpha) \oplus \mathfrak h)$, by $((\phi_\alpha), \phi_h)$. We define the corresponding section $\phi_{Gl(N)}$ by
$$\phi_{Gl(N)} = \sum_\alpha \tilde s_\alpha(\phi_\alpha) + \xi(\phi_h). \tag{6.26}$$
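The reflection matrices $s_\alpha$ and the matrices $\tilde s_\alpha(f)$ of (6.25) can be sketched in the same way. In the Python code below the weights are real coordinate vectors and $\hat\alpha = 2\alpha/(\alpha,\alpha)$ is the coroot; $I_n^*f$ is read as the pull-back $z \mapsto f(nz)$ under multiplication by $n$ on the curve, which matches the arguments $nz$ in the explicit formulas given below. That reading, and the stand-in section $f(z) = z$ in the usage example, are assumptions of the sketch. The code also checks two facts stated in the text: $s_\alpha$ permutes the weight spaces (here it is an involution), and its non-zero entries have $w - w' = n\alpha$. Entries with $n = 0$ (weights fixed by $R_\alpha$) vanish because of the factor $n$.

```python
import numpy as np


def reflect(v, alpha):
    """R_alpha(v) = v - <alpha^, v> alpha, with coroot alpha^ = 2 alpha / (alpha, alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    return v - 2.0 * np.dot(alpha, v) / np.dot(alpha, alpha) * alpha


def reflection_matrix(weights, alpha):
    """(s_alpha)_{w,w'} = 1 if w = R_alpha(w'), else 0 (rows/cols indexed by weights)."""
    W = np.asarray(weights, dtype=float)
    s = np.zeros((len(W), len(W)))
    for b, w_prime in enumerate(W):
        rows = np.flatnonzero(np.all(np.isclose(W, reflect(w_prime, alpha)), axis=1))
        if len(rows) != 1:
            raise ValueError("the weights are not permuted by R_alpha")
        s[rows[0], b] = 1.0
    return s


def s_tilde(weights, alpha, f, z):
    """s~_alpha(f)_{w,w'} = sum_n (s_alpha)_{w,w'} delta_{w-w', n alpha} n f(n z), cf. (6.25).

    I_n^* f is read as z -> f(n z) (an assumption of this sketch).
    """
    W = np.asarray(weights, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    s = reflection_matrix(W, alpha)
    out = np.zeros(s.shape, dtype=complex)
    for a, b in zip(*np.nonzero(s)):
        # w - w' is a multiple n of alpha for a reflection; recover n by projection
        n = int(round(np.dot(W[a] - W[b], alpha) / np.dot(alpha, alpha)))
        if n != 0:
            out[a, b] = n * f(n * z)
    return out


e = np.eye(3)
weights = [e[0], e[1], e[2]]  # vector representation of A_2
alpha = e[0] - e[1]
s = reflection_matrix(weights, alpha)
print(s @ s)  # the identity matrix
print(s_tilde(weights, alpha, lambda z: z, 0.5).real)  # stand-in section f(z) = z
```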
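This section $\phi_{Gl(N)}$ has poles not only at the origin, but also, for its $(w, w + n\alpha)$ components, at the $n$th roots of unity in the curve $\Sigma$. The moduli space of $Gl(N)$ Higgs pairs must be chosen accordingly. The Hamiltonian is again a multiple of the Hamiltonian given by pairing $\mathrm{tr}(\phi^2_{Gl(N)})$ with our standard cocycle $\omega$ of (5.7); as above, the flow is given by (6.14):
$$\begin{aligned} M^{-1}\dot M &= (\omega\phi^0_{Gl(N)})_d, \\ \dot\phi^0_{Gl(N)} &= a + \big[(\omega\phi^0_{Gl(N)})^0_{od}, \phi^0_{Gl(N)}\big] + \big[(\omega\phi^0_{Gl(N)})^{cst}_{od}, (\phi^0_{Gl(N)})_d\big] \\ &= a + \big[(\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od}, \phi^0_{Gl(N)}\big] - \big[(\omega\phi^0_{Gl(N)})^{cst}_{od}, (\phi^0_{Gl(N)})_{od}\big], \end{aligned} \tag{6.27}$$
with again Eq. (6.15) for $a$. Explicitly, one has
$$\big(\phi^0_{Gl(N)}\big)_{w,w'} = \sum_\alpha\sum_n (s_\alpha)_{w,w'}\,\delta_{w-w',n\alpha}\,n\,\rho^0(\alpha(x), nz),$$
$$\big((\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od}\big)_{w,w'} = \sum_\alpha\sum_n (s_\alpha)_{w,w'}\,\delta_{w-w',n\alpha}\,\frac{\partial\rho^0}{\partial x}(\alpha(x), nz).$$
One again wants to use the action of the diagonal subgroup to write (6.28) as a Lax pair. We take $d'(x)$ to be diagonal, and get
$$M^{-1}\dot M = (\omega\phi^0_{Gl(N)})_d, \qquad \dot\phi^0_{Gl(N)} = \big[(\omega\phi^0_{Gl(N)})^0_{od} + (\omega\phi^0_{Gl(N)})^{cst}_{od} + d'(x), \phi^0_{Gl(N)}\big]. \tag{6.28}$$
In this case, the appropriate diagonal terms are given in [BCS2]:
$$d'(x)_{w,w} = \sum_\alpha (s_\alpha)_{w,w}\,\frac{\partial\rho^0}{\partial x}(\alpha(x), 0). \tag{6.29}$$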
In this case, there is no constraint on the representation; indeed, Bordner, Corrigan and Sasaki create a “universal Lax pair” within the algebra C(H ) ⊗ C[W ] created by
tensoring the group algebra of the Weyl group with the function field of $H$; the product must be suitably defined, but corresponds roughly to representing the Weyl group as acting by reflections on a sum of weight spaces, and the group $H$ as acting by diagonal matrices. One then represents this algebra into a sum of weight spaces, and the Lax pair gets embedded into $Gl(N)$. This is what is given above. It would be interesting to do the geometry of bundles directly within this algebra.

References
[BCS1] Bordner, A.J., Corrigan, E. and Sasaki, R.: Calogero–Moser models: I. A new formulation. Progr. Theoret. Phys. 100, no. 6, 1107–1129 (1998)
[BCS2] Bordner, A.J., Corrigan, E. and Sasaki, R.: Generalized Calogero–Moser models and universal Lax pair operators. Progr. Theoret. Phys. 102, no. 3, 499–529 (1999)
[dP1] d’Hoker, E. and Phong, D.H.: Calogero–Moser Lax pairs with spectral parameter for general Lie algebras. Nucl. Phys. B 530, no. 3, 537–610 (1998)
[dP2] d’Hoker, E. and Phong, D.H.: Spectral curves for super-Yang–Mills with adjoint hypermultiplet for general simple Lie algebras. Nucl. Phys. B 534, no. 3, 697–719 (1998)
[dP3] d’Hoker, E. and Phong, D.H.: Calogero–Moser and Toda systems for twisted and untwisted affine Lie algebras. Nucl. Phys. B 530, no. 3, 611–640 (1998)
[Do] Donagi, R.: Seiberg–Witten integrable systems. In: Proc. Sympos. Pure Math. 62, Part 2, Providence, RI: Am. Math. Soc., 1997, pp. 3–43
[GN] Gorsky, A. and Nekrasov, N.: Elliptic Calogero–Moser system from two dimensional current algebra. hep-th/9401021
[K] Krichever, I.M.: Elliptic solutions of the Kadomtsev–Petviashvili equation and integrable systems of particles. Funct. Anal. Appl. 14, 282–290 (1980)
[Lo] Looijenga, E.: Root systems and elliptic curves. Invent. Math. 38, 17–32 (1976); Invariant theory for generalized root systems. Invent. Math. 61, 1–32 (1980)
[Ma] Markman, E.: Spectral curves and integrable systems. Compositio Math. 93, 255–290 (1994)
[N] Nekrasov, N.: Holomorphic bundles and many-body systems. Commun. Math. Phys. 180, no. 3, 587–603 (1996)
[OP] Olshanetsky, M.A. and Perelomov, A.M.: Completely integrable Hamiltonian systems connected with semisimple Lie algebras. Invent. Math. 37, 93–108 (1976)
[OP2] Olshanetsky, M.A. and Perelomov, A.M.: Classical integrable finite-dimensional systems related to Lie algebras. Phys. Rep. C 71, 313–400 (1981)

Communicated by T. Miwa
Commun. Math. Phys. 223, 553–582 (2001)
Communications in Mathematical Physics
© Springer-Verlag 2001
Spinodal Decomposition for the Cahn–Hilliard–Cook Equation
Dirk Blömker1,*, Stanislaus Maier-Paape1,*, Thomas Wanner2
1 Institut für Mathematik, Universität Augsburg, 86135 Augsburg, Germany. E-mail: [email protected]; [email protected]
2 Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, MD 21250, USA. E-mail: [email protected]
Received: 2 May 2000 / Accepted: 8 July 2001
Abstract: This paper gives theoretical results on spinodal decomposition for the stochastic Cahn–Hilliard–Cook equation, which is a Cahn–Hilliard equation perturbed by additive stochastic noise. We prove that most realizations of the solution which start at a homogeneous state in the spinodal interval exhibit phase separation, leading to the formation of complex patterns of a characteristic size. In more detail, our results can be summarized as follows. The Cahn–Hilliard–Cook equation depends on a small positive parameter ε which models atomic scale interaction length. We quantify the behavior of solutions as ε → 0. Specifically, we show that for the solution starting at a homogeneous state the probability of staying near a finite-dimensional subspace $Y_\varepsilon$ is high as long as the solution stays within distance $r_\varepsilon = O(\varepsilon^R)$ of the homogeneous state. The subspace $Y_\varepsilon$ is an affine space corresponding to the highly unstable directions for the linearized deterministic equation. The exponent $R$ depends on both the strength and the regularity of the noise.

Contents
1. Introduction
2. Linear Theory
   2.1 Basic definitions
   2.2 Auxiliary results
   2.3 Main theorem
3. Nonlinear Extensions
   3.1 Outline of the main ideas
   3.2 Main results
4. The Cahn–Hilliard–Cook Equation
* Present address: Institut für Mathematik, RWTH-Aachen, Templergraben 55, 52062 Aachen, Germany. E-mail: [email protected]; [email protected]
D. Blömker, S. Maier-Paape, T. Wanner
1. Introduction
The deterministic Cahn–Hilliard equation was introduced in [5, 8] as a mathematical model for phase separation phenomena in binary metal alloys. It is given by
$$u_t = -\Delta\big(\varepsilon^2\Delta u + f(u)\big) \ \text{ in } G, \qquad \frac{\partial u}{\partial\nu} = \frac{\partial\Delta u}{\partial\nu} = 0 \ \text{ on } \partial G, \tag{1}$$
where $G \subset \mathbb R^d$, $d \in \{1, 2, 3\}$, is a bounded domain with sufficiently smooth boundary, and $-f$ is the derivative of a double-well potential. The standard choice for $f$ is a cubic polynomial such as $f(u) = u - u^3$. The variable $u$ represents the concentration of one of the two components of the binary alloy, subject to some affine transformation, and the small parameter ε models the atomic scale interaction length.

One of the important phase separation phenomena which can be observed in binary metal alloys is called spinodal decomposition, and can be described as follows. At high temperatures, the two components of the alloy form a stable homogeneous mixture. If this mixture is suddenly quenched to a certain lower temperature, it loses its stability, and a phase separation process sets in. During the initial stage of this separation, which is called spinodal decomposition, a characteristic fine-grained pattern exhibiting a characteristic length scale is quickly generated. See for example Cahn [6, 7], Elder, Desai [14], and Elder, Rogers, Desai [15, 16].

The above-described scenario depends crucially on the specific value of the initial homogeneous concentration $m$. Spinodal decomposition can only be observed if $m$ is contained in the so-called spinodal region. This is the usually connected set of all $m$ such that $f'(m) > 0$. Mathematical results on spinodal decomposition have been obtained by Grant [17], Maier-Paape, Wanner [21, 22], and by Sander, Wanner [25, 26]. These results will be described below in order to introduce notation.

The present paper is concerned with the stochastic version of the Cahn–Hilliard equation, known as the Cahn–Hilliard–Cook equation. This equation is given by
$$u_t = -\Delta\big(\varepsilon^2\Delta u + f(u)\big) + \sigma\cdot\xi \ \text{ in } G, \qquad \frac{\partial u}{\partial\nu} = \frac{\partial\Delta u}{\partial\nu} = 0 \ \text{ on } \partial G, \tag{2}$$
where the additive noise term ξ is usually chosen as space-time white noise or colored noise. These terms will be explained in more detail later. The stochastic partial differential equation (2) was first considered by Cook [9]. Some authors, such as Binder [2] and Pego [24], have expressed the belief that only the stochastic version can correctly describe the whole decomposition process in a binary alloy. Numerous papers in the physics literature addressed this question, see for example [15, 16, 18, 20, 23], just to name a few. Using numerical methods or formal analysis, they study the behavior of structure factors or correlation functions. For initial mass $m$ in the spinodal region, one of the main conclusions is that linear theory is only valid for short times and in the highly unstable regime, i.e., for ε > 0 small enough.

The main result of this paper will be the first rigorous mathematical result explaining spinodal decomposition for the stochastic Cahn–Hilliard equation in this highly unstable regime. Similar to the deterministic case considered in Maier-Paape, Wanner [21, 22], this will be achieved by considering the linearized equation. Before presenting our results in more detail, let us briefly summarize a result for the deterministic equation (1). Maier-Paape and Wanner [21, 22] proved that with high probability, solutions starting in a small ball with size of the order $O(\varepsilon^d)$ stay close
to a finite-dimensional subspace $Y_\varepsilon$ before leaving a larger ball with size of the same order. Furthermore, they show that functions in the dominating subspace $Y_\varepsilon$ exhibit the expected spinodally decomposed patterns. This work was later extended by Sander and Wanner [25, 26] to explain the solution behavior even further from equilibrium, i.e., for balls of considerably larger radii.

We recall some of the definitions and results contained in [21, 22]. Suppose that $m$ is contained in the spinodal interval, i.e., assume that $f'(m) > 0$. Then the linearization of the Cahn–Hilliard equation (1) at the spatially homogeneous equilibrium $v_0 = m$ is given by
$$v_t = -\Delta\big(\varepsilon^2\Delta v + f'(m)v\big) \ \text{ in } G, \qquad \frac{\partial v}{\partial\nu} = \frac{\partial\Delta v}{\partial\nu} = 0 \ \text{ on } \partial G. \tag{3}$$
Due to $f'(m) > 0$ we can rescale (3) by considering
$$\tilde v = f'(m)\cdot v, \qquad \tilde x = \frac{x}{\sqrt{f'(m)}}, \qquad \text{and} \qquad \tilde\varepsilon = \frac{\varepsilon}{f'(m)}. \tag{4}$$
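The rescaling (4) can be checked by direct substitution. Assuming, as the list of rescaled quantities in (4) suggests, that time is not rescaled, put $x = \sqrt{f'(m)}\,\tilde x$, so that $\Delta_x = f'(m)^{-1}\Delta_{\tilde x}$; then

```latex
\begin{aligned}
v_t &= -\Delta_x\big(\varepsilon^2\Delta_x v + f'(m)\,v\big)
     = -\frac{1}{f'(m)}\,\Delta_{\tilde x}\Big(\frac{\varepsilon^2}{f'(m)}\,\Delta_{\tilde x} v + f'(m)\,v\Big) \\
    &= -\Delta_{\tilde x}\Big(\frac{\varepsilon^2}{f'(m)^2}\,\Delta_{\tilde x} v + v\Big)
     = -\Delta_{\tilde x}\big(\tilde\varepsilon^2\,\Delta_{\tilde x} v + v\big),
     \qquad \tilde\varepsilon = \frac{\varepsilon}{f'(m)}.
\end{aligned}
```

The boundary conditions $\partial v/\partial\nu = \partial\Delta v/\partial\nu = 0$ survive the dilation (the domain $G$ is replaced by its dilate $G/\sqrt{f'(m)}$), and multiplying $v$ by the constant $f'(m)$ does not change the linear equation, so the rescaled problem is (3) with $f'(m) = 1$.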
Thus, without loss of generality, we may and will assume $f'(m) = 1$ in the following. To arrive at an abstract formulation of (3), let
$$X = \Big\{u \in L^2(G) : \int_G u\,dx = 0\Big\}. \tag{5}$$
Notation 1.1. Consider the operator $-\Delta$ on $X$ with domain $D(-\Delta) = \{u \in X \cap H^2(G) : \partial u/\partial\nu = 0 \text{ on } \partial G\}$. Then the spectrum of $-\Delta$ consists of an infinite sequence $0 < \mu_1 \le \mu_2 \le \ldots \to \infty$ of real eigenvalues. The corresponding eigenfunctions $\{f_k\}_{k \in \mathbb N}$ form a complete orthonormal set in $X$. Furthermore, there are positive constants $c_G$ and $C_G$ depending only on the domain $G$ such that
$$c_G\cdot k^{2/d} \le \mu_k \le C_G\cdot k^{2/d} \quad \text{for all } k \in \mathbb N. \tag{6}$$
See for example [10, 13]. The above-mentioned eigenvalues and eigenfunctions are of relevance for the linear operator in (3).

Notation 1.2. Now consider the operator $A_\varepsilon$ describing the right-hand side of the linearized problem (3), i.e., let $A_\varepsilon v := -\Delta(\varepsilon^2\Delta v + v)$ for all $v \in D(A_\varepsilon) = \{u \in X \cap H^4(G) : \partial u/\partial\nu = \partial\Delta u/\partial\nu = 0 \text{ on } \partial G\}$. Then $A_\varepsilon$ is self-adjoint and sectorial. The spectrum of $A_\varepsilon$ consists of the eigenvalues
$$\lambda_{k,\varepsilon} := \mu_k\cdot\big(1 - \varepsilon^2\mu_k\big) \tag{7}$$
and the corresponding eigenfunctions $f_k$.
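For a concrete picture of (6) and (7), take $G = (0,1)^2$, so $d = 2$; this domain is an assumption made only for the illustration. The Neumann eigenvalues of $-\Delta$ on $X$ are then $\pi^2(a^2 + b^2)$ with integers $a, b \ge 0$ not both zero, and (6) says that $\mu_k/k$ stays between two positive constants. The Python sketch below lists the first eigenvalues, evaluates $\lambda_{k,\varepsilon}$ from (7), and compares the largest one with the bound $1/(4\varepsilon^2)$ given in (8) below.

```python
import numpy as np


def neumann_square_eigs(count, M=120):
    """First `count` eigenvalues of -Laplace on X for G = (0,1)^2, Neumann conditions.

    They are pi^2 (a^2 + b^2) with a, b >= 0 not both zero; the constant
    eigenfunction (eigenvalue 0) is excluded because X consists of mean-zero functions.
    """
    a, b = np.meshgrid(np.arange(M + 1), np.arange(M + 1))
    mu = np.sort((np.pi**2 * (a**2 + b**2)).ravel())[1:]  # drop the single zero
    if mu[count - 1] >= (np.pi * M) ** 2:
        raise ValueError("increase M: the list is only complete below (pi M)^2")
    return mu[:count]


def a_eps_eigs(mu, eps):
    """lambda_{k,eps} = mu_k (1 - eps^2 mu_k), as in (7)."""
    return mu * (1.0 - eps**2 * mu)


k = np.arange(1, 2001)
mu = neumann_square_eigs(2000)
ratio = mu / k  # with d = 2, (6) bounds mu_k / k^{2/d} = mu_k / k above and below
print(ratio.min(), ratio.max())

eps = 0.05
lam = a_eps_eigs(mu, eps)
print(lam.max(), 1.0 / (4.0 * eps**2))           # largest eigenvalue vs. the bound (8)
print(mu[np.argmax(lam)], 1.0 / (2.0 * eps**2))  # attained near mu_max = 1/(2 eps^2)
```

Only finitely many $\lambda_{k,\varepsilon}$ are positive (those with $\mu_k < \varepsilon^{-2}$), which is why the first 2000 eigenvalues suffice for locating the maximum here.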
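Fig. 1. The decomposition of the spectrum. [Figure: the real axis, marked at $-\gamma^-\lambda^{max}_\varepsilon$, 0, $\gamma^-\lambda^{max}_\varepsilon$, $\gamma^+\lambda^{max}_\varepsilon$ and $\lambda^{max}_\varepsilon$, divided into the regions $\Sigma^{--}_\varepsilon$, $\Sigma^-_\varepsilon$, $\Sigma^+_\varepsilon$ and $\Sigma^{++}_\varepsilon$.]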
Notice that the largest eigenvalue of $A_\varepsilon$ is bounded above by
$$\lambda^{max}_\varepsilon = \frac{1}{4\varepsilon^2}. \tag{8}$$
The maximum value $\lambda^{max}_\varepsilon$ is attained in (7) for $\mu^{max}_\varepsilon = 1/(2\varepsilon^2)$. In order to decompose the spectrum of $A_\varepsilon$, fix constants $0 < \gamma^- < \gamma^+ < 1$. Then we partition the spectrum into four sets
$$\sigma(A_\varepsilon) = \Sigma^{--}_\varepsilon \cup \Sigma^-_\varepsilon \cup \Sigma^+_\varepsilon \cup \Sigma^{++}_\varepsilon,$$
where the four sets on the right-hand side denote the intersections of the spectrum with $(-\infty, -\gamma^-]\cdot\lambda^{max}_\varepsilon$, $(-\gamma^-, \gamma^-]\cdot\lambda^{max}_\varepsilon$, $(\gamma^-, \gamma^+]\cdot\lambda^{max}_\varepsilon$, and $(\gamma^+, 1]\cdot\lambda^{max}_\varepsilon$, respectively. See also Fig. 1. This partition induces a decomposition of $X$ into four subspaces $X^{--}_\varepsilon$, $X^-_\varepsilon$, $X^+_\varepsilon$ and $X^{++}_\varepsilon$, generated by the eigenfunctions corresponding to the eigenvalues in the four sets of the partition. From the asymptotics (6) of the eigenvalues $\mu_k$ of $-\Delta$, it can easily be verified that the spaces $X^-_\varepsilon$, $X^+_\varepsilon$ and $X^{++}_\varepsilon$ are finite-dimensional with dimensions proportional to $\varepsilon^{-d}$ as $\varepsilon \to 0$.

Let us now return to the discussion of the linear equation (3). Its general solution is given by the Fourier series $v(t) = \sum_{k \in \mathbb N} e^{\lambda_{k,\varepsilon}t}\cdot(v(0), f_k)_X\cdot f_k$. This immediately implies that the above-defined subspaces of $X$ are invariant under the linear flow. Maier-Paape and Wanner [21] used this setting to explain spinodal decomposition for the linearized problem (3). To this end, choose an initial condition $v_0$ close to 0. Then the $X^{++}_\varepsilon$-component $v^{++}$ of the corresponding solution $v$ of (3) grows exponentially fast. Meanwhile, the projection $v^-$ grows much more slowly than $v^{++}$, and the projection $v^{--}$ decays exponentially fast. Thus, the solution $v$ stays close to the dominating unstable subspace
$$Y_\varepsilon := X^+_\varepsilon \oplus X^{++}_\varepsilon. \tag{9}$$
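The statement that $X^-_\varepsilon$, $X^+_\varepsilon$ and $X^{++}_\varepsilon$ have dimensions proportional to $\varepsilon^{-d}$ can be seen numerically in the simplest case $G = (0,1)$, $d = 1$, where $\mu_k = (k\pi)^2$ and $f_k(x) = \sqrt2\cos(k\pi x)$. This domain and the values $\gamma^- = 0.25$, $\gamma^+ = 0.75$ are assumptions of the illustration. The Python sketch below counts the eigenvalues (7) that fall into each interval of the partition; halving ε should roughly double each count.

```python
import numpy as np


def spectral_partition_dims(eps, gamma_minus, gamma_plus):
    """Dimensions of X^-, X^+ and X^{++} for G = (0,1), where mu_k = (k pi)^2, k >= 1.

    Uses lambda_{k,eps} = mu_k (1 - eps^2 mu_k) from (7) and lambda_max = 1/(4 eps^2)
    from (8); the three sets are (-g-, g-], (g-, g+] and (g+, 1] times lambda_max.
    """
    # For k > 2/(eps*pi), lambda_k / lambda_max < -48, so every remaining
    # eigenvalue lies in Sigma^{--} and the counts below are complete.
    kmax = int(2.0 / (eps * np.pi)) + 1
    k = np.arange(1, kmax + 1)
    mu = (k * np.pi) ** 2
    r = mu * (1.0 - eps**2 * mu) * 4.0 * eps**2  # lambda_k / lambda_max
    return {
        "X-": int(np.count_nonzero((r > -gamma_minus) & (r <= gamma_minus))),
        "X+": int(np.count_nonzero((r > gamma_minus) & (r <= gamma_plus))),
        "X++": int(np.count_nonzero((r > gamma_plus) & (r <= 1.0))),
    }


for eps in (0.02, 0.01, 0.005):
    dims = spectral_partition_dims(eps, 0.25, 0.75)
    print(eps, dims, {key: round(val * eps, 3) for key, val in dims.items()})
```

The products $\dim \cdot\, \varepsilon$ in the last column level off, which is the $\varepsilon^{-d}$ scaling for $d = 1$.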
The space $X^+_\varepsilon$ is introduced in [22] for technical reasons, in order to obtain certain spectral gaps between $\Sigma^{++}_\varepsilon$ and $\Sigma^-_\varepsilon$ for small ε > 0. This allows comparisons between the fast growing modes in $X^{++}_\varepsilon$ and the slow growing or decaying modes in $X^-_\varepsilon$. The spectral gap will also be necessary for our approach. The described linear scenario can be given a meaning in the nonlinear situation as well – at least in a neighborhood of order $O(\varepsilon^d)$. See Theorem 3.10 in [22].

The characteristic patterns of spinodally decomposed states are a consequence of the closeness of the solution to the space $Y_\varepsilon$. For an arbitrary element $\varphi = \sum_k \alpha_k f_k \in Y_\varepsilon$, the sum is over those $k$ with $\mu_k \approx \mu^{max}_\varepsilon = O(\varepsilon^{-2})$. Recalling that the nodal domains of $\varphi$ are the maximal connected subsets of $\{x \in G : \varphi(x) > 0\}$ or $\{x \in G : \varphi(x) < 0\}$, the following result is stated and proven in [21], subject to some technical conditions. There exists a positive constant $C$ such that for every “typical” $x_0 \in G$ the following holds. If $N \subset G$ denotes the nodal domain of $\varphi$ containing $x_0$, then for any ball of radius $r$ contained in $N$ and centered at $x_0$ the estimate $r \le C/\sqrt{\mu^{max}_\varepsilon} = O(\varepsilon)$ is satisfied, provided ε > 0 is sufficiently small.
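The $O(\varepsilon)$ thickness of the nodal domains can be illustrated (not proved) by sampling random elements of $Y_\varepsilon$. In the Python sketch below, again on $G = (0,1)$ with $f_k(x) = \sqrt2\cos(k\pi x)$, the coefficients $\alpha_k$ are independent standard normal variables and $\gamma^- = 0.25$; both are assumptions of the illustration. The "typical point" formulation of [21] is replaced by a cruder quantity, the average length of the maximal grid intervals on which $\varphi$ keeps one sign. Halving ε should roughly halve that average.

```python
import numpy as np


def random_Y_element(eps, gamma_minus, x, rng):
    """Random phi = sum_k alpha_k f_k in Y_eps = X^+ (+) X^{++} on G = (0,1).

    f_k(x) = sqrt(2) cos(k pi x); the alpha_k are i.i.d. standard normal (an assumption).
    """
    k = np.arange(1, int(2.0 / (eps * np.pi)) + 2)
    mu = (k * np.pi) ** 2
    k = k[mu * (1.0 - eps**2 * mu) * 4.0 * eps**2 > gamma_minus]  # lambda_k > gamma^- lambda_max
    alpha = rng.standard_normal(len(k))
    return np.sqrt(2.0) * np.cos(np.pi * np.outer(x, k)) @ alpha


def mean_nodal_length(phi, x):
    """Mean length of the maximal grid intervals on which phi keeps one sign."""
    sign = np.sign(phi)
    cuts = np.flatnonzero(sign[1:] != sign[:-1])  # grid index just before each sign change
    ends = np.concatenate(([x[0]], x[cuts], [x[-1]]))
    return np.mean(np.diff(ends))


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20001)
for eps in (0.02, 0.01, 0.005):
    lengths = [mean_nodal_length(random_Y_element(eps, 0.25, x, rng), x) for _ in range(10)]
    print(eps, np.mean(lengths), np.mean(lengths) / eps)  # last column stays of order one
```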
In other words, the “thickness” of the nodal domains is bounded above by $O(\varepsilon)$. Therefore spinodally decomposed states appear to be snake-like patterns. The present paper is devoted to showing that the results obtained in [21, 22] can be carried over to the Cahn–Hilliard–Cook equation (2). However, this has to be achieved by completely different methods. Unlike in the deterministic situation, we consider (2) only in combination with the initial condition
$$u(0) = m \quad \text{for some } m \text{ in the spinodal region.} \tag{10}$$
Due to the additive nature of the noise, trajectories of the corresponding solution process are likely to be driven away from the homogeneous state. Using the natural probability measure, which is available on the probability space over which the perturbation process $\xi$ is modeled, we will show that most trajectories of the solution of (2), (10) exhibit spinodal decomposition. As in the deterministic situation, this is due to the fact that most trajectories stay close to the dominating subspace $Y_\varepsilon$ defined in (9). Furthermore, we will only be able to verify this behavior in appropriate neighborhoods of the homogeneous equilibrium $m$, where the influence of the nonlinearity is sufficiently small. Even though this seems to be completely analogous to the deterministic result, there are major differences. First of all, we already mentioned that in the stochastic case only one single initial value problem is considered. The probabilistic aspect of spinodal decomposition enters naturally through the underlying probability space, and not through the introduction of a measure on some inertial manifold. See [22] for details. Secondly, our stochastic result is proven for the complete, infinite-dimensional phase space, and not only on an inertial manifold as in [22]. To accomplish this, some estimates on the evolution of the higher modes are necessary. Another important difference is that we are able to consider the solution in balls around the homogeneous initial data with larger radii than in [22]. This is possible, since we establish bounds for the distance between the linear and nonlinear solutions directly. We do not need an inertial manifold whose existence is proven only in a small neighborhood of the initial data. This implies that for the Cahn–Hilliard–Cook equation we can consider radii nearly up to order $O(1)$ in $H^2(G)$, depending on the regularity of the underlying noise process.
This is considerably better than the deterministic theory in [22], which yields radii of order $O(\varepsilon^d)$. It is, however, not as good as the results in [25, 26], where radii close to the order $O(\varepsilon^{-2})$ can be achieved. These two papers establish a second phase of spinodal decomposition in a certain cone-shaped region of phase space. Due to [22], most solutions of the deterministic equation automatically end up in this region. Such a second phase spinodal decomposition is likely in the stochastic case as well. See [4]. The paper is organized as follows. In Sect. 2 the theory for linearized stochastic equations is presented. Based on three crucial estimates contained in Propositions 2.3, 2.6, and 2.8, we will present the analogue of the deterministic linear theory of [21, Sect. 2]. These results are extended to the nonlinear situation in Sect. 3, where we present our main Theorems 3.1 and 3.2. Finally, the application of the theory to the Cahn–Hilliard–Cook equation is discussed in Sect. 4. See Theorem 4.1. Throughout the paper, the letters $C$ and $c$ denote generic positive constants which are always independent of $\varepsilon$ and whose value may change from occurrence to occurrence, sometimes even within one line.
D. Blömker, S. Maier-Paape, T. Wanner
2. Linear Theory 2.1. Basic definitions. We begin our discussion of spinodal decomposition by considering the linear case. Consider the linear stochastic partial differential equation
$$v_t = -\Delta\left(\varepsilon^2 \Delta v + v\right) + \sigma \cdot \xi, \qquad v(0) = 0 \tag{11}$$
on a suitable domain $G \subset \mathbb{R}^d$, subject to homogeneous Neumann boundary conditions for $v$ and $\Delta v$. We will use the spaces and notation contained in the following assumptions. Assumption 2.1. Let $X$ be defined as in (5), and $\mu_k$, $f_k$, $\lambda_{k,\varepsilon}$, and $A_\varepsilon$ be as in Notations 1.1 and 1.2. Furthermore, let $s \in \mathbb{R}^+_0$ be arbitrary and define $H^s_*(G) = D\big((-\Delta)^{s/2}\big) \subset X$ with norm $\|\cdot\|_s := \|(-\Delta)^{s/2} \cdot\|$, where $\|\cdot\|$ denotes the norm in $X$. This is the subspace of the Sobolev space $H^s(G)$ given by the orthogonal $L^2(G)$-complement of the constant functions. See [1]. For arbitrary $s \in \mathbb{R}$ the spaces $H^s_*(G)$ are defined in the usual way by a formal Fourier series expansion. Finally, assume that the spectrum of the operator $A_\varepsilon$ is partitioned into subsets $\Sigma^{++}_\varepsilon$, $\Sigma^+_\varepsilon$, $\Sigma^-_\varepsilon$, and $\Sigma^{--}_\varepsilon$ as shown in Fig. 1. Then, similarly to the case of $X$, we decompose the space $H^s_*(G)$ into corresponding subspaces $X^{++}_\varepsilon$, $X^+_\varepsilon$, $X^-_\varepsilon$, and $X^{--}_\varepsilon$, spanned by the eigenfunctions $f_k$ whose eigenvalues $\lambda_{k,\varepsilon}$ are contained in the respective spectral set. Note that only $X^{--}_\varepsilon$ actually depends on $s$. For the additive stochastic term in (11), we assume the following. Assumption 2.2. The noise process $\xi$, defined on an abstract probability space $(\Omega, \mathcal{A}, P)$, is the generalized derivative of some $Q$-Wiener process $\{\mathcal{W}(t)\}_{t \ge 0}$, given by the expansion
$$\mathcal{W}(t) = \sum_{k \in \mathbb{N}} \alpha_k \cdot \beta_k(t) \cdot f_k \quad \text{with} \quad Q f_k = \alpha_k^2 f_k, \tag{12}$$
where the $f_k$ are the eigenfunctions from Assumption 2.1. The operator $Q$ denotes the covariance operator of $\mathcal{W}$. Furthermore, the sequence $\{\beta_k\}_{k \in \mathbb{N}}$ consists of independent real-valued standard Brownian motions over $(\Omega, \mathcal{A}, P)$. For more details we refer the reader to [12, Sect. 4.1]. The specific case $\alpha_k = 1$ for all $k \in \mathbb{N}$ is of particular importance, since it corresponds to the case of space-time white noise. We denote this specific $Q$-Wiener process by $W$. Notice also that due to the above Fourier series expansion, the noise process conserves mass. The differential equation (11) is interpreted as a stochastic integral equation in the sense of Ito. For $\sigma = 1$, its solution is given by the Ito integral and Fourier series expansion
$$\mathcal{W}_{A_\varepsilon}(t) = \int_0^t e^{(t-\tau)A_\varepsilon}\, d\mathcal{W}(\tau) = \sum_{k \in \mathbb{N}} \alpha_k \cdot \int_0^t e^{(t-\tau)\lambda_{k,\varepsilon}}\, d\beta_k(\tau) \cdot f_k, \tag{13}$$
where the eigenvalues $\lambda_{k,\varepsilon}$ are defined as in (7). See for example [12]. The stochastic process $\{\mathcal{W}_{A_\varepsilon}(t)\}_{t \ge 0}$ is called stochastic convolution. In the case of space-time white noise, i.e., if $\alpha_k = 1$ for all $k \in \mathbb{N}$, we denote this process by $W_{A_\varepsilon}$.
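Each Fourier coefficient in (13) is a scalar Ornstein-Uhlenbeck-type integral $\int_0^t e^{(t-\tau)\lambda}\, d\beta(\tau)$, i.e. a centered Gaussian with variance $\int_0^t e^{2\tau\lambda}\, d\tau = (e^{2\lambda t} - 1)/(2\lambda)$ (equal to $t$ when $\lambda = 0$). The minimal Python sketch below samples one such coefficient with the exact one-step Gaussian recursion and compares its sample second moment with this formula. The helper names and parameter values are illustrative assumptions.

```python
import math
import random


def conv_variance(lam, t):
    """Variance of int_0^t exp((t - tau) * lam) d beta(tau)."""
    if lam == 0.0:
        return t
    return math.expm1(2.0 * lam * t) / (2.0 * lam)


def sample_mode(lam, t, n_steps, rng):
    """Sample one Fourier coefficient of the stochastic convolution at time t.

    Exact Gaussian recursion for dX = lam * X dt + d beta, X(0) = 0:
    X(s + dt) = exp(lam * dt) * X(s) + N(0, conv_variance(lam, dt)).
    """
    dt = t / n_steps
    decay = math.exp(lam * dt)
    step_sd = math.sqrt(conv_variance(lam, dt))
    x = 0.0
    for _ in range(n_steps):
        x = decay * x + step_sd * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
samples = [sample_mode(-1.0, 1.0, 10, rng) for _ in range(20000)]
empirical = sum(x * x for x in samples) / len(samples)
print(empirical, conv_variance(-1.0, 1.0))
```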
In the following, we will prove spinodal decomposition for the solution process $\sigma \cdot \mathcal{W}_{A_\varepsilon}$ of problem (11). For the sake of simplicity, we begin by considering only the case of space-time white noise in Subsect. 2.2, i.e., we assume $\alpha_k = 1$ for all $k \in \mathbb{N}$ and discuss the stochastic convolution $\sigma \cdot W_{A_\varepsilon}$. Colored noise is considered afterwards in Subsect. 2.3. It is well-known [12, Sect. 5.1.2] that $W_{A_\varepsilon}$ is a continuous Gaussian process in suitable spaces $H^s_*(G)$. In our situation we need $s + d/2 < 2$ to obtain existence in $C([0,T], H^s_*(G))$. If $s \in \mathbb{N}_0$, then the solution is $s$-times weakly differentiable with respect to the spatial variable $x$. For the case $s = 0$ we refer the reader to [12], for the case $s = 1$ compare [11, Prop. 1.2]. The general case can be obtained by a quite technical extension of the ideas in the cited articles using the Kolmogorov test. See Blömker [3, Sect. 2.2.2] for more details. Before going on to the next subsection, we introduce the following notation. Consider the decomposition of the spectrum of $A_\varepsilon$ and the corresponding subspaces $X^{++}_\varepsilon$, $X^+_\varepsilon$, $X^-_\varepsilon$, and $X^{--}_\varepsilon$ of $H^s_*(G)$ as in Assumption 2.1. Then let $W^{++}_{A_\varepsilon}(t)$, $W^+_{A_\varepsilon}(t)$, $W^-_{A_\varepsilon}(t)$, and $W^{--}_{A_\varepsilon}(t)$ denote the projections of $W_{A_\varepsilon}$ onto the corresponding subspaces of $H^s_*(G)$. It can readily be verified that these projections are orthogonal both in $H^s_*(G)$ and in $L^2(G)$. 2.2. Auxiliary results. Throughout this subsection we assume that Assumptions 2.1 and 2.2 are satisfied. Furthermore, in this subsection only the case of space-time white noise is considered, i.e., we assume $\alpha_k = 1$ for all $k \in \mathbb{N}$. Using this setting, we derive three estimates which together imply that with high probability the solution of (11) stays close to the unstable space $Y_\varepsilon$ in $H^s_*(G)$ defined in (9) before exiting a ball of a certain radius. Here $s$ denotes a real constant which will be specified in more detail later.
The first of these results, Proposition 2.3, determines the time $t_{e,\varepsilon} > 0$ at which the probability is high for the $X^{++}_\varepsilon$-component of the stochastic convolution to leave a ball of radius $r > 0$ in $H^s_*(G)$. The remaining two results, presented in Propositions 2.6 and 2.8, respectively, derive bounds for the probability that the $X^-_\varepsilon$- and the $X^{--}_\varepsilon$-component of the stochastic convolution is bounded by a small quantity for all $t \in [0, t_{e,\varepsilon}]$. Unlike in the deterministic case, one of the biggest difficulties is to derive estimates for the $X^{--}_\varepsilon$-component $W^{--}_{A_\varepsilon}$, because in contrast to the deterministic theory this component will not decay at all. So the problem cannot be reduced to a finite-dimensional one. We begin our presentation with Proposition 2.3, which basically says that the probability for the $X^{++}_\varepsilon$-component and therefore the $Y_\varepsilon$-component of the solution $\sigma W_{A_\varepsilon}$ of (11) to not exit a given ball $B_r(0) \subset H^s_*(G)$ is small, provided $t$ is sufficiently large. Proposition 2.3. Let $n_\varepsilon = \dim X^{++}_\varepsilon$. Then there exist constants $c, C > 0$ such that for all $r, \sigma > 0$, and every small enough $\varepsilon > 0$ the estimate
$$P\left(\left\|\sigma W^{++}_{A_\varepsilon}(t)\right\|_s < r\right) \le 2^{-n_\varepsilon/2} \le 2^{-c\varepsilon^{-d}}$$
is satisfied for arbitrary times
$$t \ge t_{e,\varepsilon} := \frac{1}{2\lambda^{\max}_\varepsilon \cdot \gamma^+} \cdot \ln\left(\frac{r^2}{\sigma^2} \cdot \frac{C \cdot \varepsilon^{2s-2}}{\Gamma(n_\varepsilon/2+1)^{2/n_\varepsilon}} + 1\right), \tag{14}$$
where $\lambda^{\max}_\varepsilon$ is defined in (8). The constant $c$ depends only on $\gamma^+$ and the constants $c_G$ and $C_G$ from (6), the constant $C$ depends only on $\gamma^+$ and $s$.
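Formula (14) and the Stirling asymptotics used in Remark 2.4 can be evaluated directly. The sketch below is illustrative only: the constant $C$ in (14) is not specified in the text, so it is exposed as a parameter with an assumed default of 1, and $\Gamma(n_\varepsilon/2+1)^{2/n_\varepsilon}$ is computed through `math.lgamma` to avoid overflow. The loop shows that $\Gamma(n/2+1)^{2/n}/(n/2)$ approaches $1/e$, which is the precise form of the proportionality $\Gamma(n_\varepsilon/2+1)^{2/n_\varepsilon} \sim n_\varepsilon/2$.

```python
import math


def gamma_factor(n):
    """Gamma(n/2 + 1) ** (2/n), computed via log-Gamma to avoid overflow."""
    return math.exp(2.0 / n * math.lgamma(n / 2.0 + 1.0))


def exit_time(r, sigma, eps, s, n_eps, gamma_plus, C=1.0):
    """t_{e,eps} from (14); C stands in for the unspecified constant (assumed 1)."""
    lam_max = 1.0 / (4.0 * eps**2)  # lambda_max = eps^{-2} / 4
    arg = (r**2 / sigma**2) * C * eps ** (2 * s - 2) / gamma_factor(n_eps) + 1.0
    return math.log(arg) / (2.0 * lam_max * gamma_plus)


# Stirling: Gamma(n/2 + 1)^(2/n) / (n/2) -> 1/e as n grows.
for n in (10, 1000, 10**6):
    print(n, gamma_factor(n) / (n / 2.0), 1.0 / math.e)
print(exit_time(r=1.0, sigma=1.0, eps=0.1, s=0.0, n_eps=10, gamma_plus=0.9))
```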
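Remark 2.4. Notice that due to Stirling’s formula and the definition of the $\Gamma$-function we have $\Gamma(n_\varepsilon/2+1)^{2/n_\varepsilon} \sim n_\varepsilon/2 \sim \varepsilon^{-d}$. Hence
$$\ln\left(c_e \cdot \frac{r^2}{\sigma^2} \cdot \varepsilon^{2s-2+d} + 1\right) \le t_{e,\varepsilon} \cdot 2\lambda^{\max}_\varepsilon \gamma^+ \le \ln\left(C_e \cdot \frac{r^2}{\sigma^2} \cdot \varepsilon^{2s-2+d} + 1\right) \tag{15}$$
for suitable positive constants $c_e$ and $C_e$ which depend only on $\gamma^+$, $s$, and the constants from (6). Thus, as $\varepsilon \to 0$, both sides of (15) are proportional to $\varepsilon^{2s+d-2}$ if $2s + d \ge 2$, and they are proportional to $\ln(\varepsilon^{-1})$ if $2s + d < 2$, provided $r$ and $\sigma$ are fixed. Proof. Using the $L^2$-Fourier series representation given in (13) we obtain
$$(-\Delta)^{s/2} W^{++}_{A_\varepsilon}(t) = \sum_{k:\, \lambda_{k,\varepsilon} \in \Sigma^{++}_\varepsilon} \mu_k^{s/2} \cdot \int_0^t e^{(t-\tau)\lambda_{k,\varepsilon}}\, d\beta_k(\tau) \cdot f_k .$$
The right-hand side can be interpreted as a normal distribution on $\mathbb{R}^{n_\varepsilon}$, where $n_\varepsilon = |\Sigma^{++}_\varepsilon| \sim \varepsilon^{-d}$. More precisely, the random variable on the right-hand side is $\mathcal{N}(0, \mathrm{diag}(I_1^2, \ldots, I_{n_\varepsilon}^2))$-distributed, with $I_j > 0$ such that
$$\left(I_j^2\right)_{j=1,\ldots,n_\varepsilon} = \left(\mu_k^s \cdot \int_0^t e^{2\tau\lambda_{k,\varepsilon}}\, d\tau\right)_{\lambda_{k,\varepsilon} \in \Sigma^{++}_\varepsilon} . \tag{16}$$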
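Denote by $B(r) := \{x \in \mathbb{R}^{n_\varepsilon} : |x| < r\}$ the open ball around $0 \in \mathbb{R}^{n_\varepsilon}$ with radius $r > 0$. Using the density of the normal distribution and substitution one then obtains
$$\begin{aligned}
P\left(\|\sigma W^{++}_{A_\varepsilon}(t)\|_s < r\right) &= P\left(\|(-\Delta)^{s/2} W^{++}_{A_\varepsilon}(t)\|_{L^2(G)} < r/\sigma\right) = \mathcal{N}\left(0, \mathrm{diag}(I_1^2, \ldots, I_{n_\varepsilon}^2)\right)\left(B(r/\sigma)\right) \\
&= (2\pi)^{-n_\varepsilon/2} \cdot \prod_{j=1}^{n_\varepsilon} I_j^{-1} \cdot \int_{B(r/\sigma)} e^{-\frac12 \sum_{j=1}^{n_\varepsilon} x_j^2/I_j^2}\, d^{n_\varepsilon}x \\
&= (2\pi)^{-n_\varepsilon/2} \cdot \int_{\mathrm{diag}(I_1^{-1}, \ldots, I_{n_\varepsilon}^{-1}) \cdot B(r/\sigma)} e^{-|x|^2/2}\, d^{n_\varepsilon}x \\
&\le (2\pi)^{-n_\varepsilon/2} \cdot \mathrm{vol}\left(B\left(r/[\sigma \min\{I_1, \ldots, I_{n_\varepsilon}\}]\right)\right) = \left(\frac{r}{\sqrt{2}\,\sigma \cdot \min\{I_1, \ldots, I_{n_\varepsilon}\}}\right)^{n_\varepsilon} \Gamma(n_\varepsilon/2+1)^{-1} .
\end{aligned}$$
For $\lambda_{k,\varepsilon} \in \Sigma^{++}_\varepsilon$ we have $\lambda_{k,\varepsilon} = -\varepsilon^2\mu_k^2 + \mu_k \ge \gamma^+ \lambda^{\max}_\varepsilon$, and therefore
$$\frac{1}{4\varepsilon^2} - \gamma^+ \lambda^{\max}_\varepsilon \ge \varepsilon^2 \cdot \left(\mu_k - \frac{1}{2\varepsilon^2}\right)^2 .$$
For $\gamma^+$ near 1, and using the fact that $\lambda^{\max}_\varepsilon = \varepsilon^{-2}/4$, we derive
$$\left|\mu_k - \frac{1}{2\varepsilon^2}\right| \le \frac{1}{2\varepsilon^2} \cdot \sqrt{1 - \gamma^+} .$$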
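The only probabilistic input of this step is the elementary bound $P(|X| < \rho) \le (\rho/(\sqrt2 \min_j I_j))^{n}\,\Gamma(n/2+1)^{-1}$ for a centered Gaussian vector $X$ with independent coordinates of standard deviations $I_j$. A quick Monte Carlo sanity check in Python might look as follows; the values of $I_j$ and $\rho$ are arbitrary illustrative assumptions.

```python
import math
import random


def ball_prob_bound(rho, stds):
    """(rho / (sqrt(2) * min_j I_j))^n / Gamma(n/2 + 1), the bound used in the proof."""
    n = len(stds)
    return (rho / (math.sqrt(2.0) * min(stds))) ** n / math.gamma(n / 2.0 + 1.0)


def ball_prob_mc(rho, stds, n_samples, rng):
    """Monte Carlo estimate of P(|X| < rho) for X ~ N(0, diag(I_1^2, ..., I_n^2))."""
    hits = 0
    for _ in range(n_samples):
        sq = sum(rng.gauss(0.0, sd) ** 2 for sd in stds)
        if sq < rho * rho:
            hits += 1
    return hits / n_samples


stds = [1.0, 2.0, 3.0]  # illustrative values of I_1, I_2, I_3
print(ball_prob_mc(0.5, stds, 100000, random.Random(1)), ball_prob_bound(0.5, stds))
```

For $n = 1$ the bound reduces to $2\rho/\sqrt{2\pi}$, the trivial estimate of $\operatorname{erf}(\rho/\sqrt2)$ by the peak density, so it is sharp for small $\rho$.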
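Thus, there exist positive constants $C$ and $D$ which depend only on $\gamma^+$ such that $C \cdot \varepsilon^{-2} \le \mu_k \le D \cdot \varepsilon^{-2}$ for all $k$ satisfying $\lambda_{k,\varepsilon} \in \Sigma^{++}_\varepsilon$. Together with (16) this implies for arbitrary indices $j = 1, \ldots, n_\varepsilon$,
$$I_j^2 \ge \frac{C \cdot \varepsilon^{-2s}}{2\lambda^{\max}_\varepsilon \cdot \gamma^+} \cdot \left(e^{2t\cdot\lambda^{\max}_\varepsilon\cdot\gamma^+} - 1\right) = C \cdot \varepsilon^{2-2s} \cdot \left(e^{2t\cdot\lambda^{\max}_\varepsilon\cdot\gamma^+} - 1\right),$$
where the constant $C$ depends only on $\gamma^+$ and $s$. Altogether we obtain
$$P\left(\|\sigma W^{++}_{A_\varepsilon}(t)\|_s < r\right) \le \left(\frac{r}{\sigma}\right)^{n_\varepsilon} \cdot \left(\frac{C \cdot \varepsilon^{2s-2}}{e^{2t\cdot\lambda^{\max}_\varepsilon\cdot\gamma^+} - 1}\right)^{n_\varepsilon/2} \cdot \Gamma(n_\varepsilon/2+1)^{-1} . \tag{17}$$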
It can easily be verified that the right-hand side of (17) is bounded above by $2^{-n_\varepsilon/2}$ if and only if $t \ge t_{e,\varepsilon}$, which completes the proof of the proposition. Remark 2.5. Obviously one can neglect some of the modes in $X^{++}_\varepsilon$; it is certainly enough to consider a large enough subspace. To be more precise, consider $I^* \subset \mathbb{N}$ with $\lim_{N\to\infty} |\{k \le N : k \in I^*\}|/N > 0$. If one replaces $W$ by $W^*(t) := \sum_{k \in I^*} \beta_k(t) f_k$, then the conclusions of Proposition 2.3 still hold. The remaining two propositions of this section give upper bounds for the probability that $W^-_{A_\varepsilon}(t)$ or $W^{--}_{A_\varepsilon}(t)$ leaves a $\delta$-neighborhood of $Y_\varepsilon$ in $H^s_*(G)$ for some time $t < t_{e,\varepsilon}$, where $t_{e,\varepsilon}$ is defined in Proposition 2.3. We begin by considering the case of $W^-_{A_\varepsilon}(t)$ in the proposition below; the case of $W^{--}_{A_\varepsilon}(t)$ will be considered in Proposition 2.8. Proposition 2.6. For every $p \in \mathbb{N}$ there exists a constant $C$ depending only on $p$, $s$, $\gamma^+$, $\gamma^-$, and the constants in (6) such that for all positive $r$, $\delta$, $\sigma$ and small $\varepsilon > 0$ the following estimate holds:
$$P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \|\sigma W^-_{A_\varepsilon}(t)\|_s \ge \delta\right) \le C \left(\frac{\sigma^2}{\delta^2} \cdot \varepsilon^2 \cdot \Xi_\varepsilon \cdot \left(\left(C_e \cdot \frac{r^2}{\sigma^2} \cdot \varepsilon^{d+2s-2} + 1\right)^{\gamma^-/\gamma^+} - 1\right)\right)^p,$$
where $t_{e,\varepsilon}$ was defined in Proposition 2.3, $C_e$ in Remark 2.4, and
$$\Xi_\varepsilon := \begin{cases} \varepsilon^{-2s-d} & \text{for } s > -d/2, \\ \ln(\varepsilon^{-1}) & \text{for } s = -d/2, \\ 1 & \text{for } s < -d/2. \end{cases}$$
Ultimately we want to show that the right-hand side in the main estimate of the above proposition is small in $\varepsilon$. However, we will achieve this only in a simplified form, by prescribing polynomial $\varepsilon$-dependence of both $\sigma$ and $r$. This will be discussed in detail in our main result Theorem 2.11. For the proof of Proposition 2.6 we need the following lemma.
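Lemma 2.7. There exists a positive constant $C$ which depends only on $\gamma^-$, $s$, and the constants in (6) such that
$$\sum_{\{k:\, \lambda_{k,\varepsilon} \in \Sigma^-_\varepsilon\}} \mu_k^s \le C \cdot \Xi_\varepsilon$$
for all sufficiently small $\varepsilon > 0$. Proof. Let $N_\varepsilon := \max\{k : \lambda_{k,\varepsilon} \in \Sigma^-_\varepsilon\}$, i.e., $\lambda_{k,\varepsilon} \in \Sigma^-_\varepsilon$ implies $1 \le k \le N_\varepsilon$. Analogous to the proof of Proposition 2.3 one obtains $\mu_{N_\varepsilon} \sim \varepsilon^{-2}$ as $\varepsilon \to 0$, with proportionality constants depending only on $\gamma^-$. In view of $\mu_k \sim k^{2/d}$ (cf. (6)) this implies $N_\varepsilon \sim \varepsilon^{-d}$, where the proportionality constants depend only on $\gamma^-$ and on the constants in (6). Now
$$\sum_{k:\, \lambda_{k,\varepsilon} \in \Sigma^-_\varepsilon} \mu_k^s \le C \sum_{k=1}^{N_\varepsilon} k^{2s/d} \le C\left(1 + \int_1^{N_\varepsilon+1} \tau^{2s/d}\, d\tau\right)$$
is satisfied, which immediately implies the stated upper bounds.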
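Since $N_\varepsilon \sim \varepsilon^{-d}$, the three cases in the definition of $\Xi_\varepsilon$ are just the three growth regimes of $\sum_{k \le N} k^{2s/d}$: $N^{2s/d+1} \sim \varepsilon^{-2s-d}$, $\ln N \sim \ln(\varepsilon^{-1})$, or bounded. The short Python sketch below (hypothetical function names; $d = 1$ and $N = 10^5$ are arbitrary choices) prints the ratio of the sum to the predicted growth, which should stay bounded in each regime.

```python
import math


def power_sum(N, s, d):
    """sum_{k=1}^{N} k^(2s/d): the model sum in the proof of Lemma 2.7."""
    a = 2.0 * s / d
    return math.fsum(k ** a for k in range(1, N + 1))


def xi_scale(N, s, d):
    """Predicted growth in N: N^(2s/d + 1), ln N, or 1 (with N ~ eps^(-d) this is Xi_eps)."""
    if s > -d / 2:
        return N ** (2.0 * s / d + 1.0)
    if s == -d / 2:
        return math.log(N)
    return 1.0


N = 10**5
for s in (0.5, -0.5, -1.0):  # d = 1: the regimes s > -d/2, s = -d/2, s < -d/2
    print(s, power_sum(N, s, 1) / xi_scale(N, s, 1))
```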
Proof of Proposition 2.6. The proof is divided into two parts. To this end, we partition the spectral set $\Sigma^-_\varepsilon$ into the subset $\Sigma^-_{\varepsilon>}$ of nonnegative and the subset $\Sigma^-_{\varepsilon<}$ of strictly negative eigenvalues. This partition induces a splitting of the space $X^-_\varepsilon$, as well as corresponding restrictions of the linear operator $A_\varepsilon$. We denote the part of $A_\varepsilon$ corresponding to eigenvalues in $\Sigma^-_{\varepsilon>}$ by $B_>$, and the part of $A_\varepsilon$ corresponding to eigenvalues in $\Sigma^-_{\varepsilon<}$ by $B_<$. To begin with, we consider the nonnegative part $B_>$ of $A^-_\varepsilon$. We show that $\|W^-_{B_>}(t)\|_s^2$ is a continuous submartingale with respect to the standard filtration $\{\mathcal{F}_t\}_{t\ge0}$ induced by $W$. The continuity is a consequence of the continuity of the stochastic process $W_{A_\varepsilon}$. To prove the submartingale property, we consider for $t_2 > t_1$,
$$\begin{aligned}
E\left(\|W^-_{B_>}(t_2)\|_s^2 \,\middle|\, \mathcal{F}_{t_1}\right) &= E\left(\left\|\int_0^{t_1} e^{(t_2-\tau)B_>}\, dW(\tau) + \int_{t_1}^{t_2} e^{(t_2-\tau)B_>}\, dW(\tau)\right\|_s^2 \,\middle|\, \mathcal{F}_{t_1}\right) \\
&= \left\|\int_0^{t_1} e^{(t_2-\tau)B_>}\, dW(\tau)\right\|_s^2 + E\left\|\int_{t_1}^{t_2} e^{(t_2-\tau)B_>}\, dW(\tau)\right\|_s^2 .
\end{aligned}$$
Here we used the stochastic independence of the increments of $W$ and standard properties of conditional expectations. The mixed term vanishes, since the mean value of the stochastic convolution does. Using the fact that the definition of $B_>$ implies $\|e^{(t_2-t_1)B_>}u\|_s \ge \|u\|_s$ for all $u \in X^-_{\varepsilon>}$, one further deduces
$$E\left(\|W^-_{B_>}(t_2)\|_s^2 \,\middle|\, \mathcal{F}_{t_1}\right) \ge \left\|e^{(t_2-t_1)B_>}\int_0^{t_1} e^{(t_1-\tau)B_>}\, dW(\tau)\right\|_s^2 \ge \|W^-_{B_>}(t_1)\|_s^2 .$$
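The submartingale inequality in [19, Chapter I, Theorem 3.8(i)], together with the estimate for the higher moments of a normal distribution in [12, Corollary 2.17], now implies
$$\begin{aligned}
P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \|\sigma W^-_{B_>}(t)\|_s \ge \delta\right) &\le \left(\frac{\sigma}{\delta}\right)^{2p} \cdot \sup_{t\in[0,t_{e,\varepsilon}]} E\|W^-_{B_>}(t)\|_s^{2p} \le C \cdot \left(\frac{\sigma}{\delta}\right)^{2p} \cdot \sup_{t\in[0,t_{e,\varepsilon}]} \left(E\|W^-_{B_>}(t)\|_s^2\right)^p \\
&= C \cdot \left(\frac{\sigma}{\delta}\right)^{2p} \cdot \sup_{t\in[0,t_{e,\varepsilon}]} \left(\sum_{\lambda_{k,\varepsilon}\in\Sigma^-_{\varepsilon>}} \mu_k^s \cdot \int_0^t e^{2\tau\lambda_{k,\varepsilon}}\, d\tau\right)^p \\
&\le C \cdot \left(\frac{\sigma^2}{\delta^2} \cdot \Xi_\varepsilon \cdot \varepsilon^2 \cdot \left(e^{2t_{e,\varepsilon}\lambda^{\max}_\varepsilon\gamma^-} - 1\right)\right)^p,
\end{aligned}$$
where Lemma 2.7 was applied and $\Xi_\varepsilon$ denotes the constant defined in the formulation of Proposition 2.6. The constant $C$ depends on $p$, $s$, $\gamma^-$, and the constants in (6). With the definition of $t_{e,\varepsilon}$ from (14) one further obtains
$$P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \|\sigma W^-_{B_>}(t)\|_s \ge \delta\right) \le C \cdot \left(\frac{\sigma^2}{\delta^2} \cdot \Xi_\varepsilon \cdot \varepsilon^2 \cdot \left(\left(C_e \cdot \frac{r^2}{\sigma^2} \cdot \varepsilon^{d+2s-2} + 1\right)^{\gamma^-/\gamma^+} - 1\right)\right)^p .$$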
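Now consider the second part $B_<$ of the operator $A^-_\varepsilon$. Since all eigenvalues of $B_<$ are negative and the operator is defined on a finite-dimensional space, it is the generator of a group $\{e^{tB_<}\}_{t\in\mathbb{R}}$ with $\|e^{tB_<}u\|_s \le \|u\|_s$ for all $t \ge 0$ and $u \in X^-_{\varepsilon<}$. Furthermore, due to [12, Theorem 4.12] stochastic integrals are continuous martingales, provided the integrand does not depend on the limits of integration. Thus, the martingale inequalities of [12, Theorem 3.8] and the estimate for higher moments of normal distributions yield
$$\begin{aligned}
P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \|\sigma W^-_{B_<}(t)\|_s \ge \delta\right) &\le P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \left\|\sigma \int_0^t e^{-\tau B_<}\, dW(\tau)\right\|_s \ge \delta\right) \\
&\le C \cdot \left(\frac{\sigma}{\delta}\right)^{2p} \cdot \sup_{t\in[0,t_{e,\varepsilon}]} \left(E\left\|\int_0^t e^{-\tau B_<}\, dW(\tau)\right\|_s^2\right)^p .
\end{aligned}$$
With
$$E\left\|\int_0^t e^{-\tau B_<}\, dW(\tau)\right\|_s^2 \le \sum_{\lambda_{k,\varepsilon}\in\Sigma^-_{\varepsilon<}} \mu_k^s \cdot \int_0^t e^{2\tau|\lambda_{k,\varepsilon}|}\, d\tau$$
and Lemma 2.7, we can now proceed as in the above first part of the proof to obtain an estimate similar to the one for $B_>$. (Notice that for $\lambda_{k,\varepsilon} \in \Sigma^-_{\varepsilon<}$ we have $|\lambda_{k,\varepsilon}| \le \gamma^-\lambda^{\max}_\varepsilon$.) Combining these estimates with
$$\|W^-_{A_\varepsilon}(t)\|_s^2 = \|W^-_{B_<}(t)\|_s^2 + \|W^-_{B_>}(t)\|_s^2$$
and
$$P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \sigma^2\|W^-_{A_\varepsilon}(t)\|_s^2 \ge \delta^2\right) \le P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \sigma^2\|W^-_{B_>}(t)\|_s^2 \ge \delta^2/2\right) + P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \sigma^2\|W^-_{B_<}(t)\|_s^2 \ge \delta^2/2\right)$$
completes the proof of Proposition 2.6.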
Finally, we consider the $X^{--}_\varepsilon$-component of $W_{A_\varepsilon}(t)$, i.e., the infinite-dimensional, yet contractive part. Similar to the case of Proposition 2.6, we derive an upper bound on the probability that $W^{--}_{A_\varepsilon}(t)$ leaves a $\delta$-neighborhood of $Y_\varepsilon$ in $H^s_*(G)$ for some time $t < t_{e,\varepsilon}$, where $t_{e,\varepsilon}$ is defined in Proposition 2.3. More precisely, we prove the following. Proposition 2.8. Assume $2s + d < 4$. Then for any sufficiently large integer $p$ there exists a constant $C$ depending only on $p$, $s$, $d$, $\gamma^-$, $\gamma^+$, and the constants in (6), such that
$$P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \|\sigma W^{--}_{A_\varepsilon}(t)\|_s \ge \delta\right) \le C\left(\frac{\sigma^2}{\delta^2} \cdot \varepsilon^{2-2s-d}\right)^p \cdot \varepsilon^{-2} \cdot t_{e,\varepsilon},$$
where $t_{e,\varepsilon}$ is defined in (14). As we already pointed out after the statement of Proposition 2.6, we ultimately want to show that the right-hand side in Proposition 2.8 is small with respect to $\varepsilon$. This will be discussed in detail in our main result, Theorem 2.11. Notice that the condition $2s + d < 4$ in the above proposition is the regularity condition assuring that the stochastic convolutions $W_{\Delta^2}(t)$ and $W_{A_\varepsilon}(t)$ exist in the space $H^s_*(G)$. For the proof of Proposition 2.8 we need an estimate on certain weighted sums of the eigenvalues $\mu_k$, where the sum is taken only over those eigenvalues contributing to $\Sigma^{--}_\varepsilon$. To this end, let
$$N^-_\varepsilon := \inf\left\{k : \lambda_{k,\varepsilon} < -\gamma^- \cdot \lambda^{\max}_\varepsilon\right\}.$$
Due to the asymptotic behavior of $\lambda_{k,\varepsilon}$, arguments similar to the ones in Lemma 2.7 imply $N^-_\varepsilon \sim \varepsilon^{-d}$, with proportionality constant depending on $\gamma^-$ and the constants in (6). Furthermore, we obtain the following result. Lemma 2.9. Let $s \in \mathbb{R}$. Then there exist positive constants $c$ and $C$ such that for all $\varepsilon > 0$ and $\tau > 0$ the estimate
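$$\sum_{k=N^-_\varepsilon}^\infty \mu_k^s \cdot e^{2\tau\lambda_{k,\varepsilon}} \le C \cdot \left(\left(\tau\varepsilon^2\right)^{-(2s+d)/4} + \Theta_{\varepsilon,\tau}\right) \cdot e^{-c\varepsilon^{-2}\tau}$$
holds with
$$\Theta_{\varepsilon,\tau} := \begin{cases} \varepsilon^{-2s-d} & \text{for } s < -d/2,\ \tau < \varepsilon^2, \\ -\ln(\tau\varepsilon^{-2}) & \text{for } s = -d/2,\ \tau < \varepsilon^2, \\ 0 & \text{for the remaining cases.} \end{cases} \tag{18}$$
The constants $c$ and $C$ depend on $\gamma^-$ and the constants in (6); the constant $C$ also depends on $s$ and $d$. Proof. A straightforward calculation yields
$$\lambda_{k,\varepsilon} < -\gamma^-\lambda^{\max}_\varepsilon \quad \text{if and only if} \quad \mu_k > \frac{1 + \sqrt{1+\gamma^-}}{2\varepsilon^2},$$
and for any $c_0 \in (0,1)$ we have
$$\lambda_{k,\varepsilon} < -c_0\varepsilon^2\mu_k^2 \quad \text{if and only if} \quad \mu_k > \frac{1}{(1-c_0)\varepsilon^2} .$$
If we choose $c_0$ depending on $\gamma^-$ such that
$$\frac{1}{1-c_0} = \frac{1 + \sqrt{1+\gamma^-}}{2},$$
then for all $k \ge N^-_\varepsilon$ we obtain
$$\lambda_{k,\varepsilon} < -c_0\varepsilon^2\mu_k^2 . \tag{19}$$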
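For $s \ge 0$ and $c > 0$ it can easily be verified that
$$\max_{x\ge0} x^{2s/d} e^{-(c\tau\varepsilon^2x^{4/d})/3} = \left(\frac{3s}{2e\cdot c\tau\varepsilon^2}\right)^{s/2} . \tag{20}$$
Furthermore, for all $T > 0$ one obviously has
$$\max_{x\ge T} e^{-(c\tau\varepsilon^2x^{4/d})/3} = e^{-(c\tau\varepsilon^2T^{4/d})/3} . \tag{21}$$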
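Two elementary facts in this proof are easy to check numerically: the choice of $c_0$ behind (19) and the maximum in (20). The Python sketch below does both for illustrative parameter values; it assumes only $\lambda_{k,\varepsilon} = -\varepsilon^2\mu_k^2 + \mu_k$, treats $\mu$ as a continuous variable, and replaces the exact maximum by a brute-force grid search. All function names are hypothetical.

```python
import math


def c0_from_gamma(gamma_minus):
    """c_0 defined by 1/(1 - c_0) = (1 + sqrt(1 + gamma^-)) / 2, as chosen before (19)."""
    return 1.0 - 2.0 / (1.0 + math.sqrt(1.0 + gamma_minus))


def check_19(eps, gamma_minus, mus):
    """True if lambda < -gamma^- lambda_max implies lambda < -c_0 eps^2 mu^2 on the sample."""
    c0 = c0_from_gamma(gamma_minus)
    lam_max = 1.0 / (4.0 * eps**2)
    for mu in mus:
        lam = -eps**2 * mu**2 + mu
        if lam < -gamma_minus * lam_max and not lam < -c0 * eps**2 * mu**2:
            return False
    return True


def max_20_closed(s, c, tau, eps):
    """Right-hand side of (20)."""
    return (3.0 * s / (2.0 * math.e * c * tau * eps**2)) ** (s / 2.0)


def max_20_grid(s, d, c, tau, eps, x_max=50.0, n=200000):
    """Brute-force maximum of x^(2s/d) * exp(-c*tau*eps^2*x^(4/d)/3) over a grid in (0, x_max]."""
    best = 0.0
    for i in range(1, n + 1):
        x = x_max * i / n
        best = max(best, x ** (2.0 * s / d) * math.exp(-c * tau * eps**2 * x ** (4.0 / d) / 3.0))
    return best


print(c0_from_gamma(0.5), check_19(0.1, 0.5, [0.5 * i for i in range(1, 10001)]))
print(max_20_grid(1.0, 2, 1.0, 1.0, 0.5), max_20_closed(1.0, 1.0, 1.0, 0.5))
```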
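Using $\mu_k \in [c_G, C_G]\cdot k^{2/d}$ and $N^-_\varepsilon \in [c, C]\cdot\varepsilon^{-d}$, we obtain for any $s \ge 0$,
$$\begin{aligned}
\sum_{k=N^-_\varepsilon}^\infty \mu_k^s \cdot e^{2\tau\lambda_{k,\varepsilon}} &\overset{(19)}{\le} C\sum_{k=N^-_\varepsilon}^\infty k^{2s/d} e^{-c\tau\varepsilon^2k^{4/d}} \overset{(20)}{\le} C\left(\tau\varepsilon^2\right)^{-s/2} \cdot \sum_{k=N^-_\varepsilon}^\infty e^{-(2c\tau\varepsilon^2k^{4/d})/3} \\
&\overset{(21)}{\le} C\left(\tau\varepsilon^2\right)^{-s/2} \cdot e^{-(c\tau\varepsilon^2(N^-_\varepsilon)^{4/d})/3} \cdot \sum_{k=N^-_\varepsilon}^\infty e^{-(c\tau\varepsilon^2k^{4/d})/3} \\
&\le C\left(\tau\varepsilon^2\right)^{-s/2} \cdot e^{-c\tau\varepsilon^{-2}} \cdot \int_{N^-_\varepsilon-1}^\infty e^{-c\tau\varepsilon^2\eta^{4/d}}\, d\eta \\
&\le C\left(\tau\varepsilon^2\right)^{-(2s+d)/4} \cdot e^{-c\tau\varepsilon^{-2}} \cdot \int_0^\infty e^{-c\eta^{4/d}}\, d\eta
\end{aligned} \tag{22}$$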
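for positive constants, all denoted by $C$ and $c$. For the remaining case $s < 0$ we cannot use (20), but we directly estimate the sum by an integral. Analogous to (22) we obtain for $s < 0$,
$$\sum_{k=N^-_\varepsilon}^\infty \mu_k^s \cdot e^{2\tau\lambda_{k,\varepsilon}} \le Ce^{-c\tau\varepsilon^{-2}} \cdot \int_{c\varepsilon^{-d}}^\infty \eta^{2s/d}e^{-c\tau\varepsilon^2\eta^{4/d}}\, d\eta = Ce^{-c\tau\varepsilon^{-2}} \cdot \left(\tau\varepsilon^2\right)^{-(2s+d)/4} \cdot \int_{c(\tau\varepsilon^{-2})^{d/4}}^\infty \eta^{2s/d}e^{-c\eta^{4/d}}\, d\eta .$$
For $0 > s > -d/2$ or $\tau \ge \varepsilon^2$ we therefore obtain the same result as in (22), since the last integral is bounded. Difficulties arise for $s \le -d/2$ and $\tau < \varepsilon^2$, because then the integrand $\eta^{2s/d}$ is not integrable at $0$ while the lower limit tends to $0$. However, if $s < -d/2$, then
$$\sum_{k=N^-_\varepsilon}^\infty \mu_k^s \cdot e^{2\tau\lambda_{k,\varepsilon}} \le C\left(\tau\varepsilon^2\right)^{-(2s+d)/4} \cdot \left(1 + \left(\tau\varepsilon^{-2}\right)^{(d/4)(2s/d+1)}\right) \cdot e^{-c\tau\varepsilon^{-2}} \le C \cdot \left(\left(\tau\varepsilon^2\right)^{-(2s+d)/4} + \varepsilon^{-2s-d}\right) \cdot e^{-c\tau\varepsilon^{-2}} .$$
The case $s = -d/2$ can be treated analogously.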
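Proof of Proposition 2.8. The proof is similar to the one of Proposition 7.3 in [12], where the factorization method is used. In our situation, however, we have to determine the constants more precisely. We have
$$W^{--}_{A_\varepsilon}(t) = \frac{\sin(\pi\alpha)}{\pi}\int_0^t (t-\tau)^{\alpha-1}e^{(t-\tau)A^{--}_\varepsilon}Y_\alpha(\tau)\, d\tau,$$
where $\alpha \in (0,1)$ will be fixed later, and
$$Y_\alpha(\tau) = \int_0^\tau (\tau-\vartheta)^{-\alpha}e^{(\tau-\vartheta)A^{--}_\varepsilon}\, dW(\vartheta),$$
which is a normal random variable. In preparation of our later estimates, assume that $\gamma > -1$ is arbitrary. Then a simple calculation furnishes
$$\int_0^\tau \vartheta^\gamma e^{-c\varepsilon^{-2}\vartheta}\, d\vartheta \le \int_0^\tau \vartheta^\gamma\, d\vartheta = (1+\gamma)^{-1}\tau^{1+\gamma},$$
as well as
$$\int_0^\tau \vartheta^\gamma e^{-c\varepsilon^{-2}\vartheta}\, d\vartheta \le \int_0^\infty \vartheta^\gamma e^{-c\varepsilon^{-2}\vartheta}\, d\vartheta = \int_0^\infty \vartheta^\gamma e^{-c\vartheta}\, d\vartheta \cdot \varepsilon^{2(1+\gamma)} .$$
Hence
$$\int_0^\tau \vartheta^\gamma e^{-c\varepsilon^{-2}\vartheta}\, d\vartheta \le C\cdot\min\left\{\varepsilon^2, \tau\right\}^{1+\gamma}, \tag{23}$$
where $C$ depends only on $c$ and $\gamma$.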
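Estimate (23) holds, for instance, with the explicit constant $C = \max\{(1+\gamma)^{-1}, \Gamma(1+\gamma)c^{-(1+\gamma)}\}$, obtained by combining the two displayed bounds and $\int_0^\infty \vartheta^\gamma e^{-c\vartheta}\, d\vartheta = \Gamma(1+\gamma)c^{-(1+\gamma)}$. The Python sketch below evaluates the truncated integral by the substitution $\vartheta = e^t$ and the trapezoidal rule, and compares it with this constant. The quadrature scheme, the explicit constant and the parameter values are illustrative choices, not part of the original proof.

```python
import math


def trunc_gamma_integral(gamma, c, eps, tau, n=20000):
    """int_0^tau theta^gamma * exp(-c * theta / eps^2) d theta, for gamma > -1.

    With theta = exp(t) this becomes int_{-inf}^{ln tau} exp(p*t - a*exp(t)) dt,
    p = 1 + gamma > 0, a = c / eps^2. The lower tail is cut where exp(p*t) is
    negligible and the smooth remaining integrand is summed by the trapezoidal rule.
    """
    p = 1.0 + gamma
    a = c / eps**2
    t_hi = math.log(tau)
    t_lo = min(t_hi, -math.log(a)) - 40.0 / p
    h = (t_hi - t_lo) / n
    vals = [math.exp(p * t - a * math.exp(t)) for t in (t_lo + i * h for i in range(n + 1))]
    return h * (math.fsum(vals) - 0.5 * (vals[0] + vals[-1]))


def bound_23(gamma, c, eps, tau):
    """C * min(eps^2, tau)^(1+gamma) with C = max(1/(1+gamma), Gamma(1+gamma)/c^(1+gamma))."""
    p = 1.0 + gamma
    C = max(1.0 / p, math.gamma(p) / c**p)
    return C * min(eps**2, tau) ** p


for g in (-0.5, 0.0, 1.5):
    for tau in (1e-3, 0.05, 10.0):
        print(g, tau, trunc_gamma_integral(g, 1.0, 0.1, tau), bound_23(g, 1.0, 0.1, tau))
```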
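Then Hölder’s inequality and the fact that $A^{--}_\varepsilon$ is the generator of a semigroup of contractions on $H^s_*(G)$ imply
$$\begin{aligned}
\|W^{--}_{A_\varepsilon}(t)\|_s^{2p} &\le C\left(\int_0^t (t-\tau)^{\alpha-1} \cdot e^{-(t-\tau)\lambda^{\max}_\varepsilon\gamma^-} \cdot \|Y_\alpha(\tau)\|_s\, d\tau\right)^{2p} \\
&\le C\left(\int_0^t \tau^{(\alpha-1)\cdot 2p/(2p-1)} \cdot e^{-c_p\tau\varepsilon^{-2}}\, d\tau\right)^{2p-1} \int_0^t \|Y_\alpha(\tau)\|_s^{2p}\, d\tau \\
&\overset{(23)}{\le} C\cdot\min\left\{t_{e,\varepsilon}, \varepsilon^2\right\}^{2p\alpha-1} \cdot \int_0^{t_{e,\varepsilon}} \|Y_\alpha(\tau)\|_s^{2p}\, d\tau
\end{aligned}$$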
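for all $t \in [0, t_{e,\varepsilon}]$, provided we have $\alpha > 1/(2p)$. (This will determine the lower bound on $p$, once we have specified $\alpha$.) Here and for the rest of the proof $C$ and $c$ denote generic constants, which change from place to place. The constant $C$ depends on $s$, $d$, $\gamma^-$, $\gamma^+$, $p$, $\alpha$, and the constants in (6), but $c$ depends only on $\gamma^-$ and the constants in (6). Consider $s \ne -d/2$. Then (18) implies $\Theta_{\varepsilon,\vartheta} \le \varepsilon^{-2s-d}$. Furthermore, together with the estimate for the higher moments of a normal distribution in [12, Corollary 2.17], the Ito isometry, and Lemma 2.9 we obtain
$$\begin{aligned}
E\sup_{t\in[0,t_{e,\varepsilon}]} \|W^{--}_{A_\varepsilon}(t)\|_s^{2p} &\le C\varepsilon^{4p\alpha-2}\cdot\int_0^{t_{e,\varepsilon}}\left(E\|Y_\alpha(\tau)\|_s^2\right)^p d\tau \\
&\le C\varepsilon^{4p\alpha-2}\cdot\int_0^{t_{e,\varepsilon}}\left(\int_0^\tau \sum_{k=N^-_\varepsilon}^\infty \mu_k^s\cdot\vartheta^{-2\alpha}e^{2\vartheta\lambda_{k,\varepsilon}}\, d\vartheta\right)^p d\tau \\
&\le C\varepsilon^{4p\alpha-2}\cdot\int_0^{t_{e,\varepsilon}}\left(\int_0^\tau \vartheta^{-2\alpha}\left(\left(\vartheta\varepsilon^2\right)^{-(2s+d)/4} + \Theta_{\varepsilon,\vartheta}\right)e^{-c\vartheta\varepsilon^{-2}}\, d\vartheta\right)^p d\tau \\
&\overset{(23)}{\le} C\varepsilon^{4p\alpha-2}\cdot\int_0^{t_{e,\varepsilon}}\left(\varepsilon^{2(1-2\alpha-(2s+d)/4)}\varepsilon^{-(2s+d)/2} + \varepsilon^{2(1-2\alpha)}\varepsilon^{-(2s+d)}\right)^p d\tau \\
&\le C\varepsilon^{-2}\cdot t_{e,\varepsilon}\cdot\varepsilon^{p(2-2s-d)},
\end{aligned} \tag{24}$$
provided we have both $4\alpha < 2 - s - d/2$ and $\alpha < 1/2$. According to our assumptions, $2 - s - d/2 > 0$, and therefore it is possible to apply (23) in the above estimates as long as
$$0 < \alpha < \frac{4-2s-d}{8} \quad \text{and} \quad 0 < \alpha < \frac12,$$
which we assume from now on. The bound for $s = -d/2$ is derived analogously, but in addition we have to use
$$\int_0^\tau \vartheta^{-2\alpha}\cdot\ln\left(\varepsilon^2\vartheta^{-1}\right)\cdot e^{-c\vartheta\varepsilon^{-2}}\, d\vartheta \le \varepsilon^{2(1-2\alpha)}\cdot\int_0^\infty \vartheta^{-2\alpha}\cdot\left|\ln\left(\vartheta^{-1}\right)\right|\cdot e^{-c\vartheta}\, d\vartheta .$$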
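The constraints on the factorization exponent $\alpha$ and on $p$ collected in this proof can be bundled into a small helper. The sketch below (the name `factorization_parameters` and the midpoint choice of $\alpha$ are hypothetical illustrations) returns one admissible $\alpha$ and the smallest integer $p$ with $\alpha > 1/(2p)$; the proposition itself only asks for $p$ sufficiently large.

```python
def factorization_parameters(s, d):
    """Pick alpha and the smallest admissible integer p for the proof of Prop. 2.8.

    Constraints used in the proof: 0 < alpha < min(1/2, (4 - 2s - d)/8)
    (so that (23) applies) and alpha > 1/(2p) (for the Hoelder step).
    """
    if 2 * s + d >= 4:
        raise ValueError("Proposition 2.8 requires 2s + d < 4")
    alpha_sup = min(0.5, (4.0 - 2.0 * s - d) / 8.0)
    alpha = alpha_sup / 2.0  # any value in (0, alpha_sup) works; take the midpoint
    p = 1
    while not alpha > 1.0 / (2 * p):
        p += 1
    return alpha, p


print(factorization_parameters(0.0, 1))   # d = 1, L^2 setting
print(factorization_parameters(-1.0, 1))
```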
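Finally the Markov inequality yields
$$P\left(\sup_{t\in[0,t_{e,\varepsilon}]} \|\sigma W^{--}_{A_\varepsilon}(t)\|_s \ge \delta\right) \le \left(\frac{\sigma}{\delta}\right)^{2p} E\sup_{t\in[0,t_{e,\varepsilon}]} \|W^{--}_{A_\varepsilon}(t)\|_s^{2p} .$$
Together with (24) this completes the proof of the proposition.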
2.3. Main theorem. The three main results of the last subsection consider only the case of space-time white noise with strength $\sigma$, i.e., they assume $\alpha_k = 1$ for all $k \in \mathbb{N}$ in (12). In this subsection we consider more regular noise, i.e., colored noise, for the following two reasons. On the one hand, solutions in the linear theory can only be controlled with respect to the $H^s(G)$-norm, provided we assume $s < 2 - d/2$ due to the regularity condition in Proposition 2.8. However, in order to verify the existence of characteristic patterns of the solutions we need estimates in Sobolev norms that dominate the $L^\infty$-norm, and for this the inequality $s > d/2$ is necessary. The second reason for using colored noise is that the solutions to most nonlinear equations simply do not exist for space-time white noise. We therefore assume the following. Assumption 2.10. Let Assumption 2.1 be satisfied and let $\{\mathcal{W}(t)\}_{t\ge0}$ be a $Q$-Wiener process as defined in Assumption 2.2, where $\{\alpha_k\}_{k\in\mathbb{N}}$ is normalized by $\sup\{\alpha_k^2 : k \in \mathbb{N}\} = 1$. Furthermore, assume that there exist $s_R \in \mathbb{R}$ and constants $0 < c_a < C_a$ such that
$$\alpha_k^2 \le C_a\cdot\mu_k^{-s_R} \quad \text{and} \quad c_a\cdot\mu_{n_k}^{-s_R} \le \alpha_{n_k}^2 \qquad \text{for all } k \in \mathbb{N} \tag{25}$$
for a subsequence with $\lim_{N\to\infty}|\{k \in \mathbb{N} : n_k \le N\}|/N > 0$. Note that cut-off noise, i.e., $\alpha_k = 0$ for all but finitely many $k \in \mathbb{N}$, is excluded, since this will lead to a finite-dimensional problem, where the whole dominant space, which corresponds to modes with $\lambda_{k,\varepsilon}$ of the order $\lambda^{\max}_\varepsilon$, is excluded for small $\varepsilon > 0$. The solution of the linear problem (11) with respect to the Wiener process given in Assumption 2.10 is
$$\mathcal{W}_{A_\varepsilon}(t) = \sum_{k\in\mathbb{N}} \alpha_k\cdot\int_0^t e^{(t-\tau)\lambda_{k,\varepsilon}}\, d\beta_k(\tau)\cdot f_k \quad \text{with} \quad E\|\mathcal{W}_{A_\varepsilon}(t)\|_s^2 = \sum_{k\in\mathbb{N}} \mu_k^s\cdot\alpha_k^2\cdot\int_0^t e^{2\tau\lambda_{k,\varepsilon}}\, d\tau .$$
If we assume that $\alpha_k^2$ is proportional to $\mu_k^{-s_R}$ as $k \to \infty$, then we obtain two positive constants $c_1$ and $c_2$ such that for all $t \ge 0$ we have
$$c_1\cdot\|W_{A_\varepsilon}(t)\|_{s-s_R} \le \|\mathcal{W}_{A_\varepsilon}(t)\|_s \le c_2\cdot\|W_{A_\varepsilon}(t)\|_{s-s_R} .$$
Therefore, the assertions of Propositions 2.3, 2.6, and 2.8 carry over immediately to analogous assertions for $\mathcal{W}_{A_\varepsilon}(t)$ with $s$ being replaced by $s - s_R$. In fact, assuming proportionality as in Assumption 2.10 is sufficient. See Remark 2.5. We refrain from restating the propositions in this form. Note that it is also possible to consider different upper and lower bounds, but the conditions are more complicated in this case. In order to apply the results of the linear theory to the nonlinear case, it is necessary to work with $\varepsilon$-dependent data $r_\varepsilon$, $\delta_\varepsilon$, and $\sigma_\varepsilon$. This is in complete analogy to the deterministic case. We assume polynomial dependence, which simplifies the conditions to algebraic conditions for the exponents. The following theorem is the main result for a linear equation of the form (11) subject to colored noise and $\varepsilon$-dependent data. It is a consequence of the modified versions of Propositions 2.3, 2.6, and 2.8 mentioned above.
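Theorem 2.11. Consider a $Q$-Wiener process as in Assumption 2.10. Suppose further that
$$r_\varepsilon = r_0\varepsilon^R, \qquad \delta_\varepsilon = \delta_0\varepsilon^R, \qquad \text{and} \qquad \sigma_\varepsilon = \sigma_0\varepsilon^A \tag{26}$$
for positive constants $\sigma_0$, $r_0$, and $\delta_0$, with $r_0 \gg \delta_0$. Assume that $s \in \mathbb{R}$ satisfies the inequalities
$$s - s_R + \frac d2 < 2 \tag{27}$$
and
$$R - A - 1 < C_\gamma\cdot\left(s - s_R + \frac d2\right), \tag{28}$$
where the constant $C_\gamma$ is defined as
$$C_\gamma := \begin{cases} \gamma^-\cdot\left(\gamma^+ - \gamma^-\right)^{-1} & \text{for } s - s_R + d/2 < 0, \\ -1 & \text{for } s - s_R + d/2 \ge 0. \end{cases} \tag{29}$$
Then for $t_{e,\varepsilon}$ as defined in (14) we have
$$\ln\left(C_1\varepsilon^{-E} + 1\right) < 2\gamma^+\lambda^{\max}_\varepsilon\cdot t_{e,\varepsilon} < \ln\left(C_2\varepsilon^{-E} + 1\right) \tag{30}$$
with
$$E = 2\left(1 - R + A - \frac d2 - s + s_R\right) > 0 \tag{31}$$
and positive constants $C_1$ and $C_2$ depending only on $\gamma^+$ and $s$. Furthermore, there exists an $\varepsilon_0 > 0$ such that for all $q \in \mathbb{N}$ there exists a constant $C_q > 0$ depending on $\gamma^+$, $\gamma^-$, $s$, $d$, and $q$, with
$$P\left(\mathcal{M}^{(\varepsilon)}_L\right) \le C_q\cdot\varepsilon^q \quad \text{for all } 0 < \varepsilon < \varepsilon_0, \tag{32}$$
where
$$\mathcal{M}^{(\varepsilon)}_L := \left\{\|\sigma_\varepsilon\mathcal{W}^{++}_{A_\varepsilon}(t_{e,\varepsilon})\|_s \le r_\varepsilon\right\} \cup \left\{\sup_{t\in[0,t_{e,\varepsilon}]}\|\sigma_\varepsilon\mathcal{W}^-_{A_\varepsilon}(t)\|_s \ge \delta_\varepsilon\right\} \cup \left\{\sup_{t\in[0,t_{e,\varepsilon}]}\|\sigma_\varepsilon\mathcal{W}^{--}_{A_\varepsilon}(t)\|_s \ge \delta_\varepsilon\right\}. \tag{33}$$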
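The hypotheses of Theorem 2.11 are purely algebraic in $(s, s_R, d, R, A, \gamma^\pm)$, so they can be checked mechanically. The sketch below encodes (27), (28), (29) and the exponent $E$ of (31), together with the two-case exponent condition that appears at the end of the proof for the $X^-_\varepsilon$-component; a random search then probes the claim that (28) forces $E > 0$ and one of those two conditions. All function names and sampling ranges are hypothetical illustrations.

```python
import random


def c_gamma(s, s_R, d, gamma_minus, gamma_plus):
    """C_gamma as defined in (29)."""
    if s - s_R + d / 2 < 0:
        return gamma_minus / (gamma_plus - gamma_minus)
    return -1.0


def exponent_E(s, s_R, d, R, A):
    """E = 2(1 - R + A - d/2 - s + s_R) from (31)."""
    return 2.0 * (1.0 - R + A - d / 2 - s + s_R)


def admissible(s, s_R, d, R, A, gamma_minus, gamma_plus):
    """Hypotheses (27) and (28) of Theorem 2.11."""
    x = s - s_R + d / 2
    return x < 2 and R - A - 1 < c_gamma(s, s_R, d, gamma_minus, gamma_plus) * x


def proof_exponent_ok(s, s_R, d, R, A, gamma_minus, gamma_plus):
    """The two-case exponent condition used in the proof for the X^- component."""
    E = exponent_E(s, s_R, d, R, A)
    lead = d + 2 * s - 2 * s_R
    if lead >= 0:
        return E > 0
    return lead + E * (1 - gamma_minus / gamma_plus) > 0


def check_claim(n_trials, seed):
    """Random search: does (28) ever hold while E <= 0 or the proof condition fails?"""
    rng = random.Random(seed)
    for _ in range(n_trials):
        s, s_R = rng.uniform(-2, 4), rng.uniform(-2, 8)
        d, R, A = rng.choice([1, 2, 3]), rng.uniform(-4, 3), rng.uniform(-1, 2)
        g_minus, g_plus = rng.uniform(0.05, 0.45), rng.uniform(0.5, 0.95)
        if admissible(s, s_R, d, R, A, g_minus, g_plus):
            if not (exponent_E(s, s_R, d, R, A) > 0
                    and proof_exponent_ok(s, s_R, d, R, A, g_minus, g_plus)):
                return False
    return True


print(admissible(2.0, 3.0, 2, 0.9, 0.0, 0.5, 0.9), check_claim(10000, 0))
```

A random search is of course no proof; it merely illustrates that, with $C_\gamma$ as in (29), condition (28) is equivalent to the exponent condition used later.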
Notice that in the formulation of the theorem we suppressed the explicit dependence of the constants on $\delta_0$, $r_0$, $\sigma_0$, and the constants in (6) and (25), in order to keep the notation as simple as possible. Furthermore, we will see later that $E > 0$, which follows from (28), is necessary unless an improvement of the estimates in Propositions 2.3, 2.6, and 2.8 is available.

Fig. 2. Dominating subspace for the stochastic equation (labels: $Y_\varepsilon$, $X^{--}_\varepsilon \oplus X^-_\varepsilon$, $r_\varepsilon$, $\sqrt2\,\delta_\varepsilon$, $u(t_{e,\varepsilon})$)

Fig. 3. Admissible region for $R$ plotted against $s_R$ (labels: $A+1$, $-2+s+d/2$, $s+d/2$, $\sim s_R$, $\sim -C_\gamma s_R$)

The estimate in (32) of Theorem 2.11 implies spinodal decomposition, as sketched in Fig. 2. It shows that with high probability, the solution of the linear stochastic equation (11) exhibits spinodal decomposition, since at time $t_{e,\varepsilon}$ it is likely to be far away from the origin in the shaded region near the dominating subspace $Y_\varepsilon$, which is responsible for the characteristic patterns. Furthermore, with high probability it does not leave the $\sqrt2\,\delta_\varepsilon$-cylinder around $Y_\varepsilon$ up to time $t_{e,\varepsilon}$. Notice that nothing is said about the behavior of the solution inside the thin cylinder along the dominating subspace before it reaches time $t_{e,\varepsilon}$. Note further that closeness is measured in the space $H^s_*(G)$. For example, if $s = 0$, then smallness of the $L^2(G)$-distance to $Y_\varepsilon$ will not force the solutions to show the typical patterns. We should obtain at least $L^\infty(G)$-estimates. In Fig. 3 the admissible region for $s_R$ and $R$ is shown. It is derived from the estimates given in (27) and (28). The vertical bound on the left is due to (27). It is the natural regularity condition for the solution, showing how regular the noise has to be, in order to guarantee existence in $H^s_*(G)$. The second inequality (28) provides an upper bound on $R$, which in turn implies a lower bound on the radius $r_\varepsilon = r_0\varepsilon^R$. This is necessary, since for small radii the noise dominates the behavior of the solution, and no significant motion along the dominating subspace is expected. The two cases of condition (28) can be interpreted as follows. Colored noise with $s_R \gg s + d/2 - 2$ allows only very large radii, because $C_\gamma$ is a very large positive constant. This bound is due to the modification of Proposition 2.6. In this case the first few modes, which are contained in $X^-_\varepsilon$, are driven away from the initial condition $0$ by a random force of strength $O(1)$. Afterwards, they diverge exponentially fast, since
the corresponding eigenvalues are positive. Meanwhile, the modes in $X^{++}_\varepsilon$ are driven away only by a small random force. After some time they do catch up with the first few modes, because they move exponentially faster, but our proof establishes this only far from the origin. On the other hand, if the noise is not regular enough, then the fluctuations in the higher modes of $X^{--}_\varepsilon$ will dominate the behavior near the origin. This can be seen in the application of the modified Proposition 2.8. See the proof of Theorem 2.11. But this effect becomes less important for colored noise. Our conditions imply that the smallest radii are possible if $s_R = s + d/2$. Before proving the result, we examine some examples to see what our conditions imply. Example 2.12 ($L^\infty$-Theory). In order to obtain results for the $L^\infty(G)$-norm, one has to consider Sobolev norms which dominate it, i.e., one needs to assume $s > d/2$. Thus, if we choose for example $R < 1 + A$ with $R \approx 1 + A$, and also $s_R \ge d$ with $s_R \approx d$, then one can always choose an $s > d/2$ such that Theorem 2.11 holds. Example 2.13 ($H^2_*$-Theory). The most important example in the application to the Cahn–Hilliard equation is $s = 2$, since the whole deterministic theory is done in this space. Theorem 2.11 furnishes the conditions $R < 1 + A + C_\gamma(2 - s_R + d/2)$ and $s_R > d/2$. Therefore, $R < 1 + A$ with $R \approx 1 + A$ and $s_R \approx 2 + d/2$ would do. Also $R > 0$ is possible with $s_R \in 1 + d/2 + \left(-A,\, 1 + (1+A)/\tilde c_\gamma\right)$, where $\tilde c_\gamma = \gamma^-/(\gamma^+ - \gamma^-)$ is the positive constant in (29). Proof of Theorem 2.11. It is clear that in order to verify (32) we only have to establish analogous estimates for each of the three sets. In the following we use the modification of Propositions 2.3, 2.6, and 2.8 for colored noise. Thus, $s$ has to be replaced by $s - s_R$ in the estimates and conditions of these results, but not in the norms. This was discussed in the beginning of this subsection.
In view of Remark 2.5, the intersection of $X^{++}_\varepsilon$ with the space spanned by the eigenfunctions $f_k$ for which the lower bound on the $\alpha_k$ holds furnishes a dominating space inside $X^{++}_\varepsilon$. Furthermore, it provides a lower bound on the growth of $\mathcal{W}^{++}_{A_\varepsilon}$. Hence, we obtain
$$P\left(\|\sigma_\varepsilon\mathcal{W}^{++}_{A_\varepsilon}(t_{e,\varepsilon})\|_s \le r_\varepsilon\right) \le 2^{-c\varepsilon^{-d}} \le C\varepsilon^p$$
for sufficiently small $\varepsilon > 0$, where $t_{e,\varepsilon}$ was defined in (14). Here the constant $c$ depends only on $\gamma^+$ and the constants in (6), and the constant $C$ depends on both $c$ and $p$. Note that if we consider all $\varepsilon$-dependencies, then (15) implies $2\gamma^+\lambda^{\max}_\varepsilon\cdot t_{e,\varepsilon} \le \ln(C_2\varepsilon^{-E} + 1)$, where $-E = 2R - 2A + d + 2s - 2 - 2s_R$ as claimed in (30). The constant $C_2$ depends on $C_e$, which in turn depends only on $\gamma^+$, $s$ and the constants in (6). The other direction follows similarly. For the remainder of this proof we denote all constants which depend only on $\gamma^+$, $\gamma^-$, $s$, $d$, $p$, $G$, and the proportionality constants describing the asymptotic behavior of $\alpha_k$ and $\mu_k$ by $C$. To prove the second estimate we use the modified Proposition 2.6, because together with $-E = 2R - 2A + d + 2s - 2 - 2s_R$ it implies
$$P\left(\sup_{t\in[0,t_{e,\varepsilon}]}\|\sigma_\varepsilon\mathcal{W}^-_{A_\varepsilon}(t)\|_s \ge \delta_\varepsilon\right) \le C\left(\varepsilon^{E+d+2s-2s_R}\cdot\Xi_\varepsilon\cdot\left(\left(C_2\varepsilon^{-E}+1\right)^{\gamma^-/\gamma^+} - 1\right)\right)^p . \tag{34}$$
572
D. Blömker, S. Maier-Paape, T. Wanner
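It can readily be verified that for arbitrary $a_0 \in (0, \infty)$ there exists a positive constant $C_{a_0}$ such that
$$(a + 1)^{\gamma^-/\gamma^+} - 1 \le C_{a_0}\cdot\begin{cases} a^{\gamma^-/\gamma^+} & \text{for } a \ge a_0,\\ a & \text{for } 0 \le a \le a_0.\end{cases} \tag{35}$$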
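Inequality (35) is elementary, and the following sketch checks it numerically on a grid. It assumes $0 < \gamma^- < \gamma^+$, so that $\theta = \gamma^-/\gamma^+ \in (0, 1)$ (an assumption here, suggested by the positivity of $\tilde c_\gamma$). For such $\theta$ the choice $C_{a_0} = 1$ already works for every $a_0$: concavity gives $(1 + a)^\theta - 1 \le \theta a \le a$, and subadditivity of $a \mapsto a^\theta$ gives $(1 + a)^\theta - 1 \le a^\theta$. The helper names are hypothetical.

```python
def lhs_35(a: float, theta: float) -> float:
    """Left-hand side of (35): (a + 1)^theta - 1 with theta = gamma^- / gamma^+."""
    return (a + 1.0) ** theta - 1.0


def rhs_35(a: float, theta: float, a0: float, c: float) -> float:
    """Right-hand side of (35): c * a^theta for a >= a0 and c * a for 0 <= a <= a0."""
    return c * (a ** theta if a >= a0 else a)


def check_35(theta: float, a0: float = 1.0, c: float = 1.0) -> bool:
    """Check (35) on a fine grid of [0, 5] plus large values of a up to 1e12."""
    grid = [k * 1e-3 for k in range(5001)] + [10.0 ** k for k in range(1, 13)]
    return all(lhs_35(a, theta) <= rhs_35(a, theta, a0, c) + 1e-12 for a in grid)


print(check_35(0.5), check_35(0.9, a0=0.3), check_35(1.5))  # True True False
```

The last call shows that the restriction $\gamma^- < \gamma^+$ matters: for an exponent larger than 1 the bound fails already for small $a$.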
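Furthermore, for $0 < \varepsilon < 1$ bounded away from 1 we have
$$\varepsilon^{d+2s-2s_R}\cdot\Xi_\varepsilon = \begin{cases} 1 & \text{for } s - s_R + d/2 > 0,\\ \ln(\varepsilon^{-1}) & \text{for } s - s_R + d/2 = 0,\\ \varepsilon^{2s-2s_R+d} & \text{for } s - s_R + d/2 < 0,\end{cases}$$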
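which is bounded below by a constant. This, in conjunction with (34) and (35), shows that $E > 0$ is necessary for our proof. Moreover, in this case we obtain with $a_0 := 1$ the estimate
$$P\Big(\sup_{t\in[0,t_{e,\varepsilon}]}\big\|\sigma_\varepsilon W_{A_\varepsilon}^{-}(t)\big\|_s \ge \delta_\varepsilon\Big) \le C\Big(\varepsilon^{d+2s-2s_R}\cdot\Xi_\varepsilon\cdot\varepsilon^{E(1-\gamma^-/\gamma^+)}\Big)^p.$$
The term in parentheses on the right-hand side is bounded above by $\varepsilon^{p_0}$ for a sufficiently small $p_0 > 0$, provided
$$d + 2s - 2s_R \ge 0 \quad\text{and}\quad E > 0,$$
or
$$d + 2s - 2s_R < 0 \quad\text{and}\quad d + 2s - 2s_R + E\cdot\big(1 - \gamma^-/\gamma^+\big) > 0.$$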
It can easily be shown that in fact one of these conditions has to be satisfied due to (28). Finally, let us address the last of the three sets in the definition of $M_L^{(\varepsilon)}$. Using the modified Proposition 2.8 and the bounds for $t_{e,\varepsilon}$ from (30), we then obtain
$$P\Big(\sup_{t\in[0,t_{e,\varepsilon}]}\big\|\sigma_\varepsilon W_{A_\varepsilon}^{--}(t)\big\|_s \ge \delta_\varepsilon\Big) \le C\big(\varepsilon^{E}\cdot\varepsilon^{-2}\cdot t_{e,\varepsilon}\big)^p \le C\big(\varepsilon^{E}\cdot\ln(\varepsilon^{-1})\big)^p \le C\varepsilon^{p_1p}$$
for a sufficiently small $p_1 > 0$, provided $E > 0$. Choosing $p$ sufficiently large in order to assure $q \le \min\{1, p_0, p_1\}\cdot p$ completes the proof of the theorem.
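To see why $t_{e,\varepsilon}$ is of order $\varepsilon^2\ln(\varepsilon^{-1})$, and why the last estimate decays once $E > 0$, one can evaluate the bound $2\gamma^+\lambda_\varepsilon^{\max}t_{e,\varepsilon} \le \ln(C_2\varepsilon^{-E} + 1)$ with $\lambda_\varepsilon^{\max} = \varepsilon^{-2}/4$. Below is a minimal Python sketch; the values of $E$, $\gamma^+$ and $C_2$ are arbitrary illustrative choices, not taken from the paper, and the function names are hypothetical.

```python
import math


def exit_time_bound(eps: float, E: float, gamma_plus: float, C2: float = 1.0) -> float:
    """Upper bound for t_{e,eps} from 2 * gamma^+ * lambda_max * t <= ln(C2 * eps^-E + 1)."""
    lam_max = eps ** -2 / 4.0  # lambda_eps^max = eps^{-2} / 4
    return math.log(C2 * eps ** (-E) + 1.0) / (2.0 * gamma_plus * lam_max)


def normalized_exit_time(eps: float, E: float, gamma_plus: float) -> float:
    """t_{e,eps} / (eps^2 ln(1/eps)); this ratio tends to 2E / gamma^+ as eps -> 0."""
    return exit_time_bound(eps, E, gamma_plus) / (eps ** 2 * math.log(1.0 / eps))


def last_set_base(eps: float, E: float, gamma_plus: float) -> float:
    """eps^E * eps^-2 * t_{e,eps}, the quantity raised to the power p in the last estimate."""
    return eps ** E * eps ** -2 * exit_time_bound(eps, E, gamma_plus)


for eps in (1e-2, 1e-4, 1e-8):
    print(eps, normalized_exit_time(eps, E=1.0, gamma_plus=1.0), last_set_base(eps, E=0.5, gamma_plus=1.0))
```

The normalized exit time settles near $2E/\gamma^+$, while the base of the last probability estimate behaves like $\varepsilon^E\ln(\varepsilon^{-1})$ and therefore tends to zero.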
Spinodal Decomposition for the Cahn–Hilliard–Cook Equation
573
3. Nonlinear Extensions

In this section we derive error estimates for the difference between the mild solution of a nonlinear problem and the solution of the corresponding linearized equation. Specifically, we consider nonlinear equations of the form
$$u_t = A_\varepsilon u + F(u) + \sigma_\varepsilon\cdot\xi, \qquad u(0) = 0, \tag{36}$$
where the stochastic noise $\xi$ is colored noise given as the generalized derivative of a $Q$-Wiener process $W$ as in Assumption 2.10. Then the variation of constants formula implies for all $t > 0$ the identity
$$u(t) - \sigma_\varepsilon\cdot W_{A_\varepsilon}(t) = \int_0^t e^{(t-\tau)A_\varepsilon}F(u(\tau))\,d\tau. \tag{37}$$
Due to the effects of the nonlinearity in (36) we will only be able to prove the smallness of the deviation between the solution $u$ of (36) and the stochastic convolution $\sigma_\varepsilon W_{A_\varepsilon}$ in a sufficiently small neighborhood of the initial condition 0. Thus, we let
$$T_{e,\varepsilon} = \inf\big\{t > 0 : \|u(t)\|_s > \rho_\varepsilon\big\} \tag{38}$$
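denote the first exit time of the solution of (36) from the ball $B_{\rho_\varepsilon}(0)$ in $H_*^s(G)$, where for now $s$ is an arbitrary constant. The size $\rho_\varepsilon$ will be determined later. Note that $T_{e,\varepsilon}$ is a random variable, whereas $t_{e,\varepsilon}$ is a deterministic quantity.

3.1. Outline of the main ideas. Let us begin by considering a simplified example which outlines the main ideas of our approach. For this, assume that $\|F(u)\|_s \le C_F\|u\|_s^2$ is satisfied for all $u \in H_*^s(G)$. This assumption is not satisfied in our application of the theory to the Cahn–Hilliard–Cook equation, since there one has to consider the nonlinearity as an operator between two different spaces; this more technical case is deferred to the next subsection. Let $\rho_\varepsilon = r_\varepsilon - \delta_\varepsilon$. Then for all $t \le T_{e,\varepsilon}$ the identity (37) implies
$$\big\|u(t) - \sigma_\varepsilon\cdot W_{A_\varepsilon}(t)\big\|_s \le \int_0^t e^{(t-\tau)\lambda_\varepsilon^{\max}}C_F\|u(\tau)\|_s^2\,d\tau \le \frac{C_F\cdot\rho_\varepsilon^2}{\lambda_\varepsilon^{\max}}\cdot\Big(e^{t\lambda_\varepsilon^{\max}} - 1\Big). \tag{39}$$
In addition, let us now assume that the linearization of (36) satisfies all the assumptions of Theorem 2.11. Inserting the bound on $t_{e,\varepsilon}$ given in (30) and using $E > 0$ then furnishes for all $t \le \min\{T_{e,\varepsilon}, t_{e,\varepsilon}\}$ the estimate
$$\begin{aligned}\big\|u(t) - \sigma_\varepsilon\cdot W_{A_\varepsilon}(t)\big\|_s &\le \frac{C_F\cdot\rho_\varepsilon^2}{\lambda_\varepsilon^{\max}}\cdot\Big(e^{t_{e,\varepsilon}\lambda_\varepsilon^{\max}} - 1\Big) \le \frac{C_F\cdot\rho_\varepsilon^2}{\lambda_\varepsilon^{\max}}\cdot\Big(\big(C\varepsilon^{-E} + 1\big)^{1/(2\gamma^+)} - 1\Big)\\ &\le C\varepsilon^{2R+2-E/(2\gamma^+)}.\end{aligned} \tag{40}$$
Our goal is to bound the right-hand side of (40) by $\delta_\varepsilon = \delta_0\varepsilon^R$. Since $W_{A_\varepsilon}(t)$ is close to $Y_\varepsilon$, this will imply that $u(t)$ is close to $Y_\varepsilon$ as well. Achieving this bound introduces yet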
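another condition on the exponents, which effectively is an upper bound on the radius $r_\varepsilon$. This, however, is fully expected, because the linear regime is valid only in a sufficiently small neighborhood of 0. More precisely, in order to bound the right-hand side of (40) by $\delta_\varepsilon$ we have to assume
$$2R + 2 - \frac{E}{2\gamma^+} > R, \qquad\text{where}\quad \frac{E}{2\gamma^+} = \frac{1}{\gamma^+}\cdot\Big(1 - R + A - \frac d2 - s + s_R\Big),$$
and therefore
$$R > \frac{1}{1 + \gamma^+}\cdot\Big(A + 1 - 2\gamma^+ - s + s_R - \frac d2\Big). \tag{41}$$
Define
$$M_{NL}^{(\varepsilon)} = \big\{T_{e,\varepsilon} \le t_{e,\varepsilon}\big\} \cap \Big\{\sup_{t\in[0,T_{e,\varepsilon}]}\big\|u^-(t)\big\|_s \le 2\delta_\varepsilon\Big\} \cap \Big\{\sup_{t\in[0,T_{e,\varepsilon}]}\big\|u^{--}(t)\big\|_s \le 2\delta_\varepsilon\Big\}, \tag{42}$$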
where $u^-$ denotes the projection of $u$ onto the space $X_\varepsilon^-$, and similarly for the other spaces. The set $M_{NL}^{(\varepsilon)}$ consists of all solution trajectories of the nonlinear equation which stay in a $2\sqrt2\,\delta_\varepsilon$-cylinder around $Y_\varepsilon$ and leave a $\rho_\varepsilon$-ball in $H_*^s(G)$ around 0 before time $t_{e,\varepsilon}$. Hence, these trajectories exhibit spinodally decomposed patterns after a short time; see also Fig. 4. Under the assumptions of Theorem 2.11 we obtain for the complement $N_L^{(\varepsilon)}$ of the set $M_L^{(\varepsilon)}$ defined in (33) the inequality
$$P\big(N_L^{(\varepsilon)}\big) \ge 1 - C\varepsilon^q.$$
In the following we will show that if we assume the additional condition (41), then the set $M_{NL}^{(\varepsilon)}$ has high probability as well. Thus, with high probability the situation sketched in Fig. 4 holds. To begin with, we verify that the event $\{T_{e,\varepsilon} \le t_{e,\varepsilon}\}$ occurs with high probability. Then (40) and (41) imply that it is very likely for the nonlinear solution to be close to the linear solution upon exiting the $B_{\rho_\varepsilon}(0)$-ball. Furthermore, this forces the nonlinear solution to leave the $B_{\rho_\varepsilon}(0)$-ball before $t_{e,\varepsilon}$, and therefore in a short time. If one assumes the inequality $T_{e,\varepsilon} > t_{e,\varepsilon}$, then on $N_L^{(\varepsilon)}$ we have
$$\big\|u^{++}(t_{e,\varepsilon})\big\|_s \ge \big\|\sigma_\varepsilon W_{A_\varepsilon}^{++}(t_{e,\varepsilon})\big\|_s - \big\|\sigma_\varepsilon W_{A_\varepsilon}^{++}(t_{e,\varepsilon}) - u^{++}(t_{e,\varepsilon})\big\|_s \ge r_\varepsilon - \delta_\varepsilon = \rho_\varepsilon,$$
where (41) and the error estimate (40) were used. This, however, contradicts the definition of $T_{e,\varepsilon}$ in (38). Hence we obtain $T_{e,\varepsilon} \le t_{e,\varepsilon}$ on $N_L^{(\varepsilon)}$.
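Fig. 4. Stochastic nonlinear result. (Sketch with axes $Y_\varepsilon$ and $X_\varepsilon^- \oplus X_\varepsilon^{--}$, showing the exit point $u(T_{e,\varepsilon})$ on the sphere of radius $\rho_\varepsilon$ inside the $2\sqrt2\,\delta_\varepsilon$-cylinder.)

In addition, both
$$\big\|u^-(t)\big\|_s \le \big\|\sigma_\varepsilon W_{A_\varepsilon}^-(t)\big\|_s + \big\|u^-(t) - \sigma_\varepsilon W_{A_\varepsilon}^-(t)\big\|_s < 2\delta_\varepsilon$$
and similarly
$$\big\|u^{--}(t)\big\|_s < 2\delta_\varepsilon$$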
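are satisfied for all $t \in [0, T_{e,\varepsilon}]$ on $N_L^{(\varepsilon)}$. This immediately implies that the nonlinear equation exhibits spinodal decomposition with high probability, since the just verified inclusion $N_L^{(\varepsilon)} \subset M_{NL}^{(\varepsilon)}$ furnishes
$$P\big(M_{NL}^{(\varepsilon)}\big) \ge P\big(N_L^{(\varepsilon)}\big) \ge 1 - C\varepsilon^q.$$
As for the conditions on the exponent $R$, both (41) and (28) are simultaneously satisfied if and only if
$$\frac{1}{1 + \gamma^+}\cdot\big(1 + A - D - 2\gamma^+\big) < R < 1 + A + C_\gamma\cdot D,$$
where we used the abbreviation $D := s + d/2 - s_R$, and $C_\gamma$ is defined as in (29). Thus, it is possible to find a suitable $R$ if and only if
$$0 < 3\gamma^+ + \gamma^+\cdot A + D\cdot\big(C_\gamma\cdot(1 + \gamma^+) + 1\big),$$
or equivalently $C_\gamma^*\cdot D < 3 + A$, if we define
$$C_\gamma^* = -\frac{C_\gamma\cdot(1 + \gamma^+) + 1}{\gamma^+} = \begin{cases} 1 & \text{for } D \ge 0,\\[2pt] -\dfrac{1 + \gamma^-}{\gamma^+ - \gamma^-} & \text{for } D < 0.\end{cases}$$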
In other words, appropriate values for the exponent R exist, provided the strength of the noise is not too large.
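The equivalence between a non-empty interval for $R$ and the condition $C_\gamma^*\cdot D < 3 + A$ can be checked numerically. The sketch below needs $C_\gamma$ from (29), which is not part of this excerpt; as an assumption, we take $C_\gamma = -1$ for $D \ge 0$ and $C_\gamma = \gamma^-/(\gamma^+ - \gamma^-)$ for $D < 0$, which are the values that reproduce the two branches of $C_\gamma^*$ displayed above. The function names and the sample values of $\gamma^\pm$ are hypothetical.

```python
def c_gamma(D: float, gp: float, gm: float) -> float:
    """Assumed form of C_gamma from (29): -1 for D >= 0, gm / (gp - gm) for D < 0."""
    return -1.0 if D >= 0 else gm / (gp - gm)


def c_gamma_star(D: float, gp: float, gm: float) -> float:
    """C_gamma^* of Subsect. 3.1: 1 for D >= 0, -(1 + gm) / (gp - gm) for D < 0."""
    return 1.0 if D >= 0 else -(1.0 + gm) / (gp - gm)


def admissible_R(A: float, D: float, gp: float, gm: float):
    """Open interval for R: lower end from (41), upper end from (28)."""
    lower = (1.0 + A - D - 2.0 * gp) / (1.0 + gp)
    upper = 1.0 + A + c_gamma(D, gp, gm) * D
    return lower, upper


def criteria_agree(gp: float = 2.0, gm: float = 0.5) -> bool:
    """Compare 'interval non-empty' with 'C_gamma^* * D < 3 + A' on a grid of (A, D)."""
    for i in range(-40, 41):
        for j in range(-40, 41):
            A, D = 0.1 * i, 0.13 * j
            gap = 3.0 + A - c_gamma_star(D, gp, gm) * D
            if abs(gap) < 1e-9:
                continue  # boundary case: rounding would decide the sign
            lower, upper = admissible_R(A, D, gp, gm)
            if (lower < upper) != (gap > 0):
                return False
    return True


print(criteria_agree())
```

Algebraically, the length of the interval equals $\gamma^+(3 + A - C_\gamma^* D)/(1 + \gamma^+)$, which is why the two tests agree away from the boundary.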
3.2. Main results. In this subsection we apply the method outlined above to the abstract nonlinear equation (36), for which we assume the existence of a unique local mild solution. As usual, the norm in $H_*^s(G)$ is denoted by $\|\cdot\|_s$. In contrast to the simplified case treated in the last subsection, the main assumption on the nonlinearity is now
$$\|F(u)\|_{s-\beta} \le C_F\|u\|_s^{1+\kappa} \tag{43}$$
for some $\kappa > 0$, a constant $\beta \ge 0$, and all $u \in H_*^s(G)$ with $\|u\|_s \le C_g$ for a given constant $C_g > 0$. For the Cahn–Hilliard–Cook equation equipped with the cubic nonlinearity $F(u) = \Delta(u^3)$ this estimate is satisfied for $s = 2$ and $d < 4$ if we set $\kappa = 2$ and $\beta = 2$. This follows immediately from the identity $F(u) = \Delta u^3 = 6u|\nabla u|^2 + 3u^2\Delta u$ and the estimate
$$\begin{aligned}\|F(u)\|^2 &\le 72\int_G u^2|\nabla u|^4\,dx + 18\int_G u^4(\Delta u)^2\,dx\\ &\le 72\|u\|_{L^\infty(G)}^2\cdot\|\nabla u\|_{L^4(G)}^4 + 18\|u\|_{L^\infty(G)}^4\cdot\|\Delta u\|_{L^2(G)}^2\\ &\le C\|u\|_2^6,\end{aligned} \tag{44}$$
where we used the Sobolev imbeddings of $H^2(G)$ into $L^\infty(G)$ for $d < 4$, and of $H^1(G)$ into $L^4(G)$ for $d \le 4$. As in the last subsection, the main result in the nonlinear situation is an explicit bound on the distance between the solution $u$ of the nonlinear problem (36) and the stochastic convolution $\sigma_\varepsilon W_{A_\varepsilon}$, which solves the linearized equation.

Theorem 3.1. Assume that the nonlinear problem (36) has a unique local mild solution $P$-a.s. in $C^0([0, T^*), H_*^s(G))$, where $T^*$ is a $P$-almost surely positive stopping time with $T^* = \infty$ or $\|u(t)\|_s \to \infty$ as $t \to T^*$. Assume that the nonlinearity $F$ satisfies (43) with $0 \le \beta < 4$ and $\kappa > 0$. Let $r_\varepsilon$, $\delta_\varepsilon$, and $\sigma_\varepsilon$ be as in (26) with $R > 0$, and define $\rho_\varepsilon = r_\varepsilon - \delta_\varepsilon$. Let $T_{e,\varepsilon}$ be the stopping time defined in (38), and assume $E = 2(1 - R + A - d/2 - s + s_R) > 0$, so that the linear exit time $t_{e,\varepsilon}$ from Theorem 2.11 satisfies the estimate $t_{e,\varepsilon} \le \ln(C\varepsilon^{-E} + 1)/(2\gamma^+\lambda_\varepsilon^{\max})$. Finally, assume that for some fixed $\gamma_* \in (0, 1)$ we have
$$R > \frac{1}{\gamma^+\gamma_*\kappa + 1}\cdot\Big(\gamma^+\gamma_*(\beta - 2) + 1 + A - s + s_R - \frac d2\Big). \tag{45}$$
Then there exists an $\varepsilon_0 > 0$ depending on $\gamma_*$, $\beta$, $C_F$, and $C$ such that for all $0 < \varepsilon < \varepsilon_0$ we have
$$\big\|u(t) - \sigma_\varepsilon\cdot W_{A_\varepsilon}(t)\big\|_s \le \delta_\varepsilon \tag{46}$$
for all $t < \min\{T_{e,\varepsilon}, t_{e,\varepsilon}\}$.

If in addition the assumptions of Theorem 2.11 are satisfied, then the above theorem implies spinodal decomposition for the nonlinear equation, since the solution of the nonlinear equation remains sufficiently close to the solution of the linear equation. This was discussed in the simplified example presented in the last subsection, and carries over directly to the general case. It remains to verify that both the lower bound on $R$ given
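in (45) and the upper bound given in (28) can be satisfied simultaneously. This can be achieved if and only if the interval
$$I_R := \Big(\frac{1}{\gamma^+\gamma_*\kappa + 1}\cdot\big(\gamma^+\gamma_*(\beta - 2) + 1 + A - D\big),\ 1 + A + C_\gamma\cdot D\Big) \tag{47}$$
is not empty, where $D = s - s_R + d/2$, and $C_\gamma$ is defined as in (29). In other words, the inequality
$$\gamma^+\gamma_*(\beta - 2) - \big(1 + C_\gamma\cdot(\gamma^+\gamma_*\kappa + 1)\big)\cdot D < \gamma^+\gamma_*\kappa\cdot(1 + A)$$
has to be satisfied, which is equivalent to
$$A > -1 + \frac{\beta - 2}{\kappa} + C_\gamma^*\cdot D, \tag{48}$$
where
$$C_\gamma^* = \begin{cases} 1 & \text{for } D \ge 0,\\[2pt] -\dfrac{1 + \gamma^-\gamma_*\kappa}{\gamma_*\kappa\cdot(\gamma^+ - \gamma^-)} & \text{for } D < 0.\end{cases}$$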
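The step from "$I_R$ is not empty" to (48) can be tested at randomly drawn parameter values. As in Subsect. 3.1, this sketch assumes $C_\gamma = -1$ for $D \ge 0$ and $C_\gamma = \gamma^-/(\gamma^+ - \gamma^-)$ for $D < 0$ (inferred from the two branches of $C_\gamma^*$, since (29) is not part of this excerpt). The sampling ranges and function names are hypothetical.

```python
import random


def c_gamma(D: float, gp: float, gm: float) -> float:
    """Assumed form of C_gamma from (29): -1 for D >= 0, gm / (gp - gm) for D < 0."""
    return -1.0 if D >= 0 else gm / (gp - gm)


def c_gamma_star(D: float, gp: float, gm: float, gs: float, kappa: float) -> float:
    """C_gamma^* from (48)."""
    if D >= 0:
        return 1.0
    return -(1.0 + gm * gs * kappa) / (gs * kappa * (gp - gm))


def interval_I_R(A, D, beta, kappa, gp, gm, gs):
    """End points of the interval I_R in (47)."""
    lower = (gp * gs * (beta - 2.0) + 1.0 + A - D) / (gp * gs * kappa + 1.0)
    upper = 1.0 + A + c_gamma(D, gp, gm) * D
    return lower, upper


def check_47_vs_48(samples: int = 20000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(samples):
        gp = rng.uniform(0.5, 3.0)
        gm = rng.uniform(0.05, 0.95) * gp  # keeps 0 < gamma^- < gamma^+
        gs = rng.uniform(0.1, 0.99)        # gamma_* in (0, 1)
        kappa, beta = rng.uniform(0.5, 3.0), rng.uniform(0.0, 3.99)
        A, D = rng.uniform(-3.0, 5.0), rng.uniform(-3.0, 3.0)
        margin = A - (-1.0 + (beta - 2.0) / kappa + c_gamma_star(D, gp, gm, gs, kappa) * D)
        if abs(margin) < 1e-7:
            continue                        # too close to the boundary to judge
        lower, upper = interval_I_R(A, D, beta, kappa, gp, gm, gs)
        if (lower < upper) != (margin > 0):
            return False
    return True


print(check_47_vs_48())
```

Both conditions encode the same inequality: the length of $I_R$ is $\gamma^+\gamma_*\kappa/(\gamma^+\gamma_*\kappa + 1)$ times the margin in (48).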
Using the same ideas as in Subsect. 3.1, the last condition, in conjunction with the error estimate of Theorem 3.1, furnishes the following result.

Theorem 3.2. Suppose that the assumptions of Theorems 2.11 and 3.1 are satisfied. Furthermore, assume that there is an $R > 0$ in the interval $I_R$ defined in (47). Then the noise strength exponent $A$ satisfies (48). Let the set $M_{NL}^{(\varepsilon)}$ be defined as in (42). Then for every $q \in \mathbb N$,
$$P\big(M_{NL}^{(\varepsilon)}\big) \ge 1 - C_q\varepsilon^q$$
for all sufficiently small $\varepsilon > 0$, with the constant $C_q > 0$ from (32). Thus, with high probability, trajectories of the solution of (36) leave a ball of radius $\rho_\varepsilon = \rho_0\cdot\varepsilon^R$ centered at the initial condition 0 in $H_*^s(G)$ within a $2\sqrt2\,\delta_\varepsilon$-cylinder around the dominating subspace $Y_\varepsilon$ before time $t_{e,\varepsilon}$ of order $O(\varepsilon^2\ln(\varepsilon^{-1}))$. In other words, it is very likely that the solution exhibits spinodally decomposed patterns.

The above theorem is our main nonlinear result, and will be applied to the Cahn–Hilliard–Cook model in the next section. The set of admissible values for the exponent $R$ is depicted in Fig. 5. The remainder of this section is devoted to the proof of Theorem 3.1. We begin with the following auxiliary result.

Lemma 3.3. For arbitrary $\beta \ge 0$ and every $a \in (0, 1)$ there exists a positive constant $C$ which depends only on $G$ and $\beta$, such that for all $w \in H_*^{s-\beta}(G)$ and every $t > 0$ we have
$$\big\|e^{tA_\varepsilon}w\big\|_s \le C\cdot\Big(\frac{\lambda_\varepsilon^{\max}}{(1 - a)t}\Big)^{\beta/4}\cdot e^{t\lambda_\varepsilon^{\max}/a}\,\|w\|_{s-\beta}.$$
Fig. 5. Admissible region for the nonlinear theory with large noise strength. (Figure: $R$ plotted against $s_R$; labels in the plot include $A + 1$, $s + d/2 - 2$, $s + d/2$, $\sim s_R$ and $\sim -C_\gamma s_R$.)
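Proof. For any $t > 0$ we have
$$\big\|e^{tA_\varepsilon}w\big\|_s = \Big\|(-\Delta)^{\beta/2}\,e^{-t\varepsilon^2(1-a)\Delta^2}\,e^{t(-\varepsilon^2a\Delta^2 - \Delta)}\,(-\Delta)^{(s-\beta)/2}w\Big\|$$
for all $w \in H_*^{s-\beta}(G)$. Since $-\Delta^2$ is the infinitesimal generator of an analytic semigroup of contractions and $\lambda_\varepsilon^{\max} = \varepsilon^{-2}/4$, we have
$$\Big\|(-\Delta)^{\beta/2}\,e^{-t\varepsilon^2(1-a)\Delta^2}\,v\Big\| \le \Big(\frac{\beta\cdot\lambda_\varepsilon^{\max}}{e(1 - a)t}\Big)^{\beta/4}\cdot\|v\|$$
for all $v \in X$. This can easily be verified using a Fourier series expansion of $v \in X$. Direct calculations, as in the case $a = 1$, show that $\lambda_\varepsilon^{\max}/a$ is an upper bound for the eigenvalues of $-\varepsilon^2a\Delta^2 - \Delta$. Hence, for a positive constant $C$ depending on $G$ we deduce for all $v \in X$
$$\Big\|e^{t(-\varepsilon^2a\Delta^2 - \Delta)}v\Big\| \le e^{t\lambda_\varepsilon^{\max}/a}\cdot\|v\|.$$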
Combining all estimates yields the desired result.
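The proof rests on two scalar facts, where $x$ stands for an eigenvalue of $-\Delta$: $\sup_{x>0}x^{\beta/2}e^{-cx^2} = (\beta/(4ce))^{\beta/4}$, which with $c = t\varepsilon^2(1 - a)$ and $\varepsilon^2 = 1/(4\lambda_\varepsilon^{\max})$ is exactly the constant $(\beta\lambda_\varepsilon^{\max}/(e(1 - a)t))^{\beta/4}$ above, and $\max_x(-\varepsilon^2ax^2 + x) = 1/(4\varepsilon^2a) = \lambda_\varepsilon^{\max}/a$. A quick numerical check in Python with arbitrary sample values; the helper names are hypothetical.

```python
import math


def multiplier_sup_numeric(beta: float, c: float, n: int = 20001) -> float:
    """Grid maximum of x^(beta/2) * exp(-c x^2) over x >= 0."""
    x_star = math.sqrt(beta / (4.0 * c))  # exact maximiser, used only to size the grid
    xs = (5.0 * x_star * k / (n - 1) for k in range(n))
    return max(x ** (beta / 2.0) * math.exp(-c * x * x) for x in xs)


def multiplier_sup_formula(beta: float, t: float, a: float, eps: float) -> float:
    """(beta * lambda_max / (e (1 - a) t))^(beta/4) with lambda_max = eps^-2 / 4."""
    lam_max = eps ** -2 / 4.0
    return (beta * lam_max / (math.e * (1.0 - a) * t)) ** (beta / 4.0)


def symbol_max_numeric(eps: float, a: float, n: int = 20001) -> float:
    """Grid maximum of -eps^2 a x^2 + x, the symbol of -eps^2 a Delta^2 - Delta."""
    x_star = 1.0 / (2.0 * eps ** 2 * a)
    xs = (2.0 * x_star * k / (n - 1) for k in range(n))
    return max(-eps ** 2 * a * x * x + x for x in xs)


beta, t, a, eps = 2.0, 0.01, 0.5, 0.1
c = t * eps ** 2 * (1.0 - a)  # exponent in exp(-t eps^2 (1 - a) Delta^2)
print(multiplier_sup_numeric(beta, c), multiplier_sup_formula(beta, t, a, eps))
print(symbol_max_numeric(eps, a), (eps ** -2 / 4.0) / a)
```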
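Proof of Theorem 3.1. Note that from the definition of $T_{e,\varepsilon}$ in (38) it is obvious that $T_{e,\varepsilon} \le T^*$ holds $P$-almost surely. Analogously to (39) and (40), Lemma 3.3 and condition (43) for the nonlinearity $F$ imply for all $t < \min\{T_{e,\varepsilon}, t_{e,\varepsilon}\}$ and any $\gamma_* \in (0, 1)$
$$\begin{aligned}\big\|u(t) - \sigma_\varepsilon\cdot W_{A_\varepsilon}(t)\big\|_s &\le \Big\|\int_0^t e^{(t-\tau)A_\varepsilon}F(u(\tau))\,d\tau\Big\|_s\\
&\le C\int_0^t\Big(\frac{\lambda_\varepsilon^{\max}}{t - \tau}\Big)^{\beta/4}\cdot e^{(t-\tau)\lambda_\varepsilon^{\max}/\gamma_*}\cdot\|F(u(\tau))\|_{s-\beta}\,d\tau\\
&\le C\rho_\varepsilon^{1+\kappa}\int_0^t\Big(\frac{\lambda_\varepsilon^{\max}}{\tau}\Big)^{\beta/4}\cdot e^{\tau\lambda_\varepsilon^{\max}/\gamma_*}\,d\tau\\
&\le C\rho_\varepsilon^{1+\kappa}\cdot\big(\lambda_\varepsilon^{\max}\big)^{\beta/4}\cdot t_{e,\varepsilon}^{1-\beta/4}\cdot e^{t_{e,\varepsilon}\lambda_\varepsilon^{\max}/\gamma_*}\\
&\le C\rho_\varepsilon^{1+\kappa}\cdot\big(\lambda_\varepsilon^{\max}\big)^{-1+\beta/2}\cdot\big(\ln(C\varepsilon^{-E} + 1)\big)^{1-\beta/4}\cdot\big(C\varepsilon^{-E} + 1\big)^{1/(2\gamma^+\gamma_*)}\\
&\le C\varepsilon^{(1+\kappa)R + 2 - \beta - E/(2\gamma^+\gamma_*)}\cdot\big(\ln(C\varepsilon^{-E} + 1)\big)^{1-\beta/4},\end{aligned} \tag{49}$$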
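where $E > 0$, $R > 0$, and $\beta < 4$ were used. Notice that due to $R > 0$ we know that $\rho_\varepsilon \le C_g$ for sufficiently small $\varepsilon > 0$, where $C_g$ was introduced after (43). The right-hand side in (49) is less than $\delta_\varepsilon = \delta_0\varepsilon^R$ for sufficiently small $\varepsilon > 0$, provided
$$\kappa\cdot R > \beta - 2 + \frac{E}{2\gamma^+\gamma_*} = \beta - 2 + \frac{1}{\gamma^+\gamma_*}\cdot\Big(1 - R + A - s + s_R - \frac d2\Big),$$
due to (31). Hence we need
$$R > \frac{1}{\gamma^+\gamma_*\kappa + 1}\cdot\Big(\gamma^+\gamma_*(\beta - 2) + 1 + A - s + s_R - \frac d2\Big).$$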
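Because $E$ itself depends on $R$, the passage from "the exponent in (49) exceeds $R$" to (45) is easy to get wrong. The sketch below compares the two conditions at randomly drawn parameter values. The logarithmic factor in (49) is ignored, since it only matters when the two exponents coincide; the sampling ranges and function names are hypothetical.

```python
import random


def nonlinear_exponent(R, A, s, s_R, d, beta, kappa, gp, gs):
    """Exponent of eps on the right-hand side of (49), ignoring the log factor."""
    E = 2.0 * (1.0 - R + A - d / 2.0 - s + s_R)
    return (1.0 + kappa) * R + 2.0 - beta - E / (2.0 * gp * gs)


def lower_bound_45(A, s, s_R, d, beta, kappa, gp, gs):
    """Right-hand side of (45)."""
    return (gp * gs * (beta - 2.0) + 1.0 + A - s + s_R - d / 2.0) / (gp * gs * kappa + 1.0)


def check_equivalence(samples: int = 20000, seed: int = 1) -> bool:
    rng = random.Random(seed)
    for _ in range(samples):
        p = dict(A=rng.uniform(-2, 4), s=rng.uniform(0, 3), s_R=rng.uniform(0, 5),
                 d=rng.choice([1, 2, 3]), beta=rng.uniform(0, 3.99),
                 kappa=rng.uniform(0.5, 3), gp=rng.uniform(0.2, 3), gs=rng.uniform(0.05, 0.99))
        R = rng.uniform(-2, 6)
        margin = R - lower_bound_45(**p)
        if abs(margin) < 1e-7:
            continue  # too close to the boundary to judge
        if (nonlinear_exponent(R, **p) > R) != (margin > 0):
            return False
    return True


print(check_equivalence())
```

The identity behind the agreement is $\big((1+\kappa)R + 2 - \beta - E/(2\gamma^+\gamma_*)\big) - R = \frac{\gamma^+\gamma_*\kappa + 1}{\gamma^+\gamma_*}\,(R - \text{right-hand side of (45)})$.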
4. The Cahn–Hilliard–Cook Equation

In this section we apply the nonlinear theory presented in the last section to the Cahn–Hilliard–Cook equation given by
$$u_t = -\Delta\big(\varepsilon^2\Delta u + u - g(u)\big) + \sigma_\varepsilon\cdot\xi \quad\text{in } G, \qquad \frac{\partial u}{\partial\nu} = \frac{\partial\Delta u}{\partial\nu} = 0 \quad\text{on } \partial G, \tag{50}$$
with initial condition $u(0) = 0$ and domain $G \subset \mathbb R^d$ with sufficiently smooth boundary, where $d = 1, 2, 3$. The noise process $\xi$ is the generalized time derivative of a $Q$-Wiener process $W$ satisfying Assumption 2.10, and we also assume that the noise conserves total mass. Notice that we do not focus only on the specific cubic polynomial $g(u) = u^3$, but allow for a more general class of nonlinearities. More precisely, the function $g$ is supposed to be sufficiently smooth with $g(0) = g'(0) = 0$. Additionally, we sometimes assume $g''(0) = 0$. As for global existence of a solution of (50), we refer the reader to [11]. However, for the situation considered in this section we assume only that a local solution exists in $H_*^2(G)$. This can be proven using fixed point arguments and the variation of constants
formula, provided the stochastic convolution $W_{A_\varepsilon}$ is sufficiently regular and $g$ satisfies some additional conditions, such as polynomial growth. This section only addresses the case of the homogeneous initial condition $u(0) = 0$, since in this case our abstract theory applies immediately. Other homogeneous initial conditions $u(0) = m$ can be considered using the substitution $\tilde u = u - m$. Furthermore, we always assume the identity $f'(m) = 1$, where $-f(u) = g(u) - u$ is the usual derivative of a double-well potential considered in (1). Other values of $f'(m)$ can be treated by employing the change of variables discussed in (4). This involves a change of the spatial variables, leading to a rescaling of the noise. The specific nonlinearity $F(u) = \Delta u^3$ of the abstract formulation was discussed in (44). It satisfies the estimate (43) with $s = 2$, $\beta = 2$ and $\kappa = 2$. For the general case $F(u) = \Delta g(u)$, similar estimates can be established easily. Consider a $C^2$-function $g$ with $g(0) = g'(0) = 0$, discussed for example in [17, 21]. Then Taylor's formula implies $|g'(x)| \le C|x|$, and the continuity of $g''$ furnishes $|g''(x)| \le C$, in both cases for all $x$ with $|x| \le K_iK_g$, where $K_g$ is a fixed sufficiently small constant, and the constant $K_i$ is chosen in such a way that $\|w\|_{L^\infty(G)} \le K_i\|w\|_{H^2(G)}$ for all $w \in H^2(G)$. Then for arbitrary $w$ with $\|w\|_{H^2(G)} \le K_g$ the identity $\Delta g(w) = g'(w)\Delta w + g''(w)|\nabla w|^2$ implies $|\Delta g(w)| \le C\big(|w|\cdot|\Delta w| + |\nabla w|^2\big)$, as well as
$$\|\Delta g(w)\|_{L^2(G)} \le C\Big(\|w\|_{L^\infty(G)}\cdot\|\Delta w\|_{L^2(G)} + \|\nabla w\|_{L^4(G)}^2\Big).$$
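For $g(u) = u^3$ the chain-rule identity $\Delta g(w) = g'(w)\Delta w + g''(w)|\nabla w|^2$ reduces to $\Delta u^3 = 3u^2\Delta u + 6u|\nabla u|^2$, the identity used in (44). A one-dimensional finite-difference check is sketched below, where the second difference stands in for the Laplacian. The test function $w = \sin$ is an arbitrary choice and the helper names are hypothetical.

```python
import math


def second_derivative(f, x: float, h: float = 1e-4) -> float:
    """Central second difference, a 1D stand-in for the Laplacian."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


# Hypothetical test data: w(x) = sin(x) and the cubic nonlinearity g(u) = u^3.
w, dw, d2w = math.sin, math.cos, (lambda x: -math.sin(x))
g = lambda u: u ** 3
dg = lambda u: 3.0 * u ** 2   # g'
d2g = lambda u: 6.0 * u       # g''


def chain_rule_residual(x: float) -> float:
    """|(g(w))'' - (g'(w) w'' + g''(w) (w')^2)| at the point x."""
    lhs = second_derivative(lambda y: g(w(y)), x)
    rhs = dg(w(x)) * d2w(x) + d2g(w(x)) * dw(x) ** 2
    return abs(lhs - rhs)


print(max(chain_rule_residual(0.1 * k) for k in range(-30, 31)))
```

The printed residual is of the size of the finite-difference error, far below the size of the individual terms.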
Finally, if we use the same imbeddings as in (44), then for dimensions $d \le 3$ we obtain $\|\Delta g(w)\|_{L^2(G)} \le C\|w\|_{H^2(G)}^2$ for all $w \in H^2(G)$ with $\|w\|_{H^2(G)} \le K_g$. In other words, condition (43) holds with $\beta = 2$ and $\kappa = 1$. If in addition we assume $g''(0) = 0$ and that $g$ is a $C^3$-function, then similar arguments yield (43) with $\beta = 2$ and $\kappa = 2$. It remains to verify the algebraic conditions on the exponents. In the following we vary $s_R$ and discuss the dimensions $d = 1, 2, 3$. We consider both the case of smallest possible radii $r_\varepsilon$ and that of "least colored" noise. The first case is $s_R = 2 + d/2$. The second case means that the linear theory is close to failing; in other words, $s_R$ is larger than, but close to, $d/2$.

Theorem 4.1. We consider the nonlinear Cahn–Hilliard–Cook equation (50) with initial condition $u(0) = 0$ in dimension $d = 1, 2, 3$. Assume that this problem has a unique mild solution $u$ in $H_*^2(G)$ for arbitrary $\varepsilon > 0$, where the noise process $\xi$ is the generalized time derivative of a $Q$-Wiener process $W$ satisfying Assumption 2.10 with noise strength $\sigma_\varepsilon = \varepsilon^A$. Moreover, assume that the nonlinearity $g$ satisfies $g(0) = g'(0) = 0$, and fix a constant $K_0 > 1/(1 + \gamma^+)$. If in addition we have $g''(0) = 0$, then fix $K_0 > 1/(1 + 2\gamma^+)$. Finally, assume that the constants $s_R$, $A$, and $R$ are chosen in one of the following two ways:
(I) For the case of smallest possible radii, let $s_R = 2 + d/2$, choose $A > -1$, and let $R \in (1 + A)\cdot(K_0, 1)$.
(II) For the case of least possible colored noise, let $s_R = \eta + d/2$ for some fixed small $\eta > 0$, choose $A > 1 - \eta$, and let $R \in (A - 1 + \eta)\cdot(K_0, 1)$.

In either of the above two cases the following holds. For arbitrary $q \in \mathbb N$ there exists an $\varepsilon$-independent constant $C_q$ such that the solution $u$ leaves a ball of radius $r_\varepsilon = \varepsilon^R$ centered at 0 in $H_*^2(G)$ within a small cylinder around the strongly unstable space $Y_\varepsilon$ with probability higher than $1 - C_q\varepsilon^q$. The time needed for this to happen is of order $O(\varepsilon^2\ln(\varepsilon^{-1}))$. In other words, with high probability, the solution $u$ exhibits spinodally decomposed patterns of wavelength $O(\varepsilon)$.

Proof. Case (I). Here we have $s_R = 2 + d/2$. This corresponds to the case of the smallest possible radius, because we can consider nearly maximal values of $R$ in the admissible region; see Fig. 5. As in the deterministic case, we study the stochastic Cahn–Hilliard–Cook equation in the space $H_*^2(G)$. Thus, we choose $s = 2$, which implies $D = 2 - s_R + d/2 = 0$. The regularity condition (27) for the existence of a linear solution in $H_*^2(G)$ is then satisfied. Condition (48) reduces to $A > -1$, and the condition $R \in I_R$ (see (47)) for the admissible radii reduces to
$$\frac{1}{\gamma^+\gamma_*\kappa + 1}\cdot(1 + A) < R < 1 + A.$$
Fix $\gamma_*$ near 1 to enlarge the interval, and choose $r_0 = 1$ and $\sigma_0 = 1$ for simplicity. Therefore, we can consider radii
$$r_\varepsilon \in \Big(\varepsilon^{1+A},\ \varepsilon^{(1+A)/(\gamma^+\gamma_*\kappa + 1)}\Big),$$
with $\gamma_*$ close to 1 and $\kappa \in \{1, 2\}$, such that $K_0 = (\gamma^+\gamma_*\kappa + 1)^{-1}$. Furthermore, we can consider noise strength $\sigma_\varepsilon = \varepsilon^A = o(\varepsilon^{-1})$. Here our nonlinear theory applies.

Case (II). In this case we have $s_R = \eta + d/2$ for some fixed small $\eta > 0$. This case deals with the least possible spatial regularity of the noise process, and corresponds to the left boundary of the admissible region; see Fig. 5. The choice of $s_R$ is due to the regularity condition $s_R > d/2$ given in (27). Notice that we have $s = 2$ and $D = 2 - \eta > 0$. As in Case (I) above, the conditions (48) for $A$ and $R \in I_R$ reduce to
$$A > 1 - \eta \qquad\text{and}\qquad K_0\cdot(A - 1 + \eta) < R < A - 1 + \eta,$$
where $\gamma_*$ is chosen close to 1. Hence, $r_\varepsilon \in \big(\varepsilon^{A-1+\eta}, \varepsilon^{K_0(A-1+\eta)}\big)$ for the noise strength $\sigma_\varepsilon = \varepsilon^A = o(\varepsilon^{1-\eta})$. Note finally that some constants depend on $\eta$ and $\gamma_*$. Therefore, an extension to $\eta = 0$ and $\gamma_* = 1$ is not immediately possible.
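As a sanity check on the two reductions in the proof, the sketch below evaluates (47) and (48) with $s = 2$ and $\beta = 2$ for both choices of $s_R$. It confirms that $I_R = (1 + A)\cdot(K_0, 1)$ in Case (I) and $I_R = (A - 1 + \eta)\cdot(K_0, 1)$ in Case (II), with $K_0 = (\gamma^+\gamma_*\kappa + 1)^{-1}$. The values of $\gamma^+$, $\gamma_*$, $A$, $d$ and $\eta$ are illustrative only, and the function name is hypothetical.

```python
def theorem41_interval(case: str, A: float, kappa: int, gp: float, gs: float,
                       d: int = 2, eta: float = 0.1):
    """D, K0, the interval I_R of (47) and the lower bound on A from (48).

    Uses s = 2 and beta = 2. Both cases of Theorem 4.1 have D >= 0, where the
    upper end of I_R is 1 + A - D and C_gamma^* = 1.
    """
    s, beta = 2.0, 2.0
    s_R = 2.0 + d / 2.0 if case == "I" else eta + d / 2.0
    D = s - s_R + d / 2.0
    K0 = 1.0 / (gp * gs * kappa + 1.0)
    lower = K0 * (gp * gs * (beta - 2.0) + 1.0 + A - D)
    upper = 1.0 + A - D
    A_min = -1.0 + (beta - 2.0) / kappa + D
    return D, K0, (lower, upper), A_min


print(theorem41_interval("I", A=0.5, kappa=1, gp=2.0, gs=0.9))
print(theorem41_interval("II", A=2.0, kappa=2, gp=2.0, gs=0.9))
```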
References
1. Adams, R.A.: Sobolev Spaces. Volume 65 of Pure and Applied Mathematics. New York: Academic Press, 1978
2. Binder, K.: Kinetics of phase separation. In: Stochastic Nonlinear Systems in Physics, Chemistry, and Biology, Volume 8 of Springer Series in Synergetics. Berlin–Heidelberg–New York: Springer, 1981, pp. 62–71
3. Blömker, D.: Stochastic Partial Differential Equations and Surface Growth. PhD thesis, Universität Augsburg, 2000
4. Blömker, D., Maier-Paape, S. and Wanner, T.: Second phase spinodal decomposition for the stochastic Cahn–Hilliard–Cook equation. In preparation
5. Cahn, J.W.: Free energy of a nonuniform system. II. Thermodynamic basis. J. Chem. Phys. 30, 1121–1124 (1959)
6. Cahn, J.W.: Phase separation by spinodal decomposition in isotropic systems. J. Chem. Phys. 42, 93–99 (1965)
7. Cahn, J.W.: Spinodal decomposition. Transactions of the Metallurgical Society of AIME 242, 166–180 (1968)
8. Cahn, J.W. and Hilliard, J.E.: Free energy of a nonuniform system. I. Interfacial free energy. J. Chem. Phys. 28, 258–267 (1958)
9. Cook, H.E.: Brownian motion in spinodal decomposition. Acta Metallurgica 18, 297–306 (1970)
10. Courant, R. and Hilbert, D.: Methods of Mathematical Physics. New York: Interscience, 1953
11. da Prato, G. and Debussche, A.: Stochastic Cahn–Hilliard equation. Nonlinear Analysis 26, 2, 241–263 (1996)
12. da Prato, G. and Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and its Applications 44. Cambridge: Cambridge University Press, 1992
13. Edmunds, D.E. and Evans, W.D.: Spectral Theory and Differential Operators. Oxford: Oxford Science Publications, 1990
14. Elder, K.R. and Desai, R.C.: Role of nonlinearities in off-critical quenches as described by the Cahn–Hilliard model of phase separation. Phys. Rev. B 40, 243–254 (1989)
15. Elder, K.R., Rogers, T.M. and Desai, R.C.: Numerical study of the late stages of spinodal decomposition. Phys. Rev. B 37, 9638–9649 (1987)
16. Elder, K.R., Rogers, T.M. and Desai, R.C.: Early stages of spinodal decomposition for the Cahn–Hilliard–Cook model of phase separation. Phys. Rev. B 38, 4725–4739 (1988)
17. Grant, C.P.: Spinodal decomposition for the Cahn–Hilliard equation. Comm. Partial Differ. Eqs. 18, 3–4, 453–490 (1993)
18. Grant, M., San Miguel, M., Viñals, J. and Gunton, J.D.: Theory for early stages of phase separation: The long-range-force limit. Phys. Rev. B 31, 3027–3039 (1985)
19. Karatzas, I. and Shreve, S.E.: Brownian Motion and Stochastic Calculus. Second Edition. Berlin–Heidelberg–New York: Springer, 1999
20. Langer, J.S., Bar-on, M. and Miller, H.: New computational method in the theory of spinodal decomposition. Phys. Rev. A 11, 4, 1417–1429 (1975)
21. Maier-Paape, S. and Wanner, T.: Spinodal decomposition for the Cahn–Hilliard equation in higher dimensions. Part I: Probability and wavelength estimate. Commun. Math. Phys. 195, 2, 435–464 (1998)
22. Maier-Paape, S. and Wanner, T.: Spinodal decomposition for the Cahn–Hilliard equation in higher dimensions: Nonlinear dynamics. Arch. Rational Mech. Anal. 151, 187–219 (2000)
23. Milchev, A., Heermann, D.W. and Binder, K.: Monte Carlo simulation of the Cahn–Hilliard model of spinodal decomposition. Acta Metallurgica 36, 2, 377–383 (1988)
24. Pego, R.L.: Front migration in the nonlinear Cahn–Hilliard equation. Proc. Royal Soc. London Series A 422, 261–278 (1989)
25. Sander, E. and Wanner, T.: Monte Carlo simulations for spinodal decomposition. J. Stat. Phys. 95, 5–6, 925–948 (1999)
26. Sander, E. and Wanner, T.: Unexpectedly linear behavior for the Cahn–Hilliard equation. SIAM J. Appl. Math. 60, 6, 2182–2202 (2000)

Communicated by J. L. Lebowitz
Commun. Math. Phys. 223, 583 – 626 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
A Time-Dependent Born–Oppenheimer Approximation with Exponentially Small Error Estimates

George A. Hagedorn1,*, Alain Joye2

1 Department of Mathematics and Center for Statistical Mechanics and Mathematical Physics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia 24061-0123, USA
2 Institut Fourier, Unité Mixte de Recherche CNRS-UJF 5582, Université de Grenoble I, BP 74, 38402 Saint Martin d’Hères Cedex, France

Received: 13 February 2001 / Accepted: 13 July 2001
Dedicated to Jean-Michel Combes in celebration of his 60th birthday

Abstract: We present the construction of an exponentially accurate time-dependent Born–Oppenheimer approximation for molecular quantum mechanics. We study molecular systems whose electron masses are held fixed and whose nuclear masses are proportional to $\epsilon^{-4}$, where $\epsilon$ is a small expansion parameter. By optimal truncation of an asymptotic expansion, we construct approximate solutions to the time-dependent Schrödinger equation that agree with exact normalized solutions up to errors whose norms are bounded by $C\exp(-\gamma/\epsilon^2)$, for some $C$ and $\gamma > 0$.

1. Introduction

In this paper we construct exponentially accurate approximate solutions to the time-dependent Schrödinger equation for a molecular system. The small parameter that governs the approximation is the usual Born–Oppenheimer expansion parameter $\epsilon$, where $\epsilon^4$ is the ratio of the electron mass to the mean nuclear mass. The approximate solutions we construct agree with exact solutions up to errors whose norms are bounded by $C\exp(-\gamma/\epsilon^2)$, for some $C$ and $\gamma > 0$, under analyticity assumptions on the electron Hamiltonian. The Hamiltonian for a molecular system with $K$ nuclei and $N - K$ electrons moving in $l$ dimensions has the form
$$H(\epsilon) = \sum_{j=1}^{K} -\frac{\epsilon^4}{2M_j}\Delta_{X_j} + \sum_{j=K+1}^{N} -\frac{1}{2m_j}\Delta_{X_j} + \sum_{i<j} V_{ij}(X_i - X_j).$$
Here $X_j \in \mathbb R^l$ denotes the position of the $j$th particle, the mass of the $j$th nucleus is $\epsilon^{-4}M_j$ for $1 \le j \le K$, the mass of the $j$th electron is $m_j$ for $K + 1 \le j \le N$, and the potential between particles $i$ and $j$ is $V_{ij}$. For convenience, we assume each $M_j = 1$. We set $d = Kl$ and let $X = (X_1, X_2, \ldots, X_K) \in \mathbb R^d$ denote the nuclear configuration vector. We can then decompose $H(\epsilon)$ as
$$H(\epsilon) = -\frac{\epsilon^4}{2}\Delta_X + h(X).$$

*Partially supported by National Science Foundation Grants DMS–9703751 and DMS–0071692.

584
G. A. Hagedorn, A. Joye
The first term on the right-hand side represents the nuclear kinetic energy, and the second is the “electron Hamiltonian” that depends parametrically on $X$. For each fixed $X$, $h(X)$ is a self-adjoint operator on the Hilbert space $\mathcal H_{el} = L^2(\mathbb R^{(N-K)l})$. The time-dependent Schrödinger equation we approximately solve in $L^2(\mathbb R^d, \mathcal H_{el})$ as $\epsilon \to 0$ is
$$i\epsilon^2\frac{\partial\psi}{\partial t} = -\frac{\epsilon^4}{2}\Delta_X\psi + h(X)\psi. \tag{1.1}$$
The factor of $\epsilon^2$ on the left-hand side results from a particular choice of time scale, where $t \in [0, T]$ is measured in units of $\epsilon^{-2}$. This choice of time scale is the one in which the nuclear motion has a non-trivial classical limit as $\epsilon$ tends to zero. Asymptotic expansions in powers of $\epsilon$ of certain solutions to Eq. (1.1) are derived in [7–9]. We obtain our construction by truncating these expansions after an $\epsilon$-dependent number of terms, in an effort to minimize the norm of the error. Similar strategies have been used to obtain exponentially accurate results for adiabatic approximations [27, 18, 19, 16] and semiclassical approximations [14, 15], both of which play roles in the Born–Oppenheimer approximation we are studying here. Roughly speaking, the time-dependent Born–Oppenheimer approximation says the following for small $\epsilon$: The electrons move very rapidly and adjust their state adiabatically as the more slowly moving nuclei change their positions. If the electrons start in a discrete energy level of $h(X)$, they will remain in that level to leading order in $\epsilon$. In the process, the electron states create an effective potential in which the motion of the heavy nuclei is well described by a semiclassical approximation. The asymptotic expansions show that this intuition is valid up to errors of order $\epsilon^k$ for any $k$. In Born–Oppenheimer approximations, adiabatic and semiclassical limits are being taken simultaneously, and they are coupled. Analysis of errors for the adiabatic and semiclassical approximations shows that they are each accurate up to errors whose bounds have the form $C\exp(-\gamma/\epsilon^2)$ [17, 14]. Non-adiabatic transitions are known in some systems to be of this order, and tunnelling in semiclassical approximations makes contributions of this order. Thus, one cannot expect to do better than approximations of this type because of two well-known physical phenomena that Born–Oppenheimer approximations do not take into account.
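The mechanism behind optimal truncation can be illustrated with a toy model. Suppose, purely as a hypothetical example and not as the bounds proved in this paper, that the $k$th error term behaves like $k!\,x^k$ with $x = \epsilon^2$. Consecutive terms decrease while $(k + 1)x < 1$, so the smallest term occurs near $k \approx 1/x$, where Stirling's formula gives a size of about $\sqrt{2\pi/x}\,e^{-1/x}$, i.e., an error of the form $C\exp(-\gamma/\epsilon^2)$. A Python sketch that works with logarithms to avoid overflow (function names are hypothetical):

```python
import math


def log_term(k: int, x: float) -> float:
    """log(k! * x^k) in the hypothetical error model; x plays the role of epsilon^2."""
    return math.lgamma(k + 1) + k * math.log(x)


def optimal_truncation(x: float):
    """Index k minimising k! x^k, together with the log of the minimal term."""
    kmax = int(10.0 / x)
    k_best = min(range(kmax + 1), key=lambda k: log_term(k, x))
    return k_best, log_term(k_best, x)


for eps in (0.3, 0.2, 0.1):
    x = eps ** 2
    k, log_min = optimal_truncation(x)
    # Compare with the Stirling prediction -1/x + 0.5 * log(2 pi / x).
    print(eps, k, log_min, -1.0 / x + 0.5 * math.log(2.0 * math.pi / x))
```

The optimal number of terms grows like $\epsilon^{-2}$, which is why the truncation order in such constructions has to depend on $\epsilon$.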
In some systems, tunnelling might dominate the error. In some, non-adiabatic electronic transitions may dominate. In others, the two effects can be of comparable magnitude. One of the motivations for our work is to generate a “good” basis upon which to build a “surface hopping model” that would accurately describe non-adiabatic electronic transitions. Prior authors (see, e.g., [29, 32, 33, 5]) have proposed such models based on the zeroth order time-dependent Born–Oppenheimer approximation. Using the zeroth order states as a basis of the surface hopping model, the non-adiabatic transitions appear at order $\epsilon^2$. This is huge compared to the exponentially small physical phenomenon one would like to study, and we believe interference between transitions that occur at
Time-Dependent Born–Oppenheimer Approximation
585
different times is responsible for the exponential smallness of the physically interesting quantity. Our view is that by choosing a much better set of states on which to base the model, one will obtain a much more useful approximation. Sir Michael Berry [3, 4, 21] has advocated such ideas for the somewhat simpler adiabatic approximation (which does not have the complications of the nuclear motion). These ideas have been used in [16, 18] to prove the accuracy of certain results for non-adiabatic transitions that are exponentially small and in [22] to study exponentially small transitions for a simple one-dimensional trajectory model for $N$-level systems.

Remarks. 1. There are some other exponentially accurate results in the general topic of Born–Oppenheimer approximations. The prior results come from the study of the time-independent Schrödinger equation and depend on global properties of the system. Our results are time-dependent and make use of local information. Klein [20] and Martinez [23–25] show that resonances associated with predissociation processes have exponentially long lifetimes. Benchaou and Martinez [1, 2] also show that certain S-matrix elements associated with non-adiabatic transitions are exponentially small.
2. The papers cited in the previous remark obtain estimates that depend on the global structure of the electron energy levels. The results we obtain depend on a particular classical path. When the path stays away from the nuclear configurations where the gap between relevant electronic levels is minimized, one would expect the non-adiabatic errors from our approximation to be smaller, i.e., both results would obtain errors of order $\exp(-\Gamma/\epsilon^2)$, but we would obtain a larger value of $\Gamma$. We expect this because in our case, the Landau–Zener formula predicts that our $\Gamma$ should come from the minimum gap between eigenvalues on the classical path, rather than the global minimum gap.
3. From a mathematical point of view, the optimal truncation procedure in this context was first stated for the adiabatic approximation for two-component systems of ODEs by Berry [3, 4]. It was first proved to yield exponentially accurate results for Hilbert space valued ODEs by Nenciu [27]. In [14, 15] we used this idea for the semiclassical approximation, which is a complex valued PDE setting. The present paper can be viewed as extending these ideas to a Hilbert space valued PDE setting.
4. Since the original preprint of this paper was written (mp_arc 00-209), three other relevant preprints have been written [28, 26, 31]. These three papers use pseudodifferential operator techniques to investigate the adiabatic aspects of problems similar to the ones we study here. They reduce the full problem to a semiclassical Schrödinger equation for the nuclei that is associated with a finite set of isolated electronic levels. They allow more general initial conditions for the nuclei than the ones we study here, but their results are less detailed.
1.1. Hypotheses. We assume that the electron Hamiltonian $h(X)$ satisfies the following analyticity hypotheses:
H0 (i) For any $X \in \mathbb R^d$, $h(X)$ is a self-adjoint operator on some dense domain $D \subset \mathcal H_{el}$, where $\mathcal H_{el}$ is the electronic Hilbert space. We assume the domain $D$ is independent of $X$ and $h(X)$ is bounded from below uniformly in $\mathbb R^d$.
(ii) There exists a $\delta > 0$ such that for every $\psi \in D$, the vector $h(X)\psi$ is analytic in $S_\delta = \{z \in \mathbb C^d : |\mathrm{Im}(z_j)| < \delta,\ j = 1, \ldots, d\}$.
G. A. Hagedorn, A. Joye
H1 There exists an open set $\Omega \subset \mathbb{R}^d$ such that for all $X \in \Omega$, there exists an isolated, multiplicity one eigenvalue $E(X)$ of $h(X)$ associated with a normalized eigenvector $\Phi(X) \in \mathcal{H}_{el}$. We assume without loss that the origin belongs to $\Omega$. Remarks. 1. Hypothesis H0 implies that the family of operators $\{h(X)\}_{X \in S_\delta}$ is a holomorphic family of type A. 2. It follows from H0 and H1 that there exist $\delta' \in (0, \delta)$ and $\Omega' \subset \Omega$ such that the complex and vector valued functions $E(\cdot)$ and $\Phi(\cdot)$ admit analytic continuations on the set $\Sigma_{\delta'} = \{z \in \mathbb{C}^d : \mathrm{Re}(z) \in \Omega' \text{ and } |\mathrm{Im}(z_j)| < \delta',\ j = 1, \dots, d\}$. 3. Realistic models with Coulomb potentials do not satisfy condition (ii) of Hypothesis H0 because of the singularities of the Coulomb potentials. One should be able to extend our results to Coulomb potentials by using the methods of [9], but doing so would be technically extremely complicated.
1.2. Summary of the main results. Our main results are stated precisely as Theorem 4.1 in Sect. 4. Two generalizations of this result are presented in Sect. 8. Roughly speaking, Theorem 4.1 states the following: Under hypotheses H0 and H1, we construct $\Psi^*(X, t, \epsilon)$ (which depends on a parameter $g$) for $t \in [0, T]$. For small values of $g$, there exist $C(g)$ and $\Gamma(g) > 0$ such that in the limit $\epsilon \to 0$,
$$\left\| e^{-itH(\epsilon)/\epsilon^2}\Psi^*(X, 0, \epsilon) - \Psi^*(X, t, \epsilon) \right\|_{L^2(\mathbb{R}^d, \mathcal{H}_{el})} \le C(g)\, e^{-\Gamma(g)/\epsilon^2}.$$
In the state $\Psi^*(X, t, \epsilon)$, the electrons have a high probability of being in the electron state $\Phi(X)$. For any $b > 0$ and sufficiently small values of $g$, the nuclei are localized near a classical path $a(t)$ in the sense that there exist $c(g)$ and $\gamma(g) > 0$ such that in the limit $\epsilon \to 0$,
$$\left( \int_{|X - a(t)| > b} \|\Psi^*(X, t, \epsilon)\|^2_{\mathcal{H}_{el}}\, dX \right)^{1/2} \le c(g)\, e^{-\gamma(g)/\epsilon^2}.$$
The mechanics of the nuclear configuration $a(t)$ is determined by classical dynamics in the effective potential $E(X)$. Two theorems in Sect. 8 generalize this result. The first allows the time interval to grow as $\epsilon$ tends to zero. The second allows more general initial conditions. Remark. We have not addressed the question of whether or not we obtain realistic values for $\Gamma(g)$ or $\gamma(g)$ when making optimal choices of the parameter $g$ in our results.

2. Coherent States and Classical Dynamics

In the construction of our approximation to the solution of the molecular Schrödinger equation, we need wave packets that describe the semiclassical dynamics of the heavy nuclei. In the Born–Oppenheimer context, the semiclassical parameter is $\epsilon^2$, which plays the role of the usual $\hbar$ for the nuclei. We make use of a convenient set of coherent states (also called generalized squeezed states), which we express here in terms of the semiclassical parameter $\hbar$.
Time-Dependent Born–Oppenheimer Approximation
Warning. We warn the reader that we use the familiar semiclassical notation where the small parameter is $\hbar$. In the rest of the paper, the physical value of Planck's constant is $\hbar = 1$, and the relevant semiclassical parameter is $\epsilon^2$, due to the large masses of the nuclei and our choice of time scale. We recall the definition of the coherent states $\varphi_j(A, B, \hbar, a, \eta, X)$ that are described in detail in [13]. A more explicit, but more complicated definition is given in [12]. We adopt the standard multi-index notation. A multi-index $j = (j_1, j_2, \dots, j_d)$ is a $d$-tuple of non-negative integers. We define $|j| = \sum_{k=1}^d j_k$, $X^j = X_1^{j_1} X_2^{j_2} \cdots X_d^{j_d}$, $j! = (j_1!)(j_2!) \cdots (j_d!)$, and
$$D^j = \frac{\partial^{|j|}}{(\partial X_1)^{j_1} (\partial X_2)^{j_2} \cdots (\partial X_d)^{j_d}}.$$
Throughout the paper we assume $a \in \mathbb{R}^d$, $\eta \in \mathbb{R}^d$ and $\hbar > 0$. We also assume that $A$ and $B$ are $d \times d$ complex invertible matrices that satisfy
$$A^t B - B^t A = 0, \qquad A^* B + B^* A = 2I. \tag{2.1}$$
These conditions guarantee that both the real and imaginary parts of $BA^{-1}$ are symmetric. Furthermore, $\mathrm{Re}\, BA^{-1}$ is strictly positive definite, and $\left(\mathrm{Re}\, BA^{-1}\right)^{-1} = AA^*$. Our definition of $\varphi_j(A, B, \hbar, a, \eta, X)$ is based on the following raising operators, defined for $m = 1, 2, \dots, d$:
$$\mathcal{A}_m(A, B, \hbar, a, \eta)^* = \frac{1}{\sqrt{2\hbar}} \left( \sum_{n=1}^d \overline{B_{nm}}\, (X_n - a_n) - i \sum_{n=1}^d \overline{A_{nm}} \left( -i\hbar \frac{\partial}{\partial X_n} - \eta_n \right) \right).$$
The corresponding lowering operators $\mathcal{A}_m(A, B, \hbar, a, \eta)$ are their formal adjoints. These operators satisfy commutation relations that lead to the properties of the $\varphi_j(A, B, \hbar, a, \eta, X)$ that we list below. The raising operators $\mathcal{A}_m(A, B, \hbar, a, \eta)^*$ for $m = 1, 2, \dots, d$ commute with one another, and the lowering operators $\mathcal{A}_m(A, B, \hbar, a, \eta)$ commute with one another. However,
$$\mathcal{A}_m(A, B, \hbar, a, \eta)\, \mathcal{A}_n(A, B, \hbar, a, \eta)^* - \mathcal{A}_n(A, B, \hbar, a, \eta)^*\, \mathcal{A}_m(A, B, \hbar, a, \eta) = \delta_{m,n}.$$
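Conditions (2.1) and the consequences stated right after them (symmetric real and imaginary parts of $BA^{-1}$, positive definite $\mathrm{Re}\, BA^{-1}$, and $(\mathrm{Re}\, BA^{-1})^{-1} = AA^*$) can be checked numerically. The NumPy sketch below is our own illustration, not part of the paper: the parametrization $A = SU$, $B = ((S^t)^{-1} + iQS)U$, with $S$ real invertible, $Q$ real symmetric and $U$ unitary, is a hypothetical convenient way to produce admissible pairs (a direct computation shows it satisfies (2.1) and gives $BA^{-1} = (SS^t)^{-1} + iQ$).

```python
import numpy as np


def admissible_pair(S: np.ndarray, Q: np.ndarray, U: np.ndarray):
    """Hypothetical construction of (A, B) satisfying (2.1).

    S: real invertible, Q: real symmetric, U: unitary (all d x d).
    A = S U and B = ((S^T)^{-1} + i Q S) U, so that B A^{-1} = (S S^T)^{-1} + i Q.
    """
    A = S @ U
    B = (np.linalg.inv(S.T) + 1j * Q @ S) @ U
    return A, B


def check_conditions(A: np.ndarray, B: np.ndarray) -> dict:
    """Residuals of (2.1) and of the stated consequences for B A^{-1}."""
    d = A.shape[0]
    M = B @ np.linalg.inv(A)
    re, im = M.real, M.imag
    return {
        "At_B_minus_Bt_A": np.linalg.norm(A.T @ B - B.T @ A),
        "Astar_B_plus_Bstar_A_minus_2I": np.linalg.norm(
            A.conj().T @ B + B.conj().T @ A - 2 * np.eye(d)),
        "re_asymmetry": np.linalg.norm(re - re.T),
        "im_asymmetry": np.linalg.norm(im - im.T),
        "min_eig_re": np.linalg.eigvalsh((re + re.T) / 2).min(),
        "inverse_identity": np.linalg.norm(np.linalg.inv(re) - A @ A.conj().T),
    }


rng = np.random.default_rng(0)
d = 3
S = 2 * np.eye(d) + 0.3 * rng.normal(size=(d, d))  # well-conditioned real matrix
Q = rng.normal(size=(d, d))
Q = Q + Q.T                                          # real symmetric
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))  # unitary
A, B = admissible_pair(S, Q, U)
print(check_conditions(A, B))
```

Any other admissible pair can be passed to `check_conditions`; all residuals should be at round-off level and `min_eig_re` should be positive.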
Definition 2.1. For the multi-index $j = 0$, we define the normalized complex Gaussian wave packet (modulo the sign of a square root) by
$$\varphi_0(A, B, \hbar, a, \eta, X) = \pi^{-d/4}\hbar^{-d/4}(\det A)^{-1/2} \exp\left( -\frac{\langle X - a, BA^{-1}(X - a)\rangle}{2\hbar} + i\frac{\langle \eta, X - a\rangle}{\hbar} \right).$$
Then, for any non-zero multi-index $j$, we define
$$\varphi_j(A, B, \hbar, a, \eta, \cdot) = \frac{1}{\sqrt{j!}} \left(\mathcal{A}_1(A, B, \hbar, a, \eta)^*\right)^{j_1} \left(\mathcal{A}_2(A, B, \hbar, a, \eta)^*\right)^{j_2} \cdots \left(\mathcal{A}_d(A, B, \hbar, a, \eta)^*\right)^{j_d} \varphi_0(A, B, \hbar, a, \eta, \cdot).$$
Properties. 1. For $A = B = I$, $\hbar = 1$, and $a = \eta = 0$, the $\varphi_j(A, B, \hbar, a, \eta, \cdot)$ are just the standard harmonic oscillator eigenstates with energies $|j| + d/2$.
2. For each admissible $A$, $B$, $\hbar$, $a$, and $\eta$, the set $\{\varphi_j(A, B, \hbar, a, \eta, \cdot)\}$ is an orthonormal basis for $L^2(\mathbb{R}^d)$.
3. In [12], the state $\varphi_j(A, B, \hbar, a, \eta, X)$ is defined as a normalization factor times $H_j(A; \hbar^{-1/2}|A|^{-1}(X - a))\, \varphi_0(A, B, \hbar, a, \eta, X)$. Here $H_j(A; y)$ is a recursively defined $|j|$th order polynomial in $y$ that depends on $A$ only through $U_A$, where $A = |A| U_A$ is the polar decomposition of $A$.
4. By scaling out the $|A|$ and $\hbar$ dependence and using Property 3 above, one can show that $H_j(A; y)e^{-y^2/2}$ is an (unnormalized) eigenstate of the usual harmonic oscillator with energy $|j| + d/2$.
5. When the dimension $d$ is 1, the position and momentum uncertainties of the $\varphi_j(A, B, \hbar, a, \eta, \cdot)$ are $\sqrt{(j + 1/2)\hbar}\, |A|$ and $\sqrt{(j + 1/2)\hbar}\, |B|$, respectively. In higher dimensions, they are bounded by $\sqrt{(|j| + d/2)\hbar}\, \|A\|$ and $\sqrt{(|j| + d/2)\hbar}\, \|B\|$, respectively.
6. When we approximately solve the Schrödinger equation, the choice of the sign of the square root in the definition of $\varphi_0(A, B, \hbar, a, \eta, \cdot)$ is determined by continuity in $t$ after an arbitrary initial choice.

The following simple but very useful lemma is proven in [15].

Lemma 2.1. Let $P_{|j| \le n}$ denote the projection onto the span of the $\varphi_j(A, B, \hbar, a, \eta, \cdot)$ with $|j| \le n$. Then
$$(X - a)^m P_{|j| \le n} = P_{|j| \le n + |m|}\, (X - a)^m P_{|j| \le n}, \tag{2.2}$$
and
$$\left\| (X - a)^m P_{|j| \le n} \right\| \le \left( \sqrt{2\hbar}\, d\, \|A\| \right)^{|m|} \left( \frac{(n + |m|)!}{n!} \right)^{1/2}. \tag{2.3}$$
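In dimension $d = 1$ the definition of $\varphi_j$ can be turned into a three-term recurrence. The one-dimensional form of (2.1) is $\bar A B + \bar B A = 2$, so the raising and lowering operators combine to $X - a = \sqrt{\hbar/2}\,(A\,\mathcal{A}^* + \bar A\,\mathcal{A})$; applying this to $\varphi_j$ gives $\varphi_{j+1} = \big(\sqrt{2/\hbar}\,(X - a)\varphi_j - \bar A\sqrt{j}\,\varphi_{j-1}\big)/(A\sqrt{j+1})$. The NumPy sketch below is our own illustration (the grid and the values of $A$, $B$, $\hbar$, $a$, $\eta$ are arbitrary choices): it builds the packets on a grid and checks Property 2 (orthonormality), Property 5 (position uncertainty $\sqrt{(j + 1/2)\hbar}\,|A|$), and the case $m = 1$ of (2.2), namely that multiplication by $X - a$ couples $\varphi_j$ only to $\varphi_{j \pm 1}$.

```python
import numpy as np


def hagedorn_packets_1d(x, A, B, hbar, a, eta, jmax):
    """Return phi_0, ..., phi_jmax (as rows) sampled on the grid x, for d = 1."""
    # one-dimensional form of (2.1): conj(A) B + conj(B) A = 2
    if abs(np.conj(A) * B + np.conj(B) * A - 2) > 1e-12:
        raise ValueError("A, B violate (2.1)")
    phi = np.zeros((jmax + 1, x.size), dtype=complex)
    # Definition 2.1, principal branch of the square root of A
    phi[0] = ((np.pi * hbar) ** -0.25 * A ** -0.5
              * np.exp(-(B / A) * (x - a) ** 2 / (2 * hbar) + 1j * eta * (x - a) / hbar))
    for j in range(jmax):
        prev = phi[j - 1] if j > 0 else 0.0
        phi[j + 1] = (np.sqrt(2 / hbar) * (x - a) * phi[j]
                      - np.conj(A) * np.sqrt(j) * prev) / (A * np.sqrt(j + 1))
    return phi


def gram_matrix(phi, x):
    """<phi_j, phi_k> by a Riemann sum (very accurate for smooth, rapidly decaying functions)."""
    dx = x[1] - x[0]
    return (phi.conj() @ phi.T) * dx


def position_matrix(phi, x, a):
    """<phi_j, (x - a) phi_k>."""
    dx = x[1] - x[0]
    return (phi.conj() * (x - a)) @ phi.T * dx


x = np.linspace(-12.0, 12.0, 4001)
A = 1.0 + 0.5j
B = (1.0 + 0.3j) / np.conj(A)  # then conj(A) B = 1 + 0.3i, so (2.1) holds
hbar, a, eta = 1.0, 0.5, 1.0
phi = hagedorn_packets_1d(x, A, B, hbar, a, eta, jmax=8)
G = gram_matrix(phi, x)
M = position_matrix(phi, x, a)
dx = x[1] - x[0]
var3 = np.sum(np.abs(phi[3]) ** 2 * (x - a) ** 2) * dx
print(np.abs(G - np.eye(9)).max(), var3, 3.5 * hbar * abs(A) ** 2)
```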
In the Born–Oppenheimer approximation, the semiclassical dynamics of the nuclei is generated by an effective potential given by a chosen isolated electronic eigenvalue $E(X)$ of the electronic hamiltonian $h(X)$, $X \in \mathbb{R}^d$. For a given effective potential $E(X)$ we describe the semiclassical dynamics of the nuclei by means of the time dependent basis constructed as follows: By assumption H1, the potential $E : \Omega \subset \mathbb{R}^d \to \mathbb{R}$ is smooth and bounded below. Associated to $E(X)$, we have the following classical equations of motion:
$$\begin{aligned}
\dot a(t) &= \eta(t),\\
\dot \eta(t) &= -\nabla E(a(t)),\\
\dot A(t) &= iB(t),\\
\dot B(t) &= iE^{(2)}(a(t))\, A(t),\\
\dot S(t) &= \frac{\eta(t)^2}{2} - E(a(t)),
\end{aligned} \tag{2.4}$$
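where $E^{(2)}$ denotes the Hessian matrix for $E$. We always assume the initial conditions $A(0)$, $B(0)$, $a(0)$, $\eta(0)$, and $S(0) = 0$ are such that $A(0)$ and $B(0)$ satisfy (2.1). The matrices $A(t)$ and $B(t)$ are related to the linearization of the classical flow through the following identities:
$$A(t) = \frac{\partial a(t)}{\partial a(0)} A(0) + i \frac{\partial a(t)}{\partial \eta(0)} B(0), \qquad B(t) = \frac{\partial \eta(t)}{\partial \eta(0)} B(0) - i \frac{\partial \eta(t)}{\partial a(0)} A(0).$$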
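The system (2.4) is straightforward to integrate numerically, which gives a quick sanity check of the statement, made just below, that the flow preserves (2.1): both expressions in (2.1) have zero time derivative along (2.4) because $E^{(2)}$ is real symmetric. The sketch below is illustrative only. It assumes NumPy, uses a hand-written fourth-order Runge–Kutta step, and uses a hypothetical smooth potential $E(X) = \tfrac12(X_1^2 + 2X_2^2) + 0.1\,X_1^2X_2^2$ in $d = 2$ that is not taken from the paper.

```python
import numpy as np


# Hypothetical potential E(X) = (X1^2 + 2 X2^2)/2 + 0.1 X1^2 X2^2, with its gradient and Hessian.
def E(x):
    return 0.5 * (x[0] ** 2 + 2.0 * x[1] ** 2) + 0.1 * x[0] ** 2 * x[1] ** 2


def grad_E(x):
    return np.array([x[0] + 0.2 * x[0] * x[1] ** 2,
                     2.0 * x[1] + 0.2 * x[0] ** 2 * x[1]])


def hess_E(x):
    return np.array([[1.0 + 0.2 * x[1] ** 2, 0.4 * x[0] * x[1]],
                     [0.4 * x[0] * x[1], 2.0 + 0.2 * x[0] ** 2]])


def rhs(state):
    """Right-hand side of (2.4) for state = (a, eta, A, B, S)."""
    a, eta, A, B, S = state
    return (eta, -grad_E(a), 1j * B, 1j * hess_E(a) @ A, 0.5 * eta @ eta - E(a))


def rk4_step(state, h):
    def shift(s, k, c):
        return tuple(si + c * ki for si, ki in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shift(state, k1, h / 2))
    k3 = rhs(shift(state, k2, h / 2))
    k4 = rhs(shift(state, k3, h))
    return tuple(s + (h / 6) * (q1 + 2 * q2 + 2 * q3 + q4)
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))


def propagate(state, t_final, steps):
    h = t_final / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


def residuals_21(A, B):
    """Norms of A^T B - B^T A and A^* B + B^* A - 2I, i.e. the defects in (2.1)."""
    return (np.linalg.norm(A.T @ B - B.T @ A),
            np.linalg.norm(A.conj().T @ B + B.conj().T @ A - 2 * np.eye(A.shape[0])))


# A(0) = B(0) = I satisfies (2.1); S(0) = 0.
initial = (np.array([1.0, 0.5]), np.array([0.0, 0.3]),
           np.eye(2, dtype=complex), np.eye(2, dtype=complex), 0.0)
final = propagate(initial, t_final=5.0, steps=5000)
print(residuals_21(final[2], final[3]))
```

Runge–Kutta does not preserve quadratic invariants exactly, so the residuals are small but not zero; they shrink like the fourth power of the step size.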
Because $E$ is smooth and bounded below, there exist global solutions to the first two equations of the system (2.4) for any initial condition if $\Omega = \mathbb{R}^d$. From this, it follows immediately that the remaining three equations of the system (2.4) have global solutions. If $\Omega \neq \mathbb{R}^d$, for any initial conditions, there exists a $0 < T \le \infty$ so that solutions to the system (2.4) exist for any time $t \in [0, T]$. $T$ is finite if and only if the solution $a(t)$ corresponding to the chosen initial condition leaves the set $\Omega$ in finite time. Furthermore, it is not difficult [11, 12] to prove that conditions (2.1) are preserved by the flow. The usefulness of our wave packets stems from the following important property [13]. If we decompose the potential as
$$E(X) = W_a(X) + Z_a(X) \equiv W_a(X) + \big(E(X) - W_a(X)\big),$$
where $W_a(X)$ denotes the second order Taylor approximation (with the obvious abuse of notation)
$$W_a(X) \equiv E(a) + E^{(1)}(a)(X - a) + E^{(2)}(a)(X - a)^2/2,$$
then for all multi-indices $j$,
$$i\hbar \frac{\partial}{\partial t}\left( e^{iS(t)/\hbar}\varphi_j(A(t), B(t), \hbar, a(t), \eta(t), X) \right) = \left( -\frac{\hbar^2}{2}\Delta + W_{a(t)}(X) \right) \left( e^{iS(t)/\hbar}\varphi_j(A(t), B(t), \hbar, a(t), \eta(t), X) \right),$$
if $A(t)$, $B(t)$, $a(t)$, $\eta(t)$, and $S(t)$ satisfy (2.4). In other words, our semiclassical wave packets $\varphi_j$ exactly take into account the kinetic energy and the quadratic part $W_{a(t)}(X)$ of the potential when propagated by means of the classical flow and its linearization around the classical trajectory selected by the initial conditions. In the rest of the paper, whenever we write $\varphi_j(A(t), B(t), \hbar, a(t), \eta(t), X)$, we tacitly assume that $A(t)$, $B(t)$, $a(t)$, $\eta(t)$, and $S(t)$ are solutions to (2.4) with initial conditions satisfying (2.1).

3. The Born–Oppenheimer Expansion in Powers of $\epsilon$

In this section we derive an explicit formal expansion in $\epsilon$ for the solution to the molecular Schrödinger equation by means of a multiple scales analysis. This asymptotic analysis is similar to that performed, e.g., in [10]. We discuss it in detail because we need more detailed information on the structure of successive terms in the expansion.
We start with the molecular Schrödinger equation for $d$ nuclear configuration dimensions,
$$i\epsilon^2 \frac{\partial \Psi}{\partial t} = -\frac{\epsilon^4}{2}\Delta_X \Psi + h(X)\Psi. \tag{3.1}$$
We consider the isolated, multiplicity one, smooth eigenvalue $E(X)$ of $h(X)$ of hypothesis H1. For the moment we assume $E(X)$ is well defined on all of $\mathbb{R}^d$ rather than just on a subset $\Omega \subset \mathbb{R}^d$. Later we introduce a cut-off function to take care of the general case. We consider the solution $a(t)$, $\eta(t)$, $A(t)$, $B(t)$, and $S(t)$ to the system (2.4) of ODE's. Then, we choose the phase of the eigenfunction $\tilde\Phi(X, t)$ so that
$$\left\langle \tilde\Phi(X, t),\ \left( i\frac{\partial}{\partial t} + i\eta(t)\cdot\nabla_X \right) \tilde\Phi(X, t) \right\rangle_{\mathcal{H}_{el}} = 0. \tag{3.2}$$
This can always be done. See, e.g., [10]. The multiple scales analysis consists of separating the two length scales that are important in the nuclear variable $X$. The electron wave function is sensitive on an $O(1)$ scale in this variable, so $X$, or equivalently, $X - a(t)$ is relevant. The quantum mechanical fluctuations of the nuclear wave function occur on an $O(\epsilon)$ length scale, so $(X - a(t))/\epsilon$ is also relevant. We replace the variable $X$ by both $w = X - a(t)$ and $y = w/\epsilon$, and consider them as independent variables. This leads to the new problem of studying
$$i\epsilon^2 \frac{\partial \tilde\Psi}{\partial t} = \Big( -\frac{\epsilon^4}{2}\Delta_w - \epsilon^3 \nabla_w\cdot\nabla_y - \frac{\epsilon^2}{2}\Delta_y + i\epsilon^2 \eta(t)\cdot\nabla_w + i\epsilon\, \eta(t)\cdot\nabla_y + [h(a(t) + w) - E(a(t) + w)] + E(a(t) + \epsilon y) \Big) \tilde\Psi(w, y, t). \tag{3.3}$$
We easily check that if $\tilde\Psi(w, y, t)$ solves (3.3), then $\tilde\Psi(X - a(t), (X - a(t))/\epsilon, t)$ solves (3.1). We define $\Phi(w, t) = \tilde\Phi(a(t) + w, t)$. Then (3.2) becomes
$$\left\langle \Phi(w, t),\ i\frac{\partial \Phi}{\partial t}(w, t) \right\rangle_{\mathcal{H}_{el}} = 0. \tag{3.4}$$
We seek solutions to (3.3) of the form
$$\tilde\Psi(w, y, t) = e^{iS(t)/\epsilon^2}\, e^{i\eta(t)\cdot y/\epsilon}\, \phi(w, y, t).$$
This requires $\phi(w, y, t)$ to satisfy
$$\begin{aligned}
i\epsilon^2 \frac{\partial \phi}{\partial t} = \Big( &-\frac{\epsilon^4}{2}\Delta_w - \epsilon^3\nabla_w\cdot\nabla_y + \epsilon^2\Big( -\frac12\Delta_y + E^{(2)}(a(t))\frac{y^2}{2!} \Big) + [h(a(t) + w) - E(a(t) + w)]\\
&+ E(a(t) + \epsilon y) - E(a(t)) - \epsilon E^{(1)}(a(t))\cdot y - \epsilon^2 E^{(2)}(a(t))\frac{y^2}{2!} \Big)\phi,
\end{aligned} \tag{3.5}$$
where here and below we make use of the shorthand notation
$$E^{(m)}(x)\frac{y^m}{m!} = \sum_{\{k : |k| = m\}} \frac{(D^k E)(x)\, y^k}{k!}$$
in the usual multi-index notation. We next assume that $\phi(w, y, t)$ has an expansion of the form
$$\phi(w, y, t) = \phi_0(w, y, t) + \epsilon\, \phi_1(w, y, t) + \epsilon^2 \phi_2(w, y, t) + \cdots.$$
We further decompose each $\phi_n$ as $\phi_n(w, y, t) = g_n(w, y, t)\, \Phi(w, t) + \phi_n^\perp(w, y, t)$, by projecting onto the $\Phi(w, t)$ direction and onto the orthogonal directions in $\mathcal{H}_{el}$. We substitute this expansion into (3.5) and equate terms of the corresponding powers of $\epsilon$.

Order 0. The zeroth order terms require $[h(a(t) + w) - E(a(t) + w)]\, \phi_0(w, y, t) = 0$. This forces $\phi_0^\perp(w, y, t) = 0$.

Order 1. The first order terms require $[h(a(t) + w) - E(a(t) + w)]\, \phi_1(w, y, t) = 0$. This forces $\phi_1^\perp(w, y, t) = 0$.

Order 2. The second order terms require
$$i\frac{\partial \phi_0}{\partial t} = \Big( -\frac12\Delta_y + E^{(2)}(a(t))\frac{y^2}{2!} \Big)\phi_0 + [h(a(t) + w) - E(a(t) + w)]\, \phi_2.$$
We separately examine the components of this equation in the $\Phi$ direction and in the orthogonal directions. By (3.4), this yields the two conditions
$$i\frac{\partial g_0}{\partial t} = \Big( -\frac12\Delta_y + E^{(2)}(a(t))\frac{y^2}{2!} \Big) g_0, \tag{3.6}$$
and
$$[h(a(t) + w) - E(a(t) + w)]\, \phi_2 = i g_0 \frac{\partial \Phi}{\partial t}. \tag{3.7}$$
We arbitrarily choose $g_0$ to be the following $w$-independent particular solution of (3.6):
$$g_0(w, y, t) = \epsilon^{-d/2} \sum_{|j| \le J} c_{0,j}\, \varphi_j(A(t), B(t), 1, 0, 0, y), \tag{3.8}$$
where $c_{0,j} = c_j$ is determined by the initial conditions.
We let the Hilbert space $\mathcal{H}_{el}^\perp$ be the subspace of $\mathcal{H}_{el}$ orthogonal to $\Phi(w, t)$. The restriction of $[h(a(t) + w) - E(a(t) + w)]$ to $\mathcal{H}_{el}^\perp$ is invertible, and we denote the inverse by $r(w, t) = [h(a(t) + w) - E(a(t) + w)]_r^{-1}$. With this notation, Eq. (3.7) forces
$$\phi_2^\perp(w, y, t) = i g_0(w, y, t)\, r(w, t) \frac{\partial \Phi}{\partial t}(w, t) = \epsilon^{-d/2} \sum_{|j| \le J} d_{2,j}(w, t)\, \varphi_j(A(t), B(t), 1, 0, 0, y), \tag{3.9}$$
where
$$d_{2,j}(w, t) = i\, c_{0,j}\, r(w, t) \frac{\partial \Phi}{\partial t}(w, t) \tag{3.10}$$
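is $\mathcal{H}_{el}$-valued.

Order 3. The third order terms require
$$i\frac{\partial \phi_1}{\partial t} = \Big( -\frac12\Delta_y + E^{(2)}(a(t))\frac{y^2}{2!} \Big)\phi_1 - \nabla_w\cdot\nabla_y \phi_0 + E^{(3)}(a(t))\frac{y^3}{3!}\phi_0 + [h(a(t) + w) - E(a(t) + w)]\, \phi_3.$$
We separately examine the components of this equation in the $\Phi$ direction and in the orthogonal directions. By (3.4), this yields the two conditions
$$i\frac{\partial g_1}{\partial t} - \Big( -\frac12\Delta_y + E^{(2)}(a(t))\frac{y^2}{2!} \Big) g_1 = -(\nabla_y g_0)\cdot\langle \Phi, \nabla_w \Phi\rangle + E^{(3)}(a(t))\frac{y^3}{3!} g_0, \tag{3.11}$$
and
$$[h(a(t) + w) - E(a(t) + w)]\, \phi_3 = i g_1 \frac{\partial \Phi}{\partial t} + (\nabla_y g_0)\cdot(P_\perp \nabla_w \Phi), \tag{3.12}$$
where $P_\perp(w, t)$ is the projection in $\mathcal{H}_{el}$ onto $\mathcal{H}_{el}^\perp$. The solution to (3.11) with $g_1(w, y, 0) = 0$ can be written as
$$g_1(w, y, t) = \epsilon^{-d/2} \sum_{|j| \le J + 3} c_{1,j}(w, t)\, \varphi_j(A(t), B(t), 1, 0, 0, y),$$
for some coefficients $c_{1,j}(w, t)$. Equation (3.12) determines
$$\begin{aligned}
\phi_3^\perp(w, y, t) &= r(w, t)\Big( i g_1(w, y, t)\frac{\partial \Phi}{\partial t}(w, t) + (\nabla_y g_0)(w, y, t)\cdot\big(P_\perp(w, t)\nabla_w \Phi(w, t)\big) \Big)\\
&= \epsilon^{-d/2} \sum_{|j| \le J + 3} d_{3,j}(w, t)\, \varphi_j(A(t), B(t), 1, 0, 0, y),
\end{aligned} \tag{3.13}$$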
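where
$$d_{3,j}(w, t) = i\, r(w, t)\dot\Phi(w, t)\, c_{1,j}(w, t) + \sum_{|q| \le J} r(w, t)(P_\perp \nabla_w \Phi)(w, t)\cdot\langle \varphi_j, \nabla_y \varphi_q\rangle\, c_{0,q}(w, t).$$
Here and below, a dot denotes $\partial/\partial t$.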
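The first perpendicular correction only needs the reduced resolvent: by (3.9), $\phi_2^\perp = i g_0\, r\, \partial_t\Phi$, so each coefficient is $i c_{0,j}\, r\, \partial_t\Phi$. In finite dimensions, $r = \sum_{k \ne 0} |v_k\rangle\langle v_k| / (\lambda_k - E)$ inverts $h - E$ on $\mathcal{H}_{el}^\perp$ and vanishes on $\Phi$. The NumPy sketch below is a toy illustration under stated assumptions: a hypothetical real symmetric $3 \times 3$ matrix stands in for $h(a(t) + w)$, we take $w = 0$ and the hypothetical path $a(t) = t$, and $\partial_t\Phi$ is approximated by a central difference. For a real normalized eigenvector whose sign is chosen continuously, $\langle \Phi, \partial_t\Phi\rangle = 0$, so the gauge condition (3.4) holds automatically.

```python
import numpy as np


def toy_h(x: float) -> np.ndarray:
    """Hypothetical real symmetric 3-level stand-in for h(x); not a model from the paper."""
    return np.array([[0.0, 0.3 * x, 0.1],
                     [0.3 * x, 1.0 + x ** 2, 0.2],
                     [0.1, 0.2, 2.0 - 0.5 * x]])


def level_and_reduced_resolvent(x: float):
    """Lowest eigenvalue E, its eigenvector Phi, and r = (h - E)^{-1} on Phi^perp (zero on Phi)."""
    vals, vecs = np.linalg.eigh(toy_h(x))
    E, Phi = vals[0], vecs[:, 0]
    r = sum(np.outer(vecs[:, k], vecs[:, k]) / (vals[k] - E) for k in range(1, len(vals)))
    return E, Phi, r


def d2_coefficient(t: float, c0: complex, step: float = 1e-5):
    """i * c0 * r(0, t) dPhi/dt along the hypothetical path a(t) = t, at w = 0."""
    _, Phi, r = level_and_reduced_resolvent(t)
    _, Phi_plus, _ = level_and_reduced_resolvent(t + step)
    _, Phi_minus, _ = level_and_reduced_resolvent(t - step)
    # eigh fixes eigenvectors only up to sign; align the neighbours with Phi
    Phi_plus = Phi_plus * np.sign(Phi_plus @ Phi)
    Phi_minus = Phi_minus * np.sign(Phi_minus @ Phi)
    Phi_dot = (Phi_plus - Phi_minus) / (2 * step)
    return 1j * c0 * (r @ Phi_dot), Phi, Phi_dot


E, Phi, r = level_and_reduced_resolvent(0.4)
P_perp = np.eye(3) - np.outer(Phi, Phi)
d2, Phi_t, Phi_dot = d2_coefficient(0.3, c0=1.0)
print(np.allclose((toy_h(0.4) - E * np.eye(3)) @ r, P_perp), abs(Phi_t @ Phi_dot))
```

The identity $(h - E)\, r = P_\perp$ checked here is exactly what makes $r$ the inverse of $h - E$ restricted to $\mathcal{H}_{el}^\perp$.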
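Order n. The $n$th order terms require
$$\begin{aligned}
i\frac{\partial \phi_{n-2}}{\partial t} = {}& \Big( -\frac12\Delta_y + E^{(2)}(a(t))\frac{y^2}{2!} \Big)\phi_{n-2} - \frac12\Delta_w \phi_{n-4} - \nabla_w\cdot\nabla_y \phi_{n-3}\\
&+ \sum_{m=3}^{n} E^{(m)}(a(t))\frac{y^m}{m!}\phi_{n-m} + [h(a(t) + w) - E(a(t) + w)]\, \phi_n.
\end{aligned}$$
The components of this equation in the $\Phi(w, t)$ direction require
$$\begin{aligned}
i\frac{\partial g_{n-2}}{\partial t} - \Big( -\frac12\Delta_y + E^{(2)}(a(t))\frac{y^2}{2!} \Big) g_{n-2}
= {}& -\frac12\Delta_w g_{n-4} - \langle \Phi, \nabla_w \Phi\rangle\cdot(\nabla_w g_{n-4}) - \frac12\langle \Phi, \Delta_w \Phi\rangle g_{n-4}\\
&- \nabla_w\cdot\nabla_y g_{n-3} - \langle \Phi, \nabla_w \Phi\rangle\cdot(\nabla_y g_{n-3}) + \sum_{m=3}^{n} E^{(m)}(a(t))\frac{y^m}{m!} g_{n-m}\\
&- \frac12\langle \Phi, \Delta_w \phi_{n-4}^\perp\rangle - \langle \Phi, \nabla_w\cdot\nabla_y \phi_{n-3}^\perp\rangle + i\left\langle \frac{\partial \Phi}{\partial t}, \phi_{n-2}^\perp \right\rangle.
\end{aligned} \tag{3.14}$$
Note that the last term has been transformed from $-i\langle \Phi, \partial\phi_{n-2}^\perp/\partial t\rangle$ to $i\langle \partial\Phi/\partial t, \phi_{n-2}^\perp\rangle$. The equivalence of these expressions follows from differentiation of $\langle \Phi, \phi_{n-2}^\perp\rangle = 0$ with respect to $t$. The components orthogonal to $\Phi(w, t)$ require
$$\begin{aligned}
[h(a(t) + w) - E(a(t) + w)]\, \phi_n = {}& P_\perp\Big( i\frac{\partial \phi_{n-2}^\perp}{\partial t} - \sum_{m=3}^{n} E^{(m)}(a(t))\frac{y^m}{m!}\phi_{n-m}^\perp + \Big( \frac12\Delta_y - E^{(2)}(a(t))\frac{y^2}{2!} \Big)\phi_{n-2}^\perp \Big)\\
&+ \frac12 P_\perp \Delta_w \phi_{n-4}^\perp + (P_\perp \nabla_w \Phi)\cdot(\nabla_w g_{n-4}) + \frac12 (P_\perp \Delta_w \Phi)\, g_{n-4}\\
&+ P_\perp \nabla_w\cdot\nabla_y \phi_{n-3}^\perp + (P_\perp \nabla_w \Phi)\cdot(\nabla_y g_{n-3}) + i\frac{\partial \Phi}{\partial t}\, g_{n-2}.
\end{aligned} \tag{3.15}$$
Equation (3.15) determines $\phi_n^\perp(w, y, t)$ by an application of $[h(a(t) + w) - E(a(t) + w)]_r^{-1}$.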
It is easily checked that the solution to (3.14) with $g_{n-2}(w, y, 0) = 0$ has the form
$$g_{n-2}(w, y, t) = \epsilon^{-d/2} \sum_{|j| \le J + 3n - 6} c_{n-2,j}(w, t)\, \varphi_j(A(t), B(t), 1, 0, 0, y), \tag{3.16}$$
for some coefficients $c_{n-2,j}(w, t)$, and that the $y$ dependence of the vector $\phi_n^\perp$ has the same form, with other coefficients depending on $(w, t)$, i.e.,
$$\phi_n^\perp(w, y, t) = \epsilon^{-d/2} \sum_{|j| \le J + 3n - 6} d_{n,j}(w, t)\, \varphi_j(A(t), B(t), 1, 0, 0, y), \tag{3.17}$$
where the $d_{n,j}(w, t)$ take their values in the electronic Hilbert space. Equations (3.14) and (3.15) determine $c_{n-2,j}$ and $d_{n,j}$. When recursively solving these equations, we must determine $d_{n,j}$ before $c_{n,j}$, because the right-hand side of (3.14) (with $n - 2$ replaced by $n$) contains $\phi_n^\perp$. The solution to (3.15) in terms of the $d_{n,j}$ is
$$d_{n,j}(w, t) = \sum_{i=1}^{8} \Xi_i(w, t), \tag{3.18}$$
where
$$\begin{aligned}
\Xi_1(w, t) &= i\, r(w, t) P_\perp(w, t)\, \dot d_{n-2,j}(w, t),\\
\Xi_2(w, t) &= -\sum_{3 \le |m| \le n}\ \sum_{|q| \le J + 3(n - |m| - 2)} \frac{(D^m E)(a(t))}{m!}\, \langle \varphi_j, y^m \varphi_q\rangle\, r(w, t)\, d_{n-|m|,q}(w, t),\\
\Xi_3(w, t) &= \frac12\, r(w, t) P_\perp(w, t) (\Delta_w d_{n-4,j})(w, t),\\
\Xi_4(w, t) &= r(w, t) P_\perp(w, t) (\nabla_w \Phi)\cdot(\nabla_w c_{n-4,j})(w, t),\\
\Xi_5(w, t) &= \frac12\, r(w, t) P_\perp(w, t) (\Delta_w \Phi)\, c_{n-4,j}(w, t),\\
\Xi_6(w, t) &= \sum_{|q| \le J + 3(n-5)} r(w, t) P_\perp(w, t)\, \langle \varphi_j, \nabla_y \varphi_q\rangle\cdot(\nabla_w d_{n-3,q})(w, t),\\
\Xi_7(w, t) &= \sum_{|q| \le J + 3(n-3)} r(w, t) P_\perp(w, t) (\nabla_w \Phi)\cdot\langle \varphi_j, \nabla_y \varphi_q\rangle\, c_{n-3,q}(w, t),\\
\Xi_8(w, t) &= i\, r(w, t) P_\perp(w, t)\, \dot\Phi(w, t)\, c_{n-2,j}(w, t).
\end{aligned}$$
Similarly, the solution to (3.14) in terms of the $c_{n,j}$ is obtained by integration with respect to $t$ of $i\dot c_{n,j}(w, t)$, where
$$i\dot c_{n,j}(w, t) = \sum_{i=1}^{9} \Theta_i(w, t), \tag{3.19}$$
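where
$$\begin{aligned}
\Theta_1(w, t) &= -\frac12 (\Delta_w c_{n-2,j})(w, t),\\
\Theta_2(w, t) &= -\langle \Phi, \nabla_w \Phi\rangle\cdot(\nabla_w c_{n-2,j})(w, t),\\
\Theta_3(w, t) &= -\frac12 \langle \Phi, \Delta_w \Phi\rangle\, c_{n-2,j}(w, t),\\
\Theta_4(w, t) &= -\sum_{|q| \le J + 3(n-1)} \langle \varphi_j, \nabla_y \varphi_q\rangle\cdot(\nabla_w c_{n-1,q})(w, t),\\
\Theta_5(w, t) &= -\sum_{|q| \le J + 3(n-1)} \langle \Phi, \nabla_w \Phi\rangle\cdot\langle \varphi_j, \nabla_y \varphi_q\rangle\, c_{n-1,q}(w, t),\\
\Theta_6(w, t) &= \sum_{3 \le |m| \le n+2}\ \sum_{|q| \le J + 3(n + 2 - |m|)} \frac{(D^m E)(a(t))}{m!}\, \langle \varphi_j, y^m \varphi_q\rangle\, c_{n+2-|m|,q}(w, t),\\
\Theta_7(w, t) &= -\frac12 \langle \Phi, (\Delta_w d_{n-2,j})(w, t)\rangle,\\
\Theta_8(w, t) &= -\sum_{|q| \le J + 3(n-3)} \langle \varphi_j, \nabla_y \varphi_q\rangle\cdot\langle \Phi, (\nabla_w d_{n-1,q})(w, t)\rangle,\\
\Theta_9(w, t) &= i\langle \dot\Phi, d_{n,j}(w, t)\rangle.
\end{aligned}$$

4. The Main Result

We introduce a $C^\infty$ real valued cut-off function $F : \mathbb{R}^d \to \mathbb{R}$ that equals 1 in a neighborhood of the origin and equals zero away from the origin. More precisely, we choose $0 < b_0 < b_1 < \infty$ such that $\mathrm{supp}(\partial_{w_i} F)(w) \subseteq \{w \in \mathbb{R}^d : b_0 < |w| < b_1\}$ for any $i \in \{1, \dots, d\}$, and such that for any $t \in \mathcal{A}$, all quantities appearing in the above expansion are well defined for $w \in \mathbb{R}^d$ with $|w| < b_1$. Here $\mathcal{A}$ is a particular simply connected open complex neighborhood of the real interval $[0, T]$ that we construct in Sect. 5 under hypotheses H0 and H1. We define our approximate solution to (3.1) at order $N$ by the following expression:
$$\hat\Psi_N(w, y, t) = F(w)\, e^{iS(t)/\epsilon^2} e^{i\eta(t)\cdot y/\epsilon} \left( \sum_{n=0}^{N} \epsilon^n g_n(w, y, t)\, \Phi(w, t) + \sum_{n=2}^{N+2} \epsilon^n \phi_n^\perp(w, y, t) \right). \tag{4.1}$$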
We prove in Sect. 7.2 that this quantity agrees with an exact solution up to an error whose norm is of order $\epsilon^N$ for $t \in [0, T]$. We emphasize that once the molecular hamiltonian $h(X)$ and its spectral data $E(X)$, $\Phi(X)$ are given, the only arbitrary input of the above derived expansion consists of the set of coefficients $c_{0,j}$, $|j| \le J$. We note that at time $t = 0$, we have $c_{n,j}(w, 0) \equiv 0$ for all $n \ge 1$. Thus, at $t = 0$, the approximation reduces to
$$\hat\Psi_N(w, y, 0) = F(w)\, e^{iS(0)/\epsilon^2} e^{i\eta(0)\cdot y/\epsilon} \left( g_0(0, y, 0)\, \Phi(w, 0) + \sum_{n=2}^{N+2} \epsilon^n \phi_n^\perp(w, y, 0) \right).$$
This expression is completely determined by $g_0(0, y, 0)$, the nuclear part of the wave function parallel to the chosen electronic level at time 0. As is usual in the study of adiabatic problems, in order to get accurate information on the evolution of an initial wave function associated with a specific electronic level, one needs to include a higher order component perpendicular to that electronic level. This higher order part is completely determined by the parallel part. Here it is given (up to phase and cut-off functions) at time 0 by $\sum_{n=2}^{N+2} \epsilon^n \phi_n^\perp(w, y, 0)$. We now state our main theorem:
Theorem 4.1. Assume hypotheses H0 and H1 and consider the above construction. For all sufficiently small choices of $g > 0$, there exist $C(g) > 0$ and $\Gamma(g) > 0$ such that, for $N(\epsilon) = [[g^2/\epsilon^2]]$, the vector $\Psi^*(X, t, \epsilon) = \hat\Psi_{N(\epsilon)}(X - a(t), (X - a(t))/\epsilon, t)$ satisfies
$$\left\| e^{-itH(\epsilon)/\epsilon^2}\Psi^*(X, 0, \epsilon) - \Psi^*(X, t, \epsilon) \right\|_{L^2(\mathbb{R}^d, \mathcal{H}_{el})} \le C(g)\, e^{-\Gamma(g)/\epsilon^2},$$
for all $t \in [0, T]$, as $\epsilon \to 0$. Moreover, we have the following exponential localization result. For any $b > 0$ and a sufficiently small choice of $g > 0$ (that depends on $b$), there exist $c(g)$ and $\gamma(g) > 0$ such that
$$\left( \int_{|X - a(t)| > b} \|\Psi^*(X, t, \epsilon)\|^2_{\mathcal{H}_{el}}\, dX \right)^{1/2} \le c(g)\, e^{-\gamma(g)/\epsilon^2},$$
for all $t \in [0, T]$, as $\epsilon \to 0$.

The strategy of the proof is as follows: We consider the approximation $\hat\Psi_N(X - a(t), (X - a(t))/\epsilon, t)$ and the exact solution to the Schrödinger equation with the same initial conditions. We estimate the norm of the error (that is, the difference between these two quantities) as a function of both $N$ and $\epsilon$. Apart from some subtleties, the norm of the error is bounded by $C^N (\tau N^{1/2})^N \epsilon^N$, for some constants $C$ and $\tau > 0$. We minimize the error estimate over all choices of $N$. This yields $N \approx g^2/\epsilon^2$, for sufficiently small $g > 0$, and an estimate of order $e^{-\Gamma(g)/\epsilon^2}$ for the norm of the error. We prove two extensions of this result in Sect. 8. In the first extension, we consider the validity of our approximation on the Ehrenfest time scale, i.e., when $T = T(\epsilon) \sim \ln(1/\epsilon)$. In the second extension, we study the dependence of our construction on $J$, in order to extend our main result to a wider class of initial conditions. We refer the reader to Sect. 8 for the precise statements.

5. Analyticity Properties

Our estimates depend on analyticity in $t \in \mathcal{A}$ of the vectors $c_n(w, t) \in l^2(\mathbb{N}^d, \mathbb{C})$ and $d_n(w, t) \in l^2(\mathbb{N}^d, \mathcal{H}_{el})$, where $\mathcal{A}$ is the particular simply connected open complex neighborhood of the real interval $[0, T]$ mentioned at the beginning of Sect. 4.
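The error minimization described in the proof strategy for Theorem 4.1 (end of Sect. 4) can be made explicit for the model bound $f(N) = C^N (\tau N^{1/2})^N \epsilon^N = (C\tau\epsilon\sqrt{N})^N$. Setting the derivative of $\log f$ to zero gives $N^* = e^{-1}(C\tau\epsilon)^{-2}$ and $\log f(N^*) = -N^*/2$. With the choice $N = g^2/\epsilon^2$ one gets exactly $f = e^{-\Gamma(g)/\epsilon^2}$ with $\Gamma(g) = g^2 \ln(1/(C\tau g))$, which is positive for $g < 1/(C\tau)$. The Python sketch below checks this arithmetic. The values of $C$ and $\tau$ are arbitrary illustrative choices, not constants from the paper, and the model bound ignores the "subtleties" mentioned in the text.

```python
import math


def log_bound(N: int, eps: float, C: float, tau: float) -> float:
    """Natural log of the model error bound C^N (tau N^{1/2})^N eps^N."""
    return N * (math.log(C * tau * eps) + 0.5 * math.log(N))


def best_order(eps: float, C: float, tau: float, n_max: int = 10_000) -> int:
    """Integer N in [1, n_max] minimizing the model bound (brute force)."""
    return min(range(1, n_max + 1), key=lambda N: log_bound(N, eps, C, tau))


def gamma(g: float, C: float, tau: float) -> float:
    """Gamma(g) = g^2 ln(1/(C tau g)); the bound at N = g^2/eps^2 equals exp(-Gamma(g)/eps^2)."""
    return g * g * math.log(1.0 / (C * tau * g))


C, tau, eps = 2.0, 1.0, 0.05
N_star = best_order(eps, C, tau)
# brute-force optimum vs. the continuous optimum e^{-1} (C tau eps)^{-2}
print(N_star, math.exp(-1) / (C * tau * eps) ** 2)
# minimal log-bound vs. -N*/2 = -1 / (2 e (C tau eps)^2)
print(log_bound(N_star, eps, C, tau), -1 / (2 * math.e * (C * tau * eps) ** 2))
```

Halving $\epsilon$ quadruples $N^*$ and the exponent, which is the $e^{-\Gamma/\epsilon^2}$ behavior of Theorem 4.1.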
To construct $\mathcal{A}$, we begin with several observations. Our hypotheses imply that the eigenvalue $E(X)$ is analytic in $\Sigma_{\delta'}$, so the solutions $a(t)$, $\eta(t)$, $A(t)$, $B(t)$, and $S(t)$ are well defined for all $t \in [0, T]$. Moreover, by standard arguments [6], these functions all have analytic continuations from $[0, T]$ to a simply connected open set $\mathcal{A}_1$ that contains $[0, T]$. We assume without loss of generality that $\overline{\mathcal{A}_1} = \mathcal{A}_1$, where $\overline{\mathcal{A}_1}$ denotes the complex conjugate of $\mathcal{A}_1$. We note that $A^*(t)$ and $B^*(t)$ also have analytic continuations from $[0, T]$ to $\mathcal{A}_1$. To see this for $A^*(t)$, note that for $t \in [0, T]$, $A(t)^* = A(\bar t)^*$, and $t \mapsto A(\bar t)^*$ is analytic on $\mathcal{A}_1$. The argument for $B^*(t)$ is similar. It now follows easily from the definitions that for each $X$, $\varphi_j(A(t), B(t), \epsilon^2, a(t), \eta(t), X)$ and $\overline{\varphi_j(A(\bar t), B(\bar t), \epsilon^2, a(\bar t), \eta(\bar t), X)}$ have analytic continuations from $[0, T]$ to some simply connected open set $\mathcal{A}_2$. For $t \in [0, T]$, the real part of $B(t)A(t)^{-1}$ is strictly positive. This positivity remains true for the real part of the analytic continuation of $B(t)A(t)^{-1}$ on some simply connected subset $\mathcal{A} \subset \mathcal{A}_1 \cap \mathcal{A}_2$ that contains $[0, T]$. We assume without loss of generality that $\overline{\mathcal{A}} = \mathcal{A}$, and we can assume that $\mathcal{A}$ has the form $\{t : -a < \mathrm{Re}\, t < b \text{ and } |\mathrm{Im}\, t| < c\}$, where $a > c > 0$ and $b > T + c$. It follows that for $t \in \mathcal{A}$, both $\varphi_j(A(t), B(t), \epsilon^2, a(t), \eta(t), X)$ and $\overline{\varphi_j(A(\bar t), B(\bar t), \epsilon^2, a(\bar t), \eta(\bar t), X)}$ have analytic continuations from $[0, T]$ to $\mathcal{A}$ as elements of $L^2(\mathbb{R}^d)$. Using these results and carefully examining the constructions of the vectors $c_n(w, t)$ and $d_n(w, t)$, we see that they are analytic in $t$ for $t \in \mathcal{A}$ as well. Our hypotheses on $h(\cdot)$ and the above results also show that each of the following quantities is analytic in $t$ for $t \in \mathcal{A}$ and each fixed $w \in \Sigma_{\delta'} \subset \mathbb{C}^d$, for sufficiently small $\delta'$:
$$r(w, t) = [h(a(t) + w) - E(a(t) + w)]_r^{-1}, \quad \Phi(w, t), \quad (D_w^\alpha \Phi)(w, t) \text{ for } |\alpha| \le 2, \quad (D^\alpha E)(a(t)) \text{ for all } \alpha, \quad P_\perp(w, t).$$
By explicit computation of the phase corresponding to (3.2), it is easy to check that $\Phi(w, t)$ and its derivatives are also analytic for $t \in \mathcal{A}$. Moreover, if $f_i(w, t)$ ($i$ in some finite set) represents any of these quantities, $f_i$ is analytic in $w \in \Sigma_{\delta'}$ for any fixed $t \in \mathcal{A}$. Thus, by the Cauchy integral formula, we can assume that the following bounds hold (with the appropriate norm in each case):
$$\left\| (D_w^\alpha f_i)(w, t) \right\| \le c_i\, G_i^{|\alpha|} \frac{\alpha!}{(1 + |\alpha|)^{d+1}}, \tag{5.1}$$
for some $c_i$, $G_i$, all $w \in \Sigma_{\delta'}$, and all multi-indices $\alpha$. We can assume here that all $G_i \le D_2$ for some constant $D_2 \ge 1$, and we associate the prefactors $c_i$ in (5.1) with the different functions according to the rules
$$c_1 \leftrightarrow rP_\perp, \quad c_2 \leftrightarrow \Phi, \quad c_3 \leftrightarrow \dot\Phi, \quad c_4 \leftrightarrow \nabla_w \Phi, \quad c_5 \leftrightarrow \Delta_w \Phi, \quad c_6 \leftrightarrow E, \quad c_4' \leftrightarrow \langle \Phi, \nabla_w \Phi\rangle, \quad c_5' \leftrightarrow \langle \Phi, \Delta_w \Phi\rangle.$$
6. Structure and Estimates of the $c_n(w, t)$ and $d_n(w, t)$

In this section, we decompose the functions $g_n$ and $\phi_n^\perp$ of Sect. 3 into pieces, each of which satisfies various estimates. Throughout this section, all $w$-dependent quantities are defined for $w$ in the support of the cut-off function $F$. Furthermore, all the results of this section are claimed to hold only on the support of $F$. Our decompositions of $g_n(w, y, t)$ and $\phi_n^\perp(w, y, t)$ have the following forms:
$$g_n(w, y, t) = \epsilon^{-d/2} \sum_{\beta \in B_{n,1}}\ \sum_{p \le n}\ \sum_{|l| + k \le p + \frac{n}{2}}\ \sum_{|j| \le J + n + 2(p - |l| - k)} c_{n,p,l,k,\beta,j}(w, t)\, \varphi_j(A(t), B(t), 1, 0, 0, y), \tag{6.1}$$
and
$$\phi_n^\perp(w, y, t) = \epsilon^{-d/2} \sum_{\beta \in B_{n,2}}\ \sum_{p \le n-1}\ \sum_{|l| + k \le p + \frac{n-1}{2}}\ \sum_{|j| \le J + (n-1) + 2(p - |l| - k)} d_{n,p,l,k,\beta,j}(w, t)\, \varphi_j(A(t), B(t), 1, 0, 0, y). \tag{6.2}$$
In (6.1), $n$, $k$ and $p$ are non-negative integers; $j$ and $l$ are multi-indices; and the index $\beta$ runs over a finite set $B_{n,1}$. The number $J$ is fixed by the initial conditions. Each $c_{n,p,l,k,\beta,j}$ is a complex valued function. In (6.2), $n \ge 2$, $k$ and $p$ are non-negative integers; $j$ and $l$ are multi-indices; and the index $\beta$ runs over a finite set $B_{n,2}$. Each $d_{n,p,l,k,\beta,j}(w, t)$ takes values in $\mathcal{H}_{el}$. We let $c_{n,p,l,k,\beta}(w, t)$ and $d_{n,p,l,k,\beta}(w, t)$ respectively denote vectors in $l^2(\mathbb{N}^d, \mathbb{C})$ and $l^2(\mathbb{N}^d, \mathcal{H}_{el})$ whose components are $c_{n,p,l,k,\beta,j}(w, t)$ and $d_{n,p,l,k,\beta,j}(w, t)$. The crucial step in the proof of Theorem 4.1 is the following:

Proposition 6.1. There is a recursive construction of the coefficients $c_{n,p,l,k,\beta,j}(w, t)$ and $d_{n,p,l,k,\beta,j}(w, t)$ for $w$ on the support of $F$. The indices for $c_{n,p,l,k,\beta,j}(w, t)$ are non-negative and satisfy
$$\beta \in B_{n,1}, \quad p \le n, \quad |l| + k \le p + \frac{n}{2}, \quad |j| \le J + n + 2(p - |l| - k).$$
The indices for $d_{n,p,l,k,\beta,j}(w, t)$ are non-negative and satisfy
$$n \ge 2, \quad \beta \in B_{n,2}, \quad p \le n - 1, \quad |l| + k \le p + \frac{n-1}{2}, \quad |j| \le J + (n - 1) + 2(p - |l| - k).$$
Moreover, the following conditions are satisfied:
(i) For any $n > 0$, $c_{n,0,l,k,\beta,j}(w, t) = 0$.
(ii) There exists $K_0 > 0$ such that the number of terms in both of the sums (6.1) and (6.2) is bounded by $e^{K_0 n}$.
(iii) For $t \in \mathcal{A}$, let $\mathrm{dist}(t)$ be the distance from $t$ to the complement of $\mathcal{A}$. The coefficients $c_{n,p,l,k,\beta}(w, t)$ and $d_{n,p,l,k,\beta}(w, t)$ are analytic for $t \in \mathcal{A}$, and there exist constants $D_1$ and $D_2$ such that
$$\left\| (D_w^\alpha c_{n,p,l,k,\beta})(w, t) \right\| \le D_1 D_2^{|\alpha| + |l| + 4n} \frac{(\alpha + l)!}{(1 + |\alpha|)^{d+1}} \frac{|t|^p}{p!} \frac{k^k}{\mathrm{dist}(t)^k} \left( \frac{(J + n + 2(p - |l| - k))!}{J!} \right)^{1/2}, \tag{6.3}$$
and
$$\left\| (D_w^\alpha d_{n,p,l,k,\beta})(w, t) \right\| \le D_1 D_2^{|\alpha| + |l| + 4(n-1)} \frac{(\alpha + l)!}{(1 + |\alpha|)^{d+1}} \frac{|t|^p}{p!} \frac{k^k}{\mathrm{dist}(t)^k} \left( \frac{(J + (n-1) + 2(p - |l| - k))!}{J!} \right)^{1/2}. \tag{6.4}$$
Remark. The complicated estimates (6.3) and (6.4) are motivated by estimates used in semiclassical approximations and adiabatic approximations. The factors on the right-hand sides that explicitly involve $J$, $n$, and $p$ occur in the semiclassical paper [15]. The factors that involve $\alpha$ and $l$ appear in the adiabatic paper [27]. The factors that involve $k$ occur in a proof of the adiabatic results of [27] that is based on Cauchy estimates instead of Nenciu's lemma [27] (which we generalize below as Lemma 6.4). We were unable to prove Proposition 6.1 without using a combination of all of these techniques. We estimate adiabatic error terms by using Nenciu's approach in the $w$ variable and Cauchy estimates in the $t$ variable.

6.1. The toolbox. To prove Proposition 6.1, we repeatedly use the following very handy lemmas, whose proofs are given in Sect. 9. The first two lemmas deal with basic properties of analytic functions of one variable and are consequences of the Cauchy integral formula.

Lemma 6.1. For $k = 0$, define $k^k = 1$. Suppose $g$ is an analytic vector-valued function on the strip $S_\delta = \{t : |\mathrm{Im}\, t| < \delta\}$. If $g$ satisfies $\|g(t)\| \le C k^k (\delta - |\mathrm{Im}\, t|)^{-k}$ for some $k \ge 0$, then $g$ satisfies $\|g'(t)\| \le C (k + 1)^{k+1} (\delta - |\mathrm{Im}\, t|)^{-k-1}$ for all $t \in S_\delta$.

Lemma 6.1 has a generalization to regions other than infinite strips. The generalization is needed if one wishes to study problems where analyticity holds only in a neighborhood of a finite time interval. The proof of the generalized lemma is similar to that of Lemma 6.1, but involves slightly more complicated geometry. The precise statement is the following:
Lemma 6.2. For $k = 0$, define $k^k = 1$. Suppose $g$ is an analytic vector-valued function in an open region $\mathcal{A} \subset \mathbb{C}$. For $t \in \mathcal{A}$, let $\mathrm{dist}(t)$ be the distance from $t$ to $\mathcal{A}^C$, the complement of $\mathcal{A}$. If $g$ satisfies $\|g(t)\| \le C k^k (\mathrm{dist}(t))^{-k}$ for all $t \in \mathcal{A}$ and some $k \ge 0$, then $g$ satisfies $\|g'(t)\| \le C (k + 1)^{k+1} (\mathrm{dist}(t))^{-k-1}$ for all $t \in \mathcal{A}$.

The next lemma gives estimates on indefinite integrals of certain analytic functions under stronger assumptions on the domain $\mathcal{A}$.

Lemma 6.3. Suppose $f$ is an analytic vector-valued function in an open region $\mathcal{A} \subset \mathbb{C}$. For $t \in \mathcal{A}$, let $\mathrm{dist}(t)$ be the distance from $t$ to $\mathcal{A}^C$. We assume the domain is star-shaped with respect to the origin and that the origin is the point farthest from $\mathcal{A}^C$, i.e., $\mathrm{dist}(0) \ge \mathrm{dist}(t)$ for all $t \in \mathcal{A}$. Moreover, we assume that $\mathrm{dist}(t)$ is monotone decreasing along any line emanating from the origin. If $f$ satisfies $\|f(t)\| \le C |t|^p (\mathrm{dist}(t))^{-k}$ for all $t \in \mathcal{A}$ and some $k \ge 0$, then $\int_0^t f(s)\, ds$ satisfies
$$\left\| \int_0^t f(s)\, ds \right\| \le C \frac{|t|^{p+1}}{p + 1} (\mathrm{dist}(t))^{-k}$$
for all $t \in \mathcal{A}$.

Remark. In our situation, examples of sets $\mathcal{A}$ that satisfy the conditions of Lemma 6.3 are infinite symmetric horizontal strips or the rectangular regions chosen in Sect. 5.

A fourth tool we repeatedly use below is a multidimensional generalization of a lemma used in [27]. We warn the reader that the symbol $\|\cdot\|$ for a norm means different things in different contexts; e.g., for scalar-valued, operator-valued, and vector-valued functions, it respectively means absolute value, operator norm, and vector space norm.

Lemma 6.4. The quantity
$$\nu = \sup_\alpha\, (1 + |\alpha|)^{d+1} \sum_{\{l : 0 \le l_i \le \alpha_i\}} \frac{1}{(1 + |l|)^{d+1}} \frac{1}{(1 + |\alpha - l|)^{d+1}} \tag{6.5}$$
is finite. Let $\Sigma$ be an open subset of $\mathbb{C}^d$. Suppose $M(\cdot) \in C^\infty(\Sigma)$ is scalar-valued or operator-valued, and $N(\cdot) \in C^\infty(\Sigma)$ is either operator-valued or vector-valued. Assume these functions satisfy
$$\left\| D^\alpha M(x) \right\| \le m(x)\, a(x)^{|\alpha + p|} \frac{(\alpha + p)!}{(1 + |\alpha|)^{d+1}}, \tag{6.6}$$
$$\left\| D^\alpha N(x) \right\| \le n(x)\, a(x)^{|\alpha + q|} \frac{(\alpha + q)!}{(1 + |\alpha|)^{d+1}} \tag{6.7}$$
for $x \in \Sigma$, all multi-indices $\alpha$, and some fixed multi-indices $p$ and $q$. Then
$$\left\| D^\alpha (MN)(x) \right\| \le m(x)\, n(x)\, \nu\, a(x)^{|\alpha + p + q|} \frac{(\alpha + p + q)!}{(1 + |\alpha|)^{d+1}} \tag{6.8}$$
for each multi-index $\alpha$, where $\nu$ is defined by (6.5).
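The finiteness of $\nu$ in (6.5) is easy to see numerically in the simplest case $d = 1$, where (6.5) reads $\nu = \sup_{\alpha \ge 0} (1 + \alpha)^2 \sum_{l=0}^{\alpha} (1 + l)^{-2} (1 + \alpha - l)^{-2}$. Writing $n = \alpha + 2$ and using $\frac{1}{k(n-k)} = \frac{1}{n}\big(\frac{1}{k} + \frac{1}{n-k}\big)$, one finds that the terms tend to $2\zeta(2) = \pi^2/3$ as $\alpha \to \infty$, after slightly overshooting this limit at moderate $\alpha$. The Python sketch below is an illustration for $d = 1$ only, not a proof: it evaluates the terms directly.

```python
import math


def nu_term(alpha: int) -> float:
    """(1 + alpha)^2 * sum_{l=0}^{alpha} 1 / ((1 + l)^2 (1 + alpha - l)^2): the d = 1 case of (6.5)."""
    s = sum(1.0 / ((1 + l) ** 2 * (1 + alpha - l) ** 2) for l in range(alpha + 1))
    return (1 + alpha) ** 2 * s


def nu_estimate(alpha_max: int) -> float:
    """Supremum of nu_term over 0 <= alpha <= alpha_max."""
    return max(nu_term(alpha) for alpha in range(alpha_max + 1))


print(nu_term(0), nu_term(1))             # 1.0 2.0
print(nu_estimate(500))                   # a little above pi^2/3
print(nu_term(10_000), math.pi ** 2 / 3)  # the terms approach pi^2/3
```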
6.2. Proof of Proposition 6.1. We prove Proposition 6.1 by induction and begin with the case $n = 0$. We construct $c_{0,0,0,0,\beta,j} \equiv c_{0,j}$ with $\beta = 1 \in B_{0,1} \equiv \{1\}$. We note that there is no $d_{n,p,l,k,\beta}(w, t)$ for $n \le 1$; the inequalities for its indices in the conclusion to the proposition cannot be satisfied by non-negative integers. Whenever $d_{n,p,l,k,\beta}(w, t)$ with $n \le 1$ appears in any of the formal calculations below, it is understood to be zero. We now assume that the estimates (6.3) and (6.4) on $c_{m,p,l,k,\beta}(w, t)$ and $d_{m,p,l,k,\beta}(w, t)$ are true for all $m \le n - 1$ and prove they still hold for $m = n$. Our strategy is to show that each contribution $\Xi_i$ and $\Theta_i$ consists of a finite sum of terms that satisfy the required estimate. We estimate the number of terms by a separate argument. Our main tools are Lemmas 2.1, 6.2, 6.3, and 6.4. The index $\beta$ must be considered when counting the number of terms, but it plays no role in the estimates of the individual terms. To simplify the notation, we drop it while estimating the terms.

The Term $\Xi_1$. We begin by considering the contribution to (6.4) from the term $\Xi_1$ in (3.18). By induction, each $d_{n-2,p,l,k,\beta}(w, t)$ is analytic for $t \in \mathcal{A}$ and has a $p$th order zero at $t = 0$. It follows that $d_{n-2,p,l,k,\beta}(w, t) = t^p f(t)$, where $f$ is analytic in $\mathcal{A}$. When we take the time derivative, we obtain two terms, $p t^{p-1} f(t)$ and $t^p \dot f(t)$. These, respectively, give rise to two terms $d_{n,p-1,l,k,\beta}(w, t)$ and $d_{n,p,l,k+1,\beta}(w, t)$. We consider all $w$-derivatives of $\Xi_1(w, t)$. We apply the induction hypothesis, Lemma 6.2, and Lemma 6.4 to obtain
$$\begin{aligned}
\left\| D_w^\alpha \big( rP_\perp \dot d_{n-2,p,l,k} \big)(w, t) \right\|
\le {}& c_1 \nu D_1 D_2^{|\alpha| + |l| + 4(n-3)} \frac{(\alpha + l)!}{(1 + |\alpha|)^{d+1}} \left( \frac{(J + (n-3) + 2(p - |l| - k))!}{J!} \right)^{1/2}\\
&\times \left( \frac{|t|^{p-1}}{(p-1)!} \frac{k^k}{\mathrm{dist}(t)^k} + \frac{|t|^p}{p!} \frac{(k+1)^{k+1}}{\mathrm{dist}(t)^{k+1}} \right)\\
= {}& c_1 \nu D_1 D_2^{-8} D_2^{|\alpha| + |l| + 4(n-1)} \frac{(\alpha + l)!}{(1 + |\alpha|)^{d+1}} \frac{|t|^{p'}}{p'!} \frac{k^k}{\mathrm{dist}(t)^k} \left( \frac{(J + (n-1) + 2(p' - |l| - k))!}{J!} \right)^{1/2}\\
&+ c_1 \nu D_1 D_2^{-8} D_2^{|\alpha| + |l| + 4(n-1)} \frac{(\alpha + l)!}{(1 + |\alpha|)^{d+1}} \frac{|t|^{p}}{p!} \frac{k'^{k'}}{\mathrm{dist}(t)^{k'}} \left( \frac{(J + (n-1) + 2(p - |l| - k'))!}{J!} \right)^{1/2},
\end{aligned}$$
602
G. A. Hagedorn, A. Joye
with $p' = p-1$ and $k' = k+1$. We check that $p \le n-3 < n-1$, $p' \le n-4 < n-1$, $|l|+k \le p+(n-3)/2 = p'+(n-1)/2$, $|l|+k' \le p+(n-3)/2+1 = p+(n-1)/2$, and that the ranges of the components of each vector satisfy $|j| \le J+(n-1)+2(p'-|l|-k)$ and $|j| \le J+(n-1)+2(p-|l|-k')$, as required. Hence, we get the desired bound for each of the two contributions, provided $D_2^8 \ge c_1\nu$.

The Term 2. In the analysis of this term, we encounter an infinite matrix that represents multiplication by $y^m$ in the basis of semiclassical wave packets. We denote this matrix by $\langle\varphi, y^m\varphi\rangle$. Its entries are $\langle\varphi_j, y^m\varphi_q\rangle(t)$, for multi-indices $m, j, q \in \mathbb{N}^d$. We recall that Lemma 2.1 gives bounds for these matrix elements and also states that $\langle\varphi_j, y^m\varphi_q\rangle(t) = 0$ if $\bigl||j|-|q|\bigr| > |m|$. We adopt the analogous notation $\langle\varphi, D_y^m\varphi\rangle$ for the infinite matrix that represents the operator $D_y^m$ in the basis of semiclassical wave packets. We define $d_0 = \sqrt{2d}$. Then, using (5.1), Lemmas 2.1, 6.3, 6.2, and 6.4, and some algebra, we obtain
$$
\begin{aligned}
&\Bigl\|D_w^\alpha\sum_{\tilde m=3}^{n}\sum_{|m|=\tilde m}\frac{D^mE(a(t))}{m!}\,\langle\varphi, y^m\varphi\rangle\,(r_{P_\perp})(w,t)\,d_{n-\tilde m,p,l,k}(w,t)\Bigr\|\\
&\quad\le\sum_{\tilde m=3}^{n}\sum_{|m|=\tilde m}\frac{c_6c_1\nu D_2^{\tilde m}\,m!\,(d_0A)^{\tilde m}}{(1+\tilde m)^{d+1}\,m!}\,D_1D_2^{|\alpha|+|l|+4(n-1-\tilde m)}\,\frac{(\alpha+l)!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-1)+2(p-|l|-k))!}{J!}}\\
&\quad\le\sum_{\tilde m=3}^{n}\sum_{|m|=\tilde m}\frac{D_1c_6c_1\nu\,(d_0A)^{\tilde m}}{D_2^{3\tilde m}}\,D_2^{|\alpha|+|l|+4(n-1)}\,\frac{(\alpha+l)!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n-1+2(p-|l|-k))!}{J!}}.
\end{aligned}
$$
We also verify the constraints on the parameters and components of the vectors: $p \le n-1-\tilde m \le n-1$, $|l|+k \le p+(n-1-\tilde m)/2 \le p+(n-1)/2$, and $|j| \le J+(n-\tilde m-1)+2(p-|l|-k)+\tilde m \le J+(n-1)+2(p-|l|-k)$.
Time-Dependent Born–Oppenheimer Approximation
603
Hence, we see that each contribution from Term 2 satisfies the required bound, provided the following two conditions are fulfilled: $D_2^3 \ge d_0A$ and $D_2^9 \ge (d_0A)^3c_6c_1\nu$. There are
$$\sum_{\tilde m=3}^{n}\sum_{|m|=\tilde m}1 \le \sum_{|m|\le n}1 = \binom{n+d}{d} \le \sigma_0e^{\sigma n}$$
such contributions, where $\sigma > 0$ can be chosen arbitrarily small (see [14]).

The Term 3. For this term we make the Laplacian explicit and write
$$\Delta_wd_{n-4,p,l,k}(w,t) = \sum_{i=1}^d\bigl(D_{w_i}^2d_{n-4,p,l,k}\bigr)(w,t).$$
We introduce $l_{i,2} = l+(0,0,\dots,0,2,0,\dots,0)$, where the 2 sits in the $i$th column. We then estimate
$$
\begin{aligned}
\Bigl\|D_w^\alpha\,r_{P_\perp}(w,t)\,\tfrac12\Delta_wd_{n-4,p,l,k}(w,t)\Bigr\|
&\le\sum_{i=1}^d\frac{c_1\nu D_1}{2}\,D_2^{|\alpha|+|l|+2+4(n-5)}\,\frac{(\alpha+l_{i,2})!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-5)+2(p-|l|-k))!}{J!}}\\
&=\sum_{i=1}^d\frac{c_1\nu D_1}{2D_2^{16}}\,D_2^{|\alpha|+|l_{i,2}|+4(n-1)}\,\frac{(\alpha+l_{i,2})!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-1)+2(p-|l_{i,2}|-k))!}{J!}}.
\end{aligned}
$$
Again, the constraints are satisfied since $p \le n-5 < n-1$, $|l_{i,2}|+k \le p+(n-5)/2+2 = p+(n-1)/2$, and $|j| \le J+(n-5)+2(p-|l|-k) = J+(n-1)+2(p-|l_{i,2}|-k)$; hence each of the $d$ contributions stemming from Term 3 satisfies the required estimate, provided $D_2^{16} \ge c_1\nu/2$. We estimate each of the remaining Terms $i$, $i = 4,\dots,8$, in the same fashion, using the same tools. Since this is straightforward, we only outline the arguments.
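The count of contributions used for Term 2 rests on the identity $\sum_{|m|\le n}1 = \binom{n+d}{d}$, a polynomial of degree $d$ in $n$, which is therefore dominated by $\sigma_0e^{\sigma n}$ for any $\sigma > 0$ once $\sigma_0 = \sigma_0(\sigma,d)$ is large enough. The Python sketch below is our own illustration, not part of the original argument: it checks the identity by brute-force enumeration and computes the smallest admissible $\sigma_0$ on a finite range of $n$. The function names are hypothetical.

```python
import itertools
import math


def count_multi_indices(n, d):
    """Number of multi-indices m in N^d with |m| = m_1 + ... + m_d <= n (brute force)."""
    return sum(1 for m in itertools.product(range(n + 1), repeat=d) if sum(m) <= n)


def smallest_sigma0(sigma, d, n_max):
    """Smallest sigma0 with C(n+d, d) <= sigma0 * exp(sigma * n) for all 0 <= n <= n_max."""
    return max(math.comb(n + d, d) * math.exp(-sigma * n) for n in range(n_max + 1))


# Brute-force check of the binomial identity for small n and d.
for d in (1, 2, 3):
    for n in range(8):
        assert count_multi_indices(n, d) == math.comb(n + d, d)

# For sigma = 0.1 and d = 3 the required sigma0 is finite: extending the range of n
# does not change the maximum, because C(n+3, 3) e^{-0.1 n} eventually decreases.
print(smallest_sigma0(0.1, 3, 200), smallest_sigma0(0.1, 3, 400))
```

A smaller $\sigma$ only makes $\sigma_0$ larger; it never makes it infinite, which is all the argument needs.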
The Term 4. We expand the dot product
$$(\nabla_w\Phi)\cdot(\nabla_wc_{n-4,p,l,k}) = \sum_{i=1}^d(D_{w_i}\Phi)(D_{w_i}c_{n-4,p,l,k})$$
and use the definition $l_{i,1} = l+(0,0,\dots,0,1,0,\dots,0)$, where the 1 sits in the $i$th column. Recall that the estimates on the $c_{m,p,l,k}$'s differ from those on the $d_{m,p,l,k}$'s by a shift of 1 in the $m$ dependence. We have
$$\Bigl\|D_w^\alpha\,r_{P_\perp}(w,t)\,\nabla_w\Phi(w,t)\cdot\nabla_wc_{n-4,p,l,k}(w,t)\Bigr\| \le \sum_{i=1}^d\frac{c_1c_4\nu^2D_1}{D_2^{12}}\,D_2^{|\alpha|+|l_{i,1}|+4(n-1)}\,\frac{(\alpha+l_{i,1})!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-1)+2(p-|l_{i,1}|-k))!}{J!}},$$
with all constraints on $|j|$, $p$, $|l_{i,1}|$, $k$ satisfied. Thus each of the $d$ contributions stemming from Term 4 satisfies the required estimate, provided $D_2^{12} \ge c_1c_4\nu^2$.

The Term 5. This term is similar to the previous one. We obtain
$$\Bigl\|D_w^\alpha\,r_{P_\perp}(w,t)\,\tfrac12(\Delta_w\Phi)(w,t)\,c_{n-4,p,l,k}(w,t)\Bigr\| \le \frac{c_1c_5\nu^2D_1}{2D_2^{12}}\,D_2^{|\alpha|+|l|+4(n-1)}\,\frac{(\alpha+l)!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-1)+2(p-|l|-k))!}{J!}},$$
with all constraints on $|j|$, $p$, $|l|$, $k$ satisfied. Thus the contribution stemming from Term 5 satisfies the required estimate, provided $D_2^{12} \ge c_1c_5\nu^2/2$.

The Term 6. At this point the matrices $\langle\varphi, D_{y_i}\varphi\rangle$ play a role; we control them by the momentum space analog of Lemma 2.1. Expanding the dot product and introducing the matrices $\langle\varphi, D_{y_i}\varphi\rangle$, $i = 1,\dots,d$, we have the following estimate for this term:
$$\Bigl\|D_w^\alpha\sum_{i=1}^d(r_{P_\perp})(w,t)\,\langle\varphi, D_{y_i}\varphi\rangle\,D_{w_i}d_{n-3,p,l,k}(w,t)\Bigr\| \le \sum_{i=1}^d\frac{d_0c_1\nu BD_1}{D_2^{12}}\,D_2^{|\alpha|+|l_{i,1}|+4(n-1)}\,\frac{(\alpha+l_{i,1})!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-1)+2(p-|l_{i,1}|-k))!}{J!}},$$
with all constraints on $|j|$, $p$, $|l_{i,1}|$, $k$ satisfied. Thus, each of the $d$ contributions stemming from Term 6 satisfies the required estimate, provided $D_2^{12} \ge d_0c_1\nu B$.

The Term 7. Similarly,
$$\Bigl\|D_w^\alpha\sum_{i=1}^d(r_{P_\perp})(w,t)\,\langle\varphi, D_{y_i}\varphi\rangle\,(D_{w_i}\Phi)(w,t)\,c_{n-3,p,l,k}(w,t)\Bigr\| \le \sum_{i=1}^d\frac{d_0c_1c_4\nu^2BD_1}{D_2^{8}}\,D_2^{|\alpha|+|l|+4(n-1)}\,\frac{(\alpha+l)!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-1)+2(p-|l|-k))!}{J!}},$$
with all constraints on $|j|$, $p$, $|l|$, $k$ satisfied. Thus, each of the $d$ contributions stemming from Term 7 satisfies the required estimate, provided $D_2^{8} \ge d_0c_1c_4\nu^2B$.

The Term 8. Finally,
$$\Bigl\|D_w^\alpha\,r_{P_\perp}(w,t)\,\dot\Phi(w,t)\,c_{n-2,p,l,k}(w,t)\Bigr\| \le \frac{c_1c_3\nu^2D_1}{D_2^{4}}\,D_2^{|\alpha|+|l|+4(n-1)}\,\frac{(\alpha+l)!\,|t|^pk^k}{(1+|\alpha|)^{d+1}p!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+(n-1)+2(p-|l|-k))!}{J!}},$$
with all constraints on $|j|$, $p$, $|l|$, $k$ satisfied. Thus the contribution stemming from Term 8 satisfies the required estimate, provided $D_2^4 \ge c_1c_3\nu^2$.

We now perform a similar analysis for the terms that appear in the expression for $\dot c_{n,p,l,k}(w,t)$; we refer to them as the integrated Terms 1 to 9. We integrate these terms with respect to $t$ and apply Lemma 6.3. According to the lemma, integration of a term with a given value of $p$ gives rise to a term with $p' = p+1$ in the estimates. We also note that the estimates we want to prove for the $c$'s differ from those for the $d$'s by the replacement of $n-1$ by $n$.

The integrated Term 1. We use the same techniques as above to obtain
$$\Bigl\|D_w^\alpha\int_0^t\tfrac12(\Delta_wc_{n-2,p,l,k})(w,s)\,ds\Bigr\| \le \sum_{i=1}^d\frac{D_1}{2D_2^{8}}\,D_2^{|\alpha|+|l_{i,2}|+4n}\,\frac{(\alpha+l_{i,2})!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l_{i,2}|-k))!}{J!}}.$$
We check that the constraints are satisfied: $p' \le n-1 < n$, $|l_{i,2}|+k \le p+(n-2)/2+2 = p'+n/2$, and $|j| \le J+(n-2)+2(p-|l|-k) = J+n+2(p'-|l_{i,2}|-k)$. Thus, each of the $d$ contributions stemming from the integrated Term 1 satisfies the required estimate provided $D_2^8 \ge 1/2$.

The integrated Term 2. Similarly, with $p' = p+1$,
$$\Bigl\|D_w^\alpha\int_0^t\sum_{i=1}^d\langle\Phi, D_{w_i}\Phi\rangle(w,s)\,D_{w_i}c_{n-2,p,l,k}(w,s)\,ds\Bigr\| \le \sum_{i=1}^d\frac{c_4\nu D_1}{D_2^{8}}\,D_2^{|\alpha|+|l_{i,1}|+4n}\,\frac{(\alpha+l_{i,1})!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l_{i,1}|-k))!}{J!}},$$
with all constraints on $|j|$, $p'$, $|l_{i,1}|$, $k$ satisfied. Thus each of the $d$ contributions stemming from the integrated Term 2 satisfies the required estimate, provided $D_2^8 \ge c_4\nu$.

The integrated Term 3. Again, with $p' = p+1$,
$$\Bigl\|D_w^\alpha\int_0^t\tfrac12\langle\Phi, \Delta_w\Phi\rangle(w,s)\,c_{n-2,p,l,k}(w,s)\,ds\Bigr\| \le \frac{c_5\nu D_1}{2D_2^{8}}\,D_2^{|\alpha|+|l|+4n}\,\frac{(\alpha+l)!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l|-k))!}{J!}},$$
with all constraints on $|j|$, $p'$, $|l|$, $k$ satisfied. Thus, the contribution stemming from the integrated Term 3 satisfies the required estimate, provided $D_2^8 \ge c_5\nu/2$.

The integrated Term 4. Recall that the matrices $\langle\varphi, D_{y_i}\varphi\rangle$ are controlled by an analog of Lemma 2.1. We have
$$\Bigl\|D_w^\alpha\int_0^t\sum_{i=1}^d\langle\varphi, D_{y_i}\varphi\rangle\,D_{w_i}c_{n-1,p,l,k}(w,s)\,ds\Bigr\| \le \sum_{i=1}^d\frac{d_0BD_1}{D_2^{4}}\,D_2^{|\alpha|+|l_{i,1}|+4n}\,\frac{(\alpha+l_{i,1})!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l_{i,1}|-k))!}{J!}},$$
with all constraints on $|j|$, $p'$, $|l_{i,1}|$, $k$ satisfied. Thus each of the $d$ contributions stemming from the integrated Term 4 satisfies the required estimate, provided $D_2^4 \ge d_0B$.

The integrated Term 5. For this term we obtain
$$\Bigl\|D_w^\alpha\int_0^t\sum_{i=1}^d\langle\Phi, D_{w_i}\Phi\rangle(w,s)\,\langle\varphi, D_{y_i}\varphi\rangle\,c_{n-1,p,l,k}(w,s)\,ds\Bigr\| \le \sum_{i=1}^d\frac{c_4\nu d_0BD_1}{D_2^{4}}\,D_2^{|\alpha|+|l|+4n}\,\frac{(\alpha+l)!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l|-k))!}{J!}},$$
with all constraints on $|j|$, $p'$, $|l|$, $k$ satisfied. Thus each of the $d$ contributions stemming from the integrated Term 5 satisfies the required estimate, provided $D_2^4 \ge c_4\nu d_0B$.

The integrated Term 6. In this term, we encounter the sum over all previous $c$'s. As in the similar contribution from Term 2, we obtain
$$
\begin{aligned}
&\Bigl\|D_w^\alpha\int_0^t\sum_{\tilde m=3}^{n+2}\sum_{|m|=\tilde m}\frac{D^mE(a(s))}{m!}\,\langle\varphi, y^m\varphi\rangle\,c_{n+2-\tilde m,p,l,k}(w,s)\,ds\Bigr\|\\
&\quad\le\sum_{\tilde m=3}^{n+2}\sum_{|m|=\tilde m}\frac{(d_0A)^{\tilde m}D_1c_6D_2^8}{D_2^{3\tilde m}}\,D_2^{|\alpha|+|l|+4n}\,\frac{(\alpha+l)!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l|-k))!}{J!}}.
\end{aligned}
$$
We check that the constraints on the parameters and components of the vectors are satisfied: $p' \le n-\tilde m+3 \le n$, $|l|+k \le p+(n-\tilde m+2)/2 \le p'+n/2$, and $|j| \le J+n+2+2(p-|l|-k) = J+n+2(p'-|l|-k)$. Hence each contribution from the integrated Term 6 satisfies the required bound, provided the following two conditions are fulfilled: $D_2^3 \ge d_0A$ and $D_2 \ge (d_0A)^3c_6$.
There are
$$\sum_{\tilde m=3}^{n+2}\sum_{|m|=\tilde m}1 \le \sigma_0e^{\sigma(n+2)}$$
such contributions, where $\sigma > 0$ can be chosen arbitrarily small.

The integrated Term 7. This term depends on the $d$'s. Recall that the estimates are a little different for them.
$$\Bigl\|D_w^\alpha\int_0^t\sum_{i=1}^d\tfrac12\bigl\langle\Phi(w,s), (D_{w_i}^2d_{n-2,p,l,k})(w,s)\bigr\rangle\,ds\Bigr\| \le \sum_{i=1}^d\frac{c_2\nu D_1}{2D_2^{12}}\,D_2^{|\alpha|+|l_{i,2}|+4n}\,\frac{(\alpha+l_{i,2})!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l_{i,2}|-k))!}{J!}},$$
with all constraints on $|j|$, $p'$, $|l_{i,2}|$, $k$ satisfied. Thus each of the $d$ contributions stemming from the integrated Term 7 satisfies the required estimate, provided $D_2^{12} \ge c_2\nu/2$.

The integrated Term 8. Similarly,
$$\Bigl\|D_w^\alpha\int_0^t\sum_{i=1}^d\langle\varphi, D_{y_i}\varphi\rangle\,\bigl\langle\Phi(w,s), (D_{w_i}d_{n-1,p,l,k})(w,s)\bigr\rangle\,ds\Bigr\| \le \sum_{i=1}^d\frac{c_2\nu d_0BD_1}{D_2^{8}}\,D_2^{|\alpha|+|l_{i,1}|+4n}\,\frac{(\alpha+l_{i,1})!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l_{i,1}|-k))!}{J!}},$$
with all constraints on $|j|$, $p'$, $|l_{i,1}|$, $k$ satisfied. Thus, each of the $d$ contributions stemming from the integrated Term 8 satisfies the required estimate provided $D_2^8 \ge c_2\nu d_0B$.

The integrated Term 9. Finally,
$$\Bigl\|D_w^\alpha\int_0^t\bigl\langle\dot\Phi(w,s), d_{n,p,l,k}(w,s)\bigr\rangle\,ds\Bigr\| \le \frac{c_3\nu D_1}{D_2^{4}}\,D_2^{|\alpha|+|l|+4n}\,\frac{(\alpha+l)!\,|t|^{p'}k^k}{(1+|\alpha|)^{d+1}p'!\,\mathrm{dist}(t)^k}\sqrt{\frac{(J+n+2(p'-|l|-k))!}{J!}},$$
with all constraints on $|j|$, $p'$, $|l|$, $k$ satisfied. Thus, the contribution stemming from the integrated Term 9 satisfies the required estimate, provided $D_2^4 \ge c_3\nu$.
By choosing $D_2$ large enough, all conditions are satisfied. This completes the induction for part (iii) of Proposition 6.1. The integration required to construct the $c$'s shows that we obtain non-zero results for $c_{n,p,l,k,\beta}$ with $n > 0$ only when $p \ge 1$. This proves part (i) of Proposition 6.1. We now turn to the proof of part (ii) of Proposition 6.1.
6.3. Counting the number of terms that occur in our expansion. In our Born–Oppenheimer expansion, the $n$th order term has the form $\phi_n(w,y,t) = g_n(w,y,t)\Phi(w,t) + \phi_n^\perp(w,y,t)$. By the way we compute them, $g_n(w,y,t)$ and $\phi_n^\perp(w,y,t)$ decompose naturally as sums over the parameter $\beta$. We define $u_n$ to be the number of such terms in $g_n(w,y,t)$ and $v_n$ to be the number of terms in $\phi_n^\perp(w,y,t)$. An examination of our construction shows that $u_n$ and $v_n$ satisfy the recursive estimates
$$u_{n+1} \le \sum_{j=0}^{3}a_ju_{n-j} + \sum_{j=0}^{3}b_jv_{n-j} + \sum_{j=0}^{n}c_1\gamma_1^ju_{n-j} + \sum_{j=0}^{n}c_2\gamma_2^jv_{n-j} + v_{n+1}, \tag{6.9}$$
$$v_{n+1} \le \sum_{j=0}^{3}d_ju_{n-j} + \sum_{j=0}^{3}e_jv_{n-j} + \sum_{j=0}^{n}c_3\gamma_3^ju_{n-j} + \sum_{j=0}^{n}c_4\gamma_4^jv_{n-j}, \tag{6.10}$$
where $a_i$, $b_i$, $c_i$, $d_i$, $e_i$ and $\gamma_i$ are fixed numbers. The exponential factors $\gamma_i^j$ arise from an estimate (proven in the proof of Lemma 5.2 of [15]) for the number of Taylor series terms of any given order in the expansion of $E(a(t)+\varepsilon y)$. We substitute (6.10) for the last term in (6.9) and add the result to (6.10). By some simple estimates this leads to a recursive estimate for the single quantity $z_n = u_n + v_n$ of the form
$$z_{n+1} \le \sum_{j=0}^{3}a_jz_{n-j} + \sum_{j=0}^{n}c\,\gamma^jz_{n-j}.$$
An easy induction on $n$ shows that this implies that $z_n$ grows at most like $e^{kn}$ for a sufficiently large value of $k$. The quantity $z_n$ is the number of terms in $\phi_n(w,y,t)$, so this proves the assertion. □

Proposition 6.1 now follows easily. □
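To see the exponential growth asserted in Sect. 6.3 concretely, one can iterate the final recursion with equality. The sketch below uses hypothetical constants $a_j = 1$, $c = 1$, $\gamma = 2$ (the paper does not specify them) and $z_0 = 1$, treating terms with negative index as zero; it is an illustration, not part of the proof.

```python
def term_counts(n_max, a=(1, 1, 1, 1), c=1, gamma=2):
    """Iterate z_{n+1} = sum_{j=0}^{3} a_j z_{n-j} + sum_{j=0}^{n} c gamma^j z_{n-j}
    (the recursive bound for z_n = u_n + v_n taken with equality), starting from z_0 = 1.
    Terms with a negative index are treated as zero. The constants are hypothetical."""
    z = [1]
    for n in range(n_max):
        short_range = sum(a[j] * z[n - j] for j in range(len(a)) if n - j >= 0)
        long_range = sum(c * gamma**j * z[n - j] for j in range(n + 1))
        z.append(short_range + long_range)
    return z


z = term_counts(60)
print(z[:5])                                        # [1, 2, 7, 25, 90]
ratios = [z[n + 1] / z[n] for n in range(50, 60)]
print(ratios)                                       # the ratios settle at a constant, i.e. z_n ~ e^{kn}
```

Because all coefficients are non-negative, the growth rate is governed by the smallest positive singularity of the generating function, so the ratio $z_{n+1}/z_n$ stabilizes rather than blowing up; this is the geometric growth $e^{kn}$ used above.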
7. Exponential Error Bounds In this section, we prove Theorem 4.1.
7.1. The explicit error term. We use the following abstract lemma, whose proof is an easy application of Duhamel’s formula (see e.g. [13]).
Lemma 7.1. Suppose $H(\hbar)$ is a family of self-adjoint operators for $\hbar > 0$. Suppose $\psi(t,\hbar)$ belongs to the domain of $H(\hbar)$, is continuously differentiable in $t$, and approximately solves the Schrödinger equation in the sense that
$$i\hbar\,\frac{\partial\psi}{\partial t}(t,\hbar) = H(\hbar)\psi(t,\hbar) + \xi(t,\hbar),$$
where $\xi(t,\hbar)$ satisfies
$$\|\xi(t,\hbar)\| \le \mu(t,\hbar).$$
Then, for $t > 0$,
$$\bigl\|e^{-itH(\hbar)/\hbar}\psi(0,\hbar) - \psi(t,\hbar)\bigr\| \le \hbar^{-1}\int_0^t\mu(s,\hbar)\,ds.$$
The analogous statement holds for $t < 0$.

We substitute our approximate solution (4.1)
$$F\,e^{iS/\varepsilon^2}e^{i\eta\cdot y/\varepsilon}\Bigl(\sum_{n=0}^N\varepsilon^n\phi_n + \varepsilon^{N+1}\phi_{N+1}^\perp + \varepsilon^{N+2}\phi_{N+2}^\perp\Bigr)$$
into the Schrödinger equation and compute the residual term $\xi_N$. It is more convenient to write this term in the multiple scales notation. We also use the notation $\frac{E^{(m)}(a(t))}{m!}y^m$ to denote the Taylor series term $\sum_{|j|=m}\frac{(D^jE)(a(t))}{j!}y^j$.
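The Duhamel bound of Lemma 7.1 can be checked in a finite-dimensional toy model. In the sketch below (our illustration; the matrices $H$, $V$ and the values of $\hbar$, $t$ are hypothetical choices, not taken from the paper) the approximate solution is the exact evolution under $H + V$, so that $\xi(s) = V\psi(s)$ and one may take $\mu(s) = \|V\|$.

```python
import numpy as np


def propagate(H, psi0, t, hbar):
    """Return e^{-itH/hbar} psi0 for a Hermitian matrix H (exact, via eigendecomposition)."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * t * evals / hbar) * (evecs.conj().T @ psi0))


def random_hermitian(rng, n):
    """A random complex Hermitian n x n matrix."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2


rng = np.random.default_rng(0)
n, hbar, t = 6, 0.1, 2.0
H = random_hermitian(rng, n)
V = 1e-3 * random_hermitian(rng, n)  # small hypothetical perturbation
psi0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
psi0 /= np.linalg.norm(psi0)

# psi(s) = e^{-is(H+V)/hbar} psi0 solves i hbar psi' = H psi + xi(s) with xi(s) = V psi(s).
# Since ||psi(s)|| = 1, mu(s) = ||V|| (operator norm) is admissible in Lemma 7.1.
psi_t = propagate(H + V, psi0, t, hbar)
error = np.linalg.norm(propagate(H, psi0, t, hbar) - psi_t)
bound = t * np.linalg.norm(V, 2) / hbar
print(f"error = {error:.3e}, Duhamel bound = {bound:.3e}")
```

The factor $\hbar^{-1}$ in the bound is what makes the residual have to be $O(\hbar^{1+\kappa})$ (here, exponentially small) for the approximation to be useful over times of order one.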
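In this notation, the residual $\xi_N(w,y,t)$ is given, up to a phase factor, by two sums of terms. The first one contains all terms that do not involve derivatives of the cut-off. The second contains all terms that do involve derivatives of the cut-off. The first sum is $F(w)$ times the following:
\begin{align}
&\tfrac{\varepsilon^{N+3}}{2}(\Delta_wg_{N-1})\,\Phi \tag{7.1}\\
&+\tfrac{\varepsilon^{N+4}}{2}(\Delta_wg_N)\,\Phi \tag{7.2}\\
&+\varepsilon^{N+3}(\nabla_wg_{N-1})\cdot(\nabla_w\Phi) \tag{7.3}\\
&+\varepsilon^{N+4}(\nabla_wg_N)\cdot(\nabla_w\Phi) \tag{7.4}\\
&+\tfrac{\varepsilon^{N+3}}{2}\,g_{N-1}(\Delta_w\Phi) \tag{7.5}\\
&+\tfrac{\varepsilon^{N+4}}{2}\,g_N(\Delta_w\Phi) \tag{7.6}\\
&+\tfrac{\varepsilon^{N+3}}{2}\,\Delta_w\phi_{N-1}^\perp \tag{7.7}\\
&+\tfrac{\varepsilon^{N+4}}{2}\,\Delta_w\phi_N^\perp \tag{7.8}\\
&+\varepsilon^{N+3}(\nabla_w\cdot\nabla_yg_N)\,\Phi \tag{7.9}\\
&+\varepsilon^{N+3}(\nabla_yg_N)\cdot(\nabla_w\Phi) \tag{7.10}\\
&+\varepsilon^{N+3}\,\nabla_w\cdot\nabla_y\phi_N^\perp \tag{7.11}\\
&+i\varepsilon^{N+3}\,\dot\phi_{N+1}^\perp \tag{7.12}\\
&+i\varepsilon^{N+4}\,\dot\phi_{N+2}^\perp \tag{7.13}\\
&+\tfrac{\varepsilon^{N+5}}{2}\,\Delta_w\phi_{N+1}^\perp \tag{7.14}\\
&+\tfrac{\varepsilon^{N+6}}{2}\,\Delta_w\phi_{N+2}^\perp \tag{7.15}\\
&+\varepsilon^{N+4}\,\nabla_w\cdot\nabla_y\phi_{N+1}^\perp \tag{7.16}\\
&+\varepsilon^{N+5}\,\nabla_w\cdot\nabla_y\phi_{N+2}^\perp \tag{7.17}\\
&+\tfrac{\varepsilon^{N+3}}{2}\,\Delta_y\phi_{N+1}^\perp \tag{7.18}\\
&+\tfrac{\varepsilon^{N+4}}{2}\,\Delta_y\phi_{N+2}^\perp \tag{7.19}\\
&-\tfrac{\varepsilon^{N+3}}{2}\,E^{(2)}(a(t))\,y^2\,\phi_{N+1}^\perp \tag{7.20}\\
&-\tfrac{\varepsilon^{N+4}}{2}\,E^{(2)}(a(t))\,y^2\,\phi_{N+2}^\perp \tag{7.21}\\
&-\sum_{n=0}^N\varepsilon^{N-n}\Bigl(E(a(t)+\varepsilon y)-\sum_{m\le 2+n}\varepsilon^m\frac{E^{(m)}(a(t))}{m!}y^m\Bigr)g_{N-n}\,\Phi \tag{7.22}\\
&-\sum_{n=0}^N\varepsilon^{N-n}\Bigl(E(a(t)+\varepsilon y)-\sum_{m\le 2+n}\varepsilon^m\frac{E^{(m)}(a(t))}{m!}y^m\Bigr)\phi_{N-n}^\perp \tag{7.23}\\
&-\varepsilon^{N+1}\Bigl(E(a(t)+\varepsilon y)-\sum_{m\le 2}\varepsilon^m\frac{E^{(m)}(a(t))}{m!}y^m\Bigr)\phi_{N+1}^\perp \tag{7.24}\\
&-\varepsilon^{N+2}\Bigl(E(a(t)+\varepsilon y)-\sum_{m\le 2}\varepsilon^m\frac{E^{(m)}(a(t))}{m!}y^m\Bigr)\phi_{N+2}^\perp. \tag{7.25}
\end{align}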
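The second sum arises from terms in which the cut-off $F(w)$ is differentiated. It is
\begin{align}
&\sum_{n=0}^N\tfrac{\varepsilon^{n+4}}{2}(\Delta_wF)\,g_n\,\Phi \tag{7.26}\\
&+\sum_{n=0}^{N+2}\tfrac{\varepsilon^{n+4}}{2}(\Delta_wF)\,\phi_n^\perp \tag{7.27}\\
&+\sum_{n=0}^N\varepsilon^{n+4}(\nabla_wF)\cdot(\nabla_wg_n)\,\Phi \tag{7.28}\\
&+\sum_{n=0}^N\varepsilon^{n+4}g_n(\nabla_wF)\cdot(\nabla_w\Phi) \tag{7.29}\\
&+\sum_{n=0}^{N+2}\varepsilon^{n+4}(\nabla_wF)\cdot(\nabla_w\phi_n^\perp) \tag{7.30}\\
&+\sum_{n=0}^N\varepsilon^{n+3}(\nabla_wF)\cdot(\nabla_yg_n)\,\Phi \tag{7.31}\\
&+\sum_{n=0}^{N+2}\varepsilon^{n+3}(\nabla_wF)\cdot(\nabla_y\phi_n^\perp). \tag{7.32}
\end{align}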
7.2. Optimal truncation. Each error term in the first sum (7.1)–(7.25) can be written as a uniformly bounded function times one of the following two forms:
$$A = \Psi(w,t)\sum_r\sum_{|j|\le\rho(r)}c_{r,j}(w,t)\,\varphi_j(y,t),\qquad B = \sum_{r'}\sum_{|j|\le\rho'(r')}d_{r',j}(w,t)\,\varphi_j(y,t),$$
where $\Psi(w,t) \in \mathcal{H}_{el}$, $\varphi_j(y,t) = \varepsilon^{-d/2}\varphi_j(A(t),B(t),1,0,0,y)$, $r$ and $r'$ denote collective sets of indices that belong to some finite set, and $\rho(r)$ and $\rho'(r')$ limit the number of multi-indices $j$ allowed in the second sum. The error term $\xi(w,y,t) \in \mathcal{H}_{el}$ needs to be estimated for $t \in \mathbb{R}$ in the following norm:
$$\|\xi(t)\| = \Bigl(\int_{\mathbb{R}^d}\bigl\|\xi\bigl(x-a(t),(x-a(t))/\varepsilon,t\bigr)\bigr\|_{\mathcal{H}_{el}}^2\,dx\Bigr)^{1/2} = \Bigl(\int_{\mathbb{R}^d}\|\xi(w,w/\varepsilon,t)\|_{\mathcal{H}_{el}}^2\,dw\Bigr)^{1/2}.$$
With that norm, using the Cauchy–Schwarz inequality and the $L^2(\mathbb{R}^d)$ orthonormality of the $\varphi_j(y,t)$, we obtain the following estimate for the norm of $A$ in terms of the norm of the vector $c_r(w,t) \in l^2(\mathbb{N}^d,\mathbb{C})$:
$$\|A\| \le \sum_r\sup_{w\in\operatorname{supp}F}\|\Psi(w,t)\|_{\mathcal{H}_{el}}\,\sup_{w\in\operatorname{supp}F}\|c_r(w,t)\|\Bigl(\sum_{|j|\le\rho(r)}1\Bigr)^{1/2}. \tag{7.33}$$
By similar arguments we get the following estimate for the norm of $B$ in terms of the norm of the vector $d_{r'}(w,t) \in l^2(\mathbb{N}^d,\mathcal{H}_{el})$:
$$\|B\| \le \sum_{r'}\sup_{w\in\operatorname{supp}F}\|d_{r'}(w,t)\|\Bigl(\sum_{|j|\le\rho'(r')}1\Bigr)^{1/2}. \tag{7.34}$$
Note also that
$$\sum_{|j|\le\rho'(r')}1 \le \binom{\rho'(r')+d}{d}, \tag{7.35}$$
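which grows at most polynomially with $\rho'(r')$.

Lemma 7.2. For $t \in [0,T]$, and for any $\alpha \in \mathbb{N}^d$ and $\gamma \in \mathbb{N}^d$, there exist $C_0 > 0$ and $\tau_0 > 0$ such that
$$\Bigl\|\Psi(w,t)\sum_{\beta\in B_{n,1}}\sum_{p\le n}\sum_{k+|l|\le p+\frac n2}\sum_{|j|\le J+n+2(p-|l|-k)}D_w^\alpha D_y^\gamma\bigl(c_{n,p,l,k,\beta,j}(w,t)\,\varphi_j(y,t)\bigr)\Bigr\| \le C_0\bigl(n^{1/2}\tau_0\bigr)^n \tag{7.36}$$
and
$$\Bigl\|\sum_{\beta\in B_{n,2}}\sum_{p\le n-1}\sum_{k+|l|\le p+\frac{n-1}{2}}\sum_{|j|\le J+n-1+2(p-|l|-k)}D_w^\alpha D_y^\gamma\bigl(d_{n,p,l,k,\beta,j}(w,t)\,\varphi_j(y,t)\bigr)\Bigr\| \le C_0\bigl(n^{1/2}\tau_0\bigr)^n. \tag{7.37}$$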
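If the operator $D_y^\gamma$ is replaced by the operator $y^\gamma$, the same bounds are valid.

Proof. We begin with (7.36). We have
$$\sum_{|j|\le J+n+2(p-|l|-k)}D_w^\alpha c_{n,p,l,k,\beta,j}(w,t)\,D_y^\gamma\varphi_j(y,t) = \sum_{|\tilde k|\le J+|\gamma|+n+2(p-|l|-k)}\bigl(\langle\varphi, D_y^\gamma\varphi\rangle D_w^\alpha c_{n,p,l,k,\beta}(w,t)\bigr)_{\tilde k}\,\varphi_{\tilde k}(y,t).$$
We know that the vector $\langle\varphi, D_y^\gamma\varphi\rangle D_w^\alpha c_{n,p,l,k,\beta}(w,t)$ satisfies the estimate
$$\bigl\|\langle\varphi, D_y^\gamma\varphi\rangle D_w^\alpha c_{n,p,l,k,\beta}(w,t)\bigr\| \le D_1D_2^{|\alpha|+|l|+4n}\,\frac{(\alpha+l)!\,|t|^pk^k\,(Bd_0)^{|\gamma|}}{(1+|\alpha|)^{d+1}\,p!\,\delta^k}\sqrt{\frac{(J+|\gamma|+n+2(p-|l|-k))!}{J!}}. \tag{7.38}$$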
Here $\delta > 0$ is the distance in the complex plane from $[0,T]$ to the complement of $\mathcal{A}$. Since the number of indices in $B_{n,1}$ is bounded by $e^{K_0n}$, $D_2 \ge 1$, and $(\alpha+l)! \le (|\alpha|+|l|)!$, we can estimate the sum (7.36) by
$$\frac{D_1D_2^{|\alpha|}(Bd_0)^{|\gamma|}e^{K_0n}D_2^{11n/2}}{\sqrt{J!}\,(1+|\alpha|)^{d+1}}\sum_{p\le n}\sum_{|l|+k\le p+n/2}\frac{|t|^pk^k}{p!\,\delta^k}\,(|\alpha|+|l|)!\sqrt{(J+|\gamma|+n+2(p-|l|-k))!}. \tag{7.39}$$
Then, using $a!\,b! \le (a+b)!$, the fact that $(a+2p)!/(p!)^2$ is increasing in $p$, and $p \le n$, we have
$$\frac{(J+|\gamma|+n+2(p-|l|-k))!\,\bigl((|\alpha|+|l|)!\bigr)^2}{(p!)^2} \le \frac{(J+|\gamma|+2|\alpha|+3n-2k)!}{(n!)^2},$$
so that (7.39) is bounded by
$$\frac{D_1D_2^{|\alpha|}(Bd_0)^{|\gamma|}e^{K_0n}D_2^{11n/2}}{n!\,\sqrt{J!}\,(1+|\alpha|)^{d+1}}\sum_{p\le n}|t|^p\sum_{k=0}^{p+n/2}\frac{k^k}{\delta^k}\sqrt{(J+|\gamma|+2|\alpha|+3n-2k)!}\sum_{|l|\le p+n/2-k}1.$$
The last sum is bounded by $\binom{[[3n/2]]+d}{d} \le \sigma_0e^{3\sigma n/2}$, where $[[x]]$ denotes the integer part of $x$. Using $k^{2k} \le (2k)^{2k}$, $a^ab^b \le (a+b)^{a+b}$ and $a! \le a^a$, we have
$$(J+|\gamma|+2|\alpha|+3n-2k)!\,k^{2k} \le (J+|\gamma|+2|\alpha|+3n)^{J+|\gamma|+2|\alpha|+3n}.$$
Since we can assume without loss of generality that $\delta < 1$, this implies
$$\sum_{k=0}^{p+n/2}\frac{k^k}{\delta^k}\sqrt{(J+|\gamma|+2|\alpha|+3n-2k)!} \le (J+|\gamma|+2|\alpha|+3n)^{\frac{J+|\gamma|+2|\alpha|+3n}{2}}\sum_{k=0}^{p+n/2}\delta^{-k} \le (J+|\gamma|+2|\alpha|+3n)^{\frac{J+|\gamma|+2|\alpha|+3n}{2}}\,\frac{K_1\,\delta^{-p}}{\delta^{n/2}}, \tag{7.40}$$
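for some constant $K_1$ that satisfies $\delta^{-1}-1 \ge K_1^{-1}\delta^{-1}$. Together with
$$\sum_{p\le n}\Bigl(\frac t\delta\Bigr)^p \le \begin{cases}K_2\,(t/\delta)^n & \text{if } t/\delta > 1,\\ K_2\,n & \text{if } t/\delta = 1,\\ K_2 & \text{if } t/\delta < 1,\end{cases} \tag{7.41}$$
where $K_2$ is constant, we get (in the first case above)
$$\sum_{\beta\in B_{n,1}}\sum_{p\le n}\sum_{k+|l|\le p+\frac n2}\bigl\|\langle\varphi, D_y^\gamma\varphi\rangle D_w^\alpha c_{n,p,l,k,\beta}(w,t)\bigr\| \le \frac{\sigma_0K_1K_2D_1D_2^{|\alpha|}(Bd_0)^{|\gamma|}e^{(K_0+3\sigma/2)n}D_2^{11n/2}\,t^n}{(1+|\alpha|)^{d+1}\sqrt{J!}\,n!\,\delta^{3n/2}}\,(J+|\gamma|+2|\alpha|+3n)^{\frac{J+|\gamma|+2|\alpha|+3n}{2}}.$$
We postpone the study of the dependence of our estimates on $t$ and $J$ to Sect. 8. So, using the above,
$$(J+|\gamma|+2|\alpha|+3n)^{\frac{J+|\gamma|+2|\alpha|+3n}{2}} \le (J+|\gamma|+2|\alpha|+3n)^{\frac{J+|\gamma|+2|\alpha|}{2}}\bigl((J+|\gamma|+2|\alpha|+3)n\bigr)^{\frac{3n}{2}},$$
and the existence of $0 < a < b$ such that $a^nn^n \le n! \le b^nn^n$, we learn that there exist positive constants (i.e., independent of $n$) $K_3$, $K_4$ and $K_5$ such that
$$\sum_{\beta\in B_{n,1}}\sum_{p\le n}\sum_{k+|l|\le p+\frac n2}\bigl\|\langle\varphi, D_y^\gamma\varphi\rangle D_w^\alpha c_{n,p,l,k,\beta}(w,t)\bigr\| \le K_3\,\frac{K_4^n\,n^{3n/2}}{a^nn^n} \le K_3\bigl(K_5n^{1/2}\bigr)^n.$$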
This yields the result with $C_0 = K_3$ and $\tau_0 = K_5$. The second sum is dealt with in the same manner, since the vectors $d_{n,p,l,k,\beta}(w,t)$ satisfy the same bounds as $c_{n,p,l,k,\beta}(w,t)$ does with $n$ replaced by $n-1$. Finally, the replacement of $D_y^\gamma$ by the operator $y^\gamma$ means that the matrix $\langle\varphi|D_y^\gamma\varphi\rangle$ must be replaced by the matrix $\langle\varphi|y^\gamma\varphi\rangle$. But the latter has the same properties as the former; the bounds above remain true with $B$ replaced by $A$ from (7.38) onward. This affects the definition of $C_0$ only. □

The following lemma is the key to the proof of exponential accuracy of our approximation by means of optimal truncation.

Lemma 7.3. For sufficiently small $g > 0$, there exist $\Gamma(g)$ and $C(g) > 0$ such that the choice $N(\varepsilon) = [[g^2/\varepsilon^2]]$ implies that the norm of the error term $\xi_{N(\varepsilon)}(t)$ given by (7.1) satisfies
$$\|\xi_{N(\varepsilon)}(t)\| \le C(g)\,e^{-\Gamma(g)/\varepsilon^2}.$$

Proof. The previous lemma and formulas (7.33), (7.34) and (7.35) show that all terms in the first sum defining $\xi_N$ except (7.12), (7.13), (7.22), and (7.23) are exponentially small, once we prove
$$\varepsilon^{N(\varepsilon)}\,C_0\bigl(N(\varepsilon)^{1/2}\tau_0\bigr)^{N(\varepsilon)} \le C\,e^{-\Gamma/\varepsilon^2}. \tag{7.42}$$
Because $g^2/\varepsilon^2-1 \le N \le g^2/\varepsilon^2$, if we choose $0 < g < 1/\tau_0$, the left-hand side of this inequality is bounded by
$$C_0\bigl(\varepsilon N^{1/2}\tau_0\bigr)^N \le C_0\,(g\tau_0)^N \le C_0\,e^{-|\ln(g\tau_0)|N} \le C_0\,e^{|\ln(g\tau_0)|}\,e^{-|\ln(g\tau_0)|g^2/\varepsilon^2}, \tag{7.43}$$
which gives
$$C(g) = C_0e^{|\ln(g\tau_0)|}\qquad\text{and}\qquad\Gamma(g) = |\ln(g\tau_0)|\,g^2.$$
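The optimal-truncation estimate (7.42)–(7.43) is easy to check numerically. The sketch below is our own illustration with hypothetical values $g = 0.3$, $\tau_0 = 2$ (so that $g\tau_0 < 1$) and $C_0 = 1$; it works with logarithms because both sides underflow double precision for small $\varepsilon$.

```python
import math


def log_lhs(eps, g, tau0, C0=1.0):
    """log of C0 * (eps * sqrt(N) * tau0)**N with N = [[g^2 / eps^2]], the left side of (7.42)."""
    N = math.floor(g**2 / eps**2)
    return math.log(C0) + N * math.log(eps * math.sqrt(N) * tau0)


def log_rhs(eps, g, tau0, C0=1.0):
    """log of C(g) * exp(-Gamma(g) / eps^2), with C(g) = C0 e^{|ln(g tau0)|}
    and Gamma(g) = |ln(g tau0)| g^2, the right side of (7.43)."""
    a = abs(math.log(g * tau0))
    return math.log(C0) + a - a * g**2 / eps**2


g, tau0 = 0.3, 2.0  # hypothetical values with g * tau0 < 1
for eps in (0.1, 0.05, 0.02, 0.01):
    print(eps, log_lhs(eps, g, tau0), log_rhs(eps, g, tau0))
```

Both logarithms decrease like $-|\ln(g\tau_0)|\,g^2/\varepsilon^2$; the point of the choice $N \approx g^2/\varepsilon^2$ is that it stops the expansion exactly where the factorial growth $N^{N/2}$ would start to beat the powers $\varepsilon^N$.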
The terms (7.12) and (7.13) can be dealt with in a similar fashion once we have computed
$$\dot\phi_{N+1}^\perp = \sum_{\beta\in B_{N+1,2}}\sum_{p\le N}\sum_{k+|l|\le p+\frac N2}\sum_{|j|\le J+N+2(p-|l|-k)}\Bigl(\dot d_{N+1,p,l,k,\beta,j}(w,t)\,\varphi_j(y,t) + d_{N+1,p,l,k,\beta,j}(w,t)\,\dot\varphi_j(y,t)\Bigr),$$
where the second term equals
$$\sum_{\beta\in B_{N+1,2}}\sum_{p\le N}\sum_{k+|l|\le p+\frac N2}\sum_{|\tilde k|\le J+N+2+2(p-|l|-k)}\Bigl(\Bigl(\frac i2\langle\varphi,\Delta_y\varphi\rangle - \frac{iE^{(2)}(a(t))}{2}\langle\varphi,y^2\varphi\rangle\Bigr)d_{N+1,p,l,k,\beta}(w,t)\Bigr)_{\tilde k}\,\varphi_{\tilde k}(y,t).$$
Lemma 6.2 shows that $\dot d_{N+1,p,l,k,\beta}$ satisfies bounds similar to those satisfied by $d_{N+1,p,l,k,\beta,j}$, and the term above is taken care of by Lemma 7.2. Similar statements are true for $\dot\phi_{N+2}^\perp$, and the analysis above also applies to these error terms.

Next consider (7.22). By the mean value theorem, there exists $\zeta_q(y,t,\varepsilon) = a(t)+\theta_q(y,t,\varepsilon)\,\varepsilon y$, where $q \in \mathbb{N}^d$ and $\theta_q(y,t,\varepsilon) \in (0,1)$, such that
$$E(a(t)+\varepsilon y) - \sum_{m\le 2+n}\varepsilon^m\frac{E^{(m)}(a(t))}{m!}y^m = \sum_{|q|=2+n+1}\varepsilon^{|q|}\frac{D^qE(\zeta_q(y,t,\varepsilon))}{q!}\,y^q.$$
Hence, we need to estimate
$$\sum_{n=0}^N\varepsilon^{N+3}\sum_{|q|=2+n+1}\frac{D^qE(\zeta_q(y,t,\varepsilon))}{q!}\,y^q\sum_{\beta,p,k,l,j}c_{N-n,p,l,k,\beta,j}(w,t)\,\varphi_j(y,t)\,\Phi, \tag{7.44}$$
with the following restrictions:
$$|j| \le J+(N-n)+2(p-k-|l|),\quad k+|l| \le p+(N-n)/2,\quad p \le N-n,\quad \beta \in B_{N-n,1}. \tag{7.45}$$
We take a fixed value of $n \in [0,N]$ and consider the vectors
$$\frac{D^qE(\zeta_q(y,t,\varepsilon))}{q!}\,\langle\varphi,y^q\varphi\rangle(t)\,c_{N-n,p,l,k,\beta}(w,t)$$
that we have to estimate. Due to the presence of the cut-off function $F$ (which we have omitted in the notation), we have
$$\frac{|D^qE(\zeta_q(y,t,\varepsilon))|}{q!} \le \frac{c_6D_2^{|q|}}{(1+|q|)^{d+1}},$$
and with our bounds on the matrix $\langle\varphi,y^q\varphi\rangle(t)$ and on the vector $c_{N-n,p,l,k,\beta}(w,t)$, we can write
$$\Bigl\|\frac{D^qE(\zeta_q(y,t,\varepsilon))}{q!}\langle\varphi,y^q\varphi\rangle(t)\,c_{N-n,p,l,k,\beta}(w,t)\Bigr\| \le \frac{c_6D_2^{n+3}(d_0A)^{n+3}}{(1+(n+3))^{d+1}}\,D_1D_2^{|l|+4(N-n)}\,l!\,\frac{|t|^pk^k}{p!\,\delta^k}\sqrt{\frac{(J+3+N+2(p-k-|l|))!}{J!}}. \tag{7.46}$$
Then we use estimates similar to the above and the restrictions (7.45) to get
$$k^{2k}\,l!\,l!\,\frac{(J+3+N+2(p-k-|l|))!}{p!\,p!} \le (2k)^{2k}\frac{(J+3+N+2(p-k))!}{p!\,p!} \le (2k)^{2k}\frac{(J+3+3N-2n-2k)!}{(N-n)!\,(N-n)!} \le \frac{(J+3+3N-2n)^{J+3+3N-2n}}{(N-n)!\,(N-n)!}.$$
Using this and $|l| \le 3(N-n)/2$, we see that (7.46) is bounded above by
$$\frac{c_6D_1(D_2d_0A)^3}{4^{d+1}}\,D_2^{11N/2}\,\frac{(d_0A)^n}{D_2^{9n/2}}\,\frac{|t|^p\,(J+3+3N-2n)^{(J+3+3N-2n)/2}}{(N-n)!\,\delta^k\sqrt{J!}}.$$
Finally, with
$$\sum_{p=0}^{N-n}\sum_{k=0}^{p+\frac{N-n}{2}}|t|^p\delta^{-k} \le K_1K_2\,\delta^{-(N-n)/2}\Bigl(\frac t\delta\Bigr)^{N-n}$$
(see (7.40), (7.41)), the bounds $\sum_{|l|\le p+(N-n)/2}1 \le \sigma_0e^{3\sigma(N-n)/2}$, $\sum_{|q|\le n+3}1 \le \sigma_0e^{\sigma(n+3)}$, and $|B_{N-n,1}| \le e^{K_0(N-n)}$, we get (with the conditions (7.45) on the summations)
$$
\begin{aligned}
&\sum_{|q|=2+n+1}\sum_{\beta,p,k,l,j}\Bigl\|\frac{D^qE(\zeta_q(y,t,\varepsilon))}{q!}\langle\varphi,y^q\varphi\rangle(t)\,c_{N-n,p,l,k,\beta}(w,t)\Bigr\|\\
&\quad\le\frac{\sigma_0^2e^{3\sigma}K_1K_2c_6D_1(D_2d_0A)^3}{4^{d+1}\sqrt{J!}}\,e^{5\sigma N/2}D_2^N(d_0A)^N(J+3+3N)^{(J+3+N)/2}\\
&\qquad\times\Bigl(\frac{D_2^{9/2}}{\delta^{1/2}d_0A}\Bigr)^{N-n}\Bigl(\frac t\delta\Bigr)^{N-n}\frac{(J+3+3N)^{N-n}}{(N-n)!}.
\end{aligned} \tag{7.47}
$$
Postponing the study of the $t$ and $J$ dependence of our estimates, we use the bound $(J+3+3N)^{(J+3+N)/2} \le N^{N/2}(J+3+3N)^{(J+3)/2}(J+6)^{N/2}$ to establish the existence of constants $L_0$, $L_1$, $L_2$, independent of $N$ and $n$, such that (7.47) is bounded above by
$$L_0L_1^N\,N^{N/2}\,\frac{(L_2N)^{N-n}}{(N-n)!}.$$
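It remains for us to sum over $n$ and use (7.33) to bound (7.44) by
$$\varepsilon^3\sqrt{\sigma_0}\,e^{3\sigma N/2}\,\varepsilon^NN^{N/2}L_0L_1^N\sum_{n=0}^N\frac{(L_2N)^{N-n}}{(N-n)!} \le \varepsilon^3\sqrt{\sigma_0}\,L_0\bigl(e^{3\sigma/2}L_1\bigr)^N\varepsilon^NN^{N/2}\sum_{s=0}^\infty\frac{(L_2N)^s}{s!} \le \varepsilon^3\sqrt{\sigma_0}\,L_0\bigl(e^{3\sigma/2}L_1e^{L_2}\,\varepsilon N^{1/2}\bigr)^N.$$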
If we choose $g < 1/(L_1e^{L_2+3\sigma/2})$, we can apply the analysis (7.43) to obtain an exponentially small bound on (7.44) by the optimal truncation $N(\varepsilon) = [[g^2/\varepsilon^2]]$. Since the estimates we have on the $d$'s are similar to those we have on the $c$'s, with the replacement of $n$ by $n-1$, the same exponential bound is valid for (7.23) (see (7.34)), and the analysis of the first collection of error terms is complete.

We now need to take into account the error terms (7.26) to (7.32) arising from the derivatives of the cut-off function $F$. Choose $F_0 > 0$ that satisfies $\max\{|\Delta_wF(w)|, \|\nabla_wF(w)\|\} \le F_0$, uniformly in $w$, and recall that for any $i = 1,\dots,d$,
$$\operatorname{supp}\partial_{w_i}F(w) \subseteq \{w \in \mathbb{R}^d : b_0 < |w| < b_1\} \tag{7.48}$$
for some $0 < b_0 < b_1 < \infty$. Now consider (7.26). We express $g_n$ in terms of the $c$'s to see that the norm of $2\varepsilon^{-4}$ times (7.26) (in $L^2(\mathbb{R}^d,\mathcal{H}_{el})$) can be bounded as follows:
$$
\begin{aligned}
\Bigl\|\sum_{n=0}^N\varepsilon^n\,\Delta_wF(w)\,g_n(w,y,t)\,\Phi(w,t)\Bigr\|
&\le\sum_{n=0}^N\varepsilon^n\Bigl(\int_{\mathbb{R}^d}|\Delta_wF(w)|^2\Bigl|\sum_{|j|\le J+3n}c_{n,j}(w,t)\,\varepsilon^{-d/2}\varphi_j(w/\varepsilon,t)\Bigr|^2\|\Phi(w,t)\|_{\mathcal{H}_{el}}^2\,dw\Bigr)^{1/2}\\
&\le F_0\sum_{n=0}^N\varepsilon^n\sup_{w\in\operatorname{supp}F\subseteq\mathbb{R}^d}\|c_n(w,t)\|\Bigl(\sum_{|j|\le J+3n}\int_{|w|\ge b_0}\bigl|\varepsilon^{-d/2}\varphi_j(w/\varepsilon,t)\bigr|^2dw\Bigr)^{1/2}.
\end{aligned} \tag{7.49}
$$
We know from Sect. 7 of [15] that there exists a constant $\beta_d > 0$, depending on the dimension $d$ only, such that the conditions $\sqrt{2|j|+d} < b_0/(\varepsilon A)$ and $|j| \le J+3N$ imply
$$\int_{|w|\ge b_0}\bigl|\varepsilon^{-d/2}\varphi_j(w/\varepsilon,t)\bigr|^2dw \le e^{\beta_d|j|}\,e^{-b_0^2/(12A^2\varepsilon^2)}.$$
All the conditions here will be satisfied if $N(\varepsilon) = [[g^2/\varepsilon^2]]$, provided we choose $g$ and $\varepsilon$ to satisfy $\varepsilon^2(d+2J)+6g^2 < b_0^2/A^2$.
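The Gaussian tail estimate quoted from [15] can be made explicit in the simplest case. The sketch below is our own illustration and assumes $d = 1$, $j = 0$ and $A(t) = B(t) = 1$, so that $\varphi_0(y) = \pi^{-1/4}e^{-y^2/2}$; then $\int_{|w|\ge b_0}|\varepsilon^{-1/2}\varphi_0(w/\varepsilon)|^2\,dw = \operatorname{erfc}(b_0/\varepsilon) \le e^{-b_0^2/\varepsilon^2}$. It does not reproduce the general constants $\beta_d$ and $12A^2$.

```python
import math


def tail_mass_exact(b0, eps):
    """Mass of |eps^{-1/2} phi_0(w/eps)|^2 outside [-b0, b0], phi_0(y) = pi^{-1/4} e^{-y^2/2} (d = 1)."""
    return math.erfc(b0 / eps)


def tail_mass_quadrature(b0, eps, steps=20000):
    """The same quantity by the midpoint rule on [b0, b0 + 12 eps], doubled by symmetry."""
    def density(w):
        return math.exp(-(w / eps) ** 2) / (eps * math.sqrt(math.pi))

    h = 12 * eps / steps
    return 2 * h * sum(density(b0 + (i + 0.5) * h) for i in range(steps))


b0 = 0.5  # hypothetical inner radius of the support of grad F
for eps in (0.2, 0.1):
    exact = tail_mass_exact(b0, eps)
    print(eps, exact, tail_mass_quadrature(b0, eps), math.exp(-(b0 / eps) ** 2))
```

The tail is exponentially small in $1/\varepsilon^2$, which is why the cut-off terms (7.26)–(7.32) only contribute errors of the same exponential order.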
For such a choice, using $\sum_{|j|\le J+3n}e^{2\beta_d|j|} \le \sigma_0e^{(\sigma+2\beta_d)(J+3n)}$, we get
$$\Bigl(\sum_{|j|\le J+3n}\int_{|w|\ge b_0}\bigl|\varepsilon^{-d/2}\varphi_j(w/\varepsilon,t)\bigr|^2dw\Bigr)^{1/2} \le \sqrt{\sigma_0}\,e^{(\sigma+2\beta_d)(J+3n)/2}\,e^{-b_0^2/(12A^2\varepsilon^2)}.$$
Moreover, by means of manipulations that by now are familiar,
$$
\begin{aligned}
\|c_n(w,t)\| &\le \sum_{\beta\in B_{n,1}}\sum_{p\le n}\sum_{k+|l|\le p+\frac n2}\|c_{n,p,l,k,\beta}(w,t)\|\\
&\le \sum_{\beta\in B_{n,1}}\sum_{p\le n}\sum_{k+|l|\le p+\frac n2}D_1D_2^{|l|+4n}\,l!\,\frac{|t|^pk^k}{p!\,\delta^k}\sqrt{\frac{(J+n+2(p-|l|-k))!}{J!}}\\
&\le \sum_{\beta\in B_{n,1}}\sum_{p\le n}\sum_{k+|l|\le p+\frac n2}\frac{D_1D_2^{11n/2}}{\sqrt{J!}}\,\frac{|t|^p\,(J+3n)^{(J+3n)/2}}{\delta^k\,n!}\\
&\le e^{K_0n}\sigma_0e^{3\sigma n/2}K_1K_2\,\frac{D_1D_2^{11n/2}}{\sqrt{J!}}\,\frac{1}{\delta^n}\Bigl(\frac{|t|}{\delta}\Bigr)^n(J+3n)^{J/2}\,\frac{(J+3N)^{3n/2}}{n!}.
\end{aligned}
$$
Combining these estimates, we get the existence of positive constants $M_0$ and $M_1$ such that, for $N = [[g^2/\varepsilon^2]]$,
$$
\begin{aligned}
\Bigl\|\sum_{n=0}^N\varepsilon^n\,\Delta_wF(w)\,g_n(w,y,t)\,\Phi(w,t)\Bigr\| &\le e^{-b_0^2/(12A^2\varepsilon^2)}M_0\sum_{n=0}^N\frac{(M_1\varepsilon N^{3/2})^n}{n!}\\
&\le e^{-b_0^2/(12A^2\varepsilon^2)}M_0\,e^{M_1\varepsilon N^{3/2}} \le e^{-b_0^2/(12A^2\varepsilon^2)}M_0\,e^{M_1g^3/\varepsilon^2} \le M_0\,e^{-b_0^2/(24A^2\varepsilon^2)},
\end{aligned}
$$
provided $M_1g^3 < b_0^2/(24A^2)$. All other terms in the list (7.26) to (7.32) can be estimated in a similar fashion under a similar condition on $g$. This concludes the proof of our lemma. □

Remark. It is not difficult to check that if we keep $N$ fixed, then our approximation (4.1) $\hat\psi(w,y,t)$ is accurate up to an error of order $\varepsilon^N$, as expected. A by-product of our estimates on the terms stemming from the introduction of the cut-off is that our approximation is exponentially localized in a ball centered at $a(t)$ of any radius $b_0$, as stated in the second part of Theorem 4.1. Hence, we have completed the proof of Theorem 4.1. □
8. Generalizations

As in [15], under some mild supplementary assumptions, we can extend our results to allow $0 \le t \le T(\varepsilon)$ with $T(\varepsilon) \sim \ln(1/\varepsilon^2)$. This proves the validity of our construction up to the Ehrenfest time scale.

Theorem 8.1. In addition to the assumptions of Theorem 4.1, assume that a classical solution to Eq. (2.4) exists for all $t \in \mathbb{R}$. Moreover, assume that for all $z$ in a complex neighborhood of ", the following bound is satisfied: $|E(z)| \le Ne^{M|z|}$, and that $E(x)$ is bounded below. Suppose also that there exist $L$ and $\lambda > 0$ such that, for all $t \in \mathbb{R}$, $\|A(t)\| + \|B(t)\| \le Le^{\lambda t}$. Then there exist $\tau', C', T' > 0$ and $0 < \sigma, \sigma' < 2$ such that the approximation defined by choosing $N(\varepsilon) \sim 1/\varepsilon^\sigma$ is accurate up to an error whose norm is bounded by $C'e^{-\tau'/\varepsilon^{\sigma'}}$, uniformly for all times $0 \le t \le T'\ln(1/\varepsilon^2)$.

Proof. It is enough to mimic the proof of the corresponding result for the semiclassical propagation of the Schrödinger equation in [15], since our hypotheses imply that nothing can happen on the adiabatic side of the problem. By the conservation of energy, the exponential bound on $E(z)$ and the assumed existence of a Liapunov exponent, we easily see from the proofs of Lemmas 7.2 and 7.3 that the behavior in $t$ of all constants (independent of $N$) is at worst exponential in $t$. From the conditions $D_2 \ge e^{KT}$, with $K$ some constant, we need to take $g(T) \le g_0e^{-g_1T}$, so that the optimal truncation procedure yields an error of the order $e^{K_0T}e^{-g_0^2e^{-2g_1T}/\varepsilon^2}$. The choice $T(\varepsilon) \le T'\ln(1/\varepsilon^2)$, with $T' > 0$ sufficiently small, gives the desired result. □

Similarly, we can extend our results to allow initial conditions in a wider class of vectors. Indeed, we have been careful to make the $J$ dependence explicit in all estimates, so that we can control the error term as a function of $J$. Recall that $J$ is fixed arbitrarily in (3.8), which gives the expansion in the basis $\varphi_j(A(0),B(0),\varepsilon^2,a(0),\eta(0),x)$ of the nuclear part of the wave function that we take as an initial condition.
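The last step of the proof of Theorem 8.1 can be made explicit by a short computation. Assuming, as the proof implicitly does, that $g_0$, $g_1$ and $K_0$ do not depend on $\varepsilon$, substituting $T(\varepsilon) = T'\ln(1/\varepsilon^2)$ into the error $e^{K_0T}e^{-g_0^2e^{-2g_1T}/\varepsilon^2}$ gives
$$e^{-2g_1T(\varepsilon)} = \bigl(\varepsilon^2\bigr)^{2g_1T'} = \varepsilon^{4g_1T'},\qquad e^{K_0T(\varepsilon)} = \varepsilon^{-2K_0T'},$$
so that
$$e^{K_0T(\varepsilon)}\,e^{-g_0^2e^{-2g_1T(\varepsilon)}/\varepsilon^2} = \varepsilon^{-2K_0T'}\exp\Bigl(-\frac{g_0^2}{\varepsilon^{2-4g_1T'}}\Bigr).$$
For $0 < T' < 1/(2g_1)$ the exponent $\sigma' = 2-4g_1T'$ lies in $(0,2)$, and the polynomial prefactor $\varepsilon^{-2K_0T'}$ can be absorbed at the price of replacing $g_0^2$ by any $\tau' < g_0^2$, which is a bound of the form $C'e^{-\tau'/\varepsilon^{\sigma'}}$ stated in the theorem.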
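As in [15], for $(a,\eta) \in \mathbb{R}^{2d}$, we introduce the operator $T(a,\eta)$ such that
$$(T(a,\eta)f)(x) = \varepsilon^{-d/2}e^{i\eta\cdot(x-a)/\varepsilon^2}f\bigl((x-a)/\varepsilon\bigr).$$
We define a dense set $\mathcal{C}$ in $L^2(\mathbb{R}^d)$, contained in the set $\mathcal{S}$ of Schwartz functions, by
$$\mathcal{C} = \Bigl\{f(x) = \sum_jc_j\varphi_j(I,I,1,0,0,x) \in \mathcal{S} \ \text{such that there exists } K > 0 \text{ with } \sum_{|j|>J}|c_j|^2 \le e^{-KJ} \text{ for large } J\Bigr\}. \tag{8.1}$$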
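Condition (8.1) asks for exponential decay of the coefficient tails. The following sketch, our own one-dimensional illustration with hypothetical coefficient sequences, shows that geometrically decaying coefficients $c_j = e^{-\kappa j}$ satisfy (8.1) with $K = \kappa$, while algebraically decaying ones such as $c_j = 1/(1+j)$ violate it for every fixed $K > 0$ once $J$ is large.

```python
import math


def tail(coeffs, J):
    """Sum of |c_j|^2 over j > J for a (finite) coefficient list in dimension d = 1."""
    return sum(abs(c) ** 2 for c in coeffs[J + 1:])


kappa = 0.5                                           # hypothetical decay rate
fast = [math.exp(-kappa * j) for j in range(400)]     # c_j = e^{-kappa j}, truncated at j = 399
slow = [1.0 / (1 + j) for j in range(10000)]          # c_j = 1/(1+j), not in the class C

violations = [J for J in range(100) if tail(fast, J) > math.exp(-kappa * J)]
print(violations)                                     # no violation: (8.1) holds with K = kappa
print(tail(slow, 300), math.exp(-0.05 * 300))         # the algebraic tail is far above e^{-KJ}
```

Truncating the sequences only lowers the tails, so the first check is conservative and the second one still exhibits the failure of (8.1).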
Remark. It is easy to check that the inequality in (8.1) is equivalent to the requirement that the coefficients of $f$ satisfy $|c_j| \le e^{-K|j|}$ for large $|j|$. Another equivalent definition of $\mathcal{C}$ is $\mathcal{C} = \bigcup_{t>0}e^{-tH_{ho}}\mathcal{S}$, where $H_{ho} = -\Delta/2 + x^2/2$ is the harmonic oscillator Hamiltonian. The set $\mathcal{C}$ is also called the set of analytic vectors [30] for the harmonic oscillator Hamiltonian. Let $f \in \mathcal{C}$. We set
$$f_J(y,t) = \sum_{|j|\le J}c_j\varphi_j(A(t),B(t),\varepsilon^2,0,0,y)\qquad\text{and}\qquad f(y,t) = \sum_jc_j\varphi_j(A(t),B(t),\varepsilon^2,0,0,y),$$
where the classical quantities $a(t)$, $\eta(t)$, $A(t)$, $B(t)$, and $S(t)$ correspond to the initial conditions $a(0)$, $\eta(0)$, $A(0) = B(0) = I$, and $S(0)$. We consider the construction described in Sect. 4 corresponding to the initial condition $g_0(0,y,t) = f_J(y,t)$, making the dependence on $J$ explicit in the notation:
$$\hat\Psi_{J,N}(w,y,t) = F(w)e^{iS(t)/\varepsilon^2}e^{i\eta(t)\cdot y/\varepsilon}\Bigl(\sum_{n=0}^N\varepsilon^ng_{n,J}(w,y,t)\Phi(w,t) + \sum_{n=2}^{N+2}\varepsilon^n\phi_{n,J}^\perp(w,y,t)\Bigr).$$
Recall that
$$\hat\Psi_{J,N}(w,y,0) = F(w)e^{iS(0)/\varepsilon^2}e^{i\eta(0)\cdot y/\varepsilon}\Bigl(f_J(y,0)\Phi(w,0) + \sum_{n=2}^{N+2}\varepsilon^n\phi_{n,J}^\perp(w,y,0)\Bigr).$$
Let $\nu > 0$, and consider $N(\varepsilon) = [[g^2/\varepsilon^2]]$ and $J(\varepsilon) = \nu N(\varepsilon)$. We define our more general initial conditions as
$$\hat\Psi_f(w,y,0) = F(w)e^{iS(0)/\varepsilon^2}e^{i\eta(0)\cdot y/\varepsilon}\Bigl(f(y,0)\Phi(w,0) + \sum_{n=2}^{N(\varepsilon)+2}\varepsilon^n\phi_{n,J(\varepsilon)}^\perp(w,y,0)\Bigr),$$
which corresponds, when we get back to the variables $(X,t)$, to an initial state $\hat\Psi_f(X-a(0),(X-a(0))/\varepsilon,0)$ whose projection along the electronic eigenvector $\tilde\Phi(X,0)$ yields a nuclear wave packet of the form $(T(a(0),\eta(0))f)(X)$. Note that the component of the initial state perpendicular to $\tilde\Phi(X,0)$ necessary to achieve exponential accuracy depends on $\varepsilon$. This component is determined by the coefficients of the function $f$. We can now state our result for such general initial conditions.
Theorem 8.2. Assume the hypotheses of Theorem 4.1 and consider the above constructions. There exist a sufficiently small $g > 0$ and positive constants $C(g)$, $\Gamma(g)$ such that, with the definition $\Psi^*(X,t,\varepsilon) = \hat\Psi_{J(\varepsilon),N(\varepsilon)}(X-a(t),(X-a(t))/\varepsilon,t)$, we have
$$\bigl\|e^{-itH(\varepsilon)/\varepsilon^2}\Psi_f(X,0,\varepsilon) - \Psi^*(X,t,\varepsilon)\bigr\|_{L^2(\mathbb{R}^d,\mathcal{H}_{el})} \le C(g)\,e^{-\Gamma(g)/\varepsilon^2},$$
for all $t \in [0,T]$, as $\varepsilon \to 0$. Moreover, the result for times $T \sim \ln(1/\varepsilon^2)$ corresponding to Theorem 8.1 is also true for these initial conditions.

Proof. We have
$$
\begin{aligned}
e^{-itH(\varepsilon)/\varepsilon^2}\Psi_f(X,0,\varepsilon) &= e^{-itH(\varepsilon)/\varepsilon^2}\bigl(\Psi_f(X,0,\varepsilon) - \Psi^*(X,0,\varepsilon)\bigr) + e^{-itH(\varepsilon)/\varepsilon^2}\Psi^*(X,0,\varepsilon)\\
&= \Psi^*(X,t,\varepsilon) + O\bigl(\|e^{-itH(\varepsilon)/\varepsilon^2}\Psi^*(X,0,\varepsilon) - \Psi^*(X,t,\varepsilon)\|_{L^2(\mathbb{R}^d,\mathcal{H}_{el})}\bigr)\\
&\quad + O\bigl(\|\Psi_f(X,0,\varepsilon) - \Psi^*(X,0,\varepsilon)\|_{L^2(\mathbb{R}^d,\mathcal{H}_{el})}\bigr).
\end{aligned}
$$
By our choice of the function $f$, the last term is exponentially small in $1/\varepsilon^2$. The remaining norm to estimate corresponds to the situation of Theorem 4.1 in which we let the parameter $J$ grow as $1/\varepsilon^2$, according to our choice of $J(\varepsilon)$. But, as in the proof of Theorem 3.6 in [15] for the corresponding result in semiclassical dynamics, we have made the dependence on $J$ of all the key estimates explicit. It is enough to go through the proof of Theorem 4.1 to check that with $J = \nu N$, all arguments can be repeated to get the same $N$ and $\varepsilon$ behavior for the estimates on the error terms (see [15] for details). Hence, we see that for sufficiently small $g$, we can approximate the solution corresponding to these generalized initial conditions up to an error of order $e^{-\Gamma(g)/\varepsilon^2}$. The Ehrenfest time regime is dealt with similarly. □

9. Technicalities

In this section we give the proofs of the auxiliary lemmas we used in the course of the main argument.

Proof of Lemma 6.1. We first consider the case $k \ge 1$. By Cauchy's formula, we can write
$$g'(t) = \frac{1}{2\pi i}\oint_{|s-t|=r}\frac{g(s)}{(t-s)^2}\,ds, \tag{9.1}$$
where the contour is the circle with center $t$ and radius
$$r = \frac{1}{k+1}\bigl(\delta - |\operatorname{Im}t|\bigr).$$
Time-Dependent Born–Oppenheimer Approximation
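For $s$ on $\Gamma$, we have $(\delta - |\mathrm{Im}\, s|) \ge \frac{k}{k+1}(\delta - |\mathrm{Im}\, t|)$. Thus,
$$\|g(s)\| \le C_k(\delta - |\mathrm{Im}\, s|)^{-k} \le C_k\Big(\frac{k}{k+1}(\delta - |\mathrm{Im}\, t|)\Big)^{-k}.$$
So, by putting the norm inside the integral in (9.1), we have
$$\|g'(t)\| \le \frac{1}{2\pi}\cdot 2\pi\,\frac{\delta - |\mathrm{Im}\, t|}{k+1}\cdot C_k\Big(\frac{k}{k+1}(\delta - |\mathrm{Im}\, t|)\Big)^{-k}\Big(\frac{\delta - |\mathrm{Im}\, t|}{k+1}\Big)^{-2} = C_k\,\frac{(k+1)^{k+1}}{k^k}\,(\delta - |\mathrm{Im}\, t|)^{-k-1}.$$
For $k = 0$ we use the same argument with the radius of $\Gamma$ replaced by $\alpha(\delta - |\mathrm{Im}\, t|)$ for any $\alpha < 1$. This yields the bound $\|g'(t)\| \le C\alpha^{-1}(\delta - |\mathrm{Im}\, t|)^{-1}$. The lemma follows because $\alpha < 1$ is arbitrary.
□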
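The radius used in the proof of Lemma 6.1 is the optimal one for this argument: on a circle of radius $r$, the Cauchy estimate produces the factor $r^{-1}(\delta' - r)^{-k}$, where $\delta' = \delta - |\mathrm{Im}\, t|$, and this is minimized at $r = \delta'/(k+1)$ with value $(k+1)^{k+1}k^{-k}\delta'^{-k-1}$. The following Python sketch checks this numerically; the function names are ours and purely illustrative.

```python
import math


def cauchy_factor(r: float, gap: float, k: int) -> float:
    """Factor r^{-1} (gap - r)^{-k} from a Cauchy circle of radius r.

    `gap` plays the role of delta - |Im t|; on the circle,
    delta - |Im s| >= gap - r, so |g'(t)| <= C_k * cauchy_factor(r, gap, k).
    """
    return (1.0 / r) * (gap - r) ** (-k)


def closed_form(gap: float, k: int) -> float:
    """(k+1)^{k+1} / k^k * gap^{-(k+1)}, the value at r = gap / (k+1)."""
    return (k + 1) ** (k + 1) / k ** k * gap ** (-(k + 1))


def best_radius(gap: float, k: int, steps: int = 200_000) -> float:
    """Brute-force minimiser of cauchy_factor over a uniform grid in (0, gap)."""
    grid = (gap * i / steps for i in range(1, steps))
    return min(grid, key=lambda r: cauchy_factor(r, gap, k))


if __name__ == "__main__":
    gap, k = 0.7, 3
    print(best_radius(gap, k), gap / (k + 1))
    print(cauchy_factor(gap / (k + 1), gap, k), closed_form(gap, k))
```

For $k = 0$ the factor $1/r$ has no minimizer inside $(0, \delta')$, which is why the proof uses the radius $\alpha\delta'$ and lets $\alpha \to 1$.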
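Proof of Lemma 6.4. To prove that the quantity $\nu$ is finite, we estimate
$$\begin{aligned} \sum_{\{l:0\le l_i\le\alpha_i\}}\frac{1}{(1+|l|)^{d+1}}\frac{1}{(1+|\alpha-l|)^{d+1}} &= \sum_{\substack{\{l:0\le l_i\le\alpha_i\}\\ |l|\le[[|\alpha|/2]]}}\frac{1}{(1+|l|)^{d+1}}\frac{1}{(1+|\alpha-l|)^{d+1}} + \sum_{\substack{\{l:0\le l_i\le\alpha_i\}\\ |l|>[[|\alpha|/2]]}}\frac{1}{(1+|l|)^{d+1}}\frac{1}{(1+|\alpha-l|)^{d+1}} \\ &\le \frac{2}{\big(1+|\alpha|-[[|\alpha|/2]]\big)^{d+1}}\sum_{\substack{\{l:0\le l_i\le\alpha_i\}\\ |l|\le[[|\alpha|/2]]}}\frac{1}{(1+|l|)^{d+1}} \\ &\le \frac{2^{d+2}}{(1+|\alpha|)^{d+1}}\sum_{\substack{\{l:0\le l_i\le\alpha_i\}\\ |l|\le[[|\alpha|/2]]}}\frac{1}{(1+|l|)^{d+1}} \le \frac{2^{d+2}}{(1+|\alpha|)^{d+1}}\sum_{l}\frac{1}{(1+|l|)^{d+1}}. \end{aligned}$$
(The substitution $l \mapsto \alpha - l$ shows that the second sum is bounded by the first, and for $|l| \le [[|\alpha|/2]]$ we have $1 + |\alpha - l| \ge 1 + |\alpha|/2 \ge (1 + |\alpha|)/2$.) Thus,
$$\nu \le 2^{d+2}\sum_{l}(1+|l|)^{-d-1}.$$
To see that the right-hand side of this inequality is finite, we note that the number of multi-indices $l$ with $|l| = L$ is the binomial coefficient $\binom{L+d-1}{d-1}$, with the convention that $\binom{0}{0} = 1$. Thus,
$$\nu \le 2^{d+2}\sum_{L=0}^{\infty}\binom{L+d-1}{d-1}\frac{1}{(1+L)^{d+1}} = \frac{2^{d+2}}{(d-1)!}\sum_{L=0}^{\infty}\frac{(L+d-1)(L+d-2)\cdots(L+1)}{(L+1)^{d+1}}.$$
For large $L$, $\frac{(L+d-1)(L+d-2)\cdots(L+1)}{(L+1)^{d+1}}$ is asymptotic to $L^{-2}$, so $\nu$ is finite. Since
$$D^\alpha(MN) = \sum_{\{l:0\le l_i\le\alpha_i\}}\prod_{j=1}^{d}\binom{\alpha_j}{l_j}D^lM\,D^{\alpha-l}N,$$
we have
$$\begin{aligned} \big\|D^\alpha(MN)(x)\big\| &\le \sum_{\{l:0\le l_i\le\alpha_i\}}\prod_{j=1}^{d}\binom{\alpha_j}{l_j}\,m(x)n(x)a(x)^{|\alpha+p+q|}\frac{(l+p)!\,(\alpha-l+q)!}{(1+|l|)^{d+1}(1+|\alpha-l|)^{d+1}} \\ &= m(x)n(x)a(x)^{|\alpha+p+q|}(\alpha+p+q)!\sum_{\{l:0\le l_i\le\alpha_i\}}\prod_{j=1}^{d}\binom{\alpha_j}{l_j}\binom{\alpha_j+p_j+q_j}{l_j+p_j}^{-1}\frac{1}{(1+|l|)^{d+1}(1+|\alpha-l|)^{d+1}}. \end{aligned}$$
Since
$$\binom{\alpha_j+p_j+q_j}{l_j+p_j} \ge \binom{\alpha_j+q_j}{l_j} \ge \binom{\alpha_j}{l_j},$$
we therefore have
$$\begin{aligned} \big\|D^\alpha(MN)(x)\big\| &\le m(x)n(x)a(x)^{|\alpha+p+q|}(\alpha+p+q)!\sum_{\{l:0\le l_i\le\alpha_i\}}\frac{1}{(1+|l|)^{d+1}(1+|\alpha-l|)^{d+1}} \\ &\le m(x)n(x)\,\nu\,a(x)^{|\alpha+p+q|}\frac{(\alpha+p+q)!}{(1+|\alpha|)^{d+1}}. \end{aligned}$$
□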
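The combinatorial facts used in the proof of Lemma 6.4 can be tested by brute force for small $d$: the bound reducing the convolution-type sum to $2^{d+2}(1+|\alpha|)^{-d-1}\sum_{|l|\le[[|\alpha|/2]]}(1+|l|)^{-d-1}$, the binomial count of multi-indices with $|l| = L$, and the chain of binomial inequalities. A minimal Python sketch; all helper names are hypothetical.

```python
import itertools
import math


def box(alpha):
    """All multi-indices l with 0 <= l_i <= alpha_i."""
    return itertools.product(*(range(a + 1) for a in alpha))


def pair_sum(alpha, d):
    """Sum over the box of (1+|l|)^{-(d+1)} (1+|alpha-l|)^{-(d+1)}; |alpha-l| = |alpha|-|l|."""
    A = sum(alpha)
    return sum((1 + sum(l)) ** -(d + 1) * (1 + A - sum(l)) ** -(d + 1) for l in box(alpha))


def half_bound(alpha, d):
    """2^{d+2} (1+|alpha|)^{-(d+1)} times the sum over |l| <= floor(|alpha|/2) in the box."""
    A = sum(alpha)
    inner = sum((1 + sum(l)) ** -(d + 1) for l in box(alpha) if sum(l) <= A // 2)
    return 2 ** (d + 2) * (1 + A) ** -(d + 1) * inner


def count_length(L, d):
    """Number of multi-indices in N_0^d with |l| = L, counted by brute force."""
    return sum(1 for l in itertools.product(range(L + 1), repeat=d) if sum(l) == L)


def binom_chain_holds(a, p, q, l):
    """C(a+p+q, l+p) >= C(a+q, l) >= C(a, l) for 0 <= l <= a and p, q >= 0."""
    return math.comb(a + p + q, l + p) >= math.comb(a + q, l) >= math.comb(a, l)
```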
Proof of Lemma 6.3. If $f(t)$ satisfies $\|f(t)\| \le C|t|^p\operatorname{dist}(t)^{-k}$ for all $t \in A$, there exists $g(t)$ analytic in $A$ such that $f(t) = t^pg(t)$ and $\|g(t)\| \le C\operatorname{dist}(t)^{-k}$. We use the integration path from 0 to $t \in A$ parametrized by $\gamma(u) = tu$, with $u \in [0, 1]$, to compute
$$\Big\|\int_0^t f(s)\,ds\Big\| = \Big\|t\int_0^1 f(tu)\,du\Big\| = \Big\|t\int_0^1 (tu)^pg(tu)\,du\Big\| \le C|t|^{p+1}\int_0^1 u^p\operatorname{dist}(tu)^{-k}\,du \le \frac{C}{p+1}|t|^{p+1}\operatorname{dist}(t)^{-k}, \tag{9.2}$$
since, by assumption, $\operatorname{dist}(ut)$ is a decreasing function of $u$.
□
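The last step of (9.2) only uses $\operatorname{dist}(tu) \ge \operatorname{dist}(t)$ for $u \in [0, 1]$. As an illustration, assume (our choice, not a hypothesis from the text) that $A$ is a strip $|\mathrm{Im}\, t| < \delta$ with $\operatorname{dist}(t) = \delta - |\mathrm{Im}\, t|$, so that $\operatorname{dist}(tu) = \delta - u|\mathrm{Im}\, t|$. The sketch compares the middle integral of (9.2) with the bound $\operatorname{dist}(t)^{-k}/(p+1)$ using the midpoint rule.

```python
def middle_integral(p: int, k: int, delta: float, y: float, n: int = 5000) -> float:
    """Midpoint rule for int_0^1 u^p (delta - u*y)^(-k) du, with 0 <= y < delta.

    delta - u*y models dist(t u) for a point t with |Im t| = y (assumed strip geometry).
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        total += u ** p * (delta - u * y) ** (-k)
    return total * h


def lemma_bound(p: int, k: int, delta: float, y: float) -> float:
    """dist(t)^{-k} / (p + 1): the right-hand side of (9.2) without the factor C |t|^{p+1}."""
    return (delta - y) ** (-k) / (p + 1)
```

Because the integrand is convex in $u$, the midpoint rule underestimates the integral, so the numerical comparison errs on the safe side.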
Acknowledgements. George Hagedorn wishes to thank the Institut Fourier and Alain Joye wishes to thank Virginia Tech for hospitality and support.
References
1. Benchaou, M.: Estimations de Diffusion pour un Opérateur de Klein-Gordon Matriciel Dépendant du Temps. Bull. Soc. math. France 126, 273–294 (1998)
2. Benchaou, M., and Martinez, A.: Estimations Exponentielles en Théorie de la Diffusion des Opérateurs de Schrödinger Matriciels. Ann. Inst. H. Poincaré Sect. A 71, 561–594 (1999)
3. Berry, M.V.: Quantum Phase Corrections from Adiabatic Iteration. Proc. R. Soc. Lond. A 414, 31–46 (1987)
4. Berry, M.V.: Histories of Adiabatic Quantum Transitions. Proc. R. Soc. Lond. A 429, 61–72 (1990)
5. Coker, D. F., and Xiao, L.: Methods for Molecular-Dynamics with Nonadiabatic Transitions. J. Chem. Phys. 102, 496–510 (1995)
6. Dieudonné, J.: Calcul Infinitésimal. Paris: Hermann 1968
7. Hagedorn, G. A.: A Time-Dependent Born–Oppenheimer Approximation. Commun. Math. Phys. 77, 1–19 (1980)
8. Hagedorn, G. A.: High Order Corrections to the Time-Dependent Born–Oppenheimer Approximation I: Smooth Potentials. Ann. Math. 124, 571–590 (1986). Erratum 126, 219 (1987)
9. Hagedorn, G. A.: High Order Corrections to the Time-Dependent Born–Oppenheimer Approximation II: Coulomb Systems. Commun. Math. Phys. 117, 387–403 (1988)
10. Hagedorn, G. A.: Molecular Propagation Through Electronic Eigenvalue Crossings. Memoirs Amer. Math. Soc. 111 (536) (1994)
11. Hagedorn, G. A.: Semiclassical Quantum Mechanics III: The Large Order Asymptotics and More General States. Ann. Phys. 135, 58–70 (1981)
12. Hagedorn, G. A.: Semiclassical Quantum Mechanics IV: Large Order Asymptotics and More General States in More than One Dimension. Ann. Inst. H. Poincaré Sect. A 42, 363–374 (1985)
13. Hagedorn, G. A.: Raising and lowering operators for semiclassical wave packets. Ann. Phys. 269, 77–104 (1998)
14. Hagedorn, G. A., and Joye, A.: Semiclassical Dynamics with Exponentially Small Error Estimates. Commun. Math. Phys. 207, 439–465 (1999)
15. Hagedorn, G. A., and Joye, A.: Exponentially Accurate Semiclassical Dynamics: Propagation, Localization, Ehrenfest Times, Scattering and More General States. Ann. H. Poincaré 1, 837–883 (2000)
16. Joye, A.: Proof of the Landau–Zener Formula. Asymptotic Analysis 9, 209–258 (1994)
17. Joye, A., and Pfister, C.-E.: Exponentially Small Adiabatic Invariant for the Schrödinger Equation. Commun. Math. Phys. 140, 15–41 (1991)
18. Joye, A., and Pfister, C.-E.: Superadiabatic Evolution and Adiabatic Transition Probability between Two Non-Degenerate Levels Isolated in the Spectrum. J. Math. Phys. 34, 454–479 (1993)
19. Joye, A., and Pfister, C.-E.: Semi-Classical Asymptotics beyond All Orders for Simple Scattering Systems. SIAM J. Math. Anal. 26, 944–977 (1995)
20. Klein, M.: On the Mathematical Theory of Predissociation. Ann. Phys. 178, 48–73 (1987)
21. Lim, R., and Berry, M.V.: Superadiabatic Tracking of Quantum Evolution. J. Phys. A: Math. Gen. 24, 3255–3264 (1991)
22. Martin, Ph.-A., and Nenciu, G.: Semiclassical Inelastic S-Matrix for One-Dimensional N-States Systems. Rev. Math. Phys. 7, 193–242 (1995)
23. Martinez, A.: Développements Asymptotiques et Effet Tunnel dans l'Approximation de Born–Oppenheimer. Ann. Inst. H. Poincaré Sect. A 50, 239–257 (1989)
24. Martinez, A.: Resonances dans l'Approximation de Born–Oppenheimer I. J. Diff. Eq. 91, 204–234 (1991)
25. Martinez, A.: Resonances dans l'Approximation de Born–Oppenheimer II. Largeur de Résonances. Commun. Math. Phys. 135, 517–530 (1991)
26. Martinez, A., and Sordoni, V.: On the Time-Dependent Born–Oppenheimer Approximation with Smooth Potential. Preprint mp_arc 01–37
27. Nenciu, G.: Linear Adiabatic Theory and Applications: Exponential Estimates. Commun. Math. Phys. 152, 121–135 (1993)
28. Nenciu, G., and Sordoni, V.: Semiclassical limit for multistate Klein-Gordon systems: almost invariant subspaces and scattering theory. Preprint mp_arc 01–36
29. Pechukas, P.: Time-Dependent Semiclassical Scattering Theory. II. Atomic Collisions. Phys. Rev. 181, 174–184 (1969)
30. Reed, M., and Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis. New York, London: Academic Press 1972
31. Spohn, H., and Teufel, S.: Adiabatic Decoupling and Time-Dependent Born–Oppenheimer Theory. Preprint mp_arc 01–144
32. Tully, J. C.: Molecular Dynamics with Electronic Transitions. J. Chem. Phys. 93, 1061–1071 (1990)
33. Webster, F., Rossky, P. J., and Friesner, R. A.: Nonadiabatic Processes in Condensed Matter: Semi-Classical Theory and Implementation. Comp. Phys. Commun. 63, 494–522 (1991)
Communicated by B. Simon
Commun. Math. Phys. 223, 627 – 672 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
A Fredholm Determinant Identity and the Convergence of Moments for Random Young Tableaux Jinho Baik1,2 , Percy Deift3,4 , Eric Rains5 1 Department of Mathematics, Princeton University, Princeton, NJ 08544, USA.
E-mail: [email protected]
2 Institute for Advanced Study, Princeton, NJ 08540, USA 3 Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA.
E-mail: [email protected]
4 Department of Mathematics, Courant Institute of Mathematical Sciences, New York, NY 10012, USA 5 AT&T Research, Florham Park, NJ 07932, USA.
E-mail: [email protected] Received: 19 December 2000 / Accepted: 23 July 2001
Abstract: We obtain an identity between Fredholm determinants of two kinds of operators, one acting on functions on the unit circle and the other acting on functions on a subset of the integers. This identity is a generalization of an identity between a Toeplitz determinant and a Fredholm determinant that has appeared in the random permutation context. Using this identity, we prove, in particular, convergence of moments for arbitrary rows of a random Young diagram under Plancherel measure.
1. Introduction
In [3], the authors considered the length $\ell_N(\pi)$ of the longest increasing subsequence of a random permutation $\pi \in S_N$, the symmetric group on $N$ numbers. They showed, in particular, that for $\tilde\ell_N(\pi) := (\ell_N(\pi) - 2\sqrt N)/N^{1/6}$,
$$\lim_{N\to\infty}\mathbb{P}(\tilde\ell_N \le x) = F^{(1)}(x), \tag{1.1}$$
where $F^{(1)}(x)$ is the Tracy–Widom distribution [36] for the largest eigenvalue of a random matrix from the Gaussian Unitary Ensemble (GUE). The authors also proved the convergence of moments,
$$\lim_{N\to\infty}\mathbb{E}\big(\tilde\ell_N^{\,m}\big) = \int_{-\infty}^{\infty}x^m\,dF^{(1)}(x), \qquad m = 1, 2, \ldots. \tag{1.2}$$
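For concreteness, $\ell_N(\pi)$ can be computed by patience sorting in $O(N\log N)$ time. The sketch below (function names are ours) also tabulates the distribution of $\ell_4$ over $S_4$. By Schensted's theorem, the number of permutations with $\ell_N = m$ equals $\sum_{\lambda\vdash N,\ \lambda_1 = m}d_\lambda^2$, where $d_\lambda$ counts standard Young tableaux of shape $\lambda$; for $N = 4$ this gives the counts $1, 13, 9, 1$ for $m = 1, 2, 3, 4$.

```python
from bisect import bisect_left
from collections import Counter
from itertools import permutations


def lis_length(seq):
    """Length of the longest strictly increasing subsequence (patience sorting).

    tails[i] is the smallest possible last element of an increasing
    subsequence of length i + 1 seen so far; the list stays sorted.
    """
    tails = []
    for x in seq:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)
        else:
            tails[i] = x
    return len(tails)


def lis_distribution(n):
    """Counter mapping m to #{pi in S_n : ell_n(pi) = m}, by brute force over S_n."""
    return Counter(lis_length(p) for p in permutations(range(1, n + 1)))


if __name__ == "__main__":
    print(lis_length([3, 1, 2, 5, 4]))  # 3
    print(sorted(lis_distribution(4).items()))
```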
The authors then reinterpreted (1.1), (1.2) in terms of Young diagrams $\lambda = (\lambda_1, \lambda_2, \ldots)$ via the Robinson–Schensted correspondence. Here $\lambda_j$ is the number of boxes in the $j$th row of $\lambda$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$. The set $\mathcal{Y}_N$ of Young diagrams of size $N$, $\sum_j\lambda_j = N$, is equipped with Plancherel measure,
$$\mathbb{P}^{\mathrm{Plan}}_N(\lambda) := \frac{d_\lambda^2}{N!}, \qquad \lambda \in \mathcal{Y}_N, \tag{1.3}$$
628
J. Baik, P. Deift, E. Rains
where $d_\lambda$ is the number of standard Young tableaux of shape $\lambda$. Set
$$\xi_j := \frac{\lambda_j - 2\sqrt N}{N^{1/6}}, \qquad j = 1, 2, \ldots. \tag{1.4}$$
Then (1.1), (1.2) imply that $\xi_1$ converges in distribution, together with all its moments, to $F^{(1)}$. This reinterpretation led the authors to conjecture that for all $k$, $\xi_1, \xi_2, \ldots, \xi_k$ converge to the joint distribution function $F(x_1, x_2, \ldots, x_k)$ for the first $k$ eigenvalues of a random GUE matrix. In [4], the authors verified the convergence of $\xi_2$ in distribution, together with its moments, to the Tracy–Widom distribution $F^{(2)}$ for the second largest eigenvalue of a random GUE matrix. The conjecture for $\xi_1, \xi_2, \ldots, \xi_k$ was then proved in three independent papers [29, 8, 24], all appearing within a few months in the spring of 1999. Let $y_j$ be the $j$th largest eigenvalue of a random $N\times N$ matrix from GUE with probability density
$$d\mathbb{P}^{\mathrm{GUE}}_N(y_1, \ldots, y_N) = \frac{1}{Z_N}\prod_{1\le i<j\le N}(y_i - y_j)^2\prod_{j=1}^{N}e^{-y_j^2}\,dy_1\cdots dy_N, \tag{1.5}$$
where $y_1 \ge \cdots \ge y_N$, and $Z_N$ is the normalization constant. At the “edge” of the spectrum, the following convergence in distribution is well known (see, e.g., [36, 24] Theorem 1.4): for any $k \in \mathbb{N}$, there is a distribution function $F(x_1, \ldots, x_k)$ on $x_1 \ge \cdots \ge x_k$ such that
$$\lim_{N\to\infty}\mathbb{P}^{\mathrm{GUE}}_N\Big((y_1 - \sqrt{2N})\sqrt2N^{1/6} \le x_1, \ldots, (y_k - \sqrt{2N})\sqrt2N^{1/6} \le x_k\Big) = F(x_1, \ldots, x_k). \tag{1.6}$$
In all three papers [29, 8, 24], the authors showed that for any $(x_1, \ldots, x_k) \in \mathbb{R}^k$,
$$\lim_{N\to\infty}\mathbb{P}^{\mathrm{Plan}}_N(\xi_1 \le x_1, \ldots, \xi_k \le x_k) = F(x_1, \ldots, x_k), \tag{1.7}$$
but the question of the convergence of moments was left open. Introduce the Poissonized Plancherel measure
$$\mathbb{P}^{\mathrm{Pois}}_t(\lambda) = \sum_{N=0}^{\infty}\frac{e^{-t^2}t^{2N}}{N!}\,\mathbb{P}^{\mathrm{Plan}}_N(\lambda), \qquad t > 0, \tag{1.8}$$
on all Young diagrams, which corresponds to choosing $N$ as a Poisson variable with parameter $t^2$. Here $\mathbb{P}^{\mathrm{Plan}}_N(\lambda) = 0$ if $\lambda$ is not a partition of $N$. Throughout the paper, we will work with $\mathbb{P}^{\mathrm{Pois}}_t(\lambda)$ rather than $\mathbb{P}^{\mathrm{Plan}}_N(\lambda)$ itself, because expectations with respect to $\mathbb{P}^{\mathrm{Pois}}_t(\lambda)$ lead to convenient determinantal formulae. Indeed, in [19], Gessel proved the formula
$$\mathbb{P}^{\mathrm{Pois}}_t(\lambda_1 \le n) = e^{-t^2}\det(T_n), \tag{1.9}$$
where $T_n$ is the $n\times n$ Toeplitz matrix with entries $(T_n)_{pq} = c_{p-q}$, $0 \le p, q < n$, and $c_k$ is the $k$th Fourier coefficient of $e^{t(z+z^{-1})}$, $c_k = \oint_{|z|=1}z^{-k}e^{t(z+z^{-1})}\frac{dz}{2\pi iz}$. This formula played a basic role in [3] in proving (1.1), (1.2). In [4], the authors introduced the integral operator $K_n$ with $\varphi(z) = e^{t(z-z^{-1})}$ (see (2.1) below) and proved the following formulae:
$$\mathbb{P}^{\mathrm{Pois}}_t(\lambda_1 \le n) = 2^{-n}\det(1 - K_n) \tag{1.10}$$
Fredholm Determinant Identity for Random Young Tableaux
629
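and
$$\mathbb{P}^{\mathrm{Pois}}_t(\lambda_2 \le n+1) = \mathbb{P}^{\mathrm{Pois}}_t(\lambda_1 \le n) + \frac{\partial}{\partial s}\Big|_{s=1}\Big[-(1+\sqrt s)^{-n}\det(1 - \sqrt s\,K_n)\Big]. \tag{1.11}$$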
These formulae played a basic role in [4] in proving the analogue of (1.1), (1.2) for $\lambda_2$. In [8] and [24], and also later, in greater generality, in [28] and [31], the authors obtained the following identity. Let $\Delta_k$ denote the (finite) set $\{n \in \{0, 1, \ldots\}^k : \sum_{j=1}^{r}n_j \le r - 1,\ r = 1, \ldots, k\}$. Then for $a_k \le \cdots \le a_1 \le a_0 = \infty$,
$$\mathbb{P}^{\mathrm{Pois}}_t(\lambda_1 - 1 \le a_1, \lambda_2 - 2 \le a_2, \ldots, \lambda_k - k \le a_k) = \sum_{n\in\Delta_k}\frac{1}{n_1!\cdots n_k!}\,\frac{\partial^{|n|}}{\partial s_1^{n_1}\cdots\partial s_k^{n_k}}\det\Big(1 + \sum_{l=1}^{k}s_l\chi_{(a_l,a_{l-1}]}S\Big)\Big|_{s_1=\cdots=s_k=-1}, \tag{1.12}$$
where the matrix elements $S(i, j)$ are given in (2.3) below with $\varphi(z) = e^{\sqrt\gamma(z-z^{-1})}$. As usual, $\chi_{(a,b]}$ denotes the characteristic function of the interval $(a, b]$, and so $\sum_{l=1}^{k}s_l\chi_{(a_l,a_{l-1}]}S$ denotes the operator on $\ell^2(\mathbb{Z})$ with kernel $s_lS(i, j)$ if $i \in (a_l, a_{l-1}]$, and zero otherwise. Setting $a_j = 2t + x_jt^{1/3}$, $x_1 \ge x_2 \ge \cdots \ge x_k$, letting $t \to \infty$, and de-Poissonizing as in [26], the authors in [8] and [24] obtain (1.7). In [8] and [24], however, the authors are not able to prove convergence of moments. The reason for this is that, while the classical steepest-descent method can be used to control $\det\big(1 + \sum_{l=1}^{k}s_l\chi_{(a_l,a_{l-1}]}S\big)$ for $a_j = 2t + x_jt^{1/3}$ as $t \to \infty$, uniformly for $x_1 \ge x_2 \ge \cdots \ge x_k \ge M$ for any fixed $M$, the method breaks down as the $x_j$'s tend to $-\infty$. On the other hand, the authors in [3, 4] are able to control the lower tails of the probability distributions, and hence prove the convergence of moments for $\lambda_1$ and $\lambda_2$, using the steepest-descent method for the Riemann–Hilbert problem (RHP) naturally associated with $T_n$ and $K_n$ above. The steepest-descent method for RHPs was introduced in [16], and extended to include fully non-linear oscillations in [15]. The asymptotic analysis in [3, 4] is closely related to the analysis in [13, 14]. The main motivation for this paper was to find a formula for the joint distribution of $\lambda_1, \ldots, \lambda_k$ which generalizes (1.11), and to which the above Riemann–Hilbert steepest-descent methods could be applied to obtain the lower tail estimates. Note that from (1.9), (1.10) and (1.12), we have three formulae for the distribution of $\lambda_1$,
$$\mathbb{P}^{\mathrm{Pois}}_t(\lambda_1 \le n) = e^{-t^2}\det(T_n) = 2^{-n}\det(1 - K_n) = \det(1 - \chi_{[n,\infty)}S), \tag{1.13}$$
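and from (1.11) and (1.12), two formulae for the distribution of $\lambda_2$,
$$\begin{aligned} \mathbb{P}^{\mathrm{Pois}}_t(\lambda_2 \le n+1) &= \mathbb{P}^{\mathrm{Pois}}_t(\lambda_1 \le n) + \frac{\partial}{\partial s}\Big|_{s=1}\Big[-(1+\sqrt s)^{-n}\det(1 - \sqrt s\,K_n)\Big] \\ &= \mathbb{P}^{\mathrm{Pois}}_t(\lambda_1 \le n) + \frac{\partial}{\partial s}\Big|_{s=-1}\det(1 + s\chi_{[n,\infty)}S). \end{aligned} \tag{1.14}$$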
To obtain the second formula, we use the fact that $\Delta_2 = \{(0, 0), (0, 1)\}$ and set $a_1 = \infty$, $a_2 = n - 1$ in (1.12). From (1.14), we might guess that
$$(1+\sqrt s)^{-n}\det(1 - \sqrt s\,K_n) = \det(1 - s\chi_{[n,\infty)}S). \tag{1.15}$$
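The index set $\Delta_k$ in (1.12) is easy to enumerate, since its defining inequalities force $n_r \le r - 1 \le k - 1$. The following sketch (a hypothetical helper, not taken from the paper) recovers $\Delta_2 = \{(0,0), (0,1)\}$; for $k = 1, 2, 3, 4$ it returns $1, 2, 5, 14$ elements.

```python
from itertools import product


def delta_set(k):
    """Enumerate Delta_k = {n in {0,1,...}^k : n_1 + ... + n_r <= r - 1 for r = 1..k}.

    Every coordinate satisfies n_r <= r - 1 <= k - 1, so range(k) suffices.
    """
    result = []
    for n in product(range(k), repeat=k):
        partial, ok = 0, True
        for r, n_r in enumerate(n, start=1):
            partial += n_r
            if partial > r - 1:
                ok = False
                break
        if ok:
            result.append(n)
    return result
```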
The content of Theorem 2.1 is that precisely this relation is true for a general class of functions $\varphi(z)$, provided $\varphi(z)$ has no winding. If the winding number of $\varphi$ is non-zero, the above relation must be modified slightly as in (2.7). The fact that $e^{-t^2}\det(T_n) = \det(1 - \chi_{[n,\infty)}S)$ for (essentially) the same general class of $\varphi$'s (with zero winding number) was first proved in [7], with an alternative proof given in [5]. The relation (1.15) for general $s$ was proved essentially simultaneously with the present paper by Rains in [31], for a subclass of functions $\varphi$ with zero winding, using algebraic methods (see Remark 2.9 in Sect. 2). A particularly simple proof of the relation $e^{-t^2}\det(T_n) = \det(1 - \chi_{[n,\infty)}S)$ can be found in the recent paper [9] of Böttcher (see also [10]). The paper [9] also extends Theorems 2.1 and 2.12 to the matrix case (see Remarks 2.3 and 2.13 below). In this paper, we will prove a general identity between determinants of operators of two types: the operators of the first type act on functions on the unit circle, and the operators of the second type act on functions on a subset of the integers. Specializations of this identity have, in particular, the following consequences:
(S1) A proof of the convergence of moments for $\xi_1, \ldots, \xi_k$ (see Theorem 3.1).
(S2) An interpretation of $F(x_1, \ldots, x_k)$ in (1.7) as a “multi-Painlevé” function (see Sect. 6). As we will see, the behavior of multi-Painlevé functions has similarities to the interactions of solitons in the classical theory of the Korteweg–de Vries equation.
(S3) The analogue of Theorem 3.1 for signed permutations and so-called colored permutations (see Sect. 7).
(S4) New formulae for random word problems, certain 2-dimensional growth models, and also the so-called “digital boiling” model (see Sect. 7).
The new identity is given in Theorem 2.1 in two closely related forms (2.7), (2.8). In (S1)–(S4), we only use (2.7).
As we will see, some simple estimates together with a Riemann–Hilbert analysis of $\det(1 - \sqrt s\,K_n)$ are enough to control the lower tail of $\mathbb{P}^{\mathrm{Pois}}_t(\lambda)$. The relation (1.15) generalizes to the multi-interval case, as described in Theorem 2.12 in Sect. 2. In Sect. 2, we prove the main identity (2.7), (2.8) in the single-interval case, and also the identity (2.53) in the multi-interval case. In Sect. 3, we use (2.7) to prove the convergence of moments for random Young tableaux (Theorem 3.1). A stronger version of this result is given in (3.2). Section 4 contains certain tail estimates needed in Sect. 3. Various estimates needed in Sect. 4 for a ratio of determinants are derived in Sect. 5 using the steepest-descent method for RHPs. In Sect. 6, we introduce the notion of a multi-Painlevé solution, and in Sect. 7, we prove various formulae for colored permutations and also discuss certain random growth models from the perspective of Theorem 2.1.
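The first equality in (1.13), i.e. Gessel's formula (1.9), can be checked numerically. Since $d_\lambda = N!/H(\lambda)$, with $H(\lambda)$ the product of the hook lengths of $\lambda$, each diagram has Poissonized weight $\mathbb{P}^{\mathrm{Pois}}_t(\lambda) = e^{-t^2}t^{2|\lambda|}/H(\lambda)^2$, while $c_k = I_k(2t)$ (modified Bessel function), because $e^{t(z+z^{-1})} = \sum_k I_k(2t)z^k$. The sketch truncates the sum over $N$; the helper names and truncation levels are our own choices.

```python
import math


def partitions_bounded(total, max_part):
    """Yield partitions of `total` (non-increasing tuples) with all parts <= max_part."""
    if total == 0:
        yield ()
        return
    for first in range(min(total, max_part), 0, -1):
        for rest in partitions_bounded(total - first, first):
            yield (first,) + rest


def hook_product(lam):
    """Product of hook lengths H(lambda), so that d_lambda = |lambda|! / H(lambda)."""
    if not lam:
        return 1
    conj = [sum(1 for row in lam if row > j) for j in range(lam[0])]
    prod = 1
    for i, row in enumerate(lam):
        for j in range(row):
            prod *= (row - j) + (conj[j] - i) - 1  # arm + leg + 1
    return prod


def plancherel_side(n, t, n_max=40):
    """P_t^Pois(lambda_1 <= n), summing e^{-t^2} t^{2|lambda|} / H(lambda)^2 over |lambda| <= n_max."""
    s = sum(t ** (2 * N) / hook_product(lam) ** 2
            for N in range(n_max + 1) for lam in partitions_bounded(N, n))
    return math.exp(-t * t) * s


def bessel_i(k, x, terms=60):
    """Modified Bessel function I_k(x) for integer k >= 0, by its power series."""
    return sum((x / 2) ** (2 * m + k) / (math.factorial(m) * math.factorial(m + k))
               for m in range(terms))


def det(a):
    """Determinant by Gaussian elimination with partial pivoting (det of [] is 1)."""
    a = [row[:] for row in a]
    n, result = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return result


def toeplitz_side(n, t):
    """e^{-t^2} det(T_n) with (T_n)_{pq} = c_{p-q} = I_{|p-q|}(2t)."""
    T = [[bessel_i(abs(p - q), 2 * t) for q in range(n)] for p in range(n)]
    return math.exp(-t * t) * det(T)
```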
2. Fredholm Determinant Identity
Let $\varphi(z)$ be a continuous, complex-valued, non-zero function on the unit circle $\Sigma = \{z \in \mathbb{C} : |z| = 1\}$. Define $K_n$ to be the integral operator acting on $L^2(\Sigma, dw)$ with kernel
$$K_n(z, w) = \frac{1 - z^n\varphi(z)w^{-n}\varphi(w)^{-1}}{2\pi i(z - w)}, \qquad (K_nf)(z) = \int_{|w|=1}K_n(z, w)f(w)\,dw. \tag{2.1}$$
For a function $f$ on $\Sigma$, its Fourier coefficients are denoted by $f_j$, so that
$$f(z) = \sum_{j\in\mathbb{Z}}f_jz^j. \tag{2.2}$$
Let $S$ be the matrix with entries
$$S(i, j) = \sum_{k\ge1}(\varphi^{-1})_{i+k}\,\varphi_{-j-k}, \qquad i, j \in \mathbb{Z}, \tag{2.3}$$
and let $R$ be the matrix with entries
$$R(i, j) = \sum_{k\le0}(\varphi^{-1})_{i+k}\,\varphi_{-j-k}, \qquad i, j \in \mathbb{Z}. \tag{2.4}$$
Let $S_n$ denote the operator $\chi_{[n,\infty)}S$ acting on $\ell^2(\{n, n+1, \ldots\})$,
$$(S_nf)(i) = \sum_{j\ge n}S_n(i, j)f(j), \qquad i \ge n, \tag{2.5}$$
and let $R_n$ denote the operator $\chi_{(-\infty,n-1]}R$ acting on $\ell^2(\{\ldots, n-2, n-1\})$,
$$(R_nf)(i) = \sum_{j\le n-1}R_n(i, j)f(j), \qquad i \le n - 1. \tag{2.6}$$
Theorem 2.1. Let $\varphi(z)$ be a non-zero function on the unit circle satisfying $\sum_{j\in\mathbb{Z}}|j\varphi_j| < \infty$, which has winding number equal to $\#(\varphi)$. For $s \in \mathbb{C}$ and $n \in \mathbb{Z}$, $K_n$, $S_n$ and $R_n$ are trace class on $L^2(\Sigma, dw)$, $\ell^2(\{n, n+1, \ldots\})$ and $\ell^2(\{\ldots, n-1\})$ respectively, and we have
$$\det(1 - sK_n) = (1+s)^{n+\#(\varphi)}\det(1 - s^2S_n), \qquad s \ne -1, \tag{2.7}$$
$$\det(1 - sK_n) = (1-s)^{-n-\#(\varphi)}\det(1 - s^2R_n), \qquad s \ne 1. \tag{2.8}$$
Remark 2.2. Standard Banach algebra estimates show that if the winding number of $\varphi$ is zero and $\sum|j\varphi_j| < \infty$, then $\|\log\varphi\|_\infty + \big(\sum|j|\,|(\log\varphi)_j|^2\big)^{1/2} < \infty$. This is enough to prove that the first and the third terms in (1.13) are equal for all such $\varphi$'s (see [5]). In particular, by (1.13), (2.7) is true for all $\varphi$ without winding and satisfying $\sum|j\varphi_j| < \infty$, when $s = 1$.
Remark 2.3. As noted by Böttcher [9], Theorem 2.1 remains true in the case where $\varphi(z)$ is an invertible $N\times N$ matrix, provided the exponent $n + \#(\varphi)$ is replaced in (2.7), (2.8) by $Nn + \#(\det\varphi)$. The proof in the scalar case extends to $N\times N$ matrices, and we give no further details; the proof in [9] is different and uses Wiener–Hopf factorization directly.
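The case $s = 1$ of (2.7) discussed in Remark 2.2 can be checked numerically for $\varphi(z) = e^{t(z-z^{-1})}$, which has zero winding. Its Fourier coefficients are $\varphi_k = J_k(2t)$ and $(\varphi^{-1})_k = J_k(-2t)$ (Bessel functions of the first kind), so (2.3) is, up to the signs $(-1)^{i+j}$, the discrete Bessel kernel $\sum_{k\ge1}J_{i+k}(2t)J_{j+k}(2t)$. The sketch truncates $\chi_{[n,\infty)}S$ to a finite block and compares $\det(1 - \chi_{[n,\infty)}S)$ with $e^{-t^2}\det(T_n)$ from (1.13); truncation sizes and helper names are our own assumptions.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def bessel_j(k, x):
    """Bessel J_k(x) for any integer k: power series for k >= 0, J_{-k} = (-1)^k J_k."""
    if k < 0:
        return (-1) ** k * bessel_j(-k, x)
    return sum((-1) ** m * (x / 2) ** (2 * m + k)
               / (math.factorial(m) * math.factorial(m + k)) for m in range(60))


def bessel_i(k, x):
    """Modified Bessel I_k(x) for integer k >= 0, by its power series."""
    return sum((x / 2) ** (2 * m + k) / (math.factorial(m) * math.factorial(m + k))
               for m in range(60))


def det(a):
    """Determinant by Gaussian elimination with partial pivoting (det of [] is 1)."""
    a = [row[:] for row in a]
    n, result = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return result


def s_entry(i, j, t, kmax=40):
    """S(i,j) = sum_{k>=1} (phi^{-1})_{i+k} phi_{-j-k} for phi(z) = exp(t(z - 1/z))."""
    return sum(bessel_j(i + k, -2 * t) * bessel_j(-j - k, 2 * t) for k in range(1, kmax + 1))


def fredholm_side(n, t, size=25):
    """det(1 - chi_[n,inf) S), truncated to the indices n, ..., n + size - 1."""
    idx = range(n, n + size)
    return det([[(1.0 if i == j else 0.0) - s_entry(i, j, t) for j in idx] for i in idx])


def toeplitz_side(n, t):
    """e^{-t^2} det(T_n) with (T_n)_{pq} = I_{|p-q|}(2t), as in (1.9)."""
    T = [[bessel_i(abs(p - q), 2 * t) for q in range(n)] for p in range(n)]
    return math.exp(-t * t) * det(T)
```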
For the proof of Theorem 2.1, we use the following basic properties of the determinant (see, e.g., [33]). If $A$ is a trace class operator on a Hilbert space $H$, $\|A\|_1 = \operatorname{tr}(A^*A)^{1/2}$ denotes the trace norm.
Lemma 2.4. (i) If $A_n$ is a trace class operator for each $n$ and $A_n \to A$ in trace norm, then $A$ is a trace class operator and $\det(1 + A_n) \to \det(1 + A)$ as $n \to \infty$. (ii) If $A$ is a trace class operator, and $B_n$ and $C_n$ are bounded operators such that $B_n^*$ and $C_n$ converge strongly to $B^*$ and $C$ respectively, then $\det(1 + C_nAB_n) \to \det(1 + CAB)$ as $n \to \infty$. (iii) If $AB$ and $BA$ are trace class operators, then $\det(1 + AB) = \det(1 + BA)$. (iv) Suppose $C$ acts on $\ell^2(\mathbb{Z})$ and has matrix elements $(c_{ij})_{i,j\in\mathbb{Z}}$. If $\sum_{i,j\in\mathbb{Z}}|c_{ij}| < \infty$, then $C$ is trace class and $\|C\|_1 \le \sum_{i,j\in\mathbb{Z}}|c_{ij}|$.
Proof of Theorem 2.1. Define the projection operators on the circle
$$(P_nf)(z) = \sum_{j\ge n}f_jz^j, \qquad n \in \mathbb{Z}, \tag{2.9}$$
and
$$(Q_nf)(z) = \sum_{0\le j<n}f_jz^j \quad (n > 0), \qquad (Q_nf)(z) = -\sum_{n\le j<0}f_jz^j \quad (n < 0), \tag{2.10}$$
with $(Q_0f)(z) = 0$. Thus, in particular, we have $P_n = P_0 - Q_n$. Let $M_g$ denote the multiplication operator
$$(M_gf)(z) = g(z)f(z). \tag{2.11}$$
Direct calculation shows that
$$\begin{aligned} K_n &= -P_0 + M_\varphi P_nM_{\varphi^{-1}} = (1 - P_0) - M_\varphi(1 - P_n)M_{\varphi^{-1}}, \\ S_n &= P_nM_{\varphi^{-1}}(1 - P_0)M_\varphi P_n, \\ R_n &= (1 - P_n)M_{\varphi^{-1}}P_0M_\varphi(1 - P_n). \end{aligned} \tag{2.12}$$
First, we show that $K_n$, $S_n$ and $R_n$ are trace class. Indeed, $K_n = -Q_n - HM_{\varphi^{-1}}$, where $H = [P_n, M_\varphi]$. $H$ acts on the basis $\{z^l\}_{l\in\mathbb{Z}}$ for $L^2(\Sigma, dw)$ as follows: $Hz^k = \sum_lH_{lk}z^l$. We find
$$H_{lk} = \varphi_{l-k}(\chi_{l\ge n} - \chi_{k\ge n}), \qquad l, k \in \mathbb{Z}, \tag{2.13}$$
where $\chi_{\cdot\ge n}$ denotes the characteristic function of the set $\{k \ge n\}$. But $\sum_{l,k}|H_{lk}| \le \sum_j|j\varphi_j| < \infty$, and hence by Lemma 2.4, we have the trace norm estimate
$$\|K_n\|_1 \le |n| + \sum_j|j\varphi_j|\,\|\varphi^{-1}\|_{L^\infty}. \tag{2.14}$$
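The estimate $\sum_{l,k}|H_{lk}| \le \sum_j|j\varphi_j|$ behind (2.14) is in fact an equality: for each $j > 0$ there are exactly $j$ pairs $(l, k)$ with $l - k = j$ and $k < n \le l$, and symmetrically for $j < 0$. A short Python check for a finitely supported $\varphi$ whose coefficients are chosen arbitrarily for illustration:

```python
def commutator_abs_sum(phi, n):
    """Sum of |H_lk| with H_lk = phi_{l-k} (chi_{l>=n} - chi_{k>=n}), as in (2.13).

    `phi` maps the Fourier index j to phi_j (finite support); only l, k
    within max|j| of n can contribute, so a finite window suffices.
    """
    reach = max(abs(j) for j in phi)
    window = range(n - reach, n + reach + 1)
    total = 0.0
    for l in window:
        for k in window:
            total += abs(phi.get(l - k, 0.0) * ((l >= n) - (k >= n)))
    return total


def weighted_l1(phi):
    """sum_j |j| |phi_j|."""
    return sum(abs(j) * abs(c) for j, c in phi.items())


# Hypothetical example coefficients (complex values are allowed).
example_phi = {-2: 0.3, -1: -0.5, 0: 2.0, 1: 0.25, 3: -0.1j}
```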
Now write $S_n = AB$, where $A : \ell^2(\{1, 2, \ldots\}) \to \ell^2(\{n, n+1, \ldots\})$ and $B : \ell^2(\{n, n+1, \ldots\}) \to \ell^2(\{1, 2, \ldots\})$ with matrix elements
$$A_{ik} = (\varphi^{-1})_{i+k}, \quad i \ge n,\ k \ge 1; \qquad B_{kj} = \varphi_{-k-j}, \quad k \ge 1,\ j \ge n. \tag{2.15}$$
Write
$$A = \chi^+_n\,\Phi^{-1}\,\chi^-_{-1}\,\mathcal{R}, \tag{2.16}$$
where $(\mathcal{R}f)_j = f_{-j}$, $\Phi^{-1}$ denotes convolution on $\ell^2(\mathbb{Z})$ by $\{(\varphi^{-1})_j\}$,
$$(\Phi^{-1}h)_j = \sum_{l\in\mathbb{Z}}(\varphi^{-1})_{j-l}h_l, \tag{2.17}$$
and $\chi^+_n$, $\chi^-_{-1}$ are the projections onto $\{k \ge n\}$ and $\{k \le -1\}$ respectively. From (2.16), it is clear that $A$ is bounded from $\ell^2(\{1, 2, \ldots\})$ to $\ell^2(\{n, n+1, \ldots\})$ with norm estimate
$$\|A\| \le \|\varphi^{-1}\|_{L^\infty}. \tag{2.18}$$
On the other hand, a calculation similar to (2.14) shows that $B$ is trace class from $\ell^2(\{n, n+1, \ldots\})$ to $\ell^2(\{1, 2, \ldots\})$ and
$$\|B\|_1 \le \sum_{l\ge n+1}|l\varphi_{-l}| \le \sum_l|l\varphi_l|, \tag{2.19}$$
which implies
$$\|S_n\|_1 \le \sum_{l\ge n+1}|(l + |n|)\varphi_{-l}|\,\|\varphi^{-1}\|_{L^\infty}. \tag{2.20}$$
Similarly, we have
$$\|R_n\|_1 \le \sum_{l\le n}|(l + |n|)\varphi_{-l}|\,\|\varphi^{-1}\|_{L^\infty}. \tag{2.21}$$
Thus $K_n$, $S_n$ and $R_n$ are trace class. Moreover, if we set $\varphi_J := \sum_{|j|\le J}\varphi_jz^j$, $J \ge 0$, then from (the proofs of) (2.14), (2.20) and (2.21), it is clear that as $J \to \infty$, $K_n(\varphi_J) \to K_n(\varphi)$, $S_n(\varphi_J) \to S_n(\varphi)$ and $R_n(\varphi_J) \to R_n(\varphi)$ in trace norm, and hence the Fredholm determinants converge to the corresponding determinants. Also, for $J$ sufficiently large, the winding number of $\varphi_J$ is the same as the winding number of $\varphi$, and so we see that to prove (2.7), it is enough to consider $\varphi$'s which are non-zero and analytic in a neighborhood of $\Sigma$. Henceforth we will assume that $\varphi$ is analytic: this analyticity assumption is not necessary and is used only to give a particularly simple proof of Lemma 2.5 below. Below, we only present the proof of (2.7); the proof of (2.8) is similar.
Formally, we proceed as follows. Suppose $P_n$ is finite rank, so that $P_0 = Q_n + P_n$ is also finite rank. We have
$$\det(1 + sP_0 - sM_\varphi P_nM_{\varphi^{-1}}) = \det(1 + sP_0)\det\Big(1 - \frac{s}{1 + sP_0}M_\varphi P_nM_{\varphi^{-1}}\Big). \tag{2.22}$$
Using $P_0 = Q_n + P_n$ and $\frac{1}{1+sP_0} = 1 - \frac{s}{1+s}P_0$, the right-hand side reduces to
$$\begin{cases} \det(1 + sQ_n)\det(1 + sP_n)\det\Big(1 - s\big(1 - \frac{s}{1+s}P_0\big)M_\varphi P_nM_{\varphi^{-1}}\Big), & n \ge 0, \\ \det(1 - sQ_n)^{-1}\det(1 + sP_n)\det\Big(1 - s\big(1 - \frac{s}{1+s}P_0\big)M_\varphi P_nM_{\varphi^{-1}}\Big), & n < 0. \end{cases} \tag{2.23}$$
The first term in both cases is equal to $(1+s)^n$. Using Lemma 2.4 and $P_n = P_n^2$ for the last determinant, (2.23) becomes
$$\begin{aligned} &(1+s)^n\det(1 + sP_n)\det\Big(1 - sP_nM_{\varphi^{-1}}\Big(1 - \frac{s}{1+s}P_0\Big)M_\varphi P_n\Big) \\ &\quad = (1+s)^n\det\Big((1 + sP_n) - s(1+s)P_nM_{\varphi^{-1}}\Big(1 - \frac{s}{1+s}P_0\Big)M_\varphi P_n\Big) \\ &\quad = (1+s)^n\det(1 - s^2S_n), \end{aligned} \tag{2.24}$$
which is the desired result, up to the winding number $\#(\varphi)$. For the case at hand, however, $P_0$ is not a trace class operator and the above “proof” breaks down. We circumvent the difficulty by approximating the operator $K_n$ by finite rank operators, and the missing factor involving $\#(\varphi)$ will appear along the way. Let $T_N$ be the projection
$$(T_Nf)(z) = \sum_{|j|\le N}f_jz^j, \qquad N \ge 1. \tag{2.25}$$
Note that $T_N$ is a trace class operator since it has finite rank. Clearly $T_N, T_N^* \to 1$ strongly, and hence by Lemma 2.4,
$$\det(1 - sK_n) = \lim_{N\to\infty}\det\big(1 + s(P_0 - M_\varphi P_nM_{\varphi^{-1}})T_N\big). \tag{2.26}$$
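When $P_0$ has finite rank, every step of the formal computation (2.22)–(2.24) is legitimate, and nothing in it uses that $M_\varphi$ is a multiplication operator. The sketch below tests the resulting finite-dimensional identity $\det(1 + sP_0 - sMP_nM^{-1}) = (1+s)^{\operatorname{rank}P_0 - \operatorname{rank}P_n}\det\big(1 - s^2P_nM^{-1}(1 - P_0)MP_n\big)$ for a random, diagonally dominant (hence invertible) matrix $M$ and coordinate projections $P_n \le P_0$. This toy setup is our own; no winding-number correction can appear in it, which is exactly what the truncation argument above has to account for.

```python
import random


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def projection(n, start):
    """Coordinate projection onto indices >= start (a toy stand-in for P_start)."""
    return [[1.0 if i == j and i >= start else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b, sb=1.0):
    """Return a + sb * b."""
    return [[x + sb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(a)
    aug = [row[:] + identity(n)[i] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, result = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return result


def both_sides(dim=7, start0=2, startn=5, s=0.37, seed=1):
    """Left- and right-hand sides of the finite-rank analogue of (2.22)-(2.24)."""
    rng = random.Random(seed)
    M = [[(1.0 if i == j else 0.0) + 0.1 * rng.uniform(-1, 1) for j in range(dim)]
         for i in range(dim)]
    Minv = inverse(M)
    P0, Pn, I = projection(dim, start0), projection(dim, startn), identity(dim)
    lhs = det(add(add(I, P0, s), matmul(matmul(M, Pn), Minv), -s))
    S = matmul(matmul(matmul(Pn, Minv), add(I, P0, -1.0)), matmul(M, Pn))
    # rank P0 - rank Pn = (dim - start0) - (dim - startn) = startn - start0
    rhs = (1 + s) ** (startn - start0) * det(add(I, S, -s * s))
    return lhs, rhs
```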
Now since $P_kT_N$ is trace class, proceeding as above in (2.22)–(2.24), we have for $N \ge |n|$,
$$\det\big(1 + s(P_0 - M_\varphi P_nM_{\varphi^{-1}})T_N\big) = (1+s)^n\det\Big((1 + sP_nT_N)\Big(1 - sP_nM_{\varphi^{-1}}T_N\Big(1 - \frac{s}{1+s}P_0T_N\Big)M_\varphi P_n\Big)\Big). \tag{2.27}$$
Thus we have
$$\det(1 - sK_n) = (1+s)^n\lim_{N\to\infty}\det(1 + X_N + Y_N), \tag{2.28}$$
where for $N \ge n$,
$$X_N = s\,\frac{1 + sT_N}{1+s}P_n\big(T_N - M_{\varphi^{-1}}T_NM_\varphi\big)P_n, \tag{2.29}$$
$$Y_N = -s^2\,\frac{1 + sT_N}{1+s}P_nM_{\varphi^{-1}}T_N(1 - P_0)M_\varphi P_n. \tag{2.30}$$
We observe that
(i) $X_N$ and $Y_N$ are trace class.
(ii) $X_N^* \to 0$ strongly as $N \to \infty$.
(iii) $Y_N \to -s^2S_n$ in trace norm as $N \to \infty$.
(iv) $(1 + X_N)^{-1}$ and $(1 + Y_N)^{-1}$ are uniformly bounded in operator norm as $N \to \infty$ when $s$ is small enough.
The third property follows from Lemma 2.4, as $(1 - P_0)M_\varphi P_n$ is trace class and $T_N \to 1$ strongly. For the moment, we assume that $s$ is small so that (iv) is satisfied. Now we rewrite the right-hand side of (2.28) as
$$\det(1 + X_N + Y_N) = \det(1 + Y_N)\det(1 + X_N)\det\Big(1 - \frac{1}{1 + X_N}\frac{1}{1 + Y_N}Y_NX_N\Big). \tag{2.31}$$
From the properties (i), (ii), (iv) above, we have $\frac{1}{1+X_N}\frac{1}{1+Y_N}Y_NX_N \to 0$ in trace norm. Using property (iii), we now have
$$\det(1 - sK_n) = (1+s)^n\det(1 - s^2S_n)\lim_{N\to\infty}\det(1 + X_N). \tag{2.32}$$
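Rewrite $X_N$ as
$$\begin{aligned} X_N &= s\,\frac{1 + sT_N}{1+s}P_n\big(M_{\varphi^{-1}}(1 - T_N)M_\varphi - (1 - T_N)\big)P_n \\ &= s\,\frac{1 + sT_N}{1+s}P_n\big(M_{\varphi^{-1}}P_{N+1}M_\varphi - P_{N+1}\big)P_n + s\,\frac{1 + sT_N}{1+s}P_n\big(M_{\varphi^{-1}}(1 - P_{-N})M_\varphi - (1 - P_{-N})\big)P_n \\ &=: Z_N + W_N. \end{aligned} \tag{2.33}$$
Then, as $N \ge |n|$,
$$W_N = s\,\frac{1 + sT_N}{1+s}P_nM_{\varphi^{-1}}(1 - P_{-N})M_\varphi P_n = s\,\frac{1 + sT_N}{1+s}P_nM_{\varphi^{-1}}(1 - P_{-N})(1 - P_0)M_\varphi P_n, \tag{2.34}$$
and hence $W_N \to 0$ in trace norm, as $1 - P_{-N} \to 0$ strongly and $(1 - P_0)M_\varphi P_n$ is trace class. Also $Z_N \to 0$ strongly, and $(1 + Z_N)^{-1}$, $(1 + W_N)^{-1}$ are uniformly bounded as $N \to \infty$ for $s$ small enough. Thus, by arguments similar to those leading to (2.32), we have
$$\lim_{N\to\infty}\det(1 + X_N) = \lim_{N\to\infty}\det(1 + Z_N). \tag{2.35}$$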
Now by (2.12) and (2.1), we note that
$$M_{\varphi^{-1}}P_{N+1}M_\varphi - P_{N+1} = \tilde K_{N+1} + Q_{N+1} = A_{N+1}\tilde K_0A_{N+1}^{-1}, \tag{2.36}$$
where $\tilde K_{N+1}$ is $K_{N+1}$ with $\varphi$ replaced by $\varphi^{-1}$, and $A_{N+1}$ is the operator of multiplication by $z^{N+1}$. Thus we have
$$\det(1 + Z_N) = \det\Big(1 + s\,\frac{1 + sT_N}{1+s}P_nA_{N+1}\tilde K_0A_{N+1}^{-1}P_n\Big) = \det\Big(1 + s\tilde K_0A_{N+1}^{-1}P_n\frac{1 + sT_N}{1+s}P_nA_{N+1}\Big), \tag{2.37}$$
by Lemma 2.4 (iii). Since $P_kA_{N+1} = A_{N+1}P_{k-N-1}$ and $T_N = P_{-N} - P_{N+1}$, we have
$$A_{N+1}^{-1}P_n\frac{1 + sT_N}{1+s}P_nA_{N+1} = \frac{1}{1+s}\big((1+s)P_{n-N-1} - sP_0\big) \to \frac{1 + s - sP_0}{1+s} =: D(s) \tag{2.38}$$
strongly. Since $\tilde K_0$ is trace class, we obtain
$$\lim_{N\to\infty}\det(1 + Z_N) = \det(1 + s\tilde K_0D(s)). \tag{2.39}$$
Therefore, from (2.32) and (2.35),
$$\det(1 - sK_n) = (1+s)^n\det(1 - s^2S_n)\det(1 + s\tilde K_0D(s)). \tag{2.40}$$
Since $\det(1 + s\tilde K_0D(s))$ does not depend on $n$, we obtain the value of this determinant by letting $n \to \infty$ on both sides of (2.40). But by Lemma 2.5 below, for small $s$, $(1+s)^{-n}\det(1 - sK_n) \to (1+s)^{\#(\varphi)}$ as $n \to \infty$. On the other hand, from (2.20), $\det(1 - s^2S_n)$ converges to 1. Therefore $\det(1 + s\tilde K_0D(s)) = (1+s)^{\#(\varphi)}$, and we obtain, for small $s$, $\det(1 - sK_n) = (1+s)^{n+\#(\varphi)}\det(1 - s^2S_n)$, as desired. The result for all $s$ now follows by analytic continuation.
Observe from (2.1), (2.12) that
$$\|K_n\| \le 1 + \|\varphi\|_\infty\|\varphi^{-1}\|_\infty =: s_0. \tag{2.41}$$
(A different estimate (see Appendix of [4]) shows that $\|K_n\| \le \max(\|\varphi\|_\infty, \|\varphi^{-1}\|_\infty)$.) Then we have the following result.
Lemma 2.5. For a function $\varphi$ which is analytic and non-zero in a neighborhood of the unit circle $\{|z| = 1\}$ in the complex plane, and has winding number equal to $\#(\varphi)$, we have for $|s| < s_0^{-1}$,
$$\lim_{n\to\infty}(1+s)^{-n}\det(1 - sK_n) = (1+s)^{\#(\varphi)}, \tag{2.42}$$
$$\lim_{n\to-\infty}(1-s)^{n}\det(1 - sK_n) = (1-s)^{-\#(\varphi)}, \tag{2.43}$$
for some $s_0 > 0$.
Proof. In Lemma 5 of [4], we obtained the result (2.42) for $\varphi(z) = e^{\sqrt\lambda(z-z^{-1})}$ and $s = \sqrt t$, $0 \le t \le 1$. For general analytic $\varphi$ and small $s$, the proof remains the same until Eq. (50). The second component in the asymptotics of $F$ is now $-\frac{\varphi}{1+s}$, and hence (51) is changed to
$$\log\det(1 - sK_n) = -\int_0^s\frac{ds'}{2\pi i(1+s')}\oint_{|z|=1}\Big(-nz^{-1} - \frac{\varphi'(z)}{\varphi(z)}\Big)dz + O(e^{-cn}) = (n + \#(\varphi))\log(1+s) + O(e^{-cn}). \tag{2.44}$$
The calculation for (2.43) is similar and we skip the details.
Remark 2.6. Lemma 2.5 does not require ϕ to be analytic. However, in this case, the proof is particularly simple, and can be quoted directly from [4] as above.
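The identity $\det(1 + s\tilde K_0D(s)) = (1+s)^{\#(\varphi)}$ obtained in the proof of Theorem 2.1 (see also Remark 2.7 below) can be probed numerically for $\varphi(z) = z^k(1 + az)(1 + bz^{-1})$ with $|a|, |b| < 1$, whose winding number is $k$. In the Fourier basis $(M_g)_{ij} = g_{i-j}$, $\tilde K_0 = -P_0 + M_{\varphi^{-1}}P_0M_\varphi$, and $D(s)$ is diagonal with entries $1$ for $j < 0$ and $1/(1+s)$ for $j \ge 0$; the coefficients of $\varphi^{-1}$ come from geometric series. The sketch truncates to indices $|i|, |j| \le L$, which is our own assumption (here only finitely many columns of $\tilde K_0$ are non-zero, so the truncation is harmless).

```python
def phi_coeffs(a, b, k):
    """Fourier coefficients of z^k (1 + a z)(1 + b/z) = z^k (b/z + (1 + ab) + a z)."""
    return {k - 1: b, k: 1 + a * b, k + 1: a}


def phi_inv_coeff(p, a, b, k):
    """(phi^{-1})_p: 1/((1+az)(1+b/z)) has coefficient (-a)^q/(1-ab) for q >= 0
    and (-b)^(-q)/(1-ab) for q < 0; the factor z^{-k} shifts q = p + k."""
    q = p + k
    return ((-a) ** q if q >= 0 else (-b) ** (-q)) / (1 - a * b)


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, result = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= f * a[c][cc]
    return result


def remark_2_7_det(a, b, k, s, L=25):
    """det(1 + s K~_0 D(s)) on the index window [-L, L]."""
    phi = phi_coeffs(a, b, k)
    idx = range(-L, L + 1)
    mat = []
    for i in idx:
        row = []
        for j in idx:
            # (M_{phi^{-1}} P_0 M_phi)_{ij} = sum_{m >= 0} (phi^{-1})_{i-m} phi_{m-j}, m = j + d
            val = sum(phi_inv_coeff(i - (j + d), a, b, k) * c
                      for d, c in phi.items() if j + d >= 0)
            if i == j and i >= 0:
                val -= 1.0  # the -P_0 term of K~_0
            d_j = 1.0 if j < 0 else 1.0 / (1 + s)  # D(s) is diagonal
            row.append((1.0 if i == j else 0.0) + s * val * d_j)
        mat.append(row)
    return det(mat)
```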
Remark 2.7. The fact that $\det(1 + s\tilde K_0D(s)) = (1+s)^{\#(\varphi)}$ is rather remarkable. It is an instructive exercise to check this identity directly when $\varphi$ is simple, say $\varphi = z^k$ or $\varphi = (1 + az)(1 + bz^{-1})$, $|a|, |b| < 1$.
Remark 2.8. By (2.7), we see that if $n + \#(\varphi) > 0$, $\det(1 - sK_n)$ has a root at $s = -1$ of order at least $n + \#(\varphi)$. In particular, $K_n$ has eigenvalue $-1$. Moreover, if $K_n$ is self-adjoint (which is true by (2.12) whenever $|\varphi| = 1$, e.g., $\varphi = e^{\sqrt\lambda(z-z^{-1})}$ as in [3, 4]), then $K_n$ has an eigenspace of dimension at least $n + \#(\varphi)$ corresponding to the eigenvalue $-1$. It is also clear from (2.7) that if $s \ne -1$ is a root of $\det(1 - sK_n)$, then so is $-s$. On the other hand, if $n + \#(\varphi) < 0$, then clearly $\det(1 - s^2S_n)$ has a root at $s = \pm1$, etc. In the self-adjoint case, when $|\varphi| = 1$, we see from (2.12) that $S_n$ is positive definite with norm $\le 1$. We will use this fact in Sect. 5.
Remark 2.9. Define the operator $A$ acting on $\ell^2(\mathbb{Z})$ by
$$A = M_{\varphi^{-1}}P_0M_\varphi. \tag{2.45}$$
Since $\det(1 - sK_n) = \det(1 - sM_{\varphi^{-1}}K_nM_\varphi)$, using (2.12), the above theorem can be rephrased as
$$\begin{aligned} \det\big(1 - s(P_n - A)\big) &= (1+s)^{n+\#(\varphi)}\det\big(1 - s^2P_n(1 - A)P_n\big) \\ &= (1-s)^{-n-\#(\varphi)}\det\big(1 - s^2(1 - P_n)A(1 - P_n)\big). \end{aligned} \tag{2.46}$$
These are the identities (8.55), (8.56) in [31] for a certain subclass of $\varphi$'s with zero winding, $\#(\varphi) = 0$. The following corollary will be used in the analysis of (S3) in Sect. 7 below.
Corollary 2.10. Let $\varphi(z)$ be as in Theorem 2.1. Define $K^{(m)}_n$, $S^{(m)}$ and $R^{(m)}$ to be the operators analogous to $K_n$, $S$ and $R$ with the matrix elements given by
$$K^{(m)}_n(z, w) = \frac{1 - z^n\varphi(z^m)w^{-n}\varphi(w^m)^{-1}}{2\pi i(z - w)}, \tag{2.47}$$
and
$$S^{(m)}(i, j) = \sum_{k\ge1}(\varphi^{-1})_{(i+k)/m}\,\varphi_{(-j-k)/m}, \tag{2.48}$$
$$R^{(m)}(i, j) = \sum_{k\le0}(\varphi^{-1})_{(i+k)/m}\,\varphi_{(-j-k)/m}, \tag{2.49}$$
where $\varphi_a = (\varphi^{-1})_a = 0$ if $a \notin \mathbb{Z}$. Set $S^{(m)}_n = \chi_{[n,\infty)}S^{(m)}$ and $R^{(m)}_n = \chi_{(-\infty,n-1]}R^{(m)}$. Then we have
$$\det(1 - sK^{(m)}_n) = (1+s)^{n+\#(\varphi)}\det(1 - s^2S^{(m)}_n), \qquad s \ne -1, \tag{2.50}$$
$$\det(1 - sK^{(m)}_n) = (1-s)^{-n-\#(\varphi)}\det(1 - s^2R^{(m)}_n), \qquad s \ne 1. \tag{2.51}$$
Remark 2.11. Observe that $S^{(m)}_n$ has the block structure
$$\begin{pmatrix} S^{(m)}_n(mi, mj) & \cdots & S^{(m)}_n(mi, mj+m-1) \\ \vdots & & \vdots \\ S^{(m)}_n(mi+m-1, mj) & \cdots & S^{(m)}_n(mi+m-1, mj+m-1) \end{pmatrix} = \begin{pmatrix} S_n(i, j) & 0 & \cdots & 0 \\ 0 & S_n(i, j) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & S_n(i, j) \end{pmatrix}. \tag{2.52}$$
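The block structure (2.52) reflects the fact that the sum in (2.48) only picks up indices with $i + k \equiv 0$ and $j + k \equiv 0 \pmod m$, which forces $i \equiv j \pmod m$; writing $i = mi' + r$, $j = mj' + r$ and $k = mk' - r$ with $0 \le r < m$, the condition $k \ge 1$ becomes $k' \ge 1$, so $S^{(m)}(mi'+r, mj'+r) = S(i', j')$. The Python check below uses randomly chosen, finitely supported sequences standing in for $(\varphi^{-1})_j$ and $\varphi_j$; since the identity is purely combinatorial, they need not be inverse to each other (an assumption of this sketch only).

```python
import random


def s_entry(a, b, i, j, kmax):
    """S(i,j) = sum_{k=1}^{kmax} a_{i+k} b_{-j-k}, cf. (2.3) with a = phi^{-1}, b = phi."""
    return sum(a.get(i + k, 0.0) * b.get(-j - k, 0.0) for k in range(1, kmax + 1))


def s_entry_m(a, b, i, j, m, kmax):
    """S^(m)(i,j) from (2.48); coefficients with non-integer index are zero."""
    total = 0.0
    for k in range(1, m * (kmax + 1) + 1):
        p, q = i + k, -j - k
        if p % m == 0 and q % m == 0:
            total += a.get(p // m, 0.0) * b.get(q // m, 0.0)
    return total


def random_coeffs(rng, radius):
    """Random real coefficients on the indices -radius..radius."""
    return {j: rng.uniform(-1, 1) for j in range(-radius, radius + 1)}


rng = random.Random(7)
a_seq, b_seq = random_coeffs(rng, 4), random_coeffs(rng, 4)
```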
For the multi-interval case, we can generalize the argument in Theorem 2.1 to obtain the following result.
Theorem 2.12. Let $0 = n_0 \le n_1 \le n_2 \le \cdots \le n_k \le n_{k+1} = \infty$ be integers, and let $s_1, \ldots, s_k$ be complex numbers satisfying $s_k \ne -1$ and $s_k - s_j \ne -1$. Also set $s_0 = 0$. We have
$$(1+s_k)^{-\#(\varphi)}\Big[\prod_{j=0}^{k-1}(1 + s_k - s_j)^{n_{j+1}-n_j}\Big]^{-1}\det\Big(1 - \sum_{j=1}^{k}(s_j - s_{j-1})K_{n_j}\Big) = \det\Big(1 - \sum_{j=1}^{k}\frac{s_ks_j}{1 + s_k - s_j}\chi_{[n_j,n_{j+1})}S\Big), \tag{2.53}$$
where $\#(\varphi)$ is again the winding number of $\varphi$.
Remark 2.13. As noted by Böttcher ([9]; cf. Remark 2.3 above), the formula (2.53) remains true in the $N\times N$ matrix case, provided we replace $n_{j+1} - n_j$ by $N(n_{j+1} - n_j)$, $0 \le j \le k - 1$, and $\#(\varphi)$ by $\#(\det\varphi)$. Again the proof in the scalar case extends to the matrix case, and we provide no details.
Proof. The formal procedure (without considering the winding number) is as follows. For $j = 0, \ldots, k-1$, let $R_j$ be the projection operator onto $\{n_j, \ldots, n_{j+1} - 1\}$, and let $R_k$ be the projection operator onto $\{n_k, n_k + 1, \ldots\}$. Since we have from (2.12),
$$K_{n_j} = -\sum_{l=0}^{k}R_l + M_\varphi\sum_{l=j}^{k}R_lM_{\varphi^{-1}}, \qquad j = 1, \ldots, k, \tag{2.54}$$
the determinant on the left-hand side in (2.53), denoted by $(*)$, is equal to
$$(*) = \det\Big(1 + s_k\sum_{j=0}^{k}R_j - M_\varphi\sum_{j=1}^{k}s_jR_jM_{\varphi^{-1}}\Big). \tag{2.55}$$
First we pull out the term $1 + s_k\sum_{j=0}^{k}R_j$, then use Lemma 2.4 (iii) to obtain
$$\begin{aligned} (*) &= \det\Big(1 + s_k\sum_{j=0}^{k}R_j\Big)\det\Big(1 - \frac{1}{1 + s_k\sum_{j=0}^{k}R_j}M_\varphi\sum_{j=1}^{k}s_jR_jM_{\varphi^{-1}}\Big) \\ &= \det\Big(1 + s_k\sum_{j=0}^{k}R_j\Big)\det\Big(1 - \sum_{j=1}^{k}s_jR_jM_{\varphi^{-1}}\frac{1}{1 + s_k\sum_{j=0}^{k}R_j}M_\varphi\Big). \end{aligned} \tag{2.56}$$
Fredholm Determinant Identity for Random Young Tableaux
639
Now note that (recall s0 = 0) k det 1 + sk Rj j =0
= det 1 +
(sk − sj )Rj +
j =0
= det 1 + =
k−1
k−1
k
sj R j
j =1
(sk − sj )Rj det 1 +
j =0
1+
k−1
k (1 + sk − sj )nj +1 −nj det 1 +
j =0
j =1
k−1
k
1
j =0 (sk
− sj )Rj
sj Rj . 1 + s k − sj
(2.57) sj Rj
j =1
Using (2.57) and then multiplying two determinants, we have
(∗) =
k−1
k (1 + sk − sj )nj +1 −nj det 1 +
j =0
k
− (1 + sk )
j =1
j =1
sj Rj 1 + s k − sj
sj 1 Rj Mϕ −1 Mϕ . 1 + s k − sj 1 + sk kj =0 Rj
(2.58)
Finally, using
1 + sk
1 k
=
j =0 Rj
1 + sk 1 − kj =0 Rj 1 + sk
(2.59)
in the determinant on the right-hand side of (2.58), we obtain
(∗) =
k−1
(1 + sk − sj )
j =0
nj +1 −nj
k × det 1 − j =0
k sk sj Rj Mϕ −1 1 − R j Mϕ , 1 + sk − sj
(2.60)
j =0
which is precisely (2.53) from (2.12). The rigorous proof is also similar to the proof of Theorem 2.1. Let TN be the projection on |j | ≤ N as in (2.25). We take N large so that N > nk . The analogue of (2.28) is now (∗) =
k−1 j =0
nj +1 −nj
(1 + sk − sj )
lim det(1 + XN + YN ),
N→∞
(2.61)
640
J. Baik, P. Deift, E. Rains
where XN =
k j =1
sj 1 + sk − sj (1 − TN ) Rj (TN − Mϕ −1 TN Mϕ ), 1 + s k − sj 1 + sk
k
YN = −
j =1
(2.62)
k sk sj 1 + sk − sj (1 − TN ) Rj Mϕ −1 TN 1 − sj Rj M ϕ , 1 + s k − sj 1 + sk j =0
(2.63) which becomes, by the same argument leading to (2.35), (∗) =
k−1
(1 + sk − sj )nj +1 −nj
j =0
k × det 1 −
sk s j χ[nj ,nj +1 ) S lim det(1 + ZN ), N→∞ 1 + s k − sj
j =1
(2.64)
with ZN in (2.33) where s is replaced by sk , This then leads to the desired result as in the single interval case. 3. Convergence of Moments In this section, we prove the convergence of moments for arbitrary (scaled) rows, ξj , of a random young diagram under the Plancherel measure, mentioned in the Introduction. The tail estimates used in the proof of Theorem 3.1 are given in Sect. 4 below. Let N0 := N ∪ {0}. Theorem 3.1. For any fixed k ∈ N, and for any aj ∈ N0 , 1 ≤ j ≤ k, we have as N → ∞, a1 EPlan ξ1 · · · ξkak → E x1a1 · · · xkak , (3.1) N where EPlan N denotes the expectation with respect to the Plancherel measure on YN , and E denotes the expectation with respect to the limiting distribution function F in (1.6), (1.7). Remark 3.2. It will be clear from the proof below that the following stronger convergence result is also true: Let hj (x), j = 1, . . . , k be continuous functions on R satisfying 3/2−? for some ? > 0. Then for any k, as N → ∞, |hj (x)| ≤ C1 ec2 |x| h1 (ξ1 ) · · · hk (ξk ) → E h1 (x1 ) · · · hk (xk ) . EPlan (3.2) N Proof. We have a1 EPlan N (ξ1
· · · ξkak )
=
k
x1 ≥···≥xk j =1
a
xj j d PPlan N (ξ1 ≤ x1 , . . . , ξk ≤ xk )
(3.3)
Fredholm Determinant Identity for Random Young Tableaux
641
since λ1 ≥ λ2 · · · . Fix a number T > 2. We split the integral into two pieces: (a) (b)
max |xj | ≤ T ,
(3.4)
max |xj | > T .
(3.5)
1≤j ≤k 1≤j ≤k
In the first part (a), using a standard argument and the convergence in distribution (1.7) above, the limit becomes as N → ∞,
k
a xj j dF (x1 , . . . x1 ≥···≥xk max |xj |≤T j =1
, xk ).
(3.6)
For the second part (b), the region is a union of two (not necessarily disjoint) pieces: (i)
max |xj | = |x1 |,
(3.7)
(ii)
max |xj | = |xk |.
(3.8)
j j
Note that since x1 ≥ · · · ≥ xk , maxj |xj | is either |x1 | or |xk |. Over region (i),
k
|xj |aj d PPlan N (ξ1 ≤ x1 , . . . , ξk ≤ xk )
(i) j =1
≤
(i)
≤ =
|x1 |a1 +···+ak d PPlan N (ξ1 ≤ x1 , . . . , ξk ≤ xk )
|x1 |a1 +···+ak d PPlan N (ξ1 (−∞,−T )∪(T ,∞) a1 +···+ak (χξ1 <−T + χξ1 >T )). EPlan N (|ξ1 |
(3.9)
≤ x1 )
Similarly,
k
|xj |aj d PPlan N (ξ1 ≤ x1 , . . . , ξk ≤ xk )
(ii) j =1
(3.10)
a1 +···+ak ≤ EPlan (χξk <−T + χξk >T )). N (|ξk |
Now from the tail estimates in Proposition 4.3 below, the moment (3.3) as N → ∞ is equal to (3.6) plus a term which can be made arbitrarily small if we take T large enough. However, from Lemma 3.3, for T large, (3.6) is arbitrarily close to E(x1a1 · · · xkak ). Thus we have proved the theorem. Lemma 3.3. For any k ∈ N, and for any aj ∈ N0 , 1 ≤ j ≤ k, E(|x1 |a1 · · · |xk |ak ) < ∞,
(3.11)
where E is the expectation with respect to the limiting distribution function F in (1.6) and (1.7).
642
J. Baik, P. Deift, E. Rains
Proof. We need to show that
k x1 ≥···≥xk j =1
|xj |aj dF (x1 , . . . , xk ) < ∞.
(3.12)
Fix T > 2. We split the integral into two parts as in (3.4), (3.5): (a) maxj |xj | ≤ T , and (b) maxj |xj | > T . In (a), the integral is finite. In (b), the argument yielding (3.9), (3.10) implies that
k
|xj |aj dF (x1 , . . . , xk )
(3.13)
(b) j =1
≤ E(|x1 |a1 +···+ak χx1 >T ) + E(|xk |a1 +···+ak χxk <−T ). a1 +···+ak χ (Note that the additional terms corresponding to EPlan ξ1 T We will prove the finiteness of the last two expected values for a = a1 + · · · + ak . First, we prove that E(x1a χx1 >T ) < ∞ for any a ∈ N0 . Note that by (1.7) and (4.9) below, for x1 > T0 , 3/2
Plan −cx1 1 − F (x1 ) = lim 1 − PPlan N (ξ1 ≤ x1 ) = lim EN (χξ >x1 ) ≤ Ce N→∞
N→∞
(3.14)
for some C, c > 0. In particular, we have for any a ∈ N0 , lim x a (1 − F (x1 )) x1 →∞ 1 Thus, integrating by parts, ∞ x1a dF (x1 ) = T a (1 − F (T )) + T
∞ T
= 0.
ax1a−1 (1 − F (x1 ))dx1 .
(3.15)
(3.16)
Using (1.7), Fatou’s lemma and (4.9), we have E(x1a χx1 >T ) ∞ = x1a dF (x1 ) T
∞ (ξ ≤ T )) + lim ax1a−1 (1 − PPlan = lim T a (1 − PPlan 1 N N (ξ1 ≤ x1 ))dx1 N→∞ T N→∞ (3.17) ∞ a−1 Plan ≤ lim inf T a (1 − PPlan (ξ ≤ T )) + ax (1 − P (ξ ≤ x ))dx 1 1 1 1 N N 1 N→∞
T
a = lim inf EPlan N (ξ1 χξ1 >T ) N→∞
≤ Ce−cT
3/2
< ∞.
The proof of the finiteness of the second expected value in (3.13) is similar using (4.10).
Fredholm Determinant Identity for Random Young Tableaux
643
4. Tail Estimates For the proof of Theorem 3.1, we need tail estimates for the (scaled) length ξk of each row, which are uniform in N . In this section, we obtain these tail estimates in Proposition 4.3. These estimates follow from the tail estimates, Proposition 4.1 for the Poissonized Plancherel measure introduced in Sect. 1, together with the de-Poissonization Lemma 4.2. Define φn(k) (t) := PPois (λk ≤ n) = t
2 ∞ e−t t 2N Plan PN (λk ≤ n). N!
(4.1)
N=0
√ (In [3, 4], the notation λ = t is used. But in this paper, to avoid the confusion with the notation λ for a partition, we use t.) The following result is proved in Sect. 5 using the (k) steepest-descent method for RHP. Note that 0 ≤ φn (t) ≤ 1. Proposition 4.1. Define x by 2t x = 1 − 1/3 2/3 . n 2 n
(4.2)
Let k ∈ N. There are constants C, c > 0 and 0 < δ0 < 1 such that for large t and n, and for any fixed 0 < δ < δ0 , the following hold true: for x ≥ 0, 2t ≤ 1 − δ, n 2t 3/2 0 ≤ 1 − φn(k) (t) ≤ Ce−cx , 1 − δ < ≤ 1, n
0 ≤ 1 − φn(k) (t) ≤ Ce−cn ,
0≤
(4.3) (4.4)
and for x < 0, 2t < 1 + δ, n 2t 1+δ ≤ . n
0 ≤ φn(k) (t) ≤ Ce−c|x| , 1 <
(4.5)
0 ≤ φn(k) (t) ≤ Ce−ct ,
(4.6)
3/2
We also need the following de-Poissonization lemma: Lemma 4.2. There exists C > 0 such that for all sufficiently large N , √ 1/2 (k) PPlan , N (λk ≤ n) ≤ Cφn (N − N ) √ 1/2 Plan (k) 1 − PN (λk ≤ n) ≤ C 1 − φn (N + N )
(4.7) (4.8)
for all n ∈ Z.
√ Proof. This is similar to Lemma 8.3 in [3] (again note that λ in [3] satisfies λ = t.) Indeed, the proof of Lemma 8.3 in [3] only requires the fact that 0 ≤ qn,N+1 ≤ qn,N ≤ 1. In our case, qn,N = PPlan N (λk ≤ n), which is clearly between 0 and 1. The monotonicity can be found in [24] Lemma 3.8. Now Proposition 4.1 and Lemma 4.2 imply the following uniform tail estimates.
644
J. Baik, P. Deift, E. Rains
Proposition 4.3. Fix k ∈ N and a ∈ N0 . For a given T ≥ 2, there are constants C, c > 0 and N0 > 0 such that for N ≥ N0 , a −cT EPlan N (ξk χξk >T ) ≤ Ce
3/2
+ Ce−cN
1/2
(4.9)
and a −cT EPlan N (|ξk | χξk <−T ) ≤ Ce
3/2
+ Ce−cN
1/2
.
(4.10)
Proof. (a) Bound (4.9): Without any loss we can assume a > 0. Note that since 0 ≤ λk ≤ N, √ N −2 N 1/3 −2N ≤ ξk ≤ < N 5/6 . (4.11) N 1/6 If T ≥ N 5/6 , then the expected value in (4.9) is zero, and the bound is trivial. Thus we assume that T < N 5/6 . Integrating by parts and using Lemma 4.2, a EPlan N (ξk χξk >T ) =
s a d PPlan N (ξk ≤ s) = T a 1 − PPlan (ξ ≤ T ) + k N (T ,N 5/6 )
(T ,N 5/6 )
as a−1 1 − PPlan N (ξk ≤ s) ds
(4.12)
√ 1/2 (1) ≤ CT 1 − φn(T ) N + N √ (1) as a−1 (1 − φn(s) ((N + N )1/2 ))ds, +C a
(T ,N 5/6 )
√ for large N , where n(s) = 2 N + sN 1/6 . Note that since T ≥ 2, distinguish two cases: √ 2(N + N )1/2 (i) 0 ≤ < 1 − δ, n(T ) √ 2(N + N )1/2 (ii) 1 − δ ≤ ≤ 1, n(T )
√ 2(N+ N)1/2 n(T )
≤ 1. We
(4.13) (4.14)
where 0 < δ < 1 is a fixed constant satisfying δ < δ0 , where δ0 appears in Proposition 4.1. √ √ Case (i): For all s ≥ T , 0 ≤
2(N+ N)1/2 n(s)
≤
2(N+ N)1/2 n(T )
√ 2(N + N )1/2 n(s) ≥ n(T ) ≥ 1−δ
< 1 − δ. Note that for T ≤ s, √ 2 N ≥ . (4.15) 1−δ
Hence from the estimate (4.3), we have √ 1/2 (1) 1 − φn(s) ((N + N )1/2 ) ≤ Ce−cn(s) ≤ Ce−cN
(4.16)
for T ≤ s < N 5/6 with a new constant c. Therefore, from (4.12), we obtain a −cN EPlan N (ξk χξk >T ) ≤ Ce
1/2
.
(4.17)
Fredholm Determinant Identity for Random Young Tableaux
Case (ii): There is s0 > T such that s0 ≥ N 5/6 , [s0 , N 5/6 ) is empty) a EPlan N (ξk χξk >T )
≤ CT
a
(1) (1 − φn(T ) ((N
+
√
√ 2(N+ N)1/2 n(s0 )
N)
1/2
645
)) + C
= 1 − δ. We write (4.12) as (if
(1)
(T ,s0 )
as a−1 (1 − φn(s) ((N +
√ (1) as a−1 (1 − φn(s) ((N + N )1/2 ))ds [s0 ,N 5/6 ) a −cx(T )3/2 a−1 −cx(s)3/2 ≤ CT e +C as e ds + C
√
N )1/2 ))ds
+C
(T ,s0 )
[s0 ,N 5/6 )
as a−1 e−cn(s) ds (4.18)
using (4.3), √(4.4), where x(s) is defined by the formula (4.2) with t = (N + and n = 2 N + sN 1/6 . As in Case (i), for s ≥ s0 , √ √ 2(N + N )1/2 2 N n(s) ≥ n(s0 ) = ≥ , 1−δ 1−δ and hence, the last integral is less than Ce−cN
1/2
√
N )1/2
(4.19)
. For the other terms, since
√ √ √ 1/2 −4 N 2 N − 2(N + N ) = √ ≥ −1 ≥ −N 1/6 , √ 2 N + 2(N + N )1/2
(4.20)
we have for T ≤ s < s0 , √ √ n − 2t 2 N + sN 1/6 − 2(N + N )1/2 x(s) = = √ (n/2)1/3 ( N + 2s N 1/6 )1/3 s−1 ≥ ≥ (1 + 2Ns1/3 )1/3 (1 + as s ≥ T ≥ 2. Noting that s0 = we have
√ √ 2(N+ N)1/2 −2(1−δ) N (1−δ)N 1/6
x(s) ≥
(1 +
(4.21)
1 2 s )1/3 2N 1/3
1 2 s )1/3 2N 1/3
≤ c0 N 1/3 for some constant c0 ,
≥ cs
(4.22)
for s ≤ s0 with some constant c > 0. Hence a a −cT EPlan N (ξk χξk >T ) ≤ CT e
≤ Ce−cT
3/2
3/2
+C
∞ T
+ Ce−cN
1/2
.
s a−1 e−cs
3/2
ds + Ce−cN
1/2
(4.23)
646
J. Baik, P. Deift, E. Rains
(b) Bound (4.10): Recalling (4.11), if T ≥ 2N 1/3 , the expected value in (4.10) is zero and the bound is trivial. Thus we assume that T ≤ 2N 1/3 . Integrating by parts and using Lemma 4.2, we have for some constants C, c > 0, a EPlan N (|ξk |χξk <−T ) (−s)a d PPlan = N (ξk ≤ s) [−2N 1/3 ,−T ) a Plan = | − T | PN (ξk < −T ) +
[−2N 1/3 ,−T )
a(−s)a−1 PPlan N (ξk ≤ s)ds
(4.24)
√ (k) ≤ CT a φn(−T ) ((N − N )1/2 √ (k) +C a(−s)a−1 φn(s) ((N − N )1/2 ds [−2N 1/3 ,−T )
√ for large N, where n(s) = 2 N + sN 1/6 as before. Given T , we take N0 > so that for N ≥ N0 ,
√ 2(N− N)1/2 n(−T )
√ 1+ 1+T 3 2T
> 1. We distinguish two cases:
√ 2(N − N )1/2 , (i) 1 + δ ≤ n(−T ) √ 1/2 2(N − N ) (ii) 1 < < 1 + δ, n(−T ) where 0 < δ < 1 is a fixed constant as above. Case (i): For all −2N 1/3 − 2N 1/3 < s < −T , From the estimate (4.6), using T ≤ 2N 1/3 ,
√ 2(N− N)1/2 n(s)
a −cN EPlan N (|ξk |χξk <−T ) ≤ Ce
Case (ii): There is s0 > T such that
√ 2(N− N)1/2 n(−s0 )
1/2
(4.25) (4.26)
≥
√ 2(N− N)1/2 n(−T )
.
≥ 1 + δ.
(4.27)
= 1 + δ. We write (4.24) as
a EPlan N (|ξk |χξk <−T )
√ √ (k) (k) a(−s)a−1 φn(−T ) ((N − N )1/2 )ds ≤ CT a φn(−T ) ((N − N )1/2 ) + C (−s0 ,−T ) √ a−1 (k) +C a(−s) φn(−T ) ((N − N )1/2 )ds [−2N 1/3 ,−s0 ] 3/2 3/2 a(−s)a−1 e−c|y(s)| ds ≤ CT a e−c|y(−T )| + C (−s0 ,−T ) √ 1/2 +C a(−s)a−1 e−c(N− N) ds [−2N 1/3 ,−s0 ]
(4.28) √ using (4.5), (4.6), where y(s) is defined by x in the formula (4.2) with t = (N − N )1/2 √ 1/2 and n = 2 N + sN 1/6 . The last integral is less than Ce−cN with new constants
Fredholm Determinant Identity for Random Young Tableaux
647
√ √ C, c. For the other terms, note that since 2 N − 2(N − N )1/2 ≤ 2 ≤ N 1/6 and s ≤ −T ≤ −2, we have for −2N 1/3 ≤ s ≤ −T , √ √ n − 2t s + [2 N − 2(N − N )1/2 ]N −1/6 s y(s) = = .≤s+1≤ . (4.29) 1/3 1/3 1/3 (n/2) (1 + s/(2N )) 2 Thus, we obtain a EPlan N (|ξk |χξk <−T )
≤ CT a e−cT
3/2
≤ Ce−cT
+ Ce−cN
3/2
+C
a|s|a−1 e−c|s| ds + Ce−cN 3/2
(−s0 ,−T ) 1/2
1/2
(4.30)
.
Remark 4.4. The results (4.3)–(4.6) for k = 1 were given in [3]. Indeed, in [3] stronger bounds than (4.5), (4.6) were obtained (Lemma 7.1 (iv), (v) in [3]): (4.31)
0 ≤ φn(1) (t) ≤ Ce−ct ,
(4.32)
3
2
(note λ =
2t < 1 + δ, n 2t 1+δ ≤ n
0 ≤ φn(1) (t) ≤ Ce−c|x| , 1 <
√ t in [3].) From this, as in the Proof of Proposition 4.3. we have a −cT + Ce−cN . EPlan N (|ξ1 |χξ1 <−T ) ≤ Ce 3
(4.33)
In this paper, we only obtain the above weaker bounds (4.5) and (4.6), but they are enough for our purpose in proving the convergence of moments. However, we believe that the same bound (4.33) holds true for general k. In the next section, we indicate why we only obtain these weaker bounds (see Remark 5.1 below). 5. Riemann–Hilbert Problem In this section, we will prove Proposition 4.1. (k) (1) For (4.3) and (4.4), note that φn (t) ≥ φn (t) for all k ≥ 1 as λ1 ≥ λ2 ≥ · · · . But for k = 1 the estimates (4.3), (4.4) were proved in [3] (Lemma 7.1 (i), (ii)), and so we have the same bounds for all k ≥ 1. On the other hand, since PPlan N (λ ≤ n) ≤ 1, we (k) always have 0 ≤ φn (t) ≤ 1. The rest of this section is devoted to proving (4.5) and (4.6). We start from the (k) (λk ≤ n)) formulae (see [8, 24, 28, 31]) that (recall φn (t) = PPois t φn(1) (t) = det(1 − Sn ),
k
d (k+1) (k)
φn+k (t) = φn+k−1 (t) + − det(1 − r Sn ), dr r=1
(5.1) k ≥ 1,
(5.2)
with ϕ(z) = et (z−z
−1 )
.
(5.3)
648
J. Baik, P. Deift, E. Rains
This follows from, for example, Theorem 3.1 of [31] with p+ = p− = (t, 0, 0) (see also [30]) which states that for any finite subset A of Z, PPois (A ⊂ {λj − j }) = det(S(i, j ))i,j ∈A , t S(i, j ) =
∞
(5.4)
(ϕ −1 )i+k ϕ−j −k ,
k=1
with ϕ given by (5.3), where PPois denotes the Plancherel measure for SN with N being t the Poisson variable with mean t 2 (Poissonized Plancherel measure). Recall from (4.1) (k) (λk ≤ n). In [3], the authors obtained the estimates (stronger than) that φn (t) = PPois t (4.5) and (4.6) in the case k = 1 (see Remark 4.4). Hence we need to prove that for any fixed k ∈ N,
k
d
2t −c|x|3/2
< 1 + δ, (5.5) det(1 − r S ) , 1< n ≤ Ce
dr k r=1 n
k
d
2t −ct
(5.6) 1+δ ≤ .
dr k r=1 det(1 − r Sn ) ≤ Ce , n By Cauchy’s theorem, we write for 0 < ? < 1, det(1 − s Sn ) k! d k
det(1 − r Sn ) = ds. dr k r=1 2π i |s−1|=? (s − 1)k+1
(5.7)
By Remark 2.8 in Sect. 2, for ϕ(z) as in (5.3), Sn is positive and Sn ≤ 1. Hence the eigenvalues aj of (the trace class operator) Sn satisfies 0 ≤ aj ≤ 1 (actually one can show that 0 ≤ aj < 1). For |s − 1| = ?,
1 − saj ≤
det(1 − s Sn ) = (1 − (1 − ?)aj ) = det(1 − (1 − ?) Sn ). (5.8) j
j
Therefore we have
k
d
k!
dr k r=1 det(1 − r Sn ) ≤ ? k det(1 − (1 − ?) Sn ).
(5.9)
Thus it is sufficient to prove that for fixed 0 < r < 1, det(1 − r Sn ) ≤ Ce−c|x| , 3/2
det(1 − r Sn ) ≤ Ce−ct ,
2t < 1 + δ, n 2t 1+δ ≤ , n 1<
(5.10) (5.11)
or by Theorem 2.1, we need to prove that for any fixed 0 < s < 1, (note that ϕ given by (5.3) has no winding) (1 + s)−n det(1 − s Kn ) ≤ Ce−c|x| , 3/2
(1 + s)−n det(1 − s Kn ) ≤ Ce−ct ,
2t < 1 + δ, n 2t 1+δ ≤ , n 1<
(5.12) (5.13)
Fredholm Determinant Identity for Random Young Tableaux
649
where x is defined in (4.2) 2t x = 1 − 1/3 2/3 . n 2 n
(5.14)
Since Kn is an integrable operator, there is a naturally associated Riemann–Hilbert problem (see [21, 12]). Let m(z; k) be the 2×2 matrix function which solves the following Riemann–Hilbert problem (RHP): with contour * = {|z| = 1}, oriented counterclockwise, m(z; k) is analytic inz ∈ C \ *, −1 1 − s2 −sz−k e−t (z−z ) m (z; k) = m− (z; k) for z ∈ *, (5.15) −1 + szk et (z−z ) 1 m(z; k) → I as z → ∞. Here and also in the following, the notation f+ (z) (resp., f− (z)) denotes the limiting value of limz →z f (z ) from the left (resp., right) of the contour in the direction of the orientation. In the above case, m+ (z; k; s) (resp., m− (z; k; s)) means limz →z m(z ; k; s) with |z | < 1 (resp., |z | > 1.) In (52) of [4], it is shown that (1 + s)
−n
det(1 − s Kn ) =
∞
m11 (0; k + 1),
(5.16)
k=n
where m11 (0; k) is the (11)-entry of m(z; k) evaluated at z = 0. Therefore, in order to prove (5.12) and (5.13), we need asymptotic results for m11 (0; k) as k, t → ∞. In the special case when s = 1, this RHP is algebraically equivalent to the RHP for the or−1 thogonal polynomials on the unit circle with respect to the measure et (z+z ) dz/(2π iz), whose asymptotics as k, t → ∞ was investigated in [3]. The RHP (5.15) was introduced in [4]. There is a critical difference in the asymptotic analysis depending on whether s = 1 or 0 < s < 1. In the former case, the jump matrix in (5.15) has an upper/lower factorization, but not a lower/upper factorization, while in the later case, the jump matrix has both factorizations. This difference makes the later case much easier to analyze asymptotically. In the former case, we need a WKB type analysis which involves the construction of a parametrix in terms of the equilibrium measure of a certain variational problem, and in the case where 2tn > 1, the main asymptotic contribution to the RHP comes from the part of the circle near z = −1. But in the later case, due to the existence of both factorization of the jump matrix, the RHP localizes in the limit just to two points on the circle. We refer the reader to [16] for an example of the second type, and to [17, 15, 14] for examples of the first type. Remark 5.1. The different analysis for s = 1 and 0 < s < 1 gives us different estimates. Indeed, when s = 1, instead of (5.18) below, we have (see (6.42) of [3]) log m11 (0; k) ≤ k(−
2t 2t + log + 1), k k
(5.17)
which imply (4.31), (4.32). Thus, in order to obtain the better estimates (4.31), (4.32), (4.33) for the general row ξk (see Remark 4.4), we need to analysis the RHP (5.15) as s → 1 instead of fixed s < 1.
650
J. Baik, P. Deift, E. Rains
Fix 0 < s < 1. We will prove the following estimate. Lemma 5.2. There are positive constants M0 and c such that when we have 2 k log m11 (0; k) ≤ −c 1 − , 2t
2t k
≥ 1+
M0 , 21/3 k 2/3
(5.18)
for large t. Assuming this result, we will prove (5.12) and (5.13), which completes the proof of Proposition 4.1. Proof of Proposition 4.1. We need to prove (5.12) and (5.13). (a) Estimate (5.12): It is enough to prove (5.12) for −21/3 n2/3 δ < x ≤ −M, where M := M0 (1 + 2δ) with M0 in Lemma 5.2. As noted above, since Sn is positive and Sn ≤ 1, its eigenvalues aj satisfy 0 ≤ aj ≤ 1. Hence for 0 < s < 1, by (5.16) and Theorem 2.1, 0≤
∞
m11 (0; k + 1) = det(1 − s 2 Sn ) =
k=n
(1 − s 2 aj ) ≤ 1,
(5.19)
j
for any n. Therefore in order to prove (5.12), it is enough to show that log m11 (0; k + 1) ≤ −c|x|3/2 + C,
(5.20)
(∗)
for some constants c, C > 0 where the summation is over the set (note x + M ≤ 0 in (5.20)) (∗) :
n≤k ≤n−
x + M1 1/3 n , 21/3
(5.21)
with M1 := M0 (1 + δ). We will show that for k in (∗), M0 2t ≤ . 21/3 k 2/3 k
(5.22)
2t M0 ≥ 1 + 1/3 2/3 . k 2 n
(5.23)
−1 2t 2t n x + M1 x = 1 − 1/3 2/3 , · ≥ 1 − 1/3 2/3 k n k 2 n 2 n
(5.24)
1+ Since n ≤ k, (5.22) follows from
In order to show (5.23), since
it is enough to check that x x + M1 M0 1 − 1/3 2/3 ≥ 1 − 1/3 2/3 1 + 1/3 2/3 , 2 n 2 n 2 n
(5.25)
Fredholm Determinant Identity for Random Young Tableaux
which is equivalent to check that
651
x + M1 M1 ≥ 1 − 1/3 2/3 M0 . 2 n
But since −x ≤ 21/3 δn2/3 , x + M1 M1 M1 1 − 1/3 2/3 M0 ≤ 1 + δ − 1/3 2/3 ≤ M1 , 2 n 2 n 1+δ
(5.26)
(5.27)
and hence (5.22) is proved. Now using (5.18), the sum in (5.20) satisfies logm11 (0; k + 1) (∗)
≤
−c
(∗)
≤
(∗)
≤ −c
1−
! −c 1 −
n−
k 2t
2
k 2t
x+M1 1/3 n 21/3
!
n
1−
s ds 2t
3/2 1 1/3 3/2 n n − x+M 4t n 21/3 1− − 1− 3 2t 2t 3/2 −1/2 2t x + M1 3/2 2 2t 2/3 2t = c n2/3 −1 + − 1 − n 3 n n 21/3 n −1/2 3/2 3/2 2 2t M1 −x = c , − 1/3 1/3 3 n 2 2 (5.28) " where the second inequality is due to the monotonicity of the function f (y) = 1 − 2ty . =c
Since 1 <
2t n
< 1 + δ, we obtain (5.12).
(b) Estimate (5.13): By a similar argument as in (a), it is enough to show that log m11 (0; k + 1) ≤ −ct + C, (5.29) (∗∗)
for some constants c, C > 0 where (∗∗) is the set (∗∗) :
2t 2t ≤k≤ . 1+δ 1 + δ/2
(5.30)
For k in (∗∗), we have k 1 ≤ < 1. 2t 1 + δ/2
(5.31)
652
J. Baik, P. Deift, E. Rains
Now using (5.18), the sum in (5.29) satisfies 2 k log m11 (0; k + 1) ≤ −c 1 − 2t (∗∗) (∗∗) 2 2t 2t 1 − +1 ≤ −c 1 − 1 + δ/2 1 + δ/2 1 + δ
(5.32)
≤ −ct + C.
RHP Asymptotics and Proof of Lemma 5.2. In the rest of this section, we prove Lemma 5.2 by asymptotic analysis of the RHP (5.15). Set k η := . (5.33) 2t Under the condition of Lemma 5.2, we have η < 1. We denote by v(z) the jump matrix in the second condition of the RHP (5.15). Note that the (21)-entry of v is se2tf (z;η) , where 1 f (z; η) := (z − z−1 ) + η log z, (5.34) 2 where log z ∈ R for z > 0. The critical points of this function are ξ := eiθc and ξ −1 = ξ , where " (5.35) ξ = −η + i 1 − η2 . Note that −π/2 < θc < π. For z = ρeiθ , consider Fθ (ρ) := Re f (z) =
1 (ρ − ρ −1 ) cos θ + η log ρ. 2
(5.36)
Its derivative at ρ = 1 is d Fθ (1) = cos θ + η, dρ
(5.37)
which is positive for |θ | < θc , and is negative for θc < |θ | ≤ π . Indeed it is easy to check that: 1. When |θ| ≤ π2 , Fθ (ρ) < 0 for 0 < ρ < 1, and Fθ (ρ) > 0 for ρ > 1. 2. When π2 < |θ | < θc , Fθ (ρ) > 0 for 0 < ρ < ρ0 , Fθ (ρ) < 0 for ρ0 < ρ < 1, Fθ (ρ) > 0 for 1 < ρ < ρo−1 , and Fθ (ρ) < 0 for ρ > ρ0−1 . Here ρ0 is a number satisfying 0 < ρ0 < ρθ , where # η − η2 − cos2 θ ρθ := < 1, (5.38) − cos θ and
d dθ Fθ (ρθ )
=
−1 d dθ Fθ (ρθ )
= 0.
Fredholm Determinant Identity for Random Young Tableaux
653
1
1
1
0.8
0.8
0.8
0.6
0.6
0.6
0.4
0.4
0.4
0.2
0.2
0.2
0
0
0
−0.2
−0.2
−0.2
−0.4
−0.4
−0.4
−0.6
−0.6
−0.6
−0.8
−0.8
−1
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
−1
−0.8
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
−1
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Fig. 1. Graph of Fθ (ρ) for θ = π6 (case (i)), 5π 6 (case (ii)) and π (case (iii)) when η = 15/16
3. When θc < |θ | ≤ π, Fθ (ρ) > 0 for 0 < ρ < 1, and Fθ (ρ) < 0 for ρ > 1. 4. The curve {ρeiθ : Fθ (ρ) = 0} crosses the circle at 90 degree. Typical graphs of Fθ (ρ) is given in Fig. 1 for the value η = 15/16 for θ in the three different cases (i)–(iii). Figure 2 is a signature table for Re(f (z)) when η = 15/16. The solid curve is Re(f (z)) = 0, and the dotted rays represent the lines cos θ = −η. The ± signs denote the signature of Re(f (z)) in each of the four components. The curve Re(f (z)) = 0 and the lines cos θ = −η meet on the unit circle at the points ξ and ξ −1 . 2
1.5
+
1
0.5
− 0
+
−
0
−0.5
−1
−1.5
−2 −2
−1.5
−1
−0.5
0
0.5
1
1.5
2
Fig. 2. Curve Re(f (z)) = 0 when η = 15/16
Let * = *1 ∪ *2 , where *1 = {eiθ : |θ | < θc }, and *2 = * \ *1 . Define the function δ(z) :=
z−ξ z − ξ −1
− 2π1 i log(1−s 2 )
,
(5.39)
which is analytic in C \ *2 : we choose the branch so that δ(z) → 1 as z → ∞ along the positive real axis. Then it solves the scalar Riemann–Hilbert problem δ+ (z) = δ− (z)(1 − s 2 ), z ∈ *2 , δ(z) → 1, as z → ∞,
(5.40)
654
J. Baik, P. Deift, E. Rains 2
1.5
1
0.5
0
−0.5
−1
−1.5
−2 −2
−1.5
−1
−0.5
0
0.5
1
1.5
2
Fig. 3. Solid curve represents the set {ρθ eiθ : π2 < θ ≤ θc } when η = 15/16
where δ± has the same meaning as in the RHP (5.15). Note that θc
δ(0) = (1 − s 2 )1− π . Now set
(5.41)
(2)
m
(z) := m(z)δ
−σ3
,
1 0 σ3 = . 0 −1
(5.42) (2)
Then (i) m(2) (z) is analytic in C \ *, (ii) m(2) (z) → I as z → ∞, and (iii) m+ (z) = (2) m− v (2) (z) for z ∈ *, where 1 − s2 −se−2tf (z) δ 2 (z) , z ∈ *1 , 2tf (z) −2 (2) δ (z) 1 v (z) = se (5.43) s −2tf (z) δ 2 (z) 1 − e + 1−s 2 , z ∈ *2 . s e2tf (z) δ −2 2 1 − s 2 − 1−s Also we have (2)
θc
m11 (0) = m11 (0)(1 − s 2 )1− π . Note that the jump matrix has the factorizations 1 −se−2tf (z) δ 2 (z) 1 0 , 0 1 se2tf (z) δ −2 (z) 1 (2) v (z) = s −2tf (z) δ 2 (z) 1 − 1−s 1 0 2e + , s e2tf (z) δ −2 1 − s 2 0 1 − 1−s 2
(5.44)
z ∈ *1 , z ∈ *2 . (5.45)
Fredholm Determinant Identity for Random Young Tableaux
655 (3)
From (i)–(iii) in (5.38), we can take an oriented closed curve *− surrounding 0 and −1, and passing through ξ and ξ −1 (the solid curve in Fig. 4) on which Re f (z) < 0 (3)
(3)
(3)
(3)
(3)
except at z = ξ, ξ −1 . Let *− = *−1 ∪ *−2 , where *−1 is the open subset of *−
(3)
Σ +1 ξ −1
(3)
Σ −2
ξ
(3)
Ω6
(3)
Σ −1 (3)
Σ +2
(3) (3)
0
Ω4 Ω3
Σ
(3)
Ω5
(3)
Ω2
(3)
Ω1
−1
(3)
(3)
Fig. 4. *± and Ij , j = 1, . . . , 6
(3)
(3)
(3)
satisfying | arg(z)| < θc and *−2 = *− \*−1 . Similarly, we can take an oriented closed (3)
curve *+ surrounding 0, but not −1 and passing through ξ, ξ −1 (the dashed curve in (3)
(3)
(3)
(3)
(3)
Fig. 4) on which Re f (z) > 0 except at z = ξ, ξ −1 . Again let *+ = *+1 ∪ *+2 , (3)
(3)
(3)
where *+1 is the open subset of *+ satisfying | arg(z)| < θc and *+2 = *+ \ *+1 . The shape of
(3) *±
will be specified further below (see the third case for the estimation (3)
(3)
(3)
of |vR (z) − I | between (5.88) and (5.89)). Let * (3) = *− ∪ *+ . Let Ij , 1 ≤ j ≤ 6 be open regions as in Fig. 4. Define −1 1 0 (2) , m (z) se2tf δ −2 1 −2tf δ 2 1 −se m(2) (z) , 0 1 −1 s −2tf δ 2 m(3) (z) := e 1 − 2 m(2) (z) 1−s , 0 1 1 0 (2) , m (z) s e2tf δ −2 1 1−s 2 (2) m (z),
(3)
z ∈ I1 , (3)
z ∈ I2 , (5.46)
(3)
z ∈ I3 , (3)
z ∈ I4 , (3)
(3)
z ∈ I5 ∪ I6 .
656
J. Baik, P. Deift, E. Rains
Then (i) m(3) (z) is analytic in C \ * (3) , (ii) m(3) (z) → (3) m− (z)v (3) for z ∈ * (3) , where 1 0 2tf −2 1 , se δ 1 −se−2tf δ 2 , 0 1 (3) v (z) = s −2tf δ 2 1 − 1−s 2e , 0 1 1 0 , s 2tf −2 e δ 1 1−s 2
(3)
I as z → ∞, and m+ (z) =
(3)
z ∈ *−1 , (3)
z ∈ *+1 , (5.47) z∈
(3) *+2 , (3)
z ∈ *−2 .
Also we have (3)
θc
m11 (0) = m11 (0)(1 − s 2 )1− π .
(5.48)
Observe that v (3) (z) → I as t → ∞ for z ∈ * (3) \ {ξ, ξ −1 }. Thus we expect that m(3) (z) → I as t → ∞. If this were indeed true, we would have # 2 −1 2 2 sin log(1 − s ) k 1−η 2 log m11 (0) ∼ 1− , t → ∞. log(1 − s ) ≤ π π 2t (5.49) But the difficulty, however, is that v (3) does not converge to I uniformly on * (3) . As in [16], we overcome this difficulty by constructing a parametrix for the solution of the RHP (* (3) , v (3) ) around the points ξ, ξ −1 . Let τ be a complex number satisfying 0 < |τ | < 1. Following [16], set ν := − Define β12
1 log(1 − |τ |2 ), 2π
√ π π 2πe 4 i e− 2 ν := , τ M(−a)
β21 := β12
a := iν. √ π π 2π e− 4 i e− 2 ν = . τ M(a)
(5.50)
(5.51)
Note that β12 β21 = ν,
(5.52)
π as |M(iv)|2 = ν sinh(πν) for real ν = 0. Let Da be the parabolic-cylinder function (see, e.g. [1, 37]) which solves d2 1 ζ2 Da (ζ ) + (5.53) − + a Da (ζ ) = 0. dζ 2 2 4
We note that Da (ζ ) is an entire function.
Fredholm Determinant Identity for Random Young Tableaux
Let the matrix
O11 (w) O12 (w) O(w) = , O21 (w) O22 (w)
657
w ∈ C \ R,
(5.54)
be defined as follows (see [16] Sect. 4): for Im(w) > 0, O11 (w) := e− 4 πν Da (e− 4 πi w), 1 d −1 41 πν D−a (e− 4 πi w) − O12 (w) := (β21 ) e dw 3 3 d Da (e− 4 πi w) + O21 (w) := (β12 )−1 e− 4 πν dw 3
3
iw − 41 πi D−a (e w) , 2 3 iw Da (e− 4 πi w) , 2
O22 (w) := e 4 πν D−a (e− 4 πi w), 1
1
(5.55) (5.56) (5.57) (5.58)
and for Im(w) < 0, O11 (w) := e 4 πν Da (e 4 πi w), 3 3 d iw −1 − 43 πν πi πi 4 4 D−a (e w) − D−a (e w) , O12 (w) := (β21 ) e dw 2 1 1 1 d iw O21 (w) := (β12 )−1 e 4 πν Da (e 4 πi w) + Da (e 4 πi w) , dw 2 1
1
O22 (w) := e− 4 πν D−a (e 4 πi w). 3
3
(5.59) (5.60) (5.61) (5.62)
The function O satisfies – O(w) is analytic in w ∈ C \ R. – For w ∈ R,
1 − |τ |2 −τ O+ (w) = O− (w) , τ 1
(5.63)
where O+ (w) (resp., O− (w)) is the limit of O(s) as s → w with Im(s) > 0 (resp., Im(s) < 0). – As w → ∞, O(w)e 4 iw 1
2σ 3
w −iνσ3 = I + O(w −1 ),
(5.64)
where w−iν denotes the branch which is analytic in C \ (−∞, 0] and has modulus 1 for w ∈ (0, ∞). These properties can be found in [16] Sect. 4. Let M be the union of four rays, labeled by Mj , j = 1, . . . , 4, with the orientation as indicated in Fig. 5. All the rays and R meet at the angle π/3. Denote the components of C \ (M ∪ R) by Ij , j = 1, . . . , 6 as in Fig. 5. Define H (w), analytic in C \ M, by 1 a(ξ ) 0 iw2 σ3 −iνσ3 0 1 a(ξ )−1 0 0 1 4 H (w) := , (5.65) w 1 0 O(w)e 1 0 φ(w) 0 a(ξ )−1 0 a(ξ )
658
J. Baik, P. Deift, E. Rains
Γ1
Γ2
Ω3
Ω2
Ω1
R Ω4
Ω5
Ω6
Γ3
Γ4 Fig. 5. Mj and Ij
where a(ξ ) is a(ξ ) = etf (ξ )
−iξ √ −1 (ξ − ξ ) 2t(1 − η2 )1/4
iν ,
(5.66)
and φ(w) is defined by
1 2 2 iw w −2iν 1 −τ e , 0 1 1 0 , 1 2 −τ e− 2 iw w 2iν 1 1 0 φ(w) := , τ − 21 iw2 2iν w 1 2e 1−|τ | 1 2 τ 2 iw w −2iν 1 1−|τ 2e | , 0 1 I
w ∈ I1 , w ∈ I6 , w ∈ I3 ,
(5.67)
w ∈ I4 , w ∈ I2 , I5 .
Then by recalling that w−iνσ3 is analytic in C \ (−∞, 0], one H+ (w) = H− (w)vH (w) for w ∈ M, where vH (w) is given by 1 2 1 −τ a(ξ )−2 w −2iν e 2 iw , 0 1 1 0 τ a(ξ )2 w 2iν e− 21 iw2 1 , vH (w) := 1 0 1 , 2 τ a(ξ )2 w 2iν e− 2 iw 1 1−|τ |2 1 2 −τ 1 1−|τ a(ξ )−2 w −2iν e 2 iw |2 , 0 1
can directly check that
w ∈ M1 , w ∈ M4 , (5.68) w ∈ M2 , w ∈ M3 .
Fredholm Determinant Identity for Random Young Tableaux
659
w (z)
z ξ
0 Lξ
Fig. 6. map z → w(z)
Also, from (5.64), we have H (w) = I + O(w −1 ),
as w → ∞.
(5.69)
As |a(ξ )| = e−νθc , −π/2 < θc < π , we see that the error term O(w −1 ) in (5.69) is uniform for 2tk > 1. Similarly, |H (w)| is uniformly bounded in the w plane for 2tk > 1. Define the map z → w(z) :=
√ 2t(1 − η2 )1/4 iξ −1 (z − ξ ).
(5.70)
It maps ξ to 0, and the tangent line Lξ to the unit circle * at ξ , to the real line as in Fig. 6. Let Oξ , Oξ be the disjoint sets {z : |z − ξ | < P}, {z : |z − ξ | < P}, respectively, where P is defined by # ? |ξ − ξ | = ? 1 − η2 , P= 2 ?,
M0 2t ≤ < 1 + δ, 21/3 k 2/3 k 2t 1+δ ≤ . k 1+
(5.71)
The (small) parameter 0 < ? < 1 will be specified below (see (5.105) below). We note that one may choose the curves in * (3) above so that in Oξ , Oξ , they are straight lines which map under z → w(z) to (finite subsets of ) the rays Mj , j = 1, . . . , 4, (3) (3) (3) (3) *−1 ∩ Oξ → M4 , *−2 ∩ Oξ → M2 , *+1 ∩ Oξ → M1 , *+2 ∩ Oξ → M3 , and similarly for the neighborhood of Oξ . For τ = s, we define H (w(z)), mp (z) := H (w(z)), I,
z ∈ Oξ \ * (3) , z ∈ Oξ \ * (3) ,
(5.72)
z ∈ C \ (Oξ ∪ Oξ ).
Let *R := * (3) ∪ ∂Oξ ∪ ∂Oξ as in Fig. 7 where ∂Oξ and Oξ are oriented counterclockwise. Clearly mp solves a RHP on *R : mp (z) is analytic in C \ * (3) , mp (z) → I as z → ∞, and mp+ (z) = mp− (z)vp (z) for z ∈ *R for a suitable jump matrix vp . Set R(z) := m(3) (z)mp (z)−1 . Then R+ (z) = R− (z)vR (z) for z ∈ *R , where vR = mp− v (3) vp−1 m−1 p− . Now we estimate |vR (z) − I |.
660
J. Baik, P. Deift, E. Rains
Oξ (3)
Σ (3)
(3)
Σ −2
Σ +2
Σ +1
(3) −1
0
Σ
O ξ−1 Fig. 7. *R := * (3) ∪ ∂Oξ ∪ ∂Oξ
– For z ∈ ∂Oξ , we have from (5.71), when 1 + 21/3Mk02/3 ≤ 2tk ≤ 1 + δ, √ |w(z)| = ?(1 − η2 )3/4 2t −1/4 3/4 2t 2t ≥? k 2/3 −1 k k ≥ When 1 + δ ≤
2t k,
|w(z)| =
(5.73)
3/4 ?M0 . 21/4 (1 + δ)1/4
√
2t(1 − η2 )? ≥
δ 1+δ
1/4
√ ? 2t.
(5.74)
Thus if we have taken M0 large, and t is large, we have for z ∈ ∂Oξ , from (5.69), 1 mp+ (z) = I + O . (5.75) 3/4 √ min(M0 , t) But as v (3) = I on ∂Oξ , vR (z) = vp (z)−1 = mp+ (z)−1 , and hence vR (z) − I L∞ (∂ Oξ ) ≤
C 3/4
min(M0 ,
√ t)
z ∈ ∂Oξ .
(5.76)
We are using here the standard fact that if det vp = 1, then det mp = 1. Similarly, we have the same estimate (5.76) on vR (z) for z ∈ ∂Oξ . – For z ∈ * (3) ∩ Oξ , since mp and m−1 p are uniformly bounded, |vR (z) − I | ≤ (3)
C|v (3) (z)vp (z)−1 − I |. For z ∈ *−1 ∩ Oξ , by (5.47), (5.68), (3)
|v (3) (z)vp (z)−1 − I | ≤ |v21 (z) − (vH )21 (w(z))| = s|e2tf (z) δ −2 (z) − a(ξ )2 w(z)2iν e− 2 iw(z) | =: s|Q|. 1
2
(5.77)
Fredholm Determinant Identity for Random Young Tableaux
661
Setting u := iξ −1 (z − ξ ), we have
1 1 2 (ξ(1−iu)− ξ(1−iu)
2t
Q=e
+η log ξ(1−iu))
2t
−e
1 −1 )+η log ξ 2 (ξ −ξ
−iξ u ξ(1 − iu) − ξ −1
−ti(1−η2 )1/2 u2
−iξ u ξ − ξ −1
2iν
2iν
1 −1 2 1/2 2 = e2th(u) j (u) − 1 e2t ( 2 (ξ −ξ )+η log ξ )−ti(1−η ) u
(5.78)
−iξ u ξ − ξ −1
2iν ,
where 1 iu 1 h(u) = − iξ u − + η log(1 − iu) + i(1 − η2 )1/2 u2 2 ξ(1 − iu) 2 " 1 1 = − iη + 1 − η2 u3 + O(u4 ), 6 2
(5.79)
and j (u) =
Also, as √ u
ξ − ξ −1 ξ − ξ −1 − iξ u
1−η
2iν =1+O
u ξ − ξ −1
u =1+O # . 1 − η2 (5.80)
≤ c? for z ∈ Oξ , we have 2 |h(u)| ≤ c|u|3 ,
j (u) = 1 + O #
u 1 − η2
(5.81)
= 1 + O(?).
(5.82)
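On the other hand, for $z \in \Sigma^{(3)}_{-1} \cap O_\xi$,
$$\mathrm{Re}\left(-it\sqrt{1-\eta^2}\,u^2\right) \le -ct\sqrt{1-\eta^2}\,|u|^2, \qquad c = \cos\frac{\pi}{6} > 0. \tag{5.83}$$
Therefore, we obtain
$$\begin{aligned} |Q| &\le C\left|(e^{2th(u)} - 1) + e^{2th(u)}(j(u) - 1)\right|e^{-ct\sqrt{1-\eta^2}|u|^2} \\ &\le C\left(|2th(u)| + |j(u) - 1|\right)e^{-ct\sqrt{1-\eta^2}|u|^2 + |\mathrm{Re}(2th(u))|} \\ &\le C(t|u|^3 + \epsilon)e^{-ct\sqrt{1-\eta^2}|u|^2} \le \frac{C}{\sqrt{2t}(1-\eta^2)^{3/4}} + C\epsilon, \end{aligned} \tag{5.84}$$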
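where for the last inequality, we have used the fact that $|x^3e^{-x^2}|$ is uniformly bounded for $x \in \mathbb{R}$. Now for $1 + \frac{M_0}{2^{1/3}k^{2/3}} \le \frac{2t}{k} < 1 + \delta$, we have
$$\sqrt{2t}(1-\eta^2)^{3/4} > \sqrt{2t}(1-\eta)^{3/4} = \left(\frac{k}{2t}\right)^{1/4}\left(k^{2/3}\left(\frac{2t}{k} - 1\right)\right)^{3/4} \ge \frac{1}{(1+\delta)^{1/4}}\left(\frac{M_0}{2^{1/3}}\right)^{3/4}, \tag{5.85}$$
and for $1 + \delta \le \frac{2t}{k}$, we have
$$\sqrt{2t}(1-\eta^2)^{3/4} \ge \sqrt{2t}\left(\frac{\delta}{1+\delta}\right)^{3/4}. \tag{5.86}$$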
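The two elementary facts used around (5.84) and (5.85) can be checked numerically: the bound $\sup_x|x^3e^{-x^2}| = (3/2)^{3/2}e^{-3/2}$, attained at $x^2 = 3/2$, and the lower bound (5.85) on a grid of values of $2t/k$ in the stated regime. The parameter values ($k$, $M_0$, $\delta$) are arbitrary illustrations, not values taken from the text.

```python
import math


def sup_x3_gauss(samples: int = 200_000, x_max: float = 10.0) -> float:
    """Grid maximum of x^3 exp(-x^2) on [0, x_max]; |x^3 exp(-x^2)| is even, so x >= 0 suffices."""
    best = 0.0
    for i in range(samples + 1):
        x = i * x_max / samples
        best = max(best, x ** 3 * math.exp(-x * x))
    return best


def check_585(k: float, M0: float, delta: float, n: int = 1000) -> bool:
    """Check sqrt(2t)(1 - eta^2)^(3/4) >= (1+delta)^(-1/4) (M0/2^(1/3))^(3/4), eta = k/(2t),
    for 2t/k sampled in [1 + M0/(2^(1/3) k^(2/3)), 1 + delta)."""
    lo = 1 + M0 / (2 ** (1 / 3) * k ** (2 / 3))
    hi = 1 + delta
    bound = (1 + delta) ** -0.25 * (M0 / 2 ** (1 / 3)) ** 0.75
    for i in range(n):
        ratio = lo + (hi - lo) * i / n  # this is 2t/k
        t = ratio * k / 2
        eta = k / (2 * t)
        lhs = math.sqrt(2 * t) * (1 - eta ** 2) ** 0.75
        if lhs < bound * (1 - 1e-12):
            return False
    return True


closed_form = 1.5 ** 1.5 * math.exp(-1.5)
print(sup_x3_gauss(), closed_form)  # both approximately 0.41
print(check_585(1e4, 10.0, 0.5))
```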
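Thus we obtain (recall (5.77))
$$\|v_R - I\|_{L^\infty(\Sigma^{(3)}_{-1} \cap O_\xi)} \le Cs\|Q\|_{L^\infty(\Sigma^{(3)}_{-1} \cap O_\xi)} \le \frac{C}{\min(M_0^{3/4}, \sqrt{t})} + C\epsilon, \tag{5.87}$$
which is small if we take $M_0$, $t$ large and $\epsilon$ small. For other parts of $\Sigma^{(3)} \cap O_\xi$, by a similar argument, we obtain the same estimate. By the symmetry $m^{(3)}(\bar z) = \overline{m^{(3)}(z)}$ and $m_p(\bar z) = \overline{m_p(z)}$, we obtain the same estimate for $\Sigma^{(3)} \cap O_{\bar\xi}$.
– Let $O := O_\xi \cup O_{\bar\xi}$. For $z \in \Sigma^{(3)}_{-1} \cap (\mathbb{C} \setminus O)$, $v_R(z) = v^{(3)}(z)$. Thus we need an estimate for $v^{(3)}_{21}(z) = se^{2tf(z)}\delta^{-2}(z)$. Since $|\delta(z)| = e^{-\nu\theta}$, where $\theta = \arg\frac{z-\xi}{z-\bar\xi}$, $|\delta(z)|$ and $|\delta^{-1}(z)|$ are uniformly bounded. When $1 + \delta \le \frac{2t}{k}$, $\mathrm{dist}(\Sigma^{(3)}_{-1} \cap O^c, \{\xi, \bar\xi\})$ is uniformly bounded below. From this fact, one can check that we can take $\Sigma^{(3)}_{-1}$ so that $\mathrm{Re}(f(z)) \le -c_0(\epsilon)$ for $z \in \Sigma^{(3)}_{-1} \cap O^c$ for some constant $c_0(\epsilon) > 0$ depending on $\epsilon$. Hence we have
$$\|v_R - I\|_{L^\infty(\Sigma^{(3)}_{-1} \cap O^c)} \le Ce^{-c_0(\epsilon)t}, \qquad 1 + \delta \le \frac{2t}{k}. \tag{5.88}$$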
On the other hand, when $1 + \frac{M_0}{2^{1/3}k^{2/3}} \le \frac{2t}{k} < 1 + \delta$, we take $\Sigma^{(3)}_{-1} = \{\rho(\theta)e^{i\theta} : |\theta| < \theta_c\}$ such that
(i) For $z \in \Sigma^{(3)}_{-1}$ with $\frac{2\pi}{3} < |\arg(z)| < \theta_c$, $\Sigma^{(3)}_{-1}$ is a pair of straight lines which meet the unit circle at $\xi$ and $\bar\xi$, respectively, with angle $\pi/3$.
(ii) For $\rho e^{i\theta} \in \Sigma^{(3)}_{-1}$ with $|\theta| \le \frac{2\pi}{3}$, $\mathrm{Re}(f(\rho e^{i\theta})) \le \mathrm{Re}(f(\rho' e^{i\theta}))$ for $\rho \le \rho' \le 1$. Also $\rho(\theta)$ is an increasing function for $0 < \theta < \frac{2\pi}{3}$ and a decreasing function for $-\frac{2\pi}{3} < \theta < 0$.
(Here the precise value $\frac{2\pi}{3}$ is of no importance: any angle between $\pi/2$ and $\pi$ will do.) Condition (ii) can be achieved by choosing $\Sigma^{(3)}_{-1}$ always to be above the curve $\{\rho_\theta e^{i\theta} : \frac{\pi}{2} < \theta \le \theta_c\}$ (recall (5.38) and Figs. 1, 3). Condition (i) can be achieved as the curve $\{\rho_\theta e^{i\theta} : \frac{\pi}{2} < \theta \le \theta_c\}$ crosses the unit circle at 90 degrees (see Fig. 3). For $z$ in (i) satisfying $\arg(z) > 0$, we have $z = \xi(1 - ire^{-\frac{\pi}{3}i})$ for some real $r > 0$. We note that $r \le \frac{\sqrt{3}}{2} < \frac{2}{\sqrt{3}}$. For such $z$, we have (recall (5.35))
$$\mathrm{Re}(f(z)) = A(r)\sqrt{1-\eta^2} + B(r)\eta, \tag{5.89}$$
where
$$A(r) = \frac{r^2(r - \sqrt{3})}{4(1 - \sqrt{3}r + r^2)}, \qquad B(r) = -\frac{r(r - \sqrt{3})(2 - \sqrt{3}r)}{4(1 - \sqrt{3}r + r^2)} + \frac{1}{2}\log(1 - \sqrt{3}r + r^2). \tag{5.90}$$
One can easily check that $A(r) < 0$ for $0 < r < \sqrt{3}$ and $B(r) < 0$ for $0 < r < \frac{2}{\sqrt{3}}$. Thus for $z$ in (i) satisfying $\arg(z) > 0$, we have for some $c > 0$,
$$\mathrm{Re}(f(z)) \le A(r)\sqrt{1-\eta^2} \le -cr^2\sqrt{1-\eta^2}. \tag{5.91}$$
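The closed forms (5.89) and (5.90) can be cross-checked against a direct evaluation of $\mathrm{Re}f$ along the ray $z = \xi(1 - ire^{-\pi i/3})$. This sketch assumes $f(z) = \frac{1}{2}(z - z^{-1}) + \eta\log z$ and $\xi = -\eta + i\sqrt{1-\eta^2}$, which is how we read the exponents in (5.78); both are assumptions of the illustration, not definitions quoted from the text. It also tests the sign claims $A(r) < 0$ and $B(r) < 0$ on a grid.

```python
import cmath
import math

SQRT3 = math.sqrt(3.0)


def re_f(z: complex, eta: float) -> float:
    """Re f(z) for the assumed f(z) = (z - 1/z)/2 + eta*log z; Re log z = log|z| on any branch."""
    return (0.5 * (z - 1 / z)).real + eta * math.log(abs(z))


def A(r: float) -> float:
    return r ** 2 * (r - SQRT3) / (4 * (1 - SQRT3 * r + r ** 2))


def B(r: float) -> float:
    d = 1 - SQRT3 * r + r ** 2
    return -r * (r - SQRT3) * (2 - SQRT3 * r) / (4 * d) + 0.5 * math.log(d)


def ray_point(r: float, eta: float) -> complex:
    """z = xi (1 - i r e^{-i pi/3}) with the assumed xi = -eta + i sqrt(1 - eta^2)."""
    xi = complex(-eta, math.sqrt(1 - eta ** 2))
    return xi * (1 - 1j * r * cmath.exp(-1j * math.pi / 3))


def max_formula_error(eta: float, n: int = 200) -> float:
    """Largest |Re f(z) - (A(r) sqrt(1-eta^2) + B(r) eta)| over r in (0, sqrt(3)/2]."""
    err = 0.0
    for i in range(1, n + 1):
        r = (SQRT3 / 2) * i / n
        direct = re_f(ray_point(r, eta), eta)
        closed = A(r) * math.sqrt(1 - eta ** 2) + B(r) * eta
        err = max(err, abs(direct - closed))
    return err


print(max_formula_error(0.3), max_formula_error(0.9))  # round-off level
```

Near $r = 0$ the function $B$ is very small (of size about $10^{-5}$ at $r = 0.1$), so its sign there is a delicate but still well-resolved floating-point question.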
For $z$ in (ii), note first that for fixed $0 < \rho < 1$, $\mathrm{Re}(f(\rho e^{i\theta}))$ is an increasing function of $\theta$ in $0 \le \theta < \pi$. Let $z_b$ be the point on $\Sigma^{(3)}_{-1}$ satisfying $\arg(z_b) = \frac{2\pi}{3}$. Thus together with condition (ii), we obtain for $z$ in (ii) satisfying $\arg(z) > 0$,
$$\mathrm{Re}(f(z)) \le \mathrm{Re}(f(z_b)) \le -c|z_b - \xi|^2\sqrt{1-\eta^2} \le -c|z - \xi|^2\sqrt{1-\eta^2}. \tag{5.92}$$
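Here the second inequality follows from (5.91). Thus we have for $z \in \Sigma^{(3)}_{-1} \cap O_\xi^c$ with $\arg(z) > 0$,
$$|v_R(z) - I| \le \begin{cases} Ce^{-c_0(\epsilon)t}, & 1 + \delta \le \frac{2t}{k}, \\ Ce^{-ct|z-\xi|^2\sqrt{1-\eta^2}}, & 1 + \frac{M_0}{2^{1/3}k^{2/3}} \le \frac{2t}{k} < 1 + \delta. \end{cases} \tag{5.93}$$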
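By symmetry, we have similar estimates for $z \in \Sigma^{(3)}_{-1}$ with $\arg(z) < 0$. Since $|z - \xi| > cP$ for $z \in O^c$, the above estimates imply in particular that
$$\|v_R - I\|_{L^\infty(\Sigma^{(3)}_{-1} \cap (\mathbb{C} \setminus O))} \le \begin{cases} Ce^{-c\epsilon^2M_0^{3/2}}, & 1 + \frac{M_0}{2^{1/3}k^{2/3}} \le \frac{2t}{k} < 1 + \delta, \\ Ce^{-c_0(\epsilon)t}, & 1 + \delta \le \frac{2t}{k}. \end{cases} \tag{5.94}$$
For $\Sigma^{(3)}_{+1} \cap (\mathbb{C} \setminus O)$, by the symmetry $\mathrm{Re}(f(\rho e^{i\theta})) = -\mathrm{Re}(f(\rho^{-1}e^{i\theta}))$, we have the same estimate. Also by a similar argument, we obtain a similar estimate for $(\Sigma^{(3)}_{-2} \cup \Sigma^{(3)}_{+2}) \cap (\mathbb{C} \setminus O)$.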
As usual, define an operator on $L^2(\Sigma_R)$,
$$C_{v_R}(f) = C_-(f(v_R - I)), \tag{5.95}$$
where $C_-$ is the Cauchy operator
$$(C_-f)(z) = \lim_{z' \to z}\frac{1}{2\pi i}\int_{\Sigma_R}\frac{f(s)}{s - z'}\,ds, \qquad z \in \Sigma_R, \tag{5.96}$$
where $z'$ is on the $-$ side of $\Sigma_R$. As the Cauchy operator is scale invariant, $C_-$ is bounded from $L^2(\Sigma_R) \to L^2(\Sigma_R)$ uniformly for $\frac{2t}{k} \ge 1 + \frac{M_0}{2^{1/3}k^{2/3}}$, and we have $\|C_{v_R}\| < \frac{1}{2}$ for $t$, $M_0$ sufficiently large by (5.76), (5.87) and (5.94). Hence $1 - C_{v_R}$ is invertible. By standard facts in Riemann–Hilbert theory (see [11, 6]), the solution $R(z)$ to the RHP $(\Sigma_R, v_R)$ is given by
$$R(z) = I + \frac{1}{2\pi i}\int_{\Sigma_R}\frac{\left(I + (1 - C_{v_R})^{-1}C_{v_R}I\right)(s)\,(v_R - I)(s)}{s - z}\,ds. \tag{5.97}$$
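As $m_p(0) = I$, we have $m^{(3)}_{11}(0) = R_{11}(0)$. By using $\mathrm{dist}(0, \Sigma_R) > 0$, $\|(1 - C_{v_R})^{-1}\| \le c$, and $\|C_-\| \le c$, we have
$$\begin{aligned} |m^{(3)}_{11}(0) - 1| &\le c\|v_R - I\|_{L^1(\Sigma_R)} + c\|(1 - C_{v_R})^{-1}C_{v_R}I\|_{L^2(\Sigma_R)}\|v_R - I\|_{L^2(\Sigma_R)} \\ &\le c\|v_R - I\|_{L^1} + c\|(1 - C_{v_R})^{-1}\|_{L^2 \to L^2}\|C_-(v_R - I)\|_{L^2}\|v_R - I\|_{L^2} \\ &\le c\|v_R - I\|_{L^1} + c\|v_R - I\|_{L^2}^2 \\ &\le c\|v_R - I\|_{L^1} + c\|v_R - I\|_{L^\infty}\|v_R - I\|_{L^1} \\ &\le c\|v_R - I\|_{L^1(\Sigma_R)} \end{aligned} \tag{5.98}$$
as $\|v_R - I\|_{L^\infty}$ is bounded. We estimate $\|v_R - I\|_{L^1}$ in each part of $\Sigma_R$. First, for $\partial O$ and $\Sigma^{(3)} \cap O$, since the length of the contour is of order $P$, we obtain by (5.76), (5.87)
$$\|v_R - I\|_{L^1(\Sigma_R \cap O)} \le CP\left(\frac{1}{\min(M_0^{3/4}, \sqrt{t})} + \epsilon\right). \tag{5.99}$$
When $1 + \frac{M_0}{2^{1/3}k^{2/3}} \le \frac{2t}{k} < 1 + \delta$, by (5.71), $P = \epsilon\sqrt{1-\eta^2}$. When $1 + \delta \le \frac{2t}{k}$, $\sqrt{1-\eta^2} = \sqrt{1 - (k/(2t))^2} \ge C$, and hence we have $P = \epsilon \le c\sqrt{1-\eta^2}$. Thus in both cases, we obtain
$$\|v_R - I\|_{L^1(\Sigma_R \cap O)} \le C\sqrt{1-\eta^2}\left(\frac{1}{\min(M_0^{3/4}, \sqrt{t})} + \epsilon\right). \tag{5.100}$$
Now we compute $\|v_R - I\|_{L^1(\Sigma_R \cap O^c)}$. We first focus on $\Sigma^{(3)}_{-1} \cap O^c \cap \{\mathrm{Im}(z) > 0\}$. When $1 + \delta \le \frac{2t}{k}$, by (5.93),
$$\|v_R - I\|_{L^1(\Sigma^{(3)}_{-1} \cap O^c \cap \{\mathrm{Im}(z) > 0\})} \le Ce^{-c_0(\epsilon)t} \le \frac{C}{\sqrt{t}}\sqrt{1-\eta^2}, \tag{5.101}$$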
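for large $t$ as $\sqrt{1-\eta^2} \ge C$ in this case. When $1 + \frac{M_0}{2^{1/3}k^{2/3}} \le \frac{2t}{k} < 1 + \delta$, from (5.93),
$$\begin{aligned} \|v_R - I\|_{L^1(\Sigma^{(3)}_{-1} \cap O^c \cap \{\mathrm{Im}(z) > 0\})} &\le \int_{\Sigma^{(3)}_{-1} \cap O^c \cap \{\mathrm{Im}(z) > 0\}} Ce^{-ct\sqrt{1-\eta^2}|z-\xi|^2}\,|dz| \\ &\le C\int_P^\infty e^{-ct\sqrt{1-\eta^2}r^2}\,dr \le \frac{C}{(t\sqrt{1-\eta^2})^{1/2}}e^{-ct\sqrt{1-\eta^2}P^2}. \end{aligned} \tag{5.102}$$
But since, for $1 + \frac{M_0}{2^{1/3}k^{2/3}} \le \frac{2t}{k} < 1 + \delta$,
$$\sqrt{1-\eta^2} \ge \sqrt{1 - \frac{k}{2t}} = \left(\frac{k}{2t}\right)^{1/6}\frac{1}{(2t)^{1/3}}\left(k^{2/3}\left(\frac{2t}{k} - 1\right)\right)^{1/2} \ge \frac{CM_0^{1/2}}{t^{1/3}}, \tag{5.103}$$
we obtain
$$\|v_R - I\|_{L^1(\Sigma^{(3)}_{-1} \cap O^c \cap \{\mathrm{Im}(z) > 0\})} \le \frac{C}{M_0^{3/4}}e^{-c\epsilon^2M_0^{3/2}}\sqrt{1-\eta^2}. \tag{5.104}$$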
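Two steps in (5.102) and (5.103) are easy to check numerically. The Gaussian tail bound uses $\int_P^\infty e^{-ar^2}\,dr = \frac{\sqrt{\pi}}{2\sqrt{a}}\operatorname{erfc}(\sqrt{a}P) \le \frac{\sqrt{\pi}}{2\sqrt{a}}e^{-aP^2}$ with $a = ct\sqrt{1-\eta^2}$, and the factorization in (5.103) is an algebraic identity. The constant $C = (1+\delta)^{-1/6}2^{-1/2}$ used below for the lower bound is one admissible choice derived for this illustration; the text only asserts that some $C$ works, and the sample parameters are arbitrary.

```python
import math


def gaussian_tail(a: float, P: float) -> float:
    """Exact value of the integral of exp(-a r^2) over r in [P, infinity)."""
    return math.sqrt(math.pi) / (2 * math.sqrt(a)) * math.erfc(math.sqrt(a) * P)


def tail_bound(a: float, P: float) -> float:
    """Upper bound sqrt(pi)/(2 sqrt(a)) * exp(-a P^2), as in the last step of (5.102)."""
    return math.sqrt(math.pi) / (2 * math.sqrt(a)) * math.exp(-a * P * P)


def factorization_5103(k: float, t: float) -> tuple:
    """Both sides of the identity in (5.103): sqrt(1 - k/(2t)) and its factored form."""
    lhs = math.sqrt(1 - k / (2 * t))
    rhs = ((k / (2 * t)) ** (1 / 6) * (2 * t) ** (-1 / 3)
           * (k ** (2 / 3) * (2 * t / k - 1)) ** 0.5)
    return lhs, rhs


def lower_bound_5103(k: float, t: float, M0: float, delta: float) -> float:
    """C M0^(1/2) / t^(1/3) with the illustrative constant C = (1+delta)^(-1/6) / sqrt(2)."""
    return (1 + delta) ** (-1 / 6) / math.sqrt(2) * math.sqrt(M0) / t ** (1 / 3)


print(gaussian_tail(4.0, 1.0), tail_bound(4.0, 1.0))
print(factorization_5103(1e4, 0.6e4))
```

Cubing (5.103) gives $t(1-\eta^2)^{3/2} \ge C^3M_0^{3/2}$, which is what turns the prefactor and exponent of (5.102) into those of (5.104) when $P = \epsilon\sqrt{1-\eta^2}$.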
By a similar computation, we obtain the same estimate for the other parts of $\Sigma_R \cap O^c$. Thus if we take $\epsilon$ small, and then take $M_0$, $t$ large, we obtain by (5.98), (5.99) and (5.104),
$$|m^{(3)}_{11}(0) - 1| \le \alpha\sqrt{1-\eta^2}, \tag{5.105}$$
with a constant $\alpha > 0$ which can be taken to be arbitrarily small. Therefore, from (5.48), (5.98), using (5.105), we obtain (note (5.49)) for large $t$,
$$\log m_{11}(0) = \log m^{(3)}_{11}(0) + \left(1 - \frac{\theta_c}{\pi}\right)\log(1 - s^2) \le \alpha\sqrt{1-\eta^2} - c\sqrt{1-\eta^2} \le -C\sqrt{1-\eta^2}, \tag{5.106}$$
for some $C > 0$, which is (5.18).

6. Multi-Painlevé Functions

In this section we will show that the multi-interval case considered in Theorem 2.12 is related to new classes of “multi-Painlevé functions”. As we will see, these functions describe the interaction of solutions of Painlevé equations in a way which is strongly reminiscent of the interaction of classical solitons. We suggest the name “Painlevétons” or simply “P-tons” for these functions. In this section we only illustrate a few of the properties of P-tons. The general theory will be developed in a subsequent paper together with Alexander Its.
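From Theorem 2.12, in the $k$ interval case,
$$\sum_{j=1}^{k}(s_j - s_{j-1})K_{n_j}(z, w) = \frac{\sum_{l=0}^{k}f_l(z)g_l(w)}{z - w}, \tag{6.1}$$
where
$$f = (f_0, \ldots, f_k)^T = \left(s_k, (s_1 - s_0)\varphi(z)z^{n_1}, \ldots, (s_k - s_{k-1})\varphi(z)z^{n_k}\right)^T, \tag{6.2}$$
$$g = (g_0, \ldots, g_k)^T = (2\pi i)^{-1}\left(1, -(\varphi(z)z^{n_1})^{-1}, \ldots, -(\varphi(z)z^{n_k})^{-1}\right)^T. \tag{6.3}$$
Thus by the integrable operator theory [21, 12], the associated jump matrix $v$ on $\Sigma = \{|z| = 1\}$ has the form
$$v = I - 2\pi i\,fg^T = \begin{pmatrix} 1 - s_k & s_k(\varphi z^{n_1})^{-1} & \cdots & s_k(\varphi z^{n_k})^{-1} \\ -(s_1 - s_0)\varphi z^{n_1} & & & \\ \vdots & & \left(\delta_{pq} + (s_p - s_{p-1})z^{n_p - n_q}\right)_{1 \le p,q \le k} & \\ -(s_k - s_{k-1})\varphi z^{n_k} & & & \end{pmatrix}. \tag{6.4}$$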
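For purposes of illustration, we will only consider the case when $k = 2$ and $\varphi = e^{t(z - z^{-1})}$ as in the Introduction. Then $v$ takes the form
$$v = v^{(3)} = \begin{pmatrix} 1 - s_2 & s_2(\varphi z^{n_1})^{-1} & s_2(\varphi z^{n_2})^{-1} \\ -s_1\varphi z^{n_1} & 1 + s_1 & s_1z^{n_1 - n_2} \\ -(s_2 - s_1)\varphi z^{n_2} & (s_2 - s_1)z^{n_2 - n_1} & 1 + s_2 - s_1 \end{pmatrix}. \tag{6.5}$$
Observe now that when $s_1 = 0$, the jump matrix takes the form
$$v = v^{(3)} = \begin{pmatrix} 1 - s_2 & s_2(\varphi z^{n_1})^{-1} & s_2(\varphi z^{n_2})^{-1} \\ 0 & 1 & 0 \\ -s_2\varphi z^{n_2} & s_2z^{n_2 - n_1} & 1 + s_2 \end{pmatrix}. \tag{6.6}$$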
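The passage from (6.2)–(6.4) to (6.5) and (6.6) is a direct matrix computation. The sketch below builds $v = I - 2\pi i\,fg^T$ from the vectors $f$, $g$ of (6.2) and (6.3) at a sample point of the unit circle and compares it entrywise with (6.5). Reading (6.5) as the case $s_0 = 0$ is our assumption, since its first column has $s_1$ where (6.4) has $s_1 - s_0$. The sample values of $z$, $t$, $s_j$, $n_j$ are arbitrary.

```python
import cmath


def jump_from_fg(z: complex, t: float, s: list, n: list) -> list:
    """v = I - 2*pi*i f g^T with f, g as in (6.2)-(6.3); s = [s_0, ..., s_k], n = [n_1, ..., n_k]."""
    phi = cmath.exp(t * (z - 1 / z))
    k = len(n)
    f = [s[k]] + [(s[p] - s[p - 1]) * phi * z ** n[p - 1] for p in range(1, k + 1)]
    g = [1.0] + [-1 / (phi * z ** n[p - 1]) for p in range(1, k + 1)]
    g = [x / (2j * cmath.pi) for x in g]
    return [[(1.0 if p == q else 0.0) - 2j * cmath.pi * f[p] * g[q] for q in range(k + 1)]
            for p in range(k + 1)]


def jump_65(z: complex, t: float, s1: float, s2: float, n1: int, n2: int) -> list:
    """The 3x3 jump matrix written out in (6.5)."""
    phi = cmath.exp(t * (z - 1 / z))
    a, b = phi * z ** n1, phi * z ** n2
    return [[1 - s2, s2 / a, s2 / b],
            [-s1 * a, 1 + s1, s1 * z ** (n1 - n2)],
            [-(s2 - s1) * b, (s2 - s1) * z ** (n2 - n1), 1 + s2 - s1]]


def max_diff(A: list, B: list) -> float:
    """Largest entrywise distance between two matrices given as nested lists."""
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))


z = cmath.exp(0.7j)
print(max_diff(jump_from_fg(z, 1.3, [0.0, 0.3, 0.8], [5, 9]), jump_65(z, 1.3, 0.3, 0.8, 5, 9)))
```

Setting `s1 = 0` in `jump_65` reproduces the middle row $(0, 1, 0)$ of (6.6).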
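Let $m^{(3)}$ be the solution of the $3 \times 3$ RHP
$$m^{(3)}_+ = m^{(3)}_-v^{(3)}, \quad z \in \Sigma, \qquad m^{(3)} \to I \quad \text{as } z \to \infty. \tag{6.7}$$
But it is clear that the $2 \times 2$ matrix $m^{(2)}$ constructed from $m^{(3)}$ as follows,
$$m^{(2)} = \begin{pmatrix} m^{(3)}_{11} & m^{(3)}_{13} \\ m^{(3)}_{31} & m^{(3)}_{33} \end{pmatrix}, \tag{6.8}$$
solves the RHP
$$m^{(2)}_+ = m^{(2)}_-\begin{pmatrix} 1 - s_2 & s_2(\varphi z^{n_2})^{-1} \\ -s_2\varphi z^{n_2} & 1 + s_2 \end{pmatrix}, \quad z \in \Sigma, \qquad m^{(2)} \to I \quad \text{as } z \to \infty, \tag{6.9}$$
which is an RHP which is algebraically equivalent to the RHP for Painlevé III (PIII) which occurred in [3]: set
$$\begin{aligned} \tilde m^{(2)} &= \begin{pmatrix} \sqrt{1+s_2} & 0 \\ 0 & \frac{1}{\sqrt{1+s_2}} \end{pmatrix} m^{(2)} \begin{pmatrix} \sqrt{1+s_2} & 0 \\ 0 & \frac{1}{\sqrt{1+s_2}} \end{pmatrix}, && |z| < 1, \\ \tilde m^{(2)} &= \begin{pmatrix} \sqrt{1+s_2} & 0 \\ 0 & \frac{1}{\sqrt{1+s_2}} \end{pmatrix} m^{(2)} \begin{pmatrix} \frac{1}{\sqrt{1+s_2}} & 0 \\ 0 & \sqrt{1+s_2} \end{pmatrix}, && |z| > 1. \end{aligned} \tag{6.10}$$
Then $\tilde m^{(2)}$ solves the RHP
$$\tilde m^{(2)}_+ = \tilde m^{(2)}_-\begin{pmatrix} 1 - s_2^2 & s_2(\varphi z^{n_2})^{-1} \\ -s_2\varphi z^{n_2} & 1 \end{pmatrix}, \quad z \in \Sigma, \qquad \tilde m^{(2)} \to I \quad \text{as } z \to \infty, \tag{6.11}$$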
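which is the RHP for PIII considered in [18]. On the other hand, if $s_1 = s_2 = s$, then
$$v = v^{(3)} = \begin{pmatrix} 1 - s & s(\varphi z^{n_1})^{-1} & s(\varphi z^{n_2})^{-1} \\ -s\varphi z^{n_1} & 1 + s & sz^{n_1 - n_2} \\ 0 & 0 & 1 \end{pmatrix}. \tag{6.12}$$
Now
$$m^{(2)} = \begin{pmatrix} m^{(3)}_{11} & m^{(3)}_{12} \\ m^{(3)}_{21} & m^{(3)}_{22} \end{pmatrix} \tag{6.13}$$
solves the RHP
$$m^{(2)}_+ = m^{(2)}_-\begin{pmatrix} 1 - s & s(\varphi z^{n_1})^{-1} \\ -s\varphi z^{n_1} & 1 + s \end{pmatrix}, \quad z \in \Sigma, \qquad m^{(2)} \to I \quad \text{as } z \to \infty, \tag{6.14}$$
which again is the (equivalent) RHP for PIII. Also if we set $n_1 = n_2 = n$,
$$v = v^{(3)} = \begin{pmatrix} 1 - s_2 & s_2(\varphi z^n)^{-1} & s_2(\varphi z^n)^{-1} \\ -s_1\varphi z^n & 1 + s_1 & s_1 \\ -(s_2 - s_1)\varphi z^n & s_2 - s_1 & 1 + s_2 - s_1 \end{pmatrix}. \tag{6.15}$$
Conjugating the solution $m^{(3)}$ of the RHP associated with $v^{(3)}$ by
$$m^{(3)} \to \tilde m^{(3)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} m^{(3)} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}^{-1}, \tag{6.16}$$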
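we find that $\tilde m^{(3)} \to I$ as $z \to \infty$, and $\tilde m^{(3)}$ solves a RHP with jump matrix
$$\tilde v^{(3)} = \begin{pmatrix} 1 - s_2 & 0 & s_2(\varphi z^n)^{-1} \\ -s_1\varphi z^n & 1 & s_1 \\ -s_2\varphi z^n & 0 & 1 + s_2 \end{pmatrix}. \tag{6.17}$$
It follows that necessarily
$$\left(\tilde m^{(3)}_{12}\ \ \tilde m^{(3)}_{22}\ \ \tilde m^{(3)}_{32}\right)^T = (0\ \ 1\ \ 0)^T, \tag{6.18}$$
and hence
$$m^{(2)} = \begin{pmatrix} \tilde m^{(3)}_{11} & \tilde m^{(3)}_{13} \\ \tilde m^{(3)}_{31} & \tilde m^{(3)}_{33} \end{pmatrix} \tag{6.19}$$
solves the RHP
$$m^{(2)}_+ = m^{(2)}_-\begin{pmatrix} 1 - s_2 & s_2(\varphi z^n)^{-1} \\ -s_2\varphi z^n & 1 + s_2 \end{pmatrix}, \quad z \in \Sigma, \qquad m^{(2)} \to I \quad \text{as } z \to \infty, \tag{6.20}$$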
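which is again the (equivalent) RHP for PIII. The analogy with solitons is particularly clear if we consider $v^{(3)}$ in the edge scaling limit,
$$n_j = 2\tau + t_j\tau^{1/3}, \quad j = 1, 2; \qquad t_1 < t_2, \tag{6.21}$$
as $\tau \to \infty$. Then
$$v^{(3)}\left(-1 + \frac{2iu}{\tau^{1/3}}\right) \to \hat v^{(3)}(u) = \begin{pmatrix} 1 - s_2 & s_2e^{2i\theta_1} & s_2e^{2i\theta_2} \\ -s_1e^{-2i\theta_1} & 1 + s_1 & s_1e^{-2i(\theta_1 - \theta_2)} \\ -(s_2 - s_1)e^{-2i\theta_2} & (s_2 - s_1)e^{2i(\theta_1 - \theta_2)} & 1 + s_2 - s_1 \end{pmatrix} \tag{6.22}$$
on the real line, where
$$\theta_j = \frac{4}{3}u^3 + t_ju, \qquad j = 1, 2. \tag{6.23}$$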
In addition to varying $s_1$, $s_2$, we can now vary $t_1$, $t_2$. In particular, we can follow the trajectory of the solution of the RHP as $t_2$ moves from $t_1$ to $\infty$. As $t_2 \to t_1$, the solution becomes Painlevé II (PII), and as $t_2 \to \infty$, it gives rise to another solution of PII, but now with a phase shift (see [2]). It is this behavior of P-tons, in particular, that is reminiscent of soliton interactions.
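The two reductions to the $2 \times 2$ PIII problem can be checked in the same way. The first function below verifies that the diagonal scaling in (6.10) turns the jump in (6.9) into the jump in (6.11), assuming (as for a counterclockwise $\Sigma$) that the $+$ side is $|z| < 1$, so that $\tilde m^{(2)}_+ = \tilde m^{(2)}_-(DvD)$ with $D = \mathrm{diag}(\sqrt{1+s_2}, 1/\sqrt{1+s_2})$. The second verifies that conjugating (6.15) by the matrix in (6.16) produces (6.17). The sample values are arbitrary.

```python
import cmath
import math


def matmul(A: list, B: list) -> list:
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def max_diff(A: list, B: list) -> float:
    """Largest entrywise distance between two matrices given as nested lists."""
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))


def scaling_check(z: complex, t: float, s2: float, n2: int) -> float:
    """Distance between D v D (v the jump in (6.9)) and the jump matrix in (6.11)."""
    a = cmath.exp(t * (z - 1 / z)) * z ** n2
    v = [[1 - s2, s2 / a], [-s2 * a, 1 + s2]]
    r = math.sqrt(1 + s2)
    D = [[r, 0], [0, 1 / r]]
    expected = [[1 - s2 ** 2, s2 / a], [-s2 * a, 1]]
    return max_diff(matmul(matmul(D, v), D), expected)


def conjugation_check(z: complex, t: float, s1: float, s2: float, n: int) -> float:
    """Distance between E v E^{-1} (v from (6.15), E from (6.16)) and (6.17)."""
    a = cmath.exp(t * (z - 1 / z)) * z ** n
    v = [[1 - s2, s2 / a, s2 / a],
         [-s1 * a, 1 + s1, s1],
         [-(s2 - s1) * a, s2 - s1, 1 + s2 - s1]]
    E = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]
    E_inv = [[1, 0, 0], [0, 1, 0], [0, -1, 1]]
    expected = [[1 - s2, 0, s2 / a],
                [-s1 * a, 1, s1],
                [-s2 * a, 0, 1 + s2]]
    return max_diff(matmul(matmul(E, v), E_inv), expected)


z = cmath.exp(1.1j)
print(scaling_check(z, 0.8, 0.4, 7), conjugation_check(z, 0.8, 0.25, 0.6, 7))
```

Because the second column of (6.17) is the unit vector $(0, 1, 0)^T$, the corresponding column of $\tilde m^{(3)}$ has no jump, which is the content of (6.18).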
7. Colored Permutations

First, the definition: Let $\pi$ be an $m$-colored permutation (see, e.g., [32]), and assume the colors are indexed by $0, 1, \ldots, m - 1$. Let $S$ be a subsequence of length $l$ of $\pi$ which is a union of monochromatic increasing subsequences; let $k_i$ be the number of these sequences having color $i$, and set $k = \sum_i k_i$. Note that the monochromatic increasing subsequences may be empty, but the color of empty subsequences still matters. We assign to $S$ the following score:
$$ml + \binom{k+1}{2} + \sum_{0 \le i \le m-1}\left(ik_i - m\binom{k_i+1}{2}\right). \tag{7.1}$$
Now, let $l_k(\pi)$ be the maximum score over all unions of $k$ monochromatic increasing subsequences (note $l_0(\pi) = 0$). We then define
$$\lambda_k(\pi) := l_k(\pi) - l_{k-1}(\pi). \tag{7.2}$$
Lemma 7.1. Let $\lambda^{(i)}_k(\pi)$ be the partition associated to just the $i$-colored subsequence of $\pi$. Then $\lambda_k(\pi) - k$ is simply the $k$th largest of the numbers $m(\lambda^{(i)}_j(\pi) - j) + i$. Moreover, if $\pi$ has length $n$, then $\lambda_k(\pi)$ is a partition of $mn$.

Proof. Fix a composition $(k_i)$ of $k$, and consider the largest score associated to that composition. Clearly, we can maximize the score for each color independently; we thus obtain:
$$ml + \binom{k+1}{2} + \sum_{0 \le i \le m-1}\left(ik_i - m\binom{k_i+1}{2}\right) = \binom{k+1}{2} + \sum_{0 \le i \le m-1}\sum_{1 \le j \le k_i}\left(m(\lambda^{(i)}_j(\pi) - j) + i\right). \tag{7.3}$$
Now, for a fixed value of $k$, this is clearly maximized when the values $m(\lambda^{(i)}_j(\pi) - j) + i$ occurring in the sum are chosen to be as large as possible. Plugging the resulting value of $l_k(\pi)$ into the formula for $\lambda_k(\pi)$, we obtain the first claim.

Note that the numbers $m(\lambda^{(i)}_j(\pi) - j) + i$ are all different (the congruence class modulo $m$ depends on the color, and the numbers are distinct within a given color). Furthermore, we readily verify that for each congruence class, the number of negative numbers not occurring in the set is equal to the number of nonnegative numbers occurring in the set. We thus conclude that $\lambda_k(\pi)$ is indeed a partition. It remains to verify that $\sum_k \lambda_k(\pi) = mn$; in other words, $l_k(\pi) = mn$ for $k$ sufficiently large. Choose $k$ such that $\pi$ is a union of $k$ increasing subsequences, and consider $l_{mk}(\pi)$. We readily verify that the term
$$\sum_{0 \le i \le m-1}\left(ik_i - m\binom{k_i+1}{2}\right) \tag{7.4}$$
is maximized when all $k_i$ are equal to $k$, and thus the optimal score differs from $mn$ by
$$\binom{mk+1}{2} + \sum_{0 \le i \le m-1}\left(ik - m\binom{k+1}{2}\right) = 0. \tag{7.5}$$
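Lemma 7.1 can be tested by brute force on small examples. The sketch below computes $l_k(\pi)$ by maximizing the score (7.1) over all compositions $(k_0, \ldots, k_{m-1})$ of $k$, using for each color Greene's theorem (the largest union of $k_i$ increasing subsequences has size equal to the sum of the first $k_i$ rows of the RSK shape). It then compares $\lambda_k(\pi) - k$ with the $k$th largest of the numbers $m(\lambda^{(i)}_j - j) + i$ and checks that the $\lambda_k(\pi)$ form a partition of $mn$. The random colored permutation and all function names are illustrative.

```python
import bisect
import random


def rsk_shape(seq: list) -> list:
    """Row lengths of the RSK insertion tableau of a sequence of distinct numbers."""
    rows = []
    for x in seq:
        for row in rows:
            pos = bisect.bisect_left(row, x)
            if pos == len(row):
                row.append(x)
                break
            row[pos], x = x, row[pos]  # bump the displaced entry into the next row
        else:
            rows.append([x])
    return [len(row) for row in rows]


def compositions(k: int, m: int):
    """All tuples (k_0, ..., k_{m-1}) of nonnegative integers summing to k."""
    if m == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, m - 1):
            yield (first,) + rest


def tri(x: int) -> int:
    return x * (x + 1) // 2  # binomial(x + 1, 2)


def l_k(shapes: list, m: int, k: int) -> int:
    """Maximum of the score (7.1) over unions of k monochromatic increasing subsequences."""
    best = None
    for comp in compositions(k, m):
        length = sum(sum(shapes[i][:comp[i]]) for i in range(m))  # Greene's theorem per color
        score = m * length + tri(k) + sum(i * comp[i] - m * tri(comp[i]) for i in range(m))
        best = score if best is None else max(best, score)
    return best


def lemma_values(shapes: list, m: int, k: int) -> list:
    """The k largest numbers m(lambda^(i)_j - j) + i, with lambda^(i)_j = 0 past the last row."""
    vals = [m * ((shapes[i][j - 1] if j <= len(shapes[i]) else 0) - j) + i
            for i in range(m) for j in range(1, k + 1)]
    return sorted(vals, reverse=True)[:k]


random.seed(1)
m, n = 3, 12
values = random.sample(range(n), n)
colors = [random.randrange(m) for _ in range(n)]
shapes = [rsk_shape([v for v, c in zip(values, colors) if c == i]) for i in range(m)]
K = m * max(len(s) for s in shapes)
ls = [l_k(shapes, m, k) for k in range(K + 1)]
lam = [ls[k] - ls[k - 1] for k in range(1, K + 1)]
print(lam, sum(lam), m * n)
```

The cutoff $K = m \cdot \max_i(\text{number of rows of } \lambda^{(i)})$ mirrors the choice of $l_{mk}(\pi)$ in the proof.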
Remark 7.2. An alternate approach is to define $\lambda_k(\pi)$ via the Schensted correspondence for rim-hook permutations given in [34], at which point the lemma follows immediately. The fact that the rim-hook correspondence splits into $m$ ordinary correspondences gives the increasing subsequence interpretation above.

Now, suppose we choose $n$ randomly according to a Poisson law of mean $mt^2$, and then choose an $m$-colored permutation of length $n$ at random. Equivalently, take $m$ independent Poisson processes in the unit square (one for each color), and convert the resulting point set to a colored permutation. We thus see that the resulting random partitions $\lambda^{(i)}_j(\pi)$ are independent, and are all distributed according to the law for ordinary permutations. In particular, we obtain the following correlation kernel:
$$S^{(m)}(a, b) = \sum_{k \ge 1}(\varphi^{-1})_{(a+k)/m}\,\varphi_{(b+k)/m}, \tag{7.6}$$
where
$$\varphi(z) = e^{t(z - z^{-1})}. \tag{7.7}$$
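To make (7.6) concrete, one can evaluate it with $\varphi_a = J_a(2t)$ and $(\varphi^{-1})_a = J_a(-2t) = (-1)^aJ_a(2t)$, which follows from the Bessel generating function $e^{\frac{x}{2}(z - z^{-1})} = \sum_{a \in \mathbb{Z}}J_a(x)z^a$, assuming $\varphi_a$ denotes the coefficient of $z^a$. The sketch truncates the sum in (7.6) and checks two consequences of the formula: $S^{(m)}(a, b) = 0$ unless $a \equiv b \pmod m$, and for $a \equiv b$ it equals the $m = 1$ kernel at $(\lfloor a/m \rfloor, \lfloor a/m \rfloor + (b - a)/m)$, reflecting that each color behaves like an ordinary permutation. The truncation parameter `K` is an arbitrary choice.

```python
import math


def bessel_j(n: int, x: float, terms: int = 80) -> float:
    """Integer-order Bessel function J_n(x) from its power series; J_{-n} = (-1)^n J_n."""
    if n < 0:
        return (-1) ** (-n) * bessel_j(-n, x, terms)
    term = (x / 2) ** n / math.factorial(n)  # the s = 0 term
    total = term
    for s in range(terms):
        term *= -(x / 2) ** 2 / ((s + 1) * (s + 1 + n))
        total += term
    return total


def phi_coeff(a: int, t: float) -> float:
    """Coefficient of z^a in exp(t (z - 1/z)), namely J_a(2t)."""
    return bessel_j(a, 2 * t)


def phi_inv_coeff(a: int, t: float) -> float:
    """Coefficient of z^a in exp(-t (z - 1/z)), namely J_a(-2t) = (-1)^a J_a(2t)."""
    return (-1) ** (a % 2) * bessel_j(a, 2 * t)


def kernel(a: int, b: int, t: float, m: int, K: int = 40) -> float:
    """Truncation of (7.6): sum over 1 <= k <= m*K with (a+k)/m and (b+k)/m both integers."""
    total = 0.0
    for k in range(1, m * K + 1):
        if (a + k) % m == 0 and (b + k) % m == 0:
            total += phi_inv_coeff((a + k) // m, t) * phi_coeff((b + k) // m, t)
    return total


print(kernel(0, 0, 1.0, 1), kernel(0, 0, 1.0, 3), kernel(0, 1, 1.0, 3))
```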
(Recall from Corollary 2.10 that $\varphi_a$ and $(\varphi^{-1})_a$ are $0$ for non-integral $a$.) Now by using Corollary 2.10 and Theorem 3.1 for the convergence of moments for ordinary permutations, we obtain the convergence of moments for the $\lambda_k$'s in the colored permutation setting. More precisely, as in (3.1), there is a limiting distribution $F^{\mathrm{color}(m)}$ such that
$$\lim_{N \to \infty}\mathbb{E}^{\mathrm{color}(m)}_N\prod_{j=1}^{k}\left(\frac{\lambda_j - 2\sqrt{mN}}{m^{2/3}(mN)^{1/6}}\right)^{a_j} = \mathbb{E}^{\mathrm{color}(m)}\left(x_1^{a_1}\cdots x_k^{a_k}\right), \tag{7.8}$$
where $\mathbb{E}^{\mathrm{color}(m)}_N$ denotes the expectation with respect to the natural counting measure on the colored permutations (see [32]), and $\mathbb{E}^{\mathrm{color}(m)}$ is the expectation with respect to $F^{\mathrm{color}(m)}$. The function $F^{\mathrm{color}(m)}(x_1, \ldots, x_k)$ has the following meaning in terms of GUE. Take $m$ GUE matrices of size $N$ at random, then superimpose their eigenvalues. We denote the largest of those superimposed numbers by $z_1(N)$, the second largest by $z_2(N)$, and so on. Then $F^{\mathrm{color}(m)}(x_1, \ldots, x_k)$ is the limiting distribution of $z_1, \ldots, z_k$ as $N \to \infty$, after appropriate centering and scaling.

A number of other statistical systems which are currently of interest can also be analyzed by the methods of this paper. In particular, we have in mind the random word problem [35, 24, 22, 23], certain 2-dimensional growth models [25], and also the so-called “digital boiling model” [20]. For example, in the growth model considered by Johansson in [25], let $\sigma = \cup_{j=1}^{k}\sigma_j$ be a union of $k$ disjoint increasing paths $\sigma_j$ in the model. Let $L^{(k)}(\sigma)$ be the sum of the lengths of the paths $\sigma_j$, and let $L^{(k)} = \max_\sigma L^{(k)}(\sigma)$. We define $\lambda_k = L^{(k)} - L^{(k-1)}$. The joint probability distribution for $\lambda_1, \ldots, \lambda_k$ can be obtained [25] by various differentiations of $\det(1 + \sum_{j=1}^{k}s_j\chi_{[n_j, n_{j-1})}S)$ with respect to $s_1, \ldots, s_k$ as in (1.12), with $\varphi$ now given by $\varphi(z) = (1 + \sqrt{q}z)^M(1 + \sqrt{q}z^{-1})^{-N}$. But now by Theorem 2.12, $\det(1 + \sum_{j=1}^{k}s_j\chi_{[n_j, n_{j-1})}S)$ can be expressed in terms of the determinant of an integrable operator as in (2.53). This opens up the possibility for the asymptotic analysis of the convergence of moments for the joint distribution. However, the associated RHP has a new feature, namely the weight function is non-real, which has not yet been addressed in general (however, see [27]). There are similar formulae for random words and digital boiling.

Note added in proof. After this paper was submitted to CMP and accepted, the authors received a paper from Harold Widom [W] in which he shows how to obtain the tail estimates (5.10), (5.11) using classical methods. These two estimates are the main information we need in order to prove the convergence of moments, Theorem 3.1. We also note that similar lower tail estimates are obtained for the last passage site percolation model with geometric random variables and the digital boiling model in [W] and [BDMMZ], respectively.

[W] Widom, H.: On Convergence of Moments for Random Young Tableaux and a Random Growth Model. math.CO/0108008; http://xxx.lanl.gov/abs/
[BDMMZ] Baik, J., Deift, P., Miller, P., McLaughlin, K. and Zhou, X.: In preparation

Acknowledgements. The authors would like to thank Xin Zhou for useful comments. The authors would also like to thank Albrecht Böttcher for pointing out a calculational error in an earlier version of the text. The work of the first author was supported in part by NSF Grant # DMS 97-29992. The work of the second author was supported in part by NSF Grant # DMS 00-03268, and also by the Guggenheim Foundation.
References
1. Abramowitz, M. and Stegun, I.: Handbook of Mathematical Functions. New York: Dover Publications, 1965
2. Baik, J., Deift, P. and Its, A.: In preparation
3. Baik, J., Deift, P. and Johansson, K.: On the distribution of the length of the longest increasing subsequence of random permutations. J. Am. Math. Soc. 12, (4), 1119–1178 (1999)
4. Baik, J., Deift, P. and Johansson, K.: On the distribution of the length of the second row of a Young diagram under Plancherel measure. Geom. Funct. Anal. 10, 4, 702–731 (2000)
5. Basor, E. and Widom, H.: On a Toeplitz determinant identity of Borodin and Okounkov. Integral Equations Operator Theory 37, 4, 397–401 (2000)
6. Beals, R. and Coifman, R.: Scattering and inverse scattering for first order systems. Comm. Pure Appl. Math. 37, 39–90 (1984)
7. Borodin, A. and Okounkov, A.: A Fredholm determinant formula for Toeplitz determinants. Integral Equations Operator Theory 37, 4, 386–396 (2000)
8. Borodin, A., Okounkov, A. and Olshanski, G.: On asymptotics of Plancherel measures for symmetric groups. J. Am. Math. Soc. 13, (3), 481–515 (2000)
9. Böttcher, A.: On the determinant formulas by Borodin, Okounkov, Baik, Deift, and Rains. math.FA/0101008; http://xxx.lanl.gov/abs/
10. Böttcher, A.: One more proof of the Borodin–Okounkov formula for Toeplitz determinants. Integral Equations Operator Theory 41, 1, 123–125 (2001)
11. Clancey, K. and Gohberg, I.: Factorization of Matrix Functions and Singular Integral Operators. Basel–Boston: Birkhäuser, 1981
12. Deift, P.: Integrable operators. Am. Math. Soc. Transl. Ser. 2 189, 69–84 (1999)
13. Deift, P., Kriecherbauer, T., McLaughlin, K., Venakides, S. and Zhou, X.: Strong asymptotics of orthogonal polynomials with respect to exponential weights. Comm. Pure Appl. Math. 52, (12), 1491–1552 (1999)
14. Deift, P., Kriecherbauer, T., McLaughlin, K., Venakides, S. and Zhou, X.: Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Comm. Pure Appl. Math. 52, (11), 1335–1425 (1999)
15. Deift, P., Venakides, S. and Zhou, X.: New results in small dispersion KdV by an extension of the steepest descent method for Riemann-Hilbert problems. Internat. Math. Res. Notices 6, 285–299 (1997)
16. Deift, P. and Zhou, X.: A steepest descent method for oscillatory Riemann-Hilbert problems. Asymptotics for the MKdV equation. Ann. of Math. 137, 295–368 (1993)
17. Deift, P. and Zhou, X.: Asymptotics for the Painlevé II equation. Comm. Pure Appl. Math. 48, 277–337 (1995)
18. Fokas, A., Mugan, U. and Zhou, X.: On the solvability of Painlevé I, III and V. Inverse Problems 8, 757–785 (1992)
19. Gessel, I.: Symmetric functions and P-recursiveness. J. Combin. Theory Ser. A 53, 257–285 (1990)
20. Gravner, J., Tracy, C. and Widom, H.: Limit theorems for height fluctuations in a class of discrete space and time growth models. J. Statist. Phys. 102, 5–6, 1085–1132 (2001)
21. Its, A., Izergin, A., Korepin, V. and Slavnov, N.: Differential equations for quantum correlation functions. Internat. J. Modern Phys. B 4, (5), 1003–1037 (1990)
22. Its, A., Tracy, C. and Widom, H.: Random words, Toeplitz determinants and integrable systems. I. In: Random matrix models and their applications. Math. Sci. Res. Inst. Publ. 40. Cambridge: Cambridge Univ. Press, 2001, pp. 245–258
23. Its, A., Tracy, C. and Widom, H.: Random words, Toeplitz determinants and integrable systems. II. Phys. D 152–153, 199–224 (2001)
24. Johansson, K.: Discrete orthogonal polynomial ensembles and the Plancherel measure. Ann. of Math. 153, (1), 259–296 (2001)
25. Johansson, K.: Shape fluctuations and random matrices. Commun. Math. Phys. 209, 2, 437–476 (2000)
26. Johansson, K.: The longest increasing subsequence in a random permutation and a unitary random matrix model. Math. Res. Lett. 5, (1–2), 63–82 (1998)
27. Kamvissis, S., McLaughlin, K. and Miller, P.: Semiclassical soliton ensembles for the focusing nonlinear Schrödinger equation. Preprint, 2000
28. Okounkov, A.: Infinite wedge and random partitions. math.RT/9907127; http://xxx.lanl.gov/abs/
29. Okounkov, A.: Random matrices and random permutations. Internat. Math. Res. Notices 20, 1043–1095 (2000)
30. Rains, E.: A mean identity for longest increasing subsequence problems. math.CO/0004082; http://xxx.lanl.gov/abs/
31. Rains, E.M.: Correlation functions for symmetrized increasing subsequences. math.CO/0006097; http://xxx.lanl.gov/abs/
32. Rains, E.M.: Increasing subsequences and the classical groups. Electron. J. Combin. 5, (1), R12 (1998)
33. Simon, B.: Trace ideals and their applications. Volume 35 of London Mathematical Society Lecture Note Series. Cambridge, New York, NY: Lond. Math. Soc., 1979
34. Stanton, D.W. and White, D.E.: A Schensted algorithm for rim hook tableaux. J. Combin. Theory Ser. A 40, 211–247 (1985)
35. Tracy, C. and Widom, H.: On the distribution of the lengths of the longest monotone subsequences in random words. Probab. Theory Related Fields 119, 3, 350–380 (2001)
36. Tracy, C. and Widom, H.: Level-spacing distributions and the Airy kernel. Commun. Math. Phys. 159, 151–174 (1994)
37. Whittaker, E. and Watson, G.: A Course of Modern Analysis. Cambridge: Cambridge University Press, 4th edition, 1927

Communicated by P. Sarnak