A.K. Chakraborty / Physics Reports 342 (2001) 1–61
DISORDERED HETEROPOLYMERS: MODELS FOR BIOMIMETIC POLYMERS AND POLYMERS WITH FRUSTRATING QUENCHED DISORDER
Arup K. CHAKRABORTY Department of Chemical Engineering, and Department of Chemistry, University of California, Berkeley, CA 94720, USA
Received December 1999; editor: M.L. Klein

Contents
1. Introduction
2. Biomimetic recognition between DHPs and multifunctional surfaces
2.1. Theory of thermodynamic properties
2.2. Monte-Carlo simulations of thermodynamic properties
2.3. Kinetics of recognition due to statistical pattern matching
2.4. Connection to experiments and issues pertinent to evolution
3. Branched DHPs in the molten state: model system for studying microphase ordering in systems with quenched disorder
Acknowledgements
Appendix
References
Abstract
The ability to design and synthesize polymers that can perform functions with great specificity would impact advanced technologies in important ways. Biological macromolecules can self-assemble into motifs that allow them to perform very specific functions. Thus, in recent years, attention has been directed toward elucidating strategies that would allow synthetic polymers to perform biomimetic functions. In this article, we review recent research efforts exploring the possibility that heteropolymers with disordered sequence distributions (disordered heteropolymers) can mimic the ability of biological macromolecules to recognize patterns. The results of this body of work suggest that frustration due to competing interactions and quenched disorder may be the essential physics that enables such biomimetic behavior. These results also show that recognition between disordered heteropolymers and multifunctional surfaces due to statistical pattern matching may be a good model for studying kinetics in frustrated systems with quenched disorder. We also review work demonstrating that disordered heteropolymers with branched architectures are good model systems for studying the effects of quenched sequence disorder on microphase ordering of molten copolymers. The results we describe show that frustrating quenched disorder affects the way in which these materials form ordered nanostructures, in ways that might be profitably exploited in applications. Although the focus of this review is on theoretical and computational research, we discuss connections with existing experimental work and suggest future experiments that are expected to yield further insights.
© 2001 Elsevier Science B.V. All rights reserved.
PACS: 87.15.Aa; 82.35.+t

E-mail address: [email protected] (A.K. Chakraborty).
0370-1573/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0370-1573(00)00006-5
1. Introduction
Synthetic polymers have had an enormous societal and economic impact because they are used to manufacture a plethora of commodity products. This is one of the driving forces that continues to spur fundamental research aimed at understanding the physics of macromolecules and learning how to synthesize them. Another motivation for such research is, of course, intrinsic interest in the fascinating behavior of macromolecules. Research by several physical and chemical scientists has led to substantial advances in our ability to synthesize macromolecules and to understand their physical behavior. In recent years, technological advances have begun to demand materials that exhibit very specific properties. If polymers are to continue to impact society in important ways, they must meet this need. Polymers are good candidates for materials that can perform functions with a high degree of specificity; we can make this claim because it is well established that biological macromolecules carry out very specific functions. One feature that allows biological macromolecules to do so is their ability to self-assemble into particular motifs. Polymeric materials could therefore impact advanced technologies in important ways if we could learn how to design and synthesize macromolecular systems that self-assemble into functionally interesting structures and phases. One way to confront this challenge is to take lessons from nature, since millennia of evolution have allowed biological systems to learn how to create functionally useful self-assembled structures from polymeric building blocks. By suggesting that we take lessons from nature, we do not imply copying the detailed chemistry that allows a biological system to carry out a specific function that we seek; this would be impractical in many contexts.
Rather, we suggest asking the following questions. Are there underlying universalities in the design strategies that nature employs to mediate a certain class of functions? If so, can we exploit similar strategies to design synthetic materials that perform the same class of functions with biomimetic specificity? Universal strategies are of interest because they may be easier to implement in synthetic systems than the detailed chemistries of natural systems, and because they may illuminate the essential physics. However, it is also important to realize that universal strategies will generally yield lower degrees of specificity than situations in which the detailed chemistry has been fine-tuned. Recent work suggests that one design strategy natural polymers may employ to effect assembly into functionally interesting materials is to exploit multifunctionality and disordered sequence distributions (e.g., [1–3]). Disordered heteropolymers (DHPs) constitute a class of synthetic polymers that embodies these features. They are copolymers containing more than one type of monomer unit, with the monomers connected together in a disordered sequence. The monomers may also be connected with different architectures, e.g., branched versus linear connectivity. An important point is that once synthesis is complete, the sequence and the architecture cannot change in response to the environment; the sequence disorder is quenched. Since DHPs embody multifunctionality and quenched disorder, they serve as excellent vehicles for exploring the suggestion that these features may be essential elements for mediating certain types of biomimetic function in synthetic systems relevant to applications. Competing interactions (due to the presence of different types of monomer units), connectivity, and the quenched character of the disordered sequence also make DHPs quintessential examples of frustrated systems.
In the latter half of the twentieth century, physical scientists interested in condensed phases directed considerable attention to two broad classes of problems: those involving biological phenomena (e.g., protein folding) and frustrated systems (e.g., the effects of frustrating quenched randomness in spin-glass physics). Besides offering a way to explore how biomimetic synthetic soft materials can be designed, studying DHPs also allows us to study those aspects of biopolymer behavior that may be termed physics (as distinguished from detailed chemistry) (e.g., [1–3]). DHPs of particular types (vide infra) also have the potential to be excellent vehicles for careful experimental studies of the manifestations of frustrating quenched disorder. In this article, we try to illustrate (via examples) how DHPs can serve as biomimetic polymers and/or as model systems for studying the physics of frustrated systems with quenched randomness. We begin (Section 2) by discussing the adsorption of DHPs from solution; these considerations show that such molecules can exhibit a phenomenon akin to recognition in biological systems. These studies also suggest that the systems we discuss may be good (and simple) model systems for experimental studies of kinetic phenomena in frustrated systems with quenched disorder. Section 2 also discusses the connections between this work and experiments, as well as some provocative ideas being considered in evolutionary biology. In Section 3, we discuss theoretical and experimental work demonstrating that DHPs with branched architectures are good model systems for studying the effects of frustrating quenched disorder on microphase ordering. This article is not a comprehensive or encyclopedic review of DHP physics.
However, this article, together with some recent reviews of the use of these macromolecules as minimalist models for studying protein folding (e.g., [1–5]) and a recent review in this journal of theoretical considerations of microphase ordering in molten DHPs with linear architectures [6], provides a glimpse of much of what is known about the physical behavior of these macromolecules.
2. Biomimetic recognition between DHPs and multifunctional surfaces
Many vital biological processes, such as transmembrane signaling, are initiated by a biopolymer (e.g., a protein) recognizing a specific pattern of binding sites that constitutes a receptor located in a certain part of the surface of a cell membrane. By recognition we mean that the protein adsorbs strongly on the pattern-matched region and not on other parts of the surface; furthermore, it finds the pattern-matched region and binds strongly to it on relatively fast time scales, without getting trapped in long-lived metastable adsorbed states in the wrong parts of the cell surface. If synthetic polymers could mimic such recognition, this would be useful for many advanced applications, including sequence-selective separation processes [7,8], the development of viral inhibition agents [9–11], and sensors.
Polymer adsorption from solution has been studied extensively in recent years (see [12,13] for recent reviews). Most studies have been concerned with the adsorption of polymers with ordered sequence distributions (e.g., homopolymers and diblock copolymers), and they have taught us many important lessons. One lesson pertinent to our concerns can be illustrated by considering a homopolymer interacting with a chemically homogeneous surface. In this case, once the chemical identity of the polymer segments has been chosen, different surfaces are characterized by the attractive energy per segment between the surface and the polymer segments, $E/kT$. Thus, if we plot the adsorbed fraction of polymer (at equilibrium) as a function of $E/kT$, points on the abscissa correspond to different surfaces. Theoretical and experimental studies have firmly established the nature of this plot. For small values of $E/kT$, there is no adsorption because the energetic advantage associated with segmental binding is not sufficient to overcome the entropic penalty associated with chain adsorption. For sufficiently large values of $E/kT$, adsorption does occur. The transition from the desorbed to the adsorbed state is a second-order phase transition for flexible chains [14]. Adsorbed polymer conformations can be characterized by loops, tails, and trains (see Fig. 1).

Fig. 1. Schematic representation of an adsorbed polymer chain. The sketch provides the definitions of loops, trains, and tails.

Fluctuations in the distribution of loops at a fixed number of surface contacts are favored because they increase the entropy, and these loop fluctuations make the adsorption transition continuous. The practical consequence is that thermodynamic discrimination between surfaces is not sharp, whereas sharp discrimination is a requisite feature of recognition.
The adsorption characteristics of diblock copolymers on striped surfaces are also understood (e.g., [15,16]). At equilibrium, the chains adsorb at the interface between stripes, with each block adsorbed on the stripe that is energetically favored. This phenomenon differs from what is meant by recognition in important ways. First, the chain is not localized in a region commensurate with chain dimensions, because it is entropically favorable for the chain to sample the entire interface. Second, it seems highly likely that diblock copolymer chains would be kinetically trapped in regions away from the interface, because adsorbing one block on an energetically favorable stripe while allowing the non-adsorbed block to sample many conformations appears to be a deep free-energy minimum. Thus, this system does not seem to exhibit the hallmarks of recognition either. (Much work has also been done on the behavior of molten diblock copolymer layers on patterned surfaces. We do not discuss these studies here because the focus of this section is on adsorption from solution; interested readers are directed to a recent review [17] and references therein.)
In short, recognition implies sharp discrimination between different regions of a surface and localization of the chain to a relatively small pattern-matched region without getting kinetically trapped in the "wrong" parts of the surface. In biological systems it usually also entails adsorption in a particular conformation or shape. Synthetic polymers with ordered sequence distributions do not seem to exhibit these characteristics. Similar conclusions can be reached from interesting studies of DHP adsorption on homogeneous surfaces [18–21] and of homopolymer adsorption on chemically disordered surfaces [22].
One way to make synthetic systems mimic recognition is to copy the detailed chemistries that allow natural systems to effect recognition, but this is not a practical solution in most cases. Recently, some work has been done to explore whether there are universal strategies that may allow synthetic polymers and surfaces to mimic recognition [23–31]. (Such universal strategies may be simpler to implement in practical situations, and may also shed light on the minimal ingredients, or principles, required for synthetic systems to mimic recognition.) This body of work is the primary focus of this section. Its purpose has not been to explain the physical and chemical mechanisms that allow recognition in biological systems. However, some coarse-grained observations about biological systems have provided the inspiration for deducing possible universal strategies. Each protein carries a specific pattern encoded in its sequence of amino acids. In recent years, great interest in elucidating the physics of protein folding has led to many coarse-grained models for amino acid sequences in proteins, and all of these models exhibit a common feature. To illustrate it, consider the H/P model [32], in which amino acids are assigned to two classes: hydrophobic (H) and polar (P). This (and other) models have been used to characterize protein sequences, and it has been found [33–35] that the pattern of H- and P-type moieties is usually not periodically repeating. Similarly, examination of cell and virus surfaces reveals that the chemically different binding sites that constitute receptors (which are recognized by proteins) are also not arranged in a periodically repeating pattern. These observations suggest that disorder and competing interactions (due to preferential interactions between polymer segments and surface sites) may be key ingredients for recognition between synthetic polymers and surfaces. Heteropolymers with disordered sequences carry a pattern encoded in their sequence distribution. The information content is statistical, however, since the sequences are characterized statistically.
For example, for a two-letter DHP (say, with A- and B-type segments), the simplest way to describe the disordered sequence distribution is to specify the average fraction of segments of one type, $f$, and a quantity $\lambda$ that measures the strength of two-point correlations in the chemical identity of segments along the chain [36]. $\lambda$ is directly related to the synthesis conditions through the matrix of reaction probabilities, $P$, whose elements $P_{ij}$ are the conditional probabilities that a segment of type $j$ directly follows a segment of type $i$. Clearly, $\lambda$ depends upon the chemical identity of the segments and the synthesis conditions; consequently, the synthesis conditions and the choice of chemistry determine the statistical pattern carried by DHPs. If $\lambda > 0$, then within a correlation length measured along the chain there is a high probability of finding segments of the same type; we refer to such an ensemble of sequences as statistically blocky. If $\lambda < 0$, then within a correlation length measured along the chain there is a high probability of finding an alternating pattern of segments. The absolute magnitude of $\lambda$ sets the correlation length: $\lambda = 0$ corresponds to perfectly random sequences, and $\lambda = 1$ implies that the correlation length is the entire chain length. Characterizing sequence statistics by $f$ and $\lambda$ means that we are only using two-point correlations to describe the statistical patterns. More elaborate statistical patterns can be described by considering higher-order correlations and/or more than two types of segments. Consider now the interaction of DHPs with surfaces bearing more than one type of site, with the sites distributed in a disordered manner. Examples of surfaces with two kinds of sites distributed in a disordered fashion are shown schematically in Fig. 2. The distribution of these sites on the neutral surface can be characterized statistically.
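The two-point description of sequence statistics can be made concrete with a short simulation. The sketch below is a minimal illustration, not the procedure of Ref. [36]: it assumes that sequences are generated as a first-order Markov chain (each segment depends only on its predecessor through the reaction-probability matrix $P_{ij}$) and takes $\lambda = P_{AA} + P_{BB} - 1$, the second eigenvalue of that matrix, as the correlation parameter. Under that assumption, correlations between segment identities $k$ apart decay as $\lambda^k$. The function names are hypothetical.

```python
import random

# Sketch under an ASSUMED model: sequences are first-order Markov chains and
# lam = P_AA + P_BB - 1. Function names are hypothetical, not from Ref. [36].

def transition_matrix(f, lam):
    """P[i][j] = probability that a segment of type j follows one of type i,
    chosen so the stationary A-fraction is f and P_AA + P_BB - 1 = lam."""
    p_aa = f + lam * (1.0 - f)
    p_ba = f * (1.0 - lam)
    if not (0.0 <= p_aa <= 1.0 and 0.0 <= p_ba <= 1.0):
        raise ValueError("this (f, lam) pair cannot be realized by a Markov chain")
    return {"A": {"A": p_aa, "B": 1.0 - p_aa},
            "B": {"A": p_ba, "B": 1.0 - p_ba}}

def sample_sequence(n, f, lam, rng):
    """Draw one quenched sequence of n segments; the first segment comes from the
    stationary distribution, so the expected composition is f along the whole chain."""
    P = transition_matrix(f, lam)
    seq = ["A" if rng.random() < f else "B"]
    for _ in range(n - 1):
        seq.append("A" if rng.random() < P[seq[-1]]["A"] else "B")
    return seq

def estimate_lambda(seq):
    """Estimate lam = P_AA + P_BB - 1 from nearest-neighbour pair counts."""
    pairs = {}
    for a, b in zip(seq, seq[1:]):
        pairs[(a, b)] = pairs.get((a, b), 0) + 1
    n_a = pairs.get(("A", "A"), 0) + pairs.get(("A", "B"), 0)
    n_b = pairs.get(("B", "A"), 0) + pairs.get(("B", "B"), 0)
    return pairs.get(("A", "A"), 0) / n_a + pairs.get(("B", "B"), 0) / n_b - 1.0

if __name__ == "__main__":
    rng = random.Random(0)
    blocky = sample_sequence(100_000, 0.5, 0.8, rng)        # lam > 0: statistically blocky
    alternating = sample_sequence(100_000, 0.5, -0.8, rng)  # lam < 0: statistically alternating
    print(estimate_lambda(blocky), estimate_lambda(alternating))
```

With $f = 0.5$, $\lambda = 0.8$ produces long runs of like segments and $\lambda = -0.8$ produces mostly alternating segments; a $(f, \lambda)$ pair that no Markov chain can realize raises an error instead of being silently clipped.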
For example, a simple way would be to specify the total number density of both kinds of sites per unit area, the fraction of sites of one type, and the two-point correlation function describing how the probability of having a site of type A at position $\mathbf{r}$ is related to the probability of having a site of the same type at position $\mathbf{r}'$.

Fig. 2. Examples of statistically patterned surfaces. White represents a neutral background. The two types of "active" surface sites are depicted using light and dark grey dots. In the left panel, within some length scale, there is a high probability of finding sites of opposite types adjacent to each other; such surfaces are referred to in the text as statistically alternating. In the right panel, within some length scale, there is a high probability of finding sites of the same type adjacent to each other; such surfaces are referred to in the text as statistically patchy.

Fig. 2 shows specific realizations of two surfaces bearing simple statistical patterns. In nature, recognition (with all the hallmarks noted earlier) occurs when the specific pattern encoded in a biopolymer's sequence and that carried by the binding sites are matched (i.e., related in a special way). The DHP sequences and the surfaces described in the preceding paragraph carry statistical patterns. The question we now ask is: will statistically patterned surfaces be able to recognize the statistical information contained in an ensemble of DHP sequences when the statistics characterizing the DHP sequences and the surface site distributions are related in a special way? In other words, is statistical pattern matching sufficient for recognition to occur? This question is interesting for three reasons: (1) the answer may tell us the minimal ingredients for a phenomenon akin to recognition; (2) if recognition can occur via statistical pattern matching, the phenomenon might be profitably exploited in applications; (3) DHPs interacting with functionalized surfaces bearing statistical patterns may be good model systems for studying the physics of frustrated systems with quenched disorder. To answer this question, we have to address several issues. We must determine whether competing interactions and disorder suffice for sharp discrimination between different statistical patterns, and whether the inherent frustration allows localization (on reasonable time scales) on a relatively small part of the surface that is statistically pattern matched. Addressing these issues requires that we study both thermodynamic and kinetic behavior. Let us begin with the thermodynamics.

2.1. Theory of thermodynamic properties
Srebnik et al. [23] analyzed a bare-bones version of the problem in order to examine whether frustration due to competing interactions and quenched disorder is sufficient to obtain sharp discrimination between surfaces with different statistical patterns. In this model, the two-letter DHPs are considered to be Gaussian chains in solution. The surface comprises two different types of sites on a neutral background, and each type of site interacts differently with the two types of DHP segments. In the infinitely dilute limit, this physical situation corresponds to the following Hamiltonian:
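$$-\beta H = -\frac{3}{2l^2}\int_0^N dn\left(\frac{d\mathbf{r}(n)}{dn}\right)^2 - \int_0^N dn\int d\mathbf{r}\,\kappa(\mathbf{r})\,\delta(\mathbf{r}(n)-\mathbf{r})\,\theta(n)\,\delta(z) \tag{1}$$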
where $\mathbf{r}(n)$ represents the chain conformation, $\kappa(\mathbf{r})$ is the interaction strength with a surface site located at $\mathbf{r}$, the factor $\delta(z)$ ensures that these sites lie on a 2D plane, $l$ is the usual statistical segment length, and $\theta(n)$ represents the chemical identity of the $n$th segment. For a two-letter DHP, $\theta(n) = \pm 1$ depending upon whether the segment is of type A or B. Eq. (1) implies that a surface site that attracts one type of DHP segment repels segments of the other type equally strongly. This is assumed for simplicity; what is important is that a surface site interacts differently with the two types of segments. Since the DHP sequence and the surface site distribution are disordered, $\kappa(\mathbf{r})$ and $\theta(n)$ are fluctuating variables. Later, we shall have much to say about how different types of correlated fluctuations of these variables affect the physical behavior. For now, in order to explore the essential physics, let us take the fluctuations in $\kappa(\mathbf{r})$ and $\theta(n)$ to be uncorrelated. Further, to simplify the analysis, these fluctuations are described by Gaussian processes. This latter approximation should not affect the qualitative physics, and it generalizes the results to physical situations with many types of DHP segments and surface sites. Specifically, Srebnik et al. [23] take the surface to be neutral on average (i.e., $\kappa$ has a mean value of zero), with the variance of the fluctuations in $\kappa$ equal to $\sigma_\kappa^2$. This implies that $\sigma_\kappa^2$ is the only variable that measures the statistical pattern carried by the surface; different values of this quantity represent different surfaces. Physically, $\sigma_\kappa^2$ is proportional to the total number density of both types of sites on the neutral surface. The uncorrelated sequence distribution is described by $\theta(n)$ having a mean value of $(2f-1)$, where $f$ is the average composition of one type of segment.
Its variance, $\sigma_\theta^2$, is also related to the average composition (it equals $4f(1-f)$). Thus, in this simple case the statistical pattern carried by the DHP sequence is measured by the average composition. To proceed, we must average over the quenched sequence distribution and over the fluctuations in $\kappa$ that characterize the surface site distribution. Consider the latter issue first. If the sites on the surface can anneal in response to the adsorbing chain molecule, then the partition function is self-averaging with respect to the fluctuations in $\kappa$. This could be the physical situation if the functional groups that represent the sites were weakly bonded to the surface. If, however, the sites cannot respond to the presence of the DHP, then the partition function is self-averaging with respect to the fluctuations in $\kappa$ only under restricted circumstances. As has been explicated in many contexts (e.g., [37–39]), quenched and annealed averages over disorder external to the fluid of interest are equivalent when the medium is sufficiently large and the time of observation is long enough for the fluid (in our case, the polymer) to sample the medium. Later, we shall quantify these statements by examining results of Monte-Carlo simulations. For the moment, we carry out an analysis that holds for annealed disorder under all circumstances and is appropriate for quenched surface disorder under the restrictions noted above. Therefore, following Feynman [40], we calculate the influence functional by averaging over the Gaussian fluctuations in $\kappa(\mathbf{r})$. This yields the following effective Boltzmann factor:
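$$\begin{aligned}
\exp[-\beta H_{\mathrm{eff}}] = {}& \int\mathcal{D}\kappa(\mathbf{r})\,\exp\left[-\frac{1}{2\sigma_\kappa^2}\int d\mathbf{r}\,d\mathbf{r}'\,\kappa(\mathbf{r})\,\delta(\mathbf{r}-\mathbf{r}')\,\kappa(\mathbf{r}')\,\delta(z)\,\delta(z')\right]\\
&\times\exp\left[-\frac{3}{2l^2}\int_0^N dn\left(\frac{d\mathbf{r}}{dn}\right)^2 - \int_0^N dn\int d\mathbf{r}\,\kappa(\mathbf{r})\,\delta(\mathbf{r}(n)-\mathbf{r})\,\theta(n)\,\delta(z)\right]
\end{aligned}\tag{2}$$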
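On a lattice, the Gaussian average behind Eq. (2) can be checked directly. The sketch below is a hypothetical discretization, not taken from Ref. [23]: beads sit at integer positions, a bead with $z = 0$ touches the surface, and $\kappa$ is an independent Gaussian variable (mean 0, variance $\sigma_\kappa^2$) at each surface position, mirroring the $\delta(\mathbf{r}-\mathbf{r}')$ in the Gaussian weight. Writing $S(\mathbf{r})$ for the sum of $\theta_n$ over segments sitting on surface position $\mathbf{r}$, the average of $\exp(-\sum_{\mathbf{r}}\kappa(\mathbf{r})S(\mathbf{r}))$ is exactly $\exp(\tfrac12\sigma_\kappa^2\sum_{\mathbf{r}}S(\mathbf{r})^2)$, which is what integrating out $\kappa$ in Eq. (2) produces.

```python
import math
import random
from collections import defaultdict

# HYPOTHETICAL lattice discretization of Eqs. (1)-(2): beads at integer (x, y, z),
# z == 0 means contact with the surface, kappa is an independent Gaussian per site.

def stretching(conf, l=1.0):
    """Chain-connectivity term of Eq. (1): -(3 / 2 l^2) * sum_n |r_{n+1} - r_n|^2."""
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(conf, conf[1:]))
    return -1.5 * total / l**2

def surface_sums(conf, theta):
    """S(r): sum of theta_n over the segments that sit on surface position r."""
    S = defaultdict(float)
    for (x, y, z), t in zip(conf, theta):
        if z == 0:
            S[(x, y)] += t
    return S

def minus_beta_H(conf, theta, kappa, l=1.0):
    """-beta*H for one fixed surface; kappa maps (x, y) to a site strength."""
    S = surface_sums(conf, theta)
    return stretching(conf, l) - sum(kappa.get(r, 0.0) * s for r, s in S.items())

def minus_beta_H_eff(conf, theta, sigma_kappa, l=1.0):
    """log <exp(-beta*H)> over kappa ~ N(0, sigma_kappa^2), independent per site:
    each occupied site contributes exp(sigma_kappa^2 * S(r)^2 / 2)."""
    S = surface_sums(conf, theta)
    return stretching(conf, l) + 0.5 * sigma_kappa**2 * sum(s * s for s in S.values())

if __name__ == "__main__":
    conf = [(0, 0, 0), (1, 0, 0), (1, 0, 1)]  # two beads on the surface, one above it
    theta = [+1, -1, +1]
    rng = random.Random(0)
    samples, acc = 100_000, 0.0
    for _ in range(samples):
        kappa = {(0, 0): rng.gauss(0.0, 0.5), (1, 0): rng.gauss(0.0, 0.5)}
        acc += math.exp(minus_beta_H(conf, theta, kappa))
    print(acc / samples, math.exp(minus_beta_H_eff(conf, theta, 0.5)))
```

Only surface positions actually visited by the chain enter, because an unvisited site contributes a factor of one on average.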
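The partition function is not self-averaging with respect to the quenched sequence fluctuations under any circumstances, so the sequence average must be carried out on the free energy. Replica methods [41,42] provide one way to do this. We consider $f = 0.5$, thereby fixing the statistics of the DHP sequences, and study adsorption as a function of the surface statistics (i.e., the variance $\sigma_\kappa^2$ of the distribution that characterizes the fluctuations in $\kappa(\mathbf{r})$). Replicating the effective Hamiltonian in Eq. (2) and carrying out the functional integral corresponding to the average over the distribution of $\theta$ yields the following $m$-replica partition function:

$$\begin{aligned}
\langle Z^m\rangle = {}& \int\prod_{\alpha=1}^{m}\mathcal{D}\mathbf{r}_\alpha(n)\int\prod_{\alpha=1}^{m}\mathcal{D}\kappa_\alpha(\mathbf{r})\,\exp\left[-\frac{3}{2l^2}\sum_{\alpha=1}^{m}\int_0^N dn\left(\frac{d\mathbf{r}_\alpha}{dn}\right)^2\right]\\
&\times\exp\left[-\frac{1}{2\sigma_\kappa^2}\sum_{\alpha\beta}\int d\mathbf{r}\,d\mathbf{r}'\,\kappa_\alpha(\mathbf{r})\,\delta_{\alpha\beta}\,\delta(\mathbf{r}-\mathbf{r}')\,\kappa_\beta(\mathbf{r}')\,\delta(z)\,\delta(z')\right]\\
&\times\exp\left[\frac{\sigma_\theta^2}{2}\sum_{\alpha\beta}\int_0^N dn\int d\mathbf{r}\,d\mathbf{r}'\,\kappa_\alpha(\mathbf{r})\,\kappa_\beta(\mathbf{r}')\,\delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(\mathbf{r}_\beta(n)-\mathbf{r}')\,\delta(z)\,\delta(z')\right]
\end{aligned}\tag{3}$$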
This replicated partition function can be written in a form that is more convenient both for thinking and for computing. Define the following order parameter, $Q_{\alpha\beta}(\mathbf{r}, \mathbf{r}')$, which measures the conformational overlap between replicas $\alpha$ and $\beta$ on the surface:
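$$Q_{\alpha\beta}(\mathbf{r},\mathbf{r}') = \int_0^N dn\,\delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(\mathbf{r}_\beta(n)-\mathbf{r}')\,\delta(z)\,\delta(z') \tag{4}$$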
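A discrete analogue of Eq. (4) may help in reading the replica formulas. In this hypothetical lattice version, each replica is a list of bead positions for the same sequence, $Q_{\alpha\beta}(\mathbf{r},\mathbf{r}')$ becomes a count of segments $n$ for which replica $\alpha$ sits on surface position $\mathbf{r}$ and replica $\beta$ on $\mathbf{r}'$, and its diagonal part ($\mathbf{r} = \mathbf{r}'$) divided by $N$ gives a scalar overlap. For $\alpha = \beta$ that scalar is simply the adsorbed fraction, so two replicas overlap perfectly on the surface when their scalar overlap equals that adsorbed fraction.

```python
from collections import Counter

# HYPOTHETICAL lattice analogue of Eq. (4): replicas are lists of (x, y, z) bead
# positions for the same sequence; z == 0 marks a surface contact.

def overlap_matrix(conf_a, conf_b):
    """Q_ab[(r, r')]: number of segments n with replica a on surface site r
    and replica b on surface site r'."""
    Q = Counter()
    for (xa, ya, za), (xb, yb, zb) in zip(conf_a, conf_b):
        if za == 0 and zb == 0:
            Q[((xa, ya), (xb, yb))] += 1
    return Q

def scalar_overlap(conf_a, conf_b):
    """Diagonal part (r == r') of Q_ab divided by the chain length N."""
    Q = overlap_matrix(conf_a, conf_b)
    return sum(c for (r, r_prime), c in Q.items() if r == r_prime) / len(conf_a)

if __name__ == "__main__":
    a = [(0, 0, 0), (1, 0, 0), (1, 0, 1), (2, 0, 0)]
    b = [(0, 0, 0), (0, 1, 0), (1, 1, 1), (2, 0, 0)]
    print(scalar_overlap(a, a))  # self-overlap = adsorbed fraction of replica a
    print(scalar_overlap(a, b))  # only segments 0 and 3 share a surface site
```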
This definition allows us to rewrite Eq. (3) as a functional integral over the overlap order parameter:
$$\langle Z^m\rangle = \int\mathcal{D}Q_{\alpha\beta}\,\exp\left(-E[Q_{\alpha\beta}] + S[Q_{\alpha\beta}]\right),$$
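where

$$\begin{aligned}
E &= -\ln\int\prod_{\alpha=1}^{m}\mathcal{D}\kappa_\alpha(\mathbf{r})\,\exp\left[-\frac12\sum_{\alpha\beta}\int d\mathbf{r}\,d\mathbf{r}'\,\kappa_\alpha(\mathbf{r})\,P_{\alpha\beta}(\mathbf{r},\mathbf{r}')\,\kappa_\beta(\mathbf{r}')\right],\\
S &= \ln\int\prod_{\alpha=1}^{m}\mathcal{D}\mathbf{r}_\alpha(n)\,\exp\left[-\frac{3}{2l^2}\sum_{\alpha=1}^{m}\int_0^N dn\left(\frac{d\mathbf{r}_\alpha}{dn}\right)^2\right]\prod_{\alpha\beta}\delta\left[Q_{\alpha\beta}(\mathbf{r},\mathbf{r}')-\int_0^N dn\,\delta(\mathbf{r}_\alpha(n)-\mathbf{r})\,\delta(\mathbf{r}_\beta(n)-\mathbf{r}')\,\delta(z)\,\delta(z')\right],
\end{aligned}\tag{5}$$

and $P_{\alpha\beta}(\mathbf{r},\mathbf{r}')$ is

$$P_{\alpha\beta}(\mathbf{r},\mathbf{r}') = \frac{\delta(\mathbf{r}-\mathbf{r}')\,\delta_{\alpha\beta}\,\delta(z)\,\delta(z') - \sigma_\kappa^2\sigma_\theta^2\,Q_{\alpha\beta}(\mathbf{r},\mathbf{r}')}{\sigma_\kappa^2} \tag{6}$$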
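For Gaussian $\kappa$, the integral defining $E$ in Eq. (5) equals $\tfrac12\ln\det P$ up to an additive constant, so the replica structure of $P_{\alpha\beta}$ is what matters. The sketch below assumes a drastically simplified, hypothetical scalar version of Eq. (6) in which the spatial dependence is dropped: $P = (I - cB)/\sigma_\kappa^2$, where $B_{\alpha\beta} = 1$ if replicas $\alpha$ and $\beta$ belong to the same group of $x$ replicas and 0 otherwise, and $c$ lumps together the disorder strength and the overlap. It compares a brute-force log-determinant with the closed form that follows from the block eigenvalues, $\ln\det P = -m\ln\sigma_\kappa^2 + (m/x)\ln(1 - cx)$. Here $m$ and $x$ are integers; in the replica limit they are continued analytically.

```python
import math

# HYPOTHETICAL scalar reduction of Eq. (6): spatial dependence dropped,
# P = (I - c*B) / sigma_kappa**2 with B[a][b] = 1 inside a replica group.

def one_step_rsb_matrix(m, x, a, b):
    """m x m matrix: a on the diagonal, b between distinct replicas of the same
    group (groups of x consecutive replicas), 0 between different groups."""
    if m % x:
        raise ValueError("x must divide m")
    return [[a if i == j else (b if i // x == j // x else 0.0) for j in range(m)]
            for i in range(m)]

def log_det(M):
    """log|det M| by Gaussian elimination with partial pivoting (small matrices)."""
    A = [row[:] for row in M]
    n, total = len(A), 0.0
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        A[k], A[piv] = A[piv], A[k]
        total += math.log(abs(A[k][k]))
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
    return total

def log_det_closed_form(m, x, a, b):
    """Block eigenvalues: a + (x - 1) b once per group (m/x groups), a - b otherwise."""
    return (m - m / x) * math.log(a - b) + (m / x) * math.log(a + (x - 1) * b)

def energy_per_replica(sigma_kappa, c, x, m=12):
    """(1/2m) ln det P for P = (I - c B) / sigma_kappa**2; should equal
    -ln(sigma_kappa) + ln(1 - c x) / (2x). Requires 0 < c x < 1 and x dividing m."""
    a = (1.0 - c) / sigma_kappa**2
    b = -c / sigma_kappa**2
    return 0.5 * log_det(one_step_rsb_matrix(m, x, a, b)) / m

if __name__ == "__main__":
    print(energy_per_replica(0.8, 0.2, 3),
          -math.log(0.8) + math.log(1 - 0.2 * 3) / (2 * 3))
```

The per-replica result depends on $x$ only through $\ln(1 - cx)/(2x)$, which is the kind of term a one-step replica-symmetry-breaking scheme produces for the energy.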
The quantity $S$ is the entropy associated with a given overlap order-parameter field, since it is the logarithm of the number of ways in which the DHP can arrange itself in 3D space subject to the constraint that the overlap between replicas on the 2D surface is $Q_{\alpha\beta}$; $E$ is then the associated energy. As has been demonstrated in the context of protein folding (e.g., [1–5]) and the behavior of DHPs in 3D disordered media [43], polymers with quenched sequence distributions can exhibit behavior akin to the random energy model (REM), the Potts glass with many states, or p-spin models. One consequence is that, under certain circumstances, the thermodynamics is determined by a few dominant conformations, because frustration due to competing interactions and quenched disorder makes these few conformations energetically much more favorable than all others. Since the physical situation we are considering also embodies the frustrating effects of competing interactions and quenched disorder, we must allow for the possibility of such a phenomenon (we will refer to it as freezing for convenience). In fact, since the competing interactions in our case occur on a 2D plane, this effect might be enhanced. It is very important to understand that the preceding sentence does not imply that the dimensionality of space in our problem is 2. The polymer conformations can (and do) fluctuate in 3D space by forming loops and tails; only the competing interactions in this simple scenario are manifested in 2D space. We shall return to the importance of loop fluctuations later in this section. Mathematically, allowing for a few dominant conformations means allowing for broken replica symmetry. Parisi [44] pioneered the way to compute and think about broken replica symmetry in the context of spin glasses. For Sherrington–Kirkpatrick (SK) spin glasses, replica symmetry is broken in a hierarchical manner [42,44].
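The idea that a few conformations can dominate the thermodynamics can be illustrated with the random energy model itself. The toy sketch below is not a model of DHP adsorption: it assigns independent Gaussian energies to 4096 hypothetical "conformations" and computes the participation ratio $Y = \sum_i P_i^2$ of their Boltzmann probabilities. When many conformations contribute, $Y$ is of the order of the inverse number of states; as the temperature is lowered well below the REM freezing temperature, $Y$ approaches unity. The quantity $1 - Y$ corresponds to the interpretation of the order parameter $x$ invoked later in this section.

```python
import math
import random

# Toy random energy model (REM); names and parameters are illustrative only.

def participation_ratio(energies, T):
    """Y = sum_i P_i**2 for Boltzmann weights P_i = exp(-E_i/T) / Z.
    Energies are shifted by their minimum before exponentiating to avoid overflow."""
    e_min = min(energies)
    w = [math.exp(-(e - e_min) / T) for e in energies]
    Z = sum(w)
    return sum((wi / Z) ** 2 for wi in w)

def rem_energies(num_states, sigma, rng):
    """Independent Gaussian 'conformation' energies with standard deviation sigma."""
    return [rng.gauss(0.0, sigma) for _ in range(num_states)]

if __name__ == "__main__":
    rng = random.Random(42)
    energies = rem_energies(4096, 3.0, rng)
    for T in (10.0, 2.0, 0.5, 0.05):
        Y = participation_ratio(energies, T)
        print(f"T={T}: Y={Y:.3f}, 1-Y={1 - Y:.3f}")
```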
For the REM and p-spin models with p > 2, one stage of the symmetry-breaking process is sufficient for sensible calculations [42]. As noted earlier, since DHPs share some features with these models, a one-step replica symmetry breaking (RSB) scheme is a reasonable approximation (e.g., [1–5,42,43]). Replicas are divided into groups: replicas within a group overlap perfectly on the surface, whereas those in different groups do not overlap at all on the surface. The energy can then be computed by evaluating the logarithm of the determinant of the matrix $P_{\alpha\beta}$. Mezard and Parisi [44] have provided formulas for this quantity when replica symmetry is broken. Using their formula and a one-step RSB scheme, the energy is computed to be:
$$E = -\ln\sigma_\kappa + \frac{1}{2x}\ln\left(1 - C\rho x\right), \qquad \rho = \frac{p}{N}, \qquad C = \frac{\sigma_\kappa^2\sigma_\theta^2 N}{A}, \tag{7}$$

where $x$ is the number of replicas in a group, $p$ is the number of contacts with the surface, and $A$ is the surface area of the solid. In writing Eq. (7), the density of adsorbed segments on the surface has been approximated as uniform. To compute the entropy, a physically transparent method can be employed. The first quantity we need is the number of ways in which $x$ replicas can be arranged such that their surface conformations overlap perfectly. When polymers adsorb, the conformations are characterized by loops, trains, and tails (see Fig. 1); in the long-chain limit, we may ignore tails. Let $f_{n_i}(\mathbf{r}_i - \mathbf{r}_{i+1})$ be the probability that a loop of length $n_i$ starts at $\mathbf{r}_i$ on the surface and ends at $\mathbf{r}_{i+1}$. The loop length $n_i$ ranges from 1 to the chain length, $N$; including loops of length 1 means that trains are incorporated in the computation of the entropy. With this definition, the restricted partition function for $x$ replicas in a group can be written as:
$$Z(\mathbf{r}_1) = \sum_{n_1,\ldots,n_p}\int d\mathbf{r}_2\cdots\int d\mathbf{r}_p\, f^{x}_{n_1}(\mathbf{r}_1-\mathbf{r}_2)\cdots f^{x}_{n_{p-1}}(\mathbf{r}_{p-1}-\mathbf{r}_p)\,\delta(n_1+n_2+\cdots+n_p-N). \tag{8}$$

The delta function conserves the total chain length while allowing loop lengths to fluctuate. For later mathematical convenience, we do not integrate over the position of the first adsorbed segment, $\mathbf{r}_1$. The entropy for $x$ replicas in a group is obtained by integrating $Z$ over $\mathbf{r}_1$ and then taking the logarithm; its negative enters the free energy. To compute this partition function, let us first introduce a Laplace transform conjugate to $N$, i.e.,
$$\tilde{z}(s) = \int_0^\infty dN\, z(N)\, e^{-sN}, \tag{9}$$
where $s$ is the Laplace variable conjugate to $N$. The product of the functions that describe the loop probabilities has a convolution structure, so it is convenient to introduce a 2D Fourier transform (wavevector $\mathbf{k}$) conjugate to the 2D spatial coordinate that defines position on the surface. The Fourier–Laplace transform of the restricted partition function is
$$Z(\mathbf{k}, s) = \left[\sum_{n} f^{x}_{n}(\mathbf{k})\,e^{-sn}\right]^{p}. \tag{10}$$

The loop probability factors must be of two types. Following Hoeve et al. [45], the factor for a loop of unit length is taken to be

$$f_1(\mathbf{r}) = u\,\delta(r - l), \tag{11}$$
where $u$ is the partition function for one adsorbed segment and depends upon the chemical details of chain constitution. In the simple model we are now considering, the DHP segments do not interact with each other, so the loop probability factor for longer loops is Gaussian:
$$f_n(\mathbf{r}) = \frac{C_0}{n^{3/2}}\exp\left(-\frac{3r^2}{2nl^2}\right), \tag{12}$$
where $C_0$ is a normalization constant that depends upon chain stiffness, $n$ is the loop length, and $r$ is the distance between the loop ends. Substituting the Fourier transforms of the loop probability factors into Eq. (10), integrating over $\mathbf{r}_1$, and inverting the Fourier and Laplace transforms yields the partition function we seek. Since there are $m/x$ groups of replicas, the entropy in Eq. (5) is the product of $m/x$ and the logarithm of this partition function. In carrying out these manipulations, the sum over loop lengths in Eq. (10) can be replaced by an integral, since we are concerned with the long-chain limit. Combining the result of the entropy calculation with Eq. (7) for the energy yields the following free-energy functional:
1 1 1 F" !ln p # ln(1!C p x ) ! x x N 2
N , p N ;ln q O
2pl O CO((4!3x )/2) ;[N!(p N!q)] \V O\ . COV u.M ,\OV 3x C(q(4!3x )/2) (13)
Srebnik et al. [23] also added a term to the energy that represents non-specific three-body repulsions. This term is not essential, but it may be needed for stability at high values of the adsorbed fraction. A mean-field solution for the order parameters, $\bar p$ and x, is obtained by extremizing the free-energy functional with respect to them. Note that the functional must be minimized with respect to $\bar p$ and maximized with respect to x. It must be maximized with respect to x because, when the m→0 limit is taken in the replica calculation, the lowest-order correction to the free energy evaluated at the saddle point is negative. This happens because the dimensionality of the integral is m(m−1), which is negative in the replica limit [42]. A simple computer code allowed Srebnik et al. [23] to obtain the saddle-point values of the two order parameters, $\bar p$ and x. By following $\bar p$ and x, we can learn about the adsorption characteristics of DHPs onto disordered multifunctional surfaces. The order parameter $\bar p$ is simply the fraction of adsorbed segments, and it becomes greater than zero when adsorption occurs. The order parameter x has been interpreted as $x=1-\sum_i P_i^2$, where $P_i$ is the probability with which conformation i occurs. When a multitude of conformations is sampled, each of these probabilities is very small and x approaches its asymptotic value of unity. This is the usual situation in polymer physics, because entropic considerations lead to large conformational fluctuations. Natural polymers seem to be
designed such that, under appropriate circumstances, conformational fluctuations are suppressed and a few dominant conformations determine the thermodynamics. DHPs can also exhibit similar physics (e.g., [1–5,43]). In fact, our quest for biomimetic recognition can succeed only if adsorption occurs in a few dominant conformations. Let us fix all the parameters that determine the nature of the chains and study how the order parameters $\bar p$ and x vary with C. Different values of this parameter correspond to different surfaces. Specifically, each value of C corresponds to a different total number density of sites on the surface. The number of sites per unit area can be adjusted experimentally in several ways, the simplest being the adsorption of functional groups onto a surface from solutions of varying concentration [7]. Fig. 3 shows how the two relevant order parameters vary with C. A uniform neutral surface corresponds to C = 0, so no adsorption occurs when this parameter is small. The order parameter x is then unity, since in the absence of intersegment interactions and adsorption all conformations are energetically equally likely. Fig. 3 also shows that weak adsorption occurs above some value of C, with a multitude of conformations being sampled. The transition from no adsorption to weak adsorption is continuous. The theory also predicts a sharp transition from weak to strong adsorption at a higher threshold value of C. This transition is accompanied by x becoming less than unity. That signals that the polymer adsorbs in only a few dominant conformations, at least as far as the adsorbed segments (and hence the loop structure) are concerned. As noted earlier, in the simple model we have been considering, each point on the abscissa corresponds to a different surface. Thus, the sharp transition from weak to strong
Fig. 3. The order parameters $\bar p$ and x plotted as a function of C. For the calculation described in the text, each point on the abscissa represents a different statistically patterned surface.
adsorption implies sharp thermodynamic discrimination between surfaces bearing different statistical patterns, which is one of the hallmarks of recognition. To understand the physical reason for the sharp transition from weak to strong adsorption (and for the accompanying freezing into a few dominant conformations), we first discuss what happens at the continuous transition to weak adsorption. When there are only a few sites on the surface (small C), the energetic advantage of chain segments binding to preferred sites is not sufficient to overcome the entropic penalty for chain adsorption. For higher values of C, adsorption occurs because the number of sites is now large enough for the favorable energetics of preferential segmental binding to overcome the entropic penalty. At the same time, since the number of surface sites per unit area is still small, the chain can very easily avoid unfavorable interactions. Furthermore, as shown schematically in Fig. 4, because unfavorable interactions are easy to avoid, the chain can obtain the same energetic advantage in many different conformations. The system therefore minimizes its free energy by sampling a multitude of conformations with roughly the same energy. As the loading of surface sites increases, however, it becomes increasingly difficult to avoid unfavorable interactions. Above some threshold loading, most arbitrary adsorbed conformations are subjected to many unfavorable interactions. Thus, for a sufficiently high loading (and hence adsorbed fraction), most adsorbed conformations form a continuous spectrum of high-energy states. However, a small ensemble of conformations lies significantly lower in energy. These are the conformations that carefully avoid unfavorable interactions, as far as possible given the disorder in the sequence and in the surface site distribution. These few
Fig. 4. Schematic representation of DHPs interacting with a surface bearing just a few sites. The two panels depict different conformations with the same energy.
conformations are energetically more favorable than all others. In a manner reminiscent of the random energy model (REM) of spin glasses, the energy spectrum therefore develops a gap between the small ensemble of conformations that adsorb in a pattern-matched way and the multitude of others. By pattern matching we mean registry between adsorbed segments and the preferred sites. As the loading of surface sites increases, the system becomes more frustrated, and the energy gap between the pattern-matched adsorbed conformations and all others grows. Beyond a threshold loading, the energy gap becomes much greater than the thermal energy, kT. The polymer chain then sacrifices the entropic advantage of sampling a multitude of conformations and adsorbs in the few pattern-matched conformations, which are, of course, strongly adsorbed. The result is a phase transition with a sharp increase in the adsorbed fraction and passage to a thermodynamic state in which only a small ensemble of conformations is sampled. Mean-field theory predicts that this transition is first order. There are two free-energy minima, one corresponding to a replica-symmetric solution of the equations and the other to a solution with broken replica symmetry [46]. Before the transition from weak to strong adsorption, the replica-symmetric minimum is the global minimum. At the transition, the solution with broken replica symmetry becomes the global minimum. It is worth remarking that a two-letter DHP in solution does exhibit REM-like behavior, but the energy gap between its low-lying conformations and the continuous part of the spectrum is not large [5]. Significantly larger gaps are obtained for designed sequences.
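A toy calculation can make the gap argument concrete. The sketch below is not one of the models reviewed here. It assumes a hypothetical two-level spectrum in which `n_low` pattern-matched conformations lie an energy `gap` below `n_high` degenerate competitors. It then evaluates two quantities: the Boltzmann weight carried by the low-lying states, and the quantity $x=1-\sum_i P_i^2$ used to interpret the replica order parameter.

```python
import math


def two_level_ensemble(n_low, n_high, gap, kT=1.0):
    """Toy REM-like spectrum (hypothetical): n_low pattern-matched conformations
    at energy -gap and n_high other conformations at energy 0.

    Returns (weight, x): the total Boltzmann weight of the low-lying states and
    x = 1 - sum_i P_i**2 summed over individual conformations.
    """
    w_low = math.exp(gap / kT)            # relative Boltzmann factor of one low state
    z = n_low * w_low + n_high            # partition function
    p_low, p_high = w_low / z, 1.0 / z    # probability of a single conformation
    weight = n_low * p_low
    x = 1.0 - (n_low * p_low ** 2 + n_high * p_high ** 2)
    return weight, x


if __name__ == "__main__":
    # Half of the weight sits in the low states when gap = kT * ln(n_high / n_low),
    # i.e. many kT when n_high >> n_low.
    for gap in (0.0, 5.0, 10.0, 15.0, 20.0):
        weight, x = two_level_ensemble(n_low=2, n_high=10**6, gap=gap)
        print(f"gap = {gap:4.1f} kT   low-state weight = {weight:.3f}   x = {x:.3f}")
```

In this toy, the low-lying states take over once the gap is comparable to kT ln(n_high/n_low). That is many kT when the competing conformations are numerous. Correspondingly, x falls from unity toward 1 − 1/n_low.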
In the situation we have been considering, interaction with the disordered distribution of surface sites adds another source of frustration. When the surface loading exceeds a threshold value, this makes the energy gap in a REM-like picture quite large, even for random sequences and surface site distributions. This at least appears to be true for the thermodynamic behavior. We shall see later that kinetic considerations require delicate design of the statistics of the heteropolymer sequence and of the surface site distribution. The preceding discussion provides a compelling physical argument for a transition from weak to strong adsorption, accompanied by the adoption of a few dominant conformations, when the statistics of the DHP sequence and surface site distributions are related in a special way. However, we have provided no physical reason for the transition to be sharp (or first order, as the mean-field calculation predicts). The order of the transition can be established rigorously only by a renormalization group calculation, and a proper calculation of this sort has not yet been performed. From a fundamental standpoint, it is important to establish the order of the transition rigorously. From a practical standpoint, what matters is that the transition is sharp. It can therefore display one of the hallmarks of recognition: sharp discrimination between surfaces to which the chains bind weakly and others to which they bind very strongly. A simple model of the phenomenon suggests the physical reason for the sharpness of the transition, and Monte-Carlo simulations of finite-size systems are also indicative of a first-order transition. First, let us discuss a model [26] that complements the replica field theory discussed above and is motivated by simple physical considerations.
Again, consider DHP chains composed of two types of segments (A and B) interacting with a surface functionalized by two different types of sites. Segments of type A prefer to interact with one type of surface site, and those of type B exhibit the opposite preference. From an energetic standpoint, there are thus two types of segment–surface contacts: good and bad. Good contacts are those that
involve preferred segment–surface site interactions. Let us develop a free-energy functional whose order parameters are the total number of adsorbed segments (p) and the number of good contacts (q). The energy corresponding to given values of p and q is
$$E=\left[q\,\delta E+p\,E_0\right]kT \tag{14}$$
where $\delta E$ is the energy difference between good and bad contacts, and $E_0$ is the energy of a bad contact (both in units of kT). We now need to compute the entropy of a chain of length N with p adsorbed segments, q of which are good contacts. This entropy can be partitioned into three separate contributions. First, there is a "mixing" entropy ($S_{\mathrm{mix}}$) associated with the number of ways to choose p adsorbed segments out of N. There is also the entropy loss ($S_{\mathrm{bind}}$) associated with segmental binding, and the entropy ($S_{\mathrm{loop}}$) associated with loop fluctuations in the adsorbed conformations. As we shall see, the last contribution is of crucial importance. The simplest possible approximation for $S_{\mathrm{mix}}$ yields
$$\frac{S_{\mathrm{mix}}}{N}=-\bar p\ln\bar p-(1-\bar p)\ln(1-\bar p) \tag{15}$$
where $\bar p=p/N$ is the fraction of adsorbed segments. As usual, the loss in entropy upon segmental binding is given by
$$\frac{S_{\mathrm{bind}}}{N}=-u\,\bar p \tag{16}$$
where u is a constant related to chain flexibility and solution conditions. Now consider the computation of the loop entropy. When a homopolymer adsorbs on a chemically homogeneous surface, the energetic advantage of every segment–surface contact is the same. The adsorbed chain therefore exhibits large fluctuations in its loop structure to maximize entropy. Note that the loops live in three-dimensional space, and any description of the physics must account for loop fluctuations properly. The problem we are considering is distinctly different from homopolymer adsorption because the segment–surface contacts are of two types. The existence of good and bad contacts implies two types of loops: loops with good contacts at both ends, and loops associated with the other contacts. These two types of loops are fundamentally different in character. Each loop is characterized by its length and by the distance between its ends on the surface. For loops with good contacts at both ends, only certain values of these quantities are allowed, because the two segments at the loop ends must be bound to surface sites with which they prefer to interact. The allowed loop lengths and end-to-end distances on the surface are intimately related to the probabilities of finding certain types of sites and segments at different locations along the chain and on the surface. Thus, the allowed fluctuations of loops associated with good contacts are determined by the statistics that characterize the chain sequence and the surface site distribution. Loops associated with contacts that are not good are not restricted in this manner, and the usual
fluctuations in loop length and in the distance between loop ends are allowed. This argument suggests that, because of competing interactions and disorder, the formation of good contacts suppresses loop fluctuations. It is reasonable to suspect that loop fluctuations are strongly suppressed if the statistics of the chain sequence and the surface site distribution give a high probability of forming good contacts. Suppressing loop fluctuations makes the chain effectively stiffer, and sufficiently stiff chains are known to undergo sharp (first-order) adsorption transitions [14]. These arguments point to two origins of the sharp adsorption transition: statistical pattern matching, and the strong suppression of loop fluctuations caused by frustration due to competing interactions and quenched disorder. They also clarify the meaning of statistical pattern matching. The statistics describing the DHP sequence and the surface site distribution are such that the probability of making good contacts in certain adsorbed conformations becomes sufficiently high. To test these arguments, we need a mathematical formula for the entropy of loops associated with good and bad contacts when the two coexist. (For ease of reference, we call these loops quenched and annealed, respectively.) To develop such a formula, it is instructive to first consider the entropy of each type of loop when it is the only type present. Once these formulas are available, combining them properly is relatively straightforward. Let us begin with the quenched loops. Since the bare chains in our model are non-interacting (and hence Gaussian), the loop factor for a loop of length n that returns to the plane with its two ends separated by a distance d is again (see Eq. (12)) given by
$$P(n,d)=\frac{C}{n^{\alpha}}\exp\left(-\frac{d^{2}}{2nb^{2}}\right) \tag{17}$$
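The quenched-loop entropy used below comes from evaluating the loop factor at a fixed loop length and end separation. The following is a minimal sketch under stated assumptions. It takes the Gaussian form of Eq. (17) with prefactor $C n^{-\alpha}$ and treats C, α and b as free illustrative inputs. It then evaluates $\bar q\ln P(1/\bar q, d)$ with d set exactly to 1/σ, which assumes the proportionality constant in d ~ 1/σ equals 1.

```python
import math


def loop_factor(n, d, C, alpha, b):
    """Gaussian loop factor in the form of Eq. (17):
    P(n, d) = C * n**(-alpha) * exp(-d**2 / (2 n b**2)).

    n: loop length (in segments); d: separation of the loop ends on the surface;
    C, alpha, b: illustrative constants (their values are assumptions)."""
    return C * n ** (-alpha) * math.exp(-d * d / (2.0 * n * b * b))


def quenched_loop_entropy_density(q_bar, sigma, C, alpha, b):
    """S_q / N = q_bar * ln P(1/q_bar, 1/sigma): q quenched loops, each of length
    N/q = 1/q_bar, with ends separated by d = 1/sigma (constant in d ~ 1/sigma set to 1)."""
    return q_bar * math.log(loop_factor(1.0 / q_bar, 1.0 / sigma, C, alpha, b))


if __name__ == "__main__":
    for sigma in (0.2, 0.5, 1.0):
        s = quenched_loop_entropy_density(0.2, sigma, C=1.0, alpha=1.5, b=1.0)
        print(f"sigma = {sigma:.1f}   S_q/N = {s:.4f}")
```

Expanding the logarithm gives $\bar q\ln(C\bar q^{\alpha})-\bar q^{2}/(2b^{2}\sigma^{2})$, which is the same expression per segment as Eq. (18).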
The quantities n and d depend on the statistics that describe the chain sequence and surface site distributions. For example, for uncorrelated fluctuations in the surface site distribution, the most probable value of d is of order 1/σ. Here σ is the width of the distribution and is proportional to the surface loading of both types of sites. Since only quenched loops exist in the situation considered here, q adsorbed segments give q quenched loops of average length N/q. We shall see shortly that the average loop length (and hence n) is closely related to the statistics of the sequence and surface site distributions. In view of these considerations and Eq. (17), it is easy to write down the entropy of q quenched loops. Specifically, setting n and d equal to their most probable values gives
$$S_q=q\ln\left(C\bar q^{\,\alpha}\right)-\frac{q^{2}}{2b^{2}\sigma^{2}N} \tag{18}$$
where $\bar q=q/N$. Now consider the entropy associated with forming annealed loops only. To this end, consider the well-known problem of a homopolymer adsorbing on a chemically homogeneous surface, since there the loops are annealed. Let the energy bonus for segmental adsorption be represented by a potential U(z) that is zero everywhere except at the surface, where z is the coordinate normal to the surface. The effective energy bonus for segmental adsorption, ln β, is then given by the
following formula:
$$\ln\beta=\ln\int dz\,\left[e^{-U(z)/kT}-1\right] \tag{19}$$
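Eq. (19) is easy to evaluate for a concrete short-range potential. The sketch below assumes a hypothetical square well of depth `depth` (in units of kT) and width `width` next to the surface. For this choice, the integral is exactly width·(e^{depth/kT} − 1), so the numerical quadrature can be checked against it.

```python
import math


def ln_beta(potential, z_max, kT=1.0, n=20000):
    """ln beta = ln ∫ dz [exp(-U(z)/kT) - 1]  (Eq. (19)), midpoint rule on [0, z_max].

    The potential is assumed to vanish for z > z_max, so the integrand is zero there."""
    h = z_max / n
    total = h * sum(math.exp(-potential((i + 0.5) * h) / kT) - 1.0 for i in range(n))
    if total <= 0.0:
        raise ValueError("no net attraction: the integral in Eq. (19) is not positive")
    return math.log(total)


def square_well(depth, width):
    """Hypothetical short-range attraction: U = -depth for 0 <= z < width, else 0."""
    return lambda z: -depth if z < width else 0.0


if __name__ == "__main__":
    well = square_well(depth=3.0, width=0.1)
    print(ln_beta(well, z_max=0.5), math.log(0.1 * (math.exp(3.0) - 1.0)))
```

In this sketch, a deeper or wider well increases β, which is the effective adsorption bonus that enters Eqs. (20)–(25).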
As noted earlier, in this case the loop lengths and the distances between loop ends can fluctuate, the only constraint being the fixed chain length. This problem is similar to the adsorption of a homopolymer chain (which lives in three dimensions) to a point, whose solution is presented in [14], and that method can be adapted to the problem we are considering. The main difference is that here the potential is imposed by an impenetrable two-dimensional manifold rather than by a point. This imposes certain additional symmetries. Exploiting them, one can show [26] that the following Schrödinger-like equation describes the problem under consideration:
$$\left[1+\beta\,\delta(z)\right]\hat g\,\psi(z)=\lambda\,\psi(z) \tag{20}$$
where $\hat g$ is the standard connectivity operator, and the eigenfunction ψ and the eigenvalue λ have their usual meaning. Very simple manipulations (described in [14]) lead to the following relationship between β and λ:
$$\frac{1}{\beta}=\frac{1}{2\pi}\int dk\,\frac{g(k)}{\lambda-g(k)} \tag{21}$$
where k is the Fourier variable conjugate to z. We seek the entropy corresponding to the formation of P annealed loops. It can be obtained by first deriving a relationship between β and P. To find this relationship, it is convenient to define a generating function z(N, β) as follows:
$$z(N,\beta)=\sum_{P}\beta^{P}Z(N,P) \tag{22}$$
where Z(N, P) is the partition function for a chain of length N with P annealed loops. This generating function is related to the eigenvalue in Eq. (20) in the usual way, i.e., $\lambda^{N}=z(N,\beta)$. For adsorption problems such as this, the ground-state dominance approximation is appropriate [14,47]. We may therefore evaluate the sum in Eq. (22) in the saddle-point approximation. In other words,
$$P=\beta\,\frac{\partial\ln z}{\partial\beta} \tag{23}$$
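The saddle-point step can be checked on a case where everything is known exactly. The sketch below uses a hypothetical toy, Z(N, P) = binomial(N, P), for which z(N, β) = (1 + β)^N. It evaluates Eq. (23) by finite differences and compares the result with the dominant (most probable) term of the sum in Eq. (22).

```python
import math


def log_weights(N, beta):
    """ln[beta**P * Z(N, P)] for P = 0..N with the toy choice Z(N, P) = binomial(N, P)
    (hypothetical: each segment is independently in contact or not)."""
    return [P * math.log(beta) + math.lgamma(N + 1) - math.lgamma(P + 1)
            - math.lgamma(N - P + 1) for P in range(N + 1)]


def ln_z(N, beta):
    """ln of the generating function z(N, beta) of Eq. (22), via log-sum-exp."""
    lw = log_weights(N, beta)
    m = max(lw)
    return m + math.log(sum(math.exp(w - m) for w in lw))


def saddle_point_P(N, beta, h=1e-6):
    """Eq. (23), P = beta * d ln z / d beta, as a central difference with relative step h."""
    return (ln_z(N, beta * (1 + h)) - ln_z(N, beta * (1 - h))) / (2 * h)


def most_probable_P(N, beta):
    """The P maximizing beta**P * Z(N, P), i.e. the dominant term of Eq. (22)."""
    lw = log_weights(N, beta)
    return max(range(N + 1), key=lambda P: lw[P])


if __name__ == "__main__":
    N, beta = 1000, 0.5
    print(saddle_point_P(N, beta), most_probable_P(N, beta), N * beta / (1 + beta))
```

For N = 1000 and β = 0.5, both estimates sit near Nβ/(1 + β) ≈ 333. For this toy, the relative width of the distribution of P shrinks as N^(−1/2), which is why keeping only the dominant term is justified for long chains.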
Within this approximation, combining the relationship between β and λ (Eq. (21)) with that between λ and the generating function gives the equilibrium value of P/N. Specifically, we find that
$$\frac{P}{N}=\frac{1}{\lambda}\,\frac{\int dk\,g(k)/[\lambda-g(k)]}{\int dk\,g(k)/[\lambda-g(k)]^{2}} \tag{24}$$
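The step from Eqs. (21) and (23) to Eq. (24) can be spelled out. Ground-state dominance gives $\ln z\simeq N\ln\lambda$, and the derivative dλ/dβ follows from differentiating Eq. (21). One way to write the chain of steps:

```latex
\begin{aligned}
\frac{P}{N} &= \frac{\beta}{N}\,\frac{\partial \ln z}{\partial \beta}
             = \frac{\beta}{\lambda}\,\frac{d\lambda}{d\beta}
  &&\text{(Eq. (23) with } \ln z \simeq N\ln\lambda)\\
-\frac{d\beta}{\beta^{2}} &= -\frac{d\lambda}{2\pi}\int dk\,\frac{g(k)}{[\lambda-g(k)]^{2}}
  &&\text{(differentiating Eq. (21))}\\
\frac{P}{N} &= \frac{2\pi}{\beta\lambda}\,\Big/\!\int dk\,\frac{g(k)}{[\lambda-g(k)]^{2}}
             = \frac{1}{\lambda}\,
               \frac{\int dk\, g(k)/[\lambda-g(k)]}{\int dk\, g(k)/[\lambda-g(k)]^{2}}
  &&\text{(using Eq. (21) for } 2\pi/\beta)
\end{aligned}
```

Since $g(k)>0$ and $\lambda-g(k)<\lambda$, the denominator is at least $\lambda^{-1}$ times the numerator integral, so this expression keeps P/N below unity.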
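The entropy is now easy to calculate, because the free energy equals −N ln λ (in units of kT) and the energy bonus associated with segmental adsorption is ln β. We find that
$$\frac{S}{N}=-\frac{F}{NkT}+\frac{E}{NkT}=\ln\lambda-\frac{P}{N}\ln\beta=\ln\lambda+\frac{P}{N}\ln\left[\frac{1}{2\pi}\int dk\,\frac{g(k)}{\lambda-g(k)}\right] \tag{25}$$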
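Eqs. (21), (24) and (25) can be solved numerically once the connectivity kernel g(k) is specified. The text does not fix g(k), so the sketch below assumes a Gaussian kernel g(k) = exp(−k²l²/6). This is a hypothetical choice with g(0) = 1, so bound states require λ > 1. The sketch solves Eq. (21) for λ by bisection and then evaluates P/N and S/N. The k integrals use a plain trapezoidal rule on a truncated grid. That is adequate for moderate β but would need refinement as λ → 1.

```python
import math

# Hypothetical connectivity kernel (an assumption): g(k) = exp(-k^2 l^2 / 6).
L_SEG = 1.0
K_MAX, N_GRID = 20.0, 12001
DK = 2.0 * K_MAX / (N_GRID - 1)
G = [math.exp(-((-K_MAX + i * DK) * L_SEG) ** 2 / 6.0) for i in range(N_GRID)]


def k_integral(f):
    """Trapezoidal approximation of ∫ dk f(g(k)) on the fixed, truncated k grid."""
    vals = [f(g) for g in G]
    return DK * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


def solve_lambda(beta, lo=1.0 + 1e-9, hi=1.0e3, tol=1e-10):
    """Bisection on Eq. (21): 1/beta = (1/2π) ∫ dk g/(λ - g).
    The integral decreases monotonically with λ for λ > max g = 1."""
    target = 2.0 * math.pi / beta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k_integral(lambda g: g / (mid - g)) > target:
            lo = mid   # integral too large: the eigenvalue must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def annealed_loop_thermo(beta):
    """Return (λ, P/N, S/N) from Eqs. (21), (24) and (25)."""
    lam = solve_lambda(beta)
    i1 = k_integral(lambda g: g / (lam - g))
    i2 = k_integral(lambda g: g / (lam - g) ** 2)
    p_over_n = i1 / (lam * i2)                               # Eq. (24)
    s_over_n = math.log(lam) - p_over_n * math.log(beta)     # Eq. (25)
    return lam, p_over_n, s_over_n


if __name__ == "__main__":
    for beta in (0.5, 1.0, 2.0, 5.0):
        lam, p, s = annealed_loop_thermo(beta)
        print(f"beta = {beta:3.1f}   lambda = {lam:.4f}   P/N = {p:.4f}   S/N = {s:.4f}")
```

Tabulating S/N against P/N over a range of β gives the entropy of annealed loops as a function of their number. That is the input needed for the combination with quenched loops that follows.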
Solving Eqs. (21) and (24) simultaneously gives P as a function of β, and Eqs. (21) and (25) give the entropy as a function of β. Together they yield the entropy as a function of P. Now we need to compute the entropy when quenched and annealed loops coexist. On physical grounds, the total number of segment–surface contacts must be greater than or equal to the number of good contacts. This implies that the annealed loops live within the quenched ones, as illustrated schematically in Fig. 5. The annealed loops can redistribute among the quenched ones without constraint. It is most convenient to treat this physics in the grand canonical ensemble, in which the chemical potential of the annealed loops is the same in all the quenched loops. The chemical potential is easily calculated from the equations derived so far, since it equals −∂S/∂P, and one finds that it is a monotonic function of P. Combined with the equality of the chemical potential of annealed loops in all quenched loops, this leads to a conclusion about the concentration of annealed loops, $\bar p_a$ = (number of annealed loops)/(length of quenched loop). This concentration is the same in each quenched loop. These remarks allow us to combine the entropies of quenched and annealed loops (Eqs. (18), (21), (24) and (25)) properly. There are q quenched loops (q good contacts), $\bar p_a=\bar p-\bar q$, and $\bar p_a$ is the same in each quenched loop. Together, these facts lead to the following expression for the loop entropy corresponding to p adsorbed segments, q of which are good contacts:
$$\frac{S_{\mathrm{loop}}}{N}=\bar q\ln P\!\left(1/\bar q,\,d\right)+s(\bar p_a) \tag{26}$$
where P is the probability written down in Eq. (17), and $s(\bar p_a)$ is the entropy of annealed loops with concentration $\bar p_a$, divided by n. The latter quantity is obtained from Eqs. (21), (24), and (25) with one
Fig. 5. Schematic depicting quenched and annealed loops. The darkly shaded loops have good contacts at both ends. The lightly shaded loops do not have good contacts at their ends and exhibit significant fluctuations.
modification. The calculation leading to these equations considered a uniform surface, whereas we are concerned with a situation in which adsorption can occur only on particular surface sites that do not cover the surface uniformly. The concomitant entropy loss is proportional to ln σ and is accounted for by Chakraborty and Bratko [26]. Combining Eqs. (14) and (26), we obtain the free-energy density f as a function of the order parameters $\bar p$ and $\bar q$:
$$f=\bar q\,\delta E+(u+E_0)\,\bar p+\bar p\ln\bar p+(1-\bar p)\ln(1-\bar p)-\bar q\ln\left(C\bar q^{\,\alpha}\right)+\frac{\bar q^{2}}{2b^{2}\sigma^{2}}-s(\bar p-\bar q)-(\bar p-\bar q)\ln\sigma^{2} \tag{27}$$
These two order parameters can be further related by noting that
$$\bar q=\bar p\,P_g \tag{28}$$
where $P_g$ is the probability of making a good contact. At infinite temperature, when entropy is irrelevant, $P_g$ is simply related to the statistics of the sequence and surface site distributions. Let us denote this intrinsic probability of making good contacts by $P_0$. It has been conjectured [26] that $P_0$ is related to the sequence and surface site statistics by $P_0=\sum_m P_{\mathrm{seq}}(m)\,P_{\mathrm{surf}}(m)$. Here $P_{\mathrm{seq}}(m)$ is the probability of finding a block of m like segments in the chain sequence, and $P_{\mathrm{surf}}(m)$ is the probability of finding a patch of m like sites on the surface. This conjecture has been found to be consistent with Monte-Carlo simulation results [24,25]. Note that $P_{\mathrm{seq}}(m)$ and $P_{\mathrm{surf}}(m)$ are easily computed from the statistics that describe the DHP chain sequence and surface site distributions. At finite temperatures, entropic considerations modify the probability of making good contacts, and $P_0$ must be weighted by the free energy for making good contacts as follows:
$$P_g=\frac{P_0\,e^{-F_q/kT}}{P_0\,e^{-F_q/kT}+(1-P_0)\,e^{-F_a/kT}} \tag{29}$$
where $F_q$ and $F_a$ are the free energies associated with quenched and annealed contacts (loops), which can be calculated from the equations derived earlier. Eqs. (27)–(29) must be solved numerically to obtain the values of the order parameters $\bar p$ and $\bar q$ that minimize the free energy, and Chakraborty and Bratko [26] have obtained this mean-field solution. The purpose of this exercise is to gain some insight into the origin of the sharp transition. Let us therefore consider the simplest possible scenario: uncorrelated sequence and surface site fluctuations. In this case, $P_0$ is simply proportional to the product of the widths of the two statistical distributions. Fig. 6 shows how $\bar p$ and $\bar q$ vary with σ, the width of the distribution that characterizes the surface site fluctuations. The statistics of the DHP sequence are held fixed in constructing Fig. 6.
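The conjectured intrinsic probability $P_0$ and its finite-temperature reweighting in Eq. (29) are simple to evaluate. For illustration only, the sketch below assumes that both the block lengths along the chain and the patch sizes on the surface follow geometric distributions, as first-order Markov statistics would give. In that case $P_0$ has the closed form (1 − a)(1 − b)/(1 − ab). The free energies $F_q$ and $F_a$ are treated as given inputs rather than computed.

```python
import math


def geometric_run_lengths(p_same, m_max):
    """Hypothetical run-length distribution P(m) = (1 - p_same) * p_same**(m - 1),
    m = 1..m_max, as produced by first-order Markov statistics."""
    return [(1.0 - p_same) * p_same ** (m - 1) for m in range(1, m_max + 1)]


def intrinsic_good_contact_probability(p_seq, p_surf):
    """Conjectured P_0 = sum_m P_seq(m) * P_surf(m)  [26]."""
    return sum(a * b for a, b in zip(p_seq, p_surf))


def good_contact_probability(p0, f_quenched, f_annealed, kT=1.0):
    """Eq. (29): weight P_0 by the free energies F_q and F_a of quenched and
    annealed contacts (loops); both are supplied by the caller."""
    wq = p0 * math.exp(-f_quenched / kT)
    wa = (1.0 - p0) * math.exp(-f_annealed / kT)
    return wq / (wq + wa)


if __name__ == "__main__":
    p0 = intrinsic_good_contact_probability(geometric_run_lengths(0.7, 400),
                                            geometric_run_lengths(0.6, 400))
    print(p0, good_contact_probability(p0, f_quenched=-1.0, f_annealed=0.0))
```

With $F_q=F_a$ the reweighting leaves $P_0$ unchanged. Lowering $F_q$ relative to $F_a$ raises $P_g$ and, through Eq. (28), raises $\bar q$ at fixed $\bar p$.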
Thus, points on the abscissa in Fig. 6 correspond to different surfaces. Fig. 6 shows that both $\bar p$ and $\bar q$ are zero when σ is small. This simply reflects the fact that the energetic advantage of segmental binding is not sufficient to overcome the concomitant entropic penalty, because these surfaces do not carry enough binding sites. When σ becomes sufficiently large, adsorption does occur. However, it is
Fig. 6. The order parameters $\bar p$ (solid line) and $\bar q$ (dotted line) plotted as a function of σ. For the calculation described in the text, each point on the abscissa corresponds to a different surface.
important to note that the number of good contacts is very small in this weak-adsorption limit. Above a threshold value of σ, the theory predicts a sharp transition from weak to strong adsorption. This transition coincides with a jump in $\bar q$, signifying that the preponderance of contacts are now good ones. The entropy is then dominated by that of the quenched loops, and it is low because loop fluctuations are suppressed. Notice that after the sharp transition $\bar q$ grows faster than $\bar p$ and ultimately approaches $\bar p$. These results support the argument made earlier that the suppression of loop fluctuations is the origin of the sharp transition from weak to strong adsorption. The evidence is that the number of good contacts (quenched loops) jumps at this transition. A preponderance of good contacts implies strong suppression of loop fluctuations, i.e., a situation resembling the adsorption of a stiff chain, for which the adsorption transition is known to be first order [14]. In some ways, the phenomenon we are considering resembles protein folding (or heteropolymeric models of folding). In the latter situation, a first-order transition called the coil–globule transition occurs, in which the preponderance of contacts become native ones. This is followed by a continuous transition to the final low-entropy folded state. The sharp transition we see may be considered the analog of the coil–globule transition. The dominance of quenched loops in the strongly adsorbed state is very significant. It implies that the chain adopts a small number of conformations characterized by a distribution of loops specific to the sequence and surface site distributions. Only small fluctuations around these conformations (shapes) of the adsorbed chain occur after the transition.
This adoption of a small class of shapes makes the phenomenon we are considering richer than other successful efforts to elucidate strategies that localize chains to certain regions of a surface [29–31]. In the replica field theory, the same feature was signaled by broken replica symmetry coinciding with the transition from weak to strong adsorption. In the simple model just discussed, the structure of the quenched loops is characterized only by their average length, N/q. Notice that even this quantity is determined by the probability of good contacts, i.e., by the statistics of the sequence and surface site distributions. This suggests that the class
of shapes adopted upon strong adsorption is determined by the statistical patterns on the chain and the surface. The emergence of a particular class of shapes (conformations) upon recognition will be explored in great detail later, when we consider the kinetics of pattern recognition. At that point, the suggestions of the thermodynamic model we have been considering will become vivid.

2.2. Monte-Carlo simulations of thermodynamic properties

The predictions of the models we have been considering are consistent with a series of Monte-Carlo simulations designed to compute thermodynamic properties [24,25,27]. These studies used an adaptation of the non-dynamic ensemble growth method pioneered by Higgs and Orland [48,49] and were carried out on a cubic lattice. The lattice does affect quantitative predictions, but the phenomenology is expected to be the same, for reasons explained in detail in [48]. A particular sequence is first drawn from the statistical distribution under consideration, and so is a particular realization of the surface site distribution. M monomers are then placed randomly, with Boltzmann factors dictating the positional probabilities. These positions are allowed to vary between 0 and 2N, where N is the length of the polymer to be simulated. This is equivalent to studying isolated chains confined between identical surfaces separated by a distance 4N. One then attempts to add a second segment of type A or B (as specified by the particular realization of the sequence) at each lattice neighbor of every monomer. In other words, 6M trials are made. M dimers are then chosen with Boltzmann probabilities. The potential energy is determined by intersegment interactions and interactions with the surface sites, and excluded-volume interactions are enforced.
This process is continued until chains with the desired length have been grown. For M
Fig. 7. Results of ensemble growth Monte-Carlo simulations: the adsorbed fraction $\bar p$ and the parameter $\tilde x$ as a function of σ. Each point on the abscissa corresponds to a different surface.
analytical models, the statistics of the surface site distribution are then varied by changing the total number of sites per unit area (i.e., the loading). Fig. 7 shows the results of the ensemble growth MC simulations for these situations by plotting the adsorbed fraction $\bar p$ as a function of σ. Here σ is the width of the distribution characterizing the surface site distribution and is proportional to the loading. The adsorbed fraction $\bar p$ is vanishingly small for small loadings of the surface. For sufficiently large values of σ, we find a gradual transition to weakly adsorbed states. Beyond a higher threshold value of σ, we also observe a sharp transition from weak to strong adsorption. Fig. 7 also plots the variation with σ of a parameter $\tilde x$, defined analogously to x in the replica field theory. Specifically, $\tilde x=1-\sum_i P_i^2$, where $P_i$ is the probability of finding a conformation with energy $E_i$. Because of degeneracy, $\tilde x$ and x are not identical. However, simulation results [25] show that $\tilde x$ faithfully reproduces the qualitative trends obtained by computing x. $\tilde x$ is also much easier to compute from the simulation data, because $E_i$ is readily obtained from the simulations. Fig. 7 shows that $\tilde x$ equals unity for values of σ at which there is no adsorption or only weak adsorption. The transition from weak to strong adsorption, however, is accompanied by $\tilde x$ becoming less than unity. (The simulation results suggest that $\tilde x$ drops below unity at a value of σ slightly larger than that of the weak-to-strong adsorption transition. Some reasons for this have been discussed [26], and we shall not focus on this minor detail here.) The simulation results shown in Fig. 7 therefore reveal the same phenomenology as the theoretical models.
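The ensemble growth procedure and the $\tilde x$ diagnostic described above can be sketched compactly. This is a simplified illustration, not the code used in [24,25,27], and all function names are hypothetical. It assumes the following:

- a single impenetrable surface at z = 0 with uncorrelated A/B sites at a given loading;
- contact energies of −ε for preferred contacts and +ε for unfavorable ones;
- no intersegment interactions;
- Boltzmann resampling with replacement;
- P(E) estimated from the frequency of each energy level in the final population.

```python
import math
import random
from collections import Counter

# Cubic-lattice moves: the 6 nearest neighbours.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def make_surface(size, loading, rng):
    """Hypothetical surface on the plane z = 0: each (x, y) site of a periodic
    size x size patch is functionalized with probability `loading`, as type
    'A' or 'B' with equal probability (uncorrelated sites)."""
    return {(x, y): rng.choice("AB")
            for x in range(size) for y in range(size) if rng.random() < loading}


def contact_energy(seg, pos, surface, size, eps):
    """-eps for a preferred contact (segment letter matches site letter),
    +eps for an unfavorable one, 0 off the surface or above an empty site."""
    if pos[2] != 0:
        return 0.0
    site = surface.get((pos[0] % size, pos[1] % size))
    if site is None:
        return 0.0
    return -eps if site == seg else eps


def resample(trials, M, kT, rng):
    """Choose M trials (with replacement) with probability proportional to exp(-E/kT)."""
    e_min = min(e for _, e in trials)           # shift for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for _, e in trials]
    return rng.choices(trials, weights=weights, k=M)


def ensemble_growth(sequence, surface, size, M, eps=1.0, kT=1.0, z_max=None, seed=0):
    """Grow M chains segment by segment; returns a list of (positions, energy)."""
    rng = random.Random(seed)
    z_max = 2 * len(sequence) if z_max is None else z_max
    # First segment: 6M random candidate positions with heights in [0, z_max].
    starts = [((rng.randrange(size), rng.randrange(size), rng.randint(0, z_max)),)
              for _ in range(6 * M)]
    trials = [(c, contact_energy(sequence[0], c[0], surface, size, eps)) for c in starts]
    population = resample(trials, M, kT, rng)
    for seg in sequence[1:]:
        trials = []
        for chain, energy in population:
            occupied = set(chain)
            x, y, z = chain[-1]
            for dx, dy, dz in NEIGHBORS:              # up to 6 trials per chain
                new = (x + dx, y + dy, z + dz)
                if new[2] < 0 or new in occupied:     # impenetrable surface, excluded volume
                    continue
                trials.append((chain + (new,),
                               energy + contact_energy(seg, new, surface, size, eps)))
        if not trials:
            raise RuntimeError("every chain in the population is trapped")
        population = resample(trials, M, kT, rng)
    return population


def adsorbed_fraction(population):
    """Population average of the fraction of segments lying on z = 0."""
    n = len(population[0][0])
    return sum(sum(1 for p in c if p[2] == 0) for c, _ in population) / (len(population) * n)


def x_tilde(population, ndigits=9):
    """x~ = 1 - sum_E P(E)^2, estimating P(E) by the frequency of each energy
    level in the final resampled population."""
    counts = Counter(round(e, ndigits) for _, e in population)
    M = len(population)
    return 1.0 - sum((c / M) ** 2 for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(1)
    seq = "".join(rng.choice("AB") for _ in range(20))
    surf = make_surface(size=20, loading=0.5, rng=rng)
    pop = ensemble_growth(seq, surf, size=20, M=200, eps=2.0, seed=2)
    print("adsorbed fraction:", adsorbed_fraction(pop), "x~:", x_tilde(pop))
```

Scanning `loading` at fixed sequence statistics mimics varying σ in Fig. 7. A meaningful comparison would need far larger populations, several chain lengths and averaging over sequence and surface realizations.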
Consider uncorrelated sequence and surface site fluctuations with fixed DHP sequence statistics. When the surface loading exceeds a threshold value, a sharp change from weak to strong adsorption occurs, and in the strongly adsorbed state the thermodynamics is determined by a few dominant conformations. The transition is rounded in the Monte-Carlo simulations because of finite-size effects. Unlike in the theoretical predictions, the order of the transition is difficult to ascertain from the simulation results. This question can be explored by examining finite-size effects, and so the quantity ⟨(δp)²⟩ has been computed. This quantity exhibits a peak at the sharp adsorption transition. Furthermore, the width of the peak narrows
and the peak shifts to the left as the chain size increases. Unfortunately, no firm conclusions can be drawn about the order of the transition. Simulations have been carried out for only three chain lengths, too few to extract finite-size scaling exponents with confidence [25]. One reason more simulation data have not been collected is that the important result is simply that a sharp transition from weak to strong adsorption occurs when the statistics of the sequence and surface sites are related in a special way. The order of the transition is fundamentally very important. It is not crucial, however, for examining the physics of discriminatory pattern recognition that exploits self-assembly driven by statistical pattern matching. We now turn to these exciting questions, for which the preceding discussion serves as a prelude. The basic physics described above can be exploited to make DHPs bearing certain statistical patterns recognize surfaces that bear complementary statistical patterns. We begin considering how by discussing some simulation results that pertain to thermodynamics. Detailed considerations of the kinetics of the self-assembly process that leads to recognition driven by statistical pattern matching will follow. Consider DHPs with symmetric average compositions that carry two different types of statistical patterns encoded in their sequences. The two types of statistical patterns are characterized by two-point correlations only. As noted earlier, these two-point correlations are specified by a parameter λ. Positive values of λ imply a high probability of finding segments of the same type within a certain length along the chain, and we call the ensemble of DHPs with such sequences statistically blocky. Negative values of λ imply a high probability of finding the two types of segments arranged in an alternating fashion within a certain length along the chain.
We shall refer to these types of sequences as statistically alternating. We shall also consider two types of statistically patterned surfaces, restricting attention to situations where the composition specifying the relative amounts of the two types of sites is symmetric. Surfaces that we call statistically patchy have a high probability for sites of the same type to be adjacent to each other within a correlation length of order 1/κ, measured on the two-dimensional surface. Those that we call statistically striated have a high probability for sites of the opposite type to be adjacent to each other within a correlation length of order 1/κ. Specifically, the density–density correlation functions specifying the distribution of sites on the surface are $\sigma e^{-\kappa r}$ and $\sigma(-1)^{\Delta x+\Delta y}e^{-\kappa r}$ for statistically patchy and striated surfaces, respectively. Here r is the distance measured on the two-dimensional surface, and Δx and Δy are displacements along the two orthogonal Cartesian coordinates used to specify position on the surface. σ measures the strength of the surface disorder and is proportional to the total surface loading, a physical parameter with which we are already familiar. As we shall see later, the total loading can be varied conveniently in experiments. Ensemble growth MC simulations have been performed to study whether statistically blocky DHPs can recognize statistically patchy surfaces more easily than statistically striated surfaces, and vice versa [24]. The two types of statistically patterned DHPs were characterized by λ = 0.4 and λ = −0.4. The statistically patterned surfaces are characterized by κ = 0.7/l, where l is the statistical segment length. The statistics of surfaces within a class (statistically patchy or striated) are varied by changing the total loading, i.e., σ. Figs. 8a and b show the results of the ensemble growth MC simulations for statistically patchy and striated surfaces, respectively.
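The sequence and surface statistics just defined can be made concrete with a minimal Python sketch. It rests on one assumption that the text does not state: the λ-ensembles are generated by a symmetric first-order Markov chain in which each segment repeats its predecessor's type with probability (1 + λ)/2, which makes the nearest-neighbour correlation equal to λ. The surface correlation functions are evaluated with the lattice spacing taken as l. All function names are hypothetical.

```python
import math
import random


def markov_sequence(n, lam, rng):
    """Two-letter (A/B) sequence of length n from a symmetric Markov chain.

    Assumption: each segment repeats the type of its predecessor with
    probability (1 + lam) / 2, so the nearest-neighbour correlation is lam
    (lam > 0: statistically blocky, lam < 0: statistically alternating).
    """
    p_same = (1.0 + lam) / 2.0
    seq = [rng.choice("AB")]  # symmetric composition: unbiased first segment
    for _ in range(n - 1):
        if rng.random() < p_same:
            seq.append(seq[-1])
        else:
            seq.append("B" if seq[-1] == "A" else "A")
    return "".join(seq)


def nearest_neighbor_correlation(seq):
    """Mean of s_i * s_(i+1) with A -> +1, B -> -1; estimates lam."""
    s = [1 if c == "A" else -1 for c in seq]
    return sum(a * b for a, b in zip(s, s[1:])) / (len(s) - 1)


def surface_correlation(dx, dy, sigma, kappa, striated=False):
    """Site density-density correlation at lattice displacement (dx, dy).

    patchy:   sigma * exp(-kappa * r)
    striated: sigma * (-1)**(dx + dy) * exp(-kappa * r)
    r is measured in units of the lattice spacing (taken here to be l).
    """
    r = math.hypot(dx, dy)
    value = sigma * math.exp(-kappa * r)
    if striated:
        value *= (-1) ** (dx + dy)
    return value


if __name__ == "__main__":
    rng = random.Random(0)
    for lam in (0.4, -0.4):  # the two ensembles used in the simulations
        seq = markov_sequence(100_000, lam, rng)
        print(lam, round(nearest_neighbor_correlation(seq), 2))
    # kappa = 0.7 per segment length: nearest-neighbour sites are positively
    # correlated on a patchy surface and negatively on a striated one.
    print(surface_correlation(1, 0, 1.0, 0.7),
          surface_correlation(1, 0, 1.0, 0.7, striated=True))
```

Because the same/different choices at each step are independent in this model, the measured correlation converges quickly to λ. Sequences generated this way are one convenient realization of the blocky and alternating ensembles, not necessarily the procedure used in [24].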
The adsorbed fraction ρ is plotted as a function of σ for statistically blocky and alternating DHPs in each panel. In all four cases, we find that a sharp adsorption
Fig. 8. Results of Ensemble Growth Monte-Carlo simulations for the interactions of statistically alternating (open circles) and blocky (filled circles) DHPs with (a) statistically patchy surfaces and (b) statistically striated surfaces.
transition occurs on the background of weak adsorption when the loading reaches a threshold value. For the statistically patchy surface, the sharp adsorption transition occurs at a smaller value of σ for the statistically blocky chains; the opposite is true for the statistically striated surface. Thus, a statistically patchy (striated) surface with a loading that strongly adsorbs the statistically blocky (alternating) ensemble of DHPs will not strongly bind the statistically alternating (blocky) chains. These results show that, at least from a thermodynamic standpoint, DHPs can discriminate between surfaces bearing different statistical patterns, and vice versa. In other words, they can recognize each other's statistics. The physical reason that enables such recognition driven by statistical pattern matching is the following. For pseudo patchy surfaces, arbitrary adsorbed conformations of blocky DHPs will experience more unfavorable interactions at smaller surface loadings than sequences that are statistically alternating. Further, pattern-matched adsorbed conformations with low energies are statistically far more probable for blocky DHPs interacting with pseudo patchy surfaces than for statistically alternating DHPs. These reasons cause the energetic driving force to
adsorb in a few pattern-matched conformations (D) to be larger for statistically blocky chains interacting with pseudo patchy surfaces (at the same value of loading). A statistically blocky chain also pays a smaller entropic cost to organize itself into a pattern-matched conformation on a pseudo patchy surface. Together with the larger value of D, this leads such sequences to adsorb strongly to pseudo patchy surfaces at smaller loadings than statistically alternating sequences. Similar arguments explain why statistically alternating chains recognize pseudo striated surfaces at smaller loadings than statistically blocky chains do. The simulation results we have just described demonstrate that, from the standpoint of thermodynamics, statistically patterned chains can recognize statistically patterned surfaces when the statistics characterizing the sequence and surface sites are related in a special way. These results are captured qualitatively by the simple formula described earlier, in which P is written as a product of two distributions of m. Static ensemble growth MC simulations have also been carried out to investigate the competition between intersegment interactions and interactions between segments and surface sites [26]. The main results of these simulations are as follows. There are two types of intersegment interactions. The first is a nonspecific excluded volume interaction, whose strength is $(V_{AA}+V_{BB}+2V_{AB})/2$, where $V_{ij}$ is the strength of the short-range intersegment interaction between segments of types i and j. The sequence-specific intersegment interactions, which encourage segregation of like-type segments and freezing into a few dominant conformations, are also taken to be short-ranged; their strength is $b=(V_{AA}+V_{BB}-2V_{AB})/2$. In the following, we increase the strength of these interactions by increasing the magnitude of b.
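The two combinations of contact energies defined above can be checked with a small worked example. The energies below are illustrative and are not values used in [26]. They are taken in units of kT, with negative values favorable, which is an assumed sign convention; the function name is hypothetical.

```python
def interaction_parameters(v_aa, v_bb, v_ab):
    """Split contact energies V_ij (units of kT) into the two combinations.

    nonspecific = (V_AA + V_BB + 2 V_AB) / 2   excluded-volume-like part
    b           = (V_AA + V_BB - 2 V_AB) / 2   sequence-specific part
    """
    nonspecific = (v_aa + v_bb + 2.0 * v_ab) / 2.0
    b = (v_aa + v_bb - 2.0 * v_ab) / 2.0
    return nonspecific, b


# Like contacts attractive, unlike contacts repulsive: purely specific, |b| = 4.
print(interaction_parameters(-2.0, -2.0, 2.0))  # (0.0, -4.0)
```

With this choice the nonspecific part vanishes and only the sequence-specific part survives, at the magnitude |b| = 4 used in Fig. 10.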
Consider again 2-letter DHPs and multifunctional surfaces with short-range correlations describing the sequence and surface site fluctuations. When |b| is small, the variation of ρ and x with σ is not qualitatively different from that discussed earlier for b = 0. This is shown in Fig. 9 for DHPs with symmetric composition interacting with surfaces on which the average composition of the two types of sites is also symmetric. Fig. 10 shows results for a relatively large value, |b| = 4. Even before adsorption occurs, x < 1. This is because of the well-known phenomenon
Fig. 9. Results of Ensemble Growth Monte-Carlo simulations. Adsorbed fraction ρ (full line) and x (dotted line) plotted against the surface disorder strength σ. The results are for a small value of |b|.
Fig. 10. Results of Ensemble Growth Monte-Carlo simulations. Adsorbed fraction ρ (full line) and x (dotted line) plotted against the surface disorder strength σ. The results are for a relatively large value, |b| = 4kT.
(e.g., [1–5]) wherein, for large enough values of |b|, frustration due to connectivity and quenched sequence disorder causes a few energetically favorable conformations to determine the thermodynamics. The situation we are studying is thus analogous to the adsorption of a folded protein from solution. The results displayed in Fig. 10 show that the variation of ρ with σ has the same form as when |b| is small. Comparison of Figs. 9 and 10 shows that the sharp transition from weak to strong adsorption occurs at higher values of σ when |b| is larger. Physically, this is because stronger specific intersegment interactions place the DHP chains in lower-energy states in solution than when |b| is small. Stronger (more favorable) segment–surface interactions are therefore required for strong adsorption to become favorable. A significantly more interesting effect of specific intersegment interactions is revealed by the variation of x with σ displayed in Fig. 10. The few conformations that are adopted in solution are determined by the nature of the intersegment interactions and the sequence statistics. As the loading increases beyond the point where a gradual transition to weak adsorption occurs, x decreases even further. This is because, even though the polymers are still essentially in the same frozen conformations as those in solution, adsorption to the surface eliminates a few more conformations from being sampled. When the loading is increased beyond the point where a sharp transition from weak to strong adsorption occurs, the MC simulation results show a rather unusual variation of x with σ: Fig. 10 shows that x first increases and then decreases again. This is interpreted as follows.
As the interactions of the chain with the surface increase because of the higher density of surface sites, these interactions compete favorably with the intersegment interactions. The few dominant conformations favored by the intersegment interactions alone are then no longer energetically far more favorable than all other conformations. The system therefore lowers its free energy by gaining the entropy associated with sampling conformations other than these low-energy ones. In other words, the few dominant conformations adopted by the chains due to intersegment interactions unravel upon strong adsorption to the surface. As the loading, and hence the interactions with the surface, are
Fig. 11. Schematic depiction of the adsorption of DHPs with different values of |b|: (a) moderate values of |b|. Here, for large enough surface disorder strength, the conformations preferred by the intersegment interactions can unravel; (b) large values of |b|. Now the intersegment interactions are too strong for the segment–surface interactions to be dominant, and adsorption always occurs in conformations very similar to those preferred by the intersegment interactions.
increased even further, the segment–surface interactions dominate over the intersegment interactions. The physics should then be similar to that observed when the intersegment interactions are weak. The strong segment–surface interactions, coupled with the frustration due to disorder in the DHP sequence and surface site fluctuations, should cause the DHPs to freeze into a few dominant conformations determined by the statistics of the chain sequence and surface site distributions. Consistent with this picture (shown schematically in Fig. 11), x decreases again in Fig. 10. The analytical and simulation results that we have described so far suggest the occurrence of a phenomenon akin to recognition in biological systems, due to statistical pattern matching between the sequences of DHPs and the distribution of sites on multifunctional surfaces. Specifically, these thermodynamic studies show that frustration (due to competing interactions and disorder) and statistical pattern matching lead to one of the hallmarks of recognition: a sharp discrimination between surfaces to which a given ensemble of DHP sequences binds strongly and those to which it does not. These studies also suggest that, when strong adsorption occurs, the chains adsorb in a small class of conformations or shapes. These suggestions notwithstanding, it is unclear whether statistical pattern matching is sufficient to realize the other hallmarks of recognition and hence be pragmatically useful and of further scientific interest. The basic question that the aforementioned studies do not address is: can DHPs bearing a statistical pattern discriminate between various parts of a single surface that bears different statistical patterns in different regions? Furthermore, can this happen on time scales of practical interest? If the answer to
these questions is to be affirmative, then DHPs must not get kinetically trapped in the "wrong" regions of the surface, by which we mean regions bearing statistical patterns of sites that are not matched to the DHP sequence statistics. To address these issues, kinetic Monte-Carlo simulations have been performed [28]. The results of these simulations show that the answers to the questions posed above are "yes", provided the statistical patterns are properly designed. Furthermore, they reveal that the dynamical behavior of this system is typical of frustrated systems characterized by rugged free energy landscapes. As we shall discuss, the results demonstrate that the system we are considering may be a good model system for experimental studies of kinetics in frustrated systems. Intriguing connections can also be made to Kauffman's ideas [51] regarding the interplay between selection and the propensity for self-organization in evolution.
2.3. Kinetics of recognition due to statistical pattern matching
In order to discuss the kinetics of the phenomenon under consideration, let us begin again by considering 2-letter DHPs that carry statistical patterns. As before, let us concern ourselves with statistically blocky and alternating patterns, with λ = +0.4 and −0.4, respectively. The compositions of both ensembles of sequences are symmetric. As discussed earlier, surfaces with two types of sites on a neutral background can also bear simple statistical patterns. The simplest statistical measures of the patterns carried by statistically alternating and patchy surfaces are:
- the correlation length;
- the total density of A- and B-type sites on the surface (loading);
- the ratio f of the numbers of A- and B-type sites (always equal to unity in this discussion).
Golumbfskie et al. [28] have generated statistically patterned surfaces of this type by obtaining equilibrium realizations of a two-dimensional Ising-like system with a Monte-Carlo (MC) algorithm. They simulate a lattice Hamiltonian with only nearest-neighbor interactions, using MC moves that exchange the identities of two sites. Typically, 100 million MC steps were run, well beyond the point at which equilibrium is established. To generate statistically patchy (alternating) surfaces, interactions between sites of the same type are taken to be attractive (repulsive) and those between sites of opposite type repulsive (attractive). Neutral sites are non-interacting. The correlation length is determined by the loading and by the temperature, $T_s$, at which the MC simulation is carried out. Large values of $T_s$ lead to essentially random surfaces and, for a fixed loading, statistical patterns with larger correlation lengths are obtained upon reducing $T_s$. It is worth noting that patterned surfaces of this sort can be created in practice by self-assembly of mixed molecular adsorbents [52–56]. Consider a surface composed of four quarters, each of size 100×100 lattice units. Let two quarters have essentially random distributions of A (red) and B (yellow) type sites, and let the other two be realizations of statistically patchy and alternating surfaces generated by the methods described above. Such an arrangement is shown in Fig. 12, where all quarters of the surface have an average total loading of 20%, and the correlation length is ≈1.4 for the statistically patterned regions. Golumbfskie et al. [28] have carried out MC simulations with the Verdier–Stockmayer algorithm for chain motion to study the dynamic behavior of DHPs in the vicinity of such surfaces. They simulate a Hamiltonian with nearest-neighbor interactions. Attention is restricted to situations where the segment–surface site interactions are much stronger than the intersegment
Fig. 12. Surface bearing four types of statistical patterns. White denotes the neutral background, and light and dark grey dots correspond to the two types of active surface sites. The right top corner is statistically alternating and the left bottom corner is statistically patchy. The other two quarters have a random distribution of surface sites.
interactions, and hence the latter are taken to be nonspecific and of the excluded volume type. The preferred segment–surface interaction strengths are taken to be −1 and the non-preferred ones +1, in units of $kT_R$, where $T_R$ is a reference temperature. DHP segments of type A prefer to interact with red sites on the surface, and those of type B prefer the yellow surface sites (see Fig. 12). The simulations are carried out at a temperature T for chains of length 32, 100, and 128. The results show that, for fixed f and surface loading, the important design variables are λ, $T_s$, and $T/T_R$ (determined by the chemical identities of segments and surface sites, and by preparation conditions). As we have seen from the thermodynamic results, statistically blocky DHPs are better pattern matched with the statistically patchy part of the surface, and statistically alternating DHPs with the statistically alternating region. The first question one may ask is: if the surface shown in Fig. 12 is exposed to a solution containing a mixture of statistically blocky and statistically alternating DHPs, will the chain molecules selectively adsorb on those regions of the surface with which they are statistically pattern matched? Such recognition due to statistical pattern matching requires not only that the "correct" patch be strongly favored for binding thermodynamically, but also that the "wrong" regions of the surface not serve as kinetic traps.
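The surface-generation procedure of Golumbfskie et al. [28] can be sketched in a few dozen lines: an Ising-like lattice of A, B and neutral sites, equilibrated with Metropolis moves that swap the identities of two sites. This is a minimal illustration under stated assumptions, not their code. The choices that like pairs interact with an energy `like_energy` and unlike active pairs with its negative, the periodic boundaries, and the helper names `make_surface` and `like_fraction` are all hypothetical.

```python
import math
import random

NEUTRAL, A, B = 0, 1, 2
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def make_surface(size, loading, like_energy, t_s, sweeps, seed=0):
    """Equilibrate a periodic size x size lattice of A, B and neutral sites.

    Hypothetical parameters: like pairs interact with like_energy and unlike
    active pairs with -like_energy (like_energy < 0: patchy, > 0: alternating);
    neutral sites do not interact. Moves swap the identities of two randomly
    chosen sites, so the loading and the 1:1 A:B ratio are conserved.
    """
    rng = random.Random(seed)
    n_active = round(loading * size * size)
    sites = [A] * (n_active // 2) + [B] * (n_active - n_active // 2)
    sites += [NEUTRAL] * (size * size - n_active)
    rng.shuffle(sites)
    grid = [sites[i * size:(i + 1) * size] for i in range(size)]

    def pair(s, t):
        if s == NEUTRAL or t == NEUTRAL:
            return 0.0
        return like_energy if s == t else -like_energy

    def local(x, y, s):
        # Energy of a site of type s at (x, y) with its four neighbours.
        return sum(pair(s, grid[(x + dx) % size][(y + dy) % size])
                   for dx, dy in NEIGHBORS)

    for _ in range(sweeps * size * size):
        x1, y1 = rng.randrange(size), rng.randrange(size)
        x2, y2 = rng.randrange(size), rng.randrange(size)
        s1, s2 = grid[x1][y1], grid[x2][y2]
        if s1 == s2:
            continue  # swapping identical sites changes nothing
        old = local(x1, y1, s1) + local(x2, y2, s2)
        grid[x1][y1], grid[x2][y2] = s2, s1
        new = local(x1, y1, s2) + local(x2, y2, s1)
        d_e = new - old  # a shared bond, if any, appears equally in both sums
        if d_e > 0 and rng.random() >= math.exp(-d_e / t_s):
            grid[x1][y1], grid[x2][y2] = s1, s2  # Metropolis rejection
    return grid


def like_fraction(grid):
    """Fraction of active-active nearest-neighbour pairs of the same type."""
    size = len(grid)
    like = total = 0
    for x in range(size):
        for y in range(size):
            for dx, dy in ((1, 0), (0, 1)):  # count each bond once
                s, t = grid[x][y], grid[(x + dx) % size][(y + dy) % size]
                if s != NEUTRAL and t != NEUTRAL:
                    total += 1
                    like += s == t
    return like / total
```

A random surface gives a like fraction near 0.5. Lowering `t_s` pushes it up for patchy surfaces and down for alternating ones, which mirrors the role of $T_s$ in setting the correlation length. The nonlocal swap move is one choice that conserves composition; the text specifies only that moves exchange the identities of two sites.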
Fig. 13. Projection of typical center of mass trajectories for statistically blocky and alternating DHPs. The starting (ending) points of the trajectories are labelled 1 and 2 (1 and 2) for the statistically alternating and blocky chains, respectively.
Fig. 13 depicts typical trajectories at $T/T_R = 0.6$. Not all points on these trajectories correspond to adsorbed states. Both the statistically alternating and blocky DHPs begin on randomly patterned parts of the surface and ultimately find their way to the region of the surface that is statistically pattern matched with their sequence statistics. The length of the simulations corresponds roughly to 1 s, over which separation or recognition is achieved with over 90% efficiency (out of 1000 trials). These results show that biomimetic recognition between polymers and surfaces is possible due to statistical pattern matching. It may therefore be possible to exploit this notion to design inexpensive devices that separate a large library of macromolecules into groups of statistically similar sequences.
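The segment–surface part of the Hamiltonian used for these trajectories is simple enough to write down directly: −1 for a preferred contact, +1 for a non-preferred one, and 0 on neutral sites, all in units of $kT_R$. The sketch below scores one adsorbed configuration and converts its energy into a Boltzmann weight at reduced temperature $T/T_R$. The contact-map representation and the names `adsorption_energy` and `boltzmann_weight` are hypothetical. Chain connectivity and the Verdier–Stockmayer moves themselves are not modelled.

```python
import math

NEUTRAL, A_SITE, B_SITE = 0, 1, 2
PREFERRED_SITE = {"A": A_SITE, "B": B_SITE}


def adsorption_energy(sequence, contacts, surface):
    """Segment-surface energy of one chain, in units of k*T_R.

    sequence: string of segment types, e.g. "ABBA..."
    contacts: dict mapping segment index -> (x, y) of the surface site it touches
    surface:  2-D list of site types (NEUTRAL, A_SITE or B_SITE)
    A preferred contact contributes -1, a non-preferred one +1 and a neutral
    site 0, following the interaction strengths quoted in the text.
    """
    energy = 0.0
    for i, (x, y) in contacts.items():
        site = surface[x][y]
        if site != NEUTRAL:
            energy += -1.0 if site == PREFERRED_SITE[sequence[i]] else 1.0
    return energy


def boltzmann_weight(energy, t_over_tr):
    """exp(-E / kT) for E given in units of k*T_R, at reduced temperature T/T_R."""
    return math.exp(-energy / t_over_tr)


surface = [[A_SITE, B_SITE], [NEUTRAL, A_SITE]]
print(adsorption_energy("AB", {0: (0, 0), 1: (0, 1)}, surface))  # -2.0
```

At $T/T_R = 0.6$, one extra preferred contact changes the weight by a factor of $e^{1/0.6} \approx 5.3$. This shows why the ratio controls how strongly the chains discriminate between pattern-matched and mismatched contacts.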
Local motion on the scale of monomers occurs in 10\}10\ seconds, depending upon solvent conditions and monomer size (see [47]). The MC moves employed by Golumbfskie et al. [28] are for motions of a statistical segment length. For Rouse–Zimm dynamics, the time for local MC moves scales as the square of the statistical segment length. Estimates of the statistical segment lengths of synthetic polymers lead to the conclusion that the time scale associated with the MC moves in [28] ranges between 10\}10\ seconds, in agreement with previous estimates. The numbers reported by Golumbfskie et al. [28] are based on using 10\ seconds as the time scale for primitive MC moves.
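The conversion from MC moves to physical time described in this note amounts to two lines of arithmetic, sketched below. The quadratic scaling with segment length follows the Rouse–Zimm argument above. The function names are hypothetical, and the example uses dimensionless numbers (time in units of the monomer time) rather than the values adopted in [28].

```python
def mc_move_time(tau_monomer, segment_length, monomer_size):
    """Time per MC move that displaces one statistical segment.

    Assumes tau scales as (length)**2, as for Rouse-Zimm local motion, so a
    segment of length l moves in tau_monomer * (l / a)**2, a = monomer size.
    """
    return tau_monomer * (segment_length / monomer_size) ** 2


def simulated_time(n_moves, tau_move):
    """Physical time represented by n_moves primitive MC moves."""
    return n_moves * tau_move


def moves_for_duration(duration, tau_move):
    """Number of primitive MC moves needed to represent a given duration."""
    return duration / tau_move


# Illustrative only: a segment three monomers long, time in monomer units.
print(mc_move_time(1.0, 3.0, 1.0))  # 9.0
```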
Fig. 14. Free energy landscape experienced by the statistically alternating DHPs as a function of center of mass position. The free energy is in arbitrary units, and the two deep minima on the right (in the wrong parts of the surface) are artifacts of periodic boundary conditions (see [28]).
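One common way to obtain a landscape like that in Fig. 14 is to histogram sampled center-of-mass positions and take $F = -kT \ln P$. The sketch below does exactly that. It is an assumed, generic procedure, not necessarily the method used in [28], and `free_energy_map` is a hypothetical name.

```python
import math
from collections import Counter


def free_energy_map(com_positions, bin_size, kT=1.0):
    """Free energy F = -kT ln P of binned centre-of-mass positions.

    com_positions: iterable of (x, y) centre-of-mass coordinates sampled
    along trajectories; bins never visited are omitted (F would be infinite).
    The minimum is shifted to zero since F is defined up to a constant.
    """
    counts = Counter((math.floor(x / bin_size), math.floor(y / bin_size))
                     for x, y in com_positions)
    total = sum(counts.values())
    f = {b: -kT * math.log(c / total) for b, c in counts.items()}
    f_min = min(f.values())
    return {b: v - f_min for b, v in f.items()}
```

A bin visited nine times as often as its neighbour lies ln 9 ≈ 2.2 kT lower in free energy. The landscape is therefore only as reliable as the sampling of rarely visited barrier regions.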
In the trajectories shown in Fig. 13, the DHPs do adsorb onto the wrong parts of the surface. However, these adsorbed states are short-lived compared to the time it takes to evolve to the strongly adsorbed state. Once a chain is strongly adsorbed onto a region that is statistically pattern matched with its sequence, the chain center of mass essentially does not move on simulation time scales (vide infra). The reason is made clear by the results shown in Fig. 14. The "wrong" regions of the surface correspond to local free energy minima separated from each other by relatively small barriers. In contrast, in the statistically pattern matched region there exist a few deep global minima, each of which is, in turn, very rugged. The ratio $T/T_R$, which is determined by the chemical identities of segments and surface sites, is an important design variable. Large values of this parameter will not allow the chains to adsorb anywhere to an appreciable extent, because the entropic penalty associated with adsorption dominates the free energy. At sufficiently low values of this ratio, the "wrong" regions of the surface serve as kinetic traps, and long-lived metastable adsorbed states exist in these regions. Thus, both kinetic and thermodynamic considerations imply that there should be an optimal value of $T/T_R$. Simulations have been carried out for $T/T_R$ = 0.45, 0.6, and 0.75; of these, Golumbfskie et al. [28] find that 0.6 leads to the best discrimination and recognition. More work is needed to provide quantitative criteria for the optimal value of $T/T_R$. The importance of both kinetics and thermodynamics for statistical pattern matching is further emphasized by the following point. We have provided a detailed discussion of thermodynamic arguments which suggest that DHPs with random sequences (λ = 0) sharply discriminate between surfaces with random site distributions that have different loadings. However, Golumbfskie et al.
[28] find that the likelihood of kinetic trapping in the wrong regions of the surface is rather high when uncorrelated sequence and surface site distributions are statistically pattern matched by adjusting the loading. In particular, let us compare the following two situations: (a) a surface with a random site distribution, bearing regions with loadings of 20% and 40%, exposed to DHPs
with random sequences that thermodynamically strongly favor the region with a loading of 40%; (b) the surface shown in Fig. 12 exposed to the statistically alternating sequences. The equilibrium adsorbed fraction in the region with a loading of 40% in case (a) and that of the statistically alternating DHPs in the statistically alternating region in case (b) are roughly the same. Simulations (over 40 trials in each case) show that the window of $T/T_R$ for successful recognition (>80% efficiency) is 50% wider in case (b). The reason is that in case (a) the free energy landscape that the chains negotiate is virtually uncorrelated, and thus rife with local optima that lead to kinetic trapping [51]. Having illustrated the basic phenomenology of biomimetic recognition due to statistical pattern matching, Golumbfskie et al. [28] have also examined a situation that is a step closer to applications. If statistical pattern matching is to be used to effect molecularly selective separations, we need to separate more than two types of statistical patterns. A more diverse set of statistical patterns can be obtained by increasing the number of letters that code the statistical patterns. Using a 3-letter code, Golumbfskie et al. [28] generate four different statistical patterns to illustrate a more complex separation than that depicted in Fig. 13. The surfaces are generated using a lattice where the identity of each site can be A, B, C, or neutral. In all cases, the interaction energies used in the MC simulations that generate statistically patterned surfaces ($V_{ij}$, where i and j denote the types of sites) are symmetric and equal in magnitude, with the neutral sites being
Fig. 15. Statistically patterned surface with three types (depicted in light grey, dark grey and black) of sites. Four different statistical patterns (described in the text) exist along with regions having a random site distribution.
non-interacting. The four patterns differ in which interaction energies are attractive (−1) and which are repulsive (+1). For statistical patterns 1 and 2, the like-site energy $V_{ii}$ and one unlike-site energy equal −1, and the remaining two energies equal +1. For statistical patterns 3 and 4, the signs are reversed, with $V_{ii}$ and one unlike-site energy equal to +1. A different unlike-site pair is chosen for each pattern. The realizations of the statistical patterns shown in Fig. 15 correspond to a total loading of 30%, a 1:1:1 ratio of the different types of sites, and a correlation length of ≈1.4. The correlation length is calculated for the one unique type of site (e.g., C for pattern 1). The interaction energies above are also used to generate four types of DHP sequences that are statistically pattern matched with the surface patches. The DHP sequences employed by us belong to the first three excited states of the energy spectrum. Energy alone is clearly not a complete specification of the sequence statistics, which points to the need to develop better design criteria for 3-letter codes. What happens when four types of DHP sequences, each statistically pattern matched with one of the four surface patches, are placed near the surface? Fig. 16 shows the result; for clarity, only the starting and final positions of the chain centers of mass are shown. The DHP chains all find and then bind strongly to the complementary statistically patterned region, i.e., biomimetic recognition due to statistical pattern matching is possible in this case also. However, unlike the 100% efficiency reported for the two-letter cases, over a simulation time of 5 s Golumbfskie et al.
Fig. 16. Starting (1,2,3,4) and ending (1,2,3,4) points of typical trajectories of four different types of statistically patterned DHPs on the surface shown in Fig. 15. For these trajectories statistical pattern matching leads to successful recognition of complementary surface patches.
Fig. 17. Distribution of first passage times for the event described in the text.
[28] find the lowest success rate to be 60%. The success rates increase with simulation time, and failures are the result of kinetic trapping in the wrong regions of the surface. Simulations have not been run long enough to report the time scale over which the success rate would approach 100% in all situations. Fig. 16 demonstrates the feasibility of carrying out molecular-scale separations using statistical pattern matching with many such patterns. However, the efficiencies found thus far (without detailed design considerations) also show that we have not yet learned how to design statistical patterns with a large number of letters in the code such that long-lived metastable states are largely eliminated. Fig. 14 suggests that, for the system we are considering, chain dynamics on various time scales and at different spatial locations should be quite different and interesting. Detailed analyses of dynamic MC simulation results serve to elucidate these issues, and provide evidence for macromolecular shape selection when recognition occurs due to statistical pattern matching. Consider the first passage time, defined as the time taken for a chain starting in the random part of a statistically patterned surface to adsorb in the pattern-matched region with an energy below a certain cut-off value. Fig. 17 shows the distribution of first passage times for chains of length 32, with an energy cut-off of $-20kT_R$. After the usual turn-on time, the distribution is clearly non-exponential; a stretched exponential with a stretching exponent of 0.43 fits the simulation results. The non-exponential character of the distribution of first passage times is indicative of highly cooperative chain dynamics. The event we are considering is tantamount to two events that occur in succession.
These are chain motion to the edge of the statistically pattern matched region of the surface, followed by strong adsorption in one of the deep free energy minima in that region. The first passage times for these two events have been computed separately [28]. Significantly, the distribution of first passage times for the first event is exponential, whereas that for the second is non-exponential. This implies that, as the chain traverses the wrong parts of the surface, its center of mass motion is diffusive, because the free energy minima and barriers it encounters in these regions are relatively small. In contrast, it appears that strong adsorption in the statistically pattern matched region is highly
Fig. 18. Distribution of $P_n$: (a) center of mass in the wrong parts of the surface, (b) center of mass has entered the statistically pattern matched region, (c) strongly bound state.
cooperative, with many important free energy barriers encountered en route to the bottom of one of the deep free energy valleys that exist in this part of the surface. What is the physical origin of this behavior in the statistically pattern matched region? It is interesting to observe [28] that single-particle MC dynamics on the free energy landscape shown in Fig. 14 yields an exponential first passage time distribution. This indicates something very important for understanding the nature of the dynamics. Single-particle dynamics on the free energy landscape in Fig. 14 would reproduce the first passage time distribution obtained from the full chain dynamics if motion of the center of mass were always the slowest dynamic mode. Our results suggest that this is not true. As we have discussed, adsorbed macromolecules are characterized by loops. The distribution of these loops fluctuates in time, and hence loop fluctuations are dynamic modes. Before considering the dynamics of these modes, let us examine the distribution of loops as trajectories evolve starting from wrong parts of the surface. Extensive MC simulations have been carried out with chains of length 100. All trajectories show the qualitative features depicted in Fig. 18, where we show the probability distribution $P_n$ for an arbitrary segment to be part of a loop of length n for a statistically alternating DHP. Each panel shows $P_n$ averaged over a different time (MC step) window. The first panel shows that, when the center of mass of the chain is in the wrong parts of the surface, $P_n$ is structureless. This implies that all possible macromolecular shapes are adopted as the chain samples the wrong parts of the surface. Remarkably, the other panels in Fig. 18 demonstrate that, as the chain center of mass enters the statistically pattern matched region, $P_n$ begins to show structure. Ultimately, it exhibits a spectrum of peaks which correspond to
preferred loop lengths and hence to a preferred macromolecular shape in the adsorbed state. This observation of adsorption in preferred shapes due to statistical pattern matching is akin to recognition in biology, and we shall make it vivid shortly. Loops with the preferred lengths appear to be quenched in time, and so, as before, we call them quenched loops. The physical reason why quenched loops are absent in the wrong regions of the surface, while they seem to dominate the behavior in the statistically pattern matched region, has been discussed earlier. In the wrong regions of the surface, arbitrary adsorbed conformations with the same number of contacts have roughly the same energy, because no particular arrangement leads to a much higher degree of registry between adsorbed segments and their preferred sites than other conformations do. Thus, the distribution of loops fluctuates strongly, gaining entropy while maintaining roughly the same energy. In the statistically pattern-matched region of the surface, however, the situation is different. Most adsorbed conformations are still of relatively high energy, as in the wrong region of the surface. Because the statistics of the sequence and surface site distributions are matched, however, there now exist a few conformations that can bind segments to their preferred sites with high probability in certain regions. These few pattern-matched conformations are much lower in energy than all others, and thus the chain sacrifices the entropic advantage associated with loop fluctuations and "freezes" into one of them. The better the statistical pattern matching, the greater the suppression of loop fluctuations. Dramatic evidence for this thermodynamic argument (and the associated model) is provided by the simulation results displayed in Fig. 18. We shall also make this vivid shortly by showing snapshots of animations of the dynamic simulation results.
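The loop-length distribution $P_n$ shown in Fig. 18 can be computed from simulation snapshots along the lines of the sketch below. It rests on two assumptions not spelled out in the text. First, a loop is a run of non-adsorbed segments bounded on both sides by adsorbed segments, so dangling tails are excluded. Second, $P_n$ is normalized by the total number of segments sampled in the time window. Function names are hypothetical.

```python
from collections import Counter


def loop_lengths(adsorbed):
    """Lengths of loops in one conformation.

    adsorbed: sequence of booleans, True if segment i touches the surface.
    A loop is a run of non-adsorbed segments bounded on both sides by
    adsorbed segments; dangling tails at the chain ends are not counted.
    """
    lengths, run, seen_contact = [], 0, False
    for is_adsorbed in adsorbed:
        if is_adsorbed:
            if seen_contact and run > 0:
                lengths.append(run)
            seen_contact, run = True, 0
        else:
            run += 1
    return lengths


def loop_distribution(snapshots):
    """P_n: probability that an arbitrary segment lies in a loop of length n,
    averaged over a window of snapshots (one boolean list per snapshot)."""
    counts = Counter()
    n_segments = 0
    for adsorbed in snapshots:
        n_segments += len(adsorbed)
        for n in loop_lengths(adsorbed):
            counts[n] += n  # a loop of length n contains n segments
    return {n: c / n_segments for n, c in sorted(counts.items())}


snapshot = [True, False, False, True, False, True, False, False]
print(loop_lengths(snapshot))  # [2, 1]
```

Averaging `loop_distribution` over successive MC-step windows gives panels like those in Fig. 18. A structureless $P_n$ corresponds to freely fluctuating loops, and sharp peaks at particular n correspond to quenched loops.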
Let us return to the puzzle of why single-particle dynamics on the free energy landscape calculated for the chain center of mass does not reproduce the correct dynamical behavior. Consider the time correlation functions $\langle P_n(t)P_n(t+\tau)\rangle$ that describe how fluctuations of loops of length $n$ (and hence of shape) decay for $n = 17$, 26, and 30. Attention is focused on these values of $n$ because Fig. 18 shows that the dominant shape contains loops with these lengths. The simulation results show that, for motion in the wrong parts of the surface, these correlations decay much more rapidly than the time scale on which the chain center of mass moves. Thus, the modes corresponding to loop
Fig. 19. Mean squared displacement of the center of mass as a function of time, measured in Monte-Carlo steps.
Fig. 20. Relaxation time for loops of length 26. Data are shown only after the center of mass enters the statistically pattern matched region. In the wrong parts of the surface, the relaxation time for this loop length is vanishingly small.
fluctuations are the fast dynamic modes. After some time in the statistically pattern matched region, the chain's center of mass finds the location corresponding to one of the deep free energy minima in this region. From then on, the center of mass essentially does not move during the simulation. As shown in Fig. 19, for chains of length 32 the center of mass ceases to move, and for chains of length 100 the diffusion coefficient becomes two orders of magnitude smaller than in the wrong parts of the surface. On much longer time scales, the center of mass of these finite chains will be able to escape from these minima and find other deep minima in the statistically pattern matched region. However, these long times are not relevant to these simulations, and for very long chains such events would occur over time scales that are not experimentally relevant. The point is that localization of the center of mass implies that this dynamic mode has, from a practical standpoint, equilibrated. Simulations show that, after the chain center of mass has equilibrated in the statistically pattern matched region, the time correlation functions describing loop fluctuations decay over very long times. In fact, once shape-selective adsorption occurs, they ultimately do not decay to zero but simply oscillate around a finite value. Thus, the modes that were the fastest in the wrong parts of the surface become the slowest once the center of mass reaches a location corresponding to a deep free energy minimum. This explains why single-particle dynamics on the free energy landscape appropriate for the center of mass cannot describe the dynamics obtained from the detailed simulations. Fig. 20 shows how the relaxation time for loops of length 26 changes as a typical trajectory evolves in the statistically pattern matched region. The observations noted above imply the following physical picture.
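Before that picture is developed, it may help to see how correlation functions such as $\langle P_n(t)P_n(t+\tau)\rangle$, and relaxation times like those in Fig. 20, can be estimated from a trajectory. The sketch below averages over time origins for a scalar or vector observable sampled at uniform intervals, normalizes by the zero-lag value, and defines the relaxation time as the first lag at which the normalized correlation falls below 1/e. The 1/e criterion, the optional mean subtraction, and the synthetic test signals are assumptions of this example, not necessarily the analysis used in [28].

```python
import numpy as np

def time_correlation(x, max_lag, center=False):
    """Time-origin-averaged correlation C(tau) = <x(t) . x(t+tau)> / <x(t) . x(t)>.

    x: shape (T,) for a scalar observable such as a loop population P_n(t),
       or shape (T, d) for a vector observable such as the end-to-end vector R(t).
    center: if True, subtract the time average first (correlation of fluctuations).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat scalars as one-component vectors
    if center:
        x = x - x.mean(axis=0)
    T = len(x)
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # average the dot product x(t) . x(t + lag) over all available time origins t
        corr[lag] = np.mean(np.sum(x[: T - lag] * x[lag:], axis=1))
    return corr / corr[0]

def relaxation_time(corr, dt=1.0):
    """First lag (in units of dt) at which the normalized correlation drops below 1/e."""
    below = np.nonzero(corr < np.exp(-1.0))[0]
    return below[0] * dt if below.size else np.inf  # inf: no decay within the window

# Synthetic fast mode: AR(1) process whose exact correlation is exp(-lag / 10).
rng = np.random.default_rng(0)
a = np.exp(-1.0 / 10.0)
x = np.zeros(200_000)
for t in range(1, len(x)):
    x[t] = a * x[t - 1] + rng.normal()
tau_fast = relaxation_time(time_correlation(x, max_lag=50, center=True))

# Synthetic "quenched" observable stuck at a finite value: correlation never decays.
frozen = np.full(1000, 0.3)
tau_frozen = relaxation_time(time_correlation(frozen, max_lag=50))
print(tau_fast, tau_frozen)
```

The rapidly decorrelating signal plays the role of loop fluctuations in the wrong parts of the surface, while the constant signal mimics a quenched loop population whose correlation function does not decay to zero; for the latter the function reports an infinite relaxation time within the analysis window.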
Once the center of mass of the chain is located in the region corresponding to a deep free energy minimum, the chain has to acquire the shape (conformation) that is most favorable on energetic grounds. To do so, it must arrange its loops in a specific way. The necessary conformational rearrangements occur in a highly cooperative way, and there are important entropic barriers associated with this reorganization. These entropic barriers, which correspond to arranging the chain to acquire a specific shape (a hallmark of recognition), lead to the non-exponential character of the first passage time distribution. In order to make some of the results discussed above vivid, Golumbfskie et al. [28] have animated trajectories of their simulation runs. Fig. 21 shows snapshots from such a movie for
Fig. 21. Snapshots of chain conformations at different times during a typical trajectory for a 128-segment statistically alternating DHP. Dark and light shadings represent the two kinds of surface sites and the two kinds of segments that make up the chain. The top three panels correspond to portions of the trajectory in the wrong parts of the surface, and the bottom three panels to the statistically pattern matched region. The numbers correspond to time in arbitrary units.
a 128-segment statistically alternating two-letter DHP. The trajectory starts with the chain above the statistically random part of the surface. The DHP does adsorb somewhat on this region of the surface. The adsorbed conformations are characterized, as usual, by loops, trains, and tails that exhibit large fluctuations. For example, both the loop length distribution and the distance on the surface between loop ends fluctuate rapidly. The first three frames of Fig. 21 depict this. With time, this DHP evolves toward the statistically alternating region of the surface and adsorbs strongly. Now the chain dynamics are dramatically different (last three frames in Fig. 21). Several chain segments adsorb on preferred sites on the surface. The resulting loops are quenched in time, in that there are essentially no fluctuations of these loops on the time scale of our simulations. Within these quenched loops live annealed loops that exhibit small conformational fluctuations. The problem of how macromolecules recognize patterns on surfaces has also been studied by Muthukumar and co-workers [29–31] in the context of polyelectrolytes interacting with a surface pattern of opposite charge. These studies complement the work described earlier in this section and are important contributions to the conceptual advances being made in understanding pattern recognition by macromolecules. We do not, however, provide a detailed review of this work because Muthukumar has recently provided insightful reviews [17,57]. The surfaces considered by Muthukumar and co-workers [29,30] contain a single pattern of typical size L, made up of charges imprinted on a larger neutral background. The question they ask is: can a polymer consisting of oppositely charged segments, distributed along the backbone in a particular sequence, recognize (adsorb on) this region of the surface? Muthukumar [29] used a variational method to derive conditions for the occurrence of such selective adsorption.
Further insight into this issue has been obtained via MC simulations [30]. Recent experiments (see references quoted in [17]) concerning the binding of polyelectrolytes to proteins seem to be consistent with
some of the findings reported in [29,30]. The studies of Muthukumar and co-workers [17] also show the importance of entropic frustration (vide supra). For the system they study, adsorption occurs in two stages, with the slower second step being associated with the development of registry between the charged segments on the polyelectrolyte and the oppositely charged surface sites.

2.4. Connection to experiments and issues pertinent to evolution

The results described in the previous subsections suggest that recognition due to statistical pattern matching might prove useful in applications, and that the systems considered are also good models for studying kinetic phenomena in frustrated systems. Both considerations motivate experimental studies. Examples of applications in which this phenomenon may be profitably exploited include chromatography, the development of viral inhibition agents, high-throughput screening that sorts vast numbers of macromolecules into ensembles of statistically similar sequences, and sensors. Some coarse-grained aspects of the phenomenology that we have described may already have been observed in experiments carried out in the context of chromatography and viral inhibition. Arnold and co-workers have carried out interesting experiments aimed at developing inexpensive chromatographic materials for protein separations. In one such experiment [7], they measured the isotherms describing the adsorption of horse cytochrome c on hydrophobic polymer supports functionalized with copper. Copper is adsorbed onto the surface from a solution of a copper salt. The copper atoms adsorb randomly, and the loading of copper sites on the surface can be adjusted by varying the salt concentration in solution. The surface density of copper sites can be measured by titration. Histidine residues on horse cytochrome c interact preferentially with the copper sites, while the other residues do not.
Arnold and co-workers [7] have measured the initial slope of the adsorption isotherm as a function of the loading of copper sites. The initial slope is directly proportional to the adsorbed fraction under infinitely dilute conditions. Their data are shown in Fig. 22. The shape of this curve is remarkably
Fig. 22. The initial slope (proportional to the adsorbed fraction at infinite dilution) for horse cytochrome c adsorbing on a hydrophobic support randomly functionalized with copper. Different points on the abscissa correspond to different surfaces, each with a different loading of copper. Data taken from [7].
similar to the theoretical predictions [23,26] and simulation results [24,25] that we have discussed. This suggests that statistical pattern matching may be the origin of the experimental observations, and that further work exploring whether the phenomenon can be used in chromatographic applications is warranted. Another important application in which similar phenomenology may have been observed concerns viral inhibition. Viruses can attach to receptors on cell surfaces, an important step leading to viral infection. Whitesides and co-workers [9–11] have considered the possibility of inhibiting this process by adsorbing synthetic polymers onto the surface of the virus, thereby blocking the viral sites that bind to receptors on cell surfaces. The question is: which polymer should be used? Whitesides and co-workers have used random copolymers whose monomer units are methacrylic acid and a sugar [10]. Specifically, they have studied how the binding of the influenza virus to mammalian erythrocytes is inhibited by adsorbing such polymers on the virus. In one set of experiments they measured the inhibition constant as a function of the average composition of the DHPs. They find that the inhibition constant, plotted against the average polymer composition, exhibits a maximum. The average composition is a measure of the statistical pattern encoded in the sequence statistics of the polymers (vide supra). The data from the Whitesides group suggest that a certain statistical ensemble of sequences (i.e., one with a certain range of average compositions) is most efficient at preventing the influenza virus from attaching to erythrocytes. In other words, this ensemble adsorbs onto the surface of the virus and blocks its binding sites most efficiently. This is consistent with the notion of recognition due to statistical pattern matching that we have described.
It is important to note, however, that theory and simulation predict a sharper maximum than that observed experimentally. More detailed experiments aimed at studying statistical pattern matching more carefully are currently being planned and initiated by various groups. Surfaces bearing statistical patterns can be created [52–54]. The major difficulty is the careful characterization of the statistical patterns. Attempts to address this issue using photoemission electron microscopy (PEEM) are currently underway [58]. DHPs with careful control of sequence statistics can be, and are being, prepared to carry out experimental studies of recognition due to statistical pattern matching [59]. Both synthetic and genetic methods are being used; the latter [60] provides more control over the ensemble of sequences that is created. An experiment that should serve to study the phenomenology depicted in Figs. 12 and 13 is being planned [58,59]. Here, two ensembles of sequences with different statistics will be tagged with red and green fluorescent labels. Adsorption from solution onto a surface resembling that depicted in Fig. 12 will be carried out, and optical microscopy will be used to study whether the molecules find their statistically pattern matched complementary regions on the surface. The theoretical and computational results that we have described demonstrate that the system we have been considering exhibits dramatic differences in dynamics on different scales of space and time. These are classic signatures of dynamics on the free energy landscapes characteristic of frustrated systems. For example, these features are similar to the behavior of spin glasses [42] and to protein folding phenomena (e.g., [1–5]). Detailed experimental studies of the system we have been considering may, however, be more tractable, as a simple polymeric system is easier to synthesize, manipulate, and characterize.
Systems such as the one described in the previous paragraph may, in the future, also be employed to study kinetic phenomena, especially given the rapid advances being made in single-molecule spectroscopy.
Fig. 23. Decay of correlations in the fluctuations of the end-to-end vector in the wrong and statistically pattern matched (SPM) regions.
Single-molecule spectroscopy [61] of the DHP should be able to measure the correlation functions that have been computed [28]. The problem seems ideal for this tool, which is most revealing (compared to other experimental probes) when the ergodic hypothesis is violated. A convenient correlation function from the standpoint of single-molecule spectroscopic experiments is $\langle \mathbf R(t+\tau)\cdot\mathbf R(t)\rangle$, where $\mathbf R$ is the chain end-to-end vector. This quantity reflects correlations between shape fluctuations, and thus measures how the chain conformation evolves in time. Fig. 23 shows how $\langle \mathbf R(t+\tau)\cdot\mathbf R(t)\rangle$ decays with time in the wrong and statistically pattern matched regions [28]. In the former region, the correlations decay as usual. In the statistically pattern matched region, conformational fluctuations remain strongly correlated over long times. Our results also suggest an intriguing connection between the experimentally realizable system that we study and some provocative ideas concerning the competition between self-organization and selection in evolution. Populations can be considered to evolve on fitness landscapes due to mutations [51]. Kauffman has proposed the NK model (which characterizes the varying degrees of ruggedness of fitness landscapes) to study how the competition between the ability to self-organize and selection influences the manner in which populations evolve [51]. Population evolution on fitness landscapes due to a finite mutation rate is analogous to the finite-temperature dynamics of physical systems on free energy landscapes [49]. The model we study is a kind of NK model. We have described the motion of a statistically alternating DHP on a free energy landscape corresponding to a surface that has random regions and a region that is statistically pattern matched with the DHP. This type of chain (the analog of a genotype), which has the ability to self-organize, is naturally led to a free energy minimum in the statistically pattern matched region.
Consequently, members of this ensemble of DHP sequences develop specific characteristics, viz., a specific distribution of loop lengths, or shape. In contrast, when statistically blocky DHPs in the vicinity of this surface are simulated, $P_n$ does not develop any structure. This is because this ensemble of DHPs cannot self-organize (the free energy/fitness landscape does not have the structure shown in Fig. 14). Experimental studies of the physically realizable situation that we have described may provide insights (by analogy) into issues pertinent to models of evolution. The findings reviewed in this section also lead us to speculate whether, in the early origins of life, recognition was affected by statistical pattern matching, with specific sequences for binding and recognition resulting from evolutionary refinement of ensembles of statistically pattern matched
sequences. This speculation can be explored by studying whether an ensemble of statistically pattern matched sequences can evolve by mutations to a few specific sequences. This type of research may also prove useful for applications. Given a particular surface pattern, can we start from the ensemble of statistically pattern matched sequences and systematically design specific sequences that bind most efficiently? These issues can be addressed by sequence-space Monte-Carlo annealing of DHP sequences. Some work along these lines for the adsorption of DHPs on chemically homogeneous surfaces has recently been reported [62].
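As a rough illustration of what sequence-space Monte-Carlo annealing involves, the sketch below anneals a two-letter sequence against a fixed, hypothetical two-letter surface pattern. The energy function is an assumption made only for this example: the chain is taken to lie flat on a periodic one-dimensional surface at the best of all rigid shifts, and each segment sitting on its preferred site contributes -1. It is a toy objective, not the adsorption model of [62] or of the simulations reviewed here. Mutation moves are single-segment flips accepted with the Metropolis rule while the temperature is lowered geometrically; all function names and parameters are hypothetical.

```python
import math
import random

def adsorption_energy(seq, surface):
    """Hypothetical toy energy: -(max number of segments on preferred sites),
    maximized over rigid shifts of the flat-lying chain along a periodic 1D surface."""
    L = len(surface)
    best = 0
    for shift in range(L):
        matches = sum(1 for i, s in enumerate(seq) if s == surface[(i + shift) % L])
        best = max(best, matches)
    return -best

def anneal_sequence(surface, n_seg, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis annealing in sequence space with single-segment mutations (A <-> B)."""
    rng = random.Random(seed)
    seq = [rng.choice("AB") for _ in range(n_seg)]  # start from a random sequence
    energy = adsorption_energy(seq, surface)
    for step in range(steps):
        # geometric cooling schedule from t_start down to t_end
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n_seg)
        old = seq[i]
        seq[i] = "B" if old == "A" else "A"  # propose a point mutation
        new_energy = adsorption_energy(seq, surface)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy  # accept the mutation
        else:
            seq[i] = old  # reject: undo the mutation
    return "".join(seq), energy

surface_rng = random.Random(1)
surface = "".join(surface_rng.choice("AB") for _ in range(30))  # hypothetical surface pattern
seq, e = anneal_sequence(surface, n_seg=16)
print(surface, seq, e)
```

In this toy objective every non-optimal sequence has a single-flip move that lowers the energy (fix any mismatch at the best shift), so the anneal should end at the minimum energy of -n_seg. Realistic adsorption free energies, with loops and entropic contributions, would be far more rugged, which is precisely why annealing rather than greedy descent is the natural tool.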
3. Branched DHPs in the molten state: model system for studying microphase ordering in systems with quenched disorder

The behavior of infinitely long linear DHPs in the compact state (the volume approximation; e.g., [1–3,63–65]) and that of chains of finite length in the molten state have been studied extensively [66,67]. Two issues have been of interest. The first concerns the connection between the physics of DHPs and protein folding. Sufficiently stiff DHPs can undergo a phase transition below which the thermodynamics is determined by a few dominant conformations. This transition (often called a freezing transition), from a phase in which multitudes of conformations are sampled to a low-entropy state with only a few important conformations, occurs because of frustration due to competing intersegment interactions and the quenched disordered sequence. This transition is akin to the folding of proteins into the native state. Understanding how proteins fold is an important issue in biophysics. The thermodynamics and kinetics of the freezing transition in DHPs have been studied extensively via theory and computation with a view toward understanding the physics of protein folding. These efforts have recently been reviewed in several journals (e.g., [3–5]), and it is interesting to consider the parallels between these studies and the subject reviewed here in Section 2. The second issue that has been studied for globular and molten DHPs concerns microphase ordering. Mixtures of incompatible homopolymers undergo macroscopic phase segregation when cooled below a certain temperature. Molten copolymers with incompatible segments, however, cannot segregate on macroscopic scales because of chain connectivity. Instead, they undergo an order-disorder transition and form microdomains when cooled below the microphase segregation temperature (MST).
Microphase ordering for copolymers with ordered sequence distributions has been studied extensively via experiment, theory, and computation. Interesting physics and exotic microstructures have been revealed by these studies, especially for diblock copolymers (e.g., [68–72]). These are polymers with two types of segments: a polymer of A-type segments joined at one end to a polymer of B-type segments. DHPs are polymers with a disordered sequence distribution. Thus, in addition to connectivity, another source of frustration hinders the propensity of the two types of segments to separate. The effects of frustrating quenched randomness on microphase ordering have been considered by theorists by studying linear DHPs. A recent review by Shakhnovich and Gutin in this journal [6] and references therein describe these efforts in detail. The most significant finding is that the optimal wave vector ($q^*$), corresponding to the length scale of ordered domains, depends strongly on temperature below the MST. This is in contrast to the behavior of diblock copolymers, for which $q^*$ is essentially temperature independent below the MST [68]. At temperatures sufficiently below
the MST, $q^*$ for diblock copolymers (DCPs) does decrease as the temperature is lowered, because of chain stretching and strong segregation effects (e.g., [73,74]). Until recently, however, there had been no experimental studies of microphase ordering in molten polymers with quenched sequence disorder, largely because of the difficulties polymer physicists encounter in synthesizing linear DHPs with controlled sequence statistics. Recently, one group [75] has succeeded in synthesizing and characterizing a class of polymers that embodies competing interactions and quenched disorder. Specifically, there is a homopolymer backbone of length $N$ onto which $p$ branches of another type of homopolymer, each of length $M$, are grafted. The branch points are randomly located on the backbone, and their locations are quenched once synthesis is complete, so each chain in a melt of these copolymers has a different sequence. Following Qi et al. [76], we refer to these materials as randomly branched heteropolymers (RBHs). The behavior of this material when it is cooled below the MST has been studied by optical birefringence experiments, small angle neutron scattering (SANS), and a field-theoretic model.

Let us begin by describing a simple theoretical model of how this material behaves as the temperature is scanned. Consider $\mathcal N$ chains, with $\mathbf r_i(n)$ the spatial location of the $n$th segment on the backbone of the $i$th chain and $\mathbf r_{ij}(m)$ the location of the $m$th segment of the $j$th branch on the $i$th chain. Let the set $\{n_{ij}\}$ represent the quenched locations of the branch points. The microscopic Edwards Hamiltonian for the problem can then be written as

$$
-\beta H[\{\mathbf r_i(n)\},\{\mathbf r_{ij}(m)\}] = -\frac{3}{2l^2}\sum_{i=1}^{\mathcal N}\int_0^{N} dn\left(\frac{d\mathbf r_i(n)}{dn}\right)^2 - \frac{3}{2l^2}\sum_{i=1}^{\mathcal N}\sum_{j=1}^{p}\int_0^{M} dm\left(\frac{d\mathbf r_{ij}(m)}{dm}\right)^2 - E[\{\mathbf r_i(n)\},\{\mathbf r_{ij}(m)\}] \qquad (30)
$$

where $l$ is the statistical segment length and $E$ is the energy corresponding to intersegment interactions, which depends explicitly on the chemical identities of the interacting segments.
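To make the connectivity part of Eq. (30) concrete, the sketch below uses the usual bead-spring discretization, replacing each integral of $(d\mathbf r/dn)^2$ by a sum of squared bond vectors (with $l = 1$). It samples one hypothetical RBH conformation as ideal Gaussian chains: $p$ branch points drawn uniformly at random on the backbone play the role of the quenched $\{n_{ij}\}$, and each branch starts at its branch point. The interaction energy $E$ is omitted, and the function names and parameters are illustrative assumptions, not part of the model of [76].

```python
import numpy as np

def random_branched_chain(N, p, M, l=1.0, rng=None):
    """Sample one hypothetical RBH conformation as ideal Gaussian bead chains.

    Backbone: N bonds (N + 1 beads). Branches: p chains of M bonds each, whose
    first bead sits on a randomly chosen (quenched) backbone bead n_j.
    Each bond vector has independent normal components of variance l**2 / 3.
    """
    rng = np.random.default_rng() if rng is None else rng
    bonds = rng.normal(scale=l / np.sqrt(3.0), size=(N, 3))
    backbone = np.vstack([np.zeros(3), np.cumsum(bonds, axis=0)])
    branch_points = np.sort(rng.choice(N + 1, size=p, replace=False))  # quenched {n_j}
    branches = []
    for n_j in branch_points:
        b = rng.normal(scale=l / np.sqrt(3.0), size=(M, 3))
        # connectivity constraint r_j(0) = r(n_j): the branch starts at its branch point
        branches.append(np.vstack([backbone[n_j], backbone[n_j] + np.cumsum(b, axis=0)]))
    return backbone, branch_points, branches

def stretching_term(backbone, branches, l=1.0):
    """Discrete connectivity part of beta*H in Eq. (30), i.e. minus its first two terms:
    (3 / 2l^2) * (sum of squared bond lengths over backbone and branches)."""
    total = np.sum(np.diff(backbone, axis=0) ** 2)
    for br in branches:
        total += np.sum(np.diff(br, axis=0) ** 2)
    return 3.0 / (2.0 * l**2) * total

rng = np.random.default_rng(42)
backbone, n_j, branches = random_branched_chain(N=2000, p=20, M=100, rng=rng)
beta_H_conn = stretching_term(backbone, branches)
print(n_j[:5], beta_H_conn)  # for ideal bonds the mean is 1.5 * (N + p*M) = 6000
```

Because every bond contributes $3/2$ on average for ideal Gaussian bonds, the stretching term fluctuates around $\tfrac{3}{2}(N+pM)$; the quenched branch point positions change only which beads the branches are attached to, not this average.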
In order to construct a tractable theory, we now transform to a representation in terms of $\phi_A(\mathbf r)$ and $\phi_B(\mathbf r)$, which are macroscopic fields corresponding to the local volume fractions of A- and B-type segments. The energy $E$ is easily written in such a representation, and is independent of sequence distribution and architecture. It simply reflects that there is an energetic driving force for the two types of segments to segregate, and that surface tension penalizes segregation on small scales. The standard way to write down the energy corresponding to given macroscopic fields is
$$
E = \frac{1}{2}\int d\mathbf r\,\Big\{-2\chi\,\psi^2(\mathbf r) + c\,[\nabla\psi(\mathbf r)]^2\Big\} \qquad (31)
$$

where $\chi$ is the Flory parameter [73], which measures the energetic driving force for segregation, $c$ is the surface tension, and $\psi(\mathbf r) = f_B\,\phi_A(\mathbf r) - f_A\,\phi_B(\mathbf r)$, where $f_A$ and $f_B$ are the average volume fractions of A- and B-type segments. The partition function for a given realization of $\{n_{ij}\}$ can be expressed as

$$
Z = \exp[-E + S] \qquad (32)
$$
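where the entropy $S$ is defined by

$$
S = \ln\mathcal R = \ln\int\prod_{i=1}^{\mathcal N}\mathcal D\mathbf r_i(n)\prod_{j=1}^{p}\mathcal D\mathbf r_{ij}(m)\;P_0[\mathbf r_i(n)]\,P_0[\mathbf r_{ij}(m)]\,\delta\big(\mathbf r_{ij}(0)-\mathbf r_i(n_{ij})\big) \qquad (33)
$$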
where $P_0$ are the Gaussian weights from Eq. (30) that enforce connectivity. The energy does not depend on the quenched distribution of branch point locations; the entropy does. We therefore need to perform a quenched average over the branch point locations, which means averaging Eq. (33), i.e., averaging a logarithm. We can use the celebrated replica trick to carry out this average [41]. Other approaches, such as cumulant expansions, could also be used to carry out such an average (e.g., [6,66,67]). Since we are concerned here with microphase ordering, the pertinent quantities bear only one replica index. Therefore, the issue of broken replica symmetry is not relevant (as discussed in the context of microphase ordering in linear DHPs [6]). Details of the derivation are provided in the appendix; here, we merely sketch the steps. Let us introduce fields $\omega(\mathbf r)$ and $\Phi(\mathbf r)$ conjugate to the single-chain fields $\phi_A$ and $\phi_B$ in order to resolve the delta functions in Eq. (32), and then replicate the same equation to obtain
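$$
\langle\mathcal R^n\rangle = \Big\langle\prod_{\alpha=1}^{n}\int\mathcal D\omega_\alpha(\mathbf r)\,\mathcal D\Phi_\alpha(\mathbf r)\,\exp\Big\{i\int d\mathbf r\,\big[\omega_\alpha(\mathbf r)\phi_A(\mathbf r)+\Phi_\alpha(\mathbf r)\phi_B(\mathbf r)\big]\Big\}\int\mathcal D\mathbf r_\alpha(n)\prod_{j=1}^{p}\int\mathcal D\mathbf r_{j\alpha}(m)\,P[\mathbf r_{j\alpha}(m)]\,P[\mathbf r_\alpha(n)]\,\delta\big(\mathbf r_{j\alpha}(0)-\mathbf r_\alpha(n_j)\big)\Big\rangle \qquad (34)
$$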
with the averaged entropy $\langle S\rangle = \lim_{n\to 0}(\langle\mathcal R^n\rangle - 1)/n$. Eq. (34) has been written with the realization that, although each chain has a different sequence, the distribution function describing the fluctuations in branch point locations is the same for all chains, so we need only use the symbol $n_j$ to denote branch point locations. We have also taken $\mathcal N(N+pM)/V = 1$, where $V$ is the system volume, and the angular brackets denote the quenched average; $P[\mathbf r_\alpha(n)] = P_0[\mathbf r_\alpha(n)]\exp[-i\int dn\,\omega_\alpha(\mathbf r_\alpha(n))]$, $P[\mathbf r_{j\alpha}(m)] = P_0[\mathbf r_{j\alpha}(m)]\exp[-i\int dm\,\Phi_\alpha(\mathbf r_{j\alpha}(m))]$, and $i = \sqrt{-1}$. We now expand the functionals $P$ in powers of the conjugate fields up to fourth order, with the bare propagators being Gaussian ($P_0$). Carrying out the resulting integrals, performing the average over the distribution of $n_j$, and evaluating the functional integrals over the conjugate fields by the saddle-point method, we find the replica-symmetric solution for the entropy to be:
$$
\begin{aligned}
\langle S\rangle = {} & -\frac{1}{2}\int d\mathbf q\,\boldsymbol\phi^{T}(\mathbf q)\,\underline{W}(\mathbf q)\,\boldsymbol\phi(-\mathbf q)
-\sum_{ijk}\int d\mathbf q_1\,d\mathbf q_2\,\phi_i(\mathbf q_1)\,\phi_j(\mathbf q_2)\,\phi_k(-\mathbf q_1-\mathbf q_2)\,C_{ijk}(\mathbf q_1,\mathbf q_2,\mathbf q_3)\\
& +\frac{1}{8}\int d\mathbf q_1\,d\mathbf q_2\,\boldsymbol\phi^{T}(\mathbf q_1)\,\underline{W}(\mathbf q_1)\Big[\big\langle\underline{M}(\mathbf q_1)\underline{W}(\mathbf q_1)\boldsymbol\phi(\mathbf q_1)\boldsymbol\phi^{T}(\mathbf q_2)\underline{W}(\mathbf q_2)\underline{M}(\mathbf q_2)\big\rangle-\boldsymbol\phi(\mathbf q_1)\boldsymbol\phi^{T}(\mathbf q_2)\Big]\underline{W}(\mathbf q_2)\,\boldsymbol\phi(\mathbf q_2)\\
& -\sum_{ijkl}\int d\mathbf q_1\,d\mathbf q_2\,d\mathbf q_3\,\phi_i(\mathbf q_1)\,\phi_j(\mathbf q_2)\,\phi_k(\mathbf q_3)\,\phi_l(-\mathbf q_1-\mathbf q_2-\mathbf q_3)\,C_{ijkl}(\mathbf q_1,\mathbf q_2,\mathbf q_3)
\end{aligned}
\qquad (35)
$$

where $\langle\underline{M}\rangle = \underline{W}^{-1}$, $\boldsymbol\phi^{T} = (\phi_A,\phi_B)$, and underlined symbols are matrices. Combining Eq. (35) with Eq. (31) yields the free energy functional that we seek. Mathematical formulas for $\underline{M}$, $C_{ijk}$, and $C_{ijkl}$ are provided in the appendix. In order to see the essential issues clearly, we remark here on their mathematical forms before presenting detailed results. The quadratic term ($F_2$) in our free energy functional contains terms proportional to $q^2$ and $1/q^2$. The physical origin of the Coulombic ($1/q^2$) term in $F_2$ is the stoichiometric constraint that forces the branches to be connected to the backbone at the branch points. This is so because the condition that all $M$ segments of a branch must lie within a certain distance (equal to the radius of gyration) of the corresponding branch point is similar to the neutrality condition in a system of interacting charges. Since such a stoichiometric constraint is absent in linear DHPs, a Coulombic term does not arise in the theory of microphase ordering of linear DHPs [68–72,77]. $F_2(q)$ depends upon temperature, but exhibits a minimum at a temperature-independent value of $q$. Shinozaki et al. [78] and Fredrickson and co-workers [79] obtained a free energy functional for molten RBHs up to quadratic order, and studied the structure factor in the disordered phase as a function of various parameters. Up to quadratic order, the free energy functional obtained by Qi et al. [76] is identical to that obtained in [78,79]. However, as we shall see, the quartic terms derived by Qi et al. [76] play a crucial role in determining the physics of microphase ordering in molten RBHs. So let us discuss the form of the quartic terms before considering the cubic term. The quartic term in Eq. (35) that arises from the disorder average, $F_4(q)$, is a temperature-dependent function that decays sharply with $q$ at relatively small values of $q$.
This term originates from the quenched fluctuations in branch point locations. It is similar in form to a term that arises in the theory of microphase ordering for linear DHPs [6,65–68] because, if we think of the branch points as another type of monomer on the backbone, the fluctuations in branch point locations are similar to the quenched sequence fluctuations along the backbone of linear DHPs. The last term in Eq. (35), $F_4'(q)$, is an Ising-like term. $C_{ijkl}$ contains
Fig. 24. Optimal wave vector $q^*$ as a function of the product $\chi M$ ($\sim 1/T$).
contributions from the mean values of the quenched disorder (i.e., the average number of branch points). In this sense, this term resembles the Ising-like term that arises in the theory of linear DHPs. For RBHs, however, this term also contains Coulombic terms, which arise from the stoichiometric constraints discussed earlier. $F_4'(q)$ resembles $F_2(q)$, with the minimum shifted to higher values of $q$. In order to understand the basic physics of microphase ordering in incompressible ($\phi_A + \phi_B = 1$) RBH melts, Qi et al. [76] studied the simplest morphology, viz., the lamellar morphology. Thus, the functional form for the density wave is taken to be $\phi_A(\mathbf r) = D\cos(\mathbf q\cdot\mathbf r) + f_A$. For this morphology, the cubic term vanishes. A mean-field solution is obtained by minimizing the resulting free energy functional with respect to $D$ and $|\mathbf q|$. Fig. 24 shows the theoretical prediction for the optimal wave vector as a function of temperature for a specific choice of chain architecture. At high temperature, we have a disordered phase, and $q^*$ is small and invariant with temperature. Immediately below the microphase separation temperature (MST), $q^*$ increases strongly as the temperature is lowered, a feature reminiscent of earlier predictions for linear DHPs. At lower temperatures, $q^*$ becomes essentially independent of temperature. The range of temperatures over which $q^*$ varies strongly depends upon the values of the parameters that characterize the architecture and the statistical distribution of branch point locations. The temperature dependence of the ordering length scale depicted in Fig. 24 is very different from that observed for diblock copolymers and that predicted for linear DHPs. In the former case, $q^*$ decreases somewhat as the temperature is lowered, largely due to chain stretching. In the latter case, theory predicts that $q^*$ continues to increase with decreasing temperature until microscopic scales are approached.
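The statement that $F_2(q)$ depends on temperature yet has its minimum at a temperature-independent $q$ can be illustrated with a toy quadratic vertex of the form suggested by the text: a Coulombic term $a/q^2$, a gradient-like term $b q^2$, and a $q$-independent Flory contribution $-2\chi$. The coefficients $a$ and $b$ below are hypothetical placeholders, not the expressions derived in [76,78,79]. Setting $dF_2/dq = -2a/q^3 + 2bq = 0$ gives $q^* = (a/b)^{1/4}$, with no dependence on $\chi$; the numerical scan confirms this.

```python
import numpy as np

def toy_F2(q, chi, a=1.0, b=1.0):
    """Hypothetical quadratic vertex: Coulombic a/q^2 + gradient b*q^2 - 2*chi.
    chi (~1/T) is q-independent, so it shifts F2 without moving its minimum."""
    return a / q**2 + b * q**2 - 2.0 * chi

q = np.linspace(0.05, 5.0, 20001)  # wave-vector grid, arbitrary units
q_stars, min_values = [], []
for chi in (0.5, 1.0, 2.0):  # increasing chi stands in for decreasing temperature
    values = toy_F2(q, chi, a=4.0, b=0.25)
    q_stars.append(q[np.argmin(values)])
    min_values.append(values.min())
    print(f"chi={chi}: q*={q_stars[-1]:.3f}, min F2={min_values[-1]:.3f}")
# analytic minimum location: q* = (a/b)**0.25 = 16**0.25 = 2.0 for every chi
```

Only the depth of the minimum, $2\sqrt{ab} - 2\chi$, changes with $\chi$. In the full theory it is the quartic terms that make the optimal $q^*$ of the ordered state move with temperature.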
The unusual variation of $q^*$ with temperature for RBHs can be understood in simple physical terms. In the disordered state, $D = 0$, and $q^*$ is of the order of the inverse size of the entire chain (total length $N + pM$); that is, fluctuations occur on the scale of the whole chain. Below the MST, $D$ acquires finite values, and the system begins to order on scales smaller than the chain dimensions. However, the length scale of the ordered regions is still much larger than the branch length. On scales much larger than the branch length, the system resembles a linear DHP: the branches look like segments of another type of monomer. In this limit, since the fluctuations in the locations of branch points are uncorrelated, there is no natural length scale set by the sequence and architecture of the polymer (in contrast, for example, to diblock copolymers). Thus, in this limit, as predicted for linear DHPs, $q^*$ increases as the temperature is lowered. This is because the entropic penalty for ordering on smaller scales becomes progressively less important, and the energetic driving force makes the system order on progressively smaller scales. In linear DHPs this behavior continues until ordering occurs on microscopic scales. In the system under consideration, however, when $q^*$ acquires values corresponding to the length scale of the branches ($M$) or to length scales shorter than the mean length of the backbone sections between branch points ($N/p$), a natural scale emerges. Ordering on smaller scales is prevented by the large entropy penalty associated with further squeezing the regions occupied by the branches and the backbone. Thus, $q^*$ becomes essentially independent of temperature. The relative importance of $M$ and $N/p$ in determining the final, temperature-independent value of the optimal wave vector depends upon the details of the architecture [80]. The variation of $q^*$ with temperature can also be understood by considering the role of the various terms in the free energy functional. For small q, F
Recall that the quartic term $F_4$ originates from the quenched fluctuations in the locations of the branch points. Thus, in the region immediately below the MST, the physics is determined by the quenched randomness inherent to this system. As the system orders on smaller scales, the values of the terms in the free energy functional at larger values of $q$ become relevant. At larger values of q, F
Fig. 25. Intensity of the optical birefringence signal (in millivolts) as a function of temperature.
Fig. 26. Small-angle neutron scattering profiles at selected temperatures.
Fig. 27. Intensity at the peak position of the SANS data as a function of χM. The circles correspond to experimental data and the line is the theoretical result.
the measured optical birefringence signal is shown in Fig. 25. It is well established that the order–disorder transition in molten copolymers is announced by a discontinuous drop in the birefringence signal to less than 1 mV [81]. The birefringence signal from the molten branched DHP is large over the entire range of temperatures studied, indicative of an ordered phase. This shows that the order–disorder transition for branched DHPs has not been accessed in the experiments performed thus far. However, extrapolation of the birefringence data suggests that the MST occurs at $\chi M \sim O(10)$, consistent with the theoretical prediction shown in Fig. 24. The SANS experiments [76] allow a much closer comparison with the theory outlined above. Fig. 26 shows the scattering profiles (which were found to be independent of thermal history) at a few selected temperatures. A single scattering peak is observed at all temperatures, and the coherent scattering intensity goes to zero in both the high- and low-q limits. These features, coupled with the finite birefringence signals, are classic signatures of microphase ordering. The scattering peaks are substantially wider than those observed for diblock copolymers (a system that has been extensively studied). This broadening arises from the disordered sequence of branch points and the resulting distribution of the lengths of the PB segments between branch points. The theory described above predicts the peak position, q*, and the intensity at the peak, $I^* \propto |D|^2$, as a function of χ. To predict I* as a function of temperature, we need to know how χ depends on temperature. Taking this dependence to be of the familiar form $\chi = A + B/T$, the constants A and B can be adjusted to best fit the experimental data. Fig. 27 shows that the theory fits the data remarkably well, with A = 0.003 (vanishingly small) and B = 20. The constant B reflects the energy of interactions between PS and PB segments, and should be roughly the same for copolymers of various architectures and sequences. Thus, the value of B obtained by fitting the experimental data for branched DHPs should be similar to the well-established value for PS–PB diblock copolymers, which is 19 in the literature. This close agreement lends further credence to the theory for branched DHPs developed by Qi et al. [76]. With this value of B, the temperature range over which experimental data were collected corresponds to χM values between 13 and 15. Fig. 24 shows that this places the experimental data in the crossover region between strongly temperature-dependent and essentially temperature-independent values of q*.
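As a small numerical illustration of the fitted form χ = A + B/T, the sketch below evaluates χ and inverts χM for the temperature, using the constants quoted above (A = 0.003, B = 20). It assumes T is an absolute temperature in kelvin. The branch length `M_HYPOTHETICAL = 250` in the usage lines is an illustrative value only; the actual branch length of the samples in [76] is not given in this section.

```python
A_FIT = 0.003  # fitted constant A quoted in the text (vanishingly small)
B_FIT = 20.0   # fitted constant B quoted in the text; assumes T in kelvin


def chi(T: float, A: float = A_FIT, B: float = B_FIT) -> float:
    """Flory-Huggins parameter in the assumed form chi(T) = A + B/T."""
    if T <= 0:
        raise ValueError("absolute temperature must be positive")
    return A + B / T


def temperature_for_chiM(chiM: float, M: float, A: float = A_FIT, B: float = B_FIT) -> float:
    """Solve chi(T) * M = chiM for T, with M the branch length in segments."""
    chi_target = chiM / M
    if chi_target <= A:
        raise ValueError("chiM / M must exceed A to give a finite temperature")
    return B / (chi_target - A)


if __name__ == "__main__":
    M_HYPOTHETICAL = 250  # illustrative branch length only, not taken from the text
    for chiM in (13.0, 15.0):  # chiM range quoted for the SANS data
        T = temperature_for_chiM(chiM, M_HYPOTHETICAL)
        print(f"chiM = {chiM:.0f} -> T = {T:.1f} K, check chi*M = {chi(T) * M_HYPOTHETICAL:.2f}")
```

Because A is nearly zero, χ is almost proportional to 1/T in this fit, so the ratio of the χM end points (15/13) is close to the ratio of the corresponding absolute temperatures, whatever the value of M.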
Thus, a very small (but finite) variation of q* with temperature is expected. The experimental data do exhibit a small (but systematic) increase of q* with temperature. Because this variation is small, however, it cannot be claimed unequivocally that it lies outside the limits of experimental error. The data in Fig. 24 show that the average value of q* in this range of temperatures is 0.032 ± 0.001 Å⁻¹. Using the statistical segment length of PS (= 5.02 Å), the theory predicts [76] that q* should vary between 0.027 and 0.028 Å⁻¹ in the range of temperatures over which experimental data exist. The asymptotic value of q* deep in the ordered region is predicted to be 0.033 Å⁻¹. The agreement between theory and experiment is reasonable considering the uncertainties in the experimental values of l, M, N, and p. As noted earlier, the study of DHPs in the molten state can provide important insights into the physics of frustrated systems with quenched disorder, and the discussion in this section was intended to illustrate this potential. While past theoretical studies of linear DHPs were extremely useful in many contexts, the lack of experimental information on these systems has allowed only limited progress. The ability to routinely synthesize branched DHPs with controlled sequence statistics and architecture opens up many new possibilities for careful experimental studies and analyses that should enhance our fundamental understanding of the effects of frustration due to quenched randomness on microphase ordering. The advantage of using polymers to study this physics (as compared to, say, spin glasses) is that the statistics of the disorder can be carefully controlled, and the self-assembled ordered structures that may form have large length scales that can be examined with relative ease (e.g., via scattering experiments and microscopy). Further, because long-chain systems have slow dynamics, careful studies of the dynamics should also be possible.
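To put the quoted peak positions in real-space terms, one can use the standard relation d = 2π/q*. For a lamellar ordered phase (the morphology considered for the ordered state in this section) d is the lamellar period; otherwise it is simply a characteristic domain spacing. The sketch below applies this to the measured and predicted q* values quoted above, with first-order error propagation for the measured value; the function names are ours.

```python
import math

B_PS = 5.02  # statistical segment length of PS quoted in the text, in angstrom


def spacing(q_star: float) -> float:
    """Real-space spacing d = 2*pi/q* (angstrom if q* is in 1/angstrom)."""
    return 2.0 * math.pi / q_star


def spacing_uncertainty(q_star: float, dq: float) -> float:
    """First-order error propagation: |dd/dq| * dq = 2*pi*dq / q*^2."""
    return 2.0 * math.pi * dq / q_star**2


if __name__ == "__main__":
    measured, err = 0.032, 0.001  # 1/angstrom, average SANS value and its spread
    predicted = (0.027, 0.028)    # theory over the measured temperature range
    asymptotic = 0.033            # theory, deep in the ordered region
    print(f"measured:   d = {spacing(measured):.0f} +/- {spacing_uncertainty(measured, err):.0f} A")
    print(f"predicted:  d = {spacing(predicted[1]):.0f} to {spacing(predicted[0]):.0f} A")
    print(f"asymptotic: d = {spacing(asymptotic):.0f} A")
    # Dimensionless wave vector in units of the PS segment length.
    print(f"q* b (measured) = {measured * B_PS:.3f}")
```

For example, the measured 0.032 Å⁻¹ corresponds to a spacing of roughly 196 Å (about ±6 Å from the quoted ±0.001 Å⁻¹), while the predicted 0.027 to 0.028 Å⁻¹ corresponds to roughly 224 to 233 Å, which makes the size of the theory–experiment gap easy to read off.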
The work described in this section suggests that branched DHPs may be particularly good model systems to study the physics of frustrated systems with quenched disorder. By varying the branch length and the statistics of branch point locations, the behavior can be tuned from the linear DHP limit to that of simple comb polymers. So far, only the lamellar morphology has been studied for the ordered phase. Future work should focus on studying the entire phase diagram using the field theory and the type of experiments described in this section. It may also prove fruitful, both from a fundamental standpoint and from the viewpoint of applications, to study the behavior of branched and linear DHPs when they are components of a mixture (e.g., with homopolymers, solvent mixtures, and block copolymers). Following the completion of these efforts, the dynamics of the order–disorder and order–order transitions in these systems with quenched disorder should be studied. These studies may prove the most exciting, and may allow us to develop a deep understanding of dynamics in frustrated systems with quenched disorder.
Acknowledgements Several people have influenced my thinking on the physics of DHPs. I would especially like to thank those who have influenced my ideas through collaborations: Prof. Eugene Shakhnovich (Harvard University), Prof. V. Pande (Stanford University), Dr. L. Gutman, Dr. S.Y. Qi, Dr. S. Srebnik, and Mr. A. Golumbfskie. Dr. Qi and Mr. Golumbfskie were also kind enough to comment on this manuscript. I am deeply grateful to the National Science Foundation and the US Department of Energy for generous financial support of my work on disordered heteropolymers.
Appendix In order to compute the entropy as a function of the macroscopic order parameters with a proper quenched average over the branch point fluctuations, we note the identity
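$$
\begin{aligned}
1 &= \prod_{\alpha=1}^{k}\int D\rho_1^\alpha(\mathbf{r})\,D\rho_2^\alpha(\mathbf{r})\;
\delta\Big[\rho_1^\alpha(\mathbf{r})-\int_0^N dn\,\delta[\mathbf{r}-\mathbf{r}^\alpha(n)]\Big]\;
\delta\Big[\rho_2^\alpha(\mathbf{r})-\sum_{j=1}^{P}\int_0^M ds\,\delta[\mathbf{r}-\mathbf{r}_j^\alpha(s)]\Big]\\
&= \int\prod_{\alpha=1}^{k} D\rho_1^\alpha\,D\rho_2^\alpha\,D\psi_1^\alpha\,D\psi_2^\alpha\;
\exp\Big\{i\sum_{\alpha}\int d\mathbf{r}\,\big[\psi_1^\alpha(\mathbf{r})\rho_1^\alpha(\mathbf{r})+\psi_2^\alpha(\mathbf{r})\rho_2^\alpha(\mathbf{r})\big]\\
&\qquad\qquad -\,i\sum_{\alpha}\Big[\int_0^N dn\,\psi_1^\alpha(\mathbf{r}^\alpha(n))+\sum_{j=1}^{P}\int_0^M ds\,\psi_2^\alpha(\mathbf{r}_j^\alpha(s))\Big]\Big\},
\end{aligned}
\tag{A.1}
$$
where ψ_1 and ψ_2 are fields conjugate to ρ_1(r) and ρ_2(r), the backbone and branch densities, respectively. Substituting this identity into the replicated form of R (Eq. (32)) gives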
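$$
\langle R^k\rangle = \int\prod_{\alpha=1}^{k} D\rho_1^\alpha\,D\rho_2^\alpha\,D\psi_1^\alpha\,D\psi_2^\alpha\;
\exp\Big\{i\sum_{\alpha}\int d\mathbf{r}\,\big[\psi_1^\alpha\rho_1^\alpha+\psi_2^\alpha\rho_2^\alpha\big]\Big\}
\Big\langle\prod_{\alpha=1}^{k}\int D\mathbf{r}^\alpha(n)\prod_{j=1}^{P}D\mathbf{r}_j^\alpha(s)\;
\mathcal{P}[\mathbf{r}^\alpha(n)]\prod_{j=1}^{P}\mathcal{P}[\mathbf{r}_j^\alpha(s)]\,\delta[\mathbf{r}_j^\alpha(0)-\mathbf{r}^\alpha(\theta_j)]\Big\rangle,
\tag{A.2}
$$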
where the angular brackets denote the quenched average, and
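$$
\begin{aligned}
\mathcal{P}[\mathbf{r}^\alpha(n)] &= \mathcal{P}_0[\mathbf{r}^\alpha(n)]\,\exp\Big[-i\int_0^N dn\,\psi_1^\alpha(\mathbf{r}^\alpha(n))\Big],\\
\mathcal{P}[\mathbf{r}_j^\alpha(s)] &= \mathcal{P}_0[\mathbf{r}_j^\alpha(s)]\,\exp\Big[-i\int_0^M ds\,\psi_2^\alpha(\mathbf{r}_j^\alpha(s))\Big].
\end{aligned}
\tag{A.3}
$$
Let us denote the expression that needs to be averaged by Q. So,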
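$$
\begin{aligned}
\langle Q^k\rangle &= \Big\langle\prod_{\alpha=1}^{k}\int D\mathbf{r}^\alpha(n)\prod_{j=1}^{P}D\mathbf{r}_j^\alpha(s)\;\mathcal{P}[\mathbf{r}^\alpha(n)]\prod_{j=1}^{P}\mathcal{P}[\mathbf{r}_j^\alpha(s)]\,\delta[\mathbf{r}_j^\alpha(0)-\mathbf{r}^\alpha(\theta_j)]\Big\rangle,\\
Q &= \int D\mathbf{r}(n)\prod_{j=1}^{P}D\mathbf{r}_j(s)\;\mathcal{P}[\mathbf{r}(n)]\prod_{j=1}^{P}\mathcal{P}[\mathbf{r}_j(s)]\,\delta[\mathbf{r}_j(0)-\mathbf{r}(\theta_j)].
\end{aligned}
\tag{A.4}
$$
We now expand $\langle Q\rangle$ in powers of the conjugate fields ψ_1 and ψ_2, up to quartic order; i.e.,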
$$
Q = Q_0 + Q_1 + Q_2 + Q_3 + Q_4 + \cdots, \tag{A.5}
$$
where $Q_0 = V$, and $Q_1$ contributes an irrelevant constant to the free energy. The first non-trivial term is $Q_2$, which, after introducing Fourier transforms, is
$$
Q_2 = -\frac{1}{2}\int d\mathbf{q}\;\boldsymbol{\psi}^{T}(\mathbf{q})\,M(\mathbf{q})\,\boldsymbol{\psi}(-\mathbf{q}), \tag{A.6}
$$
where $\boldsymbol{\psi} = (\psi_1, \psi_2)^T$ and M is a 2×2 matrix:
$$
\begin{aligned}
M_{11}(q) &= \frac{2}{x^2}\,[Nx + \exp(-Nx) - 1] = N^2 g_2(Nx),\\
M_{22}(q) &= PM^2 g_2(Mx) + P(P-1)M^2 g_1^2(Mx)\,g_2(Nx),\\
M_{12}(q) &= M_{21}(q) = NPM\,g_1(Mx)\,g_2(Nx),
\end{aligned}
\tag{A.7}
$$
with $x = q^2b^2/6$, $g_1(y) = [1-\exp(-y)]/y$ and $g_2(y) = (2/y^2)[y + \exp(-y) - 1]$; $g_2(y)$ is the so-called Debye function. Similarly, expressions for $Q_3$ and $Q_4$ as functionals of the conjugate fields can be derived. Qi et al. [79] provide a detailed derivation of these terms using a graphical method. Their result is
1 i 1Q2"
# C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) # C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) # C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) # C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql )] where the functions C (q , q ) are G (2p)C "6Nh (Nx , Nx ) , (2p)C "6(PM)Ng (Mx )[2h (Nx , Nx )#h (Nx , Nx )] ,
(A.8)
(2p)C "3 PMNg (Nx )H (Mx !Mx , Mx )#4(PM)Ng (Mx )g (Mx )
g (Nx )!g (Nx ) , ;h (Nx , Nx )#(PM)g (Mx )g (Mx ) x !x (2p)C "6PMh (Mx , Mx )#6+PMg (Nx )g (Mx )H (Mx !Mx , Mx ) # (PM)g (Mx )g (Mx )g (Mx )h (Nx , Nx ), (A.9) and s "qb/6 for i"1, 2 and s "(q #q )b/6. The functions C (q , q , q ) are G G G (2p)C "24Nh (Nx , Nx , Nx ) , (2p)C "96PMNg (Mx )h (Nx , Nx , Nx ) , (2p)C "12+2PMNH (!Mx , Mx #Mx )[2h (Nx , Nx )#h (Nx , Nx )] # 4(PMN)g (Mx )g (Mx )h (Nx , Nx , Nx ) (PM) g (Nx )!g (Nx ) g (Nx )!g (Nx ) ! # g (Mx )g (Mx ) x !x x !x x !x 4(PMN) # g (Mx )g (Mx )[h (Nx , Nx )!h (Nx , Nx )] N(x !x ) # 2(PMN)g (Mx )g (Mx )h (Nx , Nx , Nx ), , (A.10)
(2p)C "8 3NPMg (Nx )H (Mx !Mx , Mx !Mx , Mx ) 12PM h (Nx , Nx )H (!Mx , Mx #Mx )g (Mx ) x g (Nx )!g (Nx ) H (!Mx , Mx #Mx ) # 3PMg (Mx ) x !x #
#6
(PM) g (Mx )g (Mx )g (Mx )[h (Nx , Nx ) x
! H (Nx !Nx , Nx !Nx , Nx )]
h (Nx , Nx )!h (Nx , Nx ) , # 6(PM)g (Mx )g (Mx )g (Mx ) x !x (2p)C "24PMh (Mx , Mx , Mx ) # 24PMg (Mx )H (Mx !Mx , Mx !Mx , Mx )g (Nx ) # 12PMg (Nx )H (!Mx , Mx #Mx )H (!Mx , Mx #Mx ) # 72PMg (Mx )g (Mx )h (Nx , Nx )H (!Mx , Mx #Mx ) # 24(PM)g (Mx )g (Mx )g (Mx )g (Mx )h (Nx , Nx , Nx ) , where s "qb/6 for i"1, 2, 3, and s "(q #q #q )b/6, s "(q #q )b/6, s " G G (q #q )b/6 and s "(q #q )b/6. The following functions have been de"ned for simplicity:
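$$
\begin{aligned}
h_1(y_1) &= \int_0^1 dn_1\int_0^{n_1} dn_2\,\exp[-(n_1-n_2)y_1] = \frac{1-g_1(y_1)}{y_1} = \frac{g_2(y_1)}{2},\\
h_2(y_1,y_2) &= \int_0^1 dn_1\int_0^{n_1} dn_2\int_0^{n_2} dn_3\,\exp[-(n_1-n_2)y_1-(n_2-n_3)y_2]
= \frac{1}{y_1}\Big[h_1(y_2) - \frac{g_1(y_1)-g_1(y_2)}{y_2-y_1}\Big],\\
h_3(y_1,y_2,y_3) &= \int_0^1 dn_1\int_0^{n_1} dn_2\int_0^{n_2} dn_3\int_0^{n_3} dn_4\,\exp[-(n_1-n_2)y_1-(n_2-n_3)y_2-(n_3-n_4)y_3]\\
&= \frac{1}{y_1}\Big\{h_2(y_2,y_3) - \frac{1}{y_3-y_2}\Big[\frac{g_1(y_1)-g_1(y_2)}{y_2-y_1} - \frac{g_1(y_1)-g_1(y_3)}{y_3-y_1}\Big]\Big\},\\
H_1(y_1,y_2) &= \int_0^1 dn_2\int_0^{n_2} dn_1\,\exp[-n_1y_1-n_2y_2] = \frac{1}{y_1}\,[g_1(y_2)-g_1(y_1+y_2)],\\
H_2(y_1,y_2,y_3) &= \int_0^1 dn_3\int_0^{n_3} dn_2\int_0^{n_2} dn_1\,\exp[-n_1y_1-n_2y_2-n_3y_3] = \frac{1}{y_1}\,[H_1(y_2,y_3)-H_1(y_1+y_2,y_3)].
\end{aligned}
\tag{A.11}
$$
Combining all these expressions yields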
Q K 1 K 1 i "1# ! dql ? cl (ql ?)M(ql ?)cl (ql ?)# dql ? dql ? < < 2 6 ? [C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? ) # C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? ) # C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? ) # C (ql ? , ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? )] #
1 dql ? dql ? dql ? [C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) 24
# C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) # C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) # C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? ) # C (ql ? , ql ? , ql ? )c (ql ? )c (ql ? )c (ql ? )c (!ql ? !ql ? !ql ? )]
1 K (A.12) dql ? dql @1cl (ql ?)M (ql ?)cl (ql ?)cl (ql @)M (ql @)cl (ql @)2 4 ?$@ where the matrix M is .> 2 +(q !q )x#exp[!(q !q )x]!1, (M ) (q)" H\ H H\ x H H .> exp(q x)!exp(q x) exp(!q x)!exp(!q x) G G\ ; H\ H , #2 (A.13) x x GH 1!exp(!Mx) . exp[!x(q !q )] , (M ) (q)"PMg(Mx)#2 H G x GH 1!exp(!Mx) .> . exp(!x"q !q ")!exp(!x"q !q ") H G H G\ (M ) (q)"(M ) (q)" x x G H #
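The auxiliary functions h and H of Eq. (A.11) are nested integrals over ordered contour variables on [0, 1], and their closed forms are easy to get wrong. The sketch below is a numerical sanity check that is not part of the original derivation: it codes closed forms for h_1, h_2, h_3, H_1 and H_2, with the integration orders written in the docstrings (our reading of Eq. (A.11), an assumption), and compares them with a plain Monte Carlo estimate of the defining integrals. The closed forms contain divided differences, so they hold only for distinct, nonzero arguments.

```python
import math
import random


def g1(y):
    """g_1(y) = [1 - exp(-y)]/y for y != 0."""
    return -math.expm1(-y) / y


def g2(y):
    """Debye function g_2(y) = (2/y^2)[y + exp(-y) - 1] for y != 0."""
    return 2.0 * (y + math.expm1(-y)) / (y * y)


def h1(y1):
    """Integral over 0 < n2 < n1 < 1 of exp[-(n1-n2) y1]."""
    return (1.0 - g1(y1)) / y1


def h2(y1, y2):
    """Integral over 0 < n3 < n2 < n1 < 1 of exp[-(n1-n2) y1 - (n2-n3) y2]."""
    return (h1(y2) - (g1(y1) - g1(y2)) / (y2 - y1)) / y1


def h3(y1, y2, y3):
    """Integral over 0 < n4 < ... < n1 < 1 of exp[-(n1-n2)y1 - (n2-n3)y2 - (n3-n4)y3]."""
    d12 = (g1(y1) - g1(y2)) / (y2 - y1)
    d13 = (g1(y1) - g1(y3)) / (y3 - y1)
    return (h2(y2, y3) - (d12 - d13) / (y3 - y2)) / y1


def H1(y1, y2):
    """Integral over 0 < n1 < n2 < 1 of exp[-n1 y1 - n2 y2] (y1 pairs with the smaller n)."""
    return (g1(y2) - g1(y1 + y2)) / y1


def H2(y1, y2, y3):
    """Integral over 0 < n1 < n2 < n3 < 1 of exp[-n1 y1 - n2 y2 - n3 y3]."""
    return (H1(y2, y3) - H1(y1 + y2, y3)) / y1


def ordered_mc(integrand, k, samples=100_000, seed=1):
    """Monte Carlo integral over the ordered simplex 0 < t[k-1] < ... < t[0] < 1.

    Sorting k uniform variates gives a uniform point in the ordered simplex,
    whose volume is 1/k!, so the integral equals mean(integrand) / k!.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        t = sorted((rng.random() for _ in range(k)), reverse=True)
        total += integrand(t)
    return total / samples / math.factorial(k)


if __name__ == "__main__":
    a, b, c = 0.7, 1.9, 3.1  # arbitrary distinct test arguments
    checks = {
        "h1": (h1(a), ordered_mc(lambda t: math.exp(-(t[0] - t[1]) * a), 2)),
        "h2": (h2(a, b), ordered_mc(lambda t: math.exp(-(t[0] - t[1]) * a - (t[1] - t[2]) * b), 3)),
        "h3": (h3(a, b, c), ordered_mc(
            lambda t: math.exp(-(t[0] - t[1]) * a - (t[1] - t[2]) * b - (t[2] - t[3]) * c), 4)),
        "H1": (H1(a, b), ordered_mc(lambda t: math.exp(-t[1] * a - t[0] * b), 2)),
        "H2": (H2(a, b, c), ordered_mc(lambda t: math.exp(-t[2] * a - t[1] * b - t[0] * c), 3)),
    }
    for name, (closed, mc) in checks.items():
        print(f"{name}: closed form {closed:.5f}, Monte Carlo {mc:.5f}")
```

A useful side check is that h_2 is symmetric in its two arguments, even though the closed form does not look symmetric: exchanging the two interior gaps leaves the integral unchanged.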
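Combining Eqs. (A.1) and (A.11) yields $\langle R^k\rangle$ as a function of the order parameters and conjugate fields. We now evaluate the functional integrals over the conjugate fields using a saddle point approximation; i.e.,
$$
\frac{\delta\ln\langle R^k\rangle[\rho_1,\rho_2,\psi_1,\psi_2]}{\delta\psi_1^\alpha(-\mathbf{q})} = 0,\qquad
\frac{\delta\ln\langle R^k\rangle[\rho_1,\rho_2,\psi_1,\psi_2]}{\delta\psi_2^\alpha(-\mathbf{q})} = 0.
\tag{A.14}
$$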
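At quadratic order, solving these saddle-point equations involves the 2×2 matrix M(q) and its inverse W. The sketch below evaluates both for a comb with P branches of M segments on a backbone of N segments, using (as we read Eq. (A.7)) M_11 = N² g_2(Nx), M_22 = PM² g_2(Mx) + P(P−1)M² g_1(Mx)² g_2(Nx), and M_12 = M_21 = NPM g_1(Mx) g_2(Nx), with x = q²b²/6. The architecture values in the usage lines are hypothetical, and the short Taylor series for the Debye function at tiny arguments is a numerical convenience we added, not something taken from the text.

```python
import math


def g1(y: float) -> float:
    """g_1(y) = [1 - exp(-y)]/y, with the limit g_1(0) = 1."""
    return 1.0 if y == 0.0 else -math.expm1(-y) / y


def g2(y: float) -> float:
    """Debye function g_2(y) = (2/y^2)[y + exp(-y) - 1], with g_2(0) = 1."""
    if y < 1e-4:
        # Short Taylor series avoids cancellation when y is tiny.
        return 1.0 - y / 3.0 + y * y / 12.0
    return 2.0 * (y + math.expm1(-y)) / (y * y)


def correlation_matrix(q, N, M, P, b):
    """Quenched-averaged 2x2 matrix M(q); index 1 = backbone, 2 = branches."""
    x = q * q * b * b / 6.0
    m11 = N**2 * g2(N * x)
    m22 = P * M**2 * g2(M * x) + P * (P - 1) * M**2 * g1(M * x) ** 2 * g2(N * x)
    m12 = N * P * M * g1(M * x) * g2(N * x)
    return [[m11, m12], [m12, m22]]


def inverse_2x2(m):
    """W = M^{-1}; raises ZeroDivisionError if M is (numerically) singular."""
    (a11, a12), (a21, a22) = m
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12 * max(abs(a11 * a22), 1.0):
        raise ZeroDivisionError("M(q) is singular")
    return [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]


if __name__ == "__main__":
    N, M, P, b = 100, 20, 5, 5.02  # hypothetical architecture; b in angstrom
    for q in (0.01, 0.03, 0.1):    # wave vectors in 1/angstrom
        mat = correlation_matrix(q, N, M, P, b)
        W = inverse_2x2(mat)
        print(q, [round(v, 1) for row in mat for v in row],
              [f"{v:.3e}" for row in W for v in row])
```

At q = 0 the matrix is singular (det M = N²P²M² − (NPM)² = 0), because both density components then just count segments of the same whole chain; W is therefore only evaluated at finite q.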
Solving the above equations gives $\langle S[\rho_1,\rho_2]\rangle = \lim_{k\to 0}\,[\langle R^k\rangle[\rho_1,\rho_2]-1]/k$ as a functional of ρ_1 and ρ_2:
i < dql dql [C (ql , ql )c (ql )c (ql )c (!ql !ql ) 1S[o , o ]2"! dql ol =ol # 6< 2 # C (ql , ql )c (ql )c (ql )c (!ql !ql ) # C (ql , ql )c (ql )c (ql )c (!ql !ql ) # C (ql , ql )c (ql )c (ql )c (!ql !ql )] #
1 dql dql dql [C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) 24<
# C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) # C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) # C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql ) # C (ql , ql , ql )c (ql )c (ql )c (ql )c (!ql !ql !ql )] !
1 dql dql +1cl (ql )M (ql )cl (ql )cl (ql )M (ql )cl (ql )2 8<
! [cl (ql )M(ql )cl (ql )][cl (ql )M(ql )cl (ql )],
(A.15)
where the matrix W is the inverse of M, $\psi_1 = iV(W_{11}\rho_1 + W_{12}\rho_2)$, $\psi_2 = iV(W_{21}\rho_1 + W_{22}\rho_2)$, and $\boldsymbol{\psi} = (\psi_1,\psi_2)^T$. The modified quartic coefficients are C (ql , ql , ql )"C (ql , ql , ql )!+[C (ql #ql , ql )#C (ql , ql #ql )#C (ql , ql )] ;= (ql #ql )[C (!ql !ql , ql )#C (ql ,!ql !ql )#C (ql , ql )] # 2[C (ql #ql , ql )#C (ql , ql #ql )#C (ql , ql )] ;= (ql #ql )C (ql , ql )#C (ql , ql )M (ql #ql )C (ql , ql ), ,
C (ql , ql , ql )"C (ql , ql , ql )!+2[C (!ql !ql , ql )#C (ql ,!ql !ql ) #C (ql , ql )] ;= (ql #ql )[C (ql #ql , ql )#C (ql , ql #ql )] # 2[C (!ql !ql , ql )#C (ql ,!ql !ql )#C (ql , ql )] ;= (ql #ql )[C (ql , ql #ql )#C (ql , ql )] # 2C (ql , ql )= (ql #ql )[C (ql #ql , ql )#C (ql , ql #ql )] # 2C (ql , ql )= (ql #ql )[C (ql , ql #ql )#C (ql , ql )], , (A.16) C (ql , ql , ql )"C (ql , ql , ql )!+2[C (!ql !ql , ql )#C (ql ,!ql !ql ) #C (ql , ql )] ;= (ql #ql )C (ql #ql , ql )# [C (ql #ql , ql )#C (ql , ql #ql )] ;= (ql #ql )[C (!ql !ql , ql )#C (ql ,!ql #ql )] # 2C (ql , ql )= (ql #ql )C (ql #ql , ql ) # 2[C (!ql !ql , ql )#C (ql ,!ql !ql )#C (ql , ql )] ;= (ql #ql )[C (ql #ql , ql )#C (ql , ql #ql )#C (ql , ql )] # 2[C (ql #ql , ql )#C (ql , ql #ql )]= (ql #ql ) ;[C (ql ,!ql !ql )#C (ql , ql )] # 2C (ql , ql )= (ql #ql )[C (ql #ql , ql )#C (ql , ql #ql ) #C (ql , ql )] # [C (ql , ql #ql )#C (ql , ql )]= (ql #ql )[C (ql ,!ql !ql ) #C (ql , ql )], , C (ql , ql , ql )"C (ql , ql , ql ) ! +2C (ql #ql , ql )= (ql #ql )[C (!ql !ql , ql ) #C (ql ,!ql !ql )] # 2C (ql #ql , ql )= (ql #ql )[C (ql , ql )#C (ql ,!ql !ql )] # 2[C (!ql !ql , ql )#C (ql ,!ql !ql )] ;= (ql #ql )[C (ql #ql , ql )#C (ql , ql #ql )#C (ql , ql )] # 2[C (ql ,!ql !ql )#C (ql , ql )] ;= (ql #ql )[C (ql #ql , ql )#C (ql , ql #ql )#C (ql , ql )], ,
C (ql , ql , ql )"C (ql , ql , ql )!+C (ql #ql , ql )= (ql #ql )C (!ql !ql , ql ) # 2C (ql #ql , ql )= (ql #ql ) ;[C (!ql !ql , ql )#C (ql ,!ql !ql )#C (ql , ql )] # [C (!ql !ql , ql )#C (ql ,!ql !ql )#C (ql , ql )] ;= (ql #ql )[C (ql #ql , ql )#C (ql , ql #ql )#C (ql , ql )], , where $\mathbf{q}_4 = -\mathbf{q}_1 - \mathbf{q}_2 - \mathbf{q}_3$.
References
[1] H.S. Chan, K.A. Dill, Ann. Rev. Biophys. Chem. 20 (1991) 447.
[2] J. Bryngelson, P.G. Wolynes, Proc. Natl. Acad. Sci. U.S.A. 84 (1987) 7524.
[3] E.I. Shakhnovich, A.M. Gutin, Nature 346 (1990) 773.
[4] J.D. Bryngelson, J.N. Onuchic, N.D. Socci, P.G. Wolynes, PSFG 21 (1995) 167; E.I. Shakhnovich, Folding Des. 1 (1996) 50; H.S. Chan, K.A. Dill, Nature Struct. Biol. 4 (1997) 10; M. Karplus, E.I. Shakhnovich, in: T.E. Creighton (Ed.), Protein Folding, Freeman, New York, 1995.
[5] V.S. Pande, A.Yu. Grosberg, T. Tanaka, Biophys. J. 73 (1997) 3192.
[6] C.D. Sfatos, E.I. Shakhnovich, Phys. Rep. 288 (1997) 77.
[7] R.J. Todd, R.D. Johnson, F.H. Arnold, J. Chromatogr. A 662 (1994) 13.
[8] R.D. Johnson, Z.-G. Wang, F.H. Arnold, J. Phys. Chem. 100 (1996) 5134.
[9] A. Spaltenstein, G.M. Whitesides, J. Am. Chem. Soc. 113 (1991) 686.
[10] M. Mammen, G. Dahmann, G.M. Whitesides, J. Med. Chem. 38 (1995) 4179.
[11] G. Sigal, M. Mammen, G. Dahmann, G.M. Whitesides, J. Am. Chem. Soc. 118 (1996) 3789.
[12] A.K. Chakraborty, M. Tirrell, MRS Bull. 21 (1996) 28.
[13] M.A. Cohen Stuart, T. Cosgrove, B. Vincent, Adv. Colloid Interface Sci. 24 (1986) 143.
[14] A.Yu. Grosberg, A.R. Khokhlov, Statistical Physics of Macromolecules, AIP Press, New York, 1994.
[15] D. Petera, M. Muthukumar, J. Chem. Phys. (1997).
[16] J. Heier, E.J. Kramer, S. Walheim, G. Krausch, Macromolecules 30 (1997) 6610.
[17] M. Muthukumar, Curr. Opin. Colloid Interface Sci. 3 (1998) 48.
[18] J.F. Joanny, J. Phys. II 4 (1994) 1281; C.M. Marques, J.F. Joanny, Macromolecules 23 (1990) 268.
[19] L. Gutman, A.K. Chakraborty, J. Chem. Phys. 101 (1994) 10 074; L. Gutman, A.K. Chakraborty, J. Chem. Phys. 103 (1995) 10 728; L. Gutman, A.K. Chakraborty, J. Chem. Phys. 104 (1996) 7306.
[20] T. Cosgrove, N.A. Finch, J.R.P. Webster, Macromolecules 23 (1990) 3353; J.S. Shaffer, Macromolecules 28 (1995) 7447.
[21] A.C. Balazs, M.C. Gempe, Z. Zhou, Macromolecules 24 (1991) 4918; D. Gersappe, A.C. Balazs, Phys. Rev. E 52 (1995) 5061.
[22] K. Sumithra, K.L. Sebastian, J. Phys. Chem. 98 (1994) 9312.
[23] S. Srebnik, A.K. Chakraborty, E.I. Shakhnovich, Phys. Rev. Lett. 77 (1996) 3157.
[24] D. Bratko, A.K. Chakraborty, E.I. Shakhnovich, Chem. Phys. Lett. 280 (1997) 46.
[25] D. Bratko, A.K. Chakraborty, E.I. Shakhnovich, Comp. Theor. Polym. Sci. 8 (1998) 113.
[26] A.K. Chakraborty, D. Bratko, J. Chem. Phys. 108 (1998) 1676.
[27] S. Srebnik, D. Bratko, A.K. Chakraborty, J. Chem. Phys.
[28] A.J. Golumbfskie, V.S. Pande, A.K. Chakraborty, Proc. Natl. Acad. Sci. U.S.A. (1999) submitted.
[29] M. Muthukumar, J. Chem. Phys. 103 (1995) 4723.
[30] C.Y. Kong, M. Muthukumar, J. Chem. Phys. 109 (1998) 1522.
[31] M. Muthukumar, personal communication, 1999.
[32] K.F. Lau, K.A. Dill, Macromolecules 22 (1989) 3986.
[33] A. Irback, C. Peterson, F. Potthast, Proc. Natl. Acad. Sci. U.S.A. 93 (1996) 9533.
[34] V.S. Pande, A.Yu. Grosberg, Proc. Natl. Acad. Sci. U.S.A. 91 (1994) 12 972.
[35] T.G. Dewey, Fractals in Molecular Biophysics, Oxford University Press, Oxford, 1997.
[36] G.G. Odian, Principles of Polymerization, Wiley, New York, 1991.
[37] M.E. Cates, R.C. Ball, J. Phys. (Paris) 49 (1988) 2009.
[38] J.P. Bouchaud, A. Georges, Phys. Rep. C 195 (1990) 127.
[39] D. Chandler, in: D. Levesque et al. (Eds.), Liquids, Freezing, and the Glass Transition, Proceedings of the Les Houches Summer School, Elsevier, New York, 1991.
[40] R.P. Feynman, F.L. Vernon Jr., Ann. Phys. (N.Y.) 24 (1963) 118.
[41] S.F. Edwards, P.W. Anderson, J. Phys. F 5 (1975) 965.
[42] K.H. Fischer, J.A. Hertz, Spin Glasses, Cambridge University Press, Cambridge, 1993.
[43] A.K. Chakraborty, E.I. Shakhnovich, J. Chem. Phys. 103 (1995) 10 751; D. Bratko, A.K. Chakraborty, E.I. Shakhnovich, Phys. Rev. Lett. 76 (1996) 1844.
[44] M. Mezard, G. Parisi, J. Phys. (Fr.) I 1 (1991) 809.
[45] C.A.J. Hoeve, E.A. diMarzio, P. Peyser, J. Chem. Phys. 42 (1965) 2558.
[46] S. Srebnik, Ph.D. Thesis, University of California, Berkeley, 1998.
[47] P.G. deGennes, Scaling Concepts in Polymer Physics, Cornell University Press, Ithaca, 1979.
[48] P.G. Higgs, H. Orland, J. Chem. Phys. 95 (1991) 4506.
[49] T. Garel, H. Orland, J. Phys. A 23 (1990) L621.
[50] K.F. Freed, Renormalization Group Theory of Macromolecules, Wiley, New York, 1987.
[51] S.A. Kauffman, The Origins of Order: Self Organization and Selection in Evolution, Oxford University Press, New York, 1993.
[52] P. Berndt, G.B. Fields, M. Tirrell, J. Am. Chem. Soc. 117 (1995) 9515.
[53] S.J. Starnick et al., J. Phys. Chem. 98 (1994) 7636.
[54] M.R. Leduc, W. Hayes, J.M.J. Frechet, J. Polym. Sci. A 36 (1998) 1; J.M.J. Frechet, personal communication, 1999.
[55] Y. Hue et al., J. Chem. Phys. 93 (1990) 822.
[56] A. Byrne et al., J. Chem. Phys. 102 (1995) 573; Y. Termonia, Macromolecules 30 (1997) 5367.
[57] M. Muthukumar, C.K. Ober, E.L. Thomas, Science 277 (1997) 1225.
[58] H. Ade, T. Russell, personal communication, 1999.
[59] N. Goeden, S.J. Muller, J.D. Keasling, personal communication, 1999.
[60] D.A. Tirrell, M.J. Fournier, T.L. Mason, Curr. Opin. Struct. Biol. 1 (1991) 638.
[61] X.S. Xie, Acc. Chem. Res. 29 (1996) 598; W.E. Moerner, M. Orrit, Science 283 (1999) 1670.
[62] E.A. Zheligovskaya, P.G. Khalatur, A.R. Khokhlov, Phys. Rev. E 59 (1999) 3071.
[63] E.I. Shakhnovich, A.M. Gutin, J. Phys. (Fr.) 50 (1989) 1843.
[64] A.V. Dobrynin, I.Y. Erukhimovich, J. Phys. I 5 (1995) 365; A.V. Dobrynin, J. Chem. Phys. 107 (1997) 9234.
[65] A. Nesarikar, M. Olivera de la Cruz, B. Crist, J. Chem. Phys. 98 (1993) 7385.
[66] G.H. Fredrickson, S.T. Milner, Phys. Rev. Lett. 67 (1991) 835.
[67] G.H. Fredrickson, S.T. Milner, L. Leibler, Macromolecules 25 (1992) 6341.
[68] L. Leibler, Macromolecules 13 (1980) 1602.
[69] F.S. Bates, G.H. Fredrickson, Annu. Rev. Phys. Chem. 41 (1990) 525.
[70] G.H. Fredrickson, F.S. Bates, Annu. Rev. Mater. Sci. 26 (1996) 501.
[71] K.M. Hong, J. Noolandi, Macromolecules 14 (1981) 727.
[72] G.H. Fredrickson, E. Helfand, J. Chem. Phys. 87 (1987) 697.
[73] K. Almdal, J.H. Rosedale, F.S. Bates, G.D. Wignall, G.H. Fredrickson, Phys. Rev. Lett. 65 (1990) 1112.
[74] M. Olivera de la Cruz, Phys. Rev. Lett. 67 (1991) 85.
[75] M. Xenidou, N. Hadjichristidis, Macromolecules 31 (1998) 5690.
[76] S. Qi, A.K. Chakraborty, H. Wang, A.A. Lefebvre, N.P. Balsara, E.I. Shakhnovich, M. Xenidou, N. Hadjichristidis, Phys. Rev. Lett. 82 (1999) 2896.
[77] P.J. Flory, Principles of Polymer Chemistry, Cornell University Press, Ithaca, 1971.
[78] A. Shinozaki, D. Jasnow, A.C. Balazs, Macromolecules 27 (1994) 2496.
[79] A. Werner, G.H. Fredrickson, J. Polym. Sci. B 35 (1997) 849.
[80] S. Qi, A.K. Chakraborty, H. Wang, A.A. Lefebvre, N.P. Balsara, E.I. Shakhnovich, (1999) in preparation.
[81] N.P. Balsara et al., Macromolecules 25 (1992) 6072; C.C. Lin et al., Macromolecules 27 (1994) 7769.
M.I. Eides et al. / Physics Reports 342 (2001) 63–261
THEORY OF LIGHT HYDROGENLIKE ATOMS
Michael I. EIDES, Howard GROTCH, Valery A. SHELYUTO Department of Physics and Astronomy, University of Kentucky, Lexington, KY 40506, USA; Petersburg Nuclear Physics Institute, Gatchina, St. Petersburg 188350, Russia; D.I. Mendeleev Institute of Metrology, St. Petersburg 198005, Russia
Physics Reports 342 (2001) 63–261
Received April 2000; editor: G.E. Brown
Contents
1. Introduction 66
2. Theoretical approaches to the energy levels of loosely bound systems 69
2.1. Nonrelativistic electron in the Coulomb field 69
2.2. Dirac electron in the Coulomb field 70
2.3. Bethe–Salpeter equation and the effective Dirac equation 72
3. General features of the hydrogen energy levels 77
3.1. Classification of corrections 77
3.2. Physical origin of the Lamb shift 78
3.3. Natural magnitudes of corrections to the Lamb shift 80
4. External field approximation 81
4.1. Leading relativistic corrections with exact mass dependence 81
4.2. Radiative corrections of order αⁿ(Zα)⁴m 84
4.3. Radiative corrections of order αⁿ(Zα)⁵m 93
4.4. Radiative corrections of order αⁿ(Zα)⁶m 105
4.5. Radiative corrections of order α(Zα)⁷m and of higher orders 119
5. Essentially two-particle recoil corrections 125
5.1. Recoil corrections of order (Zα)⁵(m/M)m 125
5.2. Recoil corrections of order (Zα)⁶(m/M)m 133
5.3. Recoil correction of order (Zα)⁷(m/M) 137
6. Radiative-recoil corrections 138
6.1. Corrections of order α(Zα)⁵(m/M)m 138
6.2. Corrections of order α(Zα)⁶(m/M)m 144
7. Nuclear size and structure corrections 145
7.1. Main proton size contribution 146
7.2. Nuclear size and structure corrections of order (Zα)⁵m 149
7.3. Nuclear size and structure corrections of order (Zα)⁶m 155
7.4. Radiative correction of order α(Zα)⁵⟨r_p²⟩m³ to the finite size effect 159
8. Weak interaction contribution 161
9. Lamb shift in light muonic atoms 161
9.1. Closed electron-loop contributions of order αⁿ(Zα)²m 163
9.2. Relativistic corrections to the leading polarization contribution with exact mass dependence 167
9.3. Higher-order electron-loop polarization contributions 169
9.4. Hadron loop contributions 175
9.5. Standard radiative, recoil and radiative-recoil corrections 177
9.6. Nuclear size and structure corrections 177
* Corresponding author. E-mail addresses: [email protected], [email protected] (M.I. Eides), [email protected] (H. Grotch), [email protected] (V.A. Shelyuto).
0370-1573/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0370-1573(00)00077-6
10. Physical origin of the hyperfine splitting and the main nonrelativistic contribution 182
11. External field approximation 184
11.1. Relativistic (binding) corrections to HFS 184
11.2. Electron anomalous magnetic moment contributions (corrections of order αⁿE_F) 186
11.3. Radiative corrections of order αⁿ(Zα)E_F 187
11.4. Radiative corrections of order αⁿ(Zα)²E_F 196
11.5. Radiative corrections of order α(Zα)³E_F and of higher orders 200
12. Essentially two-body corrections to HFS 204
12.1. Recoil corrections to HFS 204
12.2. Radiative-recoil corrections to HFS 208
13. Weak interaction contribution 219
14. Hyperfine splitting in hydrogen 219
14.1. Nuclear size, recoil and structure corrections of orders (Zα)E_F and (Zα)²E_F 220
14.2. Radiative corrections to nuclear size and recoil effects 227
14.3. Weak interaction contribution 229
15. Hyperfine splitting in muonic hydrogen 230
15.1. Hyperfine structure of the 2S state 230
15.2. Fine and hyperfine structure of the 2P states 232
16. Comparison of theory and experiment 234
16.1. Lamb shifts of the energy levels 235
16.2. Hyperfine splitting 248
16.3. Summary 252
Acknowledgements 253
References 253
Abstract The present status and recent developments in the theory of light hydrogenic atoms, electronic and muonic, are extensively reviewed. The discussion is based on the quantum field theoretical approach to loosely bound composite systems. The basics of the quantum field theoretical approach, which provide the framework needed for a systematic derivation of all higher-order corrections to the energy levels, are briefly discussed. The main physical ideas behind the derivation of all binding, recoil, radiative, radiative-recoil, and nonelectromagnetic spin-dependent and spin-independent corrections to energy levels of hydrogenic atoms are discussed and, wherever possible, the fundamental elements of the derivations of these corrections are provided. The emphasis is on new theoretical results which were not available in earlier reviews. An up-to-date set of all theoretical contributions to the energy levels is contained in the paper. The status of modern theory is tested by comparing the theoretical results for the energy levels with the most precise experimental results for the Lamb shifts and gross structure intervals in hydrogen, deuterium, and helium ion He⁺, and with the experimental data on the hyperfine splitting in muonium, hydrogen and deuterium. © 2001 Elsevier Science B.V. All rights reserved. PACS: 12.20.-m; 31.30.-r; 32.10.Fn; 31.15.Ar Keywords: Quantum electrodynamics; Bound states; Fine and hyperfine structure; Lamb shift; Radiative corrections; Recoil corrections; Radiative-recoil corrections
1. Introduction Light one-electron atoms are a classical subject of quantum physics. The very discovery and further progress of quantum mechanics are intimately connected to the explanation of the main features of hydrogen energy levels. Each step in the development of quantum physics led to a better understanding of bound state physics. The Bohr quantization rules of the old quantum theory were created in order to explain the existence of stable discrete energy levels. The nonrelativistic quantum mechanics of Heisenberg and Schrödinger provided a self-consistent scheme for the description of bound states. The relativistic spin-one-half Dirac equation quantitatively described the main experimental features of the hydrogen spectrum. The discovery of the Lamb shift [1], a subtle discrepancy between the predictions of the Dirac equation and the experimental data, triggered the development of modern relativistic quantum electrodynamics, and subsequently of the Standard Model of modern physics. Despite its long and rich history, the theory of atomic bound states is still very much alive today. New importance was given to bound state physics by the development of quantum chromodynamics, the modern theory of strong interactions. It was realized that all hadrons, once thought to be the elementary building blocks of matter, are themselves atom-like bound states of elementary quarks bound by the color forces. Hence, from a modern point of view, the theory of atomic bound states can be considered a theoretical laboratory and testing ground for exploring the subtle properties of bound state physics, free from the further complications connected with the nonperturbative effects of quantum chromodynamics, which play an especially important role in the case of light hadrons.
The quantum electrodynamics and quantum chromodynamics bound state theories are so intimately intertwined today that one often finds theoretical research where new results are obtained simultaneously, say, for positronium and heavy quarkonium. Another powerful stimulus for further development of bound state theory is the spectacular experimental progress in precise measurements of atomic energy levels. It suffices to mention that the relative uncertainty of the measurement of the frequency of the 1S–2S transition in hydrogen was reduced during the last decade by three orders of magnitude, from 3 × 10⁻¹⁰ [2] to 3.4 × 10⁻¹³ [3]. The relative uncertainty in the measurement of the muonium hyperfine splitting was recently reduced by a factor of 3, from 3.6 × 10⁻⁸ [4] to 1.2 × 10⁻⁸ [5]. This experimental development was matched in recent years by rapid theoretical progress, and we feel that now is a good time to review bound state theory. The theory of hydrogenic bound states is widely described in the literature. The basics of nonrelativistic theory are contained in any textbook on quantum mechanics, and the relativistic Dirac equation and the Lamb shift are discussed in any textbook on quantum electrodynamics and quantum field theory. An excellent source for the early results is the classic book by Bethe and Salpeter [6]. The last comprehensive review of the theory [7] was published more than ten years ago. A number of reviews containing new theoretical results have been published recently [8–15]. However, a coherent discussion of the modern status of the theory is, to the best of our knowledge, missing in the literature, and we will try to provide one in the current paper. Our goal here is to present a state-of-the-art discussion of the theory of the Lamb shift and hyperfine splitting in light hydrogenlike atoms. In the body of the paper the spin-independent corrections are discussed mainly as corrections to the hydrogen energy levels (see Fig. 1), and the
Fig. 1. Hydrogen energy levels.
theory of hyperfine splitting is discussed in the context of the hyperfine splitting in the ground state of muonium (see Fig. 2). These two simple atomic systems are singled out for practical reasons: highly precise experimental data exist in both cases, and the most accurate theoretical results are also obtained for these cases. However, almost all formulae in this review are also valid for other light hydrogenlike systems, and some of these other applications, including muonic atoms, will be discussed in the text as well. We will present all theoretical results in the field, with emphasis on more recent results which either were not discussed in sufficient detail in the previous theoretical reviews [6,7], or simply did not exist when those reviews were written. Our emphasis on the theory means that, besides presenting an exhaustive compendium of theoretical results, we will also try to present a qualitative discussion of the origin and magnitude of different corrections to the energy levels, to give, when possible, semiquantitative estimates of expected magnitudes, and to describe the main steps of the theoretical calculations and the new effective methods which were developed in recent years. We will not attempt to present a detailed comparison of theory with the latest experimental results, leaving this task to the experimentalists. We will use the experimental results only for illustrative purposes. The paper consists of three main parts. In the introductory part we briefly remind the reader of the main characteristic features of bound state physics. Then follows a detailed discussion of the
Fig. 2. Muonium energy levels.
corrections to the energy levels which do not depend on the nuclear spin. The last third of the paper is devoted to a systematic discussion of the physics of hyperfine splitting. Different corrections to the energy levels are ordered with respect to the natural small parameters $\alpha$, $Z\alpha$, $m/M$ and nonelectrodynamic parameters like the ratio of the nucleon size to the radius of the first Bohr orbit. These parameters have a transparent physical nature in light hydrogenlike atoms. Powers of $\alpha$ describe the order of quantum electrodynamic corrections to the energy levels, the parameter $Z\alpha$ describes the order of relativistic corrections, and the small mass ratio of the light and heavy particles is responsible for the recoil effects beyond the reduced mass factor present in a relativistic bound state. Corrections which depend both on the quantum electrodynamic parameter $\alpha$ and on the relativistic parameter $Z\alpha$ are ordered in a series over $\alpha$ at a fixed power of $Z\alpha$, contrary to the common practice accepted in the physics of highly charged ions with large $Z$. This ordering is more natural from the point of view of nonrelativistic bound state physics, since all radiative corrections to a contribution of a definite order in the nonrelativistic expansion originate from the same distances and describe the same physics, while radiative corrections of the same order in $\alpha$ to different terms of the nonrelativistic expansion over $Z\alpha$ are generated at vastly different distances and could have drastically different magnitudes. A few remarks about our notation. All formulae below are written for the energy shifts. However, spectroscopic experiments measure frequencies, not energies. The formulae for the energy shifts are converted to the respective expressions for the frequencies with the help of the
We will return to a more detailed discussion of the role of different small parameters below.
Planck relation $E = h\nu$. We will ignore the difference between energy and frequency units in our theoretical discussion. Comparison of the theoretical expressions with the experimental data will always be done in frequency units, since conversion to energy units leads to a loss of accuracy. All the numerous contributions to the energy levels in different sections of this paper are generically called $\Delta E$ and as a rule do not carry any specific labels, but it is understood that they are all different. Let us briefly mention some of the closely related subjects which are not considered in this review. The physics of high-$Z$ ions is nowadays a vast and well developed field of research, with its own problems, approaches and tools, which in many respects are quite different from the physics of low-$Z$ systems. We discuss below the numerical results obtained in high-$Z$ calculations only when they have direct relevance for low-$Z$ atoms. The reader can find a detailed discussion of high-$Z$ physics in a number of recent reviews (see, e.g., [16]). To keep this review to a reasonable size we decided to omit discussion of positronium, even though many theoretical expressions below are written in such a form that for the case of equal masses they turn into the respective corrections to the positronium energy levels. Positronium is qualitatively different from hydrogen and muonium not only due to the equality of the masses of its constituents, but also because, unlike the other light atoms, there exists a whole new class of corrections to the positronium energy levels generated by the annihilation channel, which is absent in the other cases. Our discussion of the new theoretical methods will be incomplete due to the omission of the recently developed and now popular nonrelativistic QED (NRQED) [17], which was especially useful in positronium calculations but was rarely used in hydrogen and muonium physics.
Very lucid presentations of NRQED exist in the recent literature (see, e.g., [18]).
2. Theoretical approaches to the energy levels of loosely bound systems

2.1. Nonrelativistic electron in the Coulomb field

In the first approximation, the energy levels of one-electron atoms are described by the solutions of the Schrödinger equation for an electron in the field of an infinitely heavy Coulomb center with charge $Z$ in units of the proton charge
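$$\left(-\frac{\Delta}{2m} - \frac{Z\alpha}{r}\right)\psi(\mathbf r) = E_n\,\psi(\mathbf r),$$

$$\psi_{nlm}(\mathbf r) = R_{nl}(r)\,Y_{lm}\!\left(\frac{\mathbf r}{r}\right),$$

$$E_n = -\frac{m(Z\alpha)^2}{2n^2}, \qquad n = 1, 2, 3, \ldots, \tag{1}$$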
where $n$ is called the principal quantum number. Besides the principal quantum number $n$, each state is described by the value of the orbital angular momentum $l = 0, 1, \ldots, n-1$ and by the projection of the orbital angular momentum $m = 0, \pm 1, \ldots, \pm l$. In the nonrelativistic Coulomb problem all states with different orbital angular momenta but the same principal quantum number $n$ have the same energy, and the energy levels of the Schrödinger equation in the Coulomb field are $n$-fold degenerate with respect to the orbital angular momentum quantum number $l$. As in any spherically symmetric problem, the energy levels in the Coulomb field do not depend on the projection of the orbital angular momentum on an arbitrary axis, and each energy level with given $l$ is additionally $(2l+1)$-fold degenerate.

We are using the system of units where $\hbar = c = 1$.

Straightforward calculation of the characteristic values of the velocity, Coulomb potential and kinetic energy in the stationary states gives
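$$\langle n|v^2|n\rangle = \left\langle n\left|\frac{\mathbf p^2}{m^2}\right|n\right\rangle = \frac{(Z\alpha)^2}{n^2}, \qquad \left\langle n\left|\frac{Z\alpha}{r}\right|n\right\rangle = \frac{m(Z\alpha)^2}{n^2}, \qquad \left\langle n\left|\frac{\mathbf p^2}{2m}\right|n\right\rangle = \frac{m(Z\alpha)^2}{2n^2}. \tag{2}$$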
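To make Eqs. (1) and (2) concrete, the following minimal Python sketch evaluates them for hydrogen. It assumes an infinitely heavy nucleus, $Z = 1$ by default, and CODATA 2018 values of $\alpha$ and $mc^2$ (these numerical inputs are not taken from the text). It checks the virial relation between the kinetic and potential terms of Eq. (2) and prints the three characteristic scales $m(Z\alpha)^2$, $mZ\alpha$ and $m$.

```python
import math

# CODATA 2018 values (assumed here for illustration)
ALPHA = 7.2973525693e-3     # fine-structure constant
MEC2_EV = 0.51099895000e6   # electron rest energy m c^2 in eV

def level_ev(n: int, z: int = 1) -> float:
    """Eq. (1): E_n = -m (Z alpha)^2 / (2 n^2), infinitely heavy nucleus, in eV."""
    return -MEC2_EV * (z * ALPHA) ** 2 / (2 * n ** 2)

def characteristic_values(n: int, z: int = 1) -> dict:
    """Eq. (2) in units hbar = c = 1; energies converted to eV."""
    za = z * ALPHA
    return {
        "v2": za ** 2 / n ** 2,                          # <v^2> = <p^2/m^2>
        "coulomb_ev": MEC2_EV * za ** 2 / n ** 2,        # <Z alpha / r>
        "kinetic_ev": MEC2_EV * za ** 2 / (2 * n ** 2),  # <p^2 / 2m>
    }

for n in (1, 2, 3):
    c = characteristic_values(n)
    # Virial theorem: E_n = <p^2/2m> - <Z alpha / r>
    assert math.isclose(level_ev(n), c["kinetic_ev"] - c["coulomb_ev"])
    print(f"n={n}: E_n = {level_ev(n):.4f} eV, v_rms = {math.sqrt(c['v2']):.6f}")

# The three scales of hydrogen (Z = 1): binding ~ m alpha^2, momentum ~ m alpha, mass m
scales_ev = (MEC2_EV * ALPHA ** 2, MEC2_EV * ALPHA, MEC2_EV)
print("scales (eV):", scales_ev)
```

Each scale is smaller than the next one by a factor $Z\alpha$, which is what makes the nonrelativistic treatment a good starting point.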
We see that due to the smallness of the fine structure constant $\alpha$, a one-electron atom is a loosely bound nonrelativistic system, and all relativistic effects may be treated as perturbations. There are three characteristic scales in the atom. The smallest is determined by the binding energy $\sim m(Z\alpha)^2$, the next is determined by the characteristic electron momenta $\sim mZ\alpha$, and the last one is of the order of the electron mass $m$.

Even in the framework of nonrelativistic quantum mechanics one can achieve a much better description of the hydrogen spectrum by taking into account the finite mass of the Coulomb center. Due to the nonrelativistic nature of the bound system under consideration, the finiteness of the nuclear mass leads to the substitution of the reduced mass for the electron mass in the formulae above. The finiteness of the nuclear mass also introduces the largest energy scale in the bound state problem: the heavy particle mass.

2.2. Dirac electron in the Coulomb field

The relativistic dependence of the energy of a free classical particle on its momentum is described by the relativistic square root

$$\sqrt{\mathbf p^2 + m^2} \approx m + \frac{\mathbf p^2}{2m} - \frac{\mathbf p^4}{8m^3} + \cdots. \tag{3}$$
The kinetic energy operator in the Schrödinger equation corresponds to the quadratic term in this nonrelativistic expansion, and thus the Schrödinger equation describes only the leading nonrelativistic approximation to the hydrogen energy levels.

We are interested in low-$Z$ atoms in this paper. High-$Z$ atoms cannot be treated as nonrelativistic systems, since for them an expansion in $Z\alpha$ is problematic.
The classical nonrelativistic expansion goes over $\mathbf p^2/m^2$. In the case of the loosely bound electron, the expansion in $\mathbf p^2/m^2$ corresponds to an expansion in $(Z\alpha)^2$; hence, relativistic corrections are given by an expansion over even powers of $Z\alpha$. As we have seen above from the explicit expressions for the energy levels in the Coulomb field, the same parameter $Z\alpha$ also characterizes the binding energy. For this reason, the parameter $Z\alpha$ is also often called the binding parameter, and the relativistic corrections carry the second name of binding corrections. Note that the series expansion for the relativistic corrections in the bound state problem goes literally over the binding parameter $Z\alpha$, unlike the case of the scattering problem in QED, where the expansion parameter always contains an additional factor $\pi$ in the denominator and the expansion typically goes over $\alpha/\pi$. This absence of the extra factor $\pi$ in the denominator of the expansion parameter is a typical feature of the Coulomb problem. As we will see below, in the combined expansions over $\alpha$ and $Z\alpha$, the expansion over $\alpha$ at a fixed power of the binding parameter $Z\alpha$ always goes over $\alpha/\pi$, as in the case of scattering. Loosely speaking, one could call successive terms in the series over $Z\alpha$ the relativistic corrections, and successive terms in the expansion over $\alpha/\pi$ the loop or radiative corrections. For the bound electron, calculation of the relativistic corrections should also take into account the contributions due to its spin one half. The spin does not change the fundamental fact that all relativistic (binding) corrections are described by an expansion in even powers of $Z\alpha$, as in the naive expansion of the classical relativistic square root in Eq. (3). Only the coefficients in this expansion change due to the presence of spin. A proper description of all relativistic corrections to the energy levels is given by the Dirac equation with a Coulomb source.
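The claim that the expansion in $\mathbf p^2/m^2$ is effectively an expansion in $(Z\alpha)^2$ can be checked with a few lines of Python. The sketch below is a classical, spinless illustration only: it assumes the typical ground-state momentum $p = mZ\alpha$ (in units $m = 1$) and compares truncations of Eq. (3) with the exact square root.

```python
import math

ALPHA = 7.2973525693e-3  # fine-structure constant (CODATA 2018, assumed)

def kinetic_exact(p: float, m: float = 1.0) -> float:
    """sqrt(p^2 + m^2) - m, written in a form that avoids cancellation for p << m."""
    return p * p / (math.sqrt(p * p + m * m) + m)

def kinetic_series(p: float, m: float = 1.0, terms: int = 2) -> float:
    """First `terms` terms of Eq. (3) after the rest mass: p^2/2m - p^4/8m^3 + p^6/16m^5."""
    series = [p ** 2 / (2 * m), -p ** 4 / (8 * m ** 3), p ** 6 / (16 * m ** 5)]
    return sum(series[:terms])

def relative_errors(z: int) -> tuple:
    """Relative truncation errors at the typical momentum p = m Z alpha (n = 1, m = 1)."""
    p = z * ALPHA
    exact = kinetic_exact(p)
    return tuple(abs(kinetic_series(p, terms=k) - exact) / exact for k in (1, 2))

for z in (1, 10, 50):
    err1, err2 = relative_errors(z)
    # err1 ~ (Z alpha)^2 / 4, err2 ~ (Z alpha)^4 / 8: each extra term gains (Z alpha)^2
    print(f"Z={z:2d}: after p^2 term {err1:.2e}, after p^4 term {err2:.2e}")
```

Each additional term reduces the relative error by roughly a factor $(Z\alpha)^2$, so for hydrogen the series converges very fast, while for large $Z$ it does not.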
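All relativistic corrections may easily be obtained from the exact solution of the Dirac equation in the external Coulomb field (see, e.g., [19,20])

$$E_{nj} = m f(n,j), \tag{4}$$

where

$$f(n,j) = \left[1 + \frac{(Z\alpha)^2}{\left(\sqrt{(j+\frac12)^2 - (Z\alpha)^2} + n - j - \frac12\right)^2}\right]^{-1/2}$$

$$\approx 1 - \frac{(Z\alpha)^2}{2n^2} - \frac{(Z\alpha)^4}{2n^3}\left(\frac{1}{j+\frac12} - \frac{3}{4n}\right) - \frac{(Z\alpha)^6}{8n^3}\left[\frac{1}{(j+\frac12)^3} + \frac{3}{n(j+\frac12)^2} + \frac{5}{2n^3} - \frac{6}{n^2(j+\frac12)}\right] + \cdots, \tag{5}$$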
and $j = 1/2, 3/2, \ldots, n - 1/2$ is the total angular momentum of the state. In the Dirac spectrum, energy levels with the same principal quantum number $n$ but different total angular momenta $j$ are split into $n$ components of the fine structure, unlike the nonrelativistic Schrödinger spectrum, where all levels with the same $n$ are degenerate. However, not all degeneracy is lifted in the spectrum of the Dirac equation: the energy levels corresponding to the same $n$ and $j$ but different $l = j \pm 1/2$ remain doubly degenerate. This degeneracy is lifted by the corrections connected with the finite size of the Coulomb source, by recoil contributions, and by the dominant QED loop contributions. The respective energy shifts are called the Lamb shifts
(see the exact definition in Section 4.1) and will be one of the main subjects of discussion below. We would like to emphasize that the quantum mechanical (recoil and finite nuclear size) effects alone do not predict anything of the scale of the experimentally observed Lamb shift, which is thus essentially a quantum electrodynamic (field-theoretical) effect.

One trivial improvement of the Dirac formula for the energy levels may easily be achieved if we take into account that, as was already discussed above, the electron motion in the Coulomb field is essentially nonrelativistic, and, hence, all contributions to the binding energy should contain as a factor the reduced mass $m_r$ of the electron-nucleus nonrelativistic system rather than the electron mass. Below we will consider the expression with the reduced mass factor

$$E_{nj} = m + m_r\left[f(n,j) - 1\right], \tag{6}$$

rather than the naive expression in Eq. (4), as a starting point for the calculation of corrections to the electron energy levels. In order to provide a solid starting point for further calculations, the Dirac spectrum with the reduced mass dependence in Eq. (6) should itself be derived from QED (see Section 4.1), and not simply postulated on physical grounds as is done here.

2.3. Bethe–Salpeter equation and the effective Dirac equation

Quantum field theory provides an unambiguous way to find the energy levels of any composite system. They are determined by the positions of the poles of the respective Green functions. This idea was first realized in the form of the Bethe–Salpeter (BS) equation for the two-particle Green function (see Fig. 3) [21]

$$\tilde G = S_{BS} + S_{BS} K_{BS}\,\tilde G, \tag{7}$$

where $S_{BS}$ is the free two-particle Green function, the kernel $K_{BS}$ is the sum of all two-particle irreducible diagrams in Fig. 4, and $\tilde G$ is the total two-particle Green function. At first glance the field-theoretical BS equation has nothing in common with the quantum mechanical Schrödinger and Dirac equations discussed above.
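The Dirac spectrum of Eq. (5), with the reduced-mass prescription of Eq. (6), is the benchmark that any bound state formalism must reproduce at leading order, and it is easy to check numerically. The sketch below rests on stated assumptions: CODATA 2018 values of $\alpha$, $mc^2/h$ and $m_p/m_e$, hydrogen ($Z = 1$) for the reduced mass, and energies quoted in frequency units as in the text. It compares the exact $f(n,j)$ with its $(Z\alpha)^6$ expansion and evaluates the Dirac value of the $2P_{3/2}$–$2P_{1/2}$ interval.

```python
import math

# CODATA 2018 constants (assumed for the illustration)
ALPHA = 7.2973525693e-3
MEC2_HZ = 0.51099895000e6 / 4.135667696e-15   # m c^2 / h in Hz
MP_OVER_ME = 1836.15267343                     # proton-to-electron mass ratio

def dirac_f(n: int, j: float, za: float) -> float:
    """Exact Dirac-Coulomb factor f(n, j) of Eq. (5), so that E_nj = m f(n, j)."""
    k = j + 0.5
    denom = math.sqrt(k * k - za * za) + n - k    # n - j - 1/2 = n - k
    return (1.0 + (za / denom) ** 2) ** -0.5

def dirac_f_series(n: int, j: float, za: float) -> float:
    """Expansion of Eq. (5) through (Z alpha)^6."""
    k = j + 0.5
    x = za * za
    return (1.0
            - x / (2 * n ** 2)
            - x ** 2 / (2 * n ** 3) * (1 / k - 3 / (4 * n))
            - x ** 3 / (8 * n ** 3) * (1 / k ** 3 + 3 / (n * k ** 2)
                                       + 5 / (2 * n ** 3) - 6 / (n ** 2 * k)))

def level_hz(n: int, j: float, z: int = 1, reduced: bool = True) -> float:
    """Eq. (6) without the rest mass: m_r [f(n, j) - 1], in Hz (hydrogen reduced mass)."""
    mr = MEC2_HZ / (1 + 1 / MP_OVER_ME) if reduced else MEC2_HZ
    return mr * (dirac_f(n, j, z * ALPHA) - 1.0)

# Series versus exact formula: the difference is of order (Z alpha)^8
for z in (1, 20, 60):
    za = z * ALPHA
    print(z, abs(dirac_f(2, 0.5, za) - dirac_f_series(2, 0.5, za)))

# 2P_{3/2} - 2P_{1/2} fine-structure interval from the Dirac formula with reduced mass
fs_mhz = (level_hz(2, 1.5) - level_hz(2, 0.5)) / 1e6
print(f"Dirac fine structure, n = 2: {fs_mhz:.1f} MHz")
```

The interval comes out near 10.94 GHz; the measured $n = 2$ fine-structure interval differs slightly because of the QED and recoil corrections discussed later in the text.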
However, it is not too difficult to demonstrate that, with the selection of a certain subset of interaction kernels (ladder and crossed ladder), followed by some natural approximations, the BS eigenvalue equation reduces in the leading approximation, in the case of one light and one heavy constituent, to the Schrödinger or Dirac eigenvalue equations for a light particle in the field of a heavy Coulomb center. The basics of the BS equation are described in many textbooks (see, e.g., [20,22,23]), and many important results were obtained in the BS framework. However, calculations beyond the leading order in the original BS framework tend to be rather complicated and nontransparent. The reasons for these complications can be traced to the dependence of the BS wave function on the unphysical relative energy (or relative time), the absence of
Fig. 3. Bethe–Salpeter equation.
Fig. 4. Kernel of the Bethe–Salpeter equation.
an exact solution in the zero-order approximation, the nonreducibility of the ladder approximation to the Dirac equation when the mass of the heavy particle goes to infinity, etc. These difficulties are generated not only by the nonpotential nature of the bound state problem in quantum field theory, but also by the unphysical classification of diagrams with the help of the notion of two-body reducibility. As was known from the very beginning [21], there is a tendency toward cancellation between the contributions of the ladder graphs and the graphs with crossed photons. However, in the original BS framework these graphs are treated in profoundly different ways. It is quite natural, therefore, to seek a modification of the BS equation in which the crossed and ladder graphs play a more symmetrical role. One would also like to get rid of the other drawbacks of the original BS formulation while preserving its rigorous field-theoretical content. The BS equation allows a wide range of modifications, since one can freely modify both the zero-order propagation function and the leading-order kernel, as long as these modifications are consistently taken into account in the rules for construction of the higher-order approximations, the latter being consistent with Eq. (7) for the two-particle Green function. A number of variants of the original BS equation have been developed since its discovery (see, e.g., [24–28]). The guiding principle in almost all these approaches was to restructure the BS equation so that it acquires a three-dimensional form, a soluble and physically natural leading-order approximation in the form of the Schrödinger or Dirac equations, and a more or less transparent and regular way of selecting the kernels relevant for the calculation of corrections of any required order. We will describe, in some detail, one such modification, an effective Dirac equation (EDE), which was derived in a number of papers [25–28].
This new equation is more convenient than the original BS equation in many applications, and we will derive some general formulae connected with it. The physical idea behind this approach is that in a loosely bound system of two particles of different masses, the heavy particle spends almost all its life not far from its own mass shell. In such a case some kind of Dirac equation for the light particle in an external Coulomb field should be an excellent starting point for the perturbation theory expansion. Then it is convenient to choose the free two-particle propagator in the form of the product of the heavy particle mass shell projector and the free electron propagator

$$\tilde S(p,l,E) = 2\pi i\,\delta_+(p^2-M^2)\,\frac{\hat p + M}{\hat E - \hat p - m}\,(2\pi)^4\delta^4(p-l), \tag{8}$$
where $p$ and $l$ are the momenta of the incoming and outgoing heavy particle, $E - p$ is the momentum of the incoming electron ($E = (E, \mathbf 0)$; this is the choice of the reference frame), and the $\gamma$-matrices associated with the light and heavy particles act only on the indices of the respective particle. The free propagator in Eq. (8) determines the other building blocks and the form of a two-body equation equivalent to the BS equation; the regular perturbation theory formulae in this case were obtained in [27,28].
Fig. 5. Series for the kernel of the effective Dirac equation.
In order to derive these formulae, let us first write the BS equation in Eq. (7) in an explicit form
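$$\tilde G(p,l,E) = S_{BS}(p,l,E) + \int\frac{d^4k}{(2\pi)^4}\,\frac{d^4q}{(2\pi)^4}\,S_{BS}(p,k,E)\,K_{BS}(k,q,E)\,\tilde G(q,l,E), \tag{9}$$

where

$$S_{BS}(p,l,E) = \frac{i}{\hat p - M}\,\frac{i}{\hat E - \hat p - m}\,(2\pi)^4\delta^4(p-l). \tag{10}$$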
The amputated two-particle Green function $G_T$ satisfies the equation

$$G_T = K_{BS} + K_{BS}\,S_{BS}\,G_T. \tag{11}$$

A new kernel corresponding to the free two-particle propagator in Eq. (8) may be defined via this amputated two-particle Green function:

$$G_T = K + K\,\tilde S\,G_T. \tag{12}$$

Comparing Eqs. (11) and (12), one easily obtains the diagrammatic series for the new kernel $K$ (see Fig. 5)
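$$K(q,l,E) = \left[I - K_{BS}\left(S_{BS} - \tilde S\right)\right]^{-1}K_{BS} = K_{BS}(q,l,E) + \int\frac{d^4r}{(2\pi)^4}\,K_{BS}(q,r,E)\left[\frac{i}{\hat r - M}\,\frac{i}{\hat E - \hat r - m} - 2\pi i\,\delta_+(r^2-M^2)\,\frac{\hat r + M}{\hat E - \hat r - m}\right]K_{BS}(r,l,E) + \cdots. \tag{13}$$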
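The resummation in Eq. (13) is purely algebraic: it holds for any operators for which the inverses exist. In the minimal sketch below, hypothetical $2\times2$ matrices with exact rational entries stand in for the BS kernel, the free BS propagator and the new mass-shell propagator of Eq. (8) (the names `K_BS`, `S_BS`, `S_NEW` are labels chosen for this illustration). It confirms that the new kernel reproduces exactly the same amputated Green function as Eq. (11), while truncating the series of Fig. 5 after two terms does not.

```python
from fractions import Fraction as Fr

# Toy stand-ins for the integral operators of Eqs. (11)-(13); the entries are
# hypothetical values chosen only so that the matrices do not commute.
I2 = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def inv(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def amputated(kernel, prop):
    """Solve G_T = K + K S G_T exactly: G_T = (I - K S)^{-1} K, as in Eqs. (11) and (12)."""
    return mul(inv(sub(I2, mul(kernel, prop))), kernel)

def new_kernel(k_bs, s_bs, s_new):
    """Eq. (13): K = [I - K_BS (S_BS - S_new)]^{-1} K_BS."""
    return mul(inv(sub(I2, mul(k_bs, sub(s_bs, s_new)))), k_bs)

def m2(a, b, c, d):
    return [[Fr(a), Fr(b)], [Fr(c), Fr(d)]]

K_BS = m2("1/5", "1/7", "1/3", "1/11")   # toy BS kernel
S_BS = m2("1/2", "1/9", "1/4", "1/3")    # toy free BS propagator
S_NEW = m2("1/3", "0", "1/6", "1/5")     # toy mass-shell propagator of Eq. (8)

K = new_kernel(K_BS, S_BS, S_NEW)
# Both formulations give exactly the same amputated Green function (exact arithmetic)
print(amputated(K_BS, S_BS) == amputated(K, S_NEW))

# The first two terms of the series in Fig. 5 are only an approximation to K
correction = mul(mul(K_BS, sub(S_BS, S_NEW)), K_BS)
K_two_terms = [[K_BS[i][j] + correction[i][j] for j in range(2)] for i in range(2)]
print(float(K[0][0]), float(K_two_terms[0][0]), float(K_BS[0][0]))
```

Exact rational arithmetic is used so that the equality test is a genuine identity check rather than a floating-point comparison.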
The new bound state equation is constructed for the two-particle Green function defined by the relationship

$$G = \tilde S + \tilde S\,G_T\,\tilde S. \tag{14}$$

The two-particle Green function $G$ has the same poles as the initial Green function $\tilde G$ and satisfies the BS-like equation

$$G = \tilde S + \tilde S\,K\,G, \tag{15}$$
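or, explicitly,

$$G(p,l,E) = 2\pi i\,\delta_+(p^2-M^2)\,\frac{\hat p + M}{\hat E - \hat p - m}\,(2\pi)^4\delta^4(p-l) + 2\pi i\,\delta_+(p^2-M^2)\,\frac{\hat p + M}{\hat E - \hat p - m}\int\frac{d^4q}{(2\pi)^4}\,K(p,q,E)\,G(q,l,E). \tag{16}$$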
This last equation is completely equivalent to the original BS equation, and may be easily written in a three-dimensional form
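$$\mathcal G(p,l,E) = \frac{\hat p + M}{\hat E - \hat p - m}\left[(2\pi)^3\delta^3(\mathbf p - \mathbf l) + \int\frac{d^3q}{(2\pi)^3\,2E_q}\,iK(p,q,E)\,\mathcal G(q,l,E)\right], \tag{17}$$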
where all four-momenta are on the mass shell, $p^2 = l^2 = q^2 = M^2$, $E_q = \sqrt{\mathbf q^2 + M^2}$, and the three-dimensional two-particle Green function $\mathcal G$ is defined as follows:

$$G(p,l,E) = 2\pi i\,\delta_+(p^2-M^2)\,\mathcal G(p,l,E)\,2\pi i\,\delta_+(l^2-M^2). \tag{18}$$
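The passage from the four-dimensional Eq. (16) to the three-dimensional Eqs. (17) and (18) rests on a standard property of the positive-energy mass-shell delta function. A sketch of that step, assuming the usual convention $\delta_+(q^2-M^2) \equiv \theta(q^0)\,\delta(q^2-M^2)$ and writing $F(q)$ for the rest of a generic integrand:

$$\delta_+(q^2-M^2) = \theta(q^0)\,\delta\!\left((q^0)^2 - E_q^2\right) = \frac{\delta(q^0 - E_q)}{2E_q}, \qquad E_q = \sqrt{\mathbf q^2 + M^2},$$

$$\int\frac{d^4q}{(2\pi)^4}\,F(q)\,2\pi i\,\delta_+(q^2-M^2) = i\int\frac{d^3q}{(2\pi)^3\,2E_q}\,F(q)\Big|_{q^0 = E_q}.$$

Every heavy-particle line placed on its mass shell by the projector in Eq. (8) therefore contributes the invariant measure $d^3q/[(2\pi)^3\,2E_q]$ together with a factor $i$, which is the structure of the integral term in Eq. (17).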
Taking the residue at the bound state pole with energy $E_n$, we obtain a homogeneous equation

$$(\hat E_n - \hat p - m)\,\Phi(p,E_n) = (\hat p + M)\int\frac{d^3q}{(2\pi)^3\,2E_q}\,iK(p,q,E_n)\,\Phi(q,E_n). \tag{19}$$

Due to the presence of the heavy particle mass shell projector on the right-hand side, the wave function in Eq. (19) satisfies a free Dirac equation with respect to the heavy particle indices:
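$$(\hat p - M)\,\Phi(p,E_n) = 0. \tag{20}$$

Then one can extract a free heavy particle spinor from the wave function in Eq. (19)

$$\Phi(p,E_n) = \sqrt{2E_p}\,U(p)\,\psi(p,E_n), \tag{21}$$

where

$$U(p) = \begin{pmatrix} \sqrt{E_p + M}\; I \\[4pt] \sqrt{E_p - M}\;\dfrac{\mathbf p\cdot\boldsymbol\sigma}{|\mathbf p|} \end{pmatrix}. \tag{22}$$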
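The heavy-particle spinor of Eq. (22) can be checked directly. The sketch below assumes the Dirac representation of the $\gamma$-matrices, the metric $(+,-,-,-)$ (conventions not spelled out in this section) and an arbitrary nonzero test momentum. It verifies that the spinor solves the free Dirac equation of Eq. (20) and that $\bar U(p)U(p) = 2M$ times the unit $2\times2$ matrix.

```python
import math

# Pauli matrices sigma^1, sigma^2, sigma^3
PAULI = (
    ((0, 1), (1, 0)),
    ((0, -1j), (1j, 0)),
    ((1, 0), (0, -1)),
)

def sigma_dot(p):
    """2x2 matrix p . sigma for a real three-vector p."""
    return [[sum(p[i] * PAULI[i][r][c] for i in range(3)) for c in range(2)] for r in range(2)]

def heavy_spinor(p, mass):
    """4x2 matrix U(p) of Eq. (22) in the Dirac representation (requires p != 0)."""
    pn = math.sqrt(sum(x * x for x in p))
    e = math.sqrt(pn * pn + mass * mass)
    sp = sigma_dot(p)
    upper = [[math.sqrt(e + mass) if r == c else 0.0 for c in range(2)] for r in range(2)]
    lower = [[math.sqrt(e - mass) * sp[r][c] / pn for c in range(2)] for r in range(2)]
    return upper + lower, e

def pslash_minus_mass(p, e, mass):
    """4x4 matrix gamma^0 E - gamma.p - M = [[(E-M) I, -p.sigma], [p.sigma, -(E+M) I]]."""
    sp = sigma_dot(p)
    out = [[0j] * 4 for _ in range(4)]
    for r in range(2):
        out[r][r] = e - mass
        out[r + 2][r + 2] = -(e + mass)
        for c in range(2):
            out[r][c + 2] = -sp[r][c]
            out[r + 2][c] = sp[r][c]
    return out

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def ubar_u(u):
    """U-bar U = U^dagger gamma^0 U, with gamma^0 = diag(1, 1, -1, -1)."""
    g0 = (1, 1, -1, -1)
    return [[sum(u[k][i].conjugate() * g0[k] * u[k][j] for k in range(4)) for j in range(2)]
            for i in range(2)]

p, M = (0.3, -0.4, 1.2), 1.0          # arbitrary on-shell test momentum, units of M
U, E = heavy_spinor(p, M)
residual = matmul(pslash_minus_mass(p, E, M), U)
print(max(abs(x) for row in residual for x in row))   # zero up to rounding
print(ubar_u(U))                                      # 2M times the unit matrix, up to rounding
```

The lower block uses $\sqrt{E_p - M}\,\mathbf p\cdot\boldsymbol\sigma/|\mathbf p|$, which equals the more familiar $\mathbf p\cdot\boldsymbol\sigma/\sqrt{E_p + M}$ because $|\mathbf p|^2 = (E_p - M)(E_p + M)$.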
Finally, the eight-component wave function $\psi(p,E_n)$ (four ordinary electron spinor indices, and two extra indices corresponding to the two-component spinor of the heavy particle) satisfies the effective Dirac equation (see Fig. 6)
$$(\hat E_n - \hat p - m)\,\psi(p,E_n) = \int\frac{d^3q}{(2\pi)^3\,2E_q}\,i\tilde K(p,q,E_n)\,\psi(q,E_n), \tag{23}$$

where

$$\tilde K(p,q,E_n) = \frac{\bar U(p)\,K(p,q,E_n)\,U(q)}{\sqrt{4E_pE_q}}, \tag{24}$$

Fig. 6. Effective Dirac equation.
Fig. 7. Effective Dirac equation in the external Coulomb field.
and $k = (E_n - p^0, -\mathbf p)$ is the electron momentum; the crosses on the heavy line in Fig. 6 mean that the heavy particle is on its mass shell. The inhomogeneous equation, Eq. (17), also fixes the normalization of the wave function. Even though the total kernel in Eq. (23) is unambiguously defined, we still have the freedom to choose the zero-order kernel $K_0$ at our convenience, in order to obtain a solvable lowest-order approximation. It is not difficult to obtain a regular perturbation theory series for the corrections to the zero-order approximation corresponding to the difference between the zero-order kernel $K_0$ and the exact kernel $K_0 + \delta K$:

$$E_n = E_n^0 + \langle n|i\delta K(E_n^0)|n\rangle\left(1 + \langle n|i\delta K'(E_n^0)|n\rangle\right) + \langle n|i\delta K(E_n^0)\,\tilde G_0(E_n^0)\,i\delta K(E_n^0)|n\rangle\left(1 + \langle n|i\delta K'(E_n^0)|n\rangle\right) + \cdots, \tag{25}$$
where the summation over intermediate states goes with the weight $d^3p/[(2\pi)^3\,2E_p]$ and is realized with the help of the subtracted free Green function of the EDE with the kernel $K_0$,

$$\tilde G_0(E) = G_0(E) - \frac{|n\rangle\langle n|}{E - E_n^0}, \tag{26}$$
conjugation is understood in the Dirac sense, and $\delta K'(E_n^0) \equiv \partial\,\delta K(E)/\partial E\,\big|_{E=E_n^0}$. The only apparent difference of the EDE, Eq. (23), from the regular Dirac equation is connected with the dependence of the interaction kernels on energy. Accordingly, the perturbation theory series in Eq. (25) contains, unlike the regular nonrelativistic perturbation series, derivatives of the interaction kernels with respect to energy. The presence of these derivatives is crucial for the cancellation of ultraviolet divergences in the expressions for the energy eigenvalues. A judicious choice of the zero-order kernel (the sum of the Coulomb and Breit potentials; for more detail see, e.g., [24,25,28]) generates a solvable unperturbed EDE in the external Coulomb field in Fig. 7. The eigenfunctions of this equation may be found exactly in the form of the Dirac–Coulomb wave functions (see, e.g., [28]). For practical purposes it is often sufficient to approximate these exact wave functions by the product of the Schrödinger–Coulomb wave functions with the reduced mass and the free electron spinors, which depend on the electron mass and not on the reduced mass. These functions are very convenient for the calculation of high-order corrections, and while below we will often skip some steps in the derivation of one or another high-order contribution from the EDE, we advise the reader to keep in mind that almost all calculations below are done with these unperturbed wave functions.

Strictly speaking, the external field in this equation is not exactly Coulomb but also includes a transverse contribution.
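The role of the energy derivative in Eq. (25) can be seen in a toy model. In the sketch below, a hypothetical two-level Hamiltonian with an energy-dependent perturbation $V(E)$ stands in for $i\delta K(E)$; this is a nonrelativistic analogue, not the EDE itself. The exact level is found self-consistently and compared with the first-order result, with the first- plus second-order result without the derivative factor, and with the full analogue of Eq. (25). Only the version with the factor $1 + \langle n|V'(E_n^0)|n\rangle$ is accurate through second order.

```python
import math

# Hypothetical two-level toy model: H0 = diag(E0, E1) plus an energy-dependent
# perturbation V(E) = lam * (A + E * B), standing in for i*dK(E) in Eq. (25).
E0, E1 = -1.0, 0.5
A = ((0.3, 0.2), (0.2, -0.1))
B = ((0.5, 0.1), (0.1, 0.2))

def v(e, lam):
    """Matrix of the perturbation V(E) in the unperturbed basis."""
    return [[lam * (A[i][j] + e * B[i][j]) for j in range(2)] for i in range(2)]

def lower_eigenvalue(h):
    """Lower eigenvalue of a real symmetric 2x2 matrix."""
    mean = 0.5 * (h[0][0] + h[1][1])
    half_gap = 0.5 * (h[0][0] - h[1][1])
    return mean - math.sqrt(half_gap ** 2 + h[0][1] ** 2)

def exact_level(lam, tol=1e-14, max_iter=200):
    """Solve E = lowest eigenvalue of H0 + V(E) self-consistently (fixed-point iteration)."""
    e = E0
    for _ in range(max_iter):
        vv = v(e, lam)
        e_new = lower_eigenvalue([[E0 + vv[0][0], vv[0][1]], [vv[1][0], E1 + vv[1][1]]])
        if abs(e_new - e) < tol:
            return e_new
        e = e_new
    raise RuntimeError("fixed-point iteration did not converge")

def perturbative_level(lam, with_derivative=True):
    """Toy analogue of Eq. (25) through second order in lam."""
    vv = v(E0, lam)
    first = vv[0][0]                     # <n|V(E0)|n>
    second = vv[1][0] ** 2 / (E0 - E1)   # <n|V G V|n>, state n removed as in Eq. (26)
    factor = 1 + lam * B[0][0] if with_derivative else 1.0   # 1 + <n|V'(E0)|n>
    return E0 + (first + second) * factor

lam = 0.01
exact = exact_level(lam)
err_first = abs(exact - (E0 + v(E0, lam)[0][0]))
err_no_derivative = abs(exact - perturbative_level(lam, with_derivative=False))
err_full = abs(exact - perturbative_level(lam))
print(err_first, err_no_derivative, err_full)   # the last one is O(lam^3), the others O(lam^2)
```

Omitting the derivative factor leaves an error of order $\lambda^2$, the same order as the second-order term itself, which is the toy-model counterpart of the statement that the energy derivatives in Eq. (25) cannot be dropped.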
3. General features of the hydrogen energy levels

3.1. Classification of corrections

The zero-order effective Dirac equation with a Coulomb source provides only an approximate description of loosely bound states in QED, but the spectrum of this Dirac equation may serve as a good starting point for obtaining more precise results. The magnetic moment of the heavy nucleus is completely ignored in the Dirac equation with a Coulomb source, and, hence, the hyperfine splitting of the energy levels is missing in its spectrum. Notice that the magnetic interaction between the nucleus and the electron may easily be described even in the framework of nonrelativistic quantum mechanics, and the respective calculation of the leading contribution to the hyperfine splitting was done a long time ago by Fermi [29]. Other corrections to the Dirac energy levels do not arise in a quantum mechanical treatment with a potential, and their calculation, as well as the calculation of the corrections to the hyperfine splitting, requires field-theoretical methods.

All electrodynamic corrections to the energy levels may be written in the form of a power series expansion over three small parameters, $\alpha$, $Z\alpha$ and $m/M$, which determine the properties of the bound state. Accounting for the additional corrections of nonelectromagnetic origin, induced by the strong and weak interactions, introduces additional small parameters, namely, the ratio of the nuclear radius to the Bohr radius, the Fermi constant, etc. It should be noted that the coefficients in the power series for the energy levels might themselves be slowly varying functions (logarithms) of these parameters. Each of the small parameters above plays an important and unique role. In order to organize further discussion of different contributions to the energy levels, it is convenient to classify corrections in accordance with the small parameters on which they depend.
Corrections which depend only on the parameter $Z\alpha$ will be called relativistic or binding corrections. Higher powers of $Z\alpha$ arise due to deviation of the theory from the nonrelativistic limit, and thus represent a relativistic expansion. All such contributions are contained in the spectrum of the effective Dirac equation in the external Coulomb field. Contributions to the energy which depend only on the small parameters $\alpha$ and $Z\alpha$ are called radiative corrections. Powers of $\alpha$ arise only from quantum electrodynamic loops, and all associated corrections have a quantum field theory nature. Radiative corrections do not depend on the recoil factor $m/M$ and thus may be calculated in the framework of QED for a bound electron in an external field. In the respective calculations one deals only with the complications connected with the presence of quantized fields, while the two-particle nature of the bound state and all problems connected with the description of bound states in relativistic quantum field theory may still be ignored. Corrections which depend on the mass ratio $m/M$ of the light and heavy particles reflect a deviation from the theory with an infinitely heavy nucleus. Corrections to the energy levels which depend on $m/M$ and $Z\alpha$ are called recoil corrections. They describe contributions to the energy levels which cannot be taken into account with the help of the reduced mass factor. The presence of these corrections signals that we are dealing with a truly two-body problem rather than with a one-body problem. Leading recoil corrections in $Z\alpha$ (of order $(Z\alpha)^4(m/M)^n$) may still be taken into account with the help of the effective Dirac equation in the external field, since these corrections are induced by the
Fig. 8. Leading-order contribution to the electron radius.
one-photon exchange. This is impossible for the higher-order recoil terms, which reflect the truly relativistic two-body nature of the bound state problem. Technically, the respective contributions are induced by Bethe–Salpeter kernels with at least two-photon exchanges, and the whole machinery of relativistic QFT is necessary for their calculation. Calculation of the recoil corrections is simplified by the absence of the ultraviolet divergences connected with purely radiative loops. Radiative-recoil corrections are the expansion terms in the expressions for the energy levels which depend simultaneously on the parameters $\alpha$, $m/M$ and $Z\alpha$. Their calculation requires application of all the heavy artillery of QED, since we have to account both for the purely radiative loops and for the relativistic two-body nature of the bound states. The last class of corrections contains nonelectromagnetic corrections, the effects of weak and strong interactions. The largest correction induced by the strong interaction is connected with the finiteness of the nuclear size. Let us emphasize once more that hyperfine structure, radiative, recoil, radiative-recoil, and nonelectromagnetic corrections are all missing in the Dirac energy spectrum. Discussion of their calculation is the main topic of this review.

3.2. Physical origin of the Lamb shift

According to QED an electron continuously emits and absorbs virtual photons (see the leading-order diagram in Fig. 8), and as a result its electric charge is spread over a finite volume instead of being pointlike:

$$\langle r^2\rangle = -6\left.\frac{dF_1(-\mathbf k^2)}{d\mathbf k^2}\right|_{\mathbf k^2=0} \approx \frac{2\alpha}{\pi m^2}\ln\frac{m}{\rho} \approx \frac{2\alpha}{\pi m^2}\ln(Z\alpha)^{-2}. \tag{28}$$

In order to obtain this estimate of the electron radius we have taken into account that the electron is slightly off mass shell in the bound state. Hence, the would-be infrared divergence in the electron charge radius is cut off by its virtuality $\rho = (m^2 - p^2)/m$, which is of the order of the nonrelativistic binding energy, $\rho \approx m(Z\alpha)^2$. The finite radius of the electron generates a correction to the Coulomb potential (see, e.g., [19])

$$\delta V = \frac{1}{6}\langle r^2\rangle\,\Delta V = \frac{2\pi}{3}\,Z\alpha\,\langle r^2\rangle\,\delta^3(\mathbf r), \tag{29}$$
where $V = -Z\alpha/r$ is the Coulomb potential.

The numerical factor in Eq. (28) arises due to the common relation between the expansion of the form factor and the mean square radius

$$F_1(-\mathbf k^2) = 1 - \frac{1}{6}\langle r^2\rangle\,\mathbf k^2 + \cdots. \tag{27}$$
Fig. 9. Leading order polarization insertion.
The respective correction to the energy levels is simply given by the matrix element of this perturbation. Thus we immediately discover that the finite size of the electron produced by the QED radiative corrections leads to a shift of the hydrogen energy levels. Moreover, since this perturbation is nonvanishing only at the source of the Coulomb potential, it influences the energy levels with different orbital angular momenta quite differently, and, hence, leads to a splitting of the levels with the same total angular momentum but different orbital momenta. This splitting lifts the degeneracy in the spectrum of the Dirac equation in the Coulomb field, where the energy levels depend only on the principal quantum number $n$ and the total angular momentum $j$. It is very easy to estimate this splitting (the shift of the S level energy):

$$\Delta E = \langle nS|\delta V|nS\rangle \approx |\psi_n(0)|^2\,\frac{2\pi(Z\alpha)}{3}\,\langle r^2\rangle \approx \frac{4m(Z\alpha)^4}{n^3}\,\frac{\alpha}{3\pi}\ln\left[(Z\alpha)^{-2}\right]\delta_{l0} \approx 1330~\text{MHz} \quad (n = 2). \tag{30}$$
This result should be compared with the experimental value of about 1040 MHz, and the agreement is satisfactory for such a crude estimate. There are two qualitative features of this result to which we would like to draw the reader's attention. First, the sign of the energy shift may be obtained without calculation. Due to the finite radius of the electron, its charge in the S state is on average more spread out around the Coulomb source than in the case of a pointlike electron. Hence, the binding is weaker than for a pointlike electron and the energy of the level is higher. Second, despite the presence of nonlogarithmic contributions missing in our crude calculation, their magnitude is comparatively small, and the logarithmic term above is responsible for the main contribution to the Lamb shift. This property is due to the would-be infrared divergence of the considered contribution, which is cut off by the small (in comparison with the electron mass) binding energy. As we will see below, whenever a correction is logarithmically enhanced, the respective logarithm gives a significant part of the correction, as is the case above.

Another obvious contribution to the Lamb shift of the same leading order is connected with the polarization insertion in the photon propagator (see Fig. 9). This correction also induces a correction to the Coulomb potential

$$\delta V = -\frac{\Pi(-\mathbf k^2)}{\mathbf k^2}\,\Delta V = -\frac{\alpha}{15\pi m^2}\,\Delta V = -\frac{4\alpha(Z\alpha)}{15m^2}\,\delta^3(\mathbf r), \tag{31}$$

and the respective correction to the S-level energy is equal to

$$\Delta E = \langle nS|\delta V|nS\rangle = -\frac{4\alpha(Z\alpha)}{15m^2}\,|\psi_n(0)|^2 = -\frac{4m(Z\alpha)^4}{n^3}\,\frac{\alpha}{15\pi}\,\delta_{l0} \approx -30~\text{MHz}. \tag{32}$$
Once again the sign of this correction is evident in advance. The polarization correction may be thought of as a correction to the electric charge of the nucleon induced by the fact that the electron sees the proton from a finite distance. This means that the electron, which has penetrated into the polarization cloud, effectively sees a larger charge and experiences a stronger binding force, which lowers the energy level. Experimental observation of this contribution to the Lamb shift played an important role in the development of modern quantum electrodynamics, since it explicitly confirmed the very existence of closed electron loops. Numerically, the vacuum polarization contribution is much smaller than the contribution connected with the spreading of the electron due to quantum corrections, and the total shift of the level is positive.

3.3. Natural magnitudes of corrections to the Lamb shift

Let us emphasize that the main contribution to the Lamb shift is itself a radiative correction (compare Eqs. (30) and (32)) and contains a logarithmic enhancement factor. This is extremely important for a qualitative understanding of the magnitude of the higher-order corrections to the Lamb shift discussed below. Due to this accidental logarithmic enhancement it is impossible to draw conclusions about the expected magnitude of higher-order corrections simply by comparing them with the magnitude of the leading-order contribution. It is more reasonable to extract from this leading-order contribution the term which can be called the skeleton factor, and to use it further as a normalization factor. Let us write the leading-order contributions in Eqs. (30) and (32) obtained above in the form

$$\frac{4m(Z\alpha)^4}{n^3}\times\text{radiative correction}, \tag{33}$$
where the radiative correction is either the slope of the Dirac form factor, roughly speaking $-m^2\,dF_1(-\mathbf{k}^2)/d\mathbf{k}^2|_{\mathbf{k}^2=0} \simeq (\alpha/3\pi)\ln(Z\alpha)^{-2}$, or the polarization correction $m^2\,\Pi(-\mathbf{k}^2)/\mathbf{k}^4|_{\mathbf{k}^2=0} = -\alpha/(15\pi)$. It is clear now that the scale-setting factor which should be used for qualitative estimates of the higher-order corrections to the Lamb shift is equal to $4m(Z\alpha)^4/n^3$. Note the characteristic dependence on the principal quantum number, $1/n^3$, which originates from the square of the wave function at the origin, $|\psi(0)|^2 \sim 1/n^3$. All corrections induced at small distances (or at high virtual momenta) have this characteristic dependence and are called state independent. Even the coefficients before the leading powers of the low-energy logarithms $\ln(Z\alpha)^{-2}$ are state independent, since these leading logarithms originate from integration over the wide intermediate momentum region from $m(Z\alpha)^2$ to $m$, and the respective factor before the logarithm is determined by the high-momentum part of the integration region. When estimating higher-order corrections to the Lamb shift it is necessary to remember, as mentioned above, that unlike the case of radiative corrections to scattering amplitudes, in the bound-state problem the factors $Z\alpha$ are not accompanied by an extra factor $\pi$ in the denominator. This well-known feature of the Coulomb problem provides one more reason to preserve $Z$ in analytic expressions (even when $Z = 1$), since in this way one may easily separate powers of $Z\alpha$ not accompanied by $\pi$ from powers of $\alpha$, which always enter formulae in the combination $\alpha/\pi$.

[Footnote: We remind the reader that, according to the common renormalization procedure, the electric charge is defined as the charge observed at a very large distance.]
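The normalization in Eq. (33) can be made concrete with a short Python sketch. It evaluates the scale factor $4m(Z\alpha)^4/n^3$ in MHz for hydrogen and multiplies it by the two radiative corrections quoted above, reproducing the crude estimates in Eqs. (30) and (32). The electron rest energy, the Planck constant and $\alpha$ are CODATA-style inputs assumed here (they are not given in the text), and reduced-mass factors are ignored, as in those estimates.

```python
import math

# Assumed physical constants (CODATA-style values), not taken from the text
ALPHA = 7.2973525693e-3          # fine-structure constant
ME_C2_EV = 0.51099895000e6       # electron rest energy, eV
H_EV_S = 4.135667696e-15         # Planck constant, eV*s


def scale_mhz(n: int, z: int = 1) -> float:
    """Scale-setting factor 4 m (Z alpha)^4 / n^3 of Eq. (33), in MHz."""
    energy_ev = 4.0 * ME_C2_EV * (z * ALPHA) ** 4 / n**3
    return energy_ev / H_EV_S / 1e6


def self_energy_estimate_mhz(n: int, z: int = 1) -> float:
    """Eq. (30): radiative correction (alpha / 3 pi) ln (Z alpha)^-2."""
    return scale_mhz(n, z) * ALPHA / (3 * math.pi) * math.log((z * ALPHA) ** -2)


def vacuum_polarization_mhz(n: int, z: int = 1) -> float:
    """Eq. (32): radiative correction -alpha / (15 pi)."""
    return -scale_mhz(n, z) * ALPHA / (15 * math.pi)


if __name__ == "__main__":
    se = self_energy_estimate_mhz(2)
    vp = vacuum_polarization_mhz(2)
    print(f"Eq. (30), n=2: {se:8.1f} MHz")
    print(f"Eq. (32), n=2: {vp:8.1f} MHz")
    print(f"sum          : {se + vp:8.1f} MHz")
```

For n = 2 this gives roughly 1335 MHz and -27 MHz, consistent with the rounded values above; the distance of their sum from the measured splitting shows how crude the purely logarithmic estimate is.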
4. External field approximation

We will first discuss corrections to the basic Dirac energy levels which arise in the external field approximation. These are the leading relativistic corrections with exact mass dependence and the radiative corrections.

4.1. Leading relativistic corrections with exact mass dependence

We are considering a loosely bound two-particle system. Due to the nonrelativistic nature of this bound state it is clear that the main, $(Z\alpha)^2$, contribution to the binding energy depends only on one mass parameter, namely, the nonrelativistic reduced mass, and does not depend separately on the masses of the constituents. Relativistic corrections to the energy levels of order $(Z\alpha)^4$, describing the fine structure of the hydrogen spectrum, are missing in the nonrelativistic Schrödinger equation approach. The correct description of the fine structure for an infinitely heavy Coulomb center is provided by the relativistic Dirac equation, but it tells us nothing about the proper mass dependence of these corrections for a nucleus of finite mass. There are no reasons to expect that for a system of two particles with finite masses the relativistic corrections of order $(Z\alpha)^4$ will depend only on the reduced mass. The dependence of these corrections on the masses of the constituents should be more complicated. The problem of the proper mass dependence of the relativistic corrections of order $(Z\alpha)^4$ may be solved in the effective Hamiltonian framework. In the center-of-mass system the nonrelativistic Hamiltonian for a system of two particles with Coulomb interaction has the form

$$H_0 = \frac{\mathbf{p}^2}{2m} + \frac{\mathbf{p}^2}{2M} - \frac{Z\alpha}{r}. \tag{34}$$
In a nonrelativistic loosely bound system the expansion in $Z\alpha$ corresponds to the nonrelativistic expansion in $v/c$. Hence, we need an effective Hamiltonian including the terms of first order in $(v/c)^2$ for a proper description of the corrections of relative order $(Z\alpha)^2$ to the nonrelativistic energy levels. Such a Hamiltonian was first considered by Breit [30], who realized that all corrections to the nonrelativistic two-particle Hamiltonian of first order in $(v/c)^2$ may be obtained from the sum of the free relativistic Hamiltonians of each of the particles and the relativistic one-photon exchange. This conjecture is intuitively obvious, since extra exchange photons lead to at least one extra factor of $Z\alpha$, thus generating contributions to the binding energy of order $(Z\alpha)^5$ and higher. An explicit expression for the Breit potential was derived in [31] from the one-photon exchange amplitude with the help of the Foldy–Wouthuysen transformation.

[Footnote: We do not consider hyperfine structure now and thus omit in Eq. (35) all terms in the Breit potential which depend on the spin of the heavy particle.]
Fig. 10. Sum of the Coulomb and Breit kernels.
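$$V_{Br} = \frac{\pi Z\alpha}{2}\left(\frac{1}{m^2} + \frac{1}{M^2}\right)\delta(\mathbf{r}) - \frac{Z\alpha}{2mMr}\left(\mathbf{p}^2 + \frac{\mathbf{r}(\mathbf{r}\cdot\mathbf{p})\cdot\mathbf{p}}{r^2}\right) + \frac{Z\alpha}{r^3}\left(\frac{1}{4m^2} + \frac{1}{2mM}\right)[\mathbf{r}\times\mathbf{p}]\cdot\boldsymbol{\sigma}. \tag{35}$$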
A simplified derivation of the Breit interaction potential may be found in many textbooks (see, e.g., [20]). All contributions to the energy levels up to order $(Z\alpha)^4$ may be calculated from the total Breit Hamiltonian

$$H_{Br} = H_0 + V_{Br}, \tag{36}$$

where the interaction potential is the sum of the instantaneous Coulomb and Breit potentials in Fig. 10. The corrections of order $(Z\alpha)^4$ are just the first-order matrix elements of the Breit interaction between the Coulomb–Schrödinger eigenfunctions of the Coulomb Hamiltonian $H_0$ in Eq. (34). The mass dependence of the Breit interaction is known exactly, and the same is true for its matrix elements. These matrix elements, and hence the exact mass dependence of the contributions to the energy levels of order $(Z\alpha)^4$ beyond the reduced mass, were first obtained a long time ago [31]:

$$E_{nj} = (m+M) - \frac{m_r(Z\alpha)^2}{2n^2} - \frac{m_r(Z\alpha)^4}{2n^3}\left[\frac{1}{j+1/2} - \frac{3}{4n} + \frac{m_r}{4n(m+M)}\right] + \frac{(Z\alpha)^4 m_r^3}{2n^3M^2}\left(\frac{1}{j+1/2} - \frac{1}{l+1/2}\right)(1-\delta_{l0}). \tag{37}$$
Note the emergence of the last term in Eq. (37), which lifts the characteristic degeneracy in the Dirac spectrum between levels with the same $j$ and $l = j \pm 1/2$. This means that the expression for the energy levels in Eq. (37) already predicts a nonvanishing contribution to the classical Lamb shift $E(2S_{1/2}) - E(2P_{1/2})$. Due to the smallness of the electron-proton mass ratio, this extra term is extremely small in hydrogen. The leading contribution to the Lamb shift, induced by the QED radiative correction, is much larger. In the Breit Hamiltonian in Eq. (35) we have omitted all terms which depend on the spin variables of the heavy particle. As a result, the corrections to the energy levels in Eq. (37) do not depend on the relative orientation of the spins of the heavy and light particles (in other words, they do not describe hyperfine splitting). Moreover, almost all contributions in Eq. (37) are independent not only of the mutual orientation of the spins of the heavy and light particles but also of the magnitude of the spin of the heavy particle. The only exception is the small contribution proportional to $\delta_{l0}$, called the Darwin–Foldy contribution. This term arises in the matrix element of the Breit Hamiltonian only for a spin one-half nucleus and should be omitted for spinless or spin-one nuclei. This contribution combines naturally with the nuclear size correction, and we postpone its discussion to Section 7.1.2, which deals with the nuclear size contribution.

In the framework of the effective Dirac equation in the external Coulomb field (see Fig. 7) the result in Eq. (37) was first obtained in [24] (see also [25,28]) and rederived once again in [7], where it was presented in the form

$$E_{nj} = (m+M) + m_r[f(n,j)-1] - \frac{m_r^2}{2(m+M)}[f(n,j)-1]^2 + \frac{(Z\alpha)^4 m_r^3}{2n^3M^2}\left(\frac{1}{j+1/2} - \frac{1}{l+1/2}\right)(1-\delta_{l0}). \tag{38}$$
This equation has the same contributions of order $(Z\alpha)^4$ as Eq. (37), but formally it also contains nonrecoil and recoil corrections of order $(Z\alpha)^6$ and higher. The nonrecoil part of these contributions is definitely correct, since the Dirac energy spectrum is the proper limit of the spectrum of a two-particle system in the nonrecoil limit $m/M = 0$. As we will discuss later, the first-order mass ratio contributions in Eq. (38) correctly reproduce recoil corrections of higher orders in $Z\alpha$ generated by the Coulomb and Breit exchange photons. Additional first-order mass ratio recoil contributions of order $(Z\alpha)^6$ will be calculated below. Recoil corrections of order $(Z\alpha)^8$ were never calculated, and at the present stage the mass dependence of these terms should be considered as completely unknown. Recoil corrections depending on odd powers of $Z\alpha$ are also missing in Eq. (38), since, as was explained above, all corrections generated by one-photon exchange necessarily depend on even powers of $Z\alpha$. Hence, to calculate recoil corrections of order $(Z\alpha)^5$ one has to consider the nontrivial contribution of the box diagram. We postpone discussion of these corrections until Section 5.1.

It is appropriate to give an exact definition of what is called the Lamb shift. In the early days of Lamb shift studies, experimentalists measured not a shift but the classical Lamb splitting $E(2S_{1/2}) - E(2P_{1/2})$ between energy levels which are degenerate according to the naive Dirac equation in the Coulomb field. This splitting is an experimental observable defined independently of any theory. Modern experiments now produce high-precision data for the nondegenerate 1S energy level, and the very notion of the Lamb shift in this case, as well as in the case of an arbitrary energy level, does not admit an unambiguous definition.
It is most natural to call the Lamb shift the sum of all contributions to the energy levels which lift the double degeneracy of the Dirac–Coulomb spectrum with respect to $l = j \pm 1/2$ (see Section 2.2). An almost universally adopted convention has emerged to call the Lamb shift the sum of all contributions to the energy levels beyond the first three terms in Eq. (38), excluding, of course, all hyperfine splitting contributions. This means that we define the Lamb shift $L_{nlj}$ by the relationship

$$E_{nlj} = (m+M) + m_r[f(n,j)-1] - \frac{m_r^2}{2(m+M)}[f(n,j)-1]^2 + L_{nlj} \equiv E^0_{nj} + L_{nlj}. \tag{39}$$

We will adopt this definition below.

[Footnote: We remind the reader that the external field in the effective Dirac equation also contains a transverse contribution.]
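By the convention in Eq. (39), the last term of Eq. (38) is already part of $L_{nlj}$. The Python sketch below evaluates that term for the $2S_{1/2}$ and $2P_{1/2}$ levels of hydrogen, to quantify the statement that it is extremely small. The values of $mc^2$, $h$, $\alpha$ and the electron-to-proton mass ratio are assumed CODATA-style inputs, not numbers from the text.

```python
ALPHA = 7.2973525693e-3          # assumed fine-structure constant
ME_C2_EV = 0.51099895000e6       # assumed electron rest energy, eV
H_EV_S = 4.135667696e-15         # assumed Planck constant, eV*s
ME_OVER_MP = 1 / 1836.15267343   # assumed electron-to-proton mass ratio


def last_term_hz(n: int, l: int, j: float, z: int = 1) -> float:
    """Last term of Eq. (38): (Z alpha)^4 m_r^3/(2 n^3 M^2) (1/(j+1/2) - 1/(l+1/2)) (1 - delta_l0), in Hz."""
    if l == 0:
        return 0.0  # the factor (1 - delta_l0) removes S states
    mr_over_m = 1 / (1 + ME_OVER_MP)
    # m_r^3 / M^2 = m (m_r/m)^3 (m/M)^2
    mass_factor = mr_over_m**3 * ME_OVER_MP**2
    angular = 1 / (j + 0.5) - 1 / (l + 0.5)
    energy_ev = ME_C2_EV * (z * ALPHA) ** 4 * mass_factor * angular / (2 * n**3)
    return energy_ev / H_EV_S


e_2s = last_term_hz(2, 0, 0.5)
e_2p = last_term_hz(2, 1, 0.5)
print(f"E(2S1/2) - E(2P1/2) from this term: {(e_2s - e_2p) / 1e3:.2f} kHz")
```

The result is of order a couple of kHz, roughly six orders of magnitude below the measured splitting of about a thousand MHz.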
4.2. Radiative corrections of order $\alpha^n(Z\alpha)^4 m$

Let us turn now to the discussion of radiative corrections which may be calculated in the external field approximation.

4.2.1. Leading contribution to the Lamb shift

4.2.1.1. Radiative insertions in the electron line and the Dirac form factor contribution. The main contribution to the Lamb shift was first estimated in the nonrelativistic approximation by Bethe [32], and calculated by Kroll and Lamb [33] and by French and Weisskopf [34]. We have already discussed the nature of this contribution qualitatively above. In the effective Dirac equation framework the apparent perturbation kernels to be taken into account are the diagrams with the radiative photon spanning any number of the exchanged Coulomb photons in Figs. 8 and 11. The dominant logarithmic contribution to the Lamb shift is produced by the slope of the Dirac form factor $F_1$, but superficially all these kernels can lead to corrections of order $\alpha(Z\alpha)^4$, and one cannot discard any of them. An additional problem is connected with the infrared divergence of the kernels on the mass shell. There cannot be any true infrared divergence in the bound-state problem, since all would-be infrared divergences are cut off either at the inverse Bohr radius or by the electron binding energy. Nevertheless, such spurious on-shell infrared divergences can complicate the calculations. An important step which greatly facilitates the treatment of all these problems consists in separating the radiative photon integration region with the help of an auxiliary parameter $\sigma$ in such a way that $m(Z\alpha)^2 \ll \sigma \ll m(Z\alpha)$
Fig. 11. Kernels with many spanned Coulomb photons.
take these graphs into account simultaneously. This means that one has to calculate the matrix element of the exact self-energy operator in the external Coulomb field between Dirac–Coulomb wave functions. This problem may seem formidable at first sight, but it can be readily solved with the help of old-fashioned perturbation theory by inserting a complete set of intermediate states and performing calculations in the dipole approximation, which is adequate to accuracy $\alpha(Z\alpha)^4 m$. It should be mentioned that the upper boundary of the interval for the auxiliary parameter $\sigma$ was chosen so as to ensure the validity of the dipole approximation. The most important fact about the auxiliary parameter $\sigma$ is that one can use different approximations for the calculation of the high- and low-momentum contributions. In the high-momentum region the factor $m(Z\alpha)^2/k \le m(Z\alpha)^2/\sigma \ll 1$ plays the role of a small parameter, and one can consider binding effects as small corrections. In the low-momentum region $k/(mZ\alpha) \le \sigma/(mZ\alpha) \ll 1$ one may use the nonrelativistic multipole expansion, and the main contribution in this region is given by the dipole term. The crucial point is that for $k \sim \sigma$ both expansions are valid simultaneously, and one can match them without loss of accuracy. Matching the high- and low-momentum contributions, one obtains the classical result for the shift of the energy level generated by the slope of the Dirac form factor

$$\Delta E = \frac{4\alpha(Z\alpha)^4}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\left\{\left[\frac{1}{3}\ln\frac{m(Z\alpha)^{-2}}{m_r} + \frac{11}{72}\right]\delta_{l0} - \frac{1}{3}\ln k_0(n,l)\right\}, \tag{40}$$
where $m_r = mM/(m+M)$ is the reduced mass and $\ln k_0(n,l)$ is the so-called Bethe logarithm. The factor $m/m_r$ arises in the argument of the would-be infrared divergent logarithm $\ln(m/\lambda)$ since, in the nonrelativistic approximation, the energy levels of an atom depend only on the reduced mass, and hence the infrared divergence is cut off by the binding energy $m_r(Z\alpha)^2$ [7]. The mass dependence of the correction of order $\alpha(Z\alpha)^4$ beyond the reduced mass factor is properly described by the expression in Eq. (40), as was proved in [35,36]. As in the case of the leading relativistic correction in Eq. (37), the result in Eq. (40) is exact in the small mass ratio $m/M$, since in the framework of the effective Dirac equation all corrections of order $(Z\alpha)^4$ are generated by kernels with one-photon exchange. In some earlier papers the reduced mass factors in Eq. (40) were expanded to first order in the small mass ratio $m/M$. Nowadays it is important to preserve the exact mass dependence in Eq. (40), because current experiments may be able to detect quadratic mass corrections (about 2 kHz for the 1S level in hydrogen) to the leading nonrecoil Lamb shift contribution. The Bethe logarithm is formally defined as a certain normalized infinite sum of matrix elements of the coordinate operator over the Schrödinger–Coulomb wave functions. It is a pure number which can in principle be calculated with arbitrary accuracy, and high-accuracy results for the Bethe logarithm can be found in the literature (see, e.g., [37,38] and references therein). For convenience we have collected some values of the Bethe logarithms [38] in Table 1.
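Eq. (40) can be evaluated directly with the Bethe logarithms of Table 1 and the numerical scale $4\alpha(Z\alpha)^4(m_r/m)^3 m/(\pi n^3) \approx 3\,250\,137.65/n^3$ kHz quoted in the header of Table 2. The Python sketch below does this for the 1S and 2S levels of hydrogen; the inverse fine-structure constant and the electron-proton mass ratio are assumed CODATA-style inputs. The results can be compared with the first row of Table 2.

```python
import math

ALPHA_INV = 137.035999084          # assumed inverse fine-structure constant
ME_OVER_MP = 1 / 1836.15267343     # assumed electron-to-proton mass ratio
SCALE_N1_KHZ = 3_250_137.65        # header of Table 2, n = 1
LN_K0 = {"1S": 2.984128555765498, "2S": 2.811769893120563}  # Table 1


def dirac_ff_leading_khz(n: int, ln_k0: float, s_state: bool) -> float:
    """Eq. (40) for Z = 1, in kHz."""
    # ln[m (Z alpha)^-2 / m_r] = 2 ln(1/alpha) + ln(1 + m/M), since m/m_r = 1 + m/M
    log_term = 2 * math.log(ALPHA_INV) + math.log1p(ME_OVER_MP)
    bracket = -ln_k0 / 3
    if s_state:
        bracket += log_term / 3 + 11 / 72
    return SCALE_N1_KHZ / n**3 * bracket


e1s = dirac_ff_leading_khz(1, LN_K0["1S"], True)
e2s = dirac_ff_leading_khz(2, LN_K0["2S"], True)
print(f"1S: {e1s:,.2f} kHz, 2S: {e2s:,.2f} kHz")
```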
Table 1
Bethe logarithms for some lower levels [38]

Level | $\ln k_0(n,l)$ | $\Delta E \propto -\ln k_0(n,l)/n^3$ (kHz)
1S | 2.984128555765498 | -23 591.92
2S | 2.811769893120563 | -2778.66
2P | -0.030016708630213 | 29.66
3S | 2.767663612491822 | -810.39
3P | -0.038190229385312 | 11.18
3D | -0.005232148140883 | 1.53
4S | 2.749811840454057 | -339.68
4P | -0.041954894598086 | 5.18
4D | -0.006740938876975 | 0.83
4F | -0.001733661482126 | 0.21
5S | 2.740823727854572 | -173.35
5P | -0.044034695591878 | 2.79
5D | -0.007600751257947 | 0.48
5F | -0.002202168381486 | 0.14
5G | -0.000772098901537 | 0.05
6S | 2.735664206935105 | -100.13
6P | -0.045312197688974 | 1.66
6D | -0.008147203962354 | 0.30
6F | -0.002502179760279 | 0.09
6G | -0.000962797424841 | 0.04
6H | -0.000407926168297 | 0.01
7S | 2.732429129187092 | -62.98
7P | -0.046155177262915 | 1.06
7D | -0.008519223293658 | 0.20
7F | -0.002709095727000 | 0.06
7G | -0.001094472739370 | 0.03
7H | -0.000499701854766 | 0.01
7I | -0.000240908258717 | 0.01
8S | 2.730267260690589 | -42.16
8P | -0.046741352003557 | 0.72
8D | -0.008785042984125 | 0.14
8F | -0.002859114559296 | 0.04
8G | -0.001190432043318 | 0.02
8H | -0.00056653272412 | 0.01
8I | -0.000290426172391 |
8K | -0.000153864500961 |
4.2.1.2. Pauli form factor contribution. The Pauli form factor $F_2$ also generates a small contribution to the Lamb shift. This form factor does not produce any contribution if one neglects the lower components of the unperturbed wave functions, since the respective matrix element is identically zero between the upper components in the standard representation of the Dirac matrices, which we use everywhere. Taking into account the lower components in the nonrelativistic approximation, we easily obtain an explicit expression for the respective perturbation
$$\delta V = \frac{F_2(0)}{4m^2}\left(\nabla^2 V + 2\,\frac{m}{m_r}\,\boldsymbol{\sigma}\cdot[\nabla V\times\mathbf{p}]\right), \tag{41}$$

where $V = -Z\alpha/r$ is the Coulomb potential. Note the appearance of an extra factor $m/m_r$ in the coefficient before the second term. This is readily obtained from an explicit consideration of the radiatively corrected one-photon exchange. In momentum space the term with the Laplacian of the Coulomb potential depends only on the exchanged momentum, while the second term contains an explicit dependence on the electron momentum. Since the Pauli form factor depends explicitly on the electron momentum and not on the relative momentum of the electron–proton system, the transition to the relative momentum, which is the argument of the unperturbed wave functions, leads to the emergence of an extra factor $m/m_r$. The interaction potential above generated by the Pauli form factor may be written in terms of the spin–orbit interaction

$$\delta V = \left[\frac{\pi Z\alpha}{m^2}\,\delta(\mathbf{r}) + \frac{Z\alpha}{m m_r r^3}\,(\mathbf{s}\cdot\mathbf{l})\right]F_2(0), \tag{42}$$

where

$$\mathbf{s} = \frac{\boldsymbol{\sigma}}{2}, \qquad \mathbf{l} = \mathbf{r}\times\mathbf{p}. \tag{43}$$
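The angular factor that appears in the Lamb shift contributions below follows from $\langle\mathbf{s}\cdot\mathbf{l}\rangle = [j(j+1)-l(l+1)-3/4]/2$ combined with the standard Schrödinger–Coulomb expectation value $\langle r^{-3}\rangle = (m_r Z\alpha)^3/[n^3 l(l+1/2)(l+1)]$, a textbook result assumed here rather than derived in the text. The Python sketch multiplies the two with exact rational arithmetic and checks that the product equals $[j(j+1)-l(l+1)-3/4]/[l(l+1)(2l+1)]$ in units of $(m_r Z\alpha)^3/n^3$.

```python
from fractions import Fraction


def spin_orbit_expectation(l: int, j: Fraction) -> Fraction:
    """<s.l> = [j(j+1) - l(l+1) - 3/4] / 2 for spin one-half."""
    return (j * (j + 1) - l * (l + 1) - Fraction(3, 4)) / 2


def inverse_r3(l: int) -> Fraction:
    """<1/r^3> in units of (m_r Z alpha)^3 / n^3, valid for l > 0."""
    return 1 / (l * (l + Fraction(1, 2)) * (l + 1))


def angular_factor(l: int, j: Fraction) -> Fraction:
    """[j(j+1) - l(l+1) - 3/4] / [l(l+1)(2l+1)], as in the l != 0 Lamb shift terms."""
    return (j * (j + 1) - l * (l + 1) - Fraction(3, 4)) / (l * (l + 1) * (2 * l + 1))


for l, j in [(1, Fraction(1, 2)), (1, Fraction(3, 2)), (2, Fraction(3, 2)), (2, Fraction(5, 2))]:
    product = spin_orbit_expectation(l, j) * inverse_r3(l)
    assert product == angular_factor(l, j)
    print(f"l={l}, j={j}: {angular_factor(l, j)}")
```

For the $2P_{1/2}$ and $2P_{3/2}$ levels this gives $-1/3$ and $1/6$, so the anomalous magnetic moment pushes the two fine-structure components in opposite directions.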
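The respective contributions to the Lamb shift are given by

$$\begin{aligned}
\Delta E_{l=0} &= \frac{(Z\alpha)^4 m}{n^3}\left(\frac{m_r}{m}\right)^3 F_2(0) = \frac{\alpha(Z\alpha)^4 m}{2\pi n^3}\left(\frac{m_r}{m}\right)^3,\\
\Delta E_{l\neq0} &= \frac{(Z\alpha)^4 m}{n^3}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\left(\frac{m_r}{m}\right)^2 F_2(0) = \frac{\alpha(Z\alpha)^4 m}{2\pi n^3}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\left(\frac{m_r}{m}\right)^2,
\end{aligned} \tag{44}$$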
where we have used the Schwinger value [39] of the anomalous magnetic moment, $F_2(0) = \alpha/(2\pi)$. Correct reduced mass factors have been retained in this expression instead of expanding in $m/M$.

4.2.1.3. Polarization operator contribution. The leading polarization operator contribution to the Lamb shift (Fig. 9) was already calculated above in Eq. (32). Restoring the reduced mass factors which were omitted in that qualitative discussion, we easily obtain

$$\Delta E = 4\pi(Z\alpha)\,|\psi_n(0)|^2\,\frac{\Pi(-\mathbf{k}^2)}{\mathbf{k}^4}\bigg|_{\mathbf{k}^2=0} = -\frac{4\alpha(Z\alpha)^4}{15\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{45}$$
This result was originally obtained in [40] long before the advent of modern QED, and was the only known source of the 2S–2P splitting. There is a certain historic irony in the fact that for many years it was the common wisdom "that this effect is ... much too small and has, in addition, the wrong sign" [32] to explain the 2S–2P splitting.

Fig. 12. Two-loop electron form factor.

4.2.2. Radiative corrections of order $\alpha^2(Z\alpha)^4 m$

From the theoretical point of view, the calculation of the corrections of order $\alpha^2(Z\alpha)^4$ contains nothing fundamentally new in comparison with the corrections of order $\alpha(Z\alpha)^4$. The scale for these corrections is provided by the factor $4\alpha^2(Z\alpha)^4 m/(\pi^2 n^3)$, as one may easily see from the respective discussion above in Section 3.3. The corrections depend only on the values of the form factors and their derivatives at zero momentum transfer, and the only challenge is to calculate the respective radiative corrections with sufficient accuracy.

4.2.2.1. Dirac form factor contribution. The calculation of the contribution of order $\alpha^2(Z\alpha)^4$ induced by the radiative photon insertions in the electron line is even simpler than the respective calculation of the leading-order contribution. The point is that the second- and higher-order contributions to the slope of the Dirac form factor are infrared finite, and hence the total contribution of order $\alpha^2(Z\alpha)^4$ to the Lamb shift is given by the slope of the Dirac form factor. Hence, there is no need to sum an infinite number of diagrams. One readily obtains for the respective contribution

$$\Delta E = -4\pi(Z\alpha)\,|\psi_n(0)|^2\,\frac{dF_1^{(2)}(-\mathbf{k}^2)}{d\mathbf{k}^2}\bigg|_{\mathbf{k}^2=0} = 0.46994142\,\frac{4\alpha^2(Z\alpha)^4}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}, \tag{46}$$
where we have used the second-order slope of the Dirac form factor generated by the diagrams in Fig. 12

$$\frac{dF_1^{(2)}(-\mathbf{k}^2)}{d\mathbf{k}^2}\bigg|_{\mathbf{k}^2=0} = \left[\frac{3}{4}\zeta(3) - \frac{\pi^2}{2}\ln 2 + \frac{49}{432}\pi^2 + \frac{4819}{5184}\right]\frac{1}{m^2}\left(\frac{\alpha}{\pi}\right)^2 \approx -\frac{0.46994142}{m^2}\left(\frac{\alpha}{\pi}\right)^2. \tag{47}$$

The two-loop slope was considered in the early pioneering works [41,42], and the correct result was first obtained numerically in [43]. This last work triggered a flurry of theoretical activity [44–47], followed by the first completely analytical calculation in [48]. The same analytical result for the slope of the Dirac form factor was derived in [49] from the total $e^+e^-$ cross section and the unitarity condition.
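The analytic bracket in Eq. (47) can be checked against its quoted decimal value with a few lines of Python. In this sketch $\zeta(3)$ is hard-coded to double precision (an assumed input); everything else comes from the standard `math` module.

```python
import math

ZETA3 = 1.2020569031595942  # Apery's constant, zeta(3)
PI2 = math.pi**2
LN2 = math.log(2)

# Bracket of Eq. (47): coefficient of (1/m^2)(alpha/pi)^2 in the two-loop slope
slope_2loop = 0.75 * ZETA3 - PI2 / 2 * LN2 + 49 / 432 * PI2 + 4819 / 5184
print(f"two-loop slope coefficient: {slope_2loop:.8f}")
```

The sign is negative, so with the explicit minus sign in Eq. (46) the resulting energy shift is positive.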
Fig. 13. Insertions of two-loop polarization operator.
4.2.2.2. Pauli form factor contribution. The calculation of the Pauli form factor contribution closely follows the one performed in order $\alpha(Z\alpha)^4$, the only difference being that we have to employ the second-order contribution to the Pauli form factor (see Fig. 12), calculated a long time ago in [50–52] (the result of the first calculation [50] turned out to be in error)

$$F_2^{(2)}(0) = \left[\frac{3}{4}\zeta(3) - \frac{\pi^2}{2}\ln 2 + \frac{\pi^2}{12} + \frac{197}{144}\right]\left(\frac{\alpha}{\pi}\right)^2 \approx -0.32847892\left(\frac{\alpha}{\pi}\right)^2. \tag{48}$$
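A similar numerical check works for Eq. (48). The Python sketch below, again with $\zeta(3)$ hard-coded as an assumed input, also prints the value divided by 4, which is the normalization used for the corresponding S-state entry of Table 2.

```python
import math

ZETA3 = 1.2020569031595942  # zeta(3)
PI2 = math.pi**2
LN2 = math.log(2)

# Bracket of Eq. (48): F_2^(2)(0) in units of (alpha/pi)^2
f2_2loop = 0.75 * ZETA3 - PI2 / 2 * LN2 + PI2 / 12 + 197 / 144
print(f"F2(0), two loops: {f2_2loop:.8f} (alpha/pi)^2")
# Divided by 4, this is the S-state coefficient listed in Table 2
print(f"Table 2 normalization: {f2_2loop / 4:.8f}")
```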
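Then one readily obtains for the Lamb shift contribution

$$\begin{aligned}
\Delta E_{l=0} &= -0.32847892\,\frac{\alpha^2(Z\alpha)^4}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3 m,\\
\Delta E_{l\neq0} &= -0.32847892\,\frac{\alpha^2(Z\alpha)^4}{\pi^2 n^3}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\left(\frac{m_r}{m}\right)^2 m.
\end{aligned} \tag{49}$$

4.2.2.3. Polarization operator contribution. Here we use the well-known low-momentum asymptotics of the second-order polarization operator [53–55] in Fig. 13

$$\frac{\Pi^{(2)}(-\mathbf{k}^2)}{\mathbf{k}^4}\bigg|_{\mathbf{k}^2=0} = -\frac{41}{162m^2}\left(\frac{\alpha}{\pi}\right)^2, \tag{50}$$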
and obtain [53]

$$\Delta E = -\frac{82}{81}\,\frac{\alpha^2(Z\alpha)^4}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{51}$$
4.2.3. Corrections of order $\alpha^3(Z\alpha)^4 m$

4.2.3.1. Dirac form factor contribution. The calculation of the corrections of order $\alpha^3(Z\alpha)^4$ is similar to the calculation of the contributions of order $\alpha^2(Z\alpha)^4$. The respective corrections depend only on the values of the three-loop form factors or their derivatives at vanishing momentum transfer. The three-loop contribution to the slope of the Dirac form factor (Fig. 14) was recently calculated analytically [56]

$$\frac{dF_1^{(3)}(-\mathbf{k}^2)}{d\mathbf{k}^2}\bigg|_{\mathbf{k}^2=0} = -\bigg[\frac{25}{8}\zeta(5) - \frac{17}{24}\pi^2\zeta(3) - \frac{2929}{288}\zeta(3) - \frac{217}{9}a_4 - \frac{217}{216}\ln^4 2 - \frac{103}{1080}\pi^2\ln^2 2 + \frac{41\,671}{2160}\pi^2\ln 2 + \frac{3899}{25\,920}\pi^4 - \frac{454\,979}{38\,880}\pi^2 - \frac{77\,513}{186\,624}\bigg]\frac{1}{m^2}\left(\frac{\alpha}{\pi}\right)^3 \approx -\frac{0.171722}{m^2}\left(\frac{\alpha}{\pi}\right)^3, \tag{52}$$

Fig. 14. Examples of the three-loop contributions for the electron form factor.
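where

$$a_4 = \sum_{n=1}^{\infty}\frac{1}{2^n n^4}. \tag{53}$$

The respective contribution to the Lamb shift is equal to

$$\Delta E = 0.171722\,\frac{4\alpha^3(Z\alpha)^4}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{54}$$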
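Because the bracket in Eq. (52) involves large cancellations between terms of order 100, a numerical evaluation is a useful sanity check. The Python sketch below hard-codes $\zeta(3)$ and $\zeta(5)$ to double precision (assumed inputs) and sums $a_4$ directly from Eq. (53); the result agrees with the quoted 0.171722 to within about $10^{-5}$.

```python
import math

ZETA3 = 1.2020569031595942   # zeta(3)
ZETA5 = 1.0369277551433699   # zeta(5)
PI2 = math.pi**2
LN2 = math.log(2)

# a_4 of Eq. (53), summed directly; terms fall off like 2^-n
A4 = sum(1.0 / (2**n * n**4) for n in range(1, 80))

# Bracket of Eq. (52); the slope itself is minus this times (1/m^2)(alpha/pi)^3
bracket = (
    25 / 8 * ZETA5
    - 17 / 24 * PI2 * ZETA3
    - 2929 / 288 * ZETA3
    - 217 / 9 * A4
    - 217 / 216 * LN2**4
    - 103 / 1080 * PI2 * LN2**2
    + 41671 / 2160 * PI2 * LN2
    + 3899 / 25920 * PI2**2
    - 454979 / 38880 * PI2
    - 77513 / 186624
)
print(f"a4 = {A4:.12f}, bracket = {bracket:.6f}")
```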
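4.2.3.2. Pauli form factor contribution. To calculate the Pauli form factor contribution to the Lamb shift we use the third-order contribution to the Pauli form factor (Fig. 14), calculated numerically in [57] and analytically in [58]:

$$F_2^{(3)}(0) = \bigg[\frac{83}{72}\pi^2\zeta(3) - \frac{215}{24}\zeta(5) + \frac{100}{3}\left(a_4 + \frac{1}{24}\ln^4 2 - \frac{1}{24}\pi^2\ln^2 2\right) - \frac{239}{2160}\pi^4 + \frac{139}{18}\zeta(3) - \frac{298}{9}\pi^2\ln 2 + \frac{17\,101}{810}\pi^2 + \frac{28\,259}{5184}\bigg]\left(\frac{\alpha}{\pi}\right)^3 \approx 1.1812414562\left(\frac{\alpha}{\pi}\right)^3. \tag{55}$$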
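The same kind of check applies to the analytic three-loop Pauli form factor in Eq. (55). This Python sketch, with $\zeta(3)$ and $\zeta(5)$ hard-coded and $a_4$ summed from Eq. (53), reproduces the decimal value 1.1812414562.

```python
import math

ZETA3 = 1.2020569031595942   # zeta(3)
ZETA5 = 1.0369277551433699   # zeta(5)
PI2 = math.pi**2
LN2 = math.log(2)
A4 = sum(1.0 / (2**n * n**4) for n in range(1, 80))  # a_4 of Eq. (53)

# Bracket of Eq. (55): F_2^(3)(0) in units of (alpha/pi)^3
f2_3loop = (
    83 / 72 * PI2 * ZETA3
    - 215 / 24 * ZETA5
    + 100 / 3 * (A4 + LN2**4 / 24 - PI2 * LN2**2 / 24)
    - 239 / 2160 * PI2**2
    + 139 / 18 * ZETA3
    - 298 / 9 * PI2 * LN2
    + 17101 / 810 * PI2
    + 28259 / 5184
)
print(f"F2(0), three loops: {f2_3loop:.10f} (alpha/pi)^3")
```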
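Then one obtains for the Lamb shift

$$\begin{aligned}
\Delta E_{l=0} &= 1.1812414562\,\frac{\alpha^3(Z\alpha)^4}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3 m,\\
\Delta E_{l\neq0} &= 1.1812414562\,\frac{\alpha^3(Z\alpha)^4}{\pi^3 n^3}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\left(\frac{m_r}{m}\right)^2 m.
\end{aligned} \tag{56}$$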
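4.2.3.3. Polarization operator contribution. In this case the analytic result for the low-frequency asymptotics of the third-order polarization operator (see Fig. 15) [59] is used

$$\frac{\Pi^{(3)}(-\mathbf{k}^2)}{\mathbf{k}^4}\bigg|_{\mathbf{k}^2=0} = -\left[\frac{8135}{9216}\zeta(3) - \frac{\pi^2\ln 2}{15} + \frac{23}{360}\pi^2 - \frac{325\,805}{373\,248}\right]\frac{1}{m^2}\left(\frac{\alpha}{\pi}\right)^3 \approx -\frac{0.3626544402}{m^2}\left(\frac{\alpha}{\pi}\right)^3, \tag{57}$$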
Fig. 15. Examples of the three-loop contributions to the polarization operator.
and one obtains [60]

$$\Delta E = -1.4506177632\,\frac{\alpha^3(Z\alpha)^4}{\pi^3 n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{58}$$
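The coefficient in Eq. (58) is simply 4 times the bracket in Eq. (57), the factor 4 coming from $4\pi(Z\alpha)|\psi_n(0)|^2$ as in Eq. (45). A short Python check, with $\zeta(3)$ hard-coded as an assumed input:

```python
import math

ZETA3 = 1.2020569031595942  # zeta(3)
PI2 = math.pi**2
LN2 = math.log(2)

# Bracket of Eq. (57): minus the coefficient of (1/m^2)(alpha/pi)^3
vp_3loop = 8135 / 9216 * ZETA3 - PI2 * LN2 / 15 + 23 / 360 * PI2 - 325805 / 373248
print(f"bracket of Eq. (57): {vp_3loop:.10f}")
# Eq. (58): Delta E = 4 pi Z alpha |psi(0)|^2 Pi/k^4 brings in a factor of 4
print(f"coefficient in Eq. (58): {-4 * vp_3loop:.10f}")
```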
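4.2.4. Total correction of order $\alpha^n(Z\alpha)^4 m$

The total contribution of order $\alpha^n(Z\alpha)^4 m$ is given by the sum of the corrections in Eqs. (40), (44), (45), (46), (49), (51), (54), (56) and (58). It is equal to

$$\begin{aligned}
\Delta E_{l=0} = \bigg\{&\frac{4}{3}\ln\frac{m(Z\alpha)^{-2}}{m_r} - \frac{4}{3}\ln k_0(n,0) + \frac{38}{45} + \left[-\frac{9}{4}\zeta(3) + \frac{3}{2}\pi^2\ln 2 - \frac{10}{27}\pi^2 - \frac{2179}{648}\right]\frac{\alpha}{\pi}\\
&+ \bigg[\frac{85}{24}\zeta(5) - \frac{121}{72}\pi^2\zeta(3) - \frac{84\,071}{2304}\zeta(3) - \frac{71}{27}\ln^4 2 - \frac{239}{135}\pi^2\ln^2 2 + \frac{4787}{108}\pi^2\ln 2 + \frac{1591}{3240}\pi^4 - \frac{252\,251}{9720}\pi^2 + \frac{679\,441}{93\,312} - \frac{568}{9}a_4\bigg]\left(\frac{\alpha}{\pi}\right)^2\bigg\}\frac{\alpha(Z\alpha)^4}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\\
= \bigg[&\frac{4}{3}\ln\frac{m(Z\alpha)^{-2}}{m_r} - \frac{4}{3}\ln k_0(n,0) + \frac{38}{45} + 0.538941\,\frac{\alpha}{\pi} + 0.4175042\left(\frac{\alpha}{\pi}\right)^2\bigg]\frac{\alpha(Z\alpha)^4}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m
\end{aligned} \tag{59}$$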
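for the S-states, and

$$\begin{aligned}
\Delta E_{l\neq0} = \bigg\{&-\frac{4}{3}\frac{m_r}{m}\ln k_0(n,l) + \bigg[\frac{1}{2} + \left(\frac{3}{4}\zeta(3) - \frac{\pi^2}{2}\ln 2 + \frac{\pi^2}{12} + \frac{197}{144}\right)\frac{\alpha}{\pi}\\
&+ \left(\frac{83}{72}\pi^2\zeta(3) - \frac{215}{24}\zeta(5) + \frac{100}{3}a_4 + \frac{25}{18}\ln^4 2 - \frac{25}{18}\pi^2\ln^2 2 - \frac{239}{2160}\pi^4 + \frac{139}{18}\zeta(3) - \frac{298}{9}\pi^2\ln 2 + \frac{17\,101}{810}\pi^2 + \frac{28\,259}{5184}\right)\left(\frac{\alpha}{\pi}\right)^2\bigg]\\
&\times\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\bigg\}\left(\frac{m_r}{m}\right)^2\frac{\alpha(Z\alpha)^4 m}{\pi n^3}\\
= \bigg\{&-\frac{4}{3}\frac{m_r}{m}\ln k_0(n,l) + \left[\frac{1}{2} - 0.32847892\,\frac{\alpha}{\pi} + 1.1812414562\left(\frac{\alpha}{\pi}\right)^2\right]\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\bigg\}\left(\frac{m_r}{m}\right)^2\frac{\alpha(Z\alpha)^4 m}{\pi n^3}
\end{aligned} \tag{60}$$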
for the non-S states. Numerically, the corrections of order $\alpha^n(Z\alpha)^4 m$ for the lowest energy levels are

$$\Delta E(1S) = 8\,115\,785.64~\text{kHz}, \qquad \Delta E(2S) = 1\,037\,814.43~\text{kHz}, \qquad \Delta E(2P_{1/2}) = -12\,846.46~\text{kHz}. \tag{61}$$
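The numerical coefficients in Eqs. (59) and (60) and the level shifts in Eq. (61) can be cross-checked with a short Python sketch. It sums the two-loop S-state pieces of Eqs. (46), (49) and (51) in units of $\alpha(Z\alpha)^4(m_r/m)^3 m/(\pi n^3)$, takes the $(\alpha/\pi)^2$ coefficient 0.4175042 as quoted in Eq. (59), and evaluates Eqs. (59) and (60) for hydrogen with the scale $3\,250\,137.65/n^3$ kHz from Table 2 and the Bethe logarithms of Table 1. The values of $\alpha$ and $m_e/m_p$ are assumed CODATA-style inputs.

```python
import math

ALPHA = 1 / 137.035999084        # assumed fine-structure constant
ME_OVER_MP = 1 / 1836.15267343   # assumed electron-to-proton mass ratio
SCALE_N1_KHZ = 3_250_137.65      # 4 alpha (Z alpha)^4 (m_r/m)^3 m / pi for n = 1 (Table 2)
A_PI = ALPHA / math.pi

# S-state coefficient of alpha/pi, units alpha (Z alpha)^4 (m_r/m)^3 m / (pi n^3)
two_loop_s = 4 * 0.46994142 - 0.32847892 - 82 / 81   # Eqs. (46), (49), (51)
C3_S = 0.4175042                                      # (alpha/pi)^2 coefficient quoted in Eq. (59)


def lamb_s_khz(n: int, ln_k0: float) -> float:
    """Eq. (59) for Z = 1, in kHz."""
    log_term = -2 * math.log(ALPHA) + math.log1p(ME_OVER_MP)
    bracket = (4 / 3 * log_term - 4 / 3 * ln_k0 + 38 / 45
               + two_loop_s * A_PI + C3_S * A_PI**2)
    return SCALE_N1_KHZ / 4 / n**3 * bracket


def lamb_non_s_khz(n: int, l: int, j: float, ln_k0: float) -> float:
    """Eq. (60) for Z = 1, in kHz."""
    mr_over_m = 1 / (1 + ME_OVER_MP)
    angular = (j * (j + 1) - l * (l + 1) - 0.75) / (l * (l + 1) * (2 * l + 1))
    pauli = 0.5 - 0.32847892 * A_PI + 1.1812414562 * A_PI**2
    # alpha (Z alpha)^4 (m_r/m)^2 m / (pi n^3), obtained from the (m_r/m)^3 scale
    unit = SCALE_N1_KHZ / 4 / n**3 / mr_over_m
    return unit * (-4 / 3 * mr_over_m * ln_k0 + pauli * angular)


print(f"alpha/pi coefficient for S states: {two_loop_s:.6f}")
print(f"1S:    {lamb_s_khz(1, 2.984128555765498):,.2f} kHz")
print(f"2S:    {lamb_s_khz(2, 2.811769893120563):,.2f} kHz")
print(f"2P1/2: {lamb_non_s_khz(2, 1, 0.5, -0.030016708630213):,.2f} kHz")
```

Summing the quoted two-loop pieces gives an $\alpha/\pi$ coefficient of about 0.538941. The three level shifts land within a fraction of a kHz of Eq. (61).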
Contributions of order $\alpha^4(Z\alpha)^4 m$ are suppressed by an extra factor $\alpha/\pi$ in comparison with the corrections of order $\alpha^3(Z\alpha)^4 m$. Their expected magnitude is at the level of hundredths of a kHz even for the 1S state in hydrogen, and they are too small to be of any phenomenological significance.
4.2.5. Heavy particle polarization contributions of order $\alpha(Z\alpha)^4 m$

We have considered above only radiative corrections containing virtual photons and electrons. However, at the current level of accuracy one also has to consider effects induced by virtual muons and the lightest strongly interacting particles. The respective corrections to the electron anomalous magnetic moment are well known [57] and are still too small to be of any practical interest for Lamb shift calculations. Heavy particle contributions to the polarization operator numerically have the same magnitude as the polarization corrections of order $\alpha^3(Z\alpha)^4$. Corrections to the low-frequency asymptotics of the polarization operator are generated by the diagrams in Fig. 16. The muon loop contribution to the polarization operator

$$\frac{\Pi_\mu(-\mathbf{k}^2)}{\mathbf{k}^4}\bigg|_{\mathbf{k}^2=0} = -\frac{\alpha}{15\pi m_\mu^2} \tag{62}$$

immediately leads (compare Eq. (45)) to an additional contribution to the Lamb shift [61,62]

$$\Delta E = -\frac{4}{15}\left(\frac{m}{m_\mu}\right)^2\frac{\alpha(Z\alpha)^4}{\pi n^3}\,m\,\delta_{l0}. \tag{63}$$

Fig. 16. Muon-loop and hadron contributions to the polarization operator.
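Eq. (63) differs from the electron-loop term in Eq. (45) only by the factor $(m/m_\mu)^2$. The Python sketch below, assuming a CODATA-style muon-to-electron mass ratio and reusing the Table 2 scale (which includes the $(m_r/m)^3$ factor not written out in Eq. (63)), reproduces the muonic polarization entries of Table 2.

```python
MMU_OVER_ME = 206.7682830        # assumed muon-to-electron mass ratio
SCALE_N1_KHZ = 3_250_137.65      # Table 2 scale for n = 1, includes (m_r/m)^3


def muon_vp_khz(n: int) -> float:
    """Eq. (63) for an S state: coefficient -(1/15)(m/m_mu)^2 times the Table 2 scale."""
    return -SCALE_N1_KHZ / n**3 / 15 / MMU_OVER_ME**2


print(f"1S: {muon_vp_khz(1):.2f} kHz, 2S: {muon_vp_khz(2):.2f} kHz")
```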
The hadronic polarization contribution to the Lamb shift was estimated in a number of papers [61–63]. The light hadron contribution to the polarization operator may easily be estimated with the help of vector dominance

$$\frac{\Pi(-\mathbf{k}^2)}{\mathbf{k}^4}\bigg|_{\mathbf{k}^2=0} = -\sum_i\frac{4\pi\alpha}{f_{v_i}^2 m_{v_i}^2}, \tag{64}$$

where $m_{v_i}$ are the masses of the three lowest vector mesons and the vector meson–photon vertex has the form $e m_{v_i}^2/f_{v_i}$. Estimating the contributions of the heavy quark flavors with the help of free quark loops, one obtains the total hadronic vacuum polarization contribution to the Lamb shift in the form [62]
4p 1 a(Za) 2 *E"!4 R G # md . (65) T f m J 1 GeV pn 3 TG TG

Numerically this correction is -3.18 kHz for the 1S state and -0.40 kHz for the 2S state in hydrogen. A compatible but more accurate estimate of the heavy particle contribution to the 1S Lamb shift, -3.40(7) kHz, was obtained in [63] from the analysis of experimental data on low-energy $e^+e^-$ annihilation (Table 2).

4.3. Radiative corrections of order $\alpha^n(Z\alpha)^5 m$

4.3.1. Skeleton integral approach to calculations of radiative corrections

We have seen above that the calculation of the corrections of order $\alpha^n(Z\alpha)^4 m$ ($n > 1$) reduces to the calculation of higher-order corrections to the properties of a free electron and to the photon propagator, namely, to the calculation of the slope of the electron Dirac form factor and of the anomalous magnetic moment, and to the calculation of the leading term in the low-frequency expansion of the polarization operator. Hence, these contributions to the Lamb shift are independent of any features of the bound state. A nontrivial interplay between radiative corrections and binding effects arises first in the calculation of contributions of order $\alpha(Z\alpha)^5 m$, and in calculations of higher-order terms in the combined expansion in $\alpha$ and $Z\alpha$. The calculation of the contribution of order $\alpha^n(Z\alpha)^5 m$ to the energy shift is even simpler than the calculation of the leading-order contribution to the Lamb shift, because the scattering approximation is sufficient in this case [64–66]. Formally this correction is induced by kernels with at least two-photon exchanges, and in analogy with the leading-order contribution one could also anticipate the appearance of irreducible kernels with a higher number of exchanges. This does not happen, however, as can be proved formally, but in fact no formal proof is needed. First, one has to realize that for high exchanged momenta the expansion in $Z\alpha$ is valid, and the addition of any extra exchanged photon always produces an extra power of $Z\alpha$.
Hence, in the high-momentum region only diagrams with two exchanged photons are relevant. The treatment of the low-momentum region is greatly facilitated by a very general feature of the Feynman diagrams, namely, that the infrared

[Footnote: It is not obvious that the hadronic contribution should be included in the phenomenological analysis of the Lamb shift measurements, since experimentally it is indistinguishable from an additional contribution to the proton charge radius. We will return to this problem below in Section 7.1.3.]
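Table 2
Contributions of order $\alpha^n(Z\alpha)^4 m$

Contribution | Coefficient of $\frac{4\alpha(Z\alpha)^4}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m \approx \frac{3\,250\,137.65(4)}{n^3}$ kHz | $\Delta E(1S)$ (kHz) | $\Delta E(2S)$ (kHz)
Dirac FF: Bethe [32], French and Weisskopf [34], Kroll and Lamb [33] | $\left[\frac{1}{3}\ln\frac{m(Z\alpha)^{-2}}{m_r} + \frac{11}{72}\right]\delta_{l0} - \frac{1}{3}\ln k_0(n,l)$ | 7 925 175.26(9) | 1 013 988.13(1)
Pauli FF, $l = 0$ | $\frac{1}{8}\delta_{l0}$ | 406 267.21 | 50 783.40
Pauli FF, $l \neq 0$ | $\frac{j(j+1)-l(l+1)-3/4}{8l(l+1)(2l+1)}\,\frac{m}{m_r}(1-\delta_{l0})$ | |
Vacuum polarization: Uehling [40] | $-\frac{1}{15}\delta_{l0}$ | -216 675.84 | -27 084.48
Dirac FF: Appelquist and Brodsky [43], Barbieri et al. [48] | $\left[-\frac{3}{4}\zeta(3) + \frac{\pi^2}{2}\ln 2 - \frac{49}{432}\pi^2 - \frac{4819}{5184}\right]\frac{\alpha}{\pi}\delta_{l0} \approx 0.46994142\,\frac{\alpha}{\pi}\delta_{l0}$ | 3547.82 | 443.48
Pauli FF, $l = 0$: Sommerfield [52], Peterman [51] | $\left[\frac{3}{16}\zeta(3) - \frac{\pi^2}{8}\ln 2 + \frac{\pi^2}{48} + \frac{197}{576}\right]\frac{\alpha}{\pi}\delta_{l0} \approx -0.08211972\,\frac{\alpha}{\pi}\delta_{l0}$ | -619.96 | -77.50
Pauli FF, $l \neq 0$: Sommerfield [52], Peterman [51] | $\left[\frac{3}{16}\zeta(3) - \frac{\pi^2}{8}\ln 2 + \frac{\pi^2}{48} + \frac{197}{576}\right]\frac{\alpha}{\pi}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r} \approx -0.08211972\,\frac{\alpha}{\pi}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r}$ | |
Vacuum polarization: Baranger et al. [53] | $-\frac{41}{162}\,\frac{\alpha}{\pi}\delta_{l0}$ | -1910.67 | -238.83
Dirac FF: Melnikov and van Ritbergen [56] | $\big[\frac{25}{8}\zeta(5) - \frac{17}{24}\pi^2\zeta(3) - \frac{2929}{288}\zeta(3) - \frac{217}{9}a_4 - \frac{217}{216}\ln^4 2 - \frac{103}{1080}\pi^2\ln^2 2 + \frac{41\,671}{2160}\pi^2\ln 2 + \frac{3899}{25\,920}\pi^4 - \frac{454\,979}{38\,880}\pi^2 - \frac{77\,513}{186\,624}\big]\left(\frac{\alpha}{\pi}\right)^2\delta_{l0} \approx 0.171722\left(\frac{\alpha}{\pi}\right)^2\delta_{l0}$ | 3.01 | 0.38
Pauli FF, $l = 0$: Kinoshita [57], Laporta and Remiddi [58] | $\big[\frac{83}{288}\pi^2\zeta(3) - \frac{215}{96}\zeta(5) + \frac{25}{3}\big(a_4 + \frac{1}{24}\ln^4 2 - \frac{1}{24}\pi^2\ln^2 2\big) - \frac{239}{8640}\pi^4 + \frac{139}{72}\zeta(3) - \frac{149}{18}\pi^2\ln 2 + \frac{17\,101}{3240}\pi^2 + \frac{28\,259}{20\,736}\big]\left(\frac{\alpha}{\pi}\right)^2\delta_{l0} \approx 0.29531032\left(\frac{\alpha}{\pi}\right)^2\delta_{l0}$ | 5.18 | 0.65
Pauli FF, $l \neq 0$: Kinoshita [57], Laporta and Remiddi [58] | same bracket as for $l = 0$, times $\left(\frac{\alpha}{\pi}\right)^2\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r}$; $\approx 0.29531032\left(\frac{\alpha}{\pi}\right)^2\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}\,\frac{m}{m_r}$ | |
Vacuum polarization: Baikov and Broadhurst [59], Eides and Grotch [60] | $-\big[\frac{8135}{9216}\zeta(3) - \frac{\pi^2\ln 2}{15} + \frac{23}{360}\pi^2 - \frac{325\,805}{373\,248}\big]\left(\frac{\alpha}{\pi}\right)^2\delta_{l0} \approx -0.36265442\left(\frac{\alpha}{\pi}\right)^2\delta_{l0}$ | -6.36 | -0.79
Muonic polarization: Karshenboim [61], Eides and Shelyuto [62] | $-\frac{1}{15}\left(\frac{m}{m_\mu}\right)^2\delta_{l0}$ | -5.07 | -0.63
Hadronic polarization: Karshenboim [61], Eides and Shelyuto [62], Friar et al. [63] | see Eq. (65) | -3.18 | -0.40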
behavior of any radiatively corrected Feynman diagram (or more accurately any gauge invariant sum of Feynman diagrams) is milder than the behavior of the skeleton diagram. Consider the matrix element in momentum space of the diagrams in Fig. 17 with two exchanged Coulomb photons between the SchroK dinger}Coulomb wave functions. We will take the external electron momenta to be on-shell and to have vanishing space components. It is then easy to see that the
96
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
Fig. 17. Skeleton diagram with two exchanged Coulomb photons. Fig. 18. Radiative insertions in the electron line.
contribution of such a diagram to the Lamb shift is given by the infrared divergent integral
16(Za) m dk m ! d , pn m k J
(66)
where $k$ is the dimensionless momentum of the exchanged photon measured in units of the electron mass. This divergence has a simple physical interpretation. If we do not ignore the small virtualities of the external electron lines and the external wave functions, this two-Coulomb exchange adds one extra rung to the Coulomb wave function and should simply reproduce it; the naive infrared divergence above would then be regularized at the characteristic atomic scale $mZ\alpha$. Hence, it is evident that the kernel with two-photon exchange is already taken into account in the effective Dirac equation above, and there is no need to consider it as a perturbation.

Let us now consider radiative photon insertions in the electron line (see Fig. 18). Accounting for these corrections effectively leads to insertion of an additional factor $L(k)$ in the divergent integral above. This factor has at most a logarithmic asymptotic behavior at large momenta and does not spoil the ultraviolet convergence of the integral, while in the low-momentum region the radiative insertion suppresses the integrand by a factor of order $k^2$ (again up to logarithmic factors) and thus improves its low-frequency behavior. However, the integral is still divergent even after inclusion of the radiative corrections, because the two-photon-exchange box diagram, even with radiative corrections, contains a contribution of the previous order in $Z\alpha$, namely the main contribution to the Lamb shift induced by the electron form factor. This spurious contribution is easily removed by subtracting the leading low-momentum term of the integrand. The result of the subtraction is a convergent integral which is responsible for the correction of order $\alpha(Z\alpha)^5$. As an additional bonus of this approach, one does not need to worry about the ultraviolet divergence of the one-loop radiative corrections: the subtraction automatically eliminates any ultraviolet divergent terms, and the result is both ultraviolet and infrared finite.
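As a cross-check of the higher-order entries collected in Table 2, the sketch below (Python, standard library only) evaluates the analytic two- and three-loop coefficients and converts them to kHz with the unit $3\,250\,137.65/n^3$ kHz quoted in the table. The values of $\zeta(3)$, $\zeta(5)$ and $\alpha$ are hard-coded inputs chosen for this sketch, and the constant $a_4$ of the table is assumed to be $\mathrm{Li}_4(1/2)=\sum_{n\ge1}1/(2^nn^4)$.

```python
import math

ZETA3 = 1.2020569031595942   # Riemann zeta(3)
ZETA5 = 1.0369277551433699   # Riemann zeta(5)
ALPHA = 7.2973525693e-3      # fine-structure constant (assumed input)
UNIT_KHZ = 3_250_137.65      # Table 2 unit for Z = 1, n = 1, in kHz

A4 = sum(1.0 / (2**n * n**4) for n in range(1, 60))  # Li_4(1/2), assumed to be a_4
PI2, LN2 = math.pi**2, math.log(2.0)

def dirac_two_loop() -> float:
    return -3/4*ZETA3 + PI2/2*LN2 - 49/432*PI2 - 4819/5184

def pauli_two_loop() -> float:
    return 3/16*ZETA3 - PI2/8*LN2 + PI2/48 + 197/576

def dirac_three_loop() -> float:
    return (25/8*ZETA5 - 17/24*PI2*ZETA3 - 2929/288*ZETA3 - 217/9*A4
            - 217/216*LN2**4 - 103/1080*PI2*LN2**2 + 41671/2160*PI2*LN2
            + 3899/25920*PI2**2 - 454979/38880*PI2 - 77513/186624)

def pauli_three_loop() -> float:
    return (83/288*PI2*ZETA3 - 215/96*ZETA5
            + 25/3*(A4 + LN2**4/24 - PI2*LN2**2/24)
            - 239/8640*PI2**2 + 139/72*ZETA3 - 149/18*PI2*LN2
            + 17101/3240*PI2 + 28259/20736)

def vp_three_loop() -> float:
    return -(8135/9216*ZETA3 - PI2*LN2/15 + 23/360*PI2 - 325805/373248)

def shift_khz(coefficient: float, extra_loops: int, n: int) -> float:
    """Convert a Table 2 coefficient (with extra_loops powers of alpha/pi) to kHz."""
    return coefficient * (ALPHA / math.pi) ** extra_loops * UNIT_KHZ / n**3

if __name__ == "__main__":
    rows = [("Dirac FF, 2 loops", dirac_two_loop(), 1),
            ("Pauli FF, 2 loops", pauli_two_loop(), 1),
            ("Dirac FF, 3 loops", dirac_three_loop(), 2),
            ("Pauli FF, 3 loops", pauli_three_loop(), 2),
            ("VP, 3 loops", vp_three_loop(), 2)]
    for name, c, k in rows:
        print(f"{name:18s} {c:+.8f}  1S: {shift_khz(c, k, 1):9.2f} kHz"
              f"  2S: {shift_khz(c, k, 2):8.2f} kHz")
```

The kHz values agree with the table to the quoted rounding; small residual differences would come from the assumed value of $\alpha$.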
Due to the radiative insertions, low integration momenta (of the atomic order $mZ\alpha$) are suppressed in the exchange loops, and the effective integration momenta are of order $m$. Hence, one may neglect the small virtuality of the external fermion lines and calculate the above diagrams with on-mass-shell external momenta. Contributions to the Lamb shift are given by the product of the square of the Schrödinger–Coulomb wave function at the origin, $|\psi(0)|^2$, and the diagram. Under these conditions the diagrams in Fig. 18 comprise a gauge-invariant set and may easily be calculated.

Contributions of the diagrams with more than two exchanged Coulomb photons are of higher order in $Z\alpha$. This is obvious for the high exchanged momenta integration region. It is not difficult to demonstrate that in the Yennie gauge [67–69] contributions from the low exchanged momentum region to the matrix element with the on-shell external electron lines remain infrared finite and, hence, cannot produce any correction of order $\alpha(Z\alpha)^5$. Since the sum of diagrams with the on-shell external electron lines is gauge invariant, this is true in any gauge. It is also clear that small virtuality of the external electron lines would lead to an additional suppression of the matrix element under consideration and, hence, it is sufficient to consider only two-photon exchanges for calculation of all corrections of order $\alpha^n(Z\alpha)^5$.

The magnitude of the correction of order $\alpha(Z\alpha)^5$ may easily be estimated before the calculation is carried out. We need to take the skeleton factor $4m(Z\alpha)^4/n^3$ discussed above in Section 6 and multiply it by an extra factor $\alpha(Z\alpha)$. Naively, one could expect a somewhat smaller factor $\alpha(Z\alpha)/\pi$. However, it is well known that a convergent diagram with two external photons always produces an extra factor $\pi$ in the numerator, thus compensating the factor $\pi$ in the denominator generated by the radiative correction. Hence, calculation of the correction of order $\alpha(Z\alpha)^5$ should lead to a numerical factor of order unity multiplied by $4m\alpha(Z\alpha)^5/n^3$.

4.3.2. Radiative corrections of order $\alpha(Z\alpha)^5m$

4.3.2.1. Correction induced by the radiative insertions in the electron line. This correction is generated by the sum of all possible radiative insertions in the electron line in Fig. 18. In the approach described above, one has to calculate the electron factor corresponding to the sum of all radiative corrections in the electron line, make the necessary subtraction of the leading infrared asymptote, insert the subtracted expression in the integrand in Eq. (66), and then integrate over the exchanged momentum. This leads to the result

$$\Delta E=4\left(1+\frac{11}{128}-\frac{1}{2}\ln2\right)\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}, \tag{67}$$
which was first obtained in [64–66] within other approaches. Note that numerically $1+11/128-\frac{1}{2}\ln2\approx0.739$, in excellent agreement with the qualitative considerations above.

4.3.2.2. Correction induced by the polarization insertions in the external photons. The correction of order $\alpha(Z\alpha)^5$ induced by the polarization operator insertions in the external photon lines in Fig. 19 was obtained in [64–66] and may again be calculated in the skeleton integral approach. We will use the simplicity of the one-loop polarization operator and perform this calculation in more detail in order to illustrate the general considerations above. For calculation of the respective contribution one has to insert the polarization operator in the skeleton integrand in Eq. (66),

$$\frac{1}{k^2}\to\frac{\alpha}{\pi}I_1(k), \tag{68}$$
where

$$I_1(k)=\int_0^1dv\,\frac{v^2(1-v^2/3)}{4+(1-v^2)k^2}. \tag{69}$$

Of course, the skeleton integral still diverges in the infrared after this substitution, since

$$I_1(0)=\frac{1}{15}. \tag{70}$$

Fig. 19. Polarization insertions in the Coulomb lines.
This linear infrared divergence $\int dk/k^2$ is effectively cut off at the characteristic atomic scale $mZ\alpha$; it lowers the power of the factor $Z\alpha$, so that the would-be divergent contribution turns out to be of order $\alpha(Z\alpha)^4$ and corresponds to the polarization part of the leading-order contribution to the Lamb shift. We carry out the subtraction of the leading low-frequency asymptote of the polarization operator insertion, which corresponds to the subtraction of the leading low-frequency asymptote in the integrand for the contribution to the energy shift
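$$\tilde I_1(k)\equiv I_1(k)-I_1(0)=-\frac{k^2}{4}\int_0^1dv\,\frac{v^2(1-v^2)(1-v^2/3)}{4+(1-v^2)k^2}, \tag{71}$$

and substitute the subtracted expression in the formula for the Lamb shift in Eq. (66). We also insert an additional factor 2 in order to take into account possible insertions of the polarization operator in both photon lines. Then

$$\Delta E=-\frac{32}{1-m^2/M^2}\,\frac{\alpha(Z\alpha)^5}{\pi^2n^3}\left(\frac{m_r}{m}\right)^3m\int_0^\infty dk\,\frac{\tilde I_1(k)}{k^2}\,\delta_{l0}$$
$$=\frac{8}{1-m^2/M^2}\,\frac{\alpha(Z\alpha)^5}{\pi^2n^3}\left(\frac{m_r}{m}\right)^3m\int_0^\infty dk\int_0^1dv\,\frac{v^2(1-v^2)(1-v^2/3)}{4+(1-v^2)k^2}\,\delta_{l0}$$
$$\simeq\frac{5}{48}\,\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}. \tag{72}$$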
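The last step in Eq. (72) can be checked in closed form. The sketch below does the $k$ integration first and then the $v$ integration, dropping the factor $1/(1-m^2/M^2)$ as in the text; the two $v$ integrals are standard Beta-function integrals.

$$\int_0^\infty\frac{dk}{4+(1-v^2)k^2}=\frac{\pi}{4\sqrt{1-v^2}},$$
$$\int_0^1dv\,v^2\sqrt{1-v^2}\left(1-\frac{v^2}{3}\right)=\frac{\pi}{16}-\frac{1}{3}\cdot\frac{\pi}{32}=\frac{5\pi}{96},$$
$$\frac{8}{\pi^2}\cdot\frac{\pi}{4}\cdot\frac{5\pi}{96}=\frac{5}{48}.$$

In the units $4\alpha(Z\alpha)^5(m_r/m)^3m/n^3$ used below, this is the coefficient $5/192$.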
We have restored in Eq. (72) the characteristic factor $1/(1-m^2/M^2)$, which was omitted in Eq. (66) but which naturally arises in the skeleton integral. However, it is easy to see that the error generated by the omission of this factor is only about 0.02 kHz even for the electron-line contribution to the 1S level shift and, hence, this correction may be safely omitted at the present level of experimental accuracy.

4.3.2.3. Total correction of order $\alpha(Z\alpha)^5m$. The total correction of order $\alpha(Z\alpha)^5m$ is given by the sum of the contributions in Eqs. (67) and (72):

$$\Delta E=4\left(1+\frac{11}{128}+\frac{5}{192}-\frac{1}{2}\ln2\right)\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}=3.0616222\,\frac{\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}=57\,030.70\ \text{kHz}\ (n=1),\quad 7128.84\ \text{kHz}\ (n=2). \tag{73}$$

Fig. 20. Six gauge invariant sets of diagrams for corrections of order $\alpha^2(Z\alpha)^5m$.
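The numbers quoted in Eqs. (67), (72) and (73) are easy to reproduce. In the sketch below the unit $4\alpha(Z\alpha)^5(m_r/m)^3m/n^3$ is obtained, as an assumption of this sketch, by multiplying the Table 2 unit $3\,250\,137.65/n^3$ kHz by $\pi Z\alpha$ with $Z=1$ and an assumed input value of $\alpha$.

```python
import math

ALPHA = 7.2973525693e-3                        # assumed input value of alpha
UNIT_ZA4_KHZ = 3_250_137.65                    # 4 alpha (Z alpha)^4 (m_r/m)^3 m / pi, kHz (Table 2)
UNIT_ZA5_KHZ = UNIT_ZA4_KHZ * math.pi * ALPHA  # 4 alpha (Z alpha)^5 (m_r/m)^3 m, kHz

electron_line = 1 + 11/128 - math.log(2) / 2   # bracket of Eq. (67)
polarization = 5 / 192                          # Eq. (72): 5/48 = 4 * (5/192)
total = electron_line + polarization            # bracket of Eq. (73)

def shift_khz(coefficient: float, n: int) -> float:
    """kHz shift of an nS level for a coefficient of 4 alpha (Z alpha)^5 (m_r/m)^3 m / n^3."""
    return coefficient * UNIT_ZA5_KHZ / n**3

if __name__ == "__main__":
    print(f"electron line: {electron_line:.4f}")   # about 0.739
    print(f"4 x total:     {4 * total:.7f}")       # coefficient in Eq. (73)
    print(f"1S: {shift_khz(total, 1):.2f} kHz, 2S: {shift_khz(total, 2):.2f} kHz")
```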
4.3.3. Corrections of order $\alpha^2(Z\alpha)^5m$

Corrections of order $\alpha^2(Z\alpha)^5$ have the same physical origin as corrections of order $\alpha(Z\alpha)^5$, and the scattering approximation is sufficient for their calculation [70]. We now consider corrections of higher order in $\alpha$ than in the previous section, and there is a larger variety of relevant graphs. All six gauge-invariant sets of diagrams [70] which produce corrections of order $\alpha^2(Z\alpha)^5$ are presented in Fig. 20. The blob called "2 loops" in Fig. 20(f) denotes the gauge-invariant sum of diagrams with all possible insertions of two radiative photons in the electron line. All diagrams in Fig. 20 may be obtained from the skeleton diagram in Fig. 17 with the help of different two-loop radiative insertions. As in the case of the corrections of order $\alpha(Z\alpha)^5$, corrections to the energy shifts are given by the matrix elements of the diagrams in Fig. 20 calculated between free electron spinors with all external electron lines on the mass shell, projected on the respective spin states, and multiplied by the square of the Schrödinger–Coulomb wave function at the origin [70].

It should be mentioned that some of the diagrams under consideration contain contributions of the previous order in $Z\alpha$. These contributions are produced by the terms proportional to the exchanged momentum squared in the low-frequency asymptotic expansion of the radiative corrections, and are connected with integration over external photon momenta of the characteristic atomic scale $mZ\alpha$. The scattering approximation is inadequate for their calculation. In the skeleton integral approach these previous-order contributions arise as power-like infrared divergences in the final integration over the exchanged momentum. We subtract the leading terms in the low-frequency asymptotic expansions of the integrands, when necessary, and thus remove the spurious previous-order contributions.

4.3.3.1. One-loop polarization insertions in the Coulomb lines. The simplest correction is induced by the diagrams in Fig. 20(a) with two insertions of the one-loop vacuum polarization in the external photon lines. The contribution to the Lamb shift is given by the insertion of the one-loop polarization operator squared, $I_1^2(k)$, in the skeleton integral in Eq. (66), and, taking into account the multiplicity factor 3, one easily obtains [70–72]

$$\Delta E=-\frac{48\alpha^2(Z\alpha)^5}{\pi^3n^3}\left(\frac{m_r}{m}\right)^3m\int_0^\infty dk\,I_1^2(k)\,\delta_{l0}=-\frac{23}{378}\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}. \tag{74}$$
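A minimal numerical check of Eq. (74) is sketched below (Python, standard library only). It uses the representation (69) of $I_1(k)$ and, as an intermediate step not spelled out in the text, performs the $k$ integration analytically: with $a=1-v^2$ and $a'=1-v'^2$ one has $\int_0^\infty dk/[(4+ak^2)(4+a'k^2)]=\pi/[16(\sqrt a+\sqrt{a'})]$. After the substitution $v=\sin\theta$, $\int_0^\infty dk\,I_1^2(k)$ becomes a smooth double integral over $[0,\pi/2]^2$, evaluated here with a midpoint rule. The function names and the grid size are illustrative choices.

```python
import math

def rho(theta: float) -> float:
    """Spectral weight v^2 (1 - v^2/3) of Eq. (69), written with v = sin(theta)."""
    s2 = math.sin(theta) ** 2
    return s2 * (1.0 - s2 / 3.0)

def integral_I1_squared(n: int = 300) -> float:
    """Approximate int_0^inf dk I_1(k)^2 by a 2D midpoint rule.

    After the analytic k integration:
      int dk I_1^2 = (pi/16) * int dtheta dtheta' rho rho' cos cos' / (cos + cos'),
    where each cos(theta) in the numerator comes from dv = cos(theta) dtheta.
    """
    h = (math.pi / 2) / n
    nodes = [(i + 0.5) * h for i in range(n)]
    g = [rho(t) * math.cos(t) for t in nodes]   # rho(theta) * cos(theta)
    c = [math.cos(t) for t in nodes]            # sqrt(1 - v^2) = cos(theta)
    total = 0.0
    for i in range(n):
        gi, ci = g[i], c[i]
        for j in range(n):
            total += gi * g[j] / (ci + c[j])
    return (math.pi / 16) * total * h * h

def coefficient_eq74(n: int = 300) -> float:
    """Magnitude of the coefficient of alpha^2 (Z alpha)^5 m / (pi n^3) in Eq. (74)."""
    return 48.0 * integral_I1_squared(n) / math.pi ** 2

if __name__ == "__main__":
    print(f"numerical: {coefficient_eq74():.6f}   23/378 = {23/378:.6f}")
```

Doing the $k$ integral analytically removes the slowly decaying tail of $I_1^2(k)$, so a plain midpoint rule is sufficient.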
4.3.3.2. Insertions of the irreducible two-loop polarization in the Coulomb lines. The naive insertion $1/k^2\to(\alpha/\pi)^2I_2(k)$ of the irreducible two-loop vacuum polarization operator $I_2(k)$ [54,55] in the skeleton integral in Eq. (66) would lead to an infrared divergent integral for the diagrams in Fig. 20(b). This divergence reflects the existence of the correction of the previous order in $Z\alpha$ connected with the two-loop irreducible polarization. This contribution of order $\alpha^2(Z\alpha)^4m$ was discussed in Section 4.2.2.3, and, as we have seen, the respective contribution to the Lamb shift is given simply by the product of the Schrödinger–Coulomb wave function squared at the origin and the leading low-frequency term $I_2(0)$. In terms of the loop momentum integration this means that the relevant loop momenta are of the atomic scale $mZ\alpha$. Subtraction of the value $I_2(0)$ from the function $I_2(k)$ effectively removes the previous-order contribution (the low-momentum region) from the loop integral, and one obtains the radiative correction of order $\alpha^2(Z\alpha)^5m$ generated by the irreducible two-loop polarization operator [70–72]

$$\Delta E=-\frac{32\alpha^2(Z\alpha)^5}{\pi^3n^3}\left(\frac{m_r}{m}\right)^3m\int_0^\infty\frac{dk}{k^2}\left[I_2(k)-I_2(0)\right]\delta_{l0}=\left(\frac{52}{63}\ln2-\frac{25}{63}\pi+\frac{15\,647}{13\,230}\right)\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}. \tag{75}$$
4.3.3.3. Insertion of the one-loop electron factor in the electron line and of the one-loop polarization in the Coulomb lines. The next correction of order $\alpha^2(Z\alpha)^5$ is generated by the gauge-invariant set of diagrams in Fig. 20(c). The respective analytic expression is obtained from the skeleton integral by simultaneous insertion in the integrand of the one-loop polarization function $I_1(k)$ and of the expressions corresponding to all possible insertions of the radiative photon in the electron line. It is simpler first to obtain an explicit analytic expression for the sum of all these radiative insertions in the electron line, which we call the one-loop electron factor $L(k)$ (explicit expressions for the electron factor in different forms may be found in [36,73–75]), and then to insert this electron factor in the skeleton integral. It is easy to check explicitly that the resulting integral for the radiative correction is both ultraviolet and infrared finite. The infrared finiteness nicely correlates with the physical understanding that for these diagrams there is no correction of order $\alpha^2(Z\alpha)^4$ generated at the atomic scale. The respective integral for the radiative correction was calculated both numerically and analytically [73,71,75], and the result has the following elegant form:

$$\Delta E=-\frac{32\alpha^2(Z\alpha)^5}{\pi^3n^3}\left(\frac{m_r}{m}\right)^3m\int_0^\infty dk\,L(k)I_1(k)\,\delta_{l0}$$
$$=\left[\frac{8}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{872}{63}\sqrt5\ln\frac{1+\sqrt5}{2}+\frac{628}{63}\ln2-\frac{2\pi^2}{9}+\frac{67\,282}{6615}\right]\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}. \tag{76}$$
4.3.3.4. One-loop polarization insertions in the radiative electron factor. This correction is induced by the gauge-invariant set of diagrams in Fig. 20(d) with the polarization operator insertions in the radiative photon. The respective radiatively corrected electron factor is given by the expression [74]

$$\mathcal{L}(k)=\int_0^1dv\,\frac{v^2(1-v^2/3)}{1-v^2}\,L(k,\lambda), \tag{77}$$

where $L(k,\lambda)$ is just the one-loop electron factor used in Eq. (76), but with a finite photon mass $\lambda^2=4/(1-v^2)$. Direct substitution of the radiatively corrected electron factor $\mathcal{L}(k)$ in the skeleton integral in Eq. (66) would lead to an infrared divergence. This divergence reflects the existence in this case of the correction of the previous order in $Z\alpha$ generated by the two-loop insertions in the electron line. The magnitude of this previous-order correction is determined by the nonvanishing value of the electron factor $\mathcal{L}(k)$ at zero,

$$\mathcal{L}(0)=-2F_1'(0)-F_2(0), \tag{78}$$

which is simply a linear combination of the slope of the two-loop Dirac form factor and the two-loop contribution to the electron anomalous magnetic moment. Subtraction of $\mathcal{L}(0)$ from the radiatively corrected electron factor removes this previous-order contribution, which was already considered above, and leads to a finite integral for the correction of order $\alpha^2(Z\alpha)^5$ [74,71]:

$$\Delta E=-\frac{16\alpha^2(Z\alpha)^5}{\pi^3n^3}\left(\frac{m_r}{m}\right)^3m\int_0^\infty\frac{dk}{k^2}\left[\mathcal{L}(k)-\mathcal{L}(0)\right]\delta_{l0}=-0.072902\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}. \tag{79}$$
4.3.3.5. Light by light scattering insertions in the external photons. The diagrams in Fig. 20(e) with the light by light scattering insertions in the external photons do not generate corrections of the previous order in $Z\alpha$. They are both ultraviolet and infrared finite, and the respective calculations are in principle quite straightforward, though technically involved. Only numerical results were obtained for the contributions to the Lamb shift [71,76]:

$$\Delta E=-0.12292\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}. \tag{80}$$
4.3.3.6. Diagrams with insertions of two radiative photons in the electron line. As we have already seen, contributions of the diagrams with radiative insertions in the electron line always dominate over the contributions of the diagrams with radiative insertions in the external photon lines. This property of the diagrams is due to the gauge invariance of QED. The diagrams (radiative insertions) with the external photon lines should be gauge invariant, and as a result transverse projectors correspond to each external photon. These projectors are rational functions of the external momenta, and they additionally suppress the low-momentum integration regions in the integrals for the energy shifts. Such projectors are of course missing in the diagrams with insertions in the electron line. The low-momentum integration region is less suppressed in such diagrams, and hence they generate larger contributions to the energy shifts.

This general property of radiative corrections clearly manifests itself in the case of the six gauge-invariant sets of diagrams in Fig. 20. By far the largest contribution of order $\alpha^2(Z\alpha)^5$ to the Lamb shift is generated by the last gauge-invariant set of diagrams in Fig. 20(f), which consists of nineteen topologically different diagrams [77] presented in Fig. 21. These nineteen graphs may be obtained from the three graphs for the two-loop electron self-energy by insertion of two external photons in all possible ways. Graphs in Fig. 21(a)–(c) are obtained from the two-loop reducible electron self-energy diagram, graphs in Fig. 21(d)–(k) are the result of all possible insertions of two external photons in the rainbow self-energy diagram, and the diagrams in Fig. 21(l)–(s) are connected with the overlapping two-loop self-energy graph. Calculation of the respective energy shift was initiated in [77,78], where the contributions induced by the diagrams in Fig. 21(a)–(h) and in Fig. 21(l) were obtained. The contribution of all nineteen diagrams to the Lamb shift was first calculated in [79]. In the framework of the skeleton integral approach the calculation was completed in [80,62] with the result

$$\Delta E=-7.725(1)\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}, \tag{81}$$
which confirmed the one in [79] but is about two orders of magnitude more precise than the result in [79,14]. A few comments are due on the magnitude of this important result. It is sometimes claimed in the literature that it has an unexpectedly large magnitude. A brief glance at Table 3 is sufficient to convince oneself that this is not the case. For the reader who closely followed the discussion of the scales of different contributions above, it should be clear that the natural scale for the correction under discussion is set by the factor $4\alpha^2(Z\alpha)^5m/(\pi n^3)$. The coefficient before this factor obtained in Eq. (81) is about −1.9, and there is nothing unusual in its magnitude for a numerical factor corresponding to a radiative correction. It should be compared with the respective coefficient 0.739 before the factor $4\alpha(Z\alpha)^5m/n^3$ in the case of the electron-line contribution of the previous order in $\alpha$.
Fig. 21. Nineteen topologically different diagrams with insertions of two radiative photons in the electron line.
The misunderstanding about the magnitude of the correction of order $\alpha^2(Z\alpha)^5m$ has its roots in the idea that the expansion of the energy in a series over the parameter $Z\alpha$ at a fixed power of $\alpha$ should have coefficients of order one. As is clear from the numerous discussions above, however natural such an expansion might seem from the point of view of calculations performed without expansion over $Z\alpha$, there are no real reasons to expect that the coefficients in an expansion of this kind would be of the same order of magnitude. We have already seen that quite different physics is connected with the different terms in the expansion over $Z\alpha$. The terms of order $\alpha^n(Z\alpha)^4$ (and $\alpha^n(Z\alpha)^6$, as we will see below) are generated at large distances (exchanged momenta of the order of the atomic scale $mZ\alpha$), while terms of order $\alpha^n(Z\alpha)^5$ originate from small distances (exchanged momenta of the order of the electron mass $m$). Hence, one should not expect a simple way to figure out the relative magnitude of the successive coefficients in an expansion over $Z\alpha$. The situation is different for the expansion over $\alpha$ at a fixed power of $Z\alpha$, since the physics is the same independent of the power of $\alpha$, and the respective coefficients are all of order one, as in the series for the radiative corrections in scattering problems.
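Table 3
Radiative corrections of order $\alpha^n(Z\alpha)^5m$. Coefficients are given in units of $\frac{4\alpha(Z\alpha)^5}{n^3}\left(\frac{m_r}{m}\right)^3m$.

| Contribution and references | Coefficient | ΔE(1S) (kHz) | ΔE(2S) (kHz) |
|---|---|---|---|
| Electron-line insertions; Karplus et al. [64,65], Baranger et al. [66] | $\left(1+\frac{11}{128}-\frac{1}{2}\ln2\right)\delta_{l0}$ | 55 090.31 | 6886.29 |
| Polarization contribution; Karplus et al. [64,65], Baranger et al. [66] | $\frac{5}{192}\delta_{l0}$ | 1940.38 | 242.55 |
| One-loop polarization; Eides [70], Pachucki and Laporta [71,72] | $-\frac{23}{1512}\frac{\alpha}{\pi}\delta_{l0}$ | −2.63 | −0.33 |
| Two-loop polarization; Eides et al. [70], Pachucki and Laporta [71,72] | $\left(\frac{13}{63}\ln2-\frac{25}{252}\pi+\frac{15\,647}{52\,920}\right)\frac{\alpha}{\pi}\delta_{l0}$ | 21.99 | 2.75 |
| One-loop polarization and electron factor; Eides and Grotch [73], Pachucki [71], Eides et al. [75] | $\left[\frac{2}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{218}{63}\sqrt5\ln\frac{1+\sqrt5}{2}+\frac{157}{63}\ln2-\frac{\pi^2}{18}+\frac{33\,641}{13\,230}\right]\frac{\alpha}{\pi}\delta_{l0}$ | 26.45 | 3.31 |
| Polarization insertion in the electron factor; Eides and Grotch [74], Pachucki [71] | $-0.0182\,\frac{\alpha}{\pi}\delta_{l0}$ | −3.15 | −0.39 |
| Light by light scattering; Pachucki [71], Eides et al. [76] | $-0.0307\,\frac{\alpha}{\pi}\delta_{l0}$ | −5.31 | −0.66 |
| Insertions of two radiative photons in the electron line; Pachucki [79], Eides and Shelyuto [80,62] | $-1.9312(3)\,\frac{\alpha}{\pi}\delta_{l0}$ | −334.24(5) | −41.78 |
| Order $\alpha^3(Z\alpha)^5m$ (not yet calculated; natural scale) | $(\pm1?)\left(\frac{\alpha}{\pi}\right)^2\delta_{l0}$ | ±0.4 | ±0.05 |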
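The kHz columns of Table 3 follow from the coefficients by multiplying with the unit $4\alpha(Z\alpha)^5(m_r/m)^3m/n^3$, and with one extra factor $\alpha/\pi$ for the two-loop rows. The sketch below recomputes several entries for $Z=1$; as an assumption of this sketch, the unit is derived from the Table 2 unit $3\,250\,137.65$ kHz times $\pi\alpha$ with an assumed input value of $\alpha$, so agreement is expected only to the quoted rounding.

```python
import math

ALPHA = 7.2973525693e-3      # assumed input value of the fine-structure constant
UNIT_ZA4_KHZ = 3_250_137.65  # Table 2 unit for Z = 1, n = 1, in kHz
A_PI = ALPHA / math.pi
LN2 = math.log(2.0)
PHI = (1.0 + math.sqrt(5.0)) / 2.0

def unit_khz(n: int) -> float:
    """4 alpha (Z alpha)^5 (m_r/m)^3 m / n^3 for Z = 1, in kHz."""
    return UNIT_ZA4_KHZ * math.pi * ALPHA / n**3

# Table 3 coefficients; two-loop rows carry one factor alpha/pi.
coefficients = {
    "electron line": 1 + 11/128 - LN2/2,
    "polarization": 5/192,
    "one-loop pol. squared": -23/1512 * A_PI,
    "two-loop pol.": (13/63*LN2 - 25/252*math.pi + 15647/52920) * A_PI,
    "pol. and electron factor": (2/3*math.log(PHI)**2
                                 - 218/63*math.sqrt(5)*math.log(PHI)
                                 + 157/63*LN2 - math.pi**2/18
                                 + 33641/13230) * A_PI,
    "two radiative photons": -1.9312 * A_PI,
}

def shifts_khz(n: int) -> dict:
    """Energy shifts (kHz) of the Table 3 rows for an nS state."""
    u = unit_khz(n)
    return {name: c * u for name, c in coefficients.items()}

if __name__ == "__main__":
    for n in (1, 2):
        for name, value in shifts_khz(n).items():
            print(f"n={n}  {name:26s} {value:12.2f} kHz")
    # Natural scale of the uncalculated alpha^3 (Z alpha)^5 terms (Section 4.3.4).
    print(f"(alpha/pi)^2 scale, 1S: {A_PI**2 * unit_khz(1):.2f} kHz")
```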
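4.3.3.7. Total correction of order $\alpha^2(Z\alpha)^5m$. The total contribution of order $\alpha^2(Z\alpha)^5$ is given by the sum of the contributions in Eqs. (74)–(76) and (79)–(81) [75]:

$$\Delta E=\left[\frac{8}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{872}{63}\sqrt5\ln\frac{1+\sqrt5}{2}+\frac{680}{63}\ln2-\frac{2\pi^2}{9}-\frac{25\pi}{63}+\frac{24\,901}{2205}-7.921(1)\right]\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}=-6.862(1)\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}, \tag{82}$$

$$\Delta E=-6.862(1)\,\frac{\alpha^2(Z\alpha)^5}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\delta_{l0}=-296.92(4)\ \text{kHz}\ (n=1),\quad -37.115(5)\ \text{kHz}\ (n=2). \tag{83}$$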
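The rational constant $24\,901/2205$ and the total coefficient in Eq. (82) can be reproduced from the individual results (74)–(76) and (79)–(81). The sketch below uses exact fractions for the rational parts and floating point for the transcendental ones; the variable names and grouping are illustrative.

```python
from fractions import Fraction
import math

# Rational constants of Eqs. (74), (75), (76), in units of alpha^2 (Z alpha)^5 m / (pi n^3).
rational_parts = [Fraction(-23, 378), Fraction(15647, 13230), Fraction(67282, 6615)]
rational_total = sum(rational_parts, Fraction(0))

phi = (1 + math.sqrt(5)) / 2
analytic = (8/3 * math.log(phi)**2
            - 872/63 * math.sqrt(5) * math.log(phi)
            + 680/63 * math.log(2)      # 52/63 from Eq. (75) plus 628/63 from Eq. (76)
            - 2 * math.pi**2 / 9
            - 25 * math.pi / 63
            + float(rational_total))

numerical = -0.072902 - 0.12292 - 7.725  # Eqs. (79), (80), (81)
total = analytic + numerical

if __name__ == "__main__":
    print(rational_total)                 # prints 24901/2205
    print(f"{numerical:.4f}  {total:.4f}")
```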
4.3.4. Corrections of order $\alpha^3(Z\alpha)^5m$

Corrections of order $\alpha^3(Z\alpha)^5$ have not been considered in the literature. From the preceding discussion it is clear that their natural scale is determined by the factor $4\alpha^3(Z\alpha)^5m/(\pi^2n^3)$, which is about 0.4 kHz for the 1S-state and about 0.05 kHz for the 2S-state. Taking into account the rapid experimental progress in the field, these theoretical calculations may become necessary in the future, if an experimental accuracy of about 1 kHz is achieved in the measurement of the 1S Lamb shift.

4.4. Radiative corrections of order $\alpha^n(Z\alpha)^6m$

4.4.1. Radiative corrections of order $\alpha(Z\alpha)^6m$

4.4.1.1. Logarithmic contribution induced by the radiative insertions in the electron line. Unlike the corrections of order $\alpha^n(Z\alpha)^5$, corrections of order $\alpha^n(Z\alpha)^6$ depend on the large-distance behavior of the wave functions. Roughly speaking, this happens because in order to produce a correction containing six factors of $Z\alpha$ one needs at least three exchange photons, as in Fig. 22. The radiative photon responsible for the additional factor of $\alpha$ does not completely suppress the low-momentum region of the exchange integrals. As usual, long-distance contributions turn out to be state dependent. The leading correction of order $\alpha(Z\alpha)^6$ contains a logarithm squared, which can be compared to the first power of the logarithm in the leading-order contribution to the Lamb shift. One can understand the appearance of the logarithm squared factor qualitatively. In the leading-order contribution to the Lamb shift, the logarithm was entirely connected with the logarithmic infrared singularity of the electron form factor. Now we have two exchanged loops, and one should anticipate the emergence of an exchanged logarithm generated by these loops.
Fig. 22. Diagram with three spanned Coulomb photons.

Note that a diagram with one exchange loop (e.g., relevant for the correction of order $\alpha(Z\alpha)^5$) cannot produce a logarithm, since in the external field approximation the loop integration measure $d^3k$ is odd in the exchanged momentum, while all other factors in the exchanged integral are even in the exchanged momentum. Hence, in order to produce a logarithm, which can only arise from a dimensionless integrand, it is necessary to consider an even number of exchanged loops. These simple remarks may also be understood in another way if one recollects that in the relativistic corrections to the Schrödinger–Coulomb wave function each power of the logarithm is multiplied by the factor $(Z\alpha)^2$ (this is evident if one expands the exact Dirac wave function near the origin). The logarithm squared term is, of course, state independent, since the coefficient before this term is determined by the high-momentum integration region, where the dependence on the principal quantum number may enter only via the value of the wave function at the origin squared. Terms linear in the large logarithm are already state dependent. Logarithmic terms were first calculated in [81–84]. For the S-states the logarithmic contribution is equal to

$$\Delta E=\left\{-\frac{1}{4}\ln^2\frac{m(Z\alpha)^{-2}}{m_r}+\left[\frac{4}{3}\ln2+\ln\frac{2}{n}+\psi(n+1)-\psi(1)-\frac{601}{720}-\frac{77}{180n^2}\right]\ln\frac{m(Z\alpha)^{-2}}{m_r}\right\}\frac{4\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3m, \tag{84}$$

where
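$$\psi(n)=\sum_{k=1}^{n-1}\frac{1}{k}+\psi(1) \tag{85}$$

is the logarithmic derivative of the Euler $\Gamma$-function, $\psi(x)=\Gamma'(x)/\Gamma(x)$, $\psi(1)=-C$, where $C$ is the Euler constant. For non-S-states the state-independent logarithm squared term disappears, and the single-logarithmic contribution has the form

$$\Delta E_{l\neq0}=\left[\left(1-\frac{1}{n^2}\right)\left(\frac{1}{30}+\frac{1}{12}\delta_{j,1/2}\right)+\frac{6-2l(l+1)/n^2}{3(2l+3)l(l+1)(4l^2-1)}\right]\ln\frac{m(Z\alpha)^{-2}}{m_r}\,\frac{4\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3m. \tag{86}$$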
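For orientation, the bracketed single-logarithm coefficients of Eqs. (84) and (86) are easy to tabulate. The sketch below evaluates them using $\psi(n+1)-\psi(1)=\sum_{k=1}^{n}1/k$, which follows from Eq. (85); the function names are hypothetical. Multiplying a bracket by 4 gives the coefficient of $\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3m\,\ln\frac{m(Z\alpha)^{-2}}{m_r}$.

```python
import math
from fractions import Fraction

def harmonic(n: int) -> float:
    """psi(n+1) - psi(1) = sum_{k=1}^{n} 1/k, cf. Eq. (85)."""
    return sum(1.0 / k for k in range(1, n + 1))

def s_state_single_log(n: int) -> float:
    """Bracket multiplying ln[m (Z alpha)^-2 / m_r] in Eq. (84)."""
    return (4/3 * math.log(2) + math.log(2 / n) + harmonic(n)
            - 601/720 - 77 / (180 * n**2))

def non_s_single_log(n: int, l: int, j: Fraction) -> float:
    """Bracket multiplying ln[m (Z alpha)^-2 / m_r] in Eq. (86), for 1 <= l <= n - 1."""
    if l not in range(1, n):
        raise ValueError("Eq. (86) applies to 1 <= l <= n - 1")
    delta_j_half = 1.0 if j == Fraction(1, 2) else 0.0
    first = (1 - 1 / n**2) * (1/30 + delta_j_half / 12)
    second = (6 - 2 * l * (l + 1) / n**2) / (3 * (2*l + 3) * l * (l + 1) * (4*l**2 - 1))
    return first + second

if __name__ == "__main__":
    for n in (1, 2):
        print(f"{n}S:    {s_state_single_log(n):.6f}")
    print(f"2P1/2: {non_s_single_log(2, 1, Fraction(1, 2)):.6f}")
    print(f"2P3/2: {non_s_single_log(2, 1, Fraction(3, 2)):.6f}")
```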
Calculation of the state-dependent nonlogarithmic contribution of order $\alpha(Z\alpha)^6$ is a difficult task and has not been done for an arbitrary principal quantum number $n$. The first estimate of this contribution was made in [84]. Next the problem was attacked from a different angle [85,86]. Instead of calculating corrections of order $\alpha(Z\alpha)^6$, an exact numerical calculation of all contributions with one radiative photon, without expansion over $Z\alpha$, was performed for comparatively large values of $Z$ ($n=2$), and then the result was extrapolated to $Z=1$. In this way an estimate of the sum of the contribution of order $\alpha(Z\alpha)^6$ and the higher-order contributions $\alpha(Z\alpha)^7$ was obtained (for $n=2$ and $Z=1$). We postpone the discussion of the results obtained in this way to Section 4.5.1, which deals with corrections of order $\alpha(Z\alpha)^7$, and consider here only the direct calculations of the contribution of order $\alpha(Z\alpha)^6$. An exact formula in $Z\alpha$ for all nonrecoil corrections of order $\alpha$ has the form

$$\Delta E=\langle n|\Sigma|n\rangle, \tag{87}$$

where $\Sigma$ is an "exact" second-order self-energy operator for the electron in the Coulomb field (see Fig. 23), and hence contains the unmanageable exact Dirac–Coulomb Green function. The real problem with this formula is to extract useful information from it despite the absence of a convenient expression for the Dirac–Coulomb Green function. The numerical calculation without expansion over $Z\alpha$ mentioned in the previous paragraph was performed directly with the help of this formula.

Fig. 23. Exact second-order self-energy operator.

A more precise (than in [84]) value of the nonlogarithmic correction of order $\alpha(Z\alpha)^6$ for the 1S-state was obtained in [87,88] with the help of a specially developed "perturbation theory" for the Dirac–Coulomb Green function, which expressed this function in terms of the nonrelativistic Schrödinger–Coulomb Green function [89,90]. But the real breakthrough was achieved in [91,92], where a new, very effective method of calculation was suggested and very precise values of the nonlogarithmic corrections of order $\alpha(Z\alpha)^6$ for the 1S- and 2S-states were obtained. We will briefly discuss the approach of papers [91,92] in the next subsection.

4.4.1.2. New approach to separation of the high- and low-momentum contributions. Nonlogarithmic corrections. Starting with the very first nonrelativistic consideration of the main contribution to the Lamb shift [32], separation of the contributions of high- and low-frequency radiative photons became a characteristic feature of Lamb shift calculations. The main idea of this approach was already explained in Section 4.2.1, but we skipped over two obstacles impeding effective implementation of this idea. Both problems are connected with the effective realization of the matching procedure. In real calculations it is not always obvious how to separate the two integration regions in a consistent way, since in the high-momentum region one uses explicitly relativistic expressions, while the starting point of the calculation in the low-momentum region is the nonrelativistic dipole approximation. The problem is aggravated by the inclination to use different gauges in different regions, since the explicitly covariant Feynman gauge is the simplest one for explicitly relativistic expressions in the high-momentum region, while the Coulomb gauge is the gauge of choice in the nonrelativistic region.
In order to emphasize the seriousness of these problems, it suffices to mention that incorrect matching of high- and low-frequency contributions in the initial calculations of Feynman and Schwinger led to a significant delay in the publication of the first fully relativistic Lamb shift calculation of French and Weisskopf [34]! It was a strange irony of history that, due to these difficulties, it became common wisdom in the sixties that it is better to avoid the separation of the contributions coming from different momentum regions (or different distances) than to try to invent an accurate matching procedure. A few citations are appropriate here. Bjorken and Drell [19] wrote, having in mind the separation procedure: "The reader may understandably be unhappy with this procedure … we recommend the recent treatment of Erickson and Yennie [83,84], which avoids the division into soft and hard photons". Schwinger [55] wrote: "… there is a moral here for us. The artificial separation of high and low frequencies, which are handled in different ways, must be avoided." All this was written even though it was understood that the separation of large and small distances was physically quite natural and that the contributions coming from large and small distances have a different physical nature. However, the distrust of the methods used for separation of small and large distances was well justified by the lack of a regular method of separation. Different methods were used for calculation of the high- and low-frequency contributions, the high-frequency contributions being commonly treated in a covariant four-dimensional approach, while old-fashioned nonrelativistic perturbation theory was used for calculation of the low-frequency contributions. Matching these contributions obtained in different frameworks was an ambiguous and far from obvious procedure, more art than science. As a result, despite the fact that the methods based on separation of long- and short-distance contributions had led to some spectacular results (see, e.g., [94,95]), their self-consistency remained suspect, especially when it was necessary to calculate contributions of higher order than in the classic works. It seemed more or less obvious that in order to facilitate such calculations one needed to develop uniform methods for the treatment of both small and large distances.

Footnote to the French and Weisskopf episode: see the fascinating description of this episode in [93].

The actual development took, however, a different direction. Instead of rejecting the separation of high and low frequencies, more elaborate methods of matching the respective contributions were developed in the last decade, and the general attitude to separation of small and large distances radically changed. Perhaps the first step to carefully separate the long and short distances was made in [25], where the authors rearranged the old-fashioned perturbation theory in such a way that one contribution emphasized the small-momentum contributions and led to a Bethe logarithm, while in the other the small-momentum integration region was naturally suppressed. Matching of both contributions in this approach was more natural and automatic. However, the price for this was perhaps too high, since the high-momentum contribution had to be calculated in a three-dimensional way, thus losing all advantages of the covariant four-dimensional methods.
Almost all new approaches (the skeleton integral approach described above in Section 4.3.1 ([62] and references therein), the ε-method described in this section [91,92], the nonrelativistic approach of Khriplovich and coworkers [96], and the nonrelativistic QED of Caswell and Lepage [17]) not only separate small and large distances but try to exploit this separation as effectively as possible. In some cases, when the whole contribution comes only from small distances, a rather simple approach is appropriate (as in the calculation of the corrections of order $\alpha^n(Z\alpha)^5$ above; more examples below), and the scattering approximation is often sufficient. In such cases would-be infrared divergences are powerlike. They simply indicate the presence of contributions of the previous order in $Z\alpha$ and may safely be discarded. In other cases, when one encounters logarithms that receive contributions from both small and large distances, a more accurate approach, such as the one described below, is necessary. In any case, “the separation of low and high frequencies, which are handled in different ways” not only should not be avoided but turns out to be a very convenient calculational tool that clarifies the physical nature of the corrections under consideration. An effective method to separate low- and high-momentum contributions while avoiding the problems discussed above was suggested in [91,92]. Consider in more detail the exact expression Eq. (87) for the sum of all corrections of order $\alpha(Z\alpha)^n m$ ($n \ge 1$) generated by the insertion of one radiative photon in the electron line
$$\Delta E = e^2\int\frac{d^4k}{(2\pi)^4 i}\,D_{\mu\nu}(k)\,\langle n|\gamma_\mu\,G(p-k;\,p-k)\,\gamma_\nu|n\rangle, \qquad (88)$$
where $G(p-k;\,p-k)$ is the exact electron Green function in the external Coulomb field. As was noted in [91,92], one can rotate the integration contour over the frequency of the radiative photon so that it encloses the singularities along the positive real axis in the complex plane of the photon frequency $\omega = k_0$. One then considers separately the region $\mathrm{Re}\,\omega \le \sigma$ (region I) and the region $\mathrm{Re}\,\omega \ge \sigma$ (region II), where $m(Z\alpha)^2 \ll \sigma \ll m(Z\alpha)$. Owing to the structure of the singularities of the integrand, the integration over $\mathbf{k}$ in region I also runs only over momenta smaller than $\sigma$ ($|\mathbf{k}| \le \sigma$), while in region II the final integration over $\omega$ cuts off all would-be infrared divergences of the integral. In this way an effective separation of the high- and low-momentum integration regions is achieved and, as explained above, owing to the choice of the magnitude of the parameter $\sigma$, all would-be divergences cancel exactly in the sum of the contributions of the two regions. This cancellation provides an additional effective check of the accuracy of the calculations. It was also shown in [92] that a change of gauge in the low-frequency region changes the result by a term linear in $\sigma$; such terms are discarded anyway when the high- and low-frequency contributions are matched. The matrix element of the self-energy operator between the exact Coulomb–Dirac wave functions is gauge invariant with respect to changes of gauge of the radiative photon [97]. Hence, one may use the simple Feynman gauge for the high-momentum contribution and the physical Coulomb gauge for the low-momentum part. This method thus resolves all problems connected with the separation of the high- and low-momentum contributions and provides an effective tool for calculating all corrections with the insertion of one radiative photon in the electron line.
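The mechanism by which the dependence on the splitting parameter $\sigma$ drops out can be illustrated with a deliberately simple toy integral. This is a loose analogy, not the actual self-energy integrand of [91,92]: the hypothetical binding scale `E` stands in for $m(Z\alpha)^2$ and `cutoff` for $m$. Below $\sigma$ the "exact" integrand $1/(k+E)$ is kept; above $\sigma$ its expansion $1/k - E/k^2$ is used. Each piece contains $\ln\sigma$, but the sum reproduces the exact integral up to terms of order $(E/\sigma)^2$.

```python
import math

def low_region(E, sigma):
    """Exact integral of 1/(k + E) for 0 <= k <= sigma (binding-sensitive region)."""
    return math.log((sigma + E) / E)

def high_region(E, sigma, cutoff):
    """Integral of the expanded integrand 1/k - E/k**2 for sigma <= k <= cutoff."""
    return math.log(cutoff / sigma) + E / cutoff - E / sigma

def exact_total(E, cutoff):
    """Exact integral of 1/(k + E) for 0 <= k <= cutoff, for comparison."""
    return math.log((cutoff + E) / E)

E, CUTOFF = 1e-4, 1.0  # toy stand-ins for m(Z alpha)^2 and m
for sigma in (1e-3, 1e-2, 1e-1):
    split = low_region(E, sigma) + high_region(E, sigma, CUTOFF)
    # Each piece depends on ln(sigma); the sum differs from the exact result only by O((E/sigma)^2)
    print(f"sigma={sigma:g}  low={low_region(E, sigma):.4f}  "
          f"split-exact={split - exact_total(E, CUTOFF):.2e}")
```

The low-region piece alone changes by more than 4 as $\sigma$ runs over two decades, while the residual of the sum shrinks quadratically in $E/\sigma$, which is the toy counterpart of the requirement $m(Z\alpha)^2 \ll \sigma$.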
The calculation performed in [91,92,98] successfully reproduced all results of order $\alpha(Z\alpha)^4$ and $\alpha(Z\alpha)^5$ and produced a high-precision result for the constant of order $\alpha(Z\alpha)^6$:
$$\Delta E(1S) = -30.92415(1)\times\frac{\alpha(Z\alpha)^6}{\pi}\left(\frac{m_r}{m}\right)^3 m, \qquad \Delta E(2S) = -31.84047(1)\times\frac{\alpha(Z\alpha)^6}{8\pi}\left(\frac{m_r}{m}\right)^3 m. \qquad (89)$$
Besides the high accuracy of this result, two other features should be mentioned. First, the state dependence of the constant is very weak, and second, the scale of the constant is just of the magnitude one should expect. To make this last point more transparent, let us write the total electron-line contribution of order $\alpha(Z\alpha)^6$ to the 1S energy shift in the form
$$\Delta E(1S) = \left[-\ln^2\frac{m(Z\alpha)^{-2}}{m_r} + \left(\frac{28}{3}\ln 2 - \frac{21}{20}\right)\ln\frac{m(Z\alpha)^{-2}}{m_r} - 30.92890\right]\frac{\alpha(Z\alpha)^6}{\pi}\left(\frac{m_r}{m}\right)^3 m \approx \left[-\ln^2\frac{m(Z\alpha)^{-2}}{m_r} + 5.42\,\ln\frac{m(Z\alpha)^{-2}}{m_r} - 30.93\right]\frac{\alpha(Z\alpha)^6}{\pi}\left(\frac{m_r}{m}\right)^3 m. \qquad (90)$$
Now we see that the ratio of the nonlogarithmic term to the coefficient of the single-logarithmic term is about $31/5.4 \approx 5.7 \approx 0.6\pi^2$. It is well known that logarithm squared terms in QED are always accompanied by single-logarithmic and nonlogarithmic terms, and that the nonlogarithmic terms are of order $\pi^2$ (in relation to the current problem see, e.g., [83,84]). This is just what happens in the present case, as we have demonstrated.
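The numbers quoted after Eq. (90) follow from simple arithmetic. The sketch below evaluates the single-logarithm coefficient $(28/3)\ln 2 - 21/20$ and the ratio of the constant term to it, and checks that this coefficient coincides with the $n = 1$ value of the general S-state single-logarithm bracket listed in Table 5 for the logarithmic electron-line contribution (times that table's common factor 4). The helper name `a61_ns` and the label A61 (by analogy with $A_{60}$ in Eq. (91)) are assumptions for illustration.

```python
import math

def a61_ns(n):
    """Single-log coefficient for an nS state: 4 times the bracket of the Table 5 entry
    for the logarithmic electron-line contribution (l = 0)."""
    harmonic = sum(1.0 / k for k in range(1, n + 1))   # psi(n+1) - psi(1) for integer n
    bracket = (4 / 3 * math.log(2) + math.log(2 / n) + harmonic
               - 601 / 720 - 77 / (180 * n * n))
    return 4 * bracket

coeff_1s = 28 / 3 * math.log(2) - 21 / 20     # single-log coefficient in Eq. (90)
ratio = 30.93 / coeff_1s                      # constant term over single-log coefficient
print(f"A61(1S) = {coeff_1s:.4f} (Table 5 form: {a61_ns(1):.4f}), "
      f"ratio = {ratio:.2f} = {ratio / math.pi ** 2:.2f} pi^2")
print(f"A61(2S) = {a61_ns(2):.4f}")
```

Analytically, $4[(4/3)\ln 2 + \ln 2 + 1 - 601/720 - 77/180] = (28/3)\ln 2 - 21/20$, so the two printed 1S values agree up to rounding.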
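Table 4
Nonlogarithmic coefficient $A_{60}$ and the corresponding contribution $A_{60}\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m$ to the hydrogen energy levels
State | Reference | $A_{60}$ | Contribution (kHz)
1S | Pachucki [91,92,98] | −30.92415(1) | −1338.04
2S | Pachucki [91,92] | −31.84047(1) | −172.21
$2P_{1/2}$ | Jentschura and Pachucki [99] | −0.99891(1) | −5.40
$2P_{3/2}$ | Jentschura and Pachucki [99] | −0.50337(1) | −2.72
$3P_{1/2}$ | Jentschura et al. [100] | −1.14768(1) | −1.84
$3P_{3/2}$ | Jentschura et al. [100] | −0.59756(1) | −0.96
$4P_{1/2}$ | Jentschura et al. [100] | −1.19568(1) | −0.81
$4P_{3/2}$ | Jentschura et al. [100] | −0.63094(1) | −0.43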
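The kHz column of Table 4 is the coefficient $A_{60}$ multiplied by the energy unit of Eq. (91). A minimal sketch, assuming CODATA 2018 values of the constants (they are not given in this review) and a proton nucleus, reproduces the tabulated values to within a few hundredths of a kHz; the function and table names are hypothetical.

```python
import math

# Assumed constants (CODATA 2018), not taken from the review
ALPHA = 7.2973525693e-3           # fine-structure constant
ELECTRON_FREQ_HZ = 1.23558996e20  # m_e c^2 / h in Hz
MP_OVER_ME = 1836.15267343        # proton-to-electron mass ratio

def a60_to_khz(a60, n, z=1):
    """A_60 * alpha (Z alpha)^6 / (pi n^3) * (m_r/m)^3 * m for hydrogen-like levels, in kHz."""
    mr_over_m = 1.0 / (1.0 + 1.0 / MP_OVER_ME)     # reduced-mass ratio for a proton nucleus
    unit = ALPHA * (z * ALPHA) ** 6 / (math.pi * n ** 3) * mr_over_m ** 3
    return a60 * unit * ELECTRON_FREQ_HZ / 1e3

TABLE_4 = [  # (state, n, A_60, tabulated contribution in kHz)
    ("1S", 1, -30.92415, -1338.04),
    ("2S", 2, -31.84047, -172.21),
    ("2P1/2", 2, -0.99891, -5.40),
    ("2P3/2", 2, -0.50337, -2.72),
    ("3P1/2", 3, -1.14768, -1.84),
    ("3P3/2", 3, -0.59756, -0.96),
    ("4P1/2", 4, -1.19568, -0.81),
    ("4P3/2", 4, -0.63094, -0.43),
]

for state, n, a60, khz in TABLE_4:
    print(f"{state:6s} {a60_to_khz(a60, n):10.2f}  (table: {khz})")
```

Small residual differences reflect the values of the constants used; the $1/n^3$ scaling alone explains why the 2S contribution is about eight times smaller than the 1S one despite the nearly equal coefficients.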
Nonlogarithmic contributions of order $\alpha(Z\alpha)^6$ to the energies of the 2P, 3P and 4P states induced by radiative photon insertions in the electron line were obtained in the same framework in [99,100]. We have collected the respective results in Table 4 in terms of the traditionally used coefficient $A_{60}$ [83], which is defined by the relation
$$\Delta E = A_{60}\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m. \qquad (91)$$
4.4.1.3. Correction induced by the radiative insertions in the external photons
There are two kernels with radiative insertions in the external photon lines which produce corrections of order $\alpha(Z\alpha)^6$ to the Lamb shift. The first is our old acquaintance, the one-loop polarization insertion in the Coulomb line in Fig. 9. Its Fourier transform is called the Uehling potential [40,101]. The second kernel contains the light-by-light scattering diagrams in Fig. 24 with three external photons originating from the Coulomb source. The sum of all closed electron loops in Fig. 25, with one photon connected to the electron line and an arbitrary number of Coulomb photons originating from the Coulomb source, may be considered as a radiatively corrected Coulomb potential $V$. It generates a shift of the atomic energy levels
$$\Delta E = \langle n|V|n\rangle. \qquad (92)$$
Fig. 24. Wichmann–Kroll potential.
Fig. 25. Total one-loop polarization potential in the external field.
This potential and its effect on the energy levels were first considered in [102]. Since each external Coulomb line brings an extra factor $Z\alpha$, the energy shift generated by the Wichmann–Kroll potential increases at large $Z$. For practical reasons, the effects of the Uehling and Wichmann–Kroll potentials were investigated mainly numerically and without expansion in $Z\alpha$, since only such results could be compared with experiment. Many numerical results for vacuum polarization contributions now exist. In accordance with our emphasis on analytic results, we discuss here only analytic contributions of order $\alpha(Z\alpha)^6$, and return to numerical results in Section 4.5.2.

a. Uehling potential contribution. It is not difficult to write an exact formula containing all corrections produced by the Uehling potential in Fig. 9 (compare with the respective expression for the self-energy operator above):
$$\Delta E = 4\pi(Z\alpha)\left\langle n\left|\frac{\Pi(k)}{k^4}\right|n\right\rangle. \qquad (93)$$
We have already seen that the matrix element of the first term in the low-momentum expansion of the one-loop polarization operator between nonrelativistic Schrödinger–Coulomb wave functions produces a correction of order $\alpha(Z\alpha)^4$. The next term in the low-momentum expansion of the polarization operator pushes the characteristic momenta in the integrand to relativistic values, where the nonrelativistic expansion itself is no longer valid, and even makes the integral divergent if one tries to calculate it between nonrelativistic wave functions. For this reason we preferred to calculate the correction of order $\alpha(Z\alpha)^5$ induced by the one-loop polarization insertion (as well as the correction of order $\alpha^2(Z\alpha)^5$ induced by the two-loop polarization) in the skeleton integral approach of Section 4.3. Note that all these corrections contribute only to S states. It is useful to realize that both these calculations are, from another point of view, simply approximate calculations of the integral in Eq. (93) with accuracy $(Z\alpha)^4$ and $(Z\alpha)^5$. Our next task is to calculate this integral with accuracy $(Z\alpha)^6$. In this order both small atomic ($\sim mZ\alpha$) and large relativistic ($\sim m$) momenta produce nonvanishing contributions to the integral and, as a result, we also get nonvanishing contributions to the energy shifts of states with nonvanishing angular momenta. Consider first the corrections to the energy levels with nonvanishing angular momenta. The respective wave functions vanish at the origin in coordinate space, hence only small photon momenta contribute to the integral, and one can use the first two terms in the nonrelativistic
expansion of the polarization operator
$$\frac{\Pi(k)}{k^4} \approx -\frac{\alpha}{15\pi m^2} + \frac{\alpha k^2}{140\pi m^4} \qquad (94)$$
for the calculation of these contributions [103,100] (compare Eq. (45) above). Corrections of order $\alpha(Z\alpha)^6$ turn out to be nonvanishing only for $l \le 1$. For 2P states these corrections were first calculated in [86], and the result for arbitrary P states [104] has the form
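$$\Delta E(nP_j) = -\frac{4}{15}\left(1 - \frac{1}{n^2}\right)\left(\frac{1}{14} + \frac{1}{4}\delta_{j,1/2}\right)\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m. \qquad (95)$$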
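Eq. (95) factorizes into an $n$-dependent and a $j$-dependent part, so its coefficients are simple rationals. The sketch below (the function name is hypothetical) evaluates the coefficient of $\alpha(Z\alpha)^6 m/(\pi n^3)\,(m_r/m)^3$ exactly with Python's `fractions` module; for example, it gives $-9/140$ for $2P_{1/2}$ and $-1/70$ for $2P_{3/2}$.

```python
from fractions import Fraction

def uehling_p_coefficient(n, j_half):
    """Coefficient of alpha (Z alpha)^6 / (pi n^3) (m_r/m)^3 m in Eq. (95).

    j_half=True selects P_{1/2} (delta_{j,1/2} = 1); False selects P_{3/2}.
    """
    delta = 1 if j_half else 0
    return (-Fraction(4, 15) * (1 - Fraction(1, n * n))
            * (Fraction(1, 14) + Fraction(delta, 4)))

for n in (2, 3, 4):
    print(n, uehling_p_coefficient(n, True), uehling_p_coefficient(n, False))
```

Because the $j$-dependent factor does not depend on $n$, the ratio of the $P_{1/2}$ and $P_{3/2}$ coefficients is $(1/14 + 1/4)/(1/14) = 9/2$ for every $n$.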
The respective correction to the energy levels of S states originates from both large and small distances, since the Schrödinger–Coulomb wave function of S states does not vanish at small distances. Hence, one cannot immediately apply the low-momentum expansion of the polarization operator to calculate the matrix element in Eq. (93). The leading logarithmic state-independent contribution to the energy shift is still completely determined by the first term in the low-momentum expansion of the polarization operator in Eq. (94), but one has to consider the exact expression for the polarization operator in order to obtain the nonlogarithmic contribution. The logarithmic term originates from the logarithmic correction to the Schrödinger–Coulomb wave function that arises when one takes into account the Darwin ($\delta$-function) potential appearing in the nonrelativistic expansion of the Dirac Hamiltonian in the Coulomb field (see, e.g., [19,20] and Eq. (116) below). Of course, the same logarithm arises if, instead of calculating corrections to the Schrödinger–Coulomb wave function, one expands the singular factor in the Dirac–Coulomb wave function in $Z\alpha$. The correction to the Schrödinger–Coulomb wave function $\Psi$ at distances of order $1/m$ has the form [68]
$$\delta\Psi = -\frac{1}{2}(Z\alpha)^2\ln(Z\alpha)\,\Psi, \qquad (96)$$
and substituting this correction in Eq. (93) one easily obtains the leading logarithmic contribution to the energy shift [81,83,84]
$$\Delta E(nS) = -\frac{2}{15}\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3\ln\frac{m(Z\alpha)^{-2}}{m_r}\,m. \qquad (97)$$
Note that the numerical factor before the leading logarithm here is simply the product of the respective numerical factors in the correction to the wave function in Eq. (96), the low-frequency asymptote $-1/(15\pi)$ of the one-loop polarization in Eq. (31), the factor $4\pi(Z\alpha)$ in Eq. (93), and a factor 2 which reflects that both wave functions in the matrix element in Eq. (93) have to be corrected. Calculation of the nonlogarithmic contributions requires more effort. Complete analytic results for the lowest states were first obtained by Mohr [86]
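$$\Delta E(1S) = \frac{1}{15}\left[-\frac12\ln\frac{m(Z\alpha)^{-2}}{m_r} + \ln 2 - \frac{1289}{420}\right]\frac{4\alpha(Z\alpha)^6}{\pi}\left(\frac{m_r}{m}\right)^3 m,$$
$$\Delta E(2S) = \frac{1}{15}\left[-\frac12\ln\frac{m(Z\alpha)^{-2}}{m_r} - \frac{743}{240}\right]\frac{4\alpha(Z\alpha)^6}{8\pi}\left(\frac{m_r}{m}\right)^3 m. \qquad (98)$$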
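The factor counting described for Eq. (97) can be written out explicitly. The short derivation below is a sketch that sets $m_r = m$ and uses the standard nonrelativistic density at the origin, $|\psi_n(0)|^2 = (mZ\alpha)^3/(\pi n^3)$, which is not stated in this section. The constant term $-\alpha/(15\pi m^2)$ of Eq. (94) turns Eq. (93) into a contact interaction, and correcting both wave functions with Eq. (96) gives

$$
\begin{aligned}
\Delta E(nS) &\approx 2\,\frac{\delta\Psi}{\Psi}\times 4\pi(Z\alpha)\left(-\frac{\alpha}{15\pi m^2}\right)|\psi_n(0)|^2
= 2\left[-\frac12(Z\alpha)^2\ln(Z\alpha)\right]\left(-\frac{4\alpha(Z\alpha)}{15m^2}\right)\frac{(mZ\alpha)^3}{\pi n^3} \\
&= \frac{4}{15}\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\,\ln(Z\alpha)\,m
= -\frac{2}{15}\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\,\ln(Z\alpha)^{-2}\,m,
\end{aligned}
$$

which is Eq. (97) at $m_r = m$. The same logarithmic coefficient, $\frac{4}{15}\times\left(-\frac12\right) = -\frac{2}{15}$, appears in Mohr's closed forms in Eq. (98).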
As was mentioned above, short-distance contributions are state independent and always cancel in differences of the form $\Delta E(1S) - n^3\Delta E(nS)$. This means that such differences of energies contain only large-distance contributions and are much easier to calculate, since one may employ a nonrelativistic approximation. The Uehling potential contributions to the difference of level shifts $\Delta E(1S) - n^3\Delta E(nS)$ were calculated in [103] with the help of the nonrelativistic expansion of the polarization operator in Eq. (94). The result of this calculation, in conjunction with Mohr's result in Eq. (98), leads to an analytic expression for the Uehling potential contribution to the Lamb shift
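$$\Delta E(nS) = \frac{1}{15}\left[-\frac12\ln\frac{m(Z\alpha)^{-2}}{m_r} - \frac{431}{105} + \psi(n+1) - \psi(1) - \frac{2(n-1)}{n^2} + \frac{1}{28n^2} - \ln\frac{n}{2}\right]\frac{4\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m, \qquad (99)$$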
where $\psi(x)$ is the logarithmic derivative of the Euler $\Gamma$-function, see Eq. (85).

b. Wichmann–Kroll potential contribution. The only other contribution of order $\alpha(Z\alpha)^6$ connected with radiative insertions in the external photons is produced by the term trilinear in $Z\alpha$ in the Wichmann–Kroll potential in Fig. 24. One may easily check that the first term in the small-momentum expansion of the Wichmann–Kroll potential has the form [102,107]
$$V_{WK}(k) = \left(\frac{19}{45} - \frac{\pi^2}{27}\right)\frac{\alpha(Z\alpha)^3}{m^2}. \qquad (100)$$
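Because $V_{WK}$ in Eq. (100) does not depend on the momentum transfer, its expectation value is simply $V_{WK}\,|\psi_n(0)|^2$, with the standard hydrogenic density $|\psi_n(0)|^2 = (m_r Z\alpha)^3/(\pi n^3)$ for S states (assumed here, not stated in this section) and zero otherwise. This is how the energy shift of Eq. (101) arises. The sketch below evaluates it for hydrogen, assuming CODATA 2018 constants; names are hypothetical.

```python
import math

# Assumed constants (CODATA 2018), not taken from the review
ALPHA = 7.2973525693e-3           # fine-structure constant
ELECTRON_FREQ_HZ = 1.23558996e20  # m_e c^2 / h in Hz
MP_OVER_ME = 1836.15267343        # proton-to-electron mass ratio

WK_COEFF = 19 / 45 - math.pi ** 2 / 27   # numerical factor in V_WK, Eq. (100)

def wk_shift_khz(n):
    """<nS|V_WK|nS> = V_WK |psi_n(0)|^2 for hydrogen (Z = 1), in kHz.

    With V_WK = WK_COEFF * alpha (Z alpha)^3 / m^2 and
    |psi_n(0)|^2 = (m_r Z alpha)^3 / (pi n^3), the product equals
    WK_COEFF * alpha (Z alpha)^6 / (pi n^3) * (m_r/m)^3 * m.
    """
    mr_over_m = 1.0 / (1.0 + 1.0 / MP_OVER_ME)
    unit_hz = ALPHA ** 7 / (math.pi * n ** 3) * mr_over_m ** 3 * ELECTRON_FREQ_HZ
    return WK_COEFF * unit_hz / 1e3

for n in (1, 2):
    print(n, round(wk_shift_khz(n), 2))
```

The two terms in the coefficient nearly cancel (about 0.42 against 0.37), which is why this contribution is only a few kHz for the 1S level of hydrogen.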
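This potential generates the energy shift [108,107]
$$\Delta E = \left(\frac{19}{45} - \frac{\pi^2}{27}\right)\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}, \qquad (101)$$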
which is nonvanishing only for S states.

4.4.2. Corrections of order $\alpha^2(Z\alpha)^6 m$
Corrections of order $\alpha^2(Z\alpha)^6$ originate from large distances, and their calculation should follow the same path as the calculation of corrections of order $\alpha(Z\alpha)^6$. It is not difficult to show that the total contribution of order $\alpha^2(Z\alpha)^6$ is a polynomial in $\ln(Z\alpha)^{-2}$, starting with the cube of the logarithm. Only the factor before the leading logarithm cubed and the contribution of the logarithm squared terms to the difference $\Delta E_L(1S) - 8\Delta E_L(2S)$ are known now. Calculation of these contributions is relatively simple because large logarithms always originate from the wide region of large virtual momenta ($mZ\alpha \ll k \ll m$), and the respective matrix elements of the perturbation potentials depend only on the value of the Schrödinger–Coulomb wave function or its derivative at the origin, as we have already seen in the discussion of the main contribution to the Lamb shift and of the corrections of order $\alpha(Z\alpha)^6$. This well-known feature was often used in the past. For example, differences of hyperfine splittings $\Delta E(1S) - n^3\Delta E(nS)$ were calculated much earlier [105,106] than the hyperfine splittings themselves.
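As a cross-check of the Uehling results in Section 4.4.1.3, the general expression in Eq. (99) has to reproduce Mohr's closed forms in Eq. (98) at $n = 1$ and $n = 2$. The sketch below checks the nonlogarithmic part of the bracket with exact rational arithmetic, using the standard identity $\psi(n+1) - \psi(1) = 1 + 1/2 + \dots + 1/n$ for integer $n$; the common $-\frac12\ln$ term and the prefactor are identical in both equations and are omitted. Helper names are hypothetical.

```python
import math
from fractions import Fraction

def harmonic(n):
    """psi(n+1) - psi(1) for a positive integer n, i.e. the harmonic number H_n."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))

def uehling_nonlog_bracket(n):
    """Nonlogarithmic part of the bracket in Eq. (99), returned as (rational part, log part)."""
    rational = (Fraction(-431, 105) + harmonic(n)
                - Fraction(2 * (n - 1), n * n) + Fraction(1, 28 * n * n))
    log_part = -math.log(n / 2)
    return rational, log_part

# Eq. (98), 1S: the bracket should equal ln 2 - 1289/420
r1, l1 = uehling_nonlog_bracket(1)
# Eq. (98), 2S: the bracket should equal -743/240, with no logarithm
r2, l2 = uehling_nonlog_bracket(2)
print(r1, l1, r2, l2)
```

Keeping the rational part exact makes the comparison with Eq. (98) an identity check rather than a floating-point approximation: for $n = 1$ the rational terms combine to $(-1724 + 420 + 15)/420 = -1289/420$.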
Fig. 26. Two-loop self-energy operator.
Fig. 27. Second-order perturbation theory contribution with two one-loop self-energy operators.
4.4.2.1. Electron-line contributions
Let us now turn to the general expression for the energy shift in Eq. (25). Corrections of order $\alpha^2(Z\alpha)^6$ are generated not only by the term with the irreducible electron two-loop self-energy operator (see Fig. 26), as in Eq. (87), but also by the second-order perturbation theory term with two one-loop electron self-energy operators in the external Coulomb field in Fig. 27. It is not hard to check that the contribution containing the highest power of the logarithm is generated exactly by this term. Note first that a logarithmic matrix element of the first-order electron self-energy operator (like the one producing the leading contribution to the Lamb shift) may be considered, in the framework of perturbation theory, as a matrix element of an almost local operator (it depends on the momentum transfer only logarithmically), since it is induced by the diagram with relativistic virtual momenta. One may then use the same local operator to calculate higher-order perturbation theory contributions [96]. We will need only the ordinary second-order perturbation theory expression
$$\Delta E = 2\sum_{m\neq n}\frac{\langle n|V_1|m\rangle\langle m|V_2|n\rangle}{E_n - E_m}, \qquad (102)$$
where $V_1$ and $V_2$ are the perturbation operators, and the factor 2 is due to the two possible orders of the perturbation operators; it is not present when $V_1 = V_2$. Summation over the intermediate states above includes integration over the continuous spectrum with the weight $d^3k/(2\pi)^3$. To obtain the maximum power of the large logarithm, we take the quasilocal effective perturbation potential ($V_1 = V_2 = V$) in momentum space in the form
$$V = -\frac{8\alpha(Z\alpha)}{3m^2}\ln\frac{k}{m}. \qquad (103)$$
This is just the perturbation potential that generates the leading contribution to the Lamb shift. It is evident that this potential leads to a logarithm squared contribution of order $\alpha^2(Z\alpha)^6$ after substitution in Eq. (102). One more logarithm may be gained from the continuous spectrum contribution in Eq. (102).
Owing to the locality of the potential, the matrix elements reduce to products of the values of the respective wave functions at the origin and the potentials in Eq. (103). The value of the continuous spectrum Coulomb wave function at the origin is well known (see, e.g., [109]):
$$|\psi_k(0)|^2 = \frac{2\pi\gamma/k}{1 - e^{-2\pi\gamma/k}} \approx 1 + \frac{\pi\gamma}{k} + O\!\left(\frac{\gamma^2}{k^2}\right), \qquad (104)$$
where $\gamma = m_r Z\alpha$. The leading term in the large-momentum expansion in Eq. (104) generates an apparently linearly ultraviolet divergent contribution to the energy shift, but this ultraviolet
divergence is due to our nonrelativistic approximation, and it would be cut off at the electron mass in a truly relativistic calculation. More importantly, this correction is of order $(Z\alpha)^5$ and may safely be omitted in the discussion of the corrections of order $(Z\alpha)^6$. Logarithmic corrections of order $(Z\alpha)^6$ are generated by the second term in the high-momentum expansion in Eq. (104):
$$\Delta E = -2\,\langle n|V_1|m\rangle\langle m|V_2|n\rangle\,\frac{(m_r Z\alpha)\,m_r}{\pi}\int\frac{dk}{k}. \qquad (105)$$
With the help of this formula one immediately obtains [110]
$$\Delta E = -\frac{8}{27}\,\frac{\alpha^2(Z\alpha)^6}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^3\ln^3\frac{m(Z\alpha)^{-2}}{m_r}\,m. \qquad (106)$$
Note that the scale of this contribution is once again exactly of the expected magnitude, namely, this contribution is suppressed by the factor $(\alpha/\pi)\ln(Z\alpha)^{-2}$ in comparison with the leading logarithm squared contribution of order $\alpha(Z\alpha)^6$. Of course, the additional numerical suppression factor 8/27 could not be obtained without a real calculation. Numerically, the correction in Eq. (106) is about $-28$ kHz for the 1S state, and calculation of other corrections of order $\alpha^2(Z\alpha)^6$ is clearly warranted.

Corrections induced by the one-particle reducible two-loop radiative insertions in the electron line were calculated numerically without expansion in $Z\alpha$ in recent works [111–113]. Effectively, a subset of the diagrams in Fig. 28 generating corrections of order $\alpha^2(Z\alpha)^n m$ to the Lamb shift was summed in [111–113]. This subset contains all diagrams which generate the leading logarithm cubed contribution to the Lamb shift. For $Z = 1$ the additional contribution to the Lamb shift obtained in [111] is $-71$ kHz for the 1S state in hydrogen, much larger than the leading logarithm contribution in Eq. (106), while the result in [112] agrees with Eq. (106). The numerical results in [111–113] were parametrized as a polynomial in the low-energy logarithm $\ln(Z\alpha)^{-2}$; namely, the corrections of order $\alpha^2(Z\alpha)^6 m$ were written in the form
$$\Delta E = \left[c_3\ln^3(Z\alpha)^{-2} + c_2\ln^2(Z\alpha)^{-2} + c_1\ln(Z\alpha)^{-2}\right]\frac{\alpha^2(Z\alpha)^6}{\pi^2 n^3}\,m. \qquad (107)$$
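The kHz figures quoted in this subsection (about $-28$ kHz for Eq. (106), about $-10$ kHz for $c_2 = -1.0$, and the estimate of 14 kHz and 2 kHz as one half of the logarithm cubed term) follow from the normalization of Eqs. (106) and (107). A minimal sketch for hydrogen, assuming CODATA 2018 constants and keeping the reduced-mass factors of Eq. (106); helper names are hypothetical.

```python
import math

# Assumed constants (CODATA 2018), not taken from the review
ALPHA = 7.2973525693e-3           # fine-structure constant
ELECTRON_FREQ_HZ = 1.23558996e20  # m_e c^2 / h in Hz
MP_OVER_ME = 1836.15267343        # proton-to-electron mass ratio

MR = 1.0 / (1.0 + 1.0 / MP_OVER_ME)       # m_r / m for hydrogen
BIG_LOG = math.log(ALPHA ** -2 / MR)      # ln[m (Z alpha)^-2 / m_r] at Z = 1

def two_loop_unit_khz(n):
    """alpha^2 (Z alpha)^6 / (pi^2 n^3) (m_r/m)^3 m for hydrogen, in kHz."""
    return ALPHA ** 8 / (math.pi ** 2 * n ** 3) * MR ** 3 * ELECTRON_FREQ_HZ / 1e3

def log_power_term_khz(coeff, power, n):
    """A term coeff * ln^power(...) in the normalization of Eqs. (106)-(107), in kHz."""
    return coeff * BIG_LOG ** power * two_loop_unit_khz(n)

cubed_1s = log_power_term_khz(-8 / 27, 3, 1)   # Eq. (106), 1S
cubed_2s = log_power_term_khz(-8 / 27, 3, 2)   # Eq. (106), 2S
c2_1s = log_power_term_khz(-1.0, 2, 1)         # c_2 = -1.0 of [112], 1S
print(f"{cubed_1s:.2f} {cubed_2s:.2f} {c2_1s:.1f} "
      f"{abs(cubed_1s) / 2:.0f} {abs(cubed_2s) / 2:.0f}")
```

The cube of the large logarithm (about $9.84^3 \approx 950$) is what lifts this nominally $\alpha^2(Z\alpha)^6$ correction to the tens-of-kHz level.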
A fit of the numerical results leads to the value $c_3 = -0.9$ [111] for the coefficient of the logarithm cubed, to be compared with the analytic result $c_3 = -8/27 \approx -0.3$ in Eq. (106). The leading logarithmic result in Eq. (106) is generated just by the diagrams considered in [111]; there are no other sources for the logarithm cubed and, hence, the result in [111] contradicts the leading perturbation theory contribution in Eq. (106). The result of [111] was confirmed in a later independent numerical calculation [113]. Meanwhile, the leading perturbation theory contribution in Eq. (106) was recently reproduced with the help of the renormalization group equations in the approach based on nonrelativistic
Fig. 28. Reducible two-loop radiative insertions in the electron line.
QED [114]. Results of one more numerical calculation [112] are also consistent with the leading perturbation theory contribution in Eq. (106), and predict the value $c_2 = -1.0(1)$ for the coefficient of the logarithm squared, which seems reasonable from the perturbation theory point of view. Clearly, resolving the contradiction between the perturbation theory result in Eq. (106) and the numerical result in [112], on the one hand, and the numerical result in [111,113], on the other hand, is an urgent problem. We will use the perturbation theory result in Eq. (106) for the comparison between theory and experiment in Section 6.2. The value $c_2 = -1.0 \pm 0.1$ of the factor before the logarithm squared term obtained in [112] generates a contribution of about $-10$ kHz to the 1S Lamb shift. However, at the present stage we cannot accept this value of $c_2$ as the true value of the coefficient before the logarithm squared term, because only a subset of all diagrams with two radiative photon insertions in the electron line was calculated in [112]. While the omitted diagrams do not contribute to the leading logarithm cubed term, they generate unknown logarithm squared contributions, and we have to wait for the completion of the numerical calculation of the remaining diagrams. It is a remarkable achievement that there is now a real prospect that numerical calculations without expansion in $Z\alpha$ will produce, in the near future, the value of the logarithm squared contribution of order $\alpha^2(Z\alpha)^6 m$. Another indication of the magnitude of the logarithm squared contributions to the energy shift is provided by the logarithm squared contribution to the difference $\Delta E(1S) - 8\Delta E(2S)$ (see the discussion below). Taking into account this result, as well as the partial numerical estimate of the logarithm squared term in [112], we come to the conclusion that a fair estimate of the logarithm squared corrections to the individual energy levels is given by one half of the leading logarithm cubed contribution in Eq. (106), which amounts to 14 kHz for the 1S state and 2 kHz for the 2S state.

The analysis of the contributions of order $\alpha^2(Z\alpha)^n$ confirms once again, as also emphasized in [111], that there is no regular rule for the magnitude of the coefficients of successive terms in the series over $Z\alpha$ at fixed $\alpha$. This happens because the terms of, say, relative order $Z\alpha$ and $(Z\alpha)^2$ correspond to completely different physics at small and large distances and, hence, there is no reason to expect a regular law for the coefficients in these series. This should be compared with the series over $\alpha$ at fixed $Z\alpha$. As we have shown above, different terms in these series correspond to the same physics, and hence the coefficients in these series change smoothly and may easily be estimated. This is why we have organized the discussion in this review in terms of such series. Note that the best way to estimate an unknown correction of order, say, $\alpha^2(Z\alpha)^6 m$, which corresponds to long-distance physics, is to compare it with the long-distance correction of order $\alpha(Z\alpha)^6 m$, and not with the correction of order $\alpha^2(Z\alpha)^5 m$, which corresponds to short-distance physics. Of course, such logic contradicts the spirit of numerical calculations made without expansion in $Z\alpha$, but it properly reflects the physical nature of the different contributions at small $Z$.

Perturbation theory calculation of logarithm squared contributions to the energy shift of S levels is impeded by the fact that such contributions arise from both discrete and continuous spectrum intermediate states in Eq. (102), and a complicated interplay of contributions from the different regions occurs. Hence, such a calculation requires a more accurate treatment of the one-loop electron self-energy operators, and the local approximation used above becomes inappropriate. The case of logarithm squared contributions to the energy levels with nonvanishing angular momenta is much simpler [115,116]. The second-order perturbation theory term with two
Fig. 29. Effective potential corresponding to the two-loop vertex.
one-loop self-energy operators does not generate any logarithm squared contribution for states with nonzero angular momentum, since the respective nonrelativistic wave function vanishes at the origin. Only the two-loop vertex in Fig. 29 produces a logarithm squared term in this case. The respective perturbation potential, determined by the second term in the low-momentum expansion of the two-loop Dirac form factor [117], has the form
$$V = -\frac{2\alpha^2(Z\alpha)\,k^2}{9\pi m^4}\ln^2(Z\alpha)^{-2}. \qquad (108)$$
Calculation of the matrix element of this effective perturbation with the nonrelativistic wave functions of the P states yields [115,116]
$$\Delta E(nP) = \frac{4(n^2-1)}{27n^2}\,\frac{\alpha^2(Z\alpha)^6}{\pi^2 n^3}\ln^2(Z\alpha)^{-2}\,m, \qquad (109)$$
while for $l > 1$ there are no logarithm squared contributions.

4.4.2.2. Polarization operator contributions
Logarithmic contributions corresponding to the diagrams with at least one polarization insertion may be calculated by the methods described above. The leading logarithm squared term in Fig. 30 is generated when we combine the perturbation potential in Eq. (103), which corresponds to the one-loop electron vertex, with the perturbation potential in Eq. (31), which corresponds to the polarization operator contribution to the Lamb shift [115]:
$$\Delta E(nS) = \frac{8}{45}\,\frac{\alpha^2(Z\alpha)^6}{\pi^2 n^3}\ln^2(Z\alpha)^{-2}\,m. \qquad (110)$$
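The rational coefficients in Eqs. (106), (109) and (110) can be reconstructed from the local-potential reasoning of this section. The sketch below is illustrative, not the authors' derivation: it sets $m_r = m$, takes local potentials $V_i = c_i\,\alpha(Z\alpha)/m^2 \cdot \ln^{p_i}(k/m)$, keeps only the $\pi\gamma/k$ term of Eq. (104) in the form of Eq. (105), integrates over $mZ\alpha < k < m$, and uses the standard hydrogenic values $|\psi_n(0)|^2 = (mZ\alpha)^3/(\pi n^3)$ and $|R'_{n1}(0)|^2 = \frac{4}{9}(n^2-1)(mZ\alpha)^5/n^5$, which are not stated in this section. The polarization potential is taken as $4\pi Z\alpha$ times the low-frequency asymptote $-\alpha/(15\pi m^2)$. Function names are hypothetical.

```python
from fractions import Fraction

def second_order_log_coefficient(c1, p1, c2, p2, same_operator):
    """Coefficient of alpha^2 (Z alpha)^6 / (pi^2 n^3) m L^(p1+p2+1), with L = ln (Z alpha)^-2.

    Assumes V_i = c_i alpha (Z alpha)/m^2 ln^p_i(k/m) and the form of Eq. (105):
    Delta E = -s V1 V2 |psi_n(0)|^2 (m Z alpha) m / pi * integral dk/k.
    """
    s = 1 if same_operator else 2            # factor 2 of Eq. (102) unless V1 == V2
    total_power = p1 + p2
    # integral of ln^P(k/m) dk/k from m Z alpha to m equals (-1)^P L^(P+1) / ((P+1) 2^(P+1))
    integral = Fraction((-1) ** total_power, (total_power + 1) * 2 ** (total_power + 1))
    return -s * c1 * c2 * integral

def p_state_first_order_coefficient(c, n):
    """First-order nP shift for V = c alpha^2 (Z alpha) k^2/(pi m^4) L^2,
    in units of alpha^2 (Z alpha)^6 / (pi^2 n^3) m L^2.

    For V(k) = C k^2 one has <V> = -2 C |grad psi(0)|^2 = -(2/(3 pi)) C (1 - 1/n^2)(m Z alpha)^5/n^3.
    """
    return -Fraction(2, 3) * c * (1 - Fraction(1, n * n))

vertex = Fraction(-8, 3)         # Eq. (103): V = -(8/3) alpha (Z alpha)/m^2 ln(k/m)
polarization = Fraction(-4, 15)  # 4 pi Z alpha * (-alpha / (15 pi m^2))

print(second_order_log_coefficient(vertex, 1, vertex, 1, True))           # compare Eq. (106)
print(second_order_log_coefficient(vertex, 1, polarization, 0, False))    # compare Eq. (110)
print(p_state_first_order_coefficient(Fraction(-2, 9), 2))                # compare Eq. (109), n = 2
```

The first two calls give $-8/27$ and $8/45$, and the P-state helper reproduces $4(n^2-1)/(27n^2)$ for every $n$, so the signs and factors used above for Eqs. (102) to (110) are mutually consistent.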
We remind the reader that the logarithm squared terms induced by two radiative insertions in the electron line remain uncalculated. In the case of P states the leading contribution corresponding to the diagrams with at least one polarization insertion is linear in the large logarithm and is equal to [115]
$$\Delta E(nP) = -\frac{8(n^2-1)}{135n^2}\,\frac{\alpha^2(Z\alpha)^6}{\pi^2 n^3}\ln(Z\alpha)^{-2}\,m. \qquad (111)$$
But again there are uncalculated contributions linear in the large logarithm induced by the diagrams without polarization insertions.
Fig. 30. One of the logarithm squared contributions of order $\alpha^2(Z\alpha)^6$.
Fig. 31. Two-loop polarization potential.
Fig. 32. Logarithm squared contributions to $\Delta E_L(1S) - n^3\Delta E_L(nS)$.
The contribution linear in the large logarithm to the energy of the S levels induced by the two-loop vacuum polarization in Fig. 31 is also known [60]:
$$\Delta E = -\frac{41}{81}\,\frac{\alpha^2(Z\alpha)^6}{\pi^2 n^3}\ln\frac{m(Z\alpha)^{-2}}{m_r}\,m, \qquad (112)$$
but again there are many uncalculated contributions of the same order, and this correction may serve only as an estimate of the unknown corrections.

4.4.2.3. Corrections of order $\alpha^2(Z\alpha)^6 m$ to $\Delta E_L(1S) - n^3\Delta E_L(nS)$
All state-independent contributions cancel in the combination of energy shifts $\Delta E(1S) - n^3\Delta E(nS)$, which makes the calculation of this energy difference more feasible than the calculation of the individual energy levels themselves. Since many state-independent corrections of order $\alpha^2(Z\alpha)^6 m$ to the individual energy levels are still unknown, this leads to a more accurate theoretical prediction for this combination of energy levels than for each of the levels themselves. This may be extremely useful in the comparison of theory and experiment (see, e.g., [14,118]). The logarithm squared contributions to the difference of energies are generated in the second order of perturbation theory by two one-loop vertex operators, and in the first order of perturbation theory by one two-loop vertex (see the diagrams in Fig. 32). Owing to the cancellation of the state-independent terms in the difference of energy levels, only intermediate continuous spectrum states with momenta of the atomic scale $mZ\alpha$ contribute in the second order of perturbation theory [119,120]. Then the local approximation for the one-loop vertices and the nonrelativistic approximation for the wave functions suffice for the calculation of the logarithm squared contribution to the energy difference generated by the first diagram in Fig. 32. Calculation of the contribution induced by the second-order vertex operator (second diagram in Fig. 32) is quite
straightforward. Both contributions were calculated in a series of papers [115,116,119,120] with the result
$$\Delta E(1S) - n^3\Delta E(nS) = \frac{16}{9}\left[\ln n - \psi(n) + \psi(1) - \frac{n-1}{n} + \frac{n^2-1}{4n^2}\right]\frac{\alpha^2(Z\alpha)^6}{\pi^2}\ln^2(Z\alpha)^{-2}\,m. \qquad (113)$$
Numerically this contribution is equal to $-10.71$ kHz for $\Delta E(1S) - 8\Delta E(2S)$. Contributions to the difference of energies linear in the large logarithm are still unknown. Only the linear terms connected with polarization insertions were calculated [115]
$$\Delta E(1S) - n^3\Delta E(nS) = \frac{32}{45}\left[-\ln n + \psi(n) - \psi(1) + \frac{n-1}{n} - \frac{n^2-1}{4n^2}\right]\frac{\alpha^2(Z\alpha)^6}{\pi^2}\ln(Z\alpha)^{-2}\,m, \qquad (114)$$
but the contributions of the same order of magnitude induced by the diagrams without polarization insertions were never calculated. As usual, we expect the contributions connected exclusively with the electron line to be larger than the polarization contribution above. In such a situation it seems prudent to assume that the uncertainty in the difference $\Delta E(1S) - 8\Delta E(2S)$ due to uncalculated linear terms is perhaps about 5 kHz [116,118]. The logarithm squared contributions to the individual energy levels are also unknown. We have assumed one half of the leading logarithm cubed contribution in Eq. (106) as an estimate of all yet uncalculated corrections of this order (see the discussion in Section 4.4.2.1). This estimate of the theoretical uncertainties is confirmed by the magnitude of the logarithm squared contribution to the interval $E_L(1S) - 8E_L(2S)$, which can be considered as an estimate of the scale of all yet uncalculated corrections of this order. Since the logarithm squared contribution to the interval $E_L(1S) - 8E_L(2S)$ is known (Eq. (113)), the theoretical accuracy of this difference is higher than the accuracy of the expression for $\Delta E(1S)$.

4.4.2.4. Corrections of order $\alpha^3(Z\alpha)^6 m$
Corrections of order $\alpha^3(Z\alpha)^6$ were never considered in the literature. They are suppressed in comparison to contributions of order $\alpha^2(Z\alpha)^6$ by at least an additional factor $\alpha/\pi$, and are too small to be of any phenomenological interest now (see Table 5).

4.5. Radiative corrections of order $\alpha(Z\alpha)^7 m$ and of higher orders
Only partial results are known for corrections of order $\alpha(Z\alpha)^7 m$. However, recent achievements [121] in numerical calculations without expansion in $Z\alpha$ completely solve the problem of the corrections of order $\alpha(Z\alpha)^7 m$ and of higher orders in $Z\alpha$.

4.5.1. Corrections induced by the radiative insertions in the electron line
Consider first corrections of order $\alpha(Z\alpha)^7$ induced by radiative photon insertions in the electron line. Owing to the Layzer theorem [81], the diagram with the radiative photon spanning four Coulomb photons does not lead to a logarithmic contribution. Hence, all leading logarithmic contributions of this order may be calculated with the help of second-order perturbation theory in Eq. (102). It is easy to check that the leading contribution is linear in the large logarithm and arises when one takes as the first perturbation the local potential corresponding to the order $\alpha(Z\alpha)^5 m$ contribution to the Lamb shift in Eq. (67)
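Table 5
Radiative corrections of order $\alpha^n(Z\alpha)^6 m$. The coefficients multiply the common factor $4\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\left(\frac{m_r}{m}\right)^3 m \approx \frac{173.074}{n^3}$ kHz; (?) marks coefficients that are still unknown.
Contribution and references | Coefficient | $\Delta E(1S)$ (kHz) | $\Delta E(2S)$ (kHz)
Logarithmic electron-line contribution ($l = 0$): Layzer [81], Fried and Yennie [82], Erickson and Yennie [83,84] | $-\frac14\ln^2\frac{m(Z\alpha)^{-2}}{m_r} + \left[\frac43\ln 2 + \ln\frac{2}{n} + \psi(n+1) - \psi(1) - \frac{601}{720} - \frac{77}{180n^2}\right]\ln\frac{m(Z\alpha)^{-2}}{m_r}$ | −1882.77 | −208.16
Logarithmic electron-line contribution ($l \neq 0$): Erickson and Yennie [83,84] | $\left[\left(1 - \frac{1}{n^2}\right)\left(\frac{1}{30} + \frac{1}{12}\delta_{j,1/2}\right) + \frac{6 - 2l(l+1)/n^2}{3(2l+3)\,l(l+1)(4l^2-1)}\right]\ln\frac{m(Z\alpha)^{-2}}{m_r}$ | |
Nonlogarithmic electron-line contribution: Pachucki (1S, 2S) [91,92,98], Jentschura and Pachucki ($2P_{1/2}$, $2P_{3/2}$) [99], Jentschura et al. ($3P_{1/2}$, $3P_{3/2}$, $4P_{1/2}$, $4P_{3/2}$) [100] | $A_{60}/4$ | −1338.04 | −172.21
Logarithmic polarization operator contribution: Layzer [81], Erickson and Yennie [83,84] | $-\frac{1}{30}\ln\frac{m(Z\alpha)^{-2}}{m_r}\,\delta_{l0}$ | −56.77 | −7.10
Nonlogarithmic polarization operator contribution ($l = 0$): Mohr [86], Ivanov and Karshenboim [103] | $\frac{1}{15}\left[-\frac{431}{105} + \psi(n+1) - \psi(1) - \frac{2(n-1)}{n^2} + \frac{1}{28n^2} - \ln\frac{n}{2}\right]\delta_{l0}$ | −27.41 | −4.47
Nonlogarithmic polarization operator contribution ($l = 1$): Mohr [86], Manakov et al. [104] | $-\frac{1}{15}\left(1 - \frac{1}{n^2}\right)\left(\frac{1}{14} + \frac14\delta_{j,1/2}\right)$ | |
Wichmann–Kroll contribution: Wichmann and Kroll [102], Mohr [108,107] | $\left(\frac{19}{180} - \frac{\pi^2}{108}\right)\delta_{l0}$ | 2.45 | 0.31
Leading logarithmic electron-line contribution: Karshenboim [110] | $-\frac{2}{27}\,\frac{\alpha}{\pi}\ln^3(Z\alpha)^{-2}\,\delta_{l0}$ | −28.38 | −3.55
Electron-line log squared term, $\Delta E(nS)$ | $(?)\,\frac{\alpha}{\pi}\ln^2(Z\alpha)^{-2}$ | ±10 | ±2
Log squared contribution, $\Delta E(nP)$: Karshenboim [115,116] | $\frac{n^2-1}{27n^2}\,\frac{\alpha}{\pi}\ln^2(Z\alpha)^{-2}$ | |
Electron-line linear in log term, $\Delta E(nP)$ | $(?)\,\frac{\alpha}{\pi}\ln(Z\alpha)^{-2}$ | |
Log squared term connected with polarization, $\Delta E(nS)$: Karshenboim [115] | $\frac{2}{45}\,\frac{\alpha}{\pi}\ln^2(Z\alpha)^{-2}$ | 1.73 | 0.22
Linear log connected with polarization, $\Delta E(nP)$: Karshenboim [115] | $-\frac{2(n^2-1)}{135n^2}\,\frac{\alpha}{\pi}\ln(Z\alpha)^{-2}$ | |
Linear log connected with two-loop polarization, $\Delta E(nS)$: Eides and Grotch [60] | $-\frac{41}{324}\,\frac{\alpha}{\pi}\ln(Z\alpha)^{-2}$ | −0.50 | −0.06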
11 1 pa(Za) < "4 1# ! ln 2 , 128 2 m
(115)
and the second perturbation corresponds to the Darwin potential pZa , (116) < "! 2m where both potentials are written in momentum space (see Fig. 33). Substituting these potentials in Eq. (102) one easily obtains [120]
m(Za)\ a(Za) m 11 md . *E" 2# !ln 2 ln J m n m 64
(117)
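As a cross-check of Table 5, the analytic S-state coefficients can be re-evaluated numerically and converted to kHz with the table's own common factor of 173.074/n³ kHz. The sketch below assumes CODATA 2018 values for the fine-structure constant and the proton-electron mass ratio (these inputs are not given in the text) and sets Z = 1; the nonlogarithmic electron-line row is skipped because its coefficient A₆₀ comes from separate numerical work.

```python
import math

ALPHA = 7.2973525693e-3        # fine-structure constant (assumed CODATA 2018 value)
MASS_RATIO = 1836.15267343     # proton-to-electron mass ratio M/m (assumed CODATA 2018 value)
UNIT_KHZ = 173.074             # common factor 4 alpha (Z alpha)^6 (m_r/m)^3 m / pi, from Table 5


def harmonic(n: int) -> float:
    """psi(n+1) - psi(1) equals the harmonic number H_n."""
    return sum(1.0 / k for k in range(1, n + 1))


L_RED = math.log((1.0 + 1.0 / MASS_RATIO) / ALPHA**2)   # ln[(m/m_r)(Z alpha)^-2], Z = 1
L_PLAIN = math.log(1.0 / ALPHA**2)                       # ln (Z alpha)^-2, Z = 1
A_PI = ALPHA / math.pi


def table5_s_rows(n: int) -> dict:
    """Energy shifts in kHz of the analytic S-state rows of Table 5."""
    coefficients = {
        "log electron line (l=0)": (-L_RED / 4 + 4 / 3 * math.log(2) + math.log(2 / n)
                                    + harmonic(n) - 601 / 720 - 77 / (180 * n**2)) * L_RED,
        "log polarization": -L_RED / 30,
        "nonlog polarization (l=0)": (-431 / 105 + harmonic(n) - 2 * (n - 1) / n**2
                                      + 1 / (28 * n**2) - math.log(n / 2)) / 15,
        "Wichmann-Kroll": 19 / 180 - math.pi**2 / 108,
        "two-loop log cubed": -2 / 27 * A_PI * L_PLAIN**3,
        "polarization log squared": 2 / 45 * A_PI * L_PLAIN**2,
        "two-loop polarization linear log": -41 / 324 * A_PI * L_PLAIN,
    }
    return {name: c * UNIT_KHZ / n**3 for name, c in coefficients.items()}


for n in (1, 2):
    for name, value in table5_s_rows(n).items():
        print(f"{n}S  {name:35s} {value:10.2f} kHz")
```

The one-loop logarithmic rows use the reduced-mass logarithm ln[(m/m_r)(Zα)⁻²] exactly as printed in the table, while the two-loop rows use the plain ln(Zα)⁻².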
Fig. 33. Leading logarithmic contribution of order α(Zα)⁷ induced by the radiative photon.
The Darwin potential generates the logarithmic correction to the nonrelativistic Schrödinger–Coulomb wave function in Eq. (96), and the result in Eq. (117) could be obtained by taking this correction to the wave function into account in the calculation of the contribution to the Lamb shift of order α(Zα)⁵m. This logarithmic correction is numerically equal to 14.43 kHz for the 1S level in hydrogen, and to 1.80 kHz for the 2S level.

The nonlogarithmic contributions of this order were never calculated directly, but one can obtain a reliable estimate of these contributions, as well as of the contributions of higher orders in Zα, using the results of numerical calculations of the contributions of order α(Zα)ⁿ made without expansion in Zα for Z not too small. It is convenient to parametrize the respective results with the help of an auxiliary function G_SE(Zα) defined by the relation

$$\Delta E(l,n) = \frac{\alpha(Z\alpha)^7}{n^3}\left(\frac{m_r}{m}\right)^3 m\left[\left(\frac{139}{64} - \ln 2\right)\ln\frac{m(Z\alpha)^{-2}}{m_r}\,\delta_{l0} + \frac{1}{\pi}G_{SE}^{nl}(Z\alpha)\right]. \tag{118}$$

Results for the function G_SE(Zα) for small Z may be obtained by extrapolation from the numerical results for larger Z [86,122,123]. The results of such extrapolation for the 1S, 2S, 2P, 4S and 4P states (Z = 1, 2) are presented in [124,13] (see also the results of an earlier extrapolation in [125]). The extrapolation in [124,13] was done independently of the exact results in [91,92,99,100,120] for the nonlogarithmic contribution of order α(Zα)⁶ and the logarithmic one of order α(Zα)⁷. Extrapolations which take these exact results into account were performed for the hydrogen 1S, 2S, 2P, 3P and 4P states in [126,99,100]. We have collected the results of these extrapolations in Table 6. Taking into account the current experimental accuracy, we may conclude from the data in Table 6 that the higher-order corrections in Zα of the form α(Zα)ⁿ are small and that the currently known coefficients are sufficient for phenomenological needs.
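The 14.43 kHz and 1.80 kHz quoted for Eq. (117) follow from a one-line evaluation. The Python sketch below assumes CODATA 2018 inputs that the text does not supply: the fine-structure constant, the electron rest energy expressed as a frequency, m c²/h ≈ 1.23559 × 10²⁰ Hz, and the proton-electron mass ratio. The function name `eq117_khz` is illustrative only.

```python
import math

ALPHA = 7.2973525693e-3        # fine-structure constant (assumed CODATA 2018 value)
MEC2_HZ = 1.23558999e20        # electron rest energy m c^2 / h in Hz (assumed CODATA 2018 value)
MASS_RATIO = 1836.15267343     # proton-to-electron mass ratio (assumed CODATA 2018 value)


def eq117_khz(n: int, z: int = 1) -> float:
    """Leading logarithmic alpha (Z alpha)^7 shift of an nS level, Eq. (117), in kHz."""
    za = z * ALPHA
    mr_over_m = 1.0 / (1.0 + 1.0 / MASS_RATIO)          # m_r / m
    coefficient = 2.0 + 11.0 / 64.0 - math.log(2.0)
    log_term = math.log(za**-2 / mr_over_m)             # ln[m (Z alpha)^-2 / m_r]
    shift_over_mc2 = ALPHA * za**7 / n**3 * mr_over_m**3 * coefficient * log_term
    return shift_over_mc2 * MEC2_HZ / 1e3


print(f"1S: {eq117_khz(1):.2f} kHz, 2S: {eq117_khz(2):.2f} kHz")
```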
A spectacular success was achieved recently in the numerical calculation of the function G_SE(Zα) for low Z [121]. In this work the self-energy contribution for the 1S level in hydrogen was obtained with a numerical uncertainty of 0.8 Hz. This result completely solves all problems with the calculation of the higher-order corrections in Zα of the form α(Zα)ⁿ for the foreseeable future.
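Converting a value of G_SE into an energy shift only requires the second term of Eq. (118). The sketch below does this for the G_SE values collected in Table 6, assuming Z = 1 and the same CODATA 2018 inputs (fine-structure constant, m c²/h and the mass ratio) that are not part of the text; the helper name `g_term_khz` is hypothetical.

```python
import math

ALPHA = 7.2973525693e-3        # assumed CODATA 2018 fine-structure constant
MEC2_KHZ = 1.23558999e17       # assumed electron rest energy m c^2 / h, in kHz
MASS_RATIO = 1836.15267343     # assumed proton-to-electron mass ratio


def g_term_khz(g: float, n: int) -> float:
    """Second term of Eq. (118) for Z = 1: alpha^8 / n^3 (m_r/m)^3 m G / pi, in kHz."""
    mr_over_m = 1.0 / (1.0 + 1.0 / MASS_RATIO)
    return ALPHA**8 / n**3 * mr_over_m**3 * MEC2_KHZ * g / math.pi


table6 = {"1S": (-2.42, 1), "2P_1/2": (3.1, 2), "2P_3/2": (2.3, 2),
          "3P_1/2": (3.6, 3), "3P_3/2": (2.6, 3), "4P_1/2": (3.9, 4), "4P_3/2": (2.8, 4)}
for state, (g, n) in table6.items():
    print(f"{state}: {g_term_khz(g, n):+.3f} kHz")
```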
4.5.2. Corrections induced by the radiative insertions in the Coulomb lines

There are two contributions of order α(Zα)⁷m to the energy shift, induced by the Uehling and the Wichmann–Kroll potentials (see Figs. 19 and 24, respectively). The respective calculations go along the same lines as in the case of the Coulomb-line corrections of order α(Zα)⁶ considered above.

Table 6
Term G_SE

State | Reference | G_SE | ΔE (kHz)
1S | Karshenboim [126] | −2.42(15) | −0.76(5)
2P_{1/2} | Jentschura and Pachucki [99] | 3.1(5) | 0.12(2)
2P_{3/2} | Jentschura and Pachucki [99] | 2.3(5) | 0.09(2)
3P_{1/2} | Jentschura et al. [100] | 3.6(5) | 0.04
3P_{3/2} | Jentschura et al. [100] | 2.6(5) | 0.03
4P_{1/2} | Jentschura et al. [100] | 3.9(5) | 0.02
4P_{3/2} | Jentschura et al. [100] | 2.8(5) | 0.01

The factor 1/π before the second term in the square brackets of Eq. (118) is written in order to conform with the traditional notation.
In some works the function G_SE(Zα) is defined similarly to the function G_SE(Zα) in Eq. (118), but also includes the nonlogarithmic contribution of order α(Zα)⁶.

a. Uehling potential contribution. The logarithmic contribution is induced only by the Uehling potential in Fig. 19, and may easily be calculated in exactly the same way as the logarithmic contribution induced by the radiative photon in Eq. (117). The only difference is that now the role of the perturbation potential is played by the kernel which corresponds to the polarization contribution to the Lamb shift of order α(Zα)⁵m,

$$V_{VP} = \frac{5}{48}\,\frac{\pi\alpha(Z\alpha)^2}{m^2}. \tag{119}$$
Then we immediately obtain [86]

$$\Delta E = \frac{5}{96}\,\frac{\alpha(Z\alpha)^7}{n^3}\left(\frac{m_r}{m}\right)^3 m\,\ln\frac{m(Z\alpha)^{-2}}{m_r}\,\delta_{l0}. \tag{120}$$
It is not difficult to calculate analytically the nonlogarithmic corrections of order α(Zα)⁷ generated by the Uehling potential. Using the formulae from [86] one obtains for a few lower levels (see also [127] for the case of the 1S state)

$$\Delta E(1S) = \frac{5}{96}\left[\ln\frac{m(Z\alpha)^{-2}}{m_r} + 2\ln 2 + \frac{23}{15}\right]\alpha(Z\alpha)^7\left(\frac{m_r}{m}\right)^3 m, \tag{121}$$
$$\Delta E(2S) = \frac{5}{96}\left[\ln\frac{m(Z\alpha)^{-2}}{m_r} + 4\ln 2 + \frac{841}{480}\right]\frac{\alpha(Z\alpha)^7}{2^3}\left(\frac{m_r}{m}\right)^3 m,$$

$$\Delta E(2P_{1/2}) = \frac{41}{3072}\,\alpha(Z\alpha)^7\left(\frac{m_r}{m}\right)^3 m,$$

$$\Delta E(2P_{3/2}) = \frac{7}{1024}\,\alpha(Z\alpha)^7\left(\frac{m_r}{m}\right)^3 m.$$

There are no obstacles to an exact numerical calculation of the Uehling potential contribution to the energy shift without expansion in Zα, and such calculations have been performed with high accuracy (see [86,13] and references therein). The results of these calculations may be conveniently presented with the help of an auxiliary function G_U(Zα) defined by the relationship

$$\Delta E(l,n) = \frac{\alpha(Z\alpha)^7}{n^3}\left(\frac{m_r}{m}\right)^3 m\left[\frac{5}{96}\ln\frac{m(Z\alpha)^{-2}}{m_r}\,\delta_{l0} + \frac{1}{\pi}G_U^{nl}(Z\alpha)\right]. \tag{122}$$

For atoms with low Z (hydrogen and helium), values of the function G_U(Zα) for the states with n = 1, 2, 4 are tabulated in [13], and the respective contributions may easily be calculated for other states when needed. These numerical results may be used for the comparison of theory and experiment instead of the results of order α(Zα)⁷ given above. We may also use the results of numerical calculations to estimate the uncalculated contributions of the Uehling potential of order α(Zα)⁸ and higher. According to [13]
$$G_U^{1S}(\alpha) = 0.428052. \tag{123}$$

Comparing this value with the order α(Zα)⁷m result in Eq. (121), we see that the difference between the exact numerical result and the analytic calculation up to order α(Zα)⁷ is about 0.015 kHz for the 1S level in hydrogen. Taking into account the accuracy of the experimental results, one may use the analytic results for the comparison of theory and experiment without loss of accuracy. A similar conclusion is valid for other hydrogen levels.

b. Wichmann–Kroll potential contribution. The contribution of the Wichmann–Kroll potential in Fig. 24 may be calculated in the same way as the respective contribution of order α(Zα)⁶m in Eq. (101), by taking the next term in Zα in the small-momentum expansion of the Wichmann–Kroll potential in Eq. (100). One easily finds [108,107]
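$$\Delta E = \left(\frac{1}{16} - \frac{31\pi^2}{2880}\right)\frac{\alpha(Z\alpha)^7}{n^3}\left(\frac{m_r}{m}\right)^3 m\,\delta_{l0}. \tag{124}$$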
This contribution is very small, and it is clear that at the present level of experimental accuracy a calculation of higher-order contributions of the Wichmann–Kroll potential is not necessary.

4.5.3. Corrections of order α²(Zα)⁷m

Corrections of order α²(Zα)⁷ were never considered in the literature. They should be suppressed in comparison with the corrections of order α(Zα)⁷ by at least the factor α/π. Even taking into account possible logarithmic enhancements, these corrections are not likely to be larger than about 1 kHz for the 1S state and about 0.1 kHz for the 2S state in hydrogen. This means that they are not important today from the phenomenological point of view.
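Table 7
Radiative corrections of order αⁿ(Zα)⁷m. All coefficients multiply the common factor $\frac{4\alpha(Z\alpha)^7}{n^3}\left(\frac{m_r}{m}\right)^3 m$.

Contribution | Coefficient | ΔE(1S) (kHz) | ΔE(2S) (kHz)
Logarithmic electron-line contribution; Karshenboim [120] | $\left(\frac{139}{256} - \frac{1}{4}\ln 2\right)\ln(Z\alpha)^{-2}\,\delta_{l0}$ | 14.43 | 1.80
Nonlogarithmic electron-line contribution; Mohr [123], Karshenboim [126] | | −0.76(5) | −0.09(1)
Logarithmic polarization operator contribution; Mohr [86] | $\frac{5}{384}\ln(Z\alpha)^{-2}\,\delta_{l0}$ | 0.51 | 0.06
Nonlogarithmic polarization operator contribution, ΔE(nS); Mohr [86] | | 0.15 | 0.03
Wichmann–Kroll contribution; Wichmann and Kroll [102], Mohr [108,107] | $\left(\frac{1}{64} - \frac{31\pi^2}{11520}\right)\delta_{l0}$ | −0.04 | −0.01
Corrections of order α²(Zα)⁷ | $(\pm)\,\frac{\alpha}{\pi}$ | ±1 | ±0.1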
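The polarization-operator and Wichmann–Kroll entries of Table 7, together with the 0.015 kHz comparison between Eqs. (121) and (123), can be reproduced in a few lines. This Python sketch assumes Z = 1 and CODATA 2018 inputs (fine-structure constant, m c²/h, mass ratio) that the text does not give; all function names are illustrative.

```python
import math

ALPHA = 7.2973525693e-3        # assumed CODATA 2018 fine-structure constant
MEC2_KHZ = 1.23558999e17       # assumed electron rest energy m c^2 / h, in kHz
MASS_RATIO = 1836.15267343     # assumed proton-to-electron mass ratio
MR_OVER_M = 1.0 / (1.0 + 1.0 / MASS_RATIO)
UNIT_KHZ = ALPHA**8 * MR_OVER_M**3 * MEC2_KHZ    # alpha (Z alpha)^7 (m_r/m)^3 m for Z = 1
L_PLAIN = math.log(1.0 / ALPHA**2)               # ln (Z alpha)^-2


def vp_log_khz(n: int) -> float:
    """Logarithmic polarization row of Table 7: 4 * (5/384) ln(Z alpha)^-2 / n^3."""
    return 4 * 5 / 384 * L_PLAIN * UNIT_KHZ / n**3


def vp_nonlog_khz(n: int) -> float:
    """Nonlogarithmic parts of Eq. (121) and of its 2S analogue."""
    constant = {1: 2 * math.log(2) + 23 / 15, 2: 4 * math.log(2) + 841 / 480}[n]
    return 5 / 96 * constant * UNIT_KHZ / n**3


def wichmann_kroll_khz(n: int) -> float:
    """Eq. (124) for an nS state."""
    return (1 / 16 - 31 * math.pi**2 / 2880) * UNIT_KHZ / n**3


G_U_1S = 0.428052                                 # Eq. (123)
numerical_minus_analytic = G_U_1S / math.pi * UNIT_KHZ - vp_nonlog_khz(1)
print(f"G_U result minus analytic nonlog term (1S): {numerical_minus_analytic:+.4f} kHz")
```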
Concluding our discussion of the purely radiative corrections to the Lamb shift, let us mention once more that the main source of theoretical uncertainty in these contributions is the uncalculated contributions of order α²(Zα)⁶, which may be as large as 14 kHz for the 1S state and 2 kHz for the 2S state in hydrogen. All other unknown purely radiative contributions to the Lamb shift are much smaller. Note also that, owing to the extra theoretical information on the logarithm squared contribution of order α²(Zα)⁶, the purely radiative contributions to the difference ΔE(1S) − 8ΔE(2S) are known better than the purely radiative contributions to the individual energy levels. The uncertainty in the difference ΔE(1S) − 8ΔE(2S) due to yet unknown purely radiative terms is about 5 kHz (see Table 7).
5. Essentially two-particle recoil corrections

5.1. Recoil corrections of order (Zα)⁵(m/M)m

The leading relativistic corrections of order (Zα)⁴ and their mass dependence were discussed above in Section 4.1, in the framework of the Breit equation and the effective Dirac equation in the external field in Fig. 7. The exact mass dependence of these corrections could be easily calculated
Fig. 34. Diagrams with two-photon exchanges.
Fig. 35. Irreducible kernels with an arbitrary number of exchanged Coulomb photons.
because all these corrections are induced by the one-photon exchange. The effective Dirac equation in the external field produces the leading relativistic corrections with the correct mass dependence because the one-photon exchange kernel is properly taken into account in this equation. Some other recoil corrections of higher orders in Zα are also partially generated by the effective Dirac equation with the external source. All such corrections are necessarily of even order in Zα, since all expansions for the energy levels of the Dirac equation are effectively nonrelativistic expansions over v²; they go over (Zα)², and, hence, the next recoil correction produced by the effective Dirac equation in the external field is of order (Zα)⁶. The result for the recoil correction of order (Zα)⁶(m/M) obtained in this way is incomplete, and we will improve it below. First we will consider the even larger recoil correction of order (Zα)⁵(m/M), which is completely missed in the spectrum of the Breit equation or of the effective Dirac equation with the Coulomb potential, and which can be calculated only by taking into account the two-particle nature of the QED bound-state problem.

The external field approximation is clearly inadequate for the calculation of the recoil corrections and, in principle, one needs the machinery of the relativistic two-particle equations to deal with such contributions to the energy levels. The first nontrivial recoil corrections are generated by kernels with two-photon exchanges. Naively one might expect that all corrections of order (Zα)⁵(m/M)m are generated only by the two-photon exchanges in Fig. 34. However, the situation is more complicated. A more detailed consideration shows that the two-photon kernels are not sufficient, and the irreducible kernels in Fig. 35, with an arbitrary number of exchanged Coulomb photons spanned by a transverse photon, also generate contributions of order (Zα)⁵(m/M)m.
This effect is similar to the case of the leading-order radiative correction of order α(Zα)⁴ considered in Section 4.2.1.1, when, due to a would-be infrared divergence, diagrams in Fig. 11 with any number of external Coulomb photons spanned by a radiative photon give contributions of one and the same order, since the apparent factor Zα accompanying each extra external photon is compensated by a small denominator connected with the small virtuality of the bound electron. Exactly the same effect arises in the case of the leading recoil corrections: all kernels with any number of exchanged Coulomb photons spanned by an exchanged transverse photon generate contributions to the leading recoil correction.

Let us describe this similarity between the leading contribution to the Lamb shift and the leading recoil correction in more detail, following the nice physical interpretation given in [96]. The leading contribution to the Lamb shift in Eq. (30) is proportional to the mean square of the electron radius, which may be understood as a result of the smearing of the fluctuating electron coordinate due to its interaction with the fluctuating electromagnetic field [128]. In the considerations leading to Eq. (30) we treated the proton as an infinitely heavy source of the Coulomb field. If we take into account the finiteness of the proton mass, then the factor ⟨r²⟩ in Eq. (30) turns into ⟨(Δr_e − Δr_p)²⟩ = ⟨(Δr_e)²⟩ + ⟨(Δr_p)²⟩ − 2⟨(Δr_e)(Δr_p)⟩, where Δr_e and Δr_p are
fluctuations of the coordinates of the electron and the proton, respectively. Averaging the squares of the fluctuations of the coordinates of both particles proceeds exactly as in the case of the electron in the Coulomb field in Eq. (28), and generates the leading contribution to the Lamb shift and a recoil correction of relative order (m/M)². This recoil factor arises because the average fluctuation of the coordinate squared, equal to the average radius squared of the particle, is inversely proportional to the mass squared of this particle. Hence, it is clear that the average ⟨Δr_e Δr_p⟩ generates a recoil correction of the first order in the recoil factor m/M. Note that the correlator ⟨Δr_e Δr_p⟩ differs from zero only when the averaging goes over distances larger than the scale of the atom 1/(mZα) or, in momentum space, over fluctuating momenta of order mZα and smaller. At smaller distances (or larger momenta) the fluctuations of the coordinates of the two particles are completely uncorrelated, and the correlator of the two coordinates is equal to zero. Hence, the logarithmic contribution to the recoil correction originates from the momentum integration region m(Zα)² ≪ k ≪ m(Zα), unlike the leading logarithmic contribution to the Lamb shift, which originates from the wider region m(Zα)² ≪ k ≪ m. A new feature of the leading recoil correction is that the upper cutoff of the logarithmic integration is determined by the inverse size of the atom. We will see below how all these qualitative features are reproduced in the exact calculations.

A complete formal analysis of the recoil corrections in the framework of the relativistic two-particle equations, with the derivation of all relevant kernels, perturbation theory contributions, and necessary subtraction terms, may be performed along the same lines as was done for the hyperfine splitting in [129]. However, these results may also be understood without a cumbersome formalism by starting with the simple scattering approximation.
We will discuss the recoil corrections below using this less rigorous but more physically transparent approach. As we have already realized from the qualitative discussion above, the leading recoil correction is generated at large distances, and small exchanged momenta are relevant for its calculation. The choice of gauge of the exchanged photons is, in such a case, determined by the choice of gauge in the effective Dirac equation with the one-photon potential. This equation was written in the Coulomb gauge and, hence, we have to use the Coulomb gauge also in the kernels with more than one exchanged photon. Since the Coulomb and transverse propagators have different forms in the Coulomb gauge, it is natural to consider separately the diagrams with Coulomb–Coulomb, transverse–transverse and Coulomb–transverse exchanges.
5.1.1. Coulomb–Coulomb term

The Coulomb exchange is already taken into account in the construction of the zero-order effective Dirac equation, where the Coulomb source plays the role of the external potential. Hence, additional contributions of order (Zα)⁵ could be connected only with high-momentum Coulomb exchanges. Let us start by calculating the contribution of the skeleton Coulomb–Coulomb diagrams with on-shell external electron lines in Fig. 36, with the usual hope that the integrals will tell us themselves about any possible inadequacy of such an approximation. Direct calculation of the contribution of the two Coulomb exchange photons leads to the integral

$$\Delta E = -\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}\,\frac{4}{1-\mu^2}\int_0^\infty\frac{dk}{k^4}\left[f(\mu k) - \mu f(k)\right], \tag{125}$$
Fig. 36. Coulomb-Coulomb two-photon exchanges.
where

$$f(k) = 3\sqrt{1+k^2} + \frac{1}{\sqrt{1+k^2}}, \tag{126}$$

and μ = m/M. The apparent asymmetry of the expression in Eq. (125) with respect to the masses of the heavy and light particles emerges because the dimensionless momentum k in this formula is measured in units of the electron mass. At small momenta the function f(k) behaves as

$$f(\mu k) - \mu f(k) \approx 4(1-\mu) - \mu(1-\mu)k^2 + O(k^4), \tag{127}$$
and the skeleton integral in Eq. (125) diverges as dk/k⁴ in the infrared region. The physical meaning of this low-momentum infrared divergence is clear: it corresponds to the Coulomb exchange contribution to the Schrödinger–Coulomb wave function. The Coulomb wave function graphically includes a sum of Coulomb ladders, and the addition of an extra rung does not change the wave function. However, if one omits the binding energy, as we have effectively done above, one ends up with an infrared divergent integral instead of the self-reproducing Schrödinger–Coulomb wave function. A slightly different way to understand the infrared divergence in Eq. (125) is to realize that the terms in Eq. (127) which generate the divergent contribution correspond to the residue of the heavy proton pole in the box diagram. Once again, these heavy-particle pole contributions build the Coulomb wave function, and we have to subtract them not only to avoid an apparent divergence in the approximation where we neglect the binding energy, but also to avoid double counting. We would like to emphasize here that, even if one forgot about the threat of double counting, the emerging powerlike infrared divergence would remind us of the necessity of this subtraction. Any powerlike infrared divergence is cut off by the binding energy and has a well-defined order in the parameter Zα. Most importantly, the integral in Eq. (125) does not contain any logarithmic infrared divergence at small momenta. In such a case one can unambiguously subtract the powerlike infrared divergent terms in the integrand, and the remaining integral is completely convergent. Then only high intermediate momenta of the order of the electron mass contribute to the subtracted integral, the respective diagram is effectively local in coordinate space, and the contribution to the energy shift of order (Zα)⁵ is simply given by the product of this integral and the nonrelativistic Schrödinger–Coulomb wave function squared at the origin. Any attempt to take into account the small virtuality of the external electron lines (equivalent to taking into account the nonlocality of the diagram in coordinate space) would lead to additional factors of Zα, which we do not consider yet. Direct calculation, after subtraction of the first two leading low-frequency terms in the integrand in Eq. (125), immediately gives

$$\Delta E = -\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}\,\frac{4}{1-\mu^2}\int_0^\infty\frac{dk}{k^4}\left\{f(\mu k) - \mu f(k) - \left[4(1-\mu) - \mu(1-\mu)k^2\right]\right\} = -\frac{4\mu}{3}\,\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}, \tag{128}$$
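The statement that the subtracted integral in Eq. (128) is finite and equals −4μ/3 (in units of (Zα)⁵m_r³/(πn³m²)) can be checked numerically. In this sketch the subtracted integrand is rewritten, by plain algebra on Eq. (126), as μ⁴h(μk) − μh(k) with h(k) = [f(k) − 4 − k²]/k⁴ = −k²/{s[(4 + 3k²) + (4 + k²)s]} and s = √(1 + k²), which avoids catastrophic cancellation at small k. The substitution k = sinh t and the composite Simpson rule are choices made here, not part of the text.

```python
import math


def h(k: float) -> float:
    """[f(k) - 4 - k^2] / k^4 for f(k) = 3 sqrt(1+k^2) + 1/sqrt(1+k^2), cancellation-free."""
    s = math.sqrt(1.0 + k * k)
    return -k * k / (s * ((4.0 + 3.0 * k * k) + (4.0 + k * k) * s))


def subtracted_integrand(k: float, mu: float) -> float:
    """{f(mu k) - mu f(k) - [4(1-mu) - mu(1-mu)k^2]} / k^4, written as mu^4 h(mu k) - mu h(k)."""
    return mu**4 * h(mu * k) - mu * h(k)


def simpson(func, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3


def cc_coefficient(mu: float, t_max: float = 60.0) -> float:
    """Coefficient of (Z alpha)^5 m_r^3 / (pi n^3 m^2) in Eq. (128); k = sinh(t), dk = cosh(t) dt."""
    integral = simpson(lambda t: subtracted_integrand(math.sinh(t), mu) * math.cosh(t), 0.0, t_max)
    return -4.0 / (1.0 - mu**2) * integral


MU_H = 1.0 / 1836.15267343   # assumed electron-to-proton mass ratio for hydrogen
print(cc_coefficient(MU_H), -4.0 * MU_H / 3.0)
```

The check works for any μ < 1, not only for hydrogen, which is a useful sanity test of the subtraction itself.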
Equation (128) reproduces the well-known result [94,95,25]. Let us emphasize once again that an exact calculation (in contrast to a calculation with logarithmic accuracy) of the Coulomb–Coulomb contribution with the help of the skeleton integral turned out to be feasible due to the absence of a low-frequency logarithmic divergence. For logarithmically divergent integrals the low-frequency cutoff is supplied by the wave function, and in such a case it is impossible to calculate the constant on the background of the logarithm in the skeleton approximation. In such cases a more accurate treatment of the low-frequency contributions is warranted.

5.1.2. Transverse–transverse term

The kernels with two transverse exchanges in Fig. 37 give the following contribution to the energy shift in the scattering approximation:
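$$\Delta E = -\frac{2\mu}{1-\mu^2}\,\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}\int dk\,\bigl[f(k) - \mu^3 f(\mu k)\bigr], \tag{129}$$

where

$$f(k) = \frac{1}{k} - \frac{1}{\sqrt{1+k^2}}. \tag{130}$$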
This integral diverges only logarithmically at small momenta. Hence, this contribution contains neither corrections of the previous order nor nonrecoil corrections. The main low-frequency logarithmic divergence produces ln Zα, and the factor before this logarithm may easily be calculated in the scattering approximation. This approximation is insufficient for the calculation of the nonlogarithmic contribution, and the respective calculation requires a more accurate consideration [94,95]. A new feature of the integral in Eq. (129), as compared with the other integrals discussed so far, is that exchanged momenta higher than the electron mass produce a nonvanishing contribution. This new integration region, from the electron to the proton mass, which was discovered in [95], arises here for the first time in the bound-state problem. As we will see below, especially in the discussion of the hyperfine splitting, these high momenta are responsible for a number of important contributions to the energy shifts. The high-momentum contribution to the Lamb shift is suppressed by the second power of the recoil factor (m/M) and is rather small. Let us note that the result we will obtain below in this section is literally valid only for an elementary proton, since for integration momenta comparable with the proton mass one cannot ignore the composite nature of the proton and has to take into account its internal structure, as described by the phenomenological form factors. It is also necessary to take into account inelastic contributions in the diagrams with two exchanged photons. We will consider these additional contributions later, in Section 7.2, dealing with the nonelectromagnetic contributions to the Lamb shift.

Fig. 37. Transverse–transverse two-photon exchanges.

The state-independent high-frequency contribution, as well as the low-frequency logarithmic term, differs from zero only for S states and may easily be calculated with the help of Eq. (129):

$$\Delta E = \left[-\frac{2\mu^3\ln\mu}{1-\mu^2} - 2\mu\ln(1+\mu) + (2\ln Z\alpha + 2\ln 2)\,\mu\right]\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}\,\delta_{l0}, \tag{131}$$
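Although Eq. (131) looks asymmetric in the two masses, the transverse–transverse diagrams are symmetric, so the expression should be invariant under interchanging the light and heavy particles (μ → 1/μ together with m ↔ M in the prefactor). A quick numerical check of this invariance is sketched below in units ħ = c = 1 with masses measured in electron masses; the mass ratio and α are assumed CODATA 2018 inputs, and `tt_high_frequency` is an illustrative name.

```python
import math


def tt_high_frequency(m: float, M: float, z_alpha: float, n: int = 1) -> float:
    """Eq. (131) for an S state, in the same units as the masses (hbar = c = 1)."""
    mu = m / M
    m_r = m * M / (m + M)
    prefactor = z_alpha**5 / (math.pi * n**3) * m_r**3 / m**2
    bracket = (-2.0 * mu**3 * math.log(mu) / (1.0 - mu**2)
               - 2.0 * mu * math.log(1.0 + mu)
               + (2.0 * math.log(z_alpha) + 2.0 * math.log(2.0)) * mu)
    return bracket * prefactor


ALPHA = 7.2973525693e-3        # assumed CODATA 2018 value
M_E, M_P = 1.0, 1836.15267343  # masses in units of the electron mass (assumed ratio)

light_heavy = tt_high_frequency(M_E, M_P, ALPHA)
swapped = tt_high_frequency(M_P, M_E, ALPHA)   # interchange the roles of the two particles
print(light_heavy, swapped)
```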
Equation (131) is in complete accord with the well-known result [94,95,25]. Note that, despite its appearance, this result is symmetric under the permutation of the heavy and light particles, as expected beforehand, since the diagrams with two transverse exchanges are symmetric. In order to preserve this symmetry we cut the integral from below at momenta of order m_r Zα when calculating the contribution in Eq. (131). In order to obtain the state-dependent low-frequency contribution of the double transverse exchange, it is necessary to restore the dependence of the graphs with two exchanged photons on the external momenta and to calculate the matrix elements of these diagrams between the momentum-dependent wave functions. The respective momentum integrals should be cut off from above at m_r Zα. The wave function momenta provide an effective lower cutoff for the loop integrals, and one may get rid of the upper cutoff by matching the low- and high-frequency contributions. The calculation for an arbitrary principal quantum number is rather straightforward but tedious [94,95,83,25,130,131] and leads to the result
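$$\Delta E = \left\{\left[2\ln\frac{2Z\alpha}{n} + 2\bigl[\psi(n+1) - \psi(1)\bigr] + \frac{n-1}{n} + \frac{8(1-\ln 2)}{3} - 2\,\frac{M^2\ln(m/m_r) - m^2\ln(M/m_r)}{M^2-m^2}\right]\delta_{l0} - \frac{1-\delta_{l0}}{l(l+1)(2l+1)}\right\}\frac{m}{M}\,\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}. \tag{132}$$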
Let us emphasize that the total contribution of the double transverse exchange is given by the matrix element of the two-photon exchanges between the Schrödinger–Coulomb wave functions, and no kernels with a higher number of exchanges arise in this case, unlike the case of the main contribution to the Lamb shift discussed in Section 4.2.1.1 and the case of the transverse–Coulomb recoil contribution, which we will discuss next.

5.1.3. Transverse–Coulomb term

One should expect that the contribution of the transverse–Coulomb diagrams in Fig. 38 vanishes in the scattering approximation because, in this approximation, there are no external vectors which are needed in order to contract the transverse photon propagator. The only available vector in the scattering approximation is the exchanged momentum itself, which turns into zero after contraction with the transverse photon propagator. It is easy to check that this is just what
Fig. 38. Transverse-Coulomb two-photon exchanges.
happens: the electron and proton traces are proportional to the exchanged momentum k_i in the scattering approximation and vanish when dotted with the transverse photon propagator. This does not mean, however, that the diagrams with one transverse exchange do not contribute to the energy shift. We still have to explore whether any contributions could be generated by the exchange of a transverse photon with a small momentum between m_r(Zα)² and m_r(Zα), when one clearly cannot neglect the momenta of the external wave functions, which are of the same order of magnitude. Hence, we have to consider all kernels in Fig. 35 with a transverse exchanged photon spanning an arbitrary number of Coulomb exchanges. As we have already discussed at the beginning of this section, one might expect that when the momentum of the transverse photon is smaller than the characteristic atomic momentum m_r Zα (in other words, when the wavelength of the transverse quantum is larger than the size of the atom), the contribution to the Lamb shift generated by such a photon would differ only by an additional factor m/M from the leading contribution to the Lamb shift of order α(Zα)⁴m, simply because such a photon cannot tell the electron from the proton. The extra factor m/M is due simply to the smaller velocity of the heavy particle in the atom (we remind the reader that the transverse photon interaction vertex with a charged particle in the nonrelativistic approximation is proportional to the velocity of the particle).

Old-fashioned perturbation theory is more suitable for exploring such small intermediate momentum contributions. The correction due to the exchange of the transverse photon is described in this framework simply as a second-order perturbation theory contribution, where the role of the perturbation potential is played by the transverse photon emission (absorption) vertex.
In this framework the Coulomb potential plays the role of the unperturbed potential, so the simple second-order contribution which we have just described takes into account all kernels of the relativistic two-body equation in Fig. 35 with any number of Coulomb exchanges spanned by the transverse photon. Summation over intermediate states in the nonrelativistic perturbation theory in our case means integration over all intermediate momenta. It is clear that for momenta larger than the characteristic atomic momentum m_r Zα the integration over the external wave function momenta decouples (and we obtain the values of the wave functions at the origin instead of the wave functions themselves), and one may forget about the binding energies in the intermediate states. Then the contribution of this high (larger than m_r Zα) region of momenta reduces to the matrix element of the Breit interaction (transverse quanta exchange). As we have explained above, this matrix element does not give any contribution to the Lamb shift (but it gives the main contribution to the hyperfine splitting, see below). All this means that the total recoil correction of order (Zα)⁵(m/M)m may be calculated in the nonrelativistic approximation. The calculations go exactly in the same way as the calculation of the leading low-energy contribution to the Lamb shift in Section 4.2.1.1. Due to the validity of the nonrelativistic approximation, the matrix elements of the Dirac matrices corresponding to the emission of transverse quanta by the constituent particles reduce to the velocities α_i → p_i/m_i, where p_i and m_i are the momenta of the constituents (of the same magnitude and with opposite directions in the center-of-mass system) and their masses. Note that the recoil factor arises simply as a result of kinematics. Again, as in the
case of the main contribution to the Lamb shift in Section 4.2.1.1, one introduces an auxiliary parameter σ (m_r(Zα)² ≪ σ ≪ m_r(Zα)/n) in order to facilitate further calculations. Then the low-frequency contribution coincides, up to an extra factor 2Zm/M (the extra factor 2 arises from the two ways of emitting the transverse quanta, by the first and by the second constituent), with the respective low-frequency contribution of order α(Zα)⁴m, and we may simply borrow the result from that calculation (compare Eq. (40)):

$$\Delta E = \frac{8}{3}\left[\ln\frac{2\sigma}{m_r(Z\alpha)^2}\,\delta_{l0} - \ln k_0(n,l)\right]\frac{m}{M}\,\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}. \tag{133}$$

In the region where k ≥ σ we may safely neglect the binding energies in the denominators of the second-order perturbation theory and thus simplify the integrand. After integration one obtains

$$\Delta E = \frac{8}{3}\left\{\left[\ln\frac{m_r Z\alpha}{n\sigma} + \psi(n+1) - \psi(1) - \frac{1}{2n} + \frac{5}{6} + \ln 2\right]\delta_{l0} - \frac{1-\delta_{l0}}{2l(l+1)(2l+1)}\right\}\frac{m}{M}\,\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}. \tag{134}$$
Let us emphasize once more that, as was discussed above, the "high frequencies" in this formula are effectively cut off at the characteristic bound-state momenta m_r(Zα)/n. This leads to two specific features of the formula above. First, this expression contains the characteristic logarithm of the principal quantum number n, and, second, the logarithm of the recoil factor (1 + m/M) is missing, unlike the case of the nonrecoil correction of order α(Zα)⁴m in Eq. (40). The source of this difference is easy to see. In the nonrecoil case the effective upper cutoff was supplied by the electron mass m, while at low frequency only the reduced mass enters all expressions. This mismatch between the masses leads to the appearance of the logarithm of the recoil factor. In the present case the effective upper cutoff m_r(Zα)/n also depends only on the reduced mass and, hence, no extra factor under the logarithm arises. Matching both contributions we obtain [94,95,83,25,130]

$$\Delta E = \frac{8}{3}\left\{\left[\ln\frac{2}{nZ\alpha} + \psi(n+1) - \psi(1) - \frac{1}{2n} + \frac{5}{6} + \ln 2\right]\delta_{l0} - \ln k_0(n,l) - \frac{1-\delta_{l0}}{2l(l+1)(2l+1)}\right\}\frac{m}{M}\,\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}. \tag{135}$$
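The total recoil correction of order (Zα)⁵(m/M)m is given by the sum of the expressions in Eqs. (128), (132), and (135):

$$\Delta E = \left\{\left[\frac{2}{3}\ln(Z\alpha)^{-1} + \frac{14}{3}\left(\ln\frac{2}{n} + \psi(n+1) - \psi(1) + \frac{2n-1}{2n}\right) - \frac{1}{9} - 2\,\frac{M^2\ln(m/m_r) - m^2\ln(M/m_r)}{M^2-m^2}\right]\delta_{l0} - \frac{8}{3}\ln k_0(n,l) - \frac{7(1-\delta_{l0})}{3l(l+1)(2l+1)}\right\}\frac{m}{M}\,\frac{(Z\alpha)^5}{\pi n^3}\,\frac{m_r^3}{m^2}. \tag{136}$$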
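The bookkeeping behind Eq. (136) can be verified by adding the three partial results numerically. The sketch below expresses Eqs. (128), (132), (135) and (136) as coefficients of (m/M)(Zα)⁵m_r³/(πn³m²), uses ψ(n+1) − ψ(1) = H_n (the harmonic number), and compares the sum with the total for several S, P and D states. The Bethe logarithm enters all pieces identically, so the value `LN_K0` below is only a placeholder (the 1S value), and the mass ratio and α are assumed CODATA 2018 inputs.

```python
import math


def harmonic(n: int) -> float:
    """psi(n+1) - psi(1) = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def mass_log(m: float, M: float) -> float:
    """[M^2 ln(m/m_r) - m^2 ln(M/m_r)] / (M^2 - m^2)."""
    m_r = m * M / (m + M)
    return (M**2 * math.log(m / m_r) - m**2 * math.log(M / m_r)) / (M**2 - m**2)


def coulomb_coulomb(n, l):                       # Eq. (128), S states only
    return -4.0 / 3.0 if l == 0 else 0.0


def transverse_transverse(n, l, za, m, M):       # Eq. (132)
    if l == 0:
        return (2 * math.log(2 * za / n) + 2 * harmonic(n) + (n - 1) / n
                + 8 * (1 - math.log(2)) / 3 - 2 * mass_log(m, M))
    return -1.0 / (l * (l + 1) * (2 * l + 1))


def transverse_coulomb(n, l, za, ln_k0):         # Eq. (135)
    s_part = 0.0
    if l == 0:
        s_part = math.log(2 / (n * za)) + harmonic(n) - 1 / (2 * n) + 5 / 6 + math.log(2)
    p_part = 0.0 if l == 0 else 1.0 / (2 * l * (l + 1) * (2 * l + 1))
    return 8.0 / 3.0 * (s_part - ln_k0 - p_part)


def total(n, l, za, m, M, ln_k0):                # Eq. (136)
    if l == 0:
        return (2 / 3 * math.log(1 / za)
                + 14 / 3 * (math.log(2 / n) + harmonic(n) + (2 * n - 1) / (2 * n))
                - 1 / 9 - 2 * mass_log(m, M) - 8 / 3 * ln_k0)
    return -8 / 3 * ln_k0 - 7 / (3 * l * (l + 1) * (2 * l + 1))


ZA = 7.2973525693e-3            # Z alpha for hydrogen (assumed CODATA 2018 alpha)
M_E, M_P = 1.0, 1836.15267343   # masses in electron-mass units (assumed ratio)
LN_K0 = 2.984128556             # placeholder Bethe logarithm; it cancels in the comparison

for n, l in [(1, 0), (2, 0), (2, 1), (3, 2)]:
    parts = (coulomb_coulomb(n, l) + transverse_transverse(n, l, ZA, M_E, M_P)
             + transverse_coulomb(n, l, ZA, LN_K0))
    print(n, l, parts, total(n, l, ZA, M_E, M_P, LN_K0))
```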
5.2. Recoil corrections of order (Zα)⁶(m/M)m

5.2.1. The Braun formula

Calculation of the recoil corrections of order (Zα)⁶(m/M)m requires consideration of the kernels with three exchanged photons. As in the case of the recoil corrections of order (Zα)⁵, low exchange momenta produce nonvanishing contributions, the external wave functions do not decouple, and exact calculations in the direct diagrammatic framework are rather tedious and cumbersome. There is a long and complicated history of theoretical investigation of this correction. The program of diagrammatic calculation was started in [131–133]. The corrections obtained in these works contained a logarithm ln Zα as well as a constant term. However, completely independent calculations [134,135] of both recoil and nonrecoil logarithmic contributions of order (Zα)⁶ showed that, somewhat miraculously, all logarithmic terms cancel in the final result. This observation required a complete reconsideration of the whole problem. The breakthrough was achieved in [136], where one and the same result was obtained in two apparently different frameworks. The first, more traditional approach, used earlier in [73,131–133], starts with an effective Dirac equation in the external field; corrections to the Dirac energy levels are then calculated with the help of a systematic diagrammatic procedure. The other, logically independent calculational framework, also used in [136], starts with an exact expression for all recoil corrections of the first order in the mass ratio m/M of the light and heavy particles. This remarkable expression, which is exact in Zα, was first discovered by Braun [137], and later rederived and refined in a number of papers [138,139,136]. A particularly transparent representation of the Braun formula was obtained in [139]
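$$\Delta E = -\frac{1}{M}\,\mathrm{Re}\int\frac{d\omega}{2\pi i}\int\frac{d^3k}{(2\pi)^3}\,\langle n|\big(\mathbf{p}-\mathbf{D}(\omega,\mathbf{k})\big)\,G(E+\omega)\,\big(\mathbf{p}-\mathbf{D}(\omega,\mathbf{k})\big)|n\rangle, \tag{137}$$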
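where $G(E+\omega)$ is the total Green function of the Dirac electron in the external Coulomb field, and $\mathbf{D}(\omega,\mathbf{k})$ is the transverse photon propagator in the Coulomb gauge
$$\mathbf{D}(\omega,\mathbf{k}) = -4\pi Z\alpha\,\frac{\boldsymbol{\alpha}_{\mathbf{k}}}{\omega^2-\mathbf{k}^2+i0}, \tag{138}$$
and
$$\boldsymbol{\alpha}_{\mathbf{k}} = \boldsymbol{\alpha} - \frac{\mathbf{k}(\boldsymbol{\alpha}\cdot\mathbf{k})}{\mathbf{k}^2}, \qquad \boldsymbol{\alpha} = \gamma^0\boldsymbol{\gamma}. \tag{139}$$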
Before returning to the recoil corrections of order $(Z\alpha)^6$, we will digress to discuss the Braun formula. We will not give a detailed derivation of this formula, referring the reader instead to the original derivations [137–139,136]. We will, however, present below some physically transparent semiquantitative arguments which make the existence and even the exact appearance of the Braun formula very natural. Let us return to the original Bethe–Salpeter equation (see Eq. (7)). As we have already discussed, there are many ways to organize the Feynman diagrams which comprise the kernel of this equation. However, in all common perturbation theory considerations of this kernel the main emphasis is on presenting the kernel in an approximate form sufficient for calculation of corrections to the energy levels of a definite order in the coupling constant α. The revolutionary idea, first suggested in [140] and elaborated in [137], was to reject such an approach completely, and instead to organize the perturbation theory with respect to another small parameter, namely, the mass ratio of the light and heavy particles. To this end an expansion of the heavy particle propagator over 1/M was considered in [137]. It is well known (see, e.g., [137]) that in the leading order of this expansion the Bethe–Salpeter equation reduces to the Dirac equation for the light particle in the external Coulomb field created by the heavy particle. This is by itself nontrivial, but well understood by now, since to restore the Dirac equation in the external field one has to take into account irreducible kernels of the Bethe–Salpeter equation with an arbitrary number of crossed exchange photon lines (see Fig. 39). Unlike the solutions of the effective Dirac equation considered above in Section 2.3, the solutions of the Dirac equation obtained in this way contain as a mass parameter the mass of the light particle, and not the reduced mass of the system. This is the price one has to pay in the Braun approach for summation of all corrections in the expansion over Zα. The zero-order Green function in this approach is simply the Coulomb–Dirac Green function.

The next step in the derivation of the Braun formula is to consider all kernels of the Bethe–Salpeter equation which produce corrections of order m/M. The crucial observation, which immediately leads to the closed expression for the recoil corrections of order m/M, is that all corrections linear in the mass ratio are generated by kernels in which all but one of the heavy particle propagators are replaced by the leading terms in their large mass expansion, and the remaining propagator is replaced by the next term in the large mass expansion of the heavy particle propagator. The corresponding kernels with the minimum number of exchanged photons are the box and the crossed box diagrams in Fig. 34, where the heavy particle propagator is replaced by the second term in its large mass expansion. All diagrams obtained from these two by insertions of any number of external Coulomb photons between the two exchanges in Fig. 34 and/or of any number of radiative photons in the electron line also generate corrections linear in the mass ratio. It is not difficult to see that these are the only kernels which produce corrections linear in the small mass ratio; all other kernels generate corrections of higher order in m/M and, hence, are not of interest in this context. Then the contribution to the energy shift linear in the small mass ratio is equal to the matrix element of the two graphs in Fig. 34 with the total electron Green function in the external Coulomb field instead of the upper electron line. This matrix element, which should be calculated between the unperturbed Dirac–Coulomb wave functions, reduces after simple algebraic transformations to the Braun formula in Eq. (137).

All terms in the Braun formula have a transparent physical sense. The term containing the product $\mathbf{p}\mathbf{p}$ (first obtained in [140]) originates from the exchange of two Coulomb photons, the terms with $\mathbf{p}\mathbf{D}$ and $\mathbf{D}\mathbf{p}$ correspond to the exchange of Coulomb and magnetic (transverse) quanta, and the term $\mathbf{D}\mathbf{D}$ is connected with the double transverse exchange. Another useful perspective on the Braun formula is provided by the idea, first suggested in the original work [137] and later used as a tool to rederive Eq. (137) in [139,136], that the recoil corrections linear in the small mass ratio m/M are associated with the matrix element of the nonrelativistic proton Hamiltonian
$$H = \frac{(\mathbf{p}-e\mathbf{A})^2}{2M}. \tag{140}$$

Fig. 39. Irreducible kernels with crossed exchange photon lines.
There is a clear one-to-one correspondence between the terms in this nonrelativistic Hamiltonian and the respective terms in Eq. (137). The latter could be obtained as matrix elements of the operators which enter the Hamiltonian in Eq. (140) [139,136].

5.2.2. Lower-order recoil corrections and the Braun formula

Being exact in the parameter Zα and an expansion in the mass ratio m/M, the Braun formula in Eq. (137) should reproduce, with linear accuracy in the small mass ratio, all purely recoil corrections of orders $(Z\alpha)^2(m/M)m$, $(Z\alpha)^4(m/M)m$, $(Z\alpha)^6(m/M)m$ in Eq. (38) which were discussed above. Corrections of lower orders in Zα are generated by the simplified Coulomb–Coulomb and Coulomb–transverse entries in Eq. (137). The main part of the Coulomb–Coulomb contribution in Eq. (137) may be written in the form
$$\Delta E_{C\text{-}C} = \frac{1}{2M}\,\langle n|\mathbf{p}^2|n\rangle, \tag{141}$$
while the Breit (nonretarded) part of the magnetic contribution has the form
$$\Delta E_{M} = -\frac{1}{2M}\,\langle n|\mathbf{p}\cdot\mathbf{D}(0,\mathbf{k}) + \mathbf{D}(0,\mathbf{k})\cdot\mathbf{p}|n\rangle. \tag{142}$$
Calculation of the matrix elements in Eqs. (141) and (142) is greatly simplified by the use of the virial relations (see, e.g., [141,142]), and one obtains the sum of the contributions in Eqs. (141) and (142) in a very nice form [138] (compare Eq. (38))
$$\Delta E_{C\text{-}C} + \Delta E_{M} = \frac{m^2 - E_{nj}^2}{2M} = -\frac{m}{M}\,[f(n,j)-1]\,m - \frac{m}{2M}\,[f(n,j)-1]^2\,m, \tag{143}$$
where $E_{nj}$ and $f(n,j)$ are defined in Eqs. (4) and (5), respectively. This representation again emphasizes the simple physical idea behind the Braun formula: the recoil corrections of the first order in the small mass ratio m/M are given by the matrix elements of the heavy particle kinetic energy. The recoil correction in Eq. (143) is the leading contribution to the energy levels generated by the Braun formula; all other contributions produced by the remaining terms in the Braun formula start at least at order $(Z\alpha)^5$ [138]. The expression in Eq. (143) exactly reproduces all contributions linear in the mass ratio in Eq. (38). This is just what should be expected, since it is exactly the Coulomb and Breit potentials which were taken into account in the construction of the effective Dirac equation which produced Eq. (38). The exact mass dependence of the terms of order $(Z\alpha)^2(m/M)m$ and $(Z\alpha)^4(m/M)m$ is contained in Eq. (38), and, hence, terms linear in the mass ratio in Eq. (143) give nothing new. It is important to realize at this stage that the contributions of order $(Z\alpha)^6(m/M)m$ in Eqs. (38) and (143) coincide as well, so any corrections of this order obtained with the help of the entries in the Braun formula not taken care of in Eqs. (141) and (142) should be added to the order $(Z\alpha)^6(m/M)m$ contribution in Eq. (38).
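The identity in Eq. (143) can be checked numerically. The sketch below assumes the standard Dirac–Coulomb form f(n, j) = [1 + (Zα)²/(n − j − 1/2 + √((j + 1/2)² − (Zα)²))²]^(−1/2) for the function of Eq. (5), and assumed CODATA 2018 values for α and m/M; everything is in units of the electron mass m.

```python
import math

ALPHA = 7.2973525693e-3          # assumed CODATA 2018 value
MASS_RATIO = 1 / 1836.15267343   # assumed m/M for hydrogen


def dirac_f(n: int, j: float, z_alpha: float) -> float:
    """Dirac-Coulomb level in units of m, E_nj = m f(n, j)."""
    k = j + 0.5
    d = n - k + math.sqrt(k * k - z_alpha**2)
    return 1 / math.sqrt(1 + (z_alpha / d) ** 2)


def eq143_sides(n: int, j: float, z_alpha: float, mu: float):
    """Both forms of Eq. (143) in units of m, with mu = m/M."""
    f = dirac_f(n, j, z_alpha)
    closed = mu * (1 - f * f) / 2                      # (m^2 - E_nj^2) / (2M)
    expanded = -mu * (f - 1) - mu * (f - 1) ** 2 / 2   # right-hand side
    return closed, expanded
```

For the 1S level f = √(1 − (Zα)²), so the closed form equals (m/M)(Zα)²m/2 exactly, i.e. the familiar first-order reduced-mass shift of the Bohr level.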
5.2.3. Recoil correction of order $(Z\alpha)^6(m/M)m$ to the S levels

Calculation of the recoil contribution of order $(Z\alpha)^6(m/M)m$ to the 1S and 2S states generated by the Braun formula was first performed in [136]. Separation of the high- and low-frequency contributions was made with the help of the ε-method [92] described above in Section 4.4.1.2. Hence, not only were contributions of order $(Z\alpha)^6(m/M)m$ obtained in [136], but also parts of the recoil corrections of order $(Z\alpha)^5$ linear in m/M, discussed in Section 5.1, were reproduced for the 1S state. The older methods of Section 5.1 lead to a more precise result for the recoil corrections of order $(Z\alpha)^5$, and, hence, this calculation in the framework of the Braun formula is not too interesting on its own. However, it served as an important check of the self-consistency of the calculations in [136]. The calculations in [136], while in principle very straightforward, turned out to be rather lengthy just because all corrections of previous orders in Zα were recovered. The agreement on the magnitude of the $(Z\alpha)^6(m/M)m$ contribution for the 1S and 2S states obtained in the diagrammatic approach and in the framework of the Braun formula, achieved in [136], seemed to put an end to all problems connected with this correction. However, it was claimed in a later work [143] that the result of [136] is in error. The discrepancy between the results of [136,143] was even more confusing since the calculation in [143] was also performed with the help of the Braun formula. It was observed in [143] that, due to the absence of the logarithmic contributions of order $(Z\alpha)^6(m/M)m$ proved earlier in [135], the calculations may be organized in a more compact way than in [136]. The main idea of [143] is that it is possible to make some approximations which are inadequate for calculation of the contributions of the previous orders in Zα, significantly simplifying the calculation of the correction of order $(Z\alpha)^6(m/M)m$.
Due to the absence of the logarithmic contributions of order $(Z\alpha)^6(m/M)m$ proved in [135], infrared divergences connected with these crude approximations would be powerlike and can be safely thrown away. Next, the absence of logarithmic corrections of order $(Z\alpha)^6(m/M)m$ means that it is not necessary to worry too much about matching the low- and high-frequency (long- and short-distance in terms of [143]) contributions, since each region will produce only nonlogarithmic contributions and the correction terms would be suppressed as powers of the separation parameter. Of course, such an approach would be doomed if logarithmic divergences were present, since in such a case one could not hope to calculate the additive constant to the logarithm, the exact value of the integration cutoff being unknown. Despite all these nice features of the approach of [143], the result was erroneous and contradicted the result in [136]. The discrepancy was resolved in [144], where a new, logically independent calculation of the recoil corrections of order $(Z\alpha)^6(m/M)m$ was performed. A subtle error in dealing with cutoffs in [143] was discovered, and the result of [136] for the S states with n = 1, 2 was confirmed. The recoil correction of order $(Z\alpha)^6(m/M)m$ for S states with arbitrary principal quantum number n, beyond that which is already contained in Eq. (38), has the form [144]
$$\Delta E = \left(4\ln 2 - \frac{7}{2}\right)\frac{(Z\alpha)^6}{n^3}\,\frac{m}{M}\,m. \tag{144}$$
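To see the size of Eq. (144) in hydrogen, the sketch below converts it to kHz. The physical constants are assumed CODATA 2018 values, not inputs from the text. An optional (m_r/m)³ factor is included because the Table 8 prefactor carries m_r³; with it, the sketch should reproduce the tabulated −7.38 kHz (1S) and −0.92 kHz (2S).

```python
import math

# Assumed CODATA 2018 inputs, not taken from the text.
ALPHA = 7.2973525693e-3
MASS_RATIO = 1 / 1836.15267343              # m/M for hydrogen
ME_C2_HZ = 0.51099895e6 / 4.135667696e-15   # m c^2 / h in Hz


def recoil_z6_s_khz(n: int, z: int = 1, reduced_mass_cubed: bool = False) -> float:
    """Eq. (144): (4 ln 2 - 7/2) (Z alpha)^6 / n^3 (m/M) m, in kHz."""
    value = (4 * math.log(2) - 3.5) * (z * ALPHA) ** 6 / n**3 * MASS_RATIO * ME_C2_HZ
    if reduced_mass_cubed:
        value *= (1 / (1 + MASS_RATIO)) ** 3   # extra (m_r/m)^3 factor
    return value / 1e3
```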
The author of [143] later published a new paper [145] in which the error in [143] is acknowledged. A new result for the recoil corrections of order $(Z\alpha)^6(m/M)m$ was obtained in [145], which, being different from the previous result [143] of the same author, also contradicts the results of [136,144]. We think that the manner of separation of the contributions from large and small distances in [145] is arbitrary and inconsistent, and consider the result of [145] to be in error.
This result was recently confirmed in [146], where the recoil correction of the first order in the mass ratio was calculated without expansion over Zα for the 1S and 2S states in hydrogen. The difference between the results in Eq. (144) and in [146] may be considered as an estimate of the recoil corrections of higher order in Zα. This difference is equal to 0.22(1) kHz for the 1S level, and is about 0.03 kHz for the 2S level. It is too small to be important for comparison between theory and experiment at the current accuracy level.

5.2.4. Recoil correction of order $(Z\alpha)^6(m/M)m$ to the non-S levels

Recoil corrections of order $(Z\alpha)^6(m/M)m$ to the energy levels with nonvanishing orbital angular momentum may also be calculated with the help of the Braun formula [99]. We prefer to discuss briefly another approach, which was used in the first calculation of the recoil corrections of order $(Z\alpha)^6(m/M)m$ to the P levels [147]. The idea of this approach (see, e.g., the review in [134]) is to extend the standard Breit interaction Hamiltonian (see, e.g., [20]), which takes into account relativistic corrections of order v²/c², to the next order in the nonrelativistic expansion, and thus also take into account the corrections of order v⁴/c⁴. Contrary to the common wisdom, such an approach turns out to be quite feasible and effective, and it was worked out in a number of papers [148,134] (see also references therein). This nonrelativistic approach is limited to the calculation of the large distance (small intermediate momenta) contributions, since any short distance correction effectively leads to an ultraviolet divergence in this framework. Powerlike ultraviolet divergences signal the presence of corrections of lower order in Zα (in contrast to the scattering approximation, where the presence of such corrections reveals itself in the form of powerlike infrared divergences), and are not under control in this approach.
However, the logarithmic ultraviolet divergences are well under control and produce logarithms of the fine structure constant. A number of logarithmic contributions to the energy levels and decay widths were calculated in this approach [148,134]. In the case of states with nonvanishing angular momenta the small distance contributions are effectively suppressed by the vanishing of the wave function at the origin, and the perturbation theory becomes convergent in the nonrelativistic region. Then this nonrelativistic approach leads to an exact result for the recoil correction of order $(Z\alpha)^6(m/M)m$ for the P states [147]
$$\Delta E(nP) = \frac{2}{5}\left(1 - \frac{2}{3n^2}\right)\frac{(Z\alpha)^6}{n^3}\,\frac{m}{M}\,m. \tag{145}$$
Again, this expression contains only corrections not taken into account in Eq. (38). The approach of [147] may be generalized for calculation of the recoil corrections to the energy levels with even higher orbital angular momenta. The general expression for the recoil corrections of order $(Z\alpha)^6(m/M)m$ to the energy level with an arbitrary nonvanishing angular momentum was obtained in [99]
$$\Delta E(nL) = \left[1 - \frac{l(l+1)}{3n^2}\right]\frac{3}{4\,(l-\frac{1}{2})(l+\frac{1}{2})(l+\frac{3}{2})}\,\frac{(Z\alpha)^6}{n^3}\,\frac{m}{M}\,m. \tag{146}$$
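Eq. (146) should reduce to Eq. (145) at l = 1. A minimal exact-arithmetic check, using Python's standard `fractions.Fraction` to keep the rational coefficients of $(Z\alpha)^6 n^{-3}(m/M)m$ exact (the function names are illustrative, not from the text):

```python
from fractions import Fraction


def coeff_146(n: int, l: int) -> Fraction:
    """Coefficient of (Z alpha)^6/n^3 (m/M) m in Eq. (146); valid for 1 <= l < n."""
    if l < 1 or n <= l:
        raise ValueError("Eq. (146) needs 1 <= l < n")
    half = Fraction(1, 2)
    angular = Fraction(3, 4) / ((l - half) * (l + half) * (l + 3 * half))
    return (1 - Fraction(l * (l + 1), 3 * n * n)) * angular


def coeff_145(n: int) -> Fraction:
    """Coefficient of (Z alpha)^6/n^3 (m/M) m in Eq. (145) for nP levels."""
    return Fraction(2, 5) * (1 - Fraction(2, 3 * n * n))
```

For l = 1 the angular factor is 3/4 divided by (1/2)(3/2)(5/2) = 15/8, i.e. 2/5, which is exactly the prefactor in Eq. (145).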
5.3. Recoil correction of order $(Z\alpha)^7(m/M)$

The leading logarithm squared contribution to the recoil correction of order $(Z\alpha)^7(m/M)$ was recently calculated independently in two works [149,150] in the same framework [96] as the corrections of order $(Z\alpha)^6(m/M)m$ to the P levels above
$$\Delta E = -\frac{11}{15}\,\frac{(Z\alpha)^7}{\pi n^3}\,\frac{m}{M}\left(\frac{m_r}{m}\right)^3\ln^2(Z\alpha)\,m\,\delta_{l0}. \tag{147}$$
Numerically this contribution is much smaller than the experimental error bars of the current Lamb shift measurements. However, due to the linear dependence of the recoil correction on the electron–nucleus mass ratio, the respective contribution to the hydrogen–deuterium isotope shift (see Section 16.1.6) is larger than the experimental uncertainty, and should be taken into account in the comparison between theory and experiment. One half of the leading logarithm squared contribution in Eq. (147) (−0.21 kHz for the 1S level in hydrogen) may be accepted as a fair estimate of the yet uncalculated single logarithmic and nonlogarithmic contributions of order $(Z\alpha)^7(m/M)$ (Table 8).
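The numbers quoted above can be reproduced from Eq. (147). The sketch assumes CODATA 2018 constants (not part of the text) and includes the (m_r/m)³ factor written in Eq. (147); it should give about −0.42 kHz for 1S (so one half is −0.21 kHz) and −0.05 kHz for 2S, matching Table 8.

```python
import math

# Assumed CODATA 2018 inputs, not taken from the text.
ALPHA = 7.2973525693e-3
MASS_RATIO = 1 / 1836.15267343              # m/M for hydrogen
ME_C2_HZ = 0.51099895e6 / 4.135667696e-15   # m c^2 / h in Hz


def log_squared_recoil_khz(n: int, z: int = 1) -> float:
    """Eq. (147) for an nS level, in kHz."""
    za = z * ALPHA
    reduced_cubed = (1 / (1 + MASS_RATIO)) ** 3          # (m_r/m)^3
    value = (-11 / 15) * za**7 / (math.pi * n**3) * MASS_RATIO * reduced_cubed
    return value * math.log(za) ** 2 * ME_C2_HZ / 1e3
```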
6. Radiative-recoil corrections

In the standard nomenclature the name radiative-recoil is reserved for the recoil corrections to pure radiative effects, i.e., for corrections of the form $\alpha^k(Z\alpha)^n(m/M)^j$. Let us start the systematic discussion of such corrections with the recoil corrections to the leading contribution to the Lamb shift. The most important observation here is that the mass dependence of all corrections of order $\alpha^k(Z\alpha)^4$ obtained above is exact, as was proved in [35,36], and there is no additional mass dependence beyond the one already present in Eqs. (40)–(57). This conclusion resembles the similar conclusion about the exact mass dependence of the contributions to the energy levels of order $(Z\alpha)^4m$ discussed above, and it is valid essentially for the same reason. The high-frequency part of these corrections is generated only by the one-photon exchanges, for which we know the exact mass dependence, and the only mass scale in the low-frequency part, which also depends on multiphoton exchanges, is the reduced mass.

6.1. Corrections of order $\alpha(Z\alpha)^5(m/M)m$

The first nontrivial radiative-recoil correction is of order $\alpha(Z\alpha)^5$. We have already discussed the nonrecoil contribution of this order in Section 4.3.2. Due to the wave function squared factor this correction naturally contained an explicit factor $(m_r/m)^3$. Below we will discuss radiative-recoil corrections of order $\alpha(Z\alpha)^5$ with mass ratio dependence beyond this factor $(m_r/m)^3$.

6.1.1. Corrections generated by the radiative insertions in the electron line

The diagrammatic calculation of the radiative-recoil correction of order $\alpha(Z\alpha)^5$ induced by the radiative insertions in the electron line in Fig. 40 was performed in [35,151,36], where it was separated into two steps. First, the recoil contributions produced by the nonrelativistic heavy particle pole, which were neglected above in our discussion of the nonrecoil contributions in Section 4.3.2 (see Fig. 18), were calculated.
Second, the remaining nonpole contributions of the box and crossed box diagrams in Fig. 40 were obtained. Only high intermediate loop momenta are involved in the calculations, and the resulting contribution is nonvanishing only for the S states and has the form [35,151,36]
$$\Delta E = \left[\frac{35}{4}\ln 2 - 8 + \frac{31}{192} + \frac{1}{5} - 0.415(4)\right]\frac{\alpha(Z\alpha)^5}{n^3}\,\frac{m}{M}\,m\,\delta_{l0} = -1.988(4)\,\frac{\alpha(Z\alpha)^5}{n^3}\,\frac{m}{M}\,m\,\delta_{l0}. \tag{148}$$

Table 8
Recoil corrections to the Lamb shift. Formulas are given in units of $\frac{(Z\alpha)^5}{\pi n^3}\frac{m_r^3}{mM}$.

| Contribution | References | Formula | ΔE(1S) (kHz) | ΔE(2S) (kHz) |
|---|---|---|---|---|
| Coulomb–Coulomb term | Salpeter [94]; Fulton and Martin [95] | $-\frac{4}{3}\delta_{l0}$ | −590.03 | −73.75 |
| Transverse–transverse term, bulk contribution | Salpeter [94]; Fulton and Martin [95]; Erickson and Yennie [83]; Grotch and Yennie [25]; Erickson [130]; Erickson and Grotch [131] | $\left\{2\ln\frac{2Z\alpha}{n} + 2[\psi(n+1)-\psi(1)] + \frac{n-1}{n} + \frac{8(1-\ln 2)}{3}\right\}\delta_{l0} - \frac{1-\delta_{l0}}{l(l+1)(2l+1)}$ | −2494.01 | −305.46 |
| Transverse–transverse term, very high momentum contribution | Fulton and Martin [95]; Erickson [130] | $-\frac{2}{M^2-m^2}\left(M^2\ln\frac{m}{m_r} - m^2\ln\frac{M}{m_r}\right)\delta_{l0}$ | −0.48 | −0.06 |
| Coulomb–transverse term | Salpeter [94]; Fulton and Martin [95]; Erickson and Yennie [83]; Grotch and Yennie [25]; Erickson [130]; Erickson and Grotch [131] | $\frac{8}{3}\left\{\left[\ln\frac{2}{nZ\alpha} + \psi(n+1)-\psi(1) - \frac{1}{2n} + \frac{5}{6} + \ln 2\right]\delta_{l0} - \ln k_0(n,l) - \frac{1-\delta_{l0}}{2l(l+1)(2l+1)}\right\}$ | 5494.03 | 720.56 |
| ΔE(nS) | Pachucki and Grotch [136]; Eides and Grotch [144] | $\left(4\ln 2 - \frac{7}{2}\right)(\pi Z\alpha)\,\delta_{l0}$ | −7.38 | −0.92 |
| ΔE(nL), l ≠ 0 | Golosov et al. [147]; Jentschura and Pachucki [99] | $\left[1 - \frac{l(l+1)}{3n^2}\right]\frac{3}{4(l-\frac{1}{2})(l+\frac{1}{2})(l+\frac{3}{2})}(\pi Z\alpha)$ | | |
| ΔE(nS) | Pachucki and Karshenboim [149]; Melnikov and Yelkhovsky [150] | $-\frac{11}{15}(Z\alpha)^2\ln^2(Z\alpha)\,\delta_{l0}$ | −0.42 | −0.05 |

Fig. 40. Electron-line radiative-recoil corrections.
Another viable approach to the calculation of this correction is based on the use of the Braun formula. The Braun formula depends on the total electron Green function in the external Coulomb field and automatically includes all radiative corrections to the electron line. Only one-loop insertions of the radiative photon in the electron line should be preserved in order to obtain corrections of order $\alpha(Z\alpha)^5$. At first sight, calculation of these corrections with the help of the Braun formula may seem overcomplicated because, as we already mentioned above, the Braun formula produces the total correction of the first order in the mass ratio. Hence, an exact calculation of the radiative-recoil correction of order $\alpha(Z\alpha)^5$ with its help would produce not only the contributions we called radiative-recoil above, but also the first term in the expansion over the mass ratio of the purely radiative contribution in Eq. (67). This contribution should be omitted in order to avoid double counting. It is not difficult to organize the calculations based on the Braun formula in such a way that the reduced mass correction connected with the nonrecoil contribution would not show up and the calculation of the remaining terms would be significantly simplified [152]. The idea is as follows. Purely radiative corrections of order $\alpha(Z\alpha)^5$, together with the standard $(m_r/m)^3$ factor, were connected with the nonrelativistic heavy particle pole in the two-photon exchange diagrams, which corresponds to the zero-order term in the expansion of the proton propagator over 1/M. On the other hand, the Braun formula explicitly picks up the first-order term in the heavy mass expansion of the proton propagator. This means that the Braun formula produces the term corresponding to the reduced mass dependence of the nonrecoil contribution only when the high integration momentum goes through one of the external wave functions. New radiative-recoil contributions, which do not reduce to the tail of the mass ratio cubed factor in Eq. (67), are generated only by the integration region where the high momentum goes through the loop described by an explicit operator in the Braun formula. For calculation of the matrix element in this regime, it is sufficient to ignore external virtualities and to approximate the external wave functions by their values at the origin. The respective calculation therefore reduces to a variant of the scattering approximation calculation, the only difference being that the form of the skeleton structure is now determined by the Braun formula. As usual in the scattering approximation approach, the integral under consideration contains a powerlike infrared divergence, which corresponds to the recoil contribution of the previous order in Zα and should simply be subtracted. Explicit calculation of the Braun formula contribution in this regime was performed in [152]
$$\Delta E = \left[2\ln 2 - \frac{79}{32} - \frac{2.62946(1) + 0.24523(1)}{\pi^2}\right]\frac{\alpha(Z\alpha)^5}{n^3}\,\frac{m}{M}\,m\,\delta_{l0} = -1.374\,\frac{\alpha(Z\alpha)^5}{n^3}\,\frac{m}{M}\,m\,\delta_{l0}. \tag{149}$$
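The analytic brackets in Eqs. (148) and (149) can be checked against their quoted numerical values and turned into frequencies. This is a sketch assuming CODATA 2018 constants (not given in the text); for the 1S level of hydrogen it reproduces the discrepancy of about 6 kHz discussed in the text.

```python
import math

# Assumed CODATA 2018 inputs, not taken from the text.
ALPHA = 7.2973525693e-3
MASS_RATIO = 1 / 1836.15267343              # m/M for hydrogen
ME_C2_HZ = 0.51099895e6 / 4.135667696e-15   # m c^2 / h in Hz

# Coefficients multiplying alpha (Z alpha)^5 / n^3 (m/M) m delta_{l0}
C_148 = 35 / 4 * math.log(2) - 8 + 31 / 192 + 1 / 5 - 0.415            # Eq. (148)
C_149 = 2 * math.log(2) - 79 / 32 - (2.62946 + 0.24523) / math.pi**2    # Eq. (149)


def electron_line_khz(coeff: float, n: int, z: int = 1) -> float:
    """Convert a coefficient of the Eq. (148)/(149) form to kHz for an nS level."""
    scale = ALPHA * (z * ALPHA) ** 5 / n**3 * MASS_RATIO * ME_C2_HZ
    return coeff * scale / 1e3
```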
The results in Eqs. (148) and (149) contradict each other. Since both approaches to the calculation are quite safe, at least one of them should contain an arithmetic error. Numerically the discrepancy is about 6 kHz for the 1S state. This discrepancy is not too important for the 1S Lamb shift measurements, since the error bars of even the best current experimental results are a few times larger (see Table 21). What is much more important from the phenomenological point of view is that this radiative-recoil correction, linear in the electron–nucleus mass ratio, directly contributes to the hydrogen–deuterium isotope shift (see Section 16.1.6), and the respective discrepancy in the isotope shift is about 18 times larger than the experimental uncertainty of the isotope shift. A new independent calculation of the radiative-recoil contribution of order $\alpha(Z\alpha)^5(m/M)m$ is needed in order to resolve this discrepancy. The radiative-recoil correction of order $\alpha(Z\alpha)^5$ is clearly connected with the region of high integration momenta in the two-photon exchange kernels. In this situation there is no need to turn to the Braun formula; one may directly use the scattering approximation approach, which is ideally suited for such a calculation. A new calculation in this framework is under way now [153].

6.1.2. Corrections generated by the polarization insertions in the photon lines

Calculation of the radiative-recoil correction generated by the one-loop polarization insertions in the exchanged photon lines in Fig. 41 follows the same path as the calculation of the correction induced by the insertions in the electron line. The respective correction was independently calculated analytically both in the skeleton integral approach [154] and with the help of the Braun formula [152]. Due to the simplicity of the photon polarization operator, the calculation based on the scattering approximation [154] is so straightforward that we can present here all relevant formulae without making the text too technical.
The skeleton integral for the recoil corrections corresponding to the diagrams with two exchanged photons in Fig. 34 has the form
k 1 k 16(Za)"t(0)" k dk k 1# # *E" (k#j) 4 k 8 m(1!k) kk 1 kk kk k kk kk 1 ! 1# # ! 1# # 1# # , 4 k 8 8 2 8 2 k
(150)
where μ = m/M. We have already subtracted in Eq. (150) the nonrecoil part of the skeleton integral. This subtraction term is given by the nonrelativistic heavy particle pole contribution, Eq. (66), in the two-photon exchange. Next, we insert the polarization operator in the integrand in Eq. (150) according to the rule in Eq. (68). The skeleton integrand in Eq. (150) is singular at small momenta, and the naive substitution in Eq. (68) leads to a linear infrared divergence $\int dk/k^2$. In an exact calculation this divergence would be cut off at atomic momenta of order $mZ\alpha$ by the wave function momenta. The low-momentum contribution would clearly be of order $\alpha(Z\alpha)^4$, and we may simply omit it since we already know this correction. Thus, to obtain the recoil correction of order $\alpha(Z\alpha)^5m$ it is sufficient to subtract the leading low-frequency asymptote in the radiatively corrected skeleton integrand. The subtracted integral for the radiative-recoil correction (the integral in Eq. (150) with the inserted polarization operator and the low-frequency asymptote subtracted) has to be multiplied by an additional factor 2 in order to take into account that the polarization may be inserted in each of the two photon lines in the skeleton diagrams in Fig. 34. It can easily be calculated analytically if one neglects contributions of higher order in the small mass ratio [154]
$$\Delta E = \left(\frac{2\pi^2}{9} - \frac{70}{27}\right)\frac{\alpha(Z\alpha)^5}{\pi^2 n^3}\,\frac{m}{M}\,m\,\delta_{l0}. \tag{151}$$

Fig. 41. Photon-line radiative-recoil corrections.
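For orientation, Eq. (151) can be evaluated for hydrogen. This is a sketch assuming CODATA 2018 constants (not part of the text); it should land near the Table 9 entries of −0.41 kHz (1S) and −0.05 kHz (2S).

```python
import math

# Assumed CODATA 2018 inputs, not taken from the text.
ALPHA = 7.2973525693e-3
MASS_RATIO = 1 / 1836.15267343              # m/M for hydrogen
ME_C2_HZ = 0.51099895e6 / 4.135667696e-15   # m c^2 / h in Hz


def polarization_recoil_khz(n: int, z: int = 1) -> float:
    """Eq. (151) for an nS level, in kHz."""
    coeff = 2 * math.pi**2 / 9 - 70 / 27          # about -0.3993
    value = coeff * ALPHA * (z * ALPHA) ** 5 / (math.pi**2 * n**3) * MASS_RATIO
    return value * ME_C2_HZ / 1e3
```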
Calculation of the same contribution with the help of the Braun formula was made in [152]. In the Braun formula approach one also makes the substitution in Eq. (68) in the propagators of the exchanged photons, factorizes the external wave functions as was explained above (see Section 6.1.1), subtracts the infrared divergent part of the integral corresponding to the correction of the previous order in Zα, and then calculates the integral. The result of this calculation [152] nicely coincides with the one in Eq. (151).

The radiative-recoil correction to the Lamb shift induced by the polarization insertions in the exchanged photons was also calculated in [155]. The result of that work contradicts the results in [154,152]. The calculations in [155] are made in the same way as the calculation of the recoil correction of order $(Z\alpha)^6(m/M)m$ in [143], and lead to a wrong result for the same reason (see the discussion in Footnote 14 in Section 5.2.3).

6.1.3. Corrections generated by the radiative insertions in the proton line

We have discussed above insertions of radiative corrections either in the electron line or in the exchanged photon line in the skeleton diagrams in Fig. 34. One more option, namely, insertions of a radiative photon in the heavy particle line, should also be considered. The leading order correction generated by such insertions is a radiative-recoil contribution of order $(Z^2\alpha)(Z\alpha)^4(m/M)^2m$. Note that this correction contains one less power of the parameter α than the first nontrivial radiative-recoil correction of order $\alpha(Z\alpha)^5(m/M)m$ generated by the radiative insertions in the electron line considered above. There is nothing enigmatic about this apparent asymmetry, since the mass dependence of the leading order contribution to the Lamb shift of order $\alpha(Z\alpha)^4m$ in Eq. (40) is known exactly, and thus the would-be radiative correction of order $\alpha(Z\alpha)^4(m/M)m$ is hidden in the leading order contribution to the Lamb shift. No new calculation is needed to obtain the correction of order $(Z^2\alpha)(Z\alpha)^4(m/M)^2m$ generated by the radiative insertions in the proton line. It is almost obvious that to order $(Z\alpha)^4$ the contributions to the energy level generated by the radiative insertions in the fermion lines are symmetric with respect to the interchange of the light and heavy lines [95]. Then, in the case of an elementary proton, the radiative-recoil correction generated by the radiative photon insertions in the heavy line may be obtained from the leading order contributions to the Lamb shift in Eqs. (40) and (44) by the substitutions m → M and α → Z²α. Both substitutions are obvious: the first one arises because the leading term in the infrared expansion of the first-order radiative corrections to the fermion vertex contains the mass of the particle under consideration, and the second simply reminds us that the charge of the heavy particle is Ze. Hence, the Dirac form factor contribution is equal to
$$\Delta E = \frac{4(Z^2\alpha)(Z\alpha)^4}{\pi n^3}\,\frac{m_r^3}{M^2}\left[\left(\frac{1}{3}\ln\frac{M(Z\alpha)^{-2}}{m_r} + \frac{11}{72}\right)\delta_{l0} - \frac{1}{3}\ln k_0(n,l)\right], \tag{152}$$
and the Pauli form factor leads to the correction
$$\delta E_{l0} = \frac{(Z^2\alpha)(Z\alpha)^4}{2\pi n^3}\,\frac{m_r^3}{M^2}, \qquad \delta E_{l\neq 0} = \frac{(Z^2\alpha)(Z\alpha)^4}{2\pi n^3}\,\frac{m_r^3}{M^2}\,\frac{j(j+1)-l(l+1)-3/4}{l(l+1)(2l+1)}. \tag{153}$$
These formulae are derived for an elementary heavy particle, and do not take into account the composite nature of the proton. The presence of the logarithm of the heavy particle mass M in Eq. (152) indicates that the logarithmic loop integration in the form factor integral goes up to the mass of the particle, where one can no longer ignore the composite nature of the proton. For the composite proton the integration would be cut off from above not by the proton mass but by the inverse size of the proton. The usual way to account for the proton structure is to substitute the proton form factor in the loop integral. After calculation we obtain instead of Eq. (152)
$$\Delta E = \frac{4(Z^2\alpha)(Z\alpha)^4}{\pi n^3}\,\frac{m_r^3}{M^2}\left\{\left[\frac{1}{3}\ln\frac{\Lambda(Z\alpha)^{-2}}{m_r} + \frac{11}{72} + \frac{1}{24} - \frac{7\pi}{32}\sqrt{\frac{\Lambda^2}{4M^2}} + \frac{2}{3}\,\frac{\Lambda^2}{4M^2} + \ldots\right]\delta_{l0} - \frac{1}{3}\ln k_0(n,l)\right\}, \tag{154}$$
where Λ² = 0.71 GeV² is the parameter in the proton dipole form factor. As we expected, Λ replaces the proton mass in the role of the upper cutoff for the logarithmic loop integration. Note also that we have obtained an additional constant in Eq. (154). The anomalous magnetic moment contribution in Eq. (153) would also be modified by inclusion of a nontrivial form factor, but the contribution to the proton magnetic moment should be considered together with the nonelectromagnetic contributions to the proton magnetic moment. The anomalous magnetic moment of the nucleus determined experimentally includes the electromagnetic contribution and, hence, the contribution in Eq. (153), even when modified by the nontrivial form factor, should be ignored in the phenomenological analysis. Usually the total contribution of the proton anomalous magnetic moment is hidden in the main proton charge radius contribution defined via the Sachs electric form factor. This topic will be discussed in more detail below in Section 7.1.1.
From the practical point of view, the difference between the results in Eqs. (152) and (154) is about 0.18 kHz for the 1S level in hydrogen, and at the current level of experimental precision the distinctions between the expressions in Eqs. (152) and (154) may be ignored in the discussion of the Lamb shift measurements. These distinctions should, however, be taken into account in the discussion of the hydrogen–deuterium isotope shift (see below Section 16.1.6). An alternative treatment of the correction of order $(Z^2\alpha)(Z\alpha)^4(m/M)^2m$ was given in [152]. The idea of this work was to modify the standard definition of the proton charge radius, and include the first-order quantum electrodynamic radiative correction into the proton radius determined by the strong interactions. From the practical point of view, for the nS levels in hydrogen the recipe of [152] reduces to elimination of the constant 11/72 in Eq. (152) and omission of the Pauli correction in Eq. (153). Numerically such a modification reduces the contribution to the 1S energy level in hydrogen by 0.14 kHz in comparison with the naive result in Eq. (152), and increases it by 0.03 kHz in comparison with the result in Eq. (154). Hence, for all practical needs at the current level of experimental precision there is no contradiction between our result above in Eq. (154) and the result in [152]. However, from our point of view the approach of [152] is unattractive; we prefer to stick with the standard definition of the intrinsic characteristics of the proton as determined by the strong interactions. Of course, in such an approach one has to extract the values of the proton parameters from the experimental data, properly taking into account quantum electrodynamic corrections. Another advantage of the standard approach advocated above is that in the absence of a nontrivial nuclear form factor (as, for example, in the muonium atom with an elementary nucleus) the formula in Eq. (154) reduces to the classical expression in Eq. (152).
6.2. Corrections of order $\alpha(Z\alpha)^6(m/M)m$

The leading logarithm squared contribution of order $\alpha(Z\alpha)^6(m/M)m$ was independently calculated in [149,150] in the framework of the approach developed in [96] (see the discussion in Section 5.2.4):

$$\Delta E = \frac{2}{3}\,\frac{\alpha(Z\alpha)^6}{\pi n^3}\,\frac{m^2}{M}\,\ln^2(Z\alpha)^{-2}\,\delta_{l0}. \qquad (155)$$
At present this correction is not very important in the phenomenological discussion of the 1S and 2S Lamb shifts. However, because radiative-recoil corrections depend linearly on the electron–nucleus mass ratio, the double logarithm contribution in Eq. (155) is, already at the present level of experimental accuracy, quite significant as a contribution to the hydrogen–deuterium isotope shift (see Section 16.1.6). The single logarithmic and nonlogarithmic contributions may be estimated as one-half of the leading logarithm squared contribution, which amounts to about 0.8 and 0.1 kHz for the 1S and 2S levels in hydrogen, respectively. In view of the rapid experimental progress in isotope shift measurements (see Table 22), the calculation of these remaining corrections of order $\alpha(Z\alpha)^6(m/M)m$ deserves further theoretical effort (see Table 9).
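Equation (155) is easy to evaluate numerically. The Python sketch below does so; the physical constants are assumed CODATA-style values rather than numbers taken from this section, and reduced-mass refinements are ignored. Under these assumptions it should reproduce the 1.52 kHz (1S) and 0.19 kHz (2S) entries of Table 9, together with the "one-half" estimate of about 0.8 and 0.1 kHz for the remaining terms.

```python
import math

# Constants assumed for this sketch (CODATA-style values, not given in the text)
ALPHA = 7.2973525693e-3          # fine-structure constant
ME_C2_OVER_H = 1.23558996e20     # electron rest energy m c^2 / h, in Hz
MP_OVER_ME = 1836.15267343       # proton-to-electron mass ratio M/m


def leading_log_squared_hz(n: int, z: int = 1, mass_ratio: float = MP_OVER_ME) -> float:
    """Eq. (155) for an nS level, as a frequency in Hz.

    (2/3) * alpha (Z alpha)^6 / (pi n^3) * (m^2/M) * ln^2 (Z alpha)^-2,
    with m^2/M written as (m c^2 / h) / (M/m).
    """
    za = z * ALPHA
    log_term = math.log(za ** -2)
    return (2.0 / 3.0) * ALPHA * za ** 6 / (math.pi * n ** 3) * log_term ** 2 * ME_C2_OVER_H / mass_ratio


for n in (1, 2):
    lead = leading_log_squared_hz(n)
    # The text estimates the single-log and constant terms as one-half of the log-squared term.
    print(f"{n}S: log^2 term {lead / 1e3:.2f} kHz, estimated remainder ~{lead / 2e3:.2f} kHz")
```

Working in frequency units (dividing by $h$) keeps the comparison with the kHz columns of Table 9 direct.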
Table 9
Radiative-recoil corrections

Contribution | Reference | Formula | ΔE(1S) (kHz) | ΔE(2S) (kHz)
Radiative insertions in the electron line | Bhatt and Grotch [35,36,151] | $-1.988(4)\,\frac{\alpha(Z\alpha)^5}{n^3}\frac{m^2}{M}\delta_{l0}$ | −20.17 | −2.52
Radiative insertions in the electron line | Pachucki [152] | $-1.374\,\frac{\alpha(Z\alpha)^5}{n^3}\frac{m^2}{M}\delta_{l0}$ | −13.94 | −1.74
Polarization insertions | Eides and Grotch [154]; Pachucki [152] | $\left(\frac{2\pi^2}{9}-\frac{70}{27}\right)\frac{\alpha(Z\alpha)^5}{\pi^2 n^3}\frac{m^2}{M}\delta_{l0}$ | −0.41 | −0.05
Dirac FF insertions in the heavy line | | order $(Z^2\alpha)(Z\alpha)^4(m/M)^2m$, cf. Eqs. (152)–(154) | 4.58 | 0.58
Leading logarithm squared | Pachucki and Karshenboim [149]; Melnikov and Yelkhovsky [150] | $\frac{2}{3}\frac{\alpha(Z\alpha)^6}{\pi n^3}\frac{m^2}{M}\ln^2(Z\alpha)^{-2}\delta_{l0}$ | 1.52 | 0.19
7. Nuclear size and structure corrections

The one-electron atom is a composite nonrelativistic system loosely bound by electromagnetic forces. The characteristic size of the atom is of the order of the Bohr radius $1/(mZ\alpha)$, and this scale is too large for the effects of other interactions (weak and strong, to say nothing of gravitational) to play a significant role. Nevertheless, in high-precision experiments effects connected with the composite nature of the nucleus can become observable. By far the most important nonelectromagnetic contributions are connected with the finite size of the nucleus and its structure. Neither the finite radius of the proton nor its structure constants can at present be calculated precisely from first principles in the framework of QCD, the modern theory of strong interactions. Fortunately, the main nuclear parameters affecting the atomic energy levels may either be measured directly or admit an almost model-independent calculation in terms of other experimentally measured parameters. Besides the strong interaction effects connected with the nucleus, strong interactions affect the energy levels of atoms via nonleptonic contributions to the photon polarization operator. Once
Fig. 42. Proton radius contribution to the Lamb shift. Bold dot corresponds to the form factor slope.
again, these contributions can be calculated in terms of experimental data, as we have already discussed in Section 4.2.5. A minor contribution to the energy shift is also generated by weak gauge boson exchange, to be discussed below.

7.1. Main proton size contribution

The nucleus is a bound system of strongly interacting particles. Unfortunately, modern QCD does not provide us with the tools to calculate the bound state properties of the proton (or other nuclei) from first principles, since the QCD perturbation expansion does not work at the distances characteristic of the proton structure, which are large from the QCD point of view, and nonperturbative methods are not yet mature enough to produce good results. Fortunately, the characteristic scales of the strong and electromagnetic interactions are vastly different, and at the large distances relevant for the atomic problem the influence of the proton (or nuclear) structure may be taken into account with the help of a few experimentally measurable proton properties. The largest and by far the most important correction to the atomic energy levels connected with the proton structure is induced by its finite size. The leading nuclear structure contribution to the energy shift is completely determined by the slope of the nuclear form factor in Fig. 42 (compare Eq. (27)):

$$F(-\mathbf{k}^2) \approx 1 - \frac{\mathbf{k}^2}{6}\langle r^2\rangle. \qquad (156)$$
The respective perturbation potential is given by the form factor slope insertion in the external Coulomb potential (see Eq. (29)):

$$-\frac{4\pi Z\alpha}{\mathbf{k}^2} \;\rightarrow\; \frac{2\pi Z\alpha}{3}\langle r^2\rangle, \qquad (157)$$
and we immediately obtain

$$\Delta E = \frac{2\pi Z\alpha}{3}\langle r^2\rangle|\psi(0)|^2 = \frac{2(Z\alpha)^4}{3n^3}\,m_r^3\langle r^2\rangle\,\delta_{l0}. \qquad (158)$$
We see that the correction to the energy level induced by the finite proton charge radius shifts the level upwards and is nonvanishing only for S states. Physically, the finite radius of the proton means that the proton charge is smeared over a finite volume; the electron spends some time inside the proton charge cloud and there experiences a weaker attraction than it would from a pointlike nucleus (compare the similar arguments concerning the finite radius of the electron below Eq. (30)). Note the similarity of this discussion to the consideration of the level shift induced by the polarization insertion in the external Coulomb photon in Section 3.2. However, unlike the present case, the polarization insertion leads to a negative contribution to the energy levels, since the polarization cloud screens the source charge.
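To see how Eq. (158) turns into the kHz-scale numbers quoted in Section 7.1.1, here is a minimal Python sketch. The rms radius of 0.862 fm is an illustrative assumption (no value is fixed at this point of the text), and the constants are assumed CODATA-style values. With this radius the 1S shift comes out near 1.16 MHz, and the 2S shift is exactly one-eighth of it. Both are of the same size as the "about 1100 kHz" and "about 140 kHz" estimates given there.

```python
ALPHA = 7.2973525693e-3        # fine-structure constant (assumed CODATA value)
ME_C2_MEV = 0.51099895         # electron rest energy, MeV
HBAR_C_MEV_FM = 197.3269804    # hbar*c, MeV*fm
ME_C2_OVER_H = 1.23558996e20   # electron rest energy / h, Hz
MP_OVER_ME = 1836.15267343     # proton/electron mass ratio


def proton_size_shift_hz(n: int, r_rms_fm: float, z: int = 1) -> float:
    """Eq. (158): Delta E = 2 (Z alpha)^4 m_r^3 <r^2> / (3 n^3) for an nS level, in Hz."""
    mr_over_m = 1.0 / (1.0 + 1.0 / MP_OVER_ME)              # reduced-mass factor m_r/m
    x = mr_over_m * ME_C2_MEV * r_rms_fm / HBAR_C_MEV_FM    # dimensionless product m_r * r
    # m_r^3 <r^2> = m_r * (m_r r)^2, and m_r = (m_r/m) * m with m expressed as m c^2 / h
    return 2.0 * (z * ALPHA) ** 4 / (3.0 * n ** 3) * x ** 2 * mr_over_m * ME_C2_OVER_H


r_p = 0.862  # illustrative rms radius in fm (assumption of this sketch)
for n in (1, 2):
    print(f"{n}S: {proton_size_shift_hz(n, r_p) / 1e3:.0f} kHz")
```

Because the shift is quadratic in the radius, a few percent change in $\langle r^2\rangle^{1/2}$ moves the 1S value by tens of kHz, which is why the radius discrepancy discussed in Section 7.1.1 matters.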
The result in Eq. (158) needs some clarification. In the derivation above it was implicitly assumed that the photon–nucleus vertex is determined by the expression in Eq. (156). However, for a nucleus with spin this interaction depends on more than one form factor, and the effective slope of the photon–nucleus vertex in general contains some additional terms besides the nuclear radius. We will consider the real situation for nuclei of different spins below.

7.1.1. Spin one-half nuclei

The photon–nucleus interaction vertex is described by the Dirac ($F_1$) and Pauli ($F_2$) form factors:

$$\gamma_\mu F_1(k^2) - \frac{1}{2M}\,\sigma_{\mu\nu}k^\nu F_2(k^2), \qquad (159)$$

where at small momentum transfers

$$F_1(-\mathbf{k}^2) \approx 1 - \frac{\langle r^2\rangle_D}{6}\,\mathbf{k}^2, \qquad F_2(0) = \frac{g-2}{2}. \qquad (160)$$

Hence, at low momenta the photon–nucleus interaction vertex (after the Foldy–Wouthuysen transformation and transition to two-component nuclear spinors) is described by the expression

$$1 - \mathbf{k}^2\left[\frac{1}{8M^2} + \frac{\langle r^2\rangle_D}{6} + \frac{g-2}{8M^2}\right]. \qquad (161)$$
For an elementary proton $\langle r^2\rangle_D = 0$, $g = 2$, and only the first term in the square brackets survives. This term leads to the well-known local Darwin term in the electron–nucleus effective potential (see, e.g., [31]) and generates the contribution proportional to the factor $\delta_{l0}$ in Eq. (37). As was pointed out in [156], in addition to this correction there exists another contribution of the same order, produced by the term proportional to the anomalous magnetic moment in Eq. (161). However, this is not yet the end of the story, since the proton charge radius is usually defined via the Sachs electric form factor $G_E$ rather than the Dirac form factor $F_1$:

$$G_E(-\mathbf{k}^2) \approx 1 - \frac{\langle r^2\rangle_E}{6}\,\mathbf{k}^2. \qquad (162)$$

The Sachs electric and magnetic form factors are defined as (see, e.g., [20])

$$G_E(-\mathbf{k}^2) = F_1(-\mathbf{k}^2) - \frac{\mathbf{k}^2}{4M^2}F_2(-\mathbf{k}^2), \qquad G_M(-\mathbf{k}^2) = F_1(-\mathbf{k}^2) + F_2(-\mathbf{k}^2). \qquad (163)$$

In terms of this new charge radius the photon–nucleus vertex above takes the form

$$1 - \mathbf{k}^2\left[\frac{1}{8M^2} + \frac{\langle r^2\rangle_E}{6}\right]. \qquad (164)$$
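The passage from Eq. (161) to Eq. (164) follows from expanding Eq. (163) to first order in $\mathbf{k}^2$ with the help of Eq. (160). The derivation below is our own spelled-out version of that step, not an equation from the original text:

```latex
\begin{aligned}
G_E(-\mathbf{k}^2) &\approx 1 - \frac{\langle r^2\rangle_D}{6}\,\mathbf{k}^2 - \frac{\mathbf{k}^2}{4M^2}\,F_2(0)
  = 1 - \mathbf{k}^2\left[\frac{\langle r^2\rangle_D}{6} + \frac{g-2}{8M^2}\right],\\
\Rightarrow\quad \frac{\langle r^2\rangle_E}{6} &= \frac{\langle r^2\rangle_D}{6} + \frac{g-2}{8M^2},\\
\Rightarrow\quad 1 - \mathbf{k}^2\left[\frac{1}{8M^2} + \frac{\langle r^2\rangle_D}{6} + \frac{g-2}{8M^2}\right]
  &= 1 - \mathbf{k}^2\left[\frac{1}{8M^2} + \frac{\langle r^2\rangle_E}{6}\right].
\end{aligned}
```

The anomalous-moment term of Eq. (161) is thus absorbed into the Sachs radius, and only the Darwin piece $1/(8M^2)$ remains explicit.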
We now see that for a real proton the charge radius contribution has exactly the form in Eq. (158), where the charge radius is defined in Eq. (162). The only other term linear in the momentum transfer in the photon–nucleus vertex in Eq. (164) generates the $\delta_{l0}$ term in Eq. (37). Hence, if one uses the proton charge radius defined via the Sachs form factor, the net contribution of order $(Z\alpha)^4m^3/M^2$ has exactly the same form as if the proton were an elementary particle with $g = 2$. The existing experimental data on the root mean square (rms) proton charge radius [157,158] lead to a proton size correction of about 1100 kHz for the 1S state in hydrogen and about 140 kHz for the 2S state, which is much larger than the experimental accuracy of the Lamb shift measurements. Unfortunately, there is a discrepancy between the results of [157,158], which affects the theoretical prediction for the Lamb shifts, and a new experiment measuring the proton charge radius is badly needed. A new value of the proton charge radius was recently derived from an improved theoretical analysis of the low-momentum-transfer scattering experiments [159]. The phenomenological situation with the experimental data on the proton rms charge radius will be discussed in more detail in Section 16.1.5.

7.1.2. Nuclei with other spins

The general result for the nuclear charge radius and the Darwin–Foldy contribution for a nucleus with arbitrary spin was obtained in [160]. It was shown there that one may write a universal formula for the sum of these contributions, irrespective of the spin of the nucleus, if the nuclear charge radius is defined with the help of the same form factor for any spin. However, for historical reasons the definitions of the nuclear charge radius are not universal, and the respective formulae look different for different spins. We will discuss here only the most interesting cases of spin zero and spin one nuclei.
The case of the spin zero nucleus is the simplest one. For an elementary scalar particle the low-momentum nonrelativistic expansion of the photon–scalar vertex starts with the k/M term, and hence the respective contribution to the energy shift is suppressed by an additional factor 1/M in comparison with the spin one-half case. Hence, for a scalar nucleus there is no Darwin term $\delta_{l0}$ in Eq. (37) [19,161]. The interaction of a composite scalar nucleus with photons is described by only one form factor, and the slope of this form factor is called the charge radius squared. Hence, in the scalar case the charge radius contribution is described by Eq. (158), and the Darwin term is absent in Eq. (37). The spin one case is more complicated, since the interaction of a vector nucleus with the photon is in general described by three form factors. The nonrelativistic limit of the photon–nucleus vertex in this case was considered in [162], where it was shown that with the standard definition of the deuteron charge radius (the deuteron is the only phenomenologically interesting spin one nucleus) the situation with the Darwin–Foldy and charge radius contributions is exactly the same as for a scalar particle. Namely, the Darwin–Foldy contribution is missing in Eq. (37), and the charge radius contribution is given by Eq. (158). It is appropriate to mention here the recent work [163], which advocates a special choice of definition of the nuclear charge radius, namely, to include the Darwin–Foldy contribution in the definition of the nuclear charge radius. While one can use any consistent definition of the nuclear charge radius, this particular choice seems unattractive to us, since in this case even a truly pointlike particle in the sense of quantum field theory
(say, an electron) would have a finite charge radius even in the zeroth-order approximation. The phenomenological aspects of the deuteron charge radius contribution to the hydrogen–deuterium isotope shift will be discussed later in Section 16.1.6.

7.1.3. Empirical nuclear form factor and the contributions to the Lamb shift

In all considerations above we have assumed the most natural theoretical definition of nuclear form factors, namely, that the form factor is an intrinsic property of the nucleus. Therefore, the form factor is defined via the effective nucleus–photon vertex in the absence of electromagnetic interactions. Such a form factor can in principle be calculated with the help of QCD. The electromagnetic corrections to the form factor defined in this way may be calculated in the framework of QED perturbation theory. Strictly speaking, all formulae above are valid with this definition of the form factor. In practice, however, form factors are measured experimentally, and there is no way to switch off the electromagnetic interaction. Hence, in order to determine the form factor experimentally, one has in principle to calculate the electromagnetic corrections to elastic electron–nucleus scattering, which is usually used to measure the form factors [157,158]. In the usual fit to the experimental data not all electromagnetic corrections to the scattering amplitude are taken into account (see, e.g., the discussion in [7,63]). First, all vacuum polarization insertions except the electron vacuum polarization are usually ignored. This means that the respective contributions to the energy shift in Eq. (65) are absorbed into the empirical value of the nuclear charge radius squared. They are effectively taken into account in the contribution to the energy shift in Eq. (158) and should not be considered separately. Next there are the corrections of order $(Z^2\alpha)(Z\alpha)^4$ to the energy shift.
The perturbative electromagnetic contributions to the Pauli form factor should be ignored, since they definitely enter the empirical value of the nuclear g-factor. The situation is a bit more involved for the electromagnetic contribution to the Dirac nuclear form factor. The QED contribution to the slope of the Dirac form factor is infrared divergent, and hence one cannot simply include it in the empirical value of the nuclear charge radius. Of course, as is well known, there is no real infrared divergence in the proper description of electron–nucleus scattering if soft photon radiation is properly taken into account (see, e.g., [19,20]). This means that the proper determination of the empirical proton form factor from the experimental data requires accounting for the electromagnetic radiative corrections, and the measured value of the nuclear charge radius squared does not include the electromagnetic contribution. Hence, the radiative correction of order $(Z^2\alpha)(Z\alpha)^4$ in Eq. (152) should be included in the comparison of theory with the experimental data on the energy shifts.

Footnote: We have already considered these corrections together with other radiative-recoil corrections above, in Section 6.1.3. This discussion is partially reproduced here in order to make the present section self-contained.

7.2. Nuclear size and structure corrections of order $(Z\alpha)^5m$

Corrections of relative order $Z\alpha$ connected with the nonelementarity of the nucleus are generated by diagrams with two-photon exchanges. As usual, all corrections of order $(Z\alpha)^5$ originate from intermediate momenta that are high on the atomic scale. Due to the composite nature of
the nucleus, we also have to consider, besides intermediate elastic nuclear states, the contribution of diagrams with inelastic intermediate states.

7.2.1. Nuclear size corrections of order $(Z\alpha)^5m$

Let us first consider the contribution generated only by the elastic intermediate nuclear states. This means that we treat the nucleus here as a particle which interacts with photons via a nontrivial, experimentally measurable form factor $F(k^2)$, i.e., the electromagnetic interaction of our nucleus is nonlocal, but we temporarily ignore its excited states. As usual, we start with the skeleton integral contribution in Eq. (66) corresponding to the two-photon skeleton diagram in Fig. 17. Insertion of the factor $F(-k^2) - 1$ in the proton vertex corresponds to the presence of a nontrivial proton form factor. We have to consider the diagrams in Fig. 43 with one factor $F(-k^2) - 1$ inserted in the proton vertex (there are two such diagrams, hence the extra factor of two below)
$$\Delta E = -\frac{32(Z\alpha)^5\,m\,m_r^3}{\pi n^3}\int_0^\infty \frac{dk}{k^4}\,\big(F(-k^2) - 1\big), \qquad (165)$$
and the diagrams in Fig. 44 with two factors $F(-k^2) - 1$ inserted in the two proton vertices
$$\Delta E = -\frac{16(Z\alpha)^5\,m\,m_r^3}{\pi n^3}\int_0^\infty \frac{dk}{k^4}\,\big(F(-k^2) - 1\big)^2. \qquad (166)$$
The low-momentum integration region in the integral in Eq. (165) produces a linearly divergent infrared contribution, which simply reflects the presence of the correction of order $(Z\alpha)^4$ calculated in Section 11.1. We subtract this divergent contribution. Besides this uninteresting divergent term, the integral in Eq. (165) also contains a finite contribution induced by high intermediate momenta, which should be treated on a par with the contribution in Eq. (166). The scale of the integration momenta in Eqs. (165) and (166) is determined by the form factor scale. High momenta in the present context means momenta of the form factor scale, to be distinguished from the high momenta of other sections, which often meant momenta of the order of the electron mass; the characteristic momenta in the present case are much higher. The total contribution of order $(Z\alpha)^5$ has the form
$$\Delta E = -\frac{16(Z\alpha)^5\,m\,m_r^3}{\pi n^3}\int_0^\infty \frac{dk}{k^4}\,\big(F^2(-k^2) - 1\big), \qquad (167)$$
Footnote: Subtraction is necessary in order to avoid double counting, since the subtracted term in the vertex corresponds to the pointlike proton contribution, which is already taken into account in the effective Dirac equation.
Footnote: The dimensionless integration momentum in Eq. (66) was measured in units of the electron mass. We return here to dimensionful integration momenta, which results in an extra factor $m^3$ in the numerators in Eqs. (165)–(167) in comparison with the factor in the skeleton integral in Eq. (66). Notice also the minus sign before the momentum in the arguments of the form factors; it arises because in these equations $k \equiv |\mathbf{k}|$.
Fig. 43. Diagrams for elastic nuclear size corrections of order $(Z\alpha)^5m$ with one form factor insertion. The empty dot corresponds to the factor $F(-k^2) - 1$.
Fig. 44. Diagrams for elastic nuclear size corrections of order $(Z\alpha)^5m$ with two form factor insertions. The empty dot corresponds to the factor $F(-k^2) - 1$.
and, after subtracting the divergent contribution and carrying out the Fourier transformation, one obtains [164,165]

$$\Delta E = -\frac{m(Z\alpha)^5}{3n^3}\,m_r^3\,\langle r^3\rangle_{(2)}, \qquad (168)$$
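where $\langle r^3\rangle_{(2)}$ is the third Zemach moment [166], defined via a weighted convolution of two nuclear charge densities $\rho(r)$:

$$\langle r^3\rangle_{(2)} \equiv \int d^3r_1\,d^3r_2\,\rho(\mathbf{r}_1)\,\rho(\mathbf{r}_2)\,|\mathbf{r}_1 + \mathbf{r}_2|^3. \qquad (169)$$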
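As a numerical illustration of Eqs. (167)–(169), the Python sketch below evaluates the third Zemach moment for an assumed dipole charge distribution, $\rho(r) = \Lambda^3 e^{-\Lambda r}/(8\pi)$ with $\Lambda^2 = 12/\langle r^2\rangle$. The dipole model and the 0.862 fm radius are assumptions of this example, not inputs from the text. The sketch makes three checks and then evaluates the energy shift:

1. It estimates Eq. (169) by Monte Carlo sampling. For spherically symmetric densities, $|\mathbf{r}_1+\mathbf{r}_2|$ and $|\mathbf{r}_1-\mathbf{r}_2|$ give the same average.
2. It compares the estimate with the closed form $157.5/\Lambda^3$, which is our own evaluation for the dipole model.
3. It cross-checks the subtracted momentum-space integral behind Eq. (167) using the standard identity $\int_0^\infty dk\,k^{-4}\,[F^2(-k^2) - 1 + k^2\langle r^2\rangle/3] = \pi\langle r^3\rangle_{(2)}/48$. This identity is assumed here and not stated in the text.

Finally, it inserts the result into Eq. (168).

```python
import math
import random

ALPHA = 7.2973525693e-3        # fine-structure constant (assumed CODATA value)
ME_C2_MEV = 0.51099895         # electron rest energy, MeV
HBAR_C_MEV_FM = 197.3269804    # hbar*c, MeV*fm
ME_C2_OVER_H = 1.23558996e20   # electron rest energy / h, Hz
MP_OVER_ME = 1836.15267343     # proton/electron mass ratio


def zemach_mc(lam: float, samples: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Eq. (169) for the dipole density rho ~ exp(-lam r).

    The radial law r^2 exp(-lam r) is a Gamma(3, 1/lam) distribution; directions are isotropic.
    """
    rng = random.Random(seed)

    def sample_point():
        r = rng.gammavariate(3.0, 1.0 / lam)
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        return [r * c / norm for c in v]

    total = 0.0
    for _ in range(samples):
        total += math.dist(sample_point(), sample_point()) ** 3
    return total / samples


def zemach_kspace(lam: float, steps: int = 4000) -> float:
    """(48/pi) * integral of [F^2 - 1 + k^2 <r^2>/3] / k^4 dk for the dipole F = (1 + k^2/lam^2)^-2.

    With u = k/lam and u = tan(t), the integrand is smooth on (0, pi/2); midpoint rule.
    """
    def g(u: float) -> float:
        x = u * u
        if x < 1e-4:  # series avoids cancellation: (1+x)^-4 - 1 + 4x = 10x^2 - 20x^3 + 35x^4 - ...
            return 10.0 - 20.0 * x + 35.0 * x * x
        return ((1.0 + x) ** -4 - 1.0 + 4.0 * x) / (x * x)

    h = (math.pi / 2) / steps
    integral = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        integral += g(math.tan(t)) / math.cos(t) ** 2 * h
    return 48.0 / math.pi * integral / lam ** 3


def zemach_shift_hz(n: int, r3_fm3: float) -> float:
    """Eq. (168) for Z = 1: -(alpha)^5 m m_r^3 <r^3>_(2) / (3 n^3), returned in Hz."""
    mr_over_m = 1.0 / (1.0 + 1.0 / MP_OVER_ME)
    m_inv_fm = ME_C2_MEV / HBAR_C_MEV_FM            # electron mass in fm^-1
    return -(ALPHA ** 5) * mr_over_m ** 3 * m_inv_fm ** 3 * r3_fm3 / (3 * n ** 3) * ME_C2_OVER_H


r_rms = 0.862                        # assumed rms radius, fm
lam = math.sqrt(12.0) / r_rms        # dipole parameter, fm^-1
exact = 157.5 / lam ** 3             # closed form for the dipole model
print(f"<r^3>_(2): MC {zemach_mc(lam):.3f}, k-space {zemach_kspace(lam):.3f}, closed form {exact:.3f} fm^3")
print(f"Eq. (168): 1S {zemach_shift_hz(1, exact):.1f} Hz, 2S {zemach_shift_hz(2, exact):.1f} Hz")
```

Under these assumptions the 1S and 2S values land close to the −35.9 Hz and −4.5 Hz quoted below. The agreement depends on the dipole shape, which illustrates the point made in the text that this correction probes the detailed momentum dependence of the form factor.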
Parametrically, this result is of order $m(Z\alpha)^5(m/\Lambda)^3$, where $\Lambda$ is the form factor scale. Hence, this correction is suppressed in comparison with the leading proton size contribution not only by an extra factor $Z\alpha$ but also by the extra small factor $m/\Lambda$. This explains the smallness of this contribution, even in comparison with the proton size correction of order $(Z\alpha)^6$ (see Section 7.3.2 below), since in that logarithmically enhanced contribution one factor $m/\Lambda$ of Eq. (168) is traded for the much larger factor $Z\alpha$. The result in Eq. (168) depends on the third Zemach moment, in other words, on a nontrivial weighted integral of the product of two charge densities, and, unlike the rms proton charge radius, it cannot be measured directly. This means that the correction under consideration may only conditionally be called a proton size contribution. It depends on the fine details of the momentum dependence of the form factor, and not only on its directly measurable low-momentum behavior. This feature of the result is quite natural, given the high intermediate momenta characteristic of the integral in Eq. (167). In the practically important cases of hydrogen and deuterium, reliable results for this contribution may be obtained. Numerically, the nuclear radius correction of order $(Z\alpha)^5$ is −35.9 Hz for the 1S state and −4.5 Hz for the 2S state in hydrogen. These corrections are rather small. Eq. (168) gives a much larger contribution to the energy levels of deuterium. The deuteron, unlike the proton, is a loosely bound system; its radius is much larger than the proton radius, and the respective correction to the energy levels is also larger. The contribution of the correction under consideration [167]

Footnote: The result in [165] has the factor $m_r^4$ instead of $m\,m_r^3$ before the integral in Eq. (168). This difference could become important only after a recoil correction to the contribution in Eq. (168) is calculated.
Fig. 45. Diagrams for the nuclear polarizability correction of order $(Z\alpha)^5m$.
$$\Delta E = 0.49\ \text{kHz} \qquad (170)$$
to the 2S–1S energy splitting in deuterium should be taken into account in the discussion of the hydrogen–deuterium isotope shift. We began the consideration of the proton size correction of relative order $Z\alpha$ by inserting the factors $F(-k^2) - 1$ in the external field skeleton diagrams in Fig. 17. Technically, the external field diagrams correspond to the heavy particle pole contribution in the sum of all skeleton diagrams with two exchanged photons in Figs. 36–38. In the absence of form factors the nonpole contributions of the diagrams in Figs. 36–38 were suppressed by the recoil factor $m/M$ in comparison with the heavy pole contribution, and this justified their separate consideration. However, as we have seen, the insertion of form factors pushes the effective integration momenta into the high-momentum region $\sim\Lambda$ even in the external field diagrams. Then even the external field contribution contains the recoil factor $(m/\Lambda)^3$. We might therefore expect that, after insertion of the proton form factors, the nonpole contribution of the skeleton diagrams with two exchanged photons in Figs. 36–38 would contain the recoil factor $(m/M)(m/\Lambda)^2$ and would not be parametrically suppressed in comparison with the pole contribution in Eq. (168). The total contribution of the skeleton diagrams with proton form factor insertions was calculated in [168] for the 2S state, and the difference between this result and the nonrecoil result in Eq. (168) turned out to be −0.25 Hz. At the current level of theoretical and experimental accuracy we can safely ignore such tiny differences between the pole and total proton size contributions of order $(Z\alpha)^5$.

7.2.2. Nuclear polarizability contribution of order $(Z\alpha)^5m$ to S levels

The description of nuclear structure corrections of order $(Z\alpha)^5m$ in terms of nuclear size and nuclear polarizability contributions is somewhat artificial.
As we have seen above, the nuclear size correction of this order depends not on the charge radius of the nucleus but on the third Zemach moment in Eq. (169). One might expect that the inelastic intermediate nuclear states in Fig. 45 would generate corrections even smaller than those connected with the third Zemach moment, but this does not happen. In reality, the contribution of the inelastic intermediate states turns out to be even larger than the elastic contribution, since the power-law decrease of the form factor is compensated in this case by the summation over a large number of nuclear energy levels. The inelastic contributions to the energy shift were the subject of intensive study for a long time, especially for muonic atoms (see, e.g., [169–175]). In these works the corrections to the energy levels were obtained in the form of certain integrals of the inelastic nuclear structure functions, and the dominant contribution is produced by the nuclear electric and magnetic polarizabilities. The main feature of the polarizability contribution to the energy shift is its logarithmic enhancement [172,176]. The appearance of the large logarithm may easily be understood with the help of the skeleton integral. The heavy particle factor in the two-photon exchange diagrams is now described by the photon–nucleus inelastic forward Compton amplitude [177]
$$\mathcal{M} = \bar\alpha(\omega,\mathbf{k})\,\mathbf{E}\cdot\mathbf{E}^* + \bar\beta(\omega,\mathbf{k})\,\mathbf{B}\cdot\mathbf{B}^*, \qquad (171)$$
where $\bar\alpha(\omega,\mathbf{k})$ and $\bar\beta(\omega,\mathbf{k})$ are the proton (nuclear) electric and magnetic polarizabilities. In terms of this Compton amplitude the two-photon contribution has the form
4a(m Za) dk D D ¹r+c ((1#c )m!kK )c , GK HL G H M , *E"! KL (2p)i k k!2mk n which reduces after elementary transformations to
(172)
a(Za) *E"! mm dk+[(k#8)(k#4!k(k#2k#6)]a(u, k) 4pn # [k(k#4)!k(k#6k#6)]bM (u, k), ,
(173)
and we remind the reader that an extra factor $Z\alpha$ is hidden in the definition of the polarizabilities. Ignoring the momentum dependence of the polarizabilities, one immediately arrives at a logarithmically ultraviolet divergent integral [177]
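$$\Delta E = -\frac{\alpha(Z\alpha)^3}{\pi n^3}\,m\,m_r^3\,\big[5\bar\alpha(0,0) - \bar\beta(0,0)\big]\left(\ln\frac{\Lambda}{m} + O(1)\right), \qquad (174)$$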
where $\Lambda$ is an ultraviolet cutoff. In real calculations the role of the cutoff is played by the characteristic excitation energy of the nucleus. The sign of the energy shift is determined by the electric polarizability and has a clear physical origin: the electron polarizes the nucleus, an additional attraction between the induced dipole and the electron emerges, and the energy level is shifted down. In the case of hydrogen the characteristic excitation energy is about 300 MeV, the logarithm is rather large, and the logarithmic approximation works very well. Using the proton polarizabilities [178], one easily obtains the polarizability contribution for the proton nS state [172,177,179,180]:

$$\Delta E(nS) = -\frac{70(11)(7)}{n^3}\ \text{Hz}, \qquad (175)$$
where the error in the first parentheses describes the accuracy of the logarithmic approximation, and the error in the second parentheses is due to the experimental data on the polarizabilities. A numerically slightly different polarizability contribution,

$$\Delta E(nS) = -\frac{95(7)}{n^3}\ \text{Hz}, \qquad (176)$$
was obtained in [181] and also, with somewhat larger error bars, in [168]. The discrepancy between the results in Eqs. (175) and (176) persists even when both groups of authors use the same data on proton polarizabilities from [182]. Technically, the disagreement between the results in [172,177,179,180] and [181] is due to the expression for the polarizability energy shift in the form of an integral of the total photoabsorption cross-section, which was used in [181]. This
Footnote: The integration in Eq. (173) runs over the dimensionless momentum k measured in units of the electron mass.
expression was derived in [176] under the assumption that the invariant amplitudes for forward Compton scattering satisfy dispersion relations without subtractions. The subtraction term in the dispersion relation for the forward Compton scattering amplitude is also missing in [168]. Without this subtraction term the dominant logarithmic contribution to the energy shift becomes proportional to the sum of the electric and magnetic polarizabilities, $\bar\alpha + \bar\beta$, while in [172,177,179,180] this contribution is proportional to another linear combination of polarizabilities (see Eq. (174)). After restoring this necessary subtraction term, the authors of [168] obtained −82 Hz for the polarizability contribution instead of their result in [168]. In the other experimentally interesting case, deuterium, the nuclear excitation energies are much lower, and a more accurate account of the internal structure of the deuteron is necessary. As is well known, because the binding energy is small, the model-independent zero-range approximation provides a very accurate description of the deuteron. The polarizability contributions to the energy shift in deuterium are again logarithmically enhanced, and in the zero-range approximation one obtains a model-independent result [183,184]:
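$$\Delta E = -\frac{\alpha(Z\alpha)^3}{\pi n^3}\,m\,m_r^3\left\{5\alpha_d\left[\ln\frac{8\bar I}{m} + \frac{1}{20}\right] - \beta_d\left[\ln\frac{8\bar I}{m} - 1.24\right]\right\}, \qquad (177)$$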
where i"45.7 MeV is the inverse deuteron size, I"i/m is the absolute value of the deuteron N binding energy, and the deuteron electric and magnetic polarizabilities are de"ned as (E #I)"10"r"n2" 2 , a (u)" (Za) L B u!(E #I) 3 L L i 1# 3i a(k !k ) N L b (0)" , B i 8m i N 1# i
(178)
where $\kappa_t = 7.9$ MeV determines the position of the virtual level in the neutron–proton S state. Numerically, the polarizability contribution to the deuterium 1S energy shift in the zero-range approximation is equal to [184]

$$\Delta E(1S) = (-22.3 + 0.31)\ \text{kHz}, \qquad (179)$$
where the first number in the parentheses is the contribution of the electric polarizability and the second is the contribution of the magnetic polarizability. This zero-range contribution results in the correction

$$\Delta E(1S - 2S) = 19.3\ \text{kHz} \qquad (180)$$
to the 1S–2S interval and describes the total polarizability contribution with an accuracy of about one percent. New experimental data on the deuterium–hydrogen isotope shift (see Table 22) have an accuracy of about 0.1 kHz, and hence a more accurate theoretical result for the polarizability contribution is

Footnote: Private communication from I.B. Khriplovich and A.P. Martynenko.
required. To obtain such a result it is necessary to go beyond the zero-range approximation and take the deuteron structure into account in more detail. Fortunately, there exist a number of phenomenological potentials which describe the properties of the deuteron in all details. Some calculations with realistic proton–neutron potentials were performed recently [185–188]. The most precise results were obtained in [188]:

$$\Delta E(1S - 2S) = 18.58(7)\ \text{kHz}, \qquad (181)$$
which are consistent with the results of the other works [183,186]. The result in Eq. (181) is obtained by neglecting the contributions connected with the polarizability of the constituent nucleons in the deuterium atom and the polarizability contribution of the proton in the hydrogen atom. Meanwhile, as may be seen from Eq. (175), the proton polarizability contributions are comparable to the accuracy of the polarizability contribution in Eq. (181) and cannot be ignored. The deuteron is a weakly bound system, and it is natural to assume that the deuteron polarizability is the sum of the polarizability due to the relative motion of the nucleons and their internal polarizabilities. The nucleon polarizabilities in the deuteron coincide with the polarizabilities of free nucleons well within the accuracy of the logarithmic approximation [179]. Therefore, the proton polarizability contribution to the hydrogen–deuterium isotope shift cancels, and the contribution to this shift due to the internal polarizabilities of the nucleons is completely determined by the neutron polarizability. This neutron polarizability contribution to the hydrogen–deuterium isotope shift was calculated in the logarithmic approximation in [179]:

$$\Delta E(1S - 2S) = 53(9)(11)\ \text{Hz}. \qquad (182)$$
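Several numbers in this subsection can be checked with a few lines of arithmetic. The Python sketch below rests on four assumptions:

- The proton polarizabilities are $\bar\alpha = 12.0\times10^{-4}$ fm³ and $\bar\beta = 1.9\times10^{-4}$ fm³. These are approximate literature values; the text only cites [178].
- The $O(1)$ term of Eq. (174) is dropped.
- The cutoff is the 300 MeV excitation energy mentioned in the text.
- The inverse deuteron size and the binding energy are related by $\bar I = \kappa^2/m_p$.

Under these assumptions the hydrogen 1S estimate lands near −72 Hz, close to Eq. (175). The relation $\bar I = \kappa^2/m_p$ gives about 2.23 MeV. The $1/n^3$ scaling of Eq. (179) gives about 19.24 kHz for the 1S–2S interval, in line with Eq. (180).

```python
import math

ALPHA = 7.2973525693e-3        # fine-structure constant (assumed CODATA value)
ME_C2_MEV = 0.51099895         # electron rest energy, MeV
HBAR_C_MEV_FM = 197.3269804    # hbar*c, MeV*fm
ME_C2_OVER_H = 1.23558996e20   # electron rest energy / h, Hz
MP_OVER_ME = 1836.15267343     # proton/electron mass ratio
MP_MEV = 938.272               # proton mass, MeV (assumed value)


def proton_polarizability_shift_hz(n, alpha_e_fm3, beta_m_fm3, cutoff_mev):
    """Logarithmic term of Eq. (174) for hydrogen nS (Z = 1), O(1) constant dropped, in Hz."""
    mr_over_m = 1.0 / (1.0 + 1.0 / MP_OVER_ME)
    m_inv_fm = ME_C2_MEV / HBAR_C_MEV_FM                    # electron mass in fm^-1
    log = math.log(cutoff_mev / ME_C2_MEV)
    combo = 5.0 * alpha_e_fm3 - beta_m_fm3                   # 5*alpha_bar - beta_bar, fm^3
    # alpha (Z alpha)^3 / (pi n^3) * m m_r^3 * combo * log, with m m_r^3 = (m_r/m)^3 m^4
    return -(ALPHA ** 4) / (math.pi * n ** 3) * mr_over_m ** 3 * m_inv_fm ** 3 * combo * log * ME_C2_OVER_H


# Assumed proton polarizabilities (fm^3) and cutoff (MeV); not taken from the text.
shift = proton_polarizability_shift_hz(1, 12.0e-4, 1.9e-4, 300.0)
print(f"hydrogen 1S polarizability shift (log approximation): {shift:.0f} Hz")

# Deuteron binding energy from the inverse size kappa (assumed relation I = kappa^2 / m_p).
kappa_mev = 45.7
binding_mev = kappa_mev ** 2 / MP_MEV
print(f"deuteron binding energy from kappa: {binding_mev:.3f} MeV")

# Eq. (179) scales as 1/n^3, so the 1S-2S interval shifts by E(2S) - E(1S) = -(7/8) E(1S).
e1s_khz = -22.3 + 0.31
interval_khz = e1s_khz / 8 - e1s_khz
print(f"shift of the 1S-2S interval: {interval_khz:.2f} kHz")
```

The small gap between 19.24 kHz and the quoted 19.3 kHz is within the rounding of the two numbers in Eq. (179).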
7.3. Nuclear size and structure corrections of order $(Z\alpha)^6m$

7.3.1. Nuclear polarizability contribution to P levels

The leading polarizability contribution of order $(Z\alpha)^5$ obtained above is proportional to the square of the nonrelativistic wave function at the origin and hence exists only for S states. The leading polarizability contribution to non-S states is of order $(Z\alpha)^6$ and may easily be calculated. Consider the Compton amplitude in Eq. (171) as the contribution to the bound state energy induced by the external field of the electron at the nuclear site. We then calculate the matrix element of Eq. (171) between the electron states, taking the field strengths from the Coulomb field generated by the orbiting electron. We obtain [169] (the overall factor 1/2 is due to the induced nature of the nuclear dipole)
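$$\Delta E = -\frac{\alpha\bar\alpha}{2}\left\langle n,l\left|\frac{1}{r^4}\right|n,l\right\rangle = -\frac{\alpha(Z\alpha)^4 m_r^4\,\bar\alpha}{2}\,\frac{3n^2 - l(l+1)}{2n^5\,(l+3/2)(l+1)(l+1/2)\,l\,(l-1/2)}. \qquad (183)$$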
This energy shift is negative because, when the electron polarizes the nucleus, an additional attraction of the induced dipole to the electron arises. The contribution induced by the magnetic susceptibility may also be easily calculated [169], but it is of even higher order in $Z\alpha$ (order $(Z\alpha)^8$), since the magnetic field strength behaves as $1/r^3$. This additional suppression of the magnetic effect is quite reasonable, since the magnetic field itself is a relativistic effect.
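The closed-form factor in Eq. (183) is the standard Coulomb expectation value of $r^{-4}$. The Python sketch below checks it against direct numerical integration of hydrogen radial functions, in units where the Bohr radius $1/(m_rZ\alpha)$ equals 1. The helper names are illustrative. The formula function refuses $l = 0$, where the factor $l$ in the denominator vanishes, which mirrors the divergence of the S-state matrix element noted in the following paragraph.

```python
import math


def laguerre(k: int, a: int, x: float) -> float:
    """Generalized Laguerre polynomial L_k^(a)(x) from its explicit finite sum."""
    return sum((-1) ** i * math.comb(k + a, k - i) * x ** i / math.factorial(i) for i in range(k + 1))


def radial(n: int, l: int, r: float) -> float:
    """Normalized hydrogen radial function R_nl(r), with the Bohr radius 1/(m_r Z alpha) set to 1."""
    rho = 2.0 * r / n
    norm = math.sqrt((2.0 / n) ** 3 * math.factorial(n - l - 1) / (2 * n * math.factorial(n + l)))
    return norm * math.exp(-rho / 2) * rho ** l * laguerre(n - l - 1, 2 * l + 1, rho)


def inv_r4_formula(n: int, l: int) -> float:
    """<n,l| r^-4 |n,l> as used in Eq. (183), in units of (m_r Z alpha)^4."""
    if l == 0:
        raise ValueError("the S-state matrix element of 1/r^4 diverges at small r")
    den = 2 * n ** 5 * (l + 1.5) * (l + 1) * (l + 0.5) * l * (l - 0.5)
    return (3 * n ** 2 - l * (l + 1)) / den


def inv_r4_numeric(n: int, l: int, r_max: float = 120.0, steps: int = 40_000) -> float:
    """Midpoint-rule integral of R_nl(r)^2 r^2 * r^-4 over (0, r_max)."""
    h = r_max / steps
    return h * sum(radial(n, l, (i + 0.5) * h) ** 2 / ((i + 0.5) * h) ** 2 for i in range(steps))


for n, l in [(2, 1), (3, 1), (3, 2)]:
    print(f"n={n}, l={l}: formula {inv_r4_formula(n, l):.6e}, numeric {inv_r4_numeric(n, l):.6e}")
```

For the 2P state both routes give $1/24$, so the 2P polarizability shift is $-\alpha(Z\alpha)^4m_r^4\bar\alpha/48$.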
Consideration of the P-state polarizability contribution provides a new perspective on the S-state polarizability contribution. One could try to consider the matrix element in Eq. (183) between S states. Because the S-state wave functions do not vanish at the origin, this matrix element is linearly divergent at small distances. This once more demonstrates that the S-state contribution is of lower order in $Z\alpha$ than the P-state one, and that its calculation requires a more accurate treatment of small distances than was done in Eq. (183).

7.3.2. Nuclear size correction of order $(Z\alpha)^6m$

The nuclear size and polarizability corrections of order $(Z\alpha)^5$ obtained in Section 7.2 are very small. As was explained there, the suppression of these contributions is due to the large magnitude of the characteristic momenta responsible for them. The nature of the suppression is especially clear in the case of the Zemach radius contribution in Eq. (168), which contains the small factor $m\langle r^3\rangle_{(2)}$. The nuclear size correction of order $(Z\alpha)^6m$ contains an extra factor $Z\alpha$ in comparison with the nuclear size and polarizability corrections of order $(Z\alpha)^5$, but its main part is proportional to the proton charge radius squared. Hence, we should expect that, despite the extra power of $Z\alpha$, this correction is numerically larger than the nuclear size and polarizability contributions of the previous order in $Z\alpha$. As we will see below, calculations confirm this expectation, and, moreover, the contribution of order $(Z\alpha)^6m$ is additionally logarithmically enhanced. Nuclear size corrections of order $(Z\alpha)^6$ may be obtained quite straightforwardly in the framework of third-order quantum mechanical perturbation theory. In this approach one treats the difference between the electric field generated by the nonlocal charge density described by the nuclear form factor and the field of the pointlike charge as a perturbation operator [164,165].
The main part of the nuclear size $(Z\alpha)^6$ contribution, which is proportional to the nuclear charge radius squared, may also be easily obtained in a simpler way, which clearly demonstrates the source of the logarithmic enhancement of this contribution. We will first discuss in some detail this simple-minded approach, which essentially coincides with the arguments used above to obtain the main contribution to the Lamb shift in Eq. (30), and the leading proton radius contribution in Eq. (158). The potential of an extended nucleus is given by the expression
$$V(\mathbf r) = -Z\alpha\int d^3r'\,\frac{\rho(\mathbf r')}{|\mathbf r-\mathbf r'|}, \qquad (184)$$
where $\rho(\mathbf r)$ is the nuclear charge density. Due to the finite size of the nuclear charge distribution, the relative distance between the nucleus and the electron is not fixed but is subject to additional fluctuations with probability $\rho(\mathbf r')$. Hence, the energy levels experience an additional shift
$$\Delta E = \Big\langle n\Big|\int d^3r'\,\rho(\mathbf r')\,\big[V(\mathbf r+\mathbf r')-V(\mathbf r)\big]\Big|n\Big\rangle . \qquad (185)$$
Taking into account that the size of the nuclear charge distribution is much smaller than the atomic scale, we immediately obtain
$$\Delta E = \frac{2\pi}{3}\,(Z\alpha)\,\langle r^2\rangle\int d^3r\,\rho(\mathbf r)\,\psi^*(\mathbf r)\psi(\mathbf r). \qquad (186)$$
We will now discuss contributions contained in Eq. (186) for different special cases.
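For a wave function that is smooth on the nuclear scale, the step from Eq. (185) to Eq. (186), and its Schrödinger–Coulomb limit Eq. (158), can be made explicit by a Taylor expansion. The sketch below rests on assumptions of ours that are not stated in the text: the charge density is spherically symmetric and normalized to unity, and the potential inside the brackets of Eq. (185) is read as the point Coulomb potential $V_C(r) = -Z\alpha/r$. The linear term drops out because $\int d^3r'\,\rho(\mathbf r')\,r'_i = 0$. The quadratic term uses $\int d^3r'\,\rho(\mathbf r')\,r'_ir'_j = \frac13\delta_{ij}\langle r^2\rangle$ and $\nabla^2(1/r) = -4\pi\delta^3(\mathbf r)$.

```latex
\int d^3r'\,\rho(\mathbf r')\big[V_C(\mathbf r+\mathbf r')-V_C(\mathbf r)\big]
  = \frac12\int d^3r'\,\rho(\mathbf r')\,r'_i r'_j\,\partial_i\partial_j V_C(\mathbf r)+\dots
  = \frac{\langle r^2\rangle}{6}\,\nabla^2 V_C(\mathbf r)+\dots
  = \frac{2\pi}{3}\,Z\alpha\,\langle r^2\rangle\,\delta^3(\mathbf r)+\dots ,
\\[4pt]
\Delta E \simeq \frac{2\pi}{3}\,Z\alpha\,\langle r^2\rangle\,|\psi_{nS}(0)|^2
  = \frac{2\pi}{3}\,Z\alpha\,\langle r^2\rangle\,\frac{(mZ\alpha)^3}{\pi n^3}
  = \frac{2(Z\alpha)^4}{3n^3}\,m^3\langle r^2\rangle .
```

Eq. (186) keeps the smeared form $\int d^3r\,\rho(\mathbf r)|\psi(\mathbf r)|^2$ instead of $|\psi(0)|^2$. The two coincide only when the wave function is smooth over the nucleus, which is exactly what fails for the Dirac wave functions at small distances.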
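7.3.2.1. Correction to the nS-levels. In the Schrödinger–Coulomb approximation the expression in Eq. (186) reduces to the leading nuclear size correction in Eq. (158). New results arise if we take into account Dirac corrections of relative order $(Z\alpha)^2$ to the Schrödinger–Coulomb wave functions. For the nS states the product of the wave functions in Eq. (186) has the form (see, e.g., [165])
$$|\psi(r)|^2 = |\psi_S(r)|^2\left\{1-(Z\alpha)^2\left[\ln\frac{2mrZ\alpha}{n}+\psi(n)+2\gamma+\frac{9}{4n^2}-\frac{1}{n}-\frac{11}{4}\right]\right\}, \qquad (187)$$
where $\psi_S$ denotes the Schrödinger–Coulomb wave function, $\psi(n)$ is the digamma function, and $\gamma\approx 0.5772$ is the Euler constant,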
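and the additional contribution to the energy shift is equal to
$$\Delta E = -\frac{2(Z\alpha)^6}{3n^3}\,m^3\langle r^2\rangle\left[\ln\frac{2mrZ\alpha}{n}+\psi(n)+2\gamma+\frac{9}{4n^2}-\frac{1}{n}-\frac{11}{4}\right]. \qquad (188)$$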
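Eq. (188) is simply $-(Z\alpha)^2$ times the leading term of Eq. (158) multiplied by the bracket, so it is easy to evaluate. The minimal sketch below does this for hydrogen. Its inputs are assumptions of ours and not necessarily the authors' choices: CODATA constants, the reduced mass for $m$, and the rms radius $r_p = 0.862$ fm (the value used for Table 10) both as $\sqrt{\langle r^2\rangle}$ and as the $r$ inside the logarithm. The state-independent Friar–Payne term $\delta E$ of Eq. (189) is not included.

```python
import math

# Input constants (CODATA values; our choice of inputs)
ALPHA = 7.2973525693e-3           # fine-structure constant
HBARC_MEV_FM = 197.3269804        # hbar*c in MeV*fm
ME_MEV = 0.51099895               # electron mass in MeV
MP_MEV = 938.27208816             # proton mass in MeV
KHZ_PER_MEV = 2.417989242e17      # 1 MeV expressed as a frequency in kHz
EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function psi(n) for a positive integer n: -gamma + sum_{k<n} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def nuclear_size_ns(n, r_fm, z=1):
    """Leading term, Eq. (158), and naive (Z alpha)^6 term, Eq. (188), in kHz.

    Assumes m is the reduced mass and that the radius in the logarithm equals
    the rms radius r_fm; the Friar-Payne term delta E is not included.
    """
    m_r = ME_MEV * MP_MEV / (ME_MEV + MP_MEV)       # reduced mass, MeV
    mr = m_r * r_fm / HBARC_MEV_FM                  # dimensionless m*r
    za = z * ALPHA
    leading = 2.0 / (3 * n**3) * za**4 * m_r * mr**2   # Eq. (158), MeV
    bracket = (math.log(2 * mr * za / n) + digamma_int(n) + 2 * EULER_GAMMA
               + 9.0 / (4 * n**2) - 1.0 / n - 11.0 / 4)
    correction = -za**2 * leading * bracket             # Eq. (188), MeV
    return leading * KHZ_PER_MEV, correction * KHZ_PER_MEV


for n in (1, 2):
    lead, corr = nuclear_size_ns(n, 0.862)
    print(f"{n}S: leading {lead:8.2f} kHz, naive (Z alpha)^6 term {corr:.4f} kHz")
```

With these inputs the naive term comes out at about 0.696 kHz (1S) and 0.094 kHz (2S), roughly 1–2 percent below the Table 10 entries 0.709(20) and 0.095(3) kHz, which also contain $\delta E$. The naive 2S–1S difference, about −0.60 kHz, is close to Eq. (191).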
The expression in Eq. (188) nicely illustrates the main qualitative features of the order $(Z\alpha)^6$ nuclear size contribution. First, we observe a logarithmic enhancement connected with the singularity of the Dirac wave function at small distances. Due to the smallness of the nuclear size, the effective logarithm of the ratio of the nuclear size to the atomic size is a rather large number: it is about $-10$ for the 1S level in hydrogen and deuterium. The result in Eq. (188) contains all state-dependent contributions of order $(Z\alpha)^6$. A tedious third-order perturbation theory calculation [164,165] produces some additional state-independent terms, with the net result differing by a few percent from the naive result above. The additional state-independent contribution beyond the naive result above has the form [167]
1r2 1r211/r2 2(Za) m ! # dr dr o(r)o(r)h("r"!"r") *E" 2 3 3n
; (r#r) ln
"r" r r r!r ! # # "r" 3"r" 3"r" 3
# 6 dr dr dr o(r)o(r)o(r)h("r"!"r")h("r"!"r")
;
r "r" r r 1 1 rr 2rr r ln ! # # # ! # . 3 "r" 45"r""r" 9 "r" "r" 36r 9r 9
(189)
Note that, unlike the leading naive terms in Eq. (188), this additional contribution depends on more detailed features of the nuclear charge distribution than simply the charge radius squared. Detailed numerical calculations for the interesting cases of hydrogen and deuterium were performed in [167]. Nuclear size contributions of order $(Z\alpha)^6$ to the energy shifts in hydrogen are given in Table 10 (all numbers in Table 10 are calculated for the proton radius $r_p = 0.862(12)$ fm; see the discussion of the status of the proton radius results in Section 16.1.5). As discussed above, they are an order of magnitude larger than the nuclear size and polarizability contributions of the previous order in $Z\alpha$. The respective corrections to the energy levels in deuterium are even larger than in hydrogen due to the large radius of the deuteron. The nuclear size contribution of order $(Z\alpha)^6$ to the 2S–1S splitting in deuterium is equal to (we have used in this calculation the value of the deuteron charge
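Table 10. Nuclear size and structure corrections; $\Delta E(1S)$ and $\Delta E(2S)$ in kHz.

Leading nuclear size contribution, $\frac{2}{3n^3}(Z\alpha)^4m^3\langle r^2\rangle\,\delta_{l0}$: $\Delta E(1S) = 1162(32)$, $\Delta E(2S) = 145(4)$.
Proton form factor contribution of order $(Z\alpha)^5$, Borisoglebsky and Trofimenko [164], Friar [165], $-\frac{m(Z\alpha)^5}{3n^3}\,m^3\langle r^3\rangle_{(2)}\,\delta_{l0}$: $\Delta E(1S) = -0.036$, $\Delta E(2S) = -0.004$.
Polarizability contribution $\Delta E(nS)$, Startsev et al. [172], Khriplovich and Sen'kov [177,179,180], $-\frac{\alpha(Z\alpha)^3m^4}{\pi n^3}\,[5\alpha_E(0,0)-\beta_M(0,0)]\ln\frac{m_\pi}{m}\,\delta_{l0}$: $\Delta E(1S) = -0.070(11)(7)$, $\Delta E(2S) = -0.009(1)(1)$.
Polarizability contribution $\Delta E(nP)$, Ericson and Hüfner [169], $-\frac{\alpha(Z\alpha)^4\alpha_E m^4}{2}\,\frac{3n^2-l(l+1)}{2n^5\left(l+\frac32\right)(l+1)\left(l+\frac12\right)l\left(l-\frac12\right)}$.
Nuclear size correction of order $(Z\alpha)^6$, $\Delta E(nS)$, Borisoglebsky and Trofimenko [164], Friar [165], Friar and Payne [167], $-\frac{2(Z\alpha)^6}{3n^3}\,m^3\langle r^2\rangle\left[\ln\frac{2mrZ\alpha}{n}+\psi(n)+2\gamma+\frac{9}{4n^2}-\frac1n-\frac{11}{4}\right]+\delta E$: $\Delta E(1S) = 0.709(20)$, $\Delta E(2S) = 0.095(3)$.
Nuclear size correction of order $(Z\alpha)^6$, $\Delta E(nP_j)$, Friar [165], $\frac{(Z\alpha)^6(n^2-1)}{6n^5}\,m^3\langle r^2\rangle\,\delta_{j\frac12}$.
Electron-line radiative correction, Pachucki [195], Eides and Grotch [192], $-1.985(1)\,\frac{\alpha(Z\alpha)^5}{n^3}\,m^3\langle r^2\rangle\,\delta_{l0}$: $\Delta E(1S) = -0.184(5)$, $\Delta E(2S) = -0.023(1)$.
Polarization operator radiative correction, Friar [193], Hylton [194], Pachucki [195], Eides and Grotch [192], $\frac12\,\frac{\alpha(Z\alpha)^5}{n^3}\,m^3\langle r^2\rangle\,\delta_{l0}$: $\Delta E(1S) = 0.046(1)$, $\Delta E(2S) = 0.006$.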
radius obtained in [189] from the analysis of all available experimental data)
$$\Delta E = -3.43~\text{kHz}, \qquad (190)$$
and in hydrogen
$$\Delta E = -0.61~\text{kHz}. \qquad (191)$$
We see that the difference of these corrections gives an important contribution to the hydrogen–deuterium isotope shift.

7.3.2.2. Correction to the nP-levels. Corrections to the energies of P-levels may easily be obtained from Eq. (186). Since the P-state wave functions vanish at the origin, there are no charge radius squared contributions of lower order, unlike the case of S states, and we immediately obtain [165]
$$\Delta E(nP_j) = \frac{(n^2-1)(Z\alpha)^6}{6n^5}\,m^3\langle r^2\rangle\,\delta_{j\frac12}. \qquad (192)$$
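Eq. (192) is very small for hydrogen. The minimal sketch below evaluates it for the $2P_{1/2}$ level. As before, its inputs are our own assumptions: CODATA constants, the reduced mass for $m$, and $\sqrt{\langle r^2\rangle} = r_p = 0.862$ fm. The result is about 1.5 Hz, well below the S-state correction of the same order listed in Table 10.

```python
# Input constants (CODATA values; our choice of inputs)
ALPHA = 7.2973525693e-3           # fine-structure constant
HBARC_MEV_FM = 197.3269804        # hbar*c in MeV*fm
ME_MEV = 0.51099895               # electron mass in MeV
MP_MEV = 938.27208816             # proton mass in MeV
KHZ_PER_MEV = 2.417989242e17      # 1 MeV expressed as a frequency in kHz


def nuclear_size_np_half(n, r_fm, z=1):
    """Eq. (192) for an nP_{1/2} level (delta_{j,1/2} = 1), in kHz; reduced mass assumed."""
    m_r = ME_MEV * MP_MEV / (ME_MEV + MP_MEV)
    mr = m_r * r_fm / HBARC_MEV_FM                  # dimensionless m*r
    za = z * ALPHA
    return za**6 * (n**2 - 1) / (6 * n**5) * m_r * mr**2 * KHZ_PER_MEV


print(f"2P_1/2 shift: {nuclear_size_np_half(2, 0.862) * 1e3:.2f} Hz")
```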
There also exist additional terms of order $(Z\alpha)^6$ proportional to $\langle r^4\rangle$ [165], but they are suppressed by an additional factor $m^2\langle r^2\rangle$ in comparison with the result above and may safely be omitted.

7.4. Radiative correction of order $\alpha(Z\alpha)^5\langle r^2\rangle m^3$ to the finite size effect

Due to the large magnitude of the leading nuclear size correction in Eq. (158), at the current level of experimental accuracy one also has to take into account radiative corrections to this effect. These radiative corrections were first discussed, and greatly overestimated, in [190]. The problem was almost immediately clarified in [191], where it was shown that the contribution is generated by large intermediate momenta and is parametrically a small correction of order $\alpha(Z\alpha)^5m^3\langle r^2\rangle$. On the basis of the estimate in [191] the authors of [7] expected the radiative correction to the leading nuclear charge radius contribution to be of order 10 Hz for the 1S state in hydrogen. The large magnitude of the characteristic integration momenta [191] is quite clear. As we have seen above, in the calculation of the main proton charge contribution, the exchange momentum squared factor in the numerator connected with the proton radius cancels with a similar factor in the denominator supplied by the photon propagator. Any radiative correction behaves as $k^2$ at small momenta, and the presence of such a correction additionally suppresses small integration momenta and pushes the characteristic integration momenta into the relativistic region of the order of the electron mass. Hence, the corrections may be calculated with the help of the skeleton integrals in the scattering approximation. The characteristic integration momenta in the skeleton integral are of the order of the electron mass and are still much smaller than the scale of the proton form factor. As a result, the respective contribution to the energy shift depends only on the slope of the form factor.
The actual calculation essentially coincides with the calculation of the corrections of order $\alpha(Z\alpha)^5$ to the Lamb shift in Section 4.3.3, but is technically simpler due to the triviality of the proton form factor slope contribution in Eq. (156). There are two sources of radiative corrections to the leading nuclear size effect, namely, the diagrams with one-loop radiative insertions in the electron line in Fig. 46, and the diagrams with one-loop polarization insertions in one of the external Coulomb lines in Fig. 47.
Fig. 46. Electron-line radiative correction to the nuclear size effect. Bold dot corresponds to proton form factor slope.
Fig. 47. Coulomb-line radiative correction to the nuclear size effect. Bold dot corresponds to proton form factor slope.
7.4.1. Electron-line correction

Inserting the electron line factor [74,75] and the proton slope contribution Eq. (156) in the skeleton integral in Eq. (66), one immediately obtains [192]
$$\Delta E = -1.985(1)\,\frac{\alpha(Z\alpha)^5}{n^3}\,m^3\langle r^2\rangle\,\delta_{l0}, \qquad (193)$$
where an additional factor 2 was also inserted in the skeleton integral in order to take into account all possible ways to insert the slope of the proton form factor in the Coulomb photons. In principle, this integral also admits an analytic evaluation in the same way as was done for a more complicated integral in [75].

7.4.2. Polarization correction

Calculation of the diagrams with the polarization operator insertion proceeds exactly as in the case of the electron factor insertion. The only difference is that one inserts an additional factor 4 in the skeleton integral to take into account all possible ways to insert the polarization operator and the slope of the proton form factor in the Coulomb photons. After an easy analytic calculation one obtains [193,194,192]
$$\Delta E = \frac{\alpha(Z\alpha)^5}{2n^3}\,m^3\langle r^2\rangle\,\delta_{l0}. \qquad (194)$$
7.4.3. Total radiative correction

The total radiative correction to the proton size effect is given by the sum of the contributions in Eqs. (193) and (194)
$$\Delta E = -1.485(1)\,\frac{\alpha(Z\alpha)^5}{n^3}\,m^3\langle r^2\rangle\,\delta_{l0}. \qquad (195)$$
This contribution was also considered in [195]. Correcting an apparent misprint in that work, one finds the value $-1.43$ for the numerical coefficient in Eq. (195). The origin of the minor discrepancy between this value and the one in Eq. (195) is unclear, since the calculations in [195] were done without separating the polarization operator and electron factor contributions.
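Eqs. (193)–(195) share the common factor $\alpha(Z\alpha)^5m^3\langle r^2\rangle/n^3$, so the hydrogen numbers for Eq. (195) can be reproduced in a few lines. This is a minimal sketch under our own assumptions: CODATA constants, the reduced mass for $m$, and $\sqrt{\langle r^2\rangle} = 0.862$ fm as in Table 10. The deuterium values depend on the deuteron radius taken from [189], which is not reproduced here.

```python
# Input constants (CODATA values; our choice of inputs)
ALPHA = 7.2973525693e-3           # fine-structure constant
HBARC_MEV_FM = 197.3269804        # hbar*c in MeV*fm
ME_MEV = 0.51099895               # electron mass in MeV
MP_MEV = 938.27208816             # proton mass in MeV
KHZ_PER_MEV = 2.417989242e17      # 1 MeV expressed as a frequency in kHz

C_ELECTRON_LINE = -1.985          # coefficient in Eq. (193)
C_POLARIZATION = 0.5              # coefficient in Eq. (194)


def radiative_size_corrections(n, r_fm, z=1):
    """Eqs. (193), (194) and their sum, Eq. (195), for an nS level, in kHz."""
    m_r = ME_MEV * MP_MEV / (ME_MEV + MP_MEV)       # reduced mass (our assumption)
    mr = m_r * r_fm / HBARC_MEV_FM                  # dimensionless m*r
    # common factor alpha (Z alpha)^5 m^3 <r^2> / n^3 with <r^2> = r_fm^2
    unit = ALPHA * (z * ALPHA)**5 / n**3 * m_r * mr**2 * KHZ_PER_MEV
    electron_line = C_ELECTRON_LINE * unit
    polarization = C_POLARIZATION * unit
    return electron_line, polarization, electron_line + polarization


for n in (1, 2):
    e, p, t = radiative_size_corrections(n, 0.862)
    print(f"{n}S: electron line {e:+.3f}, polarization {p:+.3f}, total {t:+.3f} kHz")
```

The printed totals, about −0.138 kHz (1S) and −0.017 kHz (2S), match the hydrogen values quoted for Eq. (195). The two partial values match the corresponding rows of Table 10.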
Fig. 48. Z-boson exchange diagram.
Numerically, the total radiative contribution in Eq. (195) for hydrogen is equal to
$$\Delta E(1S) = -0.138~\text{kHz}, \qquad \Delta E(2S) = -0.017~\text{kHz}, \qquad (196)$$
and for deuterium
$$\Delta E(1S) = -0.841~\text{kHz}, \qquad \Delta E(2S) = -0.105~\text{kHz}. \qquad (197)$$
These contributions should be taken into account in the discussion of the hydrogen–deuterium isotope shift.

8. Weak interaction contribution

The weak interaction contribution to the Lamb shift is generated by the Z-boson exchange in Fig. 48, which may be described by the effective local low-energy Hamiltonian
16pa 1 mM H8(¸)"! !sin h dx(t>(x)t(x))(W>(x)W(x)) , (198) 5 M sin h cos h 4 5 5 8 where M is the Z-boson mass, h is the Weinberg angle, and t and W are the two-component 8 5 wave functions of the light and heavy particles, respectively. Then we easily obtain the weak interaction contribution to the Lamb shift in hydrogen [196]
a(Za)m a(Za)m 8Gm 1 !sin h d . d +!7.7;10\ *E8(¸)"! 5 J J pn pn (2a 4
(199)
This contribution is too small to be of any phenomenological significance.
9. Lamb shift in light muonic atoms

Theoretically, light muonic atoms have two main special features as compared with ordinary electronic hydrogenlike atoms, both of which are connected with the fact that the muon is about 200 times heavier than the electron. First, the role of the radiative corrections generated by closed electron loops is greatly enhanced. Second, the leading proton size contribution becomes the second largest individual contribution to the energy shifts, after the polarization correction. (Discussing light muonic atoms, we will often speak about muonic hydrogen, but almost all results below are also valid for another phenomenologically interesting case, namely muonic helium. In the sections on light muonic atoms, $m$ is the muon mass, $M$ is the proton mass, and $m_e$ is the electron mass.)

Fig. 49. Energy levels in muonic hydrogen.

The reason for the enhanced contribution of the radiatively corrected Coulomb potential in Fig. 9 may be easily explained. The characteristic distance at which the Coulomb potential is distorted by the polarization insertion is determined by the electron Compton length $1/m_e$. In the case of electronic hydrogen this distance is about 137 times smaller than the average distance $1/(m_eZ\alpha)$ between the atomic electron and the Coulomb source. This is the reason why even the leading polarization contribution to the Lamb shift in Eq. (31) is so small for ordinary hydrogen. The situation with muonic hydrogen is completely different. This time the average radius of the muon orbit, $r_{at}\approx 1/(mZ\alpha)$, is of the order of the electron Compton length $r_e\approx 1/m_e$. The respective ratio is about $r_{at}/r_e\approx m_e/(mZ\alpha)\approx 0.7$, and the muon spends a significant part of its life inside the region of the strongly distorted Coulomb potential. Qualitatively, one can say that the muon penetrates deep into the screening polarization cloud of the Coulomb center and sees a larger unscreened charge. As a result the binding becomes stronger, and, for example, the 2S level in muonic hydrogen in Fig. 49 turns out to be lower than the 2P level [197], unlike the case of ordinary hydrogen where the order of the levels is just the opposite. In this situation the polarization correction becomes by far the largest contribution to the Lamb shift in muonic hydrogen.

The relative contribution of the leading proton size contribution to the Lamb shift interval in electronic hydrogen is about $10^{-4}$. It is determined mainly by the ratio of the proton size contribution to the leading logarithmically enhanced Dirac form factor slope contribution in Eq. (40), which is much larger than the polarization contribution for electronic hydrogen.
The relatively larger role of the leading proton size contribution in muonic hydrogen may also be easily understood qualitatively. Technically, the leading proton radius contribution in Eq. (158) is of order $(Z\alpha)^4m^3\langle r^2\rangle$, where $m$ is the mass of the light particle: the electron in ordinary hydrogen and the muon in muonic hydrogen. We thus see that the relative weight of the leading proton charge contribution to the Lamb shift, in comparison with the standard nonrecoil contributions, is enhanced in muonic hydrogen by the factor $(m/m_e)^2$ relative to ordinary hydrogen, and it becomes larger than all other standard nonrecoil and recoil contributions. Overall, the weight of the leading proton radius contribution in the total Lamb shift in muonic hydrogen is determined by the ratio of the proton size contribution to the leading electron polarization contribution. In electronic hydrogen the ratio of the proton radius contribution to the leading polarization contribution is about $5\times10^{-3}$, which is much larger than the weight of the proton charge radius contribution in the total Lamb shift. In muonic hydrogen this ratio is about $2\times10^{-2}$, four times larger than the ratio of the leading proton size contribution to the leading polarization correction in electronic hydrogen. Both the leading proton size correction and the leading vacuum polarization contribution are parametrically enhanced in muonic hydrogen, and the extra factor of four in their ratio is due to an additional accidental numerical enhancement.

Below we will discuss corrections to the Lamb shift in muonic hydrogen, with an emphasis on the classic 2P–2S Lamb shift, having in mind the experiment on the measurement of this interval which is now under way [198] (see also Section 16.1.10). Being interested in theory, we will consider even those corrections to the Lamb shift which are an order of magnitude smaller than the expected experimental precision of 0.008 meV. Such corrections could become phenomenologically relevant for muonic hydrogen in the future. Another reason to consider these small corrections is that many of them scale as powers of the parameter $Z$ and produce larger contributions for atoms with higher $Z$. Hence, even though they are too small for hydrogen, they become phenomenologically relevant for muonic helium, where $Z = 2$.

9.1. Closed electron-loop contributions of order $\alpha^n(Z\alpha)^2m$

9.1.1. Diagrams with one external Coulomb line

9.1.1.1. Leading polarization contribution of order $\alpha(Z\alpha)^2m$. The effects connected with the electron vacuum polarization contributions in muonic atoms were first quantitatively discussed in [199]. In electronic hydrogen, polarization loops of other leptons and hadrons considered in Section 4.2.5 played a relatively minor role, because they were additionally suppressed by typical factors such as $(m_e/m)^2$. In the case of muonic hydrogen we have to deal with the polarization loops of the light electron, which are not suppressed at all. Moreover, the characteristic exchange momenta $mZ\alpha$ in muonic atoms are not small in comparison with the electron mass $m_e$, which determines the momentum scale of the polarization insertions ($mZ\alpha/m_e\approx 1.5$). We see that even in the simplest case the polarization loops cannot be expanded in the exchange momenta, and the radiative corrections in muonic atoms induced by the electron loops should be calculated exactly in the parameter $mZ\alpha/m_e$. The electron polarization insertion in the photon propagator in Fig. 9 induces a correction to the Coulomb potential, which may be easily written in the form [20]
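$$\delta V(r) = -\frac{Z\alpha}{r}\,\frac{2\alpha}{3\pi}\int_1^\infty d\zeta\,e^{-2m_e\zeta r}\left(1+\frac{1}{2\zeta^2}\right)\frac{\sqrt{\zeta^2-1}}{\zeta^2}. \qquad (200)$$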
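The respective correction to the energy levels is given by the expectation value of this perturbation potential
$$\Delta E_{nl} = \langle nlm|\delta V|nlm\rangle = \int_0^\infty dr\,r^2R_{nl}^2(r)\,\delta V(r) = -\frac{2\alpha(Z\alpha)}{3\pi}\int_0^\infty dr\,r\,R_{nl}^2(r)\int_1^\infty d\zeta\,e^{-2m_e\zeta r}\left(1+\frac{1}{2\zeta^2}\right)\frac{\sqrt{\zeta^2-1}}{\zeta^2}, \qquad (201)$$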
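where
$$R_{nl}(r) = 2\left(\frac{m_rZ\alpha}{n}\right)^{3/2}\sqrt{\frac{(n-l-1)!}{n[(n+l)!]^3}}\;e^{-\frac{m_rZ\alpha r}{n}}\left(\frac{2m_rZ\alpha r}{n}\right)^{l}L^{2l+1}_{n-l-1}\!\left(\frac{2m_rZ\alpha r}{n}\right) \qquad (202)$$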
is the radial part of the Schrödinger–Coulomb wave function in Eq. (1) (but now it depends on the reduced mass), and $L^{2l+1}_{n-l-1}$ is the associated Laguerre polynomial, defined as in [109,200]
$$L^{2l+1}_{n-l-1}(x) = \sum_{i=0}^{n-l-1}\frac{(-1)^i\,[(n+l)!]^2\,x^i}{i!\,(n-l-i-1)!\,(2l+i+1)!}. \qquad (203)$$
The radial wave functions depend on the radius only via the combination $\rho = rm_rZ\alpha$, and it is convenient to write them explicitly as functions of this dimensionless variable
$$R_{nl}(r) = 2\left(\frac{m_rZ\alpha}{n}\right)^{3/2}f_{nl}\!\left(\frac{\rho}{n}\right), \qquad (204)$$
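where
$$f_{nl}\!\left(\frac{\rho}{n}\right) \equiv \sqrt{\frac{(n-l-1)!}{n[(n+l)!]^3}}\;e^{-\rho/n}\left(\frac{2\rho}{n}\right)^{l}L^{2l+1}_{n-l-1}\!\left(\frac{2\rho}{n}\right). \qquad (205)$$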
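The convention in Eq. (203) differs from the one used by most numerical libraries by the factor $(n+l)!$. The minimal sketch below checks this relation against SciPy's `eval_genlaguerre`, which we use as the reference implementation. It then checks the normalization implied by Eq. (204): $\int_0^\infty d\rho\,\rho^2 f_{nl}^2(\rho/n) = n^3/4$, which follows from $\int_0^\infty dr\,r^2R_{nl}^2 = 1$.

```python
import math
from scipy.integrate import quad
from scipy.special import eval_genlaguerre


def laguerre_eq203(n, l, x):
    """Associated Laguerre polynomial L^{2l+1}_{n-l-1}(x) in the convention of Eq. (203)."""
    fact = math.factorial
    return sum((-1) ** i * fact(n + l) ** 2 * x ** i
               / (fact(i) * fact(n - l - i - 1) * fact(2 * l + i + 1))
               for i in range(n - l))


def f_nl(rho, n, l):
    """The function f_nl(rho/n) of Eq. (205)."""
    fact = math.factorial
    norm = math.sqrt(fact(n - l - 1) / (n * fact(n + l) ** 3))
    x = 2.0 * rho / n
    return norm * math.exp(-rho / n) * x ** l * laguerre_eq203(n, l, x)


states = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1)]

# Eq. (203) equals (n+l)! times the standard polynomial evaluated by SciPy.
for n, l in states:
    x = 0.7
    standard = eval_genlaguerre(n - l - 1, 2 * l + 1, x)
    assert math.isclose(laguerre_eq203(n, l, x), math.factorial(n + l) * standard, rel_tol=1e-10)

# Normalization implied by Eq. (204): integral of rho^2 f_nl(rho/n)^2 over rho is n^3/4.
for n, l in states:
    value, _ = quad(lambda rho: rho**2 * f_nl(rho, n, l) ** 2, 0.0, math.inf)
    print(f"n={n}, l={l}: {value:.10f}  (expected {n**3 / 4})")
```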
The explicit dependence of the leading polarization correction on the parameters becomes more transparent after the transition to the dimensionless integration variable $\rho$ [199]
$$\Delta E_{nl} = -\frac{8\alpha(Z\alpha)^2}{3\pi n^3}\,Q_{nl}(\beta)\,m_r, \qquad (206)$$
where
$$Q_{nl}(\beta) \equiv \int_0^\infty d\rho\,\rho\,f_{nl}^2\!\left(\frac{\rho}{n}\right)\int_1^\infty d\zeta\,e^{-2\rho\zeta\beta}\left(1+\frac{1}{2\zeta^2}\right)\frac{\sqrt{\zeta^2-1}}{\zeta^2}, \qquad (207)$$
and $\beta = m_e/(m_rZ\alpha)$. The integral $Q_{nl}(\beta)$ may easily be calculated numerically for arbitrary $n$. It was calculated analytically for the lower levels $n = 1, 2, 3$ in [201,202], and later these results were confirmed numerically in [203]. Analytic results for all states with $n = l+1$ were obtained in [204]. The leading electron vacuum polarization contribution to the Lamb shift in muonic hydrogen in Eq. (206) is of order $\alpha(Z\alpha)^2m$. Recall that the leading vacuum polarization contribution to the Lamb shift in electronic hydrogen in Eq. (32) is of order $\alpha(Z\alpha)^4m$. Thus, the relative magnitude of the leading polarization correction in muonic hydrogen is enhanced by the factor $1/(Z\alpha)^2\sim(m/m_e)^2$. This means that the electronic vacuum polarization gives by far the largest contribution to the Lamb shift in muonic hydrogen. The magnitude of the energy shift in Eq. (206) is determined also
Fig. 50. Two-loop polarization insertions in the Coulomb photon.
by the dimensionless integral $Q_{nl}(\beta)$. At the physical value $\beta = m_e/(m_rZ\alpha)\approx 0.7$ this integral is small ($Q_{10}(\beta)\approx 0.061$, $Q_{20}(\beta)\approx 0.056$, $Q_{21}(\beta)\approx 0.0037$) and somewhat suppresses the leading electron polarization contribution. The expression for $Q_{nl}(\beta)$ in Eq. (207) is valid for any $\beta$; in particular, we can consider the case $m = m_e$. Then $\beta\approx 1/(Z\alpha)\gg 1$, and it is easy to show that the leading term in the expansion of the result in Eq. (206) in powers of $1/\beta$ coincides with the leading polarization contribution in electronic hydrogen in Eq. (32). Numerically, the contribution to the 2P–2S Lamb shift in muonic hydrogen is equal to
$$\Delta E(2P-2S) = 205.0074~\text{meV}. \qquad (208)$$
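The double integral in Eq. (207) is straightforward to evaluate numerically. The following is a minimal sketch, not the procedure used in [201–204]. It uses SciPy's `quad` for both integrals and `eval_genlaguerre` for the Laguerre polynomials. It takes CODATA masses as input (our choice) and assumes that the mass in Eq. (206) and in $\beta$ is the reduced mass of muonic hydrogen. With these inputs it should reproduce $\beta\approx 0.74$, the values of $Q_{nl}(\beta)$ quoted above, and the 2P–2S shift of Eq. (208).

```python
import math
from scipy.integrate import quad
from scipy.special import eval_genlaguerre

ALPHA = 7.2973525693e-3                                   # fine-structure constant
M_E, M_MU, M_P = 0.51099895, 105.6583755, 938.27208816    # masses in MeV (CODATA, our input)
MEV_TO_MILLI_EV = 1e9


def f_sq(rho, n, l):
    """f_nl(rho/n)^2 from Eq. (205), written with SciPy's Laguerre normalization.

    Eq. (203) equals (n+l)! times the standard polynomial, which turns the
    prefactor (n-l-1)!/(n [(n+l)!]^3) into (n-l-1)!/(n (n+l)!).
    """
    x = 2.0 * rho / n
    c = math.factorial(n - l - 1) / (n * math.factorial(n + l))
    return c * math.exp(-x) * x ** (2 * l) * eval_genlaguerre(n - l - 1, 2 * l + 1, x) ** 2


def uehling_weight(zeta):
    """(1 + 1/(2 zeta^2)) sqrt(zeta^2 - 1) / zeta^2, the zeta-weight in Eq. (200)."""
    return (1.0 + 0.5 / zeta**2) * math.sqrt(zeta**2 - 1.0) / zeta**2


def q_nl(n, l, beta):
    """Q_nl(beta) of Eq. (207): the rho integral is done first for each zeta."""
    def rho_integral(zeta):
        value, _ = quad(lambda rho: rho * f_sq(rho, n, l) * math.exp(-2.0 * rho * zeta * beta),
                        0.0, math.inf)
        return uehling_weight(zeta) * value
    value, _ = quad(rho_integral, 1.0, math.inf)
    return value


m_r = M_MU * M_P / (M_MU + M_P)     # reduced mass of muonic hydrogen
beta = M_E / (m_r * ALPHA)          # beta = m_e / (m_r Z alpha) with Z = 1


def uehling_shift(n, l):
    """Eq. (206) in meV for Z = 1."""
    return -8.0 * ALPHA**3 / (3.0 * math.pi * n**3) * q_nl(n, l, beta) * m_r * MEV_TO_MILLI_EV


print(f"beta = {beta:.4f}")
for n, l in [(1, 0), (2, 0), (2, 1)]:
    print(f"Q_{n}{l}(beta) = {q_nl(n, l, beta):.5f}")
print(f"E(2P) - E(2S) = {uehling_shift(2, 1) - uehling_shift(2, 0):.4f} meV")
```

Only $Q_{20}-Q_{21}$ enters the 2P–2S interval, and the $\rho$ integral could be done analytically. The nested numerical form is kept because it works unchanged for any $(n, l)$.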
9.1.1.2. Two-loop electron polarization contribution of order $\alpha^2(Z\alpha)^2m$. In electronic hydrogen the leading contribution generated by the two-loop irreducible polarization operator in Fig. 13 is of order $\alpha^2(Z\alpha)^4m$ (see Eq. (51)), and is determined by the leading low-frequency term in the polarization operator. The reducible diagram in Fig. 50, with two one-loop insertions in the Coulomb photon, does not generate a correction of the same order in electronic hydrogen, because it vanishes at the characteristic atomic momenta, which are small in comparison with the electron mass. In muonic hydrogen the atomic momenta are of the order of the electron mass, so the two-loop irreducible and reducible polarization insertions in Fig. 50 both generate contributions of order $\alpha^2(Z\alpha)^2m$ and should be considered simultaneously. The two-loop electron polarization contribution to the Lamb shift may be calculated exactly like the one-loop contribution. The only difference is that one has to use as a perturbation potential the two-loop correction to the Coulomb potential from [54]. We use it in the form of the integral representation derived in [205] (see also [206])
13 7 2 Za a # # (f!1 df e\KC PD d<(r)" 54f 108f 9f r p 44 2 5 2 # ! # # # ln[f#(f!1] 9f 3f 4f 9f
4 2 8 2 # (f!1 ln[8f(f!1)]# ! # F(f) , 3f 3f 3f 3f
#
(209)
where
F(f)"
D
dx
1 3x!1 ln[x#(x!1]! ln[8x(x!1)] . x(x!1) (x!1
(210)
Fig. 51. Three-loop electron polarization insertions in the Coulomb photon.
Fig. 52. Perturbation theory contribution with two one-loop polarization insertions.
Then we easily obtain
$$\Delta E_{nl} = \frac{4\alpha^2(Z\alpha)^2}{\pi^2n^3}\,Q^{(2)}_{nl}(\beta)\,m_r, \qquad (211)$$
where
Q(b), LJ
o do
o e\MD@ df f LJ n
44 2 5 2 # ! # # # ln[f#(f!1] 9f 3f 4f 9f #
13 7 2 # # (f!1 54f 108f 9f
4 2 8 2 # (f!1 ln[8f(f!1)]# ! # F(f) . 3f 3f 3f 3f
(212)
Numerically, this correction to the 2P–2S Lamb shift was first calculated in [203]
$$\Delta E(2P-2S) = 1.5079~\text{meV}. \qquad (213)$$
9.1.1.3. Three-loop electron polarization of order $\alpha^3(Z\alpha)^2m$. As in the case of the two-loop electron polarization insertions in the external Coulomb line, reducible and irreducible three-loop polarization insertions enter on a par in muonic hydrogen, and we have to consider all the respective corrections to the Coulomb potential in Fig. 51. One-, two-, and three-loop polarization operators were calculated in one form or another in the literature [20,54,207,208,59]. A numerical calculation of the respective contribution to the 2P–2S splitting in muonic hydrogen was performed in [209]
$$\Delta E(2P-2S) = 0.08353(1)\,\frac{\alpha^3(Z\alpha)^2}{\pi^3}\,m_r \approx 0.0053~\text{meV}. \qquad (214)$$
9.1.2. Diagrams with two external Coulomb lines

9.1.2.1. Reducible diagrams. Contributions of order $\alpha^2(Z\alpha)^2m$. In electronic hydrogen the characteristic exchanged momenta in the diagram in Fig. 52 were determined by the electron mass, and since this mass in electronic hydrogen is large in comparison with the characteristic atomic momenta
Fig. 53. Perturbation theory contribution of order $\alpha^3(Z\alpha)^2m$ with polarization insertions.
we could ignore binding and calculate this diagram in the scattering approximation. As a result, the respective contribution was suppressed in comparison with the leading polarization contribution not only by an additional factor $\alpha$ but also by an additional factor $Z\alpha$. The situation is completely different in the case of muonic hydrogen. This time the atomic momenta are of the order of the electron mass, one cannot neglect binding, and the additional suppression factor $Z\alpha$ is missing. As a result, the respective correction in muonic hydrogen is of the same order $\alpha^2(Z\alpha)^2m$ as the contributions of the diagrams with reducible and irreducible two-loop polarization insertions in one and the same Coulomb line considered above. Formally, the contribution of the diagram in Fig. 53 is given by the standard quantum mechanical second-order perturbation theory term. Summation over the intermediate states, which accounts for binding, is realized with the help of the reduced Green function. Convenient closed expressions for the reduced Green function in the lower states were obtained in [210] and independently reproduced in [211]. Numerical calculation of the contribution to the 2P–2S splitting leads to the result [211,212]
$$\Delta E(2P-2S) = 0.01244\,\frac{4\alpha^2(Z\alpha)^2}{9\pi^2}\,m_r = 0.1509~\text{meV}. \qquad (215)$$
9.1.2.2. Reducible diagrams. Contributions of order $\alpha^3(Z\alpha)^2m$. As in the case of corrections of order $\alpha^2(Z\alpha)^2m$, corrections of order $\alpha^3(Z\alpha)^2m$ are generated not only by the diagrams in Fig. 51, with insertions of polarization operators in one and the same external Coulomb line, but also by the reducible diagrams in Fig. 53, with polarization insertions in different external Coulomb lines. The respective contributions were calculated in [209] with the help of the subtracted Coulomb Green function from [211]
$$\Delta E(2P-2S) = 0.036506(4)\,\frac{\alpha^3(Z\alpha)^2}{\pi^3}\,m_r \approx 0.0023~\text{meV}. \qquad (216)$$
The total contribution of order $\alpha^3(Z\alpha)^2m$ is the sum of the contributions in Eqs. (214) and (216) [209]:
$$\Delta E(2P-2S) = 0.120045(12)\,\frac{\alpha^3(Z\alpha)^2}{\pi^3}\,m_r = 0.0076~\text{meV}. \qquad (217)$$
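The dimensionless coefficients in Eqs. (214)–(217) convert to meV through $\alpha^3(Z\alpha)^2m/\pi^3$ and $4\alpha^2(Z\alpha)^2m/(9\pi^2)$. The short sketch below assumes (our reading) that $m$ in these formulas is the reduced mass of muonic hydrogen, and it uses CODATA masses. The printed values agree with the rounded numbers quoted in the equations.

```python
import math

ALPHA = 7.2973525693e-3                       # fine-structure constant
M_MU, M_P = 105.6583755, 938.27208816         # muon and proton masses in MeV (CODATA, our input)
m_r = M_MU * M_P / (M_MU + M_P) * 1e9         # reduced mass in meV (assumed to be the m in the text)

unit_a3 = ALPHA**5 / math.pi**3 * m_r                   # alpha^3 (Z alpha)^2 m / pi^3 with Z = 1
unit_215 = 4.0 * ALPHA**4 / (9.0 * math.pi**2) * m_r    # 4 alpha^2 (Z alpha)^2 m / (9 pi^2)

shifts = {
    "(214)": 0.08353 * unit_a3,
    "(215)": 0.01244 * unit_215,
    "(216)": 0.036506 * unit_a3,
    "(217)": 0.120045 * unit_a3,
}
for label, value in shifts.items():
    print(f"Eq. {label}: {value:.4f} meV")
```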
9.2. Relativistic corrections to the leading polarization contribution with exact mass dependence

The leading electron polarization contribution in Eq. (206) was calculated in the nonrelativistic approximation between the Schrödinger–Coulomb wave functions. Relativistic corrections of relative order $(Z\alpha)^2$ to this contribution may easily be obtained in the nonrecoil limit. To this end one has to calculate the expectation value of the radiatively corrected potential in Eq. (200) between the relativistic Coulomb–Dirac wave functions instead of averaging it with the nonrelativistic Coulomb–Schrödinger wave functions.
Fig. 54. One-photon exchange with one-loop polarization insertion.
Numerical calculations of the Uehling potential contribution to the energy shift without expansion in $Z\alpha$ (and therefore including the leading nonrecoil relativistic corrections of order $\alpha(Z\alpha)^4m$) are abundant in the literature; see, e.g., [213] and the references in the review [214]. Analytic results without expansion in $Z\alpha$ were obtained for the states with $n = l+1$, $j = l+1/2$ [127]. All these results might be very useful for heavy muonic atoms. However, in muonic hydrogen, with its relatively large muon–proton mass ratio, recoil corrections to the nonrecoil relativistic corrections of order $\alpha(Z\alpha)^4m$ may be rather large, while corrections of higher orders in $Z\alpha$ are expected to be very small. In such conditions it is reasonable to adopt another approach to the relativistic corrections and try to calculate them from the start in the nonrecoil approximation with exact dependence on the mass ratio. In the leading nonrelativistic approximation the one-loop electron polarization insertion in the Coulomb photon generates a nontrivial correction, Eq. (200), to the unperturbed Coulomb binding potential in muonic hydrogen. This correction may be written as a weighted integral of a potential corresponding to the exchange of a massive photon with continuous mass $\lambda\equiv 2m_e\zeta$:
$$\delta V(r) = \frac{2\alpha}{3\pi}\int_1^\infty d\zeta\left(1+\frac{1}{2\zeta^2}\right)\frac{\sqrt{\zeta^2-1}}{\zeta^2}\left(-\frac{Z\alpha}{r}\,e^{-2m_e\zeta r}\right). \qquad (218)$$
Za 1 1 # V ((t,2m f)" C 4. 2 m M !
mf Zamf e\KC DP C pd(r)! C e\KC DP ! (1!m fr) C r mM r
rGrH Za e\KC DP pG d # (1#2m fr) pH GH C r r 2mM
Fig. 55. Relativistic corrections to the leading electron polarization contribution.
#
Za 1 1 # e\KC DP(1#2m fr)[r;p] ) r . C r 4m 2mM
(219)
Then the analogue of the Breit potential induced by the electron vacuum polarization insertion is given by the integral
2a 1 (f!1 < " V (2m f) . (220) df 1# 4. 3p 4. C f 2f Calculation of the leading recoil corrections of order a(Za) becomes now almost trivial. One has to take into account that in our approximation the analogue of the Breit Hamiltonian in Eq. (36) has the form [211] p p Za H" # ! #< #
(221)
where < was de"ned in Eq. (35). Then the leading relativistic corrections of order a(Za) may be easily obtained as a sum of the "rstand second-order perturbation theory contributions corresponding to the diagrams in Fig. 55 [211] *E"1< 2#21< G(E )
(223)
9.3. Higher-order electron-loop polarization contributions

9.3.1. Wichmann–Kroll electron-loop contribution of order $\alpha(Z\alpha)^4m$

The contribution of the Wichmann–Kroll diagram in Fig. 24, with three external fields attached to the electron loop [102], may be considered in the same way as the polarization insertions in the Coulomb potential. As we will see below, it generates a correction to the Lamb shift of order $\alpha(Z\alpha)^4m$. A convenient representation for the Wichmann–Kroll polarization potential was obtained in [205]
1 p D Za a(Za) ! (f!1h(f!1)# dx(f!xf (x) , df e\KC DP d<5)(r)" f 12 p r where
(224)
1!x 1#x f (x)"!2x Li (x)!x ln (1!x)# ln (1!x) ln x 1!x 1!x 1#x 2!x 3!2x 1#x # ln # ln (1!x)# ln !3x 4x 1!x x(1!x) 1!x 1!x
(225)
for $x < 1$, and
1 1 1 3x#1 1 2x!1 1 x#1 f (x)" Li ! Li !Li ! ! ln 1! #ln x x x 2x x 2x x x!1 1 x#1 3x#1 x#1 1 ! (2x!1) ln 1! ln # ln !2 ln x ln 1! x x!1 4x x!1 x !
#
3x#1 x#1 x(3x!2) 1 ln x ln # 5! ln 1! 2x x!1 x!1 x
3x#2 3x!2 x#1 ! ln #3 ln x!3 x x!1 x!1
(226)
for $x > 1$. This representation allows us to calculate the correction to the Lamb shift in the same way as we have done above for the Uehling and Källén–Sabry potentials in Eqs. (206) and (211), respectively. Let us write the potential in the form
$$\delta V_{WK}(r) = \frac{Z\alpha}{r}\,\frac{\alpha(Z\alpha)^2}{\pi}\,g(m_er). \qquad (227)$$
Then the contribution to the energy shift is given by the expression
$$\Delta E^{WK}_{nl} = \frac{4\alpha(Z\alpha)^4}{\pi n^3}\,Q^{WK}_{nl}(\beta)\,m_r, \qquad (228)$$
where
$$Q^{WK}_{nl}(\beta) \equiv \int_0^\infty d\rho\,\rho\,f_{nl}^2\!\left(\frac{\rho}{n}\right)g(\rho\beta). \qquad (229)$$
The Uehling and Källén–Sabry potentials are attractive and shift the energy levels down. Physically this corresponds to the usual charge screening in QED, and one can say that at finite distances the muon sees a larger unscreened charge of the nucleus. From this point of view the Uehling and Källén–Sabry potentials are just the attractive potentials corresponding to the excess of the bare charge over the physical charge. The case of the Wichmann–Kroll potential is qualitatively different. Due to current conservation, the total charge which induces the Wichmann–Kroll potential is zero [102]. Spatially, the induced charge distribution consists of two components: a delta-function induced charge at the origin with the sign opposite to the sign of the nuclear charge, and a spatially distributed compensating charge of the same sign as the nuclear charge. The radius of this spatial distribution is roughly equal to the electron Compton length. As a result, the muon, which sees the nucleus from a finite distance, experiences a net repulsion. The Wichmann–Kroll potential therefore shifts the levels up and gives a positive contribution to the level shift (the original calculation in [215] produced a result with the wrong sign and magnitude). Practical calculations of the Wichmann–Kroll contribution are greatly facilitated by convenient approximate interpolation formulae for the potential in Eq. (224). One such formula was obtained in [206] by fitting the results of the numerical calculation of the potential from [216]:
$$g(x) = 0.361662331\,\exp\left[0.3728079\,x-\sqrt{4.416798\,x^2+11.39911\,x+2.906096}\right]. \qquad (230)$$
This expression fits the exact potential in the interval $0.01 < x < 1.0$ with an accuracy of about 1%. Because the wave functions decrease exponentially and the potential is small at large distances, it may be safely used for calculations at all $x$. After a numerical calculation with this interpolation formula we obtain for the 2P–2S Lamb shift
$$\Delta E(2P-2S) = -0.0010~\text{meV}. \qquad (231)$$
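Eqs. (228)–(230) can be evaluated numerically in the same way as Eq. (206) is evaluated from Eq. (207). This is a minimal sketch. Several choices are our own assumptions: the Laguerre normalization (standard SciPy convention, rescaled to match Eq. (203)), the CODATA masses, and the use of the reduced mass. The $\rho$ integral is done with SciPy's `quad`, and the fit (230) is used outside its fitted range, as the text allows. It gives about −0.0010 meV for the 2P–2S interval, consistent with Eq. (231).

```python
import math
from scipy.integrate import quad
from scipy.special import eval_genlaguerre

ALPHA = 7.2973525693e-3
M_E, M_MU, M_P = 0.51099895, 105.6583755, 938.27208816    # masses in MeV (CODATA, our input)


def g_wk(x):
    """Interpolation (230) of the Wichmann-Kroll potential, fitted on 0.01 < x < 1."""
    return 0.361662331 * math.exp(0.3728079 * x
                                  - math.sqrt(4.416798 * x**2 + 11.39911 * x + 2.906096))


def f_sq(rho, n, l):
    """f_nl(rho/n)^2 of Eq. (205), rewritten with SciPy's Laguerre normalization."""
    x = 2.0 * rho / n
    c = math.factorial(n - l - 1) / (n * math.factorial(n + l))
    return c * math.exp(-x) * x ** (2 * l) * eval_genlaguerre(n - l - 1, 2 * l + 1, x) ** 2


def q_wk(n, l, beta):
    """Q^WK_nl(beta) of Eq. (229)."""
    value, _ = quad(lambda rho: rho * f_sq(rho, n, l) * g_wk(rho * beta), 0.0, math.inf)
    return value


m_r = M_MU * M_P / (M_MU + M_P)     # reduced mass of muonic hydrogen
beta = M_E / (m_r * ALPHA)          # beta = m_e / (m_r Z alpha) with Z = 1


def wk_shift(n, l):
    """Eq. (228) in meV for Z = 1: 4 alpha (Z alpha)^4 m_r Q / (pi n^3)."""
    return 4.0 * ALPHA**5 / (math.pi * n**3) * q_wk(n, l, beta) * m_r * 1e9


dE_wk = wk_shift(2, 1) - wk_shift(2, 0)
print(f"E(2P) - E(2S), Wichmann-Kroll: {dE_wk:.4f} meV")
```

The result is negative because the repulsive potential pushes the 2S level, which has more weight at small distances, further up than the 2P level.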
Another convenient interpolation formula for the potential in Eq. (224) was obtained in [214]. One more way to calculate the Wichmann–Kroll contribution numerically is to combine two ingredients. For small values of the argument one uses the asymptotic expansion of the potential in Eq. (224) obtained in [205], and for large values of the argument the interpolation formula from [217]. Both approaches reproduce the numerical value for the 2P–2S Lamb shift in Eq. (231).

9.3.2. Light by light electron-loop contribution of order $\alpha^2(Z\alpha)^3m$

The light by light electron-loop contribution to the Lamb shift in Fig. 20(e) in muonic atoms was considered in [219–223]. In muonic hydrogen this is a correction of order $\alpha^2(Z\alpha)^3m$. Characteristic momenta in the electron polarization loop are of the order of the atomic momenta in muonic hydrogen. Hence one cannot neglect the atomic momenta when calculating the matrix element of this kernel, as was done in the case of electronic hydrogen. An initial numerical estimate in [219] turned out to be far too large, and consistent, much smaller numerical estimates were obtained in [220–222]. The momentum-space potential generated by the light by light diagrams and the respective contribution to the energy shifts in heavy atoms were calculated numerically in [222]. Certain approximate expressions for the effective momentum-space potential were obtained in [221,222,214]. After extensive numerical work, the electron-loop light by light scattering contribution was calculated for muonic helium [218,214]. It turned out to be equal to 0.02 meV for the 2P–2S interval. Scaling this result with Z, we expect the respective contribution in muonic hydrogen to be at the level of 0.01–0.04 meV. This is one of the largest still unknown purely electrodynamic corrections to the 2P–2S interval in muonic hydrogen.

Footnote to Eq. (231): A result two times smaller than the Wichmann–Kroll contribution in Eq. (231) was obtained in [211]. We are convinced of the correctness of the result in Eq. (231). Besides calculations with all three forms of the interpolation formulae, we also calculated the energy shift for muonic helium with Z = 2, and reproduced the well-known old helium results [217,218,214].

M.I. Eides et al. / Physics Reports 342 (2001) 63–261

Fig. 56. Diagram with radiative photon and electron-loop polarization insertion in the Coulomb photon.

9.3.3. Diagrams with radiative photon and electron-loop polarization insertion in the Coulomb photon. Contribution of order $\alpha^2(Z\alpha)^4m$

In electronic hydrogen the leading contributions of diagrams such as Fig. 56 were generated at the scale of the mass of the light constituent. These diagrams effectively looked like Fig. 20(c) and could be calculated in the scattering approximation. They produced corrections of order $\alpha^2(Z\alpha)^5m$. In muonic hydrogen the electron polarization insertion in the Coulomb photon is not suppressed at characteristic atomic momenta. The respective contribution to the energy shift is therefore only α times smaller than the contribution of the diagrams with insertions of one radiative photon in the muon line, which are the leading diagrams for the Lamb shift in the case of electronic hydrogen. One should expect that, in the same way as the leading Lamb shift contribution in electronic hydrogen, this contribution is also logarithmically enhanced and is of order $\alpha^2(Z\alpha)^4m$. This contribution was never calculated completely; only the leading logarithmic contribution was obtained in [211]. Consider the diagrams with the radiative photon spanning any number of Coulomb photons and one Coulomb photon with an electron-loop polarization insertion; the simplest of them is presented in Fig. 56. Their leading logarithmic contribution may be calculated by closely following the classical Bethe calculation [32] of the leading logarithmic contribution to the Lamb shift in electronic hydrogen. As is well known, in the dipole approximation the standard subtracted expression for the Lamb shift, logarithmically divergent at high frequencies, may be written in the form [211] (compare, e.g., [20,19])
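$$\Delta E = \frac{2\alpha}{3\pi m^2}\int d\omega\,\left\langle n\left|\,\mathbf{p}\,\frac{H-E_n}{H-E_n+\omega}\,\mathbf{p}\,\right|n\right\rangle, \qquad (232)$$
where H is the nonrelativistic Hamiltonian for the muon in the external field. This field is the sum of the Coulomb field and the radiatively corrected Coulomb field from Eq. (200),
$$V = V_C + V_{VP} = -\frac{Z\alpha}{r} + V_{VP}. \qquad (233)$$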
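To obtain the leading contribution generated by the integral in Eq. (232), it is sufficient to integrate over the wide logarithmic region $m(Z\alpha)^2 \ll \omega \ll m$, where one can neglect the terms $H-E_n$ in the denominator. Then one easily obtains [211]
$$\Delta E = \frac{2\alpha}{3\pi m^2}\ln\frac{m}{m(Z\alpha)^2}\,\langle n|\,\mathbf{p}\,(H-E_n)\,\mathbf{p}\,|n\rangle. \qquad (234)$$
Using the trivial identity
$$\langle n|\,\mathbf{p}\,(H-E_n)\,\mathbf{p}\,|n\rangle = \frac{\langle n|\Delta(V_C+V_{VP})|n\rangle}{2}, \qquad (235)$$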
throwing away the standard leading polarization-independent logarithmic correction to the Lamb shift, which is also contained in this expression, and expanding the state vectors up to first order in the potential $V_{VP}$, one easily obtains
$$\Delta E = \frac{\alpha}{3\pi m^2}\ln\frac{m}{m(Z\alpha)^2}\left\{\langle n|\Delta V_{VP}|n\rangle + 2\langle n|V_{VP}\,G(E_n)\,\Delta V_C|n\rangle\right\}. \qquad (236)$$
This contribution to the 2P–2S splitting was calculated numerically in [211,212]:
$$\Delta E(2P-2S) = -0.005(1)~\text{meV}. \qquad (237)$$
Fig. 57. Electron polarization insertion in the radiative photon.
The uncertainty here is due to the unknown nonlogarithmic terms. Calculation of these nonlogarithmic terms is one of the future tasks in the theory of muonic hydrogen.

9.3.4. Electron-loop polarization insertion in the radiative photon. Contribution of order $\alpha^2(Z\alpha)^4m$

Contributions of order $\alpha^2(Z\alpha)^4m$ in muonic hydrogen generated by the two-loop muon form factors have almost exactly the same form as the respective contributions in the case of electronic hydrogen. The only new feature is the contribution to the muon form factors generated by the insertion of the one-loop electron polarization in the radiative photon in Fig. 57. The respective insertion of the muon polarization in the electron form factors in electronic hydrogen is suppressed as $(m_e/m)^2$. In contrast, the insertion of a light loop in the muon case is logarithmically enhanced. The graph in Fig. 57 is gauge invariant and generates a correction to the slope of the Dirac form factor, which was calculated in [224]:
$$\left.\frac{dF_1(-k^2)}{dk^2}\right|_{k^2=0} = -\frac{1}{m^2}\left(\frac{\alpha}{\pi}\right)^2\left[\frac{1}{9}\ln^2\frac{m}{m_e} - \frac{29}{108}\ln\frac{m}{m_e} + \frac{\pi^2}{54} + \frac{395}{1296} + O\!\left(\frac{m_e}{m}\right)\right] \approx -\frac{1}{m^2}\left(\frac{\alpha}{\pi}\right)^2\left[2.21656 + O\!\left(\frac{m_e}{m}\right)\right]. \qquad (238)$$
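Then the contribution to the Lamb shift has the form [224]
$$\Delta E = -4\pi(Z\alpha)\,|\Psi_n(0)|^2\left.\frac{dF_1(-k^2)}{dk^2}\right|_{k^2=0} = \left[2.21656 + O\!\left(\frac{m_e}{m}\right)\right]\frac{4\alpha^2(Z\alpha)^4}{\pi^2 n^3}\left(\frac{m_r}{m}\right)^2 m_r\,\delta_{l0}. \qquad (239)$$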
We also have to consider the electron-loop contribution to the muon anomalous magnetic moment
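$$F_2(0) = \left(\frac{\alpha}{\pi}\right)^2\left[\frac{1}{3}\ln\frac{m}{m_e} - \frac{25}{36} + \frac{\pi^2}{4}\frac{m_e}{m} - 4\left(\frac{m_e}{m}\right)^2\ln\frac{m}{m_e} + 3\left(\frac{m_e}{m}\right)^2 + O\!\left(\left(\frac{m_e}{m}\right)^3\right)\right] \approx \left(\frac{\alpha}{\pi}\right)^2\left[1.08275 + O\!\left(\frac{m_e}{m}\right)\right]. \qquad (240)$$
The first two terms in this expression were obtained in [225,226], and an exact analytic result without expansion over $m_e/m$ was calculated in [227,228]. Then one readily obtains for the Lamb shift contribution [214]
*E "1.08275 J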
a(Za)m m , m pn
(241)
Fig. 58. Electron- and muon-loop polarization insertions in the Coulomb photon.
*E "1.08275 J$
a(Za)m j(j#1)!l(l#1)!3/4 m pn l(l#1)(2l#1) m
Numerically, for the 2P–2S interval we obtain
$$\Delta E(2P-2S) = -0.0016~\text{meV}. \qquad (242)$$
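As a rough numerical cross-check of the coefficients 2.21656 and 1.08275, the short Python sketch below evaluates the terms written out explicitly in Eqs. (238) and (240). The muon–electron mass ratio is an assumed CODATA-style input, not a value quoted in this section. The O(m_e/m) remainders are simply dropped, so agreement with 2.21656 should only be expected to roughly one part in 10³.

```python
import math

# Assumed input (CODATA-style value, not quoted in the text): m_mu / m_e.
MU_OVER_E = 206.768283


def dirac_slope_bracket(mass_ratio: float) -> float:
    """Terms written out in the bracket of Eq. (238); the O(m_e/m) remainder is dropped."""
    log_ratio = math.log(mass_ratio)  # ln(m/m_e)
    return (log_ratio**2 / 9 - 29 * log_ratio / 108
            + math.pi**2 / 54 + 395 / 1296)


def anomalous_moment_bracket(mass_ratio: float) -> float:
    """First two terms of the bracket in Eq. (240): (1/3) ln(m/m_e) - 25/36."""
    return math.log(mass_ratio) / 3 - 25 / 36


slope = dirac_slope_bracket(MU_OVER_E)
amm = anomalous_moment_bracket(MU_OVER_E)
print(f"Eq. (238) bracket without O(m_e/m): {slope:.5f} (quoted: 2.21656)")
print(f"Eq. (240), first two terms:         {amm:.5f} (quoted: 1.08275)")
```

The leading-logarithmic terms of Eq. (238) come out near 2.214. The first two terms of Eq. (240) agree with 1.08275 to within about 10⁻⁵.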
9.3.5. Insertion of one electron and one muon loop in the same Coulomb photon. Contribution of order $\alpha^2(Z\alpha)^2(m_e/m)^2m$

Consider the mixed polarization graph in Fig. 58, with one electron-loop and one muon-loop insertion in the Coulomb photon. Its contribution may easily be calculated by the same methods as the contributions of purely electron loops, and it was first considered in [229]. The momentum-space perturbation potential corresponding to the mixed loop diagram is given by the expression below (the factor 2 is due to two diagrams):
P (k) P (k) C , 2 I k k
(243)
where $\Pi_\mu(k^2)$ and $\Pi_e(k^2)$ are the muon- and electron-loop polarization operators, respectively (compare Eq. (31)). The characteristic integration momenta in the matrix element of this perturbation potential between the Coulomb–Schrödinger wave functions are of the atomic scale $mZ\alpha$. These momenta are small in comparison with the muon mass m. Hence, in the leading approximation the muon polarization may be approximated by the first term in its low-frequency expansion
2a P (k) C . 15pm k
(244)
This momentum-space potential is similar to the momentum-space potential corresponding to the insertion of the electron-loop polarization in the Coulomb photon, considered in Section 9.1.1.1. The two differ only in the overall multiplicative constant, and in that the respective expression in the case of the one electron polarization insertion contains k in the denominator instead of k in Eq. (244). This means that the mixed loop contribution is suppressed in comparison with the purely electron loops by an additional recoil factor $(m_e/m)^2$. Similarly to Eq. (200), it is easy to write a coordinate-space representation for the perturbation potential corresponding to the diagram in Fig. 58
1 Za 16 a m C df e\KC DP 1# (f!1 . d<(r)" m 2f r 45 p
(245)
Then we easily obtain
64a(Za) m C Q(b)m , *E" LJ LJ 45pn m I where
Q(b), LJ
o do
(246)
o 1 e\MD@ 1# (f!1 . df f LJ n 2f
(247)
Due to the additional recoil factor $(m_e/m)^2$, this contribution is suppressed by four orders of magnitude in comparison with the nonrecoil corrections generated by the insertion of two electron loops in the Coulomb photon (compare Eq. (211)). Numerically, for the 2P–2S interval we obtain
$$\Delta E(2P-2S) = 0.00007~\text{meV}. \qquad (248)$$
9.4. Hadron loop contributions

9.4.1. Hadronic vacuum polarization contribution of order $\alpha(Z\alpha)^4m$

Masses of pions are only slightly larger than the muon mass. We should therefore expect that the contribution of the diagram with the insertion of the hadronic vacuum polarization in the Coulomb photon in Fig. 59 is of the same order of magnitude as the contribution of the respective diagram with muon vacuum polarization. The hadronic polarization correction is of order $\alpha(Z\alpha)^4m$. It depends only on the leading low-momentum asymptotic term in the hadronic polarization operator, and it has the same form as in the case of electronic hydrogen in Section 4.2.5. It was considered in the literature many times, and consistent results were obtained in [230–233,209]. According to [63]
$$\Delta E(nS) = -0.671(15)\,\Delta E_\mu, \qquad (249)$$
where $\Delta E_\mu$ is the muon-loop polarization contribution to the Lamb shift in Eq. (63). The latest treatment of this diagram in [234] produced
$$\Delta E(nS) = -0.638(22)\,\Delta E_\mu. \qquad (250)$$
The respective results for the 2P–2S splitting are
$$\Delta E(2P-2S) = 0.0113(3)~\text{meV} \qquad (251)$$
and
$$\Delta E(2P-2S) = 0.0108(4)~\text{meV}. \qquad (252)$$
As in the case of electronic hydrogen, this correction may be hidden in the main proton radius contribution to the Lamb shift, and we ignored it in the phenomenological discussion of the Lamb shift in electronic hydrogen (see the discussion in Section 7.1.3). However, we include the hadronic polarization in the theoretical expression for the Lamb shift in muonic hydrogen. We have in mind that in the future all radiative corrections should be properly taken into account when extracting the value of the proton charge radius from the scattering and optical experimental data.

Footnote: In the case of muonic hydrogen, $m_r$ in Eq. (63) is the muon–proton reduced mass.
Fig. 59. Hadron polarization insertion in the Coulomb photon.
Fig. 60. Hadron polarization contribution of order $\alpha(Z\alpha)^5$.
Fig. 61. Hadron polarization insertion in the radiative photon.
9.4.2. Hadronic vacuum polarization contribution of order $\alpha(Z\alpha)^5m$

Contributions of the diagrams with muon and hadron vacuum polarizations are analogous. By this analogy, the insertion of hadron vacuum polarization in one of the exchanged photons in the skeleton diagrams with two-photon exchanges generates a correction of order $\alpha(Z\alpha)^5m$ (see Fig. 60). Calculation of this correction is straightforward. One may even take into account the composite nature of the proton and include the proton form factors in the photon–proton vertices. Such a calculation was performed in [234] and produced a very small contribution
$$\Delta E(2P-2S) = 0.000047~\text{meV}. \qquad (253)$$
9.4.3. Contribution of order $\alpha^2(Z\alpha)^4m$ induced by insertion of the hadron polarization in the radiative photon

The muon mass is only slightly lower than the pion mass. We should therefore expect that the insertion of hadronic vacuum polarization in the radiative photon in Fig. 61 gives a contribution to the anomalous magnetic moment comparable with the contribution induced by the insertion of the muon vacuum polarization. The respective corrections are written via the slope of the Dirac form factor and the anomalous magnetic moment, exactly as in Section 9.3.4. The only difference is that the contributions to the form factors are produced by the hadronic vacuum polarization. Numerically, this contribution to the 2P–2S interval was calculated in [234]:
$$\Delta E(2P-2S) = -0.000015~\text{meV}, \qquad (254)$$
and it is too small to be of any practical significance. In the case of electronic hydrogen, this hadronic insertion in the radiative photon is additionally suppressed in comparison with the contribution of the electron vacuum polarization, roughly speaking as $(m_e/m_\pi)^2$.
Fig. 62. Hadron and electron polarization insertions in the Coulomb photon.
9.4.4. Insertion of one electron and one hadron loop in the same Coulomb photon

Muon and hadron polarizations are similar. Therefore the correction generated by the diagram in Fig. 62 should be of the same order as the respective correction with the muon loop in Eq. (248), and thus is too small for any practical needs. It may easily be calculated.

9.5. Standard radiative, recoil and radiative-recoil corrections

All corrections to the energy levels obtained above in the case of ordinary hydrogen and collected in Tables 2, 3, 5 and 7–9 remain valid for muonic hydrogen after the obvious substitution of the muon mass for the electron mass in all formulae. These contributions are included in Table 11.

9.6. Nuclear size and structure corrections

Nuclear size and structure corrections for electronic hydrogen were considered in Section 4.1 and are collected in Table 10. Below we consider what happens to these corrections in muonic hydrogen. The form of the main proton size contribution of order $(Z\alpha)^4m^3\langle r^2\rangle$ from Eq. (158) does not change.

9.6.1. Nuclear size and structure corrections of order $(Z\alpha)^5m$

9.6.1.1. Nuclear size corrections of order $(Z\alpha)^5m$. Neglecting recoil, the nuclear size correction of order $(Z\alpha)^5m$ in muonic hydrogen is still given by Eq. (168). Calculating this contribution with the same values of parameters as in [167], for $r_N = 0.862(12)$ fm one obtains
$$\Delta E(2P-2S) = 0.0247~\text{meV}. \qquad (255)$$
The muon–proton mass ratio is much larger than the electron–proton mass ratio, and one could expect a relatively large recoil correction to this result. The total nuclear size correction of order $(Z\alpha)^5$ with account for recoil is given by the sum of the two-photon diagrams in Figs. 43 and 44. As in the case of electronic hydrogen, the effective integration momenta are large. It is therefore sufficient to calculate these diagrams in the scattering approximation. The respective calculations with realistic form factors and for $r_N = 0.862(12)$ fm were performed in [211,168,212]:
$$\Delta E(2P-2S) = 0.0232(15)~\text{meV}. \qquad (256)$$
We see that the additional recoil contribution turns out to be not too important.
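The leading proton-size term of order $(Z\alpha)^4m^3\langle r^2\rangle$ is simple enough to evaluate directly. The Python sketch below assumes the S-level form $\Delta E(nS) = 2(Z\alpha)^4m_r^3\langle r^2\rangle/(3n^3)$ listed in Table 11. It also assumes CODATA-style values for α, the muon and proton masses and ħc, which are not given in the text. P levels are not shifted at this order, so the 2P–2S interval receives the negative of the 2S shift. With $r_N = 0.862$ fm the result should land near the −3.862 meV entry of Table 11.

```python
# Assumed CODATA-style constants (not taken from the text).
ALPHA = 1 / 137.035999
M_MU_MEV = 105.6583745        # muon mass
M_P_MEV = 938.2720881         # proton mass
HBAR_C_MEV_FM = 197.3269804   # hbar * c


def leading_size_2p_2s_mev(r_fm: float, z: int = 1, n: int = 2) -> float:
    """Leading finite-size contribution to E(2P) - E(2S), in meV.

    Uses Delta E(nS) = 2 (Z alpha)^4 m_r^3 <r^2> / (3 n^3) with <r^2> = r_fm**2.
    P levels are not shifted at this order (delta_l0), so the interval gets -Delta E(2S).
    """
    m_r = M_MU_MEV * M_P_MEV / (M_MU_MEV + M_P_MEV)   # reduced mass in MeV
    mr_r_squared = (m_r * r_fm / HBAR_C_MEV_FM) ** 2   # dimensionless (m_r r)^2
    shift_s_mev = 2 * (z * ALPHA) ** 4 * m_r * mr_r_squared / (3 * n**3)
    return -shift_s_mev * 1e9                          # MeV -> meV


print(f"r = 0.862 fm: {leading_size_2p_2s_mev(0.862):.3f} meV")
```

Because the term is proportional to ⟨r²⟩, rescaling to another radius only requires multiplying by the ratio of the squared radii.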
Table 11 Lamb shift in muonic hydrogen *E(nl)
*E(2P}2S) meV
One-loop electron polarization Galanin and Pomeranchuk [199]
8a(Za) ! Q(b)m 3pn LJ
205.0074
Two-loop electron polarization Di Giacomo [203]
4a(Za) Q(b)m LJ pn
1.5079
Three-loop electron polarization contribution, order a(Za) Kinoshita and Nio [209]
0.0053
Polarization insertions in two Coulomb lines, order a(Za) Pachucki [211,212]
0.1509
Polarization insertions in two and three Coulomb lines, order a(Za) Kinoshita and Nio [209]
0.0023
Relativistic corrections of order a(Za) Pachucki [211,212] Wichmann}Kroll, order a(Za) Rinker [217] Borie and Rinker [218] Radiative photon and electron polarization in the Coulomb line, order a(Za) Pachucki [211,212]
1< 2#21< G
0.0594
4a(Za) m Q5)(b) LJ pn
!0.0010
1*<4.2#21<4.G *< 2 ! ! # !
!0.005(1)
Electron loop in the radiative photon, order a(Za) Barbieri et al. [224] Suura and Wichmann [225] Peterman [226]
2 m 1.08275 ! !1 !4;2.21656 3 mr a(Za) m m pn m
Mixed electron and muon loops order a(Za)(m /m) m C Borie [229]
64a(Za) m C Q(b)m LJ 45pn m
Hadronic polarization, order a(Za)m Folomeshkin [230] Friar et al. [63] Faustov and Martynenko [234] Hadronic polarization, order a(Za)m Faustov and Martynenko [234]
!0.638(22)
4a(Za) m md J 15pn m
!0.0016
0.00007
0.0108(4) 0.000047 (continued on next page)
Table 11 (continued) *E(nl)
*E(2P}2S) meV
Hadronic polarization in the radiative photon, order a(Za)m Faustov and Martynenko [234]
!0.000015
Recoil contribution of order (Za)(m/M)m Barker-Glover [31]
1 1 (Za)m ! (1!d ) J 2nM j#(1/2) l#(1/2)
Radiative corrections of order aL(Za)Im
Tables 2, 3, 5 and 7
!0.6677
m Recoil corrections of order (Za)L m M Radiative-recoil corrections m of order a(Za)L m M
Table 8
!0.0440
Table 9
!0.0095
2 (Za)m1r2d J 3n
!3.862(108)
Leading nuclear size contribution Nuclear size correction of order (Za) Pachucki [211,212] Faustov and Martynenko [168] Nuclear structure correction of order (Za) Startsev et al. [172] Rosenfelder [159] Faustov and Martynenko [168] Pachucki [212] Nuclear size correction of order (Za) Borisoglebsky and Tro"menko [164] Friar [165] Pachucki [212] Radiative corrections to the nuclear xnite size ewect, order a(Za)m1r2 Friar [193] Pachucki [211]
0.0575
0.0232(15)
0.095(18) ! d J n
!
0.012(2)
Za 2 ln ! m1r2 n 3
2(Za) m1r2 3n
!0.0009(3)
!0.0204(6)
Let us note that using the self-consistent proton radius 0.891(18) fm from Eq. (394), we would obtain in the nonrecoil limit
$$\Delta E(2P-2S) = 0.0264~\text{meV} \qquad (257)$$
instead of the result in Eq. (255). Comparing these numbers, we see that the results discussed in this section may be used to estimate the proton size contribution of order $(Z\alpha)^5$. However, once the results of the Lamb shift measurements become available, this correction will require some kind of self-consistent consideration. Happily, the respective integrals, despite being cumbersome, are relatively simple.

9.6.1.2. Nuclear polarizability contribution of order $(Z\alpha)^5m$ to S-levels. The diagrams in Fig. 45 generate nuclear structure corrections of order $(Z\alpha)^5m$. Their calculation follows the same route as in the case of electronic hydrogen in Section 7.2.2, starting with the forward Compton scattering amplitude. The only difference is that, due to the relatively large mass of the muon, the logarithmic approximation is no longer valid, and one has to calculate the integrals more accurately. According to [172,212]
$$\Delta E = -\frac{0.095(18)}{n^3}\,\delta_{l0}~\text{meV}, \qquad (258)$$
while the result in [159] is
$$\Delta E = -\frac{0.136(30)}{n^3}\,\delta_{l0}~\text{meV}. \qquad (259)$$
We think that the reasons for the minor discrepancy between these results are the same as for a similar discrepancy in the case of electronic hydrogen (see the discussion in Section 7.2.2). The improved result in [168] (see Footnote 23) is
$$\Delta E = -\frac{0.129}{n^3}\,\delta_{l0}~\text{meV}. \qquad (260)$$
However, the experimental data on the proton form factors used in [168] contained misprints. Corrections to these experimental data were taken into account in [212]. We adopt the result in Eq. (258) for further discussion. The respective contribution to the 2P–2S splitting is [212]
$$\Delta E(2P-2S) = 0.012(2)~\text{meV}. \qquad (261)$$
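Eqs. (258)–(260) shift only S levels and scale as 1/n³, so the 2P–2S interval picks up the negative of the 2S shift. The small Python sketch below does this bookkeeping for the three quoted coefficients. It uses central values only and ignores the quoted uncertainties. The first line should reproduce Eq. (261).

```python
# Central values C (meV) in Delta E(nS) = -C * delta_l0 / n^3, from Eqs. (258)-(260).
COEFFICIENTS_MEV = {
    "Eq. (258) [172,212]": 0.095,
    "Eq. (259) [159]": 0.136,
    "Eq. (260) [168]": 0.129,
}


def polarizability_2p_2s_mev(c_mev: float, n: int = 2) -> float:
    """Return E(2P) - E(2S): the P level is unshifted, the S level moves by -C/n^3."""
    shift_s = -c_mev / n**3
    shift_p = 0.0
    return shift_p - shift_s


for label, c in COEFFICIENTS_MEV.items():
    print(f"{label}: {polarizability_2p_2s_mev(c):.4f} meV")
```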
9.6.2. Nuclear size and structure corrections of order $(Z\alpha)^6m$

The nuclear polarizability contribution of order $(Z\alpha)^6m$ was considered above in Section 7.3.1, and the expression for this energy shift in Eq. (183) may be used directly for muonic hydrogen. In electronic hydrogen the nuclear size correction of order $(Z\alpha)^6m$ is larger than the nuclear size and structure corrections of order $(Z\alpha)^5m$. This enhancement is due to the smallness of the electron mass (see the discussion in Section 7.3.2). The muon mass is much larger than the electron mass. As a result, this hierarchy of the corrections does not survive in muonic hydrogen: the corrections of order $(Z\alpha)^6m$ are smaller than the corrections of the previous order in Zα. Numerically, the nuclear polarizability contribution of order $(Z\alpha)^6m$ to the 2P–2S Lamb shift in muonic hydrogen is about $5\times10^{-\ldots}$ meV, and is negligible. Nuclear size corrections of order $(Z\alpha)^6m$ to the S levels were calculated in [164,165] and were discussed above in Section 7.3.2 for electronic hydrogen. The respective formulae may be used directly in the case of muonic hydrogen. Due to the smallness of this correction, it is sufficient to consider only the leading logarithmically enhanced contribution to the energy shift from Eq. (188) [212]
2(Za) *E"! m1r2 3n
Za 2 ln ! m1r2 . n 3
(262)
In this equation we have restored a small second term from [165]. Due to the smallness of the electron mass, this term was omitted in the case of electronic hydrogen in Eq. (188). Numerically, the respective contribution to the 2P–2S energy shift is [212]
$$\Delta E(2P-2S) = -0.0009(3)~\text{meV}. \qquad (263)$$
The error of this contribution could easily be reduced by using the total expressions in Eqs. (188) and (189) for its calculation. The nuclear size correction of order $(Z\alpha)^6m$ to P levels from Eq. (192) gives an additional contribution of $4\times10^{-\ldots}$ meV to the 2P–2S energy splitting, and may safely be neglected.

9.6.3. Radiative corrections to the nuclear finite size effect

Radiative corrections to the leading nuclear finite size contribution were considered in Section 7.4. The respective results may be used directly for muonic hydrogen. Numerically, with $\langle r^2\rangle$ in fm², we obtain
$$\Delta E(2P-2S) = 0.0006\langle r^2\rangle = 0.0005~\text{meV}. \qquad (264)$$
This contribution is dominated by the diagrams with radiative photon insertions in the muon line. As usual in muonic hydrogen, a much larger contribution is generated by the electron-loop insertions in the external Coulomb photons. Even after the insertion of the electron loop in the external photon, the effective integration momenta in muonic hydrogen are still of the atomic scale, $k\sim mZ\alpha\sim m_e$. The respective contribution to the energy shift is therefore of order $\alpha(Z\alpha)^4m^3\langle r^2\rangle$. In contrast, the muon-loop insertions give a contribution of the higher order $\alpha(Z\alpha)^5m^3\langle r^2\rangle$ (compare the discussion in Section 7.4). Electron-loop radiative corrections to the leading nuclear finite size contribution in light muonic atoms were considered in [193,211]. Two diagrams in Fig. 63 give contributions of order $\alpha(Z\alpha)^4m^3\langle r^2\rangle$. The analytic expression for the first diagram coincides, up to a numerical factor, with the expression for the mixed electron and muon loops in Eq. (246), and we obtain
16a(Za) m1r2Q(b)m , *E " C LJ LJ 9pn
(265)
where $Q_{nl}(\beta)$ is defined in Eq. (247). Numerically, the respective contribution to the 2P–2S splitting is equal to
$$\Delta E(2P-2S) = -0.0083~\text{meV}. \qquad (266)$$
The contribution of the second diagram in Fig. 63 may be written as [211]
4pZa1r2 *E" d (r)< G(r, 0) (0) . 4. 3
(267)
Fig. 63. Electron polarization corrections to the leading nuclear size effect.
The respective contribution to the 2P–2S splitting was calculated in [211,212]:
$$\Delta E(2P-2S) = -0.0126~\text{meV}. \qquad (268)$$
Collecting all terms in Eqs. (264), (266) and (268), we obtain
$$\Delta E(2P-2S) = -0.0275\langle r^2\rangle \approx -0.0204(6)~\text{meV}. \qquad (269)$$
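The following Python sketch is a quick arithmetic check. It confirms that the three pieces combine into Eq. (269) and agree with $-0.0275\langle r^2\rangle$ for the radius $r_N = 0.862$ fm used earlier in Section 9.6.1. It assumes that ⟨r²⟩ is measured in fm², as in Eq. (264).

```python
# Contributions to the 2P-2S interval in meV, Eqs. (264), (266) and (268).
PIECES_MEV = {
    "Eq. (264), radiative insertions": 0.0005,
    "Eq. (266), first diagram of Fig. 63": -0.0083,
    "Eq. (268), second diagram of Fig. 63": -0.0126,
}
R_N_FM = 0.862  # proton radius used in Section 9.6.1, in fm

total_mev = sum(PIECES_MEV.values())
# Eq. (269) written as a coefficient times <r^2>, with <r^2> in fm^2
from_coefficient_mev = -0.0275 * R_N_FM**2

print(f"sum of pieces:         {total_mev:.4f} meV")
print(f"-0.0275 <r^2> [fm^2]:  {from_coefficient_mev:.4f} meV")
```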
10. Physical origin of the hyperfine splitting and the main nonrelativistic contribution

The theory of the atomic energy levels developed in the previous sections is incomplete, since we systematically ignored the nuclear spin, which leads to an additional splitting of the energy levels. This effect is the subject of the discussion below. Unlike the Lamb shift, the hyperfine splitting (HFS) (see Fig. 64) can be readily understood in the framework of nonrelativistic quantum mechanics. It originates from the interaction of the magnetic moments of the electron and the nucleus. The classical interaction energy between two magnetic dipoles is given by the expression (see, e.g., [19,55])
$$H = -\frac{2}{3}\,\boldsymbol{\mu}_1\cdot\boldsymbol{\mu}_2\,\delta(\mathbf{r}). \qquad (270)$$
This effective Hamiltonian for the interaction of two magnetic moments may also easily be derived from the one-photon exchange diagram in Fig. 65. In the leading nonrelativistic approximation, the denominator of the photon propagator cancels the exchanged momentum squared in the numerator. We immediately obtain the Hamiltonian for the interaction of two magnetic moments, reproducing the above result of classical electrodynamics. A simple calculation of the matrix element of this Hamiltonian between the nonrelativistic Schrödinger–Coulomb wave functions gives the Fermi result [29] for the splitting between the $1^3S_1$ and $1^1S_0$ states
$$E_F = \frac{8}{3}(Z\alpha)^4(1+a_\mu)\frac{m}{M}\left(\frac{m_r}{m}\right)^3 mc^2 = \frac{16}{3}Z^4\alpha^2(1+a_\mu)\frac{m}{M}\left(\frac{m_r}{m}\right)^3 chR_\infty, \qquad (271)$$

Footnote: In comparison with the Fermi result, we have restored here the proper dependence of the hyperfine splitting on the reduced mass.
Fig. 64. Scheme of hyperfine energy levels in the ground state.
Fig. 65. Leading order contribution to hyperfine splitting.
where m and M are the electron and muon masses, respectively, Z is the charge of the muon in terms of the proton charge, c is the velocity of light, $a_\mu$ is the muon anomalous magnetic moment, $R_\infty$ is the Rydberg constant and h is the Planck constant. The sign of this contribution may easily be understood from purely classical considerations, if one thinks of the magnetic dipoles in the context of the Ampère hypothesis about small loops of current. According to classical electrodynamics, parallel currents attract each other and antiparallel ones repel. Hence the state with antiparallel magnetic moments (parallel spins) should have a higher energy than the state with antiparallel spins and parallel magnetic moments. As in the case of the Lamb shift, QED provides the framework for a systematic calculation of numerous corrections to the Fermi formula for hyperfine splitting. We again have three small parameters: the fine structure constant α, Zα, and the small electron–muon mass ratio m/M. Expansion in these parameters generates relativistic (binding), radiative, recoil, and radiative-recoil corrections. At a certain level of accuracy the weak interactions also become important, and for hadronic atoms so do the nuclear size and structure effects. Below we first discuss corrections to hyperfine splitting in the case of a structureless nucleus, having in mind the special case of muonium, where the most precise comparison between theory and experiment is possible. In a separate section we will also consider the nuclear size and structure effects that should be taken into account in the case of hyperfine splitting in hydrogen. We postpone a more detailed discussion of the phenomenological situation to Section 6.1.
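As a numerical illustration of Eq. (271), the Python sketch below evaluates $E_F$ for muonium (Z = 1) as a frequency, $E_F/h = (16/3)\alpha^2(1+a_\mu)(m/M)(m_r/m)^3cR_\infty$. The constants (α, $cR_\infty$, the muon–electron mass ratio and $a_\mu$) are assumed CODATA-style inputs, not values quoted in the text. The result should therefore only be compared with the Fermi entry of Table 12 at the level of a few parts in 10⁶.

```python
# Assumed CODATA-style inputs (not quoted in the text).
ALPHA = 1 / 137.035999
C_RYDBERG_HZ = 3.289841960e15   # c * R_infinity, in Hz
MU_OVER_E = 206.768277          # M/m, muon-to-electron mass ratio
A_MU = 1.16592e-3               # muon anomalous magnetic moment


def fermi_energy_hz(z: int = 1) -> float:
    """E_F/h from Eq. (271): (16/3) Z^4 alpha^2 (1 + a_mu) (m/M) (m_r/m)^3 c R_inf."""
    m_over_big_m = 1 / MU_OVER_E
    reduced_over_m = 1 / (1 + m_over_big_m)   # m_r/m = M/(m + M)
    return ((16 / 3) * z**4 * ALPHA**2 * (1 + A_MU)
            * m_over_big_m * reduced_over_m**3 * C_RYDBERG_HZ)


print(f"E_F ~ {fermi_energy_hz() / 1e3:,.1f} kHz")
```

The factor $(m_r/m)^3$ is written as $(1+m/M)^{-3}$, which is the reduced-mass dependence mentioned in the footnote to Eq. (271).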
Even in the case of muonium, the current level of accuracy requires two further classes of contributions. These are the strong interaction contributions generated by the hadron polarization insertions in the exchanged photons, and the weak interaction contribution induced by the Z-boson exchange. The experimental value of the hyperfine splitting in muonium is measured with an uncertainty of ±53 Hz [5] (relative accuracy $1.2\times10^{-8}$), and the next task of the theory is to obtain all corrections which could be as large as 10 Hz. This task is made even more challenging by the fact that only a few years ago a reduction of the theoretical error below 1 kHz was considered a great success (see, e.g., the discussion in [10]).

Footnote: Here we call the heavy particle the muon, having in mind that the precise theory of hyperfine splitting finds its main application in the comparison with the highly precise experimental data on muonium hyperfine splitting. However, the theory of nonrecoil corrections is valid for any hydrogenlike atom. Of course, Z = 1 for the muon, but the Fermi formula is valid for any heavy nucleus with arbitrary Z. As in the case of the Lamb shift, it is useful to preserve Z as a parameter in all formulae for the different contributions to HFS, since it helps to clarify the origin of different corrections.
11. External field approximation

11.1. Relativistic (binding) corrections to HFS

Relativistic and radiative corrections depend on the electron and muon masses only via the explicit mass factors in the electron and muon magnetic moments, and via the reduced mass factor in the Schrödinger wave function. All such corrections may be calculated in the framework of the external field approximation. In this approximation the heavy particle magnetic moment factorizes, and the relativistic and radiative corrections have the form
$$\Delta E_{HFS} = E_F\,(1 + \text{corrections}). \qquad (272)$$
This factorization of the total muon magnetic moment occurs because the virtual momenta involved in the calculation of the relativistic and radiative corrections are small in comparison with the muon mass. The muon mass sets the natural momentum scale for corrections to the muon magnetic moment. Purely relativistic corrections are by far the simplest corrections to hyperfine splitting. As in the case of the Lamb shift, they essentially correspond to the nonrelativistic expansion of the relativistic square-root expression for the energy of the light particle in Eq. (3), and have the form of a series over $(Z\alpha)^2 \approx p^2/m^2$. These corrections should be calculated in the framework of the spinor Dirac equation, since clearly there would be no hyperfine splitting for a scalar particle. The binding corrections to hyperfine splitting, as well as the main Fermi contribution, are contained in the matrix element of the interaction Hamiltonian of the electron with the external vector potential created by the muon magnetic moment ($\mathbf{A} = \nabla\times\boldsymbol{\mu}/(4\pi r)$). This matrix element should be calculated between the Dirac–Coulomb wave functions with the proper reduced mass dependence (these wave functions are discussed at the end of Section 3). Thus the proper approach to the calculation of these corrections is to start with the EDE (see the discussion in Section 3), solve it with a convenient zero-order potential and obtain the respective Dirac–Coulomb wave functions. Then all binding corrections are given by the matrix element
*E "1n"c cA"n2 .
(273)
Table 12. Relativistic (binding) corrections
Fermi [29]: $1\times E_F$, 4 459 031.922(519) kHz
Breit [235]: $\left[\frac{1}{\sqrt{1-(Z\alpha)^2}\,\left(2\sqrt{1-(Z\alpha)^2}-1\right)}-1\right]E_F$, 356.201 kHz
Total Fermi and Breit contributions: $\frac{E_F}{\sqrt{1-(Z\alpha)^2}\,\left(2\sqrt{1-(Z\alpha)^2}-1\right)}$, 4 459 388.123(519) kHz

As discovered by Breit [235], an exact calculation of this matrix element is no more difficult than the calculation of the leading binding correction of relative order $(Z\alpha)^2$. After a straightforward calculation one obtains a closed expression for the hyperfine splitting of an energy level with an arbitrary principal quantum number n [235,84] (see also [55])
$$\Delta E(nS) = \frac{1 + 2\sqrt{1-\frac{(Z\alpha)^2}{N^2}}}{N^3\,\gamma\,(4\gamma^2-1)}\,E_F, \qquad (274)$$
where $N = \sqrt{n^2 - 2(Z\alpha)^2(n-1)/(1+\gamma)}$ and $\gamma = \sqrt{1-(Z\alpha)^2}$. Let us emphasize once more that the expression in Eq. (274) contains all binding corrections. Expansion of this expression in Zα gives explicitly
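$$\Delta E(nS) = \left[1 + \frac{11n^2+9n-11}{6n^2}(Z\alpha)^2 + \frac{203n^4+225n^3-134n^2-330n+189}{72n^4}(Z\alpha)^4 + \cdots\right]\frac{E_F}{n^3}, \qquad (275)$$
or
$$\Delta E(1S) = \frac{E_F}{\sqrt{1-(Z\alpha)^2}\left(2\sqrt{1-(Z\alpha)^2}-1\right)} = \left[1 + \frac{3}{2}(Z\alpha)^2 + \frac{17}{8}(Z\alpha)^4 + \cdots\right]E_F, \qquad (276)$$
and
$$\Delta E(2S) = \left[1 + \frac{17}{8}(Z\alpha)^2 + \frac{449}{128}(Z\alpha)^4 + \cdots\right]\frac{E_F}{8}. \qquad (277)$$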
Only the first two terms in the series give contributions larger than 1 Hz to the ground state splitting in muonium. As usual in the Coulomb problem, the series for the binding corrections goes over the parameter $(Z\alpha)^2$ without any factors of π in the denominator. This is characteristic of the Coulomb problem and emphasizes the nonradiative nature of the relativistic corrections.

Footnote: The closed expression for an arbitrary n is calculated in formula (6.14a) in [84, p. 471]. While this closed expression is correct, its expansion over Zα, printed in [84] after an equality sign, contains two misprints. Namely, the sign before (Zα) in the square brackets should be changed to the opposite, and the numerical factor inside these brackets should be −2 instead of −1. After these corrections, the expansion in formula (6.14a) in [84] does not contradict the exact expression in the same formula, and it also coincides with the result in [235].
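The Python sketch below compares the closed Breit expression in Eq. (274) with its truncated expansion in Eq. (275). It also checks the statements about the size of the individual terms for the ground state. The value of α is an assumed CODATA-style input; $E_F$ is the Fermi entry of Table 12. The exact-to-series ratio should differ from 1 only at order $(Z\alpha)^6$. The ground-state correction terms of Eq. (276) come out at roughly 356 kHz and 27 Hz, and together they reproduce the 356.201 kHz Breit entry of Table 12.

```python
import math

ALPHA = 1 / 137.035999      # assumed value of the fine-structure constant
E_F_KHZ = 4_459_031.922     # Fermi energy from Table 12


def breit_exact(n: int, za: float) -> float:
    """Eq. (274) divided by E_F: all binding corrections for an nS level."""
    gamma = math.sqrt(1 - za**2)
    big_n = math.sqrt(n**2 - 2 * za**2 * (n - 1) / (1 + gamma))
    numerator = 1 + 2 * math.sqrt(1 - za**2 / big_n**2)
    return numerator / (big_n**3 * gamma * (4 * gamma**2 - 1))


def breit_series(n: int, za: float) -> float:
    """Eq. (275) divided by E_F, truncated after the (Z alpha)^4 term."""
    c2 = (11 * n**2 + 9 * n - 11) / (6 * n**2)
    c4 = (203 * n**4 + 225 * n**3 - 134 * n**2 - 330 * n + 189) / (72 * n**4)
    return (1 + c2 * za**2 + c4 * za**4) / n**3


for n in (1, 2, 3):
    ratio = breit_exact(n, ALPHA) / breit_series(n, ALPHA)
    print(f"n={n}: exact/series - 1 = {ratio - 1:.1e}")

# Ground state: the two correction terms of Eq. (276), converted to Hz
term2_hz = 1.5 * ALPHA**2 * E_F_KHZ * 1e3
term4_hz = 17 / 8 * ALPHA**4 * E_F_KHZ * 1e3
breit_1s_khz = (breit_exact(1, ALPHA) - 1) * E_F_KHZ
print(f"(3/2)(Za)^2 E_F = {term2_hz:.0f} Hz, (17/8)(Za)^4 E_F = {term4_hz:.1f} Hz")
print(f"Breit correction for 1S: {breit_1s_khz:.3f} kHz")
```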
Fig. 66. Electron anomalous magnetic moment contribution to HFS. Bold dot corresponds to the Pauli form factor.
The sum of the main Fermi contribution and the Breit correction is given in the last line of Table 12. The uncertainty of the main Fermi contribution determines the uncertainty of the theoretical prediction of HFS in the ground state of muonium. This uncertainty is in turn determined by the experimental uncertainty of the electron–muon mass ratio.

11.2. Electron anomalous magnetic moment contributions (corrections of order $\alpha^nE_F$)

Leading radiative corrections to HFS are generated by either the electron or the muon anomalous magnetic moment. The muon anomalous magnetic moment contribution is already taken into account in the expression for the Fermi energy in Eq. (271), and we will not discuss it here. All corrections of order $\alpha^nE_F$ are generated by the electron anomalous magnetic moment insertion in the electron–photon vertex in Fig. 66. These are the simplest of the purely radiative corrections, since they are independent of the binding parameter Zα. The value of the electron anomalous magnetic moment entering the expression for HFS coincides with the one for the free electron. In this situation the contribution to HFS is given by the matrix element of the electron Pauli form factor between wave functions that are products of the Schrödinger–Coulomb wave functions and the free electron spinors. Relativistic Breit corrections may also be trivially included in this calculation by calculating the matrix element between the Dirac–Coulomb wave functions. However, we omit here the Breit correction of order $\alpha(Z\alpha)^2E_F$ to the anomalous moment contribution to HFS, since we take it into account below, together with other corrections of order $\alpha(Z\alpha)^2E_F$. Then the anomalous moment contribution to HFS has the form
$$\Delta E = a_e\,E_F, \qquad (278)$$
where [39,50–52,57,58] (see the analytic expressions above in Eqs. (44), (48) and (55))
a a a #1.1812414562 . a "F (0)" !0.3284789652 C p p 2p
(279)
We have omitted here the higher-order electron-loop contributions, as well as the heavy-particle loop contributions to the electron anomalous magnetic moment (see, e.g., [11]), because the respective corrections to HFS are smaller than 0.001 kHz. Let us note that the electron anomalous magnetic moment contributions to HFS do not introduce any additional uncertainty in the theoretical expression for HFS (see also Table 13).

In analogy with the case of the Lamb shift discussed in Section 3.2, one could expect that the polarization insertion in the one-photon exchange would also generate corrections of order $\alpha E_F$. However, due to the short-distance nature of the main contribution to HFS, the leading small-momentum (large-distance) term in the polarization operator expansion does not produce any contribution to HFS. Only the higher-momentum (smaller-distance) part of the polarization operator generates a contribution to HFS, and such a contribution inevitably contains, besides the factor $\alpha$, an extra binding factor $Z\alpha$. This contribution will be discussed in the next section.
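Table 13
Electron AMM contributions

Contribution | $\Delta E/E_F$ | kHz
Schwinger [39] | $\frac{\alpha}{2\pi}$ | 5178.763
Petermann [51], Sommerfield [52] | $\left[\frac{3}{4}\zeta(3)-\frac{\pi^2}{2}\ln 2+\frac{\pi^2}{12}+\frac{197}{144}\right]\left(\frac{\alpha}{\pi}\right)^2 \approx -0.3284789652\left(\frac{\alpha}{\pi}\right)^2$ | -7.903
Kinoshita [57], Laporta and Remiddi [58] | $\left[\frac{83}{72}\pi^2\zeta(3)-\frac{215}{24}\zeta(5)+\frac{100}{3}\left(\sum_{n=1}^{\infty}\frac{1}{2^n n^4}+\frac{1}{24}\ln^4 2-\frac{\pi^2}{24}\ln^2 2\right)-\frac{239}{2160}\pi^4+\frac{139}{18}\zeta(3)-\frac{298}{9}\pi^2\ln 2+\frac{17101}{810}\pi^2+\frac{28259}{5184}\right]\left(\frac{\alpha}{\pi}\right)^3 \approx 1.1812414562\left(\frac{\alpha}{\pi}\right)^3$ | 0.066
Total electron AMM contribution | | 5170.926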
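As a numerical cross-check of Table 13, the sketch below evaluates the closed-form coefficients of $(\alpha/\pi)^2$ and $(\alpha/\pi)^3$ and converts each row to kHz. Two inputs are assumptions not stated in this section: the value $\alpha^{-1}=137.035999$, and $E_F$ itself, which is back-solved here from the Schwinger row ($\frac{\alpha}{2\pi}E_F = 5178.763$ kHz) rather than taken from Table 12. The zeta values are hard-coded and $\sum_n 1/(2^n n^4)$ is summed directly.

```python
import math

ALPHA = 1 / 137.035999          # assumption: fine-structure constant
ZETA3 = 1.2020569031595942      # Riemann zeta(3)
ZETA5 = 1.0369277551433699      # Riemann zeta(5)
LN2 = math.log(2)
PI2 = math.pi ** 2


def li4_half(terms=60):
    """sum_{n>=1} 1 / (2^n n^4); 60 terms are far beyond double precision."""
    return sum(1.0 / (2.0 ** n * n ** 4) for n in range(1, terms + 1))


def two_loop_coeff():
    """Coefficient of (alpha/pi)^2 in a_e (Petermann, Sommerfield)."""
    return 0.75 * ZETA3 - PI2 / 2 * LN2 + PI2 / 12 + 197 / 144


def three_loop_coeff():
    """Coefficient of (alpha/pi)^3 in a_e (Laporta and Remiddi closed form)."""
    bracket = li4_half() + LN2 ** 4 / 24 - PI2 * LN2 ** 2 / 24
    return (83 / 72 * PI2 * ZETA3 - 215 / 24 * ZETA5 + 100 / 3 * bracket
            - 239 / 2160 * PI2 ** 2 + 139 / 18 * ZETA3
            - 298 / 9 * PI2 * LN2 + 17101 / 810 * PI2 + 28259 / 5184)


# E_F is back-solved from the Schwinger row: (alpha / 2 pi) E_F = 5178.763 kHz.
E_F_KHZ = 5178.763 / (ALPHA / (2 * math.pi))
X = ALPHA / math.pi

rows_khz = {
    "Schwinger": X / 2 * E_F_KHZ,
    "two-loop": two_loop_coeff() * X ** 2 * E_F_KHZ,
    "three-loop": three_loop_coeff() * X ** 3 * E_F_KHZ,
}
total_khz = sum(rows_khz.values())

print(f"two-loop coefficient   = {two_loop_coeff():.10f}")
print(f"three-loop coefficient = {three_loop_coeff():.10f}")
for name, value in rows_khz.items():
    print(f"{name:>10}: {value:10.3f} kHz")
print(f"{'total':>10}: {total_khz:10.3f} kHz")
```

The kHz values agree with the table to within rounding of its last digit; the small residual differences come from the assumed $\alpha$ and the back-solved $E_F$.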
Fig. 67. Skeleton two-photon diagram for HFS in the external field approximation.
11.3. Radiative corrections of order $\alpha^n(Z\alpha)E_F$

11.3.1. Corrections of order $\alpha(Z\alpha)E_F$

Nontrivial interplay between radiative corrections and binding effects first arises in the calculation of the combined expansion over $\alpha$ and $Z\alpha$. The simplest contribution of this type is of order $\alpha(Z\alpha)E_F$ and was calculated a long time ago in classical papers [236,64,237]. The crucial observation, which greatly facilitates the calculations, is that the scattering approximation (skeleton integral approach) is adequate for the calculation of these corrections (see, e.g., a detailed proof in [129]). As in the case of the radiative corrections to the Lamb shift discussed in Section 4.3.1, radiative corrections to HFS of order $\alpha(Z\alpha)E_F$ are given by the matrix elements of the diagrams with all external electron lines on the mass shell, calculated between free electron spinors. The external spinors should be projected on the respective spin states and multiplied by the square of the Schrödinger–Coulomb wave function at the origin. One may easily understand the physical reasons behind this recipe. Radiative insertions in the skeleton two-photon diagrams in Fig. 67 suppress low integration momenta (of atomic order $mZ\alpha$) in the exchange loops, and the effective
loop integration momenta are of order $m$. Taking into account that the external lines are off the mass shell would produce an additional factor $Z\alpha$ and thus generate a higher-order correction. Let us note that the suppression of low intermediate momenta in the loops takes place only for gauge invariant sets of radiative insertions, and does not happen for all individual diagrams in an arbitrary gauge. Only in the Yennie gauge is the infrared behavior of individual diagrams not worse than the infrared behavior of their gauge invariant sums. Hence, use of the Yennie gauge greatly facilitates the proof of the validity of the skeleton diagram approach [129]. In other gauges the individual diagrams with on-mass-shell external lines often contain apparent infrared divergences, and an intermediate infrared regularization (e.g., with the help of an infrared mass of the radiative photons) is necessary. Due to the above-mentioned theorem about the infrared behavior of the complete gauge invariant set of diagrams, the auxiliary infrared regularization may safely be lifted after calculation of the sum of all contributions. Cancellation of the infrared divergent terms may be used as an additional test of the correctness of the calculations. The contribution to HFS induced by the skeleton diagram with two external photons in Fig. 67 is given by the infrared divergent integral
$$\Delta E = \frac{8Z\alpha}{\pi n^3}E_F\int_0^\infty\frac{dk}{k^2}. \tag{280}$$

Insertion in the integrand of the factor $F(k)$, which describes radiative corrections, turns the infrared divergent skeleton integral into a convergent one. Hence, the problem of calculating contributions of order $\alpha(Z\alpha)E_F$ to HFS reduces to the problem of calculating the electron factor describing radiative insertions in the electron line. Calculation of the radiative corrections induced by the polarization insertions in the external photons is straightforward, since the explicit expression for the polarization operator is well known.

11.3.1.1. Correction induced by the radiative insertions in the electron line. To calculate the contribution to HFS of order $\alpha(Z\alpha)E_F$ induced by the one-loop radiative insertions in the electron line in Fig. 68, we have to substitute the gauge invariant electron factor $F(k)$ in the integrand of Eq. (280). This electron factor is equal to the one-loop correction to the amplitude of forward Compton scattering in Fig. 69. Due to the absence of bremsstrahlung in forward scattering, the electron factor is infrared finite. Convergence of the integral for the contribution to HFS is determined by the asymptotic behavior of the electron factor at small and large momenta. The ultraviolet (with respect to the large momenta of the external photons) asymptotics of the electron factor is proportional to the ultraviolet asymptotics of the skeleton graph for the Compton amplitude. This may easily be understood in the Landau gauge, where none of the individual radiative insertions in the electron line contains logarithmic enhancements [20]. One may also prove the absence of logarithmic enhancement with the help of the Ward identities. This means that insertion of the electron factor in Eq. (280) does not spoil the ultraviolet convergence of the integral. More interesting is the low
Fig. 68. Diagrams with radiative insertions in the electron line.
Fig. 69. One-loop electron factor.
momentum behavior of the electron factor. Due to the generalized low-energy theorem for Compton scattering (see, e.g., [129]), the electron factor has a pole at small momenta, and the residue at this pole is completely determined by the one-loop anomalous magnetic moment. Hence, naive substitution of the electron factor in Eq. (280) (see Eq. (281) below) would produce a linearly infrared divergent contribution to HFS. This infrared divergence of the anomalous magnetic moment contribution should be expected. As discussed in the previous section, the contribution connected with the anomalous magnetic moment does not contain the extra factor $Z\alpha$ present in our skeleton diagram. This contribution is generated by the region of small (atomic scale) intermediate momenta; the linear divergence would be cut off by the wave function at the scale $k\sim mZ\alpha$ and would produce the correction of the previous order in $Z\alpha$ induced by the electron anomalous magnetic moment, already considered above. Hence, to obtain corrections of order $\alpha(Z\alpha)E_F$ we simply have to subtract from the electron factor its part generated by the anomalous magnetic moment. This subtraction reduces to subtraction of the leading pole term in the infrared asymptotics of the electron factor. A closed analytic expression for this subtracted electron factor as a function of momentum $k$ was obtained in [238]. This electron factor was normalized according to the relationship a 1 P F(k) . k 2p
(281)
The subtracted electron factor generates a finite radiative correction after substitution in the integral in Eq. (280). The contribution to HFS is equal to

$$\Delta E = \frac{4Z\alpha}{\pi n^3}E_F\int_0^\infty dk\,F(k) = \left(\ln 2-\frac{13}{4}\right)\alpha(Z\alpha)E_F, \tag{282}$$

and was first obtained in a different way in [236,64,237]. This expression should be compared with the correction of order $\alpha(Z\alpha)^5m$ to the Lamb shift in Eq. (67). Both expressions have the same physical origin: they correspond to the radiative insertions in the diagrams with two external photons, may be calculated in the skeleton diagram approach, and do not contain a factor $\pi$ in the denominator. The reasons for its absence were discussed at the end of Section 4.3.1.

11.3.1.2. Correction induced by the polarization insertions in the external photons. The explicit expression for the electron-loop polarization contribution to HFS in Fig. 70 is obtained from the skeleton integral in Eq. (280) by the standard substitution in Eq. (68). One also has to take into account an additional factor 2, which corresponds to the two possible insertions of the polarization operator in either of the external photon lines. The final integral may easily be calculated, and the polarization operator insertion leads to the correction [236,64,237]

$$\Delta E = \frac{16\alpha(Z\alpha)}{\pi^2 n^3}E_F\int_0^\infty dk\,I_1(k) = \frac{3}{4}\,\alpha(Z\alpha)E_F. \tag{283}$$
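The value $3/4$ in Eq. (283) can be reproduced numerically. The sketch below assumes the dispersion form $I_1(k)=\int_0^1 dv\,\frac{v^2(1-v^2/3)}{4+(1-v^2)k^2}$ for the one-loop polarization function (its definition sits in an earlier section, not reproduced here), performs the $k$ integration analytically, and does the remaining $v$ integral with Simpson's rule after the substitution $v=\sin\theta$. It then converts Eqs. (282) and (283) to kHz for $Z=1$, again assuming $\alpha^{-1}=137.035999$ and back-solving $E_F$ from the Schwinger row of Table 13.

```python
import math

ALPHA = 1 / 137.035999                         # assumption: fine-structure constant
E_F_KHZ = 5178.763 / (ALPHA / (2 * math.pi))   # back-solved from Table 13 (Schwinger row)


def simpson(f, a, b, n=400):
    """Composite Simpson rule; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    total += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    total += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return total * h / 3


def integral_of_i1():
    """int_0^inf dk I_1(k) for the assumed dispersion form of I_1(k).

    For fixed v, int_0^inf dk / (4 + c^2 k^2) = pi / (4 c) with c = sqrt(1 - v^2).
    With v = sin(t), dv = cos(t) dt cancels 1/c, leaving a smooth integrand on [0, pi/2].
    """
    def integrand(t):
        v = math.sin(t)
        return v * v * (1 - v * v / 3) * math.pi / 4
    return simpson(integrand, 0.0, math.pi / 2)


line_coeff = math.log(2) - 13 / 4                    # Eq. (282)
pol_coeff = 16 / math.pi ** 2 * integral_of_i1()     # Eq. (283), n = 1
scale_khz = ALPHA ** 2 * E_F_KHZ                     # alpha (Z alpha) E_F with Z = 1

print(f"Eq. (283) coefficient = {pol_coeff:.9f}")
print(f"electron line (282)   = {line_coeff * scale_khz:9.3f} kHz")
print(f"polarization (283)    = {pol_coeff * scale_khz:9.3f} kHz")
print(f"sum of coefficients   = {line_coeff + pol_coeff:.6f}")
```

The two coefficients add up to $\ln 2 - 5/2$, the classical combined result of [236,64,237].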
Fig. 70. Diagrams with electron-loop polarization insertions in the external photon lines.
Fig. 71. Skeleton two-photon diagrams for HFS.
Fig. 72. Diagrams with muon-loop polarization insertions in the external photon lines.
There is one subtlety in this result which should be addressed here. The skeleton integral in Eq. (280) may be understood as the heavy muon pole contribution in the diagrams with two exchanged photons in Fig. 71. In such a case an exact calculation produces an extra factor $1/(1+m/M)$ before the skeleton integral in Eq. (280). We have considered above only the nonrecoil contributions, and so we have ignored an extra factor of order $m/M$, keeping in mind that it would be considered together with other recoil corrections of order $\alpha(Z\alpha)E_F$. This strategy is well suited for consideration of the recoil and nonrecoil corrections generated by the electron factor, but it is less convenient in the case of the polarization insertion. In the case of the polarization insertions the calculations may be simplified by simultaneous consideration of the insertions of both the electron and the muon polarization loops [239,240]. In such an approach one explicitly takes into account the internal symmetry of the problem with respect to both particles. So, let us preserve the factor $1/(1+m/M)$ in Eq. (280), even in the calculation of the nonrecoil polarization operator contribution. Then we obtain an extra factor $m_r/m$ on the right-hand side of Eq. (283). To facilitate further recoil calculations we could simply declare that the polarization operator contribution with this extra factor $m_r/m$ is the result of the nonrecoil calculation, but there exists a better choice. Insertion in the external photon lines of the polarization loop of a heavy particle with mass $M$ generates a correction to HFS suppressed by an extra recoil factor $m/M$ in comparison with the electron loop contribution. Corrections induced by such heavy-particle polarization loop insertions clearly should be discussed together with other radiative-recoil corrections. However, as was first observed in [239,240], the muon loop plays a special role. Its contribution to HFS differs from the result in Eq. (283) by an extra recoil factor $m_r/M$, and, hence, the sum of the electron loop contribution in Fig. 70 (with the extra factor $m_r/m$ taken into account) and of the muon loop contribution in Fig. 72 is exactly equal to the result in Eq. (283), which we will call below the nonrecoil polarization operator contribution. We have considered this cancellation of part of the radiative-recoil correction here in order to facilitate the consideration of the total radiative-recoil correction generated by the polarization operator insertions below. Let us emphasize that there was no need to restore the factor $1/(1+m/M)$ in the consideration of the electron-line radiative corrections, since in the analytic calculation of the respective radiative-recoil corrections discussed below we do not use any subtractions and recalculate the nonrecoil part of these corrections explicitly. Recoil corrections induced by the polarization loops containing other heavy particles will be considered below in Section 12.2, together with other radiative-recoil corrections.
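In terms of the reduced mass $m_r = mM/(m+M)$ (assumed here to be the meaning of $m_r$, which is defined elsewhere in the review), the cancellation described above is a one-line identity: the factor restored in Eq. (280) equals $m_r/m$, and adding the muon-loop factor $m_r/M$ gives exactly one.

$$\frac{1}{1+m/M}=\frac{M}{m+M}=\frac{m_r}{m},\qquad \frac{m_r}{m}+\frac{m_r}{M}=m_r\,\frac{M+m}{mM}=1.$$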
Fig. 73. Six gauge invariant sets of diagrams for corrections of order $\alpha^2(Z\alpha)E_F$.
11.3.2. Corrections of order $\alpha^2(Z\alpha)E_F$

Calculation of the corrections of order $\alpha^2(Z\alpha)E_F$ goes in principle along the same lines as the calculation of the corrections of the previous order in $\alpha$ in the preceding section. Once again the scattering approximation is adequate for the calculation of these corrections. There exist six gauge invariant sets of graphs in Fig. 73 which produce corrections of order $\alpha^2(Z\alpha)/\pi\,E_F$ to HFS [238]. The respective contributions once again may be calculated with the help of the skeleton integral in Eq. (280) [238,241,242]. Some of the diagrams in Fig. 73 also generate corrections of the previous order in $Z\alpha$, which would naively induce infrared divergent contributions after substitution in the skeleton integral in Eq. (280). The physical nature of these contributions is quite transparent. They correspond to the anomalous magnetic moment which is hidden in the two-loop electron factor. The true order in $Z\alpha$ of these anomalous magnetic moment contributions is lower than their apparent order, and they should be subtracted from the electron factor prior to the calculation of the contributions to HFS. We have already encountered a similar situation above in the case of the correction of order $\alpha(Z\alpha)E_F$
induced by the electron factor, and the remedy is the same. Let us mention that the analogous problem was also discussed in connection with the Lamb shift calculations in Section 4.3.1. Technically, the lower-order contributions to HFS are produced by the constant terms in the low-frequency asymptotic expansion of the electron factor. These lower-order contributions are connected with integration over external photon momenta of the characteristic atomic scale $mZ\alpha$, and the approximation based on the skeleton integral in Eq. (280) is inadequate for their calculation. In the skeleton integral approach these previous-order contributions arise as infrared divergences induced by the low-frequency terms in the electron factors. We subtract the leading low-frequency terms in the low-frequency asymptotic expansions of the electron factors when necessary, and thus get rid of the previous-order contributions. Let us discuss in more detail the calculation of the different contributions of order $\alpha^2(Z\alpha)E_F$. The reader may notice that the discussion below is quite similar to the discussion of the corrections of order $\alpha^2(Z\alpha)^5m$ to the Lamb shift in Section 4.3.3.

11.3.2.1. One-loop polarization insertions in the external photons. The simplest correction is induced by the diagrams in Fig. 73(a) with two insertions of the one-loop vacuum polarization in the external photon lines. The respective contribution to HFS is obtained from the skeleton integral in Eq. (280) by the substitution of the polarization operator squared
$$\frac{1}{k^2}\to\left(\frac{\alpha}{\pi}\right)^2 k^2 I_1^2(k). \tag{284}$$
Taking into account the multiplicity factor 3 one easily obtains [238]
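$$\Delta E = \frac{24Z\alpha}{\pi n^3}\left(\frac{\alpha}{\pi}\right)^2E_F\int_0^\infty dk\,k^2I_1^2(k) = \frac{36}{35}\,\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F. \tag{285}$$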
11.3.2.2. Insertions of the irreducible two-loop polarization in the external photons. Expression for the two-loop vacuum polarization contribution to HFS in Fig. 73(b) is obtained from the skeleton integral in Eq. (280) by the substitution
$$\frac{1}{k^2}\to\left(\frac{\alpha}{\pi}\right)^2 I_2(k). \tag{286}$$
With account of the multiplicity factor 2 one obtains [238]
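$$\Delta E = \frac{16Z\alpha}{\pi n^3}\left(\frac{\alpha}{\pi}\right)^2E_F\int_0^\infty dk\,I_2(k) = \left(\frac{224}{15}\ln 2-\frac{38}{15}\pi-\frac{118}{225}\right)\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F \approx 1.8679\,\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F. \tag{287}$$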
11.3.2.3. Insertion of the one-loop electron factor in the electron line and of the one-loop polarization in the external photons. The next correction of order $\alpha^2(Z\alpha)E_F$ is generated by the gauge invariant set of diagrams in Fig. 73(c). The respective analytic expression is obtained from the skeleton integral by simultaneous insertion in the integrand of the one-loop polarization function $I_1(k)$ and of the electron factor $F(k)$.
Then taking into account the multiplicity factor 2 corresponding to two possible insertions of the one-loop polarization, one obtains
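$$\Delta E = \frac{8Z\alpha}{\pi n^3}\,\frac{\alpha}{\pi}E_F\int_0^\infty dk\,k^2F(k)I_1(k) = \left[-\frac{4}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{20}{9}\sqrt5\ln\frac{1+\sqrt5}{2}-\frac{64}{45}\ln 2+\frac{\pi^2}{9}+\frac{1043}{675}\right]\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F. \tag{288}$$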
We used the subtracted electron factor in Eq. (288). However, it is easy to see that the one-loop anomalous magnetic moment term in the electron factor generates a correction of order $\alpha^2(Z\alpha)E_F$ in the diagrams in the figure, and it also should be taken into account. An easy direct calculation of the anomalous magnetic moment contribution leads to the correction

$$\Delta E = \frac{3}{8}\,\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F, \tag{289}$$
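which may also be obtained by multiplying the result in Eq. (283) by the one-loop anomalous magnetic moment $\alpha/(2\pi)$. Hence, the total correction of order $\alpha^2(Z\alpha)E_F$ generated by the diagrams in Fig. 73(c) is equal to

$$\Delta E = \frac{8Z\alpha}{\pi n^3}\,\frac{\alpha}{\pi}E_F\int_0^\infty dk\,k^2F(k)I_1(k) = \left[-\frac{4}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{20}{9}\sqrt5\ln\frac{1+\sqrt5}{2}-\frac{64}{45}\ln 2+\frac{\pi^2}{9}+\frac{3}{8}+\frac{1043}{675}\right]\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F \approx -0.6689\,\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F. \tag{290}$$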
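The bookkeeping between Eqs. (283), (288), (289) and (290) can be made explicit for the ground state. Multiplying Eq. (283) by the one-loop anomalous magnetic moment gives the coefficient $3/8$; evaluating the bracket of Eq. (288) numerically (our own arithmetic) gives about $-1.0439$, so the total for Fig. 73(c) is:

$$\frac{3}{4}\,\alpha(Z\alpha)E_F\cdot\frac{\alpha}{2\pi}=\frac{3}{8}\,\frac{\alpha^2(Z\alpha)}{\pi}E_F,\qquad -1.0439+\frac{3}{8}\approx-0.6689.$$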
11.3.2.4. One-loop polarization insertions in the radiative electron factor. This correction is induced by the gauge invariant set of diagrams in Fig. 73(d) with the polarization operator insertions in the radiative photon. The two-loop anomalous magnetic moment generates a correction of order $\alpha^2E_F$ to HFS, and the respective leading pole term in the infrared asymptotics of the electron factor should be subtracted to avoid infrared divergence and double counting. The subtracted radiatively corrected electron factor may be obtained from the subtracted one-loop electron factor in Eq. (281). To this end, one should restore the radiative photon mass in the one-loop electron factor; the polarization operator insertion in the photon line is then taken into account with the help of a dispersion integral like the one in Eq. (77) for the spin-independent electron factor. In terms of the electron factor $F(k,\lambda)$ with a massive radiative photon of mass $\lambda=2/\sqrt{1-v^2}$, the contribution to HFS has the form [241]
$$\Delta E = \frac{4\alpha(Z\alpha)}{\pi^2 n^3}E_F\int_0^\infty dk\int_0^1 dv\,\frac{v^2(1-v^2/3)}{1-v^2}\,F(k,\lambda). \tag{291}$$
This integral was analytically simplified to a one-dimensional integral of a complete elliptic integral, which admits numerical evaluation with arbitrary precision [241]:

$$\Delta E = -0.3107422\,\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F. \tag{292}$$
11.3.2.5. Light-by-light scattering insertions in the external photons. The diagrams in Fig. 73(e) with the light-by-light scattering insertions in the external photons do not generate corrections of the previous order in $Z\alpha$. As is well known, the light-by-light scattering diagrams are apparently logarithmically ultraviolet divergent, but due to gauge invariance they are in fact ultraviolet convergent. Also, as a result of gauge invariance, the light-by-light scattering tensor is strongly suppressed at small momenta of the external photons. The contribution to HFS can easily be expressed in terms of a weighted integral of the light-by-light scattering tensor [242], and further calculations are in principle quite straightforward, though technically involved. This integral was analytically simplified to a three-dimensional integral which may be calculated with high accuracy [242]:

$$\Delta E = -0.472514(1)\,\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F. \tag{293}$$
The original result in [242] differed from the one in Eq. (293) by two percent. A later purely numerical calculation of the light-by-light contribution in [243,18] produced a less precise result which, however, differed from the original result in [242] by two percent. After a thorough check of the calculations in [242], a minor arithmetic mistake in one of the intermediate expressions in the original version of [242] was discovered. After correction of this mistake, the semianalytic calculations in [242] lead to the result in Eq. (293), in excellent agreement with the somewhat less precise purely numerical result in [243,18].

11.3.2.6. Diagrams with insertions of two radiative photons in the electron line. By far the most difficult task in the calculation of corrections of order $\alpha^2(Z\alpha)E_F$ to HFS is connected with the last gauge invariant set of diagrams in Fig. 73(f), which consists of nineteen topologically different diagrams [238] presented in Fig. 74 (compare a similar set of diagrams in Fig. 21 in the case of the Lamb shift). These nineteen graphs may be obtained from the three graphs for the two-loop electron self-energy by insertion of two external photons in all possible ways. The graphs in Fig. 74(a)–(c) are obtained from the two-loop reducible electron self-energy diagram, the graphs in Fig. 74(d)–(k) are the result of all possible insertions of two external photons in the rainbow self-energy diagram, and the diagrams in Fig. 74(l)–(s) are connected with the overlapping two-loop self-energy graph. Calculation of the respective contribution to HFS in the skeleton integral approach was initiated in [77,78], where the contributions induced by the diagrams in Fig. 74(a)–(h) and Fig. 74(l) were obtained. In order to avoid spurious infrared divergences in the individual diagrams, the semianalytic calculations in [77,78] were performed in the Yennie gauge.
The diagrams under consideration contain anomalous magnetic moment contributions which were subtracted before taking the scattering approximation integrals.
Fig. 74. Nineteen topologically different diagrams with insertions of two radiative photons in the electron line.
The total contribution of all nineteen diagrams to HFS was first calculated purely numerically in the Feynman gauge in the NRQED framework in [243,18]. The semianalytic skeleton integral calculation in the Yennie gauge was completed somewhat later in [80,62]:

$$\Delta E = -0.6726(4)\,\frac{\alpha^2(Z\alpha)}{\pi n^3}E_F. \tag{294}$$
This semianalytic result is consistent with the purely numerical result in [243,18] but is more than an order of magnitude more precise. It is remarkable that the results of two complicated calculations performed in completely different approaches turned out to be in excellent agreement.

11.3.2.7. Total correction of order $\alpha^2(Z\alpha)E_F$. The total contribution of order $\alpha^2(Z\alpha)E_F$ is given by the sum of the contributions in Eqs. (285), (287), (290) and (292)–(294):
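$$\Delta E = \left[-\frac{4}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{20}{9}\sqrt5\ln\frac{1+\sqrt5}{2}+\frac{608}{45}\ln 2+\frac{\pi^2}{9}-\frac{38}{15}\pi+\frac{91639}{37800}-0.310742-0.472514(1)-0.6726(4)\right]\frac{\alpha^2(Z\alpha)}{\pi}E_F \approx 0.7717(4)\,\frac{\alpha^2(Z\alpha)}{\pi}E_F, \tag{295}$$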
or numerically

$$\Delta E = 0.4256(2)\ \text{kHz}. \tag{296}$$
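The sum in Eq. (295) and its conversion to kHz in Eq. (296) can be reproduced with a few lines of arithmetic. The sketch below assumes $\alpha^{-1}=137.035999$ and $Z=1$, back-solves $E_F$ from the Schwinger row of Table 13, and uses only the central values of the numerical entries (their quoted uncertainties are ignored). Each term is also converted to kHz for comparison with Table 14.

```python
import math

ALPHA = 1 / 137.035999                         # assumption: fine-structure constant
E_F_KHZ = 5178.763 / (ALPHA / (2 * math.pi))   # back-solved from Table 13 (Schwinger row)
LN2 = math.log(2)
SQRT5 = math.sqrt(5)
LN_PHI = math.log((1 + SQRT5) / 2)             # ln of the golden ratio (1 + sqrt 5)/2

# Coefficients of alpha^2 (Z alpha) / pi * E_F (ground state), Eqs. (285)-(294);
# numerical entries use central values only.
contributions = {
    "one-loop polarization (285)": 36 / 35,
    "two-loop polarization (287)": 224 / 15 * LN2 - 38 / 15 * math.pi - 118 / 225,
    "polarization + electron factor (290)": (
        -4 / 3 * LN_PHI ** 2 - 20 / 9 * SQRT5 * LN_PHI - 64 / 45 * LN2
        + math.pi ** 2 / 9 + 3 / 8 + 1043 / 675
    ),
    "polarization in electron factor (292)": -0.3107422,
    "light-by-light (293)": -0.472514,
    "two radiative photons (294)": -0.6726,
}

scale_khz = ALPHA ** 3 / math.pi * E_F_KHZ     # alpha^2 (Z alpha) / pi * E_F with Z = 1
total = sum(contributions.values())

for name, coeff in contributions.items():
    print(f"{name:40s} {coeff:+.4f}  {coeff * scale_khz:+.3f} kHz")
print(f"{'total, Eq. (295)':40s} {total:+.4f}  {total * scale_khz:+.4f} kHz")
```

The rational constants $\frac{36}{35}-\frac{118}{225}+\frac{3}{8}+\frac{1043}{675}$ combine to the $\frac{91639}{37800}$ that appears in Eq. (295).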
As we have already mentioned, consistent results for this correction were obtained independently in different approaches by two groups in [238,241,242,77,78,80,62,243,18].

11.3.3. Corrections of order $\alpha^3(Z\alpha)E_F$

The corrections of order $\alpha^3(Z\alpha)E_F$ were never considered in the literature. Their natural scale is determined by the factor $\alpha^3(Z\alpha)/\pi^3\,E_F$, which is about $10^{-3}$ kHz. Hence, this correction is too small to be of any interest now. Note that the uncertainty of the total contribution in the last line of Table 14 is determined not by the uncertainties of any of the entries in the upper lines of this table, which are too small, but just by the uncalculated contribution of order $\alpha^3(Z\alpha)E_F$.

11.4. Radiative corrections of order $\alpha^n(Z\alpha)^2E_F$

11.4.1. Corrections of order $\alpha(Z\alpha)^2E_F$

11.4.1.1. Electron-line logarithmic contributions. Binding effects are crucial in the calculation of the corrections of order $\alpha(Z\alpha)^2E_F$. Unlike the corrections of the first order in the binding parameter $Z\alpha$, in this case the exchanged photon loops with low (of order $mZ\alpha$) exchanged momenta give a significant contribution, and the external wave functions at the origin do not factorize as in the scattering approximation. The anticipated low-momentum logarithmic divergence in the loop integration is cut off by the wave functions at the atomic scale and, hence, the contribution of order $\alpha(Z\alpha)^2E_F$ is enhanced by the low-frequency logarithmic terms $\ln^2 Z\alpha$ and $\ln Z\alpha$. The situation for this calculation resembles the case of the main contribution to the Lamb shift of order $\alpha(Z\alpha)^4m$, and also that of the corrections to the Lamb shift of order $\alpha(Z\alpha)^6m$. Once again, the factors before the logarithmic terms originate from the electron form factor and from the logarithmic integration over the loop momenta in the diagrams with three exchanged photons. The leading logarithm squared term is state independent and can easily be calculated by the same methods as the leading logarithm cube contribution of order $\alpha^2(Z\alpha)^6m$ to the Lamb shift (see Section 4.4.2.1).
The only difference is that this time it is necessary to take as one of the perturbation potentials the potential responsible for the main Fermi contribution to HFS,

$$V_F = \frac{8}{3}\,\frac{\pi Z\alpha}{mM}\,(1+a_\mu). \tag{297}$$
This calculation of the leading logarithm squared term [110] (see Fig. 75) also produces a recoil correction to the nonrecoil logarithm squared contribution. We will discuss this radiative-recoil correction below in Section 12.2.4, together with other radiative-recoil corrections, and we consider in this section only the nonrecoil part of the logarithm squared term. All logarithmic terms for the 1S state were originally calculated in [244,245]. The author of [244] also calculated the logarithmic contribution to the 2S hyperfine splitting:
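$$\Delta E(1S) = \left[-\frac{2}{3}\ln^2(Z\alpha)^{-2}-\left(\frac{8}{3}\ln 2-\frac{37}{72}\right)\ln(Z\alpha)^{-2}\right]\frac{\alpha(Z\alpha)^2}{\pi}E_F, \tag{298}$$

$$\Delta E(2S) = \left[-\frac{2}{3}\ln^2(Z\alpha)^{-2}-\left(\frac{16}{3}\ln 2-4-\frac{1}{72}\right)\ln(Z\alpha)^{-2}\right]\frac{\alpha(Z\alpha)^2}{8\pi}E_F. \tag{299}$$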
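The statement below that the logarithm squared terms cancel in $\Delta E(1S)-8\Delta E(2S)$ follows directly from Eqs. (298) and (299). The sketch evaluates that difference coefficient by coefficient, using exact fractions for the $\ln^2$ term. It is only an arithmetic check of the formulas as written above; the resulting single-logarithm coefficient $\frac{8}{3}\ln 2-\frac{7}{2}$ is our own simplification.

```python
import math
from fractions import Fraction

LN2 = math.log(2)

# Coefficients of ln^2 (Z alpha)^{-2} (exact) and ln (Z alpha)^{-2} (float),
# in units of alpha (Z alpha)^2 / pi * E_F, read off Eqs. (298) and (299).
sq_1s = Fraction(-2, 3)
lin_1s = 37 / 72 - 8 / 3 * LN2

# Eq. (299) carries an extra overall factor 1/8 (prefactor alpha (Z alpha)^2 / (8 pi)).
sq_2s = Fraction(-2, 3) / 8
lin_2s = (4 + 1 / 72 - 16 / 3 * LN2) / 8

# Difference Delta E(1S) - 8 Delta E(2S), coefficient by coefficient.
diff_sq = sq_1s - 8 * sq_2s
diff_lin = lin_1s - 8 * lin_2s

print("ln^2 coefficient:", diff_sq)
print(f"ln coefficient:   {diff_lin:.6f}  (8/3 ln 2 - 7/2 = {8 / 3 * LN2 - 3.5:.6f})")
```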
Table 14
Radiative corrections of order $\alpha^n(Z\alpha)E_F$

Contribution | $\Delta E/(\alpha(Z\alpha)E_F)$ | kHz
Electron-line insertions: Kroll and Pollock [236], Karplus et al. [64,237] | $\ln 2-\frac{13}{4}$ | -607.123
Polarization insertion: Kroll and Pollock [236], Karplus et al. [64,237] | $\frac{3}{4}$ | 178.087
One-loop polarization: Eides et al. [238] | $\frac{36}{35}\,\frac{\alpha}{\pi}$ | 0.567
Two-loop polarization: Eides et al. [238] | $\left(\frac{224}{15}\ln 2-\frac{38}{15}\pi-\frac{118}{225}\right)\frac{\alpha}{\pi}$ | 1.030
One-loop polarization and electron factor: Eides et al. [238] | $\left[-\frac{4}{3}\ln^2\frac{1+\sqrt5}{2}-\frac{20}{9}\sqrt5\ln\frac{1+\sqrt5}{2}-\frac{64}{45}\ln 2+\frac{\pi^2}{9}+\frac{3}{8}+\frac{1043}{675}\right]\frac{\alpha}{\pi}$ | -0.369
Polarization insertion in the electron factor: Eides et al. [241] | $-0.3107422\,\frac{\alpha}{\pi}$ | -0.171
Light-by-light scattering: Eides et al. [242], Kinoshita and Nio [243,18] | $-0.472514(1)\,\frac{\alpha}{\pi}$ | -0.261
Insertions of two radiative photons in the electron line: Kinoshita and Nio [243,18], Eides and Shelyuto [80,62] | $-0.6726(4)\,\frac{\alpha}{\pi}$ | -0.371
Total correction of order $\alpha^n(Z\alpha)E_F$ | | -428.611(1)
Fig. 75. Leading logarithm squared contribution of order $\alpha(Z\alpha)^2E_F$ to HFS.
11.4.1.2. Nonlogarithmic electron-line corrections. Calculation of the nonlogarithmic part of the contribution of order $\alpha(Z\alpha)^2E_F$ is a more complicated task than obtaining the leading logarithmic terms. The short-distance leading logarithm squared contributions cancel in the difference $\Delta E(1S)-8\Delta E(2S)$, and it is this difference, containing both logarithmic and nonlogarithmic
Fig. 76. Leading logarithm polarization contribution of order $\alpha(Z\alpha)^2E_F$ to HFS.
Fig. 77. Corrections of order $\alpha(Z\alpha)^2E_F$ induced by the polarization operator.
corrections, which was calculated first in [106]. An estimate of the nonlogarithmic terms for $n=1$ and $n=2$ in accordance with [106] was obtained in [246]. This work also confirmed the results of [244,245] for the logarithmic terms. The first complete calculation of the nonlogarithmic terms was done in a purely numerical approach in [247]. The idea of this calculation is similar to the one used for the calculation of the $\alpha(Z\alpha)^6m$ contribution to the Lamb shift in [87]. In this calculation the electron-line radiative corrections were written in the form of diagrams with the electron propagator in the external field, and then an approximation scheme for the relativistic electron propagator in the Coulomb field was set up with the help of the well-known representation [89,90] for the nonrelativistic propagator in the Coulomb field. In view of the significant theoretical progress achieved recently in the calculation of contributions to HFS of order $\alpha^2(Z\alpha)E_F$, the insufficient accuracy of the calculation in [247] (about 0.2 kHz) became the main source of uncertainty in the theoretical expression for muonium HFS. Two new independent calculations of this correction were performed recently [248,249]. The author of [248] used his approach developed for the calculation of corrections of order $\alpha(Z\alpha)^6m$ to the Lamb shift. The main ideas of this approach were discussed above in Section 4.4.1.2. The calculation in [249] was performed in the completely different framework of nonrelativistic QED (see, e.g., [17,243,18]). The results of both calculations are in remarkable agreement, the result of [248] being equal to

$$\Delta E = 17.122\,\frac{\alpha(Z\alpha)^2}{\pi}E_F, \tag{300}$$

and the result of [249] being

$$\Delta E = 17.1227(11)\,\frac{\alpha(Z\alpha)^2}{\pi}E_F. \tag{301}$$
11.4.1.3. Logarithmic contribution induced by the polarization operator. Calculation of the leading logarithmic contribution induced by the polarization operator insertion (see Fig. 76) proceeds along the same lines as the calculation of the logarithmic polarization operator contribution of order $\alpha(Z\alpha)^6m$ to the Lamb shift (compare Section 4.4.1.3). Again, only the leading term in the small-momentum expansion of the polarization operator (see Eqs. (32) and (94)) produces a contribution of the order under consideration, and only the state-independent logarithmic corrections to the Schrödinger–Coulomb wave function are relevant. The only difference is that the correction to the wave function is produced not by the Darwin term, as in the case of the Lamb shift (compare Eq. (96)), but by the external magnetic moment perturbation in Eq. (297). This correction to the wave function was calculated in [106,250], and with its help one immediately obtains the
state-independent logarithmic contribution of order $\alpha(Z\alpha)^2E_F$ to HFS induced by the polarization operator insertion [244,245]:

$$\Delta E = \frac{4}{15}\ln(Z\alpha)^{-2}\,\frac{\alpha(Z\alpha)^2}{\pi n^3}E_F. \tag{302}$$
11.4.1.4. Nonlogarithmic corrections induced by the polarization operator. Calculation of the nonlogarithmic polarization operator contribution is quite straightforward. One simply has to calculate two terms given by ordinary perturbation theory: one is the matrix element of the radiatively corrected external magnetic field, and the other is the matrix element of the radiatively corrected external Coulomb field between wave functions corrected by the external magnetic field (see Fig. 77). The first calculation of the respective matrix elements was performed in [246]. Quite recently a number of inaccuracies in [246] were uncovered [243,18,249,251–253], and the correct result for the nonlogarithmic contribution of order $\alpha(Z\alpha)^2E_F$ to HFS is given by

$$\Delta E = \left(-\frac{8}{15}\ln 2+\frac{34}{225}\right)\frac{\alpha(Z\alpha)^2}{\pi}E_F. \tag{303}$$
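For the 1S state, the logarithmic and nonlogarithmic pieces of order $\alpha(Z\alpha)^2E_F$ collected so far can be combined. The sketch below is our own bookkeeping, not a result quoted in the text: it adds the single-logarithm coefficients from Eqs. (298) and (302) and the constants from Eqs. (301) and (303), using the central value 17.1227 of Eq. (301).

```python
import math
from fractions import Fraction

LN2 = math.log(2)

# Rational parts of the single-log coefficients (units: alpha (Z alpha)^2 / pi * E_F, 1S state).
electron_line_rational = Fraction(37, 72)   # Eq. (298); its -(8/3) ln 2 part is kept separately
polarization_rational = Fraction(4, 15)     # Eq. (302) with n = 1
single_log_rational = electron_line_rational + polarization_rational
single_log_total = float(single_log_rational) - 8 / 3 * LN2

# Constants: electron line, Eq. (301) (central value), plus polarization, Eq. (303).
constant_total = 17.1227 + (-8 / 15 * LN2 + 34 / 225)

print(f"single-log coefficient: {single_log_rational} - (8/3) ln 2 = {single_log_total:.6f}")
print(f"nonlogarithmic constant: {constant_total:.4f}")
```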
11.4.2. Corrections of order $\alpha^2(Z\alpha)^2E_F$

11.4.2.1. Leading double logarithm corrections. Corrections of order $\alpha^2(Z\alpha)^2E_F$ are again enhanced by a logarithm squared term, and one should expect that they are smaller by the factor $\alpha/\pi$ than the corrections of order $\alpha(Z\alpha)^2E_F$ considered above. Calculation of the leading logarithm squared contribution to HFS may be performed in exactly the same way as the calculation of the leading logarithm cube contribution of order $\alpha^2(Z\alpha)^6m$ to the Lamb shift considered above in Section 4.4.2.1. Both results were originally obtained in one and the same work [110]. The logarithm cube term is missing in the case of HFS, since now at least one of the perturbation operators in Fig. 78 should contain a magnetic exchange photon, and the respective anomalous magnetic moment is infrared finite. It is easy to realize that as the result of this calculation one obtains simply the leading logarithmic contribution of order $\alpha(Z\alpha)^2E_F$ in Eq. (298), multiplied by the electron anomalous magnetic moment $\alpha/(2\pi)$ [110]:

$$\Delta E = -\frac{1}{3}\ln^2(Z\alpha)^{-2}\,\frac{\alpha^2(Z\alpha)^2}{\pi^2}E_F. \tag{304}$$
Numerically this correction is about !0.04 kHz, and this contribution is large enough to justify calculation of the single-logarithmic contributions. 11.4.2.2. Single-logarithmic and nonlogarithmic contributions. Terms linear in the large logarithm were recently calculated in the NRQED framework [15]
Fig. 78. Leading double logarithm contribution of order a(Za)E . $
200
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
*E"
9 3 10 4358 3 3 p p 197 f(3)! p ln 2# p# ! f(3)! ln 2# # 4 2 27 1296 4 4 2 12 144
4 3 1 11 8 ! ln 2! # ! !1# 3 4 4 9 15
ln(Za)\
a(Za) E $ p
a(Za) +!0.6390005442 E . $ p The nonlogarithmic contributions of order a(Za)E are also estimated in [15] $ *E"(10$2.5)
a(Za) E . $ p
(305)
(306)
The numerical uncertainty of the last contribution is about 0.003 kHz. Corrections of order a(Za)E are suppressed by an extra factor a/p in comparison with the $ leading contributions of order a(Za)E and are too small to be of any phenomenological interest $ now. All corrections of order aL(Za)E are collected in Table 15, and their total uncertainty is $ determined by the error of the nonlogarithmic contribution of order a(Za)E . $ 11.5. Radiative corrections of order a(Za)E and of higher orders $ As we have repeatedly observed corrections to the energy levels suppressed by an additional power of the binding parameter Za are usually numerically larger than the corrections suppressed by an additional power of a/p, induced by radiative insertions. In this perspective one could expect that the corrections of order a(Za)E would be numerically larger than considered above $ corrections of order a(Za)E . $ 11.5.1. Corrections of order a(Za)E $ 11.5.1.1. Leading logarithmic contributions induced by the radiative insertions in the electron line. Calculation of the leading logarithmic corrections of order a(Za)E to HFS parallels the $ calculation of the leading logarithmic corrections of order a(Za) to the Lamb shift, described above in Section 4.5.1. Again all leading logarithmic contributions may be calculated with the help of second-order perturbation theory (see Eq. (102)). It is easy to check that the leading contribution is linear in the large logarithm. Due to the presence of the local potential in Eq. (297) which corresponds to the main contribution to HFS, and of the potential
13 pa(Za) 8 a(Za) (1#a ) , < " ln 2! I ). 3 4 mM
(307)
which corresponds to the contribution of order a(Za)E , we now have two combinations of local $ potentials < (Eq. (115)), < (Eq. (297)), and < (Eq. (116)), < (Eq. (307)) in Fig. 79, which generate $ ). logarithmic contributions. This di!ers from the case of the Lamb shift when only one combination
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
201
Table 15 Radiative Corrections of order aL(Za)E $ a(Za) E $ p Logarithmic electron-line contribution Zwanziger [245] Layzer [244] Nonlogarithmic electron-line contribution Pachucki [248] Kinoshita and Nio [249] Logarithmic polarization operator contribution Zwanziger [245] Layzer [244] Nonlogarithmic polarization operator contribution Kinoshita and Nio [243,18,249] Sapirstein [251], Brodsky [252] Schneider et al. [253] Leading logarithmic contribution of order a(Za)E $ Karshenboim [110] Single-logarithmic contributions of order a(Za)E $ Kinoshita and Nio [15]
Nonlogarithmic contributions of order a(Za)E $ Kinoshita and Nio Total correction of order aL(Za)E $
kHz
8 2 37 ! ln(Za)\! ln 2! ln(Za)\ 3 3 72
!42.850
17.1227 (11)
9.444
4 ln(Za)\ 15
1.447
8 34 ! ln 2# 15 225
!0.121
1 a ! ln(Za)\ 3 p
!0.041
9 3 10 4358 f(3)! p ln 2# p# 4 2 27 1296
3 3 p p 197 ! f(3)! ln 2# # 4 4 2 12 144 4 3 1 11 8 ! ln 2! # ! !1# 3 4 4 9 15 (10$2.5)
a p
ln(Za)\
a p
!0.008
0.013 (3) !32.115 (3)
Fig. 79. Leading logarithmic electron-line contributions of order a(Za)E . $
202
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
Fig. 80. Leading logarithmic photon-line contributions of order a(Za)E . $
of local operators was relevant for calculation of the leading logarithmic contribution. An easy calculation produces [254,118]
*E"4
1 11 a(Za) m E , ln 2!1! ln(Za)\ $ 2 m 128 n
(308)
and [118]
1 13 a(Za) *E" ln 2! ln(Za)\ E , $ 2 4 n
(309)
for the "rst and second combinations of the perturbation potentials, respectively. 11.5.1.2. Leading logarithmic contributions induced by the polarization insertions in the external photon lines. The leading logarithmic correction induced by the radiative insertions in the external photon in Fig. 80 is calculated in exactly the same way as was done above in the case of radiative insertions in the electron line. The only di!erence is that instead of the potential < in Eq. (115) we have to use the respective potential in Eq. (119), and instead of the potential < in Eq. (307) we ). have to use the potential pa(Za) < "2a(Za) (1#a ) . ). I mM
(310)
connected with the external photons. Then we immediately obtain [254,118]
5 a(Za) m E , *E"! ln(Za)\ $ 48 m n
(311)
and [118] 3 a(Za) *E" ln(Za)\ E , $ 8 n
(312)
for the "rst and second combinations of the perturbation potentials. 11.5.1.3. Nonlogarithmic contributions of order a(Za)E and of higher orders in Za. The large $ magnitude of the leading logarithmic contributions of order a(Za)E generated by the radiative $ photon insertions in the electron line warrants consideration of the respective nonlogarithmic corrections. Numerical calculations of radiative corrections to HFS generated by insertion of one radiative photon in the electron line without expansion over Za, which are routinely used for the
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
203
Table 16 Radiative Corrections of order a(Za)E $ a(Za)E $ Logarithmic electron-Line contribution Lepage [254], Karshenboim [118] Logarithmic polarization operator contribution Lepage [254], Karshenboim [118] Nonlogarithmic electron-line contribution Blundell et al. [256] Total correction of order a(Za)E $
kHz
5 191 ln 2! ln(Za)\ 2 32
!0.527
13 ln(Za)\ 48
0.034
!3.82 (63)
!0.048 (8) !0.542 (8)
high Z atoms, were recently extended for Z"1. Initial disagreements between the results of [255,256] were resolved in [257], which con"rmed (but with an order of magnitude worse accuracy) the result of [256] *E"!3.82(63)a(Za)E . $
(313)
Let us emphasize that this result describes not only nonlogarithmic corrections of order a(Za)E but also includes contributions of all terms of the form a(Za)LE with n54. The $ $ magnitude of the nonlogarithmic coe$cient in Eq. (313) seems to be quite reasonable qualitatively. Numerically the contribution to HFS in Eq. (313) is about 0.09 of the leading logarithmic contribution. Let us also mention nonlogarithmic contributions induced by the insertions in the external photon. In analogy to the case of the insertions in the electron line it is reasonable to expect that magnitude of these nonlogarithmic terms is less than one tenth of the respective leading logarithmic contribution, and is thus smaller than the uncertainty of the electron line contribution. Numerical calculations without expansion in Za in [257] are consistent with these expectations. Thus all corrections of order a(Za)LE collected in Table 16 are now known with an uncertainty $ of about 0.008 kHz. Of all these corrections only the nonlogarithmic contribution of order a(Za)E was not calculated independently by at least two groups. In view of the complexity of the $ numerical calculations an independent consideration of this correction would be helpful. 11.5.2. Corrections of order a(Za)E and of higher orders in a $ One should expect that corrections of order a(Za)E are suppressed relative to the contribu$ tions of order a(Za)E by the factor a/p. This means that at the present level of experimental $ accuracy one may safely neglect these corrections, as well as corrections of even higher orders in a.
204
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
12. Essentially two-body corrections to HFS 12.1. Recoil corrections to HFS The very presence of the recoil factor m/M emphasizes that the external "eld approach is inadequate for calculation of recoil corrections and, in principle, one needs the complete machinery of the two-particle equation in this case. However, many results may be understood without a cumbersome formalism. Technically, the recoil factor m/M arises because the integration over the exchanged momenta in the diagrams which generate the recoil corrections goes over a large interval up to the muon mass, and not just up to the electron mass, as was the case of the nonrecoil radiative corrections. Due to large intermediate momenta in the general expression for the recoil corrections only the Dirac magnetic moment of the muon factorizes naturally in the general expression for the recoil corrections *E"EI (1#corrections) , $ where
m m 16 chR . EI " Za $ M m 3
(314)
(315)
Here EI does not include, unlike Eq. (271), the muon anomalous magnetic moment a which $ I should now be considered on the same grounds as other corrections to hyper"ne splitting. Nonfactorization of the muon anomalous magnetic moment is a natural consequence of the presence of the large integration region mentioned above. It is worth mentioning that the expression for the Fermi energy EI is symmetric with respect to the light and heavy particles, and $ does not change under exchange of the particles m M. 12.1.1. Leading recoil correction The leading recoil correction of order Za(m/M)EI is generated by the graphs with two ex$ changed photons in Fig. 81, similar to the case of the recoil correction to the Lamb shift of order (Za)(m/M)m considered in Section 5.1. However, calculations in the case of hyper"ne splitting are much simpler in comparison with the Lamb shift, since the region of extreme nonrelativistic exchange momenta m(Za)(k(m(Za) does not generate any correction of order Za(m/M)EI . This is almost obvious in the nonrelativistic perturbation theory framework, which is $ quite su$cient for calculation of all corrections generated at such small momenta. Unlike the case of the Lamb shift the leading contribution which is due to the one-transverse quanta exchange in the nonrelativistic dipole approximation is given by the Breit potential. This contribution is simply
Fig. 81. Diagrams with two-photon exchanges.
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
205
the Fermi energy, and all nonrelativistic corrections to the Fermi energy are suppressed at least by the additional factor (Za). Then the leading recoil correction to hyper"ne splitting may reliably be calculated in the scattering approximation, ignoring even the wave function momenta of order mZa. The formal proof has been given, e.g., in [129], but this may easily be understood at the qualitative level. The skeleton integral is linearly infrared divergent and this divergence has a clear origin since it corresponds to the classical Fermi contribution to HFS. This divergence is produced by the heavy particle pole contribution and after subtraction (note that we e!ectively subtract the skeleton integral in Eq. (280) with restored factor 1/(1#m/M), see discussion in Section 11.3.1.2) we obtain the convergent skeleton integral *E"EI
mM Za dk [ f (k)!f (kk)] , $ M!m 2p k
(316)
where 16 k!4k!32 !k# f (k)" k k(4#k
(317)
and k"m/M. One may easily perform the momentum integration in this infrared "nite integral and obtain [258,95,259] 3mM Za M *E "! ln EI . M!m p m $
(318)
The subtracted heavy pole (Fermi) contribution is generated by the exchange of a photon with a small (atomic scale &mZa) momentum and after subtraction of this contribution only high loop momenta k (m(k(M) contribute to the integral for the recoil correction. Then the exchange loop momenta are comparable to the virtual momenta determining the anomalous magnetic moment of the muon and there are no reasons to expect that the anomalous magnetic moment will enter as a factor in the formula for the recoil corrections. It is clear that the contribution of the muon anomalous magnetic moment in this case cannot be separated from contributions of other radiative-recoil corrections. Let us emphasize that, unlike the other cases in this review where we encountered the logarithmic contributions, the result in Eq. (318) is exact in the sense that this is a complete contribution of order Za(m/M)EI . There are no nonlogarithmic contributions of this order. $ Despite its nonsymmetric appearance the recoil correction in Eq. (318) is symmetric with respect to the electron and muon masses. As in the case of the leading recoil corrections to the Lamb shift coming from one-photon exchanges, this formula generated by the two-photon exchange is exact in the electron}muon mass ratio. This is crucial from the phenomenological point of view, taking into account the large value of the correction under consideration and the high precision of the current experimental results. 12.1.2. Recoil correction of relative order (Za)(m/M) Recoil corrections of relative order (Za)(m/M) are generated by the kernels with three exchanged photons (see Fig. 82). One might expect, similar to the case of the leading recoil correction, emergence of a recoil contribution of order (Za)(m/M)EI logarithmic in the mass ratio. The $
206
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
Fig. 82. Recoil corrections of order (Za)(m/M)E . $
logarithm of the mass ratio could originate only from the integration region m;k;M, where one can safely omit electron masses in the integrand. The integrand simpli"es, and it turns out that despite the fact that the individual diagrams produce logarithmic contributions, these contributions sum to zero [260,261]. It is not di$cult to understand the technical reason for this e!ect which is called the Caswell}Lepage cancellation. For each exchanged diagram there exists a pair diagram where the exchanged photons are attached to the electron line in an opposite order. Respective electron line contributions to the logarithmic integrands generated by these diagrams di!er only by sign [261] and, hence, the total contribution logarithmic in the mass ratio vanishes. This means that all pure recoil corrections of relative order (Za)(m/M) originate from the exchanged momenta of order of the electron mass and smaller. One might think that as a result the muon anomalous magnetic moment would enter the expression for these corrections in a factorized form, and the respective corrections should be written in terms of E and not of EI , as was in the $ $ case of the leading recoil correction. However, if we include the muon anomalous magnetic moment in the kernels with three exchanged photons, they would generate the contributions proportional not only to the muon anomalous magnetic moment but to the anomalous magnetic moment squared. This makes any attempt to write the recoil correction of relative order (Za)(m/M) in terms of E unnatural, and it is usually written in terms of EI (see however $ $ discussion of this correction in the case of hydrogen in Section 14.1.2). The corrections to this result due to the muon anomalous magnetic moment should be considered separately as the radiativerecoil corrections of order (Za)a(Za)(m/M)EI . 
A naive attempt to write the recoil correction of $ relative order (Za)(m/M) in terms of E would shift the magnitude of this correction by about $ 10 Hz. This shift could be understood as an indication that calculation of the radiative-recoil corrections of order (Za)a(Za)(m/M)EI could be of some phenomenological interest at the $ present level of experimental accuracy. Leading terms logarithmic in Za were "rst considered in [262], and the complete logarithmic contribution was obtained in [28,263] *E"2 ln(Za)\(Za)
m EI . mM $
(319)
As usual this logarithmic contribution is state independent. Calculation of the nonlogarithmic contribution turned out to be a much more complicated task and the whole machinery of the relativistic two-particle equations was used in this work. First, the di!erence *E(1S)!8*E(2S) was calculated [264], then some nonlogarithmic contributions of this order for the 1S level were obtained [265], and only later the total nonlogarithmic contribution [266,267] was obtained
65 m (Za) EI . *E" !8 ln 2# 18 mM $
(320)
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
207
Fig. 83. Leading logarithm squared contribution of order (Za)(m/M)EI . $
This result was later con"rmed in a purely numerical calculation [17] in the framework of NRQED. Recoil contributions in Eqs. (319) and (320) are symmetric with respect to masses of the light and heavy particles. As in the case of the leading recoil correction, they were obtained without expansion in the mass ratio, and hence an exact dependence on the mass ratio is known (not just the "rst term in the expansion over m/M). Let us mention that while for the nonrecoil nonlogarithmic contributions of order (Za), both to HFS and the Lamb shift, only numerical results were obtained, the respective recoil contributions are known analytically in both cases (compare discussion of the Lamb shift contributions in Section 13). 12.1.3. Recoil corrections of order (Za)(m/M)EI $ There are two di!erent double logarithm contributions of order (Za)(m/M)EI , one contains $ a regular low-frequency logarithm squared, and the second depends on the product of the low-frequency logarithm and the logarithm of the mass ratio. Calculation of these contributions is quite straightforward and goes along the same lines as calculation of the leading logarithmic contribution of order a(Za) to the Lamb shift (see Section 4.4.2). Taking as one of the perturbation potentials the potential corresponding to the logarithmic recoil contribution of order (Za) to the Lamb shift in Eq. (136) and as the other the potential responsible for the main Fermi contribution to HFS in Eq. (297) (see Fig. 83), one obtains [110] a small logarithm squared contribution 2 (Za) m ln(Za)\E . *E"! $ 3 p mM
(321)
A signi"cantly larger double mixed logarithm correction is generated by the potential corresponding to the leading recoil correction to hyper"ne splitting in Eq. (318) and the leading logarithmic Dirac correction to the Coulomb}SchroK dinger wave function [243,18,118] (see Fig. 84) *E"!3
M (Za) m ln(Za)\ ln EI . m $ p M
(322)
Single-logarithmic recoil corrections and an estimate of nonlogarithmic contributions of relative order (Za) corresponding to the leading logarithm squared term in Eq. (321) were also obtained recently [15]
32 3 *E " !2C # !ln 2# 1 3 4
(Za) m ln (Za)\E , $ p M
(323)
208
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
Fig. 84. Leading mixed double logarithm contribution of order (Za)(m/M)EI . $
where
m 1 7 2 m M ln 1# C "! # (2 ln 2#3)! ln 1# ! 1 M m 9 3 M m 1! M
,
(324)
and (Za) m E . *E "(40$22) p M $
(325)
The error bars of the last result are still rather large, and more accurate calculation would be necessary in pursuit of all corrections of the scale of 10 Hz. The relatively large magnitude of the mixed logarithm contribution in Eq. (322) warrants calculation of the contribution which is linear in the logarithm of the mass ratio and nonlogarithmic in Za. All recoil corrections are collected in Table 17. The uncertainty of the total recoil correction in the last line of this table includes an estimate of the magnitude of the yet unknown recoil contributions. 12.2. Radiative-recoil corrections to HFS 12.2.1. Corrections of order a(Za)(m/M)EI and (Za)(Za)(m/M)EI $ $ As in the case of the purely radiative corrections of order a(Za)E , all diagrams relevant for $ calculation of radiative-recoil corrections of order a(Za)(m/M)EI may be obtained by radiative $ insertions in the skeleton diagrams. The only di!erence is that now the heavy particle line is also dynamical. The skeleton diagrams for this case coincide with the diagrams for the leading recoil corrections in Fig. 81. Note that even the leading recoil correction to HFS may be calculated in the scattering approximation. Insertion of radiative corrections in the skeleton diagrams emphasizes the high momenta region even more and, hence, the radiative-recoil correction to HFS splitting may be calculated in the scattering approximation. The diagrams for contributions of order a(Za)(m/M)EI are presented in Figs. 85}87, and they coincide topologically with the set of $ diagrams used for calculation of the radiative-recoil corrections to the Lamb shift (formal selection of relevant diagrams and proof of validity of the scattering approximation based on the relativistic two-particle equation, see, e.g., in [240,268,129]). 
We will consider below separately the corrections generated by the three types of diagrams: polarization insertions in the exchanged photons, radiative insertions in the electron line, and radiative insertions in the muon line. 12.2.1.1. Electron-line logarithmic contributions of order a(Za)(m/M)EI . The leading recoil correc$ tion to hyper"ne splitting is generated in the broad integration region over exchanged momenta m;k;M, and one might expect that insertion of radiative corrections in the skeleton diagrams
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
209
Table 17 Recoil corrections
Leading recoil correction Arnowitt [258] Fulton and Martin [95] Newcomb and Salpeter [259] Leading logarithmic recoil correction, relative order (Za) Lepage [28] Bodwin and Yennie [263] Nonlogarithmic recoil correction, relative order (Za) Bodwin et al. [266,267] Logarithm squared correction, relative order (Za) Karshenboim [110] Mixed logarithm correction, relative order (Za) Kinoshita and Nio [243,18] Karshenboim [118]
EI $
kHz
3(Za) mM M ! ln p M!m m
!800.304
m 2(Za) ln(Za)\ mM (Za)
Nonlogarithmic correction, relative order (Za) Kinoshita [15] Total recoil correction
!2.197
2 (Za) m ln(Za)\(1#a ) ! I 3 p mM
!0.043
(Za) m M ln (Za)\ ln p M m
!0.210
(Za) m ln(Za)\ !19.62192 p M
!0.257
!3 Single-logarithmic correction, relative order (Za) Kinoshita [15]
65 m !8 ln 2# 18 mM
11.179
(40$22)
(Za) m p M
0.107 (59) !791.714 (80)
in Fig. 81 would produce double logarithmic contributions since the radiative insertions are themselves logarithmic when the characteristic momentum is larger than the electron mass. However, this is only partially true, since the sum of the radiative insertions in the electron line does not have a logarithmic asymptotic behavior. The simplest way to see this is to work in the Landau gauge where respective radiative insertions are nonlogarithmic [20]. In other gauges, individual radiative insertions have leading logarithmic terms, but it is easy to see that due to the Ward identities these logarithmic terms cancel. The "rst time this cancellation was observed as a result of direct calculation in [261]. In the absence of the leading logarithmic contribution to the electron factor the logarithmic contribution to HFS is equal to the product of the leading constant term !5a/4p in the electron factor [269] and the leading recoil correction Eq. (318), as calculated in [239]: 15 a(Za) m M ln EI . *E" 4 p M m $
(326)
210
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
Fig. 85. Electron-line radiative-recoil corrections.
Fig. 86. Muon-line radiative-recoil corrections.
Fig. 87. Photon-line radiative-recoil corrections.
12.2.1.2. Electron-line nonlogarithmic contributions of order a(Za)(m/M)EI . The validity of the $ scattering approximation for calculation of all radiative and radiative-recoil corrections of order a(Za)EI greatly facilitates the calculations. One may obtain a compact general expression for all $ such corrections (both logarithmic and nonlogarithmic) induced by the radiative insertions in the electron line in Fig. 85 (see, e.g., [270])
3 Za *EU " E ! $ 16k p
;1cIkK cJ2 ¸ , I IJ
dk 1 1 1 # ip (k#i0) k#k\k #i0 k!k\k #i0
(327)
where the electron factor ¸ describes all radiative corrections in Fig. 69, 1cIkK cJ2 IJ I is the projection of the muon-line numerator on the spinor structure relevant to HFS, and k"m/(2M). The integral in Eq. (327) contains nonrecoil radiative corrections of order a(Za)E , as well as $ radiative-recoil corrections of all orders in the electron}muon mass ratio generated by the radiative insertions in the electron line. This integral admits in principle a direct brute force numerical calculation. The complicated structure of the integrand makes analytic extraction of the corrections of de"nite order in the mass ratio more involved. Direct application of the standard Feynman parameter methods leads to integrals for the radiative-recoil corrections which do not admit expansion of the integrand over the small mass ratio prior to integration, thus making the analytic calculation virtually impossible. Analytic results may be obtained with the help of the approach developed in [271] (see also review in [129]). The idea is to perform integration over the exchanged momentum directly in spherical coordinates. Following this route, we come to the
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
211
expression of the type *E"a(Za)E $
dx
V
dy
dk
U(k, x, y) , (k#a)!4kbk
(328) where a(x, y), b(x, y), and U(k) are explicitly known functions. The crucial property of the integrand in Eq. (328), which facilitates calculation, is that the denominator admits expansion in the small parameter k prior to momentum integration. This is true due to the inequality 4kbk/(k#a)44k, which is valid according to the de"nitions of the functions a and b. In this way, we may easily reproduce the nonrecoil skeleton integral in Eq. (280), and obtain once again the nonrecoil corrections induced by the radiative insertions in the electron line [236,64,237]. This approach admits also an analytic calculation of the radiative-recoil corrections of the "rst order in the mass ratio. Nonlogarithmic radiative-recoil corrections to HFS were "rst calculated numerically in the Yennie gauge [272,240] and then analytically in the Feynman gauge [271]
p 17 a(Za) m M ln EI , *E" 6f(3)#3p ln 2# # p M m $ 2 8
(329)
where f(3) is the Riemann f-function. This expression contains all characteristic structures (f-function, p ln 2, p and a rational number) which one usually encounters in the results of the loop calculations. Let us emphasize that the relative scale of these subleading terms is rather large, of order p, which is just what one should expect for the constants accompanying the large logarithm. Numerically there is a certain discrepancy between the analytic result in the Feynman gauge [271] (6f(3)#3p ln 2#p/2#17/8)/p"3.526 and the numerical result in the Yennie gauge [240] 3.335$0.058. When both works were completed this discrepancy which is as large as three standard deviations of the accuracy of the numerical integration in [240] was purely academic. But nowadays, when the accuracy of the experimental data has achieved 0.053 kHz, the discrepancy of about 0.22 kHz has a phenomenological signi"cance. In order to resolve this discrepancy, an independent analytic calculation of the electron-line contribution in the Yennie gauge was undertaken [270]. Let us emphasize that despite being partially performed by the same authors as [271], this new work was logically independent of [271]. It was performed in the Yennie gauge, and the expression for the electron factor from [272,240] was used as the initial point of the calculation. The result of [270] con"rmed the earlier analytic result in [271], and thus we are convinced that the discrepancy mentioned above is resolved in favor of the result in Eq. (329). 12.2.1.3. Muon-line contribution of order (Za)(Za)(m/M)EI . Radiative-recoil correction to HFS $ generated by the diagrams in Fig. 86 with insertions of the radiative photons in the muon line is given by an expression similar to the one in Eq. (327) [273,129], the only di!erence is that now we have to insert the muon factor instead of the heavy line numerator and to preserve the skeleton
See one more comment on this discrepancy below in Section 12.2.3 where the radiative-recoil correction of order a(Za)(m/M)E is discussed. $
212
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
electron-factor numerator. Unlike the case of the electron line, radiative insertions in the heavy muon line do not generate nonrecoil corrections. This is easy to realize if one recalls that the nonrecoil electron factor contribution is generated by the muon pole, which is absent in the diagrams with two exchanged photons and the muon-line fermion factor (muon anomalous magnetic moment is subtracted from the muon factor, since in the same way as in the case of the electron factor it generates corrections of lower order in a). Radiative insertions in the heavy fermion line do not generate logarithmic terms [239] either. This can be understood with the help of the low energy theorem for the Compton scattering. The e!ective momenta in the integral for the radiative-recoil corrections are smaller than the muon mass and, hence, the muon-line factor enters the integral in the low-momenta limit. The classical low-energy theorem for Compton scattering cannot be used directly in this case since the exchanged photons are virtual, but nevertheless it is not di$cult to prove the validity of a generalized low-energy theorem in this case [239,129]. Then we see that the logarithmic skeleton integrand gets an extra factor k after insertion of the muon factor in the integrand, and this extra factor changes the logarithmic nature of the integral. Analytic calculations of the muon-line radiative-recoil correction are carried out in the same way as in the electron-line case and the purely numerical [272,240] and analytic [273,129] results for this contribution are in excellent agreement
*E"
9 39 (Za)(Za) m M f(3)!3p ln 2# ln EI . 2 p M m $ 8
(330)
12.2.1.4. Leading photon-line double logarithmic contribution of order a(Za)(m/M)EI . The $ only double-logarithmic radiative-recoil contribution of order a(Za)(m/M)EI is generated by the $ leading logarithmic term in the polarization operator. Substitution of this leading logarithmic term in the logarithmic skeleton integral in Eq. (316) immediately leads to the double logarithmic contribution [261] *E"!2
M a(Za) m ln EI . m $ p M
(331)
As was noted in [269] this contribution may be obtained without any calculations at all. It is su$cient to realize that with logarithmic accuracy the characteristic momenta in the leading recoil correction in Eq. (316) are of order M and, in order to account for the leading logarithmic contribution generated by the polarization insertions, it is su$cient to substitute in Eq. (318) the running value of a at the muon mass instead of the "ne structure a. This algebraic operation immediately reproduces the result above. 12.2.1.5. Photon-line single-logarithmic and nonlogarithmic contributions of order a(Za)(m/M)EI . $ Calculation of the nonrecoil radiative correction of order a(Za)EI was facilitated by simultaneous $ consideration of the electron and muon loop polarization insertions in the exchanged photons. Similarly calculation of the radiative-recoil corrections generated by the diagrams in Fig. 87 with insertions of the vacuum polarizations, is technically simpler if one considers simultaneously both
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
213
electron and muon vacuum polarizations. All corrections may be obtained by substituting the explicit expression for the sum of vacuum polarizations in the skeleton integral Eq. (316). In this skeleton integral, part of the recoil correction corresponding to the factor 1/(1#m/M) is subtracted and this explains why we have restored this factor in consideration of the nonrecoil part of the vacuum polarization. Technically, consideration of the sum of the electron and muon vacuum polarizations leads to simpli"cation of the integrand for the radiative-recoil corrections, and after an easy calculation one obtains the single-logarithmic and nonlogarithmic contributions to the total radiative-recoil correction induced by the sum of the electron and muon vacuum polarizations [239,240]
$$\Delta E = \left(-\frac{8}{3}\ln\frac{M}{m} - \frac{28}{9} - \frac{\pi^2}{3}\right)\frac{\alpha(Z\alpha)}{\pi^2}\,\frac{m}{M}\,\tilde E_F. \tag{332}$$
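As a rough numerical cross-check, the sketch below evaluates Eqs. (331) and (332) in kHz for muonium. The inputs are assumptions chosen for illustration (approximate values of $\alpha$, $M/m$, the muonium Fermi energy $E_F \approx 4459.032$ MHz and the muon anomaly $a_\mu$, with $\tilde E_F = E_F/(1+a_\mu)$); they are not necessarily the constants used in the original calculations, and the function names are hypothetical.

```python
import math

# Assumed approximate inputs (illustrative only)
ALPHA = 7.2973525693e-3          # fine-structure constant
MU_OVER_ME = 206.7682830         # muon-to-electron mass ratio M/m
A_MU = 1.16592e-3                # muon anomalous magnetic moment a_mu
E_F_KHZ = 4_459_032.0            # muonium Fermi energy E_F, kHz (approximate)
E_F_TILDE_KHZ = E_F_KHZ / (1.0 + A_MU)   # E~_F: Fermi energy without the muon anomaly

Z = 1
LN_MASS_RATIO = math.log(MU_OVER_ME)     # ln(M/m)

# Common unit alpha (Z alpha)/pi^2 * (m/M) * E~_F, in kHz
UNIT_KHZ = ALPHA * (Z * ALPHA) / math.pi**2 / MU_OVER_ME * E_F_TILDE_KHZ


def eq331_khz():
    """Leading double-logarithmic polarization term, Eq. (331)."""
    return -2.0 * LN_MASS_RATIO**2 * UNIT_KHZ


def eq332_khz():
    """Single-logarithmic and nonlogarithmic polarization terms, Eq. (332)."""
    return (-8.0 / 3.0 * LN_MASS_RATIO - 28.0 / 9.0 - math.pi**2 / 3.0) * UNIT_KHZ


print(f"Eq. (331): {eq331_khz():.3f} kHz")
print(f"Eq. (332): {eq332_khz():.3f} kHz")
```

With these inputs the two functions return approximately $-6.607$ kHz and $-2.396$ kHz, the values listed for these two contributions in Table 18.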
Note that in the parentheses in Eq. (332) we have departed from our usual practice of considering the muon as a particle with charge $Ze$, and have set $Z = 1$. Technically this is motivated by the cancellation of certain contributions between the electron and muon polarization loops mentioned above; physically, there is no need to preserve a nontrivial factor $Z$ here, since it serves only as a reference to an interaction with the "constituent" muon and not with the one emerging in the polarization loops. As was explained above, the coefficient before the leading logarithm squared term in Eq. (331) may be obtained almost without any calculations. Simultaneous account of the electron and muon loops does not affect this contribution, since all logarithmic contributions are generated only by the insertion of the electron loop. It may also be shown that the coefficient before the subleading logarithm originates from the first two terms in the asymptotic expansion of the polarization operator, $(2\alpha/3\pi)\ln(k/m) - 5\alpha/(9\pi)$ [269]. Substituting these asymptotics into the skeleton integral in Eq. (316) and multiplying the result by 2 to account for all possible insertions of the polarization loop in the exchanged photons, one obtains $-8/3$ for the coefficient before the single-logarithmic term, in accordance with the result above: the factor $-6$ comes from the leading logarithmic term in the polarization operator expansion, and the factor $10/3$ is generated by the subleading constant, their sum being $-6 + 10/3 = -8/3$. 12.2.1.6. Heavy particle polarization contributions of order $\alpha(Z\alpha)\tilde E_F$. The contribution of the muon polarization operator was already considered above. One might expect the contributions of the diagrams in Fig. 88 with heavy particle polarization loops to be of the same order of magnitude as the contribution of the muon loop, so it is natural to consider them here.
The respective corrections could easily be calculated by substituting the expressions for the heavy particle polarizations into the unsubtracted skeleton integral in Eq. (316). However, only the polarization operator of the heavy lepton $\tau$ may be calculated analytically. Polarization contributions due to the
Fig. 88. Heavy-particle polarization contribution to HFS.
Fig. 89. Graphs with two one-loop polarization insertions. Fig. 90. Graphs with two-loop polarization insertions.
Fig. 91. Graphs with light by light scattering insertions.
loops of pions and other hadrons cannot be calculated with the help of QCD perturbation theory, and the best approach to their evaluation is to use a low-energy effective theory together with experimental data. The respective calculations were performed in [240,274,274], and currently the most accurate result for the hadron polarization operator contribution to HFS is [275]

$$\Delta E = 3.5988(1045)\,\frac{\alpha(Z\alpha)}{\pi^2}\,\frac{mM}{m_\pi^2}\,\tilde E_F. \tag{333}$$
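To see the size of Eq. (333), the sketch below converts it to kHz. The inputs are illustrative approximations, and the identification of $m_\pi$ with the charged pion mass is an assumption made here; the function name is hypothetical.

```python
import math

# Assumed approximate inputs (illustrative only)
ALPHA = 7.2973525693e-3
M_E = 0.51099895        # electron mass m, MeV
M_MU = 105.6583755      # muon mass M, MeV
M_PI = 139.57039        # charged pion mass, MeV (assumed to be the m_pi of Eq. (333))
E_F_TILDE_KHZ = 4_459_032.0 / (1.0 + 1.16592e-3)   # E~_F = E_F/(1 + a_mu), kHz


def hadronic_vp_khz(coeff=3.5988, z=1):
    """Eq. (333): coeff * alpha (Z alpha)/pi^2 * m M / m_pi^2 * E~_F, in kHz."""
    return (coeff * ALPHA * (z * ALPHA) / math.pi**2
            * M_E * M_MU / M_PI**2 * E_F_TILDE_KHZ)


central = hadronic_vp_khz()
spread = hadronic_vp_khz(coeff=0.1045)   # coefficient uncertainty, propagated linearly
print(f"hadronic polarization: {central:.3f} +/- {spread:.3f} kHz")
```

With these inputs the result is close to the $0.240(7)$ kHz entry of Table 18.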
12.2.2. Leading logarithmic contributions of order $\alpha^2(Z\alpha)(m/M)\tilde E_F$

The leading contribution of order $\alpha^2(Z\alpha)(m/M)\tilde E_F$ is enhanced by the cube of the large logarithm of the electron–muon mass ratio. One could expect logarithm cube terms to be generated by several types of radiative insertions in the skeleton graphs with two exchanged photons: insertions of the first- and second-order polarization operators in the exchanged photons in Figs. 89 and 90, insertions of the light-by-light scattering contributions in Fig. 91, insertions of two radiative photons in the electron line in Fig. 92, and insertions of the polarization operator in the radiative photon in Fig. 93. In [276], where the leading logarithm cube contribution was calculated explicitly, it was shown that only the graphs in Fig. 89 with insertions of the one-loop polarization operators generate logarithm cube terms. This leading contribution may be obtained without any calculations, simply by substituting the effective charge $\alpha(M)$ defined at the characteristic scale $M$ instead of the fine structure constant $\alpha$ in the leading recoil correction of order $(Z\alpha)(m/M)E_F$, and expanding the resulting expression in a power series in $\alpha$ [269] (compare the similar remark above concerning the leading logarithm squared term in Eq. (331)). Calculation of the logarithm squared term of order $\alpha^2(Z\alpha)(m/M)\tilde E_F$ is more challenging [269]. All graphs in Figs. 89–93 generate corrections of this order. The contribution induced by the irreducible two-loop vacuum polarization in Fig. 90 is again given by the effective charge expression. Subleading logarithm squared terms generated by the one-loop polarization insertions in Fig. 89 may easily be calculated with the help of the two leading asymptotic terms in the
Fig. 92. Graphs with radiative photon insertions.
Fig. 93. Graphs with polarization insertions in the radiative photon.
Fig. 94. Renormalization of the fifth current.
polarization operator expansion and the skeleton integral. An interesting effect takes place in the calculation of the logarithm squared term generated by the polarization insertions in the radiative photon in Fig. 93. One might expect the high-energy asymptote of the electron factor with the polarization insertion to be given by the product of the leading constant term of the electron factor, $-5\alpha/(4\pi)$, and the leading polarization operator term. However, this expectation turns out to be wrong. One may check explicitly that instead of the naive factor above one has to multiply the polarization operator by the factor $-3\alpha/(4\pi)$. The reason for this effect is easy to understand. The factor $-3\alpha/(4\pi)$ is the asymptote of the electron factor in massless QED, and only this factor contributes to the logarithmic asymptotics after the polarization operator insertion. This means that in massive QED the remaining part $-2\alpha/(4\pi)$ of the constant electron factor originates from the integration region where the integration momentum is of the order of the electron mass. Naturally, this integration region does not give any contribution to the logarithmic asymptotics of the radiatively corrected electron factor. The least trivial logarithm squared contribution is generated by the three-loop diagrams in Fig. 91 with insertions of the light-by-light scattering block. Their contribution was calculated explicitly in [269]. Later it was realized that these contributions are intimately connected with the well-known anomalous renormalization of the axial current in QED [277]. Due to the projection on the HFS spin structure, in the logarithmic integration region the heavy particle propagator effectively shrinks to an axial current vertex, and the calculation of the respective contribution to HFS then reduces to substitution of the well-known two-loop axial renormalization factor in Fig. 94 [278] into the recoil skeleton diagram.
Of course, this calculation reproduces the same contribution as the direct calculation of the diagrams with light-by-light scattering insertions. From the theoretical point of view it is interesting that, given sufficiently accurate experimental data, one could in principle measure the anomalous two-loop renormalization of the axial current in an atomic physics experiment.
Fig. 95. Graphs with simultaneous insertions of the electron and muon loops.
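The sum of all logarithm cubed and logarithm squared contributions of order $\alpha^2(Z\alpha)(m/M)\tilde E_F$ is given by the expression [276,269]

$$\Delta E = \left(-\frac{4}{3}\ln^3\frac{M}{m} + \frac{4}{3}\ln^2\frac{M}{m}\right)\frac{\alpha^2(Z\alpha)}{\pi^3}\,\frac{m}{M}\,\tilde E_F. \tag{334}$$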
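The effective-charge shortcut mentioned above can be made explicit. The sketch assumes (our reading of the text, not a formula quoted from it) that only the explicit factor $Z\alpha/\pi$ of the leading recoil correction, $-3(Z\alpha/\pi)(m/M)\ln(M/m)\tilde E_F$ to leading order in $m/M$, is replaced by its one-loop running value, expanded as a geometric series in $(2\alpha/3\pi)\ln(M/m)$. The one- and two-loop terms of the series then reproduce the coefficient $-2$ of Eq. (331) and the coefficient $-4/3$ of the logarithm cube in Eq. (334). The numerical inputs are assumed approximate constants and the names are hypothetical.

```python
import math
from fractions import Fraction


def effective_charge_coefficients(max_loops=2):
    """Coefficient of (alpha/pi)^n (Z alpha/pi)(m/M) ln^(n+1)(M/m), n = 0..max_loops,
    obtained by expanding (Z alpha/pi) / (1 - (2/3)(alpha/pi) ln(M/m)) in the
    leading recoil term -3 (Z alpha/pi)(m/M) ln(M/m)."""
    leading = Fraction(-3)
    per_loop = Fraction(2, 3)
    return [leading * per_loop**n for n in range(max_loops + 1)]


coeffs = effective_charge_coefficients()
print(coeffs)   # n = 1: coefficient of Eq. (331); n = 2: log-cube coefficient of Eq. (334)

# Numerical size of the two terms of Eq. (334) (assumed approximate inputs)
ALPHA = 7.2973525693e-3
MU_OVER_ME = 206.7682830
E_F_TILDE_KHZ = 4_459_032.0 / (1.0 + 1.16592e-3)
L = math.log(MU_OVER_ME)
unit_khz = (ALPHA / math.pi) ** 3 / MU_OVER_ME * E_F_TILDE_KHZ   # alpha^2 (Z alpha)/pi^3 (m/M) E~_F
log_cube_khz = -4.0 / 3.0 * L**3 * unit_khz
log_square_khz = 4.0 / 3.0 * L**2 * unit_khz
print(f"log cube: {log_cube_khz:.3f} kHz, log square: {log_square_khz:.3f} kHz")
```

The two numbers agree with the $-0.055$ and $0.010$ kHz entries of Table 18 for these inputs.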
It was also shown in [269] that there are no other contributions with the large logarithm of the mass ratio squared accompanied by the factor $\alpha^3$, even if the factor $Z$ enters in another manner than in the equation above. Unlike the logarithm cube and logarithm squared terms, which are generated only by the small number of diagrams discussed above, there are numerous sources of single-logarithmic terms. These terms have never been calculated in full; only a partial result for one of the gauge-invariant sets of such graphs, in Fig. 95, is known now [279]

$$\Delta E = 0.6455\,\ln\frac{M}{m}\,\frac{\alpha^2(Z\alpha)}{\pi^3}\,\frac{m}{M}\,\tilde E_F. \tag{335}$$
This contribution may be used only as an indication of the scale of other uncalculated single-logarithmic terms.

12.2.3. Corrections of order $\alpha(Z\alpha)(m/M)^2E_F$

Radiative-recoil corrections of order $\alpha(Z\alpha)(m/M)^2E_F$ are generated by the same set of diagrams in Figs. 85–87, with the radiative insertions in the electron and muon lines and with the polarization insertions in the photon lines, as the respective corrections of the previous order in the mass ratio. The analytic calculation proceeds as in that case; the only difference is that now one has to preserve all contributions of second order in the small mass ratio. It turns out that all such corrections are generated at the scale of the electron mass, and one obtains for the sum of all corrections [280]
$$\Delta E = \left[\left(-6\ln 2 - \frac{3}{4}\right)\alpha(Z\alpha) - \frac{17}{12}(Z^2\alpha)(Z\alpha)\right]\left(\frac{m}{M}\right)^2 E_F. \tag{336}$$
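A quick evaluation of Eq. (336), and of its electron-line part given in Eq. (337), shows how small these second-order terms are. The sketch uses assumed approximate inputs and reads the muon-line factor as $(Z^2\alpha)(Z\alpha)$, as in the equation above; for $Z = 1$ this reading does not affect the numbers.

```python
import math

# Assumed approximate inputs (illustrative only)
ALPHA = 7.2973525693e-3
MU_OVER_ME = 206.7682830        # M/m
E_F_KHZ = 4_459_032.0           # muonium Fermi energy, kHz (approximate)
Z = 1

MASS_RATIO_SQ_E_F = E_F_KHZ / MU_OVER_ME**2      # (m/M)^2 E_F, in kHz
LN2 = math.log(2.0)

# Eq. (337): electron-line part, (-6 ln 2 - 3/2) alpha (Z alpha) (m/M)^2 E_F
electron_line = (-6 * LN2 - 3 / 2) * ALPHA * (Z * ALPHA) * MASS_RATIO_SQ_E_F

# Eq. (336): sum of all second-order terms
total = ((-6 * LN2 - 3 / 4) * ALPHA * (Z * ALPHA)
         - 17 / 12 * (Z**2 * ALPHA) * (Z * ALPHA)) * MASS_RATIO_SQ_E_F

print(f"Eq. (336): {total:.3f} kHz, Eq. (337): {electron_line:.3f} kHz")
```

The total comes out near $-0.035$ kHz, the entry in Table 18, and the electron-line part near $-0.03$ kHz.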
The electron-line contribution

$$\Delta E = \left(-6\ln 2 - \frac{3}{2}\right)\alpha(Z\alpha)\left(\frac{m}{M}\right)^2 E_F \approx -0.03~\text{kHz} \tag{337}$$
is of special interest in view of the discrepancy between the results for the electron-line contributions of order $\alpha(Z\alpha)(m/M)\tilde E_F$ in [271,270,240] (see the discussion in Section 12.2.1.2). The result for the electron-line contribution in [240] was obtained without expansion in $m/M$, so one could try to
Fig. 96. Leading logarithm squared contribution of order $\alpha(Z\alpha)^2E_F$.
ascribe the discrepancy to the contribution of these higher-order terms. However, the contribution of second order in the mass ratio in Eq. (337) is far too small to explain the discrepancy. We would like to emphasize once again that the coinciding results in [271,270] were obtained completely independently in different gauges, so there is little, if any, doubt that this result is correct.

12.2.4. Corrections of order $\alpha(Z\alpha)^2(m/M)E_F$

Radiative-recoil corrections of order $\alpha(Z\alpha)^2(m/M)E_F$ have never been calculated completely. As we mentioned in Section 11.4.1.1, the leading logarithm squared contribution of order $\alpha(Z\alpha)^2E_F$ may easily be calculated if one takes as one of the perturbation potentials the potential corresponding to the electron electric form factor, and as the other the potential responsible for the main Fermi contribution to HFS (see Fig. 96). One then obtains the leading logarithm squared contribution in the form [110]
$$\Delta E = -\frac{2}{3}\,\frac{\alpha(Z\alpha)^2}{\pi}\left(\frac{m_r}{m}\right)^2\ln^2(Z\alpha)^{-2}\,E_F, \tag{338}$$
which differs from the leading logarithm squared term in Eq. (298) by the recoil factor $(m_r/m)^2$. Preserving only the linear term in the expansion of this result in the mass ratio, one obtains

$$\Delta E = \frac{4}{3}\,\frac{\alpha(Z\alpha)^2}{\pi}\,\frac{m}{M}\ln^2(Z\alpha)^{-2}\,E_F. \tag{339}$$
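The step from Eq. (338) to Eq. (339) is a first-order expansion of the reduced-mass factor. Written out, with $m_r = mM/(m+M)$:

$$\left(\frac{m_r}{m}\right)^2 = \left(1+\frac{m}{M}\right)^{-2} = 1 - 2\,\frac{m}{M} + O\!\left(\frac{m^2}{M^2}\right),$$

$$-\frac{2}{3}\left(1-2\,\frac{m}{M}\right)\frac{\alpha(Z\alpha)^2}{\pi}\ln^2(Z\alpha)^{-2}E_F = -\frac{2}{3}\,\frac{\alpha(Z\alpha)^2}{\pi}\ln^2(Z\alpha)^{-2}E_F + \frac{4}{3}\,\frac{\alpha(Z\alpha)^2}{\pi}\,\frac{m}{M}\ln^2(Z\alpha)^{-2}E_F,$$

so the first term is the nonrecoil logarithm squared term of Eq. (298) and the second is the recoil term of Eq. (339).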
Numerically this contribution is about 0.3 kHz, and clearly has to be taken into account when comparing theory with the experimental data. In this situation it is better simply to use in the theoretical formulae the leading logarithm squared contribution of order $\alpha(Z\alpha)^2E_F$ in the form of Eq. (338) instead of the expression for this logarithmic term in Eq. (298). Similar single-logarithmic
$$\Delta E = \frac{16}{3}\left(\ln 2 - \frac{3}{4}\right)\frac{\alpha(Z\alpha)^2}{\pi}\,\frac{m}{M}\ln(Z\alpha)^{-2}\,E_F \tag{340}$$
and nonlogarithmic

$$\Delta E = -4(10\pm 2.5)\,\frac{\alpha(Z\alpha)^2}{\pi}\,\frac{m}{M}\,E_F \tag{341}$$
radiative-recoil corrections of order $\alpha(Z\alpha)^2(m/M)E_F$, associated with the reduced mass factors in the nonrecoil corrections of order $\alpha(Z\alpha)^2E_F$, were obtained recently [15].
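The "about 0.3 kHz" quoted for Eq. (339), and the sizes of Eqs. (340) and (341), can be checked with a few lines. The sketch uses assumed approximate inputs; the variable names are hypothetical.

```python
import math

# Assumed approximate inputs (illustrative only)
ALPHA = 7.2973525693e-3
MU_OVER_ME = 206.7682830        # M/m
E_F_KHZ = 4_459_032.0           # muonium Fermi energy, kHz (approximate)
Z = 1

LOG = math.log((Z * ALPHA) ** -2)                                     # ln (Z alpha)^{-2}
UNIT_KHZ = ALPHA * (Z * ALPHA) ** 2 / math.pi / MU_OVER_ME * E_F_KHZ  # alpha (Z alpha)^2/pi (m/M) E_F

log_squared = 4 / 3 * LOG**2 * UNIT_KHZ                        # Eq. (339)
single_log = 16 / 3 * (math.log(2) - 3 / 4) * LOG * UNIT_KHZ   # Eq. (340)
nonlog = -4 * 10 * UNIT_KHZ                                    # Eq. (341), central value
nonlog_err = 4 * 2.5 * UNIT_KHZ                                # Eq. (341), uncertainty

print(f"{log_squared:.3f} {single_log:.3f} {nonlog:.3f}({nonlog_err:.3f}) kHz")
```

For these inputs the three terms are close to the $0.344$, $-0.008$ and $-0.107(27)$ kHz entries of Table 18; the single-logarithmic term is strongly suppressed because $\ln 2$ is close to $3/4$.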
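Table 18. Radiative-recoil corrections

| Contribution | $\tilde E_F$ | kHz |
|---|---|---|
| Leading logarithmic electron-line correction, Terray and Yennie [239] | $\frac{15}{4}\frac{\alpha(Z\alpha)}{\pi^2}\frac{m}{M}\ln\frac{M}{m}$ | 2.324 |
| Nonlogarithmic electron-line correction, Eides et al. [271] | $\left(6\zeta(3)+3\pi^2\ln 2+\frac{\pi^2}{2}+\frac{17}{8}\right)\frac{\alpha(Z\alpha)}{\pi^2}\frac{m}{M}$ | 4.044 |
| Muon-line contribution, Eides et al. [273] | $\left(\frac{9}{2}\zeta(3)-3\pi^2\ln 2+\frac{39}{8}\right)\frac{(Z^2\alpha)(Z\alpha)}{\pi^2}\frac{m}{M}$ | -1.190 |
| Logarithm squared polarization contribution, Caswell and Lepage [261] | $-2\ln^2\frac{M}{m}\,\frac{\alpha(Z\alpha)}{\pi^2}\frac{m}{M}$ | -6.607 |
| Single-logarithmic and nonlogarithmic polarization contributions, Terray and Yennie [239] | $\left(-\frac{8}{3}\ln\frac{M}{m}-\frac{\pi^2}{3}-\frac{28}{9}\right)\frac{\alpha(Z\alpha)}{\pi^2}\frac{m}{M}$ | -2.396 |
| Hadron polarization contribution, Faustov et al. [275] | $3.599(105)\,\frac{\alpha(Z\alpha)}{\pi^2}\frac{mM}{m_\pi^2}$ | 0.240(7) |
| Leading log cube correction, Eides and Shelyuto [276] | $-\frac{4}{3}\frac{\alpha^2(Z\alpha)}{\pi^3}\frac{m}{M}\ln^3\frac{M}{m}$ | -0.055 |
| Log square correction, Eides et al. [269] | $\frac{4}{3}\frac{\alpha^2(Z\alpha)}{\pi^3}\frac{m}{M}\ln^2\frac{M}{m}$ | 0.010 |
| One of the linear in log corrections, Li et al. [279] | $0.6455\,\frac{\alpha^2(Z\alpha)}{\pi^3}\frac{m}{M}\ln\frac{M}{m}$ | 0.009 |
| Second order in mass ratio contribution, Eides et al. [280] | $\left[\left(-6\ln 2-\frac{3}{4}\right)\alpha(Z\alpha)-\frac{17}{12}(Z^2\alpha)(Z\alpha)\right]\left(\frac{m}{M}\right)^2$ | -0.035 |
| Leading logarithmic $\alpha(Z\alpha)^2$ radiative-recoil correction, Karshenboim [110] | $\frac{4}{3}\frac{\alpha(Z\alpha)^2}{\pi}\frac{m}{M}\ln^2(Z\alpha)^{-2}$ | 0.344 |
| Single-logarithmic $\alpha(Z\alpha)^2$ radiative-recoil correction, Kinoshita [15] | $\frac{16}{3}\left(\ln 2-\frac{3}{4}\right)\frac{\alpha(Z\alpha)^2}{\pi}\frac{m}{M}\ln(Z\alpha)^{-2}$ | -0.008 |
| Nonlogarithmic $\alpha(Z\alpha)^2$ radiative-recoil correction, Kinoshita [15] | $-4(10\pm2.5)\,\frac{\alpha(Z\alpha)^2}{\pi}\frac{m}{M}$ | -0.107(27) |
| Total radiative-recoil correction | | -3.427(70) |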
The relatively large magnitude of the correction in Eq. (339) shows that a calculation of all radiative-recoil corrections of order $\alpha(Z\alpha)^2(m/M)E_F$ is warranted. The error of the total radiative-recoil correction in the last line of Table 18 includes, besides the errors of the individual contributions in the upper lines of the table, an educated guess of the magnitude of the yet uncalculated contributions.
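The central value in the last line of Table 18 is the plain sum of the individual entries, which a short check confirms. The dictionary keys are informal labels introduced for this sketch.

```python
# Central values (kHz) of the individual entries of Table 18, in table order.
# The keys are informal labels introduced for this sketch.
table_18_khz = {
    "leading log, electron line": 2.324,
    "nonlog, electron line": 4.044,
    "muon line": -1.190,
    "log^2, polarization": -6.607,
    "single log + nonlog, polarization": -2.396,
    "hadron polarization": 0.240,
    "log^3, alpha^2 (Z alpha)": -0.055,
    "log^2, alpha^2 (Z alpha)": 0.010,
    "one linear-in-log set": 0.009,
    "second order in m/M": -0.035,
    "log^2, alpha (Z alpha)^2": 0.344,
    "single log, alpha (Z alpha)^2": -0.008,
    "nonlog, alpha (Z alpha)^2": -0.107,
}

total_khz = sum(table_18_khz.values())
print(f"total = {total_khz:.3f} kHz")
```

The sum reproduces $-3.427$ kHz. The quoted uncertainty of $\pm 0.070$ kHz cannot be obtained this way, since, as stated above, it also contains an estimate of the uncalculated terms.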
13. Weak interaction contribution

The weak interaction contribution to hyperfine splitting is due to $Z$-boson exchange between the electron and muon in Fig. 48. Due to the large mass of the $Z$-boson, this exchange is effectively described by the local four-fermion interaction Hamiltonian
$$H_W = -\frac{1}{2M_Z^2}\int d^3x\,(\mathbf{j}\cdot\mathbf{j}), \tag{342}$$

where $\mathbf{j}$ is the spatial part of the weak current $(j_0, \mathbf{j})$, the weak charge is included in the definition of the weak current, and $M_Z$ is the $Z$-boson mass. The weak interaction contribution to HFS was calculated many years ago [281,282], and even radiative corrections to the leading term were discussed in the literature [283–285]. However, it turned out quite recently that the weak contribution to HFS is cited in the literature with different signs [263,10,243,18]. This probably happened because the weak correction was of purely academic interest to early researchers. The discrepancy in sign was subjected to scrutiny in a number of recent works [243,18,251,286], all of which produced a result in agreement with [282]

$$\Delta E = -\frac{G_F}{\sqrt{2}}\,\frac{3mM}{4\pi Z\alpha}\,E_F \approx -0.065~\text{kHz}. \tag{343}$$
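The numerical value in Eq. (343) follows directly from the formula. The sketch works in natural units (GeV), so $G_F m M$ is dimensionless; the constants are assumed approximate inputs and the function name is hypothetical.

```python
import math

# Assumed approximate inputs (illustrative only), natural units
G_F = 1.1663787e-5        # Fermi constant, GeV^-2
M_E = 0.51099895e-3       # electron mass m, GeV
M_MU = 105.6583755e-3     # muon mass M, GeV
ALPHA = 7.2973525693e-3
E_F_KHZ = 4_459_032.0     # muonium Fermi energy, kHz (approximate)
Z = 1


def weak_hfs_khz():
    """Eq. (343): -(G_F/sqrt(2)) * 3 m M / (4 pi Z alpha) * E_F, in kHz."""
    return -G_F / math.sqrt(2) * 3 * M_E * M_MU / (4 * math.pi * Z * ALPHA) * E_F_KHZ


print(f"weak contribution: {weak_hfs_khz():.3f} kHz")
```

The result is about $-0.065$ kHz, far below the theoretical uncertainty of the radiative-recoil corrections but with a definite negative sign.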
It is easy to see from [286] that for nuclei with more than one nucleon the expression in Eq. (343) would contain an extra factor equal to the eigenvalue of twice the third component, $2T_3$, of the isospin operator.

14. Hyperfine splitting in hydrogen

Hyperfine splitting in the ground state of hydrogen is one of the most precisely measured quantities in modern physics [287,288] (see Section 16.2.1 below for more details), and to describe it theoretically we need to consider additional contributions to HFS connected with the bound state nature of the proton. Dominant nonrecoil contributions to the hydrogen hyperfine splitting are essentially the same as in the case of muonium. The only differences are that now the proton mass replaces the muon mass in all formulae, and we have to substitute the proton anomalous magnetic moment $\kappa = 1.792847386(63)$, measured in nuclear magnetons, instead of the muon anomalous magnetic moment $a_\mu$, measured in Bohr magnetons, in the expression for the hydrogen Fermi energy in Eq. (271). After this substitution one can use the nonrecoil corrections collected in Tables 12–16 for hydrogen. As in the case of the Lamb shift, the composite nature of the nucleus reveals itself first of all via a relatively large finite size correction. It is also necessary to reconsider all recoil and radiative-recoil corrections. Due to the existence of the proton anomalous magnetic moment and nontrivial proton form factors, simple-minded insertion of the hydrogen Fermi energy instead of the muonium Fermi energy in the muonium expressions for these corrections leads to wrong results. As we have seen in Sections 12.1 and 12.2, leading recoil and radiative-recoil corrections originate from distances small on the atomic scale, between the characteristic Compton lengths of the heavy nucleus $1/M$ and the light electron $1/m$. Proton contributions to hyperfine splitting coming from
these distances cannot be satisfactorily described only in terms of such global characteristics as its electric and magnetic form factors. Notice that the leading recoil and radiative-recoil contributions to the Lamb shift are softer: they come from larger distances (see Sections 5.1–6.2), and the respective formulae are valid for both elementary and composite nuclei. Theoretical distinctions between elementary and composite nuclei are much more important for HFS than for the Lamb shift. Despite this difference, the discussion of the proton size and structure corrections to HFS in hydrogen below is in many respects parallel to the discussion of the respective corrections to the Lamb shift in Section 4.1.

14.1. Nuclear size, recoil and structure corrections of orders $(Z\alpha)E_F$ and $(Z\alpha)^2E_F$

Nontrivial nuclear structure beyond the nuclear anomalous magnetic moment first becomes important for corrections to hyperfine splitting of order $(Z\alpha)E_F$ (compare the respective corrections to the Lamb shift in Section 7.2). Corrections of order $(Z\alpha)E_F$ connected with the nonelementarity of the nucleus are generated by the diagrams with two-photon exchanges. Insertion of the perturbation corresponding to the magnetic or electric form factors in one of the legs of the skeleton diagram in Fig. 67, described by the infrared divergent integral in Eq. (280), makes the integral infrared convergent and pushes the characteristic integration momenta up to the high scale determined by the hadron form factor. Due to the composite nature of the nucleus, besides intermediate elastic nuclear states we also have to consider the contribution of the diagrams with inelastic intermediate states. We will first consider the contributions generated only by the elastic intermediate nuclear states. This means that in calculating this correction we treat the nucleus as a particle which interacts with the photons via its nontrivial Sachs electric and magnetic form factors in Eq. (163).

14.1.1. Corrections of order $(Z\alpha)E_F$

14.1.1.1. Correction of order $(Z\alpha)(m/\Lambda)E_F$ (Zemach correction). As usual, we start the consideration of the contributions of order $(Z\alpha)E_F$ with the infrared divergent integral Eq. (280), corresponding to the two-photon skeleton diagram in Fig. 67. Insertion of the factors $G_E(-k^2)-1$ or $G_M(-k^2)/(1+\kappa)-1$ in one of the external proton legs corresponds to the presence of a nontrivial proton form factor. We need to consider the diagrams in Fig. 97 with insertion of one factor $G_{E,M}(-k^2)-1$ in the proton vertex
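$$\Delta E = \frac{8(Z\alpha)m}{\pi n^3}\,E_F\int_0^\infty\frac{dk}{k^2}\left\{\left[G_E(-k^2)-1\right]+\left[\frac{G_M(-k^2)}{1+\kappa}-1\right]\right\}, \tag{344}$$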
Subtraction is necessary in order to avoid double counting, since the diagrams with the subtracted term correspond to the pointlike proton contribution, already taken into account in the expression for the Fermi energy in Eq. (271). The dimensionless integration momentum in Eq. (280) is measured in units of the electron mass. We return here to dimensionful integration momenta, which results in an extra factor $m$ in the numerators in Eqs. (344) and (345). Notice also the minus sign before the momentum in the arguments of the form factors; it arises because in the equations below $k = |k|$, so that the form factors are evaluated at the spacelike momentum transfer $-k^2$.
Fig. 97. Elastic nuclear size corrections of order $(Z\alpha)E_F$ with one form factor insertion. Empty dot corresponds either to $G_E(-k^2)-1$ or $G_M(-k^2)/(1+\kappa)-1$. Fig. 98. Elastic nuclear size correction of order $(Z\alpha)E_F$ with two form factor insertions. Empty dot corresponds either to $G_E(-k^2)-1$ or $G_M(-k^2)/(1+\kappa)-1$.
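and the diagram in Fig. 98 with simultaneous insertion of both factors $G_E(-k^2)-1$ and $G_M(-k^2)/(1+\kappa)-1$ in two proton vertices

$$\Delta E = \frac{8(Z\alpha)m}{\pi n^3}\,E_F\int_0^\infty\frac{dk}{k^2}\left[G_E(-k^2)-1\right]\left[\frac{G_M(-k^2)}{1+\kappa}-1\right]. \tag{345}$$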
Effectively, the integration in Eqs. (344) and (345) extends up to the form factor scale. This scale is much higher than the electron mass, and in this section "high momenta" means momenta much higher than the electron mass, whereas in earlier sections high momenta often meant momenta of the order of the electron mass. The total proton size dependent contribution of order $(Z\alpha)E_F$, often called the Zemach correction, has the form

$$\Delta E = \frac{8(Z\alpha)m}{\pi n^3}\,E_F\int_0^\infty\frac{dk}{k^2}\left[\frac{G_E(-k^2)\,G_M(-k^2)}{1+\kappa}-1\right], \tag{346}$$
or, in coordinate space [166],

$$\Delta E = -2(Z\alpha)m\langle r\rangle E_F, \tag{347}$$
where $\langle r\rangle$ is the first Zemach moment (or the Zemach radius) [166], defined via the weighted convolution of the electric and magnetic densities $\rho_{E,M}(\mathbf r)$ corresponding to the respective form factors (compare with the third Zemach moment in Eq. (169))

$$\langle r\rangle \equiv \int d^3r\,d^3r'\,\rho_E(\mathbf r)\,\rho_M(\mathbf r')\,|\mathbf r-\mathbf r'|. \tag{348}$$
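Eq. (348) can be checked by direct sampling. For the dipole fit used below in Eq. (351), both densities are the normalized exponential $\rho(r) = \Lambda^3 e^{-\Lambda r}/(8\pi)$, whose Fourier transform is $1/(1+q^2/\Lambda^2)^2$, and the Zemach radius is then $35/(8\Lambda)$. The Monte Carlo sketch below estimates the convolution (348) for this case; the function names are hypothetical, and the exponential densities are an assumption tied to the dipole fit.

```python
import math
import random


def sample_exponential_density(lam, rng):
    """Draw a point from rho(r) = lam^3/(8 pi) exp(-lam r), the density whose
    Fourier transform is the dipole form factor 1/(1 + q^2/lam^2)^2."""
    r = rng.gammavariate(3.0, 1.0 / lam)        # radial density ~ r^2 exp(-lam r)
    cos_t = rng.uniform(-1.0, 1.0)               # isotropic direction
    phi = rng.uniform(0.0, 2.0 * math.pi)
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    return (r * sin_t * math.cos(phi), r * sin_t * math.sin(phi), r * cos_t)


def zemach_radius_mc(lam=1.0, n=100_000, seed=1):
    """Monte Carlo estimate of Eq. (348) with rho_E = rho_M = exponential density."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        a = sample_exponential_density(lam, rng)
        b = sample_exponential_density(lam, rng)
        total += math.dist(a, b)                  # |r - r'|
    return total / n


print(zemach_radius_mc(), 35 / 8)   # Monte Carlo estimate vs 35/(8 lam) for lam = 1
```

Sampling in coordinate space makes the meaning of the "weighted convolution" concrete; the statistical error of the estimate is a fraction of a percent for the default sample size.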
Parametrically the result in Eq. (347) is of order $(Z\alpha)(m/\Lambda)E_F$, where $\Lambda$ is the form factor scale. This means that this correction should be considered together with other recoil corrections, even though it was obtained from a nonrecoil skeleton integral. The simple coordinate form of the result in Eq. (347) suggests that it might have an intuitively clear interpretation. This is indeed the case, and the expression for the Zemach correction was originally derived from simple nonrelativistic quantum mechanics without any field theory [166]. Let us describe the main steps of this quite transparent derivation. Recall first that the main Fermi contribution to hyperfine splitting in Eq. (271) is simply a matrix element of the one-photon exchange which, due to the local nature of the magnetic interaction, is proportional to the value of the Schrödinger wave function squared at the origin, $|\psi(0)|^2$. However, the nuclear magnetic
moment is not pointlike, but distributed over a finite region described by the magnetic moment density $\rho_M(\mathbf r)$. This effect can be taken into account in the matrix element for the leading contribution to HFS with the help of the obvious substitution

$$|\psi(0)|^2 \to \int d^3r\,\rho_M(\mathbf r)\,|\psi(\mathbf r)|^2. \tag{349}$$
Hence, the correction to the leading contribution to HFS depends on the behavior of the bound-state wave function near the origin. The ordinary Schrödinger–Coulomb wave function for the ground state behaves near the origin as $\exp(-m_rZ\alpha r) \approx 1 - m_rZ\alpha r$. For a very short-range nonlocal source of the electric field, the wave function behaves as

$$\psi(\mathbf r) \approx 1 - m_rZ\alpha\int d^3r'\,|\mathbf r-\mathbf r'|\,\rho_E(\mathbf r'), \tag{350}$$
as may easily be checked using the Green function of the Laplacian operator [166]. Substituting Eq. (350) into Eq. (349), we again arrive at the Zemach correction, but with the reduced mass factor instead of the electron mass in Eq. (347). The difference between these two results is of order $m/M$, and might become important in a systematic treatment of the corrections of second order in the electron–proton mass ratio. The quantum mechanical derivation also explains the sign of the Zemach correction: spreading of the magnetic and electric charge densities weakens the interaction and consequently diminishes hyperfine splitting, in accordance with the analytic result in Eq. (347). The Zemach correction is essentially a nontrivial weighted integral of the product of the electric and magnetic densities, normalized to unity. Unlike the rms proton charge radius, which determines the main proton size correction to the Lamb shift, it cannot be measured directly (compare the case of the proton size correction to the Lamb shift of order $(Z\alpha)^5$ in Eq. (168), which depends on the third Zemach moment). This means that the correction in Eq. (347) may only conditionally be called the proton size contribution. The Zemach correction was calculated numerically in a number of recent papers [25,289,290]. The most straightforward approach is to use the phenomenological dipole fit for the Sachs form factors of the proton

$$G_E(k^2) = \frac{G_M(k^2)}{1+\kappa} = \frac{1}{(1-k^2/\Lambda^2)^2}, \tag{351}$$
with the parameter $\Lambda = 0.898(13)M$, where $M$ is the proton mass. Substituting this parametrization into the integral in Eq. (346), one obtains $\Delta E = -38.72(56)\times10^{-6}E_F$ [289] for the Zemach correction. The uncertainty in the brackets accounts only for the uncertainty in the value of the parameter $\Lambda$; the uncertainty introduced by the approximate nature of the dipole fit is completely ignored, so the total uncertainty could be significantly underestimated in such an approach.
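The dipole numbers in this and the following paragraph can be reproduced approximately with a short script. The sketch assumes the dipole form (351) for both form factors and writes the dimensionless part of Eq. (346) as $I(u) = \int_0^u dx\,[1-(1+x^2)^{-4}]/x^2$ with $x = k/\Lambda$, so that $\langle r\rangle = 4I(\infty)/(\pi\Lambda)$ and Eq. (347) applies for the ground state. The substitution $x = \tan t$ turns the integrand into the smooth function $1+\cos^2 t+\cos^4 t+\cos^6 t$. The physical constants are approximate illustrative inputs, and all function names are hypothetical.

```python
import math

# Assumed approximate inputs (illustrative only)
ALPHA = 7.2973525693e-3
HBAR_C = 197.3269804       # MeV fm
M_E = 0.51099895           # electron mass, MeV
M_P = 938.27208816         # proton mass, MeV


def dipole_integral(upper=math.inf, n=2000):
    """I(u) = int_0^u [1 - (1 + x^2)^-4] / x^2 dx via x = tan t and Simpson's rule.
    After the substitution the integrand is 1 + cos^2 t + cos^4 t + cos^6 t."""
    b = math.pi / 2 if math.isinf(upper) else math.atan(upper)
    h = b / n                                   # n must be even for Simpson's rule
    f = lambda t: sum(math.cos(t) ** (2 * j) for j in range(4))
    s = f(0.0) + f(b) + sum((4 if i % 2 else 2) * f(i * h) for i in range(1, n))
    return s * h / 3


def zemach_correction(lam_mev, z=1):
    """Delta E / E_F = -2 (Z alpha) m <r>, with <r> = 4 I(inf) / (pi Lambda), Eqs. (346)-(347)."""
    r_zemach_fm = 4.0 / (math.pi * lam_mev) * dipole_integral() * HBAR_C
    return -2.0 * z * ALPHA * M_E / HBAR_C * r_zemach_fm


print(f"I = {dipole_integral():.6f}  vs  35*pi/32 = {35 * math.pi / 32:.6f}")
print(f"Zemach correction: {zemach_correction(0.898 * M_P) * 1e6:.2f} x 10^-6 E_F")
print(f"share of the integral from k < Lambda/2: {dipole_integral(0.5) / dipole_integral():.3f}")
print(f"Lambda/M for r_p = 0.862 fm: {math.sqrt(12) * HBAR_C / 0.862 / M_P:.3f}")
```

For $\Lambda = 0.898M$ the script gives about $-38.72\times10^{-6}E_F$; roughly half of the integral comes from $k < \Lambda/2$, and $r_p = 0.862$ fm corresponds to $\Lambda \approx 0.845M$ through $r_p = \sqrt{12}/\Lambda$.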
We used the same value of $\Lambda$ in Section 6.1.3 for the calculation of the correction to the Lamb shift in hydrogen generated by the radiative insertions in the proton line. Since this correction depends on $\Lambda$ only logarithmically, small changes of its value do not affect the result for the proton line contribution to the Lamb shift.
Fig. 99. Diagrams with one form factor insertion for total elastic nuclear size corrections of order $(Z\alpha)E_F$. Fig. 100. Diagrams with two form factor insertions for total elastic nuclear size corrections of order $(Z\alpha)E_F$.
As was emphasized in [290], integration momenta which are small in comparison with the proton mass play an important role in the integral in Eq. (346): about fifty percent of the integral comes from the integration region $k < \Lambda/2$. But the dipole form factor parameter $\Lambda$ is simply related to the rms proton radius, $r_p = \sqrt{12}/\Lambda$, and one can try to use the empirical value of the proton radius as an input for the calculation of the low momentum contribution to the Zemach correction. Such an approach, which assumes the validity of the dipole parametrization for both form factors at small momentum transfers but with the parameter $\Lambda$ determined by the proton radius, leads to the Zemach correction $\Delta E = -41.07(75)\times10^{-6}E_F$ [290] for $r_p = 0.862$ fm ($\Lambda = 0.845(12)M$). As we will see below in Section 16.1.5 (see Eq. (394)), modern spectroscopic data indicate an even higher value of the proton charge radius, $r_p = 0.891(18)$ fm. The respective Zemach correction is $\Delta E = -42.4(1.1)\times10^{-6}E_F$. We will use this last value of the Zemach correction for further numerical estimates. New experimental data on the proton charge radius, and more numerical work with the existing experimental data on the proton form factors, could result in a more accurate value of the Zemach correction. 14.1.1.2. Recoil correction of order $(Z\alpha)(m/M)E_F$. For muonium, the skeleton two-photon exchange diagrams in Fig. 81 generated, after subtraction of the heavy pole contribution, a recoil correction of order $(Z\alpha)(m/M)E_F$ to hyperfine splitting. Calculation of the respective recoil corrections for hydrogen does not reduce to substitution of the proton mass instead of the muon mass in Eq. (318), but requires an account of the proton anomalous magnetic moment and the proton form factors. As we have seen, insertions of the nontrivial proton form factors in the external field skeleton diagram push the characteristic integration momenta into the region determined by the proton size and, as a result, the respective contribution to HFS in Eq. (347) contains the small proton size factor $m/\Lambda$. The scale of the proton form factor is of the order of the proton mass, and thus the Zemach correction in Figs. 97 and 98 is of the same order as the recoil contributions in Figs. 99 and 100, generated by the form factor insertions in the skeleton diagrams in Fig. 81. It is natural to consider all contributions of order $(Z\alpha)E_F$ together and to call the sum of these corrections the total proton size correction. However, we have two different parameters, $m/\Lambda$ and $m/M$, and the Zemach and the recoil corrections admit separate consideration. The recoil part of the proton size correction of order $(Z\alpha)E_F$ was first considered in [258,259]. In these works the existence of nontrivial nuclear form factors was ignored, and the proton was considered as a heavy particle without nontrivial momentum dependent form factors but with an anomalous magnetic moment. The result of such a calculation is most conveniently written in terms of the "elementary" proton Fermi energy $\tilde E_F$, which does not include the contribution of the proton anomalous magnetic moment (compare Eq. (315) in the muonium case). The calculation of this correction coincides almost exactly with the one in the case of the leading muonium recoil
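correction in Eq. (318), and generates an ultraviolet divergent result [258,259,291]

$$\Delta E = -\frac{3mM}{M^2-m^2}\,\frac{Z\alpha}{\pi}\left\{\left(1-\frac{\kappa^2}{4}\right)\ln\frac{M}{m}+\frac{\kappa^2}{4}\left(-\frac{1}{6}+3\ln\frac{\Lambda}{M}\right)\right\}\tilde E_F, \tag{352}$$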
where $\Lambda$ is an arbitrary ultraviolet cutoff. The ultraviolet divergence is generated by the diagrams with insertions of two anomalous magnetic moments in the heavy particle line. This should be expected, since quantum electrodynamics of elementary particles with nonvanishing anomalous magnetic moments is nonrenormalizable. For the real proton we have to include the proton form factor in the vertices; it decays fast enough at large momentum transfer and neutralizes the divergence. Insertion of the form factors effectively cuts off the momentum integration at the form factor scale $\Lambda$, which is slightly smaller than the proton mass. A precise calculation needs an accurate rederivation of the recoil correction with account of the form factors, but one important feature of the expected result is obvious immediately. The factor in the braces in Eq. (352) is numerically small precisely for the physical value of the proton anomalous magnetic moment, $\kappa = 1.792847386(63)$ in nuclear magnetons, and an ultraviolet cutoff of the order of the proton mass [289]. We should expect this accidental suppression of the recoil correction to survive the inclusion of the form factors, so that this correction will in the end be numerically much smaller than the Zemach correction, even though the two corrections are parametrically of the same order. Since the Zemach and recoil corrections are parametrically of the same order of magnitude, often only their sum was considered in the literature. The first calculation of the total proton size correction of order $(Z\alpha)E_F$ with form factors was done in [292], followed by the calculations in [293,291]. The Zemach and recoil corrections were calculated separately in [25,289]. The results of all these works essentially coincide; some minor differences are due to the differences in the parameters of the dipole nucleon form factors used for the numerical calculations. We will present here the results in the form obtained in [293,291].
These results are independent of the specific parametrization of the form factors and especially convenient for the further consideration of the inelastic polarizability contributions. The total proton size correction of order $(Z\alpha)E_F$ generated by the diagrams with insertions of the total proton form factors in Figs. 99 and 100 may easily be calculated. The resulting integral contains two contributions of the pointlike proton with an anomalous magnetic moment which were already taken into account. One is the infrared divergent nonrecoil contribution corresponding to the external field skeleton integral in Fig. 67 with insertion of the anomalous magnetic moment; the other is the pointlike proton recoil correction in Eq. (352). After subtraction of these contributions we obtain an expression for the remaining proton size correction in the form of a Euclidean four-dimensional integral [293,291]
dk 2k(2k#k )[F (F #F )!(1#i)]#6kk [F (F #F )!i(1#i)] k 4k #(k/M) (2k#k )(F !i) Za m ! EI . (353) p M $ 2
*E"
Strictly speaking, the original works [258,259] contain only the ultraviolet divergent elementary proton result in Eq. (352), which turns into the ultraviolet finite muonium result in Eq. (318) if the anomalous magnetic moment $\kappa$ is equal to zero.
Fig. 101. Diagrams for nuclear polarizability correction of order $(Z\alpha)E_F$.
The last term in the braces is ultraviolet divergent, but it exactly cancels in the sum with the point proton contribution in Eq. (352). The sum of the contributions in Eqs. (352) and (353) is the total proton size correction, including the Zemach correction. According to the numerical calculation in [289], it is equal to $\Delta E = -33.50(55)\times10^{-6}E_F$. As was discussed above, the Zemach correction included in this result strongly depends on the precise value of the proton radius, while the numerically much smaller recoil correction is less sensitive to the small momentum behavior of the proton form factor and has a smaller uncertainty. For further numerical estimates we will use the estimate $\Delta E = 5.22(1)\times10^{-6}E_F$ of the recoil correction obtained in [289]. 14.1.1.3. Nuclear polarizability contribution of order $(Z\alpha)E_F$. Up to now we considered only the contributions of order $(Z\alpha)E_F$ to hyperfine splitting in hydrogen generated by the elastic intermediate nuclear states. As was first realized by Iddings [293], the inelastic contributions in Fig. 101 admit a nice representation in terms of the spin-dependent proton structure functions $G_1$ and $G_2$ [293,291]

$$\Delta E = (\Delta_1+\Delta_2)\,\frac{Z\alpha}{2\pi}\,\frac{m}{M}\,\tilde E_F \equiv \delta_p E_F, \tag{354}$$

where
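$$\Delta_1 = \frac{9}{4}\int_0^\infty\frac{dQ^2}{Q^2}\left[F_2^2(Q^2) + 4M\int_{\nu_{th}}^\infty\frac{d\nu}{\nu^2}\,\beta_1\!\left(\frac{\nu^2}{Q^2}\right)G_1(Q^2,\nu)\right], \tag{355}$$

$$\Delta_2 = 12M\int_0^\infty\frac{dQ^2}{Q^2}\int_{\nu_{th}}^\infty\frac{d\nu}{\nu^2}\,\beta_2\!\left(\frac{\nu^2}{Q^2}\right)G_2(Q^2,\nu). \tag{356}$$

The inelastic pion–nucleon threshold $\nu_{th}$ may be written as

$$\nu_{th}(Q^2) = m_\pi + \frac{m_\pi^2+Q^2}{2M}, \tag{357}$$

and the auxiliary functions $\beta_i$ have the form

$$\beta_1(x) = -3x + 2x^2 + 2(2-x)\sqrt{x(1+x)}, \tag{358}$$

$$\beta_2(x) = 1 + 2x - 2\sqrt{x(1+x)}. \tag{359}$$

The structure functions $G_1$ and $G_2$ may be measured in inelastic scattering of polarized electrons on polarized protons. The difference between the spin antiparallel and spin parallel cross sections has the form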
dpts dptt 4pa E#E cos h ! " G (Q, l)#G (Q, l) , l dq dE dq dE EQ
(360)
where $E$ and $E'$ are the initial and final electron energies, $\nu = E - E'$, and $\theta$ is the electron scattering angle in the laboratory frame. Only partial experimental data are known today for the proton spin-dependent structure functions, and there is not enough information to calculate the integrals in Eq. (354) directly. First estimates of the polarizability correction were obtained a long time ago [294,293,295,291,296,297]. One popular approach is to consider the polarizability correction directly as a sum of contributions of all intermediate nuclear states. Then the leading contributions to this correction are generated by the low-lying states, and the contribution of the $\Delta$ isobar was estimated many times [294,293,295,291,296]. The latest result [298], consistent with the earlier estimates, is

$$\delta(\Delta) = -0.12\times10^{-6}. \tag{361}$$

The total polarizability contribution in this approach may be obtained after summation over the contributions of all relevant intermediate states. About thirty years ago the general properties of the structure functions and the experimental data known at the time were used to set rigorous bounds on the polarizability contribution in Eq. (354) [299,300] (see also reviews in [301,289])

$$|\delta| \leq 4\times10^{-6}. \tag{362}$$

The problem of the polarizability contribution clearly requires a new consideration that takes into account more recent experimental data.

14.1.2. Recoil corrections of order $(Z\alpha)^2(m/M)E_F$
Recoil corrections of relative order $(Z\alpha)^2(m/M)$ are connected with the diagrams with three exchanged photons (see Fig. 102). Due to the Caswell–Lepage cancellation [260,261], recoil corrections of order $(Z\alpha)^2(m/M)E_F$ in muonium (see discussion in Section 12.1.2) originate from exchanged momenta of the order of the electron mass and smaller. The same small exchanged momenta are also relevant in the case of hydrogen. This means that, unlike the case of the recoil correction of order $(Z\alpha)(m/M)E_F$ considered above, the proton structure is irrelevant in the calculation of corrections of order $(Z\alpha)^2(m/M)E_F$. However, we cannot simply use the muonium formulae for hydrogen, because the muonium calculations ignored the anomalous magnetic moment of the heavy particle. A new consideration [289] of the recoil corrections of order $(Z\alpha)^2(m/M)E_F$ in the case of a heavy particle with an anomalous magnetic moment resulted in the correction
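$$\Delta E = \left\{\left[2(1+\kappa)+\frac{7\kappa^2}{4}\right]\ln(Z\alpha)^{-1} - \left[8(1+\kappa)-\frac{\kappa(12-11\kappa)}{4}\right]\ln 2 + \frac{65}{18} + \frac{\kappa(11+31\kappa)}{36}\right\}\frac{(Z\alpha)^2}{1+\kappa}\,\frac{m}{M}\,E_F. \tag{363}$$

Fig. 102. Diagrams for recoil corrections of order $(Z\alpha)^2E_F$.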
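As a rough numerical check of Eq. (363), the Python sketch below evaluates the bracket for hydrogen. The proton anomalous magnetic moment $\kappa = 1.792847$ is an assumed input that is not quoted in this section; $\alpha$, $m/M$ and $E_F$ are taken from Eqs. (385), (386) and Table 19. Setting $\kappa = 0$ reduces the expression to the muonium bracket $2\ln(Z\alpha)^{-1} - 8\ln 2 + 65/18$.

```python
import math

# Assumed inputs: kappa (proton anomalous magnetic moment) is not quoted in this
# section; alpha, m/M and E_F are taken from Eqs. (385), (386) and Table 19.
KAPPA = 1.792847
ALPHA = 1 / 137.03599958
M_OVER_MP = 1 / 1836.1526665
E_F_KHZ = 1_418_840.11
Z = 1


def recoil_za2(kappa: float) -> float:
    """Relative recoil correction Delta E / E_F of order (Z alpha)^2 (m/M), Eq. (363)."""
    za = Z * ALPHA
    log_part = (2 * (1 + kappa) + 7 * kappa**2 / 4) * math.log(1 / za)
    ln2_part = (8 * (1 + kappa) - kappa * (12 - 11 * kappa) / 4) * math.log(2)
    const_part = 65 / 18 + kappa * (11 + 31 * kappa) / 36
    return (log_part - ln2_part + const_part) * za**2 / (1 + kappa) * M_OVER_MP


rel = recoil_za2(KAPPA)
print(f"Delta E = {rel:.4e} E_F = {rel * E_F_KHZ:.2f} kHz")
```

Under these assumptions the sketch gives about $0.459\times10^{-6}E_F$, or 0.65 kHz, close to the entry $0.4585\times10^{-6}$ in Table 19; the small remaining difference presumably comes from slightly different input constants.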
For a vanishing anomalous magnetic moment of the heavy particle ($\kappa = 0$) this correction turns into the muonium result in Eqs. (319) and (320). Calculation of the respective radiative-recoil correction of order $\alpha(Z\alpha)^2(m/M)E_F$ in the skeleton integral approach is quite straightforward and may readily be done. However, numerically the correction in Eq. (363) is smaller than the uncertainty of the Zemach correction, and calculation of corrections to this result does not seem to be an urgent task.

14.1.3. Correction of order $(Z\alpha)^2m^2\langle r^2\rangle E_F$
The leading nuclear size correction of order $(Z\alpha)^2m^2\langle r^2\rangle E_F$ may easily be calculated in the framework of nonrelativistic perturbation theory if one takes as one of the perturbation potentials the potential corresponding to the main proton size contribution to the Lamb shift in Eq. (158). The other perturbation potential is the potential in Eq. (297) responsible for the main Fermi contribution to HFS (compare the calculation of the leading logarithmic contribution of order $\alpha(Z\alpha)^2(m/M)E_F$ in Section 12.2.4). The result is [290]

$$\Delta E = -\frac{2}{3}(Z\alpha)^2\ln(Z\alpha)^{-2}\,m^2\langle r^2\rangle E_F = -1.7\times10^{-9}E_F. \tag{364}$$

This tiny correction is too small to be of any phenomenological interest for hydrogen.

14.1.4. Correction of order $(Z\alpha)^3(m/\Lambda)E_F$
The logarithmic nuclear size correction of order $(Z\alpha)^3(m/\Lambda)E_F$ may simply be obtained from the Zemach correction if one takes into account the Dirac correction to the Schrödinger–Coulomb wave function in Eq. (96) [290]

$$\Delta E = -(Z\alpha)^3\ln(Z\alpha)^{-2}\,m\langle r\rangle E_F. \tag{365}$$

The corrections in Eqs. (364) and (365) are negligible for the ground state hyperfine splitting in hydrogen. However, it is easy to see that these corrections are state dependent and contribute to the difference of the hyperfine splittings in the 2S and 1S states, $8\Delta E(2S) - \Delta E(1S)$.
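The two size-dependent corrections can be evaluated with a few lines of Python. This minimal sketch assumes the forms $-\frac23(Z\alpha)^2\ln(Z\alpha)^{-2}m^2\langle r^2\rangle$ for Eq. (364) and $-(Z\alpha)^3\ln(Z\alpha)^{-2}m\langle r\rangle$ for Eq. (365). It further assumes $r_p = 0.862$ fm (the radius of [158]) and $\hbar/(m_ec) = 386.15926$ fm, and extracts $m\langle r\rangle$ from the Zemach entry $-2(Z\alpha)m\langle r\rangle = -42.4\times10^{-6}$ of Table 19.

```python
import math

# Assumed inputs: reduced electron Compton wavelength hbar/(m_e c) in fm, proton rms
# charge radius r_p = 0.862 fm as in [158], and the Zemach entry of Table 19,
# -2 (Z alpha) m <r> = -42.4e-6, used to extract m <r>.
ALPHA = 1 / 137.03599958
LAMBDA_E_FM = 386.15926
R_P_FM = 0.862
ZEMACH_REL = -42.4e-6
E_F_KHZ = 1_418_840.11

za = ALPHA
m2_r2 = (R_P_FM / LAMBDA_E_FM) ** 2      # m^2 <r^2>, dimensionless
m_r = -ZEMACH_REL / (2 * za)             # m <r>, dimensionless
log_term = math.log(za ** -2)            # ln (Z alpha)^-2

de_364 = -(2 / 3) * za**2 * log_term * m2_r2   # Eq. (364), in units of E_F
de_365 = -(za**3) * log_term * m_r             # Eq. (365), in units of E_F

for label, rel in (("(364)", de_364), ("(365)", de_365)):
    print(f"Eq. {label}: {rel:.2e} E_F = {rel * E_F_KHZ:.3f} kHz")
```

With these inputs the sketch gives about $-1.7\times10^{-9}E_F$ for Eq. (364) and about $-1.1\times10^{-8}E_F$ (roughly $-0.016$ kHz) for Eq. (365), in line with the values quoted in the text and in Table 19.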
The respective formulae were obtained in [302,290] and are of phenomenological interest for the HFS in the 2S state of hydrogen [303,304], deuterium [305], and the $^3\mathrm{He}^+$ ion [306], and also for the HFS in the 2P state [307] (see also the review in [308]).

14.2. Radiative corrections to nuclear size and recoil effects
14.2.1. Radiative-recoil corrections of order $\alpha(Z\alpha)(m/\Lambda)E_F$
Diagrams for the radiative corrections to the Zemach contribution in Figs. 103 and 104 are obtained from the diagrams in Figs. 97 and 98 by insertions of the radiative photons in the electron line or of the polarization operator in the external photon legs. Analytic expressions for the nuclear
Fig. 103. Electron-line radiative correction to the Zemach contribution.
Fig. 104. Photon-line radiative correction to the Zemach contribution.
size corrections of order $\alpha(Z\alpha)E_F$ are obtained from the integral for the Zemach correction in Eq. (346) by insertions of the electron factor or of the one-loop polarization operator in the integrand in Eq. (346). The effective integration momenta in Eq. (346) are determined by the scale of the proton form factor, and so we need only the leading terms in the high-momentum expansion of the polarization operator and of the electron factor for calculation of the radiative corrections to the Zemach correction. The leading term in the high-momentum asymptotic expansion of the electron factor is simply a constant (see the text above Eq. (326)), and the correction to hyperfine splitting is the product of this constant and the Zemach correction [290]

$$\Delta E = \frac{5}{2}\,\frac{\alpha(Z\alpha)}{\pi}\,m\langle r\rangle E_F. \tag{366}$$
The contribution of the polarization operator is logarithmically enhanced due to the logarithmic asymptotics of the polarization operator. This logarithmically enhanced contribution of the polarization operator is equal to the doubled product of the Zemach correction and the leading term in the polarization operator expansion (an extra factor two is necessary to take into account two ways to insert the polarization operator in the external photon legs in Figs. 97 and 98)
$$\Delta E = -\frac{4}{3}\,\frac{\alpha(Z\alpha)}{\pi}\,\ln\frac{\Lambda^2}{m^2}\,m\langle r\rangle E_F. \tag{367}$$
Calculation of the nonlogarithmic part of the polarization operator insertion requires more detailed information on the proton form factors, and using the dipole parametrization one obtains [290]
$$\Delta E = -\frac{4}{3}\,\frac{\alpha(Z\alpha)}{\pi}\left(\ln\frac{\Lambda^2}{m^2}-\frac{317}{105}\right)m\langle r\rangle E_F. \tag{368}$$
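A similar sketch evaluates Eqs. (366)–(368). The dipole scale $\Lambda^2 = 0.71$ GeV² and the electron mass in GeV are assumed inputs, and $m\langle r\rangle$ is again extracted from the Zemach entry $-2(Z\alpha)m\langle r\rangle = -42.4\times10^{-6}$ of Table 19.

```python
import math

# Assumed inputs: electron mass in GeV, dipole form-factor scale Lambda^2 = 0.71 GeV^2,
# and m <r> extracted from the Zemach entry -2 (Z alpha) m <r> = -42.4e-6 of Table 19.
ALPHA = 1 / 137.03599958
M_E_GEV = 0.51099895e-3
LAMBDA2_GEV2 = 0.71
ZEMACH_REL = -42.4e-6
E_F_KHZ = 1_418_840.11

za = ALPHA
m_r = -ZEMACH_REL / (2 * za)                     # m <r>
prefactor = ALPHA * za / math.pi * m_r           # alpha (Z alpha) / pi * m <r>
log_lm = math.log(LAMBDA2_GEV2 / M_E_GEV**2)     # ln(Lambda^2 / m^2)

de_366 = 5 / 2 * prefactor                            # electron-line insertion
de_367 = -4 / 3 * prefactor * log_lm                  # polarization, logarithm only
de_368 = -4 / 3 * prefactor * (log_lm - 317 / 105)    # polarization, dipole fit

for label, rel in (("(366)", de_366), ("(367)", de_367), ("(368)", de_368)):
    print(f"Eq. {label}: {rel:.3e} E_F = {rel * E_F_KHZ:.2f} kHz")
```

The electron-line and full photon-line values come out near $0.12\times10^{-6}$ and $-0.77\times10^{-6}$ (0.17 and $-1.10$ kHz), the entries listed in Table 19; the purely logarithmic Eq. (367) alone overshoots the magnitude, which shows why the nonlogarithmic constant $317/105$ matters.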
14.2.2. Radiative-recoil corrections of order $\alpha(Z\alpha)(m/M)E_F$
Radiative-recoil corrections of order $\alpha(Z\alpha)(m/M)E_F$ are similar to the radiative corrections to the Zemach contribution, and in principle admit a straightforward calculation in the framework of the skeleton integral approach. Leading logarithmic contributions of this order were considered in [289,290]. The logarithmic estimate in [290] gives

$$\Delta E = 0.11(2)\times10^{-6}E_F \tag{369}$$

for the contribution of the electron-line radiative insertions, and

$$\Delta E = -0.02\times10^{-6}E_F \tag{370}$$

for the contribution of the vacuum polarization insertions in the exchanged photons.
Numerically these contributions are much smaller than the uncertainty of the Zemach correction.

14.2.3. Heavy particle polarization contributions
Muon and heavy particle polarization contributions to hyperfine splitting in muonium were considered in Sections 11.3.1 and 12.2.1.6. In the external field approximation the skeleton integral with the muon polarization insertion coincides with the respective integral for muonium (compare Eq. (283) and the discussion after this equation), and one easily obtains [61]

$$\Delta E = \frac{3}{4}\,\alpha(Z\alpha)\,\frac{m}{m_\mu}\,E_F. \tag{371}$$

This result gives a good idea of the magnitude of the muon polarization contribution, since the muon is relatively light in comparison to the scale of the proton form factor, which was ignored in this calculation. The total muon polarization contribution may be calculated without great effort, but due to its small magnitude such a calculation is of minor phenomenological significance and was never done. Only an estimate of the total muon polarization contribution exists in the literature [290]

$$\Delta E = 0.07(2)\times10^{-6}E_F. \tag{372}$$

Hadronic vacuum polarization in the external field approximation for the pointlike proton was also calculated in [61]. Such a calculation may serve only as an order of magnitude estimate, since both the external field approximation and the neglect of the proton form factor are not justified in this case: the scale of the hadronic polarization contribution is determined by the same $\rho$-meson mass that determines the scale of the proton form factor. Again, a more accurate calculation is feasible but does not seem to be warranted, and only an estimate of the hadronic polarization contribution appears in the literature [290]

$$\Delta E = 0.03(1)\times10^{-6}E_F. \tag{373}$$
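To put the statement about the magnitude of the muon polarization contribution into numbers, the sketch below evaluates Eq. (371) for hydrogen ($Z = 1$) with $\alpha$ from Eq. (385), the muon–electron mass ratio from Eq. (387) and $E_F$ from Table 19, and compares it with the estimate in Eq. (372).

```python
# Inputs: alpha from Eq. (385), m_mu/m from Eq. (387), E_F (kHz) from Table 19.
ALPHA = 1 / 137.03599958
MMU_OVER_ME = 206.768277
E_F_KHZ = 1_418_840.11

rel_371 = 3 / 4 * ALPHA * ALPHA / MMU_OVER_ME   # Eq. (371) with Z = 1, units of E_F
rel_372 = 0.07e-6                               # central value of the estimate, Eq. (372)

print(f"Eq. (371): {rel_371:.2e} E_F = {rel_371 * E_F_KHZ:.2f} kHz")
print(f"ratio to Eq. (372): {rel_371 / rel_372:.1f}")
```

The external-field value, roughly $0.19\times10^{-6}E_F$, is of the same order as the estimate $0.07(2)\times10^{-6}E_F$ of Eq. (372) but about three times larger, consistent with its use only as an indication of the magnitude.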
14.3. Weak interaction contribution The weak interaction contribution to hyper"ne splitting in hydrogen is easily obtained by generalization of the muonium result in Eq. (343) G 3mM g $ E +5.8;10\ kHz . *E" 1#i (2 4pZa $
(374)
Two features of this result deserve some comment. First, the axial coupling constant of the composite proton is renormalized by the strong interactions, and its experimental value is $g_A = 1.267$, unlike the case of the elementary muon, where it was equal to unity. Second, the signs of the weak interaction correction are different in the cases of muonium and hydrogen [196].
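The weak-interaction entry of Table 19, $\frac{g_A}{1+\kappa}\frac{G_F}{\sqrt2}\frac{3mM}{4\pi Z\alpha}E_F$, can be evaluated in the same spirit. In this sketch the Fermi constant, the electron and proton masses in GeV and $\kappa$ are assumed inputs not quoted in this section; $g_A = 1.267$ is the value given above.

```python
import math

# Assumed inputs (not quoted in this section): Fermi constant, electron and proton
# masses in GeV, and kappa. g_A = 1.267 is the value quoted in the text.
G_F = 1.1663787e-5        # GeV^-2
M_E = 0.51099895e-3       # GeV
M_P = 0.93827208          # GeV
KAPPA = 1.792847
G_A = 1.267
ALPHA = 1 / 137.03599958
E_F_KHZ = 1_418_840.11

# Relative weak-interaction correction Delta E / E_F for hydrogen (Z = 1)
rel_weak = G_A / (1 + KAPPA) * G_F / math.sqrt(2) * 3 * M_E * M_P / (4 * math.pi * ALPHA)
print(f"{rel_weak:.2e} E_F = {rel_weak * E_F_KHZ:.3f} kHz")
```

This gives about $0.059\times10^{-6}E_F$, i.e. roughly 0.08 kHz, consistent with the rounded values $0.06\times10^{-6}$ and 0.08 kHz in Table 19.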
15. Hyperfine splitting in muonic hydrogen
We have considered level shifts in muonic hydrogen in Section 9, neglecting the hyperfine structure. However, future measurements (see discussion in Section 16.1.10) will be done on the components of the hyperfine structure, and knowledge of this hyperfine structure is crucial for the comparison of the theoretical predictions for the Lamb shift in muonic hydrogen with the experimental data. We will consider below the hyperfine structure in the states 2S and 2P.

15.1. Hyperfine structure of the 2S state
Light electron loops are enhanced in muonic hydrogen, and they produce the largest contribution to the Lamb shift in muonic hydrogen (see Section 4.3). Unlike the Lamb shift, where the leading contribution is a radiative (loop) correction, the leading contribution to hyperfine splitting already exists at the tree level (see discussion in Section 4.4). Hence, the Fermi contribution in Eq. (271) (with the natural substitution of the heavy particle mass and anomalous magnetic moment instead of the respective muon characteristics) remains by far the largest contribution to HFS in muonic atoms. The leading electron vacuum polarization contribution to HFS, generated by the exchange of one photon with a polarization insertion in Fig. 105, is enhanced in muonic atoms and becomes the next largest individual contribution to HFS after the Fermi contribution. The reason for this enhancement is the same as for the respective enhancement in the case of the Lamb shift: electron vacuum polarization distorts the field of an external source at distances of about $1/m_e$, and for muonic atoms the wave function is concentrated in a region of comparable size determined by the Bohr radius $1/(mZ\alpha)$.
Hence, the e!ect of the electronic vacuum polarization on the HFS is much stronger in muonic atoms than in the electronic atoms where the region where the external potential is distorted by the vacuum potential is negligible in comparison with the e!ective radius of the wave function. As a result the electron vacuum polarization contribution to HFS in light muonic atoms is of order aE [309], to be compared with the leading polarization contribu$ tion in electronic hydrogen of order a(Za)E in Eq. (283). Thus, in order to translate the hyper"ne $ results for ordinary hydrogen into the results for muonic hydrogen we have to consider additional contributions which are due to the polarization insertions. Notice "rst of all that the nonrecoil results in Tables 12}17 may be directly used in the case of muonic hydrogen. Since we are interested in the contribution to HFS in 2S state we should properly restore the dependence on the principal quantum number n, which was sometimes omitted above. This dependence is known for the results in Table 12 and all corrections in Tables 13 and 14 are state independent. For the corrections in Tables 15}17 only the leading logarithmic contributions are state independent, and the exact n dependence of the other terms is often unknown. However, all corrections in these Tables are of order 1;10\ meV and are too small
Fig. 105. Electron vacuum polarization insertion in the exchanged photon.
Fig. 106. Leading electron vacuum polarization correction to HFS in muonic hydrogen.
for phenomenological goals (see discussion in Section 16.1.10). The sum of all corrections in Tables 12–17 for the 2S state is equal to

$$\Delta E = 22.8322\ \mathrm{meV}. \tag{375}$$

Vacuum polarization corrections of order $\alpha E_F$ to HFS may easily be calculated with the help of the spin–spin term in the Breit-like potential corresponding to the exchange of a radiatively corrected photon in Fig. 105. This spin–spin potential can be obtained by substituting the spin–spin term corresponding to a massive exchange [211]
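$$V(2m_e\zeta) = \frac{8}{3}\,\frac{Z\alpha}{mM}\,(1+a_\mu)(1+\kappa)\,(\mathbf{s}\cdot\mathbf{s}_p)\left[\pi\delta(\mathbf{r}) - \frac{(2m_e\zeta r)^2}{4r^3}\,e^{-2m_e\zeta r}\right] \tag{376}$$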
in the integral in Eq. (220) instead of V (2m f). 4. C Correction to HFS is then given by a sum of the "rst- and second-order perturbation theory contributions similar to the respective contribution to the Lamb shift in Eq. (222) (see Fig. 106) *E"1< 2#21< G(E )
(377)
(378)
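The bracket in the spin–spin potential of Eq. (376) is $-\tfrac14\nabla^2$ acting on the Yukawa function $e^{-\lambda r}/r$ with $\lambda = 2m_e\zeta$, in the same way as the point-like contact term comes from $-\tfrac14\nabla^2(1/r) = \pi\delta(\mathbf r)$: since $\nabla^2(e^{-\lambda r}/r) = \lambda^2e^{-\lambda r}/r - 4\pi\delta(\mathbf r)$, one gets $\pi\delta(\mathbf r) - \lambda^2e^{-\lambda r}/(4r)$. The Python sketch checks the smooth part numerically away from the origin with a finite-difference radial Laplacian; the value of $\lambda$ is arbitrary and purely illustrative.

```python
import math

LAMBDA = 2.0  # hypothetical exchange mass 2 m_e zeta, arbitrary units


def yukawa(r: float) -> float:
    """Massive-exchange analogue of 1/r: exp(-LAMBDA r) / r."""
    return math.exp(-LAMBDA * r) / r


def radial_laplacian(f, r: float, h: float = 1e-4) -> float:
    """Laplacian of a spherically symmetric f at r > 0, using (1/r) d^2(r f)/dr^2."""
    def rf(x: float) -> float:
        return x * f(x)
    return (rf(r + h) - 2 * rf(r) + rf(r - h)) / (h * h * r)


for r in (0.3, 1.0, 2.5):
    numeric = -0.25 * radial_laplacian(yukawa, r)
    analytic = -LAMBDA**2 * math.exp(-LAMBDA * r) / (4 * r)
    print(f"r = {r}: numeric {numeric:.8f}, analytic {analytic:.8f}")
```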
We should also consider other new corrections with insertions of vacuum polarization, but they all contain at least one extra factor $\alpha$, are expected to be at most about 0.001 meV, and at the present stage may safely be ignored. Next we turn to the nuclear size, recoil and structure corrections, where one cannot ignore the composite nature of the proton. As in the case of ordinary hydrogen, the main contribution is connected with the proton size corrections of order $(Z\alpha)E_F$ considered in Sections 14.1.1.1 and 14.1.1.2. The respective considerations may literally be repeated for muonic hydrogen; the only difference is that, due to the larger ratio of the muon to proton mass, separate consideration of the nonrecoil (Zemach) and recoil corrections of order $(Z\alpha)E_F$ makes even less sense than in the case of electronic hydrogen. Hence, one should consider the total contribution of order $(Z\alpha)E_F$ given by the sum of the contributions in Eqs. (352) and (353). Numerical calculations for the 2S state lead to the result [211]

$$\Delta E = -0.145\ \mathrm{meV}. \tag{379}$$
As was discussed in Sections 14.1.1.1 and 14.1.1.2, this contribution depends on the dipole parametrization of the proton form factors and on the value of the proton radius, and it can probably be improved by a dedicated analysis.
Proton polarizability contributions of order $(Z\alpha)E_F$, discussed for electronic hydrogen in Section 14.1.1.3, are notoriously difficult to evaluate. Comparing the upper bound on the inelastic contribution in Eq. (362) with the elastic contribution from [289] discussed in Section 14.1.1.2, we see that the polarizability contribution is at the level of 10% of the elastic contribution in electronic hydrogen. As a conservative estimate it was suggested in [211] to assume that the same estimate is valid for muonic hydrogen. This assumption means that the polarizability contribution in muonic hydrogen does not exceed 0.015 meV. Collecting all contributions above, we obtain the total HFS splitting in the 2S state of muonic hydrogen [211] (Table 19)

$$\Delta E = 22.745(15)\ \mathrm{meV}. \tag{380}$$
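The 10% rule used above can be checked with two divisions, using only numbers quoted in this section: the total elastic correction $33.50\times10^{-6}E_F$ from [289], the bound $4\times10^{-6}$ of Eq. (362), and the elastic muonic result of Eq. (379).

```python
# Numbers quoted in the text: total elastic (Zemach plus recoil) correction in
# electronic hydrogen from [289], the polarizability bound of Eq. (362), and the
# elastic 2S result of Eq. (379) for muonic hydrogen.
elastic_ehyd = 33.50e-6      # |Delta E| / E_F
pol_bound_ehyd = 4e-6        # upper bound on |delta|
elastic_muhyd_mev = 0.145    # |Delta E| in meV

ratio = pol_bound_ehyd / elastic_ehyd          # about 0.12, the "10% level"
pol_muhyd_mev = 0.10 * elastic_muhyd_mev       # same rule applied to muonic hydrogen
print(f"bound/elastic = {ratio:.2f}; muonic polarizability <= {pol_muhyd_mev:.4f} meV")
```

The bound is about 12% of the elastic term, and 10% of 0.145 meV is about 0.015 meV, matching the uncertainty (15) in the last digits of Eq. (380).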
As was discussed at the end of Section 14.1.1.3, we expect that the uncertainty of this result, determined by the unknown polarizability contribution, can be reduced by a new analysis.

15.2. Fine and hyperfine structure of the 2P states
The main contribution to the fine and hyperfine structure of the 2P states is described by the spin–orbit and spin–spin terms in the Breit Hamiltonian in Eq. (35) (the spin–spin terms were omitted in Eq. (35)). For a proper description of the fine and hyperfine structure we have to include in the Breit potential the anomalous magnetic moments of both constituents and restore all terms that depend on the heavy particle spin, which were also omitted in Eq. (35). Then the relevant terms in the Breit potential have the form
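$$\begin{aligned}
V &= \frac{Z\alpha}{r^3}\left[\frac{1+2a_\mu}{2m^2}+\frac{1+a_\mu}{mM}\right](\mathbf L\cdot\mathbf s) + \frac{Z\alpha}{r^3}\left[\frac{1+2\kappa}{2M^2}+\frac{1+\kappa}{mM}\right](\mathbf L\cdot\mathbf s_p)\\
&\quad - \frac{Z\alpha(1+\kappa)(1+a_\mu)}{mMr^3}\left[(\mathbf s\cdot\mathbf s_p) - 3\,\frac{(\mathbf s\cdot\mathbf r)(\mathbf s_p\cdot\mathbf r)}{r^2}\right].
\end{aligned}\tag{381}$$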
The first term in this potential describes the spin–orbit interaction of the light particle, and its matrix element determines the fine structure splitting between the $2P_{1/2}$ and $2P_{3/2}$ states. The two other terms depend on the heavy particle spin. It is easy to see that these terms mix the states with the same total angular momentum $\mathbf F = \mathbf J + \mathbf s_p$ and different $J$ [310]; in our case these are the states $2P_{1/2}^{F=1}$ and $2P_{3/2}^{F=1}$ (see Fig. 49). Thus, to find the fine and hyperfine structure of the 2P states we have to solve the elementary quantum-mechanical problem of diagonalizing a simple four by four Hamiltonian, where only a two by two submatrix is nondiagonal. Before solving this problem we have to check whether there are any other contributions to the Hamiltonian besides the terms in the Breit potential in Eq. (381). It is easy to realize that the only other contribution to the effective potential is given by the radiatively corrected one-photon exchange. The respective Breit-like potential may be obtained exactly as in Eq. (220) by integrating the spin–orbit potential corresponding to the
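Table 19
Hyperfine splitting in hydrogen, $E_F = 1\,418\,840.11(3)(1)$ kHz

Contribution | Reference | $\Delta E/E_F$ | $\Delta E$ (kHz)
Total nonrecoil contribution | Tables 12–15 | 1.0011360896 (19) | 1 420 452.04 (3) (1)
Proton size correction, relative order $(Z\alpha)(m/\Lambda)$ | Zemach [166] | $-2(Z\alpha)m\langle r\rangle = -42.4(1.1)\times10^{-6}$ | −60.2 (1.6)
Recoil correction, relative order $(Z\alpha)(m/M)$ | Arnowitt [258]; Newcomb and Salpeter [259]; Iddings and Platzman [292] | $5.22(1)\times10^{-6}$ | 7.41 (2)
Recoil correction, relative order $(Z\alpha)^2(m/M)$ | Bodwin and Yennie [289] | $\left\{\left[2(1+\kappa)+\frac{7\kappa^2}{4}\right]\ln(Z\alpha)^{-1} - \left[8(1+\kappa)-\frac{\kappa(12-11\kappa)}{4}\right]\ln 2 + \frac{65}{18} + \frac{\kappa(11+31\kappa)}{36}\right\}\frac{(Z\alpha)^2}{1+\kappa}\frac{m}{M} = 0.4585\times10^{-6}$ | 0.65
Leading logarithmic correction, relative order $(Z\alpha)^2m^2r_p^2$ | Karshenboim [290] | $-\frac{2}{3}(Z\alpha)^2\ln(Z\alpha)^{-2}m^2\langle r^2\rangle = -0.002\times10^{-6}$ | −0.002
Leading logarithmic correction, relative order $(Z\alpha)^3(m/\Lambda)$ | Karshenboim [290] | $-(Z\alpha)^3\ln(Z\alpha)^{-2}m\langle r\rangle = -0.01\times10^{-6}$ | −0.016
Electron-line correction, relative order $\alpha(Z\alpha)(m/\Lambda)$ | Karshenboim [290] | $\frac{5}{2}\frac{\alpha(Z\alpha)}{\pi}m\langle r\rangle = 0.12\times10^{-6}$ | 0.17
Photon-line correction, relative order $\alpha(Z\alpha)(m/\Lambda)$ | Karshenboim [290] | $-\frac{4}{3}\frac{\alpha(Z\alpha)}{\pi}\left(\ln\frac{\Lambda^2}{m^2}-\frac{317}{105}\right)m\langle r\rangle = -0.77\times10^{-6}$ | −1.10
Leading electron-line correction, relative order $\alpha(Z\alpha)(m/M)$ | Karshenboim [290] | $0.11(2)\times10^{-6}$ | 0.16
Leading photon-line correction, relative order $\alpha(Z\alpha)(m/M)$ | Karshenboim [290] | $-0.02\times10^{-6}$ | −0.03
Muon vacuum polarization, relative order $\alpha(Z\alpha)(m/m_\mu)$ | Karshenboim [290,61] | $0.07(2)\times10^{-6}$ | 0.10 (3)
Hadron vacuum polarization | Karshenboim [290,61] | $0.03(1)\times10^{-6}$ | 0.04 (1)
Weak interaction contribution | Beg and Feinberg [282] | $\frac{g_A}{1+\kappa}\frac{G_F}{\sqrt2}\frac{3mM}{4\pi Z\alpha} = 0.06\times10^{-6}$ | 0.08
Total theoretical HFS | | | 1 420 399.3 (1.6)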
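exchange of a particle with mass $\sqrt{t} = 2m_e\zeta$

$$\begin{aligned}
V(2m_e\zeta) &= \frac{Z\alpha}{r^3}\left[\frac{1+2a_\mu}{2m^2}+\frac{1+a_\mu}{mM}\right]e^{-2m_e\zeta r}(1+2m_e\zeta r)(\mathbf L\cdot\mathbf s)\\
&\quad + \frac{Z\alpha}{r^3}\left[\frac{1+2\kappa}{2M^2}+\frac{1+\kappa}{mM}\right]e^{-2m_e\zeta r}(1+2m_e\zeta r)(\mathbf L\cdot\mathbf s_p)\\
&\quad - \frac{Z\alpha(1+\kappa)(1+a_\mu)}{mMr^3}\left\{\left[(\mathbf s\cdot\mathbf s_p)-3\,\frac{(\mathbf s\cdot\mathbf r)(\mathbf s_p\cdot\mathbf r)}{r^2}\right](1+2m_e\zeta r) + \left[(\mathbf s\cdot\mathbf s_p)-\frac{(\mathbf s\cdot\mathbf r)(\mathbf s_p\cdot\mathbf r)}{r^2}\right](2m_e\zeta r)^2\right\}e^{-2m_e\zeta r}.
\end{aligned}\tag{382}$$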
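The final step, diagonalizing a four by four Hamiltonian in which only the two $F = 1$ states are coupled, can be sketched as follows. This is a minimal illustration of the block structure described in the text, not the calculation of [211]: the function names are hypothetical, and the diagonal energies and the mixing element are placeholders in arbitrary units.

```python
from __future__ import annotations

import math


def eig2(a: float, b: float, c: float) -> tuple[float, float]:
    """Eigenvalues (lower, upper) of the real symmetric matrix [[a, c], [c, b]]."""
    mean = 0.5 * (a + b)
    half_gap = math.hypot(0.5 * (a - b), c)
    return mean - half_gap, mean + half_gap


def levels_2p(e_f0: float, e_f1_j12: float, e_f1_j32: float, e_f2: float,
              mixing: float) -> list[float]:
    """Hypothetical 2P level scheme: the F = 0 (J = 1/2) and F = 2 (J = 3/2) states
    stay unmixed, while the two F = 1 states (J = 1/2 and J = 3/2) are coupled by
    the off-diagonal matrix element `mixing`."""
    low, high = eig2(e_f1_j12, e_f1_j32, mixing)
    return sorted([e_f0, low, high, e_f2])


# Placeholder energies and mixing in arbitrary units (not the values of [211])
print(levels_2p(0.0, 1.0, 5.0, 6.0, mixing=1.0))
```

Because the $F = 0$ and $F = 2$ levels stay unmixed, only a closed-form two by two eigenvalue problem is needed; the coupling pushes the two $F = 1$ levels apart while leaving their sum unchanged, which is the kind of level repulsion that produces an extra shift of the mixed states.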
All that is left to obtain the fine and hyperfine structure of the 2P states is to diagonalize the four by four Hamiltonian with the interaction given by the sum of the Breit potential in Eq. (381) and the respective integral of the potential density in Eq. (382). This problem was solved in [211], with the result

$$\Delta E(2P_{1/2}) = 7.963\ \mathrm{meV}, \qquad \Delta E(2P_{3/2}) = 3.393\ \mathrm{meV}. \tag{383}$$

Due to the mixing of the states $2P_{1/2}^{F=1}$ and $2P_{3/2}^{F=1}$ (see Fig. 49) they are additionally shifted by $\Delta = 0.145$ meV.

16. Comparison of theory and experiment
In the numerical calculations below we will use the most precise modern values of the fundamental physical constants. The value of the Rydberg constant is [311]

$$R_\infty = 10\,973\,731.568516(84)\ \mathrm{m}^{-1}, \qquad \delta = 7.7\times10^{-12}, \tag{384}$$

the fine structure constant is equal to [312]

$$\alpha^{-1} = 137.03599958(52), \qquad \delta = 3.8\times10^{-9}, \tag{385}$$

the proton–electron mass ratio is equal to [313]

$$\frac{M_p}{m_e} = 1836.1526665(40), \qquad \delta = 2.2\times10^{-9}, \tag{386}$$

the muon–electron mass ratio is equal to [5]

$$\frac{M_\mu}{m_e} = 206.768277(24), \qquad \delta = 1.2\times10^{-7}, \tag{387}$$

and the deuteron–proton mass ratio is equal to [314]

$$\frac{M_d}{M_p} = 1.9990075013(14), \qquad \delta = 7.0\times10^{-10}. \tag{388}$$
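The relative uncertainties $\delta$ quoted next to each constant are simply the ratio of the standard uncertainty in parentheses to the value. The short Python sketch below recomputes them from the numbers in Eqs. (384)–(388).

```python
# (value, standard uncertainty) pairs as quoted in Eqs. (384)-(388)
CONSTANTS = {
    "Rydberg constant (m^-1)": (10_973_731.568516, 0.000084),
    "inverse fine structure constant": (137.03599958, 0.00000052),
    "proton/electron mass ratio": (1836.1526665, 0.0000040),
    "muon/electron mass ratio": (206.768277, 0.000024),
    "deuteron/proton mass ratio": (1.9990075013, 0.0000000014),
}


def relative_uncertainty(value: float, sigma: float) -> float:
    """delta = sigma / |value|, as listed next to each constant."""
    return sigma / abs(value)


for name, (value, sigma) in CONSTANTS.items():
    print(f"{name:32s} delta = {relative_uncertainty(value, sigma):.1e}")
```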
16.1. Lamb shifts of the energy levels
From the theoretical point of view the accuracy of calculations is limited by the magnitude of the yet uncalculated contributions to the Lamb shift. Corrections to the P levels are now known with a higher accuracy than the corrections to the S levels, and do not limit the results of the comparison between theory and experiment.

16.1.1. Theoretical accuracy of S-state Lamb shifts
Corrections of order $\alpha^2(Z\alpha)^6$ are the largest uncalculated contributions to the energy levels of S-states. The correction of this order is a polynomial in $\ln(Z\alpha)^{-2}$, starting with the logarithm cubed term. Both the logarithm cubed term and the contribution of the logarithm squared terms to the difference $\Delta E_L(1S) - 8\Delta E_L(2S)$ are known (for more details on these corrections see the discussion in Section 4.4.2). However, the calculation of the respective contributions to the individual energy levels is still missing. In this situation it is reasonable to take one half of the logarithm cubed term (which has roughly the same magnitude as the logarithm squared contribution to the interval $\Delta E_L(1S) - 8\Delta E_L(2S)$) as an estimate of the scale of all yet uncalculated logarithm squared contributions. We thus assume that the uncertainties induced by the uncalculated contributions of order $\alpha^2(Z\alpha)^6$ amount to 14 and 2 kHz for the 1S- and 2S-states, respectively. All other unknown theoretical contributions to the Lamb shift are much smaller, and 14 kHz for the 1S-state and 2 kHz for the 2S-state are reasonable estimates of the total theoretical uncertainty of the expression for the Lamb shift. Theoretical uncertainties for the higher S levels may be obtained from the 1S-state uncertainty by ignoring its state dependence and scaling it with the principal quantum number $n$ as $1/n^3$.

16.1.2. Theoretical accuracy of P-state Lamb shifts
The Lamb shift theory of P-states is in better shape than the theory of S-states.
The largest unknown corrections to the P-state energies are the single logarithmic contributions of the form a(Za) ln(Za)\m, like the one in Eq. (111), induced by radiative insertions in the electron and external photon lines, and the uncertainty of the nonlogarithmic contributions G of order 1# a(Za)m (see Table 6). One half of the double logarithmic contribution in Eq. (109) can be taken as a fair an estimate of the magnitude of the uncalculated single logarithmic contributions of the form a(Za) ln(Za)\m. An estimate of the theoretical accuracy of the 2P Lamb shift is then about 0.08 kHz. Theoretical uncertainties for the higher non-S levels may be obtained from the 2P-state uncertainty ignoring its state-dependence and scaling it with the principal quantum number n. 16.1.3. Theoretical accuracy of the interval ¸(1S)}8¸(2S) State-independent contributions to the Lamb shift scale as 1/n and vanish in the di!erence E (1S)!8E (2S), which may be calculated more accurately than the positions of the individual * * energy levels (see discussion in Section 4.4.2.3). All main sources of theoretical uncertainty of the individual energy levels, namely, proton charge radius contributions and yet uncalculated state independent corrections to the Lamb shift vanish in this di!erence. This observation plays an important role in extracting the precise value of the 1S Lamb shift from modern highly accurate experimental data (see discussion below in Section 16.1.5). Earlier the practical usefulness of the theoretical value of the interval E (1S)}8E (2S) for extraction of the experimental value of * *
the 1S Lamb shift was impeded by the insu$cient theoretical accuracy of this interval and by the insu$cient accuracy of the frequency measurement. Signi"cant progress was achieved recently in both respects, especially on the experimental side. On the theoretical side the last relatively large contribution to E (1S)}8E (2S) of order a(Za) ln(Za)\ was calculated in [115,116,119,120] * * (see Eq. (113) above), and the theoretical uncertainty of this interval was reduced to 5 kHz D,E (1S)!8E (2S)"!187 231 (5) kHz . * *
(389)
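Why the interval in Eq. (389) can be calculated more accurately can be illustrated with exact rational arithmetic. State-independent contributions are proportional to the squared wave function at the origin, $|\psi_n(0)|^2 \propto 1/n^3$, and the factor $8 = 2^3$ is chosen so that such terms cancel; the coefficient $C$ below is an arbitrary placeholder.

```python
from fractions import Fraction


def interval(e_1s: Fraction, e_2s: Fraction) -> Fraction:
    """The combination E(1S) - 8 E(2S) of Eq. (389)."""
    return e_1s - 8 * e_2s


C = Fraction(37, 11)  # arbitrary (hypothetical) strength of a state-independent term

# A contribution proportional to 1/n^3 drops out of the combination exactly ...
print(interval(C / 1**3, C / 2**3))
# ... whereas a term with a different n-dependence, e.g. 1/n^2, does not.
print(interval(C / 1**2, C / 2**2))
```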
16.1.4. Classic Lamb shift $2S_{1/2}$–$2P_{1/2}$
The discovery of the classic Lamb shift, i.e., the splitting of the $2S_{1/2}$ and $2P_{1/2}$ energy levels in hydrogen, triggered a new stage in the development of modern physics. In the terminology accepted in this paper the classic Lamb shift is equal to the difference of the Lamb shifts in the respective states, $\Delta E(2S_{1/2}-2P_{1/2}) = L(2S_{1/2}) - L(2P_{1/2})$. Unlike the much larger Lamb shift in the 1S state, the classic Lamb shift is directly observable as a small splitting of energy levels that should be degenerate according to Dirac theory. This greatly simplifies the comparison between theory and experiment for the classic Lamb shift, since the theoretical predictions are practically independent of the exact value of the Rydberg constant, which can be measured independently. Many experiments on precise measurement of the classic Lamb shift have been performed since its experimental discovery in 1947. We have collected the modern (post-1979) experimental results in Table 20. Two entries in this table are changed compared to the originally published experimental results [315,316]. These alterations reflect recent improvements of the theory used for extraction of the Lamb shift value from the raw experimental data. The magnitude of the Lamb shift in [315] was derived from the ratio of the $2P_{1/2}$ decay width and the $\Delta E(2S_{1/2}-2P_{1/2})$ energy splitting, which was directly measured by the atomic-interferometer method. The theoretical expression for the $2P_{1/2}$-state lifetime was used for extraction of the magnitude of the Lamb shift. An additional leading logarithmic correction to the width of the $2P_{1/2}$ state of relative order $\alpha(Z\alpha)^2\ln(Z\alpha)^{-2}$, not taken into account in the original analysis of the experiment, was obtained recently in [120]. This correction slightly changes the original experimental result [315], $\Delta E = 1\,057\,851.4(1.9)$ kHz, and we cite the corrected value in Table 20. The magnitude of the new correction [120] triggered a certain discussion in the literature [317,318].
From the phenomenological point of view the new correction [120] is so small that none of our conclusions below about the result in [315] is affected by it. The Lamb shift value [316] was obtained from a measurement of the interval $2P_{3/2}$–$2S_{1/2}$, and the value of the classic Lamb shift was extracted by subtracting this energy splitting from the theoretical value of the fine structure interval $2P_{3/2}$–$2P_{1/2}$. As was first noted in [99], recent progress in the Lamb shift theory for P-states requires a reconsideration of the original value $\Delta E = 1\,057\,839(12)$ kHz of the classic Lamb shift obtained in [316]. We assume that the total theoretical uncertainty of the fine structure interval is about 0.08 kHz (see the discussion of the accuracy of the P-state Lamb shift in Section 16.1.2). A comparable contribution of 0.08 kHz to the uncertainty of the fine structure interval originates from the uncertainty of the most precise modern value of the fine structure constant in [312] (see Eq. (385)). Calculating the theoretical value of the fine structure interval, we obtain $\Delta E(2P_{3/2}-2P_{1/2}) = 10\,969\,041.52(11)$ kHz, which differs from the value $\Delta E(2P_{3/2}-2P_{1/2}) = 10\,969\,039.4(2)$ kHz used in [316]. As a consequence, the original experimental value [316] of the classic Lamb shift changes to the one cited in Table 20.
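The numbers in the preceding paragraph can be reproduced in a few lines, assuming that the two 0.08 kHz uncertainties are independent and add in quadrature, and that the Lamb shift of [316] moves one-to-one with the theoretical fine structure interval from which the measured $2P_{3/2}$–$2S_{1/2}$ interval was subtracted.

```python
import math

# Numbers quoted in the text (kHz)
u_theory = 0.08            # uncalculated P-state contributions, Section 16.1.2
u_alpha = 0.08             # from the uncertainty of alpha, Eq. (385)
fs_new = 10_969_041.52     # 2P3/2 - 2P1/2, theoretical value obtained above
fs_old = 10_969_039.4      # value used in [316]
lamb_old = 1_057_839       # original classic Lamb shift of [316]

# Assumption: the two uncertainties are independent and add in quadrature.
u_fs = math.hypot(u_theory, u_alpha)

# [316] obtained the Lamb shift as (fine structure interval) - (measured 2P3/2 - 2S1/2),
# so a change of the fine structure interval shifts the Lamb shift by the same amount.
lamb_new = lamb_old + (fs_new - fs_old)
print(f"u(FS) = {u_fs:.2f} kHz, corrected Lamb shift = {lamb_new:.1f} kHz")
```

The combined uncertainty comes out at about 0.11 kHz and the corrected Lamb shift at about 1 057 841.1 kHz. Since the original value of [316] is quoted only to the nearest kHz, this agrees with the entry 1057 842 (12) kHz in Table 20 within that rounding.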
Table 20
Classic $2S_{1/2}$–$2P_{1/2}$ Lamb shift

Source | $\Delta E$ (kHz) | Comment
Newton et al. [319] | 1057 862 (20) | Experiment
Lundeen and Pipkin [320] | 1057 845 (9) | Experiment
Palchikov et al. [315] | 1057 857.6 (2.1) | Experiment
Hagley and Pipkin [316] | 1057 842 (12) | Experiment
Wijngaarden et al. [321] | 1057 852 (15) | Experiment
Schwob et al. [311] | 1057 845 (3) | Exp., [322–325,3,320,316,321]
 | 1057 814 (2) (4) | Theory, $r_p = 0.805(11)$ fm [157]
 | 1057 833 (2) (4) | Theory, $r_p = 0.862(12)$ fm [158]
Weitz et al. [323] | 1057 857 (12) | Self-consistent value
Berkeland et al. [324] | 1057 842 (11) | Self-consistent value
Bourzeix et al. [325] | 1057 836 (8) | Self-consistent value
Udem et al. [3] | 1057 848 (5) | Self-consistent value
Schwob et al. [311] | 1057 841 (5) | Self-consistent value
 | 1057 843 (2) (6) | Theory, $r_p = 0.891(18)$ fm
Due to the relatively large uncertainty of this result, the change does not alter the conclusions below on the comparison of theory and experiment. The accuracy of the radiofrequency measurements of the classic 2S–2P Lamb shift [319,320,315,316,321] is limited by the large (about 100 MHz) natural width of the 2P state, and cannot be significantly improved. New perspectives in reducing the experimental error bars of the classic 2S–2P Lamb shift were opened with the development of Doppler-free two-photon laser spectroscopy for measurements of the transitions between energy levels with different principal quantum numbers. The narrow linewidth of such transitions allows a very precise measurement of the respective transition frequencies and an indirect but accurate determination of the 2S–2P splitting from these data. The latest experimental value [311] in the fifth line of Table 20 was obtained by such methods. Both the theoretical and experimental data for the classic $2S_{1/2}$–$2P_{1/2}$ Lamb shift are collected in Table 20. The theoretical results for the energy shifts in this table carry two errors in parentheses: the first is determined by the yet uncalculated contributions to the Lamb shift discussed above, and the second reflects the experimental uncertainty in the measurement of the proton rms charge radius. An immediate conclusion from the data in Table 20 is that the value of the proton rms radius as measured in [157] is by far too small to accommodate the experimental data on the Lamb shift. Even the larger value of the proton charge radius obtained in [158] is inconsistent with the result of the apparently most precise measurement of the $2S_{1/2}$–$2P_{1/2}$ splitting in [315]. The respective discrepancy is more than five standard deviations. The results of the four other direct measurements of the classic Lamb shift collected in Table 20 are compatible with the theory
See more discussion of this method below in Section 16.1.5. We have used in the calculations the result in Eq. (148) for the radiative-recoil correction of order a(Za). Competing result in Eq. (149) would shift the value of the classic Lamb shift by 0.78 kHz, and would not e!ect our conclusions on the comparison between theory and experiment below.
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
if one uses the proton radius from [158]. Unfortunately, these results are rather widely scattered and have rather large experimental errors; their internal consistency, as well as their consistency with theory, leaves much to be desired. Taken at face value, the experimental results on the 2S–2P splitting indicate an even larger value of the proton charge radius than the one measured in [158]. The situation with the experimental values of the proton charge radius is unsatisfactory, and a new measurement is clearly warranted. We will return to the numbers in the last five lines of Table 20 below.
16.1.5. 1S Lamb shift
Unlike the classic Lamb shift above, the 1S Lamb shift is not amenable to direct measurement as a splitting between particular energy levels; in principle it can be extracted from experimental data on the transition frequencies between energy levels with different principal quantum numbers. Such an approach requires very precise measurements of the gross structure intervals and became practical only with the recent development of Doppler-free two-photon laser spectroscopy. These methods allow measurements of the gross structure intervals in hydrogen with an accuracy that is limited in principle only by the small natural linewidths of the respective transitions. For example, the 2S–1S transition in hydrogen is forbidden as a single-photon process in the electric dipole and quadrupole approximations, and also in the nonrelativistic magnetic dipole approximation. As a result, the natural linewidth of this transition is determined by the process with simultaneous emission of two electric dipole photons [6,20], which gives a natural linewidth of the 2S–1S transition in hydrogen of about 1.3 Hz. Many recent spectacular experimental successes were achieved in the attempt to reach an experimental accuracy comparable with this extremely small natural linewidth.
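To see how a linewidth of about 1.3 Hz arises, and how narrow it is compared with the transition frequency itself, the sketch below converts a two-photon decay rate of the 2S level into a linewidth, a lifetime and a line quality factor. The decay rate A ≈ 8.229 s⁻¹ is an assumed literature value that is not quoted in this section; the transition frequency is the one given in Eq. (390). This is only an order-of-magnitude illustration.

```python
import math

# Assumed two-photon (2E1) decay rate of the hydrogen 2S level, s^-1
# (literature value; it is not quoted in this section)
A_2S = 8.229
# 1S-2S transition frequency of Eq. (390), converted from kHz to Hz
F_1S2S_HZ = 2_466_061_413_187.34e3


def natural_linewidth_hz(decay_rate: float) -> float:
    """FWHM linewidth in Hz of a line whose upper level decays at `decay_rate` (s^-1)."""
    return decay_rate / (2.0 * math.pi)


gamma = natural_linewidth_hz(A_2S)  # 1S is stable, so only the 2S decay broadens the line
lifetime = 1.0 / A_2S
quality = F_1S2S_HZ / gamma

print(f"linewidth ~ {gamma:.2f} Hz, 2S lifetime ~ {lifetime:.3f} s, Q ~ {quality:.1e}")
```

A quality factor of order 10¹⁵ is what makes it meaningful to quote the 1S–2S frequency with a relative uncertainty in the 10⁻¹³ range.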
The intervals of the gross structure are mainly determined by the Rydberg constant, and the same transition frequencies have to be used both for the measurement of the Rydberg constant and for the measurement of the 1S Lamb shift. The first experimental task is therefore to obtain a value of the 1S Lamb shift that is independent of the precise value of the Rydberg constant. This goal may be achieved by measuring two intervals with different principal quantum numbers. One then constructs a linear combination of these intervals that is proportional to α³R (as opposed to the ∼R leading contributions to the intervals themselves). Owing to the factor α³, the magnitude of the Lamb shift extracted from this linear combination of measured frequencies is practically independent of the exact value of the Rydberg constant. For example, in one of the most recent experiments [3] the measurement of the 1S Lamb shift is disentangled from the measurement of the Rydberg constant by using the experimental data on two different intervals of the hydrogen gross structure, [3]
$$f_{1S-2S} = 2\,466\,061\,413\,187.34\,(84)~\text{kHz}, \qquad \delta = 3.4\times10^{-13}, \tag{390}$$
and [322,311]
$$f_{2S-8D} = 770\,649\,561\,581.1\,(5.9)~\text{kHz}, \qquad \delta = 7.7\times10^{-12}, \tag{391}$$
where $\delta$ denotes the relative uncertainty of each frequency.
The original experimental value $f_{2S-8D} = 770\,649\,561\,585.0\,(4.9)$ kHz [322] used in [3] was revised in [311]; Eq. (391) gives this later value. The values of the Lamb shifts obtained in [3] change accordingly, and Tables 20 and 21 contain these revised values.
Theoretically these intervals are given by the expression in Eq. (39):
$$E_{1S-2S} = \left[E^{(0)}_{2S} - E^{(0)}_{1S}\right] + L_{2S} - L_{1S}, \qquad E_{2S-8D} = \left[E^{(0)}_{8D} - E^{(0)}_{2S}\right] + L_{8D} - L_{2S}, \tag{392}$$
where $E^{(0)}$ is the leading Dirac and recoil contribution to the position of the respective energy level (the first two terms in Eq. (39)) and $L$ denotes the Lamb shift of that level. The first differences on the right-hand side of Eqs. (392) are proportional to the Rydberg constant, which can therefore simply be eliminated from this system of two equations. One then obtains an equality between a linear combination of the 1S, 2S and 8D Lamb shifts and a linear combination of the experimentally measured frequencies. This relation admits direct comparison with Lamb shift theory without any further complication. However, to make the results of different experiments comparable (different intervals of the hydrogen gross structure are measured in different experiments), the final experimental results are usually expressed in terms of the 1S Lamb shift. The bulk contribution to the Lamb shift scales as $1/n^3$, which allows one to use the theoretical value $L_{8D} = 71.5$ kHz for the D-state Lamb shift without loss of accuracy. A linear combination of the Lamb shifts of the 1S and 2S states may then be expressed directly in terms of the experimental data. All other recent measurements of the 1S Lamb shift [311,323–325] also end up with an experimental number for a linear combination of the 1S, 2S and higher-level Lamb shifts.
An unbiased extraction of the 1S Lamb shift from the experimental data remains a problem even after the Lamb shift measurement has been decoupled experimentally from the measurement of the Rydberg constant. Historically, the most popular approach was to use the experimental value of the classic 2S–2P Lamb shift (see the first three lines of Table 21). Because of the large natural width of the 2P state, the experimental values of the classic Lamb shift have relatively large errors (see Table 20), and unfortunately the different results are not very consistent with each other.
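The elimination of the Rydberg constant from Eqs. (392) can be made concrete with a minimal sketch. It keeps only the leading Bohr term −R/n² of $E^{(0)}$ (R in frequency units), under which the coefficient that cancels R between the two intervals is (3/4)/(15/64) = 16/5. In the actual analysis the full Dirac and recoil terms $E^{(0)}$ enter, so the coefficient differs from 16/5 at relative order α². The Lamb shift values and the two trial values of R below are hypothetical toy inputs, chosen only to show that the combination does not depend on R.

```python
from fractions import Fraction


def bohr(n: int) -> Fraction:
    """Leading Bohr energy of level n in units of R (frequency units): -1/n^2."""
    return Fraction(-1, n * n)


def interval(n_lower: int, n_upper: int) -> Fraction:
    """R-coefficient of the transition frequency n_lower -> n_upper (Bohr term only)."""
    return bohr(n_upper) - bohr(n_lower)


# Coefficient that removes R from f(1S-2S) - C * f(2S-8D)
C = interval(1, 2) / interval(2, 8)


def model_frequencies(rydberg, l1s, l2s, l8d):
    """Toy version of Eq. (392): Bohr term plus differences of Lamb shifts."""
    f_1s2s = rydberg * interval(1, 2) + (l2s - l1s)
    f_2s8d = rydberg * interval(2, 8) + (l8d - l2s)
    return f_1s2s, f_2s8d


def lamb_combination(f_1s2s, f_2s8d):
    """R-independent combination, equal to (L2S - L1S) - C * (L8D - L2S)."""
    return f_1s2s - C * f_2s8d


# Hypothetical Lamb shifts in MHz (illustrative inputs only)
L1S, L2S, L8D = Fraction("8172.8"), Fraction("1045.0"), Fraction("0.0715")
for r in (Fraction(3_289_841_960), Fraction(3_289_841_961)):  # trial R values, MHz
    print(C, float(lamb_combination(*model_frequencies(r, L1S, L2S, L8D))))
```

Exact rational arithmetic is used so that the cancellation of R is exact rather than limited by floating-point rounding.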
Table 21
1S Lamb shift
Source | ΔE (kHz) | Comment
Weitz et al. [323] | 8 172 874 (60) | Exp., L(2S–2P) [320]
Berkeland et al. [324] | 8 172 827 (51) | Exp., L(2S–2P) [320,316]
Bourzeix et al. [325] | 8 172 798 (46) | Exp., L(2S–2P) [320,316]
Udem et al. [3] | 8 172 851 (30) | Exp., L(2S–2P) [320,316]
Schwob et al. [311] | 8 172 837 (22) | Exp., [322–325,3,320,316,321]
Theory | 8 172 605 (14) (28) | $r_p = 0.805\,(11)$ fm [157]
Theory | 8 172 754 (14) (32) | $r_p = 0.862\,(12)$ fm [158]
Weitz et al. [323] | 8 172 937 (99) | Self-consistent value
Berkeland et al. [324] | 8 172 819 (89) | Self-consistent value
Bourzeix et al. [325] | 8 172 772 (66) | Self-consistent value
Udem et al. [3] | 8 172 864 (40) | Self-consistent value
Schwob et al. [311] | 8 172 805 (43) | Self-consistent value
Theory | 8 172 832 (14) (51) | $r_p = 0.891\,(18)$ fm
Such a situation clearly calls for another approach to the extraction of the 1S Lamb shift, one that is independent of the magnitude of the classic Lamb shift. A natural way to obtain a self-consistent value of the 1S Lamb shift that does not rely on the experimental data on the 2S–2P splitting is provided by the theoretical relation between the 1S and 2S Lamb shifts discussed above in Section 16.1.3. An important advantage of this self-consistent method is that it produces an unbiased value of the L(1S) Lamb shift, independent of the widely scattered experimental data on the 2S–2P interval. Spectacular experimental progress in frequency measurements now allows one to obtain self-consistent values of the 1S Lamb shift from the experimental data [3] with comparable or even better accuracy (see the five lines of Table 21 below the theoretical values in the middle of the table) than the method based on the experimental results for the classic Lamb shift in [320,316]. The value of the 1S Lamb shift is also often needed for extracting the precise value of the Rydberg constant from the experimental data; see Section 16.1.8 below. The original experimental numbers from [3,311] in the fourth and fifth lines of Table 21 are averages of the self-consistent values and the values based on the classic Lamb shift. The result in [3] is based on the $f_{1S-2S}$ frequency measurement, the $f_{2S-8D}$ frequency from [322,311], and the classic Lamb shift measurements [320,316], while the result in [311] is based on the $f_{2S-12D}$ frequency measurement, as well as on the frequencies measured in [322,311,323–325,3], and the classic Lamb shift measurements [320,316,321]. The corresponding value of the classic 2S–2P Lamb shift is presented in the sixth line of Table 20. Unlike the other experimental numbers in Table 20, this value of the classic Lamb shift depends on other experimental results in that table.
The experimental data on the 1S Lamb shift should be compared with the theoretical prediction
$$\Delta E(1S) = 8\,172\,754\,(14)\,(32)~\text{kHz}, \tag{393}$$
calculated for $r_p = 0.862\,(12)$ fm [158]. The first error in this result is determined by the as yet uncalculated contributions to the Lamb shift, and the second reflects the experimental uncertainty in the measurement of the proton rms charge radius. The experimental results in the first five lines of Table 21 seem to be systematically higher than even the theoretical value in Eq. (393), which was calculated with the larger experimental value of the proton charge radius [158]. One is tempted to conclude that the experimental data indicate an even larger value of the proton charge radius than the one measured in [158]. However, it is necessary to remember that the "experimental" results in the first five lines of the table are "biased": they depend on the experimental value of the 2S–2P Lamb shift [320,316,321]. In view of the rather large scatter of the results for the classic Lamb shift, such a dependence is unwelcome. To obtain unbiased results we have calculated the self-consistent values of the 1S Lamb shift collected in Table 21. Although formally consistent, these values are rather widely scattered. The corresponding self-consistent values of the classic Lamb shift obtained from the experimental data in [323–325,3] are presented in Table 20. All experimental results (both original and self-consistent) in both tables are systematically larger than the respective theoretical predictions.
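The size of the proton radius dependence in Table 21 can be cross-checked against the standard leading nuclear-size formula for S states, ΔE = (2/3)(Zα)⁴ m_r³ r²/n³ (natural units, ħ = c = 1). The sketch below evaluates that formula with assumed CODATA-style constants (they are not given in the text), compares it with the slope implied by the two theoretical rows of Table 21, and propagates σ(r_p) = 0.012 fm into the second error of Eq. (393). Higher-order size corrections are ignored, so agreement at the level of a few tenths of a percent is all one should expect.

```python
# Assumed physical constants (CODATA-style values, not taken from the text)
ALPHA = 7.2973525693e-3        # fine-structure constant
ME_C2_EV = 510_998.95          # electron rest energy, eV
HBAR_C_EV_FM = 197.3269804e6   # hbar*c, eV*fm
H_EV_S = 4.135667696e-15       # Planck constant, eV*s
ME_OVER_MP = 1.0 / 1836.15267  # electron-proton mass ratio


def size_coefficient_khz(n: int, z: int = 1, mass_ratio: float = ME_OVER_MP) -> float:
    """Leading nuclear-size shift of the nS level per fm^2 of r^2, in kHz:
    (2/3) (Z alpha)^4 m_r^3 / n^3, with m_r the reduced mass."""
    m_r = ME_C2_EV / (1.0 + mass_ratio)
    shift_ev = (2.0 / 3.0) * (z * ALPHA) ** 4 * m_r**3 / HBAR_C_EV_FM**2 / n**3
    return shift_ev / H_EV_S / 1e3


k_formula = size_coefficient_khz(1)
# Slope between the two theory rows of Table 21 (r_p = 0.805 and 0.862 fm)
k_table = (8_172_754 - 8_172_605) / (0.862**2 - 0.805**2)
# Radius-induced error of Eq. (393): d(k r^2) = 2 k r dr
sigma_from_radius = 2 * k_formula * 0.862 * 0.012

print(f"k(formula) = {k_formula:.1f} kHz/fm^2, k(Table 21) = {k_table:.1f} kHz/fm^2")
print(f"error from r_p: {sigma_from_radius:.1f} kHz")
```

Both slopes come out near 1.56 MHz per fm², and the propagated error lands at about 32 kHz, matching the second error quoted in Eq. (393).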
The only plausible explanation is that the true value of the proton charge radius is even larger than the one measured in [158]. At this point we can invert the problem and obtain the value of the proton charge radius
$$r_p = 0.891\,(18)~\text{fm} \tag{394}$$
by comparing the average $L(1S) = 8\,172\,832\,(25)$ kHz of the self-consistent values of the 1S Lamb shift in Table 21, based on the most precise recent frequency measurements [322,311,323–325,3], with theory. The main contribution to the uncertainty of the proton charge radius in Eq. (394) comes from the uncertainty of the self-consistent Lamb shift. A new analysis of the low-momentum-transfer electron scattering data that takes into account the Coulomb and recoil corrections [159] resulted in the proton radius value
$$r_p = 0.880\,(15)~\text{fm}, \tag{395}$$
in good agreement with the self-consistent value in Eq. (394). Another recent analysis of the elastic e–p scattering data resulted in an even higher value of the proton charge radius [326],
$$r_p = 0.897\,(2)\,(1)\,(3)~\text{fm}, \tag{396}$$
where the first error is due to statistics, the second is due to normalization effects, and the third reflects the model dependence. When comparing the results of these two analyses one has to remember that the Coulomb corrections, which played the most important role in [159], were ignored in [326]. The results of [326] also depend on a specific parametrization of the nucleon form factors. Under these conditions, despite the superficial agreement between the results in Eqs. (395) and (396), the extraction of the precise value of the proton charge radius from the scattering data cannot be considered satisfactory, and further work in this field is required. Theoretical values of the classic 2S–2P Lamb shift and of the 1S Lamb shift corresponding to the proton radius in Eq. (394) are given in the last lines of Tables 20 and 21, respectively. Clearly these theoretical predictions are much more consistent with the bulk of the experimental data on the Lamb shifts than the predictions based on the proton charge radius from [158] (to say nothing of the radius from [157]).
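The average and the inversion described above can be retraced with a few lines of arithmetic. The sketch assumes a simple inverse-variance weighted mean of the five self-consistent values in Table 21, treating their errors as independent (a simplification, since they share theoretical input), and a linear dependence of L(1S) on r_p² calibrated on the two theoretical rows of the same table. It recovers 8 172 832 (25) kHz and a central radius close to 0.891 fm, but it makes no attempt to reproduce the (18) fm error budget of Eq. (394).

```python
import math

# Self-consistent 1S Lamb shifts from Table 21: (value, standard error), kHz
SELF_CONSISTENT = [
    (8_172_937, 99),  # Weitz et al. [323]
    (8_172_819, 89),  # Berkeland et al. [324]
    (8_172_772, 66),  # Bourzeix et al. [325]
    (8_172_864, 40),  # Udem et al. [3]
    (8_172_805, 43),  # Schwob et al. [311]
]

# Theory rows of Table 21: (proton radius in fm, L(1S) in kHz)
THEORY_ROWS = [(0.805, 8_172_605), (0.862, 8_172_754)]


def weighted_mean(pairs):
    """Inverse-variance weighted mean and its standard error (errors assumed independent)."""
    weights = [1.0 / err**2 for _, err in pairs]
    mean = sum(w * val for w, (val, _) in zip(weights, pairs)) / sum(weights)
    return mean, 1.0 / math.sqrt(sum(weights))


def radius_from_lamb_shift(lamb_khz: float) -> float:
    """Invert L(1S) = a + k * r_p^2, with a and k fixed by the two theory rows."""
    (r1, l1), (r2, l2) = THEORY_ROWS
    k = (l2 - l1) / (r2**2 - r1**2)  # kHz per fm^2
    return math.sqrt(r2**2 + (lamb_khz - l2) / k)


mean, sigma = weighted_mean(SELF_CONSISTENT)
r_p = radius_from_lamb_shift(mean)
print(f"L(1S) = {mean:.0f} ({sigma:.0f}) kHz  ->  r_p ~ {r_p:.3f} fm")
```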
We expect that future experiments on the proton charge radius will confirm the value of the proton charge radius in Eq. (394) predicted from the hydrogen Lamb shift. Precise measurements of the Lamb shift in muonic hydrogen (see the discussion in Section 16.1.10) would provide the best approach to the measurement of the proton charge radius and would allow the error bars in Eq. (394) to be reduced by at least an order of magnitude.
16.1.6. Isotope shift
The methods of Doppler-free two-photon laser spectroscopy allow a very precise comparison of the frequencies of the 1S–2S transitions in hydrogen and deuterium. The frequency difference
$$\Delta E = \left[E(2S) - E(1S)\right]_{D} - \left[E(2S) - E(1S)\right]_{H} \tag{397}$$
is called the hydrogen–deuterium isotope shift. The experimental accuracy of the isotope shift measurements was improved by three orders of magnitude in the period from 1989 to 1998 (see Table 22), and the uncertainty of the most recent experimental result [327] was reduced to 0.15 kHz. The main contribution to the hydrogen–deuterium isotope shift is a pure mass effect, determined by the term $E^{(0)}$ in Eq. (39). Other contributions coincide with the respective contributions to the Lamb shifts in Tables 1–10. The deuteron-specific corrections discussed in Section 4.1 and collected in Eqs. (170), (181), (182) and (190) should also be included in the theoretical expression for the isotope shift.
Table 22
Isotope shift
Source | ΔE (kHz)
Boshier et al. [2] | 670 994 330 (640)
Schmidt-Kaler et al. [330] | 670 994 414 (22)
Huber et al. [327] | 670 994 334.64 (15)
All as yet uncalculated nonrecoil corrections to the Lamb shift almost cancel in the formula for the isotope shift, which is therefore much more accurate than the theoretical expressions for the Lamb shifts themselves. The theoretical uncertainty of the isotope shift is mainly determined by the unknown single-logarithmic and nonlogarithmic contributions of order (Zα)(m/M) and α(Zα)(m/M) (see Sections 5.3 and 6.2), and also by the uncertainties of the deuteron size and structure contributions discussed in Section 7. The overall theoretical uncertainty of all contributions to the isotope shift, except the leading proton and deuteron size corrections, does not exceed 0.8 kHz. Theoretical predictions for the isotope shift depend strongly on the magnitude of the radiative-recoil corrections of order α(Zα)(m/M)m. Unfortunately, there is still an unresolved discrepancy between the theoretical results for these corrections obtained in [35,36,151] and in [152] (for more detail see the discussion in Section 6.1.1); the resulting values of the isotope shift differ by about 2.7 kHz, to be compared with the 0.15 kHz uncertainty of the most recent experimental result [327]. The discrepancy between the theoretical predictions for the radiative-recoil corrections of order α(Zα)(m/M)m is one of the outstanding theoretical problems, and efforts to resolve it are necessary. Numerically, the sum of all theoretical contributions to the isotope shift, except the leading nuclear size contributions in Eq. (158), is equal to
$$\Delta E = 670\,999\,566.1\,(1.5)\,(0.8)~\text{kHz} \tag{398}$$
for the α(Zα)(m/M)m contribution from [35,36,151], and
$$\Delta E = 670\,999\,568.9\,(1.5)\,(0.8)~\text{kHz} \tag{399}$$
for the α(Zα)(m/M)m contribution from [152]. The uncertainty in the first parentheses is defined by the experimental errors of the electron–proton and proton–deuteron mass ratios, and the uncertainty in the second parentheses is the theoretical uncertainty discussed above. The individual uncertainties of the proton and deuteron charge radii introduce by far the largest contributions to the uncertainty of the theoretical value of the isotope shift; they are much larger than the experimental error of the isotope shift measurement or the uncertainties of the other theoretical contributions. It suffices to recall that the uncertainty of the 1S Lamb shift due to the experimental error of the proton charge radius is as large as 32 kHz (see Eq. (393)), even if we ignore all the problems connected with the proton radius contribution (see the discussion in Sections 16.1.4 and 16.1.5). In such a situation it is natural to invert the problem and to use the high accuracy of the optical measurements and of the isotope shift theory to determine the difference of the charge radii squared of
the deuteron and proton. We obtain
$$r_d^2 - r_p^2 = 3.8193\,(01)\,(11)\,(04)~\text{fm}^2 \tag{400}$$
using the [35,36,151] value of the α(Zα)(m/M)m corrections, and
$$r_d^2 - r_p^2 = 3.8213\,(01)\,(11)\,(04)~\text{fm}^2 \tag{401}$$
for the [152] value of the α(Zα)(m/M)m corrections. Here the first contribution to the uncertainty is due to the experimental error of the isotope shift measurement, the second is due to the experimental error of the electron–proton mass ratio, and the third is generated by the theoretical uncertainty of the isotope shift. An improvement in the precision of the electron–proton mass ratio is crucial for a more accurate determination of the difference of the charge radii squared of the deuteron and proton from the isotope shift measurements. The difference of the deuteron and proton charge radii squared is connected to the so-called deuteron mean square matter radius (see, e.g., [328,163]), which may be extracted on the one hand from the experimental data on the low-energy nucleon–nucleon interaction, and on the other hand from experiments on low-energy elastic electron–deuteron scattering. These two kinds of experimental data used to yield inconsistent results for the deuteron matter radius, as was first discovered in [328]. The discrepancy was resolved in [189], where the Coulomb distortion in the second-order Born approximation was taken into account in the analysis of electron–deuteron elastic scattering. This analysis was further improved in [329], where the virtual excitations of the deuteron in electron–deuteron scattering were also considered. The values of the deuteron matter radius extracted from the low-energy nucleon–nucleon interaction [163] and from low-energy elastic electron–deuteron scattering [189,329] are now in agreement, and they do not contradict the optical data in Eqs. (400) and (401). Today the isotope shift measurements are the source of the most precise experimental data on the difference of the charge radii squared and on the deuteron matter radius.
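A back-of-the-envelope check of Eqs. (400) and (401) is possible if one assumes that the missing leading size term lowers the 1S–2S isotope shift by (7/8)k(r_d² − r_p²), where k is the leading 1S size coefficient (the 2S level moves by 1/8 of the 1S shift). The sketch evaluates k with the deuteron reduced mass for both isotopes, a simplification, and with assumed CODATA-style constants not given in the text. It reproduces the quoted values to within about 0.001 fm² and shows that the 2.8 kHz difference between Eqs. (398) and (399) moves r_d² − r_p² by about 0.002 fm², more than the quoted uncertainties.

```python
# Assumed constants (CODATA-style, not taken from the text)
ALPHA = 7.2973525693e-3
ME_C2_EV = 510_998.95           # electron rest energy, eV
HBAR_C_EV_FM = 197.3269804e6    # hbar*c, eV*fm
H_EV_S = 4.135667696e-15        # Planck constant, eV*s
ME_OVER_MD = 1.0 / 3670.48297   # electron-deuteron mass ratio


def size_coefficient_1s_khz(mass_ratio: float) -> float:
    """Leading 1S nuclear-size shift per fm^2, in kHz (reduced mass included)."""
    m_r = ME_C2_EV / (1.0 + mass_ratio)
    return (2.0 / 3.0) * ALPHA**4 * m_r**3 / HBAR_C_EV_FM**2 / H_EV_S / 1e3


# The 2S level moves by 1/8 of the 1S shift, so the 1S-2S interval changes by 7/8 of it
K_1S2S = (7.0 / 8.0) * size_coefficient_1s_khz(ME_OVER_MD)

ISOTOPE_SHIFT_EXP = 670_994_334.64  # kHz, Huber et al. [327]
THEORY_WITHOUT_SIZE = {             # kHz, Eqs. (398) and (399)
    "[35,36,151]": 670_999_566.1,
    "[152]": 670_999_568.9,
}


def radius_squared_difference(theory_khz: float) -> float:
    """r_d^2 - r_p^2 in fm^2, assuming one common size coefficient for H and D."""
    return (theory_khz - ISOTOPE_SHIFT_EXP) / K_1S2S


for label, theory in THEORY_WITHOUT_SIZE.items():
    print(f"{label}: r_d^2 - r_p^2 ~ {radius_squared_difference(theory):.4f} fm^2")
```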
In view of the unsatisfactory situation with the proton charge radius measurements, more experimental work is clearly warranted.
16.1.7. Lamb shift in the helium ion He⁺
The theory of high-order corrections to the Lamb shift described above for H and D may also be applied to other light hydrogenlike ions. The simplest such ion for which experimental data on the classic 2S–2P Lamb shift exist is He⁺. As measured in [125] by the quenching-anisotropy method, L(2S–2P, He⁺) = 14 042.52 (16) MHz. A new measurement of the classic Lamb shift in He⁺ by the anisotropy method has been completed recently [331]. In the course of this work the authors discovered a previously unsuspected source of systematic error in the earlier experiment [125]. The result of the new experiment [331] is L(2S–2P, He⁺) = 14 041.13 (17) MHz. Besides the experimental data, this result also depends on the theoretical value of the fine structure interval $\Delta E(2P_{3/2}-2P_{1/2}) = 175\,593.50\,(2)$ MHz, which is easily obtained from the theory described in this paper. The theoretical calculation of the He⁺ Lamb shift is straightforward with the formulae given above. It is only necessary to recall that all contributions scale with powers of Z, and the terms with high powers of Z are enhanced in comparison with the hydrogen case. This is particularly important for the contributions of order α(Zα)ⁿ. One can gain in accuracy by using in the theoretical
formulae the high-Z results for the functions $G_{SE}(Z\alpha)$ and $G_{VP}(Z\alpha)$ [13,121], extrapolated to Z = 2, instead of the nonlogarithmic terms of order α(Zα) from Table 5 and the terms of order α(Zα) from Table 7. The theoretical uncertainty may be estimated by scaling the uncertainty of the hydrogen formulae with Z. After the calculation we obtain L(2S–2P, He⁺) = 14 041.18 (13) MHz, in excellent agreement with the latest experimental result [331]. Thus, as a result of the new experiment [331], the only discrepancy between Lamb shift theory and experiment that existed in recent years has been eliminated.
16.1.8. Rydberg constant
The leading contribution to the energy levels in hydrogen in Eq. (39) is clearly sensitive to the value of the Rydberg constant; hence any measurement of a gross structure interval in hydrogen or deuterium may be used to determine the Rydberg constant, provided the Lamb shifts of the respective energy levels are known. In practice only the data on the 1S and 2S (or classic 2S–2P) Lamb shifts limit the accuracy of the determination of the Rydberg constant, since the Lamb shifts of higher levels are known theoretically with sufficient accuracy. All recent values of the Rydberg constant are derived from experimental data on at least two gross structure intervals in hydrogen and/or deuterium. This allows a simultaneous experimental determination of both the 1S Lamb shift and the Rydberg constant, and makes the value of the Rydberg constant obtained in this way virtually independent of Lamb shift theory and, more importantly, of the controversial experimental data on the proton charge radius. Either self-consistent values of both the 1S and 2S Lamb shifts, or the direct experimental value of the classic 2S–2P Lamb shift together with the corresponding 2S-dependent value of the 1S Lamb shift, are usually used to determine the precise value of the Rydberg constant.
Recent experimental results for the Rydberg constant are collected in Table 23. A few comments on the latest results are in order. The value in [322] is based on the measurement of the $f_{2S-8D}$ frequency, the $f_{1S-2S}$ frequency from [332], and the classic Lamb shift measurements [320,316]. This result should be changed because of the recent revision [311] of the $f_{2S-8D}$ frequency. The result in [3] is based on the $f_{1S-2S}$ frequency measurement, the $f_{2S-8D}$ frequency from [322], and the classic Lamb shift measurements [320,316], and should also be revised. The result in [311] is based on the $f_{2S-12D}$ frequency measurement, as well as on the frequencies measured in [322,311,323–325,3], and the classic Lamb shift measurements [320,316,321]. The results in [322,3,311] are averages obtained from experimental data on different measured frequencies and their linear combinations in hydrogen and deuterium; in principle they depend on both the measured and the self-consistent values of the Lamb shifts. To get a better idea of the effect of the rather widely spread experimental data on the classic Lamb shift on the value of the Rydberg constant, and of the balance of uncertainties, one can
The function $G_{SE}(Z\alpha)$ is defined in Footnote 13 in Section 4.5.1. The function $G_{VP}(Z\alpha)$ is defined similarly to the function of the same name in Eq. (122), but, like $G_{SE}(Z\alpha)$, it also includes nonlogarithmic contributions of order α(Zα). See Footnote 39.
Table 23
Rydberg constant
Source | $R_\infty$ (cm$^{-1}$)
Andreae et al. [332] | 109 737.3156841 (42)
Nez et al. [333] | 109 737.3156830 (31)
Nez et al. [334] | 109 737.3156834 (24)
Weitz et al. [335] | 109 737.3156844 (31)
Weitz et al. [323] | 109 737.3156849 (30)
de Beauvoir et al. [322] | 109 737.3156859 (10)
Udem et al. [3] | 109 737.31568639 (91)
Schwob et al. [311] | 109 737.31568516 (84)
Self-consistent Lamb shift [322,311] | 109 737.3156858 (5) (8) (1)
Self-consistent Lamb shift [3] | 109 737.3156852 (11) (0) (1)
Self-consistent Lamb shift [311] | 109 737.3156846 (4) (9) (1)
Proton radius Eq. (394) [322,311] | 109 737.3156843 (3) (8) (1) (9)
Proton radius Eq. (394) [3] | 109 737.3156822 (6) (0) (1) (22)
Proton radius Eq. (394) [311] | 109 737.3156831 (3) (9) (1) (9)
compare the experimental results in the upper part of Table 23 with the values of the Rydberg constant that may be calculated from the experimental frequencies and the self-consistent values of the Lamb shifts. Values of the Rydberg constant calculated from the experimental transition frequencies in hydrogen in [3,311] and the average self-consistent values of the 1S and 2S–2P Lamb shifts (see Tables 20 and 21) are presented in the middle of Table 23. The first error of these values is determined by the accuracy of the average self-consistent Lamb shifts, the second by the experimental error of the frequency measurement, and the third by the accuracy of the electron–proton mass ratio. We see that the results in the lower part of Table 23 are compatible with the results of the least-squares adjustments of all experimental data in the upper part of the table, which thus do not depend crucially on the somewhat uncertain experimental data on the 2S–2P Lamb shift. We also see that in most experiments the uncertainties of the Lamb shift determination and of the frequency measurements give the largest contributions to the uncertainty of the Rydberg constant. The high accuracy of modern experimental data and theory could allow a determination of the Rydberg constant from direct comparison between theory and experiment, without appeal to the Lamb shift results. The corresponding values of the Rydberg constant, calculated with the self-consistent proton radius from Eq. (394), are presented in the lower part of Table 23. The first error of these values is determined by the accuracy of the theoretical formula, the second by the experimental error of the frequency measurement, the third by the accuracy of the electron–proton mass ratio, and the last one by the proton radius uncertainty.
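The multi-part errors in the middle and lower parts of Table 23 can be condensed into a single number for comparison with the experimental rows. The sketch below assumes the listed components are independent and combines them in quadrature for the first "proton radius" row, where the bracketed digits refer to the last decimal place (units of 10⁻⁷ cm⁻¹). Correlations between rows that share the same frequency data are not modelled.

```python
import math


def combine_in_quadrature(*components: float) -> float:
    """Total uncertainty from independent error components."""
    return math.sqrt(sum(c * c for c in components))


R_INF = 109_737.3156843   # cm^-1, first "proton radius" row of Table 23
UNIT = 1e-7               # bracketed errors are in units of the last digit
components = (3, 8, 1, 9) # theory, frequency, mass ratio, proton radius

total = combine_in_quadrature(*components) * UNIT
print(f"sigma = {total:.1e} cm^-1, relative = {total / R_INF:.1e}")
```

The resulting relative uncertainty of roughly 1×10⁻¹¹ is comparable with that of the most precise experimental rows in the upper part of the table.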
The values of the Rydberg constant in the last three lines of Table 23 are rather accurate, and they would be able to compete with the other methods of determining the Rydberg constant from experimental data once the current controversy over the precise value of the proton charge radius is resolved. It is appropriate to emphasize once again that the experimental values of the Rydberg constant in the upper part of Table 23 are based on
Table 24
1S–2S transition in muonium
Source | ΔE (MHz)
Danzmann et al. [336] | 2 455 527 936 (120) (140)
Jungmann et al. [337] | 2 455 528 016 (58) (43)
Maas et al. [338] | 2 455 529 002 (33) (46)
Meyer et al. [339] | 2 455 528 941.0 (9.8)
Theory | 2 455 528 934.9 (0.3)
measurements of at least two intervals of the hydrogen and/or deuterium gross structure and are thus independent of the uncertain value of the proton charge radius.
16.1.9. 1S–2S transition in muonium
Starting with the pioneering work [336], Doppler-free two-photon laser spectroscopy has also been applied to measurements of the gross structure interval in muonium. Experimental results [336–339] are collected in Table 24, where the first error is due to statistics and the second is due to systematic effects. The highest accuracy was achieved in the latest experiment [339],
$$\Delta E = 2\,455\,528\,941.0\,(9.8)~\text{MHz}. \tag{402}$$
Theoretically, muonium differs from hydrogen in two main respects. First, the nucleus of the muonium atom is an elementary structureless particle, unlike the composite proton, which is a quantum chromodynamic bound state of quarks. Hence the nuclear size and structure corrections in Table 10 do not contribute to the muonium energy levels. Second, the muon is about ten times lighter than the proton, so recoil and radiative-recoil corrections are numerically much more important for muonium than for hydrogen. In almost all other respects muonium looks exactly like hydrogen with a somewhat lighter nucleus, and the theoretical expression for the 1S–2S transition frequency may easily be obtained from the leading external field contribution in Eq. (39) and the contributions to the energy levels collected in Tables 1–9, after a trivial substitution of the muon mass. Unlike the case of hydrogen, for muonium we cannot ignore the corrections in the last two lines of Table 2, and we have to use the elementary particle contributions in Eqs. (152) and (153) instead of the composite proton contribution in the fifth line of Table 9. After these modifications we obtain the theoretical prediction for the frequency of the 1S–2S transition in muonium
$$\Delta E = 2\,455\,528\,934.9\,(0.3)~\text{MHz}. \tag{403}$$
Although recoil corrections play an enhanced role in muonium, the discrepancy between the results for the radiative-recoil corrections of order α(Zα)(m/M)m discussed in Section 6.1.1 is too small, compared with the uncertainty originating from the mass ratio, to affect the theoretical prediction for the gross structure interval.
The dominant contribution to the uncertainty of this theoretical result comes from the uncertainty of the muon–electron mass ratio; in our calculations we used the most precise value of this mass ratio [5] (see the next Section 16.2 for more details). All other contributions to the uncertainty of the theoretical prediction (the uncertainty of the Rydberg constant, of the theoretical expression, etc.) are at least an order of magnitude smaller. There is complete agreement between the experimental and theoretical results for the 1S–2S transition frequency in Eqs. (402) and (403), but further improvement of the experimental data is clearly warranted.
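Statements such as "complete agreement" for muonium, or the removal of the He⁺ discrepancy in Section 16.1.7, can be quantified by expressing each difference between theory and experiment in units of the combined uncertainty. The sketch assumes independent Gaussian errors added in quadrature; the inputs are Eqs. (402) and (403) and the He⁺ numbers quoted in Section 16.1.7.

```python
import math


def tension(value_a: float, sigma_a: float, value_b: float, sigma_b: float) -> float:
    """|a - b| in units of the combined uncertainty, assuming independent Gaussian errors."""
    return abs(value_a - value_b) / math.hypot(sigma_a, sigma_b)


# Muonium 1S-2S, MHz: Meyer et al. [339], Eq. (402), vs theory, Eq. (403)
mu_exp, mu_theory = (2_455_528_941.0, 9.8), (2_455_528_934.9, 0.3)
# He+ classic Lamb shift, MHz: old [125] and new [331] experiments vs theory of Section 16.1.7
he_theory = (14_041.18, 0.13)
he_old, he_new = (14_042.52, 0.16), (14_041.13, 0.17)

print(f"muonium: {tension(*mu_exp, *mu_theory):.2f} sigma")
print(f"He+ old: {tension(*he_old, *he_theory):.1f} sigma, "
      f"new: {tension(*he_new, *he_theory):.2f} sigma")
```

The muonium difference is well below one standard deviation, while the old He⁺ result sits more than six standard deviations from theory and the new one well within one.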
16.1.10. Phenomenology of light muonic atoms
There are very few experimental results on the energy levels of light hydrogenlike muonic atoms. The classic 2P–2S Lamb shift in the muonic helium ion (μHe)⁺ was measured at CERN many years ago [340–343], and the experimental data were found to agree with the theoretical predictions existing at that time. A comprehensive theoretical review of these experimental results was given in [214], to which we refer the interested reader. It must be mentioned, however, that a recent experiment [344] failed to confirm the old experimental results. This leaves the experimental status of the Lamb shift in muonic helium uncertain, and further experimental efforts in this direction are clearly warranted. The theoretical contributions to the Lamb shift were discussed above in Section 4.3, mainly in connection with muonic hydrogen, but the respective formulae may be used for muonic helium as well. Note that some of these contributions were obtained long after the publication of the review [214] and should be used when comparing the results of future helium experiments with theory. There is also a proposal to measure the hyperfine splitting in the ground state of muonic hydrogen with an accuracy of about $10^{-\ldots}$ [345]. Inspired by this proposal, the hadronic vacuum polarization contribution to the ground state hyperfine splitting in muonic hydrogen was calculated in [346], where it was found to give a relative contribution of about $2\times10^{-\ldots}$ to the hyperfine splitting. We did not include this correction in our discussion of hyperfine splitting in muonic hydrogen, mainly because it is smaller than the theoretical errors due to the polarizability contribution. The current surge of interest in muonic hydrogen is driven mainly by the desire to obtain a new, more precise value of the proton charge radius from a measurement of the 2P–2S Lamb shift [198].
As we have seen in Section 4.3, the leading proton radius contribution is about 2% of the total 2P–2S splitting in muonic hydrogen, whereas in electronic hydrogen this contribution is relatively two orders of magnitude smaller, about $10^{-4}$ of the total 2P–2S splitting. Hence a measurement of the 2P–2S Lamb shift in muonic hydrogen with a relative error comparable to that of the Lamb shift measurements in electronic hydrogen is much more sensitive to the value of the proton charge radius. The natural linewidth of the 2P states in muonic hydrogen, and hence of the 2P–2S transition, is determined by the width of the 2P–1S transition, Γ = 0.077 meV. It is planned [198] to measure the 2P–2S Lamb shift with an accuracy at the level of 10% of the natural linewidth, i.e., with an error of about 0.008 meV, which means measuring the 2P–2S transition with a relative error of about $4\times10^{-5}$.
M.I. Eides et al. / Physics Reports 342 (2001) 63}261
The total 2P–2S Lamb shift in muonic hydrogen, calculated according to the formulae in Table 11 for $r_p = 0.862(12)$ fm, is equal to
$$\Delta E(2P-2S) = 202.225(108)\ \text{meV}. \tag{404}$$
We can write this result as the difference of a theoretical number and a term proportional to the proton charge radius squared,
$$\Delta E(2P-2S) = 206.108(4) - 5.22501\, r_p^2\ \text{meV}, \tag{405}$$
where $r_p$ is measured in fm.
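Because Eq. (405) is quadratic in $r_p$, it can be inverted directly, and linear error propagation ($|\partial\Delta E/\partial r_p| = 2 \cdot 5.22501\, r_p$ meV/fm) converts an energy uncertainty into a radius uncertainty. The short Python sketch below illustrates this arithmetic using only the coefficients of Eq. (405) and the natural linewidth $\Gamma = 0.077$ meV quoted above. The function names are illustrative, and the sketch deliberately ignores the 0.004 meV uncertainty of the constant term.

```python
import math

# Coefficients of Eq. (405): Delta E(2P-2S) = A - B * r_p**2  (meV, r_p in fm).
A_MEV = 206.108           # r_p-independent part; its 0.004 meV error is ignored here
B_MEV_PER_FM2 = 5.22501   # coefficient of r_p**2


def lamb_shift_mev(r_p_fm: float) -> float:
    """2P-2S Lamb shift in muonic hydrogen predicted by Eq. (405), in meV."""
    return A_MEV - B_MEV_PER_FM2 * r_p_fm ** 2


def proton_radius_fm(delta_e_mev: float) -> float:
    """Invert Eq. (405): r_p = sqrt((A - Delta E) / B), in fm."""
    return math.sqrt((A_MEV - delta_e_mev) / B_MEV_PER_FM2)


def radius_error_fm(sigma_e_mev: float, r_p_fm: float) -> float:
    """Linear error propagation: |dE/dr_p| = 2 B r_p, so sigma_r = sigma_E / (2 B r_p)."""
    return sigma_e_mev / (2.0 * B_MEV_PER_FM2 * r_p_fm)


r_p = 0.862                   # fm, the radius used for Eq. (404)
shift = lamb_shift_mev(r_p)   # close to the 202.225 meV of Eq. (404)
sigma_planned = 0.1 * 0.077   # planned error: 10% of the natural linewidth, meV

print(f"Delta E = {shift:.2f} meV, relative error {sigma_planned / shift:.1e}")
for sigma in (sigma_planned, 0.002):  # planned error; proton polarizability level
    dr = radius_error_fm(sigma, r_p)
    print(f"sigma_E = {sigma:.4f} meV -> sigma_r = {dr:.1e} fm, relative {dr / r_p:.1e}")
```

With these inputs the planned error of about 0.0077 meV corresponds to a relative radius error of roughly $10^{-3}$, and an error of 0.002 meV to about $2.6\times10^{-4}$, in line with the estimates discussed in the text.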
We see from Eq. (405) that when the experiment achieves the planned accuracy of about 0.008 meV [198], it will allow a determination of the proton charge radius with a relative accuracy of about 0.1%, which is about an order of magnitude better than the accuracy of the available experimental results. The uncertainty in the sum of all theoretical contributions in Eq. (405) which are not proportional to the proton charge radius squared may be somewhat reduced. This uncertainty is determined by the uncertainties of the purely electrodynamic contributions and by the uncertainty of the nuclear polarizability contribution of order $(Z\alpha)^5 m$. Purely electrodynamic uncertainties are introduced by the uncalculated nonlogarithmic contribution of order $\alpha^{\dots}(Z\alpha)^{\dots}$, corresponding to the diagrams with radiative photon insertions in the graph for the leading electron polarization in Fig. 56, and by the uncalculated light-by-light contributions in Fig. 20(e); together they may be as large as 0.004 meV. Calculation of these contributions and elimination of the respective uncertainties is the most immediate problem in the theory of muonic hydrogen. Once these corrections are calculated, the uncertainty in the sum of all theoretical contributions except those directly proportional to the proton radius squared will be determined by the uncertainty of the proton polarizability contribution of order $(Z\alpha)^5 m$. This uncertainty is currently about 0.002 meV, and it will be difficult to reduce it in the near future. If the experimental error of the 2P–2S Lamb shift measurement in muonic hydrogen is reduced to a comparable level, it would be possible to determine the proton radius with a relative error smaller than $3\times10^{-4}$, or with an absolute error of about $2\times10^{-4}$ fm, to be compared with the current proton radius measurements, whose errors are on the scale of 0.01 fm.

16.2. Hyperfine splitting
16.2.1. Hyperfine splitting in hydrogen
The hyperfine splitting in the ground state of hydrogen was measured precisely about thirty years ago [287,288]:
$$\Delta E_{\rm HFS}(H) = 1\,420\,405.7517667(9)\ \text{kHz}, \qquad \delta = 6\times10^{-13}. \tag{406}$$
For many years this hydrogen maser measurement remained the most accurate experiment in modern physics. Only recently has Doppler-free two-photon spectroscopy achieved comparable precision [3] (see the result for the 1S–2S transition frequency in Eq. (390)).
The theoretical situation for the hyperfine splitting in hydrogen has always been less satisfactory due to the uncertainties connected with the proton structure. The scale of the hyperfine splitting in hydrogen is determined by the Fermi energy in Eq. (271),
$$E_F(H) = 1\,418\,840.11(3)(1)\ \text{kHz}, \tag{407}$$
where the uncertainty in the first brackets is due to the uncertainty of the proton anomalous magnetic moment $\kappa$ measured in nuclear magnetons, and the uncertainty in the second brackets is due to the uncertainty of the fine structure constant in Eq. (385). The sum of all nonrecoil corrections to the hyperfine splitting collected in Tables 12–15 is equal to
$$\Delta E_{\rm HFS}(H) = 1\,420\,452.04(3)(1)\ \text{kHz}, \tag{408}$$
where the first error comes from the experimental error of the proton anomalous magnetic moment $\kappa$, and the second comes from the error in the value of the fine structure constant $\alpha$. The experimental error of $\kappa$ thus determines the uncertainty of the sum of all nonrecoil contributions to the hydrogen hyperfine splitting. The theoretical error of this sum is about 3 Hz, at least an order of magnitude smaller than the uncertainty introduced by $\kappa$, and we did not write it explicitly in Eq. (408). In relative units this theoretical error is about $2\times10^{-9}$, to be compared with the estimate $1.2\times10^{-7}$ of the same error made in [289]. The reduction of the theoretical error by two orders of magnitude reflects the progress achieved in calculations of nonrecoil corrections during the last decade. The real stumbling block on the road to a more precise theory of hydrogen hyperfine splitting is the proton structure, polarizability, and recoil corrections, and there has been little progress in this respect in recent years. Following tradition [289], let us compare the theoretical result without the unknown proton polarizability correction with the experimental data in the form
$$\frac{\Delta E_{\rm th}(H) - \Delta E_{\rm exp}(H)}{E_F(H)} = -4.5(1.1)\times10^{-6}. \tag{409}$$
The difference between the numbers and error estimates on the RHS of Eq. (409) and the respective numbers in [289] is due mainly to different treatments of the form factor parametrizations and different values of the proton radius. The new recoil and structure corrections collected in the lower part of Table 19 had a relatively small effect on the RHS of Eq. (409). The uncertainty in Eq. (409) is dominated by the uncertainty of the Zemach correction in Eq. (347). As we discussed in Section 14.1.1.1, this uncertainty is connected with the accuracy of the dipole fit for the proton form factor and with contradictory experimental data on the proton radius. It is fair to say that the estimate of this uncertainty is to a certain extent subjective and reflects the prejudices of the authors. One might hope that new experimental data on the proton radius and the proton form factor will provide more solid ground for the treatment of the Zemach correction and allow a more reliable estimate of the difference on the LHS of Eq. (409). The result in Eq. (409) does not contradict the rigorous upper bound on the proton polarizability correction in Eq. (362). It could be understood as an indication of the relatively large magnitude of
the polarizability contribution, and as a challenge to theory to obtain a reliable estimate of the polarizability contribution on the basis of the new experimental data.

16.2.2. Hyperfine splitting in deuterium
The hyperfine splitting in the ground state of deuterium was measured with very high accuracy a long time ago [347,308]:
$$\Delta E_{\rm HFS}(D) = 327\,384.3525219(17)\ \text{kHz}, \qquad \delta = 5.2\times10^{-12}. \tag{410}$$
Besides trivial substitutions similar to those in the case of hydrogen, the expression for the Fermi energy in Eq. (271) should also be multiplied by an additional factor 3/4, which corresponds to the transition from a spin one-half nucleus in the case of hydrogen and muonium to the spin-one nucleus in the case of deuterium. The final expression for the deuterium Fermi energy has the form
$$E_F(D) = 4\,\alpha^2 \mu_d\, \frac{m}{M_p}\left(1+\frac{m}{M_d}\right)^{-3} ch R_\infty, \tag{411}$$
where $\mu_d = 0.8574382284(94)$ is the deuteron magnetic moment in nuclear magnetons, $M_d$ is the deuteron mass, and $M_p$ is the proton mass. Numerically,
$$E_F(D) = 326\,967.678(4)\ \text{kHz}, \tag{412}$$
where the main contribution to the uncertainty is introduced by the uncertainty of the deuteron magnetic moment measured in nuclear magnetons. As in the case of hydrogen, after trivial modifications we can use all nonrecoil corrections in Tables 12–16 for calculations in deuterium. The sum of all nonrecoil corrections is numerically equal to
$$\Delta E(D) = 327\,339.143(4)\ \text{kHz}. \tag{413}$$
Unlike the proton, the deuteron is a weakly bound system, so one cannot simply use the results for the hydrogen recoil and structure corrections for deuterium; a completely new treatment is needed. For many years only one minor nuclear structure correction [348–351] was discussed in the literature, but it was far too small to explain the difference between the experimental result in Eq. (410) and the sum of nonrecoil corrections in Eq. (413),
$$\Delta E_{\rm HFS}(D) - \Delta E(D) = 45.2\ \text{kHz}. \tag{414}$$
A breakthrough was achieved a few years ago when it was realized that an analytic calculation of the deuterium recoil, structure, and polarizability corrections is possible in the zero-range approximation [184]. The analytic result for the difference in Eq. (414), obtained in an elegant calculation in [184], is numerically equal to 44 kHz and, within the accuracy of the zero-range approximation, fully explains the difference between the experimental result and the sum of the nonrecoil corrections. More accurate calculations of the nuclear effects in the deuterium hyperfine structure beyond the zero-range approximation are feasible, and comparison of such results with the experimental data on the deuterium hyperfine splitting may be used as a test of deuteron models.
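As a numerical cross-check of Eqs. (407) and (412), the sketch below evaluates the Fermi energy in the form $E_F = c_I\,\alpha^2 \mu_N (m/M_p)(1+m/M_N)^{-3} chR_\infty$, with $c_I = 16/3$ for the spin one-half proton and $c_I = (3/4)(16/3) = 4$ for the deuteron, as in Eq. (411). The hydrogen form is our assumption about Eq. (271), which is not reproduced in this section, and the fundamental constants are CODATA 1998 values assumed here rather than taken from Eqs. (385)–(387). The helper name `fermi_energy_khz` is hypothetical.

```python
# Fundamental constants: CODATA 1998 values, assumed here for illustration.
ALPHA = 1 / 137.03599976            # fine structure constant
C_RYD_HZ = 3.289841960368e15        # c * R_infinity, Hz
MP_OVER_ME = 1836.1526675           # proton/electron mass ratio
MD_OVER_ME = 3670.4829550           # deuteron/electron mass ratio
MU_P = 2.792847337                  # proton magnetic moment in nuclear magnetons (1 + kappa)
MU_D = 0.8574382284                 # deuteron magnetic moment in nuclear magnetons


def fermi_energy_khz(spin_factor: float, mu_nuc: float, nuc_over_me: float) -> float:
    """E_F = spin_factor * alpha^2 * mu_nuc * (m/M_p) * (1 + m/M_nuc)^-3 * c R_inf, in kHz.

    spin_factor is 16/3 for a spin-1/2 nucleus; the extra factor 3/4 for the
    spin-1 deuteron turns it into 4.
    """
    recoil = (1.0 + 1.0 / nuc_over_me) ** -3        # reduced-mass factor (m_r/m)^3
    return spin_factor * ALPHA**2 * mu_nuc / MP_OVER_ME * recoil * C_RYD_HZ / 1e3


e_f_h = fermi_energy_khz(16 / 3, MU_P, MP_OVER_ME)          # compare Eq. (407)
e_f_d = fermi_energy_khz(0.75 * 16 / 3, MU_D, MD_OVER_ME)   # compare Eq. (412)

# Experimental deuterium HFS, Eq. (410), minus the nonrecoil sum, Eq. (413), in kHz.
nuclear_effects_d = 327_384.3525219 - 327_339.143

print(f"E_F(H) = {e_f_h:,.2f} kHz")
print(f"E_F(D) = {e_f_d:,.3f} kHz")
print(f"Delta E_HFS(D) - Delta E(D) = {nuclear_effects_d:.1f} kHz")
```

With these inputs both Fermi energies agree with Eqs. (407) and (412) at the sub-kHz level, and the last line reproduces the 45.2 kHz of Eq. (414), which the zero-range calculation of [184] estimates as 44 kHz.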
16.2.3. Hyperfine splitting in muonium
Being a purely electrodynamic bound state, muonium is the best system for comparison between hyperfine splitting theory and experiment. Unlike the case of hydrogen, the theory of hyperfine splitting in muonium is free from uncertainties generated by the hadronic nature of the proton, and is thus much more precise. The scale of the hyperfine splitting is determined by the Fermi energy in Eq. (271),
$$E_F(\text{Mu}) = 4\,459\,031.922(518)(34)\ \text{kHz}, \tag{415}$$
where the uncertainty in the first brackets is due to the uncertainty of the muon–electron mass ratio in Eq. (387), and the uncertainty in the second brackets is due to the uncertainty of the fine structure constant in Eq. (385). The theoretical prediction for the hyperfine splitting interval in the ground state of muonium is easily obtained by collecting all contributions to HFS displayed in Tables 12–18:
$$\Delta E_{\rm HFS}(\text{Mu}) = 4\,463\,302.565(518)(34)(100)\ \text{kHz}, \tag{416}$$
where the first error comes from the experimental error of the electron–muon mass ratio $m/M$, the second comes from the error in the value of the fine structure constant $\alpha$, and the third is an estimate of the yet unknown theoretical contributions. We see that the uncertainty of the muon–electron mass ratio gives by far the largest contribution both to the uncertainty of the Fermi energy and to that of the theoretical value of the ground state hyperfine splitting.
On the experimental side, the hyperfine splitting in the ground state of muonium admits a very precise determination due to its small natural linewidth. The lifetime of the upper hyperfine state, with total angular momentum $F = 1$, with respect to the M1 transition to the lower state with $F = 0$ is extremely large, $\tau \approx 1\times10^{13}$ s, and gives a negligible contribution to the linewidth. The natural linewidth $\Gamma_\mu/h = 72.3$ kHz is completely determined by the muon lifetime $\tau_\mu \approx 2.2\times10^{-6}$ s. A high-precision value of the muonium hyperfine splitting was obtained many years ago [4]:
$$\Delta E_{\rm HFS}(\text{Mu}) = 4\,463\,302.88(16)\ \text{kHz}, \qquad \delta = 3.6\times10^{-8}. \tag{417}$$
In the latest measurement [5] this value was improved by a factor of three:
$$\Delta E_{\rm HFS}(\text{Mu}) = 4\,463\,302.776(51)\ \text{kHz}, \qquad \delta = 1.1\times10^{-8}. \tag{418}$$
The new value has an experimental error which corresponds to measuring the hyperfine energy splitting at the level of $\Delta\nu/(\Gamma_\mu/h) \approx 7\times10^{-4}$ of the natural linewidth. This is a remarkable experimental achievement. The agreement between theory and experiment is excellent. However, the error bars of the theoretical value appear to be about an order of magnitude larger than the respective error bars of the experimental result. This impression is deceptive. The error of the theoretical prediction in Eq. (416) is dominated by the experimental error of the electron–muon mass ratio. As a result of the new experiment [5] this error was reduced threefold, but it is still by far the largest source of error in the theoretical value of the muonium hyperfine splitting.
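The precision figures quoted above follow from simple arithmetic, which the Python sketch below reproduces from the numbers in Eqs. (416)–(418): the natural linewidth $\Gamma_\mu/h = 1/(2\pi\tau_\mu)$ computed with the rounded muon lifetime $\tau_\mu \approx 2.2\times10^{-6}$ s, the relative errors $\delta$ of the two measurements, the ratio of the new experimental error to the linewidth, and the ratio of the estimated unknown theoretical contributions (the third error in Eq. (416)) to that experimental error. Variable and function names are illustrative.

```python
import math

TAU_MU_S = 2.2e-6   # muon lifetime in seconds, rounded as in the text

# (value, one-standard-deviation error) in kHz
HFS_1982 = (4_463_302.88, 0.16)     # Eq. (417), Ref. [4]
HFS_1999 = (4_463_302.776, 0.051)   # Eq. (418), Ref. [5]
THEORY_UNKNOWN_KHZ = 0.100          # third error bar in Eq. (416)

# Natural linewidth in frequency units: Gamma/h = 1 / (2 pi tau), converted to kHz.
linewidth_khz = 1.0 / (2.0 * math.pi * TAU_MU_S) / 1e3


def relative_error(measurement):
    """Relative error delta = sigma / value of a (value, sigma) pair."""
    value, sigma = measurement
    return sigma / value


print(f"Gamma_mu/h = {linewidth_khz:.1f} kHz")
print(f"delta(1982) = {relative_error(HFS_1982):.1e}, delta(1999) = {relative_error(HFS_1999):.1e}")
print(f"improvement factor = {HFS_1982[1] / HFS_1999[1]:.1f}")
print(f"error / linewidth = {HFS_1999[1] / linewidth_khz:.1e}")
print(f"unknown theory / experimental error = {THEORY_UNKNOWN_KHZ / HFS_1999[1]:.1f}")
```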
The estimate of the theoretical uncertainty is only two times larger than the experimental error. The largest source of theoretical error is connected with the yet uncalculated contributions to the hyperfine splitting, mainly the unknown recoil and radiative-recoil corrections. As we have already mentioned, reducing the theoretical uncertainty by an order of magnitude, to about 10 Hz, is now a realistic aim for the theory. One may extract the muon–electron mass ratio from the experimental value of HFS and the most precise value of $\alpha$:
$$\frac{M}{m} = 206.7682672(23)(16)(46), \tag{419}$$
where the first error comes from the experimental error of the hyperfine splitting measurement, the second comes from the error in the value of the fine structure constant $\alpha$, and the third is an estimate of the yet unknown theoretical contributions. Combining all errors in quadrature, we obtain the mass ratio
$$\frac{M}{m} = 206.7682672(54), \qquad \delta = 2.6\times10^{-8}, \tag{420}$$
which is almost five times more accurate than the best earlier experimental value in Eq. (387). We see from Eq. (419) that the error of this indirect value of the mass ratio is dominated by the theoretical uncertainty. This sets a clear task for the theory: to reduce the theoretical contribution to the error bars in Eq. (419) below the two other contributions. To this end it is sufficient to calculate all contributions to HFS which are larger than 10 Hz, which would reduce the uncertainty of the indirect value of the muon–electron mass ratio by a factor of two. The recent experimental and theoretical achievements have thus created a real incentive to improve the theory of HFS so that it accounts for all corrections of order 10 Hz. Another reason to improve the HFS theory is provided by the prospect of reducing the experimental uncertainty of the hyperfine splitting below the weak interaction contribution in Eq. (343). In that case, muonium could become the first atom in which a shift of atomic energy levels due to the weak interaction would be observed [352].

16.3. Summary
High-precision experiments with hydrogenlike systems have achieved a new level of accuracy in recent years, and further dramatic progress is still expected. The experimental errors of measurements of many energy shifts in hydrogen and muonium were reduced by orders of magnitude. This rapid experimental progress was matched by the theoretical developments discussed above. The accuracy of the quantum electrodynamic theory of such classical effects as the Lamb shift in hydrogen and the hyperfine splitting in muonium has increased in many cases by one or two orders of magnitude. This was achieved through the intensive work of many theorists and the development of new, ingenious theoretical approaches which can be applied to the theory of bound states not only in QED but also in other field theories, such as quantum chromodynamics. From the
phenomenological point of view, recent developments have opened new perspectives for the precise determination of many fundamental constants (the Rydberg constant, the electron–muon mass ratio, the proton charge radius, the deuteron structure radius, etc.) and for the comparison of experimental and theoretical results on the Lamb shifts and hyperfine splitting. Recent progress also poses new theoretical challenges. Reducing the theoretical error in the prediction of the 1S Lamb shift in hydrogen to the level of 1 kHz (and, correspondingly, of the 2S Lamb shift to several tenths of a kHz) should be considered the next stage of the theory. The theoretical error of the hyperfine splitting in muonium should be reduced to about 10 Hz. Achieving these goals will require hard work and considerable resourcefulness, but results which hardly seemed possible years ago are now within reach.
Acknowledgements
Over many years, many friends and colleagues discussed the bound state problem with us, collaborated with us on different projects, and shared with us their vision and insight. We are especially grateful to the late D. Yennie and M. Samuel, and to G. Adkins, M. Braun, M. Doncheski, R. Faustov, G. Drake, K. Jungmann, S. Karshenboim, Y. Khriplovich, T. Kinoshita, L. Labzowsky, P. Lepage, A. Martynenko, A. Milshtein, P. Mohr, D. Owen, K. Pachucki, V. Pal'chikov, J. Sapirstein, V. Shabaev, B. Taylor, and A. Yelkhovsky. This work was supported by NSF grants PHY-9120102, PHY-9421408, and PHY-9900771.
References
[1] W.E. Lamb Jr., R.C. Retherford, Phys. Rev. 72 (1947) 339.
[2] M.G. Boshier, P.E.G. Baird, C.J. Foot et al., Phys. Rev. A 40 (1989) 6169.
[3] Th. Udem, A. Huber, B. Gross et al., Phys. Rev. Lett. 79 (1997) 2646.
[4] F.G. Mariam, W. Beer, P.R. Bolton et al., Phys. Rev. Lett. 49 (1982) 993.
[5] W. Liu, M.G. Boshier, S. Dhawan et al., Phys. Rev. Lett. 82 (1999) 711.
[6] H.A. Bethe, E.E. Salpeter, Quantum Mechanics of One- and Two-Electron Atoms, Springer, Berlin, 1957.
[7] J.R. Sapirstein, D.R. Yennie, in: T. Kinoshita (Ed.), Quantum Electrodynamics, World Scientific, Singapore, 1990, p. 560.
[8] H. Grotch, Found. Phys. 24 (1994) 249.
[9] V.V. Dvoeglazov, Yu.N. Tyukhtyaev, R.N. Faustov, Fiz. Elem. Chastits At. Yadra 25 (1994) 144 [Phys. Part. Nucl. 25 (1994) 58].
[10] M.I. Eides, New Developments in the Theory of Muonium Hyperfine Splitting, in: H.M. Fried, B.M. Mueller (Eds.), Quantum Infrared Physics, Proceedings of the Paris Workshop on Quantum Infrared Physics, June 6–10, 1994, World Scientific, Singapore, 1995, p. 262.
[11] T. Kinoshita, Rep. Prog. Phys. 59 (1996) 3803.
[12] J. Sapirstein, in: G.W.F. Drake (Ed.), Atomic, Molecular and Optical Physics Handbook, AIP Press, 1996, p. 327.
[13] P.J. Mohr, in: G.W.F. Drake (Ed.), Atomic, Molecular and Optical Physics Handbook, AIP Press, New York, 1996, p. 341.
[14] K. Pachucki, D. Leibfried, M. Weitz, A. Huber, W. König, T.W. Hänsch, J. Phys. B 29 (1996) 177; B 29 (1996) 1573(E).
[15] T. Kinoshita, hep-ph/9808351, Cornell preprint, 1998.
[16] P.J. Mohr, G. Plunien, G. Soff, Phys. Rep. 293 (1998) 227.
[17] W.E. Caswell, G.P. Lepage, Phys. Lett. 167B (1986) 437.
[18] T. Kinoshita, M. Nio, Phys. Rev. D 53 (1996) 4909.
[19] J.D. Bjorken, S.D. Drell, Relativistic Quantum Mechanics, McGraw-Hill, New York, 1964.
[20] V.B. Berestetskii, E.M. Lifshitz, L.P. Pitaevskii, Quantum Electrodynamics, 2nd Edition, Pergamon Press, Oxford, 1982.
[21] E.E. Salpeter, H.A. Bethe, Phys. Rev. 84 (1951) 1232.
[22] C. Itzykson, J.-B. Zuber, Quantum Field Theory, McGraw-Hill, New York, 1980.
[23] F. Gross, Relativistic Quantum Mechanics and Field Theory, Wiley, New York, 1993.
[24] H. Grotch, D.R. Yennie, Zeitsch. Phys. 202 (1967) 425.
[25] H. Grotch, D.R. Yennie, Rev. Mod. Phys. 41 (1969) 350.
[26] F. Gross, Phys. Rev. 186 (1969) 1448.
[27] L.S. Dulyan, R.N. Faustov, Teor. Mat. Fiz. 22 (1975) 314 [Theor. Math. Phys. 22 (1975) 220].
[28] P. Lepage, Phys. Rev. A 16 (1977) 863.
[29] E. Fermi, Z. Phys. 60 (1930) 320.
[30] G. Breit, Phys. Rev. 34 (1929) 553; ibid. 36 (1930) 383; ibid. 39 (1932) 616.
[31] W.A. Barker, F.N. Glover, Phys. Rev. 99 (1955) 317.
[32] H.A. Bethe, Phys. Rev. 72 (1947) 339.
[33] N.M. Kroll, W.E. Lamb, Phys. Rev. 75 (1949) 388.
[34] J.B. French, V.F. Weisskopf, Phys. Rev. 75 (1949) 1240.
[35] G. Bhatt, H. Grotch, Phys. Rev. A 31 (1985) 2794.
[36] G. Bhatt, H. Grotch, Ann. Phys. (NY) 178 (1987) 1.
[37] S.E. Haywood, J.D. Morgan III, Phys. Rev. A 32 (1985) 3179.
[38] G.W.F. Drake, R.A. Swainson, Phys. Rev. A 41 (1990) 1243.
[39] J. Schwinger, Phys. Rev. 73 (1948) 416.
[40] E.A. Uehling, Phys. Rev. 48 (1935) 55.
[41] J. Weneser, R. Bersohn, N.M. Kroll, Phys. Rev. 91 (1953) 1257.
[42] M.F. Soto, Phys. Rev. Lett. 17 (1966) 1153; Phys. Rev. A 2 (1970) 734.
[43] T. Appelquist, S.J. Brodsky, Phys. Rev. Lett. 24 (1970) 562; Phys. Rev. A 2 (1970) 2293.
[44] B.E. Lautrup, A. Peterman, E. de Rafael, Phys. Lett. 31B (1970) 577.
[45] R. Barbieri, J.A. Mignaco, E. Remiddi, Lett. Nuovo Cimento 3 (1970) 588.
[46] A. Peterman, Phys. Lett. 34B (1971) 507; ibid. 35B (1971) 325.
[47] J.A. Fox, Phys. Rev. D 3 (1971) 3228; ibid. D 4 (1971) 3229; ibid. D 5 (1972) 492.
[48] R. Barbieri, J.A. Mignaco, E. Remiddi, Nuovo Cimento A 6 (1971) 21.
[49] E.A. Kuraev, L.N. Lipatov, N.P. Merenkov, preprint LNPI 46, June 1973.
[50] R. Karplus, N.M. Kroll, Phys. Rev. 77 (1950) 536.
[51] A. Peterman, Helv. Phys. Acta 30 (1957) 407; Nucl. Phys. 3 (1957) 689.
[52] C.M. Sommerfield, Phys. Rev. 107 (1957) 328; Ann. Phys. (NY) 5 (1958) 26.
[53] M. Baranger, F.J. Dyson, E.E. Salpeter, Phys. Rev. 88 (1952) 680.
[54] G. Kallen, A. Sabry, Kgl. Dan. Vidensk. Selsk. Mat.-Fis. Medd. 29 (17) (1955).
[55] J. Schwinger, Particles, Sources and Fields, Vol. 2, Addison-Wesley, Reading, MA, 1973.
[56] K. Melnikov, T. van Ritbergen, Phys. Rev. Lett. 84 (2000) 1673.
[57] T. Kinoshita, in: T. Kinoshita (Ed.), Quantum Electrodynamics, World Scientific, Singapore, 1990, p. 218.
[58] S. Laporta, E. Remiddi, Phys. Lett. B 379 (1996) 283.
[59] P.A. Baikov, D.J. Broadhurst, preprint OUT-4102-54, hep-ph/9504398, April 1995; in: B. Denby, D. Perret-Gallix (Eds.), Proceedings of New Computing Technique in Physics Research IV, World Scientific, Singapore, 1995.
[60] M.I. Eides, H. Grotch, Phys. Rev. A 52 (1995) 3360.
[61] S.G. Karshenboim, J. Phys. B: At. Mol. Opt. Phys. 28 (1995) L77.
[62] M.I. Eides, V.A. Shelyuto, Phys. Rev. A 52 (1995) 954.
[63] J.L. Friar, J. Martorell, D.W.L. Sprung, Phys. Rev. A 59 (1999) 4061.
[64] R. Karplus, A. Klein, J. Schwinger, Phys. Rev. 84 (1951) 597.
[65] R. Karplus, A. Klein, J. Schwinger, Phys. Rev. 86 (1952) 288.
[66] M. Baranger, Phys. Rev. 84 (1951) 866; M. Baranger, H.A. Bethe, R. Feynman, Phys. Rev. 92 (1953) 482.
[67] A.A. Abrikosov, Zh. Eksp. Teor. Fiz. 30 (1956) 96 [Sov. Phys.-JETP 3 (1956) 71].
[68] H.M. Fried, D.R. Yennie, Phys. Rev. 112 (1958) 1391.
[69] S.G. Karshenboim, V.A. Shelyuto, M.I. Eides, Yad. Fiz. 47 (1988) 454 [Sov. J. Nucl. Phys. 47 (1988) 287].
[70] M.I. Eides, H. Grotch, D.A. Owen, Phys. Lett. B 294 (1992) 115.
[71] K. Pachucki, Phys. Rev. A 48 (1993) 2609.
[72] S. Laporta, as cited in [71].
[73] M.I. Eides, H. Grotch, Phys. Lett. B 301 (1993) 127.
[74] M.I. Eides, H. Grotch, Phys. Lett. B 308 (1993) 389.
[75] M.I. Eides, H. Grotch, V.A. Shelyuto, Phys. Rev. A 55 (1997) 2447.
[76] M.I. Eides, H. Grotch, P. Pebler, Phys. Lett. B 326 (1994) 197; Phys. Rev. A 50 (1994) 144.
[77] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. B 312 (1993) 358; Yad. Phys. 57 (1994) 1309 [Phys. Atom. Nuclei 57 (1994) 1240].
[78] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Yad. Phys. 57 (1994) 2246 [Phys. Atom. Nuclei 57 (1994) 2158].
[79] K. Pachucki, Phys. Rev. Lett. 72 (1994) 3154.
[80] M.I. Eides, V.A. Shelyuto, Pis'ma Zh. Eksp. Teor. Fiz. 61 (1995) 465 [JETP Letters 61 (1995) 478].
[81] A.J. Layzer, Phys. Rev. Lett. 4 (1960) 580; J. Math. Phys. 2 (1961) 292, 308.
[82] H.M. Fried, D.R. Yennie, Phys. Rev. Lett. 4 (1960) 583.
[83] G.W. Erickson, D.R. Yennie, Ann. Phys. (NY) 35 (1965) 271.
[84] G.W. Erickson, D.R. Yennie, Ann. Phys. (NY) 35 (1965) 447.
[85] G.W. Erickson, Phys. Rev. Lett. 27 (1971) 780.
[86] P.J. Mohr, Phys. Rev. Lett. 34 (1975) 1050; Phys. Rev. A 26 (1982) 2338.
[87] J.R. Sapirstein, Phys. Rev. Lett. 47 (1981) 1723.
[88] V.G. Pal'chikov, Metrologia 10 (1987) 3 (in Russian).
[89] L. Hostler, J. Math. Phys. 5 (1964) 1235.
[90] J. Schwinger, J. Math. Phys. 5 (1964) 1606.
[91] K. Pachucki, Phys. Rev. A 46 (1992) 648.
[92] K. Pachucki, Ann. Phys. (NY) 236 (1993) 1.
[93] S.S. Schweber, QED and the Men Who Made It, Princeton University Press, Princeton, 1994.
[94] E.E. Salpeter, Phys. Rev. 87 (1952) 553.
[95] T. Fulton, P.C. Martin, Phys. Rev. 95 (1954) 811.
[96] I.B. Khriplovich, A.I. Milstein, A.S. Yelkhovsky, Phys. Scr. T 46 (1993) 252.
[97] J.A. Fox, D.R. Yennie, Ann. Phys. (NY) 81 (1973) 438.
[98] K. Pachucki, as cited in [121].
[99] U. Jentschura, K. Pachucki, Phys. Rev. A 54 (1996) 1853.
[100] U. Jentschura, G. Soff, P.J. Mohr, Phys. Rev. A 56 (1997) 1739.
[101] R. Serber, Phys. Rev. 48 (1935) 49.
[102] E.H. Wichmann, N.M. Kroll, Phys. Rev. 101 (1956) 843.
[103] V.G. Ivanov, S.G. Karshenboim, Yad. Phys. 60 (1997) 333 [Phys. Atom. Nuclei 60 (1997) 270].
[104] N.L. Manakov, A.A. Nekipelov, A.G. Fainstein, Zh. Eksp. Teor. Fiz. 95 (1989) 1167 [Sov. Phys.-JETP 68 (1989) 673].
[105] M.H. Mittleman, Phys. Rev. 107 (1957) 1170.
[106] D.E. Zwanziger, Phys. Rev. 121 (1961) 1128.
[107] P.J. Mohr, At. Data Nucl. Data Tables 29 (1983) 453.
[108] P.J. Mohr, in: I.A. Sellin, D.J. Pegg (Eds.), Beam-Foil Spectroscopy, Plenum Press, New York, Vol. 1, 1976, p. 89.
[109] L.D. Landau, E.M. Lifshitz, Quantum Mechanics, 3rd Edition, Butterworth-Heinemann, Stoneham, MA, 1997.
[110] S.G. Karshenboim, Zh. Eksp. Teor. Fiz. 103 (1993) 1105 [JETP 76 (1993) 541].
[111] S. Mallampalli, J. Sapirstein, Phys. Rev. Lett. 80 (1998) 5297.
[112] I. Goidenko, L. Labzowsky, A. Nefiodov et al., Phys. Rev. Lett. 83 (1999) 2312.
[113] V.A. Yerokhin, SPB preprint, Phys. Rev. A 62 (2000) 012508.
[114] A.V. Manohar, I.W. Stewart, Phys. Rev. Lett. 85 (2000) 2248.
[115] S.G. Karshenboim, Zh. Eksp. Teor. Fiz. 109 (1996) 752 [JETP 82 (1996) 403].
[116] S.G. Karshenboim, J. Phys. B: At. Mol. Opt. Phys. 29 (1996) L29.
[117] D.R. Yennie, S.C. Frautchi, H. Suura, Ann. Phys. (NY) 13 (1961) 379.
[118] S.G. Karshenboim, Z. Phys. D 36 (1996) 11.
[119] S.G. Karshenboim, Yad. Phys. 58 (1995) 707 [Phys. Atom. Nuclei 58 (1995) 649].
[120] S.G. Karshenboim, Zh. Eksp. Teor. Fiz. 106 (1994) 414 [JETP 79 (1994) 230].
[121] U. Jentschura, P.J. Mohr, G. Soff, Phys. Rev. Lett. 82 (1999) 53.
[122] P.J. Mohr, Y.-K. Kim, Phys. Rev. A 45 (1992) 2727.
[123] P.J. Mohr, Phys. Rev. A 46 (1992) 4421.
[124] P.J. Mohr, Phys. Rev. A 44 (1991) R4089; Errata A 51 (1995) 3390.
[125] A. van Wijngaarden, J. Kwela, G.W.F. Drake, Phys. Rev. A 43 (1991) 3325.
[126] S.G. Karshenboim, Yad. Phys. 58 (1995) 309 [Phys. Atom. Nuclei 58 (1995) 262].
[127] S.G. Karshenboim, Can. J. Phys. 76 (1998) 169; Zh. Eksp. Teor. Fiz. 116 (1999) 1575 [JETP 89 (1999) 850].
[128] T. Welton, Phys. Rev. 74 (1948) 1157.
[129] M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Ann. Phys. (NY) 205 (1991) 231, 291.
[130] G.W. Erickson, J. Phys. Chem. Ref. Data 6 (1977) 831.
[131] G.W. Erickson, H. Grotch, Phys. Rev. Lett. 25 (1988) 2611; 63 (1989) 1326(E).
[132] M. Doncheski, H. Grotch, D.A. Owen, Phys. Rev. A 41 (1990) 2851.
[133] M. Doncheski, H. Grotch, G.W. Erickson, Phys. Rev. A 43 (1991) 2125.
[134] I.B. Khriplovich, A.I. Milstein, A.S. Yelkhovsky, Phys. Scr. T 46 (1993) 252.
[135] R.N. Fell, I.B. Khriplovich, A.I. Milstein, A.S. Yelkhovsky, Phys. Lett. A 181 (1993) 172.
[136] K. Pachucki, H. Grotch, Phys. Rev. A 51 (1995) 1854.
[137] M.A. Braun, Zh. Eksp. Teor. Fiz. 64 (1973) 413 [Sov. Phys.-JETP 37 (1973) 211].
[138] V.M. Shabaev, Teor. Mat. Fiz. 63 (1985) 394 [Theor. Math. Phys. 63 (1985) 588].
[139] A.S. Yelkhovsky, preprint Budker INP 94-27, hep-th/9403095 (1994).
[140] L.N. Labzowsky, Proceedings of the XVII All-Union Congress on Spectroscopy, Moscow, 1972, Part 2, p. 89.
[141] J.H. Epstein, S.T. Epstein, Am. J. Phys. 30 (1962) 266.
[142] V.M. Shabaev, J. Phys. B: At. Mol. Opt. Phys. B 24 (1991) 4479.
[143] A.S. Elkhovskii, Zh. Eksp. Teor. Fiz. 110 (1996) 431; JETP 83 (1996) 230.
[144] M.I. Eides, H. Grotch, Phys. Rev. A 55 (1997) 3351.
[145] A.S. Elkhovskii, Zh. Eksp. Teor. Fiz. 113 (1998) 865 [JETP 86 (1998) 472].
[146] V.M. Shabaev, A.N. Artemyev, T. Beier et al., Phys. Rev. A 57 (1998) 4235; V.M. Shabaev, A.N. Artemyev, T. Beier, G. Soff, J. Phys. B: At. Mol. Opt. Phys. B 31 (1998) L337.
[147] E.A. Golosov, A.S. Elkhovskii, A.I. Milshtein, I.B. Khriplovich, Zh. Eksp. Teor. Fiz. 107 (1995) 393 [JETP 80 (1995) 208].
[148] I.B. Khriplovich, A.S. Yelkhovsky, Phys. Lett. 246B (1990) 520.
[149] K. Pachucki, S.G. Karshenboim, Phys. Rev. A 60 (1999) 2792.
[150] K. Melnikov, A.S. Yelkhovsky, Phys. Lett. 458B (1999) 143.
[151] G. Bhatt, H. Grotch, Phys. Rev. Lett. 58 (1987) 471.
[152] K. Pachucki, Phys. Rev. A 52 (1995) 1079.
[153] M.I. Eides, H. Grotch, V.A. Shelyuto, work in progress.
[154] M.I. Eides, H. Grotch, Phys. Rev. A 52 (1995) 1757.
[155] A.S. Yelkhovsky, preprint Budker INP 97-80, hep-ph/9710377 (1997).
[156] L.L. Foldy, Phys. Rev. 83 (1951) 688.
[157] D.J. Drickey, L.N. Hand, Phys. Rev. Lett. 9 (1962) 521; L.N. Hand, D.J. Miller, R. Wilson, Rev. Mod. Phys. 35 (1963) 335.
[158] G.G. Simon, Ch. Schmidt, F. Borkowski, V.H. Walther, Nucl. Phys. A 333 (1980) 381.
[159] R. Rosenfelder, Phys. Lett. B 479 (2000) 381.
[160] I.B. Khriplovich, A.I. Milstein, R.A. Sen'kov, Phys. Lett. A 221 (1996) 370; Zh. Eksp. Teor. Fiz. 111 (1997) 1935 [JETP 84 (1997) 1054].
[161] D.A. Owen, Found. Phys. 24 (1994) 273.
[162] K. Pachucki, S.G. Karshenboim, J. Phys. B: At. Mol. Opt. Phys. 28 (1995) L221.
[163] J.L. Friar, J. Martorell, D.W.L. Sprung, Phys. Rev. A 56 (1997) 4579.
[164] L.A. Borisoglebsky, E.E. Trofimenko, Phys. Lett. 81B (1979) 175.
[165] J.L. Friar, Ann. Phys. (NY) 122 (1979) 151.
[166] C. Zemach, Phys. Rev. 104 (1956) 1771.
[167] J.L. Friar, G.L. Payne, Phys. Rev. A 56 (1997) 5173.
[168] R.N. Faustov, A.P. Martynenko, Samara State University preprint SSU-HEP-99/04, hep-ph/9904362, April 1999, Yad. Phys. 63 (2000) 915 [Phys. Atom. Nuclei 63, May 2000].
[169] T.E.O. Erickson, J. Hufner, Nucl. Phys. B 47 (1972) 205.
[170] J. Bernabeu, C. Jarlskog, Nucl. Phys. B 60 (1973) 347.
[171] J. Bernabeu, C. Jarlskog, Nucl. Phys. B 75 (1974) 59.
[172] S.A. Startsev, V.A. Petrun'kin, A.L. Khomkin, Yad. Fiz. 23 (1976) 1233 [Sov. J. Nucl. Phys. 23 (1976) 656].
[173] J. Bernabeu, C. Jarlskog, Phys. Lett. 60B (1976) 197.
[174] J.L. Friar, Phys. Rev. C 16 (1977) 1540.
[175] R. Rosenfelder, Nucl. Phys. A 393 (1983) 301.
[176] J. Bernabeu, T.E.O. Ericson, Z. Phys. A 309 (1983) 213.
[177] I.B. Khriplovich, R.A. Sen'kov, Novosibirsk preprint, nucl-th/9704043, April 1997.
[178] B.E. MacGibbon, G. Garino, M.A. Lucas et al., Phys. Rev. C 52 (1995) 2097.
[179] I.B. Khriplovich, R.A. Sen'kov, Phys. Lett. A 249 (1998) 474.
[180] I.B. Khriplovich, R.A. Sen'kov, Phys. Lett. B 481 (2000) 447.
[181] R. Rosenfelder, Phys. Lett. B 463 (1999) 317.
[182] D. Babusci, G. Giordano, G. Matone, Phys. Rev. C 57 (1998) 291.
[183] J. Martorell, D.W. Sprung, D.C. Zheng, Phys. Rev. C 51 (1995) 1127.
[184] A.I. Milshtein, I.B. Khriplovich, S.S. Petrosyan, Zh. Eksp. Teor. Fiz. 109 (1996) 1146 [JETP 82 (1996) 616].
[185] Y. Lu, R. Rosenfelder, Phys. Lett. B 319 (1993) 7; B 333 (1994) 564(E).
[186] W. Leidemann, R. Rosenfelder, Phys. Rev. C 51 (1995) 427.
[187] J.L. Friar, G.L. Payne, Phys. Rev. C 55 (1997) 2764.
[188] J.L. Friar, G.L. Payne, Phys. Rev. C 56 (1997) 619.
[189] I. Sick, D. Trautman, Nucl. Phys. A 637 (1998) 559.
[190] E. Borie, Phys. Rev. Lett. 47 (1981) 568.
[191] G.P. Lepage, D.R. Yennie, G.W. Erickson, Phys. Rev. Lett. 47 (1981) 1640.
[192] M.I. Eides, H. Grotch, Phys. Rev. A 56 (1997) R2507.
[193] J.L. Friar, Zeit. f. Physik A 292 (1979) 1; ibid. A 303 (1981) 84.
[194] D.J. Hylton, Phys. Rev. A 32 (1985) 1303.
[195] K. Pachucki, Phys. Rev. A 48 (1993) 120.
[196] M.I. Eides, Phys. Rev. A 53 (1996) 2953.
[197] J.A. Wheeler, Rev. Mod. Phys. 21 (1949) 133.
[198] F. Kottmann et al., Proposal for an Experiment at PSI R-98-03, January, 1999.
[199] A.D. Galanin, I.Ia. Pomeranchuk, Dokl. Akad. Nauk SSSR 86 (1952) 251.
[200] L. Schiff, Quantum Mechanics, 3rd Edition, McGraw-Hill, New York, 1968.
[201] A.B. Mickelwait, H.C. Corben, Phys. Rev. 96 (1954) 1145.
[202] G.E. Pustovalov, Zh. Eksp. Teor. Fiz. 32 (1957) 1519 [Sov. Phys.-JETP 5 (1957) 1234].
[203] A. Di Giacomo, Nucl. Phys. B 11 (1969) 411.
[204] R. Glauber, W. Rarita, P. Schwed, Phys. Rev. 120 (1960) 609.
[205] J. Blomkwist, Nucl. Phys. B 48 (1972) 95.
[206] K.-N. Huang, Phys. Rev. A 14 (1976) 1311.
[207] T. Kinoshita, W.B. Lindquist, Phys. Rev. D 27 (1983) 853.
[208] T. Kinoshita, W.B. Lindquist, Phys. Rev. D 27 (1983) 867.
258 [209] [210] [211] [212] [213] [214] [215] [216] [217] [218] [219] [220] [221] [222] [223] [224] [225] [226] [227] [228] [229] [230] [231] [232] [233] [234] [235] [236] [237] [238] [239] [240] [241] [242] [243] [244] [245] [246] [247] [248] [249] [250] [251] [252] [253] [254]
M.I. Eides et al. / Physics Reports 342 (2001) 63}261 T. Kinoshita, M. Nio, Phys. Rev. Lett. 82 (1999) 3240. B.J. Laurenzi, A. Flamberg, Int. J. Quantum Chem. 11 (1977) 869. K. Pachucki, Phys. Rev. A 53 (1996) 2092. K. Pachucki, Warsaw preprint, physics/99060002, June 1999. M.K. Sundaresan, P.J.S. Watson, Phys. Rev. Lett. 29 (1972) 15. E. Borie, G.A. Rinker, Rev. Mod. Phys. 54 (1982) 67. B. Fricke, Z. Phys. 218 (1969) 495. P. Vogel, At. Data Nucl. Data Tables 14 (1974) 599. G.A. Rinker, Phys. Rev. A 14 (1976) 18. E. Borie, G.A. Rinker, Phys. Rev. A 18 (1978) 324. M.-Y. Chen, Phys. Rev. Lett. 34 (1975) 341. L. Wilets, G.A. Rinker Jr., Phys. Rev. Lett. 34 (1975) 339. D.H. Fujimoto, Phys. Rev. Lett. 35 (1975) 341. E. Borie, Nucl. Phys. A 267 (1976) 485. J. Calmet, D.A. Owen, J. Phys. B 12 (1979) 169. R. Barbieri, M. Ca!o, E. Remiddi, Lett. Nuovo Cimento 7 (1973) 60. H. Suura, E. Wichmann, Phys. Rev. 105 (1957) 1930. A. Peterman, Phys. Rev. 105 (1957) 1931. H.H. Elend, Phys. Lett. 20 (1966) 682; Errata 21 (1966) 720. G. Erickson, H.H. Liu, preprint UCD-CNL-81 (1968). E. Borie, Helv. Phys. Acta 48 (1975) 671. V.N. Folomeshkin, Yad. Fiz. 19 (1974) 1157 [Sov. J. Nucl. Phys. 19 (1974) 592]. M.K. Sundaresan, P.J.S. Watson, Phys. Rev. D 11 (1975) 230. V.P. Gerdt, A. Karimkhodzhaev, R.N. Faustov, Proceedings of the International Workshop on High Energy Physics and Quantum Filed Theory, 1978, p. 289. E. Borie, Z. Phys. A 302 (1981) 187. R.N. Faustov, A.P. Martynenko, Samara State University preprint SSU-HEP-99/07, hep-ph/9906315, June 1999. G. Breit, Phys. Rev. 35 (1930) 1477. N. Kroll, F. Pollock, Phys. Rev. 84 (1951) 594; ibid. 86 (1952) 876. R. Karplus, A. Klein, Phys. Rev. 85 (1952) 972. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 229B (1989) 285; Pis'ma Zh. Eksp. Teor. Fiz. 50 (1989) 3 [JETP Lett. 50 (1989) 1]; Yad. Fiz. 50 (1989) 1636 [Sov. J. Nucl. Phys. 50 (1989) 1015]. E.A. Terray, D.R. Yennie, Phys. Rev. Lett. 48 (1982) 1803. J.R. Sapirstein, E.A. 
Terray, D.R. Yennie, Phys. Rev. D 29 (1984) 2290. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 249B (1990) 519; Pis'ma Zh. Eksp. Teor. Fiz. 52 (1990) 937 [JETP Lett. 52 (1990) 317]. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 268B (1991) 433; 316B (1993) 631 (E); 319B (1993) 545 (E); Yad. Fiz. 55 (1992) 466; 57 (1994) 1343 (E) [Sov. J. Nucl. Phys. 55 (1992) 257; 57 (1994) 1275 (E)]. T. Kinoshita, M. Nio, Phys. Rev. Lett. 72 (1994) 3803. A. Layzer, Nuovo Cim. 33 (1964) 1538. D. Zwanziger, Nuovo Cim. 34 (1964) 77. S.J. Brodsky, G.W. Erickson, Phys. Rev. 148 (1966) 148. J.R. Sapirstein, Phys. Rev. Lett. 51 (1983) 985. K. Pachucki, Phys. Rev. A 54 (1996) 1994. T. Kinoshita, M. Nio, Phys. Rev. D 55 (1996) 7267. S.J. Brodsky, G.W. Erickson, Phys. Rev. 148 (1966) 26. J.R. Sapirstein, unpublished, as cited in [243]. S.J. Brodsky, unpublished, as cited in [249]. S.M. Schneider, W. Greiner, G. So!, Phys. Rev. A 50 (1994) 118. P. Lepage, unpublished, as cited in [243].
M.I. Eides et al. / Physics Reports 342 (2001) 63}261 [255] [256] [257] [258] [259] [260] [261] [262] [263] [264] [265] [266] [267] [268] [269] [270] [271]
[272] [273] [274] [275] [276] [277] [278] [279] [280] [281] [282] [283] [284] [285] [286] [287] [288] [289] [290] [291] [292] [293] [294] [295] [296] [297] [298] [299]
259
H. Persson, S.M. Schneider, W. Greiner et al., Phys. Rev. Lett. 76 (1996) 1433. S.A. Blundell, K.T. Cheng, J. Sapirstein, Phys. Rev. Lett. 78 (1997) 4914. P. Sinnergen, H. Persson, S. Salomoson et al., Phys. Rev. A 58 (1998) 1055. R. Arnowitt, Phys. Rev. 92 (1953) 1002. W.A. Newcomb, E.E. Salpeter, Phys. Rev. 97 (1955) 1146. G.T. Bodwin, D.R. Yennie, M.A. Gregorio, Phys. Rev. Lett. 41 (1978) 1088. W.E. Caswell, G.P. Lepage, Phys. Rev. Lett. 41 (1978) 1092. T. Fulton, D.A. Owen, W.W. Repko, Phys. Rev. Lett. 26 (1971) 61. G.T. Bodwin, D.R. Yennie, Phys. Rep. 43C (1978) 267. M.M. Sternheim, Phys. Rev. 130 (1963) 211. W.E. Caswell, G.P. Lepage, Phys. Rev. A 18 (1978) 810. G.T. Bodwin, D.R. Yennie, M.A. Gregorio, Phys. Rev. Lett. 48 (1982) 1799. G.T. Bodwin, D.R. Yennie, M.A. Gregorio, Rev. Mod Phys. 57 (1985) 723. S.G. Karshenboim, M.I. Eides, V.A. Shelyuto, Yad. Fiz. 47 (1988) 454 [Sov. J. Nucl. Phys. 47 (1988) 287]; Yad. Fiz. 48 (1988) 769 [Sov. J. Nucl. Phys. 48 (1988) 490]. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 216B (1989) 405; Yad. Fiz. 49 (1989) 493 [Sov. J. Nucl. Phys. 49 (1989) 309]. V.Yu. Brook, M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 216B (1989) 401. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 177B (1986) 425; Yad. Fiz. 44 (1986) 1118 [Sov. J. Nucl. Phys. 44 (1986) 723]; Zh. Eksp. Teor. Fiz. 92 (1987) 1188 [Sov. Phys.-JETP 65 (1987) 664]; Yad. Fiz. 48 (1988) 1039 [Sov. J. Nucl. Phys. 48 (1988) 661]. J.R. Sapirstein, E.A. Terray, D.R. Yennie, Phys. Rev. Lett. 51 (1983) 982. M.I. Eides, S.G. Karshenboim, V.A. Shelyuto, Phys. Lett. 202B (1988) 572; Zh. Eksp. Teor. Fiz. 94 (1988) 42 [Sov. Phys.-JETP 67 (1988) 671]. A. Karimkhodzhaev, R.N. Faustov, Sov. J. Nucl. Phys. 53 (1991) 1012 [Sov. J. Nucl. Phys. 53 (1991) 626]. R.N. Faustov, A. Karimkhodzhaev, A.P. Martynenko, Phys. Rev. A 59 (1999) 2498; Yad. Phys. 62 (1999) 2284 [Phys. Atom. Nuclei 62 (1999) 2103]. M.I. Eides, V.A. Shelyuto, Phys. Lett. 146B (1984) 241. 
S.G. Karshenboim, M.I. Eides, V.A. Shelyuto, Yad. Fiz. 52 (1990) 1066 [Sov. J. Nucl. Phys. 52 (1990) 679]. S.L. Adler, Phys. Rev. 177 (1969) 2426. G. Li, M.A. Samuel, M.I. Eides, Phys. Rev. A 47 (1993) 876. M.I. Eides, H. Grotch, V.A. Shelyuto, Phys. Rev. D 58 (1998) 013008. J. Barclay Adams, Phys. Rev. 139 (1965) B 1050. M.A. Beg, G. Feinberg, Phys. Rev. Lett. 33 (1974) 606; 35 (1975) 130(E). W.W. Repko, Phys. Rev. D 7 (1973) 279. H. Grotch, Phys. Rev. D 9 (1974) 311. R. Alcotra, J.A. Grifols, Ann. Phys. (NY) 229 (1993) 109. M.I. Eides, Phys. Rev. A 53 (1996) 2953. H. Hellwig, R.F.C. Vessot, M.W. Levine et al., IEEE Trans. IM-19 (1970) 200. L. Essen, R.W. Donaldson, M.J. Bangham et al., Nature 229 (1971) 110. G.T. Bodwin, D.R. Yennie, Phys. Rev. D 37 (1988) 498. S.G. Karshenboim, Phys. Lett. A 225 (1997) 97. S.D. Drell, J.D. Sullivan, Phys. Rev. 154 (1967) 1477. C.K. Iddings, P.M. Platzman, Phys. Rev. 113 (1959) 192. C.K. Iddings, Phys. Rev. 138 (1965) B 446. C.K. Iddings, P.M. Platzman, Phys. Rev. 115 (1959) 919. A. Verganalakis, D. Zwanziger, Nuovo Cimento 39 (1965) 613. F. Guerin, Nuovo Cimento A 50 (1967) 1. G.M. Zinov'ev, B.V. Struminskii, R.N. Faustov et al., Yad. Fiz. 11 (1970) 1284 [Sov. J. Nucl. Phys. 11 (1970) 715]. R.N. Faustov, A.P. Martynenko, V.A. Saleev, Yad. Phys. 62 (1999) 2280 [Phys. Atom. Nuclei 62 (1999) 2099]. E. de Rafael, Phys. Lett. 37B (1971) 201.
260 [300] [301] [302] [303] [304] [305] [306] [307] [308] [309] [310] [311] [312] [313] [314] [315] [316] [317] [318] [319] [320] [321] [322] [323] [324] [325] [326] [327] [328] [329] [330] [331]
[332] [333] [334] [335] [336] [337] [338] [339] [340] [341] [342] [343] [344] [345]
M.I. Eides et al. / Physics Reports 342 (2001) 63}261 P. GnaK dig, J. Kuti, Phys. Lett. 42B (1972) 241. V.W. Hughes, J. Kuti, Ann. Rev. Nucl. Part. Sci. 33 (1983) 611. E.E. Tro"menko, Phys. Lett. 73A (1979) 383. J.W. Heberle, H.A. Reich, P. Kusch, Phys. Rev. 101 (1956) 612. N.E. Rothery, E.A. Hessels, Phys. Rev. A 61 (2000) 044501. J.W. Heberle, H.A. Reich, P. Kusch, Phys. Rev. 104 (1956) 1585. M.H. Prior, E.C. Wang, Phys. Rev. A 16 (1977) 6. S.R. Lundeen, P.E. Jessop, F.M. Pipkin, Phys. Rev. Lett. 34 (1975) 377. N.F. Ramsey, in: T. Kinoshita (Ed.), Quantum Electrodynamics, World Scienti"c, Singapore, 1990, p. 673. M.M. Sternheim, Phys. Rev. 138 (1965) B 430. S.V. Romanov, Z. Phys. D 28 (1993) 7. C. Schwob, L. Jozefovski, B. de Beauvoir et al., Phys. Rev. Lett. 82 (1999) 4960. V.W. Hughes, T. Kinoshita, Rev. Mod. Phys. 71 (1999) S133. D.L. Farnham, R.S. Van Dyck Jr., P.B. Schwinberg, Phys Rev. Lett. 75 (1995) 3598. G. Audi, A.H. Wapstra, Nucl. Phys. A 565 (1993) 1. Yu.L. Sokolov, V.P. Yakovlev, Zh. Eksp. Teor. Fiz. 83 (1982) 15 [Sov. Phys.-JETP 56 (1982) 7]; V.G. Palchikov, Yu.L. Sokolov, V.P. Yakovlev, Pis'ma Zh. Eksp. Teor. Fiz. 38 (1983) 347 [JETP Letters 38 (1983) 418]. E.W. Hagley, F.M. Pipkin, Phys. Rev. Lett. 72 (1994) 1172. V.G. Palchikov, Yu.L. Sokolov, V.P. Yakovlev, Phys. Scr. 55 (1997) 33. S.G. Karshenboim, Phys. Scr. 57 (1998) 213. G. Newton, D.A. Andrews, P.J. Unsworth, Philos. Trans. Roy. Soc. London 290 (1979) 373. S.R. Lundeen, F.M. Pipkin, Phys. Rev. Lett. 46 (1981) 232; Metrologia 22 (1986) 9. A. van Wijngaarden, F. Holuj, G.W.F. Drake, Can. J. Phys. 76 (1998) 95. B. de Beauvoir, F. Nez, L. Julien et al., Phys. Rev. Lett. 78 (1997) 440. M. Weitz, A. Huber, F. Schmidt-Kaler et al., Phys. Rev. A 52 (1995) 2664. D.J. Berkeland, E.A. Hinds, M.G. Boshier, Phys. Rev. Lett. 75 (1995) 2470. S. Bourzeix, B. de Beauvoir, F. Nez et al., Phys. Rev. Lett. 76 (1996) 384. V.V. Ezhela, B.V. Polishcuk, Protvino preprint IHEP 99-48, hep-ph/9912401. A. Huber, Th. 
Udem, B. Gross et al., Phys. Rev. Lett. 80 (1998) 468. S. Klarsfeld, J. Martorell, J.A. Oteo et al., Nucl. Phys. A 456 (1986) 373. T. Herrmann, R. Rosenfelder, Eur. Phys. J. A 2 (1998) 29. F. Schmidt-Kaler, D. Leibfried, M. Weitz et al., Phys. Rev. Lett. 70 (1993) 2261. A. van Wijngaarden, F. Holuj, G.W.F. Drake, Post-deadline abstract submitted to DAMOP Meeting, Storrs CT, June 14}17, 2000; G.W.F. Drake, A. van Wijngaarden, abstract submitted to ICAP Hydrogen Atom II Satellite Meeting, Tuscany, June 1}3, 2000, to be published. T. Andreae, W. Konig, R. Wynands et al., Phys. Rev. Lett. 69 (1992) 1923. F. Nez, M.D. Plimmer, S. Bourzeix et al., Phys. Rev. Lett. 69 (1992) 2326. F. Nez, M.D. Plimmer, S. Bourzeix et al., Europhys. Lett. 24 (1993) 635. M. Weitz, A. Huber, F. Schmidt-Kaler et al., Phys. Rev. Lett. 72 (1994) 328. S. Chu, A.P. Mills Jr., A.G. Yodth et al., Phys. Rev. Lett. 60 (1988) 101; K. Danzman, M.S. Fee, S. Chu, Phys. Rev. A 39 (1989) 6072. K. Jungmann, P.E.G. Baird, J.R.M. Barr et al., Z. Phys. D 21 (1991) 241. F. Maas, B. Braun, H. Geerds et al., Phys. Lett. A 187 (1994) 247. V. Meyer, S.N. Bagaev, P.E.G. Baird et al., Phys. Rev. Lett. 84 (2000) 1136. A. Bertin, G. Carboni, J. Duclos et al., Phys. Lett. B 55 (1975). G. Carboni, U. Gastaldi, G. Neri et al., Nuovo Cimento A 34 (1976) 493. G. Carboni, G. Gorini, G. Torelli et al., Nucl. Phys. A 278 (1977) 381. G. Carboni, G. Gorini, E. Iacopini et al., Phys. Lett. B 73 (1978) 229. P. Hauser, H.P. von Arb, A. Biancchetti et al., Phys. Rev. A 46 (1992) 2363. D. Bakalov et al., Proceedings of the III International Symposium on Weak and Electromagnetic Interactions in Nuclei (WEIN-92) Dubna, Russia, 1992, pp. 656}662.
M.I. Eides et al. / Physics Reports 342 (2001) 63}261 [346] [347] [348] [349] [350] [351] [352]
R.N. Faustov, A.P. Martynenko, Samara State University preprint SSU-HEP-97/03, hep-ph/9709374. D.J. Wineland, N.F. Ramsey, Phys. Rev. A 5 (1972) 821. A. Bohr, Phys. Rev. 73 (1948) 1109. F.E. Low, Phys. Rev. 77 (1950) 361. F.E. Low, E.E. Salpeter, Phys. Rev. 83 (1951) 478. D.A. Greenberg, H.M. Foley, Phys. Rev. 120 (1960) 1684. K.P. Jungmann, `Muoniuma, preprint physics/9809020, September 1998.
261
Physics Reports 342 (2001) 263–392
Techniques of replica symmetry breaking and the storage problem of the McCulloch–Pitts neuron
G. Györgyi
Institute of Theoretical Physics, Eötvös University, 1518 Budapest, Pf. 32, Hungary
Received May 2000; editor: I. Procaccia

Contents
1. Introduction and overview
1.1. Introduction
1.2. Overview
2. Artificial neural networks and spin glasses
2.1. The McCulloch–Pitts neuron and perceptrons
2.2. Associative memory
2.3. Sherrington–Kirkpatrick model
2.4. Little–Hopfield network
2.5. Pattern storage by a single neuron
2.6. Training, error measures, and retrieval
2.7. Multi-layer perceptrons
3. Statistical mechanics of pattern storage
3.1. The model
3.2. Thermodynamics
3.3. Spherical and independently distributed synapses
3.4. Neural stabilities, errors, and overlaps
4. The Parisi solution
4.1. Finite replica symmetry breaking
4.2. Finite and continuous replica symmetry breaking
5. Correlations and thermodynamical stability
5.1. Expectation values
5.2. Variations of the Parisi term
5.3. The Hessian matrix
6. Interpretation and special properties
6.1. Physical meaning of x(q)
6.2. Diagonalization of a Parisi matrix
6.3. Symmetries of Parisi's PDE
6.4. Spherical entropic term: a solvable case of Parisi's PDE
6.5. Small field expansion
7. The neuron: spherical synapses
7.1. General results
7.2. The special error measure θ(κ−y)
8. The neuron: independently distributed synapses
8.1. Free energy and stationarity condition
8.2. Variational principle
8.3. On thermodynamical stability
9. Conclusions and outlook
Acknowledgements
Appendix A. Abbreviations
Appendix B. Derivation of the replica free energy
Appendix C. Derivation of the R-RSB free energy term
Appendix D. Derivation of the PPDE by continuation
Appendix E. Multidimensional generalization of the PPDE
Appendix F. An identity between Green functions
Appendix G. PDEs for high temperature
Appendix H. Longitudinal stability for high temperatures
References
0370-1573/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0370-1573(00)00073-9
G. Györgyi / Physics Reports 342 (2001) 263–392
Abstract
In this article we review the framework for spontaneous replica symmetry breaking. The framework is subsequently applied to the statistical mechanical description of the storage properties of a McCulloch–Pitts neuron, i.e., a simple perceptron. It is shown that the neuron problem gives rise to the general formula that lies at the core of all problems admitting Parisi's replica symmetry breaking ansatz with a one-component order parameter. The details of Parisi's method are reviewed extensively, with regard to the wide range of systems where the method may be applied. Parisi's partial differential equation and related differential equations are discussed, and the Green function technique is introduced for the calculation of replica averages, the key to determining the averages of physical quantities. The Green function of the Fokker–Planck equation due to Sompolinsky turns out to play the role of the statistical mechanical Green function in the graph rules for replica correlators. The graph rules thus obtained involve only tree graphs, as appropriate for a mean-field-like model. The lowest-order Ward–Takahashi identity is recovered analytically and shown to lead to the Goldstone modes in continuous replica symmetry breaking phases. The need for a replica symmetry breaking theory in the storage problem of the neuron has arisen due to the thermodynamical instability of previously given solutions. Variational forms for the neuron's free energy are derived in terms of the order parameter function x(q), for different prior distributions of synapses. Various phases are identified, analytically in the high-temperature limit and numerically in generic cases; among them is one similar to the Parisi phase in long-range interaction spin glasses. Extensive quantities like the error per pattern differ only slightly from those of the known unstable solutions, but there is a significant difference in the distribution of non-extensive quantities like the synaptic overlaps and the pattern storage stability parameter. A simulation result is also reviewed and compared with the prediction of the theory.
© 2001 Elsevier Science B.V. All rights reserved.
PACS: 07.05.Mh; 61.43.-j; 75.10.Nr; 84.35.+i
Keywords: Neural networks; Pattern storage; Spin glasses; Replica symmetry breaking
1. Introduction and overview*
1.1. Introduction
In the past one-and-a-half decades, statistical physical methods have yielded a rich harvest of theoretical and practical results in the exploration of artificial neural network models. In contrast to more traditional mathematical approaches, such as combinatorics, statistical data analysis, graph theory, or mathematical learning theory, statistical physics places its main emphasis on interconnected model neurons, considered as a physical many-body problem in the limit of a large number of variables. The latter property renders the problems similar to statistical mechanical systems in the thermodynamic limit, that is, when the number of particles is very large. This does not necessarily require a large number of units in a neural network: the thermodynamic limit also applies to a single neuron if the number of adjustable variables, the analog of the synaptic strengths of biological neurons, is sufficiently large. A much studied type of network is constructed from the McCulloch–Pitts model neuron [1], also called a single-layer, or simple, perceptron when it operates alone as a single unit [2]. In this paper we examine the ability of a single model neuron to store, i.e., to memorize, patterns, which is crucial for the understanding of networked systems. The paper is strictly about the artificial model neuron and does not imply biological relevance. However, the notions neuron and synapses, the latter designating coupling strength parameters, are biologically inspired, and we will use them throughout. We shall apply the statistical mechanical framework introduced by Gardner and Derrida [3–5] in 1988–89, which gave birth to a subfield of the theory of neural networks. Since then, the McCulloch–Pitts neuron has become well understood below the storage capacity, where patterns, or examples, can be perfectly stored. The region beyond it, however, has remained the subject of continuous research [6–12].
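For orientation, the storage capacity mentioned above can be evaluated from the standard replica-symmetric result of Gardner for a spherical neuron storing random, unbiased patterns with stability threshold κ: α_c(κ) = [∫_{−κ}^{∞} Dt (t + κ)²]^{−1}, where Dt is the standard Gaussian measure. The sketch below is our own illustration of that textbook formula, not code from this article; it uses the closed form (1 + κ²)Φ(κ) + κφ(κ) of the Gaussian integral, and the helper names are hypothetical.

```python
import math


def gaussian_pdf(t: float) -> float:
    """Standard normal density phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def gaussian_cdf(t: float) -> float:
    """Standard normal distribution function Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def gardner_capacity(kappa: float) -> float:
    """alpha_c(kappa) = 1 / int_{-kappa}^{inf} Dt (t + kappa)^2 (Gardner's RS formula).

    The Gaussian integral equals (1 + kappa^2) Phi(kappa) + kappa phi(kappa).
    """
    integral = (1.0 + kappa * kappa) * gaussian_cdf(kappa) + kappa * gaussian_pdf(kappa)
    return 1.0 / integral


if __name__ == "__main__":
    for kappa in (0.0, 0.5, 1.0):
        print(f"kappa = {kappa:.1f}  alpha_c = {gardner_capacity(kappa):.4f}")
```

At κ = 0 the integral equals 1/2, which gives the classic value α_c = 2 patterns per synapse; demanding a larger stability κ lowers the capacity.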
If the number of patterns exceeds the capacity, then there is no way of storing all of them. One possible approach beyond capacity is to choose a quantity to be optimized. Examples of such a quantity are the stability of the patterns (in other words, their resistance to errors during retrieval) or the number of correctly stored patterns irrespective of their stability. Such problems can be formulated by means of a cost, or energy, function, giving rise to a statistical mechanical system. In the case of minimizing the number of incorrectly stored patterns, difficulties have arisen on every front where the problem was attacked. On the one hand, the analytical method inherited from spin glass research is no longer applicable in its simplest form, that is, the so-called replica symmetric (RS) ansatz breaks down. On the other hand, near and beyond capacity, numerical algorithms begin to require excessive computational power. The physical picture behind this is the roughening of the landscape of the cost function that the algorithms try to minimize. Phases of similar complexity, wherein the optimum-finding algorithm, the analog of the dynamics in the statistical mechanical system, slows down to such an extent that this can be considered a breakdown of ergodicity, have been observed in combinatorial optimization problems and still elude analysis [13–15]. Several empirically hard optimization problems [13], including minimization of the error beyond storage capacity for the McCulloch–Pitts neuron [16], are known to belong to the so-called non-deterministic polynomial (NP) complete class. It would be significant if statistical physical methods could clarify some properties of the energy, or free energy, landscape of NP-complete problems. The statistical physical equivalents of a few NP-complete
systems were shown, in averaged thermal equilibrium, to exhibit spin-glass-like behavior [14]. This has given rise to the belief that there may be a general connection between NP-completeness and spin glass behavior. Thus the identification and description of such thermodynamic phases may be instructive from the algorithmic viewpoint as well. It should be emphasized that NP-complete optimization problems are of diverse origin and many of their quantitative properties show little resemblance. Accordingly, when reformulated as statistical mechanical systems they exhibit different thermodynamic behavior, e.g., in averaged equilibrium they have different phase diagrams. Nevertheless, through the notion of glassy phases, statistical physics may provide us with a common concept for understanding at least some ingredients of NP-completeness. It is the region beyond the capacity of a single McCulloch–Pitts neuron that we set out to uncover in the present paper, within the averaged statistical mechanical description of thermal equilibrium. While the theoretical framework differs in some respects from the techniques applied to the Ising spin glass, being rather a generalization of them, we can now reinforce the so far vague expectation of a spin glass phase and deliver quantitative results. Networks beyond saturation have long been known to have complex features; here we demonstrate that even a single neuron can exhibit extreme complexity. The present article grew out of the work with P. Reimann presented in a letter [17]. A more extended article, still in many respects a summary of the main results, has been accepted for publication [18]. The emphasis in the present paper is twofold. On the one hand, we give a comprehensive review of the technical details of the replica symmetry breaking theory, including the so-called continuous replica symmetry breaking. At its core is Parisi's original theory, which is here technically generalized to incorporate the neuron problem as well.
Furthermore, several extensions of the theory are introduced here that are also applicable to spin glasses. In the mathematical parts, an educational and self-contained line of reasoning is favored over a terse style. In this way we would like to fill a gap in the literature on the theory of disordered systems, presenting the technical details to those who wish to understand Parisi's method and possibly to apply it to other problems. On the other hand, we apply the theory to the storage problem of a single neuron. Since the first statistical mechanical approach to this question, several other neural functions have been treated by statistical mechanical methods, and some of those may be more important for applications than pattern storage. However, even storage represents a strong theoretical challenge. Besides the Little–Hopfield model of auto-associative memory [19], it can be considered a point of entry of the statistical mechanical approach into hard problems in the field of artificial neural networks, and may open the way for further applications. On the technical side, the paper is centered on Parisi's method, which successfully solved the mean equilibrium properties of the infinite-range interaction Ising spin glass, the Sherrington–Kirkpatrick model [14]. It turns out that after some generalization [17,18] of the original method [20–23], it becomes adaptable to the statistical mechanical formulation of the neuron problem by Gardner and Derrida [4,5]. The single neuron with a general cost function, i.e., error measure, was introduced by Griniasty and Gutfreund [6], who called the cost function a potential. We show that this model gives rise to the most general term that admits Parisi's solution with one order parameter function.
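To make the language of cost functions and error measures concrete, the sketch below computes the local stabilities of a set of patterns for given couplings and evaluates a cost E(J) = Σ_μ V(Δ^μ) for two choices of V: the count of patterns whose stability lies below a threshold κ, and a quadratic penalty included purely as a hypothetical example of another potential. We assume the common normalization Δ^μ = σ^μ J·ξ^μ/|J|, which coincides with σ^μ J·ξ^μ/√N under the spherical constraint |J|² = N; the article's own conventions are fixed in Section 3, and all function names here are illustrative.

```python
import math
from typing import Callable, List, Sequence


def stabilities(J: Sequence[float], patterns: Sequence[Sequence[float]],
                labels: Sequence[int]) -> List[float]:
    """Local stabilities Delta^mu = sigma^mu (J . xi^mu) / |J|; invariant under J -> c J, c > 0."""
    norm = math.sqrt(sum(j * j for j in J))
    return [s * sum(j * x for j, x in zip(J, xi)) / norm
            for xi, s in zip(patterns, labels)]


def energy(J, patterns, labels, V: Callable[[float], float]) -> float:
    """Cost E(J) = sum over patterns of an error measure V(Delta^mu)."""
    return sum(V(d) for d in stabilities(J, patterns, labels))


def error_count(kappa: float = 0.0) -> Callable[[float], float]:
    """V(Delta) = theta(kappa - Delta): counts patterns stored with stability below kappa."""
    return lambda d: 1.0 if d < kappa else 0.0


def quadratic_penalty(kappa: float = 0.0) -> Callable[[float], float]:
    """Hypothetical alternative V(Delta) = (kappa - Delta)^2 theta(kappa - Delta)."""
    return lambda d: (kappa - d) ** 2 if d < kappa else 0.0


if __name__ == "__main__":
    patterns = [[1.0, 0.0], [-1.0, 0.0], [0.5, 2.0]]
    labels = [1, 1, -1]
    J = [2.0, 0.0]
    print(stabilities(J, patterns, labels))                          # [1.0, -1.0, -0.5]
    print(energy(J, patterns, labels, error_count(0.0)))             # 2.0
    print(energy(J, patterns, labels, quadratic_penalty(0.0)))       # 1.25
```

The smallest stability, min(stabilities(...)), is the quantity one would maximize under the alternative criterion of optimal pattern stability.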
By the Parisi solution we mean, for now, the hierarchical structure of the order parameter matrix that gives rise to the nonlinear partial differential equation introduced by Parisi in an auxiliary role, allowing continuous replica symmetry breaking.
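The hierarchical structure of the order parameter matrix can be illustrated for a finite number n of replicas. The sketch below builds a K-step Parisi matrix under the usual textbook convention n = m_0 ≥ m_1 ≥ … ≥ m_K ≥ 1, each m_k dividing m_{k−1}: replicas a ≠ b that share a diagonal block of size m_k but not of size m_{k+1} receive the overlap q_k. The function name, the argument layout, and the freely chosen diagonal value are our assumptions, not the article's notation. After continuation to n → 0 the block sizes become numbers between 0 and 1, and their steps define the order parameter function x(q).

```python
from typing import List, Sequence


def parisi_matrix(n: int, block_sizes: Sequence[int], overlaps: Sequence[float],
                  q_diag: float = 0.0) -> List[List[float]]:
    """Build an n x n replica overlap matrix with K-step Parisi structure.

    block_sizes = [m_1, ..., m_K], nested so that each divides the previous one (m_0 = n);
    overlaps    = [q_0, q_1, ..., q_K], one value per level of the hierarchy;
    q_diag      = value placed on the diagonal (left free here).
    """
    sizes = [n] + list(block_sizes)
    if len(overlaps) != len(sizes):
        raise ValueError("need K + 1 overlaps for K block sizes")
    for bigger, smaller in zip(sizes, sizes[1:]):
        if smaller < 1 or bigger % smaller != 0:
            raise ValueError("each block size must divide the previous one")

    Q = [[q_diag] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                # deepest hierarchy level at which replicas a and b still share a block
                level = max(k for k, m in enumerate(sizes) if a // m == b // m)
                Q[a][b] = overlaps[level]
    return Q


if __name__ == "__main__":
    # One-step RSB with n = 4 replicas in blocks of m_1 = 2:
    # 0.5 inside the 2 x 2 diagonal blocks, 0.1 elsewhere off the diagonal.
    for row in parisi_matrix(4, [2], [0.1, 0.5]):
        print(row)
```

Every row has the same multiset of entries, which is the residual permutation symmetry that makes the replica sums in the Parisi ansatz tractable.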
We would like to point out that all systems studied by means of the Parisi ansatz with one order parameter matrix, such as the Ising spin glass with multi-spin interactions [24] and the Potts glass near criticality [25], are special cases of the aforementioned general term. Therefore, our results about the Parisi solution of the neuron go well beyond the scope of neural computation. Here we call the reader's attention to the fact that Parisi's method has been applied to the study of metastable states in the Sherrington–Kirkpatrick model [26], where in fact three order parameter matrices emerged. That work indicates how the continuous replica symmetry breaking solution is to be obtained there and implicitly suggests the generalization we outline in this paper. Beyond giving a comprehensive account of Parisi's framework, we shall perform a concrete field-theoretical study, including the calculation of averages, graph rules involving the Green functions for the evaluation of correlation functions, an analytic derivation of a Ward–Takahashi identity, and integral expressions for the generalized susceptibilities necessary to determine the thermodynamic stability of the solution. The insightful works on the second-order [27] and higher-order correlations [28] of the magnetization in the continuous replica symmetry breaking phase of the Sherrington–Kirkpatrick model present concrete examples of field expectation values. Our formulation in this article generalizes these and is new even in the context of spin glass problems. With the notable exception of the Sherrington–Kirkpatrick model and the formally analogous Little–Hopfield system, where the low-temperature phase was also extensively described [27–31], most studies of long-range interaction disordered systems concerned the region near criticality. The framework we present here is naturally designed for application deep within the glassy phase.
The differences between the Sherrington–Kirkpatrick and neuron models are obvious at first sight. The former is an Ising-type system with a multiplicative two-spin interaction. In contrast, our main focus here is a spherical model neuron, i.e., one whose microstates are characterized by the synaptic couplings, which are continuous and arbitrary up to an overall normalization factor. The interaction between synapses is mediated by the error measure potential of Griniasty and Gutfreund [6], a largely arbitrary function. In this light one may find the close analogy between disordered spin systems and the neuron model somewhat surprising. The similarity becomes apparent, however, when the statistical mechanical system is reduced to a variational problem in terms of a single order parameter function. Such a formulation had been available for the Sherrington–Kirkpatrick model, and we have constructed one for the single neuron. The variational framework is concise: it allows a quick derivation of the stationarity relations, accounts for thermodynamic stability in a subspace called longitudinal, and helps in numerical computations. The differences between the Sherrington–Kirkpatrick and neuron problems may be small in the variational free energy formula, but they still cause technical complications for the neuron problem. The physical reasons are, firstly, that the neuron does not possess the spin flip symmetry of the spin glass without an external field and, secondly, that the neuron's error measure potential is more complicated than the multiplicative spin exchange energy term. Thus a few special properties of the Sherrington–Kirkpatrick model that allowed for some analytic results and simplified numerics [27,29] are absent in the neuron. Generically, similar complications may arise in other spin glass variants, so the much-studied Sherrington–Kirkpatrick model should be considered a rather special, simple case.
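The spherical neuron with an error measure potential can also be simulated directly as a statistical mechanical system. The sketch below is a toy Metropolis sampler, not the simulation method behind Refs. [18,37]. It assumes the Gibbs weight exp(−E/T) with E the number of patterns whose stability σ J·ξ/√N lies below κ, a uniform prior on the sphere |J|² = N, and trial moves that rotate J by a random angle in a randomly chosen coordinate plane. Such rotations keep the spherical constraint exactly and form a symmetric proposal, so the plain Metropolis acceptance rule applies. All names and parameter values are illustrative, and T must be positive.

```python
import math
import random
from typing import List, Sequence, Tuple


def error_energy(J: Sequence[float], patterns: Sequence[Sequence[float]],
                 labels: Sequence[int], kappa: float) -> int:
    """Number of patterns whose stability sigma (J . xi) / sqrt(N) is below kappa."""
    root_n = math.sqrt(len(J))  # assumes |J|^2 = N, which the moves below preserve
    return sum(1 for xi, s in zip(patterns, labels)
               if s * sum(j * x for j, x in zip(J, xi)) / root_n < kappa)


def rotate_pair(J: Sequence[float], i: int, k: int, theta: float) -> List[float]:
    """Rotate J by angle theta in the (i, k) coordinate plane; |J| is unchanged."""
    c, s = math.cos(theta), math.sin(theta)
    trial = list(J)
    trial[i], trial[k] = c * J[i] - s * J[k], s * J[i] + c * J[k]
    return trial


def metropolis_sphere(patterns, labels, temperature: float, sweeps: int,
                      kappa: float = 0.0, max_angle: float = 0.5,
                      seed: int = 0) -> Tuple[List[float], int, int]:
    """Sample J on the sphere |J|^2 = N from exp(-E/T); returns (J, E_start, E_end)."""
    rng = random.Random(seed)
    n = len(patterns[0])
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(n / sum(x * x for x in g))
    J = [x * scale for x in g]  # uniformly distributed starting point on the sphere
    energy = start = error_energy(J, patterns, labels, kappa)
    for _ in range(sweeps * n):
        i, k = rng.sample(range(n), 2)
        trial = rotate_pair(J, i, k, rng.uniform(-max_angle, max_angle))
        e_trial = error_energy(trial, patterns, labels, kappa)
        delta = e_trial - energy
        # Metropolis rule; the proposal is symmetric, so no Hastings correction is needed
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            J, energy = trial, e_trial
    return J, start, energy


if __name__ == "__main__":
    rng = random.Random(1)
    N, P = 40, 60  # alpha = P / N = 1.5
    patterns = [[rng.choice((-1.0, 1.0)) for _ in range(N)] for _ in range(P)]
    labels = [rng.choice((-1, 1)) for _ in range(P)]
    J, e0, e1 = metropolis_sphere(patterns, labels, temperature=0.2, sweeps=10)
    print(f"errors per pattern: start {e0 / P:.2f}, end {e1 / P:.2f}")
```

Recomputing the full energy at every step keeps the sketch short; a practical code would update the local fields incrementally, since a plane rotation changes only two couplings.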
It is worth briefly mentioning two important areas, among the many, that we do not treat in this paper. First and foremost, we do not discuss the dynamical evolution of disordered systems. Since the ground-breaking early works on the dynamics of the Sherrington–Kirkpatrick model by
Sompolinsky and Zippelius [32–35], and the path-integral formulation for Ising spins by Sommers [36], many aspects of the dynamics of disordered systems have been clarified. These results have also proved essential for the understanding of numerical algorithms. However, one has to recognize that even the averaged, stable equilibrium properties of complex phases of disordered systems are still far from clarified. The existence of many metastable states, the signature of glassy systems, and the ensuing complex nature of the dynamical evolution, often termed breakdown of ergodicity, cast doubt even on the existence of thermal equilibrium. On several model systems, however, extensive numerical simulations have demonstrated that equilibrium properties, averaged over the quenched disorder, can carry physical meaning. These properties are the subject of the present article. Secondly, from the viewpoint of mathematical rigor, the replica method raises many a question that we leave unanswered. In fact, quite a few scientists view this method with suspicion, partly because the limit of a "zero number of replicated systems" may seem to violate physical intuition. However, the large number of simulations confirming replica symmetric solutions, the fewer ones supporting replica symmetry breaking, and the absence to date of numerical results outright disproving the theory should provide grounds for confidence. Theoretical physics often employs methods with seemingly shaky mathematical foundations, whose confirmation may come from comparison with real or numerical experiments. Such a confirmation may then trigger a rigorous clarification.
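The limit of a zero number of replicas rests on the identity E[ln Z] = lim_{n→0} (E[Z^n] − 1)/n for the quenched average of ln Z. The sketch below checks it on a toy case chosen only because its moments are known in closed form: ln Z Gaussian with mean μ and variance σ², so that E[Z^n] = exp(nμ + n²σ²/2) for any real n. It also contrasts the quenched average with the annealed one, ln E[Z]. This illustrates the replica trick in general and is not a computation from the article; the function names are hypothetical.

```python
import math


def lognormal_moment(n: float, mu: float, sigma: float) -> float:
    """E[Z^n] when ln Z ~ Normal(mu, sigma^2); the formula holds for any real n."""
    return math.exp(n * mu + 0.5 * n * n * sigma * sigma)


def replica_estimate(n: float, mu: float, sigma: float) -> float:
    """(E[Z^n] - 1) / n, which tends to the quenched average E[ln Z] = mu as n -> 0."""
    # expm1 avoids the cancellation in E[Z^n] - 1 when n is small
    return math.expm1(n * mu + 0.5 * n * n * sigma * sigma) / n


if __name__ == "__main__":
    mu, sigma = -0.7, 1.3
    for n in (1.0, 0.1, 0.01, 1e-4):
        print(f"n = {n:g}: replica estimate = {replica_estimate(n, mu, sigma):.6f}")
    print("quenched E[ln Z] =", mu)
    print("annealed ln E[Z] =", math.log(lognormal_moment(1.0, mu, sigma)))
```

The estimate approaches μ linearly in n, while the annealed value μ + σ²/2 exceeds the quenched one, as Jensen's inequality requires; this gap is why quenched disorder calls for the replica limit rather than a simple average of Z.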
1.2. Overview
Here we outline the contents of the subsequent sections. Section 2 introduces some fundamental concepts and gives a brief historical review of neural modeling and, at a very basic level, of the Sherrington–Kirkpatrick model of spin glasses. In Section 3 the single McCulloch–Pitts neuron is described as a statistical mechanical system following Gardner and Derrida [4,5] and Griniasty and Gutfreund [6]. Pattern storage is interpreted as an optimization problem in the space of synaptic coupling strengths, and the ensuing thermodynamic picture is outlined. The replica free energy is derived for various prior distributions of synapses, namely for the spherical constraint as well as for arbitrary distributions of independent synapses. We highlight the central role of the neuron's local stability parameter, whose distribution yields the average error through a simple formula. Most of this section reviews known concepts, with a few new details. Sections 4–6 are devoted to the Parisi solution. We start out from the "hard" term in the replica free energy of the neuron, which can be considered a generalization of the free energy terms emerging from the classic long-range interaction, disordered spin problems. In Section 4 a comprehensive presentation of the Parisi solution is given, including the derivation of Parisi's partial differential equation. It is demonstrated that this equation incorporates all finite replica symmetry breaking ansätze, besides continuous replica symmetry breaking. Parisi's partial differential equation gives rise to a collection of related partial differential equations; these are reviewed here, and several useful Green functions are presented, most prominently the Green function of Parisi's partial differential equation. Section 5 contains new results such as analytic expressions for expectation values and correlation functions of replica variables.
The eigenvalues of the Hessian of the replica free energy, which determine thermodynamic stability, are discussed. The Green function of Parisi's partial differential equation turns out to be the field-theoretical Green function of which the correlators are composed,
G. Györgyi / Physics Reports 342 (2001) 263-392
and allows the introduction of a graph technique. Section 6 discusses a few aspects of the Parisi solution and two particular cases where Parisi's partial differential equation can be solved explicitly. We return to the special problem of the model neuron in Sections 7 and 8, and apply to it the rather abstract results of the preceding sections. The case of continuous synapses with the spherical constraint, including the conditions of stationarity and thermodynamic stability, is analyzed in detail in Section 7. In the limit of high temperature and a large number of patterns the formalism becomes easily manageable, while exhibiting a nontrivial phase diagram with three different glassy states. This section contains our variational approach, the main result being a variational free energy from which thermodynamic properties can be straightforwardly derived and numerically explored. By means of the various partial differential equations several relations about the stationary state are uncovered. The scaling required by the low-temperature phase is described. The variational free energy is numerically evaluated for several characteristic parameter settings, together with the order parameter function and the probability density of local stabilities. Previous simulation data [37] were improved upon in Ref. [18], from which we reproduce the comparison of simulation results with the theoretical prediction. The case of arbitrarily distributed independent synapses is considered, and the corresponding variational framework is presented, in Section 8. Frequently used abbreviations are listed in Appendix A. Further appendices contain more technical details. Appendix B gives the derivation of the replica free energy for synapses with spherical as well as with independent but otherwise arbitrary normalization. Appendix C shows how the `hard' free energy term, the starting formula of Section 4, emerges.
In Appendix D the short way of deriving Parisi's partial differential equation is given; it requires the continuity of the order parameter function. Note that this equation is valid even in the presence of discontinuities, but then the derivation, as shown in Section 4.1.2, is more involved. This paper does not pursue the case of vector order parameters, but Appendix E gives a brief account of how Parisi's partial differential equation for a vector field emerges. A technically useful identity between Green functions is derived in Appendix F, and the high-temperature limits of some relevant partial differential equations are presented in Appendix G. The only case where we can show longitudinal stability far from criticality is analyzed in Appendix H. As also stated in the Acknowledgment, sections with special contributions from P. Reimann are marked by an asterisk.
2. Artificial neural networks and spin glasses*
The purpose of this section is to put the often technical analysis of later parts of the paper in the wider context of neural networks and spin glasses. The central issues of this work are the intricate details of Parisi's continuous replica symmetry breaking (CRSB) scheme and the adaptation of the method to the equilibrium storage properties of a McCulloch-Pitts neuron, or simple perceptron. We have attempted to cover the most relevant literature on these two narrower themes. On the other hand, we also mention other subjects such as learning algorithms, generalization and unsupervised learning, layered perceptrons, and spin glass models, where our selection of references is far from complete, and not necessarily even representative.
2.1. The McCulloch-Pitts neuron and perceptrons
The model of a neuron put forth by McCulloch and Pitts in a ground-breaking paper [1] in 1943 has attracted continuous interest ever since [19]. While inspired by real neurons in the brain, it is oversimplified from the biological viewpoint. The model neuron can assume two states, one `firing', the other `quiescent'. The state depends on input signals, obtained possibly from other such units, and on the coupling parameters that weight the inputs. The couplings are often termed `synaptic' in reference to the synapses, the connection points of biological neurons. Mathematically speaking, the model neuron computes the projection of an N-dimensional input S along a vector J of synaptic couplings and outputs m = 1 (say it fires) or m = -1 (it is quiescent) according to the sign of the scalar product J·S as
$$ m = \operatorname{sign}\Big( \sum_{k=1}^{N} J_k S_k \Big) \tag{2.1} $$
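As a concrete illustration of Eq. (2.1) and of the perceptron learning rule attributed to Rosenblatt in this section, here is a minimal Python/NumPy sketch. It folds the threshold into an extra coupling J_0 acting on a constant input S_0 = 1, as described in the text, and uses the textbook update J <- J + m S on every misclassified example. The function names, the tie convention sign(0) = +1, and the teacher-generated training set are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def neuron_output(couplings, inputs):
    """McCulloch-Pitts neuron, Eq. (2.1): +1 ('firing') or -1 ('quiescent').

    Ties (J.S == 0) are mapped to +1; this convention is an assumption.
    """
    return np.where(inputs @ couplings >= 0.0, 1, -1)


def add_threshold_input(inputs):
    """Prepend a constant component S_0 = 1 so that J_0 acts as a threshold."""
    inputs = np.atleast_2d(inputs)
    return np.hstack([np.ones((inputs.shape[0], 1)), inputs])


def train_perceptron(inputs, targets, max_epochs=10_000):
    """Textbook perceptron rule: on an error, J <- J + target * S.

    Converges after finitely many updates whenever some J stores all examples.
    Returns (couplings, converged_flag).
    """
    couplings = np.zeros(inputs.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for s, target in zip(inputs, targets):
            if target * (s @ couplings) <= 0.0:  # misclassified or on the boundary
                couplings += target * s
                errors += 1
        if errors == 0:
            return couplings, True
    return couplings, False


# Usage: labels produced by a hypothetical "teacher" neuron, so a solution exists.
rng = np.random.default_rng(0)
n_inputs, n_examples = 20, 30
teacher = rng.normal(size=n_inputs + 1)
examples = add_threshold_input(rng.choice([-1.0, 1.0], size=(n_examples, n_inputs)))
labels = neuron_output(teacher, examples)
learned, converged = train_perceptron(examples, labels)
print(converged, np.all(neuron_output(learned, examples) == labels))
```

Generating the labels with a teacher guarantees that a storing J exists, which is exactly the precondition under which the convergence statement in the text applies.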
The argument of the sign function in Eq. (2.1) can be extended by a constant threshold, which alternatively may be represented by an additional coupling J_0 if one includes a constant input component S_0 = 1. Remarkably, as McCulloch and Pitts already noticed, a sufficiently large collection of such model neurons, when properly connected and with properly set couplings, can represent an arbitrary Boolean function. The model can be naturally extended to continuous outputs by replacing the sign function with a continuous transfer function, generally of sigmoid shape [19]. The next major step forward was achieved with the introduction of the perceptron concept by Rosenblatt [2]. The idea is to place a number of McCulloch-Pitts neurons into different layers, with the output of neurons in one layer serving as input for those in the next, hence the name multi-layer feedforward network. As it was intended to model vision, such a network is also called a multi-layer perceptron. The input to the network as a whole goes into the first layer, while the final output is that of the last layer. A widely applied learning concept is to try to determine appropriate synaptic couplings J for all the neurons so as to satisfy a prescribed set of input-output data, called training examples. In other words, the aim is to store the training examples. One motivation for doing so is that a possibly existing systematics behind the training examples may be approximately reproduced also on previously unseen inputs, that is, the network will be able to generalize. The special case of a single McCulloch-Pitts unit is a single-layer perceptron, also called a simple perceptron by Rosenblatt, and lately sometimes just perceptron. For the simple perceptron with binary outputs, as defined in Eq. (2.1), he proposed an explicit learning algorithm that provably converges towards a vector of synaptic couplings J which correctly stores the training examples, provided such a J exists. Simultaneously with Widrow and Hoff
[38,39], he also studied two-layer perceptrons with an adaptive second layer, using the first layer as a preprocessor with fixed (non-adaptive) synaptic couplings; however, he was unable to generalize his learning algorithm to this case. The field was driven into a crisis by the observation of Minsky and Papert [40] that the simple perceptron (2.1) is unable to realize certain elementary logical tasks. Confidence returned when the so-called error back-propagation learning algorithm began to gain wide acceptance (see [41] and further references in [19]). This algorithm trains fully adaptive multi-layer feedforward networks with generically differentiable transfer functions by examples. Such networks,
if chosen sufficiently large, are known to be capable of realizing arbitrary smooth input-output relations; see, e.g., [42,43]. Although this algorithm and its various descendants often converge quite slowly, and in principle one cannot exclude that they get stuck before reaching a desired state, they have been successful in a great variety of practical applications [19].
2.2. Associative memory
Besides the layered feedforward perceptron architectures, a second eminent problem in neural computation is the so-called associative memory network or attractor network. We limit our discussion to the auto-associative case, i.e., the memory network is addressable by its own content. The concept can be traced back to Refs. [44,45] and was rediscovered later (see [19] for further references). The recurrent (in contrast to feedforward) network of McCulloch-Pitts model neurons, originally suggested in [46-48], was especially suited for the task. A recurrent network contains interconnected units where signals pass through directed links that can form loops. Here the desired patterns to be stored correspond to collective states of the units in the network, and the idea is to define a discrete-time dynamics of the states such that the prescribed patterns (examples) are fixed-point attractors. For a collection of N model neurons (2.1), the outputs m_k, k = 1, 2, ..., N, at a given time step t are taken as new inputs S_k(t) for the next time step. Denoting by J_ik the synaptic coupling by which the ith neuron weights the signal S_k(t) stemming from the kth neuron, we can write the discrete-time dynamics of neurons with binary output as
$$ S_i(t+1) = \operatorname{sign}\Big( \sum_{k=1}^{N} J_{ik} S_k(t) \Big) \tag{2.2} $$
where self-interactions are usually excluded by setting J_ii = 0. Taking an input pattern S = S(0) as the initial condition, we understand in definition (2.2) that the update is done sequentially, either by scanning through the S_i's one after the other, i = 1, 2, ..., N, 1, 2, ..., or by randomly selecting the sites i one after the other. Such a dynamics is supposed to evolve towards the closest attractor (hence the name attractor network). If this attractor is a fixed point, which is guaranteed if the couplings are symmetric, J_ik = J_ki, a previously unseen pattern S can be associated with one of the stored examples, assumed to be the `most similar' of all stored patterns (hence the name associative memory network). Note that for such an associative memory network the patterns S have binary components S_k = ±1. In the case of units with continuous states one only requires that the length |S| grows like √N for N → ∞. We mention that if the synaptic couplings are non-symmetric, convergence to a fixed point is no longer certain and chaos can arise [49,50]. Given the patterns to be stored, the aim is to construct a dynamics with prescribed attractors. This is the reverse of, and possibly more difficult than, the more conventional problem of finding the attractors of a given dynamical system. If we accept a neural dynamics like that of Eq. (2.2), the task is then to set the couplings J_ik to values that lead to the desired attractors. In his pioneering works [51,52] Little suggested an approach to this problem by giving an explicit form for the synaptic couplings J_ik of the McCulloch-Pitts neurons, inspired by the ideas of Hebb [53] about the working of brain cells. Little defined a parallel update rule for (2.2) and included a stochastic element characterized by temperature. Hopfield's milestone contribution [54,55] consisted in reformulating the dynamics (2.2) as a sequential update algorithm, which led to an optimization
problem with an energy function. We will call the network with the dynamics (2.2) an associative memory, while in the special case when the synaptic couplings J_ik are chosen according to the Hebb rule, the name Little-Hopfield model will be used. For a neurophysiological argument for a non-Hebbian learning rule we refer to [56]. Although the associative memory network (2.2) may be appealing because it models, albeit very crudely in its details, a biological concept, its use for practical purposes is in doubt [4]. Indeed, the storage space required for the synaptic couplings is comparable to that for storing the patterns directly, and the computational effort of the retrieval dynamics (2.2) is similar to that of directly comparing a given input pattern with all the stored patterns. Only with appropriate modifications of the original setup, e.g., non-uniformly distributed patterns, may a digital implementation become advantageous [4]. For various such modifications and their possible practical use we refer to [19].
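To make the retrieval dynamics (2.2) concrete, the following Python/NumPy sketch stores random binary patterns with a Hebb rule and runs sequential updates from a corrupted pattern. It assumes the standard normalized Hebb prescription J_ik = N^{-1} Σ_μ ξ_i^μ ξ_k^μ with J_ii = 0 and the usual energy E(S) = -(1/2) Σ_{i≠k} J_ik S_i S_k; neither formula is spelled out in this section, so both are labeled assumptions, as are the function names and the tie rule (a unit with zero local field keeps its state). With symmetric couplings and zero diagonal, every single-unit flip lowers or keeps E, which the usage code records after each sweep.

```python
import numpy as np


def hebb_couplings(patterns):
    """Assumed Hebb rule J_ik = (1/N) sum_mu xi_i^mu xi_k^mu, with J_ii = 0.

    `patterns` has shape (P, N) with entries +-1.
    """
    n_units = patterns.shape[1]
    couplings = patterns.T @ patterns / n_units
    np.fill_diagonal(couplings, 0.0)  # exclude self-interactions
    return couplings


def energy(couplings, state):
    """E(S) = -(1/2) sum_{i != k} J_ik S_i S_k (the diagonal of J is zero)."""
    return -0.5 * state @ couplings @ state


def sequential_update(couplings, state, max_sweeps=100):
    """Iterate Eq. (2.2) one unit at a time (i = 1, ..., N, 1, ...) until a fixed point.

    A unit with zero local field keeps its state (tie convention assumed here).
    Returns the final state and the energy recorded after each sweep.
    """
    state = state.copy()
    energies = [energy(couplings, state)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(state.size):
            field = couplings[i] @ state
            new_value = state[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        energies.append(energy(couplings, state))
        if not changed:
            break
    return state, energies


# Usage: store a few random patterns, then retrieve one from a corrupted copy.
rng = np.random.default_rng(1)
n_units, n_patterns = 200, 5
patterns = rng.choice([-1, 1], size=(n_patterns, n_units))
J = hebb_couplings(patterns)
probe = patterns[0].copy()
flip = rng.choice(n_units, size=20, replace=False)  # corrupt 10% of the units
probe[flip] *= -1
retrieved, energies = sequential_update(J, probe)
overlap = retrieved @ patterns[0] / n_units
print(f"overlap with stored pattern: {overlap:.2f}")
```

The number of patterns here is far below the storage capacity, so retrieval of the stored pattern is expected; at higher loads the same code would show retrieval failures.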
2.3. Sherrington-Kirkpatrick model
Spin glasses are normal metals (e.g. Cu or Au) with dilute magnetic impurities (e.g. Mn or Fe), or lattices of random mixtures of magnetic ions (e.g. Eu_xSr_{1-x}O), exhibiting a freezing transition of the spin disorder at low temperatures [57]. Due to spatial disorder, the spin interactions can be considered random. The random sign of the interactions can cause one of the basic features of spin glasses, the effect of frustration [58], when the interaction energies of all spin pairs cannot be minimized simultaneously. In a pioneering paper, Edwards and Anderson [59] introduced a simplified model of a spin glass, essentially an Ising system with randomly selected, but fixed, exchange couplings. The infinite-range interaction version of that model is called the Sherrington-Kirkpatrick (SK) model [60,61] and is considered a realization of the mean-field approximation. The theoretical analysis of the SK model triggered the invention of novel statistical mechanical concepts and methods, which subsequently found applications in modified spin glass models such as the random energy [62] and p-spin interaction models [63,24,64], the Heisenberg [65] and the Potts glass [66,67,25], and multi-p-spin and quantum spin glass models [68-70]. Methods inherited from spin glass theory also provided insight into many other problems, several of them originating from outside physics. Prominent examples are various models of interfaces in random environments [71-75], granular media [76], combinatorial optimization (see [14] for an early review and [15] for a new development), game theory [77], protein and nucleic acid folding [78-82], and noise reduction in signal processing [83]. Last but not least, as we will expound in the present paper, methods first introduced for describing the equilibrium properties of the SK model are of paramount importance in the statistical mechanical approach to neural networks.
We therefore give a brief account of the SK model, concentrating on basic properties in thermal equilibrium. The general mathematical framework described in the main part of this paper covers the SK model as a special case. For pedagogical introductions to the calculation techniques we also refer to [74; 19, Chapter 10; 84, Chapter 3]. A detailed discussion of the physical content of the solution can be found, e.g., in [61,57,14,84]. We only mention here that the question whether the solution of the SK model provides a qualitatively appropriate description of spin glasses with short-range interactions is still debated. See Refs. [85-90] for two exchanges on the subject, and [91] for a review and simulation results.
The state variables of the SK model are the Ising spins S_i = ±1, interacting via random coupling strengths J_ik (i, k = 1, 2, ..., N). In the absence of an external magnetic field, the spin Hamiltonian is of the form

$$ H_{\mathbf J}(\mathbf S) = -\frac{1}{2} \sum_{i \neq k} J_{ik} S_i S_k \tag{2.3} $$

and the couplings J_ik are independently sampled from an unbiased Gaussian distribution with variance 1/N. The scaling by N guarantees the extensivity of the energy in the thermodynamic limit N → ∞. The feature that the interactions J_ik are randomly chosen but then frozen, while the spins obey Boltzmann thermodynamics, is summarized by calling the J's quenched variables. An important goal is then to calculate, in this limit, the free energy per spin

$$ f_{\rm SK} = -\lim_{N\to\infty} (N\beta)^{-1} \ln Z_{\mathbf J} . \tag{2.4} $$

Here β = 1/(k_B T) is the inverse thermal energy unit and Z_J = Σ_S e^{-βH_J(S)} is the partition sum over all spin configurations S. The sum over the discrete spin states S is often denoted by a trace, Tr_S. Since the interactions J_ik are quenched random variables, the expression (2.4) as it stands is analytically intractable. Physically, one expects that two different realizations of the random interactions J_ik will exhibit the same behavior at thermal equilibrium for N → ∞. Mathematically, this means self-averaging of the free energy density f_SK, i.e., for any randomly sampled set of the J_ik, Eq. (2.4) yields the same result with probability 1, allowing us to replace ln Z_J by its average ⟨ln Z_J⟩ over the quenched disorder J. Rigorous mathematical discussions of this property for the SK and the related Little-Hopfield model can be found in Refs. [92-94]. The direct evaluation of ⟨ln Z_J⟩ is difficult, but it can be considerably simplified by means of the replica method. This method was independently discovered several times (see discussions in Refs. [61,95]) but became well known only through its application to the spin glass problem by Edwards and Anderson [59]. Its first step consists in what has become known as the `replica trick',

$$ \lim_{n\to 0} \frac{x^n - 1}{n} = \ln x . \tag{2.5} $$

Thus Eq. (2.4) can be rewritten as
$$ f_{\rm SK} = \lim_{N\to\infty} \lim_{n\to 0} \frac{1 - \langle Z_{\mathbf J}^n \rangle}{\beta N n} . \tag{2.6} $$

The name replica refers to the fact that the nth power of Z_J is the partition function of n non-interacting, identical replicas of the original system. The average ⟨...⟩ creates interactions between the replicated systems. As a second step we interchange the two limits in (2.6), which has been proved valid for the SK model by van Hemmen and Palmer [95]. A further step consists in the assumption that it is sufficient to evaluate ⟨Z_J^n⟩ for integer n and then interpret n as a real variable (`continuation') in order to evaluate the limit n → 0. The point is that the averaged partition sum ⟨Z_J^n⟩ can be tackled technically for integer n, while for non-integer n it is as intractable as ⟨ln Z_J⟩ in Eq. (2.4). The fourth step is the evaluation of ⟨Z_J^n⟩
by means of a saddle point approximation, which becomes exact as N → ∞. The detailed calculations along this program are given in [60,61], with the result

$$ f_{\rm SK} = \lim_{n\to 0} \min_{\mathbf Q} \frac{1}{n} f_{\rm SK}(\mathbf Q) , \tag{2.7} $$

$$ f_{\rm SK}(\mathbf Q) = -\frac{n\beta}{4} + \frac{\beta}{4} \sum_{a\neq b} q_{ab}^2 - \beta^{-1} \ln Z_{\beta\mathbf Q} . \tag{2.8} $$

Here the minimization, stemming from the saddle point approximation, runs over all symmetric n×n matrices Q with elements q_aa = 1 and -1 ≤ q_ab ≤ 1 (a, b = 1, 2, ..., n being the replica indices). Furthermore, Z_βQ is formally identical to Z_J if one sets N = n and J = βQ, a specialty of the SK model. The function f_SK(Q) is often referred to as the replica free energy. The practical meaning of (2.7) can be understood as follows. A direct analytical evaluation of the minimum in (2.7) for arbitrary integer n is typically not feasible. Therefore, one introduces an ansatz for Q with a set of variational parameters k that leads to formulas explicitly containing n, so that the continuation of formulas containing the elements of Q to real values of n becomes feasible. Then (2.7) is to be understood as a minimum condition for general n, consisting, first, of the vanishing of the derivatives of the replica free energy f_SK(Q) with respect to the matrix elements q_ab, the so-called stationarity condition, and second, of the requirement of at least the absence of negative eigenvalues of the second-derivative matrix, the Hessian, of f_SK(Q), i.e., the condition of local thermodynamic stability. (Here we disregarded the borderline case when the minimum does not satisfy stationarity, and the situation when there may be several locally stable states. Interestingly, in the SK model these cases do not occur, but they do in other systems.) These relations cannot be continued to n = 0 without further parametrization. But insertion of the ansatz with the variational parameters k allows for the limit n → 0. In this light (2.7) does not prescribe a customary minimization, but rather defines the minimum condition, consisting of the aforementioned stationarity and stability relations, which await parametrization.
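For a small integer number of replicas, Eq. (2.8) can be evaluated by brute force, because Z_βQ is just an SK-type partition sum over the 2^n replica spin configurations. The Python/NumPy sketch below does this; the function name is hypothetical, and the exponent (β²/2) Σ_{a≠b} q_ab S^a S^b follows from inserting N = n and J = βQ into Z_J = Tr exp(-βH_J) with the Hamiltonian (2.3), as stated in the text. It is meant only to make the structure of (2.8) tangible, not as a way of taking n → 0.

```python
import itertools

import numpy as np


def replica_free_energy(Q, beta):
    """Replica free energy f_SK(Q) of Eq. (2.8) for integer n = Q.shape[0].

    Q should be symmetric with unit diagonal; only the off-diagonal q_ab enter.
    """
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    off_diag = Q - np.diag(np.diag(Q))
    # All 2^n replica spin configurations (S^1, ..., S^n), S^a = +-1.
    configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    # Z_{beta Q}: SK partition sum with N = n "spins" and couplings J = beta*Q,
    # i.e. the sum over configs of exp[(beta^2 / 2) sum_{a != b} q_ab S^a S^b].
    exponents = 0.5 * beta**2 * np.einsum("ca,ab,cb->c", configs, off_diag, configs)
    log_z = np.logaddexp.reduce(exponents)
    return -n * beta / 4 + beta / 4 * np.sum(off_diag**2) - log_z / beta


beta = 0.8
# With all q_ab = 0 the replicas decouple: f_SK(Q)/n = -beta/4 - ln(2)/beta.
print(replica_free_energy(np.eye(3), beta) / 3, -beta / 4 - np.log(2) / beta)
```

The decoupled case q_ab = 0 reproduces, for any n, the paramagnetic free energy per spin -β/4 - β^{-1} ln 2, which is a convenient sanity check of the signs in (2.8).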
On the other hand, we can reverse the order of parametrization and minimum search. The parametrization should allow us to construct f_SK(Q(k)) for any n. The minimization condition for integer n with respect to the variational parameters k implies, in the generic case, the vanishing of the derivatives, and is supposed to admit continuation to real values of n. Closer inspection shows [14,61] that after such a continuation, in the limit n → 0, the condition of local stability described above no longer corresponds to a local minimum of f_SK(Q(k)) but rather to a local maximum. This can be crudely understood by noting that the second term on the r.h.s. of (2.8) contains n(n-1)/2 independent terms, equal to the number of order parameters. The number n(n-1)/2 changes sign when n passes from n > 1 to n < 1 (for example, it equals -1/8 at n = 1/2), so for n < 1 one formally has a negative number of order parameters q_ab. This does not cause confusion, however, because owing to the parametrization of the matrix Q we do not need to work with the elements q_ab for n < 1. A similar sign change of terms obtained by expanding the third term in (2.8) changes the nature of the extremum of the free energy from minimum to maximum. The above reasoning thus leads, within a given parametrization, to

$$ f_{\rm SK} = \max_{k} \lim_{n\to 0} \frac{1}{n} f_{\rm SK}(\mathbf Q(k)) . \tag{2.9} $$
This formula prescribes global maximization in k. So if several local maxima are found, the f_SK values there should be compared, and the global maximum within the given parametrization is thus well defined. However, we are in principle still not allowed to bypass the aforementioned local stability analysis, because a global maximum as in (2.9), within a given parametrization, may still be unstable with respect to changes in the matrix elements q_ab. Thus one should evaluate the spectrum of the Hessian matrix of f_SK(Q) and require that no negative eigenvalues exist in the limit n → 0. This leads to prescriptions that at first sight seem contradictory, namely, the minimization in (2.7), formulated as the absence of negative eigenvalues of the Hessian of f_SK(Q), and the maximization of the parametrized free energy in (2.9). Closer inspection shows, however, that there is no logical contradiction. Indeed, maximization in the restricted space of the variational parameters generically requires the negative semidefiniteness of another Hessian, namely that of the parametrized free energy f_SK(Q(k)) with respect to k. In special cases one can show that some eigenvalues of the Hessian of f_SK(Q) correspond to eigenvalues of this second Hessian, such that non-negativity of the former implies non-positivity of the latter [61,96-98]. Following the reasoning in Chapter 3.3 of Ref. [84], this can be intuitively understood as follows: the infinitesimal increment around an extremum of f_SK(Q) is a sum of contributions whose number is negative for n < 1, which reverses the type of extremum. For a more recent discussion of the problem of maximization in a descendant of the SK model see Ref. [99]. Aiming at an exact solution of the original minimization problem (2.7), one should choose a variational ansatz that includes the global solution. In principle, a parametrization should be adopted that gives a maximal f_SK value over all possible parametrizations.
Verification of the global nature of a maximum found within a given parametrization is a hard problem; physical intuition for the right parametrization and comparison with reliable simulation data, where such data exist, may serve as guidance. Considering that the replicated partition sum in (2.6) is symmetric under permutation of the replicas, a first guess is that the minimizing Q matrix in (2.7), which characterizes the state of the system at equilibrium, also exhibits this symmetry. This leads to the replica symmetric (RS) ansatz with a single variational parameter, q_ab = q ∈ [-1, 1] for all a ≠ b, named the Edwards-Anderson order parameter. The explicit evaluation of (2.9) with such an ansatz, and the clarification of the physical content of the resulting RS solution, have been performed in Refs. [60,61]. (For the sake of brevity we do not discuss the inclusion of an external magnetic field or of a non-zero average of the couplings J_ik; the main concepts can be presented without them.) The local stability conditions for the RS solution were worked out by de Almeida and Thouless (AT) [96]. It turns out that the AT stability condition is fulfilled only for temperatures above a critical value k_B T_c = 1; below it the RS solution is AT-unstable, implying that the replica symmetry of the system in (2.6) must be spontaneously broken by the equilibrium state of the system. Intriguingly, this instability does not show up at any integer n; it only appears as n decreases from 1 towards 0 [96,95,100]. Further evidence that the RS solution is incorrect is provided by the negative ground state entropy [60] and the magnetic susceptibility [101] it predicts, and by its predictions for the ground state energy and the probability density of the local magnetic field, which contradict simulations; see [14]. In order to find a consistent description of the SK model at low temperatures, several replica symmetry breaking (RSB) parametrizations for the Q matrix in (2.7) have been proposed [102-107].
In what can be viewed as the generalization of Blandin's one-step RSB (1-RSB) [102], Parisi
formulated on physical grounds a hierarchical structure for the Q matrix (see also Section 3.3 in [14]) and, in an ingenious series of works [108,109,20-23,110], introduced the so far only RSB ansatz compatible with these conditions. Depending on the number 2R+1 of variational parameters k in this ansatz, one speaks of an R-step RSB ansatz (R-RSB), R = 0, 1, 2, ..., and of continuous RSB (CRSB) in the limit R → ∞. The RS ansatz corresponds to R = 0, and each higher step contains the previous ones as special cases. Later in the paper the explicit form of Parisi's ansatz for Q will be given and its consequences thoroughly discussed. Following Parisi's study, which mostly focused on the region near criticality, the deep spin glass phase was also extensively analyzed within the CRSB ansatz; see Refs. [14,84]. Among the non-perturbative approaches we highlight the work of Sommers and Dupont [29], where a variational free energy especially suited for numerical evaluation was constructed and used to resolve ground state properties. One of their successes was a theoretical prediction for the probability density of the local field, which compared favorably with the simulation of Palmer and Pond (see Fig. 3.6 of [14]). The generalization of the AT stability conditions to the case of an R-RSB solution has been developed in a series of works by De Dominicis, Kondor, and Temesvári, initiated with Ref. [98] and presented in the most general form in Ref. [111]. Due to the complicated form of these stability conditions, they could so far be verified for Parisi's solution only slightly below the AT instability. Yet it is widely believed that Parisi's solution captures the correct behavior of the SK model in the entire low-temperature regime. The global stability of Parisi's RSB ansatz has not been verified by rigorous mathematics. It is physically supported in part by the suggestive picture of a hierarchical organization of states in the glassy phase.
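The hierarchical structure can be made concrete for integer n. The Python/NumPy sketch below builds an R-step RSB overlap matrix in the conventional block parametrization: block sizes n = m_0 ≥ m_1 ≥ ... ≥ m_R ≥ m_{R+1} = 1, each dividing the previous one, and overlaps q_0, ..., q_R, i.e. 2R+1 parameters, matching the count quoted above. This notation is the common textbook one and is assumed here; the explicit form given later in the paper may use different conventions. The helper `is_ultrametric` (a hypothetical name) checks that in every triangle of replicas the two smallest overlaps coincide.

```python
import itertools

import numpy as np


def parisi_matrix(n, block_sizes, overlaps):
    """R-step RSB overlap matrix for integer n.

    block_sizes: [m_1, ..., m_R]; overlaps: [q_0, ..., q_R].
    Replicas a != b get q_i, where i is the finest level at which they still
    share a block of size m_i (with m_0 = n).
    """
    sizes = [n, *block_sizes, 1]
    if len(overlaps) != len(sizes) - 1:
        raise ValueError("need R + 1 overlaps for R block sizes")
    if any(big % small for big, small in zip(sizes, sizes[1:])):
        raise ValueError("each block size must divide the previous one")
    replica = np.arange(n)
    Q = np.empty((n, n))
    for size, q in zip(sizes, overlaps):
        same_block = (replica[:, None] // size) == (replica[None, :] // size)
        Q[same_block] = q  # finer levels overwrite coarser ones
    np.fill_diagonal(Q, 1.0)  # q_aa = 1
    return Q


def is_ultrametric(Q):
    """True if, for all distinct a, b, c, the two smallest of q_ab, q_bc, q_ac coincide."""
    for a, b, c in itertools.combinations(range(Q.shape[0]), 3):
        low, mid, _ = sorted((Q[a, b], Q[b, c], Q[a, c]))
        if not np.isclose(low, mid):
            return False
    return True


Q = parisi_matrix(8, block_sizes=[4, 2], overlaps=[0.1, 0.4, 0.7])  # 2-RSB
print(Q)
print(is_ultrametric(Q))
```

With an empty list of block sizes the same function returns the RS matrix with a single off-diagonal value q_0.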
Furthermore, Parisi's solution shows none of the aforesaid inconsistencies that plagued the RS solution, and it compares satisfactorily with simulations. In fact, we do not know of any instance where the replica method with Parisi's ansatz has been applied and, at the same time, well-founded analytical or numerical approaches are available and would yield incompatible results. Nor do we know of a case which admits application of the replica method but cannot be handled in a self-consistent way by Parisi's ansatz with sufficiently many, possibly infinitely many, steps of RSB. As an alternative to the replica method, Thouless, Anderson and Palmer [112] have established a modified form of the Bethe-Peierls method, reproducing the RS results at high temperatures, while differing from both the RS and Parisi's solution in the AT-unstable region. This approach has been further developed by Sommers [113,114] in a way that was later realized [106,115] to be equivalent, in a certain limit, to a generalized version of the RSB ansatz by Blandin and coworkers [102,105]. A second alternative method is the dynamical approach of Sompolinsky and Zippelius [32,33,116], which captures Parisi's solution in the static case [34,117]. The latter may in turn be reproduced by an iterative extension of the Blandin-Sommers scheme [106], the first step towards the correct Parisi solution. A further modified form of the Bethe-Peierls approach, the so-called cavity approach by Mézard et al. [14,118], contains the Thouless-Anderson-Palmer equations as a special case but can also be extended to become equivalent to a Parisi ansatz with an arbitrary number of RSB steps. Again, this is not a mathematically rigorous method but rather an ansatz combined with an intuitive physical line of reasoning, verified by self-consistency in the end.
While the physical picture is less elusive than that behind the formal n → 0 limit, the equivalent replica method in conjunction with Parisi's ansatz seems to be more highly developed as far as practical calculations are concerned. For instance, the self-consistency condition of the cavity approach, expected to be equivalent to the thermodynamic stability conditions of the
replica method [14], has so far been worked out explicitly only in the simplest case, corresponding to the AT stability condition for the RS state. Another formulation of the dynamics was given by Sommers [36], who devised a path-integral approach especially suited for discrete variables like Ising spins. His results agree with those of Sompolinsky and Zippelius, who used a continuous spin model that, in a singular limit, also covers the case of Ising spins. A recently suggested alternative method [119] studies the n-dependence of ⟨Z^n⟩ and reiterates that different continuations to n → 0 give the RS and RSB solutions, without the need to insert Parisi's ansatz explicitly. However, the heuristics involved may mean that the exact solution is obtained only in special cases. There is a large family of spin glass models, consisting of various generalizations of the SK model, that have been successfully treated by Parisi's ansatz, albeit mostly near criticality in a perturbative manner [84]. A prominent exception is Nieuwenhuizen's multi-p-spin interaction model with continuous, spherical spins [68]. The fixed p = 2 case is the long-known spherical SK model, which can be solved within RS [84]; with multi-p-spin interactions, however, it can exhibit RSB. Remarkably, in CRSB phases the continuously increasing part of Parisi's order parameter function can be calculated analytically for any temperature. Even with a fixed p > 2, one can also have phases where the 1-RSB solution is exact, a situation discussed for the neuron with Ising couplings in Section 2.5.2. The multi-p-spin model has also become a test bed for equilibrium thermodynamic calculations meant to capture asymptotic states of the dynamics which do not maximize the free energy [99]. It is well known that for a ferromagnet the symmetry of the system as a whole, i.e., of the Hamiltonian, is spontaneously broken by the state of the system at thermal equilibrium, accompanied by a spontaneous breaking of ergodicity [120].
Such a state can be reached by decreasing the temperature, when the system undergoes a transition from a paramagnetic phase exhibiting symmetry and ergodicity to a ferromagnet with only axial symmetry and restricted ergodicity. In the SK model described by the replica free energy (2.7), as the temperature decreases, an analogous phase transition from a paramagnetic into a spin glass phase takes place [61,121,122,57], with a concomitant spontaneous breaking of ergodicity and of RS. The transition can be monitored by Parisi's variational parameters k at stationarity, which thus play the role of order parameters [20,110,123]. The emerging intuitive picture of RSB is that of a very complicated, rugged free-energy landscape in some coarse-grained state space, with a large number of local minima, many of them nearly degenerate, as well as a number of global minima, separated by free-energy barriers whose height diverges in the thermodynamic limit. What corresponds to the thermal equilibrium state in ordered systems is here a global minimum, also termed an ergodic component, a pure thermodynamic state, or a metastate. Within the Parisi solution, pure states are organized according to a hierarchical, so-called ultrametric topology [30,28,124]. From the practical viewpoint, the ultrametric decomposition of the state space into pure states helps in the calculation of non-self-averaging quantities [27,28], and it is also a basic ingredient of the cavity approach in [14,118]. However, it has so far resisted rigorous mathematical treatment, and for real spin glasses it is the subject of ongoing controversy [91,125]. We would like to add here that, in the context of neural networks, examples are known [126-128] where there are multiple ground states grouped into disconnected regions, i.e., ergodicity is broken, while the replica method implies that RS is preserved.
The aforesaid physical picture of RSB can be maintained by distinguishing between pure states and ergodic components [126]. Furthermore, it is unclear whether a spontaneous symmetry breaking takes place in those networks. In the present manuscript we do not deal with such subtleties, and concentrate mainly on the replica method as a tool for calculation.
G. Györgyi / Physics Reports 342 (2001) 263–392
The replica approach in conjunction with Parisi's ansatz provides so far the most complete description of the SK model in averaged thermal equilibrium. However, this scheme, as well as the equivalent cavity approach and the static limit of the path-integral formulation, involves certain procedures which, up to now, could not be put on a rigorous mathematical basis. On the other hand, there exist a number of remarkable rigorous results concerning the SK model. In Ref. [129] it was shown that, above the AT line and in the absence of an external magnetic field, the quenched average $N^{-1}\langle \ln Z_J\rangle$ approaches the so-called annealed average $N^{-1}\ln\langle Z_J\rangle$ in the thermodynamic limit (a property termed strong self-averaging). The evaluation of $N^{-1}\ln\langle Z_J\rangle$ is straightforward and reproduces the RS solution. The basic reason behind these conclusions is the vanishing of the Edwards–Anderson order parameter. As a result, the usual effective coupling of the replicas after averaging out the quenched disorder does not arise, i.e., $\langle Z_J^n\rangle = \langle Z_J\rangle^n$. Furthermore, some explicit bounds pertaining to the low-temperature region have been obtained in [129], which imply [92] the existence of a phase transition at the same temperature as predicted by the AT stability criterion. In [92] it was shown by means of a rigorous version of the cavity procedure, called the martingale method in the mathematical physics literature, that if the Edwards–Anderson order parameter is self-averaging then the RS solution is exact. In [130] it was rigorously verified that this order parameter is self-averaging, and thus the RS solution is exact, under two conditions. These are that the AT stability condition is fulfilled without an external magnetic field, or that a condition slightly stronger than the AT condition holds in the presence of a field. This theorem suggests that an AT-stable RS solution will provide the correct result also in other systems.
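The gap between quenched and annealed averages can be illustrated numerically. So can the replica identity $\ln Z = \lim_{n\to 0}(Z^n - 1)/n$, on which the whole method rests. The following sketch assumes a toy disorder ensemble, not the SK model: each sample is a sum of eight Boltzmann factors with Gaussian random exponents. It shows that $\langle \ln Z\rangle < \ln\langle Z\rangle$, by Jensen's inequality, and that $(\langle Z^n\rangle - 1)/n$ approaches the quenched average as $n \to 0$. For such a toy ensemble the limit can be taken directly. In the SK model, by contrast, $\langle Z^n\rangle$ is computed for integer $n$ and must then be continued to $n \to 0$.

```python
import math
import random


def quenched_and_annealed(samples):
    """Sample estimates of the quenched <ln Z> and the annealed ln <Z>."""
    quenched = sum(math.log(z) for z in samples) / len(samples)
    annealed = math.log(sum(samples) / len(samples))
    return quenched, annealed


def replica_estimate(samples, n):
    """(<Z^n> - 1) / n, which tends to <ln Z> as n -> 0 (replica identity)."""
    moment = sum(z ** n for z in samples) / len(samples)
    return (moment - 1.0) / n


# Toy disorder (an assumption, not the SK model): each sample Z_J is a sum of
# 8 Boltzmann factors exp(beta * g) with Gaussian random g and beta = 0.5.
rng = random.Random(0)
samples = [sum(math.exp(0.5 * rng.gauss(0.0, 1.0)) for _ in range(8))
           for _ in range(1000)]

quenched, annealed = quenched_and_annealed(samples)
for n in (0.5, 0.1, 1e-3, 1e-6):
    print(f"n = {n:g}: (<Z^n> - 1)/n = {replica_estimate(samples, n):.6f}")
print(f"quenched <ln Z> = {quenched:.6f}, annealed ln <Z> = {annealed:.6f}")
```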
The theorem furthermore confirms Parisi's RSB ansatz to the extent that this ansatz reduces to the RS result if the AT condition is satisfied. Moreover, the result of [92] can be combined with the rigorous proof in [129] that the RS solution is incorrect at low temperatures. It follows that the Edwards–Anderson order parameter is not self-averaging there, a feature that is indeed reproduced by the Parisi solution. Another interesting rigorous result has been obtained in Refs. [131,132] via the martingale method. Namely, there exists a set of "order parameter functions" $0 \le x(q) \le 1$ such that the SK free energy can be expressed in terms of antiparabolic martingale equations. Each of these equations involves one such function $x(q)$ and is exactly of the same form as the non-linear partial differential equation in Parisi's CRSB scheme. The remaining non-trivial step towards a rigorous derivation of Parisi's CRSB solution is to show that this set of functions is effectively equivalent to a single function $x(q)$. Finally, in [133,134] certain rather strong conditions are derived that should be satisfied by the order parameter of a class of spin glass models, including the SK model and short-range models. These constraints are indeed fulfilled by Parisi's solution but still leave room for other possibilities. We remark that the replica method in combination with the Parisi ansatz is not restricted to the SK model and its variants; this is also one of the main reasons why this paper was written. Nevertheless, most of the above rigorous results pertain to the SK model. Only some of them have so far been generalized to the Little–Hopfield network, and none but the last one to even further systems.
2.4. Little–Hopfield network
One of the main breakthroughs of the statistical physical approach in other fields was achieved on the Little–Hopfield model by the replica calculation of Amit et al. [135–137]. They considered
M randomly sampled patterns $\mathbf{S}^{\mu}$, $\mu = 1, 2, \ldots, M$, of dimension N, where N is the number of participating neurons, for a fixed value of the so-called load parameter
$$\alpha = M/N \qquad (2.10)$$
in the thermodynamic limit $N \to \infty$. The starting point of the statistical mechanical treatment is a canonical Boltzmannian formulation of the problem. A microstate is a configuration of the neuron states $S_i$, $i = 1, \ldots, N$, and a pattern is considered as stored if it is a stable fixed-point attractor of the dynamics (2.2). The energy function for the random sequential dynamics (2.2) is analogous to the Hamiltonian of the SK model in (2.3) [54]. The main difference is in the exchange couplings, now taken as $J_{ij} = N^{-1}\sum_{\mu} S_i^{\mu} S_j^{\mu}$, the so-called Hebb rule. Thus the patterns $\mathbf{S}^{\mu}$ play the role of the quenched disorder. At positive temperatures the dynamics (2.2), i.e., the update rule for the selected neuron, is non-deterministic, and usually Glauber's prescription is applied, see, e.g., Ref. [84]. The original storage problem corresponds to the zero-temperature limit. The central result obtained by Amit et al. within the RS ansatz concerns the maximal number $M_c$ of patterns which can be stored with an error of a few percent. In the thermodynamic limit $N \to \infty$ it scales as $M_c = \alpha_c N$, with a critical capacity $\alpha_c \approx 0.138$. Criticality manifests itself in a drop of the overlap between a generic stationary state and the desired pattern, from a value below, but close to, one to nearly zero. It was immediately noticed [135] that the AT stability condition is violated at zero temperature for all $\alpha > 0$, so exact results require RSB. However, already a quite small temperature restores AT stability and thus the validity of the RS solution. Applying the 1-RSB ansatz, Crisanti et al. [138] obtained a modified critical capacity of $\alpha_c \approx 0.144$. The problem was reconsidered in the R-RSB analysis, R = 0, 1, 2, of Steffan and Kühn [139], who put forth a ground-state capacity $\alpha_c \approx 0.1382$ based on several cross-checks of their computation.
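The Hebb rule and the zero-temperature version of the dynamics (2.2) can be sketched in a few lines. The code below is a minimal illustration that rests on several assumptions:

- random sequential updates $S_i \leftarrow \mathrm{sign}(\sum_j J_{ij} S_j)$;
- self-couplings set to zero;
- a vanishing local field mapped to +1;
- a small system, N = 200, at load α = 0.05, well below $\alpha_c \approx 0.138$.

It starts from a corrupted copy of a stored pattern and reports the overlap before and after relaxation. Finite-size effects at this N are sizable, so the sketch should not be used to estimate $\alpha_c$.

```python
import random


def hebb_couplings(patterns):
    """Hebb rule J_ij = N^{-1} sum_mu S_i^mu S_j^mu, with J_ii = 0 (assumed convention)."""
    n = len(patterns[0])
    J = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    J[i][j] += p[i] * p[j] / n
    return J


def zero_temperature_dynamics(J, state, rng, max_sweeps=20):
    """Random sequential T = 0 updates S_i <- sign(sum_j J_ij S_j) until no spin flips.

    A vanishing local field is mapped to +1 (tie-breaking assumption).
    """
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        order = list(range(n))
        rng.shuffle(order)
        flipped = False
        for i in order:
            h = sum(J[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                flipped = True
        if not flipped:
            break
    return s


def overlap(a, b):
    """m = N^{-1} sum_i a_i b_i."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
N, M = 200, 10  # load alpha = M/N = 0.05, well below alpha_c ≈ 0.138
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
J = hebb_couplings(patterns)

# Start from pattern 0 with roughly 10% of its spins flipped, so m(0) ≈ 0.8.
start = [-x if rng.random() < 0.1 else x for x in patterns[0]]
final = zero_temperature_dynamics(J, start, rng)
print(f"m(0) = {overlap(start, patterns[0]):.3f}, m(final) = {overlap(final, patterns[0]):.3f}")
```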
Steffan and Kühn raise the possibility that the Parisi–Toulouse hypothesis [140] holds also in the Little–Hopfield model, at least as a good approximation. This hypothesis implies that in a CRSB solution the magnetization in the SK model does not depend on the temperature, and it is believed to be exact for vanishing magnetization. In that case, they conclude, the capacity is given by the intersection of the AT line and the RS phase boundary; that is, the capacity is essentially the one calculated from the RS solution. A CRSB calculation, an extension of Parisi's solution of the SK model within the formalism of Ref. [117], was performed by Tokita [31]. The sophisticated numerical method applied to evaluate the CRSB equations showed an instability near $\alpha \approx 0.155 \pm 0.002$, which he identified as the capacity. Numerical simulations [54,135,138,141] gave estimates mostly between the aforesaid finite R-RSB results and Tokita's CRSB result. However, a more recent simulation [142] yielded $\alpha_c = 0.141 \pm 0.0015$, in better agreement with the former. It included a finite-size scaling specially adapted to a discontinuous transition in the presence of quenched disorder. The numerical evaluation of the CRSB state to the required precision is a much more formidable task than that of R-RSB, R = 0, 1, 2, and even 1-RSB computations were the subject of debate [138,139]. The question of the theoretical prediction may therefore still be considered open. The main issue here is less the precise number than the salient features of the phase diagram: reentrance, the validity of the Parisi–Toulouse hypothesis, and what kind of RSB describes the various phases [139,31]. Tokita's framework, involving the freedom of a gauge function, is closely related to the variational approach for the SK model [117,29], inspired in turn by dynamical studies [34] where the static
gauge function is related to the time-dependent susceptibility. The variational framework we present in Section 7 on purely static grounds turns out to be very similar to those, albeit without our resorting to the gauge function. On the technical side, we are unaware of any non-perturbative CRSB analyses that aim at the ground state, or at least at regions with frustration far from criticality. The only exceptions are those performed for the SK model and its descendants, and for the related Little–Hopfield model. Filling this gap was an important motivation for the present paper. The RS results of Amit et al. have been re-derived in several different ways [143–147], based on certain assumptions which are possibly equivalent to that of RS. Alternative methods comparable to RSB, however, do not seem to be available yet. The authors of Ref. [145] speculate that their framework may admit such an extension. It is based on Sommers' dynamical path-integral approach [36], which successfully reproduced some RSB features in the SK model. The following mathematically rigorous results for the Little–Hopfield model are available so far. The self-averaging property of the free energy density has been proven in [93,94]. In [148] the RS solution is rigorously derived under the assumption that the Edwards–Anderson order parameter is self-averaging. In [130] the latter assumption is shown to hold under a condition similar to, but somewhat stronger than, the AT stability condition. Finally, a constraint similar to ultrametricity on the order parameter has been derived in [133,134]. It is indeed satisfied by the RS solution at high temperatures and by Tokita's CRSB solution at low temperatures.
2.5. Pattern storage by a single neuron
As we have seen, the McCulloch–Pitts model neuron is the elementary building block of two prominent types of neural networks: the layered feedforward perceptron and the associative memory.
Therefore, a detailed exploration of such a single neuron is an indispensable prerequisite for a satisfactory understanding of the collective behavior of networked units.
2.5.1. Continuous synaptic coupling
First we describe the case of continuous synaptic couplings, i.e., arbitrary vectors J in (2.1). If their norm is fixed, the term spherical couplings is often used. Note that in Eq. (2.1) the norm does not influence the output. An early remarkable result is due to Winder [149] and Cover [150]. It concerns the maximal number $M_c$ of input patterns for which a single McCulloch–Pitts neuron can correctly reproduce the prescribed outputs according to (2.1). This is understood as a theoretical maximum, i.e., without reference to any specific training algorithm that may be necessary to find the right couplings. For randomly sampled patterns $\mathbf{S}^{\mu}$, $\mu = 1, 2, \ldots, M$, their result for the critical capacity $\alpha_c = M_c/N$ approaches the value $\alpha_c = 2$ with probability 1 in the limit $N \to \infty$. This is a widely referenced result in artificial neural networks. An easy-to-follow account of Cover's geometrical proof, for arbitrary N, can be found in Section 5.7 of [19], and notable extensions have been worked out in [151–153]. A central notion for adaptive networks is the version space, the set of coupling vectors J compatible with the patterns, or examples. Intuitively, the version space shrinks as the number of patterns increases, and beyond the capacity it is empty, at least with probability one in the thermodynamic limit. A breakthrough was achieved when the space of synaptic couplings of a single McCulloch–Pitts neuron was explored, following the proposal of Gardner [3], by Gardner and Derrida within
both the microcanonical [4] and canonical [5] approaches. A main novelty of the concept was in reversing the traditional analogy between spin systems and neural networks. In the Little–Hopfield model the states of the neurons form the "spin space", and the synaptic couplings are the quenched parameters. The new proposal was to consider the couplings as the configuration space for statistical mechanics. The constraints are represented by randomly generated patterns to be stored, i.e., patterns that should be reproduced by an appropriate setting of the couplings. In other words, the proposal was to consider the version space. The stage was then set for the statistical mechanical treatment by introducing an appropriate cost, or energy, function in coupling space (further synonyms are Hamiltonian function and error measure). This does not restrict the study to the version space, but allows for finite temperatures. Beyond capacity it thus provides a framework to describe states with a given error, including the minimal positive error of the ground state. The common ingredient in both the Little–Hopfield and the Gardner–Derrida concepts is that patterns, i.e., examples, represent the quenched disorder; otherwise the two concepts are quite different. For example, the energy function of the Little–Hopfield network closely resembles that of the SK model, whereas not much formal analogy exists between spin systems and synaptic coupling space. In what was a novel application of the replica method, within the RS ansatz, Gardner and Derrida reproduced the Winder–Cover result and generalized it to biased pattern distributions. They calculated many characteristics of the region below the critical capacity $\alpha_c$, and also proved convergence of training algorithms. We note that the traditional problem of error-free storage corresponds here to the condition of zero energy in the ground state.
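Cover's counting argument behind $\alpha_c = 2$ can be evaluated exactly. The sketch below makes two assumptions: the neuron (2.1) acts as sign(J · S) without a threshold, and the M patterns are in general position. Under these assumptions the number of realizable output assignments is $C(M,N) = 2\sum_{k=0}^{N-1}\binom{M-1}{k}$. The storable fraction $C(M,N)/2^M$ equals exactly 1/2 at M = 2N for every N, and it sharpens into a step at α = 2 as N grows.

```python
from fractions import Fraction
from math import comb


def cover_count(m, n):
    """Cover's C(M, N) = 2 * sum_{k=0}^{N-1} binom(M-1, k): the number of dichotomies of
    M points in general position in R^N that a neuron sign(J . S) can realize."""
    return 2 * sum(comb(m - 1, k) for k in range(n))


def separable_fraction(m, n):
    """Fraction C(M, N) / 2^M of all output assignments that are storable."""
    return Fraction(cover_count(m, n), 2 ** m)


N = 100
for alpha in (1.5, 1.8, 2.0, 2.2, 2.5):
    M = round(alpha * N)
    print(f"alpha = {alpha}: storable fraction = {float(separable_fraction(M, N)):.4f}")
```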
If not all patterns can be accommodated by the couplings, that is, if the neuron is beyond capacity, then various positive ground-state energies arise, depending on the choice of the Hamiltonian. The thermodynamic stability of the RS solution via the AT condition [96] was formulated here by Gardner and Derrida [5] and was revised later by Bouten [9,10]. It turned out that beyond the critical capacity $\alpha_c = 2$ the RS ansatz is unstable for the much studied energy function that counts the patterns that are not stored, i.e., the unstable patterns. This is sometimes called the Gardner–Derrida error measure and will be in our focus in the present paper. An improved 1-RSB ansatz by Majer et al. [7] and by Erichsen and Theumann [8] continued to be plagued by similar instability beyond capacity. So did the subsequent 2-RSB calculation by Whyte and Sherrington [11]. The latter authors could prove that no finite R-RSB ansatz in the ground state beyond capacity can be locally stable. In the present article we propose Parisi's CRSB ansatz as an appropriate description of a single neuron beyond capacity, within the limits of an equilibrium, averaged statistical mechanical treatment. As shown in [154], the effect of frustration manifests itself in the spontaneous breaking of RS beyond capacity. From the viewpoint of numerical simulations, it brings along a very hard, NP-complete problem [13,16]. Whatever algorithm is used to find an N-dimensional vector of synaptic couplings J with the smallest possible number of misclassified examples, the time it needs is therefore expected to grow faster than any power of N (a rigorous proof is not known). Simple algorithms which minimize the number of misclassifications locally, i.e., within a certain neighborhood of the initial choice for J, are due to Wendemuth [155,37].
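The Gardner–Derrida error measure and the convergence of training below capacity can be illustrated with Rosenblatt's perceptron rule. The sketch below rests on assumptions of its own:

- the ±1 patterns and outputs are random;
- the load α = 0.2 is far below $\alpha_c = 2$;
- a vanishing stability is counted as an error.

Below capacity the rule terminates with zero unstable patterns. Beyond capacity its updates never cease and it does not minimize the error count, which is why modified algorithms such as those cited above are needed.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def stabilities(J, inputs, outputs):
    """Local stabilities Delta^mu = xi^mu J.S^mu / |J|, as in (2.12)."""
    norm = math.sqrt(dot(J, J))
    return [xi * dot(J, s) / norm for s, xi in zip(inputs, outputs)]


def unstable_count(J, inputs, outputs):
    """Gardner–Derrida error measure: number of patterns with Delta^mu <= 0
    (a vanishing stability is counted as an error here, an assumption)."""
    return sum(1 for d in stabilities(J, inputs, outputs) if d <= 0)


def perceptron(inputs, outputs, max_epochs=1000):
    """Rosenblatt's rule: J <- J + xi^mu S^mu for every pattern with xi^mu J.S^mu <= 0.
    Terminates with zero errors if the patterns are linearly separable."""
    J = [0] * len(inputs[0])
    for _ in range(max_epochs):
        updates = 0
        for s, xi in zip(inputs, outputs):
            if xi * dot(J, s) <= 0:
                J = [j + xi * x for j, x in zip(J, s)]
                updates += 1
        if updates == 0:
            break
    return J


rng = random.Random(2)
N, M = 100, 20  # alpha = 0.2, far below alpha_c = 2
inputs = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
outputs = [rng.choice((-1, 1)) for _ in range(M)]
J = perceptron(inputs, outputs)
print("unstable patterns:", unstable_count(J, inputs, outputs))
```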
Wendemuth's result for the error measure counting unstable patterns significantly overestimated the error, as demonstrated in Ref. [18] and cited in the present paper. His algorithm may nevertheless yield acceptable approximations to the global minimum predicted by the CRSB theory. We refer also to Section 7.3 in [14] for analogous observations in the context of the SK model. Returning to generic
NP-complete problems, the numerical effort can be reduced to some power of N by admitting some random element in the algorithm, hence the name non-deterministic polynomial that NP stands for. The price to be paid then is that the absolute minimum will be found only with a certain probability [156–159]. A widely used method of this kind is simulated annealing [160], together with its descendants [161]. As pointed out in [15], the average time required for the numerical solution may undergo a dramatic change if certain parameters are varied, without changing its NP-completeness. Therefore the so-called worst-case scenario, on which the classification as NP-complete is based, may not capture very well the typical behavior of such algorithms in specific applications, i.e., the behavior occurring with probability 1 as $N \to \infty$. Conversely, a proof that a problem can be solved deterministically in polynomial time may still allow very long times for an algorithm to converge. Nevertheless, NP-completeness is generally considered the signature of algorithmically hard tasks. It is natural to expect that some, possibly most, of the rigorous results for the SK and Little–Hopfield models, and of the alternatives to the replica method developed for them, can be carried over to the simple perceptron. However, so far only the cavity method in its simplest form, equivalent to an RS solution, is available. It comes with a self-consistency condition equivalent to the AT stability condition of the RS solution [162–165]. Beyond the critical capacity $\alpha_c = 2$, RS is spontaneously broken. As in the SK model, this entails an ultrametric organization [30,28,124] of the synaptic couplings J that minimize, in the ground state, the number of incorrect input–output relations $\mathbf{S}^{\mu}, \xi^{\mu}$, $\mu = 1, 2, \ldots, M$, in (2.1). Below $\alpha_c$, a complementary picture arises by introducing "cells" on the N-dimensional sphere of synaptic couplings
$$C_{\boldsymbol{\sigma}} = \{\mathbf{J} \mid \mathbf{J}^2 = N,\ \mathrm{sign}(\mathbf{J}\cdot\mathbf{S}^{\mu}) = \sigma^{\mu},\ \mu = 1, 2, \ldots, M\}, \qquad (2.11)$$
labeled by the $2^M$ possible output sequences $\boldsymbol{\sigma} = \{\sigma^{\mu}\}$. The idea of studying the simple perceptron in terms of these cells $C_{\boldsymbol{\sigma}}$ is to some extent already contained in Cover's geometrical derivation of the storage capacity [150], and it has been employed again in [166]. An appropriate quantitative framework has been elaborated by Monasson and co-workers [126,167–169] in the context of multi-layer networks and was later adapted to the simple perceptron in [170–172]. This method is based on a replica calculation. It characterizes the distribution of cell sizes $|C_{\boldsymbol{\sigma}}|$, to exponentially leading order in N, in terms of a so-called multifractal spectrum, as in the thermodynamic formalism for fractals [173,174]. This multifractal analysis opens an interesting view on the storage as well as the generalization properties of the simple perceptron.
2.5.2. Ising couplings
Storage properties change considerably if one restricts the analysis to so-called Ising couplings, where each component of J can take only the two values ±1. This extra constraint is partly motivated by the fact that in a digital computer the $J_i$'s have a discrete representation. It was observed already by Gardner and Derrida [5] that the critical storage capacity with Ising couplings cannot be treated self-consistently by an RS ansatz within a canonical statistical mechanical approach. Krauth and Mézard performed a 1-RSB analysis with the prominent result $\alpha_c \approx 0.833$ for the critical storage capacity of the Ising perceptron [175]. Their 2-RSB explorations furthermore indicate that no new solution arises with respect to 1-RSB. The RS state turns out to be globally stable up to
the capacity limit, the latter being signaled by the vanishing of the entropy. This is an intriguing coincidence that could not have been foreseen by the RS analysis, because there the point where the entropy becomes negative is obviously only an upper limit for the capacity. The need for RSB to calculate the capacity should be contrasted with the spherical case, Section 2.5.1, where the capacity could be determined within the RS solution. The reason for the difficulty here is that the transition from perfect to imperfect storage is discontinuous for Ising couplings. The order parameter exhibits a jump, in the sense that one of the overlaps in 1-RSB is not the continuation of the RS value when α passes $\alpha_c$. From the viewpoint of the order parameter such a transition can be termed first order. On the other hand, the probability weight of the discontinuously appearing order parameter value vanishes at $\alpha_c$. The first derivative of the free energy therefore remains continuous, and only the second one jumps. The Ising neuron also demonstrates the importance of the global stability of a state. The RS solution formally exists beyond the transition and stays locally stable, i.e., AT-stable, up to $\alpha = 4/\pi$. However, its free energy is smaller than that of the 1-RSB solution, so global stability appears to be taken over by the 1-RSB solution, as in first-order transitions. It should be added that here the locally stable but globally unstable RS solution should be ruled out as a metastable state in the traditional sense because of its negative entropy. Furthermore, the 1-RSB solution is not a locally stable equilibrium state before the transition, so two spinodal points collapse onto the transition point. A major part of the existing statistical mechanical investigations, including the SK model in (2.4) and our present study of the simple perceptron, is based on a canonical Boltzmannian formulation of the problem. Gardner's seminal calculations in Refs. [3,4], by contrast, use the microcanonical ensemble.
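For Ising couplings the version space can be enumerated exhaustively when N is tiny. This also shows why exhaustive search is hopeless in general: its cost grows as $2^N$. The sketch below uses hypothetical random data and N = 11, odd so that $\mathbf{J}\cdot\mathbf{S}^{\mu}$ never vanishes. It counts the couplings that store the first k patterns and converts the count to an entropy per coupling, $\ln(\text{count})/N$. At such a small N the load at which the count reaches zero fluctuates strongly from sample to sample; the value $\alpha_c \approx 0.833$ refers to $N \to \infty$.

```python
import itertools
import math
import random


def version_space_sizes(inputs, outputs):
    """Number of Ising coupling vectors J in {-1, +1}^N storing the first k patterns,
    sign(J . S^mu) = xi^mu, for k = 0, 1, ..., M. N is taken odd so J . S^mu != 0."""
    n = len(inputs[0])
    alive = list(itertools.product((-1, 1), repeat=n))
    sizes = [len(alive)]
    for s, xi in zip(inputs, outputs):
        alive = [J for J in alive if xi * sum(j * x for j, x in zip(J, s)) > 0]
        sizes.append(len(alive))
    return sizes


rng = random.Random(3)
N, M = 11, 16
inputs = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(M)]
outputs = [rng.choice((-1, 1)) for _ in range(M)]
sizes = version_space_sizes(inputs, outputs)
for k, size in enumerate(sizes):
    entropy = math.log(size) / N if size else float("-inf")
    print(f"alpha = {k / N:.2f}: |version space| = {size}, entropy per coupling = {entropy:.3f}")
```

The count after the first pattern is always exactly $2^{N-1}$, because $\mathbf{J} \to -\mathbf{J}$ swaps the couplings that store the pattern with those that do not.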
For the Ising perceptron, the microcanonical approach was adopted by Fontanari and Meir [176]. They reproduced Krauth and Mézard's results without going beyond RS, verifying in particular the AT stability condition [96] as well as the physical requirement of a non-negative entropy. Computing the optimal vector J of synaptic couplings for the Ising perceptron is an NP-complete problem [13,16] for any positive load parameter α, as demonstrated in Refs. [177,178]. The challenge of numerically estimating the critical capacity $\alpha_c$ has been attacked by several groups. Most of them verified $\alpha_c \approx 0.833$; the exception is Ref. [179], which is criticized in the comment [180]. Subsequent, more extensive computations in [181,182] appear to confirm the original critical value. Below the critical capacity, a multifractal analysis of the space of Ising couplings J has been worked out in [170,183], inspired by the work on the spherical case [126] discussed in the previous section. Beyond criticality, a thermodynamic stability analysis [184] suggests that 1-RSB is locally stable at and beyond $\alpha_c$. On the other hand, the microcanonical RS approach of Fontanari and Meir [176] also continues to coincide with Krauth and Mézard's results and satisfies the local thermal stability criterion of de Almeida and Thouless [96]. These numerical and analytical findings have led to the observation that the Ising perceptron beyond capacity behaves quite similarly to Derrida's random energy model [62]. This system is the $p \to \infty$ limit of the p-spin interaction version of the SK model. In particular, the 1-RSB ansatz indeed yields [63] what is accepted as the exact solution of the problem within the canonical Boltzmannian approach, and the zero-entropy condition marks the transition from RS to 1-RSB. Interestingly, Derrida's original treatment shows that even the spin glass phase of the random energy model can be described by the replica method without the need to introduce the 1-RSB ansatz.
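The random energy model invoked above is simple enough to write down in closed form. The sketch below assumes Derrida's convention of $2^N$ independent energies with density proportional to $e^{-E^2/N}$, for which $T_c = 1/(2\sqrt{\ln 2})$. It evaluates three quantities:

- the quenched free energy per spin;
- the entropy, which vanishes at $T_c$;
- the 1-RSB breakpoint $x = T/T_c$, where the overlap $q_1 = 1$ carries weight $1 - x$.

```python
import math

LN2 = math.log(2.0)
# Assumed convention: 2^N independent energies with density proportional to exp(-E^2 / N),
# giving the microcanonical entropy density s(e) = ln 2 - e^2 and T_c = 1 / (2 sqrt(ln 2)).
T_C = 1.0 / (2.0 * math.sqrt(LN2))


def rem_free_energy(T):
    """Quenched free energy per spin of the random energy model."""
    if T >= T_C:
        return -T * LN2 - 1.0 / (4.0 * T)  # high-temperature (RS) branch
    return -math.sqrt(LN2)  # frozen phase: equal to the ground-state energy density


def rem_entropy(T):
    """Entropy per spin s = -df/dT; vanishes at T_c and stays zero below."""
    return LN2 - 1.0 / (4.0 * T * T) if T >= T_C else 0.0


def rem_breakpoint(T):
    """1-RSB breakpoint x = T / T_c below T_c (the overlap q_1 = 1 has weight 1 - x);
    x = 1 above T_c, where the RS solution is exact."""
    return min(1.0, T / T_C)


for T in (0.4, T_C, 1.0):
    print(f"T = {T:.3f}: f = {rem_free_energy(T):.4f}, s = {rem_entropy(T):.4f}, "
          f"x = {rem_breakpoint(T):.3f}")
```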
In that calculation the mean free energy could be maximized directly, without
dealing with spin overlaps, so this can be considered an independent confirmation of RSB as applied later in [63]. In the case of the neuron with Ising couplings, as in the random energy model, an overlap $q_1 = 1$ arises, with probability exactly $1 - x = 1 - T/T_c$. A further peculiarity of Ising synapses is that, beyond capacity, the microcanonical formulation within RS [176] gave the same minimal ground-state error as the canonical 1-RSB result [175]. There is no technical contradiction, however, because if $q_1 = 1$ is set, then the 1-RSB free energy becomes equivalent to the RS microcanonical entropy. This can be understood by realizing that in the latter the temperature is essentially an extra variational parameter. It takes the role of $1/x$, which is related to the aforesaid probability in 1-RSB. The peculiarity of the microcanonical approach was interpreted, and exploited for calculating the storage capacity of certain multi-layer perceptrons, in Ref. [185]. Stable 1-RSB phases, albeit generally without the zero-entropy condition, also arise in further systems:
- the p-spin interaction SK model [24];
- its spherical variant [64];
- the spherical multi-p-spin interaction model [68];
- the Potts glass [66,67,25];
- protein folding models [78–81].

The general framework in the present paper includes both continuous and Ising synaptic couplings J. Since the case of principal interest here is Parisi's CRSB ansatz, the quantitative numerical evaluation beyond capacity will focus on the example of continuous, spherical couplings. It remains to be seen whether a continuous RSB ansatz will be necessary for more general Ising networks than the McCulloch–Pitts model, e.g., in multi-layer Ising perceptrons [169,186,187].
2.6. Training, error measures, and retrieval
We recall that the patterns to be stored are prescribed as pairs $\mathbf{S}^{\mu}, \xi^{\mu}$, $\mu = 1, \ldots, M$, and the McCulloch–Pitts neuron (2.1) is required to reproduce $\xi^{\mu}$ in response to $\mathbf{S}^{\mu}$.
Next we define the so-called local stability parameters
$$\Delta^{\mu} = \xi^{\mu}\,|\mathbf{J}|^{-1}\,\mathbf{J}\cdot\mathbf{S}^{\mu}, \qquad (2.12)$$
where the normalization factor $|\mathbf{J}|^{-1}$ guarantees a sensible behavior in the thermodynamic limit $N \to \infty$ if the squared lengths $|\mathbf{S}^{\mu}|^2$ of the patterns are of order N. Introducing an error measure $V(\Delta^{\mu})$ for each pattern, with $V(y)$ called a "potential", one is led to the Hamiltonian
$$H = \sum_{\mu=1}^{M} V(\Delta^{\mu}). \qquad (2.13)$$
Minimizing this Hamiltonian in the space of couplings J is the task of training. In particular, maximizing the number of correctly stored patterns in (2.1) is equivalent to minimizing (2.13) if one chooses the potential $V(y) = \Theta(-y)$, where $\Theta(y)$ denotes the Heaviside step function. Training in J space contrasts with the neuron (spin) dynamics of the Little–Hopfield model, which is aimed at retrieving stored patterns. If more than plain memorization of the classifications $\xi^{\mu}$ of the training examples $\mathbf{S}^{\mu}$ is required, then other choices of $V(y)$ may be advantageous. For instance, the potential
$$V(y) = \Theta(\kappa - y)\,(\kappa - y)^{b} \qquad (2.14)$$
with positive κ and non-negative b tries to impose, upon minimization of (2.13), the conditions $\Delta^{\mu} \ge \kappa$ on all the local stabilities (2.12). For the step-function potential, b = 0, the number of violations of $\Delta^{\mu} \ge \kappa$ is minimized, but those $\Delta^{\mu}$ which violate the condition may take values arbitrarily far below κ. For a softer potential with b > 0, a compromise must be made between minimizing the number of violations and minimizing the "cost" $(\kappa - \Delta^{\mu})^{b}$ of the committed errors. In any case, the qualitative effect of a positive κ after minimization of (2.13) is the following. Inputs S in (2.1) close, but not identical, to one of the stored patterns $\mathbf{S}^{\mu}$ can still be associated with the correct output $\xi^{\mu}$. For load parameters (2.10) below the critical capacity, $\alpha < \alpha_c$, one will typically choose the largest κ admitting zero training error in (2.13), and thus $\Delta^{\mu} \ge \kappa$ for all patterns. This maximal value $\kappa_c(\alpha)$, as a function of
the load parameter α, has been calculated by Gardner and Derrida in [5]. Note that $\kappa_c(\alpha)$ is the same for any b and that $\kappa_c(\alpha_c) = 0$. Beyond the critical capacity, $\alpha > \alpha_c$, not all the training patterns can be stored anyway. Sacrificing some additional ones by choosing κ > 0 may then still be desirable, in order to create a finite basin of attraction in the retrieval dynamics for those patterns for which $\Delta^{\mu} \ge \kappa$ can be achieved [5]. Attractors of the dynamics (2.2) that do not correspond to one of the stored patterns, the so-called spurious states, represent failures of memorization. We also mention that training the couplings of each neuron separately lifts the symmetry of the $J_{ij}$'s present in the original Little–Hopfield model. As a consequence, the dynamics is no longer governed by a Hamiltonian, and it may exhibit more complex time series than convergence to a fixed point [49]. The above concept of training corresponds to T = 0 dynamics in J space, and it can be complemented by a stochastic element to represent positive temperatures. The main focus of the present paper is the description of the final equilibrium states of such dynamics. A further motivation for studying potentials V(y) even more general than (2.14) is that a discrete-time version of the gradient descent dynamics of J in the corresponding energy landscape (2.13) reproduces several well-known learning algorithms [6]. For instance, the potential (2.14) with b = 1 induces a dynamics very similar to the perceptron algorithm of Rosenblatt [2], and later of Gardner [4]. Beyond capacity, the convergence of such algorithms to a state with minimal positive error is not proven. Their use there rests only on intuitive grounds, and modifications are obviously necessary [155,37]. Next we turn to the retrieval behavior of the Little–Hopfield associative memory network dynamics (2.2), characterized by the time-dependent overlaps
$$m^{\mu}(t) = \frac{1}{N}\sum_{i=1}^{N} S_i^{\mu}\, S_i(t) \qquad (2.15)$$
of the processed pattern S(t) with the stored patterns (fixed-point attractors) $\mathbf{S}^{\mu}$. An input pattern $\mathbf{S} = \mathbf{S}(0)$ is associated under the dynamics (2.2) with the stored pattern $\mathbf{S}^{\mu}$ if $m^{\mu}(t)$ evolves towards 1 in the course of time, while $m^{\nu}(t) \to 0$ for all other patterns $\nu \ne \mu$. The smallest value of $m^{\mu}(0)$ which still leads to a successful retrieval, i.e., $m^{\nu}(t) \to \delta_{\mu\nu}$, is a measure of the basin of attraction of the stored pattern $\mathbf{S}^{\mu}$. In the thermodynamic limit $N \to \infty$ the following result for the first time step of the evolution in (2.2) has been derived in [188,189]:
$$m^{\mu}(1) = \int \rho(\Delta)\, \mathrm{erf}\!\left(\frac{m^{\mu}(0)\,\Delta}{\sqrt{2\,[1 - m^{\mu}(0)^2]}}\right) d\Delta, \qquad (2.16)$$
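where $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-y^2}\,dy$ and $\rho(\Delta)$ is the distribution of the local stabilities from (2.12), defined as
$$\rho(\Delta) = \frac{1}{M}\sum_{\mu=1}^{M} \delta(\Delta - \Delta^{\mu}). \qquad (2.17)$$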
In general, ρ(Δ) depends on the algorithm by which the vector of synaptic couplings in (2.12) has been computed. It has been assumed that all the McCulloch–Pitts units in (2.2) have been trained independently according to the same algorithm, so that in the thermodynamic limit ρ(Δ) will be the same function for all of them. We mention that with the Hebb rule this condition of independence does not hold, so (2.16) is not valid for the original version of the Little–Hopfield model. When J has been obtained by minimizing a Hamiltonian function of the general form (2.13), the resulting distribution of local stabilities ρ(Δ) will be one of the most important quantities of our present work. Suppose (2.13) is minimized with the maximal κ in (2.14) admitting error-free storage of all training patterns, i.e., $\kappa = \kappa_c(\alpha)$. For this case Kepler and Abbott [188] have observed numerically that
retrieval is successful if and only if
$$m^{\mu}(1) > [1 + m^{\mu}(0)]/2. \qquad (2.18)$$
In the thermodynamic limit this seems to be exact, or at least a very good approximation, for sufficiently small load parameters α such that $\kappa_c(\alpha) \ge 0.6$ [188].
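Equation (2.16) is straightforward to evaluate once the local stabilities are known. For the empirical distribution (2.17), the integral reduces to an average over patterns. The sketch below takes the argument of the error function to be $m(0)\Delta/\sqrt{2[1 - m(0)^2]}$ and uses hypothetical stability values. It also encodes the Kepler–Abbott criterion (2.18). It is meaningful only under the independence assumptions stated above, so it does not apply to the Hebb rule.

```python
import math


def first_step_overlap(stabilities, m0):
    """m(1) from (2.16), with rho(Delta) the empirical distribution (2.17) of the given
    local stabilities: the average of erf(m0 * Delta / sqrt(2 * (1 - m0**2)))."""
    if not 0.0 <= m0 < 1.0:
        raise ValueError("m0 must lie in [0, 1)")
    scale = m0 / math.sqrt(2.0 * (1.0 - m0 * m0))
    return sum(math.erf(scale * d) for d in stabilities) / len(stabilities)


def kepler_abbott_retrieves(stabilities, m0):
    """Empirical retrieval criterion (2.18): m(1) > [1 + m(0)] / 2."""
    return first_step_overlap(stabilities, m0) > 0.5 * (1.0 + m0)


# Hypothetical stabilities: every pattern stored with margin 1.5, versus margin 0.3.
for label, deltas in (("margin 1.5", [1.5] * 100), ("margin 0.3", [0.3] * 100)):
    for m0 in (0.3, 0.6, 0.9):
        m1 = first_step_overlap(deltas, m0)
        print(f"{label}, m(0) = {m0}: m(1) = {m1:.3f}, "
              f"retrieved: {kepler_abbott_retrieves(deltas, m0)}")
```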
In general, the further time evolution of $m^{\mu}(t)$ is increasingly more complicated than the first time step (2.16). Analytical approximations as well as numerical studies for various specific learning rules for the synaptic couplings J, including the Hebb rule, have been elaborated in [141,146,190–193]. Consider randomly dilute networks, in which the fraction of non-zero synaptic couplings $J_{ij}$ in (2.2) tends to zero like $N^{-1}\ln N$ in the thermodynamic limit. For these it has been shown in [189] that the same dynamics for $m^{\mu}(t)$ as in (2.16) remains valid for arbitrary times t, provided the initial condition S(0) has an appreciable overlap with only one of the stored patterns $\mathbf{S}^{\mu}$. Further interesting explorations along these lines can be found in Refs. [145,194,195]. A question of particular interest for our present study has been addressed by Griniasty and Gutfreund [6]. They asked whether it may be advantageous for the retrieval properties to increase κ in (2.14) beyond the threshold $\kappa_c(\alpha)$ of error-free storage in the minimization of (2.13).
For randomly diluted networks they demonstrated analytically that this is indeed the case provided $\alpha < \alpha_c(\kappa = 0) = 2$, but that the critical storage capacity $\alpha_c = 2$ itself cannot be surpassed by this trick. For $\alpha < \alpha_c(\kappa = 0)$, the effect of choosing $\kappa > \kappa_{\max}(\alpha)$, with $b = 1$ in (2.14), is twofold. On the one hand, the patterns $\mathbf{S}^\mu$ themselves are no longer attractors, but converge under the dynamics (2.2) towards nearby fixed points. On the other hand, the basins of attraction of these fixed points grow steadily as $\kappa$ exceeds $\kappa_{\max}(\alpha)$ and rather soon reach the "full basin scenario", i.e., every input pattern $\mathbf{S} = \mathbf{S}(0)$ with a finite initial overlap $m^\mu(0) > 0$ converges towards the same attractor as $\mathbf{S}^\mu$. We remark that these conclusions in [6] are based on an RS ansatz, which is not rigorously valid [9,10] for the potential (2.14) with $\kappa > \kappa_{\max}(\alpha)$ and $0 \le b \le 1$. The CRSB scheme of our present work may be needed for an exact treatment, though the quantitative corrections are not expected to be large.
G. Györgyi / Physics Reports 342 (2001) 263–392
2.7. Multi-layer perceptrons
As far as practical applications to real problems are concerned, multi-layer perceptrons are the most important networks tractable within a statistical mechanical approach. They have great computational abilities and at the same time are not prohibitively complicated, owing to the absence of feedback effects. Still, the very property that these architectures are able to implement non-trivial tasks of practical interest makes their theoretical analysis difficult. Qualitatively, the flexibility of multi-layer perceptrons is due to the fact that the individual McCulloch–Pitts units within each layer can share the effort to produce the correct output. On the one hand, this "division of labor" gives rise to intricate anti-correlations between their activities [187,196–199]. On the other hand, for training sets that are not too small, it brings along a spontaneous breaking of their permutation symmetry, possibly accompanied by a spontaneous breaking of replica symmetry [185]. Note that the permutation symmetry of units in a layer is understood in the average sense; for a given training set of patterns there is generically no such symmetry. We will not present here a systematic discussion of the ongoing research on these topics, but rather highlight two particular aspects of specific interest from the viewpoint of the simple perceptron analysis in our present paper: one is the capacity of multi-layer networks, the other is the possibility of mimicking multi-layer structures with a single unit that has a non-monotonic transfer function. For a more detailed overview, especially regarding learning algorithms and generalization properties, we refer to [19,200], and for the present state of the art to [198,201–204] and references therein.
The storage capacity of multi-layer perceptrons has been analyzed within a statistical mechanical approach for the first time in [187,196,205] for spherical and in [186,187] for Ising perceptrons, addressing the simplest case with one adaptive input layer (first layer) and one hidden layer (second layer). The latter is governed by a pre-wired Boolean function, mostly either the so-called committee or the parity machine. For an Ising parity machine with non-overlapping receptive fields, i.e., a tree architecture, 1-RSB seems to be exact [186]. For fully connected machines with spherical synaptic weights [187,196,205,206], the assumption of RS cannot be upheld, since it yields results incompatible with the rigorous bound of Mitchison and Durbin [153] (see also [187,200]), which is based on a generalization of Cover's line of reasoning [150]. While an improved 1-RSB ansatz respects these limits, it remains unclear whether additional steps of RSB are necessary in order to draw reliable quantitative conclusions [187,206]. A first alternative approach [185] suggests breaking the permutation symmetry of the hidden units explicitly prior to the actual replica calculation, but the resulting equations are approximations and difficult to solve for large networks. A second alternative is the cavity approach, elaborated at a level equivalent to an RS ansatz in [163,165]. A particularly promising new route seems to be the multifractal analysis of the space of synaptic couplings by Monasson and co-workers [126,167–169]. One of the most remarkable findings of these and subsequent works [207–209] is that an RS ansatz in this approach yields results very close, but not identical, to those of a 1-RSB ansatz in the standard treatment along the lines of Gardner and Derrida [186,187,196,205,206].
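To make the two pre-wired second layers concrete, the following minimal Python sketch (our own illustration; all function names are hypothetical) evaluates a tree-architecture machine with $K$ hidden units on non-overlapping receptive fields and combines them either as a committee (majority vote) or as a parity machine (product of the hidden outputs). The $1/\sqrt{N/K}$ normalization of each hidden field is our own choice, made by analogy with the $N^{-1/2}$ scaling used for the single neuron.

```python
import math
import random


def sign(v: float) -> int:
    # McCulloch-Pitts output; v == 0 has probability zero for Gaussian inputs
    return 1 if v > 0 else -1


def hidden_fields(J, S, K):
    """Fields of K hidden units with non-overlapping receptive fields (tree
    architecture): unit k only sees the k-th block of N/K inputs."""
    block = len(J) // K
    return [
        sum(J[i] * S[i] for i in range(k * block, (k + 1) * block)) / math.sqrt(block)
        for k in range(K)
    ]


def committee_output(J, S, K):
    # second layer pre-wired as a majority vote (use odd K to avoid ties)
    return sign(sum(sign(h) for h in hidden_fields(J, S, K)))


def parity_output(J, S, K):
    # second layer pre-wired as the product of the hidden-unit outputs
    out = 1
    for h in hidden_fields(J, S, K):
        out *= sign(h)
    return out


rng = random.Random(1)
N, K = 30, 3
J = [rng.gauss(0.0, 1.0) for _ in range(N)]
S = [rng.gauss(0.0, 1.0) for _ in range(N)]
print(committee_output(J, S, K), parity_output(J, S, K))
```

Flipping the inputs of a single branch always reverses the parity output, whereas the committee output changes only if that flip alters the majority of the hidden units.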
For our present study the salient observation [198] is that the increased power of multi-layer networks in comparison with the simple perceptron stems from the possibility that the single McCulloch–Pitts units may all operate in the region beyond their individual storage capacity while the network as a whole is still below its maximal storage capacity, the reason being that, via the division of labor, the errors of one unit may be rectified by another one, or made up for collectively. Specifically, results for a simple perceptron beyond its storage capacity have been utilized for the exploration of multi-layer networks in [202,210]. As a generalization of the simple perceptron with input–output relation (2.1), the following setup was introduced in [211]:
$$ \sigma = V_\gamma(y)\,, \qquad y = |\mathbf{J}|^{-1} \sum_{i=1}^{N} J_i S_i\,, \qquad (2.19) $$
$$ V_\gamma(y) = \operatorname{sign}\big([y-\gamma]\, y\, [y+\gamma]\big)\,. \qquad (2.20) $$
As in (2.12), the scaling by $|\mathbf{J}|^{-1}$ guarantees a sensible thermodynamical limit $N \to \infty$ and, unless indicated otherwise, we will focus on the case of a spherical constraint $\mathbf{J}^2 = N$. The potential (2.20) outputs $+1$ if $y > \gamma$ or $-\gamma < y < 0$, and $-1$ otherwise ($V_\gamma(-y) = -V_\gamma(y)$); hence the name "reversed-wedge perceptron", coined in [212], for the input–output relation (2.19). Without loss of generality, one can focus on non-negative parameters $\gamma$, reproducing the simple perceptron (2.1) for $\gamma = 0$, and its equivalent reversed counterpart for $\gamma \to \infty$. In [211,213,214] the reversed-wedge perceptron (2.19) was studied as a generalization of the simple perceptron, the main result being an increased storage capacity. As revealed in [215], the assumption of RS on which those first works are based ceases to fulfill the AT stability condition before the limit of capacity is reached, and an improved 1-RSB calculation modifies the storage capacity by more than a factor of 2 for $\gamma$-values of the order of one. It has been conjectured [126] that a consistent treatment of the problem is only possible by means of the general Parisi RSB framework. The storage problem in (2.19) is equivalent to the minimization of the cost function (2.13) with the potential from (2.20), so the problem becomes a special case of the theory presented later in this paper. In [215] it was observed that by rewriting (2.19), (2.20) as
$$ \sigma = \prod_{j=1}^{3} \operatorname{sign}(y - \theta_j)\,, \qquad \theta_j = (j-2)\,\gamma\,, \qquad (2.21) $$
the reversed-wedge perceptron may also be looked upon as a toy model of a multi-layer perceptron. To see this, we first note that each factor of the form $\operatorname{sign}(y - \theta)$ in (2.21) is a generalization of the simple perceptron (2.1) with a "firing threshold" $\theta$ as a new feature. Such a threshold has a well-founded biological basis but has been omitted from many a theoretical study [19]. A systematic replica analysis of perceptrons with a threshold has been undertaken in [12].
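The identity behind (2.21) is easy to check numerically. The following minimal Python sketch (function names are hypothetical) compares the single-unit transfer function (2.20) with the product of three thresholded McCulloch–Pitts outputs that share the same field $y$ and use the thresholds $\theta_j = (j-2)\gamma$.

```python
import random


def sign(v: float) -> int:
    return 1 if v > 0 else -1


def reversed_wedge(y: float, gamma: float) -> int:
    # Eq. (2.20): V_gamma(y) = sign([y - gamma] * y * [y + gamma])
    return sign((y - gamma) * y * (y + gamma))


def three_unit_parity(y: float, gamma: float) -> int:
    # Eq. (2.21): three units with thresholds theta_j = (j - 2) * gamma, combined by parity
    out = 1
    for j in (1, 2, 3):
        out *= sign(y - (j - 2) * gamma)
    return out


rng = random.Random(0)
samples = [rng.gauss(0.0, 2.0) for _ in range(10_000)]
for gamma in (0.0, 0.5, 1.0, 2.0):
    assert all(reversed_wedge(y, gamma) == three_unit_parity(y, gamma) for y in samples)
print("V_gamma(y) agrees with the three-unit parity machine on all samples")
```

The check simply restates that the sign of a product equals the product of the signs; its value lies in making the two-layer reading of (2.21) explicit.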
Returning to (2.21), we see that this input–output relation represents a special kind of two-layer perceptron with three McCulloch–Pitts units in the first layer, endowed with different thresholds but identical synaptic weights $\mathbf{J}$, and a non-adaptive second layer, pre-wired as a so-called parity machine. Besides the two-layer architecture suggested by the toy model and the occurrence of RSB before the maximal storage capacity is reached, several further features of the reversed-wedge perceptron (2.21) have been found to agree qualitatively with characteristic properties of real multi-layer networks [127,215,216]. As shown in [126,215], the version space is partitioned into an exponentially large number of disconnected components for any positive load parameter $\alpha$. Nevertheless, up to a certain finite $\alpha$-value, the RS solution appears to be correct [126]. This observation invalidated the hitherto widespread belief that unbroken RS signals a connected (possibly convex) version space and that RSB is tantamount to the breaking of ergodicity. The quite subtle point is that, in the thermodynamical limit $N \to \infty$, each of the exponentially many disconnected components of the version space has an infinitesimal contribution to the full volume, which validates the RS result, while beyond a certain $\alpha$ they become few in number, each making a significant relative contribution to the volume of the version space, which causes RSB. In all previously mentioned cases, RSB was intimately connected with frustration and a rugged energy landscape with nearly degenerate minima. In the case of the reversed-wedge perceptron below its maximal storage capacity, however, there is no frustration: all constraints in the form of the input–output relations (2.19) are satisfied for vectors $\mathbf{J}$ belonging to the version space. The local minima may now be identified with the exponentially many disconnected domains of the version space, each having exactly the same energy and being completely flat with its bottom at zero energy (muffin-tin shape). Instead of a spontaneous breaking of ergodicity, one may now rather speak of an induced one, which can be attributed to the non-monotonicity of the potential and may be the reason why RS remains applicable for smaller $\alpha$'s. It may come as a surprise that, for sufficiently large $\alpha$ values that are still below capacity, Parisi's scheme, including the ultrametric organization of the ergodic components, apparently remains applicable. A very similar situation arises for potentials in (2.14) with negative $\kappa$-values [5,200] and for certain unsupervised learning scenarios [217,218] involving potentials of the form
$$ V(y) = \theta(\kappa - |y|)\,, \qquad (2.22) $$
in the regime below the respective critical capacity value of the load parameter $\alpha$. Beyond the critical $\alpha$-value, frustration sets in in all cases, and we are back at Parisi's RSB scenario. Various generalizations of the reversed-wedge perceptron have been explored, two of which we find particularly interesting. In [219] the case of more than three discontinuities in (2.20) has been considered. As the number of discontinuities increases, the maximal storage capacity is found to increase, and the consideration of RSB effects becomes more and more important for quantitatively reliable results. The reversed-wedge Ising perceptron with $\gamma = \sqrt{2\ln 2}$ in (2.20) was demonstrated in [128] to saturate the information theoretical upper bound for the maximal storage capacity, a fact which has found its natural physical explanation by means of a multifractal analysis of the version space in [220]. As in the high temperature regime of the SK model [129], the concomitant vanishing of the Edwards–Anderson order parameter has the consequence that the annealed approximation coincides with the RS solution [128].
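The value $\gamma = \sqrt{2\ln 2}$ has a simple characterization at the level of a single Gaussian field, which we add purely as an elementary illustration. It is not the argument of [128] or [220], and its connection to the vanishing Edwards–Anderson parameter is only suggestive. For $y$ drawn from a standard normal distribution, $E[y\,V_\gamma(y)] = 2[2\varphi(\gamma) - \varphi(0)]$, with $\varphi$ the standard Gaussian density, and this correlation vanishes exactly when $e^{-\gamma^2/2} = 1/2$. The Python sketch below (function names are hypothetical) evaluates the closed form and cross-checks it with a midpoint-rule integral.

```python
import math


def phi(y: float) -> float:
    # standard Gaussian density
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def field_output_correlation(gamma: float) -> float:
    """E[y V_gamma(y)] for a standard Gaussian field y and the reversed-wedge
    transfer function (2.20); closed form 2 * (2 * phi(gamma) - phi(0))."""
    return 2.0 * (2.0 * phi(gamma) - phi(0.0))


def field_output_correlation_numeric(gamma: float, y_max: float = 10.0, steps: int = 200_000) -> float:
    # midpoint rule for 2 * int_0^y_max y V_gamma(y) phi(y) dy (the integrand is even in y)
    h = y_max / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        v = 1.0 if y > gamma else -1.0   # V_gamma(y) for y > 0
        total += y * v * phi(y)
    return 2.0 * total * h


gamma_star = math.sqrt(2.0 * math.log(2.0))
print(gamma_star, field_output_correlation(gamma_star))
```

For $\gamma = 0$ the correlation is $\sqrt{2/\pi}$, the value of the simple perceptron. It changes sign at $\gamma = \sqrt{2\ln 2}$ and tends to $-\sqrt{2/\pi}$ for $\gamma \to \infty$, the reversed perceptron.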
3. Statistical mechanics of pattern storage
3.1. The model
We now set the stage for the detailed study of the single neuron by introducing the model and reviewing basic statistical mechanical notions. From here on, the presentation is meant to be self-contained, with the exception of Sections 5.3 and 6.3; some overlaps with Section 2 are a consequence.
We consider the McCulloch–Pitts model neuron [19],
$$ \sigma = \operatorname{sign}(h)\,, \qquad (3.1a) $$
$$ h = N^{-1/2} \sum_{i=1}^{N} J_i S_i\,, \qquad (3.1b) $$
where $\mathbf{J}$ is the vector of synaptic couplings, $\mathbf{S}$ the input and $\sigma$ the response. The normalization was chosen so that $h$ is typically of $O(1)$ when $N \to \infty$. Patterns to be stored are prescribed as pairs
$$ \{\mathbf{S}^\mu, \sigma^\mu\}_{\mu=1}^{M} \qquad (3.2) $$
such that the neuron is required to generate $\sigma^\mu$ in response to $\mathbf{S}^\mu$. Given the ensemble of patterns, the local stability parameter
$$ \Delta^\mu = h^\mu \sigma^\mu \qquad (3.3) $$
obeys some distribution $\rho(\Delta)$ [6]. The $\mu$th pattern is stored by the neuron if the actual response signal from Eq. (3.1) equals the desired output $\sigma^\mu$, i.e., if $\Delta^\mu > 0$. The number of patterns $M$ is generically of order $N$, so
$$ \alpha = M/N \qquad (3.4) $$
is an intensive parameter. For the sake of simplicity, we generate the components $S_i^\mu$ independently from a normal distribution, and take $\sigma^\mu = \pm 1$ with equal probability. The corresponding probability density will be denoted by $P(\{\mathbf{S}^\mu, \sigma^\mu\})$. Since the output $\sigma$ in (3.1) is invariant if $\mathbf{J}$ is multiplied by a positive factor, it is useful to eliminate this degree of freedom by the spherical constraint $|\mathbf{J}| = \sqrt{N}$. In general, a prior distribution $w(\mathbf{J})$, not necessarily normalized, expresses our initial knowledge about the synapses. The spherical constraint, which we choose to normalize, corresponds to
$$ w(\mathbf{J}) = C_N\, \delta(N - |\mathbf{J}|^2)\,, \qquad (3.5a) $$
$$ C_N = N\, \Gamma(N/2)\, (N\pi)^{-N/2}\,. \qquad (3.5b) $$
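As a quick sanity check of the normalization (3.5b), the sketch below (our own construction, with hypothetical names) replaces $\delta(N - |\mathbf{J}|^2)$ by a box of small width $\epsilon$ in $|\mathbf{J}|^2$. It then uses the volume of the $N$-ball, $\pi^{N/2} r^N / \Gamma(N/2 + 1)$, to evaluate $\int d^N J\, w(\mathbf{J})$, which should approach 1. Logarithms are used throughout so that large $N$ does not overflow.

```python
import math


def log_C(N: int) -> float:
    # ln C_N with C_N = N * Gamma(N/2) * (N * pi)^(-N/2), Eq. (3.5b)
    return math.log(N) + math.lgamma(N / 2) - (N / 2) * math.log(N * math.pi)


def log_ball_volume(N: int, r2: float) -> float:
    # ln of the volume of the N-ball with squared radius r2: pi^(N/2) r2^(N/2) / Gamma(N/2 + 1)
    return (N / 2) * math.log(math.pi) + (N / 2) * math.log(r2) - math.lgamma(N / 2 + 1)


def spherical_prior_mass(N: int, eps: float = 1e-3) -> float:
    """Approximate int d^N J C_N delta(N - |J|^2), with the delta function replaced
    by a box of width eps in |J|^2, i.e. a thin shell around |J| = sqrt(N)."""
    lo = log_ball_volume(N, N - eps / 2)
    # ln Vol(N + eps/2) - ln Vol(N - eps/2), computed without cancellation
    dlog = (N / 2) * (math.log1p(eps / (2 * N)) - math.log1p(-eps / (2 * N)))
    shell = math.exp(log_C(N) + lo) * math.expm1(dlog)
    return shell / eps


for N in (2, 3, 10, 100, 1000):
    print(N, spherical_prior_mass(N))
```

The residual deviation from 1 is of order $\epsilon^2$ and comes from the finite shell width, not from $C_N$.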
Another generic type of prior distribution prescribes independent, identical constraints for the synapses as
$$ w(\mathbf{J}) = \prod_{i=1}^{N} w_1(J_i)\,; \qquad (3.6) $$
e.g., binary, or Ising, synapses have
$$ w_1(J) = \delta(J-1) + \delta(J+1)\,. \qquad (3.7) $$
This prior distribution is not normalized. Its scale is conveniently set by requiring that $J_i^2$ averages to unity, that is $\int w_1(J)\, J^2\, dJ = \int w_1(J)\, dJ$, whence $N^{-1}\sum_{i=1}^{N} J_i^2$ goes to 1 for large $N$. "Soft spins" are generated by smooth, multiple-peaked $w_1(J)$. Our main goal is to find those $\mathbf{J}$'s that store the prescribed patterns, i.e., are compatible with the patterns. The problem can be reformulated as an optimization task with a suitable cost function, i.e., Hamiltonian, to be minimized. A convenient choice here is the sum of errors committed on the patterns,
$$ H = \sum_{\mu=1}^{M} V(\Delta^\mu)\,, \qquad (3.8) $$
where the potential $V(\Delta^\mu)$ measures the error on the $\mu$th pattern $\{\mathbf{S}^\mu, \sigma^\mu\}$. A natural error measure $V(y)$ is zero for arguments larger than a given threshold $\kappa$ and monotonically decreasing elsewhere [6]. Storage as defined above corresponds to $\kappa = 0$, while $\kappa > 0$ means a stricter requirement on the local stability $\Delta$ and ensures a finite basin of attraction for a memorized pattern during retrieval. Through gradient descent, the Hamiltonian (3.8) defines a dynamics in the space of couplings $\mathbf{J}$. Specifically,
$$ V(y) = (\kappa - y)^b\, \theta(\kappa - y) \qquad (3.9) $$
corresponds to the perceptron and adatron rules for $b = 1, 2$, respectively, where $\theta(y)$ is the Heaviside function (see [6] and references therein). There is no such dynamics in the case $b = 0$, but because of its prominent static meaning (the Hamiltonian then counts the incorrectly stored patterns) we will consider that case in concrete calculations. Furthermore, the $\theta(y)$ function can be approximated by a smooth one that does have an associated dynamics. Thus our present study of thermal equilibrium with the Hamiltonian involving (3.9) with $b = 0$ can be thought of as describing the average asymptotics of such a dynamics. A non-gradient-descent algorithm, designed to minimize the Hamiltonian with $b = 0$, will be discussed in Section 7.2.5.
3.2. Thermodynamics
The Hamiltonian (3.8) gives rise to a statistical mechanical system [4,221] resembling models of spin glasses with infinite-range interactions [14,84]. A microstate is a specific setting of the synaptic weight vector $\mathbf{J}$, quenched disorder is due to the randomly generated patterns, and a positive temperature $T = \beta^{-1}$ has the effect of introducing tolerance to storage errors. (We use the convention of setting Boltzmann's constant to unity.) The partition function assumes the form [4,221]
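$$ Z = \int d^N J\, w(\mathbf{J}) \exp\Big[-\beta \sum_{\mu=1}^{M} V(\Delta^\mu)\Big]\,. \qquad (3.10) $$
Integration is over the entire real axis unless denoted otherwise. For large $N$ we expect self-averaging [14,84], i.e., for a given instance of the quenched disorder the extensive thermodynamical quantities are assumed to approach their quenched average. This leads us to the thermal statics of the system, where the question of breaking of ergodicity on some time scales is not dealt with. The replica method [14,84] starts with our writing the mean free energy per coupling as
$$ f = -\lim_{N\to\infty} \frac{\langle\langle \ln Z \rangle\rangle}{N\beta} = \lim_{N\to\infty} \lim_{n\to 0} \frac{1 - \langle\langle Z^n \rangle\rangle}{nN\beta}\,, \qquad (3.11) $$
where
$$ \langle\langle \ldots \rangle\rangle \equiv \int P(\{\mathbf{S}^\mu, \sigma^\mu\}) \ldots \prod_{\mu=1}^{M} d\sigma^\mu\, d^N S^\mu \qquad (3.12) $$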
stands for the quenched average over patterns. In order to carry on with the calculations, it is common practice [19,14,84] to interchange the limits $n \to 0$ and $N \to \infty$. In what follows we accept this reversal of limits, based on numerous examples wherein the consequent results were verified by other analytic methods or by numerical simulations; see, e.g., Refs. [19,14]. Introducing the thermal average as
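$$ \langle \ldots \rangle \equiv Z^{-1} \int \ldots \exp\Big[-\beta \sum_{\mu=1}^{M} V(\Delta^\mu)\Big]\, w(\mathbf{J})\, d^N J\,, \qquad (3.13) $$
one naturally obtains the mean error per pattern as
$$ \varepsilon = \langle\langle \langle V(\Delta^1) \rangle \rangle\rangle\,. \qquad (3.14) $$
From the free energy this follows as
$$ \varepsilon = \frac{1}{\alpha}\, \frac{\partial (\beta f)}{\partial \beta}\,, \qquad (3.15) $$
so it is the analog of the thermodynamical energy. The mean entropy per synapse, or specific entropy,
$$ s = \beta(\alpha\varepsilon - f) \qquad (3.16) $$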
is a measure of the volume in coupling space associated with a given mean error. A case of special significance is when the mean error at $T = 0$ vanishes, i.e., storage is perfect. Then the partition function becomes the weighted (possibly non-normalized) volume of the subspace of couplings that reproduce the stored patterns, $\Omega = Z|_{T=0}$. The zero temperature entropy measures this volume as $s = N^{-1}\ln\Omega$.
3.3. Spherical and independently distributed synapses
Before proceeding, we warn the reader that quantities of different types (variables, functions, functionals) may be denoted by the same symbol, the difference possibly shown by the type of the argument. An example is the free energy and the replica free energy, as shown below. Such practice will be limited to cases where there is little chance for confusion. In the case of the spherical constraint (3.5), the free energy per synapse was first proposed in [4,221] for the special error measure $V(y) = \theta(\kappa - y)$. The replica symmetric free energy was given in [6] for a general error measure $V(y)$. For a general $V(y)$, without the assumption of replica symmetry, the free energy reads
$$ f = \lim_{n\to 0} \frac{1}{n} \min_{Q} f(Q)\,, \qquad (3.17a) $$
$$ f(Q) = f_s(Q) + \alpha f_e(Q)\,, \qquad (3.17b) $$
$$ f_s(Q) = -(2\beta)^{-1} \ln\det Q\,, \qquad (3.17c) $$
$$ f_e(Q) = -\frac{1}{\beta} \ln \int \frac{d^n x\, d^n y}{(2\pi)^n} \exp\Big[-\beta \sum_{a=1}^{n} V(y_a) + i\, x\cdot y - \tfrac{1}{2}\, xQx\Big]\,; \qquad (3.17d) $$
see Appendix B for the derivation. Note that the minimum condition should be understood as the extremum in $Q$ together with non-negativity of the eigenvalues of the Hessian with respect to the matrix elements of $Q$. Marginal linear stability is allowed and, as we shall see later, will occur in some phases. Once the $Q$ matrix is appropriately parametrized, the extremum becomes a maximum in terms of those parameters for $n < 1$; the minimum condition above is thus meant before such a parametrization is applied. These considerations deal with the consequences of our having interchanged the limits $N \to \infty$ and $n \to 0$ [14]. The entropic term $f_s$, for which we used the concise form from [222], is specific to the spherical model. On the other hand, the energy term $f_e$, first displayed in [7], is independent of the prior constraint for the synapses. The $n \times n$ matrix $Q$ has been introduced through the constraint [4,221]
$$ [Q]_{ab} = q_{ab} = \frac{1}{N} \sum_{i=1}^{N} J_{ai} J_{bi}\,, \qquad (3.18) $$
i.e., $Q$ is the matrix of the overlaps of the synaptic couplings; it is symmetric and positive semidefinite, with uniform diagonal elements $q_{aa} \equiv q_D = 1$ and $-1 \le q_{ab} \le 1$. Here the indices $a, b = 1, \ldots, n$ are so-called replica indices; a quantity with label $a$ belongs to the $a$th factor in the power $Z^n$ of the partition function $Z$. Any quantity carrying a replica index is intimately related to the replica method, and its observability needs to be clarified separately. Only the off-diagonal elements $q_{ab}$ with $a \neq b$ entail minimum conditions. Let us introduce the mean of some function $A(x, y)$ as
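$$ \langle A(x,y) \rangle_e \equiv e^{n\beta f_e(Q)} \int \frac{d^n x\, d^n y}{(2\pi)^n}\, A(x,y) \exp\Big[-\beta \sum_{a=1}^{n} V(y_a) + i\, x\cdot y - \tfrac{1}{2}\, xQx\Big]\,, \qquad (3.19) $$
where the prefactor ensures $\langle 1 \rangle_e = 1$, and the subscript refers to the fact that the expectation value is associated with the energy term. Then, using
$$ \frac{\partial \ln\det Q}{\partial q_{ab}} = [Q^{-1}]_{ab} \equiv q^{-1}_{ab}\,, \qquad (3.20) $$
we obtain
$$ q^{-1}_{ab} = \alpha \langle x_a x_b \rangle_e\,, \qquad a \neq b\,, \qquad (3.21) $$
as the extremum condition in terms of $q_{ab}$. If the prior distribution is like (3.6), then
$$ f = \lim_{n\to 0} \frac{1}{n} \min_{Q} \mathop{\mathrm{extr}}_{\hat Q} f(Q, \hat Q)\,, \qquad (3.22a) $$
$$ f(Q, \hat Q) = f_i(Q, \hat Q) + \hat f_s(\hat Q) + \alpha f_e(Q)\,, \qquad (3.22b) $$
$$ f_i(Q, \hat Q) = \frac{\beta}{2} \operatorname{Tr} Q\hat Q\,, \qquad (3.22c) $$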
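$$ \hat f_s(\hat Q) = -\frac{1}{\beta} \ln \int \exp\Big(\frac{\beta^2}{2} \sum_{a,b=1}^{n} J_a \hat q_{ab} J_b\Big) \prod_{a=1}^{n} w_1(J_a)\, dJ_a\,, \qquad (3.22d) $$
and $f_e(Q)$ is given by (3.17d). The special case of Ising synapses (3.7) gives the free energy in [5,175]. Besides $Q$ there is now another symmetric auxiliary matrix, $\hat Q$, whose diagonal elements are $\hat q_{aa} \equiv \hat q_D = 0$.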
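The thermodynamic relations (3.15) and (3.16) do not rely on the replica machinery: they hold sample by sample for any prior, including the Ising one (3.7). A minimal Python sketch checks them by finite differences in $\beta$. It rests on our own assumptions: a single random sample with $N = 10$ and $M = 5$, the $b = 0$ potential with $\kappa = 0$, and exhaustive enumeration of all $2^N$ couplings; all names are hypothetical. With the unnormalized Ising prior the specific entropy is the Gibbs entropy per synapse and must lie between 0 and $\ln 2$.

```python
import itertools
import math
import random


def error_counts(patterns, N, kappa=0.0):
    """H(J) of Eq. (3.8) with the b = 0 potential theta(kappa - Delta) of Eq. (3.9),
    for every Ising coupling vector J in {-1, +1}^N (exhaustive enumeration)."""
    counts = []
    for J in itertools.product((-1, 1), repeat=N):
        H = 0
        for S, sigma in patterns:
            delta = sigma * sum(j * s for j, s in zip(J, S)) / math.sqrt(N)  # Eqs. (3.1b), (3.3)
            if delta < kappa:
                H += 1
        counts.append(H)
    return counts


def log_Z(counts, beta):
    # Eq. (3.10) with the unnormalized Ising prior (3.7); log-sum-exp for stability
    e_min = min(counts)
    return -beta * e_min + math.log(sum(math.exp(-beta * (E - e_min)) for E in counts))


def mean_error(counts, beta, M):
    # thermal average <H> / M for one fixed sample of patterns
    e_min = min(counts)
    weights = [math.exp(-beta * (E - e_min)) for E in counts]
    return sum(w * E for w, E in zip(weights, counts)) / (sum(weights) * M)


rng = random.Random(0)
N, M = 10, 5
alpha = M / N
patterns = [([rng.gauss(0.0, 1.0) for _ in range(N)], rng.choice((-1, 1))) for _ in range(M)]
counts = error_counts(patterns, N)
print(mean_error(counts, 2.0, M))
```

The quenched average in (3.14) would additionally average over many such samples; the identities themselves need no such average.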
The derivation of the above free energy is given in Appendix B. We emphasize that the type of extremum in $\hat Q$ is not restricted to a minimum; see the argument below Eq. (B.4). The interaction term $f_i$ together with the entropic term, here $\hat f_s$, corresponds at the extremum to the entropic term (3.17c) in the spherical case. If we introduce the mean associated with the entropic term here as
$$ \langle A(J) \rangle_s \equiv e^{n\beta \hat f_s(\hat Q)} \int A(J) \exp\Big(\frac{\beta^2}{2} \sum_{a,b=1}^{n} J_a \hat q_{ab} J_b\Big) \prod_{a=1}^{n} w_1(J_a)\, dJ_a\,, \qquad (3.23) $$
then the stationarity condition with respect to $\hat q_{ab}$ reads
$$ q_{ab} = \langle J_a J_b \rangle_s\,, \qquad (3.24) $$
and that with respect to $q_{ab}$ gives
$$ \beta^2 \hat q_{ab} = -\alpha \langle x_a x_b \rangle_e\,. \qquad (3.25) $$
Since the diagonals are not varied, the above equations should hold only for $a \neq b$. Note that in the limit $n \to 0$ the normalization coefficients in (3.19) and (3.23) each become 1, so for most purposes those formulae can be understood as if the coefficients were absent.
3.4. Neural stabilities, errors, and overlaps
The probability distribution of the neural stability parameter $\Delta$ associated with stored patterns is given as [6,7]
$$ \rho(\Delta) = \langle\langle \langle \delta(\Delta - h^1\sigma^1) \rangle \rangle\rangle\,, \qquad (3.26) $$
where the formula for $h^\mu$ in (3.1) is understood. Due to the permutation symmetry among patterns, there is no loss of generality in our selecting the first pattern in the definition above. This definition is obviously independent of the replica method; the latter, however, can be used to calculate the distribution of stabilities. Replacing $Z^{-1}$ by $Z^{n-1}$ in the thermal average in (3.26), keeping in mind that the $n \to 0$ limit should be taken in the end, we recognize that (3.26) is technically a slightly modified version of the partition function integral. The calculation is analogous to the derivation of (3.17d), shown in Appendix B, and we end up with
$$ \rho(\Delta) = \langle \delta(\Delta - y_1) \rangle_e \big|_{n\to 0}\,, \qquad (3.27) $$
where any replica index other than 1 could equally be chosen. Thus the average of an arbitrary function $U(y_a)$, with $a$ arbitrary but fixed, can be written in the form
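$$ \langle U(y_a) \rangle_e \big|_{n\to 0} = \int dy\, \rho(y)\, U(y)\,, \qquad (3.28) $$
where $\Delta$ as integration variable was replaced by $y$. An instructive formula for the mean error is obtained in terms of the distribution of neural stabilities as
$$ \varepsilon = \langle V(y_a) \rangle_e \big|_{n\to 0} = \int dy\, \rho(y)\, V(y)\,. \qquad (3.29) $$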
The first equality comes about from the definition (3.15), the energy term (3.17), and the notation (3.19), while the second follows from (3.28). In the case of replica symmetry this expression goes over to the mean error displayed in [6]. In fact, (3.28) allows us to use error measures that are not related to the thermodynamic energy: one can take a $V(y)$ in the Hamiltonian and define the observable error by another measure $U(y)$. This was done, without assuming replica symmetry, with $U(y) = \theta(\kappa - y)$ in [7]. Eq. (3.29) holds for both constraints (3.5) and (3.6). In the second case this is due to the fact that $\beta$ can be absorbed into $\hat q_{ab}$, whereby $\beta$ times (3.22c) and $\beta$ times (3.22d) both become $\beta$-independent. Thus only $f_e(Q)$ enters the thermodynamical formula (3.15) for the mean error, yielding (3.29). The overlaps $q_{ab}$ emerged from the replica theory as auxiliary variables, with no prescription of how to measure them. In analogy to spin glasses, where the Edwards–Anderson overlap of spins $q_{\mathrm{EA}}$ has been defined independently of replicas [59], one can introduce an Edwards–Anderson overlap of synapses
$$ q_{\mathrm{EA}} = \Big\langle\Big\langle \frac{1}{N} \sum_{i=1}^{N} \langle J_i \rangle^2 \Big\rangle\Big\rangle\,. \qquad (3.30) $$
If the summation produces a self-averaging quantity, then the quenched average can be omitted from the definition, but eventually the same formula holds. Replacing the $Z^{-1}$ that appears in the thermal averages by $Z^{n-1}$, we can again apply the replica method. Attaching the replica indices $a = 1$ and $a = 2$ to the synaptic couplings in (3.30), and carrying out a calculation analogous to that for the local stability distribution, we have
$$ q_{\mathrm{EA}} = q_{12}\,. \qquad (3.31) $$
This is valid for both the spherical and the independently distributed synapses. The ambiguity in this expression is obvious: there was some arbitrariness in assigning the 1st and 2nd replica indices to the synaptic couplings in (3.30) and in labeling the other $n - 2$ replicas starting from $a = 3$. In the terminology of the replica theory of spin glasses, the result (3.31) is in fact the overlap within one pure thermodynamical state [14]. A detailed study of the probability distributions of overlaps in multiple thermodynamical states (see [30] on the SK model) for the neuron is beyond our present scope. Nevertheless, in the analysis of correlation functions the consequences of ultrametricity as described in Ref. [28] will be recovered. Actually, for special error measures complex structures can arise in the neuron even without spontaneous replica symmetry breaking [215,126]. In summary, based on the manifold analogies with spin glasses that we expound later in the paper, we expect that several aspects of the physical interpretation of the replica theory for spin glasses carry over to the storage problem of the neuron, even when replica symmetry is spontaneously broken in the latter. A moral of these results is that the replica method enables us to calculate not only extensive quantities, like the free energy, but also local quantities. Technically, this is because replicas are useful in averaging not only the logarithm but also inverse powers of the partition function.
4. The Parisi solution
4.1. Finite replica symmetry breaking
4.1.1. Recursive evaluation of the free energy term
Below we resolve the "hard" terms in the free energies, namely, expressions (3.17d) and (3.22d). The derivation follows the spirit of Parisi's, as described concisely for the SK model in Ref. [20].
The added aim here is to present the Parisi solution in a comprehensive, self-contained manner. This approach will pay off later, since the calculation of expectation values will then follow straightforwardly. Our main concern here is
$$ u[U(y), Q] = \frac{1}{n} \ln \int \frac{d^n x\, d^n y}{(2\pi)^n} \exp\Big[\sum_{a=1}^{n} U(y_a) + i\, x\cdot y - \tfrac{1}{2}\, xQx\Big]\,. \qquad (4.1) $$
Whereas this formula would look simpler if the Fourier transform of $e^{U(y)}$ were used, we keep the above notation, because it is the function $U(y)$ that will appear explicitly in the final evaluation. Both Eqs. (3.17d) and (3.22d) are of this type. Eq. (3.17d) corresponds to
$$ -\beta f_e(Q) = n\, u[-\beta V(y), Q]\,, \qquad (4.2) $$
and (3.22d) is obtained by
$$ -\beta \hat f_s(\hat Q) = n\, u\Big[\ln \int dx\, w_1(x)\, e^{-\beta y x}, \hat Q\Big]\,. \qquad (4.3) $$
Note that there are no a priori bounds for $\hat q_{ab}$, and that the diagonal elements $\hat q_{aa}$ vanish. Furthermore, the function $w_1(x)$ is assumed to cut off sufficiently fast for (4.3) to exist. More generally, whenever an integral expression is displayed later in this paper without a discussion of its convergence, we assume that the conditions for its finiteness hold; this does not mean that the expression could not diverge under other conditions. We will call (4.1) the "free energy term"; it is ubiquitous also in spin glass models with long-range interactions. We have just seen that the free energy of the neuron with arbitrary, independently distributed synapses contains two additive terms of the type of Eq. (4.1). We will see later that the spherical entropic term is also of this type, so the free energy of the spherical neuron is likewise the sum of two terms of the type (4.1). Most of the mathematical parts of this paper are centered on the evaluation of expression (4.1). Owing to the absence of an inherent topology in infinite-range spin glass models, the replica approach led there to a single-site effective free energy. For such problems Parisi's ansatz turned out to be a very successful mathematical framework, and it is presently the stepping stone to the field theory of the spin glass transition (see references in Ref. [223]). Since the present neuron problem is a priori single-site, it is reasonable to search for the $Q$ minimizing (3.17a) by using Parisi's hierarchical assumption, which reads
$$ Q = \sum_{r=0}^{R+1} (q_r - q_{r-1})\, I_{n/m_r} \otimes U_{m_r}\,, \qquad (4.4) $$
where the subscript $k$ of a matrix indicates that it is $k \times k$, $I_k$ is the unit matrix and $U_k$ has all elements equal to unity. Furthermore,
$$ q_{-1} = 0\,, \qquad q_{R+1} = q_D\,, \qquad (4.5a) $$
$$ m_{R+1} = 1 \le m_R \le m_{R-1} \le \cdots \le m_1 \le m_0 = n\,, \qquad (4.5b) $$
where the integer $m_r$ is a divisor of $m_{r-1}$. In the case of the $Q$ of Section 3 there is a presumed ordering
$$ q_{-1} = 0 \le q_0 \le q_1 \le \cdots \le q_R \le q_{R+1} = 1\,. \qquad (4.6) $$
In principle, $q_r < 0$ is also possible, but such $q_r$'s did not appear in our numerical explorations of examples, so we shall consider the restriction to non-negative $q$'s part of the ansatz. For the $\hat Q$ of Section 3 the assumption is
$$ \hat q_{-1} = 0 \le \hat q_0 \le \hat q_1 \le \cdots \le \hat q_R\,, \qquad \hat q_{R+1} = 0\,. \qquad (4.7) $$
These represent the $R$-step replica symmetry breaking scheme (R-RSB). At this stage we do not prescribe the ordering of the $q_r$'s, and we allow uniform diagonals $q_{aa} \equiv q_D$ of any magnitude. The quadratic form in (4.1) is then
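$$ xQx = \sum_{r=0}^{R+1} (q_r - q_{r-1}) \sum_{j_r=1}^{n/m_r} \Big( \sum_{a=(j_r-1)m_r+1}^{j_r m_r} x_a \Big)^2\,. \qquad (4.8) $$
The $u[U(y), Q]$ of (4.1) should thus be replaced by $u[U(y), q, m]$, where the parameters in (4.4) are considered to be the elements of the vectors in the argument. By using the notation
$$ Dz = \frac{e^{-z^2/2}\, dz}{\sqrt{2\pi}} \qquad (4.9) $$
and the identity
$$ e^{-x^2/2} = \int Dz\, e^{-izx}\,, \qquad (4.10) $$
we obtain
$$ e^{n u[U(y), q, m]} = \int \frac{d^n x\, d^n y}{(2\pi)^n} \prod_{r=0}^{R+1} \prod_{j_r=1}^{n/m_r} Dz^{r}_{j_r}\, \exp\Big[-i \sum_{r=0}^{R+1} \sqrt{q_r - q_{r-1}} \sum_{j_r=1}^{n/m_r} z^{r}_{j_r} \sum_{a=(j_r-1)m_r+1}^{j_r m_r} x_a + i \sum_{a=1}^{n} x_a y_a + \sum_{a=1}^{n} U(y_a)\Big]\,. \qquad (4.11) $$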
Appendix C shows that the above expression equals Eq. (C.6). The limit $m_0 = n \to 0$ violates the ordering in (4.5b). In fact, experience in spin glasses [14,84] and in R-RSB ($R = 1, 2$) calculations in neural networks (see [7,11]) suggests that the $m_r$'s become less than 1 and that the ordering in (4.5b) is to be reversed. This can be understood by introducing
$$ x_r = \frac{n - m_r}{n - 1} \qquad (4.12) $$
for arbitrary $n$ and using the $x_r$'s instead of the $m_r$'s for the parametrization. The new parameter $x_r$ should not be confused with the integration variable $x_a$ in Eq. (4.1). For integer $n$ and $m_r$'s satisfying (4.5b) we have the ordering
$$ x_{R+1} = 1 \ge x_R \ge x_{R-1} \ge \cdots \ge x_1 \ge x_0 = 0\,. \qquad (4.13) $$
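The change of variables (4.12) is easy to tabulate exactly. The short Python sketch below uses rational arithmetic and a hypothetical function name. It shows both the ordering (4.13) for integer $n$ and the formal identity $x_r = m_r$ at $n = 0$.

```python
from fractions import Fraction


def parisi_x(m, n):
    """Eq. (4.12): x_r = (n - m_r) / (n - 1) for block sizes m = [m_0, ..., m_{R+1}]."""
    n = Fraction(n)
    return [(n - Fraction(mr)) / (n - 1) for mr in m]


# integer n = 8 with m_0 = n = 8 >= m_1 = 4 >= m_2 = 2 >= m_3 = 1, cf. (4.5b)
print(parisi_x([8, 4, 2, 1], 8))
# prints [Fraction(0, 1), Fraction(4, 7), Fraction(6, 7), Fraction(1, 1)]
```

The decreasing block sizes $m_r$ are thus mapped onto an increasing sequence $x_r$ running from $x_0 = 0$ to $x_{R+1} = 1$, which is the ordering that survives the limit $n \to 0$.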
Keeping the $x_r$'s fixed as $n \to 0$ defines the $n$-dependence of the $m_r$'s, and for $n = 0$ we formally get $x_r = m_r$. This explains the aforementioned practice of treating the $m_r$'s as real numbers in $[0, 1]$ with the ordering reversed with respect to (4.5b). For $n \to 0$, Eq. (C.6) becomes, in terms of the $x_r$'s,
$$ u[U(y), q, x] = \frac{1}{x_1} \int Dz_0 \ln \int Dz_1 \bigg( \int Dz_2 \cdots \bigg( \int Dz_{R+1} \exp U\Big( \sum_{r=0}^{R+1} z_r \sqrt{q_r - q_{r-1}} \Big) \bigg)^{x_R/x_{R+1}} \cdots \bigg)^{x_1/x_2}\,. \qquad (4.14) $$
This is the general formula for R-RSB. Expression (4.14) can be written in the form of an iteration for decreasing $r$'s as
$$ t_{r-1}(y) = \int Dz\, t_r^{\,x_r/x_{r+1}}\big(y + z\sqrt{q_r - q_{r-1}}\big)\,, \qquad (4.15a) $$
$$ t_R(y) = \int Dz\, e^{U(y + z\sqrt{q_{R+1} - q_R})}\,, \qquad (4.15b) $$
or, alternatively, we can set $x_{R+2} = 1$ and use the initial condition
$$ t_{R+1}(y) = e^{U(y)}\,. \qquad (4.16) $$
In the iterated function we have not marked the functional dependence on $U(y)$ and on $q, x$. If $q_r - q_{r-1} < 0$, then the square root is imaginary. Since the Gaussian measure of the integrations suppresses odd powers in a Taylor expansion of the integrand, the result, if the integrals exist, will be real. The case of a non-monotonic $q_r$ sequence will be briefly discussed at the end of this section. Finally we get
$$ u[U(y), q, x] = \frac{1}{x_1} \int Dz\, \ln t_0\big(z\sqrt{q_0}\big)\,. \qquad (4.17) $$
Note that an iteration like (4.15) can also be understood, before the $n \to 0$ limit is taken, directly on Eq. (C.6), where $m_r/m_{r+1}$ is an integer. Then formally $u[U(y), q, x] = n^{-1} \ln t_{-1}(0)$, and hence for $n \to 0$ we recover (4.17). It is, however, an advantage that we can first take $n \to 0$ and then define the recursion (4.15) with fractional powers. Indeed, while dealing with the consequences of the recursion, the replica limit $n \to 0$ is implied, and we do not have to return to the question of that limit again. It is instructive to introduce
$$ u_r(y) = \frac{\ln t_r(y)}{x_{r+1}}\,, \qquad (4.18) $$
which lends itself to the recursion
$$ u_{r-1}(y) = \frac{1}{x_r} \ln \int Dz\, e^{x_r u_r(y + z\sqrt{q_r - q_{r-1}})}\,, \qquad (4.19a) $$
$$ u_{R+1}(y) = U(y)\,, \qquad (4.19b) $$
G. Gyo( rgyi / Physics Reports 342 (2001) 263}392
300
and yielding
u[U(y), q, x]" " Dz u (z(q ) . L
(4.20)
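The recursion (4.19a), (4.19b) with the final average (4.20) lends itself to direct numerical evaluation. The Python sketch below is an illustration under stated assumptions, not the author's code: every Gaussian measure $Dz$ is replaced by Gauss–Hermite quadrature (NumPy's `hermegauss`), the nested integrals are evaluated by brute force, and so the cost grows like `n_nodes**(R+2)`, which makes it practical only for small $R$. The function names `gauss_measure` and `rsb_term` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gauss_measure(n_nodes):
    """Nodes z_k and weights w_k with sum_k w_k f(z_k) ~ int Dz f(z)."""
    z, w = hermegauss(n_nodes)          # quadrature for the weight exp(-z**2 / 2)
    return z, w / np.sqrt(2.0 * np.pi)  # normalize to the unit Gaussian measure Dz


def rsb_term(U, q, x, n_nodes=40):
    """u[U(y), q, x] for R-step RSB via Eqs. (4.19a), (4.19b) and (4.20).

    q = [q_0, ..., q_R] non-decreasing in [0, 1]; x = [x_1, ..., x_R] in (0, 1].
    q_{R+1} = 1 and x_{R+1} = 1 are appended, cf. (4.13) and (4.16).
    """
    R = len(q) - 1
    if len(x) != R:
        raise ValueError("need len(x) == len(q) - 1")
    qs = list(q) + [1.0]           # qs[r] = q_r,  r = 0..R+1
    xs = [None] + list(x) + [1.0]  # xs[r] = x_r,  r = 1..R+1
    z, w = gauss_measure(n_nodes)

    def u_level(r, y):
        """u_r evaluated on an array y of local fields."""
        if r == R + 1:
            return U(y)                                  # Eq. (4.19b)
        step = np.sqrt(qs[r + 1] - qs[r])
        xr = xs[r + 1]
        a = xr * u_level(r + 1, y[..., None] + step * z)
        m = a.max(axis=-1, keepdims=True)                # stabilize log-sum-exp
        log_mean = m[..., 0] + np.log(np.sum(w * np.exp(a - m), axis=-1))
        return log_mean / xr                             # Eq. (4.19a) with r -> r+1

    return float(np.sum(w * u_level(0, np.sqrt(qs[0]) * z)))  # Eq. (4.20)


# Usage: linear initial condition U(y) = a*y with a = 0.7 (illustrative choice)
a = 0.7
print(rsb_term(lambda y: a * y, q=[0.2, 0.5], x=[0.4]))
print(0.5 * a**2 * (0.4 * (0.5 - 0.2) + 1.0 * (1.0 - 0.5)))  # closed form for this U
```

For the linear choice $U(y)=ay$ each Gaussian step of (4.19a) simply adds $x_r a^2(q_r-q_{r-1})/2$, so the two printed numbers should agree; this gives a cheap check of the index bookkeeping before trying a non-trivial $U(y)$.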
4.1.2. Parisi's PDE

The above recursions can be viewed as a diffusion process in the presence of "kicks". Let us introduce here Parisi's order parameter function (OPF) as
$$x(q)=\sum_{i=0}^{R}(x_{i+1}-x_i)\,\theta(q-q_i), \tag{4.21}$$
defined on the interval $[0,1]$, where (4.6) and (4.13) are understood. With the standard notation
$$f(q^{\pm})=\lim_{\epsilon\to0^+}f(q\pm\epsilon), \tag{4.22}$$
we obviously have
$$x(q_r^+)=x_{r+1},\qquad x(q_r^-)=x_r, \tag{4.23}$$
and we may set
$$x(q_r)=x_{r+1}. \tag{4.24}$$
Next we introduce the field $t(q,y)$ such that at $q_r$ it has the discontinuity
$$t(q_r^+,y)=t_r(y), \tag{4.25a}$$
$$t(q_r^-,y)=t_r(y)^{x_r/x_{r+1}}. \tag{4.25b}$$
In other words,
$$t(q,y)^{1/x(q)} \tag{4.26}$$
is continuous in $q$. At the discontinuity we may set
$$t(q_r,y)=t_r(y). \tag{4.27}$$
A graphic reminder of the way $x(q)$ and $t(q,y)$ are defined at the discontinuity is shown in Fig. 1. Note that $r$ was converted to $q$ differently for $x_r$ and $t_r$, cf. Eqs. (4.24) and (4.27). All fields appearing below follow the convention (4.25a), (4.27). In the interval $(q_{r-1},q_r)$ we define $t(q,y)$ based on (4.15a) as
$$t(q,y)=\int Dz\;t\big(q_r^-,\,y+z\sqrt{q_r-q}\big), \tag{4.28}$$
ensuring that (4.25a) holds for $r\to r-1$. Relation (4.28) says that $t(q,y)$ evolves in the open interval from $q_r$ to $q_{r-1}$ by the linear diffusion equation
$$\partial_q t=-\tfrac12\,\partial_y^2 t. \tag{4.29}$$
Fig. 1. Schematic behavior of $x(q)$, $t(q,y)$ and $u(q,y)$ at a discontinuity point $q_r$. A fixed $y$ is assumed. The function $u(q,y)$ is continuous in $q$ but has a discontinuous derivative. The two limits of $t(q_r,y)$ are related through Eqs. (4.25a) and (4.25b). The circles are placed where the function value is not taken as the limit.
Near the discontinuity of $x(q)$ another differential equation can be derived. Let us differentiate Eq. (4.26) with respect to $q$:
$$\partial_q\,t^{1/x}=\frac{1}{x}\,t^{1/x-1}\Big(\partial_q t-\frac{\dot x}{x}\,t\ln t\Big). \tag{4.30}$$
Since $t(q,y)^{1/x(q)}$ is continuous in $q$ while $t(q,y)$ and $x(q)$ are not, the two singular derivatives on the r.h.s. must cancel in leading order. Hence we obtain
$$\partial_q t=\frac{\dot x}{x}\,t\ln t \tag{4.31}$$
in an infinitesimal neighborhood of $q_r$. The above derivation is seemingly unfounded, because at a discontinuity the rules of differentiation used in (4.30) lose their meaning. However, regarding (4.31) at fixed $y$ as an ordinary differential equation separable in $q$ helps us through the discontinuity, and we obtain
$$\int_{t(q_r^-,y)}^{t(q_r^+,y)}\frac{dt}{t\ln t}=\int_{x(q_r^-)}^{x(q_r^+)}\frac{dx}{x}. \tag{4.32}$$
The integrals yield
$$\Big[\ln\ln t(q,y)\Big]_{q_r^-}^{q_r^+}=\Big[\ln x(q)\Big]_{q_r^-}^{q_r^+}, \tag{4.33}$$
whence, by exponentiating twice, we recover the continuity condition for (4.26). In conclusion, for a discontinuous $x(q)$, Eq. (4.31) can indeed be interpreted as the differential form of the prescription that (4.26) be continuous in $q$.
Concatenation of (4.29) and (4.31) gives, with regard to the initial condition (4.16), the PDE
$$\partial_q t=-\tfrac12\,\partial_y^2 t+\frac{\dot x}{x}\,t\ln t, \tag{4.34a}$$
$$t(1,y)=e^{U(y)}. \tag{4.34b}$$
Indeed, at a $q_r$ the derivative $\dot x(q)$ is singular, so the second term on the r.h.s. dominates and we recover (4.31), whereas within an interval $\dot x(q)\equiv0$ and thus (4.29) holds. The transformation analogous to (4.18) is
$$t(q,y)=e^{x(q)\,u(q,y)}, \tag{4.35}$$
and gives rise to
$$\partial_q u=-\tfrac12\,\partial_y^2 u-\frac{x}{2}\,(\partial_y u)^2, \tag{4.36a}$$
$$u(1,y)=U(y). \tag{4.36b}$$
It follows that when $x(q)$ has a finite discontinuity, the field $u(q,y)$ is continuous in $q$, as in Fig. 1. This is in accordance with (4.25b), i.e., with the continuity of (4.26), since $t^{1/x}=e^{u}$. The PDE (4.36a) can be rewritten via the transformation $q=q(x)$ into one evolving in $x$, a PDE first proposed by Parisi with a special initial condition for the SK model [14,84]. In this paper we refer to (4.36) and its equivalents as Parisi's PDE, PPDE for short. When $x(q)\equiv\text{const.}$, differentiation of the PPDE (4.36) with respect to $y$ gives the Burgers equation for the field $\partial_y u$. Then the derivative of Eq. (4.35) with respect to $y$ corresponds to the Cole–Hopf transformation formula [224,225], which converts the Burgers equation into the PDE for linear diffusion, here (4.34) with $\dot x\equiv0$. If $x(q)$ is not constant, (4.35) connects two non-linear PDEs. We shall refer to (4.35) as the Cole–Hopf transformation.

In the case of a discontinuous initial condition $U(y)$, the Cole–Hopf transformation (4.35) connects two discontinuous functions at $q=1$, while diffusion generically smoothens the discontinuity for $q<1$. Even if we succeed in defining the PDEs for non-differentiable initial conditions, the equivalence of Eqs. (4.34a), (4.34b) and Eq. (4.36) is doubtful. In case of ambiguity, precedence is taken by the PDE (4.34a), (4.34b), which directly follows from the recursion (4.15). The question of discontinuity in the initial condition will be discussed later. Our main focus is the term (4.20), now also a functional of $x(q)$,
$$u[U(y),x(q)]=\int Dz\;u\big(q_0,z\sqrt{q_0}\big), \tag{4.37}$$
where $n=0$ is implied. Note that in the interval $(0,q_0)$ we have $x(q)\equiv0$, so (4.36a) becomes the PDE for linear diffusion, whose solution at $q=0$ is given by the r.h.s. of (4.37). Thus
$$u[U(y),x(q)]=u(0,0). \tag{4.38}$$
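As a worked check of the PPDE (4.36) and of (4.38), consider the linear initial condition $U(y)=ay$ (an illustrative choice, not one made in the text). The ansatz $u(q,y)=ay+c(q)$ turns (4.36a) into an ordinary differential equation for $c(q)$, because $\partial_y^2u=0$ and $(\partial_yu)^2=a^2$:
$$\dot c(q)=-\frac{a^2}{2}\,x(q),\quad c(1)=0\qquad\Longrightarrow\qquad u(q,y)=ay+\frac{a^2}{2}\int_q^1x(q')\,dq'.$$
This $u(q,y)$ is continuous in $q$ even where $x(q)$ jumps, as stated above. By (4.38),
$$u[ay,x(q)]=u(0,0)=\frac{a^2}{2}\int_0^1x(q)\,dq=\frac{a^2}{2}\sum_{r=1}^{R+1}x_r\,(q_r-q_{r-1}),$$
where the last form holds for the step OPF (4.21), using $x(q)=0$ on $[0,q_0)$ and $x(q)=x_r$ on $(q_{r-1},q_r)$. The same value follows from the recursion (4.19), since each Gaussian step there maps $ay+c$ to $ay+c+\tfrac12x_ra^2(q_r-q_{r-1})$.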
In the above PDEs $q$ is a time-like variable evolving from 1 to 0. In the context of the PDEs we will refer to $q$ as time, and the ordinary derivative with respect to $q$ will be denoted by a dot. The above PDEs can thus be considered as non-linear diffusion equations in the reverse time direction.

E. Ott has kindly called our attention to the Cole–Hopf transformation.
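Next we study the case of $\hat Q$ with Parisi elements obeying (4.7). Then the PDE obtained by continuation for the field $\hat t(\hat q,y)$ contains the function $\hat x(\hat q)$. We obtain
$$\partial_{\hat q}\hat t=-\tfrac12\,\partial_y^2\hat t+\frac{\dot{\hat x}}{\hat x}\,\hat t\ln\hat t, \tag{4.39a}$$
$$\hat t(\hat q_R,y)=\int Dz\;e^{U(y+iz\sqrt{\hat q_R})}, \tag{4.39b}$$
where $\hat t(\hat q_R,y)$ is real due to the symmetry of the $Dz$ measure. Alternatively, with the Cole–Hopf transformation (4.35), we have
$$\partial_{\hat q}\hat u=-\tfrac12\,\partial_y^2\hat u-\frac{\hat x}{2}\,(\partial_y\hat u)^2, \tag{4.40a}$$
$$\hat u(\hat q_R,y)=\ln\int Dz\;e^{U(y+iz\sqrt{\hat q_R})}. \tag{4.40b}$$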
The existence of the integral is a sensitive question here, because the imaginary term in the argument expresses the fact that $\exp U$ is evolved by backward diffusion. The meaningfulness of the above initial condition should be checked case by case. The sought term is then
$$\lim_{n\to0}u[U(y),\hat Q]=u[U(y),\hat x(\hat q)]=\hat u(0,0). \tag{4.41}$$
In contrast to the PDEs associated with the matrix $Q$ of naturally bounded elements, where the time span of the evolution is the unit interval, in the case of $\hat Q$ the evolution interval of the PDEs is not fixed a priori. Now $\hat q$ goes from $\hat q_R$ to 0, where $\hat q_R$ itself is a thermodynamical variable subject to extremization.

Finally, we emphasize that the recursive technique may be able to treat non-monotonic $q_r$ sequences. Indeed, if $q_r<q_{r-1}$, an imaginary factor would multiply $z$ in the integrand on the r.h.s. of (4.15a), while the l.h.s. would still be a real function. If the integrals involved exist, there is no obstacle to extending the theory to non-monotonic $q_r$'s. Such a case did not, however, arise in our explorations. As we shall see in Section 6.1, the OPF $x(q)$ is a probability measure, a property that non-monotonicity would contradict. On the other hand, $\hat Q$ can be considered as associated with a non-monotonic $\hat q_r$ sequence. Its diagonal elements vanish, $\hat q_{aa}\equiv\hat q_{R+1}=0$, and so the step from $\hat q_R$ to $\hat q_{R+1}=0$ goes against the trend of the otherwise supposedly monotonically increasing sequence $\hat q_r$, $r=0,\dots,R$. Accordingly, an imaginary factor of $z$ appears on the r.h.s. of (4.40b), and the recursion is as meaningful as it was in the case of a monotonic $q_r$ sequence.

The generalization of the picture above to an order parameter with more components, when the structure of the free energy term remains essentially the same, is straightforward. We briefly discuss this case in Appendix E.

4.2. Finite and continuous replica symmetry breaking

4.2.1. The continuous limit

If the minimum of the free energy is found at an OPF given by (4.21) with $R=\infty$, then the $q_r$'s accumulate infinitesimally closely in some region. If this happens in an interval, the OPF $x(q)$ is expected to increase there strictly monotonically, given its physical interpretation as the mean probability distribution of the overlaps, as discussed in Section 6.1. Within that interval the recursions go over to the PDEs of Section 4.1.2. In other regions of $q$, where $x(q)$ remains a step function, the recursions discussed in Section 4.1.1 can be used, but the PDEs are also still valid, as described in Section 4.1.2. In either case, the PDEs are applicable irrespective of whether the minimizing OPF is continuous or step-like. In Appendix D we discuss the continuation method of Ref. [226].

In physical systems studied so far, including spin glass and neural network models, among finite $R$'s only $R=0$ and $R=1$ RSB phases were found to be thermodynamically stable. The significance of $2\le R<\infty$ RSB seems to lie in approximating the $R=\infty$ case. Generically, both finite and infinite $R$ states are characterized by the border values
$$q_m=\begin{cases}q_0 & \text{if } R<\infty,\\ \lim_{R\to\infty}q_0 & \text{if } R=\infty,\end{cases} \tag{4.42a}$$
$$q_M=\begin{cases}q_R & \text{if } R<\infty,\\ \lim_{R\to\infty}q_R & \text{if } R=\infty,\end{cases} \tag{4.42b}$$
where $q_m\ge0$ and $q_M\le1$. These delimit the trivial plateaus of the OPF $x(q)$:
$$x(q)\equiv\begin{cases}0 & \text{if } 0\le q<q_m,\\ 1 & \text{if } 1\ge q\ge q_M.\end{cases} \tag{4.43}$$
The border values (4.42) apply to both the finite and the infinite $R$ cases, the difference remaining in the shape of the OPF $x(q)$ within the interval $(q_m,q_M)$. Here we assumed $q_{R+1}=q_{aa}=1$; this makes the value $q=1$ special, and we will use that in the general discussion. When $R=\infty$, a typical situation is that extremization of the free energy yields
$$x(q)=\begin{cases}0 & \text{if } 0\le q<q_m,\\ x_c(q),\quad x_m\le x_c(q)<x_M,\quad 0<\dot x_c(q)<\infty & \text{if } q_m\le q<q_M,\\ 1 & \text{if } q_M\le q\le1.\end{cases} \tag{4.44}$$
In words, the OPF has a strictly increasing, continuous segment $x_c(q)$ between the border values (4.42). Here $x_M=x(q_M^-)$. The case of an OPF having a smooth, strictly increasing segment $x_c(q)$ will be referred to as continuous RSB (CRSB). Obviously, CRSB always implies $R\to\infty$. In principle, the OPF may be more complicated than (4.44); e.g., there may be non-trivial plateaus ($x\ne0,1$) and several $x_c(q)$ segments separated by them. So far, however, no system has been found whose replica solution involved more than one strictly increasing segment $x_c(q)$ separated by a plateau or a discontinuity. In what follows we will use the term "continuation" to mean the $n\to0$ limit together with the usage of $x(q)$ based on Eqs. (4.12) and (4.21), allowing for, but not necessarily implying, CRSB. If the OPF in question is $\hat x(\hat q)$, defined analogously to (4.21) with the parameters $\{\hat q_r,\hat x_r\}$, continuation goes along similar lines.

4.2.2. Derivatives of Parisi's PDE

The iterations derived in Section 4.1.1 only describe finite $R$-RSB, including the $R=0$ replica symmetric case, while the PDEs incorporate both finite and continuous RSB. We therefore study the PDEs.
For later purposes it is worth summarizing some PDEs related to the PPDE (4.36) and its Cole–Hopf transform, Eq. (4.34). The field
$$k(q,y)=\partial_y u(q,y) \tag{4.45}$$
satisfies the PDE
$$\partial_q k=-\tfrac12\,\partial_y^2 k-x\,k\,\partial_y k, \tag{4.46a}$$
$$k(1,y)=U'(y), \tag{4.46b}$$
obtained from the PPDE by differentiation with respect to $y$. One more differentiation introduces
$$i(q,y)=\partial_y k(q,y), \tag{4.47}$$
which evolves according to
$$\partial_q i=-\tfrac12\,\partial_y^2 i-x\,\big(i^2+k\,\partial_y i\big), \tag{4.48a}$$
$$i(1,y)=U''(y). \tag{4.48b}$$
Note that while the PPDE (4.36) and Eq. (4.46) are self-contained equations, in principle solvable for the respective fields, (4.48) is not; it should rather be considered as a relation between the fields $k(q,y)$ and $i(q,y)$. The Cole–Hopf transform of the first derivative $k(q,y)$ can be conveniently defined as the field $k(q,y)\,t(q,y)$. This can be further differentiated to produce the Cole–Hopf transformed field for $i(q,y)$. The PDEs for the transformed fields each reduce to the linear diffusion equation along plateaus of $x(q)$.

4.2.3. Linearized PDEs and their adjoints

As we shall see, linear PDEs associated with the above equations play an important role in the calculation of expectation values. A perturbation $u(q,y)+\epsilon\,0(q,y)$ around a known solution $u(q,y)$ of the PPDE itself satisfies the PPDE to $O(\epsilon)$ if
$$\partial_q 0(q,y)=-\tfrac12\,\partial_y^2 0(q,y)-x\,k\,\partial_y 0(q,y). \tag{4.49}$$
This equation is satisfied by $k(q,y)$ with the initial condition (4.46b). The field
$$g(q,y)=\partial_y 0(q,y) \tag{4.50}$$
then evolves according to
$$\partial_q g=-\tfrac12\,\partial_y^2 g-x\,\partial_y(k\,g), \tag{4.51}$$
which is obviously satisfied by $i(q,y)$ if the initial condition is specified by (4.48b). The field $P(q,y)$, adjoint to $0(q,y)$ and crucial in the computation of expectation values, can be introduced by the requirement that
$$\int dy\;P(q,y)\,0(q,y) \tag{4.52}$$
be independent of $q$. Differentiating with respect to $q$, using Eq. (4.49), and integrating by parts under the assumption that $P(q,y)$ falls off sufficiently fast for large $|y|$, we arrive at the PDE
$$\partial_q P=\tfrac12\,\partial_y^2 P-x\,\partial_y(k\,P). \tag{4.53}$$
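The integration by parts leading to (4.53) can be spelled out in one line. Differentiating (4.52), inserting (4.49), and moving the $y$ derivatives onto $P$ (the boundary terms vanish by the assumed decay of $P$ at large $|y|$) gives
$$\frac{d}{dq}\int dy\,P\,0(q,y)=\int dy\,\Big[(\partial_qP)\,0(q,y)+P\Big(-\tfrac12\partial_y^2\,0(q,y)-x\,k\,\partial_y\,0(q,y)\Big)\Big]=\int dy\;0(q,y)\Big[\partial_qP-\tfrac12\partial_y^2P+x\,\partial_y(kP)\Big].$$
Requiring this to vanish for every solution $0(q,y)$ of (4.49) sets the last bracket to zero, which is (4.53).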
Here the time $q$ evolves in the forward direction, from 0 to 1. The equivalent of the field $P(q,y)$, evolving from the initial condition (in our notation)
$$P(0,y)=\delta(y), \tag{4.54}$$
was introduced by Sompolinsky in a dynamical context for the SK model [34]. In this case the average (4.52) assumes the alternative forms
$$\int dy\;P(q,y)\,0(q,y)=\int dy\;P(1,y)\,0(1,y)\equiv0(0,0). \tag{4.55}$$
Eq. (4.53) is in fact a Fokker–Planck equation with $x(q)\,k(q,y)$ as drift. The initial condition (4.54) is normalized to 1 and localized at the origin. Hence follow the conservation of the norm,
$$\int dy\;P(q,y)\equiv1, \tag{4.56}$$
and the non-negativity of the field $P(q,y)$. Thus $P(q,y)$ can be interpreted as a $q$-time-dependent probability density. Hereafter we will refer to the initial value problem (4.53), (4.54), which determines Sompolinsky's probability field $P(q,y)$, as Sompolinsky's PDE (SPDE). Analogously, the field $S(q,y)$ adjoint to $g(q,y)$ satisfies
$$\partial_q S=\tfrac12\,\partial_y^2 S-x\,k\,\partial_y S, \tag{4.57}$$
which renders
$$\int dy\;S(q,y)\,g(q,y) \tag{4.58}$$
constant in $q$. Obviously $\partial_y S$ satisfies the SPDE (4.53).

The Cole–Hopf transformation can be extended to $0(q,y)$. This is done by the recipe that in intervals with $\dot x=0$ the new field exhibits pure diffusion. Suppose that $t(q,y)$ satisfies (4.34); then let
$$l(q,y)=0(q,y)\,t(q,y), \tag{4.59}$$
whence
$$\partial_q l=-\tfrac12\,\partial_y^2 l+\frac{\dot x}{x}\,l\ln t. \tag{4.60}$$
Similarly, the analog of the Cole–Hopf transformation for the field $P(q,y)$ adjoint to $0(q,y)$ is
$$T(q,y)=P(q,y)/t(q,y), \tag{4.61}$$
satisfying
$$\partial_q T=\tfrac12\,\partial_y^2 T-\frac{\dot x}{x}\,T\ln t. \tag{4.62}$$
If $\dot x=0$, the PDEs (4.60) and (4.62) indeed reduce to the equation for pure diffusion. Based on that, the $0$ and $P$ fields can be evaluated straightforwardly along plateaus of $x(q)$.
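The step from (4.59) to (4.60) can be made explicit. Using (4.49) for $0(q,y)$, (4.34a) for $t(q,y)$, and $\partial_y t=x\,k\,t$, which follows from (4.35) and (4.45),
$$\partial_q l=t\,\partial_q\,0(q,y)+0(q,y)\,\partial_q t=-\tfrac12\Big(t\,\partial_y^2\,0(q,y)+2x\,k\,t\,\partial_y\,0(q,y)+0(q,y)\,\partial_y^2 t\Big)+\frac{\dot x}{x}\,0(q,y)\,t\ln t,$$
and the bracket is exactly $\partial_y^2\big(0(q,y)\,t\big)=t\,\partial_y^2\,0(q,y)+2\,\partial_y\,0(q,y)\,\partial_y t+0(q,y)\,\partial_y^2t$, which gives (4.60). A similar manipulation with $P=T\,t$ and (4.53), now using $\partial_y(k\,t)=\partial_y^2t/x$, yields (4.62).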
4.2.4. Green functions

The PDEs considered previously were of the form
$$\partial_q X(q,y)=\hat L(q,y,\partial_y)\,X(q,y)+h(q,y), \tag{4.63}$$
where the unknown field is $X(q,y)$ and the time $q$ may evolve in either the increasing or the decreasing direction. The differential operator $\hat L(q,y,\partial_y)$ is possibly non-linear in $X$, may be $q$- and $y$-dependent, and contains partial derivatives with respect to $y$. For vanishing argument $X=0$ the operator gives zero, $\hat L\,0=0$. We included the additive term $h(q,y)$ for the sake of generality; it was absent from the PDEs encountered so far. In what follows we shall introduce Green functions (GFs) for linear as well as non-linear PDEs. Suppose that $X(q,y)$ is the unique solution of a PDE like (4.63) with some initial condition. The GF associated with the PDE for the field $X(q,y)$ is defined as
$$G_X(q_1,y_1;q_2,y_2)=\frac{\delta X(q_1,y_1)}{\delta X(q_2,y_2)}. \tag{4.64}$$
This may be viewed as the response of the solution $X$ at $q_1$ to an infinitesimal change of the initial condition at $q_2$. The above definition yields a retarded GF; that is, if the PDE for $X$ evolves towards increasing (decreasing) $q$, then the GF vanishes for $q_1<q_2$ ($q_1>q_2$). Obviously
$$G_X(q,y_1;q,y_2)=\delta(y_1-y_2). \tag{4.65}$$
The chain rule for the functional derivative in (4.64) can be expressed as
$$G_X(q_1,y_1;q_3,y_3)=\int dy\;G_X(q_1,y_1;q,y)\,G_X(q,y;q_3,y_3), \tag{4.66}$$
where $q$ lies in the interval delimited by $q_1$ and $q_3$. This is just the customary composition rule for GFs. In terms of the adjoint property, (4.66) means that the field adjoint to the GF in its fore variables is the same GF in its hind variables. The PDEs the GF satisfies in its fore and hind variables are, therefore, each other's adjoint equations. The definition (4.64) applies to both linear and non-linear PDEs (4.63). It is a special property of a linear PDE that $G_X(q_1,y_1;q_2,y_2)$ satisfies the same PDE in the variables $q_1,y_1$ with the additive term $h(q,y)=\pm\delta(q-q_2)\,\delta(y-y_2)$, where the sign is $+$ if the time $q$ in the PDE (4.63) increases and $-$ if it decreases. Then the solution can be given in terms of the GFs in the usual form
$$X(q_1,y_1)=\int dy_2\;G_X(q_1,y_1;q_2,y_2)\,X(q_2,y_2)+\int dy\int_{q_2}^{q_1}dq\;G_X(q_1,y_1;q,y)\,h(q,y). \tag{4.67}$$
If the PDE for $X$ is non-linear, then $G_X(q_1,y_1;q_2,y_2)$ is the GF of the PDE obtained from it by linearization, as performed at the beginning of Section 4.2.3. In short, the GF of a non-linear PDE is the GF of its linearized version. Note that for a non-linear PDE the GF is associated with a particular solution $X$, since that solution usually enters some coefficients of the linearized PDE which the GF satisfies. Suppose now that the differential operator in (4.63) is $\hat L(q,\partial_y)$, i.e., it is translation invariant in $y$. Such is the case for the PPDE (4.36) and its derivatives. Then it is easy to see that
$$Y(q,y)=\partial_y X(q,y) \tag{4.68}$$
will obey the PDE that is the linearization of the PDE for $X$. Therefore
$$Y(q_1,y_1)=\int dy_2\;G_X(q_1,y_1;q_2,y_2)\,Y(q_2,y_2)+\int dy\int_{q_2}^{q_1}dq\;G_X(q_1,y_1;q,y)\,\partial_y h(q,y). \tag{4.69}$$
If the PDE for $X$ is non-linear, Eq. (4.67) does not hold but Eq. (4.69) does. The latter, however, is merely an identity and should not be considered as the solution producing $Y$ from an initial condition, because calculating $G_X$ requires the knowledge of $X$, and thus that of $Y$.

A prominent role will be played by the GF $G_u(q_1,y_1;q_2,y_2)$ of the field $u(q,y)$ from the PPDE (4.36). The linearization of the PPDE yielded Eq. (4.49), and the linearization of the derivative of the PPDE, Eq. (4.46), produced Eq. (4.51). Therefore the respective GFs are identical:
$$G_u(q_1,y_1;q_2,y_2)=G_0(q_1,y_1;q_2,y_2), \tag{4.70}$$
$$G_k(q_1,y_1;q_2,y_2)=G_g(q_1,y_1;q_2,y_2). \tag{4.71}$$
Given the initial condition (4.54) of the SPDE, its solution is
$$P(q,y)=G_P(q,y;0,0). \tag{4.72}$$
The GFs $G_P$ and $G_u$ were discussed for the SK model in Ref. [29]. Considering the constancy of (4.52) and (4.58), we have
$$G_u(q_1,y_1;q_2,y_2)=G_P(q_2,y_2;q_1,y_1), \tag{4.73}$$
$$G_k(q_1,y_1;q_2,y_2)=G_S(q_2,y_2;q_1,y_1). \tag{4.74}$$
An identity between derivatives of GFs can be obtained from Eqs. (4.50) and (4.67) as
$$\partial_{y_2}G_k(q_1,y_1;q_2,y_2)=-\partial_{y_1}G_u(q_1,y_1;q_2,y_2). \tag{4.75}$$
Because of their central significance, we display the equations satisfied by the GF of the field $u$. In its fore set of arguments, $G_u(q_1,y_1;q_2,y_2)$ satisfies
$$\partial_{q_1}G_u=-\tfrac12\,\partial_{y_1}^2G_u-x(q_1)\,k(q_1,y_1)\,\partial_{y_1}G_u-\delta(q_1-q_2)\,\delta(y_1-y_2), \tag{4.76}$$
where the differential operator on the r.h.s. is the same as on the r.h.s. of (4.49). In the hind set, with regard to the identity (4.73) and the SPDE (4.53), we obtain the PDE
$$\partial_{q_2}G_u=\tfrac12\,\partial_{y_2}^2G_u-x(q_2)\,\partial_{y_2}\big(k(q_2,y_2)\,G_u\big)+\delta(q_1-q_2)\,\delta(y_1-y_2), \tag{4.77}$$
whose r.h.s. contains the same differential operator as the r.h.s. of the SPDE. The norm in the second $y$ argument is conserved,
$$\int dy_2\;G_u(q_1,y_1;q_2,y_2)\equiv1, \tag{4.78}$$
for $q_1\le q_2$. Eq. (4.67) shows how a particular solution of a linear PDE with a source can be expressed by means of the GF. For example, suppose that the source field $h(q,y)$ is added to the linearized PPDE as
$$\partial_q\,0(q,y)=-\tfrac12\,\partial_y^2\,0(q,y)-x\,k\,\partial_y\,0(q,y)+h(q,y) \tag{4.79}$$
and an initial condition $0(q_1,y)$ is set for some $0<q_1\le1$. Then the solution for $0\le q\le q_1$ reads
$$0(q,y)=\int dy'\;G_u(q,y;q_1,y')\,0(q_1,y')-\int_q^{q_1}dq'\int dy'\;G_u(q,y;q',y')\,h(q',y'). \tag{4.80}$$
The derivative field (4.45) satisfies (4.46). Thus it also satisfies the above PDE (4.79) with zero source, whence
$$k(q,y)=\int dy'\;G_u(q,y;1,y')\,U'(y'). \tag{4.81}$$
Differentiation of $k$ gives $i$ as in (4.47), which satisfies the PDE (4.48). Its solution can be expressed in terms of the GF associated with $k$ as
$$i(q,y)=\int dy'\;G_k(q,y;1,y')\,U''(y'). \tag{4.82}$$
Note that relation (4.75) is necessary to maintain (4.47). So far we have considered the GFs of $u$ and its derivative fields. It is also instructive to see their relation to the GF of the field $t$. Starting from the definition (4.64) of the GF and using the Cole–Hopf formula (4.35), we get
$$G_u(q_1,y_1;q_2,y_2)=\frac{x(q_2)\,t(q_2,y_2)}{x(q_1)\,t(q_1,y_1)}\,G_t(q_1,y_1;q_2,y_2). \tag{4.83}$$
From the PDEs (4.76) and (4.77) for $G_u$ we have for $G_t(q_1,y_1;q_2,y_2)$
$$\partial_{q_1}G_t=-\tfrac12\,\partial_{y_1}^2G_t+\frac{\dot x(q_1)}{x(q_1)}\big(\ln t(q_1,y_1)+1\big)G_t-\delta(q_1-q_2)\,\delta(y_1-y_2), \tag{4.84a}$$
$$\partial_{q_2}G_t=\tfrac12\,\partial_{y_2}^2G_t-\frac{\dot x(q_2)}{x(q_2)}\big(\ln t(q_2,y_2)+1\big)G_t+\delta(q_1-q_2)\,\delta(y_1-y_2). \tag{4.84b}$$
Eq. (4.84a) could also be obtained by linearization of the PDE (4.34a), while (4.84b) is its adjoint. These PDEs are particularly useful if $\dot x(q)=0$, because then they reduce to pure diffusion. One can view relation (4.83) as the translation of the Cole–Hopf transformation (4.35) onto the GFs. We again see the advantage of keeping track of a Cole–Hopf transformed pair like $G_t$ and $G_u$: $G_t$ is simple along plateaus of $x(q)$, while $G_u$ is useful when $\dot x(q)>0$, especially at jumps. Notation in subsequent sections can be shortened by introducing what we shall call vertex functions,
$$C_{uuu}(q;\{q_i,y_i\})=\int dy\;G_u(q_1,y_1;q,y)\,G_u(q,y;q_2,y_2)\,G_u(q,y;q_3,y_3), \tag{4.85}$$
$$C_{ukk}(q;\{q_i,y_i\})=\int dy\;G_u(q_1,y_1;q,y)\,G_k(q,y;q_2,y_2)\,G_k(q,y;q_3,y_3). \tag{4.86}$$
The ordering $q_1\le q\le q_2$, $q_1\le q\le q_3$ is understood. The vertex functions satisfy the appropriate linear PDE in each pair $q_i,y_i$; furthermore, if $q$ coincides with, say, $q_j$, then the vertex functions reduce to the product of the other two GFs with $q_i$, $i\ne j$.
As shown in Appendix F for $q_1<q<q_2$, $q_1<q<q_3$, we have the useful identity
$$\partial_q C_{uuu}=\partial_{y_2}\partial_{y_3}C_{ukk}. \tag{4.87}$$
A notable consequence follows from the fact that $k(q,y)$ of Eq. (4.45) and $i(q,y)$ of Eq. (4.48) are evolved by $G_u$ and $G_k$, respectively, as follows from Eq. (4.69). Therefore, multiplying (4.87) by the initial conditions $U'(y_i)=k(1,y_i)$, $i=2,3$, and integrating over those $y_i$'s gives, for $q_1<q$,
$$\partial_q\int dy\;G_u(q_1,y_1;q,y)\,k(q,y)^2=\int dy\;G_u(q_1,y_1;q,y)\,i(q,y)^2. \tag{4.88}$$
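For orientation, the identity (4.87) can be made plausible by a short calculation (a sketch only; the full argument is the one referred to in Appendix F). Abbreviate $A=G_u(q_1,y_1;q,y)$, $B=G_u(q,y;q_2,y_2)$, $C=G_u(q,y;q_3,y_3)$. For $q$ strictly between the $q_i$ the delta sources drop out, so using (4.77) for the $q$-dependence of $A$ and (4.76) for that of $B$ and $C$, then integrating by parts in $y$,
$$\partial_qC_{uuu}=\int dy\Big[\big(\tfrac12\partial_y^2A-x\,\partial_y(kA)\big)BC+A\big(-\tfrac12\partial_y^2B-xk\,\partial_yB\big)C+AB\big(-\tfrac12\partial_y^2C-xk\,\partial_yC\big)\Big]=\int dy\;A\,\partial_yB\,\partial_yC .$$
By (4.75), $\partial_yG_u(q,y;q_i,y_i)=-\partial_{y_i}G_k(q,y;q_i,y_i)$, so the r.h.s. equals $\partial_{y_2}\partial_{y_3}C_{ukk}$.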
The mathematical properties of the PDEs will acquire physical meaning in subsequent sections, where thermodynamical properties are studied.

4.2.5. Evolution along plateaus

Here we collect the few obvious formulas describing the evolution of some fields along the trivial $x=0$ and $x=1$ plateaus, and give the GF for $u$ for any plateau. Let us first consider the $x=0$ plateau, i.e., the region $0\le q<q_m$. Recalling the Cole–Hopf formula (4.35) for the field $t(q,y)$, we obtain
$$t(q,y)\equiv1. \tag{4.89}$$
The field $u(q,y)$ obeys the PPDE (4.36), and thus is purely diffusive for $x=0$:
$$u(q,y)=\int Dz\;u\big(q_m,\,y+z\sqrt{q_m-q}\big). \tag{4.90}$$
Due to the continuity of $u$ in $q$, this also holds for $q=q_m$. The probability field $P(q,y)$ from the SPDE (4.53), (4.54) is the Gaussian function
$$P(q,y)=G(y,q), \tag{4.91}$$
where the notation
$$G(x,p)=\frac{1}{\sqrt{2\pi p}}\exp\Big(-\frac{x^2}{2p}\Big) \tag{4.92}$$
was used. In the region $q_M\le q\le1$, which is the $x=1$ plateau, we have
$$t(q,y)=\int Dz\;\exp U\big(y+z\sqrt{1-q}\big), \tag{4.93a}$$
$$u(q,y)=\ln t(q,y). \tag{4.93b}$$
The time-dependent probability field $P(q,y)$ is best evaluated along plateaus by its own version, (4.61), of the Cole–Hopf transformation. The transformed field $T(q,y)$ obeys (4.62), so it reduces to pure diffusion along a plateau. Thus, assuming the knowledge of $P(q_M,y)$ and having $u(q,y)$ from (4.93), we get
$$P(q,y)=e^{u(q,y)}\int Dz\;e^{-u(q_M,\,y+z\sqrt{q-q_M})}\,P\big(q_M,\,y+z\sqrt{q-q_M}\big). \tag{4.94}$$
The GF of the field $u$, $G_u$, will now be given on any plateau. Suppose that $\dot x(q)\equiv0$ in the closed interval $[q_1,q_2]$. Then, by (4.84), $G_t$ is a Gaussian function for a positive plateau value $x$, and from (4.83) $G_u$ becomes
$$G_u(q_1,y_1;q_2,y_2)=e^{x\,u(q_2,y_2)-x\,u(q_1,y_1)}\,G(y_2-y_1,\,q_2-q_1), \tag{4.95}$$
where the notation (4.92) has been used. It remains to determine the GF on the trivial plateau $x=0$, which is obtained from, say, (4.76) as
$$G_u(q_1,y_1;q_2,y_2)=G(y_2-y_1,\,q_2-q_1). \tag{4.96}$$
This is the same as what we would get from (4.95) by substituting $x=0$.
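The plateau formula (4.95) can be checked numerically against the norm conservation (4.78) and the composition rule (4.66). The Python sketch below is only an illustration and rests on two assumptions that are not taken from the text: the interval $[0.55,1]$ is taken to be the $x=1$ plateau, and the initial condition is the hypothetical smooth choice $U(y)=\ln(2+\sin y)$, for which the Gaussian average in (4.93a) is analytic, $t(q,y)=2+e^{-(1-q)/2}\sin y$. All function names are hypothetical.

```python
import numpy as np


def gauss_kernel(x, p):
    """G(x, p) of Eq. (4.92): centred Gaussian density of variance p."""
    return np.exp(-x**2 / (2.0 * p)) / np.sqrt(2.0 * np.pi * p)


def t_field(q, y):
    """t(q, y) of Eq. (4.93a) on an assumed x = 1 plateau, for the illustrative
    initial condition U(y) = ln(2 + sin y).  The Gaussian average is analytic:
    int Dz sin(y + z*s) = exp(-s**2 / 2) * sin(y)."""
    return 2.0 + np.exp(-(1.0 - q) / 2.0) * np.sin(y)


def u_field(q, y):
    """u(q, y) = ln t(q, y), Eq. (4.93b)."""
    return np.log(t_field(q, y))


def G_u(q1, y1, q2, y2):
    """Plateau Green function (4.95) with x = 1, for q1 < q2 on the plateau."""
    return np.exp(u_field(q2, y2) - u_field(q1, y1)) * gauss_kernel(y2 - y1, q2 - q1)


def trapezoid(f, y):
    """Composite trapezoidal rule on a grid y."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(y)))


y_grid = np.linspace(-30.0, 30.0, 60001)
q1, q_mid, q2 = 0.55, 0.7, 0.9   # assumed to lie on the x = 1 plateau
y1, y3 = 0.3, -0.4

# Norm (4.78): integrating over the hind y argument gives 1
norm = trapezoid(G_u(q1, y1, q2, y_grid), y_grid)

# Composition rule (4.66): integrate over an intermediate point (q_mid, y)
composed = trapezoid(G_u(q1, y1, q_mid, y_grid) * G_u(q_mid, y_grid, q2, y3), y_grid)
direct = float(G_u(q1, y1, q2, y3))
print(norm, composed, direct)
```

The norm check works because, on an $x=1$ plateau, $t=e^{u}$ itself diffuses, so the Gaussian average of $e^{u(q_2,\cdot)}$ reproduces $e^{u(q_1,y_1)}$; the composition check reduces to the Chapman–Kolmogorov property of the Gaussian kernel once the exponential prefactors telescope.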
4.2.6. Discontinuous initial conditions

If the initial condition $U(y)$ of the PPDE (4.36) is discontinuous, special care is necessary near $q=1$. While, strictly speaking, the PPDE is defined only for initial conditions twice differentiable in $y$, one may expect that for practical purposes a much less strict condition suffices. For instance, in the textbook example of pure diffusion, any function whose convolution with the Gaussian GF gives a finite result can be accepted as an initial condition, irrespective of its differentiability. The physical picture is that diffusion smoothens steps and spikes and brings the solution into a differentiable form within an infinitesimal amount of time. The problem with the PPDE for a discontinuous initial condition lies deeper. It can be traced back to the fact that the Cole–Hopf transformation no longer connects the two PDEs (4.34) and (4.36). Even if, by means of the Dirac delta, we accept differentiation through a discontinuity, the derivatives of $t(1,y)=\exp U(y)$ and $u(1,y)=U(y)$ are not related by the chain rule, namely
$$U'(y)\,e^{U(y)}\neq\big(e^{U(y)}\big)'. \tag{4.97}$$
This can easily be seen by taking, for example, the step function
$$U(y)=a\,\theta(y). \tag{4.98}$$
Then
$$e^{U(y)}=1+(e^a-1)\,\theta(y), \tag{4.99}$$
and inequality (4.97) takes the form
$$a\,\delta(y)\,\big[1+(e^a-1)\,\theta(y)\big]\neq(e^a-1)\,\delta(y). \tag{4.100}$$
Equality could only be restored if $\theta(y=0)$ were chosen $a$-dependent, an artifact we do not accept. However, the derivation of the PPDE (4.36) from the PDE (4.34) is invalid if the chain rule cannot be applied. The difficulty can be circumvented by using the explicit expressions (4.93) for the fields $t(q,y)$ and $u(q,y)$ in the interval $[q_M,1]$. Obviously, even if there is a discontinuity (a finite step) in $U(y)$, the field $t(q,y)$, and thus $u(q,y)$, will become smooth for $q<1$. For instance, for (4.98), using the notation
$$H(x)=\int_x^\infty Dz=\tfrac12\big[1-\operatorname{erf}\big(x/\sqrt2\big)\big], \tag{4.101}$$
we have
$$t(q,y)=e^a+(1-e^a)\,H\big(y/\sqrt{1-q}\big)\qquad\text{for } q_M\le q\le1. \tag{4.102}$$
This is an analytic function for $q\neq1$ and becomes (4.99) for $q\to1$. Then $u(q,y)$ is obtained in $[q_M,1]$ by (4.93b), also analytic for $q\ne1$, and $u(1,y)$ indeed becomes (4.98). The above formulas extend down to $q_M$. Interestingly, as we shall see later, in the ground-state limit $T\to0$ we have $q_M\to1$, but the discontinuity of the fields equally disappears at $q_M$, although analyticity will not hold. Thus we have the fields for $q<1$; the only remaining problem is that we cannot claim that $u(q,y)$ satisfies the PPDE (4.36) at $q=1$, because of inequality (4.97). The difference in nature between the $t$ and $u$ functions for $q_M\le q\le1$ can be illustrated as follows. The singularity of the PDEs can be tamed by considering the fields as integral kernels. Let us take an analytic function $a(y)$ such that it and its derivatives decay sufficiently fast for large arguments, and consider
$$A_t(q)=\int dy\;a(y)\,t(q,y). \tag{4.103}$$
Starting from (4.93a), changing the integration variable as $y\to y-z\sqrt{1-q}$, and formally expanding in powers of $\sqrt{1-q}$, we get
$$A_t(q)=\sum_{k=0}^{\infty}\frac{(1-q)^k}{2^k\,k!}\int dy\;a^{(2k)}(y)\,e^{U(y)}, \tag{4.104}$$
where we used $\int Dz\,z^{2k+1}=0$, $\int Dz\,z^{2k}=(2k-1)!!$, and the notation
$$f^{(k)}(x)=\frac{d^kf(x)}{dx^k}. \tag{4.105}$$
On the other hand, a similar procedure can be carried out for
$$A_u(q)=\int dy\;a(y)\,u(q,y), \tag{4.106}$$
a case we illustrate on (4.98). From (4.102) we have
$$u(q,y)=\ln\big[e^a+(1-e^a)\,H\big(y/\sqrt{1-q}\big)\big]\equiv\tilde u\big(y/\sqrt{1-q}\big), \tag{4.107}$$
where the last equality defines the single-argument function $\tilde u(z)$. Then
$$A_u(q)=A_u(1)+\int dy\;a(y)\Big[\tilde u\big(y/\sqrt{1-q}\big)-a\,\theta(y)\Big], \tag{4.108}$$
where $A_u(1)$ was added to and subtracted from the r.h.s. Changing the integration variable as $y\to y\sqrt{1-q}$ and formally expanding in powers of $\sqrt{1-q}$, we get
$$A_u(q)=A_u(1)+\sum_{k=0}^{\infty}\frac{(1-q)^{(k+1)/2}\,a^{(k)}(0)}{k!}\int dy\;y^k\big[\tilde u(y)-a\,\theta(y)\big]. \tag{4.109}$$
Thus, to leading order, we have from (4.104) and (4.109)
$$A_t(q)-A_t(1)\propto 1-q, \tag{4.110a}$$
$$A_u(q)-A_u(1)\propto\sqrt{1-q}. \tag{4.110b}$$
So, considering the fields as integral operators in the case of non-differentiable initial conditions, we see from Eqs. (4.110a) and (4.110b) that $t$ does, but $u$ does not, have a finite derivative with respect to $q$ at $q=1$. This explains why we could maintain the PDE for $t$, while the PPDE had to be given up at $q=1$ for a non-differentiable initial condition. If the PPDE (4.36) is ill-defined at $q=1$, then so may be the PDEs for the derivative fields, the linearized PDEs, and the PDEs for the GFs, as discussed in Sections 4.2.2–4.2.4. We settle the ambiguity by redefining the derivative field $k(q,y)$ as
$$k(q,y)\,x(q)\,t(q,y)=\partial_y t(q,y), \tag{4.111}$$
so that in $[q_M,1]$, where $x(q)\equiv1$,
$$k(q,y)\,t(q,y)=\int Dz\;\partial_y\,e^{U(y+z\sqrt{1-q})}. \tag{4.112}$$
For a smooth $U(y)$ one recovers the original definition (4.45) for any $q$. If, however, $U(y)$ is discontinuous then, due to inequality (4.97), the new formula (4.112) will in general differ from (4.45) at $q=1$. In $[q_M,1]$ the $k(q,y)$ from (4.112) satisfies
$$\partial_q k=-\tfrac12\,\partial_y^2 k-k\,\partial_y k, \tag{4.113a}$$
$$k(1,y)\,e^{U(y)}=\big(e^{U(y)}\big)'. \tag{4.113b}$$
The special feature here is that Eqs. (4.113a) and (4.113b) could be derived without the now invalid chain rule. The above PDE coincides with (4.46a) at $x=1$, with an initial condition that may differ from (4.46b). In a similar spirit it can be shown that the $k(q,y)$ redefined above enters the PDEs (4.76) and (4.77) for the GF $G_u$, provided the latter is introduced by first giving $G_t$ via (4.64) and then defining $G_u$ via (4.83). Note that the GF $G_u$ is given in the interval $[q_M,1]$ by (4.95) with $x=1$, a smooth function of the $y$ arguments if both $q$ arguments are less than 1.

The continuous framework, with PDEs, was meant to be a practical reformulation of iteration (4.15). Its real use is in the $R\to\infty$ limit, where it allows more freedom in parametrizing a finite approximation than merely taking a large but finite $R$. In case of ambiguity, however, the iteration takes precedence. That argument helped us to refine our formalism of PDEs for discontinuous initial conditions. In what follows we will use the short notation made possible by the PDE formalism as if we were dealing with a continuous initial condition $U(y)$. However, if $U(y)$ is discontinuous, then the PPDE must not be applied at $q=1$; rather, (4.93) yields the field $u(q,y)$ in $[q_M,1]$. So although the PPDE does not then hold at $q=1$, we keep it and understand it as the above recipe. The derivative of the PPDE can be upheld with the above definition of the derivative field $k$, as can the PDEs for the GF $G_u$. In concrete computations with a discontinuous initial condition we shall see that this takes care of most of the problem.
5. Correlations and thermodynamical stability 5.1. Expectation values 5.1.1. Replica averages In this section we evaluate important special cases of the generalized averages (3.19) and (3.23) within Parisi's ansatz. In what follows, generically the knowledge of Q, or equivalently, in the nP0 limit, that of x(q) will be assumed. Practically, all "elds introduced above as solutions of various PDEs, for given x(q), will be considered as known and expectation values expressed in terms of those "elds. The pioneering works in this subject are that of de Almeida and Lage [27] and of MeH zard and Virasoro [28], who evaluated the average magnetization and its low-order moments in the SK model. What follows in Section 5.1 can be viewed as the generalization of the mechanism these authors uncovered. We shall call the variable y in (4.1) `local "elda. In the SK model y corresponds to the local magnetic "eld, for the neuron it is the local stability parameter, and it is useful have a name for it even in the present framework. The generic formula comprising (3.19) and (3.23) is
dLx dLy L A(x, y) exp U(y )#i xy!xQx . (5.1) ? (2p)L ? The normalizing coe$cient, analogous to the prefactors in Eqs. (3.19) and (3.23), is not included here, since in the limit nP0 it becomes unity. We shall automatically disregard such factors henceforth. Furthermore, we will take nP0 silently whenever appropriate. Dependence on U and Q is not marked on the l.h.s. The quantity (5.1) will be called the replica average of the function A(x, y). Such formulas emerge in most cases when we set out to evaluate thermodynamical quantities in or near equilibrium. [A(x, y)\"
5.1.2. Average of a function of a single local field

An important case is when the quantity to be averaged depends only on the local field y_a of a single replica. Such is the form of the distribution of stabilities given in Eq. (3.27) and of the energy (3.29). Because y_a and x_a are Fourier-conjugate variables, the expectation values of replicated x's, as in Eqs. (3.21), (3.24) and (3.25), are related to averages of products of functions of the local fields y_a. The latter are straightforward to understand once the case of a function of a single y_a is clarified. Thus we first focus on

$$C_A = \langle A(y_1)\rangle. \tag{5.2}$$

There is no loss of generality in choosing the first replica, a = 1, because RSB only affects groups of two or more replicas. Within Parisi's ansatz (4.4), C_A evaluates to a formula like the r.h.s. of (4.11), with the difference that here A(y_1) is inserted into the integrand. In analogy with (C.2) we obtain
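$$C_A = \prod_{r=0}^{R+1}\prod_{j_r=1}^{n/m_r}\int Dz^{r}_{j_r}\;A\Big(\sum_{s=0}^{R+1} z^{s}_{1}\sqrt{q_s-q_{s-1}}\Big)\exp\Big[\sum_{a=1}^{n}U\Big(\sum_{s=0}^{R+1} z^{s}_{j_s(a)}\sqrt{q_s-q_{s-1}}\Big)\Big]. \tag{5.3}$$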
Here we used the definition of j_r(a) from Eq. (C.1); in the argument of A the label j_r(1) = 1 was inserted for the z^r's. By reasoning similar to that of Section 4.1.1, again expressing the integer m_r by the real x_r from (4.12) and taking n → 0, we arrive at the recursion
$$\phi_{r-1}(y)\,t_{r-1}(y) = \int Dz\;\phi_r\big(y+z\sqrt{q_r-q_{r-1}}\big)\,t_r\big(y+z\sqrt{q_r-q_{r-1}}\big)^{x_r/x_{r+1}}, \tag{5.4a}$$
$$\phi_{R+1}(y) = A(y), \tag{5.4b}$$

while the iteration of t_r(y) is defined by Eqs. (4.15) and (4.16). The final average is obtained at r = 0 as
$$C_A = \int Dz\;\phi_0\big(z\sqrt{q_0}\big). \tag{5.5}$$
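A minimal numerical sketch of the recursion (5.4a), (5.4b) and the final step (5.5) for finite R-RSB may make the bookkeeping concrete. It rests on assumptions that are labeled in the code: the Gaussian integrals ∫Dz are replaced by a trapezoidal rule, and the t-iteration, which the text only cites as Eqs. (4.15) and (4.16), is taken to be t_{R+1} = e^U, t_{r−1}(y) = ∫Dz t_r(y + z√(q_r − q_{r−1}))^{x_r/x_{r+1}}, with q_{R+1} = x_{R+1} = 1 and an auxiliary x_{R+2} = 1. The function names are hypothetical, and the cost grows exponentially with R, so this is meant only for R = 1 or 2.

```python
import math


def gaussian_rule(n_nodes=41, half_width=8.0):
    """Trapezoidal nodes and weights approximating the Gaussian measure Dz."""
    h = 2.0 * half_width / (n_nodes - 1)
    zs = [-half_width + i * h for i in range(n_nodes)]
    ws = [h * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in zs]
    return zs, ws


def replica_average(A, U, qs, xs, n_nodes=41):
    """Sketch of C_A via (5.4a), (5.4b) and (5.5) for finite R-RSB.

    qs = [q_0, ..., q_R] increasing; q_{R+1} = 1 is appended.
    xs = [x_1, ..., x_R]; x_{R+1} = 1 and an auxiliary x_{R+2} = 1 are appended.
    ASSUMED t-iteration (stand-in for Eqs. (4.15)-(4.16), not quoted in the text):
        t_{R+1}(y) = exp(U(y)),
        t_{r-1}(y) = Int Dz t_r(y + z*sqrt(q_r - q_{r-1}))**(x_r / x_{r+1}).
    """
    zs, ws = gaussian_rule(n_nodes)
    q = list(qs) + [1.0]               # q[r] for r = 0..R+1
    x = [0.0] + list(xs) + [1.0, 1.0]  # x[r] for r = 0..R+2 (x[0] unused)
    top = len(q) - 1                   # the index R+1

    def t(r, y):
        if r == top:
            return math.exp(U(y))
        s, p = math.sqrt(q[r + 1] - q[r]), x[r + 1] / x[r + 2]
        return sum(w * t(r + 1, y + z * s) ** p for z, w in zip(zs, ws))

    def phi_t(r, y):
        """The product phi_r(y) * t_r(y); (5.4b) sets phi_{R+1} = A."""
        if r == top:
            return A(y) * math.exp(U(y))
        s, p = math.sqrt(q[r + 1] - q[r]), x[r + 1] / x[r + 2]
        # (5.4a): phi_{r+1} * t_{r+1}**p written as (phi t)_{r+1} * t_{r+1}**(p - 1)
        return sum(w * phi_t(r + 1, y + z * s) * t(r + 1, y + z * s) ** (p - 1.0)
                   for z, w in zip(zs, ws))

    # (5.5): C_A = Int Dz phi_0(z sqrt(q_0)), with phi_0 = (phi t)_0 / t_0
    s0 = math.sqrt(q[0])
    return sum(w * phi_t(0, z * s0) / t(0, z * s0) for z, w in zip(zs, ws))
```

Two checks follow from (5.4) itself: for A ≡ 1 the φ fields stay equal to 1, so C_A = 1; for U ≡ 0 all t fields equal 1 and C_A reduces to the plain Gaussian average of A with unit variance, e.g. 1 for A(y) = y². Setting x_1 = 1 with R = 1 makes the breakpoint q_1 immaterial, which is another easy consistency check.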
Using the identity (D.1) we are led to the operator form
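$$\phi_{r-1}(y)\,t_{r-1}(y) = e^{\frac12(q_r-q_{r-1})\partial_y^2}\Big[\phi_r(y)\,t_r(y)^{x_r/x_{r+1}}\Big], \tag{5.6}$$

whence by continuation it is easy to derive the PDE

$$\partial_q(\phi t) = -\tfrac12\partial_y^2(\phi t) + \frac{\dot x}{x}\,\phi\,t\ln t. \tag{5.7}$$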
In the spirit of Section 4.1.2 it is straightforward to show that this equation also holds for finite R-RSB. Then, at discontinuities of x(q), the singular second term on the r.h.s. is absorbed by the requirement that t(q, y)^{1/x(q)} be continuous in q. The initial condition for t(q, y) was given previously in (4.34b), and that for φ(q, y) is set by (5.4b) as
$$\phi(1,y) = A(y). \tag{5.8}$$
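To see where the last term of (5.7) comes from, the continuation of (5.6) can be sketched to first order in a small step dq = q_r − q_{r−1}, writing x_{r+1} = x(q), x_r ≃ x(q) − ẋ dq and t_r = t(q, y). This is a heuristic check for smooth x(q), not the careful limit referred to in Section 4.1.2:

$$
\begin{aligned}
t_r^{\,x_r/x_{r+1}} &= t\,e^{-\frac{\dot x}{x}\,dq\,\ln t} \simeq t\Big(1-\frac{\dot x}{x}\,dq\,\ln t\Big),\\
(\phi t)(q-dq,y) &\simeq \Big(1+\tfrac12\,dq\,\partial_y^2\Big)\Big[\phi t-\frac{\dot x}{x}\,dq\,\phi t\ln t\Big](q,y),\\
\partial_q(\phi t) &= -\tfrac12\partial_y^2(\phi t)+\frac{\dot x}{x}\,\phi\,t\ln t .
\end{aligned}
$$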
In Eq. (5.7) we recognize the PDE (4.60) for the field (4.59). Since we again have a product like (4.59), the field φ(q, y) here also satisfies the PDE (4.49). Thus the sought average (5.5) can be written as
$$C_A = \int Dz\;\phi\big(q_0, z\sqrt{q_0}\big), \tag{5.9}$$
a functional of U(y) and x(q), where the definition of q_0 by (4.42a) was used. A practical expression for the above average involves the adjoint field P(q, y), which obeys the PDE (4.53) and renders the formula (4.52) independent of q. Recall the abbreviation (4.92) for the Gaussian. Then (5.9) is of the form of (4.52) at q = q_0 if

$$P(q_0,y) = G(y,q_0). \tag{5.10}$$

Given the purely diffusive evolution in the interval (0, q_0), this condition means that P(0, y) is localized at y = 0, i.e., P(q, y) satisfies the SPDE (4.53), (4.54), whence we can write the expectation value in the form (4.55) as
$$C_A = \int dy\;P(1,y)\,A(y). \tag{5.11}$$
This is the main result of this section. Here the initial condition (5.8) was used, which is just the function we intended to average. The expression reveals that P(1, y) is the probability distribution of the quantity y or, for a general q, that P(q, y) is the distribution at an intermediate stage of the evolution. Note that in [18] we gave a shorter derivation of (5.11), which avoided the recursion (5.4). We take the longer route here because it generalizes straightforwardly to higher-order correlation functions.

5.1.3. Correlations of functions of local fields

The expectation value of a product of functions, each depending on a single local field variable, reads

$$C_{AB\ldots Z}(a,b,\ldots,z) = \langle A(y_a)\,B(y_b)\cdots Z(y_z)\rangle. \tag{5.12}$$

This will be called the replica correlation function, or correlator, of the functions A, B, …, Z of the respective local fields y_a, y_b, …, y_z. Its "order" is the number of different local fields it contains. A natural generalization of the observations of the previous section allows us to construct formulas for the above correlation function; this is undertaken in the present and the following two sections. Let us first consider the second-order local field correlator

$$C_{AB}(a,b) = \langle A(y_a)\,B(y_b)\rangle. \tag{5.13}$$

The Parisi ansatz allows us to parametrize C_AB by the variable q rather than by the replica indices a and b, which are remnants of the n × n matrix character of Q. This goes as follows. Fixing the replica indices a and b we obtain two iterations like (5.4), with respective initial conditions A(y) and B(y) at q = 1. We denote them by φ^A_r and φ^B_r, respectively. The iterations evolve until they reach an index r(a, b) specified by the property that the box labels coincide, j_r(a) = j_r(b), for all r ≤ r(a, b). Here we used the definition of the labels j_r(a) from Eq. (C.1): if j_r = 1, …, n/m_r label the "boxes" containing m_r replicas each, then j_r(a) is the serial number of the box containing the a-th replica.
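The box bookkeeping just described, together with the merger index r(a, b) and the relabeling (5.14) discussed next, can be sketched in a few lines of Python. As an assumption for illustration, replicas are numbered 1, …, n consecutively, so that the box of size m_r containing replica a has serial number ⌊(a − 1)/m_r⌋ + 1, and the box sizes are given as a list [m_0 = n, m_1, …, m_{R+1} = 1] of successive divisors. The function names are hypothetical.

```python
from itertools import combinations


def box_label(a, m):
    """j_r(a): serial number of the size-m box holding replica a (1-based)."""
    return (a - 1) // m + 1


def merger_index(a, b, ms):
    """r(a, b): largest r with j_r(a) == j_r(b); ms = [m_0 = n, ..., m_{R+1} = 1]."""
    return max(r for r, m in enumerate(ms) if box_label(a, m) == box_label(b, m))


def parisi_matrix(qs, ms):
    """q_ab = q_{r(a,b)} as in (5.14); qs = [q_0, ..., q_{R+1}], q_{R+1} on the diagonal."""
    n = ms[0]
    return [[qs[merger_index(a, b, ms)] for b in range(1, n + 1)]
            for a in range(1, n + 1)]


def has_tree_property(ms):
    """For every three replicas, the two smallest merger indices coincide."""
    n = ms[0]
    for a, b, c in combinations(range(1, n + 1), 3):
        r = sorted([merger_index(a, b, ms), merger_index(a, c, ms),
                    merger_index(b, c, ms)])
        if r[0] != r[1]:
            return False
    return True
```

With n = 8 and box sizes [8, 4, 2, 1] (so R = 2), replicas 1 and 2 share a box down to r = 2, replicas 1 and 3 down to r = 1, and 1 and 5 only at r = 0, while r(a, a) = R + 1 = 3. The brute-force check expresses the three-replica property stated below: two of the three merger indices always coincide and the third is not smaller.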
Thus r(a, b) marks the largest r for which the replicas a and b fall into the same box. Obviously, since the box size m_r increases with decreasing r, for any given r ≤ r(a, b) these replicas fall into the same box of size m_r. Hereafter r(a, b) will be referred to as the merger index; for a given set of m_r's of Eq. (4.5b) it is a given function of a and b.

The hierarchical organization of the replicas implies the following property. Consider three different replica indices a, b, and c. Then either all three merger indices coincide, r(a, b) = r(a, c) = r(b, c), or two of them coincide and the third one is larger, e.g., r(a, c) = r(b, c) < r(a, b). This is characteristic of tree-like structures, for example a maternal genealogical scheme. The merger index allows us to relabel the matrix elements (4.6) in the Parisi ansatz as

$$q_{r(a,b)} = q_{ab}. \tag{5.14}$$

This can be considered as the definition of r(a, b), provided that q_r uniquely determines r, that is, strict inequalities hold in (4.6). At the juncture r = r(a, b) the two iterations mentioned above, so far each obeying (5.4a), merge into one: the product of the two "incoming" fields φ^A and φ^B at r = r(a, b) gives the initial condition for the single "outgoing" iteration, denoted by φ^{AB}.
That is, for r < r(a, b) the iteration (5.4a) is again to be used for φ^{AB}_r(y), such that at r = r(a, b) it satisfies the initial condition
$$\phi^{AB}_{r(a,b)}(y) = \phi^{A}_{r(a,b)}(y)\,\phi^{B}_{r(a,b)}(y). \tag{5.15}$$

Such merging of φ fields to produce an initial condition for further evolution will turn out to be ubiquitous whenever correlators are computed. After changing from the discrete index r to the time variable q, we obtain the expectation value in a form similar to (5.9) as
$$C_{AB}(q_{r(a,b)}) = \int Dz\;\phi^{AB}\big(q_0, z\sqrt{q_0}\big). \tag{5.16}$$
Here we switched notation and denote the dependence on the initial replica indices a, b through q_{r(a,b)}. Equivalently, replacing q_{r(a,b)} by q, we get
$$C_{AB}(q) = \int dy\;P(q,y)\,\phi^{AB}(q,y) = \int dy\;P(q,y)\,\phi^{A}(q,y)\,\phi^{B}(q,y). \tag{5.17}$$
Only values of q that equal some q_r in the R-RSB ansatz, or that are limits of q_r's as R → ∞, are meaningful here. However, this expression can be understood, at least formally, for all q in the interval [0, 1].

5.1.4. Replica correlations in terms of Green functions

It is instructive to re-express the formulas for C_A and C_AB(q) in terms of GFs. Their natural generalization will yield the GF technique and the graphical representation of general correlation functions. The time evolution of the φ field can be expressed by means of the GF. Based on the relation (4.72) between P(q, y) and the GF we can write
$$C_A = \int dy\;G(0,0;1,y)\,A(y). \tag{5.18}$$
Correlators can be conveniently represented by graphs. The simple case of C_A, see Fig. 2, illustrates the graph rules. We symbolize the GF G(q_1, y_1; q_2, y_2) by a line stretching between q_1 and q_2. Appropriate integrations over the y's are understood. If q_1 = 0, the corresponding y_1 is set to zero, i.e., the integration is done after multiplication by a Dirac delta. Since this is always the case in our examples, we put no marks at q = 0. A weight function under the integral at q = 1, like A(y) in (5.18), is marked at the right end of the line. In sum, C_A is a single line between q = 0 and q = 1, labeled by A(y) at q = 1. As to the second-order correlator (5.17), based on Eqs. (4.79) and (4.80) we can write φ^A and φ^B in terms of the GF and obtain
$$C_{AB}(q) = \int dy\,dy_1\,dy_2\;G(0,0;q,y)\,G(q,y;1,y_1)\,A(y_1)\,G(q,y;1,y_2)\,B(y_2). \tag{5.19}$$
Its graphic representation, given in Fig. 3, consists of a single vertex. The third-order correlator C_ABC(a, b, c), see (5.12) for notation, can be calculated analogously. Without loss of generality we can assume that r(a, b) ≥ r(a, c) = r(b, c), and use the notation
Fig. 2. Graphical representation of Eq. (5.18) for C_A. The line corresponds to the GF associated with the field u. Its two q-coordinates are taken at the endpoints of the line and the two y-coordinates are integrated over. At q = 1 the function included in the integrand is displayed. At q = 0 the Dirac delta δ(y), understood in the integrand and forcing the zero y-argument in (5.18), is not indicated, because it is present for all correlators.

Fig. 3. The correlation function C_AB(q).
q_1 = q_{r(a,c)} ≤ q_2 = q_{r(a,b)}. The q_i's, i = 1, 2, used here should not be confused with the q_r's of (4.6) from the R-RSB scheme. In this case the two iterations (5.4a) with respective initial conditions A(y) and B(y) merge at r(a, b). Switching to the parametrization by q means that the PDE (4.49) rather than the iteration (5.4a) is to be considered. Thus (4.49) should now be used in two copies, one with initial condition φ^A(1, y) = A(y) and the other with φ^B(1, y) = B(y). They merge at q_2. That is, the "incoming" fields multiply to yield a new initial condition φ^{AB}(q_2, y) = φ^A(q_2, y) φ^B(q_2, y), as in (5.15), and hence for q_1 ≤ q ≤ q_2 the field φ^{AB}(q, y) obeys the PDE (4.49). At q_1 another merger takes place, with the incoming field φ^C(q, y). This field started from the initial condition φ^C(1, y) = C(y) and has evolved according to (4.49) until q = q_1. There the product of the two incoming fields, φ^{ABC}(q_1, y) = φ^{AB}(q_1, y) φ^C(q_1, y), becomes the initial condition at q = q_1 for the final stretch of evolution by (4.49) down to q = 0. The resulting correlator is easy to formulate in terms of GFs. Indeed, (4.80) with h ≡ 0 gives the solution of the PDE (4.49) starting from an arbitrary initial condition specified at an arbitrary time. Hence

$$\begin{aligned}C_{ABC}(q_1,q_2) &= \langle A(y_a)\,B(y_b)\,C(y_c)\rangle\\ &= \int dy_1\,dy_2\,dy_3\,dy_4\,dy_5\;P(q_1,y_1)\,G(q_1,y_1;1,y_3)\,C(y_3)\\ &\quad\times G(q_1,y_1;q_2,y_2)\,G(q_2,y_2;1,y_4)\,A(y_4)\,G(q_2,y_2;1,y_5)\,B(y_5).\end{aligned}\tag{5.20}$$

The corresponding graph, shown in Fig. 4, has two vertices. The special case r(a, b) = r(a, c) = r(b, c) corresponds to q_1 = q_2. Then we wind up with a single vertex with altogether four legs, and accordingly G(q_1, y_1; q_2, y_2) in (5.20) should be replaced by δ(y_1 − y_2).

A general correlator of local fields y_a can be represented graphically starting from the full ultrametric tree [14]. This can be visualized as a tree with R + 1 generations of branchings, having uniformly the connectivity m_r/m_{r+1} at the r-th generation. The (R+1)-th generation has n branches, to the end of each of which a "leaf" can be pinned. The leaves are labeled by the replica index a = 1, …, n. Between r = 0 and r = 1 lies the "trunk". For a (possibly large) integer number of replicas n this is a well-defined graph. If n → 0, however, the m_r's cannot be held integers and the q_r's may densely fill an interval, so the full tree loses its graphical meaning. On the other hand, the graphs representing replica correlators can be understood as subtrees of the full tree for integer n and, remarkably, they remain meaningful even after continuation.
Fig. 4. The correlation function C_ABC(q_1, q_2).
In Figs. 2–4 we illustrated the three simplest local field correlations by graphs. There a branch connecting vertices with time coordinates, say, q_1 and q_2 > q_1 was associated with G(q_1, y_1; q_2, y_2), with implied integrations over the local field coordinates. This feature also holds for higher-order correlations. As in the case explained in Section 5.1.2, the iteration (5.4), or equivalently the PDE (4.49), emerges again. Given an interval (q_1, q_2), the initial condition for a field φ is set at the upper border q_2; φ then evolves by the linearized PPDE (4.49), and the result is the solution at q_1. Since G(q_1, y_1; q_2, y_2) is the GF that produces the solution of (4.49) from a given initial condition, it is natural to associate the GF with the branch of a graph linking q_1 with q_2. Since the GF is in fact an integral kernel, integration is to be performed over the variables y_1 and y_2 at the endpoints of the branch. This automatically yields the merging of incoming fields φ at a vertex to form a new initial condition, as exemplified (before continuation) for the second-order local field correlator in Eq. (5.15). Indeed, the local field y associated with a three-legged vertex at q is the first y-argument of the two incoming GFs and the second y-argument of the one outgoing GF, so the latter evolves the product of the incoming φ fields towards decreasing times starting from q.

The graph rules for the general local field correlator C_{AB…Z}(a, b, …, z), defined by (5.12), can be summarized as follows. Draw continuous lines starting from the leaves corresponding to the replica indices a, b, …, z along branches until the trunk is reached. Lines merge occasionally, and in the end all lines meet at the trunk. The merging points are specified by the merger indices r(a, b), …, or equivalently by the values q_{r(a,b)}, … from (5.14), for each pair of the replica indices we started with.
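The graph rules can be mimicked with a short, hypothetical Python helper. Assuming, for illustration, that replica a sits in box ⌊(a − 1)/m_r⌋ + 1 at level r (box sizes [m_0 = n, …, m_{R+1} = 1]), it walks from the leaves at r = R + 1 toward the trunk and records the levels at which lines merge; each recorded level is a merger index, i.e., a vertex of the correlator's tree.

```python
def box_label(a, m):
    """Serial number of the size-m box containing replica a (1-based)."""
    return (a - 1) // m + 1


def merge_events(replicas, ms):
    """Levels r (with the groups formed there) at which lines of the tree merge.

    replicas: distinct replica indices (the leaves); ms = [m_0 = n, ..., m_{R+1} = 1].
    """
    events = []
    previous = len(replicas)            # at the leaves every line is separate
    for r in range(len(ms) - 1, -1, -1):
        boxes = {}
        for a in replicas:
            boxes.setdefault(box_label(a, ms[r]), []).append(a)
        groups = sorted(boxes.values())
        if len(groups) < previous:      # two or more lines meet at level r
            events.append((r, groups))
        previous = len(groups)
    return events
```

For example, merge_events([1, 2, 5], [8, 4, 2, 1]) returns [(2, [[1, 2], [5]]), (0, [[1, 2, 5]])]: replicas 1 and 2 merge at r(1, 2) = 2 and replica 5 joins at r(1, 5) = r(2, 5) = 0, the two-vertex graph of Fig. 4. The leaves 1, 2, 3, 4 give the symmetric four-leaf tree and 1, 2, 3, 5 the asymmetric one.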
Obviously, not all such q's for different replica index pairs from the set a, b, …, z need to be different; in the extreme case they may all be equal. From the topological viewpoint, the graph thus obtained is uniquely determined by the given set of replica indices of a correlator. The explicit dependence on the replica indices a, b, …, z is then no longer kept; instead they enter through the merger indices r(a, b), …, or equivalently q_{r(a,b)}, …. This allows us to take the n → 0 limit. In the end, the correlator becomes a function of all the q_{r(a,b)}'s that can be formed from the replica indices a, b, …, z of (5.12). Since each branch merging now has a given time value q, it is useful to include the q coordinate axis in a graph. The calculation of a correlator implies evolution by the PDE (4.49), first with different y variables along the respective branches, from the leaves towards the trunk. The functions A(y), B(y), …, Z(y) are the initial conditions of this evolution up to the respective first merging points. Whenever branches meet, say at q_i, the fields φ(q_i, y) associated with the different incoming lines multiply, all having a common local field y. This creates a new initial condition for further evolution by (4.49), from q_i onward to decreasing q's. At the last juncture, say q_1, the y-integral of the product of the incoming fields weighted with P(q_1, y) yields the correlator in question. Obviously, the branches that connect merging points can be associated with the GF G of the PDE (4.49). It follows that at a merging point of two branches the y-integral gives the vertex function of (4.85).

It should be noted that the correlator C_{AB…Z}(a, b, …, z) is now expressed as an integral in which the product A(y_a) B(y_b) ⋯ Z(y_z) appears in the integrand. Thus an average of the more general form

$$\langle A(y_a, y_b, \ldots, y_z)\rangle \tag{5.21}$$
is obtained by replacing A(y_a) B(y_b) ⋯ Z(y_z) with A(y_a, y_b, …, y_z) in that expression. We then lose the picture of φ fields evolving independently from q = 1 by the PDE (4.49) and merging at smaller q's, because the function A(y_a, y_b, …, y_z) couples the φ fields at the outset, q = 1. In what follows we will not encounter averages (5.21) of non-factorizable functions.

In summary, a given correlation function is represented by a tree, that is, a finite subtree of the full ultrametric tree. Leaves are associated with initial conditions of the evolution by (4.49). Branches directed from larger to smaller q correspond to the GF G. Each vertex, including the leaves and the bottom of the trunk, has a pair (q, y) associated with it. At the leaves q = q_{R+1} = 1, and there is an integration over y at each vertex. At q = 0 simply y = 0 is substituted into the final formula, so by (4.72) the GF of the trunk becomes just Sompolinsky's field P. The intermediate q's are the independent variables by which we characterize the correlation function. Thus a tree uniquely defines an integral expression. Furthermore, topologically identical trees correspond to the same type of integral. Of course, two topologically identical trees can have different functions associated with their respective leaves, and then the two integrals will evaluate to different results. Elementary combinatorics gives the number N(K) of topologically different trees with K leaves in terms of a recursion. Denoting the integer part of z by [z] we have

$$N(1) = 1, \tag{5.22a}$$
$$N(K) = \sum_{k=1}^{[(K-1)/2]} N(k)\,N(K-k) + \frac12\left(\left[\frac K2\right]-\left[\frac{K-1}2\right]\right) N\!\left(\frac K2\right)\left(N\!\left(\frac K2\right)+1\right). \tag{5.22b}$$

The basis of this recursion is the fact that in a tree with K leaves two subtrees meet at the trunk, one having k leaves and the other K − k. The sum is interpreted as zero for K = 2. The second term on the r.h.s. contributes only for even K; it gives the number of trees composed of two subtrees that both have K/2 leaves. Some terms generated by the above recursion are N(2) = 1, N(3) = 1, N(4) = 2, N(5) = 3, N(6) = 6, N(7) = 11, N(8) = 23. For K = 1, 2, 3 we have N(K) = 1, in accordance with our previous finding that in each of those cases there is only one graph, see Figs. 2–4. In deriving (5.22) we assumed that vertices have altogether three legs; in that case the number of vertices is K − 1. If q's coincide because branches shrink to a point, then the number of vertices decreases and vertices with more than three legs arise. The corresponding integral expressions are consistent with the graph rules laid down before. Indeed, a branch of zero length is associated with the GF as in (4.65), i.e., it gives rise to a Dirac delta equating the local fields at its two endpoints; therefore each remaining branch still represents a GF, and a vertex with more than three legs still has a single y variable to be integrated over.

5.1.5. Replica correlations of x's

Derivatives of the archetypical expression (4.1) with respect to q_ab play an important role in determining thermodynamical properties. Let us introduce the expectation values (5.1) of products of x_a's as

$$C^k_x(a_1,\ldots,a_k) = (-i)^k\,\langle x_{a_1}x_{a_2}\cdots x_{a_k}\rangle. \tag{5.23}$$

The factor (−i)^k is factored out for later convenience. This is the correlation function of order k of the variables x_{a_j}. Correlators of even order 2k are related to the derivatives of (4.1) with respect to the matrix elements q_ab as

$$C^{2k}_x(a_1,\ldots,a_{2k}) = \frac{\partial^k e^{n u[U(y),Q]}}{\partial q_{a_1a_2}\cdots\partial q_{a_{2k-1}a_{2k}}}. \tag{5.24}$$

Second-order correlators enter the stationarity conditions (3.21), (3.24) and (3.25), and fourth-order ones appear in studies of thermodynamical stability, as we shall see later. By partial integration, (5.23) can be brought to the form of an average of products of various derivatives of U(y_a):

$$C^k_x(a_1,\ldots,a_k) = \int\frac{d^n x\,d^n y}{(2\pi)^n}\,e^{\,i\mathbf{x}\cdot\mathbf{y}-\frac12\mathbf{x}\cdot Q\mathbf{x}}\;\partial_{y_{a_1}}\partial_{y_{a_2}}\cdots\partial_{y_{a_k}}\exp\Big[\sum_{a}U(y_a)\Big], \tag{5.25}$$

where coinciding replica indices give rise to higher derivatives. In the special case when all indices a_j are different, we have

$$C^k_x(a_1,\ldots,a_k) = \langle U'(y_{a_1})\,U'(y_{a_2})\cdots U'(y_{a_k})\rangle. \tag{5.26}$$
Note that in the case of a discontinuous U(y) we may not use the chain rule of differentiation. Therefore in (5.25) the derivatives should act directly on the exponential. Then, in the spirit of Section 4.2.6, we conclude that k(1, y) as defined in (4.113b) should be used in lieu of U'(y), so that the field k(q, y) defined in (4.111) evolves from q = 1 down to the first merging point on its way (the first vertex met when coming from a leaf at q = 1). In the following general treatment we assume a smooth U(y), noting that the adaptation of the results to discontinuous ones is straightforward. Expression (5.26) is of the form (5.12), so
$$C^k_x(a_1,\ldots,a_k) = C_{U'\cdots U'}(a_1,\ldots,a_k). \tag{5.27}$$

We review some low-order correlators below.
5.1.6. One- and two-replica correlators of x's

The simplest replica correlation function of x's is the average of a single x. For k = 1, Eq. (5.27) becomes independent of the single replica index and gives a formula of the type (5.11):
$$C^1_x = C_{U'} = \int dy\;P(1,y)\,U'(y). \tag{5.28}$$
Fig. 5. The graph of C^1_x is a single line.
Comparison of (4.46) and (4.49) shows that, with the present initial condition, φ(q, y) = k(q, y). Thus, recalling that P(1, y) = G(0, 0; 1, y), we alternatively get

$$C^1_x = k(0,0). \tag{5.29}$$

This is shown graphically in Fig. 5; it is a special case of Fig. 2. Let us now turn to the correlator of two x_a's as defined in (5.23). If the replica indices are different then (5.26) applies; it should be complemented to allow for coinciding indices as

$$C^2_x(a,b) = C_{U'U'}(a,b) + \delta_{ab}\,C_{U''}. \tag{5.30}$$

This function depends on the replica indices through the overlap at the merger, q = q_{r(a,b)}. The first term on the r.h.s. is a special case of the correlation function C_AB(q) given in Eq. (5.19), with A(y) = B(y) = U'(y). Note, however, that the k field satisfying (4.46) is in fact the φ of (4.49) starting from the initial condition U'(y). Therefore the two convolutions of the GF with U'(y) in (5.19) give k(q, y), and we get
$$\int dy\;P(q_{r(a,b)},y)\,k^2(q_{r(a,b)},y) \tag{5.31}$$
for the first term on the r.h.s. of Eq. (5.30). The second term there is of the type studied in Section 5.1.2; note that, by (4.48b), its initial condition is just i(1, y). Furthermore, r(a, a) = R + 1 and q_{aa} = q_{r(a,a)} = 1. In summary, for the q-dependent two-replica correlation function we obtain

$$C^2_x(q) = \begin{cases}\displaystyle\int dy\;P(q,y)\,k^2(q,y) & \text{if } q<1,\\[1ex] \displaystyle\int dy\;P(1,y)\,\big[k^2(1,y)+i(1,y)\big] & \text{if } q=1,\end{cases} \tag{5.32}$$
having omitted the subscript r(a, b) from q. Note that the second term on the r.h.s. of (5.30) contributes only at q = 1. The above formula can be abbreviated as
$$C^2_x(q) = \int dy\;P(q,y)\,\big[k^2(q,y) + \theta(q-1^-)\,i(1,y)\big], \tag{5.33}$$
where the second term is non-zero only if q = 1. We will use this shorter notation with the Heaviside function in similar cases hereafter. Fig. 6 summarizes the result graphically. As emphasized earlier, the correlator is meaningful for arguments q at the stationary q_ab's, or at their limits for n → 0. For q's where ẋ(q) ≡ 0 the extension of the correlators is not unique. For instance, we can write any q_1 < q < 1 (for finite R-RSB, q_1 = q_R; for the continuation see Section 4.2.1) in lieu of 1^- in the argument of the Heaviside function in (5.33). Note that the two-replica correlation function, like the fields obeying the PPDE and the PDEs described in Sections 4.2.2 and 4.2.3, does not have a plateau in (q_1, 1). In summary, expression (5.33) is the two-replica correlation function both for finite R-RSB and for R → ∞, at arguments q where ẋ(q) ≠ 0.

Fig. 6. The correlation function C^2_x(q).

5.1.7. Four-replica correlators

By (5.25), the native form of the four-replica average is

$$\begin{aligned}C^4_x(a,b,c,d) &= C_{U'U'U'U'}(a,b,c,d) + \big[\delta_{ab}\,C_{U''U'U'}(a,c,d) + 5\ \text{comb's}\big]\\ &\quad+ \big[\delta_{ab}\delta_{cd}\,C_{U''U''}(a,c) + 2\ \text{comb's}\big] + \big[\delta_{abc}\,C_{U'''U'}(a,d) + 3\ \text{comb's}\big] + \delta_{abcd}\,C_{U''''}.\end{aligned}\tag{5.34}$$
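One way to see the bookkeeping behind (5.34) is that each term corresponds to a set partition of the four replica indices {a, b, c, d}: a block of size s stands for s coinciding indices and carries s derivatives of U. The short Python sketch below (hypothetical helper names) enumerates the partitions and groups them by block sizes, reproducing the term counts of (5.34): one term with all indices different, 1 + 5 with one coinciding pair, 1 + 2 with two pairs, 1 + 3 with a coinciding triple, and one with all four equal.

```python
from collections import Counter


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or open a new block for it
        yield [[first]] + partition


def term_counts(indices):
    """Number of partitions for each multiset of block sizes."""
    return Counter(tuple(sorted(len(block) for block in p))
                   for p in set_partitions(list(indices)))
```

The same enumeration for two indices gives the two terms of (5.30), and the 15 partitions of four indices match the 15 terms of (5.34).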
Here "comb's" stands for combinations, and we used the shorthand notation δ_{ab…c} = 1 if all the indices a, b, …, c are equal, δ_{ab…c} = 0 otherwise. Furthermore, the abbreviation (4.105) is understood. In order to simplify notation, we switch to using q_i for the parametrization of expectation values; the q_i's should not be confused with the q_r values introduced in (4.6) for the R-RSB scheme. There are only two essentially different correlation functions, because only two topologically different trees with four leaves can be drawn; indeed, N(4) = 2, cf. Eq. (5.22). The graphs are shown in Fig. 7. They correspond to the first term on the r.h.s. of Eq. (5.34) and thus represent the case when all replica indices are different. Taking coinciding indices into account is somewhat involved both analytically and graphically, so below we give only the formulas. The graph in Fig. 7a corresponds to
$$C^4_x(q_1,q_2,q_3) = \int dy\;P(q_1,y)\,N(q_1,y;q_2)\,N(q_1,y;q_3) + \theta(q_1-1^-)\int dy\;P(1,y)\,U''''(y), \tag{5.35}$$

where

$$N(q_1,y_1;q_2) = \int dy_2\;G(q_1,y_1;q_2,y_2)\,\big[k^2(q_2,y_2) + \theta(q_2-1^-)\,U''(y_2)\big]. \tag{5.36}$$
Fig. 7. The correlation functions (a) C^4_x(q_1, q_2, q_3) of the symmetric tree and (b) C^4_x(q_1, q_2, q_3) of the asymmetric tree, when all q_i < 1 and are different from each other. The U'(y) functions at the tips of the branches at q = 1 are understood but not marked.
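The count N(4) = 2 behind the two panels of Fig. 7, and the other values listed after Eq. (5.22), can be checked with a direct Python transcription of the recursion (5.22); the function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def n_trees(K):
    """Number N(K) of topologically different trees with K leaves, Eq. (5.22)."""
    if K == 1:
        return 1
    # two subtrees with k and K - k leaves, k < K - k (empty sum for K = 2)
    total = sum(n_trees(k) * n_trees(K - k) for k in range(1, (K - 1) // 2 + 1))
    if K % 2 == 0:
        # both subtrees have K/2 leaves: unordered pairs, repetition allowed
        h = n_trees(K // 2)
        total += h * (h + 1) // 2
    return total
```

Caching keeps the recursion linear in the number of distinct K values; [n_trees(K) for K in range(1, 9)] gives 1, 1, 1, 2, 3, 6, 11, 23.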
Note that N(q_1, y_1; q_2) can be considered a generalized two-replica correlation with extra q_1, y_1 dependence, because N(0, 0; q) = C^2_x(q). The inequalities

$$q_1 \le q_2 \le 1,\qquad q_1 \le q_3 \le 1 \tag{5.37}$$

are understood, so the last term on the r.h.s. of (5.35) is non-zero only if q_i = 1, i = 1, 2, 3. The topologically asymmetric tree of Fig. 7b is associated with
$$\begin{aligned}C^4_x(q_1,q_2,q_3) &= \int dy_1\,dy_2\;P(q_1,y_1)\,k(q_1,y_1)\,G(q_1,y_1;q_2,y_2)\,k(q_2,y_2)\,N(q_2,y_2;q_3)\\ &\quad+ \theta(q_2-1^-)\int dy_1\,dy_2\;P(q_1,y_1)\,k(q_1,y_1)\,G(q_1,y_1;1,y_2)\,U'''(y_2),\end{aligned}\tag{5.38}$$

where we assume
$$q_1 \le q_2 \le q_3 \le 1 \tag{5.39}$$

but also require q_1 < 1, because the case q_1 = 1 has been settled by Eq. (5.35). In conclusion, given the GF of the linear PDE (4.49), correlation functions can in principle be calculated. Interestingly, the GF of a Fokker–Planck equation here also assumes the role of the traditional field-theoretical GF. This is an instance where a mean-field property transpires: the graphs to be calculated are all trees. It should be added that here the tree structure is a direct consequence of ultrametricity [14], and may carry over to non-mean-field-like systems with ultrametricity [227]. Such a simple form of the graphs is a priori far from obvious, since there are techniques for long-range interaction systems where diagrams with loops are present [84]. In hindsight we can say that by using the GF of a Fokker–Planck equation with a non-trivial drift term, we implicitly performed a summation of infinitely many graphs of earlier approaches.

5.2. Variations of the Parisi term

The variation of the free energy term with respect to the OPF x(q) is needed in order to formulate the stationarity conditions later, and second-order variations yield the matrix of stability against fluctuations of the OPF. In this section only the mathematical properties are investigated; their physical significance will be elucidated later.

5.2.1. First variation

The main result of Section 4 is that within the Parisi ansatz the ubiquitous term (4.1) boils down to (4.38), i.e.,

$$\lim_{n\to0} u[U(y),Q] = u[U(y),x(q)] = u(0,0). \tag{5.40}$$

In order to determine the variation of u(0, 0) in terms of x(q), we introduce small variations x → x + δx and u → u + δu and require that the varied quantities also satisfy the PPDE (4.36a), with the same initial condition (4.36b) for u + δu. Linearization of the PPDE in the variations gives

$$\partial_q\,\delta u = -\tfrac12\partial_y^2\,\delta u - x\,k\,\partial_y\,\delta u - \tfrac12\,k^2\,\delta x, \tag{5.41a}$$
$$\delta u(1,y) = 0, \tag{5.41b}$$
where k(q, y) = ∂_y u(q, y) satisfies the PDE (4.46). Eq. (5.41) is an inhomogeneous linear PDE for δu(q, y), given x(q), δx(q), and k(q, y). It has the form of the linearized PPDE with a source, (4.79). Its solution is given in (4.80), whence
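$$\delta u(q_1,y_1) = \frac12\int_{q_1}^{1}dq_2\int dy_2\;G(q_1,y_1;q_2,y_2)\,k^2(q_2,y_2)\,\delta x(q_2), \tag{5.42}$$

whence

$$\frac{\delta u(q_1,y_1)}{\delta x(q_2)} = \frac12\,\theta(q_2-q_1)\int dy_2\;G(q_1,y_1;q_2,y_2)\,k^2(q_2,y_2). \tag{5.43}$$

Thus the variation of the term (5.40) is

$$\frac{\delta u(0,0)}{\delta x(q)} = \frac12\int dy\;G(0,0;q,y)\,k^2(q,y) = \frac12\int dy\;P(q,y)\,k^2(q,y). \tag{5.44}$$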
Here we used the identity (4.72) between the GF and the field P(q, y). Interestingly, the above formula is in fact proportional to the two-replica correlation of Eq. (5.33):

$$\frac{\delta u(0,0)}{\delta x(q)} = \frac12\,C^2_x(q) \tag{5.45}$$
for q < 1. Since the correlation function can also be obtained by differentiation with respect to q_ab, Eqs. (4.1), (5.32), and (5.44) give, for q < 1,
$$\frac{\delta u(0,0)}{\delta x(q)} = \frac12\lim_{n\to0}\frac{\partial\,n u[U(y),Q]}{\partial q_{ab}}\bigg|_{q_{ab}=q}. \tag{5.46}$$

This relation tells us that if a free energy is a sum of terms (4.1), then the two stationarity conditions, one obtained by differentiation with respect to the matrix elements q_ab = q and the other by variation with respect to x(q), are equivalent. Such are the SK model, the spherical neuron, and the neuron with arbitrary, independent synapses. In the case of a discrete R-RSB scheme (4.5), variation with respect to x(q) is made under the assumption of a plateau, i.e., x(q) ≡ x, 0 < x < 1, in an interval I. The role of the variation is then taken over by the derivatives with respect to the plateau value x and to the endpoints of I. It is straightforward to show that

$$\frac{\partial u(0,0)}{\partial x} = \frac12\int_I C^2_x(q)\,dq \tag{5.47}$$

results. Since the fields P and k are purely diffusive in I, the q-integral is Gaussian. On the other hand, the derivatives with respect to the endpoints are given by C^2_x at the endpoints, due to Eqs. (5.46) and (5.45). If we work with an ansatz for the OPF that has both segments with ẋ(q) > 0 and plateaus x(q) ≡ x, 0 < x < 1, then (5.44) should be used in an interval where ẋ(q) > 0 and (5.47) along a plateau. If ẋ(q) > 0 at isolated points, as at the jumps of a finite R-RSB scheme, differentiation with respect to the location of those points results in (5.44) at those points.

5.2.2. Second variation

The stability of a thermodynamic state against fluctuations in the space of the OPF x(q), the so-called longitudinal fluctuations, can be studied through the second variation of the free energy term (5.40). Here we briefly present how the longitudinal Hessian can be calculated. In order to determine the variation of the first derivative (5.44), we should vary the fields k and P. For k we obtain by definition
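$$\frac{\delta k(q_1,y_1)}{\delta x(q_2)} = \partial_{y_1}\frac{\delta u(q_1,y_1)}{\delta x(q_2)} = \frac12\,\theta(q_2-q_1)\int dy_2\;\partial_{y_1}G(q_1,y_1;q_2,y_2)\,k^2(q_2,y_2). \tag{5.48}$$

In order to calculate the variation of the field P we need to vary the SPDE (4.53). This yields

$$\partial_q\,\delta P = \tfrac12\partial_y^2\,\delta P - x\,\partial_y(k\,\delta P) - x\,\partial_y(P\,\delta k) - \delta x\,\partial_y(kP), \tag{5.49a}$$
$$\delta P(0,y) = 0. \tag{5.49b}$$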
This can be solved by using the fact that the GF of the SPDE is the reverse of G. Thus

$$\delta P(q_1,y_1) = -\int_0^{q_1}dq\int dy\;G(q,y;q_1,y_1)\Big\{x(q)\,\partial_y\big[P(q,y)\,\delta k(q,y)\big] + \partial_y\big[P(q,y)\,k(q,y)\big]\,\delta x(q)\Big\}. \tag{5.50}$$

Hence the variation of P(q_1, y_1) with respect to x(q_2) is straightforward to obtain, where Eq. (5.48) should also be used. The above preliminaries allow us to express the second variation of the free energy functional. Varying (5.44) gives
$$\frac{\delta^2 u(0,0)}{\delta x(q_1)\,\delta x(q_2)} = \frac12\int dy\;\frac{\delta P(q_1,y)}{\delta x(q_2)}\,k^2(q_1,y) + \int dy\;P(q_1,y)\,k(q_1,y)\,\frac{\delta k(q_1,y)}{\delta x(q_2)}. \tag{5.51}$$
Substitution of the variations of P(q_1, y) and of k(q_1, y) yields after some manipulations

$$\begin{aligned}\frac{\delta^2 u(0,0)}{\delta x(q_1)\,\delta x(q_2)} &= \frac12\int dy_1\,dy_2\;\partial_{y_1}G(q_<,y_1;q_>,y_2)\,P(q_<,y_1)\,k(q_<,y_1)\,k^2(q_>,y_2)\\ &\quad+\frac14\int_0^{q_<}dq\;x(q)\int dy\,dy_1\,dy_2\;P(q,y)\,\partial_yG(q,y;q_1,y_1)\,\partial_yG(q,y;q_2,y_2)\,k^2(q_1,y_1)\,k^2(q_2,y_2),\end{aligned}\tag{5.52}$$

where

$$q_< = \min(q_1,q_2), \tag{5.53a}$$
$$q_> = \max(q_1,q_2). \tag{5.53b}$$
Note the symmetry of (5.52) under the interchange of q_1 and q_2. If we have the extremizing x(q) as well as the GF G, the latter yielding the field k by (4.81), then Eq. (5.52) is an explicit expression for the second functional derivative.
5.3. The Hessian matrix

There are results in the literature on the algebraic properties of ultrametric matrices that can be applied directly to the present problem. As we shall see below, this amounts to finding, in the state described by a general OPF $x(q)$, an explicit expression for the eigenvalues of the Hessian in the so-called replicon sector, deemed "dangerous" from the viewpoint of thermodynamical stability.

5.3.1. Ultrametric matrices

The Hessian, or stability matrix, of the free energy term (4.1) is
$$M_{\alpha\beta\gamma\delta}=\frac{\partial^2\,n\,u[U(y),Q]}{\partial q_{\alpha\beta}\,\partial q_{\gamma\delta}}\,.\tag{5.54}$$
If the replica correlations of the $x_\alpha$'s as in (5.24) are thought of as moments, then (5.54) is analogous to a cumulant and can be expressed as
$$M_{\alpha\beta\gamma\delta}=\langle x_\alpha x_\beta x_\gamma x_\delta\rangle_x-\langle x_\alpha x_\beta\rangle_x\langle x_\gamma x_\delta\rangle_x\equiv C(\alpha,\beta,\gamma,\delta)-C(\alpha,\beta)\,C(\gamma,\delta)\,.\tag{5.55}$$
The transposition symmetry of the matrix $Q$ was understood in the above definition. The Hessian (5.54) becomes a so-called ultrametric matrix [111] once the R-RSB form (4.4) for $Q$ is substituted. Note that in constructing the stability matrix we did not differentiate with respect to the indices $x_r$. Indeed, one produces the Hessian before the hierarchical form for $Q$ is substituted, and at that stage the parameters of the R-RSB scheme do not appear. We can now apply the results of the detailed study of ultrametric matrices by Temesvári et al. [111]. Such matrices have four replica indices and are, in essence, defined by the property that they exhibit the same symmetries under the interchange of indices as the Hessian (5.54) with a Parisi $Q$ matrix substituted in it. The theory was originally formulated for finite R-RSB [111], but, as we shall see, the continuation of the formulas comes naturally. First we should clarify
notation. Let us remind the reader of the merger index $r(\alpha,\beta)$ defined in the R-RSB ansatz by Eq. (5.14) in Section 5.1.5. In Ref. [111], $r(\alpha,\beta)$ was denoted by $\alpha\cap\beta$. Following the convention of [111], the elements of the ultrametric matrix $M$ can be characterized in a symmetric way by four merger indices, of which three are independent; redundancy is the price paid for a symmetric definition. The new indices are
$$r_1=r(\alpha,\beta)\,,\tag{5.56a}$$
$$r_2=r(\gamma,\delta)\,,\tag{5.56b}$$
$$r_3=\max[r(\alpha,\gamma),\,r(\alpha,\delta)]\,,\tag{5.56c}$$
$$r_4=\max[r(\beta,\gamma),\,r(\beta,\delta)]\,,\tag{5.56d}$$
whence
$$M^{r_1r_2}_{r_3r_4}\equiv M_{\alpha\beta\gamma\delta}\tag{5.57}$$
is just a relabeling of the Hessian matrix elements. According to [111] one can distinguish three main invariant subspaces, or sectors, of the space of $Q$ matrices. Here we give a loosely worded, brief account of the decomposition, emphasizing also the physical picture that emerges from comparison with earlier results on the SK model.

The longitudinal sector is spanned by Parisi matrices with the same set of indices $m_r$, or, equivalently, $x_r$ (their relation to the $m_r$ is given by (4.12)), as the matrix $Q$ that was substituted into (5.54). In the general case (without restrictions such as fixing the diagonal elements) this space has $R+1$ dimensions. The projection of the Hessian onto the longitudinal sector is an $(R+1)\times(R+1)$ matrix, whose diagonalization cannot be performed on the basis of its ultrametric symmetry alone, but must be done separately for each free energy term $u[U(y),Q]$. The longitudinal Hessian in the RPR limit is related to the Hessian of the functional $u[U(y),x(q)]$ (see Section 5.2.2). This is demonstrated by the variational stability analysis of the SK model within the continuous RSB scheme near the spin glass transition, performed in Ref. [97]: the eigenvalue equation obtained by variation was recovered by taking the RPR limit of the eigenvalue problem within the longitudinal sector of the Hessian (5.54). The longitudinal subspace can be considered the generalization of a deviation from the RS solution that itself has RS structure, i.e., of the longitudinal eigenvector of de Almeida and Thouless (AT) [96].

The second sector has been called anomalous in Ref. [111]. It may be viewed as the generalization of the second family of AT eigenvectors. The ultrametric symmetry allowed the Hessian restricted to this invariant subspace to be transformed into a quasi-diagonal form of $n-1$ blocks of $(R+1)\times(R+1)$ matrices [111].
Some of these submatrices are identical: in the generic case there are only $R$ different ones. Again, these submatrices have to be diagonalized case by case. To our knowledge no such study has been performed for $R>1$. The third is the so-called replicon sector. Here the ultrametric symmetry made it possible to diagonalize the Hessian fully, resulting in an explicit expression for the replicon eigenvalues in terms of Hessian matrix elements [111]. The replicon modes, the elements of this subspace, are the generalization of the eigenvectors of de Almeida and Thouless that destabilized the RS solution of the SK model. In other words, they can be thought of as responsible for replica symmetry breaking.
In the stability analysis by Whyte and Sherrington [11] of the 1-RSB solution of the storage problem of the spherical neuron (by Ref. [7]), it was likewise the replicon eigenvalue that caused thermodynamical instability. Note that Nieuwenhuizen [68,69] also called the replicon modes ergodons, because of their role in the breakdown of ergodicity in an RSB phase.

5.3.2. Replicons

The replicon sector has special physical significance, since in the known cases an instability there signaled the need for a higher order of R-RSB. The replicon eigenvalues of an ultrametric matrix can be written as [111]
$$\lambda_{r_1r_2r_3}=\sum_{s=r_2}^{R}\sum_{t=r_3}^{R}m_s\,m_t\left(M^{r_1r_1}_{s+1,t+1}-M^{r_1r_1}_{s+1,t}-M^{r_1r_1}_{s,t+1}+M^{r_1r_1}_{s,t}\right),\tag{5.58}$$
where $0\le r_1\le R$ and $r_1\le r_2,\,r_3\le R$. The $r_i$'s are no longer attached to replica labels as they had been in Eqs. (5.56). This discrete expression lends itself to continuation when one uses the parametrization by $q_{r_i}$ to relabel as
$$M(q_{r_1},q_{r_2},q_{r_3})\equiv M^{r_1r_1}_{r_2r_3}\,.\tag{5.59}$$
Here inequalities (5.37) are implied. Using the simpler notation $q_i$ for the parametrization, we get for the replicon eigenvalues
$$\lambda(q_1,q_2,q_3)=\int_{q_2^{+0}}^{1}dq'\int_{q_3^{+0}}^{1}dq''\,x(q')\,x(q'')\,\partial_{q'}\partial_{q''}M(q_1,q',q'')\,.\tag{5.60}$$
Comparison with the sum above shows that the inequalities $q_1\le q_2,\,q_3\le q_R$ need to hold. Of course, the eigenvalue is defined only at those $q_i$'s where $\dot x(q_i)\ne0$. Expression (5.60) is unambiguous even though the correlation functions, and hence the integrand, are ill-defined over intervals where $x(q_i)$ has a plateau. In such an interval the integrand becomes a derivative, and we define the quadrature as the difference between the values at the endpoints of the interval. Eq. (5.60) is equivalent to a formula expressed in terms of the variable $x$ that was quoted in [223]. We also call the reader's attention to the fact that the continuation of the sum (5.58) implies that, in case of ambiguity, the right-hand limits in $q$ of the partial derivatives are to be used. This distinction is generically of no importance in regions where $0<\dot x(q)<\infty$, but it must be made at steps, where the left and right limits differ. This is why the lower integration limits in (5.60) carry the superscript $+0$. To simplify notation, we hereafter often omit the mark $+0$ but understand it tacitly wherever necessary. Next we use the expression of the Hessian through correlators as given by (5.55). Inspecting how the discrete labeling was converted into the continuous parametrization, we get
$$M(q_1,q_2,q_3)=C_x(q_1,q_2,q_3)-C_x^2(q_1)\,,\tag{5.61}$$
where the fourth-order correlator defined in (5.35) appears. Hence the replicon spectrum is
$$\lambda(q_1,q_2,q_3)=\int_{q_2}^{1}dq'\int_{q_3}^{1}dq''\,x(q')\,x(q'')\,\partial_{q'}\partial_{q''}C_x(q_1,q',q'')\,.\tag{5.62}$$
From expression (5.35) for the correlator we obtain
$$\lambda(q_1,q_2,q_3)=\int dy\,P(q_1,y)\,K(q_1,y;q_2)\,K(q_1,y;q_3)\,,\tag{5.63}$$
where by definition
$$K(q_1,y_1;q_2)=\int_{q_2}^{1}dq\,x(q)\,\partial_qN(q_1,y_1;q)\,.\tag{5.64}$$
We use Eq. (5.36) for $N$ and the identity (4.88), substitute the other terms of Eq. (4.48a) for the product $x(q)\iota(q,y)$, and integrate by parts. Noting that $G_r$ satisfies the SPDE (4.53) in its hind variables, we obtain
$$K(q_1,y_1;q_2)=\int dy_2\,G_r(q_1,y_1;q_2,y_2)\,\iota(q_2,y_2)\,.\tag{5.65}$$
The replicon spectrum can equivalently be expressed through the vertex function (4.85) as
$$\lambda(q_1,q_2,q_3)=\int dy_2\,dy_3\,C_{rrr}(q_1;0,0;q_2,y_2;q_3,y_3)\,\iota(q_2,y_2)\,\iota(q_3,y_3)\,.\tag{5.66}$$
This formula can be represented graphically if we recall that, by (4.82), the field $\iota$ is produced by the GF $G_\kappa$ of the PDE (4.46a). Marking $G_\kappa$ with a dashed line, we obtain the graph of Fig. 8. We emphasize again that the solutions of the relevant PDEs, in particular the field $u(q,y)$ with its derivatives and the GFs, are assumed to be known. The correlation functions and the replicon spectrum are therefore considered resolved once they are expressed in terms of these fields and GFs.

5.3.3. A Ward–Takahashi identity

Recent results indicate the existence of an infinite series of identities among derivatives of a function of $Q$, such as a free energy term. They hold provided this term exhibits permutation symmetry in the replica indices and the derivatives are taken with a Parisi matrix substituted as argument [223,228]. An equivalent source of the same identities is a "gauge" invariance, namely, the property
Fig. 8. The replicon eigenvalue in terms of GFs. The full line is $G_r$ as before; the dashed line represents $G_\kappa$, the GF for the PDE (4.46a).
that the free energy term loses its dependence on the specific $m_r$ and $q_r$ values and ends up depending only on $x(q)$ in the $n\to0$ limit [228]. These relations can be considered analogous to the Ward–Takahashi identities (WTIs) that arise in field theory for a thermodynamical phase in which a continuous symmetry is spontaneously broken [229]. The continuous symmetry held responsible for the WTIs is the replica permutation symmetry in the $n\to0$ limit, together with the appearance of an interval in $q$ where $x(q)$ is continuous and strictly increasing [223,228]. In our case, the free energy term (4.1) is of this type, so we expect the WTIs to hold. Interestingly, the lowest-order nontrivial WTI follows easily from the results of the previous section. Let us consider the replicon eigenvalues (5.66) in the case of coinciding arguments
$$q\equiv q_1=q_2=q_3\le q_R\,.\tag{5.67}$$
The behavior of the vertex function $C_{rrr}$ for coinciding $q$-arguments follows easily from the requirement that the GFs become Dirac deltas at coinciding times. The replicon eigenvalue then assumes the form
$$\lambda(q,q,q)=\int dy\,P(q,y)\,\iota^2(q,y)\equiv\lambda(q)\,.\tag{5.68}$$
This is precisely the r.h.s. of the identity (4.88) at $q_1=0$, $y_1=0$. On the l.h.s. of the same identity we recognize, for $q<1$, the $q$-derivative of the 2nd-order correlator (5.33). Therefore
$$\lambda(q)=\dot C_x(q)\,.\tag{5.69}$$
Strictly, this formula should be taken only at $q$'s where the correlation function is defined, i.e., at $q$'s that are limits of some $q_r$'s in the Parisi scheme (4.6). Nevertheless, we find that it holds with the smooth continuation of (5.33) and (5.68) for any $0\le q<1$. This is all the more remarkable because the replicon eigenvalues were not defined for arguments larger than $q_R$. Our derivation yields just one identity out of an infinite set. Its advantage is that it uses analytic forms, and it is brief thanks to our prior knowledge of the properties of the relevant PDEs. Note that the WTI (5.69) was obtained for a mathematical abstraction, formula (4.1). It will gain physical significance once we return to thermodynamics in Sections 7 and 8.
6. Interpretation and special properties

6.1. Physical meaning of x(q)

For spin glasses it has been shown that the OPF $x(q)$ is the average probability that the overlap of two spin configurations taken from two different pure (macro)states is smaller than $q$ [110]. Furthermore, this property was found to hold naturally for combinatorial optimization problems that can be mapped to various spin glass models [14]. A similar feature evidently follows from Parisi's ansatz for $Q$ in the present neuron model, but because of its significance we briefly give the derivation. Several further consequences of the hierarchical form of $Q$, as discussed in [14], also carry over to the neuron in the case of RSB.
First let us consider the expression (4.8), in which we replace $x_i$ by 1 and $q_i$ by some function $F(q_i)$ of it. Using $m_0=n\to0$, we obtain
$$\frac1n\sum_{\alpha\ne\beta}F(q_{\alpha\beta})=-F(1)+\sum_{r=0}^{R+1}\big[F(q_r)-F(q_{r-1})\big]\,m_r\,,\tag{6.1}$$
whence, by continuation in the sense of Section 4.2.1,
$$\lim_{n\to0}\frac1n\sum_{\alpha\ne\beta}F(q_{\alpha\beta})=-F(q_{R+1})+\int_0^1dq\,\dot F(q)\,x(q)=-\int_0^1dq\,\dot x(q)\,F(q)\,.\tag{6.2}$$
Here we used the assumptions that only non-negative $q$'s are relevant and that $q_{R+1}\equiv q_D=1$. A density for the off-diagonal matrix elements of $Q$ can be obtained by substituting the Dirac delta for $F(q)$ as
$$\lim_{n\to0}\frac{2}{n(n-1)}\sum_{\alpha<\beta}\delta(q-q_{\alpha\beta})=\int_0^1dq'\,\dot x(q')\,\delta(q-q')=\dot x(q)\,.\tag{6.3}$$
Finally, let $\langle\dots\rangle_n$ denote the thermal average with $n$ replicated partition functions, also averaged over the patterns. By the definition of $q_{\alpha\beta}$, the mean probability density of overlaps $P(q)$ is then
$$P(q)=\lim_{n\to0}\frac{2}{n(n-1)}\sum_{\alpha<\beta}\big\langle\delta(q-N^{-1}\mathbf J_\alpha\cdot\mathbf J_\beta)\big\rangle_n=\lim_{n\to0}\frac{2}{n(n-1)}\sum_{\alpha<\beta}\big\langle\delta(q-q_{\alpha\beta})\big\rangle_n\,.\tag{6.4}$$
Since the quantity to be averaged on the r.h.s. does not depend exponentially on $N$, the saddle point known from the free energy calculation does not move. The average $\langle\dots\rangle_n$ can thus be obtained by simply substituting the saddle-point value into the Dirac deltas. In other words, the $\langle\dots\rangle_n$ sign can be removed, and we obtain (6.3), i.e.,
$$P(q)=\dot x(q)\,.\tag{6.5}$$
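The counting behind (6.3) to (6.5) can be illustrated on a finite Parisi matrix with integer $n$. The sketch below assumes the block convention used throughout: $n=m_0>m_1>\dots>m_{R+1}=1$, each size dividing the previous one, off-diagonal values $q_0<\dots<q_R$ and diagonal $q_{R+1}=1$. The helper names and sample numbers are illustrative, not taken from the text. For each level $r$ the code counts the fraction of off-diagonal elements with $q_{\alpha\beta}\le q_r$. For finite $n$ this fraction is $(n-m_{r+1})/(n-1)$, which formally tends to $m_{r+1}$, the value of $x(q)$ just above $q_r$, when continued to $n\to0$.

```python
import numpy as np

def merger_levels(m):
    """levels[a, b] = largest r such that replicas a, b share a block of size m[r].

    Hypothetical helper; m = [m_0 = n, m_1, ..., m_R, m_{R+1} = 1].
    """
    n = m[0]
    idx = np.arange(n)
    levels = np.zeros((n, n), dtype=int)
    for r, size in enumerate(m):
        same_block = (idx[:, None] // size) == (idx[None, :] // size)
        levels[same_block] = r          # smaller blocks overwrite larger ones
    return levels

def parisi_matrix(m, q):
    """R-RSB Parisi matrix with elements q[r(a, b)]; q[-1] is the diagonal q_{R+1}."""
    return np.asarray(q, dtype=float)[merger_levels(m)]

m = [8, 4, 2, 1]                 # n = 8, R = 2 (illustrative)
q = [0.1, 0.3, 0.6, 1.0]         # q_0 < q_1 < q_2, diagonal q_3 = 1
Q = parisi_matrix(m, q)
n = m[0]

off_diag = Q[~np.eye(n, dtype=bool)]
# Cumulative fraction of off-diagonal elements with q_ab <= q_r.
frac = [float(np.mean(off_diag <= q_r)) for q_r in q[:-1]]
# Counting: each replica has m_{r+1} - 1 partners at levels above r.
predicted = [(n - m[r + 1]) / (n - 1) for r in range(len(q) - 1)]
print(frac)
print(predicted)
```

The elements of `Q` are copied exactly from `q`, so the comparison `<=` needs no tolerance.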
The $P(q)$ considered here is not to be confused with the probability field $P(q,y)$ of Section 4.2.3. This interpretation of $x(q)$ restricts the physically relevant space to monotonic functions. A further consequence to bear in mind is that $q$'s where $P(q)=0$ have vanishing relative weight in the thermodynamic limit. So any quantity depending on $q$ carries direct physical meaning only for $q$'s where $\dot x(q)>0$. This reservation will hereafter be understood. The significance of the $x(q)$ (or $q(x)$) order parameter in long-range interaction systems extends to finite-range problems. Indeed, the "mean field" $q(x)$ also plays a role in the field theory of spin glasses, as discussed in Ref. [223]. It should be emphasized that the distribution of overlaps for a given instance of the patterns is not self-averaging. So the quenched average included in $\langle\dots\rangle_n$, and hence in the definition of $P(q)$, leads to a loss of information about the distribution of the random variable $q$.

6.2. Diagonalization of a Parisi matrix

Since the spectral properties of Parisi matrices (4.4) play an essential role in our framework, we briefly review the known results about them here (see, e.g., Refs. [230,73]). Only the case $q_{\alpha\alpha}=q_D=1$ will be considered; the extension to any diagonal is straightforward. The eigenvalue problem is
$$Q\,\mathbf v^r=\Delta_r\,\mathbf v^r\,,\tag{6.6}$$
where $r$ labels the eigenvalues and eigenvectors. The simplest eigenvector belongs to $r=0$ and has uniform elements, say $\mathbf v^0=(1,1,\dots,1)$. The $r=1$ subspace is spanned by vectors orthogonal to $\mathbf v^0$ that are uniform over the boxes of the first generation, each box having $m_1$ elements. An example is $v_a=1$ if $a=l_1m_1+1,\dots,(l_1+1)m_1$, $v_a=-1$ if $a=l_2m_1+1,\dots,(l_2+1)m_1$, with $l_1\ne l_2$ non-negative integers smaller than $n/m_1$, and $v_a=0$ for the other $a$'s. For a general $r$, the eigenvectors are uniform over boxes of size $m_r$ and orthogonal to all eigenvectors of lower index, yielding the eigenvalues
$$\Delta_r=\sum_{p=r}^{R+1}m_p\,(q_p-q_{p-1})\,,\qquad q_{-1}\equiv0\,.\tag{6.7}$$
The dimension of the space of vectors uniform in boxes of size $m_r$ is $n/m_r$, and this space is spanned by all eigenvectors of index not larger than $r$. Given that the $r=0$ eigenvalue is nondegenerate, it follows that the degeneracy of the $r$th eigenvalue, $r>0$, is $k_r=n\,(m_r^{-1}-m_{r-1}^{-1})$. Continuing (6.7) in the sense of Section 4.2.1 gives eigenvalues indexed by $q$ as
$$\Delta(q)=\int_q^1dq'\,x(q')\,.\tag{6.9}$$
In the case of finite R-RSB, comparison with (6.7) gives
$$\Delta(q_r)=\Delta_{r+1}\,.\tag{6.10}$$
Thus formula (6.9) covers both the R-RSB case and the case when $x(q)$ is made up of plateaus and curved segments. According to the conclusions of Section 6.1, the function $\Delta(q)$ is defined for all $0\le q\le1$, but it gives eigenvalues only for $q$'s where $\dot x(q)>0$. In particular, after continuation and with the notation of Section 4.2.1, $x(q)\equiv1$ in the interval $[q_R,1]$, so from (6.9) we have
$$\Delta(q)=1-q\,.\tag{6.11}$$
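A quick numerical check of Eq. (6.7) and of the degeneracies $k_r$ helps fix the conventions. The sketch below (Python/NumPy) uses the same block labeling as above: $n=m_0>m_1>\dots>m_R>m_{R+1}=1$, each dividing the previous one, values $q_0<\dots<q_R$ and diagonal $q_{R+1}=1$. The helper names and sample numbers are illustrative assumptions. It compares the exact spectrum of a small matrix with the eigenvalues $\Delta_r$ of (6.7), each repeated $k_r$ times.

```python
import numpy as np

def merger_levels(m):
    """levels[a, b] = largest r with a, b in the same block of size m[r] (hypothetical helper)."""
    n = m[0]
    idx = np.arange(n)
    levels = np.zeros((n, n), dtype=int)
    for r, size in enumerate(m):
        levels[(idx[:, None] // size) == (idx[None, :] // size)] = r
    return levels

def parisi_matrix(m, q):
    """Parisi matrix with elements q[r(a, b)], q[-1] being the diagonal."""
    return np.asarray(q, dtype=float)[merger_levels(m)]

def parisi_spectrum(m, q):
    """Eigenvalues Delta_r of Eq. (6.7) and degeneracies k_r (k_0 = 1)."""
    n = m[0]
    q_prev = [0.0] + list(q[:-1])                     # q_{p-1}, with q_{-1} = 0
    deltas = [sum(m[p] * (q[p] - q_prev[p]) for p in range(r, len(m)))
              for r in range(len(m))]
    degs = [1] + [round(n * (1 / m[r] - 1 / m[r - 1])) for r in range(1, len(m))]
    return deltas, degs

m = [8, 4, 2, 1]
q = [0.1, 0.3, 0.6, 1.0]
deltas, degs = parisi_spectrum(m, q)
numeric = np.sort(np.linalg.eigvalsh(parisi_matrix(m, q)))
predicted = np.sort(np.repeat(deltas, degs))
print(deltas, degs)        # approximately [2.6, 1.8, 1.0, 0.4] and [1, 1, 2, 4]
```

The degeneracies add up to $n$, and $\sum_rk_r\Delta_r=n$ reproduces the trace of a matrix with unit diagonal.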
While $\Delta(q_R)=1-q_R=\Delta_{R+1}$ is an eigenvalue, the $\Delta(q)$ of Eq. (6.11) does not have the meaning of an eigenvalue for $q>q_R$. The above results allow us to calculate the trace of a matrix function $F(Q)$:
$$\mathrm{Tr}\,F(Q)=\sum_{r=0}^{R+1}k_rF(\Delta_r)=n\sum_{r=0}^{R}\frac{1}{m_r}\big[F(\Delta_r)-F(\Delta_{r+1})\big]+nF(\Delta_{R+1})\,.\tag{6.12}$$
In the continuation process we obtain
$$\lim_{n\to0}\frac1n\,\mathrm{Tr}\,F(Q)=\int_0^{q_R}dq\,F'(\Delta(q))+F(\Delta(q_R))=\int_0^1dq\,\big[F'(\Delta(q))-F'(1-q)\big]+F(1)=\int_0^1dq\,F'(\Delta(q))+F(0)\,.\tag{6.13}$$
Note that, depending on $F$, not all of these alternative forms may be meaningful. For example, if $F(x)=\ln(1-x)$ the second expression is ill-defined, and if $F(x)=\ln x$ the third one is. The explicit dependence on
$q_R$ was eliminated from the second and third formulas. These expressions remain valid for finite R-RSB as well. A special case is the calculation of the determinant for (3.17c):
$$\lim_{n\to0}\frac1n\ln\det Q=\lim_{n\to0}\frac1n\,\mathrm{Tr}\ln Q=\int_0^1dq\left[\frac{1}{\Delta(q)}-\frac{1}{1-q}\right],\tag{6.14}$$
where the second formula of (6.13) was used. Since the inverse of a Parisi matrix appears in the stationarity relation (3.21), we calculate it here. The diagonalizing transformation depends only on the $m_r$'s and not on the $q_r$'s. Hence the inverse of a Parisi matrix is a Parisi matrix with the same set $\{m_r\}$, and its elements also depend only on the merger index $r(\alpha,\beta)$ introduced in (5.14). It is convenient to parametrize them by $q$ as well,
$$[Q^{-1}]_{\alpha\beta}\equiv q^{-1}\big(q_{r(\alpha,\beta)}\big)\,.\tag{6.15}$$
By continuation this defines a function $q^{-1}(q)$, which in the R-RSB scheme has plateaus within $(q_{r-1},q_r)$. Equivalently, the inverse matrix can be represented by the inverse function of $q^{-1}(q)$, denoted $x^{-1}(q)$. This is not to be confused with the inverse of $q(x)$, which is $x(q)$. The two characteristics are related through
$$x^{-1}\big(q^{-1}(q)\big)\equiv x(q)\,.\tag{6.16}$$
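On a small integer-$n$ example one can check three statements: $Q^{-1}$ is again a Parisi matrix with the same $\{m_r\}$, the spectra are reciprocal level by level, and the trace formula (6.12) with $F=\ln$ gives $\ln\det Q$. The sketch uses the same illustrative conventions and hypothetical helpers as before, repeated so that the block runs on its own. Since the eigenvalue formula (6.7) holds for any values on a given block structure, it can be applied to the elements of $Q^{-1}$ as well.

```python
import numpy as np

def merger_levels(m):
    """levels[a, b] = largest r with a, b in the same block of size m[r] (hypothetical helper)."""
    n = m[0]
    idx = np.arange(n)
    levels = np.zeros((n, n), dtype=int)
    for r, size in enumerate(m):
        levels[(idx[:, None] // size) == (idx[None, :] // size)] = r
    return levels

def parisi_matrix(m, q):
    return np.asarray(q, dtype=float)[merger_levels(m)]

def parisi_eigenvalues(m, q):
    """Delta_r = sum_{p=r}^{R+1} m_p (q_p - q_{p-1}), with q_{-1} = 0, Eq. (6.7)."""
    q_prev = [0.0] + list(q[:-1])
    return np.array([sum(m[p] * (q[p] - q_prev[p]) for p in range(r, len(m)))
                     for r in range(len(m))])

m = [8, 4, 2, 1]
q = [0.1, 0.3, 0.6, 1.0]
n = m[0]
Q = parisi_matrix(m, q)
Q_inv = np.linalg.inv(Q)
levels = merger_levels(m)

# Q^{-1} takes a single value on each merger level, i.e. it is again a Parisi matrix.
values_per_level = [Q_inv[levels == r] for r in range(len(m))]
is_parisi = all(np.allclose(v, v[0]) for v in values_per_level)
q_inv = [v[0] for v in values_per_level]      # off-diagonal values, then the diagonal

# Same eigenvectors, hence reciprocal eigenvalues level by level.
delta = parisi_eigenvalues(m, q)
reciprocal = np.allclose(parisi_eigenvalues(m, q_inv), 1.0 / delta)

# Trace formula (6.12) with F = ln, compared with a direct log-determinant.
trace_ln = n * sum((np.log(delta[r]) - np.log(delta[r + 1])) / m[r]
                   for r in range(len(m) - 1)) + n * np.log(delta[-1])
sign, logdet = np.linalg.slogdet(Q)
print(is_parisi, reciprocal, trace_ln, logdet)
```

The off-diagonal values in `q_inv` come out negative, in line with the sign in (6.18).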
This expresses the fact that in a finite R-RSB the set of $x_r$ indices is the same for $Q$ and $Q^{-1}$. The spectra are in reciprocal relation: for $q\le q_R$,
$$\Delta^{-1}\big(q^{-1}(q)\big)=1/\Delta(q)\,,\tag{6.17}$$
whence, by differentiating with (6.9) used on each side and requiring $q^{-1}(0)=0$, we arrive at
$$q^{-1}(q)=-\int_0^q\frac{dq'}{\Delta^2(q')}\,.\tag{6.18}$$
This leaves the diagonal elements $(Q^{-1})_{\alpha\alpha}=q^{-1}(1)$ of $Q^{-1}$ undetermined. They are obtained from the reciprocal relation of the respective eigenvalues of index $R+1$, yielding
$$q^{-1}(1)=\frac{1}{1-q_R}-\int_0^{q_R}\frac{dq'}{\Delta^2(q')}\,.\tag{6.19}$$
An attempt to continue $q^{-1}(q)$ between $q_R$ and 1 shows that $q^{-1}(q)$ is non-monotonic. Again, relations (6.18) and (6.19) hold both in the discrete R-RSB case and when $x(q)$ has both plateaus and curved segments. The usual reservation applies: (6.18) relates matrix elements only where $\dot x(q)>0$.

6.3. Symmetries of Parisi's PDE

A systematic procedure for identifying all continuous symmetries of a PDE is the so-called prolongation method [231]. Knowing a continuous symmetry group allows one to generate, from a given solution, a family of other solutions. Via the prolongation method we find by construction that there are altogether three one-parameter transformations leaving the PPDE (4.36) invariant. The action of these symmetries on
a solution $u(q,y)$ can be given as a one-parameter family $u(s,q,y)$ with $u(0,q,y)=u(q,y)$. These one-parameter families are
$$u_1(s,q,y)=u(q,y+s)\,,\tag{6.20a}$$
$$u_2(s,q,y)=u(q,y)+s\,,\tag{6.20b}$$
$$u_3(s,q,y)=u\big(q,y-\Delta(q)s\big)-ys+\tfrac12\Delta(q)s^2\,,\tag{6.20c}$$
where $\Delta(q)$ is defined by (6.9). That the above families solve the PPDE (4.36) whenever $u(q,y)$ does can also be shown by substitution. The additional statement, that there are no further continuous symmetries, follows from the construction of the prolongation method, which we cannot describe here. Eq. (6.20a) represents a translation in $y$, and (6.20b) a shift of the field $u$ by a constant; these symmetries are obvious. The third one, (6.20c), is less so: it combines a shift of the origin in $y$ and of the field $u$ with a "tilting" of the field $u$ in $y$. The symmetry transformation also changes the initial condition. As a forward reference we note that, for the energy term of the storage problem of a single neuron, the PPDE (7.4) has the error measure potential $V(y)$ as initial condition. The constant shift and the "tilting" in $y$ change $V(y)$ so that it no longer satisfies the properties of $V(y)$ outlined in Section 3.1. Thus uncovering the above symmetries is of little help in finding solutions to the PPDE in the neuron problem at hand. However, given the relevance of the Parisi solution to a vast class of disordered systems, we considered the symmetries worth presenting.

6.4. Spherical entropic term: a solvable case of Parisi's PDE

Most of the relevant quantities related to the spherical entropic term are straightforward to calculate. From the technical viewpoint they represent a solvable example of Parisi's framework, suitable as an exercise. Note that the general distribution (3.6) does not include the overall spherical normalization (3.5), so the results on independent synapses do not carry over.
Nevertheless, we can cast (3.17c) into the general form (4.1) with the association
$$e^{U_s(y)}=\sqrt{2\pi}\,\delta(y)\,.\tag{6.21}$$
The subscript $s$ signals that we are dealing with the entropic term of the free energy. Since we need to regularize the Dirac delta, we use a Gaussian with a small variance $\varpi$. With the notation (4.92) we have
$$U_{s\varpi}(y)=\ln\big(\sqrt{2\pi}\,G(y,\varpi)\big)=-\frac12\left(\frac{y^2}{\varpi}+\ln\varpi\right).\tag{6.22}$$
Thus
$$f_s(Q)=-(2\beta)^{-1}\ln\det Q=\lim_{\varpi\to0}\beta^{-1}n\,u[U_{s\varpi}(y),Q]\,.\tag{6.23}$$
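The tilting symmetry (6.20c) of Section 6.3 can be checked numerically on a solution that is quadratic in $y$, of the kind obtained for the entropic term in this section. The sketch makes several illustrative assumptions that are not in the text: an OPF $x(q)=q$, so that $\Delta(q)=(1-q^2)/2$; a Gaussian initial condition of variance `VAR0`; and the hypothetical function names. It writes the PPDE (4.36) as $\partial_qu=-\frac12\partial_y^2u-\frac12x(q)(\partial_yu)^2$, the form consistent with (6.25) and (7.4). The residual is evaluated by central differences for $u$, for a tilted member $u_3$, and for a deliberately wrong tilt without the $\frac12\Delta(q)s^2$ term.

```python
import numpy as np

VAR0 = 0.05                            # illustrative variance at q = 1 (small, finite)
A = np.sqrt(1.0 + 2.0 * VAR0)

def x_opf(q):
    """Illustrative monotonic OPF x(q) = q (an assumption)."""
    return q

def delta(q):
    """Delta(q) = int_q^1 x(q') dq' for x(q) = q, cf. Eq. (6.9)."""
    return 0.5 * (1.0 - q ** 2)

def int_inv_delta(q):
    """int_q^1 dq' / (VAR0 + Delta(q')) in closed form for x(q) = q."""
    return (np.log((A + 1.0) / (A - 1.0)) - np.log((A + q) / (A - q))) / A

def u(q, y):
    """Quadratic-in-y solution with u(1, y) = -(y**2 / VAR0 + ln VAR0) / 2."""
    return -0.5 * (int_inv_delta(q) + np.log(VAR0) + y ** 2 / (VAR0 + delta(q)))

def u_tilted(s, q, y):
    """Member of the family (6.20c) generated from u."""
    return u(q, y - delta(q) * s) - y * s + 0.5 * delta(q) * s ** 2

def ppde_residual(field, q, y, h=1e-4):
    """d_q u + (1/2) d_y^2 u + (1/2) x(q) (d_y u)^2 via central differences."""
    u_q = (field(q + h, y) - field(q - h, y)) / (2.0 * h)
    u_y = (field(q, y + h) - field(q, y - h)) / (2.0 * h)
    u_yy = (field(q, y + h) - 2.0 * field(q, y) + field(q, y - h)) / h ** 2
    return u_q + 0.5 * u_yy + 0.5 * x_opf(q) * u_y ** 2

points = [(0.2, -0.7), (0.5, 0.3), (0.8, 1.1)]
res_u = [ppde_residual(u, q, y) for q, y in points]
res_tilted = [ppde_residual(lambda qq, yy: u_tilted(0.6, qq, yy), q, y) for q, y in points]
# Control: dropping the Delta(q) s^2 / 2 term breaks the symmetry.
res_wrong = [ppde_residual(lambda qq, yy: u(qq, yy - delta(qq) * 0.6) - yy * 0.6, q, y)
             for q, y in points]
print(max(map(abs, res_u)), max(map(abs, res_tilted)), max(map(abs, res_wrong)))
```

The first two residuals are at the level of finite-difference error. The control differs by exactly $-\partial_q(\frac12\Delta s^2)=\frac12x(q)s^2$, about 0.14 at $q=0.8$.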
We keep $\varpi$ finite while performing the continuation, i.e., the limits $n\to0$ and $\varpi\to0$ will be interchanged. Then we need to solve the PDE (4.36a) with the initial condition
$$u_{s\varpi}(1,y)=U_{s\varpi}(y)\,.\tag{6.24}$$
This can be done by assuming that $u_{s\varpi}(q,y)$ is a quadratic polynomial in $y$. With the notation of (6.9) and $\Delta_\varpi(q)=\varpi+\Delta(q)$, the solution is
$$u_{s\varpi}(q,y)=-\frac12\left[\int_q^1\frac{dq'}{\Delta_\varpi(q')}+\ln\varpi+\frac{y^2}{\Delta_\varpi(q)}\right].\tag{6.25}$$
Hence we obtain
$$\lim_{n\to0}\frac1nf_s(Q)=f_s[x(q)]=\lim_{\varpi\to0}\beta^{-1}u_{s\varpi}(0,0)=\lim_{\varpi\to0}\left(-\frac{1}{2\beta}\right)\left[\int_0^1\frac{dq}{\Delta_\varpi(q)}+\ln\varpi\right].\tag{6.26}$$
The rightmost expression is, apart from a prefactor, equivalent to (6.14); either of them thus gives Eq. (3.17c). As described in Section 5, expectation values are calculated by using $G_r$, that is, by Eq. (4.70), the GF of the linear PDE (4.49). From the definition (4.45) and from (6.25) we have the field
$$\kappa_{s\varpi}(q,y)=-y/\Delta_\varpi(q)\,,\tag{6.27}$$
and the GF is found to be Gaussian:
$$G_{sr}(q_1,y_1;q_2,y_2)=G(A,B)\,,\tag{6.28a}$$
$$A=y_2-y_1\,\frac{\Delta_\varpi(q_2)}{\Delta_\varpi(q_1)}\,,\tag{6.28b}$$
$$B=\Delta_\varpi^2(q_2)\big[E_\varpi(q_2)-E_\varpi(q_1)\big]\,,\tag{6.28c}$$
$$E_\varpi(q)=\int_0^q\frac{dq'}{\Delta_\varpi^2(q')}\,.\tag{6.28d}$$
Note that we omitted the subscript $\varpi$ from the GF. Sompolinsky's time-dependent density is, by (4.72),
$$P_s(q,y)=G\big(y,\Delta_\varpi^2(q)E_\varpi(q)\big)\,.\tag{6.29}$$
The generalized two-replica correlator (5.36) is thus
$$N_s(q_1,y_1;q_2)=E_\varpi(q_2)-E_\varpi(q_1)+\frac{y_1^2}{\Delta_\varpi^2(q_1)}-\varpi^{-1}\theta(q_2-1^-)\,,\tag{6.30}$$
whence the correlation function (5.33) is
$$C_{sx}(q)=N_s(0,0;q)=E_\varpi(q)-\varpi^{-1}\theta(q-1^-)\,.\tag{6.31}$$
The regularizing parameter $\varpi$ can be set to zero at many places in the above formulae. An exception is the correlator at $q=1$, where the integration should first be performed in $(q_R,1)$
to get the finite result $C_{sx}(1)=E(q_R)-(1-q_R)^{-1}$. We evaluate the first of the two 4-replica correlators from (5.35) as
$$C_{sx}(q_1,q_2,q_3)=2E_\varpi^2(q_1)+\big[E_\varpi(q_2)-\varpi^{-1}\theta(q_2-1^-)\big]\big[E_\varpi(q_3)-\varpi^{-1}\theta(q_3-1^-)\big]\,.\tag{6.32}$$
The replicon eigenvalue from (5.62) is then
$$\lambda_s(q_1,q_2,q_3)=\big[\Delta_\varpi(q_2)\,\Delta_\varpi(q_3)\big]^{-1}\,,\tag{6.33}$$
independent of $q_1$. Note that the maximal argument allowed in the eigenvalue is $q_R$; if this is smaller than 1, the regularization parameter $\varpi$ can be omitted. The single WTI (5.69) can be checked directly by comparing Eqs. (6.31) and (6.33).

In this example the PPDE could be solved in closed form. The question obviously arises under what conditions the solution can be obtained analytically. It is easy to see that if the initial condition at $q=1$ is quadratic in $y$ then, for arbitrary $x(q)$, the solution is a quadratic function of $y$ with explicit $q$-dependent coefficients. We did not find other analytic solutions for a general $x(q)$. Of course, for special $x(q)$'s, such as step functions, the PPDE can be solved in closed form.

6.5. Small field expansion

The case of an overall small function $U(y)$ in (4.1) is of interest for two reasons. In the neuron problem it corresponds to the high-temperature limit, and it also yields the usual energy term of several infinite-range spin glass models. The latter feature stresses the generality of the framework discussed in this paper. We can apply a straightforward perturbation expansion by introducing a small parameter $\varepsilon$ and writing
$$u[\varepsilon U(y),Q]=\varepsilon\,u_1[U(y),Q]+\varepsilon^2u_2[U(y),Q]+O(\varepsilon^3)\,,\tag{6.34}$$
where we took into account that the $O(\varepsilon^0)$ term vanishes. The linear term is
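$$u_1[U(y),Q]=\frac1n\sum_{\alpha}^{n}\int\frac{d^nx\,d^ny}{(2\pi)^n}\,e^{i\mathbf x\cdot\mathbf y-\frac12\mathbf xQ\mathbf x}\,U(y_\alpha)=\frac1n\sum_{\alpha}^{n}\int\frac{dx\,dy}{2\pi}\,U(y)\,e^{ixy-\frac12x^2q_{\alpha\alpha}}=\frac1n\sum_{\alpha}^{n}\int Dz\,U\big(z\sqrt{q_{\alpha\alpha}}\big)\,,\tag{6.35}$$
which, for $q_{\alpha\alpha}\equiv1$, gives
$$u_1[U(y),Q]=\int Dz\,U(z)\,.\tag{6.36}$$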
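In $O(\varepsilon^2)$ we obtain
$$u_2[U(y),Q]=\frac{1}{2n}\sum_{\alpha\beta}^{n}\int\frac{d^nx\,d^ny}{(2\pi)^n}\,e^{i\mathbf x\cdot\mathbf y-\frac12\mathbf xQ\mathbf x}\,U(y_\alpha)U(y_\beta)-\frac n2u_1^2=\frac{1}{2n}\sum_{\alpha\beta}^{n}\int\frac{dx_1\,dx_2\,dy_1\,dy_2}{(2\pi)^2}\,U(y_1)U(y_2)\,e^{i\mathbf x\cdot\mathbf y-\frac12(x_1^2q_{\alpha\alpha}+x_2^2q_{\beta\beta}+2x_1x_2q_{\alpha\beta})}-\frac n2u_1^2\,.\tag{6.37}$$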
In the last expression $i\,\mathbf x\cdot\mathbf y$ is shorthand for $i(x_1y_1+x_2y_2)$. When $n\to0$ the term $n u_1^2/2$ vanishes. In the generic case $q_{\alpha\alpha}=q_D=1$ we obtain, after elementary manipulations,
$$u_2[U(y),Q]=\frac{1}{2n}\sum_{\alpha\beta}^{n}W(q_{\alpha\beta})\,,\tag{6.38}$$
where
$$W(q)=\int Dz_1\,Dz_2\,U(\mathbf n_1\cdot\mathbf z)\,U(\mathbf n_2\cdot\mathbf z)\,,\tag{6.39a}$$
$$|\mathbf n_1|=|\mathbf n_2|=1\,,\qquad\mathbf n_1\cdot\mathbf n_2=q\,.\tag{6.39b}$$
Here $\mathbf n_1\cdot\mathbf z$, etc., denote scalar products of two-dimensional vectors. In the continuous limit
$$u_2[U(y),x(q)]=\frac12\int_0^1dq\,x(q)\,\dot W(q)\tag{6.40}$$
results, where (6.2) with $q_{\alpha\alpha}\equiv1$ was used. This resolves the expansion (6.34) up to $O(\varepsilon^2)$. For the derivatives of $W(q)$, with the notation (4.105), we obtain the suggestive formula
$$W^{(k)}(q)=\int Dy_1\,Dy_2\,U^{(k)}(\mathbf n_1\cdot\mathbf y)\,U^{(k)}(\mathbf n_2\cdot\mathbf y)\,,\tag{6.41}$$
together with the condition (6.39b). This yields a simple relation between the appropriate expansion coefficients of the functions $U(y)$ and $W(q)$. Namely, if
$$W(q)=\sum_kW_k\,q^k\,,\tag{6.42}$$
then applying (6.41) with (6.39b) at $q=0$ we get
$$W_k=\frac{1}{k!}W^{(k)}(0)=\frac{1}{k!}\left(\int Dy\,U^{(k)}(y)\right)^2.\tag{6.43}$$
On the other hand, assuming that $U(y)$ does not diverge too fast for large $|y|$, we have
$$\int Dy\,U^{(k)}(y)=(-1)^k\int Dy\,U(y)\,e^{y^2/2}\,\frac{d^k}{dy^k}e^{-y^2/2}=\frac{1}{\sqrt{2^k}}\int Dy\,U(y)\,H_k\!\left(\frac{y}{\sqrt2}\right),\tag{6.44}$$
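where $H_k(y)$ is the $k$th Hermite polynomial. Hence, given the Hermite expansion of $U(y)$,
$$U(y)=\sum_kU_k\,H_k\big(y/\sqrt2\big)\,,\tag{6.45}$$
and using the orthogonality
$$\int Dy\,H_k\!\left(\frac{y}{\sqrt2}\right)H_l\!\left(\frac{y}{\sqrt2}\right)=2^k\,k!\,\delta_{kl}\,,\tag{6.46}$$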
we have for the Taylor coefficients of $W(q)$
$$W_k=k!\,2^k\,U_k^2\,.\tag{6.47}$$
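The Hermite relation (6.47) can be checked by computing $W(q)$ from (6.39a) with Gauss–Hermite quadrature and comparing with the series (6.42). In (6.39a) the arguments $\mathbf n_1\cdot\mathbf z$ and $\mathbf n_2\cdot\mathbf z$ are unit-variance Gaussians with correlation $q$. In this sketch the coefficients $U_k$ are arbitrary illustrative values. NumPy's `hermval` evaluates the physicists' $H_k$ assumed in (6.45), and `hermegauss` provides nodes and weights for the weight $e^{-z^2/2}$.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval         # physicists' H_k
from numpy.polynomial.hermite_e import hermegauss    # nodes for weight exp(-z^2/2)

Z, WTS = hermegauss(40)
WTS = WTS / np.sqrt(2.0 * np.pi)                      # weights of the measure Dz

def U_of(y, coeffs):
    """U(y) = sum_k U_k H_k(y / sqrt 2), Eq. (6.45)."""
    return hermval(y / np.sqrt(2.0), coeffs)

def W_quadrature(q, coeffs):
    """Eq. (6.39a): E[U(y1) U(y2)] for unit-variance Gaussians with correlation q."""
    z1, z2 = np.meshgrid(Z, Z, indexing="ij")
    y1 = z1
    y2 = q * z1 + np.sqrt(1.0 - q ** 2) * z2
    return np.einsum("i,j,ij->", WTS, WTS, U_of(y1, coeffs) * U_of(y2, coeffs))

def W_series(q, coeffs):
    """Eq. (6.42) with the coefficients of Eq. (6.47): W_k = k! 2^k U_k^2."""
    return sum(math.factorial(k) * 2 ** k * c ** 2 * q ** k
               for k, c in enumerate(coeffs))

coeffs = [0.3, -0.5, 0.2, 0.1]      # illustrative U_0, ..., U_3
for q in (0.0, 0.3, 0.6, 0.95):
    print(q, W_quadrature(q, coeffs), W_series(q, coeffs))
```

Flipping the sign of any $U_k$ leaves both columns unchanged, which is the non-uniqueness noted below.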
In conclusion, for a given analytic $W(q)$ with nonnegative Taylor coefficients, we can construct a $U(y)$ that reproduces $W(q)$ through expression (6.39a). The correspondence between $W(q)$ and $U(y)$ is not one-to-one, because all $U(y)$'s with Hermite coefficients $\pm U_k$ yield the same $W(q)$. The expression (6.38) is the ubiquitous form of the energy term in various SK-type models,
$$f_e(Q)=A\sum_{\alpha\beta}^{n}W(q_{\alpha\beta})\,,\tag{6.48}$$
where $A$ is a model-dependent prefactor. In particular, for the SK spin glass [14], the $p$-spin interaction [63], and Nieuwenhuizen's multi-$p$-spin interaction model [68], $W(q)$ is $q^2$, $q^p$, and $f(q)$, respectively. (The corresponding formula with $W(q)=q^2$ is the second term of the SK replica free energy in Eq. (2.8).) The latter model, which incorporates the Ising and $p$-spin models as special cases, has the $p$-spin component entering through the characteristic exchange constant $J_p$ and leads to
$$W(q)=\sum_p\frac{J_p^2}{p}\,q^p\,,\tag{6.49}$$
whence
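$$U(y)=\sum_p\frac{J_p}{p\sqrt{2^p\,(p-1)!}}\,H_p\!\left(\frac{y}{\sqrt2}\right).\tag{6.50}$$
Given $U(y)$,
$$f_e(Q)=nA\,\frac{d^2u[\varepsilon U(y),Q]}{d\varepsilon^2}\bigg|_{\varepsilon=0}\tag{6.51}$$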
relates the spin glass energy term to the general framework developed in this section. Since the Taylor coefficients of $W(q)$ in a multi-$p$-spin interaction system are necessarily nonnegative, a given $W(q)$ uniquely determines the corresponding multi-$p$-spin interaction model. Whereas the free energy term (6.34) could be evaluated without PDEs, we invoke the auxiliary $q$- and $y$-dependent fields to calculate expectation values. Since the initial condition of the PPDE (4.36), and thus its solution $u(q,y)$, is now of $O(\varepsilon)$, the nonlinear term in (4.36) can be omitted in lowest order. Writing $u(q,y)\approx\varepsilon u_1(q,y)$, and using a similar notation for the derivative fields $\kappa(q,y)$ and $\iota(q,y)$, we obtain linear diffusion equations for the fields $u_1(q,y)$, $\kappa_1(q,y)$, and $\iota_1(q,y)$. Hence
$$u_1(q,y)=\int dy'\,G(y-y',1-q)\,U(y')\,,\tag{6.52}$$
$$\kappa_1(q,y)=\int dy'\,G(y-y',1-q)\,U'(y')\,,\tag{6.53}$$
$$\iota_1(q,y)=\int dy'\,G(y-y',1-q)\,U''(y')\,.\tag{6.54}$$
In leading order the GF (4.70) is a Gaussian,
$$G_r(q_1,y_1;q_2,y_2)=G(y_2-y_1,q_2-q_1)+O(\varepsilon)\,,\tag{6.55}$$
and
$$P(q,y)=G_r(0,0;q,y)=G(y,q)+O(\varepsilon)\,.\tag{6.56}$$
Thus, by Eq. (5.33), the two-replica correlator (we treat only the case $q<1$ here) is in leading order
$$C_x(q)=\varepsilon^2\int dy_0\,dy_1\,dy_2\,G(y_0,q)\,G(y_1-y_0,1-q)\,U'(y_1)\,G(y_2-y_0,1-q)\,U'(y_2)=\varepsilon^2\,\dot W(q)\,.\tag{6.57}$$
From the nonnegativity of the Taylor coefficients of $W(q)$, see Eq. (6.49), it follows that $C_x(q)\ge0$. The replicon spectrum (5.63) can also be evaluated by noting that (5.65) now reads
$$K(q_1,y_1;q_2)=\varepsilon\,\iota_1(q_1,y_1)+O(\varepsilon^2)\,,\tag{6.58}$$
independent of $q_2$, whence in leading order
$$\lambda(q_1,q_2,q_3)=\varepsilon^2\int dy_0\,dy_1\,dy_2\,G(y_0,q_1)\,G(y_1-y_0,1-q_1)\,U''(y_1)\,G(y_2-y_0,1-q_1)\,U''(y_2)=\varepsilon^2\,\ddot W(q_1)\,.\tag{6.59}$$
Owing to the nonnegativity of the Taylor coefficients in Eq. (6.49) we have $\lambda(q_1,q_2,q_3)\ge0$. Comparison with (6.57) shows immediately that the WTI (5.69) is satisfied. Based on (6.51), the eigenvalues associated with the SK-type energy term (6.48) are $2A\ddot W(q_1)$.
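The leading-order results (6.57) and (6.59), and the WTI (5.69) that they satisfy, can be illustrated with Gaussian quadrature. This sketch assumes the illustrative choice $U(y)=y^3$, which is not from the text. Its Hermite expansion gives $W(q)=6q^3+9q$, hence $\dot W=18q^2+9$ and $\ddot W=36q$. The fields $\kappa_1$ and $\iota_1$ are computed as Gaussian smearings of $U'$ and $U''$ as in (6.53) and (6.54). We set $\varepsilon=1$, i.e. the code reports the coefficients of $\varepsilon^2$. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, WTS = hermegauss(60)
WTS = WTS / np.sqrt(2.0 * np.pi)            # E[g(z)] ~ sum(WTS * g(Z)), z ~ N(0, 1)

def gauss_avg(g, mean, var):
    """E[g(mean + sqrt(var) z)], z ~ N(0, 1); `mean` may be an array."""
    pts = np.asarray(mean, dtype=float)[..., None] + np.sqrt(var) * Z
    return g(pts) @ WTS

def smeared(g, q):
    """y -> int dy' G(y - y', 1 - q) g(y'), as in Eqs. (6.53)-(6.54)."""
    return lambda y: gauss_avg(g, y, 1.0 - q)

def correlator(dU, q):
    """Coefficient of eps^2 in Eq. (6.57): average of kappa_1^2 over G(y, q)."""
    kappa1 = smeared(dU, q)
    return gauss_avg(lambda y: kappa1(y) ** 2, 0.0, q)

def replicon(d2U, q):
    """Coefficient of eps^2 in Eq. (6.59): average of iota_1^2 over G(y, q)."""
    iota1 = smeared(d2U, q)
    return gauss_avg(lambda y: iota1(y) ** 2, 0.0, q)

dU = lambda y: 3.0 * y ** 2                 # U'(y)  for U(y) = y^3
d2U = lambda y: 6.0 * y                     # U''(y) for U(y) = y^3

q, h = 0.4, 1e-4
C, lam = correlator(dU, q), replicon(d2U, q)
dC = (correlator(dU, q + h) - correlator(dU, q - h)) / (2 * h)
print(C, 18 * q ** 2 + 9)       # Eq. (6.57): C = dW/dq
print(lam, 36 * q)              # Eq. (6.59): lambda = d^2W/dq^2
print(dC, lam)                  # WTI (5.69): dC/dq = lambda
```

For polynomial $U$ the quadrature is exact, so the only visible error is the rounding in the finite-difference derivative.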
7. The neuron: spherical synapses

Having worked out the technical tools in the previous sections, we are now in a position to apply them to the storage problem of the McCulloch–Pitts neuron.

7.1. General results

7.1.1. Free energy and stationarity condition

The free energy (3.17) can be resolved based on the results of Sections 3–6 with the substitution
$$U(y)=-\beta V(y)\,.\tag{7.1}$$
The specific formula for the free energy is one of our main results. So, however elementary the above substitution is, we collect the relevant expressions below. Introducing the field
$$f(q,y)=-\beta^{-1}u(q,y)\,,\tag{7.2}$$
we obtain from Eqs. (4.2) and (4.38) the energy contribution to the free energy term (3.17d), as a functional of the OPF $x(q)$:
$$f_e[x(q)]=\lim_{n\to0}\frac1nf_e(Q)=-\beta^{-1}u[-\beta V(y),Q]\equiv f(0,0)\,,\tag{7.3}$$
where, from (4.36), $f(q,y)$ is the solution of
$$\partial_qf=-\tfrac12\partial_y^2f+\tfrac12\beta x\,(\partial_yf)^2\,,\tag{7.4a}$$
$$f(1,y)=V(y)\,.\tag{7.4b}$$
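For a step-function OPF the PPDE (7.4) can be integrated exactly, plateau by plateau. Where $x(q)=c>0$ is constant, $w=\exp(-\beta cf)$ obeys the backward heat equation $\partial_qw=-\frac12\partial_y^2w$ (a Cole–Hopf substitution). Where $x=0$, $f$ itself obeys that equation. The sketch applies this to a one-step OPF, $x=0$ on $[0,q_0)$, $m$ on $[q_0,q_1)$, 1 on $[q_1,1]$, and obtains $f_e=f(0,0)$ of Eq. (7.3). The Gaussian averages are done by Gauss–Hermite quadrature. The smooth "soft step" error measure $V(y)$, the value of $\beta$, the overlaps and the function names are hypothetical illustration choices. Two limits serve as checks: both $q_0=q_1$ and $m=1$ must reduce to the replica-symmetric step at $q_0$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, WTS = hermegauss(61)
WTS = WTS / np.sqrt(2.0 * np.pi)            # weights of the measure Dz

def gauss_avg(g, mean, var):
    """E[g(mean + sqrt(var) z)], z ~ N(0, 1); `mean` may be an array."""
    pts = np.asarray(mean, dtype=float)[..., None] + np.sqrt(var) * Z
    return g(pts) @ WTS

def f_energy_1rsb(V, beta, q0, q1, m):
    """f(0, 0) from Eq. (7.4) for x(q) = 0 on [0, q0), m on [q0, q1), 1 on [q1, 1]."""
    def f_q1(y):                   # from q = 1 down to q1, x = 1
        return -np.log(gauss_avg(lambda t: np.exp(-beta * V(t)), y, 1.0 - q1)) / beta
    def f_q0(y):                   # from q1 down to q0, x = m
        return -np.log(gauss_avg(lambda t: np.exp(-beta * m * f_q1(t)), y, q1 - q0)) / (beta * m)
    return gauss_avg(f_q0, 0.0, q0)   # from q0 down to 0, x = 0: plain heat kernel

def f_energy_rs(V, beta, q0):
    """Replica-symmetric step: x(q) = 0 on [0, q0), 1 on [q0, 1]."""
    f_q0 = lambda y: -np.log(gauss_avg(lambda t: np.exp(-beta * V(t)), y, 1.0 - q0)) / beta
    return gauss_avg(f_q0, 0.0, q0)

V = lambda y: 1.0 / (1.0 + np.exp(2.0 * y))   # hypothetical smooth error measure
beta = 2.0
print(f_energy_1rsb(V, beta, 0.3, 0.7, 0.5))
print(f_energy_rs(V, beta, 0.3), f_energy_1rsb(V, beta, 0.3, 0.7, 1.0))
```

A smooth $V$ is used only to keep the quadrature accurate. A discontinuous error-counting potential would need a finer rule or adaptive integration.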
The analog of the function $\kappa(q,y)$ of Eq. (4.45), useful for the calculation of replica correlators, is now
$$m(q,y)=\partial_yf(q,y)=-\beta^{-1}\kappa(q,y)\,,\tag{7.5}$$
and from (4.46) we get
$$\partial_qm=-\tfrac12\partial_y^2m+\beta x\,m\,\partial_ym\,,\tag{7.6a}$$
$$m(1,y)=V'(y)\,.\tag{7.6b}$$
By introducing
$$s(q,y)=\partial_y^2f(q,y)=-\beta^{-1}\iota(q,y)\,,\tag{7.7}$$
we obtain
$$\partial_qs=-\tfrac12\partial_y^2s+\beta x\,\big(m\,\partial_ys+s^2\big)\,,\tag{7.8a}$$
$$s(1,y)=V''(y)\,.\tag{7.8b}$$
The $q$-dependent probability density $P(q,y)$, satisfying the SPDE (4.53) and (4.54), now obeys
$$\partial_qP=\tfrac12\partial_y^2P+\beta x\,\partial_y(Pm)\,,\tag{7.9a}$$
$$P(0,y)=\delta(y)\,.\tag{7.9b}$$
The entropic term (3.17c) has essentially been calculated through formula (6.14), whence we have
$$f_s[x(q)]=\lim_{n\to0}\frac1nf_s(Q)=-\frac{1}{2\beta}\int_0^1dq\left[\frac{1}{\Delta(q)}-\frac{1}{1-q}\right].\tag{7.10}$$
The specific form of the stationarity condition (3.21) follows immediately from (6.18) and (5.33) as
$$\int_0^q\frac{dq'}{\Delta^2(q')}=\alpha\beta\int dy\,P(q,y)\,m^2(q,y)\,.\tag{7.11}$$
This equation holds at the isolated $q_r$'s in an R-RSB scheme, and identically in any interval where $\dot x(q)>0$. Plateaus with values in $(0,1)$ are handled efficiently by the variational formalism of Section 7.1.3. The stationary OPF $x(q)$ should be substituted into (7.3) and (7.10). By (3.17b) these add up to the thermodynamic free energy
$$f=f_s[x(q)]+\alpha f_e[x(q)]\,,\tag{7.12}$$
where we summarize the definitions of the functionals on the r.h.s. as
$$f_s[x(q)]=-\frac{1}{2\beta}\int_0^1dq\left[\frac{1}{\Delta(q)}-\frac{1}{1-q}\right],\tag{7.13a}$$
$$f_e[x(q)]=f(0,0)\,.\tag{7.13b}$$
The distribution of local stabilities, introduced in (3.27), is an expectation value of a type calculated previously. Using Eq. (5.11) we have the simple result
$$\rho(\Delta)=\big\langle\delta(y_\alpha-\Delta)\big\rangle=\int dy\,P(1,y)\,\delta(y-\Delta)=P(1,\Delta)\,.\tag{7.14}$$
The energy can be obtained directly from this distribution by Eq. (3.29) as
$$e=\big\langle V(y_\alpha)\big\rangle=\int dy\,P(1,y)\,V(y)\,.\tag{7.15}$$
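The entropic functional (7.13a) can be integrated in closed form for a piecewise-constant $x(q)$, since $\Delta(q)$ is then piecewise linear. On the top interval, where $x=1$, $\Delta(q)=1-q$ and the integrand vanishes. A short sketch follows; the helper names and numbers are illustrative. It also checks the replica-symmetric step $x(q)=\theta(q-q_0)$, for which (7.13a) reduces to $-(2\beta)^{-1}[q_0/(1-q_0)+\ln(1-q_0)]$. A further check is that one-step OPFs with $m=1$ or $m=0$ collapse onto RS steps at $q_0$ or $q_1$.

```python
import math

def entropic_functional(q_breaks, x_vals, beta):
    """f_s[x] of Eq. (7.13a) for a piecewise-constant OPF.

    x(q) = x_vals[p] on (q_{p-1}, q_p), p = 0..R, with q_{-1} = 0 and
    q_breaks = [q_0, ..., q_R]; x(q) = 1 on (q_R, 1], where the integrand is zero.
    """
    edges = [0.0] + list(q_breaks)
    delta_right = 1.0 - q_breaks[-1]          # Delta(q_R) = 1 - q_R
    integral = 0.0                            # int_0^1 [1/Delta(q) - 1/(1-q)] dq
    for p in reversed(range(len(x_vals))):
        qa, qb, c = edges[p], edges[p + 1], x_vals[p]
        delta_left = delta_right + c * (qb - qa)   # Delta is linear with slope -c
        if c > 0:
            int_inv_delta = math.log(delta_left / delta_right) / c
        else:
            int_inv_delta = (qb - qa) / delta_right
        integral += int_inv_delta - math.log((1.0 - qa) / (1.0 - qb))
        delta_right = delta_left
    return -integral / (2.0 * beta)

def entropic_rs(q0, beta):
    """Closed form for x(q) = theta(q - q0)."""
    return -(q0 / (1.0 - q0) + math.log(1.0 - q0)) / (2.0 * beta)

beta = 1.5
print(entropic_functional([0.4], [0.0], beta), entropic_rs(0.4, beta))
print(entropic_functional([0.2, 0.6], [0.0, 0.5], beta))   # a one-step example
```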
The average of any function of D is, in general the function's average over the distribution P(1, y). This shows the physical meaning of the auxiliary variable y within the Parisi framework: it is the stability parameter, extended to any intermediary stage q, obe ying a distribution P(q, y), that becomes at q"1 the physically observable distribution P(1, y). The entropy is by (3.16), (7.12) and (7.13)
$$ s = \frac{1}{2}\int_0^1 dq\left[\frac{1}{D(q)} - \frac{1}{1-q}\right] + \alpha\beta\left[\int dy\, P(1,y)\,V(y) - f(0,0)\right]. \tag{7.16} $$
Given the monotonicity of the OPF, the Edwards–Anderson order parameter (3.31) can be cast as
$$ q_{\rm EA} = \max_x q(x). \tag{7.17} $$
This is the maximal $q$ with non-vanishing probability density, $P(q) \equiv \dot x(q) > 0$. In the notation of Section 4.2.1 we have $q_{\rm EA} = q_1$.

In summary, as we demonstrated in Section 6.1, $x(q) = \int_0^q dq'\, P(q')$, where $P(q)$ is the probability density of the overlap $q$ between two synaptic configurations. Thus $x(q)$ is monotonic and invertible, with inverse $q(x)$, allowance being made for plateaus and isolated discontinuities in these functions. The conclusion of the present section is that the equilibrium properties of the neuron model are determined by the stationary shape of $x(q)$, or of its inverse $q(x)$. These functions therefore play the role of the order parameter function, in close analogy to spin glasses [14,84].
7.1.2. Variational principle: the PPDE as external constraint
In Section 7.1.1 we gave specific forms, for the case of the Parisi ansatz, for the free energy and for the stationarity conditions of Section 3. The latter formulas were originally expressed in terms of the $Q$ matrix, whereas Section 7.1.1 uses the field $f(q,y)$, which obeys the PPDE. It is natural to ask what happens if we express the free energy in terms of $x(q)$ and look for its extremum by varying $x(q)$. This reverses the order of the original recipe. There, the stationarity condition in terms of the elements of $Q$ was taken first, and the resulting formula, Eq. (3.21), was then expressed in terms of $x(q)$. The
equivalence of these two procedures has been seen in the cases $R = 0, 1$ for spin glasses (see e.g. [230]) and for the neuron [4,6,7]. We observe that the equivalence carries over to the continuous Parisi ansatz. The proof is in principle given by Eq. (5.46). This identity states that the variation with respect to $x(q)$ is proportional to the two-replica correlation obtained by differentiation with respect to $q_{\alpha\beta}$. We will not leave the matter there, however, and will instead give a self-contained presentation of the variational theory.

We shall consider two approaches. In this section the PPDE is maintained as an external constraint. In the next section it is included in the functional to be extremized by means of a multiplier field. The variational formulation opens the way to alternative methods for finding stationary states. Given the variational free energy to be extremized, we are no longer bound to the stationarity prescription (7.11) for finding the extremum. Instead, we can choose any suitable procedure capable of locating the extremum of a functional. The free energy is then
$$ f = \max_{x(q)} f[x(q)], \tag{7.18} $$
with the free energy functional
$$ f[x(q)] = f_s[x(q)] + \alpha f_e[x(q)], \tag{7.19} $$
as defined by (7.13). Because of the $n \to 0$ limit, the maximization with respect to $x(q)$ is a transfiguration of the original minimization with respect to the matrix elements of $Q$. The PPDE (7.4a) is understood as an external constraint. In what follows its solution is assumed to be known, as are the solutions of the related PDEs of Sections 4.2.2 and 4.2.3 and the GFs of Section 4.2.4.

Following the result of Section 5.2.1, variation of the free energy gives the sum of the two-replica correlations for the entropic and the energy terms. In the special case of the spherical entropic term, the functional derivative of (7.13a) can in fact be calculated directly. This gives, of course, the same result as that obtained from the correlator (6.31). For the energy term, we write the correlator (5.33) with the notation (7.5). Recall that the entropic term was related to the generic free energy term by (6.26), while the energy term (7.3) also involved a minus sign. Finally, applying (5.45) to both the entropic and the energy terms, we get for $q < 1$
$$ F(q,[x(q)]) \equiv \frac{2}{\beta}\,\frac{\delta f[x(q)]}{\delta x(q)} = \frac{1}{\beta^2}\big[C_s(q) - \alpha\, C_e(q)\big], \tag{7.20a} $$
$$ F(q,[x(q)]) = \int_0^q \frac{dq'}{\beta^2 D(q')^2} - \alpha\int dy\, P(q,y)\, m(q,y)^2 . \tag{7.20b} $$
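The entropic functional (7.13a) and its contribution to (7.20b) can be checked numerically on a grid. The sketch below assumes the standard definition $D(q) = \int_q^1 dq'\,x(q')$, which is consistent with the scaled form (7.66d). It samples $x(q)$ at cell midpoints, with $x(q) = 1$ near $q = 1$, and works with $-\beta f_s$. It first checks that the RS step $x(q) = \theta(q - q_0)$ reproduces the closed form (7.81b). It then compares a finite-difference functional derivative with the entropic part of (7.20b), $\delta(-\beta f_s)/\delta x(q) = -\tfrac12\int_0^q dq'/D(q')^2$. The function names and the smooth test OPF $x(q) = \min(1, q/0.8)$ are hypothetical.

```python
import numpy as np

def D_midpoints(x_mid, h):
    """D(q) = ∫_q^1 x(q') dq' at cell midpoints (assumed definition of D)."""
    tail = np.cumsum((x_mid * h)[::-1])[::-1]      # integral over cells j >= k
    above = np.concatenate([tail[1:], [0.0]])      # integral over cells j > k
    return above + 0.5 * x_mid * h

def minus_beta_f_s(x_mid, h):
    """-beta f_s from Eq. (7.13a): (1/2) ∫_0^1 dq [1/D(q) - 1/(1-q)].

    x_mid holds x(q) at the midpoints of a uniform grid on [0, 1]; x must be 1
    in the last cells, where the integrand vanishes.
    """
    q = (np.arange(x_mid.size) + 0.5) * h
    return 0.5 * h * np.sum(1.0 / D_midpoints(x_mid, h) - 1.0 / (1.0 - q))

n = 20_000
h = 1.0 / n
q = (np.arange(n) + 0.5) * h

# RS step x(q) = theta(q - q0) versus the closed form (7.81b)
q0 = 0.3
x_rs = (q >= q0).astype(float)
rs_numeric = minus_beta_f_s(x_rs, h)
rs_closed = 0.5 * np.log(1.0 - q0) + q0 / (2.0 * (1.0 - q0))

# Finite-difference functional derivative for a hypothetical smooth OPF
x = np.minimum(1.0, q / 0.8)
k, eps = n // 2, 1e-2
x_pert = x.copy()
x_pert[k] += eps
fd = (minus_beta_f_s(x_pert, h) - minus_beta_f_s(x, h)) / (eps * h)
D = D_midpoints(x, h)
ref = -0.5 * (np.sum(h / D[:k] ** 2) + 0.5 * h / D[k] ** 2)   # -(1/2) ∫_0^q dq'/D^2
print(rs_numeric, rs_closed, fd, ref)
```

Multiplying `ref` by $-1/\beta$ gives $\delta f_s/\delta x(q) = (2\beta)^{-1}\int_0^q dq'/D(q')^2$. This is $\beta/2$ times the entropic term of $F$ in (7.20b).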
We have not displayed the functional dependence on the OPF in the correlators, but we do display it in the functional derivative for clarity. In the second correlator in (7.20a), the subscript $e$ signals that it comes from the energy term (7.13b). We nevertheless omit that subscript from the related fields $P(q,y)$ and $k(q,y)$ introduced in the previous section. When $x(q)$ can be varied freely, the stationarity condition is
$$ F(q,[x(q)]) = 0, \tag{7.21} $$
and thus (7.11) is recovered. For stationarity in a discrete $R$-RSB scheme (4.5), (5.44) must vanish at each $q_r$, $r = 0,\dots,R$. This, however, gives only $R+1$ equations, which are insufficient to determine all the $x_r$'s and $q_r$'s. Suppose instead that $x(q)$ is varied under the assumption that $x(q) \equiv x$ in an interval $I$ with $0 < x < 1$, i.e., that there is a nontrivial plateau in $I$. Then (5.47) yields
$$ \int_I dq\, F(q,[x(q)]) = 0 \tag{7.22} $$
as the stationarity condition. Thus, within an $R$-RSB scheme, (7.22) should hold in each interval $(q_r, q_{r+1})$, $r = 0,\dots,R-1$. If the stationary OPF has both parts with $\dot x(q) > 0$ and parts with $x(q) \equiv x \neq 0, 1$, then (7.21) and (7.22) apply in the respective intervals of $q$, and (7.21) applies at the jumps between plateaus. We will see that such a phase does arise in the neuron. Its $x(q)$ is concatenated from a plateau with a nontrivial plateau value and a strictly increasing segment.
7.1.3. Variational principle: inclusion of the PPDE
Sommers and Dupont [29] introduced a variational formalism for the Ising spin glass by including the PPDE in the free energy functional with a Lagrange multiplier field. The latter turned out to be the field satisfying the SPDE, and it could be interpreted as the probability density of the local magnetic field. Their free energy functional also depended on an auxiliary function $\Delta(x)$, with respect to which it had to be varied. Because of an additional relation between $q(x)$ and $\Delta(x)$, this function turned out not to bring new degrees of freedom into play.

In contrast to former studies of the SK [29] and Little–Hopfield [31] models, we did not find it necessary to introduce an additional function analogous to $\Delta(x)$. We surmise that the reason is our choice of $x(q)$ as the order parameter function. It has an immediate physical meaning, as demonstrated in Section 6.1, so no room remains for the 'gauge' invariance inherent in the traditional approach [29]. Moreover, the continuous spectrum (6.9) of the $Q$ matrix turned out to be proportional to the auxiliary function $\Delta(x(q))$ of Ref. [29], a relation the cited authors also found. Introducing the latter as an independent field to be varied therefore brings no technical simplification. It should be emphasized that the 'gauge' invariance appeared to be of limited significance only as far as the stationarity criterion was concerned.
It is, however, important for fluctuations of $Q$ that violate Parisi's ansatz, and it is the source of the WTIs [228]. Following Sommers and Dupont, we include the condition (7.4) in the free energy functional as a constraint. Enforcing the PDE (7.4a) gives rise to a Lagrange multiplier field $P(q,y)$, while the initial condition (7.4b) has to be set separately. The result is
$$ f = \max_{x(q)}\ \operatorname*{extr}_{f(q,y),\,P(q,y)}\ f[x(q), f(q,y), P(q,y)], \tag{7.23} $$
where the functional on the r.h.s. is
$$ f[\cdots] = f_s[\cdots] + \alpha\big(f_e[\cdots] + f_{\rm aux,1}[\cdots] + f_{\rm aux,2}[\cdots]\big), \tag{7.24a} $$
$$ f_s[\cdots] = -\frac{1}{2\beta}\int_0^1 dq\left[\frac{1}{D(q)} - \frac{1}{1-q}\right], \tag{7.24b} $$
$$ f_e[\cdots] = f(0,0), \tag{7.24c} $$
$$ f_{\rm aux,1}[\cdots] = \int_0^1 dq\int dy\, P(q,y)\left[\partial_q f(q,y) + \tfrac{1}{2}\,\partial_y^2 f(q,y) - \tfrac{\beta}{2}\,x(q)\,\big(\partial_y f(q,y)\big)^2\right], \tag{7.24d} $$
$$ f_{\rm aux,2}[\cdots] = \int dy\, P(1,y)\,\big[V(y) - f(1,y)\big]. \tag{7.24e} $$
The functional dependence on the appropriate arguments is marked by $[\cdots]$. There is no physical restriction on the type of extremum with respect to the auxiliary fields $f(q,y)$ and $P(q,y)$, so we keep the more general 'extr' condition. The auxiliary functional (7.24d) enforces the PDE (7.4a). The form of (7.24e) can be understood by imposing the initial condition on the PDE, which is done by adding the term
$$ \delta(q-1)\,\big[V(y) - f(q,y)\big] \tag{7.25} $$
to the l.h.s. of (7.4a). As before, the ambiguity of the Dirac delta centred at $q = 1$ is resolved by using $\delta(q - 1^-)$ whenever necessary. The sign of expression (7.25) matters: this choice forces the right initial condition no matter what $f(1,y)$ was before. This can be shown by considering an infinitesimal decrement of $q$ from 1 in the PDE complemented by (7.25).

The two auxiliary terms (7.24d) and (7.24e) can be concatenated. Variation with respect to $P(q,y)$ then gives the PDE (7.4) for $f(q,y)$, initial condition included. For clarity we keep (7.24e), which specifies the initial condition, separate. The terms (7.24b) and (7.24c) are identical to (7.10) and (7.3), respectively. Since the Lagrange term imposes the constraint on $f(q,y)$, one should vary $f(q,y)$ independently. This yields the PDE (7.9) for $P(q,y)$, including the initial condition, with the notation (7.5). Variation with respect to $x(q)$ can then be done with $f(q,y)$ and $P(q,y)$ held fixed, and we find that
$$ \frac{\delta f[x(q), f(q,y), P(q,y)]}{\delta x(q)} \tag{7.26} $$
equals the functional derivative $\delta f[x(q)]/\delta x(q)$ in (7.20a). The free energy functional $f[\cdots]$ and the auxiliary field $f(q,y)$ share a symbol, but their arguments distinguish them, so this should not cause confusion.

The variational free energy with the PPDE included as a constraint was one of our main results in Ref. [17]. The variational formalism is very useful for describing equilibrium properties. However, it does not account for those fluctuations of the matrix elements of $Q$ that cannot be captured by the OPF $x(q)$. To study thermodynamical stability, we therefore need the more general framework of Section 5.
7.1.4. On thermodynamical stability
Based on the formulas derived in Section 5, we can give an explicit expression for the replicon spectrum in terms of $q,y$-dependent fields. We treat only the spherical neuron explicitly; the generalization to arbitrary independent synapses is, in principle, straightforward. As a function of the $Q$ matrix, the free energy is the sum of the entropic and the energy terms. Since both undergo the same scheme of spontaneous RSB, their Hessians can be quasi-diagonalized simultaneously, based solely on the ultrametric symmetry of the Hessian
(see Section 6.4). This yields $(R+1)\times(R+1)$ matrices along the diagonal in the longitudinal-anomalous sector, and the replicon eigenvalues as diagonal elements in the replicon sector. Hence a replicon eigenvalue of the complete Hessian is the sum of two eigenvalues, one from the entropic term and one from the energy term. We do not deal with the longitudinal-anomalous sector in the general case, mainly because its complete diagonalization depends on the specific system under consideration. The entropic eigenvalue was calculated in Section 5.1.4 as (6.33), so we have
$$ \lambda_s(q_1,q_2,q_3) = \big[D(q_2)\,D(q_3)\big]^{-1}. \tag{7.27} $$
To obtain the contribution from the energy term, we introduce the GF for the field $f$. Using the fact that $f$ and $u$ are proportional, we obtain
$$ G(q_1,y_1;q_2,y_2) = \frac{\delta f(q_1,y_1)}{\delta f(q_2,y_2)} = \frac{\delta u(q_1,y_1)}{\delta u(q_2,y_2)} = G_u(q_1,y_1;q_2,y_2). \tag{7.28} $$
On the l.h.s. we have omitted the subscript $f$, which we take as the default. By (4.85) we then obtain the vertex function $\Gamma$. The eigenvalue from the energy term is given by (5.66) with the substitution $i(q,y) = -\beta s(q,y)$, where $s(q,y)$ satisfies the PDE (7.8a). This yields
$$ \lambda_e(q_1,q_2,q_3) = -\beta^2 \int dy_2\, dy_3\; \Gamma(q_1;0,0;q_2,y_2;q_3,y_3)\, s(q_2,y_2)\, s(q_3,y_3). \tag{7.29} $$
The final formula for the replicon spectrum is thus
$$ \lambda(q_1,q_2,q_3) = \lambda_s(q_1,q_2,q_3) + \alpha\,\lambda_e(q_1,q_2,q_3). \tag{7.30} $$
Here the solutions of the relevant PDEs were assumed to be known. The WTI discussed in Section 5.3.3 implies the existence of zero modes. Since the functional derivative (7.20) is made up of two-replica correlators, (5.69) gives
$$ \lambda(q,q,q) = \beta^2\,\dot F(q,[\cdots]) \equiv \lambda(q) \tag{7.31} $$
as the WTI for the spherical neuron. Here the dot denotes the derivative with respect to the explicit $q$-dependence. Stationarity on strictly increasing segments of $x(q)$ implies that the r.h.s. vanishes, so the eigenvalue is zero for such $q$'s. In an $R$-RSB scheme, however, stationarity at the $q_r$'s does not imply that (7.31) vanishes. Section 5.3.3 quotes the interpretation of the WTI as a consequence of spontaneously broken permutation symmetry. On that basis, the zero modes found here can be regarded as Goldstone modes of the symmetry-broken phase. Deciding on the thermodynamical (linear) stability of a stationary $x(q)$ requires an analysis of the full replicon spectrum.
7.1.5. Main types of the OPF
Studies of various disordered systems with long-range interactions have found that only a few main types of OPF $x(q)$ both satisfy the stationarity condition and are at least marginally stable [84]. Below we review those that appear in the storage problem. The $R$-RSB ansatz (see Eq. (4.21)) has been shown to describe thermodynamical equilibrium for $R = 0$ and $R = 1$ in several different systems, within some parameter range. The former is the RS state and the latter the 1-RSB state. Interestingly, we have not found any example in the literature where an $R$-RSB state with
$R > 1$ described thermodynamical equilibrium. In the storage problem, Whyte and Sherrington [11] have shown that at $T = 0$ all finite $R$-RSB solutions are unstable. The eigenvalue causing the instability is in fact $-\infty$. This is a typical $T = 0$ phenomenon, also observed for such eigenvalues in the SK model.

As for CRSB states, the shape of the OPF corresponding to the phase Parisi discovered in the SK model is displayed in (4.44). In the nomenclature of [232] this is the SG-I state. The storage problem also gives rise to another type of phase, in which the OPF is a concatenation of a 1-RSB plateau and a strictly increasing segment. It has the form
$$ x(q) = \begin{cases} 1 & \text{if } q_1 \le q \le 1, \\ x_c(q) & \text{if } q_{\rm p} \le q < q_1, \\ x_{\rm p} & \text{if } q_0 < q < q_{\rm p}, \\ 0 & \text{if } 0 \le q < q_0. \end{cases} \tag{7.32} $$
Such an OPF has been observed in spin glasses whose spins have more than two states, such as the Potts model [25]. This type of continuous OPF with a plateau has been termed SG-IV in Ref. [232].
7.1.6. Stationarity and its consequences
The stationarity conditions displayed in Section 7.1.2 can be cast in more useful forms. First, note that the entropic term is also of the generic form (4.1), as shown in Eq. (6.23) of Section 6.4. We therefore formulate stationarity in terms of the correlators in (7.20b). The fields for the energy term are left unlabeled. The fields belonging to the entropic term, treated in Section 6.4, now carry the subscript $s$, as in $u_s$, $k_s$, $i_s$, and $P_s$. The stationarity conditions for the regions of positive $P(q)$ can be cast into an equation that holds for all $q$'s:
$$ \int_0^q dq'\, \dot x(q')\, F[q',x(q)] \equiv 0, \tag{7.33} $$
where $F$ was defined in Eq. (7.20). Indeed, $F = 0$ must hold unless $P(q) = \dot x(q) = 0$. The lower limit of integration can safely be chosen as zero. Alternatively, there is a combined stationarity condition that contains the requirements for both plateaus and smooth segments of $x(q)$. It can be imposed only at $q$'s where $P(q) > 0$, and reads
$$ \int_0^q dq'\, x(q')\, F[q',x(q)] \equiv 0. \tag{7.34} $$
Next we summarize a few identities for the fields in the energy term that follow from the PDEs of Section 7.1.1:
$$ \frac{d}{dq}\int dy\, P(q,y)\, f(q,y) = -\frac{1}{2}\,\beta x(q)\int dy\, P(q,y)\, m(q,y)^2, \tag{7.35a} $$
$$ \frac{d}{dq}\int dy\, P(q,y)\, m(q,y) = 0, \tag{7.35b} $$
$$ \frac{d}{dq}\int dy\, P(q,y)\, s(q,y) = \beta x(q)\int dy\, P(q,y)\, s(q,y)^2, \tag{7.35c} $$
$$ \frac{d}{dq}\int dy\, P(q,y)\, m(q,y)^2 = \int dy\, P(q,y)\, s(q,y)^2. \tag{7.35d} $$
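Identities (7.35b) and (7.35d) are easy to check numerically in the simplest setting, $x(q) \equiv 0$ on the whole interval. The PPDE and SPDE then reduce to linear diffusion. $P(q,\cdot)$ is a centred Gaussian of variance $q$, and $m(q,y) = \int Dz\, V'(y + z\sqrt{1-q})$. The sketch assumes a smooth illustrative potential $V(y) = \ln(1 + e^{-y})$, which is not taken from the text, and uses NumPy's Gauss–Hermite rule for the Gaussian averages. It is a consistency check, not a solver for the general PPDE.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite rule for a standard normal variable (weights normalized to sum to 1)
Z, WZ = hermegauss(80)
WZ = WZ / np.sqrt(2.0 * np.pi)

def dV(y):
    """V'(y) for the illustrative smooth potential V(y) = ln(1 + exp(-y))."""
    return -1.0 / (1.0 + np.exp(y))

def d2V(y):
    """V''(y) for the same potential."""
    e = np.exp(y)
    return e / (1.0 + e) ** 2

def plateau_moments(q):
    """Return (∫P m, ∫P m^2, ∫P s^2) at overlap q, assuming x(q) = 0 on [0, 1].

    P(q, .) is then a centred Gaussian of variance q (Eq. (7.9) with x = 0), and
    m, s follow from V', V'' by diffusion over the remaining variance 1 - q.
    """
    y = Z * np.sqrt(q)                                   # quadrature points of P(q, y)
    arg = y[:, None] + Z[None, :] * np.sqrt(1.0 - q)
    m = dV(arg) @ WZ                                     # m(q, y)
    s = d2V(arg) @ WZ                                    # s(q, y) = ∂_y m(q, y)
    return WZ @ m, WZ @ m ** 2, WZ @ s ** 2

for q in (0.2, 0.5, 0.8):
    print(q, plateau_moments(q))
```

The first moment should not depend on $q$, as (7.35b) states. The numerical $q$-derivative of the second moment should match the third moment, as (7.35d) states.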
Similar identities for the fields $u_s$, $k_s$, $i_s$, and $P_s$ of the spherical entropic term follow naturally when the factors $-\beta$ are also erased. Note that the PPDE (7.4) was used in deriving (7.35a). As discussed in Section 4.2.6, the PPDE is invalid at $q = 1$ for a discontinuous potential $V(y)$. Hence (7.35a) holds only for $q$'s where $f(q,y)$ is smooth in $y$, generically for $q < 1$. Let us consider the integral of (7.35a),
$$ \int dy\, P(q,y)\, f(q,y) - f(0,0) = -\frac{1}{2}\,\beta\int_0^q dq'\, x(q')\int dy\, P(q',y)\, m(q',y)^2. \tag{7.36} $$
Suppose that the PPDE (7.4a) holds for any $q < 1$. Suppose further that both sides are continuous in $q$ at $q = 1$, a condition met if $\beta$ is finite. By the first assumption, (7.36) holds for any $q < 1$; by continuity, it also holds at $q = 1$.

First we consider (7.33). We integrate it by parts and recall (7.20), where $F$ was a sum of two correlation functions. The relations (7.35c) and (7.35d) then let us express the $x(q)\dot F$ term as a derivative in $q$. The result is
$$ \beta\int_0^q dq'\, \dot x(q')\, F[q',x(q)] = \beta x(q)\, F[q,x(q)] + \Big[\int dy\,\big(\beta^{-1} P_s(q',y)\, i_s(q',y) + \alpha\, P(q',y)\, s(q',y)\big)\Big]_{q'=0}^{q'=q} = 0. \tag{7.37} $$
The subscript $s$ refers to the fields related to the entropic term, discussed in Section 6.4. Eq. (6.27) implies
$$ i_s(q,y) = \partial_y k_s(q,y) = -1/D(q). \tag{7.38} $$
Hence
$$ -\beta x(q)\, F[q,x(q)] + \frac{1}{\beta D(q)} - \alpha\int dy\, P(q,y)\, s(q,y) = \frac{1}{\beta D(0)} - \alpha\, s(0,0), \tag{7.39} $$
where we used $P(0,y) = \delta(y)$. For $P(q) = \dot x(q) > 0$ the first term on the l.h.s. vanishes by (7.21), so the remaining terms are constant for such $q$'s. This constant is non-trivial, in contrast to some spin models [29,31], where the analogous constant vanishes. The form (7.34) immediately suggests using (7.36). For any $q$ with $P(q) > 0$, this yields the following expression for the free energy:
$$ f = \beta^{-1} u_s(0,0) + \alpha f(0,0) = \beta^{-1}\int dy\, P_s(q,y)\, u_s(q,y) + \alpha\int dy\, P(q,y)\, f(q,y). \tag{7.40} $$
This equation remains true in the interval $[0,q_0]$, where $x(q) \equiv 0$. The fields $P$ and $f$ are thus mutually adjoint. Substituting $q = 0$ recovers the standard expression (7.12). From Section 6.4 we can recalculate the spherical contribution,
$$ \beta^{-1}\int dy\, P_s(q,y)\, u_s(q,y) = \frac{1}{2\beta}\int_0^q dq'\,\frac{D(q') - D(q)}{D(q')^2} + \beta^{-1} u_s(0,0), \tag{7.41} $$
which can be substituted into (7.40) to give a more useful formula. Note that (7.40) is not an alternative form of the free energy functional; it is an expression for the free energy at stationarity. Differentiating (7.11) yields further stationarity conditions, valid only in intervals where $P(q) > 0$ and not at isolated points. We display the first one:
$$ \dot F[q,x(q)] = \frac{1}{\beta^2 D(q)^2} - \alpha\int dy\, P(q,y)\, s(q,y)^2 = 0, \tag{7.42} $$
where (7.35d) was used. The same expression, without being set to zero, is useful in $R$-RSB schemes. By Eq. (7.31) it represents the replicon eigenvalue with coinciding $q$ arguments, which we denote by
$$ \lambda(q) = \frac{1}{D(q)^2} - \alpha\beta^2\int dy\, P(q,y)\, s(q,y)^2. \tag{7.43} $$
For $R = 0$ (the RS solution) this is the AT eigenvalue at $q = q_0$. For $R = 1$ it gives, at $q = q_1$, the typically most dangerous eigenvalue, which is responsible for destabilizing the 1-RSB state.
7.1.7. The entropy
Using the identity (7.36), the entropy (7.16) can be cast into the alternative form
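$$ s = \frac{1}{2}\int_0^1 dq\left[\frac{1}{D(q)} - \frac{1}{1-q} - \alpha\beta^2\, x(q)\int dy\, P(q,y)\, m(q,y)^2\right]. \tag{7.44} $$
This is valid whenever (7.36) can be extended to $q = 1$, for example at finite temperatures. It is useful here to split the interval of $q$-integration into $(0,q_1)$ and $(q_1,1)$. Consider the first term on the r.h.s. of (7.44), $-\beta f_s[x(q)]$. Since $D(q) = 1-q$ for $q > q_1$, we have
$$ \begin{aligned} -2\beta f_s[x(q)] &= \int_0^{q_1}\frac{dq}{D(q)} + \ln(1-q_1) \\ &= \int_0^{q_1} dq\, x(q)\int_0^q \frac{dq'}{D(q')^2} + (1-q_1)\int_0^{q_1}\frac{dq}{D(q)^2} + \ln(1-q_1) \\ &= \alpha\beta^2\int_0^{q_1} dq\, x(q)\int dy\, P(q,y)\, m(q,y)^2 + (1-q_1)\,\alpha\beta^2\int dy\, P(q_1,y)\, m(q_1,y)^2 + \ln(1-q_1). \end{aligned} \tag{7.45} $$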
The last relation follows from the stationarity conditions (7.34) and (7.11). They can indeed be applied at the upper limit of integration $q_1$, the Edwards–Anderson order parameter, because $P(q_1) > 0$ there. Substituting into (7.44) produces a cancellation, leaving
$$ s = \frac{1}{2}\ln(1-q_1) + \frac{\alpha\beta^2}{2}(1-q_1)\int dy\, P(q_1,y)\, m(q_1,y)^2 - \frac{\alpha\beta^2}{2}\int_{q_1}^1 dq\int dy\, P(q,y)\, m(q,y)^2, \tag{7.46} $$
where $x(q) \equiv 1$ was used in the third term. By (7.35d), $\int dy\, P(q,y)\, m(q,y)^2$ generally increases strictly in $q$. But for an increasing function $h(q)$
$$ \int_{q_1}^1 dq\, h(q) > (1-q_1)\, h(q_1). \tag{7.47} $$
The second and third terms in (7.46) are therefore negative in general. The first term is obviously negative, so the entropy is negative as well. Apart from this consistency check, the formulation is useful because the constituent functions are needed only in $[q_1,1]$. There the only non-trivial ingredient is $P(q,y)$, since $m(q,y)$ is given explicitly by a Gaussian integral over the known $m(1,y)$.
7.1.8. The high-temperature limit
At high temperatures, if the relative number of examples $\alpha$ is rescaled appropriately, the neuron exhibits non-trivial thermodynamical properties. By contrast, fully connected SK-type spin glasses are paramagnetic in the high-$T$ limit. For $\beta \to 0$ the energy term (4.2) can be expanded in terms of the potential, and the results of Section 6.5 apply. In Eq. (6.34) we identify $\epsilon$ with $\beta$ and $U(y)$ with $-V(y)$. In analogy with definition (6.39), we introduce
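$$ W(q) = \int Dz_1\, Dz_2\; V(\mathbf{n}_1\!\cdot\mathbf{z})\, V(\mathbf{n}_2\!\cdot\mathbf{z}), \qquad \mathbf{z} = (z_1,z_2), \tag{7.48a} $$
$$ |\mathbf{n}_1| = |\mathbf{n}_2| = 1, \qquad \mathbf{n}_1\!\cdot\mathbf{n}_2 = q. \tag{7.48b} $$
The energy term in the free energy functional is expanded as
$$ f_e[x(q)] = f_{e0} + \beta\, f_{e1}[x(q)] + O(\beta^2), \tag{7.49} $$
where Eqs. (4.2), (6.34), and (6.36) give
$$ f_{e0} = \int Dz\, V(z) \equiv \sqrt{W(0)}, \tag{7.50} $$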
As M. Opper has pointed out to the author, the limit studied here is equivalent to the thermodynamics of an $N$-dimensional vector in a quenched Gaussian random potential whose correlations are characterized by the function $W(q)$ of Eq. (7.48a).
which does not depend on the OPF $x(q)$, and
$$ f_{e1}[x(q)] = -\frac{1}{2}\int_0^1 dq\, x(q)\, \dot W(q). \tag{7.51} $$
The relative number of examples $\alpha$ should be scaled so that
$$ \gamma = \alpha\beta^2 \tag{7.52} $$
remains finite. Large values of $\alpha$ counterbalance the homogenizing effect of high temperatures. The full free energy functional is thus singular in the small-$\beta$ limit:
$$ \beta f[x(q)] = \beta^{-1}\Phi_0 + \Phi[x(q)] + O(\beta), \tag{7.53} $$
where
$$ \Phi_0 = \gamma\sqrt{W(0)}, \tag{7.54a} $$
$$ \Phi[x(q)] = -\frac{1}{2}\int_0^1 dq\left[\frac{1}{D(q)} - \frac{1}{1-q} + \gamma\, x(q)\, \dot W(q)\right]. \tag{7.54b} $$
The entropic contribution was inserted from Eq. (7.10). The term $\beta^{-1}\Phi_0$ is singular for $\beta \to 0$ but independent of the OPF $x(q)$, so it does not lead to meaningful thermodynamics. The important feature is that the third term in (7.54b) is linear in $x(q)$. This is because the expansion in $\beta$ is equivalent to an expansion in $x(q)$ in the PPDE (7.4a).

The term $\Phi[x(q)]$ is equivalent to the free energy functional of Nieuwenhuizen's spherical multi-$p$-spin interaction spin glass [68]. That model is a most general SK-type spherical system, incorporating the spherical SK and the more general $p$-spin interaction models. The above result can also be obtained, by a longer calculation, by solving the PPDE perturbatively in $\beta$. Varying the $O(\beta^0)$ term of (7.53) gives
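$$ 2\,\frac{\delta\Phi[x(q)]}{\delta x(q)} = \int_0^q \frac{dq'}{D(q')^2} - \gamma\,\dot W(q) \equiv F_0(q,[x(q)]), \tag{7.55} $$
which is the leading term of $\beta^2 F(q,[x(q)])$ from (7.20b) for $\beta \to 0$. In intervals with $\dot x(q) > 0$, the stationarity condition $F_0(q,[x(q)]) = 0$ can be solved explicitly for the segment $x_c(q)$ of the OPF, giving
$$ x_c(q) = \frac{\dddot W(q)}{2\sqrt{\gamma}\,\ddot W(q)^{3/2}}. \tag{7.56} $$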
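Equation (7.56) follows from differentiating $F_0 = 0$, which gives $1/D(q)^2 = \gamma\ddot W(q)$, and then using $\dot D(q) = -x(q)$. The sketch below checks this chain numerically and evaluates the leading high-$T$ replicon $[D(q_2)D(q_3)]^{-1} - \gamma\ddot W(q_1)$ used further below. Everything specific in it is hypothetical: the correlation function $W(q) = q^2 + q^4$, the value $\gamma = 0.5$, and the interval $[0.05, 0.25]$ where $0 < x_c < 1$. It ignores the matching of the continuous segment to the plateaus of the OPF.

```python
import numpy as np

GAMMA = 0.5   # hypothetical value of gamma = alpha * beta**2, Eq. (7.52)

# Hypothetical correlation function W(q) = q**2 + q**4; only its second and
# third derivatives enter Eq. (7.56).
def W2(q):
    return 2.0 + 12.0 * q ** 2

def W3(q):
    return 24.0 * q

def x_c(q):
    """Continuous OPF segment, Eq. (7.56)."""
    return W3(q) / (2.0 * np.sqrt(GAMMA) * W2(q) ** 1.5)

def D_c(q):
    """D(q) on that segment, from the differentiated condition 1/D(q)**2 = gamma W''(q)."""
    return 1.0 / np.sqrt(GAMMA * W2(q))

def replicon(q1, q2, q3):
    """Leading high-T replicon [D(q2) D(q3)]^-1 - gamma W''(q1), meant for q1 <= q2, q3."""
    return 1.0 / (D_c(q2) * D_c(q3)) - GAMMA * W2(q1)

qs = np.linspace(0.05, 0.25, 41)
step = 1e-6
slope = -(D_c(qs + step) - D_c(qs - step)) / (2.0 * step)   # should equal x_c, since D' = -x
grid = np.linspace(0.05, 0.25, 21)
lowest = min(replicon(a, b, c) for a in grid for b in grid for c in grid if a <= b and a <= c)
print(np.max(np.abs(slope - x_c(qs))), lowest)
```

The minimum replicon value is zero, attained at coinciding arguments. This matches the Goldstone modes expected from (7.31).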
If the stationary OPF exhibits a step at $q_r$, then $F_0(q_r,[x(q)]) = 0$ holds. For a plateau $x(q) \equiv x_r$ with $0 < x_r < 1$ in the interval $I$, condition (5.47) should be applied. The replicon eigenvalues are obtained easily by adding two contributions. The first comes from the entropic term (6.33):
$$ \lambda_s(q_1,q_2,q_3) = \big[D(q_2)\,D(q_3)\big]^{-1}. \tag{7.57} $$
The second is the result of expansion (6.59) applied to the energy term:
$$ \alpha\lambda_e(q_1,q_2,q_3) = -\gamma\,\ddot W(q_1). \tag{7.58} $$
Thus, in leading order, $\lambda_e$ depends on only one $q$-variable. Adding the two contributions gives the replicon spectrum
$$ \lambda(q_1,q_2,q_3) = \lambda_s(q_1,q_2,q_3) + \alpha\lambda_e(q_1,q_2,q_3). \tag{7.59} $$
This is the leading term of (7.30). If $q_1$ falls in an interval where $\dot x(q) > 0$, Eq. (7.56) gives $\gamma\ddot W(q_1) = 1/D(q_1)^2$. Since $q_1 \le q_2$, $q_1 \le q_3$, and $D(q)$ decreases monotonically, it follows that $\lambda(q_1,q_2,q_3) \ge 0$. These replicons are therefore never linearly unstable. No such general statement can be made if $q_1$ is a discontinuity point between two plateaus of $x(q)$. Longitudinal stability, which we have not discussed in the general case, can also be checked explicitly in the high-$T$ limit. The second variation of the free energy functional gives
$$ \frac{\delta^2\Phi[x(q)]}{\delta x(q)\,\delta x(q')} = -\int_0^{\min(q,q')}\frac{dq''}{D(q'')^3}. \tag{7.60} $$
This kernel is negative definite, as shown in Appendix H. The extremum of $f[x(q)]$ is therefore a maximum, as required in (7.23).

The main local quantity of interest is the stability $\Delta$ associated with individual patterns. By (7.14), the distribution $\rho(\Delta)$ of stability parameters is determined by the field $P(q,y)$. We therefore have to solve the SPDE (4.53), (4.54) at high temperatures. As described in Appendix G, we find
$$ \rho(y) = \rho_0(y) + \beta\rho_1(y) + O(\beta^2), \tag{7.61} $$
where
$$ \rho_0(y) = G(y,1), \tag{7.62a} $$
$$ \rho_1(y) = \partial_y\left[G(y,1)\int_0^1 dq\, x(q)\int Dz\, V'\big(yq + z\sqrt{1-q^2}\big)\right]. \tag{7.62b} $$
The first correction $\rho_1$ describes the deviation from the Gaussian, an effect expected to be dramatic at low temperatures. By (3.29), the error per pattern is
$$ e = e_0 + \beta e_1 + O(\beta^2), \tag{7.63} $$
where
$$ e_0 = \int dy\,\rho_0(y)\,V(y) = \sqrt{W(0)}, \tag{7.64a} $$
$$ e_1 = \int dy\,\rho_1(y)\,V(y) = -\int_0^1 dq\, x(q)\,\dot W(q). \tag{7.64b} $$
The leading term of the entropy follows from definition (3.16):
$$ s_0 = \frac{1}{2}\int_0^1 dq\left[\frac{1}{D(q)} - \frac{1}{1-q} - \gamma\, x(q)\,\dot W(q)\right]. \tag{7.65} $$
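To make $W(q)$ and the high-$T$ error terms (7.64) concrete, take the error-counting potential $V(y) = \theta(\kappa - y)$. This is a hypothetical choice that fits the class of potentials discussed in Section 7.1.10. Then $W(q)$ is the probability that two unit Gaussians with correlation $q$ both fall below $\kappa$. By a standard Gaussian identity (Plackett's), $\dot W(q)$ is their joint density at $(\kappa,\kappa)$. That identity is not quoted in the text. The sketch evaluates $W(q)$ by one-dimensional quadrature, checks $e_0 = \sqrt{W(0)} = 1 - H(\kappa)$, and evaluates $e_1$ from (7.64b) for an illustrative OPF $x(q) = q$.

```python
import math

KAPPA = 0.5   # hypothetical stability threshold of the error-counting potential

def Phi(x):
    """Standard normal distribution function, Phi(x) = 1 - H(x)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def W(q, n=20_000, lower=-12.0):
    """W(q) of (7.48a) for V(y) = theta(kappa - y): P(u1 < kappa, u2 < kappa), corr(u1, u2) = q.

    Conditioning on u1 = z reduces it to a 1D integral, done with the midpoint rule.
    """
    h = (KAPPA - lower) / n
    total = 0.0
    for k in range(n):
        z = lower + (k + 0.5) * h
        total += math.exp(-0.5 * z * z) * Phi((KAPPA - q * z) / math.sqrt(1.0 - q * q))
    return total * h / math.sqrt(2.0 * math.pi)

def W_dot(q):
    """dW/dq: the bivariate normal density at (kappa, kappa) (Plackett's identity)."""
    return math.exp(-KAPPA ** 2 / (1.0 + q)) / (2.0 * math.pi * math.sqrt(1.0 - q * q))

def e1(x_of_q, n=20_000):
    """First high-T correction (7.64b), e_1 = -∫_0^1 dq x(q) dW/dq, by the midpoint rule.

    dW/dq has an integrable singularity at q = 1, so convergence there is slow.
    """
    h = 1.0 / n
    return -h * sum(x_of_q((k + 0.5) * h) * W_dot((k + 0.5) * h) for k in range(n))

e0 = math.sqrt(W(0.0))            # Eq. (7.64a); should equal Phi(kappa) = 1 - H(kappa)
print(e0, Phi(KAPPA), e1(lambda q: q))   # x(q) = q is purely illustrative
```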
7.1.9. Scaling by temperature
Gardner and Derrida recognized that at $T = 0$ and for a positive stability threshold $\kappa$ (see Eq. (3.9) for the definition), the RS order parameter $q$ converges to unity as the limit of capacity is approached [4,5]. This reflects the fact that, at the limit of capacity, the volume of version
space compatible with the patterns to be stored no longer diverges exponentially in $N$ [19]. In other words, the volume per synapse goes to zero as $N \to \infty$, and the entropy diverges to $-\infty$. Discrete $R$-RSB calculations at $T = 0$ beyond capacity showed that the $q$'s belonging to any $0 < x \le 1$ were also equal to unity. Values $q < 1$ were instead associated with the variable $m = \beta x$, when this was kept finite in the limit $x \to 0$, $\beta \to \infty$. It is also plausible to assume that for $T \to 0$, i.e., $q_1 \to 1$, $x_1 \equiv x(q_1^-) \le 1$ has a positive limit.

These observations suggest a natural scaling for the OPF that is valid at any temperature but provides a smooth $T \to 0$ limit. Let us introduce
$$ q(t) = q_1 - (q_1 - q_0)\big[1 - (1+q_1)\,t + q_1 t^2\big], \qquad 0 \le t \le 1, \tag{7.66a} $$
$$ m(t) = \beta\, x(q(t))\, \dot q(t), \tag{7.66b} $$
$$ g = \beta(1-q_1), \tag{7.66c} $$
$$ D(t) = \beta D(q(t)) = \int_t^1 m(\bar t)\, d\bar t + g, \tag{7.66d} $$
where the subscripted $q$'s are defined in (4.42), $x(q)$ vanishes for $0 \le q < q_0$, and $x(q) \equiv 1$ for $1 \ge q > q_1$. The variable is changed to $t$ via the invertible function $q(t)$, constructed so that $\dot q(1) = 0$ for $q_1 = 1$. The scaled function $D(t)$ should not be confused with the local stability parameter $\Delta$. In the $T \to 0$ limit we expect $g$ to be finite and hence the scaled OPF $m(t)$ to be bounded. At the most dangerous point, $t = 1$ (i.e., $q = q_1$), $\beta x(q_1^-)$ may diverge, yet $m$ remains finite as expected:
$$ m(1) = g\, x_1\, (q_1 - q_0) < g. \tag{7.67} $$
It is advisable to use this scaling even for $T > 0$, because the scaled formulae remain manageable at small temperatures. In what follows, $q(t)$ may in fact be any monotonic function with boundary conditions $q(0) = q_0$ and $q(1) = q_1$. Our choice of the simple form (7.66a) is just a numerically convenient parametrization. The auxiliary fields now depend on $t$ and $y$, as in $f(t,y)$, $m(t,y)$, etc. Note that $f(t,y)$ means $f(q(t),y)$, not the field $f(q,y)$ evaluated at $q = t$. We write the arguments so that it is always clear which function is meant. The PPDE becomes
$$ \partial_t f(t,y) = -\tfrac{1}{2}\,\dot q(t)\,\partial_y^2 f(t,y) + \tfrac{1}{2}\,m(t)\,\big(\partial_y f(t,y)\big)^2, \tag{7.68} $$
with initial condition at $t = 1$ given by the former $f(q_1,y)$. It is worth displaying $f(q,y)$ in the interval $[q_1,1]$ at this point. There $x(q) \equiv 1$, so Eqs. (7.1) and (4.35) define the Cole–Hopf transformed field, which obeys linear diffusion. This gives
$$ f(q,y) = -\frac{1}{\beta}\ln\int Dz\, e^{-\beta V(y + z\sqrt{1-q})}, \qquad q_1 \le q \le 1. \tag{7.69} $$
The initial condition of the PPDE (7.68) is
$$ f(t=1,y) = f(q_1,y), \tag{7.70} $$
whence, for $T \to 0$ and after the change of integration variable $y' = y + z\sqrt{1-q_1}$ in (7.69), we get
$$ f(t=1,y) = \min_{y'}\left[V(y') + \frac{(y-y')^2}{2g}\right]. \tag{7.71} $$
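The passage from (7.69) to (7.71) is a Laplace-type limit at fixed $g = \beta(1-q_1)$, and it can be watched numerically. The sketch writes (7.69) as an integral over $y' = y + z\sqrt{1-q}$ on a uniform grid and evaluates it with a log-sum-exp, which keeps it stable at large $\beta$. It uses a hypothetical quadratic potential $V(y) = y^2/2$, chosen because the Gaussian integral is then exact: $f = \ln(1+g)/(2\beta) + y^2/(2(1+g))$, which tends to the minimum in (7.71). The grid and the function names are illustrative assumptions.

```python
import numpy as np

GRID = np.linspace(-10.0, 10.0, 200_001)   # integration grid for y'

def f_cole_hopf(y, beta, q, V, grid=GRID):
    """Eq. (7.69) on the segment where x(q) = 1, as an integral over y' = y + z*sqrt(1-q).

    A log-sum-exp keeps the evaluation stable when beta is large.
    """
    var = 1.0 - q
    expo = -beta * V(grid) - (grid - y) ** 2 / (2.0 * var)
    amax = expo.max()
    log_int = amax + np.log(np.sum(np.exp(expo - amax)) * (grid[1] - grid[0]))
    return -(log_int - 0.5 * np.log(2.0 * np.pi * var)) / beta

def f_zero_T(y, g, V, grid=GRID):
    """Eq. (7.71): minimum over y' of V(y') + (y - y')**2 / (2 g), taken on the grid."""
    return float(np.min(V(grid) + (y - grid) ** 2 / (2.0 * g)))

def V_quad(y):
    """Hypothetical quadratic potential, for which (7.69) is a Gaussian integral."""
    return 0.5 * y ** 2

g, y = 1.0, 1.0
for beta in (10.0, 100.0, 1e4):
    q1 = 1.0 - g / beta            # keeps g = beta * (1 - q1) fixed, Eq. (7.66c)
    print(beta, f_cole_hopf(y, beta, q1, V_quad), f_zero_T(y, g, V_quad))
```

For the error measures of interest $V$ is not quadratic, but the same two functions apply unchanged as long as the grid resolves the width $\sim 1/\sqrt{\beta}$ of the integrand.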
The expression on the r.h.s. first appeared in Ref. [6] as a free energy term in the RS approximation. For small $q$'s we have
$$ f(q,y) = \int Dz\, f\big(q_0,\, y + z\sqrt{q_0 - q}\big), \qquad 0 \le q \le q_0. \tag{7.72} $$
The rescaled SPDE reads
$$ \partial_t P(t,y) = \tfrac{1}{2}\,\dot q(t)\,\partial_y^2 P(t,y) + m(t)\,\partial_y\big(P(t,y)\, m(t,y)\big). \tag{7.73} $$
Along the plateau $[0,q_0]$ we have $x(q) \equiv 0$, so the SPDE (4.53) is a linear diffusion equation. In $[q_1,1]$ the Cole–Hopf-type transformation (4.61) also leads to linear diffusion. Hence
$$ P(q,y) = G(y,q), \qquad 0 \le q \le q_0, \tag{7.74a} $$
$$ P(q,y) = e^{-\beta f(q,y)}\int Dz\, P\big(t=1,\, y + z\sqrt{q-q_1}\big)\, e^{\beta f(t=1,\, y + z\sqrt{q-q_1})}, \qquad q_1 \le q \le 1. \tag{7.74b} $$
The initial condition for $P(t,y)$ in the SPDE is thus
$$ P(t=0,y) = P(q_0,y) = G(y,q_0). \tag{7.75} $$
For $q$'s where $\dot x(q) > 0$, i.e.,
$$ \dot m(t)\,\dot q(t) - m(t)\,\ddot q(t) > 0, \tag{7.76} $$
the stationarity condition (7.21) now reads
$$ F[t,m(t)] \equiv \frac{q_0}{D(0)^2} + \int_0^t d\bar t\, \frac{\dot q(\bar t)}{D(\bar t)^2} - \alpha\int dy\, P(t,y)\, m(t,y)^2 = 0. \tag{7.77} $$
If there is a non-trivial plateau within the $t$-interval $(0,1)$, i.e., if (7.76) fails in a subinterval, then (7.77) does not hold there. In that case one must additionally extremize with respect to the parameters of the plateau. The temperature does not appear explicitly in the PDEs or in the stationarity condition, which allows a smooth limit $T \to 0$. Once the above PDEs are solved, the free energy in the scaled variables becomes
$$ f = f_s + \alpha f_e, \tag{7.78a} $$
$$ f_s = -\frac{1}{2}\int_0^1 dt\,\frac{\dot q(t)}{D(t)} + \frac{1}{2\beta}\ln\frac{\beta}{g} - \frac{q_0}{2D(0)}, \tag{7.78b} $$
$$ f_e = \int Dz\, f\big(t=0,\, z\sqrt{q_0}\big). \tag{7.78c} $$
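The scaled entropic term (7.78b) is just (7.10) rewritten with the substitution (7.66). The sketch below checks this for a hypothetical OPF that vanishes below $q_0$, rises linearly to $x_1$ on $[q_0,q_1)$, and equals 1 above $q_1$. It assumes $D(q) = \int_q^1 dq'\,x(q')$. It also checks that $\beta D(q_0)$ agrees with the integral form (7.66d) and that (7.66a) meets its boundary conditions. The parameter values are illustrative only.

```python
import numpy as np

def trap(f, t):
    """Trapezoidal rule on the grid t."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

# Hypothetical OPF: 0 below q0, rising linearly to x1 on [q0, q1), 1 above q1
q0, q1, x1, beta = 0.2, 0.7, 0.6, 3.0

def x_of_q(q):
    return x1 * (q - q0) / (q1 - q0)

def D_of_q(q):
    """D(q) = ∫_q^1 x(q') dq' (assumed definition), valid for q0 <= q <= q1."""
    return (1.0 - q1) + x1 * ((q1 - q0) ** 2 - (q - q0) ** 2) / (2.0 * (q1 - q0))

# Direct form (7.10): beta f_s = -1/2 [q0/D(q0) + ∫_{q0}^{q1} dq/D(q) + ln(1 - q1)]
qg = np.linspace(q0, q1, 200_001)
f_s_direct = -0.5 * (q0 / D_of_q(q0) + trap(1.0 / D_of_q(qg), qg) + np.log(1.0 - q1)) / beta

# Scaled variables (7.66a)-(7.66d) and the scaled form (7.78b)
t = np.linspace(0.0, 1.0, 200_001)
q_t = q1 - (q1 - q0) * (1.0 - (1.0 + q1) * t + q1 * t ** 2)
qdot = (q1 - q0) * ((1.0 + q1) - 2.0 * q1 * t)
m_t = beta * x_of_q(q_t) * qdot
g = beta * (1.0 - q1)
D_t = beta * D_of_q(q_t)
f_s_scaled = -0.5 * trap(qdot / D_t, t) + np.log(beta / g) / (2.0 * beta) - q0 / (2.0 * D_t[0])
print(f_s_direct, f_s_scaled, D_t[0], trap(m_t, t) + g)
```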
Based on (7.74b), the distribution of local stabilities is
$$ \rho(y) = P(q=1,y) = e^{-\beta V(y)}\int Dz\, P\big(t=1,\, y + z\sqrt{1-q_1}\big)\, e^{\beta f(t=1,\, y + z\sqrt{1-q_1})}. \tag{7.79} $$
It is straightforward to show that $\rho(y)$ is normalized whenever $P(t=1,y)$ is. The latter follows because the SPDE preserves the normalization of its initial condition (7.75). The mean error per pattern is
$$ e = \int dy\, \rho(y)\, V(y). \tag{7.80} $$
In practical cases, the $T \to 0$ limit of the local stability distribution can be calculated from (7.79) by the saddle-point method, and it involves only non-singular, scaled variables. As for the entropy, (7.44) shows that beyond capacity, at $T = 0$, $s = -\infty$. The first term of (7.46) goes to $-\infty$ for $\beta \to \infty$, and the rest is generally negative (see the end of Section 7.1.7), so it cannot compensate for the negative singularity. This complements the known result for $\kappa = 0$ that the entropy diverges to $-\infty$ when capacity is approached from below [5]. Together with the fact that beyond capacity the overlap equals 1 with probability 1, this demonstrates freezing in the ground state. The effect is analogous to the vanishing of the $T = 0$ entropy beyond capacity for the Ising perceptron [175]. Both show that the number of states with minimal error is subexponential in $N$.
7.1.10. The RS state and storage below capacity
Consider a general potential $V(y)$ that vanishes beyond a certain stability parameter, $y > \kappa$, and is positive below it. For such potentials, the original results of Gardner [3,4] describe the storage problem at $T = 0$ below capacity. The reason is that if all examples are satisfied, the positive part of the error measure does not matter. At finite temperatures the potential comes into play, and the equations follow easily from the preceding sections. The free energy is a function of the single variational parameter $q = q_0 = q_1 = q_{\rm EA}$:
$$ f(q) = f_s(q) + \alpha f_e(q), \tag{7.81a} $$
$$ -\beta f_s(q) = \frac{1}{2}\ln(1-q) + \frac{q}{2(1-q)}, \tag{7.81b} $$
$$ -\beta f_e(q) = \int Dz\,\ln\int Dz'\, e^{-\beta V(\zeta)}, \tag{7.81c} $$
$$ \zeta = z\sqrt{q} + z'\sqrt{1-q}. \tag{7.81d} $$
The stationarity condition (7.21) now reads
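$$ \beta^2 F(q) = \frac{q}{(1-q)^2} - \alpha\int Dz\left[\frac{\partial}{\partial y}\ln\int Dz'\, e^{-\beta V(\zeta)}\right]^2 = 0, \tag{7.82} $$
with the abbreviation (7.81d), where $y \equiv z\sqrt{q}$ denotes the first term of $\zeta$, with respect to which the derivatives are taken. The AT eigenvalue from (7.43) is
$$ \lambda(q) = \frac{1}{(1-q)^2} - \alpha\int Dz\left[\frac{\partial^2}{\partial y^2}\ln\int Dz'\, e^{-\beta V(\zeta)}\right]^2. \tag{7.83} $$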
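For a concrete finite-temperature example, take the error-counting step potential $V(y) = \theta(\kappa - y)$. This is a hypothetical choice within the class described above. The inner integral in (7.81c) is then elementary: $\int Dz'\, e^{-\beta V(\zeta)} = e^{-\beta} + (1 - e^{-\beta})\,H(u)$ with $u = (\kappa - z\sqrt q)/\sqrt{1-q}$. The sketch evaluates $-\beta f(q)$ from (7.81) at fixed $q$ with Gauss–Hermite quadrature. It shows the decrease toward the ground-state expression given below as (7.85), where the integrand becomes $\ln H(u)$. The parameter values are illustrative.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, WZ = hermegauss(100)
WZ = WZ / math.sqrt(2.0 * math.pi)       # ∫Dz as a weighted sum

def H(x):
    """Gaussian tail H(x) = erfc(x / sqrt(2)) / 2, elementwise."""
    return np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in np.atleast_1d(x)])

def entropic(q):
    """-beta f_s(q), Eq. (7.81b)."""
    return 0.5 * math.log(1.0 - q) + q / (2.0 * (1.0 - q))

def minus_beta_f_finite_T(q, alpha, beta, kappa):
    """-beta f(q) from (7.81a)-(7.81d) for the hypothetical step potential V = theta(kappa - y)."""
    u = (kappa - Z * math.sqrt(q)) / math.sqrt(1.0 - q)
    inner = math.exp(-beta) + (1.0 - math.exp(-beta)) * H(u)
    return entropic(q) + alpha * float(WZ @ np.log(inner))

def minus_beta_f_ground(q, alpha, kappa):
    """beta -> infinity limit: the inner integral reduces to H(u)."""
    u = (kappa - Z * math.sqrt(q)) / math.sqrt(1.0 - q)
    return entropic(q) + alpha * float(WZ @ np.log(H(u)))

for beta in (1.0, 5.0, 20.0, 50.0):
    print(beta, minus_beta_f_finite_T(0.3, 1.0, beta, 0.0), minus_beta_f_ground(0.3, 1.0, 0.0))
```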
Note that (7.31) does not contradict $\lambda(q) \neq \beta^2\dot F(q)$ here. In (7.31) the derivative is taken with respect to the explicit $q$-dependence of $F$ at fixed $x(q)$. Here, by contrast, both kinds
of arguments of $F(q)$ are denoted by the same $q$. Here $\lambda(q)$ is meaningful only at the stationary $q$. The probability density of local stabilities is given, for instance, by (7.79), which now reads
$$ \rho(y) = e^{-\beta V(y)}\int Dz\,\frac{G\big(y + z\sqrt{1-q},\, q\big)}{\int Dz'\, e^{-\beta V(y + (z+z')\sqrt{1-q})}}. \tag{7.84} $$
In the ground state ($T = 0$) below capacity, the positive part of $V(y)$ is suppressed. For $\beta \to \infty$, therefore, only arguments of $V$ greater than $\kappa$ matter. Closer inspection shows that this argument holds only if $q$ does not approach 1 at the same time. For $\beta \to \infty$ the free energy and the energy go to zero, but $\beta f$ typically remains finite:
i!z(q q 1 #a Dz ln H !bf (q)" ln(1!q)# 2(1!q) 2 (1!q
(7.85)
where (4.101) was used for the definition of $H(x)$. The ground-state entropy per synapse is now

$$ s = -\beta f, \tag{7.86} $$

the subject of the pioneering works [3–5]. Taking the prior volume $\int w(\mathbf J)\,d^N J$ as unity, where (3.5) gives the prior density, the volume of version space is $e^{Ns}$. The stationarity condition (7.82) simplifies in the ground state to

$$ \frac{q}{1-q} = \frac{\alpha}{2\pi}\int Dz\,\frac{\exp\!\left[-\dfrac{(\kappa - z\sqrt{q})^2}{1-q}\right]}{H^2\!\left(\dfrac{\kappa - z\sqrt{q}}{\sqrt{1-q}}\right)}, \tag{7.87} $$
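and the AT eigenvalue becomes

$$ \lambda(q) = \frac{1}{(1-q)^2} - \frac{\alpha}{2\pi q(1-q)}\int Dz\left\{\frac{\partial}{\partial z}\,\frac{\exp\!\left[-\dfrac{(\kappa - z\sqrt{q})^2}{2(1-q)}\right]}{H\!\left(\dfrac{\kappa - z\sqrt{q}}{\sqrt{1-q}}\right)}\right\}^2. \tag{7.88} $$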
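The following minimal Python sketch makes the ground-state RS equations concrete. It solves the stationarity condition (7.87) for $q$ by bisection and then evaluates the entropy (7.85)–(7.86) at the solution. It is an illustration, not the author's numerical code. The helper names, the trapezoidal Gaussian quadrature truncated to $|z|\le 9$, and the switch to the large-argument asymptotics of $H$ (cf. (7.89)) are all choices made here.

```python
import math

SQRT2PI = math.sqrt(2.0 * math.pi)


def H(x):
    """Gaussian tail H(x) = int_x^inf Dz = erfc(x / sqrt 2) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def G(u):
    """exp(-u^2/2) / H(u); for large u the asymptotic series of H is used (cf. (7.89))."""
    if u < 30.0:
        return math.exp(-0.5 * u * u) / H(u)
    return SQRT2PI * u / (1.0 - 1.0 / u**2 + 3.0 / u**4)


def log_H(u):
    """ln H(u), written so that it does not underflow for large positive u."""
    if u <= 0.0:
        return math.log(H(u))
    return -0.5 * u * u - math.log(G(u))


def gauss_avg(fn, n=2001, zmax=9.0):
    """Trapezoidal approximation of int Dz fn(z), truncated to |z| <= zmax."""
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * fn(z) * math.exp(-0.5 * z * z)
    return total * h / SQRT2PI


def residual(q, alpha, kappa):
    """Left-hand side minus right-hand side of the T = 0 condition (7.87)."""
    sq, c = math.sqrt(q), math.sqrt(1.0 - q)
    rhs = alpha / (2.0 * math.pi) * gauss_avg(lambda z: G((kappa - z * sq) / c) ** 2)
    return q / (1.0 - q) - rhs


def solve_q(alpha, kappa, lo=1e-6, hi=1.0 - 1e-6, iters=50):
    """Bisection for q: the residual is negative at small q and positive near q = 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid, alpha, kappa) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def entropy(q, alpha, kappa):
    """Ground-state entropy per synapse s = -beta f, Eqs. (7.85)-(7.86)."""
    sq, c = math.sqrt(q), math.sqrt(1.0 - q)
    energy = gauss_avg(lambda z: log_H((kappa - z * sq) / c))
    return 0.5 * math.log(1.0 - q) + q / (2.0 * (1.0 - q)) + alpha * energy


for alpha in (0.5, 1.0, 1.5):
    q = solve_q(alpha, 0.0)
    print(f"alpha = {alpha}: q = {q:.4f}, s = {entropy(q, alpha, 0.0):.4f}")
```

For $\kappa=0$ the printed overlap should grow and the entropy should decrease with $\alpha$. This matches the statement that follows: $q\to1$ and $s\to-\infty$ as the capacity is approached.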
Numerical evaluation shows that for increasing $\alpha$, $q$ goes to 1 and the entropy decreases towards $-\infty$. For $q\lesssim1$ the dominant contribution in the above expressions comes from the region of exponentially small $H$, i.e., where its argument is large and positive; this is ensured by $\kappa>z$. The asymptotics of $H(x)$ for large $x$ can be found in [233]:

$$ H(x) \approx \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}. \tag{7.89} $$

Thus, from Eq. (7.87), the limit $q\to1$ is realized for $\alpha=\alpha_c(\kappa)$ satisfying

$$ 1 = \alpha_c(\kappa)\int_{-\infty}^{\kappa} Dz\,(\kappa - z)^2, \tag{7.90} $$

which evaluates to the capacity curve

$$ \alpha_c(\kappa)^{-1} = (\kappa^2+1)\,\big[1 - H(\kappa)\big] + \frac{\kappa\,e^{-\kappa^2/2}}{\sqrt{2\pi}}. \tag{7.91} $$

Fig. 9. The limit of capacity $\alpha_c$ from Eq. (7.91), solid line, and the indicator $\alpha^*$ of AT stability from (7.93), dashed line.
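The next sketch is a quick numerical check of (7.90), (7.91) and (7.93). It compares the closed-form capacity with a direct trapezoidal quadrature of (7.90), and it evaluates the AT indicator $\alpha^*(\kappa)$ in the form $1/H(-\kappa)$ written in (7.93). The function names are hypothetical.

```python
import math


def H(x):
    """Gaussian tail H(x) = int_x^inf Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def alpha_c_closed(kappa):
    """Capacity from the closed form (7.91)."""
    inv = (kappa**2 + 1.0) * (1.0 - H(kappa)) \
        + kappa * math.exp(-0.5 * kappa**2) / math.sqrt(2.0 * math.pi)
    return 1.0 / inv


def alpha_c_quadrature(kappa, n=4001, zmin=-12.0):
    """Capacity from (7.90), 1/alpha_c = int_{-inf}^{kappa} Dz (kappa - z)^2 (trapezoidal rule)."""
    h = (kappa - zmin) / (n - 1)
    total = 0.0
    for i in range(n):
        z = zmin + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * (kappa - z) ** 2 * math.exp(-0.5 * z * z)
    return 1.0 / (total * h / math.sqrt(2.0 * math.pi))


def alpha_star(kappa):
    """AT indicator (7.93)."""
    return 1.0 / H(-kappa)


for kappa in (-1.0, 0.0, 1.0):
    print(kappa, alpha_c_closed(kappa), alpha_c_quadrature(kappa), alpha_star(kappa))
```

At $\kappa=0$ both curves meet at 2. For $\kappa=1$ one finds $\alpha_c<\alpha^*$, and for $\kappa=-1$ the reverse. This agrees with the discussion below: the RS region below capacity is AT-stable for $\kappa\ge0$ but not for negative $\kappa$.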
This gives $\alpha_c(0)=2$, the known result of Refs. [149,150]. Its recovery by Gardner [3,4] gave much confidence in the statistical mechanical approach combined with the replica method. For the $\kappa$-dependent capacity (7.90) several sources could be cited; see, e.g., [6]. The $\alpha_c(\kappa)$ curve is shown in Fig. 9; by tradition the horizontal axis is $\alpha$. From (7.85) one can convince oneself of the negative divergence of the entropy for $\alpha\to\alpha_c$ from below. Alternatively, the conclusion at the end of Section 7.1.9 about $s=-\infty$ also applies at the limit of capacity. A glance at the AT eigenvalue (7.88) reveals that $\lambda$ is typically singular for $q\to1$. Its sign is decisive, and it can be determined in that limit by again using the asymptotics (7.89) for $\alpha\lesssim\alpha_c(\kappa)$. If the latter is inserted into (7.88) before differentiation, one immediately sees that the amplitude of the singularity is

$$ \lambda(q)(1-q)^2 \approx 1 - \frac{\alpha_c(\kappa)}{\alpha^*(\kappa)}, \tag{7.92} $$

where we have introduced the name

$$ \alpha^*(\kappa) = \frac{1}{H(-\kappa)}, \tag{7.93} $$

and depicted it in Fig. 9. It follows that if $\alpha_c<\alpha^*$ ($\alpha_c>\alpha^*$), then the AT eigenvalue has a positive (negative) singularity on the capacity line. Since the RS solution presented here is only valid for $q<1$, below the capacity line, we can conclude that for $\kappa\ge0$ this region is AT-stable. This RS state ceases to exist at the critical line not because of AT instability, but because the overlap $q$ reaches the border $q=1$ of its physical range. We mention, but do not elaborate further on, the property that for negative $\kappa$'s the RS solution destabilizes before the capacity is reached. Thus RSB is necessary below capacity, as in [215], but in the present case this occurs without the need for nonmonotonic potentials. Note that $\alpha^*$ is not the AT stability boundary for the RS solution, because it was calculated in the limit $q\to1$. In sum, for $\kappa\ge0$ the region below capacity at $T=0$ can be described by the RS solution. Beyond it, however, the particular form of the potential $V(y)$ affects the behavior, so we specify one for further studies.

7.2. The special error measure $\theta(\kappa-y)$

Now we apply the framework of Section 7.1 to the error measure (3.9) with $b=0$, i.e., to

$$ V(y) = \theta(\kappa - y), \tag{7.94} $$
a much studied case. This potential does not weigh erroneous patterns by "how much" they are wrong, as measured by the local stability parameter $\Delta$; it simply counts them. We will often use the function $H(y)$ as defined by (4.101). The initial condition for the PPDE (7.68) is obtained by substituting (7.94) into Eqs. (7.69) and (7.70):

$$ f(q_1, y) = -\frac{1}{\beta}\ln\left[(1 - e^{-\beta})\,H\!\left(\frac{\kappa - y}{\sqrt{1-q_1}}\right) + e^{-\beta}\right], \tag{7.95} $$
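whence the initial conditions for $m(q,y)$, etc., result by differentiation with respect to $y$.

7.2.1. The ground state

For $\alpha>\alpha_c$ none of the finite $R$-RSB solutions [5,7,8,11] are thermodynamically stable [11]. Hence it is necessary to resort to a CRSB ansatz. In the time parameter $t$ the initial conditions at $t=1$ are as follows for $T=0$. The initial condition for the field $f(t,y)$ is given by (7.71). The minimum is realized at

$$ \tilde y = \begin{cases} y & \text{if } y \le \kappa - \sqrt{2\eta},\\ \kappa & \text{if } \kappa - \sqrt{2\eta} \le y \le \kappa,\\ y & \text{if } y \ge \kappa. \end{cases} \tag{7.96} $$

Hence, at $t=1$,

$$ f(1, y) = \begin{cases} 1 & \text{if } y \le \kappa - \sqrt{2\eta},\\ \dfrac{(\kappa - y)^2}{2\eta} & \text{if } \kappa - \sqrt{2\eta} \le y \le \kappa,\\ 0 & \text{if } y \ge \kappa, \end{cases} \tag{7.97} $$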
whence by differentiation with respect to $y$ we get the initial conditions $m(1,y)$ and $s(1,y)$ for the evolution in time $t$. Presuming that $P(t=1,y)$ is known, we calculate the local stability distribution by applying the saddle point method to Eq. (7.79), which yields

$$ \rho(y) = \begin{cases} P(1, y) & \text{if } y < \kappa - \sqrt{2\eta},\\ 0 & \text{if } \kappa - \sqrt{2\eta} < y < \kappa,\\ P(1, y) + \delta(y - \kappa)\displaystyle\int_{\kappa-\sqrt{2\eta}}^{\kappa} dy'\,P(1, y') & \text{if } y \ge \kappa. \end{cases} \tag{7.98} $$

Thus a gap develops in $\rho(y)$, but normalization is restored by the $\delta$-peak at $y=\kappa$. A similar feature was observed in various approximations, RS and 1-RSB, in previous works [6,7], but there the function appearing in place of $P(1,y)$ was explicitly known. Note the singularities in $y$ in the initial conditions. These make numerical calculations more difficult, but they do not alter the fact that for averaged quantities the limit $T\to0$ is generically smooth. We do not elaborate further on the $T=0$ case, because numerical evaluation cannot be avoided anyhow. Due to the scaling described in Section 7.1.9, however, the singularity of the $T\to0$ limit has been lifted, and both $T=0$ and $T>0$ can be treated within the same numerical framework.

7.2.2. The high-temperature limit

As reported in our Letter [17] and discussed for a general error measure $V(y)$ in Section 7.1.8, much can be said by analytic treatment even about the CRSB states in the limit where both the temperature $T$ and the relative number of examples $\alpha$ are large. The general formulae were presented in Section 7.1.8, where the effective free energy to be extremized was given as $\Phi[x(q)]$ in Eq. (7.54b). The error measure under consideration determines the function $W(q)$ via (7.48). The simplest way to give $W(q)$ is by
$$ \dot W(q) = \frac{1}{2\pi\sqrt{1-q^2}}\exp\!\left(-\frac{\kappa^2}{1+q}\right), \tag{7.99a} $$

$$ W(0) = H^2(-\kappa), \tag{7.99b} $$
whence $W(q)$ and all its derivatives can be calculated. The RS ansatz means that $x(q)=\theta(q-q_0)$. Using Eq. (7.54b) we get (the subscript of $q$ is omitted)

$$ \Phi(q) = -\frac12\left[\frac{q}{1-q} + \ln(1-q) + \gamma\big(W(1) - W(q)\big)\right], \tag{7.100} $$

and the stationarity condition reads

$$ \frac{q}{(1-q)^2} = \frac{\gamma}{2\pi\sqrt{1-q^2}}\exp\!\left(-\frac{\kappa^2}{1+q}\right). \tag{7.101} $$
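The high-temperature RS equations are simple enough to evaluate directly. The sketch below assumes, consistently with (7.99a,b), that $W(q)$ is the probability that two unit Gaussian variables with correlation $q$ both lie below $\kappa$. It checks $\dot W$ from (7.99a) against a numerical derivative of that $W(q)$, and it solves the stationarity condition (7.101) by bisection. The helper names and quadrature settings are illustrative choices.

```python
import math


def H(x):
    """Gaussian tail H(x) = int_x^inf Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def W(q, kappa, n=4001, zmax=10.0):
    """Assumed W(q) = P(y1 < kappa, y2 < kappa) for unit Gaussians with correlation q (0 <= q < 1).

    With y_a = sqrt(q) z + sqrt(1-q) z_a this is int Dz [1 - H((kappa - sqrt(q) z)/sqrt(1-q))]^2.
    """
    sq, c = math.sqrt(q), math.sqrt(1.0 - q)
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * (1.0 - H((kappa - sq * z) / c)) ** 2 * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def W_dot(q, kappa):
    """Closed form (7.99a)."""
    return math.exp(-kappa**2 / (1.0 + q)) / (2.0 * math.pi * math.sqrt(1.0 - q * q))


def solve_rs(gamma, kappa, iters=60):
    """Bisection for the high-T RS condition (7.101): q/(1-q)^2 = gamma * W_dot(q)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - mid) ** 2 < gamma * W_dot(mid, kappa):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


kappa = 0.5
slope = (W(0.5 + 1e-4, kappa) - W(0.5 - 1e-4, kappa)) / 2e-4
print("numerical dW/dq:", slope, " (7.99a):", W_dot(0.5, kappa))
print("W(0):", W(0.0, kappa), " H(-kappa)^2:", H(-kappa) ** 2)
for gamma in (1.0, 10.0, 100.0):
    print("gamma =", gamma, " q_RS =", solve_rs(gamma, kappa))
```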
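Local thermodynamic stability is determined from (7.57)–(7.59). Thus the AT line is given by

$$ \lambda_{\rm RS}(q) \equiv \lambda(q,q,q) = \frac{1}{(1-q)^2} - \frac{\gamma}{2\pi}\exp\!\left(-\frac{\kappa^2}{1+q}\right)\frac{\kappa^2(1-q) + q(1+q)}{(1+q)(1-q^2)^{3/2}} = 0. \tag{7.102} $$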
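Along the same lines, the AT line (7.102) can be located by following the RS solution of (7.101) in $\gamma$ until $\lambda_{\rm RS}$ changes sign. The sketch below does this by bisection in $\gamma$, using a geometric midpoint. $\ddot W$ is obtained by differentiating (7.99a). Names and tolerances are hypothetical.

```python
import math


def W_dot(q, kappa):
    """Closed form (7.99a)."""
    return math.exp(-kappa**2 / (1.0 + q)) / (2.0 * math.pi * math.sqrt(1.0 - q * q))


def W_ddot(q, kappa):
    """d/dq of (7.99a); this is the combination appearing in (7.102)."""
    num = kappa**2 * (1.0 - q) + q * (1.0 + q)
    return math.exp(-kappa**2 / (1.0 + q)) * num / (2.0 * math.pi * (1.0 + q) * (1.0 - q * q) ** 1.5)


def solve_rs(gamma, kappa, iters=60):
    """Bisection for (7.101): q/(1-q)^2 = gamma * W_dot(q)."""
    lo, hi = 0.0, 1.0 - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid / (1.0 - mid) ** 2 < gamma * W_dot(mid, kappa):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_rs(q, gamma, kappa):
    """Replicon eigenvalue of the RS state, Eq. (7.102)."""
    return 1.0 / (1.0 - q) ** 2 - gamma * W_ddot(q, kappa)


def gamma_at(kappa, lo=1e-3, hi=1e4, iters=80):
    """Geometric bisection for the gamma at which lambda_RS of the RS solution changes sign."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if lambda_rs(solve_rs(mid, kappa), mid, kappa) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


for kappa in (0.0, 1.0):
    g = gamma_at(kappa)
    print(f"kappa = {kappa}: gamma_AT ~ {g:.4f}, q on the AT line ~ {solve_rs(g, kappa):.6f}")
```

For $\kappa=0$ the two equations can also be combined by hand. Using (7.101) in (7.102) gives $\lambda_{\rm RS}(1-q)^2=(1-2q^2)/(1-q^2)$, so the AT line is crossed at $q=1/\sqrt2$, which the bisection should reproduce.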
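The 1-RSB ansatz is equivalent to $x(q)=(1-x_1)\,\theta(q-q_1)+x_1\,\theta(q-q_0)$ and yields, by Eq. (7.54b),

$$ \Phi(q_0, q_1, x_1) = \frac{1-x_1}{2x_1}\ln(1-q_1) - \frac12\left[\frac{q_0}{1-q_1+x_1(q_1-q_0)} + \frac{1}{x_1}\ln\big[1-q_1+x_1(q_1-q_0)\big] + \gamma\big(W(1) - (1-x_1)W(q_1) - x_1 W(q_0)\big)\right]. \tag{7.103} $$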
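The 1-RSB functional (7.103) should reduce to the RS one (7.100) when the two overlaps coincide, and also when $x_1=1$. The sketch below evaluates both functionals and checks these limits numerically. As before, it assumes the bivariate-Gaussian reading of $W(q)$. It only evaluates the functionals; the extremization over $(q_0,q_1,x_1)$ is not attempted.

```python
import math


def H(x):
    """Gaussian tail H(x) = int_x^inf Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def W(q, kappa, n=2001, zmax=10.0):
    """Assumed W(q) = P(y1 < kappa, y2 < kappa) for unit Gaussians with correlation q."""
    if q >= 1.0:
        return H(-kappa)                      # fully correlated case: P(y < kappa)
    sq, c = math.sqrt(q), math.sqrt(1.0 - q)
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * (1.0 - H((kappa - sq * z) / c)) ** 2 * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def phi_rs(q, gamma, kappa):
    """RS functional (7.100)."""
    return -0.5 * (q / (1.0 - q) + math.log(1.0 - q) + gamma * (W(1.0, kappa) - W(q, kappa)))


def phi_1rsb(q0, q1, x1, gamma, kappa):
    """1-RSB functional (7.103), x(q) = (1 - x1) theta(q - q1) + x1 theta(q - q0)."""
    d = 1.0 - q1 + x1 * (q1 - q0)
    entropic = q0 / d + math.log(d) / x1
    energy = gamma * (W(1.0, kappa) - (1.0 - x1) * W(q1, kappa) - x1 * W(q0, kappa))
    return (1.0 - x1) / (2.0 * x1) * math.log(1.0 - q1) - 0.5 * (entropic + energy)


print(phi_rs(0.4, 5.0, 0.5), phi_1rsb(0.4, 0.4, 0.3, 5.0, 0.5), phi_1rsb(0.2, 0.6, 0.5, 5.0, 0.5))
```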
The leading replicon eigenvalue is given by (7.102) with $q_1$ substituted. Thus the boundary of local stability is

$$ \lambda_{\rm RS}(q_1) = \lambda(q_1, q_1, q_1) = 0. \tag{7.104} $$

The classic Parisi phase, or SG-I, is characterized by the OPF (4.44). There $x_c(q)$ is the continuously increasing part of the OPF, for which we obtain from Eqs. (7.56) and (7.99)

$$ x_c(q) = \frac{1}{\sqrt{\gamma}}\sqrt{\frac{\pi}{2}}\;\frac{\kappa^4(q^2-2q+1) + 2\kappa^2(-2q^3+q^2+2q-1) + 2q^4+4q^3+3q^2+2q+1}{(q^2 - q\kappa^2 + q + \kappa^2)^{3/2}\,(1-q)^{1/4}\,(1+q)^{3/4}}\;e^{\kappa^2/[2(1+q)]}. \tag{7.105} $$
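The closed form (7.105) can be cross-checked numerically. The check assumes that, on the continuous part of the OPF, the stationarity condition (7.56) amounts to $\gamma\ddot W(q)=1/D(q)^2$. Differentiating this once more gives $x_c=\dddot W/(2\sqrt\gamma\,\ddot W^{3/2})$. The sketch compares (7.105) with that expression, using $\ddot W$ from (7.99a) and a central difference for $\dddot W$.

```python
import math


def W_ddot(q, kappa):
    """Second derivative of W(q), obtained by differentiating (7.99a)."""
    num = kappa**2 * (1.0 - q) + q * (1.0 + q)
    return math.exp(-kappa**2 / (1.0 + q)) * num / (2.0 * math.pi * (1.0 + q) * (1.0 - q * q) ** 1.5)


def x_c_closed(q, gamma, kappa):
    """Continuous part of the OPF in the high-T limit, Eq. (7.105)."""
    k2 = kappa**2
    poly = (k2**2 * (q * q - 2.0 * q + 1.0)
            + 2.0 * k2 * (-2.0 * q**3 + q * q + 2.0 * q - 1.0)
            + 2.0 * q**4 + 4.0 * q**3 + 3.0 * q * q + 2.0 * q + 1.0)
    den = (q * q - q * k2 + q + k2) ** 1.5 * (1.0 - q) ** 0.25 * (1.0 + q) ** 0.75
    return math.sqrt(math.pi / 2.0) / math.sqrt(gamma) * poly / den * math.exp(k2 / (2.0 * (1.0 + q)))


def x_c_from_W(q, gamma, kappa, h=1e-5):
    """x_c = W'''/(2 sqrt(gamma) W''^(3/2)), assuming gamma W''(q) = 1/D(q)^2 on the continuous part."""
    w3 = (W_ddot(q + h, kappa) - W_ddot(q - h, kappa)) / (2.0 * h)
    return w3 / (2.0 * math.sqrt(gamma) * W_ddot(q, kappa) ** 1.5)


for q in (0.2, 0.5, 0.8):
    print(q, x_c_closed(q, 10.0, 1.0), x_c_from_W(q, 10.0, 1.0))
```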
The interesting feature is that the OPF has an explicit and non-perturbative form. The perturbation is now in $\beta$, and a small $\beta$ apparently does not make $x(q)$ degenerate. We shall need

$$ D(q) = \begin{cases} 1-q & \text{if } q_1 \le q \le 1,\\ D_c(q) \equiv 1 - q_1 + \displaystyle\int_q^{q_1} dq'\,x_c(q') & \text{if } q_0 \le q \le q_1,\\ D_c(q_0) & \text{if } 0 \le q \le q_0. \end{cases} \tag{7.106} $$

The leading non-trivial term in the free energy, (7.54b), depends only on the endpoints of the interval, as

$$ \Phi(q_0, q_1) = -\frac12\left[\frac{q_0}{D_c(q_0)} + \int_{q_0}^{q_1}\frac{dq}{D_c(q)} + \ln(1-q_1) + \gamma\int_{q_0}^{q_1} dq\,x_c(q)\,\dot W(q) + \gamma W(1) - \gamma W(q_1)\right]. \tag{7.107} $$

The replicon eigenvalues with identical arguments vanish due to the Ward–Takahashi identity, as described in Section 7.1.4, so the SG-I phase is at best marginally stable. A non-linear stability analysis is not available, but it is believed not to result in instability.
The fourth type of phase found here is a concatenation of a non-trivial plateau of $x(q)$, as in 1-RSB, and a continuously increasing $x_c(q)$. This CRSB spin-glass state is also called SG-IV. The function $x_c(q)$ is again given by (7.105). Compared with the classic Parisi phase (SG-I), however, extra variational parameters should be introduced: the value $x_1$ of the plateau, which stretches from $q_0$ to $q_1$, and its upper border $q_1$. The OPF is given by (7.32) with $x_c(q)$ as in (7.105), and

$$ D(q) = \begin{cases} 1-q & \text{if } q_2 \le q \le 1,\\ D_c(q) \equiv 1 - q_2 + \displaystyle\int_q^{q_2} dq'\,x_c(q') & \text{if } q_1 \le q \le q_2,\\ x_1(q_1 - q) + D_c(q_1) & \text{if } q_0 \le q \le q_1,\\ D_0 \equiv D_c(q_1) + x_1(q_1 - q_0) & \text{if } 0 \le q \le q_0. \end{cases} \tag{7.108} $$

The resulting free energy can be straightforwardly constructed from Eq. (7.54b) as

$$ \Phi(q_0, q_1, q_2, x_1) = -\frac12\left[\frac{q_0}{D_0} + \frac{1}{x_1}\ln\!\left(1 + \frac{x_1(q_1-q_0)}{D_c(q_1)}\right) + \int_{q_1}^{q_2}\frac{dq}{D_c(q)} + \ln(1-q_2) + \gamma x_1\big(W(q_1) - W(q_0)\big) + \gamma\int_{q_1}^{q_2} dq\,x_c(q)\,\dot W(q) + \gamma W(1) - \gamma W(q_2)\right]. \tag{7.109} $$

The special feature of the high-$T$ limit is that the numerical evaluation of all spin-glass-like phases involves extremization in only a few scalars, because $x_c(q)$ is explicitly known. This has been done in Ref. [17]. The results are demonstrated in the figures there, which we redisplay for illustration. Fig. 10 shows the phase diagram, with one RS region and three different types of RSB. If more than one of the ansätze (7.100), (7.103), (7.107) and (7.109) worked, we took as the averaged equilibrium state the one with the maximal free energy. It is a plausible conjecture that the RS phase is obtained by analytic continuation from the phase of perfect storage below capacity, $\alpha<\alpha_c(\kappa)$. So although there is no phase with zero error at high temperatures, the analogous phase is the RS one (labeled a). Note that at high $T$ we lose the intuitive picture, valid at $T=0$, that increasing $\kappa$ takes us into the frustrated phase. We obtain three RSB phases. One is 1-RSB (b), another is the classic Parisi CRSB (SG-I, labeled d), and the third is also CRSB, but with an extra plateau (SG-IV, labeled c). The characteristic shapes of the OPF are shown in Fig. 11; note the plateau in the SG-IV phase (c). Extensive thermodynamic quantities are shown in Fig. 12 for $\kappa=0$. The entropy is negative and, as it should, decreases for increasing $\alpha$, i.e., increasing $\gamma$. The mean error per pattern tends to a finite value $\epsilon_\infty$ in the high-$T$ limit, and our approximation gives the correction $e_1 = T(\epsilon_\infty - \epsilon)$ from formula (7.64b). For $\gamma\to\infty$ we expect that even the correction $e_1$ vanishes, and this is indeed suggested by the figure. The transition from RS to CRSB is of third order from the viewpoint of the free energy, i.e., its third derivative exhibits a discontinuity. If at $T=0$ the region beyond capacity is a single CRSB phase (SG-I), then the decomposition into three different phases for $T\to\infty$ suggests singular surfaces born at finite $T$'s, whose precise locations we did not determine. For $\kappa<2$ the transition from RS to CRSB SG-I is of the same type as at $T=0$. For $\kappa>2$, however, we have a transition from RS into 1-RSB that has no counterpart at $T=0$. Nevertheless, the high-$T$ limit captures the main picture. For low $\alpha$'s (which translate into low $\gamma$'s here) there is a normal phase analogous to a paramagnet, and for large $\alpha$'s the system exhibits complex, i.e., spin-glass-like, behavior.

Fig. 10. Phase diagram for the potential $V(y)=\theta(\kappa-y)$ in the $(\gamma,\kappa)$ plane for high $T$, obtained by numerical maximization of the free energy Eq. (7.54b) with the ansätze described in this section. The full lines separate phases with different types of global maxima. The RS, 1-RSB, SG-IV, and SG-I phases are indicated by a, b, c, and d, respectively. The AT curve is the RS phase boundary for $\kappa$ below a value $\approx 2.38$. To the right of the arrow it continues analytically in the dashed line and is no longer a phase boundary. Reprinted from Ref. [17].

Fig. 11. The $x(q)$ function at representative points, marked in Fig. 10 by crosses. Reprinted from Ref. [17].

Fig. 12. The entropy $s$ (in leading order, from Eq. (7.65)), the free energy term $\Phi$ from Eq. (7.54b), and the enlarged correction $e_1$ of the energy (7.64b) in the high-$T$ limit, for $\kappa=0$. The RS–SG-I transition is marked by an arrow. The dashed lines correspond to the thermodynamically unstable RS state beyond this transition point. The inset demonstrates the smoothness of the transition. Reprinted from Ref. [17].
7.2.3. Numerical evaluation method for arbitrary temperatures*

As demonstrated in Section 7.1.2, the maximization of $f[x(q)]$ with respect to $x(q)$ in (7.18) is done with the side conditions that $f(q,y)$ satisfies the PPDE (7.4) and $P(q,y)$ the SPDE (7.9). We further recall that $x(q)\equiv0$ for $q<q_0$ and $x(q)\equiv1$ for $q>q_1$. In the remaining non-trivial regime $q_0\le q\le q_1$ it is convenient to parametrize $x(q)$ by $m(t)$ from (7.66b). For an actual numerical implementation, $m(t)$ has to be rewritten once more in the form of some ansatz with a finite number $N$ of variational parameters $v_1,\dots,v_N$:

$$ m(t) = m(t; v_1, \dots, v_N). \tag{7.110} $$

For instance, $v_1,\dots,v_N$ may be the coefficients of a polynomial ansatz for $m(t)$. Another option would be a piecewise linear ansatz with $v_n = m\big(t=(n-1)/(N-1)\big)$. A finite-step RSB ansatz, with the steps and plateaus parametrized by the $v_n$, is also possible. Given any such parametrization (7.110), we are left with maximizing the free energy functional with respect to the $N+2$ variational parameters

$$ \mathbf v = (v_1, v_2, \dots, v_{N+2}), \tag{7.111a} $$

$$ v_{N+1} = q_0, \tag{7.111b} $$

$$ v_{N+2} = \eta = \beta(1 - q_1), \tag{7.111c} $$
where we have expressed $q_1$ through $\eta$ according to (7.66c). This maximization of the free energy functional $f[\mathbf v]$ has to be performed under the (non-holonomic) constraints $x(q_0)\ge0$, $x(q_1)\le1$, and $x'(q)\ge0$ for $q_0\le q\le q_1$. Equivalently, the constraints are $m(0)\ge0$ and $m(1)/[\beta\dot q(1)]\le1$ (cf. (7.66b)), together with $\dot m(t)\dot q(t)-m(t)\ddot q(t)\ge0$ for $0\le t\le1$ (cf. (7.76)). It is convenient to incorporate these constraints into an augmented free energy functional $f_\mu(\mathbf v)$ in the form of soft penalty terms:

$$ f_\mu(\mathbf v) = f[\mathbf v] - \mu_0\,\chi\big(-m(0)\big) - \mu_1\,\chi\big(m(1)/[\beta\dot q(1)] - 1\big) - \mu_t\int_0^1 dt\,\chi\big(m(t)\ddot q(t) - \dot m(t)\dot q(t)\big), \tag{7.112a} $$

$$ \chi(x) = x^2\theta(x)/2. \tag{7.112b} $$

Thus, by successively increasing the coefficients $\mu_0$, $\mu_1$, and $\mu_t$ in the course of the maximization of $f_\mu(\mathbf v)$, the respective constraints will be respected more and more rigorously. Before we proceed, the following points are worth mentioning.

(i) As in Section 7.1.9, our only assumption on $q(t)$ is that it should be a monotonically increasing function with $q(0)=q_0$ and $q(1)=q_1$. For concrete numerical calculations, however, especially at low temperatures $T=\beta^{-1}$, the specific choice (7.66a) has proven to be particularly appropriate. In any case, the implicit dependence of $q(t)$ on the variational parameters $v_{N+1}=q_0$ and $v_{N+2}=\eta$ should be kept in mind:

$$ q(t) = q(t; v_{N+1}, v_{N+2}). \tag{7.113} $$
(ii) In our experience, the maximization procedure typically ends not at the border of the admitted parameter regime, where the soft constraints (7.112a) come into action, but rather in the interior of this admitted region. However, this border may be visited in the course of the maximization. Without the soft constraints in (7.112a), the maximization procedure then often leaves the admitted region and eventually diverges.

(iii) Strictly speaking, there are additional constraints on $v_{N+1}$ and $v_{N+2}$ associated with the restrictions $0\le q_0<q_1\le1$. In our experience, however, they were never in danger of being violated, with the obvious exception of cases with a stable RS solution.

(iv) As in any variational ansatz, the necessary number $N$ of parameters depends on how well the ansatz is adapted to the problem. In principle, a polynomial or piecewise linear ansatz (7.110) with a sufficiently large number $N$ of parameters can approximate any shape of $x(q)$ arbitrarily well. Whether or not $N$ is sufficiently large in a given case should follow from the accuracy with which the stationarity conditions (7.21) and (7.22) are satisfied. In practice, unavoidable numerical inaccuracies make things more complicated. As already observed in Ref. [16] within a 2-RSB ansatz, the free energy functional $f_\mu(\mathbf v)$ changes extremely little near its maximum upon certain parameter variations, i.e., the energy landscape $f_\mu(\mathbf v)$ is very "flat" in certain directions. In our experience, this problem becomes worse and worse with increasing number of parameters $N$ in (7.110). The finite numerical accuracy then gives rise to a spurious "roughness" in the already very "flat" energy landscape. As a consequence, any maximization strategy becomes slow, or even fails, for too large $N$. Similarly, the stability conditions are satisfied very well (in comparison with their numerical uncertainty) within a fairly large neighborhood of the true maximizing $x(q)$. Therefore, in any specific case a carefully tailored ansatz with not too many parameters has to be used. The criterion for convergence should be negligible changes in $q_0$, $\eta$, and $m(t)$ upon refining the parametrization (7.110).

To maximize the augmented free energy functional (7.112a), a plain steepest descent procedure turned out to be a good compromise between robustness against the spurious numerical fine structure in the energy landscape and speed of convergence. It proceeds along the following lines. Given a "working" parameter set $\mathbf v$, the direction of the steepest increase of $f_\mu(\mathbf v)$ is along the gradient $\partial f_\mu(\mathbf v)/\partial\mathbf v$. We take into account all the implicit dependencies on $\mathbf v$ in (7.110) and (7.113), as well as the expression (7.20b) for the gradient of the original free energy functional. A straightforward but somewhat tedious calculation then yields the following result for the gradient of $f_\mu(\mathbf v)$ from (7.112a):
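$$ \frac{\partial f_\mu}{\partial v_{N+1}} = \int_0^1 dt\,\frac{\dot F(t)\,m(t)}{2\dot q(t)}\frac{\partial q(t)}{\partial q_0} + \frac{M_1}{\dot q(1)}\frac{\partial\dot q(1)}{\partial q_0} - \int_0^1 dt\,M(t)\left[m(t)\frac{\partial\ddot q(t)}{\partial q_0} - \dot m(t)\frac{\partial\dot q(t)}{\partial q_0}\right], \tag{7.114a} $$

$$ \frac{\partial f_\mu}{\partial v_n} = \int_0^1 dt\,\frac{F(t)}{2}\frac{\partial m(t)}{\partial v_n} + \mu_0\,\chi'\big(-m(0)\big)\frac{\partial m(0)}{\partial v_n} - \frac{M_1}{m(1)}\frac{\partial m(1)}{\partial v_n} - \int_0^1 dt\,M(t)\left[\frac{\partial m(t)}{\partial v_n}\ddot q(t) - \frac{\partial\dot m(t)}{\partial v_n}\dot q(t)\right], \tag{7.114b} $$

$$ \frac{\partial f_\mu}{\partial v_{N+2}} = -\int_0^1 dt\,\frac{\dot F(t)\,m(t)}{2\beta\dot q(t)}\frac{\partial q(t)}{\partial q_1} - \frac{F(1)}{2} - \frac{M_1}{\beta\dot q(1)}\frac{\partial\dot q(1)}{\partial q_1} - \int_0^1 dt\,M(t)\left[\dot m(t)\frac{\partial\dot q(t)}{\beta\,\partial q_1} - m(t)\frac{\partial\ddot q(t)}{\beta\,\partial q_1}\right], \tag{7.114c} $$

where $1\le n\le N$, $\chi'(x)=x\theta(x)$, and we have introduced the quantities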
M "k t(m(1)/bq (1)!1)m(1)/bq (1) , (7.115a) M(t)"k t(m(t)qK (t)!mQ (t)q (t)) , (7.115b) R and used F(t) to denote the l.h.s. of Eq. (7.77) for a given m(t) function. Along this direction Rf (*)/R* of steepest increase, one now searches for the maximum, i.e., the I expression f (*#jRf (*)/R*) has to be maximized with respect to j. This implies the condition I I J(j )"0 (7.116)
for the maximizing j"j , where
Rf (*#jRf (*)/R*) Rf (*) I ) I . (7.117) J(j)" I R* R* By updating the parameter set as * C *#j
Rf (*)/R* (7.118)
I one completes one iteration step of the steepest descent procedure. This iteration scheme is then repeated until * does not appreciably change any more. Note that due to the numerical inaccuracies it makes little sense to locate the zero from (7.116) very precisely in each iteration step. Our usual strategy was based on the assumption that J(j) behaves approximately linear near its zero at j"j . If J(j) is given at two nearby j-values, one then obtains an approximation for j by
linear interpolation. One such readily available J(j)-value is that for j"0, the second one follows by choosing for j the approximation for j from the previous iteration step.
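The line search of (7.116)–(7.118) can be sketched on a toy problem. In the Python sketch below, the objective, the single soft constraint that the second parameter be non-negative, and the starting values are hypothetical stand-ins for $f[\mathbf v]$ and the constraints of (7.112a). The penalty $\chi$ follows (7.112b). As in the text, the zero of $J$ is estimated by linear interpolation between $J(0)=|\partial f_\mu/\partial\mathbf v|^2$ and $J$ evaluated at the previous step length.

```python
def chi(x):
    """Soft penalty chi(x) = x^2 theta(x) / 2, Eq. (7.112b)."""
    return 0.5 * x * x if x > 0.0 else 0.0


def chi_prime(x):
    """chi'(x) = x theta(x)."""
    return x if x > 0.0 else 0.0


# Hypothetical stand-in for f[v]: a concave quadratic with its maximum at (2, 1).
# The (hypothetical) admitted region is v[1] >= 0, enforced softly as in (7.112a).
def f_mu(v, mu):
    return -(v[0] - 2.0) ** 2 - 4.0 * (v[1] - 1.0) ** 2 - mu * chi(-v[1])


def grad_f_mu(v, mu):
    return [-2.0 * (v[0] - 2.0), -8.0 * (v[1] - 1.0) + mu * chi_prime(-v[1])]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def steepest_ascent(v, mu, lam_prev=1.0, tol=1e-20, max_iter=1000):
    """Steepest ascent; each step length comes from one linear interpolation of J(lambda)."""
    for _ in range(max_iter):
        g = grad_f_mu(v, mu)
        j0 = dot(g, g)                                   # J(0) = |grad f_mu|^2
        if j0 < tol:
            break
        trial = [vi + lam_prev * gi for vi, gi in zip(v, g)]
        j1 = dot(grad_f_mu(trial, mu), g)                # J at the previous step length
        if j0 > j1:
            lam = lam_prev * j0 / (j0 - j1)              # zero of the straight line through both values
        else:
            lam = 2.0 * lam_prev                         # J not decreasing: simply lengthen the step
        v = [vi + lam * gi for vi, gi in zip(v, g)]      # update (7.118)
        lam_prev = lam
    return v, lam_prev


v, lam = [0.0, 3.0], 1.0
for mu in (1.0, 10.0, 100.0):                            # successively stiffer penalty
    v, lam = steepest_ascent(v, mu, lam)
print(v, f_mu(v, 100.0))
```

The first trial step leaves the admitted region, so the penalty enters the interpolation. The toy maximum itself lies inside the region, as point (ii) describes for the actual problem. Raising $\mu$ in the outer loop mimics the successive tightening described after (7.112b).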
7.2.4. The CRSB state

In Ref. [18] we presented some characteristic results for the error measure (7.94), obtained by the method expounded in the previous section. In a non-exhaustive search we found that if the RS solution is AT-unstable, only a classic Parisi CRSB state emerges, both at $T=0$ beyond capacity and at some low temperatures. Its OPF is given in (4.44) and was denoted as SG-I. We conjecture that at $T=0$ the region beyond capacity is such a phase. Our explorations did not reach the sufficiently high $T$'s where the 1-RSB and the CRSB state with a plateau (SG-IV) would have arisen, as described in Section 7.2.2. The scaling introduced in Section 7.1.9, and notably the introduction of the OPF $m(q)=\beta x(q)$, allows the description of the CRSB state at any temperature. At the same time, it maintains a smooth transition to the ground state, $T=0$. Physically, the fact that $x(q)\to0$ at $T=0$ for any $q<1$ means that $q=1$ with probability one. Thus freezing sets in, similar to the ground state of the SK model [29]. At the same time, the degenerate $x(q)$ is no longer a useful OPF, because the free energy becomes a functional of $m(q)$ instead.
Fig. 13. Scaled order parameter function $m(q)=\beta x(q)$ for $\kappa=0$, $\alpha=3$ at $T=0$ (solid), $T=0.01$ (dashed), and $T=0.1$ (dotted). The first discontinuity is at $q_0$; below it the function is constantly zero. The second discontinuity, for $T>0$, is at $q_1$, which goes to 1 for $T\to0$. Reprinted from Ref. [18].
Fig. 13 displays the scaled OPF $m(q)=\beta x(q)$ for various parameters. All parameter settings are in the AT-unstable region. To our knowledge, this figure is the first indication of Parisi's CRSB state at low temperatures in a system that is neither a model of long-range interaction spin glasses nor closely related to one, such as the Little–Hopfield network. It is remarkable that the scaling by $\beta$ makes the continuously increasing segment $m_c(q)=\beta x_c(q)$ of the OPF rather insensitive to the temperature. The lower end $q_0$ of the $m_c(q)$ segment is equally stable, but the upper end $q_1$ shows a linear temperature dependence, $1-q_1\propto T$. The value of the rightmost plateau is obviously $m(1)=\beta$. The local stability density for the same parameter settings is displayed in Fig. 14. In the method of Section 7.2.3 the probability field $P(q,y)$ is evaluated by the scaled SPDE (7.73) in every approximation step, so the sought density finally follows from (7.79). The Dirac delta peak at $T=0$ is not shown; it restores normalization to one there. A gap exists at $T=0$, with right border $\Delta=\kappa$, in accordance with (7.98). As can be seen from (7.79), however, the gap disappears immediately for any positive $T$. At $T=0$ the density $\rho(\Delta)$ vanishes linearly at the lower edge of the gap. Comparison between the CRSB solution and the earlier RS [5], 1-RSB [7,8] and 2-RSB [11] approaches shows that averaged quantities, like the mean error per pattern, do not show significant differences. The previous solutions reflect the qualitative behavior of the error at $T=0$: it is zero below capacity, positive beyond it, and increases linearly for small $\alpha-\alpha_c$. The 1- and 2-RSB $\epsilon(\alpha)$ curves look the same at the resolution of a figure [11]. On the other hand, the difference is more conspicuous in the distribution of non-self-averaging quantities. The OPF $x(q)$ is the averaged probability measure of the overlap of coupling vectors. Its definitely continuously increasing part in Figs. 11 and 13 shows that finite $R$-RSBs are qualitatively in error. A further qualitative difference can be found in the distribution of local stabilities $\rho(\Delta)$. Indeed, for finite $R$-RSB, $\rho(\Delta)$ exhibits a discontinuity at the lower edge of the gap. The right tendency is shown by the fact that the discontinuity is smaller in the 1-RSB than in the RS solution [7].

Fig. 14. Density of local stabilities $\rho(\Delta)$ from theory for $\kappa=0$, $\alpha=3$ at $T=0$ (solid), $T=0.01$ (dashed), and $T=0.1$ (dotted). Reprinted from Ref. [18].

7.2.5. Simulation

In this section we describe the simulation results from [18]. Wendemuth adapted existing algorithms for the simple perceptron below capacity, with potentials of the form (3.9), to the region beyond it by dealing specially with patterns with negative stabilities [155], and he performed a series of simulations [37]. The most sensitive part of his work was the potential with $b=0$, which counts the number of unstable patterns, an NP-complete problem from the algorithmic viewpoint [154]. His data showed a significant deviation from the then best available theoretical prediction, the 1-RSB calculation of Majer et al. [7]. He evaluated the probability density of local stabilities at $\alpha=1$ and $\kappa=1$, a point known to be beyond capacity. The shapes roughly agreed: a gap, and a peak at its right end, were present. However, the simulation data gave systematically and discouragingly larger stabilities than predicted by theory. Essentially following Wendemuth's algorithm, we redid the simulation in order to see how persistent the deviation is. The first step is to generate the random patterns (3.2). We selected numbers with a uniform distribution from an interval centered around zero, and in the end we normalized them as

$$ \sum_{i=1}^{N}\big(S_i^\mu\big)^2 = N. \tag{7.119} $$
The outputs for the patterns, $\sigma^\mu$, were uniformly taken as 1. This does not restrict generality, since the $S_i^\mu$ have random signs. The algorithm proceeds in discrete time $t=0,1,\dots$. At $t=0$ we initialized the coupling vector according to the Hebb rule

$$ J_i(0) = \text{const.}\sum_{\mu} S_i^\mu, \tag{7.120} $$

with the constant chosen so that the Euclidean norm was $|\mathbf J(0)|=\sqrt N$. At time $t$ the local stabilities

$$ \Delta^\mu(t) = \frac{\mathbf J(t)\cdot\mathbf S^\mu}{|\mathbf J(t)|} \tag{7.121} $$

are computed. Among the unstable ones, i.e., those with $\Delta^\mu(t)<\kappa$, the one with the largest $\Delta^\mu(t)$ is selected. This is the least unstable pattern, characterized by the index $\mu(t)$. The couplings are updated according to the rule of Wendemuth [155,37]. We took

$$ \mathbf J(t+1) = \mathbf J(t) + \lambda\big(\mathbf S^{\mu(t)} + \delta\mathbf S(t)\big), \tag{7.122} $$

where
*S(t)"
0
N/"J(t)"!DI R(t) J(t) "J(t)"!DI R(t)
if DI R(t)'0 , if DI R(t)(0 .
(7.123)
Here $\lambda$ is the gain parameter, chosen in Ref. [155] as an inverse power of $N$. By trial and error we found that a larger gain parameter (a weaker inverse power of $N$) did not endanger overall convergence, and it made the final approach for a given pattern, $\Delta^{\mu(t)}(t)\to\kappa$, faster. The second row in the update rule (7.123) is Wendemuth's term, introduced specifically to cope with patterns with negative stability. At the next time step $t+1$ we again find the least unstable pattern, with index $\mu(t+1)$, and update the couplings by the above rule. Usually the least unstable pattern stays the same, $\mu(0)=\mu(1)=\cdots$, until it becomes stabilized at, say, $t_1-1$. Another pattern is then taken for some steps, $\mu(t_1)=\mu(t_1+1)=\cdots$, again until it becomes stabilized. In principle, another pattern may become the least unstable one before the pattern in question is stabilized, but typically this was not the case. The above recipe is repeated until a pattern cannot be stabilized in a reasonable time. This notion of reasonable time could be quantified, because the time needed to stabilize a pattern showed a systematic increase as a function of the total number of patterns stabilized before. It is therefore a good recipe to halt the algorithm when a pattern cannot be stabilized within a small multiple of the extrapolated convergence time. In test runs, if the last pattern could not be stabilized within twice the extrapolated convergence time, it could not be stabilized within ten times that time either. Thus we are confident that we exploited the possibilities of the update rule described above. The Wendemuth algorithm is based on the following argument. Among all patterns with $\Delta^\mu<\kappa$, the one whose $\Delta^\mu$ is closest to $\kappa$ has the highest chance to be stabilized. This algorithm may therefore maximize the number of stable patterns by successively pushing the stability of the least unstable pattern to $\kappa$ from below.
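A stripped-down version of the least-unstable-pattern algorithm is sketched below with NumPy. It is a hypothetical illustration, not Wendemuth's or the author's code. It omits the correction $\delta\mathbf S(t)$ of (7.123); for $\kappa>0$ the text notes that this term does not come into play in the final approach. It also uses an arbitrary fixed gain, and it replaces the extrapolated-convergence-time halting rule by a fixed per-pattern step limit plus a global step cap.

```python
import numpy as np


def least_unstable_run(M, N, kappa, gain=0.05, max_steps_per_pattern=200,
                       max_total_steps=20000, seed=0):
    """Simplified least-unstable-pattern algorithm (no Wendemuth correction term)."""
    rng = np.random.default_rng(seed)
    S = rng.uniform(-1.0, 1.0, size=(M, N))                       # uniform, centered at zero
    S *= np.sqrt(N) / np.linalg.norm(S, axis=1, keepdims=True)    # (7.119): sum_i (S_i)^2 = N
    J = S.sum(axis=0)                                             # Hebb rule (7.120), outputs +1
    J *= np.sqrt(N) / np.linalg.norm(J)                           # |J(0)| = sqrt(N)
    current, steps_on_current = -1, 0
    for _ in range(max_total_steps):
        delta = S @ J / np.linalg.norm(J)                         # local stabilities (7.121)
        unstable = np.flatnonzero(delta < kappa)
        if unstable.size == 0:
            break                                                 # every pattern is stable
        mu = int(unstable[np.argmax(delta[unstable])])            # least unstable pattern
        if mu == current:
            steps_on_current += 1
            if steps_on_current > max_steps_per_pattern:
                break                                             # simplified halting rule
        else:
            current, steps_on_current = mu, 0
        J = J + gain * S[mu]                                      # (7.122) with delta S = 0
    delta = S @ J / np.linalg.norm(J)
    return S, J, delta


S, J, delta = least_unstable_run(M=100, N=100, kappa=1.0)
print("fraction of unstable patterns:", np.mean(delta < 1.0))
```

Each update raises the selected stability because its derivative along $\mathbf S^\mu$ is $(N-\Delta^2)/|\mathbf J|>0$ whenever $\Delta^2<N$. The sketch makes no claim to reproduce the error values quoted below, which come from the full algorithm at much larger sizes.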
A consequence is that the remaining non-stabilized patterns with $\Delta^\mu<\kappa$ will be relatively far from the threshold, i.e., $\kappa-\Delta^\mu$ will be large, but this quantity does not enter the present error measure. Nevertheless, the principle of stabilizing the least unstable pattern qualitatively resembles the gradient descent algorithm for differentiable error measures, because every step is made in the momentarily most promising direction. The shortcomings of such algorithms for NP-complete problems are known, and we cannot be certain that the number of unstable patterns is indeed minimized. Fig. 15 shows the result of the simulation at the parameter setting $\alpha=\kappa=1$. Since $\kappa>0$, the stability of the momentarily least unstable pattern is positive in the final approach $\Delta^{\mu(t)}\to\kappa$, so the second row of the update rule (7.123) does not come into play. The full line is the result of numerical extremization of the variational free energy (7.19) by the method explained in Section 7.2.3. We omitted the Dirac delta peak of the theoretical probability density at $\kappa=1$. The dashed lines are the properly normalized histograms of the local stabilities from simulations for two sizes, $M=N=500$ and $1000$. We do not reproduce the original data of Wendemuth [37], but mention that his histogram showed a much larger systematic error. To quantify the deviation, let us consider the mean error $\epsilon$, i.e., the relative number of misclassified patterns. Wendemuth's value is 0.21, the present simulation gives 0.15, and theory predicts 0.1358. Thus we are still about 10% off the theoretical value, but this is a remarkable improvement over the previous deviation of 55%. The size of the gap from simulation is also within about 10% of the theoretical value. For the larger size, $M=N=1000$, the simulation data reproduce the property that the density $\rho(\Delta)$ vanishes linearly at the lower edge of the gap. This should be contrasted with the 1-RSB result in Ref. [7], where the discontinuity at the lower edge of the gap is about a third of the height of the left peak. The simulation clearly favors the CRSB solution. In summary, the theoretical and simulation data do not match perfectly. Given the NP-completeness of the numerical problem, however, this does not disprove the theory. We mention that the algorithm used had the primitive feature of being deterministic. Furthermore, it lacks a rigorous mathematical basis for convergence to the desired state. There is obviously room for further improvements.

Fig. 15. Density of local stabilities $\rho(\Delta)$ at $\alpha=\kappa=1$. The horizontal axis is $\Delta$, the vertical one $\rho$. The theoretical prediction is given by the full line. The two empirical densities are normalized histograms, taken with $M=N=500$ and $1000$. Reprinted from Ref. [18].

8. The neuron: independently distributed synapses

8.1. Free energy and stationarity condition

In this paper we focus mostly on the spherical neuron. However, the main formulas for the case of the prior distribution (3.6), where the synapses are independent and obey an arbitrary distribution, follow straightforwardly from Section 4, so we now briefly review them. In the course of the continuation, the limits leading to the endpoint values $q_0$, $q_1$ and $\hat q_0$, $\hat q_1$ are assumed. The corresponding free energy (3.22) can be characterized by two OPFs

$$ q(x), \tag{8.1a} $$

$$ q(0) = q_0, \qquad q(1) = q_1, \tag{8.1b} $$

$$ \hat q(x), \tag{8.2a} $$

$$ \hat q(0) = \hat q_0, \qquad \hat q(1) = \hat q_1. \tag{8.2b} $$

Alternatively, we can take as OPFs the respective inverses

$$ x(q), \qquad \hat x(\hat q). \tag{8.3} $$

Then

$$ \hat q = \hat q\big(x(q)\big), \tag{8.4} $$

or its inverse function

$$ q = q\big(\hat x(\hat q)\big), \tag{8.5} $$
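establishes a relation between the overlaps $q$ and $\hat q$. Concerning the $f_E$ term, Eqs. (7.1)–(7.9) from the spherical case carry over unchanged. The entropic term (3.22d) is a transcript of (4.41) with (4.3), together with the appropriate equations that produce the averages. We introduce the field (see Eqs. (4.40a) and (4.40b))

$$ \hat f(\hat q, y) = -\beta^{-1}\,\hat\psi(\hat q, y) \tag{8.6} $$

to get

$$ \hat f_S[\hat x(\hat q)] = \lim_{n\to0}\frac{1}{n}\,\hat f_S(\hat Q) = \hat f(0, 0). \tag{8.7} $$

Here the replica expression $\hat f_S(\hat Q)$ involves the logarithm of an integral over the synaptic variables $u$, weighted by the prior density $w(u)$. The field $\hat f(\hat q, y)$ is the solution of

$$ \partial_{\hat q}\hat f = -\frac12\,\partial_y^2\hat f + \frac{\beta}{2}\,\hat x(\hat q)\,\big(\partial_y\hat f\big)^2, \tag{8.8a} $$

$$ \hat f(\hat q_1, y) = -\beta^{-1}\ln\int Dz\int du\,w(u)\exp\!\big(-\beta u\,(y + iz\sqrt{\hat q_1})\big). \tag{8.8b} $$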
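Introducing

$$ \hat m(\hat q, y) = \partial_y\hat f(\hat q, y), \tag{8.9} $$

we have

$$ \partial_{\hat q}\hat m = -\frac12\,\partial_y^2\hat m + \beta\hat x(\hat q)\,\hat m\,\partial_y\hat m, \tag{8.10a} $$

$$ \hat m(\hat q_1, y) = \partial_y\hat f(\hat q_1, y). \tag{8.10b} $$

Furthermore, the hatted "susceptibility field" is

$$ \hat s(\hat q, y) = \partial_y\hat m(\hat q, y), \tag{8.11} $$

obeying

$$ \partial_{\hat q}\hat s = -\frac12\,\partial_y^2\hat s + \beta\hat x(\hat q)\big(\hat m\,\partial_y\hat s + \hat s^2\big), \tag{8.12a} $$

$$ \hat s(\hat q_1, y) = \partial_y^2\hat f(\hat q_1, y). \tag{8.12b} $$

The probability density $\hat P(\hat q, y)$ satisfies a variant of the SPDE,

$$ \partial_{\hat q}\hat P = \frac12\,\partial_y^2\hat P + \beta\hat x(\hat q)\,\partial_y\big(\hat P\hat m\big), \tag{8.13a} $$

$$ \hat P(0, y) = \delta(y). \tag{8.13b} $$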
The interaction term (3.22c) is simplest if expressed through the functions (8.2):

$$ f_I[x(q), \hat x(\hat q)] = -\beta\int_0^1 dx\,q(x)\,\hat q(x). \tag{8.14} $$

Since a function is a functional of its inverse, $f_I[\cdots]$ can be considered as a functional of $x(q)$ and $\hat x(\hat q)$. The stationarity conditions (3.24), (3.25) now read

$$ q = \int dy\,\hat P(\hat q, y)\,\hat m^2(\hat q, y), \tag{8.15a} $$

$$ \hat q = \alpha\int dy\,P(q, y)\,m^2(q, y), \tag{8.15b} $$

where the connection between $q$ and $\hat q$ is established by (8.4) or (8.5). The right-hand sides are functionals of $\hat x(\hat q)$ and $x(q)$, respectively. Note that solving these equations also involves finding the starting point $\hat q_1$. In contrast, the initial condition for the evaluation of the energy term is fixed at $q=1$. Once the stationary $x(q)$ and $\hat x(\hat q)$ have been found, we substitute them into the r.h.s. of

$$ f = \hat f_S[\hat x(\hat q)] + f_I[x(q), \hat x(\hat q)] + \alpha f_E[x(q)] \tag{8.16} $$

and obtain the final result for the mean free energy. A special case of independently distributed synapses is the clipped neuron, i.e., the neuron with discrete synapses. The most studied such model is the Ising neuron with binary synapses, which has
attracted considerable interest (see [19,12] for references). The prior distribution in the Ising case involves (3.7), so the initial conditions for the PDEs are
$$\hat f(\hat q_1,y)=-\beta^{-1}\ln\cosh\beta y+\tfrac{\beta}{2}\hat q_1, \tag{8.17a}$$
$$\hat m(\hat q_1,y)=-\tanh\beta y. \tag{8.17b}$$
The Ising neurons studied in the literature so far were reminiscent of the random energy model, in that they involved at most 1-RSB [234]. However, only a few choices of the error measure potential $V(y)$ were considered, and at this stage it cannot be excluded that the full Parisi scheme becomes important for other potentials. Finally, we emphasize that previously studied SK-type spin glasses, with Ising or, as a matter of fact, any kind of individual spin constraint (3.6), are included in the considerations of this section. Formulae equivalent to (8.17) are well known from the SK model. This is not surprising, because the entropic term with the Ising constraint is the same for both the SK spin glass and the present neuron model. For other constraints in the SK-type spin glass, the first two terms on the r.h.s. of (8.16) remain valid. Concerning the third, the energy term, let us assume the most general multi-spin interaction of Nieuwenhuizen [68], which results in a term (6.48). Then, if $\alpha f_e$ in (8.16) is replaced by the energy term (6.51), with the understanding of the correspondence (6.49), (6.50), Eq. (8.16) gives the full free energy functional of the spin glass problem.

8.2. Variational principle

The results of Section 8.1 can also be derived from a variational principle. A reasoning similar to the one we followed in the spherical problem yields a free energy functional $f[\dots]$ that produces the mean free energy as
$$f=\max_{x(q)}\ \operatorname*{extr}_{f(q,y),\,P(q,y)}\ \operatorname*{extr}_{\hat x(\hat q)}\ \operatorname*{extr}_{\hat f(\hat q,y),\,\hat P(\hat q,y)}\ f\big[x(q),f(q,y),P(q,y),\hat x(\hat q),\hat f(\hat q,y),\hat P(\hat q,y)\big]. \tag{8.18}$$
The order of the extremum conditions is not binding, but, given the physical meaning of the OPF $x(q)$, the maximum is to be taken last.
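The free energy functional is
$$f[\dots]=\hat f_s[\dots]+\hat f_a[\dots]+\hat f_b[\dots]+f_i[\dots]+\alpha\big(f_e[\dots]+f_a[\dots]+f_b[\dots]\big), \tag{8.19a}$$
$$\hat f_s[\dots]=\hat f(0,0), \tag{8.19b}$$
$$\hat f_a[\dots]=\int_0^{\hat q_1}d\hat q\int dy\,\hat P(\hat q,y)\Big[\partial_{\hat q}\hat f(\hat q,y)+\tfrac12\partial_y^2\hat f(\hat q,y)-\tfrac{\beta}{2}\hat x(\hat q)\big(\partial_y\hat f(\hat q,y)\big)^2\Big], \tag{8.19c}$$
$$\hat f_b[\dots]=-\frac1\beta\int dy\,\hat P(\hat q_1,y)\Big[\ln\int Dz\int du\,w(u)\,e^{-\beta u(y+iz\sqrt{\hat q_1})}+\beta\hat f(\hat q_1,y)\Big], \tag{8.19d}$$
where $f_e[\dots]$, $f_a[\dots]$, $f_b[\dots]$ and $f_i[\dots]$ are given by Eqs. (7.24c), (7.24d), (7.24e) and (8.14), respectively. The remarkable symmetry of the above expressions in the quantities with and without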
the hat is a consequence of the fact that both the entropic and the energy terms are essentially of the general Parisi form. The main difference lies in the starting point of the time variable and in the initial condition for the PDE. The free energy functional $f[\dots]$ should be maximized in $x(q)$ and extremized over the other function arguments. Besides the extremization of terms analogous to those appearing in Section 7.1.3, we have to calculate the functional derivatives of $f_i[x(q),\hat x(\hat q)]$. If $u^{-1}(u)$ is the inverse of some function $u(t)$, then variation of the identity $u^{-1}(u(t))=t$ yields the functional derivative of the inverse function as
$$\frac{\delta u^{-1}\big(u(t_1)\big)}{\delta u(t_2)}=-\frac{\delta(t_1-t_2)}{\dot u(t_1)}. \tag{8.20}$$
This relation helps us to calculate the sought derivatives of the interaction term:
$$\frac{\delta f_i[x(q),\hat x(\hat q)]}{\delta x(q)}=\beta\,\hat q\big(x(q)\big), \tag{8.21a}$$
$$\frac{\delta f_i[x(q),\hat x(\hat q)]}{\delta\hat x(\hat q)}=\beta\,q\big(\hat x(\hat q)\big). \tag{8.21b}$$
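The rule (8.20) for varying an inverse function can be checked numerically. The Python sketch below makes some hypothetical choices: the increasing test function $u(t)=t+t^3$, the variation $\delta u=\epsilon h(t)$ with $h(t)=\sin 3t$, and all function names. It inverts the perturbed function by bisection and compares the shift of $u^{-1}(u(t_1))$, divided by $\epsilon$, with the prediction $-h(t_1)/\dot u(t_1)$.

```python
import math

def invert_increasing(func, target, lo, hi, tol=1e-14):
    """Solve func(t) = target for a strictly increasing func by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def u(t):        # hypothetical strictly increasing function u(t)
    return t + t**3

def u_dot(t):    # its derivative
    return 1.0 + 3.0 * t**2

def h(t):        # hypothetical direction of the variation, delta u = eps * h
    return math.sin(3.0 * t)

def inverse_shift(t1, eps=1e-6):
    """Finite-difference estimate of [u_eps^{-1}(u(t1)) - t1] / eps."""
    def u_eps(t):
        return u(t) + eps * h(t)
    return (invert_increasing(u_eps, u(t1), -2.0, 2.0) - t1) / eps

t1 = 0.4
print(inverse_shift(t1), -h(t1) / u_dot(t1))   # should nearly coincide, cf. (8.20)
```

The remaining discrepancy is of order $\epsilon$, as expected from a first-order variation.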
These, together with functional derivatives of the type determined in Section 7.1.3, lead to the stationarity relations displayed in Section 7.1 for intervals of strictly increasing OPFs, including the points where the OPFs exhibit a step. Plateaus should be dealt with in a manner similar to that described in Section 7.1.3. Extremization with respect to $\hat q_1$, the starting point in time of the hatted PDE, yields
$$\beta\,\hat x(\hat q_1)\,q_1=\int dy\,\hat P(\hat q_1,y)\,\hat s(\hat q_1,y), \tag{8.22}$$
a condition which was not displayed in Section 8.1.

8.3. On thermodynamical stability

In the case of independently distributed synapses the free energy (3.22) involves combined maximization and extremization. Clearly, the 'extr' condition imposes no stability requirements. We now use a simple example to give the recipe for stability calculations in the replicon sector for such a case. Consider the two-variable function
$$F(x,\hat x)=f(x)+\hat f(\hat x)+x\hat x, \tag{8.23}$$
where $f$ and $\hat f$ are real functions. We are seeking
$$\min_x\ \operatorname*{extr}_{\hat x}F(x,\hat x). \tag{8.24}$$
Extremum conditions for $x$ and $\hat x$ imposed simultaneously would read as
$$x=-\hat f'(\hat x), \tag{8.25a}$$
$$\hat x=-f'(x). \tag{8.25b}$$
Substitution of the stationary value (8.25b) gives
$$\tilde F(x)\equiv F\big(x,-f'(x)\big)=f(x)+\hat f\big(-f'(x)\big)-xf'(x). \tag{8.26}$$
The stationarity condition in terms of $x$ is
$$\tilde F'(x)=-f''(x)\big[x+\hat f'\big(-f'(x)\big)\big]=0. \tag{8.27}$$
The stationary point $x+\hat f'(-f'(x))=0$ is a minimum of $\tilde F(x)$ if
$$\tilde F''(x)\Big|_{x=-\hat f'(\hat x),\ \hat x=-f'(x)}=-f''(x)\big[1-\hat f''(\hat x)\,f''(x)\big]>0, \tag{8.28}$$
where (8.25a) and (8.25b) are understood. Suppose $x$ and $\hat x$ are replaced by objects of more than one dimension, and at the saddle the Hessian matrices of $f$ and $\hat f$ can be simultaneously diagonalized. Then a similar formula holds, with the appropriate eigenvalues of the Hessians at stationarity substituted for $f''(x)$ and $\hat f''(\hat x)$. The above principle can be applied to the case of independently distributed synapses. There are two families of replicon eigenvalues, $\lambda_e(q_1,q_2,q_3)$ and $\hat\lambda_s(\hat q_1,\hat q_2,\hat q_3)$, coming from the energy and the entropic term, respectively. The contribution from the energy term, $\lambda_e(q_1,q_2,q_3)$, is the same as in the spherical case, given by Eq. (7.29). Analogously, from the hatted entropic term we have
$$\hat\lambda_s(\hat q_1,\hat q_2,\hat q_3)=-\beta\int dy_2\,dy_3\,\hat C_{uuu}(\hat q_1;0,0;\hat q_2,y_2;\hat q_3,y_3)\,\hat s(\hat q_2,y_2)\,\hat s(\hat q_3,y_3). \tag{8.29}$$
Here we spell out the obvious, namely, that the hatted PPDE gives rise to a hatted Green function (see Section 4.2.4), from which the vertex function $\hat C_{uuu}$ can be defined. Finally, based on (8.28), the necessary criterion of stability becomes
$$-\lambda_e(q_1,q_2,q_3)\big[1-\alpha\,\lambda_e(q_1,q_2,q_3)\,\hat\lambda_s(\hat q_1,\hat q_2,\hat q_3)\big]\ge0. \tag{8.30}$$
Here we have omitted a prefactor $\alpha$ and allowed equality to account for possible Goldstone modes in a Parisi phase. The stability condition is of course understood at stationarity, which yields a concrete function $q=q(\hat q)$. That implies $q_i=q(\hat q_i)$, so the overall replicon eigenvalue is parametrized by three independent variables, as in the spherical case.
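The one-dimensional recipe (8.23)–(8.28) can be tried out on a concrete case. In the Python sketch below, the functions $f(x)=x^2/2+x^4/4-0.3x$ and $\hat f(\hat x)=c\,\hat x^2/2$ with $c=2$ are hypothetical choices, as are all names. The code locates the stationary point of $\tilde F(x)=F(x,-f'(x))$ from the bracket in (8.27), estimates $\tilde F''$ by central differences, and compares the result with the closed form (8.28).

```python
C = 2.0  # hypothetical curvature of fhat

def f(x):      return 0.5 * x**2 + 0.25 * x**4 - 0.3 * x
def f1(x):     return x + x**3 - 0.3          # f'
def f2(x):     return 1.0 + 3.0 * x**2        # f''
def fhat(xh):  return 0.5 * C * xh**2
def fhat1(xh): return C * xh                  # fhat'
def fhat2(xh): return C                       # fhat''

def F_tilde(x):
    """(8.26): F(x, -f'(x)), with x_hat eliminated through (8.25b)."""
    return f(x) + fhat(-f1(x)) - x * f1(x)

def bracket(x):
    """The factor x + fhat'(-f'(x)) of (8.27); its zero is the stationary point."""
    return x + fhat1(-f1(x))

def bisect(g, lo, hi, tol=1e-14):
    """Root of g on [lo, hi], assuming a sign change."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_star = bisect(bracket, 0.0, 1.0)
xh_star = -f1(x_star)                          # (8.25b)
step = 1e-4
numeric = (F_tilde(x_star + step) - 2.0 * F_tilde(x_star) + F_tilde(x_star - step)) / step**2
closed = -f2(x_star) * (1.0 - fhat2(xh_star) * f2(x_star))   # (8.28)
print(x_star, numeric, closed)
```

For these choices $\hat f''f''>1$ at the saddle, so the bracket in (8.28) is negative. Since $f''>0$, the product $-f''[1-\hat f''f'']$ comes out positive and the stationary point is a minimum of $\tilde F$, although it is not a minimum of $F$ in both variables.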
9. Conclusions and outlook

The main messages of this paper were extensively discussed, and conclusions were advanced, in Sections 1 and 2, so here we only highlight a few moral issues. A sensitive question in approximating a CRSB state by finite R-RSB is how good the approximation will turn out to be in the end. There is so far no reliable a priori estimate of this error. By contrast, in a series expansion the last power retained at least asymptotically bounds the error. Sometimes there is a qualitative indicator showing that a low-order approximation is wrong. A long-known example is the ground-state entropy of the SK model, which was negative for finite R-RSB Ansätze, a problem cured only by Parisi's CRSB solution. However, macroscopic quantities are often
quite well approximated by the RS or low-order R-RSB solution. The main advantage of the CRSB calculation over these approximations is that the latter may fail to predict, even qualitatively, the distributions of local, non-self-averaging quantities such as the overlaps and local fields. These are observables in numerical simulations and can help to decide between candidate theories (see, e.g., the numerical review [91] on spin glasses). On the technical side, the mathematical framework discussed in Sections 4–6 concerns the general properties of CRSB phases, independently of the storage problem of the neuron, on which its use was subsequently demonstrated. It allows for a non-perturbative description of a wide range of problems in disordered systems, like long-range interaction spin glasses other than the SK model. It may also be a starting point for the study of frustrated phases, i.e., unsatisfiable situations, in optimization problems in general. Among the notoriously difficult problems in artificial neural networks is the learning and generalization of unlearnable tasks [200,235]. In the traditional scenario of equilibrium, i.e., batch, learning from examples, an unlearnable problem is characterized by a limit on the number of examples the network can reproduce. Beyond this limit of error-free learning, the generalization ability might still be improved, but the minimal training error is positive. This closely parallels the region beyond capacity in the storage problem. It is therefore sensible to assume that theoretical methods able to deal with imperfect storage may also help to describe the learning of unlearnable tasks. A further possible area of application is unsupervised learning [235], where no desired output is given and the task is to extract the properties of the distribution of examples. Again, if the network can be saturated by the examples, a complex phase appears, where methods similar to those presented in this paper may be the key to the solution.
Acknowledgements

The author acknowledges support by OTKA grant No. T017272. Thanks are due first and foremost to P. Reimann, the coauthor of Refs. [17,18], with whom the work that forms the basis of this paper was begun and in great part done. It is regrettable that, due to other obligations, he felt compelled to withdraw his name from the Physics Reports project along the way. Before that, however, he drafted an introductory chapter, from which much was retained in the present Sections 1 and 2, and collected many references. He was responsible for the numerical evaluation of the theoretical predictions; the text of, and most of the work behind, Section 7.2.3 is his contribution alone. Those sections are marked by an asterisk in the title. His numerous comments on other parts greatly improved the presentation. Discussions with F. Pázmándi helped to raise and clarify fine points of the theoretical framework, in particular the problem of a discontinuity in the error measure potential. The author remains grateful to T. Bíró for introducing him to the subject of Lie symmetries of partial differential equations. As already stated in Ref. [18], the simulation was done on the PC cluster of F. Csikor and Z. Fodor, built from grants OTKA-T22929 and FKFP0128/1997.
Appendix A. Abbreviations

This appendix lists acronyms and abbreviations used throughout the paper.

l.h.s.: left-hand side
r.h.s.: right-hand side
w.r.t.: with respect to
AT: de Almeida–Thouless [stability]
GF: Green function
OPF: order parameter function
PDE: partial differential equation
PPDE: Parisi's PDE, Eq. (4.36)
RSB: replica symmetry breaking
R-RSB: R-step RSB
CRSB: continuous RSB
SK: Sherrington–Kirkpatrick [model]
SPDE: Sompolinsky's PDE, Eqs. (4.53) and (4.54)
WTI: Ward–Takahashi identity
Appendix B. Derivation of the replica free energy

The $n$th moment of the partition function (3.10), with (3.3) inserted as a constraint, reads
$$\langle Z^n\rangle=\Big\langle\int\prod_ad^NJ_a\,w(\mathbf J_a)\int\prod_{a\mu}dy_a^\mu\,e^{-\beta\sum_{a\mu}V(y_a^\mu)}\prod_{a\mu}\delta\Big(y_a^\mu-N^{-1/2}\sigma^\mu\sum_kJ_{ak}S_k^\mu\Big)\Big\rangle. \tag{B.1}$$
The indices $k$, $\mu$, and $a$ run from 1 to $N$, $M$, and $n$, respectively. The Fourier transformation of the Dirac deltas introduces the ancillary variables $x_a^\mu$ adjoint to $y_a^\mu$. The average over the Gaussian distribution of the patterns $S_k^\mu$, and over the outputs $\sigma^\mu$, which are $\pm1$ with equal probability, can be performed straightforwardly. In fact, since $S_k^\mu$ is scaled by the vanishing factor $N^{-1/2}$, the same result would be obtained for other distributions of $\sigma^\mu S_k^\mu$ with zero mean and unit variance, independent of $k$ and $\mu$:
$$\langle Z^n\rangle=\int\prod_ad^NJ_a\,w(\mathbf J_a)\int\prod_{a\mu}\frac{dx_a^\mu\,dy_a^\mu}{2\pi}\exp\Big(-\beta\sum_{a\mu}V(y_a^\mu)+i\sum_{a\mu}x_a^\mu y_a^\mu-\frac1{2N}\sum_\mu\sum_{ab}x_a^\mu x_b^\mu\sum_kJ_{ak}J_{bk}\Big). \tag{B.2}$$
If we substitute the overlaps defined in (3.18), the product over $\mu$ gives the $M=\alpha N$th power of $e^{-n\beta f_e(Q)}$, where $f_e(Q)$ is displayed in (3.17d). Inserting the constraint (3.18) yields
$$\langle Z^n\rangle=\int\prod_ad^NJ_a\,w(\mathbf J_a)\,e^{-N\alpha n\beta f_e(Q)}\prod_{a<b}\int N\,dq_{ab}\,\delta\Big(Nq_{ab}-\sum_kJ_{ak}J_{bk}\Big). \tag{B.3}$$
For both the spherical (Eqs. (3.5a) and (3.5b)) and the independent (see the condition below Eq. (3.6)) prior distributions we have $q_{aa}\equiv q_D=1$. Fourier transformation of the Dirac deltas introduces the variables $\tilde q_{ab}$ adjoint to $q_{ab}$, and we have
$$\langle Z^n\rangle=\int\prod_{a<b}\frac{N\,dq_{ab}\,d\tilde q_{ab}}{2\pi}\int\prod_ad^NJ_a\,w(\mathbf J_a)\,e^{-N\alpha n\beta f_e(Q)}\exp\Big(iN\sum_{a<b}\tilde q_{ab}q_{ab}-i\sum_{a<b}\tilde q_{ab}\sum_kJ_{ak}J_{bk}\Big). \tag{B.4}$$
We shall see that for the prior densities of interest, after integration over the synaptic coefficients, the exponent has the overall coefficient $N$. Then for $N\to\infty$ the saddle point method can be applied, and it will turn out that the stationary value of each $\tilde q_{ab}$ is imaginary. We presume that the prior density is such that the integration path of $\tilde q_{ab}$ can be distorted so as to pass through the imaginary saddle point. In a sufficiently large neighborhood of the saddle point the path can then be taken as a straight line, parallel or perpendicular to the imaginary axis, whichever orientation ensures a maximum at the saddle. This procedure is typical when one integrates a fast oscillating integrand; then only an extremum condition, not specifically a minimum condition, needs to be satisfied. If we succeed in determining the saddle values of the $\tilde q_{ab}$'s as functions of the $q_{ab}$'s, we have to minimize in terms of $q_{ab}$. We shall see that this can be carried out for the spherical constraint (3.5), but we cannot explicitly determine the $\tilde q_{ab}$'s for general independent synapses (3.6). In the case of the spherical constraint (3.5) we shall make use of the advance knowledge that the stationary values of $\bar q_{ab}=i\tilde q_{ab}$ are real. Let us insert the Fourier transform of the Dirac deltas representing the spherical constraints, thereby introducing the integration variables $\tilde q_a$, and then switch over to $\bar q_a=i\tilde q_a$ to obtain
$$\langle Z^n\rangle=C_N^n\,N^{n(n-1)/2}(2\pi)^{-n(n+1)/2}\int\prod_{a<b}dq_{ab}\,d\bar q_{ab}\int\prod_ad^NJ_a\,d\bar q_a\,e^{-N\alpha n\beta f_e(Q)}$$
$$\times\exp\Big(N\sum_{a<b}\bar q_{ab}q_{ab}+N\sum_a\bar q_a-\sum_a\bar q_a\sum_kJ_{ak}^2-\sum_{a<b}\bar q_{ab}\sum_kJ_{ak}J_{bk}\Big). \tag{B.5}$$
We can introduce diagonal elements for the matrix $\bar Q$ as $\bar q_{aa}=2\bar q_a$. Performing the Gaussian integrals over the $J_{ak}$ we obtain
$$\langle Z^n\rangle=C_N^n\,N^{n(n-1)/2}(2\pi)^{n(N-n-1)/2}\int\prod_{a<b}dq_{ab}\prod_{a\le b}d\bar q_{ab}\,\exp N\Big(-\alpha n\beta f_e(Q)+\frac12\operatorname{Tr}Q\bar Q-\frac12\ln\det\bar Q\Big). \tag{B.6}$$
Given the asymptotics of the prefactor,
$$\frac{\ln C_N}{N}\simeq-\frac12\ln 2\pi e, \tag{B.7}$$
in the large-$N$ limit the saddle point method gives the free energy
$$f=\lim_{n\to0}\frac{1}{nN\beta}\big(1-\langle Z^n\rangle\big)=\lim_{n\to0}\frac1n\min_Q\operatorname*{extr}_{\bar Q}f(Q,\bar Q), \tag{B.8a}$$
$$f(Q,\bar Q)=\frac{n}{2\beta}+\alpha nf_e(Q)-\frac1{2\beta}\operatorname{Tr}Q\bar Q+\frac1{2\beta}\ln\det\bar Q. \tag{B.8b}$$
Using (3.20), the extremum condition for the matrix elements $\bar q_{ab}$ results in $\bar Q^{-1}=Q$. Substituting this into (B.8b) gives the spherical free energy (3.17). A similar derivation yields the free energy (3.22) for the prior distribution (3.6) of independent synapses. There we use $\hat q_{ab}=-i\beta^{-1}\tilde q_{ab}$ and obtain
$$\langle Z^n\rangle=\Big(\frac{N}{2\pi}\Big)^{n(n-1)/2}\int\prod_{a<b}dq_{ab}\,d\hat q_{ab}\,e^{-N\alpha n\beta f_e(Q)}\exp\Big(-N\beta\sum_{a<b}q_{ab}\hat q_{ab}\Big)\Big[\int\prod_adJ_a\,w(J_a)\exp\Big(\beta\sum_{a<b}\hat q_{ab}J_aJ_b\Big)\Big]^N. \tag{B.9}$$
The ancillary matrix $\hat Q$ has vanishing diagonal elements, $\hat q_{aa}=0$; with that in mind we recover the expression (3.22) for the free energy. In general, the ancillary matrix elements cannot be eliminated as easily as in the spherical case.
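The asymptotics (B.7) can be checked directly. The sketch below assumes, as the prefactors in (B.5) suggest, that $C_N$ is the normalization constant of the spherical prior, $C_N^{-1}=\int d^NJ\,\delta(\mathbf J^2-N)$. With that assumption, the sphere-surface formula gives $\int d^NJ\,\delta(\mathbf J^2-N)=\pi^{N/2}N^{N/2-1}/\Gamma(N/2)$. Python's `math.lgamma` then yields $N^{-1}\ln C_N$ for large $N$ without overflow. The function name is hypothetical.

```python
import math

def log_CN_per_N(N):
    """(1/N) ln C_N, assuming C_N^{-1} = int d^N J delta(J.J - N)."""
    # int d^N J delta(J.J - N) = pi^(N/2) * N^(N/2 - 1) / Gamma(N/2)
    log_norm = (0.5 * N * math.log(math.pi)
                + (0.5 * N - 1.0) * math.log(N)
                - math.lgamma(0.5 * N))
    return -log_norm / N

limit = -0.5 * math.log(2.0 * math.pi * math.e)   # right-hand side of (B.7)
for N in (10, 1000, 10**6):
    print(N, log_CN_per_N(N), limit)
```

The deviation from the limit decays roughly like $(\ln N)/(2N)$, so it is irrelevant in the saddle-point evaluation (B.8a).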
Appendix C. Derivation of the R-RSB free energy term

This appendix contains the few steps that lead to the R-RSB free energy term, starting out from Eq. (4.11). The integrals therein over the variables $x_a$ yield Dirac deltas, which fix the values of the $y_a$'s. The $j$ indices can be understood as follows. Assume as usual that each $m_r$ is a divisor of $n$. The ordered sequence of integers $1,\dots,n$ is divided into $n/m_r$ 'boxes', each containing $m_r$ integers. Then the index $j_r$ enumerates those boxes. Given $1\le a\le n$, for each $r$ the index $j_r(a)$ labels the box that contains $a$, i.e.,
$$j_r(a)=\Big[\frac{a-1}{m_r}\Big]+1, \tag{C.1}$$
where $[\dots]$ denotes the integer part. Then the coefficient of an $x_a$ in the first term of the exponent of (4.11) is characterized by the $j_r(a)$'s. That way we arrive at (with $q_{-1}=0$)
$$e^{nu[U(y),q,m]}=\int\prod_{r=0}^{R+1}\prod_{j_r=1}^{n/m_r}Dz^r_{j_r}\,\exp\sum_{a=1}^nU\Big(\sum_{r=0}^{R+1}z^r_{j_r(a)}\sqrt{q_r-q_{r-1}}\Big). \tag{C.2}$$
Note that $m_{R+1}=1$ and $a=j_{R+1}(a)$; we will substitute $j_{R+1}$ for $a$. The integrals over $z^{R+1}_{j_{R+1}}$ factorize as
$$e^{nu[U(y),q,m]}=\int\prod_{r=0}^{R}\prod_{j_r=1}^{n/m_r}Dz^r_{j_r}\prod_{j_{R+1}=1}^{n}\int Dz^{R+1}_{j_{R+1}}\exp U\Big(\sum_{r=0}^{R}z^r_{j_r(j_{R+1})}\sqrt{q_r-q_{r-1}}+z^{R+1}_{j_{R+1}}\sqrt{q_{R+1}-q_R}\Big). \tag{C.3}$$
The functions $j_r(j_{R+1})$, $r\le R$, are step-like in that they are constant for the $m_R/m_{R+1}$ different $j_{R+1}$'s belonging to the same box of length $m_R$. Integrations over the $z^{R+1}_{j_{R+1}}$'s associated with the same box
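give identical results. Different integrals are characterized by different $j_R$'s, and $j_R$ can serve as the new argument of the remaining indices, $j_r(j_R)$, $r\le R$. We then have
$$e^{nu[U(y),q,m]}=\int\prod_{r=0}^{R}\prod_{j_r=1}^{n/m_r}Dz^r_{j_r}\prod_{j_R=1}^{n/m_R}\Big[\int Dz\,\exp U\Big(\sum_{r=0}^{R}z^r_{j_r(j_R)}\sqrt{q_r-q_{r-1}}+z\sqrt{q_{R+1}-q_R}\Big)\Big]^{m_R/m_{R+1}}. \tag{C.4}$$
Again, integration over a $z^R_{j_R}$ gives the same value for those $j_R$'s that define the same $j_r(j_R)$, $r\le R-1$. These can be characterized by $j_{R-1}$, and one obtains
$$e^{nu[U(y),q,m]}=\int\prod_{r=0}^{R-1}\prod_{j_r=1}^{n/m_r}Dz^r_{j_r}\prod_{j_{R-1}=1}^{n/m_{R-1}}\Big\{\int Dz_R\Big[\int Dz_{R+1}\exp U\Big(\sum_{r=0}^{R-1}z^r_{j_r(j_{R-1})}\sqrt{q_r-q_{r-1}}+\sum_{r=R}^{R+1}z_r\sqrt{q_r-q_{r-1}}\Big)\Big]^{m_R/m_{R+1}}\Big\}^{m_{R-1}/m_R}. \tag{C.5}$$
The expression can be rolled up by continuing the above reasoning, and we arrive at
$$e^{nu[U(y),q,m]}=\int Dz_0\Big[\int Dz_1\Big[\cdots\Big[\int Dz_{R+1}\exp U\Big(\sum_{r=0}^{R+1}z_r\sqrt{q_r-q_{r-1}}\Big)\Big]^{m_R/m_{R+1}}\cdots\Big]^{m_1/m_2}\Big]^{m_0/m_1}. \tag{C.6}$$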
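The box bookkeeping (C.1) also generates the Parisi matrix itself. Two replicas $a$, $b$ share a box at every level $r$ up to some largest one, because the boxes are nested. Summing $q_r-q_{r-1}$ (with $q_{-1}=0$) over those levels therefore telescopes to $q_r$ at the largest shared level. The Python sketch below builds the $n\times n$ matrix in this way for integer $n$ and block sizes $m_0=n,\dots,m_{R+1}=1$, each dividing the previous one. The function names are hypothetical.

```python
def box_label(a, m_r):
    """j_r(a) = [(a - 1)/m_r] + 1: label of the size-m_r box containing replica a (1-based)."""
    return (a - 1) // m_r + 1

def parisi_matrix(n, m, q):
    """Q_ab = sum_r (q_r - q_{r-1}) * [a and b share a box of size m_r], with q_{-1} = 0.

    m = [m_0 = n, m_1, ..., m_{R+1} = 1] and q = [q_0, ..., q_{R+1}]."""
    if m[0] != n or m[-1] != 1 or len(m) != len(q):
        raise ValueError("need m_0 = n, m_{R+1} = 1 and one q_r per level")
    if any(m[r] % m[r + 1] for r in range(len(m) - 1)):
        raise ValueError("each m_{r+1} must divide m_r")
    Q = [[0.0] * n for _ in range(n)]
    for a in range(1, n + 1):
        for b in range(1, n + 1):
            prev = 0.0
            for m_r, q_r in zip(m, q):
                if box_label(a, m_r) == box_label(b, m_r):
                    Q[a - 1][b - 1] += q_r - prev
                prev = q_r
    return Q

# 1-RSB example: n = 4 replicas, boxes of size 2, q_0 = 0.1, q_1 = 0.5, diagonal q_2 = 1.
for row in parisi_matrix(4, [4, 2, 1], [0.1, 0.5, 1.0]):
    print(row)
```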
Appendix D. Derivation of the PPDE by continuation

To the author's knowledge, Ref. [226] is considered to be the only publication on the derivation of the PPDE. However, we were not able to reproduce the derivation from that article. Furthermore, [226] required $R\to\infty$ and $q_r-q_{r-1}\to0$, conditions which we did not find necessary to prescribe. In essence, [226] proposes an iteration in the direction opposite to that of the recursion (4.15). We were unable to reconstruct that iteration, mostly because its starting term was not known. In other words, we evaluated the free energy term (4.11) starting from $r=R+1$, while [226] did so from $r=0$ (in our notation). When $q_r-q_{r-1}\to0$ is assumed, our recursion yields the PPDE in the spirit of Ref. [226]. We use the identity
$$\exp\Big(\frac c2\frac{d^2}{dy^2}\Big)F(y)=\int Dz\,F\big(y+z\sqrt c\big) \tag{D.1}$$
to rewrite (4.15a) into
$$t_{r-1}(y)=\exp\Big(\frac{q_r-q_{r-1}}{2}\frac{d^2}{dy^2}\Big)\,t_r(y)^{x_r/x_{r+1}}. \tag{D.2}$$
In order to produce a PDE from the recursion, the $q_r$'s must be assumed to be ordered. We can then relegate the dependence on the index $r$ to dependence on the variable $q=q_r$. Continuation is then performed by replacing $q_r$ by $q$, and $t_r(y)$ by $t(q,y)$. We allow for non-trivial limits $q_0$ and $q_1$ as introduced in (4.42). The conditions (4.6) and (4.13) ensure monotonicity of $x(q)$. Suppose $x(q)$ is smooth, i.e., all $q_r-q_{r-1}\to0$ and $x_r-x_{r-1}\to0$ for $1\le r\le R+1$. Then an expansion of (D.2) in the differences to lowest nontrivial order yields the PDE (4.34) for $t(q,y)$ in the interval $(q_0,q_1)$. As we found in Section 4.1.2, Eq. (4.34) and, equivalently, the PPDE (4.36) hold even if $x(q)$ is not smooth, with the right interpretation of (4.34) at the discontinuities of $x(q)$. On the other hand, the author gladly acknowledges that he first obtained the PPDE for the general free energy term (4.1) in the spirit of the above-discussed derivation of Ref. [226].
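For small $R$, the recursion (D.2) can be evaluated numerically before any continuation. In the Python sketch below, the operator $e^{\frac c2\partial_y^2}$ is replaced, via (D.1), by a Gauss–Hermite average $\int Dz\,F(y+z\sqrt c)$ computed with NumPy's `hermegauss`. Several conventions are assumptions of this sketch rather than statements of the text: the start $t_{R+1}(y)=e^{U(y)}$, the padding $x_{R+2}=1$, the last step $u=x_1^{-1}\int Dz\,\ln t_0(z\sqrt{q_0})$ (the $K=1$ case of (E.9)), and all names. For the linear test function $U(y)=ky$ the recursion can be done by hand, giving $u=\frac{k^2}{2}\sum_{r=0}^{R}x_{r+1}(q_{r+1}-q_r)$, which serves as a check.

```python
import math
from numpy.polynomial.hermite_e import hermegauss

_Z, _W = hermegauss(40)
_W = _W / math.sqrt(2.0 * math.pi)          # weights of the Gaussian measure Dz (sum to 1)

def gauss_average(g):
    """int Dz g(z) by Gauss-Hermite quadrature, i.e. the right-hand side of (D.1)."""
    return sum(w * g(z) for z, w in zip(_Z, _W))

def rsb_term(U, q, x):
    """u from t_{r-1}(y) = int Dz t_r(y + z sqrt(q_r - q_{r-1}))^(x_r/x_{r+1}).

    q = [q_0, ..., q_{R+1}] non-decreasing (q_{R+1}: diagonal); x = [x_1, ..., x_{R+1}]."""
    top = len(q) - 1                         # index R+1
    xs = [None] + list(x) + [1.0]            # xs[r] = x_r, padded with x_{R+2} = 1
    def make_t(r):
        if r == top:
            return lambda y: math.exp(U(y))  # t_{R+1}(y) = exp U(y)
        t_next, dq, power = make_t(r + 1), q[r + 1] - q[r], xs[r + 1] / xs[r + 2]
        return lambda y: gauss_average(lambda z: t_next(y + z * math.sqrt(dq)) ** power)
    t0 = make_t(0)
    return gauss_average(lambda z: math.log(t0(z * math.sqrt(q[0])))) / xs[1]

def linear_closed_form(k, q, x):
    """Exact value for U(y) = k*y: (k^2/2) * sum_r x_{r+1} (q_{r+1} - q_r)."""
    return 0.5 * k**2 * sum(x[r] * (q[r + 1] - q[r]) for r in range(len(q) - 1))

k = 0.7
q, x = [0.2, 0.6, 1.0], [0.4, 1.0]           # a 1-RSB example
print(rsb_term(lambda y: k * y, q, x), linear_closed_form(k, q, x))
```

The nested closures cost $(\text{nodes})^{R+2}$ evaluations, so this direct approach is only meant for checking low-order R-RSB; the continuum limit is what the PPDE handles efficiently.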
Appendix E. Multidimensional generalization of the PPDE

We consider here the generalized free energy term
$$u[U(y),Q]=\frac1n\ln\int\frac{d^{nK}x\,d^{nK}y}{(2\pi)^{nK}}\exp\Big[\sum_aU(y_a^1,\dots,y_a^K)\Big]\exp\Big(i\sum_a\sum_{k=1}^Kx_a^ky_a^k-\frac12\sum_{ab}\sum_{kl}x_a^kq_{ab}^{kl}x_b^l\Big), \tag{E.1}$$
where the order parameter matrix now has extra indices,
$$[Q]^{kl}_{ab}=q^{kl}_{ab}. \tag{E.2}$$
Such a situation occurs, for instance, in the treatment of thermodynamical states in vector spin glasses, or of the metastable states in the SK model. When counting the stationary states of the Thouless–Anderson–Palmer equations, Bray and Moore [26] encountered Eq. (E.1) with $K=2$ and a special $U$. They displayed the corresponding PPDE but did not pursue the matter further. Since Eq. (E.1) is a straightforward generalization of the Parisi term, we briefly describe how to evaluate it. We also concisely formulate the calculation of replica correlators. The assumption of the Parisi structure for all individual submatrices of $Q$ with fixed $k,l$ can be cast into the form
$$Q=\sum_{r=0}^{R+1}(Q_r-Q_{r-1})\otimes I_{n/m_r}\otimes U_{m_r}, \tag{E.3}$$
where $I_{n/m_r}$ is the identity and $U_{m_r}$ the $m_r\times m_r$ matrix with all elements equal to one. Here
$$[Q_r]^{kl}=q_r^{kl} \tag{E.4}$$
is the symmetric $K\times K$ matrix analog of (5.14). The quadratic form in the exponent in (E.1) is now
$$\sum_{r=0}^{R+1}\sum_{j_r=1}^{n/m_r}\sum_{k,l=1}^K\big(q_r^{kl}-q_{r-1}^{kl}\big)\sum_{a=(j_r-1)m_r+1}^{j_rm_r}x_a^k\sum_{b=(j_r-1)m_r+1}^{j_rm_r}x_b^l, \tag{E.5}$$
with $q^{kl}_{-1}=0$. Let us diagonalize the difference between subsequent $Q_r$'s as
$$Q_r-Q_{r-1}=O_r\Lambda_r^2O_r^{\rm T},\qquad r=0,\dots,R+1, \tag{E.6}$$
where the orthogonal $K\times K$ matrix $O_r$ is made up of column eigenvectors of $Q_r-Q_{r-1}$, and $\Lambda_r^2$ is diagonal, with the real eigenvalues as diagonal elements. A derivation similar to that given in Section 4.1 and Appendix C yields the R-RSB term
$$u[U(y),\{q^{kl}_r\},x]=\frac1{x_1}\int D^Kz_0\ln\int D^Kz_1\Big[\cdots\Big[\int D^Kz_{R+1}\exp U\Big(\sum_{r=0}^{R+1}z_r\Lambda_rO_r^{\rm T}\Big)\Big]^{x_R/x_{R+1}}\cdots\Big]^{x_1/x_2}. \tag{E.7}$$
Here $\Lambda_r$ has the square roots of the eigenvalues as diagonal elements; these may be imaginary, and their sign is irrelevant. $D^Kz$ denotes the $K$-dimensional Gaussian integration measure, and $z_r$ is a $K$-dimensional row vector. The function $U(y^1,\dots,y^K)$ is naturally abbreviated by $U(y)$. The recursion
$$t_{r-1}(y)=\int D^Kz\,t_r\big(y+z\Lambda_rO_r^{\rm T}\big)^{x_r/x_{r+1}}, \tag{E.8a}$$
$$t_{R+1}(y)=e^{U(y)} \tag{E.8b}$$
evaluates (E.7) as
$$u[U(y),\{q^{kl}_r\},x]=\frac1{x_1}\int D^Kz\,\ln t_0\big(z\Lambda_0O_0^{\rm T}\big). \tag{E.9}$$
In order to produce a PDE we need to specify a time-like variable. For practical purposes we consider the case when one diagonal element is a known constant, say $q^{11}_{R+1}=1$. Then we pick $q^{11}_r$ as the time variable, call its continuation $q$, and obtain the PDE for the field $t(q,y)$ in $K$ spatial dimensions as
$$\partial_qt=-\frac12\sum_{k,l=1}^K\dot q^{kl}\,\partial_{y^k}\partial_{y^l}t+\frac{\dot x}{x}\,t\ln t, \tag{E.10a}$$
$$t(1,y)=e^{U(y)}. \tag{E.10b}$$
Here the dot means derivative with respect to $q$; of course $\dot q^{11}=1$, and $q$ evolves from 1 to 0. As in the case of one spatial dimension, in the $q$-intervals $(q_1,1)$ and $(0,q_0)$ we have $x(q)\equiv1$ and $x(q)\equiv0$, respectively, where $q_1=q^{11}_R$ and $q_0=q^{11}_0$. Again, by introducing
$$u(q,y)=\frac{\ln t(q,y)}{x(q)}, \tag{E.11}$$
we obtain the $K$-dimensional PPDE as
$$\partial_qu=-\frac12\sum_{k,l=1}^K\dot q^{kl}\big(\partial_{y^k}\partial_{y^l}u+x\,\partial_{y^k}u\,\partial_{y^l}u\big), \tag{E.12a}$$
$$u(1,y)=U(y). \tag{E.12b}$$
Then the sought term is
$$u[U(y),\{q^{kl}_r\},x]=u(0,0). \tag{E.13}$$
The evolution in the interval $(q_1,1)$ can be solved explicitly to give
$$u(q_1,y)=\ln\int D^Kz\,\exp U\big(y+z\Lambda_{R+1}O_{R+1}^{\rm T}\big)=\ln\int\frac{d^Kv\,d^Kw}{(2\pi)^K}\exp\Big[U(v)+iw\cdot(v-y)-\frac12w(Q_{R+1}-Q_R)w^{\rm T}\Big], \tag{E.14}$$
which is the initial condition for further evolution in $(0,q_1)$. From the mathematical viewpoint, the existence of the above expression needs to be clarified for the specific $U$ in play. It typically occurs that a diagonal element of $Q_{R+1}$ is known to vanish, while for other $r$'s the same diagonal element is positive. In general, $Q_{R+1}-Q_R$ is not necessarily a positive-definite matrix. However, Eq. (E.14) at $y=0$ is the RS free energy (with $q_R$ replaced by the RS value of $q$). On physical grounds we therefore surmise that divergence of the integral is a rare threat. In the present case there are $K(K+1)/2$ OPFs, namely $x(q)$ and $q^{kl}(q)$ with $(k,l)\ne(1,1)$ and $k\le l\le K$. Expectation values
$$\big\langle A(\{x^k_a\},\{y^k_a\})\big\rangle \tag{E.15}$$
are conveniently defined by inserting the function $A$ into the integrand of (E.1) and omitting the $\frac1n\ln$ in front of the formula. The limit $n\to0$ is understood. As in one spatial dimension, the GF $G(q_1,y_1;q_2,y_2)$ of the multidimensional PPDE is a key help in calculating averages of common occurrence. The GF is zero for $q_1>q_2$ and satisfies the PDE
$$\partial_{q_1}G=-\frac12\sum_{kl}\dot q^{kl}(q_1)\,\partial_{y_1^k}\partial_{y_1^l}G-x(q_1)\sum_{kl}\dot q^{kl}(q_1)\,\partial_{y_1^k}u(q_1,y_1)\,\partial_{y_1^l}G-\delta(q_1-q_2)\,\delta^K(y_1-y_2). \tag{E.16}$$
Special significance is attached to
$$P(q,y)=G(0,0;q,y), \tag{E.17}$$
a natural generalization of the $K=1$ field. Let us introduce the derivative fields
$$\mu^k(q,y)=\partial_{y^k}u(q,y), \tag{E.18a}$$
$$\chi^{kl}(q,y)=\partial_{y^k}\partial_{y^l}u(q,y). \tag{E.18b}$$
Then we can write the two-replica correlator
$$n\,\frac{\partial u[U(y),Q]}{\partial q^{kl}_{ab}}=-\big\langle x^k_ax^l_b\big\rangle\equiv C^{kl}_x(q_{ab}) \tag{E.19}$$
as
$$C^{kl}_x(q)=\int d^Ky\,P(q,y)\big[\mu^k(q,y)\,\mu^l(q,y)+\theta(q-1^-)\,\chi^{kl}(q,y)\big]. \tag{E.20}$$
With this formula, the stationarity conditions for a free energy that contains a term like (E.1) can be constructed immediately.
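The factorization (E.6) and the shifted argument $y+z\Lambda_rO_r^{\rm T}$ in (E.7)–(E.8a) can be sketched with NumPy. The function name below is hypothetical. `numpy.linalg.eigh` returns the eigenvalues and the column eigenvectors of a symmetric matrix. Taking complex square roots lets the construction work also when $Q_r-Q_{r-1}$ is indefinite, in line with the remark after (E.7) that the entries of $\Lambda_r$ may be imaginary. Because $\Lambda_r$ is diagonal, $(\Lambda_rO_r^{\rm T})^{\rm T}(\Lambda_rO_r^{\rm T})=O_r\Lambda_r^2O_r^{\rm T}$ reproduces the increment with a plain transpose, without complex conjugation.

```python
import numpy as np

def increment_factor(dQ):
    """Return B = Lambda_r O_r^T with B.T @ B == dQ, where dQ = Q_r - Q_{r-1} is K x K symmetric."""
    dQ = np.asarray(dQ, dtype=float)
    if dQ.ndim != 2 or dQ.shape[0] != dQ.shape[1] or not np.allclose(dQ, dQ.T):
        raise ValueError("Q_r - Q_{r-1} must be a symmetric square matrix")
    evals, O = np.linalg.eigh(dQ)             # dQ = O @ diag(evals) @ O.T
    lam = np.sqrt(evals.astype(complex))      # imaginary entries for negative eigenvalues
    return np.diag(lam) @ O.T

# K = 2 example with an indefinite increment (one negative eigenvalue).
dQ = np.array([[0.30, 0.10],
               [0.10, -0.05]])
B = increment_factor(dQ)
z = np.array([0.7, -1.2])                     # one K-dimensional Gaussian sample (row vector)
y = np.array([0.0, 0.5])
shifted = y + z @ B                           # the argument y + z Lambda_r O_r^T
print(np.allclose(B.T @ B, dQ))               # True
```

In an actual evaluation of (E.8a), `z` would run over the nodes of a $K$-dimensional Gaussian quadrature or over random samples.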
Appendix F. An identity between Green functions

In this appendix we prove the identity (4.87). The r.h.s. of
$$\partial_qC_{uuu}(q;\{q_i,y_i\})=\int dy\Big\{\big[\partial_qG(q_1,y_1;q,y)\big]G(q,y;q_2,y_2)\,G(q,y;q_3,y_3)+G(q_1,y_1;q,y)\big[\partial_qG(q,y;q_2,y_2)\big]G(q,y;q_3,y_3)+G(q_1,y_1;q,y)\,G(q,y;q_2,y_2)\big[\partial_qG(q,y;q_3,y_3)\big]\Big\} \tag{F.1}$$
can be expressed using the PDEs for the participating GFs. From (4.77) we have
$$\partial_qG(q_1,y_1;q,y)=\tfrac12\partial_y^2G(q_1,y_1;q,y)-x(q)\,\partial_y\big[\mu(q,y)\,G(q_1,y_1;q,y)\big]+\delta(q-q_1)\,\delta(y-y_1), \tag{F.2}$$
and for $i=2,3$ (4.76) holds as
$$\partial_qG(q,y;q_i,y_i)=-\tfrac12\partial_y^2G(q,y;q_i,y_i)-x(q)\,\mu(q,y)\,\partial_yG(q,y;q_i,y_i)-\delta(q-q_i)\,\delta(y-y_i). \tag{F.3}$$
Let us substitute the right-hand sides of the above PDEs into (F.1). The sum of the terms linear in $x(q)$ turns out to be a derivative with respect to $y$. Under the plausible condition that the GFs decay for large $|y|$, its integral over $y$ is therefore zero. The second derivatives in $y$ also cancel after partial integration, except for a remnant, which yields
$$\partial_qC_{uuu}(q;\{q_i,y_i\})=\int dy\,G(q_1,y_1;q,y)\big[\partial_yG(q,y;q_2,y_2)\big]\big[\partial_yG(q,y;q_3,y_3)\big]+\delta(q-q_1)\,G(q_1,y_1;q_2,y_2)\,G(q_1,y_1;q_3,y_3)$$
$$-\delta(q-q_2)\,G(q_1,y_1;q_2,y_2)\,G(q_2,y_2;q_3,y_3)-\delta(q-q_3)\,G(q_1,y_1;q_3,y_3)\,G(q_3,y_3;q_2,y_2). \tag{F.4}$$
Eq. (4.75) relates the derivatives of the GFs appearing here, whence we obtain (4.87) for $q_1<q<q_2$ and $q_1<q<q_3$.

Appendix G. PDEs for high temperature

Here we record the calculation leading to the lowest-order non-trivial correction for the distribution of local stabilities at high temperatures. Assuming $P(q,y)=P_0(q,y)+\beta P_1(q,y)+O(\beta^2)$ and expanding the SPDE, we obtain
$$\partial_qP_0=\tfrac12\partial_y^2P_0,\qquad P_0(0,y)=\delta(y), \tag{G.1a}$$
$$\partial_qP_1=\tfrac12\partial_y^2P_1+x\,\partial_y(P_0m_0),\qquad P_1(0,y)\equiv0. \tag{G.1b}$$
Here $m_0(q,y)$ is the lowest-order approximation for the field $m(q,y)$ in (7.5). It therefore satisfies (7.6) with $\beta=0$, i.e., it evolves by pure diffusion. Using its initial condition $m_0(1,y)=V'(y)$ we get
$$m_0(q,y)=\int dy'\,G(y-y',1-q)\,V'(y'). \tag{G.2}$$
The zeroth-order probability field is obviously
$$P_0(q,y)=G(y,q), \tag{G.3}$$
while the next correction can be obtained from (G.1b) using the Gaussian GF for pure diffusion. This gives (in case of ambiguity, $q^-$ should be understood as the upper limit of integration)
$$P_1(q,y)=\int_0^qdq'\int dy'\,G(y-y',q-q')\,x(q')\,\partial_{y'}\big[G(y',q')\,m_0(q',y')\big]$$
$$=-\int_0^qdq'\int dy'\,\big[\partial_{y'}G(y-y',q-q')\big]\,x(q')\,G(y',q')\,m_0(q',y')$$
$$=\partial_y\int_0^qdq'\,x(q')\int dy'\,dy''\,V'(y'')\,G(y',q')\,G(y-y',q-q')\,G(y'-y'',1-q'). \tag{G.4}$$
In the last equation, formula (G.2) for $m_0$ was also substituted. We note the elementary identity
$$\int dy\,\prod_{i=1}^3G(y-y_i,q_i)=\frac{1}{2\pi\sqrt p}\exp\Big(-\frac{A}{2p}\Big), \tag{G.5}$$
where
$$A=y_1^2(q_2+q_3)+y_2^2(q_1+q_3)+y_3^2(q_1+q_2)-2y_1y_2q_3-2y_1y_3q_2-2y_2y_3q_1, \tag{G.6a}$$
$$p=q_1q_2+q_1q_3+q_2q_3. \tag{G.6b}$$
Hence, at $q=1$, we obtain
$$P_1(1,y)=\partial_y\int_0^1dq\,x(q)\int dy'\,V'(y')\,\frac{1}{2\pi\sqrt{1-q^2}}\exp\Big(-\frac{y^2+y'^2-2yy'q}{2(1-q^2)}\Big). \tag{G.7}$$
This is identical to the function $\rho_1(y)$ given in Eq. (7.62b).
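The elementary identity (G.5)–(G.6) is easy to confirm numerically. In the Python sketch below, $G(y,q)$ is taken to be the normalized Gaussian of variance $q$, i.e., the heat kernel of $\partial_q=\tfrac12\partial_y^2$ as in (G.3). The sample points, the variances, and the function names are arbitrary choices, and the $y$ integral is done by a fine Riemann sum.

```python
import numpy as np

def G(y, q):
    """Heat kernel of d/dq = (1/2) d^2/dy^2: Gaussian density with variance q."""
    return np.exp(-y**2 / (2.0 * q)) / np.sqrt(2.0 * np.pi * q)

def triple_overlap_numeric(ys, qs, half_width=40.0, n_points=400001):
    """Left-hand side of (G.5) by a Riemann sum on a wide uniform grid."""
    y = np.linspace(-half_width, half_width, n_points)
    integrand = G(y - ys[0], qs[0]) * G(y - ys[1], qs[1]) * G(y - ys[2], qs[2])
    return integrand.sum() * (y[1] - y[0])

def triple_overlap_closed(ys, qs):
    """Right-hand side of (G.5) with A and p from (G.6a), (G.6b)."""
    (y1, y2, y3), (q1, q2, q3) = ys, qs
    A = (y1**2 * (q2 + q3) + y2**2 * (q1 + q3) + y3**2 * (q1 + q2)
         - 2 * y1 * y2 * q3 - 2 * y1 * y3 * q2 - 2 * y2 * y3 * q1)
    p = q1 * q2 + q1 * q3 + q2 * q3
    return np.exp(-A / (2.0 * p)) / (2.0 * np.pi * np.sqrt(p))

ys, qs = (0.3, -0.5, 1.1), (0.4, 0.7, 1.3)
print(triple_overlap_numeric(ys, qs), triple_overlap_closed(ys, qs))
```

Setting $y_1=0$, $q_1=q$ and $q_2=q_3=1-q$ gives $p=1-q^2$ and $A=y_2^2+y_3^2-2y_2y_3q$, which is the kernel appearing in (G.7).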
Appendix H. Longitudinal stability for high temperatures

Below we show that the linear operator displayed in (7.60) has only negative eigenvalues on the space of smooth functions $f(q)$ with $f(0)=f(1)=0$. Consider the eigenvalue problem
$$\int_0^1dq_2\,f(q_2)\int_{\max(q_1,q_2)}^1\frac{dq}{D(q)}=\lambda f(q_1), \tag{H.1}$$
where we omitted the factor $-1$ on the r.h.s. of (7.60), so it is the positivity of the $\lambda$'s that has to be proven. The l.h.s. separates as
$$\int_0^{q_1}dq_2\,f(q_2)\int_{q_1}^1\frac{dq}{D(q)}+\int_{q_1}^1\frac{dq}{D(q)}\int_{q_1}^{q}dq_2\,f(q_2). \tag{H.2}$$
The first term is equivalently
$$\int_{q_1}^1\frac{dq}{D(q)}\int_0^{q_1}dq_2\,f(q_2), \tag{H.3}$$
which concatenates with the second term in (H.2) to
$$\int_{q_1}^1\frac{dq}{D(q)}\int_0^{q}dq_2\,f(q_2). \tag{H.4}$$
Introducing
$$m(q)=\int_0^qdq'\,f(q'), \tag{H.5}$$
we obtain, after differentiation of the eigenvalue problem (H.1), the equivalent form
$$\frac{m(q)}{D(q)}=-\lambda\,m''(q). \tag{H.6}$$
This equation can have a solution that vanishes at the boundaries only if $\lambda>0$. Indeed, one can try to solve (H.6) by the 'shooting method', starting from $m(0)=0$ and attempting to reach $m(1)=0$. Then the curvature of $m(q)$ cannot have the same sign as $m(q)$ throughout the interval $(0,1)$, or else $m(1)=0$ will never be reached. Since $D(q)>0$ for $q<1$, this implies $\lambda>0$. Thus we have demonstrated that the Hessian (7.60) is negative definite.
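The shooting argument can be made concrete. The function $D(q)$ of (7.60) is not reproduced here, so the Python sketch below uses the hypothetical choice $D(q)\equiv1$. For that choice, (H.6) with $m(0)=m(1)=0$ has the eigenvalues $\lambda=1/(n\pi)^2$. The code integrates $m''=-m/(\lambda D)$ from $m(0)=0$, $m'(0)=1$ with a fourth-order Runge–Kutta scheme, bisects on $\lambda$ for the lowest eigenvalue, and shows that for $\lambda<0$ the solution never returns to zero. All names are hypothetical.

```python
import math

def shoot(lam, D=lambda q: 1.0, n_steps=2000):
    """Integrate m'' = -m / (lam * D(q)) with m(0) = 0, m'(0) = 1 (RK4); return m(1)."""
    h = 1.0 / n_steps
    def rhs(q, m, v):
        return v, -m / (lam * D(q))
    q, m, v = 0.0, 0.0, 1.0
    for _ in range(n_steps):
        k1m, k1v = rhs(q, m, v)
        k2m, k2v = rhs(q + h / 2, m + h / 2 * k1m, v + h / 2 * k1v)
        k3m, k3v = rhs(q + h / 2, m + h / 2 * k2m, v + h / 2 * k2v)
        k4m, k4v = rhs(q + h, m + h * k3m, v + h * k3v)
        m += h / 6 * (k1m + 2 * k2m + 2 * k3m + k4m)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        q += h
    return m

def eigenvalue_by_shooting(lo, hi, D=lambda q: 1.0, tol=1e-12):
    """Bisect on lambda in [lo, hi] (m(1) must change sign there) until m(1) = 0."""
    f_lo = shoot(lo, D)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = shoot(mid, D)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam1 = eigenvalue_by_shooting(0.08, 0.2)
print(lam1, 1.0 / math.pi**2)      # lowest eigenvalue for D = 1
print(shoot(-0.1))                 # lambda < 0: m(q) curves away from zero, m(1) > 0
```

With a different positive $D(q)$ only the numerical values change; the sign argument of the text is unaffected.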
References

[1] W.S. McCulloch, W. Pitts, A logical calculus of ideas immanent in nervous activity, Bull. Math. Biophys. 5 (1943) 115.
[2] F. Rosenblatt, Principles of Neurodynamics, Spartan, New York, 1962.
[3] E. Gardner, Maximum storage capacity in neural networks, Europhys. Lett. 4 (1987) 481.
[4] E. Gardner, The space of interactions in neural network models, J. Phys. A 21 (1988) 257.
[5] E. Gardner, B. Derrida, Optimal storage properties of neural network models, J. Phys. A 21 (1988) 271.
[6] M. Griniasty, H. Gutfreund, Learning and retrieval in attractor neural networks, J. Phys. A 24 (1991) 715.
[7] P. Majer, A. Engel, A. Zippelius, Perceptrons above saturation, J. Phys. A 26 (1993) 7405.
[8] R. Erichsen, W.K. Theumann, Optimal storage of a neural network model: A replica symmetry-breaking solution, J. Phys. A 26 (1993) L61.
[9] M. Bouten, Replica symmetry instability in perceptron models, J. Phys. A 27 (1994) 6021.
[10] B. Derrida, Reply to the comment of M. Bouten, J. Phys. A 27 (1994) 6025.
[11] W. Whyte, D. Sherrington, Replica-symmetry breaking in perceptrons, J. Phys. A 29 (1996) 3063.
[12] A.H.L. West, D. Saad, Threshold induced phase transitions in perceptrons, J. Phys. A 30 (1997) 3471.
[13] M. Garey, D.S. Johnson, Computers and Intractability; A Guide to the Theory of NP-Completeness, Freeman, San Francisco, 1979.
N.K. Glendenning / Physics Reports 342 (2001) 393–447
PHASE TRANSITIONS AND CRYSTALLINE STRUCTURES IN NEUTRON STAR CORES
Norman K. GLENDENNING
Nuclear Science Division and Institute for Nuclear and Particle Astrophysics, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Physics Reports 342 (2001) 393–447
Phase transitions and crystalline structures in neutron star cores☆
Norman K. Glendenning
Nuclear Science Division and Institute for Nuclear and Particle Astrophysics, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Received June 2000; editor: G.E. Brown
Contents
1. Prologue
2. Superdense matter and its phases
2.1. Hyperonized matter
2.2. Bose condensation
2.3. Quark deconfinement
3. Phase transitions in multi-component substances
3.1. Proof of the global nature of conservation laws
3.2. Degrees of freedom and internal driving forces
3.3. Spatial structure
3.4. Three theorems
4. Quark deconfinement phase transition
4.1. Theoretical models
4.2. Bulk structure of a hybrid star
4.3. Spatial structure in the mixed phase
5. Kaon condensation
5.1. Introductory remarks
5.2. Relativistic mean-field model with kaons
5.3. Matter properties with a kaon condensate
5.4. Mixed phase properties
5.5. Stellar properties with kaon condensed phase
6. Possible consequences of geometrical phases
6.1. Pulsar glitches
6.2. Neutrino transport
7. Summary
Acknowledgements
References
☆ This work was supported by the Director, Office of Energy Research, Office of High Energy and Nuclear Physics, Division of Nuclear Physics, of the U.S. Department of Energy under Contract DE-AC03-76SF00098.
E-mail address: [email protected] (N.K. Glendenning).
0370-1573/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0370-1573(00)00080-6
Abstract
The mixed phase of a fully equilibrated nuclear system that is asymmetric in isospin (i.e., in charge) will develop a geometrical structure of the rarer phase immersed in the dominant one. This happens because the isospin asymmetry energy exploits the degree of freedom available to a system of more than one independent component (or conserved charge): it rearranges the proportion of charge to baryon number between the two equilibrium phases so as to lower the energy, that is, to effectively reduce the isospin asymmetry in the normal nuclear phase. Consequently, the two phases will have opposite charge, and the competition between Coulomb and surface energy is resolved by the formation of a Coulomb lattice of the rarer phase situated at sites in the dominant phase. The geometric form, size, and spacing of the phase occupying the lattice sites depend on the pressure or density of matter. Thus, a neutron star containing a mixed-phase region of whatever kind will have a varying geometric structure of one phase embedded in the other. This is expected to affect the transport properties of the star as well as the glitch behavior of pulsars that contain a mixed-phase region. In particular, we study the quark deconfinement and kaon condensation phase transitions as examples of this general phenomenon.
© 2001 Elsevier Science B.V. All rights reserved.
PACS: 97.60.Jd; 26.60.+d; 95.30.Cq; 97.10.Cv
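The abstract attributes the finite size of the rarer-phase structures to a competition between Coulomb and surface energy. A minimal sketch of that trade-off, assuming the simplest scaling picture rather than the detailed treatment developed later in the article: at fixed volume fraction, the surface energy per unit volume falls as A/r while the Coulomb energy per unit volume grows as B r², where r is the characteristic size of the structure and A, B are hypothetical positive constants. Minimizing A/r + B r² gives r* = (A/(2B))^(1/3), and at that size the surface term is exactly twice the Coulomb term. The Python below (all function names are illustrative, not taken from the article) checks this numerically with a golden-section search.

```python
import math


def energy_density(r, a, b):
    """Schematic energy per unit volume: surface term a/r plus Coulomb term b*r**2."""
    return a / r + b * r ** 2


def golden_section_min(f, lo, hi, tol=1e-10):
    """Locate the minimum of a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/golden ratio, about 0.618
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d  # minimum lies in [lo, d]
        else:
            lo = c  # minimum lies in [c, hi]
    return 0.5 * (lo + hi)


def optimal_size(a, b):
    """Return (numerical, analytic) size r minimizing a/r + b*r**2."""
    r_num = golden_section_min(lambda r: energy_density(r, a, b), 1e-6, 1e6)
    r_exact = (a / (2.0 * b)) ** (1.0 / 3.0)
    return r_num, r_exact


if __name__ == "__main__":
    a, b = 3.0, 0.5  # hypothetical constants in arbitrary units
    r_num, r_exact = optimal_size(a, b)
    surface = a / r_num
    coulomb = b * r_num ** 2
    print(f"r* numerical = {r_num:.6f}, analytic = {r_exact:.6f}")
    print(f"surface / Coulomb at the minimum = {surface / coulomb:.6f}")
```

In this toy form, raising A (a larger surface tension) enlarges r*, while raising B (a stronger Coulomb penalty) shrinks it. In more complete treatments both coefficients depend on the volume fraction of the rarer phase and on the dimensionality of the structure, which is one way to see why the form and size vary with density.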
1. Prologue
Nucleons in a neutron star are bound ten times more strongly than in a nucleus. Because of this, a neutron star radiates about ten times as much energy during its birth as the luminous presupernova star radiated in its ten-million-year lifetime. Most of this enormous binding energy is emitted in neutrinos, which diffuse from the protoneutron star during the first twenty seconds [1]. Only about one percent of the gravitational binding energy drives the explosion; this is accomplished by a small fraction of energetic neutrinos, which transform part of the binding energy into the kinetic energy that drives the supernova explosion. Convection from the hot inner core during birth provides a continuous source of heat and high-energy neutrinos until most of the star is expelled in the explosion [2,3].
Supernova explosions, powered by the release of gravitational energy from newly born neutron stars, distribute elements up to the iron mass peak into the cosmos. Heavier elements are produced by explosive nucleosynthesis in the expanding remnant. The planets and life itself owe their existence to these dense neutron stars. Without the formation of neutron stars, elements heavier than those produced in primordial nucleosynthesis would be locked inside stars whose ultimate fate would be collapse to a black hole.
Yet neutron stars are a cosmic coincidence. Two physically unrelated masses are involved: the Chandrasekhar mass, which is the limiting mass that can be supported by degenerate electron pressure in the core of the evolving presupernova star, and the Oppenheimer limit, which is the maximum possible neutron star mass that can be supported against gravity by baryon Fermi pressure and nuclear repulsion, weakened by hyperonization or other phase transitions that we will discuss shortly.
If the Oppenheimer limit were less than the Chandrasekhar limit, supernovae would not occur; the dying star would collapse directly into a black hole, carrying with it the elements it processed during its lifetime, and the cosmos would remain forever sterile. The frequency of supernova events in galaxies such as ours is about one every hundred years. This, combined with the brevity of the birth, informs us that a protoneutron star will seldom be seen, however interesting the physical process that likely occur during birth. Neutron stars cool very rapidly to temperatures that are small on the nuclear scale, and an estimated one in "ve can be seen as canonical pulsars even as far as 15 to 20 kps, though the majority are seen only to distances of 5 kps. Canonical pulsars, those that belong to the main population of about 1100 known pulsars, have mean periods of about 0.7 s. Because of the strong magnetic "eld that rotates with them, they radiate angular momentum and energy; consequently, their rotational frequency constantly decreases. They are believed to be born with high magnetic "elds due to #ux conservation and to evolve at near constant "eld strength toward longer periods as illustrated in Fig. 1. Because they are born with modest rotational frequency, like the Crab pulsar, with period 1/30 s, the centrifugal force is weak and its decrease brings about no appreciable change in central density. They remain relatively unchanged in composition and structure during their entire active life as pulsars of about 10 million years. Finally, because of a lower combination of rotational frequency and magnetic
The gravitational binding per nucleon in a neutron star of mass $1.5\,M_\odot$ is typically 100 MeV, as compared with a nuclear binding of less than 10 MeV. The radius of the disk of our galaxy is 30 kpc. A kiloparsec is about 3300 light years.
N.K. Glendenning / Physics Reports 342 (2001) 393}447
Fig. 1. Three phases in the evolutionary track of pulsars are (1) from high magnetic field and moderate rotation period to long period in about 10}10 yr, (2) to accreting X-ray neutron stars (radio silent), (3) to millisecond pulsars with low magnetic fields.
field strength, they fade from the radio sky. The concentration of canonical pulsars at longer periods simply reflects the rate at which angular momentum is lost, which falls off as the rotational period increases ($dP/dt \propto 1/P$ for magnetic dipole radiation). A neutron star that has disappeared from the radio sky may reappear after an indefinite time, first as an X-ray neutron star, its surface heated by accretion of matter from a less dense companion. The companion may have been acquired during the long, indefinite silent period, or the progenitor of the neutron star may already have had a low-mass companion. By angular momentum conservation, the accreted matter spins up the neutron star. This phase has been discovered: it is the missing link between canonical and millisecond pulsars [4,5]. When the combination of higher frequency and a magnetic field, now much smaller because of either ohmic decay or partial destruction by the accreted material, regains a critical value, the neutron star reappears as a millisecond pulsar. Because of the significant centrifugal force in millisecond pulsars, their internal structure changes as angular momentum is radiated, though on a very long timescale [6,7]. The three stages in the evolution of neutron stars are illustrated in Fig. 1; of course, a particular neutron star need not pass through the last two stages. Because canonical and millisecond pulsars are the only neutron stars that have been observed, or are likely ever to be observed, our discussion centers on them rather than on the transient protostar stage. The intermediate-stage X-ray neutron stars may eventually provide crucial information on the mass-radius relationship of neutron stars, but it is too early to say for sure [8,9]. Certainly they provide the missing link between the canonical and millisecond pulsars.
About half of all stars have companions.
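The relation $dP/dt \propto 1/P$ for magnetic dipole radiation, and the pile-up of canonical pulsars at long periods that it implies, can be checked with a few lines. A rotating dipole of fixed field and radius loses energy at a rate proportional to $\Omega^4$, so $I\Omega\dot\Omega \propto -\Omega^4$, i.e. $\dot\Omega \propto -\Omega^3$, which with $P = 2\pi/\Omega$ gives $P\,dP/dt = k$ = constant. The sketch below assumes exactly this idealization (constant field, no field decay); the value of $k$ is a hypothetical input, not a fit to any observed pulsar.

```python
import math


def period_after(p0: float, k: float, t: float) -> float:
    """Period (s) after time t (s) under dipole braking, dP/dt = k / P.

    Integrating P dP = k dt from P0 gives P(t)**2 = P0**2 + 2*k*t.
    """
    return math.sqrt(p0**2 + 2.0 * k * t)


def time_between(p_lo: float, p_hi: float, k: float) -> float:
    """Time (s) spent with period between p_lo and p_hi, from dt = P dP / k."""
    return (p_hi**2 - p_lo**2) / (2.0 * k)


def period_after_euler(p0: float, k: float, t: float, steps: int = 100_000) -> float:
    """Forward-Euler integration of dP/dt = k / P, as a check on the closed form."""
    dt = t / steps
    p = p0
    for _ in range(steps):
        p += (k / p) * dt
    return p


if __name__ == "__main__":
    K = 1.0e-15      # hypothetical P * dP/dt in seconds, constant for a constant field
    P0 = 1.0 / 30.0  # Crab-like birth period, s
    YEAR = 3.156e7   # s
    t = time_between(P0, 0.7, K)
    print(f"closed form: reaches 0.7 s after {t / YEAR:.2e} yr")
    print(f"Euler check: P = {period_after_euler(P0, K, t):.6f} s")
    # Equal-width bins: 0.6-0.7 s is occupied 13/3 times longer than 0.1-0.2 s.
    print(time_between(0.6, 0.7, K) / time_between(0.1, 0.2, K))
```

Because the time spent per unit period grows linearly with $P$, a steadily born population accumulates at long periods, which is the concentration of canonical pulsars described above.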
The much weaker magnetic field of millisecond pulsars, compared with canonical pulsars, produces a much smaller torque. Consequently, the rotation of millisecond pulsars changes very slowly. However, although the time rate of change of the properties of these stars is much too slow to observe over a human lifetime, or even over many lifetimes, the rate of change with respect to rotational frequency can be large and can produce highly anomalous values of certain observables related to the spin characteristics [6,7].
2. Superdense matter and its phases
The properties and phenomena associated with dense matter have fascinated scientists since white dwarfs were first discovered. It was difficult even for Eddington to understand how objects orders of magnitude denser than Earth could form and cool, or even what they might consist of, until the wedding of special relativity and quantum mechanics provided an understanding of degenerate Fermi systems [10–12]. Gravity crushes atoms in white dwarfs until they are ionized, and the degenerate electrons that occupy the interstices between nuclei provide the pressure that stabilizes these stars. An understanding at the level of Fermi degeneracy was available to Baade and Zwicky [13], who correctly asserted that the binding energy of closely packed nucleons could power the explosion of stars when the core collapsed at the endpoint of exothermic fusion reactions. The Fermi energy of nucleons, assisted by the repulsion of nuclear forces at short distance (presumably due in part or whole to the exclusion principle among their quark constituents), provided an easy understanding of the possible existence of neutron stars. The discovery of rapidly spinning pulsars provided evidence for the existence of superdense matter. The average density can be inferred by balancing the centrifugal and gravitational forces at the stellar surface; the fastest rotator provides a lower limit on the average density of the order of nuclear density. The inner density is much higher than the average, just as for the atmosphere of our planet. Since then, great strides have been made in theoretical explorations of the possible states of superdense matter, but little concrete evidence exists. The astronomer, for one, is limited to discovering and making measurements on the spectacles that nature presents.
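The density bound from rotation follows from requiring gravity to hold matter at the equator against the centrifugal force, $\Omega^2 R \le GM/R^2$. With $M = (4\pi/3)\bar\rho R^3$ and $\Omega = 2\pi/P$ this gives $\bar\rho \ge 3\Omega^2/(4\pi G) = 3\pi/(G P^2)$. The sketch below is Newtonian and ignores general relativity and rotational flattening; the 1.56 ms period is an illustrative assumption (close to that of PSR B1937+21), and the nuclear density constant is an approximate value, neither taken from this paper.

```python
import math

G = 6.674e-11          # Newton's constant, m^3 kg^-1 s^-2
RHO_NUCLEAR = 2.7e17   # approximate nuclear saturation density, kg/m^3


def min_mean_density(period_s: float) -> float:
    """Lower bound on the mean density (kg/m^3) of a star rotating with period P.

    Newtonian balance at the equator, Omega^2 R <= G M / R^2, with
    M = (4 pi / 3) rho R^3 and Omega = 2 pi / P, gives rho >= 3 pi / (G P^2).
    """
    return 3.0 * math.pi / (G * period_s**2)


if __name__ == "__main__":
    for period in (1.0 / 30.0, 1.56e-3):  # Crab-like and millisecond periods (illustrative)
        rho = min_mean_density(period)
        print(f"P = {period * 1e3:6.2f} ms -> rho >= {rho:.2e} kg/m^3 "
              f"({rho / RHO_NUCLEAR:.2g} x nuclear)")
```

For the millisecond period the bound is roughly a fifth of nuclear density, i.e. of the order of nuclear density, while a Crab-like period constrains the density far less; only the fastest rotators probe the superdense regime in this way.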
Limiting as this is, one must admire the fantastic expansion of our understanding of the universe in the last decade, occasioned by the rapid development of new technologies, satellite-borne observatories and the electronic computer. The laboratory experimenter, by contrast, can control and repeat observations within limits, but as concerns superdense matter the handicaps are also great, not least the extremely short lifetime of the dense domain produced in nuclear collisions and the enormous multiplicity of produced particles, which can be observed only after the dense domain has been blown apart. In exploring the possible nature of dense matter, the theorist is limited only by the laws of nature, his imagination, and very few observational constraints. In this paper we shall discuss the properties, and especially the phases, that may exist in superdense matter, paying particular attention to possible verifiable consequences of our exploration. The properties of superdense matter that are inferred from theory are quite remarkable; among them is the formation of crystalline structures in the mixed phase of nuclear matter and any of its high-density phases. The
crystal structure consists of the rarer of the two phases occupying crystal lattice sites in the background of the dominant phase. Such a possibility was not even imagined a decade ago [14,15]. A very promising signal of a high-density phase transition in pulsars was recently conceived [6,7]. The consequences of crystalline structure may be more subtle, but the structure must surely affect all transport properties [15]. Indeed, recent calculations confirm a large effect of geometric structure on neutrino transport [16]. In addition, one can anticipate an effect on pulsar glitches: small, sudden, irregular, and unpredictable changes in rotation frequency followed by a recovery. There is great individuality in the glitch behavior of individual pulsars, as would be expected because of the large changes in crystalline structure for small differences in mass [17]. So, at present, theoretical investigation goes beyond what is verifiable, but that is likely to change. Because such a novel crystalline structure could produce observable signals, the purpose of this paper is to review the theoretical basis on which it rests. The structures should appear in any dense nuclear medium provided that (1) electric charge is one of the conserved quantities carried by the matter, (2) the matter under consideration is isospin or charge asymmetric, and (3) the dense state endures long enough to allow relaxation into its lowest energy state [14,15]. Neutron stars satisfy all three conditions, but the dense matter produced in nuclear collisions almost certainly does not have time to relax. The discussion will therefore center on neutron stars, and in particular on the quark deconfinement and kaon condensate phase transitions. At nuclear density and somewhat higher densities, neutron star matter consists of a charge-neutral mixture of neutrons, protons and leptons.
At densities moderately above nuclear density, some nucleons, driven by the Pauli principle, will likely be converted to hyperons. We refer to the phase containing leptons, nucleons or other members of the baryon octet as the normal phase. At higher densities, reached either as a protoneutron star cools through neutrino diffusion and shrinks [18–20] or as a centrifugally deformed rotating neutron star loses angular momentum to magnetic dipole radiation and shrinks [6], transitions to other phases may occur. High-density phases include, besides further hyperonization, the Bose-condensed (pion or kaon) and quark-deconfined phases. Each of these transitions may be of first or second order. If it is of first order, a 'mixed' phase consisting of spatially distinct regions of the pure phases in phase equilibrium will occur at densities intermediate between those of normal neutron star matter and the high-density phase. The mixed phase will extend over a finite radial region of the star, and the proportion of the normal and high-density phases will vary with depth (pressure) in the star. The mixed phase will consist of an intricate spatial pattern of the rare phase occupying Coulomb lattice sites immersed in the dominant phase. This surprising finding pertains to any first-order phase transition in a substance having more than one conserved quantity (or independent component), one of which is electric charge [14,15]. Beta-stable neutron star matter is such a substance.
The Coulomb force overwhelms gravity at the surface of a star when the ratio of net positive charge to baryon number exceeds about $10^{-36}$. For net negative charge the limit is smaller by the factor $m_e/m_p$.
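The footnote's bound on the net charge per baryon can be estimated by requiring that the Coulomb repulsion on a charged particle at the surface, $Ze^2/(4\pi\varepsilon_0 R^2)$, not exceed its gravitational attraction, $G (A m_p) m/R^2$, where $Z$ is the net charge in units of $e$, $A$ the baryon number (the stellar mass is approximated by $A m_p$), and $m$ the mass of the surface particle: a proton for positive charge, an electron for negative charge. $R$ cancels. This Newtonian sketch uses rounded values of the constants.

```python
G = 6.674e-11            # Newton's constant, m^3 kg^-1 s^-2
M_P = 1.6726e-27         # proton mass, kg
M_E = 9.109e-31          # electron mass, kg
E2_4PI_EPS0 = 2.307e-28  # e^2 / (4 pi eps0), J m


def max_charge_per_baryon(test_mass: float) -> float:
    """Largest Z/A for which gravity still binds a surface particle of mass test_mass.

    Balance: Z e^2 / (4 pi eps0 R^2) <= G (A m_p) test_mass / R^2, so R cancels.
    """
    return G * M_P * test_mass / E2_4PI_EPS0


if __name__ == "__main__":
    pos = max_charge_per_baryon(M_P)  # net positive charge: a proton at the surface
    neg = max_charge_per_baryon(M_E)  # net negative charge: an electron at the surface
    print(f"positive: Z/A < {pos:.1e}; negative: Z/A < {neg:.1e}; ratio = {neg / pos:.2e}")
```

The positive-charge limit comes out near $8\times10^{-37}$, i.e. of order $10^{-36}$, and the negative-charge limit is smaller by exactly $m_e/m_p$ because only the mass of the surface particle changes.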
2.1. Hyperonized matter
Hyperonized matter refers to a phase in which, at moderate densities above normal nuclear density, some of the baryon charge is carried by hyperons, which may be positively or negatively charged or neutral. Such a phase in high-density baryonic matter is an almost inevitable consequence of the Pauli principle, which distributes baryon charge over several species so as to lower the baryon chemical potential and therefore the energy [21,22]. While the protoneutron star is still hot, flavor-changing reactions such as
$N + N \to N + \Lambda + K$   (1)
are possible. The associated kaon is free to decay unless a phase transition drives it into a condensed state (discussed later). Depending on the combination of protons and neutrons involved in the above reaction, all three charge states may be formed. Some of their decays are
$K^0 \to 2\gamma, \qquad K^- \to \mu^- + \bar\nu, \qquad \mu^- + K^+ \to \mu^- + \mu^+ + \nu \to 2\gamma + \nu$.   (2)
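The Pauli-principle argument of Section 2.1 can be illustrated with the crudest possible model: cold, free (non-interacting) Fermi gases of neutrons and $\Lambda$ hyperons only, at fixed total baryon density. Because both species are neutral, chemical equilibrium requires only $\mu_n = \mu_\Lambda$. This is a sketch under stated assumptions, not the paper's model: interactions, leptons and the other hyperons are all ignored, the masses are rounded particle-data values, and the free-gas threshold density it finds is therefore not a prediction for real neutron star matter.

```python
import math

HBARC = 197.327     # MeV fm
M_N = 939.57        # neutron mass, MeV
M_LAMBDA = 1115.68  # Lambda hyperon mass, MeV


def mu_free(n: float, mass: float) -> float:
    """Chemical potential (MeV) of a cold, free, spin-1/2 Fermi gas of density n (fm^-3)."""
    k_f = HBARC * (3.0 * math.pi**2 * n) ** (1.0 / 3.0)
    return math.sqrt(k_f**2 + mass**2)


def share_baryon_number(n_b: float, tol: float = 1e-12) -> tuple[float, float]:
    """Split baryon density n_b into (n_neutron, n_lambda) with mu_n = mu_Lambda.

    Both species are neutral, so only baryon number constrains the split.
    Below threshold (mu_n < m_Lambda) no Lambdas appear.
    """
    if mu_free(n_b, M_N) <= M_LAMBDA:
        return n_b, 0.0
    lo, hi = 0.0, n_b  # bracket for n_lambda
    while hi - lo > tol:
        x = 0.5 * (lo + hi)
        if mu_free(n_b - x, M_N) > mu_free(x, M_LAMBDA):
            lo = x  # neutrons still cost more: move more baryon number into Lambdas
        else:
            hi = x
    x = 0.5 * (lo + hi)
    return n_b - x, x


if __name__ == "__main__":
    for n_b in (0.5, 1.0, 1.5, 2.0):  # fm^-3; nuclear saturation is about 0.16
        n_n, n_l = share_baryon_number(n_b)
        print(f"n_B={n_b:.1f}  Lambda fraction={n_l / n_b:.3f}  "
              f"mu_B={mu_free(n_n, M_N):7.1f} MeV vs pure neutrons {mu_free(n_b, M_N):7.1f} MeV")
```

Once the neutron chemical potential exceeds the $\Lambda$ mass, moving baryon number into the new species lowers the common chemical potential below that of pure neutron matter at the same density. In this toy the $\Lambda$ fraction rises continuously from zero above threshold, the behavior the text associates with a second-order transition.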
The star cools by diffusion of neutrinos and photons to the surface, where they escape. The star thus loses energy, and the reactions above become irreversible; strangeness is thereby locked in [21]. Once the star has cooled to the point where the associated kaon can no longer be produced in reactions like (1), the final equilibrium state can be reached through weak flavor-changing reactions such as the one illustrated in Fig. 2. In all of the reactions cited a neutrino is produced; consequently, the reactions are Pauli blocked during the first 20 s in the life of the neutron star, until the neutrinos have diffused to the surface and escaped. There are also neutrinoless flavor-changing weak interactions. The timescale for the normal weak interaction is 10\ s. The neutrinoless reactions are slower, with a timescale of 10\ s. Thus the weak interactions can bring about hyperonization where the strong interaction cannot. The hyperonization phase transition is likely of second order, though in principle it could be of first order. (A useful way of thinking about a second-order transition is that the phases differ in degree rather than in substance. Thus, if the concentration of hyperons increases continuously from zero as the baryon density is increased above a critical value, the transition is of second or higher order. However, if the concentration is
Fig. 2. Illustration of a weak flavor-changing reaction at the quark level. The three horizontal lines denote the three valence quarks in a baryon. The lepton shown is either an electron or a muon.
discontinuous, being zero in the 'normal' phase and finite in the hyperonized phase, never tending to zero as the critical density is approached from above, the transition is of first order.) Hyperonization has several important effects in neutron stars: (1) it reduces the maximum possible mass of neutron stars, compared with models in which hyperons are absent, by as much as $3/4\,M_\odot$ [21,23]; (2) it affects the cooling rate of neutron stars [24,25]; (3) it provides a mechanism by which protoneutron stars of mass somewhat above the limiting mass of fully equilibrated stars may produce a supernova and then promptly subside into a low-mass ($\sim 1.5$ to $2\,M_\odot$) black hole in about 20 s [18–20] (see Ref. [26] for a discussion of the role of neutrino trapping). In fact, the neutron star produced in the SN 1987A event may have disappeared promptly into a black hole [27].
2.2. Bose condensation
Bose condensation could be of either order, depending on the interactions, but may be preempted by hyperonization or by quark deconfinement [21,22]. The reason, explained in detail elsewhere [21], is that condensation of $\pi^-$ or $K^-$ is favorable only if the electron Fermi energy exceeds the effective in-medium mass of the meson, in which case the bosons become the energetically favored agent for neutralizing the protons. However, at the higher baryon densities where this might otherwise happen, baryons of both charges appear. Since the baryon number of a star is conserved but its lepton number is not (because neutrinos diffuse out of the star), charge neutrality can be achieved among the baryons alone, with little or no need for charged leptons or mesons [21,22]. (The energy cost of the baryons must be paid in any case because baryon number is conserved, whereas the lepton Fermi energy and the boson masses need not be paid when neutrality can be achieved among the baryons.)
However, whether kaons are likely to condense has not been fully explored so far, for two reasons: (1) the phase transition with the conservation laws properly enforced (as discussed below) is very difficult to implement when the full 'botany' of baryon species is included; (2) the coupling constants of the hyperons of the baryon octet are not well known. In fact, the best that can be done is a constraint on the coupling [23], together with the assumption that all other hyperons couple similarly, or by quark counting rules.
2.3. Quark deconfinement
The asymptotic freedom of quarks ensures that at some sufficiently high density the deconfinement phase transition will occur, irrespective of whether either of the other phase transitions has occurred at lower density [28]. It is not known whether deconfinement is of first or second order in cold, baryon-rich hadronic matter. Lattice QCD so far has not been simulated with dynamical quarks. Models of the phase transition in which the deconfined phase is represented by the 'bag' model [29] or any of its variations are first order [30]. Both quark deconfinement and Bose condensation can produce the same effects in neutron stars as those cited above for hyperonization. The deconfinement phase transition in neutron stars was first discussed more than twenty years ago [31–36]. However, the theory was later reexamined, and new insights into first-order phase transitions in any complex substance were achieved, including the formation of a mixed phase with crystalline structure, as mentioned above [14,15]. In all of the earlier work, either beta equilibrium
was ignored or charge neutrality was imposed as a local constraint on the mixed phase. Evidently, either constraint may prevent the model star from attaining its lowest-energy state. What is not necessarily obvious is that both constraints force a first-order phase transition to be of the constant-pressure type, like the vapor-liquid transition in water [14,15]. A mixed phase whose pressure is constant, independent of the proportion of the phases, is excluded from the monotonically varying pressure environment of a gravitating body. In the second case, in which charge neutrality is imposed as a local constraint, the electron chemical potential is discontinuous between the two phases, so the Gibbs conditions for equilibrium cannot be satisfied. The implications for stellar structure in both cases were therefore incorrect, and substantially so: a neutron star composed of the two phases would have a large density discontinuity at the radial point corresponding to the constant phase-transition pressure. Pure quark matter would occupy the region interior to the discontinuity, surrounded by pure confined hadronic matter, and the mixed phase would be entirely absent. However, when a phase transition in beta-stable neutron star matter is treated so as to respect the Gibbs criteria for equilibrium, the pressure is not constant but varies monotonically with the proportion of the two phases in equilibrium [14,15]. The density discontinuity disappears; instead, a region of mixed phase, possibly several kilometers thick, lies between the pure phases. As was pointed out, it is not possible to satisfy the Gibbs criteria and locally imposed conservation laws ($q(\mathbf r) \equiv 0$) simultaneously in systems containing several conserved quantities.
The Gibbs conditions for phase equilibrium and the conservation laws can be satisfied simultaneously only when the conservation laws are imposed in a global sense ($\int q(\mathbf r)\,dV = 0$). These statements hold in general for any first-order phase transition in a multi-component system (whether deconfinement, pion or kaon condensation, or any other). In general, the internal forces will drive a reapportionment of the concentration(s) of the conserved quantities, at each proportion of the phases in equilibrium, so as to minimize the energy. When one of the conserved quantities is electric charge and the system is neutral overall, reapportionment means that the phases are oppositely charged. In nuclear systems the redistribution is driven by the isospin restoring force of nuclear matter, to which the Fermi energies of the nucleons and their coupling to the rho meson contribute. The valley of beta stability is a manifestation of this restoring force. Consequently, neutron star matter, which is highly isospin asymmetric when in the pure confined phase, achieves greater symmetry, consistent with the conservation laws and the principle of minimum energy, by exchanging charge (and strangeness) with quark matter when the two are in phase equilibrium. Nuclear matter regions of the mixed phase become positively charged, while regions of quark matter become negatively charged. The Coulomb force and the surface interface energy have opposing tendencies: the former favors breaking up regions of like charge into smaller ones, the latter favors the configuration with the least surface area. The opposing tendencies are resolved by the formation of a Coulomb lattice of droplets (or other geometries) of the rare phase immersed in the dominant one. The lattice spacing and the size of the drops are determined by minimizing the energy. This was explained in detail in Ref. [15], and detailed calculations for several stellar configurations were first reported in Ref. [17].
In this paper we shall present a detailed study of the crystalline structure in the case of the deconfinement phase transition and the kaon phase transition. We shall be interested in the particle populations (nucleons, hyperons, quarks and kaons), the electric charge density carried in regions of different phase in equilibrium with each other, and finally, the dependence of the resulting
Fig. 3. A pie section showing the regions occupied by the various pure and geometric mixed phases in a star of a particular mass.
Fig. 4. Compare with the neighboring-mass star in Fig. 3. Notice that the pure quark liquid core is absent here.
crystalline structure, its extent, and its varying geometry and scale within neutron stars, as functions of the stellar mass. Although known pulsar masses span a rather narrow range, from Her X-1 ($M = 0.98 \pm 0.12\,M_\odot$) to Vela X-1 ($M \approx 1.77\,M_\odot$), this turns out to be an extreme variation as far as its impact on the crystalline structure is concerned. It seems plausible that the solid crystalline region is involved in the phenomenon of pulsar glitches, the sudden changes in pulsar rotational period, occurring on time-scales of days, months or years, that are observed in some pulsars. The great individuality [37] of the glitch behavior of different pulsars may be related to the extremely strong dependence of the crystalline solid on the stellar mass [17]. It is also likely that the crystalline mixed-phase region will affect all transport and superfluid properties of large regions of neutron stars, because it occupies a substantial volume of the star and its confined regions are more neutron-proton symmetric than the nuclear fluid that surrounds it and forms an intermediate zone between the crystalline mixed phase and the outer crust. (Compare the neighboring-mass stars illustrated in Figs. 3 and 4.)
3. Phase transitions in multi-component substances
3.1. Proof of the global nature of conservation laws
We stated above that conservation laws in multi-component systems undergoing a first-order phase transition are obeyed globally but not, in general, locally. The mathematical proof is very
simple. The Gibbs condition for phase equilibrium is that the chemical potentials, the temperature $T$, and the pressure are common to the two phases (called 1 and 2). For definiteness we consider a system with two conserved charges (or independent components), baryon number and electric charge. The Gibbs conditions for thermal, chemical and mechanical equilibrium are then summarized in
$p_1(\mu_n, \mu_e, T) = p_2(\mu_n, \mu_e, T)$,   (3)
where $\mu_n$ and $\mu_e$ are the chemical potentials corresponding to baryon number and electric charge. This equation must hold in conjunction with expressions for the conservation laws. However, it is not possible to impose the conservation laws as local conditions. To see this, consider for definiteness neutron star matter, which must be charge neutral. Neutrality can be guaranteed by either of two conditions, one local and one global. Consider first the local conditions for vanishing charge in neutron star matter, and therefore for the neutrality of a star composed of that matter:
$q_1(\mu_n, \mu_e, T) = 0, \qquad q_2(\mu_n, \mu_e, T) = 0$.   (4)
Here the charge density in each phase is required to vanish identically. We denote these densities by $q_1$ and $q_2$; they are different functions of the chemical potentials because the phases are physically different. Therefore we can choose one of the chemical potentials, say $\mu_n$, to be common, but not both, if we want each phase to have vanishing charge density. By insisting on local neutrality, we can solve the above two equations in the form
$\mu_e = f(\mu_n) \quad \text{and} \quad \mu_e' = g(\mu_n)$,   (5)
where $\mu_e$ and $\mu_e'$ now denote the electron chemical potentials in phases 1 and 2.
Using these results in the Gibbs requirement for equality of pressure at equilibrium, we find
$p_1(\mu_n, f(\mu_n), T) = p_2(\mu_n, g(\mu_n), T) = p$.   (6)
We see that the problem has been reduced to one dimension: the pressure is a function of only one chemical potential. Therefore, by demanding that each phase in equilibrium be separately charge neutral, the domain of phase equilibrium can be found by means of the Maxwell construction. However, the demand of local neutrality exceeds what the stability of the star against the Coulomb force requires. It makes the phase transition resemble that of a simple substance, in which the pressure remains constant independent of the proportion of the phases in equilibrium, but at the cost of a difference $\mu_e - \mu_e'$ between the electron chemical potentials of the two phases. Therefore local neutrality, or local conservation in general, is incompatible with Gibbs phase equilibrium. We make a particular point of this because all treatments of phase transitions in neutron stars prior to our 1992 work imposed local neutrality. As a consequence, the mixed phase of normal and high-density matter was absent in all such models: in the approximation of local (identical) neutrality the pressure is constant, and since pressure is monotonic in a star, a unique pressure occupies but a single radial point. The mixed phase is literally squeezed out of the star by gravity. Now consider the condition for global neutrality. We can consider a local volume $V$ in a star in which pressure and temperature are uniform. The existence of such a local region is assured by the equivalence
principle, even in a strong gravitational field. For a volume fraction $\chi$ of phase 2, the conditions of global conservation can be expressed (for uniform systems) as
$\frac{1}{V}\int_V q(\mathbf r)\,d^3r = (1-\chi)\,q_1(\mu_n,\mu_e,T) + \chi\,q_2(\mu_n,\mu_e,T) = \frac{Q_1+Q_2}{V} = \frac{Q}{V} = 0$,   (7)
where
$\chi = V_2/V, \qquad V = V_1 + V_2$   (8)
is the volume proportion of phase 2, $Q_1$ and $Q_2$ are the total electric charges of phases 1 and 2, respectively, contained in a local volume $V$, and $q_1$ and $q_2$ are the charge densities in the subvolumes $V_1$ and $V_2$ occupied by the two phases in equilibrium. (For a star the net charge is $Q = 0$.) Given a temperature and the volume fraction $\chi$ of phase 2, the two equations (3) and (7) serve to determine the two independent chemical potentials $\mu_n$ and $\mu_e$. Thus the solutions are of the form
$\mu_n = \mu_n(\chi, T), \qquad \mu_e = \mu_e(\chi, T)$.   (9)
These symbolic equations define the common chemical potentials of the phases in equilibrium. The equilibrium condition (3) therefore appears as
$p_1(\chi, T) = p_2(\chi, T) \qquad (0 \le \chi \le 1)$.   (10)
This proves that the common pressure and all properties of the phases in equilibrium vary with the proportion $\chi$, and that the pressure of a multi-component system in the mixed phase is not, in general, constant. These properties of phase equilibrium in multi-component substances are fundamentally different from those of single-component substances such as water, whose properties are independent of the proportion of the phases. Having solved for the unknowns, we can compute the density in each phase, and from it the volume-weighted density in the volume $V$:
$\rho = \frac{1}{V}\int_V \rho(\mathbf r)\,d^3r = (1-\chi)\,\rho_1(\mu_n,\mu_e,T) + \chi\,\rho_2(\mu_n,\mu_e,T)$,   (11)
$\rho = \frac{B_1+B_2}{V} = \frac{B}{V}$.   (12)
Here $B$ is defined analogously to $Q$.
3.2. Degrees of freedom and internal driving forces
Two key concepts, degrees of freedom and driving forces, are needed to understand the microphysics that determines the nature of a phase transition and the properties of the mixture of two phases in equilibrium in a substance with two or more conserved charges (or independent components). The discussion above already bears on both, and we develop the microphysics further in this section. The concentration of the conserved quantities in both of the pure phases of a substance (for clarity we take two conserved quantities, labeled $Q$ and $B$) is some definite number
$c = Q/B$,   (13)
fixed by the way the system was prepared, whether in a test tube by a chemist or in a neutron star by nature through the partially chaotic processes of a supernova. The degree(s) of freedom that the system can exploit to find the energy minimum in the mixed phase is the rearrangement of the concentration of the conserved charges between the phases in equilibrium,
$c_1 = \frac{Q_1}{B_1} = \frac{q_1(\chi)}{\rho_1(\chi)}, \qquad c_2 = \frac{Q - Q_1}{B - B_1} = \frac{q_2(\chi)}{\rho_2(\chi)}$,   (14)
subject to the overall conservation laws
$B_1 + B_2 = B, \qquad Q_1 + Q_2 = Q$.   (15)
We thus see that, for a given partition of $B$ between the two phases in equilibrium, the internal driving force can arrange an energetically optimal partition of $Q$. A two-component system therefore has one degree of freedom. Neutron star matter, or more generally isospin-asymmetric nuclear matter, has one degree of freedom in the mixed phase: the freedom to redistribute electric charge between the two phases in equilibrium so as to minimize the energy at each density by reducing the symmetry energy. More generally, if the system has $n$ conserved charges, there are $n - 1$ such degrees of freedom [15]. In particular, a single-component substance possesses no such freedom, which is why the pressure and all properties of the two phases remain the same for all proportions of the phases in equilibrium (as for water and ice). For a multi-component substance, however, the nature of each of the phases in equilibrium changes as the volume proportion of the phases changes, as is clear from (9). This is another way of seeing why the pressure does not remain constant, as it does for a simple (one-component) substance. In nuclear systems the isospin (or charge) symmetry energy exerts the driving force. For example, in the liquid-vapor transition of nuclear matter, when liquid and vapor are in equilibrium, the isospin restoring force causes an exchange of charge between the two phases that makes the denser liquid phase more symmetric, thus reducing the asymmetry energy. In neutron star matter, the same isospin restoring force causes an exchange of charge between the normal phase and the other phase, be it the quark-deconfined phase or the kaon-condensed phase. Because neutron star matter is highly isospin asymmetric, the pressure can vary by a factor of ten or more from one extreme of the mixed phase to the other.
The generalization to an arbitrary number of conserved charges is straightforward: there is one additional chemical potential and one additional conservation law such as (7) for each additional charge (Fig. 5). (Some confusion may arise concerning the meaning of the volume $V$. It is not the volume of a star but any convenient, locally inertial volume in the star. Spacetime, though curved by mass-energy, is flat to a high degree over regions that are large compared with interparticle distances, even in stars that are on the verge of collapse to a black hole. The relative change in spatial metric is &10\ over a distance of the order of 10 internucleon spacings [22]. We may therefore solve all problems of the structure and composition of matter in Minkowski spacetime and use the results, in the form of the stress-energy tensor, in Einstein's equations.)
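The contrast between locally and globally imposed neutrality can be made concrete with a deliberately simple, hypothetical two-phase model. Each phase $i$ is given a pressure $p_i(\mu_B,\mu_Q) = c_i\mu_B^2/2 + d_i(\mu_Q-\nu_i)^2/2 - B_i$, so that its baryon density is $\partial p_i/\partial\mu_B$ and its charge density is $q_i = \partial p_i/\partial\mu_Q = d_i(\mu_Q-\nu_i)$; $\mu_B$ and $\mu_Q$ play the roles of the two chemical potentials in Eqs. (3)-(10). The coefficients are invented for illustration and have nothing to do with a real neutron star equation of state. Local neutrality fixes $\mu_Q = \nu_i$ separately in each phase and leaves a single coexistence pressure (the Maxwell construction); global neutrality, Eq. (7), lets the common $\mu_Q$ shift with $\chi$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyPhase:
    """Hypothetical phase with p = c*mu_b**2/2 + d*(mu_q - nu)**2/2 - bag."""
    c: float
    d: float
    nu: float
    bag: float

    def pressure(self, mu_b: float, mu_q: float) -> float:
        return 0.5 * self.c * mu_b**2 + 0.5 * self.d * (mu_q - self.nu) ** 2 - self.bag

    def charge(self, mu_q: float) -> float:
        return self.d * (mu_q - self.nu)  # q = dp/dmu_q

    def baryon_density(self, mu_b: float) -> float:
        return self.c * mu_b  # n = dp/dmu_b


# Illustrative numbers only; neither phase models real nuclear or quark matter.
PHASE1 = ToyPhase(c=1.0, d=1.0, nu=0.0, bag=0.0)
PHASE2 = ToyPhase(c=2.0, d=1.0, nu=1.0, bag=1.0)


def equal_pressure_mu_b(mu_q1: float, mu_q2: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Bisect for mu_b with p1(mu_b, mu_q1) = p2(mu_b, mu_q2); assumes one root in [lo, hi]."""
    def gap(mu_b: float) -> float:
        return PHASE1.pressure(mu_b, mu_q1) - PHASE2.pressure(mu_b, mu_q2)

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def local_neutrality_pressure() -> float:
    """Each phase separately neutral (mu_q = nu_i in phase i): one coexistence pressure."""
    mu_b = equal_pressure_mu_b(PHASE1.nu, PHASE2.nu)
    return PHASE1.pressure(mu_b, PHASE1.nu)


def global_neutrality_state(chi: float) -> tuple[float, float, float]:
    """Common (mu_b, mu_q, p) at volume fraction chi of phase 2.

    The global condition (1 - chi) q1 + chi q2 = 0 is linear in mu_q for these toy phases.
    """
    w1, w2 = (1.0 - chi) * PHASE1.d, chi * PHASE2.d
    mu_q = (w1 * PHASE1.nu + w2 * PHASE2.nu) / (w1 + w2)
    mu_b = equal_pressure_mu_b(mu_q, mu_q)
    return mu_b, mu_q, PHASE1.pressure(mu_b, mu_q)


def mixed_baryon_density(chi: float) -> float:
    """Volume-weighted baryon density, the analogue of Eq. (11)."""
    mu_b, _, _ = global_neutrality_state(chi)
    return (1.0 - chi) * PHASE1.baryon_density(mu_b) + chi * PHASE2.baryon_density(mu_b)


if __name__ == "__main__":
    print(f"local neutrality: p = {local_neutrality_pressure():.4f} for every chi")
    for chi in (0.0, 0.25, 0.5, 0.75, 1.0):
        mu_b, mu_q, p = global_neutrality_state(chi)
        print(f"chi={chi:.2f}  p={p:.4f}  q1={PHASE1.charge(mu_q):+.3f}  "
              f"q2={PHASE2.charge(mu_q):+.3f}  n={mixed_baryon_density(chi):.4f}")
```

With these numbers the globally neutral mixed phase spans pressures from 0.5 at $\chi = 0$ to 2 at $\chi = 1$ (arbitrary units), with the two phases carrying opposite charges inside it, whereas local neutrality collapses coexistence to the single pressure 1, which in a star would occupy a single radius.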
Fig. 5. Variation of pressure and other properties of neutron star matter in the mixed phase of normal and deconfined matter. Variation of pressure by a factor of ten or more ensures that the mixed phase will occupy a broad radial extent in a neutron star (see [15]).
3.3. Spatial structure
Now we recall briefly the argument for the existence of a spatial structure in the mixed phase of a multi-component substance when one of the conserved charges generates the long-range Coulomb interaction. We have proven above that conservation laws are global. Accordingly, regions of the normal phase of nuclear matter and of any other phase in equilibrium with it, such as quark matter or the kaon-condensed phase, are not in general charge neutral but carry opposite and compensating charges. The internal forces determine the energetically optimal distribution of charge. The opposite charges on the two phases bring into play the Coulomb and surface interface energies, which impose spatial order so as to minimize their sum. To calculate the spatial order we use the Wigner-Seitz method, choosing a cell of volume $v$ and radius $R$ that contains the rare phase, of radius $r$, and enough of the dominant phase to make the cell neutral, so that the cells do not interact. The Coulomb and surface energies per unit cell volume can be written in a schematic form that shows their dependence on the dimension $r$ of the 'geometry' and on the proportion $\chi$:
$E_C/v = C(\chi)\,r^2, \qquad E_S/v = S(\chi)/r$.   (16)
Here $C$ and $S$ are specific functions of the proportion $\chi$ whose form is dictated by the geometry of the cells (e.g. spheres, rods and slabs); the dependence on $r$ can be obtained by dimensional analysis [15, Section V]. The minimum of their sum occurs when
$E_S = 2E_C$.   (17)
The above equations serve to define the droplet radius and
cell size for each proportion of the phases,

$$r = \left[\frac{S(\chi)}{2C(\chi)}\right]^{1/3}, \qquad R = \frac{r}{\chi^{1/3}}, \tag{18}$$

where for spherical geometry $\chi = (r/R)^3$. As remarked earlier, the internal force that drives the charge redistribution between phases in equilibrium is the isospin restoring force. As one can see, the size and spacing of the droplets of the rare phase immersed in the dominant one vary with the proportion χ. Other geometries besides spheres may minimize the energy, depending on the proportion. The functional forms of S and C are distinct in each case, as is the relation of the dimensions r and R to χ.

3.4. Three theorems

We thus have three theorems concerning the equilibrium configuration of the mixed phase of a first-order phase transition of a substance with more than one conserved charge (or independent component, in the language of chemistry):

1. All properties of the phases in equilibrium, including the common pressure, vary with the proportion of the phases, except in special circumstances as mentioned below.
2. If electric charge is one of the conserved charges, the mixed phase will exist in the form of a crystalline lattice.
3. Because of Theorem 1, the geometry of the crystal and the size and spacing of the lattice will vary with the proportion.

These remarkable properties of first-order phase transitions, and the role played by the microphysics or internal forces, are discussed in general elsewhere [15,22,38]. The above discussion is completely general and must apply to many systems in physical chemistry, nuclear physics, astrophysics and cosmology. In particular, in nuclear systems it applies to the confined–deconfined phase transition at high density, to the so-called liquid–vapor transition at sub-saturation density, and to pion and kaon condensation if they are first-order transitions and if they occur. (It is understood that if a phase transition is induced by a nuclear collision, complete equilibrium will not be achieved and, in particular, that there is insufficient time for spatial structure to form.) For all first-order phase transitions in nuclear systems, such as those mentioned, the symmetry energy is the driving force for rearranging the concentration of electric charge between phases in equilibrium.
And since one of the conserved charges is electric, the long-range Coulomb force will impose a spatial order in the coexistence phase whenever the time-scale is long enough for the charge symmetry force to establish equilibrium [15]. Clearly the results derived above for multi-component systems hold in general. It is conceivable that a substance could be prepared with exactly the concentration c of the conserved quantities that is optimum for both phases; in such a singular case, no rearrangement would take place.

The variation of the pressure with the proportion of the phases is of great importance for the structure of neutron stars. In the early treatments that overlooked the pressure variation, the mixed
phase was squeezed out of stellar models because the pressure is monotonic in a star: both extremes of the constant-pressure region, and all proportions between, were mapped onto a single pressure and therefore a single radial location. Not so when the pressure varies in the mixed phase. The coexistence phase then occupies the region spanned by the unlike pressures at its two extremes, which can amount to some kilometers of crystalline phase [17]. There is another important consequence of the degree(s) of freedom for rearranging the concentrations of conserved quantities in multi-component substances according to the energy minimization principle. Whereas in one-component substances the properties of the phases in equilibrium are very unlike (for example, the densities of ice and water), in a multi-component substance the rearrangement of charges that optimizes the energy at each proportion of the phases reduces their differences. This can be seen in Fig. 7 of Ref. [15]. As a consequence, the transition density from the pure low-density phase to the mixed phase is lower than would be expected were the degree(s) of freedom frozen out (as in the pre-1990 studies of deconfinement in neutron stars that were cited earlier) [15,17,39,40].
4. Quark deconfinement phase transition

The theory of the strong interactions, QCD, is evidently not solvable except on a lattice, and then only under idealized circumstances that do not yield an appropriate equation of state. We therefore have to resort to models of matter. For the confined phase, one option is the class of many-body models that aim to describe matter at normal nuclear density, and sometimes beyond, in terms of the interaction of two isolated nucleons. As an alternative, we choose a relativistic Lagrangian theory that involves the baryon octet interacting through scalar, vector and vector–isovector mesons. The coupling constants can be derived algebraically in terms of five properties of nuclear matter at saturation [22]. The extrapolation to dense matter is less extreme than that of the Bethe–Brueckner many-body model, and the theory is causal at all densities. The deconfined phase is modeled by the so-called MIT bag model [29]. The two phases of matter are thus described by entirely different theories. This is unfortunate, but unavoidable at present.

4.1. Theoretical models

4.1.1. Normal confined phase

We describe the confined phase in terms of the mean field solution of the covariant Lagrangian [15,21,22,41–45]

$$\mathcal{L} = \sum_B \bar\psi_B\left(i\gamma_\mu\partial^\mu - m_B + g_{\sigma B}\,\sigma - g_{\omega B}\,\gamma_\mu\omega^\mu - g_{\rho B}\,\gamma_\mu\boldsymbol{\tau}\cdot\boldsymbol{\rho}^\mu\right)\psi_B + \frac12\left(\partial_\mu\sigma\,\partial^\mu\sigma - m_\sigma^2\sigma^2\right) - \frac14\omega_{\mu\nu}\omega^{\mu\nu} + \frac12 m_\omega^2\,\omega_\mu\omega^\mu - \frac14\boldsymbol{\rho}_{\mu\nu}\cdot\boldsymbol{\rho}^{\mu\nu} + \frac12 m_\rho^2\,\boldsymbol{\rho}_\mu\cdot\boldsymbol{\rho}^\mu - \frac13 b\,m_n\,(g_\sigma\sigma)^3 - \frac14 c\,(g_\sigma\sigma)^4 + \sum_{\lambda=e^-,\mu^-}\bar\psi_\lambda\left(i\gamma_\mu\partial^\mu - m_\lambda\right)\psi_\lambda. \tag{19}$$
The baryon species B are coupled to the σ, ω and ρ mesons. The masses are denoted by m with an appropriate subscript. The sum on B is over all the charge states of the lowest baryon octet ($p, n, \Lambda, \Sigma^+, \Sigma^-, \Sigma^0, \Xi^-, \Xi^0$) as well as the Δ quartet. However, the latter are not populated up to the highest densities in neutron stars, nor are any baryon states other than those of the lowest octet, for reasons given elsewhere [21]. The cubic and quartic terms were first introduced by Boguta and Bodmer to bring two additional nuclear matter properties under control [42]. The last term represents the free lepton Lagrangians. How the theory can be solved in the mean field approximation for the ground state of charge-neutral matter in general beta equilibrium (neutron star matter) is described fully in Refs. [21,22]. We denote the mean values of the meson fields by $\sigma$, $\omega_0$, $\rho_{03}$, in which case the baryon effective masses are given by $m_B^* = m_B - g_{\sigma B}\,\sigma$ and the baryon eigenvalues by

$$e_B(k) = g_{\omega B}\,\omega_0 + g_{\rho B}\,\rho_{03}\,I_{3B} + \sqrt{k^2 + m_B^{*2}}. \tag{20}$$

In the above equations, $I_{3B}$ is the isospin projection of baryon charge state B. The Fermi momenta for the baryons are the positive real solutions of
$$e_B(k_B) = \mu_B = b_B\,\mu_n - q_B\,\mu_e, \tag{21}$$

where $b_B$ and $q_B$ are the baryon and electric charge numbers of the baryon state B, and $\mu_n$ and $\mu_e$ are independent chemical potentials for unit baryon number and unit negative electric charge number (neutron and electron, respectively). These equations ensure chemical equilibrium. The lepton Fermi momenta are the positive real solutions of

$$\sqrt{k_e^2 + m_e^2} = \mu_e, \qquad \sqrt{k_\mu^2 + m_\mu^2} = \mu_\mu = \mu_e. \tag{22}$$

The fermion field equations for uniform matter in momentum representation are
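$$\left[\gamma_\mu\left(k^\mu - g_{\omega B}\,\omega^\mu - g_{\rho B}\,I_{3B}\,\rho_3^\mu\right) - \left(m_B - g_{\sigma B}\,\sigma\right)\right]\psi_B(k) = 0. \tag{23}$$

This equation is of the Dirac form, and the eigenvalues for particle and antiparticle can be found in the usual way:

$$e_B(k) = g_{\omega B}\,\omega_0 + g_{\rho B}\,I_{3B}\,\rho_{03} + \sqrt{k^2 + (m_B - g_{\sigma B}\,\sigma)^2}, \qquad \bar e_B(k) = -g_{\omega B}\,\omega_0 - g_{\rho B}\,I_{3B}\,\rho_{03} + \sqrt{k^2 + (m_B - g_{\sigma B}\,\sigma)^2}. \tag{24}$$

We recall that the meson field equations for uniform static matter (in which space and time derivatives can be neglected) are

$$\omega_0 = \sum_B \frac{g_{\omega B}}{m_\omega^2}\,\rho_B, \tag{25}$$

$$\rho_{03} = \sum_B \frac{g_{\rho B}}{m_\rho^2}\,I_{3B}\,\rho_B, \tag{26}$$

$$m_\sigma^2\,\sigma = \sum_B g_{\sigma B}\,\rho^s_B - b\,m_n\,g_\sigma\,(g_\sigma\sigma)^2 - c\,g_\sigma\,(g_\sigma\sigma)^3, \tag{27}$$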
where $m_B^*(\sigma) = m_B - g_{\sigma B}\,\sigma$ is the Dirac effective mass of baryon species B; it depends on σ and therefore on density through the Fermi momenta $k_B$. The scalar density has been written in its integral form,

$$\rho^s_B = \frac{2J_B+1}{2\pi^2}\int_0^{k_B}\frac{m_B^*(\sigma)}{\sqrt{k^2 + m_B^{*2}(\sigma)}}\,k^2\,dk. \tag{28}$$
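As a check on Eq. (28), the integral has an elementary closed form, $\rho^s_B = \frac{2J_B+1}{4\pi^2}\,m^*\left[k_B E_F - m^{*2}\ln\frac{k_B + E_F}{m^*}\right]$ with $E_F = \sqrt{k_B^2 + m^{*2}}$. The minimal Python sketch below compares this closed form with Simpson-rule quadrature and with the baryon density of Eq. (29). The Fermi momentum and effective mass in the usage lines are illustrative values, not results of the model, and the conversion constant ħc = 197.327 MeV fm is an assumption of the sketch.

```python
import math

HBARC = 197.327  # MeV fm, used only to convert MeV^3 to fm^-3


def scalar_density_closed(k_f, m_eff, degeneracy=2):
    """Closed form of Eq. (28); k_f and m_eff in MeV, result in MeV^3.

    degeneracy = 2J_B + 1 (2 for the spin-1/2 octet).
    """
    e_f = math.sqrt(k_f**2 + m_eff**2)
    integral = 0.5 * m_eff * (k_f * e_f - m_eff**2 * math.log((k_f + e_f) / m_eff))
    return degeneracy / (2 * math.pi**2) * integral


def scalar_density_simpson(k_f, m_eff, degeneracy=2, n=2000):
    """Same integral by composite Simpson's rule (n must be even)."""
    h = k_f / n

    def integrand(k):
        return m_eff * k * k / math.sqrt(k * k + m_eff**2)

    total = integrand(0.0) + integrand(k_f)
    total += sum((4 if i % 2 else 2) * integrand(i * h) for i in range(1, n))
    return degeneracy / (2 * math.pi**2) * total * h / 3


def number_density(k_f, degeneracy=2):
    """Baryon density (2J+1) k_F^3 / (6 pi^2) of Eq. (29), in MeV^3."""
    return degeneracy * k_f**3 / (6 * math.pi**2)


k_f, m_star = 300.0, 0.78 * 939.0  # illustrative values in MeV
print(scalar_density_closed(k_f, m_star) / HBARC**3)   # fm^-3
print(scalar_density_simpson(k_f, m_star) / HBARC**3)  # fm^-3
print(number_density(k_f) / HBARC**3)                  # fm^-3
```

The scalar density always comes out below the baryon density, because each occupied momentum state is weighted by $m^*/E < 1$.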
The baryon source currents have been replaced by their ground-state expectation values; the ground state is defined as having the single-particle momentum eigenstates with eigenvalues given by (24) filled to the Fermi momentum. In the above equations, $I_{3B}$ is the isospin projection of baryon charge state B, and $k_B$ is the Fermi momentum of species B. Only the timelike components of the vector fields and the isospin 3-component of the charged ρ field have nonvanishing values, on account of the isotropy of nuclear matter and electric charge conservation, respectively. The baryon number, charge, and strangeness densities of the various particle species read

$$\rho_B = (2J_B+1)\,b_B\,\frac{k_B^3}{6\pi^2}, \qquad Q_B = (2J_B+1)\,q_B\,\frac{k_B^3}{6\pi^2}, \qquad S_B = (2J_B+1)\,s_B\,\frac{k_B^3}{6\pi^2}. \tag{29}$$
The total baryon and charge densities, the latter of which must be effectively zero for a star, are

$$\rho = \sum_B \rho_B, \qquad q = \sum_B Q_B + \sum_\lambda Q_\lambda = 0. \tag{30}$$
Charge neutrality is expressed for the pure confined hadronic phase as

$$q_H \equiv \sum_B (2J_B+1)\,q_B\,\frac{k_B^3}{6\pi^2} - \sum_\lambda \frac{k_\lambda^3}{3\pi^2} = 0, \tag{31}$$

where the first sum is over the baryons, whose Fermi momenta are $k_B$, and the second sum is over the leptons $e^-$ and $\mu^-$. By solving the meson field equations, the condition for charge neutrality, and the conditions for chemical equilibrium (21) and (22), we obtain the solution for beta-stable, charge-neutral matter (neutron star matter) in the hadronic phase at the chosen baryon density

$$\rho = \sum_B (2J_B+1)\,\frac{k_B^3}{6\pi^2}. \tag{32}$$
The solution can be represented by the values of the mean meson fields and the two independent chemical potentials, as in Fig. 6. From the solution we can learn the particle populations, since all Fermi momenta can be expressed in terms of the two independent chemical potentials and the mean meson fields through Eqs. (20)–(22) [21,22]. The particle number densities are shown in Fig. 7. Notice how the lepton populations are replaced by charged baryons as the density increases. There is, of course, uncertainty as to the hyperon couplings. Extrapolation from limited Σ-atomic data to higher densities is uncertain; it does, however, suggest the absence of the $\Sigma^-$ in neutron stars [46]. We do not take this to be conclusive.
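The chemical-equilibrium and neutrality conditions (20)–(22) and (31) can be illustrated with a minimal Python sketch for matter made only of neutrons, protons, electrons and muons. As a simplification not made in the text, the mean fields $g_\sigma\sigma$, $g_\omega\omega_0$ and $g_\rho\rho_{03}$ are held at hypothetical fixed values instead of being solved from (25)–(27). The nucleon mass of 939 MeV and the convention $I_3 = +\frac12$ (proton), $-\frac12$ (neutron) are also assumptions. Only $\mu_e$ is adjusted, by bisection, so that (31) vanishes at a given $\mu_n$.

```python
import math

# name: (mass MeV, baryon number b, charge q, isospin projection I3, degeneracy 2J+1)
BARYONS = {"n": (939.0, 1, 0, -0.5, 2), "p": (939.0, 1, 1, 0.5, 2)}
LEPTON_MASSES = (0.511, 105.66)  # e-, mu- (MeV)


def fermi_momentum(energy, mass):
    """Positive real root of sqrt(k^2 + m^2) = energy, or 0 if the level is empty."""
    return math.sqrt(energy**2 - mass**2) if energy > mass else 0.0


def baryon_fermi_momenta(mu_n, mu_e, gs_sigma, gw_omega0, gr_rho03):
    """Solve e_B(k_B) = b_B mu_n - q_B mu_e, Eqs. (20)-(21), for each baryon."""
    ks = {}
    for name, (mass, b, q, i3, _) in BARYONS.items():
        mu_b = b * mu_n - q * mu_e
        ks[name] = fermi_momentum(mu_b - gw_omega0 - gr_rho03 * i3, mass - gs_sigma)
    return ks


def net_charge(mu_n, mu_e, fields):
    """Left-hand side of Eq. (31) in MeV^3; leptons follow Eq. (22) with mu_mu = mu_e."""
    ks = baryon_fermi_momenta(mu_n, mu_e, *fields)
    q = sum(BARYONS[b][4] * BARYONS[b][2] * k**3 / (6 * math.pi**2) for b, k in ks.items())
    q -= sum(fermi_momentum(mu_e, m) ** 3 / (3 * math.pi**2) for m in LEPTON_MASSES)
    return q


def neutral_mu_e(mu_n, fields, iterations=200):
    """Bisection on mu_e: at mu_e = 0 there are no leptons (q >= 0);
    at mu_e = mu_n the proton level is empty and leptons give q < 0."""
    lo, hi = 0.0, mu_n
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if net_charge(mu_n, mid, fields) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical mean-field values in MeV: g_sigma*sigma, g_omega*omega_0, g_rho*rho_03.
FIELDS = (200.0, 150.0, -10.0)
mu_e = neutral_mu_e(1000.0, FIELDS)
ks = baryon_fermi_momenta(1000.0, mu_e, *FIELDS)
print(mu_e, ks)
```

In the full calculation this neutrality search is nested inside the self-consistent solution of the meson field equations at each baryon density.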
Fig. 6. Solution of the field quantities of the theory discussed in Section 4.1.1. The nuclear parameters that define the values of the coupling constants are those of set 2 of Table 1.
Fig. 7. Baryon and lepton populations in neutron star matter in the pure hadronic phase. The values of the symmetric nuclear matter compression and the nucleon effective mass at saturation are K = 240 MeV and $m^* = 0.78M$.
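Once the solution has been found, the equation of state can be calculated from

$$\varepsilon_H = \frac13 b\,m_n\,(g_\sigma\sigma)^3 + \frac14 c\,(g_\sigma\sigma)^4 + \frac12 m_\sigma^2\sigma^2 + \frac12 m_\omega^2\omega_0^2 + \frac12 m_\rho^2\rho_{03}^2 + \sum_B \frac{2J_B+1}{2\pi^2}\int_0^{k_B}\sqrt{k^2 + m_B^{*2}}\,k^2\,dk + \sum_\lambda \frac{1}{\pi^2}\int_0^{k_\lambda}\sqrt{k^2 + m_\lambda^2}\,k^2\,dk, \tag{33}$$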
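which is the energy density, while the pressure is given by

$$p_H = -\frac13 b\,m_n\,(g_\sigma\sigma)^3 - \frac14 c\,(g_\sigma\sigma)^4 - \frac12 m_\sigma^2\sigma^2 + \frac12 m_\omega^2\omega_0^2 + \frac12 m_\rho^2\rho_{03}^2 + \frac13\sum_B \frac{2J_B+1}{2\pi^2}\int_0^{k_B}\frac{k^4\,dk}{\sqrt{k^2 + m_B^{*2}}} + \frac13\sum_\lambda \frac{1}{\pi^2}\int_0^{k_\lambda}\frac{k^4\,dk}{\sqrt{k^2 + m_\lambda^2}}. \tag{34}$$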
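These are the diagonal components of the stress-energy tensor

$$T_{\mu\nu} = -g_{\mu\nu}\,\mathcal{L} + \sum_\phi \frac{\partial\mathcal{L}}{\partial(\partial^\mu\phi)}\,\partial_\nu\phi. \tag{35}$$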
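To make (33) and (34) concrete, the minimal Python sketch below evaluates $\varepsilon_H$ and $p_H$ for a neutron–proton–electron mixture at given Fermi momenta. It uses the closed forms of the kinetic integrals and the K = 240 MeV couplings of Table 1. Several inputs are assumptions of the sketch: a nucleon mass of 939 MeV, ħc = 197.327 MeV fm, the convention $I_3 = \pm\frac12$ for the ρ coupling, and a trial value of $g_\sigma\sigma$ that is not solved from (27). Because $\omega_0$ and $\rho_{03}$ are taken from (25) and (26), the thermodynamic identity $\varepsilon_H + p_H = \sum_i \mu_i n_i$ holds for any trial σ: the σ terms cancel in the sum. The identity therefore serves as a check.

```python
import math

HBARC = 197.327  # MeV fm
# K = 240 MeV row of Table 1: (g/m)^2 in fm^2, b and c dimensionless
GS2, GW2, GR2 = 9.927, 4.820, 4.791
B_SELF, C_SELF = 0.008659, -0.002421
M_N = 939.0 / HBARC  # nucleon mass, fm^-1 (assumed 939 MeV)
M_E = 0.511 / HBARC  # electron mass, fm^-1


def kinetic_energy(k, m, g=2):
    """g/(2 pi^2) * integral_0^k sqrt(q^2 + m^2) q^2 dq, closed form."""
    e = math.sqrt(k * k + m * m)
    return g / (16 * math.pi**2) * (k * e * (2 * k * k + m * m) - m**4 * math.log((k + e) / m))


def kinetic_pressure(k, m, g=2):
    """(1/3) g/(2 pi^2) * integral_0^k q^4 / sqrt(q^2 + m^2) dq, closed form."""
    e = math.sqrt(k * k + m * m)
    return g / (48 * math.pi**2) * (k * e * (2 * k * k - 3 * m * m) + 3 * m**4 * math.log((k + e) / m))


def npe_eos(k_n, k_p, k_e, gs_sigma):
    """Return (epsilon_H, p_H, sum_i mu_i n_i) in fm^-4 for Fermi momenta in fm^-1
    and a trial g_sigma*sigma in fm^-1."""
    rho_n = k_n**3 / (3 * math.pi**2)
    rho_p = k_p**3 / (3 * math.pi**2)
    rho_e = k_e**3 / (3 * math.pi**2)
    gw_omega0 = GW2 * (rho_n + rho_p)       # Eq. (25) multiplied by g_omega
    gr_rho03 = GR2 * 0.5 * (rho_p - rho_n)  # Eq. (26) multiplied by g_rho
    m_star = M_N - gs_sigma
    u_sigma = B_SELF * M_N * gs_sigma**3 / 3 + C_SELF * gs_sigma**4 / 4
    sigma_mass = 0.5 * gs_sigma**2 / GS2    # m_sigma^2 sigma^2 / 2
    vector = 0.5 * (gw_omega0**2 / GW2 + gr_rho03**2 / GR2)
    kin_eps = kinetic_energy(k_n, m_star) + kinetic_energy(k_p, m_star) + kinetic_energy(k_e, M_E)
    kin_p = kinetic_pressure(k_n, m_star) + kinetic_pressure(k_p, m_star) + kinetic_pressure(k_e, M_E)
    eps = u_sigma + sigma_mass + vector + kin_eps  # Eq. (33)
    p = -u_sigma - sigma_mass + vector + kin_p     # Eq. (34)
    mu_n = gw_omega0 - 0.5 * gr_rho03 + math.sqrt(k_n**2 + m_star**2)  # Eq. (20)
    mu_p = gw_omega0 + 0.5 * gr_rho03 + math.sqrt(k_p**2 + m_star**2)
    mu_e = math.sqrt(k_e**2 + M_E**2)                                 # Eq. (22)
    return eps, p, mu_n * rho_n + mu_p * rho_p + mu_e * rho_e


# k_e = k_p keeps this toy n-p-e mixture neutral; gs_sigma = 1 fm^-1 is a trial value.
eps, p, mu_n_sum = npe_eos(1.8, 0.8, 0.8, 1.0)
print(eps * HBARC, p * HBARC)  # MeV/fm^3
```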
The equation of state for neutron star matter in the confined phase is shown in Figs. 8 and 9 for the two parameter sets that we now define. Five of the constants of the theory can be algebraically related to the properties of nuclear matter [22]. The constants are the nucleon couplings to the scalar, vector and vector–isovector mesons, $g_\sigma/m_\sigma$, $g_\omega/m_\omega$, $g_\rho/m_\rho$, and the scalar self-interactions defined by b and c. The nuclear properties that define their values are the saturation values of the
Fig. 8. The equation of state of neutron star matter for two models. The one labeled 'n+p+H' corresponds to hyperonized matter; the other, labeled 'hybrid', to the same model but with a phase transition to deconfined matter. The mixed phase is marked. Particle populations are shown in Figs. 7 and 10, respectively.
Fig. 9. The equation of state of neutron star matter for two cases as in Fig. 8, but with K = 300 MeV, $m^*/m = 0.7$.

Table 1
Coupling constants that yield binding B/A = −16.3 MeV, density ρ = 0.153 fm⁻³, and symmetry energy coefficient $a_{sym}$ = 32.5 MeV for saturated nuclear matter with the compression K and effective mass m*/m listed below

| K (MeV) | m*/m | $(g_\sigma/m_\sigma)^2$ (fm²) | $(g_\omega/m_\omega)^2$ (fm²) | $(g_\rho/m_\rho)^2$ (fm²) | b | c |
|---|---|---|---|---|---|---|
| 300 | 0.70 | 11.79 | 7.149 | 4.411 | 0.002947 | −0.001070 |
| 240 | 0.78 | 9.927 | 4.820 | 4.791 | 0.008659 | −0.002421 |
binding energy, baryon density, symmetry energy coefficient, compression modulus and nucleon effective mass. Nuclear matter at normal density does not depend on the hyperon couplings. Elsewhere we have shown how they can be made consistent with (1) the data on hypernuclear levels, (2) the binding of the Λ in nuclear matter (which can be determined quite accurately from an extrapolation of the hypernuclear levels to large atomic number A), and (3) neutron star masses [23]. We shall assume that all hyperons in the octet have the same coupling as the Λ. The couplings are expressed as ratios to the above-mentioned nucleon couplings:

$$x_\sigma = g_{\sigma H}/g_{\sigma N}, \qquad x_\omega = g_{\omega H}/g_{\omega N}, \qquad x_\rho = g_{\rho H}/g_{\rho N}. \tag{36}$$

The first two are related to the Λ binding by a relation derived in [23], and the third can be taken equal to the second by invoking vector dominance. We choose two extreme sets of nuclear parameters (Table 1) that are compatible with empirical knowledge of nuclear matter properties, and use them together with hyperon couplings (36) in the range $0.5 < x_\sigma < 0.7$.
4.1.2. Deconfined phase

To describe quark matter we use a simple version of the bag model, for which the pressure, energy density, baryon number density and charge density at T = 0 are given by

$$p_Q = -B + \sum_f \frac{1}{4\pi^2}\left[\mu_f\sqrt{\mu_f^2 - m_f^2}\left(\mu_f^2 - \frac52 m_f^2\right) + \frac32 m_f^4 \ln\frac{\mu_f + \sqrt{\mu_f^2 - m_f^2}}{m_f}\right],$$

$$\varepsilon_Q = B + \sum_f \frac{3}{4\pi^2}\left[\mu_f\sqrt{\mu_f^2 - m_f^2}\left(\mu_f^2 - \frac12 m_f^2\right) - \frac12 m_f^4 \ln\frac{\mu_f + \sqrt{\mu_f^2 - m_f^2}}{m_f}\right],$$

$$\rho_Q = \sum_f \frac{(\mu_f^2 - m_f^2)^{3/2}}{3\pi^2},$$

$$q_Q = \sum_f q_f\,\frac{(\mu_f^2 - m_f^2)^{3/2}}{\pi^2} - \sum_\lambda \frac{(\mu_\lambda^2 - m_\lambda^2)^{3/2}}{3\pi^2}, \tag{37}$$

where the sum over f is over flavors (terms with $\mu_f < m_f$ vanish). In this simple model of quark matter there are no internal variables (like the mean meson fields), since the quarks are assumed to form a free Fermi gas. The charge density includes the electron charge density, and in regions of the star where this phase exists alone (i.e. at pressures above the range spanned by the coexistence phase) quark matter is uniform and charge neutrality requires $q_Q = 0$. The contribution of leptons to the energy and pressure (the last terms of $\varepsilon_H$ and $p_H$) should be added to the above expressions. Pressure, densities, etc. are specified simply by the chemical potentials and the value of the bag constant, which is taken as $B^{1/4} = 180$ MeV. For the quark masses we take $m_u = m_d = 0$, $m_s = 150$ MeV, $m_c = 1500$ MeV. Because of the long time-scale, strangeness is not conserved in a star. The quark chemical potentials for a system in chemical equilibrium are therefore related to those for baryon number and electron by

$$\mu_u = \mu_c = \frac13(\mu_n - 2\mu_e), \qquad \mu_d = \mu_s = \frac13(\mu_n + \mu_e). \tag{38}$$
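The bag-model expressions (37), with the chemical potentials (38), can be coded directly. The Python sketch below is a minimal illustration that assumes ħc = 197.327 MeV fm and defaults to $B^{1/4} = 180$ MeV. Leptons are left out, and the chemical potentials in the usage line are arbitrary. As a check, the quark part must satisfy $p_Q + \varepsilon_Q = \mu_n\rho_Q - \mu_e q_Q$, because B enters $p_Q$ and $\varepsilon_Q$ with opposite signs.

```python
import math

HBARC = 197.327  # MeV fm
# flavor: (mass in MeV, electric charge); masses as quoted in the text
QUARKS = {"u": (0.0, 2 / 3), "d": (0.0, -1 / 3), "s": (150.0, -1 / 3), "c": (1500.0, 2 / 3)}


def quark_chemical_potentials(mu_n, mu_e):
    """Eq. (38)."""
    mu_up = (mu_n - 2 * mu_e) / 3
    mu_down = (mu_n + mu_e) / 3
    return {"u": mu_up, "c": mu_up, "d": mu_down, "s": mu_down}


def flavor_gas(mu, m):
    """Pressure, energy density (MeV^4) and number density (MeV^3) of one flavor
    (3 colours x 2 spins) at T = 0; all zero if the flavor is not populated."""
    if mu <= m:
        return 0.0, 0.0, 0.0
    k = math.sqrt(mu * mu - m * m)
    log_term = m**4 * math.log((mu + k) / m) if m > 0 else 0.0
    p = (mu * k * (mu * mu - 2.5 * m * m) + 1.5 * log_term) / (4 * math.pi**2)
    eps = 3 * (mu * k * (mu * mu - 0.5 * m * m) - 0.5 * log_term) / (4 * math.pi**2)
    return p, eps, k**3 / math.pi**2


def bag_model(mu_n, mu_e, b_quarter=180.0):
    """Quark part of Eq. (37): (p_Q, eps_Q) in MeV/fm^3 and (rho_Q, quark charge) in fm^-3."""
    bag = b_quarter**4
    p, eps, rho, charge = -bag, bag, 0.0, 0.0
    for flavor, mu in quark_chemical_potentials(mu_n, mu_e).items():
        mass, q_f = QUARKS[flavor]
        p_f, eps_f, n_f = flavor_gas(mu, mass)
        p += p_f
        eps += eps_f
        rho += n_f / 3  # each quark carries baryon number 1/3
        charge += q_f * n_f
    conv = HBARC**3
    return p / conv, eps / conv, rho / conv, charge / conv


print(bag_model(1200.0, 50.0))
```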
4.1.3. Mixed phase

The boundaries of the mixed phase (the values of the density or chemical potential at the points where the substance is in one pure phase or the other with an infinitesimal amount of the other phase) are very simple to find for simple substances, that is, ones with only a single conserved charge (independent component). The method is often referred to as the Maxwell construction, which can be done in several different ways according to the variables chosen. However, the determination of the boundaries of the mixed phase for complex substances is much more involved, as Fig. 5 in Ref. [15] makes evident. A suitable strategy is as follows. Begin by solving the equations that define the low-density phase, subject to the subsidiary condition of local charge neutrality $q_H = 0$, at a sequence of increasing densities. At each density find the values of the chemical potentials ($\mu_n$ and $\mu_e$). At each density, find the solution of the high-density phase (quark matter) at the same values of the chemical potentials (see the connection (38)). Locate
the values of the chemical potentials for which the pressures in the two phases are equal. This procedure locates the boundary between pure confined hadronic matter and the mixed phase. The complementary procedure locates the boundary between the mixed and pure quark phases. To find the properties of the mixed phase, in which both phases are present in equilibrium, change the independent variable from baryon density to the proportion χ of quark phase, which ranges between 0 and 1. Now solve the equations defining both phases subject to the conditions of equal pressures (3) and global charge neutrality (7), which here reads $(1-\chi)\,q_H + \chi\,q_Q = 0$, for a sequence of χ in the range $0 < \chi < 1$. This sequence of solutions provides the properties of the mixed phase, which has a common pressure and common chemical potentials that nevertheless vary with χ. The substance conserves charge globally (7). When the boundary of the mixed phase at χ = 1 has been reached, solve the equations defining the pure quark matter phase subject to local neutrality $q_Q = 0$ for a sequence of baryon densities above the boundary of the mixed phase. The equation of state and the composition of matter from the low-density confined phase, through the mixed phase, to the high-density deconfined phase are thus obtained. In Figs. 8 and 9, the equations of state are shown for two parameterizations of neutron star matter that passes through a succession of phases with increasing density: the pure confined hadronic phase at lower density, the mixed phase at intermediate density, and the pure deconfined phase at high density. The two figures correspond to the two parameter sets defined earlier. The particle populations as a function of density are shown in Fig. 10 for one parameter set. Notice in particular that the lepton populations fall sharply for densities above that at which the mixed phase first appears, and essentially vanish in the pure quark phase.
This illustrates the previous assertion that charge neutrality is energetically preferred among baryon-charge-carrying particles, since their number, unlike that of leptons, is conserved in neutron star matter. Because a star must be charge neutral (to the degree $Z/A < 10^{-\dots}$), the uniform phases, that is, the confined hadronic and deconfined quark matter phases, must have vanishing charge density. However, as discussed above, this cannot be so for the mixed phase of asymmetric neutral nuclear matter. At any given density the minimum energy state will be the one that minimizes the symmetry energy subject only to the condition of global neutrality. The charge densities on regions of confined and deconfined matter in phase equilibrium are shown in Fig. 11. The opposite charge densities of the confined and deconfined regions of the mixed phase are factors in determining its geometrical structure. As explained earlier, the charge densities of the two phases are driven to have opposite signs by the symmetry force in nuclear matter. Of course, when the charge densities are multiplied by the proportion of the (local) volume occupied by each, the total charge including that on leptons is zero. It will be noticed that leptons are almost totally quenched by baryon-carrying particles. This is because leptons are energetically unfavored when neutrality can be achieved among baryons [22]. It is easy to understand why the charge densities on hadronic and quark matter are complementary, one being large when the other is small. To achieve the lowest overall energy, the rare phase can be far from its optimum configuration if the dominant phase can thereby be closer to its optimum. We recall that the charge neutrality of a star, which is enforced by gravity, is highly unfavorable as far as the charge symmetry preference of nuclear matter is concerned. The asymmetry can be relieved when the two phases coexist. Thus when the proportion of quark matter is small, its negative charge density is large because it offsets the positive charge on the dominant confined hadronic phase in equilibrium with it. Although the large charge is energetically unfavorable for the quark
Fig. 10. Baryon, lepton and quark populations in neutron star matter for which the deconfinement phase transition occurs. The coexistence phase occupies densities $0.29 < \rho < 1.2$ fm⁻³. Nuclear matter properties at saturation are the same as in Fig. 7.
Fig. 11. The charge density carried by baryons, leptons and quarks, corresponding to Fig. 10. The densities times the respective volume fractions add to zero (K = 300 MeV).
matter alone because of the imbalance in the number of light quarks, there is little of it at small proportion, while there is much hadronic matter whose symmetry energy is lowered by the rearrangement of charge. The roles of the confined and deconfined phases interchange at large proportions of quark matter: hadronic matter can then be far from symmetry so as to reduce the quark Fermi energies without regard to the charge on the quark phase.
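The mixed-phase procedure of Section 4.1.3 amounts to solving two equations for $(\mu_n, \mu_e)$ at each χ: equal pressures and global neutrality $(1-\chi)q_H + \chi q_Q = 0$. The Python sketch below does this with Newton's method and a finite-difference Jacobian, warm-starting each χ from the previous solution, so that χ = 0 and χ = 1 give the two boundaries of the mixed phase. The linear functions `toy_p_h`, `toy_q_h`, `toy_p_q` and `toy_q_q` are hypothetical stand-ins chosen only so that the sketch runs. In a real calculation they would be replaced by the hadronic equations (31)–(34) and the bag model (37).

```python
def solve_gibbs(chi, p_h, q_h, p_q, q_q, guess, tol=1e-10, max_iter=50):
    """Find (mu_n, mu_e) with p_H = p_Q and (1 - chi) q_H + chi q_Q = 0 (Newton's method)."""

    def residual(mu):
        return (p_h(*mu) - p_q(*mu), (1 - chi) * q_h(*mu) + chi * q_q(*mu))

    mu = list(guess)
    for _ in range(max_iter):
        f0, f1 = residual(mu)
        if max(abs(f0), abs(f1)) < tol:
            break
        cols = []  # finite-difference Jacobian, one column per variable
        for j in range(2):
            step = 1e-6 * max(1.0, abs(mu[j]))
            shifted = list(mu)
            shifted[j] += step
            g0, g1 = residual(shifted)
            cols.append(((g0 - f0) / step, (g1 - f1) / step))
        (a, c), (b, d) = cols  # Jacobian [[a, b], [c, d]]
        det = a * d - b * c
        mu[0] += (-f0 * d + b * f1) / det  # Cramer's rule for J dx = -f
        mu[1] += (-a * f1 + c * f0) / det
    return tuple(mu)


def mixed_phase(p_h, q_h, p_q, q_q, guess, n=5):
    """Sweep chi from 0 (hadronic boundary) to 1 (quark boundary); rows are (chi, mu_n, mu_e, p)."""
    rows = []
    for i in range(n):
        chi = i / (n - 1)
        guess = solve_gibbs(chi, p_h, q_h, p_q, q_q, guess)
        rows.append((chi, guess[0], guess[1], p_h(*guess)))
    return rows


# Hypothetical linear stand-ins (MeV units), chosen only so the example runs.
def toy_p_h(mu_n, mu_e):
    return 0.5 * (mu_n - 1000.0) + 0.1 * mu_e


def toy_q_h(mu_n, mu_e):
    return 0.002 * (mu_n - 900.0) - 0.01 * mu_e


def toy_p_q(mu_n, mu_e):
    return 0.8 * (mu_n - 1050.0) + 0.3 * mu_e


def toy_q_q(mu_n, mu_e):
    return 0.0002 * (mu_n - 900.0) - 0.01 * mu_e


for row in mixed_phase(toy_p_h, toy_q_h, toy_p_q, toy_q_q, guess=(1100.0, 30.0)):
    print(row)
```

Even with these toy inputs the common pressure rises across the mixed phase, because the neutrality condition, and with it $\mu_e$, changes with χ.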
4.2. Bulk structure of a hybrid star

We refer to a neutron star whose outer region is composed of hadronic matter in the confined phase, and whose interior regions are composed of a mixed phase of confined and deconfined matter with possibly a pure quark phase occupying the central region, as a hybrid star. Model descriptions of the two pure phases, and how to derive the properties of the mixed phase, were discussed in Section 4.1. The equations of state for neutron star matter in which a phase transition to quark matter occurs are compared in Figs. 8 and 9 for each parameter set defined above. The region of coexistence of the confined and deconfined phases is marked by a smooth transition in pressure from that of the confined phase at low density to that of the deconfined phase at higher density. Unlike a first-order phase transition in a one-component substance, the pressure varies through the mixed phase. This is characteristic of all first-order phase transitions in neutron star matter, whether deconfinement or pion or kaon condensation. (The coexistence phase cannot be found
Fig. 12. Mass profiles in neighboring-mass stars, one with a pure quark matter core ($M/M_\odot = 1.42$) and one without. Both stars have a mixed phase of confined and deconfined matter. Its lower density is ≈220 MeV/fm³, not far above the saturation density of symmetric matter. The pure quark phase is evident by the kink at ≈930 MeV/fm³.
Fig. 13. Mass as a function of central density for stellar sequences corresponding to the two parameter sets, K = 240 MeV and $m^* = 0.78M$ (dashed) and K = 300 MeV and $m^* = 0.7M$ (solid). Normal nuclear density is indicated, and the critical density between the pure normal phase and the mixed phase is marked by 'm'.
in nuclear matter, except for symmetric matter, by means of the so-called Maxwell construction. That method can ensure the continuity of only one chemical potential, and not of the two that asymmetric matter has.) The region at densities below the mixed phase is pure hadronic matter, and the region above is pure quark matter. The confined phase consists of a charge-neutral mixture of the members of the baryon octet and leptons. The deconfined phase is composed of a charge-neutral mixture of the three light quark flavors and leptons. The mixed phase region is a neutral mixture of baryons, quarks and leptons. In all cases the mixtures of particles correspond to an equilibrium configuration. The difference between the masses of stars that do not have a pure quark matter core and those that have a substantial pure quark core is very small. The reason is that the deconfined phase is described by an equation of state that is very soft; when the critical density for the pure quark phase is reached in the center of the star, the overlying layers of matter squeeze it, raising the density and driving the critical density to larger radius. Correspondingly, the mass profiles of stars of nearby mass are quite different according to whether the core is made of quark matter. This is illustrated in Fig. 12. The mass is much more concentrated in the center of the star having a quark matter core, and its central density is more than 50% greater than that of a star having only a slightly smaller mass but no pure quark matter phase. Two stellar sequences (corresponding to the two parameter sets of Table 1), with (1) neutron stars as the lower mass members, (2) hybrid stars with mixed-phase central regions, and (3) hybrid stars with a pure quark matter core surrounded by a mixed phase region, are shown in Fig. 13. The
Fig. 14. Mass–radius relationship for the sequence containing hybrid stars, compared to sequences composed solely of the confined phase with various degrees of completeness of beta equilibrium. (Pure neutron stars are labeled by 'n', those with neutrons, protons and leptons by 'n+p', and those also with hyperons by 'n+p+H'.) (K = 300 MeV.)
Fig. 15. Detail of the hybrid sequence of Fig. 14 near the limiting mass.
mass–radius relation for one of these sequences is compared in Figs. 14 and 15 with that of several other sequences of neutron stars and hyperon stars. The region of nearly constant mass, enlarged in Fig. 15, corresponds to members of the sequence having a pure quark matter core.

4.3. Spatial structure in the mixed phase

Although a star must be electrically neutral, as remarked in the introduction, neutrality does not imply local and identical vanishing of the electric charge density. The internal force that can exploit the degree of freedom made available by allowing neutrality to be achieved globally (a freedom closed to a system in which local neutrality is enforced) is the isospin restoring force experienced by the confined phase of hadronic matter. It is embodied in the isospin symmetry energy in the empirical mass formula of nuclei and nuclear matter. The pure hadronic phase of neutron star matter is highly isospin asymmetric, containing as it does mostly neutrons. This asymmetry can be relieved, to the degree permitted by overall neutrality and minimum energy, when the pressure in the interior of the star is high enough to condense some quark matter. Charge can then be exchanged between the two phases in equilibrium, making regions of the hadronic phase more positively charged and regions of quark matter negatively charged. Symmetry energy is gained thereby at only a small cost in rearranging the quark Fermi surfaces. Electrons play only a minor role when neutrality can be realized among baryon-charge-carrying particles. Thus the mixed phase region of the star will have positively charged regions of nuclear matter and negatively charged regions of quark matter.
The Coulomb interaction will tend to break the regions into smaller ones, while this is opposed by the surface interface energy. Their competition is resolved by forming a crystalline lattice of the rare phase immersed in the dominant one, whose form, size and spacing minimize the sum of surface and Coulomb energies. Since all internal properties of the two phases in equilibrium with each other vary with their proportion, so does the geometrical structure. When quark matter is the rare phase immersed in confined hadronic matter, it forms droplets. At higher proportions of quark matter, the droplets merge to form strings and then sheets, and then the roles of the confined and deconfined phases in the geometric structure interchange [15]. The above description of structure in the mixed phase of a substance for which electric charge is one of several conserved charges is reminiscent of a Coulomb lattice in metals, or of the hypothesized structure in the crustal region of a neutron star [47–49]. The important and remarkable difference is that in the latter two cases, positively charged nuclei or ionized atoms are embedded in an electron gas: two different substances of opposite charge. In the situation we deal with, there are two phases of one and the same substance which, by energy minimization and phase equilibrium, acquire opposite charges. That regions of the two phases would fragment and form a lattice of specific dimensions within the background of the dominant phase was not known prior to our work. The problem of structure arising from the interplay of Coulomb and surface interface energy does not depend on the precise nature of the substances, but only on their respective charge densities and the surface tension. We consider a Wigner–Seitz cell of radius R containing the rare-phase object of radius r and an amount of the dominant phase that makes the cell charge neutral.
The whole medium can be considered as made of such noninteracting cells, under the usual approximation of neglecting the interstitial material. As we shall see, the size of these cells is of the order of tens of fermis or less. The variation of the metric over such small regions is completely negligible (see Ref. [21] for the radial behavior of the metric in typical models), so they are locally inertial regions, and our discussion of them as if gravity were absent is justified by the equivalence principle. The Coulomb and surface energy densities corresponding to the proportion χ for drops, rods or slabs of the rarer phase immersed in the dominant one are

$$\varepsilon_C = 2\pi\,[q_H(\chi) - q_Q(\chi)]^2\,e^2\,r^2\,x\,f_d(x), \tag{39}$$

$$\varepsilon_S = x\,d\,\sigma/r, \tag{40}$$

where $q_H$, $q_Q$ are the charge densities of hadronic and quark matter, r is the dimension (radius in the case of drops) of the rare-phase object and R is the dimension of the Wigner–Seitz cell. The volume fraction of the rare phase in a cell is

$$x \equiv (r/R)^d, \qquad d = 1, 2, 3, \tag{41}$$

and it is related to the proportion of phases through

$$x = \chi \equiv V_Q/V \quad \text{(hadronic matter background)}, \tag{42}$$

or

$$x = 1 - \chi \quad \text{(quark matter background)}, \tag{43}$$
depending on which form of matter is the background or dominant phase of the coexistence phase. The three values d = 1, 2, 3 correspond to the idealized geometries of slabs, rods or drops of the rarer phase immersed in the dominant one. Minimizing the sum of Coulomb and surface energies with respect to r at fixed proportion χ yields $\varepsilon_S = 2\varepsilon_C$. We then obtain

$$r = \left[\frac{d\,\sigma}{4\pi\,[q_H(\chi) - q_Q(\chi)]^2\,e^2\,f_d(x)}\right]^{1/3}, \qquad d = 1, 2, 3, \tag{44}$$

$$\varepsilon_S + \varepsilon_C = 6x\left[\frac{\pi\,\sigma^2 d^2\,[q_H(\chi) - q_Q(\chi)]^2\,e^2\,f_d(x)}{16}\right]^{1/3}, \tag{45}$$

at whatever proportion χ is being considered. The radius R of the Wigner–Seitz cells is given by (41)–(43). The geometrical form of the rare phase and the corresponding Wigner–Seitz cell are found (within the approximation of the three idealized geometries) by determining which geometry (slabs, rods or spheres) minimizes the total energy at the given proportion of phases. The function $f_d(x)$ is given in all three cases by

$$f_d(x) = \frac{1}{d+2}\left[\frac{1}{d-2}\left(2 - d\,x^{1-2/d}\right) + x\right], \tag{46}$$
where the apparent singularity for d = 2 is well behaved upon using a limiting process [48], which gives $f_2(x) = \frac14\left(x - 1 - \ln x\right)$. We have supposed that the electrons are uniformly distributed throughout the mixed phase, in both quark and hadronic regions, and hence they do not appear in the above. In fact, electrons are almost totally absent from the mixed and pure quark phases (see Fig. 10). The reason for this has been stated before: baryon number is conserved, whereas lepton number in a star is not (neutrinos escape). Consequently, conditions are easily attained at high density where neutrality is achieved more economically among baryons. Formulae similar to the above were first derived in a different context, that of ionized atoms in an electron gas [48]. The positively charged ions will obviously be spaced so as to shield the electric charge and minimize the Coulomb energy. At still higher density, when the atoms are fully ionized and the nuclei are so close as to merge into other structures, the above formulae hold for the idealized geometries named. The surface tension between confined and deconfined phases is very difficult to compute. It should be self-consistent with the two models of matter, quark and hadronic, in equilibrium with each other. Unlike in simple substances such as water and vapor, the densities of each phase change as their proportion does, and so do their other properties, so the surface energy is not a constant. In terms of the physics we can discuss an upper limit to its value. As discussed at length above, the two phases in equilibrium of a substance having more than one conserved quantum number have a degree of freedom not available to a single-component substance: the freedom to redistribute charges so as to optimize the energy. The mixed phase of a charge-neutral substance would be structureless were this degree of freedom frozen.
Hence the structured phase, for which this degree of freedom is unfrozen, may be degenerate in energy with a structureless mixed phase, but in general will lie lower, and never higher. Therefore we know that, were it possible to carry out a completely self-consistent
N.K. Glendenning / Physics Reports 342 (2001) 393-447
calculation, the surface tension would never have a value that placed the structured mixed phase above an unstructured one. Accordingly, we shall choose a numerical value of the surface tension that avoids such an unphysical result. Gibbs studied the problem of surface energies, and, as a gross approximation, one can deduce that the surface energy depends on the difference in energy densities of the substances in contact times a length scale typical of the surface thickness [50]. Here that length is of the order of the strong-interaction range, $L = 1$ fm. In other words, the surface interface energy should depend on the proportion of phases in phase equilibrium, just as everything else does:
$$\sigma(\chi) = \mathrm{const} \times \left[\epsilon_Q(\chi) - \epsilon_H(\chi)\right] \times L. \qquad (47)$$
This represents our treatment of the surface tension. However, it has recently been found that an actual calculation of the surface tension between the normal and kaon condensed phases agrees remarkably well with the square of the energy-density difference [51]. In contrast to single-component substances, where the densities of the two phases in equilibrium are typically very unequal (as in vapor and liquid), we see from Fig. 16 that the densities tend to track each other, leading to a smaller surface tension. The constant in the above formula should be chosen so that the structured phase lies below the unstructured one, as just discussed. It will be understood from the formulae written above that the geometrical size, whether of drops, rods or slabs, scales with the surface energy coefficient as $\sigma^{1/3}$, and the sum of surface and Coulomb energies as $\sigma^{2/3}$, independent of geometry. Therefore the location in the star where the geometry changes from one form to another is independent of $\sigma$. This is strictly true only if the bulk energy and pressure dominate those contributed by the spatially ordered phases, which is expected to be the case.
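The $\sigma^{1/3}$ and $\sigma^{2/3}$ scaling can be checked with a short sketch. It assumes, as a simplification, that the surface and Coulomb energies per unit volume take the generic forms $a\sigma/r$ and $b r^2$. Here $a$ and $b$ are hypothetical prefactors that depend on geometry and charge contrast; they are not values from this work. Minimizing over the structure size $r$ gives $r^* = (a\sigma/2b)^{1/3}$.

```python
def optimal_structure(sigma, a, b):
    """Minimize E(r) = a*sigma/r + b*r**2 over the structure size r > 0.

    a, b: hypothetical prefactors of the surface (~1/r) and Coulomb (~r^2)
    energies per unit volume; they depend on geometry and charge contrast.
    Returns (r_star, E_surface, E_coulomb) at the minimum.
    """
    # dE/dr = -a*sigma/r**2 + 2*b*r = 0  =>  r**3 = a*sigma/(2*b)
    r_star = (a * sigma / (2.0 * b)) ** (1.0 / 3.0)
    e_surface = a * sigma / r_star
    e_coulomb = b * r_star ** 2
    return r_star, e_surface, e_coulomb


def total_energy(r, sigma, a, b):
    """Surface plus Coulomb energy per unit volume for a trial size r."""
    return a * sigma / r + b * r ** 2


if __name__ == "__main__":
    for sigma in (1.0, 8.0):
        r, es, ec = optimal_structure(sigma, a=3.0, b=0.5)
        # At the minimum the surface energy is twice the Coulomb energy.
        print(f"sigma={sigma}: r*={r:.4f}, E_S+E_C={es + ec:.4f}, E_S/E_C={es / ec:.4f}")
```

Multiplying $\sigma$ by 8 doubles $r^*$ and quadruples $E_S + E_C$ for any $a$ and $b$. The ranking of geometries therefore does not change with $\sigma$.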
Fig. 16. The energy densities of the hadronic (H) and quark (Q) phases in equilibrium as a function of proportion, the corresponding surface tension (47), and the sum of surface and Coulomb energies (45) ($K = 300$ MeV).
4.3.1. Size and variation of the crystalline structure
We are now in a position to compute the geometrical structures, their sizes and spacings as they vary with proportion of the phases. The spatial structure of the mixed phase depends on the energy densities (on which the surface interface energy depends) and on the charge densities of the equilibrium phases, as in Eqs. (44), (45) and (47). The bulk energies of quark and hadronic matter in phase equilibrium are shown in Fig. 16 as a function of the proportion $\chi$ of the quark phase. It is noteworthy how the energy density of each phase varies throughout the mixed phase as a function of the volume fraction of quark matter, just as we showed above must be the case in general. This is in contrast to a simple substance, one with only one conserved charge, in which the density of each phase in equilibrium remains constant. It is also worth noting that the bulk energy densities of the confined and deconfined phases are about two orders of magnitude greater than the sum of the Coulomb and surface interface energy densities. This justifies the two-part approach to the problem: first compute the bulk properties, then compute, against this background, the geometrical structure imposed by the surface and Coulomb energies. As already noted, the total charge in a Wigner-Seitz cell is zero, so the Coulomb force is shielded by the lattice arrangement of the rare phase immersed in the dominant one. The preferred crystalline forms (drops, rods or slabs of the rarer phase immersed in the dominant one) are shown in Fig. 17 as a function of the proportion of quark phase. The diameter $D$ of the rare-phase objects ranges between 10 and 20 fm. The sizes of the structures do not change much with proportion, but their spacing does. With increasing proportion, the droplets merge to form rods and the rods merge to form slabs. The roles reverse as the dominant phase switches.
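The choice among drops, rods and slabs can be sketched numerically. This section does not spell out the surface and Coulomb energies, so the sketch assumes the standard Ravenhall-Pethick-Wilson forms $\epsilon_S = d\,x\,\sigma/r$ and $\epsilon_C = C\,r^2 x f_d(x)$. Here $x$ is taken to be the volume fraction of the rarer phase, and $C$ is a charge-contrast factor common to all geometries. Minimizing over $r$ leaves an energy proportional to $x\,[d^2 f_d(x)]^{1/3} C^{1/3}\sigma^{2/3}$, so only $d^2 f_d(x)$ has to be compared. The function names are illustrative.

```python
import math


def f_d(x, d):
    """Geometrical function f_d(x) of Eq. (46); d = 3 drops, 2 rods, 1 slabs.

    The apparent singularity at d = 2 is replaced by its limit
    f_2(x) = (x - 1 - ln x) / 4.
    """
    if abs(d - 2.0) < 1e-9:
        return 0.25 * (x - 1.0 - math.log(x))
    return ((2.0 - d * x ** (1.0 - 2.0 / d)) / (d - 2.0) + x) / (d + 2.0)


def preferred_geometry(x):
    """Dimensionality (3, 2 or 1) with the lowest optimized surface + Coulomb energy.

    Assumes eps_S = d*x*sigma/r and eps_C = C*r**2*x*f_d(x); after minimizing
    over r the energy is proportional to (d**2 * f_d(x))**(1/3) at fixed x,
    so sigma and C drop out of the comparison.
    """
    return min((3, 2, 1), key=lambda d: d * d * f_d(x, d))


if __name__ == "__main__":
    # x: volume fraction of the rarer phase (0 < x <= 1/2).
    for x in (0.1, 0.2, 0.3, 0.4, 0.5):
        print(x, preferred_geometry(x))
```

For $x = 0.1$, $0.3$ and $0.45$ the sketch returns 3, 2 and 1 (drops, rods, slabs), the sequence described above. In this idealized form the switches fall between $x \approx 0.21$ and $0.22$, and between $0.35$ and $0.36$.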
Fig. 17. The diameter (D) and spacing (S) of the idealized geometrical phases of quark and hadronic matter in equilibrium, as a function of the proportion of quark matter ($K = 300$ MeV). Dashed lines show continuous dimensionality interpolation ($d$ is continuous rather than discrete).
Fig. 18. Radial boundaries of the various phases (pure quark, mixed with its geometrical phases, and pure hadronic fluid) for stars of various masses. The nuclear parameters correspond to $K = 300$ MeV, $m^*/m = 0.7$.
Fig. 19. Detail of upper left corner of Fig. 18.
4.3.2. Crystalline structure in hybrid stars
For the reasons discussed earlier, all properties of the equilibrium phases vary as a function of the proportion of quark matter, including the common pressure. For this reason the geometric structure varies with location in hybrid stars. Our purpose in this section is to demonstrate how strongly the crystalline structure, and the radial extent that it occupies, depend on stellar mass. Low-mass stars (if they exist in nature) are entirely in the confined phase. In intermediate-mass stars, quark droplets arranged in a lattice of varying dimension (Fig. 17) in a background of hadronic matter occupy a central region that grows with increasing stellar mass. At still higher mass, the quark droplets merge to form rods and then slabs in the core of the star. Finally, near the limiting mass, the pure deconfined phase occupies the stellar core. Figs. 18 and 19 show the radial location of the various geometric and pure phases in a star according to its mass.
5. Kaon condensation
5.1. Introductory remarks
5.1.1. Comment on kaon condensation
Whether it will become energetically favorable for kaons to condense in charge-neutral matter depends on the behavior of the chemical potential of electrons, which neutralize neutron star
matter at low density. If the electron chemical potential rises above the kaon mass in the medium, kaons will condense. We discussed the prognosis for kaon condensation in 1985; see Fig. 2 in Ref. [21] or Fig. 5.23 in Ref. [22]. Because the kaon mass is so large, formation of a kaon condensate depends on the kaon mass in the medium becoming small. In a system made purely of an equilibrium, charge-neutral mixture of neutrons, protons and leptons, the electron chemical potential increases with density while the $K^-$ mass decreases. Condensation is therefore inevitable under these circumstances. However, with increasing density, hyperons become an energetically favorable means of achieving charge neutrality, with little or no need for electrons, and the electron chemical potential saturates, as seen in Fig. 6. The disappearance of electrons at high density is seen in Fig. 7. In this case kaon condensation may be preempted by hyperonization. According to the estimates referenced, this happens unless the kaon mass in the medium is reduced from its vacuum value near 500 MeV to below 200 MeV at a density not exceeding about $3\rho_0$. These estimates do depend on the hyperon couplings, which, aside from that of the $\Lambda$, are not well constrained [23]. The possible saturation by hyperonization was confirmed in Ref. [52]. However, the role of hyperons remains an uncertainty: so far, no one has succeeded in solving a model of kaon condensation in the presence of hyperonization. Quark deconfinement has a similar saturating effect on the electron chemical potential at high density. It may therefore also limit the density range in a star where kaon condensation is possible. Given the uncertainties of the hyperon couplings and of the threshold density for deconfinement, kaon condensation remains an interesting possibility.
5.1.2. Contrast with deconfinement transition
In the example of crystalline structure in the mixed phase of confined and deconfined hadronic matter, the two phases were treated in distinct models: relativistic mean-field theory for the confined phase, and the MIT bag model for the deconfined phase. Under this circumstance it was especially simple to find the conditions of phase equilibrium under the constraint of the conservation laws, because in the bag model the pressure is given by the values of the two chemical potentials. At low density, where matter is known to be in the confined phase, one therefore solves relativistic mean-field theory for the many unknowns of that theory as a function of density. At each density, one can compute the pressure in the quark phase corresponding to the chemical potentials in the confined phase. Finding the threshold density for phase equilibrium is then just a matter of determining at what density the pressures in the two phases first become equal. The numerical problem is much more difficult in the model of kaon interactions with the nuclear medium that we now consider. For this reason, kaon condensation was incorrectly computed in a number of works, even after it had been discovered that the conservation laws must be enforced only as global constraints. Indeed, it is numerically so difficult that so far hyperons have not been included in the description of the normal phase in equilibrium with the kaon condensed phase. The greater difficulty in developing a computational strategy for finding the mixed phase of the kaon condensation transition is simply this: both phases, the normal and the condensed, are described by a single Lagrangian, as is certainly desirable. Deconfinement, treated in the two-model approach described above, is different. For kaon condensation, the pressure at the threshold of the condensed phase cannot be found simply in terms of the chemical potentials of the normal-phase solution. Rather, the meson field configurations in the kaon condensed phase are quite different from those in the normal phase [53]. A successful strategy must map out
the pressure surface $p(\mu_n, \mu_e)$ in the three-dimensional space $(p, \mu_n, \mu_e)$ for both phases separately, and find the line along which the two surfaces intersect in an efficient way.
5.2. Relativistic mean-field model with kaons
In the approach presented here, which is based closely on work with J. Schaffner-Bielich, we use a relativistic nuclear field theory solved in the mean-field approximation just as above [53]. The interaction between baryons is mediated by the exchange of scalar and vector mesons. This picture is consistently extended to include the kaons. The model is similar to the one used for describing the properties of the H dibaryon in nuclear matter, which is known to be thermodynamically consistent [54]. The coupling schemes applied for the kaon are analogous to the one we used for the H dibaryon [55]. We employ the same Lagrangian for the description of the nucleon sector as before. However, so as to emphasize features associated with kaon condensation, only nucleons rather than the entire baryon octet are included. The coupling constants in the nucleon sector can be determined algebraically as in [22] so as to assure a proper normalization of the theory to nuclear matter properties at saturation density. In particular, for normal symmetric matter, the following properties and their values are assured: $E/A = -16.3$ MeV, $\rho_0 = 0.153\ \mathrm{fm}^{-3}$, $a_{\rm sym} = 32.5$ MeV, $K = 240$ MeV, and $m^*/m = 0.78$. Other parameterizations will not change the overall features of kaon condensation as discussed here, but may alter the threshold density and the limiting neutron star mass and radius. There are two main schemes for treating the kaon interactions with the nuclear medium. One uses terms derived from chiral perturbation theory [56]; the other couples the kaon to meson fields [57]. We choose the latter approach so that nucleon and kaon interactions are treated on the same footing; it is also simpler. However, we couple kaons to the other mesons in a slightly different manner than heretofore.
The kaon is coupled to the meson fields using minimal coupling,
$$\mathcal{L}_K = D^*_\mu K^* D^\mu K - m_K^{*2} K^* K, \qquad (48)$$
where the vector fields are coupled by defining
$$D_\mu = \partial_\mu + i g_{\omega K}\,\omega_\mu + i g_{\rho K}\,\boldsymbol{\tau}\cdot\boldsymbol{\rho}_\mu. \qquad (49)$$
The vector fields are thereby coupled to a conserved current. The isospin operator $\boldsymbol{\tau}$ is understood to act on the kaon wave function, which we denote by $K$. The form (49) results in another coupling term in the Lagrangian (48), of the form
$$g_{\omega K}^2\,\omega_\mu\omega^\mu K^*K, \qquad (50)$$
which gives a nonlinear density dependence of the kaon optical potential. This is in addition to the standard Yukawa coupling term. The scalar field is coupled to the kaon by analogy with the minimal coupling scheme of the vector fields:
$$m_K^* = m_K - g_{\sigma K}\,\sigma. \qquad (51)$$
In addition to the standard linear Yukawa coupling term, it also gives a quadratic coupling term to the scalar field in the Lagrangian, of the form
$$(g_{\sigma K}\,\sigma)^2 K^*K. \qquad (52)$$
This term is small compared to the linear Yukawa coupling term, as it is suppressed by $g_{\sigma K}\sigma/(2m_K)$. Nevertheless, it will simplify the equations of motion considerably, as we show in the following. The equation of motion for the kaon can be written as
$$\left[D_\mu D^\mu + m_K^{*2}\right]K(x) = 0. \qquad (53)$$
In uniform infinite matter (which any locally inertial frame in the star approximates), we expand $K(x)$ in momentum eigenfunctions. Carrying out the indicated operations, we obtain a long factor times the kaon momentum eigenstate. Since the eigenstate is not supposed to vanish, the factor must. We thus obtain the dispersion relation for the $K^-$ straightforwardly,
$$-\omega_K^2 + m_K^2 + k^2 + \Pi_K(\omega_K, k, \rho) = 0. \qquad (54)$$
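Here, the time component of the kaon 4-momentum is denoted by $\omega_K$ and the magnitude of the 3-momentum by $k$. Dependence on the baryon density $\rho$ enters through the dependence of the scalar and vector fields $\sigma$, $\omega_0$ and $\rho_{03}$ on the baryon Fermi momentum. The term $\Pi_K(\omega_K, k, \rho)$ is called the $K^-$ self-energy in the medium and is given by
$$\Pi_K(\omega_K, k, \rho) = -2\omega_K\left(g_{\omega K}\omega_0 + g_{\rho K}\rho_{03}\right) - \left(g_{\omega K}\omega_0 + g_{\rho K}\rho_{03}\right)^2 - 2m_K g_{\sigma K}\sigma + \left(g_{\sigma K}\sigma\right)^2 \qquad (55)$$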
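and depends on the in-medium kaon energy $\omega_K$. As in our earlier work, $\omega_0$ is the time-like Lorentz component of the $\omega$ field and $\rho_{03}$ the time-like isospin 3-component of the $\rho$ field. The space-like components of $\omega^\mu$ and $\boldsymbol{\rho}^\mu$ vanish. For $k = 0$, the preceding two equations can be written
$$\left[m_K - g_{\sigma K}\sigma\right]^2 = \left[\omega_K + \left(g_{\omega K}\omega_0 + g_{\rho K}\rho_{03}\right)\right]^2. \qquad (56)$$
We find at once the dispersion relation for s-wave condensation of the $K^-$,
$$\omega_K = m_K - g_{\sigma K}\sigma - g_{\omega K}\omega_0 - g_{\rho K}\rho_{03}, \qquad (57)$$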
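which is linear in the meson fields. Thus the medium-modified kaon energy is reduced from its vacuum value to a smaller, density-dependent value. If a kaon condensate is present, additional source terms appear in the equations of motion for the meson fields, namely
$$m_\sigma^2\sigma = g_{\sigma N}\left[\rho_s - b\,m_N\left(g_{\sigma N}\sigma\right)^2 - c\left(g_{\sigma N}\sigma\right)^3\right] + 2g_{\sigma K}\,m_K^*\,K^*K,$$
$$m_\omega^2\omega_0 = g_{\omega N}\left(\rho_p + \rho_n\right) - 2g_{\omega K}\left(\omega_K + g_{\omega K}\omega_0 + g_{\rho K}\rho_{03}\right)K^*K,$$
$$m_\rho^2\rho_{03} = g_{\rho N}\,\frac{\rho_p - \rho_n}{2} - 2g_{\rho K}\left(\omega_K + g_{\omega K}\omega_0 + g_{\rho K}\rho_{03}\right)K^*K. \qquad (58)$$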
The scalar density $\rho_s$ was written earlier, and $\rho_n$ and $\rho_p$ denote neutron and proton number densities. The vector meson $\omega$ is driven by their sum and the isovector $\rho$ by their difference. Note that the
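equations of motion for nucleons are unchanged. The conserved current associated with the kaons is derived by using
$$J^K_\mu = i\left[K^*\frac{\partial\mathcal{L}}{\partial(\partial^\mu K^*)} - K\frac{\partial\mathcal{L}}{\partial(\partial^\mu K)}\right] = K^*\,i\partial_\mu K - \left(i\partial_\mu K^*\right)K - 2g_{\omega K}\,\omega_\mu K^*K - 2g_{\rho K}\,\rho_{\mu 3}K^*K. \qquad (59)$$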
In the mean-field approximation, the $K^-$ density is given by
$$n_K = -J^K_0 = 2\left(\omega_K + g_{\omega K}\omega_0 + g_{\rho K}\rho_{03}\right)K^*K. \qquad (60)$$
For s-wave condensation we can use the dispersion relation in the form (57) to get an expression for the scalar density of the kaon,
$$n_K = 2m_K^*\,K^*K, \qquad (61)$$
which turns out to be identical to the vector density above. (We use the symbol $\rho_K$ to denote the baryon number density in the condensed phase; therefore we denote the kaon density by $n_K$.) This relation holds only for $\mathbf{k} = 0$, which is the case for cold neutron star matter and s-wave condensation. It is a result of our choice of the scalar coupling scheme (51). For the negatively charged kaon the equations of motion then simplify to
$$m_\sigma^2\sigma = g_{\sigma N}\left[\rho_s - b\,m_N(g_{\sigma N}\sigma)^2 - c\,(g_{\sigma N}\sigma)^3\right] + g_{\sigma K}\,n_K,$$
$$m_\omega^2\omega_0 = g_{\omega N}(\rho_p + \rho_n) - g_{\omega K}\,n_K,$$
$$m_\rho^2\rho_{03} = g_{\rho N}\,\frac{\rho_p - \rho_n}{2} - g_{\rho K}\,n_K. \qquad (62)$$
The total charge densities in the pure phases, normal and kaon condensed, are respectively
$$q_N = \rho_p - \rho_e - \rho_\mu, \qquad (63)$$
$$q_K = \rho_p - \rho_e - \rho_\mu - n_K. \qquad (64)$$
The total baryon density is
$$\rho_p(\mu_n - \mu_e) + \rho_n(\mu_n) = \rho. \qquad (65)$$
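As a consistency check on Eqs. (54)-(57) and (60)-(61), the sketch below evaluates the self-energy (55) numerically. It treats the combinations $S = g_{\sigma K}\sigma$ and $V = g_{\omega K}\omega_0 + g_{\rho K}\rho_{03}$ as plain numbers; the values used are illustrative, not fitted. It confirms that the s-wave energy (57) satisfies the dispersion relation (54) and that the vector density (60) equals the scalar density (61). For $k \neq 0$ it uses the root $\omega_K = \sqrt{(m_K - S)^2 + k^2} - V$ implied by (54)-(55), assuming $m_K > S$.

```python
import math


def self_energy(omega_k, m_k, s, v):
    """K- self-energy, Eq. (55).

    s = g_sigmaK * sigma, v = g_omegaK * omega_0 + g_rhoK * rho_03 (MeV).
    """
    return -2.0 * omega_k * v - v ** 2 - 2.0 * m_k * s + s ** 2


def dispersion_lhs(omega_k, k, m_k, s, v):
    """Left-hand side of Eq. (54); it vanishes on the in-medium mass shell."""
    return -omega_k ** 2 + m_k ** 2 + k ** 2 + self_energy(omega_k, m_k, s, v)


def kaon_energy(k, m_k, s, v):
    """Root of Eq. (54) that reduces to Eq. (57), m_K - s - v, at k = 0."""
    return math.sqrt((m_k - s) ** 2 + k ** 2) - v


def kaon_densities(m_k, s, v, amp2):
    """Vector density (60) and scalar density (61) for a condensate with K*K = amp2."""
    omega_k = kaon_energy(0.0, m_k, s, v)
    n_vector = 2.0 * (omega_k + v) * amp2
    n_scalar = 2.0 * (m_k - s) * amp2  # m_K^* from Eq. (51)
    return n_vector, n_scalar


if __name__ == "__main__":
    M_K = 494.0        # approximate vacuum K- mass, MeV
    S, V = 60.0, 40.0  # illustrative meson-field combinations, MeV
    print(kaon_energy(0.0, M_K, S, V))  # s-wave energy, Eq. (57)
```

With these illustrative numbers the s-wave energy is $494 - 60 - 40 = 394$ MeV. The residual of (54) stays at rounding level for any $k$.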
5.2.1. Pure normal phase
When the nuclear medium is in one of the pure phases, say the normal phase, the corresponding charge density must vanish identically, $q_N \equiv 0$, and this fact must be imposed as a constraint on the solution. We take account of the conditions for beta equilibrium, $\mu_p = \mu_n - \mu_e$ and $\mu_\mu = \mu_e$. These define the nucleon and lepton densities, $\rho_i \propto k_i^3$, in terms of the chemical potentials or Fermi momenta, $k_i = \sqrt{\mu_i^2 - m_i^2}$. Eqs. (62) (with $n_K = 0$), (63) and (65) are then 5 in number and correspond to the 5 unknowns $\sigma$, $\omega_0$, $\rho_{03}$, $\mu_n$, $\mu_e$; they determine their values at the chosen baryon density $\rho$.
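When the solution is found, the energy density and pressure can be computed from the expressions
$$\epsilon_N = \frac{b}{3}m_N(g_{\sigma N}\sigma)^3 + \frac{c}{4}(g_{\sigma N}\sigma)^4 + \frac{1}{2}m_\sigma^2\sigma^2 + \frac{1}{2}m_\omega^2\omega_0^2 + \frac{1}{2}m_\rho^2\rho_{03}^2 + \sum_i \frac{2J_i+1}{2\pi^2}\int_0^{k_i} dk\,k^2\sqrt{k^2 + m_i^{*2}}, \qquad (66)$$
$$p_N = -\frac{b}{3}m_N(g_{\sigma N}\sigma)^3 - \frac{c}{4}(g_{\sigma N}\sigma)^4 - \frac{1}{2}m_\sigma^2\sigma^2 + \frac{1}{2}m_\omega^2\omega_0^2 + \frac{1}{2}m_\rho^2\rho_{03}^2 + \frac{1}{3}\sum_i \frac{2J_i+1}{2\pi^2}\int_0^{k_i} dk\,\frac{k^4}{\sqrt{k^2 + m_i^{*2}}}. \qquad (67)$$
The sum is over nucleons and leptons (for leptons, $m_i^* = m_i$).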
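The momentum integrals in (66) and (67) are those of an ideal Fermi gas at $T = 0$. Integrating by parts shows that, species by species, they satisfy $\epsilon_i + p_i = E_i\,n_i$ with $E_i = \sqrt{k_i^2 + m_i^{*2}}$ and $n_i = (2J_i+1)k_i^3/(6\pi^2)$. This identity is a quick check on any numerical implementation. The sketch below uses a composite Simpson rule in natural units; the function names are illustrative, not taken from this work.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def fermi_gas(k_f, mass, degeneracy=2):
    """Energy density, pressure and number density of one fermion species at T = 0.

    These are the momentum integrals of Eqs. (66)-(67) with (2J+1) = degeneracy,
    in natural units (k_f and mass in the same units, e.g. fm^-1).
    """
    pref = degeneracy / (2.0 * math.pi ** 2)

    def e_integrand(k):
        return k ** 2 * math.sqrt(k ** 2 + mass ** 2)

    def p_integrand(k):
        # Guard the k = 0 endpoint so a massless species does not divide by zero.
        return k ** 4 / math.sqrt(k ** 2 + mass ** 2) if k > 0.0 else 0.0

    eps = pref * simpson(e_integrand, 0.0, k_f)
    p = pref / 3.0 * simpson(p_integrand, 0.0, k_f)
    n = degeneracy * k_f ** 3 / (6.0 * math.pi ** 2)
    return eps, p, n


if __name__ == "__main__":
    eps, p, n = fermi_gas(k_f=1.7, mass=4.76)  # roughly nucleon-like values in fm^-1
    print(eps, p, n, eps + p - math.sqrt(1.7 ** 2 + 4.76 ** 2) * n)
```

The last printed number should be zero up to quadrature and rounding error. In the massless limit the same routine reproduces $\epsilon = 3p$.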
5.2.2. Pure kaon condensed phase
With the solution for the normal low-density phase at hand, the energy of a test kaon in that phase can be computed from (57) as a function of the baryon density of the normal phase. The kaon energy in the medium, $\omega_K$, will decrease from its vacuum value $m_K$ with increasing density of the pure normal phase because of the growth of the meson fields appearing in (57). At the same time, $\mu_{K^-} = \mu_e$ increases with increasing proton density. At the density for which the equality
$$\omega_K = \mu_e \qquad (68)$$
is first achieved, kaons will occupy a small fraction of the total volume of the medium, forming the kaon condensed phase. The kaon density $n_K$ has then acquired a finite value and appears as a contribution in the meson field equations (62). We can now combine (57) and (68) to obtain
$$\omega_K = m_K - g_{\sigma K}\sigma - g_{\omega K}\omega_0 - g_{\rho K}\rho_{03} = \mu_e. \qquad (69)$$
Together with the three field equations (62), relation (69) and the charge equation (64) with $q_K$ set to zero, there is the condition that all baryons add up to the total baryon density,
$$\rho_p(\mu_n - \mu_e) + \rho_n(\mu_n) = \rho. \qquad (70)$$
These provide 6 equations in the unknowns $\sigma$, $\omega_0$, $\rho_{03}$, $\mu_n$, $\mu_e$ and $n_K$. In this manner the pure kaon condensed phase is completely specified by our equations for any chosen baryon density. The specific kaon contribution to the energy density reads
$$\epsilon_K = 2m_K^{*2}K^*K = m_K^*\,n_K. \qquad (71)$$
The energy density in the pure kaon condensed phase contains the contribution $\epsilon_N$, which has the same appearance as above, but the fields themselves are affected by the presence of the kaons, as in (62). It also has the specific contribution of the kaons and is now given by
$$\epsilon = \epsilon_N + \epsilon_K. \qquad (72)$$
Because the kaons are in the lowest (zero-momentum) state, they do not contribute to the pressure.
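The threshold condition (68) lends itself to a simple scan once the normal-phase solution is tabulated on a density grid. Evaluate $\omega_K$ from (57) at each grid point, compare it with $\mu_e$, and interpolate linearly inside the first interval where $\omega_K - \mu_e$ changes sign. In the sketch below, the tabulated field combinations and $\mu_e$ are hypothetical linear stand-ins for a real mean-field solution. They were chosen only so the result can be checked by hand.

```python
def kaon_energy_swave(m_k, s, v):
    """Eq. (57): omega_K = m_K - g_sigmaK*sigma - (g_omegaK*omega_0 + g_rhoK*rho_03)."""
    return m_k - s - v


def condensation_threshold(densities, omega_k, mu_e):
    """First density where omega_K(rho) falls to mu_e(rho), Eq. (68).

    densities, omega_k, mu_e: equally long sequences on an increasing density grid.
    Returns the linearly interpolated crossing, or None if omega_K stays above mu_e.
    """
    gaps = [w - m for w, m in zip(omega_k, mu_e)]
    if gaps[0] <= 0.0:
        return densities[0]
    for i in range(1, len(densities)):
        if gaps[i] <= 0.0:
            # Linear interpolation between grid points i-1 and i.
            t = gaps[i - 1] / (gaps[i - 1] - gaps[i])
            return densities[i - 1] + t * (densities[i] - densities[i - 1])
    return None


if __name__ == "__main__":
    M_K = 494.0                                 # approximate vacuum kaon mass, MeV
    u = [1.0 + 0.5 * i for i in range(11)]      # rho / rho_0 from 1 to 6
    s = [50.0 * x for x in u]                   # hypothetical g_sigmaK*sigma, MeV
    v = [30.0 * x for x in u]                   # hypothetical vector combination, MeV
    w = [kaon_energy_swave(M_K, a, b) for a, b in zip(s, v)]
    mu_e = [100.0 + 40.0 * x for x in u]        # hypothetical electron chemical potential, MeV
    print(condensation_threshold(u, w, mu_e))
```

For these stand-ins the exact crossing is $\rho/\rho_0 = 394/120 \approx 3.283$. Linear interpolation recovers it because both tabulated curves are linear.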
5.2.3. Mixed normal and kaon phase
We turn now to the mixed phase. For substances with more than one conserved charge, the Gibbs conditions for equilibrium and the conservation laws can be satisfied simultaneously. This requires applying the conservation law(s) only in a global rather than a local sense [14,15]. (We assume a common temperature $T = 0$ and do not mention it further.) We discussed this thoroughly in connection with hybrid stars. Thus for neutron star matter, which has two conserved charges, the Gibbs conditions and conservation of electric charge read
$$p_N(\mu_n, \mu_e) = p_K(\mu_n, \mu_e), \qquad (73)$$
$$q \equiv (1-\chi)\,q_N(\mu_n, \mu_e) + \chi\,q_K(\mu_n, \mu_e) = 0, \qquad (74)$$
where $p_N$, $q_N$ and $p_K$, $q_K$ denote the pressure and charge density of the normal and kaon condensed phases, respectively. For any volume proportion $\chi$ of kaon phase in the interval $(0,1)$, the six equations (62), (69), (73) and (74) determine the same six unknowns as for the pure kaon phase above. Note that in the mixed phase we have changed the independent variable from the baryon density to the kaon volume fraction $\chi$. The total volume-averaged baryon density in the mixed phase is given as a function of $\chi$ by
$$\rho = (1-\chi)\,\rho_N(\mu_n, \mu_e) + \chi\,\rho_K(\mu_n, \mu_e), \qquad (75)$$
where $\rho_N$ and $\rho_K$ denote the baryon number densities in the two phases, respectively. A similar equation holds for the energy density. Therefore, the chemical potentials and all field quantities are functions of the volume proportion
of kaon condensed phase, and so therefore are all other properties of the two phases in equilibrium, including the common pressure. The properties vary with the proportion of phases. Why and how they vary depends on how the internal driving forces can exploit the degrees of freedom (one less than the number of independent chemical potentials) so as to minimize the total energy [15]. As we discussed previously, for isospin-asymmetric nuclear systems the driving force is the isospin-restoring term in the total energy (66). It is supplemented by a contribution depending on the difference in Fermi energies of particles of opposite isospin. Note that the pressure equality (73) cannot be solved simultaneously with the conditions of local charge neutrality, $q_N(\mu_n, \mu_e) = 0$ and $q_K(\mu_n, \mu_e) = 0$. That would require seven equations to be satisfied with the same six variables.
Since all kaons can condense in the lowest energy state, they become energetically more favorable than electrons as the neutralizing agent of positive charge. With further increase of density and decrease in the kaon energy $\omega_K$, the electron chemical potential and the electron population will both decrease. It is evident from (69) that, as the fields grow with density, $\omega_K$ will decrease from its vacuum value. This turns the electron chemical potential into a decreasing function of density, as seen in Fig. 22. The Lagrangian for the kaons (48) describes the kaon-nucleon interaction as well as the kaon-kaon interaction. The $K^-$ in a nuclear medium is certainly a coupled-channel problem, due to the opening of the $\Sigma\pi$ and $\Lambda\pi$ channels, and cannot be treated at the mean-field level. Coupled-channel calculations at finite density, first done by Koch [58], yield an attractive potential for the $K^-$ at normal nuclear density of about $U_{K^-}(\rho_0) = -100$ MeV. Waas and Weise find a value of $U_{K^-}(\rho_0) = -120$ MeV [59]. Kaonic data support the conclusion that there is a highly attractive
kaon optical potential in dense nuclear matter [60]. Because the kaon is a boson, it does not add directly to the pressure; it forms a Bose condensate in the s-wave with zero momentum [56]. This is in contrast to pion condensation, which occurs in a p-wave with finite momentum. A self-consistent treatment of the in-medium self-energy of the pion prevents pion condensation [61]. A coupled-channel calculation including the modified self-energy of the kaon has been studied in [62], and it was found that the kaon still sees an attractive potential at high density. At the mean-field level considered here, the three kaon coupling constants $g_{\sigma K}$, $g_{\omega K}$ and $g_{\rho K}$ can be fixed to kaon-nucleon scattering lengths. The in-medium potentials for the $K^-$ are then given by G-parity, i.e. by switching the sign of the vector potential. This gives results for the $K^-$ optical potential similar to those of the coupled-channel calculations [63]. We choose to couple the vector fields according to the simple quark and isospin counting rule,
$$g_{\omega K} = \frac{1}{3}g_{\omega N} \quad \text{and} \quad 2g_{\rho K} = g_{\rho N}. \qquad (76)$$
(Note that the factor 2 appears because the couplings of the $\rho$ meson to $K$ and to $N$ were defined differently in (49) and (19); otherwise (55) and the subsequent equations would have a factor 2 in front of $g_{\rho K}$.) The scalar coupling constant is fixed to the optical potential of the $K^-$ at $\rho_0$:
$$U_K(\rho_0) = -g_{\sigma K}\,\sigma(\rho_0) - g_{\omega K}\,\omega_0(\rho_0). \qquad (77)$$
The kaon potential is fixed at normal nuclear density, as quoted above, and varies as a function of the density $\rho$.
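The Gibbs conditions (73) and (74) amount to a small root-finding problem at each fixed $\chi$. The sketch below solves them by Newton iteration for a hypothetical two-phase toy model. It does not implement the mean-field equations above: the quadratic pressures were chosen only so the answer can be checked by hand. Each phase's charge density is obtained as $q = -\partial p/\partial \mu_e$ and its baryon density as $\partial p/\partial \mu_n$, and the average density follows (75).

```python
# Hypothetical two-phase toy model, NOT the relativistic mean-field model of the text.
# Each phase: p(mu_n, mu_e) = a*mu_n**2 - b + 0.5*kappa*(mu_e - e0)**2, so its charge
# density q = -dp/dmu_e = -kappa*(mu_e - e0) vanishes only at mu_e = e0, and its
# baryon density is dp/dmu_n = 2*a*mu_n.
PHASES = {
    "normal": {"a": 1.0, "b": 0.0, "kappa": 1.0, "e0": 1.0},
    "kaon": {"a": 1.5, "b": 0.5, "kappa": 1.0, "e0": 0.2},
}


def pressure(phase, mu_n, mu_e):
    c = PHASES[phase]
    return c["a"] * mu_n ** 2 - c["b"] + 0.5 * c["kappa"] * (mu_e - c["e0"]) ** 2


def charge(phase, mu_n, mu_e):
    c = PHASES[phase]
    return -c["kappa"] * (mu_e - c["e0"])


def baryon_density(phase, mu_n, mu_e):
    return 2.0 * PHASES[phase]["a"] * mu_n


def gibbs_residuals(chi, mu_n, mu_e):
    """Residuals of the pressure balance (73) and global charge neutrality (74)."""
    r1 = pressure("normal", mu_n, mu_e) - pressure("kaon", mu_n, mu_e)
    r2 = (1.0 - chi) * charge("normal", mu_n, mu_e) + chi * charge("kaon", mu_n, mu_e)
    return r1, r2


def solve_mixed_phase(chi, mu_n=1.0, mu_e=0.6, tol=1e-12, max_iter=50):
    """Newton iteration for (mu_n, mu_e) at a fixed kaon volume fraction chi."""
    h = 1e-6
    for _ in range(max_iter):
        r1, r2 = gibbs_residuals(chi, mu_n, mu_e)
        if abs(r1) < tol and abs(r2) < tol:
            break
        # Central-difference Jacobian; columns are derivatives w.r.t. mu_n and mu_e.
        up_n = gibbs_residuals(chi, mu_n + h, mu_e)
        dn_n = gibbs_residuals(chi, mu_n - h, mu_e)
        up_e = gibbs_residuals(chi, mu_n, mu_e + h)
        dn_e = gibbs_residuals(chi, mu_n, mu_e - h)
        j11 = (up_n[0] - dn_n[0]) / (2 * h)
        j21 = (up_n[1] - dn_n[1]) / (2 * h)
        j12 = (up_e[0] - dn_e[0]) / (2 * h)
        j22 = (up_e[1] - dn_e[1]) / (2 * h)
        det = j11 * j22 - j12 * j21
        # Newton step: solve J * step = -r with Cramer's rule.
        mu_n += (-r1 * j22 + r2 * j12) / det
        mu_e += (-r2 * j11 + r1 * j21) / det
    return mu_n, mu_e


if __name__ == "__main__":
    for chi in (0.0, 0.25, 0.5, 0.75, 1.0):
        mu_n, mu_e = solve_mixed_phase(chi)
        p = pressure("normal", mu_n, mu_e)
        # Volume-averaged baryon density, Eq. (75).
        rho = (1 - chi) * baryon_density("normal", mu_n, mu_e) + chi * baryon_density("kaon", mu_n, mu_e)
        print(f"chi={chi:.2f} mu_n={mu_n:.4f} mu_e={mu_e:.4f} p={p:.4f} rho={rho:.4f}")
```

For these parameters the solution is $\mu_e = 1 - 0.8\chi$ and $\mu_n^2 = 0.36 + 1.28\chi$. Across the mixed phase the electron chemical potential falls continuously while the common pressure and average density rise monotonically. Local neutrality in both phases would instead require $\mu_e = 1$ and $\mu_e = 0.2$ at once, which is impossible.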
5.2.4. Comparison of Maxwell and Gibbs
We discussed in Section 3.1 why conservation laws in a substance consisting of two phases in equilibrium must be applied globally rather than locally when the substance has more than one conserved charge (or independent component). We do so again here in a different way. For the special case of only one chemical potential, the equation describing mechanical, chemical and thermal equilibrium between two phases 1 and 2 is $p_1(\mu, T) = p_2(\mu, T)$. It has a unique solution for $\mu$, which is often found by use of a Maxwell construction in one form or another. For example, the common tangent method is based on the fact that $\mu = d\epsilon/d\rho = dE/dN$. Write the equation of state in the form $\epsilon = \epsilon(\rho)$. The common tangent, $\epsilon = -p + \mu\rho$, touches the equation of state once in each phase. The segment between the touching points describes the mixed phase, with common and constant values of $p$ and $\mu$ independent of the proportion of the two phases. Clearly, the Maxwell construction can assure only that a single chemical potential is common to both phases. However, neutron star matter has two independent chemical potentials, $\mu_n$ and $\mu_e$, each of which must be equal in the two phases to assure equilibrium. Hence a Maxwell construction cannot be used. It will produce a discontinuity in one of the chemical potentials and will describe an unstable state, one for which there is a potential difference at the boundary between phases. This general fact concerning phase transitions with more than one conserved charge inside neutron stars was realized only a few years ago [14,15]. It was subsequently applied to the liquid-vapor transition in warm nuclear matter [64]. For kaon condensation, we contrast the Maxwell and Gibbs constructions of phase equilibrium in a multi-component system, the latter having been formulated in Ref. [15]. A Maxwell construction is often implemented by looking at the thermodynamical potential of interest, here the pressure, as a function of the chemical potential, as depicted in Fig. 20.
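For contrast, a Maxwell construction imposes local neutrality in each phase. It then looks for the crossing of the two pressure curves at a single chemical potential. The sketch below finds that crossing by bisection for two hypothetical toy phases with $p = a\mu_n^2 - b + \tfrac12\kappa(\mu_e - e_0)^2$, each of which is neutral only at $\mu_e = e_0$. The parameters are illustrative assumptions, not values from this work. The sketch exposes the jump in $\mu_e$ at the crossing that makes such a construction unphysical for a two-component substance.

```python
# Hypothetical toy phases; each is locally neutral only at mu_e = e0.
PHASES = {
    "normal": {"a": 1.0, "b": 0.0, "kappa": 1.0, "e0": 1.0},
    "kaon": {"a": 1.5, "b": 0.5, "kappa": 1.0, "e0": 0.2},
}


def neutral_pressure(phase, mu_n):
    """Pressure of a locally neutral phase: mu_e is pinned at e0, so the kappa term vanishes."""
    c = PHASES[phase]
    return c["a"] * mu_n ** 2 - c["b"]


def maxwell_point(lo=0.5, hi=2.0, tol=1e-12):
    """Bisect on mu_n for p_normal(mu_n) = p_kaon(mu_n), both phases locally neutral."""
    def gap(mu_n):
        return neutral_pressure("kaon", mu_n) - neutral_pressure("normal", mu_n)

    if gap(lo) * gap(hi) > 0:
        raise ValueError("pressure curves do not cross in [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    mu_n = 0.5 * (lo + hi)
    return mu_n, neutral_pressure("normal", mu_n)


if __name__ == "__main__":
    mu_n, p = maxwell_point()
    jump = PHASES["normal"]["e0"] - PHASES["kaon"]["e0"]
    print(f"Maxwell crossing: mu_n={mu_n:.6f}, p={p:.6f}, jump in mu_e={jump:.2f}")
```

With these toy parameters the crossing is at $\mu_n = 1$ with $p = 1$, and $\mu_e$ jumps from 1.0 to 0.2 across it. Solving the Gibbs conditions (73)-(74) for the same toy phases at $\mu_n = 1$ instead gives $\chi = 1/2$, $\mu_e = 0.6$ and $p = 1.08$. That pressure is higher than the Maxwell value, in line with Fig. 20.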
The first portion of the curve
Fig. 20. The pressure versus the baryon chemical potential for a Maxwell construction (dashed line) compared to the Gibbs condition (solid line). The Gibbs condition is thermodynamically more stable [53].
corresponds to the normal phase and is physical up to the crossing point; beyond that point, the curve represents the kaon condensed phase. The crossing of the curves is the point of equal pressure and baryon chemical potential in the two phases. The temperature is constant along the curve ($T = 0$). The crossing point therefore represents phase equilibrium in this construction if there is only one chemical potential, i.e. if the substance has only one independent component. By contrast, the Gibbs conditions (3), implemented as described with global conservation of charge (74), yield the smooth monotonic curve. It is the thermodynamically favored state, since its pressure is higher than that of the Maxwell construction. By the construction described above, the pressure and both chemical potentials of this two-component substance are common to the two phases along the curve. The mixed phase is not a point, as in the case of Maxwell, but a line. It begins and ends where it touches the curves of the two pure phases (dashed line). The pressure difference between the two cases depicted in Fig. 20 depends on the equation of state and the optical potential of the kaon. It is also sensitive to surface interface and Coulomb corrections; here, we discuss only bulk matter. Coulomb and surface energy will reduce the pressure in the mixed phase. However, the Gibbs construction, even with such corrections, cannot fall below the Maxwell curve (recall the discussion in the paragraph preceding Eq. (47)). The actual energy difference between the Maxwell and Gibbs curves depends on the surface tension. Only recently has the surface tension between two phases of high-density matter, namely the normal and kaon condensed phases, been computed [51]. That work came too late to be incorporated here, and we use the same approximation as was discussed for the deconfinement phase transition.
But because the sum of Coulomb energy and surface energy vanishes at the boundaries of the mixed phase, as seen in (45), the boundaries are unaffected by the precise value of the surface tension, to first approximation (i.e. treating the structure as a perturbation on the bulk energy and pressure). Another difference is quite striking: the mixed phase, as described with the release of the constraint of local conservation, is broader than for the Maxwell case. This can be easily
Fig. 21. The equation of state for pure nuclear matter (solid line) and pure kaon matter (dotted line) for $U_K(\rho_0) = -140$ MeV. The Maxwell construction is shown by the horizontal line, the Gibbs solution by the dash-dotted line. Results for other values of $U_K$ can be found in Ref. [53].
Fig. 22. The electron chemical potential and the baryon chemical potential (denoted by $\mu_n$ in the text) for the case $U_K(\rho_0) = -120$ MeV, using the Gibbs condition (solid line) and a Maxwell construction (dashed line). The large electric potential difference that occurs for the Maxwell construction gives rise to an instability [53].
understood as owing to the fact that, in a multi-component substance, the two phases in equilibrium can adjust to each other's presence to optimize the thermodynamic function. The two phases acquire different and opposite charges through the redistribution of charge that minimizes the isospin asymmetry of the normal nuclear phase. We will discuss later, in more detail, the geometric features of the mixed phase that arise from these charges. The differences between the two descriptions, Maxwell and Gibbs, are striking for the relevant ingredient of neutron star calculations, namely the equation of state, as plotted in Fig. 21. The solid line shows the equation of state for the normal hadronic phase of neutron star matter, and the dotted line shows that for pure kaon condensed matter. The Maxwell construction results in a region of constant pressure (solid horizontal line) connecting the two different equations of state. Applying the Gibbs condition with two (or more) conserved charges causes two major differences compared to the Maxwell construction. First, the pressure increases monotonically through the mixed phase rather than being constant as for a one-component substance. Second, the density range of the mixed phase is much broader; it starts at a lower density and ends at a much higher density. Hence, the mixed phase may well be the dominant portion of a neutron star. There is another striking feature apparent in Fig. 21: the Gibbs phase transition for a substance having more than one independent component resembles a second-order phase transition in form. Nevertheless, it is a first-order phase transition, since there is a region of density in which two distinct phases of the substance are simultaneously present and in equilibrium. Fig. 22 compares the behavior of the chemical potentials in the Gibbs and Maxwell constructions. The region of the mixed phase of neutron star matter is indicated by the vertical
dotted lines. The electron chemical potential increases in the pure hadronic phase as the densities of neutrons and protons increase. However, beyond the critical density for kaon condensation, the electron chemical potential decreases with further increase of density, as kaons replace electrons in their role of neutralizing the charge on protons. We note that when the conservation of electric charge is imposed as a global constraint, as described above, the electron chemical potential is continuous. In contrast, if the transition were (incorrectly) treated by use of the Maxwell construction, the electron chemical potential would have a discontinuity, that is, a potential difference between the two phases [15]. Such a construction cannot describe an equilibrium situation. In the Maxwell construction, the electron chemical potential drops from $\mu_e = 240$ MeV to $\mu_e = 167$ MeV at the phase boundary. This produces a huge difference in the Fermi energy of the leptons between the two phases.
5.3. Matter properties with a kaon condensate
In this section we discuss the equation of state including kaons, emphasizing the difference between the hitherto applied Maxwell construction and the thermodynamically consistent Gibbs condition. We then discuss how the presence of the condensed state depends on the parameters of the theory, in particular on the kaon optical potential in matter. After this, we study properties of the mixed phase consisting of normal neutron star matter and the kaon condensed state in equilibrium with each other.
5.3.1. Dependence on parameters
The form of the equation of state and the order of the phase transition depend on the chosen optical potential of the kaon, which reflects its interaction with the nuclear medium. Ref. [53] shows the equations of state for kaon optical potentials whose values at normal nuclear matter density vary between $-80$ and $-140$ MeV. For $U_K(\rho_0) > -90$ MeV (i.e. a weakly attractive potential), the mixed phase is absent and the phase transition is of second order.
For deeper optical potentials, a mixed phase appears. The deeper the optical potential of the kaon, the lower the threshold density of the mixed phase and the wider its density range. The equation of state is considerably softened by the presence of the kaon condensate. For the dependence of the critical density on the underlying nuclear properties, see Ref. [53].

5.3.2. Particle populations
The populations of the nucleons, leptons and kaons for the case $U_K(\rho_0) = -120$ MeV are shown in Fig. 23. As it is more favorable to produce kaons in association with protons, the neutron density remains (nearly) constant over the whole density range above the critical density. This has also been found in other work [65,66]. Since lepton number is not conserved in a star because of the escape of neutrinos, the leptons decrease in number as the $K^-$ becomes the new neutralizing agent for protons. Fig. 24 shows the populations in the mixed phase for the two phases separately as a function of the volume proportion $\chi$ of condensed phase. The normal-phase population is denoted as I, the kaon-phase population as II. Of particular note is that the $K^-$ population density has a large value, in fact its largest value, at the kaon condensation threshold; that is to say, it jumps from zero to its largest value at threshold. For $\chi = 0$ the proton population in the normal phase is small and neutrality is achieved by a balance with the sum of the lepton populations. This corresponds to local neutrality in the pure phase. However, with a growing fraction of condensed phase, charge neutrality is achieved more economically between the two phases in equilibrium as a global constraint: the proton population in the condensed phase increases to near equality with neutrons as the proportion of condensed phase increases, while the lepton populations decrease to the vanishing point. Isospin symmetry is thus closely achieved in the region of the system containing the normal phase. This behavior is expected [15] as a general feature of the action of the isospin driving force toward symmetry in phase transitions of asymmetric nuclear matter.

The above behavior, which is driven by the isospin restoring force as discussed for the deconfinement phase transition, is mirrored in the charge density carried by each of the phases in phase equilibrium, as shown in Fig. 25. The overall charge, i.e. the charge of each phase weighted by its volume, vanishes as it must, but the charge densities of the two phases are separately finite, except that the normal-phase charge density goes to zero when it is the sole phase present ($\chi = 0$), while the condensed-phase charge density vanishes when it becomes the sole phase ($\chi = 1$). (The system is uniform in either pure phase and has vanishing charge density.) The variation of the charge density for intermediate proportions of the two phases is such as to minimize the energy. The finite charge densities are responsible for the formation of a Coulomb lattice, which we discuss in Section 5.5.2.

We thank Hirotsugu Fujii for providing us with the data tables to check the neutron population from [66].

Fig. 23. The population densities computed as the volume-weighted average in the two phases as functions of the nucleon density. The neutron density stays nearly constant once kaon condensation appears [53].
Fig. 24. The population density in each phase shown separately as a function of volume fraction of kaon condensed phase. Normal phase is denoted by I and condensed phase by II [53].

5.3.3. Comment on coupling schemes
We comment here on the coupling schemes that have been employed in studies of the kaon condensate.
As previously noted, we couple the kaon directly to the meson fields. In particular, the scalar meson is coupled in a minimal scheme as in (51). Coupling of the kaon to meson fields was also employed in Refs. [67,68]. However, the scalar coupling was implemented there through $m_K^* = m_K - g_{\sigma K}\,\sigma$. In contrast to our scheme, the kaon effective mass is then reduced from its vacuum value through the scalar field by only half as much (in leading order). As a consequence, even though much stronger optical potentials result from the coupling constants used in Refs. [67,68], namely $|U_K(\rho_0)| \approx 180$ MeV, as compared with those favored by Koch [58] and by Waas and Weise [59] of $|U_K(\rho_0)| = 100$ to $120$ MeV, which we also favor, the phase transition found was weak and of second order. The critical densities corresponding to several parameterizations of nuclear matter can be found in Table 1 of Ref. [53].

Fig. 25. Charge densities of the normal and condensed phase as a function of proportion of the condensed phase [53].

5.3.4. Comment on hyperons
The Pauli principle practically assures that hyperons will be present in dense charge-neutral matter. Their effect will be to quench the growth of the electron chemical potential and therefore either to raise the threshold density at which kaon condensation occurs, or to preempt condensation altogether [21]. Whether kaon condensation can actually occur therefore depends upon whether negative or neutral hyperons form a large part of the baryon population of charge-neutral matter at several times nuclear matter density. This in turn depends on two factors, neither of which is under strong control. One is the strength of the coupling constants of hyperons to the scalar, vector and isovector mesons as compared to the nucleon couplings to these mesons. Only the coupling constants for the $\Lambda$ can be constrained from hyperonic data and the extrapolated value of the $\Lambda$ binding in nuclear matter [23,69]. The other factor that affects the hyperon fraction to some degree is the assumed compression modulus and effective nucleon mass at saturation density of symmetric nuclear matter [70].
At the same time, the order of the phase transition and the threshold density of kaon condensation are also rendered uncertain by the imprecision with which the properties of ordinary nuclear matter are known. So as to emphasize the interesting role that kaons may play in neutron stars, we have omitted hyperons completely. At the present time, we are unable to say with any degree of confidence whether kaon condensation will occur before it is preempted by the ultimate phase transition in the density and temperature domain of neutron stars, namely deconfinement. These effects should be studied in more detail in future work.

5.4. Mixed phase properties
We will show in the following that the two phases in equilibrium have completely different properties from each other. We focus on the case $U_K(\rho_0) = -120$ MeV in this section.
Why should the nucleons not be the same in the two phases, and why can they not move freely between the normal and kaon condensed phases? The answer is that when the nucleons are treated as dynamical particles they are quite different in the two phases; they have different masses. Their interaction with the kaon field is what causes the decrease of the kaon effective mass with increasing density. The decrease in kaon mass ultimately leads to the condensation of kaons. Conversely, the interaction also changes the nature of a nucleon. Fig. 26 illustrates the dynamical nature of the nucleon: its effective mass is shown as a function of baryon density. Up to $\rho = 0.45$ fm$^{-3}$ $= 2.94\,\rho_0$ there exists only one solution, the pure nucleon phase. The effective mass $m_N^*$ decreases with density from its vacuum value at low density, to $\approx 670$ MeV at the density of normal nuclear matter, and further down to $\approx 510$ MeV at the end of the pure normal phase ($\approx 3\rho_0$). A second solution appears for those regions of the medium which are occupied by the condensed phase in equilibrium with the normal phase. In contrast with the first solution, the effective mass in the condensed phase is much smaller, $m_N^* \approx 196$ MeV. The mixed phase ends at $\rho = 0.97$ fm$^{-3}$ $= 6.34\,\rho_0$, and only the second solution remains. The nucleons have different effective masses in the two phases due to the different mean fields in the two phases. Hence, the nucleons cannot move freely between one phase and the other; a phase boundary develops.
In Ref. [26] the nucleons in the two phases were treated only implicitly through a phenomenological equation of state. They did not appear as dynamical degrees of freedom, so the two solutions could not be found.
Fig. 27 depicts the analogue of Fig. 26 for the energy of the $K^-$ in the medium. Note that the kaon is only a test particle in the normal phase. The kaon appears physically only in the condensed phase, whether that phase is in equilibrium with the normal phase or the medium is fully occupied by the condensed phase. The kaon energy decreases with density due to the attractive vector interaction with nucleons, but kaons do not appear in the medium until the threshold condition discussed above is satisfied. The energy of a test kaon in the medium is shown in Fig. 27 as a dash-dotted line. Its energy as a test particle is also shown in the regions of the mixed phase that are occupied by the normal phase. When the kaon energy sinks to a value satisfying the threshold condition, kaons begin to appear. But because the phase transition is first order, they appear discontinuously with finite density (Fig. 24) in a small fraction of the total volume, which is the kaon condensed phase in equilibrium with the normal phase. The energy of these medium-modified kaons is lower than that of a test kaon in the normal phase by about 50 MeV at the condensation threshold. The two energies are shown in Fig. 27.
The most pronounced differences between the two phases are in the energy density and the charge density. As can be read from Fig. 28, the energy density of the nucleon phase at the onset of the mixed phase is $\epsilon_N = 460$ MeV fm$^{-3}$, while it amounts to $\epsilon_K = 1140$ MeV fm$^{-3}$ for the kaon condensed phase. The energy density of the mixed phase is the sum of the two, weighted according to the volume fraction $\chi$:

$$\epsilon = (1-\chi)\,\epsilon_N(\rho) + \chi\,\epsilon_K(\rho) \tag{78}$$

It grows monotonically with density but not linearly, in contrast with the Maxwell construction, in which it grows linearly. The non-constant pressure is of course associated with the nonlinearity of the energy.

Fig. 26. The effective nucleon mass as a function of the nucleon density. Shown by vertical lines are the onset and offset of the mixed phase [53].
Fig. 27. The $K^-$ energy as a test particle in the normal phase (solid) and as a condensed kaon in the abnormal phase, as functions of the nucleon density [53].

5.5. Stellar properties with kaon condensed phase
5.5.1. Large-scale features
We have already stressed how different the computed equation of state and matter properties are, depending on whether the Maxwell construction is used to determine (incorrectly) the mixed phase of normal and condensed phase, or whether the Gibbs criteria for equilibrium are fully respected. We start our discussion of the large-scale properties of stars by illustrating the difference in the mass-energy distribution in a star depending on which method is used. Fig. 29 shows the distribution in the two cases.
$\epsilon_K(\rho)$ denotes the total energy density of the kaon phase, comprising both the nuclear contribution (as modified by the kaons) and the kaon contribution, as defined in (72).
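A minimal Python sketch of the volume weighting in Eq. (78), set beside a Maxwell-type mixture for contrast. In the Maxwell case the two coexisting phases are assumed to keep fixed baryon and energy densities, so baryon conservation makes the volume fraction, and with it the energy density, linear in the mean density; in the Gibbs treatment the phase energy densities themselves change with $\chi$, which is the source of the nonlinearity. The function names are hypothetical, and apart from the onset values 460 and 1140 MeV fm$^{-3}$ quoted above, all numbers are invented for illustration.

```python
def mixed_energy_density(eps_normal: float, eps_kaon: float, chi: float) -> float:
    """Eq. (78): volume-weighted energy density of the mixed phase.

    eps_normal, eps_kaon -- energy densities of the normal and kaon condensed
    phases (MeV fm^-3); chi -- volume fraction of the kaon condensed phase.
    """
    if not 0.0 <= chi <= 1.0:
        raise ValueError("volume fraction chi must lie in [0, 1]")
    return (1.0 - chi) * eps_normal + chi * eps_kaon


def maxwell_energy_density(rho: float, rho1: float, eps1: float,
                           rho2: float, eps2: float) -> float:
    """Maxwell-type mixture (illustrative assumption): the coexisting phases keep
    fixed baryon densities rho1 < rho2 and fixed energy densities eps1, eps2.
    Baryon conservation, (1 - chi) * rho1 + chi * rho2 = rho, makes chi, and
    hence the energy density, linear in the mean baryon density rho.
    """
    if not rho1 < rho2:
        raise ValueError("require rho1 < rho2")
    chi = (rho - rho1) / (rho2 - rho1)
    return mixed_energy_density(eps1, eps2, chi)


# Onset values for U_K(rho_0) = -120 MeV, as quoted in the text. In the Gibbs
# treatment both phase energy densities change with chi; holding them fixed
# here only illustrates the weighting itself.
print(mixed_energy_density(460.0, 1140.0, 0.0))   # 460.0
print(mixed_energy_density(460.0, 1140.0, 0.25))  # 630.0

# Hypothetical Maxwell endpoints (invented numbers): halfway in density gives
# exactly the mean of the two energy densities.
print(maxwell_energy_density(0.5, 0.25, 300.0, 0.75, 900.0))  # 600.0
```

In a Gibbs calculation the two arguments of `mixed_energy_density` would be re-evaluated at each density from the self-consistent phase solutions rather than held fixed.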
Fig. 28. The energy density of normal phase (solid) and kaon condensed phase (dashed). The total energy in the mixed phase is the volume weighted sum (dash-dotted) [53]. Fig. 29. Mass-energy distribution according to whether the mixed phase is treated by the Maxwell construction (dashed line), or so as to respect the continuity of both chemical potentials (solid line) [53].
For the Maxwell construction the energy density is discontinuous at the particular radius at which the pressure has the constant value of the Maxwell construction. The discontinuity is analogous to the separation of phases in a gravitational field that is characteristic of a substance having a single component (like steam above water). As we discussed earlier, neutron star matter in beta equilibrium does not behave like that: it has two independent components and all properties are continuous from one phase to another. The distribution of mass-energy for such a star is the continuous curve, with a discontinuity only in slope but not in value at the boundary between the mixed and pure normal phases. The central core of mixed phase is surrounded by normal dense nuclear matter. For the particular value $U_K(\rho_0) = -120$ MeV, the mixed phase extends all the way to the center of the star and the pure condensed phase does not appear. Depending on the kaon potential $U_K(\rho_0)$, the pure kaon condensed phase may not appear in the star, even for the star at the mass limit. Such is the case in the above illustration. However, for a potential $U_K(\rho_0) = -140$ MeV, the pure kaon condensed phase would form the core of stars with a mass above about $1.25\,M_\odot$, and for the limiting mass star, the condensed phase would extend to about 4.5 km. Likewise, for $U_K(\rho_0) = -130$ MeV, the central parts of the more massive stars are occupied by a pure condensed phase. The distribution of particles in the limiting mass star is dominated by neutrons in the normal phase outside 3 km, as can be seen from Fig. 30. The $K^-$ and the proton are the dominant species in the mixed phase core. Lepton populations fall rapidly, as expected, as the $K^-$ becomes dominant. However, overall, the proton population is far less than the neutron population, and there appears little justification in referring to a star with a kaon condensate as a nucleon star. Stellar sequences for several choices of the kaon potential $U_K(\rho_0)$ are shown in Fig. 31.
Naturally, the limiting mass decreases with increasing depth of the potential (for which the condensate threshold density is lower). Potentials with values only a little below $U_K(\rho_0) = -120$ MeV would not be compatible with the mass of the Hulse-Taylor pulsar, for the underlying theory of matter used here. However, we do not consider this to be very significant. The underlying theory of matter is certainly an approximation, and we are interested in stark phenomena, not in fine details. There is a mechanical instability in the Maxwell case, initiated at the central densities for which the pressure remains constant. In this case the necessary condition for stability, $dM/d\epsilon_c > 0$ (with $\epsilon_c$ the central energy density), is not satisfied. (The section of the dashed curve in Fig. 32 for which $R$ is a decreasing function of $M$ is the corresponding unstable region.) Such an unstable region is absent when the phase transition is treated using Gibbs' conditions. The mass-radius relation for several sequences is shown in Fig. 32, where comparison is made with the case corresponding to the Maxwell construction for the phase transition. It is clear that the radius and the limiting mass are sensitive functions of $U_K(\rho_0)$, the radius especially so. For the preferred value $U_K(\rho_0) = -120$ MeV, radii are similar to those of neutron stars without the condensate. There appears to be a sharp break in the behavior of $M$ vs. $R$ for $U_K(\rho_0) < -120$ MeV. However, the behavior is actually continuous but depends sensitively on $U_K(\rho_0)$: a pure kaon condensed core develops as the optical potential decreases below $\approx -120$ MeV, and this causes the change of the radius from $R \approx 12.5$ km to $R \approx 8$ km for $U_K(\rho_0) = -140$ MeV.

Fig. 30. The composition of the maximum mass neutron star with a mass of $M = 1.555\,M_\odot$. Note that while protons are the dominant species at the center of the star, overall, they are a minority population [53].
Fig. 31. The mass sequence for kaon condensed neutron stars treated by the Maxwell construction (dashed line), or so as to respect the continuity of both chemical potentials (solid line) [53].
Fig. 32. The mass-radius relation for kaon condensed neutron stars using a Maxwell construction (dashed line) or Gibbs condition (solid line) [53].

5.5.2. Geometrical structures
Neutron star matter in the normal phase is necessarily highly isospin asymmetric, since charge neutrality is imposed by the weakness of the gravitational field compared to the Coulomb force. However, since kaons are bosons, they can all occupy the zero momentum state. Consequently, when the two phases, normal and kaon condensed, are in phase equilibrium, the normal phase can come closer to isospin symmetry, as can be seen in Fig. 24. This is achieved by charge exchange as driven by the isospin restoring force, arising in part from the Fermi energies and in part from the coupling of the $\rho$ meson to the nucleon isospin. Naturally, the possibility of achieving isospin symmetry varies with the proportion of the kaon phase. Regions of normal matter will be positively charged while regions of the kaon condensed phase will be negatively charged. In this way, the overall energy is reduced by exploiting the possibility of achieving charge neutrality globally rather than locally. As was discussed in Ref. [15], regions of like charge will tend to be broken up into small regions while the surface interface energy will resist. The competition is resolved by formation of a Coulomb lattice, much as for nuclei embedded in an electron gas. The difference here is that it is two phases of nuclear matter that are involved. The rarer phase will occupy lattice sites embedded in the dominant phase. As the proportion of phases changes, the total energy consisting of volume, surface and Coulomb energies will be minimized by a sequence of geometrical forms at the lattice sites, which we idealize as drops, rods and slabs, just as for nuclear matter embedded in a background of free electrons and neutrons [48]. Relevant details of the structure calculation can be found in Refs. [17,71].

Fig. 33. Bulk energy densities of normal and kaon phases as a function of the volume fraction of the kaon phase, the surface tension which is assumed to be proportional to their difference, and the sum of Coulomb and surface energy density [53].
Fig. 34. Charge densities in the normal and kaon condensed fractions of the mixed phase in the limiting mass star of the case $U_K(\rho_0) = -120$ MeV. Volume fraction of kaon phase is also plotted [53].

In the present situation, the physical quantities that determine the geometrical structure are shown in Figs. 25 and 33. Of course, a calculation of the geometric structure, which results from a competition between Coulomb and surface energies, requires a knowledge of the surface tension $\sigma$ at the interface between the phases [51]. However, independent of its particular value, what we do know is that:
(1) The sizes, spacings and the sum of Coulomb and surface energies scale as powers of $\sigma$ (the sizes and spacings as $\sigma^{1/3}$, the energy sum as $\sigma^{2/3}$, since the surface energy density varies as $\sigma/r$ and the Coulomb energy density as $r^2$).
(2) To first approximation, the locations of the transitions from one geometric phase to another do not depend on $\sigma$.
(3) The threshold density of the mixed phase and the density at which it ends are not disturbed by any uncertainty in $\sigma$, because the sum of Coulomb and surface energies vanishes at the end points (Fig. 33; see Eq. (2) in Ref. [17]).
(4) The structured phase lies lower in energy than the unstructured one (see near the end of the Introduction of Ref. [72]). This is because the structured phase results from a degree (or degrees) of freedom, that of rearranging the concentration(s) of conserved quantities between the two phases in equilibrium, which was frozen in treatments of the phase transition in which conservation of charge was imposed as a local constraint. (The energy of a constrained physical system can never lie below that of an unconstrained one.)
For the above reasons, the dimensions shown in Fig. 35 (the size of the geometrical structures and their spacing) provide a guide, but the locations of the phases should be quite accurate.
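Points (1) and (2) can be illustrated with a minimal sketch. It assumes the standard Wigner-Seitz liquid-drop expressions for the surface and Coulomb energy densities of a cell of dimensionality $d$ ($d = 3$ drops, 2 rods, 1 slabs), $\epsilon_S = d\,x\,\sigma/r$ and $\epsilon_C = 2\pi e^2 (\Delta q)^2 r^2 x f_d(x)$, with $x$ the volume fraction of the rarer phase and $\Delta q$ the charge-density difference between the phases. These formulas, the function names and all parameter values are assumptions for illustration, not taken from Refs. [17,71]. Minimizing over $r$ gives $r^3 = d\sigma/(4\pi e^2 \Delta q^2 f_d(x))$, so $r \propto \sigma^{1/3}$ and, because $\epsilon_C = \epsilon_S/2$ at the minimum, the energy sum scales as $\sigma^{2/3}$; the $\sigma$ dependence therefore factors out of the comparison between geometries.

```python
import math
from typing import Tuple


def f_d(d: int, x: float) -> float:
    """Coulomb geometry factor for a Wigner-Seitz cell (standard liquid-drop form, assumed).

    d = 3 (drops), 2 (rods) or 1 (slabs); x = volume fraction of the rarer phase, 0 < x < 1.
    """
    if d == 2:
        return 0.25 * (math.log(1.0 / x) + x - 1.0)
    return (2.0 / (d - 2) * (1.0 - 0.5 * d * x ** (1.0 - 2.0 / d)) + x) / (d + 2)


def optimal_cell(d: int, x: float, sigma: float, dq: float,
                 e2: float = 1.0) -> Tuple[float, float, float]:
    """Minimize eps_S + eps_C over the radius r of the rarer-phase object.

    eps_S = d * x * sigma / r and eps_C = 2 * pi * e2 * dq**2 * r**2 * x * f_d(x);
    d(eps_S + eps_C)/dr = 0 gives r**3 = d * sigma / (4 * pi * e2 * dq**2 * f_d(x)).
    Returns (r, lattice spacing, eps_S + eps_C) in the units of the inputs.
    """
    fd = f_d(d, x)
    r = (d * sigma / (4.0 * math.pi * e2 * dq ** 2 * fd)) ** (1.0 / 3.0)
    spacing = 2.0 * r / x ** (1.0 / d)  # cell diameter, since x = (r / R_cell)**d
    eps_s = d * x * sigma / r
    eps_c = 2.0 * math.pi * e2 * dq ** 2 * r ** 2 * x * fd
    return r, spacing, eps_s + eps_c


def best_geometry(x: float, sigma: float, dq: float) -> int:
    """Dimensionality (3, 2 or 1) with the lowest Coulomb + surface energy density."""
    return min((3, 2, 1), key=lambda d: optimal_cell(d, x, sigma, dq)[2])


sigma, dq = 1.0, 1.0  # hypothetical values in arbitrary units
r1, s1, e1 = optimal_cell(3, 0.1, sigma, dq)
r8, s8, e8 = optimal_cell(3, 0.1, 8.0 * sigma, dq)
print(round(r8 / r1, 6), round(s8 / s1, 6), round(e8 / e1, 6))  # 2.0 2.0 4.0
# Drops, then rods, then slabs as the rarer phase grows; same sequence for 100 sigma.
print([best_geometry(x, sigma, dq) for x in (0.1, 0.3, 0.5)])          # [3, 2, 1]
print([best_geometry(x, 100.0 * sigma, dq) for x in (0.1, 0.3, 0.5)])  # [3, 2, 1]
```

In a real calculation `x` and `dq` would come from the Gibbs solution at each density (cf. Figs. 25 and 33), and for a kaon-phase fraction above one half the roles of the phases are interchanged, so `x` should then be taken as $1 - \chi$.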
The charge densities carried by the two phases and the volume fraction of the kaon phase are shown in Fig. 34 as functions of the radial coordinate in a star. Of course, the net charge vanishes in the global sense ($\int \rho_Q(x)\,dx \equiv 0$). Outside of 5 km, matter is in the pure nuclear phase and it is identically chargeless, the proton population being balanced by electrons ($\rho_Q(x) \equiv 0$). In the idealized geometry of shapes, the kaon phase will first form at the threshold density of condensation as spheres spaced far apart. As the fraction of kaon phase increases, the spacing will decrease and eventually the spheres will merge to form rods and then slabs. As the volume fraction of the kaon phase comes to dominate, slabs of normal phase will be present in a background of kaon phase, and the roles of the two phases are interchanged. The diameter and spacing of the geometrical forms of the crystal lattice are shown in Fig. 35 for the limiting mass star. Notice that the mixed phase extends in this case from the center of the star to about 4.8 km. The kaon condensed phase disappears at larger radius or lower density as the spacing of the droplets of condensed phase tends to infinity. The locations of the boundaries of the various phases can be seen in Fig. 36 for stars of various masses. These are rather remarkable properties of the mixed phase, which in the limiting mass star occupies the inner 5 km. It is filled with geometrical forms of varying shapes and spacings, according to depth in the star. The charge densities within the geometrical objects and in the background phase are opposite in sign and vary in magnitude with depth. Finally, the effective mass of the nucleons is radically different in the two phases, as can be seen in Fig. 37. All of these features must have their effect on transport properties and possibly on glitch phenomena.
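Global neutrality also fixes how much of each phase is present once the charge densities of the two phases are known: averaged over a lattice cell, $(1-\chi)\,q_{\rm I} + \chi\,q_{\rm II} = 0$, which gives $\chi = q_{\rm I}/(q_{\rm I} - q_{\rm II})$. The sketch below solves this relation and reproduces the limits noted earlier, $\chi = 0$ when the normal phase is neutral and $\chi = 1$ when the condensed phase is. The function names and charge densities (arbitrary units) are hypothetical; in the full calculation $q_{\rm I}$ and $q_{\rm II}$ themselves depend on the chemical potentials, so this relation is only the last step of a self-consistent solution.

```python
def kaon_volume_fraction(q_normal: float, q_kaon: float) -> float:
    """Volume fraction chi of the kaon condensed phase that makes the mixed phase
    neutral on average: (1 - chi) * q_normal + chi * q_kaon = 0.

    q_normal, q_kaon: charge densities of the normal and condensed phases
    (any consistent units). They must have opposite signs, or one must vanish.
    """
    if q_normal == q_kaon:
        raise ValueError("equal charge densities: no unique volume fraction")
    chi = q_normal / (q_normal - q_kaon)
    if not 0.0 <= chi <= 1.0:
        raise ValueError("charge densities of the same sign cannot neutralize each other")
    return chi


def net_charge_density(chi: float, q_normal: float, q_kaon: float) -> float:
    """Volume-weighted (cell-averaged) charge density of the mixed phase."""
    return (1.0 - chi) * q_normal + chi * q_kaon


# Illustrative values: positively charged normal phase, negatively charged kaon phase.
chi = kaon_volume_fraction(0.25, -0.75)
print(chi)                                    # 0.25
print(net_charge_density(chi, 0.25, -0.75))   # 0.0
print(kaon_volume_fraction(0.0, -0.5))        # 0.0 (neutral normal phase fills all space)
```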
We illustrate features of the geometrical structure for the case $U_K(\rho_0) = -120$ MeV from this point on.
Fig. 35. Diameter D (lower curve) of objects (drops, rods, slabs) of the rarer phase immersed in the dominant phase, located at lattice sites spaced S apart (upper curve) [53]. Fig. 36. Radial boundaries between phases are shown for a range of stellar masses [53].
Fig. 37. Nucleon effective mass in the normal and kaon condensed phase as a function of radial location in the limiting mass star [53].
6. Possible consequences of geometrical phases
There are several possible consequences of geometrical structure that were mentioned in our original work: a variation of pulsar glitches from pulsar to pulsar according to the mass, and a change in all transport properties introduced by the presence of a lattice [15]. We discuss these in turn.

6.1. Pulsar glitches
Glitches are thought to correspond to changes induced in the moment of inertia of the star, either by a crack in the crustal region or by a massive number of superfluid vortex lines undergoing shifts in the location of the sites in the solid regions to which they are pinned [73]. The relocation of vortex lines, or the crustquake, whichever is the case, occurs unpredictably as the instantaneous configuration carrying the angular momentum comes out of equilibrium with the decreasing spin of the star and creates stresses that are relieved by massive unpinning or cracking of the crust. The thin crust is a location at which the vortex lines can be pinned. But in the present model, the vortex lines do not thread through the entire star, pinned at each end on the crust, but are pinned at one end on the interior crystalline mixed phase. The extent of this region varies sensitively with the mass of the star, perhaps accounting for the wide variety in glitch phenomena observed in different pulsars. Unpinning of vortex lines from sites in a solid region, or a crustquake, may trigger changes in the interior crystalline region. In either case, the thickness of the inner crystalline region, whether it occupies a central spherical region or a shell some kilometers thick, depends very sensitively on the mass of the star (compare Figs. 3 and 4). These are speculative thoughts on a possible effect of an interior solid region, which could nonetheless lie behind the large degree of individuality with which pulsars glitch [74]. They certainly need, and possibly deserve, detailed investigation.

6.2. Neutrino transport
Neutrino diffusion is the only transport property for which the effects of structure in a mixed phase have been investigated so far [16]. Indeed, the problem is quite difficult to solve. Reddy et al. [16] have nevertheless been able to demonstrate its importance. They calculate the scattering of neutrinos from droplets, including droplet-droplet correlations, but neglecting the coherent and Bragg scattering that would occur in an ordered lattice such as those computed in this paper. Their results are therefore appropriate for the early stage of neutron star birth, when the temperature is above the melting point of the lattice. They also treat only droplets and not the other geometrical forms that would likely appear in different regions of the star according to the monotonic pressure or density changes. (See Figs. 18 and 35 for example.) The above authors study both the quark deconfinement and the kaon condensate phase transitions. Their results show a more than hundred-fold increase in the neutrino-droplet cross-section as compared to uniform matter at a temperature of $T = 10$ MeV and a baryon density of 0.7 fm$^{-3}$, as is seen in Fig. 38. The mean free path of neutrinos is thus dramatically reduced by the structured mixed phase. How precisely this translates into the brief epoch over which most neutrinos diffuse from the star, reducing its temperature from tens of MeV to less than an MeV, is not known. The neutrino pulse from a newly born star is usually quoted as being about 20 s in duration [1]. Clearly the effects of structure on neutrino diffusion and on other properties such as electrical conductivity are interesting and likely to be large.

Fig. 38. Scattering cross-section of neutrinos from quark droplets in a sea of neutron star matter (solid line), as compared with the cross-section in uniform matter (dashed line). Notice that the latter cross-section is multiplied by a factor 50. (Courtesy of S. Reddy as adapted from Ref. [16].)
7. Summary
This paper presents a detailed study of the crystalline structure that is expected to occur in the coexistence phase of hadronic matter and any of its high-density phases, such as quark matter or a kaon condensate, in any fully equilibrated system such as a degenerate star. Indeed, spatial structure such as we have described will occur in the mixed-phase region of any equilibrated system that has more than one conserved charge (or independent component), of which one is the electric charge (giving rise to a long-range interaction). The physical reasons underlying the structured phase in neutron stars were explained. For hypothetical compact hybrid stars, that is, stars with a neutron star exterior and a mixed-phase interior, we have computed the varying geometrical structures and the radial extent that they occupy as a function of stellar mass in two models that span the range of uncertainty in nuclear matter properties. (Nonetheless there is uncertainty in any detailed aspect of the models, because models of high-density matter are uncertain and the properties of dense matter unknown, save for several general principles: causality, microscopic stability (Le Chatelier's principle) and asymptotic freedom.) We find extreme sensitivity of the spatial structure to stellar mass. The pattern is the following: for stars near the limiting mass, the interior few kilometers may be occupied by the pure high-density phase, surrounded by a crystalline region a few kilometers thick having all geometrical forms, from hadronic drops immersed at lattice sites in the high-density phase at the inner edge of the mixed phase, to drops of the high-density phase immersed in nuclear matter at the outer edge, with the other forms, rods and slabs, in between. Exterior to the crystalline phases is the liquid phase of nuclear matter, which in turn is enclosed by a thin metallic crust.
For stars only slightly less massive, the pure high-density phase is absent and the crystalline mixed phase extends to the center of the star. The radial extent of the crystalline structure shrinks with decreasing mass until the star is purely a neutron star. However, according to our calculations, the mixed phase is absent only for very low-mass stars, below about $M_\odot$. Such stars appear not to exist [75], and for good reason [22]. For a collapsing core to evade formation of a black hole, it must eject the infalling outer shells of the presupernova star. The source of energy is the binding of the newly forming compact star, and the energy is transmitted very inefficiently to the infalling matter by neutrinos. But the binding energy decreases very strongly with decreasing mass [22, Fig. 3.12]. Sufficient energy for ejection is therefore available only for a narrow range of cores at and immediately below the limiting mass star.
It is almost certain that a solid region in a pulsar will play a role in the period glitch phenomenon, which is highly individualistic from one pulsar to another. This is expected for at least two reasons. All solid regions will be subject to stresses as a pulsar spins down and the centrifugal force decreases. Solid regions alone can be the site of episodic relief of stresses that cause sudden changes in the moment of inertia and hence 'glitches' in the rotational frequency. Previously the only expected region of solid material was the thin crust. Here we also find a deep interior solid region a few kilometers thick, which varies sensitively from one star to another according to the stellar mass. Moreover, we can anticipate that the region occupied by the inner solid region and the boundaries between the different spatial geometries will vary through the life of a given pulsar as it spins down and its density profile changes. Furthermore, since the compensating charge densities of both phases in equilibrium are nonzero and vary with the proportion of the phases, and therefore with depth in the star, the superfluid properties will be different from those expected for a charge-neutral hadronic fluid. We tentatively suggest that the individualistic behavior of pulsars with respect to their glitch characteristics arises from the extreme sensitivity of the crystalline structure to stellar mass, both because it is a solid and because of the different superconducting properties expected for the coexistence phase.
In addition, the dependence of glitch activity on pulsar age [76] may be understood in terms of the evolution of the crystalline region with changing angular velocity [6, Fig. 1]. These are speculative associations between pulsar glitch behavior and the interior structure which seem plausible and worth pursuing.
Acknowledgements
This work was supported by the Director, Office of Energy Research, Office of High Energy and Nuclear Physics, Division of Nuclear Physics, of the US Department of Energy under Contract DE-AC03-76SF00098. I am indebted to J. Schaffner-Bielich for permission to quote extensively from our joint work on kaon condensation [53].
N.K. Glendenning / Physics Reports 342 (2001) 393-447
References
[1] A. Burrows, J.M. Lattimer, Astrophys. J. 307 (1986) 178.
[2] S.A. Colgate, R.W. White, Astrophys. J. 143 (1966) 626.
[3] S.A. Colgate, in: Wheeler et al. (Eds.), Supernovae, World Scientific, Singapore, 1990, p. 249.
[4] R. Wijnands, M. van der Klis, Nature 394 (1998) 344.
[5] D. Chakrabarty, E.H. Morgan, Nature 394 (1998) 346.
[6] N.K. Glendenning, S. Pei, F. Weber, Phys. Rev. Lett. 79 (1997) 1603.
[7] N.K. Glendenning, Pulsar signal of deconfinement, Plenary talk at International Conference on Ultra-Relativistic Nucleus-Nucleus Collisions, Quark Matter, Vol. 97, Japan; Nucl. Phys. A 638 (1998) 239c.
[8] M.C. Miller, F.K. Lamb, D. Psaltis, Astrophys. J. 508 (1998) 791.
[9] D. Lai, R. Lovelace, I. Wasserman, preprint: astro-ph/9904111, 9 April 1999.
[10] R.H. Fowler, Mon. Not. R. Astron. Soc. 87 (1926) 114.
[11] S. Chandrasekhar, Astrophys. J. 74 (1931) 81.
[12] S. Chandrasekhar, Mon. Not. R. Astron. Soc. 95 (1935) 207.
[13] W. Baade, F. Zwicky, Phys. Rev. 45 (1934) 138.
[14] N.K. Glendenning, Nucl. Phys. B (Proc. Suppl.) 24B (1991) 110.
[15] N.K. Glendenning, Phys. Rev. D 46 (1992) 1274.
[16] S. Reddy, G. Bertsch, M. Prakash, Phys. Lett. B 475 (2000) 1.
[17] N.K. Glendenning, S. Pei, Phys. Rev. C 52 (1995) 2250.
[18] W. Keil, H.T. Janka, Astron. Astrophys. 296 (1995) 145.
[19] N.K. Glendenning, Astrophys. J. 448 (1995) 797.
[20] M. Prakash, J.R. Cooke, J.M. Lattimer, Phys. Rev. D 52 (1995) 661.
[21] N.K. Glendenning, Astrophys. J. 293 (1985) 470.
[22] N.K. Glendenning, COMPACT STARS, Nuclear Physics, Particle Physics, and General Relativity, Springer, New York, 1st Ed. 1997, 2nd Ed. 2000.
[23] N.K. Glendenning, S.A. Moszkowski, Phys. Rev. Lett. 67 (1991) 2414.
[24] O.V. Maxwell, Astrophys. J. 316 (1987) 691.
[25] M. Prakash, M. Prakash, J.M. Lattimer, C.J. Pethick, Astrophys. J. 390 (1992) L77.
[26] V. Thorsson, M. Prakash, J.M. Lattimer, Nucl. Phys. A 572 (1994) 693.
[27] G.E. Brown, J.C. Weingartner, Astrophys. J. 436 (1994) 843.
[28] I.B. Khriplovich, Yad. Fiz. 10 (1969) 409; D.J. Gross, F. Wilczek, Phys. Rev. Lett. 30 (1973) 1343; H.D. Politzer, Phys. Rev. Lett. 30 (1973) 1346.
[29] A. Chodos, R.L. Jaffe, K. Johnson, C.B. Thorne, V.F. Weisskopf, Phys. Rev. D 9 (1974) 3471.
[30] H.J. Pirner, Prog. Part. Nucl. Phys. 29 (1992) 33; M.C. Birse, Prog. Part. Nucl. Phys. 25 (1990) 1; J.A. McGovern, Nucl. Phys. A 552 (1991) 553.
[31] G. Baym, S. Chin, Phys. Lett. 62B (1976) 241.
[32] G. Chapline, M. Nauenberg, Nature 264 (1976) 235.
[33] B.D. Keister, L.S. Kisslinger, Phys. Lett. 64B (1976) 117.
[34] M.B. Kislinger, P.D. Morley, Astrophys. J. 219 (1978) 1017.
[35] J.L. Friedman, B.F. Schutz, Astrophys. J. 222 (1978) 281.
[36] W.B. Fechner, P.C. Joss, Nature 274 (1978) 347.
[37] A.G. Lyne, F. Graham-Smith, Pulsar Astronomy, Cambridge University Press, Cambridge, 1990.
[38] N.K. Glendenning, A crystalline quark-hadron mixed phase in neutron stars, Phys. Rep. 264 (1995) 143.
[39] H. Heiselberg, C.J. Pethick, E.F. Staubo, Phys. Rev. Lett. 70 (1993) 1355.
[40] V.R. Pandharipande, E.F. Staubo, in: B. Sinha, Y.P. Viyogi, S. Raha (Eds.), Proceedings of the Second International Conference of Physics and Astrophysics of Quark-Gluon Plasma, Calcutta, 1993, World Scientific, Singapore, 1994.
[41] N.K. Glendenning, Phys. Lett. 114B (1982) 392.
[42] J. Boguta, A.R. Bodmer, Nucl. Phys. A 292 (1977) 413.
[43] B.D. Serot, H. Uechi, Ann. Phys. (New York) 179 (1987) 272.
[44] J.I. Kapusta, K.A. Olive, Phys. Rev. Lett. 64 (1990) 13.
[45] J. Ellis, J.I. Kapusta, K.A. Olive, Nucl. Phys. B 348 (1991) 345.
[46] E. Friedman, C.J. Batty, A. Gal, Phys. Rep. 287 (1997) 431.
[47] D.Q. Lamb, J.M. Lattimer, C.J. Pethick, D.G. Ravenhall, Nucl. Phys. A 360 (1981) 459.
[48] D.G. Ravenhall, C.J. Pethick, J.R. Wilson, Phys. Rev. Lett. 50 (1983) 2066.
[49] R.D. Williams, S.E. Koonin, Nucl. Phys. A 435 (1985) 844.
[50] W.D. Myers, W.J. Swiatecki, C.S. Wang, Nucl. Phys. A 436 (1985) 185.
[51] M. Christiansen, N.K. Glendenning, J. Schaffner-Bielich, Phys. Rev. C 62 (2000) 025804.
[52] J. Schaffner, I.N. Mishustin, Phys. Rev. C 53 (1996) 1416.
[53] N.K. Glendenning, J. Schaffner-Bielich, Phys. Rev. C 60 (1999) 025803.
[54] A. Faessler, A.J. Buchmann, M.I. Krivoruchenko, Phys. Lett. B 391 (1997) 255.
[55] N.K. Glendenning, J. Schaffner-Bielich, Phys. Rev. C 58 (1998) 1298.
[56] D.B. Kaplan, A. Nelson, Phys. Lett. 175B (1986) 57.
[57] M. Prakash, I. Bombaci, M. Prakash, P.J. Ellis, J.M. Lattimer, R. Knorren, Phys. Rep. 280 (1997) 1.
[58] V. Koch, Phys. Lett. B 337 (1994) 7.
[59] T. Waas, W. Weise, Nucl. Phys. A 625 (1997) 287.
[60] E. Friedman, A. Gal, C.J. Batty, Nucl. Phys. A 579 (1994) 518.
[61] W.H. Dickhoff, A. Faessler, H. Müther, S.S. Wu, Nucl. Phys. A 405 (1983) 534.
[62] M. Lutz, Phys. Lett. B 426 (1998) 12.
[63] J. Schaffner-Bielich, I.N. Mishustin, J. Bondorf, Nucl. Phys. A 625 (1997) 325.
[64] H. Muller, B. Serot, Phys. Rev. C 52 (1995) 2072.
[65] G.E. Brown, C.-H. Lee, M. Rho, V. Thorsson, Nucl. Phys. A 567 (1994) 937.
[66] F. Fujii, T. Maruyama, T. Muto, T. Tatsumi, Nucl. Phys. A 597 (1996) 645.
[67] P.J. Ellis, R. Knorren, M. Prakash, Phys. Lett. B 349 (1995) 11.
[68] R. Knorren, M. Prakash, P.J. Ellis, Phys. Rev. C 52 (1995) 3470.
[69] J. Schaffner, C. Greiner, H. Stocker, Phys. Rev. C 46 (1992) 322.
[70] N.K. Glendenning, Z. Phys. A 327 (1987) 295.
[71] N.K. Glendenning, S. Pei, Eugene Wigner Memorial Issue of Heavy Ion Physics, Budapest, Vol. 1, 1995, p. 1.
[72] M.B. Christiansen, N.K. Glendenning, Phys. Rev. C 56 (1997) 2858.
[73] M.A. Alpar, H.F. Chau, K.S. Cheng, D. Pines, Astrophys. J. 459 (1996) 706.
[74] R.N. Manchester, in: D. Pines, R. Tamagaki, S. Tsuruta (Eds.), The Structure and Evolution of Neutron Stars, Addison-Wesley, Redwood City, CA, 1992.
[75] S.E. Thorsett, Z. Arzoumanian, M.M. McKinnon, J.H. Taylor, Astrophys. J. 405 (1993) L29.
[76] J. McKenna, A.G. Lyne, Nature 343 (1990) 349.
CONTENTS VOLUME 342
A.K. Chakraborty. Disordered heteropolymers: models for biomimetic polymers and polymers with frustrating quenched disorder (p. 1)
M.I. Eides, H. Grotch, V.A. Shelyuto. Theory of light hydrogenlike atoms (p. 63)
G. Györgyi. Techniques of replica symmetry breaking and the storage problem of the McCulloch-Pitts neuron (p. 263)
N.K. Glendenning. Phase transitions and crystalline structures in neutron star cores (p. 393)
PII: S0370-1573(01)00005-9
FORTHCOMING ISSUES
T. Renger, V. May, O. Kühn. Ultrafast excitation energy transfer dynamics in photosynthetic pigment-protein complexes
G.S. Bali. QCD forces and heavy quark bound states
G. Hackenbroich. Phase coherent transmission through interacting mesoscopic systems
P. Binétruy, G. Girardi, R. Grimm. Supergravity couplings: a geometric formulation
J. von Delft, D.C. Ralph. Spectroscopy of discrete energy levels in ultrasmall metallic grains
F. Ehlotzky. Atomic phenomena in bichromatic laser fields
M. Chaichian, W.F. Chen, C. Montonen. New superconformal field theories in four dimensions and N=1 duality
J. Manjavidze, A. Sissakian. Very high multiplicity hadron processes
R. Merkel. Force spectroscopy on single passive biomolecules and single biomolecular bonds
M. Krawczyk, A. Zembrzuski, M. Staszel. Survey of present data on photon structure functions and resolved photon processes
K. Michielsen, H. De Raedt. Integral-geometry morphological image analysis
E. Nielsen, D.V. Fedorov, A.S. Jensen, E. Garrido. The three-body problem with short-range interactions
C. Song. Dense nuclear matter: Landau Fermi-liquid theory and chiral Lagrangian with scaling
D.R. Grasso, H.R. Rubinstein. Magnetic fields in the early Universe
E.L. Nagaev. Colossal-magnetoresistance materials: manganites and conventional ferromagnetic semiconductors
D. Atwood, S. Bar-Shalom, G. Eilam, A. Soni. CP violation in top physics
V.K.B. Kota. Embedded random matrix ensembles for complexity and chaos in finite interacting particle systems
V.M. Loktev, R.M. Quick, S. Sharapov. Phase fluctuations and pseudogap phenomena
T. Nakayama, K. Yakubo. The forced oscillator method: eigenvalue analysis and computing linear response functions
S.O. Demokritov, B. Hillebrands, A.N. Slavin. Brillouin light scattering studies of confined spin waves: linear and nonlinear confinement
I.M. Dremin, J.W. Gary. Hadron multiplicities
PII: S0370-1573(01)00006-0