Equilibrium Analysis of Complex Systems Lecture Notes of 7CCMCS03 R K¨ uhn
King’s College London Jan 2017
Contents 1 ...
30 downloads
770 Views
2MB Size
Report
This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
Report copyright / DMCA form
Equilibrium Analysis of Complex Systems Lecture Notes of 7CCMCS03 R K¨ uhn
King’s College London Jan 2017
Contents 1 Introduction and Motivation
7
1.1
Motivation & Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
1.2
The issue of methods & tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9
1.3
Example: the Ising model in some of its many guises . . . . . . . . . . . . . . . .
11
2 Statistical Mechanics
16
2.1
Fundamental Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16
2.2
Extremising Properties of Entropy and Free Energy . . . . . . . . . . . . . . . .
20
2.3
Systems with Frozen Degrees of Freedom: Disorder & Self-Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
23
2.3.1
High Temperatures — Annealed Disorder . . . . . . . . . . . . . . . . . .
23
2.3.2
Low Temperatures — Quenched Disorder . . . . . . . . . . . . . . . . . .
24
2.3.3
Self-Averaging — Quenched Free Energy . . . . . . . . . . . . . . . . . .
25
2.3.4
A Note on Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
2.3.5
The Annealed Approximation . . . . . . . . . . . . . . . . . . . . . . . . .
29
3 Dynamics and Equilibrium
30
3.1
Describing Stochastic Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . .
30
3.2
Uniqueness of Equilibrium Distributions . . . . . . . . . . . . . . . . . . . . . . .
31
3.3
Glauber Dynamics for Ising Systems . . . . . . . . . . . . . . . . . . . . . . . . .
32
3.4
The Metropolis Monte-Carlo Dynamics Algorithm . . . . . . . . . . . . . . . . .
35
3.5
Langevin & Fokker Planck Equations and Equilibrium . . . . . . . . . . . . . . .
36
1
4 Asymptotic Methods 4.1
38
Laplace and Saddle Point Methods . . . . . . . . . . . . . . . . . . . . . . . . . .
38
4.1.1
The Laplace Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
38
4.1.2
The Saddle Point Method . . . . . . . . . . . . . . . . . . . . . . . . . . .
40
Curie-Weiss Ferromagnet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
41
4.2.1
First Method: Combinatorics and Laplace Method . . . . . . . . . . . . .
42
4.2.2
Second Method: Gaussian Linearization and Laplace Method . . . . . . .
43
4.2.3
Third Method: δ-Functions and Saddle-Point Method . . . . . . . . . . .
44
4.2.4
Equivalence of the Three Methods . . . . . . . . . . . . . . . . . . . . . .
45
4.2.5
The Physics of the Curie-Weiss Model . . . . . . . . . . . . . . . . . . . .
46
4.3
The Mean-Field Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
51
4.4
Variational Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
53
4.5
Variational Mean Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
54
4.2
5 System with Finite Dimensional Site Disorder 5.1
57
General Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
58
5.1.1
61
Neural Networks – The Hopfield Model . . . . . . . . . . . . . . . . . . .
6 A Model for Operational Risk
67
6.1
Set-Up of the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
68
6.2
A Statistical Mechanics Approach
. . . . . . . . . . . . . . . . . . . . . . . . . .
69
6.2.1
Replica Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
70
6.2.2
Replica Symmetric Solution . . . . . . . . . . . . . . . . . . . . . . . . . .
74
6.2.3
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
76
7 Random Matrix Theory
80
7.1
Eigenvalues and the Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . .
80
7.2
The Random Matrix Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
81
7.2.1
Representation of the δ-function . . . . . . . . . . . . . . . . . . . . . . .
82
Cavity Approach to Sparse Matrices . . . . . . . . . . . . . . . . . . . . . . . . .
83
7.3.1
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
86
7.3.2
The Dense Limit and Wigner’s Semi-Circle . . . . . . . . . . . . . . . . .
86
7.3
2
8 Dynamics, Optimization and Statistical Mechanics 8.1
8.2
92
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
92
8.1.1
Example: Travelling Salesman Problem (TSP)
. . . . . . . . . . . . . . .
92
8.1.2
Example: Coding and Decoding . . . . . . . . . . . . . . . . . . . . . . . .
93
8.1.3
The Role of Statistical Mechanics . . . . . . . . . . . . . . . . . . . . . . .
93
Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
94
9 Transfer Matrices
99
9.1
One-Dimensional Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
9.2
Quasi One-Dimensional Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
9.3
Spin Chains in a Random Field . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 9.3.1
9.4
99
Thermodynamics of the Random Field Chain . . . . . . . . . . . . . . . . 105
Double-Stranded DNA as an Application . . . . . . . . . . . . . . . . . . . . . . 107
3
Aim of the Course The title of this course is Equilibrium Analysis of Complex Systems. • It deals with Complex Systems, i.e. systems which may – have many degrees of freedom, e.g. ∗ many particles (their positions, velocities and possibly internal degrees of freedom, such as spin), ∗ firing states of a population of neurons, ∗ attitudes or opinions of a large group of people, ∗ performance characteristics in a network of processes, ∗ many and diverse financial positions ∗ the actions of a large group people (agents) ∗ the demands on food resources of many species ∗ consist of large or heterogeneous data sets (many signals/complicated signals),
– which interact in complicated ways (short & long range, non-geometric, solely determined by function, cooperative, competitive, random strengths) – for which the dynamics often involves stochasticity. – for which the number of parameters needed to fully describe them grows with system size. • It deals with with their Analysis, i.e. we will explain concepts and methods, which allow to – derive macroscopic properties (qualitative and quantitative) of large systems, using ∗ properties of their microscopic constituents, ∗ the laws governing their interaction, ∗ an parameters characteristic of their environment (temperature, pressure, forces, fields, noise level, . . . ) – predict parameters at which macroscopic properties undergo dramatic changes although parameters are only changed infinitesimally (collective phenomena, phase transitions, emergent behaviour) ∗ feromagnetism below Tc , ∗ functional memory in neural networks below critical (noise-dependent) loading level, but not above ∗ systemic breakdown of process networks at critical values of mutual dependencies, ∗ adoption of new technologies below critical value of acquisition costs, 4
• It uses concepts of Equilibrium Statistical Physics. – They are very powerful, very diverse, and often better understood and simpler to use than proper dynamical methods for complex systems. – Despite the fact that many artificial systems do not exhibit (thermodynamic) equilibrium, such methods can still provide fairly accurate descriptions of statistically stationary behaviour of complex systems. – They have proven to be very powerful in generating non-trivial predictions and intuition about qualitative properties and property changes.
The Modelling Cycle The problem/challenge: (i) Nature doesn’t tell you which equations (which model!) you should use to describe it. You have to find them. (ii) Nature is very complicated; you often concentrate on a subset of phenomena that a system may exhibit, and simplifying assumptions often have to be made to make analytical progress. How far can you go, and how do you decide whether a model is appropriate? There is nothing automatic about this process. Modelling is the art of science! • Main guiding principles: – A good theoretical model of a complex system should be like a good caricature. It should emphasize those features that are most important, while downplaying the inessential details. (Y. Frenkel) – Everything should be made as simple as possible, but not simpler. (A. Einstein) – The problem with this guidance: we do not know what the most important or the inessential features are, before we have understood the system in question. • ⇒ Go through modelling cycle – Start with a complex system to describe – construct a mathematical model (a set of equations) that attempt to describe the system – solve the model/the equations – use the solution of the model to make predictions about the system behaviour – compare predictions with data and known facts about the system – if necessary improve/redefine the model and continue . . .
5
Problem Data
(re−)define
compare
Mathematical
Predictions of the model
interpret solutions
Model
solve Solution of Equations
Figure 1: Modelling Cycle.
6
Chapter 1
Introduction and Motivation 1.1
Motivation & Scope
Why should we be interested in complex systems? • Complex systems are all around us: – Most industrially relevant materials (steel, alloys, ceramics, (hetero-)polymers) – Power supply systems, transport infrastructure, IT and communication systems – DNA, chemical reaction networks in living cells, protein-protein interaction networks, gene regulation networks, cell-signalling networks, neural networks, – networks of financial positions (credits, investments, financial obligations, options futures), factories (≃ process networks), trade networks, social networks – large heterogeneous data sets generated by human activity (internet use, in science, ...) • Modelling them, analyzing them, and understanding them is key to improving many dimensions of human existence (driving developments rather than being driven by them) – biochmedical research tries to understand cancer in terms of malfunctions of gene regulation networks, which induce changes in protein-protein interaction networks – financial regulators just beginning to understand contagion in networks of financial positions, and improve regulations on the basis of these insights – interet giants such as Amazon use advaned machine learning techniques to analyze e-activity data (twitter, facebook, buying decisions) to analyze consumer behaviour, and design optimal (viral) marketing campaigns 7
Sources and examples of complexity • Complexity from two different sources – nonlinear dynamics, chaos (sensitive dependence on initial conditions), relevant already in small systems – large numbers of interacting degrees of freedom (⇒ emergent collective behaviour, phase transitions separating different types of collective behaviour) & interactions without regularity, dynamical elements receiving conflicting ordering instructions (⇒ slow ‘glassy’ dynamics) • Here almost exclusively of 2nd type. Main tools of analysis: statistical physics, and in particular statistical physics of complex and disordered systems; additional complexity coming from adding first feature to large system is typically small, statistical tools remain valid. • Specifically: heterogeneous systems in physics – due to composition (non-equivalent degrees of freedom/particles) – due to irregular structures/disorder (glassy, amorphous) – due to irregular patterns of interaction – conflicting ordering instructions e.g. from mixture of ferromagnetic and antiferromagnetic interactions • Beyond physics – Biology: gene expression networks & stable cell types, neural networks & associative memory, formation of protein-protein complexes or complexes of proteins and small molecules (myoglobin-CO), DNA unzipping . . . – Economy: choice of optimal investment portfolios composed of correlated assets, strategy evolution (minority game/El Farrol bar problem), understanding fluctuations of price changes in interacting markets, optimization (travelling salesman, task allocation, chip-design), operational risk, credit risk – Technology: IT systems, black-out risk in power grids, routing internet traffic, smart grids, cloud computing, internet of things . . . – Social sciences: dynamics of opinion formation, adoption of new technologies, collective information processing through search-engines, social networks (search ↔ pagerank ↔ information processed as result of search ↔ evolution of search behaviour)
8
1.2
The issue of methods & tools
Traditional methods (condensed matter physics) • Standard textbooks on condensed matter theory deal with ideal structures – ideal crystals, their structure analysis using X-rays . . . , – free (non-interacting) electrons, – periodic potentials, band structure, – elementary excitations: lattice vibrations (phonons), magnons, plasmons . . . , – interactions of elementary excitations (electron-phonon ⇒ superconductivity) – collective phenomena (magnetism, superconductivity, . . . ) in pure systems.
• All topics still active areas of research, many not completely understood. • But – Ideal crystals don’t exist in the real world (impurities, imperfections of structure), many technically important materials are not crystalline at all (glass, rubber, wood, ceramics, even steel) – fundamentally: heterogeneous compositions, many kinds of different constituents, interactions between them irregular, competition and disorder – on positive side: heterogeneity can be used to create specific properties (steel, ceramics, polymers) – get much richer set of properties (memory, system states not uniquely determined by external parameters such as pressure, temperature, and electric/magnetic fields). Theoretical implication • Many techniques that rely on ideality (perfect periodicity, perfect homogeneity) fail for most real world materials. • Need new set of concepts, tools & techniques to describe the new set of phenomena. • Have to worry about composition dependence & stability of macroscopic properties: typical vs. average properties, various forms of average, and tools to compute them.
9
Complex Systems • The new techniques can be used to analyze other (non-physics) situations with heterogeneity, disorder, and competition (biology, information theory, social science, economics; see the ‘beyond physics’ list above for a list of more specific examples) • Application of physics tools outside physics has triggered new methodological developements, which have been reflected back to make advances in physics. • In complex systems outside physics, the separation between equilibrium and nonequilibrium approaches is less meaningful than in physics, as most non-physics complex systems are inherently out of equilibrium. • That said: there is a role for analysing stationary (long-lived) collective states of complex systems using tools of equilibrium statistical physics, as this type of analysis is often significantly easier than that of non-equilibrium aspects, and the available tool-kit much better developed. At the same time, equilibrium analyses can still deliver useful insights into complex systems behaviour.
=⇒
Equilibrium Analysis of Complex Systems
10
1.3
Example: the Ising model in some of its many guises
The Ising Model and Physical Systems: Classical Magnetism • classical spins Si , i = 1, . . . , N each in two possible states Si ∈ {±1} (up/down) • asynchronous dynamics (at any time step a randomly selected spin is updated according to) ! N X Si (t + ∆t) = sign (1.1) Jij Sj (t) + hi + ξit j=1
with Jij (exchange) interactions, Jij > 0 ferromagnetic, Jij < 0 anti-ferromagnetic, Jij = 0 no interaction between i and j; hi (external) magnetic field, possibly site-dependent; ξit random noise, independent in space (i) and time (t). • Energy/Lyapunov function: for symmetric couplings (Jij = Jji and Jii = 0, the noiseless (ξit ≡ 0) dynamics (1.1) is governed by a Lyapunov function H(S) = −
X 1X Jij Si Sj − hi Si , 2 i,j
(1.2)
i
meaning that H is a non-increasing function of time under the noiseless dynamics (1.1), system will thus evolve towards (local) minima of H(S) (‘downhill’). • Thermal equilibrium: If the noise is thermal with probability density function (pdf) d tanh(βξ) — for a justification of this form of the noise-pdf and a precise ρβ (ξ) = 12 dξ characterization of thermal equilibrium see Ch. 3 below — the system approaches thermal equilibrium with a Gibbs-Boltzmann equilibrium distribution for the system states S = (Si ), 1 1 (1.3) exp[−βH(S)] , β= pβ (S) = Zβ kB T • Quantity of interest: e.g. macroscopic magnetization m(t) =
N 1 X Si (t) N i=1
→
m=
N 1 X hSi i , as N i=1
t→∞,
(1.4)
where h. . .i in the second expression denotes an average over the Gibbs-Boltzmann distribution. It is asymptotically correct in the thermodynamic limit N → ∞ by the Law of Large Numbers (LLN).1 1
Note that the t → ∞ limit is the strict formal condition. What is actually needed is ‘equlilibration’, which in many physical system can be acchieved in times as short as pico-seconds!
11
• Ferromagnet on a regular lattice: Jij = J for (i, j) n.n., and Jij = 0 otherwise. Then with hi ≡ 0, m(t) → m0 sign(m(0)), as t → ∞, with the spontaneous magnetization m0 depending on thermal noise level (temperature). There is a critical temparature Tc , s.t. m0 (T ) = 0 for T ≥ Tc , and m0 (T ) > 0 for T < Tc (See Fig. 1.1). Behaviour of magnetization and other thermodynamic functions singular at Tc (critical phenomena). (Nb.: equivalent to antiferromagnetism on a bi-partite lattice). 1
m(T)
0.8 0.6 0.4 0.2 0 0
0.2
0.4
0.6
0.8
1
1.2
T
Figure 1.1: Qualitative behaviour of the temperature dependent magnetization in a ferro-magenet, without an external magnetic field, showing a phase-transition at Tc = 1. • Disordered Magnets: mixture of ferromagnetic and antiferromagnetic bonds: get (anti) ferromagnetism if (overwhelming) majority of bonds are (anti) ferromagnetic, and spinglass order hSi i = 6 0,
but
m(t)
→
m=
N 1 X hSi i = 0 , as N i=1
t→∞
(1.5)
at low temperatures in intermediate concentration regime. Main issut here: get conflicting ordering instructions, and thus frucstrated(!) spins (e.g. in a triangle of antiferromagnetic bonds). With many frustrated loops (see Fig. 1.2), spins get frozen in random orientations at low temperatures, and there will generally be many metastable states (O(2αN ) with 0 < α < 1). Order parameter to distinguish between high and low temperature phase in this case is the Edwards Anderson order parameter, q=
N 1 X hSi i2 , N
(1.6)
i=1
Such systems, too exhibit phase transitions in the sense that q = 0 at high temperatures but q > 0 for T < TSG , the transition temperature being typically referred to as the spin-glass temperature (sometimes the freezing temperature). 12
J<0 ?
J>0
J>0
J>0
Figure 1.2: Loop of frustrated spins. A loop γ is frustrated iff
Q
(i,j)∈γ
sign Jij < 0.
The Ising Model and Complex Systems Outside Physics • Neural networks: two state Mc Culloch-Pitts neurons Si ∈ {±1} (firing/non-firing). Neurons evaluate their post-synaptic potential ui (t) =
N X
Jij Sj (t) ,
(1.7)
j=1
and fire in the next time step, if the potential is above a threshold ϑi , and remain quiescent otherwise. Jij synaptic efficacies: Jij > 0 (excitatory), Jij < 0 (inhibitory). Dynamics including stochasticity (thermal noise or other) ! N X Jij Sj (t) − ϑi + ξit Si (t + ∆t) = sign (1.8) j=1
is formally equivalent to (1.1), with hi = −ϑi . Here typically large numbers of interaction partners (O(104 )), and large numbers of neurons (O(1011 ) for the human brain). Questions to be addressed: can one design a set of couplings that give rise to a desired set of dynamical attractors (memories). Hopfield: using Hebb rule for random patterns (ξiµ ∈ {±1}, i.i.d. with meah zero) p 1 X µ µ Jij = ξi ξj N
(1.9)
µ=1
has stable attractors that closely resemble the stored patterns (need ϑi = 0) N 1 X µ ξi Si (t) mµ (t) = N i=1
→ 13
N 1 X µ mµ = ξi hSi i ≃ 1 , N i=1
(1.10)
provided the number p of patterns encoded in the {Jij } and the noise-level are not too large. (See Fig. 1.3.)
Figure 1.3: Phase diagram of the Hopfield model in the temperature (T )- loading (α = p/N ) plane. P: ‘paramagnetic’ phase with mµ = 0 and q = 0; F: ‘ferromagnetic’ retrieval phase phase with mµ 6= 0 and q 6= 0; SG: ‘spin-glass phase with mµ = 0 and q 6= 0; M: the ferromagneticvretrieval phase coexists with the spin-glass phase, but is only meta-stable.
• Dynamics of attitude change: look at 2-attitude system (any system with binary option, e.g. adopting a new technology/habit/opinion). Describe in terms of preference field for new attitude N X Jij Sj (t) − ϑi + h0 (t) , (1.11) ui (t) = j=1
where the couplings describe influence of peer pressure, the ϑi would describe idiosyncratic a-priori preferences, and would be randomly distributed across a population, and h0 (t) is an external driving field (could mimic e.g. the effect of a falling price on a new gadget). Looking at noiseless situation for simplicity Si (t + ∆t) = sign ui (t) (1.12) m(t + ∆t) =
J N
(uniform all-to-all interactions). Then, by the LLN, 1 X 1 X Si (t + ∆t) = sign Jm(t) − ϑi + h0 (t) N N
Consider the simplest case Jij =
i
i
≃ Prob(ϑi ≤ Jm(t) + h0 (t)) − Prob(ϑi > Jm(t) + h0 (t))
= 2P (Jm(t) + h0 (t)) − 1 , 14
(1.13)
where P (x) = Prob(ϑi ≤ x)
(1.14)
is the distribution (the integrated pdf) of the idiosyncratic preferences. If the external field changes slowly, m(t) approaches a stationary value m(h0 ) = m(h0 (t)) characteristic of the current value of h0 (t). Depending on the value of J, can get continuous or discontinuous changes of opinion, as h0 (t) changes from −∞ to +∞.
Exercise: Assume p(ϑi ) is symmetric about 0, with a single maximum at 0 (think of a √ zero mean, variance σ 2 Gaussian for which P (x) = 21 [1 + erf(x/ 2σ 2 )]. Use a graphical analysis of the stationarity equation at given h0 = h0 (t) m = 2P (Jm + h0 ) − 1
(1.15)
to give a qualitative sketch of the behaviour of m as a function of h0 , and convince yourself of the correctness of the above claim concerning the existence of continuous and discontinuous transitions.
15
Chapter 2
Statistical Mechanics 2.1
Fundamental Relations
If a thermodynamic system, whose dynamic variables are collectively denoted by σ, in contact with a heat-bath at temperature T , is in thermal equilibrium, it described by the GibbsBoltzmann distribution 1 pβ (σ) = exp[−βH(σ)] , β = 1/kB T , (2.1) Z corresponding to H = H(σ), the energy function (Hamiltonian) of the system. The normalization constant X Z = Z(β) = exp[−βH(σ)] , (2.2) σ
in pβ is known as the so-called partition function.
The partition function of a system is of central importance in thermodynamics, as it is related to one of the important thermodynamic functions, viz. the free energy, via the relation F (β) = −β −1 ln Z(β) .
(2.3)
We will not be much interested in the fundamental relations between the free energy and other thermodynamic functions such as internal energyy or specific heat from a physics or thermodynamics point of view. We will, however come across several of them, without appealing to thermodynamic reasoning, as we go along; some of these relation are briefly explored from the thermodynamic point of view in Appendix A. We will here mainly explore the probabilistic significance of the partition and the free energy, namely that they are closely related to moment generating functions and cumulant generating functions of of probability theory. 16
Reminder: Moment and cumulant generating functions Consider a (discrete) random variable X taking values x ∈ Ω, with probabilities p(x) = Prob(X = x). • The moment generating function G(h) corresponding to p(x) is defined as1 X G(h) = p(x)ehx .
(2.4)
x∈Ω
• One can use the moment generating function to evaluate moments of the random variable X by differentiation n X d n n µn (X) = hX i = p(x)x = G(h) . (2.5) dh h=0 x∈Ω
Exercise: Verify this identity.
• The cumulant generating function K(h) corresponding to p(x) is defined as K(h) = log G(h) • Cumulants of the random variable X are obtained by differentiation of F , n d K(h) . κn (X) = dh h=0
(2.6)
(2.7)
• Cumulants are related to centered moments. The lowest order cumulants are κ1 (X) κ2 (X)
= =
κ3 (X)
=
hXi = µ1 (X) hX 2 i − hXi2 = Var(X)
hX 3 i − 3hX 2 ihXi + 2hXi3
Exercise: Verify these identities.
Moments of the Energy Function Moments of the engy function H(σ) are obtained from the partition function by differetiation, in close analogy to the way moments are obtained from a generating function, ∂ n 1 Z(β) . (2.8) − µn (H) = hH n i = Z(β) ∂β
Note that we are here using partial derivatives w.r.t β in anticipation of the fact that the energy function may contain further parameters w.r.t. which we could differentiate (see below). Also, we are not evaluating derivatives at β = 0, but at arbitrary β (obtaining moments through a slightly modified construction in terms of derivatives). The first moment E = hHi is referred to as the internal energy of the system. 1
Other representations exist, and some care is needed in the case of continuous random variables and/or if |Ω| = ∞.
17
Cumulants of the Energy Function Cumulants of the Energy function are obtained by differentiation of the free energy according to ∂ n ∂ n ( − βF (β)) = − ln Z(β) . (2.9) κn (H) = − ∂β ∂β ∂ ∂ Recall that µ1 (H) = κ1 (H) = E. Using β = 1/kB T , hence T ∂T = −β ∂β , we can exploit the first identity in (2.9) to obtain
E =F +β
∂ ∂ F =F −T F ∂β ∂T
⇔
F =E+T
∂ F ∂T
(2.10)
Thermodynamic relations allow to identify S = S(β) = −
∂ F ∂T
(2.11)
as the entropy of the system. Analogous reasoning can be used to obtain a thermodynamic interpretation for the second cumulant κ2 (H), as ∂ E ≡ C(β) = kB β 2 κ2 (H) = kB β 2 [hH 2 i − hHi2 ] . ∂T
(2.12)
This is the rate of change of internal energy with temperature, known as specific heat. The relation between specific heat and κ2 (H) allows to conclude that the specific heat of a system is always non-negative. Probabilistic Expression for the Entropy The entropy can be expressed in terms of the Gibbs-Boltzmann distribution as X S(β) = −kB pβ (σ) ln pβ (σ) , (2.13) σ
To verify insert the Gibbs distribution (2.1) into this expression, which gives X pβ (σ)[− ln Z(β) − βH(σ)] = kB ln Z(β) + kB βE . S(β) = −kB σ
On the other hand S(β) = −
∂ ∂ (−β −1 ln Z(β)) = kB β 2 (−β −1 ln Z(β)) = kB ln Z(β) + kB βE ∂T ∂β
which agrees with the probabilistic expression. Exercise: Check the arithmetic underlying the relations. 18
Generalization: Observables, Conjugate Fields and Associated Cumulants Let us consider the case where the energy function has several identifiable contributions Oα (σ), with say α = 1, 2, . . . , n, whose contribution to the energy function can be controlled by external fields hα . Specifically we will assume that H(σ) = H0 (σ) −
n X
hα Oα (σ) .
(2.14)
α=1
We will refer to the Oα (σ) as to observables [think of magnetization or (electric) polarization as examples], and to the hα as their conjugate fields The contribution H0 (σ) would represent the energy of internal interactions, which cannot be externally manipulated. In this case the Gibbs-Boltzmann distribution will be parametrized by β and by the conjugate fields, and the partition function and the free energy will depend on all these parameters, i.e. we have io n h X X hα Oα (σ) (2.15) Z = Z(β, {hα }) = exp − β H0 (σ) − α
σ
and thus F = −β −1 ln Z(β, {hα }) = F (β, {hα })
Expectation values of the observables (i.e. first moments/first cumulants) are obtained as κ1 (Oα ) = hOα i = −
∂ F (β, {hα }) ∂hα
(2.16)
Remark: One can use this formalism to compute expectations of observables, which cannot, in practice, be controlled by by external fields. Exercise: How ? Hint: think of the only possible value of the conjugate field at the disposal of the experimenter for these observables. One defines generalized susceptibilities via χαβ =
∂2 ∂2 ∂ hOα i = − F (β, {hα }) = − F (β, {hα }) = χβα ∂hβ ∂hβ ∂hα ∂hα ∂hβ
(2.17)
Inserting definitions one finds them related to mixed second order cumulants, χα,β = β[hOα Oβ i − hOα ihOβ i] = β κ2 (Oα , Oβ )
(2.18)
or correlations. Note that the meaning of χα,β is the response of the expectation of Oα to changes in the field hβ conjugate to the observable Oβ . Relations of the type (2.18) are therefore known as correlation-response-relations. Note they are generally valid for equilibrium distributions of Gibb-Boltzmann type. Exercise: Fill in the necessary arithmetic to check these relations.
19
2.2
Extremising Properties of Entropy and Free Energy
The equilibrium free energy and the thermodynamic entropy have the following extremising properties Propopsition: Let p = (p(σ)) be any probability assignment (or pdf). Define entropy and energy functionals as X X S[p] = −kB p(σ) ln p(σ) , E[p] = p(σ)H(σ) (2.19) σ
σ
and in terms of these a free energy functional as F [p] = E[p] − T S[p] .
(2.20)
Then (i) The Gibbs-Boltzmann distribution is the unique distribution that maximizes the entropy functional S[p] under the constraint of a given value for the average energy E = E[p]. Using pβ to denote specifically the Gibbs-Boltzmann distribution at (inverse) temperature β, one then has that the thermodynamic entropy S = S(β) is given in terms of the maximizing probability pβ as S(β) = S[pβ ]. (ii) (Gibbs variational principle): The Gibbs-Boltzmann distribution pβ is the unique distribution that minimizes the free energy functional F [p] at given inverse temperature β, and the thermodynamic free energy is given in terms of the minimizing probability pβ as F (β) = F [pβ ] Proof (i): Find distribution at which S[p] is stationary with respect to variations of p under the P constraints of given average energy E = E[p] and normalization h1i = σ p(σ) = 1, using the method of Lagrangian multipliers to handle the constraints. Thus, introducing the Lagrangian X p(σ) − 1 − kB λE E[p] − E , (2.21) L[p] ≡ S[p] − kB λ0 σ
solve ∀σ :
∂ L[p] = −kB ( ln p(σ) + 1) − kB λ0 − kB λE H(σ) = 0 , ∂p(σ) X ∂ p(σ) − 1 = 0 L[p] = −kB ∂λ0 σ ∂ L[p] = −kB E[p] − E = 0 ∂λE 20
(2.22) (2.23) (2.24)
The first of the above equations can be solved explicitly for p(σ) as p(σ) =
1 exp[−λE H(σ)] , Z
with Z = exp[1 + λ0 ] =
X
exp[−λE H(σ)] ,
(2.25)
(2.26)
σ
in which the last expression for Z follows from the second of the above equations, expressing the normalization constraint. Since with p of the form (2.25), the energy functional E[p] = hHi = P p(σ)H(σ) is a monotone decreasing function of λE (Exercise: verify this.), the value of σ λE which solves the third equation E[p] = E above (call it β) is uniquely defined, provided a solution does exist at all. Calling pβ the unique maximizing solution we have that S[pβ ] gives the correct probabilistic expression for the thermodynamic entropy if we identify β with the inverse temperature. P Note: in case of continuous degrees of freedom, where p is a pdf, and σ is symbolic notation for an integral, the partial derivatives w.r.t. the p(σ) would have to be replaced by functional δ derivatives δp(σ) . Remark The fact that the maximizing distribution must be unique can also be infered from the fact that S[p] is a concave functional. Suppose that both p1 and p2 give rise to the same maximum Smax of S[p] compatible with the constraints h1i = 1 (normalization) and hHi = E (given average energy). Then the convex combination pλ = λp1 + (1 − λ)p2 , with 0 ≤ λ ≤ 1 would also satisfy the constraints, but X S[pλ ] = −kB pλ (σ) ln pλ (σ) ≥ λS[p1 ] + (1 − λ)S[p2 ] = Smax , σ
and the inequality is sharp, unless p1 ≡ p2 (or λ ∈ {0, 1}). This follows from the fact that f (x) = −x ln x is a concave function of x on the interval [0, 1].
Proof (ii): The proof that F [p] at given temperature T is minimized by pβ follows along similar lines, though here only the normalization constraint needs to be taken into account, requiring a single Lagrange multiplyerλ0 . Exercise Fill in the details of the proof of (ii). Remark There is a deep connection between (i) and information-theory. Up to a constant factor, the entropy S[p] is equal to the information entropy, which is a measure of our ignorance about a probabilistic system. Maximizing entropy under constraints thus amounts to maximizing 21
ignorance under constraints (given by the knowledge we do have about a system), or in other words it is a systematic way of infering a probability assignment for a system in a maximally unbiased way, consistent with and only with the available information. This (subjectivistic) approach to statistical mechanics was pioneered in the 1950’s by Jaynes (E T Jaynes, Information Theory and Statistical Mechanics. Phys Rev 106, 620-630 (1957)). Distributions describing other ensembles of Statistical Mechanics, e.g. the grand-canonical ensemble describing systems coupled to a heat bath and a particle reservoir, keeping the average energy and the average number of particles fixed, can also be constructed in this way. Systems with continuous degrees of freedom If we have a system with continuous degrees of freedom the Gibbs-Boltzmann distribution would be a probability density function. Similarly, if we are interested in a Maximum Entropy (MaxEnt) estimate of the pdf describing a system with continuous random variables, given constraints provided as expectation values of a set of observables, one starts out from the differential entropy, Z S[p] = −kB dσ p(σ) ln p(σ) , (2.27) in which p(σ) is now a pdf, defined over a (suitabl sub-)set Ω ⊆ Rn , and the above integral extends over Ω. R Assuming that we have constraints Fα = hfα i ≡ dσ p(σ)fα (σ) , α = 1, . . . , a the MaxEnt estimate for the maximally unbiased pdf, given these constraints, is solved by introducing L = S[p] −
a X
α=0
kB λα (hfα i − Fα )
with f0 (σ) = 1, hence F0 = 1 representing the normalization constraint, and solving a
∀σ : ∀α :
X δ kB λα fα (σ) = 0 , L[p] = −kB ( ln p(σ) + 1) − δp(σ) α=0 Z ∂ dσp(σ)fα (σ) − Fα = 0 L[p] = −kB ∂λα
Exercise: Show that the maximizing pdf is of the form p(σ) =
1 − Paα=1 λα fα (σ) e . Z
Exercise Show that the Gaussian pdf p(x) = √
h i 1 exp − 2 (x − a)2 2σ 2πσ 2 1
22
is the unique distribution that maximizes the Entropy S[p] = − the constraints hxi = a and h(x − a)2 i = σ 2 .
2.3
R∞
−∞ dx p(x) ln p(x),
subject to
Systems with Frozen Degrees of Freedom: Disorder & Self-Averaging
Consider for exampe a random-bond Ising model of the following form. Ising spins Si are placed on the vertices of a d dimensional hypercubic lattice. Bonds between neighouring vertices (i, j) may be occupied by impurities (kij = 1) or empty (kij = 0), and if present, the impurity facilitates a ferromagnetic interaction between the neighbouring spins, which is absent if no impurity is present. We will denote by σ the set of possible spin-states, and by κ the set of possible configurations of the impurities. The energy function of the spins at given configuration of the impurities is then X X Jij (κ)Si Sj − h Si , (2.28) H(σ|κ) = − i
(i,j)
with Jij (κ) = Jkij , the first sum extending over over all nearest neighbour pairs, the second over all verticesof the lattice. Alternatively, one could imagine that the Jij (κ) take random values different from 0 or 1.
2.3.1
High Temperatures — Annealed Disorder
At sufficiently high temperatures (say T = T0 , or higher) one assumes that the impurities can freely hop between different bonds, so that there would be many such configurational changes during typical times used to measure magnetic properties of the system. In such a situation spin and configurational degrees of freedom are in thermal equilibrium described by a Gibbs-Boltzmann distribution of the form pβ0 (σ, κ) =
1 exp[−β0 H(σ, κ)] , Z(β0 )
β0 = 1/kB T0 ,
(2.29)
with H(σ, κ) = H(σ|κ) + V (κ) ,
(2.30)
and V = V (κ) is a potential energy term describing the interaction of the impurities. The partition function Z(β0 ) is then given by (the sum over all states) X X Z(β0 ) = exp[−β0 H(σ, κ)] = Z(β0 |κ) exp[−β0 V (κ)] . (2.31) σ,κ
κ
23
Here Z(β0 |κ) =
X
exp[−β0 H(σ|κ)]
σ
is the partition function of the spin degrees of freedom at given configuration κ of the impurities. Note that the κ summation in Z(β0 ) is restricted to configurations of a given number of impurities (e.g. with only a fraction ρ of all bonds occupied). The marginal distribution of the impurity configurations κ in this situation in given by pβ0 (κ) =
X
p0 (σ, κ) =
σ
2.3.2
Z(β0 |κ) exp[−β0 V (κ)] . Z(β0 )
(2.32)
Low Temperatures — Quenched Disorder
Asume that the system is suddenly cooled down to temperatures T ≪ T0 , so that the motion of impurities is inhibited. One may imagine that the motion of an impurity to a neighbouring empty bond requires to overcome an activation-energy barrier ∆, so that the average time τ for this to occur is τ ∼ τ0 exp(∆/kB T ) , (2.33)
where τ0 ≃ 10−12 sec is the typical microscopic attempt time. At kB T ≪ ∆ these events can be very rare, so during times of measurement where typically very many spin-flips occur, the configuration κ can be regarded as virtually fixed. (E.g. with ∆/kB T ≃ 100 one obtains τ ≃ 2.7 × 1031 sec ≃ 8.6 × 1023 years.
In such a situation the spin system is in equilibrium at given fixed κ. With respect to the configurations κ the system is not in equilibrium. Thus we have a Gibbs distribution at fixed κ p(σ|κ) =
1 exp[−βH(σ|κ)] , Z(β|κ)
with Z(β|κ) =
X
β = 1/kB T ,
exp[−βH(σ|κ)] .
(2.34)
(2.35)
σ
Thermodynamic functions like the free energy per spin fN (β|κ) = −(βN )−1 ln Z(β|κ)
(2.36)
or the magnetization mN (β|κ) = −
∂fN (β|κ) 1 X = hσi i ∂h N i
24
(2.37)
therefore depend on the fixed disorder configuration κ. That is, they are random quantities, which are in principle fluctuating over an ensemble of identically prepared systems, with the statistics of configurations κ described by p0 (κ). Note that the joint distribution p(σ, κ) = p(σ|κ)pβ0 (κ) is not a Gibbs equilibrium distribution, unless β = β0 .
2.3.3
Self-Averaging — Quenched Free Energy
Given that quantities like fN (β|κ) or mN (β|κ) are random, what are the relevant values to consider (or to compare with experiments, between experiments, between theory and experiment)? Possible candidates are (i) the average over the distribution of configurations κ , (ii) the typical (most probable value, or (iii) the median. These are not in general equal. See the following figure for an example. 0.7 0.6 0.5
p(x)
0.4 0.3 0.2 0.1 0 0
1
2
3
4
5
x
Figure 2.1: Pdf of a random variable X, with typical and average values indicated. A hint provided by nature: thermodynamic properties of heterogeneous systems (e.g. alloys) do not depend on detailed configurations of individual atoms or molecules; experiments performed on materials of identical composition (concentrations), but not on identical micro-configurations give reproducible and identical results. The reasion for this is that for densities of extensive quantities, such as fN (β|κ) or mN (β|κ) one observes the phenomenon of self-averaging. As the large-system limit N → ∞ is taken, fN (β|κ) converges to the configurational average over the κ distribution, X pβ0 (κ)fN (β|κ) = lim hfN (β|κ)iκ , (2.38) fN (β|κ) −→ f (β) = lim N →∞
N →∞
κ
called quenched free energy. 25
Proof (outline): An outline of a proof for systems with finite range interactions is as follows Consider a system on a hyper-cubic lattice in d dimensions, containing N = Ld lattice sites, and a Hamiltonian with finite range interactions of the form X X H(σ|κ) = − Jij (κ)Si Sj − h Si , (2.39) i
(i,j)
in which the Jij (κ) depend on a disorder configuration but have finite range: Jij (κ) = 0 forv|ri − rj | ≥ R.
Compare the free energy density fN (β|κ) for a typical configuration κ with that of a system which is obtained from the previous one by decomposing it into smaller blocks of size (L′ )d , and ˜ cutting all bonds in channels of width R connecting neighbouring subsystems. Denote by H(σ|κ) ′ d the Hamiltonian of the system thus obtained; it describes a collection of (L/L ) non-interacting, independent subsystems. We will look at the limit L → ∞, L′ → ∞, L/L′ → ∞. 1
2
R L’
L
α
Figure 2.2: Disordered system, partitioned into noninteracting sub-systems; see text. One has the bound
|fN (β|κ) − f˜N (β|κ)| ∼ O(1/L′ )
(2.40)
1 1 ˜ ˜ |fN (β|κ) − f˜N (β|κ)| ≤ sup |H(σ|κ) − H(σ|κ)| ≡ ||H − H|| N σ N
(2.41)
To prove this notice that
which in turn follwos from fN (β|κ) = −(βN )−1 ln
X
e−βH(σ|κ)
σ
26
= −(βN )−1 ln and noting that x satisfies
X
˜
σ
−β (H(σ|κ)−H(σ|κ)) −β H(σ|κ) , |e {z } ×e ˜
˜
x
˜
e−β||H−H|| ≤ x ≤ eβ||H−H|| .
˜ only contains interactions inside and into channels of cut bonds. MultiNow note that H − H plying number of blocks, surface area of each block, and width R of the channel the number of terms contributing to the difference is seen to scale as (L/L′ )d × d (L′ )d−1 × 2 R = 2d RLd /L′ with L and L′ . As each individual interaction energy is bounded by a finite constant, we have ˜ ≤ const. × 2d RLd /L′ , hence ||H − H|| R |fN (β, κ) − f˜N (β, κ)| ≤ const. × 2d ′ → 0 , L
as L′ → ∞ ,
(2.42)
in which the constant is an estimate for the maximum contribution of a single interaction energy. To finish the proof note that f˜N (β, κ) is given by the sum of free energies of independent blocks ′
′
N/N N/N 1 X 1 X f˜N (β|κ) F (β|κα ) = fN ′ (β|κα ) N N/N ′ α=1
(2.43)
α=1
in which F (β|κα ) is the free energy of block number α with disorder configuration κα (the restriction of κ to block number α), and fN ′ (β|κα ) = F (β|κα )/N ′ the corresponding free energy density. The last expression is nothing but an empirical average of the free energy density over the block-configurations κα . For large N/N ′ this converges to the expectation of this quantity over the distribution of block configurations by the law of large numbers. Thus f˜N (β|κ) → hfN ′ (β|κ′ )iκ′
(2.44)
as N/N ′ → ∞, where the average is over the distribution of block-configurations. Finally, assuming that the limit f (β) = limN ′ →∞ hfN ′ (β|κ′ )iκ′ exists, we have |fN (β|κ) − f (β)| ≤ |fN (β|κ) − f˜N (β|κ)| +|f˜N (β|κ) − hfN ′ (β|κ′ )iκ′ |
+|hfN ′ (β|κ′ )iκ′ − f (β)| → 0 ,
(2.45)
as L → ∞, L′ → ∞, L/L′ → ∞. Summary: When investigating systems with quenched disorder, the relevant thermodynamic potential is the so-called quenched free energy f (β) = − lim (βN )−1 h ln Z(β|κ)iκ , N →∞
27
i.e. the average of the free energy (density) over the distribution of configurations κ of the system, in the sense that the typical true configuration-dependent free energy density will converge to the quenched free energy, as the thermodynamic limit is taken. The distribution of disorder configurations is determined by the way the system was prepared.
2.3.4
A Note on Averages
We have seen that one way to obtain the correct (i.e. typical) free energy is via computing the average of the logarithm of the partition function, or the quenched average. If, as observed on real systems, the true disorder dependent free energy density fN (β|κ) = −(βN )−1 ln ZN (β|κ)
(2.46)
typically i.e. in the vast majority of realizations, takes the disorder-independent value f (β), there is another reason why one should consider the average of the logarithm of the partition function, rather than, say, the logarithm of the average. This is related to scaling of quantities with system size. Note that we can rewrite (2.46) as ZN (β|κ) = e−βN fN (β|κ)
(2.47)
The expectation that fN (β|κ) is for large N typically independent of κ implies that the pdf of fN must have for large N a very sharp maximum and be narrowly concentrated at the typical(=average) value f¯N , which will in turn approach the value f (β) of the quenched free energy, as N → ∞.
Because of the exponential scaling of ZN in N , a free energy computed from the logarithm of the average of ZN (β|κ) (rather than properly from the average of the logarithm) will give wrong results wich are dominated by rare (exponentially rare in N ) results. We can demonstrate this on an example where we asume that the pdf of fN , defined as X p(fN ) = p(κ)δ(fN − fN (β|κ)) (2.48) κ
is Gaussian, thus of the form i h 1 1 p(fN ) = q exp − 2 (fN − f¯N )2 , 2σN 2 2πσN
2 σN =
σ2 , N
(2.49)
and we assume that it becomes sharply concentrated at f¯ = limN →∞ f¯N ) (with a variance σN decreasing as 1/N ) as N becomes large. 28
Under these conditions hZN i =
Z
h i 1 2 dfN p(fN )ZN = exp − βN f¯N + β 2 N 2 σN 2
(2.50)
so that the so-called annealed free energy is
1 a fN (β) = −(βN )−1 hZN i = f¯N − βσ 2 , 2
(2.51)
i.e. it takes a value which is O(1) lower than the avarage(=typical) value f¯N , thus dominated by exponentially in N rare events! 2 = σ 2 /N a , for some a > 0, find the Exercise: Fill in the details of this argument. Assuming σN a (β) → f¯ as N → ∞. condition on a, for which fN
2.3.5
The Annealed Approximation
The annealed free energy is generally (not only for Gaussian fN a lower bound to the true free energy. This follows for the general case from the fact that the logarithm is a concave function, hence from Jensen’s inequality hln[ZN (β|κ)]iκ ≤ ln[hZN (β|κ)iκ ]
(2.52)
and therefore a (β) = −(βN )−1 ln[hZN (β|κ)iκ ] ≤ fN (β) = −(βN )−1 hln[ZN (β|κ)]iκ . fN
(2.53)
As the annealed free energy is often (much) easier to compute than the proper quenched free energy it is often used as an exact lower bound for the true free energy — the so called annealed approximation. Note: In the disordered systems community it bad — but widely accepted — jargon to claim that in the case of annealed disorder one has to compute the average of the partition function (and its logarithm to obtain the appropriate free energy), whereas in the case of quenched (frozen) disorder one has to compute the average of the logarithm of the partition function. The first part of this claim is nonsense. As we have seen, in the case of annealed disorder (configurational degrees of freedom in equilibrium with other dynamical degrees of freedom) the quantity of interest is a partition function in an enlarged phase space X X Z(β) = exp[−βH(σ|κ) − βV (κ)] = Z(β|κ) exp[−βV (κ)] σ,κ
κ
This is not an average of the κ dependent partition function Z(β|κ); even if it were for some value of β, it would not be a proper average for all other β. 29
Chapter 3
Dynamics and Equilibrium 3.1
Describing Stochastic Dynamics
We briefly explore the conditions under which a stochastic dynamical system will approach equilibrium characterized by a Gibbs-Boltzmann distribution. In fact for Ising-like or other discrete systems, there is no natural Hamiltonian dynamics (with conserved energy), and a stochastic dynamics must be constructed in such a way that the GibbsBoltzmann distribution will in fact be the asymptotic equilibrium distribution corresponding to a given energy function (Hamiltonian). In general, the evolution of a stochastic dynamical system with a set of microstates S (Ising or otherwise) can be described by following the time-dependence of the probability of states pt = (pt (S)). Denoting by W the matrix of transition probabilities for a time step ∆t, with (non-negative) matrix elements W (S, S ′ ), one has X pt+∆t = W pt ⇔ pt+∆t (S) = W (S, S ′ )pt (S ′ ) (3.1) S′
Normalization of probabilities requires that W is a stochastic matrix satisfying X W (S, S ′ ) = 1
(3.2)
S
independently of S ′ . This equation can be read as implying that W has a left eigenvector e = (1, 1, . . . , 1) corresponding to the eigenvalue λ = 1. The right eigenvector corresponding to this eigenvalue would be a stationary distribution p∞ satisfying p∞ = W p∞ 30
One can turn (3.1) into a continuity equation by subtracting pt , a so called master-equation which in in components reads i Xh pt+∆t (S) − pt (S) = W (S, S ′ )pt (S ′ ) − W (S ′ , S)pt (S) . (3.3) S′
Here we have used (3.2) on the r.h.s.
Taking the ∆t → 0-limit one obtains the continuous time master-equation i Xh w(S, S ′ )pt (S ′ ) − w(S ′ , S)pt (S) . ∂t pt (S) =
(3.4)
S′
in which w(S, S ′ ) = lim∆t→0 W (S, S ′ )/∆t is the rate of transitions (per unit time) from state S ′ to a state S. This is a continuity equation for probabilities, describing the the rate of change of pt (S) as a balance on incoming and outgoing probability currents. For a stationary state, the l.h.s. of (3.4) should vanish. Assuming that the dynamics approaches a stationary state as t → ∞, we should then have for all S that i Xh 0= w(S, S ′ )p∞ (S ′ ) − w(S ′ , S)p∞ (S) . S′
One way to satisfy this equation is to require not only that the net probability current into and out of S vanishes, but that the net probability current between each pair of states S snd S ′ is separately zero. This is the so-called detailed balance condition w(S, S ′ )p∞ (S ′ ) = w(S ′ , S)p∞ (S)
(3.5)
connecting the stationary distribution and the matrix of transition rates. Physical systems in thermal equilibrium are for fundamental reasons expected to satisfy this stronger principle of detailed balance.
3.2
Uniqueness of Equilibrium Distributions
There are two important theorems which can be invoked to see that equilibrium distributions for stochastic dynamical systems will be unique. Perron’s Theorem for Positive Matrices Given a matrix W which is positive in the sense that W (S, S ′ ) > 0 for all S and S ′ , then its maximum eigenvalue λ1 is non-degenerate and strictly positive, and it has an eigenvector with only strictly positive (or only strictly negative) components. (A proof by F Ninio for the simple case of symmetric matrices is on the course web page). There is a more general version for non-negative matrices. 31
Frobenius’ Theorem for Non-Negative Matrices Given a matrix W which is nonnegative in the sense that W (S, S ′ ) ≥ 0 for all S and S ′ , and indecomposable (i.e. no simultaneous permutation of rows and columns can transform it into the form A B ′ W = , 0 C with A and C square matrices, and 0 a matrix containing only zeros), then its eigenvalue λ1 with maximum real part is non-degenerate and strictly positive, and it has an eigenvector with only strictly positive (or only strictly negative) components. If there are k eigenvalues degenerate in absolute value with λ1 , i.e., λ1 = |λ2 | = . . . = |λk | then they are of the form λν = λ1 exp(i2π(ν − 1)/k) for ν = 1, . . . , k, so they are equally spaced on a circle of radius λ1 in the complex plane. Note: A system that is decomposable has a proper subspace of the space of states that the system cannot leave. Exercise: Verify this statement using the definition of indecomposability given above. Applied to matrices describing a stochastic dynamical system this entails that the equilibrium distribution which is the right eigenvector of W corresponding to eigenvalue 1 is uniqe (all other right eigenvectors must be orthogonal to the left eigenvector e = (1, 1, . . . , 1), so cannot have only positive elements). If the degenerate case k > 1 of the Frobenius situation is encountered, this would imply that the system asymptotically enters a k cycle. Remark The above theorems are true for finite-dimensional Matrices. If matrices are infinitedimensional then the system may have more than one asymptotic state. This is actually the situation when phase transitions occur, e.g. where the system finds itself either in the state with positive magnetisation, or in the state with negative magnetisation.
3.3
Glauber Dynamics for Ising Systems
For systems with discrete degrees of freedom such as the Ising model, the energy function or Hamiltonian X 1X ˜ i Si h Jij Si Sj − H(S) = − 2 i
i6=j
is often the starting point of the analysis (e.g. given through energetic considerations), and a stochastic algorithm then needs to be constructed which converges to thermal equilibrium described by H(S). This construction exploits the detailed balance requirement. 32
For the Ising model, the most popular choice is asynchronous Glauber dynamics. It consists in picking a spin (e.g., i) at random within a time-step ∆t = 1/N , and determining its new state according to X ˜ i + ξit Si (t + ∆t) = sign Jij Sj (t) + h j
where the ξit are zero mean i.i.d. random variables with 1 P (z) = Prob{ξit ≤ z} = (1 + tanh(βz)) . 2 This entails that the average value of Si (t + ∆t) for a fiven configuration S(t) is = tanh[βhi (S(t)] , hSi (t + ∆t)i S(t)
(3.6)
in which the local field hi (S) of spin i is
hi (S) =
X
˜i . Jij Sj + h
j(6=i)
Transitions between different states in a simgle time step can occur only if S and S ′ differ by flipping at most one spin in the sample. Formalise this by introducing a spin-flip operator via S = (S1 , S2 , . . . , Si , . . . , SN ) ,
S ′ = Fi S = (S1 , S2 , . . . , −Si , . . . , SN )
For Glauber dynamics one thus has W (S, Fi S) = W (Fi S, S) =
1 1 {1 + tanh[βSi hi (Fi S)]} = w(S, Fi S) 2N N 1 1 {1 − tanh[βSi hi (S)]} = w(Fi S, S) 2N N
The prefactor N1 in the transition probability accounts for the fact that each of the N spins can be chosen for update with probability N1 during the infinitesimal time interval ∆t = 1/N . Note that the local field is is independent of the state of Si , so that hi (S) = hi (Fi (S)) The master-equation describing this situation is then of the form pt+ 1 (S) − pt (S) = N
i 1 Xh w(S, Fi S)pt (Fi S) − w(Fi S, S)pt (S) , N i
33
which in the thermodynamic limit gives rise to i Xh ∂t pt (S) = w(S, Fi S)pt (Fi S) − w(Fi S, S)pt (S) .
(3.7)
i
Noting that
e±x 1 {1 ± tanh(x)} = 2 2 cosh(x)
we can verify that thes transition rates do indeed satisfy detailed balance w.r.t. the GibbsBoltzmann distribution corresponding to H(S), by checking that for all i eβSi hi (S) e−βH(Fi S) e−βH(S) e−βSi hi (S) = . 2 cosh[βSi hi (S)] Z 2 cosh[βSi hi (S)] Z
(3.8)
where we have invested that hi (S) = hi (Fi (S)) (a consequence of absence of self-couplings). Thus we need to show that for all i H(Fi S) − H(S) = 2Si hi (S) .
(3.9)
To check this statement, start isolating terms in H(S) which involve the spin i from those which don’t. Writing X 1X ˜ k Sk , H(S) = − h Jkℓ Sk Sℓ − 2 k
k6=ℓ
we note that terms in the (first) double sum involving Si could arise from k = i or from ℓ = i. Given any i we can thus write H(S) = −
1 X 1 X ˜ i Si + H (i) (S (i) ) , Jiℓ Si Sℓ − Jki Sk Si − h 2 2 ℓ(6=i)
k(6=i)
in which H (i) (S (i) ) contains all contributions to H(S) which do not involve Si . One refers to H (i) (S (i) ) as the cavity Hamiltonian. We will encounter it in later chapters. Using the symmetry of the couplings Jik = Jki and the fact that we can rename ℓ → k in the first contribution, we get i hX ˜ i + H (i) (S (i) ) = −Si hi (S) + H (i) (S (i) ) (3.10) Jik Sk + h H(S) = −Si k(6=i)
With this representation, clearly H(Fi S) − H(S) = 2Si hi (S) as claimed, given that the contribution of H (i) is not affected by the spinflip and vanishes in the difference, that Fi Si = −Si , and that hi (Fi S) = hi (S). 34
Remark For {0,1}-degrees of freedom ni instead of ±1 spin-degrees of freedom Si , the analog of Glauber Dynamics would consist in asynchronous updates of the form X ˜ i + ξit , Jij nj (t) + h ni (t + ∆t) = Θ j
for a single randomly selected node i, with Θ(x) the Heaviside step function, Θ(x) = 0 for x ≤ 0 and Θ(x) = 1 for x > 0. In this case we need a symmetric zero-mean noise distribution with βz 1 1 + tanh Prob{ξit ≤ z} = 2 2
(note the extra factor 1/2 in the argument of the hyperbolic tangent), so that
βh (t) i n o 1h eβhi (n(t)) i = Prob ni (t + ∆t) = 1 = 1 + tanh 2 2 1 + eβhi (n(t)) P ˜ with hi (t) = j Jij nj (t) + hi in order to achieve detailed balance with a Gibbs-Boltzmann distribution generated by X 1X ˜ i ni . H(n) = − h Jij ni nj − 2 i
i6=j
3.4
The Metropolis Monte-Carlo Dynamics Algorithm
Glauber Dynamics mimics an element of microscopic realism in the sense that it stipulates that a spin tends to align (with a probabiltiy depending on temperature) with the local field it experiences from its neighbours, including possibly an external field contribution. There is a general prescription, valid for arbitrary degrees of freedom which satisfied detailed balance with a given Hamiltonian, trivially by construction. This is the so-called Metropolis rule. Given a system of interacting degrees of freedom (discrete or continuous) described by a Hamiltonian H = H(S) there is a version of stochastic dynamics which trivivially satisfies detailed balance with the Gibbs-Boltzmann distribution corresponding to the Hamiltonian in question, namely Monte-Carlo dynamics based on the Metropolis algorithm. It’s simplest implementation consists in proposing one (or several) degrees of freedom for an update with probability pprop (S ′ , S), where S and S ′ denote the system state before and after the proposed update, and accept the proposed change with probafbility 1 , if H(S ′ ) < H(S) , pacc (S ′ , S) = (3.11) −β(H(S ′ )−H(S)) e , if H(S ′ ) ≥ H(S) . 35
If pprop is symmetric — pprop (S ′ , S) = pprop (S, S ′ ) — then the matrix w(S ′ , S) = const × pprop (S ′ , S) pacc (S ′ , S)
(3.12)
of transition rates will automatically satisfy detailed balance with the Gibbs-distribution generated by H(S), and hence the Metropolis algorithm will automatically converge to thermal equilibrium characterized by this distribution. Exercise: Convince yourselves of this fact.
3.5
Langevin & Fokker Planck Equations and Equilibrium
If we consider dynamics of continuous degrees of freedom described by Langevin equations of the form dxi = fi (x) + ξi (t) , i = 1, . . . , N , (3.13) dt with deterministic forces fi (x) and additive Gaussian white noise hξi (t)ξj (t′ )i = σ 2 δij δ(t − t′ ) this system allows an equivalent description in terms of a Fokker-Planck equation o X ∂ n 1 ∂ ∂t Pt (x) = − fi (x)Pt (x) − σ 2 Pt (x) . dxi 2 dxi
(3.14)
(3.15)
i
This is structurally a continuity equation for probabilities of the form ∂t Pt (x) = −
X ∂ Ji (x, t) . dxi i
We will show that a stochastic dynamics of this form has an equilibrium distribution that has the structure of a Gibbs Boltzmann distribution, if the deterministic forces in (3.13) derive from a potential, i.e. if ∂ fi (x) = − U (x) , dxi so that (3.13) is a gradient dynamics with Gaussian white noise. A stationary solution is obtained when the divergence of the probability current J (x, t) = (Ji (x, t)) vanishes, and hence ∂t Pt (x) = 0 in (3.15). A stronger — sufficient, but not necessary condition — would be to require that the proability current itself (and indeed all its components Ji (x, t)) vanish separately, rather than just the divergence of the current. 36
This requirement is analogous to the detailed balance condition we discussed in the context of stochastic dynamics for discrete dynamical variables described in terms of a master equation. Thus detailed balance for a stationary distribution P∞ (x) = P (x) implies ∂ 1 P (x) = 0 fi (x)P (x) − σ 2 2 dxi which can be rewritten as
2 ∂ fi (x)P (x) = P (x) . 2 σ dxi
Assuming P (x) to be of Gibbs-Boltzmann form P (x) = Φ(x) to be determined, this equation translates into
1 Z
e−Φ(x) with some energy function
2 ∂ fi (x) = − Φ(x) 2 σ dxi If fi derives from a potential, as assumed above, these conditions are solved by Φ(x) =
2 U (x) + const. σ2
in which the (x) independent constant can be absorbed in the normalization constant of the equilibrium distribution, which thus takes the form P (x) =
1 −2U (x)/σ2 e . Z
(3.16)
A proper Gibbs-Boltzmann equilibrium distribution at inverse temperature β = (kB T )−1 is obtained if one assumes the variance of the Gaussian white noise in (3.13), (3.14) is related to temperature via σ 2 = 2kB T . (3.17)
37
Chapter 4
Asymptotic Methods 4.1 4.1.1
Laplace and Saddle Point Methods The Laplace Method
The Laplace method is a method to evaluate real integrals of the form Z IN [f ] = dx eN f (x)
(4.1)
for large N and functions f which have a single maximum at x0 (assumed to be in the integration range. For such integrals we have the result 1 lim ln IN [f ] = f (x0 ) (4.2) N →∞ N so for large N the integral IN will behave asymptotically like IN [f ] ∼ eN f (x0 ) .
(4.3)
To verify these statements rewrite (4.1) as IN [f ] = e
N f (x0 )
Z
dx eN [f (x)−f (x0 )] .
By assumption f (x) − f (x0 ) < 0 for x 6= x0 , so the integrand is exponentially small (in N ) for x 6= x0 , hence the integral will be dominated by x ≃ x0 . Using a Taylor expansion of f up to second order around the maximum at x0 gives s Z 1 2π ′′ 2 , (4.4) IN [f ] ≃ eN f (x0 ) dx e− 2 N (−f (x0 )) (x−x0 ) = eN f (x0 ) N ( − f ′′ (x0 )) entaillng the stated limiting behaviour, as N → ∞. 38
Example Stirling approximation for the factorial for large N , √ N ! ≃ 2πN N N e−N . This follows from N! =
Z
Z
dx xN e−x = N
dy (N y)N e−N y = N N N
Z
dy eN (ln y−y) .
Evaluating this via the saddle point method gives the stated approximation. Exercise Fill in the details of this calculation. √ The following figure shows the ratio N !/( 2πN N N e−N ) for 1 ≤ N ≤ 30 to give an indication of the precision. 1.10
1.08
1.06
1.04
1.02
1.00
0
5
10
15
20
25
30
Figure 4.1: Ratio of exact value of the factorial and its Stirling approximation for 1 ≤ N ≤ 30. Note that the relative error is below 1% for N > 15.
Multi-dimensional generalization: The Laplace Method can be generalized to multidimensional integrals. Let f be a function from Rn into the real numbers with a single maximum at x0 . Then for Z IN [f ] =
dx eN f (x)
we have the result
(4.5)
1 ln IN [f ] = f (x0 ) (4.6) N →∞ N where x0 is the coordinate of the uniqe maximum of f in the integration range. Following similar lines of reasoning as above this is shown by writing, Z N f (x0 ) IN [f ] = e dx eN [f (x)−f (x0 )] Z P 1 2 N f (x0 ) ≃ e dx e− 2 N µ,ν (−∂ f (x0 ))µν (xµ −x0µ )(xν −x0ν ) lim
39
= eN f (x0 )
s
(2π/N )n det(−∂ 2 f (x0 ))
(4.7)
Here ∂ 2 f (x0 ) is the matrix of 2nd partial derivatives of f evaluated at x0 , with elements ∂ 2 f (x) ∂ 2 f (x0 )µν = ∂x which is negative definite at a maximum of f (i.e. all its eigenvalues µ ∂xν x0
must be negative). The stated result follows by taking limits.
4.1.2
The Saddle Point Method
The idea of the above evaluation of integrals can be generalized to the evaluation of contour integrals in the complex plane which are of the form Z (4.8) IN [f ] = dz eN f (z) , C
where we assume f to be a complex differentiable (holomorphic) function, and C a contour in the complex plane along which the integral is being evaluated. The idea then is to deform the contour in such a way that it passes through a stationary point z0 of f , with f ′ (z0 ) = 0. (If such a deformation is possible without crossing singularities of f the integral along the deformed contour will be the same as the original one by Cauchy’s theorem.) Setting z = x + iy, and f (z) = g(x, y) + ih(x, y) with real functions g and h one has that (x0 , y0 ) must be a stationary point of both g and h, but it will be a saddle-point for both g and h (rather than a maximum or a minimum) because of the Cauchy-Riemann conditions ∂2g ∂2g + =0, ∂x2 ∂y 2
∂2h ∂2h + 2 =0. ∂x2 ∂y
In the vicinity of z0 we can approximate (using a low order Taylor expansion) 1 f (z) ≃ f (z0 ) + f ′′ (z0 ) (z − z0 )2 . 2 Choosing the deformed contour in such a way that f ′′ (z0 ) (z − z0 )2 along the deformed contour is real and purely negative, and noting that |eN f (z) | = eN g(z) one sees that contributions to the integral are exponentially (in N ) suppressed away from z0 in the direction chosen (in which the integral can be approximated by a Gaussian integral). In summary, one obtains 1 lim ln IN [f ] = f (z0 ) (4.9) N →∞ N 40
in complete analogy to the real case. Remark: We will repeatedly come across situations where the saddle-point method rather than the conceptually simpler Laplace method will be used. From physics considerations we can typically expect that at the relevant stationary points we find that Imf (z0 ) = 0.
4.2
Curie-Weiss Ferromagnet
Model Hamiltonian for a system of N Ising spins H = H(S) = −
X 1X Jij Si Sj − h Si 2 i,j
(4.10)
i
with infinitesimally weak all-to-all connections: Jij =
J , N
i 6= j ,
Jii = 0 .
(4.11)
Introducing the magnetization per spin m=
1 X Si N
(4.12)
i
one has
NJ 2 J m − N hm + . 2 2 The last O(1) contribution corrects for the diagonal contributions included in the first. In the large system limit it will be negligible, and we will henceforth neglect it without further ado, i.e. we have i hJ (4.13) H(S) ≃ −N m2 + hm . 2 H(S) = −
We need to evaluate the partition function
ZN = ZN (β, h) =
X S
and will do this using three different methods.
41
e−βH(S) ,
(4.14)
4.2.1
First Method: Combinatorics and Laplace Method
Use the fact that H depends on S only via the magnetization density m which can take the values 2k , k = 0, 1, . . . N (4.15) m=1− N in which k is the number of spins with Si = −1. The number of spin configurations at given k is N N , = k N 1−m 2 so that we have ZN
1 n h βJ io X N exp N m2 + βhm = 1−m 2 N 2
(4.16)
m=−1
In the large N limit the values of the magnetization m become densely spaced in the interval [−1, 1] with spacing ∆m = N2 . We proceed by approximating the sum over m values by an integral, and use the Stirling approximation in the form n! ∼ nn e−n for n √ ≫ 1 to evaluate the factorials in appearing in the combinatorial factor (The additional factors 2πn can be shown to be negligible in the case of interest here.) n h 1 − m 1 − m 1 + m 1 + m io N ln + ln ∼ exp − N 2 2 2 2 N 1−m 2 We thus get ZN ∼
N 2
with
Z
1
dm eN g(m)
(4.17)
−1
1−m 1−m 1+m 1+m βJ 2 m + βhm − ln − ln (4.18) 2 2 2 2 2 Evaluating (4.17) via the Laplace method we look for a maximum of the function g by solving g ′ (m) = 0, or g(m) =
β(Jm + h) + With
1 2
1 1−m ln =0 2 1+m
⇐⇒
ln 1+m 1−m = arctanh(m) we finally get
1 1+m ln = β(Jm + h) 2 1−m
m = tanh[β(Jm + h)]
(4.19)
(4.20)
as the fixed point equation to determinine the maximum of g. If this equation has several solutions, one will have to verify that g ′′ (m) = βJ − 42
1 <0 1 − m2
(4.21)
to accept a solution as a (local) maximum of g. Exercise: Redo the calculations with the subleading contributions to the Stirling approximation included and convince yourself that asymptotically for large N you get the same results for the fixed point equation and the function g at the fixed point. Before we analyze the fixed point equation and the physical interpretation of its solutions, we will look at two alternative methods to evaluate the partition function, which is the content of the following sections
4.2.2
Second Method: Gaussian Linearization and Laplace Method
The second method to evaluate the partition function returns to (4.14), using the explicit form expression of m in terms of sums over microscopic spin-variables to express the Hamiltonian, to get n βJ X 2 X o X ZN = exp Si + βh Si , (4.22) 2N S
i
i
and notes that a simple evaluation of the partition function is precluded only by the square of m in the exponential, which on multiplying out gives rise to a coupling between all spins. One can get rid of the square of m by using a Gaussian linearization identity of the form Z o n b2 o n a dx p . (4.23) exp − x2 ± bx = exp 2 2a 2π/a In the present case ZN
XZ
n N βJ X o X dm p exp − Si Si + βh m2 + βJm 2 2π/(N βJ) i i S n N βJ X o XZ dm p = exp − Si m2 + β(Jm + h) 2 2π/(N βJ) i S =
is seen to do the job (this step is often referred to as Hubbard-Stratonovich or Gauss-HubbardStratonovich linearization). The partition function now factors with respect to the individual summations over the states of the Si Z n N βJ o dm p exp − m2 + N ln 2 cosh[β(Jm + h)] (4.24) ZN = 2 2π/(N βJ)
This integral is once more of a form that can be evaluated using the Laplace method. The fixed point equation that determines the value m of the maximum of the exponent is identical to (4.20)! We will finally look at yet another alternative to evaluate the partition function, viz the 43
4.2.3
Third Method: δ-Functions and Saddle-Point Method
We return once more to (4.14), keeping m as a short-hand for sums over spin values defined in (4.12), io n h βJ X m2 + βhm , ZN = exp N 2 S
We now introduce a δ-function enforcing the definition (4.12) and introduce a factor 1 into this expression in the form of an integral over this δ-function as n h βJ io XZ X m2 + βhm . (4.25) ZN = N dm δ N m − Si exp N 2 i
S
The next step is to express the δ-function in terms of its Fourier-representation n X o X Z dm ˆ , exp − im ˆ Nm − Si δ Nm − Si = 2π giving ZN
n h βJ i X Z dmdm X o ˆ = exp N m2 + βhm − imm ˆ + im ˆ Si . 2π/N 2
(4.27)
i
S
Once more the sum over Z ZN = Z =
(4.26)
i
i
the states of the Si decouples w.r.t. i, giving n h βJ io dm dm ˆ exp N m2 + βhm − imm ˆ + ln[2 cosh(im)] ˆ 2π/N 2 n o dm dm ˆ exp N g(m, m) ˆ 2π/N
(4.28)
We now have a double integral with an integrand that is an exponential of a complex function of two variables, scaling exponentially in N , which we will evaluate using a two-variable version of the saddle-point method. We have to look for points in the complex m and m ˆ planes at which the function g(m, m) ˆ is stationary. The stationarity conditions are ∂g = β(Jm + h) − im ˆ =0 ∂m ∂g = −im + i tanh(im) ˆ =0 ∂m ˆ Inserting im ˆ from the first equation into the second once more gives the fixed point equation (4.20) m = tanh[β(Jm + h)] we derived from the first method, suggesting that the saddle point solution will give rise to a real value for m and a purely imaginary m; ˆ this does indeed require to deform the m ˆ integration contour (the real line) to pass through the purely imaginary m ˆ saddle point value. 44
4.2.4
Equivalence of the Three Methods
We have seen that the three methods give rise to the same fixed-point equation for the magnetization m as a function of β and the external field h. We need to demonstrate that the three methods give rise to the same expression for the free energy density in the thermodynamic limit and that m is indeed the thermodynamic equilibrium magnetization, irrespective of the method used to obtain the central fixed point equation. The free energy per site is given by 1 ln ZN (β, h) . N →∞ N
−βf (β, h) = lim
(4.29)
First Method: Using the combinatorics and the Laplace method to compute ZN , we would get −βf (β, h) = g(m), with g(m) given by (4.18) to be evaluated at the value of m that satisfies the fixed-point equation (4.20). Rewriting g(m) as g(m) =
1 βJ 2 m + βhm + ln 2 − ln(1 − m2 ) − marctanh(m) , 2 2
and using (4.20) to substitute (1−m2 ) = 1−tanh2 [β(Jm+h)] = β(Jm + h), we get −βf (β, h) = g(m) = −
1 cosh2 [β(Jm+h)]
βJ 2 m + ln 2 cosh[β(Jm + h)] 2
and arctanh(m) =
(4.30)
provided that m solves (4.20). Second Method: Using a combination of Gaussian linearization and the Laplace method to compute ZN , we read off directly from (4.24) that βJ 2 m + ln 2 cosh[β(Jm + h)] 2 provided that m solves (4.20), which clearly agrees with what we got from the first method. −βf (β, h) = −
Third Method: Finally, using a δ-function and its Fourier- representation to enforce the definition of m and the saddle-point method for the integration over m and its so-called conjugate m, ˆ we get −βf (β, h) = g(m, m), ˆ to be evaluated at the solution of the saddle point equations for m and m. ˆ Using the first of these saddle point equations to substitute im = β(Jm + h) we can express g(m, m) ˆ in terms of m alone, once more giving −βf (β, h) = −
βJ 2 m + ln 2 cosh[β(Jm + h)] , 2
and requiring that m solves (4.20). 45
A Remark on Relative Merits The three methods we have used are not equally powerful. The combinatorial method is very much restricted to system with discrete degrees of freedom (such as Ising spins), the Gaussian linearization approach requires that the Hamiltonian is at most quadratic in the relevant macroscopic quantities. The δ-function approach is the most general of the three we have looked at; its disadvantage is that it typically requires to use the saddle-point rather than the Laplace method, so it is less easy to discuss stability or to verify that the fixed points one find are indeed giving the maximum contributions to the partition function.
4.2.5
The Physics of the Curie-Weiss Model
The CW model is a simple model for ferro-magnetism; although the underlying assumption of an infinite-range interaction is clearly unphysical, the model has the distinct advantage that its free energy (and thus its thermodynamic properties) are easily computed — as we have seen in at least three different, but related ways. The model has the distinct advantage to give rise to an equation of state, which can be formulated in closed analytic form, namely the fixed-point equation (4.20). Its solutions give the magnetization as a function of temperature and field m = m(β, h). Remarkably the equation of state predicts a phase transition in zero external field h = 0 from an unmagnetized state at high temperatures to a magnetized state below a critical temperature Tc given by the condition βc J = 1, as we will now show. Equation of State, Magnetic and Thermal Properties Magnetization In zero external field the equation of state is m = tanh(βJm) .
(4.31)
This equation always has an m = 0-solution. A graphical analysis reveals that it can have in addition two non-zero solutions ±m(β) 6= 0 if βJ > 1. The solution in the low temperature phase is shown in the figure below. Looking at the stability condition (4.21), we see that the m = 0 solution is stable precisely where it is the only solution of the FPE, i.e. for βJ < 1, and it becomes unstable, where the non-zero solutions appear, i.e. for βJ > 1. These non-zero solutions turn out to be stable throughout the low-temperature regime. Method 1 suggests that the partition function (thus the free energy) is indeed dominated by microstates with a magnetization m which solves the fixed-point equation (4.20), so m would be the equilibrium magnetization. One can also check that m satisfies the thermodynamic expression 46
for the magnetization in terms of the free energy m = −∂f (β, h)/∂h. In evaluating this from (4.30), one should in principle take into account that there is an explicit h-dependence and an implicit one through the h dependence of the solution m of (4.20), so m = m(β, h) = −
∂f ∂m(β, h) ∂f (β, h) = tanh[β(Jm + h)] − ∂h ∂m ∂h
Via the fixed point equation, the first contribution is just m, m(β, h) = −
∂f (β, h) = tanh[β(Jm + h)] , ∂h
(4.32)
whereas the second contribution vanishes, as the fixed point equation is precisely the equation that expresses stationarity of (4.30) w.r.t. variations of m. m
m
1.0
1.0
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0.2 kT 0.4
0.6
0.8
1.0
1.2
1.4
kT 0.4
J
0.6
0.8
1.0
1.2
1.4
J
Figure 4.2: Left: Zero field magnetization of the CW model as a function of kB T /J showing non-analytic behaviour at kB T /J = 1. Right: Magnetization at non-zero field (h = 0.01), where the behaviour is smooth.
Susceptibility The susceptibility is defined via the field-derivative of the magnetization. We obtain it by taking the field-derivative in (4.20). Using the chain rule, we get χ≡
∂m(β, h) ∂m(β, h) = 1 − tanh2 [β(Jm + h)] βJ +β . ∂h ∂h
This is solved for χ to give
χ = χ(β, h) =
β(1 − m2 ) , 1 − βJ(1 − m2 )
(4.33)
where we have invested the fixed point equation to express the hyperbolic tangent in terms of m. Note: The condition (4.21) that a solution of the self-consistency equation (4.20) does indeed define a dominant contribution to the partition function, as evaluated via the Laplace method is equivalent to the condition that the susceptibility is positive. 47
In zero external field we know that m = 0 for T ≥ Tc (or β ≤ βc ). This gives χ(β) = χ(β, 0) =
β , 1 − βJ
kB T > J ,
which diverges as βJ → βc J = 1, and would be negative for β > βc . As the susceptibility is by its nature a fluctuation quantity it must be positive, so at β > βc we need to take the m = 6 0 solutions of the fixed-point equation. Χ
Χ 12 100 10 80 8 60 6 40
4
20
2 kT 0.4
0.6
0.8
1.0
1.2
1.4
kT 0.4
J
0.6
0.8
1.0
1.2
1.4
J
Figure 4.3: Left: Zero field susceptibility of the CW model as a function of kB T /J showing a divergence at kB T /J = 1. Rigth: susceptibility at non-zero field (h = 0.01), where the behaviour is smooth. (Note the different scales!) Internal energy: The internal energy per spin is given by u(β, h) =
∂βf (β, h) J = m2 − (Jm + h) tanh[β(Jm + h)] , ∂β 2
obtained by performing derivatives (4.30); implicit β-depencence via m(β, h) is again ignored due to the stationarity of f w.r.t. m. Investing the fixed point equation (4.20) we can rewrite this as J (4.34) u(β, h) = − m2 − hm . 2 Notice that this is just H/N expressed in terms of m, though now m is no longer arbitrary but given by the equilibrium magnetization determined from (4.20). Note that in zero magnetic field m ≡ 0 for T > Tc , implying that u(β) = u(β, 0) ≡ 0 in the high temperature phase.
Specific heat: The specific heat per spin is given by c(β, h) =
∂u ∂u = −kB β 2 . ∂T ∂β
48
All β dependence of u is via the β dependence of m along the solution curve of (4.20), so from (4.34) ∂m , c(β, h) = kB β 2 (Jm + h) ∂β with
h ∂m ∂m i = (1 − m2 ) (Jm + h) + βJ ∂β ∂β
⇐⇒
∂m (Jm + h)(1 − m2 ) = , ∂β 1 − βJ(1 − m2 )
following lines of reasoning analogous to those used in the computation of χ. This finally gives c(β, h) = kB β 2
u
(Jm + h)2 (1 − m2 ) 1 − βJ(1 − m2 ) u
kT 0.4
0.6
0.8
1.0
1.2
1.4
(4.35)
kT 0.4
J
-0.1
-0.1
-0.2
-0.2
-0.3
-0.3
-0.4
-0.4
-0.5
-0.5
0.6
0.8
1.0
1.2
1.4
J
Figure 4.4: Left: Zero field internal energy of the CW model as a function of kB T /J showing non-analytic behaviour at kB T /J = 1. Rigth: internal at non-zero field (h = 0.01), where the behaviour is smooth. C
C 1.2
1.4 1.2
1.0
1.0
0.8
0.8 0.6 0.6 0.4 0.4 0.2
0.2 kT 0.4
0.6
0.8
1.0
1.2
1.4
kT 0.4
J
0.6
0.8
1.0
1.2
1.4
J
Figure 4.5: Left: Zero field specific heat of the CW model as a function of kB T /J showing a cusp at kB T /J = 1. Rigth: specific heat at non-zero field (h = 0.01), where the behaviour is smooth.
49
Phase Transition and Critical Phenomena We have seen that in zero external field, h = 0 the CW model undergoes a phase transition from a non-magnetized state at high temperatures (kB T ≥ J) to a state with non-zero spontaneous magnetization (i.e. macroscopic magnetic order) at low temperatures (kB T < J). Precisely at the critical temperature kB Tc = J the thermodynamic functions are non-analytic functions of temperature and field. It is worth to properly appreciate this phenomenon: just a slight change in temperature (or field) fundamentally changes the macroscopic behaviour of the system! This is a non-trivial prediction of the theory and it agrees well with observations on magnetic materials. Magnetization: To analyze the nature of the non-analyticities in the vicinity of (T = Tc , h = 0) we look at the fixed point equation at h = 0 and for T . Tc , where m ≪ 1, so by expanding 1 m = tanh(βJm) ≃ βJm − (βJm)3 3 one obtains m2 ≃ 3
βJ − 1 1 . βJ (βJ)2
Using βc J = 1, introducing the reduced temperature t via βJ − βc J Tc − T = ≡ −t , βJ Tc and approximating (βJ)2 ≃ 1, we get q √ m ≃ 3( − t) = 3 (−t)β ,
where
β=
1 2
(4.36)
is the critical exponent of the magnetization (not to be confused with the inverse temperature). ExercisepTry to compute the first correction to the asymptotic behaviour of m near Tc . Assuming m = 3( − t) (1+at), determine a, by using higher orders in the expansion of the hyperbolic tangent. Susceptibility: In case of the susceptibility (for T & Tc , thus m = 0) we have χ=
β = J −1 t−γ 1 − βJ
where
γ=1
(4.37)
is the critical exponent of the susceptibility. Looking at χ just below the critical temperature (where m 6= 0 but small), we write χ = J −1
βJ(1 − m2 ) 1 ≃ J −1 |t|−γ , 2 1 − βJ(1 − m ) 2 50
(4.38)
thus the same kind of divergence, albeit with a different prefactor. Exercise Verify the last statement using the asympotic expression for m. Specific heat: We saw that the specific heat exhibits a cusp, and a discontinuity at Tc . Analyzing the expresssion in detail, we get (for T . Tc ) c(β) = kB
(−3t)(1 − m2 ) 3 (βJm)2 (1 − m2 ) ≃ k ≃ kB (1 + t) , B 1 − βJ(1 − m2 ) βJ(t + m2 ) 2
t.0,
(4.39)
thus the specific heat has a finite limit at Tc , behaves linearly in T immediately below the critical point, and is exactly zero above the critical point. One often writes c(β) ∼ |t|−α
with
α=0
(4.40)
for this behavior. Systems in which the specific heat diverges (with α > 0) do exist. Critical Isotherm: Let us finally look at the behaviour of m as a function of (small) h exactly at Tc . Expanding the hyperbolic tangent gives 1 m ≃ βc Jm + βc h − (βc (Jm + h))3 3 For small h, both h and m will be small, and (with βc J = 1) we can approximate 1 βc h ≃ m3 3
⇔
m(Tc , h) ∼ h1/δ
(4.41)
with a critical exponent δ = 3 for the critcal isotherm.
4.3
The Mean-Field Approximation
Consider an Ising model with finite range interaction (e.g. nearest neighbour interaction on a finite dimensional lattice), with Hamiltonian H(S) = −
X 1X Jij Si Sj − h Si , 2 i,j
i
thus Gibbs-Boltzmann equilibrium distribution p(S) =
1 exp[−βH(S)] Z(β)
Unlike in the Curie Weiss case, the model can in general not be exactly solved (the free energy or average magnetization computed) . The only exception are basically (i) systems in one dimension, (ii) systems in two dimension without external field. 51
An approximate description of the equilibrium properties of the system is obtained by using a mean-field approximation as follows. Given any site i, we can write the spin configuration S of the entire system as S = (Si , S∂i , S∂ 2 i ) in which S∂i denotes the spin-configuration of the spins connected to i, while S∂ 2 i denotes the spin-configuration of the remainder of the system. In terms of this decomposition H(S) = −Si (hi + h) + H (i) (S (i) )
(4.42)
with an internal field hi defined by hi = hi (S∂i ) =
X
Jij Sj
(4.43)
j
and a cavity Hamiltonian H (i) (S (i) ) which is the Hamiltonian of the system from which Si and the links connecting to it have been removed. We have H (i) (S (i) ) = H (i) (S∂i , S∂ 2 i )) .
(4.44)
It contains all contributions to the total Hamiltonian except terms involving Si . The contribution of Si to the total energy thus has the form of a Zeeman energy of the spin in a total field composed as the sum of internal and external field. Expressing the Gibbs-Boltzmann distribution in terms of S = (Si , S∂i , S∂ 2 i ), we can express the marginal p(Si , S∂i ) in terms of the decomposition (4.42) as P −βH (i) (S (i) ) X S∂ 2 i e βSi (hi +h) p(Si , S∂i , S∂ 2 i ) = e × p(Si , S∂i ) = (4.45) Z(β) S∂ 2 i
In this expression, the second factor is a complicated function of S∂i due to the interactions that existed with the S∂ 2 i ). By contrast, the conditional disttribution p(Si |S∂i ) is simple: noting that the local field hi depends only on S∂i and not on Si itself, and as moreover X p(Si |S∂i ) = 1 Si
independently of S∂i , we have p(Si |S∂i ) =
eβSi (hi +h) . 2 cosh[β(hi + h)]
This allows to express the Gibbs average of Si as D E hSi i = tanh[β(hi + h)] 52
S∂i
(4.46)
(4.47)
where the hyperbolic tanget is the conditional average of Si at given S∂i and the final average over the marginal P −βH (i) (S (i) ) S∂ 2 i e p(S∂i ) = 2 cosh[β(hi + h)] × Z(β) remains to be performed. The problem is that due to the complicated nature of the marginal, this is virtually impossible to evaluate. The mean-field approximation consists in approximating the average (4.47) by replacing the average of the hyperbolic tangent by the hyperbolic tangent of the average, leading to i h X hSi i ≃ tanh [β(hhi iS∂i + h)] = tanh β Jij hSj i + h (4.48) j
This is now a coupled system of self-consistency equations for the local magnetizations mi = hSi i.
Assuming the system to be translationally invariant, so that mi = m for all i, and assuming that we have uniform nearest neighbour interactions Jij = J for (i, j) n.n, we get m = tanh[β(zJm + h)]
(4.49)
in which z is the coordination (or number of nearest neighbours; for d-dimensional hyper-cubic lattices, one would have z = 2d. Note: This equation is very similar to the fixed point equation of the Curie-Weiss model. For that model in fact, we had assumed all to all interactions of uniform strenght J/N . As the number of nearest neighbours is z = N − 1 in that case we should expect the mean-field approximation m = tanh[β(z(J/N )m + h)] which agrees with the exact result in the thermodynamic limit. The Curie-Weiss model is constructed in such a way that the mean-field approximation is the exact solution for this model. This is related to the fact that in the Curie Weiss model the internal field of spin i is X J X Sj , hi = Jij Sj = N j
j(6=i)
and it converges to the magnetization per spin in the thermodynamic limit by the law of large numbers. It is thus a non-fluctuating quantity, and this is the reason why replacing the average of the tanh by the tanh of the average is exact in this case.
4.4
Variational Approach
The variational apprroximation provides an alternative way to evaluate partition functions, thus free energies, of systems in an approximate fashion in cases where an exact evaluation is 53
infeasible. The method exploits Jensen’s inequality for the evaluation of convex functions, as follows. Suppose H = H(S) is a Hamiltonian of some interacting system (not necessarily Ising) for which the evaluation of tne partition function X Z(β) = exp[−βH(S)] S
is infeasible. Let H0 = H0 (S) be an alternative, simpler Hamiltonian for which the partition function X Z0 (β) = exp[−βH0 (S)] S
as well asQ expectation values of observables depending on one or several spin variables, such as hSω i0 = h i∈ω Si i0 can be computed.
Use this alternative Hamiltonian to write
1 X −β(H(S)−H0 (S)) −βH0 (S) e e Z0 (β) S D E −β(H(S)−H0 (S)) = Z0 (β) e
Z(β) = Z0 (β)
0
(4.50)
Use Jensen’s inequality for averages of convex functions (in the present case the exponential function to obtain the bound Z(β) ≥ Z0 (β) e−βhH(S)−H0 (S)i0 which entails F (β) ≤ F0 (β) + hH(S) − H0 (S)i0 .
(4.51)
The best approximation to the free energy F (β) will then be obtained by minimizing the r.h.s. of (4.51) over the set of parameters characterizing the variational Hamiltonian H0 . The minimum of the r.h.s. will then define a variational approximation to the true free energy. It is always an upper bound.
4.5
Variational Mean Field
Let us apply the variational method to a ferro-magnetic Ising system with nearest neighbour interactions on a d dimensional hypercubic lattice in a uniform external field h, and let us take a system of non-interacting spins as the variational test Hamiltonian. X H0 (S) = − hi Si . (4.52) i
54
For this test Hamiltonian we have Z0 =
Y
2 cosh(βhi )
i
and hSi i0 = tanh(βhi ) ≡ mi ,
hSi Sj i0 = hSi ih Sj i0 = mi mj
and
hence 1 F (β) ≤ F˜ (β) ≡ − ln Z0 + hH(S) − H0 (S)i0 β X 1X 1X = − ln[2 cosh(βhi )] − Jij mi mj − (h − hi )mi . β 2 i
i,j
(4.53)
i
To find the minimum, solve i hX ∂ ˜ Jij mj + h − hi + tanh(βhi ) = 0 . F (β) = − tanh(βhi ) − β(1 − tanh2 (βhi )) ∂hi j
which is equivalent to arctanh(mi ) = βhi = β
hX j
i Jij mj + h
⇐⇒
mi = tanh
hX j
i Jij mj + h .
In a homogeneous, translationally invariant system, we expect mi = m, hence for a system of coordination z m = tanh[β(zJm + h)] which is exactly the self-consistency equation of the mean-field approximation. Note: Although using a system of non-interacting spins as variational test Hamiltonian reproduces the results of the earlier ad-hoc and un-controlled mean-field approximation, the advantage of the variational approach is that one is not restricted to this simple choice. By enlarging the family of variational test Hamiltonians, one is bound to obtain systematically better approximations ot the true free energy. Exercise: In the study of critical phenomena in finite dimensional systems one often approximates the physics of Ising type systems by using models with continuous degrees of freedom, such as the (lattice) Ginzburg-Landau-Wilson Hamiltonian H(φ) = −
X X 1X (φ2i − 1)2 , Jij φi φj − h φi + u0 2 i,j
i
i
in which the φi are continuous degree of freedom, that can take any real value. The third contribution, a sum of quartic on-site potentials is introduced to ensure that the Gibbs distribution 55
corresponding to this model would favour states with φ ≃ ±1 which are the minima of each of the quartic potential. For large u0 the system is therefore expected to be Ising-like. The partition funciton for this system would be Z Y Z= dφi exp[−βH(φ)] . i
Although integrals are often easier to evaluate than discrete sums, nobody has been able to evaluate the partition function of the LGW Hamiltonian. Solve a variational approximation of this system, using e.g. X 1 2 H0 = 2 (φi − µi ) . 2σ i i
with two independent variational parameters, µi and σi per site (in a homogeneous system one expects the solution of the minimization problem to be independent of i of course). Note: I have not seen this problem posed in any textbook or addressed in any journal article, so I don’t know the correct answer, and I am really interested to have your results (soon!) Simplified versions of this have been looked at, though.
56
Chapter 5
System with Finite Dimensional Site Disorder We will here briefly look at mean-field like systems with finite dimensional site disorder. We will look at systmes in which either the bonds are expressed in terms of random variables associated with sites, and take the form Jij =
1 Q(ξi , ξj ) , N
i 6= j ,
Jii = 0
(5.1)
with Q a symmetric function from Rp × Rp to R, and the ξi assumed to be random in Rp , or the external fields hi depend randomly on i or systems in which these two modes of randomness occur together. Restricting ourselves to Ising type models, we will look at Hamiltonians of the form X 1X Jij Si Sj − hi Si H(S) = − 2 ij
i
with the Jij or the hi (or both) random as indicated above. Models of this type can be used to describe disordered magnets in physics. However, as indicated in Sec. 1.2 there are systems which would initially be defined and motivated in terms of a stochastic dynamical evolution equation ! N X Si (t + ∆t) = sign (5.2) Jij Sj (t) + hi + ξit j=1
(recall Eq (1.1)). Their asymptotic behaviour can be described in terms of thermal GibbsBoltzmann equilibrium distributions provided the noise ξit has the right form, namely ρβ (ξ) = 57
1 d 2 dξ
tanh(βξ). We will provide a full justification of this statement later on. Let us for now just notice that with noise distribtued independently (in i and t) according to ρβ (ξ) we have hSi (t + ∆t)i|S∂i (t)
"
N X = tanh β Jij Sj (t) + hi j=1
#
for the conditional average of Si (t + ∆t) at given S∂i (t) (i.e. taking a functional form as we have discussed above in connection with the mean-field approximation for ‘thermal’ systems). Exercise For ρβ (ξ) as given, verify the above statement about the conditional average of Si (t + ∆t). In Sec 1.2 we have seen that models of neural networks are of this type, with the Jij denoting synaptic efficacies (which can be either inhibitory or excitatory), and the hi related to neural firing thresholds ϑi via ϑi = −hi . Similarly, models for the dynamics of attitude change can be cast into this form.
5.1
General Solution
We solve the model for general Q in (5.1), keeping p, the dimension of the random vectors ξi fixed. You may for the sake of definiteness imagine that the components ξiµ of th ξi can each only take values in {±1} so that the number of different possible vectors is 2p . However, we note at the outset that all we need is that the total number of different possible ξi vectors is finite. The solution of the model is based on a decomposition of sites into subgroups of sites which have the identical ξi -vectors. Define the subgroup or sub-lattice Ix = {i; ξi = x}
(5.3) S
The set I = {1, 2, . . . , N } of sites is then a union of Ix , i.e. I = x Ix . Interactions between spins on sites i and j then depend only on the sub-lattices to which i and j belong. Denote by |Ix | the size of the sub-lattice Ix . We then have that px =
|Ix | |Ix | = |I| N
(5.4)
is the fraction of sites for which ξi = x, which in the large system limit approaches the probability that the random vector ξi takes the value x.
58
In terms of these definitions we have XX N X Q(x, x′ )px mx (S) px′ mx′ (S) − hi Si + H0 H(S) = − 2 ′ x
(5.5)
i∈Ix
x,x
with sub-lattice magnetizations mx (S) =
1 X Si |Ix |
(5.6)
i∈Ix
and the O(1) term
H0 =
1X Q(x, x)px 2 x
correcting for the absence of self-interactions which would otherwise be included in (5.5). Evaluating the partition function, ee use the δ-function method to decouple spins, i.e. we introcuce the mx as integration variables, enforcing their definitions via Z X 1 = |Ix | dmx δ |Ix |mx − Si i∈Ix
and using Fourier representations of δ functions, as explained in Ch 3.2. This gives (ignoring the O(1) contribution H0 to the Hamiltonian) ( " X Z Y dmx dm βX ˆx Q(x, x′ )px mx px′ mx′ exp N Z = 2π/|I | 2 x x S x,x′ # ) XX X −i px mx m ˆx + (im ˆ x + βhi )Si
(5.7)
x i∈Ix
x
The Si now appear linearly in the exponential so that the Si -summations in the partition function factor w.r.t. i, so ( " Z Y dmx dm ˆx βX Z = Q(x, x′ )px mx px′ mx′ exp N 2π/|I | 2 x x x,x′ ) # i XX h X (5.8) ln 2 cosh(im ˆ x + βhi ) −i px mx m ˆx + x i∈Ix
x
Now by the law of large numbers (for |Ix | ≫ 1) i i X h 1 X h ln 2 cosh(im ˆ x + βhi ) = N px ln 2 cosh(im ˆ x + βhi ) |Ix | i∈Ix i∈Ix D h iE ˆ x + βh) = N px ln 2 cosh(im h
59
(5.9)
where h. . .ih denotes an average over the distribution of random fields.
For the partition function we therefore have ( " Z Y dmx dm ˆx βX Q(x, x′ )px mx px′ mx′ Z = exp N 2π/|I | 2 x ′ x x,x
−i
X
px mx m ˆx +
x
X
D
h
ˆ x + βh) px ln 2 cosh(im
x
iE
h
#)
(5.10)
This is now explicitly of a form that can be evaluated by the saddle point method. Note also, that results do not depend on the specific realization of the randomness, but only on the distributions of the random fields hi and the ξi ! Fixed point equations are im ˆx = β
X
Q(x, x′ )px′ mx′
x′
mx =
D
E . tanh im ˆ x + βh h
Inserting the solution of the first into the second, we get iE D h X , Q(x, x′ )px′ mx′ + βh mx = tanh β
(5.11)
h
x′
a system of coupled fixed point equations for the mx . The number of coupled equations is equal to the number of different possible x-vectors. The free energy is given by βX Q(x, x′ )px mx px′ mx′ 2 x,x′ X iE X D h + px ln 2 cosh β Q(x, x′ )px′ mx′ + βh
−βf (β) = −
x′
x
h
(5.12)
in which the mx are solutions of the fixed point equations. Exercise: Rederive these results using the combinatorial method. Note that one can obtain the set of fixed point equations (5.11) also directly by using a mean-field approximation to study systems with Hamiltonians with couplings given by (5.1) and random external fields hi . To see this, recall that the mean-field approximation gives a set of fixed-point equations for the mi = hSi i of the form i h X mi = tanh β Jij mj + βhi j
60
Inserting (5.1) for the Jij and cosidering i ∈ Ix we get i h X Q(x, x′ )px′ mx′ + βhi mi = tanh β
(5.13)
x′
where we have introduced sub-lattice (equilibrium) magnetizations mx via mx =
1 X mi |Ix | i∈Ix
Inserting this into (5.13) we get iE i D h X h X 1 X mx = , Q(x, x′ )px′ mx′ + βh Q(x, x′ )px′ mx′ + βhi = tanh β tanh β |Ix | h ′ ′ i∈Ix
x
x
where the last equality is a consequence of the law of large numbers. But this is nothing but the fixed-point equation of the exact solution, meaning that for systems of the form considered here mean-field theory is indeed exact. Depending on the type of interaction-kernel Q(x, x′ ) the set of fixed point equations (5.11) can allow solutions with heterogeneous, i.e. x dependent sub-lattice magnetizations, and a larger diversity of phases than the paramagnetic and ferromagnetic phases of a simple ferro-magnetic Ising model Systems of this type have been considered to describe the physics of spin-glasses (magnetic systems which have a mixture of ferromagnetic and anti-ferromagnetic interactions, leading to frustration, thus the possibility of a large number of ground states or meta-stable states.
5.1.1
Neural Networks – The Hopfield Model
Hopfield took the physics of spin-glasses as a guiding metaphor to investigate the question whether a system of neurons, coupled via a mixture of excitatory (ferromagnetic) and inhibitory (anti-ferromagnetic) synaptic interactions could function as an associative memory. Using Si = ±1 to denote the firing/non-firing states of a neuron respectively, one would like to investigate the question whether neurons coupled via synapses Jij , and following a stochastic dynamics of the form ! N X Si (t + ∆t) = sign (5.14) Jij Sj (t) − ϑi + ξit j=1
could have long-lived states with certain prescribed values hSi i = mξi (with m a noiselevel/temperature-dependent amplitude) for the local magnetizations; these would then describe long lived firing patterns (memories) of the entire network. 61
Remark Looking at brain architecture and principles governing synaptic transmission, there are at least two features which rule out using equilibrium statistical mechanics for their analysis. • Synaptic interactions Jij are generally non-symmetric, Jij 6= Jji ; in fact, they are typically uni-directional . Thermal equilibrium as described by Gibbs-Boltzmann equilibrium distributions, however, exists only for systems with symmetric interactions. • A somewhat more subtle point is related to the nature of the noise in the system: one would in principle expect the noise in realistic neural systems to be Gaussian, say with variance σ 2 . For such systems one would get # " N 1 X Jij Sj (t) + hi hSi (t + ∆t)i|S∂i (t) = erf √ 2σ 2 j=1 rather than a hyperbolic tangent, as would strictly be required to use equilibrium statistical mechanics for the description of long-term behaviour (we will give a proper justification of this statement later on). The fact that one nevertheless chooses to apply equilibrium statistical mechanics machinery is related to the fact that the functions tanh(βx) and ˜ with β˜ = 2β/√π are very close to each other for all x ∈ R, so equilibrium and erf(βx) statistical mechanics should provide an excellent approximation of actual behaviour. The case of a single pattern - Mattis Model In the case of having a single stable attractor correlated with a prescribed set of ξi , one can choose Jij =
1 ξi ξj N
and ϑi = −hi = 0, which is a special p = 1 case of the general setup with Q(ξi , ξj ) = ξi ξj . The fixed point equations are then h X i ξξ ′ pξ′ mξ′ , mξ = tanh β ξ ∈ {±1} , ξ′
or explicitly h i m+ = tanh β(p+ m+ − p− m− ) h i m− = − tanh β(p+ m+ − p− m− )
The quantity of interest is the overlap m of the equilibrium spin-state with the pattern {ξi }, defined by X 1 X m= ξi hSi i = pξ ξ mξ N i
ξ
62
which in the present one-pattern case is simply m = p+ m+ − p− m− Combining the two fixed-point equations we can obtain an equation for m alone, namely m = p+ m+ − p− m− = (p+ + p− ) tanh(βm) = tanh(βm) which is equivalent to the fixed point equation for the Curie Weiss model (with J = 1. So we know that there is a non-zero positive solution for β > βc = 1. Remark There is a much simpler way of demonstrating the thermodynamic equivalence of the Mattis model and a simple non-random ferromagnet (a Curie-Weiss model in the present infinite range case). Via a (local) so-called gauge transformation ξi Si → S˜i the Hamiltonian of a Mattis model 1 X H(S) = − ξi ξj Si Sj 2N i,j
transforms into the Hamiltonian of a Curie-Weiss model X ˜ =− 1 H(S) S˜i S˜j . 2N i,j
As evaluating the partition function as a sum over {Si } configurations is equivalent to performing the partition function as a sum over {S˜i } configurations, the statement follows. Clearly a homogeneous magnetization hS˜i i = m in this Curie-Weiss model translates into an inhomogeneous magnetization hSi i = mξi for the original Mattis model. Generalizing to several patterns - the Hopfield Model Following a proposal by Hopfield the single-pattern Mattis model can be generalized to the simultaneous storage of several patterns by choosing p 1 X µ µ 1 Jij = ξ i ξ j = ξi · ξj N N µ=1
This prescription in in some sense an implementation of the so-called Hebb rule, according to which synaptic interactions are determined by correlations of pre- and post-synaptic activity. It is once more a special case of the general random-site model setup with Q(ξi , ξj ) = ξi · ξj , so the fixed point equations (5.11) describing the equilibrium state(s) of the system directly apply to the present case. For hi ≡ 0, h X i ξ · ξ ′ pξ′ mξ′ , mξ = tanh β (5.15) ξ′
63
The scalar-product form of the interactions suggests further structure, which will both amount to a simplification of the set of fixed-point equations, and — more importantly perhaps — provide a better level of description of the underlying information processing problem, viz. associative information storage (and retrieval). Introducing the overlap vector m=
X ξ
D E pξ ξ mξ = ξ mξ
ξ
(5.16)
we see that the fpe’s can be written as mξ = tanh [βξ · m] which after multiplying by ξ and averaging over ξ gives D h iE , m = ξ tanh βξ · m
(5.17)
ξ
or, in components
iE D h X , mµ = ξ µ tanh β ξ ν mν ξ
ν
(5.18)
Note that mµ is nothing but the overlap of the equilibrium spin-configuration with the pattern ξiµ , X 1 X µ pξ ξ µ mξ . mµ = ξi hSi i = N i
ξ
Exercise One can use the overlaps mµ instead of the sub-lattice magnetizations as the primary objects of the theory, by realizing that the Hopfield model Hamiltonian can be written as p NX 2 p mµ + H(S) = − 2 2 µ=1
and evaluating the partition function directly in terms of the mµ (using your favourite method(s)). Do that calculation, and derive the fixed point equations (5.18) directly without using sub-lattice magnetizations. Show that the free energy expressed in terms of the overlaps is given by p p iE X βX 2 D h −βf (β) = − . ξ ν mν mµ + ln 2 cosh β 2 ξ ν=1
µ=1
64
Retrieval states For unbiased binary random patterns with Prob(ξiµ = ±1) = 21 the fixed point equations, Eqs. (5.17) or, equivalently, their reformulation (5.18) in component form have solutions for which the local equlibrium ‘magnetizations’ hSi i which are correlated only with a single pattern, say µ, implying that we have solutions with and mν = 0 for ν 6= µ .
mµ = m > 0 ,
As the couplings are symmetric w.r.t permutation of the patterns, we can assume without loss of generality that µ = 1. These states are the so-called retrieval states. To see this, assume mν = mδν,1 on the r.h.s. of (5.18). We then have D E m1 = m = ξ 1 tanh [βξ 1 m] , ξ
giving m = tanh [βm] for the FPE determining the retrieval amplitude m (which is equivalent to the FPE of the Curie-Weiss model), and D E mν = ξ ν tanh [βξ 1 m] = 0 ξ
as claimed, using independence of the
ξν
and
hξ ν i
ξ
= 0.
For sufficiently low noise levels β > βc = 1 therefore, the system will retrieve any of the stored patterns. Other meta-stable states The desired p retrieval states are not the only solutions of the fixed-point equations. The symmetry of the couplings under permutation of pattern indices suggest that there may be solutions of the form mµ = m for 1 ≤ µ ≤ n,
mµ = 0 for µ > n .
Inserting this into (5.18), requires n D h X iE m = ξ µ tanh βm ξν , ν=1
ξ
µ≤n,
for the amplitude m, which on summing over µ gives m=
n n iE h X 1 D X µ ξν , ξ tanh βm n ξ ν=1
µ=1
65
Assuming that the amplitude becomes small at sufficiently large noise levels (small β), one can do an expansion of the tanh as for the Curie-Weiss model to find r −3t , (5.19) m≃ 3n − 2 where t = (T − Tc )/Tc and Tc = 1.
Exercise (i) Verify this statement. (ii) Show that this result can also be obtained without initially assuming that the n non-zero mµ all have the same amplitude. This analysis should actually reveal that the sign of the amplitude can be freely chosen for each µ, i.e. that mµ = ±m for µ = 1, . . . , n, with each sign independently chosen. In the regions where β is not close to the critical value, the equation for the symmetric amplitude m can be solved numerically (try, if you have access to a computer). Stability To assess whether these additional solutions are stable, one has to check whether the fixed point characterizing these solutions actually corresponds to a minimum of the free energy (5.12) evaluated as a function of the ‘order parameters’ {mξ } or alternatively the {mµ } at these saddle points (after elimination of the conjugate variables {m ˆ ξ } (respectively the {m ˆ µ }). Thus, using the representation in terms of the {mµ }, we have to look at the matrix D with elements Dµ,ρ =
D i h X ∂ 2 f (β) = δµ,ρ − β ξ µ ξ ρ 1 − tanh2 β ξ ν mν iξ ∂mµ ∂mρ ν
evaluated in the solution mµ = m for µ ≤ n of the fixed point equations, and check whether all it’s eigenvalues are positive. Due to the symmetries of the problem these eigenvalues can in fact be computed, and it turns out that only states with odd n can be stable, and only the n = 1 (retrieval) states are stable right where they appear (at β ≥ 1), whereas the n ≥ 3 solutions each have a critical βnc > 1, where the solutions become stable, with βnc increasing with n. Remark The present method of solving the model assumes that p remains finite in the thermodynamic limit N → ∞ though it can be taken arbitrarily large. The methods we used to solve the theory do however loose their validity when p grows with N . To answer the question how many patterns can be stored (and retrieved) in a Hopfield model of a given size, one needs to develop a theory for p = αN with α = O(1). Soving the model in that limit requires the use of more sophisticated methods for disordered system, such as the replica approach. We’ll come to that later.
66
Chapter 6
A Model for Operational Risk Operational risk (OR) is classified as “the risk of losses resulting from inadequate or failed internal processes, people and systems or from external events”. Possible OR-categories (in the financial industry) are 1. human processing errors, e.g., mishandling of software applications, reports containing incomplete information, or payments made to incorrect parties without recovery 2. human decision errors, e.g., unnecessary rejection of a profitable trade or wrong trading strategy due to incomplete information 3. (software or hardware) system errors, e.g., data delivery or data import is not executed and the software system performs calculations and generates reports based on incomplete data 4. process design error, e.g., workflows with ambiguously defined process steps 5. fraud and theft, e.g., unauthorized actions or credit card fraud 6. external damages, e.g., fire or earthquake However, operational risk is clearly relevant for any type of enterprise. In the banking industry, regulators (at the Bank for International Settlements in Basel) require that banks set aside a certain minimum amount of capital to cover losses due to operational risks. The so-called Basel II accord of 2001 suggests three methodologies to estimate the required capital : (i) the Basic Indicator Approach (BIA), (ii) the Standardized Approach (SA) and (iii) the Advanced Measurement Approach (AMA). Under the BIA, the required capital is determined by taking 15% of the banks’ average gross income over the previous three years. The SA is only slightly 67
more advanced in that the banks’ operations are divided into 8 business lines, each of which has a specific weight. The required capital is calculated as the weighted average of the (nonnegative) gross income from the business lines over the previous three years. Under AMA, banks are responsible for designing their own measurement approach and setting assumptions on the loss distributions of individual OR categories; overall losses are then aggregated, assuming that losses in over these categories are independent. Hovever, a moment of reflection shows that losses incurred through failing processes within an organization are not independent. Indeed an organization can be defined as a system of interacting and mutually supporting processes. A failing process will stop providing input (or proper input) to all other processes which normally rely on said process to be working properly; as a consequence the likelihood of these other processes to fail as well (partially or completely) will in general be increased. The Basel II regulatory document entirely misses this point, hence misses mechanisms that can lead to avalanches of process failures (black-outs).
6.1
Set-Up of the Model
A simple model to analyze OR describes an organization as a collection of N processes which can be either up and running (ni (t) = 0) or down ni (t) = 1). In order to maintain a stable running state over the time increment t → t + ∆t, each process i needs to receive support at time t, which is denoted hi (t) and takes the form X hi (t) = ϑi − Jij nj (t) − ξi (t) . (6.1) j
Here, ϑi denotes the average support received by process i in a fully functional environment, while Jij represents the impact on the average support if process j breaks down. Finally, the ξi (t) are taken to be zero mean, (Gaussian) random fluctuations, which represent non-systematic internal and external perturbations to the environment (e.g., fire, earthquake, voltage fluctuations within electric supplies, etc.). A process breaks down in the next time step if hi (t) < 0. Thus, the dynamics of a processes’ state is given by X ni (t + ∆t) = Θ Jij nj (t) − ϑi + ξi (t) , (6.2) j
By suitable rescaling of the average support and impact parameters ϑi and Jij , respectively, the ξi (t) can be taken to have unit variance. The rescaled variables, can then be related to 68
the unconditional and conditional failure probabilities. Consider a situation with all processes working at time-step t. The probability that process i will breakdown in the next time-step, referred to as pi , is given by integrating Eq. (6.2) over the noise ξi (t). Thus pi = Φ (−ϑi ) , where
(6.3)
x 1 + erf √ 2 is the (cumulative) probability distribution Φ(x) of the unit variance Gaussian. Similarly, defining pi|j as the probability process i will break down in the next step, given that currently process j is broken down while all others are working, gives us 1 Φ(x) = 2
pi|j = Φ (Jij − ϑi ) .
(6.4)
These relations may be inverted to obtain expressions for the model parameters in terms of the failure probabilities, ϑi = − Φ−1 (pi ) ,
Jij = Φ−1 (pi|j ) − Φ−1 (pi ) .
(6.5)
Note, that we could also omit rescaling to unit variance noise, and keep the structure of the above relations. The only difference would be that the probability distribution Φ(x) of√the unit 1 2 variance Gaussian is replaced by its variance σ counterpart, Φσ2 (x) = 2 1 + erf x/ 2σ 2 .
For general sets of un-conditional and conditional failure probabilities, this model is not analytically tractable. However, one can investigate systems for which the set these parameters is specified in a probabilistic fashion.
6.2
A Statistical Mechanics Approach
In the spirit of Hopfield’s modelling of neural networks we will introduce assumptions (unrealistic ones!) that will allow us to investigate the system using equilibrium statistical mechanics. We will assume that the Jij are symmetric, and will replace the Gaussian noise with thermal noise with probability distribution 1 βx Φβ (x) = 1 + tanh . (6.6) 2 2 These assumptions entail that the stochastic dynamics will satisfy detailed-balance with the Gibbs-Boltzmann equilibrium distribution generated by X 1X H(n) = − Jij ni nj + ϑi ni 2 i
i6=j
69
We will assume a stochastic setting for the Jij , assuming that interaction patterns of processes form a random network with average connectivity c, taking the Jij of the form Jij = cij
J + √ xij , c c
J
0
(6.7)
where cij = cji ∈ {0, 1} denotes absence/presence of an interaction, distributed according to c c P (cij ) = 1 − (6.8) δcij ,0 + δcij ,1 , i < j , N N
where J0 parameterizes the mean strength of the interactions, and where the xij are symmetric (but otherwise independent) quenched random variables of mean zero and unit variance, hx2ij i = 1 .
hxij i = 0 ,
(6.9)
Thus J 2 parameterizes the variance of the interaction stregths. The scalings with the mean connectivity c are introduced in such a way that it allows to investigate the extensive but sparse connectivity limit c → ∞, N → ∞, but c/N → 0 (as an approximation to the ] large N , large c, small c/N situation).
6.2.1
Replica Analysis
We need to calculate the average of the free energy of this system over the distribution of the disorder (the cij and the xi j), and in principle also over the site-random variables ϑi ; we will find that the free energy is self-averaging w.r.t. the latter). I.e, we need to evaluate −βf (β) =
1 hln Zi , N
where the average is over the bond thisorder. To compute the average of the logarithm of the partition function, we use the replica-identity 1 d hln Zi = lim ln hZ n i = ln hZ n i (6.10) n→0 n dn n=0
For integer n, Z n is the partition function of n independent identical copies (replica) of the system. The replica method consists in first evaluating hZ n i for integer n and then analytically continuing the result to real n and taking the n → 0 limit as required.
Thus we first have to evaluate hZ n i =
X
{na }
*
n X o X X exp β Jij nia nja − β ϑi nia a
i<j
70
ia
+
.
Here a = 1, . . . , n enumerates the replica. The bond disorder average factors w.r.t. the individual bonds (i, j), thus — inserting the specific form of the Jij — we get * ( ) )+ ( J X X X Y Jx ij 0 exp βcij + √ × exp − β ϑi nia nia nja hZ n i = c c a ia
{na } i<j
To proceed we exploit the fact that we are interested in the large c limit, so we can expand the first exponential, giving * J X Y Jxij X 0 n nia nja 1 + βcij hZ i = + √ c c a {na } i<j ) + ( 2 X X (βJ)2 2 + ϑi nia nia nja + . . . × exp − β cij xij 2c a ia
Assuming that all moments of the xij exist, the terms not written out expleicitly are o(c−1 ), and can be omitted. Using (6.9), and c hcij i = , N and re-exponentiating 1 + Na = exp(1/N ) + O(N −2 ), we have ) ( 2 X X (βJ)2 X X βJ0 X X n ϑi nia nia nja + nia nja − β hZ i = exp 2N a 4N a i,j
{na }
i,j
ia
This is now the partition function of a homogeneous system of N × n degrees of freedom {nia }, including interactions between sites, and between replica (although before averaging the replica were non-interacting (i.e. independent). Contributions O(1) in the exponential coming from including self-interactions have been suppressed in this expression. Exercise: Specify the neglected terms. Note that interactions can be entirely described in terms otf two types of macroscopic order parameters, the replicated fractions of failed processes ma =
1 X nia , N i
1≤a≤n,
(6.11)
and the Edwards-Anderson matrix of replica-overlaps qab =
1 X nia nib , N i
71
1 ≤ a, b ≤ n ,
(6.12)
in terms of which X
n
hZ i =
{na }
(
X (βJ)2 X 2 βJ0 X 2 ϑi nia qab − β ma + N exp N 2 a 4 ia
a,b
)
(6.13)
Coupling between sites is due to having squares of the above order parameters (generalizing the CW situation). We proceed using our favourite method (in the present situation, both Gaussian linearizations and the δ-function method would work. Using the latter, we get ( " Y X Z Y dma dm dq dˆ q βJ0 X 2 (βJ)2 X 2 ˆ a ab ab exp N ma + qab hZ n i = 2π/N 4π/N 2 a 4 a a,b a,b {na } # ) X X X X iX iX −i m ˆ a ma − m ˆ a nia + qˆab qab + i qˆab nia nib − β ϑi nia 2 2 a ia
a,b
ab
i
ia
The partition function now factors w.r.t. the site indices, though not w.r.t. the replica indices (i.e. replica are interacting through the unknown iˆ qab replica interaction parameters). Exploiting site-factorization, we can rewrite the above as
hZ n i =
Z Y dma dm ˆ a Y dqab dˆ qab a
−i
X a
2π/N
m ˆ a ma −
a,b
4π/N
( "
exp N
βJ0 X 2 (βJ)2 X 2 qab ma + 2 a 4 a,b #)
1 X iX (i) qˆab qab + ln Zeff 2 N i
a,b
.
(6.14)
(i)
Here Zeff is a single-site replica-partition function (the {na } sum extending over replicated indicator variables configuration at a single site), n o X (i) (i) Zeff = exp − βHeff ({na }) (6.15) {na }
with an effective single-site Hamiltonian given by X X iX (i) na qˆab na nb − βϑi −βHeff = i m ˆ a na + 2 a a
(6.16)
ab
(i)
(i)
The i dependence of Zeff is solely due to the ϑi . By the LLN the normalized sum over the ln Zeff in (6.14) converges to the expectation over the ϑ distribution so that we finally have ( " Z Y dma dm ˆ a Y dqab dˆ qab βJ0 X 2 (βJ)2 X 2 n hZ i = exp N ma + qab 2π/N 4π/N 2 a 4 a a,b
a,b
72
−i
X a
iX qˆab qab + h ln Zeff iϑ m ˆ a ma − 2 a,b
#)
(6.17)
This is now in a form that can be evaluated by the saddle-point method. Stationarity of the exponential w.r.t. the ma and the qab requires iˆ qab = (βJ)2 qab .
im ˆ a = βJ0 ma ,
Stationarity w.r.t. variations of the m ˆ a and the qˆab requires D E D E ma = hna i , qab = hna nb i , ϑ
ϑ
(6.18)
(6.19)
in which the inner angled bracket is an average w.r.t. the Gibbs-Boltzmann distribution generated by the effective Hamiltonian that defines Zeff . If we insert the expressions of im ˆ a and iˆ qab that we obtain from the first set of FPEs, we have Heff = −J0
X a
ma na −
X βJ 2 X qab na nb + ϑ na 2 a
(6.20)
ab
for this effective single site Hamiltonian at given ϑ. In terms of the solution of the FPEs the free energy per node is given by −βf (β) = lim lim
N →∞ n→0
1 lnhZ n i . Nn
(6.21)
Note that by evaluating hZ n i via a saddle point method at given n, we are effectively taking the N → ∞ rather than the n → 0-limit first, as strictly required. If we do (inserting the values of the conjugate variable), we get from (6.17) that o 1 n βJ0 X 2 (βJ)2 X 2 qab + h ln Zeff iϑ − ma − n→0 n 2 a 4
−βf (β) = lim
(6.22)
a,b
The problem that remains to be tackled is to solve the system of n(n + 3)/2 FPEs (6.19), and take the required n → 0-limit. This is not generally possible due to the replica interactions mediated by the qab . One solves this problem, by resorting to assumptions concering the transformation properties of the ma and the qab under permutation of the n replica. The simplest is the assumption of replica symmetry.
73
6.2.2
Replica Symmetric Solution
The replica symmetric (RS) ansatz is motivated by the observation that the replica are all identical and equivalent, so the solution should not break the symmetry between them. It is now known that the conclusion is hasty; the interaction between replica can lead to spontaneous breaking of the permutation symmetry among the replica (analogous to the breaking of the global spin-flip symmetry of the zero-field CW model in the low temperature solution!). Nevertheless the RS solution often does give a good qualitative first impression of the true behaviour. In the case of OR it is actually exact in the region of interest (processes mutually supporting each other, so low degrees of frustration). The RS ansats consists in assuming that ma = qaa = m ,
1≤a≤n,
qab = q ,
1 ≤ a 6= b ≤ n .
(6.23)
We note that the equality ma = qaa is a consequence of the fact that for the 0 − 1 variables na we have n2a = na . The RS version of Heff is then Heff
X βJ 2 X 2 X βJ 2 ϑna na − na + (m − q) q = − J0 m + 2 2 a a a
(6.24)
Thus the coupling between replica is only via a complete square. In evaluating Gibbs averages of the form P {n } (. . .) exp{−βHeff } h. . .i = Pa {na } exp{−βHeff }
we can therefore use Gaussian linearization to decouple replica that were coupled via a complete 2 square. Introducing the abbreviation Dz = √dz2π e−z /2 for the unit variance Gaussian measure we then have R P P {na } Dz(. . .) exp{−β a HRS (na )} R P (6.25) h. . .i = P {na } Dz exp{−β a HRS (na )}
in which
βJ 2 √ HRS (na ) = − J0 m + (6.26) (m − q) + J q z − ϑ na ≡ −hRS (z) na 2 depends only on the a single na . This Gaussian linearization has uncoupled the replica. The denominator in (6.25, i.e. Zeff , then gives n o Z X XZ Dz exp − β HRS (na ) = Dz ZRS (z)n (6.27) Zeff = {na }
a
74
with ZRS (z) = 1 + exp{βhRS (z)}
(6.28)
The numerator in (6.25 depends of course on the average we are evaluating. If we are interested in the expectation of a single na , we can write P R n exp{βh (z)n } ZRS (z)n−1 Dz a RS na a R hna i = DzZRS (z)n P R na exp{βhRS (z)na } ZRS (z)n Dz na ZRS (z) R . = Dz ZRS (z)n
Noting finally that ZRS (z)n → 1 in the relevant n → 0 limit, we can write this as Z Z exp{βhRS (z)} hna i = Dz = Dz Φβ (hRS (z)) 1 + exp{βhRS (z)}
(6.29)
with Φβ defined in (6.6). If we are interested in the expectation of a product of two indicator variables from different replica, we proceed similarly. Investing the knowledge that that ZRS (z)n → 1, we get P P Z nb nb exp{βhRS (z)nb } na na exp{βhRS (z)na } hna nb i = Dz ZRS (z) ZRS (z) Z = Dz Φβ (hRS (z))2 in the n → 0-limit.
Summarizing: the n → 0-limit of the RS FPEs take the simple form E DZ Dz Φβ (hRS (z)) m = ϑ E DZ Dz Φ2β (hRS (z)) q = ϑ
(6.30) (6.31)
with a Gaussian local field
hRS (z) = J0 m +
βJ 2 √ (m − q) + J q z − ϑ . 2
(6.32)
The free energy is obtained by evaluating (6.22) for the replica symmetric solutions of the order parameters. This gives D Z E o (βJ)2 1 n βJ0 2 2 2 2 −βfRS (β) = lim − nm − [nm + (n − n)q ] + ln Dz ZRS (z)n n→0 n 2 4 ϑ 75
The only non-trivial limit is that involving the last contribtution. It is evaluated by expanding R ZRS (z)n = 1 + n ln ZRS (z) + cO(n2 ), by noting that Dz = 1, and then by expanding ln 1 + R R n Dz ln ZRS (z) = n Dz ln ZRS (z) + cO(n2 ). All O(n2 ) terms don’t survive the n → 0-limit, so the final result is Z βJ0 2 (βJ)2 2 2 m − (m − q ) + Dz h ln ZRS (z)iϑ (6.33) −βfRS (β) = − 2 4 with
io n h βJ 2 √ ZRS (z) = 1 + exp β J0 m + (m − q) + J q z − ϑ 2
(6.34)
Exercise: Show that the replica symmetric fixed point equations can be recovered directly by requiring that fRS (β) is stationary w.r.t. variations of m and q. Exercise: Show explicitly that for a collection {naν } from k differnt replica we have k DY
n aν
ν=1
E
=
Z
Dz Φβ (hRS (z))k
in the n → 0-limit, where the expectation is the Gibbs-average generated by the effective replicated single site Hamiltonian Heff .
6.2.3
Results
The macroscopic properties of the system are determined by the ϑ distribtuion specifying the range of a-priori failure probabilities, the parameter J0 defining the average mutual dependency between processe, and the parameter J which quantifies the variance of the mutual dependencies across the network. In order to have predominantly cooperative interactions, need to be in the small J regime. Expect: for low a-priori failure probabilities, and low mutual dependencies, the system will typically be in the operational phase (O). If at given a-priori failure probability the mutual dependence (J0 ) between processes is increased there will be a point where spontaneous individual failures can trigger an avalanche of process failures. Heuristic analysis of FPEs, m = q =
DZ
DZ
Dz Φβ (hRS (z)) Dz Φ2β (hRS (z))
76
E
E
ϑ
ϑ
with a Gaussian local field hRS (z) = J0 m +
βJ 2 √ (m − q) + J q z − ϑ . 2
(6.35)
Try graphical solution Φβ (x) = 11 (1 + tanh(βx/2)) starting from the J = 0, constant ϑ limit. 1
Φβ(m) , m
0.8
0.6
0.4
0.2
0 0
0.5
1
1.5
2
2.5
m
Figure 6.1: Graphical solution of the FPEs in the J = 0-limit with ϑ = 2.5 ⇔ pi ≃ 0.01. Shown are the cases J0 = 2, which has only the low-m (O) phase, and J0 = 5, for which the low-m operational phase coexists with a high-m broken down phase (B). The graphical analysis for J = 0 reveals the existence of an operational phase (O), phases (C) where the operational and broken down phases coexist (C), and a broken-down phase at sufficiently large J0 . There will be hysteresis between the (O) and (B) phases. In the coexisting regime both the (O) and the (B) phase are locally stable; which one is absolutely stable, and which only meta-stable will be decided by a comparison of free energies. One can convince oneself that non-zero J will tend to smothen the effective Φβ curve and lead to a slight shift of the inflection point; anothe source of smoothing would be non-constant ϑ. Both would lead to slight deformations of phase boundaries, but the basic conclusions would be unaltered. Figure 6.2 shows the full solution for non-zero J — comparing with the a solution using nonthermal Gaussian noise — and the phase-diagram. Note that the Gaussian system will not exhibit thermal equilibrium despite symmetry of the interactions, so other techniques (dynamics) are needed. To compare the two cases, one has to match the thermal noise parameter to reproduce √ properties of the unit-variance Gaussian noise. Reproducing the variance requires β = π/ 3; but other measures could be taken, and would lead to different scalings of β. Note the dangers of meta-stability from an OR point of view. Systems in a meta-stable (O) phase would prefer to be in the (B) phase. Breakdown is therefore bound to occur at some point in such systems, without dramatic external causes (bubble-nucleation at first order phase 77
20
0.8
18
0.7
16
0.6
14 c
22
0.5
J0
m
1 0.9
12
0.4
10
0.3
8
0.2
6
0.1
4
0
2 0
5
10
15
20
0
J0
0.02
0.04
0.06 p
0.08
0.1
0.12
Figure 6.2: Left: Comparisons between stationary values of m for for symmetrically coupled networks with Gaussian noise (solid line) and networks √ with thermal noise (dotted line), as functions of J0 . The inverse temperature was β = π/ 3, and the a-priori homogeneous failure probability is p = 0.01. Right: Phase diagram of the network with thermal noise. In producing both diagrams, we set J = 0.2. transitions). Importantly, there is no visible sign of systems crossing from absolute stability to meta-stability! This is illustrated in figure 6.3 below. Perhaps worryingly, current trends across many industries point towards increasing dependency between processes (e.g. just-in time production).
78
Figure 6.3: Left: Loss record for a system of 100 heterogeneously interacting processes for a situation where the operational phase is stable but coexists with a meta-stable broken down phase (red), and for a system where J0 is slightly increased to make the operational phase metastable. The fluctuations of the two loss records are very similar. Both systems survive for the time-horizon shown. Right: Extending the time-horizon reveals a spontaneous breakdown of the meta-stable process network at a slightly later time. Note that no big event is needed to trigger the breakdown!
79
Chapter 7
Random Matrix Theory 7.1
Eigenvalues and the Spectral Density
Want to learn about eigenvalues of large matrices M . Consider real symmetric matrices. Eigenvalues defined via M xα = λxα , α = 1, . . . , N (7.1) where the λα are the eigenvalues, and the xα (6= 0)(!) are the eigenvectors. Eigenvalues are defined as roots of the characteristic polynomial p(λ) = det(λ1I − M ) = 0 For large matrices one is often interested in a limiting spectral density N 1 X δ(λ − λα ) ρM (λ) = N
(7.2)
α=1
as the matrix dimension N becomes large. Example Coupled oscillators on a closed ring. m¨ xℓ = −K[(xℓ − xℓ−1 ) + (xℓ − xℓ+1 )] ,
ℓ = 1, . . . , N .
Solving by xℓ (t) = eiωt vℓ requires ω 2 vℓ =
X
Mℓℓ′ vℓ′ ,
with
Mℓℓ′ =
ℓ′
80
K [2δℓℓ′ − δℓ+1,ℓ′ − δℓ−1,ℓ′ ] , m
thus to solve an eigenvalue equation for M . The difference structure entails that eigenvectors v = (vℓ ) are of the form vℓ = e−ikℓ . Periodic boundary conditions require eikN = 1, hence k = kµ = 2π N µ, µ = 1, . . . , N . Inserting into the eigenvalue equation gives λ(k) = ω(k)2 = 2
K [1 − cos(k)] m
The simplest way to compute the eigenvalue desity is to note that the eigenvalues λ(k) are 1 uniformly spaced in k-space, with k ∈ (0, 2π]. Denoting by ρ(k) = 2π the (normalized) density of k-values, we have dk m (1 − cos2 (k))−1/2 ρ(λ) = ρ(k) = dλ 4πK m which upon inserting cos(k) = 1 − 2K λ gives ρ(λ) =
m 1 r 2πK 1− 1−
m 2K
λ
0≤λ≤
2 ,
4K m
(7.3)
This spectral density has integrable singularities at its edges, commonly referred to as van Hove singularities. 1
0.8
ρ(λ)
0.6
0.4
0.2
0 0
0.2
0.4
0.6
0.8
1 λ
1.2
1.4
1.6
1.8
2
Figure 7.1: Spectral density of the harmonic chain as discussed for K = m/2.
7.2
The Random Matrix Problem
The random matrix problem is deals with generalizing considerations of the type above to situations where, for instance, force constants vary randomly across the system, so that elements of the matrices involved are random. 81
Other problems where spectra of random matrices are encountered and used are analysis of random graphs, random Schr¨odinger operators (decribing disordered conductors), random walks/diffusion on networks (Google page rank), energy levels of highly excited nuclei, quantum chaos, finance . . .. We will here look at the problem of computing spectral densities of sparse symmetric random matrices of the form Mij = cij Kij (7.4) where cij ∈ {0, 1} denotes presence or absence of a non-zero matrix element, and Kij its weight, if present. The term sparse refers to the limit in which the number of non-zero matrix elements in each row/column remains finite even in the limit of infinite matrix dimension. Such matrices describe (weighted) random graphs. If the cij are chosen independently (for i < j), the resulting graphs are so-called Erd¨os-Renyi random graphs.
7.2.1
Representation of the δ-function
The spectral density is typically computed using a representation of the δ-function as a limit of a Lorentz-distributions of zero-width: δ(λ − λα ) = lim
εց0
ε 1 1 1 = lim Im 2 2 εց0 π (λ − λα ) + ε π λ − iε − λα
(7.5)
Thus if vα is an eigenvector of M corresponding to the eigenvalue λα , we can appeal to the spectral theorem1 to express this as δ(λ − λα ) = lim
εց0
1 Im vα , [λε 1I − M ]−1 vα . π
Here λε = λ − iε, and (·, ·) denotes a scalar product.
Hence the overall sepctral density is
ρM (λ) = lim
εց0
= lim
εց0
X 1 Im vα , [λε 1I − M ]−1 vα πN α 1 Im Tr[λε 1I − M ]−1 πN
1
A brief reminder: the spectral theorem for symmetric or self-adjoint matrices says: if vα is an eigenvector of M corresponding to eigenvalue λα , i.e. if M vα = λα vα then for any function f (M ) we have f (M )vα = f (λα )vα , provided the r.h.s. exists.
82
The matrix-valued function G(z) = [z1I − M ]−1
is known as the resolvent of the matrix M . One can show that it exists and has finite norm for all z not in the spectrum of M . To exploit methods and heuristics of statistical physics, one uses [λε 1I − M ]−1 =
∂ ln[λε 1I − M ] ∂λ
and Tr ln A = ln detA = −2 ln[detA]−1/2
and the fact that [detA]−1/2 can be expressed in terms of a Gaussian integral. Summarizing, we have ρM (λ) = lim − εց0
where ZM (λε ) =
Z Y i
∂ 2 Im ln ZM (λε ) πN ∂λ
n i X o dx p i exp − λε δij − Mij xi xj 2 2π/i ij
(7.6)
(7.7)
Note that ZN is like a partition function of a system with continuous degrees of freedom — actually a harmonically coupled system, as the potential energy is quadratic in the coordinates — at an imaginary value of the temperature. A non-zero value of ε ensures that the integral exists. The problem is thus to compute the free energy of a disordered harmonic system, and evaluate a λ derivative. To deal with the disorder average one can use the replica method. Alternatively, for sparse Matrices, one can compute the partition function and the spectral density directly on a single large instance.
7.3
Cavity Approach to Sparse Matrices
Performing the λ derifative in (7.6),(7.7) allows to interpret the result as an ‘average’ X 1 Re hx2i i ρM (λ) = lim εց0 πN
(7.8)
i
over the normalized complex weight function PM (x) =
e−iHM (x) . ZM (λε )
83
(7.9)
It is formally like a Gibbs average w.r.t. the Hamiltonian 1 X λε δij − Mij xi xj HM (x) = 2
(7.10)
ij
evaluated at imaginary inverse temperature β = i. The key point to note is that only the single-vertex marginals Z Y Pi (xi ) = dxj PM (x) j(6=i)
are needed in order to evaluate the variances hx2i i appearing in (7.8). As in our construction of a mean-field approximation for Ising like systems, we note that we can write the marginal as 2 Z P Y e−iλε xi (7.11) dxj eixi j∈∂i Kij xj P (i) (x∂i ) Pi (xi ) = Zi j∈∂i
Here P (i) (x∂i ) is the distribution of the degree of freedoms neighbouring on i, in the so-called cavity graph — a graph from which vertex i has been removed as shown in the Figure below.
l l
i k k
j j
Figure 7.2: Left: part of a (locally) tree-like graph G, showing the neighbourhood of vertex i. Right: removal of vertex i creates a so-called cavity graph G(i) , in which degrees of freedom on the neighbouring vertices j, k, and l become uncorrelated. If one assumes the original graph to be tree-like, then the only correlations P of the x∂i is induced through the central variable xi , and in (7.11) accounted for by the xi j∈∂i Kij xj term in the exponential, so that the cavity distribution P (i) (x∂i ) factors w.r.t. subtrees rooted in the nodes in ∂i, Y (i) Pj (xj ) (7.12) P (i) (x∂i ) = j∈∂i
This factorization, which is exact on trees, is due to Bethe and Peierls. On sparse random graphs, where each node has finitely many neighbouring nodes, it is an approximation , which becomes exact in the thermodynamic limit N → ∞. 84
The reason for this is due to the observation that factorization will fail only due to the presence of loops in the system. Assuming every vertex in the graph to have on average c neighbours, the total number of nodes in the k-th coordination shell is O(ck ). These will generally be distinct as long as ck ≪ N . Conversely you may expect different subtrees rooted in the neighbours of i to connect to a common site, hence to have a loop, if ck O(N ) or k = O(ln N ). I.e. loops typically diverging length. As correlations are expected to decay along paths, with decay constants O(1), effective factorization becomes asymptotically exact. (i)
With this factorization, we have a simple recursion for the cavity-distributions Pj (xj ), viz. 2
(i) Pj (xj )
=
e−iλε xj (i) Zj
Z
Y
dxℓ eixj
P
ℓ∈∂j\i
Kij xℓ
ℓ∈∂j\i
Y
(j)
Pℓ (xℓ )
(7.13)
ℓ∈∂j\i
for all j and all i ∈ ∂j.
This set of recursions is clearly self-consistently solved by zero-mean Gaussians of the form n o 1 1 (i) 2 exp − (7.14) x Pj (xj ) = q (i) j (i) 2∆j 2π∆ j
(i)
Inserting this into (7.13), generates a recursion relation for the cavity variances ∆j , 1
(i)
∆j =
iλε +
P
(j)
2 ℓ∈∂j\i Kjℓ ∆ℓ
(7.15)
In a similar vein, and as a consequence, the marginals Pi (xi ) are Gaussian with variance ∆i given by 1 (7.16) ∆i = P 2 ∆(i) iλε + ℓ∈∂i Kiℓ ℓ
The recursion relations for the cavity variances are solved iteratively on a computer, using random initial conditions, and the variances of the marginals are evaluated in terms of these, once convergence of the iteration has been reached. In terms of these we have ρM (λ) = lim
εց0
X 1 Re ∆i πN i
for the spectral density. (i)
If we represent the complex valued cavity variances ∆ℓ as (i)
(i)
(i)
∆ℓ = aℓ + ibℓ 85
(7.17)
we have Re ∆i =
ε+
P
P
2 (i) ℓ∈∂i Kiℓ aℓ P (i) 2
ε+
2 ℓ∈∂i Kiℓ aℓ
+ λ+
(i) 2
2 ℓ∈∂i Kiℓ bℓ
P 2 a(i) = 0. A small non-zero ε The ε → 0 limit is singular (in λ) on sites i on which ℓ∈∂i Kiℓ ℓ must be kept in order to see contributions for X 2 (i) Kiℓ bℓ ≃ 0 λ+ ℓ∈∂i
One can show that these give the spectral density of localized states (smoothed at the scale of ε, as keeping it amounts to replacing δ-functions by Lorentzians. P 2 a(i) 6= 0. The limit in such a case gives the The ε → 0 limit is unproblematic for ℓ∈∂i Kiℓ ℓ contribution extended states in the continuous spectrum to the spectral density.
7.3.1
Results
7.3.2
The Dense Limit and Wigner’s Semi-Circle
In the limit of large average connectivity c ≫ 1 we need to scale the non-zero matrix elements √ according to Kij = Jij / c, with the Jij of zero-mean and variance J 2 , in order to have a spectrum which has its bulk in a bounded domain in R. (i)
Comparing (7.15) and (7.16) we see that the cavity variances ∆j and the variances ∆j of the marginals differ only by a small amount which vanishes as the c → ∞-limit is taken (i)
∆j = ∆j + O(c−1 ) Using this in (7.16) we have ∆i =
iλε +
1 c
1 P
2 ℓ∈∂i Jiℓ ∆ℓ
The sum in the denominator is then evaluated by appeal to the law of large numbers (ignoring correlations between the Jiℓ2 and the ∆ℓ , giving ∆i = Defining
iλε +
J2 c
1 P
ℓ∈∂i
∆ℓ
1X ∆ℓ c→∞ c
∆ = lim
ℓ∈∂i
86
1
0.8
ρ(λ)
0.6
0.4
0.2
0 -3
-2
-1
0
λ
1
2
3
Figure 7.3: Spectral density of Poisson Random Graph at c = 2 with Kij ∼ N (0, 1/c), evaluated at ε = 10−300 (full line). Simulation result averaged over 500 matrices of dimension N = 2000 (dashed) we see that ∆i is asymptotically independent of i in the large c limit, turning (7.16) into a self-consistency equation for ∆, viz. ∆=
1 iλε + J 2 ∆
(7.18)
This is a quadratic equation for ∆, which is solved by ∆1,2 = −i Taking the limit ε → 0 we have Re ∆i = Re ∆ =
λε 1 p ± 2 4J 2 − λ2ε 2 2J 2J
1 2J 2
√
0
4J 2 − λ2 ; |λ| ≤ 2J ; |λ| > 2J
which when inserted into (7.17) finally gives ρ(λ) =
1 p 2 4J − λ2 , 2πJ 2 87
|λ| ≤ 2J .
(7.19)
10
ρ(λ)
1
0.1
0.01
0.001 -3
-2
-1
0
λ
1
2
3
Figure 7.4: Spectral density of Poisson Random Graph at c = 2 with Kij ∼ N (0, 1/c), evaluated at ε = 10−300 (full line) and ε = 10−3 (full line). Simulation result averaging over 500 matrices of dimension N = 2000 (dashed). Simulatioon is indistinguishable from ε = 10−3 . This is Wigner’s celebrated semi-circle result for the spectral density of non-sparse real symmetric Gaussian random matrices.
88
0.7 0.6
ρ(λ)
0.5 0.4 0.3 0.2 0.1 0 -2
-1.5
-1
-0.5
0
0.5
1
1.5
2
λ
Figure 7.5: Spectral density of Poisson Random Graph at c = 3 with Kij = ±1/c, evaluated at ε = 10−300 (full red line) and ε = 10−3 (dashed green).
89
1 0.9 0.8 0.7
ρ(λ)
0.6 0.5 0.4 0.3 0.2 0.1 0 -5
-4
-3
-2
-1
0
1
2
3
4
5
λ
Figure 7.6: Spectral density of Poisson Random Graph of size N = 50000 at c = 4 with Kij ∼ N (0, 1/c), evaluated at ε = 10−300 (red full line), thermodynamic limit (black).
90
1
ρ(λ)
0.1
0.01
0.001
0.0001 -5
-4
-3
-2
-1
0
1
2
3
4
5
λ
Figure 7.7: Spectral density of Poisson Random Graph of size N = 50000 at c = 4 with Kij ∼ N (0, 1/c), evaluated at ε = 10−300 (red full line), localized states (green dashed), thermodynamic limit extended states (black), thermodynamic total DOS (blue).
91
Chapter 8
Dynamics, Optimization and Statistical Mechanics 8.1
Optimization
Would like to find mimima of complicated cost functions H(x) for x ∈ Ω, where Ω could be Rn , Zn , SN . This can be particularly complicated if the configuration space Ω is discrete, as the option of finding zeros of ∂xi H(x) is not an option. Another reason which might make this option infeasible, even if available in principle, is that dimensions N of configuration spaces may be exceedingly large. One might also wish to obtain suitable macroscopic characterizations of minimizing states x.
8.1.1
Example: Travelling Salesman Problem (TSP)
The TSP is about fining the shortest (cyclic) tour that visits N cities. If Cij is denotes the cost of travel from city i to city j. In this case H = H(π) =
N X
Cπ(i),π(i+1) ,
(8.1)
i=1
where π ∈ SN , the group of permutations of N elements, and one sets N + 1 ≡ 1 to enforce cyclic tours. Also one normally has Cij = Cji although this is not a fundamental restriciton. Want to minimize H. Problem contains disorder (the set of all possible distances or costs {Cij } is heterogeneous), and except in very simple cases, does not allow to spot the otimal tour by inspection, or by simple algorithms. 92
The direct method, computing total costs for all N ! permutations of the tour and picking the least costly is infeasible already for moderate N due to the rapid growth of N ! with N (e.g. 100! ≃ 9.33 10157 ).
8.1.2
Example: Coding and Decoding
The optimization problem here is to design (efficient) codes which minimize the average probability of errors (introduced by noisy transmission) over an ensemble of messages. Given a message m ∈ {0, 1}M one encodes it via m → x(m) ∈ {0, 1}N , where N ≥ M .
Transimission over a noisy channel distorts the message x(m) → y(6= x). Error probability for a given message X PB (m) = Q(y|x(m))1I{d(y)6=m} (8.2) y
where d(y) is the decoding of the received message y, and Q(y|x) =
N Y i=1
Q(yi |xi )
(8.3)
with Q(yi |xi ) denoting the conditional probability of receiving bit yi , given the bit sent was xi . These conditional probailities encode the model of the noisy transmission channel. The ultimate coal is to find codes C : {0, 1}M → {0, 1}N that minimize the total error probability 1 X PB (m) . (8.4) PB = M 2 m Note, the uniform average over all possible M -bit messages could be replaced by a non-uniform average, if information of the statistics of messages were available.
8.1.3
The Role of Statistical Mechanics
Given a cost function H(x) one could think of it as an energy function of a statistical mechanical system. In thermal equilibrium the states x of the system are distributed according to the GibbsBoltzmann probability distribution pβ (x)
1 e−βH(x) . ZN (β)
(8.5)
Assuming that the “groundstate” x∗ = argminx H(x) is unique, the Gibbs-Boltzmann distribution gives all weight to x∗ as β → inf ty, pβ (x) → δ(x − x∗ ) , 93
as
β→∞.
(8.6)
The minimal cost (per degree of freedom) N −1 H(x∗ ) is then equal to the zero temperature limit of the free energy N −1 H(x∗ ) = lim (βN )−1 ln[ZN (β)] . (8.7) β→∞
For certain stylized systems one can obtain anwers about minimal costs, or macroscopic properties of minimizing configurations x∗ analytically. In cases with disorder, typical results are given by the quenched free energy (evaluated e.g. using replica to average over the disorder). For many practical world problems one has to resort to simulations. However, analytic results can provide good test-beds to assess the quality of optimization algorithms. Simulated Annealing Algorithm A generic algorithm that implements the heuristics suggested by Statistical Mechanics is simulated annealing. • Simulated Annealing: Set β = β0 1 Equilibrate the system (using the Metropolis algorithm) at inverse temperature β. 2 Measure hN −1 H(x∗ )iβ .
3 Lower the temperature: β ← β + ∆β (using e.g. fixed ∆β ). 3 Goto [1].
Practical implementations differ bye choice of cooling schedule (β0 , ∆β, βfin ) with ∆β possibly itself temperature dependent, and by the simulation time spent to approximate equilibration. An often used device is to monitor the best-so-far solution, i.e. during measurement phases record and keep the configuration that leads to the lowest cost so far seen during the simulation. Also, quite often, several siulated annealing runs are performed, and the best solution of several runs is recorded.
8.2
Dynamics
We are looking at dynamical evoltion in complex systems for which there exists a Lyapunov function. Assume dynamics is given by a set of autonomous ODEs in RN x˙ i = fi (x) ,
i = 1, . . . , N .
(8.8)
Suppose there is a function H = H(x), s.t. (i) (ii)
∂H , with q(xi ) > 0 , ∂xi for all x ,
fi (x) = −q(xi ) H(x) ≥ C ,
94
(8.9)
and C > −∞.
Then H(x) is a Lyapunov function, i.e. a function which is monotonously non-increasing under the dynamcis. To see this compute dH dt
X ∂H dxi ∂xi dt i X ∂H 2 q(xi ) ≤ 0 = − ∂xi =
(8.10)
i
with equality iff fi (x) = 0 for all i, i.e. at stationary points of H. Remark 1: Only local minima of H(x) will correspond to stable fixed-points of the dynamics (Exercise: verify this statement, by looking at the Jacobian of the dynamical system at fixed points.) Remark 2: We can allow that there are different functions qi (xi ) > 0 for different i. Positive constants are naturally also allowed. Example: graded response neurons dynamical equations of the form Ci
Systems of graded response neurons are described by
ui X dui =− + Jij g(uj ) + Ii , dt Ri
(8.11)
j
where Ci is the ‘capacity’ of the i-th neuron, ui is the transmembrane voltage, Ri a transmembrane resistance, the Jij are synaptic constants, and vi = g(ui ) is the firing rate of neuron i, related to the transmembrane potential via a transfer-function g( ), which we assume to be a monotone increasing function of its argument, very small below a certain threshold, increasing substantially above threshold, and levelling off at large values of ui . The Ii represent external (current)-stimuli. Claim: If synaptic interactions are symmetric Jij = Jji , then the graded response dynamics is governed by the Lyapunov function H(v) = −
X X 1 1X Jij vi vj − Ii v i + G(vi ) , 2 Ri i,j
where G(vi ) =
i
Z
vi
i
dv ′ g −1 (v ′ )
is the integrated inverse input-output relation of the neurons.
95
(8.12)
(8.13)
To see this evaluate dH dt
X ∂H dvi X ∂H dui = g ′ (ui ) ∂vi dt ∂vi dt i i i X ∂H 1 −1 1 hX = − Jij vj + Ii − g ′ (ui ) g (vi ) ∂vi Ci Ri i j X 1 ∂H 2 g ′ (ui ) ≤ 0 , = − Ci ∂vi =
(8.14)
i
where we have used that g −1 (vi ) = ui , and g ′ (ui ) > 0, and also invested the equations of motion. Thus H(v) is monotone non-increasing under the dynamics. If G(vi ) grows faster than quadradic in the vi , then H will be bounded from below, and thus a Lyapunov function of the dynamics. To analyse the long-term behaviour of the dynamics, look at the β → ∞ limit of the free energy of the system. Disorder is for instance in the couplings. Jij =
1 X µ µ ξ ξ . N µ i j
Allows to compute phase diagram, firing-rate distributions with/without stimulus. Example: Ecological resource competition –P types of resource, µ = 1, . . . , P –ni (t) ≥ 0: population density of species i. Dynamics
–N species, i = 1, . . . , N
1 X µ n˙ i = ni − fi + Q (t)qiµ , N µ
where –fi > 0 is the decay rate of species i without resources –qiµ is the per-capita demand on resource µ by species i, and X µ Qµ (t) = Qµ0 − qi ni (t)
(8.15)
(8.16)
i
is the abundance of recource mu, with Qµ0 denoting the initial abundance without consumption. Claim The function H = H(n) =
1 X µ 2 X Q (t) + fi ni (t) 2N µ i
96
(8.17)
is monotone decreasing under the dynamics. To see this, evaluate dH dt
=
X ∂H dni ∂ni dt i X 1 X
dn ∂Qµ i + fi N µ ∂ni dt i 2 X 1 X µ Q (t)qiµ ≤ 0 = − fi − N µ =
Qµ
(8.18)
i
with equality iff n˙ i − 0 for all i.
Once more the collective properties of the system can be studied by looking at the zero temperature limit of the free energy. Disorder: Assumptions µ µ –qiµ are i.i.d with i i = q and Var(qi ) = 1; the latter specification setting a scale √ hq µ –Q0 = P + s P xµ , with xµ ∼ N (0, 1).
In these specifications s is a parameter characterizing the variability of resources, and q denotes the mean resource load of each species [the similarity between species is characterized by the ratio of q and the variance of the qiµ (which was set to 1). The relevant parameter is α = P/N . Evaluate average free energy using replica and δ-function methods to represent the order-parameters that emerge from the calculation. Key order parameter is φ=
1 X hni i N i
Find: (1) for α < αc get φ = α = P/N , i.e. there is on average one resource per surviving species, while for α > αc get φ < α, i.e. there is less than one species per resource. (2) αc = αc (sq) depends on the product sq and decreases from 0.5 at sq = 0 to 0 at sq = 3 Distribution of eigenvector components of a random matrix on a graph, which is of the form Mij = cij Kij
Consider a random matrix
Eigenvectors vα satisfy M vα = λα vα . Find distribution of eigenvector components via pβ (v) =
P X β P 2 1 vi2 − N ) e− 2 i (vi − j Mij vj ) δ( Z(β) i
97
0.7 0.6 0.5
α
0.4 0.3 0.2 0.1 0 0
0.5
1
1.5
2
2.5
3
sq
Figure 8.1: Schematic phase diagram of the ecological resource-competition problem. For α > αc (sq) there is typical less then one species per resource (i.e. species can choose), whereas for α < αc (sq), i.e. below the curve, the number of surviving species is equal to the number of different resources. and take limit β → ∞, where the equilibrium distribution is concentrated on eigenvectors corresponding to eigenvalue λ. Thus investigate the β → ∞-limit of Z Y P X β P 2 Z(β) = dvi e− 2 i (vi − j Mij vj ) δ( vi2 − N ) i
i
This problem is as yet unsolved.
98
Chapter 9
Transfer Matrices We will look at the transfer-matrix method to study one-dimensional systems, starting with homogeneous systems and going on to one-dimensional random field systems. A possible application of the random field model is a highly idealized description of equilibrium properties of double-stranded DNA, and in particular the phenomenon of bubble formation, i.e. the spontaneous local unbinding of the double strand.
9.1
One-Dimensional Systems
Consider a homogeneous chain-like system system (Ising for specificity), closed to form a ring. Its Hamiltonian is N N X X Si (9.1) Si Si+1 − h H(S) = −J i=1
i=1
The evaluation of the partition function ZN =
X
e−βH(S)
S
can be understood in terms of matrix algebra as follows. With K = βJ and B = βH we can write N XY ZN = eKSi Si+1 +BSi . S i=1
Upon introduction of a 2 × 2 transfer-matrix Γ with elements K+B B ′ B ′ e e−K ⇔ Γ= ΓS,S ′ = e 2 S+KSS + 2 S e−K eK−B 99
(9.2)
the partition functionZN is seen to evaluate to ZN = Tr ΓN .
(9.3)
Note: One could also have chosen a slightly different (asymmetric) variant of the transfer-matrix ˜ S,S ′ = eKSS ′ +BS , without changing the above relation. Exercise: Check this. such as Γ Evaluating the trace using the basis of eigenvectors of Γ gives ZN = γ1N + γ2N . Appealing to Perron’s theorem we know that γ1 > |γ2 |, so one obtains the very compact result −βf (β) = lim
N →∞
1 ln ZN = ln γ1 . N
(9.4)
The eigenvalues of Γ are γ1,2 = e
K
h
q i cosh(B) ± sinh2 (B) + e−4K
This gives ∂f = m = m(β, h) = − ∂h
cosh(B) sinh(B) + √sinh(B) sinh2 (B)+e−4K q cosh(B) + sinh2 (B) + e−4K
(9.5)
As m(β, h) → 0 for h → 0 at any finite value of β, the system does not exhibit a phase-transition to a magnetic low temperature phase with non-vanishing spontaneous magnetization. The T → 0 (β → ∞)-limit is often taken in a form which keeps B = βh constant. Taking the limit that way we get m(B) − lim m(β, h)|B = sign(B) β→∞
Transfer-matrices can also be used to evaluate expectations of local observables or correlation functions. Let g be a function of a single spin variabe Si Then QN P P N S g(Si ) j=1 ΓSj ,Sj+1 Si g(Si )(Γ )Si ,Si = hg(Si )i = Tr ΓN γ1N + γ2N Express Γ in terms of its eigenvalues and eigenvectors, Γ = U DU −1 100
with D = Diag(γα ) and U the column matrix of eigenvectors of Γ, using U −1 = U t , we have P N X Si ,α g(Si )USi ,α γα USi ,α 2 → g(S)US,1 , hg(Si )i = γ1N + γ2N S as N → ∞. Here we have used Peron’s theorem, which entails that (γα /γ1 )N → δα,1 .
In a similar vein the expectation of two functions of observables on different sites gives P |j−i| ) N −|j−i| ) Si ,Sj h(Sj )(Γ Sj ,Si Si ,Sj g(Si )(Γ , hg(Si )h(Sj )i = N N γ1 + γ2
Which upon inserting the spectral representation for the powers of Γ in the numerator results in P P |j−i| N −|j−i| USj ,α h(Sj )USj ,β γβ USi ,beta α,β g(Si )USi ,α γα Si ,Sj hg(Si )h(Sj )i = γ1N + γ2N Dividing numerator and denominator by γ1N results in the thermodynamic limit in hg(Si )h(Sj )i =
XX
US,1 g(S) US,α
α S,S ′
γα γ1
|j−i|
US ′ ,α h(S ′ ) US ′ ,1
Introducing the matrix g with elements gα,β =
X
US,α g(S)US,β
S
and similarly for h, we get the following compact expressions in the thermodynamic limit hg(Si )i = g1,1 ,
hg(Si )h(Sj )i =
X α
g1,α
γα γ1
|j−i|
hα,1
(9.6)
which explicitly exhibit translational invariance. In a similar vein, we get C g,h (i, j) = hg(Si )h(Sj )i − hg(Si )ihh(Sj )i |j−i| X γα g1,α = hα,1 γ1 α(6=1)
for the (connected) correlation functions.
101
(9.7)
9.2
Quasi One-Dimensional Systems
We have kept the notation deliberately general, to allow stating in brief terms the generalizations we get when looking at quasi-one-dimensional systems. Suppose that we are looking at a system composed of M × N spins. Denote by Si the configurations of one of the N columns, containing M spins each (there are 2M such configurations. Assuming that the Hamiltonian takes the form H = H(S) = −
N X i=1
V (Si , Si+1 ) −
X
U (Si )
i
we introduce a transfer matrix ΓS,S ′ = exp giving
n βU (S) 2
+ βV (S, S ′ ) +
βU (S ′ ) o , 2
ZM N = Tr ΓN ,
(9.8)
thus
1 1 ln ZM N = ln γ1 , N →∞ N M M where γ1 is the largest eigenvalue of Γ. −βfM (β) = lim
(9.9)
For expectations and connected correlation functions, we obtain in a similar vein hg(Si )i = g1,1
(9.10)
and hg(Si )h(Sj )i − hg(Si )ihh(Sj )i =
X
α(6=1)
g1,α
γα γ1
|j−i|
hα,1 .
(9.11)
Note: To investigate a proper 2-dimensional system, one would have to take the M → ∞-limit as well. As the dimension of Γ for a strip of widht M is 2M , we see that this task is considerably more difficult than that faced when looking at a proper 1-D system. Nevertheless, it has been solved for the 2-D Ising model without an external field, as well as for a number of other 2-D systems. R. Baxter’s book “Exactly Solved Models in Statistical Mechanics” is the best reference.
102
9.3
Spin Chains in a Random Field
If we consider systems with random bonds or random fields, the transfer matrix techniques can no longer be used in the form just described. Considering a random field system with X X H = −J Si Si+1 − hi Si i
i
one would have different i-dependent transfer matrices for each bond i, i + 1, e.g. ′
(i)
ΓS,S ′ = eKSS +Bi S
(9.12)
if we choose the non-symmetric convention. These matrices do no longer commute, if they have different Bi , −2K e 1 (i) (j) (i) (j) (j) (i) , [Γ , Γ ] = Γ Γ − Γ Γ = 2 sinh(Bi − Bj ) −1 −e−2K so they cannot be diagonalized by one and the same matrix U of eigenvectors. Exercise Verify the above commutation relation. As a consequence the simple expressions for free energy and correlation functions that we derived for homogeneous systems are no longer available. However we can still express these objects in terms of matrix products, for which so-called Lyapunov exponents play the role of the eigenvalues E.g. for a closed ring of circumference N we have ZN = Tr
N Y i=1
Γ(i) ∼ eN γ ,
(9.13)
where γ is the largest Lyapunov exponent of the family of random matrices {Γ(i) }, so that 1 ln ZN = γ N →∞ N
−βf = lim
The method to compute γ and other thermodynamic quantities, which we use here is due to Dyson (1953) and Schmidt (1957) and was more recently applied to random-field Ising chains by Bruinsma and Aeppli (1983). It starts by looking at the mapping xi xi−1 (i) =Γ (9.14) yi yi−1 which is equivalent to two coupled stoachastic recursion relations xi = eK+Bi xi−1 + e−K+Bi yi−1 yi = e−K−Bi xi−1 + eK−Bi yi−1 . 103
Considering, instead of the pair of recursions the recursion for the ratio ρi = xi /yi , we get ρi = e2Bi
2K ρ eK ρi−1 + e−K i−1 + 1 2Bi e = e ≡ FBi (ρi−1 ) −K K e ρi−1 + e ρi−1 + e2K
(9.15)
This defines an iterative mapping ρi−1 → ρi which depends on the random realization of Bi . For a bimodal distribution of the Bi , i.e. Bi ∈ {B+ , B− }, we get two branches F+ and F− of the mapping. F
+
ρi
F−
ρ i−1
Figure 9.1: Qualitative sketch of the ρ-dynamics for a bimodal B distribution. The bounds of the support of the asymptotic invariant measure are at the intersections of the F+ and F− branch of the recursion relation with the diagonal. Describing the dynamics at the level of a ‘time’-dependent pdf Pi (ρ), we have — for general p(B) — the Chapman-Kolmogorov equation Z Z Z Pi (F −1 (ρ)) Pi+1 (ρ) = dB p(B) dρ′ Pi (ρ′ )δ(ρ − FB (ρ′ )) = dB p(B) ′ B−1 . FB (FB (ρ)) At large ‘times’ i → ∞ the possible positions ρ will be assumed with an equilibrium distribution P∞ (ρ), and this asymptotic density is given as the “fixed-point” solution of this dynamic equation with Pi ≡ Pi+1 ≡ P∞ . Note that P∞ is in general a complicated multi-fractal object. For this reason it is advisable to study the dynamics in terms of the probability distribution Pi (ρ) = Rρ ′ ′ −∞ d ρ Pi (ρ ) rather than the probability density functions Pi . For these, the equivalent of the Chapman-Kolmogorov equation is Z Pi+1 (ρ) = dB p(B) Pi (FB−1 (ρ)) . Exercise: Verify this. 104
Depending on the the pdf of the random fields Bi and on K, the stationary P∞ can have qualitatively different properties. Its support can be a fractal, in which case the distribution P∞ exhibits a so called ‘devil’s staircase’. The suppport can be a compact interval without gaps, but the pdf can still be everywhere non-analytic (a ’fat multi-fractal’), slopes of P∞ at the boundary of the support can be zero or infinite. Qualitative changes in the stationary P∞ (ρ), are then expected to manifest themselves in thermodynamic properties of the system
9.3.1
Thermodynamics of the Random Field Chain
Thermodynamic properties of the random field chain follow from the partition function ZN = Tr
N Y i=1
N Y Γ(i) = Γ(i) i=1
1,1
+
N Y i=1
Γ(i)
2,2
With reference to (9.14) this can be expressed as ZN = xN |x0 =1,y0 =0 + yN |x0 =0,y0 =1 . Using
yi yi−1
one obtains yN =
(9.16)
= e−K−Bi ρi−1 + eK−Bi ,
N Y i=1
e−K−Bi ρi−1 + eK−Bi y0
(9.17)
(with ρ0 = 0 and y0 = 1) for the second contribution to ZN . Similarly one gets x N = ρ N yN = ρ N
N Y i=2
e−K−Bi ρi−1 + eK−Bi y1
(9.18)
with y1 = e−K−Bi x0 (and x0 = 1) for the first contribution. Thus, apart from boundary terms the xN and the yN contribution contain the same product, so they give the same order of magnitude contribution to ZN , hence −βfN
1 1 = ln ZN = ln yN |x0 =0,y0 =1 + O(1/N ) N N X 1 X 1 Bi + ln(ρi−1 + e2K ) = −K − N N i
i
105
(9.19)
10
1 0.9
1 0.8 0.7 P(ρ)
p(ρ)
0.1
0.01
0.001
0.6 0.5 0.4 0.3
0.0001 0.2 1e-05
0.1 0
50
100
150
200
250
300
0
50
100
150
ρ
200
250
300
ρ
0.1
1 0.9 0.8 0.7
0.01 p(ρ)
P(ρ)
0.6 0.5 0.4 0.001
0.3 0.2 0.1
0.0001
0 0
50
100
150
200
250
300
350
400
0
50
100
150
200
ρ
250
300
350
400
ρ
0.1
1 0.9 0.8 0.7
0.01 p(ρ)
P(ρ)
0.6 0.5 0.4 0.001
0.3 0.2 0.1
0.0001
0 0
100
200
300
400
500
600
0
100
200
300
ρ
400
500
600
ρ
0.01
1 0.9 0.8 0.7
P(ρ)
p(ρ)
0.6 0.001
0.5 0.4 0.3 0.2 0.1
0.0001
0 0
100
200
300
400
500
600
700
800
900
0
ρ
100
200
300
400
500
600
700
800
900
ρ
Figure 9.2: Stationary probability density functions P∞ (ρ) (left) and corresponding (integrated) probability distributions P∞ (ρ) (right) for a random field model with Bi ∈ {±0.01} and (from top to bottom) K = 0.25, K = 0.5, K = 0.75 and K = 1.0. In the first two cases, we have a ’thin multi-fractal’, with an infinite hierarchy of gaps in the support (corresponding to flat regions in the probability distributions) and complete devil’s staircases for the P∞ (ρ). In the latter two cases we have ’fat multi-fractals’ with P∞ (ρ) monotone increasing, continious but non-analytic on the entire support. 106
As N → ∞ the arithmetic means converge to expectations over the distributions of random variables (the Bi , and the ρi (in the stationary regime), so −βf = lim −βfN = −K − hBi + h ln(ρ + e2K )iρ N →∞
(9.20)
Another quantity of interest is the local magnetization (enumerating spins anti-clockwise with periodic boundary conditions (N + 1 ≡ 1)) P (N ) (N −1) (1) Γ Γ . . . Γ xN |x0 =1,y0 =0 − yN |x0 =0,y0 =1 S S1 ,SN SN ,SN −1 S2 ,S1 S1 hS1 i = = ZN xN |x0 =1,y0 =0 + yN |x0 =0,y0 =1 It would be appealing to invoke the above representations (9.17) and (9.18) in terms of products, and argue that the factors (i = 2, 3 . . . , N ) are common between xN and yN and thus cancel, giving ρN y1 − eK−B1 ρN e−K−B1 − eK−B1 hS1 i = = ρN y1 + eK−B1 ρN e−K−B1 + eK−B1 This argument, however, is false, as the ρi sequences appearing in xN and yN are different, having different initial conditions. (Yet it seems that the Bruinsma Aeppli paper is using this kind of reasoning; have a look at that paper, which is available from the course web page) Thus it appears that one has to evaluate expressions numerically for long chains, by explicitly evaluating recursions and products. Note also that the local magnetization m1 = hS1 i depends on local randomness (B1 ), hence hS1 i will not be self-averaging and one would have to find the pdf p(m1 ).
9.4
Double-Stranded DNA as an Application
The random field Ising model could provide a simplified model of double stranded DNA, consisting of sequences of complementary G-C and A-T pairs. It is known that the binding energy (stability) of the G-C pairs is larger than that of the A-T pairs. Taking a two state model, where Si = +1 denotes a bound state of a pair and Si = −1 a state where the pair is unbound one can write the Hamiltonian of the double stranded chain as X X H = −J Si Si+1 − hi Si (9.21) i
i
where the hi take two values, hGC > hAT > 0 to represent the different binding energies. The interaction term represents internal stiffness, local bending-energy of the strand is required to
107
have neighbouring bound-unbound pairs, whereas if neighbouring pairs are both bound or both unbound the strand is locally straight and no such binding energy is required. Of interest would be the thermodynamic properties of this system, which could be obtained by the methods described above. The local magnetization is then related to the probability of having an unbound pair at location i via 1 pui = (1 − hSi i) 2 Another qunantity of interest would be the probability of having a bubble of length n of consecutive unbound pairs (without loss of generality starting at i = 1), given by * n + Y 1 − Sj pu1 (n) = 2 j=1
As for the local magnetization the probability of having a bubble of length n will not be selfaveraging. 1−S
To push the evaluation a bit further, note that the Pj− = 2 j are projectors on the Sj = −1 state. Expressing the average pu1 (n) in terms of the transfer-matrices, we then have Q P (N ) (N −1) (1) n − Γ Γ . . . Γ S j=1 Pj S1 ,SN SN ,SN −1 S2 ,S1 u p1 (n) = ZN Q P (N ) (N −1) (n) n−1 (j) Γ Γ . . . Γ ′ j=1 Γ−,− S −,SN SN ,SN −1 Sn+1 ,− = ZN Qn−1 (j) yN |xn−1 =0,yn−1 =1 j=1 Γ−,− = (9.22) xN |x0 =1,y0 =0 + yN |x0 =0,y0 =1 (j)
In this expression one may substitute Γ−,− = eK−Bj for j = 1, . . . , n − 1 and invoke product representations ` a la (9.17) and (9.18), to evaluate the remainder of the above result in terms of suitable {ρi } sequences. Some explicit results to follow.....
108
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 01 — Prerequisites Question 1 Gaussian integrals. (a) Show that the Gaussian integral I= is given by I =
√
Z
∞
2
dx e−x .
−∞
π.
Hint: Compute I 2 and use polar coordinates to evaluate the integral. (b) Use this result to show that for Re a > 0 I(a) =
Z
∞
dx e
− 12 ax2
=
−∞
s
2π . a
(c) Use (b) to show that I(a, b) =
Z
∞
dx e
− 21 ax2 ±bx
−∞
=
s
2π 1 b2 /a . e2 a
(d) Let A be a real symmetric, positive definite n × n matrix. [Rem.: A matrix is positive definite, iff all its eigenvalues λα are strictly positive]. Using the P notation (a, b) = ni=1 ai bi to denote the scalar product of two vectors in IRn , show that Z (2π)n/2 n − 12 (x,Ax) √ I(A) = d x e = . IRn detA Hint: You can use the fact that A can be diagonalized by an orthogonal transformation O, i.e. A = O D OT , with D the diagonal matrix of eigenvalues of A, D = diag(λα ), and moreover that |det O| = 1 for any orthogonal transformation O. (e) Let C be a real symmetric, positive definite n × n matrix. Use (d) to show that p(x) = q
1
1
(2π)n detC
e− 2 (x,C
−1 x)
is a normalized probability density function (a so-called multivariate Gaussian pdf) over IRn . (f ) Finally use (e) to demonstrate that I(C, b) =
Z
n
n
IR
d x p(x) e
(b,x)
=
Z
n
IR
dn x q
1 (2π)n detC
1
e− 2 (x,C
−1 x)+(b,x)
1
= e 2 (b,Cb)
Hint: Consider the ‘shifted’ quadratic form 12 x − Cb, C −1 (x − Cb) , and use the fact that an integral over a Gaussian density is invariant under a translation of the center of the (multivariate) Gaussian. You will also need the fact that C, hence C −1 , are symmetric matrices. 1
Question 2 Probability etc. Let Xi , i = 1, . . . , n be n (not necessarily identical) random variables, with Xi ∈ Ωi . For simplicity we assume the sample spaces Ωi to be discrete (in the continuous case all sums over probability mass functions below have to be replaced by the corresponding integrals over probability density functions). (a) Let p(x1 , x2 , . . . , xn ) = Prob(X1 = x1 , X2 = x2 , . . . , Xn = xn ). State the properties of the (joint) probability mass function p. Show that the P marginal p(x1 , x2 , . . . , xn−1 ) = xn ∈Ωn p(x1 , x2 , . . . , xn ) is also a probability mass function. (b) Assume that the Xi are independent (though not neccessarily identical), i.e. that p(x1 , x2 , . . . , xn ) = p1 (x1 )p2 (x2 ) . . . pn (xn ) for all (x1 , x2 , . . . , xn ) ∈ Ω ≡ ⊗ni=1 Ωi . Show that under this condition *
n Y
i=1
fi (Xi )
+
Ω
≡
X
p(x1 , x2 , . . . , xn )
n Y
fi (xi ) =
i=1
(x1 ,x2 ,...,xn )∈Ω
n Y
i=1
hfi (Xi )iΩi ,
where X
hfi (Xi )iΩi =
pi (xi )fi (xi ) .
xi ∈Ωi
(c) Now let Xi = Si with Ωi = {±1} for all i. (Remark: binary variables of this type are so-called Ising spins.) Assume that the probability mass function for the Si is of the form p(s1 , s2 , . . . , sn ) =
1 a1 s1 +a2 s2 +...an sn e Z
for (s1 , s2 , . . . , sn ) ∈ {±1}n , and arbitrary (real) constants a1 , a2 , . . . , an . Show that under this condition the Si are independent random variables. Show also that the normalization constant Z in the above probability mass function is given by Z=
n Y
[2 cosh(ai )] .
i=1
(d) Given a probability mass function as in (c), compute the marginals p(s1 , s2 , . . . , sn−1 ), p(s1 , s2 , . . . , sn−2 ), p(s1 , s2 , . . . , sn−3 ),. . . , down to p(s1 ). Finally compute p(si ) for some 1 ≤ i ≤ n, and p(si , sj ) for an arbitrary pair 1 ≤ i < j ≤ n. Use these results to compute the expectation values hSi i and hSi Sj i.
2
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 01 — Prerequisites Question 1 Gaussian integrals. (a) To show that the Gaussian integral I= is given by I =
√
Z
∞
2
dx e−x .
−∞
π, compute 2
I =
Z
∞
dx −∞
Z
∞
dy e−x
2 −y 2
.
−∞
Introducing polar coordinates x = r cos(ϕ), y = r sin(ϕ), one has Z
I2 =
2π 0
dϕ
Z
∞ 0
2
dr re−r .
1 Z 2π dϕ = π , = 2 0 where the r integral can, e.g., be evaluated using the substitution z = r2 . The result follows. (b) To show that for Re a > 0 I(a) = use the substitution z =
q
a x, 2
I(a) =
Z
∞
dx e
− 12 ax2
=
−∞
hence dx = dz
s
2 I(1) = a
s
q
s
2π , a
2 , a
in I(a), which gives
2 I= a
s
2π , a
as claimed. (Note we have also used the fact that the integration limits are unchanged under the substitution.) (c) To show that I(a, b) =
Z
∞
dx e
− 21 ax2 ±bx
−∞
=
s
2π 1 b2 /a , e2 a
write the exponent in I(a, b) in terms of complete squares: 1 1 1 − ax2 ± bx = − a(x ∓ b/a)2 + b2 /a 2 2 2 The result follows by using the substitution z = x + ∓b/a, so dx = dz (or in other words, by noting that a Gaussian integral is invariant under a shift of the location of the center).
1
(d) Let A be a real symmetric, positive definite n × n matrix. To show that I(A) =
Z
(2π)n/2 n − 12 (x,Ax) √ = d x e . IRn detA
Use A = O D OT , and the fact that for any matrix B and an arbitrary pair of vectors x and y we have (x, By) = (B T x, y), so that n X 1 1 1 1 λα yα2 (x, Ax) = (x, ODOT x) = (OT x, DOT x) = (y, Dy) = 2 2 2 2 α=1
with y = OT x. Now use the y as integration variables. We have dn x = |det(O)|dn y = dn y, so I(A) =
Z
n
n
IR
d ye
− 21
This gives I(A) =
Pn
α=1
2 λ α yα
=
n Z Y
α=1
1
IR
2
dyα e− 2 λα yα
n Y
(2π)n/2 I(λα ) = √ , detA α=1
as claimed. In the last step we have used the fact that the determinant is Q invariant under orthogonal transformations, hence det A = det D = nα=1 λα
(e) Let C be a real symmetric, positive definite n × n matrix.To show that p(x) = q
1
1
(2π)n detC
e− 2 (x,C
−1 x)
is a normalized probability density function (a so-called multivariate Gaussian pdf) over IRn , we note that (i) p(x) ≥ 0 for all x ∈ IRn and (ii) that R −1 and det C −1 = IRn p(x) = 1, which follows from (d), when using A = C n (detC)−1 . Hence p is a pdf over IR . (f ) Finally to demonstrate that I(C, b) =
Z
n
n
IR
d x p(x) e
(b,x)
=
Z
n
IR
dn x q
1 (2π)n
1
detC
e− 2 (x,C
−1 x)+(b,x)
1
= e 2 (b,Cb)
We follow the hint and consider the ‘shifted’ quadratic form i 1h 1 x − Cb, C −1 (x − Cb) = (x, C −1 x) − (x, b) − (b, x) + (b, C −1 b) 2 2
(we have used the symmetry of C and the symmetry of the scalar product). If inserted into I(C, b) the result follows from (e) and the symmetry of the multivariate Gaussian integral under a shift of the pdf. Question 2 Probability etc. Let Xi , i = 1, . . . , n be n (not necessarily identical) random variables, with Xi ∈ Ωi . For simplicity we assume the sample spaces Ωi to be discrete (in the continuous case all sums over probability mass functions below have to be replaced by the corresponding integrals over probability density functions). 2
(a) Let p(x1 , x2 , . . . , xn ) = Prob(X1 = x1 , X2 = x2 , . . . , Xn = xn ). For p to be a probability mass function on Ω = ⊗ni=1 Ωi , we need (i) that p(x) = p(x1 , x2 , . . . , xn ) > 0 for all x ∈ Ω and (ii) that X
x∈Ω
p(x) ≡
X
p(x1 , x2 , . . . , xn ) = 1 .
(x1 ,x2 ,...,xn )∈Ω
The marginal p(x1 , x2 , . . . , xn−1 ) = xn ∈Ωn p(x1 , x2 , . . . , xn ) is also a probabiln−1 ity mass function (on Ω(n) = ⊗i=1 Ωi ), as it is positive for all (x1 , x2 , . . . , xn−1 ) ∈ (n) Ω — being a sum of positive terms — and normalized, P
X
X
X
p(x1 , x2 , . . . , xn−1 ) =
p(x1 , x2 , . . . , xn )
(x1 ,x2 ,...,xn−1 )∈Ω(n) xn ∈Ωn
(x1 ,x2 ,...,xn−1 )∈Ω(n)
X
=
p(x) = 1 .
x∈Ω
Rem.: the same argument clearly works for other marginals and for iterated marginals. (b) Assuming that the Xi are independent (though not neccessarily identical) we have *
n Y
i=1
fi (Xi )
+
Ω
≡
=
p(x1 , x2 , . . . , xn )
X
[pi (xi )fi (xi )]
n Y
(x1 ,x2 ,...,xn )∈Ω i=1 n h X Y
pi (xi )fi (xi )
i=1 n Y
i=1
n Y
fi (xi )
i=1
(x1 ,x2 ,...,xn )∈Ω
= =
X
xi ∈Ωi
i
hfi (Xi )iΩi ,
where hfi (Xi )iΩi =
X
pi (xi )fi (xi ) .
xi ∈Ωi
(c) Now let Xi = Si with Ωi = {±1} for all i. (Remark: binary variables of this type are so-called Ising spins.) Assume that the probability mass function for the Si is of the form p(s1 , s2 , . . . , sn ) =
1 a1 s1 +a2 s2 +...an sn e Z
for (s1 , s2 , . . . , sn ) ∈ {±1}n , and arbitrary (real) constants a1 , a2 , . . . , an .
To verify that under this condition the Si are independent random variables, we note that p(s1 , s2 , . . . , sn ) can be written as n 1 Y p(s1 , s2 , . . . , sn ) = ea i si = Z i=1
As all pi (si ) = 3
Qn
i=1 [2 cosh(ai )]
Z
ea i si 2 cosh(ai )
×
n Y
ea i si . i=1 [2 cosh(ai )]
are normalized probability mass functions on {±1}, we must have Qn
i=1 [2 cosh(ai )]
Z
=1
⇐⇒
Z=
n Y
[2 cosh(ai )] ,
i=1
as claimed. (d) The solution of (c) implies that p(s1 , s2 , . . . , sn ) =
n Y
pi (si ) =
i=1
n Y
ea i si i=1 [2 cosh(ai )]
From this we get p(s1 , s2 , . . . , csn−1 ) =
X
p(s1 , s2 , . . . , sn )
sn =±1
=
n−1 Y i=1
=
n−1 Y i=1
X ea n sn ea i si × [2 cosh(ai )] sn =±1 [2 cosh(an )] !
ea i si . [2 cosh(ai )]
Proceeding in the same manner we get expressions for all marginals in the form p(s1 , s2 , . . . , sk ) =
k Y
pi (si ) ,
k = 1, . . . , n ,
i=1
and in a similar manner p(si ) = pi (si ), and p(si , sj ) = pi (si ) pj (sj ). Using these results we find hSi i = tanh(ai ), and hSi Sj i = tanh(ai ) tanh(aj ).
4
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 1 Question 1 Consider a discrete random variable K taking non-negative integer values, with probabilities p(k), k ∈ {0, 1, 2, 3, . . .}. (a) Find the best assignment for the probabilities {p(k); k ≥ 0} subject to the P constraint hKi = k≥0 k p(k) = µ for some µ > 0. (b) Use your best estimate to compute the variance Var(K) = hK 2 i − hKi2 , and express the variance in terms of µ. Question 2 Consider a continuous non-negative random variable X with probability density function p(x), x ≥ 0. (a) Use the maximum entropy method to find the best estimate for p(x), subject to the constraint hXi = x0 > 0 (b) Express the unknown parameter(s) of your estimate for the probability density function in terms of the mean x0 (c) Use your best estimate for the probability density function to derive an expression for all integer moments µn = hX n i, n > 0. Also, compute the expectation hcosh(aX)i, and determine the range of a-values for which this expectation exists.
1
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 1 Question 1 Discrete random variable defined on non-negative integers. (a) To find the best probability assignment subject to the constraint of given mean, we use the MaxEnt method. I.e. maximse S[p] = −kB
X
p(k) ln p(k)
k≥0
subject to the constraint X
k p(k) = µ .
k≥0
using the method of Lagrangian multipliers to take constraints into account. To simplify notation we can assume kB = 1; this amounts to redefining Lagrangian multipliers in units of kB (can you see why?). Thus define L[p] = −
X
k≥0
p(k) ln p(k) + λ0
X
k≥0
p(k) − 1 + λ1
X
k≥0
k p(k) − µ
The maximising probability is found by solving ∂L[p] = − ln p(k) − 1 + λ0 + λ1 k = 0 , ∂p(k) X ∂L[p] p(k) − 1 = 0 = ∂λ0 k≥0
k≥0
(2)
X ∂L[p] k p(k) − µ = 0 = ∂λ1 k≥0
Equation (1) implies p(k) = eλ0 −1+λ1 k ≡ with Z = e1−λ0 =
∞ X
eλ 1 k =
k=0
(1)
(3)
1 λ1 k e Z
1 = Z(λ1 ) 1 − eλ 1
to enforce the normalisation constraint Eq (2). Obviously λ1 < 0 is required for these relations to make sense. Now λ1 must be chosen such as to solve Eq (3). µ=
X
k p(k) = (1 − eλ1 )
∞ X
k eλ 1 k =
k=1
k≥0
eλ 1 1 − eλ 1
required by the constraint of the given mean. Hence e
λ1
µ = 1+µ
1 p(k) = 1+µ
=⇒
1
µ 1+µ
!k
(b) To compute the variance, the second moment is needed hK 2 i =
X
k 2 p(k) = (1 − eλ1 )
∞ X
k 2 eλ 1 x =
k=0
k≥0
1 ∂2 Z = µ + 2µ2 Z ∂λ21
giving Var(K) = hK 2 i − hKi2 = µ(1 + µ) You should check that you get the variance directly from Var(K) =
∂2 ln Z ∂λ21
Question 2 Continuous random variable defined on non-negative real numbers. (a) Best estimate for a pdf p(x), x ≥ 0 is obtained by maximizing the entropy S[p] = −
Z
dx p(x) ln p(x)
subject to the constraint hXi = x0 , as well as the normalization constraint Z
dxp(x) = 1
Unless otherwise stated integrals are over IR+ . Introducing Lagrangian multipliers λ0 , and λ1 for the normalization constraint, and the constraint given by the fixed average, one has to find the stationary point of L[p] = S[p] + λ0
Z
dx p(x) − 1 + λ1
Z
dx x p(x) − x0
Stationarity w.r.t. p(x) requires δL[p] = −(ln p(x) + 1) + λ0 + λ1 x = 0 , δp(x)
x≥0
entailing
1 λ1 x e , x≥0, Z while the stationarity conditions w.r.t. variations of the Lagrangian multipliers reproduce the known constraints of normalization p(x) = exp(λ0 − 1 + λ1 x) ≡
∂L[p] Z = dx p(x) − 1 = 0 , ∂λ0 and prescribed average ∂L[p] Z = dx x p(x) − x0 = 0 ∂λ1 Note that the Lagrange parameter λ1 must be negative for the pdf to be normaliseable. 2
(b) Using the normalization constraint gives the relation Z
Z=
dx exp(−|λ1 |x) =
1 , |λ1 |
hence the pdf can be written as p(x) = |λ1 | exp(−|λ1 |x) with λ1 to be determined by the constraint of the prescribed average. The constraint concerning the average now reads hXi = |λ1 |
Z
dx x exp(−|λ1 |x) =
1 = x0 |λ1 |
(e.g. using the substitution y = |λ1 |x to evaluate the integral). Hence finally p(x) =
1 exp(−x/x0 ) x0
(c) Using the best estimate for p(x) as determined above, one computes integer moments via 1 Z n dx xn exp(−x/x0 ) hX i = x0 Substituting y = x/x0 transforms this into n
hX i =
xn0
Z
dy y n exp(−y) = xn0 n!
(appealing to the integral representation of the factorial or otherwise) The average hcosh(aX)i is evaluated using 1 Z dx exp[(a − 1/x0 )x] + exp[−(a + 1/x0 )x] 2x0 1 = 1 − (x0 a)2
hcosh(aX)i =
The expectation exists provided that |a| <
3
1 x0
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 1b Question 1 Let H = H(σ) denote the energy function of a system with discrete degrees of freedom collectively denoted by σ, and let F [p] = E[p] − T S[p] =
X
p(σ)H(σ) + kB T
σ
X
p(σ) ln p(σ)
σ
denote the free energy functional of the system with this energy function. Find the minimum of F [p] over all probability mass functions p, and show that the minimizing p is the Gibbs-Boltzmann distribution pβ (σ) of the system. Question 2 Solve the exercise on dynamics of opinion formation posed at the end of Chapter 1, namely analyse the nature of the solutions of the self-consistency equation m = 2P (Jm + h0 ) √ where P (x) = 12 [1 + erf(x/ 2σ] is the cumulative distribution of a zero-mean variance σ 2 Gaussian. [The assumption underlying this fixed point equation is that h0 (t) evolves so slowly that m becomes stationary at the current value h0 of the external field]. Give conditions, under which you expect that m does (does not) exhibit a jump discontinuity, as h0 is increased from −∞ to +∞. Question 3 Let H(S) = H0 (S) − h
X
Si
i
be the energy function of an Ising spin system, with H0 (S) encoding some form of P interactions between spins, and the temr Hext = −h i Si denoting the contribution of the external field h to the total energy of the spins. Show that ∂2f − 2 ≥0, ∂h −1 where f = f (β, h) = −(βN ) ln Z(β, h) is the free energy density of the system.
1
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 1b Question 1 To find the minimizer of the free energy functional F [p] = E[p] − T S[p] =
X
p(σ)H(σ) + kB T
σ
X
p(σ) ln p(σ)
σ
of a discrete state system with energy function H = H(σ), introduce a Lagrange parameter λ0 to take the normalization condition into account, and the Lagrangian L = F [p]−kB T λ0
X
p(σ)−1 =
σ
X
p(σ)H(σ)+kB T
σ
X
p(σ) ln p(σ)−kB T λ0
X
p(σ)−1 ,
σ
σ
and solve ∂L[p] = H(σ) + kB T (ln p(σ) + 1) − kB T λ0 = 0 ∂p(σ) X ∂L[p] p(σ) − 1 = 0 . = −kB T λ0 ∂λ0 σ
∀σ , (1)
The first set of equations is solved by p(σ) = eλ0 −1−βH(σ) , while the second gives e1−λ0 =
X
e−βH(σ) = Zβ ,
σ
or finally
p(σ) = pβ (σ) =
1 −βH(σ) e . Zβ
I.e. the mimimizing probability mass function is indeed the Gibbs distribution. To demonstrate that the minimizing p must be a minimum look at the matrix Dσ,σ′ =
∂ 2 L[p] ∂ 2 F [p] kB T = = δσ,σ′ ′ ′ ∂p(σ)∂p(σ ) ∂p(σ)∂p(σ ) p(σ)
of second derivatives. It is a diagonal matrix hence its eigenvalues are the diagonal elements, which are all positive. This is a condition for the extremizing p to be indeed a minimum. Question 2 Assuming that the external field h0 (t) in the opinion formation problem is very slowly varying, one can well approximate the behaviour by assuming that m(t) reaches a stationary value m determined by the current value of h0 = h( t), which is the solution of m = 2P (Jm + h0 ) − 1 where P (x) is the cumulative distribution √ function of the pdf of the hi . For hi ∼ 1 2 N (0, σ )) we have P (x) = 2 [1 + erf(x/ 2σ 2 ], so the FPE reads m = 2P (Jm + h0 ) − 1 = erf 1
Jm + h
√
0
2σ 2
≡ f (m) .
This is a sigmoid function which has a single zero at m = −h0 /J, where the slope of f J 2 2 f ′ (m) = √ e−(Jm+h0 ) /(2σ ) 2πσ 2
1
1
0.5
0.5 m, f(m)
m, f(m)
J ′ . Changing the value of h0 amounts is maximal, and indeed given by fmax = √2πσ 2 to shifting the function left/right for ∆h0 > 0/∆h0 < 0. To solve the system graphically find the intersection of the lines y = m and y = f (m). On can convince oncelve that m ≃ −1 for h0 → −∞, and m ≃ 1 for h0 → +∞, and ′ that the solution is a continuous function of h0 if fmax < 1, whereas there will be a ′ jump discontinuity if fmax > 1.
0
-0.5
0
-0.5
-1
-1 -4
-2
0 m
2
4
-2
-1
0 m
1
2
The left panel in the figure allows to read off the solution m for various values of h0 for J = σ 2 = 1, and it can be seen that the solution will be a continuous function of h0 . In the right panel, we choose σ 2 = 0.25 instead, entailing that the solution will have jump discontinuities. The curves were such jumps will occur when h0 is decreased/increased are shown; they correspond to h0 ≃ ±0.185. Question 3 To show that ∂2f − 2 ≥0, ∂h for a system with energy function H(S) = H0 (S) − h
X
Si
i
we first evaluate 1 ∂ ln Zβ 1 X 1 ∂f = = hSi iβ = m=− ∂h βN ∂h N i N
* X i
Si
+
β
where h. . .iβ denotes an average over the Gibbs Boltzmann distribution corresponding to H(S). Doing the second derivative (using quotient and chain rules for the derivative of the average) gives * + + * X 2 ∂2f β X 2 − 2 = − §i Si ∂h N i i β β which, up to a constant factor, is a variance and hence non-negative as claimed.
2
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 2 Question 1 Consider a disordered system with N degrees of freedom. Due to the disorder, the free energy density of the system fN = (−βN )−1 ln ZN will be a random quantity. Assume that the probability density function p(fN ) is Gaussian, and of the form "
#
1 exp − 2 (fN − f¯N )2 . p(fN ) = q 2 2σN 2πσN 1
with parameters f¯N and σN that are assumed to be given functions of N . (a) For this setup, compute the N -dependent annealed and quenched free energies D
fa (N ) = −(βN )−1 ln ZN
E
and
D
fq (N ) = (−βN )−1 ln ZN
E
explicitly in terms of the average h. . .i over the Gaussian probability density function of the fN given above, and verify that 1 2 fa (N ) = fq (N ) − βN σN . 2 (b) Assume that the limit limN →∞ f¯N ≡ f¯ exists and that σN = σ0 N −a with a ≥ 0. Specify the range of values for the exponent a, (i) for which the free energy density is self-averaging in the thermodynamic limit, (ii) for which the thermodynamic limit of the annealed free energy density exists, and (iii) for which the annealed and quenched free energy densities are identical in the thermodynamic limit.
1
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 2 Solution 1 (a) With ZN and fN related by fN = (−βN )−1 ln ZN , or ZN = exp(−βN fN ), the N -dependent quenched free energy is fq (N ) = (−βN )−1 ln ZN = fN = f¯N + fN − f¯N = f¯N , D
E
D
E
D
E
where the last step uses the fact that the Gaussian density is an odd function of fN − f¯N . To compute the annealed free energy density, we first evaluate D
ZN
E
= exp[−βN f¯N ] exp[−βN (fN − f¯N )] = exp D
E
(
1 2 − βN f¯N + β 2 N 2 σN 2
)
,
where the last step uses a standard identiy for Gaussian integrals. Taking logarithms, this gives D E 1 2 . fa (N ) = (−βN )−1 ln ZN = f¯N − βN σN 2
Using the previous result for fq (N ), one obtains 1 2 , fa (N ) = fq (N ) − βN σN 2 as claimed. (c) (i) The free energy density will be self-averaging in the thermodynamic limit, if the standard deviation of the pdf of fN vanishes as N → ∞. This requires a > 0 in σN = σ0 N −a , i.e. the inequality a ≥ 0 must be strict. (ii) Given that the limit f¯ = limN →∞ f¯N exists, the condition for the thermodynamic limit of the annealed free energy density to exist is that 1 1 2 βN σN = βN σ02 N −2a 2 2 has a limit as N → ∞, which requires a ≥ 1/2. (ii) Quenched and annealed free energy density are identical in the thermodynamic limit, provided 1 1 2 βN σN = βN σ0 N −2a → 0 , 2 2 as N → ∞, which requires a > 1/2.
1
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 3 Question 1 Given asynchronous Glauber dynamics for {0, 1}-degrees of freedom, in which at each time step X ˜ i + ξit , Jij nj (t) + h ni (t + ∆t) = Θ j
for a single randomly selected node i ∈ {1, . . . , N }, with a symmetric zero-mean noise distribution of the form βz 1 Prob{ξit ≤ z} = 1 + tanh . 2 2 P ˜ i , with Jii = 0, and exploiting the (a) Defining the local field hi (t) = j Jij nj (t) + h identity βh (t) i eβhi (n(t)) 1h i 1 + tanh = 2 2 1 + eβhi (n(t)) used in class, show that the Glauber transition probabilities can be formulated as n
o
Prob ni (t + ∆t) = 1 =
1 eβni hi (n) 1 w(n, Fi n) = N 1 + eβhi (n) N 1 eβFi ni hi (n(t)) 1 W (Fi n, n) = = w((Fi n, n) βh (n) N 1+e i N with Fi n = (n1 , n2 , . . . , ni , . . . , nN ) = (n1 , n2 , . . . , 1 − ni , . . . , nN ). W (n, Fi n) =
(b) Hence, for ∆t scaling with system size as ∆t = 1/N , demonstrate that the master equation for Glauber dynamics in the large N limit takes the form ∂t pt (n) =
Xh
i
w(n, Fi n)pt (Fi n) − w(Fi n, n)pt (n) .
i
(c) Assuming symmetric couplings Jij = Jji without self-interactions, Jii = 0, show that Glauber dynamics satifies detailed balance with the Gibbs-Boltzmann distribution pβ (n) = Z −1 exp(−βH(n)), corresponding to the energy function H(n) = −
X 1X ˜ i ni . Jij ni nj − h 2 i6=j i
Question 2 Consider the Metropolis algorithm described in the lecture notes. It is a stochastic algorithm with transition probability W (S ′ , S) = const. × pprop (S ′ , S)pacc (S ′ , S), where pprop (S ′ , S) = pprop (S, S ′ ) is a symmetric ‘proposal probability’, i.e. the probability to suggest state S ′ as the new state, given the system is in state S (and vice versa) and pacc is the Metropolis acceptance probability
pacc (S ′ , S) =
, if H(S ′ ) < H(S) ,
1 ′
e−β(H(S )−H(S)) , if H(S ′ ) ≥ H(S) .
formulated in terms of the energy function H(S) of the system. 1
(a) Show that this algorithm satisfies detailed balance with the Gibbs Boltzmann distribution pβ (S) = Z −1 exp(−βH(S)) (b) Note that this algorithm is not referring in any way to an Ising-spin nature of the system, so is more general. However, thinking of Ising-spin systems, list at least three restrictions of Glauber Dynamics that Metropolis dynamics does not have.
2
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 3 Question 1 Given asynchronous Glauber dynamics for {0, 1}-degrees of freedom, ni (t + ∆t) = Θ
X
˜ i − ξit , Jij nj (t) + h
j
with noise model as described. (a) Then n
o
Prob ni (t + ∆t) = 1 =
eβhi (n(t)) 1 + eβhi (n(t))
as stated, and we have n
o
n
o
Prob ni (t + ∆t) = 0 = 1 − Prob ni (t + ∆t) = 1 =
1 1+
eβhi (n(t))
Thus, since hi (Fi n) = hi (n) due to the absence of self-interactions, 1 eβni hi (n) 1 W (n, Fi n) = = w(n, Fi n) βh (n) i N 1+e N both, for ni = 1 and for ni = 0, with the prefactor N −1 accounting for the probability to select i ∈ {1, . . . , N }. Similarly W (Fi n, n) =
1 eβFi ni hi (n) 1 w(Fi n, n) = βh (n) i N 1+e N
both, for Fi ni = 1 and for Fi ni = 0. (b) The equation follows by noting that pt+ 1 (n) − pt (n) = N1 ∂t pt (n) + O(N −2 , N inserting the transiton rates evaluated in (a), and taking the N→ ∞ limit. (c) To demonstrate detailed balance, we have to show that for all n and all i w(n, Fi n)
e−βH(Fi n) e−βH(n) = w(Fi n, n) Z Z
Given the form of the transition rates this amounts to showing eβ(ni −Fi ni ) hi (n) = e−β(Hn))−H(Fi n)) We have β(ni − Fi ni ) hi (n) = β(2ni − 1) hi (n) . On the other hand, using symmetry and the absence of self-interactions, H(Fi n) − H(n) = −(Fi ni − ni )
hX j
entailing detailed balance as claimed.
1
˜ i = (2ni − 1) hi (n) , Jij nj + h i
Question 2 Metropolis algorithm as described in the lecture notes. Transition probability is W (S ′ , S) = const. × pprop (S ′ , S)pacc (S ′ , S), where pprop (S ′ , S) = pprop (S, S ′ ) is a symmetric ‘proposal probability’,the acceptance probability is.
pacc (S ′ , S) =
1
, if H(S ′ ) < H(S) ,
′
e−β(H(S )−H(S)) , if H(S ′ ) ≥ H(S) .
formulated in terms of the energy function H(S) of the system. (a) To show that this algorithm satisfies detailed balance with the Gibbs Boltzmann distribution pβ (S) = Z −1 exp(−βH(S)), first assume that H(S ′ ) > H(S). in this case W (S ′ , S)
e−βH(S) e−βH(S) ′ = const. × pprop (S ′ , S)e−β(H(S )−H(S)) Z Z −βH(S ′ ) −βH(S ′ ) e ′ e ′ = W (S, S ) , = const. × pprop (S , S) Z Z
where we have used symmetry of pprop and the form of pacc (S, S ′ ) for H(S) < H(S ′ ) in the last step. The case H(S ′ ) > H(S) is argued in a similar fashion. (b) Apart from the fact that the Metroplis algorithm is not restricted to Ising degrees of freedom or indeed to discrete degrees of freedom, it does (i) not require that state changes are restricted to single spin updates; (ii) it does not stipulate the absence of self-interactions; (iii) the number of spins proposed for an updadte in pprop need not be constant, and could for instance itself be random, as long as the symmetry condition is maintained; (iv) there is even no need for the statistics of the number of spins proposed for an updadte in pprop be time-independent. as long as symmetry at all times is maintained.
2
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 4 Question 1 Evaluate the integral IN [f ] =
Z
∞
dx eN f (x) −∞
for the functions specified below. (a) Choose f (x) = − 21 x2 − kx and evaluate IN for large N . using Laplace’s method. Assume that k is any real constant. This integral is actually a Gaussian integral, which can be done analytically. Use this to verify that the Laplace method is actually exact in this case. 4
(b) Choose f (x) = − 21 ax2 − x4 , with a a real constant. Evaluate limN →∞ N −1 ln IN [f ] as a function of a. Discuss your findings in terms of the shape of f . Question 2 Consider a model of a lattice gas defined on a lattice containing N lattice sites, and described by the folloiwing energy function H(n) = −
X 1X ni , Jij ni nj − µ 2 i,j i
in which the ni ∈ {0, 1} describe presence (ni = 1) or absence (ni = 0) of a particle on site i. The summation ranges for i (and j) in both sums extend over the entire lattice. The Jij are interaction constants describing an attractive interaction (with Jij > 0) and µ is a chemical potential, which regulates the mean density of particles in the system. J for all pairs i 6= j. Assume that the interaction is of Curie-Weiss type Jij = N (a) Show that the energy function of the system depends only on the average density ρ¯ = ρ¯(n) =
1 X ni N i "
of particles and can be expressed as H(n) = N −
J 2 ¯ 2ρ
#
− µ¯ ρ + J2 ρ¯.
(b) Evaluate the partition function ZN =
X
e−βH(n)
n
of the system, using your favourite method. (c) Thereby show that the equation of state of the system is of the form ρ=
eβ(Jρ+µ) , 1 + eβ(Jρ+µ)
and that the free energy per site is given by f (β, µ) =
J 2 ρ − β −1 ln 1 + eβ(Jρ+µ) , 2
with ρ solving the above equation of state. 1
(d) Verify that ρ=−
∂ f (β, µ) ∂µ
is in fact the average particle density in equilibrium. [Hint: use the expression of ZN in terms of the sum over micro-states, and its relation to the free energy]. (e) Use a graphical analysis of the equation of state to demonstrate that at sufficiently low temperatures (large β) the system may have coexisting high- and low density phases at intermediate values of µ, and that there are discontinous transitions between them as µ is varied. Conversely, show that for sufficiently small values of β the density will be a continuous function of µ. temperatures
2
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 4 Question 1 We evaluate IN [f ] =
Z
∞ −∞
dx eN f (x)
using Laplace’s method. (a) For f (x) = − 21 x2 − kx, we need to find solution(s) of f ′ (x) = −x − k = 0
⇔
x = −k
As f ′′ (x) = −1 < 0 independently of x, every stationary point of f is indeed a maximum, entailing q
IN [f ] = eN f (−k) 2π/N = eN k
2 /2
q
2π/N
Evaluating the integral directly using completion of squares gives IN [f ] =
Z
∞ −∞
dx e−N x
/ 2−kx
= eN k
2 /2
Z
∞ −∞
dx e−N (x+k)
/ 2/2
= eN k
2 /2
q
2π/N
which confirms that Laplace’s method is exact in this case. (b) Choosing 1 2 x4 , f (x) = − ax − 2 4 with a a real constant, we have f ′ (x) = −ax − x3 = −x(a + x2 ) , which has zeros at x = x0 = 0 ,
√ and x = x± = ± −a .
Second derivatives evaluate to f ′′ (x) = −a − 3x2 . ⇒ f ′′ (x0 ) = −a , f ′′ (x± ) = 2a . Hence f has a maximum at x0 for a > 0 and a minimum for a < 0. The stationary points x± are purely imaginary (and correspond to minima in the x direction) for a > 0, whereas they are real and correspond to maxima of f for a < 0. Hence
lim N −1 ln IN [f ] =
N →∞
f (0) = 0
f (x± ) = a2 /4 , ifa < 0 .
Note that the result is continuous at a = 0.
1
, if a > 0 ,
The results are easily understod, by noting that the small x asymptotics of f is dominated by the quadratic contribution, entailing that x = 0 corresponds to a maximum at a >, and to a minumum at a < 0. See the figure below. 0.5
0
f
-0.5
-1
-1.5
-2 -2
-1
0
1
2
x
Plot of f for a = +1 (red), and a = −1 (green).
Question 2 Investigate Curie-Weiss version of lattice gas model H(n) = − with Jij =
J N
X 1X Jij ni nj − µ ni , 2 i,j i
for all pairs i 6= j.
(a) For the Curie-Weiss couplings the interaction term can be written as a complete square, thus H(n) = −
X J X J X 2 ni + ni − µ ni . 2N i 2N i i
where the second term corrects for self-interactions that were included in the first (noting that n2i = ni ). The result can be written as "
#
J J H(n) = N − ρ¯2 − µ¯ ρ + ρ¯ 2 2 with ρ¯ = ρ¯(n) =
1 N
P
i
ni .
(b) To evaluate the partition function, X
ZN =
e−βH(n)
n
we (i) neglect the last O(1) term as subdominant and (ii) realizing that interactions occur in the model only via a complete square, use Gaussian linearisation to decouple sites. ZN =
XZ n
=
Z
dρ q
2π/(N βJ)
dρ
q
2π/(N βJ)
e
2
e−
N βJ 2 ρ +N βJρ¯ ρ+N βµ¯ ρ 2
− N2βJ ρ2
Y i
"
X
ni ∈{0,1}
e
β(Jρ+µ)ni
#
where we have used N ρ¯ = w.r.t. sites. We get ZN =
P
i
ni , entailing that the partition function factors
dρ
Z
q
2π/(N βJ)
e
h
ρ2 +ln 1+eβ(Jρ+µ) N − βJ 2
i
,
which is of a form that can in the large N limit be evaluated by the Laplace method. (c) The dominant contribution to the ρ integral comes from a maximum of the exponent, which is located by looking at the derivative of the exponent w.r.t. ρ. I.e. ρ mus solve eβ(Jρ+µ) ρ= , 1 + eβ(Jρ+µ) which is the stated equation of state. The free energy per site is given by f (β, µ) = lim (−βN )−1 ln ZN = N →∞
J 2 ρ − β −1 ln 1 + eβ(Jρ+µ) , 2
where ρ must be chosen among the solutions of the equation of state that maximize the exponent in the integral representation of ZN . (d) From this expression of f = f (β, µ) we have that −
∂ eβ(Jρ+µ) f (β, µ) = =ρ, ∂µ 1 + eβ(Jρ+µ)
where the last equation is a consequence of the self-consistency equation. Note, implicit µ dependences via the µ dependence of ρ need not be taken into account as f = f (β, µ) is stationary w.r.t. variations of ρ. On other hand, evaluating 1 ∂ ∂ ln ZN ρ = − f (β, µ) = lim N →∞ βN ∂µ ∂µ using the definition ZN = given entails
P
n
e−βH(n) and the lattice gas energy function as 1 X hni i , N →∞ N i
ρ = lim
with h·i denoting the Gibbs average generated bty H(n), so the result does indeed give the average density. (e) The equation of state can be translated into ρ=
1 1 + tanh(β(Jρ + µ)) = gβ,µ (ρ) 2
The function gβ,µ (ρ) is a sigmoid function, which is symmetric about the point (−µ/J, 1/2), where it exhibits an inflection point and maximal slope ′ gβ,µ (−J/µ)) = βJ . For βJ < 4 therefore the solution solution of the equation 4 of state (the intersection of the function gβ,µ (ρ) with the diagonal is unique, and will vary continuously between 0 and 1, if µ is increased from −∞ to +∞. On the other hand, for βJ > 4 there will be two tangent bifurcations, where 3
gβ,µ (ρ) touches the diagonal. As µ is increased from −∞ to +∞, the first tangent bifurcation creates a pair of solutions at ρ > 1/2 the larger one stable and increasing with µ, the smaller one unstable and decreasing with µ, until the unstable solution eventually merges with the stable small-ρ solution (that increases with µ and disappears in another tangent bifurcation.
Graphical solution of the equation of state using J = 1.
Left: shown are the functions gβ,µ (ρ) at β = 3, for µ =
−1, −0.75 − 0.5, −0.25, and 0 (right to left), showing that the intersection of gβ,µ (ρ) with the diagonal is continuous. Right: The same set of curves (and two additional ones for µ ≃ −0.57 and µ ≃ −0.43) are shown, but now for β = 6. The additional µ-values are the approximate locations of the tangent-bifurcations where gβ,µ (ρ) touches the diagonal and a pair of stable and unstable FPs appears (at µ ≃ −0.57) and disappears (at µ ≃ −0.43) as µ is increased through these values, leading to discontinuous transitions at these points.
4
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 5 Question 1 Investigate the stability of solutions of the fixed-point equations for the 3 different methods we used in class to solve the Curie-Weiss moodel in zero external field. I.e. determine whether the solution of your fixed-point equations derived within the Laplace- or saddle point method gives the dominant contribution to the integral representation of the partition function. Note that for the solution in terms of the δ-function method, you would wish to look at eigenvalues of the Jacobian
∂ 2g =
∂2 ∂m2 ∂2 ∂m∂ m ˆ
∂2 ∂m∂ m ˆ ∂2 ∂(m) ˆ 2
Assuming that βJ(1−m2 ) < 1 for the non-trivial solution of the fixed point equation in the low temperature phase, what can you say about the nature of the solutions (maximum/minimum/saddle point) in the high and low temperature phases? Question 2 Investigate the equation of state of the Curie-Weiss model in zero external field m = tanh(βJm) and show — by using the Taylor expansion of the hyperbolic tangent to 3-rd order in its small argument — that for small values of (−t) = βJ−1 we have a leading βJ order small-t behaviour of the form √ m ≃ −3t . Apart from truncating the expansion of the hyperbolic tangent at third order, this result uses an additional approximation by setting βJ ≃ 1 for the small t-values considered. Show that the dominant correction, when βJ is expanded in t, gives √ m ≃ −3t 1 + t . Show that including the next order in the expansion of the hyperbolic tangent will change the result at the order in t shown, and thus demonstrate that the above result for the dominant correction is inconsistent, and higher orders in the expansion of the hyperbolic tangent are needed to get a consistent result. Hint: use the fact that including the next order of the expansion of the hyperbolic tangent gives a biquadratic equation for βJm, and use appropriate expansions of the resulting solution for m. Evaluate the higher order correction implied by the expansion of the hyperbolic tangent to 5th order. Question 3 Use the expansions of m obtained in question 2 to find the leading and subleading order contributions to the susceptibility and the specific heat at small (−t).
1
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 5 Question 1 Stability of the solutions of the FPEs. (i) For the combinatorial and Gaussian linearization methods, we have ZN ∼ N2 1−m 1−m 1+m 1+m 2 with g(m) = βJ 2 m + βhm − 2 ln 2 − 2 ln 2 . In this case the FPE is g ′ (m) = β(Jm + h) +
R1
−1 dm e
N g(m) ,
1 1−m ln =0 2 1+m
which as we have shown is equivalent to m = tanh[β(Jm + h)]. To assess stability we look at 1 g ′′ (m) = βJ − 1 − m2
The m = 0 solution is stable (a local maximum of the integrand), as long as g ′′ (0) = βJ − 1 < 0, which requires high temperatures (small β). With the assumption βJ(1 − m2 ) < 1 given for the non-trivial solution in the low-temperature phase βJ > 1, we conclude that g ′′ (m) < 0 in the low temperature phase as well provided the non-trivial solution of the FPE is chosen. (ii) Using the Gaussian linearization technique we obtain ZN ∼ 2 g(m) = − βJ 2 m + ln 2 cosh[β(Jm + h)]. In his case the FPE is
R1
−1 dm e
N g(m) ,
with
g ′ (m) = −βJm + βJ tanh[β(Jm + h)] = 0 which is clearly equivalent to m = tanh[β(Jm + h)]. To assess stability we look at g ′′ (m) = −βJ + (βJ)2 (1 − m2 ) = βJ[βJ(1 − m2 ) − 1) entailing that the trivial solution corresponds to a maximum of g with g ′′ (0) < 0, provided that βJ < 1, i.e. in the high temperature regime. Under the assumption given about the non-trivial low temperature solution, we see that this solution, too, corresponds to a maximum of g, with g ′′ (m) < 0. (ii)
For the delta-function method, we have ZN ∼ g(m, m)) ˆ =
R1
ˆe −1 dmdm
N g(m,m)) ˆ
with
βJ 2 m + βhm − imm ˆ + ln[2 cosh(im)] ˆ 2
and we get the FPEs ∂g ∂m ∂g ∂m ˆ
= β(Jm + h) − im ˆ =0 = −im + i tanh(im) ˆ =0
which, upon inserting the first equation into the second are once more seen to be equivalent to m = tanh[β(Jm + h)]. To assess stability, we have to look at the matrix of second derivatives of g, ! ∂2 ∂2 βJ −i 2 2 ∂mm ˆ ∂m2 = ∂ g= ∂2 ∂ −i −(1 − m2 ) 2 ∂mm ˆ ∂m ˆ
1
where we have made use of the FPEs. The eigenvalues of ∂ 2 g are i p 1h λ1,2 = βJ − (1 − m2 ) ± (βJ − (1 − m2 ))2 − 4(1 − βJ(1 − m2 )) 2
Hence the m = 0 solution at high temperatures (βJ < 1) corresponds to a maximum of G (two negative eigenvalues). Both eigenvalues will be zero exactly at the critical point βc J = 1. At low temperatures (βJ > 1), the eigenvalues of ∂ 2 g at the trivial solution m = 0 will be of opposite sign, so the trivial high temperature solution becomes a saddle point. Conversely, with the assumption βJ(1 − m2 ) < 1 for the non-trivial solution in the low temperature phase, both eigenvalues would be positive, rendering the solution a minimum, which is contrary to our expectation that the solution should characterize the dominant contribution to the integral. This is indicative of the difficulties one encounters when assessing stability in the context of using the saddle point method involving contours in the complex plane. A possible way out consists in noting that both FPEs entail a unique relation between im ˆ and m. Using the second equation which implies that the stationry m ˆ value is a 1+m function of m, viz. im ˆ = im(m) ˆ = arctanh(m) = 21 ln 1−m and exploiting the fact that 1 1 cosh2 (im) ˆ = (1−tanh = 1−m 2 one can rewrite g as a function of m alone (inserting the 2 (im) ˆ expression for im ˆ that gives the dominant contribution at each m, βJ 2 m 1+m 1 m + βhm − ln − ln[(1 + m)(1 − m)] + ln 2 2 2 1−m 2 1−m 1−m 1+m 1+m βJ 2 m + βhm − ln − ln 2 2 2 2 2
g(m) = g(m, m(m))) ˆ = =
This is identical to the g(m) obtained within the combinatorial method, and we can go back to arguments presented there to convince ourselves that the trivial solution gives the dominant contribution at high temperatures, whereas the non-trivial solution gives the dominant contribution at low temperatures. Alternatively one could use the first equation im ˆ = β(Jm + h) to obtain g(m) = g(m, m(m))) ˆ =
βJ 2 m + βhm − β(Jm + h)m + ln[2 cosh(β(Jm + h))] 2
and in this form, the function g(m0 is identical to that obtain using the Gaussian linearization approach, confirming that the solutions chosen at low and high temperature do give dominant contributions by using the analysis given there. Question 2 We investigate the behaviour of the zero field magnetization m of the Curie Weiss model, which is given in terms of the solution of the FPE m = tanh(βJm) in which β = 1/kB T . To do this we use the expansion of the hyperbolic tangent about x = 0. 1 2 tanh(x) = x − x3 + x5 + O(x7 ) . 3 15 To investigate the dominant asymptotics for T . Tc (or equivalently for β & βc [where βc J = 1] we use the above expansion to third order, giving 1 m = tanh(βJm) = βJm − (βJm)3 + O((βJm)5 ) . 3 Neglecting the O((βJm)5 ) correction we can write the equation for the non-zero solutions in the form βJ − 1 (βJm)2 = 3 = −3t βJ 2
βc J T Here we used βc J = 1 to write βJ−1 βJ = 1 − βJ = 1 − Tc = −t. Taking square roots (restricting attention to the positive solution), we get
m= Using
1 βJ
=
βc J βJ
=
T Tc
=
T −Tc +Tc Tc
√
−3t
1 βJ
= 1 + t we get m=
√
−3t (1 + t) .
Since for small t we can set 1 + t ≃ t, we can to leading order in t neclect the (1 + t) correction factor, to get √ m = −3t to leading order. The question then is, whether including the subdominant correction factor is consistent to the order in t shown, or whether, taking further terms in the expansion of the hyperbolic tangent into account, would change the precise form of the sub-dominant correction. To test whether taking higher order contributions in the expansion of the hyperbolic tangent will affect the result at the order ing t shown, we include the fifth order term of the expansion of the hyperbolic tangent in the equation for non-zero solutions of the equation of state, which then must satisfy 1 βJ − 1 2 (βJm)4 − (βJm)2 + =0 15 3 βJ up to the order in the expansion shown. This is a bi-quadratic equation in m. Using Tc −T βc J = 1, and introducing βJ−1 βJ = Tc = −t, we get 5 15 (βJm)4 − (βJm)2 − t = 0 2 2 which is solved by r 24 i 5h (βJm) = 1 ± 1 + t 4 5 In order that the solution becomes small as t → 0, we have to take the negative sign in front of the √square root. Expanding the square root to second order in its small argument, i.e. using 1 + ǫ ≃ 1 + 21 ǫ − 81 ǫ2 , we get 2
(βJm)2 ≃ −3t +
18 2 t 5
up to second order in t. Taking square roots gives r 6 βJm ≃ ± (−3t) 1 − t 5
which, on expanding the second factor inside the square root gives βJm ≃ ±
p 3 (−3t) 1 − t 5
In order to get an expansion consistent in powers of t, we need to divide by βJ and consider its expansion, for which we found 1 βc J T = = =1+t βJ βJ Tc 3
above. Putting things together we get m≃±
p 2 (−3t) 1 + t + O(t2 ) . 5
The dominant correction to scaling thus is affected by the higher order of the expansion of the hyperbolic tangent. Question 3 From 1 − m2 χ=β 1 − βJ(1 − m2 ) √ one obtains (using m ≃ ± −3t) for small t < 0: χ≃
1 −2Jt
while with the (correct) first order correction in m, we have χ≃
1 9 1 + t + O(t2 ) . −2Jt 5
Note that this result exploits the fact that we are only needing m2 in the above calculation, and so we could avoid the inaccuracies coming from the expansion of the square root in the calculation of m. For the specific heat (βJm)2 (1 − m2 ) c = kB 1 − βJ(1 − m2 ) we get the leading contribution (using m2 = (−3t)(1 + t)2 and βJ = 1/(1 + t)) c≃
3kB 1 + t + O(t2 ) 2
2 and when correct first order corrections in m are included – i.e. using (βJm)2 ≃ −3t+ 18 5 t and βJ = 1/(1 + t) – becomes
c≃
3kB 8 1 + t + O(t2 ) 2 5
4
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 6 Question 1 Given the Hamiltonian of a system of spin-1 particles on a square lattice with periodic boundary conditions, H(S) = −J
X
(i,j)
Si Sj − h
X
S1 − k
i
X
Si2 ,
Si ∈ {0, ±1} .
i
Here the first sum is over all pairs of neighbouring spins, the second and third sum are over all spins in the system. Construct a variational mean-field theory based on the test Hamiltonian H0 (S) = −k1
X i
S1 − k2
X
Si2 ,
i
and show that the equations of state for this system, obtained by minimizing the variational free energy w.r.t. the parameters k1 and k2 of the test Hamiltonian H0 give equations for the magnetization m = hSi i0 and the ‘activity’ a = hSi2 i0 of the form 2eβh2 sinh(β(4Jm + h1 )) 2eβh2 cosh(β(4Jm + h1 )) m= , and a = . 1 + 2eβh2 cosh(β(4Jm + h1 )) 1 + 2eβh2 cosh(β(4Jm + h1 )) [Hint: you should cast your variational equations into the form of a linear homogeneous system of equations for the variables x = 4Jm + h1 − k1 and y = h2 − k2 , and convince yourself of the fact that it has only a trivial solution (x, y) = (0.0). Question 2 Consider a Curie-Weiss Random Field Ising Model, described by the Hamiltonian X J X H(S) = − Si Sj − hi Si , N (i,j) i and consisting of N spins. Here, the first sum is over all pairs of spins and the second over all sites in the system. The hi are assumed to be independent, identically distributed random fields. (a) Evaluate the {hi }-dependent free energy per spin at inverse temperature β = (kB T )−1 , and show that in the thermodynamic limit N → ∞ it is given by βJ 2 m + h ln[2 cosh(β(Jm + h))]ih N →∞ 2 where h. . .ih denotes an average over the hi distribution, and where the magnetization m, must solve the fixed-point equation −βf (β) = − lim βfN (β) = −
m = htanh(β(Jm + h))]ih . Hint: You will need to appeal to the law of large numbers to obtain the average h. . .ih as the large system limit of an empirical average. (b) Assume a symmetric bimodal distribution of random fields of the form 1 p(h) = [δ(h − h0 ) + δ(h + h0 )] 2 Use graphical and asymptotic analyses of the FPE to discuss the phase diagram of the system in the (β, h0 )-plane. 1
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 6 Question 1 We consider a system of spin-1 particles on a square lattice with periodic boundary conditions, described by the Hamiltonian H(S) = −J
X
Si Sj − h1
X
S1 − h2
Si2 ,
Si ∈ {0, ±1} ,
i
i
(i,j)
X
and attempt to onstruct a variational mean-field theory based on the test Hamiltonian H0 (S) = −k1
X
S1 − k2
X
Si2 ,
i
i
(a) The variational mean-field theory is constructed by minimizing F˜ = F0 + hH(S) − H0 (S)i0 over the coupling constants k1 and k2 of the test Hamiltonian H0 . Here we use h. . .i0 to denote a Gibbs average in the test ensemble generated by H0 . First compute the free energy F0 of the system described by the test Hamiltonian. Get Z0 =
X
e−βH0 (S) =
S
X
eβk1 S+βk2 S
2
S∈{0,±1}
N
= (1 + 2eβk2 cosh(βk1 ))N ,
where we used the fact that the test system is a collection of N independent spin-1 systems; the partition function is therefore the product of the single site partition functions. As a consequence we have F0 = −β −1 N ln(1 + 2eβk2 cosh(βk1 )) Next, evaluate hSi i0 =
2eβk2 sinh(βk1 ) , 1 + 2eβk2 cosh(βk1 )
(again exploiting the independence of individual spins – you should fill in the details of this argument!) and similarly hSi2 i0 =
2eβk2 cosh(βk1 ) . 1 + 2eβk2 cosh(βk1 )
Similarly, because of the independence of spins in the test ensemble, the expectation of products of spins factorizes, hSi Sj i0 = hSi i0 hSj i0 =
2eβk2 sinh(βk1 ) 2 1 + 2eβk2 cosh(βk1 )
Introducing the abbreviations m = hSi i0 and a = hSi2 i0 for the i independent magentizations and ‘activities’, we can write the variational free energy F˜ as h
i
F˜ = N − β −1 ln(1 + 2eβk2 cosh(βk1 )) − 2Jm2 − (h1 − k1 )m − (h2 − k2 )a The variational equations of state are given by ∂ F˜ ∂k1 ∂ F˜ ∂k2
i ∂m ∂m ∂a − (h1 − k1 ) − (h2 − k2 ) +m =0 ∂k1 ∂k1 ∂k1 h i ∂m ∂m ∂a = N − a − 4Jm − (h1 − k1 ) − (h2 − k2 ) +a =0 ∂k2 ∂k2 ∂k2 h
= N − m − 4Jm
1
which can be rewritten as ∂m [4Jm + h1 − k1 ] + ∂k1 ∂m [4Jm + h1 − k1 ] + ∂k2
∂a [h2 − k2 ] = 0 ∂k1 ∂a [h2 − k2 ] = 0 . ∂k2
This can be seen as linear homogeneous system of equations for the variables [4Jm+ h1 − k1 ] and [h2 − k2 ] with matrix of coefficients given by A=
∂m ∂k1 ∂m ∂k2
!
∂a ∂k1 ∂a ∂k2
=β
2
a − m2 m − m2 m − am a − a2
!
Using m = tan(βk1 )a one can show that det(A) 6= 0, [you should convince yourself of this fact !], so the only solution is k1 = 4Jm + h1 ,
k2 = h2 .
Given the definitions of m and a in terms of averages over the test ensemble, we thus have the equations of state m=
2eβh2 sinh(β(4Jm + h1 )) , 1 + 2eβh2 cosh(β(4Jm + h1 ))
and a =
2eβh2 cosh(β(4Jm + h1 )) . 1 + 2eβh2 cosh(β(4Jm + h1 ))
Question 2 Given the Hamiltonian of a Curie-Weiss RFIM H(S) = −
X J X hi Si Si Sj − N (i,j) i
with the hi denoting independent identically distributed random fields. (a) To compute the {hi }-dependent free energy per spin, evaluate the partition function at given configuration of the random fields. Exploit the fact that, up to an O(1) contribution, the interaction term can be written as a complete square, which is then linearized using a standard identity for Gaussian integrals. This gives ZN
=
X
exp
S
≃
XZ S
=
Z
=
Z
(
X βJ X 2 βJ Si − hi Si +β 2N i 2 i
)
(
X dm N βJ 2 p exp − (Jm + hi )Si m +β 2 2π/(βJN ) i
)
(
i h βJ dm 1 X p ln[2 cosh(β(Jm + hi ))] m2 + exp N − 2 N i 2π/(βJN ) p
i h βJ dm m2 + hln[2 cosh(β(Jm + h))]ih exp N − 2 2π/(βJN )
)
where the O(1) contribution in the exponent is neglected after the first line, and the LLN is invoked to obtain the last line. Evaluating the m integral via the Laplace method requies finding the stationary point of the exponential, leading to the FPE m = htanh(β(Jm + h))]ih
2
and the free energy in the thermodynamic limit is given by −βf (β) = lim N −1 ln ZN = − N →∞
βJ 2 m + hln[2 cosh(β(Jm + h))]ih , 2
with m a solution of the FPE which gives rise to the (absolute) minimum of f (β). The solution of the FPE and the free energy per site in the thermodynamic do not depend on the specific configuration of the random fields (as a consequence of the LLN invoked above). (b) For a bimodal field distribution 1 p(h) = [δ(h − h0 ) + δ(h + h0 )] 2 the FPE reads 1 m = htanh(β(Jm + h))]ih = [ tanh(β(Jm + h0 )) + tanh(β(Jm − h0 ))] ≡ f (m) 2 One can try a small m expansion of f (m) which will allow to locate the transition if it is continuous (as in the CW model). A necessary condition to actually have a continuous transitio is that f ′′′ (0) < 0. For a symmetric field distribution all even derivatives of f are zero if evaluated at 0. Indeed we have f ′ (m) = βJh1 − tanh2 (β(Jm + h))ih f ′′ (m) = −2(βJ)2 htanh(β(Jm + h))(1 − tanh2 (β(Jm + h)))ih f ′′′ (m) = −2(βJ)3 h(1 − tanh2 (β(Jm + h)))2 − 2 tanh2 (β(Jm + h))(1 − tanh2 (β(Jm + h)))ih so f (0) = = 0 f ′ (0) = βJh1 − tanh2 (βh)ih f ′′ (0) = = 0 f ′′′ (0) = −2(βJ)3 h1 − 4 tanh2 (βh) + 3 tanh4 (βh)ih One can convince onself that f ′′′ (0) < 0 for sufficiently small h0 but that it changes sign if for larger βh0 , so there cannot be a continuous phase transition at larger h0 . Where the transition is continuous, the critical condition is βc Jh1 − tanh2 (βh)ih = 1 Thus there cannot be ferromagnetic order for βJ < 1, and the critical β increases with h0 for h0 > 0. For sufficiently large h0 the transition becomes discontinuous. Indeed for h0 > 1 it is impossible to have ferromagnetic order evern in the limit β → ∞. (Convince yourself that in that limit f (m) has two steps at ±h0 , jumping from -1 to 0 at −h0 and from 0 to +1 at +h0 . Sketches to follow.
3
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 7 Question 1 Consider the Hopfield model of a neural network storing p patterns. Fill in the details concerning the existence of so-called n-symmetric solutions of the FPEs D h X iE µ µ mµ = ξ tanh β ξ mµ , µ
ξ
described in the lecture notes. I.e. (a) Show that the ansatz mµ = m for 1 ≤ µ ≤ n,
mµ = 0 for µ > n .
provides a self-consistent solution of the FPEs. To do this you need to demonstrate that inserting the ansatz on the r.h.s. of the FPEs reproduces mµ of exactly this form on the l.h.s. Show that the equation for the symmetric amplitude m in this ansatz must satisfy n n h X iE 1 D X µ µ m= , ξ tanh βm ξ n ξ µ=1 µ=1 (b) Use the equation for the amplitude m of the n-symmetric solutions to show that for β & 1 the leading asymptotics for m at small negative values of t = (T − Tc )/Tc , with Tc = 1, is r −3t m≃ . 3n − 2 Note that for retrieval-states with n = 1, this is the same as the dominant asymptotics of the magnetization of the Curie Weiss model in the vicinity of the critical temperature. Question 2 Consider N independent particles on the positive real line. Assume that each of them is attached to a spring with random spring constant ki , so that the Hamiltonian of the system is X H(x) = ki xi , xi ≥ 0 . i
(a) Compute the {ki }-dependent partition function and the corresponding free energy. (b) Show that the {ki }-dependent free energy in the thermodynamic limit converges to 1 ln ZN (β) = −hln(βk)ik , −βf (β) = lim N →∞ N in which h. . .ik denotes an average over the (c) Use the replica identity explained in class to compute the quenched free energy 1 1 −βfq (β) = lim lim lnhZNn (β)i N →∞ n→0 N n and show that it agrees with the result obtained in (b). Hint: Use the explicit expression for the {ki } dependent partition function and a small n expansion of ZNn (β) when computing the average. 1
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 3 Question 1 We investigate the self-consistency equations for the overlaps in the Hopfield model: D h X iE mµ = ξ µ tanh β ξ µ mµ . ξ
µ
(a) To show that the ansatz mµ = m for 1 ≤ µ ≤ n,
mµ = 0 for µ > n .
provides a self-consistent solution of the FPEs, insert this form in the r.h.s of the FPEs, which gives n iE D h X . ξν mµ = ξ µ tanh βm ν=1
ξ
Clearly, if µ > n, then using independence of the ξ µ , we can conclude from this equation that mµ = 0, as n iE h X D E D µ ξν =0. tanh βm mµ = ξ ξ
ν=1
ξ
Conversely, if µ ≤ n, then the argument of the tanh also contains a ξ µ . Moreover, as the argument of the tanh in this equation is fully symmetric w.r.t. a permutation of the 1 ≤ µ ≤ n, it is clear that the mµ as computed from this ansatz are also independent of µ for 1 ≤ µ ≤ n. Thus summing over µ, we obtain n n h X iE 1 D X µ m= ξ tanh βm , ξν n ξ µ=1 ν=1
as claimed.
(b) To obtain the dominant asymptotics for the n-symmetric amplitude at small negative t = (T − Tc )/Tc , with Tc = 1, we use the expansion of the hyperbolic tangent to third order to get n m ≃ βm
n D X µ=1
ξ
µ
2 E
n
D X 4 E 1 ξµ − (βm)3 3 ξ ξ µ=1
Expanding the second and fourth power of the ξ µ -sum and using the fact that averages are non-zero if any of the ξ µ appearing in the expansion appears with even power (2 or 4), we get — doing the combinatorics — that 1 n m ≃ βmn − (βm)3 (n + 3n(n − 1)) 3 For the non-zero solutions, we can divide by nm, and proceed as for the Curieweiss model, to get r −3t m≃± . 3n − 2 1
at dominant order in t. Note that for retrieval-states with n = 1, this is the same as the dominant asymptotics of the magnetization of the Curie Weiss model in the vicinity of the critical temperature. Question 2 We consider N independent particles on the positive real line, each of them attached to a spring with random spring constant ki , so that the Hamiltonian of the system is X H(x) = ki xi , xi ≥ 0 . i
(a) The partition function of the system is Z ∞Y YZ P −β i ki xi = ZN (β) = dxi e 0
i
i
∞
dxi e−βki xi = 0
Y 1 βki i
(note the particles are independent/non-interacting, so the partition function factorizes w.r.t. i) (b) The free energy per particle then is 1 1 X ln ZN (β) = − lim ln(βki ) . N →∞ N N →∞ N i
−βf (β) = lim
The limit is evaluated by appeal to the law of large numbers givning 1 ln ZN (β) = −hln(βk)ik , N →∞ N
−βf (β) = lim
provided that the average over the ki distribution exists. The limiting free energy is therefore non-random. (c) To evaluate the free energy in terms of the replica identity, we have to look at 1 1 lnhZNn (β)i{ki } N →∞ n→0 N n
−βfq (β) = lim lim
Using the expression for the {ki }-dependent partition function obtained in (a), we have * + Y 1 1 1 −βfq (β) = lim lim ln N →∞ n→0 N n (βki )n i {ki }
A small n expansion of ZNn (β) gives Y Y ZNn (β) = e−n ln(βki ) = (1 − n ln(βki ) + O(n2 )) , i
i
so E E XD 1 1 XD lnhZNn (β)i{ki } = ln(1−n ln(βki )+O(n2 )) =− ln(βki ) +O(n) n n i ki ki i where the last step results from the smalle n expansion of th outer logarithm. Noting that the ki averages are i independent and putting everything together we obtain the desired result. 2
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 7 Question 1 Consider a disordered Einstein solid in an random site dependent electric field Ei , described as a collection of independent harmonically bound particles with Hamiltonian N N X kX 2 H(x) = xi − q Ei xi , 2 i=1 i=1 in which the fields Ei are independent identically distributed random variables, and q denotes the charge of the particles. The coordinates xi can take any real value. (a) Compute the {Ei }-dependent free energy per particle and show that in the thermodynamic limit it is given by 2π βq 2 2 1 ln + hE iE −βf (β) = − lim βfN (β) = N →∞ 2 βk k where h. . .iE denotes an average over the Ei distribution. (b) Show explicitly that f (β) as computed in (a) agrees with the result obtained via the replica identity, i.e. 1 ln hZNn iE N →∞ n→0 N n
−βf (β) = −βfq (β) = lim lim (c) Compute the macroscopic polarization N q X p= hxi i N i=1
in terms of the Gibbs average h. . .i corresponding to the Hamiltonian. Question 2 Consider a Sherrington-Kirkpatrick model described by 1X Jij Si Sj H(S) = − 2 i6=j where the Jij are independent zero-mean Gaussian random couplings Jij ∼ N (0, σ 2 ), with σ 2 = 1/N , and Jij = Jji . (a) Evaluate the average of the replicated partition function, and show that it can be expressed as n o n N β2 X X hZ n i = qab ({S a })2 + O(n2 ) exp 4 a,b=1 a {S }
P
where the {Sa } (. . .) denotes a sum over replicated spin-configurations (and is still to be performed), and qab ({S a }) =
N 1 X a b S S N i=1 i i
Specify the O(n2 ) terms neglected in the exponent. 1
(b) Use Gaussian linearization to show that hZ n i can be expressed in integral form as 2 X Z Y o n β2 X β dqab β2 X 2 n a b p exp N hZ i = . n− q +ln qab S S exp 4 4 a6=b ab 2 a6=b 4π/(β 2 N ) a q6=b {S }
(c) Hence show that the integral expression defining hZ n i is dominated by qab ’s satisfying qab = hS a S b i , where h. . .i is a Gibbs average w.r.t. the Hamiltonian Hefff = −
βX qab S a S b . 2 a6=b
(d) Assuming a replica symmetric solution qab = q for a 6= b, show that, as n → 0, the replica symmetric order parameter q satisfies the selfconsistency equation Z √ q = Dz tanh2 (β q z) . Use the inequality tanh2 (x) ≤ x2 to show that the only solution of this equation for β < 1 is q = 0. Moreover, use the expansion of the hyperbolic tangent for small arguments tanh(x) ≃ x − x3 /3 + O(x5 ) to show that for small positive values of β − 1 the solution has the asymptotic behaviour q ≃ (β − 1) ,
2
as β → 1 .
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 3 Question 1 Given an Einstein solid in an random site dependent electric field Ei , described as a collection of independent harmonically bound particles with Hamiltonian N
H(x) =
N
X kX 2 xi − q Ei xi , 2 i=1 i=1
in which the fields Ei are independent identically distributed random variables, and q denotes the charge of the particles. The coordinates xi can take any real value. (a) To compute the {Ei }-dependent free energy evaluate ZN =
Z Y N
dxi exp
i=1
n
N
N
o X βk X 2 − xi + βq Ei xi 2 i=1 i=1
Factors in i; product of Gaussian integrals N N r n βq 2 E 2 o Y Y 2π i exp ZN = Zi = βk 2k i=1 i=1 Free Energy N 1 1 X h 2π βq 2 Ei2 i −βfN (β) = ln ln ZN = + N 2N i=1 βk k
Evaluate using the law of large numbers. Provided that the expectations in question exist, one has 2 2 i 1 h 2π βq E + −βf (β) = − lim βfN (β) = ln N →∞ 2 βk k E (b) Replica: evaluate average of n-th power of partition function, and use the fact that the Ei averages factor w.r.t. i; get * N r ! + N r n βq 2 E 2 o n n βq 2 E 2 on Y 2π Y 2π i i n hZN i = exp = exp βk 2ki βk 2ki i=1 i=1 Expand for small n hZNn i
=
N Y i=1
1 + n ln
r
n βq 2 E 2 o 2π i 2 + O(n ) exp βk 2k
Use that averages are independent of i, and ln(1 + na) = na + O(n2 ) to get 2 2 i 1 1 h 2π βq E n ln hZN i = + −βfq (β) = lim lim ln N →∞ n→0 N n 2 βk k E which agrees with −βf (β), as computed in (a). 1
(c) To compute the polarization p=
N q X hxi i N i=1
from the Gibbs average generated by H, note that Z o n β 1 hxi i = dxi xi exp − kx2i + βqEi xi Zi 2 2 Z βq 2 Ei2 qEi qEi β qEi 1 , + + xi − exp − k xi − dxi = Zi k k 2 k 2k which, exploiting the symmetry of the Gaussian integral involved, gives hxi i = hence p= by the law of large numbers.
qEi k
N q2 1 X q 2 Ei = hEiE N i=1 k k
Question 2 We look at a Sherrington-Kirkpatrick model described by 1X H(S) = − Jij Si Sj 2 i6=j with the Jij for i < j independently distributed according to the pdf √ p(Jij ) = 2πN exp(−N Jij2 /2) and Jij = Jji . (a) To evaluate hZ n i we write n
Z =
X
{Sa }
X X a a exp β Jij Si Sj i<j
a
Since the average factors w.r.t. the bonds, and Sia = ±1 we have * + XY X hZ n i = exp βJij Sia Sja , a
{Sa } i<j
which on performing the Gaussian Jij integrals gives X 2 XX n 2 X N β2 X β 2 a b a b n a 2 β S S S S = hZ i = qab ({S }) − n exp exp 4N i6=j a,b i i j j 4 a,b=1 4 a a {S }
{S }
2
which is of the form stated, with the O(n2 ) contribution β4 n2 arising to compensate for the inclusion of diagonal i = j terms in the first contribution. 2
(b) Using Gaussian linearization, one can rewrite hZ n i (ignoring the O(n2 ) contribution) as XZ Y N β 2 X 2 β 2 X X a bo N β2 dqab n p n− . hZ i = qab + qab exp Si Si 2N ) 4 4 2 4π/(β a i a6=b a6=b a6=b {S }
The partition sum now factors w.r.t. i and gives 2 X Z Y o n β2 X dqab β β2 X 2 n a b p hZ i = exp N n− . q +ln qab S S exp 4 4 a6=b ab 2 a6=b 4π/(β 2 N ) q6=b {S a } as claimed.
(c) The expression just derived is ready to be evaluated by the Laplace method. Defining βX Heff = − qab S a S b 2 a6=b
one can write the stationarity condition obtained by setting derivatives of the above exponential w.r.t. the qab to 0 as n o P a b S S exp − βH 2 2 a eff {S } β β n o qab = P 2 2 exp − βH a {S }
eff
in which we recognize a Gibbs average w.r.t. Heff , leading to the result claimed.
(d) Attempting a replica symmetric solution, we have X 2 β β Heff = − q Sa + n 2 2 a Coupling between different replica appear via a complete square. In evaluating the above Gibbs average we use Gaussian linearization to decouple replica giving 2 n−2 n √ P o R R P √ √ Dz 2 sinh(β q z) Dz {S a } S a S b exp β q z a S a 2 cosh(β q z) o n √ P n = q= R R P √ Dz {S a } exp β q z a S a Dz 2 cosh(β q z) In the limit n → 0 the denominator gives 1, and the nominator can be written to give Z √ q = Dz tanh2 (β q z) . as claimed.
Using the stated inequality we have Z 2 q ≤ β q Dzz 2 = β 2 q . Hence for β < 1 we have either q = 0, or 0 ≤ q < q for β < 1, which has no non-zero solution . 3
Using the expansion of the hyperbolic tangent given we have Z √ 2 √ q = Dz β q z − (β q z)3 /3 + O(q 5/2 ) = β 2 q − 2β 4 q 2 + O(q 3 ) which — after ignoring the O(q 3 ) contribution — has for q 6= 0 a solution q≃
β2 − 1 2β 4
For β − 1 being small, the dominant asymptotics is q ≃ (β − 1) ,
4
as β → 1 .
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 9 Question 1 Consider a weighted graph Laplacian on a graph of N vertices defined by X cik Kik δij Lij = cij Kij − k
with symmetric connectivity matrix C = (cij ) and symmetric weight matrix K = (Kij ). This problem is about formulating the cavity approach to the spectral problem of weighted graph Laplacians as defined above. (a) Show that a quadratic form involving the weighted graph-Laplacian L = (Lij ) can be written as X 1X Lij xi xj = − cij Kij (xi − xj )2 2 i,j
i,j
Based on this identity, What can be said about the spectrum of L if all weights Kij are positive? (b) According to the Edward Jones approach, the spectral density of L can be expressed as X 1 Re hx2i i ρL (λ) = lim εց0 πN i
in which h. . .i is an average w.r.t. the complex Gaussian measure defined by PL (x) = with HL (x) =
e−iHL (x) . ZL (λε )
1 X λε δij − Lij xi xj . 2 ij
For a graph with finite mean connectivity, use the fact that the graph is locally tree like to show that single site marginals of PL can be expressed in terms of cavity marginals as YZ i 2 (i) − 2i λε x2i dxj e− 2 Kij (xi −xj ) Pj (xj ) . Pi (xi ) ∝ e j∈∂i
Similarly, show that the cavity marginals satisfy the recursion Y Z i 2 (j) (i) − 2i λε x2j dxℓ e− 2 Kjℓ (xj −xℓ ) Pℓ (xℓ ) Pj (xj ) ∝ e ℓ∈∂j\i
[Hint: follow the reasoning of problem sheet 6 and solutions.] (c) Show that the cavity recursions are self-consistently solved by Gaussians of the form o n 1 1 (i) 2 Pj (xj ) = q x exp − j (i) (i) 2∆j 2π∆j (i)
leading to a set of self-consistency equations for the cavity marginals ∆j of the form 1
(i)
∆j =
(j)
iλε +
P 1
Kjℓ /∆ℓ ℓ∈∂j\i K − i/∆(j) jℓ ℓ
(d) Show that the single-site marginals are also Gaussian and that the single-site variances are given in terms of cavity variances as 1
∆i =
(i)
iλε +
P
Kij /∆j
.
j∈∂i K − i/∆(i) ij j
(e) For a random regular graph with uniform degrees ki = c for all i and uniform weights (i) ¯ Show that ∆ ¯ Kij ≡ K one expects all cavity variances to be the same ∆j ≡ ∆. can be found as the solution of a quadratic equation, and that its solution is ¯ 1,2 = ∆
i p i h λε + K(c − 2) ± (λε + K(c − 2))2 + 4Kλε . 2Kλε
Show that the (uniform) single site variance ∆i ≡ ∆ for the random regular graph ¯ as with ki = c can be expressed in terms of the uniform cavity variances ∆ ∆=
1 ¯
∆ iλε + c KK/ ¯ − i/∆
.
This could be used to obtain the spectrum of the graph Laplacian of a random regular graph from 1 ρL (λ) = lim Re ∆ εց0 π If you have the stamina and concentration, work out the formula for the spectral density using this equation.
2
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 9 Question 1 We are given a weighted graph Laplacian on a graph of N vertices defined by X cik Kik δij Lij = cij Kij − k
with symmetric connectivity matrix C = (cij ) and symmetric weight matrix K = (Kij ). (a) To show that the quadratic form involving the weighted graph-Laplacian L = (Lij ) is of the form claimed, work backward and first expand the square in Q=− This gives Q=−
1X cij Kij (xi − xj )2 2 i,j
1X cij Kij (x2i + x2j − 2xi xj ) 2 i,j
Using symmetry of the C and K matrices, this can be written as X XX X cik Kik x2i = Q= cij Kij xi xj − Lij xi xj i,j
i
i,j
k
as claimed. Clearly, if all weights Kij are positive then X Lij xi xj = Q ≤ 0 i,j
for all x = (xi ), i.e. the matrix is negative definite, in other words all its eigenvalues are negative. By inspection, we see there is an eigenvector of L with eigenvalue zero; it is given by the vector with constant coefficients xi = x for all i. It is unique if the system consists of a single connected component. (b) Following the Edward Jones approach, we express the spectral density of L as ρL (λ) = lim
εց0
X 1 Re hx2i i πN i
in which h. . .i is an average w.r.t. the complex Gaussian measure defined by PL (x) = with HL (x) =
e−iHL (x) . ZL (λε )
1 X λε δij − Lij xi xj . 2 ij
Following the reasoning of the cavity approach described in class, we can write the single site marginals generally as Z i P 2 − 2i λε x2i dx∂i e− 2 j∈∂i Kij (xi −xj ) P (i) (x∂i ) , Pi (xi ) ∝ e in which P (i) (x∂i ) is a joint cavity marginal of the {xj } adjacent to i. 1
For a graph with finite mean connectivity, use the fact that the graph is locally tree like to allow setting Y (i) Pj (xj ) . P (i) (x∂i ) ≃ j∈∂i
The integral over the x∂i thus factors, giving YZ i i 2 2 (i) dxj e− 2 Kij (xi −xj ) Pj (xj ) . Pi (xi ) ∝ e− 2 λε xi j∈∂i
Following the same line of reasoning for the single-site cavity marginals one obtains Y Z i 2 (j) (i) − 2i λε x2j dxℓ e− 2 Kjℓ (xj −xℓ ) Pℓ (xℓ ) Pj (xj ) ∝ e ℓ∈∂j\i
(c) To show that the cavity recursions are self-consistently solved by Gaussians of the form o n 1 1 (i) 2 Pj (xj ) = q x exp − (i) j (i) 2∆j 2π∆j insert this Gaussian ansatz into the recursion for single-site cavity marginals. Leaving aside proportionality constants, this gives n o n i o Y Z 1 1 − 2i λε x2j 2 2 2 K (x − x ) − exp − x ∝ e x dx exp − jℓ j ℓ ℓ (i) j (j) ℓ 2 2∆j 2∆ℓ ℓ∈∂j\i Expanding (xj − xℓ )2 = x2j + x2ℓ − 2xj xℓ in the first exponential of the integrands on the r.h.s., we see that we are left with Gaussian xℓ integrals, which when done give exp
n
−
1
x2 (i) j
2∆j
o
i
2
∝ e − 2 λ ε xj
Y
exp
ℓ∈∂j\i
n
−
2 i o Kjℓ 1h iKjℓ + x2j (j) 2 iKjℓ + 1/∆ℓ
∝ exp
n
2 io X h Kjℓ 1 i iKjℓ + − λε x2j − x2j (j) 2 2 iKjℓ + 1/∆ℓ ℓ∈∂j\i
∝ exp
n
−
X Kjℓ /∆(j) i o 1h ℓ x2j iλε + (j) 2 ℓ∈∂j\i Kjℓ − i/∆ℓ (i)
leading to a set of self-consistency equations for the cavity marginals ∆j of the form 1
(i)
∆j =
(j)
iλε +
P
Kjℓ /∆ℓ ℓ∈∂j\i K − i/∆(j) jℓ ℓ
(d) The single-site marginals are also Gaussian due to the fact that they are expressed in terms of the same type of integrals as the cavity marginals, the only difference being that the exclusion of the cavity-site from the set of neighbours of the site considered does not apply. Following the same algebra then gives 1
∆i =
(i)
iλε +
P
2
Kij /∆j
j∈∂i K − i/∆(i) ij j
.
(e) For a random regular graph with uniform degrees ki = c for all i and uniform weights (i) ¯ Inserting that Kij ≡ K one expects all cavity variances to be the same ∆j ≡ ∆. ansatz into the above self-consistency equation for single-site cavity variances gives ¯ = ∆
1 ¯
∆ iλε + (c − 1) KK/ ¯ − i/∆
.
¯ with solutions This can be transformed into a quadratic equation for ∆ i h p ¯ 1,2 = i ∆ λε + K(c − 2) ± (λε + K(c − 2))2 + 4Kλε 2Kλε Hence the (uniform) single site variance ∆i ≡ ∆ for the random regular graph with ¯ by inserting ki = c can be expressed in terms of the uniform cavity variances ∆ (i) ¯ ∆j ≡ ∆ into the above equation for ∆ = ∆i , giving ∆=
1 ¯
∆ iλε + c KK/ ¯ − i/∆
=
¯ −i K∆ ¯ + λε + cK . iλε K ∆
¯ can be used To evaluate this, note that the solution of the quadratic equation for ∆ first to compute i p 1 h ¯ 1,2 − i = K∆ λε − K(c − 2) ± (λε + K(c − 2))2 + 4Kλε 2iλε h i p 1 ¯ 1,2 + λε + cK = λε + K(c + 2) ± (λε + K(c − 2))2 + 4Kλε iλε K ∆ 2 from which we get ∆1,2
p (λε + Kc)(c − 2) ± c (λε + K(c − 2))2 + 4Kλε =i 2λε (λε + 2Kc)
This could be used to obtain the spectrum of the graph Laplacian of a random regular graph from 1 ρL (λ) = lim Re ∆ εց0 π Using limεց0 λε = λ, and realizing that ∆1,2 will have a non-vanishing real part, if and only if the argument of the square root in the above expression is negative, and noting that the spectral density should be non-zero only for negative λ (entailing that we have to choose the ∆2 to obtain a positive density), we get p c −(λ + K(c − 2))2 − 4Kλ ρL (λ) = − 2πλ(λ + 2Kc) In this expression we rewrite λ(λ + 2Kc) in the denominator as (λ + Kc)2 − K 2 c2 ; in the argument of the square root we use (λ+K(c−2))2 +4Kλ = (λ+Kc)2 −4K 2 (c−1), to finally get p c 4K 2 (c − 1) − (λ + Kc)2 ρL (λ) = 2π(K 2 c2 − (λ + Kc)2 ) which is non-zero and real for
√ √ −Kc − 2K c − 1 ≤ λ ≤ −Kc + 2K c − 1 . (note that we need c ≥ 2 in order to have an infinite c regular graph.) This spectral density is a modified version of the so-called Kesten-McKay law (wich describes the spectral density of the adjacency matrix of a random regular graph). 3
These figures show the spectra of the weighted graph Laplacian of a random regular graph with K = 1 and c = 2 (left), and c = 3 (right).
4
7CCMCS03 – Equilibrium Analysis of Complex Systems Exercises 10 Question 1 Consider an energy function H(x) on a discrete space Ω, which is bounded from below, i.e. H(x) ≥ H(x∗ ) for some x∗ ∈ Ω. (a) Assuming H(x) > H(x∗ ) for all x 6= x∗ , show that – in the β → ∞-limit – the Gibbs distribution 1 −βH(x) pβ (x) = e Z(β) converges to a point measure concentrated on x∗ , i.e. lim pβ (x) = δx,x∗ ,
β→∞
where δx,x∗ = 1, if x = x∗ and δx,x∗ = 0 otherwise. (b) Assuming that the minimum of H(x) is degenerate, i.e. there exist x∗µ , µ = 1, . . . , p, such that H(x∗1 ) = H(x∗2 ) = . . . = H(x∗p ) < H(x) if x 6= x∗µ , for µ = 1, . . . , p, show that under these conditions p 1X δx,x∗µ , lim pβ (x) = β→∞ p µ=1
i.e. the zero-temperature Gibbs measure gives equal weight to the p degenerate ground-states of the system Question 2 Consider a continuous-time dynamics on IRN of the form dxi = fi (x) , dt with velocity functions fi (x) given by fi (x) = −
i = 1, . . . , N ,
N X j=1
Aij
∂Φ(x) ∂xj
for some function Φ(x) on IRN (a) Show that the function Φ(x) is a Lyapunov function of the dynamics, if the matrix A appearing in the definition of the fi is positive definite. Reminder: a matrix is positive definite, if for all non-zero vectors v we have (v, Av) = P i,j Aij vi vj > 0; a positive definite matrix has only strictly positive eigenvalues. Note: in the lectures we considered the case of a diagonal matrix of the form Aij = δij qi (xi ) with qi (xi ) > 0, which is clearly a special case of the present set-up. (b) For the simplified setting where Aij = δij qi with constant positive qi , i.e. qi > 0. Show that minima of Φ(x) corresepond to strongly stable fixed points of the dynamics. Hint: You will have to show that the Jacobian of the system of equations, evaluated at fixed points x∗ is positive definite. You will have to exploit the fact that for a symmetric matrix A, and qi > 0, a nonsymmetrix matrix of the form Bij = qi Aij can be symmetrized by a similarity ˜ = D−1 BD with D = diag(√q ) transformation B i 1
7CCMCS03 – Equilibrium Analysis of Complex Systems Solutions 10 Question 1 Given an energy function H(x) on a discrete space Ω, which is bounded from below, i.e. H(x) ≥ H(x∗ ) for some x∗ ∈ Ω. (a) The Gibbs distribution pβ (x) = with Z(β) =
P
x∈Ω
1 −βH(x) e Z(β)
e−βH(x) can be rewritten as e−β[H(x)−H(x )] pβ (x) = P −β[H(x)−H(x∗ )] x∈Ω e ∗
Given that H(x)−H(x∗ ) ≥ 0 with equality iff x = x∗ , we see that e−β[H(x)−H(x )] → P ∗ δx,x∗ as β → ∞, entailing that x∈Ω e−β[H(x)−H(x )] → 1 in this limit. The result 1 −βH(x) e → δx,x∗ , pβ (x) = Z(β) ∗
as β → ∞ follows. (b) In the case of p degenerate minima of H(x), we have e−β[H(x)−H(x
∗ )]
→ 1Ix∈Ω∗ ,
as β → ∞ ,
where Ω∗ = {x∗1 , x∗2 , . . . , x∗p } and 1Ix∈Ω∗ = pµ=1 δx,x∗µ . Using the same representation as above ∗ e−β[H(x)−H(x )] pβ (x) = P −β[H(x)−H(x∗ )] x∈Ω e P
with x∗ any of the x∗µ , we find pβ (x) =
p 1 −βH(x) 1 1X δx,x∗µ , e → ∗ 1Ix∈Ω∗ = Z(β) |Ω | p µ=1
as claimed. Question 2 Given a continuous-time dynamics on IRN of the form dxi = fi (x) , dt with fi (x) = −
i = 1, . . . , N , N X
j=1
for some function Φ(x) on IRN
1
Aij
∂Φ(x) ∂xj
(a) To show that Φ(x) is a Lyapunov function of the dynamics, if A is positive definite, evaluate X ∂Φ(x) dΦ(x) X ∂Φ(x) dxi ∂Φ(x) = =− Aij dt ∂xi dt ∂xi ∂xj i ij
Given that A is assumed to be positive definite, the r.h.s. is indeed of the form P , so we can conclude that − ij Aij vi vj with vi = ∂Φ(x) ∂xi dΦ(x) ≤0, dt
with equality iff
∂Φ(x) ∂xi
= 0 for all i, that is at stationary points of Φ(x).
(b) For the simplified setting where Aij = δij qi we have dxi ∂Φ(x) = fi (x) = −qi . dt ∂xi To analyse the stability of stationary points of the dynamics we have to evaluate the Jacobian J = (Jij ) with elements Jij =
∂fi (x) ∂xj
at stationar points of the dynamics. Given the form of the fi , we have Jij = −qi
∂ 2 Φ(x) ∂xi ∂xj
By a similarity transformation we can tranform 1 √ ∂ 2 Φ(x) √ √ qj Jij → J˜ij = √ Jij q j = − q i qi ∂xi ∂xj which leaves the eigenvalues spectrum invariant. At a (local) minimum of Φ(x) we know that the matrix 2
∂ =
∂ 2 Φ(x) ∂xi ∂xj
!
is positive definite. This entails that the matrix J˜ is negative definite, as X i,j
J˜ij vi vj = −
X i,j
X ∂ 2 Φ(x) √ ∂ 2 Φ(x) √ vi q i v˜i vj q j = − v˜j < 0 , ∂xi ∂xj ∂xi ∂xj i,j
thereby establishing the claim that minima of Φ(x) correspond to strongly stable fixed points of the dynamics.
2