Equilibrium Analysis of Complex Systems
Lecture Notes of 7CCMCS03
R. Kühn
King's College London, Jan 2017
Contents

1 Introduction and Motivation
  1.1 Motivation & Scope
  1.2 The issue of methods & tools
  1.3 Example: the Ising model in some of its many guises

2 Statistical Mechanics
  2.1 Fundamental Relations
  2.2 Extremising Properties of Entropy and Free Energy
  2.3 Systems with Frozen Degrees of Freedom: Disorder & Self-Averaging
    2.3.1 High Temperatures — Annealed Disorder
    2.3.2 Low Temperatures — Quenched Disorder
    2.3.3 Self-Averaging — Quenched Free Energy
    2.3.4 A Note on Averages
    2.3.5 The Annealed Approximation

3 Dynamics and Equilibrium
  3.1 Describing Stochastic Dynamics
  3.2 Uniqueness of Equilibrium Distributions
  3.3 Glauber Dynamics for Ising Systems
  3.4 The Metropolis Monte-Carlo Dynamics Algorithm
  3.5 Langevin & Fokker-Planck Equations and Equilibrium

4 Asymptotic Methods
  4.1 Laplace and Saddle Point Methods
    4.1.1 The Laplace Method
    4.1.2 The Saddle Point Method
  4.2 Curie-Weiss Ferromagnet
    4.2.1 First Method: Combinatorics and Laplace Method
    4.2.2 Second Method: Gaussian Linearization and Laplace Method
    4.2.3 Third Method: δ-Functions and Saddle-Point Method
    4.2.4 Equivalence of the Three Methods
    4.2.5 The Physics of the Curie-Weiss Model
  4.3 The Mean-Field Approximation
  4.4 Variational Approach
  4.5 Variational Mean Field

5 System with Finite Dimensional Site Disorder
  5.1 General Solution
    5.1.1 Neural Networks – The Hopfield Model

6 A Model for Operational Risk
  6.1 Set-Up of the Model
  6.2 A Statistical Mechanics Approach
    6.2.1 Replica Analysis
    6.2.2 Replica Symmetric Solution
    6.2.3 Results

7 Random Matrix Theory
  7.1 Eigenvalues and the Spectral Density
  7.2 The Random Matrix Problem
    7.2.1 Representation of the δ-function
  7.3 Cavity Approach to Sparse Matrices
    7.3.1 Results
    7.3.2 The Dense Limit and Wigner's Semi-Circle

8 Dynamics, Optimization and Statistical Mechanics
  8.1 Optimization
    8.1.1 Example: Travelling Salesman Problem (TSP)
    8.1.2 Example: Coding and Decoding
    8.1.3 The Role of Statistical Mechanics
  8.2 Dynamics

9 Transfer Matrices
  9.1 One-Dimensional Systems
  9.2 Quasi One-Dimensional Systems
  9.3 Spin Chains in a Random Field
    9.3.1 Thermodynamics of the Random Field Chain
  9.4 Double-Stranded DNA as an Application
Aim of the Course

The title of this course is Equilibrium Analysis of Complex Systems.

• It deals with Complex Systems, i.e. systems which may
  – have many degrees of freedom, e.g.
    ∗ many particles (their positions, velocities and possibly internal degrees of freedom, such as spin),
    ∗ firing states of a population of neurons,
    ∗ attitudes or opinions of a large group of people,
    ∗ performance characteristics in a network of processes,
    ∗ many and diverse financial positions,
    ∗ the actions of a large group of people (agents),
    ∗ the demands on food resources of many species,
    ∗ large or heterogeneous data sets (many signals/complicated signals),
  – which interact in complicated ways (short & long range, non-geometric, solely determined by function, cooperative, competitive, random strengths),
  – for which the dynamics often involves stochasticity,
  – for which the number of parameters needed to fully describe them grows with system size.

• It deals with their Analysis, i.e. we will explain concepts and methods which allow one to
  – derive macroscopic properties (qualitative and quantitative) of large systems, using
    ∗ properties of their microscopic constituents,
    ∗ the laws governing their interaction,
    ∗ and parameters characteristic of their environment (temperature, pressure, forces, fields, noise level, . . . ),
  – predict parameters at which macroscopic properties undergo dramatic changes although parameters are only changed infinitesimally (collective phenomena, phase transitions, emergent behaviour), e.g.
    ∗ ferromagnetism below Tc,
    ∗ functional memory in neural networks below a critical (noise-dependent) loading level, but not above,
    ∗ systemic breakdown of process networks at critical values of mutual dependencies,
    ∗ adoption of new technologies below a critical value of acquisition costs.
• It uses concepts of Equilibrium Statistical Physics.
  – These are very powerful, very diverse, and often better understood and simpler to use than proper dynamical methods for complex systems.
  – Despite the fact that many natural and artificial systems do not exhibit (thermodynamic) equilibrium, such methods can still provide fairly accurate descriptions of the statistically stationary behaviour of complex systems.
  – They have proven very powerful in generating non-trivial predictions and intuition about qualitative properties and property changes.
The Modelling Cycle

The problem/challenge:

(i) Nature doesn't tell you which equations (which model!) you should use to describe it. You have to find them.

(ii) Nature is very complicated; you often concentrate on a subset of phenomena that a system may exhibit, and simplifying assumptions often have to be made to make analytical progress. How far can you go, and how do you decide whether a model is appropriate?

There is nothing automatic about this process. Modelling is the art of science!

• Main guiding principles:
  – A good theoretical model of a complex system should be like a good caricature: it should emphasize those features that are most important, while downplaying the inessential details. (Y. Frenkel)
  – Everything should be made as simple as possible, but not simpler. (A. Einstein)
  – The problem with this guidance: we do not know what the most important or the inessential features are before we have understood the system in question.

• ⇒ Go through the modelling cycle:
  – start with a complex system to describe,
  – construct a mathematical model (a set of equations) that attempts to describe the system,
  – solve the model/the equations,
  – use the solution of the model to make predictions about the system behaviour,
  – compare predictions with data and known facts about the system,
  – if necessary improve/redefine the model and continue . . .
Figure 1: The Modelling Cycle. (Diagram: Problem/Data → (re-)define → Mathematical Model → solve → Solution of Equations → interpret solutions → Predictions of the Model → compare → Problem/Data.)
Chapter 1

Introduction and Motivation

1.1 Motivation & Scope
Why should we be interested in complex systems?

• Complex systems are all around us:
  – most industrially relevant materials (steel, alloys, ceramics, (hetero-)polymers),
  – power supply systems, transport infrastructure, IT and communication systems,
  – DNA, chemical reaction networks in living cells, protein-protein interaction networks, gene regulation networks, cell-signalling networks, neural networks,
  – networks of financial positions (credits, investments, financial obligations, options, futures), factories (≃ process networks), trade networks, social networks,
  – large heterogeneous data sets generated by human activity (internet use, in science, . . . ).

• Modelling them, analyzing them, and understanding them is key to improving many dimensions of human existence (driving developments rather than being driven by them):
  – biomedical research tries to understand cancer in terms of malfunctions of gene regulation networks, which induce changes in protein-protein interaction networks,
  – financial regulators are just beginning to understand contagion in networks of financial positions, and to improve regulations on the basis of these insights,
  – internet giants such as Amazon use advanced machine learning techniques on e-activity data (Twitter, Facebook, buying decisions) to analyze consumer behaviour and to design optimal (viral) marketing campaigns.
Sources and examples of complexity

• Complexity from two different sources:
  – nonlinear dynamics, chaos (sensitive dependence on initial conditions), relevant already in small systems;
  – large numbers of interacting degrees of freedom (⇒ emergent collective behaviour, phase transitions separating different types of collective behaviour) & interactions without regularity, dynamical elements receiving conflicting ordering instructions (⇒ slow 'glassy' dynamics).

• Here we deal almost exclusively with the second type. Main tools of analysis: statistical physics, and in particular the statistical physics of complex and disordered systems; the additional complexity coming from adding the first feature to a large system is typically small, and statistical tools remain valid.

• Specifically: heterogeneous systems in physics
  – due to composition (non-equivalent degrees of freedom/particles),
  – due to irregular structures/disorder (glassy, amorphous),
  – due to irregular patterns of interaction,
  – conflicting ordering instructions, e.g. from a mixture of ferromagnetic and antiferromagnetic interactions.

• Beyond physics
  – Biology: gene expression networks & stable cell types, neural networks & associative memory, formation of protein-protein complexes or complexes of proteins and small molecules (myoglobin-CO), DNA unzipping . . .
  – Economy: choice of optimal investment portfolios composed of correlated assets, strategy evolution (minority game/El Farol bar problem), understanding fluctuations of price changes in interacting markets, optimization (travelling salesman, task allocation, chip design), operational risk, credit risk.
  – Technology: IT systems, black-out risk in power grids, routing internet traffic, smart grids, cloud computing, internet of things . . .
  – Social sciences: dynamics of opinion formation, adoption of new technologies, collective information processing through search engines, social networks (search ↔ pagerank ↔ information processed as result of search ↔ evolution of search behaviour).
1.2 The issue of methods & tools
Traditional methods (condensed matter physics)

• Standard textbooks on condensed matter theory deal with ideal structures:
  – ideal crystals, their structure analysis using X-rays . . . ,
  – free (non-interacting) electrons,
  – periodic potentials, band structure,
  – elementary excitations: lattice vibrations (phonons), magnons, plasmons . . . ,
  – interactions of elementary excitations (electron-phonon ⇒ superconductivity),
  – collective phenomena (magnetism, superconductivity, . . . ) in pure systems.

• All of these topics are still active areas of research, and many are not completely understood.

• But:
  – Ideal crystals don't exist in the real world (impurities, imperfections of structure), and many technically important materials are not crystalline at all (glass, rubber, wood, ceramics, even steel).
  – More fundamentally: heterogeneous compositions, many kinds of different constituents, irregular interactions between them, competition and disorder.
  – On the positive side: heterogeneity can be used to create specific properties (steel, ceramics, polymers).
  – One gets a much richer set of properties (memory, system states not uniquely determined by external parameters such as pressure, temperature, and electric/magnetic fields).

Theoretical implications

• Many techniques that rely on ideality (perfect periodicity, perfect homogeneity) fail for most real-world materials.

• We need a new set of concepts, tools & techniques to describe the new set of phenomena.

• We have to worry about composition dependence & stability of macroscopic properties: typical vs. average properties, various forms of average, and tools to compute them.
Complex Systems

• The new techniques can be used to analyze other (non-physics) situations with heterogeneity, disorder, and competition (biology, information theory, social science, economics; see the 'beyond physics' list above for more specific examples).

• The application of physics tools outside physics has triggered new methodological developments, which have been reflected back to make advances in physics.

• In complex systems outside physics, the separation between equilibrium and non-equilibrium approaches is less meaningful than in physics, as most non-physics complex systems are inherently out of equilibrium.

• That said: there is a role for analysing stationary (long-lived) collective states of complex systems using tools of equilibrium statistical physics, as this type of analysis is often significantly easier than that of non-equilibrium aspects, and the available tool-kit is much better developed. At the same time, equilibrium analyses can still deliver useful insights into complex systems behaviour.
=⇒
Equilibrium Analysis of Complex Systems
1.3 Example: the Ising model in some of its many guises
The Ising Model and Physical Systems: Classical Magnetism

• classical spins S_i, i = 1, . . . , N, each in two possible states S_i ∈ {±1} (up/down)

• asynchronous dynamics: at any time step a randomly selected spin is updated according to

    S_i(t + \Delta t) = \mathrm{sign}\Big( \sum_{j=1}^{N} J_{ij} S_j(t) + h_i + \xi_i^t \Big) ,    (1.1)

  with J_ij the (exchange) interactions, J_ij > 0 ferromagnetic, J_ij < 0 anti-ferromagnetic, J_ij = 0 no interaction between i and j; h_i an (external) magnetic field, possibly site-dependent; ξ_i^t random noise, independent in space (i) and time (t).

• Energy/Lyapunov function: for symmetric couplings (J_ij = J_ji and J_ii = 0), the noiseless (ξ_i^t ≡ 0) dynamics (1.1) is governed by a Lyapunov function

    H(S) = -\frac{1}{2} \sum_{i,j} J_{ij} S_i S_j - \sum_i h_i S_i ,    (1.2)

  meaning that H is a non-increasing function of time under the noiseless dynamics (1.1); the system will thus evolve towards (local) minima of H(S) ('downhill').

• Thermal equilibrium: If the noise is thermal with probability density function (pdf)

    \rho_\beta(\xi) = \frac{1}{2} \frac{\mathrm{d}}{\mathrm{d}\xi} \tanh(\beta\xi)

  — for a justification of this form of the noise-pdf and a precise characterization of thermal equilibrium see Ch. 3 below — the system approaches thermal equilibrium with a Gibbs-Boltzmann equilibrium distribution for the system states S = (S_i),

    p_\beta(S) = \frac{1}{Z_\beta} \exp[-\beta H(S)] , \qquad \beta = \frac{1}{k_B T} .    (1.3)

• Quantity of interest: e.g. the macroscopic magnetization

    m(t) = \frac{1}{N} \sum_{i=1}^{N} S_i(t) \;\to\; m = \frac{1}{N} \sum_{i=1}^{N} \langle S_i \rangle , \quad t \to \infty ,    (1.4)

  where ⟨. . .⟩ in the second expression denotes an average over the Gibbs-Boltzmann distribution. It is asymptotically correct in the thermodynamic limit N → ∞ by the Law of Large Numbers (LLN).¹

¹ Note that the t → ∞ limit is the strict formal condition. What is actually needed is 'equilibration', which in many physical systems can be achieved in times as short as pico-seconds!
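The noisy asynchronous dynamics (1.1) is straightforward to simulate. Below is a minimal Python sketch; the uniform ferromagnetic couplings J_ij = 1/N, the system size and the value of β are illustrative choices (not part of the general model), and the thermal noise is sampled by inverting its cdf, which for ρ_β above is (1 + tanh(βξ))/2.

```python
import numpy as np

rng = np.random.default_rng(0)

N, steps, beta = 50, 20000, 2.0      # illustrative size, update count, inverse temperature
J = np.ones((N, N)) / N              # uniform ferromagnetic couplings (a Curie-Weiss-like choice)
np.fill_diagonal(J, 0.0)
h = np.zeros(N)                      # no external field
S = rng.choice([-1, 1], size=N)      # random initial configuration

for _ in range(steps):
    i = rng.integers(N)              # asynchronous update: one randomly selected spin
    # thermal noise: cdf of rho_beta is (1 + tanh(beta*xi))/2, so invert it
    xi = np.arctanh(2.0 * rng.random() - 1.0) / beta
    S[i] = 1 if J[i] @ S + h[i] + xi >= 0 else -1

m = S.mean()                         # magnetization: |m| large for T < Tc, small for T > Tc
```

For β = 2 (i.e. T = 0.5, below the mean-field critical temperature T_c = 1 for this coupling choice) the run should settle into a strongly magnetized state, |m| close to 1; rerunning with β well below 1 should give |m| close to 0.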
• Ferromagnet on a regular lattice: J_ij = J for (i, j) nearest neighbours, and J_ij = 0 otherwise. Then with h_i ≡ 0, m(t) → m_0 sign(m(0)), as t → ∞, with the spontaneous magnetization m_0 depending on the thermal noise level (temperature). There is a critical temperature T_c, s.t. m_0(T) = 0 for T ≥ T_c, and m_0(T) > 0 for T < T_c (see Fig. 1.1). The behaviour of the magnetization and other thermodynamic functions is singular at T_c (critical phenomena). (Nb.: equivalent to antiferromagnetism on a bi-partite lattice.)

Figure 1.1: Qualitative behaviour of the temperature-dependent magnetization in a ferromagnet, without an external magnetic field, showing a phase transition at T_c = 1.

• Disordered Magnets: mixture of ferromagnetic and antiferromagnetic bonds: one gets (anti-)ferromagnetism if an (overwhelming) majority of bonds is (anti-)ferromagnetic, and spin-glass order,

    \langle S_i \rangle \neq 0 , \quad \text{but} \quad m(t) \;\to\; m = \frac{1}{N} \sum_{i=1}^{N} \langle S_i \rangle = 0 , \quad t \to \infty ,    (1.5)

at low temperatures in an intermediate concentration regime. The main issue here: spins get conflicting ordering instructions, and are thus frustrated(!) (e.g. in a triangle of antiferromagnetic bonds). With many frustrated loops (see Fig. 1.2), spins get frozen in random orientations at low temperatures, and there will generally be many metastable states (O(2^{αN}) with 0 < α < 1). The order parameter distinguishing between the high and low temperature phase in this case is the Edwards-Anderson order parameter,

    q = \frac{1}{N} \sum_{i=1}^{N} \langle S_i \rangle^2 .    (1.6)

Such systems, too, exhibit phase transitions in the sense that q = 0 at high temperatures but q > 0 for T < T_SG, the transition temperature being typically referred to as the spin-glass temperature (sometimes the freezing temperature).
Figure 1.2: Loop of frustrated spins (one bond with J < 0, three with J > 0). A loop γ is frustrated iff \prod_{(i,j)\in\gamma} \mathrm{sign}\, J_{ij} < 0.
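The frustration criterion in the caption is trivially checkable in code; a tiny sketch:

```python
import numpy as np

def loop_frustrated(bond_signs):
    """A loop is frustrated iff the product of the signs of its bonds is negative."""
    return float(np.prod(np.sign(bond_signs))) < 0

assert loop_frustrated([-1.0, +1.0, +1.0, +1.0])      # the loop of Fig. 1.2: frustrated
assert loop_frustrated([-1.0, -1.0, -1.0])            # triangle of antiferromagnetic bonds
assert not loop_frustrated([-1.0, -1.0, +1.0, +1.0])  # an even number of negative bonds: satisfiable
```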
The Ising Model and Complex Systems Outside Physics

• Neural networks: two-state McCulloch-Pitts neurons S_i ∈ {±1} (firing/non-firing). Neurons evaluate their post-synaptic potential

    u_i(t) = \sum_{j=1}^{N} J_{ij} S_j(t) ,    (1.7)

and fire in the next time step if the potential is above a threshold ϑ_i, and remain quiescent otherwise. The J_ij are synaptic efficacies: J_ij > 0 (excitatory), J_ij < 0 (inhibitory). The dynamics including stochasticity (thermal noise or other),

    S_i(t + \Delta t) = \mathrm{sign}\Big( \sum_{j=1}^{N} J_{ij} S_j(t) - \vartheta_i + \xi_i^t \Big) ,    (1.8)

is formally equivalent to (1.1), with h_i = −ϑ_i. Here there are typically large numbers of interaction partners (O(10^4)), and large numbers of neurons (O(10^{11}) for the human brain). A question to be addressed: can one design a set of couplings that gives rise to a desired set of dynamical attractors (memories)? Hopfield: using the Hebb rule for random patterns (ξ_i^μ ∈ {±1}, i.i.d. with mean zero),

    J_{ij} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^\mu \xi_j^\mu ,    (1.9)

the system has stable attractors that closely resemble the stored patterns (one needs ϑ_i = 0),

    m_\mu(t) = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu S_i(t) \;\to\; m_\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu \langle S_i \rangle \simeq 1 ,    (1.10)
provided the number p of patterns encoded in the {Jij } and the noise-level are not too large. (See Fig. 1.3.)
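A minimal Python sketch of Hebb-rule storage and recall. The sizes (N = 500 neurons, p = 5 patterns, 10% corruption) are illustrative, and parallel rather than asynchronous updates are used for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 500, 5                              # p << N patterns, loading alpha = p/N = 0.01
xi = rng.choice([-1, 1], size=(p, N))      # random patterns, i.i.d. with mean zero

J = (xi.T @ xi) / N                        # Hebb rule (1.9)
np.fill_diagonal(J, 0.0)

S = xi[0].copy()                           # start from pattern 0 ...
flips = rng.choice(N, size=N // 10, replace=False)
S[flips] *= -1                             # ... with 10% of the sites corrupted

for _ in range(5):                         # noiseless dynamics, thresholds = 0
    S = np.where(J @ S >= 0, 1, -1)        # (parallel updates, for brevity)

m0 = (xi[0] @ S) / N                       # overlap (1.10) with the stored pattern
```

With loading α = 0.01, far below the critical loading of the noiseless Hopfield model, the overlap m0 should be close to 1: the corrupted configuration flows back to the stored pattern.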
Figure 1.3: Phase diagram of the Hopfield model in the temperature (T)–loading (α = p/N) plane. P: 'paramagnetic' phase with m_μ = 0 and q = 0; F: 'ferromagnetic' retrieval phase with m_μ ≠ 0 and q ≠ 0; SG: 'spin-glass' phase with m_μ = 0 and q ≠ 0; M: the ferromagnetic retrieval phase coexists with the spin-glass phase, but is only meta-stable.
• Dynamics of attitude change: look at a 2-attitude system (any system with a binary option, e.g. adopting a new technology/habit/opinion). Describe it in terms of a preference field for the new attitude,

    u_i(t) = \sum_{j=1}^{N} J_{ij} S_j(t) - \vartheta_i + h_0(t) ,    (1.11)

where the couplings describe the influence of peer pressure, the ϑ_i describe idiosyncratic a-priori preferences and would be randomly distributed across a population, and h_0(t) is an external driving field (which could mimic e.g. the effect of a falling price on a new gadget). Looking at the noiseless situation for simplicity,

    S_i(t + \Delta t) = \mathrm{sign}\, u_i(t) .    (1.12)

Consider the simplest case J_{ij} = J/N (uniform all-to-all interactions). Then, by the LLN,

    m(t + \Delta t) = \frac{1}{N} \sum_i S_i(t + \Delta t) = \frac{1}{N} \sum_i \mathrm{sign}\big( J m(t) - \vartheta_i + h_0(t) \big)
                    \simeq \mathrm{Prob}(\vartheta_i \le J m(t) + h_0(t)) - \mathrm{Prob}(\vartheta_i > J m(t) + h_0(t))
                    = 2 P(J m(t) + h_0(t)) - 1 ,    (1.13)

where

    P(x) = \mathrm{Prob}(\vartheta_i \le x)    (1.14)
is the distribution (the integrated pdf) of the idiosyncratic preferences. If the external field changes slowly, m(t) approaches a stationary value m(h_0) = m(h_0(t)) characteristic of the current value of h_0(t). Depending on the value of J, one can get continuous or discontinuous changes of opinion as h_0(t) changes from −∞ to +∞.
Exercise: Assume p(ϑ_i) is symmetric about 0, with a single maximum at 0 (think of a zero-mean, variance-σ² Gaussian, for which P(x) = \frac{1}{2}\big[1 + \mathrm{erf}(x/\sqrt{2\sigma^2})\big]). Use a graphical analysis of the stationarity equation at given h_0 = h_0(t),

    m = 2 P(J m + h_0) - 1 ,    (1.15)

to give a qualitative sketch of the behaviour of m as a function of h_0, and convince yourself of the correctness of the above claim concerning the existence of continuous and discontinuous transitions.
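The claimed distinction can also be checked numerically: iterate the stationarity equation (1.15) to a fixed point while slowly sweeping h_0, for weak and strong peer pressure. A sketch under the Gaussian assumption of the exercise; the particular values J = 0.5 and J = 3.0 are illustrative choices:

```python
import numpy as np
from math import erf, sqrt

sigma = 1.0
P = lambda x: 0.5 * (1.0 + erf(x / (sqrt(2.0) * sigma)))   # Gaussian cdf of the preferences

def stationary_m(J, h0, m):
    for _ in range(2000):                  # iterate m -> 2 P(J m + h0) - 1 to a fixed point
        m = 2.0 * P(J * m + h0) - 1.0
    return m

jumps = {}
for J in (0.5, 3.0):                       # weak vs strong peer pressure
    m, ms = -1.0, []
    for h0 in np.linspace(-2.0, 2.0, 81):  # sweep the driving field upwards, tracking the branch
        m = stationary_m(J, h0, m)
        ms.append(m)
    jumps[J] = float(np.max(np.abs(np.diff(ms))))

# jumps[0.5] stays small (continuous change of opinion); jumps[3.0] is O(1) (discontinuous)
```

For J = 0.5 the largest step of m along the sweep is of the order of the field increment (continuous crossover), while for J = 3.0 the followed branch disappears at a spinodal value of h_0 and m jumps discontinuously to the upper branch.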
Chapter 2

Statistical Mechanics

2.1 Fundamental Relations
If a thermodynamic system, whose dynamic variables are collectively denoted by σ, in contact with a heat-bath at temperature T, is in thermal equilibrium, it is described by the Gibbs-Boltzmann distribution

    p_\beta(\sigma) = \frac{1}{Z} \exp[-\beta H(\sigma)] , \qquad \beta = 1/k_B T ,    (2.1)

corresponding to H = H(σ), the energy function (Hamiltonian) of the system. The normalization constant

    Z = Z(\beta) = \sum_\sigma \exp[-\beta H(\sigma)]    (2.2)

in p_β is known as the partition function.
The partition function of a system is of central importance in thermodynamics, as it is related to one of the important thermodynamic functions, viz. the free energy, via the relation

    F(\beta) = -\beta^{-1} \ln Z(\beta) .    (2.3)

We will not be much interested in the fundamental relations between the free energy and other thermodynamic functions, such as internal energy or specific heat, from a physics or thermodynamics point of view. We will, however, come across several of them, without appealing to thermodynamic reasoning, as we go along; some of these relations are briefly explored from the thermodynamic point of view in Appendix A. We will here mainly explore the probabilistic significance of the partition function and the free energy, namely that they are closely related to the moment generating functions and cumulant generating functions of probability theory.
Reminder: Moment and cumulant generating functions

Consider a (discrete) random variable X taking values x ∈ Ω, with probabilities p(x) = Prob(X = x).

• The moment generating function G(h) corresponding to p(x) is defined as¹

    G(h) = \sum_{x\in\Omega} p(x)\, e^{hx} .    (2.4)

• One can use the moment generating function to evaluate moments of the random variable X by differentiation,

    \mu_n(X) = \langle X^n \rangle = \sum_{x\in\Omega} p(x)\, x^n = \Big(\frac{\mathrm{d}}{\mathrm{d}h}\Big)^n G(h)\Big|_{h=0} .    (2.5)
Exercise: Verify this identity.
• The cumulant generating function K(h) corresponding to p(x) is defined as

    K(h) = \log G(h) .    (2.6)

• Cumulants of the random variable X are obtained by differentiation of K,

    \kappa_n(X) = \Big(\frac{\mathrm{d}}{\mathrm{d}h}\Big)^n K(h)\Big|_{h=0} .    (2.7)
• Cumulants are related to centered moments. The lowest order cumulants are

    \kappa_1(X) = \langle X \rangle = \mu_1(X) ,
    \kappa_2(X) = \langle X^2 \rangle - \langle X \rangle^2 = \mathrm{Var}(X) ,
    \kappa_3(X) = \langle X^3 \rangle - 3 \langle X^2 \rangle \langle X \rangle + 2 \langle X \rangle^3 .
Exercise: Verify these identities.
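A quick numerical sanity check of (2.4)–(2.7), using finite differences at h = 0; the three-point distribution below is an arbitrary illustrative choice:

```python
import numpy as np

# Discrete random variable: X in {0, 1, 2} with the probabilities below
xs = np.array([0.0, 1.0, 2.0])
ps = np.array([0.2, 0.5, 0.3])

G = lambda h: np.sum(ps * np.exp(h * xs))       # moment generating function (2.4)
K = lambda h: np.log(G(h))                      # cumulant generating function (2.6)

eps = 1e-4                                      # central finite differences at h = 0
mu1 = (G(eps) - G(-eps)) / (2 * eps)            # first moment, from (2.5)
k2 = (K(eps) - 2 * K(0.0) + K(-eps)) / eps**2   # second cumulant, from (2.7)

mean = np.sum(ps * xs)                          # direct first moment
var = np.sum(ps * xs**2) - mean**2              # direct variance
# mu1 matches mean, and k2 matches var, illustrating kappa_2(X) = Var(X)
```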
Moments of the Energy Function

Moments of the energy function H(σ) are obtained from the partition function by differentiation, in close analogy to the way moments are obtained from a generating function,

    \mu_n(H) = \langle H^n \rangle = \frac{1}{Z(\beta)} \Big( -\frac{\partial}{\partial\beta} \Big)^n Z(\beta) .    (2.8)

Note that we are here using partial derivatives w.r.t. β in anticipation of the fact that the energy function may contain further parameters w.r.t. which we could differentiate (see below). Also, we are not evaluating derivatives at β = 0, but at arbitrary β (obtaining moments through a slightly modified construction in terms of derivatives). The first moment E = ⟨H⟩ is referred to as the internal energy of the system.¹

¹ Other representations exist, and some care is needed in the case of continuous random variables and/or if |Ω| = ∞.
Cumulants of the Energy Function

Cumulants of the energy function are obtained by differentiation of the free energy according to

    \kappa_n(H) = \Big( -\frac{\partial}{\partial\beta} \Big)^n \big( -\beta F(\beta) \big) = \Big( -\frac{\partial}{\partial\beta} \Big)^n \ln Z(\beta) .    (2.9)

Recall that μ_1(H) = κ_1(H) = E. Using β = 1/k_B T, hence T \frac{\partial}{\partial T} = -\beta \frac{\partial}{\partial\beta}, we can exploit the first identity in (2.9) to obtain

    E = F + \beta \frac{\partial F}{\partial\beta} = F - T \frac{\partial F}{\partial T} \quad\Leftrightarrow\quad F = E + T \frac{\partial F}{\partial T} .    (2.10)

Thermodynamic relations allow us to identify

    S = S(\beta) = -\frac{\partial F}{\partial T}    (2.11)

as the entropy of the system. Analogous reasoning can be used to obtain a thermodynamic interpretation for the second cumulant κ_2(H), as

    \frac{\partial E}{\partial T} \equiv C(\beta) = k_B \beta^2 \kappa_2(H) = k_B \beta^2 \big[ \langle H^2 \rangle - \langle H \rangle^2 \big] .    (2.12)
This is the rate of change of internal energy with temperature, known as the specific heat. The relation between the specific heat and κ_2(H) allows one to conclude that the specific heat of a system is always non-negative.

Probabilistic Expression for the Entropy

The entropy can be expressed in terms of the Gibbs-Boltzmann distribution as

    S(\beta) = -k_B \sum_\sigma p_\beta(\sigma) \ln p_\beta(\sigma) .    (2.13)

To verify this, insert the Gibbs distribution (2.1) into the expression, which gives

    S(\beta) = -k_B \sum_\sigma p_\beta(\sigma) \big[ -\ln Z(\beta) - \beta H(\sigma) \big] = k_B \ln Z(\beta) + k_B \beta E .

On the other hand,

    S(\beta) = -\frac{\partial}{\partial T} \big( -\beta^{-1} \ln Z(\beta) \big) = k_B \beta^2 \frac{\partial}{\partial\beta} \big( -\beta^{-1} \ln Z(\beta) \big) = k_B \ln Z(\beta) + k_B \beta E ,

which agrees with the probabilistic expression. Exercise: Check the arithmetic underlying the relations.
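The chain of relations (2.8)–(2.13) can be checked on a toy system with a handful of energy levels (working in units with k_B = 1; the levels and the value of β are arbitrary illustrative choices):

```python
import numpy as np

kB = 1.0                                  # work in units where k_B = 1
levels = np.array([0.0, 1.0, 3.0])        # a toy three-level system (illustrative)

def stats(beta):
    w = np.exp(-beta * levels)
    Z = w.sum()                           # partition function (2.2)
    prob = w / Z                          # Gibbs-Boltzmann distribution (2.1)
    E = (prob * levels).sum()             # internal energy <H>
    C = kB * beta**2 * ((prob * levels**2).sum() - E**2)   # specific heat (2.12)
    S = -kB * (prob * np.log(prob)).sum() # probabilistic entropy (2.13)
    return Z, E, C, S

beta = 0.7
Z, E, C, S = stats(beta)

# S = kB ln Z + kB beta E, the identity verified in the text
S_check = kB * np.log(Z) + kB * beta * E

# E = -d ln Z / d beta, i.e. the first cumulant in (2.9), by finite differences
eps = 1e-6
E_fd = -(np.log(stats(beta + eps)[0]) - np.log(stats(beta - eps)[0])) / (2 * eps)
```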
Generalization: Observables, Conjugate Fields and Associated Cumulants

Let us consider the case where the energy function has several identifiable contributions O_α(σ), with say α = 1, 2, . . . , n, whose contribution to the energy function can be controlled by external fields h_α. Specifically, we will assume that

    H(\sigma) = H_0(\sigma) - \sum_{\alpha=1}^{n} h_\alpha O_\alpha(\sigma) .    (2.14)

We will refer to the O_α(σ) as observables [think of magnetization or (electric) polarization as examples], and to the h_α as their conjugate fields. The contribution H_0(σ) would represent the energy of internal interactions, which cannot be externally manipulated. In this case the Gibbs-Boltzmann distribution will be parametrized by β and by the conjugate fields, and the partition function and the free energy will depend on all these parameters, i.e. we have

    Z = Z(\beta, \{h_\alpha\}) = \sum_\sigma \exp\Big\{ -\beta \Big[ H_0(\sigma) - \sum_\alpha h_\alpha O_\alpha(\sigma) \Big] \Big\}    (2.15)

and thus

    F = -\beta^{-1} \ln Z(\beta, \{h_\alpha\}) = F(\beta, \{h_\alpha\}) .

Expectation values of the observables (i.e. first moments/first cumulants) are obtained as

    \kappa_1(O_\alpha) = \langle O_\alpha \rangle = -\frac{\partial}{\partial h_\alpha} F(\beta, \{h_\alpha\}) .    (2.16)
Remark: One can use this formalism to compute expectations of observables which cannot, in practice, be controlled by external fields. Exercise: How? Hint: think of the only possible value of the conjugate field at the disposal of the experimenter for these observables.

One defines generalized susceptibilities via

    \chi_{\alpha\beta} = \frac{\partial}{\partial h_\beta} \langle O_\alpha \rangle = -\frac{\partial^2}{\partial h_\beta\, \partial h_\alpha} F(\beta, \{h_\alpha\}) = -\frac{\partial^2}{\partial h_\alpha\, \partial h_\beta} F(\beta, \{h_\alpha\}) = \chi_{\beta\alpha} .    (2.17)

Inserting definitions, one finds them related to mixed second order cumulants,

    \chi_{\alpha\beta} = \beta \big[ \langle O_\alpha O_\beta \rangle - \langle O_\alpha \rangle \langle O_\beta \rangle \big] = \beta\, \kappa_2(O_\alpha, O_\beta) ,    (2.18)

or correlations. Note that the meaning of χ_{αβ} is the response of the expectation of O_α to changes in the field h_β conjugate to the observable O_β. Relations of the type (2.18) are therefore known as correlation-response relations. Note they are generally valid for equilibrium distributions of Gibbs-Boltzmann type. Exercise: Fill in the necessary arithmetic to check these relations.
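The correlation-response relation (2.18) is easy to verify by exact enumeration for a small spin system, taking the total magnetization O = Σ_i S_i as the single observable and comparing a finite-difference response to the fluctuation formula (the couplings, β and the 3-spin system are illustrative choices):

```python
import itertools
import numpy as np

beta, J, N = 0.8, 0.5, 3
pairs = [(0, 1), (1, 2), (0, 2)]              # a small fully connected system

def averages(h):
    """Exact <O> and <O^2> for O = sum_i S_i under the Gibbs distribution."""
    Zs, O1, O2 = 0.0, 0.0, 0.0
    for S in itertools.product([-1, 1], repeat=N):
        H0 = -J * sum(S[i] * S[j] for i, j in pairs)
        O = sum(S)
        w = np.exp(-beta * (H0 - h * O))      # H = H0 - h O, as in (2.14)
        Zs += w; O1 += O * w; O2 += O**2 * w
    return O1 / Zs, O2 / Zs

eps = 1e-5
chi_response = (averages(eps)[0] - averages(-eps)[0]) / (2 * eps)  # d<O>/dh at h = 0
O1, O2 = averages(0.0)
chi_fluct = beta * (O2 - O1**2)               # beta * kappa_2(O, O), eq. (2.18)
# chi_response and chi_fluct agree to finite-difference accuracy
```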
2.2 Extremising Properties of Entropy and Free Energy
The equilibrium free energy and the thermodynamic entropy have the following extremising properties.

Proposition: Let p = (p(σ)) be any probability assignment (or pdf). Define entropy and energy functionals as

    S[p] = -k_B \sum_\sigma p(\sigma) \ln p(\sigma) , \qquad E[p] = \sum_\sigma p(\sigma) H(\sigma) ,    (2.19)

and in terms of these a free energy functional as

    F[p] = E[p] - T S[p] .    (2.20)

Then

(i) The Gibbs-Boltzmann distribution is the unique distribution that maximizes the entropy functional S[p] under the constraint of a given value for the average energy E = E[p]. Using p_β to denote specifically the Gibbs-Boltzmann distribution at (inverse) temperature β, one then has that the thermodynamic entropy S = S(β) is given in terms of the maximizing probability p_β as S(β) = S[p_β].

(ii) (Gibbs variational principle): The Gibbs-Boltzmann distribution p_β is the unique distribution that minimizes the free energy functional F[p] at given inverse temperature β, and the thermodynamic free energy is given in terms of the minimizing probability p_β as F(β) = F[p_β].

Proof (i): Find the distribution at which S[p] is stationary with respect to variations of p under the constraints of given average energy E = E[p] and normalization ⟨1⟩ = \sum_\sigma p(\sigma) = 1, using the method of Lagrange multipliers to handle the constraints. Thus, introducing the Lagrangian

    L[p] \equiv S[p] - k_B \lambda_0 \Big( \sum_\sigma p(\sigma) - 1 \Big) - k_B \lambda_E \big( E[p] - E \big) ,    (2.21)

solve

    \forall\sigma : \quad \frac{\partial L[p]}{\partial p(\sigma)} = -k_B \big( \ln p(\sigma) + 1 \big) - k_B \lambda_0 - k_B \lambda_E H(\sigma) = 0 ,    (2.22)

    \frac{\partial L[p]}{\partial \lambda_0} = -k_B \Big( \sum_\sigma p(\sigma) - 1 \Big) = 0 ,    (2.23)

    \frac{\partial L[p]}{\partial \lambda_E} = -k_B \big( E[p] - E \big) = 0 .    (2.24)
The first of the above equations can be solved explicitly for p(σ) as

    p(\sigma) = \frac{1}{Z} \exp[-\lambda_E H(\sigma)] ,    (2.25)

with

    Z = \exp[1 + \lambda_0] = \sum_\sigma \exp[-\lambda_E H(\sigma)] ,    (2.26)
in which the last expression for Z follows from the second of the above equations, expressing the normalization constraint. Since, with p of the form (2.25), the energy functional E[p] = \langle H \rangle = \sum_\sigma p(\sigma) H(\sigma) is a monotone decreasing function of λ_E (Exercise: verify this), the value of λ_E which solves the third equation E[p] = E above (call it β) is uniquely defined, provided a solution exists at all. Calling p_β the unique maximizing solution, we have that S[p_β] gives the correct probabilistic expression for the thermodynamic entropy if we identify β with the inverse temperature.

Note: in the case of continuous degrees of freedom, where p is a pdf and \sum_\sigma is symbolic notation for an integral, the partial derivatives w.r.t. the p(σ) would have to be replaced by functional derivatives \frac{\delta}{\delta p(\sigma)}.

Remark: The fact that the maximizing distribution must be unique can also be inferred from the fact that S[p] is a concave functional. Suppose that both p_1 and p_2 give rise to the same maximum S_max of S[p] compatible with the constraints ⟨1⟩ = 1 (normalization) and ⟨H⟩ = E (given average energy). Then the convex combination p_λ = λ p_1 + (1 − λ) p_2, with 0 ≤ λ ≤ 1, would also satisfy the constraints, but

    S[p_\lambda] = -k_B \sum_\sigma p_\lambda(\sigma) \ln p_\lambda(\sigma) \;\ge\; \lambda S[p_1] + (1-\lambda) S[p_2] = S_{\max} ,
and the inequality is sharp, unless p1 ≡ p2 (or λ ∈ {0, 1}). This follows from the fact that f (x) = −x ln x is a concave function of x on the interval [0, 1].
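Part (ii), the Gibbs variational principle, can be illustrated numerically: for a toy set of energy levels, F[p] evaluated at the Gibbs distribution equals −T ln Z, and random normalized distributions never do better (the levels and the temperature are illustrative choices, with k_B = 1):

```python
import numpy as np

rng = np.random.default_rng(2)
beta, T = 2.0, 0.5                        # k_B = 1, so T = 1/beta
H = np.array([0.0, 0.4, 1.0, 1.7])        # toy energy levels (illustrative)

def F(p):
    """Free energy functional F[p] = E[p] - T S[p], eq. (2.20)."""
    return (p * H).sum() - T * (-(p * np.log(p)).sum())

p_gibbs = np.exp(-beta * H)
p_gibbs /= p_gibbs.sum()                  # Gibbs-Boltzmann distribution
F_min = F(p_gibbs)                        # equals -T ln Z, the thermodynamic free energy

# random normalized distributions always give F[p] >= F[p_beta]
worst_violation = 0.0
for _ in range(1000):
    p = rng.random(len(H)); p /= p.sum()
    worst_violation = min(worst_violation, F(p) - F_min)
```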
Proof (ii): The proof that F[p] at given temperature T is minimized by p_β follows along similar lines, though here only the normalization constraint needs to be taken into account, requiring a single Lagrange multiplier λ_0. Exercise: Fill in the details of the proof of (ii).

Remark: There is a deep connection between (i) and information theory. Up to a constant factor, the entropy S[p] is equal to the information entropy, which is a measure of our ignorance about a probabilistic system. Maximizing entropy under constraints thus amounts to maximizing
ignorance under constraints (given by the knowledge we do have about a system), or in other words it is a systematic way of infering a probability assignment for a system in a maximally unbiased way, consistent with and only with the available information. This (subjectivistic) approach to statistical mechanics was pioneered in the 1950’s by Jaynes (E T Jaynes, Information Theory and Statistical Mechanics. Phys Rev 106, 620-630 (1957)). Distributions describing other ensembles of Statistical Mechanics, e.g. the grand-canonical ensemble describing systems coupled to a heat bath and a particle reservoir, keeping the average energy and the average number of particles fixed, can also be constructed in this way. Systems with continuous degrees of freedom If we have a system with continuous degrees of freedom the Gibbs-Boltzmann distribution would be a probability density function. Similarly, if we are interested in a Maximum Entropy (MaxEnt) estimate of the pdf describing a system with continuous random variables, given constraints provided as expectation values of a set of observables, one starts out from the differential entropy, Z S[p] = −kB dσ p(σ) ln p(σ) , (2.27) in which p(σ) is now a pdf, defined over a (suitabl sub-)set Ω ⊆ Rn , and the above integral extends over Ω. R Assuming that we have constraints Fα = hfα i ≡ dσ p(σ)fα (σ) , α = 1, . . . , a the MaxEnt estimate for the maximally unbiased pdf, given these constraints, is solved by introducing L = S[p] −
a X
α=0
kB λα (hfα i − Fα )
with f0 (σ) = 1, hence F0 = 1 representing the normalization constraint, and solving a
∀σ : ∀α :
X δ kB λα fα (σ) = 0 , L[p] = −kB ( ln p(σ) + 1) − δp(σ) α=0 Z ∂ dσp(σ)fα (σ) − Fα = 0 L[p] = −kB ∂λα
Exercise: Show that the maximizing pdf is of the form p(σ) =
1 − Paα=1 λα fα (σ) e . Z
Exercise: Show that the Gaussian pdf

p(x) = (1/√(2πσ²)) exp[ −(x − a)²/(2σ²) ]

is the unique distribution that maximizes the entropy S[p] = −∫_{−∞}^{∞} dx p(x) ln p(x), subject to the constraints ⟨x⟩ = a and ⟨(x − a)²⟩ = σ².

2.3
Systems with Frozen Degrees of Freedom: Disorder & Self-Averaging
Consider for example a random-bond Ising model of the following form. Ising spins Si are placed on the vertices of a d-dimensional hypercubic lattice. Bonds between neighbouring vertices (i, j) may be occupied by impurities (kij = 1) or empty (kij = 0); if present, the impurity facilitates a ferromagnetic interaction between the neighbouring spins, which is absent if no impurity is present. We will denote by σ the set of possible spin-states, and by κ the set of possible configurations of the impurities. The energy function of the spins at a given configuration of the impurities is then

H(σ|κ) = − Σ_{(i,j)} Jij(κ) Si Sj − h Σ_i Si ,   (2.28)

with Jij(κ) = J kij, the first sum extending over all nearest-neighbour pairs, the second over all vertices of the lattice. Alternatively, one could imagine that the kij (and hence the Jij(κ)) take random values different from 0 or 1.
2.3.1
High Temperatures — Annealed Disorder
At sufficiently high temperatures (say T = T0, or higher) one assumes that the impurities can freely hop between different bonds, so that there will be many such configurational changes during the typical times used to measure magnetic properties of the system. In such a situation spin and configurational degrees of freedom are in thermal equilibrium, described by a Gibbs-Boltzmann distribution of the form

pβ0(σ, κ) = (1/Z(β0)) exp[−β0 H(σ, κ)] ,   β0 = 1/kB T0 ,   (2.29)

with

H(σ, κ) = H(σ|κ) + V(κ) ,   (2.30)

where V = V(κ) is a potential energy term describing the interaction of the impurities. The partition function Z(β0) is then given by (the sum over all states)

Z(β0) = Σ_{σ,κ} exp[−β0 H(σ, κ)] = Σ_κ Z(β0|κ) exp[−β0 V(κ)] .   (2.31)

Here

Z(β0|κ) = Σ_σ exp[−β0 H(σ|κ)]

is the partition function of the spin degrees of freedom at given configuration κ of the impurities. Note that the κ summation in Z(β0) is restricted to configurations with a given number of impurities (e.g. with only a fraction ρ of all bonds occupied). The marginal distribution of the impurity configurations κ in this situation is given by

pβ0(κ) = Σ_σ pβ0(σ, κ) = Z(β0|κ) exp[−β0 V(κ)] / Z(β0) .   (2.32)

2.3.2
Low Temperatures — Quenched Disorder
Assume that the system is suddenly cooled down to temperatures T ≪ T0, so that the motion of impurities is inhibited. One may imagine that the motion of an impurity to a neighbouring empty bond requires overcoming an activation-energy barrier ∆, so that the average time τ for this to occur is

τ ∼ τ0 exp(∆/kB T) ,   (2.33)

where τ0 ≃ 10^{−12} sec is the typical microscopic attempt time. At kB T ≪ ∆ these events can be very rare, so during times of measurement, in which typically very many spin-flips occur, the configuration κ can be regarded as virtually fixed. (E.g. with ∆/kB T ≃ 100 one obtains τ ≃ 2.7 × 10^{31} sec ≃ 8.6 × 10^{23} years.)
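The numbers quoted here are easy to reproduce; a minimal sketch (the function name is ours):

```python
import math

def hopping_time(delta_over_kT, tau0=1e-12):
    """Arrhenius estimate tau = tau0 * exp(Delta / kB T), cf. Eq. (2.33)."""
    return tau0 * math.exp(delta_over_kT)

tau = hopping_time(100.0)               # Delta / kB T = 100
years = tau / (365.25 * 24 * 3600)
print(f"tau = {tau:.2e} s = {years:.2e} years")
```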
In such a situation the spin system is in equilibrium at given fixed κ; with respect to the configurations κ the system is not in equilibrium. Thus we have a Gibbs distribution at fixed κ,

p(σ|κ) = (1/Z(β|κ)) exp[−βH(σ|κ)] ,   β = 1/kB T ,   (2.34)

with

Z(β|κ) = Σ_σ exp[−βH(σ|κ)] .   (2.35)

Thermodynamic functions like the free energy per spin

fN(β|κ) = −(βN)^{−1} ln Z(β|κ)   (2.36)

or the magnetization

mN(β|κ) = −∂fN(β|κ)/∂h = (1/N) Σ_i ⟨Si⟩   (2.37)

therefore depend on the fixed disorder configuration κ. That is, they are random quantities, which in principle fluctuate over an ensemble of identically prepared systems, with the statistics of configurations κ described by pβ0(κ). Note that the joint distribution p(σ, κ) = p(σ|κ) pβ0(κ) is not a Gibbs equilibrium distribution, unless β = β0.
2.3.3
Self-Averaging — Quenched Free Energy
Given that quantities like fN(β|κ) or mN(β|κ) are random, what are the relevant values to consider (or to compare with experiments, between experiments, or between theory and experiment)? Possible candidates are (i) the average over the distribution of configurations κ, (ii) the typical (most probable) value, or (iii) the median. These are not in general equal; see Fig. 2.1 for an example.

Figure 2.1: Pdf of a random variable X, with typical and average values indicated.

A hint provided by nature: thermodynamic properties of heterogeneous systems (e.g. alloys) do not depend on detailed configurations of individual atoms or molecules; experiments performed on materials of identical composition (concentrations), but not on identical micro-configurations, give reproducible and identical results. The reason for this is that for densities of extensive quantities, such as fN(β|κ) or mN(β|κ), one observes the phenomenon of self-averaging. As the large-system limit N → ∞ is taken, fN(β|κ) converges to the configurational average over the κ distribution,

fN(β|κ) → f(β) = lim_{N→∞} Σ_κ pβ0(κ) fN(β|κ) = lim_{N→∞} ⟨fN(β|κ)⟩_κ ,   (2.38)

called the quenched free energy.
Proof (outline): An outline of a proof for systems with finite-range interactions is as follows. Consider a system on a hyper-cubic lattice in d dimensions, containing N = L^d lattice sites, and a Hamiltonian with finite-range interactions of the form

H(σ|κ) = − Σ_{(i,j)} Jij(κ) Si Sj − h Σ_i Si ,   (2.39)

in which the Jij(κ) depend on a disorder configuration but have finite range: Jij(κ) = 0 for |ri − rj| ≥ R.

Compare the free energy density fN(β|κ) for a typical configuration κ with that of a system which is obtained from the previous one by decomposing it into smaller blocks of size (L′)^d, and cutting all bonds in channels of width R connecting neighbouring subsystems. Denote by H̃(σ|κ) the Hamiltonian of the system thus obtained; it describes a collection of (L/L′)^d non-interacting, independent subsystems. We will look at the limit L → ∞, L′ → ∞, L/L′ → ∞.

Figure 2.2: Disordered system, partitioned into noninteracting sub-systems; see text.

One has the bound

|fN(β|κ) − f̃N(β|κ)| ∼ O(1/L′) .   (2.40)

To prove this, notice that

|fN(β|κ) − f̃N(β|κ)| ≤ (1/N) sup_σ |H(σ|κ) − H̃(σ|κ)| ≡ (1/N) ||H − H̃|| ,   (2.41)

which in turn follows from

fN(β|κ) = −(βN)^{−1} ln Σ_σ e^{−βH(σ|κ)} = −(βN)^{−1} ln Σ_σ x e^{−βH̃(σ|κ)} ,   with x = e^{−β(H(σ|κ) − H̃(σ|κ))} ,

and noting that x satisfies

e^{−β||H − H̃||} ≤ x ≤ e^{β||H − H̃||} .

Now note that H − H̃ only contains interactions inside, and into, the channels of cut bonds. Multiplying the number of blocks, the surface area of each block, and the width R of a channel, the number of terms contributing to the difference is seen to scale as (L/L′)^d × d (L′)^{d−1} × 2R = 2dR L^d/L′ with L and L′. As each individual interaction energy is bounded by a finite constant, we have ||H − H̃|| ≤ const × 2dR L^d/L′, hence

|fN(β|κ) − f̃N(β|κ)| ≤ const × 2dR/L′ → 0 ,   as L′ → ∞ ,   (2.42)
in which the constant is an estimate for the maximum contribution of a single interaction energy.

To finish the proof, note that f̃N(β|κ) is given by the sum of free energies of independent blocks,

f̃N(β|κ) = (1/N) Σ_{α=1}^{N/N′} F(β|κα) = (N/N′)^{−1} Σ_{α=1}^{N/N′} fN′(β|κα) ,   (2.43)

in which F(β|κα) is the free energy of block number α with disorder configuration κα (the restriction of κ to block number α), and fN′(β|κα) = F(β|κα)/N′ the corresponding free energy density. The last expression is nothing but an empirical average of the free energy density over the block-configurations κα. For large N/N′ this converges, by the law of large numbers, to the expectation of this quantity over the distribution of block configurations. Thus

f̃N(β|κ) → ⟨fN′(β|κ′)⟩_{κ′}   (2.44)

as N/N′ → ∞, where the average is over the distribution of block-configurations. Finally, assuming that the limit f(β) = lim_{N′→∞} ⟨fN′(β|κ′)⟩_{κ′} exists, we have

|fN(β|κ) − f(β)| ≤ |fN(β|κ) − f̃N(β|κ)| + |f̃N(β|κ) − ⟨fN′(β|κ′)⟩_{κ′}| + |⟨fN′(β|κ′)⟩_{κ′} − f(β)| → 0 ,   (2.45)

as L → ∞, L′ → ∞, L/L′ → ∞.

Summary: When investigating systems with quenched disorder, the relevant thermodynamic potential is the so-called quenched free energy

f(β) = − lim_{N→∞} (βN)^{−1} ⟨ln Z(β|κ)⟩_κ ,
i.e. the average of the free energy (density) over the distribution of configurations κ of the system, in the sense that the typical true configuration-dependent free energy density will converge to the quenched free energy, as the thermodynamic limit is taken. The distribution of disorder configurations is determined by the way the system was prepared.
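Self-averaging can be seen directly in a toy case (our illustration, not part of the notes): for an open 1d random-bond Ising chain the partition function factorizes, Z(β|κ) = 2^N Π_i cosh(βJi), so fN(β|κ) is an empirical average over the bonds, and its sample-to-sample spread shrinks as N grows:

```python
import math, random, statistics

def free_energy_density(beta, J, rho, N, rng):
    """f_N(beta|kappa) for an open 1d chain whose N-1 bonds carry
    coupling J with probability rho (an impurity) and 0 otherwise."""
    occ = sum(1 for _ in range(N - 1) if rng.random() < rho)
    return -(math.log(2.0) + occ * math.log(math.cosh(beta * J)) / N) / beta

rng = random.Random(0)
beta, J, rho = 1.0, 1.0, 0.5
spread = {}
for N in (100, 10_000):
    samples = [free_energy_density(beta, J, rho, N, rng) for _ in range(200)]
    spread[N] = statistics.stdev(samples)
    print(N, round(statistics.mean(samples), 4), round(spread[N], 5))
```

The sample standard deviation drops roughly as 1/√N, while the mean stays put: the free energy density is self-averaging.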
2.3.4
A Note on Averages
We have seen that one way to obtain the correct (i.e. typical) free energy is via computing the average of the logarithm of the partition function, the so-called quenched average. If, as observed in real systems, the true disorder-dependent free energy density

fN(β|κ) = −(βN)^{−1} ln ZN(β|κ)   (2.46)

typically, i.e. in the vast majority of realizations, takes the disorder-independent value f(β), there is another reason why one should consider the average of the logarithm of the partition function, rather than, say, the logarithm of the average. This is related to the scaling of quantities with system size. Note that we can rewrite (2.46) as

ZN(β|κ) = e^{−βN fN(β|κ)} .   (2.47)

The expectation that fN(β|κ) is for large N typically independent of κ implies that the pdf of fN must for large N have a very sharp maximum and be narrowly concentrated at the typical (= average) value f̄N, which will in turn approach the value f(β) of the quenched free energy as N → ∞.

Because of the exponential scaling of ZN in N, a free energy computed from the logarithm of the average of ZN(β|κ) (rather than properly from the average of the logarithm) will give wrong results, dominated by rare (exponentially rare in N) events. We can demonstrate this on an example where we assume that the pdf of fN, defined as

p(fN) = Σ_κ p(κ) δ(fN − fN(β|κ)) ,   (2.48)

is Gaussian, thus of the form

p(fN) = (2π σN²)^{−1/2} exp[ −(fN − f̄N)²/(2σN²) ] ,   σN² = σ²/N ,   (2.49)

and we assume that it becomes sharply concentrated at f̄ = lim_{N→∞} f̄N (with a variance σN² decreasing as 1/N) as N becomes large.
Under these conditions

⟨ZN⟩ = ∫ dfN p(fN) ZN = exp[ −βN f̄N + (1/2) β² N² σN² ] ,   (2.50)

so that the so-called annealed free energy is

fN^a(β) = −(βN)^{−1} ln⟨ZN⟩ = f̄N − (1/2) βσ² ,   (2.51)

i.e. it takes a value which is O(1) lower than the average (= typical) value f̄N, and is thus dominated by events that are exponentially rare in N!

Exercise: Fill in the details of this argument. Assuming σN² = σ²/N^a for some a > 0, find the condition on a for which fN^a(β) → f̄ as N → ∞.
2.3.5
The Annealed Approximation
The annealed free energy is generally (not only for Gaussian fN) a lower bound to the true free energy. This follows in the general case from the fact that the logarithm is a concave function, hence from Jensen's inequality

⟨ln[ZN(β|κ)]⟩_κ ≤ ln[⟨ZN(β|κ)⟩_κ]   (2.52)

and therefore

fN^a(β) = −(βN)^{−1} ln[⟨ZN(β|κ)⟩_κ] ≤ fN(β) = −(βN)^{−1} ⟨ln[ZN(β|κ)]⟩_κ .   (2.53)

As the annealed free energy is often (much) easier to compute than the proper quenched free energy, it is often used as an exact lower bound for the true free energy — the so-called annealed approximation.

Note: In the disordered-systems community it is bad — but widely accepted — jargon to claim that in the case of annealed disorder one has to compute the average of the partition function (and its logarithm to obtain the appropriate free energy), whereas in the case of quenched (frozen) disorder one has to compute the average of the logarithm of the partition function. The first part of this claim is nonsense. As we have seen, in the case of annealed disorder (configurational degrees of freedom in equilibrium with the other dynamical degrees of freedom) the quantity of interest is a partition function in an enlarged phase space,

Z(β) = Σ_{σ,κ} exp[−βH(σ|κ) − βV(κ)] = Σ_κ Z(β|κ) exp[−βV(κ)] .

This is not an average of the κ-dependent partition function Z(β|κ); even if it were for some value of β, it would not be a proper average for all other β.
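The rare-event domination of (2.50)-(2.53) can be checked in a toy model (ours): draw fN from the Gaussian (2.49) and compare the two averaging routes; with β = 1, N = 10, f̄ = 1, σ² = 0.1 the annealed value should come out near f̄ − βσ²/2 = 0.95, below the quenched value:

```python
import math, random, statistics

random.seed(1)
beta, N, fbar, sigma2 = 1.0, 10, 1.0, 0.1
# samples of f_N from the Gaussian (2.49) with variance sigma2/N
f = [random.gauss(fbar, math.sqrt(sigma2 / N)) for _ in range(100_000)]

quenched = statistics.mean(f)   # = -<ln Z>/(beta N), since ln Z = -beta N f
annealed = -math.log(statistics.mean(math.exp(-beta * N * x) for x in f)) / (beta * N)
print(round(quenched, 3), round(annealed, 3))
```

For larger βN the sample mean of ZN becomes dominated by ever rarer realizations, which is exactly why the annealed average is treacherous.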
Chapter 3
Dynamics and Equilibrium

3.1 Describing Stochastic Dynamics
We briefly explore the conditions under which a stochastic dynamical system will approach equilibrium characterized by a Gibbs-Boltzmann distribution. In fact, for Ising-like or other discrete systems there is no natural Hamiltonian dynamics (with conserved energy), and a stochastic dynamics must be constructed in such a way that the Gibbs-Boltzmann distribution will indeed be the asymptotic equilibrium distribution corresponding to a given energy function (Hamiltonian).

In general, the evolution of a stochastic dynamical system with a set of microstates S (Ising or otherwise) can be described by following the time-dependence of the probability of states pt = (pt(S)). Denoting by W the matrix of transition probabilities for a time step ∆t, with (non-negative) matrix elements W(S, S′), one has

p_{t+∆t} = W pt   ⇔   p_{t+∆t}(S) = Σ_{S′} W(S, S′) pt(S′) .   (3.1)

Normalization of probabilities requires that W is a stochastic matrix, satisfying

Σ_S W(S, S′) = 1   (3.2)

independently of S′. This equation can be read as saying that W has a left eigenvector e = (1, 1, …, 1) corresponding to the eigenvalue λ = 1. The right eigenvector corresponding to this eigenvalue would be a stationary distribution p∞ satisfying

p∞ = W p∞ .
One can turn (3.1) into a continuity equation by subtracting pt, a so-called master equation, which in components reads

p_{t+∆t}(S) − pt(S) = Σ_{S′} [ W(S, S′) pt(S′) − W(S′, S) pt(S) ] .   (3.3)

Here we have used (3.2) on the r.h.s. Taking the ∆t → 0 limit one obtains the continuous-time master equation

∂t pt(S) = Σ_{S′} [ w(S, S′) pt(S′) − w(S′, S) pt(S) ] ,   (3.4)

in which w(S, S′) = lim_{∆t→0} W(S, S′)/∆t is the rate of transitions (per unit time) from state S′ to state S. This is a continuity equation for probabilities, describing the rate of change of pt(S) as a balance of incoming and outgoing probability currents. For a stationary state, the l.h.s. of (3.4) should vanish. Assuming that the dynamics approaches a stationary state as t → ∞, we should then have, for all S,

0 = Σ_{S′} [ w(S, S′) p∞(S′) − w(S′, S) p∞(S) ] .
One way to satisfy this equation is to require not only that the net probability current into and out of S vanishes, but that the net probability current between each pair of states S and S′ is separately zero. This is the so-called detailed balance condition

w(S, S′) p∞(S′) = w(S′, S) p∞(S) ,   (3.5)

connecting the stationary distribution and the matrix of transition rates. Physical systems in thermal equilibrium are, for fundamental reasons, expected to satisfy this stronger principle of detailed balance.
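A minimal numerical check (ours): for a 3-state system with arbitrarily chosen energies, rates of the form w(S, S′) = min(1, e^{−β(E_S − E_{S′})}) (the Metropolis form discussed in Section 3.4) satisfy (3.5) w.r.t. the Boltzmann distribution, which is then stationary under (3.4):

```python
import math

beta = 1.0
E = [0.0, 0.5, 1.3]                       # arbitrary state energies
Z = sum(math.exp(-beta * e) for e in E)
p = [math.exp(-beta * e) / Z for e in E]  # Boltzmann distribution

# rates w(S,S') = min(1, exp(-beta*(E[S]-E[S']))), zero on the diagonal
w = [[min(1.0, math.exp(-beta * (E[s] - E[sp]))) if s != sp else 0.0
      for sp in range(3)] for s in range(3)]

for s in range(3):
    for sp in range(3):
        # detailed balance (3.5): w(S,S') p(S') == w(S',S) p(S)
        assert abs(w[s][sp] * p[sp] - w[sp][s] * p[s]) < 1e-12
    # stationarity: net current into S, as in (3.4), vanishes
    net = sum(w[s][sp] * p[sp] - w[sp][s] * p[s] for sp in range(3))
    assert abs(net) < 1e-12
print("detailed balance and stationarity hold")
```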
3.2
Uniqueness of Equilibrium Distributions
There are two important theorems which can be invoked to see that equilibrium distributions for stochastic dynamical systems will be unique.

Perron's Theorem for Positive Matrices: Given a matrix W which is positive in the sense that W(S, S′) > 0 for all S and S′, its maximum eigenvalue λ1 is non-degenerate and strictly positive, and it has an eigenvector with only strictly positive (or only strictly negative) components. (A proof by F Ninio for the simple case of symmetric matrices is on the course web page.) There is a more general version for non-negative matrices.

Frobenius' Theorem for Non-Negative Matrices: Given a matrix W which is non-negative in the sense that W(S, S′) ≥ 0 for all S and S′, and indecomposable (i.e. no simultaneous permutation of rows and columns can transform it into the form

W′ = ( A  B
       0  C ) ,

with A and C square matrices, and 0 a matrix containing only zeros), then its eigenvalue λ1 with maximum real part is non-degenerate and strictly positive, and it has an eigenvector with only strictly positive (or only strictly negative) components. If there are k eigenvalues degenerate in absolute value with λ1, i.e. λ1 = |λ2| = … = |λk|, then they are of the form λν = λ1 exp(i 2π(ν − 1)/k) for ν = 1, …, k, so they are equally spaced on a circle of radius λ1 in the complex plane.

Note: A system that is decomposable has a proper subspace of the space of states that the system cannot leave. Exercise: Verify this statement using the definition of indecomposability given above.

Applied to matrices describing a stochastic dynamical system, this entails that the equilibrium distribution, which is the right eigenvector of W corresponding to eigenvalue 1, is unique (all other right eigenvectors must be orthogonal to the left eigenvector e = (1, 1, …, 1), so they cannot have only positive elements). If the degenerate case k > 1 of the Frobenius situation is encountered, this implies that the system asymptotically enters a k-cycle.

Remark: The above theorems hold for finite-dimensional matrices. If the matrices are infinite-dimensional, the system may have more than one asymptotic state. This is actually the situation when phase transitions occur, e.g. where the system finds itself either in the state with positive magnetisation or in the state with negative magnetisation.
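A quick illustration (ours, with an arbitrarily chosen strictly positive stochastic matrix): iterating p → W p from different initial distributions converges to the same stationary distribution, as Perron's theorem guarantees:

```python
# W(S,S') indexed as W[S][S']; columns sum to 1, cf. (3.2)
W = [[0.90, 0.20, 0.30],
     [0.05, 0.70, 0.30],
     [0.05, 0.10, 0.40]]

def step(W, p):
    return [sum(W[s][sp] * p[sp] for sp in range(len(p))) for s in range(len(p))]

def stationary(W, p, n=500):
    for _ in range(n):
        p = step(W, p)
    return p

p1 = stationary(W, [1.0, 0.0, 0.0])
p2 = stationary(W, [0.0, 0.0, 1.0])    # different starting point
print([round(x, 4) for x in p1], [round(x, 4) for x in p2])
```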
3.3
Glauber Dynamics for Ising Systems
For systems with discrete degrees of freedom, such as the Ising model, the energy function or Hamiltonian

H(S) = − (1/2) Σ_{i≠j} Jij Si Sj − Σ_i h̃i Si

is often the starting point of the analysis (e.g. given through energetic considerations), and a stochastic algorithm then needs to be constructed which converges to thermal equilibrium described by H(S). This construction exploits the detailed balance requirement.
For the Ising model, the most popular choice is asynchronous Glauber dynamics. It consists in picking a spin (e.g., i) at random within a time-step ∆t = 1/N, and determining its new state according to

Si(t + ∆t) = sign( Σ_j Jij Sj(t) + h̃i + ξit ) ,

where the ξit are zero-mean i.i.d. random variables with

P(z) = Prob{ξit ≤ z} = (1/2)(1 + tanh(βz)) .

This entails that the average value of Si(t + ∆t) for a given configuration S(t) is

⟨Si(t + ∆t)⟩|_{S(t)} = tanh[β hi(S(t))] ,   (3.6)

in which the local field hi(S) of spin i is

hi(S) = Σ_{j(≠i)} Jij Sj + h̃i .
Transitions between different states in a single time step can occur only if S and S′ differ by the flip of at most one spin in the sample. Formalise this by introducing a spin-flip operator via

S = (S1, S2, …, Si, …, SN) ,   S′ = Fi S = (S1, S2, …, −Si, …, SN) .

For Glauber dynamics one thus has

W(S, Fi S) = (1/2N) {1 + tanh[β Si hi(Fi S)]} = (1/N) w(S, Fi S) ,
W(Fi S, S) = (1/2N) {1 − tanh[β Si hi(S)]} = (1/N) w(Fi S, S) .

The prefactor 1/N in the transition probability accounts for the fact that each of the N spins can be chosen for update with probability 1/N during the infinitesimal time interval ∆t = 1/N. Note that the local field hi is independent of the state of Si, so that hi(S) = hi(Fi S).
i 1 Xh w(S, Fi S)pt (Fi S) − w(Fi S, S)pt (S) , N i
33
which in the thermodynamic limit gives rise to i Xh ∂t pt (S) = w(S, Fi S)pt (Fi S) − w(Fi S, S)pt (S) .
(3.7)
i
Noting that
e±x 1 {1 ± tanh(x)} = 2 2 cosh(x)
we can verify that thes transition rates do indeed satisfy detailed balance w.r.t. the GibbsBoltzmann distribution corresponding to H(S), by checking that for all i eβSi hi (S) e−βH(Fi S) e−βH(S) e−βSi hi (S) = . 2 cosh[βSi hi (S)] Z 2 cosh[βSi hi (S)] Z
(3.8)
where we have invested that hi (S) = hi (Fi (S)) (a consequence of absence of self-couplings). Thus we need to show that for all i H(Fi S) − H(S) = 2Si hi (S) .
(3.9)
To check this statement, start isolating terms in H(S) which involve the spin i from those which don’t. Writing X 1X ˜ k Sk , H(S) = − h Jkℓ Sk Sℓ − 2 k
k6=ℓ
we note that terms in the (first) double sum involving Si could arise from k = i or from ℓ = i. Given any i we can thus write H(S) = −
1 X 1 X ˜ i Si + H (i) (S (i) ) , Jiℓ Si Sℓ − Jki Sk Si − h 2 2 ℓ(6=i)
k(6=i)
in which H (i) (S (i) ) contains all contributions to H(S) which do not involve Si . One refers to H (i) (S (i) ) as the cavity Hamiltonian. We will encounter it in later chapters. Using the symmetry of the couplings Jik = Jki and the fact that we can rename ℓ → k in the first contribution, we get i hX ˜ i + H (i) (S (i) ) = −Si hi (S) + H (i) (S (i) ) (3.10) Jik Sk + h H(S) = −Si k(6=i)
With this representation, clearly H(Fi S) − H(S) = 2Si hi (S) as claimed, given that the contribution of H (i) is not affected by the spinflip and vanishes in the difference, that Fi Si = −Si , and that hi (Fi S) = hi (S). 34
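As a check (ours, with arbitrary small couplings and fields), one can simulate asynchronous Glauber dynamics for N = 3 spins and compare the visited-state frequencies with the exact Gibbs-Boltzmann distribution obtained by enumeration:

```python
import math, random, itertools

random.seed(2)
N, beta, steps = 3, 0.5, 400_000
J = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        J[i][j] = J[j][i] = random.choice([-1.0, 1.0])
h = [0.2, -0.1, 0.3]

def energy(S):
    return (-0.5 * sum(J[i][j] * S[i] * S[j] for i in range(N) for j in range(N))
            - sum(h[i] * S[i] for i in range(N)))

# exact Boltzmann distribution by enumeration of the 2^N states
states = list(itertools.product([-1, 1], repeat=N))
wts = [math.exp(-beta * energy(S)) for S in states]
Z = sum(wts)
p_exact = {S: wt / Z for S, wt in zip(states, wts)}

# Glauber update: set spin i to +1 with probability (1 + tanh(beta*h_i))/2
S = [1] * N
counts = {s: 0 for s in states}
for t in range(steps):
    i = random.randrange(N)
    hi = sum(J[i][j] * S[j] for j in range(N)) + h[i]
    S[i] = 1 if random.random() < 0.5 * (1 + math.tanh(beta * hi)) else -1
    counts[tuple(S)] += 1

tv = 0.5 * sum(abs(counts[s] / steps - p_exact[s]) for s in states)
print(f"total variation distance: {tv:.3f}")
```

The total variation distance between the empirical and the exact distribution shrinks with the length of the run, as guaranteed by detailed balance plus uniqueness of the stationary state.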
Remark: For {0, 1} degrees of freedom ni instead of ±1 spin degrees of freedom Si, the analogue of Glauber dynamics would consist in asynchronous updates of the form

ni(t + ∆t) = Θ( Σ_j Jij nj(t) + h̃i + ξit ) ,

for a single randomly selected node i, with Θ(x) the Heaviside step function, Θ(x) = 0 for x ≤ 0 and Θ(x) = 1 for x > 0. In this case we need a symmetric zero-mean noise distribution with

Prob{ξit ≤ z} = (1/2)[1 + tanh(βz/2)]

(note the extra factor 1/2 in the argument of the hyperbolic tangent), so that

Prob{ni(t + ∆t) = 1} = (1/2)[1 + tanh(β hi(n(t))/2)] = e^{β hi(n(t))} / (1 + e^{β hi(n(t))}) ,

with hi(t) = Σ_j Jij nj(t) + h̃i, in order to achieve detailed balance with a Gibbs-Boltzmann distribution generated by

H(n) = − (1/2) Σ_{i≠j} Jij ni nj − Σ_i h̃i ni .
3.4
The Metropolis Monte-Carlo Dynamics Algorithm
Glauber dynamics mimics an element of microscopic realism, in the sense that it stipulates that a spin tends to align (with a probability depending on temperature) with the local field it experiences from its neighbours, possibly including an external field contribution. There is, however, a general prescription, valid for arbitrary degrees of freedom, which satisfies detailed balance with a given Hamiltonian trivially by construction: the so-called Metropolis rule. Given a system of interacting degrees of freedom (discrete or continuous) described by a Hamiltonian H = H(S), Monte-Carlo dynamics based on the Metropolis algorithm trivially satisfies detailed balance with the Gibbs-Boltzmann distribution corresponding to the Hamiltonian in question. Its simplest implementation consists in proposing an update of one (or several) degrees of freedom with probability pprop(S′, S), where S and S′ denote the system state before and after the proposed update, and accepting the proposed change with probability

pacc(S′, S) = 1 ,  if H(S′) < H(S) ;   pacc(S′, S) = e^{−β(H(S′)−H(S))} ,  if H(S′) ≥ H(S) .   (3.11)

If pprop is symmetric — pprop(S′, S) = pprop(S, S′) — then the matrix

w(S′, S) = const × pprop(S′, S) pacc(S′, S)   (3.12)

of transition rates will automatically satisfy detailed balance with the Gibbs distribution generated by H(S), and hence the Metropolis algorithm will automatically converge to thermal equilibrium characterized by this distribution. Exercise: Convince yourselves of this fact.
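A minimal sketch (ours): Metropolis sampling of a single continuous degree of freedom with H(x) = x²/2 at β = 1, whose equilibrium distribution is the standard Gaussian, so the sample variance should approach 1:

```python
import math, random

random.seed(3)
beta, x, samples = 1.0, 0.0, []
H = lambda y: 0.5 * y * y

for t in range(200_000):
    xp = x + random.uniform(-1.0, 1.0)        # symmetric proposal
    dH = H(xp) - H(x)
    if dH < 0 or random.random() < math.exp(-beta * dH):
        x = xp                                # accept via rule (3.11)
    samples.append(x)

var_est = sum(s * s for s in samples) / len(samples)
print(f"sample variance: {var_est:.3f}")
```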
3.5
Langevin & Fokker Planck Equations and Equilibrium
If we consider dynamics of continuous degrees of freedom described by Langevin equations of the form

dxi/dt = fi(x) + ξi(t) ,   i = 1, …, N ,   (3.13)

with deterministic forces fi(x) and additive Gaussian white noise satisfying

⟨ξi(t) ξj(t′)⟩ = σ² δij δ(t − t′) ,   (3.14)

this system allows an equivalent description in terms of a Fokker-Planck equation,

∂t Pt(x) = − Σ_i ∂/∂xi { fi(x) Pt(x) − (σ²/2) ∂Pt(x)/∂xi } .   (3.15)

This is structurally a continuity equation for probabilities, of the form

∂t Pt(x) = − Σ_i ∂Ji(x, t)/∂xi .

We will show that a stochastic dynamics of this form has an equilibrium distribution with the structure of a Gibbs-Boltzmann distribution if the deterministic forces in (3.13) derive from a potential, i.e. if

fi(x) = − ∂U(x)/∂xi ,

so that (3.13) is a gradient dynamics with Gaussian white noise. A stationary solution is obtained when the divergence of the probability current J(x, t) = (Ji(x, t)) vanishes, and hence ∂t Pt(x) = 0 in (3.15). A stronger — sufficient, but not necessary — condition would be to require that the probability current itself (and indeed each of its components Ji(x, t)) vanishes separately, rather than just the divergence of the current.
This requirement is analogous to the detailed balance condition we discussed in the context of stochastic dynamics for discrete dynamical variables described in terms of a master equation. Thus detailed balance for a stationary distribution P∞(x) = P(x) implies

fi(x) P(x) − (σ²/2) ∂P(x)/∂xi = 0 ,

which can be rewritten as

(2/σ²) fi(x) P(x) = ∂P(x)/∂xi .

Assuming P(x) to be of Gibbs-Boltzmann form, P(x) = (1/Z) e^{−Φ(x)}, with some energy function Φ(x) to be determined, this equation translates into

(2/σ²) fi(x) = − ∂Φ(x)/∂xi .

If fi derives from a potential, as assumed above, these conditions are solved by

Φ(x) = (2/σ²) U(x) + const. ,

in which the x-independent constant can be absorbed in the normalization constant of the equilibrium distribution, which thus takes the form

P(x) = (1/Z) e^{−2U(x)/σ²} .   (3.16)

A proper Gibbs-Boltzmann equilibrium distribution at inverse temperature β = (kB T)^{−1} is obtained if one assumes that the variance of the Gaussian white noise in (3.13), (3.14) is related to temperature via

σ² = 2 kB T .   (3.17)
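A sketch of a numerical check (ours): Euler-Maruyama integration of the gradient dynamics dx/dt = −U′(x) + ξ(t) with U(x) = x²/2 and σ² = 2 (i.e. kB T = 1); by (3.16) the stationary density is ∝ e^{−x²/2}, so the long-time variance should be close to 1:

```python
import math, random

random.seed(4)
sigma2, dt, x = 2.0, 0.01, 0.0
acc = n = 0
for t in range(500_000):
    # Euler-Maruyama step: drift -x*dt plus Gaussian noise of variance sigma2*dt
    x += -x * dt + math.sqrt(sigma2 * dt) * random.gauss(0.0, 1.0)
    if t > 10_000:                 # discard the initial transient
        acc += x * x
        n += 1
var_hat = acc / n
print(f"stationary variance: {var_hat:.3f}")
```

The small residual bias of order dt is the usual Euler-Maruyama discretization error.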
Chapter 4
Asymptotic Methods

4.1 Laplace and Saddle Point Methods

4.1.1 The Laplace Method
The Laplace method is a method to evaluate real integrals of the form

IN[f] = ∫ dx e^{N f(x)}   (4.1)

for large N and functions f which have a single maximum at x0 (assumed to lie in the integration range). For such integrals we have the result

lim_{N→∞} (1/N) ln IN[f] = f(x0) ,   (4.2)

so for large N the integral IN will behave asymptotically like

IN[f] ∼ e^{N f(x0)} .   (4.3)

To verify these statements, rewrite (4.1) as

IN[f] = e^{N f(x0)} ∫ dx e^{N [f(x) − f(x0)]} .

By assumption f(x) − f(x0) < 0 for x ≠ x0, so the integrand is exponentially small (in N) for x ≠ x0; hence the integral is dominated by x ≃ x0. Using a Taylor expansion of f up to second order around the maximum at x0 gives

IN[f] ≃ e^{N f(x0)} ∫ dx e^{−(N/2)(−f″(x0))(x − x0)²} = e^{N f(x0)} √( 2π / (N(−f″(x0))) ) ,   (4.4)

entailing the stated limiting behaviour as N → ∞.
Example: Stirling approximation for the factorial for large N,

N! ≃ √(2πN) N^N e^{−N} .

This follows from

N! = ∫ dx x^N e^{−x} = N ∫ dy (Ny)^N e^{−Ny} = N N^N ∫ dy e^{N(ln y − y)} .

Evaluating this via the Laplace method gives the stated approximation. Exercise: Fill in the details of this calculation.

The following figure shows the ratio N!/(√(2πN) N^N e^{−N}) for 1 ≤ N ≤ 30, to give an indication of the precision.
Figure 4.1: Ratio of exact value of the factorial and its Stirling approximation for 1 ≤ N ≤ 30. Note that the relative error is below 1% for N > 15.
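The ratio plotted in Fig. 4.1 is easily computed (our sketch):

```python
import math

def stirling_ratio(N):
    """N! divided by the Stirling approximation sqrt(2 pi N) N^N e^{-N}."""
    stirling = math.sqrt(2 * math.pi * N) * N ** N * math.exp(-N)
    return math.factorial(N) / stirling

for N in (1, 5, 15, 30):
    print(N, round(stirling_ratio(N), 5))
```

The ratio behaves as 1 + 1/(12N) + O(1/N²), consistent with the sub-1% error for N > 15 visible in the figure.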
Multi-dimensional generalization: The Laplace method can be generalized to multi-dimensional integrals. Let f be a function from R^n into the real numbers, with a single maximum at x0. Then for

IN[f] = ∫ dx e^{N f(x)}   (4.5)

we have the result

lim_{N→∞} (1/N) ln IN[f] = f(x0) ,   (4.6)

where x0 is the coordinate of the unique maximum of f in the integration range. Following similar lines of reasoning as above, this is shown by writing

IN[f] = e^{N f(x0)} ∫ dx e^{N[f(x) − f(x0)]}
      ≃ e^{N f(x0)} ∫ dx e^{−(N/2) Σ_{μ,ν} (−∂²f(x0))_{μν} (x_μ − x0_μ)(x_ν − x0_ν)}
      = e^{N f(x0)} √( (2π/N)^n / det(−∂²f(x0)) ) .   (4.7)

Here ∂²f(x0) is the matrix of second partial derivatives of f evaluated at x0, with elements ∂²f(x0)_{μν} = ∂²f(x)/∂x_μ ∂x_ν |_{x0}, which is negative definite at a maximum of f (i.e. all its eigenvalues must be negative). The stated result follows by taking limits.
4.1.2
The Saddle Point Method
The idea of the above evaluation of integrals can be generalized to the evaluation of contour integrals in the complex plane which are of the form

IN[f] = ∫_C dz e^{N f(z)} ,   (4.8)

where we assume f to be a complex differentiable (holomorphic) function, and C a contour in the complex plane along which the integral is being evaluated. The idea then is to deform the contour in such a way that it passes through a stationary point z0 of f, with f′(z0) = 0. (If such a deformation is possible without crossing singularities of f, the integral along the deformed contour will be the same as the original one, by Cauchy's theorem.) Setting z = x + iy and f(z) = g(x, y) + i h(x, y), with real functions g and h, one has that (x0, y0) must be a stationary point of both g and h, but it will be a saddle point for both g and h (rather than a maximum or a minimum) because of the Cauchy-Riemann conditions, which entail

∂²g/∂x² + ∂²g/∂y² = 0 ,   ∂²h/∂x² + ∂²h/∂y² = 0 .

In the vicinity of z0 we can approximate (using a low-order Taylor expansion)

f(z) ≃ f(z0) + (1/2) f″(z0)(z − z0)² .

Choosing the deformed contour in such a way that f″(z0)(z − z0)² along the deformed contour is real and purely negative, and noting that |e^{N f(z)}| = e^{N g(x,y)}, one sees that contributions to the integral are exponentially (in N) suppressed away from z0 in the direction chosen (in which the integral can be approximated by a Gaussian integral). In summary, one obtains

lim_{N→∞} (1/N) ln IN[f] = f(z0) ,   (4.9)

in complete analogy to the real case.

Remark: We will repeatedly come across situations where the saddle-point method, rather than the conceptually simpler Laplace method, will be used. From physics considerations we can typically expect that at the relevant stationary points Im f(z0) = 0.
4.2
Curie-Weiss Ferromagnet
The model Hamiltonian for a system of N Ising spins is

H = H(S) = − (1/2) Σ_{i,j} Jij Si Sj − h Σ_i Si ,   (4.10)

with infinitesimally weak all-to-all connections:

Jij = J/N ,   i ≠ j ,   Jii = 0 .   (4.11)

Introducing the magnetization per spin

m = (1/N) Σ_i Si ,   (4.12)

one has

H(S) = − (NJ/2) m² − N h m + J/2 .

The last O(1) contribution corrects for the diagonal contributions included in the first. In the large-system limit it will be negligible, and we will henceforth neglect it without further ado, i.e. we have

H(S) ≃ −N [ (J/2) m² + h m ] .   (4.13)

We need to evaluate the partition function

ZN = ZN(β, h) = Σ_S e^{−βH(S)} ,   (4.14)

and will do this using three different methods.
4.2.1 First Method: Combinatorics and Laplace Method
Use the fact that H depends on S only via the magnetization density m, which can take the values

m = 1 - \frac{2k}{N} , \qquad k = 0, 1, \dots, N    (4.15)

in which k is the number of spins with S_i = -1. The number of spin configurations at given k is

\binom{N}{k} = \binom{N}{N\frac{1-m}{2}} ,

so that we have

Z_N = \sum_{m=-1}^{1} \binom{N}{N\frac{1-m}{2}} \exp\Big\{ N \Big[ \frac{\beta J}{2} m^2 + \beta h m \Big] \Big\}    (4.16)

In the large N limit the values of the magnetization m become densely spaced in the interval [-1, 1] with spacing \Delta m = 2/N. We proceed by approximating the sum over m values by an integral, and use the Stirling approximation in the form n! \sim n^n e^{-n} for n \gg 1 to evaluate the factorials appearing in the combinatorial factor (the additional factors \sqrt{2\pi n} can be shown to be negligible in the case of interest here):

\binom{N}{N\frac{1-m}{2}} \sim \exp\Big\{ -N \Big[ \frac{1-m}{2} \ln\frac{1-m}{2} + \frac{1+m}{2} \ln\frac{1+m}{2} \Big] \Big\}

We thus get

Z_N \sim \frac{N}{2} \int_{-1}^{1} dm \, e^{N g(m)}    (4.17)

with

g(m) = \frac{\beta J}{2} m^2 + \beta h m - \frac{1-m}{2} \ln\frac{1-m}{2} - \frac{1+m}{2} \ln\frac{1+m}{2}    (4.18)

Evaluating (4.17) via the Laplace method we look for a maximum of the function g by solving g'(m) = 0, or
\beta(Jm + h) + \frac{1}{2} \ln\frac{1-m}{1+m} = 0 \qquad \Longleftrightarrow \qquad \frac{1}{2} \ln\frac{1+m}{1-m} = \beta(Jm + h)    (4.19)

With \frac{1}{2}\ln\frac{1+m}{1-m} = \operatorname{arctanh}(m) we finally get

m = \tanh[\beta(Jm + h)]    (4.20)

as the fixed point equation to determine the maximum of g. If this equation has several solutions, one will have to verify that

g''(m) = \beta J - \frac{1}{1 - m^2} < 0    (4.21)

to accept a solution as a (local) maximum of g.

Exercise: Redo the calculations with the subleading contributions to the Stirling approximation included and convince yourself that asymptotically for large N you get the same results for the fixed point equation and the function g at the fixed point.

Before we analyze the fixed point equation and the physical interpretation of its solutions, we will look at two alternative methods to evaluate the partition function, which is the content of the following sections.
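The fixed point equation (4.20) is easily solved numerically by direct iteration, which converges to a stable solution. A minimal sketch (not part of the notes; parameter values chosen for illustration, J = 1):

```python
import math

def solve_m(beta, J=1.0, h=0.0, m0=0.5, tol=1e-12, iters=10000):
    # iterate the fixed-point equation m = tanh[beta(J m + h)]
    m = m0
    for _ in range(iters):
        m_new = math.tanh(beta * (J * m + h))
        if abs(m_new - m) < tol:
            break
        m = m_new
    return m

print(solve_m(0.5))   # beta J < 1: iteration collapses onto m = 0
print(solve_m(2.0))   # beta J > 1: non-zero spontaneous magnetization
```

Iterating the map selects precisely the solutions that satisfy the stability condition (4.21), since the slope of tanh[β(Jm+h)] at an unstable fixed point exceeds one.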
4.2.2 Second Method: Gaussian Linearization and Laplace Method
The second method to evaluate the partition function returns to (4.14), using the explicit expression of m in terms of sums over microscopic spin-variables to express the Hamiltonian, to get

Z_N = \sum_S \exp\Big\{ \frac{\beta J}{2N} \Big( \sum_i S_i \Big)^2 + \beta h \sum_i S_i \Big\} ,    (4.22)

and notes that a simple evaluation of the partition function is precluded only by the square of m in the exponential, which on multiplying out gives rise to a coupling between all spins. One can get rid of the square of m by using a Gaussian linearization identity of the form

\exp\Big\{ \frac{b^2}{2a} \Big\} = \int \frac{dx}{\sqrt{2\pi/a}} \exp\Big\{ -\frac{a}{2} x^2 \pm b x \Big\} .    (4.23)

In the present case

Z_N = \sum_S \int \frac{dm}{\sqrt{2\pi/(N\beta J)}} \exp\Big\{ -\frac{N\beta J}{2} m^2 + \beta J m \sum_i S_i + \beta h \sum_i S_i \Big\}
    = \sum_S \int \frac{dm}{\sqrt{2\pi/(N\beta J)}} \exp\Big\{ -\frac{N\beta J}{2} m^2 + \beta(Jm + h) \sum_i S_i \Big\}

is seen to do the job (this step is often referred to as Hubbard-Stratonovich or Gauss-Hubbard-Stratonovich linearization). The partition function now factors with respect to the individual summations over the states of the S_i:

Z_N = \int \frac{dm}{\sqrt{2\pi/(N\beta J)}} \exp\Big\{ -\frac{N\beta J}{2} m^2 + N \ln 2\cosh[\beta(Jm + h)] \Big\}    (4.24)

This integral is once more of a form that can be evaluated using the Laplace method. The fixed point equation that determines the value m of the maximum of the exponent is identical to (4.20)! We will finally look at yet another alternative to evaluate the partition function, viz the
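The linearization identity (4.23) can be verified numerically on a fine grid; the sketch below is not from the notes and uses arbitrary illustrative values of a and b.

```python
import numpy as np

# check exp(b^2 / 2a) = \int dx / sqrt(2 pi / a) exp(-a x^2 / 2 + b x)
a, b = 2.0, 0.7
x = np.linspace(-20.0, 20.0, 40001)
dx = x[1] - x[0]
lhs = np.exp(b**2 / (2.0 * a))
rhs = np.sum(np.exp(-0.5 * a * x**2 + b * x)) * dx / np.sqrt(2.0 * np.pi / a)
print(lhs, rhs)   # the two values agree to high precision
```

The point of the identity in the present context is that the spins appear linearly in the exponent on the right-hand side, at the price of one auxiliary integration variable.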
4.2.3 Third Method: δ-Functions and Saddle-Point Method
We return once more to (4.14), keeping m as a short-hand for the sum over spin values defined in (4.12),

Z_N = \sum_S \exp\Big\{ N \Big[ \frac{\beta J}{2} m^2 + \beta h m \Big] \Big\} .

We now introduce a δ-function enforcing the definition (4.12) and introduce a factor 1 into this expression in the form of an integral over this δ-function as

Z_N = N \sum_S \int dm \, \delta\Big( Nm - \sum_i S_i \Big) \exp\Big\{ N \Big[ \frac{\beta J}{2} m^2 + \beta h m \Big] \Big\} .    (4.25)

The next step is to express the δ-function in terms of its Fourier representation

\delta\Big( Nm - \sum_i S_i \Big) = \int \frac{d\hat m}{2\pi} \exp\Big\{ -i \hat m \Big( Nm - \sum_i S_i \Big) \Big\} ,    (4.26)

giving

Z_N = \sum_S \int \frac{dm \, d\hat m}{2\pi/N} \exp\Big\{ N \Big[ \frac{\beta J}{2} m^2 + \beta h m - i m \hat m \Big] + i \hat m \sum_i S_i \Big\} .    (4.27)

Once more the sum over the states of the S_i decouples w.r.t. i, giving

Z_N = \int \frac{dm \, d\hat m}{2\pi/N} \exp\Big\{ N \Big[ \frac{\beta J}{2} m^2 + \beta h m - i m \hat m + \ln[2\cosh(i\hat m)] \Big] \Big\}
    = \int \frac{dm \, d\hat m}{2\pi/N} \exp\Big\{ N g(m, \hat m) \Big\}    (4.28)

We now have a double integral with an integrand that is an exponential of a complex function of two variables, scaling exponentially in N, which we will evaluate using a two-variable version of the saddle-point method. We have to look for points in the complex m and \hat m planes at which the function g(m, \hat m) is stationary. The stationarity conditions are

\frac{\partial g}{\partial m} = \beta(Jm + h) - i\hat m = 0 ,
\qquad
\frac{\partial g}{\partial \hat m} = -im + i \tanh(i\hat m) = 0 .

Inserting i\hat m from the first equation into the second once more gives the fixed point equation (4.20)

m = \tanh[\beta(Jm + h)]

we derived from the first method, suggesting that the saddle point solution will give rise to a real value for m and a purely imaginary \hat m; this does indeed require deforming the \hat m integration contour (the real line) to pass through the purely imaginary \hat m saddle point value.
4.2.4 Equivalence of the Three Methods
We have seen that the three methods give rise to the same fixed-point equation for the magnetization m as a function of β and the external field h. We need to demonstrate that the three methods give rise to the same expression for the free energy density in the thermodynamic limit and that m is indeed the thermodynamic equilibrium magnetization, irrespective of the method used to obtain the central fixed point equation. The free energy per site is given by

-\beta f(\beta, h) = \lim_{N\to\infty} \frac{1}{N} \ln Z_N(\beta, h) .    (4.29)
First Method: Using the combinatorics and the Laplace method to compute Z_N, we would get -\beta f(\beta, h) = g(m), with g(m) given by (4.18), to be evaluated at the value of m that satisfies the fixed-point equation (4.20). Rewriting g(m) as

g(m) = \frac{\beta J}{2} m^2 + \beta h m + \ln 2 - \frac{1}{2} \ln(1 - m^2) - m \operatorname{arctanh}(m) ,

and using (4.20) to substitute (1 - m^2) = 1 - \tanh^2[\beta(Jm+h)] = \frac{1}{\cosh^2[\beta(Jm+h)]} and \operatorname{arctanh}(m) = \beta(Jm + h), we get

-\beta f(\beta, h) = g(m) = -\frac{\beta J}{2} m^2 + \ln 2\cosh[\beta(Jm + h)]    (4.30)

provided that m solves (4.20).

Second Method: Using a combination of Gaussian linearization and the Laplace method to compute Z_N, we read off directly from (4.24) that

-\beta f(\beta, h) = -\frac{\beta J}{2} m^2 + \ln 2\cosh[\beta(Jm + h)]

provided that m solves (4.20), which clearly agrees with what we got from the first method.

Third Method: Finally, using a δ-function and its Fourier representation to enforce the definition of m, and the saddle-point method for the integration over m and its so-called conjugate \hat m, we get -\beta f(\beta, h) = g(m, \hat m), to be evaluated at the solution of the saddle point equations for m and \hat m. Using the first of these saddle point equations to substitute i\hat m = \beta(Jm + h), we can express g(m, \hat m) in terms of m alone, once more giving

-\beta f(\beta, h) = -\frac{\beta J}{2} m^2 + \ln 2\cosh[\beta(Jm + h)] ,

and requiring that m solves (4.20).
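The equivalence can also be checked numerically: the exact (1/N) ln Z_N from the combinatorial sum (4.16) converges to the saddle-point expression (4.30). A sketch (not in the notes; J = 1 and illustrative β, h assumed):

```python
import math

def lnZ_per_site(N, beta, J=1.0, h=0.0):
    # exact (1/N) ln Z_N via the combinatorial sum over k spins down, Eq. (4.16)
    terms = []
    for k in range(N + 1):
        m = 1.0 - 2.0 * k / N
        lnC = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        terms.append(lnC + N * beta * (0.5 * J * m * m + h * m))
    t0 = max(terms)                                  # log-sum-exp for stability
    return (t0 + math.log(sum(math.exp(t - t0) for t in terms))) / N

def minus_beta_f(beta, J=1.0, h=0.0):
    # saddle-point free energy (4.30) at the solution of m = tanh[beta(Jm+h)]
    m = 0.5
    for _ in range(10000):
        m = math.tanh(beta * (J * m + h))
    return -0.5 * beta * J * m * m + math.log(2.0 * math.cosh(beta * (J * m + h)))

beta = 1.5
print(lnZ_per_site(4000, beta, h=0.1), minus_beta_f(beta, h=0.1))
```

The finite-N discrepancy is of order (ln N)/N, consistent with neglecting the subleading Stirling factors.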
A Remark on Relative Merits: The three methods we have used are not equally powerful. The combinatorial method is very much restricted to systems with discrete degrees of freedom (such as Ising spins); the Gaussian linearization approach requires that the Hamiltonian is at most quadratic in the relevant macroscopic quantities. The δ-function approach is the most general of the three we have looked at; its disadvantage is that it typically requires using the saddle-point rather than the Laplace method, so it is less easy to discuss stability or to verify that the fixed points one finds are indeed giving the maximum contributions to the partition function.
4.2.5 The Physics of the Curie-Weiss Model
The CW model is a simple model for ferro-magnetism; although the underlying assumption of an infinite-range interaction is clearly unphysical, the model has the distinct advantage that its free energy (and thus its thermodynamic properties) are easily computed — as we have seen in at least three different, but related ways. It moreover gives rise to an equation of state which can be formulated in closed analytic form, namely the fixed-point equation (4.20). Its solutions give the magnetization as a function of temperature and field, m = m(β, h). Remarkably, the equation of state predicts a phase transition in zero external field h = 0, from an unmagnetized state at high temperatures to a magnetized state below a critical temperature T_c given by the condition β_c J = 1, as we will now show.

Equation of State, Magnetic and Thermal Properties

Magnetization: In zero external field the equation of state is

m = \tanh(\beta J m) .    (4.31)

This equation always has an m = 0 solution. A graphical analysis reveals that it can have in addition two non-zero solutions ±m(β) ≠ 0 if βJ > 1. The solution in the low temperature phase is shown in the figure below. Looking at the stability condition (4.21), we see that the m = 0 solution is stable precisely where it is the only solution of the FPE, i.e. for βJ < 1, and it becomes unstable where the non-zero solutions appear, i.e. for βJ > 1. These non-zero solutions turn out to be stable throughout the low-temperature regime. Method 1 suggests that the partition function (thus the free energy) is indeed dominated by microstates with a magnetization m which solves the fixed-point equation (4.20), so m would be the equilibrium magnetization. One can also check that m satisfies the thermodynamic expression
for the magnetization in terms of the free energy, m = -\partial f(\beta, h)/\partial h. In evaluating this from (4.30), one should in principle take into account that there is an explicit h-dependence and an implicit one through the h-dependence of the solution m of (4.20), so

m = m(\beta, h) = -\frac{\partial f(\beta, h)}{\partial h} = \tanh[\beta(Jm + h)] - \frac{\partial f}{\partial m} \frac{\partial m(\beta, h)}{\partial h} .

Via the fixed point equation, the first contribution is just m,

m(\beta, h) = -\frac{\partial f(\beta, h)}{\partial h} = \tanh[\beta(Jm + h)] ,    (4.32)

whereas the second contribution vanishes, as the fixed point equation is precisely the equation that expresses stationarity of (4.30) w.r.t. variations of m.
Figure 4.2: Left: Zero field magnetization of the CW model as a function of kB T /J showing non-analytic behaviour at kB T /J = 1. Right: Magnetization at non-zero field (h = 0.01), where the behaviour is smooth.
Susceptibility: The susceptibility is defined via the field-derivative of the magnetization. We obtain it by taking the field-derivative in (4.20). Using the chain rule, we get

\chi \equiv \frac{\partial m(\beta, h)}{\partial h} = \big( 1 - \tanh^2[\beta(Jm + h)] \big) \Big( \beta J \frac{\partial m(\beta, h)}{\partial h} + \beta \Big) .

This is solved for \chi to give

\chi = \chi(\beta, h) = \frac{\beta(1 - m^2)}{1 - \beta J(1 - m^2)} ,    (4.33)

where we have used the fixed point equation to express the hyperbolic tangent in terms of m.

Note: The condition (4.21) that a solution of the self-consistency equation (4.20) does indeed define a dominant contribution to the partition function, as evaluated via the Laplace method, is equivalent to the condition that the susceptibility is positive.

In zero external field we know that m = 0 for T ≥ T_c (or β ≤ β_c). This gives

\chi(\beta) = \chi(\beta, 0) = \frac{\beta}{1 - \beta J} , \qquad k_B T > J ,

which diverges as \beta J \to \beta_c J = 1, and would be negative for β > β_c. As the susceptibility is by its nature a fluctuation quantity it must be positive, so at β > β_c we need to take the m ≠ 0 solutions of the fixed-point equation.
Figure 4.3: Left: Zero field susceptibility of the CW model as a function of k_B T/J showing a divergence at k_B T/J = 1. Right: susceptibility at non-zero field (h = 0.01), where the behaviour is smooth. (Note the different scales!)

Internal energy: The internal energy per spin is given by

u(\beta, h) = \frac{\partial \beta f(\beta, h)}{\partial \beta} = \frac{J}{2} m^2 - (Jm + h) \tanh[\beta(Jm + h)] ,

obtained by taking derivatives of (4.30); the implicit β-dependence via m(β, h) is again ignored due to the stationarity of f w.r.t. m. Using the fixed point equation (4.20) we can rewrite this as

u(\beta, h) = -\frac{J}{2} m^2 - h m .    (4.34)

Notice that this is just H/N expressed in terms of m, though now m is no longer arbitrary but given by the equilibrium magnetization determined from (4.20). Note that in zero magnetic field m ≡ 0 for T > T_c, implying that u(β) = u(β, 0) ≡ 0 in the high temperature phase.
Specific heat: The specific heat per spin is given by

c(\beta, h) = \frac{\partial u}{\partial T} = -k_B \beta^2 \frac{\partial u}{\partial \beta} .

All β dependence of u is via the β dependence of m along the solution curve of (4.20), so from (4.34)

c(\beta, h) = k_B \beta^2 (Jm + h) \frac{\partial m}{\partial \beta} ,

with

\frac{\partial m}{\partial \beta} = (1 - m^2) \Big[ (Jm + h) + \beta J \frac{\partial m}{\partial \beta} \Big] \qquad \Longleftrightarrow \qquad \frac{\partial m}{\partial \beta} = \frac{(Jm + h)(1 - m^2)}{1 - \beta J(1 - m^2)} ,

following lines of reasoning analogous to those used in the computation of χ. This finally gives

c(\beta, h) = k_B \beta^2 \, \frac{(Jm + h)^2 (1 - m^2)}{1 - \beta J(1 - m^2)}    (4.35)
Figure 4.4: Left: Zero field internal energy of the CW model as a function of k_B T/J showing non-analytic behaviour at k_B T/J = 1. Right: internal energy at non-zero field (h = 0.01), where the behaviour is smooth.
Figure 4.5: Left: Zero field specific heat of the CW model as a function of k_B T/J showing a cusp at k_B T/J = 1. Right: specific heat at non-zero field (h = 0.01), where the behaviour is smooth.
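The curves in Figures 4.2-4.5 follow directly from (4.20), (4.33), (4.34) and (4.35). A minimal sketch for reproducing them (not in the notes; units with k_B = J = 1 assumed):

```python
import math

def m_of(beta, J=1.0, h=0.0, iters=100000):
    # stable solution of m = tanh[beta(Jm + h)], iterated from a magnetized state
    m = 1.0
    for _ in range(iters):
        m = math.tanh(beta * (J * m + h))
    return m

def chi(beta, J=1.0, h=0.0):        # susceptibility, Eq. (4.33)
    m = m_of(beta, J, h)
    return beta * (1 - m * m) / (1 - beta * J * (1 - m * m))

def u(beta, J=1.0, h=0.0):          # internal energy per spin, Eq. (4.34)
    m = m_of(beta, J, h)
    return -0.5 * J * m * m - h * m

def c(beta, J=1.0, h=0.0):          # specific heat per spin, Eq. (4.35), k_B = 1
    m = m_of(beta, J, h)
    return beta**2 * (J * m + h)**2 * (1 - m * m) / (1 - beta * J * (1 - m * m))

for T in (1.25, 0.8):               # one temperature above and one below T_c = 1
    b = 1.0 / T
    print(T, m_of(b), chi(b), u(b), c(b))
```

Above T_c all four quantities take their paramagnetic values (m = u = c = 0, χ = β/(1 − βJ)); below T_c the non-zero branch of (4.20) is selected.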
Phase Transition and Critical Phenomena

We have seen that in zero external field, h = 0, the CW model undergoes a phase transition from a non-magnetized state at high temperatures (k_B T ≥ J) to a state with non-zero spontaneous magnetization (i.e. macroscopic magnetic order) at low temperatures (k_B T < J). Precisely at the critical temperature k_B T_c = J the thermodynamic functions are non-analytic functions of temperature and field. It is worth properly appreciating this phenomenon: just a slight change in temperature (or field) fundamentally changes the macroscopic behaviour of the system! This is a non-trivial prediction of the theory and it agrees well with observations on magnetic materials.

Magnetization: To analyze the nature of the non-analyticities in the vicinity of (T = T_c, h = 0) we look at the fixed point equation at h = 0 and for T ≲ T_c, where m ≪ 1, so by expanding

m = \tanh(\beta J m) \simeq \beta J m - \frac{1}{3} (\beta J m)^3

one obtains

m^2 \simeq 3 \, \frac{\beta J - 1}{\beta J} \, \frac{1}{(\beta J)^2} .

Using \beta_c J = 1, introducing the reduced temperature t via

\frac{\beta J - \beta_c J}{\beta J} = \frac{T_c - T}{T_c} \equiv -t ,

and approximating (\beta J)^2 \simeq 1, we get

m \simeq \sqrt{3(-t)} = \sqrt{3} \, (-t)^\beta , \qquad \text{where} \qquad \beta = \frac{1}{2}    (4.36)

is the critical exponent of the magnetization (not to be confused with the inverse temperature).

Exercise: Try to compute the first correction to the asymptotic behaviour of m near T_c. Assuming m = \sqrt{3(-t)}\,(1 + at), determine a by using higher orders in the expansion of the hyperbolic tangent.

Susceptibility: In case of the susceptibility (for T ≳ T_c, thus m = 0) we have

\chi = \frac{\beta}{1 - \beta J} \simeq J^{-1} t^{-\gamma} , \qquad \text{where} \qquad \gamma = 1    (4.37)

is the critical exponent of the susceptibility. Looking at χ just below the critical temperature (where m ≠ 0 but small), we write

\chi = J^{-1} \frac{\beta J (1 - m^2)}{1 - \beta J(1 - m^2)} \simeq \frac{1}{2} J^{-1} |t|^{-\gamma} ,    (4.38)
thus the same kind of divergence, albeit with a different prefactor.

Exercise: Verify the last statement using the asymptotic expression for m.

Specific heat: We saw that the specific heat exhibits a cusp, and a discontinuity at T_c. Analyzing the expression in detail, we get (for T ≲ T_c)

c(\beta) = k_B \frac{(\beta J m)^2 (1 - m^2)}{1 - \beta J(1 - m^2)} \simeq k_B \frac{(-3t)(1 - m^2)}{\beta J (t + m^2)} \simeq \frac{3}{2} k_B (1 + t) , \qquad t \lesssim 0 ,    (4.39)

thus the specific heat has a finite limit at T_c, behaves linearly in T immediately below the critical point, and is exactly zero above the critical point. One often writes

c(\beta) \sim |t|^{-\alpha} \qquad \text{with} \qquad \alpha = 0    (4.40)
for this behavior. Systems in which the specific heat diverges (with α > 0) do exist.

Critical Isotherm: Let us finally look at the behaviour of m as a function of (small) h exactly at T_c. Expanding the hyperbolic tangent gives

m \simeq \beta_c J m + \beta_c h - \frac{1}{3} \big( \beta_c (Jm + h) \big)^3 .

For small h, both h and m will be small, and (with \beta_c J = 1) we can approximate

\beta_c h \simeq \frac{1}{3} m^3 \qquad \Longleftrightarrow \qquad m(T_c, h) \sim h^{1/\delta}    (4.41)

with a critical exponent δ = 3 for the critical isotherm.
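The asymptotic laws (4.36) and (4.41) can be checked against the full fixed point equation. The sketch below (not in the notes; bisection is used for robustness near T_c, J = 1) compares the numerical solution with the predicted power laws:

```python
import math

def solve_m(beta, J=1.0, h=0.0):
    # positive solution of m = tanh[beta(Jm + h)] by bisection
    g = lambda m: m - math.tanh(beta * (J * m + h))
    lo, hi = 1e-12, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# exponent beta = 1/2 (critical exponent, not the inverse temperature):
# m ~ sqrt(3(-t)) slightly below T_c (here T_c = 1)
t = -0.01
m = solve_m(1.0 / (1.0 + t))
print(m / math.sqrt(3 * (-t)))        # ratio close to 1

# exponent delta = 3: m ~ (3h)^(1/3) exactly at T_c
h = 1e-4
m_c = solve_m(1.0, h=h)
print(m_c / (3 * h) ** (1.0 / 3.0))   # ratio close to 1
```

The small deviations of the ratios from 1 are the finite-t and finite-h corrections addressed in the Exercise above.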
4.3 The Mean-Field Approximation
Consider an Ising model with finite range interaction (e.g. nearest neighbour interaction on a finite dimensional lattice), with Hamiltonian

H(S) = -\frac{1}{2}\sum_{i,j} J_{ij} S_i S_j - h \sum_i S_i ,

thus Gibbs-Boltzmann equilibrium distribution

p(S) = \frac{1}{Z(\beta)} \exp[-\beta H(S)] .

Unlike in the Curie-Weiss case, the model can in general not be exactly solved (the free energy or average magnetization computed). The only exceptions are basically (i) systems in one dimension, and (ii) systems in two dimensions without external field.
An approximate description of the equilibrium properties of the system is obtained by using a mean-field approximation as follows. Given any site i, we can write the spin configuration S of the entire system as S = (S_i, S_{\partial i}, S_{\partial^2 i}), in which S_{\partial i} denotes the spin-configuration of the spins connected to i, while S_{\partial^2 i} denotes the spin-configuration of the remainder of the system. In terms of this decomposition

H(S) = -S_i (h_i + h) + H^{(i)}(S^{(i)})    (4.42)

with an internal field h_i defined by

h_i = h_i(S_{\partial i}) = \sum_j J_{ij} S_j    (4.43)

and a cavity Hamiltonian H^{(i)}(S^{(i)}), which is the Hamiltonian of the system from which S_i and the links connecting to it have been removed. We have

H^{(i)}(S^{(i)}) = H^{(i)}(S_{\partial i}, S_{\partial^2 i}) .    (4.44)

It contains all contributions to the total Hamiltonian except terms involving S_i. The contribution of S_i to the total energy thus has the form of a Zeeman energy of the spin in a total field composed as the sum of internal and external field. Expressing the Gibbs-Boltzmann distribution in terms of S = (S_i, S_{\partial i}, S_{\partial^2 i}), we can express the marginal p(S_i, S_{\partial i}) in terms of the decomposition (4.42) as

p(S_i, S_{\partial i}) = \sum_{S_{\partial^2 i}} p(S_i, S_{\partial i}, S_{\partial^2 i}) = e^{\beta S_i (h_i + h)} \times \frac{\sum_{S_{\partial^2 i}} e^{-\beta H^{(i)}(S^{(i)})}}{Z(\beta)}    (4.45)
In this expression, the second factor is a complicated function of S_{\partial i}, due to the interactions that existed with the S_{\partial^2 i}. By contrast, the conditional distribution p(S_i | S_{\partial i}) is simple: noting that the local field h_i depends only on S_{\partial i} and not on S_i itself, and as moreover

\sum_{S_i} p(S_i | S_{\partial i}) = 1

independently of S_{\partial i}, we have

p(S_i | S_{\partial i}) = \frac{e^{\beta S_i (h_i + h)}}{2\cosh[\beta(h_i + h)]} .    (4.46)

This allows us to express the Gibbs average of S_i as

\langle S_i \rangle = \Big\langle \tanh[\beta(h_i + h)] \Big\rangle_{S_{\partial i}}    (4.47)

where the hyperbolic tangent is the conditional average of S_i at given S_{\partial i}, and the final average over the marginal

p(S_{\partial i}) = 2\cosh[\beta(h_i + h)] \times \frac{\sum_{S_{\partial^2 i}} e^{-\beta H^{(i)}(S^{(i)})}}{Z(\beta)}

remains to be performed. The problem is that due to the complicated nature of the marginal, this is virtually impossible to evaluate. The mean-field approximation consists in approximating the average (4.47) by replacing the average of the hyperbolic tangent by the hyperbolic tangent of the average, leading to

\langle S_i \rangle \simeq \tanh\big[ \beta (\langle h_i \rangle_{S_{\partial i}} + h) \big] = \tanh\Big[ \beta \Big( \sum_j J_{ij} \langle S_j \rangle + h \Big) \Big]    (4.48)
This is now a coupled system of self-consistency equations for the local magnetizations mi = hSi i.
Assuming the system to be translationally invariant, so that m_i = m for all i, and assuming that we have uniform nearest neighbour interactions J_{ij} = J for (i, j) nearest neighbours, we get

m = \tanh[\beta(zJm + h)]    (4.49)

in which z is the coordination number (the number of nearest neighbours; for d-dimensional hyper-cubic lattices one would have z = 2d).

Note: This equation is very similar to the fixed point equation of the Curie-Weiss model. For that model in fact, we had assumed all-to-all interactions of uniform strength J/N. As the number of nearest neighbours is z = N - 1 in that case, we should expect the mean-field approximation m = \tanh[\beta(z(J/N)m + h)], which agrees with the exact result in the thermodynamic limit. The Curie-Weiss model is constructed in such a way that the mean-field approximation is the exact solution for this model. This is related to the fact that in the Curie-Weiss model the internal field of spin i is

h_i = \sum_{j (\ne i)} J_{ij} S_j = \frac{J}{N} \sum_j S_j ,

and it converges to the magnetization per spin in the thermodynamic limit by the law of large numbers. It is thus a non-fluctuating quantity, and this is the reason why replacing the average of the tanh by the tanh of the average is exact in this case.
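As a quick numerical illustration of (4.49) (not in the notes; z = 4, as for the square lattice, and k_B = J = 1 are assumed), the mean-field equation predicts spontaneous magnetization setting in below k_B T_c = zJ:

```python
import math

def mf_m(T, z=4, J=1.0, h=0.0):
    # non-negative solution of m = tanh[(zJm + h)/T], found by bisection (k_B = 1)
    g = lambda m: m - math.tanh((z * J * m + h) / T)
    lo, hi = 1e-12, 1.0
    if g(lo) > 0.0:          # above T_c = zJ only the m = 0 solution survives
        return 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

for T in (5.0, 4.0, 3.0, 2.0):
    print(T, mf_m(T))        # spontaneous magnetization appears below T = zJ = 4
```

For the 2D square lattice the mean-field prediction k_B T_c = 4J considerably overestimates the exact (Onsager) value k_B T_c = 2J/ln(1+√2) ≈ 2.27J, a reminder that the approximation neglects fluctuations of the internal field.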
4.4 Variational Approach
The variational approximation provides an alternative way to evaluate partition functions, thus free energies, of systems in an approximate fashion in cases where an exact evaluation is infeasible. The method exploits Jensen's inequality for the evaluation of convex functions, as follows. Suppose H = H(S) is a Hamiltonian of some interacting system (not necessarily Ising) for which the evaluation of the partition function

Z(\beta) = \sum_S \exp[-\beta H(S)]

is infeasible. Let H_0 = H_0(S) be an alternative, simpler Hamiltonian for which the partition function

Z_0(\beta) = \sum_S \exp[-\beta H_0(S)]

as well as expectation values of observables depending on one or several spin variables, such as \langle S_\omega \rangle_0 = \langle \prod_{i\in\omega} S_i \rangle_0, can be computed.

Use this alternative Hamiltonian to write

Z(\beta) = Z_0(\beta) \, \frac{1}{Z_0(\beta)} \sum_S e^{-\beta(H(S) - H_0(S))} e^{-\beta H_0(S)} = Z_0(\beta) \Big\langle e^{-\beta(H(S) - H_0(S))} \Big\rangle_0    (4.50)

Use Jensen's inequality for averages of convex functions (in the present case the exponential function) to obtain the bound

Z(\beta) \ge Z_0(\beta) \, e^{-\beta \langle H(S) - H_0(S) \rangle_0} ,

which entails

F(\beta) \le F_0(\beta) + \langle H(S) - H_0(S) \rangle_0 .    (4.51)

The best approximation to the free energy F(β) will then be obtained by minimizing the r.h.s. of (4.51) over the set of parameters characterizing the variational Hamiltonian H_0. The minimum of the r.h.s. will then define a variational approximation to the true free energy. It is always an upper bound.
4.5 Variational Mean Field
Let us apply the variational method to a ferro-magnetic Ising system with nearest neighbour interactions on a d-dimensional hypercubic lattice in a uniform external field h, and let us take a system of non-interacting spins as the variational test Hamiltonian,

H_0(S) = -\sum_i h_i S_i .    (4.52)

For this test Hamiltonian we have

Z_0 = \prod_i 2\cosh(\beta h_i)

and

\langle S_i \rangle_0 = \tanh(\beta h_i) \equiv m_i , \qquad \langle S_i S_j \rangle_0 = \langle S_i \rangle_0 \langle S_j \rangle_0 = m_i m_j \quad (i \ne j) ,

hence

F(\beta) \le \tilde F(\beta) \equiv -\frac{1}{\beta} \ln Z_0 + \langle H(S) - H_0(S) \rangle_0
 = -\frac{1}{\beta} \sum_i \ln[2\cosh(\beta h_i)] - \frac{1}{2} \sum_{i,j} J_{ij} m_i m_j - \sum_i (h - h_i) m_i .    (4.53)
To find the minimum, solve

\frac{\partial}{\partial h_i} \tilde F(\beta) = -\tanh(\beta h_i) - \beta\big(1 - \tanh^2(\beta h_i)\big) \Big[ \sum_j J_{ij} m_j + h - h_i \Big] + \tanh(\beta h_i) = 0 ,

which is equivalent to

\operatorname{arctanh}(m_i) = \beta h_i = \beta \Big[ \sum_j J_{ij} m_j + h \Big] \qquad \Longleftrightarrow \qquad m_i = \tanh\Big( \beta \Big[ \sum_j J_{ij} m_j + h \Big] \Big) .
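One can check the minimization numerically: restricting the trial fields to a single uniform value h_i = x, a grid search over x locates a minimum of the variational free energy per site at exactly the self-consistent value x = zJm + h. A sketch (not in the notes; z = 4 and βzJ > 1 are assumed so that a non-trivial minimum exists):

```python
import math

def f_tilde(x, beta, z=4, J=1.0, h=0.0):
    # variational free energy per site, Eq. (4.53), for uniform trial field h_i = x
    m = math.tanh(beta * x)
    return -math.log(2.0 * math.cosh(beta * x)) / beta - 0.5 * z * J * m * m - (h - x) * m

beta, z, J, h = 0.5, 4, 1.0, 0.0      # T < T_c = zJ
xs = [i * 0.001 for i in range(0, 5001)]
x_star = min(xs, key=lambda x: f_tilde(x, beta, z, J, h))
m_star = math.tanh(beta * x_star)
print(x_star, m_star)                 # minimizer satisfies x* = zJ m* + h
```

The grid minimum sits, to grid accuracy, at the field value solving the stationarity condition derived above, and the corresponding m_star solves the mean-field equation.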
In a homogeneous, translationally invariant system, we expect m_i = m, hence for a system of coordination z

m = \tanh[\beta(zJm + h)] ,

which is exactly the self-consistency equation of the mean-field approximation.

Note: Although using a system of non-interacting spins as variational test Hamiltonian reproduces the results of the earlier ad-hoc and un-controlled mean-field approximation, the advantage of the variational approach is that one is not restricted to this simple choice. By enlarging the family of variational test Hamiltonians, one is bound to obtain systematically better approximations of the true free energy.

Exercise: In the study of critical phenomena in finite dimensional systems one often approximates the physics of Ising type systems by using models with continuous degrees of freedom, such as the (lattice) Ginzburg-Landau-Wilson Hamiltonian

H(\phi) = -\frac{1}{2}\sum_{i,j} J_{ij} \phi_i \phi_j - h \sum_i \phi_i + u_0 \sum_i (\phi_i^2 - 1)^2 ,

in which the \phi_i are continuous degrees of freedom that can take any real value. The third contribution, a sum of quartic on-site potentials, is introduced to ensure that the Gibbs distribution corresponding to this model favours states with \phi \simeq \pm 1, which are the minima of each of the quartic potentials. For large u_0 the system is therefore expected to be Ising-like. The partition function for this system would be

Z = \int \prod_i d\phi_i \exp[-\beta H(\phi)] .

Although integrals are often easier to evaluate than discrete sums, nobody has been able to evaluate the partition function of the LGW Hamiltonian. Solve a variational approximation of this system, using e.g.

H_0 = \sum_i \frac{1}{2\sigma_i^2} (\phi_i - \mu_i)^2 ,
with two independent variational parameters, µi and σi per site (in a homogeneous system one expects the solution of the minimization problem to be independent of i of course). Note: I have not seen this problem posed in any textbook or addressed in any journal article, so I don’t know the correct answer, and I am really interested to have your results (soon!) Simplified versions of this have been looked at, though.
Chapter 5
System with Finite Dimensional Site Disorder

We will here briefly look at mean-field like systems with finite dimensional site disorder. We will look at systems in which either the bonds are expressed in terms of random variables associated with sites, and take the form

J_{ij} = \frac{1}{N} Q(\xi_i, \xi_j) , \quad i \ne j , \qquad J_{ii} = 0    (5.1)

with Q a symmetric function from R^p × R^p to R, and the \xi_i assumed to be random in R^p, or the external fields h_i depend randomly on i, or systems in which these two modes of randomness occur together. Restricting ourselves to Ising type models, we will look at Hamiltonians of the form

H(S) = -\frac{1}{2}\sum_{ij} J_{ij} S_i S_j - \sum_i h_i S_i

with the J_{ij} or the h_i (or both) random as indicated above. Models of this type can be used to describe disordered magnets in physics. However, as indicated in Sec. 1.2, there are systems which would initially be defined and motivated in terms of a stochastic dynamical evolution equation

S_i(t + \Delta t) = \operatorname{sign}\Big( \sum_{j=1}^N J_{ij} S_j(t) + h_i + \xi_i^t \Big)    (5.2)
(recall Eq (1.1)). Their asymptotic behaviour can be described in terms of thermal Gibbs-Boltzmann equilibrium distributions provided the noise \xi_i^t has the right form, namely

\rho_\beta(\xi) = \frac{1}{2} \frac{d}{d\xi} \tanh(\beta\xi) .

We will provide a full justification of this statement later on. Let us for now just notice that with noise distributed independently (in i and t) according to \rho_\beta(\xi) we have

\langle S_i(t + \Delta t) \rangle\big|_{S_{\partial i}(t)} = \tanh\Big[ \beta \Big( \sum_{j=1}^N J_{ij} S_j(t) + h_i \Big) \Big]

for the conditional average of S_i(t + \Delta t) at given S_{\partial i}(t) (i.e. taking a functional form as we have discussed above in connection with the mean-field approximation for 'thermal' systems).

Exercise: For \rho_\beta(\xi) as given, verify the above statement about the conditional average of S_i(t + \Delta t).

In Sec 1.2 we have seen that models of neural networks are of this type, with the J_{ij} denoting synaptic efficacies (which can be either inhibitory or excitatory), and the h_i related to neural firing thresholds \vartheta_i via \vartheta_i = -h_i. Similarly, models for the dynamics of attitude change can be cast into this form.
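The statement of the Exercise can be checked by simulation: ρ_β has CDF (1 + tanh(βξ))/2, so ξ can be sampled by inverse-transform sampling, and applying the sign rule then reproduces the tanh conditional average. A sketch (not in the notes; the field value x standing in for ∑_j J_ij S_j(t) + h_i is chosen arbitrarily):

```python
import math, random

random.seed(0)
beta, x = 1.0, 0.7          # x plays the role of sum_j J_ij S_j(t) + h_i
n = 200000
acc = 0
for _ in range(n):
    u = random.random()
    xi = math.atanh(2.0 * u - 1.0) / beta   # inverse-CDF sample from rho_beta
    acc += 1 if x + xi > 0 else -1
print(acc / n, math.tanh(beta * x))          # empirical average vs tanh(beta x)
```

The agreement follows from P(x + ξ > 0) = 1 − F(−x) = (1 + tanh(βx))/2, so the mean of the ±1 outcome is tanh(βx).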
5.1 General Solution
We solve the model for general Q in (5.1), keeping p, the dimension of the random vectors \xi_i, fixed. You may for the sake of definiteness imagine that the components \xi_i^\mu of the \xi_i can each only take values in {±1}, so that the number of different possible vectors is 2^p. However, we note at the outset that all we need is that the total number of different possible \xi_i vectors is finite. The solution of the model is based on a decomposition of sites into subgroups of sites which have identical \xi_i-vectors. Define the subgroup or sub-lattice

I_x = \{ i \, ; \, \xi_i = x \}    (5.3)

The set I = \{1, 2, \dots, N\} of sites is then a union of the I_x, i.e. I = \bigcup_x I_x. Interactions between spins on sites i and j then depend only on the sub-lattices to which i and j belong. Denote by |I_x| the size of the sub-lattice I_x. We then have that

p_x = \frac{|I_x|}{|I|} = \frac{|I_x|}{N}    (5.4)

is the fraction of sites for which \xi_i = x, which in the large system limit approaches the probability that the random vector \xi_i takes the value x.
In terms of these definitions we have

H(S) = -\frac{N}{2} \sum_{x,x'} Q(x, x') \, p_x m_x(S) \, p_{x'} m_{x'}(S) - \sum_x \sum_{i \in I_x} h_i S_i + H_0    (5.5)

with sub-lattice magnetizations

m_x(S) = \frac{1}{|I_x|} \sum_{i \in I_x} S_i    (5.6)

and the O(1) term

H_0 = \frac{1}{2} \sum_x Q(x, x) p_x

correcting for the absence of self-interactions which would otherwise be included in (5.5). Evaluating the partition function, we use the δ-function method to decouple spins, i.e. we introduce the m_x as integration variables, enforcing their definitions via

1 = |I_x| \int dm_x \, \delta\Big( |I_x| m_x - \sum_{i \in I_x} S_i \Big)
and using Fourier representations of δ-functions, as explained in Ch 3.2. This gives (ignoring the O(1) contribution H_0 to the Hamiltonian)

Z = \sum_S \int \prod_x \frac{dm_x \, d\hat m_x}{2\pi/|I_x|} \exp\Big\{ N \Big[ \frac{\beta}{2} \sum_{x,x'} Q(x, x') p_x m_x p_{x'} m_{x'} - i \sum_x p_x m_x \hat m_x \Big] + \sum_x \sum_{i \in I_x} (i\hat m_x + \beta h_i) S_i \Big\}    (5.7)

The S_i now appear linearly in the exponential so that the S_i-summations in the partition function factor w.r.t. i, so

Z = \int \prod_x \frac{dm_x \, d\hat m_x}{2\pi/|I_x|} \exp\Big\{ N \Big[ \frac{\beta}{2} \sum_{x,x'} Q(x, x') p_x m_x p_{x'} m_{x'} - i \sum_x p_x m_x \hat m_x \Big] + \sum_x \sum_{i \in I_x} \ln\big[ 2\cosh(i\hat m_x + \beta h_i) \big] \Big\}    (5.8)

Now by the law of large numbers (for |I_x| \gg 1)

\sum_{i \in I_x} \ln\big[ 2\cosh(i\hat m_x + \beta h_i) \big] = N p_x \frac{1}{|I_x|} \sum_{i \in I_x} \ln\big[ 2\cosh(i\hat m_x + \beta h_i) \big] = N p_x \Big\langle \ln\big[ 2\cosh(i\hat m_x + \beta h) \big] \Big\rangle_h    (5.9)

where \langle \dots \rangle_h denotes an average over the distribution of random fields.
For the partition function we therefore have

Z = \int \prod_x \frac{dm_x \, d\hat m_x}{2\pi/|I_x|} \exp\Big\{ N \Big[ \frac{\beta}{2} \sum_{x,x'} Q(x, x') p_x m_x p_{x'} m_{x'} - i \sum_x p_x m_x \hat m_x + \sum_x p_x \Big\langle \ln\big[ 2\cosh(i\hat m_x + \beta h) \big] \Big\rangle_h \Big] \Big\}    (5.10)
This is now explicitly of a form that can be evaluated by the saddle point method. Note also that results do not depend on the specific realization of the randomness, but only on the distributions of the random fields h_i and the \xi_i! The fixed point equations are

i\hat m_x = \beta \sum_{x'} Q(x, x') p_{x'} m_{x'} , \qquad m_x = \Big\langle \tanh\big( i\hat m_x + \beta h \big) \Big\rangle_h .

Inserting the solution of the first into the second, we get

m_x = \Big\langle \tanh\Big( \beta \sum_{x'} Q(x, x') p_{x'} m_{x'} + \beta h \Big) \Big\rangle_h ,    (5.11)
a system of coupled fixed point equations for the m_x. The number of coupled equations is equal to the number of different possible x-vectors. The free energy is given by

-\beta f(\beta) = -\frac{\beta}{2} \sum_{x,x'} Q(x, x') p_x m_x p_{x'} m_{x'} + \sum_x p_x \Big\langle \ln 2\cosh\Big( \beta \sum_{x'} Q(x, x') p_{x'} m_{x'} + \beta h \Big) \Big\rangle_h    (5.12)
in which the m_x are solutions of the fixed point equations.

Exercise: Rederive these results using the combinatorial method.

Note that one can obtain the set of fixed point equations (5.11) also directly by using a mean-field approximation to study systems with Hamiltonians with couplings given by (5.1) and random external fields h_i. To see this, recall that the mean-field approximation gives a set of fixed-point equations for the m_i = \langle S_i \rangle of the form

m_i = \tanh\Big[ \beta \sum_j J_{ij} m_j + \beta h_i \Big]

Inserting (5.1) for the J_{ij} and considering i \in I_x we get

m_i = \tanh\Big[ \beta \sum_{x'} Q(x, x') p_{x'} m_{x'} + \beta h_i \Big]    (5.13)

where we have introduced sub-lattice (equilibrium) magnetizations m_x via

m_x = \frac{1}{|I_x|} \sum_{i \in I_x} m_i .

Inserting this into (5.13) we get

m_x = \frac{1}{|I_x|} \sum_{i \in I_x} \tanh\Big[ \beta \sum_{x'} Q(x, x') p_{x'} m_{x'} + \beta h_i \Big] = \Big\langle \tanh\Big[ \beta \sum_{x'} Q(x, x') p_{x'} m_{x'} + \beta h \Big] \Big\rangle_h ,

where the last equality is a consequence of the law of large numbers. But this is nothing but the fixed-point equation of the exact solution, meaning that for systems of the form considered here mean-field theory is indeed exact.

Depending on the type of interaction-kernel Q(x, x'), the set of fixed point equations (5.11) can allow solutions with heterogeneous, i.e. x-dependent, sub-lattice magnetizations, and a larger diversity of phases than the paramagnetic and ferromagnetic phases of a simple ferro-magnetic Ising model. Systems of this type have been considered to describe the physics of spin-glasses (magnetic systems which have a mixture of ferromagnetic and anti-ferromagnetic interactions, leading to frustration, thus the possibility of a large number of ground states or meta-stable states).
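The coupled equations (5.11) are easily iterated numerically. The sketch below is not from the notes; it assumes an illustrative two-valued ξ with a purely anti-ferromagnetic kernel Q between the two sub-lattices and no random field, and finds the heterogeneous (x-dependent) sub-lattice magnetizations mentioned above.

```python
import math

def solve_sublattice(Q, p, beta, h=0.0, iters=20000):
    # fixed-point iteration for m_x = tanh(beta sum_x' Q(x,x') p_x' m_x' + beta h),
    # two sub-lattices x in {0, 1}, single non-random field h for simplicity
    m = [0.9, -0.9]                  # heterogeneous initial condition
    for _ in range(iters):
        m = [math.tanh(beta * (sum(Q[x][y] * p[y] * m[y] for y in range(2)) + h))
             for x in range(2)]
    return m

# illustrative anti-ferromagnetic coupling between the two sub-lattices:
Q = [[0.0, -1.0], [-1.0, 0.0]]
p = [0.5, 0.5]
m = solve_sublattice(Q, p, beta=3.0)
print(m)    # m_0 = -m_1 != 0: heterogeneous sub-lattice magnetizations
```

With this kernel the two sub-lattice magnetizations lock into opposite signs, a phase a uniform ferromagnetic model cannot exhibit.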
5.1.1 Neural Networks – The Hopfield Model
Hopfield took the physics of spin-glasses as a guiding metaphor to investigate the question whether a system of neurons, coupled via a mixture of excitatory (ferromagnetic) and inhibitory (anti-ferromagnetic) synaptic interactions could function as an associative memory. Using Si = ±1 to denote the firing/non-firing states of a neuron respectively, one would like to investigate the question whether neurons coupled via synapses Jij , and following a stochastic dynamics of the form ! N X Si (t + ∆t) = sign (5.14) Jij Sj (t) − ϑi + ξit j=1
could have long-lived states with certain prescribed values \langle S_i \rangle = m\,\xi_i (with m a noise-level/temperature-dependent amplitude) for the local magnetizations; these would then describe long-lived firing patterns (memories) of the entire network.
Remark Looking at brain architecture and principles governing synaptic transmission, there are at least two features which rule out using equilibrium statistical mechanics for their analysis.

• Synaptic interactions J_{ij} are generally non-symmetric, J_{ij} \ne J_{ji}; in fact, they are typically uni-directional. Thermal equilibrium as described by Gibbs-Boltzmann equilibrium distributions, however, exists only for systems with symmetric interactions.

• A somewhat more subtle point is related to the nature of the noise in the system: one would in principle expect the noise in realistic neural systems to be Gaussian, say with variance \sigma^2. For such systems one would get

\langle S_i(t+\Delta t)\rangle\big|_{S_{\partial i}(t)} = \mathrm{erf}\Big[\frac{1}{\sqrt{2\sigma^2}}\Big(\sum_{j=1}^N J_{ij} S_j(t) + h_i\Big)\Big]

rather than a hyperbolic tangent, as would strictly be required to use equilibrium statistical mechanics for the description of long-term behaviour (we will give a proper justification of this statement later on). The fact that one nevertheless chooses to apply equilibrium statistical mechanics machinery is related to the fact that the functions \tanh(\beta x) and \mathrm{erf}(\tilde\beta x) with \tilde\beta = 2\beta/\sqrt\pi are very close to each other for all x \in \mathbb{R}, so equilibrium statistical mechanics should provide an excellent approximation of actual behaviour.

The case of a single pattern - Mattis Model
In the case of having a single stable attractor correlated with a prescribed set of \xi_i, one can choose

J_{ij} = \frac{1}{N}\,\xi_i \xi_j

and \vartheta_i = -h_i = 0, which is a special p = 1 case of the general setup with Q(\xi_i, \xi_j) = \xi_i \xi_j. The fixed point equations are then

m_\xi = \tanh\Big[\beta \sum_{\xi'} \xi \xi'\, p_{\xi'} m_{\xi'}\Big] ,\qquad \xi \in \{\pm 1\} ,
or explicitly

m_+ = \tanh\big[\beta(p_+ m_+ - p_- m_-)\big]
m_- = -\tanh\big[\beta(p_+ m_+ - p_- m_-)\big] .
The quantity of interest is the overlap m of the equilibrium spin-state with the pattern \{\xi_i\}, defined by

m = \frac{1}{N}\sum_i \xi_i \langle S_i \rangle = \sum_\xi p_\xi\, \xi\, m_\xi ,
which in the present one-pattern case is simply m = p_+ m_+ - p_- m_-. Combining the two fixed-point equations we can obtain an equation for m alone, namely

m = p_+ m_+ - p_- m_- = (p_+ + p_-)\tanh(\beta m) = \tanh(\beta m) ,

which is equivalent to the fixed point equation for the Curie-Weiss model (with J = 1). So we know that there is a non-zero positive solution for \beta > \beta_c = 1.

Remark There is a much simpler way of demonstrating the thermodynamic equivalence of the Mattis model and a simple non-random ferromagnet (a Curie-Weiss model in the present infinite-range case). Via a (local) so-called gauge transformation \xi_i S_i \to \tilde S_i the Hamiltonian of a Mattis model

H(S) = -\frac{1}{2N}\sum_{i,j} \xi_i \xi_j S_i S_j

transforms into the Hamiltonian of a Curie-Weiss model

\tilde H(\tilde S) = -\frac{1}{2N}\sum_{i,j} \tilde S_i \tilde S_j .

As evaluating the partition function as a sum over \{S_i\} configurations is equivalent to evaluating it as a sum over \{\tilde S_i\} configurations, the statement follows. Clearly a homogeneous magnetization \langle \tilde S_i \rangle = m in this Curie-Weiss model translates into an inhomogeneous magnetization \langle S_i \rangle = m\,\xi_i for the original Mattis model.

Generalizing to several patterns - the Hopfield Model
Following a proposal by Hopfield, the single-pattern Mattis model can be generalized to the simultaneous storage of several patterns by choosing

J_{ij} = \frac{1}{N}\sum_{\mu=1}^p \xi_i^\mu \xi_j^\mu = \frac{1}{N}\,\xi_i \cdot \xi_j .
This prescription is in some sense an implementation of the so-called Hebb rule, according to which synaptic interactions are determined by correlations of pre- and post-synaptic activity. It is once more a special case of the general random-site model setup with Q(\xi_i, \xi_j) = \xi_i \cdot \xi_j, so the fixed point equations (5.11) describing the equilibrium state(s) of the system directly apply to the present case. For h_i \equiv 0,

m_\xi = \tanh\Big[\beta \sum_{\xi'} \xi \cdot \xi'\, p_{\xi'} m_{\xi'}\Big] .    (5.15)
The scalar-product form of the interactions suggests further structure, which will both amount to a simplification of the set of fixed-point equations, and — more importantly perhaps — provide a better level of description of the underlying information processing problem, viz. associative information storage (and retrieval). Introducing the overlap vector

m = \sum_\xi p_\xi\, \xi\, m_\xi = \big\langle \xi\, m_\xi \big\rangle_\xi    (5.16)

we see that the fpe's can be written as m_\xi = \tanh[\beta\, \xi \cdot m], which after multiplying by \xi and averaging over \xi gives

m = \big\langle \xi \tanh[\beta\, \xi \cdot m] \big\rangle_\xi ,    (5.17)

or, in components,

m_\mu = \Big\langle \xi^\mu \tanh\Big[\beta \sum_\nu \xi^\nu m_\nu\Big]\Big\rangle_\xi .    (5.18)

Note that m_\mu is nothing but the overlap of the equilibrium spin-configuration with the pattern \xi_i^\mu,

m_\mu = \frac{1}{N}\sum_i \xi_i^\mu \langle S_i \rangle = \sum_\xi p_\xi\, \xi^\mu m_\xi .
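As a quick numerical illustration (a sketch, not part of the original notes), the component fixed-point equations (5.18) can be iterated directly for a small number of unbiased patterns, with the \xi-average done by enumerating all 2^p equally likely sign vectors. The values p = 2, \beta = 2 and the initial condition are arbitrary illustrative choices.

```python
import math
from itertools import product

def iterate_fpe(m, beta, iters=200):
    """Iterate m_mu = < xi_mu tanh(beta xi.m) >_xi for p patterns,
    averaging over all 2^p equally likely sign vectors xi."""
    p = len(m)
    for _ in range(iters):
        new = [0.0] * p
        for xi in product((-1, 1), repeat=p):
            t = math.tanh(beta * sum(x * mm for x, mm in zip(xi, m)))
            for mu in range(p):
                new[mu] += xi[mu] * t / 2**p
        m = new
    return m

# start close to pattern 1 (but not on it): the iteration flows to a
# retrieval state with m_1 solving m = tanh(2m) (~0.9575) and m_2 -> 0
m = iterate_fpe([0.5, 0.1], beta=2.0)
print(m)
```

Starting instead from a symmetric initial condition such as [0.5, 0.5] leads to the mixture states discussed below.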
Exercise One can use the overlaps m_\mu instead of the sub-lattice magnetizations as the primary objects of the theory, by realizing that the Hopfield model Hamiltonian can be written as

H(S) = -\frac{N}{2}\sum_{\mu=1}^p m_\mu^2 + \frac{p}{2}

and evaluating the partition function directly in terms of the m_\mu (using your favourite method(s)). Do that calculation, and derive the fixed point equations (5.18) directly without using sub-lattice magnetizations. Show that the free energy expressed in terms of the overlaps is given by

-\beta f(\beta) = -\frac{\beta}{2}\sum_{\mu=1}^p m_\mu^2 + \Big\langle \ln 2\cosh\Big[\beta\sum_{\nu=1}^p \xi^\nu m_\nu\Big]\Big\rangle_\xi .
Retrieval states For unbiased binary random patterns with Prob(\xi_i^\mu = \pm 1) = 1/2, the fixed point equations, Eqs. (5.17) or, equivalently, their reformulation (5.18) in component form, have solutions for which the local equilibrium 'magnetizations' \langle S_i \rangle are correlated with only a single pattern, say \mu, implying that we have solutions with

m_\mu = m > 0 ,\qquad \text{and}\qquad m_\nu = 0 \;\; \text{for}\;\; \nu \ne \mu .

As the couplings are symmetric w.r.t. permutation of the patterns, we can assume without loss of generality that \mu = 1. These states are the so-called retrieval states. To see this, assume m_\nu = m\,\delta_{\nu,1} on the r.h.s. of (5.18). We then have

m_1 = m = \big\langle \xi^1 \tanh[\beta \xi^1 m] \big\rangle_\xi ,

giving m = \tanh[\beta m] for the FPE determining the retrieval amplitude m (which is equivalent to the FPE of the Curie-Weiss model), and

m_\nu = \big\langle \xi^\nu \tanh[\beta \xi^1 m] \big\rangle_\xi = 0

as claimed, using independence of the \xi^\nu and \langle \xi^\nu \rangle_\xi = 0.
For sufficiently low noise levels, \beta > \beta_c = 1, the system will therefore retrieve any of the stored patterns.

Other meta-stable states The desired p retrieval states are not the only solutions of the fixed-point equations. The symmetry of the couplings under permutation of pattern indices suggests that there may be solutions of the form

m_\mu = m \;\; \text{for}\;\; 1 \le \mu \le n ,\qquad m_\mu = 0 \;\; \text{for}\;\; \mu > n .

Inserting this into (5.18) requires

m = \Big\langle \xi^\mu \tanh\Big[\beta m \sum_{\nu=1}^n \xi^\nu\Big]\Big\rangle_\xi ,\qquad \mu \le n ,

for the amplitude m, which on summing over \mu gives

m = \frac{1}{n}\Big\langle \sum_{\mu=1}^n \xi^\mu \tanh\Big[\beta m \sum_{\nu=1}^n \xi^\nu\Big]\Big\rangle_\xi .

Assuming that the amplitude becomes small at sufficiently large noise levels (small \beta), one can do an expansion of the tanh as for the Curie-Weiss model to find

m \simeq \sqrt{\frac{-3t}{3n-2}} ,    (5.19)

where t = (T - T_c)/T_c and T_c = 1.
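Away from the critical region the symmetric mixture amplitude is easily obtained by iteration, as suggested in the exercise below. A minimal sketch (the choices n = 3, \beta = 2 and the starting value are illustrative only):

```python
import math
from itertools import product

def mixture_amplitude(n, beta, m=0.5, iters=500):
    """Iterate m = (1/n) < sum_mu xi_mu tanh(beta*m*sum_nu xi_nu) >_xi
    for a symmetric n-pattern mixture state, enumerating all 2^n sign vectors."""
    for _ in range(iters):
        avg = 0.0
        for xi in product((-1, 1), repeat=n):
            s = sum(xi)
            avg += s * math.tanh(beta * m * s) / 2**n
        m = avg / n
    return m

m3 = mixture_amplitude(3, beta=2.0)   # three-pattern symmetric mixture
m1 = mixture_amplitude(1, beta=2.0)   # n = 1 reproduces the retrieval amplitude
print(m3, m1)
```

At \beta = 2 the mixture amplitude comes out noticeably smaller than the retrieval amplitude, consistent with the mixtures being "weaker" states.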
Exercise (i) Verify this statement. (ii) Show that this result can also be obtained without initially assuming that the n non-zero m_\mu all have the same amplitude. This analysis should actually reveal that the sign of the amplitude can be freely chosen for each \mu, i.e. that m_\mu = \pm m for \mu = 1, \dots, n, with each sign independently chosen. In the regions where \beta is not close to the critical value, the equation for the symmetric amplitude m can be solved numerically (try, if you have access to a computer).

Stability To assess whether these additional solutions are stable, one has to check whether the fixed point characterizing these solutions actually corresponds to a minimum of the free energy (5.12), evaluated as a function of the 'order parameters' \{m_\xi\} or alternatively the \{m_\mu\} at these saddle points (after elimination of the conjugate variables \{\hat m_\xi\}, respectively the \{\hat m_\mu\}). Thus, using the representation in terms of the \{m_\mu\}, we have to look at the matrix D with elements

D_{\mu,\rho} = \frac{\partial^2 f(\beta)}{\partial m_\mu \partial m_\rho} = \delta_{\mu,\rho} - \beta \Big\langle \xi^\mu \xi^\rho \Big[1 - \tanh^2\Big(\beta\sum_\nu \xi^\nu m_\nu\Big)\Big]\Big\rangle_\xi

evaluated in the solution m_\mu = m for \mu \le n of the fixed point equations, and check whether all its eigenvalues are positive. Due to the symmetries of the problem these eigenvalues can in fact be computed, and it turns out that only states with odd n can be stable, and only the n = 1 (retrieval) states are stable right where they appear (at \beta \ge 1), whereas the n \ge 3 solutions each have a critical \beta_n^c > 1, where the solutions become stable, with \beta_n^c increasing with n.

Remark The present method of solving the model assumes that p remains finite in the thermodynamic limit N \to \infty, though it can be taken arbitrarily large. The methods we used to solve the theory do, however, lose their validity when p grows with N. To answer the question how many patterns can be stored (and retrieved) in a Hopfield model of a given size, one needs to develop a theory for p = \alpha N with \alpha = O(1). Solving the model in that limit requires the use of more sophisticated methods for disordered systems, such as the replica approach. We'll come to that later.
Chapter 6
A Model for Operational Risk

Operational risk (OR) is classified as "the risk of losses resulting from inadequate or failed internal processes, people and systems or from external events". Possible OR categories (in the financial industry) are

1. human processing errors, e.g., mishandling of software applications, reports containing incomplete information, or payments made to incorrect parties without recovery
2. human decision errors, e.g., unnecessary rejection of a profitable trade or wrong trading strategy due to incomplete information
3. (software or hardware) system errors, e.g., data delivery or data import is not executed and the software system performs calculations and generates reports based on incomplete data
4. process design errors, e.g., workflows with ambiguously defined process steps
5. fraud and theft, e.g., unauthorized actions or credit card fraud
6. external damages, e.g., fire or earthquake

However, operational risk is clearly relevant for any type of enterprise. In the banking industry, regulators (at the Bank for International Settlements in Basel) require that banks set aside a certain minimum amount of capital to cover losses due to operational risks. The so-called Basel II accord of 2001 suggests three methodologies to estimate the required capital: (i) the Basic Indicator Approach (BIA), (ii) the Standardized Approach (SA) and (iii) the Advanced Measurement Approach (AMA). Under the BIA, the required capital is determined by taking 15% of the banks' average gross income over the previous three years. The SA is only slightly
more advanced in that the banks' operations are divided into 8 business lines, each of which has a specific weight. The required capital is calculated as the weighted average of the (non-negative) gross income from the business lines over the previous three years. Under AMA, banks are responsible for designing their own measurement approach and setting assumptions on the loss distributions of individual OR categories; overall losses are then aggregated, assuming that losses in these categories are independent. However, a moment of reflection shows that losses incurred through failing processes within an organization are not independent. Indeed, an organization can be defined as a system of interacting and mutually supporting processes. A failing process will stop providing input (or proper input) to all other processes which normally rely on said process to be working properly; as a consequence, the likelihood of these other processes failing as well (partially or completely) will in general be increased. The Basel II regulatory document entirely misses this point, and hence misses mechanisms that can lead to avalanches of process failures (black-outs).
6.1
Set-Up of the Model
A simple model to analyze OR describes an organization as a collection of N processes which can be either up and running (n_i(t) = 0) or down (n_i(t) = 1). In order to maintain a stable running state over the time increment t \to t + \Delta t, each process i needs to receive support at time t, which is denoted h_i(t) and takes the form

h_i(t) = \vartheta_i - \sum_j J_{ij} n_j(t) - \xi_i(t) .    (6.1)

Here, \vartheta_i denotes the average support received by process i in a fully functional environment, while J_{ij} represents the impact on the average support if process j breaks down. Finally, the \xi_i(t) are taken to be zero-mean (Gaussian) random fluctuations, which represent non-systematic internal and external perturbations to the environment (e.g., fire, earthquake, voltage fluctuations within electric supplies, etc.). A process breaks down in the next time step if h_i(t) < 0. Thus, the dynamics of a process's state is given by

n_i(t+\Delta t) = \Theta\Big(\sum_j J_{ij} n_j(t) - \vartheta_i + \xi_i(t)\Big) .    (6.2)

By suitable rescaling of the average support and impact parameters \vartheta_i and J_{ij}, respectively, the \xi_i(t) can be taken to have unit variance. The rescaled variables can then be related to
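The threshold dynamics (6.2) is straightforward to simulate. The following is only a sketch with made-up homogeneous parameters (J_{ij} = J_0/N for all pairs, constant \vartheta, unit-variance Gaussian noise, a fixed random seed); it illustrates that the very same couplings can sustain both an operational and a broken-down stationary state, depending on the history of the system.

```python
import random
import statistics

def simulate(n_proc=200, j0=8.0, theta=2.5, steps=100, start_down=False, seed=0):
    """Parallel update of n_i(t+dt) = Theta(sum_j J_ij n_j(t) - theta + xi_i(t))
    with homogeneous couplings J_ij = j0/n_proc and unit-variance Gaussian noise.
    Returns the mean failed fraction over the second half of the run."""
    rng = random.Random(seed)
    n = [1] * n_proc if start_down else [0] * n_proc
    fractions = []
    for _ in range(steps):
        support = j0 * sum(n) / n_proc   # homogeneous input sum_j J_ij n_j(t)
        n = [1 if support - theta + rng.gauss(0.0, 1.0) > 0 else 0
             for _ in range(n_proc)]
        fractions.append(sum(n) / n_proc)
    return statistics.mean(fractions[steps // 2:])

# same couplings, different initial conditions:
print(simulate(start_down=False))  # stays near the a-priori rate Phi(-theta)
print(simulate(start_down=True))   # stays near 1: a self-sustained breakdown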
the unconditional and conditional failure probabilities. Consider a situation with all processes working at time-step t. The probability that process i will break down in the next time-step, referred to as p_i, is given by integrating Eq. (6.2) over the noise \xi_i(t). Thus

p_i = \Phi(-\vartheta_i) ,    (6.3)

where

\Phi(x) = \frac{1}{2}\Big[1 + \mathrm{erf}\Big(\frac{x}{\sqrt{2}}\Big)\Big]

is the (cumulative) probability distribution of the unit-variance Gaussian. Similarly, defining p_{i|j} as the probability that process i will break down in the next step, given that currently process j is broken down while all others are working, gives us

p_{i|j} = \Phi(J_{ij} - \vartheta_i) .    (6.4)

These relations may be inverted to obtain expressions for the model parameters in terms of the failure probabilities,

\vartheta_i = -\Phi^{-1}(p_i) ,\qquad J_{ij} = \Phi^{-1}(p_{i|j}) - \Phi^{-1}(p_i) .    (6.5)

Note that we could also omit the rescaling to unit-variance noise and keep the structure of the above relations. The only difference would be that the probability distribution \Phi(x) of the unit-variance Gaussian is replaced by its variance-\sigma^2 counterpart, \Phi_{\sigma^2}(x) = \frac{1}{2}\big[1 + \mathrm{erf}\big(x/\sqrt{2\sigma^2}\big)\big].
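The calibration (6.5) is easy to implement. The sketch below uses only the standard library (math.erf plus a bisection inverse for \Phi); the probability values are made up purely for illustration.

```python
import math

def Phi(x):
    """CDF of the unit-variance Gaussian, Phi(x) = (1 + erf(x/sqrt(2)))/2, eq. (6.3)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    """Invert Phi by bisection (p assumed well inside (0, 1))."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# illustrative (made-up) unconditional and conditional failure probabilities
p_i, p_ij = 0.01, 0.20
theta_i = -Phi_inv(p_i)                  # eq. (6.5)
J_ij = Phi_inv(p_ij) - Phi_inv(p_i)      # eq. (6.5)

# round trip: eqs. (6.3) and (6.4) recover the input probabilities
assert abs(Phi(-theta_i) - p_i) < 1e-9
assert abs(Phi(J_ij - theta_i) - p_ij) < 1e-9
print(theta_i, J_ij)
```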
For general sets of unconditional and conditional failure probabilities, this model is not analytically tractable. However, one can investigate systems for which the set of these parameters is specified in a probabilistic fashion.
6.2
A Statistical Mechanics Approach
In the spirit of Hopfield's modelling of neural networks we will introduce assumptions (unrealistic ones!) that will allow us to investigate the system using equilibrium statistical mechanics. We will assume that the J_{ij} are symmetric, and will replace the Gaussian noise with thermal noise with probability distribution

\Phi_\beta(x) = \frac{1}{2}\Big[1 + \tanh\Big(\frac{\beta x}{2}\Big)\Big] .    (6.6)

These assumptions entail that the stochastic dynamics will satisfy detailed balance with the Gibbs-Boltzmann equilibrium distribution generated by

H(n) = -\frac{1}{2}\sum_{i \ne j} J_{ij} n_i n_j + \sum_i \vartheta_i n_i .
We will assume a stochastic setting for the J_{ij}, assuming that the interaction patterns of processes form a random network with average connectivity c, and taking the J_{ij} of the form

J_{ij} = c_{ij}\Big(\frac{J_0}{c} + \frac{J}{\sqrt c}\, x_{ij}\Big) ,    (6.7)

where c_{ij} = c_{ji} \in \{0,1\} denotes absence/presence of an interaction, distributed according to

P(c_{ij}) = \Big(1 - \frac{c}{N}\Big)\,\delta_{c_{ij},0} + \frac{c}{N}\,\delta_{c_{ij},1} ,\qquad i < j ,    (6.8)

where J_0 parameterizes the mean strength of the interactions, and where the x_{ij} are symmetric (but otherwise independent) quenched random variables of mean zero and unit variance,

\langle x_{ij} \rangle = 0 ,\qquad \langle x_{ij}^2 \rangle = 1 .    (6.9)

Thus J^2 parameterizes the variance of the interaction strengths. The scalings with the mean connectivity c are introduced in such a way as to allow investigation of the extensive but sparse connectivity limit c \to \infty, N \to \infty, with c/N \to 0 (as an approximation to the large-N, large-c, small-c/N situation).
6.2.1
Replica Analysis
We need to calculate the average of the free energy of this system over the distribution of the disorder (the c_{ij} and the x_{ij}), and in principle also over the site-random variables \vartheta_i; we will find that the free energy is self-averaging w.r.t. the latter. I.e., we need to evaluate

-\beta f(\beta) = \frac{1}{N}\langle \ln Z \rangle ,

where the average is over the bond disorder. To compute the average of the logarithm of the partition function, we use the replica identity

\langle \ln Z \rangle = \lim_{n \to 0} \frac{1}{n}\ln \langle Z^n \rangle = \frac{d}{dn}\ln \langle Z^n \rangle\Big|_{n=0} .    (6.10)

For integer n, Z^n is the partition function of n independent identical copies (replica) of the system. The replica method consists in first evaluating \langle Z^n \rangle for integer n and then analytically continuing the result to real n and taking the n \to 0 limit as required.
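A quick numerical sanity check of the replica identity (6.10) on a toy ensemble (here Z = e^g with g standard normal; this toy has nothing to do with the model's actual partition function): \langle \ln Z \rangle is compared with (1/n)\ln\langle Z^n \rangle at small n, and with the annealed value \ln\langle Z \rangle for contrast.

```python
import math
import random

rng = random.Random(1)
samples = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(200_000)]  # toy "Z"

quenched = sum(math.log(z) for z in samples) / len(samples)   # <ln Z>

n = 1e-3  # small replica number: (1/n) ln <Z^n> -> <ln Z> as n -> 0
replica = math.log(sum(z**n for z in samples) / len(samples)) / n

annealed = math.log(sum(samples) / len(samples))              # ln <Z>, i.e. n = 1

print(quenched, replica, annealed)
```

For this lognormal toy the exact values are \langle \ln Z \rangle = 0 and \ln\langle Z \rangle = 1/2, so the small-n replica estimate tracks the quenched average while the annealed one differs markedly — the distinction made in Section 2.3.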
Thus we first have to evaluate

\langle Z^n \rangle = \sum_{\{n_a\}} \Big\langle \exp\Big\{\beta\sum_a \sum_{i<j} J_{ij}\, n_{ia} n_{ja} - \beta\sum_{ia}\vartheta_i n_{ia}\Big\}\Big\rangle .

Here a = 1, \dots, n enumerates the replica. The bond disorder average factors w.r.t. the individual bonds (i,j), thus — inserting the specific form of the J_{ij} — we get

\langle Z^n \rangle = \sum_{\{n_a\}} \Big\langle \prod_{i<j}\exp\Big\{\beta c_{ij}\Big(\frac{J_0}{c} + \frac{J x_{ij}}{\sqrt c}\Big)\sum_a n_{ia} n_{ja}\Big\}\Big\rangle \times \exp\Big\{-\beta\sum_{ia}\vartheta_i n_{ia}\Big\}
To proceed we exploit the fact that we are interested in the large-c limit, so we can expand the first exponential, giving

\langle Z^n \rangle = \sum_{\{n_a\}} \Big\langle \prod_{i<j}\Big[1 + \beta c_{ij}\Big(\frac{J_0}{c} + \frac{J x_{ij}}{\sqrt c}\Big)\sum_a n_{ia} n_{ja} + \frac{(\beta J)^2}{2c}\, c_{ij}\, x_{ij}^2 \Big(\sum_a n_{ia} n_{ja}\Big)^2 + \dots\Big]\Big\rangle \times \exp\Big\{-\beta\sum_{ia}\vartheta_i n_{ia}\Big\}

Assuming that all moments of the x_{ij} exist, the terms not written out explicitly are o(c^{-1}), and can be omitted. Using (6.9), and

\langle c_{ij} \rangle = \frac{c}{N} ,

and re-exponentiating 1 + \frac{a}{N} = \exp(a/N) + O(N^{-2}), we have

\langle Z^n \rangle = \sum_{\{n_a\}} \exp\Big\{\frac{\beta J_0}{2N}\sum_a \sum_{i,j} n_{ia} n_{ja} + \frac{(\beta J)^2}{4N}\sum_{i,j}\Big(\sum_a n_{ia} n_{ja}\Big)^2 - \beta\sum_{ia}\vartheta_i n_{ia}\Big\}
This is now the partition function of a homogeneous system of N \times n degrees of freedom \{n_{ia}\}, including interactions between sites, and between replica (although before averaging the replica were non-interacting, i.e. independent). Contributions O(1) in the exponential coming from including self-interactions have been suppressed in this expression. Exercise: Specify the neglected terms. Note that the interactions can be entirely described in terms of two types of macroscopic order parameters, the replicated fractions of failed processes

m_a = \frac{1}{N}\sum_i n_{ia} ,\qquad 1 \le a \le n ,    (6.11)

and the Edwards-Anderson matrix of replica overlaps

q_{ab} = \frac{1}{N}\sum_i n_{ia} n_{ib} ,\qquad 1 \le a, b \le n ,    (6.12)
in terms of which

\langle Z^n \rangle = \sum_{\{n_a\}} \exp\Big\{N\Big[\frac{\beta J_0}{2}\sum_a m_a^2 + \frac{(\beta J)^2}{4}\sum_{a,b} q_{ab}^2\Big] - \beta\sum_{ia}\vartheta_i n_{ia}\Big\}    (6.13)

Coupling between sites is due to having squares of the above order parameters (generalizing the CW situation). We proceed using our favourite method (in the present situation, both Gaussian linearization and the \delta-function method would work). Using the latter, we get

\langle Z^n \rangle = \sum_{\{n_a\}} \int \prod_a \frac{dm_a\, d\hat m_a}{2\pi/N} \prod_{a,b} \frac{dq_{ab}\, d\hat q_{ab}}{4\pi/N} \exp\Big\{N\Big[\frac{\beta J_0}{2}\sum_a m_a^2 + \frac{(\beta J)^2}{4}\sum_{a,b} q_{ab}^2 - i\sum_a \hat m_a m_a - \frac{i}{2}\sum_{a,b}\hat q_{ab} q_{ab}\Big] + i\sum_{ia}\hat m_a n_{ia} + \frac{i}{2}\sum_{a,b}\hat q_{ab}\sum_i n_{ia} n_{ib} - \beta\sum_{ia}\vartheta_i n_{ia}\Big\}

The partition function now factors w.r.t. the site indices, though not w.r.t. the replica indices (i.e. replica are interacting through the as yet unknown i\hat q_{ab} replica interaction parameters). Exploiting site-factorization, we can rewrite the above as
\langle Z^n \rangle = \int \prod_a \frac{dm_a\, d\hat m_a}{2\pi/N} \prod_{a,b} \frac{dq_{ab}\, d\hat q_{ab}}{4\pi/N} \exp\Big\{N\Big[\frac{\beta J_0}{2}\sum_a m_a^2 + \frac{(\beta J)^2}{4}\sum_{a,b} q_{ab}^2 - i\sum_a \hat m_a m_a - \frac{i}{2}\sum_{a,b}\hat q_{ab} q_{ab} + \frac{1}{N}\sum_i \ln Z_{\rm eff}^{(i)}\Big]\Big\} .    (6.14)

Here Z_{\rm eff}^{(i)} is a single-site replica-partition function (the \{n_a\} sum extending over replicated indicator-variable configurations at a single site),

Z_{\rm eff}^{(i)} = \sum_{\{n_a\}} \exp\big\{-\beta H_{\rm eff}^{(i)}(\{n_a\})\big\}    (6.15)

with an effective single-site Hamiltonian given by

-\beta H_{\rm eff}^{(i)} = i\sum_a \hat m_a n_a + \frac{i}{2}\sum_{a,b}\hat q_{ab}\, n_a n_b - \beta\vartheta_i \sum_a n_a .    (6.16)

The i-dependence of Z_{\rm eff}^{(i)} is solely due to the \vartheta_i. By the LLN the normalized sum over the \ln Z_{\rm eff}^{(i)} in (6.14) converges to the expectation over the \vartheta distribution, so that we finally have

\langle Z^n \rangle = \int \prod_a \frac{dm_a\, d\hat m_a}{2\pi/N} \prod_{a,b} \frac{dq_{ab}\, d\hat q_{ab}}{4\pi/N} \exp\Big\{N\Big[\frac{\beta J_0}{2}\sum_a m_a^2 + \frac{(\beta J)^2}{4}\sum_{a,b} q_{ab}^2 - i\sum_a \hat m_a m_a - \frac{i}{2}\sum_{a,b}\hat q_{ab} q_{ab} + \langle \ln Z_{\rm eff} \rangle_\vartheta\Big]\Big\}    (6.17)
This is now in a form that can be evaluated by the saddle-point method. Stationarity of the exponential w.r.t. the m_a and the q_{ab} requires

i\hat m_a = \beta J_0\, m_a ,\qquad i\hat q_{ab} = (\beta J)^2\, q_{ab} .    (6.18)

Stationarity w.r.t. variations of the \hat m_a and the \hat q_{ab} requires

m_a = \big\langle \langle n_a \rangle \big\rangle_\vartheta ,\qquad q_{ab} = \big\langle \langle n_a n_b \rangle \big\rangle_\vartheta ,    (6.19)

in which the inner angled bracket is an average w.r.t. the Gibbs-Boltzmann distribution generated by the effective Hamiltonian that defines Z_{\rm eff}. If we insert the expressions for i\hat m_a and i\hat q_{ab} that we obtain from the first set of FPEs, we have

H_{\rm eff} = -J_0\sum_a m_a n_a - \frac{\beta J^2}{2}\sum_{a,b} q_{ab}\, n_a n_b + \vartheta\sum_a n_a    (6.20)
for this effective single-site Hamiltonian at given \vartheta. In terms of the solution of the FPEs, the free energy per node is given by

-\beta f(\beta) = \lim_{N \to \infty}\lim_{n \to 0}\frac{1}{Nn}\ln\langle Z^n \rangle .    (6.21)

Note that by evaluating \langle Z^n \rangle via a saddle-point method at given n, we are effectively taking the N \to \infty rather than the n \to 0 limit first, as strictly required. If we do (inserting the values of the conjugate variables), we get from (6.17) that

-\beta f(\beta) = \lim_{n \to 0}\frac{1}{n}\Big\{-\frac{\beta J_0}{2}\sum_a m_a^2 - \frac{(\beta J)^2}{4}\sum_{a,b} q_{ab}^2 + \langle \ln Z_{\rm eff} \rangle_\vartheta\Big\}    (6.22)

The problem that remains to be tackled is to solve the system of n(n+3)/2 FPEs (6.19), and take the required n \to 0 limit. This is not generally possible, due to the replica interactions mediated by the q_{ab}. One solves this problem by resorting to assumptions concerning the transformation properties of the m_a and the q_{ab} under permutation of the n replica. The simplest is the assumption of replica symmetry.
6.2.2
Replica Symmetric Solution
The replica symmetric (RS) ansatz is motivated by the observation that the replica are all identical and equivalent, so the solution should not break the symmetry between them. It is now known that this conclusion is hasty; the interaction between replica can lead to spontaneous breaking of the permutation symmetry among the replica (analogous to the breaking of the global spin-flip symmetry of the zero-field CW model in the low-temperature solution!). Nevertheless the RS solution often does give a good qualitative first impression of the true behaviour. In the case of OR it is actually exact in the region of interest (processes mutually supporting each other, so low degrees of frustration). The RS ansatz consists in assuming that

m_a = q_{aa} = m ,\quad 1 \le a \le n ,\qquad q_{ab} = q ,\quad 1 \le a \ne b \le n .    (6.23)

We note that the equality m_a = q_{aa} is a consequence of the fact that for the 0-1 variables n_a we have n_a^2 = n_a. The RS version of H_{\rm eff} is then

H_{\rm eff} = -\Big(J_0 m + \frac{\beta J^2}{2}(m-q)\Big)\sum_a n_a - \frac{\beta J^2}{2}\, q \Big(\sum_a n_a\Big)^2 + \vartheta\sum_a n_a    (6.24)
Thus the coupling between replica is only via a complete square. In evaluating Gibbs averages of the form

\langle \dots \rangle = \frac{\sum_{\{n_a\}} (\dots)\, \exp\{-\beta H_{\rm eff}\}}{\sum_{\{n_a\}} \exp\{-\beta H_{\rm eff}\}}

we can therefore use Gaussian linearization to decouple the replica that were coupled via a complete square. Introducing the abbreviation Dz = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} for the unit-variance Gaussian measure, we then have

\langle \dots \rangle = \frac{\sum_{\{n_a\}} \int Dz\, (\dots)\, \exp\{-\beta\sum_a H_{\rm RS}(n_a)\}}{\sum_{\{n_a\}} \int Dz\, \exp\{-\beta\sum_a H_{\rm RS}(n_a)\}}    (6.25)

in which

H_{\rm RS}(n_a) = -\Big(J_0 m + \frac{\beta J^2}{2}(m-q) + J\sqrt{q}\, z - \vartheta\Big) n_a \equiv -h_{\rm RS}(z)\, n_a    (6.26)

depends only on a single n_a. This Gaussian linearization has uncoupled the replica. The denominator in (6.25), i.e. Z_{\rm eff}, then gives

Z_{\rm eff} = \sum_{\{n_a\}} \int Dz\, \exp\Big\{-\beta\sum_a H_{\rm RS}(n_a)\Big\} = \int Dz\, Z_{\rm RS}(z)^n    (6.27)

with

Z_{\rm RS}(z) = 1 + \exp\{\beta h_{\rm RS}(z)\} .    (6.28)

The numerator in (6.25) depends of course on the average we are evaluating. If we are interested in the expectation of a single n_a, we can write

\langle n_a \rangle = \frac{\int Dz\, \big[\sum_{n_a} n_a\, e^{\beta h_{\rm RS}(z) n_a}\big]\, Z_{\rm RS}(z)^{n-1}}{\int Dz\, Z_{\rm RS}(z)^n} = \frac{\int Dz\, Z_{\rm RS}(z)^n\, \frac{\sum_{n_a} n_a\, e^{\beta h_{\rm RS}(z) n_a}}{Z_{\rm RS}(z)}}{\int Dz\, Z_{\rm RS}(z)^n} .

Noting finally that Z_{\rm RS}(z)^n \to 1 in the relevant n \to 0 limit, we can write this as

\langle n_a \rangle = \int Dz\, \frac{\exp\{\beta h_{\rm RS}(z)\}}{1 + \exp\{\beta h_{\rm RS}(z)\}} = \int Dz\, \Phi_\beta(h_{\rm RS}(z))    (6.29)

with \Phi_\beta defined in (6.6). If we are interested in the expectation of a product of two indicator variables from different replica, we proceed similarly. Investing the knowledge that Z_{\rm RS}(z)^n \to 1, we get

\langle n_a n_b \rangle = \int Dz\, \frac{\sum_{n_a} n_a\, e^{\beta h_{\rm RS}(z) n_a}}{Z_{\rm RS}(z)}\, \frac{\sum_{n_b} n_b\, e^{\beta h_{\rm RS}(z) n_b}}{Z_{\rm RS}(z)} = \int Dz\, \Phi_\beta(h_{\rm RS}(z))^2

in the n \to 0 limit.
Summarizing: the n \to 0 limit of the RS FPEs takes the simple form

m = \Big\langle \int Dz\, \Phi_\beta(h_{\rm RS}(z)) \Big\rangle_\vartheta    (6.30)
q = \Big\langle \int Dz\, \Phi_\beta^2(h_{\rm RS}(z)) \Big\rangle_\vartheta    (6.31)

with a Gaussian local field

h_{\rm RS}(z) = J_0 m + \frac{\beta J^2}{2}(m-q) + J\sqrt{q}\, z - \vartheta .    (6.32)

The free energy is obtained by evaluating (6.22) for the replica symmetric solutions of the order parameters. This gives

-\beta f_{\rm RS}(\beta) = \lim_{n \to 0}\frac{1}{n}\Big\{-\frac{\beta J_0}{2}\, n m^2 - \frac{(\beta J)^2}{4}\big[n m^2 + (n^2 - n) q^2\big] + \Big\langle \ln \int Dz\, Z_{\rm RS}(z)^n \Big\rangle_\vartheta\Big\}

The only non-trivial limit is that involving the last contribution. It is evaluated by expanding Z_{\rm RS}(z)^n = 1 + n\ln Z_{\rm RS}(z) + O(n^2), by noting that \int Dz = 1, and then by expanding \ln\big[1 + n\int Dz\, \ln Z_{\rm RS}(z) + O(n^2)\big] = n\int Dz\, \ln Z_{\rm RS}(z) + O(n^2). The O(n^2) terms don't survive the n \to 0 limit, so the final result is

-\beta f_{\rm RS}(\beta) = -\frac{\beta J_0}{2}\, m^2 - \frac{(\beta J)^2}{4}\,(m^2 - q^2) + \int Dz\, \langle \ln Z_{\rm RS}(z) \rangle_\vartheta    (6.33)

with

Z_{\rm RS}(z) = 1 + \exp\Big\{\beta\Big(J_0 m + \frac{\beta J^2}{2}(m-q) + J\sqrt{q}\, z - \vartheta\Big)\Big\}    (6.34)
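For constant \vartheta the RS equations (6.30)-(6.32) can be solved numerically by damped iteration, with the Gaussian measure Dz approximated on a simple grid. The parameter values below (\beta = \pi/\sqrt{3}, J_0 = 2, J = 0.2, \vartheta = 2.5, so that p = \Phi_\beta(-\vartheta) \approx 0.01) echo those used in the figures of this chapter, but the code is a sketch only.

```python
import math

def Phi_beta(x, beta):
    """Thermal noise distribution (6.6)."""
    return 0.5 * (1.0 + math.tanh(0.5 * beta * x))

def solve_rs(beta, j0, j, theta, m=0.0, q=0.0, iters=2000):
    """Damped iteration of the RS equations (6.30)-(6.32) for constant theta.
    The z-integral over the unit Gaussian is a plain Riemann sum on [-6, 6]."""
    zs = [-6.0 + 12.0 * k / 400 for k in range(401)]
    w = [math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * (12.0 / 400) for z in zs]
    for _ in range(iters):
        h = [j0 * m + 0.5 * beta * j * j * (m - q) + j * math.sqrt(max(q, 0.0)) * z - theta
             for z in zs]
        m_new = sum(wi * Phi_beta(hi, beta) for wi, hi in zip(w, h))
        q_new = sum(wi * Phi_beta(hi, beta) ** 2 for wi, hi in zip(w, h))
        m, q = 0.5 * m + 0.5 * m_new, 0.5 * q + 0.5 * q_new
    return m, q

beta = math.pi / math.sqrt(3.0)
m, q = solve_rs(beta, j0=2.0, j=0.2, theta=2.5)
print(m, q)  # a small failed fraction: the operational (O) phase
```

Since 0 \le \Phi_\beta \le 1, one always has q \le m, and for these parameters the iteration settles into the low-m operational solution.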
Exercise: Show that the replica symmetric fixed point equations can be recovered directly by requiring that f_{\rm RS}(\beta) is stationary w.r.t. variations of m and q.

Exercise: Show explicitly that for a collection \{n_{a_\nu}\} from k different replica we have

\Big\langle \prod_{\nu=1}^k n_{a_\nu} \Big\rangle = \int Dz\, \Phi_\beta(h_{\rm RS}(z))^k

in the n \to 0 limit, where the expectation is the Gibbs average generated by the effective replicated single-site Hamiltonian H_{\rm eff}.
6.2.3
Results
The macroscopic properties of the system are determined by the \vartheta distribution specifying the range of a-priori failure probabilities, the parameter J_0 defining the average mutual dependency between processes, and the parameter J which quantifies the variance of the mutual dependencies across the network. In order to have predominantly cooperative interactions, one needs to be in the small-J regime. One expects that for low a-priori failure probabilities and low mutual dependencies the system will typically be in the operational phase (O). If at given a-priori failure probability the mutual dependence (J_0) between processes is increased, there will be a point where spontaneous individual failures can trigger an avalanche of process failures. A heuristic analysis starts from the FPEs,

m = \Big\langle \int Dz\, \Phi_\beta(h_{\rm RS}(z)) \Big\rangle_\vartheta
q = \Big\langle \int Dz\, \Phi_\beta^2(h_{\rm RS}(z)) \Big\rangle_\vartheta

with a Gaussian local field

h_{\rm RS}(z) = J_0 m + \frac{\beta J^2}{2}(m-q) + J\sqrt{q}\, z - \vartheta .    (6.35)
One can attempt a graphical solution, with \Phi_\beta(x) = \frac{1}{2}\big[1 + \tanh(\beta x/2)\big], starting from the J = 0, constant-\vartheta limit.
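In the J = 0, constant-\vartheta limit the FPE collapses to the scalar equation m = \Phi_\beta(J_0 m - \vartheta), which can also be solved by simple iteration from different starting points; the sketch below uses the values \beta = \pi/\sqrt{3} and \vartheta = 2.5 from the figure caption.

```python
import math

def Phi_beta(x, beta):
    """Thermal noise distribution (6.6)."""
    return 0.5 * (1.0 + math.tanh(0.5 * beta * x))

def fixed_point(j0, theta, beta, m0, iters=500):
    """Iterate m -> Phi_beta(j0*m - theta), the J = 0 limit of the RS FPE."""
    m = m0
    for _ in range(iters):
        m = Phi_beta(j0 * m - theta, beta)
    return m

beta = math.pi / math.sqrt(3.0)
for j0 in (2.0, 5.0):
    lo = fixed_point(j0, 2.5, beta, m0=0.0)   # start from all-operational
    hi = fixed_point(j0, 2.5, beta, m0=1.0)   # start from all-broken-down
    print(j0, lo, hi)
```

For J_0 = 2 both starts converge to the same low-m (O) solution; for J_0 = 5 a high-m broken-down (B) solution coexists with the operational one, exactly the structure seen in the graphical analysis.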
Figure 6.1: Graphical solution of the FPEs in the J = 0 limit with \vartheta = 2.5 \Leftrightarrow p_i \simeq 0.01. Shown are the cases J_0 = 2, which has only the low-m (O) phase, and J_0 = 5, for which the low-m operational phase coexists with a high-m broken-down phase (B).

The graphical analysis for J = 0 reveals the existence of an operational phase (O), phases (C) where the operational and broken-down phases coexist, and a broken-down phase (B) at sufficiently large J_0. There will be hysteresis between the (O) and (B) phases. In the coexistence regime both the (O) and the (B) phase are locally stable; which one is absolutely stable, and which only meta-stable, will be decided by a comparison of free energies. One can convince oneself that non-zero J will tend to smoothen the effective \Phi_\beta curve and lead to a slight shift of the inflection point; another source of smoothing would be non-constant \vartheta. Both would lead to slight deformations of phase boundaries, but the basic conclusions would be unaltered.

Figure 6.2 shows the full solution for non-zero J — comparing with a solution using non-thermal Gaussian noise — and the phase diagram. Note that the Gaussian system will not exhibit thermal equilibrium despite symmetry of the interactions, so other techniques (dynamics) are needed. To compare the two cases, one has to match the thermal noise parameter to reproduce properties of the unit-variance Gaussian noise. Reproducing the variance requires \beta = \pi/\sqrt{3}; but other measures could be taken, and would lead to different scalings of \beta.

Figure 6.2: Left: Comparison between stationary values of m for symmetrically coupled networks with Gaussian noise (solid line) and networks with thermal noise (dotted line), as functions of J_0. The inverse temperature was \beta = \pi/\sqrt{3}, and the a-priori homogeneous failure probability is p = 0.01. Right: Phase diagram of the network with thermal noise. In producing both diagrams, we set J = 0.2.

Note the dangers of meta-stability from an OR point of view. Systems in a meta-stable (O) phase would prefer to be in the (B) phase. Breakdown is therefore bound to occur at some point in such systems, without dramatic external causes (bubble-nucleation at first-order phase transitions). Importantly, there is no visible sign of systems crossing from absolute stability to meta-stability! This is illustrated in figure 6.3 below. Perhaps worryingly, current trends across many industries point towards increasing dependency between processes (e.g. just-in-time production).
Figure 6.3: Left: Loss record for a system of 100 heterogeneously interacting processes for a situation where the operational phase is stable but coexists with a meta-stable broken down phase (red), and for a system where J0 is slightly increased to make the operational phase metastable. The fluctuations of the two loss records are very similar. Both systems survive for the time-horizon shown. Right: Extending the time-horizon reveals a spontaneous breakdown of the meta-stable process network at a slightly later time. Note that no big event is needed to trigger the breakdown!
Chapter 7
Random Matrix Theory

7.1 Eigenvalues and the Spectral Density
We want to learn about the eigenvalues of large matrices M. Consider real symmetric matrices. Eigenvalues are defined via

M x_\alpha = \lambda_\alpha x_\alpha ,\qquad \alpha = 1, \dots, N    (7.1)

where the \lambda_\alpha are the eigenvalues, and the x_\alpha (\ne 0!) are the eigenvectors. Eigenvalues can equivalently be defined as roots of the characteristic polynomial

p(\lambda) = \det(\lambda 1\!\!{\rm I} - M) = 0 .

For large matrices one is often interested in the limiting spectral density

\rho_M(\lambda) = \frac{1}{N}\sum_{\alpha=1}^N \delta(\lambda - \lambda_\alpha)    (7.2)

as the matrix dimension N becomes large.

Example Coupled oscillators on a closed ring,

m\,\ddot x_\ell = -K\big[(x_\ell - x_{\ell-1}) + (x_\ell - x_{\ell+1})\big] ,\qquad \ell = 1, \dots, N .

Solving by x_\ell(t) = e^{i\omega t} v_\ell requires

\omega^2 v_\ell = \sum_{\ell'} M_{\ell\ell'} v_{\ell'} ,\qquad \text{with}\qquad M_{\ell\ell'} = \frac{K}{m}\big[2\delta_{\ell\ell'} - \delta_{\ell+1,\ell'} - \delta_{\ell-1,\ell'}\big] ,

thus to solve an eigenvalue equation for M. The difference structure entails that eigenvectors v = (v_\ell) are of the form v_\ell = e^{-ik\ell}. Periodic boundary conditions require e^{ikN} = 1, hence k = k_\mu = \frac{2\pi}{N}\mu, \mu = 1, \dots, N. Inserting into the eigenvalue equation gives

\lambda(k) = \omega(k)^2 = 2\frac{K}{m}\big[1 - \cos(k)\big] .

The simplest way to compute the eigenvalue density is to note that the eigenvalues \lambda(k) are uniformly spaced in k-space, with k \in (0, 2\pi]. Denoting by \rho(k) = \frac{1}{2\pi} the (normalized) density of k-values, we have

\rho(\lambda) = \rho(k)\,\Big|\frac{dk}{d\lambda}\Big| = \frac{m}{4\pi K}\big(1 - \cos^2(k)\big)^{-1/2}

per branch; since the two branches k and 2\pi - k contribute the same \lambda, the density acquires a factor of 2, and upon inserting \cos(k) = 1 - \frac{m}{2K}\lambda this gives

\rho(\lambda) = \frac{m}{2\pi K}\,\frac{1}{\sqrt{1 - \big(1 - \frac{m}{2K}\lambda\big)^2}} ,\qquad 0 \le \lambda \le \frac{4K}{m} .    (7.3)

This spectral density has integrable singularities at its edges, commonly referred to as van Hove singularities.
Figure 7.1: Spectral density of the harmonic chain as discussed for K = m/2.
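The plane-wave solution above can be verified directly on a small ring (a stdlib-only sketch; the choices N = 12 and K = m/2 are arbitrary, the latter matching the figure): every v_\ell = e^{-ik_\mu \ell} should be an eigenvector of M with eigenvalue \lambda(k_\mu) = 2(K/m)(1 - \cos k_\mu).

```python
import cmath
import math

N, K_over_m = 12, 0.5   # K = m/2, as in the figure

# ring matrix M_{ll'} = (K/m)[2 delta_{ll'} - delta_{l+1,l'} - delta_{l-1,l'}], periodic
M = [[K_over_m * (2 * (i == j) - (j == (i + 1) % N) - (j == (i - 1) % N))
      for j in range(N)] for i in range(N)]

for mu in range(N):
    k = 2 * math.pi * mu / N
    lam = 2 * K_over_m * (1 - math.cos(k))          # lambda(k) from the text
    v = [cmath.exp(-1j * k * l) for l in range(N)]  # plane-wave eigenvector
    Mv = [sum(M[i][j] * v[j] for j in range(N)) for i in range(N)]
    assert all(abs(Mv[i] - lam * v[i]) < 1e-12 for i in range(N))
print("all", N, "plane waves are eigenvectors")
```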
7.2
The Random Matrix Problem
The random matrix problem deals with generalizing considerations of the type above to situations where, for instance, force constants vary randomly across the system, so that elements of the matrices involved are random.

Other problems where spectra of random matrices are encountered and used are the analysis of random graphs, random Schrödinger operators (describing disordered conductors), random walks/diffusion on networks (Google page rank), energy levels of highly excited nuclei, quantum chaos, finance, and so on. We will here look at the problem of computing spectral densities of sparse symmetric random matrices of the form

M_{ij} = c_{ij} K_{ij}    (7.4)

where c_{ij} \in \{0,1\} denotes presence or absence of a non-zero matrix element, and K_{ij} its weight, if present. The term sparse refers to the limit in which the number of non-zero matrix elements in each row/column remains finite even in the limit of infinite matrix dimension. Such matrices describe (weighted) random graphs. If the c_{ij} are chosen independently (for i < j), the resulting graphs are so-called Erdős-Rényi random graphs.
7.2.1
Representation of the δ-function
The spectral density is typically computed using a representation of the δ-function as the zero-width limit of a Lorentz distribution:

δ(λ − λ_α) = lim_{ε↘0} (1/π) ε/((λ − λ_α)² + ε²) = lim_{ε↘0} (1/π) Im 1/(λ − iε − λ_α)    (7.5)
Thus if v_α is a normalized eigenvector of M corresponding to the eigenvalue λ_α, we can appeal to the spectral theorem¹ to express this as

δ(λ − λ_α) = lim_{ε↘0} (1/π) Im ( v_α , [λ_ε 1I − M]^{−1} v_α ) .
Here λε = λ − iε, and (·, ·) denotes a scalar product.
Hence the overall spectral density is
ρ_M(λ) = lim_{ε↘0} (1/πN) Im Σ_α ( v_α , [λ_ε 1I − M]^{−1} v_α ) = lim_{ε↘0} (1/πN) Im Tr [λ_ε 1I − M]^{−1}
¹ A brief reminder: the spectral theorem for symmetric or self-adjoint matrices says: if v_α is an eigenvector of M corresponding to eigenvalue λ_α, i.e. if M v_α = λ_α v_α, then for any function f(M) we have f(M) v_α = f(λ_α) v_α, provided the r.h.s. exists.
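The trace formula above can be checked numerically on any small symmetric matrix: with the eigenvalues in hand, Im Tr [λ_ε 1I − M]^{−1} is a sum of Lorentzians of width ε centred at the λ_α. A minimal sketch (the test ensemble and the value of ε are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Any symmetric test matrix will do; here a scaled Gaussian one (illustrative)
N = 200
A = rng.normal(size=(N, N))
M = (A + A.T) / np.sqrt(2 * N)

lam_alpha = np.linalg.eigvalsh(M)
eps = 0.05                                    # Lorentzian broadening

def rho(lam):
    """(1/(pi N)) Im Tr [lam_eps 1 - M]^{-1}, evaluated via the eigenvalues:
    a sum of Lorentzians of width eps centred at the lam_alpha."""
    return np.mean(eps / ((lam - lam_alpha) ** 2 + eps ** 2)) / np.pi

# The broadened density is positive and integrates to approximately 1
grid = np.linspace(-4.0, 4.0, 1601)
vals = np.array([rho(x) for x in grid])
norm = vals.sum() * (grid[1] - grid[0])
print(norm)
```

The normalization deviates from 1 only by the Lorentzian tail mass outside the integration window, which vanishes as ε ↘ 0.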
The matrix-valued function

G(z) = [z 1I − M]^{−1}

is known as the resolvent of the matrix M. One can show that it exists and has finite norm for all z not in the spectrum of M. To exploit methods and heuristics of statistical physics, one uses

[λ_ε 1I − M]^{−1} = ∂/∂λ ln[λ_ε 1I − M]

and

Tr ln A = ln det A = −2 ln [det A]^{−1/2}

and the fact that [det A]^{−1/2} can be expressed in terms of a Gaussian integral. Summarizing, we have

ρ_M(λ) = lim_{ε↘0} − (2/πN) Im ∂/∂λ ln Z_M(λ_ε)    (7.6)

where

Z_M(λ_ε) = ∫ Π_i dx_i/√(2π/i) exp{ − (i/2) Σ_ij (λ_ε δ_ij − M_ij) x_i x_j }    (7.7)
Note that Z_M is like a partition function of a system with continuous degrees of freedom (in fact a harmonically coupled system, as the potential energy is quadratic in the coordinates) at an imaginary value of the temperature. A non-zero value of ε ensures that the integral exists. The problem is thus to compute the free energy of a disordered harmonic system, and to evaluate a λ-derivative. To deal with the disorder average one can use the replica method. Alternatively, for sparse matrices, one can compute the partition function and the spectral density directly on a single large instance.
7.3
Cavity Approach to Sparse Matrices
Performing the λ-derivative in (7.6), (7.7) allows one to interpret the result as an 'average'

ρ_M(λ) = lim_{ε↘0} (1/πN) Σ_i Re ⟨x_i²⟩    (7.8)

over the normalized complex weight function

P_M(x) = e^{−i H_M(x)} / Z_M(λ_ε) .    (7.9)
It is formally like a Gibbs average w.r.t. the Hamiltonian

H_M(x) = (1/2) Σ_ij (λ_ε δ_ij − M_ij) x_i x_j    (7.10)
evaluated at imaginary inverse temperature β = i. The key point to note is that only the single-vertex marginals

P_i(x_i) = ∫ Π_{j(≠i)} dx_j P_M(x)

are needed in order to evaluate the variances ⟨x_i²⟩ appearing in (7.8). As in our construction of a mean-field approximation for Ising-like systems, we note that we can write the marginal as

P_i(x_i) = (e^{−iλ_ε x_i²/2} / Z_i) ∫ Π_{j∈∂i} dx_j e^{i x_i Σ_{j∈∂i} K_ij x_j} P^{(i)}(x_{∂i})    (7.11)

Here P^{(i)}(x_{∂i}) is the distribution of the degrees of freedom neighbouring on i in the so-called cavity graph, a graph from which vertex i has been removed, as shown in Figure 7.2.
Figure 7.2: Left: part of a (locally) tree-like graph G, showing the neighbourhood of vertex i. Right: removal of vertex i creates a so-called cavity graph G^{(i)}, in which degrees of freedom on the neighbouring vertices j, k, and l become uncorrelated.

If one assumes the original graph to be tree-like, then the only correlation of the x_{∂i} is induced through the central variable x_i, accounted for in (7.11) by the x_i Σ_{j∈∂i} K_ij x_j term in the exponential, so that the cavity distribution P^{(i)}(x_{∂i}) factors w.r.t. subtrees rooted in the nodes in ∂i,

P^{(i)}(x_{∂i}) = Π_{j∈∂i} P_j^{(i)}(x_j)    (7.12)
This factorization, which is exact on trees, is due to Bethe and Peierls. On sparse random graphs, where each node has finitely many neighbouring nodes, it is an approximation, which becomes exact in the thermodynamic limit N → ∞.
The reason for this is that factorization fails only due to the presence of loops in the system. Assuming every vertex in the graph to have on average c neighbours, the total number of nodes in the k-th coordination shell is O(c^k). These will generally be distinct as long as c^k ≪ N. Conversely, one may expect different subtrees rooted in the neighbours of i to connect to a common site, hence to form a loop, once c^k = O(N), i.e. k = O(ln N); typical loop lengths therefore diverge with system size. As correlations are expected to decay along paths, with decay constants O(1), effective factorization becomes asymptotically exact.
With this factorization, we have a simple recursion for the cavity distributions P_j^{(i)}(x_j), viz.

P_j^{(i)}(x_j) = (e^{−iλ_ε x_j²/2} / Z_j^{(i)}) ∫ Π_{ℓ∈∂j\i} dx_ℓ e^{i x_j Σ_{ℓ∈∂j\i} K_jℓ x_ℓ} Π_{ℓ∈∂j\i} P_ℓ^{(j)}(x_ℓ)    (7.13)

for all j and all i ∈ ∂j.
This set of recursions is clearly self-consistently solved by zero-mean Gaussians of the form

P_j^{(i)}(x_j) = (1/√(2π Δ_j^{(i)})) exp{ − x_j² / (2 Δ_j^{(i)}) }    (7.14)

Inserting this into (7.13) generates a recursion relation for the cavity variances Δ_j^{(i)},

Δ_j^{(i)} = 1 / ( iλ_ε + Σ_{ℓ∈∂j\i} K_jℓ² Δ_ℓ^{(j)} )    (7.15)
In a similar vein, and as a consequence, the marginals P_i(x_i) are Gaussian, with variance Δ_i given by

Δ_i = 1 / ( iλ_ε + Σ_{ℓ∈∂i} K_iℓ² Δ_ℓ^{(i)} )    (7.16)
The recursion relations for the cavity variances are solved iteratively on a computer, using random initial conditions; once the iteration has converged, the variances of the marginals are evaluated in terms of the cavity variances. In terms of these we have

ρ_M(λ) = lim_{ε↘0} (1/πN) Σ_i Re Δ_i    (7.17)

for the spectral density.
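The iterative scheme is easy to sketch in code. The following is a minimal illustration, not the implementation used for the figures below: it draws a single Erdős–Rényi instance (the values of N, c and the weight scale are illustrative), runs the cavity recursion (7.15) at fixed λ and ε, and evaluates (7.16), (7.17).

```python
import numpy as np

rng = np.random.default_rng(1)

# Single Erdos-Renyi instance with Gaussian weights (sizes are illustrative)
N, c = 500, 4
mask = np.triu(rng.random((N, N)) < c / N, k=1)
Kmat = np.zeros((N, N))
Kmat[mask] = rng.normal(scale=1 / np.sqrt(c), size=mask.sum())
Kmat = Kmat + Kmat.T

neigh = [np.flatnonzero(Kmat[i]) for i in range(N)]

def cavity_density(lam, eps=0.1, sweeps=60):
    """Spectral density at lam from the cavity recursion (7.15)-(7.17)."""
    z = 1j * lam + eps                       # i lambda_eps, lambda_eps = lam - i eps
    # cavity variances Delta_j^{(i)}, one per directed edge (j, i)
    Delta = {(j, i): 1.0 + 0.0j for j in range(N) for i in neigh[j]}
    for _ in range(sweeps):                  # iterate to a fixed point
        for (j, i) in Delta:
            s = sum(Kmat[j, l] ** 2 * Delta[(l, j)] for l in neigh[j] if l != i)
            Delta[(j, i)] = 1.0 / (z + s)
    # marginal variances (7.16) and the spectral density (7.17)
    D = [1.0 / (z + sum(Kmat[i, l] ** 2 * Delta[(l, i)] for l in neigh[i]))
         for i in range(N)]
    return np.mean(np.real(D)) / np.pi

rho0 = cavity_density(0.0)
print(rho0)
```

On this loopy finite instance the result agrees with the directly diagonalized, ε-broadened density to within a few percent; smaller values of ε resolve finer structure of the spectrum.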
If we represent the complex valued cavity variances Δ_ℓ^{(i)} as

Δ_ℓ^{(i)} = a_ℓ^{(i)} + i b_ℓ^{(i)}

we have

Re Δ_i = ( ε + Σ_{ℓ∈∂i} K_iℓ² a_ℓ^{(i)} ) / [ ( ε + Σ_{ℓ∈∂i} K_iℓ² a_ℓ^{(i)} )² + ( λ + Σ_{ℓ∈∂i} K_iℓ² b_ℓ^{(i)} )² ]

The ε → 0 limit is singular (in λ) on sites i on which Σ_{ℓ∈∂i} K_iℓ² a_ℓ^{(i)} = 0. A small non-zero ε must be kept in order to see contributions for

λ + Σ_{ℓ∈∂i} K_iℓ² b_ℓ^{(i)} ≃ 0

One can show that these give the spectral density of localized states (smoothed at the scale of ε, as keeping it amounts to replacing δ-functions by Lorentzians).

The ε → 0 limit is unproblematic for Σ_{ℓ∈∂i} K_iℓ² a_ℓ^{(i)} ≠ 0. In such a case the limit gives the contribution of extended states in the continuous spectrum to the spectral density.
7.3.1
Results
7.3.2
The Dense Limit and Wigner’s Semi-Circle
In the limit of large average connectivity c ≫ 1 we need to scale the non-zero matrix elements according to K_ij = J_ij/√c, with the J_ij of zero mean and variance J², in order to have a spectrum whose bulk lies in a bounded domain in R.

Comparing (7.15) and (7.16) we see that the cavity variances Δ_j^{(i)} and the variances Δ_j of the marginals differ only by a small amount, which vanishes as the c → ∞ limit is taken:

Δ_j^{(i)} = Δ_j + O(c^{−1})

Using this in (7.16) we have

Δ_i = 1 / ( iλ_ε + (1/c) Σ_{ℓ∈∂i} J_iℓ² Δ_ℓ )
The sum in the denominator is then evaluated by appeal to the law of large numbers (ignoring correlations between the J_iℓ² and the Δ_ℓ), giving

Δ_i = 1 / ( iλ_ε + (J²/c) Σ_{ℓ∈∂i} Δ_ℓ )

Defining

Δ = lim_{c→∞} (1/c) Σ_{ℓ∈∂i} Δ_ℓ
Figure 7.3: Spectral density of a Poisson random graph at c = 2 with K_ij ∼ N(0, 1/c), evaluated at ε = 10^{−300} (full line). Simulation result averaged over 500 matrices of dimension N = 2000 (dashed).

we see that Δ_i is asymptotically independent of i in the large-c limit, turning (7.16) into a self-consistency equation for Δ, viz.

Δ = 1 / ( iλ_ε + J² Δ )    (7.18)
This is a quadratic equation for Δ, which is solved by

Δ_{1,2} = − i λ_ε/(2J²) ± (1/(2J²)) √(4J² − λ_ε²)

Taking the limit ε → 0 we have

Re Δ_i = Re Δ = (1/(2J²)) √(4J² − λ²)  for |λ| ≤ 2J ,   Re Δ = 0  for |λ| > 2J ,

which when inserted into (7.17) finally gives

ρ(λ) = (1/(2πJ²)) √(4J² − λ²) ,   |λ| ≤ 2J .    (7.19)
Figure 7.4: Spectral density of a Poisson random graph at c = 2 with K_ij ∼ N(0, 1/c), evaluated at ε = 10^{−300} and ε = 10^{−3} (full lines). Simulation result averaged over 500 matrices of dimension N = 2000 (dashed); the simulation is indistinguishable from the ε = 10^{−3} curve.

This is Wigner's celebrated semi-circle result for the spectral density of non-sparse real symmetric Gaussian random matrices.
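The semi-circle law is easy to confirm numerically. The sketch below (matrix size, seed and bin count are arbitrary choices) diagonalizes one dense symmetric Gaussian matrix and compares the eigenvalue histogram with (7.19):

```python
import numpy as np

rng = np.random.default_rng(2)

# Dense symmetric Gaussian matrix; off-diagonal entries of variance J^2/N
N, J = 1000, 1.0
A = rng.normal(scale=J, size=(N, N))
M = (A + A.T) / np.sqrt(2 * N)           # bulk spectrum then lies in [-2J, 2J]

lam = np.linalg.eigvalsh(M)

def rho_wigner(x, J=1.0):
    """Semicircle law (7.19)."""
    return np.where(np.abs(x) <= 2 * J,
                    np.sqrt(np.maximum(4 * J ** 2 - x ** 2, 0.0)) / (2 * np.pi * J ** 2),
                    0.0)

# Compare the empirical histogram with the semicircle
hist, edges = np.histogram(lam, bins=40, range=(-2 * J, 2 * J), density=True)
centres = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - rho_wigner(centres))))
```

The residual discrepancy is dominated by finite-N statistical fluctuations in the bins, largest near the spectral edges.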
Figure 7.5: Spectral density of a Poisson random graph at c = 3 with K_ij = ±1/c, evaluated at ε = 10^{−300} (full red line) and ε = 10^{−3} (dashed green).
Figure 7.6: Spectral density of a Poisson random graph of size N = 50000 at c = 4 with K_ij ∼ N(0, 1/c), evaluated at ε = 10^{−300} (red full line), thermodynamic limit (black).
Figure 7.7: Spectral density of a Poisson random graph of size N = 50000 at c = 4 with K_ij ∼ N(0, 1/c), evaluated at ε = 10^{−300} (red full line), localized states (green dashed), thermodynamic limit extended states (black), thermodynamic total DOS (blue).
Chapter 8
Dynamics, Optimization and Statistical Mechanics

8.1 Optimization
We would like to find minima of complicated cost functions H(x) for x ∈ Ω, where Ω could be R^N, Z^N, or S_N. This can be particularly complicated if the configuration space Ω is discrete, in which case finding zeros of ∂_{x_i} H(x) is not an option. Another reason which might make this approach infeasible, even if available in principle, is that the dimension N of the configuration space may be exceedingly large. One might also wish to obtain suitable macroscopic characterizations of minimizing states x.
8.1.1
Example: Travelling Salesman Problem (TSP)
The TSP is about finding the shortest (cyclic) tour that visits N cities, where C_ij denotes the cost of travel from city i to city j. In this case

H = H(π) = Σ_{i=1}^{N} C_{π(i),π(i+1)} ,    (8.1)

where π ∈ S_N, the group of permutations of N elements, and one sets N + 1 ≡ 1 to enforce cyclic tours. Also one normally has C_ij = C_ji, although this is not a fundamental restriction. We want to minimize H. The problem contains disorder (the set of all possible distances or costs {C_ij} is heterogeneous) and, except in very simple cases, does not allow one to spot the optimal tour by inspection or by simple algorithms.
The direct method, computing total costs for all N! permutations of the tour and picking the least costly, is infeasible already for moderate N, due to the rapid growth of N! with N (e.g. 100! ≃ 9.33 × 10^157).
8.1.2
Example: Coding and Decoding
The optimization problem here is to design (efficient) codes which minimize the average probability of errors (introduced by noisy transmission) over an ensemble of messages. Given a message m ∈ {0, 1}^M, one encodes it via m → x(m) ∈ {0, 1}^N, where N ≥ M.

Transmission over a noisy channel distorts the message, x(m) → y (≠ x). The error probability for a given message is

P_B(m) = Σ_y Q(y|x(m)) 1I_{{d(y) ≠ m}}    (8.2)

where d(y) is the decoding of the received message y, and

Q(y|x) = Π_{i=1}^{N} Q(y_i|x_i)    (8.3)

with Q(y_i|x_i) denoting the conditional probability of receiving bit y_i, given the bit sent was x_i. These conditional probabilities encode the model of the noisy transmission channel. The ultimate goal is to find codes C : {0, 1}^M → {0, 1}^N that minimize the total error probability

P_B = (1/2^M) Σ_m P_B(m) .    (8.4)

Note that the uniform average over all possible M-bit messages could be replaced by a non-uniform average, if information about the statistics of messages were available.
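For concreteness (this example is not from the notes), consider a binary symmetric channel, Q(y_i|x_i) = 1 − p if y_i = x_i and p otherwise, together with a 3-fold repetition code for a single-bit message (M = 1, N = 3) and majority-vote decoding; (8.2)–(8.4) can then be evaluated by brute force:

```python
from itertools import product

# Binary symmetric channel with flip probability p, acting bitwise, cf. (8.3)
p = 0.1

def Q(y, x):
    q = 1.0
    for yi, xi in zip(y, x):
        q *= (1 - p) if yi == xi else p
    return q

# Illustrative code: 3-fold repetition of a 1-bit message, majority-vote decoding
encode = {0: (0, 0, 0), 1: (1, 1, 1)}

def decode(y):
    return int(sum(y) > len(y) / 2)

# Error probability (8.2) for message m: sum Q(y|x(m)) over wrongly decoded y
def P_B(m):
    return sum(Q(y, encode[m]) for y in product((0, 1), repeat=3)
               if decode(y) != m)

# Total error probability (8.4), averaged uniformly over the 2^M = 2 messages
P_total = (P_B(0) + P_B(1)) / 2
print(P_total)   # 3 p^2 (1-p) + p^3 = 0.028
```

The repetition code trades rate (M/N = 1/3) for reliability: the block error probability drops from p = 0.1 to 0.028.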
8.1.3
The Role of Statistical Mechanics
Given a cost function H(x), one could think of it as the energy function of a statistical mechanical system. In thermal equilibrium the states x of the system are distributed according to the Gibbs-Boltzmann probability distribution

p_β(x) = (1/Z_N(β)) e^{−βH(x)} .    (8.5)

Assuming that the "ground state" x* = argmin_x H(x) is unique, the Gibbs-Boltzmann distribution gives all weight to x* as β → ∞,

p_β(x) → δ(x − x*) ,   as   β → ∞ .    (8.6)
The minimal cost (per degree of freedom) N^{−1} H(x*) is then equal to the zero-temperature limit of the free energy,

N^{−1} H(x*) = − lim_{β→∞} (βN)^{−1} ln[Z_N(β)] .    (8.7)
For certain stylized systems one can obtain answers about minimal costs, or macroscopic properties of minimizing configurations x*, analytically. In cases with disorder, typical results are given by the quenched free energy (evaluated e.g. using replicas to average over the disorder). For many practical problems one has to resort to simulations. However, analytic results can provide good test-beds to assess the quality of optimization algorithms.

Simulated Annealing Algorithm

A generic algorithm that implements the heuristics suggested by statistical mechanics is simulated annealing:

0. Set β = β_0.
1. Equilibrate the system (using the Metropolis algorithm) at inverse temperature β.
2. Measure ⟨N^{−1} H(x)⟩_β.
3. Lower the temperature: β ← β + Δβ (using e.g. fixed Δβ).
4. Go to [1].

Practical implementations differ by the choice of cooling schedule (β_0, Δβ, β_fin), with Δβ possibly itself temperature dependent, and by the simulation time spent to approximate equilibration. An often-used device is to monitor the best-so-far solution, i.e. during measurement phases to record and keep the configuration with the lowest cost seen so far during the simulation. Also, quite often, several simulated annealing runs are performed, and the best solution of the several runs is recorded.
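A minimal sketch of the scheme above for a random Euclidean TSP instance (all parameter values and the segment-reversal move set are illustrative choices, and each "equilibration" phase is just a fixed number of Metropolis steps):

```python
import math
import random

random.seed(0)

# Hypothetical instance: N random cities in the unit square, Euclidean costs C_ij
N = 30
cities = [(random.random(), random.random()) for _ in range(N)]
C = [[math.dist(a, b) for b in cities] for a in cities]

def tour_cost(t):
    return sum(C[t[i]][t[(i + 1) % N]] for i in range(N))  # cyclic: N + 1 = 1

def anneal(beta0=1.0, dbeta=0.1, beta_max=20.0, sweeps=200):
    t = list(range(N))                       # initial permutation pi
    cost = tour_cost(t)
    best_t, best_cost = t[:], cost           # best-so-far solution
    beta = beta0
    while beta < beta_max:
        for _ in range(sweeps):
            i, j = sorted(random.sample(range(N), 2))
            new = t[:i] + t[i:j + 1][::-1] + t[j + 1:]   # 2-opt style reversal
            dE = tour_cost(new) - cost
            if dE <= 0 or random.random() < math.exp(-beta * dE):  # Metropolis
                t, cost = new, cost + dE
                if cost < best_cost:
                    best_t, best_cost = t[:], cost
        beta += dbeta                         # cooling schedule
    return best_t, best_cost

tour, length = anneal()
print(length)
```

Even this crude schedule finds tours far shorter than a random permutation; better schedules and longer equilibration phases improve the result further.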
8.2
Dynamics
We are looking at dynamical evolution in complex systems for which there exists a Lyapunov function. Assume the dynamics is given by a set of autonomous ODEs in R^N,

ẋ_i = f_i(x) ,   i = 1, …, N .    (8.8)
Suppose there is a function H = H(x) such that

(i)  f_i(x) = − q(x_i) ∂H/∂x_i ,  with q(x_i) > 0 ,
(ii) H(x) ≥ C   for all x ,    (8.9)

and C > −∞.
Then H(x) is a Lyapunov function, i.e. a function which is monotonically non-increasing under the dynamics. To see this, compute

dH/dt = Σ_i (∂H/∂x_i)(dx_i/dt) = − Σ_i q(x_i) (∂H/∂x_i)² ≤ 0    (8.10)

with equality iff f_i(x) = 0 for all i, i.e. at stationary points of H.

Remark 1: Only local minima of H(x) will correspond to stable fixed points of the dynamics. (Exercise: verify this statement by looking at the Jacobian of the dynamical system at fixed points.)

Remark 2: We can allow different functions q_i(x_i) > 0 for different i. Positive constants are naturally also allowed.

Example: graded response neurons. Systems of graded response neurons are described by dynamical equations of the form

C_i du_i/dt = − u_i/R_i + Σ_j J_ij g(u_j) + I_i ,    (8.11)

where C_i is the 'capacity' of the i-th neuron, u_i is the transmembrane voltage, R_i a transmembrane resistance, the J_ij are synaptic constants, and v_i = g(u_i) is the firing rate of neuron i, related to the transmembrane potential via a transfer function g(·), which we assume to be a monotone increasing function of its argument: very small below a certain threshold, increasing substantially above threshold, and levelling off at large values of u_i. The I_i represent external (current) stimuli.
X X 1 1X Jij vi vj − Ii v i + G(vi ) , 2 Ri i,j
where G(vi ) =
i
Z
vi
i
dv ′ g −1 (v ′ )
is the integrated inverse input-output relation of the neurons.
95
(8.12)
(8.13)
To see this, evaluate

dH/dt = Σ_i (∂H/∂v_i)(dv_i/dt) = Σ_i (∂H/∂v_i) g′(u_i) (du_i/dt)
      = Σ_i (∂H/∂v_i) g′(u_i) (1/C_i) [ Σ_j J_ij v_j + I_i − (1/R_i) g^{−1}(v_i) ]
      = − Σ_i (g′(u_i)/C_i) (∂H/∂v_i)² ≤ 0 ,    (8.14)

since the bracket equals −∂H/∂v_i. Here we have used that g^{−1}(v_i) = u_i and g′(u_i) > 0, and also invoked the equations of motion. Thus H(v) is monotonically non-increasing under the dynamics. If G(v_i) grows faster than quadratically in the v_i, then H will be bounded from below, and thus a Lyapunov function of the dynamics.

To analyse the long-term behaviour of the dynamics, one looks at the β → ∞ limit of the free energy of the system. Disorder resides, for instance, in the couplings,

J_ij = (1/N) Σ_µ ξ_i^µ ξ_j^µ .
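The Lyapunov property of (8.12) can be checked numerically. The sketch below makes illustrative choices not fixed by the notes: g(u) = tanh(u), so that g^{−1}(v) = artanh(v) and G(v) = v artanh(v) + (1/2) ln(1 − v²), with C_i = R_i = 1 and symmetric Gaussian couplings; (8.11) is integrated with a simple Euler step.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative network: symmetric couplings, g(u) = tanh(u), C_i = R_i = 1
N = 20
J = rng.normal(scale=1 / np.sqrt(N), size=(N, N))
J = (J + J.T) / 2                      # symmetry J_ij = J_ji is essential
I = rng.normal(scale=0.1, size=N)

def G(v):
    # integrated inverse of tanh: G(v) = int_0^v artanh(v') dv'
    return v * np.arctanh(v) + 0.5 * np.log(1 - v ** 2)

def H(v):
    return -0.5 * v @ J @ v - I @ v + np.sum(G(v))   # eq. (8.12) with R_i = 1

# Euler integration of (8.11); H must be non-increasing along the trajectory
u = rng.normal(scale=0.5, size=N)
dt, energies = 0.01, []
for _ in range(2000):
    v = np.tanh(u)
    energies.append(H(v))
    u += dt * (-u + J @ v + I)
print(energies[0], energies[-1])
```

Up to the O(dt²) discretization error of the Euler step, the recorded energies decrease monotonically towards a local minimum of H.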
This allows one to compute the phase diagram, and firing-rate distributions with/without stimulus.

Example: Ecological resource competition. Consider N species, i = 1, …, N, competing for P types of resource, µ = 1, …, P, with n_i(t) ≥ 0 the population density of species i. The dynamics is

ṅ_i = n_i [ − f_i + (1/N) Σ_µ Q^µ(t) q_i^µ ] ,    (8.15)

where f_i > 0 is the decay rate of species i without resources, q_i^µ is the per-capita demand on resource µ by species i, and

Q^µ(t) = Q_0^µ − Σ_i q_i^µ n_i(t)    (8.16)

is the abundance of resource µ, with Q_0^µ denoting the initial abundance without consumption.

Claim: The function

H = H(n) = (1/2N) Σ_µ Q^µ(t)² + Σ_i f_i n_i(t)    (8.17)
is monotonically decreasing under the dynamics. To see this, evaluate

dH/dt = Σ_i (∂H/∂n_i)(dn_i/dt)
      = Σ_i [ f_i + (1/N) Σ_µ Q^µ (∂Q^µ/∂n_i) ] (dn_i/dt)
      = − Σ_i n_i ( f_i − (1/N) Σ_µ Q^µ(t) q_i^µ )² ≤ 0    (8.18)

with equality iff ṅ_i = 0 for all i. (Here ∂Q^µ/∂n_i = −q_i^µ, and dn_i/dt = −n_i ( f_i − (1/N) Σ_µ Q^µ q_i^µ ).)
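Again the Lyapunov property is easy to check numerically. The sketch below integrates (8.15) with an explicit Euler step for a small arbitrary instance (all parameter choices are illustrative, not those of the analysis that follows):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative instance of the resource-competition dynamics (8.15), (8.16)
N, P = 50, 25                        # species, resources (alpha = P/N = 0.5)
q = 1.0 + rng.normal(size=(N, P))    # per-capita demands q_i^mu
f = 0.5 + rng.random(N)              # decay rates f_i > 0
Q0 = P + np.sqrt(P) * rng.normal(size=P)

def H(n):
    Q = Q0 - n @ q                   # resource abundances (8.16)
    return np.sum(Q ** 2) / (2 * N) + f @ n      # Lyapunov function (8.17)

n = rng.random(N)                    # initial population densities
dt, energies = 0.001, []
for _ in range(5000):
    Q = Q0 - n @ q
    energies.append(H(n))
    n += dt * n * (-f + (q @ Q) / N)             # eq. (8.15)
    n = np.maximum(n, 0.0)                       # densities stay non-negative
print(energies[0], energies[-1])
```

H decreases along the trajectory until the surviving species reach a stationary state with ṅ_i = 0.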
Once more the collective properties of the system can be studied by looking at the zero-temperature limit of the free energy. Disorder assumptions: the q_i^µ are i.i.d. with ⟨q_i^µ⟩ = q and Var(q_i^µ) = 1, the latter specification setting a scale; and Q_0^µ = P + s√P x^µ, with x^µ ∼ N(0, 1).
In these specifications s is a parameter characterizing the variability of resources, and q denotes the mean resource load of each species [the similarity between species is characterized by the ratio of q and the variance of the q_i^µ (which was set to 1)]. The relevant parameter is α = P/N. One evaluates the average free energy using replica and δ-function methods to represent the order parameters that emerge from the calculation. The key order parameter is

φ = (1/N) Σ_i ⟨n_i⟩

One finds: (1) for α < α_c, φ = α = P/N, i.e. there is on average one resource per surviving species, while for α > α_c, φ < α, i.e. there is less than one surviving species per resource; (2) α_c = α_c(sq) depends on the product sq and decreases from 0.5 at sq = 0 to 0 at sq = 3.

Example: Distribution of eigenvector components. Consider a random matrix on a graph, which is of the form

M_ij = c_ij K_ij

Eigenvectors v_α satisfy M v_α = λ_α v_α. One can find the distribution of eigenvector components via
p_β(v) = (1/Z(β)) e^{−(β/2) Σ_i (v_i − Σ_j M_ij v_j)²} δ( Σ_i v_i² − N )

and take the limit β → ∞, where the equilibrium distribution is concentrated on eigenvectors corresponding to eigenvalue λ. Thus one investigates the β → ∞ limit of

Z(β) = ∫ Π_i dv_i e^{−(β/2) Σ_i (v_i − Σ_j M_ij v_j)²} δ( Σ_i v_i² − N )

This problem is as yet unsolved.

Figure 8.1: Schematic phase diagram of the ecological resource-competition problem. For α > α_c(sq) there is typically less than one surviving species per resource (i.e. species can choose), whereas for α < α_c(sq), i.e. below the curve, the number of surviving species is equal to the number of different resources.
Chapter 9
Transfer Matrices

We will look at the transfer-matrix method to study one-dimensional systems, starting with homogeneous systems and going on to one-dimensional random-field systems. A possible application of the random-field model is a highly idealized description of equilibrium properties of double-stranded DNA, in particular the phenomenon of bubble formation, i.e. the spontaneous local unbinding of the double strand.
9.1
One-Dimensional Systems
Consider a homogeneous chain-like system (Ising for definiteness), closed to form a ring. Its Hamiltonian is

H(S) = − J Σ_{i=1}^{N} S_i S_{i+1} − h Σ_{i=1}^{N} S_i    (9.1)

The evaluation of the partition function

Z_N = Σ_S e^{−βH(S)}

can be understood in terms of matrix algebra as follows. With K = βJ and B = βh we can write

Z_N = Σ_S Π_{i=1}^{N} e^{K S_i S_{i+1} + B S_i} .

Upon introducing a 2 × 2 transfer matrix Γ with elements

Γ_{S,S′} = e^{(B/2)S + K SS′ + (B/2)S′}   ⇔   Γ = [ e^{K+B} , e^{−K} ; e^{−K} , e^{K−B} ]    (9.2)

the partition function Z_N is seen to evaluate to

Z_N = Tr Γ^N .    (9.3)
Note: One could also have chosen a slightly different (asymmetric) variant of the transfer matrix, such as Γ̃_{S,S′} = e^{K SS′ + B S}, without changing the above relation. Exercise: Check this.

Evaluating the trace using the basis of eigenvectors of Γ gives Z_N = γ_1^N + γ_2^N. Appealing to Perron's theorem, we know that γ_1 > |γ_2|, so one obtains the very compact result

−βf(β) = lim_{N→∞} (1/N) ln Z_N = ln γ_1 .    (9.4)
The eigenvalues of Γ are

γ_{1,2} = e^K [ cosh(B) ± √( sinh²(B) + e^{−4K} ) ]

This gives

m = m(β, h) = − ∂f/∂h = [ sinh(B) + sinh(B) cosh(B)/√(sinh²(B) + e^{−4K}) ] / [ cosh(B) + √(sinh²(B) + e^{−4K}) ]    (9.5)
As m(β, h) → 0 for h → 0 at any finite value of β, the system does not exhibit a phase transition to a magnetic low-temperature phase with non-vanishing spontaneous magnetization. The T → 0 (β → ∞) limit is often taken in a form which keeps B = βh constant. Taking the limit that way we get

m(B) = lim_{β→∞} m(β, h)|_B = sign(B)
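The transfer-matrix computation is a few lines of linear algebra. The sketch below (parameter values are arbitrary) checks the numerically obtained largest eigenvalue against the closed form for γ_1, and checks that (1/N) ln Z_N with Z_N = Tr Γ^N converges to ln γ_1:

```python
import numpy as np

# Transfer matrix (9.2) for the homogeneous Ising ring
def free_energy_density(beta, J=1.0, h=0.5):
    K, B = beta * J, beta * h
    Gamma = np.array([[np.exp(K + B), np.exp(-K)],
                      [np.exp(-K), np.exp(K - B)]])
    g1 = np.linalg.eigvalsh(Gamma).max()
    return -np.log(g1) / beta            # -beta f = ln gamma_1, eq. (9.4)

# check ln gamma_1 against the closed form e^K [cosh B + sqrt(sinh^2 B + e^{-4K})]
beta, J, h = 1.3, 1.0, 0.5
K, B = beta * J, beta * h
g1_exact = np.exp(K) * (np.cosh(B) + np.sqrt(np.sinh(B) ** 2 + np.exp(-4 * K)))
print(free_energy_density(beta), -np.log(g1_exact) / beta)

# finite-N check: Z_N = Tr Gamma^N = gamma_1^N + gamma_2^N approaches gamma_1^N
Gamma = np.array([[np.exp(K + B), np.exp(-K)],
                  [np.exp(-K), np.exp(K - B)]])
N = 100
ZN = np.trace(np.linalg.matrix_power(Gamma, N))
print(np.log(ZN) / N, np.log(g1_exact))
```

Since (γ_2/γ_1)^N decays exponentially, already N = 100 reproduces the thermodynamic-limit value ln γ_1 to machine precision.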
Transfer matrices can also be used to evaluate expectations of local observables or correlation functions. Let g be a function of a single spin variable S_i. Then

⟨g(S_i)⟩ = Σ_S g(S_i) Π_{j=1}^{N} Γ_{S_j,S_{j+1}} / Tr Γ^N = Σ_{S_i} g(S_i) (Γ^N)_{S_i,S_i} / ( γ_1^N + γ_2^N )

Expressing Γ in terms of its eigenvalues and eigenvectors, Γ = U D U^{−1}, with D = Diag(γ_α) and U the column matrix of eigenvectors of Γ, and using U^{−1} = U^t, we have

⟨g(S_i)⟩ = Σ_{S_i,α} g(S_i) U_{S_i,α} γ_α^N U_{S_i,α} / ( γ_1^N + γ_2^N )  →  Σ_S g(S) U_{S,1}² ,

as N → ∞. Here we have used Perron's theorem, which entails that (γ_α/γ_1)^N → δ_{α,1}.
In a similar vein, the expectation of two functions of observables on different sites gives

⟨g(S_i) h(S_j)⟩ = Σ_{S_i,S_j} g(S_i) (Γ^{|j−i|})_{S_i,S_j} h(S_j) (Γ^{N−|j−i|})_{S_j,S_i} / ( γ_1^N + γ_2^N ) ,

which upon inserting the spectral representation for the powers of Γ in the numerator results in

⟨g(S_i) h(S_j)⟩ = Σ_{S_i,S_j} Σ_{α,β} g(S_i) U_{S_i,α} γ_α^{|j−i|} U_{S_j,α} h(S_j) U_{S_j,β} γ_β^{N−|j−i|} U_{S_i,β} / ( γ_1^N + γ_2^N )

Dividing numerator and denominator by γ_1^N results, in the thermodynamic limit, in

⟨g(S_i) h(S_j)⟩ = Σ_α Σ_{S,S′} U_{S,1} g(S) U_{S,α} (γ_α/γ_1)^{|j−i|} U_{S′,α} h(S′) U_{S′,1}

Introducing the matrix g with elements

g_{α,β} = Σ_S U_{S,α} g(S) U_{S,β}

and similarly for h, we get the following compact expressions in the thermodynamic limit:

⟨g(S_i)⟩ = g_{1,1} ,   ⟨g(S_i) h(S_j)⟩ = Σ_α g_{1,α} (γ_α/γ_1)^{|j−i|} h_{α,1}    (9.6)

which explicitly exhibit translational invariance. In a similar vein, we get

C^{g,h}(i, j) = ⟨g(S_i) h(S_j)⟩ − ⟨g(S_i)⟩⟨h(S_j)⟩ = Σ_{α(≠1)} g_{1,α} (γ_α/γ_1)^{|j−i|} h_{α,1}    (9.7)

for the (connected) correlation functions.
9.2
Quasi One-Dimensional Systems
We have kept the notation deliberately general, to allow stating in brief terms the generalizations we get when looking at quasi-one-dimensional systems. Suppose that we are looking at a system composed of M × N spins. Denote by S_i the configuration of one of the N columns, containing M spins each (there are 2^M such configurations). Assuming that the Hamiltonian takes the form

H = H(S) = − Σ_{i=1}^{N} V(S_i, S_{i+1}) − Σ_i U(S_i)

we introduce a transfer matrix

Γ_{S,S′} = exp{ βU(S)/2 + βV(S, S′) + βU(S′)/2 } ,

giving

Z_{MN} = Tr Γ^N ,    (9.8)

thus

−βf_M(β) = lim_{N→∞} (1/NM) ln Z_{MN} = (1/M) ln γ_1 ,    (9.9)

where γ_1 is the largest eigenvalue of Γ.
For expectations and connected correlation functions, we obtain in a similar vein

⟨g(S_i)⟩ = g_{1,1}    (9.10)

and

⟨g(S_i) h(S_j)⟩ − ⟨g(S_i)⟩⟨h(S_j)⟩ = Σ_{α(≠1)} g_{1,α} (γ_α/γ_1)^{|j−i|} h_{α,1} .    (9.11)
Note: To investigate a proper 2-dimensional system, one would have to take the M → ∞ limit as well. As the dimension of Γ for a strip of width M is 2^M, this task is considerably more difficult than that faced when looking at a proper 1-D system. Nevertheless, it has been solved for the 2-D Ising model without an external field, as well as for a number of other 2-D systems. R. Baxter's book "Exactly Solved Models in Statistical Mechanics" is the best reference.
9.3
Spin Chains in a Random Field
If we consider systems with random bonds or random fields, the transfer matrix techniques can no longer be used in the form just described. Considering a random field system with

H = − J Σ_i S_i S_{i+1} − Σ_i h_i S_i

one would have a different, i-dependent transfer matrix for each bond (i, i+1), e.g.

Γ^{(i)}_{S,S′} = e^{K SS′ + B_i S}    (9.12)

if we choose the non-symmetric convention. These matrices no longer commute if they have different B_i,

[Γ^{(i)}, Γ^{(j)}] = Γ^{(i)} Γ^{(j)} − Γ^{(j)} Γ^{(i)} = 2 sinh(B_i − B_j) [ e^{−2K} , 1 ; −1 , −e^{−2K} ] ,

so they cannot be diagonalized by one and the same matrix U of eigenvectors. Exercise: Verify the above commutation relation.

As a consequence, the simple expressions for free energy and correlation functions that we derived for homogeneous systems are no longer available. However, we can still express these objects in terms of matrix products, for which so-called Lyapunov exponents play the role of the eigenvalues. E.g. for a closed ring of circumference N we have
Z_N = Tr Π_{i=1}^{N} Γ^{(i)} ∼ e^{Nγ} ,    (9.13)

where γ is the largest Lyapunov exponent of the family of random matrices {Γ^{(i)}}, so that

−βf = lim_{N→∞} (1/N) ln Z_N = γ
The method to compute γ and other thermodynamic quantities which we use here is due to Dyson (1953) and Schmidt (1957), and was more recently applied to random-field Ising chains by Bruinsma and Aeppli (1983). It starts by looking at the mapping

( x_i ; y_i ) = Γ^{(i)} ( x_{i−1} ; y_{i−1} )    (9.14)

which is equivalent to two coupled stochastic recursion relations

x_i = e^{K+B_i} x_{i−1} + e^{−K+B_i} y_{i−1}
y_i = e^{−K−B_i} x_{i−1} + e^{K−B_i} y_{i−1} .
Considering, instead of the pair of recursions, the recursion for the ratio ρ_i = x_i/y_i, we get

ρ_i = e^{2B_i} ( e^K ρ_{i−1} + e^{−K} ) / ( e^{−K} ρ_{i−1} + e^K ) = e^{2B_i} ( e^{2K} ρ_{i−1} + 1 ) / ( ρ_{i−1} + e^{2K} ) ≡ F_{B_i}(ρ_{i−1})    (9.15)

This defines an iterative mapping ρ_{i−1} → ρ_i which depends on the random realization of B_i. For a bimodal distribution of the B_i, i.e. B_i ∈ {B_+, B_−}, we get two branches F_+ and F_− of the mapping.
Figure 9.1: Qualitative sketch of the ρ-dynamics for a bimodal B distribution. The bounds of the support of the asymptotic invariant measure are at the intersections of the F_+ and F_− branches of the recursion relation with the diagonal.

Describing the dynamics at the level of a 'time'-dependent pdf P_i(ρ), we have, for general p(B), the Chapman-Kolmogorov equation

P_{i+1}(ρ) = ∫ dB p(B) ∫ dρ′ P_i(ρ′) δ(ρ − F_B(ρ′)) = ∫ dB p(B) P_i(F_B^{−1}(ρ)) / F_B′(F_B^{−1}(ρ)) .

At large 'times' i → ∞ the possible positions ρ will be assumed with an equilibrium distribution P_∞(ρ), and this asymptotic density is given as the "fixed-point" solution of this dynamic equation, with P_i ≡ P_{i+1} ≡ P_∞. Note that P_∞ is in general a complicated multi-fractal object. For this reason it is advisable to study the dynamics in terms of the (integrated) probability distribution 𝒫_i(ρ) = ∫_{−∞}^{ρ} dρ′ P_i(ρ′) rather than the probability density functions P_i. For these, the equivalent of the Chapman-Kolmogorov equation is

𝒫_{i+1}(ρ) = ∫ dB p(B) 𝒫_i(F_B^{−1}(ρ)) .

Exercise: Verify this.
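The structure just described is easy to explore by iterating the map (9.15) directly. A minimal sketch (the values of K, the field amplitude and the run length are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Iterate the random map (9.15) for a bimodal field B_i = +/- B and collect the
# visited points; their histogram approximates the invariant measure
K, B = 1.0, 0.25

def F(rho, Bi):
    return np.exp(2 * Bi) * (np.exp(2 * K) * rho + 1) / (rho + np.exp(2 * K))

rho, samples = 1.0, []
for Bi in rng.choice([B, -B], size=50_000):
    rho = F(rho, Bi)
    samples.append(rho)
samples = np.array(samples[1000:])        # discard transient

# the support of the invariant measure is bounded by the fixed points of the
# two branches F_+ and F_-, i.e. the solutions of F_{+/-}(rho) = rho
print(samples.min(), samples.max())

# check: for B_i = 0 the map contracts to the unique fixed point rho* = 1
r = 5.0
for _ in range(200):
    r = F(r, 0.0)
print(r)   # converges to 1.0
```

A histogram of `samples` approximates the invariant density; for the parameter values of Figure 9.2 it reproduces the multifractal structure shown there.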
Depending on the pdf of the random fields B_i and on K, the stationary P_∞ can have qualitatively different properties. Its support can be a fractal, in which case the distribution exhibits a so-called 'devil's staircase'. The support can be a compact interval without gaps, but the pdf can still be everywhere non-analytic (a 'fat multi-fractal'); slopes of the distribution at the boundary of the support can be zero or infinite. Qualitative changes in the stationary P_∞(ρ) are then expected to manifest themselves in thermodynamic properties of the system.
9.3.1
Thermodynamics of the Random Field Chain
Thermodynamic properties of the random field chain follow from the partition function

Z_N = Tr Π_{i=1}^{N} Γ^{(i)} = ( Π_{i=1}^{N} Γ^{(i)} )_{1,1} + ( Π_{i=1}^{N} Γ^{(i)} )_{2,2}

With reference to (9.14) this can be expressed as

Z_N = x_N|_{x_0=1, y_0=0} + y_N|_{x_0=0, y_0=1} .    (9.16)

Using

y_i/y_{i−1} = e^{−K−B_i} ρ_{i−1} + e^{K−B_i} ,

one obtains

y_N = Π_{i=1}^{N} ( e^{−K−B_i} ρ_{i−1} + e^{K−B_i} ) y_0    (9.17)

(with ρ_0 = 0 and y_0 = 1) for the second contribution to Z_N. Similarly one gets

x_N = ρ_N y_N = ρ_N Π_{i=2}^{N} ( e^{−K−B_i} ρ_{i−1} + e^{K−B_i} ) y_1    (9.18)

with y_1 = e^{−K−B_1} x_0 (and x_0 = 1) for the first contribution. Thus, apart from boundary terms, the x_N and the y_N contributions contain the same product, so they give contributions of the same order of magnitude to Z_N, hence

−βf_N = (1/N) ln Z_N = (1/N) ln y_N|_{x_0=0, y_0=1} + O(1/N)
      = −K − (1/N) Σ_i B_i + (1/N) Σ_i ln( ρ_{i−1} + e^{2K} )    (9.19)
Figure 9.2: Stationary probability density functions P_∞(ρ) (left) and corresponding (integrated) probability distributions (right) for a random field model with B_i ∈ {±0.01} and (from top to bottom) K = 0.25, K = 0.5, K = 0.75 and K = 1.0. In the first two cases, we have a 'thin multi-fractal', with an infinite hierarchy of gaps in the support (corresponding to flat regions in the probability distributions) and complete devil's staircases for the integrated distributions. In the latter two cases we have 'fat multi-fractals', with the integrated distribution monotone increasing, continuous but non-analytic on the entire support.
As N → ∞, the arithmetic means converge to expectations over the distributions of the random variables (the B_i, and the ρ_i in the stationary regime), so

−βf = lim_{N→∞} −βf_N = −K − ⟨B⟩ + ⟨ ln(ρ + e^{2K}) ⟩_ρ    (9.20)
Another quantity of interest is the local magnetization (enumerating spins anti-clockwise, with periodic boundary conditions N + 1 ≡ 1),

⟨S_1⟩ = Σ_S S_1 Γ^{(N)}_{S_1,S_N} Γ^{(N−1)}_{S_N,S_{N−1}} … Γ^{(1)}_{S_2,S_1} / Z_N = ( x_N|_{x_0=1, y_0=0} − y_N|_{x_0=0, y_0=1} ) / ( x_N|_{x_0=1, y_0=0} + y_N|_{x_0=0, y_0=1} )

It would be appealing to invoke the above representations (9.17) and (9.18) in terms of products, and argue that the factors (i = 2, 3, …, N) are common between x_N and y_N and thus cancel, giving

⟨S_1⟩ = ( ρ_N y_1 − e^{K−B_1} ) / ( ρ_N y_1 + e^{K−B_1} ) = ( ρ_N e^{−K−B_1} − e^{K−B_1} ) / ( ρ_N e^{−K−B_1} + e^{K−B_1} )

This argument, however, is false, as the ρ_i sequences appearing in x_N and y_N are different, having different initial conditions. (Yet it seems that the Bruinsma-Aeppli paper uses this kind of reasoning; have a look at that paper, which is available from the course web page.) Thus it appears that one has to evaluate such expressions numerically for long chains, by explicitly evaluating recursions and products. Note also that the local magnetization m_1 = ⟨S_1⟩ depends on the local randomness (B_1); hence ⟨S_1⟩ will not be self-averaging, and one would have to find the pdf p(m_1).
9.4
Double-Stranded DNA as an Application
The random field Ising model could provide a simplified model of double-stranded DNA, consisting of sequences of complementary G-C and A-T pairs. It is known that the binding energy (stability) of the G-C pairs is larger than that of the A-T pairs. Taking a two-state model, where S_i = +1 denotes a bound state of a pair and S_i = −1 a state where the pair is unbound, one can write the Hamiltonian of the double-stranded chain as

H = − J Σ_i S_i S_{i+1} − Σ_i h_i S_i    (9.21)

where the h_i take two values, h_GC > h_AT > 0, to represent the different binding energies. The interaction term represents internal stiffness: local bending energy of the strand is required to have neighbouring bound-unbound pairs, whereas if neighbouring pairs are both bound or both unbound, the strand is locally straight and no such bending energy is required.

Of interest would be the thermodynamic properties of this system, which could be obtained by the methods described above. The local magnetization is then related to the probability of having an unbound pair at location i via

p_i^u = (1/2) (1 − ⟨S_i⟩)

Another quantity of interest would be the probability of having a bubble of n consecutive unbound pairs (without loss of generality starting at i = 1), given by

p_1^u(n) = ⟨ Π_{j=1}^{n} (1 − S_j)/2 ⟩
As for the local magnetization, the probability of having a bubble of length n will not be self-averaging.

To push the evaluation a bit further, note that the P_j^− = (1 − S_j)/2 are projectors on the S_j = −1 state. Expressing the average p_1^u(n) in terms of the transfer matrices, we then have

p_1^u(n) = Σ_S ( Π_{j=1}^{n} P_j^− ) Γ^{(N)}_{S_1,S_N} Γ^{(N−1)}_{S_N,S_{N−1}} … Γ^{(1)}_{S_2,S_1} / Z_N
        = ( Π_{j=1}^{n−1} Γ^{(j)}_{−,−} ) Σ_{S_{n+1},…,S_N} Γ^{(N)}_{−,S_N} Γ^{(N−1)}_{S_N,S_{N−1}} … Γ^{(n)}_{S_{n+1},−} / Z_N
        = ( Π_{j=1}^{n−1} Γ^{(j)}_{−,−} ) y_N|_{x_{n−1}=0, y_{n−1}=1} / ( x_N|_{x_0=1, y_0=0} + y_N|_{x_0=0, y_0=1} )    (9.22)

In this expression one may substitute Γ^{(j)}_{−,−} = e^{K−B_j} for j = 1, …, n − 1 and invoke product representations à la (9.17) and (9.18) to evaluate the remainder of the above result in terms of suitable {ρ_i} sequences. Some explicit results to follow.