Random Processes in Physics and Finance
Random Processes in Physics and Finance

MELVIN LAX, WEI CAI, MIN XU

Department of Physics, City University of New York
Department of Physics, Fairfield University, Connecticut
OXFORD UNIVERSITY PRESS
Great Clarendon Street, Oxford OX2 6DP

Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide in

Oxford New York
Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in

Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries

Published in the United States by Oxford University Press Inc., New York

© Oxford University Press 2006

The moral rights of the authors have been asserted

Database right Oxford University Press (maker)

First published 2006

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above

You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer

British Library Cataloguing in Publication Data
Data available

Library of Congress Cataloging in Publication Data
Data available

Printed in Great Britain on acid-free paper by Biddles Ltd., King's Lynn, Norfolk

ISBN 0-19-856776-6    978-0-19-856776-9

10 9 8 7 6 5 4 3 2 1
Preface

The name "Econophysics" has been used to denote the application of mathematical techniques developed for the study of random processes in physical systems to the economic and financial worlds. Since a substantial number of physicists are now employed in the financial arena or are doing research in this area, it is appropriate to give a course that emphasizes and relates physical applications to financial applications. The course and text on Random Processes in Physics and Finance differ from mathematical texts by emphasizing the origins of noise, as opposed to an analysis of its transformation by linear and nonlinear devices. Of course, the latter enters any analysis of measurements, but it is not the focus of this work.

The text opens with a chapter-long review of probability theory to refresh those who have had an undergraduate course, and to establish a set of tools for those who have not. Of course, this chapter can be regarded as an oxymoron, since probability includes random processes. But we restrict probability theory, in this chapter, to the study of random events, as opposed to random processes, the latter being a sequence of random events extended over a period of time. It is intended, in this chapter, to raise the level of approach by demonstrating the usefulness of delta functions. If an optical experimenter does his work with lenses and mirrors, a theorist does it with delta functions and Green's functions. In the spirit of Mark Kac, we shall calculate the chi-squared distribution (important in statistical decision making) with delta functions. The normalization condition of the chi-squared probability density leads to a geometric result: we can calculate the volume of a sphere in n dimensions without ever transforming to spherical coordinates.
The use of a delta function description permits us to sidestep the need for Lebesgue measure and Stieltjes integrals, greatly simplifying the mathematical approach to random processes. The problems associated with the Ito integrals used by both mathematicians and financial analysts will be mentioned below. The probability chapter includes a section on what we call the first and second laws of gambling.

Chapters 2 and 3 define random processes and provide examples of the most important ones: Gaussian and Markovian processes, the latter including Brownian motion. Chapter 4 provides the definition of a noise spectrum, and the Wiener-Khinchine theorem relating this spectrum to the autocorrelation. Our point of view here is to relate the abstract definition of the spectrum to how a noise spectrum is measured.
Chapter 5 provides an introduction to thermal noise, which can be regarded as ubiquitous. This chapter includes a review of the experimental evidence, the thermodynamic derivation for Johnson noise, and the Nyquist derivation of the spectrum of thermal noise. The latter touches on the problem of how to handle zero-point noise in the quantum case. The zero-frequency Nyquist noise is shown to be precisely equivalent to the Einstein relation (between diffusion and mobility). Chapter 6 provides an elementary introduction to shot noise, which is as ubiquitous as thermal noise. Shot noise is related to discrete random events, which, in general, are neither Gaussian nor Markovian.

Chapters 7-10 constitute the development of the tools of random processes. Chapter 7 provides in its first section a summary of all results concerning the fluctuation-dissipation theorem needed to understand many aspects of noisy systems. The proof, which can be omitted by many readers, is a succinct one in density matrix language, with a review of the latter provided for those who wish to follow the proof. Thermal noise and Gaussian noise sources combine to create a category of Markovian processes known as Fokker-Planck processes. A serious discussion of Fokker-Planck processes is presented in Chapter 8 that includes generation-recombination processes, linearly damped processes, Doob's theorem, and multivariable processes. Just as Fokker-Planck processes are a generalization of thermal noise, Langevin processes constitute a generalization of shot noise, and a detailed description is given in Chapter 9. The Langevin treatment of the Fokker-Planck process and diffusion is given in Chapter 10. The form of our Langevin equation is different from the stochastic differential equation using Ito's calculus lemma. Transformations of our Langevin equation obey the ordinary rules of calculus; hence they can be easily performed, and some misleading results can be avoided.
The origin of the difference between our approach and that using Ito's lemma lies in the different definitions of the stochastic integral. Applications of these tools occupy the remainder of the book. These applications fall primarily into two categories: physical examples, and examples from finance. The two sets of applications can be pursued independently.

The physical application that required learning all these techniques was the determination of the motion and noise (line-width) of self-sustained oscillators such as lasers. When nonlinear terms are added to a linear system, they usually add background noise of the convolution type, but they do not create a sharp line. The question "Why is a laser line so narrow?" (the line-width can be as low as one cycle per second, even when the laser frequency is of the order of 10^15 per second) is explained in Chapter 11. It is shown that autonomous oscillators (those with no absolute time origin) all behave like van der Pol oscillators, have narrow line-widths, and have a behavior near threshold that is calculated exactly.
Chapter 12 shows that noise in homogeneous semiconductors can be treated by the Lax-Onsager "regression theorem". The random motion of particles in a turbid medium, due to multiple elastic scattering, obeys the classic Boltzmann transport equation. In Chapter 13, the center position and the diffusion behavior of a collimated beam incident on an infinite uniform turbid medium are derived using an elementary analysis of the random walk of photons in the medium. In Chapter 14, the same problem is treated by a cumulant expansion. An analytical expression for the cumulants (defined in Chapter 1) of the spatial distribution of particles at any angle and time, exact up to an arbitrarily high order, is derived for an infinite uniform scattering medium. Up to second order, a Gaussian spatial distribution solution of the Boltzmann transport equation is obtained, with exact average center and exact half-width as functions of time.

Chapter 15, on the extraction of signals in a noisy, distorted environment, has applications in physics, finance and many other fields. These problems are ill-posed and the solution is not unique. Methods for treating such problems are discussed.

Having developed the tools for dealing with physical systems, we learned that the Fokker-Planck process is the one used by Black and Scholes to calculate the value of options and derivatives. Although there are serious limitations to the Black-Scholes method, it created a revolution because there were no earlier methods to determine the values of options and derivatives. We shall see how hedging strategies that lead to a riskless portfolio have been developed based on the Black-Scholes ideas. Thus financial applications, such as arbitrage, based on this method are easy to handle after we have defined forward contracts, futures, and put and call options in Chapter 16.
The finance literature expends a significant effort on teaching and using Ito integrals (integrals over the time of a stochastic process). This effort is easily circumvented by redefining the stochastic integral by a method that is correct for processes with nonzero correlation times, and then approaching the limit in which the correlation time goes to zero (the Brownian motion limit). The limiting result that follows from our iterative procedure disagrees with the Ito definition of the stochastic integral, and agrees with the Stratonovich definition. It is also less likely to mislead; conflicting results were present in John Hull's book on Options, Futures and Other Derivative Securities.

In Chapter 17 we turn to methods that apply to economic time series and other forms, including microwave devices and global warming. How can the spectrum of an economic time series be evaluated to detect and separate seasonal and long-term trends? Can one devise a trading strategy using this information? How can one determine the presence of a long-term trend, such as global warming, from climate statistics? Why are these results sensitive to the choice of year among the solar year, the sidereal year, the equatorial year, etc.? Which one is best? The most
careful study of such time series, by David J. Thomson, will be reviewed. For example, studies of global warming are sensitive to whether one uses the solar year, the sidereal year, the equatorial year or any of several additional choices!

This book is based on a course on Random Processes in Physics and Finance taught at the City College of the City University of New York to students in physics who have had a first course in "Mathematical Methods". Students in engineering and economics who have had comparable mathematical training should also be capable of coping with the text. A review/summary of an undergraduate course in probability is given; it includes an appendix on delta functions, and a fair number of examples involving discrete and continuous random variables.
Contents
A Note from Co-authors xiv
1 Review of probability 1
1.1 Meaning of probability 1
1.2 Distribution functions 4
1.3 Stochastic variables 5
1.4 Expectation values for single random variables 5
1.5 Characteristic functions and generating functions 7
1.6 Measures of dispersion 8
1.7 Joint events 12
1.8 Conditional probabilities and Bayes' theorem 16
1.9 Sums of random variables 19
1.10 Fitting of experimental observations 24
1.11 Multivariate normal distributions 29
1.12 The laws of gambling 32
1.13 Appendix A: The Dirac delta function 35
1.14 Appendix B: Solved problems 40
2 What is a random process 44
2.1 Multitime probability description 44
2.2 Conditional probabilities 44
2.3 Stationary, Gaussian and Markovian processes 45
2.4 The Chapman-Kolmogorov condition 46
3 Examples of Markovian processes 48
3.1 The Poisson process 48
3.2 The one dimensional random walk 50
3.3 Gambler's ruin 52
3.4 Diffusion processes and the Einstein relation 54
3.5 Brownian motion 56
3.6 Langevin theory of velocities in Brownian motion 57
3.7 Langevin theory of positions in Brownian motion 60
3.8 Chaos 64
3.9 Appendix A: Roots for the gambler's ruin problem 64
3.10 Appendix B: Gaussian random variables 66
4 Spectral measurement and correlation 69
4.1 Introduction: An approach to the spectrum of a stochastic process 69
4.2 The definitions of the noise spectrum 71
4.3 The Wiener-Khinchine theorem 73
4.4 Noise measurements 75
4.5 Evenness in ω of the noise? 77
4.6 Noise for nonstationary random variables 80
4.7 Appendix A: Complex variable notation
5 Thermal noise 82
5.1 Johnson noise 82
5.2 Equipartition 84
5.3 Thermodynamic derivation of Johnson noise 85
5.4 Nyquist's theorem 87
5.5 Nyquist noise and the Einstein relation 90
5.6 Frequency dependent diffusion constant 90
6 Shot noise 93
6.1 Definition of shot noise 93
6.2 Campbell's two theorems 95
6.3 The spectrum of filtered shot noise 98
6.4 Transit time effects 101
6.5 Electromagnetic theory of shot noise 104
6.6 Space charge limiting diode 106
6.7 Rice's generalization of Campbell's theorems 109
7 The fluctuation-dissipation theorem 113
7.1 Summary of ideas and results 113
7.2 Density operator equations 117
7.3 The response function 119
7.4 Equilibrium theorems 121
7.5 Hermiticity and time reversal 122
7.6 Application to a harmonic oscillator 123
7.7 A reservoir of harmonic oscillators 126

8 Generalized Fokker-Planck equation 129
8.1 Objectives 129
8.2 Drift vectors and diffusion coefficients 131
8.3 Average motion of a general random variable 134
8.4 The generalized Fokker-Planck equation 137
8.5 Generation-recombination (birth and death) process 139
8.6 The characteristic function 143
8.7 Path integral average 146
8.8 Linear damping and homogeneous noise 149
8.9 The backward equation 152
8.10 Extension to many variables 153
8.11 Time reversal in the linear case 160
8.12 Doob's theorem 162
8.13 A historical note and summary (M. Lax) 163
8.14 Appendix A: A method of solution of first order PDEs 164

9 Langevin processes 168
9.1 Simplicity of Langevin methods 168
9.2 Proof of delta correlation for Markovian processes 169
9.3 Homogeneous noise with linear damping 171
9.4 Conditional correlations 173
9.5 Generalized characteristic functions 175
9.6 Generalized shot noise 177
9.7 Systems possessing inertia 180
10 Langevin treatment of the Fokker-Planck process 182
10.1 Drift velocity 182
10.2 An example with an exact solution 184
10.3 Langevin equation for a general random variable 186
10.4 Comparison with Ito's calculus lemma 188
10.5 Extending to the multiple dimensional case 189
10.6 Means of products of random variables and noise source 191
11 The rotating wave van der Pol oscillator (RWVP) 194
11.1 Why is the laser line-width so narrow? 194
11.2 An oscillator with purely resistive nonlinearities 195
11.3 The diffusion coefficient 197
11.4 The van der Pol oscillator scaled to canonical form 199
11.5 Phase fluctuations in a resistive oscillator 200
11.6 Amplitude fluctuations 205
11.7 Fokker-Planck equation for RWVP 207
11.8 Eigenfunctions of the Fokker-Planck operator 208
12 Noise in homogeneous semiconductors 211
12.1 Density of states and statistics of free carriers 211
12.2 Conductivity fluctuations 215
12.3 Thermodynamic treatment of carrier fluctuations 216
12.4 General theory of concentration fluctuations 218
12.5 Influence of drift and diffusion on modulation noise 222
13 Random walk of light in turbid media 227
13.1 Introduction 227
13.2 Microscopic statistics in the direction space 229
13.3 The generalized Poisson distribution pn(t) 232
13.4 Macroscopic statistics 233
14 Analytical solution of the elastic transport equation 237
14.1 Introduction 237
14.2 Derivation of cumulants to an arbitrarily high order 238
14.3 Gaussian approximation of the distribution function 242
14.4 Improving cumulant solution of the transport equation 245
15 Signal extraction in presence of smoothing and noise 258
15.1 How to deal with ill-posed problems 258
15.2 Solution concepts 259
15.3 Methods of solution 261
15.4 Well-posed stochastic extensions of ill-posed processes 264
15.5 Shaw's improvement of Franklin's algorithm 266
15.6 Statistical regularization 268
15.7 Image restoration 270
16 Stochastic methods in investment decision 271
16.1 Forward contracts 271
16.2 Futures contracts 272
16.3 A variety of futures 273
16.4 A model for stock prices 274
16.5 Ito's stochastic differential equation 278
16.6 Value of a forward contract on a stock 281
16.7 Black-Scholes differential equation 282
16.8 Discussion 283
16.9 Summary 286
17 Spectral analysis of economic time series 288
17.1 Overview 288
17.2 The Wiener-Khinchine and Wold theorems 291
17.3 Means, correlations and the Karhunen-Loeve theorem 293
17.4 Slepian functions 295
17.5 The discrete prolate spheroidal sequence 298
17.6 Overview of Thomson's procedure 300
17.7 High resolution results 301
17.8 Adaptive weighting 302
17.9 Trend removal and seasonal adjustment 303
17.10 Appendix A: The sampling theorem 303
Bibliography 307
Index 323
A note from co-authors

Most parts of this book were written by Distinguished Professor Melvin Lax (1922-2002), and originated from the notes for the class he taught at the City University of New York from 1985 to 2001. During his last few years, Mel made a great effort to edit this book but, unfortunately, was not able to complete it before his untimely illness. Our work on the book is mostly technical, including correcting misprints and errors in text and formulas, making minor revisions, and converting the book to LaTeX. In addition, Wei Cai wrote Chapter 14, Sections 10.3-10.5 and Section 16.8, and made changes to Sections 8.3, 16.4, 16.6 and 16.7; Min Xu wrote Chapter 13 and part of Section 15.6. We dedicate our work on this book to the memory of our mentor, colleague and friend Melvin Lax. We would like to thank our colleagues at the City College of New York, in particular Professors Robert R. Alfano, Joseph L. Birman and Herman Z. Cummins, for their strong support toward the completion of this book.
Wei Cai
Min Xu
1 Review of probability
Introductory remarks

The purpose of this chapter is to provide a review of the concepts of probability for use in our later discussion of random processes. Students who have not had an undergraduate probability course may find it useful to have some collateral references to accompany our necessarily brief summary. Bernstein (1998) provides a delightful historical popularization of the ideas of probability, from the introduction of Arabic numerals, to the start of probability with de Mere's dice problem, to census statistics, to actuarial problems, and to the use of probability in the assessment of risk in the stock market. Why was the book titled Against the Gods? Because there was no need for probability in making decisions if actions are determined by the gods; it took the Renaissance before the world was ready for probability. An excellent recent undergraduate introduction to probability is given by Hamming (1991). The epic work of Feller (1957) is not, as its title suggests, an introduction, but a two-volume treatise on both the fundamentals and applications of probability theory. It includes a large number of interesting solved problems. A review of the basic ideas of probability is given by E. T. Jaynes (1958). A brief overview of the frequency ratio approach to probability of von Mises, the axiomatic approach of Kolmogorov, and the subjective approach of Jeffreys is presented below.

1.1 Meaning of probability

The definition of probability has been (and still is) the subject of controversy. We shall mention, briefly, three approaches.

1.1.1 Frequency ratio definition
R. von Mises (1937) introduced a definition based on the assumed existence of a limit of the ratio of the number of successes S to the total number of trials N,

P_N = \frac{S}{N}.
If the limit exists,

P = \lim_{N \to \infty} \frac{S}{N},

it is regarded as the definition of the probability of success. One can object that this definition is meaningless, since the limit does not exist in the ordinary sense: that for any ε there exists an N such that for all M > N, |P_M − P| < ε. This limit will exist, however, in a probability sense; namely, the probability that these inequalities fail can be made arbitrarily small. The Chebyshev inequality of Eq. (1.32) is an example of a proof that the probability of a deviation will become arbitrarily small for large deviations. What is the proper statement for the definition of probability obtained as a "limit" of ratios in a large series of trials?

1.1.2 A priori mathematical approach (Kolmogorov)
Kolmogorov (1950) introduced an axiomatic approach based on set theory. The Kolmogorov approach assumes that there is some fundamental set of events whose probabilities are known; e.g., the six sides of a die are assumed equally likely to appear on top. More complicated events, like those involving the tossing of a pair of dice, can be computed by rules for combining the more elementary events. For variables that can take on continuous values, Kolmogorov introduces set theory and assigns to the probability, p, the ratio between the measure of the set of successful events and the measure of the set of all possible events. This is a formal procedure, and it begs the question of how to determine the elementary events that have equal probabilities. In statistical mechanics, for example, it is customary to assume a measure that is uniform in phase space. But this statement applies to phase space in Cartesian coordinates, not, for example, in spherical coordinates. There is good reason, based on how discrete quantum states are distributed, to favor this choice. But there is no guide in the Kolmogorov approach to probability theory for making such a choice. The rigorous axiomatic approach of Kolmogorov raised probability to the level of a fully acceptable branch of mathematics, which we shall call mathematical probability. A major contribution to mathematical probability was made by Doob (1953) in his book on Stochastic Processes and his rigorous treatment of Brownian motion. But mathematical probability should be regarded as a subdivision of probability theory, which also includes consideration of how the underlying probabilities are to be determined. Because ideal Brownian motion involves white noise (a flat spectrum up to infinite frequencies), sample processes are continuous but not differentiable. This problem provides a stage on which mathematicians can display their virtuosity in set theory and Lebesgue integration.
When Black and Scholes (1973) introduced a model for prices of stock in which the logarithm of the stock price executes a
Brownian motion, it supplied the first tool that could be used to price stock (and other) options. This resulted in a Nobel Prize, a movement of mathematicians (and physicists) into the area of mathematical finance theory, and a series of books and courses in which business administration students were coerced into learning set theory and Lebesgue integration. This was believed necessary because integrals over Brownian motion variables could not be done by the traditional Riemann method, as the limit of a sum of terms, each of which is a product of a function evaluation and an interval. The difficulty is that with pure Brownian motion the result depends on where in the interval the function is evaluated. Ito (1951) chose to define a stochastic integral by evaluating the function at the beginning of the interval. This was accompanied by a set of rules known as the Ito calculus. Mathematical probability describes the rules of computation for compound events provided that the primitive probabilities are known. In discrete cases like the rolling of dice there are natural choices (like giving each side of the die equal probability). In the case of continuous variables the choice is not always clear, and this leads to paradoxes. See, for example, Bertrand's paradox in Appendix B of this chapter. Feller (1957) therefore makes the logical choice of splitting his book into two volumes, the first of which deals with discrete cases. The hard work of dealing with continuous variables is postponed until the second volume. What "mathematical probability" omits is a discussion of how contact must be made with reality to determine a model that yields the correct measure for each set in the continuous case. The Ito model makes one arbitrary choice. Stratonovich (1963) chooses not the left-hand point of the interval, but an average over the left- and right-hand points. These two procedures give different values for a stochastic integral. Both are arbitrary.
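The dependence on the evaluation point can be seen numerically. The following sketch (plain Python, using an illustrative random-walk approximation to Brownian motion) compares the left-endpoint (Ito) and midpoint (Stratonovich) sums for the integral of W dW; their values differ by the quadratic variation, which approaches T/2:

```python
import random

def brownian_path(n, T=1.0, seed=1):
    """Approximate a Brownian path W(t) on [0, T] by n Gaussian increments."""
    rng = random.Random(seed)
    dt = T / n
    w = [0.0]
    for _ in range(n):
        w.append(w[-1] + rng.gauss(0.0, dt ** 0.5))
    return w

def ito_sum(w):
    # integrand evaluated at the left endpoint of each interval
    return sum(w[i] * (w[i + 1] - w[i]) for i in range(len(w) - 1))

def stratonovich_sum(w):
    # integrand evaluated as the average of the two endpoints
    return sum(0.5 * (w[i] + w[i + 1]) * (w[i + 1] - w[i])
               for i in range(len(w) - 1))

w = brownian_path(100_000)
# Stratonovich gives W(T)^2/2, as ordinary calculus would;
# Ito gives (W(T)^2 - T)/2, smaller by T/2.
print(stratonovich_sum(w) - ito_sum(w))  # close to T/2 = 0.5
```

Chapters 10 and 16 return to this discrepancy; in the approach of this text the Stratonovich value emerges as the zero-correlation-time limit.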
As a physicist, I (Lax) argue that white noise leads to difficulties because the integrated spectrum, or total energy, diverges. In a real system the spectrum can be nearly flat over a wide range, but it must go to zero eventually to yield a finite energy. For real signals, first derivatives exist, and the ordinary Riemann calculus works in the sense that the limiting result is insensitive to where in the interval the function is evaluated. Thus the Ito calculus can be avoided. One can obtain the correct evaluation at each stage, and then approach the limit in which the spectrum becomes flat out to infinite frequency. We shall see in Chapters 10 and 16 that this limiting result disagrees with Ito's and provides the appropriate result for the ideal Brownian limit.

1.1.3 Subjective probability
Jeffreys (1957) describes subjective probability in his book on Scientific Inference. One is forced in life to assign probabilities to events that may occur only once, so the frequency ratio cannot be used. Also, there may be no obvious elementary events with equal probabilities, e.g.: (1) What is the probability that
Clinton would be reelected? (2) What is the probability that the Einstein theory of general relativity is correct? The use of Bayes' theorem, discussed in Section 1.8, will provide a mechanism for starting with an a priori probability, chosen in a possibly subjective manner, but calculating the new, a posteriori probability that would result if new experimental data becomes available. Although Bayes' theorem is itself irreproachable, statisticians divide into two camps, Bayesians and non-Bayesians. There are, for example, maximum likelihood methods that appear to avoid the use of a priori probabilities. We are Bayesians in that we believe there are hidden assumptions associated with such methods, and it would be better to state one's assumptions explicitly, even though they may be difficult to ascertain.

1.2 Distribution functions

We shall sidestep the above controversy by assuming that for our applications there exists a set of elementary events whose probabilities are equal, or at least known, and shall describe how to calculate the probability associated with compound events. Bertrand's paradox in Appendix 1.B illustrates the clear need for properly choosing the underlying probabilities. Three different solutions are obtained there, in accord with three possible choices of that uniform set. Which choice is correct turns out to be not a question of mathematics but of the physics underlying the measurements.

Suppose we have a random variable X that can take a set S of possible values x_j for j = 1, 2, ..., N. It is then assumed that the probability

P_j = P(X = x_j)

of each event j is known. Moreover, since the set of possible events is complete, and something must happen, the total probability must be unity:

\sum_{j=1}^{N} P_j = 1.

If X is a continuous variable, we take the probability

P(x < X < x + dx) = p(x)\, dx

as given, and assume completeness for the density function p(x) in the form

\int p(x)\, dx = 1.
The "discrete" case can be reformatted in continuous form by writing

p(x) = \sum_j P_j\, \delta(x - x_j),

where δ(x) is the Dirac delta function discussed in Appendix 1.A. It associates a finite probability P_j with the value X = x_j. Since mathematicians (until the time of Schwartz) regarded delta functions as improper mathematics, they have preferred to deal with the cumulative distribution function

F(x) = P(X \le x) = \int_{-\infty}^{x} p(x')\, dx',

which they call a distribution, whereas physicists often use the name distribution for a density function p(x). The cumulative probability replaces delta functions by jump discontinuities, which are regarded as more palatable. We shall only occasionally find it desirable to use the cumulative distribution.
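The replacement of delta functions by jump discontinuities can be made concrete; a minimal Python sketch (fair-die probabilities chosen purely for illustration):

```python
def cumulative(values, probs):
    """Return F(x) = P(X <= x) for a discrete random variable X."""
    pairs = sorted(zip(values, probs))
    def F(x):
        return sum(p for v, p in pairs if v <= x)
    return F

# A fair die: each delta function of weight 1/6 becomes a jump of height 1/6
F = cumulative([1, 2, 3, 4, 5, 6], [1 / 6] * 6)
print(F(3))    # about 0.5, after the third jump
print(F(3.5))  # flat between jumps: same value
print(F(6))    # about 1.0: the total probability
```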
1.3 Stochastic variables

We shall refer to X as a random (or stochastic) variable if it can take a set (discrete or continuous) of possible values x with known probabilities. With no loss of generality, we can use the continuous notation. These probabilities are then required to obey

p(x) \ge 0, \qquad \int p(x)\, dx = \int dF(x) = 1.

Mathematicians prefer the latter form, and refer to the integral as a Stieltjes integral. We have tacitly assumed that X is a single (scalar) random variable. However, the concept can be immediately extended to the case in which X represents a multidimensional object and x represents its possible values.
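A stochastic variable with known probabilities can be simulated directly, and the frequency ratios of Section 1.1.1 then recover the P_j; a small Python sketch (the particular distribution is an arbitrary illustration):

```python
import random

def sample(values, probs, n, seed=0):
    """Draw n independent realizations of X with P(X = values[j]) = probs[j]."""
    rng = random.Random(seed)
    return rng.choices(values, weights=probs, k=n)

values, probs = [1, 2, 3], [0.2, 0.3, 0.5]
draws = sample(values, probs, 100_000)
# The ratio of successes S to trials N approaches each P_j
for v, p in zip(values, probs):
    print(v, p, draws.count(v) / len(draws))
```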
1.4 Expectation values for single random variables

If X is a random variable, the average or expectation of a function f(X) of X is defined by

\langle f(X) \rangle = \sum_j f(x_j)\, P_j

in the discrete case. Thus, it is the value f(x_j) multiplied by the probability P_j of assuming that value, summed over all possibilities. The corresponding statement
in the continuous case is

\langle f(X) \rangle = \int f(x)\, p(x)\, dx.

If the range of x is broken up into intervals [x_j, x_j + \Delta x_j], then the discrete formula, with

P_j = p(x_j)\, \Delta x_j, \qquad x_j + \Delta x_j = x_{j+1},

is simply the discrete approximation to the integral. The most important expectation is the mean value (or first moment)

m = \langle X \rangle = \int x\, p(x)\, dx.

More generally, the nth moment of the probability distribution is defined by:

\langle X^n \rangle = \int x^n\, p(x)\, dx.
The discrete case is included with the help of Eq. (1.7). Note that moments need not necessarily exist. For example, the Lorentz-Cauchy distribution

p(x) = \frac{\gamma/\pi}{(x - m)^2 + \gamma^2}

has the first moment m but no second or higher moments. It may seem to be a tautology, but the choice f(X) = \delta(a - X) yields as expectation value

\langle \delta(a - X) \rangle = \int \delta(a - x)\, p(x)\, dx = p(a),

the probability density itself. Equation (1.65), below, provides one example in which this definition is a useful way to determine the density distribution. In attempting to determine the expectations of a random variable, it is often more efficient to obtain an equation for a generating function of the random variable first. For example, it may be faster to calculate the expectation \langle \exp(itX) \rangle, which includes all the moments \langle X^n \rangle, than to calculate each moment separately.
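For a discrete variable, the characteristic function of the next section can be computed and differentiated numerically; a sketch (fair die and central-difference derivative, both illustrative choices):

```python
import cmath

probs = {x: 1 / 6 for x in range(1, 7)}  # a fair die

def phi(t):
    """Characteristic function phi(t) = <exp(i t X)>."""
    return sum(p * cmath.exp(1j * t * x) for x, p in probs.items())

# phi(0) = 1 and |phi(t)| <= 1
print(abs(phi(0.0)), abs(phi(2.3)))

# The first moment from the derivative at t = 0: <X> = phi'(0)/i
h = 1e-5
mean = ((phi(h) - phi(-h)) / (2 * h) / 1j).real
print(mean)  # close to 3.5
```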
1.5 Characteristic functions and generating functions
The most important expectation is the characteristic function

\phi(t) = \langle e^{itX} \rangle = \int e^{itx}\, p(x)\, dx,

which is the Fourier transform of the probability distribution. Note that φ(t) exists for all real t and has the properties

\phi(0) = 1

and

|\phi(t)| \le 1

for all t. We shall assume that the Stieltjes form of the integral is used if needed. If all moments of Eq. (1.14) exist, then

\phi(t) = \sum_{n=0}^{\infty} \frac{(it)^n}{n!} \langle X^n \rangle

provides a connection between the characteristic function and the moments. This function φ(t) is a so-called generating function of a random variable. A frequently used generating function can be obtained by setting

it = s,

so that

M(s) = \langle e^{sX} \rangle = \phi(-is).

Note that t is not the time, but just a parameter. One could equally well have used k. The variable z is frequently used directly when the range of x is the set of integers x_r = r, and then

P(z) = \langle z^X \rangle = \sum_r P_r z^r
is referred to as a generating function. van Kampen (1982), following Lukacs (1960) and Moran (1968), proved that the inequality, Eq. (1.19), can be sharpened to

|\phi(t)| < 1 \quad \text{for all } t \ne 0,

provided that the variable X is not a lattice variable, whose range of values is given by:

x_n = a + nh, \qquad n = 0, \pm 1, \pm 2, \ldots
REVIEW OF PROBABILITY
If the range of X has an upper bound, i.e., $p(x) = 0$ for $x > x_{\max}$, then it is convenient to deal with the generating function for $s > 0$. If it is bounded from below, one can use the analogous form. When neither condition is obeyed, one may still use these definitions with s understood to be pure imaginary.

1.6 Measures of dispersion

In this section, we shall introduce moments that are taken with respect to the mean. They are independent of the origin, and give information about the shape of the probability density. In addition, there is a set of moments known as cumulants to physicists, or Thiele semi-invariants to statisticians. These are useful in describing the deviation from the normal error curve, since all cumulants above the second vanish in the Gaussian case.

Moments
The most important measure of dispersion in statistics is the standard deviation $\sigma$, defined by
since it describes the spread of the distribution $p(x)$ about the mean value of x, $m = \langle x \rangle$. Chebyshev's inequality
guarantees that the probability of deviations that are a large multiple h of the standard deviation $\sigma$ must be small. Proof
since, over the region of integration $|x - m| \geq h\sigma$, the factor $(x - m)^2$ is at least the full value $(h\sigma)^2$.
The inequality remains valid if we replace the RHS by its smallest possible value
or
The remarkable Chebyshev inequality limits the probability of large deviations with no knowledge of the shape of $p(x)$, only its standard deviation. It is useful in many applications, including the proof by Welsh (1988) of the noisy coding theorem. The variance $\sigma^2$ is the second-order moment about the mean. Higher-order moments about the mean are defined (Kendall and Stuart 1969) by
for moments about the mean, $m = \langle x \rangle$, and to use Eq. (1.14)
for the ordinary moments. Thus $\mu_2 = \sigma^2$, and $\mu_1 = 0$. The binomial expansion of Eq. (1.33) yields
$$\mu_n = \sum_{j=0}^{n} \binom{n}{j} (-m)^{n-j} \mu'_j,$$
where
$$\binom{n}{j} = \frac{n!}{j!\,(n-j)!}$$
is the binomial coefficient. Conversely:
$$\mu'_n = \sum_{j=0}^{n} \binom{n}{j} m^{n-j} \mu_j.$$
In particular:
$$\mu_2 = \mu'_2 - m^2, \qquad \mu_3 = \mu'_3 - 3m\mu'_2 + 2m^3, \qquad \mu_4 = \mu'_4 - 4m\mu'_3 + 6m^2\mu'_2 - 3m^4.$$
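As a quick numerical check of these conversion formulas, the sketch below (the data set is invented for illustration) compares central moments computed directly with those built from the ordinary moments:

```python
# Check the conversion between ordinary moments mu'_k = <x^k> and
# central moments mu_k = <(x - m)^k> on a small data set.
xs = [0.5, 1.0, 2.5, 4.0, 4.5]
n = len(xs)

mp = lambda k: sum(x ** k for x in xs) / n   # ordinary moment mu'_k
m = mp(1)                                    # the mean

mu2_direct = sum((x - m) ** 2 for x in xs) / n
mu3_direct = sum((x - m) ** 3 for x in xs) / n

mu2 = mp(2) - m ** 2                          # mu_2 = mu'_2 - m^2
mu3 = mp(3) - 3 * m * mp(2) + 2 * m ** 3      # mu_3 = mu'_3 - 3 m mu'_2 + 2 m^3

assert abs(mu2 - mu2_direct) < 1e-12
assert abs(mu3 - mu3_direct) < 1e-12
```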
Cumulants
The cumulants to be described in this section are useful since they indicate clearly the deviation of a random variable from that of a Gaussian. They are sometimes referred to as Thiele semi-invariants (Thiele 1903). The cumulants $\kappa_j$ are defined by
Note that normalization of the probability density $p(x)$ guarantees that $\mu'_0 = 1$ and $\kappa_0 = 0$. Equivalently,
By separating a factor exp(imt) the cumulants can be expressed in terms of the central moments:
Thus $\kappa_1 = m$, and the higher $\kappa$'s are expressible in terms of the moments of $(x - m)$. In particular:
Cumulants were introduced as a tool in quantum mechanics by Kubo (1962) in conjunction with a convenient notation for the $\kappa_n$ as linked moments
where the individual linked moments must still be calculated by Eqs. (1.45)-(1.49). However, Eq. (1.43) can be written in a nice symbolic form:
Example The normal error distribution (with mean zero) associated with a Gaussian random variable X,
has the characteristic function
The integral can be performed by completing the square and introducing $x' = x - i\sigma^2 t$ as the new variable of integration. In particular, the cumulants are all determined by $\ln \phi(t)$ to be
The characteristic function, Eq. (1.54), can be rewritten in the form
where X is a Gaussian random variable of mean 0. A Gaussian variable with a nonvanishing mean can be handled by a shift of origin. For any Gaussian variable X with mean $m = \langle X \rangle$ not necessarily 0, we have
where
This is a convenient way to calculate the average of an exponential of a Gaussian random variable. Since the Fourier transform of a Gaussian is a Gaussian, we know the form of the right-hand sides of Eqs. (1.56) and (1.57). The coefficients could have been obtained simply by expanding both sides in powers of t and equating the coefficients of $t^n$ for n = 0, 1, 2.

Skewness and kurtosis

The cumulants describe the probability distribution in an intrinsic way by subtracting off the effects of all lower-order moments. Thus $\langle X^2 \rangle$ has a value that depends on the choice of origin, whereas $\kappa_2 = \sigma^2 = \langle (X - m)^2 \rangle$ describes the spread about the mean. Similarly, $\kappa_3 = \mu_3$ describes the skewness or asymmetry of a distribution, and $\kappa_4 = \mu_4 - 3\mu_2^2$ describes the "kurtosis" of the distribution, that is, the extent to which it differs from the standard bell shape associated with the normal error curve. These measures are usually stated in the dimensionless form
These measures $\gamma_1$ and $\gamma_2$ clearly vanish in the Gaussian case. Moreover, they provide a pure description of shape, independent of horizontal position or scale.
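These dimensionless measures are easy to estimate from samples. The following sketch (with illustrative sample sizes of our choosing) checks that both vanish for Gaussian data regardless of mean and scale, while an exponential variable shows its known skewness 2 and excess kurtosis 6:

```python
import random

random.seed(2)

def shape_measures(xs):
    """Estimate (gamma1, gamma2) = (kappa3 / sigma^3, kappa4 / sigma^4)
    from sample central moments."""
    n = len(xs)
    m = sum(xs) / n
    mu = lambda k: sum((x - m) ** k for x in xs) / n
    mu2, mu3, mu4 = mu(2), mu(3), mu(4)
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2 - 3.0   # kappa4 = mu4 - 3 mu2^2

# Gaussian data: both measures should be near zero, independent of the
# mean (5.0) and scale (2.0) chosen here.
gauss = [random.gauss(5.0, 2.0) for _ in range(100_000)]
g1, g2 = shape_measures(gauss)

# Exponential data: a strongly non-Gaussian shape.
expo = [random.expovariate(1.0) for _ in range(100_000)]
e1, e2 = shape_measures(expo)
```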
1.7 Joint events
Suppose that we have two random variables X and Y, described (when taken separately) by the probability densities $p_1(x)$ and $p_2(y)$ respectively. The probability that X is found in the interval (x, x + dx) and at the same time Y is found in the interval (y, y + dy) is described by the joint probability density. Define $p_1(x)\,dx$ as the probability of finding X in (x, x + dx) regardless of the value of Y. Then we can write
which is referred to as the marginal distribution of x. Conversely,
Example Two points x and y are selected, at random, uniformly on the line from 0 to 1. (a) What is the density function $p(\xi)$ of the separation $\xi = |x - y|$? (b) What is the mean separation? (c) What is the root mean squared separation $[\langle \xi^2 \rangle - \langle \xi \rangle^2]^{1/2}$? (d) What is $\langle W(|x - y|) \rangle$ for an arbitrary function W? Solution It is necessary to map the square with vertices at the four points in the (x, y) plane to the corresponding points in the (u, v) plane, by a transformation whose Jacobian is unity (see Fig. 1.1). In the (u, v) plane, the four points are at
Using Eq. (1.62) and Eq. (1.64), the density function $p(\xi)$ is then given by
Fig. 1.1. Transformation from x, y variables to u, v variables. Note that our use of a delta function to specify the variable we are interested in is one of our principal tools. It fulfills our motto that experimentalists do it with mirrors, and theorists do it with delta functions (and Green's functions). Since the solution (1.65) is even in u, we can integrate over half the interval and double the result:
We can verify that (a) Normalization
(b) The average separation between points is
(c) The mean square separation is
(d) The average of any function W of the separation \X — Y\ is
where the last integral, with value unity, was inserted as a means of introducing the variable $\xi$. Rearranging the order of integration, we get for the right
Fig. 1.2. The events A and B are nonoverlapping, and the probability of at least one of these occurring is the sum of the separate probabilities.
hand side:
where the second integral is simply the definition of $p(\xi)$. Restoring the left-hand side, we have established another tautology:
where $W(\xi)$ is an arbitrary function and $p(\xi)$ was given in Eq. (1.66). Finally we obtain an explicit formula
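The results of this example are easy to verify by simulation. The sketch below draws pairs of uniform points and checks the mean separation 1/3, the mean square 1/6, and the cumulative weight of $p(\xi) = 2(1 - \xi)$ below $\xi = 1/2$, which is 3/4:

```python
import random

random.seed(3)

# Monte Carlo check of the separation density p(xi) = 2(1 - xi)
# for two points dropped uniformly on [0, 1].
n = 400_000
seps = [abs(random.random() - random.random()) for _ in range(n)]

mean = sum(seps) / n                     # exact value: 1/3
mean_sq = sum(s * s for s in seps) / n   # exact value: 1/6
# Fraction with xi < 1/2: integral of 2(1 - xi) from 0 to 1/2 = 3/4.
frac_half = sum(s < 0.5 for s in seps) / n
```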
Disjoint events
If A and B are disjoint events (events that cannot both occur), then $P(A \cup B) = P(A) + P(B)$, where $A \cup B$ means the union of A and B, that is, at least one of the events A or B has occurred. In the language of set theory the intersection, $A \cap B$, is the region in which both events have occurred. For disjoint sets such as those shown in Fig. 1.2, the intersection vanishes.

Overlapping events
The probability that at least one of two events A and B has occurred when overlap is possible is $P(A \cup B) = P(A) + P(B) - P(A \cap B)$, because the sum of the first two terms counts twice the probability $P(A \cap B)$ that both events occur (the shaded area in Fig. 1.3).
Fig. 1.3. Events A and B that overlap are displayed. The hashed overlap region is called the intersection and denoted $A \cap B$. Note that an event A or B need not be elementary. For example, A could represent the tossing of a die with an odd number appearing, which is the union of the events a one, a three, or a five appearing. B could be the union of the events one and two. Suppose $Y_j$ is a random variable that takes the value 1 if event $A_j$ occurs, and zero otherwise. We have thus introduced a projection operator, that is, the analogue for discrete variables of our introduction of a delta function for continuous variables. The probability that none of the events $A_1, A_2, \ldots, A_n$ has occurred can be written as
The probability that one or more events has occurred can be written as a generalization of Eq. (1.75):
The use of projection operators such as $Y_j$ is often a convenient tool in solving probability problems.

Uncorrelated or independent random variables
Two random variables X and Y are said to be uncorrelated if
The variables are said to be independent if
It is clear that two independent variables are necessarily uncorrelated. The converse is not necessarily true.

1.8 Conditional probabilities and Bayes' theorem

If X and Y are two not necessarily independent variables, the conditional probability density $P(y|x)\,dy$ that Y takes a value in the range [y, y + dy], given that X has the value x, is defined by
where $P(x, y)$ is the joint probability density of x and y and
is the probability density of x if no information is available about y. Of course, Eq. (1.80) is also valid with the variables x and y interchanged so that
The notation in which the conditioned variables appear on the right is common in the mathematical literature. It is also consistent with quantum mechanical notation, in which one reads the indices from right to left. Thus verbally we say that the probability that X and Y take the values x and y is the probability that X takes the value x, times the probability that Y will take the value y knowing that X has taken the value x, a conclusion that now appears obvious. Equation (1.82) is a general equation that imposes no requirements on the nature of the random variables. Moreover, the same idea applies to events A and B which may be more complicated than single random variables. Thus
is valid if B represents the event $X_n = x_n$ and A represents the compound event $X_1 = x_1, X_2 = x_2, \ldots, X_{n-1} = x_{n-1}$. Thus
Suppose that $A^c$ is the complementary event to A (anything but A). Then these events are mutually exclusive and exhaustive:
Then the events $A \cap B$ and $A^c \cap B$ are mutually exclusive and their union is B. Thus
By the same argument, if the set of events $A_j$ are mutually exclusive, $A_i \cap A_j = \emptyset$ for $i \neq j$, and exhaustive,
then Eq. (1.87) generalizes to
which describes the probability of an event B in terms of possible starting points, or hypotheses $A_j$.

Bayes' theorem

One can determine the probability of a hypothesis $A_j$ if we have made a measurement B. This conditional probability $P(A_j|B)$ is given by Bayes' theorem
The first equality follows directly from the definition Eq. (1.80) of a conditional probability. The second equality is obtained by inserting Eq. (1.88). The importance of Bayes' theorem is that it extracts the a posteriori probability, $P(A_j|B)$, of a hypothesis $A_j$ after the observation of an event B from the a priori probability $P(A_j)$ of the hypothesis $A_j$. For simple systems like the tossing of a die, the a priori probabilities are known. In more general problems they have to be estimated, possibly as subjective probabilities. Bayesians believe that this step is necessary. Anti-Bayesians do not. They try to use another approach, such as maximum likelihood. In our opinion this approach is equivalent to making a tacit assumption for the a priori probabilities. We would prefer explicit assumptions. Bernstein (1998) notes that Thomas Bayes (1763), an English minister, published no mathematical works while he was alive. But he bequeathed his manuscripts, in his will, to a preacher, Richard Price, who passed them to another member of the British Royal Society, and his paper Essay Towards Solving a Problem in the Doctrine of Chance was published two years after his death. Although Bayes' work was still ignored for two decades after his death in 1761, he has since become famous among statisticians, and social and physical scientists.
Example It is known that of 100 million quarters, 100 are two-headed. Thus the a priori probability of a coin being two-headed is $10^{-6}$. A quarter is selected at random from this population and tossed 10 times. Ten heads are obtained. What is the probability that this coin is two-headed? Solution Let $A_1$ = two-headed, $A_2 = A_1^c$ = fair coin, and B = ten heads as the tossing result. We have
Then,
Thus observing 10 heads in a row has caused the a priori probability, $10^{-6}$, of a bad coin to increase to about $10^{-3}$. The point of this problem, as a Bayesian would say, is that one can never calculate the a posteriori probability of a hypothesis without the use of Bayes' theorem with a choice of a priori probability, possibly determined by some subjective means. Example Two points are chosen at random in the interval [0, 1]. They are connected by a line. Two more points are then chosen over the same interval and connected by a second line. What is the probability that the lines overlap? Solution We will choose the complementary question, which is easier to answer: what is the probability that they do not overlap? Suppose the first two points are x and y. No overlap will occur if the next two points are both left of the smaller of x, y or both right of the larger of x, y. By symmetry, the second probability is the same as the first.
Suppose x is the smaller of x and y. The probability that the third point is less than x is x. The probability that the fourth point is less than x is also x. The probability that both are less than x is $x^2$. What is the probability density $P(x = \xi)$ given that $x < y$? This conditional probability is
where H(x) is the Heaviside step function, H(x) = 1 if x > 0 and H(x) = 0 otherwise. We can evaluate the denominator in Eq. (1.91):
Thus the conditional probability is given by
The probability of both to the left is then $\int_0^1 \xi^2 \cdot 2(1 - \xi)\, d\xi = 1/6$. Therefore the probability of no overlap is $1/6 \times 2 = 1/3$ and the probability of overlap is $1 - 1/3 = 2/3$.
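This answer is easy to check by direct Monte Carlo simulation (a sketch; the helper name `overlap` is ours):

```python
import random

random.seed(5)

def overlap(a1, b1, a2, b2):
    """True if segment [min(a1,b1), max(a1,b1)] meets [min(a2,b2), max(a2,b2)]."""
    lo1, hi1 = min(a1, b1), max(a1, b1)
    lo2, hi2 = min(a2, b2), max(a2, b2)
    return lo1 <= hi2 and lo2 <= hi1

n = 200_000
hits = sum(
    overlap(random.random(), random.random(),
            random.random(), random.random())
    for _ in range(n)
)
p_overlap = hits / n   # exact answer: 2/3
```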
1.9 Sums of random variables
The characteristic function of the sum of two random variables, Z = X + Y, is
If the variables are independent, then the averages over x and y can be performed separately with the result
Because the cumulants are defined in terms of the logarithm of the characteristic function, the cumulants are additive:
More generally, if
and the $X_i$ are independent variables,
The characteristic function of the joint distribution, p(x,y), of the two random variables is defined by
If X and Y are two independent Gaussian random variables, so is their linear combination. By Eq. (1.56), we then have
where the deviation variables are defined by
If these variables X, Y are uncorrelated, i.e., $\langle XY \rangle - \langle X \rangle \langle Y \rangle = 0$, then the characteristic function factors:
Since p(x, y) can be obtained by taking the inverse Fourier transform of $\phi(s, t)$, it too must factor. Hence we arrive at the result: if two Gaussian variables are uncorrelated, they are necessarily independent.

Bernoulli trials and the binomial distribution
Bernoulli considered a set of independent trials each with probability p of success and q = 1 — p of failure. In n trials, the probability that the first r trials are a success and the remainder are failures is
If we ask for the probability that there are r successes in n trials without regard to order, the probability will be
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$
is the number of ways r objects can be drawn from n objects without regard to order. We shall show how to derive Eqs. (1.105) and (1.106) without knowing the combinatorial coefficients, by using the characteristic function. If we define a random variable S that takes the value 1 for a success and 0 for a failure, then the characteristic function in a single trial is
so that the characteristic function for n independent trials by Eq. (1.99) is
With $z = e^{it}$, the generating function can be expanded using the binomial theorem
Since the coefficient of $z^r$ in the generating function is $P_r$ [or $P_r(n)$ with n fixed and r variable], we have established that
Comparison with Eq. (1.105) shows that the combinatorial and binomial coefficients are equal
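The generating-function derivation can be mirrored numerically: multiplying out $(q + pz)^n$ one trial at a time reproduces the binomial probabilities without ever invoking the combinatorial coefficients. A sketch (parameters chosen for illustration):

```python
from math import comb

p, n = 0.3, 8
q = 1.0 - p

# Build the generating function (q + p z)**n by repeated polynomial
# multiplication; coeffs[r] is the coefficient of z**r.
coeffs = [1.0]
for _ in range(n):
    new = [0.0] * (len(coeffs) + 1)
    for j, c in enumerate(coeffs):
        new[j] += q * c        # failure: power of z unchanged
        new[j + 1] += p * c    # success: one more power of z
    coeffs = new

# Compare with the closed form P_r = C(n, r) p**r q**(n - r).
direct = [comb(n, r) * p ** r * q ** (n - r) for r in range(n + 1)]
```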
With the abbreviation $\theta = it$, Eq. (1.108) can be rewritten as
from which we deduce the cumulants to be
The measures of skewness and kurtosis were given in Eq. (1.58)
The higher order dimensionless ratios, j > 2, depend on j roughly as,
So all vanish as n —> oo for j > 2. Thus as n —> oo,
becomes a Gaussian random variable of mean zero and variance equal to unity. These statements are heuristic. They tacitly assume what is known as the continuity theorem of probability, discussed by E. Parzen (1960) in his Section 10.3. This theorem states that the cumulative distribution function $P_n$ and the characteristic function $\phi_n$ are related in a continuous way. $P_n$ converges at all points of continuity of P if and only if the sequence of characteristic functions $\phi_n(t)$ converges at each real t to the characteristic function
(where $X_j = 1$ for a success and 0 for a failure in an individual trial) approaches a Gaussian random variable of mean zero and variance one, is a special case of the central limit theorem. The latter theorem applies to any set of independent random variables $X_j$ (not necessarily identically distributed), under rather mild conditions. Here
is the mean, and
is the variance of the sum variable $\sum_j X_j$. The Laplace-Liapounoff form of the central limit theorem uses as a sufficient condition that
tends to 0 as $n \to \infty$ for some $\delta > 0$. Then the cumulative probability, $P_n(u)$, that
tends uniformly to the limit
See for example Uspensky (1937), Chapter 14. The condition (1.120) is less stringent than the set of conditions in Eq. (1.115).

The Poisson distribution
If $n \to \infty$ for fixed p, the binomial distribution approaches the Gaussian distribution, which is an example of the central limit theorem. Another distribution, appropriate for rare events, may be obtained by letting $n \to \infty$, $p \to 0$ with a fixed product np, or fixed mean value. As $n \to \infty$, we can utilize Stirling's asymptotic approximation for the factorial function to obtain the limiting behavior for large n
Then the binomial distribution
As $p \to 0$ and $n \to \infty$ (for fixed r)
Thus the binomial distribution in the rare event limit approaches the Poisson distribution:
The associated generating function
yields the corresponding characteristic function
The cumulants then take the remarkably simple single value for all s
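A numerical sketch of this rare-event limit, comparing binomial(n, a/n) with Poisson(a) as n grows (the values of a and n are chosen for illustration):

```python
from math import comb, exp, factorial

a = 2.0   # the fixed mean n p
diffs = []
for n in (10, 100, 10_000):
    p = a / n
    # Largest discrepancy between the binomial and Poisson
    # probabilities over the first few counts r.
    diffs.append(max(
        abs(comb(n, r) * p ** r * (1 - p) ** (n - r)
            - exp(-a) * a ** r / factorial(r))
        for r in range(8)
    ))
# diffs shrinks as n grows with n p = a held fixed.
```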
A completely different approach to the Poisson process is given in Section 3.2.

1.10 Fitting of experimental observations
Linear fit

Suppose we expect, on theoretical grounds, that some theoretical variable Y should be expressible as a linear combination of the variables $X^\mu$. We wish to determine the coefficients $a_\mu$ of this linear expansion in such a way as to best fit a set of experimental data, i = 1 to N, of values $X_i^\mu$, $Y_i$. We can choose to minimize the least squares deviation between the "theoretical" value of $Y_i$
and the observed value Yi by minimizing the sum of the squares of the deviations between experiment and theory:
The number of measurements, N, is in general much larger than the number, s, of unknown coefficients, $a_\mu$. The expected value of F is then
where
We can minimize by differentiating $\langle F \rangle$ with respect to $a_\lambda$. After removing a factor of two, we find that the $a_\nu$ obey a set of linear equations:
With $M_{\lambda\nu} = \langle X^\lambda X^\nu \rangle$ and $B_\lambda = \langle X^\lambda Y \rangle$, we have the matrix equation $Ma = B$, which provides a least squares fit to the data by a linear function of the set of individual random variables. The logic behind this procedure is that the measurement of $Y_i$ is made subject to an experimental error that we treat as a Gaussian random variable $\epsilon_i$. Errors in the points $X_i^\mu$ of measurement are assumed negligible, and for simplicity, we assume $\langle \epsilon_i \epsilon_j \rangle = \delta_{ij}\sigma^2$, with $\sigma^2$ the common variance of the errors.
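A minimal sketch of the normal equations $Ma = B$ for a two-parameter fit $y \approx a_0 + a_1 x$ (the data points below are invented for illustration):

```python
# Solve the normal equations M a = B for a two-parameter linear fit,
# with M_{lambda nu} = sum_i X_lambda,i X_nu,i and
# B_lambda = sum_i X_lambda,i Y_i.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.9]   # roughly y = 1 + 2x

# Basis "variables": X_0 = 1 (intercept), X_1 = x.
basis = [[1.0] * len(xs), xs]

M = [[sum(u * v for u, v in zip(basis[i], basis[j])) for j in range(2)]
     for i in range(2)]
B = [sum(u * y for u, y in zip(basis[i], ys)) for i in range(2)]

# 2x2 solve by Cramer's rule.
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
a0 = (B[0] * M[1][1] - M[0][1] * B[1]) / det
a1 = (M[0][0] * B[1] - M[1][0] * B[0]) / det
```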
If one expects a nonlinear relation of the form
where the $a_\lambda$ are the parameters of the fit, then it is customary to attempt to minimize the sum of the squares of the deviations
Although it is, in principle, possible to minimize by differentiating F with respect to the parameters $a_\lambda$, an explicit solution for the latter is, in general, no longer
possible. It is then customary to use a nonlinear least squares fitting program to obtain the best set of parameters, such as NLLSQ or Taurus, N2F or N2G from the PORT library, UNCMIN (unconstrained minimization) by Kahaner, Moler, and Nash (1989), or the Marquardt method (MRQMIN) from the Numerical Recipes library by Press et al. (1992).

Chi-square test of goodness of fit
A common problem consists in deciding whether a set of experimental data is consistent with a given distribution $p(x)$. The common procedure is to decompose the line $-\infty < x < \infty$ into a finite set of k regions, so that in region j there is a probability
of $a_j < x < b_j$, and an expected number $\langle n_j \rangle = np_j$ of events in that interval. We then observe an actual set of $n_j$'s. Are these observations compatible with the assumed $p(x)$ distribution? Karl Pearson (1900) established that the $n_j$'s are described by a multivariate Gaussian of the form
where
but these k variables are subject to one constraint
Such a constraint can be handled by a rotation of coordinates to a new set of variables $y_1, y_2, \ldots, y_k$ such that the last variable is given by
Then our distribution will be
where
The variable $\chi^2$ (referred to as chi-square) describes the deviation between the hypothetical distribution $np_j$ and the actual distribution $n_j$. If $\chi^2$ is too large, we can reject the assumption that the original distribution $\{p_j\}$ is correct. Since Pearson's proof may be difficult to obtain in a library, we mention that a later text by Alexander (1961) summarizes Pearson's proof in a manner adapted from the classic text by Fry (1928). How is the chi-square test used to test goodness of fit? It is clear that when chi-square is small, the hypothesis and the experimental results are compatible. When chi-square is large, however, either the observation is a large fluctuation of low probability, or the original hypothesis is incorrect. When is chi-square $\chi^2$ to be regarded as large, and when as small? For this purpose, one uses the cumulative distribution
that describes the probability of observing data leading to a chi-square larger than the observed value. A fairly general but somewhat arbitrary convention is to use 5% as the dividing line between small and large. Thus a deviation large enough to have a less than 5% chance of being observed (if the hypothesis is correct) is used to cast suspicion on the validity of the hypothesis. Actually, this procedure gives no probability of correctness of the hypothesis, only the probability of observing the given event if the hypothesis is assumed correct. As Bayesians, we claim that one needs an a priori probability of correctness to deduce the a posteriori probability of correctness with the help of Bayes' theorem. With only qualitative a priori information, one could reduce the 5% level, say to 1%, if one has peripheral information leading one to have some faith in the hypothesis. Can chi-square be too small? There was an article in Science magazine within the last 10 years that argued that the experimental results supporting Coulomb's inverse square law had such a small chi-square that the fit was too good and that the experimental data was doctored.

The $\chi^2$ (chi-square) distribution
How large is too large? For this purpose, we need the distribution of values of $z = \chi^2$ in Eq. (1.149). Kendall and Stuart (1969) in Section 1.11.2 consider the distribution, with n = k - 1,
They then transform Eq. (1.150) to z and n — 1 angles. After integrating over the angles and ensuring normalization, they arrive at the distribution
As theoretical physicists, we shall achieve the same result with $\delta$ functions. This avoids the need to know the transformation to spherical coordinates in an n-dimensional space. The distribution function of z (whose integral is manifestly one) can be defined by
If we let
we obtain
where $K_n$ is a number defined by
It is trivial to evaluate Kn by the requirement that
The result is
This result is used in Appendix 1.A to determine the volume and surface area of a sphere in n dimensions. Let us consider a simple dice throwing example from Alexander (1991). Example
A single die is tossed n = 100 times. The frequency with which each side of the die is observed is indicated in Table 1.1 below. Calculate chi-square from this table. Solution In this case, all $p_i = 1/6$, so $np_i = 100/6 = m$. There are six possible frequencies, but there is one constraint: the sum must add to 100. Thus there are
TABLE 1.1.
Value:  1   2   3   4   5   6
Freq:  18  18  17  13  18  16
k = 5 degrees of freedom to be used in looking up the chi-square table. The value of chi-square is
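This value can be computed directly from the counts in Table 1.1; a short Python sketch:

```python
# Chi-square for the die-tossing data of Table 1.1: observed
# frequencies for faces 1..6 in n = 100 tosses, expected n p_i = 100/6.
observed = [18, 18, 17, 13, 18, 16]
expected = 100 / 6

chi2 = sum((n_j - expected) ** 2 / expected for n_j in observed)
# chi2 evaluates to 1.16 for these counts, well below the 5% cutoff
# of 11.070 for 5 degrees of freedom.
```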
Is this large? No. Indeed, the 5% acceptance level permits fluctuations as large as 11.070 in chi-square when the number of degrees of freedom is only 5.

1.11 Multivariate normal distributions

We can write a multivariate Gaussian distribution in the form
where x denotes a vector with n components $x_1, x_2, \ldots, x_n$, x' denotes the transposed vector, and A denotes an $n \times n$ matrix which can be chosen to be symmetric. When the matrix $A_{ij}$ possesses off-diagonal terms, the components of x are correlated. The normalization factor N is determined by
Alternatively, with the definition of the multivariate characteristic function
The normalization factor N can be determined by the requirement:
Since any symmetric real matrix A can be diagonalized by an orthogonal similarity transformation, we can set
and the resulting matrix is diagonal. Thus
Because the transformation is orthogonal, its Jacobian is unity. The normalization requirement is then
But
Thus The characteristic function can be evaluated by completing the square in the usual manner:
where
By changing to $x - c$ as a variable of integration, the characteristic function is found to be
The multivariate distribution, Eq. (1.158), that we started with has all means equal to zero, $\langle X \rangle = 0$.
The slightly generalized distribution
has the mean and the characteristic function
As in the univariate case, the exponent is a quadratic form and all cumulants of order higher than 2 vanish. Moreover, the second moment of the distribution is given directly by the coefficients of the quadratic terms in the exponent. In particular, with
Writing $V = A^{-1}$, which is the variance matrix,
is the distribution, and
is the characteristic function. For the bivariate case, we can write
where $\sigma_i^2$ are the variances and
is the correlation coefficient.
The inverse matrix is given by
so that
with characteristic function
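The stated inverse of the bivariate variance matrix can be checked by direct multiplication; a sketch with arbitrary parameter values:

```python
# Check the inverse of the bivariate variance matrix:
# V = [[s1^2, r s1 s2], [r s1 s2, s2^2]], and
# A = V^{-1} = (1/(1 - r^2)) [[1/s1^2, -r/(s1 s2)], [-r/(s1 s2), 1/s2^2]].
s1, s2, r = 1.5, 0.7, 0.4   # arbitrary variances and correlation

V = [[s1 * s1, r * s1 * s2], [r * s1 * s2, s2 * s2]]
f = 1.0 / (1.0 - r * r)
A = [[f / (s1 * s1), -f * r / (s1 * s2)],
     [-f * r / (s1 * s2), f / (s2 * s2)]]

# A V should be the identity matrix.
prod = [[sum(A[i][k] * V[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
```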
1.12 The laws of gambling
Introduction

In this section we propose two laws of gambling that appear to contradict one another. We shall state them loosely first to demonstrate the apparent contradiction: the first law of gambling states that no betting scheme, i.e., method of varying the size of bets, can change one's expected winnings; the second law of gambling states that if you are betting against a slightly unfair "house" there is a way to arrange one's bets to maximize the probability of winning.

The first law
Naive notions of betting schemes are common. The simplest example occurs in coin tossing. If the initial bet is one dollar and you lose, double your bet. If you win, you are ahead one dollar. If you lose, you have lost three dollars. Double your bet again. If you win the four dollars, you are ahead one dollar. If you lose, double again. Eventually you must win. Ergo you are sure to win one dollar. There are schemes based on more complicated betting arrangements. Can any of them work? The first law says no. Restatement of the first law: in a Bernoulli series of independent trials, with equal probability p of winning any individual event, the expected winnings are proportional to the total sum of money bet, independent of how the individual bets are chosen. Proof
Let d be the odds given, so that if $b_r$ is the amount bet on the rth trial, the loss is $b_r$ on failure and the amount won on success is $b_r d$. If S is the sum won, its expected
value is
where
is the total amount bet, $q = 1 - p$, and
is a measure of the game's unfairness. In a fair game, the odds would be $d = q/p$ and the expected winnings remain zero regardless of the choice of the $b_r$. In any case, the expected winnings depend on the total bet B, not on how the bets were distributed. Another fallacy with the scheme of doubling one's bets is that it presumes the bettor has infinite capital. The problem of winning is reformulated in the second law of gambling.

The second law
The problem of winning is redefined as follows: A bettor starts with capital C and an objective of winning an amount W. The probability of winning is the probability of increasing one's capital to C + W before losing all one's capital. The bank, or casino, is assumed to have an infinite amount of capital. This is a random walk problem with absorbing boundaries, also a first passage-time problem. With this definition, the best betting scheme is the one that minimizes the total amount bet. The probability of winning is given by
where $\epsilon = q - dp$ is the degree of unfairness of the bet, and B is the total amount bet. Proof
If P is the (unknown) probability of winning, and $Q = 1 - P$, the expected winnings are
By the first law of gambling, E is independent of the gambling scheme and is given by $E = -\epsilon B$, as in Eq. (1.184). Setting $Q = 1 - P$ and solving for P, we obtain the desired expression Eq. (1.187).
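The first law is easy to see in a small simulation (parameters invented for illustration): for any fixed betting scheme, the winnings per unit bet approach $-\epsilon = -(q - dp)$.

```python
import random

random.seed(12)

p, d = 0.5, 0.9          # slightly unfair house: payoff 0.9 per unit bet
q = 1.0 - p
eps = q - d * p          # degree of unfairness, 0.05 here

def run(bets, trials):
    """Play the fixed sequence of bets `trials` times; return
    (total winnings, total amount bet)."""
    total_won, total_bet = 0.0, 0.0
    for _ in range(trials):
        for b in bets:
            total_bet += b
            total_won += b * d if random.random() < p else -b
    return total_won, total_bet

ratios = []
for bets in ([1, 1, 1, 1], [1, 2, 4, 8]):   # flat vs. doubling scheme
    won, B = run(bets, 100_000)
    ratios.append(won / B)   # both schemes approach -eps
```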
Corollary
In a fair game, the probability of winning is
independent of any betting scheme.

Conjecture
Consistent with the restriction that one should never bet more than necessary to win the game in a single step, the best strategy is to make a sufficiently large bet that one can win in a single try. We assume the game is unfair, and this procedure is designed to minimize the total bet. Suppose bets are available with odds up to d = W/C. Then one should make a single bet of one's entire capital C, at these odds. The probability of winning in this single step is p, which by Eq. (1.186) can be expressed in terms of the available odds as
Any betting scheme has the probability, P, of Eq. (1.187), of winning. If the odds are lower, several bets will have to be made, and we will have B > C, thus
Suppose odds up to $d_1 = W/(C/2)$ are available. Then one can bet C/2 on the first bet and win with probability
The best one can do, then, is to stop if one wins on the first bet, and to bet C/2 again if one loses. The expected amount bet is then
so that the probability P of winning is better than that in the first scheme, Eq. (1.190). It is clear that if the degree, $\epsilon$, of unfairness is the same at all odds, it is favorable to choose the highest odds available and bet no more than necessary to achieve C + W.
1.13 Appendix A: The Dirac delta function
Point sources enter into electromagnetic theory, acoustics, circuit theory, probability and quantum mechanics. In this appendix, we shall attempt to develop a convenient representation for a point source, and establish its properties. The results will agree with those simply postulated by Dirac (1935) in his book on quantum mechanics, and called delta functions, or "Dirac delta functions", in the literature. What are the required properties of the density associated with a point source? The essential property of a point source density is that it vanishes everywhere except at the point. There it must go to infinity in such a manner that its integral over all space (for a unit source) must be unity. We shall start, in one dimension, by considering a source function $\delta(\epsilon, x)$ of finite size $\epsilon$, which is small for $x \gg \epsilon$, and of order $1/\epsilon$ for $x < \epsilon$, such that the area for any $\epsilon$ is unity:
All of these requirements can be fulfilled if we choose
where g(y) is an integrable mathematical function that describes the "shape" of the source, whereas ε describes its extent. Examples of source functions which can be constructed in this way are
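As a numerical illustration (with shapes of our own choosing, not necessarily the text's displayed examples), a normalized Gaussian or Lorentzian divided by its width serves as such a source function, and its area stays at unity for every ε:

```python
import math

# Two finite-size source functions delta(eps, x) = g(x/eps)/eps, with g a
# normalized Gaussian or Lorentzian (our choice of shapes). Their area is
# unity for every eps, while their support shrinks as eps -> 0.
def delta_gauss(eps, x):
    return math.exp(-(x / eps) ** 2) / (eps * math.sqrt(math.pi))

def delta_lorentz(eps, x):
    return (eps / math.pi) / (x * x + eps * eps)

def area(f, eps, lim=20.0, n=80001):
    """Riemann-sum approximation to the integral of f(eps, .) over [-lim, lim]."""
    h = 2.0 * lim / (n - 1)
    return h * sum(f(eps, -lim + i * h) for i in range(n))

for eps in (1.0, 0.1, 0.01):
    print(eps, area(delta_gauss, eps), area(delta_lorentz, eps))
```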
A problem involving a point source can always be treated using one of the finite sources, δ(ε, x), by letting ε → 0 at the end of the calculation. Many of the steps in the calculation (usually integrations) can be performed more readily if we can let ε → 0 at the beginning of the calculation. This will be possible provided that the results are independent of the shape g(y) of the source. Only if this is the case, however, can we regard the point source as a valid approximation in the physical problem at hand. The limiting process ε → 0 can be accomplished at the beginning of the calculation by introducing the Dirac delta function
This is not a proper mathematical function because the shape g(y) is not specified. We shall assume, however, that it contains those properties which are common to all shape factors. These properties can be used in all problems in which the point source is a valid approximation. It is understood that the delta function will be used in an integrand, where its properties become well defined. The most important property of the δ function is
for a < b < c, and zero if b is outside these limits. Setting b = 0, for simplicity of notation, we can prove this theorem in the following manner:
In the first term, the limit as ε → 0 can be taken. The integral over g(y) is then 1 if a < 0 < c, since the limits then extend from −∞ to ∞, and 0 otherwise, since both limits approach plus (or minus) infinity. The result then agrees with the desired result, Eq. (1.202). The second integral can be easily shown to vanish under mild restrictions on the functions f and g. For example, if f is bounded and g is positive, the limits can be truncated to fixed finite values, say a′ and c′ (to any desired accuracy), since the integral converges. Then the limit can be performed on the integrand, which then vanishes. The odd part of the function makes no contribution to the above integral, for any f(x). It is therefore customary to choose g(y), and hence δ(x), to be even functions
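The sifting property can be checked numerically with a finite-width Gaussian source; this sketch assumes nothing beyond the definitions above:

```python
import math

# Numerical check of the sifting property: the integral of
# f(x) * delta_eps(x - b) tends to f(b) as eps -> 0.
def delta_eps(eps, y):
    return math.exp(-(y / eps) ** 2) / (eps * math.sqrt(math.pi))

def sift(f, b, eps, lim=10.0, n=100001):
    h = 2.0 * lim / (n - 1)
    return h * sum(f(-lim + i * h) * delta_eps(eps, -lim + i * h - b)
                   for i in range(n))

for eps in (0.5, 0.1, 0.01):
    print(eps, sift(math.cos, 0.3, eps), "->", math.cos(0.3))
```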
The derivative of a delta function can be defined by
where the limiting process has a similar meaning to that used in defining the delta function itself. An integration by parts then yields the useful relation
when the range of integration includes the singular point. The indefinite integral over the delta function is simply the Heaviside unit function H(x)
With g(y) taken as an even function, its integral from negative infinity to zero is one half, so that the appropriate value of the Heaviside unit function at the origin is H(0) = 1/2. Conversely, it is appropriate to think of the delta function as the derivative of the Heaviside unit function, δ(x) = dH(x)/dx.
The derivative of a function possessing a jump discontinuity can always be written in the form
The last term can be rewritten in the simpler form
since
and Eq. (1.211) is clearly valid underneath an integral sign in accord with Eq. (1.202). As a special case, Eq. (1.211) yields
provided that it is integrated against a factor not singular at x = 0. Thus we can write, with c an arbitrary constant,
or
Thus the reciprocal of x is undefined up to an arbitrary multiple of the delta function. A particular reciprocal, the principal valued reciprocal, is customarily defined by its behavior under an integral sign,
Thus a symmetric region [−ε, ε] is excised by the principal valued reciprocal before the integral is performed. The function x/(x² + ε²) behaves as 1/x for |x| ≫ ε and deemphasizes the region near x = 0. Thus, in the limit, it reduces to the principal valued reciprocal. The combination
is important in the theory of waves (including quantum mechanics) because it enters the integral representation for "outgoing" waves. Its complex conjugate is used for incoming waves.

Behavior of delta functions under transformations

Because the delta function is even, we can write
Thus we can make the transformation y = |a|x to obtain
This can be restated as a theorem
The denominator is simply the magnitude of the Jacobian of the transformation to the new variable. A natural generalization to the case of a nonlinear transformation y = f(x) is given by
where the x_r are the nondegenerate roots of f(x). This theorem follows from the fact that a delta function vanishes everywhere except at its zeros, and near each zero we can approximate f(x) ≈ f′(x_r)(x − x_r) and apply Eq. (1.220).
A simple example of the usefulness of Eq. (1.222) is the relation
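The relation in question is presumably the standard identity δ(x² − a²) = [δ(x − a) + δ(x + a)]/(2|a|); a numerical sketch with a finite-width delta confirms it:

```python
import math

# Numerical sketch of the root formula for f(x) = x^2 - a^2, whose
# nondegenerate roots are x = +-a with |f'(+-a)| = 2|a|, so that
# delta(x^2 - a^2) = [delta(x - a) + delta(x + a)] / (2|a|).
def delta_eps(eps, y):
    return math.exp(-(y / eps) ** 2) / (eps * math.sqrt(math.pi))

def smeared_integral(g, a, eps, lim=10.0, n=400001):
    h = 2.0 * lim / (n - 1)
    return h * sum(g(x) * delta_eps(eps, x * x - a * a)
                   for x in (-lim + i * h for i in range(n)))

a = 2.0
lhs = smeared_integral(math.exp, a, 1e-3)       # integral of e^x delta(x^2 - a^2)
rhs = (math.exp(a) + math.exp(-a)) / (2.0 * a)  # [g(a) + g(-a)] / (2|a|)
print(lhs, rhs)
```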
Multidimensional delta functions
The concept of a delta function generalizes immediately to a multidimensional space. For example, a three-dimensional delta function has the property that
If the integral is written as three successive one-dimensional integrals in Cartesian coordinates, Eq. (1.226) is equivalent to the statement
If spherical coordinates are used, Eq. (1.226) is equivalent to
The denominator is just the Jacobian for the transformation from Cartesian to spherical coordinates. This is the natural generalization of the Jacobian found in Eq. (1.221), and guarantees that the same result, Eq. (1.226), is obtained regardless of which coordinate system is used.

Volume of a sphere in n dimensions
We shall now show how to calculate the volume of a sphere in n dimensions with the help of delta functions and probability theory alone! The volume of an n-dimensional sphere of radius R can be written
where the Heaviside unit function confines the integration region to the interior of the sphere. If we now differentiate this equation with respect to R2 to convert the Heaviside function to a delta function, we get
Now, if we let x_i = R u_i for all i and use the scaling property, Eq. (1.220), of delta functions, we get
where
is a constant that depends only on n. Since dV_n = R^{n−1} dR dS, where S is a surface element, 2K_n can be interpreted as the surface area, S_n, of a sphere of radius unity in n dimensions. However, the normalization of the chi-square distribution in Eq. (1.155) forces K_n to take the value, Eq. (1.156),
Finally, Eq. (1.231) can be integrated to yield
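The closed form is the standard result V_n(R) = π^{n/2} R^n / Γ(n/2 + 1), which can be checked against a Monte Carlo estimate:

```python
import math, random

# Closed-form volume (a standard result): V_n(R) = pi^(n/2) R^n / Gamma(n/2 + 1),
# checked against a Monte Carlo estimate over the enclosing cube [-1, 1]^n.
def sphere_volume(n, R=1.0):
    return math.pi ** (n / 2.0) * R ** n / math.gamma(n / 2.0 + 1.0)

def mc_volume(n, samples=200000, seed=1):
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples)
               if sum(rng.uniform(-1.0, 1.0) ** 2 for _ in range(n)) <= 1.0)
    return (2.0 ** n) * hits / samples   # cube volume times hit fraction

for n in (2, 3, 4):
    print(n, sphere_volume(n), mc_volume(n))
```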
1.14 Appendix B: Solved problems
The dice player

A person bets that in a sequence of throws of a pair of dice he will get a 5 before he gets a 7 (and loses). What odds should he be given to make the bet fair?

Solution
Out of the 36 possible tosses of a pair, only the four combinations, 1 + 4, 2 + 3, 3 + 2, and 4 + 1, add to 5. Similarly, six combinations add to 7: 1 + 6, 2 + 5, 3 + 4, 4 + 3, 5 + 2, and 6 + 1. Thus in a single toss the three relevant probabilities are P5 = 4/36 and P7 = 6/36, for 5 and 7, and P0 = 26/36 for all other possibilities combined. The probability of r tosses of "other", and s ≥ 1 tosses of 5, followed by a toss of a 7, is given by
where the sum on s starts at 1 to ensure the presence of a 5 toss, r has been replaced by n − s, and the combinatorial coefficient has been inserted to allow the "other" tosses and the 5 tosses to appear in any order. The 7 toss always appears at the end.
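Before carrying out the sum, the final answer P5/(P5 + P7) = (4/36)/(10/36) = 2/5 (derived below) can be checked by direct simulation:

```python
import random

# Monte Carlo check of the dice problem: P(5 before 7) should equal
# P5/(P5 + P7) = (4/36)/(4/36 + 6/36) = 2/5.
def five_before_seven(rng):
    while True:
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total == 5:
            return True
        if total == 7:
            return False

rng = random.Random(0)
trials = 100000
wins = sum(five_before_seven(rng) for _ in range(trials))
print(wins / trials)   # should be close to 0.4
```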
The sum over s can be accomplished by adding and subtracting the s = 0 term:
The two terms are simply summable geometric series. Using P5 + P7 + P0 = 1 the final result is
The corresponding result for 7 to appear first is obtained using the formulas with 5 and 7 interchanged:
Thus the "house" should give 3 : 2 odds to the bettor on 5. The gold coins problem A desk has four drawers. The first holds three gold coins, the second has two gold and one silver, the third has one gold coin and two silver, and finally the fourth drawer has three silver coins. If a gold coin is drawn at random, what is (a) the probability that there is a second gold coin in the same drawer, and (b) if a second drawing is made from the same drawer, what is the probability that it too is gold? Solution Because of the distribution of the gold coins, the probability that the coin came from the first drawer is pi = 3/6 because three of the six available gold coins were in that drawer. Similarly p2 = 2/6, and p% = 1/6. The probability that there is a second coin in the same drawer is 1 x pi + 1 x p% + 0 x p% = 5/6. Similarly, the probability that the second selected coin (from the same drawer) is gold is 1 x pi + (1/2) x p2 = 2/3, since the second coin is surely one in the first drawer, and has a 1/2 chance in the second drawer, and there are no gold coins left in the third drawer. Note that these values of 1, 1/2, and 0 are conditional probabilities given the outcome of the first choice. The Bertrand paradox The Bertrand problem can be stated in the following form: A line is dropped randomly on a circle. The intersection will be a chord. What is the probability that the length of the chord will be greater than the side of the inscribed equilateral triangle?
Solution 1
The side of the inscribed triangle is a chord at a distance of half the radius, R/2, from the center of the circle. Assuming that along the direction perpendicular to the side of the inscribed triangle the distance is uniformly distributed, the chord will be longer if the distance from the circle center is less than R/2, which it is with probability 1/2.

Solution 2
The chord will be greater than the triangle side if the angle subtended by the chord is greater than 120 degrees (out of a possible 180), which it achieves with probability 2/3.

Solution 3
Draw a tangent line to the circle at an intersection with the chord. Let φ be the angle between the tangent and the chord. The chord will be larger than the triangle side if φ is between 60 and 120 degrees, which it will be with probability (120 − 60)/180 = 1/3. Solution 2 is given in Kyburg (1969). Solution 3 is given by Uspensky (1937), and the first solution is given by both. Which solution is the correct one? Answer: the problem is not well defined. We do not know which measures have uniform probability unless an experiment is specified. If a board is ruled with a set of parallel lines separated by the diameter, and a circular disk is dropped at random, the first solution is correct. If one spins a pointer at the circle edge, the third solution would be correct.

Gambler's ruin
Hamming (1991) considers a special case of the gambler's ruin problem, in which gambler A starts with capital C, and gambler B starts with W (or more) units. The game will be played until A loses his capital C or wins an amount W (even if B is a bank with infinite capital). Each bet is for one unit, and there is a probability p that A will win and q = 1 − p that B will win.

Solution
The problem is solved using the recursion relation
where P(n) is the probability that A will win if he holds n units. The boundary conditions are P(0) = 0 and P(T) = 1, with T = C + W. Strictly speaking, the recursion relation only needs to be satisfied for 0 < n < T, which omits the two end points. However, the boundary conditions, Eq. (1.240), then lead to a unique solution. The solution of a difference equation with constant coefficients is analogous to that of a differential equation with constant coefficients. In the latter case, the solution is an exponential. In the present case, we search for a power law solution, P(n) = r^n, which is an exponential in n. The result is a quadratic equation with roots 1 and p/q. The solutions, 1^n and (p/q)^n, actually obey the recursion relation, Eq. (1.239), for all n. But they do not obey the boundary conditions. Thus we must, as in the continuous case, seek a linear combination and impose the boundary conditions of Eq. (1.240) to obtain simultaneous linear conditions on A and B. The final solution is found to be
where we must set n = C and T = C + W to get the probability that A, at his starting position, will win. If p = 1/2, the two roots for r coalesce, and as in the differential equation case a solution linear in n emerges. Application of the boundary conditions leads to
Since our solution obeys the boundary conditions, as well as the difference equation everywhere (hence certainly in the interior) it is clearly the correct, unique solution.
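As a numerical cross-check, the recursion and boundary conditions can be solved by simple relaxation and compared with the closed form (shown here for p ≠ 1/2):

```python
# Relaxation solve of the recursion P(n) = p*P(n+1) + q*P(n-1) with
# P(0) = 0 and P(T) = 1, compared with the closed form
# P(n) = (1 - (q/p)^n) / (1 - (q/p)^T), valid for p != 1/2.
def ruin_relax(p, T, sweeps=20000):
    q = 1.0 - p
    P = [n / T for n in range(T + 1)]   # endpoints already correct
    for _ in range(sweeps):
        for n in range(1, T):
            P[n] = p * P[n + 1] + q * P[n - 1]
    return P

def ruin_exact(p, n, T):
    r = (1.0 - p) / p
    return (1.0 - r ** n) / (1.0 - r ** T)

p, C, W = 0.48, 10, 10      # slightly unfavorable game
T = C + W
P = ruin_relax(p, T)
print(P[C], ruin_exact(p, C, T))
```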
2 What is a random process
2.1 Multitime probability description

A random or stochastic process is a random variable X(t), at each time t, that evolves in time by some random mechanism (of course, the time variable can be replaced by a space variable, or some other variable, in applications). The variable X can have a discrete set of values x_j at a given time t, or a continuum of values x may be available. Likewise, the time variable can be discrete or continuous. A stochastic process is regarded as completely described if the probability distribution is known for all possible sets {t1, t2, ..., tn} of times. Thus we assume that a set of functions
describes the probability of finding
We have previously discussed multivariate distributions. To be a random process, the set of variables x_j must be related to each other as the evolution in time x_j = X(t)|_{t=t_j} of a single "stochastic" process.

2.2 Conditional probabilities
The concept of conditional probability introduced in Section 1.8 immediately generalizes to the multivariable case. In particular, Eq. (1.82)
can be iterated to yield
When the variables are part of a stochastic process, we understand x_j to be an abbreviation for X(t_j). The variables are written in time sequence since we regard the probability of x_n as conditional on the earlier time values x_{n−1}, ..., x_1.

2.3 Stationary, Gaussian and Markovian processes
A stationary process is one which has no absolute time origin. All probabilities are independent of a shift in the origin of time. Thus
In particular, this probability is a function only of the relative times, as can be seen by setting τ = −t1. Specifically, for a stationary process, we expect that
and the two-time conditional probability
reduces to the stationary state, independent of the starting point, when this limit exists. For the otherwise stationary Brownian motion and Poisson processes in Chapter 3, the limit does not exist. For example, a Brownian particle will have a distribution that continues to expand with time, even though the individual steps are independent of the origin of time. A Gaussian process is one for which the multivariate distributions p_n(x_n, x_{n−1}, ..., x_1) are Gaussians for all n. A Gaussian process may or may not be stationary (and conversely). A Markovian process is like a student who can remember only the last thing he has been told. Thus it is defined by
that is, the probability distribution of x_n is sensitive only to the last known event x_{n−1} and forgets all prior events. For a Markovian process, the conditional probability formula, Eq. (2.5), specializes to
so that the process is completely characterized by an initial distribution p(x_1) and the "transition probabilities" p(x_j|x_{j−1}). If the Markovian process is also
stationary, all p(x_j|x_{j−1}) are described by a single transition probability

independent of the initial time t_{j−1}.

2.4 The Chapman-Kolmogorov condition
We have just shown that a Markovian random process is completely characterized by its "transition probabilities" p(x2|x1). To what extent is p(x2|x1) arbitrary? This question may be answered by taking Eq. (2.4) for a general random process, specializing to the three-time case, and dividing by p(x1) to obtain
or
If we integrate over x2 we obtain
For the Markovian case this specializes to the Chapman-Kolmogorov condition
which must be obeyed by the conditional probabilities of all Markovian processes. The Chapman-Kolmogorov condition is not as restrictive as it appears. Many Markovian processes have transition probabilities that for small Δt obey:
where w_{a′a} is the transition probability per unit time and the second term has been added to conserve probability. It describes the particles that have not left the state a, provided that
If we set t = t0 + Δt0, we can evaluate the right hand side of the Chapman-Kolmogorov condition to first order in Δt and Δt0:
which is just the value p(a′, t0 + Δt + Δt0 | a0, t0) expected from Eq. (2.18). Note, however, that this proof did not make use of the conservation condition, Eq. (2.19). This will permit us, in Chapter 8, to apply the Chapman-Kolmogorov condition to processes that are Markovian but whose probability is not normalized.
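A minimal numerical illustration: for a two-state Markov chain with transition matrix T, the Chapman-Kolmogorov condition is just the statement that the (m + k)-step transition matrix is the product of the m-step and k-step ones:

```python
# Two-state Markov chain sketch of the Chapman-Kolmogorov condition:
# with T[a'][a] = p(a', t + dt | a, t), the (m + k)-step transition
# probabilities must equal the product of the m-step and k-step matrices.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(T, n):
    R = [[1.0, 0.0], [0.0, 1.0]]     # identity
    for _ in range(n):
        R = matmul(R, T)
    return R

T = [[0.9, 0.2],
     [0.1, 0.8]]   # columns sum to 1, so probability is conserved
lhs = matpow(T, 5)
rhs = matmul(matpow(T, 3), matpow(T, 2))
print(lhs)
print(rhs)
```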
3 Examples of Markovian processes

3.1 The Poisson process
Consider two physical problems describable by the same random process. The first process is the radioactive decay of a collection of nuclei. The second is the production of photoelectrons by a steady beam of light on a photodetector. In both cases, we can let a discrete, positive, integer valued variable n(t) represent the number of counts emitted in the time interval between 0 and t. In both cases there is a constant probability per unit time ν such that ν dt is the expected number of photocounts in [t, t + dt] for small dt. We use the initial condition n(0) = n0. Then n − n0 will be the number of counts in the interval [0, t]. When we talk of P(n, t) we can understand this to mean P(n, t|n0, 0), the conditional density distribution. Since the state n(t) = n is supplied by transitions from the state n − 1 with production of photoelectrons at a rate ν dt and is diminished by transitions from state n to n + 1, we have the equation
with the middle term supplying the increase in P(n) by a transition from the n − 1 state, and the last term describing the exit from state n by emission from that state. These are usually referred to as rate in and rate out terms respectively. Canceling a factor dt we obtain the rate equation
In the first term, n increases from n − 1 to n, in the second from n to n + 1. Thus n never decreases. Such a process is called a birth process in the statistics literature, or a generation process in the physics literature. A more general process is called a birth and death process or a generation-recombination process. Since n ≥ n0 we have no supply from the state P(n0 − 1, t), so that
whose solution is
since P(n, 0) = δ_{n,n0} at time t = 0, corresponding to the certainty that n = n0 at time t = 0. The form, Eq. (3.5), of this solution suggests the transformation
with the resultant equation
subject to the initial condition
Thus any Q(n, t) may be readily obtained if Q(n − 1) is known. But n, as described by Eq. (3.3), can only increase. Thus
and Eq. (3.5) yields
Solution by induction then yields
or, setting n = n0 + m,
for n ≥ n0, with a vanishing result for n < n0. This result specializes to the usual Poisson answer
for the usual case n0 = 0 (see also Eq. (1.128)). The two formulas, Eqs. (3.12) and (3.13), are, in fact, identical since n − n0 has the meaning of the number of events occurring in the interval (0, t). The more general form is useful in verifying
the Chapman-Kolmogorov conditions
where the last step recognizes the binomial expansion that occurred in the previous step. The final result is equal to that in Eq. (3.13) if t is replaced by (t − t0) in the latter. The Poisson process is stationary, so that P(n, t|n0, t0) is a function only of t − t0. However, no limit exists as t − t0 → ∞, so that there is no time-independent P(n). We shall therefore evaluate the characteristic function of the conditional probability density
This result reduces to Eq. (1.130) if one sets n0 = 0. The cumulants can be calculated as follows:
where the cumulants all have the same value! Here the subscript L is used to denote the linked moment or cumulant as in Section 1.6.

3.2 The one dimensional random walk
The Bernoulli sequence of independent trials described in Section 1.9 can be mapped onto a random walk in one dimension. In Fig. 3.1 we show an array of
FIG. 3.1. Random walk on a discrete lattice with spacing a.

points at the positions ja where j = 0, ±1, ±2, etc. and a is the spacing between the points. At each interval of time, T, a hop is made with probability p to the right and q = 1 − p to the left. The distribution of r, the number of hops to the right in N steps, is given as before by the Bernoulli distribution:
The first moment, and the second moment about the mean are given as before in Section 1.9 by
A particle that started at 0 and has taken r steps to the right, and N − r to the left, arrives at position x = (2r − N)a, with mean value ⟨x⟩ = (p − q)Na. Notice, if p = q = 1/2, that is, equal probability to jump to the right or the left, the average position after N steps will remain 0. The second moment about the mean is given by ⟨(x − ⟨x⟩)²⟩ = 4pqNa². From the central limit theorem, discussed in Section 1.9, the limiting distribution after many steps is Gaussian with the first and second moments just obtained:
If we introduce the position and time variables by the relations
the moments of x are given by
The factor 2 in the definition of the diffusion coefficient D is appropriate for one dimension, and would be replaced by 2d if we were in a space of dimension d. Thus the distribution moves with a "drift" velocity
and spreads with a diffusion coefficient defined by
The appropriateness of this definition of diffusion coefficient is made clear in Section 3.4 on "Diffusion processes and the Einstein relation". A detailed discussion of random walks in one and three dimensions is given by Chandrasekhar (1943) as well as by Feller (1957). A recent encyclopedia article by Shlesinger (1997) emphasizes recent work in random walk problems. See also A Wonderful World of Random Walks by Montroll and Shlesinger (1983). An "encyclopedic" review of "stochastic process" is given by Lax (1997).
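A short simulation confirms the drift and spread formulas quoted above (step length a = 1):

```python
import random

# Biased walk of N unit steps: mean displacement (p - q)N and
# variance 4pqN, as quoted above (step length a = 1).
def walk(rng, N, p):
    return sum(1 if rng.random() < p else -1 for _ in range(N))

rng = random.Random(2)
N, p, trials = 100, 0.6, 20000
xs = [walk(rng, N, p) for _ in range(trials)]
mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
print(mean, (2 * p - 1) * N)        # drift (p - q)N
print(var, 4 * p * (1 - p) * N)     # spread 4pqN
```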
3.3 Gambler's ruin
The problem we discussed, in connection with the second law of gambling, that of winning a specific sum W starting with a finite capital C, is referred to as the gambler's ruin problem. To make connection to physical problems, we map the problem onto a random walk on a line. It is distinguished from conventional random walk problems because it involves absorbing boundaries. Since the game ends at these boundaries it is also a first passage time problem - a member of a difficult class. The gambling problem with bet b and odds d can be described as a random walk problem on a line with steps to the left of size b if a loss is incurred, and a step to the right of size bd if a win occurs. Instead of dealing with the probability of winning at each step, we shall define P(x) as the probability of eventually winning if one starts with capital x. Our random walk starts at the initial position C. The game is regarded as lost if one arrives at 0, i.e., no capital left to play, and it is regarded as won if one arrives at the objective C + W.
Our random walk can be described by the recursion relation:
since the right hand side describes the situation after one step. With probability p one is at position x + bd with winning probability P(x + bd), and with probability q one is at position x − b with winning probability P(x − b). Since the probability of eventual winning depends on x, but not on how we got there, this must also be the probability P(x). The procedure we have just described of going directly after the final answer, rather than following the individual steps, is given the fancy name "invariant embedding" by mathematicians, e.g., Bellman (1964). The boundary conditions we have are P(0) = 0 and P(C + W) = 1.
We seek a homogeneous solution of this linear functional equation in exponential form P(x) = exp(λx), just as for linear differential equations. Equation (3.28) then determines λ by requiring
We establish in Appendix 3.A that there are exactly two roots, one with λ = 0, and one with λ > 0. Calling the second root λ, the general solution of Eq. (3.28) is a linear combination of 1 and exp(λx) subject to the boundary conditions, Eq. (3.29), with the result
The probability of winning is that associated with the initial capital C:
Although Eq. (3.30) does not supply an explicit expression for λ (except in special cases), we know that λ > 0. The denominator in Eq. (3.32) then increases more rapidly with λ than the numerator. Thus
Since the condition (3.30) involves only the product λb, an increase in b causes a decrease in λ, hence an increase in P. Thus the probability of winning is an increasing function of b. Of course, at the starting position, a bet greater than C is
impossible. Thus the optimum probability is obtained if λ is calculated from Eq. (3.30) with b replaced by C:
Our arguments have tacitly assumed that no bet requires a step outside the domain 0 ≤ x ≤ W + C. Thus if a game with large odds d = 2W/C were allowed, the preceding argument would not apply, and it would be appropriate to bet C/2, since the objective is to win no more than W, and to terminate the game as soon as possible, in order to minimize the total amount bet.

3.4 Diffusion processes and the Einstein relation
In the large N limit, the distribution function (3.23) for the one-dimensional random walk can be written as
where dx/dn = a is the width of one step, and
Equation (3.35) is the Green's function of the diffusion equation that is written down explicitly in Eq. (3.50) below. That is, Eq. (3.35) is the solution of Eq. (3.50) that obeys the initial condition:
Let us compare this result with the macroscopic theory of diffusion in which a concentration c of particles obeys the conservation law.
where the particle (not electrical) current density is given by Fick's law
and D is the macroscopic diffusion constant. Thus c obeys the diffusion equation
which reduces in one dimension for constant D to
The Green's function solution to this equation appropriate to a point source at x = x0 at t = t0 is given by
in agreement with our random walk result, Eq. (3.35), for v = 0 but with the initial position at x0.

The Einstein relation

Einstein's (1905, 1906, 1956) original idea is that if one sets up a uniform applied force F there will be a drift current
where the mechanical mobility B is the mean velocity of a particle per unit applied force. Thus the drift current per unit of concentration c is proportional to the applied force F. However, if a force F is applied in an open circuit, a concentration gradient will build up large enough to cancel the drift current
or
The simplest example of this is the concentration distribution set up in the atmosphere subject to the gravitational force plus diffusion. This steady state result must agree with the thermal equilibrium Boltzmann distribution
where k is Boltzmann's constant, T is the absolute temperature, and the potential energy V satisfies F = −dV/dx. Comparison of the two expressions for c(x) yields the Einstein relation between diffusion, D, and mobility, B: D = kTB. For charged particles, F = eE, and the electrical mobility is μ = v/E = eB, so that a frequently stated form of the Einstein relation is D = μkT/e.
If the entire current, Eq. (3.44), is substituted into the continuity equation (3.38), one gets
where v = BF (or μE in the charged case). Equation (3.50) is a special case of a Fokker-Planck equation to be discussed in Section 8.3. We note, here, that the drift is contained in the first derivative coefficient term and the diffusion in the second derivative coefficient. The solution of this equation for a pulse starting at x = 0 at t = 0 is
which is the precise analog of the discrete random walk solution, Eq. (3.35). By injecting a pulse of minority carriers into a semiconductor and examining the response on an oscilloscope at a probe a distance down the sample, a direct measurement can be made of the "time of flight" of the pulse and the spread in its width. This technique introduced by Haynes was applied by his class Transistor Teacher's Summer School (1952) to verify the Einstein relation for electrons and holes in germanium. Note that Eq. (3.51) describes a Gaussian pulse whose center travels with a velocity v. Thus the center of the pulse has a position that grows linearly with time. Also, the pulse has a Gaussian shape, and the root mean square width is given by (2Dt)^{1/2}. Measurements were made by each of the 64 students in the class. The reference above contained the average results that verified the Einstein relation between the diffusion constant and the mobility. With holes or electrons injected into a semiconductor, a pulse will appear on a computer screen connected by probes to the semiconductor. For several probes at different distances, the time of arrival can be noted and the width of the pulse is measured at each probe. Thus a direct measurement is made of both the mobility μ and the diffusion constant D.
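The time-of-flight picture can be sketched with a simple Euler simulation of drift plus diffusion (parameter values are arbitrary choices):

```python
import math, random

# Euler simulation of drift plus diffusion: a pulse of particles released
# at x = 0 should develop a Gaussian profile with center v*t and root mean
# square width (2Dt)^(1/2). Parameter values are arbitrary choices.
def evolve(rng, v, D, t, dt=0.01):
    x = 0.0
    for _ in range(int(round(t / dt))):
        x += v * dt + math.sqrt(2.0 * D * dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(3)
v, D, t, trials = 1.0, 0.5, 2.0, 5000
xs = [evolve(rng, v, D, t) for _ in range(trials)]
mean = sum(xs) / trials
rms = math.sqrt(sum((x - mean) ** 2 for x in xs) / trials)
print(mean, v * t)                    # pulse center, "time of flight"
print(rms, math.sqrt(2.0 * D * t))    # pulse width
```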
3.5 Brownian motion
The biologist Robert Brown (1828) observing tiny pollen grains in water under a microscope, concluded that their movement "arose neither from currents in the fluid, nor from its gradual evaporation, but belonged to the particle itself". MacDonald (1962) points out that there were numerous explanations of Brownian motion, proposed and disposed of in the more than 70 years until Einstein (1905, 1906, 1956) established the correct explanation that the motion of the particles was due to impact with fluid molecules subject to their expected Boltzmann distribution of velocities.
It is of interest to comment on the work of von Nägeli (1879), who proposed molecular bombardment but then ruled out this explanation because it yielded velocities two orders of magnitude less than the observed velocities of order 10^{-4} cm/sec. von Nägeli assumed that the liquid molecules would have a thermal velocity; with a molecular weight of 100, m ~ 10^{-22} gram and T ~ 300 K, so that v ~ 2 × 10^4 cm/sec. The Brownian particle after a collision can be expected to have a velocity V ~ (m/M)v, where the mass of the Brownian particle M is proportional to the cube of its radius, so that
so that V ~ (2 × 10^4)/(8 × 10^9) ≈ 2 × 10^{-6} cm/sec, which is two orders of magnitude too small. Conversely, if we assume the Brownian particle to be in thermal equilibrium, V ~ (kT/M)^{1/2}. Since M ~ (8 × 10^9)(10^{-22} g) ~ 10^{-12} grams, and T ~ 300 K, we have V ~ 0.2 cm/sec, which is much larger than the observed velocity of 3 × 10^{-4} cm/sec. We shall return to the resolution of this discrepancy after we have discussed Brownian motion from the Langevin (1908) point of view.

3.6 Langevin theory of velocities in Brownian motion
Our introduction to the Langevin treatment of Brownian motion comes from the paper of Chandrasekhar (1943) and the earlier paper of Uhlenbeck and Ornstein (1930), both of which are in the excellent collection made by Wax (1954). However, a great simplification can be made in the algebra if one assumes from the start that the process is Gaussian in both velocity and position. The justification is given in Appendix 3.B. The distribution of velocity is considered first. The free particle of mass M subject to collisions by fluid molecules is described by the equation (for simplicity, we discuss the one-dimensional case, instead of the actual three-dimensional case)
It was Langevin's (1908) contribution to recognize that the total force F exerted by the fluid molecules contains a smooth part —v/B associated with the viscosity
of the fluid that causes the macroscopic motion of a Brownian particle to decay plus a fluctuating force F(t) whose average vanishes
This fluctuating part will be shown to give rise to the diffusion of the Brownian particle. The relation between the fluctuating part and the diffusion part is the Einstein relation to be derived below. It is also a special case of the fluctuation-dissipation theorem to be derived in Chapter 7. Note that if a steady external force G is applied, the average response at long times is v = BG, so that B is to be interpreted as the mechanical mobility. If the particle is a sphere of radius a moving in a medium of viscosity η then Stokes law yields (in the three-dimensional case)
To simplify notation in subsequent calculations, we shall rewrite Eq. (3.56) as
For a Brownian particle of diameter 2a = 1 micron and mass M ~ 10^-12 grams moving in a fluid of viscosity η ~ 10^-2 poise, we estimate that 1/λ ~ 10^-7 sec. Microscopic collisions with neighboring particles should occur in a liquid at roughly 10^12-10^13/sec. Thus the correlation
must fall off in times of order 10^-13 sec, much shorter than the 10^-7 sec decay time. It is therefore permissible to approximate the correlation as a Dirac delta function.
where d is an as yet unknown constant. Equation (3.59) can be solved for the velocity
where ⟨v(t)⟩_v0 is the ensemble average velocity contingent on v(0) = v0.
The mean square velocity deviation ⟨(Δv)²⟩ contingent on v(0) = v0 is then the first term on the left:
for the particular Δv(s) of Eq. (3.62). The limiting value at long times is d/(2λ). In the limit as t → ∞, ⟨v²(t)⟩_v0 must approach the thermal equilibrium value kT/M. This yields another Einstein relation for the one-dimensional case.
that relates a measure d of diffusion in velocity space to the mobility B (or the dissipation). Equation (3.64) can be rewritten. Thus the mean square deviation of the velocity from its mean, starting with the initial velocity v0, namely σ_vv, is independent of the starting velocity v0. This is a special case, with u = t, of Eq. (8.18) of Classical Noise I in Lax (1960I). For the delta correlated case, Eq. (3.62) shows that the velocity is a sum of uncorrelated (hence independent) Gaussian variables since ⟨A(s)A(s′)⟩ = 0 for s ≠ s′. Since each term is Gaussian, the sum will also be a Gaussian random variable (see Appendix 3.B). Thus the statistics of v(t) are completely determined by its mean and second cumulant since all higher cumulants vanish. Thus the conditional probability density for v(t) is given by
where ⟨v⟩_v0 = v0 exp(-λt) and the unconditional average ⟨v²⟩ = kT/M is just the thermal average, independent of time. In the limit as t → ∞ we approach the steady state solution
which agrees with the equilibrium Boltzmann distribution.
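These conditional moments are easy to check numerically. The sketch below integrates the Langevin equation of Eq. (3.59) by an Euler-Maruyama scheme; the values λ = 1, d = 2, v0 = 3 (dimensionless units), the step size, and the ensemble size are all our own illustrative choices, not anything prescribed in the text. It verifies that the conditional mean decays as v0 exp(-λt) while the conditional variance approaches the limiting value d/(2λ):

```python
import math
import random

random.seed(0)

lam = 1.0        # decay rate lambda (hypothetical, dimensionless units)
d = 2.0          # strength of <A(s)A(s')> = d delta(s - s')
v0 = 3.0         # common initial velocity for the whole ensemble
dt = 2e-3
steps = 2000     # integrate to t = 4, i.e. four decay times
n_paths = 1000

final_v = []
for _ in range(n_paths):
    v = v0
    for _ in range(steps):
        # Euler-Maruyama step for dv = -lam*v dt + A(t) dt
        v += -lam * v * dt + math.sqrt(d * dt) * random.gauss(0.0, 1.0)
    final_v.append(v)

t = steps * dt
mean_v = sum(final_v) / n_paths
var_v = sum((v - mean_v) ** 2 for v in final_v) / n_paths

# Conditional mean: v0 exp(-lam t).  Conditional variance:
# (d / (2 lam)) * (1 - exp(-2 lam t)), independent of v0.
print(mean_v, var_v)
```

With these parameters the mean has decayed to v0 e^-4 ≈ 0.055 and the variance has essentially reached d/(2λ) = 1, independent of the common starting velocity v0, as claimed.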
Is the converse theorem true? Does a Gaussian steady state distribution P(v) imply that A(s) must be a Gaussian process? The answer is positive and is presented in Appendix 3.B. We calculated the velocity distribution in this section, Eq. (3.69), by assuming the velocity v(t) is a Gaussian variable even though velocity-velocity correlations exist. Our results can be justified by the fact that Δv(t) is given by the integral in Eq. (3.62), which represents a sum of uncorrelated variables since ⟨A(s)A(s′)⟩ vanishes for s ≠ s′.
3.7
Langevin theory of positions in Brownian motion
Since the position of a particle is determined by the time integral of the velocity, we would expect that the statistics of Brownian motion of a particle, that is the random motion in position space, can be determined fairly directly by a knowledge of its motion in velocity space. More generally, one would like to determine the joint distribution in position and velocity space. We shall see in this section that the manipulations to be performed involve only minor difficulties provided that the distribution in positions, velocities and the joint distribution are all Gaussians. That is because the distributions can be written down fairly readily from the first and second moments if the distributions are Gaussian in all variables. But its proof depends on the Gaussian nature of the sum of Gaussian variables. And we have only established that Gaussian nature if the variables are independent. Since positions and velocities are correlated, it is not clear whether the scaffold we have built using independent variables will collapse. To separate the computational difficulties from the fundamental one, we shall perform the calculations in this section assuming that all the variables involved are Gaussian, and reserve for Appendix 3.B a proof that this is the case. The average position may be obtained by integrating Eq. (3.62) with respect to time and averaging:
Here, all averages are understood to be taken contingent on given initial velocities and positions. Next, we calculate ⟨(x(t) - ⟨x(t)⟩)²⟩. The general value of the random variable x(t) can be obtained by integrating Eq. (3.62) over time: setting t = w in Eq. (3.62) and integrating over w from 0 to t. The expression is simplest if we subtract off the average position given by Eq. (3.70). The result takes the form
where
with Integration by parts then yields
where The fluctuations in position are then described after applying Eq. (3.61) by
where v0 has again canceled out. It is of interest to examine σ_xx(t) for small and large t. For small t, we may expand the exponential so that
Thus for small t, x(t) is excellently described by ⟨x(t)⟩. More specifically,
when λt ≪ 1.
As λt → ∞, we may drop the decaying exponential terms in σ²(t) to obtain
Since ⟨x(t)⟩ is bounded, we obtain the diffusion result
where the diffusion constant D for x is given by comparison with Eq. (3.79) to be
in terms of the diffusion constant d for velocity. After use of Eqs. (3.65), (3.66) we find
which is again the usual Einstein relation, Eq. (3.48).
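The crossover from the ballistic small-t regime to the diffusive large-t regime can be checked numerically. The sketch below simulates the coupled (x, v) Langevin pair with hypothetical values λ = 1, d = 2 and an Euler discretization of our own choosing, and compares the sample variance of x(t) against the standard Ornstein-Uhlenbeck position variance (a form consistent with both limits quoted in the text) and against the diffusive limit 2Dt with D = d/(2λ²):

```python
import math
import random

random.seed(1)

lam, d = 1.0, 2.0            # hypothetical parameter values
D = d / (2.0 * lam ** 2)     # diffusion constant for x, D = d/(2 lam^2)
dt, steps, n_paths = 1e-2, 1000, 1500

finals = []
for _ in range(n_paths):
    x, v = 0.0, 0.0
    for _ in range(steps):
        v += -lam * v * dt + math.sqrt(d * dt) * random.gauss(0.0, 1.0)
        x += v * dt          # position is the time integral of velocity
    finals.append(x)

t = steps * dt               # t = 10 decay times
mean_x = sum(finals) / n_paths
var_x = sum((xx - mean_x) ** 2 for xx in finals) / n_paths

# standard Ornstein-Uhlenbeck position variance (v0 = 0 here):
sigma_xx = (d / lam ** 2) * (
    t - 2.0 * (1.0 - math.exp(-lam * t)) / lam
      + (1.0 - math.exp(-2.0 * lam * t)) / (2.0 * lam))
print(var_x, sigma_xx, 2.0 * D * t)
```

At t = 10/λ the simulated variance matches the full expression (≈ 17 here), still measurably below the asymptote 2Dt = 20; the difference is the constant offset that becomes negligible as λt → ∞.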
We shall now proceed by assuming that the position x(t) is Gaussian. The distribution of any Gaussian variable is completely determined by its first and second moments. The first moment of x is calculated in Eq. (3.70), and the second moment of x about its mean is given in Eq. (3.76). The probability density for x is then given by that Gaussian with the correct first and second moments:
where σ_xx(t), defined by Eq. (3.76), is given by
where D is given by Eq. (3.81). It is also possible to write a joint distribution function, P(x,v,t), for x and v. For this purpose, we also need the cross-correlation
where the Δ's describe deviations from the corresponding mean values conditional on a given v0 and x0. The characteristic function of the conditional probability is then determined from the first and second cumulants in the form
Both the first moments x and v and the second order cumulants are understood to be conditional on the initial position and velocity that were just computed. The original conditional distribution P(x, v, t|x0, v0, 0) appearing in Eq. (3.86) can be obtained by taking the inverse Fourier transform of Eq. (3.86). This was already done in Eq. (1.182) for the case of two Gaussian variables, and expressed directly in terms of the second moments σ_xx = σ1², σ_vv = σ2² and the correlation coefficient
Adiabatic elimination
The conditional probability P(x, t|x0, 0) for position does not obey the Chapman-Kolmogorov condition. Thus it does not describe a Markovian process. In what way can the Uhlenbeck, Ornstein, Chandrasekhar problem of diffusion in x and v space be reduced to the usual Einstein Brownian motion problem in ordinary space? If one recognizes that the time 1/λ is short compared to the time interval Δt over which one measures the positions of the particles, with a similar condition on the accuracy of positions,
one can then approximate Eq. (3.83) by
Another way of looking at the problem is to think of the rapid motion in velocity adiabatically following the slow motion in position. This can be done by rewriting Eq. (3.59) in the form
Then, if we regard the slow motion term d/dt as small compared to λ, we neglect the former. Now x(t) obeys a Brownian motion directly, and the diffusion constant is
in agreement with Eq. (3.81) obtained earlier. Equation (3.89) is exactly parallel to the Brownian motion Eq. (3.69) for velocity, and the analogous solution is the standard diffusion solution of Eq. (3.41). An adiabatic elimination procedure was used extensively to reduce a six-variable problem (two fields, two populations and two polarizations in a gas laser) in Lax (1964QIII) to a two-variable problem whose solution was feasible (Hempstead and Lax 1967). Some of the original references to the work on Brownian motion are given in Smoluchowski (1916), Einstein (1905), and Fürth (1920).
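The validity of the adiabatic elimination can itself be tested numerically: when λt ≫ 1, the variance of x produced by the full (x, v) simulation should match the pure Brownian result 2Dt, assuming the relation D = d/(2λ²) implied by the limits above. In the sketch below all numerical values (λ = 20, D = 1, step sizes) are hypothetical choices of our own:

```python
import math
import random

random.seed(7)

D = 1.0                  # target spatial diffusion constant
lam = 20.0               # fast velocity relaxation: 1/lam << t
d = 2.0 * D * lam ** 2   # choose d so that D = d/(2 lam^2)
dt, steps, n_paths = 1e-3, 5000, 1000

finals = []
for _ in range(n_paths):
    x, v = 0.0, 0.0
    for _ in range(steps):
        v += -lam * v * dt + math.sqrt(d * dt) * random.gauss(0.0, 1.0)
        x += v * dt
    finals.append(x)

t = steps * dt           # t = 5, so lam * t = 100 >> 1
mean_x = sum(finals) / n_paths
var_x = sum((xx - mean_x) ** 2 for xx in finals) / n_paths
print(var_x, 2.0 * D * t)   # the two should nearly coincide
```

Because 1/λ is two orders of magnitude shorter than t, the residual non-diffusive correction of order D/λ is invisible at this resolution, which is precisely the regime in which the velocity may be eliminated.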
3.8
Chaos
There are chaotic and other processes that differ from Brownian motion in that the root mean square growth goes as t^α with α ≠ 1/2.
This occurs in many natural phenomena. An early known example relates to the flow of water in the Nile river. Records of this water flow have been kept over many centuries. A detailed investigation by a civil engineer, Hurst (1951), found an exponent α that differs from 1/2 and fits closely to 0.6. The scaling properties of the fluctuations studied by Hurst (1951) were investigated further by Anis and Lloyd (1976). Many of the phenomena in chaotic motion are described in terms of fractional power laws associated with "fractals", a term introduced by Mandelbrot (1983). A brief chapter is provided in Arfken and Weber (1995). An introduction to chaos is provided by G. P. Williams (1997).
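The exponent α can be estimated from data by comparing root mean square displacements at two time lags on a log-log scale; an anomalous value such as Hurst's α ≈ 0.6 would appear as a different slope. Here is a sketch of that estimator applied to an artificial unbiased random walk (lag choices and ensemble size are our own), for which the estimate should come out near the Brownian value 1/2:

```python
import math
import random

random.seed(3)

n_paths, steps = 500, 1024
t1, t2 = 64, 1024            # two lags at which to compare displacements
s1 = s2 = 0.0
for _ in range(n_paths):
    x = 0.0
    for k in range(1, steps + 1):
        x += random.gauss(0.0, 1.0)   # one step of an unbiased walk
        if k == t1:
            s1 += x * x
    s2 += x * x

# <x^2>(t) ~ t^(2 alpha), so alpha follows from the log-log slope
alpha = 0.5 * math.log(s2 / s1) / math.log(t2 / t1)
print(alpha)   # near 0.5 for a plain random walk
```

Applied to a record with long-range persistence such as the Nile data, the same two-lag slope would drift above 1/2.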
3.9
Appendix A: Roots for the gambler's ruin problem
Let
and
By Eq. (3.30), the roots we desire obey F(Z) = 1. F(Z) is infinite at Z = 0 and Z = ∞ and possesses a single minimum at Z_m determined by
or
This is clearly a minimum: F″(Z) is not only positive at Z = Z_m, it is positive for all real Z. Hence to have two roots, we must show that this minimum value is less than 1.
If we evaluate F(Z) at the minimum we get
If we replace q by pd + ε, which uses the definition of ε from Section 1.12, we can express both p and q in terms of ε and d:
the minimum value of F reduces to
Is F_m < 1, as needed for F(Z) = 1 to have two roots? This will be true if
This is clearly true for ε → 1. By expansion, it can be verified for small ε. It can be extended to all ε by taking the derivative with respect to ε and, after canceling a factor (1 + d)/d, verifying that
Taking the logarithm of Eq. (3.103), we can prove the stronger statement
The first inequality is equivalent to the well known inequality
which is clearly obeyed at small ε, and whose derivative clearly obeys
as long as we restrict ourselves to gambling that favors the house (with ε > 0). The second inequality
is true for all positive ε. Thus our original inequality F_m < 1 is true for all ε. Hence there are two roots. By inspection, Z = 1 is one root.
Thus we have two possibilities. If the smaller root is Z = 1, the larger root will have Z > 1. We shall establish this by showing that the slope is negative at Z = 1, so that Z = 1 is the smaller root:
since ε = q - pd, the "take" of the gambling house, is always positive.
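The two-root structure is easy to exhibit numerically. Below we take the trial form F(Z) = q/Z + pZ^d, which is consistent with every property quoted above (F infinite at Z = 0 and Z = ∞, F(1) = p + q = 1, F′(1) = pd - q = -ε), with hypothetical house-favoring odds p = 0.3, q = 0.7, d = 2; a simple bisection then locates the second root beyond Z = 1:

```python
p, q, d = 0.3, 0.7, 2          # win d units w.p. p, lose 1 unit w.p. q
eps = q - p * d                # the house "take", here 0.1 > 0

def F(Z):
    # trial form consistent with the stated properties of F
    return q / Z + p * Z ** d

# Z = 1 is a root, and the slope there is -eps < 0, so F dips below 1
# just to the right of Z = 1 and the second root lies further right.
lo, hi = 1.0 + 1e-9, 10.0      # F(lo) < 1, F(hi) > 1
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if F(mid) < 1.0:
        lo = mid
    else:
        hi = mid
root = 0.5 * (lo + hi)
print(root)
```

For these odds the second root can also be found analytically from the factor (Z - 1), and the bisection reproduces it; as ε → 0 (a fair game) the two roots merge at Z = 1.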
3.10
Appendix B: Gaussian random variables
We start by establishing the useful theorem: Theorem
The sum, C, of two independent Gaussian variables A and B is Gaussian. Proof The characteristic function of the sum is
where the independence of the variables permits the factorization. If A and B are Gaussian, only linear and quadratic powers of t appear in each exponent involving A and B, hence also for C. Thus the random variable C is Gaussian (proved). If A + B is known to be Gaussian the calculations can be performed by direct use of Ott's theorem (1.56):
The cross-terms vanish when A and B are uncorrelated. If A, B, and their sum are all known to be Gaussian, then Eq. (3.111) can be used without discarding the cross-terms when the variables A and B are correlated. Higher moments do not vanish, but all cumulants above order two will vanish. These ideas all generalize nicely to the continuous case. If s is a continuous variable such as the time, a(s) is a set of known functions and A(s) is a set of
Gaussian random variables, then the sum variable
will obey Ott's theorem (1.56) if the variables A(s) are independent, or if C is otherwise known to be Gaussian. Even if C is not Gaussian its mean value is given by
To deal with averages of products of linear stochastic integrals of the form found in Eq. (3.62), we introduce a theorem that will make all the requirements easy. Although not stated explicitly, this theorem is used implicitly and extensively by Chandrasekhar (1943) and Uhlenbeck and Ornstein (1930). The average of the product of two linear stochastic integrals can be written
and in the special case, when the motion is pure Brownian, Eq. (3.61) is obeyed and
Given that A(t) is Gaussian, with vanishing correlation between different times, the integration over time in Eq. (3.64) is a sum of uncorrelated Gaussian variables. Thus we can conclude that the velocity is a Gaussian random variable. The position x is a time integral over velocity. But positions at two times are correlated. Thus we have not proved (yet) that the position is a Gaussian variable. Our calculations in this section are all based on the assumption that the position variable x(t) is Gaussian even though correlations are present. The justification in this case is similar to that used for velocities. We have shown in Eq. (3.86) that Δx(t) is expressible as an integral over A(s), a set of uncorrelated Gaussian variables. How can we justify the Gaussian assumption when both positions and velocities are present and they correlate with each other? Since both Δv(t) and Δx(t′) are expressible as integrals over the uncorrelated Gaussian A(s), any linear combination is expressible in this way, hence Gaussian. Thus the joint distribution of these variables is Gaussian, and that justifies our procedure.
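The theorem that opened this appendix is also easy to verify by sampling. In the sketch below (the means and variances of A and B are arbitrary choices of our own), the sum C = A + B of two independent Gaussians has additive means and variances and vanishing excess kurtosis, i.e. no surviving cumulants beyond second order:

```python
import random

random.seed(4)

n = 200000
muA, sA = 1.0, 2.0       # hypothetical Gaussian A
muB, sB = -0.5, 1.5      # hypothetical Gaussian B, independent of A
c = [random.gauss(muA, sA) + random.gauss(muB, sB) for _ in range(n)]

mean = sum(c) / n
var = sum((x - mean) ** 2 for x in c) / n
m4 = sum((x - mean) ** 4 for x in c) / n
excess = m4 / var ** 2 - 3.0   # fourth cumulant / var^2; zero for a Gaussian
print(mean, var, excess)
```

The sample mean approaches muA + muB = 0.5, the variance approaches sA² + sB² = 6.25 (the cross-term vanishes by independence), and the excess kurtosis is consistent with zero, exactly as the characteristic-function argument requires.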
Uhlenbeck and Ornstein (1930) recognized that there is a problem and attempted to prove that the system is Gaussian by estimating third and fourth moments. For example, they argue that if s1 is close to s2, but these are not near the pair s3 and s4, and the correlation function R(s2 - s3) only has short range, one should have: They do this in connection with determining the fourth moment of the velocity: Since the four times can be partitioned into pairs in three ways, they state that the right hand side is, in effect, multiplied by 3. When applied to the fourth moment of the velocity, Eq. (3.116) can be translated into a result which is consistent with the velocity being Gaussian. Conversely, since the Maxwell distribution has the Gaussian form, v must be regarded as a Gaussian random variable. And this requires A(s) to be Gaussian. That is the converse of what we have proved so far, namely that if A(s) is Gaussian, both v(t) and x(t), which are expressible in terms of A(s), will be Gaussian even if correlations are present. A succinct way of showing that the force A(s) must be Gaussian if the velocity is Gaussian is to write Eq. (1.54) for the log of the characteristic function in the form
If v is Gaussian, this implies the vanishing of all linked moments of the velocity above second order. Since the velocity can be written as an integral over a force, as in Eq. (3.64),
the requirement, Eq. (3.120), can be written as
For all the linked moments to vanish for n > 2, this must be true of the random forces as well: for n > 2. Thus A(s) must be Gaussian.
4
Spectral measurement and correlation
4.1
Introduction: An approach to the spectrum of a stochastic process
In this chapter we shall compare three definitions of noise. The first is the standard engineering definition, which takes a Fourier transform over a finite time interval, squares it, divides by the time, and then takes the limit as the time approaches infinity. The second definition is the Fourier transform of the autocorrelation function. The equality between these two definitions is known as the Wiener (1930)-Khinchine (1934, 1938) theorem. The third procedure, which we adopt, is to pass the signal through a realizable filter of finite bandwidth, square it, and average over some large finite time. As the bandwidth is allowed to approach zero, the result will (aside from a normalization factor) approach the ideal value of the two preceding definitions.
4.2
The definitions of the noise spectrum
The standard engineering definition of the noise spectrum
The spectrum of noise G_s(a, ω) in a random variable a(t) is a measure of the fluctuation energy in the frequency interval [ω, ω + dω] associated with the fluctuating part of a(t), where The standard engineering definition of noise is chosen, for the case of a stationary process, to obey the normalization
because it is customary in engineering to emphasize only positive frequencies f = ω/(2π) > 0. For this reason, we adopt the definition
and verify the normalization later. We use the subscript s to denote the standard engineering (SE) definition.
70
SPECTRAL MEASUREMENT AND CORRELATION
The letter j is used to denote the imaginary unit customary in electrical engineering. The SE convention is that exp(jωt) describes positive frequencies, and R + jωL + 1/(jωC) is the impedance of a series circuit of a resistance R, an inductance L and a capacity C. Because propagating waves are described by exp(ikx - iωt) in physical problems, we regard exp(-iωt) as describing positive frequencies, so that the physics convention is equivalent to setting j = -i consistently. It is also consistent with the convention in quantum mechanics that a Schrödinger wave function has the factor exp(-iEt/ℏ), where E is the energy of the system and is positive for positive energies (or positive frequencies E/ℏ). In this definition, Eq. (4.3), the interval on t is truncated to the region -T < t < T, its Fourier transform is taken, and the result squared. Since a measurement would attempt to filter out one component ω and square it, this definition is reasonable. What is not yet clear is why one divides by T rather than T²; this will become clear later. The brackets ⟨·⟩ denote an ensemble average. It is curious that both the limit T → ∞ and an ensemble average are taken. For ergodic systems, a time average and an ensemble average are equal because in such systems, over an infinite time, the system will visit all the points in phase space over which an ensemble average is made. (The more precise statement, for quasi-ergodic systems, is that over time a single system comes arbitrarily close to all points in phase space.) This is the reason why statistical mechanics works. The experimenter measures a time average. The theorist finds it much easier to calculate an ensemble or phase space average. Yet their results agree. For an experimenter to make an ensemble average, he would have to average over an infinite number of systems. Instead, he averages over time for one system. In most cases, then, either average is adequate, and performing both is redundant.
The above assumption is, however, wrong for the measurement of noise. Middleton (1960) shows that if the ensemble average is not performed, substantial fluctuations occur in the value of G_s(ω). Presumably, this sensitivity occurs because we are asking for the noise at a precise frequency. Because of the Fourier relation between frequency and time, a measurement accurate to Δω requires a time t > 1/Δω. Realistic noise measurements, to be discussed below, using filters of finite width, are presumably ergodic.
The definition of the noise spectrum using the autocorrelation function
The above definition, Eq. (4.1), assumes the noise is stationary. A more general definition of the noise at a frequency ω and time t is given by the Fourier transform
of the Wigner (1932)-Moyal (1949) type of autocorrelation function, guaranteed to yield a real, but not necessarily positive, G(a, ω, t) in the stationary case. For future use, we note that the inverse of Eq. (4.4) is
When u = v = t, we get the normalization
which is a natural generalization of Eq. (4.2). Here, negative frequencies are included. But the results are consistent with Eq. (4.2) if (and only if) the integrand is an even function of frequency. The possibility of a noneven function of frequency appearing is limited to the quantum case, as shown in Section 4.4. The nonstationary case is discussed in the Lax (1968) Brandeis lectures, and in Section 4.5. In the stationary case, the autocorrelation, Eq. (4.6), is invariant under a shift of time origin. In particular, this means that in the stationary case both R(t, τ) and G(a, ω, t) are independent of t. Also in the stationary case, one can replace R(τ) by the form below, since a shift of the time origin by τ/2 is permitted. But this new form is not generally correct, and indeed is not necessarily real.
4.3
The Wiener-Khinchine theorem
The Wiener-Khinchine (W-K) theorem states that the noise spectrum, Eq. (4.3), is given by the Fourier transform of the autocorrelation function, Eq. (4.4). This is equivalent to the statement that the above two definitions of noise are equivalent. We shall prove the Wiener-Khinchine theorem by evaluating G_s(a, ω) in terms of G(a, ω):
This result was obtained by writing the squared integral in Eq. (4.3) as a product of two separate integrals, and using different integration variables in each factor. In
the stationary case (for which the W-K theorem is valid) Eq. (4.7) can be written
If Eq. (4.11) (with ω replaced by ω′) is inserted into Eq. (4.10),
The last step moved the limiting procedure under the integral sign and used
The appropriateness of the limit, Eq. (4.13), as discussed in Section 1.13 on delta functions, is based on the facts that (a) the integral of the left hand side, for any T, is 1; and (b) the width of the function is of order 1/T and the maximum height, at ω′ = ω, is of order T. This function becomes very tall and narrow. An integration of this function against any function G(ω′) of bounded variation will be sensitive only to its value at the peak ω′ = ω. See Eq. (1.200). Note that Eq. (4.11) with u = t leads to the normalization condition
where the notation
is the customary symbol for spectral density used by statisticians. This normalization in Eq. (4.14) is equivalent to the customary choice, Eq. (4.2), when G(a, ω) is even in ω, but more general when it is not. It follows easily from time reversal that evenness holds for classical variables, but this is not true for quantum mechanical variables (our definitions apply to the quantum case if a* is replaced by the Hermitian conjugate, a†). The quantum case will be discussed in Chapter 7 in deriving the fluctuation-dissipation theorem.
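The W-K theorem can be illustrated numerically. For a discrete first-order autoregressive process (a sampled analogue of the exponentially correlated processes of Chapter 3; the coefficient, segment length, and probe frequencies below are our own choices), the finite-time periodogram of Eq. (4.3), averaged over segments, approaches the Fourier transform of the autocorrelation function, which for this process is 1/|1 - a e^{-iω}|² per unit driving-noise variance:

```python
import cmath
import math
import random

random.seed(5)

a, N, nseg = 0.5, 128, 400     # AR(1) coefficient, segment length, segments
ks = [5, 30]                   # probe frequencies omega_k = 2 pi k / N
twiddle = {k: [cmath.exp(-2j * math.pi * k * n / N) for n in range(N)]
           for k in ks}

est = {k: 0.0 for k in ks}
x = 0.0
for _ in range(1000):          # discard the start-up transient
    x = a * x + random.gauss(0.0, 1.0)
for _ in range(nseg):
    seg = []
    for _ in range(N):
        x = a * x + random.gauss(0.0, 1.0)
        seg.append(x)
    for k in ks:
        X = sum(s * w for s, w in zip(seg, twiddle[k]))
        est[k] += abs(X) ** 2 / N     # finite-time periodogram, Eq. (4.3) style
for k in ks:
    est[k] /= nseg

def S_theory(k):
    # Fourier transform of the autocorrelation R(m) proportional to a^|m|
    w = 2.0 * math.pi * k / N
    return 1.0 / abs(1.0 - a * cmath.exp(-1j * w)) ** 2

print([(est[k], S_theory(k)) for k in ks])
```

Note that both the segment averaging (the ensemble average) and a long record are needed: a single segment's periodogram fluctuates by order 100% about the W-K value, which is Middleton's point quoted in Section 4.2.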
4.4
Noise measurements
The definition of the noise spectrum using realizable filters An actual measurement of noise at a frequency UJQ passes the signal a(t) through a filter described in the time domain by
where K(t) is known as the indicial response of the filter, or its response to a δ(t) input pulse. In order that the filter be realizable, hence causal, output can only appear after input, so that
The upper limit in Eq. (4.16) can thus be extended to infinity without changing the value of the integral. In terms of Fourier components
Equation (4.16) yields the convolution theorem result
where
is chosen to emphasize the frequency region near ω0. Thus we expect the output spectrum to be |k(ω, ω0)|² times the input spectrum
However, this argument is heuristic, since the integral for a(ω) does not converge in the usual sense, because the integrand in Eq. (4.18) does not decrease as t → ∞. What is actually measured is
the time average of the squared signal. The subscript m denotes the definition of noise using the filter. For long enough T, we expect ergodicity, and can replace the
time average by the ensemble average. Equation (4.22) and Eq. (4.16) combine to yield
Equation (4.23) and Eq. (4.24) are valid for nonstationary processes. Stationarity was assumed only in the last step to obtain Eq. (4.25). Order has been preserved in the above steps so that they remain valid for noncommuting operators. Using the Wiener-Khinchine theorem in reverse, Eq. (4.11), to eliminate the autocorrelation we obtain
The factor 4π arises because of the convention followed in Eq. (4.14). Thus the desired spectrum at frequency ω0 can be extracted by using a sharp enough filter |k(ω, ω0)|². With an appropriate choice of filter K(t) we have described a Hewlett-Packard spectrum analyzer.
Example: A realizable filter
The simplest example of a realizable filter is to regard a(t) as a voltage placed across an R-L-C circuit, with the output a_out(t) obtained across the resistance. The differential equations describing this filter are:
These equations result in the Fourier relation in which ω0 = 1/(LC)^{1/2} is the resonance frequency.
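A quick numerical look at this filter confirms its stated properties. The sketch below uses hypothetical component values of our own choosing, takes |k(ω)|² = R²/(R² + (ωL - 1/(ωC))²) for output across R, and checks the unit response at resonance, the half-power points a detuning R/(2L) away, and the integral πR/(2L) that fixes the coefficient quoted below:

```python
import math

R, L, C = 50.0, 1e-3, 1e-9          # hypothetical component values
w0 = 1.0 / math.sqrt(L * C)         # resonance frequency, ~1e6 rad/s

def k2(w):
    # |k(w)|^2 for output taken across R in a series R-L-C circuit
    reactance = w * L - 1.0 / (w * C)
    return R * R / (R * R + reactance * reactance)

# peak value 1 at w0; half-power points detuned by about R/(2L)
peak = k2(w0)
half = k2(w0 + R / (2.0 * L))

# trapezoidal check of  integral_0^infinity |k|^2 dw = pi R / (2 L)
h = 500.0
grid = [h * n for n in range(1, int(5e7 / h))]
integral = h * (sum(k2(w) for w in grid)
                - 0.5 * k2(grid[0]) - 0.5 * k2(grid[-1]))
print(peak, half, integral, math.pi * R / (2.0 * L))
```

Here Q = ω0 L/R = 20, so the resonance is already narrow enough that the Lorentzian approximation of Eq. (4.31) is accurate to a few percent at the half-power points.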
The measured spectrum G_m(ω0) continues to be given by Eq. (4.26). In the limit when the Q = ω0L/R of the oscillator becomes large, there are two sharp resonances at ±ω0 and we can approximate
where the coefficient πR/(2L) was chosen to yield the correct integral
This integral was evaluated exactly using formula 031.10 in Gröbner and Hofreiter (1950), shown below. For ab > 0,
where the transformation used was
4.5
Evenness in ω of the noise?
Before we answer the above question, let us note that G(a, ω) by the standard definition of noise is manifestly real. This reality extends to the quantum case, even when a is non-Hermitian. To see this, let us introduce the correlation noise G_{A†B}(ω) by
where A and B are possibly non-Hermitian operators and the dagger represents Hermitian conjugation. (For the classical case, simply regard the dagger as taking a complex conjugate.) Note that with t → -t and the use of stationarity, Eq. (4.36) can be rewritten as
Our convention is appropriate to having A(t)† contain frequency behavior of the form exp(+iωt) [or A(t) containing behavior of the form exp(-iωt)]. It also preserves the normal order, namely daggered operators to the left of undaggered operators.
76
SPECTRAL M E A S U R E M E N T AND CORRELATION
If we take the complex conjugate of the above equation, which entails taking the Hermitian adjoint of the argument in brackets:
we obtain by comparing to Eq. (4.36). Clearly, then, B†B is always Hermitian and G_{B†B}(ω) is real. The question we raised above was: under what circumstances is G_{B†B}(ω) an even function of ω? Alternatively, when is R_{B†B}(t) an even function of t, where we define
Although the principal applications in this book are to classical physics and economics, in which random variables commute (can be written in any order) we maintain the order of our variables so that they remain valid in a quantum context. Thus, where complex conjugate (classically) is replaced by Hermitian conjugate, using a dagger, we write for the Hermitian conjugate of a product
because the Hermitian conjugate of a product is the conjugate of the factors in reverse order. In the classical case, the complex conjugate is used and the order is unimportant since classical variables commute. Stationarity can be used to show that
so that in particular R_{B†B}(t) obeys
Thus R_{B†B}(t) would be even in t if it were real. Does it help if B is Hermitian? In general, we have
The second equality is valid only if B is a classical variable, or if the fluctuations in B do not involve the variable conjugate to B. The other steps use stationarity. In the general quantum case, then, G_{B†B}(ω) is not an even function of ω, although it is even in the classical limit for a real variable B. The inequality of Stokes and anti-Stokes scattering intensities is a consequence of this quantum induced lack of evenness.
4.6
Noise for nonstationary random variables
The determination of the noise spectrum associated with a nonstationary random process appears to be an oxymoron. We are searching for a function G(ω, t) describing a density in the two-dimensional ω-t space. Thus, at each time t, we require a power spectrum G(ω) in frequency. However, these are conjugate variables in the sense that the frequency distribution is related to a Fourier transformation of the original random variable, say X(t). But then there is an uncertainty principle of the form Δω Δt ≳ 1. This is consistent with the requirement that a measurement of the noise at a single frequency ω requires an infinite measurement time, in agreement with the limiting process T → ∞ used in Eq. (4.3). One consequence of this difficulty is that a number of definitions have been suggested in the literature, for example, by Page (1952) and Lampard (1954). A detailed analysis of the spectra proposed by Page and Lampard is made by Eberly and Wodkiewicz (1977), who also provide a fourth definition of the time dependent spectrum, which they call "the physical spectrum of light". We shall not attempt to review this work here since Eberly and Wodkiewicz (1977) have already made detailed comparisons. What appears to have been overlooked in these references is that a number of solutions appeared earlier for an analogous problem in quantum mechanics. Position and momentum variables can also not be measured simultaneously with complete precision because of the Heisenberg uncertainty principle. Thus a simultaneous distribution function for position and momentum would appear to be just as much an oxymoron. However, Wigner (1932) proposed an elegant solution for the density in phase (position and momentum) space, which was expounded later at some length by Moyal (1949). In the Brandeis lectures, Lax (1968) suggested that the Wigner-Moyal choice could be applied to the case of noise in nonstationary problems. Equation (4A18) of the Brandeis lectures is the same as Eqs. (4.5), (4.6) here.
In Lax (1968QXI), however, it is shown that there are many possible choices of distribution functions. In particular, there is the Wigner symmetric distribution, the de Rivier symmetric distribution, and the normal and antinormal distributions. If q is position and p is momentum, then there are combinations, roughly q ± ip, that are the negative and positive frequency parts (or, in quantum mechanics, the destruction and creation operators, respectively). Normal order involves all creation operators to the left of all destruction operators. It is then possible to construct normally ordered and antinormally ordered distributions. They are different numerically, but related.
In Lax (1968QXI), the point is that any of these distributions can be used, and they should all lead to the same final answer. But which one is most convenient to use depends on what physical quantity is to be determined. For example, if measurements are made with photon counters, then the antinormal distribution is best in the sense that the desired results can be obtained by a simple integration over a classical distribution function. But if another choice is made, corrections will have to be calculated, as described in Lax (1968QXI) and in Lax and Yuen (1968QXIII). In our work on laser line-widths and photocount distributions, the distribution function could be calculated analytically, and so the antinormal one was calculated. In this section, our objective is different. We must choose the form of spectral distribution that is easiest to measure. An additional consideration is to make a choice that is best for computing the physical result of interest. The latter may depend on the nature of the measuring devices. It also depends on the nature of the process involved, particularly if we have some knowledge about it. For example, Bendat and Piersol (1971) display three processes. See Fig. 4.1. The first is one in which a random process (with zero average) is modified by a time varying mean value. In the second, the mean is zero but the mean square varies randomly. In the third, the frequency varies randomly. It is doubtful that one choice of spectral formula is better than all others for all three cases.
Conditions (other than seasonal effects) could be sufficiently different at the start of each year that averages (say, over 100 years) might give very misleading results. Perhaps the most suitable case for analysis is one in which the time-scale of the nonstationary part of the process is much longer than the time-scale of the stationary part. An appropriate starting point for the nonstationary case would be Eq. (4.23) or (4.24), in which one passes the signal a(t) through a filter by a convolution with K(t − t′) and then takes the absolute square of the result. The absolute squared result can be time averaged over a suitably chosen time interval, as shown in Eq. (4.22). The latter step can also be replaced by averaging with an exponential weight:
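The exponential-weight procedure can be sketched numerically. The one-sided decaying kernel below, the test signal, and all rates are illustrative assumptions, not the specific K(t − t′) of Eq. (4.23); the sketch only shows the structure: filter, absolute-square, then exponentially weighted time average.

```python
import numpy as np

def running_power(a, omega, gamma, dt):
    # convolve with a decaying kernel exp(-(gamma + i*omega) t), then
    # exponentially weight the absolute square of the filtered output
    filtered = 0j
    power = np.zeros(len(a))
    decay_f = np.exp(-(gamma + 1j * omega) * dt)   # filter kernel step
    decay_p = np.exp(-gamma * dt)                  # exponential weight step
    for i in range(1, len(a)):
        filtered = filtered * decay_f + a[i] * dt
        power[i] = power[i - 1] * decay_p + abs(filtered) ** 2 * gamma * dt
    return power

rng = np.random.default_rng(0)
dt = 0.01
t = np.arange(0.0, 50.0, dt)
a = np.cos(5.0 * t) + 0.1 * rng.standard_normal(len(t))

p_on = running_power(a, omega=5.0, gamma=0.5, dt=dt)    # at the spectral line
p_off = running_power(a, omega=20.0, gamma=0.5, dt=dt)  # far off resonance
print(p_on[-1] / p_off[-1])   # large: the estimator resolves the line
```

The parameter gamma plays the double role discussed in the text: it sets both the frequency resolution and the effective length of time data retained.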
The choice we made of a realizable filter in Eq. (4.30) was an RLC circuit, for simplicity. It has two resonances at frequencies whose real parts are equal in magnitude but opposite in sign. The procedure used by Eberly and Wodkiewicz (1977) is equivalent to creating a filter with only a positive frequency resonance.
NOISE FOR NONSTATIONARY RANDOM VARIABLES
FIG. 4.1. Three processes displayed by Bendat and Piersol (1971). The first is one in which a random process (with zero average) is modified by a time-varying mean value. In the second, the mean is zero but the mean square varies randomly. In the third, the frequency varies randomly.
Since both their filter and ours have a dissipative term that controls the frequency linewidth, it also restricts the range of time data used. Our subsequent average over a time T or 1/γ, with γ in Eq. (4.47), provides separate control over the time and frequency intervals. This freedom may be illusory, since the time-frequency product must exceed unity. Our procedures are thus very similar. There are some small errors inherent in both, since our predicted noise, calculated over a time interval, is probably a better estimate of the value in the middle of that interval.
A detailed examination of the problem of time-varying spectra has recently been made by Cohen (1995), who compares a wide variety of possible choices. If we regard the problem as one of determining the spectrum from experimental data in the presence of distortion, it becomes an ill-posed problem of the sort discussed in Chapter 15 on "Signal Extraction in the Presence of Smoothing and Noise". The ill-posed nature of such problems is overcome by regularization procedures. Although they are not referred to in this way, the wide variety of windowing procedures perform this function. We shall return to this problem of noise in nonstationary systems in Chapter 17 on the "Spectral Analysis of Economic Time Series".
4.7 Appendix A: Complex variable notation
Even if a(t) is a real variable, such as the voltage V(t), the associated Fourier transform
is complex. If a(t) is a stationary variable, this expression for a(ω) is not a convergent integral. Mathematicians might say this expression is meaningless. However, its moments are meaningful:
Inserting the inverse Wiener-Khinchine relation, Eq. (4.11), for the last factor in Eq. (4.49) one gets
an expression in terms of the noise spectrum itself of the variable a. The presence of the delta function shows that only two Fourier components at the same frequency interfere with each other. Note that ω = 2πf relates angular frequencies to ordinary frequencies. The last term uses the notation S(a, f) = (1/2)G(a, ω) of Eq. (4.15), common in mathematics and statistics books.
Thus, if we have two variables related by a complex factor, such as the current I(ω) and the voltage V(ω) related by an impedance Z(ω), then we can relate the corresponding noise spectra by
This is an immediate consequence of the relation
and of Eq. (4.50).
5 Thermal noise
5.1 Johnson noise
In this chapter we introduce thermal noise by reviewing the experimental evidence and deriving Johnson noise using a phenomenological thermodynamic approach. A more rigorous derivation will be described in Chapter 7. Johnson (1928) measured the voltage noise in a variety of materials as a function of resistance. He found the results shown in Fig. 5.1, namely that ⟨V²⟩, the mean square fluctuation in voltage, is proportional to the resistance R, independent of the material. See also Kittel (1958).
FIG. 5.1. The noise measured by Johnson (1928) versus resistance in six diverse materials.
FIG. 5.2. Thermal noise for two resistors in parallel versus temperature obtained by Williams (1937), plotted as an effective resistance ⟨V²⟩/[4k df Tₐ] against T₂/Tₐ. Williams takes Tₐ to be T₁, except in the one-resistor case, for which R₁ is infinite, and he then chooses Tₐ to be room temperature. Both theory, in Eq. (5.2), and experiment are linear functions of temperature. Line B is the two-resistance case, and line A is the one-resistance case.
Johnson found that the measured noise power in the frequency interval is proportional to the temperature of the resistor from which the noise emanates
The proportionality factor k was found by Johnson to agree with Boltzmann's constant. Williams (1937) performed experiments using two resistors, R₁ and R₂, in parallel, measuring the noise power ⟨V²⟩ in the frequency interval df with two different temperatures, T₁ and T₂, on the resistors. Experimental data are plotted in Fig. 5.2 as the effective resistance, defined by Rₑ = ⟨V²⟩/(4k df Tₐ), as a function of T₂/Tₐ, where Tₐ = T₁ in the case of two resistors (case B), and Tₐ is room temperature when R₁ = ∞ in the case of one resistor (case A). Moullin (1938) gives a review of the experimental measurements and concludes that the noise in a frequency interval df is given by
where k is Boltzmann's constant. Theoretical results from Eq. (5.2) are represented by solid lines in Fig. 5.2, compared with experimental data.
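For a feeling of the magnitudes in Eq. (5.2), a short computation; the resistor value and bandwidth are chosen only for illustration.

```python
import math

# Johnson noise, Eq. (5.2): <V^2> = 4 k T R df
k_B = 1.380649e-23          # Boltzmann's constant, J/K

def johnson_vrms(R, T, df):
    return math.sqrt(4 * k_B * T * R * df)

# a 1 Mohm resistor at room temperature, observed in a 10 kHz band
v = johnson_vrms(R=1e6, T=300.0, df=1e4)
print(f"{v * 1e6:.1f} microvolts rms")   # about 12.9 microvolts
```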
Moullin (1938) generalizes this result to the case of an arbitrary number of impedances in parallel:
If we remember that the admittance Yₙ is related to the impedance Zₙ by
and write V = ZI, this result takes a simpler form in terms of current fluctuations
Other useful references on networks and noise are Murdoch (1970), Bell (1960) and Robinson (1962, 1974).
5.2 Equipartition
The equipartition theorem in (classical) statistical mechanics asserts that at equilibrium an energy of kT/2 is stored in each degree of freedom. Thus a harmonic oscillator would have kT/2 stored in its potential energy, and an equal amount stored in its kinetic energy. The same would be true for the capacitance and inductance in an LC circuit, the electromagnetic analogue of a harmonic oscillator. But these devices (ideally) are not a source of noise. Thus it behooves us to establish compatibility with thermodynamics by showing that adding a small resistance, R, in series with an LC circuit will produce exactly the required amount of energy from noise in the circuit, regardless of how weak the resistance R is. Consider a series circuit of resistor R, inductance L and capacitor C, with Johnson noise V(t) in the resistor. Then the resulting current and charge fluctuations I and q are
The fluctuation energy in the inductance using Eq. (5.1) is
where we have set ω = ω₀x, with ω₀ = 1/(LC)^(1/2) as the circuit resonant frequency, and Q = ω₀L/R as the circuit Q factor (energy stored over energy lost per cycle). This integral can be performed using the residue theorem of complex variable theory, or by using Eq. (4.33), which was obtained from Gröbner and Hofreiter (1950). Similarly, the energy stored on the capacitance is
Thus, for both the inductance and the capacitor, the energy stored because of the noise in the resistor is precisely that expected from the equipartition theorem. Note that Eq. (5.7) is a relation between Fourier components, so that, for example, I and q should be written I_ω and q_ω, whereas in Eqs. (5.8) and (5.9) we are really dealing with the time-dependent quantities ⟨I(t)²⟩ and ⟨q(t)²⟩, respectively. A fundamental truth now emerges. Fluctuations must be associated with dissipation in order that the system does not decay to zero, but maintains the appropriate thermal equilibrium energy.
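This compatibility can be checked numerically: driving a series RLC loop with the Johnson-noise spectrum G(V, f) = 4kTR and integrating the resulting current spectrum should store exactly kT/2 in the inductance. The circuit values below are illustrative (Q = 10).

```python
import numpy as np

k_B, T = 1.380649e-23, 300.0
R, L, C = 100.0, 1e-3, 1e-9                    # illustrative series-RLC values

w0 = 1.0 / np.sqrt(L * C)                      # resonant angular frequency
w = np.linspace(1.0, 20 * w0, 2_000_001)
dw = w[1] - w[0]
Z2 = R**2 + (w * L - 1.0 / (w * C))**2         # |Z(omega)|^2 of the loop

# <I^2> = integral over f of G(V, f)/|Z|^2, with G(V, f) = 4kTR
I2 = np.sum(4 * k_B * T * R / Z2) * dw / (2 * np.pi)
E_L = 0.5 * L * I2                             # energy stored in the inductance
print(E_L / (k_B * T))                         # close to 0.5, i.e. kT/2
```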
5.3 Thermodynamic derivation of Johnson noise
In view of the compatibility with thermal equilibrium shown in the preceding section, it is not surprising that a simple thermodynamic argument can be used to demonstrate that the noise emanating from a resistor must be proportional to its resistance. Consider two resistors in a series circuit, shown in Fig. 5.3. The current I₁ through resistor R₂ produced by the first resistor is
FIG. 5.3. Power transfer from resistance R₁ to R₂ and vice versa.
where Vⱼ is the Johnson noise voltage in resistor Rⱼ. The power from V₁ into R₂ is given by
Conversely, the power from resistor 2 into resistor 1 is given by
If both resistors are at the same temperature, the second law of thermodynamics requires that there can be no steady net flow (in either direction). Equating Eqs. (5.11) and (5.12), we obtain
Since the left-hand side of the equation is independent of R₂, and the right-hand side is independent of R₁, this equality requires both sides to be independent of both resistances. It therefore must be an (as yet unknown) universal function of frequency, W(f). In summary, the noise spectrum associated with an arbitrary resistance R is given by
If resistance R₂ is replaced by an impedance R(f) + jX(f), the same second law argument requires that
Thus we can conclude that the noise G(V, f) associated with impedance Z(f) is given by
where ℜ denotes the real part. Thus the noise is proportional to R(f) = ℜZ(f) even when the impedance Z(f) is frequency dependent. The Johnson law has therefore been generalized to the case of frequency-dependent impedances.
5.4 Nyquist's theorem
A more complete derivation of the fluctuation-dissipation relation, including a determination of the universal coefficient W(f), is provided by Nyquist (1927, 1928). Nyquist's procedure is to calculate the power dissipated in a load connected to the end of a transmission line in two different ways and compare the results. Equation (5.11) for the power from resistor R₁ into R₂ reduces, with Eq. (5.14), to
The maximum power is transferred when the impedance is matched, R₁ = R₂.
On the other hand, a transmission line can be terminated with its "characteristic impedance"
where L is the inductance per unit length of the line and C is its shunt capacitance per unit length. In this case, waves traveling down the line are not reflected. The line acts as if it were infinite. Nyquist therefore chooses as his proof vehicle a transmission line terminated by R₀ at both ends. The line is assumed to have length l. The transmission line can be described in terms of its modes, which are harmonic oscillators. If U is the energy density per mode, then the energy per mode is
where we have made use of the equipartition theorem, valid for modes that behave like harmonic oscillators. If the modes are described as plane waves exp(±ikx) in a periodic system of length l, then k takes the discrete values
Thus the number of modes in the interval Δk is (l/2π)Δk.
Since ωₙ = vkₙ is the frequency associated with mode n, where v is the velocity of propagation on the line, the number of modes propagating to the right in a given frequency interval is
Since each mode carries an energy U with a velocity v, the power transmission down the line is
which according to Eq. (5.18) should be equal to W(f)Δf/4. We obtain
In the limit of classical physics, Eq. (5.20) applies, and Eq. (5.23) then yields the classical Nyquist theorem
where R is the characteristic resistance of the transmission line. This equation is in agreement with Johnson's experimental results. The beautifully simple Nyquist proof yields a result independent of frequency because all the harmonic-oscillator traveling modes have the same energy kT, and because the density of these modes is uniform in frequency. The normalization must, of course, be that found in order to obtain agreement with the equipartition result of Section 5.2. An apparent problem with Nyquist-Johnson noise is that the total voltage fluctuation diverges. Nyquist suggested that this problem could be removed if the classical energy, kT, associated with a harmonic oscillator were replaced by the quantum energy
which approaches kT at low frequencies and vanishes exponentially at high frequencies. Of course, the actual energy associated with a harmonic oscillator includes the zero-point energy. If the latter is retained, the divergence in the integrated energy reappears.
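Nyquist's quantum replacement can be made concrete. Using the Planck form for the mode energy (zero-point energy omitted, as in the text), the classical value kT is recovered at low frequency and the high-frequency contribution is exponentially cut off; the two sample frequencies below are illustrative.

```python
import math

k_B, h = 1.380649e-23, 6.62607015e-34   # Boltzmann and Planck constants

def mode_energy(f, T):
    # Planck form: h f / (exp(h f / k T) - 1), without zero-point energy
    x = h * f / (k_B * T)
    return h * f / math.expm1(x)

T = 300.0
print(mode_energy(1e6, T) / (k_B * T))    # about 1: classical regime
print(mode_energy(5e14, T) / (k_B * T))   # vanishingly small: optical regime
```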
It is sometimes argued that zero-point energy can be ignored because only differences in energy can be observed. However, this is not true for magneto-optical transitions between Landau levels in the valence band and similar levels in the conduction band. These levels possess a level structure like that of a harmonic oscillator, but the frequency is the cyclotron frequency associated with the magnetic field. It is thus inversely proportional to the effective mass of the electrons in the conduction band, or of the holes in the valence band. Since these masses are different, the energy differences contain the difference of the two zero-point energies, which is therefore observable. See Lax (1967). In the Casimir effect (Casimir and Polder 1948), two closely spaced metallic plates are attracted by the influence of vacuum fluctuations in the gap on van der Waals forces. The Casimir effect has been experimentally verified by Deryagin and Abrikosava (1956), Deryagin, Abrikosava and Lifshitz (1956), Kitchener and Prosser (1957), and Chan et al. (2001). An alternative derivation of the "Lamb shift" between the otherwise degenerate s and p levels in a hydrogen atom was given by Welton (1948), based on the effects of the zero-point fluctuations of the electromagnetic field on the electron. The relevance of zero-point energies in the electromagnetic field is discussed further in Chapter 7 as it relates to the area of quantum optics. We also note that absolute energies (not just differences) are relevant in general relativity, since space is distorted by the total energy content. For an elementary discussion of these points see Power (1964). Callen and Welton (1951) considered a general class of systems (quantum mechanically) and established that Eq. (5.26), and its dual form with Y = 1/Z, the admittance, and g = ℜY(ω), the conductance:
apply to all systems near equilibrium, with the replacement of kT by Eq. (5.28) when necessary. The importance of the Callen-Welton work is the great generality of its potential applications. The fact that all dissipative systems have corresponding noises associated with them is necessary in order that the second law of thermodynamics not be violated when such systems are connected. The fluctuation-dissipation theorem will be discussed in more detail in Chapter 7, after the density operator tools needed to give a short proof are developed.
5.5 Nyquist noise and the Einstein relation
Consider a mechanical system with velocity v. Then the standard engineering (SE) noise associated with v can be defined by
The zero-frequency noise is then obtained by setting ω = 2πf = 0
But the usual diffusion constant, D, is defined by ⟨[Δx]²⟩ = 2DT, where T is the total time traveled; see Eq. (3.27). Thus the zero-frequency velocity noise is directly determined by the diffusion constant:
Conversely, the fluctuation dissipation theorem for the velocity (which is analogous to a current rather than a voltage) is given by Eq. (5.30)
where Y, with F the applied force, is the admittance, or velocity per unit applied force. At zero frequency, we refer to v/F as the mechanical mobility, B, and v/E as the (electrical) mobility μ, so that
The fluctuation dissipation theorem at zero frequency now reads
which is simply the Einstein relation between diffusion and mobility. An experimental verification of the Einstein relation for electrons and holes in semiconductors is given in the Transistor Teacher's Summer School (1953).
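A numerical illustration of the Einstein relation D = μkT/e; the mobility used is the standard room-temperature figure for electrons in silicon (about 0.14 m²/V s), quoted here only as an example.

```python
k_B = 1.380649e-23      # Boltzmann's constant, J/K
q_e = 1.602176634e-19   # elementary charge, C

def einstein_D(mu, T):
    # Einstein relation for carriers of charge e: D = mu k T / e
    return mu * k_B * T / q_e

D = einstein_D(mu=0.14, T=300.0)
print(f"D = {D * 1e4:.1f} cm^2/s")   # about 36 cm^2/s
```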
5.6 Frequency dependent diffusion constant
In some problems, such as the hopping conduction problem discussed by Scher and Lax (1973), the description is in terms of positions rather than velocities. Thus
it would be desirable to obtain an expression for the frequency-dependent mobility or diffusion constant in terms of positions instead of velocities. With the help of Nyquist's theorem, Eq. (5.30):
where v is a particle velocity and Y is the mechanical mobility
for an applied force at frequency ω = 2πf. These formulas can be simplified in the classical case by assuming that the time correlation is even in τ and ℜY(ω) is even in frequency. However, the results we derive below do not depend on this simplifying approximation. The Einstein relation
where μ is the electronic mobility, suggests that we can define a complex frequency-dependent diffusion constant:
Then Eq. (5.38) simplifies to
This transform can be inverted, as in Eq. (4.7), to yield
where we have used stationarity and t′ = t + τ. Thus we can write for the mean-square displacement:
Now differentiate with respect to t to obtain
We can invert this sine transform to obtain
By Eqs. (5.38) and (5.41), D(ω) is G(v, f)/4. Thus we obtain MacDonald's theorem
Equation (5.47) permits velocity noise to be calculated from position fluctuations, as required in Scher and Lax (1973). For obtaining the imaginary part of D(ω), see the discussion in the Appendix of Scher and Lax (1973).
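MacDonald's theorem can be checked on a process for which everything is known in closed form: an Ornstein-Uhlenbeck velocity with ⟨v(0)v(t)⟩ = v₂ exp(−g|t|), for which D(ω) = v₂g/(g² + ω²) and d⟨[Δx]²⟩/dt = (2v₂/g)(1 − exp(−gt)). The sketch assumes the normalization D(ω) = G(v, f)/4 used in the text; the parameter values and the Abel convergence factor are illustrative.

```python
import numpy as np

v2, g, w, eps = 1.0, 2.0, 3.0, 0.02          # illustrative parameters
t = np.linspace(0.0, 400.0, 400_001)
dt = t[1] - t[0]
msd_rate = (2 * v2 / g) * (1.0 - np.exp(-g * t))   # d<[dx]^2>/dt

# D(w) = (w/2) * integral_0^inf sin(w t) d<[dx]^2>/dt dt, with a small
# convergence factor exp(-eps t) regularizing the oscillatory tail
D_w = 0.5 * w * np.sum(np.sin(w * t) * msd_rate * np.exp(-eps * t)) * dt

exact = v2 * g / (g**2 + w**2)
print(D_w, exact)    # agree to a few percent (the bias vanishes as eps -> 0)
```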
6 Shot noise
6.1 Definition of shot noise
Shot noise is the name given to electrical fluctuations caused by the discreteness of electronic charge. For excellent elementary discussions of shot noise see Robinson (1962, 1974), MacDonald (1962), and Bell (1960). The most typical example concerns the emission of electrons from the cathode of a vacuum tube. For a discussion of noise in vacuum tubes see Lawson and Uhlenbeck (1950) and Valley and Wallman (1948). In Fig. 6.1 we display an example discussed by Robinson. The switch S is connected to A for a time interval τ. The current through the diode charges the condenser C. Then the switch is shifted to position B and the accumulated charge is measured by the ballistic galvanometer G. The actual charge measured will be
with integral n in a single measurement. The mean charge (average over many measurements) will be
with nonintegral results. Assuming random arrival of the electrons, we have Poisson statistics (see Section 3.1), and the root-mean-square fluctuation in charge is
FIG. 6.1. Apparatus for measuring shot noise associated with charge accumulated on a condenser using a ballistic galvanometer. After Robinson (1962).
given by
Thus by means of two macroscopic measurements it is possible to determine the electronic charge as
The case in which electric charge is continuous can be obtained by letting n approach infinity and e approach zero, keeping the product en fixed. Equations (6.2) and (6.3) demonstrate our contention that shot noise is created by the discreteness of charge and disappears in the continuous charge limit. If charge were a continuous fluid, shot noise would disappear. The first theory of the shot effect, due to Campbell (1909), not only yields the above result but, as shown in the next section, takes proper account of the shape and duration of the pulses. Schottky (1918) proposed a simple theory of the shot effect in relation to electrons thermally emitted from the cathode of a vacuum tube. He also recognized the possibility of using such measurements to determine the charge on the electron. The first experiments, by Hartmann (1921), obtained a charge between 0.07 and 3 times the electronic charge. Further theory of the shot effect was given by Fürth (1922) and by Fry (1925). In a remarkable series of papers, N. H. Williams demonstrated through careful experiments that the charge on the electron can be measured using the shot effect to an accuracy comparable to that of the Millikan oil-drop experiment. Moreover, this is a measurement of e alone, without involving other parameters such as the electron mass and the viscosity of air. The earliest paper in this series, by Hull and Williams (1925), measures the shot noise induced in an RLC circuit by a vacuum tube that emits electrons from a grid as well as the anode. See Fig. 6.2. This paper shows that even though the voltages can be adjusted so that the currents from these two sources cancel, the two shot noise contributions add. This, of course, is what would be expected if the two emissions were independent. The authors must also correct for the case in which the anode emission is not completely temperature limited.
In that case, there are electron correlations that reduce the shot noise as described in Section 6.5. Moreover this paper demonstrates that shot noise can be treated as a current source by examining the frequency dependence of the voltage noise.
The results were later confirmed by even more accurate experiments of Williams and Huxford (1929).
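The charge determination of Eqs. (6.2)-(6.3) is easy to mimic with synthetic data: simulate Poisson arrivals and recover e from two "macroscopic" numbers. The mean count per run and the number of runs are arbitrary illustrative choices.

```python
import numpy as np

# Poisson statistics give <(dQ)^2> = e <Q>, so the electronic charge
# follows from the mean and variance of repeated charge measurements.
e = 1.602176634e-19
rng = np.random.default_rng(1)

n = rng.poisson(1e4, size=200_000)   # electron counts in repeated runs
Q = n * e                            # accumulated charge per run, Eq. (6.1)

e_est = Q.var() / Q.mean()           # estimate of the electronic charge
print(e_est / e)                     # close to 1
```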
FIG. 6.2. Shot noise into a circuit containing a C in parallel with a series RL circuit, from Hull and Williams (1925), Fig. 8, or Fig. 4-12 of Lawson. The currents from the two sources can be adjusted to add to zero (on the average), but the shot noises from the independent emissions do not cancel.
The paper by Williams and Vincent (1926) measured the shot effect from a vacuum tube diode directly into a noninductive resistance and simplified the theory for emission into a nonperiodic circuit. The experimental results of the work of Williams and his collaborators required a careful analysis of both theory and experiment before adequate accuracy and understanding were obtained. An even more detailed analysis of experiment and theory is needed in the recent ingenious experimental work by de-Picciotto et al. (1997, 1998) to show that in the case of the fractional Hall effect the effective charge can be e* = e/3. Thus, although the charges are discrete, they are not necessarily integral.
6.2 Campbell's two theorems
Campbell (1909) was concerned with the measurement of the charge on the alpha particle. The charge q, given to an electrode system, generates a voltage q/C (where C is the electrode capacity) which decays through some leakage resistance, so that (q/C) exp(−pt) is the voltage V on the electrometer plates. The voltage V generates a torque KV on the electrometer needle, whose response is determined by its moment of inertia I, torsional stiffness k, and damping μ through the relation
The parameters I, k, μ, K are presumed known from a separate experiment. The solution for Θ can be written qf(t). If a set of pulses arrive at the times tᵢ, the
complete response is
For a critically damped galvanometer
but we need not restrict our calculations to any particular form of f(t). The latter is the indicial response, that is, the response of the apparatus to a delta function source. We are now in a position to state:
Campbell's theorem
If the pulses in Eq. (6.8) arrive at random at an average rate ν per second, the average response is given by the first Campbell theorem:
and the variance of the response is given by the second Campbell theorem
where I = νq is the average current. Note the asymmetry: ⟨(ΔΘ)²⟩ = ⟨Θ²⟩ − ⟨Θ⟩² is expressed in terms of f², not in terms of f² − ⟨f⟩². Since f(t) describes the "indicial" response of the apparatus to a delta pulse, its Fourier transform F(f), with f = ω/2π, is
describes the frequency response of the apparatus to a delta time source.
Proof
Equation (6.8) can be written in the form
where the ideal shot noise function
is the density of events tᵢ for the ideal shot noise case. Each delta function gives rise to one pulse in the series, with average value
The last step, setting the average time-dependent rate ⟨ν(s)⟩ to a constant ν, the average number of events per second, is appropriate (only) in the stationary case. Thus we obtain Campbell's first theorem:
where stationarity is used in the last step. To obtain ⟨[ΔΘ(t)]²⟩, we shall consider the slightly more general problem, with t′ ≠ t:
In performing the average, one must separate the double sum over i and j into the i = j terms and the i ≠ j terms:
The second term is the definition of the (possibly correlated) joint rate:
Thus
The fluctuation is defined by
Thus the fluctuation in Θ(t) is given by
When correlations are absent, the second term vanishes. In the case of uniform (on the average) flow, ⟨ν(s)⟩ = ν is time independent, and
Equation (6.23) is a generalization of Campbell's second theorem. The latter is restricted to the case in which t = t′:
We close this section with an application of Campbell's theorem to the RC circuit shown in Fig. 6.3, determining the charge e from the voltage fluctuations induced by shot noise. The vacuum tube generates a charge e on the condenser that decays away with the usual RC time constant. The voltage across the condenser plates is given by
Campbell's theorems then yield
where I = νe and
Again, the electronic charge can be determined by comparison of ⟨(ΔV)²⟩ with ⟨V⟩. If one takes the limit in which the charge e goes to zero and ν goes to infinity at fixed current I = νe, the discreteness of the charge, and the shot noise associated with that discreteness, disappear.
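Both Campbell theorems for this circuit can be checked by direct simulation. Each arrival deposits charge q on the condenser, giving a pulse (q/C) exp(−t/RC); the recursion below superposes such pulses exactly. All numerical values (pulse charge, rate, RC) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
R, C, q, nu = 1.0, 0.1, 1.0, 50.0      # illustrative circuit and rate values
dt, n_steps = 0.002, 500_000

counts = rng.poisson(nu * dt, n_steps)  # Poisson arrivals in each time step
decay = np.exp(-dt / (R * C))
V = np.empty(n_steps)
v = 0.0
for k in range(n_steps):
    v = v * decay + counts[k] * (q / C)  # exact update for exponential pulses
    V[k] = v

print(V.mean(), nu * q * R)                # first theorem:  <V> = nu q R
print(V.var(), nu * q**2 * R / (2 * C))    # second theorem: nu q^2 R / (2C)
```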
6.3 The spectrum of filtered shot noise
Parseval's theorem (see Morse and Feshbach 1953) states that
If we regard |f(t)|² as the density of energy in time, then |F(f)|² can be regarded as the energy density in frequency f. Parseval's theorem shows that the total energy
FIG. 6.3. An RC circuit for application of Campbell's theorem.
can be obtained by adding either the frequency components or the time components, with equal result. A simple nonrigorous proof can be given, using Eq. (6.12), to rewrite the right-hand side in the form
We have reversed the order of integration. Since the last integral over frequency f is simply the delta function δ(u − t), representing completeness, the integral reduces to the left-hand side of Eq. (6.29). Since f(t) is real, F(f)* = F(−f), so |F(f)|² is even in f and
Thus Eq. (6.24) can then be rewritten as
This relation is a consequence of the completeness of the Fourier transform relationship, Eq. (6.12). The second form is written to suggest that the white spectrum is the spectrum of ideal shot noise. We shall derive this spectrum below directly from Eq. (6.14). The factor |F(f)|² can then be interpreted as the filter that reduces the ideal shot noise of Eq. (6.14) to that of Θ(t). The function f(t) is the indicial response of the filter, namely the response to a delta function input. That the spectrum at the output is that of the input pure shot noise multiplied by the spectrum
of the filter is so reasonable that it hardly requires proof. We shall, however, make a direct evaluation of the spectrum of pure shot noise in the next section, since we are then sure that all our normalizations are correct, in addition to the shape of the spectrum. The current associated with the arrival of charges q at times tᵢ can be written
where the symbol S is used to remind us that we are referring to the current associated with pure shot noise. If the expected arrival rate is ν(t) per second, then
The fluctuation is given, with the help of Eq. (6.21), as
If the arrivals are independent
and if they are constant on the average, ν(t) = ν, so that
The spectrum is then independent of frequency f
since the average current I is time independent, as anticipated in Eq. (6.33). Such a constant spectrum is referred to as "white". We shall illustrate the above result by calculating the voltage spectrum across the condenser C in Fig. 6.3 without using Campbell's theorem. The full current, i_full = i, passes through the parallel combination of the condenser, C, and the resistance, R, with i_C and i_R passing through the condenser and the resistance, respectively, in proportion to the admittances of these elements:
Thus the voltage across either is given by
Equation (6.41) determines the spectrum of v from that of i_full. The spectrum of voltage fluctuations is
and the total voltage fluctuation is
in agreement with Eq. (6.27), obtained using Campbell's theorem.
6.4 Transit time effects
Shot noise arises because charge is discrete. In a vacuum tube diode each electron crosses from cathode to anode. It would be incorrect, however, to assume that the external circuit sees a delta pulse at the time of arrival of each electron. Instead we shall follow a simple model developed by Shockley (1938), as discussed by Freeman (1952); in Section 6.5 we shall supply an elementary proof of the validity of the model. If a charge e has advanced a distance x, a fraction x/L of the total distance L from the cathode to the anode, we shall assume that the external circuit responds continuously, as if a charge ex(t)/L had arrived at the anode. The full charge transfer is completed when x(t) = L. A set of charges at positions xⱼ(t) leads to a charge transfer of
which is associated with a current flow of
and vⱼ(t) = dxⱼ(t)/dt is the velocity of charge j. Of course, only charges in the region 0 < xⱼ < L contribute to either sum above. Equation (6.45) is equivalent to using an average current over the region 0 < x < L.
where the actual current density has the expected form (Jackson 1975, Section 5.6),
We shall explore the consequences of Eq. (6.45) and prove in the next section that the average expression in Eq. (6.46) or Eq. (6.45) is, in fact, exactly correct; see Eq. (6.76). The transit time T for any carrier with velocity v(t) obeys
The velocity v(t) is not assumed constant, but we shall, in what follows, state the general answer, and answers for the simple special case of uniform velocity. For example, if v(t) = v
It has been tacitly understood that each term in Eq. (6.45) contributes only while the position of the charge is in the active region:
Application of Campbell's first theorem to Eq. (6.45) with the help of Eq. (6.48) yields:
which is the charge e times ν, the rate at which they appear. Campbell's second theorem takes the form
In the uniform velocity case, with the help of Eq. (6.49)
so that T⟨[ΔI]²⟩/I yields an experimental value of e.
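In the uniform-velocity case each electron contributes a rectangular current pulse of height e/T lasting the transit time T, and this relation follows directly from Campbell's two theorems. A sketch with illustrative numbers for the rate and transit time:

```python
e = 1.602176634e-19
nu, T = 1e12, 1e-9          # electrons per second; transit time in seconds

I = nu * e                  # first theorem: average current
dI2 = nu * (e / T)**2 * T   # second theorem: nu * integral of f(t)^2 dt
print(T * dI2 / I)          # recovers the electronic charge e
```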
The spectrum of filtered shot noise
We can write the current as a convolution
between the pure shot noise S(s),
and the pulse shape v(t). As a result of the convolution theorem the current noise can be written
where the "window factor" is
and G(S, ω) is the pure shot noise associated with S(s). If we regard tⱼ as the time an electron leaves the cathode, no signal can appear in the output circuit until t > tⱼ. Thus we set v(t) = 0 for t < 0. After the electron arrives at the anode at tⱼ + T, the term v(t − tⱼ) no longer contributes, so we can set v(t) = 0 for t > T. Thus the pure shot noise is filtered by the "window" factor
where T is the transit time of the electron in passing from cathode to anode. One can readily verify that the mean current
is unaffected by the window. All that remains is to calculate the spectrum of pure shot noise itself, which we have already given as a theorem in Eq. (6.39) as
a "white" noise independent of frequency, and
which has a spectrum whose "color" is that of the window W(ω).
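For the uniform-velocity pulse the window is the familiar sinc² shape: flat ("white") well below 1/T and suppressed at multiples of 2π/T. The normalization |W(0)|² = 1 below is illustrative, not the book's.

```python
import numpy as np

T = 1.0                                   # transit time (illustrative units)
w = np.linspace(1e-6, 30.0, 10_001)       # angular frequency grid
W2 = (np.sin(w * T / 2) / (w * T / 2))**2 # sinc^2 window of a flat pulse

i_null = np.argmin(abs(w - 2 * np.pi / T))  # first null of the window
print(W2[0], W2[i_null])                    # near 1, near 0
```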
FIG. 6.4. Motion of a charged layer from cathode to anode.
6.5 Electromagnetic theory of shot noise
In this section we shall prove that the intuitive "average" model of Shockley (1938) and Freeman (1952) discussed in the previous section is rigorously correct. A one-dimensional sheet of charge, whose total charge is e, moves from the cathode to the anode of a diode with velocity v(t) (see Fig. 6.4). The charge sheet enters the region between cathode and anode at time t₀ and arrives at the anode at time t₀ + T, where T is the transit time. We describe the current that appears in the external circuit in the time interval −∞ < t < ∞. The results will justify the use of the smooth current in Eq. (6.45). A quasistatic approach is permissible. We shall therefore use Poisson's equation:
in MKS units. To get Gaussian units set ε₀ = 1/(4π). For our case
where dx/dt = v(t) and x(0) = 0. Thus we need the Green's solution of Poisson's equation. On each side of the sheet, d²φ/dx² = 0, since no charge is located in the vacuum region. Thus φ is linear in x and both E and D are constants. By Gauss' law, the jump across the sheet is
which implies that
If there is a potential difference of V between the electrodes
ELECTROMAGNETIC THEORY OF SHOT NOISE
105
By superposition, the solution will be the uniform field E+ = E− = V/L plus the solution for the case V = 0. To obtain the latter solution, set V = 0 and use Eq. (6.65) to eliminate E+. The result, inserted into Eq. (6.66) with V = 0, is an equation for E−:
Thus we obtain the fields on both sides of the sheet.
Conservation of charge yields
So it is dD/dt + J that is conserved as one moves around a circuit. In the one-dimensional case
is independent of x and represents the current flow in the circuit. In the external wire, D = 0, since charges disappear in a conductor (within the dielectric relaxation time). To evaluate the total current in the vacuum transit region we note that the conduction part of the current density (in the one-dimensional case) is
With the help of the Heaviside unit function H(x), Eqs. (6.68) and (6.69) can be combined into the single equation:
Differentiation with respect to t leads to four terms, two of which cancel, leading to the simple result:
FIG. 6.5. The potential between cathode and anode in a simple diode, displaying the barrier plane.
Combining this result with Eq. (6.72) for the conduction current, the combined current simplifies to:
Note that the total current contains no singular or discontinuous terms. This combined current appears in the external electrical circuit:
The smooth result in Eq. (6.76) is just what we assumed in Eq. (6.45). The time T in Eq. (6.76) is determined by:
where the velocity v(t) will be governed by the applied voltage and Newton's laws. Note that an extra dc field V/L does not contribute to dD/dt although it contributes to D. At low concentrations, or for one particle (with no screening), one can write a simple expression for the velocity (in a constant potential)
6.6 Space charge limiting diode
Our previous discussion of the shot noise in a diode assumed that it is temperature limited. When significant space charge is present, the noise is limited by correlations induced
SPACE CHARGE LIMITING DIODE
107
by the space charge. The potential produced by the space charge obeys Poisson's equation
where the density of electronic space charge ρ is everywhere negative. Thus the potential is concave upwards. If the potential V increased monotonically from cathode to anode, emitted electrons of all energies would be accelerated from cathode to anode, and no space charge would develop. Since space charge does develop, the potential must develop a minimum within the region from cathode to anode, as shown in Fig. 6.5. In practice, a small minimum is found close to the cathode. This minimum provides a potential barrier, and only electrons with higher emission kinetic energy will overcome the barrier. Since electrons have a negative charge, the barrier they see can be visualized by inverting Fig. 6.5. Since the electrons have a Boltzmann distribution of energy, the current is
where V_b is the barrier height in energy units. Clearly, V_b = kT log(I₀/I), and the barrier position can be determined from its height and the temperature. This would reduce the current but not change the ratio of shot noise to current. However, a positive fluctuation in the current will cause an increase in the barrier height that will turn back some electrons. This "negative feedback" causes a reduction in shot noise, by a factor Γ². For anode voltages larger than 30kT/e, an approximate formula for the smoothing factor was given by Rack (1938):
In the retarding region V < 0, we have
An electron must overcome the potential drop V as a potential barrier. For large V, the electrons see everywhere an attractive potential, space charge is unimportant, and the usual shot noise formula is valid. For lower voltages the potential has the form shown in Fig. 6.5, attractive over most of the region, but repulsive between the cathode and the potential minimum. The size of the potential minimum is governed by space charge effects whose theory is given by Moullin (1938), North (1940), and Rack (1938). The effect of the space charge is that a given electron modifies the potential minimum V_m and causes a reduction in the
FIG. 6.6. Equivalent circuit of a diode feeding into a noisy resistor, after Williams and Moullin, p. 74, combining the thermal noise current with that of a space-charge limited diode.
effect of the space charge. The result is to replace
where the reduction factor Γ² has a contribution from the electrons reflected at the barrier and a second contribution from those that get to the plate. A simplified description of the space charge reduction Γ² in a triode is given on p. 564 of Valley and Wallman (1948). Williams (1936) and p. 74 of Moullin (1938) show that the equivalent circuit of a diode feeding into a resistor is that shown in Fig. 6.6. The equations found to fit the data are given by
This is a combination of shot noise and thermal noise. Here ρ is the differential resistance dV/dI of the diode. The notation I_c and I_T reminds us that the first noise is space charge limited, and the second is temperature limited. This conversion from current to voltage noise was adopted by Williams (1936) following a suggestion by Moullin and Ellis (1934) and discussed extensively in Moullin (1938), p. 74. The noise from a resistance R at temperature T₂ was then measured by comparing the effective resistance of the diode-resistance combination
RICE'S GENERALIZATION OF CAMPBELL'S THEOREMS
109
with that of a metallic resistance at another temperature T₁. The results in Fig. 23 of Moullin (1938) agree with the effective resistance formula
Williams extended this verification to the case of two diodes in parallel with a resistance R, with one of the diodes being temperature limited and the other space charge limited. In that case, the formula is
6.7 Rice's generalization of Campbell's theorems
Rice (1944, 1945, 1948a, 1948b) not only generalized Campbell's theorem to obtain all the higher moments, he also considered the more general process
where the η_j's are random jumps with a distribution independent of t_j, of t, and of j. Our procedure for dealing with the same problem consists in writing
where the shot noise function G(s) is now given by
Equation (6.88) describes Θ(t) as filtered shot noise. Thus we can relate the ordinary characteristic function of Θ to the generalized characteristic function of the shot noise function, G(s)
The average in Eq. (6.90), for general y(s), was evaluated in two ways in Lax (1966QIV). The first made explicit use of Langevin techniques, which we will discuss later. The second, which follows Rice, will be presented here. It makes use of
the fact that the average can be factored:
Here, we have supposed that N pulses are distributed uniformly over a time interval T at the rate v = N/T. All N factors are independent of each other and have equal averages, so that the result of the RHS of Eq. (6.91) is
where σ(η) is the normalized probability density for the random variable η. In the last step, we assumed that the integral over s converges, and replaced it by its limit before taking a final limit in which N and T approach infinity simultaneously with the fixed ratio N/T = ν. Setting y(s) = kηf(t − s), the generalized Campbell's theorem is obtained
The cumulants are then given by the coefficients of kⁿ/n! in the exponent:
The choice σ(η) = δ(η − 1) restores the original Campbell process, which includes only the cases n = 1, 2. The probability density of this variable may then be obtained by taking the inverse Fourier transform of the characteristic function in
Eq. (6.93)
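For the first two cumulants this theorem reduces to ⟨Θ⟩ = ν⟨η⟩∫f dt and ⟨ΔΘ²⟩ = ν⟨η²⟩∫f² dt, which can be checked by direct simulation. The pulse shape, rate, and amplitude distribution below are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 5.0                     # pulse rate (illustrative)
t_obs = 2000.0

def f(t):
    # causal pulse shape f(t) = exp(-t) for t >= 0; integral f = 1, integral f^2 = 1/2
    out = np.zeros_like(t)
    m = t >= 0
    out[m] = np.exp(-t[m])
    return out

npulse = rng.poisson(nu * t_obs)
tj = rng.uniform(0.0, t_obs, npulse)    # Poisson-distributed pulse times
eta = rng.uniform(0.0, 2.0, npulse)     # random amplitudes: <eta> = 1, <eta^2> = 4/3

# Sample Theta(t) = sum_j eta_j f(t - t_j), away from the edges
t_sample = np.linspace(100.0, t_obs - 100.0, 4000)
Theta = np.array([np.sum(eta * f(ts - tj)) for ts in t_sample])

mean_pred = nu * 1.0 * 1.0              # nu <eta> integral f dt
var_pred = nu * (4.0 / 3.0) * 0.5       # nu <eta^2> integral f^2 dt
print(Theta.mean(), mean_pred)
print(Theta.var(), var_pred)
```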
This form of generalized Campbell's theorem, like its antecedents, assumes that the t_j are randomly (and on the average uniformly) distributed in time. Moreover, there is assumed to be no correlation between successive pulse times. Lax and Phillips (1958) have found it convenient to exploit Eq. (6.93) and Eq. (6.94) in studying one-dimensional impurity bands. Rice (1944) has also determined ⟨Θ²(t)⟩ for the case in which the time interval between successive pulses has a distribution p(τ) that is not necessarily equal to the Poisson value, p(τ) = ν exp(−ντ), appropriate to uncorrelated pulses. A simplified argument can be given for the second moment of
since
If we write, where τ₁ = t_{j+1} − t_j , τ_s = t_{j+s} − t_{j+s−1} are intervals, then
where the factor 2 takes account of the fact that t_i can be less than t_j as well as greater than t_j. If we define
then
If we write
and then
These results are readily evaluated for the case considered by Rice (1944)
with the result so that where
Since
the total noise is
7
The fluctuation-dissipation theorem
7.1 Summary of ideas and results
In Chapter 5 we provided Nyquist's derivation of Johnson noise in electrical resistors. Callen and Welton (1951) were the first to emphasize the universal nature of the relation between fluctuations and dissipation. They provided a quantum-mechanical derivation that involved detailed sums over the eigenstates. A number of basically equivalent proofs have been given by Kubo (1957), Lax (1958QI) and others. We shall follow, in this chapter, a procedure used in Lax (1964QIII) because it is a quantum-mechanical proof that avoids a representation in terms of eigenstates, and is readily translated into a classical proof. Since a number of ideas and relations must be developed, we shall attempt in this section to provide an overview of the ideas involved, the formulas to be derived, and the connections to be made. The remaining sections of this chapter need only be read by those concerned with the techniques used in the proofs. Although we are usually concerned with real variables in classical physics, represented by Hermitian operators in quantum mechanics, we restate the definition, Eq. (4.37), of the noise correlation between two (operator) variables A and B using the Wiener-Khinchine form of noise definition
where stationarity is needed to get the second form. A and B can be thought of as (possibly complex) random variables in the classical case and operators in the quantum case. We also consider the response of the variable B(t), governed by a Hamiltonian K, to an infinitesimal force associated with A produced by changing the Hamiltonian from K to K + λA exp(+iωt). Here λ is an arbitrarily small number. The notation used here is consistent with that used in Lax (1964QIII). If the average response changes from ⟨B(t)⟩ in the absence of the A force to ⟨B(t)⟩_A in the presence of the force, we can define a response, or transition, function T_BA(ω) by the
114
THE FLUCTUATION-DISSIPATION THEOREM
change due to the force
This response function T_BA(ω) can be computed for a given system. Some examples of T_BA(ω) can be found in Eq. (7.17) and Eq. (7.76). We shall establish in Section 7.3 that
where the ⟨···⟩ denote an average over the stationary and possibly equilibrium ensemble present before the A force was applied. In the quantum case, the average of an arbitrary observable M is given by
where ρ_E is the equilibrium (Gibbs) density operator for the Hamiltonian K, and β = 1/kT, where k is Boltzmann's constant and T is the absolute temperature. In the classical case, ⟨M⟩ is simply an integral of M against the equilibrium distribution function. Our parenthesis-comma construct was defined in Lax (1964QIII) to be
i.e., the commutator MN − NM divided by iℏ, where ℏ = h/(2π) and h is Planck's constant. In classical mechanics the corresponding expression is the Poisson bracket:
Here q_i and p_i are a complete set of conjugate mechanical variables, as discussed by Goldstein (1980). The expression in Eq. (7.3) suggests a relationship to G_BA(ω) defined in Eq. (7.1). Using the equilibrium theorem in Section 7.4, the first, most general, form of the fluctuation-dissipation theorem to be established will be
where
In the classical limit, ℏn(ω) is replaced by kT/ω. Note that n appears, rather than n + 1/2, so that the zero-point contribution is absent with the present order of the operators.
SUMMARY OF IDEAS AND RESULTS
115
The reason for the particular combination of T matrices in Eq. (7.7) is that this combination is expressible as an integration over the complete time interval
With no further information about the operators A and B, the only relation between the two terms in Eq. (7.1) is
The dagger † denotes the Hermitian conjugate of an operator in quantum mechanics and the complex conjugate of a number or classical random variable. For the special case in which A† = B (and vice versa) Eq. (7.10) simplifies to
When A and B are conjugates, the fluctuation-dissipation theorem simplifies to
This result applies also to the special case in which A = B is Hermitian. If A and B are unrelated, we must find another means of relating T_AB(−ω) to T_BA(ω). Typically, the operators A and B are either even or odd under the barring operation that combines time reversal with Hermitian conjugation:
where ε_{A,B} = ±1. In that case we can specialize the time reversal relation of Eq. (10.5.23) of Lax (1974) by setting the external magnetic field to zero:
The order of these operators is relevant in the quantum case. It then follows that
Thus if A and B are Hermitian, we obtain
For a detailed discussion of time reversal see Lax (1974). The left side of the above equation comes from Eq. (7.1), which represents fluctuation of a quantity of the system, B (current), under a small external force (voltage) A. The right side of the equation is related to Eq. (7.2), which represents the response of B under a perturbation A, or dissipation. Equation (7.16)
shows they are related. Fluctuation must be associated with dissipation in order that the system does not decay, but maintains the appropriate thermal equilibrium temperature. The noise we are usually concerned with is a current-current noise, or a velocity-velocity noise. If q represents an electron position, we are concerned with G_q̇q̇. But then ε_A ε_B = 1, and we get the imaginary part rather than the real part of a transfer matrix. The solution to this discrepancy is to remember that the admittance usually referred to is an (odd) current (velocity) response to an (even) voltage (force on a position), namely
We note, however, that and
Since ε_q̇ = −1, ε_q = 1, we get
Since ℏωn(ω) → kT, we obtain the correct classical limit for Johnson noise. Again there is no zero-point contribution to this noise. If we had followed the Ekstein-Rostoker (1955) antisymmetrized definition of noise:
where we would have obtained
which includes the zero-point contribution. Which is the correct answer is, in our opinion, a question of physics, not formalism. One should not, as Ekstein and Rostoker do, merely choose the antisymmetrized combination because it is Hermitian. One must examine the nature of the measurement. If we are measuring light (where quantum corrections are important), then conventional photodetectors have no output unless there are photons in the input beam. The result is that one typically measures normally ordered operator products (that is, with the photon creation operators b† to the left, and destruction operators b to the right) rather than symmetrized operators. In this case, zero-point oscillations do not make a direct contribution to the answer, and our first
DENSITY OPERATOR EQUATIONS
117
form, Eq. (7.20), of the fluctuation-dissipation theorem is the appropriate one. The appropriate ordering of operators for the photodetection problem has been given by Glauber (1963a, 1963b, 1963c, 1965), by Kelley and Kleiner (1964), and by Lax (1968QXI).
7.2 Density operator equations
Definition of the density operator
Density operators are introduced to enable us to deal with an ensemble of systems rather than a single system. Therefore we shall consider a large number N of systems, each system being in a pure state J described by the wave function Ψ^J. The mean of a measurement operator, or observable, M, in state J is
Here N is the number of systems in the ensemble, not the number of atoms. The average over an ensemble is
where the sum is over the states J. P_J is the "probability" of a system being in the state J, i.e., the number of systems in the state J divided by the total number of systems, N. The states Ψ^J are normalized, but need not be orthogonal. We now introduce an arbitrary (orthonormal) basis set φ_μ. In terms of these φ_μ's, the pure state wave function Ψ^J can be expanded as
Using this representation, Eq. (7.25) becomes
where the density matrix ρ_μν is defined by
Equation (7.31) can be rewritten as
where the density operator
has the correct matrix elements ρ_μν in any system of basis vectors.
Equation of motion of the density operator
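Before turning to the equation of motion, the defining property just stated, namely that ρ = Σ_J P_J |Ψ^J⟩⟨Ψ^J| reproduces the ensemble average as ⟨M⟩ = Tr(ρM), can be verified in a short numerical sketch (the dimensions, states, and probabilities below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
dim, nstates = 4, 3

# Normalized (not necessarily orthogonal) pure states Psi^J with probabilities P_J
psi = rng.normal(size=(nstates, dim)) + 1j * rng.normal(size=(nstates, dim))
psi /= np.linalg.norm(psi, axis=1, keepdims=True)
P = np.array([0.5, 0.3, 0.2])

M = rng.normal(size=(dim, dim))
M = M + M.T                                   # a Hermitian observable

# Ensemble average <M> = sum_J P_J <Psi^J| M |Psi^J>
avg_direct = sum(p * (v.conj() @ M @ v) for p, v in zip(P, psi)).real

# Density operator rho = sum_J P_J |Psi^J><Psi^J| ;  <M> = Tr(rho M)
rho = sum(p * np.outer(v, v.conj()) for p, v in zip(P, psi))
avg_trace = np.trace(rho @ M).real

print(avg_direct, avg_trace)   # the two averages agree
```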
In the Schrödinger picture, the evolution of the wave function Ψ^J of a complete physical system is given by
where H is the total Hamiltonian of the system; we have assumed, for simplicity, that H does not depend explicitly on the time. Using this equation, Eq. (7.33) becomes
Taking the derivative of this equation with respect to t, the equation of motion of the density operator is
or
where we use the abbreviation of Eq. (7.5):
Application of these density operator equations to the derivation of the fluctuationdissipation theorem will be made in succeeding sections. Equation (7.37) is also
THE RESPONSE FUNCTION
119
valid in the classical case, provided only that Poisson brackets are used, as defined in Eq. (7.6). Note that Eq. (7.37) has the same form as the Heisenberg equation of motion except for a minus sign. That is because the Heisenberg equation describes the motion of a general operator, while Eq. (7.37) describes the motion of the density operator, hence of wave functions; and wave functions and operators transform in a contragredient way.
7.3 The response function
We start from the density operator equation
and treat λA as a weak perturbation, via the small parameter λ. This discussion follows that in Lax (1964QIII). If we set
then if λ were zero, ρ(t) would reduce to the constant value ρ(0) of Eq. (7.35). Equation (7.40) transforms away the rapid motion associated with the unperturbed Hamiltonian, K. This is called a transformation to the interaction picture. The only motion that remains is that induced by the perturbation added to the unperturbed Hamiltonian. It should be no surprise, then, that if we substitute Eq. (7.40) into Eq. (7.39) we obtain
where the remaining motion is that induced by the perturbation λA and
is the operator A with the time dependence induced by the unperturbed Hamiltonian, sometimes referred to as the operator A in the interaction representation. We assume that the system starts at equilibrium at the time t = — oo:
where Z is the partition sum calculated as a trace. In the classical case, this would be an integral over all phase space.
so that ρ_E is normalized. For an infinitesimal perturbation, ρ(t) on the right hand side of Eq. (7.41) can be replaced by ρ_E. Then ρ(t) can be obtained explicitly, and
one arrives at
the density operator evaluated to exactly the first order in λ. Finally, as K is time independent, we have
we obtain T_BA(ω) defined by Eq. (7.2):
But
Thus
where stationarity is applied to obtain the second form from the first. The final result is Eq. (7.3) as predicted. For future reference, we note that the interchange of the names A and B in the first form of Eq. (7.50) yields (with stationarity)
If we set u = —t,
The passage from Eq. (7.51) to Eq. (7.52) involves three minus signs: dt = —du, the reversal of limits, and the reversal of the order of the operators. Finally, we can combine Eqs. (7.50) and (7.52) to obtain the simpler form
which involves an integration over the complete region from negative to positive infinity.
EQUILIBRIUM THEOREMS
121
7.4 Equilibrium theorems
Because of the cyclic nature of the trace
we obtain the theorem:
When Planck's constant goes to zero, we reduce to classical physics, and the operators commute. If we take the Fourier transform of this equation, and introduce t + iℏβ as a new integration variable on the right hand side, we obtain the corresponding theorem in the frequency domain:
The shift in the variable of integration appears to involve an assumption of analyticity in the region 0 < Im t < ℏβ. However, this assumption is unnecessary if one writes out the right hand side in the energy representation. Then
The Fourier integral, on the left hand side of Eq. (7.56), yields a delta function factor δ(ℏω − (E_m − E_n)) which permits the replacement of exp[β(E_m − E_n)] by exp[βℏω], and the theorem follows with no assumption of analyticity. In the classical limit, A(0) and B(t) commute. The factor exp[βℏω] is the price one must pay, in quantum mechanics, for switching the order of the operators. This factor is identical to (and responsible for) the ratio of Stokes to anti-Stokes radiation intensities. To relate T_BA(ω) − T_AB(−ω), which represents the response function or dissipation, to the noise spectrum, we use Eq. (7.56). With its help, and noting that A = A(0), Eq. (7.53) can be written
By Eq. (7.1), the integral on the right is (1/2)G_BA. Using Eq. (7.8) we obtain the fluctuation-dissipation theorem in the form
where the left hand side follows the conventions of Section 4.7.
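The detailed-balance identity underlying Eq. (7.56) can be checked numerically in the energy representation. The sketch below uses the convention B(t) = exp(iKt/ℏ) B exp(−iKt/ℏ), under which ⟨A(0)B(t)⟩ = ⟨B(t − iℏβ)A(0)⟩; the matrix dimension, spectrum, and operators are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
hbar, beta = 1.0, 0.7                  # arbitrary units
N = 5
E = np.sort(rng.uniform(0.0, 3.0, N))  # spectrum of K in the energy representation

def herm():
    X = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    return X + X.conj().T
A, B = herm(), herm()                  # two generic observables

rho = np.diag(np.exp(-beta * E))
rho /= np.trace(rho)                   # equilibrium (Gibbs) density operator

def B_t(t):
    # Heisenberg evolution B(t) = exp(iKt/hbar) B exp(-iKt/hbar); t may be complex
    return np.diag(np.exp(1j * E * t / hbar)) @ B @ np.diag(np.exp(-1j * E * t / hbar))

t = 1.3
lhs = np.trace(rho @ A @ B_t(t))                      # <A(0) B(t)>
rhs = np.trace(rho @ B_t(t - 1j * hbar * beta) @ A)   # <B(t - i hbar beta) A(0)>
print(abs(lhs - rhs))   # ≈ 0
```

The imaginary-time shift exp[β(E_m − E_n)] in the matrix elements is exactly what restores equality after the operator order is switched, as in the text.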
7.5 Hermiticity and time reversal
The purpose of this section is to develop relations between T_AB(−ω) and T_BA(ω)*. From Eq. (7.3) the latter quantity can be written
since (Tr M)* = Tr M†, and ρ_E is Hermitian. The Hermitian adjoint of a product is (MN)† = N†M†, but the Poisson bracket has a factor 1/iℏ whose sign is reversed by taking its complex conjugate, so the order is restored and
This result is to be compared with Eq. (7.50)
In the special case A† = B, and of necessity B† = A, Eq. (7.62) immediately yields, and with B set to A† we have
with no assumptions about hermiticity or time reversal for the operators A, B. However, if A = B = Q where Q is a Hermitian operator, Eq. (7.64) takes the simple form
In the more general case, the time reversal relation is assumed in the classical case by Onsager (1931a, 1931b), and derived in the quantum case by Lax (1974):
where ε_A, ε_B = ±1 according as A (or B) is even or odd under the barring operation, which combines a classical time reversal transformation with a Hermitian adjoint:
with a similar statement for B. Onsager stated Eq. (7.66) as a classical macroscopic relation, without derivation. It was not clear what the order of the factors should
APPLICATION TO A HARMONIC OSCILLATOR
123
be. Lax (1974) derived this result for quantum systems described by a Hamiltonian even under time reversal. Our order of the operators is a consequence of that derivation. We can immediately obtain
Thus we obtain a new expression for T_AB(−ω) from Eq. (7.63),
Comparison with Eq. (7.61) shows that
then
From Eq. (7.60), if A and B are Hermitian, our fluctuation-dissipation theorem can then be written
and if B = A†, we have
To obtain the usual fluctuation-dissipation theorem for G_q̇q̇ in terms of the response function Y = T_q̇q, we can set A = q and B = q̇, and invoke the relation, which follows from Eq. (7.1) by integration by parts, to obtain
The above example applies to any case in which A and B have opposite parity under time reversal. Application to a harmonic oscillator will be considered in the next section.
7.6 Application to a harmonic oscillator
The harmonic oscillator is an important example, since it can stand for an RLC circuit, a mechanical circuit, or a mode of the electromagnetic field. The electric circuit response is usually the response of a current to an applied voltage. In the associated mechanical problem, the response, or mobility, is that of a velocity to the applied force.
We can therefore use the result, Eq. (7.73), of the previous section by setting A to the displacement variable Q, with B = Ȧ = Q̇ a velocity; the applied potential energy is +λQ exp(+iωt), corresponding to the perturbation described in Section 7.1. The equations of motion of an oscillator of position Q, momentum P, mass M, and resonant frequency ω₀ are given by
Combining Eqs. (7.74) and (7.75), and setting Q(t) = Q_m exp(+iωt), with the position amplitude Q_m, we have
Here, and in the discussion that follows, γ can be frequency dependent, but we shall avoid using the explicit form γ(ω) for simplicity. The velocity Q̇ is then expressible as
where the mobility is given by:
which is T_Q̇Q in the form of Eq. (7.73) with A = Q and B = Q̇. Applying the fluctuation-dissipation theorem, we get for the velocity noise
where the resonant denominator is
The notation Q_ω follows another convention common in the literature. See footnote 24 of Lax (1966QIV). The dagger correlates with the first variable, since it carries the time t in Eq. (7.67). If we apply relations such as Eq. (7.73), we can obtain position-position correlations. Noticing that T_QQ = Q_m/λ, we have
which is Eq. (7.79) divided by ω².
But the zero-point effect appears for antinormal order (destruction appears before creation):
These results were already presented in Appendix C of Lax (1966QIV). We ask what the result will be if we set λ = 0 in Eq. (7.75), but add a Langevin force F(t). In Fourier notation, Eq. (7.75) can be written in the form:
Thus we obtain the relation
Thus the resonant response, MD(ω) in the denominator, is (correctly) removed if the Langevin force is applied. In the work on the quantum theory of noise sources, Lax (1966QIV) preferred to work with destruction and creation operators, which are defined by
and the conjugate equation
With this transformation of variables, to a good approximation b(t) will contain terms predominantly of the form exp(−iωt) with positive ω, and b† will contain only frequencies of the opposite sign. The inverses of Eqs. (7.85) and (7.86) are
The equations of motion of the destruction and creation operators are
where
The same ratio applies to the Fourier components, so that the power ratio is
The result is that these new Langevin forces have the astonishingly simple form
If the analysis is repeated with the operators in the opposite order, say ⟨Q(0)Q(t)⟩, one obtains
where the added unity is the result of the commutation rules and yields, as expected, the correct Stokes to anti-Stokes ratio
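In the classical limit, where these commutator corrections vanish and ℏωn(ω) → kT, the oscillator Langevin equation with a white force of strength 2MγkT can be integrated directly. The sketch below (illustrative parameters, semi-implicit Euler stepping, not a prescription from the text) verifies the resulting equipartition values for both Q̇ and Q:

```python
import numpy as np

rng = np.random.default_rng(2)
M, w0, gamma, kT = 1.0, 2.0, 0.5, 1.0   # illustrative classical parameters
dt, nsteps = 1.0e-3, 2_000_000

# Classical Langevin oscillator: M dQdot/dt = -M w0^2 Q - M gamma Qdot + F(t),
# with white force <F(t)F(t')> = 2 M gamma kT delta(t - t')
kick = rng.normal(0.0, np.sqrt(2.0 * M * gamma * kT * dt), nsteps)
Q = Qdot = 0.0
q2 = v2 = 0.0
nburn = nsteps // 10                    # discard the initial transient
for i in range(nsteps):
    Qdot += (-w0 ** 2 * Q - gamma * Qdot) * dt + kick[i] / M
    Q += Qdot * dt                      # semi-implicit Euler step
    if i >= nburn:
        q2 += Q * Q
        v2 += Qdot * Qdot
nkeep = nsteps - nburn
v2 /= nkeep                             # time average of Qdot^2
q2 /= nkeep                             # time average of Q^2
print(v2)   # equipartition: kT/M = 1
print(q2)   # kT/(M w0^2) = 0.25
```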
The procedure used by Lax (1966QIV) to produce a Markovian model that can be solved by a well established set of tools is:
1. To make a rotating wave approximation in Eqs. (7.88)-(7.89) by omitting b† in the equation for db/dt and vice versa, and
2. To force the noise source to be white by evaluating it at ω = ω₀.
7.7 A reservoir of harmonic oscillators
To make some of the abstract concepts of dissipative motion meaningful, we shall illustrate them by using a reservoir consisting of a continuum of harmonic oscillators. For simplicity, we shall choose our system, itself, to be a single harmonic oscillator. By assuming the reservoir oscillators are already uncoupled normal modes, and that our system oscillator interacts linearly with each of these modes, we can make an exact analysis of how our system oscillator behaves in the presence of the reservoir. Let the system oscillator have position and momentum Q and P, with each reservoir oscillator having corresponding variables q_j and p_j. We can write a coupled Hamiltonian in the form:
The equation of motion of the system oscillator is then
A RESERVOIR OF HARMONIC OSCILLATORS
127
where the coupling force of the reservoir oscillators on the system oscillator is
Each oscillator in the reservoir obeys the equations
An exact solution can be made for each oscillator. For each oscillator j subject to a force A(t), which is an abbreviation for −a_j Q(t), the position and momentum (called briefly q and p) at time t are given in terms of the initial values q(t₀) and p(t₀), with u = t − t₀,
When the solutions for q_j and p_j are inserted into Eq. (7.97), the system momentum is found to obey
where K(t — s) represents damping due to action of the reservoir:
We define a damping coefficient γ(ω) by
The Langevin force in Eq. (7.101) is
If we define
then
We note the commutation rule obeyed by the random force F(t) is
Another form can be written using the anticommutator:
where the energy associated with each mode can be written
Then we have
whereas
The integrations over frequency in this section extend only over positive frequency, since each frequency ω_j of an oscillator is, by convention, positive. The relation between the Fourier components of K(u) or L(u), which describe transport and involve γ(ω), and the noise anticommutator, Eq. (7.110), which involves γ(ω) and E(ω), represents the standard fluctuation-dissipation theorem. Here we have not given an abstract proof of the fluctuation-dissipation theorem for a general system, but a demonstration of how it arises in a simple case of a harmonic system and a harmonic reservoir. If we were to replace the potential energy of the harmonic oscillator by V(Q), the remaining analysis would remain unchanged except that Mω₀²Q in Eq. (7.101) would be replaced by dV(Q)/dQ. If the coefficient a_j²/(m_j ω_j²), which is regarded as a function of frequency, is chosen to be a constant, then γ(ω) = γ, and we get
and
This is the quasi-Markovian limit.
8
Generalized Fokker-Planck equation
8.1 Objectives
The first objective of this chapter is to reduce the determination of the behavior of a Markovian random variable, a(t), to the solution of a partial differential equation for the probability, P(a, t), that a(t) will assume the value a at time t. In this way, a problem of stochastic processes has been reduced to a more conventional mathematical problem, the solution of a partial differential equation. But the spectrum of a random process requires that the Fourier transform be taken of a two-time correlation. We therefore introduce a regression theorem that states that for a Markovian process, the time dependence of a two-time correlation of the form ⟨a(t)a(0)⟩ is the same as that of ⟨a(t)⟩. The motion of ⟨a(t)⟩ is just the motion or transport of the variable a itself. Thus we relate the spectrum of the noise to an understanding of the one-time transport or the mean motion of the system. Onsager (1931a, 1931b) was the first to follow this procedure. He simply stated it as an assumption within the context of a classical system near equilibrium. A proof for the classical case is given in Section 8.4. A detailed discussion of the quantum case is given in Lax (1968QXI), and a comparison between the exact treatment of a harmonic oscillator, in the quantum case, with a Markovian procedure introduced in Lax (1966QIV) is given by Ford and O'Connell (2000) and Lax (2000). However, the initial condition of ⟨a(t)a(0)⟩ is not the same as the initial condition for ⟨a(t)⟩. Thus we must obtain, in another way, information about the total fluctuation ⟨[a(0)]²⟩ at the initial time. For stationary processes, at equilibrium, these fluctuations can be determined from thermodynamical arguments. For example, the mean square velocity of a molecule in three dimensions is determined from equipartition to be
where k is Boltzmann's constant, T is the absolute temperature, and M is the molecular mass. With the help of the Einstein relation,
relating the diffusion constant D to the mobility, we can then deduce the value of the diffusion constant D in the equilibrium case. In the nonequilibrium case ⟨v²⟩ is unknown, and Eq. (8.2) is used to obtain it from D and A, which are determined directly from the stochastic model used to describe the system. In particular, D is determined by the second moments of the velocity jumps, and A is determined directly from the first moments of the velocity jumps. Explicit evaluation of the spectrum is carried out, in this chapter, for quasilinear systems. The simplest example will be generation-recombination noise. Noise in a system that cannot be approximated as quasilinear, such as a laser because it is a self-sustained oscillator, is discussed in Chapter 11. In that case, an analytic (or numerical) solution must be made of the Fokker-Planck equation for the one-time motion of the system, and a regression theorem must be exploited to obtain the spectrum. We are then in a position to calculate the (two-time) noise spectrum by exploiting a classical regression theorem which is established in Section 8.4. The form in which that theorem is proved here is the statement that the two-time (or conditional) distribution P(a(t), t | a(0), 0) obeys precisely the same differential equation in time as the one-time distribution, P(a(t), t). The proof is based explicitly on the system being Markovian, with no assumption of equilibrium. Our classical regression theorem is not equivalent to the Onsager regression hypothesis in two ways: (1) it is a theorem, with a proof, not a hypothesis; (2) it does not assume that the fluctuations take place from equilibrium, but allows them to be from a nonequilibrium stationary state. To what extent can we expect the dynamics of a system of random variables to be Markovian, namely, that the probability of future events is determined by a knowledge of the present?
This is analogous to the question: to what extent can we expect the future of a set of (nonrandom) variables to be deterministic, namely, that their future values are completely determined by the initial conditions? The answer to the latter question is yes if the set of variables is complete. But in any real problem, one doesn't include all the variables in the whole universe. The scientist must decide which are the relevant variables, and the rest can be discarded. In the random case, an analogous choice must be made. If enough variables are used, we would expect the noise sources to be white. If they are not white, they may have come from a filtering process via an intermediate system. If we add the latter to our system, we can make the ultimate noise white, and Markovian methods become available to solve the problem. Although it is beyond the scope of this book, we will comment on the quantum case. Because of the commutation rules, noise at positive and negative frequencies cannot be equal. Their ratio is the familiar Stokes-anti-Stokes ratio exp(ħω/kT) discussed in Section 7.6. However, when the damping rate γ is small compared to the frequency difference 2ω₀ between positive and negative frequencies it is
possible to separate these two degrees of freedom by a rotating wave approximation. Then the noise can be treated as approximately white over the width of each resonance. The correct Stokes-anti-Stokes ratio can still be maintained. The error in this procedure is first order in γ/ω. For an optical laser, typical numbers are γ = 10⁹ per second and ω = 10¹⁵ per second. In spite of the negligible error in the application for which the approximation was made, the Lax-Onsager procedure has been attacked by Ford and O'Connell (1996, 2000). Although these authors recognize the often negligible errors, they have repeatedly stated that there is no "quantum regression theorem", and this is of course true. The initial statement on regression, Lax (1963QII), was based on an approximate decoupling between system and reservoir. So clearly, it was understood to be an approximation in the quantum case, not a theorem. Later papers clarifying the nature of the approximation, see Lax (1964QIII, 1966QIV), were not mentioned in the initial draft of Ford and O'Connell (2000). To understand why the approximate procedure worked in a variety of cases, Lax (1968QXI) showed that it would work whenever the system was Markovian. Why this is true in the classical case is shown in Section 8.4. The quantum mechanical proof in Lax (1968QXI) merely showed that a system would obey a regression theorem if it were Markovian, and vice versa. But an exact quantum treatment of a damped harmonic oscillator was already used to show in Lax (1966QIV) that the ratio of the noise at positive frequency to that at the corresponding negative frequency is exp(ħω/kT), usually called the Stokes-anti-Stokes ratio. Since this ratio is not unity, the noise cannot be white, which would imply an exactly flat spectrum.
Ford and O'Connell (1996) wrote: "But the so-called quantum regression theorem appears in every modern textbook exposition of quantum optics and, so far as we know, there are no flagrant errors in its application. How can it be that a nonexistent theorem gives correct results?" The answer, of course, is that many real systems are approximately Markovian. That is why the method works. It is not necessary for the noise to be white over all frequencies, but only over each resonance, where most of the energy resides. 8.2 Drift vectors and diffusion coefficients A general random process (not necessarily a Markovian one) obeys the relation on conditional probabilities, Eq. (2.15), which is rewritten in the form
The equations in this chapter can all be generalized by regarding a as a set of variables rather than a single variable. The transition probability P(a′, t + Δt | a, t; a₀, t₀) describes the probability of arriving at a′ at t + Δt if one starts
at a at time t, remembering that one started the entire process at a₀ at time t₀. This last bit of information is irrelevant if the process is Markovian. If this dependence is dropped, Eq. (8.3) reduces to the Chapman-Kolmogorov condition, Eq. (2.17). For many Markovian processes one can write (for a′ ≠ a), where Δt is small and w_{a′a} is the transition probability per unit time, and the normalization condition (including a′ = a),
requires us to conserve probability. The probability of remaining in the original state must therefore be
Thus conservation of probability guarantees that you must end up somewhere, when one includes the possibility of remaining in the initial state. Note that any choice of w_{a′a} will satisfy the Chapman-Kolmogorov requirement, Eq. (2.17). However, it is not always possible to expand the transition probability linearly in Δt. The simplest example of this is Brownian motion, for which the transition probability is given by Eq. (8.7). Nevertheless, the Chapman-Kolmogorov equation, Eq. (2.17), is obeyed by Eq. (8.7). We shall make the weaker assumption that moments of the transition probability can be expanded linearly in Δt: for n ≥ 1 and infinitesimal Δt we assume
For a given nth moment, if it is proportional to Δt^k with k > 1, then Dₙ can be regarded as zero. This equation defines an nth order conditional diffusion constant
by
This is in fact obeyed even for the Brownian motion process, as can easily be proved from Eq. (8.7). Again, for a Markovian process the dependence on a₀ can be omitted everywhere.
Equation (8.9) is the central axiom we set for random processes. Random variables differ from nonrandom variables in their smoothness with time. For a nonrandom variable, (a′ − a)ⁿ is proportional to Δtⁿ, because it varies smoothly with time; hence lim[(a′ − a)ⁿ/Δt] = 0 for n > 1. For a random variable, there are nth moments for n > 1 proportional to Δt, because of strong fluctuations. It is customary to refer to D₁ as a drift vector, A, since
so that we expect to be able to show later that the mean motion of our random variable a obeys d⟨a⟩/dt = ⟨A⟩, where ⟨···⟩ means the average of a at time t. In the Markovian case, A(a, t | a₀, t₀) is replaced by A(a, t), but the dependence on a₀ in the other factors remains. It is appropriate to call
a diffusion constant in the variable a, in analogy with the Brownian motion result that D = ⟨(Δx)²⟩/(2Δt) for a one-dimensional position variable. However, a need not be a position variable. It could, for example, be an angular variable, or the number of particles in a given state. If one does not impose a(t₀) = a₀, every random process obeys
which adds up all the ways of arriving at a′, weighting each with the probability of its starting point. Thus it is also possible to define the set of diffusion constants
The process is Markovian if and only if, for all n,
Equation (8.15) follows from Eq. (8.9) when all information for times earlier than t is ignored, which is appropriate (only) for Markovian processes.
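These definitions can be checked numerically. The sketch below is a minimal illustration (the drift A, the diffusion constant D, the time step, and the sample count are all arbitrary choices, not taken from the text): it simulates increments of a drifting Brownian motion and recovers D₁ ≈ A, D₂ ≈ D, and D₃ ≈ 0 from the small-Δt moments in the sense of Eq. (8.14).

```python
import numpy as np

rng = np.random.default_rng(0)
A, D = 0.5, 2.0                 # drift vector and diffusion constant (illustrative values)
dt, n = 1e-3, 1_000_000
# increments of a drifting Brownian motion: da = A dt + sqrt(2 D dt) N(0, 1)
da = A * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal(n)

D1 = da.mean() / dt             # <(da)>   / (1! dt) -> the drift A
D2 = (da**2).mean() / (2 * dt)  # <(da)^2> / (2! dt) -> the diffusion constant D
D3 = (da**3).mean() / (6 * dt)  # <(da)^3> / (3! dt) -> vanishes for Brownian motion
print(D1, D2, D3)
```

The higher coefficient D₃ estimated this way tends to zero as Δt does, in line with the remark that for Brownian motion only the first two coefficients survive.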
In applying these ideas to lasers we compute the diffusion constants by letting Δt become smaller than any of the relaxation times or linewidths. But we actually require Δt to be larger than the optical period, which can be as short as 10⁻¹⁵ sec. Our process need then only be Markovian for steps that are large compared to the optical period. This procedure is analogous to the derivation of the Einstein diffusion equation for Brownian motion in Chapter 3, where the actual process is only Markovian when both position and velocity variables are included, not the position alone. However, for time intervals large compared to the velocity decay time, the process becomes Markovian in position alone, as shown in Section 3.6. 8.3 Average motion of a general random variable A Markovian random process obeys the Chapman-Kolmogorov relation:
which adds up all the ways of arriving at a′, weighting each with the probability of its starting point. The average motion of an arbitrary function M(a, t) may be obtained by integrating M(a′, t) against P(a′, t + Δt). Thus, we should multiply Eq. (8.16) by M(a′, t) and integrate over a′. On the right hand side of the equation, we shall replace M(a′, t) by its Taylor expansion:
The integrals of (a′ − a)ⁿ over a′ give rise to the diffusion coefficients in Eq. (8.8). Then integration in Eq. (8.16) over a at time t leads to
In obtaining ⟨M(a, t)⟩ in the second term on the right hand side of the above equation, the normalization property of the transition probability is used:
Thus we obtain
where ⟨···⟩ for any function M(a) is defined by
This formula, (8.20), is valid in the sense of the expectation value for a general random variable M(a, t). We remind readers of the two different symbols. The symbol ⟨(Δa)ⁿ⟩_{a(t)=a} = ⟨[a(t + Δt) − a(t)]ⁿ⟩_{a(t)=a} in the definition of Dₙ(a, t) in Eq. (8.4), where a subtraction of a at t from a at t + Δt is first taken and then an average is made over the values of a at t + Δt subject to a fixed a(t) = a, involves only an integration over a(t + Δt). The symbol d⟨M(a)⟩ = ⟨M(a, t + Δt)⟩ − ⟨M(a, t)⟩, which represents the change of the expectation value of M(a, t) with time, where the averages of M(a) at the two different times t and t + Δt are first made and then the difference is taken, involves two integrations, over a(t) and a(t + Δt). We shall illustrate its value by setting M(a) = a,
and for the variable M, we have d⟨M⟩/dt = ⟨A(M, t)⟩. When M(a) = a², we have
where we have used A(a, t) = D₁(a, t), D(a, t) = D₂(a, t), ..., and ⟨···⟩ represents the average over a. The last two terms are taken in Eq. (8.23) instead of 2⟨A(a, t)a⟩, as that form is needed for the multidimensional cases and for the quantum case in physics. The conditional expectation under the condition a(t) = a is obtained by setting the distribution at time t to a delta function at a. Hence, from Eq. (8.22) we have
and for the variable M(a), we have
Itô's calculus lemma, a fundamental mathematical tool used in quantitative financial analysis, is Eq. (8.26) cut off at n = 2 for Brownian motion.
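As a hedged numerical illustration of this cutoff (the drift, volatility, step size, and path count below are arbitrary choices), consider geometric Brownian motion dS = μS dt + σS dW. Applying the n = 2 truncation to M(S) = ln S gives d⟨ln S⟩/dt = μ − σ²/2, the familiar Itô correction, which a direct Euler simulation reproduces:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.08, 0.3              # drift and volatility (illustrative values)
T, steps, paths = 1.0, 1000, 20_000
dt = T / steps

# Euler simulation of dS = mu S dt + sigma S dW with S(0) = 1
S = np.ones(paths)
for _ in range(steps):
    S *= 1.0 + mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(paths)

m = np.log(S).mean()
print(m)    # near mu - sigma^2/2 = 0.035, not the naive mu*T = 0.08
```

The second diffusion coefficient of ln S, not the drift alone, controls its mean motion; this is precisely the n = 2 term of the expansion.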
Mathematicians denote the conditional expectation by the symbol E(X_s | F_t), s > t, where X_s is a stochastic process and F_t is a filtration, representing the information available up to time t.
The first equation determines the operating point of the stationary state. The second is the usual Einstein relation, which relates the diffusion coefficient D to the dissipative response contained in A. If the noise is weak, so that the fluctuations extend over a range in a which is small compared to that over which A(a) and D(a) vary appreciably, it is permissible to make a quasilinear approximation. Let a_op represent the point at which the drift vector vanishes, and set
as the deviation from the operating point a_op. In the quasilinear approximation we shall make the approximations i.e., we retain first order deviations in the drift vector. Thus our drift, or transport, equation simplifies to and the diffusion equations reduce to In the steady state, the left hand sides of Eqs. (8.33) and (8.34) vanish and, for the single variable, classical case, our Einstein relation is then simply
which relates a diffusion constant D, a decay or dissipation constant A, and the mean-square fluctuation ⟨a²⟩ in the steady state. Here ⟨a²⟩ plays the role of kT, but thermal equilibrium has not been assumed, as it was in Einstein's original work. In general, the equations for ⟨f(a)⟩, with f(a) = 1, a, a², do not form a closed set. Therefore it is necessary to obtain an equation for the full probability distribution P(a, t) or P(a, t | a₀, t₀), and not just the moments of those distributions, as is presented in Section 8.4. 8.4
The generalized Fokker-Planck equation
To obtain the equation of motion for P(a, t) we write Eq. (8.20) in the explicit form, when M(a) does not explicitly depend on time,
After integration by parts (n times for the nth term) we obtain
Since this equation is to be valid for any choice of M(a), the coefficient of M(a) in the above equations must vanish yielding the generalized Fokker-Planck equation:
The ordinary Fokker-Planck equation is the special case in which the series terminates at n = 2. The ordinary Fokker-Planck equation, which can describe a nonlinear Brownian motion, plays a special role because of the following theorem. Theorem: The Fokker-Planck or Bust theorem. If any Dₙ for n > 2 exists, then an infinite set of Dₙ's exist. Thus the Fokker-Planck differential equation is the only one of finite order. Proof: Define Aₙ = n!Dₙ. Then the Cauchy-Schwarz inequality takes the form
Equation (8.39) is a generalization of the familiar theorem regarding vectors A and B, (A·B)² ≤ (A·A)(B·B). Equation (8.39) can be rewritten in terms of the drift vectors as
This argument is due to Pawula (1967). From Eq. (8.41) we see that
and
Thus if any Aₙ exists for n > 2, this string of inequalities guarantees the existence of an infinite number of such coefficients. Such an infinite number corresponds to an integral equation rather than a differential equation. Actually, Pawula proves the converse: if one assumes the existence of a closed equation of order greater than 2, then some finite order coefficients must vanish. By using the inequalities in reverse, one can show that all coefficients above n = 2 would vanish. The existence of a Fokker-Planck equation for a random process does not guarantee that the process is Markovian. If we start from Eq. (8.9) instead of (8.20) we obtain the equation
In the Markovian case, Eq. (8.44) reduces to Eq. (8.38), since the dependence on the earlier time t₀ can be discarded. Thus if (and only if) the process is Markovian, P(a, t | a₀, t₀), a two-time object, obeys the same equation of motion as the one-time object, P(a, t). In the Markovian case, then, we can calculate the conditional probability P(a, t | a₀, t₀) by solving the single time equation, Eq. (8.38), subject to the initial condition
This claim is a precise statement of the "Lax regression theorem", Lax (1963QII), for the classical case and is not equivalent to the "Onsager regression hypothesis" (Onsager 1931a, 1931b). Onsager's hypothesis is restricted to fluctuations from an equilibrium state, for which it has been justified by the Callen-Welton (1951) fluctuation-dissipation theorem. Lax's result was originally derived for the quantum case by a factorization approximation in Lax (1963QII) and rederived in Lax (1968QXI) using only the assumption that the system is Markovian. In the classical case, there are many known Markovian systems. In the quantum
case, almost all systems for which solutions are feasible are approximately Markovian. And my (Lax) proposal should then be labeled the "Lax quantum regression approximation." It is useful for many driven systems, such as lasers, for which the equilibrium assumption is not valid. The generalized Fokker-Planck equation, Eq. (8.38), can be expressed in a form similar to the Schrödinger equation of quantum mechanics. If we define
and
thus we have
and we obtain Eq. (8.49). Here, a and y may both be regarded as operators, but all y's must remain to the left of all a's. 8.5
Generation-recombination (birth and death) process
The generation-recombination process in semiconductors is another example of a shot noise process in which the diffusion constants can be calculated from first principles, based on an understanding of the physics of the process. It is also an example of what statisticians refer to as a birth and death process. Let us define the random integer variable n as the occupancy of some state. We assume the particles are generated at the rate G(n) and disappear at the rate R(n) (recombination). Then our mean equation of motion is d⟨n⟩/dt = ⟨G(n) − R(n)⟩ = ⟨A(n)⟩,
where ⟨A(n)⟩ is our average drift vector. Since the occupancy of state n is increased by generation from the state n − 1 and reduced by generation out of the state n, whereas it is increased by recombination out of state n + 1 and reduced by
recombination out of state n, the probability distribution function, P(n, t), obeys the following generalization of Eq. (3.3)
If we divide by Δt we immediately obtain the generalization of Eq. (3.5)
We can recover the Poisson process of Section 3.1 by setting R = 0, G = v. Equation (8.51) can be rewritten as a master equation
with the transition rate
It is convenient to interchange the symbols n and n′, so that n now represents n(t) and n′ represents n(t + Δt). The rth diffusion coefficient, D_r, defined by Eq. (8.14),
takes the value
after omitting noncontributing terms in Eq. (8.54).
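As a numerical check on this structure, the birth and death master equation can be integrated directly. The sketch below is a minimal illustration, not taken from the text: the rates G(n) = g and R(n) = rn, the occupancy truncation, and the step size are all arbitrary choices. It steps dP(n, t)/dt = G(n−1)P(n−1, t) + R(n+1)P(n+1, t) − [G(n) + R(n)]P(n, t) forward in time; for these rates the steady state is a Poisson distribution with mean g/r, so the variance equals the mean, consistent with an Einstein relation ⟨(Δn)²⟩ = D/A in which the second diffusion coefficient at the operating point is D = [G + R]/2 = g and the linearized decay rate is A = r.

```python
import numpy as np

g, r = 5.0, 1.0                       # G(n) = g, R(n) = r n (illustrative rates)
N = 40                                # truncate the occupancy; P(n > 40) is negligible here
n = np.arange(N + 1)
P = np.zeros(N + 1); P[0] = 1.0       # start with the state empty

dt = 1e-3
for _ in range(20_000):               # integrate to t = 20, many decay times 1/r
    gain = np.zeros_like(P)
    gain[1:]  += g * P[:-1]           # generation:    n-1 -> n
    gain[:-1] += r * n[1:] * P[1:]    # recombination: n+1 -> n
    P += dt * (gain - (g + r * n) * P)

mean = n @ P
var = (n**2) @ P - mean**2
print(mean, var)                      # both near g/r = 5 (Poisson steady state)
```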
A simpler procedure is to rewrite Eq. (8.52) as a generalized (infinite order) Fokker-Planck equation, Eq. (8.38), by a Taylor expansion of f(n ± 1) about f(n):
The rth order diffusion constant read off from the coefficient of the rth derivative term agrees with that found in Eq. (8.56). Compare with Eq. (8.38) and see Lax (1966III) for a detailed discussion of the generalized Fokker-Planck equation. We thus have the result that all the even numbered diffusion constants are proportional to the sum of the rate in plus the rate out, while all the odd numbered diffusion constants are proportional to the difference, i.e., the rate in minus the rate out. In particular, the second moment, D₂, which will be the moment of the noise source, obeys
These results are characteristic of shot noise. A Langevin approach to the generalized Fokker-Planck equation is presented in Lax (1966QIV) and discussed in Chapters 9 and 10. In the quasilinear approximation, the operating point, n_op, and the decay parameter, A, are determined by
and the diffusion coefficient is
Then, using the Einstein relation, the fluctuation from the average value becomes
We now want to calculate the autocorrelation function ⟨Δn(t)Δn(0)⟩. The transport equation and its solution are
where we have tacitly assumed that t > 0. By the definition, Eq. (1.80), of conditional probability
Thus we can take a two-time average by first averaging conditionally on a fixed Δn(0). That the dependence on time t of a two-time average during a fluctuation is the same as that of a one-time average (with possibly different initial conditions) is a statement that the "regression" theorem is obeyed. For a non-Markovian process, ⟨Δn(t)⟩_{Δn(0)} depends not only on the initial condition, but also on the prior history, the path to Δn(0). At the quasilinear level, A is independent of Δn(0), so that the conditional average also obeys Eq. (8.64):
Even without the quasilinear approximation, ⟨Δn(t)⟩_{Δn(0)} is the same as the solution for ⟨Δn(t)⟩ subject to a given Δn(0), because the process is Markovian. Equation (8.67) permits us to write
We have replaced t by |t| here; the justification is based on time reversal, as developed in Section 8.10. Using Eq. (8.63) in the quasilinear approximation,
We have thus derived the autocorrelation of the generation-recombination fluctuations, whose Fourier transform yields the noise. We may remark that the simple Poisson process is a generation-recombination process with no recombination, and a rate of generation G independent of n:
In this case, the general diffusion coefficient Dr is given by
Thus the Poisson process of Section 3.1 is a special case of the generation-recombination class of shot noise processes, in which the noise arises because
of the discreteness of the occupancy number n. An even more general case, in which many states have occupancies, was given in Lax (1960I). Let us summarize the procedure we use to obtain the spectrum of fluctuations from the quasilinear stationary state. For a Markovian process, the time-dependent decay (regression) of a correlation, ⟨Δn(t)Δn(0)⟩, is the same as that of ⟨Δn(t)⟩_{Δn(0)}, the decay of the mean motion from a deviation. This is the basis of the Onsager (1931a, 1931b) regression hypothesis for the equilibrium state, but it is proposed as a theorem by Lax (1968QXI) for Markovian systems with no assumption regarding equilibrium. Of course classical systems can be exactly Markovian, whereas quantum systems can only be approximately Markovian. For the classical physics case, this proposed theorem is a consequence of the definition of conditional probability, Eq. (8.65). Thus the conditional average of Δn(t) obeys the same time dependent equation of motion as the unconditional average. See Lax (1960I). In this way, the frequency dependence of the noise is determined by the Fourier transform of the mean motion - the transport. The normalization of the noise is determined by its total, ⟨[Δn(0)]²⟩, and the latter can be calculated via the Einstein relation. Thus ⟨[Δn(0)]²⟩ is determined by D and A, and D must be calculated directly from the nature of the random process. In the equilibrium case, the procedure is the reverse. The total fluctuations are determined by the Gibbs distribution in classical mechanics, or the associated density matrix. The Einstein relations can then be used in the reverse direction to calculate the diffusion constant.
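The regression of the conditional mean can also be seen in a direct, Gillespie-type simulation of the birth and death process. The sketch below is a minimal illustration with arbitrary rates (G(n) = g, R(n) = rn, and all parameter values are choices for the example, not from the text); since d⟨n⟩/dt = g − r⟨n⟩ is exactly linear for these rates, trajectories started at a displaced occupancy n₀ relax on average as n_op + (n₀ − n_op)e^{−rt}.

```python
import numpy as np

rng = np.random.default_rng(2)
g, r = 5.0, 1.0                   # G(n) = g, R(n) = r n; operating point n_op = g/r = 5

def occupancy_at(n, t_end):
    """Simulate one birth-death trajectory from occupancy n; return n(t_end)."""
    t = 0.0
    while True:
        total = g + r * n          # total rate of the next jump
        t += rng.exponential(1.0 / total)
        if t >= t_end:
            return n
        n += 1 if rng.random() * total < g else -1   # birth with probability g/total

n0, t = 12, 1.0                    # start displaced from n_op, wait one decay time 1/r
m = np.mean([occupancy_at(n0, t) for _ in range(5000)])
prediction = 5.0 + (n0 - 5.0) * np.exp(-r * t)
print(m, prediction)               # the simulated conditional mean tracks the regression
```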
8.6 The characteristic function We found earlier that the easiest way to obtain solutions for the Poisson process is to solve for its characteristic function. This suggests examining the characteristic function of the generalized Fokker-Planck process. The characteristic function, φ(y, t), of a normalized probability density P was defined in Section 1.5 as
The moments ⟨aⁿ⟩ are determined by the nth order derivatives of φ(y, t) with respect to y at y = 0, since from the above equation
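This differentiation can be mimicked numerically with an empirical characteristic function built from samples. The sketch below is an illustration only (the sample distribution and the finite-difference step are arbitrary choices): central differences at y = 0 recover ⟨a⟩ = −i dφ/dy and ⟨a²⟩ = −d²φ/dy².

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(1.0, 2.0, 400_000)            # samples with <a> = 1 and <a^2> = 5
phi = lambda y: np.exp(1j * y * a).mean()    # empirical characteristic function

h = 1e-3                                     # small step for central differences at y = 0
m1 = (-1j * (phi(h) - phi(-h)) / (2 * h)).real         # <a>   = -i  dphi/dy    at y = 0
m2 = -((phi(h) - 2 * phi(0.0) + phi(-h)) / h**2).real  # <a^2> = -  d^2phi/dy^2 at y = 0
print(m1, m2)                                # near 1.0 and 5.0
```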
From Eq. (8.20) the equation of motion of φ(y, t) is
Since
thus where Together with
therefore, the equation of motion of φ(y, t), Eq. (8.74), becomes
which can be rewritten as
This is to be compared with Eq. (8.49)
Thus we have forms, Eqs. (8.81) and (8.80), analogous to the quantum mechanical space and momentum representations, respectively. The best form to use, as in quantum mechanics, depends on the particular problem. Although our diffusion coefficients are originally defined in terms of ordinary moments, i.e., Eq. (8.14)
we can also use a linked-moment definition
since the lower moments to be subtracted off, i.e., the unlinked parts, contain the product of at least two averages of the form ⟨[a(t + Δt) − a(t)]^l⟩ with l ≥ 1,
and the product is at least quadratic in Δt, and can be discarded. Although the cumulants have an earlier history, as the semi-invariants of Thiele (1903) (see Section 1.6), the use of the linked-moment notation (L subscripts) in quantum problems is due to Kubo (1962). In both Eq. (8.82) and Eq. (8.83), averages are taken subject to the initial condition a(t) = a. With the notation Δa = a(t + Δt) − a(t), Eq. (8.83) can be used to rewrite Eq. (8.48) for L(y, a, t) in the elegant form
As an example, let us consider the shot noise case. We had found the diffusion coefficients, Dn, shown in Eq. (8.71), to be
Then the equation for L(y, a, t), Eq. (8.84), becomes
We note that in this shot noise case the Dₙ are independent of a, and similarly L(y, a, t) is independent of a. The equation of motion of the characteristic function φ(y, t), Eq. (8.80), then becomes
whose solution, noting that
However, we also have from Eq. (8.72) and Eq. (1.54)
Therefore equating the exponents of Eqs. (8.88) and (8.89), we have
or
i.e., all the linked moments are the same and equal to the number of events expected, ⟨a⟩, or νt, as shown in Eq. (3.16).
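This can be checked by sampling (νt = 3 below is an arbitrary choice for the illustration): the first three cumulants of a Poisson variate, namely the mean, the variance, and the third central moment, are all equal to νt.

```python
import numpy as np

rng = np.random.default_rng(4)
nu_t = 3.0                              # expected number of events, nu*t (illustrative)
a = rng.poisson(nu_t, 1_000_000).astype(float)

c1 = a.mean()                           # first cumulant
c2 = a.var()                            # second cumulant
c3 = ((a - c1)**3).mean()               # third cumulant (equals the third central moment)
print(c1, c2, c3)                       # all near 3.0
```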
8.7 Path integral average All our previous work has dealt with averages taken at one time or, at most, at two different times. We shall now look at a truly multitime function. A number of important problems in the theory of random processes can be reduced to an expectation value of the form (see, for example, Lax (1966III); Deutch (1962); Middleton (1960); Stratonovich (1963)),
or to a Fourier transform of such an expression. For example, in the photodetection fluctuation problem, one wishes to determine the probability, P(m, T), of observing m photocounts in a time interval T. This has been shown by Lax and Zwanziger (1973) to be determined by the distribution of time-integrated optical intensity (integrated over a time T). This is difficult to calculate directly, but evaluation of the Laplace transform of the time-integrated intensity involves an average of the form
This is evaluated for real A by Lax and Zwanziger (1973), and the inverse transform, an ill-posed problem, is obtained using a Laguerre expansion procedure. The most important object of attention is the generalized characteristic functional F[···] of a(s),
from which moments of products at an arbitrary number of times can be computed. For the case Q = −iq(s)a(s), we shall show that, at least in the classical case, this problem can be reduced to analysis. First we break up the integral in Eq. (8.91) into small time units,
which becomes
We note that we do not integrate over da₀ in the above equation, since M_Q is the average under the condition a(t₀) = a₀. To calculate the total average, M, we would
set
For a Markovian process, using the factorization of probabilities, Eq. (2.11), Eq. (8.95) becomes
This equation is essentially the equation of a Feynman and Wiener path integral. The relation between Feynman (1948) and Wiener (1926) path integrals is discussed by Montroll (1952). We now define
For small time intervals Δt,
We may regard P̄(a_{j+1} | a_j) as the transition probability of a new Markovian process. It obeys the usual properties of transition probabilities except for a change in the normalization
to first order in Δt. If we regard P(a, t) as a density of systems, then Q(a, t) can be regarded as the rate at which these systems disappear. The higher moments of P̄(a′|a) are the same as those of P(a′|a),
to first order in Δt. Returning to the derivation of Eq. (8.49), but including the effect of Eq. (8.99), we obtain,
which is Eq. (8.81) with an added loss term. Thus on the right hand side we have the usual Fokker-Planck operator, plus a loss term. Nowhere in our previous discussion have we made use of the normalization of the probabilities except
in the calculation of the zeroth moment. Therefore, from a generalization of the Chapman-Kolmogorov condition, Section 2.4, we have
Since P̄ is not normalized and is not an ordinary Markovian process, the use of the Chapman-Kolmogorov condition requires some justification. We can provide an intuitive proof as follows. The term in Q(a) represents a rate of disappearance from which there is no return. If we simply add a new discrete state which holds all the escaped probability, then P̄ will again be a normalized Markovian process, since no memory has been added. Equation (8.102) then describes a composition of probabilities in which the system ("particle") passes from a₀ to aₙ having survived, i.e., not disappeared or escaped. This is accomplished by passing through the intermediate states a₁, a₂, ..., a_{n−1}, having survived at each step. As a formal proof of the Chapman-Kolmogorov condition we note that the P̄ process differs from an ordinary Markovian process only in having
with the loss term
increased by an amount Q(a). But these transition probabilities have been shown to obey the Chapman-Kolmogorov condition in Section 2.4 without specifying r(a). The P̄ process is already included in that proof. Thus Eq. (8.97) becomes
or in a more expanded notation,
which just describes a decay in the normalization of the P̄'s associated with taking the mean of the exponential, Eq. (8.91). We note, however, that Q may be positive or negative. Associated with P̄ we define a characteristic function, φ̄, which is the Fourier transform of P̄, just as φ is the Fourier transform of P, i.e., we define
Comparing Eq. (8.107) with Eq. (8.105) we see that
We conclude that if we can solve the equation for the characteristic function φ̄, we can evaluate the path integral Eq. (8.91). In a similar manner to the derivation of Eq. (8.101), we find the equation of motion of the characteristic function φ̄ to be
where a = −i∂/∂y. Thus, as in the equation of motion for P̄, we obtain an extra term in the equation of motion for φ̄, compared with the equation of motion for φ, Eq. (8.80).
8.8
Linear damping and homogeneous noise
We now specialize these results to an easy case for which explicit answers can be obtained: the case of linear damping and homogeneous noise. In our present language, linear damping means
and homogeneous noise means
i.e., our Dₙ(a)'s for n ≥ 2 are independent of a, but could be functions of time. Actually, A can also be a function of time, but to simplify the equations we temporarily stick to constant A. In this case the operator L becomes
where the complete dependence on a is contained in the Aa term, and the noise contained in K(y, t) is homogeneous, or independent of a:
Note that terms linear in y have been separated, and K(y, t) contains all terms quadratic and higher in y. We can then solve for P̄ and φ̄ by using Eq. (8.101) and
(8.109). We are interested in the form
Since Eq. (8.109) becomes
Since a appears only linearly in L, only first derivatives with respect to y appear in the partial differential equation (PDE). If K were zero, the method of characteristics of Appendix 8.A, which applies to PDEs that involve only first derivatives, would become applicable. The method of characteristics yields the equation of characteristics. (We write yA rather than Ay, since this is the appropriate order when y is a vector and A is a matrix, the case discussed in Lax (1966III), Section 7F.) The solution can be written where
is the special solution appropriate to y(0) = 0, and the first term is a solution of the homogeneous equation. To conform with the notation in Eq. (8.206) we note that y(0) = Y. If K = 0, the exact solution of the PDE is then
where Y(y, t) is the inverse of y(Y, t), the solution of y(t) in terms of its initial value Y, and φ(y, 0) has the form:
which is the prescribed initial condition, since P(a, t|a₀, 0) approaches δ(a − a₀) and φ(y, 0|a₀, 0) is defined by Eq. (8.107). We can deal with the case K(y, t) ≠ 0 by introducing the new variable z by the transformation
Since this would be a solution with ψ(z, t) = ψ(z, 0) if K were absent, all terms not involving K cancel on the right hand side, as may be verified by direct
calculation, and the new dependent variable obeys
An integration over time can be immediately performed:
Inserting z = [y − y₀(t)] exp(−At), as well as Eq. (8.101), we obtain an explicit solution for φ̃:
The desired function M₀ may be obtained according to Eq. (8.108) merely by setting y = 0 in φ̃(y, t|a₀, t₀). Thus we find that
This is a path integral or characteristic function involving an arbitrary q(s). It permits one, by differentiation, to obtain all possible moments of a(s). For example, ⟨a(u)a(v)⟩ is determined by differentiating the characteristic function with respect to q(u) and q(v). Moreover, the general problem solved above includes a variety of important special cases:
1. Brownian motion, if we set A = 0, D_n = 0 for n > 2, D_2 = D = constant;
2. the Uhlenbeck and Ornstein (1930) process (see Section 3.5), if we set D_n = 0 for n > 2, A = constant;
3. shot noise, if we set A = 0, D_n = ν/n!.
Note that the first factor in Eq. (8.126) is obtained by taking the average and bringing it inside the exponential. The Uhlenbeck-Ornstein process, using Eq.
(8.113), with D_n = 0 for n > 2, is
The associated M₀ may be obtained directly by regarding a(s) as a Gaussian random variable
where, with α = a − ⟨a⟩,
We note that the q(s) function has been completely arbitrary in our solution of this problem. We have desired this arbitrariness in order to make a comparison in Section 9.5 with a corresponding Langevin problem. Equality for general q(s) will guarantee the full equivalence of these problems.
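The Uhlenbeck-Ornstein special case can be checked directly against a simulation of the corresponding Langevin equation. The sketch below (the values of the damping A = lam and diffusion D are illustrative, not taken from the text) integrates da/dt = −λa + F(t) with white noise of strength 2D and tests the single-time characteristic function ⟨exp(iya(t))⟩ against the Gaussian form with mean a₀e^(−λt) and variance (D/λ)(1 − e^(−2λt)).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, D = 1.0, 0.5                 # illustrative damping A and diffusion D2 = D
a0, t, dt = 2.0, 0.7, 1e-3
nstep, npath = int(t / dt), 100_000

# Integrate the Langevin equation  da/dt = -lam*a + F(t),  <F(t)F(u)> = 2D delta(t-u)
a = np.full(npath, a0)
for _ in range(nstep):
    a += -lam * a * dt + np.sqrt(2 * D * dt) * rng.standard_normal(npath)

# Gaussian characteristic function predicted for the Uhlenbeck-Ornstein case:
# mean a0*exp(-lam*t), variance (D/lam)*(1 - exp(-2*lam*t))
mean = a0 * np.exp(-lam * t)
var = (D / lam) * (1 - np.exp(-2 * lam * t))
for y in (0.5, 1.0):
    phi_mc = np.exp(1j * y * a).mean()
    phi_th = np.exp(1j * y * mean - 0.5 * y * y * var)
    assert abs(phi_mc - phi_th) < 0.02
```

The agreement at each test point y is limited only by the Monte Carlo sample size and the Euler step, both of which contribute errors well below the tolerance used.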
8.9 The backward equation
The formalism associated with Markovian random processes is not symmetric in time, and a separate investigation must be made of how the probability distributions propagate backward in time. To our surprise, we not only find a different equation, but one easier to solve. Equation (8.101) is the equation of motion for the probability forward in time. To derive a similar equation of motion for the probability backward in time we rewrite the generalized Chapman-Kolmogorov equation (8.102), in the form
We then take the Taylor expansion of P(a, t|a′, t₀) about a₀:
and using Eqs. (8.99) and (8.100) obtain the backward equation, (Lax 1966III 5.21).
This backward equation is useful when trying to evaluate multitime averages. These averages require the evaluation of integrals over P. Since Eq. (8.132) involves derivatives with respect to a₀, we can integrate it over a and, using Eq. (8.105), obtain
Thus we have obtained an equation of motion for our final result, the path integral itself. Such a procedure, which obtains an equation directly for the quantity of interest, is referred to as "invariant embedding" by Bellman (1964).
8.10 Extension to many variables
We have chosen, for simplicity of notation, to present Markovian processes in terms of a single random variable a. Most equations generalize immediately to many variables if we replace a(t) by a(t), where a is a set of N variables. Thus the reader may replace in his mind each a(t) by the same symbol in boldface. We shall not restate all the results in the multidimensional case, but only indicate a few where it is worthwhile to display the subscripts explicitly. All results were stated in multidimensional form in Lax (1966III). For example, the equation of motion of a general operator can be written
where we have also generalized to include the loss term Q so as to be able to evaluate path integrals later. The diffusion coefficients themselves are defined by
where the limit as Δt → 0 is understood, and from Eq. (8.135) we obtain the useful moment relations
From the motion of a general operator, the generalized Fokker-Planck equation is obtained by integration by parts:
or in a more concise notation,
The associated characteristic function obeys where Before proceeding with a recitation of formal results, we would like to comment on some of the strange properties of Markovian processes. For this purpose,
we shall set Q = 0 to study the underlying process rather than a path integral. The motion of a general operator, Eq. (8.135), was derived using the Chapman-Kolmogorov condition. It is instructive to provide a more intuitive explanation. The strange feature is that it is not correct to set, as an ordinary differential,
For a shot noise process, because all the cumulants are equal, all terms of order (Δa)ⁿ yield a contribution of first order in Δt. Thus we must write
and retain terms to all orders for shot noise. A subsequent average over a yields our previous equation of motion, Eq. (8.135). For a Brownian motion process, Δa contains terms of order (Δt)^(1/2). See the Brownian motion results, Eqs. (3.27), (3.41). Since D_n = 0 for n > 2, only terms in Δa to second order are needed for Brownian motion problems. Now, we discuss the noise spectrum in the case of multiple variables. When the fluctuations are small, it is appropriate to introduce the deviations
whose average values vanish. The first moment equation of Eq. (8.138) can be written
where the matrix A has elements:
If the elements aμ (or αμ) are real (Hermitian), then so is the matrix A. For simplicity, we have specialized here to the case Q = 0. The matrix A is not random and, for fluctuations from a steady state, is time-independent, although, in general, A can be time dependent. The complex noise tensor can be written
or when components are exhibited and the process is stationary,
where the dagger denotes the complex conjugate for classical processes and the Hermitian adjoint for quantum random processes. Generally, one measures the noise in a composite variable
where the cν are nonrandom coefficients. Then
will of necessity be real and positive. To obtain the noise from Eq. (8.148) we need to be able to evaluate expressions such as
But Eq. (8.145) is valid only for t > 0, and the integral in Eq. (8.148) extends from −∞ to ∞. We can overcome this problem by splitting the second integral in Eq. (8.148) into two regions: positive and negative t. In the second region, we replace t by −t and use stationarity to simplify the result. Thus
where
Note that we have succeeded, in Eq. (8.154), in expressing the second term in terms of positive times. This is needed since Markovian techniques normally express the
future (positive times) in terms of the present. Whether the variables a are real (Hermitian) or not, the second integral is related to the first by
Equations (8.145) and (8.151) have their counterparts:
The positive and negative time components of the noise can thus be written briefly
where Ã means the transpose of A. As in the single variable case, the spectrum of the noise is determined by the transport problem, as embodied in A, and the magnitude of the noise, ⟨α†α⟩, which will be determined with the help of the Einstein relation. For this purpose, we take the second moment equation of Eq. (8.138) and write it in terms of α instead of a:
Here α runs over daggered as well as undaggered variables. In Section 8.11, we shall show that when time reversal is obeyed, and all the α's are even (or all odd) under time reversal, the third term in Eq. (8.160) is identical to the second. In this case, the steady state (or Einstein) relation
simplifies to or
In more general cases, we use the following approach to obtain ⟨α†α⟩ in Eq. (8.161). A matrix equation for M of the form
can be solved in terms of A, B, and C using the concept of ordered operators (Feynman 1951; Dyson 1949) in the form
where the indices 1, 2, and 3 constitute the left-to-right ordering of the symbols, regardless of how they are written down. The formal solution:
is not explicit, since it gives no indication of how the expression can be written in the desired order. We can start disentangling by first writing the previous result as an integral representation:
This expression can then be written with the operators correctly ordered from left to right in A, B, C order. Then the ordering subscripts are no longer needed. Thus we arrive at the completely disentangled form:
Thus Eq. (8.161) has a solution in the form:
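In the time-independent case the disentangled form amounts to the quadrature M = ∫₀^∞ e^(−At) 2D e^(−A†t) dt, which solves the Einstein relation AM + MA† = 2D. A minimal numerical sketch, with an illustrative 2×2 damping matrix A and symmetric diffusion matrix 2D (both invented for the example), checks the quadrature against a standard Lyapunov solver:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Illustrative stable damping matrix A and diffusion matrix 2D (invented values)
A = np.array([[1.0, 0.3],
              [-0.2, 1.5]])       # eigenvalues have positive real parts
twoD = np.array([[0.8, 0.1],
                 [0.1, 0.5]])     # symmetric, positive definite

# Quadrature for  M = integral_0^inf exp(-A t) 2D exp(-A^T t) dt  (trapezoid rule)
dt = 0.005
ts = np.arange(0.0, 30.0 + dt / 2, dt)
vals = np.array([expm(-A * t) @ twoD @ expm(-A.T * t) for t in ts])
w = np.full(len(ts), dt)
w[0] = w[-1] = dt / 2
M = np.tensordot(w, vals, axes=1)

# M must satisfy the Einstein relation  A M + M A^T = 2D
assert np.allclose(A @ M + M @ A.T, twoD, atol=1e-4)

# Cross-check against the direct Lyapunov solver:  (-A) X + X (-A)^T = -2D
M2 = solve_continuous_lyapunov(-A, -twoD)
assert np.allclose(M, M2, atol=1e-4)
```

The upper integration limit only needs to exceed a few relaxation times of the slowest eigenvalue of A; the Lyapunov solver gives the same ⟨α†α⟩ without any quadrature.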
In our discussion of noise from a Langevin point of view, in Chapter 9, we shall show that it is possible to evaluate the noise spectrum completely, without needing an integral such as Eq. (8.168). The above procedure for evaluating ⟨αμ†αν⟩ is appropriate in the nonequilibrium state. In equilibrium, the converse procedure is appropriate: ⟨α†α⟩ is evaluated from the standard thermal equilibrium formula (Lax 1960I; Callen and Greene 1952; Greene and Callen 1952):
where Pν is the force conjugate to aν. For example,
The example in Eq. (8.170) shows that volume fluctuations, and hence concentration fluctuations, are proportional to the compressibility. In the equilibrium case, the Einstein relation is used not to determine ⟨α†α⟩ but to determine D. In quantum applications it is convenient to determine Dμν directly
using one expression in Eq. (8.138) with Q = 0, which is called the "generalized Einstein relation":
where we have used the abbreviation
to remind us that Dμν, in Eq. (8.171), represents the extent to which the law of differentiating a product is violated. In quantum applications, we will consider a system in interaction with a reservoir and, by elimination of the reservoir, obtain an effective equation for the density operator of the system. Such an equation permits one to calculate the motion of all operators: aμ†, aν, and aμ†aν, and hence Dμν. We use the phrase "generalized Einstein relation" because it is not restricted to the stationary state. Another concept which generalizes readily to the multivariable case is the linked average. With and with each a possibly evaluated at a different time
In general, the multitime linked averages are defined by
as in Eq. (1.51). One of the most general questions one can ask about a set of random variables a(s) is the multitime characteristic function. We have an explicit expression for the characteristic function for the case of linear damping plus homogeneous noise. The multidimensional analogue of Eq. (8.126) is
where K is defined in Eq. (8.113), and
In this analysis, we have permitted A(t) to be explicitly time dependent. The symbol T→ denotes time ordering from left to right (earliest time to the left).
8.11 Time reversal in the linear case
When time reversal is obeyed, Eq. (7.66), with A(t) = αᵢ(t) and B(0) = αⱼ(0), yields the condition, valid even in the non-Markovian case. The order of random variables has been chosen so that the time reversal condition remains valid even in the quantum case. As applied to our linear response, Eqs. (8.151), (8.157), we get
At t = 0 this specializes to
In the classical limit, αᵢ and αⱼ commute. Thus if one variable is even (say αᵢ) and the other is odd (say βⱼ, using β rather than α to make the oddness visible) we have (classical only), even in the non-Markovian case. Thus the variables odd under time reversal do not correlate with the variables even under time reversal.
If we equate terms linear in t in Eq. (8.178) we obtain
In the classical case, when both variables are even under time reversal, this simplifies to An example of an inertial system, containing even and odd variables, will be discussed in more detail in Section 9.7. Time reversal leads to a paradox. If a(t) is a set of even variables, then β(t) = da(t)/dt is a set of odd variables under time reversal. Thus hence in the classical case On the other hand, if we replace β by da(t)/dt, is nonvanishing. How can we explain this paradox? For t > u we have demonstrated in Eq. (8.157) and in Lax (1960I), Eq. (8.18), that For t < u, take the Hermitian adjoint and use this result again
In view of Eq. (8.178), these expressions are in fact equivalent provided that the absolute value sign is used. The derivative of Eqs. (8.186)-(8.187) with respect to t at t = 0 then displays a cusp or discontinuity in slope:
In view of the time reversal condition, Eq. (8.181), these slopes are, in fact, equal and opposite in sign. This situation is displayed in Fig. 8.1. Perhaps the best way to understand this dilemma is that the cusp is correct for a truly Markovian system. However, the Markovian approximation may not be valid for exceedingly small times. And the slope will be continuous for small times, and hence vanish there. This is what would be expected for the occupancy of an excited state of an atom as a function of time. However, a deviation of the excited state occupancy from a simple exponential decay has, so far, not been observed.
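The cusp can be made concrete with a small numerical comparison in the spirit of Fig. 8.1 below. The smooth "true" correlation used here is an illustrative two-exponential model with a short forgetting time, not a form derived in the text; it has zero slope at the origin yet matches the Markovian exponential tail.

```python
import numpy as np

tau, tau_d = 1.0, 0.05     # relaxation time and collision ("forgetting") time

def c_markov(t):           # Markovian approximation: cusp at t = 0
    return np.exp(-np.abs(t) / tau)

def c_true(t):             # illustrative smooth correlation with the same tail
    t = np.abs(t)
    return (tau * np.exp(-t / tau) - tau_d * np.exp(-t / tau_d)) / (tau - tau_d)

h = 1e-6
# One-sided slopes of the Markovian correlation at t = 0 are -1/tau and +1/tau:
# equal and opposite, a discontinuity in slope (the cusp)
right = (c_markov(h) - c_markov(0)) / h
left = (c_markov(0) - c_markov(-h)) / h
assert abs(right + 1 / tau) < 1e-3 and abs(left - 1 / tau) < 1e-3

# The smooth correlation has vanishing slope at the origin ...
assert abs((c_true(h) - c_true(0)) / h) < 1e-3
# ... but agrees with the exp(-t/tau) decay once t >> tau_d
t = 0.5
assert abs(c_true(t) / c_markov(t) - tau / (tau - tau_d)) < 1e-3
```

For t much larger than the forgetting time the two curves are indistinguishable up to normalization, which is why the Markovian cusp is harmless except at very small times.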
FIG. 8.1. The true regression of a fluctuation (solid curve) is compared to the Markovian approximation (dashed curve). For t > τd, the decay is exp(−t/τ), where τ is a typical relaxation time and τd is the duration of a collision, or the forgetting time of the system. This figure is from Lax (1960I).
8.12 Doob's theorem
Theorem A Gaussian process is necessarily linear. By linearity, we mean
Proof The conditional probability
is necessarily a Gaussian in the variables a₁, a₂, …, aₙ, a, since it is the ratio of two Gaussians in these variables. Thus it must have the form
where all the dependence on {aⱼ} is shown. If one calculates the mean of a, e.g., by finding the maximum exponent, or by completing the square, one finds
The process is nonstationary if the Γⱼ and σ are time dependent, but it remains linear. The proof also applies to a set of random variables a(t) conditional on {a(tⱼ)}. Then σ and the Γⱼ are possibly time-dependent matrices. Doob's theorem
A random process that is stationary, Gaussian, and Markovian possesses an exponential autocorrelation
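A Monte Carlo sketch of the theorem: we sample a stationary Gauss-Markov (Uhlenbeck-Ornstein) process using the exact one-step transition kernel (the decay rate λ and stationary variance σ² are illustrative values of ours) and check that the autocorrelation is σ²e^(−λτ).

```python
import numpy as np

rng = np.random.default_rng(1)
lam, sig2, dt = 2.0, 1.0, 0.05    # illustrative decay rate and stationary variance
nstep, npath = 60, 100_000

# Exact one-step kernel of the stationary Gauss-Markov process:
# a(t+dt) = rho*a(t) + Gaussian of variance sig2*(1 - rho^2),  rho = exp(-lam*dt)
rho = np.exp(-lam * dt)
a = np.sqrt(sig2) * rng.standard_normal(npath)      # start in the stationary state
history = [a.copy()]
for _ in range(nstep):
    a = rho * a + np.sqrt(sig2 * (1 - rho * rho)) * rng.standard_normal(npath)
    history.append(a.copy())
history = np.array(history)

# Doob: <a(0) a(tau)> = sig2 * exp(-lam*tau), an exponential autocorrelation
for k in (5, 20, 60):
    corr = (history[0] * history[k]).mean()
    assert abs(corr - sig2 * np.exp(-lam * k * dt)) < 0.02
```

Because the transition kernel is exact, the only error in the check is the Monte Carlo sampling noise.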
Doob (1942) stated this theorem for a one-dimensional random process. Kac extended it to the multidimensional case, as discussed in Appendix II of Wang and Uhlenbeck (1945).
8.13 A historical note and summary (M. Lax)
A number of people have asked how I got into the study of noise. Blame it on the editor of the Physical Review in the late 1950s. I had written a paper entitled "Generalized Mobilities" in which I derived the formula usually called the Kubo formula. That formula was shown to obey the fluctuation-dissipation theorem. As a result the editor assumed that I knew something about noise and promptly started to send me a series of papers on noise in semiconductor transistors, diodes and other devices. These papers were written by good experimentalists who observed noise and felt obliged to explain it. Since general methods of dealing with noise were limited, I saw many ad hoc explanations that neither I nor the authors understood. In self-defense I tried to learn what aspects were shared by most semiconductor devices. I decided that the relevant feature was that the noise consisted of small fluctuations about an operating point. The operating point was typically a steady state, but since current was drawn it was not an equilibrium state. So Lax (1960I), "Fluctuations from the Nonequilibrium Steady State," was born. The lesson learned was that if the transport theory (time dependent average motion) was understood, the noise spectrum was readily determined, since the two-time decay or transport would be the same as the regression or one-time motion. But the normalization must be determined separately. And the latter was fixed by the Einstein relation. For the equilibrium case, the integrated noise can be expressed by thermodynamic
formulas. In the nonequilibrium case, one cannot call on thermodynamics, but can call on the second moments, or the diffusion coefficients, that describe the underlying fluctuations. The framework was later generalized to stronger noise and nonlinear fluctuations. For the Markovian case, this led to the conventional Fokker-Planck equation discussed in Lax (1966III). Finally, I decided to apply these methods to the problem of noise in lasers. This applied the Langevin approach discussed in Lax (1960I) and Lax (1966IV) to a quantum system, thereby creating a "Quantum Theory of Noise Sources" in Lax (1966QIV). Quasilinear methods become completely inappropriate since we are dealing with a self-sustained oscillator. However, classical self-sustained oscillators exist, and the Fokker-Planck equation for a "rotating wave van der Pol oscillator" was solved exactly, but numerically, by Hempstead and Lax (1967CVI) and Risken and Vollmer (1967), as will be seen in Chapter 11. Even here, in most cases, the fluctuations are small and a quasilinear treatment is satisfactory. The Einstein relation between fluctuations and mobility was extended by Lax (1960I) to the nonequilibrium case. A treatment by Reggiani, Lugli, and Mitin (1988) considered the fluctuation-dissipation relation in the strong-field case in a semiconductor.
8.14 Appendix A: A method of solution of first order PDEs
The problem is to solve a first order partial differential equation (PDE) of the form
where P = P(x, y, z), Q = Q(x, y, z), R = R(x, y, z). Following Goursat (1917) p.74 and Louisell (1973) Appendix A, we expect that the solution of this partial differential equation is related to the solution of the ordinary differential equations
Suppose that we have found a solution of the partial differential equation of the form Then
By eliminating dy and dz, using Eq. (8.196) we get
Thus the integral of the "characteristic equations" (8.196) is a solution of the partial differential equation (8.195). If we regard x and y as independent variables and z = z(x,y) then we have
The derivative of u with respect to x yields
A similar derivative with respect to y yields
Take P times the first equation (8.198), plus Q times the second equation (8.199), plus R times ∂u/∂z, to get
or Thus the solution of the characteristic equations also satisfies the inhomogeneous partial differential equation (8.201). Note: If u(x, y, z) = a and v(x, y, z) = b are two solutions of the characteristic equations, then an arbitrary function φ(u, v) is a solution of the PDEs (8.195) and (8.201). The above procedure can be extended to any number of independent variables. It is convenient then to note that
is the characteristic equation for
It is appropriate to think of u(x, y, z, t) as a distribution function associated with the particle dynamical equations
so that represents a conservation of density along the motion, that is, a Liouville theorem. Thus if particles move according to the dynamical equations, Eq. (8.204), then u(x(t), y(t), z(t)) will be a constant of the motion, as discussed by Goursat (1917). In particular, if the dynamical equations possess a solution
x = x(X, Y, Z, t);  y = y(X, Y, Z, t);  z = z(X, Y, Z, t),  (8.206)
where X, Y, and Z are the initial values of x, y, and z at t = to, then [X, Y, Z] can be thought of as the name of a particle which remains fixed as its position [x, y, z] changes. If we solve for the name of position or material variable X, Y, Z in terms of the spatial variables: or
then the motion x = [x(t), y ( t ) , z(t)} is such that X = ~K(x(t), y(t), z(t]} is fixed at the value X, etc. Thus X, Y, and Z are three constants of the motion, and any function is a solution of the partial differential equation. Since X = x,Y = y, Z = z at t = to, the above solution is the one which obeys the initial condition Another way to state this result is that
where X = [X, Y, Z] and x = [x, y, z], is the Green's function or point-source solution given by This result shows that a point source remains a point as it moves, although the normalization integral may change. The first delta function reminds us of the meaning
of X(x, t) as the inverse of x(X, t), but the notation δ(X(x, t) − X) reminds us that in the integration over X only the term after the minus sign is actually an integration variable. The solution of the first order Fokker-Planck equation
can also be obtained by considering the characteristic equations
The solution is given by
where We wrote the solution down by guessing that it was the appropriate solution, in which a "point" remains a point for all time (this simplicity will disappear when diffusion is added). We may verify the solution by evaluating ∂P/∂t:
Since the other factors are functions of X, the ∂/∂xᵢ can be pulled all the way to the left
But X is the material variable, or name of the particle, so that [dxᵢ/dt]X is in fact the material time derivative (the one which follows the particle), i.e.,
and because of the presence of the delta function, we can replace Aᵢ(x(X, t)) by Aᵢ(x) to obtain
where the last step uses the original ansatz Eq. (8.212).
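The point-source recipe of this appendix can be exercised numerically. For the linear drift A(x) = −λx (an illustrative choice of ours), the sketch below marches a characteristic backwards from (x, t) to find the material label X, carries the Jacobian dX/dx along with it, and compares P(x, t) = P₀(X)|dX/dx| with the exact solution e^(λt)P₀(xe^(λt)).

```python
import numpy as np

lam = 0.8
A = lambda x: -lam * x            # drift of the first order Fokker-Planck equation
dAdx = lambda x: -lam
P0 = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # initial distribution

def solve_by_characteristics(x, t, nstep=2000):
    """March the characteristic backwards from (x, t) to the material label X,
    carrying the Jacobian J = dX/dx; then P(x, t) = P0(X) * |J|."""
    dt = t / nstep
    X = np.asarray(x, float).copy()
    J = np.ones_like(X)
    for _ in range(nstep):        # dX/ds = -A(X), dJ/ds = -A'(X) J, backwards in s
        X += -A(X) * dt
        J *= 1 - dAdx(X) * dt
    return P0(X) * np.abs(J)

x = np.linspace(-3, 3, 121)
t = 1.0
P_num = solve_by_characteristics(x, t)
P_exact = np.exp(lam * t) * P0(x * np.exp(lam * t))     # point-source solution
assert np.max(np.abs(P_num - P_exact)) < 1e-2
```

The Jacobian factor is exactly the "normalization integral may change" remark above: the density is conserved along the motion only after weighting by |dX/dx|.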
9 Langevin processes
9.1 Simplicity of Langevin methods
For Markovian processes, particularly Fokker-Planck processes, the reduction to a partial differential equation is the method of choice for obtaining accurate numerical results. However, it is difficult to anticipate the form of the solution, and to relate it to the physical properties of the system and its noise sources. If the process is not Markovian, the noise sources will not be white, and the detailed Fokker-Planck apparatus is no longer applicable. The Langevin description of the noise source of a physical system is much closer to our physical intuition, and provides a simpler understanding of the possible solutions. Moreover, analytical solutions can be obtained for Gaussian processes even when the noise sources are not white. Shot noise processes involve delta-correlated noise sources, but higher cumulants exist, so they do not reduce to the usual (second order) Fokker-Planck equation, for which third and higher cumulants vanish. Thus in general Langevin processes permit a larger class of problems to be solved than Fokker-Planck processes and procedures. They provide more intuition. However, the class of such processes that can be reduced exactly to analysis is limited. But approximations (typically adiabatic ones) are easier to envision and to make in the Langevin language. Langevin methods, at least for linear or quasilinear systems, have the simplicity of the circuit equations of electrical engineering. The noise source may arise from thermal reservoirs, as in Johnson noise, or from the discreteness of particles, as in shot noise. But once the noise is represented as a voltage source with known moments, the physical nature of the source is no longer important. The sources can be thought of as a black box, with an impedance and a voltage source, or an admittance and a current source. And the origin of the sources will not enter into the solution of the problem. For the quasilinear case, we can write our set of Langevin equations in the form
where α = a − ⟨a⟩ is a multicomponent object, as is the force F(t). The second form of the equation is unnecessary if α is real. The dagger represents the complex conjugate when acting on a vector, and the Hermitian adjoint (transposed conjugate) for a matrix. In the quantum case, it always represents the Hermitian adjoint operator. Equation (9.1) can be regarded as the definition of F(t). The bold notation is specifically used here to denote a set of variables α rather than a single variable α.
9.2 Proof of delta correlation for Markovian processes
The results in this section were presented in Lax (1960I), Section 8. Although it is intuitively clear that the correlation ⟨F†(t)F(u)⟩ should be delta correlated, proportional to δ(t − u), in order not to retain any memory of earlier events, we shall establish that this must be so for the second moments in order to be consistent with our results for the correlation ⟨α†(t)α(u)⟩ in Eqs. (8.186) and (8.187). These equations can be combined into a single equation covering both time orders
where H(t) is the Heaviside unit function
whose derivative is a delta function:
The autocorrelation of the noise source, F(t), defined in Eq. (9.1), is then given by (in a form appropriate to the complex case)
where we have replaced α†Ã by Ã*α†. The ← sign reminds us that ∂/∂u acts to the left on ⟨α†(t)α(u)⟩.
Using a shifting theorem of the form
we have If we insert Eq. (9.3) into Eq. (9.6), and use Eq. (9.8), we get
Since the derivative of a delta function is an odd function of argument
the result simplifies to
where the second equality follows from the Einstein relation, Eq. (8.163). An interesting consequence of the above results is the following theorem, written in component form. Theorem
Products of random variables and random forces for Markovian processes:
Equation (9.12) is the statement that for a Markovian process the Langevin force has no memory. Proof
Suppose that we start at some time t₀ < s, t₀ < t with a specified initial value α(t₀) at t = t₀. Then Eq. (9.1), a linear first order equation, with constant
coefficients, has the usual solution
where the left hand side could be labeled with a subscript α(t₀) to remind us of the initial condition. If we multiply by F(t) on the right and take an ensemble average, the first term vanishes since α(t₀) is fixed and ⟨F(t)⟩ = 0. With the help of Eq. (9.11) the second term yields
When t > s, the delta function is not included in the region of integration, and Eq. (9.12) results. When t = s, only half of the delta function is integrated over, and this establishes Eq. (9.14). The three results can be combined into a single expression
provided that the Heaviside unit function H(s) takes the value 1/2 at s = t. The subscript on the average reminds us that it is conditional on a given value of α(t₀) at t = t₀. An additional average over α(t₀) permits us, also, to drop the initial condition, yielding Eqs. (9.12)-(9.14) with the constraint on the starting value removed. That a random force for a Markovian system does not depend on random variable values at earlier times seems intuitively clear. But a proof is needed. Indeed, our proof, and our entire discussion, so far, has been limited to linear, stationary processes. The factor of 2 reduction in Eq. (9.14) from integrating over half a delta function is consistent with our view that in the nonideal world the correlation of the forces is not a delta function, but a sharp even function. Strictly speaking it is not a function at all, but a sequence of such even functions whose width approaches zero.
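The half-delta rule can be seen in an Itô-discretized simulation of the Langevin equation (illustrative λ and D of ours). The force step fₙ is independent of a just before the kick, fully contained in a just after it, and the symmetric average of the two reproduces ⟨a(t)F(t)⟩ = D, the value obtained from half a delta function.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, D, dt = 1.0, 1.0, 0.01       # illustrative damping and diffusion
npath, nstep = 400_000, 50

# Ito-discretized Langevin equation: a_{n+1} = a_n - lam*a_n*dt + f_n,
# with <f_n^2> = 2*D*dt, so that f_n/dt plays the role of F(t)
a = np.sqrt(D / lam) * rng.standard_normal(npath)   # start near the steady state
for _ in range(nstep):
    f = np.sqrt(2 * D * dt) * rng.standard_normal(npath)
    a_prev = a
    a = a - lam * a * dt + f

# <a F> = 0 just before the kick: a_prev is independent of f (no memory)
assert abs((a_prev * f).mean() / dt) < 0.1
# <a F> = 2D just after the kick: f is fully contained in the updated a
assert abs((a * f).mean() / dt - 2 * D) < 0.1
# the symmetric (midpoint) value reproduces the half-delta result, Eq. (9.14)
mid = 0.5 * ((a * f).mean() + (a_prev * f).mean()) / dt
assert abs(mid - D) < 0.1
```

The three assertions are the discrete counterparts of Eqs. (9.12)-(9.14): zero correlation with later forces, full correlation 2D with just-applied forces, and D at the symmetric point.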
9.3 Homogeneous noise with linear damping
For a large class of noise problems, it is appropriate to treat the noise as weak and make a quasilinear approximation about the operating point. The purpose of Lax (1960I) was to show that the noise and correlations in such a system can be obtained in three steps: (1) the time dependence of correlations such as ⟨α†(t)α(0)⟩ obeys the same equations as those for ⟨α†(t)⟩, so that the average "transport" or "relaxation" equations determine the correlations, and hence the
frequency dependence of the noise; (2) the normalization is determined by the single-time correlations ⟨α†α⟩; (3) for a nonequilibrium system, the steady state values ⟨α†α⟩ must be determined by solving the Einstein relations of Eq. (8.163):
For N variables, this is a set of N² equations. One major benefit of the Langevin approach is that one can avoid the solution of Eq. (9.18) and never solve any system of more than N equations. We shall obtain our principal result using the Langevin approach in the heuristic manner described in Lax (1966IV), Section 1. The noise correlation, Eq. (4.36), can be written
Our notation is modified to include operators as well as desired random variables, and to emphasize normal order (creation operators to the left of destruction operators). With Fourier transforms written as in Eq. (4.18)
Using the Langevin approach, Eq. (9.1) leads to the associated equation
where Ã is the transpose of A. And Eq. (9.2) translates into the adjoint of Eq. (9.21)
With the understanding that, even though the Fourier transforms are not well-defined mathematical objects, products of two are, in the sense that
This notation is consistent with that in Section 4.7, Eq. (4.50). Equation (9.19) can then be written as
where the force correlation ⟨F†F⟩ω is defined by Eq. (9.19) with α replaced by F:
where D is simply the diffusion matrix associated with α due to the Langevin force F(t) from Eq. (9.11). Thus an explicit answer for the noise in the variables αμ†αν is
The inverse of the Wiener-Khinchine relation yields the time correlation
The one-time correlations may then be obtained by setting t = 0:
The integrals contained in Eq. (9.28) can always be evaluated by residue techniques, since there are a finite number of poles. Thus the second moments, although no longer needed except as a measure of total noise, can also be obtained simply. This avoids the use of ordered operators discussed in Section 8.10.
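For a single real variable the spectrum reduces to the Lorentzian S(ω) = 2D/(ω² + λ²), and the residue at the pole ω = iλ gives the total noise D/λ. A short numerical check (illustrative λ and D) confirms both the one-time moment and the exponential time correlation:

```python
import numpy as np

lam, D = 1.5, 0.7                 # illustrative scalar damping and diffusion

# Scalar version of the noise spectrum: S(omega) = 2D / (omega^2 + lam^2)
omega = np.linspace(-2000.0, 2000.0, 2_000_001)
dw = omega[1] - omega[0]
S = 2 * D / (omega**2 + lam**2)

# One-time moment from the Wiener-Khinchine relation:
# (1/2pi) int S domega = D/lam, the residue value from the pole at i*lam
var = S.sum() * dw / (2 * np.pi)
assert abs(var - D / lam) < 1e-3

# Inverse transform at t = 1 reproduces the correlation (D/lam) * exp(-lam*|t|)
t = 1.0
C = (S * np.cos(omega * t)).sum() * dw / (2 * np.pi)
assert abs(C - (D / lam) * np.exp(-lam * t)) < 1e-3
```

The finite frequency cutoff only costs a tail of order 2D/ωmax, which sets the accuracy of the quadrature; the residue evaluation is, of course, exact.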
9.4 Conditional correlations
Another advantage of the Langevin method, at least in the linear case, is that it is easy to calculate second order correlations conditional on an initial condition. The
equations are Eqs. (9.1) and (9.2):
subject to α(t₀) = α₀. The solution can be written as
If Eqs. (9.31) and (9.32) are multiplied and averaged, the result is
with
Cross-terms linear in F vanish on averaging and were discarded. Equations (9.11) and (8.163) can be rewritten
If t > u, the integral over τ should be done first in Eq. (9.35). Then the delta function is always satisfied somewhere in the range of integration. Thus
But the integral is a perfect differential, as remarked in Lax (1960I):
The complete integral then yields
The absolute value |t − u|, in the first term, is unnecessary since t > u. However, when u > t, the integral must be done over s first, and both answers are given correctly by using |t − u|. Although we are dealing with a stationary process, Eq. (9.40) is not stationary (not a function only of |t − u|) because initial conditions have been imposed at t = t₀. However, if t₀ → −∞, the results approach the stationary limit
Equation (9.40) can also be rewritten by subtracting off the mean values
so that for the fluctuations about the average position:
Again the results approach the stationary results as t₀ → −∞ and u − t₀ → ∞ at fixed t − u. At t = u = t₀, the right hand side vanishes, as it should, since there is no fluctuation at the initial time at which all values are specified. The stationarity of the original expression, Eq. (9.40), is maintained if the times t, t₀, and u are all shifted by the same amount τ.
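The conditional result can be checked by Monte Carlo for a single variable (the values of λ, D, and α₀ are illustrative). With a sharp initial condition at t₀ = 0, the covariance of the fluctuations about the conditional means should be (D/λ)(e^(−λ|t−u|) − e^(−λ(t+u))):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, D, a0 = 1.0, 0.8, 1.5        # illustrative damping, diffusion, initial value
dt, npath = 0.01, 200_000
t, u = 0.9, 0.4                   # two observation times, t > u > t0 = 0
nt, nu = int(round(t / dt)), int(round(u / dt))

a = np.full(npath, a0)            # sharp initial condition at t0 = 0
snap = {}
for n in range(1, nt + 1):        # Euler-Maruyama integration of the Langevin eq.
    a = a - lam * a * dt + np.sqrt(2 * D * dt) * rng.standard_normal(npath)
    if n in (nu, nt):
        snap[n] = a
au, at = snap[nu], snap[nt]

# Conditional covariance of the fluctuations about the conditional means:
# <da(t) da(u)> = (D/lam) * (exp(-lam*(t-u)) - exp(-lam*(t+u)))   for t0 = 0
cov = ((at - at.mean()) * (au - au.mean())).mean()
cov_th = (D / lam) * (np.exp(-lam * (t - u)) - np.exp(-lam * (t + u)))
assert abs(cov - cov_th) < 0.02
```

At t = u = 0 the predicted covariance vanishes, matching the remark above that there is no fluctuation at the initial time, and as t₀ → −∞ it relaxes to the stationary (D/λ)e^(−λ|t−u|).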
9.5 Generalized characteristic functions
In this section we will obtain the result in Section 8.7 using the Langevin approach, which was presented in Lax (1966IV), Section 2.
We can continue our discussion of homogeneous noise with linear damping using the same Langevin equation
but specifying the noise source moments:
Thus all linked-moments are assumed maximally delta correlated. The parameters A, D and Dn can be functions of time as discussed in Lax (1966IV), but we shall ignore that possibility here for simplicity of presentation. Here, L denotes the linked average (or cumulant) which is defined by
where y(s) is an arbitrary (vector) function of s. Equation (9.49) is a natural generalization of Eq. (1.51). If we insert Eqs. (9.46), (9.47), (9.48) into Eq. (9.49), we get
Here the symbol ":" means summation over all the indices. The n = 1 term vanishes in view of Eq. (9.46). Equation (9.50) defines K(y, s), which was previously defined in the scalar case in Eq. (8.113). Instead of evaluating MQ by solving a partial differential equation, as in Section 8.8, we consider the direct evaluation of Eq. (8.114):
To do this, we write the solution of Eq. (9.45) as
If Eq. (9.52) is inserted into Eq. (9.51) and the order of the integration over u and s is reversed, we get
where
Equation (9.53) is now of the form, Eq. (9.49), for which the average is known. The final result
is in agreement with Eq. (8.126), except that here, we have explicitly dealt with the multivariable case.
9.6 Generalized shot noise
In the shot noise case, there is no damping, that is, A = 0, corresponding to a noise source equation of the form
where the η_k are random variables. We use the symbol G rather than F to remind us that ⟨G⟩ ≠ 0. The choice of linked-moments
with
is appropriate to describe Rice's generalized shot noise of Section 6.7. With this choice, the linked-moment relation of Eq. (9.58), with F replaced by G, yields
One can explicitly separate the n = 1 term:
These results describe the properties of the noise source G. We are concerned, however, with the average
If we separate off the mean part of G
then
Reversing the order of integration in MQ leads, as in Eq. (9.53), to a result
where
Equation (9.65) is of the form to which Eq. (9.60) can be applied. Since we have the fluctuating part of G in place of G itself, the first factor on the right-hand side of Eq. (9.60) should be omitted in MQ:
For the case of simple shot noise
For the case of generation and recombination,
where ν = G + R is the total rate.
When inserted into Eq. (9.63), these results, Eqs. (9.69) and (9.71), contain all the multitime statistics of the conditional multitime average MQ. The corresponding multitime correlations can be determined by simply differentiating MQ with respect to y(u₁), y(u₂), ….
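For the simple shot noise case, the statement that every linked moment of the source has the same delta-correlated strength can be checked directly: if G(t) is a train of unit impulses arriving at Poisson rate ν, then the integrated variable a(t) is a Poisson counting process, and every cumulant of a(t) equals νt. A minimal numerical sketch (ν and t are illustrative, not from the text):

```python
import numpy as np

# Simple shot noise: da/dt = G(t) with G a train of unit delta impulses
# arriving at Poisson rate nu.  Then a(t) is a Poisson counting process,
# so every linked moment (cumulant) of a(t) equals nu*t -- the discrete
# signature of all the D_n being delta correlated with the same strength.
rng = np.random.default_rng(2)
nu, t, npath = 3.0, 2.0, 400_000     # illustrative rate and observation time
counts = rng.poisson(nu * t, size=npath).astype(float)

m = counts.mean()                    # first cumulant
v = counts.var()                     # second cumulant
k3 = np.mean((counts - m) ** 3)      # third cumulant
print(m, v, k3)                      # all close to nu*t = 6
```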
9.7 Systems possessing inertia
The customary description of random systems in terms of a set of first-order equations for da/dt appears to contradict the case of systems possessing inertia, for which second-order equations are appropriate. However, by introducing the momentum variables, as in Hamiltonian mechanics, we convert N second-order equations to 2N first-order equations. Extension of our previous results to inertial systems is immediate, but purely formal, except for the fact that the set of position variables is even under time reversal:
whereas the momentum variables are odd:
and the cross-moments are clearly odd in time (see Eq. (7.14))
Hence such moments must vanish at equal time in the classical case:
In the quantum case, the commutator qp − pq = iħ forces ⟨qp⟩ = −⟨pq⟩ = iħ/2. If we set
then a = (p, q) (here a is Hermitian) obeys a set of first order equations
corresponding to the second order equation
where and F is an external force. In the presence of noise, random forces can be added to the right hand side of Eqs. (9.78) and (9.79). Hashitsume (1956) presented heuristic arguments that no noise source should be added to the momentum-velocity
relation. However, a proof is required, which we shall give based on the Einstein relation. Equations (9.78) and (9.79) correspond to the decay matrix
In the equilibrium case, we can write
The nonequilibrium case is discussed in Lax (1960I), Section 10. The Einstein relation, Eq. (9.18), in the equilibrium case then yields
The presence of elements only in the lower right-hand corner means that, as desired, random forces enter only Eq. (9.79). If F in Eq. (9.79) is regarded as the random force, then
may be regarded as a different way of stating the Einstein relation and the fluctuation-dissipation theorem.
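A numerical sketch of these statements (not from the text; all parameters are illustrative): for a Brownian harmonic oscillator with the noise source entering only the momentum equation, the Einstein relation ⟨F(t)F(s)⟩ = 2γkTδ(t − s) reproduces equipartition, and the equal-time cross-moment ⟨qp⟩ vanishes, as required by the opposite time-reversal parities of q and p.

```python
import numpy as np

# Brownian harmonic oscillator: noise enters only the momentum equation,
#   dq/dt = p/m,   dp/dt = -m*w0**2*q - (g/m)*p + F(t),
# with <F(t)F(s)> = 2*g*kT*delta(t-s) (Einstein relation).  At equilibrium,
# equipartition gives <p**2> = m*kT and <q**2> = kT/(m*w0**2), while the
# equal-time cross-moment <q*p> vanishes.
rng = np.random.default_rng(3)
m, w0, g, kT = 1.0, 2.0, 0.5, 1.0     # illustrative parameters
dt, nburn, nstep, npath = 0.002, 5000, 5000, 10_000
q = np.zeros(npath); p = np.zeros(npath)
q2 = p2 = qp = 0.0
for n in range(nburn + nstep):
    f = np.sqrt(2 * g * kT * dt) * rng.standard_normal(npath)
    q += (p / m) * dt                                # no noise source here
    p += (-m * w0**2 * q - (g / m) * p) * dt + f     # noise only here
    if n >= nburn:
        q2 += (q**2).mean(); p2 += (p**2).mean(); qp += (q * p).mean()
q2 /= nstep; p2 /= nstep; qp /= nstep
print(q2, p2, qp)   # expect ~kT/(m*w0**2) = 0.25, ~m*kT = 1.0, ~0
```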
10 Langevin treatment of the Fokker-Planck process
10.1 Drift velocity
In Chapter 9, Langevin processes were discussed based on the Langevin equations in the quasilinear form, Eqs. (9.1), (9.2). In this chapter, we consider the nonlinear Langevin process defined by
The coefficients B and σ may explicitly depend on the time, but we will not display this time dependence in our equations. We now limit ourselves to Langevin processes which lead back to the ordinary Fokker-Planck equation, i.e., a generalized Fokker-Planck equation that terminates with second derivatives. We shall later find that the classical distribution function of the laser, which corresponds to the density matrix of the laser, obeys, to an excellent approximation, an ordinary Fokker-Planck equation. We assume
The Gaussian nature of the force f(t) is implied by Eq. (10.4); it is needed for a conventional Fokker-Planck process, namely one with no derivatives higher than the second. It is possible to construct a Langevin process which reproduces any given Fokker-Planck process, and vice versa. The process defined by Eq. (10.1) is Markovian, because a(t + Δt) can be calculated in terms of a(t), and the result is uninfluenced by information concerning a(u) for u < t. The f's at any time t are not influenced by the f's at any other time t′, see Eqs. (10.2)-(10.4), nor are they influenced by the a's at any previous time, since f is independent of previous a. Thus the moments D_n in Section 8.1 can be calculated from the Markovian expression:
The difference between the unlinked and linked averages in Eq. (10.5) vanishes in the limit Δt → 0. Rewriting Eq. (10.1) as an integral equation and denoting we have the integral equation:
We shall solve this equation by iteration. In the zeroth approximation we set a(s) = a(t) = a which is not random in averages taken subject to a(t) = a. Our first approximation is then
The first term is already of order At and need not be calculated more accurately. In the second term of Eq. (10.7) we insert the first approximation
Retaining only terms of order Δt or f², but not fΔt or higher, we arrive at the second and final approximation
Let us now take the moments D_n. For n > 2, using Eq. (10.4), to order Δt we have Thus, from Eq. (10.5), all D_n = 0 for n > 2. For n = 2, using Eqs. (10.2)-(10.5), we obtain
and for n = 1,
The double integral in Eq. (10.14), evaluated using Eq. (10.3), is half that in Eq. (10.12), since only half of the integral over the delta function is taken. Integration over half a delta function is not too well defined. From a physical point of view, we can regard the correlation function in Eq. (10.3) as a continuous symmetric function, such as a Gaussian of unit integral. As the limit is taken, with the correlation becoming a narrower function, the integral does not change from 1/2 at any point of the limiting process. Equations (10.13) and (10.14) show that, given a Fokker-Planck process described by a drift vector A(a) and a diffusion D(a), we can determine the functions B(a) and σ(a):
that lead to a Langevin process, Eq. (10.1), with the correct drift A(a) and diffusion D(a) of the associated Fokker-Planck process. The reverse is also true: given the coefficients B(a) and σ(a) in the Langevin equation, we can construct the corresponding coefficients A and D of the Fokker-Planck process.

10.2 An example with an exact solution
The procedure used in the above section may be regarded as controversial, because we have used an iterative procedure which is in agreement with Stratonovich's (1963) treatment of stochastic integrals, as opposed to Ito's (1951) treatment. This disagreement arises when integrating over random processes that contain white noise, the so-called Wiener processes. We shall therefore consider an example which can be solved exactly. Moreover, the example will assume a Gaussian random force that is not delta correlated, but has a finite correlation time. In that case, there can be no controversy about the results. At the end, however, we can allow the correlation time to approach zero. In that way we can obtain exact answers even in the white noise limit, without having to make one of the choices proposed by Ito or Stratonovich, as discussed in Lax (1966IV), Section 3.
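The distinction can be illustrated numerically. The sketch below (not from the text) integrates the same multiplicative-noise equation dX = X dW with a pre-point (Ito) rule and with a midpoint (Stratonovich, Heun-type) rule; the standard normalization ⟨dW²⟩ = dt is used rather than this book's factor-of-2 convention. The two rules give genuinely different averages: ⟨X(t)⟩ = 1 for Ito but e^(t/2) for Stratonovich.

```python
import numpy as np

# The same multiplicative-noise equation dX = X dW integrated two ways:
#  - Ito / pre-point rule:      X_{n+1} = X_n + X_n*dW      -> <X> stays 1
#  - Stratonovich / Heun rule:  sigma evaluated at midpoint -> <X> = e^{t/2}
# (Standard normalization <dW^2> = dt, not the book's 2*delta convention.)
rng = np.random.default_rng(4)
dt, nstep, npath = 0.01, 100, 50_000          # integrate to t = 1
x_ito = np.ones(npath)
x_str = np.ones(npath)
for _ in range(nstep):
    dw = np.sqrt(dt) * rng.standard_normal(npath)
    x_ito += x_ito * dw                       # pre-point evaluation
    pred = x_str + x_str * dw                 # Euler predictor
    x_str += 0.5 * (x_str + pred) * dw        # midpoint (Stratonovich) value
print(x_ito.mean(), x_str.mean())             # ~1.0 and ~exp(0.5) = 1.65
```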
The example we consider is:
where μ and σ are constants, independent of a and t, and R(t − u) is an analytic function. Thus our problem is linear and Gaussian but not delta correlated; hence the noise is not white. This example was previously presented (with a slight difference in notation) in Lax (1966III), Section 6C and Lax (1966IV), Section 3:
from Eq. (10.17). Therefore the ensemble average given by
can be evaluated using Eq. (9.49) in terms of the linked average:
The average is then expressed in terms of the cumulants. But for the Gaussian case, the series in the exponent terminates at the second cumulant:
We make a transformation of Eq. (10.18) to the variables x, y:
It was permissible here to obey the ordinary rules of calculus in this transformation, without requiring Ito's calculus lemma, because delta-function correlations are absent. The equation of motion for x is
Since x would be constant in the absence of the random force f(t), the probability distribution of x at time t, P(x, t), is necessarily Gaussian, and determined only by the
random force f(t), and therefore has the normalized solution form
where the second moment from Eq. (10.23) must be:
where H is an abbreviation for H(t). Using Eq. (10.22), and changing back to the original random variable a, Eq. (10.24) leads to
Equation (10.27) is valid even in the non-Markovian case of a continuously differentiable f(t). In this case ⟨H²⟩ is proportional to t² as t → 0, and its first derivative with respect to t vanishes. If the process is Markovian, f(t) will be delta correlated and ⟨H²⟩ will be linear in t.
10.3 Langevin equation for a general random variable
Let us consider an arbitrary function M(a, t) of the random variable a, which obeys the Langevin equations: We ask: what is the Langevin equation for M? Following the procedure in Section 8.2, the drift vector for M in the Fokker-Planck process is determined by
where, from Eq. (10.15) and Eq. (10.16), and
Equation (10.30) terminates at n = 2 for an ordinary Fokker-Planck process.
The fluctuation term for M is determined by
The drift term in the Langevin equation for M is given by
We obtain that
The Langevin equation for M is given by
Therefore, the transformation of variables in our Langevin equation obeys the ordinary calculus rule. The average receives contributions not only from B(M, t) but also from the second term
which is not zero unless σ(M) is a constant. For the conditional average with M(t) = M, the contribution from the second term is
We consider an example, which will be used in Chapter 16 for applications in finance. If S is the price of a stock, the fundamental Langevin model is built for the fractional change of the stock value, dx = dS/S:
where μ and σ are not functions of x.
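A minimal numerical sketch of this model (not from the text; μ, σ, and S₀ are illustrative, and the standard normalization ⟨dW²⟩ = dt is used rather than this book's factor-of-2 convention). Since μ and σ are constants, x(t) is Gaussian with mean μt and variance σ²t, and by the ordinary calculus rule the price is S(t) = S₀e^(x(t)):

```python
import numpy as np

# Langevin model for the fractional price change, dx = mu*dt + sigma*dW.
# x(t) is Gaussian with mean mu*t and variance sigma**2*t, and with the
# ordinary calculus rule the price is S(t) = S0*exp(x(t)) (lognormal).
rng = np.random.default_rng(5)
mu, sigma, S0, T = 0.05, 0.2, 100.0, 1.0      # illustrative market values
dt, npath = 0.001, 100_000
nstep = int(T / dt)
x = np.zeros(npath)
for _ in range(nstep):
    x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(npath)
S = S0 * np.exp(x)                            # ordinary chain rule, dS = S dx
print(x.mean(), x.var(), S.mean())
```

The sample mean and variance of x come out near μT = 0.05 and σ²T = 0.04, and the mean price near S₀ exp(μT + σ²T/2), the familiar lognormal average.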
Our Langevin equation for S is simply
Hence, Eq. (10.37) can be obtained simply from Eq. (10.36) by multiplying by S. For the average, or the conditional average with S(t) = S, of d⟨S⟩/dt, we have
and for the conditional average, ⟨S⟩ in the last expression of Eq. (10.38) is replaced by S.

10.4 Comparison with Ito's calculus lemma
Stochastic differential equations in which Ito's calculus lemma is used for the transformation of random variables are widely applied in finance. Ito's stochastic equation is written as
where dz is the differential of a Wiener process of pure Brownian motion, namely one whose mean remains zero and whose standard deviation is (Δt)^(1/2). Ito's stochastic equation for an arbitrary function M(a, t) is written as
where A(M, t) is determined by Ito's calculus lemma:
and σ(M) is determined by
Our Eq. (10.30) is similar to Ito's calculus lemma, which indicates that the ordinary calculus rule is not valid for Fokker-Planck processes, as shown in Section 8.2. However, the calculus rule for our Langevin equation is not Ito's calculus lemma. The difference between our Langevin stochastic equation and that using Ito's lemma originates from the different definitions of the stochastic integral. In Ito's integral
⟨σ(a(t_c))f(t)⟩ = 0, and B(a, t) = A(a, t) in Eq. (10.39). Hence, Ito's calculus lemma must be introduced in order to obtain the correct answer. In our Langevin description, ⟨σ(a(t))f(t)⟩ is estimated based on an integrand that is a function of t, which can be recognized as a limit of an analytic function on which the Riemann integral exists, as shown in Section 10.2, and which finally approaches a delta-correlated function: ⟨f(t)f(s)⟩ = 2δ(t − s). Hence, ⟨σ(a(t))f(t)⟩ is estimated to be nonzero, except in the case that σ is a constant. There is an unimportant difference in notation between ours and that used in some of the literature. The standard Wiener notation is equivalent to the correlation This leads to a Brownian motion in which whereas the customary physics (and our) notation would have a factor of 2 placed on the right-hand side of these equations. Hence, there is a notational transformation between that used in this book and that in other literature: Although the two stochastic approaches lead to the same mathematical result at the level of averages or conditional averages, we, as physicists, prefer a method more compatible with actual natural processes, in which the integrand is a function of the time t. The ordinary calculus rule can be used in our Langevin stochastic equation. This is a major advantage of our approach. As shown by the example in Eq. (10.38), for a random variable dx = dS/S, one cannot simply write dS = S dx when Ito's calculus lemma is used. This could be misleading, and will be discussed in Chapter 16. Other examples of applying the two different stochastic approaches in the financial area are presented in Section 16.6 and Section 16.7.

10.5 Extending to the multidimensional case
Consider a set of random variables a = [a₁, a₂, …, a_n], which obey the Langevin equations: Following the procedure in Section 10.1, up to the second order in the iteration, we have
Hence, the moments D_n defined in Eq. (10.5) (up to n = 2) are
and
For a set of functions M(a) = [M₁(a), M₂(a), …, M_n(a)], the Langevin equation for M is written as
We ask what are B_i(M, t) and
where [D₁(a, t)]_i and [D₂(a, t)]_kl are given, respectively, by Eq. (10.49) and Eq. (10.48). The fluctuation term for M is determined by
The drift term in the Langevin equation for M is given by
Now, we calculate the second term:
MEANS OF PRODUCTS OF RANDOM VARIABLES AND NOISE SOURCE191
After a few steps of calculation, we obtain
Therefore, the Langevin equation for M is given by
Therefore, the transformation of M(a) in the Langevin formula obeys the ordinary calculus rule. The average of the fluctuation term is
and for the conditional average, the average symbol ⟨…⟩ on the right-hand side of the above equation is removed.
10.6 Means of products of random variables and noise source
In this section we use another method to estimate ⟨M(a)F(t)⟩, and extend it to the multidimensional case. Here F(t) is restricted to be independent of a; hence this is a linear fluctuation model. This approach will be used in the next chapter.
Let us consider an arbitrary function M(a) of the set of random variables a = [a₁, …, a_n], which obey the usual coupled Langevin equations:
We want to calculate ⟨M(a(t), t) F_j(t)⟩.
In accord with Eq. (9.17), we assume
i.e., the noise sources at time t are independent of the variables a at times s < t. Our calculation here will be different from the iteration procedure used in Section 10.1, and simpler. We set
and will later let ε → 0. With the notation M(t) = M(a(t), t), we rewrite ⟨M(t)F_j(t)⟩ as
By Eq. (10.56), the first term vanishes, and Eq. (10.58) can be rewritten as an integral
or using Eq. (10.55),
Only the last term of Eq. (10.60), containing a product of two forces, is sufficiently singular to yield a finite contribution as t − t_c = ε → 0.
To lowest order in ε, (∂M/∂a_i)_s ≈ (∂M/∂a_i)_t_c, so that
Using Eq. (10.55) and omitting a factor 2 since we are integrating only over one half of the delta function, we obtain:
We note that on the right-hand side we have taken the a's fixed at t_c, and then looked at the fluctuation of a at time t, i.e., a(t_c) + fluctuation. We have then calculated the correlation of the components of this fluctuation with F_j(t). We can then take the limit t → t_c and write Eq. (10.62) as
i.e., the average conditional on a(t) = a at time t.
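The half-delta-function rule derived above can be seen numerically. In the sketch below (not from the text; λ and D are illustrative), for da/dt = −λa + f with ⟨f(t)f(s)⟩ = 2Dδ(t − s) and σ = 1, the product ⟨a(t)f(t)⟩ depends on whether a is read just before the noise kick (giving 0), just after it (giving 2D), or symmetrically (giving D); the symmetric reading is the limit of a smooth correlation function, as in the text.

```python
import numpy as np

# Check of <a(t)f(t)> = D for da/dt = -lam*a + f, <f(t)f(s)> = 2*D*delta(t-s).
# On a discrete grid the answer depends on where in the step a is read:
# just before the kick gives 0 (Ito), just after gives 2*D, and the
# symmetric midpoint reading gives D, the half-delta-function result.
rng = np.random.default_rng(7)
lam, D = 1.0, 0.5
dt, nstep, npath = 0.001, 2000, 50_000
a = np.zeros(npath)
pre = post = 0.0
for _ in range(nstep):
    xi = rng.standard_normal(npath)
    f = np.sqrt(2 * D / dt) * xi          # discrete white noise, <f^2> = 2D/dt
    a_new = a + (-lam * a) * dt + f * dt  # one Euler step
    pre += (a * f).mean()                 # a read before the kick
    post += (a_new * f).mean()            # a read after the kick
    a = a_new
pre /= nstep; post /= nstep
mid = 0.5 * (pre + post)                  # symmetric (midpoint) reading
print(pre, post, mid)                     # ~0, ~2*D = 1.0, ~D = 0.5
```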
11 The rotating wave van der Pol oscillator (RWVP)
11.1 Why is the laser line-width so narrow?
The question has been raised, with an entirely different meaning, by Scully and Lamb (1967). However, optical lasers with frequencies as high as 10¹⁵ radians per second can have line-widths of 10⁴ radians per second and even smaller. The addition of nonlinearity to a system generally leads to combination frequencies, but not to an extremely narrow resonance. The clue to the solution of this difficulty is described in Lax's 1966 Brandeis lectures (Lax 1968), in which Lax considered a viable classical model for a laser containing two field variables (like A and A†), two population levels such as the upper and lower level populations in a two-level atomic system, plus a pair of atomic polarization operators that represent raising and lowering operators. When Lax took this nonlinear six-variable system, sought a stationary state, and examined the deviations from the stationary operating state in a quasilinear manner, he discovered that there is always one unstable degree of freedom: a phase variable that is a combination of field and atomic phases. The next step was the realization that this degree of instability is not an artifact of the particular example, but a necessary consequence of the fact that this system, like many others, including Hewlett-Packard radio-frequency oscillators, is autonomous, namely that the equations of motion contain no time origin and no metronome-like driving source. Mathematically, this means that the system is described by differential equations that contain no explicit time dependence. As a consequence, if x(t) is a solution, where x(t) is a six-component vector, then x(t + τ) is also necessarily a solution of the differential equation system. But this means that the solution is unstable to a time shift, or more pictorially to a frequency shift. Under an instability, a new, perhaps sharp line can occur, as opposed to the occurrence of summation or difference lines that arise from nonlinearity.
Hempstead and Lax (1967CVI) illustrate the key idea in a simpler system, the union of a positive and a negative impedance. In this chapter, we will first describe this nonlinear model, build the corresponding differential equation of motion in Section 11.2, and transform this equation to a dimensionless canonical form in Section 11.4. In Section
11.3, the diffusion coefficients of a Markovian process are defined, and the condition for the validity of this approximation is described. In Section 11.5 the phase fluctuations and the corresponding line-width are calculated; the line-width W obtained in Eq. (11.66) is shown to be very narrow. In Section 11.6, the amplitude fluctuations are calculated using a quasilinear treatment. In Sections 11.7 and 11.8, exact solutions for the fluctuations are obtained from the Fokker-Planck equation of this model.

11.2 An oscillator with purely resistive nonlinearities
We consider a self-sustained oscillator, modeled by connecting a positive and a negative resistance in series with a resonant circuit, as shown in Fig. 11.1. A more general case was considered in Lax (1967V). The negative resistance is a source of energy, like the pump in a laser. A steady state of energy flow from the negative to the positive resistance is achieved at the oscillation intensity at which the sum of the positive and negative resistances vanishes, and the frequency of oscillation stabilizes at the frequency at which the total reactance vanishes. In addition to the standard resonant-circuit terms L dI/dt + Lω₀²Q, we assume a resistive term R(p)I. Therefore our equation of motion is
where e(t) is a real Gaussian random force,
and p is essentially the energy stored in the circuit, which is defined by
where with A being complex. By taking our nonlinearity R(p) to be a function of the energy stored in the circuit, but not of the current or of the charge, we omit terms that vary as exp(2iω₀t), etc.; thus we have made the rotating wave approximation. By definition, and using Eqs. (11.1), (11.4), (11.5), we obtain
where
FIG. 11.1. A self-sustained oscillator modeled by connecting a positive and a negative resistance in a resonant circuit. The negative resistance is a source of energy; a steady state of energy flow from the negative to the positive resistance is achieved at the oscillation intensity at which the sum of the positive and negative resistances vanishes, and the frequency of oscillation stabilizes at the frequency at which the total reactance vanishes.

It is consistent with the rotating wave approximation to retain only the slowly varying parts of A and A*, so that the term A*e^(2iω₀t) in Eq. (11.6) is dropped, leaving
We first use the equations for A and A* to determine the operating point, i.e.,
From Eq. (11.8), the operating point, p₀₀, is therefore determined by
or, using Eq. (11.2), that We call this operating point p₀₀ because we will later find a slightly better one, p₀, using a different reference frame. From Eq. (11.8) we obtain the equation of
motion of A*, where We are doing this problem thoroughly because we will find that this classical random process, which is associated with quantum mechanical problems like the laser in the difficult region near threshold, reduces to the Fokker-Planck equation for the rotating wave van der Pol oscillator.

11.3 The diffusion coefficient

Noting that e₋(t) and e₊(t) are random forces, we now calculate the diffusion coefficients D₋₊ = D₊₋ defined by
Using the definition of the δ function, we have
Equation (11.15) is an appropriate definition for processes that are Markovian over a time interval Δτ ≫ (ω₀)⁻¹ (see Lax 1966IV, Section 5 for a more detailed discussion of processes containing short, nonzero correlation times). Equation (11.15) can be rewritten as
Using the definition of G(e, ω₀) in Eq. (4.50), we see that the diffusion constant in the limit Δτ → ∞ is
and describes the noise at the resonant frequency ω₀. If we had chosen
then our spectrum would be that of white noise, i.e., independent of frequency. In the case of not exactly Markovian process we are assuming the spectrum does not vary too rapidly about OJQ (see Fig. 11.2), and thus we can approximate it by
FIG. 11.2. In the case of a not exactly Markovian process we can approximate it by white noise evaluated at the frequency ω₀.

white noise evaluated at the frequency ω₀. The spectrum of the noise source is not necessarily white, but only its change over the frequency width of the oscillator is important, and that change may be small enough to permit approximating the noise as white. Indeed, the situation is even more favorable in a laser containing N photons: the line-width will be smaller than the ordinary resonator line-width by a factor of N. In general, for an oscillatory circuit such as shown in Fig. 11.1, it is essential to choose Δτ to be large compared to the period of the circuit, but it is often chosen small compared to the remaining time constants to avoid nonstationary errors. The condition for the validity of this choice is actually
where Δt is the relaxation time associated with the system and δω is the frequency interval over which the noise displays its nonwhiteness. To order (ω₀Δτ)⁻¹, it is shown in Lax (1967V) that

Thus we actually require two conditions, Eq. (11.19) and the less stringent condition (ω₀Δτ)⁻¹ ≪ 1.
11.4 The van der Pol oscillator scaled to canonical form
The oscillator described above is a rotating wave oscillator, but not a van der Pol oscillator, since Eq. (11.8) has an arbitrary nonlinearity R(p). Therefore we expand R(p) about the operating point, forming the linear function
We shall later discuss the condition under which this approximation is valid. We now perform a transformation
and where
and Then Eq. (11.8) becomes a canonical form:
where
and
The coefficients ξ and τ were determined by the requirement that A′ and h satisfy Eqs. (11.27) and (11.29). The condition for neglect of higher terms in the expansion of R(p) about the operating point is where
Inserting Eq.
we require
If the noise ⟨e²⟩_ω₀ is weak, as it usually is, the noise width Δp = ξ² will be small compared to the region δp over which R(p) changes appreciably. In physical terms, the width ξ² of the fluctuations (in p) is small compared to the region δp characterizing the nonlinearity. Thus over this region the linear expansion of R(p), resulting in the van der Pol equation, is valid.
11.5 Phase fluctuations in a resistive oscillator
In Eq. (11.10) we found that an oscillator chooses to operate at a point at which its net resistivity, and hence its line-width, vanishes. Noise in a stable nonlinear system would add to this signal, which possesses a delta-function spectrum, but not broaden it. Fortunately, an autonomous oscillator (described by a differential equation with time-independent coefficients) is indifferent to a shift in time origin and thus is unstable against phase fluctuations. These unstable phase fluctuations will broaden the line, whereas amplitude fluctuations only add a background. For the purpose of understanding phase broadening, therefore, it is adequate to linearize with regard to amplitude fluctuations. Indeed, for a purely resistive oscillator, there is no coupling (at the quasilinear level) between amplitude and phase fluctuations. At least in the region well above threshold, then, when amplitude fluctuations are small, it is adequate to treat phase fluctuations by neglecting amplitude fluctuations entirely.
then the phase is found to obey the equation
from which R(p) has disappeared. Amplitude fluctuations are neglected by setting u = 0. The only vestige of dependence on R(p) is through po, which is \A\^ at the operating point. Equation (11.34), with u = 0, is a differential equation containing no time origin and no metronome-like driving source. As a consequence if x(t) is a solution, then x(t + T) is also necessarily a solution. This means that the solution is unstable to a time shift. po could be replaced by the more accurate (p). Since R(p) no longer enters the problem, we can with no loss of generality work with the dimensionless variables introduced in Section 11.4 for the RWVP oscillator. Dropping the primes in Eq. (11.27), and defining p = (p)/£ 2 , Eq. (11.34) (with u = 0) takes the dimensionless form
of a Langevin problem in φ alone. Since e(t), and hence h(t), has been assumed Gaussian, with vanishing linked-moments for n > 2, Eq. (11.35), as shown in Chapter 10, reduces to a
Fokker-Planck process. The diffusion coefficient D is given by Eq. (8.14),
Using Eq.
Since we already have the product of two h's, using Eqs. (11.29) and (11.30) we obtain
Therefore D(φ) is independent of φ.
Following the method of Section 10.6
The first term vanishes and the second yields
using Eq. (11.29) and integrating over half the delta function. Adding the complex conjugate in Eq. (11.39) we get
By inserting the above first and second moments into the generalized Fokker-Planck equation, Eq. (8.38), we obtain a simple Fokker-Planck equation for this
process,
which describes pure phase diffusion. Since the process is Markovian, it is permissible to use the standard Fokker-Planck Equation (8.38) with the delta function initial condition, Eq. (8.45), without requiring conditional moments, Eq. (8.44). But this is the well-known Green's function for diffusion
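This diffusion Green's function can be checked numerically. A sketch (not from the text, with an illustrative diffusion constant D) simulates the pure phase diffusion of Eq. (11.44) and verifies that the variance of φ grows as 2Dt while ⟨e^(iφ)⟩ decays as e^(−Dt); it is this exponential decay of the phase coherence whose Fourier transform gives the Lorentzian line below.

```python
import numpy as np

# Pure phase diffusion: dphi/dt = noise with diffusion constant D, so that
# P(phi, t) is the Gaussian Green's function with variance 2*D*t, and the
# phase coherence decays as <exp(i*phi(t))> = exp(-D*t), i.e. a Lorentzian
# line of half-width D after Fourier transformation.
rng = np.random.default_rng(6)
D, dt, nstep, npath = 0.5, 0.01, 200, 100_000   # illustrative D; t = 2
phi = np.zeros(npath)
for _ in range(nstep):
    phi += np.sqrt(2 * D * dt) * rng.standard_normal(npath)
t = nstep * dt
print(phi.var(), np.mean(np.cos(phi)))   # ~2*D*t = 2.0 and ~exp(-D*t) = 0.368
```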
We can now calculate the phase line-width, which by Eq. (4.4) is given by the Fourier transform of ⟨a*(t)a(0)⟩, where a is defined in Eq. (11.4)
Using Eq. (11.33) and neglecting amplitude fluctuations
Using the cumulant expansion theorem, we then obtain
Since φ is a Gaussian variable, linked averages beyond the second disappear, and Eq. (1.56) yields
Using Eqs. (11.36) and (11.38) for D(φ), Eq. (11.48) becomes
where Λ_p describes the associated line-width, with the subscript p standing for phase, and we have The noise spectrum G(a, ω), by the Wiener-Khinchine theorem given in Section 4.2, is given by
Thus we have a Lorentzian spectrum with line-width Λ_p = 1/p̄. We now return to our original units by replacing the prime we had removed:
and then Eqs. (11.26) and (11.33) yield
Now the power dissipated in the positive resistance is
Thus Eq. (11.53) becomes
where the width Δω is the cavity width with just the positive impedance R_p present. We see that the line-width is proportional to 1/P. For a laser, the line-width is proportional to one over the mean number of photons N. Now we calculate ⟨e²⟩_ω₀. We set
where the subscripts p and n refer to the e² associated with the positive and negative resistance, respectively. Using the definition of ⟨e²⟩_ω₀, Eqs. (11.19)-(11.20), and the Wiener-Khinchine theorem, Eq. (4.4), we have
The equilibrium theorem in Section 7.4 for this noise, including zero-point contributions, leads to
where
T_p is the positive temperature and C(ω, T) is the quantum correction factor, which approaches 1 for T → ∞ and gives us the quantum corrections at low temperatures. A detailed discussion of this correction factor is given in Lax (1960I), Section 7.
Similarly, where T_n is the negative temperature. Since both T_n and R_n are negative, the above equation can be rewritten as
We recall, Eq. (11.11), that our operating point p₀₀ was
therefore, from Eq. (11.55), the full width W is given by
Now Eq. (11.59) can be rewritten as
where When zero-point contributions are included at this semiclassical level, Eq. (11.63) becomes which is the Schawlow-Townes (1958) formula with a correct extra factor of 2 in the denominator. This factor is correct above threshold, but the original Schawlow-Townes formula is correct well below threshold. See Fig. 11.3 for a plot of the gradual reduction of this factor from 2 to 1 as one passes from below threshold to above threshold. We note that in the classical limit C_p, C_n → 1 in Eq. (11.63), while in the full quantum limit n̄_p, n̄_n → 0 in Eq. (11.66). If ordered operators are used in a full quantum treatment, C_p would contain only n̄_p, whereas C_n would contain n̄_p + 1. Thus Eq. (11.66) is unchanged. We have here a case of the rate up plus the rate down, reminiscent of shot noise. The absorption gets no benefit of zero-point effects, but emission gets a full zero-point contribution. There is emission even when there are no photons in the field, and the amount is that which would be computed classically for a field already containing one photon. We now show the line-width, W, to be very narrow. Since many photons are involved, the power P is much larger than the energy of one photon times the line-width Δω, i.e., P ≫ ħω₀Δω, and thus the final line-width W is much less than the line-width Δω obtained without the negative impedance, i.e., W ≪ Δω. We note
that this discussion is valid only when the operating power is high enough to enable us to neglect the fluctuations in the amplitude. Equations (11.58) and (11.60) are also only valid when R_p and R_n are separately in thermal equilibrium, since we assumed Nyquist's theorem for these noise sources. In a more general nonequilibrium case, the noise sources must be obtained from a detailed knowledge of the nature of the resistances R_p and R_n.
11.6 Amplitude fluctuations
Omitting the primes from Eq. (11.27) we get
We concentrate on the motion of p in Eq. (11.3):
Therefore, the equation of motion for p becomes
In Eq. (11.69), A and A* are not constant; hence ⟨A(t)h*(t)⟩ is not zero. Let us consider ⟨A(t)h*(t)⟩. Using the method in Section 10.6 and then Eq. (11.27), and integrating over only half the delta function, we have
Therefore the Langevin equation for Eq. (11.69) is
where
hence ⟨F_p(t)⟩ = 0, so F_p describes the pure fluctuations. As a Fokker-Planck process, it is effective to write F_p as
where A_c is the value of A at a time t_c slightly earlier than t.
The diffusion coefficient, D(p), is given by
Using Eqs. (11.29) and (11.30) we have
Therefore our diffusion coefficient for amplitude, D(p), is
Quasilinear treatment of amplitude fluctuations
The following example shows that the operating point depends on the choice of variables. Using Eq. (11.67) to choose the operating point for the variable A, the drift vector is
On the other hand, using Eq. (11.71), the operating point for the variable p = |A|² is
The advantage of p_0 over p_00 is that it yields a nonvanishing value p_0 > 0 below threshold. We will now calculate the amplitude fluctuations. In the quasilinear (QL) approximation, the decay constant for amplitude fluctuations is
where the subscript a denotes the amplitude. Using Eq. (11.81), and then Eq. (11.82),
A better approximation, the intelligent quasilinear approximation (IQL), is obtained by replacing p_0 by p̄, the exact (or experimental) mean number:
We note that we need not have used the quasilinear approximation, as we could have solved this problem exactly, using the drift vector, Eq. (11.81), and the diffusion coefficient, Eq. (11.78), to obtain the Fokker-Planck equation:
11.7 Fokker-Planck equation for RWVP
A complete Fokker-Planck equation including phase and amplitude fluctuations can be obtained from Eq. (11.27) after dropping the primes:
The last term is obtained from D_{AA*}(∂²P/∂A∂A*) + D_{A*A}(∂²P/∂A*∂A); from Eq. (11.29), D_{AA*} = D_{A*A} = 2. Alternatively, if one prefers real variables, one can set
in Eq. (11.27) to obtain the real Langevin equations
and
with
with
Alternatively, with A = re^{−iφ}, this Fokker-Planck equation transforms, using Eqs. (3.27) and (3.28) of Lax (1966IV), to
with
If we use the variables p = r² and φ, we obtain
However, if we are only concerned with radial fluctuations, our answers are independent of φ and thus this last term need not appear. We can now see the meaning of the approximation made in Section 11.5 on phase fluctuations. If we replace p^{-1} in this last term by ⟨p⟩^{-1}, a number, then Eq. (11.96) can be separated into phase and radial motions, Eqs. (11.43) and (11.87), respectively. Therefore, in the region well above threshold, since the amplitude fluctuations are small, the amplitude and phase fluctuations are nearly uncorrelated.
11.8 Eigenfunctions of the Fokker-Planck operator
We shall look for exact answers by considering the eigenfunctions of the Fokker-Planck equation (11.94). We consider solutions of the form
where λ is some integer, to the eigenvalue equation
FIG. 11.3. The line-width Λ_p that includes phase fluctuations versus the dimensionless pump rate p. This figure was first presented in Lax (1967V).
since we then have the correct phase dependence. From Eq. (11.95) the amplitude part becomes We thus need to find the eigenvalues of this second-order differential equation. When λ = 0, our solution has no φ dependence, and we are only looking at radial fluctuations. Λ_{0,0} = 0 corresponds to the steady state. The lowest nonvanishing eigenvalue with λ = 0 is called Λ_a, i.e.
which is appropriate for amplitude noise. On the other hand, when λ = 1, we have a solution proportional to e^{iφ}, which is appropriate for considering ⟨e^{i[φ(t)−φ(0)]}⟩ (see Section 11.5), and thus is called
The numerical solution of Eq. (11.99) and the determination of Λ_a and Λ_p as functions of p is given in Hempstead and Lax (1967CVI), and by Risken and Vollmer (1967). By comparing Eq. (11.98) with Eqs. (11.43), (11.50), and (11.51), we see that Λ_p is the half-width of our spectrum. We plot this half-width Λ_p in Fig. 11.3. Actually, the line shape is a weighted sum of Lorentzians with widths equal to the eigenvalues. The Λ_p plotted in Fig. 11.3 is for the lowest eigenvalue λ = 1, which
FIG. 11.4. The half-width of the amplitude spectrum Λ_a versus the dimensionless pump rate p. The solid (Exact) curve is the line-width associated with fluctuations of the intensity. The short-dashed curve is the quasilinear approximation (QL). The longer-dashed curve is the intelligent quasilinear approximation (IQL). This figure was taken from Lax (1967V).
contains more than 98% of the weight. See Table VI of Hempstead and Lax (1967CVI). Equation (11.50) says Λ_p p = 1. We see that above threshold this is approximately true. Below threshold, Λ_p p → 2. The reason for this behavior is discussed in Lax (1967V). The Schawlow-Townes formula was wrong by a factor of 2 because they were basically deriving their results by linear methods valid below threshold, not valid above threshold. In Fig. 11.4 we plot the half-width of the amplitude spectrum Λ_a versus the dimensionless pump rate p. The exact solution for intensity fluctuations is obtained by solution of the Fokker-Planck equation of this section. The line-width in the QL approximation is obtained from Eq. (11.85). The IQL curve is obtained from Eq. (11.86), with a better approximation for the operating point. Actually, for pure intensity fluctuations the two lowest nonzero eigenvalues are close to degenerate, and it is necessary to plot the appropriately weighted average of all the eigenvalues. We see that when p > 10 or p < −10, i.e., when we are away from threshold, the quasilinear results are very close to the exact ones.
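The discussion of phase fluctuations in Section 11.5 identifies the line-width with the decay rate of ⟨e^{i[φ(t)−φ(0)]}⟩. As a numerical illustration (our own sketch, not the text's derivation; the diffusion constant D and all parameter values are arbitrary choices), a pure phase-diffusion model reproduces the exponential decay e^{−Dt}, whose spectrum is a Lorentzian of half-width D:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 0.5          # phase diffusion constant (assumed value, for illustration)
dt = 0.01        # time step
n_steps = 100    # simulate up to t = 1
n_traj = 20000   # number of independent phase trajectories

# Wiener phase: increments with variance 2*D*dt, so <dphi^2(t)> = 2*D*t
dphi = rng.normal(0.0, np.sqrt(2 * D * dt), size=(n_traj, n_steps))
phi = np.cumsum(dphi, axis=1)

# Ensemble average of exp(i*[phi(t) - phi(0)]) at t = 1
decay = np.mean(np.exp(1j * phi[:, -1]))

t = n_steps * dt
print(abs(decay), np.exp(-D * t))   # both close to e^{-0.5} ≈ 0.607
```

For a Wiener phase with ⟨Δφ²⟩ = 2Dt, the Gaussian average gives ⟨e^{iΔφ}⟩ = e^{−⟨Δφ²⟩/2} = e^{−Dt}; in the text the corresponding decay rate is the half-width Λ_p.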
12 Noise in homogeneous semiconductors
12.1 Density of states and statistics of free carriers
If we have a set of energy states E_j (j = 1, 2, ...), the number of states in the interval E_a < E < E_b can be written as
since each integral yields one or zero according as E_j is or is not in the interval (E_a, E_b). Thus n(E), which can be interpreted as the density of states (DOS), is given by
For electrons quantized in a box of dimensions L_1, L_2 and L_3, with periodic boundary conditions, the eigenstates are plane waves
where the wave vectors take the quantized values:
where l_1, l_2, l_3 are integers. The DOS is thus given by
where E(k) is the energy-wave-vector relationship. For large L_j, the sums can be converted to a triple integral:
Since
In a crystal, the integral in Eq. (12.8) is understood to extend over one Brillouin zone (BZ). One can interpret V/(2π)³ as the density of states in k space. If one introduces the momentum variable p = ħk, this is equivalent to
where so that there is one quantum state for each volume h³ in phase space. Although we have used a box with three perpendicular axes, in a crystal we could have used a box with edges parallel to the vectors a_1, a_2, a_3 of the primitive cell. The sum would then be over cell positions. The integral, Eq. (12.8), over k could become an integral over the reciprocal lattice. The integral would still extend over one Brillouin zone whose shape takes that of one cell of the reciprocal lattice. Periodicity permits a rearrangement so that the integral is over a Brillouin zone with the full symmetry of the lattice. General results, such as Eqs. (12.8) and (12.9), remain valid. Near the bottom of the conduction band in a semiconductor there is an approximate effective mass relationship of the form
whose surfaces of constant energy are ellipsoids, with E_c the energy at the conduction band edge. The density of states per unit volume is given by
If we let we get
where
is the corresponding density of states for the isotropic case. It is convenient to define
as the DOS mass, since this choice produces no correction factor. We can now omit the prime in Eq. (12.15) and write
where dΩ = sin θ dθ dφ, the solid angle, can be integrated over to yield a factor 4π.
Free electrons at equilibrium
In a homogeneous sample, the density n of free electrons can be written in the form:
Here, D_c(E) represents the density of electronic states per unit energy, and f(E) is the probability that any one of these states is occupied. The energy E_c is the energy at the bottom of the conduction band. If the conduction band is isotropic near its minimum, the energy takes the simple form:
Here m* is the effective mass of electrons in the conduction band. In this case, the density of states is shown to take the simple form:
and the probability of occupancy is given by the Fermi function:
Here, ζ = E_F is the Fermi energy. We avoid the customary symbol μ, since the latter is used for mobility in this chapter. The last form in Eq. (12.20) is appropriate to the nondegenerate case. (Nondegenerate means that the density is sufficiently low that Boltzmann statistics are adequate.)
Free holes at equilibrium
Holes are simply empty electron states. Their statistics can be written in a completely analogous manner. The density of holes, called p, is given by:
Here D_v(E) represents the density of states in the valence band and is given by:
As before, we have assumed that the states near the valence band edge (now the top of the valence band) obey a simple effective mass relationship:
The probability of a hole is the probability that the corresponding electron state is empty:
Again, the last form is valid only in the nondegenerate limit. For the above densities of states, in the Boltzmann approximation, the integrals can be performed, and we obtain the densities:
where the coefficients are given by
and represent effective numbers of states at the band edges that correspond to Boltzmann occupancy of a distributed set of states. The letters n and p are presumably used to correspond to the mnemonic n for negative and p for positive. Note that the expressions, Eqs. (12.25) and (12.26), for the densities are correct without specifying how the Fermi level ζ is to be determined.
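As a numerical illustration of the effective numbers of states just defined, one can evaluate N_c = 2(2πm_d kT/h²)^{3/2} directly. The sketch below is ours; the DOS masses and band gap are illustrative, silicon-like values, not taken from the text:

```python
import math

# Physical constants (SI)
k = 1.380649e-23      # Boltzmann constant, J/K
h = 6.62607015e-34    # Planck constant, J*s
m0 = 9.1093837e-31    # free electron mass, kg

def effective_dos(m_d, T):
    """Effective number of states per unit volume, 2*(2*pi*m_d*k*T/h^2)^(3/2)."""
    return 2.0 * (2.0 * math.pi * m_d * k * T / h**2) ** 1.5

T = 300.0
Nc = effective_dos(1.08 * m0, T)   # illustrative DOS mass for electrons
Nv = effective_dos(0.81 * m0, T)   # illustrative DOS mass for holes

# Law of mass action, Eq. (12.29): n*p = Nc*Nv*exp(-Eg/kT), independent of zeta
Eg = 1.12 * 1.602176634e-19        # illustrative band gap, J
ni = math.sqrt(Nc * Nv) * math.exp(-Eg / (2 * k * T))

print(f"Nc = {Nc:.2e} m^-3, Nv = {Nv:.2e} m^-3, ni = {ni:.2e} m^-3")
```

For m_d = m_0 and T = 300 K this formula gives N_c ≈ 2.5 × 10^25 m^-3 (2.5 × 10^19 cm^-3), the familiar textbook value; the last line uses the law of mass action, Eq. (12.29), discussed below.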
Law of mass action
The Fermi energy appears with opposite signs in Eqs. (12.25) and (12.26). Hence it cancels out of the product. The result,
is an example of the law of mass action, with E_g representing the energy gap between the conduction and valence bands. If donors or acceptors are present, then the Fermi level is shifted, but Eq. (12.29) is unaffected. If N_D, the donor density, is a function of x, then ζ, n, and p will be functions of x, but the product n(x)p(x) will be a constant. This law is a special case of the law of chemical equilibrium. With the above preliminary knowledge in hand, we now concentrate on the calculation of noise in semiconductors.
12.2 Conductivity fluctuations
Information about carrier number fluctuations can be obtained by injecting a constant current and measuring the voltage fluctuations induced by the conductivity modulation caused by carrier concentration fluctuations. The admittance can be written
where μ_p and μ_n are the hole and electron mobilities, p and n are the hole and electron concentrations, and P and N are the total hole and electron numbers in the volume AL between the electrodes, of area A and separation L. Thus the fractional voltage fluctuations are given by
If only electrons and holes are present (and not traps), charge neutrality will be enforced up to the (very high) dielectric relaxation frequency, so that to an excellent approximation Thus the voltage autocorrelation is given by
where
We define an "after-effect function" Φ(t) by
The voltage noise is given by
and the total voltage fluctuation may be obtained by replacing the integral by unity. The total noise, which involves only Φ(0) = 1, is consistent with the normalization condition on the noise spectrum. The after-effect function will be calculated in later sections.
12.3 Thermodynamic treatment of carrier fluctuations
In the equilibrium case, ⟨(ΔP)²⟩ = ⟨(ΔN)²⟩ can be determined by the thermodynamic formula. We shall first consider a set of free electrons in the nondegenerate case, when Boltzmann (rather than Fermi) statistics are applicable. Then Shockley (1950) has shown that the total number of electrons obeys
where E_c is the energy at the bottom of the conduction band and N_c is a temperature-dependent effective density of states. The conventional symbol ζ is used to represent the electron Fermi level or chemical potential. For fluctuations from the equilibrium state, the second moments are known thermodynamically (Greene and Callen, 1952):
When regarded as a thermodynamic quantity, it is customary to rewrite ⟨A⟩ simply as A. Here, F_B is the force "conjugate" to the variable B, in the sense that the negative of the pressure, −P, is the conjugate of the volume V. (The negative sign is necessary since pressure decreases volume rather than increasing it.) The thermodynamic formula, Eq. (12.37), then yields
This result is not entirely surprising, since the total number of carriers is an integral
Indeed, Eq. (12.38) can be derived directly from Eq. (12.39) using only the assumption that the r_j are uniformly distributed in space.
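The derivation just mentioned can be mimicked numerically: if carriers are placed independently and uniformly in space, the number falling in any subvolume is binomial, and its variance equals its mean (up to the small factor 1 − f), reproducing ⟨(ΔN)²⟩ = N. The sketch below is our own; the carrier count and subvolume fraction are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

M = 100_000      # total carriers in the full volume (arbitrary)
f = 0.01         # fraction of the volume occupied by the probed region
trials = 2000    # independent placements of all carriers

# Place carriers uniformly; count how many land in the subvolume [0, f)
counts = np.array([np.count_nonzero(rng.random(M) < f) for _ in range(trials)])

mean = counts.mean()
var = counts.var()
print(mean, var)   # variance ≈ mean ≈ M*f = 1000 (exactly, var = M*f*(1-f))
```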
A less obvious case is that of a set of N_t traps interacting with a reservoir of chemical potential ζ_t. We assume that the trap occupancy is sufficiently high that Fermi statistics are necessary. In that case, the number N_f of filled traps, by use of Fermi-Dirac statistics, is
In that case, the thermodynamic formula, Eq. (12.38), becomes
It can be seen that the fluctuations are reduced by a factor equal to the fraction of empty states. The reason for this result is made clear in the next section, in which a kinetic approach is used for the same problem. If both N_f and N are allowed to vary simultaneously, the simplest distribution consistent with these second moments is
The term in ΔN_f ΔN vanishes because N_f does not depend on ζ and N does not depend on ζ_t. Within the quasilinear approximation, it is appropriate to ignore cumulants higher than the second and stop at the Gaussian approximation. Suppose, now, that the electrons in traps do not have an independent reservoir, but are obtained from the free carriers. Then we must impose the conservation condition If this constraint is inserted into Eq. (12.38), we obtain a Gaussian in a single variable with the second moment
The situation for holes is similar to that for electrons. If the holes have their own reservoir, then the typical Poisson process prevails
If holes, traps and electrons are all present and coupled to each other, then charge neutrality imposes the constraint
In the presence of compensating centers, Nco, there is also a neutrality condition for the steady state
12.4 General theory of concentration fluctuations
The thermodynamic discussion of occupancy fluctuations can be generalized by noting that the average occupancy of a state of energy E is given by where β = 1/kT and, for Fermi, Boltzmann, and Bose particles,
For application to a particular state a, replace n by n(a) and E by E(a). The fluctuation in occupancy of that state is given by the thermodynamic formula, Eq. (12.37), or the first part of Eq. (12.38), to be
which includes all three statistical cases, Fermi, Boltzmann and Bose, with the three choices of ε above. This result is true in equilibrium.
Fluctuations for the nonequilibrium steady state
We have also established the truth of Eq. (12.50) for the nonequilibrium steady state in Lax (1960I) by explicitly constructing a model in which there are transition probabilities for the transfer of particles between states. Assume W_{a'a} represents the transition probability for a "collision" which carries the particle from state a to state a'. Taking account of the Pauli principle for electrons and holes in semiconductors, the transition rate W_{a'a} is replaced by The master transition probability from occupation numbers n(a) to n'(a) can be written as
We now perform the calculation, for the bth state, of the first moment of the transition probability defined by Eq. (8.10):
If one inserts Eq. (12.52) and sums first over n', the only terms in the sum which contribute are those for which a = b or a' = b,
Therefore, from Eqs. (8.22) and (12.51),
In a steady state, one is tempted to make the terms on the right hand side of Eq. (12.55) cancel in pairs by detailed balance:
However, if one requires that three states a,b,c be consistent with one another under this requirement, one finds that
must satisfy the consistency requirement f(c, a) = f(c, b) f(b, a), a functional equation whose only solution is of the form
Thus if the ratio of forward to reverse transition probabilities has the form Eq. (12.58), then the steady state solution has the form
or Even in the nonequilibrium case, we can choose to define a quasi-Fermi level λ = exp(ζ/kT). In the thermal equilibrium case, g(a) = exp(−E(a)/kT), which leads to the conventional equilibrium result, Eq. (12.48).
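The unified result, Eq. (12.50), can be checked against the thermodynamic formula ⟨(Δn)²⟩ = kT ∂n/∂ζ by direct numerical differentiation. In this sketch (ours; we assume the sign convention n(E) = 1/[e^{β(E−ζ)} + ε] with ε = +1 for Fermi, 0 for Boltzmann, −1 for Bose), the derivative reproduces n(1 − εn) in all three cases:

```python
import math

def occupancy(E, zeta, beta, eps):
    """Mean occupancy 1/(exp(beta*(E - zeta)) + eps).
    eps = +1 Fermi, 0 Boltzmann, -1 Bose (assumed sign convention)."""
    return 1.0 / (math.exp(beta * (E - zeta)) + eps)

def variance_thermo(E, zeta, beta, eps, d=1e-6):
    """kT * dn/dzeta, evaluated by central difference."""
    return (occupancy(E, zeta + d, beta, eps)
            - occupancy(E, zeta - d, beta, eps)) / (2 * d * beta)

beta, E, zeta = 1.0, 1.0, 0.3   # arbitrary test point with E > zeta (Bose-safe)
for eps in (+1, 0, -1):
    n = occupancy(E, zeta, beta, eps)
    print(eps, variance_thermo(E, zeta, beta, eps), n * (1 - eps * n))
```

With this convention the Fermi case gives n(1 − n), the Bose case n(1 + n), and the Boltzmann case n, as in the text.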
In general one obtains a steady state in which neither equilibrium nor detailed balance occurs. In any case, for small deviations from a steady state, we set
and rewrite Eq. (12.55) in the form
where
The equation for the second moments, according to Eq. (8.23), is
By using the definition Eq. (12.52), one obtains
for c ≠ b. The two terms in D_{bc} are equal under detailed balance but not otherwise. For c = b, we obtain
The steady state second moments ⟨Δn(b)Δn(c)⟩ are now chosen so that the right-hand side of Eq. (12.64) vanishes, i.e., so that the Einstein relation is obeyed. Assuming that there is no correlation between fluctuations in different states, we
try a solution of the form and we find that the ansatz satisfies the Einstein relation of Eq. (12.64), provided the steady state obeys detailed balance, Eq. (12.56). Using Eq. (12.59), we have which leads to The total number of systems is N = Σ_c n(c), which yields a formula similar to the thermodynamic case, Eq. (12.38). In our model, however, the total number N should be fixed, and we need to enforce this constraint. The solution, Eq. (12.68), that we found is only a particular solution, and to it can be added a solution of the homogeneous equation
For the Boltzmann case (ε = 0), the solution Eq. (12.68) is replaced by which obeys the constraint ⟨[Σ_a Δn(a)]Δn(c)⟩ = 0. For the Fermi and Bose cases, we have
The added term is of order 1/N and therefore is unimportant in calculating the fluctuations in any small portion of a large system. However, this term does affect fluctuations in appreciable parts of a system. Equation (12.50) can be readily applied to the case in which n(1) = N, the number of conduction electrons, n(2) = N_f, the number of trapped electrons, and n(3) = N_v − P, the number of electrons in the valence band, where N_v is the number of valence band states and P is the number of holes. Thus
We have assumed nondegeneracy for the holes and the free electrons, but not for the trapped electrons. Since Δn(3) = −ΔP, we can write the second moments in
the form
These results are consistent with the constraint ΔP = ΔN + ΔN_f of charge neutrality. If the number of traps is zero, they reduce to ΔP = ΔN and
12.5 Influence of drift and diffusion on modulation noise
To concentrate on the influence of drift and diffusion on density fluctuations and modulation noise, let us return to the trap-free case discussed in Section 12.3, where ΔN = ΔP. The spectrum of voltage noise already given in Eq. (12.35) can be rewritten as a product
of the total noise
and the normalized spectrum
where
Note that this normalization is four times that used for g(ω) in Lax and Mengert (1960). For simplicity, we confine ourselves to a one-dimensional geometry, as
was done by Hill and van Vliet (1958), and calculate the total hole fluctuation as
We can now apply our techniques to continuous parameter systems by replacing the sum
by the integral
Introducing a more convenient notation for the Green's function
we can write
so that the correlation at two times is, as usual, related to the pair correlation at the initial time
It is customary to treat fluctuations at the same time at two places as uncorrelated. This is clearly the case for independent carriers. It is less obvious when Coulomb attractions (say between electrons and holes) are included. It was shown, however, in Appendix C of Lax and Mengert (1960) that a delta function correlation is valid, as long as we are dealing with distances greater than the screening radius. Thus we can take
where the coefficient of the delta function is chosen so that the fluctuation in the total number of carriers, ⟨(ΔP)²⟩, is given correctly by Eq. (12.82). Here L is the distance between the electrodes.
The definition, Eq. (12.34), of Φ(t) yields the expression
If the Green's function is defined appropriately, as in Lax (1960I), to vanish for t < 0, it will obey an equation of the form
where the operator Λ is defined, in the continuous variable case, by
Here, v and D are the bipolar drift velocity and diffusion constant found by van Roosbroeck (1953) to describe the coupled motion of electrons and holes while maintaining charge neutrality
where the individual diffusion constants and mobilities are related by the Einstein relation. Equation (12.95) for the Green's function can be solved by a Fourier transform method where Here Λ(k) are the eigenvalues of the Λ operator With Eq. (12.99) for k, the after-effect function can be calculated from Eq. (12.94)
where z = kL/2. Thus the spectrum, Eq. (12.85), is
where Λ has been re-expressed as a function of z. Lax and Mengert (1960) provide an exact evaluation of this integral. However, the resulting expressions are complicated. It is therefore worthwhile to treat some
limiting cases. For example, if there is no diffusion, then
and the after-effect function is given by
where T_a = L/v is the transit time, and the spectrum is governed by a windowing factor W, with the window factor given by
Indeed, the current noise, in this special case, can be written in the form given by Hill and van Vliet (1958)
which emphasizes the similarity to shot noise. The equivalent current is defined by
The window factor still takes the complicated form
where γ = 1/τ. Even this result is complicated to understand. If we take the limiting case in which recombination is unimportant over the transit time, the result simplifies to
a windowing factor similar to that associated with the effect of transit time on shot noise.
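The drift-only transit window can also be seen in a direct simulation. The following sketch is our own construction and neglects recombination (so the decay factor in τ is absent): carriers start uniformly distributed between the electrodes and drift out at speed v, and the surviving fraction, which plays the role of the after-effect function, falls linearly as 1 − t/T_a:

```python
import numpy as np

rng = np.random.default_rng(2)

L = 1.0          # electrode separation (arbitrary units)
v = 1.0          # drift velocity, so the transit time is Ta = L/v = 1
Ta = L / v
M = 200_000      # number of simulated carriers

x0 = rng.random(M) * L           # uniform initial positions between electrodes

for t in (0.25, 0.5, 0.75):
    surviving = np.count_nonzero(x0 + v * t < L) / M
    print(t, surviving, 1 - t / Ta)   # simulated vs. analytic 1 - t/Ta
```

The Fourier transform of this triangular after-effect function is a sinc²-type window, which is the connection to transit-time-limited shot noise noted in the text.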
In the opposite limit, in which diffusion is retained but drift is neglected, the exact result for the spectrum is given by
where
and is the reciprocal of the diffusion length. The exponential term represents an interference term between the two boundaries that is usually negligible, since they are separated by substantially more than a diffusion length. A simple approximate form over intermediate frequencies is
In summary, in addition to the first term, which represents the volume noise easily computed just by using the total carrier number P(t), the term proportional to an inverse frequency to the three-halves power arises from diffusion across the boundary at the electrodes.
13
Random walk of light in turbid media
Light propagation in a multiple-scattering (turbid) medium such as the atmosphere, colloidal suspensions and biological tissue is commonly treated by the theory of radiative transfer; see, for example, Chandrasekhar (1960). Recent advances in ultrafast lasers and photon detectors for biomedical imaging and diagnostics have revitalized the interest in radiative transfer (Alfano 1994; Yodh et al. 1997; Gandjbakhche 1999). The basic equation of radiative transfer is the elastic Boltzmann equation, a nonseparable integro-differential equation of first order for which, as far as the authors know, an exact closed-form solution is not known except for the case of isotropic scatterers (Hauge 1974). Solutions are often based on truncation of the spherical harmonics expansion of the photon distribution function, or resort to numerical calculation, including Monte Carlo simulations (Ishimaru 1978; Cercignani 1988). In this chapter, we shall treat light propagation in turbid media as a random walk of photons and determine the characteristics of light propagation (center position and diffusion coefficients) directly from an analysis of the random walk performed by photons in the turbid medium (Xu et al. 2004). In the next chapter, a more advanced approach, solving the elastic Boltzmann equation by a cumulant expansion of the photon distribution function, will be presented.
13.1 Introduction
Clouds, sea water, milk, paint and tissues are some examples of turbid media. A turbid medium scatters light strongly. Visible light shed on one side of a cup of milk is much weaker and diffuse when observed on the other side of the cup, because light is highly scattered in milk while the absorption of light by milk is very low. The scattering and absorption properties of a turbid medium are described by the scattering and absorption coefficients μ_s and μ_a, respectively. Their values depend on the number density of scatterers (absorbers) in the medium and the cross-section for scattering (absorption) of each individual scatterer (absorber). For a collimated beam of intensity I_0 incident at the origin and propagating along the z direction inside a uniform turbid medium, the light intensity in the forward direction at
position z is attenuated according to Beer's law: where μ_T = μ_s + μ_a is the total attenuation coefficient. The portion of light propagating in the exact forward direction is usually called "ballistic light". The reduction of the intensity of ballistic light comes from the scattering of light into other directions (called "multiply scattered light" or "diffusive light") and light absorption in the medium. Inside the turbid medium, ballistic light decays exponentially, and only multiply-scattered light survives over some distance away from the incident light source. The theory that treats the propagation of multiply scattered light in a turbid medium is the theory of radiative transfer (Chandrasekhar 1960). Due to the difficulty of solving the elastic Boltzmann equation which governs radiative transfer, in particular in a bounded volume, solutions are often based on truncation of the spherical harmonics expansion of the photon distribution function, or resort to numerical calculation, including Monte Carlo simulations (Ishimaru 1978; Cercignani 1988). Monte Carlo methods treat photon migration as a Markov stochastic process. The solution to the elastic Boltzmann equation is equivalent to the probability of finding a photon at any specified location, direction and time in the Monte Carlo simulation. The advantage of the Monte Carlo method is that it can easily handle, at least in principle, a bounded region, different boundary conditions and/or heterogeneity of the medium. However, Monte Carlo methods become computationally prohibitive when the size of the sampling volume becomes large. In the stochastic picture of photon migration in turbid media, photons take a random walk in the medium and may get scattered or absorbed according to the scattering coefficient μ_s and the absorption coefficient μ_a of the medium. A phase function, P(s, s'), describes the probability of scattering a photon from direction s to s'.
The free path (step-size) between consecutive events (either scattering or absorption) has an exponential distribution μ_T exp(−μ_T ℓ), characterized by the total attenuation coefficient μ_T. At an event, photon scattering takes place with probability μ_s/μ_T (the albedo) and absorption with probability μ_a/μ_T. This picture forms the basis for the Monte Carlo simulation of photon migration. Here we shall use this simple picture of a Markov stochastic process of photons to compute analytically macroscopic quantities such as the average central position and half-width of the photon distribution. The idea is to first analyze the microscopic statistics of the photon propagation direction in the direction space, which is solely determined by the phase function and the incident direction of light. The connection between the microscopic statistics and the macroscopic quantities at any specified time and position is made by a "bridge", a generalized Poisson distribution p_n(t), the probability that a photon has endured exactly n scattering events before time t. In this book, we will restrict our discussion to light propagation in an isotropic turbid medium, where the property of light scattering depends
FIG. 13.1. A photon moving along n is scattered to n' with a scattering angle θ and an azimuthal angle φ in a photon coordinate system xyz whose z-axis coincides with the photon's propagation direction prior to scattering. XYZ is the laboratory coordinate system.
on the angle between s and s' rather than on the directions themselves, and the phase function can be written in the form P(s · s').
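The step-size and albedo rules described in this section translate directly into a few lines of simulation. This sketch is ours, with arbitrary illustrative values of μ_s and μ_a: exponential free paths at rate μ_T give a mean free path 1/μ_T, and the probability of surviving n successive events is the albedo raised to the nth power:

```python
import numpy as np

rng = np.random.default_rng(3)

mu_s, mu_a = 9.0, 1.0          # scattering and absorption coefficients (illustrative)
mu_t = mu_s + mu_a             # total attenuation coefficient
albedo = mu_s / mu_t

n_photons = 100_000

# Free paths between events: exponential with rate mu_t
steps = rng.exponential(1.0 / mu_t, size=n_photons)
print(steps.mean())            # ≈ 1/mu_t = 0.1

# At each event the photon scatters with probability mu_s/mu_t (the albedo)
survive_3 = np.count_nonzero(rng.random((n_photons, 3)) < albedo, axis=1) == 3
print(survive_3.mean())        # fraction surviving 3 events ≈ albedo**3 = 0.729
```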
13.2 Microscopic statistics in the direction space
Denote the position, direction and step-size of a photon after the ith scattering event as x^(i), s^(i) and ℓ^(i), respectively. The initial condition is x^(0) = (0, 0, 0) for the starting point and s^(0) = s_0 = (0, 0, 1) for the incident direction. The laboratory Cartesian components of x^(i) and s^(i) are x_α and s_α (α = 1, 2, 3). The photon is incident at time t_0 = 0. For simplicity, the speed of light is taken as the unit of speed and the mean free path μ_T^{-1} as the unit of length. The scattering of photons takes a simple form in an orthonormal coordinate system attached to the moving photon itself, where n is the photon's propagation direction prior to scattering and m is an arbitrary unit vector not parallel to n (see Fig. 13.1). The distribution of the scattering angle θ ∈ [0, π] is given by the phase function of the medium, and the azimuthal angle φ is uniformly distributed over [0, 2π). For one realization of the scattering event with angles (θ, φ) in the photon coordinate system, the outgoing propagation direction n' of the photon will be:
FIG. 13.2. The average photon propagation direction (vector) decreases as g^n, where g is the anisotropy factor and n is the number of scattering events.
The freedom of choice of the unit vector m reflects the arbitrariness of the xy axes of the photon coordinate system. For example, taking m = (0, 0, 1), Eq. (13.2) gives
Here s_α^(i), etc., are stated in the laboratory coordinate system. The ensemble average of the propagation direction over all possible realizations of (θ, φ), and then over all possible s^(i) in Eq. (13.3), turns out to be ⟨s^(i+1)⟩ = ⟨s^(i)⟩⟨cos θ⟩, because θ and φ are independent and ⟨cos φ⟩ = ⟨sin φ⟩ = 0. By recursion,
where g = ⟨cos θ⟩ = 1 - g₁ is the anisotropy factor (see Fig. 13.2).
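The geometric decay ⟨s^(n)⟩ = g^n ẑ is easy to check by direct simulation. The sketch below is illustrative only: it assumes a Henyey-Greenstein phase function (whose anisotropy ⟨cos θ⟩ equals g) to sample the scattering angle, builds a photon-frame like the (m, n) construction above, scatters a population of unit vectors n times, and compares the mean z component with g^n.

```python
import numpy as np

def scatter(s, g, rng):
    """One scattering event applied to an array of unit direction vectors.

    cos(theta) is sampled from the Henyey-Greenstein phase function (an
    assumed example whose anisotropy <cos theta> equals g); the azimuth
    phi is uniform on [0, 2*pi). The frame (e1, e2, s) plays the role of
    the photon coordinate system built from m and n in the text."""
    n = len(s)
    u = rng.random(n)
    ct = (1 + g**2 - ((1 - g**2) / (1 - g + 2 * g * u))**2) / (2 * g)
    st = np.sqrt(np.maximum(0.0, 1 - ct**2))
    phi = 2 * np.pi * rng.random(n)
    m = np.zeros_like(s)
    m[:, 0] = 1.0                                # arbitrary m not parallel to s
    m[np.abs(s[:, 0]) > 0.9] = [0.0, 1.0, 0.0]   # switch m where s is near x
    e1 = np.cross(m, s)
    e1 /= np.linalg.norm(e1, axis=1, keepdims=True)
    e2 = np.cross(s, e1)
    return (st * np.cos(phi))[:, None] * e1 \
         + (st * np.sin(phi))[:, None] * e2 + ct[:, None] * s

rng = np.random.default_rng(1)
g, nphot, nscat = 0.8, 200_000, 10
s = np.tile([0.0, 0.0, 1.0], (nphot, 1))
for _ in range(nscat):
    s = scatter(s, g, rng)
print(abs(s[:, 2].mean() - g**nscat) < 0.01)   # <s_z> after n scatters is g^n
```

The same simulation also shows the isotropization discussed below: after many scattering events each squared component averages to 1/3.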
By squaring the third equation in Eq. (13.3) and then taking an ensemble average, we find
where s₂ = ½⟨sin² θ⟩, since ⟨cos φ⟩ = 0 and ⟨sin² φ⟩ = ⟨cos² φ⟩ = ½. By forming a product from the first and third equations in Eq. (13.3) and then taking an ensemble average, we find
Similar equalities are obtained for the x and y components, as the labels are rotated, owing to the symmetry among the x, y, z directions. The correlations between the propagation directions are hence given by
On the other hand, the correlation between s_α^(j) and s_α^(i) (j > i) can be reduced to a correlation of the form of Eq. (13.7) by the following observation
where p(s^(j)|s^(i)) means the conditional probability that a photon jumps from s^(i) at the ith step to s^(j) at the jth step. Equation (13.8) is a result of the Chapman–Kolmogorov condition (2.17), p(s^(j)|s^(i)) = ∫ ds^(j-1) p(s^(j)|s^(j-1)) p(s^(j-1)|s^(i)), of the Markovian process and of the fact that ∫ ds^(j) s_α^(j) p(s^(j)|s^(j-1)) = g s_α^(j-1) from Eq. (13.3). Combining Eqs. (13.7) and (13.8), and using the initial condition of s^(0), that is, we conclude
where the constants f₁ = f₂ = -1 and f₃ = 2. Here we see that the autocorrelation of the x, y, or z component of the photon propagation direction approaches 1/3, i.e., scattering uniformly in all directions, after a sufficiently large number of scatterings (α = β and j = i → ∞), and the cross-correlation between them is always zero (α ≠ β). 13.3
The generalized Poisson distribution pn(t)
The connection between the macroscopic physical quantities describing the photon distribution and the microscopic statistics of the photon propagation direction is made by the probability, p_n(t), that the photon has undergone exactly n scattering events before time t (the (n + 1)th event coming at t). We claim that p_n(t) obeys the generalized Poisson distribution. This claim was previously proved by Wang and Jacques (1994):
which is the Poisson distribution for the number of scattering events, with expected rate of occurrence μ_s, multiplied by an exponential decay factor due to absorption. Here we have used μ_s⁻¹ = 1 as the unit of length. This form of p_n(t) can be easily verified by recognizing first that p₀(t) = exp(-t) equals the probability that the photon experiences no events within time t (and the first event occurs at t); and second that the probability p_{n+1}(t) is given by
in which the first event, occurring at t′, is a scattering, followed by n scattering events up to but not including time t; this confirms Eq. (13.11) at n + 1 if Eq. (13.11) is valid at n. The total probability of finding a photon at time t
decreases with time due to the annihilation of photons by absorption. 13.4
Macroscopic statistics
The average propagation direction ⟨s(t)⟩ at time t is then
Plugging Eqs. (13.4) and (13.11) into Eq. (13.14), we obtain

⟨s(t)⟩ = ẑ exp(-μ_s g₁ t) = ẑ exp(-t/l_t).    (13.15)
Here l_t = μ_s⁻¹/(1 - g) is usually called the transport mean free path, which is the randomization distance of the photon propagation direction. The first moment of the photon density with respect to position is thus
revealing that the center of the photon cloud moves along the incident direction for one transport mean free path l_t before it stops (see Fig. 13.3). The second moment of the photon density is calculated as follows. Denote by p(s₂, t₂|s₁, t₁) the conditional probability that a photon jumps from a propagation direction s₁ at time t₁ to a propagation direction s₂ at time t₂ (t₂ > t₁ > 0). The conditional correlation of the photon propagation direction subject to the initial condition is given by
Denote the numbers of scattering events encountered by the photon at states (s₁, t₁) and (s₂, t₂) as n₁ and n₂, respectively. Here n₂ ≥ n₁ since the photon jumps from (s₁, t₁) to (s₂, t₂). Equation (13.17) can be rewritten as
The evaluation of the denominator in Eq. (13.18) is simple and is given by Σ_{n₂} p_{n₂}(t₂) = exp(-μ_a t₂). To evaluate the numerator in Eq. (13.18), we proceed
as follows:
where C_{n₂}^{n₁} = n₂!/[(n₂ - n₁)! n₁!] and we have repeatedly used the binomial expansion (a + b)^n = Σ_{m=0}^{n} C_n^m a^m b^{n-m}.
This exact result for ⟨s_β(t₂) s_α(t₁)⟩ can easily be verified to agree with the regression theorem discussed in Chapter 8. The second moment of the position is then
The diffusion coefficient is obtained from
after integration. Our main result, Eqs. (13.16) and (13.22), agrees with Eqs. (14.31)–(14.33) in Chapter 14, derived by the cumulant expansion. The general form of the photon distribution depends on all moments of the distribution. However, after a sufficiently large number of scattering events have taken place, the photon distribution approaches a Gaussian distribution over space according to the central limit theorem (Kendall 1999). This asymptotic Gaussian distribution, characterized by its center position and half-width (2Dt), is then
where the normalizing factor is C(t) = exp(-μ_a t) owing to Eq. (13.13). This provides a "proper" diffusion solution to radiative transfer, revealing that photons migrate with a center that advances in time and with an ellipsoidal contour that grows and changes shape (see Fig. 13.3). It is also worth mentioning that the absorption coefficient appears in the generalized Poisson distribution p_n(t) only through an exponential decay factor exp(-μ_a t). This exponential factor cancels in the evaluation of the conditional moments of the photon distribution, see Eqs. (13.17) and (13.18). Hence, the sole role played by absorption is to annihilate photons; it affects neither the shape of the distribution function nor the diffusion coefficient (Durduran et al. 1997; Cai et al. 2002). The results, except for the Gaussian photon distribution Eq. (13.23), are exact under the sole assumption of a Markov random process of photon migration. A deviation from a Poisson distribution of scattering or absorption events can be dealt with by modifying p_n(t). A Markov random process is usually a good description of scattering due to short-range forces, such as photon migration in turbid media. In situations where the interference of light is appreciable, the phase of a photon, which depends on its full past history, must be considered, and this is non-Markovian. Non-Markov processes may also occur in scattering involving long-range forces, such as the Coulomb interaction between charged particles, in which many-body effects cannot be ignored.
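Two facts used above, that the total probability Eq. (13.13) equals exp(-μ_a t) and that the absorption factor cancels in conditional ratios such as Eq. (13.18), can be checked numerically. A small sketch (assuming the explicit form p_n(t) = (t^n/n!) e^{-t} e^{-μ_a t} in units μ_s⁻¹ = 1, which is our reading of Eq. (13.11)):

```python
from math import exp, factorial

def p_n(n, t, mu_a):
    """Generalized Poisson weight; assumed form (t^n / n!) e^{-t} e^{-mu_a t}
    in units where the scattering mean free path (and rate) equals 1."""
    return t**n / factorial(n) * exp(-(1.0 + mu_a) * t)

t, mu_a = 3.0, 0.2
terms = [p_n(n, t, mu_a) for n in range(60)]
total = sum(terms)
print(abs(total - exp(-mu_a * t)) < 1e-12)   # Eq. (13.13): sum is exp(-mu_a t)

# The factor exp(-mu_a t) cancels in conditional ratios such as Eq. (13.18):
# e.g. the conditional mean number of scatterings does not depend on mu_a.
mean_n = sum(n * w for n, w in enumerate(terms)) / total
terms0 = [p_n(n, t, 0.0) for n in range(60)]
mean_n0 = sum(n * w for n, w in enumerate(terms0)) / sum(terms0)
print(abs(mean_n - mean_n0) < 1e-9)
```

The conditional mean number of scatterings comes out equal to t, as expected for a Poisson process with unit rate, regardless of μ_a.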
FIG. 13.3. The center of a photon cloud approaches l_t along the incident direction and the diffusion coefficient approaches l_t/3 with increasing time.

We should finally point out that this treatment is for a scalar photon. Light is a vector wave. The vector nature produces some intriguing effects in multiply scattered light, including polarization memory effects, where the light polarization is preserved over distances at which the light is already diffusing. A scattering matrix, as opposed to the scalar phase function, needs to be used to describe polarized light scattering in turbid media. Nevertheless, the simple picture of a random walk of light can be generalized to treat propagation and depolarization of polarized light in turbid media. Characteristic lengths governing depolarization of multiply scattered light can be determined analytically and explain the observed memory effects. The interested reader may refer to Xu and Alfano (2005, 2006) and references therein.
14
Analytical solution of the elastic transport equation
14.1
Introduction
An example of a random process is the propagation of a particle (or photon, or acoustic wave) in a turbid medium, where particles suffer multiple scattering by randomly distributed scatterers. The kinetic equation governing the particle's propagation is the classic Boltzmann transport equation, which is also called the radiative transfer equation in the case of light propagation. The search for an analytical solution of the time-dependent elastic Boltzmann transport equation has lasted for many years. Besides being considered a classic problem in fundamental research in statistical dynamics, a novel approach to an analytical solution of this equation may have applications in a broad variety of fields. The common approaches to solving this equation are as follows. Based on the angular moment expansion with a cut-off at a certain order, the Boltzmann transport equation is transformed into a series of moment equations. In lowest order, a diffusion equation is derived and its analytical solution in an infinite uniform medium is obtained. The diffusion approximation fails at early times when the particle distribution is still highly anisotropic. The solutions of the diffusion equation or the telegrapher's equation do not produce the correct ballistic limit of particle propagation. Numerical approaches, including Monte Carlo simulation, are the main tools for solving the elastic Boltzmann equation, and these are cumbersome tasks. In this chapter we seek an analytical solution of the classic elastic Boltzmann transport equation in an infinite uniform medium, with the particle's velocity v = v s, where s is a unit vector of direction and v is the (constant) speed in the medium. We assume that the phase function, P(s, s₀), depends only on the scattering angle: P(s, s₀) = P(s · s₀).
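A standard concrete example of such a phase function (our illustrative choice, not one mandated by the text) is the Henyey-Greenstein function P(s · s₀) = (1 - g²)/[4π(1 + g² - 2g s · s₀)^{3/2}]. A quick quadrature check that it is normalized over the sphere and has anisotropy ⟨cos θ⟩ = g:

```python
import numpy as np

def hg(mu, g):
    """Henyey-Greenstein phase function P(cos theta), an assumed example."""
    return (1 - g**2) / (4 * np.pi * (1 + g**2 - 2 * g * mu)**1.5)

g = 0.9
mu, w = np.polynomial.legendre.leggauss(200)   # Gauss-Legendre nodes on [-1, 1]
norm = 2 * np.pi * np.sum(w * hg(mu, g))       # integral of P over the sphere
mean_cos = 2 * np.pi * np.sum(w * mu * hg(mu, g))
print(abs(norm - 1.0) < 1e-8, abs(mean_cos - g) < 1e-8)
```

Many quadrature nodes are used because the g = 0.9 function is sharply forward-peaked near cos θ = 1.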
Under this assumption, we can handle an arbitrary phase function and obtain the particle distribution, f(r, s, t), as a function of position r, angle s and time t, as well as the particle density distribution, N(r, t). Our approach is as follows. First, the exact expression of the total angular distribution, F(s, t), as a function of time in an infinite uniform medium is derived. Based on this angular distribution, we derive exact spatial cumulants of f(r, s, t) and N(r, t) up to arbitrarily high order at any angle and time (Cai, Lax, and Alfano 2000b). With a cut-off at second order of the cumulants, f(r, s, t) and N(r, t) can be expressed by Gaussian distributions (Cai, Lax, and Alfano 2000a) which have exact first cumulant (the position of the center of the distribution) and exact second cumulant (the half-width of the spread of the distribution). After many scattering events have taken place, the central limit theorem guarantees that the spatial Gaussian distribution we calculate will become accurate in detail, since the higher cumulants become relatively small. At early times, analytical expressions for a modified non-Gaussian distribution will be presented in Section 14.4 (Cai, Xu, and Alfano 2005). The solution has been extended to the case of a polarized photon distribution, and to semi-infinite and slab geometries. By use of a perturbative method, the distribution in a weakly heterogeneous scattering medium can be computed. 14.2
Derivation of cumulants to an arbitrarily high order
The elastic Boltzmann kinetic equation for particles with speed v, for the distribution function f(r, s, t) as a function of time t, position r and direction s, in an infinite uniform medium, with a point pulse light source δ(r - r₀)δ(s - s₀)δ(t - t₀), is given by
where μ_s is the scattering rate, μ_a is the absorption rate, and P(s, s′) is the phase function, normalized to ∫ P(s, s′)ds′ = 1. When the phase function depends only on the scattering angle in an isotropic medium, we can expand the phase function in Legendre polynomials with constant coefficients,
A difficulty in solving Eq. (14.1) is that the term v s · ∇_r f(r, s, t) couples the spherical-harmonic components of f(r, s, t) with each other. We first study the dynamics of the distribution in direction space, F(s, s₀, t), on a spherical surface of radius 1. The kinetic equation for F(s, s₀, t) can be obtained by integrating Eq. (14.1) over the whole spatial space, r. The spatial independence of μ_s, μ_a, and P(s, s′) retains translation invariance. Thus the integral of Eq. (14.1) obeys
Since the integral of the gradient term over all space vanishes, in contrast to Eq. (14.1), if we expand F(s, s₀, t) in spherical harmonics, its components do not couple with each other. Therefore, it is easy to obtain the exact solution of Eq. (14.3):
where g_l = μ_s[1 - a_l/(2l + 1)]. Two special values of g_l are: g₀ = 0, which follows from the normalization of P(s, s′), and g₁ = v/l_tr, where l_tr is the transport mean free path, defined by l_tr = v/[μ_s(1 - ⟨cos θ⟩)], with ⟨cos θ⟩ the average of s · s′ with P(s, s′) as weight. In Eq. (14.4), Y_lm(s) are spherical harmonics normalized to 4π/(2l + 1). Equation (14.4) serves as the exact Green's function of particle propagation in the velocity space. Since in an infinite uniform medium this function is independent of the source position, r₀, the requirements for a Green's function are satisfied, and especially, the Chapman–Kolmogorov condition (see Section 2.4) is obeyed:
In fact, in an infinite uniform medium, this propagator determines all particle migration behavior, including the spatial distribution, because displacement is an integration of velocity over time. The distribution function f(r, s, t) (the source is located at r₀ = 0) is given by
where ⟨...⟩ means the ensemble average in the velocity space. The first delta function imposes that the displacement, r - 0, is given by the path integral. The second delta function assures the correct final value of the direction. Equation (14.6) is an exact formal solution of Eq. (14.1), but cannot be evaluated directly. We make a Fourier transform of the first delta function in Eq. (14.6), then make a cumulant expansion (for a detailed explanation of cumulants, see Section 1.7), and obtain
where T denotes time-ordered multiplication. In Eq. (14.7), the index c denotes the cumulant, which is defined as ⟨A⟩_c = ⟨A⟩, ⟨A²⟩_c = ⟨A²⟩ - ⟨A⟩⟨A⟩. A general expression relating the moment ⟨A^m⟩ and the cumulant ⟨A^m⟩_c is given by:
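In code, this moment–cumulant recursion can be written out directly. The sketch below assumes the standard relation ⟨A^n⟩ = Σ_{k=1}^{n} C(n-1, k-1) ⟨A^k⟩_c ⟨A^{n-k}⟩ (the usual form of such relations, not copied from the lost display):

```python
from math import comb

def moments_from_cumulants(c):
    """c[k-1] = <A^k>_c ; returns m with m[k-1] = <A^k> (m_0 = 1 implicit)."""
    m = [0.0] * len(c)
    for n in range(1, len(c) + 1):
        m[n - 1] = sum(comb(n - 1, k - 1) * c[k - 1]
                       * (m[n - k - 1] if n - k > 0 else 1.0)
                       for k in range(1, n + 1))
    return m

def cumulants_from_moments(m):
    """Inverse of the same recursion, solving for <A^n>_c term by term."""
    c = [0.0] * len(m)
    for n in range(1, len(m) + 1):
        c[n - 1] = m[n - 1] - sum(comb(n - 1, k - 1) * c[k - 1] * m[n - k - 1]
                                  for k in range(1, n))
    return c

# All cumulants of a Poisson variable equal its rate; with rate 1 the moments
# are the Bell numbers 1, 2, 5, 15, 52, ...
print(moments_from_cumulants([1.0] * 5))
print(cumulants_from_moments(moments_from_cumulants([2.0, 1.0, 0.5, 0.25])))
```

The second line round-trips an arbitrary cumulant list through the moments and back, illustrating the "and conversely" below.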
Hence, if ⟨A^m⟩ (m = 1, 2, ..., n) have been calculated, ⟨A^m⟩_c (m = 1, 2, ..., n) can be recursively obtained, and conversely. In the following, we derive the analytical expression for the ensemble average ⟨∫₀^t dt_n ... ∫₀^t dt₁ T[s_{j_n}(t_n) ... s_{j_1}(t₁)]⟩. Using a standard time-dependent Green's function approach, it is given by
where the word "perm" means all n! - 1 terms obtained by permutation of {j_i}, i = 1, ..., n, from the first term. An intuitive way to understand Eq. (14.9) is to use a basic concept of quantum mechanics. The left side of the equation is written in the Heisenberg representation, while the right side of the equation is written in the Schrödinger representation. Another way is to use the Feynman path integral approach, considering Eq. (14.5), which leads to the same formula. In Eq. (14.9), F(s^(i), s^(i-1), t_i - t_{i-1}) is given by Eq. (14.4). Since Eq. (14.4) is exact, Eq. (14.9) provides the exact nth moments of the distribution. In Cartesian coordinates the three components of s are [s_x, s_y, s_z]. For convenience in calculation, however, we will use the components of s in the basis of spherical harmonics:
The recurrence relation of the spherical harmonics is given by
where i = ±1, and ⟨l₁, l₂, m₁, m₂|l, m⟩ is the Clebsch–Gordan coefficient of angular momentum theory; these coefficients are
with the row index (from above) j = -1, 0, 1 and the column index (from the left) i = 1, 0, -1. The orthogonality relation of the spherical harmonics is given by
Using Eq. (14.11) and Eq. (14.13), the integrals over ds_n ... ds₁ in Eq. (14.9) can be analytically performed. We obtain, when s₀ is set along z, that
Note that all ensemble averages have been performed. Equation (14.15) involves integrals of exponential functions, which can be analytically performed. Equation (14.15) includes all related scattering and absorption parameters, g_l, l = 0, 1, ..., and μ_a, and determines the time-evolution dynamics. The final particle direction, s, appears as the argument of the spherical harmonics Y_lm(s) in Eq. (14.14). Substituting Eq. (14.15) into Eq. (14.14), and using a standard cumulant procedure, the cumulants as functions of angle s and time t up to an arbitrary nth order can be analytically calculated. The final position, r, appears in Eq. (14.7), and its components can be expressed as |r| Y_1j(r̂), j = 1, 0, -1, with r and r̂, separately, the magnitude and the unit direction vector of r. Then, performing a numerical three-dimensional inverse Fourier transform over k, an approximate distribution function, f(r, s, t), accurate up to the nth cumulant, is obtained. 14.3
Gaussian approximation of the distribution function
By a cut-off at the second cumulant, the integral over k in Eq. (14.7) can be analytically performed, which directly leads to the Gaussian spatial distribution displayed in Eq. (14.17). The exact first cumulant provides the correct center position of the distribution. The exact second cumulant provides the correct half-width of the spread of the distribution. The expressions below are given in Cartesian coordinates with indices α, β = [x, y, z]. These expressions are obtained by use of a unitary transform s_α = U_{αj} s_j, j = 1, 0, -1, from Eq. (14.14) (up to second order), which is based on s_j = Y_{1j}(s), with
We set s₀ along the z direction and denote s as (θ, φ). Our cumulant approximation to the particle distribution function is given by
with the center of the packet (the first cumulant), denoted by r^c, located at
where
is defined below, and
r_y^c is obtained by replacing cos φ in Eq. (14.19) by sin φ. In Eqs. (14.18), (14.19), P_l^1(cos θ) is the associated Legendre function. The square of the average spread width (the second cumulant) is given by
where all the coefficients are functions of angle and time:
where (+) corresponds to Δ_xx and (-) corresponds to Δ_yy. Δ_yz is obtained by replacing cos φ by sin φ.
A cumulant approximation for the particle density distribution is obtained from the exact expression; we have a Gaussian shape
with a moving center located at
and the corresponding diffusion coefficients are given by
In contrast to Eqs. (14.18), (14.19) and (14.22)–(14.25), the results for N(r, t) are independent of g_l for l ≥ 2. Each distribution in Eq. (14.17) and Eq. (14.30) describes a particle "cloud" anisotropically spreading from a moving center, with time-dependent diffusion coefficients. At early times t → 0, the coefficient in Eq. (14.20) behaves as t + O(t²), and E^(j) ≈ t²/2 + O(t³) for j = 1, 2, 3, 4 in Eqs. (14.26)–(14.29). From Eqs. (14.18), (14.19), Eqs. (14.22)–(14.25), and Eqs. (14.31)–(14.33), we see that for the density distribution, N(r, t), and the dominant distribution function, that is, f(r, s, t) along s = s₀, the center moves as v t s₀ and the B_{αβ} in Eq. (14.21) are proportional to t³ as t → 0. These results present a clear picture of nearly ballistic motion at t → 0. With increasing time, the motion of the center slows down, and the diffusion coefficients increase from zero. This stage of particle migration is often called a "snake-like mode". At large times, the distribution function tends to become isotropic. The particle density, at t ≫ l_tr/v and r > l_tr, tends towards the center-shifted (by l_tr) diffusion solution with the diffusion coefficient l_tr/3. Therefore, our solution quantitatively describes how particles migrate from nearly ballistic motion to diffusive motion, as shown in Fig. 14.1. Figure 14.2 shows the light distribution as a function of time at different receiving angles in an infinite uniform medium, computed by the second cumulant solution, where the detector is located at 5 l_tr from the source in the incident direction of the source.

FIG. 14.1. The moving center of photons, R_z, and the diffusion coefficients, D_zz and D_xx, as functions of time, where the g_l are calculated by Mie theory, assuming water droplets with a/λ = 1, with a the radius of the droplet and λ the wavelength of light, and the index of refraction m = 1.33.
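The crossover from nearly ballistic to diffusive migration can be reproduced by a direct Monte Carlo sketch (illustrative only: unit speed, exponential free paths with unit mean, and an assumed Henyey-Greenstein phase function). The quantity Var(z)/(2t) acts as an effective diffusion coefficient: it is near zero at early times and approaches l_tr/3 (in these units) at long times.

```python
import numpy as np

def positions_at(tau, g, nphot, rng):
    """Propagate photons to time tau: unit speed, exponential free paths of
    mean 1, and Henyey-Greenstein scattering with anisotropy g (an assumed
    model phase function)."""
    pos = np.zeros((nphot, 3))
    s = np.tile([0.0, 0.0, 1.0], (nphot, 1))
    t = np.zeros(nphot)
    active = np.ones(nphot, dtype=bool)
    while active.any():
        step = np.minimum(rng.exponential(1.0, nphot), tau - t)  # clip at tau
        pos += np.where(active[:, None], step[:, None] * s, 0.0)
        t = np.where(active, t + step, t)
        active = t < tau - 1e-12
        # Henyey-Greenstein sample of cos(theta); uniform azimuth phi
        u = rng.random(nphot)
        ct = (1 + g**2 - ((1 - g**2) / (1 - g + 2 * g * u))**2) / (2 * g)
        st = np.sqrt(np.maximum(0.0, 1 - ct**2))
        phi = 2 * np.pi * rng.random(nphot)
        m = np.where(np.abs(s[:, [0]]) > 0.9, [0.0, 1.0, 0.0], [1.0, 0.0, 0.0])
        e1 = np.cross(m, s)
        e1 /= np.linalg.norm(e1, axis=1, keepdims=True)
        e2 = np.cross(s, e1)
        snew = (st * np.cos(phi))[:, None] * e1 \
             + (st * np.sin(phi))[:, None] * e2 + ct[:, None] * s
        s = np.where(active[:, None], snew, s)
    return pos

rng = np.random.default_rng(7)
g = 0.5
l_tr = 1.0 / (1.0 - g)           # transport mean free path in these units
results = {}
for tau in (0.2, 100.0):
    z = positions_at(tau, g, 50_000, rng)[:, 2]
    results[tau] = z.var() / (2 * tau)   # effective diffusion coefficient
print(results[0.2] < 0.05, abs(results[100.0] - l_tr / 3) < 0.07)
```

The long-time value sits slightly below l_tr/3 because Var(z) retains a constant offset of order l_tr² from the ballistic transient, consistent with the picture of a center that advances before the cloud diffuses.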
14.4
Improving cumulant solution of the transport equation
The analytical solution obtained, although it has the exact center and half-width, is not satisfactory in two respects. First, at very early times, exp(-g_l t) → 1 for all l; hence, one cannot ensure that the summation over l converges. Second, particles at the front edge of the Gaussian distribution travel faster than the speed v, thus violating causality.
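The first problem can be seen directly at the level of the Legendre coefficients. A small numerical sketch (illustrative; it assumes Henyey-Greenstein coefficients a_l = (2l+1)g^l, so that g_l = μ_s(1 - g^l), consistent with the form of g_l quoted later in this section): at small t the coefficients exp(-g_l t) of F all approach 1 and show no decay in l, while subtracting the ballistic factor exp(-μ_s t), the remedy developed in the following subsection, leaves terms that do decay with l.

```python
import numpy as np

mu_s, g, t = 1.0, 0.9, 0.01        # early time; units with mean free path 1
l = np.arange(61)
g_l = mu_s * (1.0 - g**l)          # assumed HG coefficients: a_l = (2l+1) g^l
full = np.exp(-g_l * t)            # l-coefficients entering F(s, s0, t)
scattered = full - np.exp(-mu_s * t)   # ballistic factor exp(-mu_s t) removed

print(full[60] > 0.98)             # no decay in l at small t
print(scattered[60] < 1e-4)        # subtracted terms vanish for large l
```

Since g_l → μ_s for large l, the subtracted coefficients behave like exp(-μ_s t)(exp(μ_s g^l t) - 1) ≈ μ_s g^l t, which makes the l-sum convergent even as t → 0.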
FIG. 14.2. The time-resolved profile of light at different angles measured at a detector 10 mm from the source in the incident direction. The parameters for this calculation are l_tr = 2 mm and l_a = 300 mm; the phase function is computed using Mie theory for polystyrene spheres in water, with diameter d = 1.11 μm and laser source wavelength λ = 625 nm, which gives the g-factor g = 0.926.

Separating the ballistic component from the scattered component
In order to make the summation over l convergent, we separate the ballistic component from the total I(r, s, t) and compute the cumulants for the scattered component I^(s)(r, s, t). The ballistic component is the solution of the homogeneous Boltzmann transport equation, which is the transport equation, Eq. (14.1), without the "scattering in" term (the first term on the right-hand side of Eq. (14.1)). The ballistic component is given by
The moments of the ballistic component can be easily calculated. When s₀ is along z, we have
and the other related moments are zero.
The total distribution is the sum of the ballistic component and the scattered component:
hence, the moments of the scattered component can be obtained by subtracting the corresponding ballistic moments from the moments of I(r, s, t). For example, we have
Notice that
Substituting Eqs. (14.38) and (14.35) into Eq. (14.37), the corresponding cumulants for the scattered component I^(s)(r, s, t) can be easily obtained, which amount to the following replacements of Eqs. (14.4), (14.18), and (14.22):
The expressions of the other components of the first and second cumulants are unchanged, provided all F(s, s₀, t) in Section 14.3 are replaced by F^(s)(s, s₀, t). Note that Eq. (14.38) actually equals zero at s ≠ s₀, and there is no ballistic component in these directions. The replacement of the equations in Section 14.3 by Eqs. (14.39)–(14.41) greatly improves the calculation of the cumulants at very early times. With the subtraction introduced above, the terms for large l approach zero, and the summation over l becomes convergent at very early times. Because g_l = μ_s[1 - a_l/(2l + 1)], which approaches μ_s for large l, t(g_l - g_{l±1}) ~ t and E^(j) ~ t²/2 when t → 0, which results in cancellation in the summand for large l at very early times. An example of successful use of this replacement is the calculation of backscattering. When θ = 180°, P_l(cos θ) = 1 or -1, depending on whether l is even or odd. The r_z^c computed at very early times using Eq. (14.18) oscillates with the cut-off in l, but computed using Eq. (14.40) it becomes stable. Calculation also shows that the transverse center vanishes at any time for any phase function when θ = 180°. Figure 14.3 shows the computed time profile of the backscattering intensity I^(s)(r, s, t) at a detector centered at r = 0 with detection angle θ = 180°, compared with the Monte Carlo simulation. The absolute value of the intensity, as well as the shape of the time-resolved profile, computed using our analytical cumulant solution matches well with that of the Monte Carlo simulation. The inset diagram is the same result drawn using a log scale for intensity. Note that this result for backscattering, based on a solution of the transport equation, is for a detector located near the source, different from other backscattering results based on the diffusion model, which are only valid when the detector is located at a distance of several l_tr from the source.

FIG. 14.3. Time-resolved profile of the backscattered (180°) photon intensity inside a disk with center at r = 0, radius R = 1 l_tr, thickness dz = 0.1 l_tr and received angle d cos θ = 0.001, normalized to one injected photon. The Henyey-Greenstein phase function with g = 0.9 is used, and 1/l_a = 0. The solid curve is the second cumulant solution (Gaussian distribution), and the dots are the Monte Carlo simulation. The inset diagram is the same result drawn using a log scale for intensity.

Shape of the particle distribution
If cumulants of order n > 2 are all assumed zero, the distribution becomes Gaussian. The Gaussian distribution is accurate at long times. At early times, particles at the front edge of the distribution travel faster than the free particle speed, thus violating causality, especially for particles moving along near-forward directions. In the following, two approaches are used to overcome this fault: (A) including higher cumulants; and (B) introducing a reshaped distribution.

A. Calculation including high order cumulants

We have performed calculations including the higher order cumulants to obtain a more accurate shape of the distribution. Code for the calculation is designed based on the formulas in Section 14.2. Figure 14.4 shows f(r, s, t) with a detector located at z = 6 l_tr in front of the source and received direction along θ = 0, computed using the analytical cumulant solution up to the tenth order of cumulants (solid curve), up to the second order cumulants (dotted curve), the diffusion approximation (thick dotted curve), and the Monte Carlo simulation (discrete dots). The figure shows that the tenth order cumulant solution is located in the middle of the data obtained by the Monte Carlo simulation, and f(r, s, t) ≈ 0 before the ballistic time t_b = 6 l_tr/v. The second order cumulant solution has nonzero f(r, s, t) before t_b, which violates causality. The N(r, t)/4π computed using the diffusion model has a large discrepancy with the Monte Carlo simulation, and the diffusion solution has more nonzero components before t_b, which violates causality. Using the second order cumulant solution, the distribution function can be computed very fast. The associated Legendre functions can be quickly computed using recurrence relations, with accuracy limited only by machine error. It takes a minute to produce 10⁵ values of f(r, s, t) on a personal computer. On the other hand, in order to reduce the statistical fluctuation to the level shown in Fig. 14.4, 10⁹ events are counted in the Monte Carlo simulation, which takes tens of hours of computation time on a personal computer. Computation of high order cumulants is also a cumbersome task, because the number of terms involved grows rapidly with increasing order n. Also, it has been proved that as long as some cumulants higher than second order are nonzero, all cumulants up to infinite order must be nonzero (see Section 8.3). Therefore, no matter how a cut-off at a finite order n > 2 is taken, the cumulant solution of the Boltzmann transport equation cannot be regarded as exact.

FIG. 14.4. Time-resolved profile of transmitted light in an infinite uniform medium, computed using the tenth order cumulant solution (solid curve), the second cumulant solution (dotted curve), and the diffusion approximation (thick dotted curve), compared with that of the Monte Carlo simulation (discrete dots). The detector is located at z = 6 l_tr from the source along the incident direction, and the received direction is θ = 0. The Henyey-Greenstein phase function with g = 0.9 is used, and the absorption coefficient 1/l_a = 0.

B. Reshaping the particle distribution

For practical applications, we use a semiphenomenological model. The Gaussian distribution is replaced by a new shaped form, which maintains the correct center position and the correct half-width of the distribution. The new distribution satisfies causality, namely, f(r, s, t) = 0 outside the ballistic limit, vt. There are an infinite number of choices for the shape of the distribution under the above conditions. We choose a simple analytical form as discussed later. At long times, the half-width of the distribution σ ~ (4B)^{1/2}, with B shown in Eq. (14.21), spreads as t^{1/2}; hence,
FIG. 14.5. The 1D spatial photon density at time t = 2 l_tr/v, obtained by the reshaped form Eq. (14.43) (solid curve) and the Gaussian form (dashed curve), compared with that of the Monte Carlo simulation (dots). The Henyey-Greenstein phase function with g = 0.9 is used, and 1/l_a = 0. In the figure, the unit on the z axis is l_tr; R^c is the center position of the distribution computed by the cumulant solution; z_c is the distance between the origin of the new coordinates and the source.

where R_z^c and D_zz are given in Eqs. (14.31), (14.32). As shown in Fig. 14.5, although the 1D Gaussian spatial distribution (the dashed curve) at time t = 2 l_tr/v, Eq. (14.42), has the correct center and half-width, the curve deviates from the distribution computed by the Monte Carlo simulation (dots), and a remarkable part of the distribution appears outside the ballistic limit vt = 2 l_tr. At early times the spatial distribution is not symmetric about the center R^c. While R^c moves from the source toward the forward side, causality prohibits particles from appearing beyond vt. This requires the particles on the forward side to be squeezed into a narrow region between R^c and vt. To balance the parts of the distribution on the forward and backward sides of R^c, the peak of the distribution should move to a point on the forward side and the height of the peak should increase. Based on this observation we propose the following analytical prescription: (1) to move the peak position of the distribution from R_z^c to z_c, where the parameter z_c will be determined later; (2) to take this point as the origin of the new coordinates; and (3) to use the following form for the shape of the 1D density in the new coordinates:
where
At the ballistic limit z = z±, N(z) reduces to zero, and N(z) = 0 when z is outside [z−, z+]. The parameter b in Eq. (14.43) can be determined by normalization; the parameters (a, z_c) can be determined by fitting the center and half-width of the distribution. This fit requires
The integrals in Eqs. (14.45), (14.46), and (14.47) can be analytically performed in terms of the standard error function:
The solid curve in Fig. 14.5 shows the reshaped spatial distribution, Eq. (14.43), of the 1D density at time t = 2 l_tr/v, using the Henyey-Greenstein phase function with g = 0.9; it satisfies causality and matches the Monte Carlo result much better than the Gaussian distribution. For nonlinear fitting, a difficulty is how to quickly find the global minimum. The optimization codes require setting good initial values of the parameters, so that the local minimum obtained is the true global minimum. The following procedure is used to quickly obtain the global minimum. In the long time limit, z_c ≈ R_z^c and a² ≈ (4 D_zz v t)^{-1}, and the distribution approaches the original Gaussian distribution.
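This warm-start continuation (initialize at a long time from the Gaussian-limit parameter values, then step backward in time reusing each fit as the next initial guess) can be sketched generically. Everything in the block below is a stand-in: the two-parameter model, the synthetic "truth" parameter curves, and the hand-rolled Gauss-Newton fitter; none of it is Eq. (14.43).

```python
import numpy as np

def model(z, a, zc):
    """Stand-in for Eq. (14.43): a Gaussian of inverse width a centered at zc."""
    return np.exp(-a**2 * (z - zc)**2)

def fit(z, y, p0, steps=40):
    """Tiny Gauss-Newton least-squares fitter with a numeric Jacobian."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        r = y - model(z, *p)
        J = np.empty((len(z), len(p)))
        for k in range(len(p)):
            dp = np.zeros(len(p))
            dp[k] = 1e-6
            J[:, k] = (model(z, *(p + dp)) - model(z, *(p - dp))) / 2e-6
        p = p + np.linalg.lstsq(J, r, rcond=None)[0]
    return p

z = np.linspace(-5.0, 15.0, 400)
times = np.arange(10.0, 0.9, -0.5)          # t_m down to t_1 in steps of -0.5
a_true = lambda t: 1.0 / np.sqrt(t)         # synthetic "truth", for illustration
zc_true = lambda t: 2.0 * (1.0 - np.exp(-t / 2.0))

p = np.array([a_true(times[0]), zc_true(times[0])])  # long-time warm start
errs = []
for t in times:
    y = model(z, a_true(t), zc_true(t))     # data to fit at this time
    p = fit(z, y, p)                        # warm start from the previous time
    errs.append(max(abs(p[0] - a_true(t)), abs(p[1] - zc_true(t))))
print(max(errs) < 1e-6)
```

Because adjacent times have nearby parameter values, each fit starts close to its solution and the sequence of local minima tracks the global one, which is the point of the procedure described below.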
We set these values of the parameters at a long time t_m, and take them as initial values, using a nonlinear fit, to determine the parameters at t_{m-1} = t_m - Δt, where Δt is a small time interval. Then, we use the parameters at t_{m-1} as initial values to determine the parameters at t_{m-2}. Step by step, the parameters over a whole time period can be computed.

3D density

In this case the ballistic limit is represented by a sphere with center located at the source position and radius vt. We move the peak position of the distribution from R_z^c to z_c along the s₀ = ẑ direction, take this point as the origin of the new coordinates, and use the following form for the shape of the 3D density as a function of the position in the new coordinates, r̃:
where N(r̃) = 0 when r̃ > r̃*, and χ is the polar angle of r̃ in the new coordinates; r̃* is the distance between the new origin and the point obtained by extrapolating r̃ to the surface of the ballistic sphere:
In Eq. (14.52) a(χ) is defined by
The parameter b can be determined by normalization; the parameters (a_z, a_⊥, z_c) are determined by fitting the center and half-width of the distribution. This fit requires
In the above integrals dr̃ = 2π r̃² dr̃ d cos χ; the integration over r̃ can be performed analytically, and the integration over χ is performed numerically. Figure 14.6 shows the computed time profile of the 3D density N(r, t), with the source at the origin and the detector located at r = (0, 0, 3l_tr), using the Henyey-Greenstein phase function with g = 0.9. The solid curve is for the reshaped form Eq. (14.52), the dashed curve is for the Gaussian form, and the dots are for the Monte Carlo simulation. The results clearly demonstrate the improvement obtained by using the reshaped form rather than the Gaussian form.

FIG. 14.6. Time-resolved profile of the 3D photon density, where the detector is located at z = 3l_tr from the source along the incident direction, obtained from the reshaped form Eq. (14.52) (solid curve) and the Gaussian form (dashed curve), compared with the Monte Carlo simulation (dots). The Henyey-Greenstein phase function with g = 0.9 is used, and the absorption coefficient 1/l_a = 0.

The nonzero intensity before t_b = 3l_tr/v has been completely removed in the reshaped form, while the Gaussian distribution has nonzero components before t_b. The reshaped time profile matches the result of the Monte Carlo simulation over most of the time range, but the peak value is about 20% lower. The errors are much smaller than those of the Gaussian distribution. By integration over time, the density for the steady state can be obtained. The difference in the steady-state density between the reshaped analytical model and the Monte Carlo simulation is about 3%.

Distribution function I^(s)(r, s, t)

When the detector is located less than 8l_tr from the source in a medium with a large g-factor, the distribution function I^(s)(r, s, t) is highly anisotropic, and the intensity received strongly depends on the angle. One then needs the photon distribution function I^(s)(r, s, t) instead of the photon density N(r, t). In this case the center position r_c, as a function of (s, t), is not located on the axis along the incident direction s₀. Without loss of generality, we set the scattering plane (s, s₀) as the x-o-z plane. The center position is now located at r_c = (r_cx, 0, r_cz). The orientations and lengths of the axes of the ellipsoid, which characterize the half-width of the spread of the distribution, can be computed as follows. The nonzero components of the second cumulant are now (B_xx, B_xz, B_zz, B_yy), expressed in Eq. (14.21). B_yy represents the length of one axis of the ellipsoid, perpendicular to the scattering plane.

FIG. 14.7. Schematic diagram describing the geometry of the particle spatial distribution for scattering along a direction s ≠ s₀. At a certain time t, the center of the distribution is located at r_c. The half-width of the spread is characterized by an ellipsoid (the gray area). The large sphere represents the ballistic limit. The origin of the new coordinates is set by extending from |r_c| to z_c. r̃* is the point obtained by extrapolating a position r̃ (in the new coordinates) to the surface of the ballistic sphere, and the length r̃* is determined by Eq. (14.53).

By diagonalizing the matrix
the lengths and directions of the other two axes of the ellipsoid in the scattering plane can be obtained. In fact, calculation shows that the direction of r_c is also the direction of one axis of the ellipsoid, since at a given time t the direction of r_c can replace s as the unique special direction in the scattering plane. In order to reshape the distribution we choose a new z axis along the r_c direction, move the peak position of the distribution from |r_c| to z_c, and take this point as the origin of the new coordinates (x̃, ỹ = y, z̃), as shown schematically in Fig. 14.7. In the new coordinates we use a reshaped form similar to that of the 3D density, Eq. (14.52), while a(χ) in Eq. (14.52) is
where χ and φ are, respectively, the polar angle and the azimuthal angle of a position r̃ in the new coordinates. The parameters (a_x, a_y, a_z, z_c) are determined by fitting the
center r_c and the lengths of the three axes of the ellipsoid characterizing the half-width of the distribution. In many cases the ellipsoid can be approximately treated as an ellipsoid of revolution, with the length of the axis along the x direction approximately equal to that along the y direction; the computation can then be simplified. The new reshaped distribution function Ĩ^(s)(r, s, t) for a given direction s is normalized to F^(s)(s, s₀, t). Figure 14.8 shows the computed time profile of the distribution function I^(s)(r, s, t), when the detector is located at 4l_tr in front of the source, using the Henyey-Greenstein phase function with g = 0.9. Figures 14.8(a) and (b) are, respectively, for different directions of light s: θ = 0 and θ = 30°. The solid curves are for the reshaped form Eq. (14.52) and the dashed curves are for the Gaussian form. The dots are for the Monte Carlo simulation. The anisotropy of the distribution is shown by comparing Fig. 14.8(a) with Fig. 14.8(b). The reshaped distribution removes the intensity before t_b = 4l_tr/v, which appears in the Gaussian distribution. The reshaped distribution matches the Monte Carlo result much better than the Gaussian distribution. While causality, together with the correct center and half-width of the distribution, are the major controlling factors in determining the shape and range of the particle distribution, the detailed shapes are, to some extent, model dependent. For s near the backscattering direction, the Gaussian distribution can be a good approximation, as shown in Fig. 14.3, because most particles suffer many scattering events in transferring from the forward direction to the backward direction. Our calculation shows that the center position r_c is close to the source for θ ≈ 180° and far from the ballistic limit; hence, reshaping has little effect in the backscattering case.
Besides improving convergence, separating the ballistic component from the scattered component also provides a more proper time-resolved profile for transmission. In the time-resolved transmission profile the ballistic component is described by a sharp jump exactly at the ballistic arrival time, separated from the later scattered component. The intensity of the ballistic component, compared with the scattered component, strongly depends on the g-factor. For g = 0, l_tr = l_s, the ballistic component decays to exp(−1) = 0.368 at distance l_tr. But for g = 0.9 it decays to exp(−10) = 4.54 × 10⁻⁵ at l_tr, because l_tr = 10 l_s. The jump of the ballistic component can be seen in experiments on transmission of light through a medium of small-sized scatterers (small g-factor), but is difficult to observe for a medium of large-sized scatterers (large g-factor). Our formulas provide a proper estimate for both small and large g-factors by explicitly separating these two components. Using the obtained analytical expressions, the distribution I(r, s, t) can be computed very quickly. The cumulant solution has been extended to the polarized photon distribution (Cai, Lax, and Alfano 2000c), and to semi-infinite and slab geometries (Xu, Cai, and Alfano 2002; Cai, Xu, and Alfano 2003). Using a perturbative approach, the distribution in a weakly heterogeneous medium can also be calculated based on the cumulant solution (Cai, Xu, and Alfano 2003). The nonlinear effect of strongly heterogeneous objects inside the medium can also be calculated using a correction of a "self-energy" diagram (Xu, Cai, and Alfano 2004).

FIG. 14.8. Time-resolved profile of the photon distribution function, for light directions (a) θ = 0, (b) θ = 30°, where the detector is located at z = 4l_tr from the source along the incident direction, obtained from the reshaped form Eq. (14.52) (solid curves) and the Gaussian form (dashed curves), compared with the Monte Carlo simulation (dots). The Henyey-Greenstein phase function with g = 0.9 is used, with the absorption coefficient 1/l_a = 0.
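The g-dependence of the ballistic attenuation quoted above can be checked directly. The sketch below is our own illustration (the function names are ours); it uses the standard relation l_tr = l_s/(1 − g) between the transport and scattering mean free paths, which reproduces l_tr = 10 l_s for g = 0.9.

```python
import math

def transport_mean_free_path(l_s, g):
    """Transport mean free path l_tr = l_s / (1 - g) for anisotropy factor g."""
    return l_s / (1.0 - g)

def ballistic_attenuation(z, l_s):
    """The ballistic (unscattered) component decays as exp(-z / l_s)."""
    return math.exp(-z / l_s)

l_s = 1.0
for g in (0.0, 0.9):
    l_tr = transport_mean_free_path(l_s, g)
    # Attenuation of the ballistic component at one transport mean free path:
    print(g, l_tr, ballistic_attenuation(l_tr, l_s))
```

For g = 0 this gives exp(−1) = 0.368 at l_tr; for g = 0.9 it gives exp(−10) ≈ 4.54 × 10⁻⁵, matching the values quoted in the text.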
15 Signal extraction in presence of smoothing and noise
15.1
How to deal with ill-posed problems
The importance of ill-posed problems

The problem of extracting a signal from a distorted output (i.e., the solution of an inverse problem) in the presence of noise is a ubiquitous one. It occurs in the analysis of spectral data (Jansson 1970), in geophysical problems requiring the inversion of potential theory (Bullard and Cooper 1948) or of heat conduction (John 1955), and in the inversion of information from optical instruments, electron spectroscopy, radioastronomy (Kaplan 1959), medical imaging, etc. A related ill-posed problem is that of pattern analysis or recognition (Benjamin 1980). Also of interest is the use of lasers to probe atmospheric temperature distributions (Hillary et al. 1965). The most recent example is the analysis and repair of distorted signals from the Hubble telescope.

The nature of ill-posed problems
In what sense is the typical inversion problem an ill-posed problem? The signal (e.g., the line shape) s(x) we are trying to determine is acted on by some apparatus K, and the output is contaminated by noise n. In a simple linear case, the measured output m(x) is given by
If there were no noise, and if m(x) were measured with infinite precision, one could invert the integral equation to write
where
i.e., K⁻¹ is the kernel inverse to K(x, y).
The difficulty is that all kernels K(x, y) perform some smoothing on their input. Thus one could add to the solution s(x) a high-frequency term A sin ωx whose smoothed image is smaller than any given ε, by choosing ω sufficiently large, for any amplitude A, even an enormous one. Thus, if our measured result is precise only to within a noise n(x) of order ε, many different solutions are possible such that
One may object, however, that the solution s(x) should be smooth and not contain a superposition of high-frequency components. The answer is that the problem is, in general, not well posed (that is, with a unique solution) until the nature of the smoothness is specified. Unfortunately, the smoothness of the solution may not be known in advance; one is attempting to determine it from the measured results! However, if one makes no specification of smoothness, it is difficult to tell which of the frequencies in the measurement m(x) are properties of the signal, and which are spurious. It is our opinion, and that of a number of others, whose work will be discussed shortly, that this issue can only be resolved by making a separate measurement of the spectrum associated with the noise n(x) (or its autocorrelation ⟨n(x)n(x + x′)⟩). Not only the shape of the noise spectrum but also its intensity is relevant, since Fourier components in a particular measurement far below the corresponding noise intensity cannot be regarded as significant, and should be excluded from the estimated signal ŝ(x). Thus we see that the correct procedure for computing an estimate ŝ(x) from m(x) must be a nonlinear one.

15.2
Solution concepts
We shall use as a guide to the literature the paper by Price (1982) and the extensive review of statistical methods presented by Turchin, Kozlov, and Malkevich (1971). All of the methods to be discussed below reduce the continuum problem to a discrete one. In the simplest case, the observation points used are at x_i and the solution s(y) is to be evaluated at y_j. Equation (15.1) then reduces to a set of coupled linear equations, Eq. (15.6). An alternative procedure is to expand s(y) in some set of orthonormal functions φ_ν(y) and m(x) in the same set φ_μ(x). The result is that Eq. (15.6) is replaced by an equation of identical form, with subscripts i and j replaced by μ and ν. Of course, m_μ and s_ν are the expansion coefficients in the φ basis.
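A small numerical sketch (our own example, with an arbitrarily chosen Gaussian kernel width) shows what the discrete matrix K_ij looks like when a smoothing kernel is sampled on a uniform grid, and how its conditioning worsens as the grid is refined:

```python
import numpy as np

def kernel_matrix(n, width=0.1):
    """Discretize K(x, y) = exp(-(x - y)^2 / (2 width^2)) on n grid points in [0, 1]."""
    x = np.linspace(0.0, 1.0, n)
    dx = x[1] - x[0]
    # Quadrature weight dx turns the integral into a matrix-vector product.
    return np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * width ** 2)) * dx

# A smoothing kernel produces an ill-conditioned matrix: the condition
# number grows rapidly as the number of grid points increases.
for n in (10, 20, 40):
    print(n, np.linalg.cond(kernel_matrix(n)))
```

The same growth of the condition number with the number of points is behind Noble's numerical results quoted later in this section.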
In terms of either discrete representation, the ill-posed nature of the problem manifests itself in the matrix K_ij (or K_μν) being an ill-conditioned matrix. See, for example, Dahlquist and Björck (1974).

Filtering

If noise is ignored and the kernel possesses translational invariance
then a formal solution for s(x) can immediately be obtained using Fourier transform techniques:
where m(p) and K(p) are the Fourier transforms of m(x) and K(x), respectively. The ill-posed nature of the problem can be made evident by considering an instrument K with Gaussian line shape:
In this case, its Fourier transform is
The use of Eq. (15.10) in Eq. (15.8) clearly produces a large enhancement of any high-frequency components in m̃(p). If there were no noise, m̃(p) would vanish as p → ∞ more rapidly than K̃(p), so that a well-defined expression would result for the signal s(x). But the added noise n(x) can be white noise, meaning that ñ(p), and hence m̃(p), do not fall off but remain constant as p → ∞. The most elementary way to avoid this difficulty is to set
where F(p) is a filter factor that falls off rapidly as p → ∞, chosen to impose the desired smoothness on s(x). The problem is the arbitrariness involved in specifying F(p).
Regularization
A second class of methods, known as regularization methods, replaces the problem of minimizing ||Ks − m||² by a well-posed problem of the form
where D is a linear operator that measures the degree of nonsmoothness, say a second derivative, and α is a parameter that determines the amount of nonsmoothness allowed.
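In discrete form the minimization of ||Ks − m||² + α||Ds||² has the closed-form solution s = (KᵀK + αDᵀD)⁻¹Kᵀm. The sketch below is our own example (the kernel, noise level, and α are arbitrary choices), with D a second-difference matrix in the spirit of Phillips's choice discussed later in this section:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]

# Smoothing kernel matrix (a Gaussian line shape, discretized)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05 ** 2)) * dx
s_true = np.sin(2 * np.pi * x)
m = K @ s_true + 1e-4 * rng.standard_normal(n)

# Second-difference operator D: rows are s[i] - 2 s[i+1] + s[i+2]
D = np.diff(np.eye(n), 2, axis=0)

def regularized_solution(K, D, m, alpha):
    """Minimize ||K s - m||^2 + alpha ||D s||^2 via the normal equations."""
    return np.linalg.solve(K.T @ K + alpha * D.T @ D, K.T @ m)

s_naive = np.linalg.solve(K, m)                 # unregularized: noise-dominated
s_reg = regularized_solution(K, D, m, 1e-4)

err_naive = np.max(np.abs(s_naive - s_true))
err_reg = np.max(np.abs(s_reg - s_true))
print(err_naive, err_reg)
```

The unregularized inverse is swamped by amplified noise, while the penalized solution recovers the smooth signal; the residual arbitrariness is now concentrated in the single parameter α.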
15.3
Methods of solution
Relation of regularization to filtering

For a general D we claim, with Price (1982), that the regularization method is equivalent to using the filter factor
since Eq. (15.12) yields
from which we deduce that
or
In Fourier space, the factor in brackets is the filter factor of Eq. (15.13), but the matrices in Eq. (15.16) can be taken in any basis set φ_μ(x). Perhaps the earliest suggestion for regularization was made by Phillips (1962), who proposed that, to keep the solution smooth, one should for fixed ||Ks − m||
minimize
and he replaced s"(x) by its discrete counterpart
Thus the Phillips choice is a special case of smoothing with |D(k)|² = 16 sin⁴(k/2). The chief purpose is to have |D(k)|² increase with k, so as to be more effective in eliminating the high Fourier components. If Fourier transform methods are used, it would be simpler to use |D(k)|² = k⁴, which is simply the statement that the spectrum of s″(x) is k⁴ times that of s(x). A more general discussion of regularization is given by Tikhonov (1977).

Iteration
One of the earliest deconvolution schemes is the iteration scheme of van Cittert (1931). In this scheme one starts with
and passes from the μth to the (μ + 1)th iterate according to
which is, in effect, a Jacobi iterative solution of the simultaneous equations. (The prime on the sum omits the diagonal j = i term.) Jansson (1970) proposed an overrelaxation scheme of the form
where κ need not equal unity. In this case, all components of s are updated simultaneously. If, in updating any component, the updated values of earlier components are used, we get a generalization (for κ ≠ 1) of the Gauss-Seidel iteration
procedure:
Jansson quotes the convergence condition for this scheme as
However, the matrix K_ij will always be ill-conditioned, Eq. (15.26) will be violated, and convergence will never occur. Jansson succeeds, however, for a different reason. At the start, and after each iteration, he applies a smoothing procedure of the form
where the smoothing formula involves 2l + 1 terms. This smoothing procedure is equivalent to the choice of a particular value of D(k). The optimum choice of the relaxation parameter κ is discussed by Fadeeva (1959). Jansson was able to resolve adjacent peaks in infrared spectra whose separating valley was eliminated by noise. His method, however, "converged only to within a certain value of the root-mean-square difference then diverged". Although Jansson attributes this divergent behavior to noise in the data, in our opinion it may be associated with round-off error generated by multiple matrix operations. If his procedure were replaced by an equivalent D(k), then one fast Fourier transform, a filtered product, and an inverse fast Fourier transform would yield his best result without iteration. However, his scheme, like that of Phillips, and Twomey's (1963) improvement of Phillips, can all be criticized for the arbitrariness of their choice of D(k).

Nontranslationally invariant kernels
Noble (1977) provides a simple, numerical example:
with a = 0, b = 1, s(y) = 1, and m(x) computed from this integral with a kernel that arises from potential theory:
He converts the integral equation to a set of simultaneous equations using Simpson's rule with n points. He solves for s(0) [exact s(0) = 1] with the results:
"Contrary to what we might expect at first sight, the larger the number of points, the worse the results are; the smoother the kernel, the worse the results are". 15.4
Well-posed stochastic extensions of ill-posed processes
Franklin's method

Our section heading is borrowed from the title of a lucid contribution by Franklin (1970). Our description of Franklin's work follows that of Shaw (1972), whose improvement will be detailed in the next section. In our notation, Franklin considers the solution of the problem
where the noise n has given Gaussian statistics, and the signal s also has given statistics. The statistics of m are derived from the corresponding statistics of s and n. Presumably, the statistics of the noise n can be obtained by measurements in the absence of a signal. The aim is to construct a linear operator L such that an estimate ŝ of s can be constructed from m via
To minimize the mean square error, we vary L in
To obtain this result, we expanded in an arbitrary basis φ_i:
so that the second term takes the form
where we have taken the ensemble average and used
If we vary L* in Eq. (15.32), and use the cyclic property of the trace we get
from which we obtain, in agreement with Shaw's Eq. (3.15). Note, however, that the statistics of m are determined by those of n and s according to Eq. (15.30). The scalar product with s yields
or

To evaluate all the components of R_mm, we need:
with the help of the expanded form
The explicit result can be written as a matrix relation by suppressing the subscripts:
Thus
In the usual case, in which the noise is uncorrelated with the signal, R_sn = R_ns = 0, and L reduces to:

This is the result used by Franklin. We can verify that when the noise is neglected
we recover the correct, but useless, value, because K is inevitably an ill-conditioned matrix.
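With R_sn = 0 the estimator takes the familiar form L = R_ss Kᵀ(K R_ss Kᵀ + R_nn)⁻¹. A small sketch (our own example; the signal covariance, noise level, and kernel are arbitrary illustrative choices, not taken from the text) shows that this estimator remains stable exactly where the naive inverse blows up:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
x = np.linspace(0.0, 1.0, n)

K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05 ** 2)) * (x[1] - x[0])

# Assumed prior statistics: a smoothly correlated signal and white noise.
Rss = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.2)
sigma_n = 1e-4
Rnn = sigma_n ** 2 * np.eye(n)

# Franklin's linear estimator with Rsn = Rns = 0:
L = Rss @ K.T @ np.linalg.inv(K @ Rss @ K.T + Rnn)

s_true = np.sin(2 * np.pi * x)
m = K @ s_true + sigma_n * rng.standard_normal(n)

s_hat = L @ m                       # stochastic (well-posed) estimate
s_naive = np.linalg.solve(K, m)     # naive inversion of the ill-conditioned K

err_wiener = np.max(np.abs(s_hat - s_true))
err_naive = np.max(np.abs(s_naive - s_true))
print(err_wiener, err_naive)
```

The noise covariance R_nn in the matrix being inverted is precisely what keeps the inversion well posed: it floors the small eigenvalues of K R_ss Kᵀ that the naive inverse divides by.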
15.5
Shaw's improvement of Franklin's algorithm
Franklin's procedure had one significant defect: he assumed that the signal s was drawn from a space in which the mean value ⟨s⟩ = 0. In addition, one often wishes to use a number M of measured values m_j significantly larger than the number N of signal values s_j to be estimated. Franklin's procedure requires the solution of M simultaneous equations. Shaw has produced an algorithm that requires inversion only of a smaller N × N matrix, as one does in a least-squares calculation. In addition, he makes an initial estimate ŝ^(0) of the signal and iterates, assuming ⟨s⟩ = ŝ^(n) in computing ŝ^(n+1). We first note that a simple least-squares algorithm, which requires the minimization of
leads to the equations
If we multiply the original relation, Eq. (15.30), between m and s by K† we obtain
Here K†K has the reduced N × N size. This problem now has the same form as Franklin's original problem, with the indicated replacements. Thus Eq. (15.35) for R_sm is replaced by a reduced noise matrix
The reduced signal-measurement correlation is
Thus we obtain the matrix relation:
Similarly, the reduced measurement-measurement correlation is given by:
The estimate based on Franklin's procedure, but using the reduced matrices, takes the form:
or in expanded form:
If cross-correlations are neglected,
If, in addition, white noise is assumed
Now R_mm in Eq. (15.56) has a factor K†K on the left, so that its reciprocal takes the form
where I is the unit matrix in the reduced space. Then the estimated signal, Eq. (15.56), can be rewritten as:
Since the matrix R_ss K†K commutes with itself as well as with the unit matrix I, we can move it through to the right to obtain:
When a nonzero signal is present:
our aim is to determine ⟨s⟩ = h from the measured data. Franklin's procedure, Eqs. (15.34) and (15.37), is modified to
in its original form. In the reduced form Eq. (15.56) is replaced by:
In the case of white noise we can return to Eq. (15.60) and replace s by s − h, and m by m − h. After the subtraction involving the h terms is performed, we get the simplified result:
in agreement with Shaw (4.19).
In his calculations Shaw also treats the signal s as having white noise
In this case, Eq. (15.64) simplifies further to:
Since ŝ should agree with h, Shaw adopts the iterative procedure
Shaw then provides a starting estimate ŝ^(0) from the least-squares equation
by assuming that K†K is sufficiently sharp that s_j can be replaced (on the left) by ŝ_j, with the result
If we set ŝ^(n+1) = ŝ^(n) = ŝ in Eq. (15.68) and attempt to solve directly for ŝ, we obtain just the least-squares solution. Thus the iterative procedure should eventually become unstable!
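The drift toward the unstable least-squares solution can be seen in a toy version of the iteration. The sketch below is our own reconstruction, not Shaw's exact formula: each pass re-centers on the previous estimate h = ŝ^(n) and applies a ridge-type update, so the fixed point is the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = np.linspace(0.0, 1.0, n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.05 ** 2)) * (x[1] - x[0])

s_true = np.sin(2 * np.pi * x)
m = K @ s_true + 1e-4 * rng.standard_normal(n)

lam = 1e-6                         # illustrative noise-to-signal variance ratio
A = K.T @ K + lam * np.eye(n)

s_hat = np.zeros(n)
errors = []
for it in range(500):
    # Re-center on the previous estimate and solve the regularized
    # problem for the correction (fixed point: K.T K s = K.T m).
    s_hat = s_hat + np.linalg.solve(A, K.T @ (m - K @ s_hat))
    errors.append(np.max(np.abs(s_hat - s_true)))

# Early iterations improve the estimate; continued iteration drifts
# toward the noise-dominated least-squares solution.
print(errors[0], min(errors), errors[-1])
```

Each iteration weakens the effective regularization, so the error eventually grows: stopping the iteration early acts as the real regularizer, which is exactly the instability the text warns about.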
15.6
Statistical regularization
What is the meaning of the statistical fluctuations in the signal characterized by the correlation function R_ss? These fluctuations may represent actual noise that contaminates the signal. However, even when the signal is not contaminated by noise, which is only added later, the Franklin and Shaw procedures would break down (the problem becomes ill-posed) if R_ss were set equal to zero. Another interpretation is that we impose on the problem an a priori distribution P([s]) of possible signals. This distribution, for example, should give weight to our prejudice that the s_j = s(x_j) arise from
a smooth function s(x). Phillips's regularization procedure emphasized this point by adding a term to the minimization procedure
which becomes large if s(x) becomes highly oscillatory. These ideas can be cast in the language of statistical decision theory. Let
represent the joint probability of a signal s and a measurement m. (For simplicity, think of s as the set of numbers s_j = s(x_j) and m as the set m_j = m(x_j).) Then the Bayes' theorem solution of the problem
represents the a posteriori probability of s, after having made the measurement m in terms of the a priori estimate P(s). Since the noise n is Gaussian and additive,
If we do not know the a priori probability in detail but only its correlation matrix
then the distribution with maximum entropy where
subject to the constraint of having a correlation matrix C is given by the Gaussian
In summary, although the regularization methods appear to be distinct, they all convert an ill-posed problem to a well-posed one by adding a requirement of smoothness. This is done most explicitly by specifying an a priori probability associated with the signal. (Iteration procedures, such as Eq. (15.67), due to Shaw and Franklin, give the signal an a priori spread via Eq. (15.65).) The disadvantage of all the above procedures is that the predicted signal depends linearly on the measured signal. Regularization is undoubtedly necessary, but the best estimate of the signal should depend nonlinearly on the measured value and noise.
15.7
Image restoration
Nonlinear methods have been introduced in connection with the problem of image restoration. These methods recognize that an image is likely to have sharp edges. The methods introduced consist of a mixture of a regularized solution with the unregularized result, with the degree of admixture varied in a local manner that is sensitive to the gradient of the measured signal. This procedure reduces the amount of undesirable smoothing that occurs in the vicinity of an edge. But no investigation has been made of the stability of these new procedures. The work of Abramatic and Silverman (1982) is based upon a procedure introduced in geophysics by Backus and Gilbert (1970) and on the work of Frieden (1975). The idea of Abramatic and Silverman is to allow the regularization parameter, which controls the smoothness of the solution, to adapt to the local characteristics of the image (a flat field or an edge). This was done by taking into account the masking effect of the human eye. The eye is quite sensitive to a small amount of noise in a flat field, but is able to tolerate a large amount of noise in the surroundings of an edge. In their procedure, the masking function is estimated in the form of
from the noisy image, where g(i, j) is the gradient of the image at the pixel (i, j) and d₀ is of the order of the typical size of an edge. The amount of regularization at each pixel (i, j) is scaled by a visibility function f(M(i, j)), a monotonically decreasing function from 1 to 0 as M goes from 0 to ∞. Abramatic and Silverman used the visibility function
where a > 0 is a tuning parameter. Via the visibility function, stronger regularization is applied to the flat field, where M is small, and weaker regularization near an edge, where M is large. In short, nonlinear image restoration is much harder than linear restoration. An excellent summary of image restoration can be found in Demoment (1989).
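A minimal sketch of this adaptive idea follows; it is our own illustration, since the displayed formulas are not reproduced here: the gradient-magnitude masking measure M, the visibility function f(M) = 1/(1 + aM), and the parameter value are all assumptions.

```python
import numpy as np

def adaptive_weights(image, a=1.0):
    """Per-pixel regularization weight: near 1 in flat regions, small at edges."""
    gy, gx = np.gradient(image.astype(float))
    M = np.hypot(gx, gy)                 # local gradient magnitude as masking measure
    visibility = 1.0 / (1.0 + a * M)     # assumed visibility function f(M)
    return visibility                    # scales the regularization at each pixel

# A step edge: the weight should be 1 in the flat parts and much smaller
# at the edge, so the edge is smoothed less than the flat field.
img = np.zeros((8, 8))
img[:, 4:] = 10.0
w = adaptive_weights(img)
print(w[0, 0], w[0, 4])                  # flat pixel vs. edge pixel
```

Multiplying the regularization term by such a weight reproduces the qualitative behavior described above: strong smoothing where the eye would see noise, weak smoothing where an edge masks it.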
16 Stochastic methods in investment decision
Derivative securities, with which we shall be concerned, have a value related to an underlying security or asset. When a derivative, such as a put, a forward, or a futures sale, is not priced correctly, one can, by making compensating purchases, make a risk-free profit. This is sometimes referred to as arbitrage. The purpose of this chapter is to develop the relation between the price of the underlying asset and a derivative, and to include the effect of noise, or random fluctuations, on this relation. See Hull (1989, 2001) for a detailed discussion of "Options, Futures, and Other Derivative Securities". A more mathematically oriented description of this subject is contained in Nielsen (1999). There is a controversy between the mathematical economists and financiers, who use methods based on Ito's (1951) calculus lemma, and physical scientists, for example van Kampen (1992), whose applications have been made to chemistry problems, and Lax (1966IV). An attempt will be made to compare the two techniques to avoid pitfalls that sometimes occur in the use of the Ito method.

16.1
Forward contracts
A forward contract is an agreement by one person to sell, at a time T (in years), for K dollars (at delivery), an asset whose value at the current time t (in years) is S. The forward price F is the delivery price K chosen to make the value of the contract zero. If the interest rate r on risk-free money were zero, we would have F = S. However, if the delivery price is K, one only needs cash equal to K exp[−r(T − t)] at the present time to be able to pay K at the time T − t later. If we assign f to be the value of the forward contract, then
since the first two items are equivalent to owning the third. The forward price F is then the value of K that makes f = 0.
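The forward value relation f = S − K exp[−r(T − t)] and the resulting forward price F = S exp[r(T − t)] can be checked with a minimal sketch (the numerical values are our own illustration):

```python
import math

def forward_value(S, K, r, tau):
    """Value of a forward contract: own the asset, owe K discounted to today."""
    return S - K * math.exp(-r * tau)

def forward_price(S, r, tau):
    """The delivery price K that makes the contract value zero."""
    return S * math.exp(r * tau)

S, r, tau = 100.0, 0.05, 0.5          # spot price, risk-free rate, time to delivery
F = forward_price(S, r, tau)
# By construction, a contract struck at the forward price has zero value:
print(F, forward_value(S, F, r, tau))
```

Setting K equal to the forward price makes f vanish identically, which is just the defining condition stated above.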
As an example, from the Wall Street Journal of Friday, May 22, 1998, we take the price in dollars for 100 yen (Table 16.1).
TABLE 16.1. Price in dollars for 100 yen

  Time      Price    Interest
  Spot      1.1675
  30 day    1.1715   2.05%
  90 day    1.1808   2.26%
  180 day   1.1948   2.31%

Except for commissions, which we neglect, the ratios of the 30-day, 90-day, and 180-day forward prices describe the interest-rate factor exp[r(T − t)] for the three different periods. In the third column we list the rate r consistent with the above data. It would appear that Eq. (16.2) contains a hidden assumption that the price S (at the initial time t) will be equal to the final price S_T at the settlement time T. But this is not the case! An arbitrageur can buy the asset at the spot price S and take the short (seller) side of the forward contract. To do this, he must borrow S dollars at a total cost of S exp[r(T − t)]. When he sells the asset, he receives S_T, for a gain (possibly negative) of
From the forward contract, he receives F, but then must supply an asset of value ST leading to a gain of In the combined gain
the value of S_T disappears. This result agrees with Eq. (16.2). Thus if F > S exp[r(T − t)] he makes a risk-free gain. If F < S exp[r(T − t)], an arbitrage in the opposite direction also yields a risk-free gain. This expression, Eq. (16.2), for the value of a forward contract is independent of the final value S_T of the asset.

16.2
Futures contracts
Futures contracts, like forward contracts, are agreements made now to buy or sell an asset in the future at an agreed-upon delivery price. However, futures contracts are traded on an exchange such as the Chicago Board of Trade. To buy a futures contract through a broker, a deposit, the initial margin, must be supplied to guarantee delivery. This could be 20% of the value of the contract.
Thus to buy 100 ounces of gold at $400/ounce, a contract of $40,000 might require an $8000 deposit. If the price of gold goes up by $10, the buyer of the futures contract finds that his margin account has gone up by 100 × 10 = $1000. However, any balance above the initial margin can be withdrawn. Even if not withdrawn, additional interest is earned. Conversely, if the price goes down, the value of the margin account declines by a corresponding amount, and the interest earned declines. If the margin falls below a maintenance level, the investor will receive a margin call to make up the difference. If the payment is not received, the broker sells the contract, thus closing out the position. In Appendix 2A of Hull (1989), or 3B of Hull (2001), he establishes that if the interest rate is the same for all maturities, futures contracts and forward contracts should have identical prices. However, if interest rates change, particularly if they change in a way that is correlated with changes in the price S, the equivalence no longer holds. We shall ignore these fine points in the discussion that follows. Typically, European contracts involve action only at the closing date. But American puts and calls can be exercised at any intermediate date, or held to the close. This introduces the need for a strategy as to when to take action, and can also cause a modification of the price of the put or call.

16.3
A variety of futures
The general behavior of futures contracts was described in the previous section, but there are differences that depend on the nature of the assets. If we are dealing with stock index futures, where the stock has a dividend yield q, the forward price, Eq. (16.2), is modified to

Stock:

since the underlying asset has a dividend yield q that partly compensates for the interest rate r. For futures contracts involving currencies, if the local currency has interest rate r and the foreign currency has interest rate r_f, we get

since the yield in the foreign currency plays the role of a dividend. Table 16.2 shows a decrease of price with maturity, corresponding to the fact that interest rates in the US are less than those in Canada. For gold and silver, the forward price is

Gold:

or
Gold:
STOCHASTIC METHODS IN INVESTMENT DECISION
Table 16.2. Price of Canadian dollars in America

  Time      Price
  Spot      0.7404
  30 day    0.7396
  90 day    0.7383
  180 day   0.7308

where U is the present value of all storage costs over the life of the contract. If the storage cost is proportional to the value of the gold, with a storage cost u per year per dollar of value, this is equivalent to a total storage cost
This results in Eq. (16.9) with u acting as if it were a negative dividend, or an added interest charge.
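The forward-price formulas above differ only in which continuous yield or cost enters the exponent, so one helper covers all three cases. A minimal sketch (function name and parameter defaults are illustrative):

```python
import math

def forward_price(S, r, tau, q=0.0, u=0.0, U=0.0):
    """Forward price at time to maturity tau = T - t:
         stock index:  F = S e^{(r-q)tau}       (dividend yield q)
         currency:     F = S e^{(r-r_f)tau}     (pass q = r_f)
         gold/silver:  F = (S+U) e^{r tau}  or  F = S e^{(r+u)tau}
    """
    return (S + U) * math.exp((r - q + u) * tau)

# Gold: spot $400/oz, r = 5%, storage cost u = 2% per year of value.
print(round(forward_price(400.0, 0.05, 1.0, u=0.02), 2))  # 429.0

# Table 16.2: the 180-day discount of the Canadian dollar implies
# r_US - r_Canada = ln(F/S)/tau, a negative number.
print(round(math.log(0.7308 / 0.7404) / 0.5, 4))  # -0.0261
```

The second print backs out the interest-rate differential implied by the 180-day quote in Table 16.2: US rates about 2.6% below Canadian rates.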
16.4 A model for stock prices
Black and Scholes (1973) developed a procedure for estimating the appropriate price for puts and calls, contracts whose underlying asset is a stock. A similar contribution was made by Merton (1973) at the same time. (The Nobel prize was shared for this work.) For this purpose, they need a model for how the price S of a stock varies with time. If the change in S is proportional to S, the growth is necessarily exponential. In the absence of fluctuations, then, the model assumes

dS/dt = mu S,  (16.11)
where the growth rate mu in the stock price presumably can be estimated from the growth rate in earnings of the stock. In the lowest order, one might expect the stock to execute a Brownian motion with fluctuation parameter sigma_H S. But this choice has two disadvantages. The first, as remarked by Hull (1989), Section 3.3 and Hull (2001), Section 10.3, is that investors expect to derive a return as a percentage of the stock value, independent of the price. Thus they classify stocks by their growth rate. To add Brownian motion, Eq. (16.11) is rewritten in the form

Hull: dx = d(ln S) = mu dt + sigma_H dz,  (16.12) [(3.7)|(10.6)]

where (3.7) refers to Hull (1989) and (10.6) refers to Hull (2001). Hull, of course, does not use the subscript H in his work. Here dz is the differential of a Wiener
process of pure Brownian motion, namely one whose mean remains zero, and whose standard deviation is (Delta t)^{1/2}. If phi(m, s) describes a normal process with mean m and standard deviation s, then the distribution of Delta x = Delta S / S is described correctly by

Delta x ~ phi(mu Delta t, sigma_H (Delta t)^{1/2}),  (16.13)
to allow for the growth of the mean value m = mu Delta t and the standard deviation sigma_H (Delta t)^{1/2} with time. This correct result, stated as Eq. (3.8) in Hull (1989) and Eq. (10.9) in Hull (2001), also avoids the second disadvantage. If S itself were described as a Wiener process, the price S could reach unacceptable negative values. In the accepted model, all that can happen is for ln S to go negative. We therefore advocate the use of Eq. (16.12) as the fundamental description of a model for stock prices. The model, Eq. (16.12), defines mu as the slope of the ensemble average of x(t) if an ensemble of measurements can be made. If not, one takes the logarithm of one sample price series and asks for the slope of the best linear fit to that series of price logarithms. Now we ask: what is the Ito stochastic differential equation (ISDE) for the stock S? Using Ito's calculus lemma, Eq. (10.41), with dS/dx = S and d^2 S/dx^2 = S, this leads to the ISDE for S,

dS = (mu + (1/2) sigma_H^2) S dt + sigma_H S dz.  (16.14)
A model called geometric Brownian motion is also often used in financial quantitative analysis, which is written as

dS = nu S dt + sigma_H S dz.  (16.15)

Then, with x = ln S, dx/dS = 1/S, and d^2 x/dS^2 = -1/S^2, Ito's lemma leads to

dx = (nu - (1/2) sigma_H^2) dt + sigma_H dz.

The fact mu != nu indicates that Ito's calculus lemma prohibits simply multiplying both sides of Eq. (16.12) by S, which would produce

dS = mu S dt + sigma_H S dz.  (16.16)
This simplest example shows the pitfalls in applying Ito's calculus lemma. But this kind of manipulation has appeared in some well-known books in the financial area. For example, in Hull's Eq. (10.6), with the word "or", Eq. (16.12) is rewritten in precisely the form of Eq. (16.16). Equations (16.12) and (16.16) are regarded as equivalent by Hull and the finance community. But we believe the second choice is not equivalent to the first.
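The log-price model, Eq. (16.12), can be simulated directly in the variable x = ln S, which also makes concrete why S = e^x can never go negative. A minimal Euler sketch (parameter values are illustrative):

```python
import math, random

def simulate_log_price(S0, mu, sigma, T, n_steps, rng):
    """Euler steps of the log-price model dx = mu dt + sigma dz, x = ln S.
    Because x is simulated, S = e^x can never go negative."""
    dt = T / n_steps
    x = math.log(S0)
    for _ in range(n_steps):
        x += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x  # = ln S_T

rng = random.Random(0)
xs = [simulate_log_price(100.0, 0.1, 0.2, 1.0, 100, rng) for _ in range(5000)]
m = sum(xs) / len(xs)
v = sum((x - m) ** 2 for x in xs) / len(xs)
print(m, v)  # near ln(100) + 0.1 ~ 4.705 and 0.2**2 = 0.04
```

The sample mean and variance of ln S_T reproduce the drift mu T and variance sigma^2 T of the model, as stated around Eq. (16.13).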
Before further analysis we must note an annoying but unimportant difference in notation. The standard Wiener notation is equivalent to the correlation

<dz(t) dz(u)> = delta(t - u) dt du.

This leads to a Brownian motion in which

<[z(t) - z(0)]^2> = t,

whereas the customary physics notation would have a factor of 2 placed on the right hand side of these equations. In this chapter we follow Hull's convention, common in the economics world, and set

dz(t) = f(t) dt,  <f(t) f(u)> = delta(t - u),  (16.19)
where f(t) dt is our definition of the standard fluctuation, as defined in Chapters 9 and 10. In this book, sigma_H is an abbreviation for sigma_Hull and sigma is an abbreviation for sigma_Lax.
Our approach is developed in Sections 8.2-8.3 and Sections 10.1-10.3; it uses the ordinary calculus rule for a change of variables in the Langevin stochastic differential equation (LSDE). By Eq. (16.19), Eq. (16.12) can be written in the form of our Langevin equation

dx/dt = mu + sigma_H f(t).  (16.20)
The LSDE approach allows us simply to multiply Eq. (16.12) by S, giving

dS/dt = mu S + sigma_H S f(t).  (16.21)

Equation (16.21) seems similar to Eq. (16.16); however, in our approach the average of the product in the second term is not zero. Thus Eq. (16.21) should not be regarded as an Ito equation. The custom in the economics field, under the definition of the Ito integral, is to replace Eq. (16.16) by an equation

Hull: dS(t) = mu S(t) dt + sigma_H S(t_c) dz(t),  (16.22)

where t_c = t - epsilon is a slightly earlier time than t. This guarantees that the average of the second term vanishes. This happens because in an integration from t to t + Delta t the time t_c is not included in the region of integration. Equation (16.22) is then a true Ito equation, but it is not equivalent to Eq. (16.21). Do these models, Eq. (16.12) and Eq. (16.16), or Eq. (16.21) and Eq. (16.22), yield different results?
Equation (16.21) has been solved for a well-behaved stochastic variable f(t) (finite, not a delta function), such as one with a Gaussian correlation in time (see Section 10.2). The average of the product in the second term is not zero in a finite time interval, and it remains nonzero as one approaches the white noise limit by letting the correlation time approach zero. By specializing to the delta-correlated case, Eq. (10.27) can be written in the form

Lax: S(T) = S(t) exp{mu (T - t) + sigma_H [z(T) - z(t)]}.  (16.23)

This is equivalent to the statement that

Lax: ln S_T ~ phi(ln S + mu (T - t), sigma_H (T - t)^{1/2}),  (16.24)
i.e., that ln S_T has a normal distribution with mean ln S + mu (T - t) and variance sigma_H^2 (T - t). For a small time interval Delta t, Eq. (16.24) reduces to Eq. (16.13), or Eq. (3.8|10.9) in Hull (1989|2001). However, Eq. (4.6|11.1) in Hull (1989|2001),

Hull: ln S_T ~ phi(ln S + (mu - (1/2) sigma_H^2)(T - t), sigma_H (T - t)^{1/2}),  (16.25)

is in disagreement with our result, and also with Eq. (3.8|10.9) in Hull (1989|2001). How did this discrepancy of (1/2) sigma_H^2 arise? Applying our Langevin approach to Eq. (16.21), the conditional average of dS/dt is

<dS/dt> = (mu + (1/2) sigma_H^2) S,  (16.26)

which is in agreement with Eq. (16.14). For the full average, S on the right hand side of Eq. (16.26) is replaced by <S>. In our Langevin expression, the first term represents motion driven by a finite number of driving forces applied to the system and hence is a deterministic function of time. The second term represents the fluctuation driven by many unknown random forces. Under a transformation of variables, the ordinary calculus rule can be applied, separately, to both terms. Hence the meanings of drift and fluctuation are kept clearly separate in the first and second terms after the transformation of variables. The average of the second term is generally nonzero when the fluctuation coefficient sigma_H S is not constant.
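The (1/2) sigma_H^2 shift in the growth rate can be checked by direct sampling: if ln S_T is drawn from the distribution of Eq. (16.24), the mean of S_T itself grows at rate mu + (1/2) sigma_H^2, not mu. A small illustration (parameter values arbitrary):

```python
import math, random

mu, sigma, T, S0, N = 0.05, 0.4, 1.0, 100.0, 100000
rng = random.Random(1)

# Sample S_T = S0 * exp(mu*T + sigma*sqrt(T)*Z) exactly, Z standard normal,
# which is the content of the lognormal statement (16.24).
mean_ST = sum(S0 * math.exp(mu * T + sigma * math.sqrt(T) * rng.gauss(0.0, 1.0))
              for _ in range(N)) / N

print(mean_ST)                                      # Monte Carlo estimate
print(S0 * math.exp((mu + 0.5 * sigma ** 2) * T))   # nu = mu + sigma^2/2
print(S0 * math.exp(mu * T))                        # naive e^{mu T}: too small
```

The Monte Carlo average lands on S0 e^{(mu + sigma^2/2)T}, exhibiting numerically the nonvanishing average of the fluctuation term discussed above.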
16.5 The Ito stochastic differential equation
Because mathematicians concentrate on Brownian motion, which is rather singular in behavior, it is not clear how to define the integral of a product of a random variable and a random force. In particular, following our notation of Section 10.1, it is not clear how to convert the differential equation, Eq. (10.1),

da/dt = B(a, t) + sigma(a) f(t),  (16.27)
to an integral. The Riemann sum of calculus would be
where The Riemann integral exists if the sum approaches a limit independent of the placement of tj in the interval in Eq. (16.29). The Riemann integral exists, according to Jeffries and Jeffreys (1950), when the integrand is bounded over the interval of integration and for any positive ui and 17, the interval of integration can be divided into a finite set of intervals such that those with hops (jump discontinuities) > (jj have a total length < 17. Our point is that Brownian motion violates these condition. See Ito (1951), and Doob (1953). Ito (1951) avoids the difficulty by evaluating a (a) at the beginning of the interval, and evaluating the integral over /(£) as a Stieltjes integral
However, even the Stieltjes sum does not converge to a unique integral, and the evaluation at the beginning of the interval is an arbitrary choice. The effect of this choice, since f(s) is independent of a(t) for t < s, is that the average of the second term in Eq. (16.27) vanishes, so that the Ito drift vector is simply B(a, t), in contrast to our result, Eq. (10.14). Stratonovich (1963) makes the arbitrary but fortuitous choice of using

sigma(a) -> sigma([a(t_{j-1}) + a(t_j)] / 2),

the average of the values at the two end-points of each interval.
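The difference between the two evaluation rules is not a small-step artifact; it survives the limit of vanishing step size. A numerical sketch for the integral of W dW, comparing the pre-point (Ito) and two-end-point-average (Stratonovich) sums on one Brownian path:

```python
import math, random

def ito_vs_stratonovich(T=1.0, n=20000, seed=2):
    """Evaluate the sum over W dW along one Brownian path with the
    pre-point (Ito) rule and the end-point-average (Stratonovich) rule.
    The two answers differ by T/2, not by a vanishing amount."""
    rng = random.Random(seed)
    dt = T / n
    W = [0.0]
    for _ in range(n):
        W.append(W[-1] + math.sqrt(dt) * rng.gauss(0.0, 1.0))
    ito = sum(W[j] * (W[j + 1] - W[j]) for j in range(n))
    strat = sum(0.5 * (W[j] + W[j + 1]) * (W[j + 1] - W[j]) for j in range(n))
    return ito, strat, W[-1]

ito, strat, WT = ito_vs_stratonovich()
print(strat - ito)            # -> T/2 = 0.5 as n grows
print(strat - 0.5 * WT ** 2)  # Stratonovich reproduces ordinary calculus: ~0
```

The Stratonovich sum telescopes to W(T)^2 / 2, the ordinary-calculus answer; the Ito sum falls short of it by T/2, which is the same (1/2) sigma^2 bookkeeping discussed throughout this section.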
It is intuitively clear that an average of the end-points is better than using either one. But is this procedure always correct? It is just as ad hoc as Ito's procedure. In Lax (1966IV), and in Section 10.1, Lax uses an iterative procedure that yields all results accurate to order Delta t in two steps. The result, Eq. (10.14) and Lax (1966IV) Eq. (3.10), yields the total drift

A(a) = B(a, t) + (1/2) sigma(a) d sigma(a)/da,
where D = sigma(a)^2 is the diffusion coefficient. This result is in agreement with that found by Stratonovich. The justification for our procedure is that physical processes are described by noise that is only approximately white. For the physical process, one can use the ordinary methods of calculus. The iteration is necessary to retain terms that keep a finite value in the limit as the correlation of the noise approaches a delta function. Direct use of the Ito choice, Eq. (16.32), starting from Eq. (16.16), leads to Eqs. (4.6|11.1) in Hull (1989|2001), the result quoted in our Eq. (16.25). We simply claim that this result is not the answer to the original model, namely that the logarithm of the price obeys the standard Brownian motion. A direct proof of this remark can be made without using stochastic integrals. Note that the Gaussian distribution P(x, t) satisfies the Fokker-Planck equation

dP(x,t)/dt = -A dP(x,t)/dx + D d^2 P(x,t)/dx^2,
which contains the constant diffusion term D = (1/2) sigma_H^2 and the constant drift term A = mu. One can obtain the equation for S from the equation for P(x, t) by introducing the relation x = ln[S/S(0)]:

P(S, t) = P(x, t) dx/dS = P(x, t)/S.
The result after some labor is

dP(S,t)/dt = -(mu + (1/2) sigma_H^2) d[S P(S,t)]/dS + (1/2) sigma_H^2 d^2[S^2 P(S,t)]/dS^2.
The Ito equation corresponding to this Fokker-Planck equation is simply

dS(t) = (mu + (1/2) sigma_H^2) S(t) dt + sigma_H S(t_c) dz(t),  (16.39)

written in our notation, where t_c = t - epsilon guarantees that the last term averages to zero. The same equation written in Ito notation looks like:

dS = (mu + (1/2) sigma_H^2) S dt + sigma_H S dz.
The confusion in the financial literature arises because Eq. (3.7|10.6) in Hull (1989|2001) states that his model is

Lax: dx = d(ln S) = mu dt + sigma_H dz,

but by occasionally multiplying this equation by S (without the strict use of Ito's calculus lemma) he obtains

Hull: dS = mu S dt + sigma_H S dz.
In summary, we do not claim that the Ito definition is wrong, but it requires extreme care to obtain correct results. It tends to mislead smart people into obtaining an incorrect answer. The proper intuitive view of the Black-Scholes model is that x, the logarithm of the price, obeys a standard Brownian motion. In other words, Eq. (16.12) is the correct model regardless of which calculus is used. When one makes a change of variable from the logarithm x to the actual price S, the appropriate stochastic differential equation that obeys the Ito rules will be Eq. (16.39). The differences between Hull's results and mine (Lax) (as well as with some of his own results) are due primarily to the use of two different models. The Ito notation merely obscures this point. It can be used, with great care, to obtain correct results. Models based on Ito's lemma should be avoided because they are counterintuitive for physical reasons. On the other hand, the procedure used in Section 10.3 reduces the number of errors and avoids the use of Stieltjes and Lebesgue integration. The main disadvantage of our proposal is that it will reduce the number of jobs for mathematicians teaching measure theory. The discussion of models for stock prices and market behavior can then be devoted more heavily to real-world questions and less heavily to formalism. For volatile stocks, the difference between the two possible market models, Eq. (16.12) and Eq. (16.16), can be appreciable. In a completely rational world, the growth parameter would be determined completely by the growth rate in earnings per share. Since this is not the case in the real world, the parameter mu is obtained by fitting against stock prices. Depending on how this is done, the fitting procedure might cancel the error in the use of the model Eq. (16.16) instead of Eq. (16.12).
In our work on laser line-widths discussed in Chapter 11, the growth and decay rates can be separately determined, and there is no flexibility in our choice. The excellent agreement of the laser line-widths with experiment, as shown in Fig. 11.3, supports the iterative procedure used in Chapter 10 for relating the Langevin to the Fokker-Planck pictures. We expect that the mathematical techniques developed for the study of random processes in physical systems can be applied in the future to the economic and financial worlds.
16.6 Value of a forward contract on a stock
As the simplest application of Ito's lemma, Hull (Example 4.1) considers the value of a forward contract on a nondividend-paying stock. We already found in Eq. (16.2) that the forward price should be

F = S e^{r(T-t)}.  (16.41)
However, the dynamics of S implies an associated dynamics of F. When Ito's lemma is used, Hull's Eq. (4.16), which applies to any function F of S and t, is

dF = (dF/dt + nu S dF/dS + (1/2) sigma_H^2 S^2 d^2 F/dS^2) dt + sigma_H S (dF/dS) dz.
The extra term involving d^2 F/dS^2 due to Ito's lemma vanishes for the choice of Eq. (16.41), since F is linear in S. Thus F obeys a dynamics, including the noise, similar to that of the stock price S, but with the growth rate nu reduced by the risk-free interest rate r:

dF = (nu - r) F dt + sigma_H F dz.  (16.43)

However, in Eq. (16.43), nu = mu + (1/2) sigma_H^2 when the physical stock model is used in Ito's formula, which leads to [mu - r + (1/2) sigma_H^2] F dt in the first term. Now we use our Langevin approach described in Section 10.3, where the ordinary calculus can be used. From Eq. (16.12), we have for S

dS/dt = [mu + sigma_H f(t)] S,  (16.44)
where Eq. (16.19) is used. For F, we have

dF/dt = (mu - r) F + sigma_H F f(t).  (16.45)
In order to obtain the average d<F>, using

<F(t) f(t)> = (1/2) sigma_H <F>,  (16.46)

we have

d<F>/dt = [mu - r + (1/2) sigma_H^2] <F>,

and for the conditional average, <F> on the right hand side is replaced by F, which is in agreement with the result of Ito's approach.
Our Langevin approach is easier than that using Ito's lemma. Instead of applying Ito's calculus lemma at each step of the transformation from x to S, and then to F, the ordinary calculus rule can be used at each step in our approach, and d<F>/dt is then determined using Eq. (16.46) at the last step.
16.7 Black-Scholes differential equation
We showed in Section 16.2 that the risk of owning an asset could be canceled out by also holding a forward contract to sell the asset, and the combination is risk neutral provided that an appropriate value F is set for the forward contract. Can this scheme be extended to the case when the asset is a stock that has a growth rate nu S and is subject to noise proportional to the value of the stock? It is assumed that the derivative asset (say a put) has a value g = g(S, t) for one derivative security. Using Ito's lemma, g(S, t) obeys the stochastic differential equation

dg = (dg/dt + nu S dg/dS + (1/2) sigma_H^2 S^2 d^2 g/dS^2) dt + sigma_H S (dg/dS) dz,  (16.48)
where, by the Ito convention, the second term has a vanishing average value. To obtain a risk-free portfolio, we must have a combination of assets in which the term related to the growth rate nu of the stock vanishes. This can be accomplished by combining a put, the equivalent of -1 shares, which takes the value -g, with dg/dS shares of the stock. The combined value is

Pi = -g + (dg/dS)_t S,  (16.49)
where the subscript t reminds us that the number of shares does not change during the time evolution, except when it is adjusted by the investor. The ratio of these two components was chosen so that the nu contributions cancel each other. This cancellation occurs if one follows Hull and writes

Delta Pi = -Delta g + (dg/dS)_t Delta S.  (16.50)
Using Eq. (16.48) for Delta g and Delta S = nu S Delta t + sigma_H S Delta z, the result is

Delta Pi = [-dg/dt - (1/2) sigma_H^2 S^2 d^2 g/dS^2] Delta t.  (16.51)
Assuming the validity of this result, the value of Pi must grow at the interest rate r, or arbitrageurs could make a risk-free profit:

Delta Pi = r Pi Delta t.  (16.52)
The result is the differential equation

-dg/dt - (1/2) sigma_H^2 S^2 d^2 g/dS^2 = r [-g + S dg/dS].

Thus we obtain the Black-Scholes equation:

dg/dt + r S dg/dS + (1/2) sigma_H^2 S^2 d^2 g/dS^2 = r g.  (16.54)
In the above Ito equations, nu = mu + (1/2) sigma_H^2. However, nu has been canceled out in the Black-Scholes equation. Now we use our Langevin approach, described in Section 10.3, for the combined asset Pi in Eq. (16.49). Using the ordinary calculus, the Langevin equation for Pi is given by Eq. (16.55). The contribution to the average, or to the conditional average with Pi(t) = Pi, from the second term in Eq. (16.55) is the average of the product given in Eq. (16.56). The resulting average of Delta Pi is the same as Eq. (16.51). If the definition of the Ito integral is applied, the second term in Eq. (16.55) should be zero, because sigma(g) is evaluated at time t, and hence the fluctuation term is completely canceled. However, according to our point of view, over a very short period of time the change of sigma(g) leads to the nonzero average of the second term, Eq. (16.56), and it is impossible for an investor to adjust his shares so quickly.
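The cancellation of nu can also be checked numerically: the standard closed-form European call value (a textbook result, used here only for illustration) satisfies g_t + r S g_S + (1/2) sigma^2 S^2 g_SS = r g with no reference to the stock's growth rate. A finite-difference sketch:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes European call, tau = T - t."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

# Verify the PDE  g_t + r S g_S + 0.5 sigma^2 S^2 g_SS = r g  by central
# differences.  Note g_t = -dg/dtau since tau = T - t.
S, K, r, sigma, tau, h = 100.0, 95.0, 0.05, 0.3, 0.5, 1e-3
g = bs_call(S, K, r, sigma, tau)
g_t = -(bs_call(S, K, r, sigma, tau + h) - bs_call(S, K, r, sigma, tau - h)) / (2 * h)
g_S = (bs_call(S + h, K, r, sigma, tau) - bs_call(S - h, K, r, sigma, tau)) / (2 * h)
g_SS = (bs_call(S + h, K, r, sigma, tau) - 2 * g + bs_call(S - h, K, r, sigma, tau)) / h ** 2
residual = g_t + r * S * g_S + 0.5 * sigma ** 2 * S ** 2 * g_SS - r * g
print(residual)  # ~0: only r appears; mu and nu have dropped out
```

Only the risk-free rate r enters the check, which is the content of the risk-neutral argument above.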
16.8 Discussion
Stock model
The motivation for separating the right hand side of the stochastic differential equation into two terms, and for assigning a standard fluctuation force f(t) to the second term, comes not from mathematics but from modeling of the real world. In physics, using a physical model of a system with a finite number of parameters (for example, an oscillator, or a system of coupled oscillators) and using the physical
rules (for example, Newton's laws), the deterministic drift velocity B(a, t) of a system can be determined, and it should be a smooth function of time t. On the other hand, the fluctuation force comes from the connection of this system with a "heat reservoir", which has infinite dimensions. See Sections 7.6 and 7.7 for a detailed discussion. Concerning the price of a stock, in a completely rational world the growth parameter of the price would be determined by the growth rate in earnings per share and other known conditions. Since this is not the case in the real world, the stock price fluctuates because of many unknown causes. The fluctuation force introduces an irregular, nondeterministic, and very fast-varying oscillatory motion of S with time. The LSDE builds a model to represent the effects of these two different kinds of forces on the system. Of course, there are some stochastic processes (for example, scattering in a turbid medium in physics) where fluctuation plays the dominant role, and separating the equation into two terms in an LSDE is not proper. Here we restrict ourselves, however, to cases where the model in the LSDE is suitable, as well as in the ISDE. The equation of motion described by a linear model of a stock, Eq. (16.12) or Eq. (16.15), is only the first perturbative approximation of a nonlinear model, and is only valid over a very short period from the spot time. For a longer period of time, the first term in this equation implies that the price S will increase (or decrease) exponentially, which certainly does not reflect the real development of S. Also, the second term implies that the width of the fluctuation of S will increase with time, with no force to limit the amplitude of the fluctuation. We suggest the following model for a stock S. First, a model for the underlying value S_0(t) of a stock can be built based on fundamental analysis; it is a deterministic and slowly varying function of time.
One may also build a linear model of S_0(t) by extrapolation of previous data for S, S_0(t) = eta (t - t_0) + S_0(t_0), where S_0(t_0) is the extension of the previous value of S_0 up to the spot time t_0, and eta is the rate of increase (or decrease) of the stock. The function S_0(t) provides an "operating point" for the stock S as a random variable. We define S~(t) = S(t) - S_0(t), and build a stochastic differential equation for S~(t). The stock value S(t) oscillates around S_0(t); hence B(S~, t) represents a model of the drift velocity of S~. The simplest model for this purpose is a harmonic oscillator, where a linear restoring force F = -alpha S~ makes S~(t) oscillate around S~ = 0 with a certain amplitude. The amplitude of S~(t) limits the value of S(t) to a certain range. When a fluctuation force is added, the value of S(t) will not diffuse to infinity with time. However, the velocity of a harmonic motion is too fast when S~ is near zero, and too slow when S~ is near its amplitude value. The real situation is that S(t) changes slowly when its value is near S_0(t), but rapidly when its value
deviates greatly from S_0(t). This suggests that an anharmonic model is needed; for example, the restoring force can be chosen as F = -alpha S~ - beta S~^3. Recently, a form B(S~, t) = -alpha S~ has often been used, which originates from the Ornstein-Uhlenbeck model. Under this linear model S(t) will approach S_0(t) exponentially from its spot value. The advantage of this model is that it is easy to manipulate in calculations. However, one may ask why S(t) only approaches S_0(t) from one side, and cannot cross through S_0(t) to the other side. If S~ is limited to oscillate in a certain range, sigma(S) in the second term of the SDE should not change dramatically, and the extra term in Ito's lemma (or the contribution to the expectation from the second term in the LSDE) may be relatively small and can be treated by some approximation, which reduces the difficulty in solving the equation of motion of the expectation value of a derivative security with time.

Conditional expectation
We emphasize that stochastic processes in finance, like those in physical science, are natural processes. People who have some prior information and knowledge may build a better model to match the development of the natural stochastic process of a marketed asset but, in general, do not change this natural process. The state of a random variable a at time t is best described by its probability distribution P(a, t), not by its spot value realized at time t, because the spot value of a is undetermined and jumps up and down very quickly. Based on this viewpoint, we would like to discuss the concept of the conditional expectation. Mathematicians denote the conditional expectation by the symbol E(X_{t'} | F_t), t' > t, where X_{t'} is a stochastic process, and F_t is a sigma-field known as the "natural filtration". For example, the estimate of the price of an option (or future) H(T - t) with maturity time T is determined based on the spot value of its underlying stock S at the current time t. Put differently, one already has the information that the closing price today of S(t) is S. This information provides a filtration under which the expectation of S(t') in the future can be calculated. However, one may ask whether the price H(T - t) determined from S at 4:00 PM, or that from S at 3:50 PM, is more reasonable, since the spot value of S may differ remarkably through irregular jumps during the last ten minutes before closing. In our opinion, the expectation based on the probability distribution P(S, t) at the spot time t, not the conditional expectation, could provide a more reasonable estimate of H(T - t), because the probability P(S, t) cannot change dramatically during a small interval of time, whereas its realized value can jump up or down during a very short period. In physics, we use an "ensemble" to describe all possible states of a system under a certain probability, and do not regard a single sample in an ensemble as a meaningful quantity.
In Monte Carlo simulations we do not regard a single
path as essential; similarly, the point-by-point values in the time path of a stock are not essential.

Discrete processes
Discrete random processes, for example the early stages after a sudden big jump of a stock, are beyond the LSDE description and also beyond the ISDE. In discrete random processes, the nth-order diffusion coefficients D_n are nonzero up to infinite order in n, and Eq. (10.4) should be replaced by a series of equations:
In general, this is a difficult problem, since an infinite number of parameters is needed to describe the detailed shape of the probability distribution function. In some special cases, for example the Poisson process, the generation-recombination process, and the Rice shot noise process, the parameters describing the fluctuation can be reduced to a finite number. Chapters 8 and 9 provide a theoretical approach in which cumulants higher than second order are involved, so that the distribution will be non-Gaussian. In this case a generalized Fokker-Planck equation, instead of the ordinary Fokker-Planck equation, should be used, as discussed in Section 8.4. A similar example in physics is that of particles injected into a scattering medium with a certain velocity; the process is then not a pure Brownian motion. At early times, the particles move ballistically. Through multiple collisions with scatterers in the medium, the particle distribution changes with time from ballistic-like to Gaussian-like. This process is studied quantitatively in Chapters 13 and 14. Similar phenomena often appear in the stock market.
16.9 Summary
Our viewpoint, expressed in Lax (1966IV) and Section 16.4, is that mathematicians have concentrated too exclusively on the Brownian motion white noise process, which has delta correlation functions. Real processes can have a sharp correlation of finite width. Thus their spectrum is flat, but not up to infinite frequency. For such real processes, the Riemann sums do converge, and no ambiguity exists. After the integration is performed, the correlation time can be allowed to go to zero; that is, one can then approach the white noise limit. Thus the ambiguity is removed by taking the integration limit and the white noise limit in the correct order. In summary, there are two kinds of stochastic differential equations: (a) those whose random term need not average to zero, as used in Eq. (10.33), and (b) those used by Ito in Eq. (16.32), in which the average of the random term vanishes
by Ito's definition of the stochastic integral, together with Ito's calculus lemma. Both can be used correctly, but Ito's choice requires more care because the usually permissible ways in which we handle equations are no longer valid. An able analysis of this situation, with references to earlier discussions, is presented by van Kampen (1992). Hull's (1989, 2001) book is an extremely well-written text on options, futures, and other derivative securities. The Ito-Stratonovich controversy applies to physics, chemistry, and other fields. We have analyzed Hull's treatment in detail because of his use of the Black-Scholes (1973) work. Although the Ito choice can be dangerous, as shown above, we trust that by now the Ito choice is used consistently in practice. However, there may be more serious problems, since real options may have statistics based on more wildly fluctuating processes than Brownian motion, such as Lévy processes and fractal processes. This has recently been emphasized by Peters (1994), by Mandelbrot (1997), and by Bouchaud and Sornette (1994). For an excellent review of Brownian and fractal walks see Shlesinger (1997). Ghashghaie et al. (1996) have also found a parallelism between prices in foreign exchange markets and turbulent fluid flow. Thus the conventional Brownian motion approach will be invalid in such markets.
17
Spectral analysis of economic time series
17.1 Overview
Nature of the problem
Most of our discussion of random variables has referred to variables existing over continuous time. The phrase "time series" emphasizes variables known only at a discrete set of times. The phenomenon may actually be continuous in time, but experimental measurements may have been made only at discrete times. Closing (and/or opening) stock market prices occur only once each day, although prices may be available minute by minute. Extrapolation, Interpolation and Smoothing of Stationary Time Series, the title of a book by Wiener (1949a), indicates the problems to be solved. In Section 17.10 we present a sampling theorem demonstrating that a band-limited function sampled at the Nyquist rate, 1/(2W), where W is the bandwidth, can be deduced at all points from its sample values on a discrete lattice. In this chapter, however, we face the more difficult problem of answering the same question when data are available only over a finite number of points. Even more difficult is to do the smoothing, interpolation, or extrapolation when we deal not with a deterministic function but with a random variable. Norbert Wiener (1930) laid the foundations of this subject. In his Generalized Harmonic Analysis, he also developed prediction methods used in gun control that were first made public in Wiener (1949b). A brief summary of Wiener's work, leading to a derivation of a Wiener-Hopf type of integral equation, is given by Levinson (1947). The smoothing, or filtering, part of Wiener's work is an attempt to extract signal from contaminating noise, and has much in common with our work in Chapter 15 on "Signal Extraction in the Presence of Smoothing and Noise". However, the word smoothing is used in two different senses. In Chapter 15, the smoothing is that which occurs in any measurement process, and one tries to invert this process to obtain a sharp signal from a blurred one.
In Wiener's case, one merely wishes to smooth away the effects of added noise to get at what the signal would be if no noise were present. Statisticians are typically more at home in the time domain associated with "time series" whereas communication engineers prefer the frequency domain. These two are united by the Wiener-Khinchine theorem that establishes the (time)
autocorrelation as the Fourier transform of the noise spectrum. Wold's theorem is the corresponding relationship for a discrete time series. Whether the problem is discrete or continuous, it is clear that a principal problem is to determine the spectrum of frequencies contained in the data. Presumably, if one can determine the spectrum of the nondeterministic part of the motion, one can smooth out, or attempt to eliminate, the random part of the motion, and better predict the nonrandom part. Actually, one usually proceeds in the opposite direction, first eliminating any secular part of the motion and any sharp (periodic) frequency components. This last part includes making seasonal corrections to the data. Then a better estimate of the residual spectrum can be made. Presumably, better results can be obtained by using a combined procedure, or by iterating the trend removal and spectral analysis procedures.

Why is the problem so hard?

Why is the spectral analysis of time series so hard that it has spawned dozens of books and hundreds of papers, often recommending different procedures? After all, a direct Fourier transform of a time series can be performed using the FFT (fast Fourier transform), which is normally a reliable, accurate procedure. One problem is that for continuous time data, a Fourier integral is only given correctly by a discrete sum if the function is band-limited. See the discussion of the sampling theorem in Section 17.10. This difficulty is compounded by the fact that even if the trend or periodic components in the data are band-limited, the noise present in the data is not frequency limited. The result is that the spectral calculation is an ill-posed problem whose answer, like all the problems discussed in Chapter 15, is nonunique until the problem is regularized by some smoothing procedure. The customary smoothing procedure in the time series case is referred to as windowing or tapering. Harris (1967) compares 23 different classes of windows.
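The effect of a taper can be seen in a few lines: a sinusoid whose frequency falls between DFT bins leaks power across the whole band under the rectangular window, while a Hann taper (one of the simplest of these windows) suppresses the distant sidelobes. The direct O(n^2) transform below is chosen for transparency, not speed; the leakage measure is an illustrative definition.

```python
import cmath, math

def periodogram(x, window):
    """|DFT|^2 of the tapered series: a crude spectral estimate."""
    n = len(x)
    xw = [xi * window(i, n) for i, xi in enumerate(x)]
    return [abs(sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2)]

def rect(i, n):
    return 1.0

def hann(i, n):
    return 0.5 - 0.5 * math.cos(2 * math.pi * i / n)

# Sinusoid between DFT bins (frequency 10.5 cycles per record).
n = 128
x = [math.sin(2 * math.pi * 10.5 * t / n) for t in range(n)]
P_rect = periodogram(x, rect)
P_hann = periodogram(x, hann)

def leak(P):
    """Power far from the true line, relative to the peak."""
    return sum(P[k] for k in range(n // 2) if abs(k - 10.5) > 3) / max(P)

print(leak(P_rect), leak(P_hann))  # Hann leakage is far smaller
```

This is the trade behind all windowing schemes: the taper widens the main lobe slightly (adding bias) in exchange for drastically lower sidelobes (less leakage).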
Priestley (1981) gives details of 11 windows, and Percival and Walden (1993) use four windows, Bartlett, Daniell, Parzen, and Papoulis, in examples, and describe the Hanning and Hamming windows. The problems in this chapter are more difficult than those discussed in our previous chapters because here we are dealing directly with unaveraged data. In our earlier work, we started from some physical model and obtained equations, such as Fokker-Planck equations, obeyed by the ensemble average. Here the aim is to deduce the model from the data. A compromise procedure involves introducing a model described by a small number of parameters, and using the data to deduce these parameters. Such parametric models are discussed in many texts. See, for example, Brockwell and Davis (1991). The rectangular window, a direct transform, is equivalent to what we have called the engineering definition of noise in Section 4.1. The unbiased form of this window is called a periodogram, and dates back to Schuster (1898). However,
SPECTRAL ANALYSIS OF ECONOMIC TIME SERIES
Thomson (1982) has shown that although the method converges as N → ∞, in a typical physical example, reasonable results were only obtained for N > 10^6. There is a biased form of the periodogram that uses a factor 1 − |τ|/N that adds bias but reduces variance, with a net improvement. An optimum discrete window was described by Eberhard (1973). The window coefficients are given by a discrete prolate spheroidal sequence (DPSS). The fundamental importance of discrete prolate spheroidal functions was explained in terms of the uncertainty principle in a series of papers by Slepian (1964, 1965, 1978), with the initial paper by Slepian and Pollak (1961). Slepian and Pollak (1961) recognized that the uncertainty principle imposes limitations on the attempt to fit a non-band-limited function, known only over a finite interval −T/2 < t < T/2, by a function band-limited over the frequency band −W < f < W. One can obtain an extremely accurate fit at the expense of wild oscillations from components outside the bandwidth. Slepian and Pollak (1961) proposed getting a least squares fit inside the band −W < f < W, while maintaining a fixed energy outside the bandwidth. Their result leads to a variational principle in which the fraction of the total energy inside the bandwidth is maximized. This turns out to be the equivalent of maximizing the fraction of energy within the time interval −T/2 < t < T/2. The variational expression leads to an integral equation whose solutions are recognized to be prolate spheroidal wavefunctions. These eigenfunctions of the integral equation constitute an orthogonal basis set. Thomson (1982) arrives at the same integral equations in a completely different manner by seeking a direct inversion of the relation between spatial and frequency amplitudes to obtain the frequency amplitude rather than just the power spectrum (or absolute squared amplitude). In this way, phase information is retained.
Thomson's contributions
My (Lax) interest in the field of time series has been greatly stimulated by personal contact with David J. Thomson, who has spent a lifetime career covering all aspects of time series. We can describe the present chapter as our attempt to learn enough about time series to be able to read Thomson's work. We shall therefore record a subset of his publications to indicate the breadth of topics covered. His work started, appropriately for Bell Laboratories, with the analysis of time series in waveguides used to transmit information in the telephone network. See Thomson (1977). This work was expanded in Thomson (1982), already referred to. His 1982 paper constitutes the foundation of much of his later work. In Kleiner, Martin, and Thomson (1979), Thomson shows how to apply Tukey's ideas of robustness to spectral estimation. Tukey's book on Exploratory Data Analysis shows how to deal with real data. When is an apparently deviating point an outlier to be discarded? See Tukey (1977) and Thomson (1982). His work was also applied to the
global warming problem in Kuo, Lindberg, Craig, and Thomson (1990). This first serious paper on 'recent' climate has been cited by Al Gore! Thomson (1990a) extended the analysis of the earth's climate to a period of 20,000 years, and correlated CO2 data with tree ring data. His next work, Thomson (1990b), extended time series over 600,000 years and established a sensitivity of the results to the small time differences between the sidereal year, the equatorial year, and the solar year. Thomson and Chave (1991) also adapted jack-knife procedures to deal with nonnormal variables and confidence limits. The current picture of global heating is discussed in Thomson (1995). Thomson, MacLennan, and Lanzerotti (1995) use time series techniques to analyze the propagation of solar oscillations through the interplanetary medium. Perhaps Thomson's most important work on global heating filters the precession signal out of the data and establishes a strong correlation between global warming and the CO2 concentration. See Thomson (1997). An application is made to the financial world by studying stock and commodity data over a 40 year period; see Ramsey and Thomson (1999). A comprehensive review was given by Thomson (1998) of his work on "Multitaper Analysis of Nonstationary and Nonlinear Series Data", presented at the Isaac Newton Institute.
17.2
The Wiener-Khinchine and Wold theorems
The purely random part of a time series is usually described as a superposition in frequency space of various spectral contributions. The inverse of Eq. (4.48) in our notation [with a(t) replaced by x(t) and a(ω) replaced by x(f)] is

x(t) = ∫ e^{j2πft} x(f) df.
It is consistent to think of x(t) as a realization or sample of the random variable X(t). In statistics books the above equation is written in Stieltjes notation as

X(t) = ∫ e^{j2πft} dZ(f),
where a comparison of these notations suggests

dZ(f) = x(f) df.
The Stieltjes form is needed for mathematical rigor, when the spectrum of X contains Brownian motion (white noise), or a delta function of time autocorrelation. In the real world, in which the noise can be approximately white, and spectral lines are narrow, but not infinitely so, Stieltjes integrals are unnecessary, and our simpler
notation can be followed. Even when white noise is present, the second moments are defined via Eq. (4.50) and the relation
The first and last terms follow the notation of the time series texts by Percival and Walden (1993) and Priestley (1981), and the middle terms follow the notation in this book. The Wiener-Khinchine theorem (for continuous time), Eq. (4.11), takes the form
For the discrete time case, with x(t) known only at the integers t = 0, ±1, ±2, ..., the same theorem applies but the limits of integration in Eq. (17.5) extend only from −1/2 to 1/2. That is because exp(j2πft) is indistinguishable from exp[j2π(f + 1)t] since t is an integer. Thus all frequencies outside the basic bandwidth from −1/2 to 1/2 are folded into that basic bandwidth. This process is referred to as aliasing, and is well known to anyone who has watched rotating wheels in the movies (at 1/24th of a second) and found that they can appear to be rotating backward. In the study of crystals, there is a similar folding of all wave vectors into the first Brillouin zone. Since the spectrum S(f) is, by definition, positive, the normalized p(f) = S(f)/∫S(f) df is a probability density. Correspondingly, we define the normalized autocorrelation ρ(τ). Priestley (1981) describes the Wold (1938) theorem, Eq. (17.7), below, as the necessary and sufficient condition that the set of numbers ρ(±n) can be an autocorrelation. The Wold condition is

ρ(τ) = ∫_{−1/2}^{1/2} e^{j2πfτ} p(f) df,
that is, the autocorrelation ρ(τ) must be the Fourier transform of a probability density p(f). Assuming that we are dealing with a stationary random process, the first objective in dealing with a time series is to obtain its spectrum. The simplest procedure, for a single discrete sample of x(t) for t = 1, 2, 3, ..., N, is to use the estimate

Ŝ(f) = (1/N) |Σ_{t=1}^{N} x(t) e^{−j2πft}|².
This is referred to as the Schuster (1898) periodogram in the statistics literature. It is equivalent to what we have called our standard engineering definition of noise,
Eq. (4.3), except that the latter definition takes an ensemble average and a limit as N → ∞. In time series analysis the ensemble and time averages are not available since we have only one finite set of data. A closely related estimate of the periodogram can be obtained by using the Wiener-Khinchine theorem to get the spectrum as the Fourier transform of the autocorrelation, and then estimating the latter from the sample data. For times, t, separated by Δt instead of unity, the spectrum is given in terms of the correlations by
where the autocorrelation estimate ŝ_τ is given by
with f_N = 1/(2Δt) the Nyquist sampling frequency. If one makes the common choice of units such that Δt = 1, one obtains Eq. (17.8). Equation (17.10) is biased. If all the summands were unity, the result would be 1 − |τ|/N. The correlator associated with this periodogram can be modified by replacing 1/N by 1/N′ in Eq. (17.10); with N′ given the value N − |τ|, the "periodogram" is said to be unbiased. However, due to variance errors, the biased (original) periodogram is often superior. We here follow the notation and approach of Percival and Walden (1993), including the assumption that the process mean is zero, which is easily eliminated by subtracting the mean from each variable. By rearranging the order of summation, the inverse of Eq. (17.10) then permits the spectrum to be estimated by
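The equivalence between the direct periodogram and the Fourier transform of the biased autocorrelation estimate can be checked numerically. This sketch assumes NumPy; the test series and the frequency at which the two forms are compared are arbitrary choices:

```python
# Check: the periodogram (1/N)|sum_t x_t e^{-j2pi f t}|^2 equals the Fourier
# transform over all lags of the biased autocorrelation estimate
# (1/N) sum_t x_t x_{t+|tau|}.  This is an algebraic identity.
import numpy as np

rng = np.random.default_rng(0)
N = 256
t = np.arange(N)
x = rng.standard_normal(N)
x -= x.mean()                        # assume, as in the text, zero process mean

f = 0.17                             # any frequency in (-1/2, 1/2)
S_direct = np.abs(np.sum(x * np.exp(-2j * np.pi * f * t))) ** 2 / N

# Biased autocorrelation for tau = -(N-1), ..., N-1 (symmetric in tau)
r = np.correlate(x, x, mode="full") / N
tau = np.arange(-(N - 1), N)
S_acf = np.sum(r * np.exp(-2j * np.pi * f * tau)).real

print(S_direct, S_acf)               # identical up to roundoff
```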
Thomson and Chave (1991), however, claim that the use of autocorrelations and spectral densities throws away phase information. Thomson therefore follows a procedure outlined in Section 17.4.
17.3
Means, correlations and the Karhunen-Loeve theorem
Means and correlations
The mean of the set of random variables (RV) X_t at all times t is given by
where the last step is valid if the process is restricted to be stationary. An unbiased estimate of the mean is given by

X̄ = (1/N) Σ_{t=1}^{N} X_t,
since E(X̄) = μ. Note that X̄ is still a random variable. No ensemble average has been performed. The variance of X_t is defined by

var{X_t} = E{(X_t − μ_t)²} = σ²,

where, again, the last expression is valid only in the stationary case. The covariance of two RV is defined by

cov{X, Y} = E{(X − E{X})(Y − E{Y})},

which is linear in both X and Y. The correlation of two random variables is defined by

ρ{X, Y} = cov{X, Y}/(σ_X σ_Y).

The variance in X̄, the estimate of the mean, can be written
If one sums first along the diagonal (at fixed τ = t − s), Eq. (17.18) can be rewritten as
where we note that N − |τ| is the length of the relevant diagonal and |τ| < N. Since the sum converges, var(X̄) approaches 0 as N → ∞, and the estimate X̄ is consistent.
The Karhunen-Loeve theorem
Our previous discussion describes the expansion of our series variables X_t = X(t) in a Fourier series. Are there advantages in expanding in another set of orthogonal functions φ(t)? For simplicity, let us assume that mean values have been subtracted off so that E{X_t} = 0. The Karhunen-Loeve theorem (Karhunen 1947; see also Kac and Siegert 1947) states that if the orthogonal functions are chosen to be eigenfunctions of the correlation function R(t, s), the expansion coefficients will be uncorrelated random variables.
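A numerical illustration of this decorrelation property (the AR(1)-type covariance model ρ^|t−s|, the sample count, and the tolerances are illustrative choices, not from the text):

```python
# Karhunen-Loeve check: expanding realizations of a stationary process in the
# eigenvectors of its covariance matrix R(t, s) yields (nearly) uncorrelated
# coefficients, with variances given by the eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
N, rho, M = 50, 0.8, 20000
t = np.arange(N)
R = rho ** np.abs(t[:, None] - t[None, :])   # stationary covariance R(t, s)

lam, phi = np.linalg.eigh(R)                 # KL basis: eigenvectors of R

L = np.linalg.cholesky(R)
X = L @ rng.standard_normal((N, M))          # M realizations with covariance R
A = phi.T @ X                                # expansion coefficients a_k

C = A @ A.T / M                              # sample covariance of the a_k
corr = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
off = corr - np.eye(N)
print(np.max(np.abs(off)))                   # ~ 0: coefficients uncorrelated
```

By contrast, expanding the same realizations in an arbitrary orthogonal basis (for example, the identity) leaves the strong correlations of R in place.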
Proof Start with the expansion
Then the correlation of the expansion coefficients is given by
with and the eigenvalue equation
Eq. (17.22) reduces to and the theorem is proved. If one assumes stationarity, then
In the case of discrete time, the integral over time is replaced by a sum over time, but the theorem remains valid.
17.4
Slepian functions
Although Slepian's argument that spheroidal functions should be used as a basis to localize a signal as much as possible in space and time is fairly compelling, it does not establish that this procedure yields the best spectrum. Thomson (1982) starts, instead, from Cramér's spectral representation of the discrete signal x(t), where t = 1, 2, ..., N,
where for convenience, times are measured from the center of the series. The limits ±1/2 are appropriate for the spacing Δt = 1 since that corresponds to the Nyquist
sampling rate, and frequencies outside the first zone can be shifted into the first zone by a change of an integer, which has no effect on the exponential. The Stieltjes integral can be replaced by a Riemann integral for a continuous spectrum by making the replacement dZ(ν) → x(ν) dν, as was done to get the second form of Eq. (17.27). It is necessary to remember that x(n) is the discrete time series, and that dZ(ν) and x(ν) are in frequency space. A zeroth approximation to the frequency amplitude is given by taking the discrete FFT:
Since Eq. (17.29) provides a Fourier series representation of y(f) with period 1, the Fourier coefficients are given by
If one now inserts Eq. (17.27) for x(n) into Eq. (17.29) one obtains
where K_N(f) is simply a geometric sum
This kernel reduces to sin(Nπf)/(πf) in the limit of continuous time, the same function as found in Eq. (17.71) for the case of continuous variables. The continuous kernel had appeared very early in the classic work by Wannier (1937). Equation (17.31) can be regarded as the integral equation for the signal x(ν) given a measured output y(f). This is exactly analogous to the integral equation relating the signal s(x) to the measurement m(x) in Chapter 15 on extraction of signals from noise in an ill-posed problem. As in that case, the nature of the difficulty can be understood by performing the inversion in a representation in which the measuring device K has been diagonalized. In that representation the inversion simply multiplies each component by the inverse of the corresponding eigenvalue.
The difficulty occurs because the high-order eigenfunctions, which are rapidly oscillating, have small eigenvalues. Thus any rapidly oscillating noise is greatly amplified. Hence, like all ill-posed problems, the solution is not unique and cannot be made so without a regularization procedure that rejects rapid oscillation. In the time series problem, the time-honored procedure is to filter the inversion procedure through a low-pass "window". Thomson's (1982) procedure is to generalize to a multiwindow ("multitapered") procedure. The above arguments indicate that it is appropriate as well as convenient to solve Eq. (17.31) by expanding the solution in terms of the eigenfunctions of this Dirichlet kernel, D(f) = K_N(f)/N,
In the presence of band limitation, W will be less than 1/2. These eigenfunctions are referred to as the Slepian functions, in honor of Slepian, who recognized their importance in signal representation. Each eigenfunction depends on N and W as parameters, so a complete labeling of the Slepian eigenfunctions is U_k(N, W; f). These are also referred to as Discrete Prolate Spheroidal Functions (DPSF). To obtain spectral amplitudes, we must solve Eq. (17.31) for x(ν) in terms of y(f), the preliminary direct spectrum (where ν is also a frequency). By expanding the known y(f) in terms of the eigenvectors
we can solve for x(f)
or in final form
The properties of the Slepian (1978) functions are analyzed in great detail in his paper. The eigenvalues are found to be nondegenerate and monotonic decreasing:

1 > λ_0 > λ_1 > ⋯ > 0.

The eigenfunctions are found, surprisingly, to be orthogonal over the interval [−1/2, 1/2] as well as over [−W, W]. If they are normalized over the larger region
(where no complex conjugate is taken because the eigenfunctions are real), the corresponding orthogonality condition over the limited bandwidth is
Comparison between Eqs. (17.38) and (17.39) shows that the eigenvalue λ_k represents the ratio of the energy in the inner region [−W, W] to that in the full region [−1/2, 1/2]. This guarantees that all the eigenvalues are less than one, and in view of Eq. (17.37), the shape most concentrated in frequency is associated with the k = 0 mode. Indeed, that is why Eberhard (1973) advocated using that mode, alone, as the appropriate window.
17.5
The discrete prolate spheroidal sequence
The DPSF U_k(f), as functions with a period of unity in frequency space, can be represented as a Fourier series,
This series from n = n_0 to n = n_0 + N − 1 is centered at the midpoint. The sequence v_n^(k) is called the discrete prolate spheroidal sequence (DPSS). Slepian (1978) considers this series with an arbitrary choice of n_0, whereas Thomson (1982) specializes to the case of n_0 = 0. We shall follow Thomson's choice here. The phase factor, otherwise arbitrary, is chosen to be
to conform to Slepian's notation. The sequence obeys the matrix eigenvalue equation
where
The eigenvalues are shown to be monotonic decreasing
In principle, the eigenvectors and eigenvalues can be obtained by standard subroutines such as those included in EISPACK, or modernizations of the same.
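In current practice, the tapers and their eigenvalues λ_k(N, W) are available in SciPy, whose `dpss` routine internally avoids the ill-conditioning discussed next by working with the well-separated tridiagonal formulation of Section 17.5. A sketch (N = 31 and NW = N·W = 6 chosen to match the first column of Table 17.1) that also checks the eigenvalues against the standard form of the defining concentration matrix:

```python
# DPSS tapers and concentration eigenvalues lambda_k(N, W) via scipy, checked
# against the defining problem A v = lambda v with the standard kernel
#   A_{nm} = sin(2 pi W (n - m)) / (pi (n - m)),  A_{nn} = 2 W.
import numpy as np
from scipy.signal.windows import dpss

N, NW = 31, 6.0                      # NW = N*W, the time-half-bandwidth product
tapers, ratios = dpss(N, NW, Kmax=4, return_ratios=True)
print(ratios)                        # lambda_0..lambda_3, all extremely close to 1

W = NW / N
n = np.arange(N)
d = n[:, None] - n[None, :]
with np.errstate(divide="ignore", invalid="ignore"):
    A = np.sin(2 * np.pi * W * d) / (np.pi * d)
A[np.arange(N), np.arange(N)] = 2 * W          # diagonal limit of the kernel
rayleigh = [v @ A @ v / (v @ v) for v in tapers]   # recovers lambda_k
```

For these parameters the four ratios should reproduce the first column of Table 17.1 to within roundoff.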
Unfortunately, this is not useful because the first 2NW eigenvalues are close to unity. Table 17.1 displays the first four eigenvalues, λ_k, for k = 0, 1, 2, 3, listed as λ_k(N, W), for N = 31 and three bandwidths W = 6/31, 7/31, 8/31 obtained in this way.

TABLE 17.1. Eigenvalues, λ_k(N, W), of the discrete prolate spheroidal sequence (after Percival and Walden 1993).

  k   λ_k(31, 6/31)        λ_k(31, 7/31)        λ_k(31, 8/31)
  0   0.9999999999999997   1.000000000000007    1.000000000000002
  1   0.9999999999999769   0.9999999999999933   1.000000000000001
  2   0.9999999999978725   0.9999999999999921   0.9999999999999945
  3   0.9999999998764069   0.9999999999998924   0.9999999999999908

The eigenvalues are so close to unity that roundoff even produces some values above unity, which are clearly wrong. This inaccuracy means that the associated eigenvectors, computed using these values, will also be inaccurate. In the case of continuous time problems, Slepian and Pollak (1961) found that the eigenfunctions that minimize the amount of energy outside some time boundary [−1/2, 1/2], and obey an integral equation, can also be found as solutions of a second order differential equation. For the discrete time problem, the solutions of an integral equation of the form of Eq. (17.33) can also be obtained as solutions of a second order difference equation
with the boundary conditions
This matrix equation has eigenvectors v^(k) and eigenvalues θ_k, as functions of N and W. The eigenvectors v^(k)(N, W) are identical to those of the original integral equation. However, the eigenvalues θ_k(N, W) are now well separated. As a result, it is easy to calculate the eigenvectors v^(k) accurately, without requiring quadruple precision, and then λ_k can be determined by
where the matrix A_nm was defined in Eq. (17.42). The matrix in Eq. (17.44) is a symmetric, tridiagonal matrix. For such matrices there are subroutines in most libraries that will perform the diagonalization, using
computation time of order N, rather than the time of order N^3 needed for general matrices. The necessary subroutines appear in the IMSL, NAG, and the PORT Bell Laboratories Mathematical Subroutine Library by Fox et al. (1978a), as well as among the subroutines included in the Numerical Recipes book by Press et al. (1992).
17.6
Overview of Thomson's procedure
The procedure used by Thomson (1982) to analyze time series, in 41 IEEE pages, is sufficiently complicated that we would like to give a road-map. (1) The first (optional) step is to replace the original input data by a new set that is prewhitened. An example of such a procedure is outlined by Percival and Walden (1993). Convolve X_t with g_u to get
The spectrum associated with g is given by
If the estimated spectrum, Ŝ_Y(f), of Y is obtained by methods discussed below, then Ŝ_X(f) can be obtained from
If one can choose G(f) to cancel most of the frequency dependence of S_X(f), then S_Y(f) will be nearly constant and easier to evaluate accurately by the methods described below. Of course, this is a chicken and egg problem, since it supposes that an approximate spectrum, Ŝ(f), is already known. A periodogram, or a parametric approach, can be used to get a zeroth approximation to the spectrum of X. If one can choose a prewindow filter that reduces the dynamic range of the resulting RV, the use of a Slepian window will yield a more accurate estimate of the spectrum. (2) Post-smoothing by a second window produces a modified estimate
Thomson (1982) recommends that this smoothing window be constructed using methods due to Papoulis (1973). The subscript D above implies that a data window
D_n has already been used. However, the post-smoothing window depends on the choice of data window. (3) The principal element of the Thomson (1982) method is to window the original, or prewhitened, data by the discrete prolate spheroidal sequence (DPSS), D_n = v_n^(k), which is proportional to the discrete Fourier transform of the discrete prolate spheroidal function (DPSF), U_k(f). Here, f is a frequency, n is an integer for the discrete time t, and k is an integer that refers to the kth mode, U_k(f). Both v_n^(k) and U_k(f) are functions of N, the number of data points (times), and W, the bandwidth.
17.7
High resolution results
The first preliminary estimate of the spectral amplitude is y(f), the discrete Fourier transform of x(n), given in Eq. (17.29). But this covers the full frequency range, [−1/2, 1/2], with a single formula. Thomson (1982) in his Eq. (3.1) suggests that a Fourier transform truncated to the interval [f − W, f + W] would provide better resolution, using the formula
This proposal is an excellent one, since one can prove that if S(f) were flat over this smaller interval, it would yield the correct spectral density at f. Presumably because Eq. (17.51) cannot be expressed directly in terms of the observed x(n), Thomson introduces another formula for the quantity y_k(f), related to Z_k(f) by Eq. (17.58) in the next section,
which also covers the truncated frequency region. Since y(f) is expressible in terms of the data x(n) via Eq. (17.31), he then obtains
after using the integral equation (17.33) for the Slepian functions. The advantage of this new form is that it is directly expressible in terms of the data
The disadvantage of this estimate is that it is no longer local but has contributions from the entire [−1/2, 1/2] domain.
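In computational terms, the quantities y_k(f) are simply Fourier transforms of the data multiplied by the kth DPSS taper, and |y_k(f)|² gives one eigenspectrum per taper. Their simple average is the basic (unweighted) multitaper estimate; a sketch, in which N, NW, K, and the test signal are illustrative choices (the adaptive weighting of Section 17.8 is omitted here):

```python
# Eigenspectra: window the data with each DPSS taper, Fourier transform,
# and average the K single-taper spectral estimates.
import numpy as np
from scipy.signal.windows import dpss

rng = np.random.default_rng(3)
N, NW, K = 1024, 4.0, 7
t = np.arange(N)
x = np.cos(2 * np.pi * 0.2 * t) + rng.standard_normal(N)   # line + white noise

tapers = dpss(N, NW, Kmax=K)              # rows: v^(0), ..., v^(K-1)
Y = np.fft.rfft(tapers * x, axis=-1)      # y_k(f) on the FFT frequency grid
S_mt = (np.abs(Y) ** 2).mean(axis=0)      # average of the K eigenspectra

f = np.fft.rfftfreq(N)
print(f[np.argmax(S_mt)])                 # peak at the line frequency, ~0.2
```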
Thomson then uses y_k(f_0) to obtain an estimate of the spectral density S(f; f_0) at any f in the interval f_0 − W < f < f_0 + W. He then averages this result over the interval to obtain the average result
where each can be regarded as an individual spectral estimate, with the kth data window
17.8
Adaptive weighting
Thomson (1982) attempts to obtain an estimate closer to the genuinely local estimate of Eq. (17.51) by using a weighted average
The weights are chosen to minimize the sum of squares of the differences between the ideal estimate of the amplitude and the weighted estimate d_k(f) y_k(f). The result is that the weights are estimated from
This procedure depends on S(f) and so must be self-consistent. The result is an iterative solution of
Here B_k(f) is referred to by Thomson as the broadband bias
The integral in Eq. (17.61) is over the cut region
The evaluation of the estimate B(f) is discussed in Thomson (1982), Section 5.
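A sketch of the adaptive iteration, in the form commonly used in multitaper implementations: d_k(f) = √λ_k S(f)/(λ_k S(f) + B_k). The broadband bias is approximated here by the common proxy (1 − λ_k)·var(x) rather than by Thomson's integral expression, and the starting guess and test signal are illustrative choices:

```python
# Adaptive multitaper weighting: iterate between the weighted spectrum
# estimate and the weights d_k(f) until self-consistent.
import numpy as np
from scipy.signal.windows import dpss

rng = np.random.default_rng(4)
N, NW, K = 1024, 4.0, 7
t = np.arange(N)
x = np.cos(2 * np.pi * 0.2 * t) + rng.standard_normal(N)

tapers, lam = dpss(N, NW, Kmax=K, return_ratios=True)
Sk = np.abs(np.fft.rfft(tapers * x, axis=-1)) ** 2   # eigenspectra |y_k(f)|^2
Bk = (1.0 - lam) * x.var()                           # broadband-bias proxy

S = Sk[:2].mean(axis=0)                              # start from k = 0, 1
for _ in range(100):
    d = np.sqrt(lam)[:, None] * S / (lam[:, None] * S + Bk[:, None])
    S_new = (d**2 * Sk).sum(axis=0) / (d**2).sum(axis=0)
    if np.max(np.abs(S_new - S)) <= 1e-8 * S_new.max():
        S = S_new
        break
    S = S_new
```

The weights down-weight the higher-order tapers (smaller λ_k) wherever the local spectrum S(f) is small compared with the broadband bias, which is exactly where leakage would otherwise dominate.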
17.9
Trend removal and seasonal adjustment
For an elementary discussion of the removal of periodicity and trend see Williams (1997). For a broad perspective on spectral analysis see Tukey (1961). A good general reference is Handbook of Statistics III, by Brillinger and Krishnaiah (1983), as is Harris (1967) and Tukey (1967). A more detailed description of the techniques used in what is now called 'complex demodulation' is given in Hassan (1963). The methods described in this chapter are nonparametric, in the sense that one doesn't have a model with a small number of parameters which are fitted by analyzing experimental time series data. Fourier techniques have been used. In a sense, parameters are involved, but there are so many that no assumptions are really made that pertain to a model, except possibly the assumption that one is dealing with a stationary process. A recent comparison of the relative merits of parametric and Fourier transform methods has been given by Gardiner (1992). He also suggests methods of combining parametric and Fourier methods. For an excellent elementary presentation of methods of dealing with nonstationary time series, see Cohen (1995), who discusses the effects of many of the popular filters.
17.10
Appendix A: The sampling theorem
The Poisson sum formula A useful lemma, see Lax (1974), that leads to a special Poisson sum formula is
This theorem can be established by taking the left hand sum from −N to N, and then taking the limit. A simpler approach is to note that the right hand side, g(x), regarded as a function of x, is periodic
The Fourier series representation over the domain 0 < x < a leads to the relation
where
In order for Eq. (17.65) to be true, we must have
In order that g(x) be periodic, A(x − x′) must possess the same delta function in each interval from na to (n + 1)a. In other words,
If this result is equated to the definition, Eq. (17.66), we have the special Poisson formula, Eq. (17.63). If we integrate Eq. (17.63) over an arbitrary function f(x), we obtain the usual Poisson sum formula, see Titchmarsh (1948),
where f(x) is the Fourier transform of F(k),
A generalization of the Poisson sum formula to three-dimensional (possibly nonorthogonal) lattices was given in Lax (1974), Chapter 6.
The sampling theorem
and B = π/a, makes the remarkable statement that f(x) is determined everywhere by its sample values f(na), provided that its Fourier transform, F(k), in Eq. (17.70) is band-limited to the region
Proof Since F(k) = 0 unless |k| < B = π/a, we can represent F(k) by a Fourier series over this finite region:
where the Fourier coefficients are determined by
where the second form makes use of the definition, Eq. (17.70). If Eq. (17.74) is inserted into Eq. (17.73), we obtain
Thus F(k) is uniquely determined by the sample values f(na). Inserting Eq. (17.75) into Eq. (17.70), but restricting the integration region to the band from −π/a to π/a, we immediately obtain Eq. (17.71), the sampling theorem. An alternate view of the sampling theorem can be phrased as follows. Suppose one has a set of values f(na) at the lattice points na and we wish to interpolate to obtain the values f(x) at other points. The sampling theorem, in the form of Eq. (17.71), provides the smoothest interpolation in the sense that any other interpolation will not be band-limited, and hence will involve higher Fourier components.
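The interpolation form of the sampling theorem can be verified directly: f(x) = Σ_n f(na) sinc((x − na)/a), with sinc(u) = sin(πu)/(πu). In this sketch the band-limited test function (bandwidth 0.45/a cycles, below the Nyquist frequency 1/(2a)) and the truncation of the infinite sum are illustrative choices:

```python
# Sampling-theorem check: reconstruct a band-limited function at an
# off-grid point from its samples by (truncated) sinc interpolation.
import numpy as np

a = 1.0                                     # sample spacing; band limit B = pi/a
n = np.arange(-500, 501)
f = lambda x: np.sinc(0.9 * x) + 0.5 * np.sinc(0.9 * (x - 2.0))
samples = f(n * a)

x0 = 3.37                                   # an off-grid point
recon = np.sum(samples * np.sinc((x0 - n * a) / a))
print(recon - f(x0))                        # ~ 0 up to truncation error
```

If the test function were not band-limited (for example, if 0.9 above were replaced by a factor larger than 2), the interpolation would instead reproduce the aliased version of the function folded into the band.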
Bibliography

Abramatic, J. F. and Silverman, L. M. (1982). Nonlinear restoration of noisy images. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-4, No. 2, 141-148.
Alexander, H. W. (1961). Elements of Mathematical Statistics. John Wiley and Sons, New York.
Alfano, R. R. ed. (1994). OSA Proceedings on Advances in Optical Imaging and Photon Migration. Optical Society of America, Washington D.C.
Anis, A. A. and Lloyd, E. H. (1976). The expected value of the adjusted rescaled Hurst range of independent normal summands. Biometrika, 63, 111-116.
Arfken, G. B. and Weber, H. J. (1995). Mathematical Methods for Physicists. Academic Press, San Diego.
Backus, G. and Gilbert, F. (1970). Uniqueness of the inversion of inaccurate gross earth data. Philosophical Transactions of the Royal Society, 256, 123-192.
Bayes, T. (1763). Essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society, Essays LII, 370-418.
Bell, D. A. (1960). Electrical Noise. Van Nostrand, London.
Bellman, R. E. (1964). Invariant Embedding and Time-independent Transport Processes. American Elsevier Pub. Co., New York.
Bellman, R. and Wing, G. (1976). An Introduction to Invariant Embedding. Wiley, New York.
Bendat, J. S. and Piersol, A. G. (1971). Random Data: Analysis and Measurement Process. Wiley, New York.
Benjamin, R. (1980). Generalization of maximum-entropy pattern analysis. IEEE Proceedings, 127 pt F, 341-353.
Bernstein, P. L. (1998). Against the Gods, the Remarkable Study of Risk. Wiley, New York.
Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637-654.
Bouchard, J. and Sornette, D. (1994). The Black-Scholes option pricing problem in mathematical finance: generalization and extensions for a large class of stochastic processes. Journal de Physique I (Paris), 4, 863-881.
Brillinger, D. R. and Krishnaiah, P. R. (1983). Handbook of Statistics III.
North Holland, Amsterdam.
Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. Springer Verlag, New York.
Brown, R. (1828). A brief account of microscopical observations made in the months of June, July, and August 1827, on the particles contained in the pollen of plants; and on the general existence of active molecules in organic and inorganic bodies. The Philosophical Magazine, 4, 161-173.
Bullard, E. C. and Cooper, R. I. B. (1948). The determination of the masses necessary to produce a given gravitational field. Proceedings of the Royal Society A (London), 194, 332-347.
Burgess, R. E. (1965). Fluctuation Phenomena in Solids. Academic Press, New York.
Cai, W., Lax, M., and Alfano, R. R. (2000a). Cumulant solution of the elastic Boltzmann transport equation in an infinite uniform medium. Physical Review E, 61, 3871-3876.
Cai, W., Lax, M., and Alfano, R. R. (2000b). Analytical solution of the elastic Boltzmann transport equation in an infinite uniform medium using cumulant expansion. Journal of Physical Chemistry B, 104, 3996-4000.
Cai, W., Lax, M., and Alfano, R. R. (2000c). Analytical solution of the polarized photon transport equation in an infinite uniform medium using cumulant expansion. Physical Review E, 63, 016606.
Cai, W., Xu, M., Lax, M., and Alfano, R. R. (2002). Diffusion coefficient depends on time not on absorption. Optics Letters, 27, 731-733.
Cai, W., Xu, M., and Alfano, R. R. (2003). Three-dimensional radiative transfer tomography for turbid media. IEEE Journal of Selected Topics in Quantum Electronics, 9, 189-198.
Cai, W., Xu, M., and Alfano, R. R. (2005). Analytical form of the particle distribution based on the cumulant solution of the elastic Boltzmann transport equation. Physical Review E, 71, 041202.
Callen, H. B. and Welton, T. A. (1951). Irreversibility and generalized noise. Physical Review, 83, 34-40.
Callen, H. B. and Greene, R. F. (1952). Theorem of irreversible thermodynamics. Physical Review, 86, 704-710.
Cameron, R. H. and Martin, W. T. (1944).
The Wiener measure of Hilbert neighborhoods in the space of continuous functions. Journal of Mathematics and Physics, 23, 195-209.
Cameron, R. H. and Martin, W. T. (1945). Transformations of Wiener integrals under a general class of linear transformations. Transactions of the American Mathematical Society, 58, 184-219.
Campbell, N. (1909). The study of discontinuous phenomena. Proceedings of the
Cambridge Philosophical Society, 15, 117-136.
Carley, A. E. and Joyner, R. W. (1979). The application of deconvolution methods in electron spectroscopy - a review. Journal of Electron Spectroscopy and Related Phenomena, 16, 1-23.
Casimir, H. B. and Polder, D. (1948). The influence of retardation on the London-van der Waals forces. Physical Review, 73, 360-372.
Cercignani, C. (1988). The Boltzmann Equation and its Applications. Series in Applied Mathematical Sciences, Springer-Verlag, New York.
Chan, H. B., Aksyuk, V. A., Kleiman, R. N., Bishop, D. J., and Capasso, F. (2001). Quantum mechanical actuation of microelectromechanical systems by the Casimir force. Science, 291, 1941-1944.
Chandrasekhar, S. (1943). Stochastic problems in physics and astronomy. Reviews of Modern Physics, 15, 1-89.
Chandrasekhar, S. (1960). Radiative Transfer. Dover, New York.
Chung, K. L. (1968). A Course in Probability Theory. Harcourt Brace and World, New York.
van Cittert, P. H. (1931). Zum Einfluss der Spaltbreite auf die Intensitätsverteilung in Spektrallinien. Zeitschrift für Physik, 69, 298-308.
Cohen, L. (1995). Time-Frequency Analysis. Prentice Hall, New Jersey.
Dahlquist, G. and Björck, A. (1974). Numerical Methods. Prentice Hall, New Jersey.
Demoment, G. (1989). Image reconstruction and restoration: Overview of common estimation structures and problems. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 2024-2036.
Deryagin, B. V. and Abrikosova, I. I. (1956). Direct measurement of molecular attraction of solid bodies. I. Statement of the problem and methods of measurement of forces using negative feedback. Zhurnal Eksperimentalnoi i Teoreticheskoi Fiziki, 30, 993-1006.
Deryagin, B. V., Abrikosova, I. I., and Lifshitz, E. M. (1956). Direct measurement of molecular attraction between solids separated by a narrow gap. Quarterly Reviews - Chemical Society (London), 10, 295-329.
Deutsch, R. (1962). Nonlinear Transformations of Random Processes. Prentice Hall, New Jersey.
Dirac, P. A. M. (1935). Principles of Quantum Mechanics. Oxford, New York.
Doob, J. L. (1942). The Brownian movement and stochastic equations. Annals of Mathematics, 43, 351-369.
Doob, J. L. (1953). Stochastic Processes. Wiley, New York.
Durduran, T., Yodh, A. G., Chance, B., and Boas, D. A. (1997). Does the photon-diffusion coefficient depend on absorption? Journal of the Optical Society of America A, 14, 3358-3365.
Dyson, F. (1949). The S matrix in quantum electrodynamics. Physical Review, 75, 1736-1755.
Dyson, F. (1949). The radiation theories of Tomonaga, Schwinger and Feynman. Physical Review, 75, 486-502.
Eberhard, A. (1973). An optimum discrete window for the calculation of power spectra. IEEE Transactions on Audio and Electroacoustics, 21, 37-43.
Eberly, J. H. and Wódkiewicz, K. (1977). The time-dependent physical spectrum of light. Journal of the Optical Society of America, 67, 1252-1261.
Einstein, A. (1905). On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat. Annalen der Physik, 17, 549-560.
Einstein, A. (1906). On the theory of the Brownian movement. Annalen der Physik, 19, 371-381.
Einstein, A. (1956). Investigations on the Theory of the Brownian Movement. Dover, New York.
Ekstein, H. and Rostoker, N. (1955). Quantum theory of fluctuations. Physical Review, 100, 1023-1029.
Faddeeva, V. N. (1959). Computational Methods of Linear Algebra. Dover, New York.
Feller, W. (1957, 1971). An Introduction to Probability Theory and its Applications. Volumes I and II. Wiley, New York.
Feynman, R. P. (1948). Space-time approach to non-relativistic quantum mechanics. Reviews of Modern Physics, 20, 367-398.
Feynman, R. P. (1951). An operator calculus having applications in quantum electrodynamics. Physical Review, 84, 108-128.
Ford, G. W. and O'Connell, R. F. (1996). There is no quantum regression theorem. Physical Review Letters, 77, 798-801.
Ford, G. W. and O'Connell, R. F. (2000). Driven system and the Lax formula. Optics Communications, 179, 451-461.
Fox, P. A., Hall, A. D., and Schryer, N. L. (1978a). The PORT Mathematical Subroutine Library. ACM Transactions on Mathematical Software, 4, 104-126.
Fox, P. A., Hall, A. D., and Schryer, N. L. (1978b). Algorithm 528: framework for a portable library. ACM Transactions on Mathematical Software, 4, 177-188.
Franklin, J. N. (1970). Well-posed stochastic extensions of ill-posed linear problems. Journal of Mathematical Analysis and Applications, 31, 682-716.
Freeman, J. J. (1952). On the relation between the conductance and the noise power of certain electronic streams. Journal of Applied Physics, 23, 1223-1225.
Frieden, B. R. (1975). Image enhancement and restoration, in Topics in Applied Physics, ed. Huang, T. S., Springer-Verlag, New York.
Fry, T. C. (1925). The theory of the Schroteffekt. Journal of the Franklin Institute, 199, 203-220.
Fry, T. C. (1928). Probability and its Engineering Uses. Van Nostrand, London.
Fürth, R. (1920). Die Brownsche Bewegung bei Berücksichtigung einer Persistenz der Bewegungsrichtung mit Anwendungen auf die Bewegung lebender Infusorien. Zeitschrift für Physik A, 2, 244-256.
Fürth, R. (1922). Die Bestimmung der Elektronenladung aus dem Schroteffekt an Glühkathodenröhren. Physikalische Zeitschrift, 23, 354-362.
Gandjbakhche, A. H. ed. (1999). Proceedings of Inter-Institute Workshop on In-vivo Optical Imaging at the NIH. Optical Society of America, Washington D.C.
Gardiner, W. A. (1992). Fundamental comparison of Fourier transformation and model fitting methods of spectral analysis. Imaging Systems and Technology, 4, 109-121.
Ghashghaie, S., Breymann, W., Peinke, J., Talkner, P., and Dodge, Y. (1996). Turbulent cascades in foreign exchange markets. Nature, 381, 767-770.
Glauber, R. J. (1963a). Photon correlations. Physical Review Letters, 10, 84-86.
Glauber, R. J. (1963b). The quantum theory of optical coherence. Physical Review, 130, 2529-2539.
Glauber, R. J. (1963c). Coherent and incoherent states of the radiation field. Physical Review, 131, 2766-2788.
Glauber, R. J. (1965). Optical coherence and photon statistics, in Quantum Optics and Electrons, eds. DeWitt, C., Blandin, A., Cohen-Tannoudji, C., Gordon and Breach, New York.
Goldstein, H. (1980). Classical Mechanics. Addison-Wesley, New York.
Goursat, E. (1917). Differential Equations. Ginn and Co., Boston.
Greene, R. F. and Callen, H. B. (1952). Theorem of irreversible thermodynamics II. Physical Review, 88, 1387-1391.
Gröbner, W. and Hofreiter, N. (1950). Integraltafel, Bestimmte Integrale. Springer Verlag, Berlin.
Hamming, R. W. (1991). The Art of Probability. Addison Wesley, New York.
Harris, F. J. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE, 66, 51-83.
Hartmann, C. A. (1921). The determination of the elementary electric charge by means of the "shot effect". Annalen der Physik, 65, 51-78.
Harris, B. (1967). Spectral Analysis of Time Series. John Wiley, New York.
Hassan, T. (1983). Complex demodulation: some theory and applications, in Handbook of Statistics III, eds. D. Brillinger and P. Krishnaiah. North Holland, Amsterdam.
Hashitsume, N. (1956). A statistical theory of linear dissipative systems, II. Progress of Theoretical Physics, 15, 369-413.
Hauge, E. H. (1974). What can we learn from Lorentz models? in Transport Phenomena, 337-367, Springer-Verlag, Berlin.
Hempstead, R. D. and Lax, M. (1967CVI). Classical noise VI: noise in self-sustained oscillators near threshold. Physical Review, 161, 350-366.
Hill, J. E. and van Vliet, K. M. (1958). Ambipolar transport of carrier density fluctuations in germanium. Physica, 24, 709-720.
Hillary, D. J., Wark, D. Q., and James, D. G. (1965). An experimental determination of the atmospheric temperature profile by indirect means. Nature, 205, 489-491.
Hull, A. W. and Williams, N. H. (1925). Determination of elementary charge e from measurements of shot-effect. Physical Review, 25, 147-173.
Hull, J. (1989, 2001). Options, Futures and other Derivative Securities. Prentice Hall, New Jersey.
Hurst, H. E. (1951). Long term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 770-799.
IMSL Inc. (1987). IMSL Math/Library Users Manual. IMSL, Houston.
Ishimaru, A. (1978). Wave Propagation and Scattering in Random Media, Volumes I and II. Academic, New York.
Ito, K. (1951). On stochastic differential equations. Memoirs of the American Mathematical Society, 4, 1-51.
Ito, K. and McKean, H. P. (1965). Diffusion Processes and their Sample Paths. Academic Press, New York.
Jackson, J. D. (1975). Classical Electrodynamics. Wiley, New York.
Jansson, P. A. (1970). Method for determining the response function of a high-resolution infra-red spectrometer. Journal of the Optical Society of America, 60, 184-191.
Jansson, P. A., Hunt, R. H. and Plyler, E. K. (1970). Resolution enhancement of spectra. Journal of the Optical Society of America, 60, 596-599.
Jaynes, E. T. (1958). Probability Theory in Science and Engineering. Socony Mobil Oil Co., Dallas.
Jeffreys, H. and Jeffreys, B. S. (1950). Methods of Mathematical Physics. Cambridge University Press, London.
Jeffreys, H. (1957). Scientific Inference. Cambridge University Press, London.
John, F. (1955). Numerical solution of the equation of heat conduction for preceding times. Annali di Matematica Pura ed Applicata, 4, 129-142.
Johnson, J. B. (1928). Thermal agitation of electricity in conductors. Physical Review, 32, 97-109.
Kac, M. and Siegert, A. J. F. (1947). On the theory of noise in radio receivers with square law detectors. Journal of Applied Physics, 18, 383-400.
Kahaner, D., Moler, C., and Nash, S. (1989). Numerical Methods and Software. Prentice Hall, New Jersey.
van Kampen, N. (1992). Stochastic Processes in Physics and Chemistry. North Holland, Amsterdam.
Kaplan, L. D. (1959). Inferences of atmospheric structures from satellite remote radiation measurements. Journal of the Optical Society of America, 49, 1004-1014.
Karhunen, K. (1946). Zur Spektraltheorie stochastischer Prozesse. Annales Academiae Scientiarum Fennicae, 37.
Kelley, P. L. and Kleiner, W. H. (1964). Theory of electromagnetic field measurement and photoelectron counting. Physical Review, 136, A316-A334.
Kendall, M. G. (1999). Kendall's Advanced Theory of Statistics. Oxford University Press, London.
Kendall, M. G. and Stuart, A. (1969). The Advanced Theory of Statistics. Volume I. Hafner, New York.
Khinchine, A. (1934). Korrelationstheorie der stationären stochastischen Prozesse. Mathematische Annalen, 109, 604-615.
Khinchine, A. (1938). The theory of stationary chance processes. Rossiiskaya Akademiya Nauk, 5.
Kitchener, J. A. and Prosser, A. P. (1957). Direct measurement of the long-range van der Waals forces. Proceedings of the Royal Society of London A, 242, 403-409.
Kittel, C. (1958). Elementary Statistical Physics. John Wiley, New York.
Kleiner, B., Martin, R. D., and Thomson, D. J. (1979). Robust estimation of power spectra. Journal of the Royal Statistical Society B, 41, 313-351.
Kolmogorov, A. N. (1950). Foundations of the Theory of Probability. Chelsea, New York (a translation of the 1933 Russian version).
Kuo, C., Lindberg, C. R., and Thomson, D. J. (1990). Coherence established between atmospheric carbon dioxide and global temperature. Nature, 343, 709-714.
Kubo, R. (1957). Statistical mechanical theory of irreversible processes I. Journal of the Physical Society of Japan, 12, 570-586.
Kubo, R. (1962). Generalized cumulant expansion method. Journal of the Physical Society of Japan, 17, 1100-1120.
Kyburg, H. E. (1969). Probability Theory. Prentice Hall, New Jersey.
Lampard, D. G. (1954). Generalization of the Wiener-Khintchine theorem to nonstationary processes. Journal of Applied Physics, 25, 802-803.
Langevin, M. P. (1908). Sur la théorie du mouvement brownien. Comptes Rendus de l'Académie des Sciences (Paris), 146, 530-533.
Lawson, J. L. and Uhlenbeck, G. E. (1950). Threshold Signals. MIT Radiation Lab Series, 24, McGraw Hill, New York.
Lax, M. and Phillips, J. C. (1958). One dimensional impurity bands. Physical Review, 110, 41-49.
Lax, M. (1958QI). Generalized mobility theory. Physical Review, 109, 1921-1926.
Lax, M. and Mengert, P. (1960). Influence of trapping, diffusion and recombination on carrier concentration fluctuations. Journal of Physics and Chemistry of Solids, 14, 248-267.
Lax, M. (1960I). Fluctuations from the nonequilibrium steady state. Reviews of Modern Physics, 32, 25-64.
Lax, M. (1963QII). Formal theory of quantum fluctuations from a driven state. Physical Review, 129, 2342-2348.
Lax, M. (1964QIII). Quantum relaxation, the shape of lattice absorption and inelastic neutron scattering lines. Journal of Physics and Chemistry of Solids, 25, 487-503.
Lax, M. (1966III). Classical noise III: nonlinear Markoff processes. Reviews of Modern Physics, 38, 359-379.
Lax, M. (1966IV). Classical noise IV: Langevin methods. Reviews of Modern Physics, 38, 541-566.
Lax, M. (1966QIV). Quantum noise IV: quantum theory of noise sources. Physical Review, 145, 110-129.
Lax, M. (1967). Quantum theory of noise in masers and lasers, in 1966 Tokyo Summer Lecture in Theoretical Physics, Part 1: Dynamical Processes in Solid State Optics, eds. Kubo, R. and Kamimura, H., Benjamin, New York, 195-245.
Lax, M. (1967V). Classical noise V: noise in self-sustained oscillators. Physical Review, 160, 290-307.
Lax, M. (1968). Fluctuations and coherence phenomena in classical and quantum physics, in 1966 Brandeis Summer Lecture Series, Statistical Physics, volume 2, eds. Chretien, M., Gross, E. P., and Deser, S., Gordon and Breach Science Publishers, New York, 270-478.
Lax, M. (1968QXI). Quantum noise XI: multitime correspondence between quantum and classical stochastic processes. Physical Review, 172, 350-361.
Lax, M. and Yuen, H. (1968). Quantum noise XIII: six classical variable description of quantum laser fluctuations. Physical Review, 172, 362-371.
Lax, M. and Zwanziger, M. (1973). Exact photocount statistics: lasers near threshold. Physical Review A, 7, 750-771.
Lax, M. (1974). Symmetry Principles in Solid State and Molecular Physics. John Wiley and Sons, New York.
Lax, M. (1997). Stochastic processes. Encyclopedia of Applied Physics, 20, 19-60.
Lax, M. (2000). The Lax-Onsager regression 'theorem' revisited. Optics Communications, 179, 463-476.
Levinson, N. (1947). A heuristic derivation of Wiener's mathematical theory of prediction and filtering. Journal of Mathematics and Physics, 26, 110-119.
Louisell, W. H. (1973). Quantum Statistical Properties of Radiation. Wiley, New York.
Lukacs, E. (1960). Characteristic Functions. Griffin, London.
MacDonald, D. K. C. (1962). Noise and Fluctuations. Wiley, New York.
Mandelbrot, B. B. (1983). The Fractal Geometry of Nature. Freeman, San Francisco.
Mandelbrot, B. B. (1997). Fractals and Scaling in Finance. Springer, Berlin.
Merton, R. C. (1973). Theory of rational option pricing. Bell Journal of Economics and Management Science, 4, 141-183.
Middleton, D. (1960). Introduction to Statistical Communication Theory. McGraw-Hill, New York.
von Mises, R. (1937). Probability, Statistics and Truth, 2nd edn. Macmillan, New York.
Montroll, E. W. (1952). Markoff chains, Wiener integrals, and quantum theory. Communications on Pure and Applied Mathematics, 5, 415-453.
Montroll, E. and Shlesinger, M. (1983). A wonderful world of random walks, in CCNY Physics Symposium in Celebration of Melvin Lax's Sixtieth Birthday, City College of New York, New York.
Moran, P. A. P. (1968). An Introduction to Probability Theory. Clarendon, Oxford.
Morse, P. and Feshbach, H. (1953). Methods of Theoretical Physics. McGraw-Hill, New York.
Moullin, E. B. (1938). Spontaneous Fluctuation of Voltage. Oxford, New York.
Moyal, J. E. (1949). Quantum mechanics as a statistical theory. Proceedings of the Cambridge Philosophical Society, 45, 99-124.
Murdoch, J. B. (1970). Network Theory. McGraw-Hill, New York.
NAG Fortran Library. Numerical Algorithms Group.
von Nägeli, K. (1879). Sitzber. Kgl. Bayerische Akad. Wiss. München, Math.-Physik. Kl. 9, 389-453.
Nielsen, L. T. (1999). Pricing and Hedging of Derivative Securities. Oxford University Press, London.
Noble, B. (1977). The numerical solution of integral equations, in The State of the Art in Numerical Analysis, ed. D. Jacobs, Academic Press, London.
North, D. O. (1940). Fluctuations in space charge limited currents at moderately high frequencies, part II: diodes and negative grid triodes. RCA Review, 4, 441.
Nyquist, H. (1927). Minutes of the New York meeting, February 25-26, 1927. Joint meeting with the Optical Society of America. Physical Review, 29, 614.
Nyquist, H. (1928). Thermal agitation of electric charge in conductors. Physical Review, 32, 110.
Onsager, L. (1931a). Reciprocal relations in irreversible processes. I. Physical Review, 37, 405-426.
Onsager, L. (1931b). Reciprocal relations in irreversible processes. II. Physical Review, 38, 2265-2279.
Page, C. H. (1952). Instantaneous power spectra. Journal of Applied Physics, 23, 103-106.
Papoulis, A. (1973). Minimum bias windows for high resolution spectrum estimates. IEEE Transactions on Information Theory, 19, 9-12.
Parzen, E. (1960). Modern Probability Theory and Its Applications. John Wiley, New York.
Pawula, R. F. (1967). Approximation of the linear Boltzmann equation by the Fokker-Planck equation. Physical Review, 162, 186-188.
Pearson, K. (1900). On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50, 157-175.
Percival, D. B. and Walden, A. T. (1993). Spectral Analysis for Physical Applications. Cambridge University Press, London.
Peters, E. E. (1994). Fractal Market Analysis. Wiley, New York.
Phillips, D. L. (1962). A technique for the numerical solution of certain integral equations of the first kind. Journal of the Association for Computing Machinery, 9, 84-97.
de-Picciotto, R., Reznikov, M., Heiblum, M., Umansky, V., Bunin, G., and Mahalu, D. (1997). Direct observation of a fractional charge. Nature, 389, 162.
de-Picciotto, R. (1998). Shot noise of non-interacting composite Fermions. Preprint, Weizmann Institute of Science; talk at Bell Laboratories, 03/02/1998.
PORT (1984). PORT Mathematical Subroutine Library. Bell Labs, New Jersey.
Power, E. A. (1964). Introductory Quantum Electrodynamics. Longmans, Green and Co., London.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in Fortran, 2nd edn. Cambridge University Press, London.
Price, G. L. (1982). Isolation of instability in the Fredholm integral equation of the first kind: application to the deconvolution of noisy spectra. Journal of Applied Physics, 53, 4571-457.
Priestley, M. B. (1981). Spectral Analysis and Time Series. Academic Press, London.
Rack, A. J. (1938). Effect of space charge and transit time on shot noise in diodes. Bell System Technical Journal, 17, 592.
Ramsey, J. B. and Thomson, D. J. (1999). A reanalysis of the spectral properties of some economic and financial time series, in Nonlinear Time Series Analysis of Economic and Financial Data, ed. Rothman, P., Kluwer Academic Publishers, New York.
Reggiani, L., Lugli, P., and Mitin, V. (1988). Generalization of Nyquist-Einstein relationship to conditions far from equilibrium in nondegenerate semiconductors. Physical Review Letters, 60, 736.
Rice, S. O. (1944). Mathematical analysis of random noise, part I: shot noise. Bell System Technical Journal, 23, 282-310.
Rice, S. O. (1945). Mathematical analysis of random noise, part II: power spectra and correlation functions. Bell System Technical Journal, 24, 46-70.
Rice, S. O. (1948a). Mathematical analysis of random noise, part III: statistical properties of random noise currents. Bell System Technical Journal, 27, 109.
Rice, S. O. (1948b). Mathematical analysis of random noise, part IV: noise through nonlinear devices. Bell System Technical Journal, 27, 115.
Risken, H. and Vollmer, H. D. (1967). Correlation of the amplitude and of the intensity fluctuation near threshold. Zeitschrift für Physik, 181, 301-312.
Robinson, F. N. H. (1962). Noise in Electric Circuits. Clarendon, Oxford.
Robinson, F. N. H. (1974). Noise and Fluctuations in Electronic Devices and Circuits. Clarendon, Oxford.
van Roosbroeck, W. (1953). Transport of added current carriers in a homogeneous semiconductor. Physical Review, 91, 282-289.
Schawlow, A. L. and Townes, C. H. (1958). Infrared and optical masers. Physical Review, 112, 1940.
Scher, H. and Lax, M. (1973). Stochastic transport in a disordered solid. I. Theory. Physical Review B, 7, 4491-4502.
Schottky, W. (1918). Über spontane Stromschwankungen in verschiedenen Elektrizitätsleitern. Annalen der Physik, 57, 541-567.
Schuster, A. (1898). On the investigation of hidden periodicities with application to a supposed 26 day period of meteorological phenomena. Terrestrial Magnetism, 3, 13-41.
Scully, M. O. and Lamb, W. E. (1967). Quantum theory of an optical maser. I. General theory. Physical Review, 159, 208-226.
Scully, M. O. and Lamb, W. E. (1969). Quantum theory of an optical maser. III. Theory of photoelectron counting statistics. Physical Review, 179, 368.
Scully, M. O. and Zubairy, M. S. (1995). Quantum Optics. Cambridge University Press, London.
Shaw, C. B. Jr. (1972). Improvement of the resolution of an instrument by numerical solution of an integral equation. Journal of Mathematical Analysis and Applications, 37, 83-112.
Shlesinger, M. F. (1996). Random processes. Encyclopedia of Applied Physics, 16, 45-70.
Shockley, W. (1938). Currents to conductors induced by a moving point charge. Journal of Applied Physics, 9, 635-636.
Shockley, W. (1950). Electrons and Holes in Semiconductors. Van Nostrand, London.
Shockley, W., Copeland, J. A., and James, R. P. (1966). The impedance field method of noise calculation in active semiconductor devices, in Quantum Theory of Atoms, Molecules and the Solid State, ed. Löwdin, P. O., Academic Press, New York, 537-563.
Slepian, D. and Pollak, H. (1961). Prolate spheroidal wave functions, Fourier analysis and uncertainty. Bell System Technical Journal, 40, 43-64.
Slepian, D. (1964). Prolate spheroidal wave functions, Fourier analysis and uncertainty IV. Bell System Technical Journal, 43, 3009-3057.
Slepian, D. (1965). Some asymptotic expansions for prolate spheroidal wave functions. Journal of Mathematics and Physics, 44, 99-140.
Slepian, D. (1978). Prolate spheroidal wave functions, Fourier analysis and uncertainty V: the discrete case. Bell System Technical Journal, 57, 1371-1429.
Smith, W. A. (1974). Laser intensity fluctuations when the photon number at threshold is small. Optics Communications, 12, 236-239.
von Smoluchowski, M. (1916). Drei Vorträge über Diffusion, Brownsche Bewegung und Koagulation von Kolloidteilchen. Physikalische Zeitschrift, 17, 557, 585.
Stratonovich, R. L. (1963). Topics in the Theory of Random Noise, Vol. I. Gordon and Breach, New York.
Thiele, T. N. (1903). Theory of Observations. Layton, London. Reprinted in Annals of Mathematical Statistics, 2 (1931), 165-308.
Thomson, D. J. (1977). Spectrum estimation techniques for characterization and development of WT4 waveguide. Bell System Technical Journal, 56, Part I, 1769-1815; Part II, 1983-2005.
Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. Proceedings of the IEEE (special issue on spectrum estimation), 70, 1055-1096.
Thomson, D. J. (1990a). Time series analysis of Holocene climate data. Philosophical Transactions of the Royal Society of London Series A, 330, 601-616.
Thomson, D. J. (1990b). Quadratic-inverse spectrum estimates: applications to paleoclimatology. Philosophical Transactions of the Royal Society of London Series A, 332, 539-597.
Thomson, D. J. and Chave, A. D. (1991). Jackknifed error estimates for spectra, coherences, and transfer functions, in Advances in Spectrum Estimation, ed. Haykin, S., Prentice Hall, New Jersey, 58-113.
Thomson, D. J. (1995). The seasons, global temperature, and precession. Science, 268, 59-68.
Thomson, D. J., MacLennan, C. G., and Lanzerotti, L. J. (1995). Propagation of solar oscillations through the interplanetary medium. Nature, 376, 139-144.
Thomson, D. J. (1997). Dependence of global temperatures on atmospheric CO2 and solar irradiance. Proceedings of the National Academy of Sciences of the United States of America, 94, 8370-8377.
Thomson, D. J. (2001). Multitaper analysis of nonstationary and nonlinear time series data, in Nonlinear and Nonstationary Signal Processing, eds. Fitzgerald, W., Smith, R., Walden, A., and Young, P., Cambridge University Press, London, 317-394.
Thornber, K. K. (1974). Treatment of microscopic fluctuations in noise theory. Bell System Technical Journal, 53, 1041-1078.
Tikhonov, A. N. and Arsenin, V. A. (1977). Solutions of Ill-Posed Problems. Winston & Sons, Washington.
Titchmarsh, E. C. (1948). Introduction to the Theory of Fourier Integrals. Oxford, London.
Transistor Teacher's Summer School (1952). Experimental verification of the relation between diffusion constant and mobility of electrons and holes. Physical Review, 88, 1368-1369.
Tukey, J. W. (1961). Discussion, emphasising the connection between analysis of variance and spectral analysis. Technometrics, 3, 191-219; also in The Collected Works of John W. Tukey, ed. Brillinger, D. R., Wadsworth Advanced Books and Software, Belmont, California.
Tukey, J. W. (1968). An introduction to calculations of numerical spectral analysis, in Spectral Analysis of Time Series, ed. Harris, B., Wiley, New York, 25-46.
Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley, New York.
Turchin, V. F., Koslov, V. P., and Malkevich, M. S. (1971). The use of mathematical-statistics methods in the solution of incorrectly posed problems. Soviet Physics Uspekhi, 13, 681-840.
Twomey, S. (1963). On the numerical solution of Fredholm integral equations of the first kind by the inversion of the linear system produced by quadrature. Journal of the Association for Computing Machinery, 10, 97-101.
Uhlenbeck, G. E. and Ornstein, L. S. (1930). On the theory of Brownian motion. Physical Review, 36, 823-841.
Uspensky, J. V. (1937). Introduction to Mathematical Probability. McGraw-Hill, New York.
Valley, G. E. Jr. and Wallman, H. (1948). Vacuum Tube Amplifiers. MIT Radiation Lab Series, 13, McGraw-Hill, New York.
Wang, L. and Jacques, S. L. (1994). Error estimation of measuring total interaction coefficients of turbid media using collimated light transmission. Physics in Medicine and Biology, 39, 2349.
Wang, M. C. and Uhlenbeck, G. E. (1945). On the theory of Brownian motion II. Reviews of Modern Physics, 17, 323-342.
Wannier, G. (1937). The structure of electronic excitation levels in insulating crystals. Physical Review, 52, 191.
Wax, N. (1954). Noise and Stochastic Processes. Dover, New York.
Welsh, D. (1988). Codes and Cryptography. Clarendon Press, Oxford, 41.
Welton, T. A. (1948). Some observable effects of the quantum-mechanical fluctuations of the electromagnetic field. Physical Review, 74, 1157.
Wiener, N. (1926). Harmonic analysis of irregular motion. Journal of Mathematics and Physics, 5, 99.
Wiener, N. (1930). Generalized harmonic analysis. Acta Mathematica, 55, 117-258.
Wiener, N. (1949a). Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Wiley, New York, and MIT Press, Cambridge.
Wiener, N. (1949b). Time Series. MIT Press, Cambridge.
Wigner, E. (1932). Quantum corrections for thermodynamic equilibrium. Physical Review, 40, 749-760.
Williams, E. C. (1937). Thermal fluctuations in complex networks. Journal of Electrical Engineering, 81, 751.
Williams, E. C. (1936). Fluctuation voltage in diodes and in multi-electrode valves. Journal of the Institution of Electrical Engineers, 79, 349-360.
Williams, G. P. (1997). Chaos Theory Tamed. Joseph Henry Press, Washington.
Williams, N. H. and Vincent, H. B. (1926). Determination of electronic charge from measurements of shot-effect in aperiodic circuits. Physical Review, 28, 1250-1264.
Williams, N. H. and Huxford, W. S. (1929). Determination of charge of positive thermions from measurements of shot effect. Physical Review, 33, 773-778.
Wold, H. O. A. (1938). A Study in the Analysis of Stationary Time Series. Almquist and Wiksell, Uppsala.
Wolf, P. E. and Maret, G. (1985). Weak localization and coherent backscattering of photons in disordered media. Physical Review Letters, 55, 2696-2699.
Xu, M., Cai, W., and Alfano, R. R. (2002). Photon migration in turbid media using a cumulant approximation to radiative transfer. Physical Review E, 65, 066609.
Xu, M., Cai, W., and Alfano, R. R. (2004). Multiple passages of light through an absorption inhomogeneity in optical imaging of turbid media. Optics Letters, 29, 1757-1759.
Xu, M., Cai, W., Lax, M., and Alfano, R. R. (2004). Stochastic view of photon migration in turbid media. arXiv:cond-mat/0401409.
Xu, M. and Alfano, R. R. (2005). Random walk of polarized light in turbid media. Physical Review Letters, 95, 213905.
Xu, M. and Alfano, R. R. (2005). Circular polarization memory of light. Physical Review E, 72, 065601.
Yodh, A. G., Tromberg, B., Sevick-Muraca, E., and Pine, D. (1997). Diffusing photons in turbid media. Journal of the Optical Society of America A, 14, 136-342.
INDEX

χ² distribution, 27 σ-field, 136, 285 absorption rate, 238 acceptors, 215 adaptive weighting, 302 adiabatic elimination, 63 after-effect function, 215, 224 aliasing, 292 American puts and calls, 273 amplitude fluctuation, 195, 200, 205 angular momentum theory, 241 anti-Stokes, 121 arbitrage, 271, 272 arbitrageur, 272, 282 asset, 271 associated Legendre function, 243 autocorrelation, 69, 70, 169, 292 function, 141 autonomous, 194 backward equation, 152, 153 ballistic light, 228 ballistic motion, 244 ballistic sphere, 253 band-limited function, 288 barring operation, 115 Bayes' theorem, 17 biased, 293 biased form, 290 binomial distribution, 20 bipolar drift velocity, 224 birth and death process, 139 black box, 168 Black-Scholes differential equation, 282, 283 model, 280 Boltzmann approximation, 214 Boltzmann transport equation, 237 Boltzmann's constant, 114, 129 Brillouin zone, 212, 292 broadband bias, 302 Brownian motion, 132, 135, 151, 155, 188, 275, 286 Bust theorem, 137 Campbell's theorem, 95, 96 generalized, 111 canonical form, 199
Cartesian coordinates, 240 Cauchy-Schwarz inequality, 137 causality, 245, 249 central limit theorem, 238 Chaos, 64 Chapman-Kolmogorov condition, 46, 132, 148, 239 equation, 132 generalized, 152 relation, 134 characteristic function, 7, 143, 151, 154 generalized, 175 Chebychev's inequality, 8 Chi-square, 26 Chicago Board of Trade, 272 Clebsch-Gordan coefficients, 241 close price, 285 commission, 272 commutation rule, 126, 128, 130 complex demodulation, 303 compressibility, 158 concentration, 215 fluctuation, 158, 218 conditional average, 136, 187, 188, 191 conditional correlation, 173 conditional expectation, 135, 285 conduction band, 212 edge, 212 conductivity fluctuation, 215 correlations, 293 covariance, 294 Cramér's spectral representation, 295 cumulant, 10, 237 expansion, 239 expansion theorem, 202 first cumulant, 238 kurtosis, 11 second cumulant, 238 skewness, 11 cumulant solution, 256 currencies, 273 current-current noise, 116 cusp, 161 data window, 300, 301 decay parameter, 141 delivery, 271 delivery price, 271 delta correlation, 169, 176, 184
delta function, 35, 169, 184, 193, 239 multi-dimensional, 39 density matrix, 117 of states (DOS), 211 operator, 117, 118 depolarization, 236 derivative, 271, 282 detailed balance, 219 deterministic function, 277, 284 diffusion, 54 coefficient, 141, 144 constant, 130 conditional, 132 diffusion and mobility, 58 diffusion coefficient, 52, 234 diffusion constant, 61, 63 frequency dependent, 90 diffusion equation, 237 diffusion length, 226 diffusive light, 228 dilemma, 161 Dirichlet kernel, 297 discontinuity, 161 discrete process, 286 discrete prolate spheroidal function (DPSF), 290, 297, 301 sequence (DPSS), 290, 298, 301 disjoint events, 14 dispersion, 8 dissipation, 113, 115, 121 dissipative response, 136 distribution function, 4 dividend, 273 donors, 215 Doob's theorem, 162 drift vector, 133, 136, 184, 190, 201 drift velocity, 182 effective mass, 212, 213 Einstein relation, 55, 91, 129, 136, 141, 157, 163, 170, 181, 220 generalized, 159 ellipsoid, 212, 254 energy gap, 215 ensemble, 285 ensemble average, 185, 239, 241 equilibrium density operator, 114 ensemble, 114 theorem, 121 equipartition, 129 equipartition theorem, 84 error function, 252 European contract, 273 exercise, 273
expectation value, 5 extrapolation, 288 factorization approximation, 138 factorization of probabilities, 147 fast Fourier transform (FFT), 289 Fermi energy, 213 Fermi function, 213 Fermi-Dirac statistics, 217 filtering process, 130 filtration, 285 financial area, 187, 189 financial quantitative analysis, 135, 275 fluctuation, 113, 115, 129, 133 Casimir effect, 89 fluctuation-dissipation theorem, 58, 87, 114, 115, 117, 118, 121, 123, 124, 128, 181 Fokker-Planck operator, 147 equation, 129, 130, 138, 164, 195, 197, 201, 207, 208, 279 generalized, 137, 141, 154, 286 ordinary, 137, 286 process, 168, 182, 184, 188, 201, 205 generalized, 143 foreign currency, 273 forward, 271 contract, 271-273, 281 price, 271 fractal process, 287 fundamental analysis, 284 future, 271 contract, 272, 273 gambling, 32 first law, 32 Gambler's ruin, 42, 52, 64 second law, 33 Gaussian distribution, 237 random force, 195 random variable, 152 generalized characteristic functional, 146 generating function, 7 generation, 179 generation-recombination noise, 130 process, 139, 286 geometric Brownian motion, 275 geometric sum, 296 global minimum, 252 global warming problem, 291 harmonic oscillator, 123, 284 heating reservoir, 284 Heaviside unit function, 169 Henyey-Greenstein phase function, 248, 252
ill-posed problems, 258, 289, 296
  filtering, 260, 261
  Franklin's method, 264
  image restoration, 270
  kernels, 263
  regularization, 261
  Shaw, 266
  statistical regularization, 268
inertia, 180
inertial system, 161
instability, 194
intelligent quasilinear approximation, 207
interaction picture, 119
interpolation, 288
invariant embedding, 153
Ito's
  calculus lemma, 135, 185, 188, 271, 275, 282, 287
  integral, 136, 276
  stochastic differential equation (ISDE), 275, 278
jack-knife procedures, 291
Johnson noise, 116, 168
joint events, 12
Karhunen-Loève theorem, 293
Langevin
  approach, 57, 60, 172, 175
  equation, 168, 184, 186, 189, 207, 276
  force, 125, 127
  form, 136
  problem, 200
  process, 168, 184
  stochastic differential equation (LSDE), 276
  treatment, 182
law of chemical equilibrium, 215
law of mass action, 215
Lebesgue integration, 280
Lévy process, 287
line-width, 195, 198
linear damping, 149, 159, 171
linked-average, 159
linked-moment, 144
Liouville theorem, 166
Lorentzian spectrum, 203
low-pass window, 297
margin, 272
  call, 273
Markovian process, 129, 132, 133, 153, 168, 169, 195
martingale measure, 136
master equation, 140
maturity, 273
  time, 285
mean, 293
mean-square fluctuations, 137
measure theory, 280
metronome-like driving source, 194, 200
mobility, 130, 215
monotonic decreasing, 297
Monte Carlo simulation, 237, 248
multidimensional form, 153
multiple scattering, 237
multitapered, 297
multitime characteristic function, 159
multitime correlations, 179
multivariate normal distribution, 29
natural filtration, 136, 285
noise, 69
  autocorrelation, 70
  evenness, 75
  filters, 73
  homogeneous, 149, 159, 171
  Johnson noise, 82, 85
  nonstationary, 77
  Nyquist noise, 90
  Nyquist's theorem, 87
  shot noise, 93, 104, 139, 141, 145, 151, 155, 168, 204, 225
    generalized, 177
    process, 155, 168
    Rice shot noise process, 286
  standard engineering definition, 69
  thermal, 82
  white, 130
  Wiener-Khinchine theorem, 71
noise anticommutator, 128
noise spectrum, 155, 163
nondegenerate, 213, 297
nonequilibrium steady state, 218
nonharmonic model, 285
nonlinearity, 195
normal distribution, 277
Nyquist
  rate, 288
  sampling rate, 296
  theorem, 87, 205
operating point, 136, 141, 163, 171, 196, 199, 200, 206, 284
option, 271, 287
ordinary calculus rule, 187, 191, 276, 282
orthogonality relation, 241
overlapping events, 14
paradox, 161
partial differential equation (PDE), 164
particle distribution, 237
partition sum, 119
path integral, 151
  average, 146
Pauli principle, 218
periodogram, 289
phase fluctuation, 195, 200, 205
phase function, 228, 237, 238
phase variable, 194
photon migration, 228
Planck's constant, 114
Poisson bracket, 114, 119, 122
Poisson distribution, 23
  generalized, 228, 232
Poisson process, 140, 142, 286
Poisson sum, 303
portfolio, 282
position-position correlations, 124
prewhitened, 300
  data, 301
primitive cell, 212
probability, 1
  conditional probability, 16, 44, 131, 162
  frequency ratio, 1
  mathematical probability, 2
  subjective probability, 3
put, 271
quasi-Fermi level, 219
quasi-Markovian limit, 128
quasilinear, 130
  approximation, 136, 142, 171
  case, 168
  treatment, 195, 206
radiative transfer equation, 237
random process, see stochastic process
random variable, see stochastic variable, 293
random walk
  Brownian motion, 56
  diffusion, see diffusion
  one-dimensional, 50, 54
  photon migration, 228
reciprocal lattice, 212
recombination, 179
rectangular window, 289
recurrence relation, 240, 249
regression, 163
regression theorem, 129-131, 142
regularization procedure, 297
reservoir, 126
reservoir oscillator, 126
reshaping, 249
residual spectrum, 289
resistive nonlinearity, 195
resonant circuit, 195
response function, 114
Riemann integral, 278
Riemann sum, 278, 286
risk
  free, 272, 281
  free profit, 271
  neutral, 282
rotating wave
  approximation, 126, 131, 195, 196
  oscillator, 199
  van der Pol oscillator (RWVP), 164, 194, 197, 200
sampling theorem, 289, 303, 304
scattering rate, 238
Schuster periodogram, 292
security, 271
self-energy diagram, 257
self-sustained oscillator, 130, 164, 195
semi-invariants, 145
semiphenomenological model, 249
short, 272
Slepian function, 295, 297, 301
smoothing, 288, 289
smoothing window, 300, 301
snake-like mode, 245
spectral analysis, 289
spherical harmonics, 239
spot
  price, 272
  time, 284
  value, 285
standard deviation, 275
stationary, 156
stationary state, 136
Stieltjes integral, 278
stochastic differential equation, 136, 188, 283
stochastic integral, 188
stochastic model, 130
stochastic process, 44, 136
  Gaussian, 45
  Markovian, 45, 228
  nonstationary, 77
  Poisson, 48
  spectrum, 69
  stationary, 45
stochastic variable, 5
  Gaussian, 66
  sum, 19
stock, 187
stock index futures, 273
Stokes, 121
Stokes-anti-Stokes ratio, 126, 130
storage cost, 274
tapering, 289
Taylor expansion, 134, 141, 152
telegrapher's equation, 237
thermodynamic treatment, 216
threshold, 204, 210
time autocorrelation, 291
time reversal, 115, 122, 142, 160, 180
time series, 288, 289
time shift, 194, 200
time-dependent Green's function, 240
time-ordered multiplication, 240
transfer matrix, 116
transition
  probability, 131, 132, 134, 147, 218
  rate, 140
translation invariance, 238
transport mean free path, 239
turbid medium, 227, 237
two-time correlation, 129
Uhlenbeck's model, 285
Uhlenbeck-Ornstein process, 151
unbiased, 293
  form, 289
uncorrelated, 294
underlying asset, 271
value, 284
valence band, 214
van der Pol equation, 199
velocity-velocity noise, 116
volume fluctuation, 158
white noise, 184, 197, 277, 286
Wiener process, 184, 188, 275
Wiener-Khinchine
  form, 113
  relation, 80, 173
  theorem, 71, 74, 202, 203, 288, 291
window factor, 225
windowing, 289
Wold theorem, 291, 292
zero-point contribution, 116, 203
zero-point effect, 125
zero-point oscillations, 116
zero-point energy, 89